Problems with reindexing Magento

I had a problem running php shell/indexer.php -- -reindex algolia_search_indexer: it ran out of memory. After raising the memory limit to 4GB (!), it now bails out with an exception…

$ /usr/bin/php shell/indexer.php -- -reindex algolia_search_indexer
Algolia Search Products index process unknown error:
exception 'Exception' with message 'Item (Mage_Catalog_Model_Product) with the same id "484" already exist' in /home/spyshop/upgrade/public_html/lib/Varien/Data/Collection.php:373
Stack trace:
#0 /home/spyshop/upgrade/public_html/app/code/core/Mage/Eav/Model/Entity/Collection/Abstract.php(265): Varien_Data_Collection->addItem(Object(Mage_Catalog_Model_Product))
#1 /home/spyshop/upgrade/public_html/app/code/core/Mage/Eav/Model/Entity/Collection/Abstract.php(1056): Mage_Eav_Model_Entity_Collection_Abstract->addItem(Object(Mage_Catalog_Model_Product))
#2 /home/spyshop/upgrade/public_html/app/code/core/Mage/Eav/Model/Entity/Collection/Abstract.php(871): Mage_Eav_Model_Entity_Collection_Abstract->_loadEntities(false, false)
#3 /home/spyshop/upgrade/public_html/app/code/community/Algolia/Algoliasearch/Helper/Data.php(541): Mage_Eav_Model_Entity_Collection_Abstract->load()
#4 /home/spyshop/upgrade/public_html/app/code/community/Algolia/Algoliasearch/Model/Observer.php(213): Algolia_Algoliasearch_Helper_Data->rebuildStoreProductIndexPage('60', Object(Mage_Catalog_Model_Resource_Product_Collection), 5, '100', NULL, Array, false)
#5 /home/spyshop/upgrade/public_html/app/code/community/Algolia/Algoliasearch/Model/Resource/Engine.php(45): Algolia_Algoliasearch_Model_Observer->rebuildProductIndex(Object(Varien_Object))
#6 /home/spyshop/upgrade/public_html/app/code/community/Algolia/Algoliasearch/Model/Resource/Engine.php(309): Algolia_Algoliasearch_Model_Resource_Engine->addToQueue('algoliasearch/o...', 'rebuildProductI...', Array, '100')
#7 /home/spyshop/upgrade/public_html/app/code/community/Algolia/Algoliasearch/Model/Resource/Engine.php(202): Algolia_Algoliasearch_Model_Resource_Engine->_rebuildProductIndex('60', Array, false)
#8 /home/spyshop/upgrade/public_html/app/code/community/Algolia/Algoliasearch/Model/Indexer/Algolia.php(219): Algolia_Algoliasearch_Model_Resource_Engine->rebuildProducts()
#9 /home/spyshop/upgrade/public_html/app/code/core/Mage/Index/Model/Process.php(212): Algolia_Algoliasearch_Model_Indexer_Algolia->reindexAll()
#10 /home/spyshop/upgrade/public_html/app/code/core/Mage/Index/Model/Process.php(260): Mage_Index_Model_Process->reindexAll()
#11 /home/spyshop/upgrade/public_html/shell/indexer.php(168): Mage_Index_Model_Process->reindexEverything()
#12 /home/spyshop/upgrade/public_html/shell/indexer.php(216): Mage_Shell_Compiler->run()
#13 {main}

Any suggestions on what I should do / where to look to address this?

Cheers,

Steve

Hello Steve,

Can you set up an indexing queue and re-index with the queue enabled?
You can find more information about the queue here: https://community.algolia.com/magento/doc/m1/indexing/

After doing that, then bouncing back and forth, I got the back end to behave and reindex.
I’m now running (manually, cron disabled for the moment)
EMPTY_QUEUE=1 /usr/bin/php shell/indexer.php -- -reindex algolia_queue_runner
and it’s been going for 15+ minutes (there are a lot of storefronts in there).

Is there anything else that needs setting up out of the box? Having seen the difference it does make to a site, we really want to get this working.

Although that seems to have kept on going, I have run out of memory (temporarily set to 4GB) in Gd2.php
PHP Fatal error: Allowed memory size of 4294967296 bytes exhausted (tried to allocate 280897 bytes) in lib/Varien/Image/Adapter/Gd2.php on line 290

EDIT:
I’ve just watched the resources used by your indexer program, and the memory usage just goes up and up and up. Can you please rewrite it so that it releases stuff as it goes along? Otherwise I’ve no idea how much memory it’s going to need to successfully run.

EDIT 2:
I’ve disabled the search at the master level, and just enabled it for a single storefront. Not good.
$ EMPTY_QUEUE=1 /usr/bin/php shell/indexer.php -- -reindex algolia_queue_runner
Segmentation fault

EDIT 3:
Running it with PHP 7.0, it didn’t segfault when set up for just a single storefront. However, it still hit the 4GB limit and crashed.

If there’s anything we can do to a) help or b) get this started, please let us know. We REALLY do want to use this product.

Hello,

The idea behind the queue is not to run the indexer with EMPTY_QUEUE=1. The purpose of the queue is to process all indexing operations in small batches with the cron job.
That way it won’t run out of memory and will reindex all data correctly.

Can you set up the cron job to process the queue and not process it all at once?
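
For illustration only, a crontab entry for this could be generated roughly as below. It is a sketch, not a tested setup: the five-minute interval and the Magento root path are assumptions, and `queue_cron_line` is a hypothetical helper name. Only the indexer command itself is taken from this thread.

```shell
# Assumed locations; override them to match your installation.
PHP_BIN="${PHP_BIN:-/usr/bin/php}"
MAGENTO_ROOT="${MAGENTO_ROOT:-/path/to/magento}"

# Hypothetical helper: prints a crontab line that runs the queue runner
# every 5 minutes (assumed interval) WITHOUT EMPTY_QUEUE=1, so each run
# only works through a small batch of queued jobs.
queue_cron_line() {
  printf '*/5 * * * * %s %s/shell/indexer.php -- -reindex algolia_queue_runner\n' \
    "$PHP_BIN" "$MAGENTO_ROOT"
}

queue_cron_line
```

The printed line can be pasted into the crontab of the user that owns the Magento files (`crontab -e`).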

So what’s the point of EMPTY_QUEUE=1 if I can’t run it from the command line?

If I run without EMPTY_QUEUE=1 then it comes straight back in 0 seconds. Nothing extra added. How should I get the initial load done so that the extension is initialised?

Losing faith and looking elsewhere…

Call me stupid, I haven’t tested or experimented with that yet, but running without EMPTY_QUEUE in a stupid while loop should do the trick. If you monitor the Algolia queue table, it should show progress.
Obviously you’d need to stop it manually.

I think I must call you stupid - I’m sure that running it in a loop where it’s doing nothing will fix… nothing. Even if it did work, resorting to tricks like this is not the mark of a professional product.

Totally agree! I thought you were looking for a manual hack to get it going at full speed.
I’m also not fully happy with the indexing solution!

Hello @kolodziej and @alex1,

Unfortunately there are some memory leaks in Magento which we didn’t manage to solve. So when you’re reindexing a large number of products, the script might fail for lack of memory.
That’s why we introduced the queue, which processes only a small portion of the products at a time.

And the parameter EMPTY_QUEUE=1 is there to give you the ability to process the whole queue at once, but not on a regular basis. It should only be used in exceptional cases, for example after you have increased the memory limit for PHP.

I’m very open to any suggestions and improvements you might have for the indexing part to make it smoother and more effective.
@alex1, can you please elaborate more on why you’re not happy with the solution? I’ll be happy to improve! :)

When developing, I need to reindex frequently.
I can’t run the reindex from the admin backend, because it would time out.
So I need to run it from the command line. There I can’t run with EMPTY_QUEUE=1 either, due to memory/timeout issues.
Ideally, I’d have a way to re-kick the queue runner until the queue is empty.
Also, judging from a lot of software with queues, you always need control and monitoring over them, i.e. a view of the queue in adminhtml plus a warning when it reaches a certain level. Async background tasks have a nasty tendency to fail.
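
Something along these lines is what re-kicking the runner until the queue is empty could look like. It is a rough sketch, not tested against a real shop, and it rests on several assumptions: the queue table is taken to be called `algoliasearch_queue`, `MAGENTO_DB` is a hypothetical variable holding the database name, the `mysql` client is assumed to pick up its credentials from its own config, and all function names are made up for the example. The loop also stops when a run does not shrink the queue, so it cannot spin forever doing nothing.

```shell
# drain_queue COUNT_CMD RUN_CMD [MAX_RUNS]
# Calls RUN_CMD again and again until COUNT_CMD prints 0.
# Returns 0 when the queue is empty, 1 when it stopped shrinking or
# MAX_RUNS was reached, 2 when one of the commands failed.
drain_queue() {
  local count_cmd="$1" run_cmd="$2" max_runs="${3:-1000}"
  local runs=0 before after
  before=$($count_cmd) || return 2
  while [ "$before" -gt 0 ] && [ "$runs" -lt "$max_runs" ]; do
    $run_cmd || return 2              # one small batch in a fresh process
    runs=$((runs + 1))
    after=$($count_cmd) || return 2
    echo "run $runs: $after job(s) left"
    if [ "$after" -ge "$before" ]; then
      echo "queue is not shrinking, stopping" >&2
      return 1
    fi
    before=$after
  done
  [ "$before" -eq 0 ]
}

# Hypothetical helpers; table name, database variable and paths are assumptions.
pending_jobs() {
  mysql -N -B -e 'SELECT COUNT(*) FROM algoliasearch_queue' "$MAGENTO_DB"
}
run_one_batch() {
  /usr/bin/php shell/indexer.php -- -reindex algolia_queue_runner
}
```

From the Magento root this would be started with `drain_queue pending_jobs run_one_batch`. The same `pending_jobs` count could feed a simple alert when the backlog goes above some level.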

When I raise PHP’s memory limit to 4GB, it still runs out. That makes it far less smooth and effective; in fact it detracts from your product big time (and I repeat: I think it’s great!).

I would have thought, seeing that you are aware of a memory problem, that you’d address it. For example, you could batch up the necessary updates (just create a master index of product IDs; at one integer per entry that’s never going to hit big numbers), then process that in sections, clearing memory in between. I’m sure there are plenty of similar solutions.
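
To make the idea concrete, here is a minimal sketch of processing an ID list in sections. It is not how the extension works today: `process_in_batches` is a made-up helper, and the demo uses `echo` in place of a real worker. Running each section as its own process is one simple way of "clearing memory in between", since the memory goes back to the OS when the process exits.

```shell
# process_in_batches BATCH_SIZE CMD [ARGS...]
# Reads whitespace-separated product IDs from stdin and runs CMD once per
# batch of at most BATCH_SIZE IDs, appending the IDs as arguments. CMD must
# be an executable (xargs cannot call shell functions). Every batch runs in
# its own process, so its memory is released when that process exits.
process_in_batches() {
  batch_size="$1"
  shift
  xargs -n "$batch_size" "$@"
}

# Demo with echo standing in for the real worker; prints
# "batch: 484 485", "batch: 486 487" and "batch: 488".
printf '%s\n' 484 485 486 487 488 | process_in_batches 2 echo batch:
```

In a real run the ID list might come from something like `SELECT entity_id FROM catalog_product_entity`, piped into `process_in_batches 100 php shell/reindex_products.php`, where `reindex_products.php` is a hypothetical script that reindexes only the IDs it is given.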

When going live for the first time, it really is imperative that we have a method of ensuring your search is fully preloaded beforehand.