I manage a site and we’re in the process of switching our search to Algolia. The site needs to search not only its own data, but also data from an external site which I do not control. I can get them to expose a JSON file of data that I can use to create an index, but my question is: what is the best way to keep that index up to date when new content is added to the site? Is the best approach to just loop through the JSON and use the addObjects function to push everything back? As long as I manage the IDs, this would only add new entries, correct? And then I could use a cron job to run it regularly. Or is there a better approach to this?
Just in case, the update strategies from the docs are here.
You could also look at this topic on the forum to see the tradeoffs in terms of operation counts. I believe updating an object is recommended when you know exactly which fields have changed.
If it’s only about new content and you don’t need updates, the simplest solution I think is to store the date of the last inserted object and then add only the ones created after that date. Maybe a contentID will be enough, if one is available and incremental (if not set, each Algolia object gets an objectID, but you can set it to your custom contentID too).
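To make that concrete, here’s a minimal sketch of the “only push new content” idea. It assumes the external JSON exposes a stable `contentID` and an ISO-8601 `createdAt` field (both field names are assumptions, not something from the thread), and it stubs out the actual Algolia push:

```python
from datetime import datetime, timezone

# Hypothetical external records; assumes each has a stable "contentID"
# and an ISO-8601 "createdAt" field (field names are illustrative).
records = [
    {"contentID": "a1", "createdAt": "2024-01-10T12:00:00+00:00", "title": "Old post"},
    {"contentID": "b2", "createdAt": "2024-03-05T09:30:00+00:00", "title": "New post"},
]

def new_records_since(records, last_sync):
    """Keep only records created after the last successful sync, and map
    contentID onto Algolia's objectID so re-pushing the same record
    overwrites it instead of creating a duplicate."""
    fresh = []
    for rec in records:
        created = datetime.fromisoformat(rec["createdAt"])
        if created > last_sync:
            fresh.append({**rec, "objectID": rec["contentID"]})
    return fresh

# The sync date would be persisted between cron runs; hard-coded here.
last_sync = datetime(2024, 2, 1, tzinfo=timezone.utc)
to_push = new_records_since(records, last_sync)
# to_push is what you would hand to the Algolia client's batch-save call.
```

After a successful push you’d store the newest `createdAt` you saw, so the next cron run starts from there.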
Whether a cron job is the right tool depends only on how frequently you expect new content. The ideal setup would be for an entry point on your site to receive the new content as soon as it’s created and add it to / update the Algolia index.
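That push-based alternative could look something like this sketch: the external site calls an endpoint on your server whenever content is created, and the handler upserts a single record. The HTTP framework and the Algolia call are left out; `pushed` stands in for the index, and the `contentID` field name is an assumption:

```python
pushed = []  # stand-in for the Algolia index; a real handler would call the client

def on_new_content(payload):
    """Hypothetical webhook handler: receives one record from the external
    site and upserts it, keyed by its contentID so repeat deliveries of the
    same record overwrite rather than duplicate."""
    record = {**payload, "objectID": payload["contentID"]}
    # In a real setup this line would be the Algolia client's single-object
    # save call; here we just collect the record for demonstration.
    pushed.append(record)
    return record["objectID"]

on_new_content({"contentID": "c3", "title": "Breaking news"})
```

This removes the polling delay entirely, at the cost of needing the external site’s cooperation to call your endpoint.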
Hope this helps,
Very good answer @pierre.aurele.martin, thank you!
In short, yes: you can run a cron job that periodically fetches all the data from the external service and pushes it to Algolia. In order not to consume too many operations, you’ll want to store locally which version of the external data was already pushed to Algolia (using timestamps or digests), so that you don’t push it unnecessarily. That might not be necessary, though, depending on how many external records we’re talking about and how often you synchronise.
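The digest variant mentioned above can be sketched like this: fingerprint the whole external payload and only push when the fingerprint changes. The push itself is stubbed out, and persisting the digest between cron runs is left as a comment:

```python
import hashlib
import json

def digest(records):
    """Stable fingerprint of the external data: serializing with sorted keys
    means identical content always hashes to the same value."""
    blob = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

def sync_if_changed(records, previous_digest):
    """Return (pushed?, new_digest). Skips the push when nothing changed,
    which is what saves Algolia operations between cron runs."""
    current = digest(records)
    if current == previous_digest:
        return False, current  # data unchanged: skip the push entirely
    # ...push records to Algolia here with the client's batch-save call...
    return True, current       # caller persists this digest for the next run

previous = None  # would be loaded from disk/DB at the start of each cron run
changed, previous = sync_if_changed([{"objectID": "a1"}], previous)
changed_again, previous = sync_if_changed([{"objectID": "a1"}], previous)
```

First run pushes (no stored digest yet); the second run sees the same digest and skips.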