Best practices on handling the search index

Which approach is a good fit in which (rough) case?

I have a database that I want to index and I have thought of a few approaches to index it:

  1. Batch-create all my indexes. This is obviously good if I want to initialise a search index on data that already exists but has not been indexed yet.
  2. Index on creation / deletion of objects, e.g. hooked into ORM event handlers. Very much realtime, and I cannot think of any case where this would be a bad idea.
  3. Scheduled atomic reindexes: basically an extension of 1., repeated periodically to keep the index as up to date as possible.
  4. Queued indexing, where I would track what has been added, removed or updated in my database over the last X hours / days / whatever, and then process the corresponding indexing operations once that period has passed.
  5. …more?

Hi,

  1. It’s best to initialize your index first if you have settings to set; it’s always best to configure the settings before you start indexing the data. Other than for settings (or synonyms), there is no need to init the index: it will be created automatically when you start indexing data.

  2. This is ideal: it will keep your data in Algolia as up to date as possible. It works great with 4.

  3. This strategy is the best way to initialize your index: send all existing data, and then use 2. to keep everything in sync. It’s also very useful to recover after an error, or if you batched a lot of operations in your database without going through your ORM, for instance. This one should include 1.

  4. The queue is interesting if your language calls Algolia synchronously. When a user deletes a comment, you may not want to wait for the Algolia HTTP call to come back before showing the confirmation message to the user. It’s best to run the queue very often, though, like every few minutes.
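
To make 2. and 4. a bit more concrete, here is a minimal sketch of a queue sitting between ORM hooks and the search index. It is only an illustration under assumptions: `IndexQueue`, `SearchIndex`, `applyBatch` and `InMemoryIndex` are hypothetical names, not part of any real client library, and the in-memory index just stands in for the actual HTTP calls. The hooks only record what changed and return immediately; a worker that runs every few minutes sends everything in one batch.

```typescript
// Hypothetical shapes for illustration only; not a real client API.
type IndexOp =
  | { kind: "save"; objectID: string; body: { [field: string]: unknown } }
  | { kind: "delete"; objectID: string };

// Stand-in for the search client; only a single batch call is assumed.
interface SearchIndex {
  applyBatch(ops: IndexOp[]): void;
}

class IndexQueue {
  // Pending operations keyed by objectID, so a later change to the
  // same object replaces the earlier one (last write wins).
  private pending: { [objectID: string]: IndexOp } = {};

  // Call from the ORM's create/update hook; no network call happens here.
  enqueueSave(objectID: string, body: { [field: string]: unknown }): void {
    this.pending[objectID] = { kind: "save", objectID, body };
  }

  // Call from the ORM's delete hook.
  enqueueDelete(objectID: string): void {
    this.pending[objectID] = { kind: "delete", objectID };
  }

  // Call from a scheduled worker. Returns how many operations were sent.
  // If applyBatch throws, the pending operations are kept for the next run.
  flush(index: SearchIndex): number {
    const ops = Object.keys(this.pending).map((id) => this.pending[id]);
    if (ops.length === 0) return 0;
    index.applyBatch(ops);
    this.pending = {};
    return ops.length;
  }
}

// In-memory fake used to demonstrate the queue without any HTTP calls.
class InMemoryIndex implements SearchIndex {
  records: { [objectID: string]: { [field: string]: unknown } } = {};
  batchCalls = 0;

  applyBatch(ops: IndexOp[]): void {
    this.batchCalls += 1;
    for (const op of ops) {
      if (op.kind === "save") this.records[op.objectID] = op.body;
      else delete this.records[op.objectID];
    }
  }
}

const demoQueue = new IndexQueue();
const demoIndex = new InMemoryIndex();

demoQueue.enqueueSave("comment-1", { text: "first draft" });
demoQueue.enqueueSave("comment-1", { text: "edited" }); // replaces the draft
demoQueue.enqueueSave("comment-2", { text: "hello" });
demoQueue.enqueueDelete("comment-2"); // the delete wins over the save

const sentCount = demoQueue.flush(demoIndex);
```

Keying the pending operations by object ID means an object that is edited several times between two runs is only sent once. In a real setup the queue would normally live somewhere persistent (a database table or a job queue) rather than in process memory, so that pending operations survive a restart.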

Are you using a framework by any chance? These 4 ideas are implemented in our framework integrations (Laravel, Rails, Symfony and Django).
Please let me know if you need further details.
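
For 1. and 3., the atomic reindex boils down to: configure a temporary index, fill it with all existing data, then swap it in for the live index in one step. A rough sketch, assuming a hypothetical `FakeSearchService` that keeps everything in memory; its method names are placeholders for illustration and should be checked against whatever client you actually use, and the `_tmp` suffix is just a naming choice made here.

```typescript
// Hypothetical types for illustration only; not a real client API.
type SearchRecord = { objectID: string; [field: string]: unknown };
type Settings = { [key: string]: unknown };

interface StoredIndex {
  settings: Settings;
  records: { [objectID: string]: SearchRecord };
}

// In-memory stand-in for a search service.
class FakeSearchService {
  indexes: { [name: string]: StoredIndex } = {};

  setSettings(name: string, settings: Settings): void {
    this.ensure(name).settings = settings;
  }

  // Upserts records by objectID; creates the index if it does not exist yet.
  saveObjects(name: string, records: SearchRecord[]): void {
    const target = this.ensure(name);
    for (const record of records) target.records[record.objectID] = record;
  }

  deleteIndex(name: string): void {
    delete this.indexes[name];
  }

  // Replaces `to` with `from` in a single step and removes `from`.
  moveIndex(from: string, to: string): void {
    if (!this.indexes[from]) throw new Error("unknown index: " + from);
    this.indexes[to] = this.indexes[from];
    delete this.indexes[from];
  }

  private ensure(name: string): StoredIndex {
    if (!this.indexes[name]) this.indexes[name] = { settings: {}, records: {} };
    return this.indexes[name];
  }
}

// Rebuilds `liveName` from scratch without the live index ever being half-filled.
function atomicReindex(
  service: FakeSearchService,
  liveName: string,
  settings: Settings,
  loadAll: () => SearchRecord[],
  batchSize = 1000
): number {
  const tmpName = liveName + "_tmp";
  service.deleteIndex(tmpName); // drop leftovers from a failed earlier run
  service.setSettings(tmpName, settings); // settings first, then data
  const all = loadAll();
  for (let i = 0; i < all.length; i += batchSize) {
    service.saveObjects(tmpName, all.slice(i, i + batchSize));
  }
  service.moveIndex(tmpName, liveName); // the only step that touches the live index
  return all.length;
}

const searchService = new FakeSearchService();
// A record that no longer exists in the database, e.g. removed without the ORM.
searchService.saveObjects("comments", [{ objectID: "stale", text: "old" }]);

const databaseRows: SearchRecord[] = [
  { objectID: "1", text: "first" },
  { objectID: "2", text: "second" },
  { objectID: "3", text: "third" },
];

const reindexed = atomicReindex(
  searchService,
  "comments",
  { searchableAttributes: ["text"] },
  () => databaseRows,
  2
);
```

Because the live index is only replaced at the very end, searches keep hitting the old data until the new index is complete, and stale records that were never removed through the ORM disappear with the swap.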

Hi, thanks for your answer. I am using Express.js as a GraphQL endpoint, and it also handles the database. I just wanted to know whether I’ve thought everything through, as I am starting to dive into this whole implementation :)