Optimize record update operations

Hi,
I’m developing a website for a client and I was planning on using Algolia’s Starter plan.
It’s a static website, so the client will trigger new builds when the content changes, likely a few times per day.
In the build script, I included a full Algolia reindexing (delete the index, re-create it and populate it with about 5k records).
But I realize now, by looking at the Monitoring page (and after receiving a quota alert), that this costs a lot in terms of “record operations”.
Is there a way to optimize this? Maybe perform a diff on the index when building?
I was thinking about making a query to retrieve the full index, performing a diff and only updating the changed records, but there might be a better way…
Thanks for the help!

Hi Antoine, thanks for reaching out!

The fact that you’re using a static site generator makes this kind of reindexing strategy a bit harder, because you can’t rely on a last_updated column in a database. However, there is a way to do it; it just requires some coding on your end.

First, what static site generator are you using? If it’s Jekyll, you should use our Algolia for Jekyll plugin, which implements the reindexing solution you’re looking for.

If you use something else, you’ll want to reimplement yourself the way the Algolia for Jekyll plugin handles reindexing.

The idea is to leverage the unique objectID that each record must have, and to assign it yourself: hash the current content of a page and use that hash as the objectID. Whenever the content changes, the objectID changes as well.
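
For instance, in Node.js, deriving the objectID could look something like this (a rough sketch: objectIDFor and the record shape are placeholders, not part of any plugin):

  const crypto = require('crypto')

  // Same content, same hash, same objectID; any change produces a new one
  const objectIDFor = (record) =>
    crypto.createHash('md5').update(JSON.stringify(record)).digest('hex')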

This lets you compare the records already in your Algolia index (retrieved with the browse method) with the new list of records to push: if an objectID you’re about to push already exists in the index, that record hasn’t changed, so you can remove it from the list of records to push. Conversely, if a record in the index has an objectID that no longer appears in the new list, its content was either changed (and therefore has a new objectID) or deleted, so you can add it to a second list of records to delete from your index.
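
With the JavaScript client, the comparison could look something like this (a rough sketch using the emitter-based browseAll of the v3 client; newRecords and the index are placeholders):

  const remoteIDs = new Set()
  const browser = index.browseAll('', { attributesToRetrieve: ['objectID'] })

  browser.on('result', (content) => {
    // 'result' fires once per page of records in the index
    content.hits.forEach((hit) => remoteIDs.add(hit.objectID))
  })

  browser.on('end', () => {
    // Hash already in the index: the record hasn't changed, nothing to push
    const toAdd = newRecords.filter((record) => !remoteIDs.has(record.objectID))
    // Hash in the index but missing from the new build: changed or deleted
    const newIDs = new Set(newRecords.map((record) => record.objectID))
    const toDelete = [...remoteIDs].filter((objectID) => !newIDs.has(objectID))
  })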

Finally, you can use the batch method to send the list of records to add and the list of records to delete at the same time.
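
Again as a sketch, assuming the toAdd and toDelete lists from above, an index called pages, and client being your algoliasearch client instance:

  const operations = [
    ...toAdd.map((body) => ({ action: 'addObject', indexName: 'pages', body })),
    ...toDelete.map((objectID) => ({
      action: 'deleteObject',
      indexName: 'pages',
      body: { objectID },
    })),
  ]

  // One round trip for all additions and deletions
  client.batch(operations)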

This way, you never push records that haven’t changed. If your end user has 100 articles and only updates one, you would only perform two indexing operations: one add (the new content, with a new objectID) and one delete (the old content, with the old objectID), instead of 100.

Happy coding!


Thanks @sarah.dayan! I’m using Gatsby.
I went with something like this in gatsby-node.js#onPostBuild:


  const index = algolia.initIndex('events')

  // Round scores to six decimal places
  const round = (score) => Math.round(score * 1000000) / 1000000

  const getEventRecord = (event) => ({
    objectID: event.id,
    path: event.path,
    title: event.title,
    image: event.image && event.image.publicURL,
    date: event.date,
    // Collapse the date and past/upcoming status into a single sortable score
    date_score: round(event.past ? 1 + 1000000000 / event.date : 1 - 1000000000 / event.date),
    // ...and some other specific fields
  })

  // Build a partial update containing only the attributes that changed;
  // returns undefined when both versions are identical
  const diff = (previous, current) => {
    const update = {
      objectID: previous.objectID,
    }
    // Compare the union of both key sets, so attributes that only exist
    // on the new version are picked up too
    const attributes = [...new Set([...Object.keys(previous), ...Object.keys(current)])]
    attributes.forEach((attr) => {
      if (JSON.stringify(previous[attr]) !== JSON.stringify(current[attr])) {
        update[attr] = current[attr]
      }
    })
    // More than just the objectID means something actually changed
    if (Object.keys(update).length > 1) {
      return update
    }
  }

  // Browse the existing index, diff it against the freshly built events,
  // and collect the resulting batch operations
  await new Promise((resolve, reject) => {
    const updates = []
    const ids = {}

    const browser = index.browseAll()

    browser.on('error', reject)

    browser.on('result', (content) => {
      content.hits.forEach((record) => {
        // Match the indexed record against the freshly built data
        const item = data.allEvent.edges.find(({node: {id}}) => id === record.objectID)
        if (item) {
          const object = getEventRecord(item.node)
          const update = diff(record, object)
          if (update) {
            updates.push({
              action: 'partialUpdateObjectNoCreate',
              indexName: 'events',
              body: update,
            })
          }
        } else {
          // The event is gone from the build data: remove it from the index
          updates.push({
            action: 'deleteObject',
            indexName: 'events',
            body: {
              objectID: record.objectID,
            },
          })
        }
        ids[record.objectID] = true
      })
    })

    browser.on('end', () => {
      // Anything in the build data we didn't see in the index is new
      data.allEvent.edges.forEach(({node: event}) => {
        if (!ids[event.id]) {
          const object = getEventRecord(event)
          updates.push({
            action: 'addObject',
            indexName: 'events',
            body: object,
          })
        }
      })
      console.log('algolia batch:', updates)
      resolve(algolia.batch(updates))
    })
  })

But I guess I could also just use a hash to compare objects, like:

  require('crypto')
    .createHash('md5')
    .update(JSON.stringify(event))
    .digest('hex')

Anyway, thanks!