Using a URL as an Algolia objectID

We want to use Query Rules (great feature by the way!) to promote a specific webpage result.

We are crawling our webpages and indexing them as records in Algolia - Algolia creates the objectID. We plan to re-crawl each night, delete the index content, and re-index with the new results of the crawl. Much of the data won't have changed, but we're doing it this way because it's more straightforward than having to keep track, across our many websites, of when new webpages have been created or their content updated.
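For context, the nightly job is essentially this (a simplified sketch using Algolia's Python client; the index name and record shape are illustrative):

```python
from algoliasearch.search_client import SearchClient

# Illustrative credentials and index name
client = SearchClient.create('YOUR_APP_ID', 'YOUR_ADMIN_API_KEY')
index = client.init_index('webpages')

# Stand-in for the output of the nightly crawl
crawl_results = [
    {'url': 'https://example.com/about.html', 'title': 'About us', 'content': '...'},
    {'url': 'https://example.com/contact.html', 'title': 'Contact', 'content': '...'},
]

# Wipe last night's records, then push the fresh crawl. Because the
# records carry no objectID, Algolia generates a brand-new one for
# every record on every run.
index.clear_objects()
index.save_objects(crawl_results, {'autoGenerateObjectIDIfNotExist': True})
```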

When we reindex, the Algolia-generated objectIDs mapping to the webpage records will of course differ from what they were before, so the existing Query Rules will no longer point at the right records. This means we would have to delete those Query Rules and recreate them after every crawl.

We could generate our own objectIDs and map them to specific webpages; then, when we reindex, we'd have to ensure that the same ID maps to the same URL. But rather than generate our own IDs and keep track of their mapping to particular URLs, why not use the webpage URL itself as the objectID? Has anyone else done this? Any gotchas using a URL as an objectID?
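Concretely, we're imagining something like this (continuing the sketch above; still just an illustration):

```python
# Same crawl output, but the URL doubles as the objectID, so re-indexing
# the same page always produces the same record ID, and Query Rules
# pinned to that ID keep working across crawls.
records = [
    {
        'objectID': page['url'],  # stable across crawls
        'url': page['url'],
        'title': page['title'],
        'content': page['content'],
    }
    for page in crawl_results
]
index.save_objects(records)
```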


Hi Amanda,

Indeed, the right approach is to set the objectID on your side, so that it doesn't change between crawls.

The objectID must be unique, so using the URL will work as long as you have one record per page. If your pages are too long and get split into several records, you can store the URL in a separate attribute so you can use distinct, and simply suffix the objectID: http://example.com/about.html-1, http://example.com/about.html-2.
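For the split-page case, that might look like this (a sketch with the Python client; the actual splitting is assumed to happen upstream in your crawler):

```python
from algoliasearch.search_client import SearchClient

client = SearchClient.create('YOUR_APP_ID', 'YOUR_ADMIN_API_KEY')
index = client.init_index('webpages')

url = 'http://example.com/about.html'
chunks = ['first part of the page...', 'second part of the page...']

# One record per chunk: suffix the objectID, and keep the plain URL in
# its own attribute so distinct can collapse the chunks at query time.
records = [
    {'objectID': f'{url}-{i + 1}', 'url': url, 'content': chunk}
    for i, chunk in enumerate(chunks)
]
index.save_objects(records)

# De-duplicate on the url attribute so a query returns one hit per page.
index.set_settings({'attributeForDistinct': 'url', 'distinct': True})
```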

If you go for URLs, please share your experience here afterwards 🙂

And thanks for the kind words about Query Rules, we are very proud of this feature!

Thanks @julienbourdeau for the info - as we're still experimenting and thinking things through, that's good to know.

What you could also do, I believe, is hash your URLs and use the hashes as objectIDs: http://www.google.com would become 738ddf35b3a85a7a6ba7b232bd3d5f1e4d284ad1, for example. That way you avoid the issue of overly long URLs. You can still store the URL in another attribute for readability.
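If it helps, here is a minimal sketch of that idea (the 40 hex characters in the example above match SHA-1's output length, so that's the algorithm I'm assuming):

```python
import hashlib

def url_to_object_id(url: str) -> str:
    """Derive a stable, fixed-length objectID from a page URL."""
    return hashlib.sha1(url.encode('utf-8')).hexdigest()

record = {
    'objectID': url_to_object_id('http://www.google.com'),
    'url': 'http://www.google.com',  # keep the readable URL for humans
}
```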

Yes, that's a good idea. The crawler framework we may be using fingerprints each request with a hash based on the URL, among other things, so perhaps we could use that.