I’m indexing long-form blog posts, somewhere in the range of 100–200 paragraphs each. Based on Algolia’s recommendation in their docs, I chunk each blog post into smaller records to meet Algolia’s record size constraint and to maintain good record relevancy. Each record also carries the other fields that represent the blog post, so those fields are effectively duplicated across my index. A given blog post may have 100 records associated with it, and the only difference between those records is the content of the blogExcerpt field. I then use a distinct query on the blogExcerpt field to show only one record at a time.
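For context, my chunking looks roughly like this (a simplified sketch; the chunkPost helper and the exact post shape are my own illustration, not Algolia code):

```javascript
// Split one blog post into per-paragraph records, duplicating the
// shared metadata into every record. Only blogExcerpt differs.
function chunkPost(post) {
  return post.paragraphs.map((paragraph, i) => ({
    objectID: `${post.id}-${i}`, // one record per paragraph
    blogPostID: post.id,         // shared identifier across the chunks
    title: post.title,           // duplicated metadata...
    author: post.author,
    publishedAt: post.publishedAt,
    blogExcerpt: paragraph,      // ...the only field that varies
  }));
}

const records = chunkPost({
  id: 'post-42',
  title: 'My Post',
  author: 'Jane',
  publishedAt: '2024-01-01',
  paragraphs: ['First paragraph.', 'Second paragraph.'],
});
// Two records sharing title/author, differing only in blogExcerpt.
```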
While this approach works, there are some issues that have me looking for a different solution. The main one: whenever I need to update any field other than blogExcerpt, the change propagates to every record that makes up the blog post, so my indexing operations are orders of magnitude higher than they would be if I were updating a single record. I rarely update the blogExcerpt field, but the other fields on the blog records change frequently.
What are the possible solutions to reduce indexing operations here? Partial updates don’t work because then I’m left with some records having different data than others.
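To illustrate the fan-out cost: even using the batch API with partial updates, one metadata change still becomes one operation per record (buildMetadataUpdates is my own illustration, not an Algolia client method):

```javascript
// Sketch of why one metadata change is expensive today: the update
// must be fanned out to every record belonging to the post.
function buildMetadataUpdates(recordIDs, changedFields) {
  return recordIDs.map((objectID) => ({
    action: 'partialUpdateObject',
    body: { objectID, ...changedFields },
  }));
}

// A post chunked into 100 records needs 100 operations for one change.
const ids = Array.from({ length: 100 }, (_, i) => `post-42-${i}`);
const ops = buildMetadataUpdates(ids, { title: 'New Title' });
```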
My idea is to create two indices. The first index would hold all of the blogExcerpt records, each with a shared identifier. The second index would hold the remaining, more frequently updated fields of the blog post, keyed by that same shared identifier, one record per set of excerpt records.
Is it possible to join the 100 blog excerpt records with the 1 other data record at search time, as described above, using the InstantSearch libraries? If so, could someone point me in the direction of some literature/documentation describing this process, or some examples?
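If there’s no built-in join, my fallback would be merging the two indices client-side at search time, something like the sketch below (mergeHits is my own hypothetical helper, not an InstantSearch API; it assumes I’ve already fetched the metadata records keyed by the shared blogPostID):

```javascript
// Client-side join sketch: enrich excerpt hits from the first index
// with the metadata record from the second index, matched on the
// shared blogPostID.
function mergeHits(excerptHits, metadataByPostID) {
  return excerptHits.map((hit) => ({
    ...metadataByPostID[hit.blogPostID], // shared fields from index 2
    ...hit,                              // excerpt-specific fields win
  }));
}

const excerptHits = [
  { objectID: 'post-42-7', blogPostID: 'post-42', blogExcerpt: 'matched text' },
];
const metadataByPostID = {
  'post-42': { blogPostID: 'post-42', title: 'My Post', author: 'Jane' },
};
const merged = mergeHits(excerptHits, metadataByPostID);
// Each merged hit carries title, author, and blogExcerpt together.
```

The obvious downside is an extra round trip (or a multi-query) per search, which is part of why I’m asking whether there’s a supported pattern for this.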
Thanks. Let me know if I’m thinking about this completely wrong.