How should I approach search indexing for a forum where post content must be searchable?

Hi guys, I need help with how to approach search indexing for a forum in which post content is searchable. The problem is that some of my threads have 500-800+ posts, and most of those posts contain a large amount of text (multiple paragraphs).

Currently, my data format stores posts as nested objects, which means a thread can contain 500-800+ nested objects. The problem is that whenever I sync a post (someone updated or added a post in a thread), I need to sync all of those 500-800+ nested objects in the thread. Am I doing this right? I hope someone can guide me toward a more optimized approach. I just want threads as results. Thanks.

Please look at the screenshot for more info about the data format I’m using.

Hello david2,

Thank you for posting on our Discourse. Your idea is a good approach. However, it hurts both the relevancy and the efficiency of your queries.

A better solution would be to split your record into multiple ones, one per paragraph, so that a single post maps to many records that all share the same unique ID for that post. Then, you can set this ID attribute as attributeForDistinct and use the Algolia distinct feature, which lets you search within those records with good relevancy. This is actually what we do internally with DocSearch.
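To make the splitting concrete, here is a minimal sketch in plain Python. It is not taken from DocSearch or from any Algolia client: the function name `split_post_into_records`, the field names (`thread_id`, `post_id`, `paragraph_index`, `content`), and the `objectID` scheme are all hypothetical and chosen only for illustration. The only Algolia-specific names are the `attributeForDistinct` and `distinct` settings mentioned above.

```python
import re


def split_post_into_records(thread_id, post_id, body):
    """Return one record per non-empty paragraph of a post.

    Field names and the objectID scheme are hypothetical; adapt them
    to your own data model.
    """
    # Treat one or more blank lines as a paragraph separator.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", body) if p.strip()]
    return [
        {
            # Derived from the post ID and paragraph position, so
            # re-indexing the same post overwrites the same records.
            "objectID": f"{post_id}-{i}",
            "thread_id": thread_id,
            # Shared by every paragraph record of this post; this is
            # the attribute used for deduplication.
            "post_id": post_id,
            "paragraph_index": i,
            "content": paragraph,
        }
        for i, paragraph in enumerate(paragraphs)
    ]


# Index settings for deduplication at query time. Assumption: to get
# one hit per thread rather than one per post, "thread_id" could be
# used as the distinct attribute instead.
DISTINCT_SETTINGS = {
    "attributeForDistinct": "post_id",
    "distinct": True,
}

records = split_post_into_records(
    "thread-42", "post-7", "First paragraph.\n\nSecond paragraph."
)
```

With records shaped like this, an edited post should only require replacing that post's own paragraph records instead of re-syncing the whole thread. If an edit removes paragraphs, the leftover records with higher `paragraph_index` values would also need to be deleted.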

I highly recommend reading this article from Julien, our CTO, about this specific use case: https://blog.algolia.com/inside-the-engine-part-7-better-relevance-via-dedup-at-query-time/

EDIT: I forgot to add a link to our guide regarding the distinct feature. Here it is: https://www.algolia.com/doc/guides/search/distinct/#guides

And feel free to tell me if you need more help. :)

Have a nice day,

Hi Anthony,

Thanks for the solution. I’ve tried it, and this is what I need.

Have a nice day,

David
