I have a website where users can create small pieces of user-generated content on a variety of topics. The content consists of news items on topics such as politics, sports and business, which other users can then search for. Every piece of content has 4 attributes
5 createdAt - (time)
It is quite possible that more than one user creates content on the same or a similar news story. In that case the “title” and “summary” attributes of these stories would be similar but not necessarily identical. For example, one user could create a news item with the title “Trump says global warming is a hoax” and another user could create a similar news item with the title “No global warming - Trump”.
I need a mechanism to identify such similar content by comparing the “title” or “summary” attributes of news items, so that similar items can either be removed from the index or grouped together. Ideally, every time a user creates a new piece of content, I would check my Algolia index and identify all similar news items already in it. I would then review these similar items together and decide whether to keep the new content in the index or remove it.
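To make the requirement concrete, here is a minimal Python sketch of the kind of comparison I have in mind. It is not an Algolia feature and does not call the Algolia API; it only assumes the candidate titles have already been fetched from the index. The function names, the stop-word list and the 0.5 threshold are all hypothetical choices for illustration, and word-overlap (Jaccard) similarity is just one possible measure.

```python
import re

# Hypothetical stop-word list; a real one would be larger.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "on", "in", "to", "no"}


def tokenize(text):
    """Lowercase the text and return its set of non-stop-word tokens."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOP_WORDS}


def jaccard(a, b):
    """Word-overlap similarity between two strings, from 0.0 to 1.0."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def find_similar(new_title, existing_titles, threshold=0.5):
    """Return (title, score) pairs at or above the threshold, best first."""
    scored = [(title, jaccard(new_title, title)) for title in existing_titles]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(title, score) for title, score in scored if score >= threshold]


existing = [
    "No global warming - Trump",
    "Stock markets rally after rate cut",
]
matches = find_similar("Trump says global warming is a hoax", existing)
for title, score in matches:
    print(f"{score:.2f}  {title}")
```

With the two example titles above, the first existing title shares the words "trump", "global" and "warming" with the new one, so it would be flagged for review, while the unrelated title would not.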
I would appreciate some guidance on how to enable this on my Algolia index.