[Deduplication] Detect if hit was grouped by Algolia

I have configured Algolia’s distinct feature. It now returns only the single most relevant result (hit) per group, which is good. For my UI I’m using InstantSearch.js, and I’d like to know whether the results (hits) I’m looping through have been deduplicated or not.

I’m looking for a property like isDistinct or similar in my hit object, but wasn’t able to find anything like that. Is there a way to detect whether my hit was grouped by Algolia?

Thanks for your suggestions!

Hi @aarongerig,

Welcome to the Algolia community and I apologize for the delay!

There is no way to tell if the hit has been deduplicated because the hits response only contains data from the records found in the index, regardless of how they were found.

If you have distinct > 1, you will see a _distinctSeqID field in the response that tells you the order of the distinct objects found in the response.

You can also get other information about how things were ranked by setting the getRankingInfo parameter to true, but this won’t include any information about objects that weren’t returned, only about how the current object was ranked.

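// searchClient is assumed to have been created beforehand.
const search = instantsearch({
  indexName: 'test-distinct',
  searchClient,
  searchParameters: {
    // Include ranking details with every hit.
    getRankingInfo: true,
    // Return up to 2 records per distinct group.
    distinct: 2,
  },
});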

I have created a CodeSandbox that demonstrates getRankingInfo and the _distinctSeqID field. Check the console to see the results returned for each hit.
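
As a rough client-side sketch of how distinct: 2 could be used to spot grouped hits: when a second record comes back for the same distinct value, the first hit of that group would have been deduplicated under distinct: true. The helper name markGroupedHits, the isGrouped flag, and the sku attribute are all made up for illustration; substitute the attribute you configured as attributeForDistinct. This only sees records that matched the query, so treat it as an approximation rather than an official API.

```javascript
// Hypothetical helper, not part of Algolia or InstantSearch.
// Assumes the query ran with distinct: 2 and that `distinctAttribute`
// is the attribute configured as attributeForDistinct.
function markGroupedHits(hits, distinctAttribute) {
  // Count how many returned hits share each distinct value.
  const counts = new Map();
  for (const hit of hits) {
    const key = hit[distinctAttribute];
    counts.set(key, (counts.get(key) || 0) + 1);
  }

  // Keep only the first hit of each group, in the order returned,
  // and flag it when the group contained more than one hit.
  const seen = new Set();
  const result = [];
  for (const hit of hits) {
    const key = hit[distinctAttribute];
    if (seen.has(key)) continue;
    seen.add(key);
    result.push({ ...hit, isGrouped: counts.get(key) > 1 });
  }
  return result;
}

// Made-up sample data shaped like a distinct: 2 response.
const sampleHits = [
  { objectID: 'a1', sku: 'shirt', _distinctSeqID: 0 },
  { objectID: 'a2', sku: 'shirt', _distinctSeqID: 1 },
  { objectID: 'b1', sku: 'hat', _distinctSeqID: 0 },
];
console.log(markGroupedHits(sampleHits, 'sku').map((h) => [h.objectID, h.isGrouped]));
// → [ [ 'a1', true ], [ 'b1', false ] ]
```

The UI still shows one hit per group, but each one now carries a flag you can use to render it differently.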

Hi @cindy.cullen,

Thanks so much for your response!

Unfortunately this is not exactly what I’m looking for. My use case is the following: Deduplication happens based on an attribute in the index. If the value of this attribute matches on other hits as well, they get grouped. So far so good. Now I’d like to display those hits a bit differently than the ones where no deduplication took place. Too bad Algolia doesn’t provide that info.

I appreciate your effort though. :slight_smile:

BTW: I’m using distinct: true

Hi @aarongerig,

Thanks for the details of your use case!

I understand and will pass your feedback and this use case to our team. I can understand where that information might be helpful.

Thanks for using Algolia!

Hi @cindy.cullen,

Is it possible to remove duplicates or - better yet - not allow them to be added?

Hi Ken,

Our “distinct” parameter allows the de-duplication of search results.

Is this what you were looking for?

-Kevin

Yes, I am using “distinct” and it prevents duplicates from appearing in the search results which is great, but I still have a problem:

Say two users each add an item - a link - that counts as a duplicate in our main index. It’s the same link they both want to annotate. Each has added it independently and each works on it, but in the end only one will appear in our results. We would need to tell them that it’s actually a duplicate.

Or, we may remove an item from the index thinking it is gone, but its duplicates will remain. Basically, we want to prevent duplicates or remove them, and the “distinct” param seems to have the potential to help, but how?

Hi Ken,

I understand now, thanks for that clarification. Your question looks similar to the one from Aaron at the top of this thread. Unfortunately, as Cindy mentioned in her response, there is no way to tell at query time whether a hit has been deduplicated. I also do not believe there is any functionality in the Algolia dashboard to see a list of deduplicated records. However, I am going to follow up on this as a feature request (or double down on Cindy’s previous request), as I agree it could be useful.
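
In the meantime, one possible approach is to check for duplicates in your own code before or after indexing. The sketch below is only an illustration under assumptions: normalizeLink and findDuplicateGroups are hypothetical helpers, the link attribute name is made up, and the records are assumed to come from your own data source or an export of the index. What counts as "the same link" is your call; this version ignores fragments, trailing slashes, and host casing.

```javascript
// Hypothetical helper: reduce a link to a canonical form so that
// trivially different spellings of the same URL compare as equal.
function normalizeLink(link) {
  const url = new URL(link);                       // also lowercases scheme and host
  const path = url.pathname.replace(/\/+$/, '');   // drop trailing slashes
  return `${url.protocol}//${url.host}${path}${url.search}`; // fragment is omitted
}

// Hypothetical helper: group records by normalized link and return only
// the groups containing more than one record, as [link, objectIDs] pairs.
function findDuplicateGroups(records, linkAttribute) {
  const groups = new Map();
  for (const record of records) {
    const key = normalizeLink(record[linkAttribute]);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(record.objectID);
  }
  return [...groups.entries()].filter(([, ids]) => ids.length > 1);
}

// Made-up sample records.
const records = [
  { objectID: '1', link: 'https://Example.com/page/' },
  { objectID: '2', link: 'https://example.com/page#top' },
  { objectID: '3', link: 'https://example.com/other' },
];
console.log(findDuplicateGroups(records, 'link'));
// → [ [ 'https://example.com/page', [ '1', '2' ] ] ]
```

With a list like this you could tell both users that their items are duplicates, or delete every objectID in a group when removing an item. Another option might be to use the normalized link as the objectID, since saving a record with an existing objectID replaces it instead of adding a second one, though that would overwrite the other user's record.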

Thanks!
