How can I use search results to improve data quality?

Hi everyone! I’ve been looking at the docs and the Analytics API and dashboard, but I’m unsure how to go about this.

My company generates data (descriptions and keywords) that is fed to Algolia. We use ChatGPT and other models to generate this data, but sometimes it’s a bit misleading. I’ve been thinking about how to get Algolia to show me which records need manual refinement, but I haven’t been able to come up with anything that tells me ‘Product X has faulty data’.

One idea (suggestions welcome!) is to record each record’s position in search results and prioritize records like this:

  1. If a record shows up ‘highly ranked’ but is rarely clicked, it may be a sign that the data is not useful and is creating noise
  2. If a record shows up ‘lowly ranked’ but is often clicked, it may be a sign that the data needs improving to better match the queries

How would you go about this? I was thinking of recording every query together with its results (objectID + position) so that I can prioritize specific records for curation.
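To make the two rules above concrete, here is a minimal Python sketch of what I have in mind. It assumes I already log one event per impression as (objectID, position, clicked); the thresholds, the `RecordStats` class and the function names are all made up for illustration and are not part of any Algolia API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RecordStats:
    impressions: int = 0   # times the record appeared in results
    position_sum: int = 0  # sum of 1-based positions, used for the average
    clicks: int = 0

    @property
    def avg_position(self) -> float:
        return self.position_sum / self.impressions

    @property
    def ctr(self) -> float:
        return self.clicks / self.impressions


def aggregate(events):
    """events: iterable of (object_id, position, clicked), one per impression."""
    stats = defaultdict(RecordStats)
    for object_id, position, clicked in events:
        s = stats[object_id]
        s.impressions += 1
        s.position_sum += position
        s.clicks += int(clicked)
    return stats


def flag_for_curation(stats, min_impressions=3, top_position=3.0,
                      low_ctr=0.1, high_ctr=0.3):
    """Apply the two rules; all thresholds are arbitrary (assumed) defaults."""
    flagged = {}
    for object_id, s in stats.items():
        if s.impressions < min_impressions:
            continue  # not enough data to judge this record
        if s.avg_position <= top_position and s.ctr <= low_ctr:
            flagged[object_id] = "ranked high, rarely clicked"
        elif s.avg_position > top_position and s.ctr >= high_ctr:
            flagged[object_id] = "ranked low, often clicked"
    return flagged


# Hypothetical logged events: (objectID, position, clicked)
events = [
    ("product-a", 1, False), ("product-a", 2, False), ("product-a", 1, False),
    ("product-b", 8, True), ("product-b", 9, True), ("product-b", 7, False),
    ("product-c", 1, True), ("product-c", 2, True), ("product-c", 1, False),
]
flagged = flag_for_curation(aggregate(events))
```

With this sample, product-a and product-b get flagged (one per rule) and product-c does not, since it ranks high and is clicked often.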


Well, it depends. My company had a similar problem, though ours is more language-related, since everything is in Serbian.
We often had the same situation: a user searches for a product, but the product is rarely clicked, which suggests the data is not useful. One solution is to enable Dynamic Re-Ranking, since it analyzes clicks and popularity: the more clicks a record gets, the better it ranks. It worked for us.
You can also look at the click position (visible on your dashboard) and see how it performs. The lower the number, the better, since it means users click results closer to the top.
Also, review your searchable attributes and their order; you could move the attribute with the faulty data lower in the list so it carries less weight.
I don’t think Algolia can directly indicate which records need manual refinement, but you can look at the analytics and draw conclusions from them. Does this apply to your case?
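As a rough illustration of the searchable-attributes suggestion, here is a small Python sketch that moves one attribute to the end of the `searchableAttributes` list, where it has the lowest priority. The attribute names and the `demote_attribute` helper are hypothetical, and the sketch only builds the settings payload; applying it to the index is left to whichever API client you use. It matches entries by exact string, so modifiers such as `unordered(...)` would need extra handling.

```python
def demote_attribute(searchable_attributes, attribute):
    """Return a new list with `attribute` moved to the lowest-priority slot."""
    kept = [a for a in searchable_attributes if a != attribute]
    if len(kept) == len(searchable_attributes):
        raise ValueError(f"{attribute!r} is not in the searchable attributes")
    return kept + [attribute]


# Hypothetical current order: earlier entries have higher priority.
current = ["name", "generated_keywords", "generated_description", "brand"]

# Settings payload with the noisy attribute pushed to the end.
settings = {
    "searchableAttributes": demote_attribute(current, "generated_keywords"),
}
```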

Makes sense. What I ended up doing was first looking at the regular dashboard for low click-through and conversion rates to find opportunities to improve. Next, I’ll use these endpoints to compare top searches and top hits per search (Analytics API | Algolia Docs) to see how records rank.
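For the first step, this is roughly how I plan to shortlist searches once I have the top-searches data exported. It is only a sketch: the field names (`search`, `count`, `clickThroughRate`) are my assumption about the shape of the data and should be checked against the Analytics API reference, and the thresholds and the `searches_to_review` helper are made up.

```python
def searches_to_review(top_searches, min_count=50, max_ctr=0.05):
    """Keep frequent searches with a low click-through rate, busiest first.

    top_searches: list of dicts with assumed keys
    'search', 'count' and 'clickThroughRate'.
    """
    candidates = [
        s for s in top_searches
        if s["count"] >= min_count
        and s.get("clickThroughRate") is not None
        and s["clickThroughRate"] <= max_ctr
    ]
    return sorted(candidates, key=lambda s: s["count"], reverse=True)


# Made-up sample data, not real analytics output.
sample = [
    {"search": "wool scarf", "count": 90, "clickThroughRate": 0.04},
    {"search": "red shoes", "count": 400, "clickThroughRate": 0.02},
    {"search": "blue hat", "count": 120, "clickThroughRate": 0.35},
    {"search": "rare item", "count": 5, "clickThroughRate": 0.0},
]
review = [s["search"] for s in searches_to_review(sample)]
```

The searches that come out of this are the ones whose top hits I would then inspect per search to see which records rank for them.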
