Rank based on frequency

Hey! I’m using Algolia in my Bubble app.

I want to search through a bunch of keywords/tags that users may have linked to their account.

This can be cities, colors, numbers, etc.

User #1: Red, Blue, Chartreuse, Red, Blue, Gray, Gray, Blue, Blue, Purple, Purple, Purple, One, Two, One, Three, Four, Five, One, Two, One, Two, New York, New York, New York, New York, Tucson, San Francisco, San Francisco

User #2: Red, Red, Red, Blue, Green, Green, Green, Green, Purple, Purple, Purple, One, One, One, Two, Two, Three, Four, Four, Four, Five, Six, Six, One, One, Three, New York, New York, New York, New Mexico, New Mexico, New Mexico, Tucson, Tucson, Greenville, Greenville, Greenville, Greenville, San Diego, San Diego, San Francisco, Tucson, Tucson, Tucson

You can see that user #2 has MORE Tucson tags.

However, when I search for the ‘Tucson’ query, User #1 is ranked at top.

I’m struggling with understanding how to refine my ‘exact’ parameter. I’ve adjusted and tested multiple times… Not sure why it’s not working.

Thanks in advance!

I suppose before anything I should be asking the question: can this be done natively through the dashboard or will this require something custom?

Yup - officially going insane. Seems VERY simple but it is not working.


As you can see, using what I’ve set above, when I query ‘Tucson’, user #1 is on top:

[Screenshot: search results for ‘Tucson’]


Please help this make sense!

Dang, I really thought the Algolia team was more responsive as indicated by ‘this forum is monitored during business hours’ lol

Hi Cameron –

I meant to post a message here to tell you, but I actually dug into this on our livestream today.

The short answer: I couldn’t find an easy way to do frequency-based ranking in Algolia.

Algolia search is built for fast, accurate, prefix-first text search, so this use case is a little outside our wheelhouse.

Algolia’s algorithm was actually built in response to the frequency-based rankings of earlier search engines, like TF-IDF. We’ve found frequency to be an inaccurate ranking criterion for the kinds of consumer-facing text search we focus on. There’s a pretty comprehensive blog post about it here.

You can find more information in the documentation around the Exact criterion as well:

One caveat to keep in mind is that exact words are only counted once per record. If the same word has 10 exact matches in a record, whether it’s in the same attribute or not, that word of the query still gets only 1 point. This means exact is always a number between 0 and the number of words in the query.
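To make that caveat concrete, here is a toy model of the counting rule in plain Python. It is only a sketch of the behavior described above, not Algolia's actual implementation; the function name `exact_score` and the abbreviated tag strings are made up for illustration.

```python
def exact_score(query: str, record_text: str) -> int:
    """Toy model: each query word earns at most 1 point per record,
    no matter how many times it appears in that record."""
    # Collapse the record into a set of distinct lowercase words.
    record_words = set(record_text.lower().replace(",", " ").split())
    query_words = query.lower().split()
    return sum(1 for word in query_words if word in record_words)


# Abbreviated tag strings (hypothetical, shortened from the thread).
user1_tags = "Red, Blue, New York, Tucson, San Francisco"
user2_tags = "Tucson, Tucson, Greenville, San Diego, Tucson, Tucson, Tucson"

# One Tucson tag or five, the single query word scores 1 either way.
print(exact_score("Tucson", user1_tags))
print(exact_score("Tucson", user2_tags))
```

Under this rule both users tie on the exact criterion for ‘Tucson’, so the extra tags on User #2 can't push that record higher.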

On the livestream I played with pre-calculating values for the tags, so the record would look more like:

{
  "user": "Bob",
  "objectID": "2",
  "unique_tags": "Red, Blue, Green, Purple, One, Two, Three, Four, Five, Six, New York, New Mexico, Tucson, Greenville, San Diego, San Francisco",
  "tag_sums": {
    "Red": 3,
    "Blue": 1,
    "Green": 4,
    "Purple": 3,
    "One": 5,
    "Two": 2,
    "Three": 2,
    "Four": 3,
    "Five": 1,
    "Six": 2,
    "New York": 3,
    "New Mexico": 3,
    "Tucson": 5,
    "Greenville": 4,
    "San Diego": 2,
    "San Francisco": 1
  }
}

You could then use unique_tags as a searchable attribute and each of the tag_sums as a ranking/sorting attribute. Obviously, this only makes sense with a fixed set of tags, since each new tag would need to be added to the index configuration.
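If you want to generate records in that shape before indexing, something like the following would work. This is a rough sketch in plain Python, assuming the tags arrive as one comma-separated string; `build_record` is a hypothetical helper, not part of any Algolia client, and the sample string is User #2's tag list from the original post.

```python
from collections import Counter


def build_record(user: str, object_id: str, raw_tags: str) -> dict:
    """Pre-calculate per-tag counts from a comma-separated tag string."""
    tags = [t.strip() for t in raw_tags.split(",") if t.strip()]
    counts = Counter(tags)
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    unique = list(dict.fromkeys(tags))
    return {
        "user": user,
        "objectID": object_id,
        "unique_tags": ", ".join(unique),
        "tag_sums": dict(counts),
    }


user2_tags = (
    "Red, Red, Red, Blue, Green, Green, Green, Green, Purple, Purple, Purple, "
    "One, One, One, Two, Two, Three, Four, Four, Four, Five, Six, Six, One, One, "
    "Three, New York, New York, New York, New Mexico, New Mexico, New Mexico, "
    "Tucson, Tucson, Greenville, Greenville, Greenville, Greenville, San Diego, "
    "San Diego, San Francisco, Tucson, Tucson, Tucson"
)

record = build_record("Bob", "2", user2_tags)
print(record["tag_sums"]["Tucson"])
```

Counting this way also merges tags that show up in separate runs of the list (One and Three here), so each tag ends up with a single total.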

For handling frequency-based ranking in unstructured tag strings you’ll probably want a different tool.
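For a sense of what that would involve, here is a minimal sketch of frequency ranking done outside Algolia in plain Python. It assumes the tag strings are small enough to count in memory; `rank_by_tag_frequency` is a hypothetical name and the tag strings are abbreviated from the thread, keeping the Tucson counts (1 and 5).

```python
def rank_by_tag_frequency(users: dict, query: str) -> list:
    """Order user names by how often the query appears as a whole tag."""
    wanted = query.strip().lower()

    def count(tags: str) -> int:
        return sum(1 for t in tags.split(",") if t.strip().lower() == wanted)

    # Highest count first; sorted() is stable, so ties keep input order.
    return sorted(users, key=lambda name: count(users[name]), reverse=True)


# Abbreviated tag strings (hypothetical, shortened from the thread).
users = {
    "User #1": "Red, Blue, New York, Tucson, San Francisco",
    "User #2": "Tucson, Tucson, Greenville, San Diego, Tucson, Tucson, Tucson",
}

ranking = rank_by_tag_frequency(users, "Tucson")
print(ranking)
```

This matches whole tags only, so it gives up the prefix and typo tolerance you would get from a search engine.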
