As far as I can tell, the ranking & sorting tie-breaking algorithm sorts results either by relevance or by time (date of publication in our use-case) but can’t do both. Ideally, I’d like to exponentially decay results so that they’re ordered by relevance, but that relevance score is weighted by the date of publication (so that highly relevant older results appear below mostly relevant newer results). How can we achieve this?
Thank you for reaching out!
If I understand correctly, you can achieve what you are looking for with Custom Ranking: https://www.algolia.com/doc/guides/managing-results/must-do/custom-ranking/
If you add a date-related attribute to your Custom Ranking, the Custom Ranking layer lets the Ranking Formula tie-break records that have the same relevance.
As for computing a single score that aggregates relevance scores with the date of publication, this is not currently possible, and we advise using Custom Ranking in order to benefit from the full power of the tie-breaking ranking formula.
Please let us know if it helps.
Thanks for your prompt response. The issue I see with custom ranking is that it is a static value associated with each record. A custom ranking value that would address our use case would be something like “Freshness”. For example, we could create bands (categories) for this custom metric which would say whether the article was 1 day old, 2-7 days old, 8-30 days old, etc. However, since time is not static, these static values would need to be updated every day, meaning we’d need to reindex our articles index every day. That doesn’t seem like a good solution.
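For illustration, the banding described above could be computed roughly like this. It is only a sketch: the function name and the band values are hypothetical, and the boundaries are the ones from the example (1 day, 2-7 days, 8-30 days, older).

```python
from datetime import date


def freshness_band(published: date, today: date) -> int:
    """Return a hypothetical freshness band; a higher value means a fresher article."""
    age_days = (today - published).days
    if age_days <= 1:
        return 3  # about 1 day old
    if age_days <= 7:
        return 2  # 2-7 days old
    if age_days <= 30:
        return 1  # 8-30 days old
    return 0      # older than 30 days
```

Because the result depends on `today`, a band stored on the record goes stale as soon as the article crosses a boundary, which is why the stored values would have to be refreshed.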
I guess what I was envisioning was a way to modify the relevance score, at the time of search, such that there’s no such thing as a tie that needs to be broken. That is, there would be no two identical relevance scores since no two pieces would have been published at exactly the same time. Our current problem is the following: if we choose attribute as our first tie-breaker, then we get highly relevant but older articles as the first results. If we choose date_published as the first tie-breaker, then we get somewhat relevant but very recent pieces as the top results.
What we’re trying to achieve is some sort of balance, and I thought there might be something akin to Elasticsearch’s function score query.
Although we don’t have a solution to compute a single score that would be the aggregation of some relevance scores with some date of publication, we do have filter scoring.
Could this help in your use case to filter the records based on a score? Your attributes would still need to be static, but you could dynamically create the scores to achieve the balance you desire.
If you don’t want to completely filter out some records, you may want to use the optional filters scoring.
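As a rough sketch of how scores could be created dynamically at query time while the record attribute stays static: the example below assumes each record stores its publication day in a facet attribute (here called `published_day`, a hypothetical name) and builds an `optionalFilters` list using the `<score=N>` syntax, with more recent days given higher integer scores. The helper name and the linear scoring are assumptions for illustration, not a recommended configuration.

```python
from datetime import date, timedelta


def recency_optional_filters(
    today: date, days: int = 7, attribute: str = "published_day"
) -> list[str]:
    """Build one optional filter per recent day; newer days get higher scores."""
    filters = []
    for age in range(days):
        day = today - timedelta(days=age)
        score = days - age  # integer score, highest for today
        filters.append(f"{attribute}:{day.isoformat()}<score={score}>")
    return filters


# Search parameters to pass along with the query; only the filter list is built here.
search_params = {
    "optionalFilters": recency_optional_filters(date(2024, 5, 10), days=3)
}
```

Since the filters are generated from the current date on every search, nothing in the index has to change as articles age. Records older than the window simply match no optional filter and are not boosted.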
Thanks for this. Yes, I think optional filters could work. We’ll investigate.
“exponentially decay results so that they’re ordered by relevance, but that relevance score is weighted by the date of publication (so that highly relevant older results appear below mostly relevant newer results)”
This is something I’ve implemented with Algolia in the past, but it isn’t doable at query time. You need to compute a score that you frequently refresh (once per day or week), and the formula can look like the following:
score = popularity / ((1 + nb_days) ** gravity)
- **: the power operator (the formula uses Python syntax)
- popularity: an integer relevance metric
- nb_days: age of the item expressed in number of days from today
- gravity: a value higher than 1 that will define how fast an item’s score should decay (suggestion: start with something around 1.8).
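A minimal runnable sketch of the formula above, assuming the publication date is available as a `date` and that the score is recomputed by a periodic job (the function name is hypothetical):

```python
from datetime import date


def decayed_score(
    popularity: int, published: date, today: date, gravity: float = 1.8
) -> float:
    """score = popularity / ((1 + nb_days) ** gravity)"""
    # Age in days; clamp at 0 in case the publication date is in the future.
    nb_days = max((today - published).days, 0)
    return popularity / ((1 + nb_days) ** gravity)
```

With the default gravity, an item published today keeps its full popularity as its score, and the score drops quickly over the first few days. The resulting value would be stored on each record and used in the Custom Ranking.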
Hope this helps!
Thanks very much. This is indeed very close to what I was hoping to do but I feel reindexing our articles index each day to update the score will be too costly.
I think we will experiment with some optional filters for the time being but I appreciate the follow up here.
Sounds good, let us know how it goes!