Inconsistent results against a static data set - why?

Hi there, we are in the process of testing to work out what we need. Rather worryingly, now that we have loaded a data set more akin to our real data, we are getting inconsistent results that we didn't see at lower volumes. If I add a space and remove it again, or add a word and remove it again, I'm seeing different result counts, some higher and some lower than before. See the screenshots below. Can you advise, please?


Hi @rob.severs,
That's a good question. There are two reasons why the number of hits returned may differ when performing the query several times:

1. Presence (or not) of a white space at the end of the search query: "bmw 3 series leather seat" VS "bmw 3 series leather seat "
By default, the last query word is always treated as a prefix. In other words, the query word seat may match records containing the word seat (an exact match), but also records containing seats or seattle. As soon as a space is added to the end of the search query, the engine considers that the user is done with that word and stops treating it as a potential prefix. This results in fewer matching results.

Note: You can change this default behaviour with the queryType index setting.
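To make the trailing-space effect concrete, here is a minimal, simplified model of prefix matching. It is only a sketch: real matching also involves typo tolerance, synonyms and more. The function names and sample records are hypothetical, and the `query_type` argument merely mirrors the queryType values `prefixLast` (the default), `prefixAll` and `prefixNone`.

```python
# Simplified model of prefix matching. Function and data names are hypothetical;
# real engine matching also involves typo tolerance, synonyms, etc.

def record_matches(record_words, query, query_type="prefixLast"):
    """Return True if every query word matches at least one record word."""
    words = query.split()
    trailing_space = query.endswith(" ")
    for i, query_word in enumerate(words):
        is_last = i == len(words) - 1
        # Decide whether this query word is treated as a prefix.
        as_prefix = query_type == "prefixAll" or (
            query_type == "prefixLast" and is_last and not trailing_space
        )
        if as_prefix:
            found = any(w.startswith(query_word) for w in record_words)
        else:
            found = query_word in record_words
        if not found:
            return False
    return True


def count_hits(records, query, query_type="prefixLast"):
    """Count records matching the query under the given query type."""
    return sum(record_matches(r, query, query_type) for r in records)


RECORDS = [
    ["bmw", "3", "series", "leather", "seat"],
    ["bmw", "3", "series", "leather", "seats"],
    ["bmw", "3", "series", "leather", "seattle"],
]

print(count_hits(RECORDS, "bmw 3 series leather seat"))   # 3
print(count_hits(RECORDS, "bmw 3 series leather seat "))  # 1
```

Without the trailing space, "seat" acts as a prefix and matches all three records; with it, only the exact word "seat" matches.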

2. Search engine design and behaviour
By design, and in order to return results in just a few milliseconds, our engine has a timeout mechanism that prevents search queries from spending too much time finding all possible hits when the dataset is large.
This has no impact on the results themselves or their ranking, but it has a small downside for the nbHits value returned: it may be exhaustive, or it may be an approximation if the timeout logic is triggered.
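The sketch below illustrates, in a hypothetical way, why a time-limited count can be approximate. It is not the engine's actual estimation algorithm: it swaps wall-clock time for a fixed "work budget" (a number of records examined) so the example is deterministic, and it extrapolates linearly from the records it managed to examine. All names are illustrative.

```python
# Hypothetical illustration only: NOT the engine's actual algorithm.
# A fixed work budget stands in for the time limit.

def budgeted_count(records, predicate, budget):
    """Count matches, examining at most `budget` records.

    Returns (count, exhaustive). When the budget runs out before all
    records are examined, the count is extrapolated and exhaustive is False.
    """
    examined = matched = 0
    for record in records:
        if examined == budget:
            break
        examined += 1
        if predicate(record):
            matched += 1
    exhaustive = examined == len(records)
    if exhaustive or examined == 0:
        return matched, exhaustive
    # Scale the partial count up to the full dataset size.
    return round(matched / examined * len(records)), False


def is_hit(record):
    return record % 3 == 0  # 34 true hits in range(100)


records = list(range(100))
print(budgeted_count(records, is_hit, budget=200))  # (34, True)
print(budgeted_count(records, is_hit, budget=10))   # (40, False)
print(budgeted_count(records, is_hit, budget=50))   # (34, False)
```

The same query over the same data yields different counts depending on how much work fits in the budget, which mirrors how nbHits can vary between identical requests.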

In addition to the hits, the JSON response returned for each search query contains the following two attributes:

  • "exhaustiveFacetsCount":false
  • "exhaustiveNbHits":false

When set to false, they mean that nbHits or the facet counts have been estimated and cannot be considered exhaustive.

As the engine doesn't cache any search results, when the same search query is sent twice in a row to Algolia, it's entirely possible that the hardware processes more or fewer results in the same amount of time, resulting in variations in nbHits.

=> As a best practice, we recommend not displaying the number of hits when "exhaustiveNbHits" is false, and not displaying facet counts when "exhaustiveFacetsCount" is false.
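A minimal sketch of that recommendation on the display side, assuming the search response has already been parsed into a Python dict using the field names quoted above (`nbHits`, `exhaustiveNbHits`, `exhaustiveFacetsCount`, `facets`). The helper names are hypothetical, and a missing flag is conservatively treated as "not exhaustive".

```python
# Hypothetical display helpers; assumes an already-parsed JSON response dict.

def hits_label(response):
    """Return a hit-count label only when the count is reported as exhaustive."""
    if response.get("exhaustiveNbHits", False):
        return f"{response['nbHits']} results"
    return None  # caller shows no count at all


def facet_labels(response, facet):
    """Return facet value labels, with counts only when they are exhaustive."""
    values = response.get("facets", {}).get(facet, {})
    if response.get("exhaustiveFacetsCount", False):
        return [f"{value} ({count})" for value, count in values.items()]
    return list(values)  # facet values without (estimated) counts


response = {
    "nbHits": 1234,
    "exhaustiveNbHits": False,
    "exhaustiveFacetsCount": True,
    "facets": {"brand": {"BMW": 10, "Audi": 5}},
}
print(hits_label(response))             # None
print(facet_labels(response, "brand"))  # ['BMW (10)', 'Audi (5)']
```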


@alex Hi, would a dedicated enterprise instance give the counts a better chance of completing within the time limit?

Hi @steve.flitcroft,
Moving to a dedicated enterprise instance will certainly give you better performance at both indexing and query time, as well as additional features such as Personalisation, a dedicated Solutions Engineer, …

We'd need to run some tests on your dataset to know more precisely what the gain would be.
Would you like to set up a call later this week to discuss it?

@alex Happy to have a chat on Thursday if that suits. By the way, I have been dealing with Ben from the sales/demo perspective.

Hi @steve.flitcroft,
I understand that you're in touch with my colleagues @olance and @benjamin.digne.
I’ve briefed them on the exchanges we had, and they will follow up with you.