Indexation performance for timestamp attributes with facet

Hi there,

I’m working on a project where we are facing indexation performance issues, and I’m looking for ways to improve this.

Let’s say you store events in an Algolia index.
These events have several timestamp attributes (startAt, endAt, createdAt…).
These attributes are stored as integer values, as suggested in the documentation.

If you configure these attributes to be a facet, what is the impact on indexation performance?

It sounds natural that there is one, because such attributes can take a practically unlimited number of distinct values: it should be harder to index that kind of attribute than an enum attribute with only 100 distinct possible values.
But I would like to know if there is any documentation about the “nature” of attributes that should or shouldn’t be facets.
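To make the cardinality concern concrete, here is a rough sketch in plain Python (no Algolia calls) that counts distinct values per attribute on made-up records. The `status` attribute, its three values, and the 37-second spacing between events are assumptions chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sample data: 10,000 events, 37 seconds apart.
BASE = datetime(2024, 1, 1, tzinfo=timezone.utc)
STATUSES = ["draft", "published", "cancelled"]  # made-up enum values

events = [
    {
        "status": STATUSES[i % len(STATUSES)],
        # Unix timestamp stored as an integer, as in the question.
        "createdAt": int((BASE + timedelta(seconds=37 * i)).timestamp()),
    }
    for i in range(10_000)
]


def distinct_values(records, attribute):
    """Count how many different values an attribute takes across records."""
    return len({record[attribute] for record in records})


print("status:", distinct_values(events, "status"))
print("createdAt:", distinct_values(events, "createdAt"))
```

With second-precision timestamps, almost every record carries its own value, so a counted facet on such an attribute would have roughly as many facet values as there are records.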

Thanks for your help.

Hi, this seems to be a similar question to this one here?

What sort of indexing performance issues are you seeing? Do you have these attributes set as searchable? How often are you adding/updating records?

Without knowing your specific issue, here are a couple of articles on optimizing indexing that may help:

Actually, yes. That was me, but with a temp account… But as the post was blocked in moderation for 4 days, I decided to open another one with a “true” account.


Got it. Yeah, sorry for the slow moderation there – vacation season. The two versions of the question got slightly different answers. Hopefully there are some things of use between the two.

(I’m sorry if I put some pressure on you, that was not intended, I just wanted to be sure it wouldn’t fall into the forgotten realm of Akismet ^^)
(Anyway, let’s merge the discussion here, as this is the exact same topic)

In my case, records get created/updated very often (about 2M record operations per day).

My understanding of faceting is that there are 3 kinds of config:

  • facet: each search returns a count for each remaining value
  • searchable facet: same as above, plus you can run a text query against the list of remaining values
  • filter facet: no counts are available, but you can still filter on the attribute in your searches
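As a minimal sketch, the three modes above correspond to how an attribute is written in the `attributesForFaceting` index setting: a bare attribute name, `searchable(...)`, or `filterOnly(...)`. The helper `facet_entry` and the attribute names `category` and `organizer` are hypothetical; only the timestamp attributes come from this thread. The code only builds the settings payload and does not call the Algolia API.

```python
from typing import Literal

FacetMode = Literal["facet", "searchable", "filterOnly"]


def facet_entry(attribute: str, mode: FacetMode) -> str:
    """Build one attributesForFaceting entry (hypothetical helper)."""
    if mode == "facet":
        # Plain facet: filtering plus per-value counts.
        return attribute
    if mode in ("searchable", "filterOnly"):
        # Modifier syntax: searchable(attr) or filterOnly(attr).
        return f"{mode}({attribute})"
    raise ValueError(f"unknown facet mode: {mode!r}")


# Settings payload; "category" and "organizer" are made-up attributes.
settings = {
    "attributesForFaceting": [
        facet_entry("category", "facet"),
        facet_entry("organizer", "searchable"),
        facet_entry("startAt", "filterOnly"),
        facet_entry("endAt", "filterOnly"),
        facet_entry("createdAt", "filterOnly"),
    ]
}

print(settings)
```

The resulting dictionary is what you would pass to your API client's settings call; the exact method name depends on the client version you use.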

In the case of a date range picker, I believe I should go with option 3 (filter facet).
But if I decide to go for option 1 (facet), won’t Algolia take longer to index my documents?
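For the date range picker case, a search would only need a numeric comparison in the `filters` parameter, with no facet counts involved. Below is a rough sketch of building such a filter string from two dates; the helper name `date_range_filter` is hypothetical, and it assumes timestamps are stored as Unix seconds in UTC.

```python
from datetime import datetime, timezone


def date_range_filter(range_start: datetime, range_end: datetime) -> str:
    """Build a filter string for events fully inside the picked range.

    Hypothetical helper; assumes startAt/endAt hold Unix seconds (UTC).
    """
    if range_start.tzinfo is None or range_end.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    if range_end < range_start:
        raise ValueError("range_end must not be before range_start")
    lower = int(range_start.timestamp())
    upper = int(range_end.timestamp())
    return f"startAt >= {lower} AND endAt <= {upper}"


picked = date_range_filter(
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 2, tzinfo=timezone.utc),
)
print(picked)  # startAt >= 1704067200 AND endAt <= 1704153600
```

The string would then be sent as the `filters` search parameter alongside the query.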

Filter facet would definitely be the most efficient here. The other options add metadata that would put some drag on indexing for sure (I’m not sure of the exact numbers).

Do you have a CSE? With an index that large, it might make sense for you to have some more dedicated guidance.

It almost sounds like you’re trying to do log search, which isn’t a great fit for Algolia without tight scoping.

OK, so it confirms there is a performance impact when activating true facet mode on a field.

I don’t know if the company I work with has a CSE, but it would definitely help.

The scenario I used here is fictional; we are not recording logs in Algolia, but the data does change a lot.
And of course we are having performance issues, which is why I’m looking for levers on this topic.

Yup, in order to search over facets or display record counts, additional operations are needed at index time. Hence the other settings, which remove those operations if you don’t need them.
