Data structure for properties in Algolia - Long documents, large properties

I’ve been reading through Algolia’s size limits and its guidance on indexing large documents, and I have a question about the best way to structure properties that might contain a long string of text.

For the purpose of semantics, what I refer to as a “property” would be interchangeable with what Algolia calls a record’s “attribute”.

We have run into property (attribute) KB limits before and are trying to avoid them.

My question relates to the following two data structures, either of which might hold a property with a long set of comma-delimited strings:

{
    property: ["value1", "value2", "value3", "value4"]
}

versus, something that breaks this up into multiple sub properties:

{
    property: {
        subprop1: "value1",
        subprop2: "value2",
        subprop3: "value3",
        subprop4: "value4"
    }
}

  1. Given the two structures above, does Algolia’s record limit see them as different sizes, with one accommodating more properties than the other?
  2. Is there a benefit to one over the other in terms of hitting record limits? Or are they essentially the same to Algolia in terms of these limits?
  3. If you had to have a record with a property that might be a very long list of keywords/tags, would the different data structures above help prevent hitting record limits?
  4. If not, how could you accommodate a long list of keywords/tags as a searchable attribute so as not to hit record limits?
    (Outside of the obvious “limit the number of properties and their character lengths”, which poses the challenge of knowing the right “size” to limit Algolia records to.)

Thanks! I’m hoping someone can help me understand this so we can better prepare our data for Algolia.

Hi @mattucci

Size limits are across the whole record, so from our index’s perspective these attributes/properties would be identical. Since the records live in memory, there are physical constraints on the record size.
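
As a rough illustration of why restructuring does not buy extra room, the sketch below estimates each record's size as compact UTF-8 JSON. That measurement is an assumption for local comparison only, not Algolia's exact accounting, and `approx_record_size` is a hypothetical helper name.

```python
import json


def approx_record_size(record: dict) -> int:
    """Approximate a record's size in bytes as compact UTF-8 JSON.

    Assumption: this is only a local estimate, not how Algolia measures records.
    """
    serialized = json.dumps(record, separators=(",", ":"), ensure_ascii=False)
    return len(serialized.encode("utf-8"))


values = [f"value{i}" for i in range(1, 5)]

# Structure 1: a flat array of strings.
as_array = {"property": values}

# Structure 2: the same values spread over sub-properties.
as_object = {"property": {f"subprop{i}": v for i, v in enumerate(values, start=1)}}

array_size = approx_record_size(as_array)
object_size = approx_record_size(as_object)

print("array form: ", array_size, "bytes")
print("object form:", object_size, "bytes")
```

Under this estimate the nested form is no smaller, since the extra keys are part of the record too. Either way, the values all count toward the same whole-record limit.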

Typically, when there is a long list of values (users, tags, etc.) we recommend using integer identifiers to represent the underlying users/tags/etc. If you can figure out a way to turn the property into an integer array, you can use integer compression to reduce the overall size of the property.
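
A minimal sketch of the identifier idea, assuming the full tag vocabulary is known up front. The helper names `build_tag_ids` and `encode_tags` are hypothetical, and the sketch only builds the integer array. It does not perform the integer compression itself.

```python
def build_tag_ids(all_tags):
    """Assign each distinct tag a small integer ID.

    Assumption: IDs are deterministic for a fixed vocabulary. A real system
    would persist the mapping so IDs do not shift when new tags appear.
    """
    return {tag: i for i, tag in enumerate(sorted(set(all_tags)), start=1)}


def encode_tags(tags, tag_ids):
    """Replace tag strings with their integer IDs, sorted ascending."""
    return sorted(tag_ids[tag] for tag in tags)


tag_ids = build_tag_ids(["tag1", "tag2", "tag3"])

record_with_strings = {"tags": ["tag3", "tag1"]}
record_with_ids = {"tags": encode_tags(record_with_strings["tags"], tag_ids)}

print(tag_ids)
print(record_with_ids)
```

The mapping from IDs back to tag names has to live somewhere outside the record, which is the trade-off for the smaller property.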

Thanks Chuck! This is very useful information! Thank you for clarifying that the two structures are treated the same.

How would you make the integer array’s referenced values searchable?
If some tags are now stored as [1, 2, 3], corresponding to the values tag1, tag2, tag3, I assume there would be another index or other data store holding key/value pairs for the integer array, like:

{
    1: "tag1",
    2: "tag2",
    3: "tag3"
}
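
To make that assumption concrete, here is a purely local sketch of the lookup step: resolve the query text to an ID, then match records whose integer array contains it. Everything here is hypothetical (the in-memory `records` list, the `find_by_tag` helper), and it does not show how Algolia itself would be queried.

```python
# Hypothetical side store: integer ID -> tag name, as in the mapping above.
tag_names = {1: "tag1", 2: "tag2", 3: "tag3"}

# Reverse lookup so a query string can be turned back into an ID.
name_to_id = {name: tag_id for tag_id, name in tag_names.items()}

# Stand-in for indexed records that only hold integer tag IDs.
records = [
    {"objectID": "a", "tags": [1, 2]},
    {"objectID": "b", "tags": [3]},
]


def find_by_tag(query, records, name_to_id):
    """Return the objectIDs of records tagged with the queried tag name."""
    tag_id = name_to_id.get(query)
    if tag_id is None:
        return []  # Unknown tag name: nothing can match.
    return [r["objectID"] for r in records if tag_id in r["tags"]]


matches = find_by_tag("tag1", records, name_to_id)
print(matches)
```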

How would you implement search to still pick up a search query of “tag1” when now all that is available is a list of integers in the index’s property?

Let me know if I am understanding this correctly.