Question about how to structure data

Hi, I have a data structure that includes aliases that can be user-defined. Say I have an object “Foo”, which has a global alias “Bar”. User X adds an alias “Baz”. When user X searches for “Foo”, “Bar” or “Baz”, the “Foo” object should be found. When user Y searches for “Foo” or “Bar”, the “Foo” object should be found, but if she searches for “Baz”, the “Foo” object should NOT be found. Is there a way to do this with Algolia?

Thanks.

Hi Ben!

The only way I can see to build that is the following:

Data structure

Index all your objects as “global” by including a global tag in the data:

{
  "id": "42",
  "name": "Foo",
  "_tags": ["global"],
  "aliases": ["Bar"],

  // ... 
}

Then when a user defines an alias for one of those objects, duplicate the record and make it specific to the user by tagging it with their user ID. For instance, say user 123 adds the alias “Baz” to “Foo”:

{
  "id": "42",
  "name": "Foo",
  "_tags": ["uid_123"],
  "aliases": ["Baz"],

  // ... 
}

Note that the only things that change between the original record and its duplicate are:

  • The _tags attribute that contains an identifier for the user instead of "global"
  • The aliases attribute that contains all the aliases this user has set for this object
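
As a rough illustration, here is a minimal Python sketch of how the two kinds of record could be built on the backend. The helper names `global_record` and `user_record` are made up for this example and are not part of any Algolia client.

```python
def global_record(obj_id, name, aliases):
    """Record visible to everyone: tagged "global", carries the global aliases."""
    return {
        "id": obj_id,
        "name": name,
        "_tags": ["global"],
        "aliases": list(aliases),
    }


def user_record(base, user_id, user_aliases):
    """Duplicate of a global record that only swaps _tags and aliases."""
    record = dict(base)  # shallow copy keeps id, name and any other fields
    record["_tags"] = [f"uid_{user_id}"]
    record["aliases"] = list(user_aliases)
    return record


foo = global_record("42", "Foo", ["Bar"])
foo_for_123 = user_record(foo, "123", ["Baz"])
```

Everything except `_tags` and `aliases` is copied over unchanged, which is what lets distinct treat the two records as the same object later on.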

Index configuration

  • Add name and aliases to your searchable attributes (Ranking tab)
  • Set distinct to true (Display tab)
  • Set the attribute for distinct to id (Display tab)

You can learn more about distinct here: https://www.algolia.com/doc/guides/ranking/distinct/

Here we’re using distinct to de-duplicate objects with the same id. I’ll explain why below.
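
If you prefer setting this through the API rather than the dashboard, the same configuration could be written as a settings payload along these lines. The setting names are the ones the settings API uses; how you send the payload depends on your client and its version, so that call is left out of this sketch.

```python
# Settings payload mirroring the three dashboard steps above.
index_settings = {
    # Make both the name and the aliases searchable.
    "searchableAttributes": ["name", "aliases"],
    # De-duplicate results that share the same "id" value.
    "attributeForDistinct": "id",
    "distinct": True,
}
```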

Searching

You’ll basically search as you’re used to, only with an additional filter when your user is logged in.

If you know who’s searching (i.e. logged in user), what you want to do is make sure we search through the global aliases and the ones they’ve defined.
To do so, you’ll add the following filter to your search parameters: filters=global OR uid_123
Obviously the uid_xxx will depend on the current user.

As a result, the search engine will only take into account the global records, which hold the global aliases, and the records defined for the current user, which hold their own aliases.
So when user 456 makes a search, they won’t see the record tagged uid_123 and thus won’t match “Foo” on the “Baz” keyword.

The distinct feature here is important to make sure we only keep one of the two records that can be returned because of this filter: either the global or the user-specific one.
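
To make the behaviour concrete, here is a small Python sketch that builds the filter string and imitates the filter + distinct logic in memory. `simulate_search` is only a toy stand-in (exact, case-insensitive matching), not how the engine actually matches or ranks. Restricting anonymous searches to the global tag is my assumption, so that user-specific aliases never match for them.

```python
def build_filters(user_id=None):
    """Filter string for a search request.

    Assumption: anonymous searches are restricted to "global" records.
    """
    if user_id is None:
        return "global"
    return f"global OR uid_{user_id}"


def simulate_search(records, query, user_id=None):
    """Toy model: keep records with an allowed tag, match the query against
    name or aliases, then keep one hit per "id" (the role distinct plays)."""
    allowed = {"global"} if user_id is None else {"global", f"uid_{user_id}"}
    q = query.lower()
    seen, hits = set(), []
    for rec in records:
        if not allowed & set(rec["_tags"]):
            continue  # record belongs to another user
        terms = [rec["name"], *rec["aliases"]]
        if q in (t.lower() for t in terms) and rec["id"] not in seen:
            seen.add(rec["id"])
            hits.append(rec["name"])
    return hits


records = [
    {"id": "42", "name": "Foo", "_tags": ["global"], "aliases": ["Bar"]},
    {"id": "42", "name": "Foo", "_tags": ["uid_123"], "aliases": ["Baz"]},
]
```

With these two records, user 123 finds “Foo” when searching “Baz”, user 456 gets nothing for “Baz”, and a search for “Foo” by user 123 returns a single hit even though both records match.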

Updating objects

To easily update your global and user-specific records, use predictable values for objectID.

For instance, global records can bear the id value as an objectID:

{
  "objectID": "42",
  "id": "42",
  "name": "Foo",
  "_tags": ["global"],
  "aliases": ["Bar"],

  // ... 
}

And user-specific records can use a compound of the object’s id and the user ID:

{
  "objectID": "42:123",
  "id": "42",
  "name": "Foo",
  "_tags": ["uid_123"],
  "aliases": ["Baz"],

  // ... 
}

You then have to distinguish different cases:

  • When the object 42 is updated:
    • Update the records 42 and 42:XXX for each user XXX who has defined custom aliases
  • When only a global alias is added to object 42:
    • Only update the aliases attribute of the 42 record
  • When user XXX adds an alias to object 42:
    • Only update the aliases attribute of the 42:XXX record

For all those updates, you can use the partial update functions to update only specific fields in existing records, and also to create user-specific records on the fly when they don’t exist yet (for a user’s first custom alias on an object).
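
The three cases could be sketched in Python roughly like this. The function names and the `change` labels are hypothetical and only illustrate which objectIDs each case touches; the actual update calls to your client are not shown.

```python
def global_object_id(obj_id):
    """objectID of the global record, e.g. "42"."""
    return str(obj_id)


def user_object_id(obj_id, user_id):
    """objectID of a user-specific record, e.g. "42:123"."""
    return f"{obj_id}:{user_id}"


def records_to_update(obj_id, change, user_id=None, users_with_aliases=()):
    """Return the objectIDs to update for one of the three cases."""
    if change == "object_updated":
        # The global record plus every user-specific duplicate.
        return [global_object_id(obj_id)] + [
            user_object_id(obj_id, u) for u in users_with_aliases
        ]
    if change == "global_alias_added":
        return [global_object_id(obj_id)]
    if change == "user_alias_added":
        if user_id is None:
            raise ValueError("user_id is required for a user alias")
        return [user_object_id(obj_id, user_id)]
    raise ValueError(f"unknown change: {change}")
```

For example, `records_to_update("42", "object_updated", users_with_aliases=["123", "456"])` gives the global record plus both duplicates, while a user alias only touches that user's record.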

Drawbacks

Obviously this solution is going to significantly increase your number of records depending on user activity and number of users.

As an optimization, if it applies to your use case, you can try to aggregate records for multiple users who define exactly the same aliases for an object. Doing so will require some more work on your backend to detect overlaps, but then you just have to tag multiple user IDs on a single user-specific record.
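
A possible way to detect those overlaps, sketched in Python under the assumption that you can load all custom aliases for an object as a mapping from user ID to alias list. `aggregate_user_records` is a made-up helper name.

```python
from collections import defaultdict


def aggregate_user_records(obj_id, name, aliases_by_user):
    """Group users that defined exactly the same aliases into one shared record."""
    groups = defaultdict(list)
    for user_id, aliases in aliases_by_user.items():
        # frozenset ignores alias order, so ["a", "b"] and ["b", "a"] are grouped.
        groups[frozenset(aliases)].append(user_id)

    records = []
    for alias_set, user_ids in groups.items():
        records.append({
            "id": obj_id,
            "name": name,
            "_tags": [f"uid_{u}" for u in sorted(user_ids)],
            "aliases": sorted(alias_set),
        })
    return records


shared = aggregate_user_records(
    "42", "Foo", {"123": ["Baz"], "456": ["Baz"], "789": ["Qux"]}
)
```

Here users 123 and 456 end up sharing one record and user 789 keeps their own. Note that a shared record no longer maps to a single user, so it would need an objectID scheme other than object ID plus user ID; that part is left open here.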

I hope this helps :slight_smile:

Thanks so much for the detailed reply! I came up with a slightly different solution, and wanted to see if there were any pitfalls here, since you came up with a different one. Instead of using the “tag”, I used a new field “userId” and then created a facet on “userId”. The global “userId” is “0”, so I search with filters: “userId:0 OR userId:123” along with the distinct flag.

I realize that this will create a lot of user facets, but I didn’t see a limitation to the number of facets allowed.

Thanks, Ben

Hey Ben, you’re welcome!

This is fine, but the performance using facets is going to be slightly lower compared to using _tags, because using facets means the engine is going to try and count the hits for each value of userId in your result.
Even though your filter will limit this to a single value to count the hits for, it’s still an overhead you don’t need.

To get the same performance, declare your facet userId as filter-only using the filterOnly modifier, i.e. filterOnly(userId).
You just have to type it in and hit Enter:

https://d1ax1i5f2y3x71.cloudfront.net/items/473H2a3l0s143n0o1c43/Image%202017-07-17%20at%2011.17.14%20AM.png

This is basically what _tags is under the hood :slight_smile:
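
For reference, the equivalent through the settings API would look something like the sketch below; the modifier is written `filterOnly(...)` there. The `user_filter` helper is hypothetical and just rebuilds the filter string from Ben's message, with "0" standing for the global records.

```python
# Declare userId as usable in filters without computing facet counts for it.
faceting_settings = {
    "attributesForFaceting": ["filterOnly(userId)"],
}


def user_filter(user_id, global_id="0"):
    """Filter matching the global records and the given user's records."""
    return f"userId:{global_id} OR userId:{user_id}"
```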

Thanks, one more quick question. Is there any limit, strict or performance-based, on the number of tags an object can have?

Hey Ben, sorry I missed your question!

There’s no strict limit on the array itself, but a practical/performance one:

  1. your records cannot exceed 10kB in size, so you cannot have too big of an array otherwise you will go above that limit
  2. performance-wise, the larger the array the longer it will take for the engine to parse it when reading the record. It can have quite an impact when you start having many items.

As a result, we don’t recommend storing more than a few thousand tags per record.
Performance will also depend on your use case and how you’re using Algolia; so definitely do some tests :slight_smile:
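
As a starting point for such tests, here is a rough Python check of a record's size before indexing. It assumes the size is close to the UTF-8 length of the compact JSON and treats the 10kB figure above as 10,000 bytes; the exact way the limit is measured may differ, so take the result as an estimate.

```python
import json

# Assumption: the 10kB limit quoted above, taken as 10,000 bytes.
MAX_RECORD_BYTES = 10_000


def record_size_bytes(record):
    """Approximate record size: UTF-8 length of its compact JSON form."""
    compact = json.dumps(record, separators=(",", ":"), ensure_ascii=False)
    return len(compact.encode("utf-8"))


def fits(record, limit=MAX_RECORD_BYTES):
    """True if the estimated size stays within the limit."""
    return record_size_bytes(record) <= limit


small = {"objectID": "42:123", "id": "42", "name": "Foo",
         "_tags": ["uid_123"], "aliases": ["Baz"]}
# Same record tagged with 2000 user IDs.
big = dict(small, _tags=[f"uid_{i}" for i in range(2000)])
```

With this estimate the single-user record fits easily, while the one carrying 2000 tags already goes over the limit, well before the parsing cost becomes the main concern.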