Secured API key restricted to certain groups doesn't show all matching records

Dear Algolia Team and Algolia Community,

TLDR: We need help generating Secured API keys for all users based on their user group.

We’re using Algolia to make our product documentation easily searchable. To do this, we’ve created Discourse plugins so that the documentation search runs alongside the forum platform.

Not every piece of documentation should be accessible to everyone. For example, documentation for a product still in beta should be viewable only by a specific department. For this, we decided to leverage the Groups built into Discourse. For example, the regular user group has ID 10, and the admin group has ID 3.

Each of our Algolia records has a “viewable_by” attribute containing an array of numbers that correspond to these groups. For example, "viewable_by" => [ 3, 10 ] means the record is readable by everyone in the regular user group or the admin group (users in the admin group are usually also part of the regular group, but that shouldn’t really matter?). We’re basically trying to mimic what’s explained in this example.

In the backend (Discourse plugin), we generate a Secured API key when a user logs in or when a user’s group permissions change. We do that like this (in Ruby):

viewable_by = ""
user.groups.each do |group|
	viewable_by += " OR " if (viewable_by.length > 0)
	viewable_by += "viewable_by:#{group.id}"
end

filters = {
	"filters" => viewable_by,
	"restrictIndices" => SiteSetting.algolia_index_name.freeze,
	"validUntil" => 1.year.from_now.to_i
}

public_key = Algolia.generate_secured_api_key( SiteSetting.algolia_search_key.freeze, filters )

The first few lines generate a string such as viewable_by:10 OR viewable_by:3, which we pass as the filters value. We use the Algolia Rails gem and its generate_secured_api_key function.

We save this API key and send it to the user's frontend. However, certain records do not appear in that user's Algolia search results even though their “viewable_by” attribute contains a group ID the user belongs to. When I replace the Secured API key with the Search Key, I do get the expected records.

I was hoping someone could give me pointers on what I might be missing.
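As a side note, the string-building loop above can be written more compactly with map and join. This is only a sketch: build_group_filter is a hypothetical helper name, and it takes plain group IDs instead of Discourse group objects so that it runs on its own. It produces the same kind of string as the loop, so it is not expected to change the search results.

```ruby
# Hypothetical helper (not part of the Algolia gem): turn a list of group IDs
# into a filter string such as "viewable_by:10 OR viewable_by:3".
def build_group_filter(group_ids, attribute = "viewable_by")
  group_ids
    .uniq                                       # drop duplicate group IDs
    .map { |id| "#{attribute}:#{Integer(id)}" } # Integer() rejects non-numeric IDs
    .join(" OR ")
end

puts build_group_filter([10, 3])
```

With Discourse groups, the call would presumably look like build_group_filter(user.groups.map(&:id)). An empty list yields an empty string, which would mean no filter at all, so that case may deserve an explicit guard.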

Thank you so much!

Kind regards,
Arjen

Hi Arjen,

I’m not familiar with Rails, but here are a few things I would check:

  1. Can you make sure that the records that did not appear as expected really contain a viewable_by attribute that matches the user’s group? (You can use the Algolia dashboard to check that.)
  2. Can you log the values of SiteSetting.algolia_index_name, SiteSetting.algolia_search_key and viewable_by to make sure that they are correct?
  3. Make sure that you provide a list rather than a string in restrictIndices (see https://www.algolia.com/doc/api-reference/api-methods/generate-secured-api-key/?language=javascript#method-param-restrictindices).
  4. If it still does not work, I would include only the filters property in the filters object and remove the other two restrictions (restrictIndices and validUntil) to check whether that solves the problem.
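Point 3 could look roughly like the sketch below, which wraps the index name in a list before the restrictions are handed to generate_secured_api_key. The helper name secured_key_restrictions is hypothetical, and the sample values are the ones that appear elsewhere in this thread. Whether the string form is actually the cause of the missing records is not established here.

```ruby
# Hypothetical helper: assemble the restrictions for a Secured API key.
# Array() turns a single index name into a one-element list and leaves
# an existing list untouched.
def secured_key_restrictions(filter, index_names, valid_until)
  {
    "filters" => filter,
    "restrictIndices" => Array(index_names),
    "validUntil" => Integer(valid_until) # Unix timestamp in seconds
  }
end

p secured_key_restrictions("viewable_by:10", "our-index-key", 1555060310)
```

For point 4, passing only { "filters" => filter } leaves the other two restrictions out entirely.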

Hope this helps! Please let us know how it goes!

Dear Adrien,

First of all, thank you for your response. Your support is very much appreciated!

I tried to do some testing on the code and benchmarked some of the results. First let me explain the steps I’ve taken:

I generated 1 Secured API key with the following parameters (using the code described in my original post):

{ "filters"=>"viewable_by:10", "restrictIndices"=>"our-index-key", "validUntil"=>1555060310 }

For simplicity's sake, I’ve only added one viewable_by value. An API key was generated from these parameters. In my frontend, I executed the following code:

algolia = algoliasearch(Discourse.SiteSettings.algolia_application_id, retrieved_secured_key);
helper  = algoliasearchHelper(algolia, Discourse.SiteSettings.algolia_index_name, {
            disjunctiveFacets: ['hierarchicalCategories.lvl1'],
            hitsPerPage: 35,
            maxValuesPerFacet: 1000
          });

const sleep = (ms) => {
    let start = Date.now();
    while ( Date.now() < start + ms ) {} 
}
const benchmark = (res) => {
  const facetValues = res.getFacetValues('hierarchicalCategories.lvl1');
  const facetLength = Object.keys(facetValues).length;
  console.log(`Number of facet values: ${facetLength}`);
}

helper.on('result', (results) => benchmark(results));

// Loop 100 times
const arr = [...Array(100).keys()]
for (let a of arr) {
  helper.search(); // perform search
  sleep(250);  // wait for 250ms
}

When I run this, I get very mixed results. These are the first 12 lines of the output; I’ve omitted the rest, since it shows the same pattern.

Number of facet values: 164
Number of facet values: 216
Number of facet values: 231
Number of facet values: 235
Number of facet values: 233
Number of facet values: 232
Number of facet values: 235
Number of facet values: 180
Number of facet values: 164
Number of facet values: 229
Number of facet values: 234
Number of facet values: 231
(...)

However, when I run this code with our (global/almighty) Search API key, I simply get 100 times:

Number of facet values: 241

We do not understand this discrepancy between searches with the exact same Secured API key. Any chance you can shed some light on this? We could really use some help.

Kind regards,
Arjen

Hi again, Arjen,

As stated in Algolia’s documentation:

If the number of hits is high, facet counts may be approximate. The response field exhaustiveFacetsCount is true when the count is exact.

(Source: https://www.algolia.com/doc/api-reference/api-parameters/facets/)

Dear Adrien,

Thanks again.
When a user hits our search, we do a call to Algolia without any filtering, to retrieve all available facets (‘hierarchicalCategories.lvl1’) as you can see in our code above. In the response from Algolia, this is displayed as:

{
	"facets" : {
		"hierarchicalCategories.lvl1" : {
			"The first facet": 11437,
			"The second facet": 9489,
			"The third facet": 9438,
			"The fourth facet": 8925,
			"The fifth facet": 8372,
			(...)
		}
	}
}

I understand that the counts in this response might be approximate, and that’s fine. But when I count the number of distinct hierarchicalCategories.lvl1 values, that number varies between requests, as shown in my previous post. I would expect the set of facet values I can choose from to always be the same. So the number printed by my code above (Number of facet values: X) is the number of unique facet values in a response, not a representation of the counts for each facet value.

So if the set of unique facet values I can choose from is also subject to approximation, how can I safely retrieve an exhaustive list of all available facet values? I have already asked a question about this before in Searching through Algolia hierarchically with version filters.

Kind regards,
Arjen

Did you try using facets="*" in a search query, as proposed in the facets documentation?

Also, if you’re using distinct, you could try using the facetingAfterDistinct parameter.

Dear Adrien,

When I try the “facets='*'” option, I do not see any change in behaviour.

With the facetingAfterDistinct: true parameter, the results look more promising, but there is still too much variance:

Number of facet values: 238
Number of facet values: 237
Number of facet values: 238
Number of facet values: 237
Number of facet values: 238
Number of facet values: 237
(3) Number of facet values: 238
Number of facet values: 237
(4) Number of facet values: 238
Number of facet values: 237
(4) Number of facet values: 238
Number of facet values: 157
(3) Number of facet values: 238
Number of facet values: 237
(7) Number of facet values: 238
Number of facet values: 219
Number of facet values: 237
(60) Number of facet values: 238
Number of facet values: 161
Number of facet values: 238

As you can see, the number of unique facets is more stable around 237/238, but we still see outliers which are way too far off.
When I use a combination of your proposed solutions, I get about the same response as the one above.

Again, this variance only occurs when using a Secured API key with a filter, but not for the regular Search API key.

Is there anything else you can suggest that would let us use a Secured (filtered) API key and still reliably retrieve a list of all available facets?

Kind regards,
Arjen

Thanks for having tried these solutions and shared your results!

Let me ask a few people about that and get back to you!

Hi again,

After investigating with colleagues, my conclusion is that the Algolia API was optimised for speed, relevance and user experience for most regular search use cases, and sometimes this comes at the cost of exhaustiveness.

Although this will not always be the case, the search engine is sometimes able to return facets exhaustively despite its time constraints. In that case, the exhaustiveFacetsCount property in the search response will be true.

If the counts are exhaustive, a fortiori the list of keys will be exhaustive too.

If you never get that property, our advice is to make queries that are less heavy to run on the search engine.
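A minimal sketch of that check, written in Ruby to match the backend code earlier in the thread. It assumes the search response has been parsed into a Hash with string keys; facet_values_if_exhaustive is a hypothetical helper, and the exhaustiveFacetsCount value in the sample is made up for illustration. In the browser, the same test would go in the helper's result handler.

```ruby
require "json"

# Hypothetical helper: return the list of facet values only when the engine
# reports the facet data as exhaustive; otherwise return nil so the caller
# can retry or fall back to a cached list.
def facet_values_if_exhaustive(response, facet_name)
  return nil unless response["exhaustiveFacetsCount"] == true

  response.fetch("facets", {}).fetch(facet_name, {}).keys
end

# Sample response shaped like the one quoted in this thread (truncated).
sample = JSON.parse(<<~JSON)
  {
    "exhaustiveFacetsCount": true,
    "facets": {
      "hierarchicalCategories.lvl1": {
        "The first facet": 11437,
        "The second facet": 9489
      }
    }
  }
JSON

p facet_values_if_exhaustive(sample, "hierarchicalCategories.lvl1")
```

Returning nil rather than a partial list makes the non-exhaustive case explicit, so an incomplete set of facet values is never shown as if it were complete.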

Does that help?

This is a major disappointment for us.

I get that trade-offs need to be made between completeness and speed. Approximate hit counts would be fine, but level-0 facets going missing in their entirety is not acceptable.
We assumed that Algolia was an enterprise-ready application; we were evidently wrong.

We will be rewriting our application so that it is less dependent on Algolia reliably returning results.

I’m sorry to read your disappointment, Bas…

I would just like to mention that it is possible to optimise for this kind of use case, but we can only do it for Business and Enterprise customers (i.e. not for Essential accounts), because it requires running their search engine on a dedicated cluster.

Feel free to contact us if you would like to get more information about our plans!

At this time, I have no interest in quadrupling our costs to fix a (perceived) bug.