Receive Results but Make Them Impossible to Access

Hello Algolia community :wave:

I’ve got a bit of a unique situation. I’m building a web app with Node.js, Express, and MongoDB. The business model revolves around the sale of industry-specific data. The user can search my website to find out whether I have data on the topic they searched for, but they’re not allowed to see the details of that data until they’ve decided to purchase the full report.

So I need users to be able to search for a topic, and Algolia should be able to search through all of the data, including the stuff that the user isn’t allowed to see until purchasing. Algolia needs to return all the unpaid-for data to the user’s browser where some javascript recomposes it for display.

The problem with this is that a user could easily open the browser’s dev tools, turn on debug mode and find the complete set of data before they’ve paid for it.

The recomposing that my script does is basically faceting the data, but I don’t think I can make Algolia do the group faceting I want, because I’ll never know how many items will be in each group. Also, Algolia only lets me facet search results by one specific attribute, whereas I want to facet the data by every attribute that had a matching value.

Example:

• User searches ‘computer programmer’ with a filter to only include records with ‘location’: ‘Australia’

• Algolia returns these results. All of them contain ‘computer programmer’ at some point, but not all in the ‘occupation’ attribute.
{
occupation: 'architect',
employer: 'BCD Global',
age: 43,
hobbies: 'computer programmer',
location: 'australia'
}
{
occupation: 'computer programmer',
employer: 'Algolia',
age: 29,
hobbies: 'racing cars',
location: 'australia'
}
{
occupation: 'computer programmer',
employer: 'Software Makers',
age: 43,
hobbies: 'architect',
location: 'australia'
}

• These records would get recompiled in the browser to show search results to the user as follows:

Results:
Item 1
Occupation: ‘computer programmer’
Location: ‘australia’
Count: 2

Item 2
Hobbies: ‘computer programmer’
Location: ‘australia’
Count: 1
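
A minimal sketch of the recomposition described above, assuming a plain case-insensitive substring check stands in for Algolia's matching. The helper name `groupByMatchedAttribute` and the list of attributes to inspect are hypothetical, not part of the original post.

```javascript
// Hypothetical helper: group hits by the attribute in which the query text
// appears, and count how many hits fall into each (attribute, location) group.
// Assumption: a lowercase substring test approximates "this attribute matched".
function groupByMatchedAttribute(hits, query, attributes) {
  const needle = query.toLowerCase();
  const groups = new Map();
  for (const hit of hits) {
    const location = String(hit.location ?? '').toLowerCase();
    for (const attribute of attributes) {
      const value = String(hit[attribute] ?? '').toLowerCase();
      if (!value.includes(needle)) continue;
      const key = `${attribute}|${location}`;
      const group = groups.get(key) ?? { attribute, value: query, location, count: 0 };
      group.count += 1;
      groups.set(key, group);
    }
  }
  // Largest groups first, as in the example listing.
  return [...groups.values()].sort((a, b) => b.count - a.count);
}

// The three records from the example.
const sampleHits = [
  { occupation: 'architect', employer: 'BCD Global', age: 43, hobbies: 'computer programmer', location: 'australia' },
  { occupation: 'computer programmer', employer: 'Algolia', age: 29, hobbies: 'racing cars', location: 'australia' },
  { occupation: 'computer programmer', employer: 'Software Makers', age: 43, hobbies: 'architect', location: 'australia' },
];

const grouped = groupByMatchedAttribute(sampleHits, 'computer programmer', ['occupation', 'employer', 'hobbies']);
console.log(grouped);
```

A substring test ignores typo tolerance and prefix matching. Algolia responses also carry a `_highlightResult` object with a per-attribute `matchLevel`, which may be a more faithful signal of where the match occurred.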

The user should not be allowed to know any more than what the result item tells them in the GUI, but their browser did have the complete set of data in the background for a moment. The only way around this that I can find is to have the client-side search box contact my server, which sends the search request to Algolia, recomposes the results, and sends them back to the client. But Algolia doesn’t recommend that and I don’t like that solution either.

Please, any suggestions. Thank you! :slight_smile:

Hello,

That’s a nice advanced use-case you have here. I’ll try to do my best to give you a few pointers.

First of all, I don’t think the way you want to group the results will be possible. Algolia can give you the count of each facet, but not the way you want it.

What we’ll do is tell you “For the query XXX you just did, we found 700 records. Out of those 700, 200 have a hobby of YYY and 300 have a hobby of ZZZ”. We can’t tell you “Out of those 700, 200 actually have XXX as a hobby”. What we do is find all records that match XXX (in any of your searchable attributes), then filter those results based on the value of the facets. The query text XXX may or may not appear in the facet value itself.

So, you could try to implement the count in the front-end. You could get all 700 results returned, and loop through them to count how many have XXX in their facets, then display this number. This would work fairly well, until you have more than 1000 matching results. Our API can only return up to 1000 results in a response. It means that if you actually have more than 1000 matching results, we’ll only return the first 1000, so all the front-end calculation will be off :confused:
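
As a rough sketch of that front-end count, the function below loops over the returned hits and also reports whether the count can be trusted. It assumes the response exposes `hits` and `nbHits` (the total number of matches); the helper name `countAttributeMatches` is hypothetical.

```javascript
// Hypothetical helper: count how many returned hits contain the query text in
// one attribute, and flag whether every matching record was actually returned.
function countAttributeMatches(response, attribute, query) {
  const needle = query.toLowerCase();
  const count = response.hits.filter((hit) =>
    String(hit[attribute] ?? '').toLowerCase().includes(needle)
  ).length;
  // If fewer hits came back than matched overall, the count is only a lower bound.
  const complete = response.hits.length >= response.nbHits;
  return { count, complete };
}

// Usage with a stand-in response object (not real Algolia output).
const fakeResponse = {
  nbHits: 1500,
  hits: [
    { hobbies: 'computer programmer' },
    { hobbies: 'racing cars' },
    { hobbies: 'Computer Programmer' },
  ],
};
console.log(countAttributeMatches(fakeResponse, 'hobbies', 'computer programmer'));
```

When `complete` is false, the displayed number undercounts, which is the failure mode described above.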

Now regarding the securing of data I see two ways. The first one would be as you suggest to add your own backend as a proxy. This will let you have complete control over what to return to the end-user, but it will also add latency to the time required to display results. It is doable, but you will lose some of the speed goodness we offer.

The other solution would be to use our attributesToRetrieve and/or unretrievableAttributes settings. Those two settings will let you define (through a whitelist/blacklist approach) which attributes will be part of the response.

It means that you would be able to do a search on several fields, while preventing those fields from being returned to the user. For example, you could allow a search in the content field, but not return this content to the display.
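
A small sketch of how those two settings could be laid out, using attribute names borrowed from the example records earlier in the thread. Which attributes to hide is an assumption made for illustration, and `buildPublicSearchParams` is a hypothetical helper.

```javascript
// Index-level settings, meant to be applied once from a trusted server.
// Assumption: 'employer' and 'age' are the paid-for details to keep hidden.
const indexSettings = {
  // Hidden attributes can stay searchable...
  searchableAttributes: ['occupation', 'employer', 'hobbies', 'location'],
  // ...while never being included in search responses.
  unretrievableAttributes: ['employer', 'age'],
};

// Query-time whitelist of the attributes the page actually displays.
function buildPublicSearchParams(text) {
  return {
    query: text,
    attributesToRetrieve: ['occupation', 'hobbies', 'location'],
  };
}

// With an initialised client, these would be passed along roughly as:
//   index.setSettings(indexSettings);
//   index.search(buildPublicSearchParams('computer programmer'));
console.log(indexSettings, buildPublicSearchParams('computer programmer'));
```

Since `attributesToRetrieve` is a query parameter, a visitor holding the search key could send a different value, so `unretrievableAttributes` is probably the setting to rely on for anything that must stay hidden.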

Hope this helps :slight_smile:

Fantastic response, I can tell you really thought about it. Thank you for that.

I’ve decided to go with running it through my own backend despite the extra delay. The search functionality on this web app is only going to get more advanced as time goes on, so I want to be sure from the beginning that I can achieve everything I need.
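
For reference, a minimal sketch of what such a backend proxy could look like. The handler uses the Express `(req, res)` shape, but the Algolia call and the recomposition step are injected as `searchFn` and `summarize`, both hypothetical placeholders, so nothing here depends on a particular client version.

```javascript
// Hypothetical factory for an Express-style route handler.
// searchFn(query) is assumed to resolve to an object with a `hits` array;
// summarize(hits, query) is assumed to return only what the user may see.
function makeSearchHandler(searchFn, summarize) {
  return async function handleSearch(req, res) {
    const query = String(req.query.q ?? '').trim();
    if (!query) {
      res.status(400).json({ error: 'Missing query' });
      return;
    }
    try {
      const response = await searchFn(query);
      // Raw hits never leave the server; only the summary is sent.
      res.json({ results: summarize(response.hits, query) });
    } catch (err) {
      res.status(502).json({ error: 'Search failed' });
    }
  };
}

// Possible wiring (assumed route path and parameter name):
//   app.get('/search', makeSearchHandler(searchAlgolia, summarizeHits));
```

Injecting the two functions keeps the handler easy to test and leaves room for the recomposition logic to grow, at the cost of the extra round trip mentioned earlier.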

It would be nice if we could specify a script to run on the Algolia server before returning the results. This would help retain the speed goodness and add a lot of extra flexibility to your platform. I understand there’s some serious complexity involved in making this happen, but my thinking is: If c9.io can safely run people’s code on their servers, maybe Algolia can do it.

That’s a great idea. We’ve been thinking about similar things, but nothing as advanced as Cloud9 yet.

Hope you’ll find a way to have the grouping you want. Don’t hesitate to come back and share a link to what you build :slight_smile:

Another question:

I have two search fields.
The first one takes any text input and searches all attributes of my records.
The second field has an autocomplete on it. That autocomplete should only search the location attribute of the same records to make suggestions.

I can obviously get the autocomplete to only SHOW the location attribute, but it’s still searching all attributes because that’s how the index is set up, which means it often matches attributes other than location and shows irrelevant location suggestions.

Ideally I’d be able to override the searchableAttributes at autocomplete query time, but it seems I can’t. Is there another way to only let it match the location attribute, or do I need a separate index (possibly containing only the locations) set up to match only the location attribute? I don’t like this idea because it means it will always double my record count.

You’re in luck, because we have restrictSearchableAttributes which does exactly that :slight_smile:

This will allow you to search only within a subset of your searchableAttributes at query time :slight_smile:
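
A short sketch of the parameters for the location autocomplete, assuming `location` is already listed in the index's searchableAttributes. The helper name and the `hitsPerPage` value are illustrative choices.

```javascript
// Hypothetical helper: query parameters for the location-only autocomplete.
function locationAutocompleteParams(text) {
  return {
    query: text,
    // Match only against `location`; it must already be a searchable attribute.
    restrictSearchableAttributes: ['location'],
    // Return only what the suggestion list displays.
    attributesToRetrieve: ['location'],
    hitsPerPage: 5, // arbitrary size for a suggestion list
  };
}

console.log(locationAutocompleteParams('Vic'));
```

The main search field can keep querying every searchable attribute on the same index, so no second index is needed for the suggestions.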

Excellent, thanks again for that.

I’m afraid I have another question that I can’t seem to answer on my own.

Is there a way to make the filter attributes at query time flexible enough to not require an exact match? For example, continuing with my two search fields that define a single search event:
Search field 1: “astronomy”
Search field 2: “Victoria, Australia”

My search query ends up like this:
index.search({ query: "astronomy", filters: "Victoria, Australia" });

I want it to filter out anything that doesn’t have a location containing “Victoria, Australia”.

None of my records have a location attribute of just “Victoria, Australia”. They all have a town and sometimes a street name as well, like this: “Campaspe Street, Whittlesea, Victoria, Australia”.

Filter values do not match sub-strings. It needs to be the whole, correct string or it returns nothing. Is it possible to filter results to only those which have a location attribute which contains a specific sub-string?

In the meantime I’ve just been combining “astronomy” with the location into one query string, but that often returns results where the location text matches attributes other than location, which makes them quite irrelevant.
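
One possible workaround, not confirmed anywhere in this thread: since filters compare whole values, each record could store every trailing portion of its location at indexing time, so that “Victoria, Australia” becomes an exact value to filter on. The attribute name `locationSuffixes` and both helpers are hypothetical, and the attribute would need to be declared in attributesForFaceting for filtering to work.

```javascript
// Hypothetical indexing-time helper: list every trailing portion of a
// comma-separated location, from the full address down to the country.
function locationSuffixes(location) {
  const parts = location.split(',').map((part) => part.trim()).filter(Boolean);
  return parts.map((_, i) => parts.slice(i).join(', '));
}

// Hypothetical query-time helper: exact-match filter on the precomputed values.
// Assumption: the user's input contains no double quotes.
function buildLocationFilter(value) {
  return `locationSuffixes:"${value}"`;
}

const record = {
  location: 'Campaspe Street, Whittlesea, Victoria, Australia',
  locationSuffixes: locationSuffixes('Campaspe Street, Whittlesea, Victoria, Australia'),
};
console.log(record.locationSuffixes);
console.log({ query: 'astronomy', filters: buildLocationFilter('Victoria, Australia') });
```

This keeps one record per item rather than a second index, at the cost of a slightly larger record.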