Snippeting unretrievable attributes

I’d like to be able to snippet unretrievable attributes.

While this may seem a slightly odd requirement, I think it’s valid. I’d like to make the snippet available if there is a match, but not expose the whole field (it is a multiline text field).

Right now if I set an attribute to be unretrievable, I don’t get any snippets.

I know I can set a limited number of retrievable attributes, and leave out the attribute I don’t want to expose, but this is overridable at query time.

Can I do this? If not, can I propose it as a feature request? Alternatively, could retrievable attributes be made non-overridable at query time?

Hi @dan

The purpose of unretrievableAttributes is to hide sensitive information from users.

If you make an unretrievable attribute “snippetable”, any part of the attribute could be retrieved through snippets, which effectively makes it retrievable.

In that regard, why not simply make the attribute retrievable and use the snippet?

I’d gladly hear more about your use case!

Ah, the old “What you are trying to do is wrong”…:wink:

I simply want to be able to snippet an attribute, but not have the whole thing returned.

My use case is that I am indexing CVs. If there is a match in the CV text I want to snippet it, but I don’t want the entire CV retrievable.

Ah, the old “What you are trying to do is wrong”

Definitely not what I was saying, sorry if you perceived it that way :confused:

With your last answer, the use case is clearer to me, and it definitely makes sense.

What I’d advise is that you store the “content” of the CVs in a searchable attribute and enable snippeting on it.

You could also avoid sending us the data you don’t want to be retrieved if you don’t use them in the ranking formula.

Let me know if this can’t work for your use case.

That doesn’t quite work.

The entire CV is a single field of text (we extract it from a .doc/.pdf etc.), so there is no easy way to separate the parts we want searchable from those that are more ‘sensitive’.

The data is available to the users who have access rights to search, but what I want to avoid is it being possible to write a simple script that would download full CVs wholesale.

So either the field needs to be unretrievable, or the retrievable attributes shouldn’t be overridable by the client (I think).

Hmmm,

Personally, I would try to find a way to remove, say, the first lines of the CV, or ask users to fill in the information in separate fields.

Of course, in your use case you seem to already have the CVs and no easy way to detect what is sensitive and what isn’t. The problem is that even inside the CV there could be sensitive information.

Maybe someone else has a better idea on how to approach this?

Asking the users who upload their CVs to do more is not an attractive option, as that is putting extra barriers in the way of capturing their data.

As I say, the users who have access to search the CVs can see the full CV anyway. This is more about making it harder to write a script that downloads the CVs wholesale (right now it is trivial: override attributesToRetrieve at query time to get the CV field in bulk).

Elsewhere in the app we have logic to stop mass CV harvesting, but Algolia still allows it (if we want snippeting of the CV results).

Like I say, either of these would resolve it:

  • Make unretrievable attributes snippetable
  • Don’t allow retrievable attributes to be overridden at query time

Hmm… can I do this with a secured API key? Can the retrievable attributes be part of the HMAC?

Oh wow that totally works! Ace!
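
For anyone landing here later, here is a minimal sketch of the secured API key idea. In practice you should use the secured-key helper that ships with the Algolia API clients; the stdlib version below only illustrates why it works: the restrictions are signed with HMAC-SHA256 using the parent search key, so the client cannot change attributesToRetrieve without invalidating the key. The function name, the key value, the attribute names, and the exact serialization of list values are assumptions for illustration and may not match the official clients byte for byte.

```python
import base64
import hashlib
import hmac
import json
from urllib.parse import urlencode


def generate_secured_key(parent_key: str, restrictions: dict) -> str:
    """Sketch: sign search restrictions with the parent key (hypothetical helper)."""
    serialized = {}
    for name, value in restrictions.items():
        # Assumption: lists and dicts are sent as compact JSON strings.
        if isinstance(value, (list, dict)):
            value = json.dumps(value, separators=(",", ":"))
        serialized[name] = value

    # URL-encode the restrictions in a stable (sorted) order.
    params = urlencode(sorted(serialized.items()))

    # HMAC-SHA256 over the encoded restrictions, keyed with the parent key.
    signature = hmac.new(
        parent_key.encode("utf-8"), params.encode("utf-8"), hashlib.sha256
    ).hexdigest()

    # The key handed to the browser carries the signature plus the restrictions.
    return base64.b64encode((signature + params).encode("utf-8")).decode("utf-8")


# Hypothetical values: a search-only parent key and the attributes that
# clients are allowed to retrieve (the CV text field is deliberately absent).
secured_key = generate_secured_key(
    "PARENT_SEARCH_ONLY_KEY",
    {"attributesToRetrieve": ["name", "title"]},
)
```

The key is generated server-side and handed to the client in place of the regular search key; the parent key itself never leaves the backend.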

I also ran into this use case, for a different reason I guess. I just don’t want the full attribute to be sent over the wire because it is large (several KB of text) and doesn’t need to be used or shown as long as I have the snippet to display to the user. Or should I be designing this a different way? Thanks.

You’re on the right track: you can use attributesToRetrieve or unretrievableAttributes to prevent that field from coming back in the payload.

Thanks @dzello

I realized what I had to do. I had attributesToRetrieve properly set to exclude that undesired attribute, but it was still coming through in the _highlightResult payload. Managed to get rid of it with attributesToHighlight = [].
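
In case it helps someone else, the combination described above can be sketched as a set of search parameters. The field name `cv_text`, the snippet length, and the helper name are assumptions for illustration; attributesToRetrieve, attributesToSnippet, and attributesToHighlight are the actual parameter names, but check the exact value syntax against the current API reference before relying on it.

```python
def snippet_only_params(field: str = "cv_text", words: int = 30) -> dict:
    """Sketch: return search parameters that keep a large field out of the response
    while still asking for a short snippet of it (hypothetical helper)."""
    return {
        # Retrieve every attribute except the large one.
        "attributesToRetrieve": ["*", f"-{field}"],
        # Ask for a snippet of roughly `words` words from the large field.
        "attributesToSnippet": [f"{field}:{words}"],
        # Disable highlighting so the full text is not echoed in _highlightResult.
        "attributesToHighlight": [],
    }


params = snippet_only_params()
```

These parameters would be passed along with the query to whichever client or search call you already use.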
