Querying a small number of records with a large number of associated keys/values

Hi folks,

We have a small number of documents (a few hundred), but each of these will have many tens of thousands of associated key/value pairs, which users should be able to search for. Keys and values will be shared between documents.

Things I’ve tried

Including the keys/values in the document - too large
The ‘obvious’ solution is to have the key/value pairs in a nested object, but the object gets far too large (several MB):

{
	"objectID": "1234",
	"title": "Some title",
	"summary": "A text summary about this object",
	... lots more other fields in here
	"associated_keys": {
		"ABCDEF": "Some text about ABCDEF"
		"GHIJKL": "Some text about GHIJKL"
		... tens of thousands more like this
	}
},
{
	"objectID": "5678",
	"title": "Some other title",
	"summary": "A text summary about this second object",
	... lots more other fields in here
	"associated_keys": {
		"ABCDEF": "Some text about ABCDEF"
		"QRSTU": "Some text about QRSTU"
		... tens of thousands more like this
	}
}

Concatenating the keys/values into a string - too large
I have also tried concatenating the keys/values into a string to make them searchable, but we still exceed size limits by a couple of orders of magnitude (10,000 keys like are 250KB):

{
"objectID": "1234",
"title": "Some title",
"summary": "A text summary about this object",
"associated_keys_text": "ABCDEF Some text about ABCDEF GHIJKL Some text about GHIJKL tens_of_thousands more like this"
}

Using distinct and separate indexes

According to the docs, it seems like the solution will be to connect the keys / values in one index with the main documents in another, using distinct. I would like to structure it like this:

# Primary document index
{
	"objectID": "1234",
	"title": "Some title",
	"summary": "A text summary about this object",
	... lots more other fields in here
},
{
	"objectID": "5678",
	"title": "Some other title",
	"summary": "A text summary about this second object",
	... lots more other fields in here
	
}

# Secondary k/v index
{
	"objectID": "someId"
	"key": "ABCDEF"
	"associated_documents": [1234,5678]
},
{
	"objectID": "someId"
	"key": "GHIJKL"
	"associated_documents": [1234]
},
{
	"objectID": "someOtherId"
	"key": "QRSTU"
	"associated_documents": [5678]
}

Is this the most efficient structure, and how can I query against the text in the main document, and either a key or value in the secondary index, to return documents from the main index?

The presence of keys and/or values should ideally be included in snippets, but the results should be ranked in favour of fields in the primary index.

Thanks.

Hi there,

Thank you for reaching out !
For this type of tasks, I think that splitting every documents in chunks of records (one document will have lets say 16 records, all of them with the same documentId) and adding the disctinct feature will be enough. And depending on your custom ranking and other attributes, you will decide on what documents to put first (if you have Keys and values will be shared between documents).
So, split everything into chunks, add the document id to all of them, use the disctint feature on the documentId, and it should be fine.

Let us know how it goes,

Hi @baptiste.coquelle thanks for the response!

Just to be clear, I should use only one index, and index the same document multiple times, but with a different subset of the key/value pairs, like this?

{
	"objectID": "autoDetermined",
	"document_id": 1234,
	"title": "Some title",
	"summary": "A text summary about this object",
	... lots more other fields in here
	"associated_keys": {
		"ABCDEF": "Some text about ABCDEF"
		"GHIJKL": "Some text about GHIJKL"
		... a small number more like this
	}
},
{
	"objectID": "autoDetermined",
	"document_id": 1234,
	"title": "Some title",
	"summary": "A text summary about this object",
	... lots more other fields in here
	"associated_keys": {
		"MNOPQ": "Some text about MNOPQ"
		"RSTUV": "Some text about RSTUV"
		... a small number more like this
	}
},
{
	"objectID": "autoDetermined",
	"document_id": 5678,
	"title": "Some other title",
	"summary": "A text summary about this other object",
	... lots more other fields in here
	"associated_keys": {
		"ABCDEF": "Some text about ABCDEF"
		"RSTUV": "Some text about RSTUV"
		... a small number more like this
	}
}

Have I understood correctly? And then I distinct on the document_id?

If I do this, is it better to have the k/v pairs as an object like I’ve shown above, or a concatenated string?

And you mention 16; is that just to keep the limit well below ~10KB? I could probably fit a lot more than that in a single document.

Ed

So, yes that’s exactly what I mean, and yes distinct on document_id !And yes, very good point :slight_smile: You should definitely have an array of concatenated values instead of objects story k/v, that s the only solution if you want to keep 1 single indices.
And yes as will, we need to limit the size of the records for performance reason.

Let me know if this helps,

Thanks @baptiste.coquelle I’m grateful. Will reply to this thread if I need more input. :slight_smile: