Size limit when indexing blog articles

I’ve seen the article here about maximum record size being 10k for free accounts, but does this mean indexing blog articles is not feasible? I can see there’s a jekyll plugin that automatically pushes post to the index, but how does this then work with the size limit?

Hello,

I’m Tim, the author of the jekyll-algolia plugin :slight_smile: I can give you some details about all this.

So you’re right and there is a 10Kb limit to each record (20kb on paid plans). That does not mean you can’t index blog articles though.

For any large text, what we suggest (and what our jekyll and wordpress plugins do) is to split each article into smaller parts. Usually, we split per paragraph. So each paragraph of text will become one record, but they will all share a common key (like a article_id).

This will make sure each record is under the limit. And by using the distinct feature of Algolia, and setting attributeForDistinct: article_id, it will allow searches to only return one paragraph (the most relevant one) for any article. The trick here is to copy all metadata of the article itself (like the article title, url, etc) into each record, so this can be used in the display.

In addition to fitting into the size limit, splitting into smaller records also has a great impact on relevance, allowing you to display the exact paragraph matching your query in the results.

Hope that helps :slight_smile:

1 Like

Extremely helpful! Thank you very much, I wouldn’t have thought to split it into a record paragraph but it makes complete sense. Re-reading the Jekyll plugin article, I can see it explains this, just didn’t quite click when I first read it. Thank you for explaining :slight_smile: