How to set up facets with Docsearch


I’m totally new to Algolia, and I’ve set up the docsearch scraper in a docker container to crawl my site. It is a software documentation site with documentation for different versions (example: 6.0, 6.1, 6.2). The content for each version sits in a directory named for that version (6.0, 6.1, 6.2). Right now, search suggestions don’t show which version they relate to. I want to set up facets so that those versions appear.

I’ve tried setting it up in the config.json file, based on examples I’ve seen, but when I run the docsearch scraper, it tells me the config file is not valid. So, there’s something I’m missing somewhere.

I added a facet for ‘version’ in the Algolia dashboard for my APP ID. But I’m trying to understand how I tell Algolia that content in the 6.0 directory is ‘version 6’, and so on for the other versions, so that it will be displayed under the correct ‘version’ in the Algolia search suggestions that appear when searching on my site. Can you tell me what I need to do to get this working?

btw, here’s my site with the Algolia search: Introduction to ThoughtSpot | ThoughtSpot Demo Site

Any assistance would be greatly appreciated!


You’ve correctly identified the problem: to add a facet on version, you need an attribute with this version in your records.
If the version is in your documentation URL, you can make use of the variables feature of the configuration.

You can see its documentation here: .

You can also define a variables key that will be injected into your specific URL pattern. The following example makes this variable feature clearer:

  "start_urls": [
    {
      "url": "http://www.example.com/docs/(?P<lang>.*?)/(?P<version>.*?)/",
      "variables": {
        "lang": ["en", "fr"],
        "version": ["latest", "3.3", "3.2"]
      }
    }
  ]

A useful side effect of this syntax is that every record extracted from a matching page carries the captured values as attributes, for example lang: en and version: latest for a page under /en/latest/. You can then filter on these attributes with facetFilters.
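To make the named-group idea concrete for a layout like yours, here is a minimal Python sketch. It only illustrates how a (?P<version>...) group pulls the version out of a page URL; it is not the scraper's actual code, and the host docs.example.com and the helper url_attributes are placeholders I made up for the example.

```python
import re

# Hypothetical URL layout: replace docs.example.com with your own host.
# The named group "version" captures the first path segment (6.0, 6.1, ...).
PATTERN = re.compile(r"https://docs\.example\.com/(?P<version>.*?)/")


def url_attributes(url):
    """Return the named groups captured from a page URL, or {} if no match."""
    match = PATTERN.match(url)
    return match.groupdict() if match else {}


print(url_attributes("https://docs.example.com/6.1/admin/setup.html"))
```

In the scraper config, the equivalent would presumably be a start_urls entry whose url uses the same pattern and whose variables lists "version": ["6.0", "6.1", "6.2"].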

If it’s not exposed in the URL directly, you then have two options:

  1. If you can differentiate them using different URLs, you can use tags
  2. If you can only find it in the page content, you can add an entry to selectors different from the default lvlX and text. You most likely want to mark it global: true too.
      "selectors": {
        "lvl0": "h1",
        "version": {
          "selector": "#version",
          "global": true
        }
      }
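Since the scraper rejected your config as invalid, it may be worth checking that the file is well-formed JSON before anything else: the fragments above only work when they sit inside one complete object with every brace and bracket closed. A minimal sketch, assuming a made-up index name, host, and selector set (check_config is a hypothetical helper, and it only checks JSON syntax, not whether the scraper accepts the keys):

```python
import json

# Hypothetical config: index name, host, and selectors are placeholders.
config_text = """
{
  "index_name": "my_docs",
  "start_urls": [
    {
      "url": "https://docs.example.com/(?P<version>.*?)/",
      "variables": {"version": ["6.0", "6.1", "6.2"]}
    }
  ],
  "selectors": {
    "lvl0": "h1",
    "text": "p"
  }
}
"""


def check_config(text):
    """Return (True, parsed) if text is valid JSON, else (False, error message)."""
    try:
        return True, json.loads(text)
    except json.JSONDecodeError as err:
        # Report where the parser gave up, e.g. a trailing comma or missing brace.
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"


ok, result = check_config(config_text)
print(ok)
```

If the check fails, the line and column in the message usually point at a missing closing brace or a stray comma.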

If you’re looking for example usages of each feature, you can use the algolia/docsearch-configs repository, e.g. for the variables feature: