Algolia sometimes does not recognize the tags and variables

Hi folks, I’m facing an intermittent issue using Algolia with the DocSearch scraper.

For some reason, the output of some scraper runs is not parsed correctly by Algolia: sometimes Algolia identifies the tags and variables on the records, and sometimes it does not.

In this image, you can see that the record has the tag “coder”, but the dashboard is not showing the filters, and a search that uses “coder” as a tag in the query parameters returns no results:


However, after running the scraper again, the dashboard returns the correct values.

Here is the DocSearch scraper config:

{
    index_name: "docs",
    start_urls: [
      {
        url: "https://coder.com/docs/coder/(?P<version>.*?)",
        variables: {
          version: ["latest", "v1.20", "v1.19"],
        },
        tags: ["coder"],
      },
      {
        url: "https://coder.com/docs/code-server/(?P<version>.*?)",
        variables: {
          version: ["latest", "v3.10.2"],
        },
        tags: ["code-server"],
      },
    ],
    selectors: {
      lvl0: "section .crumbs a:last-child",
      lvl1: "section header h1",
      lvl2: "section h2",
      lvl3: "section h3",
      lvl4: "section h4",
      lvl5: "section h5",
      text: "section p",
    },
    min_indexed_level: 2,
}
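
For readers unfamiliar with the `variables` syntax, here is a minimal Python sketch of how each `(?P<name>.*?)` placeholder in a start URL is understood to be replaced by every listed value, with the value then attached to the records from that URL. This is an illustration under that assumption, not the scraper's actual code; `expand_start_url` and `PLACEHOLDER` are hypothetical names.

```python
import itertools
import re

# Matches the literal placeholder text "(?P<name>.*?)" inside a start URL.
PLACEHOLDER = re.compile(r"\(\?P<(\w+)>\.\*\?\)")


def expand_start_url(entry):
    """Return (url, attributes) pairs for one start_urls entry.

    Hypothetical helper: each placeholder is replaced by every value
    listed under "variables", and the chosen values plus the tags are
    the attributes expected on records scraped from that URL.
    """
    names = PLACEHOLDER.findall(entry["url"])
    variables = entry.get("variables", {})
    value_lists = [variables[name] for name in names]

    results = []
    for combo in itertools.product(*value_lists):
        values = dict(zip(names, combo))
        url = PLACEHOLDER.sub(lambda m: values[m.group(1)], entry["url"])
        attributes = dict(values, tags=entry.get("tags", []))
        results.append((url, attributes))
    return results


entry = {
    "url": "https://coder.com/docs/coder/(?P<version>.*?)",
    "variables": {"version": ["latest", "v1.20", "v1.19"]},
    "tags": ["coder"],
}
for url, attributes in expand_start_url(entry):
    print(url, attributes)
```

With the first entry of the config above, this yields three URLs, one per version, each carrying a `version` value and the `coder` tag.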

Any help is appreciated, thanks.

Hey Bruno,

You can ensure that these variables will be added as facets by adding them to the attributesForFaceting array in the custom_settings of your config.

Here is your updated config:

{
  "index_name": "docs",
  "start_urls": [
    {
      "url": "https://coder.com/docs/coder/(?P<version>.*?)/",
      "variables": {
        "version": [
          "latest",
          "v1.20",
          "v1.19"
        ]
      },
      "tags": [
        "coder"
      ]
    },
    {
      "url": "https://coder.com/docs/code-server/(?P<version>.*?)/",
      "variables": {
        "version": [
          "latest",
          "v3.10.2"
        ]
      },
      "tags": [
        "code-server"
      ]
    }
  ],
  "selectors": {
    "lvl0": "section .crumbs a:last-child",
    "lvl1": "section header h1",
    "lvl2": "section h2",
    "lvl3": "section h3",
    "lvl4": "section h4",
    "lvl5": "section h5",
    "text": "section p"
  },
  "min_indexed_level": 2,
  "custom_settings": {
    "attributesForFaceting": [
      "tags",
      "version"
    ]
  },
  "nb_hits": 4471
}
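
If you want to double-check a config before running the crawl, a small script along these lines could confirm that every tag and variable name is declared under `attributesForFaceting`. It is only a sketch: `missing_facets` is a hypothetical helper, and it assumes the config has already been loaded into a Python dict (for example with `json.load`).

```python
import re

# attributesForFaceting entries may be wrapped in a modifier such as
# filterOnly(tags) or searchable(version); unwrap those before comparing.
MODIFIER = re.compile(r"^(?:filterOnly|searchable)\((.+)\)$")


def missing_facets(config):
    """Return the tag/variable attributes not declared as facets."""
    declared = set()
    settings = config.get("custom_settings", {})
    for attr in settings.get("attributesForFaceting", []):
        match = MODIFIER.match(attr)
        declared.add(match.group(1) if match else attr)

    needed = set()
    for entry in config.get("start_urls", []):
        if not isinstance(entry, dict):
            continue  # plain string URLs carry no tags or variables
        if entry.get("tags"):
            needed.add("tags")
        needed.update(entry.get("variables", {}))

    return sorted(needed - declared)


config = {
    "start_urls": [
        {
            "url": "https://coder.com/docs/coder/(?P<version>.*?)/",
            "variables": {"version": ["latest", "v1.20", "v1.19"]},
            "tags": ["coder"],
        }
    ],
    "custom_settings": {"attributesForFaceting": ["tags", "version"]},
}
print(missing_facets(config))
```

An empty list means every tag and variable used in `start_urls` is covered by the facet settings.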

Hope this answers your question :smiley:

I will try this and see if that solves the issue. Thanks.

Even after adding this config, Algolia sometimes still does not interpret the records correctly, as you can see below. The record has a tag, but the UI is not showing the filter widget on the left.

As extra info, when this happens, the “Configure searchable attributes” and “Configure custom ranking” sections are empty.

Hi Bruno,

I tried to reproduce this by running the scraper around 10 times, but the facets were never missing. The solution above is also what we use in production.

Could you please:

  • Make sure you have the latest Docker image (or have pulled master if you are running from the codebase)
  • Manually delete both indices
  • Check the docs_tmp index created during the crawl, to see if this one contains the facets
  • Make sure only one job is running on this index

Is the number of hits different with and without facets? If so, it could be due to client-side rendered content, so it might be worth adding the js_render option.
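
To compare a good run with a bad one, something like the following Python sketch could help. It assumes you have exported the records of each index (for example docs and docs_tmp) into a list of dicts; `summarize_records` is a hypothetical helper, and the sample records are made up for illustration.

```python
from collections import Counter


def summarize_records(records, attributes=("tags", "version")):
    """Count the records and how many lack each expected attribute.

    A record counts as missing an attribute when the key is absent
    or its value is empty (None, "", or an empty list).
    """
    missing = Counter()
    for record in records:
        for attr in attributes:
            if not record.get(attr):
                missing[attr] += 1
    return {"total": len(records), "missing": dict(missing)}


# Made-up sample records, only to show the shape of the output.
sample = [
    {"objectID": "1", "tags": ["coder"], "version": "latest"},
    {"objectID": "2", "tags": [], "version": "v1.20"},
    {"objectID": "3", "version": "v1.19"},
]
print(summarize_records(sample))
```

If the totals differ between runs, that points towards content that is not always rendered; if the totals match but attributes are missing, the records themselves were built without tags or variables.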

Hope this gives you hints,
Have a nice day!