Crawler not detecting content in web component shadow DOM

Hello -

I’m having trouble getting the Algolia Crawler to detect my content. I have a web component that renders a table of information sourced from an external JSON file; the final output is a table rendered in the web component’s shadow DOM, e.g. <custom-element file=""></custom-element>

Examining the indexed records, the content comes back empty on these pages. These particular pages contain only the web component, so the crawler thinks there’s no content. I configured the plugin in my netlify.toml with renderJavaScript = true, but still no luck.

Is there any way to hint to the crawler to collect what’s in the shadow DOM? Perhaps there’s a timing issue at play and the crawler isn’t waiting long enough for the web component to fully render. If you have any other suggestions, I’m all ears.

Thank you :slight_smile:

Hey @davehudson52

I wonder if you have looked at the renderJavaScript documentation, specifically waitTime?

Hi @davehudson52, could you share an example URL that shows this behaviour so we can check?

@the Indeed it could help if it’s confirmed to be a timing issue, but this parameter is not exposed to the Netlify plugin. Let’s see what the issue is first :slight_smile:


Hello -

Thank you for the replies!

Here is a functional link, with the page I’d like crawled and indexed:

If you inspect the code on the Breadcrumbs page default tab/API tab, you’ll see a <customelement-manifest-element> web component that renders a table in its shadow root:

Here’s a screenshot of the contents of this page according to the index record; none of the web component content is captured in the record:

If you click on the Demos tab from the Breadcrumbs page, there are demos that are loaded via JS, and these are being indexed correctly. It’s worth noting that the demos are also generated by a web component, <docs-demos>, but one created without a shadow DOM, so this content is in the light DOM:

Screenshot of ‘Demos’ tab record:

I’ll take a shot at playing with waitTime, maybe a buffer will give it time to detect additional content.
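
To make the light DOM vs shadow DOM difference above concrete, here is a minimal TypeScript sketch. It is only a simplified model: ModelNode, lightText, deepText, apiTab and demosTab are hypothetical names made up for illustration, not the real DOM API and not how the Algolia Crawler is implemented. It just shows why a text extraction that reads only light DOM children comes back empty for a component that renders into its shadow root, while a light DOM component like the demos one is picked up.

```typescript
// Simplified stand-in for a DOM node (hypothetical, NOT the real DOM API).
interface ModelNode {
  text?: string;          // text directly inside this node
  children?: ModelNode[]; // light DOM children
  shadowRoot?: ModelNode; // content rendered inside an open shadow root
}

// Reads text from light DOM children only; shadow roots are skipped.
function lightText(node: ModelNode): string {
  const parts = [node.text ?? "", ...(node.children ?? []).map(lightText)];
  return parts.filter((p) => p.length > 0).join(" ");
}

// Reads text from the light DOM and also descends into shadow roots.
function deepText(node: ModelNode): string {
  const parts = [
    node.text ?? "",
    ...(node.shadowRoot ? [deepText(node.shadowRoot)] : []),
    ...(node.children ?? []).map(deepText),
  ];
  return parts.filter((p) => p.length > 0).join(" ");
}

// Models the API tab: a component whose table lives in its shadow root.
// The cell texts are invented placeholders.
const apiTab: ModelNode = {
  children: [{ shadowRoot: { children: [{ text: "Property" }, { text: "Type" }] } }],
};

// Models the Demos tab: a component that renders into the light DOM.
const demosTab: ModelNode = {
  children: [{ children: [{ text: "Basic demo" }] }],
};

console.log(JSON.stringify(lightText(apiTab))); // ""
console.log(lightText(demosTab));               // Basic demo
console.log(deepText(apiTab));                  // Property Type
```

In a real browser the same idea applies: an element’s textContent does not include what is inside its shadow root, and element.shadowRoot is only reachable from outside when the root is open.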

Thank you both for the help and suggestions!

Hi Dave, so after a few tests it seems that you are using tech that is too advanced, hehe:

  • Firefox 92: Doesn’t work, I see the following error in the console: Uncaught SyntaxError: unexpected token: identifier (customelement-manifest-element.js:7:55)
  • Chromium 93: Doesn’t work: Uncaught TypeError: "css" is not a valid module type.
  • It works with Chrome 93 or Chromium 96 though.

We are using Puppeteer for the renderJavaScript feature, and the latest version uses Chromium 93:

So it will eventually work when Puppeteer is updated to a newer Chromium version, but personally I think it would be great to make it available for Firefox users too :fox_face: :slight_smile: