Content truncated on some pages

I’ve been trying to debug an issue where the algolia-netlify plugin works well and indexes the content on most of the pages of my Nuxt site, but on a couple of pages only the first couple of HTML tags make it into the content and nothing else. There are no error messages.

I’ve been working on the assumption that there is something in the rendered HTML that the crawler doesn’t like, or sees as the end of the document, and stops there. My very basic debugging approach has been to comment out sections of code, redeploy to Netlify, and see whether the index for those pages then looks correct.

Can I ask if anyone can help me out with any aspects of this, please? Namely:

  • Is there a better way to debug this than redeploying to Netlify?
  • Any suggestions on what might be causing this, or where I can look to see in more detail what the crawler is doing?

This is one of the pages causing the issue: https://dev--aurora-alliance.netlify.app/what-we-do - the content for this page in the index is only: “Universities Professional Development. Learning for Societal Impact, Engaging Communities and Sustainability.”

The site is a statically generated Nuxt site; all the data is fetched using asyncData() and loaded on the server side.

Any suggestions much appreciated!

Hi @andy2.
We indeed haven’t made all our crawler debugging tools available for the Netlify plugin yet, so for now you need to contact us, sorry about that!
In this particular case, there might be a problem on our side with how we determine the “main content” of the page (we use headings to determine it, and it seems that here the crawler only keeps the first two headings).
We’ll have a look and get back to you!


Hi again, we have released a fix to the logic. To give a bit more detail on what happens internally: when there is only one h1 on the page, we were using the first h1 and the first h2 to determine the main “content” node. This means that in the following case:

```html
<div id="main">
  <div id="foo">
    <h1>Title</h1>
    <h2>Subtitle</h2>
  </div>
  <div id="bar">
    <h2>Subtitle 2</h2>
  </div>
</div>
```

The main content node detected was <div id="foo">, so everything in <div id="bar"> was left out of the index.
We are now using the first h1 and the last h2, which will solve the problem on the page you mentioned. I’ll let you retest!
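To make the heading logic above concrete, here is a minimal Python sketch that reproduces both behaviors on the sample markup. It assumes the “main content” node is the lowest common ancestor of the chosen h1 and h2; the thread doesn't spell out exactly how the crawler derives it, and every function name below is hypothetical, not Algolia's crawler code. The standard-library XML parser stands in for a real HTML parser only because the sample is well-formed.

```python
import xml.etree.ElementTree as ET

# The example markup from the post. It is well-formed, so ElementTree can parse it.
SAMPLE = """
<div id="main">
  <div id="foo">
    <h1>Title</h1>
    <h2>Subtitle</h2>
  </div>
  <div id="bar">
    <h2>Subtitle 2</h2>
  </div>
</div>
""".strip()


def ancestors(node, parent_of):
    """Return the chain [node, parent, grandparent, ..., root]."""
    chain = [node]
    while chain[-1] in parent_of:
        chain.append(parent_of[chain[-1]])
    return chain


def lowest_common_ancestor(a, b, parent_of):
    """Return the deepest element that contains both a and b."""
    b_chain = set(ancestors(b, parent_of))
    for node in ancestors(a, parent_of):
        if node in b_chain:
            return node
    return None


def main_content_node(root, use_last_h2):
    """Hypothetical heuristic: the main content is the lowest common ancestor
    of the first <h1> and either the first or the last <h2>."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    h1s = list(root.iter("h1"))
    h2s = list(root.iter("h2"))
    if not h1s or not h2s:
        return root  # assumption: fall back to the whole document
    h2 = h2s[-1] if use_last_h2 else h2s[0]
    return lowest_common_ancestor(h1s[0], h2, parent_of)


def indexed_text(node):
    """Collapse all text under node into one whitespace-normalized line."""
    return " ".join(" ".join(node.itertext()).split())


root = ET.fromstring(SAMPLE)
old = main_content_node(root, use_last_h2=False)  # first h1 + first h2
new = main_content_node(root, use_last_h2=True)   # first h1 + last h2
print(old.get("id"), "->", indexed_text(old))  # foo -> Title Subtitle
print(new.get("id"), "->", indexed_text(new))  # main -> Title Subtitle Subtitle 2
```

With the first h2, the common ancestor is <div id="foo"> and the “Subtitle 2” section is dropped, which mirrors the truncated content in the original report; with the last h2, the node widens to <div id="main"> and all three headings are kept.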

Thanks for your report and don’t hesitate if you see anything else that can be improved!

Hi Silvain - That's fixed it! Thank you for your super quick reply and resolution.

Really liking how quickly and easily I’ve been able to get full-text search on my static Nuxt site using this plugin :+1:
