Index code examples

The initial DocSearch config for our site docs.contao.org has the following configured for the "text" selector:

      "text": "#body-inner p, #body-inner li"

Now we noticed that code examples, whether they occur within paragraphs like so:

<p>Lorem ipsum <code>code example</code> dolor sit amet.</p>

or on their own like so:

<div class="highlight">
<pre …><code class="language-php" data-lang="php">
<span …>// some code</span>
<span …>// some more code</span>
</code></pre>
</div>

are not indexed at all. This is not ideal for documentation that deals with lots of code examples, where you also want to be able to find anything within those examples.

However, I am not sure how to adjust the selector for this. Would it be sufficient to simply change it to

      "text": "#body-inner p, #body-inner li, #body-inner code"

for example?

On that note: does Algolia ignore any <span>s by default?

Hi,

Selectors behave the same way as a document.querySelectorAll() call, so selecting a paragraph also selects all of its children.

Could you please give me an example where code examples nested in a paragraph are not indexed?

We usually don’t recommend indexing <code> blocks because it creates a lot of noise in the search results and it can be misleading.

In your case, adding section code[class*="language"] would index code blocks such as the two on this page: Config :: Contao Developer Documentation
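For illustration, a sketch of how that selector might be combined with the existing "text" value, assuming it is simply appended. Note that the double quotes inside the CSS attribute selector must be escaped in JSON (single quotes, which CSS also accepts, would work too). The substring match [class*="language"] covers classes such as language-php:

      "text": "#body-inner p, #body-inner li, section code[class*=\"language\"]"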

Ah, I see, I think the problem is something else then. I only searched for examples that happen to be in table cells, not in regular paragraphs. So maybe we just need to expand the selector with #body-inner table to cover these parts, and then we'll re-evaluate whether we really need to index full code examples.
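A sketch of the expanded selector, assuming #body-inner table is just appended to the initial value. Since selectors behave like document.querySelectorAll(), each matched table would presumably become a single hit containing the text of all its cells:

      "text": "#body-inner p, #body-inner li, #body-inner table"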

I noticed that the default configuration set up for us uses #body-inner li rather than, for example, #body-inner ul. Is there any difference or advantage? Following that logic, should we use #body-inner table, or #body-inner th, #body-inner td?

Using ul instead of li would treat all the items of a list as a single hit, whereas li treats each item of the list as a separate hit.
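To make the difference concrete, here are two variants of the "text" selector. The first is the current default and yields one hit per list item:

      "text": "#body-inner p, #body-inner li"

The second is a sketch that yields one hit per whole list. Adding ol is an assumption on our part: replacing li with ul alone would stop matching items of ordered lists entirely.

      "text": "#body-inner p, #body-inner ul, #body-inner ol"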

Keep in mind that indexing td or th could lead to results like Sorting mode (integer), which don't make much sense out of their context.

If you'd like to index a table (usually, the first column), we recommend adding a special class or id to the td to avoid unwanted results.
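A sketch of that recommendation. The class name searchable is hypothetical: it stands for whatever class the page templates would add to the relevant cells (e.g. <td class="searchable">), so that only marked cells are matched:

      "text": "#body-inner p, #body-inner li, #body-inner td.searchable"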

Wouldn’t that be preferable anyway? Semantically I see <ul> the same way as <p>. It’s content that belongs together contextually.

In our case there are a lot of tables where the first column is the header for something, e.g. a setting, which should be indexed, and the second column is the explanation for that setting. Both should ideally be indexed.

Unfortunately we do not have much control over that, and it wouldn't always fit anyway. As I said, we do want to index the whole table, not just a specific column; otherwise you would not be able to find terms from both columns.

> Wouldn't that be preferable anyway? Semantically I see <ul> the same way as <p>. It's content that belongs together contextually.

I guess it depends on your preferences: since you have full control over how your content is indexed, you can decide to use ul instead of li.

However, for some pages, such as Framework :: Contao Developer Documentation, it might not be the best choice, as the hit will not be formatted the same way.

[Screenshots: ul vs. li]

> Unfortunately we do not have much control over that. And it wouldn't always fit anyway - and as I said, we do want to index the whole table, not just a specific column, otherwise you would not be able to find a term from either.

It might be possible to index the whole row with #body-inner tbody tr, but I think the query would have to be very precise for a hit like this to appear among the top results, given its length.
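A sketch of the row-based variant, assuming it replaces #body-inner table rather than being added alongside it (matching both would index every row twice, once on its own and once as part of the whole table). Each hit would then combine a setting from the first column with its explanation from the second:

      "text": "#body-inner p, #body-inner li, #body-inner tbody tr"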


Ah, I see now :slight_smile:

Yeah, I think we're gonna go this route. I'm not overly concerned about the results, though.