Indexing too many records

Hi there!
I am trying out Algolia’s WordPress search plugin.
I have 7,010 pages to index, but indexing them creates far too many records (more than 20,000, which hits my record limit). Why don’t I end up with just 7,010 records?

Thanks for your help!

Hey @nico_lrx!

Thanks for your question :)

Our WordPress plugin is designed to optimize indexing when content is very long. It does so by splitting pages or posts into smaller chunks of text, each of which becomes an individual record.

We do that because it greatly improves relevance and speed, and it returns the most relevant chunk of text instead of a full piece of content.
Side note: by default, we enable the distinct feature, which avoids returning multiple results for the same page or post (it relies on an id to link the split parts together). More info about this here.
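
To make the record count easier to picture, here is a rough sketch of the splitting idea in plain PHP. It is not the plugin's actual code: the function name `split_post_into_records`, the 1,000-character limit, and the `objectID` / `post_id` field layout are all assumptions chosen for illustration.

```php
<?php
/**
 * Hypothetical helper (not part of the plugin): split one post into
 * several records that all share the same post_id.
 */
function split_post_into_records( int $post_id, string $title, string $content, int $max_chars = 1000 ): array {
    // Collapse whitespace so existing line breaks do not create extra chunks.
    $text = trim( preg_replace( '/\s+/', ' ', $content ) );

    // wordwrap() breaks on spaces, keeping each chunk at or under $max_chars.
    $chunks = explode( "\n", wordwrap( $text, $max_chars, "\n", true ) );

    $records = array();
    foreach ( $chunks as $index => $chunk ) {
        $records[] = array(
            'objectID' => $post_id . '-' . $index, // assumed id scheme: one id per chunk
            'post_id'  => $post_id,                // shared id that links the chunks together
            'title'    => $title,
            'content'  => $chunk,
        );
    }

    return $records;
}
```

With a scheme like this, a page whose text spans three chunks yields three records. Going from 7,010 pages to more than 20,000 records is consistent with an average of roughly three chunks per page.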

Does that make sense?

More technical information is available in the FAQ: https://community.algolia.com/wordpress/frequently-asked-questions.html#i-have-more-records-than-i-have-posts-is-that-normal-

Hope this helps!

Thanks @julien.paroche, that’s what I thought.
Is there a way to index only the title of each page with the WordPress plugin? (The content is not relevant.)

Yes, you can define a custom parser in which you set which attributes from your content are sent to Algolia: https://community.algolia.com/wordpress/index-schema.html#introduction
There is a code example further down the page.

Would that work for you?

Thanks. I added the code to my functions.php file, but when I re-index, the content is still indexed. Maybe I did not fully understand how the function works. Here is my code:

function my_custom_parser( Algolia\DOMParser $parser ) {
    // Custom selectors.
    $parser->setAttributeSelectors( array(
    ) );

    // Custom exclusion rules.
    $parser->setExcludeSelectors( array(
        'pre', '.related-articles', '#toc', '#social-medias', 'title', 'div.heading', 'p.sub-header', 'h4', 'h5', 'h6', 'p', 'ul', 'ol', 'dl', 'table'
    ) );

    return $parser;
}

add_filter( 'algolia_post_parser', 'my_custom_parser' );

Hi @nico_lrx,

I’m not an expert on the parser configuration for WordPress, but I think you may need to remove the $parser->setAttributeSelectors(array()); line so that the default selectors apply, and keep only the exclusion list you have below it.

Keep in mind that doing so may require updating the templates too, because they rely on attributes that will probably not exist with your specific rules.
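
For reference, this is roughly what the snippet might look like with that line removed. It is an untested sketch that reuses only the names from your own code (`my_custom_parser`, `Algolia\DOMParser`, `setExcludeSelectors`, `algolia_post_parser`); whether the default selectors then behave as hoped is an assumption on my side.

```php
<?php
// Untested sketch for functions.php: same filter as before, but without the
// empty setAttributeSelectors() call, so the plugin's defaults stay in place.
function my_custom_parser( Algolia\DOMParser $parser ) {
    // Exclusion rules copied unchanged from the original snippet.
    $parser->setExcludeSelectors( array(
        'pre', '.related-articles', '#toc', '#social-medias', 'title',
        'div.heading', 'p.sub-header', 'h4', 'h5', 'h6',
        'p', 'ul', 'ol', 'dl', 'table',
    ) );

    return $parser;
}

add_filter( 'algolia_post_parser', 'my_custom_parser' );
```

You would need to re-index after changing the filter for the new rules to take effect.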

If you want a simpler solution, you can disable splitting in the config: define('ALGOLIA_SPLIT_POSTS', false);
(Details are in the link I shared in the first message.)
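
As a minimal sketch, the constant could be set like this. Placing it in wp-config.php is an assumption here (it is the usual home for WordPress constants); the FAQ linked earlier is the reference for the exact location.

```php
<?php
// wp-config.php (assumed location): add this above the line that tells you
// to stop editing, so the constant is defined before plugins are loaded.
define( 'ALGOLIA_SPLIT_POSTS', false );
```

With splitting disabled, each page or post should map to a single record, so a re-index is needed before the record count drops.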

Tell me how that goes!

It works like a charm! Thank you @julien.paroche : http://conjugaison.lalanguefrancaise.com/
