DocSearch Scraper with Anchors

Hi! I’m integrating DocSearch with our website, and using DocSearch scraper to index the site. I would like to use anchors in order to jump to the correct location on the page when a user clicks on a search result. However, when I use anchors I get many duplicates, because the full page is reindexed for each anchor href. Is it possible to only index the element that is referred to by the webpage’s url / anchor when crawling? The “:target” css selector looks promising but has not been working for me.

👋 @canyon

Thank you for reaching out.

The duplicates you noticed come from the way the DocSearch scraper builds records from a page. It creates one record for each element matching a lvlX selector defined in your configuration. You can find more details about this strategy in our dedicated documentation.

I suggest making sure that every DOM element matching the lvlX selectors has a unique id or name attribute. These attributes define the anchor to use, so the redirection can scroll directly to the exact location of the matching element.

Another solution is to index only the records whose text (content) attribute is not null. To do so, enable the only_content_level setting in your DocSearch configuration. You can find more details about this setting in our dedicated documentation.
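As a rough sketch of what that could look like, here is a minimal scraper configuration with the setting switched on. The index name, start URL, and selectors are placeholders rather than values from your site, so adapt them to your own markup:

```json
{
  "index_name": "example-docs",
  "start_urls": ["https://www.example.com/docs/"],
  "selectors": {
    "lvl0": "h1",
    "lvl1": "h2",
    "lvl2": "h3",
    "text": "p, li"
  },
  "only_content_level": true
}
```

With this in place, the scraper should create records only for elements matching the text selector, while the lvlX headings still provide the hierarchy and the anchor for each record.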

The CSS pseudo-class :target will not help here. It is useful when you need to style an element whose id matches the URL’s fragment, so the URL must have a fragment defined (the text after # in the URL). A use case for this selector would be to emphasize the section of the page a user has been redirected to.

Let us know if you hit any roadblocks along the way.

Thanks for the thorough response!
