nbHits 0 for <indexName>

I am trying to run the docsearch-scraper library on a webpage to crawl and index its data, as listed under the instructions here.

When I run the command ./docsearch run config.json, I get the error below:
Crawling issue: nbHits 0 for <indexName>

Here’s my config.json:

{
  "index_name": "<indexName>",
  "start_urls": [
    "https://help.developer.intuit.com/s/article/Listing-on-the-Apps-for-QuickBooks-Desktop-store"
  ],
  "selectors": {
    "lvl0": "h1.selfServiceArticleHeaderDetail",
    "text": "p"
  }
}

Any pointers would be really helpful. Thanks.

Hi @anil_kumar3,

Thanks for contacting Algolia. Currently I see that your configuration selectors are as follows:

  "selectors": {
    "lvl0": "h1.selfServiceArticleHeaderDetail",
    "text": "p"
  }

We provide some “Tips” including:

Please check whether the Tips help your configuration reach the right content!

Thanks, @ajay.david. I did use the suggested tips and also referred to a few configs listed here.

I chose the selectors to drill down from lvl0: article to lvl1: h1, but to no avail. Below are the updated selectors:

  "selectors": {
    "lvl0": ".siteforceSldsTwoCol84SidebarFeaturedLayout.siteforceContentArea article",
    "lvl1": ".summary h1",
    "text": ".slds-form-element__control slds-grid itemBody span p *"
  } 

I wonder whether the crawler is unable to find the HTML elements because the page is hosted on salesforce-community. I would appreciate any thoughts if you have encountered a similar situation.

Thanks!

Hi @anil_kumar3,

We can ask the team internally. Does hosting a page on salesforce-community mean that access for your users is password-protected or involves some other form of authentication? Thank you.

The URL that I have used above does not need any user authentication.

:wave: @anil_kumar3

The issue here is that the website requires JavaScript to run in order to render correctly, so the crawler finds none of the selected elements in the raw HTML.

You will need to add the following attribute to your configuration:

"js_render": true,

You can find out more details here.
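
For reference, here is a minimal sketch of how the original config.json might look with that attribute added. It reuses the index name placeholder, start URL, and selectors from the first post unchanged. Placing "js_render" at the top level next to "start_urls" is an assumption on my part, and whether these selectors match the rendered page has not been verified.

```json
{
  "index_name": "<indexName>",
  "start_urls": [
    "https://help.developer.intuit.com/s/article/Listing-on-the-Apps-for-QuickBooks-Desktop-store"
  ],
  "js_render": true,
  "selectors": {
    "lvl0": "h1.selfServiceArticleHeaderDetail",
    "text": "p"
  }
}
```

If you go back to the more specific selectors from the later post, note that in the "text" selector slds-grid and itemBody have no leading dot, so CSS would treat them as element names. If they are meant to be class names, they would likely need to be written as .slds-grid and .itemBody.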