Locally hosted DocSearch scraper unable to crawl a site running on a local web server

Previously, I had requested that DocSearch be enabled on my documentation site. It was done and has been working well so far. However, for various reasons, I need to make the site private, and it won't be publicly accessible in the future.

Therefore, I installed the DocSearch scraper locally, created a new app in my Algolia account, reused the same .json config file that was used to scrape my public site (downloaded from the public GitHub repo), and updated start_urls to point to the http://localhost:<port>/doc URL instead of the https://www.example.com/doc URL.
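For context, the only edit to the config was the start_urls entry. A minimal sketch of the change (the 4000 port and the index name are illustrative placeholders, not my actual values):

{
  "index_name": "my-docs-local",
  "start_urls": [
    "http://localhost:4000/doc"
  ]
}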

However, the scraper only indexes the page listed under start_urls; it doesn't follow the links on the starting page the way it does on the public site. If I change the URL back to the public site and run the local DocSearch scraper again, it is able to follow the links on the page.

So there's something about running DocSearch locally that prevents it from following the links on the localhost server. The content of the page is exactly the same whether it's served locally or on the public site.

I’m using Jekyll to build my site.

Is there a limitation or a bug, or am I missing some configuration?

Any help would be appreciated.

:wave: @deepfriedbrain

Could you try removing the port, please? We do not handle it, IIRC.

Cheers

Hi Sylvain,

The application is running on a local web server. How will I access it without the port number?

Hi,

Can you run your local web server on port 80 (or 443 for HTTPS)? In that case, you won't need to specify a port to the crawler.
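For a Jekyll site, one way to do this is to pass the port explicitly to jekyll serve (a sketch; binding to port 80 typically requires elevated privileges, and sudo combined with bundle exec may need extra setup depending on your Ruby install):

sudo bundle exec jekyll serve --port 80

start_urls can then point to http://localhost/doc with no port.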

Awesome! That worked. Thank you so much, Alefort and Sylvain.

Hi,

I’m also trying to achieve this. Somehow, running
./docsearch docker:run example.json returns:

2019-01-31 15:37:04 [127.0.0.1] ERROR: Failure without response 
Connection was refused by other side: 111: Connection refused.

Crawling issue: nbHits 0 for 127.0.0.1

My website is running on http://127.0.0.1:80.

My config file contains:

{
  "index_name": "127.0.0.1",
  "start_urls": [
    "http://127.0.0.1"
  ],
  "stop_urls": [],
  "selectors": {
    "lvl0": "FIXME h1",
    "lvl1": "FIXME h2",
    "lvl2": "FIXME h3",
    "lvl3": "FIXME h4",
    "lvl4": "FIXME h5",
    "lvl5": "FIXME h6",
    "text": "FIXME p, FIXME li"
  }
}

I’m using npm’s http-server.

Any idea?

EDIT: I'm a Docker noob. Running the scraper from inside the container won't work since it can't see the host on my machine. OK… sorry about this.
All good now :+1:
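For anyone hitting the same Connection refused error: the scraper runs inside a Docker container, so 127.0.0.1 in start_urls resolves to the container itself rather than the machine serving the site. One possible workaround (a sketch, assuming Docker Desktop on macOS or Windows, where the special host.docker.internal hostname resolves to the host machine) is to point the config at the host instead:

{
  "index_name": "127.0.0.1",
  "start_urls": [
    "http://host.docker.internal"
  ]
}

On Linux, host.docker.internal isn't available by default, so using the machine's LAN IP address in start_urls is an alternative.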