Setting up DocSearch for Private Documentation

I’m trying to set up DocSearch for my company’s private documentation, but the docs at https://docsearch.algolia.com/docs/run-your-own are pretty sparse. Can someone point me toward resources for getting the crawler to authenticate when visiting the start URL?

Current behavior:

Ignored from start url: https://[redirect url for login]

Crawling issue: nbHits 0 for [index_name]

:wave: @canyon,

Unfortunately, the DocSearch crawler is not able to access content behind a login.

The scraper is built on Scrapy. If possible, I would recommend whitelisting either the crawler's IP address or its user agent. Alternatively, you can fork the scraper and follow this recommendation from the Scrapy documentation. Please note that this route means maintaining your own fork and developing your own solution.

Best regards

Thanks for getting back to me! I’m doing a rendered crawl and was able to add some authenticated cookies to the driver in custom_downloader_middleware.py in docsearch-scraper as a temporary workaround.
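
For anyone landing here later, the workaround looked roughly like the sketch below. This is not the actual code from docsearch-scraper: `add_auth_cookies` is a hypothetical helper, the cookie name and URL in the usage note are placeholders, and it assumes the middleware holds a Selenium-style driver exposing `get()` and `add_cookie()`. It also assumes the root of the docs host loads without redirecting to a different domain, since Selenium rejects cookies that do not match the domain of the current page.

```python
from urllib.parse import urlsplit


def add_auth_cookies(driver, start_url, cookies):
    """Attach pre-authenticated cookies to a Selenium-style driver.

    driver    -- object with get(url) and add_cookie(dict), e.g. a Selenium WebDriver
    start_url -- the protected page the crawl should start from
    cookies   -- mapping of cookie name to value, copied from a logged-in session
    """
    # Selenium only accepts cookies for the domain of the page currently
    # loaded, so visit the root of the target host first.
    parts = urlsplit(start_url)
    driver.get(f"{parts.scheme}://{parts.netloc}/")

    for name, value in cookies.items():
        driver.add_cookie({"name": name, "value": value, "path": "/"})

    # Load the start URL again so it is rendered with the session cookies set.
    driver.get(start_url)
```

In a fork, something like `add_auth_cookies(self.driver, "https://docs.example.com/", {"session": "..."})` could be called once before the first rendered request. Session cookies expire, so this stays a temporary fix rather than a real login flow.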
