Typesense/docsearch-scraper container works on localhost but not on other internal systems

For some reason, I can use the typesense/docsearch-scraper container to scrape my locally hosted Docusaurus site, but it does not seem to 'see' a different Docusaurus site hosted on another internal system. Has anyone else encountered a similar issue?

The env file has the following content:


The config JSON file has the following content:

{
  "index_name": "Documentation",
  "start_urls": [
    {
      "url": "http://doc-dev.wyden.io/docs/book/"
    }
  ],
  "js_render": true,
  "selectors": {
    "lvl0": "h1",
    "lvl1": "h2",
    "lvl2": "h3",
    "lvl3": "h4",
    "lvl4": "h5",
    "lvl5": "h6",
    "text": "p, li"
  },
  "scrape_start_urls": true,
  "strip_chars": " .,;:#"
}
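
In case it helps anyone reproduce this, here is a minimal sketch (Python, standard library only) for sanity-checking a config like the one above before handing it to the container. It parses the config text and lists the start URLs. The function name load_start_urls is hypothetical, and accepting both plain-string entries and objects with a "url" key is an assumption about what a config may contain.

```python
import json


def load_start_urls(config_text):
    """Parse scraper config text and return its start URLs as a list of strings."""
    # json.loads raises json.JSONDecodeError if the text is not valid JSON,
    # for example when curly quotes or missing braces have crept in.
    config = json.loads(config_text)
    urls = []
    for entry in config.get("start_urls", []):
        # Assumption: an entry is either a plain string or an object with a "url" key.
        urls.append(entry["url"] if isinstance(entry, dict) else entry)
    return urls


if __name__ == "__main__":
    # File name taken from the command in this post.
    with open("./TEST-CONFIG-DOCDEV.json", encoding="utf-8") as handle:
        print(load_start_urls(handle.read()))
```

If the file fails to parse here, the jq step in the run command would presumably fail on it as well.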

The command used to run typesense/docsearch-scraper is as follows:

docker run -it --env-file=./Typesense-Scraper-DocDev.env -e "CONFIG=$(cat ./TEST-CONFIG-DOCDEV.json | jq -r tostring)" typesense/docsearch-scraper

(Note that Typesense-Scraper-DocDev.env is the env file and TEST-CONFIG-DOCDEV.json is the config.json file.)

When it runs, the relevant message is as follows:

INFO:scrapy.extensions.logstats:Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
DEBUG:scrapy.core.engine:Crawled (200) <GET http://doc-dev.wyden.io/docs/book/> (referer: None)
DEBUG:typesense.api_call:Making post /collections/CrucialDocumentation_1689867254/documents/import
> > DocSearch: http://doc-dev.wyden.io/docs/book/ 1 records)
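
The log shows that the start URL itself came back with a 200 and produced one record, but nothing else was crawled. One thing that may be worth checking (this is a guess, not a confirmed cause) is whether the links on that page resolve to URLs under the start URL, on the assumption that the scraper only follows links that match start_urls. A minimal sketch, using only the Python standard library, that lists such links from a page's HTML. The names LinkCollector and links_under_start_url are hypothetical, and the HTML is assumed to be the rendered page source saved from a browser.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # urljoin turns relative links into absolute ones.
            self.links.append(urljoin(self.base_url, href))


def links_under_start_url(html, start_url):
    """Return the links in html whose absolute URL begins with start_url."""
    collector = LinkCollector(start_url)
    collector.feed(html)
    return [link for link in collector.links if link.startswith(start_url)]
```

If this returns an empty list for the internal site but not for the locally hosted one, the links are probably pointing at a different host or path than the configured start URL.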

Does anyone have a clue as to why the Docker container cannot crawl this internal site? It can crawl a locally hosted site (with different env and config settings, of course), and it can also crawl public sites.