Hi!
On Ubuntu, I ran the algolia/docsearch-scraper Docker image, hoping to scrape a Docusaurus site served on localhost, port 80. Unfortunately, nothing was scraped. I searched the community for an explanation of why this did not work but could not find one.
The exact command I ran, on the same PC where Docusaurus was running on localhost, is:
docker run -it --network="host" --env-file=NEW_TEST.env -e "CONFIG=$(cat ./my_first_index.json | jq -r tostring)" algolia/docsearch-scraper
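One thing that may be worth ruling out: on Linux, the name host.docker.internal is not defined by default (Docker Desktop provides it; plain Docker Engine needs something like --add-host=host.docker.internal:host-gateway), while with --network="host" the container shares the host's network, so localhost refers to the host itself. Below is a minimal sketch, assuming Python 3 is available wherever you run it, that reports whether each host name in a list of start URLs resolves. resolve_start_hosts is a hypothetical helper, not part of DocSearch.

```python
import socket
from urllib.parse import urlsplit


def resolve_start_hosts(urls):
    """Map each distinct host name in `urls` to an IPv4 address, or None.

    Hypothetical diagnostic helper: it only checks name resolution from the
    machine (or container) it runs on; it does not fetch any pages.
    """
    results = {}
    for url in urls:
        host = urlsplit(url).hostname
        # Skip strings without a host part and host names already checked.
        if host is None or host in results:
            continue
        try:
            results[host] = socket.gethostbyname(host)
        except OSError:
            # socket.gaierror / socket.herror are both subclasses of OSError.
            results[host] = None
    return results
```

For example, resolve_start_hosts(["http://host.docker.internal/docs/book", "http://localhost/docs/book"]) returns None for any name that cannot be resolved where the check runs.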
The message that appears on the screen is as follows:
DocSearch: http://host.docker.internal/docs/Order-Configuration 0 records)
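One possible reading of "0 records" is that the page was fetched but none of the configured selectors matched. That could happen, for instance, if the server returned a login page (the docs require a user ID and password), or if the page came from the Docusaurus development server, which may render content in the browser instead of in the HTML. The sketch below is a rough stand-in for selectors such as "article h2" and "article p": it counts those tags inside <article> elements in a page's HTML. ArticleCounter and count_article_content are hypothetical names, and this is not full CSS matching (pseudo-classes such as td:first-child are ignored).

```python
from html.parser import HTMLParser


class ArticleCounter(HTMLParser):
    """Count heading/text tags that appear inside an <article> element."""

    COUNTED = {"h2", "h3", "h4", "h5", "h6", "p", "li", "td"}

    def __init__(self):
        super().__init__()
        self.article_depth = 0  # >0 while inside one or more <article> elements
        self.counts = {tag: 0 for tag in sorted(self.COUNTED)}

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag == "article":
            self.article_depth += 1
        elif self.article_depth and tag in self.COUNTED:
            self.counts[tag] += 1

    def handle_endtag(self, tag):
        if tag == "article" and self.article_depth:
            self.article_depth -= 1


def count_article_content(html):
    """Return per-tag counts of matches roughly equivalent to 'article <tag>'."""
    parser = ArticleCounter()
    parser.feed(html)
    parser.close()
    return parser.counts
```

To check a real page, you could pass it the decoded body of urllib.request.urlopen("http://localhost/docs/book"). All-zero counts would be consistent with the scraper finding nothing to index.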
The contents of NEW_TEST.env are as follows:
ALGOLIA_APP_ID=G1SHEWBZ5L
ALGOLIA_API_KEY=5fb47ade2ce99b9bdd51f264c002119d
ALGOLIA_INDEX_NAME_TMP=GERBIL
The contents of the my_first_index.json file are as follows:
{
  "index_name": "GERBIL",
  "start_urls": [
    {
      "url": "http://host.docker.internal/docs/book",
      "url": "http://host.docker.internal/docs/config-intro",
      "url": "http://host.docker.internal/docs/user-guide-intro",
      "url": "http://host.docker.internal/docs/reference_data",
      "url": "http://host.docker.internal/docs/ref-guide-introduction",
      "url": "http://host.docker.internal/docs/Order-Configuration"
    }
  ],
  "selectors": {
    "default": {
      "lvl2": "article h2",
      "lvl3": "article h3",
      "lvl4": "article h4",
      "lvl5": "article h5, article td:first-child",
      "lvl6": "article h6",
      "text": "article p, article li, article td:last-child"
    }
  },
  "strip_chars": " .,;:#"
}
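Note that start_urls holds a single object that repeats the "url" key six times. JSON parsers generally keep only the last value for a repeated key. Python's json module does, and that would be consistent with only Order-Configuration appearing in the scraper output. The sketch below, using an abbreviated copy of that entry, shows the collapse and builds an alternative list with one entry per page. It assumes start_urls accepts plain URL strings (one {"url": ...} object per page should also work).

```python
import json

# Abbreviated copy of the "start_urls" entry above: one object, repeated "url" key.
raw = """
{
  "start_urls": [
    {
      "url": "http://host.docker.internal/docs/book",
      "url": "http://host.docker.internal/docs/config-intro",
      "url": "http://host.docker.internal/docs/Order-Configuration"
    }
  ]
}
"""

parsed = json.loads(raw)
print(parsed["start_urls"])
# -> [{'url': 'http://host.docker.internal/docs/Order-Configuration'}]

# One entry per page instead of one object with duplicate keys.
pages = ["book", "config-intro", "user-guide-intro", "reference_data",
         "ref-guide-introduction", "Order-Configuration"]
fixed = {"start_urls": [f"http://host.docker.internal/docs/{p}" for p in pages]}
print(json.dumps(fixed, indent=2))
```

The printed "start_urls" array could then replace the one in my_first_index.json. The host name is kept as-is here, so this does not address the localhost question.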
Please note that I have not asked for DocSearch to crawl my websites. I want to perform an internal evaluation of DocSearch before making any recommendations. Our documentation is only accessible with a user ID and password. I do have an Algolia account, and I entered the correct API key and App ID.