How to index the entire site, not just Docusaurus

I’m using DocSearch to crawl our website substrate.dev. Its configuration is here.

Many of our docs are hosted in a Docusaurus installation, and they are being indexed perfectly. Some of our docs are not in Docusaurus (example1 example2), and we would like them to be indexed as well.
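My understanding (which may be wrong) is that the non-Docusaurus pages need to be reachable from the config’s `start_urls` and matched by its `selectors`. A sketch of what I’d expect such an entry to look like, assuming the standard DocSearch config format — the URLs and selectors below are placeholders, not our real config:

```json
{
  "index_name": "substrate",
  "start_urls": [
    "https://substrate.dev/docs/",
    "https://substrate.dev/other-docs/"
  ],
  "selectors": {
    "lvl0": "h1",
    "lvl1": "h2",
    "lvl2": "h3",
    "text": "p, li"
  }
}
```

If the non-Docusaurus pages use different markup, I’d guess they need their own selector entries matching that markup.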

My first attempt was https://github.com/algolia/docsearch-configs/pull/894 . When this fix didn’t work, I realized I would need a faster iteration cycle than attempting changes, getting the PR merged, and waiting for a crawl.

I decided to run the scraper myself, but the Docker container gives me a Python traceback: https://gist.github.com/JoshOrndorff/fbb37b8f8e17deee75f815d319b239da
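For anyone trying to reproduce, this is roughly the command I’m running, based on the docsearch-scraper README (the `.env` path and `config.json` filename are mine):

```shell
# Run the DocSearch scraper image, passing credentials via a .env file
# and inlining the crawler config with jq, per the docsearch-scraper docs.
docker run -it --env-file=.env \
  -e "CONFIG=$(cat config.json | jq -r tostring)" \
  algolia/docsearch-scraper
```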

Any help with the primary issue of crawling our entire site, or the secondary issue of running the scraper, is greatly appreciated. For cross-reference, we’re tracking this issue internally at https://github.com/substrate-developer-hub/substrate-developer-hub.github.io/issues/145

:wave:

I think you are not using the proper credentials when running the crawl. Could you try to respect these?

Cheers

Thank you for this help. I’m confident I have the correct API_KEY as it’s the same one I use for the automatic crawls (https://github.com/substrate-developer-hub/substrate-developer-hub.github.io/blob/source/website/siteConfig.js#L135)

I guess I’m not certain about the APPLICATION_ID. I was using ‘substrate’. Is that not correct? Where can I find the proper APPLICATION_ID?

Thanks again for your help.

Those are not the right credentials. The two keys are not the same: the one used to create content must be a write key, while the one shown in the UI is a search-only key!

Do not forget to set your APPLICATION_ID from the UI as stated here.
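For reference, the `.env` file in the directory where you run Docker should look something like this (the values are placeholders — the APPLICATION_ID comes from your Algolia dashboard, and the API_KEY must be a write key, not the search-only one):

```
APPLICATION_ID=YOUR_APP_ID
API_KEY=YOUR_WRITE_API_KEY
```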

Thanks