How to fix ./docsearch docker:build Errors!

I am trying to follow the Run Your Own guide to set up DocSearch. I have set up the environment, but I am getting various errors when I run the ./docsearch docker:build command. I have very limited Docker and Python knowledge and would appreciate some help getting started. See the errors below:

$ ./docsearch docker:build
Traceback (most recent call last):
  File "./docsearch", line 5, in <module>
    run()
  File "C:\Users\arai\Documents\docsearch-scraper\cli\src\index.py", line 189, in run
    exit(command.run(sys.argv[2:]))
  File "C:\Users\arai\Documents\docsearch-scraper\cli\src\commands\build_docker_scraper.py", line 14, in run
    code = self.build_docker_file("scraper/dev/docker/Dockerfile.base", "algolia/base-documentation-scrapper")
  File "C:\Users\arai\Documents\docsearch-scraper\cli\src\commands\abstract_build_docker.py", line 24, in build_docker_file
    return AbstractCommand.exec_shell_command(cmd)
  File "C:\Users\arai\Documents\docsearch-scraper\cli\src\commands\abstract_command.py", line 54, in exec_shell_command
    p = Popen(arguments, env=merge_env)
  File "C:\Python27\lib\subprocess.py", line 394, in __init__
    errread, errwrite)
  File "C:\Python27\lib\subprocess.py", line 644, in _execute_child
    startupinfo)
TypeError: environment can only contain strings
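For what it's worth, this TypeError on Python 2 under Windows usually means the environment mapping handed to Popen contains unicode entries, while Popen there only accepts plain byte strings. A minimal sketch of the usual workaround, assuming the scraper's merge_env dict is picking up unicode values (the coercion below is my own illustration, not the project's code):

```python
import os
from subprocess import Popen  # shown for context; not invoked below

# Hypothetical sketch: on Python 2/Windows, the env mapping passed to
# Popen may only contain plain byte strings, not unicode. If the merged
# environment picks up unicode keys or values (e.g. read from a .env
# file as text), Popen raises
# "TypeError: environment can only contain strings".
# A common workaround is to coerce every entry before calling Popen:
merged_env = {str(key): str(value) for key, value in os.environ.items()}

# All entries are now plain strings, so Popen(cmd, env=merged_env)
# would be safe on both Python 2 and Python 3.
assert all(isinstance(k, str) and isinstance(v, str)
           for k, v in merged_env.items())
```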

I am using Windows 10 and Git Bash. I tried using PowerShell, but the ./docsearch docker:build command did not work there.

Hi there,

How are you setting up your environment variables? Did you create a .env file in the root, as noted here:

Thanks!
Jason

:wave: @arshabhi.rai,

Could you explain why you want to do that, please?

This command is only used by the DocSearch team and is meant to build the image in a specific way. It is part of our internal tooling.

You might need to create your own way to build the image.
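For anyone landing here: a hypothetical manual build along the lines of what the internal command appears to do, with the Dockerfile path and image tag taken from the traceback above (treat both names as assumptions):

```
docker build -f scraper/dev/docker/Dockerfile.base -t algolia/base-documentation-scrapper .
```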

Cheers

This makes sense now. I was just following the instructions as a step-by-step procedure. I guess Docker won’t run without the appropriate environment variables configured? As per the instructions, I thought I had to do the build first and create the .env file afterwards. I will try it again. Thank you for your help!

I didn’t realize that ./docsearch docker:build was an internal tool. The documentation, however, recommends using it:

We then recommend using DocSearch from inside a Docker image. You can setup one by running ./docsearch docker:build.

Sorry, I am new to Docker, so there are a lot of things to understand and figure out. Is Docker the only way to do this, or is there an alternative? My goal is to crawl my docs site, extract the content, and push it to Algolia for testing.

Unfortunately, our docs are not open source; it would have been much easier to just add the JS snippet :slight_smile:. Our team really likes Algolia DocSearch and I would like to make it work. We use MadCap Flare for content development, an XML-based authoring tool that generates HTML5 output, so I have to integrate DocSearch with Flare.

Thank you for your help!

Sorry for the misleading documentation.

We will update our docs once this PR is merged. It will introduce an easier way to use Python and a separate environment.

@arshabhi.rai why don’t you use the Docker image to run the crawl?
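For reference, running the crawl through the published image looks roughly like this, assuming a .env file with your credentials and a config.json crawler config (the exact image name and CONFIG handling here are taken from the docs of the time and may have changed):

```
docker run -it --env-file=.env \
  -e "CONFIG=$(cat config.json | jq -r tostring)" \
  algolia/docsearch-scraper
```

Note that this sketch assumes jq is installed to flatten the config file into a single-line string.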

You should use a proper Python environment with pipenv (roughly, a package manager for Python):

Please follow these steps:

  • Fork this temporary codebase
  • Install pipenv
  • Run pipenv install
  • Create a .env file at the root of the scraper project with APPLICATION_ID & API_KEY set
  • Run pipenv shell
  • Run ./docsearch run <path to your config>
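The steps above would look roughly like this in a shell; the APPLICATION_ID and API_KEY values and the config path are placeholders you need to replace with your own:

```
# after forking and cloning the scraper repository
pip install pipenv
pipenv install

# .env at the root of the scraper project
cat > .env <<'EOF'
APPLICATION_ID=your_app_id
API_KEY=your_admin_api_key
EOF

pipenv shell
./docsearch run path/to/your/config.json
```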

Thanks for posting the instructions, @Sylvain.PACE.

I haven’t had much joy, but here is what is happening:

The pipenv shell command worked, but only in Windows Command Prompt (CMD); in Git Bash on Windows it failed with a bunch of errors. Unfortunately, CMD did not recognize the ./docsearch command.

The ./docsearch docker:build command, for some reason, worked with Git Bash, but the pipenv executable was flagged by my antivirus, and things stopped working again :(. Setting up the environment itself was pretty tricky on Windows and took me a while. I will probably spend another day on this trying to find a way to crawl my website.

About using the Docker image to run the crawl, I am really not sure how I would do that. Sorry, I am probably asking too much here.

Also, I noticed that ./docsearch does not work on the first execution in Git Bash. I had to restart the shell to make it work.

Ahhh… I just realized that I missed an important note for the pipenv installation. I had Python 2.7 installed, as per the docsearch documentation. I guess I will have to update to Python 3 for pipenv to work? I will give it a stab today.

:wave: @arshabhi.rai,

Just a follow-up on this point: you can now use pipenv from our scraper. We have merged the pending PR.

Let us know if you are still struggling with it.