Crawling issue: nbHits 0 for <MyIndex>

I’ve used https://github.com/algolia/docsearch-scraper to index my site which has worked fine so far.

First I used it on an internal test website on the same server where I ran docsearch-scraper. The site was internal (192…) and only accessible via http/https by IP address (not by its URL). However, when I tried it on the live site using https and its URL, I got this error:

2019-09-10 09:13:55 [secmaker-live-site] ERROR: Failure without response TCP connection timed out: 110: Connection timed out.

Crawling issue: nbHits 0 for secmaker-live-site

The whole CLI output (Root URL page is docs.secmaker.com):

$ sudo ./scrape MyIndex.json

[sudo] password for <user>:
10/09/2019 09:47:48 passing arg to libvncserver: -passwd
10/09/2019 09:47:48 x11vnc version: 0.9.13 lastmod: 2011-08-10  pid: 11
10/09/2019 09:47:48 XOpenDisplay(":99") failed.
10/09/2019 09:47:48 Trying again with XAUTHLOCALHOSTNAME=localhost ...

10/09/2019 09:47:48 ***************************************
10/09/2019 09:47:48 *** XOpenDisplay failed (:99)

*** x11vnc was unable to open the X DISPLAY: ":99", it cannot continue.
*** There may be "Xlib:" error messages above with details about the failure.

Some tips and guidelines:

** An X server (the one you wish to view) must be running before x11vnc is
   started: x11vnc does not start the X server.  (however, see the -create
   option if that is what you really want).

** You must use -display <disp>, -OR- set and export your $DISPLAY
   environment variable to refer to the display of the desired X server.
 - Usually the display is simply ":0" (in fact x11vnc uses this if you forget
   to specify it), but in some multi-user situations it could be ":1", ":2",
   or even ":137".  Ask your administrator or a guru if you are having
   difficulty determining what your X DISPLAY is.

** Next, you need to have sufficient permissions (Xauthority)
   to connect to the X DISPLAY.   Here are some Tips:

 - Often, you just need to run x11vnc as the user logged into the X session.
   So make sure to be that user when you type x11vnc.
 - Being root is usually not enough because the incorrect MIT-MAGIC-COOKIE
   file may be accessed.  The cookie file contains the secret key that
   allows x11vnc to connect to the desired X DISPLAY.
 - You can explicitly indicate which MIT-MAGIC-COOKIE file should be used
   by the -auth option, e.g.:
       x11vnc -auth /home/someuser/.Xauthority -display :0
       x11vnc -auth /tmp/.gdmzndVlR -display :0
   you must have read permission for the auth file.
   See also '-auth guess' and '-findauth' discussed below.

** If NO ONE is logged into an X session yet, but there is a greeter login
   program like "gdm", "kdm", "xdm", or "dtlogin" running, you will need
   to find and use the raw display manager MIT-MAGIC-COOKIE file.
   Some examples for various display managers:

     gdm:     -auth /var/gdm/:0.Xauth
              -auth /var/lib/gdm/:0.Xauth
     kdm:     -auth /var/lib/kdm/A:0-crWk72
              -auth /var/run/xauth/A:0-crWk72
     xdm:     -auth /var/lib/xdm/authdir/authfiles/A:0-XQvaJk
     dtlogin: -auth /var/dt/A:0-UgaaXa

   Sometimes the command "ps wwwwaux | grep auth" can reveal the file location.

   Starting with x11vnc 0.9.9 you can have it try to guess by using:

              -auth guess

   (see also the x11vnc -findauth option.)

   Only root will have read permission for the file, and so x11vnc must be run
   as root (or copy it).  The random characters in the filenames will of course
   change and the directory the cookie file resides in is system dependent.

See also: http://www.karlrunge.com/x11vnc/faq.html
Initializing built-in extension Generic Event Extension
Initializing built-in extension SHAPE
Initializing built-in extension MIT-SHM
Initializing built-in extension XInputExtension
Initializing built-in extension XTEST
Initializing built-in extension BIG-REQUESTS
Initializing built-in extension SYNC
Initializing built-in extension XKEYBOARD
Initializing built-in extension XC-MISC
Initializing built-in extension SECURITY
Initializing built-in extension XINERAMA
Initializing built-in extension XFIXES
Initializing built-in extension RENDER
Initializing built-in extension RANDR
Initializing built-in extension COMPOSITE
Initializing built-in extension DAMAGE
Initializing built-in extension MIT-SCREEN-SAVER
Initializing built-in extension DOUBLE-BUFFER
Initializing built-in extension RECORD
Initializing built-in extension DPMS
Initializing built-in extension Present
Initializing built-in extension DRI3
Initializing built-in extension X-Resource
Initializing built-in extension XVideo
Initializing built-in extension XVideo-MotionCompensation
Initializing built-in extension SELinux
Initializing built-in extension GLX
screen 0 shmid 0
[dix] Could not init font path element /usr/share/fonts/X11/cyrillic, removing from list!
[dix] Could not init font path element /usr/share/fonts/X11/100dpi/:unscaled, removing from list!
[dix] Could not init font path element /usr/share/fonts/X11/75dpi/:unscaled, removing from list!
[dix] Could not init font path element /usr/share/fonts/X11/Type1, removing from list!
[dix] Could not init font path element /usr/share/fonts/X11/100dpi, removing from list!
[dix] Could not init font path element /usr/share/fonts/X11/75dpi, removing from list!
09:47:48.597 INFO - Launching a standalone server
Setting system property webdriver.chrome.driver to /usr/lib/node_modules/selenium-standalone/.selenium/chromedriver/2.13-x64-chromedriver
09:47:48.621 INFO - Java: Oracle Corporation 25.171-b11
09:47:48.621 INFO - OS: Linux 4.9.0-8-amd64 amd64
09:47:48.627 INFO - v2.44.0, with Core v2.44.0. Built from revision 76d78cf
09:47:48.673 INFO - Default driver org.openqa.selenium.ie.InternetExplorerDriver registration is skipped: registration capabilities Capabilities [{ensureCleanSession=true, browserName=internet explorer, version=, platform=WINDOWS}] does not match with current platform: LINUX
09:47:48.694 INFO - RemoteWebDriver instances should connect to: http://127.0.0.1:4444/wd/hub
09:47:48.695 INFO - Version Jetty/5.1.x
09:47:48.696 INFO - Started HttpContext[/selenium-server,/selenium-server]
09:47:48.708 INFO - Started org.openqa.jetty.jetty.servlet.ServletHandler@d7b1517
09:47:48.708 INFO - Started HttpContext[/wd,/wd]
09:47:48.708 INFO - Started HttpContext[/selenium-server/driver,/selenium-server/driver]
09:47:48.708 INFO - Started HttpContext[/,/]
09:47:48.710 INFO - Started SocketListener on 0.0.0.0:4444
09:47:48.711 INFO - Started org.openqa.jetty.jetty.Server@79fc0f2f
SELENIUM-STANDALONE: Selenium started
7 XSELINUXs still allocated at reset
SCREEN: 0 objects of 264 bytes = 0 total bytes 0 private allocs
DEVICE: 0 objects of 96 bytes = 0 total bytes 0 private allocs
CLIENT: 0 objects of 144 bytes = 0 total bytes 0 private allocs
WINDOW: 0 objects of 48 bytes = 0 total bytes 0 private allocs
PIXMAP: 2 objects of 16 bytes = 32 total bytes 0 private allocs
GC: 4 objects of 16 bytes = 64 total bytes 0 private allocs
CURSOR: 1 objects of 8 bytes = 8 total bytes 0 private allocs
TOTAL: 7 objects, 104 bytes, 0 allocs
2 PIXMAPs still allocated at reset
PIXMAP: 2 objects of 16 bytes = 32 total bytes 0 private allocs
GC: 4 objects of 16 bytes = 64 total bytes 0 private allocs
CURSOR: 1 objects of 8 bytes = 8 total bytes 0 private allocs
TOTAL: 7 objects, 104 bytes, 0 allocs
4 GCs still allocated at reset
GC: 4 objects of 16 bytes = 64 total bytes 0 private allocs
CURSOR: 1 objects of 8 bytes = 8 total bytes 0 private allocs
TOTAL: 5 objects, 72 bytes, 0 allocs
1 CURSORs still allocated at reset
CURSOR: 1 objects of 8 bytes = 8 total bytes 0 private allocs
TOTAL: 1 objects, 8 bytes, 0 allocs
1 CURSOR_BITSs still allocated at reset
TOTAL: 0 objects, 0 bytes, 0 allocs
[dix] Could not init font path element /usr/share/fonts/X11/cyrillic, removing from list!
[dix] Could not init font path element /usr/share/fonts/X11/100dpi/:unscaled, removing from list!
[dix] Could not init font path element /usr/share/fonts/X11/75dpi/:unscaled, removing from list!
[dix] Could not init font path element /usr/share/fonts/X11/Type1, removing from list!
[dix] Could not init font path element /usr/share/fonts/X11/100dpi, removing from list!
[dix] Could not init font path element /usr/share/fonts/X11/75dpi, removing from list!
2019-09-10 09:54:24 [MyIndex] ERROR: Failure without response TCP connection timed out: 110: Connection timed out.
2019-09-10 09:54:24 [MyIndex] ERROR: Failure without response TCP connection timed out: 110: Connection timed out.
2019-09-10 09:54:24 [MyIndex] ERROR: Failure without response TCP connection timed out: 110: Connection timed out.
2019-09-10 09:54:24 [MyIndex] ERROR: Failure without response TCP connection timed out: 110: Connection timed out.
2019-09-10 09:54:24 [MyIndex] ERROR: Failure without response TCP connection timed out: 110: Connection timed out.
2019-09-10 09:54:24 [MyIndex] ERROR: Failure without response TCP connection timed out: 110: Connection timed out.
2019-09-10 09:54:24 [MyIndex] ERROR: Failure without response TCP connection timed out: 110: Connection timed out.

Crawling issue: nbHits 0 for <MyIndex>

UdevQt: unhandled device action "move"

:wave: @magnus.nordstrand

Can you make sure that at least one start_url is available (status 200)? It seems that https://docs.secmaker.com/ is not available right now (the request times out).

Once the website is available, the crawl should work fine.
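
If it helps, here is a minimal sketch for checking this yourself before launching a crawl. It is not part of docsearch-scraper: the helper names (extract_start_urls, check_url) are made up for illustration, and it assumes the config is a JSON file whose start_urls entries are either plain strings or objects with a "url" key. Run it from the same host (or container) that runs the scraper, since a timeout depends on where the request comes from.

```python
import json
import sys
import urllib.error
import urllib.request


def extract_start_urls(config):
    """Return the start URLs from a parsed config dict.

    Assumption: each entry is a string or an object with a "url" key.
    """
    urls = []
    for entry in config.get("start_urls", []):
        urls.append(entry["url"] if isinstance(entry, dict) else entry)
    return urls


def check_url(url, timeout=10, opener=urllib.request.urlopen):
    """Return the HTTP status code, or a short message if nothing answered."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (404, 500, ...).
        return err.code
    except (urllib.error.URLError, OSError) as err:
        # No HTTP response at all: DNS failure, refused or timed-out connection.
        return f"no response: {err}"


if __name__ == "__main__":
    # Usage: python check_start_urls.py MyIndex.json
    with open(sys.argv[1]) as handle:
        config = json.load(handle)
    for url in extract_start_urls(config):
        print(url, "->", check_url(url))
```

At least one start URL should report 200. A "no response" line means the machine running the check could not reach the site at all, which would match the "Failure without response" errors in the log.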

Cheers

I’m using SSL, so it should work over https, which is also what I specified in my JSON config file. For now, though, I’ve also made the site available over http by using URL Rewrite.