Domain is not getting verified when trying to create a crawler

Hello,
We have a requirement to crawl the content on our site. The content is hosted on Adobe Experience Manager (AEM), and we plan to use the Crawler to crawl it.
I have added the Algolia verification code to robots.txt, and the file is placed on the author instance of AEM. Note that the author instance requires a sign-in. When I try to provide the author instance domain (https://author-XXXXXXXX.adobeaemcloud.com/), the crawler is not able to verify it.
We are still in the implementation phase of the project and do not have a live site. This is blocking us, since Algolia accepts neither the author instance URL nor the publisher instance URL as the domain.
Can someone please let me know what I am not doing right here?

Hi @tina.oswal – Have you validated that the robots.txt file is publicly accessible from either instance (author or publisher)? The crawler needs to be able to see this file to confirm the app ID.

Also, you need to make sure the domain matches exactly. The crawler doesn’t support wildcard domains.
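
For reference, once the verification line is in place, the file might look roughly like the snippet below. This is only a sketch: copy the exact verification line and value from your Crawler dashboard (the value here is a placeholder), and make sure the file is served at the root of the exact domain you enter and returns a 200 without a sign-in.

```
# https://author-XXXXXXXX.adobeaemcloud.com/robots.txt
User-agent: *
Allow: /

# Verification line from the Crawler dashboard (placeholder value)
Algolia-Crawler-Verif: YOUR_VERIFICATION_CODE
```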

Hi @chuck.meyer,
I tried through the browser. The author instance requires a sign-in, so when I sign in and access robots.txt, it gets downloaded (it does not open directly in the browser). In the Crawler dashboard, when I try to verify this domain, I get a 401 error for the author instance.
From the publisher instance, robots.txt cannot be accessed through the browser at all.
We are trying to make robots.txt accessible at the root level in AEM. But I wanted to know whether there is anything else (other than adding the Algolia code to robots.txt) that we need to do to make sure robots.txt is accessible to the crawler so we can run it.

Thanks.

Have you added the authentication information to the Crawler configuration? My gut feeling is that we may not apply authentication when doing validation, since robots.txt is typically publicly accessible at the root of a domain, but I’m trying to confirm that.
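
If not, something along these lines might be a starting point. This is only a sketch of the `login` option as I understand it: the AEM login URL, form field names, and credentials are placeholders and assumptions, so check them against the Crawler configuration reference and your AEM setup.

```js
new Crawler({
  appId: "YOUR_APP_ID",
  apiKey: "YOUR_CRAWLER_API_KEY",
  startUrls: ["https://author-XXXXXXXX.adobeaemcloud.com/"],
  // Sketch of a fetch-based login: the crawler submits the form once and
  // reuses the resulting session cookie for subsequent requests.
  login: {
    fetchRequest: {
      // Placeholder: AEM author instances commonly expose a j_security_check
      // endpoint, but verify the exact path for your environment.
      url: "https://author-XXXXXXXX.adobeaemcloud.com/libs/granite/core/content/login.html/j_security_check",
      requestOptions: {
        method: "POST",
        headers: { "Content-Type": "application/x-www-form-urlencoded" },
        body: "j_username=YOUR_USER&j_password=YOUR_PASSWORD",
      },
    },
  },
  // ...actions, index settings, and other required fields omitted.
});
```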

I tried adding the information to the Crawler configuration using `login`, but unfortunately it didn’t work. Though I am not sure whether the problem was with my configuration or something else, since the editor just kept raising the URL issue.
Any inputs on how I can check whether that authentication was happening or not?

I’m trying to find out more on my side, but from what I’ve read so far, it appears that to do automated verification we expect the robots.txt to be at the root of the domain and publicly accessible.

If that’s not possible, you should open a ticket with support to assist with a manual validation.

Oh okay, let me try opening a ticket with Support then. Thank you @chuck.meyer.