How to Crawl and Update Private Sites

A Guide on How to Crawl Password Protected Webpages

The content-rich website you want to crawl may sit behind a login page.

To get past this login authentication, follow the steps below, which include installing a Chrome extension. Please also ensure that you have the full authority and right to crawl the password-protected site.


Step 1: Download a Chrome Extension

Download a Chrome extension that fetches a session cookie, which gives Wonderchat access to crawl your site.

Download the Chrome extension at this link: Get cookies.txt LOCALLY, and click "Add to Chrome". This tool exports cookies locally on your own computer, so you can store the cookie safely.

Once the extension is installed, it will show up in your browser's extensions bar.

Step 2: Log into your private website

Log into your password-protected website. For instance, if you want to crawl a private WordPress community site, you have to be logged into that site in order to save the session cookie.

Step 3: Use the Cookies extension within your logged in private website

After you have logged into your private website, open the “Get cookies.txt LOCALLY” extension.

Set “Export Format” to JSON. This is critically important, as the default format is “Netscape”.

Click “Copy” to copy your session cookie to your clipboard.
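Because pasting a Netscape-format export is an easy mistake to make, it can help to check what is actually on your clipboard before moving on. The following is a minimal sketch, not part of Wonderchat or the extension. It assumes that the JSON export is a list of objects that each carry a "name" and a "value" key, and that a Netscape export uses seven tab-separated fields per cookie line. The function name `detect_cookie_format` is hypothetical.

```python
import json


def detect_cookie_format(text: str) -> str:
    """Return 'json', 'netscape', or 'unknown' for an exported cookie string."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        parsed = None

    # Assumption: a JSON export is a non-empty list of cookie objects,
    # each with at least "name" and "value".
    if isinstance(parsed, list) and parsed and all(
        isinstance(c, dict) and "name" in c and "value" in c for c in parsed
    ):
        return "json"

    # Netscape cookies.txt: comment lines start with "#", except that
    # "#HttpOnly_" prefixes a real cookie line. Data lines have 7 tab-separated fields.
    data_lines = [
        ln for ln in text.splitlines()
        if ln.strip() and (not ln.startswith("#") or ln.startswith("#HttpOnly_"))
    ]
    if data_lines and all(len(ln.split("\t")) == 7 for ln in data_lines):
        return "netscape"

    return "unknown"
```

If the result is "netscape", go back to the extension, switch “Export Format” to JSON, and copy again.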

Go back to your Wonderchat dashboard and click “Create Chatbot”, or go to “Edit Chatbot > Chatbot Settings > Data Sources”.

Enter the link to your website that requires a login.

Adjust the crawl settings. If you only want to crawl one page or a sub-directory of your private site, remember to specify that.

Under “Advanced Settings”, paste the session cookie you copied earlier into the field.

Hit “create” to create a chatbot trained on your private website data. After a successful crawl, the crawled pages are listed in the “pages crawled” column.
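If you want to confirm that the copied cookie really grants access to a private page, you can try the same cookie in a plain HTTP request from your own machine. This is only a rough sketch under stated assumptions: it is not how Wonderchat's crawler works internally, the function names `cookie_header` and `fetch_with_cookies` are hypothetical, and it assumes the JSON export is a list of objects with "name" and "value" keys. It also sends every exported cookie regardless of domain or path, which a real browser would not do.

```python
import json
import urllib.request


def cookie_header(exported_json: str) -> str:
    """Build a Cookie header value from the exported JSON cookie list."""
    cookies = json.loads(exported_json)
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def fetch_with_cookies(url: str, exported_json: str) -> tuple[int, str]:
    """Request url with the exported cookies; return (status, final URL).

    urllib follows redirects, so a final URL that points at a login page
    suggests the cookie was not accepted. A 401 or 403 response raises
    urllib.error.HTTPError instead of returning.
    """
    request = urllib.request.Request(
        url, headers={"Cookie": cookie_header(exported_json)}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status, response.geturl()
```

For example, calling `fetch_with_cookies` with the URL of a members-only page and the clipboard contents should return status 200 and the same URL you asked for. Only run this against a site you are authorized to access.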

⚠️ Note: If the chatbot crawl fails …

  • Many websites handle authentication differently. While we have tried to support as many websites as possible, edge cases may still fall through.

  • If your website still cannot be crawled, please reach out to us at https://wonderchat.io/contact and we will do our best to assist you.
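One cause of a failed crawl that you can check yourself is a cookie that has already expired by the time the crawl runs. The sketch below is an illustration only. It assumes the JSON export follows the Chrome cookies API convention, where an optional "expirationDate" field holds seconds since the Unix epoch and is absent for session cookies; whether the extension's export uses exactly this field name is an assumption. The function name `expired_cookies` is hypothetical.

```python
import json
import time
from typing import List, Optional


def expired_cookies(exported_json: str, now: Optional[float] = None) -> List[str]:
    """Return the names of cookies whose expiry time is already in the past."""
    current = time.time() if now is None else now
    cookies = json.loads(exported_json)
    return [
        c["name"]
        for c in cookies
        # Assumption: cookies without "expirationDate" are session cookies
        # and are treated as not expired here.
        if "expirationDate" in c and c["expirationDate"] < current
    ]
```

If this returns any names, log into your website again, export a fresh cookie in JSON format, and retry the crawl.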

Step 4: Add more private pages to your Wonderchat bot

To add more private pages to Wonderchat, go to the “add pages” section and enter the link of the page you want to crawl.

Always remember to paste your session cookie under the advanced settings so that the chatbot can crawl the private pages.

Once the session cookie is added, click “confirm” to add the new page to your chatbot.


If you have any more questions, feel free to reach out to us at support@wonderchat.io.
