Advanced Crawler Settings

Depending on your website, you might have special requirements that can only be met through our advanced settings, which you will find under Website Crawling or Sitemap Indexing.

How do I index secure or password-protected content?

If you have password-protected content that you'd like to include in your search results, you need to authenticate our crawler so we're able to access the secure pages. There are several options:

  1. If you use HTTP Basic Authentication, simply fill in your username and password (not applicable if JavaScript-rendered content needs to be added to the search index)

  2. If you have a custom login page, use the Custom Login Screen settings instead (not applicable if JavaScript-rendered content needs to be added to the search index)

  3. Set a cookie to authenticate our crawler

  4. Whitelist our crawler's IP addresses so it can access all pages without a login (under Firewall > Tools):

    • 88.99.218.202

    • 88.99.149.30

    • 88.99.162.232

    • 88.99.29.101

    • 149.56.240.229

    • 51.79.176.191

    • 51.222.153.207

    • 139.99.121.235

    • 94.130.54.189

    • 116.202.85.24

  5. Provide a special sitemap.xml with deep links to the hidden content

  6. Detect our crawler by the following User Agent string in the HTTP request header (see the sketch after this list):

    Mozilla/5.0 (compatible; SiteSearch360/1.0; +https://sitesearch360.com/)

  7. Add another User Agent for our JavaScript crawler if you need JS-rendered content to appear in your search results:

    Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36

  8. Push your content to our HTTP REST API
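
If you choose to detect the crawlers by User Agent (items 6 and 7), the check happens in your own server code. Here is a minimal sketch in Python using Flask; the route, cookie check, and page rendering are assumptions about your stack, not part of Site Search 360.

    # Minimal sketch: let requests from the Site Search 360 crawlers bypass the
    # login redirect based on their User Agent. The Flask app, route, and cookie
    # name below are assumptions about your stack.
    from flask import Flask, redirect, request

    app = Flask(__name__)

    # Substrings identifying the regular crawler (item 6) and the JS crawler (item 7).
    CRAWLER_UA_MARKERS = (
        "SiteSearch360/1.0",
        "Chrome/61.0.3163.100 Safari/537.36",
    )

    def is_search_crawler(user_agent: str) -> bool:
        """Return True if the request comes from one of the crawlers."""
        return any(marker in user_agent for marker in CRAWLER_UA_MARKERS)

    @app.route("/members/<path:page>")
    def members_area(page):
        ua = request.headers.get("User-Agent", "")
        if not is_search_crawler(ua) and "session" not in request.cookies:
            # Regular visitors without a session still have to log in.
            return redirect("/login")
        # Serve the protected content to crawlers and logged-in visitors alike.
        return f"Protected content for {page}"  # replace with your real rendering

Since the JavaScript crawler's User Agent looks like a regular Chrome browser, it's safer to combine this check with the IP whitelist from item 4.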

How do I crawl content behind a custom login page?

  1. Under Advanced Settings > Custom Login Screen, check the box called "Active."

  2. Provide the URL of your login page, e.g. https://yoursite.com/login

  3. Provide the login form XPath:

    On your login page, right-click the login form element, select Inspect, and find its id in the markup. For example, you might see something like:

    <form name="loginform" id="loginform" action="https://yoursite.com/login.php" method="post">

    So you'd take id="loginform" and address it with the following XPath: //form[@id="loginform"]

  4. Define the authentication parameter names and map them with the credentials for the crawler to access the content.

    First, find out which parameter name is used for your login field. Right-click the field and select Inspect. For example, you might see:

    <input type="text" name="log" id="user_login" class="input">

    So you’d take log and use it as the Parameter Name. The login (username, email, etc.) would be the Parameter Value. Click Add and repeat the same process for the password field. (A sketch for verifying the XPath and parameter names follows at the end of this section.)

  5. Save and go to the Index section where you can test your setup on a single URL and re-index the entire site to add the password-protected pages to your search results.

Some login screens have a single field, usually for the password, in which case you'd only need one parameter name-value pair.

Note: Custom login screens can’t be used alongside JavaScript crawling. If your login screen is JS-rendered, please use alternative indexing methods outlined in the previous section of this article (items 3 through 8).
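
Before saving, you can sanity-check the form XPath and the parameter names outside the control panel. The sketch below uses Python with the requests and lxml packages; the URL and the loginform id are the example values from the steps above, so substitute your own.

    # Minimal sketch: confirm the form XPath matches exactly one form on the
    # login page and list the input names you can map as Parameter Names.
    # The URL and form id are the example values from the steps above.
    import requests
    from lxml import html

    LOGIN_URL = "https://yoursite.com/login"
    FORM_XPATH = '//form[@id="loginform"]'

    page = html.fromstring(requests.get(LOGIN_URL, timeout=10).text)
    forms = page.xpath(FORM_XPATH)
    print(f"Forms matched by {FORM_XPATH}: {len(forms)}")  # should be exactly 1

    if forms:
        # Each "name" attribute is a candidate Parameter Name, e.g. "log" for the
        # username field in the example above.
        for field in forms[0].xpath(".//input[@name]"):
            print(field.get("name"), field.get("type"))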

How do I use cookies with the crawler?

Sometimes it can be useful to tell our crawler to set a specific cookie when accessing your website.

For example, if you have a location cookie that determines which language your search results are in, you can set this cookie to "us" for your English-language project or to "de" for your German-language project.
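
Because the crawler simply sends the configured cookie with every request, you can preview the effect by making the same request yourself. The snippet below is a sketch that assumes a cookie literally named location with the values "us" and "de" from the example above; adjust the name and values to match your site.

    # Minimal sketch: request a page with the location cookie set, as the crawler
    # would, to confirm that each value returns the expected language version.
    # The URL and the cookie name "location" are assumptions taken from the example.
    import requests

    PAGE_URL = "https://yoursite.com/products"  # any page whose language depends on the cookie
    CRAWLER_UA = "Mozilla/5.0 (compatible; SiteSearch360/1.0; +https://sitesearch360.com/)"

    for language in ("us", "de"):
        response = requests.get(
            PAGE_URL,
            cookies={"location": language},
            headers={"User-Agent": CRAWLER_UA},
            timeout=10,
        )
        # Check the response to verify the correct language content is served.
        print(language, response.status_code, response.headers.get("Content-Language"))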

What is indexing intensity?

Indexing intensity influences how quickly our crawler moves through your website. You can set the intensity anywhere from 1 (slowest indexing, little stress on your server) to 5 (fastest indexing, higher stress on your server).

If you are looking for other ways to increase crawling speed, consider switching to sitemap indexing and using the Optimize Indexing setting.

How do I index JavaScript content?

The Site Search 360 crawler can index content that is dynamically loaded via JavaScript. To enable JS crawling, activate the respective toggle under Website Crawling > Advanced Settings, and re-index your site.

Add-On Required

JavaScript crawling is an add-on feature. Create your custom plan.

Note: JS crawling isn't enabled for free trial accounts by default. Please reach out if you need to test it before signing up for a paid plan.

Note that indexing JavaScript-rendered content takes more time and resources. If your search results are empty or important information is missing unless JavaScript Crawling is active, make sure to add the feature to your Custom Plan so you can keep using it after your trial period expires.

Alternatively, you can push your JavaScript content to our API.
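
If you go the API route, you render the JavaScript content yourself and push the resulting records over HTTP. The sketch below only shows the general shape of such a push: the endpoint URL, payload fields, and authorization header are placeholders, so check the REST API documentation for the actual endpoint and field names.

    # Minimal sketch of pushing pre-rendered content to a search index over HTTP.
    # The endpoint, payload structure, and API-key header are placeholders, NOT the
    # documented Site Search 360 API contract -- consult the REST API docs for the
    # real endpoint and field names.
    import requests

    API_ENDPOINT = "https://api.example.com/index"  # placeholder endpoint
    API_KEY = "YOUR_API_KEY"                        # placeholder credential

    document = {
        "url": "https://yoursite.com/spa/pricing",  # the page the result should link to
        "title": "Pricing",
        "content": "Full text of the page, extracted after your JavaScript has rendered it.",
    }

    response = requests.post(
        API_ENDPOINT,
        json=document,
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,
    )
    response.raise_for_status()
    print("Pushed:", document["url"])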