Your Search Index

The Index section of the Control Panel shows you all the pages and documents that the crawler has found on your website.

How do I fix problems with my search index?

If you have problems with your search, you probably have problems with your search index.

When you look through your index, keep an eye out for missing pages, unwanted pages, and/or duplicate pages. If you find problems like this, refer to our crawler settings post to resolve them.

You will also want to check for status messages that might explain why the search does not behave the way you want it to... yet!

Why do my pages get skipped?

Successfully indexed URLs display a green 200 status.

Skipped pages have a gray 800 or 802 status, depending on why the crawler skipped them. In both cases, a rule you have set up under the crawler settings tells the crawler that these pages should not appear in your search. If pages you want to be searchable are getting skipped, you will have to remove or change your rules so they no longer match those pages.

After making those changes, you can index the single URL to make sure you get a status 200 instead.

You might also spot red error statuses in your index.

URLs with an error status have not been successfully indexed and will not show up in your search results.
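
If you export or copy statuses from your index and want to triage them in bulk, the color groups above can be sketched as a small lookup. This is only an illustration: the function name and the sample URLs are hypothetical, and treating every code from 400 to 599 as a red error is an assumption based on the client and server errors covered in this article.

```python
def classify_status(code: int) -> str:
    """Map an index status code to the outcome described in this article."""
    if code == 200:
        return "indexed"   # green: the page is in the index
    if code in (800, 802):
        return "skipped"   # gray: a crawler-settings rule matched the page
    if 400 <= code <= 599:
        return "error"     # red (assumed range): not indexed, not in results
    return "unknown"       # check the full list of status codes


# Hypothetical sample data: URL path -> status shown in the index
statuses = {"/home": 200, "/drafts/post": 800, "/old-page": 404}

# Keep only the entries that are not successfully indexed
needs_attention = {
    url: classify_status(code)
    for url, code in statuses.items()
    if code != 200
}
print(needs_attention)
```

Anything reported as "skipped" points to your crawler rules, while "error" entries are covered by the sections below.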

How do I fix Client Error 499?

When indexing your site's pages, we need to send HTTP requests to your web server. Client errors are the result of HTTP requests sent by a user client (i.e. a web browser or other HTTP client). Client Error 499 means that the client closed the connection before the server could answer the request (often because of a timeout), so the requested page or document could not be loaded. Re-indexing specific URLs or your entire site usually helps in this case.

This error can also occur when our crawlers are denied access to your site content by Cloudflare. Please make sure to whitelist our crawler IPs at Cloudflare (under Firewall > Tools).

Here's the list of IPs used by our crawlers:

  • 88.99.218.202

  • 88.99.149.30

  • 88.99.162.232

  • 88.99.29.101

  • 149.56.240.229

  • 51.79.176.191

  • 51.222.153.207

  • 139.99.121.235

  • 94.130.54.189

  • 116.202.85.24

You can also allow us as a User Agent at Cloudflare:

Mozilla/5.0 (compatible; SiteSearch360/1.0; +https://sitesearch360.com/)

Keep in mind that the User Agent is different for our JavaScript crawler:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
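
If you want to confirm from your own server logs that blocked requests really came from our crawlers, a check along the following lines may help. It is a minimal sketch: the function name and return labels are hypothetical, and it only uses the IPs and User Agent strings listed above. Because the JavaScript crawler's User Agent looks like a regular Chrome browser, the sketch treats the IP address as the deciding signal.

```python
import ipaddress
from typing import Optional

# Crawler IPs as listed in this article
CRAWLER_IPS = {ipaddress.ip_address(ip) for ip in (
    "88.99.218.202", "88.99.149.30", "88.99.162.232", "88.99.29.101",
    "149.56.240.229", "51.79.176.191", "51.222.153.207", "139.99.121.235",
    "94.130.54.189", "116.202.85.24",
)}

STANDARD_UA = "Mozilla/5.0 (compatible; SiteSearch360/1.0; +https://sitesearch360.com/)"
JS_CRAWLER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
)


def crawler_kind(remote_ip: str, user_agent: str) -> Optional[str]:
    """Return which crawler sent a request, or None if the IP is not listed."""
    try:
        ip = ipaddress.ip_address(remote_ip)
    except ValueError:
        return None  # not a valid IP address at all
    if ip not in CRAWLER_IPS:
        return None  # request did not come from a listed crawler IP
    if user_agent == STANDARD_UA:
        return "standard"
    if user_agent == JS_CRAWLER_UA:
        return "javascript"
    return "unrecognized-agent"  # listed IP, but a User Agent not shown above


print(crawler_kind("88.99.218.202", STANDARD_UA))
```

Log lines that return "standard" or "javascript" but show a 403 or a dropped connection suggest that a firewall rule is still blocking the crawler.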

Note: Cloudflare can be set up as part of your CMS (Content Management System - e.g. WordPress, Magento, Drupal, etc.). If you're not sure how to approach this, check in with your CMS's support and ask them to whitelist Site Search 360 crawler IPs for you.

How do I fix Client Error 403?

The 403 error means that when our crawler requests a specific page or file from your server, the server refuses to fulfill the request.

If whitelisting our crawler IP addresses and allowing us as a User Agent haven't helped, the issue may be related to your HTTP method settings.

Some content management systems, e.g., Magnolia CMS, block HEAD requests by default. Try adding HEAD to the allowed methods so your documents can successfully be reached by the crawler and added to your search index.
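
To see whether your server accepts HEAD requests at all, you can send one yourself. The sketch below uses only the Python standard library; the helper names are hypothetical, and it assumes that sending our standard User Agent string from your own machine is close enough to what the crawler does (your firewall may still treat your IP differently).

```python
import urllib.error
import urllib.request

CRAWLER_UA = "Mozilla/5.0 (compatible; SiteSearch360/1.0; +https://sitesearch360.com/)"


def build_head_request(url: str) -> urllib.request.Request:
    """Prepare a HEAD request that carries the crawler's User Agent."""
    return urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": CRAWLER_UA}
    )


def head_status(url: str, timeout: float = 10.0) -> int:
    """Send the HEAD request and return the HTTP status code."""
    try:
        with urllib.request.urlopen(build_head_request(url), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 403 or 405 when HEAD is not allowed


# Example (performs a real network call, so it is left commented out):
# print(head_status("https://www.example.com/files/manual.pdf"))
```

If a normal GET for the same URL succeeds but the HEAD request returns 403 or 405, the allowed-methods setting is the likely cause.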

How do I fix Client Error 404?

404 errors are "page not found" errors. If you find 404 errors in your index, try opening the page to check whether it still exists. If it does not, you can remove the page from your sitemap and/or find the broken link on your site that points to the missing page, then remove or update that link.
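
If your sitemap is large, you may prefer to look for stale entries with a script rather than by hand. The following is a rough sketch, assuming a standard sitemaps.org XML sitemap; the function names and the sample data are hypothetical, and the status lookup is passed in as a function so you can plug in whatever HTTP client you use.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, List

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> List[str]:
    """Extract every <loc> URL from a sitemaps.org <urlset> document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def missing_pages(urls: Iterable[str], get_status: Callable[[str], int]) -> List[str]:
    """Return the URLs whose status lookup reports 404."""
    return [url for url in urls if get_status(url) == 404]


# Hypothetical sample sitemap and a stand-in for real HTTP checks
SAMPLE = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://www.example.com/</loc></url>"
    "<url><loc>https://www.example.com/gone</loc></url>"
    "</urlset>"
)
FAKE_STATUSES = {
    "https://www.example.com/": 200,
    "https://www.example.com/gone": 404,
}

print(missing_pages(sitemap_urls(SAMPLE), FAKE_STATUSES.get))
```

The URLs it reports are candidates to remove from the sitemap or to fix at the page that links to them.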

By default, we do not track which pages lead to broken links (404s), but a paid add-on feature called Crawler Log allows you to do so. When this feature is enabled, you'll be able to see the "Index Source" column in the Index Control Status Table.

This lets you easily trace the route from a functional URL to a broken one.

How do I fix Server Error 500?

Server Error 500 means your server is experiencing problems. You will only be able to crawl your site when these issues have been resolved on your end. Once your site is back up and functioning normally, try deleting your index and crawling with a clean slate.

What other status codes are there?

Please refer to this post for a comprehensive list of status codes and their meanings.

What does it mean to re-index and when is it necessary?

Many changes you make to your search setup will require you to re-index your site. Re-indexing means the crawler will fully recrawl your site, taking into account the changes you have made since the last crawl. For example, if you change or add data points, result groups, or whitelisting, blacklisting, or no-index patterns, you will need to re-index before the changes take effect. If you aren't sure whether a re-index is necessary, pay attention to the notification bar in the upper right-hand corner. It will remind you when re-indexing is required.

You can re-index by pressing the Re-Index Now button within the notification or by navigating to the Index section and scrolling down until you find the re-index button.

Note: this button can only be pressed if you have set at least one source to Auto-Reindex and only sources set to Auto-Reindex will be crawled.