Index Status Codes: the Secrets of your Index Log
In order to make your website's content searchable, we need to index it. That's where our crawler comes in, combing through every URL available on your site to create what is essentially a database Site Search 360 can pull search results from.
Every URL our crawler picks up from your site (aka everything that's been indexed, successfully or otherwise) is available under Index -> Index Log. There you will find a full list of indexed pages, and on the left of each entry you'll find its index status and the code of said status. You might, for example, see something like this:
There are quite a few index status code numbers that may or may not be present in your Index Log, so let's go through them one by one.
200 - the request to crawl this URL was successful, the page exists, and it is indexed.
400 - Bad Request - the Site Search 360 crawler is blocked by your server (or your CDN provider like Cloudflare if you use one). It can usually be fixed by whitelisting our crawler's IP addresses and/or allowing our crawler as a User Agent at Cloudflare. Make sure to re-index the project when you're done for the changes to be applied.
401 - Unauthorized - some content is hidden from our crawler (for instance, by a login page), so an authentication method needs to be manually set up, thus granting the crawler full access to the site.
403 - Forbidden - your server refuses to fulfill our crawler's request and provide it with the page/file that was intended to be indexed. This can usually be fixed by whitelisting the IP addresses our crawler uses and/or allowing our crawler as a User Agent at Cloudflare and then re-indexing the entire site (as is the case with 400 and 499 error messages).
404 - Not Found - a broken link has been detected. We automatically remove broken links from search results, so it's more of an indication that you have some cleaning up to do by manually removing these dysfunctional URLs from your site.
NOTE: By default, we do not track which pages lead to broken links (404s), but there is a paid add-on feature called Crawler Log that would allow you to do so. When this feature is enabled, you'll be able to see the "Index Source" column in the Index Control Status Table:
This provides you with the opportunity to easily trace the route from a functional URL to a broken one, after which you can either update or remove the latter. Alternatively, you could use an external broken link checker to comb through your website in search of broken links, though it's usually better to keep everything in-house. Just keep in mind that running periodic checks for broken links is beneficial for your site in more ways than one: it ensures your search includes all possible entries, thus making them available to your customers, and also improves your SEO.
499 - Client Closed Request - your server closed the connection before a response was returned. This too can be fixed by whitelisting the IP addresses our crawler uses and/or allowing our crawler as a User Agent at Cloudflare and then re-indexing the entire site (repeating the steps listed for errors 400 and 403).
500 - Internal Server Error - our server (and/or yours) is experiencing temporary problems, but the system can't pinpoint a specific error or its root cause (e.g. the crawler failed to parse a non-HTML document).
520 - a specific return code used by Cloudflare to indicate that there was a protocol violation or an empty response by the website behind Cloudflare. The root cause will be explained in your Cloudflare logs.
800 - No-Index - the URL is not indexed because of a no-indexing rule set up in your Control Panel under Data sources -> Website Crawling -> No-Index URL Patterns. You can read up on those in our docs. Keep in mind that a URL with this status isn't indexed itself, but the crawler still follows its links, so the pages it leads to can be indexed. Blacklisted URLs, by contrast, are skipped entirely.
801 - Different Canonical - the URL is not indexed because of a specific rule set up in your Control Panel under Data sources -> Website Crawling -> Use Canonical URL.
802 - Blacklisted - the URL is not indexed because of a blacklisting rule set up in your Control Panel under Data sources -> Website Crawling -> Blacklist URL Patterns. You can read up on those in our docs. Keep in mind that URLs with this status are skipped entirely, meaning the crawler does not follow the links on them, unlike no-indexed URLs, which are still crawled for useful links.
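To make the difference between 800 (No-Index) and 802 (Blacklisted) concrete, here is a minimal Python sketch of a toy crawler. It is not how the Site Search 360 crawler is implemented: the in-memory SITE map, the crawl function, and the use of regular expressions as patterns are all assumptions made for illustration, as is the choice to check blacklist rules before no-index rules.

```python
import re
from collections import deque

# Hypothetical site: each URL maps to the links found on that page.
SITE = {
    "/": ["/blog", "/tags", "/admin"],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": [],
    "/tags": ["/tags/python"],
    "/tags/python": [],
    "/admin": ["/admin/users"],
    "/admin/users": [],
}


def crawl(site, start, no_index_patterns, blacklist_patterns):
    """Return {url: status} using the 200 / 800 / 802 codes from this article.

    Assumption: patterns are regular expressions and blacklist rules are
    checked first. The real pattern syntax and precedence may differ.
    """
    no_index = [re.compile(p) for p in no_index_patterns]
    blacklist = [re.compile(p) for p in blacklist_patterns]
    statuses = {}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in statuses:
            continue  # already visited
        if any(p.search(url) for p in blacklist):
            statuses[url] = 802  # skipped entirely: its links are never read
            continue
        if any(p.search(url) for p in no_index):
            statuses[url] = 800  # not indexed, but its links are still followed
        else:
            statuses[url] = 200  # indexed
        queue.extend(site.get(url, []))
    return statuses


result = crawl(
    SITE,
    "/",
    no_index_patterns=[r"^/tags$"],
    blacklist_patterns=[r"^/admin$"],
)
for url, status in sorted(result.items()):
    print(status, url)
```

In this toy run, /tags gets status 800 while /tags/python is still discovered and indexed. /admin gets status 802, and /admin/users never shows up at all because the only page linking to it was skipped.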
If you have any questions about index status error codes or other features of the Site Search 360 platform, please contact us via email. We're always happy to help!