Starting the Crawler
The quality of your search results depends on the quality of your search index. This is why it is crucial to tell the Site Search 360 crawler which pages, documents, and other content on your site are important to your visitors.
If you have never heard of a crawler before: it is a type of bot that browses your website and builds the search index that your search is based on. A search index is a list of pages and documents that can be shown as search results when a visitor types a query into the search box on your site.
You have two main options to direct our crawler to the right pages and content:
1. Website Crawling
When you enter a root URL (typically your homepage) and click Index, our crawler visits the root URL and follows every link that points to other pages on your site. Each page it discovers is added to your index.
You can enter just one root URL or several.
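If you are curious what link-following looks like under the hood, the minimal Python sketch below illustrates the idea: start at a root URL, collect the links on each page, and visit every same-site link once. The root URL and page cap are placeholders, and this is only an illustration of the concept, not the Site Search 360 crawler itself.

```python
# Illustrative sketch of link-following, not the Site Search 360 crawler.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import urllib.request

ROOT = "https://www.yoursite.com/"  # placeholder root URL
MAX_PAGES = 50                      # keep the sketch bounded

class LinkCollector(HTMLParser):
    """Gathers the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

seen = {ROOT}
queue = deque([ROOT])
visited = 0
while queue and visited < MAX_PAGES:
    page = queue.popleft()
    visited += 1
    try:
        with urllib.request.urlopen(page) as resp:
            html = resp.read().decode("utf-8", errors="replace")
    except OSError:
        continue  # skip pages that cannot be fetched
    collector = LinkCollector()
    collector.feed(html)
    for href in collector.links:
        link = urljoin(page, href)
        # Follow only links that stay on the same site.
        if urlparse(link).netloc == urlparse(ROOT).netloc and link not in seen:
            seen.add(link)
            queue.append(link)

print(f"Discovered {len(seen)} pages starting from {ROOT}")
```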
2. Sitemap Indexing
Sitemap Indexing is our preferred indexing method as it is the quickest and most efficient way to crawl a website.
If we can detect a valid sitemap XML file for the domain you provided at registration, our crawler will fetch that sitemap (typically found at https://www.yoursite.com/sitemap.xml or https://www.yoursite.com/sitemap-index.xml) and pick up the website URLs listed there.
Note: The sitemap XML file must be formatted correctly for the crawler to process it. Check out these guidelines.
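For orientation, a correctly formatted sitemap follows the sitemaps.org protocol: a <urlset> root element containing one <url> entry with a <loc> tag per page. The Python sketch below writes such a minimal file; the two page URLs are placeholders.

```python
# Minimal sketch: generate a sitemap following the sitemaps.org protocol.
# The page URLs are placeholders; list your own pages instead.
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", NS)  # emit the tags without a namespace prefix

urlset = ET.Element(f"{{{NS}}}urlset")
for page in ("https://www.yoursite.com/", "https://www.yoursite.com/about"):
    url = ET.SubElement(urlset, f"{{{NS}}}url")
    ET.SubElement(url, f"{{{NS}}}loc").text = page

# Writes the XML declaration plus the <urlset> with its <url>/<loc> entries.
ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)
```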
If we cannot detect a valid sitemap for your domain, we automatically switch on the Website Crawling method.
If we did not detect your sitemap but you have one, simply provide the URL to your sitemap (or sitemaps, if you have more than one). Press Test Sitemap to make sure your sitemap is valid and ready for indexing. This check also shows you how many URLs were found in your sitemap. If the test passes, you can go ahead and press Index All Sitemaps.
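If you would like to sanity-check that count yourself, the short Python sketch below fetches a sitemap (the URL is a placeholder) and counts its <loc> entries using only the standard library.

```python
# Minimal sketch: count the <loc> entries in a sitemap before indexing.
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://www.yoursite.com/sitemap.xml"  # placeholder

with urllib.request.urlopen(SITEMAP_URL) as resp:
    root = ET.fromstring(resp.read())

# Sitemap tags live in the sitemaps.org namespace; .//sm:loc also matches
# the entries of a sitemap index file.
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
locs = root.findall(".//sm:loc", ns)
print(f"{len(locs)} URLs found in {SITEMAP_URL}")
```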
With Website Crawling, by contrast, the only way to check the number of indexed pages and documents is to wait until a full crawl is complete.
If necessary, you can use Website Crawling and Sitemap Indexing at the same time.
Don't forget to switch on the "Auto Re-Index" toggle under your preferred crawling method(s)!
Note: If you use both "Website Crawling" and "Sitemap Indexing" to push pages to the search results, the contents of your sitemap(s) are taken into account first. If a given URL is found in both Data Sources, it will not be indexed again once our crawler starts following links from your root URL(s).
How do I index and search over multiple sites?
Let's assume you have the following setup:
A blog under http://blog.mysite.com/
Your main page under http://mysite.com/
Some content on a separate domain, http://myothersite.com/
Now you want to crawl all three sites and end up with one index and a search that finds content on all those pages.
You can easily achieve this with one of the following three methods, or a combination of them:
Create a sitemap that contains URLs from all the sites you want to index, or submit multiple sitemaps, one per line. In this case, our crawler only picks up the links listed in your sitemap(s).
Let the crawler index multiple sites by providing multiple root URLs in Website Crawling.
Add pages from any of your sites via the API using your API key (available with the Batman plan or if you create a custom plan with API access). You can either index by URL or send a JSON object with the indexable content, as sketched after this list.
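To give a feel for the shape of such a request, here is a hedged Python sketch. The endpoint, headers, and payload fields below are hypothetical placeholders, not the actual Site Search 360 API; consult the API documentation for the real request format.

```python
# Hypothetical sketch only: the endpoint and payload fields are illustrative
# placeholders, not the actual Site Search 360 API.
import json
import urllib.request

API_KEY = "your-api-key"                       # placeholder
ENDPOINT = "https://api.example.com/v1/index"  # hypothetical endpoint

# Either index by URL alone, or send the indexable content directly.
doc = {
    "url": "http://blog.mysite.com/some-post",
    "title": "Some post",
    "content": "Indexable text of the page.",
}

req = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(doc).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",  # hypothetical auth scheme
    },
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)  # a 200-level response indicates success
```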
Now you are familiar with the main ways to start the crawler and build your search index. For information on how to further control which pages from your configured sources end up in your index, refer to our Crawler Settings.