Working with XPaths

How can we control which content is used in our search?

We made the Site Search 360 crawlers as intelligent as possible when it comes to analyzing your website and picking the title, image, and content from the right place. Nonetheless, sometimes it is still necessary to tune the crawlers by pointing them directly to the desired content's position.

This is done via XPath expressions placed in the Site Search 360 control panel. In the following video, we will show you how to configure your search engine to present the content your visitors are searching for.
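To show what "pointing the crawler to the right place" means in practice, here is a minimal Python sketch that evaluates XPath expressions for a title, an image, and the main content. It uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath. The sample page, the `content` id, and the `first_match` helper are hypothetical. Real pages are HTML rather than well-formed XML, and the Site Search 360 crawler evaluates XPaths with its own engine, not with this code.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical, well-formed sample page; real-world HTML is often not valid XML.
PAGE = """<html><body>
<div id="nav"><a href="/">Home</a></div>
<div id="content">
  <h1>Pricing</h1>
  <img src="/img/plans.png"/>
  <p>All plans include a free trial.</p>
</div>
</body></html>"""


def first_match(root: ET.Element, xpath: str) -> Optional[ET.Element]:
    """Return the first element matched by `xpath`, or None.

    ElementTree rejects paths that begin with '/', so a leading '//'
    is rewritten to the equivalent relative form './/'.
    """
    if xpath.startswith("//"):
        xpath = "." + xpath
    return root.find(xpath)


root = ET.fromstring(PAGE)
title = first_match(root, "//h1")
image = first_match(root, "//div[@id='content']/img")
content = first_match(root, "//div[@id='content']")

print(title.text)          # Pricing
print(image.get("src"))    # /img/plans.png
# Collapse the whitespace between the content's text nodes.
print(" ".join("".join(content.itertext()).split()))
# Pricing All plans include a free trial.
```

An XPath that matches nothing (for example `//table` on this page) makes `first_match` return `None`, which is roughly what you would see as an empty result when testing an XPath that does not fit your markup.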

  1. First, find and install the Google Chrome extension "XPath Helper", which lets you define XPaths directly on your own site.
  2. Navigate to one of your website's content pages. Click the XPath Helper icon in the top right corner of your browser to open the black overlay, which shows the currently selected XPath expression.
  3. Now extract the main content. With XPath Helper open, hold the [Shift] key and hover your mouse over your website's elements. The extension highlights them in yellow and displays the matching XPath query in the black overlay box. As you move your mouse, this XPath query changes. Try to get all of your content highlighted in yellow.
  4. You can fine-tune your XPath expression by shortening it. There are two ways to shorten an XPath query: remove something from the end so that it matches more child nodes, or keep the tail and cut off the head so that it matches more generally. When shortening from the front, make sure your XPath still starts with //. You can find a live example of this step in the video.
  5. Copy the XPath query over to the Site Search 360 control panel and place it under "Indexing Control" > "Crawler Settings" in the appropriate XPath section, e.g. "Include Content XPath(s)" or "Image XPath(s)".
  6. Press the "Test" button and enter your webpage URL to test the XPath query. If everything works, you will see the extracted content, headline, or image URL below.
  7. You can define XPath expressions for the following settings:
    • Include Content XPath(s): Only content found by these XPaths will be indexed. Leave empty if everything should be indexed.
    • Title XPath(s): The XPath pointing to the main title of the page. A default is preset; change it if the title is located elsewhere on your site.
    • Image XPath(s): The XPath(s) pointing to the main image. Leave empty if you trust our crawler to find it. For example,
    • Default Image XPath: The XPath pointing to the default image to be used when no other image is found. For example,
    • Exclude Content XPath(s): One XPath per line. Content found by these XPaths will not be indexed. Leave empty if everything should be indexed.
    • Search Snippet XPath: The XPath pointing to the content that you want to be shown in the search results. Note that you also have to change the Search Snippet setting under "Search Settings".
  8. After you set all your XPaths, don't forget to save the new settings.
  9. For the new settings to take effect, you have to re-index your entire site under the "Index Control" section in the SS360 control panel.
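The two shortening strategies from step 4 can be illustrated with a small Python sketch built on the standard library's `xml.etree.ElementTree`. The sample page and the absolute path (the kind of path XPath Helper tends to produce) are hypothetical, and the `to_etree_path` helper exists only to translate XPaths into the limited subset ElementTree accepts; it is not part of Site Search 360.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page standing in for one of your content pages.
PAGE = """<html><body>
<div id="header"><p>Menu</p></div>
<div id="main">
  <article>
    <h1>Release notes</h1>
    <p>Version 2 adds filters.</p>
    <p>Version 1 added search.</p>
  </article>
</div>
</body></html>"""


def to_etree_path(xpath: str, root_tag: str) -> str:
    """Rewrite an XPath into the subset that ElementTree accepts.

    '//x' becomes './/x'. For an absolute path such as '/html/body/...',
    the root step is dropped because ElementTree searches from the root
    element itself, giving './body/...'.
    """
    if xpath.startswith("//"):
        return "." + xpath
    if xpath.startswith("/"):
        first, _, rest = xpath[1:].partition("/")
        if first != root_tag:
            raise ValueError(f"{xpath!r} does not start at <{root_tag}>")
        return "./" + rest if rest else "."
    return xpath


def matches(page: str, xpath: str) -> list[str]:
    """Return the whitespace-normalized text of every node `xpath` selects."""
    root = ET.fromstring(page)
    found = root.findall(to_etree_path(xpath, root.tag))
    return [" ".join("".join(el.itertext()).split()) for el in found]


for xpath in [
    "/html/body/div[2]/article/p[1]",  # as copied: first paragraph only
    "/html/body/div[2]/article/p",     # tail shortened: both paragraphs
    "//article/p",                     # head cut off: both paragraphs
    "//p",                             # cut too far: also picks up "Menu"
]:
    print(xpath, "->", matches(PAGE, xpath))
```

Cutting off the head (`//article/p`) selects the same paragraphs as the long path but no longer depends on the article sitting inside the second `div`, so it is less likely to break when the page layout changes. Cutting too much (`//p`) pulls in navigation text as well.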
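Step 7 describes how Include Content XPath(s) and Exclude Content XPath(s) narrow down what gets indexed. The following Python sketch models that behavior on a hypothetical page using the standard library's `xml.etree.ElementTree`. It assumes excluded nodes are removed before the include XPaths are applied; the order in which Site Search 360 actually combines the two settings is an assumption here, and `extract_indexed_text` is a hypothetical name, not a Site Search 360 API.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page with navigation, main content, and a footer.
PAGE = """<html><body>
<div id="nav"><a href="/">Home</a> <a href="/docs">Docs</a></div>
<div id="content">
  <h1>Installation</h1>
  <p>Run the installer.</p>
  <div class="share">Share this page</div>
</div>
<div id="footer">Copyright</div>
</body></html>"""


def rel(xpath: str) -> str:
    # ElementTree rejects paths starting with '/', so '//x' becomes './/x'.
    return "." + xpath if xpath.startswith("//") else xpath


def extract_indexed_text(page: str, include: list[str], exclude: list[str]) -> str:
    """Hypothetical model of the Include/Exclude Content XPath(s) settings.

    Assumption: excluded nodes are removed first; then the include XPaths
    (or the whole page, if the include list is empty) select what remains.
    """
    root = ET.fromstring(page)
    # ElementTree elements do not know their parent, so build a lookup table.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for xpath in exclude:
        for el in root.findall(rel(xpath)):
            parent = parent_of.get(el)
            if parent is not None:
                parent.remove(el)  # also drops the text that follows el
    selected = [el for xp in include for el in root.findall(rel(xp))] or [root]
    words: list[str] = []
    for el in selected:
        words.extend("".join(el.itertext()).split())
    return " ".join(words)


# Empty include and exclude lists: everything is indexed.
print(extract_indexed_text(PAGE, include=[], exclude=[]))
# Only the content block, minus its share widget.
print(extract_indexed_text(PAGE, include=["//div[@id='content']"],
                           exclude=["//div[@class='share']"]))
```

With both lists empty, the whole page (including the navigation and footer) ends up in the index, which matches the "Leave empty if everything should be indexed" note for both settings.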
If you have any questions about this process, please feel free to contact us. Use our live chat in Gitter or the chat widget on the Site Search 360 main site. You can also reach us via email at mail[at]sitesearch360.com.