REST API

API methods

1. Indexing

The quickest way to get familiar with API indexing is our Postman collection of endpoints (the points where the API connects with Site Search 360), which you can download here. If you're unfamiliar with Postman, check out this article, which briefly explains what Postman is and how it works. In short, it's an intuitive platform for building and using APIs.

There are a couple of principles to keep in mind when working with the API.

The unique identifier for an indexed page is its URL. Each URL gets indexed once regardless of the data source. No source takes priority over the others, so the API can be used alongside Website Crawling, Sitemap Indexing, etc. If a page with a given URL is already indexed, it will be updated rather than duplicated, so any changes to the page's content will be reflected in the search. However, if the content stays the same but the URL changes, the page is treated as a completely new one, in which case the old URL must be removed from the index to avoid duplicate content in the search results.

Your API setup might look like this:

POST
https://api.sitesearch360.com/index/documents?url={URL}&token={API_KEY}

{URL} is the absolute URL you want to index and {API_KEY} is your search project's API key that you can find under Account > General (if your plan includes API access).

The /documents endpoint is responsible for pushing the URL to your search index. Our crawler checks and processes it according to various settings defined in your Control Panel (for instance, blacklisting/no-indexing rules, Data points, Result Grouping, Content Extraction), so some entries might be skipped, moved to certain result groups, etc.
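
As a minimal sketch of the call above, the following Python builds the POST request without sending it. The helper name build_index_request, the example page URL, and the placeholder key are illustrative assumptions; only the endpoint and the url/token parameters come from this page. Percent-encoding the page URL is a precaution so that its own "?" and "&" are not mistaken for parameters of the API call.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint as documented above.
INDEX_ENDPOINT = "https://api.sitesearch360.com/index/documents"


def build_index_request(page_url: str, api_key: str) -> Request:
    """Build (but do not send) the POST request that pushes one URL to the index."""
    # urlencode percent-encodes the page URL so it survives as a single parameter.
    query = urlencode({"url": page_url, "token": api_key})
    return Request(f"{INDEX_ENDPOINT}?{query}", method="POST")


# Hypothetical page URL and placeholder API key.
request = build_index_request("https://example.com/docs?page=2", "YOUR_API_KEY")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

The request is only constructed here, so the snippet can be run safely without touching your index.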

Alternatively, you can send the searchable content as JSON, in which case the request looks like this:

POST
https://api.sitesearch360.com/sites/pageJson?token={API_KEY}

The /pageJson endpoint requires a JSON request body. Our crawler isn't used for this by default. To make your page's content searchable (and not just the URL, the title, and whatever else is defined directly in the JSON), set appendPageContent to true in the request. This will trigger our crawler to download and index the content of the page, combining it with what was defined in the request.

If you send the searchable content in JSON, it doesn't respect any rules configured in the Control Panel (for instance, blacklisting/no-indexing rules, Data points, Result Grouping, Content Extraction).

The body of the POST request must contain a JSON object of the following format:

{
  "url": "https://test.com",
  "title": "My page title",
  "content": "This is a simple test page",
  "appendPageContent": false,
  "contentGroup": "Services",
  "imageUrl": "https://test.com/image.jpg",
  "dataPoints": [
    {
      "key": "Name of Data Point",
      "value": "Value of Data Point",
      "show": false
    },
    {
      "key": "Price",
      "value": "$3",
      "show": true
    }
  ],
  "filters": [
    {  /* Range filter */
       "key": "fid#1",
       "value": 2.5
    },
    {  /* Multiple choice filter */
       "key": "fid#2",
       "value": ["val1", "val2"]
    }
  ],
  "boost": 3,
  "keywords": ["keyword1", "keyword2", "keyword3"],
  "language": "english"
}

The fields dataPoints, filters, language, boost, and keywords are optional.

NOTE: Filters have to be defined in the Control Panel and referenced with the generated Filter-ID.
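
The JSON body above can be assembled programmatically. The sketch below is one possible way to do it in Python; the helper name build_page_json_request, the check for unknown field names, and the Content-Type header are assumptions made for illustration, while the field names themselves are taken from the sample object above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

PAGE_JSON_ENDPOINT = "https://api.sitesearch360.com/sites/pageJson"

# Field names from the sample object above, besides url, title and content.
EXTRA_FIELDS = {
    "appendPageContent", "contentGroup", "imageUrl", "dataPoints",
    "filters", "boost", "keywords", "language",
}


def build_page_json_request(api_key, url, title, content, **extra) -> Request:
    """Build (but do not send) a POST request that indexes one page from JSON."""
    unknown = set(extra) - EXTRA_FIELDS
    if unknown:
        # Catch typos such as "keyword" locally, before anything is sent.
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    payload = {"url": url, "title": title, "content": content, **extra}
    return Request(
        f"{PAGE_JSON_ENDPOINT}?{urlencode({'token': api_key})}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


request = build_page_json_request(
    "YOUR_API_KEY",
    "https://test.com",
    "My page title",
    "This is a simple test page",
    boost=3,
    keywords=["keyword1", "keyword2"],
)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request)
```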

You can also index up to 100 JSON entries at once:

POST
https://api.sitesearch360.com/sites/pagesJson?token={API_KEY}

[{
  "url": "https://test.com",
  "title": "My page title",
  "content": "This is a simple test page",
  "appendPageContent": false,
  "contentGroup": "Services",
  "imageUrl": "https://test.com/image.jpg",
  "dataPoints": [
    {
      "key": "Name of Data Point",
      "value": "Value of Data Point",
      "show": false
    },
    {
      "key": "Price",
      "value": "$3",
      "show": true
    }
  ],
  "filters": [
    {  /* Range filter */
       "key": "fid#1",
       "value": 2.5
    },
    {  /* Multiple choice filter */
       "key": "fid#2",
       "value": ["val1", "val2"]
    }
  ],
  "boost": 3,
  "keywords": ["keyword1", "keyword2", "keyword3"],
  "language": "english"
},
...]
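
Because a single request accepts at most 100 entries, larger sets have to be split up first. A small sketch, assuming your pages are already held as a list of dictionaries in the format shown above (the batches helper and the generated example pages are hypothetical):

```python
from typing import Iterator

MAX_BATCH_SIZE = 100  # per-request limit stated above


def batches(pages: list[dict], size: int = MAX_BATCH_SIZE) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` pages."""
    for start in range(0, len(pages), size):
        yield pages[start:start + size]


# 250 made-up pages, just to show how the split works.
pages = [
    {"url": f"https://example.com/p/{i}", "title": f"Page {i}", "content": "Sample"}
    for i in range(250)
]
batch_sizes = [len(batch) for batch in batches(pages)]
print(batch_sizes)  # [100, 100, 50]
```

Each batch can then be serialized with json.dumps and sent as the body of its own request.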

To remove a page from the index, send a DELETE request with its URL:

DELETE
https://api.sitesearch360.com/index/documents?url={URL}&token={API_KEY}

{URL} remains the absolute URL you want to remove.

DELETE
https://api.sitesearch360.com/index/documents?urlPattern={URL_PATTERN}&token={API_KEY}

{URL_PATTERN} is a regular expression. All URLs matching said expression will be removed from the index.
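
Both deletion variants can be sketched in Python as follows. The helper name and the example blog URLs are assumptions. The regular expression dialect the service expects is not specified on this page, so the pattern below sticks to basic syntax and is only checked locally with Python's re module; test a pattern carefully before sending it, since every matching URL is removed.

```python
import re
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

DOCUMENTS_ENDPOINT = "https://api.sitesearch360.com/index/documents"


def build_delete_request(api_key, *, url=None, url_pattern=None) -> Request:
    """Build (but do not send) a DELETE request for one URL or a URL pattern."""
    if (url is None) == (url_pattern is None):
        raise ValueError("pass exactly one of url or url_pattern")
    params = {"url": url} if url is not None else {"urlPattern": url_pattern}
    params["token"] = api_key
    return Request(f"{DOCUMENTS_ENDPOINT}?{urlencode(params)}", method="DELETE")


# Hypothetical pattern: everything below https://example.com/blog/.
# re.escape keeps the dot in the host name from matching any character.
pattern = re.escape("https://example.com/blog/") + ".*"

# Local sanity check of what the pattern would and would not match.
print(bool(re.fullmatch(pattern, "https://example.com/blog/post-1")))  # True
print(bool(re.fullmatch(pattern, "https://example.com/shop/item-1")))  # False

request = build_delete_request("YOUR_API_KEY", url_pattern=pattern)
print(parse_qs(urlsplit(request.full_url).query))
```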

2. Searching

Your API setup for searching through indexed entries might look like this:

GET
https://global.sitesearch360.com/sites?query={QUERY}&site={SITE_ID}&filterOptions={FILTER_OPTIONS}&filters={FILTERS}&sort={SORT}&sortOrder={ORDER}&offset={OFFSET}&limit={LIMIT}&includeContent={INCLUDE}&highlightQueryTerms={HIGHLIGHT}&includeContentGroups={CONTENT_GROUPS}&log={LOG}

Each parameter has its own meaning:

  • {SITE_ID} is your site ID.

  • {QUERY} is your search query.

  • {FILTER_OPTIONS} (optional) controls whether the API returns the filters found in the result set. They are returned only if this parameter is set to true.

  • {FILTERS} (optional) is a JSON array of filters. For example, [{"key": "fid#2", "values": [{"name": "val1"}]},{"key": "fid#1", "min": 2, "max": 5}]. When setting up filters in the ss360Config.results.filters property, the following structure must be used instead: [{"key": "fid#2", "name": "My Filter 2", "values": [{"name": "val1", "value": "val1"}]},{"key": "fid#1", "name": "My Filter 1", "min": 2, "max": 5}].

  • {SORT} (optional) is the name of the data point by which you want the search results to be sorted.

  • {ORDER} (optional) is the sorting order (can be either ASC for ascending or DESC for descending).

  • {INCLUDE} (optional) controls search snippets: if set to true, your search results will contain the content snippet. The default is false.

  • {HIGHLIGHT} (optional) controls highlighting (this feature is used to signify complete matches between the query and the result): if set to true, query terms will be highlighted in your search results. The default is false.

  • {CONTENT_GROUPS} (optional) is a JSON array which corresponds with specific result groups you want to limit your search to, for instance: ["Group 1","Group 2"].

  • {OFFSET} (optional) is the number of results to skip from the beginning of your search results.

  • {LIMIT} (optional) is the number of results to return, in the range [1,100].

  • {LOG} (optional) configures whether to log the query or not and can be set to either true or false (but the default is true).
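
To show how these parameters fit together, here is a sketch that assembles a search URL in Python. The helper name, the site ID, and the query are made up for illustration. Serializing booleans as lowercase true/false is an assumption based on the wording above, and the filter structure follows the {FILTERS} example.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

SEARCH_ENDPOINT = "https://global.sitesearch360.com/sites"


def build_search_url(site_id, query, *, filters=None, content_groups=None, **options):
    """Assemble the GET URL; `options` holds the remaining optional parameters."""
    limit = options.get("limit")
    if limit is not None and not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    params = {"query": query, "site": site_id}
    if filters is not None:
        params["filters"] = json.dumps(filters)  # JSON array, as described above
    if content_groups is not None:
        params["includeContentGroups"] = json.dumps(content_groups)
    for name, value in options.items():
        # Assumption: booleans are sent as the strings "true" and "false".
        params[name] = str(value).lower() if isinstance(value, bool) else value
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


search_url = build_search_url(
    "example.com",                      # hypothetical site ID
    "running shoes",
    filters=[{"key": "fid#1", "min": 2, "max": 5}],
    content_groups=["Group 1"],
    includeContent=True,
    offset=0,
    limit=20,
)
print(search_url)
print(parse_qs(urlsplit(search_url).query))  # decoded view of the parameters
```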

The search response will have the following structure:

{
    "suggests": {
        "_": [
            {
                "name": "Search result title",
                "image": "https://mysite.com/sample.jpg",
                "link": "https://mysite.com/sample.html",
                "type": "HTML",
                "content": "This is the search snippet.",
                "dataPoints": [
                    {
                        "key": "Data Point Name",
                        "value": "Data Point Value",
                        "show": true
                    }
                ]
            },
            …
        ],
        "Products": [
            …
        ]
    },
    "query": "The search query",
    "totalResults": 360,
    "totalResultsPerContentGroup": {
        "_": 5,
        "Products": 355  
    },
    "sortingOptions": ["Date"],
    "sorting": "",
    "filterOptions": [],
    "activeFilterOptions": []
}

  • suggests maps each content group to an array of retrieved search results (all uncategorized results are listed under _).

  • query is your search query.

  • totalResults is the number of all available search results.

  • totalResultsPerContentGroup is a mapping of all available search results per content group.

  • sortingOptions is an array of all available sorting options.

  • sorting is the active/selected sorting option.

  • filterOptions is an array of all available filtering options.

  • activeFilterOptions is an array of all active/applied filter options.
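
A response with this structure can be walked as in the sketch below. The sample data is a trimmed, made-up response in the documented shape, and flatten_results is a hypothetical helper that turns the grouped results into a flat list.

```python
import json

# Trimmed, invented response following the structure described above.
sample_response = json.loads("""
{
    "suggests": {
        "_": [
            {"name": "About us", "link": "https://mysite.com/about.html", "type": "HTML"}
        ],
        "Products": [
            {"name": "Blue mug", "link": "https://mysite.com/mug.html", "type": "HTML"}
        ]
    },
    "query": "mug",
    "totalResults": 2,
    "totalResultsPerContentGroup": {"_": 1, "Products": 1}
}
""")


def flatten_results(response: dict) -> list[tuple[str, str, str]]:
    """Return (group, title, link) rows for every result in the response."""
    rows = []
    for group, results in response.get("suggests", {}).items():
        label = "(uncategorized)" if group == "_" else group
        for result in results:
            rows.append((label, result["name"], result["link"]))
    return rows


rows = flatten_results(sample_response)
for group, title, link in rows:
    print(f"{group}: {title} -> {link}")
```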

Each search result has one of the following types:

  • HTML

  • YOUTUBE_VIDEO

  • PDF

  • XLS

  • DOC

  • PPT

  • ODP

  • ODS

  • ODT

  • TXT

  • custom

NOTE: To support custom HTML results ("type": "custom") you need to use the html property which will render your search result.

If you want to get query suggestions (autocomplete) and suggested results based on a query, use this API call:

GET
https://global.sitesearch360.com/sites/suggest?query={QUERY}&limit={LIMIT}&site={SITE_ID}&groupResults={GROUP_RESULTS}&maxQuerySuggestions={MAX_QUERY_SUGGESTIONS}

Parameters in this setup are as follows:

  • {SITE_ID} is your site ID.

  • {QUERY} is your search query.

  • {LIMIT} (optional) is the number of results to return, within the range [1,2].

  • {GROUP_RESULTS} (optional) defines whether to group search suggestions by content group (it is set to true by default).

  • {MAX_QUERY_SUGGESTIONS} (optional) defines the maximum number of query suggestions (autocomplete queries) to return. Please note that these will only show up if matching phrases for the given query have been searched multiple times before.

NOTE: To support HTML custom suggestions ("type": "custom") you need to use the suggestionHtml property to render the search suggestion.
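
The suggestion call can be assembled the same way as a search call. This is a minimal sketch; the helper name, site ID, and query are illustrative, and sending the boolean as a lowercase string is an assumption.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SUGGEST_ENDPOINT = "https://global.sitesearch360.com/sites/suggest"


def build_suggest_url(site_id, query, limit=None, group_results=None,
                      max_query_suggestions=None):
    """Assemble the GET URL for suggestions, leaving out unset optional parameters."""
    params = {"query": query, "site": site_id}
    if limit is not None:
        params["limit"] = limit
    if group_results is not None:
        # Assumption: booleans are sent as the strings "true" and "false".
        params["groupResults"] = "true" if group_results else "false"
    if max_query_suggestions is not None:
        params["maxQuerySuggestions"] = max_query_suggestions
    return f"{SUGGEST_ENDPOINT}?{urlencode(params)}"


suggest_url = build_suggest_url(
    "example.com",  # hypothetical site ID
    "run",          # partial query typed so far
    group_results=False,
    max_query_suggestions=3,
)
print(suggest_url)
```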

Searches are logged automatically (if {LOG} is true). If you use the API without the Site Search 360 JavaScript, you can also log abandoned and selected search suggestions with the query logging endpoint described in the next section.

3. Miscellaneous

To log a specific query, you’ll need this setup:

POST
https://api.sitesearch360.com/sites/queries/log

The following parameters will have to be included in the POST body:

  • query is the query to log.

  • action is the action to log: either "select" (the user selected a search suggestion) or "abandon" (search suggestions were generated but the query was not executed).

  • timeToAction (optional) is the number of milliseconds that elapsed before your user abandoned or selected the query.

  • site is your site ID.

  • apiKey is your API key.
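
The parameters above can be collected and validated before sending, as in this sketch. The helper name and example values are hypothetical. How the body should be encoded (form fields or JSON) is not stated on this page, so the sketch stops at the parameter dictionary and leaves the encoding to you.

```python
VALID_ACTIONS = {"select", "abandon"}


def build_log_params(site_id, api_key, query, action, time_to_action_ms=None):
    """Collect the POST body parameters for logging one query."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"action must be one of {sorted(VALID_ACTIONS)}")
    params = {"query": query, "action": action, "site": site_id, "apiKey": api_key}
    if time_to_action_ms is not None:
        params["timeToAction"] = int(time_to_action_ms)  # milliseconds
    return params


# Hypothetical case: a user picked a suggestion 1.8 seconds after typing.
log_params = build_log_params(
    "example.com", "YOUR_API_KEY", "running shoes", "select", time_to_action_ms=1800
)
print(log_params)
```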

To retrieve the number of indexed pages as JSON, use the following request:

GET
https://api.sitesearch360.com/sites/indexStatus?token={API_KEY}

To check which pages have been indexed, use the following request:

GET
https://api.sitesearch360.com/sites/indexedContent?url={URL}&contentType={CONTENT_TYPE}&status={STATUS}&offset={OFFSET}&limit={LIMIT}&token={API_KEY}

The following query parameters are used to filter the results:

  • {URL} (optional) is the string that should be part of the URL.

  • {CONTENT_TYPE} (optional) is the content type you want to filter by (e.g. "HTML" or "PDF").

  • {STATUS} (optional) is the Index Status you want to filter by (e.g. “200” for successfully indexed entries).

  • {OFFSET} (optional) is the number of entries to skip from the beginning of the list.

  • {LIMIT} (optional) is the number of entries to return, in the range [1,100].
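
Since at most 100 entries come back per request, listing a larger index means stepping through it with offset and limit. The sketch below only generates the request URLs. It assumes that offset counts from zero and that you already know the total number of entries (passed in as the hypothetical total argument); neither detail is confirmed on this page.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

INDEXED_CONTENT_ENDPOINT = "https://api.sitesearch360.com/sites/indexedContent"


def indexed_content_urls(api_key, total, page_size=100, **filters):
    """Yield one request URL per page of up to `page_size` entries."""
    if not 1 <= page_size <= 100:
        raise ValueError("page_size must be between 1 and 100")
    for offset in range(0, total, page_size):
        params = {**filters, "offset": offset, "limit": page_size, "token": api_key}
        yield f"{INDEXED_CONTENT_ENDPOINT}?{urlencode(params)}"


# Hypothetical: 250 indexed PDF entries, fetched 100 at a time.
urls = list(indexed_content_urls("YOUR_API_KEY", total=250, contentType="PDF"))
for url in urls:
    print(url)
```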

To access URLs logged in the Index Status Table under Index, you can use the following request:

GET
https://api.sitesearch360.com/sites/queries/frequent?start={START_TIMESTAMP}&end={END_TIMESTAMP}&token={API_KEY}

  • {START_TIMESTAMP} is the UNIX timestamp of the beginning of the period during which the desired URLs were indexed.

  • {END_TIMESTAMP} is the UNIX timestamp of the end of the period during which the desired URLs were indexed.

The setup for a time chart of indexed entries is pretty much the same:

GET
https://api.sitesearch360.com/sites/queries/timechart?start={START_TIMESTAMP}&end={END_TIMESTAMP}&token={API_KEY}

  • {START_TIMESTAMP} is the UNIX timestamp of the beginning of the period during which the desired URLs were indexed.

  • {END_TIMESTAMP} is the UNIX timestamp of the end of the period during which the desired URLs were indexed.
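
Both requests above take a start and an end UNIX timestamp. The sketch below computes such a pair for a 30-day window and plugs it into either URL. It assumes the timestamps are expected in seconds (multiply by 1000 if the API turns out to expect milliseconds); the helper names and the dates are illustrative.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

QUERIES_BASE = "https://api.sitesearch360.com/sites/queries"


def unix_range(end: datetime, days: int) -> tuple[int, int]:
    """Return (start, end) as UNIX timestamps in seconds for the `days` before `end`."""
    start = end - timedelta(days=days)
    return int(start.timestamp()), int(end.timestamp())


def build_report_url(report: str, api_key: str, start: int, end: int) -> str:
    """`report` is the last path segment: "frequent" or "timechart"."""
    query = urlencode({"start": start, "end": end, "token": api_key})
    return f"{QUERIES_BASE}/{report}?{query}"


# Fixed example window: the 30 days before 31 January 2024, 00:00 UTC.
start_ts, end_ts = unix_range(datetime(2024, 1, 31, tzinfo=timezone.utc), days=30)
print(start_ts, end_ts)  # 1704067200 1706659200
print(build_report_url("timechart", "YOUR_API_KEY", start_ts, end_ts))
```

Using timezone-aware datetimes avoids results that shift with the local time zone of the machine running the script.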

If you have any questions about the API or any other topic, feel free to email us. We're always happy to help!