Search Engines

Search Engines data is retrieved from your server logs. It is used first to compute the number of crawls per search engine for each URL, and therefore whether a URL is known to a given search engine. It is also used to compute visits. This data mainly feeds the Search Engines tab of the Botify Analytics report.

It also introduces Search Engines orphan URLs: URLs that are not in your website structure (or not in the scope of your crawl) but were crawled by a search engine. These orphan URLs appear as the red part of the following chart, available in the Search Engines tab of the Botify Analytics report.

Crawled URLs Venn diagram

URLs datamodel fields

Search Engines feature’s fields.

Examples of Aggregation

The following examples use URL aggregations to compute metrics on the main Search Engines data.
Note: All the following results are computed only on analyzed URLs (URLs crawled by Botify).

Number of crawled/not crawled URLs by Google

{
  "aggs": [
    {
      "group_by": [
        {
          "range": {
            "field": "search_engines.google.crawls.count",
            "ranges": [
              { "from": 1 }, // Crawled
              { "from": 0, "to": 1 } // Not Crawled
            ]
          }
        }
      ]
    }
  ]
}

Number of crawls by Google bot

{
  "aggs": [
    {
      "metrics": [
        { "sum": "search_engines.google.crawls.search.count" },
        { "sum": "search_engines.google.crawls.smartphone.count" },
        { "sum": "search_engines.google.crawls.ads.count" },
        { "sum": "search_engines.google.crawls.other.count" }
      ]
    }
  ]
}

Number of crawled/not crawled URLs by Google by depth

{
  "aggs": [
    {
      "group_by": [
        "depth",
        {
          "range": {
            "field": "search_engines.google.crawls.count",
            "ranges": [
              { "from": 1 }, // Crawled
              { "from": 0, "to": 1 } // Not Crawled
            ]
          }
        }
      ]
    }
  ]
}
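
The two group-by queries above differ only in the contents of their group_by list, so they can be generated from one helper. The following is a minimal Python sketch, not part of the Botify API: the function name crawled_split_query and its engine and extra_group_by parameters are illustrative assumptions, and only the google field name is confirmed by the examples above.

```python
import json


def crawled_split_query(engine="google", extra_group_by=()):
    """Build an aggregation payload that splits URLs into crawled / not crawled.

    Hypothetical helper: only the "google" field name appears in this
    documentation; passing another engine name is an assumption.
    """
    crawl_range = {
        "range": {
            "field": f"search_engines.{engine}.crawls.count",
            "ranges": [
                {"from": 1},           # crawled at least once
                {"from": 0, "to": 1},  # never crawled
            ],
        }
    }
    # Plain field names (such as "depth") come first, as in the example above.
    return {"aggs": [{"group_by": [*extra_group_by, crawl_range]}]}


# Reproduces the "by depth" query; json.dumps emits strict JSON (no comments).
print(json.dumps(crawled_split_query(extra_group_by=["depth"]), indent=2))
```

Calling crawled_split_query() with no arguments yields the first query of this section, without the explanatory comments.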

Get metadata

The Search Engines feature metadata includes:
- the imported data timeframe.
- orphan URL counts.

Request

  • Operation: getAnalysisSummary
  • Path: analyses/{username}/{project_slug}/{analysis_slug}
  • HTTP Verb: GET
  • Response: Analysis
curl "https://api.botify.com/v1/analyses/${username}/${project_slug}/${analysis_slug}" \
     -X GET \
     -H "Authorization: Token ${API_KEY}" \
     -H "Content-type: application/json"
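
The same request can be prepared from Python with the standard library alone. This is a sketch under assumptions: the function name analysis_summary_request and the placeholder argument values are hypothetical, while the URL pattern and headers are copied from the curl command above. The request is only built here, not sent.

```python
import urllib.request

API_ROOT = "https://api.botify.com/v1"


def analysis_summary_request(username, project_slug, analysis_slug, api_key):
    """Prepare (but do not send) the GET request for getAnalysisSummary."""
    url = f"{API_ROOT}/analyses/{username}/{project_slug}/{analysis_slug}"
    return urllib.request.Request(
        url,
        method="GET",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-type": "application/json",
        },
    )


# Placeholder values, for illustration only.
req = analysis_summary_request("my-user", "my-project", "my-analysis", "MY_API_KEY")
print(req.get_method(), req.full_url)
# To send it and decode the body: json.load(urllib.request.urlopen(req))
```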

Response

In the response, the date_start and date_end properties give the timeframe of the imported logs.
The orphans property gives the number of orphan URLs. For instance, the number of orphan URLs crawled by Google is in features.search_engines.orphans.crawls.google.total, and the total number of visits generated by those URLs is in features.search_engines.orphans.visits.google.
Note: If the feature is not enabled, features.search_engines resolves to null.

An extract of the response could look like the following.

{
  "features": {
    "search_engines": {
      "date_start": "2016-02-08",
      "date_end": "2016-03-09",
      "orphans": {
        "total": 1931796,
        "crawls": {
          "bing": {
            "bots": {
              "search": 518546
            },
            "total": 518546
          },
          "total": 1735262,
          "google": {
            "bots": {
              "search": 443855,
              "smartphone": 79227,
              "other": 686192,
              "ads": 7442
            },
            "total": 1216716
          }
        },
        "visits": {
          "bing": 79135,
          "total": 196534,
          "google": 117399
        }
      }
    }
  }
}
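
Reading these properties from a decoded response takes only a few dictionary lookups, plus a guard for the case where the feature is disabled. The sketch below assumes the response has already been parsed into a Python dict; the function name orphan_summary and the keys of the dict it returns are illustrative choices, not part of the API. The sample data is a trimmed copy of the extract above.

```python
def orphan_summary(analysis, engine="google"):
    """Return orphan counts for one search engine, or None when the
    Search Engines feature is disabled (features.search_engines is null)."""
    search_engines = (analysis.get("features") or {}).get("search_engines")
    if search_engines is None:
        return None
    orphans = search_engines["orphans"]
    crawls = orphans["crawls"][engine]
    return {
        "timeframe": (search_engines["date_start"], search_engines["date_end"]),
        "orphan_urls": crawls["total"],
        "by_bot": crawls["bots"],
        "visits": orphans["visits"][engine],
    }


# Trimmed copy of the response extract above (Google only).
analysis = {
    "features": {
        "search_engines": {
            "date_start": "2016-02-08",
            "date_end": "2016-03-09",
            "orphans": {
                "total": 1931796,
                "crawls": {
                    "total": 1735262,
                    "google": {
                        "bots": {"search": 443855, "smartphone": 79227,
                                 "other": 686192, "ads": 7442},
                        "total": 1216716,
                    },
                },
                "visits": {"total": 196534, "google": 117399},
            },
        }
    }
}

summary = orphan_summary(analysis)
print(summary["orphan_urls"], summary["visits"])  # 1216716 117399
print(orphan_summary({"features": {"search_engines": None}}))  # None
```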

Get Orphan URLs

Not available yet.
You can’t get the list of orphan URLs yet, which explains why you can’t click on the red part in the Botify Analytics Application.
Please note that we are developing a way to query these orphan URLs.