BQLFilter defines the filter to apply to URL fields. It can be composed using boolean conditions (and, or, not).
A field filter describes a predicate to apply to a given field. The full list of filterable fields can be found in the Analysis Datamodel.
{
"predicate": string,
"field": string,
"value": ?any,
}
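As a minimal sketch, a field filter can be assembled as a plain dictionary before being serialized to JSON. The helper name field_filter is hypothetical and not part of any Botify client. It only mirrors the three keys shown above, and it reads the ? marker on value as "optional", which is an assumption.

```python
_MISSING = object()  # sentinel, so that an explicit None (JSON null) is still kept


def field_filter(predicate, field, value=_MISSING):
    """Build one field filter as a plain dict (hypothetical helper)."""
    flt = {"predicate": predicate, "field": field}
    if value is not _MISSING:
        # Only emit "value" when the caller actually supplied one.
        flt["value"] = value
    return flt


status_ok = field_filter("eq", "http_code", 200)
print(status_ok)
```

Using a sentinel rather than None as the default keeps "no value at all" distinct from a value that is explicitly null.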
According to the field type, several predicates can be applied.
Numerical predicates can be applied to numerical fields, including integer, long, float, double, date and datetime fields.
{
"predicate": "eq",
"field": "http_code",
"value": 200
}
{
"predicate": "gt",
"field": "http_code",
"value": 200
}
{
"predicate": "gte",
"field": "http_code",
"value": 200
}
{
"predicate": "lt",
"field": "http_code",
"value": 200
}
{
"predicate": "lte",
"field": "http_code",
"value": 200
}
{
"predicate": "between",
"field": "http_code",
"value": [200, 300]
}
Note: Lower and upper boundaries are inclusive. For instance, the above example means 200 <= http_code <= 300.
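The semantics of the numerical predicates, including the inclusive bounds of between, can be modelled locally as follows. This is only an illustrative sketch, not Botify's implementation: the names NUMERICAL_PREDICATES and matches_numerical are hypothetical, and a document is assumed to be a flat dictionary keyed by field name.

```python
import operator

# Local model of the numerical predicates (illustration only).
NUMERICAL_PREDICATES = {
    "eq": operator.eq,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    # "between" takes a [lower, upper] pair; both bounds are inclusive.
    "between": lambda field_value, bounds: bounds[0] <= field_value <= bounds[1],
}


def matches_numerical(flt, document):
    """Return True if document[flt["field"]] satisfies the filter."""
    test = NUMERICAL_PREDICATES[flt["predicate"]]
    return test(document[flt["field"]], flt["value"])


between = {"predicate": "between", "field": "http_code", "value": [200, 300]}
for code in (199, 200, 300, 301):
    print(code, matches_numerical(between, {"http_code": code}))
```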
Categorical predicates can be applied to categorical fields, including string, boolean and tree fields.
{
"predicate": "eq",
"field": "url",
"value": "https://botify.com"
}
{
"predicate": "contains",
"field": "url",
"value": "botify"
}
{
"predicate": "starts",
"field": "url",
"value": "https"
}
{
"predicate": "ends",
"field": "url",
"value": "hifi"
}
{
"predicate": "re",
"field": "url",
"value": "https.*hifi"
}
tree fields can use both categorical predicates and some extra predicates that make it easy to filter on their children. Tree fields are saved as flat strings like foo/bar.
Returns the given value and all its children. For instance, the following filter could return foo, foo/bar and foo/baz.
{
"predicate": "with_children",
"field": "segments.segment_1.value",
"value": "foo"
}
Returns the matching value, excluding its children. For instance, the following filter could only return foo.
{
"predicate": "without_children",
"field": "segments.segment_1.value",
"value": "foo"
}
Returns the matching children, excluding the parent. For instance, the following filter could return foo/bar and foo/baz.
{
"predicate": "only_children",
"field": "segments.segment_1.value",
"value": "foo"
}
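The difference between the three tree predicates can be sketched on flat path strings. The function matches_tree and the sample values are hypothetical, and the sketch assumes that / is the separator and that a child is any value starting with the parent followed by /.

```python
def matches_tree(predicate, field_value, value):
    """Local model of the tree predicates on flat strings like "foo/bar"."""
    is_self = field_value == value
    is_child = field_value.startswith(value + "/")
    if predicate == "with_children":
        return is_self or is_child
    if predicate == "without_children":
        return is_self
    if predicate == "only_children":
        return is_child
    raise ValueError(f"unknown tree predicate: {predicate}")


# "foobar" shares a prefix with "foo" but is not one of its children.
segments = ["foo", "foo/bar", "foo/baz", "foobar"]


def select(predicate):
    """Keep the sample segments matching the predicate for the value "foo"."""
    return [s for s in segments if matches_tree(predicate, s, "foo")]


for name in ("with_children", "without_children", "only_children"):
    print(name, select(name))
```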
timeseries fields contain a sequence of values, one for each day. These fields are only available in the Logs API, where you need to define the range of dates you want to work on.
For instance, if you want to list the number of crawls by URL from the 1st to the 7th of January, crawls.google.count_by_day is a list: the first item is the number of crawls on Jan. 1st and the last item is the number of crawls on Jan. 7th.
This example filters URLs that have been crawled at least once every day of the period. Note that you can use all numerical predicates (all.eq, all.gte, all.gt, all.lt, all.lte or all.between).
{
"field": "crawls.google.count_by_day",
"predicate": "all.gte",
"value": "1"
}
This example filters URLs that have been crawled at least once at any point during the period. Note that you can use any numerical predicate (any.eq, any.gte, any.gt, any.lt, any.lte or any.between).
{
"field": "crawls.google.count_by_day",
"predicate": "any.gte",
"value": "1"
}
This example filters URLs that have been crawled at least once on the first day of the period. For the ones that have been crawled on the second day, replace [0] with [1].
{
"field": "crawls.google.count_by_day[0]",
"predicate": "gte",
"value": "1"
}
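The three timeseries examples above differ only in how the daily values are quantified. The sketch below models that locally. The names COMPARE and matches_series and the sample counts are hypothetical, between is left out for brevity, and values are compared as integers.

```python
import operator

COMPARE = {
    "eq": operator.eq,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def matches_series(predicate, series, value):
    """Model of the all.* / any.* predicates over a list of daily values."""
    quantifier, _, op = predicate.partition(".")
    combine = {"all": all, "any": any}[quantifier]
    return combine(COMPARE[op](day, value) for day in series)


# Hypothetical daily crawl counts for Jan. 1st to Jan. 7th.
count_by_day = [3, 0, 1, 2, 0, 5, 1]

print(matches_series("all.gte", count_by_day, 1))  # crawled every day?
print(matches_series("any.gte", count_by_day, 1))  # crawled on some day?
print(COMPARE["gte"](count_by_day[0], 1))          # crawled on the first day?
```

With these sample counts, the all.gte check fails because of the two days with zero crawls, while the any.gte check and the first-day check both succeed.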
The exists predicate takes no value and tests whether the field exists in the document. Some fields don’t exist because the related feature wasn’t enabled or failed during analysis. For instance, the previous fields do not exist if the comparison feature is not enabled.
{
"predicate": "exists",
"field": "previous"
}
Some fields can contain a list of values; they are called multiple fields. For instance, query_string_keys could be equal to ['page', 'length'] on a URL with pagination.
To filter on these fields, predicates must be prefixed with any. (for example, any.eq).
For instance, the following filter keeps URLs that have a query string key equal to “page”.
{
"predicate": "any.eq",
"field": "query_string_keys",
"value": "page"
}
FieldFilter can be composed with boolean conditions.
{
"and": [
FILTER_1,
FILTER_2,
...
]
}
{
"or": [
FILTER_1,
FILTER_2,
...
]
}
{
"not": FILTER
}
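Because and, or and not can be nested, a composed filter is naturally processed recursively. The following sketch evaluates such a filter against one document. The function evaluate and the toy leaf matcher eq_only are hypothetical and only illustrate the boolean structure, not how Botify evaluates filters.

```python
def evaluate(flt, matches_field):
    """Recursively evaluate a composed filter.

    matches_field is a callback deciding whether one field filter matches.
    """
    if "and" in flt:
        return all(evaluate(f, matches_field) for f in flt["and"])
    if "or" in flt:
        return any(evaluate(f, matches_field) for f in flt["or"])
    if "not" in flt:
        return not evaluate(flt["not"], matches_field)
    return matches_field(flt)  # leaf: a plain field filter


document = {"http_code": 200, "url": "https://botify.com"}


def eq_only(field_filter):
    # Toy leaf matcher: supports the "eq" predicate only.
    return document.get(field_filter["field"]) == field_filter["value"]


composed = {
    "and": [
        {"predicate": "eq", "field": "http_code", "value": 200},
        {"not": {"predicate": "eq", "field": "url", "value": "https://example.com"}},
    ]
}
result = evaluate(composed, eq_only)
print(result)
```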
The following BQLFilter restricts the analysis URLs dataset to new URLs that are compliant and have no title.
{
"and": [
{
"field": "compliant.is_compliant",
"value": true
},
{
"field": "metadata.title.nb",
"predicate": "eq",
"value": 0
},
{
"not": {
"field": "previous",
"predicate": "exists"
}
}
]
}
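When the filter is produced by a program, it can be written as a nested dictionary and serialized with the standard json module. This is a sketch of the serialization step only. How the resulting payload is sent to the API is not covered here, and the variable names are illustrative.

```python
import json

# Same filter as above, as a Python structure. Python's True becomes JSON true.
bql_filter = {
    "and": [
        {"field": "compliant.is_compliant", "value": True},
        {"field": "metadata.title.nb", "predicate": "eq", "value": 0},
        {"not": {"field": "previous", "predicate": "exists"}},
    ]
}

payload = json.dumps(bql_filter, indent=2)
print(payload)
```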