Analytical Functions

The analytical functions provide a variety of flexible ways to analyze data for anomalies.

Temporal (time-based) analysis is used by default. Specifying an overFieldName switches the analysis to be population (peer) based.

When byFieldName is used with a function, the analysis considers whether there is an anomaly for one or more specific values of byFieldName.

Some functions cannot be used with a byFieldName or overFieldName.

A partitionFieldName can be specified with any function. When partitionFieldName is used, the analysis is replicated for every distinct value of partitionFieldName.
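As an illustration of how these fields combine, the following minimal sketch expresses three detector configurations as plain Python dicts. The "function" key, the field values (status, client_ip, response_time, datacenter) and the analysis_mode helper are assumptions made for this example and are not taken from the API reference.

```python
# Hypothetical detector configurations expressed as plain Python dicts.
# The "function" key and every field value here are illustrative assumptions.

def analysis_mode(detector):
    """Return 'population' when overFieldName is set, otherwise 'temporal'."""
    return "population" if detector.get("overFieldName") else "temporal"

# Temporal analysis (the default), looking for anomalies in specific
# values of the by field.
temporal = {"function": "count", "byFieldName": "status"}

# Population (peer) analysis, selected by the presence of overFieldName.
population = {"function": "count", "overFieldName": "client_ip"}

# Partitioned analysis: replicated for every distinct datacenter value.
partitioned = {
    "function": "mean",
    "fieldName": "response_time",
    "byFieldName": "status",
    "partitionFieldName": "datacenter",
}

for detector in (temporal, population, partitioned):
    print(detector["function"], analysis_mode(detector))
```

Note that partitionFieldName does not change the mode: the third detector is still temporal, it is simply analyzed separately per partition value.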

A summaryCountFieldName can be specified with any function except metric. When summaryCountFieldName is used, the input data is expected to be pre-summarized by the client, with the value of the summaryCountFieldName field containing the count of raw events that were summarized.
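To make the idea of client-side pre-summarization concrete, here is a small sketch that collapses raw events into one record per bucket and status value. The record layout, the 300-second bucket span and the field name event_count are hypothetical; in this scenario summaryCountFieldName would be set to event_count.

```python
from collections import Counter

def summarize(events, bucket_span, count_field="event_count"):
    """Collapse raw events into one record per (bucket start, status) pair."""
    counts = Counter(
        (event["time"] - event["time"] % bucket_span, event["status"])
        for event in events
    )
    return [
        {"time": bucket_start, "status": status, count_field: n}
        for (bucket_start, status), n in sorted(counts.items())
    ]

# Hypothetical raw events with epoch-style timestamps in seconds.
raw_events = [
    {"time": 0, "status": "200"},
    {"time": 10, "status": "200"},
    {"time": 20, "status": "500"},
    {"time": 310, "status": "200"},
]

summary = summarize(raw_events, bucket_span=300)
for record in summary:
    print(record)
```

The counts in the summarized records add up to the number of raw events, which is the property the count field is expected to carry.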

Some functions can benefit from overlapping buckets. This improves the overall accuracy of the results, but at the cost of a two-bucket delay in seeing the results.

Most functions detect anomalies in both low and high values. In statistical terminology, they apply a two-sided test. Some functions offer low and high variations (e.g. count, low_count and high_count). These variations apply one-sided tests, detecting anomalies only when the values are low or high, depending on which alternative is used.
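The difference between two-sided and one-sided tests can be sketched with a simple z-score check. This is only a stand-in for illustration: the threshold, the is_anomalous helper and the use of a z-score are assumptions, and the product's actual statistical modeling is not described here.

```python
from statistics import mean, stdev

def is_anomalous(history, value, side="both", threshold=3.0):
    """Flag value if it is more than `threshold` sample standard deviations
    from the mean of history, in the direction(s) selected by `side`."""
    z = (value - mean(history)) / stdev(history)
    if side == "high":
        return z > threshold       # one-sided: unusually high only
    if side == "low":
        return z < -threshold      # one-sided: unusually low only
    return abs(z) > threshold      # two-sided: either direction

# Hypothetical per-bucket counts (mean 100, sample standard deviation 2).
history = [100, 102, 98, 101, 99, 100, 103, 97]

for value in (120, 80):
    flags = {side: is_anomalous(history, value, side) for side in ("both", "high", "low")}
    print(value, flags)
```

In this sketch "both" plays the role of count, "high" of high_count and "low" of low_count: a spike is flagged by the first two, a drop by the first and the last.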

The table below provides a high-level summary of the analytical functions provided by the API. Each function is described in detail over the following pages. Note that the examples given in these pages use single Detector Configuration objects.

name | description | fieldName | byFieldName | overFieldName | overlappingBuckets
count, high_count, low_count | individual count | N/A | optional | optional | optional
non_zero_count, high_non_zero_count, low_non_zero_count | count, but zeros are treated as null and ignored | N/A | optional | N/A | optional
distinct_count, high_distinct_count, low_distinct_count | distinct count | required | optional | optional | optional
rare | rare items | N/A | required | optional | N/A
freq_rare | frequently rare items | N/A | required | required | N/A
info_content, high_info_content, low_info_content | information content | required | optional | optional | optional
lat_long | geographic location | required | optional | optional | optional
metric | all of mean, min and max | required | optional | optional | optional
mean, high_mean, low_mean | arithmetic mean | required | optional | optional | optional
median | statistical median (beta) | required | optional | optional | optional
min | arithmetic minimum | required | optional | optional | N/A
max | arithmetic maximum | required | optional | optional | N/A
sum, high_sum, low_sum | arithmetic sum | required | optional | optional | optional
non_null_sum, high_non_null_sum, low_non_null_sum | arithmetic sum where null buckets are ignored | required | optional | optional | optional
time_of_day, time_of_week | unusual time-based | N/A | optional | optional | N/A
varp, high_varp, low_varp | population variance | required | optional | optional | optional
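The fieldName, byFieldName and overFieldName columns of the table above can be applied mechanically. The sketch below transcribes those three columns into a Python dict and checks a detector configuration against them. The validate helper, the "function" key and the message wording are hypothetical and are not part of the API; the overlappingBuckets column is left out for brevity.

```python
R, O, NA = "required", "optional", "N/A"

# (fieldName, byFieldName, overFieldName) per function family, from the table.
RULES = {
    "count": (NA, O, O), "non_zero_count": (NA, O, NA),
    "distinct_count": (R, O, O), "rare": (NA, R, O), "freq_rare": (NA, R, R),
    "info_content": (R, O, O), "lat_long": (R, O, O), "metric": (R, O, O),
    "mean": (R, O, O), "median": (R, O, O), "min": (R, O, O), "max": (R, O, O),
    "sum": (R, O, O), "non_null_sum": (R, O, O),
    "time_of_day": (NA, O, O), "time_of_week": (NA, O, O), "varp": (R, O, O),
}

# Families that the table lists with high_ and low_ variations.
ONE_SIDED = {"count", "non_zero_count", "distinct_count", "info_content",
             "mean", "sum", "non_null_sum", "varp"}

FIELDS = ("fieldName", "byFieldName", "overFieldName")

def base_function(name):
    """Map a high_/low_ variation to its family; leave other names unchanged."""
    for prefix in ("high_", "low_"):
        if name.startswith(prefix) and name[len(prefix):] in ONE_SIDED:
            return name[len(prefix):]
    return name

def validate(detector):
    """Return a list of problems; an empty list means the detector matches the table."""
    base = base_function(detector["function"])
    if base not in RULES:
        return ["unknown function: " + detector["function"]]
    problems = []
    for field, rule in zip(FIELDS, RULES[base]):
        present = field in detector
        if rule == R and not present:
            problems.append(field + " is required")
        elif rule == NA and present:
            problems.append(field + " is not applicable")
    return problems

print(validate({"function": "freq_rare", "byFieldName": "uri"}))
print(validate({"function": "high_count", "byFieldName": "status"}))
```

A check like this only covers the constraints stated in the table; any further restrictions described on the individual function pages would need to be added separately.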