Results Endpoint

The results of an Engine API analytics job are organized into Records and Buckets. The results are aggregated and normalized in order to identify the mathematically significant anomalies. When categorization is specified, the results also contain Category Definitions.

Records

Records contain the analytic results. They detail the anomalous activity that has been identified in the input data based upon the detector configuration. For example, if looking for unusually large data transfers, an anomaly record would identify the source IP address, the destination, the time window during which it occurred, the expected and actual size of the transfer and the probability of this occurring. Something that is highly improbable is therefore highly anomalous.

The anomaly record probability is stored to a high precision of over 300 decimal places. Low probabilities are written in scientific notation, e.g. 3.24E-184. For easier reading, a friendlier view of this value is provided as normalizedProbability, which is expressed as a percentage (a value between 0 and 100).
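As a rough client-side illustration, the sketch below parses probabilities written in scientific notation and orders records so that the least probable (most anomalous) comes first. The record dictionaries, the "probability" and "id" keys, and the helper name are assumptions made for this example, not part of the documented API.

```python
from decimal import Decimal


def most_anomalous_first(records):
    """Sort records by ascending probability (least probable first)."""
    # Decimal parses scientific notation such as "3.24E-184" and keeps
    # very small values intact where a binary float may lose precision.
    return sorted(records, key=lambda r: Decimal(str(r["probability"])))


# Hypothetical records; the "probability" key is assumed for illustration.
records = [
    {"id": "a", "probability": "0.0012"},
    {"id": "b", "probability": "3.24E-184"},
    {"id": "c", "probability": "7.1E-320"},
]

ordered = most_anomalous_first(records)
print([r["id"] for r in ordered])
```

Sorting on the raw probability rather than on normalizedProbability keeps the ordering stable among records that would otherwise share the same normalized value.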

There can be many anomaly records depending upon the characteristics and size of the input data; in practice, too many to process manually. The Engine API therefore performs a sophisticated aggregation of the anomaly records into buckets.

Influencers

Influencers are the entities that have contributed to, or are to blame for, the anomalies. Influencers are given an Anomaly Score, which is calculated based on the anomalies that have occurred in each bucket interval. For jobs with more than one detector, this gives a powerful view of the most anomalous entities.

Influencers can be accessed using the influencers endpoint. Upon identifying an influencer with a high score, you can investigate further by accessing the records endpoint for that bucket and enumerating the anomaly records that contain this influencer.
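The drill-down step described above might look like the following once the records for a bucket have been retrieved. This is a minimal sketch: the record layout (an "influencers" mapping from field name to a list of values), the field names, and the helper are all hypothetical and chosen only to illustrate the filtering.

```python
def records_with_influencer(records, field_name, field_value):
    """Return the records that list the given influencer."""
    matches = []
    for record in records:
        # Assumed layout: {"influencers": {"<field name>": ["<value>", ...]}}
        values = record.get("influencers", {}).get(field_name, [])
        if field_value in values:
            matches.append(record)
    return matches


# Hypothetical records for a single bucket.
bucket_records = [
    {"id": 1, "influencers": {"src_ip": ["10.0.0.7"]}},
    {"id": 2, "influencers": {"src_ip": ["10.0.0.9"], "user": ["alice"]}},
    {"id": 3, "influencers": {}},
]

hits = records_with_influencer(bucket_records, "src_ip", "10.0.0.9")
print([r["id"] for r in hits])
```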

Buckets

Buckets are the grouped and time-ordered view of the analytic results. A bucket time interval is defined by bucketSpan, which is specified in the job configuration. The default is 5 minutes. Each bucket has an anomalyScore, which is a statistically aggregated and normalized view of the combined anomalousness of the records. Use this for rate-controlled alerting.

Each bucket also has a maxNormalizedProbability that is equal to the highest normalizedProbability of the records within the bucket. This gives an indication of the most anomalous event that has occurred within the time interval. Unlike anomalyScore, this does not take into account the number of correlated anomalies that have happened.
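To make the difference between the two bucket values concrete, the sketch below selects buckets for alerting by anomalyScore and separately picks out buckets where a single record is highly unusual although the overall score is modest. The threshold values, the "timestamp" key, and the function names are illustrative assumptions; only anomalyScore and maxNormalizedProbability come from the description above.

```python
def buckets_to_alert(buckets, score_threshold=75.0):
    """Return buckets whose anomalyScore meets the (assumed) threshold."""
    return [b for b in buckets if b["anomalyScore"] >= score_threshold]


def isolated_spikes(buckets, probability_threshold=90.0, score_threshold=75.0):
    """Return buckets with one very unusual record but a modest overall score."""
    return [
        b for b in buckets
        if b["maxNormalizedProbability"] >= probability_threshold
        and b["anomalyScore"] < score_threshold
    ]


# Hypothetical bucket summaries.
buckets = [
    {"timestamp": 1000, "anomalyScore": 12.0, "maxNormalizedProbability": 95.0},
    {"timestamp": 1300, "anomalyScore": 88.0, "maxNormalizedProbability": 91.0},
    {"timestamp": 1600, "anomalyScore": 3.0, "maxNormalizedProbability": 4.0},
]

print([b["timestamp"] for b in buckets_to_alert(buckets)])
print([b["timestamp"] for b in isolated_spikes(buckets)])
```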

Buckets can be accessed using the buckets endpoint. Upon identifying an anomalous bucket, you can investigate further either by expanding the bucket resource to show the records as nested objects or by accessing the records endpoint directly and filtering by date range.
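The date-range approach amounts to selecting the records whose timestamps fall inside the bucket's time window. The sketch below shows that selection on records already held in memory. It assumes that timestamps are epoch seconds, that bucketSpan is given in seconds, and that a bucket's timestamp marks the start of its interval; none of these details is confirmed by the text above.

```python
DEFAULT_BUCKET_SPAN = 300  # seconds; the 5-minute default, assumed unit


def records_in_bucket(records, bucket_start, bucket_span=DEFAULT_BUCKET_SPAN):
    """Return records whose timestamp lies in [bucket_start, bucket_start + span)."""
    bucket_end = bucket_start + bucket_span
    return [r for r in records if bucket_start <= r["timestamp"] < bucket_end]


# Hypothetical records with epoch-second timestamps.
all_records = [
    {"id": "r1", "timestamp": 899},
    {"id": "r2", "timestamp": 900},
    {"id": "r3", "timestamp": 1199},
    {"id": "r4", "timestamp": 1200},
]

selected = records_in_bucket(all_records, bucket_start=900)
print([r["id"] for r in selected])
```

The half-open interval means a record stamped exactly at the end of one bucket is counted in the next bucket rather than in both.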

Method for Normalization

Normalization is performed on both the anomalyScore and normalizedProbability. The resulting value is between 0 and 100, which allows for easier prioritization and filtering. Normalization uses dynamic quantiles, which are optimized for high throughput, gracefully age historical data, improve the signal-to-noise ratio and adjust for any variations in event rate.
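The general idea of mapping raw scores onto a 0 to 100 scale through quantiles can be sketched as follows. This is explicitly not the Engine API's algorithm: it is a static empirical-quantile rank over a fixed list, a much simpler stand-in that does no aging of historical data and no event-rate adjustment. The function name and sample values are assumptions for illustration.

```python
from bisect import bisect_right


def quantile_normalize(raw_scores):
    """Map each raw score to the percentage of scores less than or equal to it."""
    ordered = sorted(raw_scores)
    total = len(ordered)
    # bisect_right counts how many sorted values are <= the given score.
    return [100.0 * bisect_right(ordered, score) / total for score in raw_scores]


# Hypothetical raw (un-normalized) scores.
raw = [1, 5, 3, 10]
print(quantile_normalize(raw))
```

Because the result depends only on rank, a single extreme raw value does not compress the rest of the scale, which is one reason quantile-style normalization suits threshold-based filtering.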

Interim Results

It is possible to accurately calculate anomalies without having seen the entirety of a data bucket. Under normal usage, results are calculated at the end of each bucket. To query interim results, a special API flush call is required to trigger the interim results calculation. Then, when querying using the API, you must explicitly request interim results, as they are not included by default. When viewing results using the Prelert Kibana App, however, interim results are displayed.

Interim results are calculated for buckets that are not yet complete, i.e. buckets that contain partial data. Some functions (e.g. count or sum functions) are sensitive to this partial data. For these functions, the probability calculation for interim results takes into account the predicted value based on the estimated proportion of the bucket that has been seen. While interim results are as accurate as possible, be aware that they may differ from the final results.
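One way to picture the adjustment for count or sum functions is a simple linear extrapolation from the part of the bucket seen so far. The sketch below assumes events arrive at a roughly even rate across the bucket. The method the Engine API actually uses is not described here, so treat the function and its parameters as hypothetical.

```python
def predicted_bucket_value(partial_value, seconds_seen, bucket_span=300):
    """Linearly scale a partial count or sum up to a full-bucket estimate."""
    if not 0 < seconds_seen <= bucket_span:
        raise ValueError("seconds_seen must be in the range (0, bucket_span]")
    # Assumption: the value accumulates evenly over the bucket interval.
    return partial_value * bucket_span / seconds_seen


# 30 events seen in the first 75 seconds of a 300-second bucket.
estimate = predicted_bucket_value(30, seconds_seen=75)
print(estimate)
```

Comparing such an estimate, rather than the raw partial count, against the expected full-bucket value avoids flagging an incomplete bucket as unusually quiet.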

Interim results will also be available if overlapping buckets are enabled.

Category Definitions

When categorization is specified, it is possible to view the definitions of the resulting categories. A category definition describes the common terms matched and contains examples of matched values. To access category definitions, use the category definitions endpoint.