Bucket Overview

For any given job, the bucket timestamp uniquely identifies a bucket. The size of the bucket is a time interval equal to the job’s bucketSpan parameter. Each bucket interval is closed below and open above: things that happened at exactly the timestamp of the bucket are included in the results for that bucket, but things that happened exactly bucketSpan seconds later are considered to be in the next bucket.
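The half-open interval rule can be sketched in Python as follows. This is an illustration only: the function name in_bucket and its arguments are hypothetical and not part of the API, and all times are assumed to be in seconds since the epoch.

```python
def in_bucket(bucket_timestamp: int, bucket_span: int, event_time: int) -> bool:
    """Return True if event_time lies in [bucket_timestamp, bucket_timestamp + bucket_span)."""
    return bucket_timestamp <= event_time < bucket_timestamp + bucket_span


# Bucket starting at 1389168000 with a bucketSpan of 3600 seconds.
print(in_bucket(1389168000, 3600, 1389168000))  # True: exactly at the bucket timestamp
print(in_bucket(1389168000, 3600, 1389171600))  # False: exactly bucketSpan seconds later
```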

Buckets have a records sub-resource containing a list of anomaly records: each of those records has a probability and additional fields detailing the source of the anomaly. The anomaly records can also be accessed outside of the bucket context and ordered by anomalousness via the records endpoint.

No results are returned for the last bucket, even when a job is closed. This is because it is unclear whether all data relating to it has been received. For example, suppose there are usually around 1000 events per bucket but only 500 have been received for the last one. It is unclear whether the reduced count is a true anomaly or whether an additional 500 events relating to the bucket have yet to be generated.

Buckets Endpoint

The endpoint for obtaining a job’s buckets is:

http://localhost:8080/engine/v2/results/<jobId>/buckets

Bucket Results

For example, to get the result buckets for the job with ID weblogs03 using the cURL command line client, you would use the call:

curl -X GET http://localhost:8080/engine/v2/results/weblogs03/buckets

This call returns a paging document containing a list of bucket results. If jobId is not recognized an error is returned. To get more than the first 100 results, or page the results, see pagination. If no results are available an empty paging document is returned.

Each bucket has a timestamp, in seconds since the Epoch, which uniquely identifies it within a job. This timestamp can be used to access individual bucket results through the endpoint:

http://localhost:8080/engine/v2/results/<jobId>/buckets/<timestamp>

For example, using cURL to get the results for job ID weblogs03 for the bucket at epoch time 1389168000:

curl -X GET http://localhost:8080/engine/v2/results/weblogs03/buckets/1389168000

may return the following document:

{
  "exists" : true,
  "type" : "bucket",
  "document" : {
    "timestamp" : "2014-01-08T08:00:00.000+0000",
    "bucketSpan" : 3600,
    "maxNormalizedProbability" : 100.0,
    "anomalyScore" : 100.0,
    "bucketInfluencers" : [ {
      "influencerFieldName" : "bucketTime",
      "anomalyScore" : 100.0,
      "probability" : 2.22507E-308
    } ],
    "eventCount" : 439,
    "recordCount" : 1
  }
}
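As a rough sketch of how a client might work with this endpoint, the Python below builds the single-bucket URL from a date and pulls a few fields out of a response shaped like the document above. The helper names bucket_url and summarize are hypothetical, the host and port are the ones used in this document, and treating a false "exists" flag as "no such bucket" is an assumption rather than documented behaviour.

```python
import json
from datetime import datetime, timezone

BASE_URL = "http://localhost:8080/engine/v2"  # host and port as used in this document


def bucket_url(job_id: str, timestamp: datetime) -> str:
    """Build the single-bucket URL from a job ID and a timezone-aware datetime."""
    epoch_seconds = int(timestamp.timestamp())
    return f"{BASE_URL}/results/{job_id}/buckets/{epoch_seconds}"


def summarize(response_text: str):
    """Return (anomalyScore, eventCount, recordCount) for a bucket response.

    Returns None when "exists" is missing or false (assumed to mean no bucket).
    """
    body = json.loads(response_text)
    if not body.get("exists"):
        return None
    doc = body["document"]
    return doc["anomalyScore"], doc["eventCount"], doc["recordCount"]


# 2014-01-08T08:00:00 UTC corresponds to epoch time 1389168000.
url = bucket_url("weblogs03", datetime(2014, 1, 8, 8, 0, tzinfo=timezone.utc))
print(url)
```

The response text itself could be fetched with any HTTP client; summarize only assumes it receives the JSON body as a string.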

Filtering Buckets

Buckets can be filtered by date using the start and end query parameters or thresholded on the anomalyScore or normalizedProbability fields so that only those greater than or equal to a certain value are returned. For example:

http://localhost:8080/engine/v2/results/<jobId>/buckets/?start=2014-06-09T00:00:00Z&end=2014-06-14T00:00:00Z&anomalyScore=75

This request asks for buckets between 9 June 2014 and 14 June 2014 where anomalyScore is greater than or equal to 75.
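A filtered request URL like the one above could be assembled as in the following minimal sketch. The function filtered_buckets_url and its keyword arguments are hypothetical; only the query parameter names start, end, anomalyScore and normalizedProbability come from this document.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8080/engine/v2"  # host and port as used in this document


def filtered_buckets_url(job_id, start=None, end=None,
                         anomaly_score=None, normalized_probability=None):
    """Build a buckets URL, adding only the filters that were supplied."""
    params = {}
    if start is not None:
        params["start"] = start
    if end is not None:
        params["end"] = end
    if anomaly_score is not None:
        params["anomalyScore"] = anomaly_score
    if normalized_probability is not None:
        params["normalizedProbability"] = normalized_probability
    # safe=":" leaves the colons in ISO 8601 timestamps unescaped.
    query = urlencode(params, safe=":")
    url = f"{BASE_URL}/results/{job_id}/buckets"
    return f"{url}?{query}" if query else url


print(filtered_buckets_url("weblogs03",
                           start="2014-06-09T00:00:00Z",
                           end="2014-06-14T00:00:00Z",
                           anomaly_score=75))
```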

Bucket Resource Expansion

The default view of the results is a shallow view of the buckets that excludes the anomaly records. Resource expansion is a mechanism for returning a bucket together with all of its associated anomaly records in a single request, using the expand query parameter:

http://localhost:8080/engine/v2/results/<jobId>/buckets/<timestamp>?expand=true

Such a query would return the bucket instance with the anomaly records embedded, for example:

{
  "exists" : true,
  "type" : "bucket",
  "document" : {
    "timestamp" : "2014-01-08T08:00:00.000+0000",
    "bucketSpan" : 3600,
    "maxNormalizedProbability" : 100.0,
    "anomalyScore" : 100.0,
    "bucketInfluencers" : [ {
      "influencerFieldName" : "bucketTime",
      "anomalyScore" : 100.0,
      "probability" : 2.22507E-308
    } ],
    "records" : [ {
      "timestamp" : "2014-01-08T08:00:00.000+0000",
      "fieldName" : "value",
      "normalizedProbability" : 100.0,
      "probability" : 2.22507E-308,
      "anomalyScore" : 100.0,
      "function" : "max",
      "typical" : 101.426,
      "actual" : 120.053
    } ],
    "eventCount" : 439,
    "recordCount" : 1
  }
}
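Once a bucket has been expanded, its records can be processed directly on the client. The sketch below is one way to do that in Python, assuming the response has the shape shown above; the function name top_records and its limit argument are hypothetical and not part of the API.

```python
import json


def top_records(response_text: str, limit: int = 5):
    """Return up to `limit` embedded records, highest normalizedProbability first."""
    body = json.loads(response_text)
    # A shallow (non-expanded) bucket has no "records" key, so default to an empty list.
    records = body.get("document", {}).get("records", [])
    ranked = sorted(records, key=lambda r: r["normalizedProbability"], reverse=True)
    return ranked[:limit]


# Trimmed copy of the expanded bucket shown above.
example = """
{
  "exists": true,
  "type": "bucket",
  "document": {
    "timestamp": "2014-01-08T08:00:00.000+0000",
    "records": [
      {"fieldName": "value", "function": "max",
       "normalizedProbability": 100.0, "typical": 101.426, "actual": 120.053}
    ]
  }
}
"""

for record in top_records(example):
    print(record["function"], record["fieldName"], record["typical"], record["actual"])
```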

The same expand query option can be applied to all buckets in a job to get the result buckets and their associated records in a single call:

http://localhost:8080/engine/v2/results/<jobId>/buckets?expand=true

Such a query will return all result buckets in the job with the anomaly records expanded inline.

The expand query option can also be used in conjunction with the start and end date filter query parameters, for example:

http://localhost:8080/engine/v2/results/<jobId>/buckets/?start=2014-02-12T00:00:00Z&end=2014-02-14T00:00:00Z&expand=true