Alerts Endpoint

The API provides a long-poll endpoint for low-latency, real-time alerting. Alerts may be triggered by an influencer score, a bucket influencer score, or the result bucket’s anomaly score or probability. The alertOn parameter selects which of these to use.

Register for alerts by making a GET request to the alerts_longpoll endpoint with one or both of the threshold query parameters.

Alerts are dynamic, real-time events that can only be created by jobs in the RUNNING state. Each alert is created at the moment the data is processed and is based only on the data seen up to that point.

As the Engine API continues to learn, it re-evaluates these scores in a process called re-normalization, so the current score of a stored result may differ from the score reported in the alert. When reconciling Engine API results with alerts at a later point in time, compare the alert score with the result's initialAnomalyScore or initialNormalizedProbability; the alert score will be equal to one of these.
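The reconciliation described above can be sketched as a small check. This is an illustration only: it assumes the alert and the bucket have already been decoded from JSON into dictionaries, and that the bucket result carries an initialAnomalyScore field alongside its current anomalyScore. The function name and the tolerance are hypothetical.

```python
import math


def matches_initial_score(alert, bucket, tolerance=1e-6):
    """Return True if the alert score equals the bucket's initial score.

    Assumption: `bucket` is the decoded bucket result and contains an
    "initialAnomalyScore" field. The current "anomalyScore" is deliberately
    ignored because re-normalization may have changed it since the alert.
    """
    return math.isclose(
        alert["anomalyScore"],
        bucket["initialAnomalyScore"],
        abs_tol=tolerance,
    )
```

A floating point tolerance is used rather than strict equality so that rounding during serialization does not cause a false mismatch.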

Setting up a Long Poll

The alert endpoint is:

http://localhost:8080/engine/v2/alerts_longpoll/<jobId>

GET requests to this URL will block until either an alert is generated or the request times out. In the event of a timeout, the request should be repeated.

As an example, use the cURL command line client to register for critical alerts (anomaly score of 75 or above) in the job apm:

curl -X GET 'http://localhost:8080/engine/v2/alerts_longpoll/apm?score=75'

The alert may look like this:

{
  "timestamp" : "2014-11-18T16:02:08.858+0000",
  "timeout" : false,
  "alertType" : "bucket",
  "isInterim" : false,
  "records" : [ {
    "function" : "sum",
    "fieldName" : "Out Octets",
    "anomalyScore" : 64.8906,
    "normalizedProbability" : 94.3634,
    "byFieldName" : "host",
    "probability" : 4.33822E-7,
    "byFieldValue" : "netprobe.acme.com",
    "typical" : 4.16899E7,
    "actual" : 4.55047E8
  } ],
  "jobId" : "apm",
  "uri" : "http://localhost:8080/engine/v2/results/apm/buckets/1400490000?expand=true",
  "anomalyScore" : 64.8906,
  "maxNormalizedProbability" : 94.3634
}
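A client will usually want to turn an alert like the one above into something readable. The following is a minimal sketch, not part of the API: it assumes the alert body is available as a JSON string, and the function name summarize_alert is hypothetical. Only field names shown in the sample alert are used.

```python
import json


def summarize_alert(alert_json):
    """Return one human-readable line per anomaly record in an alert.

    Missing fields are shown as "?" because not every record is
    guaranteed to carry every field used here.
    """
    alert = json.loads(alert_json)
    lines = []
    for record in alert.get("records", []):
        lines.append(
            "{job}: {function}({field}) by {by_name}={by_value}: "
            "actual {actual}, typical {typical}".format(
                job=alert.get("jobId", "?"),
                function=record.get("function", "?"),
                field=record.get("fieldName", "?"),
                by_name=record.get("byFieldName", "?"),
                by_value=record.get("byFieldValue", "?"),
                actual=record.get("actual", "?"),
                typical=record.get("typical", "?"),
            )
        )
    return lines
```

An alert without a records array, such as a timeout response, yields an empty list.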

Detecting a Timeout

If the request times out, the alert’s timeout field is set to true and HTTP status code 200 (OK) is returned. When timeout is true, the call to /alerts_longpoll should be repeated. This is the typical procedure for long-poll alerting.

{
  "timeout" : true,
  "alertType" : "bucket",
  "jobId" : "apm",
  "anomalyScore" : 0.0,
  "maxNormalizedProbability" : 0.0,
  "isInterim" : false
}
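The repeat-on-timeout procedure can be sketched as a loop. This is an illustration under stated assumptions: it uses only the Python standard library, the function names fetch_alert and wait_for_alert are hypothetical, and the max_polls limit is an addition for convenience rather than anything the API requires.

```python
import json
import urllib.request


def fetch_alert(url):
    """Perform one blocking GET on the long-poll URL and decode the JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_alert(url, fetch=fetch_alert, max_polls=None):
    """Poll until a real alert arrives, repeating the request on each timeout.

    Returns the decoded alert, or None if max_polls requests all timed out.
    With max_polls=None the loop continues until an alert is received.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        alert = fetch(url)
        polls += 1
        # A timeout response has "timeout": true and carries no anomaly data.
        if not alert.get("timeout", False):
            return alert
    return None
```

For example, wait_for_alert('http://localhost:8080/engine/v2/alerts_longpoll/apm?score=75') blocks until the job apm raises an alert. The fetch argument exists so that the loop can be exercised without a running server.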

Setting the Timeout

The alert request will time out after a certain period has expired. By default, the timeout is 90 seconds. To change this value, use the timeout query parameter, which is specified in seconds.

For example, to alert on buckets with an anomaly score >= 75 and time out after 120 seconds:

http://localhost:8080/engine/v2/alerts_longpoll/<jobId>?score=75&timeout=120

This is useful when you access the API through an intermediary gateway server. If your request returns HTTP status code 504 (Gateway Timeout), lower the timeout value so that it is below the gateway's own timeout.
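As a sketch of how a client might set the timeout and recognize the gateway case, the example below builds the long-poll URL and performs a single request. It rests on several assumptions: the base URL is the localhost address used in this section, the function names build_alert_url and poll_once are hypothetical, and the 10 second margin added to the client-side socket timeout is an arbitrary choice so that the client does not give up before the server responds.

```python
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8080/engine/v2"  # assumed server address


def build_alert_url(job_id, score, timeout=None, base_url=BASE_URL):
    """Build the alerts_longpoll URL, adding timeout only when it is given."""
    params = {"score": score}
    if timeout is not None:
        params["timeout"] = timeout
    return "{}/alerts_longpoll/{}?{}".format(
        base_url,
        urllib.parse.quote(job_id, safe=""),
        urllib.parse.urlencode(params),
    )


def poll_once(job_id, score, timeout=90, margin=10):
    """Make one long-poll request and return the raw JSON text.

    The socket timeout is set slightly above the server-side timeout.
    A 504 response means a gateway gave up first, so the caller should
    retry with a lower timeout value.
    """
    url = build_alert_url(job_id, score, timeout)
    try:
        with urllib.request.urlopen(url, timeout=timeout + margin) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        if err.code == 504:
            raise RuntimeError(
                "Gateway timed out first; retry with a timeout below "
                "{} seconds".format(timeout)
            ) from err
        raise
```

Calling build_alert_url("apm", 75, 120) produces the same query string as the example URL above, with apm in place of the job ID.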