API Quick Start Guide

This quick start guide will show you how to evaluate the Engine API within 30 minutes.

From installation to a worked example, you will be taken through the steps required to process a dataset and find the anomalies.

Step 1: Installing Engine API

Step 2: Uploading Flight Comparison Website sample data

Step 3: Viewing the analysis results

Step 4: Deleting the Job

For detailed information on the API, please take a look through the reference documentation.

The examples shown on this page assume you are installing the Engine API locally on the default port 8080; if not, please update the host and port number in the URLs accordingly.

Requirements

The Engine API must be installed on 64-bit Linux, Windows or Mac OS X.

The following Linux distributions are supported:

  • Red Hat Enterprise Linux (RHEL) 6.x or 7.x
  • CentOS 6.x or 7.x
  • SuSE Linux Enterprise Server (SLES) 11 or 12
  • Fedora 10+
  • Ubuntu 12+
  • Amazon Linux (latest)

The following Windows versions are supported:

  • Windows 8/Server 2012
  • Windows 8.1/Server 2012r2

The following Mac OS X versions are supported:

  • 10.9 “Mavericks” (evals only; deprecated)
  • 10.10 “Yosemite” (evals only)
  • 10.11 “El Capitan” (evals only)

Please get in touch with support@prelert.com if you have a specific requirement to run on a currently unsupported operating system.

Step 1: Installing Engine API

  1. Download the installer package (prelert_engine_linux_64bit.bin).
  2. If the installer has lost its execute permission during the download process, reinstate it using:

chmod +x prelert_engine_linux_64bit.bin

  3. On the command line, run the executable and follow the instructions.

This will allow you to select the installation directory and configure the required port:

./prelert_engine_linux_64bit.bin

A successful install will display the following to stdout:

Copyright (c) Prelert Ltd 2006-2016
====================================
License installed correctly

Starting datastore ....
The datastore has started
Starting the Engine API server ....
The Engine API server has started

Install complete

  4. You are now ready to use the Engine API.

The Engine API is available at http://localhost:8080/engine/v2
The Behavioral Analytics Dashboard is available at http://localhost:5601/app/prelert#

  5. Before proceeding to the worked example, check the installation is running by visiting the following URL in a web browser: http://localhost:8080/engine/v2

If the Engine API is running, you will be greeted by a page containing the version information.

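If you prefer the command line, you can make the same check with cURL; this simply fetches the version page from the URL shown above:

curl 'http://localhost:8080/engine/v2'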

Step 2: Uploading Flight Comparison Website sample data

Let’s now analyze an example time series dataset. This data has been taken from a fictional flight comparison website where users can request real-time quotes from multiple airlines. The website makes programmatic calls to each airline to get its latest fare information. It is important that this data request is quick, as slow-responding airlines will negatively impact the user experience as a whole.

Here we will investigate response times by airline.

In this exercise we will be using the command-line utility cURL <https://curl.haxx.se>, which allows the easy transfer of data using URL syntax over HTTP.

Before we start, please download this example CSV file (farequote.csv <http://s3.amazonaws.com/prelert_demo/farequote.csv>). Time series data should ideally be ordered by date; for more information about the significance of the time order of the data, see handling out-of-sequence data. The raw CSV data looks like this:

time,airline,responsetime,sourcetype
2014-06-23 00:00:00Z,AAL,132.2046,farequote
2014-06-23 00:00:00Z,JZA,990.4628,farequote
2014-06-23 00:00:00Z,JBU,877.5927,farequote
2014-06-23 00:00:00Z,KLM,1355.4812,farequote
2014-06-23 00:00:00Z,NKS,9991.3981,farequote
...
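
If your data is not already in time order, you can sort it before uploading. The following is a minimal sketch using standard Unix tools; it assumes the header row and leading time column shown above (the zero-padded timestamps mean a plain lexical sort is also chronological):

# keep the header row, then sort the remaining rows by the leading time field
(head -1 farequote.csv && tail -n +2 farequote.csv | sort) > farequote_sorted.csv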

  1. Create New Job

This creates an analysis job for the example data file. It will baseline responsetime for each airline and report if any responsetime value deviates significantly from its baseline.

We supply two parameters to the endpoint - analysisConfig, which specifies how the data should be analyzed, and dataDescription, which describes how the data is formatted:

curl -X POST -H 'Content-Type: application/json' 'http://localhost:8080/engine/v2/jobs' -d '{
    "id":"farequote",
    "description":"Analysis of response time by airline",
    "analysisConfig" : {
        "bucketSpan":3600,
        "detectors" :[{"function":"metric","fieldName":"responsetime","byFieldName":"airline"}]
    },
    "dataDescription" : {
        "fieldDelimiter":",",
        "timeField":"time",
        "timeFormat":"yyyy-MM-dd HH:mm:ssX"
    }
}'

In this example we are creating a new job with the ID ‘farequote’ and specifying that we want the analysis to be executed on the ‘responsetime’ field. This field contains a numeric value, so we specify the metric function, which expands to all of min, mean and max. (Had we wanted to look at event rate or rare fields we’d have used one of the other available functions.) By declaring byFieldName as ‘airline’, a separate analysis is performed for each of the 19 airlines, rather than a single analysis across all airlines combined.

bucketSpan specifies that the analysis should be performed over hourly (3600 second) windows.

The dataDescription section describes how the data is formatted: which character delimits the fields and what format the timestamp takes. See Describing your data format for more information.
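
For comparison, a detector that analyzes event rate rather than a metric value could use the count function instead of metric. This is a hypothetical sketch only; see the reference documentation for the full list of detector functions and their semantics:

    "analysisConfig" : {
        "bucketSpan":3600,
        "detectors" :[{"function":"count","byFieldName":"airline"}]
    }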

The create request will return a unique job identifier that is used in the remainder of the tutorial, for example:

{"id":"farequote"}

  2. Check Job Status

Now that the analysis job is created, you can check its details by querying the jobs endpoint:

curl 'http://localhost:8080/engine/v2/jobs/'

The response shows detailed information about the configuration of the job, along with the id that uniquely identifies it:

{
  "hitCount" : 1,
  "skip" : 0,
  "take" : 100,
  "nextPage" : null,
  "previousPage" : null,
  "documents" : [ {
    "location" : "http://localhost:8080/engine/v2/jobs/farequote",
    "description" : "Analysis of response time by airline",
    "dataEndpoint" : "http://localhost:8080/engine/v2/data/farequote",
    "bucketsEndpoint" : "http://localhost:8080/engine/v2/results/farequote/buckets",
    "recordsEndpoint" : "http://localhost:8080/engine/v2/results/farequote/records",
    "logsEndpoint" : "http://localhost:8080/engine/v2/logs/farequote",
    "status" : "CLOSED",
    "timeout" : 600,
    "id" : "farequote",
    "analysisConfig" : {
      "detectors" : [ {
        "fieldName" : "responsetime",
        "function" : "metric",
        "byFieldName" : "airline"
      } ],
      "bucketSpan" : 300
    },
    "dataDescription" : {
      "format" : "DELIMITED",
      "fieldDelimiter" : ",",
      "timeField" : "time",
      "timeFormat" : "yyyy-MM-dd HH:mm:ssX",
      "quoteCharacter" : "\""
    },
    "counts" : {
      "bucketCount" : 0,
      "processedRecordCount" : 0,
      "processedFieldCount" : 0,
      "inputRecordCount" : 0,
      "inputBytes" : 0,
      "inputFieldCount" : 0,
      "invalidDateCount" : 0,
      "missingFieldCount" : 0,
      "outOfOrderTimeStampCount" : 0
    },
    "createTime" : "2014-09-30T11:14:32.792+0000"
  } ]
}

For a detailed explanation of the output, please refer to the Job Resource documentation.
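
To retrieve just this job rather than the full list, request it by id using the location URL shown in the response above:

curl 'http://localhost:8080/engine/v2/jobs/farequote'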

  3. Upload Data

Now we can send the CSV data to the data endpoint to be processed by the engine. Using cURL, we will use the -T option to upload the file. You will need to edit the URL to contain your job id and specify the path to the farequote.csv file:

curl -X POST -T farequote.csv 'http://localhost:8080/engine/v2/data/farequote'

This will stream the file farequote.csv to the REST API for analysis. This should take less than a minute on modern commodity hardware. Once the command prompt returns, the data upload has completed. Next, we can start looking at the analysis results.
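
As an aside, the same upload can be expressed with cURL's --data-binary option, which POSTs the file contents unmodified; note that unlike -T, this reads the whole file into memory before sending:

curl -X POST --data-binary @farequote.csv 'http://localhost:8080/engine/v2/data/farequote'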

  4. Close the Job

Since we have uploaded a batch of data with a definite end point, it is best practice to close the job before requesting results. Closing the job tells the API to flush through any data that is being buffered, store all results and release any resources associated with the job. Once again, you will need to edit the URL to contain your job id:

curl -X POST 'http://localhost:8080/engine/v2/data/farequote/close'

Step 3: Viewing the analysis results

  1. View Dashboard

Browse to the following URL:

http://localhost:5601/app/prelert#

This will display a list of Jobs.

  • Click on the Summary action icon to view the summarized results for the whole job.
  • Click on the Explorer action icon to view the details of individual anomalies and filter them by airline.

  2. Query Anomaly Results using the API

Results are presented at two levels:

  • Buckets - aggregated and normalized for each time interval
  • Records - individual anomaly results

To query bucket results, use the bucket results endpoint for your job id (farequote):

curl 'http://localhost:8080/engine/v2/results/farequote/buckets?skip=0&take=100'

This returns results aggregated for each time interval. skip and take default to 0 and 100, meaning the first 100 results are returned. (See paging results for instructions on retrieving the next 100 results.)

This view is best for seeing overall anomalies over time.

{
  "hitCount" : 119,
  "skip" : 0,
  "take" : 100,
  "nextPage" : "http://localhost:8080/engine/v2/results/farequote/buckets?skip=100&take=100&expand=false&includeInterim=false&anomalyScore=0.0&maxNormalizedProbability=0.0",
  "previousPage" : null,
  "documents" : [ {
    "timestamp" : "2014-06-23T00:00:00.000+0000",
    "influencers" : [ ],
    "recordCount" : 0,
    "anomalyScore" : 0.0,
    "maxNormalizedProbability" : 0.0,
    "initialAnomalyScore" : 0.0,
    "eventCount" : 649,
    "bucketInfluencers" : [ ],
    "bucketSpan" : 300,
    "isInterim" : false
  }, {
  ...
  }
}
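
To page through more than 100 buckets, either follow the nextPage URL from the response or increase skip yourself. For example, to fetch the second page of results:

curl 'http://localhost:8080/engine/v2/results/farequote/buckets?skip=100&take=100'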

Once anomalous buckets have been identified, you can drill down to view the detail.

  3. Query Bucket Details using the API

To drill into these results, we can view the details of a particular bucket. For this example dataset, the bucket with the highest anomaly score has id 1403712000. We can request the details of just this one bucket interval as follows:

curl 'http://localhost:8080/engine/v2/results/farequote/buckets/1403712000?expand=true'

{
  "documentId" : "1403712000",
  "exists" : true,
  "type" : "bucket",
  "document" : {
    "timestamp" : "2014-06-25T16:00:00.000+0000",
    "records" : [ {
      "fieldName" : "responsetime",
      "timestamp" : "2014-06-25T16:00:00.000+0000",
      "function" : "mean",
      "probability" : 5.92822E-30,
      "anomalyScore" : 94.35376,
      "normalizedProbability" : 100.0,
      "byFieldName" : "airline",
      "byFieldValue" : "AAL",
      "typical" : 101.823,
      "actual" : 242.75,
      ...
    } ],
    "anomalyScore" : 94.35376,
    "maxNormalizedProbability" : 100.0,
    "recordCount" : 1,
    "eventCount" : 909,
    "bucketInfluencers": [ {
      "probability": 8.88553E-25,
      "influencerFieldName": "bucketTime",
      "anomalyScore": 94.35376
    } ],
    ...
  }
}

This shows that between 2014-06-25T16:00:00-0000 and 2014-06-25T17:00:00-0000 (the bucket start time plus the bucketSpan), the responsetime for airline AAL increased from a typical mean value of 101.823 to 242.75. The probability of seeing a value of 242.75 is 5.92822E-30, which is extremely unlikely.

This increased value is highly unexpected based upon the past behavior of this metric and is therefore an outlier.

  4. Query Top Anomalies using the API

To view the most anomalous data records, you can query the records endpoint. It accepts filters on start date, end date, anomalyScore and normalizedProbability, and returns the record documents that match the filter criteria specified in the query. For example, you can query for all significant anomalies that have occurred in the previous hour.

The following example returns all records where the anomalyScore is greater than or equal to 80 and the timestamp is greater than or equal to 2014-06-25T16:00:00-0000 and less than 2014-06-25T17:00:00-0000. We recommend filtering on anomalyScore, as this is a sophisticated aggregation of anomalousness for each bucket time interval; by using it, rate control for alerting is built in.

curl 'http://localhost:8080/engine/v2/results/farequote/records?anomalyScore=80&start=2014-06-25T16:00:00-00:00&end=2014-06-25T17:00:00-00:00'

This returns the individual anomaly records. Note that these are the same values you would see when querying a particular bucket with ?expand=true; however, the records are returned as a flat list rather than nested within a parent bucket.

{
  "hitCount" : 1,
  "skip" : 0,
  "take" : 100,
  "nextPage" : null,
  "previousPage" : null,
  "documents" : [ {
    "fieldName" : "responsetime",
    "timestamp" : "2014-06-25T16:00:00.000+0000",
    "function" : "mean",
    "probability" : 5.92822E-30,
    "anomalyScore" : 94.35376,
    "normalizedProbability" : 100.0,
    "byFieldName" : "airline",
    "byFieldValue" : "AAL",
    "typical" : 101.823,
    "actual" : 242.75,
    ...
  } ]
}
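
The records endpoint can also filter on normalizedProbability instead of (or in addition to) anomalyScore. Assuming the parameter is named as described above, a query for records whose normalized probability is at least 90 would look like this:

curl 'http://localhost:8080/engine/v2/results/farequote/records?normalizedProbability=90'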

Step 4: Deleting the Job

  1. Delete Job

Finally, if the results are no longer wanted, the job can be deleted. This shuts down any active resources associated with the job, deletes the results and removes all job-specific log files. This cannot be undone.

curl -X DELETE 'http://localhost:8080/engine/v2/jobs/farequote'
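
To confirm the job has been removed, list the jobs again; the farequote job should no longer appear:

curl 'http://localhost:8080/engine/v2/jobs/'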