Tutorial: Using Windows PowerShell

This introductory tutorial uses sample data to show how to analyze a data set with the Engine API from Windows PowerShell.

Pre-requisites

Engine API is installed - This worked example assumes that the Engine API is installed locally. If you are working with a remote instance, please substitute localhost:8080 with your remote ipaddress:port details.

Windows PowerShell 3.0 (or later) installed - The examples use the command Invoke-RestMethod which was introduced in PowerShell 3.0. You can use $PSVersionTable.PSVersion in PowerShell to determine which version you have installed. PowerShell 3.0 (or later) is installed by default starting with Windows 8 and Windows Server 2012. If you have an earlier version installed on your Windows system and you wish to upgrade, please refer to the article TechNet: Installing Windows PowerShell for full instructions on upgrading Windows PowerShell.
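The version check can also be scripted. The following is a minimal sketch; the variable name $minimumMajor is illustrative and not part of the tutorial script.

```powershell
# Minimum major version that ships Invoke-RestMethod (PowerShell 3.0).
$minimumMajor = 3

# $PSVersionTable.PSVersion holds the version of the running PowerShell session.
$currentVersion = $PSVersionTable.PSVersion

if ($currentVersion.Major -ge $minimumMajor) {
    Write-Output "PowerShell $currentVersion detected: Invoke-RestMethod is available."
} else {
    Write-Output "PowerShell $currentVersion detected: upgrade to 3.0 or later."
}
```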

Basic knowledge of PowerShell - If you are familiar with scripting languages, PowerShell will be intuitive and easy to learn. If you are new to it, then try the following resources.

Getting Started

First browse to the following URL to check the installation is running:

http://localhost:8080/engine/v2

This will return the version number of the Engine API. Don’t worry if the version or build number is not exactly the same as the example below. If your version number is lower you may want to consider upgrading to a newer version.

Prelert Engine REST API
Analytics Version:
Model State Version 22
prelert_autodetect_api (64 bit): Version 6.1.0 (Build d771b5fc3b9077) Copyright (c) Prelert Ltd 2006-2016
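The same check can be made from PowerShell rather than a browser. This is a hedged sketch: the function name Test-EngineApi and its parameters are hypothetical helpers, and it assumes only that the URL above answers a plain GET request.

```powershell
# Hypothetical helper: returns $true if the Engine API answers at $Uri, otherwise $false.
function Test-EngineApi {
    param(
        [string]$Uri = "http://localhost:8080/engine/v2",
        [int]$TimeoutSec = 5
    )
    try {
        # A successful GET returns the version banner shown above.
        $banner = Invoke-RestMethod -Uri $Uri -Method Get -TimeoutSec $TimeoutSec
        Write-Host $banner
        return $true
    } catch {
        # Connection failures and HTTP errors both end up here.
        Write-Warning "Engine API not reachable at ${Uri}: $($_.Exception.Message)"
        return $false
    }
}
```

Calling Test-EngineApi with no arguments checks the local installation; pass -Uri to check a remote instance.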

Overview

Let’s now try to analyze an example time series dataset. This data has been synthetically generated and contains two easy-to-find anomalies: one unusually low value and one unusually high value.

The examples in this tutorial assume that you will be using the Windows PowerShell Integrated Scripting Environment (ISE). This has a UI that provides a much richer scripting experience than the traditional command line interface.

Either copy and paste the examples below, or download the PowerShell script from http://s3.amazonaws.com/prelert_demo/v1.0/power-tutorial.ps1.

Before we start, please download the example CSV data set from http://s3.amazonaws.com/prelert_demo/power-data.csv.

This tutorial assumes that you will save this locally in a folder c:\data\tutorial\.

The input data should preferably be a time series, ordered by time. If the data cannot be sent in time order, see Working with out-of-sequence data. The raw data looks like this:

time,category,value
21/Jul/2014 04:14:15Z,host1,98.939
21/Jul/2014 04:14:29Z,host1,100.873
21/Jul/2014 04:14:45Z,host1,98.942
21/Jul/2014 04:14:59Z,host1,101.116
...
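To get a feel for the format before uploading, the rows can be parsed locally. The sketch below uses the four sample rows shown above rather than the full file, and checks that they are in time order. The .NET format string is an assumption chosen to match the sample timestamps; it is not the timeFormat value the Engine API expects.

```powershell
# Sample rows copied from the raw data above.
$lines = @(
    'time,category,value',
    '21/Jul/2014 04:14:15Z,host1,98.939',
    '21/Jul/2014 04:14:29Z,host1,100.873',
    '21/Jul/2014 04:14:45Z,host1,98.942',
    '21/Jul/2014 04:14:59Z,host1,101.116'
)
$sample = $lines | ConvertFrom-Csv

$culture = [System.Globalization.CultureInfo]::InvariantCulture
$styles  = [System.Globalization.DateTimeStyles]'AssumeUniversal, AdjustToUniversal'

# Convert each row into typed values (UTC timestamp and numeric reading).
$rows = @(foreach ($row in $sample) {
    [pscustomobject]@{
        Time     = [datetime]::ParseExact($row.time, "dd/MMM/yyyy HH:mm:ss'Z'", $culture, $styles)
        Category = $row.category
        Value    = [double]::Parse($row.value, $culture)
    }
})

# Verify that every timestamp is greater than or equal to the one before it.
$inOrder = $true
for ($i = 1; $i -lt $rows.Count; $i++) {
    if ($rows[$i].Time -lt $rows[$i - 1].Time) { $inOrder = $false }
}
Write-Output "Rows parsed: $($rows.Count); in time order: $inOrder"
```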

Tutorial

  1. Create New Job

This creates an analysis job for the example data file. This will model the typical values in the input data and analyze where values deviate from the expected behavior. It will calculate the probability of this deviation and use this to show where anomalies have occurred and their significance.

Creating a new job requires both a declaration of how the data is formatted (dataDescription), and how the data is expected to be analyzed (analysisConfig).

Note that single quotes are used for the $jobconfig variable as its value contains double quotes. This avoids needing to use escape characters.

$engine = "http://localhost:8080/engine/v2"

$jobconfig = '{
  "id":"powershell",
  "description":"Test job for powershell tutorial",
  "analysisConfig" : {
    "bucketSpan":600,
    "detectors" :[
      {"function":"min", "fieldName":"value" },
      {"function":"max", "fieldName":"value" }
    ]
  },
  "dataDescription" : {
    "fieldDelimiter":",",
    "timeField":"time",
    "timeFormat":"dd/MMM/yyyy HH:mm:ssX"
  }
}'

Invoke-RestMethod -uri $engine/jobs -Method POST -ContentType "application/json" -Body $jobconfig

In this example, we are specifying that we want to analyze the value field, looking for unusually high or unusually low values.

The bucketSpan defines that the analysis should be performed across 10 minute time intervals (600 seconds).

The dataDescription section describes how the data is formatted, what character delimits the fields and the format of the timestamp. See Describing your data format (dataDescription) for more information.

We’ve given the job a sensible name and description. The job id must be unique and if omitted, one will be auto-generated. The job id is used as part of the request URL and should not contain any unsafe characters.
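As an alternative to a hand-written JSON string, the same configuration can be built as a PowerShell object and serialized. This is a sketch, not the tutorial's method: the variable name $jobSpec is illustrative, and the field names and values are copied from the job configuration above.

```powershell
# Build the job configuration as nested ordered dictionaries (requires PowerShell 3.0+).
$jobSpec = [ordered]@{
    "id"              = "powershell"
    "description"     = "Test job for powershell tutorial"
    "analysisConfig"  = [ordered]@{
        "bucketSpan" = 600
        "detectors"  = @(
            [ordered]@{ "function" = "min"; "fieldName" = "value" },
            [ordered]@{ "function" = "max"; "fieldName" = "value" }
        )
    }
    "dataDescription" = [ordered]@{
        "fieldDelimiter" = ","
        "timeField"      = "time"
        "timeFormat"     = "dd/MMM/yyyy HH:mm:ssX"
    }
}

# ConvertTo-Json only serializes 2 levels by default, so raise -Depth
# to keep the detector objects from being flattened to strings.
$jobconfig = ConvertTo-Json -InputObject $jobSpec -Depth 5
Write-Output $jobconfig
```

The resulting $jobconfig string can be passed to -Body in the same way as the single-quoted version.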

Upon successfully creating a job, the following is returned:

{"id":"powershell"}

  2. Check Job Status

Now use the Jobs Dashboard to check out the details of the job.

http://localhost:5601/app/prelert#

Expand the row to view the configuration values. You will see that the input counts are zero and the status is closed, as we have not yet posted any data.

Engine API PowerShell job

For a detailed explanation of this output, please refer to Job Resource.

  3. Upload Data

Now we can send the input CSV file to the data endpoint to be processed by the engine. Edit the $jobdata section if you have saved the example CSV data set to a different location, and edit the $jobid line if you have given your job a different job ID.

$engine = "http://localhost:8080/engine/v2"
$jobdata = "C:\data\tutorial\power-data.csv"
$jobid = "powershell"
Invoke-RestMethod -uri $engine/data/$jobid -Method POST -InFile $jobdata

This will stream the file power-data.csv to the REST API for analysis. This should take less than 30 seconds on modern commodity hardware.

  4. Close the Job

It is best practice to close the job before requesting results. Closing the job tells the API to flush through any data that is buffered and to store the model and the results.

$engine = "http://localhost:8080/engine/v2"
$jobid = "powershell"
Invoke-RestMethod -uri $engine/data/$jobid/close -Method POST

  5. View Results

Results can be immediately viewed using the Engine API Dashboard.

http://localhost:5601/app/prelert#

Job counts now show that data has been analyzed.

Engine API PowerShell job

Jump to the Explorer dashboard. Here you will see two anomalous time intervals have been identified, and each time interval contains a single anomaly. For these anomalies, the min and max values of the data have deviated from the modeled behavior.

Engine API PowerShell results

  6. Alert on the Results

In a real-time scenario, we need to look for anomalies as they happen. We can query the results endpoint to look for time intervals that are very anomalous. Here we are requesting to see time intervals where the value of anomalyScore is greater than or equal to 80.

$engine = "http://localhost:8080/engine/v2"
$jobid = "powershell"
Invoke-RestMethod -uri $engine/results/$jobid/buckets?anomalyScore=80 -Method GET | Select -ExpandProperty documents

This returns the same results as seen in the Dashboard. There are two highly anomalous buckets, with anomalyScore values of 91.7975 and 97.80331. The maximum score is 100, so both are considered to be very unusual.

timestamp                : 2014-08-08T08:00:00.0000000
bucketSpan               : 600
anomalyScore             : 91.7975
maxNormalizedProbability : 97.80331
bucketInfluencers        : {@{influencerFieldName=bucketTime; anomalyScore=91.7975; probability=1.57744E-204}}
recordCount              : 1
eventCount               : 39

timestamp                : 2014-08-12T12:00:00.0000000
bucketSpan               : 600
anomalyScore             : 97.80331
maxNormalizedProbability : 97.80331
bucketInfluencers        : {@{influencerFieldName=bucketTime; anomalyScore=97.80331; probability=1.54472E-204}}
recordCount              : 1
eventCount               : 40

The first anomaly shown occurred in the time interval that started on 8th Aug 2014 at 8am.

The time interval has a calculated anomalyScore of 91.7975, which is close to the maximum of 100 and therefore highly unusual.

The maxNormalizedProbability indicates the probability of the most unusual anomaly for this time period.

The bucketInfluencers contain the aggregated anomaly score for each influencer.

The recordCount shows that a single anomaly was found in this time interval. Note there can be many (or no) anomalies found within each time interval.

The eventCount shows that 39 input records were analyzed.
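Because the results are returned as PowerShell objects, they can be filtered further on the client side. The sketch below rebuilds the two buckets from the output above as sample objects (only a subset of fields) and raises an alert for those above a stricter threshold. The threshold of 95 and the variable names are hypothetical.

```powershell
# Sample buckets, using values copied from the output above.
$documents = @(
    [pscustomobject]@{ timestamp = "2014-08-08T08:00:00.0000000"; anomalyScore = 91.7975;  recordCount = 1; eventCount = 39 },
    [pscustomobject]@{ timestamp = "2014-08-12T12:00:00.0000000"; anomalyScore = 97.80331; recordCount = 1; eventCount = 40 }
)

# Hypothetical alert threshold, stricter than the anomalyScore=80 used in the query.
$threshold = 95

# Keep only buckets at or above the threshold, most anomalous first.
$alerts = @($documents |
    Where-Object { $_.anomalyScore -ge $threshold } |
    Sort-Object -Property anomalyScore -Descending)

foreach ($bucket in $alerts) {
    Write-Output ("ALERT: bucket {0} scored {1} ({2} anomaly record(s), {3} events)" -f `
        $bucket.timestamp, $bucket.anomalyScore, $bucket.recordCount, $bucket.eventCount)
}
```

In a live script, $documents would come from the Invoke-RestMethod call shown earlier instead of the sample objects.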

If this job were analyzing data in real time, we would in practice query only the last x minutes, i.e. the period since the previous query. To do this, use the start and end date parameters.

The following is an example of how to find the most recent anomaly which occurred after 10 Aug. Note the escape character required before the ampersand “&” where multiple filters are included in the URL.

$engine = "http://localhost:8080/engine/v2"
$jobid = "powershell"
$results = Invoke-RestMethod -uri $engine/results/$jobid/buckets?anomalyScore=80`&start=2014-08-10T00:00:00-00:00 -Method GET
$results | Select -ExpandProperty documents

This will return the single anomalous bucket.
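For real-time polling, the start parameter would be computed rather than typed in. The following sketch builds the query URL from a DateTime value; the function name Get-BucketQueryUri and its parameters are hypothetical, and the timestamp layout simply mirrors the example above. Because the URL is built inside a double-quoted string, the ampersand needs no escape character here.

```powershell
# Hypothetical helper: builds the buckets query URL for results since a given time.
function Get-BucketQueryUri {
    param(
        [string]$Engine,
        [string]$JobId,
        [double]$MinScore,
        [datetime]$Since
    )
    # Format the time in UTC using the same layout as the example above.
    $culture = [System.Globalization.CultureInfo]::InvariantCulture
    $start = $Since.ToUniversalTime().ToString("yyyy-MM-dd'T'HH:mm:ss", $culture) + "-00:00"
    return "$Engine/results/$JobId/buckets?anomalyScore=$MinScore&start=$start"
}

# Reproduce the query above: buckets scoring 80 or more since 10 Aug 2014 (UTC).
$since = [datetime]::SpecifyKind([datetime]"2014-08-10T00:00:00", [System.DateTimeKind]::Utc)
$uri = Get-BucketQueryUri -Engine "http://localhost:8080/engine/v2" -JobId "powershell" -MinScore 80 -Since $since
Write-Output $uri

# In a polling loop, $since might instead be (Get-Date).AddMinutes(-10), followed by:
# Invoke-RestMethod -Uri $uri -Method GET | Select-Object -ExpandProperty documents
```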

  7. Delete Job

Finally, the job can be deleted. This shuts down all resources associated with the job and deletes the results.

$engine = "http://localhost:8080/engine/v2"
$jobid = "powershell"
Invoke-RestMethod -uri $engine/jobs/$jobid -Method DELETE

Conclusion

PowerShell is a powerful scripting language for the Windows platform. It works very well with the Engine API as it has excellent built-in handling for JSON objects.