Troubleshooting

We have designed the Engine API to be as simple as possible. However, despite our best intentions, things do not always go quite as planned and you may run into problems when trying to run a job to find anomalies in your data. This section gives you hints and tips on how to troubleshoot issues at every stage of the process, from starting the engine, creating jobs and analyzing data, through to viewing your results.

If you are still having problems, please do not hesitate to contact support@prelert.com.

The troubleshooting process can be broadly split into the following areas. Various techniques are listed to help you isolate any issues that you may encounter.

  1. Checking the Engine API is running
  2. Checking the job setup and configuration
  3. Checking that data is being analyzed
  4. Checking for anomalous results

We recommend that you try out our API Quick Start Guide, which contains a known data source and configuration. Using the quick start example data set, you can validate that the Engine API is installed and operating correctly, so that troubleshooting can then be focused on other areas.

The examples given throughout this documentation assume you have installed the Engine API locally on the default port 8080 - if not, please update the host and port number in the URLs accordingly.

Basic Pre-flight Checks

To check the Engine API is running, browse to the following:

http://localhost:8080/engine/v2

This will return the version number of the Engine API. Don’t worry if the version or build number is not exactly the same as the example below. If the version number is lower, we recommend upgrading to a newer version.

Prelert Engine REST API
Analytics Version:
Model State Version 22
prelert_autodetect_api (64 bit): Version 6.1.0 (Build d771b5fc3b9077) Copyright (c) Prelert Ltd 2006-2016
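The same check can be made from the command line with cURL; this is a minimal sketch using the default host and port, so adjust them if your installation differs:

curl http://localhost:8080/engine/v2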

If the Engine API is not running, on Windows you can start it using Start Menu -> All Programs -> Prelert Engine -> Start Prelert Engine Services.

On Linux or Mac OS X you can start it by using the following command line:

$PRELERT_HOME/bin/prelert_startup.sh

If you have sourced $PRELERT_HOME/profile (bash/ksh/sh) or $PRELERT_HOME/cshrc (tcsh/csh) then you’ll be able to type simply:

prelert_startup.sh

because $PRELERT_HOME/bin will be on your $PATH.
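For example, in a bash shell the full sequence might look like the following sketch (substitute the actual install path if the $PRELERT_HOME variable is not already set):

. $PRELERT_HOME/profile     # put $PRELERT_HOME/bin on the PATH
prelert_startup.sh          # start the Engine API services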

Important

Do not run prelert_startup.sh as the root user. Doing so means that you will be vulnerable to any security flaws in Elasticsearch, Jetty or Java itself, and will also create files owned by root in your installation that subsequently prevent the correct non-root user running the software. Always run the Engine API as the same non-root user who installed it.

Checking Job Setup

The following describes how to check that the job entity has been created. Full documentation on how to create an Engine API job is provided here.

Find the job ID

Each Engine API job has a unique job ID, which is needed to troubleshoot further. If you already know the job ID, move on to the next step. The default job ID is in the format 20140613170020-00011 (YYYYMMDDhhmmss-xxxxx), i.e. the UTC time at which the job was created, appended with a counter. Otherwise, the quickest way to view the list of all Engine API jobs is to use a web browser and hit the URL:

http://localhost:8080/engine/v2/jobs

Scroll down the page to identify the job ID, using the createTime as reference.
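If you prefer the command line, the same listing can be fetched with cURL and pretty-printed. The sketch below assumes the listing contains the documentId and createTime fields shown in the job details example later in this section:

curl -s 'http://localhost:8080/engine/v2/jobs' | python -m json.tool | grep -E '"(documentId|createTime)"'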

If you cannot find the job ID, there may have been a problem in creating the job. If this is the case, check the Engine API core log files for errors.

Look at the job details

View the job details by browsing to the following link, substituting <jobId> with your job ID:

http://localhost:8080/engine/v2/jobs/<jobId>

Check that the job configuration appears to be correct, with the Detectors listed as you would expect. Don’t worry if the status is “CLOSED”; this just means that the job is not currently processing data. If the process that sends data runs as a timed event, a “CLOSED” job will still be able to accept data on the next timer.
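The same details can be retrieved from the command line; a minimal sketch that pretty-prints the job so you can inspect the detectors and the status field:

curl -s 'http://localhost:8080/engine/v2/jobs/<jobId>' | python -m json.tool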

Next steps

We have verified that the job has been set up and configured correctly.

Now check if data is being received and processed in order to display results.

Checking the API is running

To check the Engine API is running, browse to the following:

http://localhost:8080/engine/v2

This should return the version number of the Engine API. If you do not see the correct response, then perform the following checks:

  1. Try to start the Engine API by running prelert_startup.sh as described here.
  2. Check firewall and security settings to ensure that the Engine API port (default port 8080) is open.
  3. Check the core log file for errors: $PRELERT_HOME/logs/engine_api/engine_api.log.
  4. Check the licensing log file to confirm a valid license is applied: $PRELERT_HOME/logs/lictest/lictest.log. If your license has expired, details on how to renew your license can be found here.

Next steps

We have validated that the Engine API is running and licensed correctly. Now check that the jobs are set up and configured correctly.

Checking data is being analyzed

As the Engine processes data, it regularly updates the number of records it has processed. If you want to establish that your new job is actually processing data, these counts are the first values you should look at.

First, a little terminology:

  • Fields starting with input are the counts of the raw data uploaded to the API
  • Fields starting with processed are the counts of the data actually processed by the Prelert Engine

If 100 records, each with 4 fields, are uploaded to the API, inputRecordCount and inputFieldCount will be 100 and 400 respectively. Assuming that each record has a readable timestamp, processedRecordCount will be the same as inputRecordCount, but if the job is configured to analyze only 2 of the 4 fields in each record then processedFieldCount will be 200.

counts is a sub-resource of the job resource. For more details see Job Counts.

http://localhost:8080/engine/v2/jobs/<jobId>
{
    "documentId" : "jobID",
    ...

    "counts" : {
      "bucketCount" : 119,
      "processedRecordCount" : 86275,
      "processedFieldCount" : 172550,
      "inputRecordCount" : 86275,
      "inputFieldCount" : 258825,
      "inputBytes" : 8246803,
      "invalidDateCount" : 0,
      "missingFieldCount" : 0,
      "outOfOrderTimeStampCount" : 0
    }
}
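While data is being uploaded, a quick way to confirm the counts are moving is to re-run a request like the sketch below and check that processedRecordCount and bucketCount increase between runs:

curl -s 'http://localhost:8080/engine/v2/jobs/<jobId>' | python -m json.tool | grep -E '"(bucketCount|processedRecordCount|inputRecordCount)"'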

bucketCount is the number of result buckets that have been produced; a non-zero value shows that the Engine is analyzing the data. If bucketCount stubbornly remains at 0 while processedRecordCount is increasing, here are some common issues to look out for:

Incorrect field names

A high missingFieldCount value indicates that a detector has been misconfigured, possibly due to a typo in the configuration.

When using CSV data, you will receive an error message if the job’s Detector Configuration uses a field name that does not exist in the CSV header.

When uploading data in JSON format, it is possible to have different fields in different events, so if there is a typo in the Detector Configuration the Engine API will keep disregarding otherwise valid events while it looks for the misspelled field name.

Note that field names are case sensitive.
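As an illustration of how exact the match must be, consider the sketch below; the configuration keys (fieldName, byFieldName) and the data field names (clientip, bytes) are assumptions used only for this example, so check the job configuration documentation for the exact schema:

# CSV header of the uploaded data:
time,clientip,bytes

# A detector referencing these columns must use identical case;
# "Bytes" or "ClientIP" would be treated as missing fields:
{ "function" : "sum", "fieldName" : "bytes", "byFieldName" : "clientip" }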

Incorrect time formats

For every record where the time field does not exist or the time value cannot be parsed, invalidDateCount will be incremented by 1. Records with an invalid time field will not be processed, so if your job has a high invalidDateCount and a low processedRecordCount, review the timeFormat in the job’s Data Description.

By default, the API will look for a field named time in your data, containing the time in seconds since the epoch. If this field is not present in the data, then the Data Description must provide the name of the field containing the timestamp and a format string describing how to parse it.

For example, if the time is formatted as 2014-06-28T21:40:23+0000 then the corresponding timeFormat is yyyy-MM-dd'T'HH:mm:ssXXX. Note: The apostrophes around the character T may need an escape character, for example when using cURL in a Bash script.
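For instance, inside a single-quoted Bash string each literal apostrophe can be written as '\''. The sketch below shows only the dataDescription fragment of a job creation request; the timeField and timeFormat field names and the /jobs endpoint follow the conventions used in this guide but should be checked against the job creation documentation:

curl -X POST 'http://localhost:8080/engine/v2/jobs' \
     -H 'Content-Type: application/json' \
     -d '{
           "dataDescription" : {
             "timeField" : "timestamp",
             "timeFormat" : "yyyy-MM-dd'\''T'\''HH:mm:ssXXX"
           }
         }'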

Further information on how to specify the time format in your data can be found in Date Time Format.

Unordered time series data

The Engine API prefers time series data to be in ascending chronological order. If the order of the data cannot be guaranteed, a latency window can be specified in the job analysis configuration.

Records that are out of order and outside of the latency window are discarded. A high value of outOfOrderTimeStampCount indicates the need to review the data and ensure it is correctly time ordered before uploading it to the API.

Whilst the golden configuration for real-time anomaly detection is to process data in chronological order without latency, this is not always possible. If the order of the data cannot be guaranteed, we recommend specifying as short a latency window as your data will allow. If you are using a large bucketSpan, you could consider rounding the timestamps of your input data to the nearest second.
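As a rough sketch, latency is specified alongside the bucketSpan in the job's analysis configuration when the job is created; the analysisConfig, bucketSpan and latency field names (both values in seconds) are assumptions here, so confirm them against the analysis configuration documentation:

"analysisConfig" : {
  "bucketSpan" : 300,
  "latency" : 60,
  "detectors" : [ ... ]
}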

Further notes on working with out-of-sequence data are available.

Model memory limit may stop anomalies being detected in a very large job

The Engine has an internal limit to stop it from potentially consuming more than the available memory on the machine. See analysis limits for more detail. This limit applies not to the total amount of memory used by the API, but only to the Engine’s internal mathematical models. To see how much memory the models are consuming for a job, view the modelBytes value in the modelSizeStats section. The default limit for the Engine is 4096 MiB.

http://localhost:8080/engine/v2/jobs/<jobId>
{
    "documentId" : "jobID",
    ...
    "modelSizeStats" : {
      "memoryStatus" : "OK",
      "modelBytes" : 443676,
      "totalByFieldCount" : 20,
      "totalPartitionFieldCount" : 2,
      "totalOverFieldCount" : 0
    }
}
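If memoryStatus is no longer OK and you suspect the limit is being hit, the limit can be raised when the job is created. The fragment below is only a sketch; the analysisLimits and modelMemoryLimit field names and the unit (MiB) are assumptions, so check the analysis limits documentation before relying on them:

"analysisLimits" : {
  "modelMemoryLimit" : 8192
}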

Further troubleshooting

Note that for very low data rates, bucketCount may be zero because there isn’t enough data to push the results through the internal buffers. In this case, close the job after the upload to flush the results.

Try uploading an initial small subset of the data (at least 500 records) and look at the counts. The API will return an error if a significant proportion of the uploaded data is invalid (out-of-order timestamps or a timestamp field that cannot be parsed).

As a troubleshooting step, we recommend using a known good JSON or delimited file. We have provided the example data for this purpose here.
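A sketch of this workflow using cURL is shown below; the /data/<jobId> upload and /data/<jobId>/close endpoints are assumptions here (confirm them against the data upload documentation), and mydata.csv is a hypothetical extract of your data:

head -n 501 mydata.csv > subset.csv   # header row plus the first 500 records
curl -X POST --data-binary @subset.csv 'http://localhost:8080/engine/v2/data/<jobId>'
curl -X POST 'http://localhost:8080/engine/v2/data/<jobId>/close'
curl -s 'http://localhost:8080/engine/v2/jobs/<jobId>' | python -m json.tool | grep -E '"(bucketCount|invalidDateCount|outOfOrderTimeStampCount)"'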

Next steps

If the job shows a bucketCount greater than zero, the Engine API is receiving and processing data.

Now check to see if the results can be viewed.

Checking for anomalous results

In previous troubleshooting steps we have validated that your data is being analyzed by the Engine API. Now we shall see if any anomalies are being identified.

A Kibana Dashboard has been provided as a quick-start way to graphically view the analytics job results. To view the results dashboard, open the URL:

http://localhost:5601/app/prelert#

To use the Kibana Dashboard, port 9200 must be open on the firewall, as the dashboard directly queries the Elasticsearch component of the Prelert Engine API installation.

Select a Job to view

In the Kibana UI, navigate to the Summary view and ensure that the time picker covers the time range of your analysis, e.g. the last 4 hours. By default, results are displayed aggregated across all jobs. Select the relevant job from the drop-down picker.

Job Picker

Still no anomalies

If the time interval is correct for the data set, the anomaly chart remains a flat line, and the number of hits (result buckets) is greater than zero, then this indicates that no anomalies have been found.

Possible reasons for this are:

There are no anomalies - This may occur if the data is uniform. Export a subset of the data and review it manually to see whether it contains any variation.

Insufficient data within each bucketSpan - If the bucketSpan is 5 minutes but the data only reports every 30 minutes, say, then the granularity of the analysis does not align well with the granularity of the data. Ensure that the chosen bucketSpan is wide enough to contain sufficient data points for analysis.

Incorrect job configuration - The job configuration defines the fields and functions to analyze by. For example, to identify potential data exfiltration events, look for client IP addresses that are sending unusually large amounts of data; in this instance, anomaly detection is required on the sum of bytes sent, split by IP address. The Engine API is capable of analyzing multiple metrics within a single job, but as a troubleshooting step we recommend that you start with the simplest use case first and then add complexity.
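A minimal single-detector job of this kind might be created with a request like the sketch below. The endpoint and camel-case field names follow the conventions seen elsewhere in this guide but should be treated as assumptions (check the job creation documentation), and bytes and clientip are hypothetical field names from your own data:

curl -X POST 'http://localhost:8080/engine/v2/jobs' \
     -H 'Content-Type: application/json' \
     -d '{
           "analysisConfig" : {
             "bucketSpan" : 1800,
             "detectors" : [
               { "function" : "sum", "fieldName" : "bytes", "byFieldName" : "clientip" }
             ]
           }
         }'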

Accessing Log Files

There are two simple ways to access the log files: either directly on the server by navigating to the directory:

$PRELERT_HOME/logs

or downloading them over http by hitting the support endpoint:

http://localhost:8080/engine/v2/support

This returns a Zip file containing all the log files. The logs for individual jobs or software components can be downloaded by accessing the specific logging endpoint, for example:

http://localhost:8080/engine/v2/logs/engine_api
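The same endpoints can be fetched from the command line, for example (a sketch; the output file name is arbitrary):

curl -o engine_api_logs.zip 'http://localhost:8080/engine/v2/logs/engine_api'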

By default, log files roll over once a 1MB size limit has been reached. The current active log file will have a “.log” extension, the second most recent “.log.1” and so on. Only ten log files are retained by default, so the oldest log file will have a “.log.9” extension. Below is a description of the key log files.

Core Log Files

engine_api/engine_api.log:
 contains top level job management events along with the general health of the Engine API.
engine_api/stderr.log:
 contains the standard error stream.
engine_api/stdout.log:
 contains the standard output stream.

To download a Zip file of the core log files, hit the endpoint:

http://localhost:8080/engine/v2/logs/engine_api

or navigate to the directory:

$PRELERT_HOME/logs/engine_api

Job Specific Log Files

It is important to note that job specific log files will not exist until the job has received and processed its first data. If you have sent data but these logs do not exist, then there was probably an error uploading the data - check the core log files and your data export routine for errors.

<jobId>/engine_api.log:
 contains detailed job management events.
<jobId>/autodetect_api.log:
 contains processing and analysis events specific to the job.
<jobId>/normalize_api.log:
 contains messages about normalization specific to the job.

To access the job specific log files, hit the endpoint:

http://localhost:8080/engine/v2/logs/<jobId>

or navigate to the directory:

$PRELERT_HOME/logs/<jobId>

where jobId is the identifier of the job of interest.

Licensing Log File

lictest/lictest.log:
 identifies if a valid license exists. Read more on Licensing.

To access the licensing log file browse to the endpoint:

http://localhost:8080/engine/v2/logs/lictest

or navigate here:

$PRELERT_HOME/logs/lictest

Download the Support Bundle

The support bundle contains all of the Engine API’s log files and basic information about the host machine. It is downloaded as a Zip file from the server via the dedicated Support Endpoint:

http://localhost:8080/engine/v2/support
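For example, to save the bundle from the command line (a sketch; the output file name is arbitrary):

curl -o support_bundle.zip 'http://localhost:8080/engine/v2/support'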

What’s in the Support Bundle?

basic_info.log:
 Basic information about the operating system, hardware and Prelert license.
engine_api_info.log:
 Engine API version and the list of configured jobs.
elasticsearch_info.log:
 Elasticsearch status information.

In addition, the bundle contains the core Engine log files and any job specific logs, as described in Accessing Log Files.

Stopping and starting the API

Engine API Shutdown

In a troubleshooting scenario, run the shutdown script before restarting to ensure that the system state is fully reset:

> $PRELERT_HOME/bin/prelert_shutdown.sh

This ensures a clean shutdown of the Engine API - all data is saved to disk and all processes and threads are cleanly exited.

Engine API Startup

Following a clean shutdown, the Engine API and its prerequisites can be restarted by running the startup script:

> $PRELERT_HOME/bin/prelert_startup.sh

Important

Do not run prelert_startup.sh as the root user. Doing so means that you will be vulnerable to any security flaws in Elasticsearch, Jetty or Java itself, and will also create files owned by root in your installation that subsequently prevent the correct non-root user running the software. Always run the Engine API as the same non-root user who installed it.

Getting a Windows installer log

In the event that anything goes wrong with an installation on Windows and you cannot determine what the problem is, please use the following procedure to obtain a log file from the Windows installer prior to contacting support@prelert.com.

  1. Open a cmd.exe prompt

  2. Change to the directory that the installer is located in using the cd command

  3. Run the following:

    msiexec /log C:\Users\<userid>\Desktop\msi.log /i prelert_engine_release_windows_64bit.msi
    

(Change the path to your desktop in the above command, or choose some other convenient location that you have write access to for the msi.log to be created.)

This is the equivalent of double-clicking on the .msi installer, but the log will contain information as to what has occurred. Sending this log file when you contact support will dramatically improve the chance of us being able to diagnose what is preventing installation.