Securing Behavioral Analytics and the Engine API

There are a number of options for securing the Engine API by restricting access to it and protecting communications to and from it. Doing so is recommended in environments that are accessible to the public internet and/or less trusted parts of a corporate network.

The Engine API design favors performance and is suitable in environments where access to the data and servers is already controlled. The Engine API itself has no security on its REST endpoints or on the dashboard. All data is served using unencrypted HTTP and there is no option for password protection.

Use a firewall

In the case where the Engine API is running on the same machine as the processes that need to connect to it, the best option is to close the Engine API ports on the machine’s firewall. This means that all communication will have to be via localhost.

By default, the Engine API uses TCP ports 5601, 8080 and 9200. Windows, Linux and Mac OS X all have software firewalls that can be configured to restrict external access to these ports.
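As an illustration, on a Linux host using iptables the three ports could be blocked for all interfaces except loopback. This is a sketch only; the equivalent rules for firewalld, ufw, the Windows Firewall or the OS X application firewall differ, so adapt it to your platform's firewall tooling.

```shell
# Allow all loopback traffic so local processes can still reach the
# Engine API, then drop external connections to its three ports.
sudo iptables -A INPUT -i lo -j ACCEPT
sudo iptables -A INPUT -p tcp -m multiport --dports 5601,8080,9200 -j DROP
```

Remember that iptables rules added this way do not persist across a reboot unless saved with your distribution's mechanism (for example iptables-save).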

Run on a trusted network

The next possible scenario is that the Engine API needs to be accessible from other machines, but all machines that need to communicate are on a subnet that is isolated from the public internet and other, less trusted, parts of the corporate network. In this case it may not be a problem that the Engine API is accessible to anyone who can gain access to the subnet.

This relies on appropriate IT policies and network level controls being in place.

Configure a reverse proxy server to secure the API

The final possibility for securing the Engine API is to place a reverse proxy server between the Engine API and the network.

This is certainly recommended if you are installing the Engine API on a machine that is connected to the public internet, such as a cloud server.

If configured correctly this is more secure, more flexible and more performant than having the Engine API provide an in-built security layer:

  • More secure because the reverse proxy can be a program dedicated to being an internet-facing web server rather than a JVM (which is what the main Engine API processes run in). A JVM is a far more complex piece of software and hence has more scope for potential security holes.
  • More secure because if somebody manages to get a memory dump from the reverse proxy it will not contain the full Engine API state.
  • More flexible because the security settings are maintained separately from the Engine API, meaning that fewer people need administrative access to them and they persist through an Engine API upgrade.
  • More performant (providing the machine as a whole has sufficient CPU cores) because the SSL encryption/decryption is being done in a separate process in parallel to the rest of the data processing.

This last point does raise the issue that encryption is a CPU-intensive process, and the Engine API server needs to be sized to allow for this extra CPU load.

Whilst any web server that can act as a reverse proxy can be used, the one that has been tested by Prelert is Nginx. All the examples on this page are for Nginx. If you prefer to use a different web server, such as Microsoft Internet Information Services or Apache httpd, then you will need to translate the configurations described below to your chosen reverse proxy server.

Installing Nginx

For Amazon Linux, Nginx can be installed using the command:

sudo yum install nginx

For most other Linux distributions supported by Prelert, Nginx packages can be downloaded and installed by following the instructions on the Nginx website.

For Windows, Nginx can be downloaded and installed by following the instructions on the Nginx website.

For Mac OS X, use Homebrew to install Nginx using the command:

brew install nginx

Obtain SSL certificates

The next step is to obtain SSL certificates for use by HTTPS.

The simplest and cheapest way to obtain these certificates is to generate them yourself. For an environment where defending against casual snooping is all that is required, such self-signed certificates may be adequate. However, it is important to realize that browsers and other client applications may not trust them and are likely to display confirmation prompts before using them. Self-signed certificates leave you more vulnerable to man-in-the-middle attacks, and for the highest level of HTTPS security it is recommended that you purchase an SSL certificate from a trusted provider.

If you decide that self-signed certificates are adequate, the OpenSSL program can be used to create them. This will have been pre-installed with the operating system on Linux and Mac OS X, and some pre-built Windows versions are listed on the OpenSSL website.
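Before proceeding, you can confirm OpenSSL is available and check which version is installed:

```shell
# Prints the installed OpenSSL (or, on some Macs, LibreSSL) version string
openssl version
```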

Assuming Nginx has been successfully installed, the following detailed example steps work on Linux:

# Become root to save typing "sudo" in front of every command below
sudo su -

# Make a directory to store the certificates and secure it such that only the root and nginx users can see the contents
mkdir /etc/nginx/certs
chgrp nginx /etc/nginx/certs
chmod 750 /etc/nginx/certs
cd /etc/nginx/certs

# Create the CA key and certificate for signing the server certificate
openssl genrsa -des3 -out ca.key 4096
openssl req -new -x509 -days 365 -key ca.key -subj "/C=US/ST=Massachusetts/L=Framingham/O=Prelert/CN=prelert.com" -out ca.crt

# Create the server key, CSR, and certificate
openssl genrsa -out server.key 2048
openssl req -new -key server.key -subj "/C=US/ST=Massachusetts/L=Framingham/O=Prelert/CN=prelert.com" -out server.csr

# Self-sign the server certificate here - clients will not trust it
openssl x509 -req -days 365 -in server.csr -CA ca.crt -CAkey ca.key -set_serial 01 -out server.crt

# Secure the certificates such that only the root and nginx users can see them
chgrp nginx *.*
chmod 640 *.*

# Exit from the root shell
exit

Configuring Nginx for HTTPS for the REST API only

The simplest level of security is to completely block remote access to the UI, and only allow access to the REST endpoints remotely via HTTPS.

Assuming the default ports were chosen during installation, the Prelert Engine REST API will be on TCP port 8080, the Elasticsearch data store on port 9200 and the UI on port 5601. These ports should be blocked using a firewall so that they cannot be accessed remotely.

Next, choose another TCP port - all the examples on this page will use 7080 - and open this port on the firewall. The next step is to configure Nginx to reverse proxy data received on this port through to the Engine API running on port 8080.

Edit the nginx.conf file. On Linux:

sudo vi /etc/nginx/nginx.conf

Find the server section within the http section within the example configuration that came with Nginx, and replace it with the following:

server {
    listen 7080 ssl;

    ssl_certificate     /etc/nginx/certs/server.crt;
    ssl_certificate_key /etc/nginx/certs/server.key;

    error_page 497 https://$host:$server_port$request_uri;

    location / {
        proxy_pass http://localhost:8080;
        proxy_redirect http://localhost:8080 https://$host:$server_port;
        sub_filter_types *;
        sub_filter_once off;
        sub_filter http://localhost:8080 https://$host:$server_port;
    }
}

In other words:

  • We’re listening with SSL on port 7080 (i.e. HTTPS)
  • We’re using the certificates generated earlier
  • If somebody tries to talk HTTP on port 7080, redirect them to HTTPS instead
  • Pass all incoming traffic through to http://localhost:8080
  • Map any URLs in response headers from http://localhost:8080 to the protocol, host and port of the original request
  • Map any URLs in response bodies from http://localhost:8080 to the protocol, host and port of the original request
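After saving the file, Nginx's built-in configuration check catches syntax errors before they can take the proxy down, and a reload applies the change without dropping existing connections:

```shell
# Validate the edited configuration, then apply it
sudo nginx -t
sudo nginx -s reload
```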

It is important to understand that HTTPS without authentication does not stop anyone using the Engine API. However, it does mean that snoopers cannot easily examine the data being sent to and from the Engine API by other users.

An example of access to the Engine API secured in this way running on a machine with IP address 54.81.22.189 is as follows:

curl -k 'https://54.81.22.189:7080/engine/v2/jobs'

The -k option for curl tells it that an untrusted SSL certificate is acceptable.

The output is:

{
  "hitCount" : 1,
  "skip" : 0,
  "take" : 100,
  "nextPage" : null,
  "previousPage" : null,
  "documents" : [ {
    "description" : "airline tutorial",
    "timeout" : 600,
    "status" : "CLOSED",
    "dataDescription" : {
      "format" : "DELIMITED",
      "fieldDelimiter" : ",",
      "timeField" : "time",
      "timeFormat" : "yyyy-MM-dd HH:mm:ssX",
      "quoteCharacter" : "\""
    },
    "counts" : {
      "bucketCount" : 0,
      "processedRecordCount" : 0,
      "processedFieldCount" : 0,
      "inputRecordCount" : 0,
      "inputBytes" : 0,
      "inputFieldCount" : 0,
      "invalidDateCount" : 0,
      "missingFieldCount" : 0,
      "outOfOrderTimeStampCount" : 0,
      "failedTransformCount" : 0
    },
    "createTime" : "2015-04-20T16:15:43.920+0000",
    "dataEndpoint" : "https://54.81.22.189:7080/engine/v2/data/farequote",
    "bucketsEndpoint" : "https://54.81.22.189:7080/engine/v2/results/farequote/buckets",
    "recordsEndpoint" : "https://54.81.22.189:7080/engine/v2/results/farequote/records",
    "logsEndpoint" : "https://54.81.22.189:7080/engine/v2/logs/farequote",
    "location" : "https://54.81.22.189:7080/engine/v2/jobs/farequote",
    "analysisConfig" : {
      "bucketSpan" : 3600,
      "latency" : 0,
      "detectors" : [ {
        "fieldName" : "responsetime",
        "byFieldName" : "airline",
        "function" : "metric"
      } ]
    },
    "id" : "farequote"
  } ]
}

Notice that the URLs in the output have been mapped to begin https://54.81.22.189:7080 even though the Engine API itself had no idea it was being proxied on this host/port.
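In the example above, curl's -k option simply skips certificate verification. A client that has been given a copy of ca.crt can instead verify the connection properly. Note that this only succeeds when the hostname in the URL matches the certificate's CN (prelert.com in the generation steps above), so connecting by raw IP address would still fail the hostname check:

```shell
# Verify the server certificate against the self-generated CA
# (assumes ca.crt has been copied to the client machine and the URL
# hostname matches the certificate CN)
curl --cacert ca.crt 'https://prelert.com:7080/engine/v2/jobs'
```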

Configuring Nginx for HTTPS and authentication for the REST API only

This is a simple extension of the configuration above, but with a username and password to restrict access. As before, a firewall should be configured to block TCP ports 8080, 9200 and 5601, and allow port 7080.

First it is necessary to choose a username and password, and hash the password. The password can be hashed using OpenSSL as follows:

openssl passwd -crypt mypassword

Replace mypassword in the command above with your chosen password.
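Note that some newer OpenSSL builds have removed the -crypt scheme from the passwd command. The MD5-based -1 scheme is a widely supported alternative; the system crypt() function that Nginx calls for auth_basic understands these $1$ hashes on glibc-based Linux. Supplying an explicit salt also makes the output reproducible, which is useful when scripting:

```shell
# MD5-based crypt hash with a fixed salt; the output is deterministic
# and begins with $1$xyz$
openssl passwd -1 -salt xyz secret
```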

The username and password then need to be stored in the file htpasswd.prelertengine for use by Nginx. If you choose to use username prelert and password secret then the following commands will accomplish this on Linux:

echo prelert:`openssl passwd -crypt secret` | sudo tee -a /etc/nginx/htpasswd.prelertengine
sudo chgrp nginx /etc/nginx/htpasswd.prelertengine
sudo chmod 640 /etc/nginx/htpasswd.prelertengine

Then replace the server section within the http section in nginx.conf with the following:

server {
    listen 7080 ssl;

    ssl_certificate     /etc/nginx/certs/server.crt;
    ssl_certificate_key /etc/nginx/certs/server.key;

    auth_basic           "Authentication required for Prelert Engine API";
    auth_basic_user_file /etc/nginx/htpasswd.prelertengine;

    error_page 497 https://$host:$server_port$request_uri;

    location / {
        proxy_pass http://localhost:8080;
        proxy_redirect http://localhost:8080 https://$host:$server_port;
        sub_filter_types *;
        sub_filter_once off;
        sub_filter http://localhost:8080 https://$host:$server_port;
    }
}

There are only two additional lines compared to the simpler configuration: auth_basic and auth_basic_user_file. auth_basic is the prompt that will appear in a browser dialog box. The example above reveals that the password is for the Prelert Engine API. For an internet-facing server you may wish to make the prompt terser to avoid giving this information away.

An example of access to the Engine API secured in this way running on a machine with IP address 54.81.22.189 is as follows:

curl -k -u prelert:secret 'https://54.81.22.189:7080/engine/v2/jobs'

The -k option for curl tells it that an untrusted SSL certificate is acceptable; the -u option tells it the username and password to use for authentication.
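To keep the password out of shell history and process listings, curl can instead read the credentials from a netrc file (created with permissions 600) via its -n option:

```shell
# ~/.netrc contents (one "machine" entry per host):
#   machine 54.81.22.189
#   login prelert
#   password secret
curl -k -n 'https://54.81.22.189:7080/engine/v2/jobs'
```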

If you visit https://54.81.22.189:7080/engine/v2 in a web browser you will be prompted for the username and password.

Configuring Nginx for HTTPS and authentication for the UI

If you are using Behavioral Analytics as an end-user product rather than an API, you’ll almost certainly want to block direct access to the API and make a secure UI available.

Assuming the default ports were chosen during installation, the Prelert Engine REST API will be on TCP port 8080, the Elasticsearch data store on port 9200 and the UI on port 5601. These ports should be blocked using a firewall so that they cannot be accessed remotely.

Next, choose another TCP port - all the examples on this page will use 6601 - and open this port on the firewall. Then configure Nginx to reverse proxy data received on this port through to the UI running on port 5601 as follows.

Add usernames and passwords for each user to the file htpasswd.prelertui for use by Nginx. If you have two users, alice and bob, with passwords secret and mostsecret respectively, then the following commands will accomplish this on Linux:

echo alice:`openssl passwd -crypt secret` | sudo tee -a /etc/nginx/htpasswd.prelertui
echo bob:`openssl passwd -crypt mostsecret` | sudo tee -a /etc/nginx/htpasswd.prelertui
sudo chgrp nginx /etc/nginx/htpasswd.prelertui
sudo chmod 640 /etc/nginx/htpasswd.prelertui

Edit the nginx.conf file. On Linux:

sudo vi /etc/nginx/nginx.conf

Find the server section within the http section within the example configuration that came with Nginx, and replace it with the following:

server {
    listen 6601 ssl;

    ssl_certificate     /etc/nginx/certs/server.crt;
    ssl_certificate_key /etc/nginx/certs/server.key;

    auth_basic           "Authentication required for Prelert UI";
    auth_basic_user_file /etc/nginx/htpasswd.prelertui;

    error_page 497 https://$host:$server_port$request_uri;

    location / {
        proxy_pass http://localhost:5601;
        proxy_redirect http://localhost:5601 https://$host:$server_port;
        sub_filter_types *;
        sub_filter_once off;
        sub_filter http://localhost:5601 https://$host:$server_port;
    }
}

In other words:

  • We’re listening with SSL on port 6601 (i.e. HTTPS)
  • We’re using the certificates generated earlier
  • If somebody tries to talk HTTP on port 6601, redirect them to HTTPS instead
  • Pass all incoming traffic through to http://localhost:5601
  • Map any URLs in response headers from http://localhost:5601 to the protocol, host and port of the original request
  • Map any URLs in response bodies from http://localhost:5601 to the protocol, host and port of the original request

Configuring Nginx for HTTPS and authentication for both the REST API and the UI

It is possible to allow secure access to both the REST API endpoints and the UI.

Assuming the default ports were chosen during installation, the Prelert Engine REST API will be on TCP port 8080, the Elasticsearch data store on port 9200 and the UI on port 5601. These ports should be blocked using a firewall so that they cannot be accessed remotely.

The configuration for this is effectively the combination of the steps for the two sections above. The nginx.conf file can have multiple server blocks, so you’d replace the default server section with the following:

server {
    listen 7080 ssl;

    ssl_certificate     /etc/nginx/certs/server.crt;
    ssl_certificate_key /etc/nginx/certs/server.key;

    auth_basic           "Authentication required for Prelert Engine API";
    auth_basic_user_file /etc/nginx/htpasswd.prelertengine;

    error_page 497 https://$host:$server_port$request_uri;

    location / {
        proxy_pass http://localhost:8080;
        proxy_redirect http://localhost:8080 https://$host:$server_port;
        sub_filter_types *;
        sub_filter_once off;
        sub_filter http://localhost:8080 https://$host:$server_port;
    }
}

server {
    listen 6601 ssl;

    ssl_certificate     /etc/nginx/certs/server.crt;
    ssl_certificate_key /etc/nginx/certs/server.key;

    auth_basic           "Authentication required for Prelert UI";
    auth_basic_user_file /etc/nginx/htpasswd.prelertui;

    error_page 497 https://$host:$server_port$request_uri;

    location / {
        proxy_pass http://localhost:5601;
        proxy_redirect http://localhost:5601 https://$host:$server_port;
        sub_filter_types *;
        sub_filter_once off;
        sub_filter http://localhost:5601 https://$host:$server_port;
    }
}

Alternatively, if you want exactly the same usernames and passwords to be used for both the REST API endpoints and the UI, then create a single htpasswd file, say /etc/nginx/htpasswd.prelert, and configure both server blocks to use it:

server {
    listen 7080 ssl;

    ssl_certificate     /etc/nginx/certs/server.crt;
    ssl_certificate_key /etc/nginx/certs/server.key;

    auth_basic           "Authentication required for Prelert Engine API";
    auth_basic_user_file /etc/nginx/htpasswd.prelert;

    error_page 497 https://$host:$server_port$request_uri;

    location / {
        proxy_pass http://localhost:8080;
        proxy_redirect http://localhost:8080 https://$host:$server_port;
        sub_filter_types *;
        sub_filter_once off;
        sub_filter http://localhost:8080 https://$host:$server_port;
    }
}

server {
    listen 6601 ssl;

    ssl_certificate     /etc/nginx/certs/server.crt;
    ssl_certificate_key /etc/nginx/certs/server.key;

    auth_basic           "Authentication required for Prelert UI";
    auth_basic_user_file /etc/nginx/htpasswd.prelert;

    error_page 497 https://$host:$server_port$request_uri;

    location / {
        proxy_pass http://localhost:5601;
        proxy_redirect http://localhost:5601 https://$host:$server_port;
        sub_filter_types *;
        sub_filter_once off;
        sub_filter http://localhost:5601 https://$host:$server_port;
    }
}

Note about securing Elasticsearch

In all the examples on this page the TCP port used by the Elasticsearch instance that stores Prelert results (9200 by default) is closed. In an untrusted environment it is very important to ensure that this TCP port is not open to the world and is accessible only via localhost. See known issues.
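As a belt-and-braces measure alongside the firewall, Elasticsearch itself can be bound to the loopback interface via the network.host setting in its configuration file. The path below is a placeholder; where elasticsearch.yml lives depends on how the bundled instance was installed:

```shell
# Restrict Elasticsearch to localhost (adjust the path to your install),
# then restart Elasticsearch for the setting to take effect
echo 'network.host: 127.0.0.1' | sudo tee -a /path/to/elasticsearch.yml
```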

However, Behavioral Analytics / Engine API 2.1 can query input data from a separate secured Elasticsearch cluster by specifying a username and password in the job configuration.