Geographic Functions

Detect anomalies in the geographic location of the input data.

The fieldName supplied must be a string of the form latitude,longitude, where latitude is in the range -90 to 90 and longitude is in the range -180 to 180, representing a point on the surface of the Earth.
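As an illustration, a small client-side helper (hypothetical — not part of the Engine API) can check that a value is a well-formed latitude,longitude string before the data is sent for analysis:

```python
def is_valid_lat_long(value):
    """Return True if value is a "latitude,longitude" string describing a
    point on the surface of the Earth. Hypothetical client-side helper,
    not part of the Engine API."""
    parts = value.split(",")
    if len(parts) != 2:
        return False
    try:
        lat, lon = float(parts[0]), float(parts[1])
    except ValueError:
        return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0
```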

  • lat_long

lat_long

  • fieldName: required
  • byFieldName: optional
  • overFieldName: optional

Example

{ "function" : "lat_long", "fieldName" : "transactionCoordinates", "byFieldName" : "creditCardNumber" }
  • Detect anomalies where the geographic location of a credit card transaction is unusual for a particular customer’s credit card.

An anomaly could indicate fraud (subject to filtering the input data to include only “card present” transactions).

Formatting the input fieldName

It is important to understand that the fieldName used with geographic functions is a single string containing two comma-separated numbers. How you create such a field depends on where your source data is and the format in which it is supplied to the Engine API.

CSV data

If you are sending CSV data, you must quote the field containing the coordinates so that its internal comma is not treated as a field separator, for example:

time,transactionCoordinates,creditCardNumber
1460464275,"40.7,-74.0",1234123412341234

The transactionCoordinates field has a value of 40.7,-74.0.
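Using a CSV library rather than hand-rolled string concatenation handles the quoting automatically. A Python sketch (the field names match the example above):

```python
import csv
import io

# Build a CSV record whose coordinate field contains a comma;
# csv.writer quotes that field automatically (QUOTE_MINIMAL is the default).
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["time", "transactionCoordinates", "creditCardNumber"])
writer.writerow([1460464275, "40.7,-74.0", "1234123412341234"])
print(buf.getvalue())
```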

JSON data

For JSON data, supply the coordinates as a single string field:

{
    "time": 1460464275,
    "transactionCoordinates": "40.7,-74.0",
    "creditCardNumber": "1234123412341234"
}
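If you are constructing the JSON from separate latitude and longitude values, join them into one string before serializing. A Python sketch (field names are the ones used above):

```python
import json

lat, lon = 40.7, -74.0
record = {
    "time": 1460464275,
    # Single string field, in "latitude,longitude" order
    "transactionCoordinates": "{},{}".format(lat, lon),
    "creditCardNumber": "1234123412341234",
}
print(json.dumps(record))
```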

Elasticsearch Data

In Elasticsearch, location data is likely to be stored in fields of type geo_point. There are four possible structures for this data:

  1. As an object, for example:

    {
      "time": 1460464275,
      "creditCardNumber": "1234123412341234",
      "transactionCoordinates": {
        "lat": 40.7,
        "lon": -74.0
      }
    }
    
  2. As a string, for example:

    {
      "time": 1460464275,
      "creditCardNumber": "1234123412341234",
      "transactionCoordinates": "40.7,-74.0"
    }
    
  3. As a Geohash, for example:

    {
      "time": 1460464275,
      "creditCardNumber": "1234123412341234",
      "transactionCoordinates": "dr5rs7zyn"
    }
    
  4. As an array in GeoJSON format, for example:

    {
      "time": 1460464275,
      "creditCardNumber": "1234123412341234",
      "transactionCoordinates": [ -74.0, 40.7 ]
    }
    

Depending on which of these you are using, convert to the required format using transforms as follows:

  1. Convert objects using the concat transform:

    "transforms": [
      {
        "transform": "concat",
        "arguments": ",",
        "inputs": [ "transactionCoordinates.lat", "transactionCoordinates.lon" ],
        "outputs": "latLong"
      }
    ]
    
  2. Strings do not require any conversion.

  3. Convert Geohash values using the geo_unhash transform:

    "transforms": [
      {
        "transform": "geo_unhash",
        "inputs": [ "transactionCoordinates" ],
        "outputs": "latLong"
      }
    ]
    
  4. Convert GeoJSON arrays using a combination of the split and concat transforms (because the coordinates need to be reversed):

    "transforms": [
      {
        "transform": "split",
        "arguments": ",",
        "inputs": [ "transactionCoordinates" ],
        "outputs": [ "long", "lat" ]
      },
      {
        "transform": "concat",
        "arguments": ",",
        "inputs": [ "lat", "long" ],
        "outputs": "latLong"
      }
    ]
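Alternatively, if you prefer to pre-process records before sending them rather than using transforms, the same conversions are straightforward client-side. A Python sketch covering the object, Geohash and GeoJSON cases (geohash_decode is a standard Geohash decoder written here for illustration; it is not part of the Engine API):

```python
# Standard Geohash base-32 alphabet
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_decode(geohash):
    """Decode a Geohash into (latitude, longitude) at the centre of its cell."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # Geohash bits alternate, starting with longitude
    for ch in geohash:
        bits = _BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (bits >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2

# 1. Object: concatenate the lat and lon sub-fields with a comma
obj = {"lat": 40.7, "lon": -74.0}
lat_long_from_object = "{},{}".format(obj["lat"], obj["lon"])

# 3. Geohash: decode, then format as "latitude,longitude"
lat, lon = geohash_decode("dr5rs7zyn")
lat_long_from_geohash = "{:.1f},{:.1f}".format(lat, lon)

# 4. GeoJSON array: the [longitude, latitude] order must be reversed
geojson = [-74.0, 40.7]
lat_long_from_geojson = "{},{}".format(geojson[1], geojson[0])
```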
    

Of course, it is not compulsory to store coordinates in fields of type geo_point in Elasticsearch. If your coordinates are in other fields, you will need to manipulate them into the required format for Prelert geographic functions; the split, concat and geo_unhash transforms provide the tools necessary to achieve this.