exclude

Filter out any input record where the value of the input field meets the given condition. This transform requires a condition using one of the pre-defined operators. If the operator requires a numerical operand (eq, lt, lte, gt or gte), the value field must be a number in a quoted string. If the match operator is used, the value should be a regular expression. In all cases, if the input field meets the condition, the Prelert Engine will ignore the entire record.

Multiple exclude transforms are combined as an inclusive logical disjunction: if any one of the transforms’ conditions is true, the record is excluded. In other words, multiple excludes form a logical OR rather than an AND.
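The combination rule can be sketched in Python as follows. This only illustrates the logic described above and is not the Engine's implementation: the function names condition_holds and is_excluded, the dictionary layout of a record and the "host" field are assumptions made for the example, and Python's re module stands in for the Java regular expression engine.

```python
import operator
import re

# Numeric operators named in the text, mapped to Python comparisons.
NUMERIC_OPERATORS = {
    "eq": operator.eq,
    "lt": operator.lt,
    "lte": operator.le,
    "gt": operator.gt,
    "gte": operator.ge,
}


def condition_holds(field_value, condition):
    """Return True if field_value meets the condition (hypothetical helper)."""
    op = condition["operator"]
    if op == "match":
        # Assumption: a search is used here; the Engine's exact matching
        # mode is not specified in this section.
        return re.search(condition["value"], field_value) is not None
    # Numeric operands arrive as quoted strings, so convert both sides.
    return NUMERIC_OPERATORS[op](float(field_value), float(condition["value"]))


def is_excluded(record, transforms):
    """A record is dropped if ANY exclude condition is true (logical OR)."""
    return any(
        condition_holds(record[t["inputs"][0]], t["condition"])
        for t in transforms
        if t["transform"] == "exclude"
    )


transforms = [
    {"transform": "exclude", "inputs": ["CPU_Usage"],
     "condition": {"operator": "lte", "value": "1.0"}},
    # "host" is a made-up field used only to show a second condition.
    {"transform": "exclude", "inputs": ["host"],
     "condition": {"operator": "match", "value": "^test-"}},
]

# Only the first condition is true, yet the record is still excluded.
print(is_excluded({"CPU_Usage": "0.5", "host": "prod-1"}, transforms))
# Neither condition is true, so the record is kept.
print(is_excluded({"CPU_Usage": "42.0", "host": "prod-1"}, transforms))
```

The use of any() is what makes the combination an OR: one true condition is enough to drop the record.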

  • single input
  • no outputs
  • conditionally executed

The Engine API uses the Java regular expression implementation. For more information about features and compatibility, see http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html.

Example 1

Exclude all records where ‘CPU_Usage’ is less than or equal to 1.0.

"transforms":[
    {
        "transform":"exclude",
        "inputs":["CPU_Usage"],
        "condition":{
            "operator":"lte", "value":"1.0"
        }
    }
]

Example 2

Using exclude to whitelist known safe values or records of low interest.

I have a DNS log file and wish to analyze the records, using the domain_split transform to extract the Highest Registered Domain from each DNS request. I will then look for unusual interactions between client IPs and destination IPs.

The log file contains lines such as these:

29-Jun-2015 14:49:54.796 queries: info: client 192.168.31.45#63457: query: plus.google.com IN A + (192.168.32.130)
29-Jun-2015 14:50:33.299 queries: info: client 192.168.31.63#61011: query: syndication.twitter.com IN A + (192.168.32.130)

Browsing the DNS log file, I notice that a large proportion of entries are for domains ending in google.com or prelert.com. I’m not interested in either of these domains and wish to remove them from the analysis. Creating two exclude transforms with regular expressions matching strings ending in google.com and prelert.com serves this purpose.

The expressions to match the string literals are quite simple, but the DNS log file sometimes contains extra white-space characters after the domain, so zero or more white-space characters are allowed before the end of input to account for this. The required regular expressions are:

.*google\.com\s*$
.*prelert\.com\s*$
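As a quick check, the two expressions can be tried against sample query values. The sketch below uses Python's re module as a stand-in for the Java engine (the constructs used in these two patterns are written the same way in both). The helper name matches_any and the sample values, including the one with trailing spaces, are made up for illustration.

```python
import re

patterns = [r".*google\.com\s*$", r".*prelert\.com\s*$"]


def matches_any(value):
    """Return True if the value matches either expression (hypothetical helper)."""
    return any(re.match(p, value) for p in patterns)


# Hypothetical dns-query values; the second has trailing white-space.
samples = [
    "plus.google.com",
    "www.prelert.com   ",
    "syndication.twitter.com",
    "google.com.example.org",
]

for value in samples:
    print(repr(value), "excluded" if matches_any(value) else "kept")
```

The last sample shows why the end-of-input anchor matters: the string contains google.com but does not end with it, so it is kept.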

The transform definitions are as follows. Note that the JSON standard requires the backslash character to be escaped with another backslash; this must be done to make the expressions valid.

"transforms":[
    {
        "transform":"exclude",
        "inputs":["dns-query"],
        "condition":{
            "operator":"match", "value":".*google\\.com\\s*$"
        }
    },
    {
        "transform":"exclude",
        "inputs":["dns-query"],
        "condition":{
            "operator":"match", "value":".*prelert\\.com\\s*$"
        }
    }
]
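The effect of the JSON escaping can be verified with any JSON parser. The sketch below uses Python's json module purely as an illustration; it is not part of the Engine API. It shows that the doubled backslashes decode to the single-backslash expression, and that the unescaped form is rejected as invalid JSON.

```python
import json
import re

# The condition exactly as it appears in the JSON document (a raw string,
# so Python itself does not interpret the backslashes).
json_text = r'{"operator":"match", "value":".*google\\.com\\s*$"}'

condition = json.loads(json_text)

# After JSON decoding, each doubled backslash becomes a single one.
print(condition["value"])

# The decoded value is a usable regular expression.
print(bool(re.match(condition["value"], "plus.google.com ")))

# A single backslash before "." or "s" is not a valid JSON escape sequence.
try:
    json.loads(r'{"value":".*google\.com\s*$"}')
    unescaped_is_valid = True
except json.JSONDecodeError as err:
    unescaped_is_valid = False
    print("invalid JSON:", err)
```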

We could define further exclude transforms to remove other common domains. This is often simpler than constructing a single overly elaborate regular expression: if any of the conditions match, the record will be discarded.
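When the list of domains grows, the transform definitions can be generated rather than written by hand. The following is a minimal sketch, assuming the same dns-query input field as in Example 2; the helper name exclude_domain and the extra example.org entry are hypothetical. re.escape escapes the literal dots, and json.dumps adds the JSON-level backslash escaping.

```python
import json
import re


def exclude_domain(domain, field="dns-query"):
    """Build one exclude transform for values ending in the given domain."""
    pattern = ".*" + re.escape(domain) + r"\s*$"
    return {
        "transform": "exclude",
        "inputs": [field],
        "condition": {"operator": "match", "value": pattern},
    }


# example.org is a made-up third domain, added only for illustration.
domains = ["google.com", "prelert.com", "example.org"]
transforms = [exclude_domain(d) for d in domains]

# Prints the array that would be used as the value of "transforms";
# json.dumps doubles each backslash, as JSON requires.
print(json.dumps(transforms, indent=4))
```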