Rare Functions

Detect values that occur rarely in time or rarely for a population.

The rare function detects anomalies according to the number of distinct rare values. This differs from freq_rare, which detects anomalies according to the number of times (frequency) that rare values occur.

  • rare
  • freq_rare

Note

  • The rare and freq_rare functions should not be used in conjunction with excludeFrequent.
  • Shorter bucket spans (for example, less than 1 hour) are recommended when looking for rare events. The functions model whether something happens in a bucket at least once. With longer bucket spans, it is more likely that entities will be seen in a bucket, and therefore they appear less rare. The ideal bucket span depends on the characteristics of the data; shorter bucket spans are typically measured in minutes, not hours.
  • Modeling of rare data requires a learning period of at least 20 buckets (for typical data).
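
The effect of bucket span on rarity can be illustrated with a small sketch. It is a simplified illustration, not the product's model: the function name bucket_presence and the sample events are hypothetical, and timestamps are assumed to be in seconds. It only computes the fraction of buckets in which each value appears at least once.

```python
from collections import defaultdict


def bucket_presence(events, bucket_span):
    """Return, per value, the fraction of buckets in which it appears at least once.

    events: list of (timestamp_seconds, value) pairs (hypothetical input shape).
    bucket_span: bucket length in seconds.
    """
    buckets_seen = defaultdict(set)  # value -> indices of buckets containing it
    for ts, value in events:
        buckets_seen[value].add(ts // bucket_span)

    all_indices = [i for seen in buckets_seen.values() for i in seen]
    total = max(all_indices) - min(all_indices) + 1  # buckets spanned by the data
    return {value: len(seen) / total for value, seen in buckets_seen.items()}


# Four hours of data: status 200 every minute, status 503 once per hour.
events = [(m * 60, "200") for m in range(240)]
events += [(h * 3600 + 1800, "503") for h in range(4)]

five_minute = bucket_presence(events, 300)
one_hour = bucket_presence(events, 3600)
print(five_minute["503"], one_hour["503"])
```

With 5-minute buckets, status 503 is present in only 4 of 48 buckets and looks rare. With 1-hour buckets it is present in every bucket, so it no longer looks rare at all.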

rare

Detect values that occur rarely in time or rarely for a population. Detect anomalies according to the number of distinct rare values.

  • fieldName: N/A
  • byFieldName: required
  • overFieldName: optional

Example 1 - rare in time

{ "function" : "rare", "byFieldName" : "status" }
  • Models status codes that occur over time
  • Detects when rare status codes occur compared to the past

For example, this detects status codes in a web access log that have never (or rarely) occurred before.
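
The idea of "rare in time" can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual analysis: the function name rare_in_time, the input shape (one set of values per bucket), and the fixed max_fraction threshold are all hypothetical. It flags values in the current bucket that were present in few or none of the past buckets.

```python
def rare_in_time(history, current, max_fraction=0.05):
    """Flag values in the current bucket that were rarely seen in past buckets.

    history: list of sets, one per past bucket, holding the values seen in it.
    current: set of values seen in the newest bucket.
    max_fraction: assumed cutoff; a value is flagged if it was present in at
                  most this fraction of past buckets.
    """
    n = len(history)
    flagged = {}
    for value in current:
        seen = sum(1 for bucket in history if value in bucket)
        fraction = seen / n if n else 0.0
        if fraction <= max_fraction:
            flagged[value] = fraction
    return flagged


# 40 past buckets: status 200 in all of them, status 404 in just one.
history = [{"200"}] * 39 + [{"200", "404"}]
current = {"200", "404", "503"}

flagged = rare_in_time(history, current)
print(flagged)
```

Here 404 (seen in 1 of 40 past buckets) and 503 (never seen) are flagged, while 200 is not. Only presence in a bucket matters, not how many times a value occurred within it.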

Example 2 - rare in a population

{ "function" : "rare", "byFieldName" : "status", "overFieldName" : "clientip" }
  • Models status code and clientip interactions that occur
  • Defines a rare status code as one that occurs for few clientip values compared to the population [1]
  • Detects clientip values that experience one or more distinct rare status codes compared to the population [2]

For example, in a web access log, the clientip that experiences the largest number of different rare status codes compared to the population is regarded as highly anomalous. This is based on the number of different status code values, not the count of occurrences.

[1] To define a status code as rare, we look at the number of distinct status codes that occur, and not the number of times the status code occurs. If a single clientip experiences a single unique status code, this will be rare, even if it occurs for that clientip in every bucket.
[2] Here, with rare, we look at the number of distinct status codes.
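
The population case can be sketched in a few lines. This is a simplified illustration, not the product's statistical model: the function name distinct_rare_per_client, the (clientip, status) input shape, and the max_client_fraction threshold are assumptions made for the example. A status code counts as rare when few clientip values experience it, and each clientip is scored by how many distinct rare status codes it has seen.

```python
from collections import defaultdict


def distinct_rare_per_client(records, max_client_fraction=0.1):
    """Score each clientip by the number of distinct rare status codes it saw.

    records: list of (clientip, status) pairs (hypothetical input shape).
    max_client_fraction: assumed cutoff; a status is rare if at most this
                         fraction of clients experienced it.
    """
    clients_by_status = defaultdict(set)
    statuses_by_client = defaultdict(set)
    for clientip, status in records:
        clients_by_status[status].add(clientip)
        statuses_by_client[clientip].add(status)

    n_clients = len(statuses_by_client)
    rare = {
        status
        for status, clients in clients_by_status.items()
        if len(clients) / n_clients <= max_client_fraction
    }
    return {ip: len(seen & rare) for ip, seen in statuses_by_client.items()}


# 20 clients all see status 200.
records = [(f"10.0.0.{i}", "200") for i in range(1, 21)]
# One client sees a single rare status 50 times.
records += [("10.0.0.1", "500")] * 50
# Another client sees three different rare statuses once each.
records += [("10.0.0.2", s) for s in ("502", "503", "504")]

scores = distinct_rare_per_client(records)
print(scores["10.0.0.1"], scores["10.0.0.2"])
```

The client with 50 occurrences of one rare status scores 1, while the client with three different rare statuses scores 3, because only the number of distinct values counts.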

freq_rare

Detect values that occur rarely for a population. Detect anomalies according to the number of times (frequency) that rare values occur.

  • fieldName: N/A
  • byFieldName: required
  • overFieldName: required

Example 1 - frequently rare in a population

{ "function" : "freq_rare", "byFieldName" : "uri", "overFieldName" : "clientip" }
  • Models uri paths and clientip interactions that occur
  • Defines a rare uri path as one that is visited by few clientip values compared to the population [3]
  • Detects clientip values that experience many interactions with rare uri paths compared to the population [4]

For example, in a web access log, a clientip that visits one or more rare uri paths many times compared to the population is regarded as highly anomalous. This is based on the count of interactions with rare uri paths, not the number of different uri path values.

[3] To define a uri path as rare, we look at the number of distinct values that occur, and not the number of times the uri path occurs. If a single clientip visits a single unique uri path, this will be rare, even if it occurs for that clientip in every bucket.
[4] Here, with freq_rare, we look at the number of times interactions have happened.
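
A comparable sketch for freq_rare is shown here. It is again a simplified illustration rather than the actual model: the function name rare_interaction_counts, the (clientip, uri) input shape, and the max_client_fraction threshold are assumptions. A uri path counts as rare when few clientip values visit it, but each clientip is scored by how many times it interacted with rare uri paths.

```python
from collections import Counter, defaultdict


def rare_interaction_counts(records, max_client_fraction=0.1):
    """Score each clientip by its number of interactions with rare uri paths.

    records: list of (clientip, uri) pairs (hypothetical input shape).
    max_client_fraction: assumed cutoff; a uri is rare if at most this
                         fraction of clients visited it.
    """
    clients_by_uri = defaultdict(set)
    clients = set()
    for clientip, uri in records:
        clients_by_uri[uri].add(clientip)
        clients.add(clientip)

    rare = {
        uri
        for uri, visitors in clients_by_uri.items()
        if len(visitors) / len(clients) <= max_client_fraction
    }

    counts = Counter()
    for clientip, uri in records:
        if uri in rare:
            counts[clientip] += 1  # every interaction counts, not just the first
    return counts


# 20 clients all visit /index.
records = [(f"10.0.0.{i}", "/index") for i in range(1, 21)]
# One client visits a single rare path 50 times.
records += [("10.0.0.1", "/admin")] * 50
# Another client visits three different rare paths once each.
records += [("10.0.0.2", u) for u in ("/a", "/b", "/c")]

counts = rare_interaction_counts(records)
print(counts["10.0.0.1"], counts["10.0.0.2"])
```

The client that visits one rare path 50 times scores 50, while the client that visits three different rare paths once each scores 3. Under rare, which counts distinct values, the same two clients would score 1 and 3 respectively.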