Best practices for selecting a bucketspan

What is the bucketspan?

The bucketspan is the time window used for time series analysis. It is typically between 5 minutes and 1 hour, although the best value depends on the data.

During analysis, we gather summary statistics for each bucket interval and use them to model the data. We then detect anomalies in each bucket and provide an individual record result for each anomaly found. We also aggregate the results (for all records from each detector) to provide a summary for the bucket and for each influencer.

For example, when analyzing unusual counts in DNS logs using a 5 min bucket:

If an anomaly is raised at 12:05, this time is the start of the bucket in which the anomaly occurred, i.e. an unusual count of records occurred between 12:05:00 and 12:09:59.

Buckets are always aligned to clock time and to zero seconds at the lower boundary. One bucketspan is defined per Anomaly Search.
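
The alignment rule can be illustrated with a short Python sketch. This is not Prelert code; the function name bucket_bounds is hypothetical, and the sketch assumes that buckets are aligned by rounding the epoch timestamp down to a multiple of the bucketspan, with times in UTC.

```python
from datetime import datetime, timedelta, timezone


def bucket_bounds(ts: datetime, bucketspan_s: int) -> tuple[datetime, datetime]:
    """Return (start, last second) of the bucket that contains ts.

    Assumption: buckets are aligned by rounding the epoch time down to a
    multiple of the bucketspan (hypothetical helper, not a Prelert API).
    """
    epoch_s = int(ts.timestamp())
    start_s = epoch_s - (epoch_s % bucketspan_s)
    start = datetime.fromtimestamp(start_s, tz=timezone.utc)
    end = start + timedelta(seconds=bucketspan_s - 1)
    return start, end


# An event at 12:07:30 falls into the 5 min bucket that starts at 12:05:00.
event = datetime(2015, 1, 1, 12, 7, 30, tzinfo=timezone.utc)
start, end = bucket_bounds(event, 300)
print(start.time(), end.time())  # 12:05:00 12:09:59
```

An anomaly for this event would therefore be reported with the bucket start time of 12:05, matching the DNS example above.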

Picking a suitable bucketspan

Ideally, the bucketspan should match the typical duration of an anomaly; however, additional factors should be taken into account:

  • the granularity at which you want to analyze the data
  • the frequency at which alerting is required
  • the frequency of the input data
  • the analytical functions used

It is possible to quickly evaluate different bucketspans using Tools / Evaluation Mode.

In this example, we analyze Splunk logs, looking for unusual counts of error and warning messages. The following three commands perform the same analysis at 5 min, 10 min and 1 hr intervals. Run them in three different tabs to compare the results, remembering to select a suitable time range, e.g. Last 7 days.

index=_internal (log_level=WARN OR log_level=ERROR) | prelertautodetect bucketspan=300 count by component
index=_internal (log_level=WARN OR log_level=ERROR) | prelertautodetect bucketspan=600 count by component
index=_internal (log_level=WARN OR log_level=ERROR) | prelertautodetect bucketspan=3600 count by component

Impact on analysis

Anomalies have different significance when analyzed using different bucketspans. The functions most affected by length of the bucketspan are:

  • mean - A spike in a value that lasts only a few seconds may not be visible when analyzing the mean over 1 hr; however, it will be evident using max. Consider using both mean and max detectors in a single Anomaly Search.
  • rare, freq_rare - Shorter bucketspans (less than 1 hour, say) are recommended when performing a rare analysis. The rare function models whether something happens at least once in a bucket. With longer bucketspans, it is more likely that any given entity will be seen in a bucket, and therefore entities appear less rare.
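
Both effects can be reproduced with a small Python sketch on made-up data. The helper names and the sample values below are hypothetical, and the code only mimics per-bucket summarization; it is not how Prelert models the data internally.

```python
from statistics import mean

# Hypothetical metric sampled once per second for one hour:
# steady at 10, with a 5 second spike to 500.
samples = [10.0] * 3600
samples[1800:1805] = [500.0] * 5


def bucket_stats(values, bucketspan_s):
    """Split per-second values into buckets; return (mean, max) per bucket."""
    buckets = [values[i:i + bucketspan_s]
               for i in range(0, len(values), bucketspan_s)]
    return [(mean(b), max(b)) for b in buckets]


# Over a 1 hr bucket the spike barely moves the mean, but max shows it clearly.
hour_mean, hour_max = bucket_stats(samples, 3600)[0]
print(round(hour_mean, 2), hour_max)  # 10.68 500.0


def fraction_of_buckets_seen(event_times_s, total_s, bucketspan_s):
    """Fraction of buckets in which an entity appears at least once."""
    seen = {t // bucketspan_s for t in event_times_s}
    return len(seen) / (total_s // bucketspan_s)


# Hypothetical entity that appears once every 50 minutes over 24 hours.
times = list(range(0, 86400, 3000))
frac_5min = fraction_of_buckets_seen(times, 86400, 300)
frac_1hr = fraction_of_buckets_seen(times, 86400, 3600)
print(round(frac_5min, 2), frac_1hr)  # 0.1 1.0
```

In this made-up case the entity is present in about 10% of the 5 min buckets but in every 1 hr bucket, so at the longer bucketspan it no longer looks rare at all.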

If your data has a low event rate, a shorter bucketspan means that fewer data points are analyzed in each bucket. This tends to increase the variance in the data model, which can increase learning time and reduce how accurately the model fits the data. If this is the case, consider increasing the bucketspan.

The longer the bucketspan, the more data points can be analyzed and summarized per interval.
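
As a rough illustration of this trade-off, the Python sketch below estimates how many events land in each bucket and how noisy that count is. It assumes, purely for illustration, that event counts per bucket are approximately Poisson distributed, in which case the relative noise (standard deviation divided by mean) is 1/sqrt(expected count). The function name and the rate of 12 events per hour are hypothetical, and this says nothing about the model Prelert actually uses.

```python
import math


def expected_count_and_noise(events_per_hour: float, bucketspan_s: int):
    """Return (expected events per bucket, relative noise of that count).

    Assumption: counts per bucket are roughly Poisson, so the relative
    noise is 1 / sqrt(expected count). Illustrative only.
    """
    expected = events_per_hour * bucketspan_s / 3600
    if expected <= 0:
        raise ValueError("event rate and bucketspan must be positive")
    return expected, 1 / math.sqrt(expected)


# Hypothetical low-rate data source: 12 events per hour.
for span in (300, 600, 3600):
    expected, noise = expected_count_and_noise(12, span)
    print(f"bucketspan={span}s expected={expected:.0f} noise={noise:.2f}")
```

Under these assumptions, a 5 min bucket sees about one event with noise as large as the count itself, while a 1 hr bucket sees about twelve events with much smaller relative noise.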

Impact on timeliness of results

Results will be available within one bucketspan.

The longer the bucketspan, the less frequently the Anomaly Search runs, and therefore the longer the delay between “now” and when possible anomalies are detected.
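
The delay can be estimated with simple arithmetic, sketched here in Python. The sketch assumes that a bucket is analyzed once it has closed and ignores processing time; the function name is hypothetical.

```python
def seconds_until_bucket_closes(event_s: int, bucketspan_s: int) -> int:
    """Seconds from an event until the end of the bucket that contains it.

    Assumption: buckets are aligned to multiples of the bucketspan and a
    bucket is analyzed only after it closes (processing time is ignored).
    """
    return bucketspan_s - (event_s % bucketspan_s)


# Event at 12:05:10, expressed as seconds since midnight.
event_s = 12 * 3600 + 5 * 60 + 10
for span in (300, 600, 3600):
    print(span, seconds_until_bucket_closes(event_s, span))
```

For this event, the 5 min and 10 min buckets both close 290 seconds later, whereas the 1 hr bucket does not close for another 3290 seconds.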

Impact on processing

When a continuous Anomaly Search is running, the bucketspan determines how often the analysis is invoked. If your bucketspan is 5 mins, Prelert will analyze the last 5 mins' worth of data every 5 mins. This can take only seconds to process; however, on very large systems it could take longer.

The shorter the bucketspan, the more work the server must perform.
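
To put numbers on this, the following Python sketch counts how many times a continuous analysis would be invoked per day for the three bucketspans used earlier. It assumes one invocation per bucket, as described above; the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def invocations_per_day(bucketspan_s: int) -> int:
    """Number of analysis runs per day, assuming one run per bucket."""
    return SECONDS_PER_DAY // bucketspan_s


for span in (300, 600, 3600):
    print(span, invocations_per_day(span))
# 300 288
# 600 144
# 3600 24
```

A 5 min bucketspan therefore triggers twelve times as many runs per day as a 1 hr bucketspan, although each run covers less data.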