Model Snapshots Endpoint

Overview

The Engine API reacts quickly to anomalous input, learning new behaviors in data. Highly anomalous input will increase the variance in the models whilst the system learns whether this is a new step change in behavior or a one-off event. If the anomalous input is known to be a one-off, it may be appropriate to reset the model state to a time before the event, for example after Black Friday or a critical system failure.

In order to revert to a saved snapshot, this sequence must be followed:

  • close job
  • revert to a snapshot (only valid whilst job is closed)
  • send new data to the job (implying reopen)
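The ordering constraint in the sequence above can be illustrated with a small in-memory sketch. JobSketch and revert_and_resume are hypothetical names invented for this illustration; they are not part of the Engine API and make no HTTP calls. Only the CLOSED and RUNNING statuses come from this document.

```python
from enum import Enum


class JobStatus(Enum):
    CLOSED = "CLOSED"
    RUNNING = "RUNNING"


class JobSketch:
    """Toy in-memory stand-in for a job (hypothetical, not an API client)."""

    def __init__(self):
        self.status = JobStatus.RUNNING
        self.reverted_to = None

    def close(self):
        self.status = JobStatus.CLOSED

    def revert(self, snapshot_id):
        # Reverting is only valid whilst the job is closed.
        if self.status is not JobStatus.CLOSED:
            raise RuntimeError("close the job before reverting to a snapshot")
        self.reverted_to = snapshot_id

    def send_data(self, records):
        # Sending data to a closed job implicitly reopens it.
        self.status = JobStatus.RUNNING
        return len(records)


def revert_and_resume(job, snapshot_id, records):
    """Apply the three steps in order: close, revert, send new data."""
    job.close()
    job.revert(snapshot_id)
    return job.send_data(records)
```

Calling revert on a job that is still RUNNING raises an error in this sketch, which mirrors the rule that the revert step is only valid whilst the job is closed.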

When reverting to a snapshot, you must choose whether to keep the results that Prelert has created between the time of the snapshot and the current time. In the case of Black Friday, for instance, you might want to keep the results and carry on processing data from the current time, but without the models having learned the one-off behavior and compensated for it. However, if after a critical system failure you decide to reset the models to a previous known good state and process data from that time, it makes sense to delete the intervening results for the known bad period and resend data to Prelert from that earlier time. Note that if you choose not to delete intervening results when reverting to a snapshot, the engine will not accept input data older than the current time: if you want to resend data, delete the intervening results.
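The effect of this choice on which input data can be sent afterwards can be sketched as a simple predicate. This is an interpretation, not the engine's actual logic: it assumes "the current time" means the timestamp of the latest retained result, that boundary timestamps are accepted, and that all times are epoch seconds. The function and parameter names are hypothetical.

```python
def accepts_record(record_time, snapshot_time, latest_result_time,
                   delete_intervening_results):
    """Return True if a record with this timestamp could be sent after a revert.

    All arguments except the flag are epoch seconds (an assumption).
    """
    if delete_intervening_results:
        # Intervening results were deleted, so data can be resent
        # from the snapshot time onwards.
        return record_time >= snapshot_time
    # Intervening results were kept, so only data at or after the
    # latest retained result is accepted.
    return record_time >= latest_result_time
```

For a snapshot taken at 1000 with results retained up to 5000, a record stamped 3000 would be accepted only if the intervening results were deleted.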

Any gaps in data since the snapshot time will be treated as nulls and not modeled. If there is a partial bucket at the end of the snapshot and/or at the beginning of the new input data, then this will be ignored and treated as a gap.

For jobs with many entities, the model state may be very large. If a model state is several GB, reverting could take 10-20 minutes depending upon machine spec and resources. If this is the case, please ensure this time is planned for. Model size (in bytes) is available as part of the Job Resource Model Size Stats.

Important

Ensure that the job is in a CLOSED state before attempting to revert to a saved snapshot. Sending data to a CLOSED job will change its status to RUNNING, so also ensure that data is not imminently scheduled to be sent.

Configuration

Model snapshot configuration can be performed at job creation and can be updated afterwards. Updates to the configuration are only applied after the job has been CLOSED and new data sent to it.

Model snapshots are saved to disk periodically. By default, this occurs approximately every 3 hours. This can be configured using the backgroundPersistInterval configuration parameter and is set per job. When configuring, please choose a value which takes the following into account:

  • Persistence enables resilience in the event of a system failure.
  • Persistence allows for snapshots to be reverted.
  • The time taken to persist a job is proportional to the size of the model in memory.
  • The smallest allowed value is 3600 (1 hour).
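A client-side check of the interval before submitting a job configuration might look like the sketch below. It assumes the value is expressed in seconds (suggested by "3600 (1 hour)"), that the default of roughly 3 hours corresponds to 10800, and that backgroundPersistInterval is a top-level key of the job configuration. The function name is hypothetical.

```python
MIN_PERSIST_INTERVAL_SECONDS = 3600       # smallest allowed value (1 hour)
DEFAULT_PERSIST_INTERVAL_SECONDS = 10800  # assumed: "approximately every 3 hours"


def resolve_persist_interval(job_config):
    """Return the persist interval for a job config dict, validating the minimum."""
    interval = job_config.get("backgroundPersistInterval",
                              DEFAULT_PERSIST_INTERVAL_SECONDS)
    if interval < MIN_PERSIST_INTERVAL_SECONDS:
        raise ValueError(
            "backgroundPersistInterval must be at least "
            f"{MIN_PERSIST_INTERVAL_SECONDS} seconds, got {interval}"
        )
    return interval
```

A shorter interval gives more frequent restore points at the cost of more time spent persisting, which matters most for jobs with large models.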

Model snapshots are retained for 1 day by default. This can be configured using the modelSnapshotRetentionDays configuration parameter and is set per job. Snapshots older than this period will be deleted from the Elasticsearch datastore. It is also possible to explicitly delete model snapshots.
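To reason about which snapshots a given retention setting would remove, the cutoff can be computed as in the sketch below. The snapshot records here are hypothetical dictionaries with an "id" and a timezone-aware "timestamp"; the real snapshot resource may be shaped differently, and the exact moment at which the engine performs the deletion is not specified here.

```python
from datetime import datetime, timedelta, timezone


def expired_snapshots(snapshots, now, retention_days=1):
    """Return the snapshots older than the retention period.

    retention_days mirrors modelSnapshotRetentionDays (default 1 day).
    """
    cutoff = now - timedelta(days=retention_days)
    return [s for s in snapshots if s["timestamp"] < cutoff]


# Usage with two hypothetical snapshots.
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
snapshots = [
    {"id": "a", "timestamp": now - timedelta(hours=30)},
    {"id": "b", "timestamp": now - timedelta(hours=2)},
]
old = expired_snapshots(snapshots, now)
```

With the default of 1 day, only the snapshot taken 30 hours ago falls outside the retention period. A snapshot needed for a later revert must still be within this window, so choose the retention period with planned reverts in mind.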