Prelert
Technology

This is the Modern World

Modern service delivery infrastructures are characteristically highly resilient. Dynamic routing combined with server and network virtualization has enabled the delivery of applications on demand with dynamic configuration of components.

[Figure: Legacy Infrastructure vs. Modern Infrastructure]

The difficulty for providers of service assurance is that customer experience is immediately impacted by an application behavior anomaly, while fault management systems typically show no relevant incidents. This leaves operations managers blind to both the application error and its impact.

Application errors do not correlate directly to a discrete failure of an infrastructure component but rather to sequences of state changes across a number of infrastructure components underpinning the application.

Service Assurance Technology is Stuck in the Last Millennium

Historically, an interface failure could be identified as the root cause of symptomatic failures because the topology was both localized and simple. In modern infrastructures, however, with mesh networking, dynamic routing and server virtualization, application delivery platforms are too complex to be mapped.

Even though built-in resilience has made it unlikely that catastrophic failures will widely disrupt service delivery, our customers’ experience is still regularly impacted by adverse application behavior, although in reality the application has probably not failed.

Application instabilities are caused by apparently unrelated state changes in the service delivery infrastructure that underpins a given application, to the point where users can no longer work.

An application error results from the domino effect of a series of state changes through the service delivery infrastructure. To our service assurance staff, these state changes are needles buried in separate management haystacks.

Traditional management tools have become ineffective in modern infrastructures.

  • Root-cause analysis tools cannot directly correlate application errors to their likely causes due to the dynamic nature of topology models and the fact that there are no indicative application faults.
  • Business transaction and application response time measurement may confirm the existence of a customer experience issue but does not give any indication as to the causality.

Prelert Service Causality Analysis works with your existing data!

Prelert uses patent-pending analytics techniques to automatically identify significant Episodes of causality that lead to application errors impacting the customer’s experience. These Episodes are identified from streaming event, trend and usage service management telemetry.

Prelert utilizes existing sources of service management telemetry including IBM Tivoli Netcool, HP OpenView, BMC Patrol, Splunk, LogLogic and more…

Service Causality Analysis shows How Application Abnormalities Occur

Prelert’s Service Causality Analysis sequences How services are impacted, whereas Root-Cause Analysis filters out symptomatic fault noise to leave What fault actually occurred.

In the case of application behavior abnormalities, there is no outright failure, so there is no “What” happened, and root-cause analysis tools have become irrelevant. Prelert, however, shows the sequence of states that led to the application behavior abnormality, enabling prompt resolution and configuration changes to ensure the problem does not recur.

Throughout our service delivery infrastructures, applications, network and server equipment, operating systems and other building blocks continuously document their state in management telemetry logs and SNMP traps. In addition, network and systems management tools poll managed objects to determine their state and produce event messages.

This combined service management telemetry consists of fault, status and usage information, from which, with the appropriate analytics techniques, Episodes of Causality can be inferred about application behavior across the service delivery infrastructure.

[Figure: timeline; axis label: Time]

Some of these episodes of causality document normal behavior and some highlight application abnormalities that are manifested as application errors to the users, as in the example below.

Organizations normally collect this telemetry data already, but its value is locked within disconnected silos of service management: the network department, DBAs, storage, middleware and specific application management groups.

Until now, no management tool has been capable of unlocking that value by combining the silos and relating their behavior.
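
As a rough illustration of what combining silos means in practice, the sketch below merges per-silo event lists into one time-ordered stream. The silo names, event names and the `(timestamp, silo, event)` tuple layout are hypothetical and are not taken from Prelert's product.

```python
import heapq

# Hypothetical per-silo events as (timestamp, silo, event) tuples,
# each list already sorted by timestamp.
network = [(1, "network", "link_flap"), (7, "network", "link_up")]
database = [(3, "database", "db_timeout")]
application = [(4, "application", "http_500")]


def merge_silos(*streams):
    """Merge per-silo event lists into one stream ordered by timestamp."""
    return list(heapq.merge(*streams, key=lambda event: event[0]))


merged = merge_silos(network, database, application)
for event in merged:
    print(event)
```

Once the events sit in a single ordered stream, sequences that cross silo boundaries (here a network event followed by database and application events) become visible to later analysis.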

Prelert – How it Works

Prelert uses state-of-the-art stochastic and automated significance-induction analytics techniques to transform a patchwork of IT and Telecoms telemetry messages into coherent knowledge that can be used by user-specific Service Assurance applications.

Prelert is unique in multiple respects:

  • consumes both Fault and Trend data
  • consumes User Experience (Application Response) data
  • consumes Usage data
  • does not require manual rules administration or maintenance at all
  • does not require topology or object model knowledge

1. Service Assurance Telemetry is transformed into a unified Evidence stream

Service assurance telemetry data is received in real time from tokenized sources such as: IBM Tivoli Netcool, CA eHealth, InfoVista, Splunk, LogLogic, HP OpenView, BMC Patrol and others.

Input telemetry messages are filtered, and non-discrete data is transformed into distinct Evidence State messages.

  • Prelert uses a variety of algorithms and methods to identify significant deviations and features in the data
  • Not just trending and thresholding
  • Heavy use of ‘automated stock market trading’ type algorithms along with classical multivariate statistics

Methods are applied for each dimension (source haystack) and across dimensions (servers, services, applications, users, etc.)
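
To make the idea of turning non-discrete data into discrete Evidence concrete, here is a minimal sketch that flags samples deviating strongly from a rolling baseline. It uses a plain rolling z-score, which is an assumption chosen for illustration and not one of Prelert's algorithms; the function name, parameters and sample values are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


def to_evidence(samples, window=5, z_threshold=3.0):
    """Turn (timestamp, value) samples into discrete evidence messages.

    A sample becomes evidence when it deviates from the mean of the
    preceding `window` samples by more than `z_threshold` standard
    deviations of those samples.
    """
    history = deque(maxlen=window)
    evidence = []
    for timestamp, value in samples:
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            if sigma > 0 and abs(value - mu) / sigma > z_threshold:
                state = "HIGH" if value > mu else "LOW"
                evidence.append((timestamp, state, value))
        history.append(value)
    return evidence


# Hypothetical response-time samples in milliseconds, one per minute.
samples = list(enumerate([100, 102, 98, 101, 99, 100, 250, 101, 99]))
print(to_evidence(samples))
```

Only the spike at minute 6 is emitted as an Evidence message; the steady samples around it produce nothing, which is the filtering effect described above.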

2. Inductive Learning creates a unified Episode model of “The System”

A suite of proprietary and patented stochastic analysis algorithms automatically creates a unified multidimensional model of the entire service delivery system represented by the Evidence data.

Algorithms include:

  • Temporal sequencing (DNA motif identification)
  • Multi-attribute matching (Unsupervised clustering)
  • Causality testing
  • Sequence significance ranking - information entropy, mutual information
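
As a hedged illustration of significance ranking, the sketch below scores pairs of evidence types by pointwise mutual information (PMI), a simple relative of the mutual information measure named in the list. It is not Prelert's proprietary algorithm; the window representation and event names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def rank_pairs_by_pmi(windows):
    """Rank pairs of evidence types by pointwise mutual information.

    `windows` is a list of sets; each set holds the evidence types seen
    in one time window. PMI(a, b) = log2(p(a, b) / (p(a) * p(b))).
    """
    n = len(windows)
    single = Counter()
    pair = Counter()
    for window in windows:
        single.update(window)
        pair.update(combinations(sorted(window), 2))
    scores = {
        (a, b): math.log2((count / n) / ((single[a] / n) * (single[b] / n)))
        for (a, b), count in pair.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical evidence types observed in six consecutive time windows.
windows = [
    {"link_flap", "db_timeout"},
    {"link_flap", "db_timeout"},
    {"cpu_high", "link_flap"},
    {"cpu_high"},
    {"cpu_high"},
    {"cpu_high"},
]
ranked = rank_pairs_by_pmi(windows)
for pair_key, score in ranked:
    print(pair_key, round(score, 2))
```

A positive score means the two evidence types appear together more often than chance would predict, so the pair is a candidate for a meaningful Episode; a negative score suggests the co-occurrence is incidental.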

Inductive Learning techniques continue to maintain the System model of Episodes in real time, without the need for manual rules or tuning.

3. Resulting in a Background Model and Exceptions

Prelert describes the telemetry data (evidence) as:

  • Background model - repeated Episodes (sequences) of evidence
  • Exceptions - a departure from the background model

[Figure: timeline; axis label: Time]

The Background Model contains Episodes that identify the likely cause of an evidence item.
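
The split between a background model and exceptions can be sketched in a few lines, assuming that an Episode is simply a short run of consecutive evidence items and that "repeated" means seen at least twice. Both assumptions, and all names below, are illustrative only.

```python
from collections import Counter


def build_background(history, length=2, min_count=2):
    """Return the set of evidence sequences that repeat in `history`."""
    counts = Counter(
        tuple(history[i:i + length])
        for i in range(len(history) - length + 1)
    )
    return {seq for seq, count in counts.items() if count >= min_count}


def find_exceptions(stream, background, length=2):
    """Return sequences in `stream` that are absent from the background."""
    return [
        tuple(stream[i:i + length])
        for i in range(len(stream) - length + 1)
        if tuple(stream[i:i + length]) not in background
    ]


# Hypothetical evidence history: a nightly backup pattern seen twice.
history = ["backup_start", "io_high", "backup_end",
           "backup_start", "io_high", "backup_end"]
background = build_background(history)

# New evidence: high I/O is followed by a database timeout this time.
new_stream = ["backup_start", "io_high", "db_timeout"]
print(find_exceptions(new_stream, background))
```

The familiar "backup_start, io_high" sequence is absorbed by the background model, while the previously unseen "io_high, db_timeout" sequence is reported as an exception.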

As new evidence is received from the service management tools in real time, Prelert isolates the most significant Episodes in watchlists. Users can select an Evidence item and view either its causality or its likely impact.

© Prelert, 2009

Website Design: Donohue Design