
Modern service delivery infrastructures are characteristically highly resilient. Dynamic routing combined with server and network virtualization has enabled the delivery of applications on demand with dynamic configuration of components.
The difficulty for the providers of service assurance is that customer experience is immediately impacted by an application behavior anomaly while fault management systems typically show no relevant incidents. This leaves operations managers blind to the application error and the impact of that application error.
Application errors do not correlate directly to a discrete failure of an infrastructure component but rather to sequences of state changes across a number of infrastructure components underpinning the application.
Historically, an interface failure could be identified as the root cause of symptomatic failures because the topology was both localized and simple. In modern infrastructures, however, with mesh networking, dynamic routing and server virtualization, application delivery platforms are too complex to be mapped.
Even though built-in resilience has made it unlikely that catastrophic failures will widely disrupt service delivery, our customers’ experience is still regularly impacted by adverse application behavior, although in reality the application has probably not failed.
Application instabilities are caused by apparently unrelated state changes in the service delivery infrastructure that underpins a given application, to the point that users can no longer work.
An application error results from the domino effect of a series of state changes through the service delivery infrastructure. To our service assurance staff, these state changes are needles scattered across separate management haystacks.
Traditional management tools have become ineffective in modern infrastructures.
Prelert uses patent-pending analytics techniques to automatically identify, from streaming event, trend and usage service management telemetry, the significant Episodes of Causality that lead to application errors impacting the customer’s experience.
Prelert utilizes existing sources of service management telemetry including IBM Tivoli Netcool, HP OpenView, BMC Patrol, Splunk, LogLogic and more…
Prelert’s Service Causality Analysis sequences How services are impacted, whereas Root-Cause Analysis filters out symptomatic fault noise to leave What fault actually occurred.
In the case of application behavior abnormalities, there is no outright failure and therefore no “What” to identify, so root-cause analysis tools have become irrelevant. Prelert, however, shows the sequence of states that led to the application behavior abnormality, enabling prompt resolution and configuration changes to ensure the problem does not recur.
Throughout our service delivery infrastructures, applications, network and server equipment, operating systems and other building blocks continuously document their state at a given time in management telemetry logs and SNMP traps. In addition, network and systems management tools poll managed objects to determine their state and produce event messages.
This combined service management telemetry consists of fault, status and usage information, from which, with the appropriate analytics techniques, Episodes of Causality can be inferred about application behavior across the service delivery infrastructure.
Some of these episodes of causality document normal behavior and some highlight application abnormalities that are manifested as application errors to the users, as in the example below.
This telemetry data is normally already being collected by organizations, but its value is locked within disconnected silos of service management: the network department, DBAs, storage, middleware, and specific application management groups.
Not only is the data locked away in these silos; until now there has been no management tool capable of unlocking its value by combining the silos and relating the behavior they record.
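As an illustration only, combining silos can be pictured as merging separately collected event streams into a single time-ordered view. The sketch below is not Prelert's implementation; the `Event` fields, silo names and sample messages are all hypothetical, and each silo's stream is assumed to be already sorted by timestamp.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    """One telemetry message (hypothetical fields, for illustration)."""
    timestamp: float  # seconds since some common reference time
    silo: str         # management group that collected the message
    source: str       # component that reported it
    message: str


def merge_silos(*streams: Iterable[Event]) -> Iterator[Event]:
    """Merge per-silo streams, each already time-sorted, into one timeline."""
    return heapq.merge(*streams, key=lambda e: e.timestamp)


# Hypothetical sample data from three silos.
network = [
    Event(100.0, "network", "router-7", "route flap"),
    Event(160.0, "network", "router-7", "route restored"),
]
database = [Event(130.0, "database", "db-2", "connection pool exhausted")]
middleware = [Event(145.0, "middleware", "mq-1", "queue depth high")]

timeline = list(merge_silos(network, database, middleware))
for event in timeline:
    print(f"{event.timestamp:7.1f}  {event.silo:<10}  {event.source}: {event.message}")
```

Once the messages sit on one timeline, state changes reported by different groups can be examined as a single sequence rather than in isolation.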
Prelert uses state-of-the-art stochastic and automated significance-induction analytics techniques to transform a patchwork of IT and Telecoms telemetry messages into coherent knowledge that can be used by user-specific Service Assurance applications.
Prelert is unique in multiple respects:
Service assurance telemetry data is received in real time from tokenized sources such as: IBM Tivoli Netcool, CA eHealth, InfoVista, Splunk, LogLogic, HP OpenView, BMC Patrol and others.
Input telemetry messages are filtered and non-discrete data transformed into distinct Evidence State messages.
Methods are applied for each dimension (source haystack) and across dimensions (servers, services, applications, users, etc.)
A suite of proprietary and patented stochastic analysis algorithms automatically creates a unified multidimensional model of the entire service delivery system represented by the Evidence data.
Algorithms include:
Inductive Learning techniques continue to maintain the System model of Episodes in real-time, without the need for manual rules or fettling.
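The transformation of non-discrete data into distinct Evidence State messages, mentioned above, can be pictured with a minimal sketch. It assumes simple fixed thresholds and emits a message only when the state changes; the thresholds, state names and the `EvidenceState` type are hypothetical, and Prelert's actual method is not described in this document.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class EvidenceState:
    """A discrete state message (hypothetical structure)."""
    timestamp: float
    source: str
    state: str


def to_state(value: float, warn: float, critical: float) -> str:
    """Map a numeric reading onto one of three discrete states."""
    if value >= critical:
        return "CRITICAL"
    if value >= warn:
        return "WARN"
    return "NORMAL"


def discretize(
    samples: Iterable[Tuple[float, float]],
    source: str,
    warn: float = 70.0,      # assumed threshold, for illustration
    critical: float = 90.0,  # assumed threshold, for illustration
) -> Iterator[EvidenceState]:
    """Yield a state message only when the state differs from the last one."""
    previous: Optional[str] = None
    for timestamp, value in samples:
        state = to_state(value, warn, critical)
        if state != previous:  # filter out repeats of the same state
            yield EvidenceState(timestamp, source, state)
            previous = state


# Hypothetical CPU utilisation samples: (seconds, percent).
cpu = [(0, 35.0), (60, 42.0), (120, 75.0), (180, 93.0), (240, 91.0), (300, 40.0)]
states = list(discretize(cpu, "server-12/cpu"))
for s in states:
    print(s)
```

Six raw samples reduce to four state messages here, which is the sense in which continuous readings become a sequence of discrete state changes.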
Prelert describes the telemetry data (evidence) as:
The Background Model contains Episodes that identify the likely cause of an evidence item.
As new evidence is received from the service management tools in real time, Prelert isolates the most significant Episodes in watchlists. Users are able to select an Evidence item and view either its causality or its likely impact.
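The two views described above, causality and likely impact, can be sketched as forward and reverse lookups over a set of Episodes. This is an assumption-laden illustration rather than Prelert's data model: the episode tuples, significance scores and item names are invented for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical episodes: (cause item, effect item, significance score).
EPISODES: List[Tuple[str, str, float]] = [
    ("router-7: route flap", "db-2: replication lag", 0.8),
    ("db-2: replication lag", "app-1: slow checkout", 0.9),
    ("mq-1: queue depth high", "app-1: slow checkout", 0.4),
]

# Index the episodes in both directions.
causes_of: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
impacts_of: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
for cause, effect, score in EPISODES:
    causes_of[effect].append((score, cause))
    impacts_of[cause].append((score, effect))


def likely_causes(item: str) -> List[str]:
    """Evidence items that precede `item`, most significant first."""
    return [cause for _, cause in sorted(causes_of.get(item, []), reverse=True)]


def likely_impact(item: str) -> List[str]:
    """Evidence items that follow `item`, most significant first."""
    return [effect for _, effect in sorted(impacts_of.get(item, []), reverse=True)]


print(likely_causes("app-1: slow checkout"))
print(likely_impact("router-7: route flap"))
```

Sorting by score stands in for the idea of a watchlist that surfaces the most significant Episodes first.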