Business intelligence gold panning

Panning for gold: Steve Nimmons of Atos Origin looks at complex event processing.

There is an inherent complexity in understanding the relationships between seemingly unconnected events occurring in real or near-real time. I make the temporal distinction because there are already sophisticated business intelligence and data mining solutions for pattern or trend discovery in previously captured business information.

These are proven and do a very solid job in specific circumstances. There are also interesting extensions emerging in the form of mash-ups that augment and enrich more fully featured, end-user-driven business intelligence solutions. Analysing events in real time can be exceptionally informative, adding to the overall utility of business intelligence and providing a mechanism for business processes to react advantageously.

I describe complex event processing (CEP) as gold panning the elemental binary soup. CEP tries to amplify the signal of interesting information against the noise of your event-driven architecture. What is interesting is generally termed a 'concept', and CEP is all about instantiating concepts from the flow of events.

CEP is typically achieved using forward-chaining rules engines, themselves based on the RETE algorithm. There are a number of good COTS products; I myself had the privilege of working on one prototype system that has since been successfully productised.
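To make the idea of 'instantiating a concept' concrete, here is a minimal, hypothetical sketch. It is not any particular product and not a real RETE network: just a single forward-chaining-style rule that raises a 'RepeatedFailure' concept when one subscriber accumulates enough failure events inside a sliding time window. All names, event types and thresholds are illustrative assumptions.

    from dataclasses import dataclass
    from collections import defaultdict

    @dataclass
    class Event:
        kind: str          # e.g. 'AUTH_FAIL' or 'PAYMENT_OK' (hypothetical event types)
        subscriber: str    # correlation key
        timestamp: float   # seconds

    class ConceptDetector:
        """Fires a 'RepeatedFailure' concept when one subscriber produces
        `threshold` failure events inside a sliding time window."""

        def __init__(self, window_seconds=60.0, threshold=3):
            self.window_seconds = window_seconds
            self.threshold = threshold
            self._history = defaultdict(list)   # subscriber -> failure timestamps

        def observe(self, event):
            if not event.kind.endswith('_FAIL'):
                return None                          # noise: ignore non-failures
            cutoff = event.timestamp - self.window_seconds
            window = [t for t in self._history[event.subscriber] if t >= cutoff]
            window.append(event.timestamp)
            self._history[event.subscriber] = window
            if len(window) >= self.threshold:        # the rule's condition is met
                return {'concept': 'RepeatedFailure',
                        'subscriber': event.subscriber,
                        'count': len(window)}
            return None

    detector = ConceptDetector()
    stream = [Event('AUTH_FAIL', 'sub-001', 10.0),
              Event('PAYMENT_OK', 'sub-002', 12.0),
              Event('AUTH_FAIL', 'sub-001', 25.0),
              Event('AUTH_FAIL', 'sub-001', 40.0)]
    for e in stream:
        concept = detector.observe(e)
        if concept:
            print(concept)   # fires once: three failures for sub-001 within 60 seconds

A production engine compiles many such rules into a shared match network and manages working memory for you; the sketch only shows the shape of turning raw events into a concept.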

The business problem we faced was the correlation of events in a distributed mobile telecoms network. Tracing failures throughout the user experience was complex and placed a large burden on operations staff and customer service representatives (CSRs).

The solution was to decode large volumes of real-time data from RADIUS servers, WAP servers, MMS, MML, SMS, micropayment systems and so on, and to hunt for failure patterns in the datasets (presenting these in a simple, consumable format). The result was exceptional operational savings, increased user satisfaction and greater insight into system failures (which were then addressed).
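As a hedged illustration of that correlation step (the field names, sources and failure pattern below are hypothetical, not the actual production rules), consider normalising records from two of those feeds onto a common event shape and correlating them on subscriber within a short time window:

    # Hypothetical, already-decoded records from two of the feeds mentioned above.
    radius_log = [
        {'msisdn': 'sub-001', 'ts': 100.0, 'result': 'ACCEPT'},
        {'msisdn': 'sub-002', 'ts': 101.0, 'result': 'REJECT'},
    ]
    mms_log = [
        {'msisdn': 'sub-002', 'ts': 103.0, 'status': 'SEND_FAILED'},
        {'msisdn': 'sub-001', 'ts': 104.0, 'status': 'DELIVERED'},
    ]

    def normalise(record, source):
        """Map source-specific fields onto one common event shape."""
        failed = (record.get('result') == 'REJECT' or
                  record.get('status') == 'SEND_FAILED')
        return {'msisdn': record['msisdn'], 'ts': record['ts'],
                'source': source, 'failed': failed}

    events = sorted([normalise(r, 'RADIUS') for r in radius_log] +
                    [normalise(r, 'MMS') for r in mms_log],
                    key=lambda e: e['ts'])

    # Hypothetical failure pattern: a RADIUS reject followed by an MMS send
    # failure for the same subscriber within 10 seconds.
    WINDOW = 10.0
    last_reject = {}
    for e in events:
        if e['source'] == 'RADIUS' and e['failed']:
            last_reject[e['msisdn']] = e['ts']
        elif e['source'] == 'MMS' and e['failed']:
            t0 = last_reject.get(e['msisdn'])
            if t0 is not None and e['ts'] - t0 <= WINDOW:
                print(f"Correlated failure for {e['msisdn']}: "
                      f"reject at {t0}, MMS failure at {e['ts']}")

The real system handled far more sources and far subtler patterns, but the essential moves are the same: normalise, key on something the feeds share, constrain by time, and surface the match in a consumable form.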

Having 'done battle' with early systems, I table the following implementation recommendations:

  1. Trial candidate CEP systems in a proof of concept unless you are familiar with the solution, product and approach.
  2. Producing decent test data with known results is complex and a source of mutual frustration. When configuring a CEP system to look for 'needles in haystacks' it is very handy to actually know where and what some of them are. False positives, false negatives and correlations that fall inside or outside of configurable tolerances will all need to be tested (a minimal sketch of planting known 'needles' follows this list). Start thinking about this very early in the project cycle and engage domain experts who know the respective systems inside out. Make this 'slick' and you will get a better result at much lower cost.
  3. Tuning is extremely important. Make sure your key attributes can be configured. This often (although not exclusively) relates to timeouts.
  4. Ensure, unless you have explicitly designed otherwise, that you can handle disaster recovery and fail-over. This may sound obvious, but engines running in memory for ultimate speed may not provide you with the necessary state management. An inelegant solution is to run two engines performing the same work. If you need to run at very high throughput volumes, this must be part of non-functional testing in a proof of concept.
  5. CEP can become an arcane mess and can be quite 'organic' in nature. Black-magic systems are only fun for propeller heads with little commercial responsibility, so avoid the 'allure of mystery'.
  6. Make sure the golden nuggets CEP exposes are used. This will often be a feedback loop into other business processes. The key point is that new information might be disruptive to existing processes, so plan ahead.
  7. Depending on the implementation, presenting results to a 'human' through a worklist (workflow / BPM) is essential. This might be in the validation of suspect movements, purchases, funds transfers and so on. A badly tuned CEP system with inappropriate thresholds will overwhelm the humans working those tasks. Conversely, a system that is 'too loose' can allow suspect events (potentially of a very serious nature) to pass undetected.
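On points 2 and 3 above, here is a small sketch of the 'plant known needles' approach: synthesise a noisy event stream, inject failure bursts whose locations are recorded as ground truth, then sweep a configurable threshold and watch how false positives and false negatives move. Everything here is illustrative (the generator, the toy detector and the numbers are assumptions, not any real product's API).

    import random

    random.seed(42)   # reproducible test data

    def synthesise_stream(n_events=500,
                          planted=(('sub-007', 200.0), ('sub-013', 350.0))):
        """Generate background noise plus known failure bursts ('needles')
        whose subscribers are recorded as ground truth."""
        events, truth = [], set()
        for i in range(n_events):
            events.append({'subscriber': f'sub-{random.randint(0, 50):03d}',
                           'ts': float(i),
                           'failed': random.random() < 0.01})
        for subscriber, ts in planted:              # the needles we know about
            truth.add(subscriber)
            for k in range(4):                      # four failures in quick succession
                events.append({'subscriber': subscriber, 'ts': ts + k, 'failed': True})
        events.sort(key=lambda e: e['ts'])
        return events, truth

    def detect(events, window=30.0, threshold=3):
        """Toy detector: flag any subscriber with >= threshold failures in a window."""
        flagged, history = set(), {}
        for e in events:
            if not e['failed']:
                continue
            hist = [t for t in history.get(e['subscriber'], [])
                    if t >= e['ts'] - window]
            hist.append(e['ts'])
            history[e['subscriber']] = hist
            if len(hist) >= threshold:
                flagged.add(e['subscriber'])
        return flagged

    events, truth = synthesise_stream()
    for threshold in (2, 3, 5):                     # exercise the configurable tolerance
        flagged = detect(events, threshold=threshold)
        print(f"threshold={threshold}: "
              f"true positives={len(flagged & truth)}, "
              f"false positives={len(flagged - truth)}, "
              f"false negatives={len(truth - flagged)}")

Even this toy exposes the tuning trade-off behind points 3 and 7: raise the threshold to five and the planted bursts of four failures will typically drop out as false negatives, while loosening it risks letting background noise leak through into the worklist.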

Solutions and industries in which I see increased CEP interest and uptake include transport, financial services (fraud detection) and fault monitoring systems (across industries). The latter case is very interesting: systems are being developed that seek to provide early warning of developing faults, which generate event streams with tell-tale characteristics.

The definition of 'tell-tale' is of course rather slippery, with practically limitless variation between circumstances. This is at the heart of what makes CEP both interesting and challenging.

November 2008