The most common form of application trouble, in the eyes of the end user, is slow response times. In extreme circumstances, applications can fail altogether, as if the power had been switched off.
In many cases, the causes of end-user application woes can be identified and preempted before there is perceivable degradation; other times, they cannot. But in every case of poor end-user application experience, the cause needs to be identified and remedied, and service restored, swiftly.
The ability to deliver a consistent, quality end-user experience and to fix problems quickly is growing in both importance and complexity. It is more important because users today are more sophisticated, expect a superior experience, are less patient and are more dependent on IT than ever before.
It is more complex because the move to virtualisation, cloud computing and service-oriented architecture (SOA) makes for more intricate application and infrastructure dependencies. These two factors, in turn, make troubleshooting much trickier.
Rising complexity
As business applications grow ever more complex, seemingly simple transactions can now encompass the end-user client, web and application servers, databases, mainframes, message buses and, increasingly, services from external providers such as public and private clouds.
When managing end-user application performance, the ability to quickly pinpoint trouble spots in the system is crucial. That's the only way service level agreements (SLAs) are met, calls to the help desk are minimised and websites aren't abandoned because of poor response times.
All this requires a solid understanding of how the health and capacity of every system and device, across all tiers of the infrastructure, supports the application, from the network through the servers and databases to the end user. Unfortunately, the way this information is typically gathered is no longer effective or efficient in today's rapidly changing, dynamic environments.
Monitoring in the dark ages
To date, most organisations have relied on point products instead of gaining a view of the entire application transaction life cycle. End-user experience is monitored, but in isolation from the rest of the infrastructure. The same is true of database performance, the latency of the network layer, and how servers, mainframes and other systems are performing.
The result is siloed information: the data necessary to fix a problem is scattered among a variety of monitoring and troubleshooting tools. To determine the cause of a problem, members of each IT group have to gather with their respective reports, compare their numbers and try to work out what went wrong. It is this manual correlation of data that should have been made extinct years ago.
Multi-tier monitoring is the future
While these specialised tools are necessary, and often provide deep insight into the areas on which they focus, they don't help manage the entire business-technology infrastructure in real time.
When problems can start to reveal themselves with shifts in performance measured in milliseconds or fractions of a percentage of CPU utilisation, and then impact the end user minutes or hours later, it becomes clear that organisations need the ability to detect issues before the end user ever feels them.
That means not only monitoring the application, but also tracking the entire transaction flow and monitoring each step of every transaction by measuring response times, latencies, protocol and application errors, and all of the associated dependencies, on every tier from the end user through the data centre.
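In practical terms, this kind of per-step monitoring amounts to recording a response time and any error against a shared transaction identifier at every tier. The following is a minimal sketch in Python, assuming a hypothetical monitor_step helper and an in-memory store in place of a real collector; it illustrates the idea rather than depicting any particular product.

```python
import time
import uuid
from contextlib import contextmanager

# Illustrative in-memory store; a real monitoring tool would forward these
# measurements to a central collector so they can be correlated across tiers.
measurements = []

@contextmanager
def monitor_step(transaction_id, tier, step):
    """Record the response time and any error for one step of a transaction."""
    start = time.perf_counter()
    error = None
    try:
        yield
    except Exception as exc:  # capture application errors alongside timings
        error = repr(exc)
        raise
    finally:
        measurements.append({
            "transaction_id": transaction_id,
            "tier": tier,          # e.g. web, application server, database
            "step": step,
            "response_time_ms": round((time.perf_counter() - start) * 1000, 2),
            "error": error,
        })

# Each tier records its own steps against the same transaction ID, so the
# timings and errors can later be stitched into one end-to-end view.
txn_id = str(uuid.uuid4())
with monitor_step(txn_id, "app-server", "price-lookup"):
    time.sleep(0.02)  # stand-in for the real work
```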
Consider the behind-the-scenes complexity of a typical online purchase. The buyer adds items to the shopping basket, enters his or her billing information and clicks submit. If the user gets an error, the business is most likely lost. That transaction will likely have touched dozens of systems: the underlying infrastructure, applications, databases, a credit card authentication system and other tiers.
Had the capabilities been in place to monitor all of those pieces of the transaction, the error may very well have been avoided altogether. For instance, a sluggish database or partner application could have been spotted and remedied before ever impacting the online shopper. And those problems that cannot be spotted in advance can still be fixed much more swiftly.
Gather as much information as you can
Consider this capability as it applies to shipping physical packages. In the dark ages of shipping (barely a decade ago), packages were shipped and the sender and receiver knew little more than when the package left its starting point and arrived at its destination.
Today, packages are tagged and customers can track a parcel's progress in near real time as it passes each waypoint in its journey. Still, there is no easy way to determine, while the package is en route or before it is shipped, whether it will miss its deadline.
It would be useful to have more detail. To predict whether a package will arrive on time, more data is required: slowdowns at the loading dock, the health of the truck's engine and tyres, and real-time traffic information along the truck's route.
Similarly, end-user experience monitoring today provides that kind of visibility into application response time and alerts when response times have degraded. Unfortunately, many tools lack the more detailed information necessary to predict and rectify potential performance issues before they arise.
To manage the end user's application experience properly, organisations need the same capabilities when it comes to tracking application performance. They need to tag a transaction at its starting point and be able to monitor it as it traverses its way from the end user's system all the way through the data centre and back again.
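One common way to implement that tagging, sketched below under assumed names, is to mint a correlation identifier at the transaction's starting point and pass it along with every downstream call. The X-Correlation-ID header here is an assumption chosen for illustration; standards such as the W3C Trace Context traceparent header serve the same purpose.

```python
import uuid
import urllib.request

# Assumed header name for illustration; real deployments often use the
# W3C Trace Context "traceparent" header or a vendor-specific equivalent.
CORRELATION_HEADER = "X-Correlation-ID"

def tag_transaction():
    """Mint the tag at the transaction's starting point (the end user's request)."""
    return str(uuid.uuid4())

def call_downstream(url, correlation_id):
    """Carry the tag on every hop so each tier can log against the same ID."""
    request = urllib.request.Request(url, headers={CORRELATION_HEADER: correlation_id})
    with urllib.request.urlopen(request) as response:
        return response.read()

def handle_incoming(headers):
    """A downstream tier reuses the incoming tag rather than minting a new one."""
    return headers.get(CORRELATION_HEADER) or tag_transaction()
```

Because every tier logs its timings and errors against the same identifier, the full path of a single transaction can be reassembled without the manual cross-team correlation described earlier.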
That kind of capability won't be found with conventional point solutions that individually measure the performance of networks, databases, servers, or applications. It can only be found by monitoring application performance from the end-user's perspective, as well as understanding how that experience is affected by the real-time health of all of the devices and systems on which that application depends.
That ultimately means that application failures, slow response times and unmet SLAs must not be discovered at the help desk or from upset customers, when the damage is already done and problems are at their most difficult to fix.