According to research group Gartner, bugs discovered after the release of new web applications can not only spell doom for a project, but also cost 80 to 1,000 times more to fix than if they had been found during pre-deployment testing. Andy Buchanan reports.

As more businesses harness the power of an online presence and use web-based applications to enhance their operations, it becomes imperative that every stage of a web project's development proceeds smoothly and efficiently, with as few glitches as possible. However, this isn't always achievable, because most major web-based initiatives suffer some form of unforeseen delay that causes the planned timescale to slip. Moreover, these same ventures tend to have deadlines that cannot afford to be missed. The net result is that time becomes a scarce commodity, a scenario that quality assurance and testing specialists are unfortunately all too familiar with.

So how do we deal with this?

A common approach is to keep things simple. For example, on the eve of the opening day of a new sports stadium in the US, engineers tested the plumbing by getting the entire workforce together for a synchronised flushing of the toilets. A similar, but equally labour-intensive, approach is often applied to load-testing new web applications, with lots of people asked to log in simultaneously.

Testing should also be quick, simple and thorough, which is where rapid bottleneck identification (RBI) comes in. Every web application has at least one bottleneck, usually an element of hardware, software or bandwidth that sets a hard limit on data flow or processing speed. An application will therefore only ever be as efficient as its least efficient element. It is these bottlenecks that directly impact performance and scalability.

Furthermore, bottlenecks can only be discovered and resolved one by one. And they can be found throughout an organisation's web application infrastructure: at the system level (firewalls, routers, server hardware etc.), the web server level (hit rate, CPU etc.), the application server level (page rate, memory etc.), or the database server level (queuing, connections etc.). While this sounds like a recipe for an arduous testing process, there is a way to speed things up.

Throughput or concurrency?

Some 80 per cent of all system and application problems stem from limitations in throughput capacity - the flow of data that a system can support, measured in hits per second, pages per second or megabits per second (Mbps). Only 20 per cent of issues are down to concurrency - the number of independent users that a system can support.

Therefore, if most bottlenecks stem from throughput limits, it makes sense for performance testing to focus most of its effort there, instead of on levels of concurrent users, which have traditionally been the focus.

This way of testing involves minimising the number of user connections to a system while maximising the amount of work performed by those connections - pushing the web application and the system to full capacity.

At the system level this means adding basic files to the web and application servers. Typically a large image is used for bandwidth tests, a small text file or image is used for hit-rate tests, and a very simple sample application page is used for page-rate testing.
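As an illustration, the loop below is a minimal sketch of a system-level hit-rate test: a handful of connections request a small static file with no think time, and the completed requests are counted. The URL, worker count and duration are assumptions chosen purely for illustration; swapping the small file for a large image turns the same loop into a rough bandwidth test.

import threading
import time
import urllib.request

TEST_URL = "http://test-server.example/static/small.gif"   # hypothetical small test file
DURATION = 30     # seconds to keep the test running (assumed)
WORKERS = 5       # a few connections, each working flat out

hits = 0
lock = threading.Lock()

def worker(stop_at):
    global hits
    while time.time() < stop_at:
        with urllib.request.urlopen(TEST_URL) as resp:
            resp.read()                  # drain the response so the transfer completes
        with lock:
            hits += 1

stop_at = time.time() + DURATION
threads = [threading.Thread(target=worker, args=(stop_at,)) for _ in range(WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("hit rate: %.1f hits per second" % (hits / DURATION))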

If the system is unable to meet the requirements of the application, there is no need to continue testing until it has been improved, whether by tuning its settings, boosting hardware capacity or increasing bandwidth.

Throughput testing at the application level means hitting key pages and user transactions with limited delay between requests to find the page-per-second capacity limit of the various components. The pages or transactions with the worst throughput are those in need of the most tuning.
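A sketch of that idea, assuming a hypothetical application and an illustrative list of key pages, might rank each page by the rate it can sustain:

import time
import urllib.request

BASE_URL = "http://test-server.example"                              # hypothetical application under test
KEY_PAGES = ["/", "/search?q=widgets", "/product/123", "/basket"]    # illustrative key pages
REQUESTS_PER_PAGE = 200                                              # sample size per page (assumed)

results = {}
for path in KEY_PAGES:
    start = time.time()
    for _ in range(REQUESTS_PER_PAGE):
        with urllib.request.urlopen(BASE_URL + path) as resp:
            resp.read()                                              # no think time between requests
    results[path] = REQUESTS_PER_PAGE / (time.time() - start)

# The pages with the lowest pages-per-second figures are the first candidates for tuning.
for path, rate in sorted(results.items(), key=lambda item: item[1]):
    print("%-25s %.1f pages per second" % (path, rate))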

Concurrency, however, is still an important part of testing. At the system and application levels it can be limited by sessions and socket connections. It can also be affected by incorrect server configuration or code flaws. Testing it involves increasing the number of users running with realistic page-delay times, while ensuring the increase is slow enough to harvest useful data throughout the test.
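A minimal sketch of such a ramp, with the step size, interval, think time and URL chosen purely for illustration, might look like this:

import random
import threading
import time
import urllib.request

TEST_URL = "http://test-server.example/"   # hypothetical page under test
THINK_TIME = 45      # seconds a visitor dwells on a page (assumed)
RAMP_STEP = 25       # virtual users added at each step
STEP_INTERVAL = 120  # seconds between steps - slow enough to gather useful data
MAX_USERS = 500

stop = False

def virtual_user():
    while not stop:
        with urllib.request.urlopen(TEST_URL) as resp:
            resp.read()
        time.sleep(random.uniform(0.5, 1.5) * THINK_TIME)   # jittered, realistic page delay

users = []
try:
    while len(users) < MAX_USERS:
        for _ in range(RAMP_STEP):
            t = threading.Thread(target=virtual_user, daemon=True)
            t.start()
            users.append(t)
        print("now simulating %d users" % len(users))
        time.sleep(STEP_INTERVAL)   # hold at this level while response times and errors are recorded
finally:
    stop = True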

A faster, simpler way to test

An initial focus on throughput testing saves time. For example, say you were testing a system expected to handle 5,000 concurrent users, each spending an average of 45 seconds on each page. If the application has a bottleneck that will limit its scalability to 25 pages per second, a typical concurrency test would have found the problem at approximately 1,125 users, or 94 minutes into the test. A throughput test would have uncovered the glitch in less than 60 seconds.
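The arithmetic behind those figures is worth making explicit. The ramp rate of 12 users per minute below is an assumption added here to show where a 94-minute figure could come from; the other numbers are taken from the example itself.

page_rate_limit = 25     # pages per second the bottleneck allows
page_delay = 45          # seconds each user spends on a page

users_at_bottleneck = page_rate_limit * page_delay       # 25 x 45 = 1,125 users

ramp_rate = 12           # users added per minute (assumed ramp rate)
minutes_to_find = users_at_bottleneck / ramp_rate        # roughly 94 minutes

print(users_at_bottleneck, round(minutes_to_find))       # 1125 94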

So, RBI can help speed testing along, but how is it kept simple? Very often, performance testing begins with overly complex scenarios exercising too many components, which makes it easy for bottlenecks to hide. By beginning with basic system-level testing you can check performance before the web application is even deployed.

Furthermore, a modular approach simplifies things. For example, you start with the simplest possible test case and gradually build in complexity. If that simplest case works, testing moves on - and if the next stage fails, you know where the bottleneck is.

This modular method also allows you to rule out previously tested components as you go forward. For example, if hitting the homepage shows no problem, but hitting the homepage plus executing a search shows very poor performance, the cause of the bottleneck is in the search functionality.

System and application level testing

Any performance testing should begin with an assessment of the basic network infrastructure supporting the web application. If this cannot support the anticipated user load then even infinitely scalable application code will bottleneck.

After checking the system is up to the job, it's time to turn to the web application itself. Again, the approach should be to start with the simplest possible test case and then add complexity. In a typical ecommerce application that would mean testing the homepage first, then adding in pages and business functions until complete real-world transactions are being tested, first individually and then in complex scenario usage patterns.

Once this has taken place, transactions can be put into scenario concurrency tests. Any concurrency test must reflect what users really do on the site (for example, 50 per cent just browse, 35 per cent search, 10 per cent register and log in, and 5 per cent add to a shopping cart and make a purchase). In addition, virtual users testing the site must execute the steps of those transactions at the same pace as real-world visitors.
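A sketch of how a single virtual user might be driven through that mix, with the weights taken from the example above and the site, page paths and 45-second think time all assumed for illustration, could look like this:

import random
import time
import urllib.request

BASE_URL = "http://test-server.example"   # hypothetical site under test
THINK_TIME = 45                           # seconds between steps, matching real visitor pacing (assumed)

SCENARIOS = [   # (weight, steps in the transaction) - paths are illustrative
    (50, ["/"]),                                                    # just browse
    (35, ["/", "/search?q=widgets"]),                               # search
    (10, ["/", "/register", "/login"]),                             # register and log in
    (5,  ["/", "/search?q=widgets", "/basket", "/checkout"]),       # add to cart and purchase
]

def run_virtual_user():
    weights = [weight for weight, _ in SCENARIOS]
    steps = random.choices([steps for _, steps in SCENARIOS], weights=weights)[0]
    for path in steps:
        with urllib.request.urlopen(BASE_URL + path) as resp:
            resp.read()
        time.sleep(random.uniform(0.5, 1.5) * THINK_TIME)           # realistic, slightly jittered pacing

run_virtual_user()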

So whether you conduct your performance testing in-house using automated tools or manually, or via a managed service, the important thing is that it is done methodically and rigorously.

Just ask the team from a major UK newspaper that saw its new website crash earlier this year due to unforeseen traffic levels; or the world's second-largest stock market, which had to halt trading after a software upgrade had an undesired effect. Testing is a bit like insurance. You usually don't realise it is not up to scratch until it is too late.