You're sold on load testing. But for what "unreasonable" load should you test?
Load testing – where you discover the point at which a computer system fails – is based on preparing for (graceful) failure by knowing its breaking point. Successful load testers anticipate high demand – but at what point do you pass from “high demand” to “ridiculous”? The guideline: Expect the unexpected.
One step in the software testing process is load testing, in which you discern how well the application handles both the anticipated site traffic and an abnormally high load. If you expect 100 concurrent users, for instance, a load testing plan would measure how well the application responds with those 100 users – and also with 300 users, in case your estimates are inaccurate. Even if the application can’t fully serve a 300-user workload, it can at least be designed to degrade gracefully, for example by cutting back some functionality to ensure the important tasks complete.
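To make graceful degradation concrete, here is a minimal Python sketch of load shedding. It assumes the application can count its in-flight requests and knows the capacity it handled well in load tests; the feature names, their tiers, and the 70% and 90% utilization thresholds are hypothetical values chosen for illustration, not recommendations.

```python
# Hypothetical feature tiers: 0 is essential, higher numbers are shed first.
FEATURE_TIERS = {
    "submit_claim": 0,      # the task that must always complete
    "check_status": 1,      # useful, but can be dropped under heavy load
    "recommendations": 2,   # nice-to-have; shed first
}


def enabled_features(in_flight: int, capacity: int) -> set[str]:
    """Return the features to serve at the current load.

    `capacity` is the concurrency level the system handled well in load tests.
    The 0.7 and 0.9 utilization thresholds are illustrative assumptions.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    utilization = in_flight / capacity
    if utilization < 0.7:
        max_tier = 2  # normal operation: serve everything
    elif utilization < 0.9:
        max_tier = 1  # getting busy: drop the nice-to-haves
    else:
        max_tier = 0  # near or beyond tested capacity: core tasks only
    return {name for name, tier in FEATURE_TIERS.items() if tier <= max_tier}


if __name__ == "__main__":
    # Tested for 100 users; a 300-user surge keeps only the essential task.
    for users in (50, 80, 300):
        print(users, sorted(enabled_features(users, capacity=100)))
```

The point of the tiers is that the decision about what to drop is made ahead of time, by people who know which tasks matter, rather than by whichever component happens to fail first.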
If you expect 100 users, then it makes sense to test 300 to account for the ebb and flow of users and to know where things break. But is 3,000 too much? Is 30,000 ridiculous? Who decides what that number should be? You need to strike a balance. It’s easy to get caught up in a limitless stress test scenario and forget to address other elements of the testing process.
A recent example is the state unemployment websites that were rendered unreachable when millions of Americans were furloughed or fired following the lockdowns prompted by the Covid-19 pandemic. From New York to Oregon, state unemployment sites went down, or were so slow as to be unresponsive. After all, who stress tests an unemployment site for 20% unemployment? Those are Great Depression levels, considered unthinkable until now.
The ancient cliché “expect the unexpected” applies here. It would be unreasonable to design a website for 20 million concurrent users. Or would it? For how many concurrent users should you design and test?
The sky’s the limit (sort of)
If you have the time, money, and compute resources, say veteran testers, keep going until you hit your limit. Of course, no one has unlimited resources, but use whatever you have.
“Explore where your breaking point is, regardless of where the load might be,” says Eric Proegler, a senior software tester with CreditKarma and a developer of the CloudTest load testing tool. “If you don’t know where the limit is, you can run into it without realizing it. It’s like a sandbar out there you might drive your boat into.”
There are practical issues beyond time and money, Proegler says. Developers can run into their tools’ license limits. Many tools are licensed on a per-user basis, which may cap the number of users you can simulate. “Open source tools step around that a little, but then there’s a limit of the number of computers you can simulate them on,” he says.
James Bach, consulting software tester with Satisfice, takes it one step further. “I would go with as high a load as I can easily arrange, just to see what happens. If a million concurrent users were not too costly to try, then I would try that. In other words, a big reason why I would not use a particular load is that it would cost a lot of time or resources to make that happen. And if it is not costly, then curiosity alone would drive me to try it,” he says. Within reason, that is. “There is always something else I can do with my time and energy that might be more worth doing.”
Ben Simo, a veteran QA tester and past president of the Association for Software Testing, also advocates testing beyond the load you expect to see. “The appropriate test scenario is to do a usage model for test and the worst case scenario. I like to go beyond what is the worst case to understand what the limits are,” he says.
The extreme scenarios eventually reach a state of diminishing returns; in many cases, the real problems may show up early. “I won’t go higher than I need to go to get useful information,” says Bach. “I might not need to test with a high load in order to know that it won’t work well. A lower load might already indicate the performance envelope.”
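The ramp-until-it-breaks approach these testers describe can be sketched as a loop that multiplies the load at each step and stops at the first step that fails, or when the budget (a license cap on simulated users, say) runs out. This is a minimal Python sketch under stated assumptions: `run_step` is a hypothetical stand-in for whatever your load tool does to run one step and report an error rate, and the 3x growth factor, 30,000-user cap, and 1% error threshold are illustrative. Stopping at the first failing step reflects Bach’s point that a lower load may already reveal the performance envelope.

```python
from typing import Callable, List, Optional, Tuple


def find_breaking_point(
    run_step: Callable[[int], float],
    start_users: int,
    growth: float = 3.0,
    max_users: int = 30_000,
    max_error_rate: float = 0.01,
) -> Tuple[Optional[int], List[Tuple[int, float]]]:
    """Ramp load geometrically until the error rate exceeds a threshold.

    Returns (breaking_load, history). breaking_load is None if the budget
    (max_users) ran out before anything broke.
    """
    if start_users <= 0 or growth <= 1:
        raise ValueError("start_users must be positive and growth greater than 1")
    history: List[Tuple[int, float]] = []
    users = start_users
    while users <= max_users:
        error_rate = run_step(users)  # hypothetical hook into your load tool
        history.append((users, error_rate))
        if error_rate > max_error_rate:
            return users, history  # first load level that failed
        # Always advance by at least one user so small loads cannot stall.
        users = max(users + 1, int(users * growth))
    return None, history  # never broke within budget: the limit is still unknown


if __name__ == "__main__":
    # A fake system, for illustration only, that starts failing at 2,000 users.
    def fake_system(users: int) -> float:
        return 0.0 if users < 2_000 else 0.25

    breaking, steps = find_breaking_point(fake_system, start_users=100)
    print("breaks at:", breaking)
    for users, errors in steps:
        print(f"{users:>6} users -> {errors:.0%} errors")
```

A `None` result is itself useful information: it tells you the sandbar is somewhere past your budget, not that there isn’t one.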
Design to avoid bottlenecks
With the right design, many bottlenecks shouldn’t even reach the testers, argues Simo.
One example of poor design – and a failure of load testing – was the 2013 HealthCare.gov site launch, which immediately crashed. The site had design flaws, and it wasn’t adequately tested before launch.
Hindsight is always 20/20, but in this case the design flaws of HealthCare.gov were glaring and should never have made it into the system to begin with. Among those defects was poor scalability, a consequence of the original site’s synchronous, sequential workflow. You had to provide income information to see if you qualified for the program, and that data had to be verified before you could look at the plans. It wasn’t built for people to browse and see their options first, as is standard practice at any e-commerce site.
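One way to see why that workflow hurt scalability is to count the work it forces onto the verification step. In the hypothetical Python sketch below, a verify-first design sends every visitor through income verification, while a browse-first design verifies only the visitors who go on to apply. The visitor count and the 20% apply rate are illustrative assumptions, not figures from the HealthCare.gov launch.

```python
def verification_requests(visitors: int, apply_rate: float, verify_first: bool) -> int:
    """Count hypothetical income-verification requests for one wave of visitors.

    verify_first=True models a workflow where plans stay hidden until income is
    verified, so every visitor (even one who is just browsing) must be verified.
    verify_first=False models a browse-first workflow where only applicants are.
    """
    if not 0.0 <= apply_rate <= 1.0:
        raise ValueError("apply_rate must be between 0 and 1")
    if verify_first:
        return visitors
    return round(visitors * apply_rate)


if __name__ == "__main__":
    visitors = 1_000_000  # illustrative surge, not a real launch figure
    for verify_first in (True, False):
        n = verification_requests(visitors, apply_rate=0.2, verify_first=verify_first)
        label = "verify-first" if verify_first else "browse-first"
        print(f"{label}: {n:,} verification requests")
```

Under these assumptions the browse-first design puts a fraction of the load on the slowest dependency, which is exactly the kind of difference a load test would expose late and a design review could catch early.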
A glaring design mistake like this should never get to the test phase, Simo argues. If it does, professional QA testers should feel justified in criticizing it. “Although testers may not be involved in the initial design, I want testers to question any of these design issues should they encounter them,” says Simo. “However, I have encountered many performance testers who primarily focus on whether systems meet specified technical speed and volume requirements rather than look at the bigger picture and help teams make informed design decisions.”
HealthCare.gov was an aberration from what are normally good design practices for e-commerce sites. Systems that routinely deal with high-volume spikes tend to scale better because they are designed to scale horizontally, says Simo: instead of throwing bigger hardware at the problem, you add virtual machines to pick up the work.
The challenge, Simo adds, is that people who deal with smaller volumes of traffic tend not to design systems that can handle unexpected growth. That requires developers (and testers) with a different mindset. “I’ve been seeing patterns where in software systems that normally operate in high volume, people make better decisions in making scalable systems than people who deal with a low volume,” he says. “The high-volume people scale out horizontally rather than scale up with a big system.”
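The scale-out arithmetic is simple enough to sketch. The Python function below estimates how many identical instances a horizontally scaled tier needs for a given peak. It assumes throughput grows roughly linearly as you add instances (real systems rarely scale perfectly, often because of a shared database or lock) and that one instance’s capacity was measured in a load test; the 70% headroom target is an illustrative assumption.

```python
import math


def instances_needed(peak_rps: float, per_instance_rps: float, headroom: float = 0.7) -> int:
    """Estimate how many identical instances a scale-out tier needs.

    Assumes linear scaling: N instances handle N times one instance's load.
    `per_instance_rps` should come from a load test of a single instance;
    `headroom` keeps each instance below that measured limit.
    """
    if per_instance_rps <= 0 or not 0 < headroom <= 1:
        raise ValueError("per_instance_rps must be positive and headroom in (0, 1]")
    return max(1, math.ceil(peak_rps / (per_instance_rps * headroom)))


if __name__ == "__main__":
    # A tenfold surge means adding instances, not buying a bigger machine.
    for peak in (1_000, 10_000):
        print(peak, "req/s ->", instances_needed(peak, per_instance_rps=200), "instances")
```

The linearity assumption is itself something to load test: if doubling the instances does not roughly double the breaking point, something shared is likely the real bottleneck.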
Involve the stakeholders
A tester may want to take load testing as far as possible. But who says “Stop!”?
That’s the job of the project stakeholders, says Bach. “My first cut at that is simply, ‘Who pays my salary?’” he says. “Sometimes the project manager is that person. Sometimes it’s the CEO, or the client who hired my company.”
One unfortunate effect of Agile philosophy is to occasionally obscure who matters, says Bach; the result is that testers are unsure who they work for. But if there is a role called product owner, then one would think that person gets to decide.
Accurately modeling a system workload is a specialty skill, says Proegler, and few people really know it well. Not many people can design and plan for massive spikes in traffic, and even they fudge some of the testing factors. “The expertise of what production workloads should look like is rare,” Proegler says. So you have to turn to the people who understand the business issue: the software’s users and stakeholders.
Project stakeholders have some understanding of what the project needs to do, says Proegler. “There has to be some reconciliation between business expertise and technical expertise, where the stakeholders ask, ‘How do we model that in a way relevant to performance testing?’”
“There are a lot of misses in that area and very few people feel comfortable to say a number,” Proegler adds. “Stakeholders should realize we’re solving a problem. Having an understanding of business patterns and what extremes look like makes it possible to come up with a load model. We need to know: How many visitors in a day can we expect?”
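Proegler’s question about daily visitors is where a load model usually starts. One common way to turn it into a concurrency target, offered here as a sketch rather than Proegler’s own method, is Little’s Law: average concurrent users equal the arrival rate multiplied by the average time each user stays. The peak-hour share, session length, and visitor counts below are hypothetical inputs that stakeholders would need to supply.

```python
def peak_concurrent_users(daily_visitors: int, peak_hour_share: float, avg_session_s: float) -> float:
    """Estimate concurrent users during the busiest hour via Little's Law (L = λW).

    peak_hour_share: fraction of the day's visitors arriving in the busiest hour.
    avg_session_s: average time a visitor stays active, in seconds.
    """
    arrivals_per_s = daily_visitors * peak_hour_share / 3600  # λ, visitors per second
    return arrivals_per_s * avg_session_s  # L = λ × W


if __name__ == "__main__":
    # Hypothetical: 100,000 visitors a day, 10% in the peak hour, 6-minute sessions.
    expected = peak_concurrent_users(100_000, 0.10, 360)
    print(f"expected peak concurrency: about {expected:.0f} users")
    # The same model under a "plausible extreme" of 30x normal traffic.
    print(f"30x surge: about {peak_concurrent_users(3_000_000, 0.10, 360):.0f} users")
```

Each input is a business question (how many people, how bunched up, how long they stay), which is why the stakeholders, not just the testers, need to help pick the numbers.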
Use storytelling to convey the issues. To get stakeholders to take action, says Simo, the design and test teams need to connect to a plausible scenario that might happen, such as a spike in unemployment claims or a news article going viral. “Unless we get a plausible scenario, we are unlikely to get support for spending the money and time to pursue what is likely to be perceived as an unlikely or unreasonable scenario.”
So find out what those decision makers care about and frame the problem as a plausible scenario they understand. Put it in their terms, not the development team’s. One of Simo’s product managers cared chiefly about revenue and liability. “If I failed to connect to revenue and liability, he didn’t care to hear what I had to say,” says Simo. “It’s important for testers to understand the domain they are working in.”
Ideally, test your load limits as much as you can, as far as you can. But don’t do it at the expense of other aspects of testing, such as memory leaks or security vulnerabilities.
By all means, use load testing tools to make your job easier. In fact, it’s a good idea to automate the things that a computer can do well – so that the humans can focus on the tasks that require real human intelligence. This white paper on test automation can give you useful guidelines on where to draw the line.