Today we use the web for almost everything. As the number of users of their web applications keeps growing, developers face performance and scalability issues more than ever. This is also the case here at TOPdesk: while there used to be a small group of people developing performance tests, we now aim for each development team to be able to write and run their own tests. To make this easier for teams who are new to it, we are collecting guidelines and documentation. Here is an introduction to performance testing, with pointers for further reading.
- Which parts to test
- How to model the workload
- How much load to apply
- Where and when to run the tests
1. Which parts to test
For functional automated tests you normally think of writing unit, integration or end-to-end tests. You might also be familiar with the concept of the test pyramid. For performance tests you face a similar choice. You can test small parts of the code, or individual components of your system — for example, one service. You can also test multiple components in integration, or perform tests with browsers at the end-to-end level.
Internal tests / microbenchmarks
For functional testing, it is good practice to write unit tests that verify individual methods or classes. For performance, testing small pieces of code (microbenchmarking) is more difficult and most of the time you are better off focusing on higher-level tests. This is because optimizations from the compiler, virtual machine, hardware etc. work differently when you run a small piece of code in isolation than when you run the whole application. You should only consider microbenchmarks for pieces of code with a high impact on the application’s performance — for example, if you wrote your own algorithm or data structure. Nowadays there is rarely a need for this, as you can easily find third-party libraries that are reliable and document their performance.
If you do decide to write microbenchmarks, there are various tools available depending on the language you use. For Java and other JVM languages you can use JMH. Here is a quick tutorial for JMH which also points to some common pitfalls with microbenchmarking in the JVM. Another option for Java is to add timings to your JUnit tests, which will give you a rough performance estimation. You can do this by integrating Apache JMeter with JUnit. Here is an introduction about how to do that.
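To illustrate the idea outside the JVM, here is a minimal microbenchmark sketch using Python's standard timeit module, which repeats the measured statement many times to smooth out one-off noise such as cache warm-up. The functions and data sizes are made up for the example; a real benchmark would measure your own hot code path.

```python
import timeit

def list_lookup(items, target):
    # Linear scan: O(n) membership test on a list.
    return target in items

def set_lookup(items, target):
    # Hash-based: O(1) average membership test on a set.
    return target in items

data_list = list(range(10_000))
data_set = set(data_list)

# timeit runs each statement many times and reports the total time,
# which is more stable than timing a single call.
t_list = timeit.timeit(lambda: list_lookup(data_list, 9_999), number=1_000)
t_set = timeit.timeit(lambda: set_lookup(data_set, 9_999), number=1_000)

print(f"list lookup: {t_list:.4f}s, set lookup: {t_set:.4f}s")
```

Even this simple harness shows why microbenchmarks are tricky: the result depends on data size, cache effects and what the runtime optimizes, so treat the numbers as relative indications, not absolute truths.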
Component level tests
A more effective way to get started is to test higher-level components of your application for performance. Such components are, for instance, (micro)services or the application’s tiers. Testing individual components will not show how well the whole application performs, but it can help you quickly pinpoint the most obvious problems.
API level testing: For services with an HTTP-based API, there is a wide choice of tools and testing is relatively easy. One of the most popular tools, which we are also using, is Apache JMeter. For further reading, this article makes a nice comparison of more tools for API performance testing.
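At its core, what such tools do is fire concurrent requests and collect latency statistics. The sketch below shows that idea in Python with only the standard library; `request_fn` is a hypothetical callable standing in for an HTTP GET against the endpoint under test, so the harness stays testable without a live server.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def measure_latencies(request_fn, n_requests=100, concurrency=10):
    """Fire n_requests via request_fn from a thread pool and
    return latency statistics in milliseconds."""
    def timed_call(_):
        start = time.perf_counter()
        request_fn()  # in practice: an HTTP GET against the endpoint
        return (time.perf_counter() - start) * 1000

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(n_requests)))

    return {
        "median_ms": statistics.median(latencies),
        "p95_ms": latencies[int(len(latencies) * 0.95) - 1],
        "max_ms": latencies[-1],
    }

# Demo with a stub that simulates a ~2 ms request.
stats = measure_latencies(lambda: time.sleep(0.002),
                          n_requests=40, concurrency=8)
print(stats["median_ms"], stats["p95_ms"])
```

Reporting percentiles rather than averages matters: a handful of slow outliers can be invisible in the mean but very visible to your users.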
Testing front-end components: In the past years, many web applications have shifted the rendering work from the server to the client side. If your application also has complex front-end components, you might want to look at their performance. There are various tools to help improve your front-end’s performance: some of them included in modern browsers, others available online (for instance, https://www.webpagetest.org). For automated front-end performance tests, two options are Google’s PageSpeed Insights API and YSlow for PhantomJS. Another interesting read is this article describing BrowserLab, a system developed at Facebook for client-side performance testing; among other things, it explains an approach for testing front-end components in isolation.
Integration & end-to-end tests
An integration test involves multiple components / services that work together, or a single service together with its data store. When your system has a large number of components (like in a micro-service architecture), the communication overhead might cause more performance issues than the components themselves. This is why it is important to also test components in integration. To take the analysis one step further, you can measure the number of requests made among components. This is described by Dynatrace as dynamic architecture validation.
An end-to-end test gives a better idea of how the whole system performs and can show whether the components interact as expected. Such a test, however, also involves a more complex software and hardware infrastructure: web server(s), browsers, controllers, multiple physical machines etc. All of these parts and the latencies among them will influence the timings of your tests, sometimes producing variations that are hard to track down. You should therefore keep in mind that the results of an end-to-end test will only be an estimation of the performance your application will have in production.
Selenium is the most widely used system for end-to-end testing, and you can set it up for performance tests. The Selenium community does warn as well about the pitfalls of end-to-end performance testing. There are several commercial solutions for Selenium-based performance testing, but you can also configure a test environment yourself — by using Selenium Grid to run a large number of browsers.
If possible, it is better to use a headless browser like PhantomJS with Selenium for performance / scalability tests. Even if a headless browser does not behave exactly like a full-fledged one, it does consume significantly fewer system resources. You will thus be able to launch a larger number of browser instances in your test environment, which is important if you are running a scalability test.
Performance tests are expensive to write and deploy, so you cannot cover as much of your code with them as with functional tests. You will need to choose the parts of your system with the highest performance risks. Think of services that are computation-intensive or highly used, components that communicate a lot, large data stores etc. Instead of a testing pyramid, in most cases you want to aim for a diamond-like scheme with:
- a large number of integration and component-level tests (will give you reasonably precise and reproducible estimations of how components perform and how they interact)
- a small number of end-to-end tests (will give you a rough indication of how your system performs overall)
- if needed, a small number of microbenchmarks / internal tests for parts of your code with high performance impact
2. How to model the workload
Once you have chosen a part of your application to test for performance, the next step is to set up the workload you will apply in the test.
An often-used approach is to generate the workload yourself — this is called a “synthetic workload”. If the component you are testing has multiple functionalities, you need to decide which (combination) of them to test. For example, for an API you will choose the endpoints you want to test; for an end-to-end test, you will choose user actions such as logging in or searching for an item in the web shop.
To start quickly, choose the most important functionalities / endpoints and test them separately. To move towards more realistic scenarios, you should consider how the application is normally used — for instance, a user first logs into a web shop, then searches for a product, then purchases it. If your application is already running in production, a good way to find the usage patterns is to analyse the logs or use a monitoring tool. This article gives some guidelines about how to model workload for web applications.
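A simple way to turn such usage patterns into a synthetic workload is a weighted random mix of scenarios. Here is a small sketch; the scenario names and percentages are hypothetical and would in practice come from your production logs or monitoring data.

```python
import random

# Hypothetical usage mix, e.g. derived from production logs:
# 60% searches, 30% product views, 10% purchases.
SCENARIO_WEIGHTS = {
    "search": 0.6,
    "view_product": 0.3,
    "purchase": 0.1,
}

def pick_scenarios(n, rng):
    """Draw n user scenarios according to the observed usage mix."""
    names = list(SCENARIO_WEIGHTS)
    weights = list(SCENARIO_WEIGHTS.values())
    return rng.choices(names, weights=weights, k=n)

# A fixed seed makes the generated workload repeatable across test runs.
workload = pick_scenarios(1000, random.Random(42))
print(workload.count("search") / len(workload))  # should be roughly 0.6
```

Seeding the random generator is worth the extra parameter: it makes the workload reproducible, so two test runs can be compared fairly.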
Instead of generating the workload yourself, there is also the option to use actual production workload. This method spares you from having to guess what a realistic workload would be, but it comes with its own difficulties.
Live traffic: One way to apply production workload in a performance test is to capture live HTTP traffic and send it to your test environment. An open source tool for this purpose is GoReplay. Using live HTTP traffic, however, requires a more complex infrastructure and has the disadvantage that you cannot repeat the test later with the same workload. A repeatable test helps you detect changes in your application’s performance across versions, and can also be used to determine long-term trends.
Log replay: The other way to apply production workload in performance tests is to collect logs from production servers and replay them offline. This requires less infrastructure to set up and, like a synthetic workload, allows for repeatable tests. On the other hand, a web server log will most likely not contain all the information you need for replaying the workload. For instance, web servers usually don’t log POST data as it might be too large and/or contain sensitive information. You would thus either have to limit your tests to GET requests, or find a workaround for handling POST requests. If you choose to replay web server logs, you can use JMeter for this purpose. These articles show ways to do it for Apache and IIS servers.
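As a minimal sketch of the log-replay idea, the snippet below extracts the GET request paths from access-log lines in the Apache common log format, so they could be replayed against a test server. The sample lines are invented for the example.

```python
import re

# Matches the request part of an Apache common-log-format line,
# e.g. "GET /index.html HTTP/1.1".
LOG_PATTERN = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+"')

def extract_get_requests(log_lines):
    """Return the paths of GET requests found in access-log lines,
    in their original order, ready to be replayed."""
    paths = []
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match and match.group("method") == "GET":
            paths.append(match.group("path"))
    return paths

sample = [
    '127.0.0.1 - - [10/Oct/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '127.0.0.1 - - [10/Oct/2024:13:55:40 +0000] "POST /login HTTP/1.1" 302 0',
]
print(extract_get_requests(sample))  # ['/index.html']
```

Note how the POST request is dropped, illustrating the limitation discussed above: the log does not contain the POST body, so the request cannot be faithfully replayed.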
To start small, begin with either:
- a simple synthetic workload to test separate operations / endpoints
- log replay for GET requests if they make up most of the production workload
If you have more time to invest, design realistic synthetic scenarios or find a good way to reproduce your whole production workload.
3. How much load to apply
An obvious question when you plan a performance test is how much load to put on the system. This usually refers to the number of requests per minute or hour, but you should also think about other aspects that influence the system’s performance. An important one is the size of the data store you use in the testing environment.
A good starting point is a load test, which means testing the system under usual load conditions. Again, if your application is running in production, the best way to determine the usual load conditions is to monitor it. Otherwise, try to estimate the expected load or, where applicable, discuss it with the customer. Either way, make sure you take into account what the peak load on your system would be. Think of scenarios like all the employees of a company logging in early in the morning, or a large number of people shopping online during a sale.
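One simple way to turn such a scenario into a concrete load number is Little's Law: average concurrency equals the arrival rate multiplied by the time each user spends in the system. The figures below are made up for illustration.

```python
def concurrent_users(arrival_rate_per_sec, avg_session_sec):
    """Little's Law: average concurrency = arrival rate x time in system."""
    return arrival_rate_per_sec * avg_session_sec

# Hypothetical morning peak: 2 logins per second, each session
# occupying the system for about 90 seconds.
peak = concurrent_users(arrival_rate_per_sec=2.0, avg_session_sec=90.0)
print(peak)  # 180.0 concurrent users to size the load test for
```

Even a back-of-the-envelope number like this is useful: it tells you roughly how many simulated users your test harness needs to sustain.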
Stress & soak tests
To take your performance tests further, you can also consider stress tests. These are tests that expose the application to higher loads than expected, or have it run with fewer resources than normal. The purpose of stress tests is to check whether the application is able to recover from abnormal situations like a temporary peak in load or the failure of some hardware resources. A stress test might also reveal other unexpected behaviors or bugs in your application that do not occur under normal load.
A particularly useful type of stress test is the spike test, which applies bursts of high load to the system. A spike test can show how your system behaves when exposed to a very sudden increase in load, for instance right after it was mentioned in the news or social media. Here is an article showing different spike test scenarios.
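A spike scenario can be described as a simple load profile: a target request rate for every second of the test, with a burst in the middle. Here is a hedged sketch; the durations and rates are arbitrary example values that a load generator would then follow.

```python
def spike_profile(duration_sec, base_rps, spike_rps, spike_start, spike_end):
    """Target requests-per-second for each second of the test:
    a steady baseline with one burst between spike_start and spike_end."""
    return [
        spike_rps if spike_start <= t < spike_end else base_rps
        for t in range(duration_sec)
    ]

# Hypothetical scenario: 5 minutes at 10 rps, with a
# 60-second burst at 200 rps starting at the 2-minute mark.
profile = spike_profile(duration_sec=300, base_rps=10,
                        spike_rps=200, spike_start=120, spike_end=180)
print(profile[0], max(profile))  # 10 200
```

The interesting part of the test is not the burst itself but the seconds after it: does the system return to baseline latencies, or does it stay degraded?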
To stress test the communication between services (or between client and server side), you can use special tools to simulate network problems like high latency or packet loss in your test environment. On Linux this can be achieved with the Network Emulation (netem) kernel component. This blog post shows how to use netem in a test environment.
A type of test that gives insight into your application’s stability is the soak test. This is a test you run for a long period of time (a few days), to check if the application is leaking resources — for instance, memory.
Begin with measuring or estimating the production load, and make load tests based on it. If you have more time to invest, think about what your highest risks are (load spikes? network outages? memory leaks?) and try to cover them with stress or soak tests.
4. Where and when to run the tests
For the “where” part of the question, you should have dedicated (virtual) machines for the test. This is to avoid interference with other applications. For (end-to-end) tests with automated browsers, keep in mind that the browsers consume quite a lot of resources themselves. You should allocate enough machines to run the browsers, otherwise they will become the bottleneck of your test instead of the servers.
It is also a good idea to monitor the machines involved in the test. This will show when a resource problem on the machines (like a network connectivity issue, or too little memory or disk space) is causing the tests to perform poorly or even to fail. Moreover, monitoring can also show how many resources your application is consuming and help you detect various leaks: memory, files or network connections that are not closed properly etc. A free stack of monitoring tools that we have had good experience with is Telegraf / InfluxDB / Grafana. Here is an article that presents more options for monitoring.
For the “when” part of the question, the best answer is “regularly”. This will help you identify performance regressions in your code. To identify regressions, the first step is establishing a performance baseline. You can then compare subsequent runs of the test with this baseline and also identify long-term trends in your performance results.
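Once you have a baseline, regression detection can be as simple as comparing each endpoint's current numbers against it with a tolerance margin. A minimal sketch, assuming median latencies per endpoint and an arbitrary 15% tolerance:

```python
def check_regression(baseline_ms, current_ms, tolerance=0.15):
    """Flag endpoints whose median latency got more than `tolerance`
    (e.g. 15%) slower than the recorded baseline."""
    regressions = {}
    for endpoint, base in baseline_ms.items():
        current = current_ms.get(endpoint)
        if current is not None and current > base * (1 + tolerance):
            regressions[endpoint] = (base, current)
    return regressions

# Hypothetical numbers: /search got 25% slower, /login is stable.
baseline = {"/search": 120.0, "/login": 80.0}
current = {"/search": 150.0, "/login": 82.0}
print(check_regression(baseline, current))  # {'/search': (120.0, 150.0)}
```

The tolerance is there because performance results are noisy; without it, normal run-to-run variation would constantly raise false alarms.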
If you have a continuous delivery process in place, you should aim to add the performance tests to your pipeline. If some of the tests consume a lot of resources or take a long time to run, you can choose to run them less often. For further reading, here are suggestions for integrating performance tests in your continuous delivery pipeline.
Last but not least, if you have the performance tests running automatically you should also be able to generate reports automatically. For JMeter there is a plugin with basic reporting functionality. There are also commercial solutions with more advanced reporting capabilities, but sometimes even a simple script can be good enough.
One thing to keep in mind for reporting is that you should gather not only the performance-related results: you should also report whether the test completed correctly. You will typically get a lot of logging data from a performance test and it is very easy to overlook errors hidden in the log that cause misleading performance results. Make sure to collect the functional errors from your test logs and include them in your reports — it will make troubleshooting much easier.
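The danger is that failed requests often return quickly, which makes a broken run look fast. A small sketch of separating errors from timings before trusting the numbers; the sample format (latency, HTTP status) and the 1% threshold are assumptions for the example.

```python
def summarize_results(samples, max_error_rate=0.01):
    """Split timing samples from errors and decide whether the run
    is trustworthy. Each sample is a (latency_ms, http_status) pair."""
    errors = [s for s in samples if s[1] >= 400]
    error_rate = len(errors) / len(samples)
    return {
        "error_rate": error_rate,
        # Refuse to treat the run as valid if too many requests failed,
        # since error responses distort the latency statistics.
        "valid_run": error_rate <= max_error_rate,
        "errors": errors,
    }

# Hypothetical run: 98 good responses plus two fast server errors.
samples = [(110.0, 200)] * 98 + [(5.0, 500), (6.0, 503)]
report = summarize_results(samples)
print(report["error_rate"], report["valid_run"])  # 0.02 False
```

Surfacing `valid_run` prominently in the report prevents a regression hunt over numbers that were never comparable in the first place.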
- run performance tests on a dedicated (and monitored) infrastructure
- automate your performance testing process as much as you can
- have a method to identify performance regressions
- make sure errors are reported properly and visible enough
Thanks to Robbert Jan Grootjans and Arvind Ganga for providing very helpful feedback on this post.