ApacheBench and the Story of a Horrid Bugfix

The problem: some time ago, while working on a project, we had to implement a rather complicated bugfix. To make sure our database data was up to date, the bugfix had to open multiple concurrent connections to a heavy external service. This made us worry about the application’s performance, so we used ApacheBench to measure response times.

We added a feature toggle. When enabled, it made connections to the external service; when disabled, it did no validation and just used the current data from the application’s database. We had to decide whether the bugfix was worth enabling in production or whether it would simply be too slow. To do that, we needed to measure how much the bugfix would slow down the whole system.
 

Why ApacheBench?

Testing with cURL or a browser REST client would only show how quickly an endpoint responds to a single request. What we were really interested in were the response times when thousands of requests arrive, many of them concurrently, as this was a very frequently requested endpoint.
To compare response times, we chose ApacheBench (ab). It’s a command-line tool for measuring the performance of HTTP web servers. It’s simple to install and use, and since we only needed to measure response times, a simple program was enough for our purposes.
To use it, you just need to install the Apache httpd utilities. The installation depends on your OS; for example, ab ships in the apache2-utils package on Debian and Ubuntu and in httpd-tools on Fedora and RHEL.
Other tools might be more sophisticated, but ab is simple, quick and good for testing a single endpoint, which was exactly what we were about to do.
With ApacheBench, you can send a large number of requests, several of them at a time, and measure the request times.
 

Let’s take a look at the following command:

ab -n 5000 -c 20 "http://foo:8080/bar"

It performs 5000 requests (-n 5000), keeping up to 20 of them in flight at a time (-c 20), against the endpoint http://foo:8080/bar.
Example output for the command above:

Server Software:        foobar/1.0.0-SNAPSHOT
Server Hostname:        foo
Server Port:            8080
Document Path:          /bar
Document Length:        285 bytes
Concurrency Level:      20
Time taken for tests:   13.049 seconds
Complete requests:      5000
Failed requests:        0
Write errors:           0
Total transferred:      2355000 bytes
HTML transferred:       1425000 bytes
Requests per second:    383.17 [#/sec] (mean)
Time per request:       52.196 [ms] (mean)
Time per request:       2.610 [ms] (mean, across all concurrent requests)
Transfer rate:          176.24 [Kbytes/sec] received
Connection Times (ms)
               min  mean[+/-sd] median   max
Connect:        0    1   0.2      1       7
Processing:    14   51  20.9     48     288
Waiting:       14   51  20.8     48     288
Total:         15   52  20.9     48     289
Percentage of the requests served within a certain time (ms)
 50% 48
 66% 55
 75% 60
 80% 64
 90% 76
 95% 88
 98% 106
 99% 126
100% 289 (longest request)

As you can see, ApacheBench gives you quite a lot of information. The output begins with some basic data such as the hostname, the port, the document path and the document length. “Time taken for tests” tells us how long it took to complete all 5000 requests. The report also shows whether all of the requests completed and, if not, how many failed and what kind of errors occurred.
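The summary figures in the report are related to each other by simple arithmetic. The sketch below recomputes them from the values in the example output; the variable names are ours, and the formulas are our reading of how the reported numbers fit together rather than a quote from the ab source.

```python
# Figures copied from the example report above.
complete_requests = 5000
concurrency = 20
time_taken_s = 13.049
total_transferred_bytes = 2355000

# Throughput: completed requests divided by wall-clock time.
requests_per_second = complete_requests / time_taken_s

# Mean time per request as experienced by one of the 20 concurrent slots.
time_per_request_ms = concurrency * time_taken_s * 1000 / complete_requests

# Mean time per request across all concurrent requests:
# wall-clock time divided by the number of requests.
time_per_request_all_ms = time_taken_s * 1000 / complete_requests

# The transfer rate is reported in Kbytes (1024 bytes) per second.
transfer_rate_kb_s = total_transferred_bytes / 1024 / time_taken_s

print(f"Requests per second: {requests_per_second:.2f}")      # report: 383.17
print(f"Time per request:    {time_per_request_ms:.3f} ms")     # report: 52.196
print(f"Across all requests: {time_per_request_all_ms:.3f} ms") # report: 2.610
print(f"Transfer rate:       {transfer_rate_kb_s:.2f} KB/s")    # report: 176.24
```

Note that the two “Time per request” lines differ exactly by the concurrency level: 52.196 ms is 20 times 2.610 ms (up to rounding).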
 
We also get connection times (minimum, mean with standard deviation, median and maximum) for each phase of a request:

  1. Connect – the waiting time for connection to the server (in the case of keep-alive connections, this may be zero)
  2. Processing – the time measured after connecting to the server until the end of the download of the entire document and connection close
  3. Waiting – the time from sending a request to the server to receiving the first bytes of the server response

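The last part of the report, “Percentage of the requests served within a certain time”, lists response-time percentiles. In the example above, 95% of the requests were served within 88 ms.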
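If you run ab repeatedly and want to compare percentiles across runs, it can help to extract them from the text report. Below is a minimal sketch that does this for the percentile section of the example output; the function name parse_percentiles is our own, and the regular expression assumes the layout shown above.

```python
import re

# Percentile section copied from the example report above.
REPORT = """\
Percentage of the requests served within a certain time (ms)
 50% 48
 66% 55
 75% 60
 80% 64
 90% 76
 95% 88
 98% 106
 99% 126
100% 289 (longest request)
"""


def parse_percentiles(report: str) -> dict[int, int]:
    """Map each percentile (e.g. 95) to its response time in milliseconds."""
    # A percentile line is: optional indentation, digits, '%', whitespace, digits.
    pattern = re.compile(r"^[ \t]*(\d+)%[ \t]+(\d+)", re.MULTILINE)
    return {int(pct): int(ms) for pct, ms in pattern.findall(report)}


percentiles = parse_percentiles(REPORT)
print(percentiles[95])   # 88
print(percentiles[100])  # 289
```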
 

Testing scenario

We decided to run 5000 requests at a concurrency level of 20 in several variants. For each request, the application had to make 0 (feature flag disabled), 5, 10, 20 or 50 external service connections. What interested us the most was how increasing the number of concurrent external service calls affected the 95th percentile of response times. We expected it to be < 1000 ms in each case, even under a load of thousands of requests issued concurrently.
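Each variant used the same ab invocation, so it is convenient to script it. The sketch below only covers the ab side: how the number of external connections was switched between variants is specific to the application and is not shown. The helper names build_ab_command and run_ab are hypothetical, and run_ab needs ab to be available on the PATH.

```python
import shlex
import subprocess


def build_ab_command(url: str, requests: int = 5000, concurrency: int = 20) -> list[str]:
    """Build the ab command line: -n is the total number of requests,
    -c is how many requests are performed at a time."""
    return ["ab", "-n", str(requests), "-c", str(concurrency), url]


def run_ab(url: str) -> str:
    """Run ab and return its text report (requires ab to be installed)."""
    result = subprocess.run(
        build_ab_command(url), capture_output=True, text=True, check=True
    )
    return result.stdout


# Show the command that would be executed for the endpoint used in this article.
command = build_ab_command("http://foo:8080/bar")
print(shlex.join(command))  # ab -n 5000 -c 20 http://foo:8080/bar
```

Calling run_ab once per variant and saving each report makes it easy to compare the runs afterwards.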
 

Analysis of results

The results for the 95th percentile of request times, depending on the number of external service connections, were as follows:
0 connections: 88 ms
5 connections: 431 ms
10 connections: 720 ms
20 connections: 1067 ms
50 connections: 1412 ms
The results showed that, with the feature toggle enabled, responses for complicated data were delivered up to 16x slower (1412 ms versus 88 ms at the 95th percentile), which makes a big difference. The 20- and 50-connection variants also exceeded the 1000 ms limit we had hoped to stay under. Processing time increased significantly with the number of calls to the external service, though not linearly. The difference between 20 and 50 connections was relatively small, while the difference between 0 and 20 connections was larger than we had expected.
This put us in a situation where we had to wonder whether adding the bugfix was actually worth it if it slowed down the server response by an order of magnitude. The decision was up to the managers. Ultimately, they had to decide what was more important: presenting data that was always up to date, or a well-performing endpoint.
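To make the non-linear growth concrete, the following sketch computes the slowdown factor for each variant and the extra time per additional connection between consecutive variants. It uses only the 95th-percentile figures listed above; the variable names are ours.

```python
# 95th-percentile response times (ms) from the results above,
# keyed by the number of external service connections.
p95_ms = {0: 88, 5: 431, 10: 720, 20: 1067, 50: 1412}

# Slowdown relative to the variant with the feature toggle disabled.
baseline = p95_ms[0]
slowdown = {n: ms / baseline for n, ms in p95_ms.items()}

# Extra milliseconds per additional connection between consecutive variants.
points = sorted(p95_ms.items())
marginal = {
    (n0, n1): (ms1 - ms0) / (n1 - n0)
    for (n0, ms0), (n1, ms1) in zip(points, points[1:])
}

for n, factor in slowdown.items():
    print(f"{n:>2} connections: {factor:.1f}x")
for (n0, n1), cost in marginal.items():
    print(f"{n0:>2} -> {n1:>2}: {cost:.1f} ms per extra connection")
```

The cost of each additional connection drops from about 69 ms (0 to 5 connections) to about 12 ms (20 to 50 connections), which is why the curve flattens even though the overall slowdown reaches 16x.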

Conclusion

The ApacheBench tool is very simple, easy to install and, above all, suitable for measuring response times of a specific endpoint.
For more complex use cases, where the testing scenario involves multiple endpoints or more output data is needed, you can try more advanced tools such as Apache JMeter, Tsung or Gatling. However, in our case, ab gave us all the information necessary to make an important decision about the software. In situations similar to the use case described in this article, it’s a perfect tool to investigate a specific problem and get valuable feedback very quickly and easily.