Load Testing Elasticsearch Using Python asyncio and the Slow Log

Over the past couple of days I’ve been reading over Yeray Diaz’s wonderful blog posts on python3 asyncio (AsyncIO for the Working Python Developer and Asyncio Coroutine Patterns: Beyond await) and I decided to see if I could come up with some sort of Elasticsearch load testing framework.

Soon after directing my attention to Elasticsearch I ran across a neat idea in Florian Hopf’s blog post Logging Requests to Elasticsearch: If you set all of the slow log thresholds to 0s then you will log all of the traffic coming through elasticsearch in its entirety.

index.search.slowlog.threshold.query.debug: 0s
index.search.slowlog.threshold.fetch.debug: 0s
index.indexing.slowlog.threshold.index.debug: 0s

So I came up with a goal for learning asyncio and making something useful in the process:

Make an asyncio-powered Elasticsearch load test utility that consumes production traffic from the slow log and plays it back as requests to a test cluster

And after bumping my head on the desk for a few days, I actually built it! Check it out! Here is the script on github. And here’s how you run it:

$ source activate your_python_3.7_environment
$ python load_tester.py --log_file='~/load_testing/machine_1_elasticsearch_slowlog.log' --host='qa-core-es3' --port=9200 --speed_multiplier=2 --run_time_minutes=10

Given the specified log file, load_tester.py parses the log lines, pulls out the timestamp and the request. Then each request is sent to the specified host to simulate the production load. One neat thing about this setup is that the simulated load does not just contain the same requests, but they are played at the same relative times as the original traffic. So this has the potential to be a very close approximation to real traffic. What’s more, you have the option to specify a speed_multiplier parameter that controls how fast the logs are played back. Want so see how well your cluster performs under 2x the production traffic? All you have to do is set speed_multiplier=2. The final parameter run_time_minutes tells how many minutes to run the simulation. If your production log runs out before run_time_minutes is up load_tester.py starts over at the beginning of the logs and reruns the log as many times as needed.

Internal Design

The design is composed of 5 classes:

The design is reasonably generic. For instance, the LoadTester object doesn’t know it’s dealing with Elasticsearch, it just receives a stream of Request object and it knows how to send these requests off at the appropriate time to some other specified host/port. So in principle you could make some other request generator in the pattern of ElasticsearchRequestLogParser.

Sample Run and Output

So let’s go ahead an run it.

$ python load_tester.py --log_file='~/load_testing/machine_1_elasticsearch_slowlog.log' --host='qa-core-es3' --port=9200 --speed_multiplier=10 --run_time_minutes=1

At the end of the run load_tester.py prints out some summary information that looks like this:

{
    "run_information": {
        "run_time_minutes": 1.0000119845072428,
        "num_sent_requests": 53720,
        "average_requests_per_second": 895.322603333109,
        "num_outstanding_requests": 33,
        "seconds_behind": 0,
        "percentage_behind": 0.0
    },
    "accumulator_information": {
        "completion_status_counts": {
            "200": 53687
        },
        "average_time_per_successful_request": 4.0092573621174585
    }
}

The run_information tells you some basic information about the run. For instance we can see that in the 1 minute that the script was running we sent over 53K requests to elasticsearch averaging just over 895 requests per second. Not bad right!? At the end of the run there were 33 outstanding requests. seconds_behind represents how many seconds (wall clock time) ago the last request should have been sent out in order to be delivered on time. 0 indicates that at 895 requests per second, the simulation is keeping up just fine! If I crank up the speed_multiplier to some insanely high level then my laptop will max out at about 1267 requests per second on one process.

The accumulator_information is the information collected and summarized by ElasticsearchResponseAccumulator. In this case we see that all the requests have succeeded 100% of the time. But, keep in mind that python only takes advantage of a single cpu core. I can spin up 8 copies of this process and each one will average about 390 requests per second for a total of about 3,500 requests per second. With this load the completion statuses are now a mixture of mostly 200s, a hand full of 503 Service Unavailable, and a few 429 Too Many Requests.

So long as your computer can keep up with the load (e.g. the seconds_behind should remain near 0) then it makes a lot of sense to spin up several runs like this on one machine. In principle each load simulation would be responsible for replaying the log file from one of the production servers to a clone test cluster.

Still To Do

There are plenty of things we can do to improve the current code.

Wanna Help?

If you think this a neat idea, lemme know! I’d love advice on the asyncio, the Elasticsearch, or anything else. Send me pull requests or ping me on Twitter @JnBrymn.

comments powered by Disqus