Performance Testing with Locust
When performance testing comes up, everybody thinks of JMeter first: it undoubtedly remains the best-known tool, with the largest number of plugins. As for me, I have never liked JMeter because of its unfriendly interface and steep learning curve, which you run into every time you need to test something more complicated than a "Hello World" application.
And now, inspired by successful testing on two different projects, I have decided to share some information about a relatively simple and convenient tool: Locust.
What is Locust?
Locust is an open-source load testing tool that lets you describe load scenarios in Python code, supports distributed load generation and, according to its authors, is used to load test Battlelog for the Battlefield game series (which immediately wins you over).
Advantages:
- Simple documentation, including a copy-paste example; you can start testing with only basic programming skills.
- It is built on the requests library ("HTTP for humans"), whose documentation doubles as a detailed reference when debugging tests.
- Python support; I just like this language.
- Thanks to the previous point, tests can be launched on many platforms.
- A built-in Flask web server for presenting test results.
Disadvantages:
- No Capture & Replay — all is done manually.
- Consequently, you need to think. As in the case of using Postman, it is necessary to understand the mechanics of HTTP.
- Minimal programming skills are required.
- The linear load model, which immediately disappoints those who like to generate load “by Gauss”.
Testing process
Any testing is a complex task that requires planning, preparation, monitoring, and analysis of results. With performance testing, it is necessary, where possible, to collect all the data that can influence the result:
- server hardware (CPU, RAM, disk);
- server software (OS, server version, Java, .NET, and other runtimes, the database and the amount of data, server and application logs);
- network bandwidth;
- the presence of proxy servers, load balancers, and DDoS protection;
- performance testing parameters (number of users, average response time, requests per second).
The examples described below can be classified as black-box performance testing: we can measure performance without any information about the application under test and without access to its logs.
Before starting
To try the performance tests in practice, I deployed a simple web server locally; almost all of the following examples are run against it. The server's data is taken from a deployed online example. Node.js is required to launch it.
An obvious spoiler: performance testing experiments are better done locally, without loading online services, to avoid getting banned.
To start you need Python (I use version 3.6 in all examples) and Locust itself (version 0.9.0 at the time of writing). It can be installed using the following command:
python -m pip install locustio
Installation details are described in official documentation.
Example analysis
Further, we need a test file. I have taken the example from the documentation, because it is very simple and clear:
from locust import HttpLocust, TaskSet

def login(l):
    l.client.post("/login", {"username": "ellen_key", "password": "education"})

def logout(l):
    l.client.post("/logout", {"username": "ellen_key", "password": "education"})

def index(l):
    l.client.get("/")

def profile(l):
    l.client.get("/profile")

class UserBehavior(TaskSet):
    tasks = {index: 2, profile: 1}

    def on_start(self):
        login(self)

    def on_stop(self):
        logout(self)

class WebsiteUser(HttpLocust):
    task_set = UserBehavior
    min_wait = 5000
    max_wait = 9000
That is it! That is enough to start the test! Let us analyze the example above before getting down to testing itself.
Skipping the imports at the very beginning, we see two almost identical one-line functions, login and logout. l.client is the HTTP session object that we will use to create the load. We use a POST method almost identical to the one in the requests library; "almost identical" because the first argument here is not a full URL but only its path, i.e. a specific endpoint.
The data is passed as the second argument, and, I must admit, Python dictionaries are very convenient here, as requests serializes them automatically: form-encoded by default, or to JSON when passed via the json argument.
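Since the two modes are easy to confuse, here is a minimal sketch contrasting them with the requests library (base_url and the /login endpoint refer to the local json-server setup above):

import requests as r

base_url = "http://localhost:3000"  # the locally deployed test server

# sent form-encoded (application/x-www-form-urlencoded), as client.post(url, dict) does
r.post(base_url + "/login", data={"username": "ellen_key", "password": "education"})

# sent as application/json thanks to the json argument
r.post(base_url + "/login", json={"username": "ellen_key", "password": "education"})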
It is also worth pointing out that we do not process the request result in any way — if it is successful, the results (cookies for instance) will be saved in this session. If an error occurs, it will be recorded and added to the load statistics.
If we want to know whether our request is written correctly, we can check it like this:
import requests as r
response = r.post(base_url + "/login", {"username": "ellen_key", "password": "education"})
print(response.status_code)
The only thing I have added is the base_url variable, which must contain the full address of the tested resource.
The next several functions are the requests that will create the load. Once again, we do not need to process the server response; the results will appear in the statistics automatically.
Next comes the UserBehavior class (the class may have any name). As the name suggests, it describes the behavior of a spherical user in the vacuum of the tested application. The tasks property is a dictionary mapping the methods a user calls to their relative call frequency. Although we do not know which functions each user will call, or in what order (they are selected randomly), we do guarantee that index will be called, on average, twice as often as profile.
Apart from describing behavior, the TaskSet parent class lets you define four functions that run before and after the tasks (a skeleton with all four hooks is sketched after the list). The order of calls is as follows:
- setup is called once when UserBehavior (TaskSet) starts; it is not shown in the example.
- on_start is called once by each new simulated user at the beginning of its work.
- tasks is the execution of the tasks themselves.
- on_stop is called once by each user when the test has finished its work.
- teardown is called once when the TaskSet has finished its work; it is also not shown in the example.
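For illustration, a minimal skeleton with all four hooks in place might look like this (the comments are mine; the structure follows the list above):

from locust import HttpLocust, TaskSet, task

class UserBehavior(TaskSet):
    def setup(self):
        # called once, before the first user is spawned
        pass

    def teardown(self):
        # called once, after the test has finished
        pass

    def on_start(self):
        # called once per simulated user, at the start of its work
        pass

    def on_stop(self):
        # called once per simulated user, when the test is stopped
        pass

    @task(1)
    def index(self):
        self.client.get("/")

class WebsiteUser(HttpLocust):
    task_set = UserBehavior
    min_wait = 5000
    max_wait = 9000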
It is worth mentioning that there are two ways to define user behavior: the first, shown in the previous example, defines the functions in advance; the second defines methods inside the UserBehavior class:
from locust import HttpLocust, TaskSet, task
class UserBehavior(TaskSet):
    def on_start(self):
        self.client.post("/login", {"username": "ellen_key", "password": "education"})

    def on_stop(self):
        self.client.post("/logout", {"username": "ellen_key", "password": "education"})

    @task(2)
    def index(self):
        self.client.get("/")

    @task(1)
    def profile(self):
        self.client.get("/profile")

class WebsiteUser(HttpLocust):
    task_set = UserBehavior
    min_wait = 5000
    max_wait = 9000
In this example, the user's functions and their call frequency are set by the task decorator. Functionally, nothing has changed.
The last class in the example is WebsiteUser (the class can have any name). In this class we plug in the behavior model UserBehavior and set the minimum and maximum waiting time between each user's individual task calls. To clarify, this can be visualized in the following way:
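Roughly, one simulated user's life looks like this (an illustrative timeline, not actual Locust output):

on_start()            # login
wait 5-9 s (random)
index() or profile()  # index is chosen about twice as often
wait 5-9 s (random)
index() or profile()
...
on_stop()             # logout, when the test is stopped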
Starting testing
It remains to launch the server to be tested:
json-server --watch sample_server/db.json
Also, let us modify the example file to match the service we are testing. Let us remove login and logout and define the user's behavior:
- on start, open the main page once;
- get the list of all posts (weight 2);
- comment on the first post (weight 1).
from locust import HttpLocust, TaskSet, task

class UserBehavior(TaskSet):
    def on_start(self):
        self.client.get("/")

    @task(2)
    def posts(self):
        self.client.get("/posts")

    @task(1)
    def comment(self):
        data = {
            "postId": 1,
            "name": "my comment",
            "email": "test@user.habr",
            "body": "Author is cool. Some text. Hello world!"
        }
        self.client.post("/comments", data)

class WebsiteUser(HttpLocust):
    task_set = UserBehavior
    min_wait = 1000
    max_wait = 2000
To launch the test, run the following command in the console:
locust -f my_locust_file.py --host=http://localhost:3000
where host is the address of the tested resource; the endpoint paths specified in the test will be appended to it.
If there are no errors in the test, the load server will start and will be accessible at http://localhost:8089/
As you can see, the server under test is shown, and the paths from the test file will be appended to exactly that URL.
Here we can also set the number of users to simulate and the rate at which they are spawned per second.
Start the test by clicking on the “Start swarming” button.
Results
After some time, let us stop the test and look at the first results:
- As expected, each of the 10 simulated users opened the main page at the very beginning.
- On average, the post list was opened twice as often as comments were written.
- For every operation there is an average and a median response time, plus the number of requests per second: already useful information that can be compared against the expected performance.
The second tab shows load graphs in real time. If the server falls over under a certain load, or its behavior changes, the graphs will show it immediately.
The third tab contains errors; in my case they are client errors. If the server returns a 4XX or 5XX error, its text will be recorded here.
If an error happens in your test code, it goes to the Exceptions tab. My most frequent mistake so far is leaving print() calls in the code; that isn't the best logging technique :)
The last tab allows downloading all the test results in CSV format.
Are these results relevant? Let us think about that a little. Most often, performance requirements (if specified at all) sound like this: the average page load time (server response) must be below N seconds under a load of M users, without specifying what the users are supposed to do. And this is what I like Locust for: it simulates the activity of a given number of users who perform the expected user actions in random order.
If you need to run a benchmark test, i.e. to measure the system's behavior under various loads, several behavior classes can be created and several tests run under different loads, for example as sketched below.
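For instance, a read-heavy and a write-heavy user type can be mixed in one test via the weight attribute of the Locust classes (a sketch against the same json-server; the 3:1 ratio is arbitrary):

from locust import HttpLocust, TaskSet, task

class ReadBehavior(TaskSet):
    @task(1)
    def posts(self):
        self.client.get("/posts")

class WriteBehavior(TaskSet):
    @task(1)
    def create_post(self):
        self.client.post("/posts", json={"userId": 1, "title": "t", "body": "b"})

class ReaderUser(HttpLocust):
    task_set = ReadBehavior
    weight = 3  # on average, three readers for every writer
    min_wait = 1000
    max_wait = 2000

class WriterUser(HttpLocust):
    task_set = WriteBehavior
    weight = 1
    min_wait = 1000
    max_wait = 2000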
The benefits of writing load tests in Python
Server response handling
Sometimes in performance testing it is not enough to simply get a 200 OK from the HTTP server; you also need to check the contents of the response to make sure that under load the server returns correct data or performs calculations correctly. For exactly these cases, Locust allows customizing the criteria for a successful response. Consider the following example:
from locust import HttpLocust, TaskSet, task
import random as rnd

class UserBehavior(TaskSet):
    @task(1)
    def check_albums(self):
        photo_id = rnd.randint(1, 5000)
        with self.client.get(f'/photos/{photo_id}', catch_response=True, name='/photos/[id]') as response:
            if response.status_code == 200:
                album_id = response.json().get('albumId')
                if album_id % 10 != 0:
                    response.success()
                else:
                    response.failure(f'album id cannot be {album_id}')
            else:
                response.failure(f'status code is {response.status_code}')

class WebsiteUser(HttpLocust):
    task_set = UserBehavior
    min_wait = 1000
    max_wait = 2000
The example above contains a single request that creates load according to the following scenario: photo objects with a random id between 1 and 5000 are requested from the server, and the albumId in each object is checked on the assumption that it must not be divisible by 10.
Several explanations seem appropriate here:
- the with request() as response: construction can be replaced with response = request() if you simply want to work with the response object;
- the URL is built with an f-string, a string formatting syntax added in Python 3.6: f'/photos/{photo_id}'. This construction does not exist in earlier versions!
- the new argument catch_response=True tells Locust that we will decide ourselves whether the server response counts as successful. Without it we still receive the response object and can process its data, but we cannot override the test result. A detailed example is provided further on;
- one more argument, name='/photos/[id]', is needed to group requests in the statistics. Any text can be used as the name, so we don't have to repeat the URL. Without it, every request with a unique address or parameters would be recorded as a separate statistics entry.
The name argument also enables another trick: sometimes one endpoint executes different logic depending on its parameters (for example, on the content of POST requests). So that the test results don't get mixed together, you can write several tasks and give each its own name argument, for example:
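A sketch of that idea against the /comments endpoint used earlier (the payloads are made up for illustration):

class UserBehavior(TaskSet):
    @task(1)
    def short_comment(self):
        data = {"postId": 1, "name": "short", "email": "test@user.habr", "body": "short text"}
        self.client.post("/comments", data, name="comment: short")

    @task(1)
    def long_comment(self):
        data = {"postId": 1, "name": "long", "email": "test@user.habr",
                "body": "a much longer comment body, which the backend may treat differently"}
        self.client.post("/comments", data, name="comment: long")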
Then we perform the checks; I have two of them. First I check whether the server returned a response at all: if response.status_code == 200:
If the response is correct, I check whether the album id is divisible by 10; if it is not, the response is marked as successful: response.success().
In the other cases I record the reason for the failure: response.failure('error text'). This text is displayed on the Failures page during test execution.
Attentive readers may have noticed the absence of exception handling, which is normally essential in code working with network interfaces. In fact, in case of a timeout, connection error, or other unexpected exception, Locust handles it itself and returns a response object anyway, setting its status code to 0.
If the code itself raises an exception, it is recorded in the Exceptions tab during execution so that we can inspect it. The most typical situation: the JSON response does not contain the expected value, but we are already performing operations on it. A defensive variant of the album check is sketched below.
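This sketch builds on the example above; the status code 0 branch and the JSON guard are the additions:

@task(1)
def check_albums(self):
    photo_id = rnd.randint(1, 5000)
    with self.client.get(f'/photos/{photo_id}', catch_response=True, name='/photos/[id]') as response:
        if response.status_code == 0:
            # Locust caught a network-level problem (timeout, connection error, ...)
            response.failure('network error')
            return
        try:
            album_id = response.json().get('albumId')
        except ValueError:
            # the body is not valid JSON, so stop before operating on it
            response.failure('response body is not valid JSON')
            return
        response.success()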
Before moving on, I'd like to point out that I use json-server in the examples because it makes responses easy to handle. Nevertheless, you can work the same way with HTML, XML, FormData, attached files, and any other data used by HTTP-based protocols.
Working with complicated scenarios
Almost every time a web application undergoes load testing, it quickly becomes clear that GET endpoints, which simply return data, are not enough to cover everything properly.
A classic example: to test an online shop, a user should
- open the main page of the shop,
- search for goods,
- open the details of a product,
- add a product to the cart, and
- pay.
It is clear from this example that the endpoints cannot be called in random order; they only make sense as a sequence. Moreover, the product, the cart, and the payment method may all have identifiers unique to each user.
With small updates, the previous example lets us test such a scenario easily. Adapting it to our test server:
- A user writes a new post.
- A user writes a comment to the new post.
- A user reads that comment.
from locust import HttpLocust, TaskSet, task

class FlowException(Exception):
    pass

class UserBehavior(TaskSet):
    @task(1)
    def check_flow(self):
        # step 1
        new_post = {'userId': 1, 'title': 'my shiny new post', 'body': 'hello everybody'}
        post_response = self.client.post('/posts', json=new_post)
        if post_response.status_code != 201:
            raise FlowException('post not created')
        post_id = post_response.json().get('id')

        # step 2
        new_comment = {
            "postId": post_id,
            "name": "my comment",
            "email": "test@user.habr",
            "body": "Author is cool. Some text. Hello world!"
        }
        comment_response = self.client.post('/comments', json=new_comment)
        if comment_response.status_code != 201:
            raise FlowException('comment not created')
        comment_id = comment_response.json().get('id')

        # step 3
        read_response = self.client.get(f'/comments/{comment_id}', name='/comments/[id]')
        if read_response.status_code != 200:
            raise FlowException('comment not read')

class WebsiteUser(HttpLocust):
    task_set = UserBehavior
    min_wait = 1000
    max_wait = 2000
In this example I have added a new FlowException class. If any step completes unexpectedly, I raise it to terminate the scenario: if a post cannot be created, there is nothing to comment on, and so on. The construction could be replaced with a usual return, but then, during execution and results analysis, the Exceptions tab would not show clearly where exactly the scenario failed. This is also the reason why I don't use a try… except construction here.
Making the load realistic
One can argue that in the shop example everything really is linear, while the posts-and-comments example is rather far-fetched: posts are read at least 10 times as often as they are created. That is a fair observation, so let us bring the example closer to real life. There are at least two approaches:
- "hardcode" the list of posts users read and simplify the test code, if that is possible and if the backend functionality doesn't depend on which specific posts are read;
- save the created posts and read those back, if the list of posts cannot be specified in advance, or if load realism critically depends on which posts are read and which are not (I removed commenting from the example to keep the code smaller and clearer):
from locust import HttpLocust, TaskSet, task
import random as r

class UserBehavior(TaskSet):
    created_posts = []

    @task(1)
    def create_post(self):
        new_post = {'userId': 1, 'title': 'my shiny new post', 'body': 'hello everybody'}
        post_response = self.client.post('/posts', json=new_post)
        if post_response.status_code != 201:
            return
        post_id = post_response.json().get('id')
        self.created_posts.append(post_id)

    @task(10)
    def read_post(self):
        if len(self.created_posts) == 0:
            return
        post_id = r.choice(self.created_posts)
        self.client.get(f'/posts/{post_id}', name='read post')

class WebsiteUser(HttpLocust):
    task_set = UserBehavior
    min_wait = 1000
    max_wait = 2000
I created the created_posts list in the UserBehavior class. Note that it is a class attribute, not something created in the __init__() constructor; therefore, unlike user sessions, this list is shared by all users. The first task creates a post and records its id in the list. The second task, 10 times more frequent, reads a randomly selected post from the list; its extra precondition is that at least one post has already been created.
If each user needs to operate on their own data, it can be defined in the constructor like this:
class UserBehavior(TaskSet):
    def __init__(self, parent):
        super(UserBehavior, self).__init__(parent)
        self.created_posts = list()
Additional functionality
To launch tasks sequentially, the official documentation suggests the @seq_task(n) decorator, which takes the task's order number as its argument:
class MyTaskSequence(TaskSequence):
    @seq_task(1)
    def first_task(self):
        pass

    @seq_task(2)
    def second_task(self):
        pass

    @seq_task(3)
    @task(10)
    def third_task(self):
        pass
In the example above, each user executes first_task first, then executes second_task, and after that third_task 10 times.
To be honest, I quite like this feature, but unlike the previous examples, it is not obvious how to pass the result of one task to the next when necessary.
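One workaround of my own (not something from the documentation) is to keep intermediate results on self, since the same TaskSet instance executes all the steps for one user:

from locust import HttpLocust, TaskSequence, seq_task

class PostFlow(TaskSequence):
    @seq_task(1)
    def create_post(self):
        new_post = {'userId': 1, 'title': 'my shiny new post', 'body': 'hello everybody'}
        response = self.client.post('/posts', json=new_post)
        # stash the result for the next step
        self.post_id = response.json().get('id')

    @seq_task(2)
    def read_post(self):
        self.client.get(f'/posts/{self.post_id}', name='/posts/[id]')

class WebsiteUser(HttpLocust):
    task_set = PostFlow
    min_wait = 1000
    max_wait = 2000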
Another feature is useful for very complicated scenarios: nested task sets, created by defining several TaskSet classes and connecting them.
from locust import HttpLocust, TaskSet, task

class Todo(TaskSet):
    @task(3)
    def index(self):
        self.client.get("/todos")

    @task(1)
    def stop(self):
        self.interrupt()

class UserBehavior(TaskSet):
    tasks = {Todo: 1}

    @task(3)
    def index(self):
        self.client.get("/")

    @task(2)
    def posts(self):
        self.client.get("/posts")

class WebsiteUser(HttpLocust):
    task_set = UserBehavior
    min_wait = 1000
    max_wait = 2000
In the example above, the Todo scenario is launched with a probability of 1 in 6 and keeps running until its stop task fires (a 1-in-4 chance on each iteration), at which point control returns to UserBehavior. self.interrupt() is very important here: without it, the test would get stuck in the subtask.
Locust testing difficulties and how to overcome them
Authorization
When writing my first tests with Locust, I needed to obtain an authorization token from one server and then use it on another for the performance testing itself. The question immediately arose of how to do that, since the tool is geared towards sending all requests to the single resource defined in the console when the test starts. There are several ways to solve the problem:
- disable authorization on the tested resource, if that is possible;
- generate a token in advance and paste it into the test code before launch: the worst option, demanding manual work on every launch, but acceptable in a few rare cases;
- send a request with the requests library and take the token from the response; luckily, the syntax is similar.
I chose the third option. The example below demonstrates different places a token can be extracted from; the header and cookie names are ones a google.com response happens to contain, so, since there is no real token, the values are just placeholders:
from locust import HttpLocust, TaskSet, task
import requests
class UserBehavior(TaskSet):
    def on_start(self):
        response = requests.post("http://mysite.sample.com/login",
                                 {"username": "ellen_key", "password": "education"})
        # get "token" from a response header
        self.client.headers.update({'Authorization': response.headers.get('Date')})
        # get "token" from response cookies
        self.client.cookies.set('Authorization', response.cookies.get('NID'))
        # get "token" from the response body
        self.client.headers.update({'Authorization': str(response.content.decode().find('google'))})
As the example shows, before the work tasks begin, the user sends a request to an external server, processes the response, and puts the data into the session's headers or cookies.
Headers
When working with request headers, several important details must be taken into account. Each individual request can be given its own headers like this:
self.client.post(url='/posts', data='hello world', headers={'hello': 'world'})
When you run this example, the hello header key is added to the session's existing headers, but for this request only; in all subsequent requests it will be absent. To make the header permanent, add it to the session itself:
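# the client is a requests.Session under the hood, so this applies to all subsequent requests
self.client.headers.update({'hello': 'world'})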
Another interesting finding: if a request specifies a header that already exists in the session, the session value is overridden, but only for that request. So don't be afraid of erasing something important.
There are exceptions, though. If we need to send a multipart form, requests automatically builds a Content-Type header containing the form data boundary; if we forcibly overwrite that header via the headers argument, the request will fail, because the form can no longer be processed correctly.
Also note that header values must always be strings. If you try to pass an integer, for example {'aaa': 123}, the request will not be sent and the code will raise an InvalidHeader exception, so cast non-string values explicitly, for example:
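self.client.get('/posts', headers={'X-Request-Count': str(123)})  # hypothetical header; note the explicit str() cast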
Distributed testing
For distributed testing, Locust provides dedicated CLI arguments: --master and --slave, which fix each node's role. A master node does not simulate load itself; it only gathers statistics and coordinates the work. Let us launch the test server and several sessions in distributed mode by executing the following commands in different consoles:
json-server --watch sample_server/db.json
locust -f locust_files\locust_file.py --master --host=http://localhost:3000
locust -f locust_files\locust_file.py --slave --master-host=localhost
locust -f locust_files\locust_file.py --slave --master-host=localhost
Opening Locust in the browser (localhost:8089), you can see in the upper right corner the number of machines that will generate the load.
Testing without UI
When all the tests are written and debugged, it makes sense to include them in automated regression testing and simply check the results periodically. The following command launches a Locust test without the UI:
locust -f locust_files\locust_file.py --host=http://localhost:3000 --no-web -c 10 -r 2 --run-time 1m --csv=test_result
where
- --no-web: run the test without the web UI;
- -c 10: maximum number of simulated users;
- -r 2: number of users spawned per second;
- --run-time 1m: test duration (1 minute);
- --csv=test_result: after the run, two CSV files with the results will be created in the current folder, their names beginning with test_result.
Final facts, findings, and conclusions
Distributed testing can be combined with such headless regression runs: to guarantee that all load-generating nodes have started, add the --expect-slaves=2 argument on the master; the test will then begin only when at least 2 nodes are up.
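For example, a headless master launch combining it with the flags from the previous section might look like this:

locust -f locust_files\locust_file.py --master --host=http://localhost:3000 --no-web -c 10 -r 2 --run-time 1m --csv=test_result --expect-slaves=2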
Several times I have run into a tested resource that worked over HTTPS only, with a certificate generated by the customer that the operating system marked as suspicious. To make the tests work, you can add an argument to every request that skips the security check, for instance:
self.client.get("/posts", verify=False)
To hide the numerous insecure connection warnings, add the following lines near the top of the locustfile:
import urllib3
urllib3.disable_warnings()
Because I am not always sure what environment the tests will be launched in, I always add this argument.
That is all I wanted to share about performance testing with Locust: a simple and convenient tool with broad capabilities for building requests and processing server responses. Thank you for reading to the end.