TLDR:
JMeter crawls with just 1 test user if I select "Retrieve all embedded resources".
JMeter throws Uncaught Exception java.lang.OutOfMemoryError: Java heap space in thread Thread if I test with 100 users
How can I eventually scale up to testing with thousands of users?
More details:
I'm recording the jmx script using the BlazeMeter Chrome extension. The user logs in, then completes a course and a test. Many AJAX requests are made along the way, so the script has around 350 to 400 steps, mostly AJAX POST requests that return a JSON response.
When I click through manually in Chrome, the site loads quickly. Maybe a page load takes 2 seconds tops.
But when I import that script into JMeter with the "Retrieve all embedded resources" and "Parallel downloads" set to 6, and run it (in the GUI initially with just 1 user), it will get through, say, 7 steps quickly, and then just hang, sometimes for 10+ minutes before advancing to the next step. This doesn't happen if I uncheck "Retrieve all embedded resources", but I don't want to do that since that wouldn't be a realistic test.
If I take that same test and run it with 100 users (from the command line using JVM_ARGS='-Xms4096m -Xmx4096m' sh jmeter -n -t myfolder/mytest.jmx -l myfolder/testresults.jtl), I get Uncaught Exception java.lang.OutOfMemoryError: Java heap space in thread Thread and my computer fan goes nuts.
I have an HTTP Cache Manager configured with only "Use Cache-Control/Expires header when processing GET requests" checked, and I've lowered "Max Number of elements in cache" to 10 since that's the only way I can get the test running at all.
Ideally, I'd like to test thousands of users at once, but if I can't reliably get 100 users tested, I don't understand how I'm supposed to scale to thousands.
I see that there are other cloud-based testing options, and I've tried a few out now, however I always come to a halt when it comes to configuring how to test logged in users. Seems like most solutions don't support this.
Feels like the kind of thing that lots of people should have come across in the past, but I find almost no one having these issues. What's the right way to load test thousands of logged-in users on a web application?
If JMeter "hangs" for 10 minutes retrieving an embedded resource, there is a problem with that specific resource. You can see the response time for each resource using, for example, the View Results Tree listener, and then either raise a bug for the "slow" component or exclude it from the scope using HTTP Request Defaults. There you can also specify a timeout: if a request doesn't return a response within the given timeframe, it is marked as failed, so you won't have to wait 10 minutes.
Your way of increasing the heap is correct. It looks like 4 GB is not sufficient, so you will either have to provide more or consider switching to distributed testing.
Make sure to follow JMeter Best Practices in any case. If your "computer fan goes nuts", CPU usage is very high, so JMeter most probably won't be able to send requests fast enough and you will get false-negative results.
I have a simple JMeter experiment with a single Thread Group with 16 threads, running for 500s, hitting the same URL every 2 seconds on each thread, generating 8 requests/second. I'm running in non-GUI (Command Line) mode. Here is the .jmx file:
https://www.dropbox.com/s/l66ksukyabovghk/TestPlan_025.jmx?dl=0
Here is a plot of the result, running on an AWS m5ad.2xlarge / 8 cores / 32GB RAM (I get the same behavior on VirtualBox Debian on my PC, very large Hetzner server, Neocortix Cloud Services instances):
https://www.dropbox.com/s/gtp6oqy0xtuybty/aws.png?dl=0
At the beginning of the Thread Group, all 16 threads report a long response time (0.33s), then settle in to a normal short response time (<0.1s). I call this the "Start of Run" problem.
Then about 220s later, there is another burst of 16 long response times, and yet another burst at about 440s. I call those the "Start of Run Echo" problem, because they look like echoes of the "Start of Run" problem. The same problem occurs if I introduce another Thread Group with a delay, say 60s. That Thread Group gets its own "Start of Run" problem at t=60s, and then its own echoes at 280s and 500s.
These two previous posts seem related, but no conclusive cause was given for the "Start of Run" problem, and the "Start of Run Echo" problem was not mentioned.
Jmeter - The time taken by first iteration of http sampler is large
First HTTP Request taking a long time in JMeter
I can work around the "Start of Run" problem by hitting a non-existent page with the first HTTP request in each thread, getting a 404 Error, and filtering out the 404's. But that is a hack, and it doesn't solve the "Start of Run Echo" problem, which is not guaranteed to hit the non-existent pages. And it introduces "holes" in the delivered load to the real target pages.
Update: After suggestion from Dmitri T, I have installed JMeter 5.3. It has default value httpclient4.time_to_live=60000 (60s), and its output matches that:
https://www.dropbox.com/s/gfcqhlfq2h5asnz/hetzner_60.png?dl=0
But if I increase the value to httpclient4.time_to_live=600000 (600s), it does not push all the "echoes" out past the end of the run. It still shows echoes at about 220s and 440s, i.e. the same original behavior that I am trying to eliminate.
https://www.dropbox.com/s/if3q652iyiyu69b/hetzner_600.png?dl=0
I am wondering if httpclient4.time_to_live has an effective maximum value of 220000 (220s) or so.
Thank you,
Lloyd
The first request will be slow due to initial connection establishment and the SSL handshake.
After that, JMeter will act according to its network properties, in particular:
httpclient4.time_to_live - TTL (in milliseconds) represents an absolute value. No matter what, the connection will not be re-used beyond its TTL.
httpclient.reset_state_on_thread_group_iteration - resets the HTTP state when starting a new Thread Group iteration, which means closing open connections and resetting the SSL state.
It also seems that you're using a rather outdated JMeter version, about 5 years old. According to JMeter Best Practices you should always use the latest version of JMeter, so consider upgrading to JMeter 5.3 (or whatever the latest stable version on the JMeter Downloads page is), as you might be hitting a JMeter bug that has already been fixed.
You might also need to perform OS and JMeter tuning; see Concurrent, High Throughput Performance Testing with JMeter for example problems and solutions.
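As a rough illustration of the TTL behaviour described above, the sketch below lists the points in a run at which a connection opened at t=0 would be re-created. It assumes that TTL expiry is the only thing closing connections; the function name and that assumption are illustrative and not taken from JMeter itself.

```python
def expected_reconnect_times(ttl_ms: int, run_seconds: int) -> list[int]:
    """Seconds at which a connection opened at t=0 would have to be re-created.

    Assumption: the client-side TTL is the only reason a connection is closed.
    """
    if ttl_ms <= 0:
        raise ValueError("ttl_ms must be positive")
    # Step through the run in whole TTL intervals, working in milliseconds.
    return [t // 1000 for t in range(ttl_ms, run_seconds * 1000, ttl_ms)]


ttl_60 = expected_reconnect_times(60_000, 500)    # TTL of 60 s over a 500 s run
ttl_600 = expected_reconnect_times(600_000, 500)  # TTL of 600 s over the same run
print(ttl_60)
print(ttl_600)
```

Under this simple model a 600 s TTL predicts no reconnects at all within a 500 s run, so slow bursts that still appear at around 220 s and 440 s would presumably come from something other than the client-side TTL.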
I am using JMeter to stress test my project.
I set the Constant Throughput Timer to 20/s and ran the test for 100 s. But in practice the TPS mostly reaches 7/s and never gets to 20/s. I don't know why.
I want to ask 2 questions:
1. As I understand it, JMeter can generate enough load to reach a TPS of 20/s. Is something in my JMeter configuration wrong?
2. Otherwise, if the load JMeter can generate depends only on the server, does that mean a problem with my server is preventing the TPS from going higher?
Constant Throughput Timer can only pause threads in order to limit JMeter to the defined number of requests per minute. Moreover, it is only more or less precise at the "minute" level, so if your test lasts less than that you may not see its effect.
Another factor is application response time: JMeter waits for the previous response before sending the next request, so if the application doesn't respond fast enough you will not be able to reach the desired number of requests per second.
Last but not least, JMeter needs enough resources to send the required number of requests per second. If JMeter is not properly configured, or the machine it runs on is overloaded, you will not be able to send requests fast enough even if the application can handle more load.
So recommendations are:
Make sure you provide enough threads (virtual users) in the Thread Group to generate the required load
Make sure to follow JMeter Best Practices
Make sure to monitor the baseline health metrics of the machine where JMeter is running (CPU, RAM, network, disk, etc.); this can be done using the JMeter PerfMon Plugin. If you detect a lack of resources, you will need to find another machine and go for Distributed Testing
You can consider a combination of Concurrency Thread Group and Throughput Shaping Timer instead of your current setup. They can be connected via the Feedback Function so that JMeter can kick off more threads if the current number is not enough to create the necessary load
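The point about threads and response time can be made concrete with a little arithmetic. The sketch below assumes each thread sends one request, waits for the response, and only then sends the next. The function names and the sample numbers are hypothetical, not measurements from the question.

```python
import math


def max_throughput(threads: int, avg_response_s: float, think_time_s: float = 0.0) -> float:
    """Upper bound on requests/second when every thread waits for each response."""
    return threads / (avg_response_s + think_time_s)


def threads_needed(target_rps: float, avg_response_s: float, think_time_s: float = 0.0) -> int:
    """Smallest number of threads that can reach target_rps under the same assumption."""
    return math.ceil(target_rps * (avg_response_s + think_time_s))


# Hypothetical numbers: if each response takes 1 s, 7 threads cannot exceed 7 requests/s,
# and a 20 requests/s target needs at least 20 threads regardless of the timer setting.
ceiling_with_7 = max_throughput(threads=7, avg_response_s=1.0)
needed_for_20 = threads_needed(target_rps=20, avg_response_s=1.0)
print(ceiling_with_7, needed_for_20)
```

A timer can only slow threads down, so if the computed ceiling is below the target, adding threads (or reducing response time) is the only way to get there.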
We just did a rolling restart of our server, but now every few hours our cluster stops responding to API calls. Instead, when we make a call, I get a response like this:
curl -XGET 'http://localhost:9200/_cluster/health?pretty'
{
"error" : "OutOfMemoryError[unable to create new native thread]",
"status" : 500
}
I noticed that we can still index data fine, it seems, but cannot search or call any API functions. This seems to happen every few hours, and the most recent time it happened, there were no logs in any of the nodes' log files.
Our cluster is 8 nodes over 5 servers (3 servers run 2 ElasticSearch processes, 2 run 1), running RHEL6u5. We are running ElasticSearch 1.3.4.
This can be due to the OS not allowing more threads to be created. Increasing the number of threads per process can solve the problem. Set the ulimit -u value higher.
Although the above is good, an even better solution is to configure ElasticSearch to use a thread pool, since thread creation and destruction are expensive. In fact, some (or all?) JVMs can't clean up terminated threads except during a full GC.
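To illustrate the general idea of a thread pool (not ElasticSearch's own configuration), here is a minimal sketch using Python's standard library. A fixed number of worker threads is reused for many tasks, so the number of native threads stays bounded. The helper name and the task and worker counts are arbitrary.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_with_pool(task_count: int, max_workers: int) -> set[str]:
    """Run task_count small tasks on a fixed-size pool and return the thread names used."""

    def task(_: int) -> str:
        # Each task just reports which worker thread executed it.
        return threading.current_thread().name

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return set(pool.map(task, range(task_count)))


used_threads = run_with_pool(task_count=1000, max_workers=4)
print(len(used_threads))  # never more than max_workers
```

However many tasks are submitted, at most max_workers threads are ever created, which is what keeps a process under the per-user thread limit.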
There is a similar question on this topic that I participated in, but it doesn't really answer what I need at this moment.
How to rigorously test a site?
I noticed a java.util.ConcurrentModificationException in my server log, so I fixed that one, but without testing I still don't know whether this or some other concurrency problem will occur again.
I've tried to create a test in JMeter which just does a simple GET and simulates 100 users.
The problem:
I retrieve some information from the server once the page has finished loading, so that is the part I'm interested in (because that part caused this exception before).
But JMeter only fetches the page itself, and any pending AJAX requests do not show up in the logs. In fact I can't see anything in the logs, because JMeter never reaches the AJAX calls made when the document is ready; it exits just before that.
Naturally, when I refresh the page from a browser I can see in the logs exactly what is going on on the server side. Is there some kind of tool that waits for all pending requests or can stay on the website for n amount of time, or is there a smarter way to test this to avoid further concurrency exceptions?
AJAX requests are simple GET requests as well, so you just need to configure JMeter to directly call the servlets which serve them.
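A minimal sketch of that idea: a load tool does not execute the page's JavaScript, so the AJAX endpoint has to be requested explicitly, just like the page. The tiny local server, the /data path, and the JSON payload below are all made up for the demonstration.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener


class DemoHandler(BaseHTTPRequestHandler):
    """Stand-in application: "/" is the page, "/data" is the endpoint its script would call."""

    def do_GET(self):
        if self.path == "/data":
            body = json.dumps({"items": [1, 2, 3]}).encode()
            content_type = "application/json"
        else:
            body = b"<html><script>/* would request /data once loaded */</script></html>"
            content_type = "text/html"
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


# Port 0 lets the OS pick a free port; the server runs in a background thread.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = f"http://127.0.0.1:{server.server_port}"

opener = build_opener(ProxyHandler({}))  # ignore any proxy settings for localhost

# Fetching the page alone never triggers the script, so /data is not hit by this call.
with opener.open(base_url + "/", timeout=5) as resp:
    page = resp.read()

# Calling the AJAX endpoint directly is what actually exercises the server-side code.
with opener.open(base_url + "/data", timeout=5) as resp:
    ajax_payload = json.loads(resp.read())

server.shutdown()
server.server_close()
print(ajax_payload)
```

In JMeter the equivalent would be adding a separate HTTP Request sampler for each AJAX URL after the sampler for the page.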
If you use Selenium instead of JMeter for your tests, you will spawn real browsers that will perform AJAX request exactly like the real application. Simply because it is the real application that is being run.
The problem is... Selenium is for regression testing and browser compatibility, not for raw performance. You can't run more than a few browsers per computer. Some companies provide clusters of browsers (up to 5000, and up to 500 000 virtual users for browsermob) that you can rent for your performance campaign.
You can also use the desktop computers in your office, say overnight, to run your tests.
I know this might be a little complicated and not be the best solution.
[UPDATE: I forgot to add that this 30 sec. freezing problem only happens the first time I try to load a file from the server. Subsequent loads are very quick. Maybe some strange reverse DNS lookup? I am hosting on Google's appengine.]
I started a little project recently called http://www.chartle.net which is built around an applet.
Startup time is an important factor in the user's experience of an applet. I collect statistics and am shocked to find that startup times are often very long (a factor of 50 to 100 higher than necessary).
The applet starts in 1-3 seconds depending on the speed of your computer and connection. Still for some users it takes up to 100 sec.
I have mixed results from my own tests. Mostly it is very fast but sometimes freezes the browser for a long time and the Java console doesn't tell me why. Best guess is, that it stalls when loading a saved chart.
Please help me figure this out. The best test is to open an already saved chart (click on one of the 'create' links at http://www.chartle.net/gallery).
Cheers,
Dieter
This is generic help rather than specific for your demo (which loaded fine for me in a few attempts).
Freezing applets
In the JDK bin directory there is a very handy programme called jstack. Refresh your browser window until it crashes and then run:
jstack *process_id*
This will give you the stack trace of any frozen Java process. If Java is not a separate process then you can use the browser's process (eg for Opera).
The following few problems were/are common for me:
I recommend you use invokeLater rather than invokeAndWait in the init method (although you can't do this if you use start/stop methods)
Opera's custom java plugin acts very poorly...
Deadlocks caused by synch blocks and invokeAndWait's
Slow applets
Possibly the browser is fetching resources from the server, unable to use the jar file?
It may be that only the old plugin causes these problems. That means basically all people running on OSX and other users with Java prior to 1.6_update_10.
So, I would really appreciate people with such setups to watch their Java console and describe the first startup behaviour.
Cheers,
Dieter