How to test your website against multiple users (extended)

There is a similar question on this topic that I participated in, but it doesn't really answer what I need at the moment:
How to rigorously test a site?
I noticed a java.util.ConcurrentModificationException in my server log and fixed it, but without testing I still don't know whether this or some other concurrency problem will occur again.
I've tried to create a test in JMeter that does a simple GET and simulates 100 users.
The problem:
I retrieve some information from the server once the page has finished loading, and that is the part I'm interested in (because it caused the exception before).
But JMeter only fetches the page itself; any pending AJAX requests don't show up in the logs. In fact I can't see anything in the logs, because JMeter never reaches the AJAX calls made when the document is ready: it exits just before that.
Naturally, when I refresh the page from a browser I can see in the logs exactly what is going on on the server side. Is there some kind of tool that waits for all pending requests or can stay on the page for a given amount of time, or is there a smarter way to test this so I can avoid further concurrency exceptions?

AJAX requests are simple GET requests as well, so you just need to configure JMeter to directly call the servlets which serve them.
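The same idea can also be tried outside JMeter. Below is a minimal sketch, not a replacement for a real load tool: it releases N simulated users at the same moment against one request and counts failures. The class name ConcurrentUsers and its methods are made up for illustration, and it assumes Java 11+ for java.net.http.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of JMeter or any library.
final class ConcurrentUsers {

    /** Builds a request that GETs the given URL and returns the HTTP status code. */
    static Callable<Integer> get(String url) {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return () -> client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }

    /** Runs the request once per simulated user, all released together; returns the failure count. */
    static int run(int users, Callable<Integer> request) {
        ExecutorService pool = Executors.newFixedThreadPool(users);
        CountDownLatch startGate = new CountDownLatch(1);
        List<Future<Integer>> results = new ArrayList<>();
        try {
            for (int i = 0; i < users; i++) {
                results.add(pool.submit(() -> {
                    startGate.await();      // hold every user until all are queued
                    return request.call();  // the HTTP status code
                }));
            }
            startGate.countDown();          // release all users at once
            int failures = 0;
            for (Future<Integer> result : results) {
                try {
                    if (result.get() >= 400) {
                        failures++;         // 4xx/5xx counts as a failed request
                    }
                } catch (Exception e) {
                    failures++;             // an exception counts as a failed request
                }
            }
            return failures;
        } finally {
            pool.shutdown();
        }
    }
}
```

For example, ConcurrentUsers.run(100, ConcurrentUsers.get("http://localhost:8080/myapp/data")) would hit one AJAX servlet with 100 simultaneous GETs; that URL is a placeholder for whatever servlet the page calls once it has loaded.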

If you use Selenium instead of JMeter for your tests, you will spawn real browsers that perform the AJAX requests exactly as the real application does, simply because it is the real application that is being run.
The problem is that Selenium is meant for regression testing and browser compatibility, not for raw performance. You can't run more than a few browsers per computer. Some companies provide clusters of browsers (up to 5000, and up to 500 000 virtual users for BrowserMob) that you can rent for your performance campaign.
You can also use the desktop computers in your office, say overnight, to run your tests.
I know this might be a little complicated and may not be the best solution.

Related

JMeter slow with "Retrieve all embedded resources" - how do I scale?

TLDR:
JMeter crawls with just 1 test user if I select "Retrieve all embedded resources".
JMeter throws Uncaught Exception java.lang.OutOfMemoryError: Java heap space in thread Thread if I test with 100 users
How to eventually scale to test with thousands of users?
More details:
I'm recording the jmx script using the BlazeMeter Chrome extension. The user logs in and completes a course and a test. There are many AJAX requests being made along the way, around 350-400 steps for the script, mostly AJAX POST requests that have a JSON response.
When I click through manually in Chrome, the site loads quickly. Maybe a page load takes 2 seconds tops.
But when I import that script into JMeter with the "Retrieve all embedded resources" and "Parallel downloads" set to 6, and run it (in the GUI initially with just 1 user), it will get through, say, 7 steps quickly, and then just hang, sometimes for 10+ minutes before advancing to the next step. This doesn't happen if I uncheck "Retrieve all embedded resources", but I don't want to do that since that wouldn't be a realistic test.
If I take that same test and run it with 100 users (from the command line using JVM_ARGS='-Xms4096m -Xmx4096m' sh jmeter -n -t myfolder/mytest.jmx -l myfolder/testresults.jtl), I get Uncaught Exception java.lang.OutOfMemoryError: Java heap space in thread Thread and my computer fan goes nuts.
I have an HTTP Cache Manager configured with only "Use Cache-Control/Expires header when processing GET requests" checked, and I've lowered "Max Number of elements in cache" down to 10, since that's the only way I can get the test running at all.
Ideally, I'd like to test thousands of users at once, but if I can't reliably get 100 users tested, I don't understand how I'm supposed to scale to thousands.
I see that there are other cloud-based testing options, and I've tried a few out now, however I always come to a halt when it comes to configuring how to test logged in users. Seems like most solutions don't support this.
Feels like the kind of thing that lots of people should have come across in the past, but I find almost no one having these issues. What's the right way to load test thousands of logged-in users on a web application?

If JMeter "hangs" for 10 minutes retrieving an embedded resource, there is a problem with that specific resource. You can see the response time of each resource using, e.g., the View Results Tree listener, and then either raise a bug for the "slow" component or exclude it from the scope using HTTP Request Defaults. There you can also specify a timeout: if a request doesn't return a response within the given time frame, it is marked as failed, so you won't have to wait 10 minutes.
Your way of increasing the heap is correct. It looks like 4 GB is not sufficient, so you will either have to provide more or consider switching to distributed testing.
Make sure to follow JMeter Best Practices in any case. If your "computer fan goes nuts", CPU usage is very high, so JMeter most probably can't send requests fast enough and you will get false-negative results.

Issue with scraping data from websites

I'm trying to gather data by scraping webpages using Java with Jsoup. Ideally, I'd like about 8000 lines of data, but I was wondering what the etiquette is when it comes to accessing a site that many times. For each one, my code has to navigate to a different part of the site, so I would have to load 8000 (or more) webpages. Would it be a good idea to put delays between each request so I don't overload the website? They don't offer an API from what I can see.
Additionally, I tried running my code to just get 80 lines of data without any delay, and my internet is out. Could running that code have caused it? When I called the company, the automated message made it sound like service was out in the area, so maybe I just didn't notice it until I tried to run the code. Any help is appreciated, I'm very new to network coding. Thanks!

Here are a couple of things you should consider, which I learned while writing a super-fast web scraper with Java and Jsoup:
The most important one is the legal aspect: whether the website allows crawling, and to what extent it allows its data to be used.
Adding delays is fine, but using a custom user agent that complies with robots.txt is preferable. I saw a significant improvement in response times after changing the user agent from the default to one that robots.txt allows.
If the site allows it and you need to crawl a large number of pages (which was allowed in one of my previous projects), you could use a thread pool executor to load N pages simultaneously. It turns an hours-long data-gathering job with a single-threaded Java web scraper into just a couple of minutes.
Many ISPs blacklist users who run programmatic, repetitive tasks such as web crawling or setting up email servers. It varies from ISP to ISP. I previously avoided this by using proxies.
With a website that had a response time of 500 ms per request, my web scraper was able to scrape data from 200k pages with 50 threads and 1000 proxies in 3 minutes over a 100 Mbps connection.
Should there be delays between requests?
Answer: It depends. If the website allows you to hit it constantly, you don't need a delay, but it is still better to have one. I had a delay of 10 ms between requests.
I tried running my code to just get 80 lines of data without any delay, and my internet is out?
Answer: Most probably. Your ISP may have assumed you were running a DoS attack against the website and put a temporary or permanent limit on your connection.
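For the delay itself, here is a minimal sketch of one way to space requests out. PoliteThrottle is a made-up name, not something Jsoup provides; the idea is simply to call awaitTurn() before every page fetch so that no two fetches start closer together than the chosen delay, even when several threads share the same throttle.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: enforces a minimum gap between consecutive requests.
final class PoliteThrottle {
    private final long minDelayNanos;
    private long nextAllowedAt;      // System.nanoTime() value of the next free slot
    private boolean started = false; // the first caller never waits

    PoliteThrottle(long minDelayMillis) {
        this.minDelayNanos = TimeUnit.MILLISECONDS.toNanos(minDelayMillis);
    }

    /** Blocks until the minimum delay since the previous turn has elapsed. */
    synchronized void awaitTurn() {
        if (started) {
            long waitNanos = nextAllowedAt - System.nanoTime();
            if (waitNanos > 0) {
                try {
                    TimeUnit.NANOSECONDS.sleep(waitNanos);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                }
            }
        }
        started = true;
        nextAllowedAt = System.nanoTime() + minDelayNanos;
    }
}
```

In the scraping loop this would look like throttle.awaitTurn() followed by Jsoup.connect(url).get() for each of the 8000 URLs. The right delay depends on the site; the value is yours to choose.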

Tomcat - Is there a way to spew the current/active requests that have not yet responded?

I'm having a hard time debugging a problem with long running response times. My web application sometimes takes a long time to respond and I'm having a hard time nailing down the specific requests that are the culprits. The trouble I'm having is that tomcat only logs to the access log once it responds to the request. I'm wondering if there is a way to either log all incoming requests the moment they come in (and not wait for a response) or tell tomcat to spew out all the requests that it is currently handling (something analogous to jstack, but instead of showing me threads show me the current/active request urls)?

Once served, the slow requests can be identified by activating the Extended Log Valve with the time-taken token. That isn't your real question, since you want to see requests in real time, but it's not clear whether you have already identified which requests take a long time to respond.
VisualVM will show you the running threads, but that is not enough to understand what's going on. You probably need to activate the Extended Log Valve again, this time with the x-threadname token, to compare with what you have seen in VisualVM. For the debugging itself, though, the solution above (the time-taken token) is usually enough.
But it might not be enough to identify the bottleneck, particularly if the application itself has no problem at all (and it seems to be a random problem). Does the application use a database? If so, activate the slow query log, for example with MySQL. Does it use any other tier (authentication, NFS/CIFS, ...)? If so, you need to monitor their availability, in case they block something when they are not available.
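As for seeing requests that have not yet responded, one option is to track them yourself. The sketch below is an assumption-heavy illustration, not a Tomcat API: InFlightRequests is a made-up class, and it assumes you add a servlet Filter that calls begin() before chain.doFilter(...) and end() in a finally block. Dumping snapshot() (from a status page or a log statement) then lists the requests still in progress together with their thread names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical registry of requests that have started but not yet finished.
final class InFlightRequests {
    private final AtomicLong ids = new AtomicLong();
    private final Map<Long, String> active = new ConcurrentHashMap<>();

    /** Records a request the moment it arrives; returns a ticket to pass to end(). */
    long begin(String method, String uri) {
        long id = ids.incrementAndGet();
        active.put(id, method + " " + uri
                + " startedAt=" + System.currentTimeMillis()
                + " thread=" + Thread.currentThread().getName());
        return id;
    }

    /** Removes the request once the response has been produced (call from a finally block). */
    void end(long id) {
        active.remove(id);
    }

    /** Returns a copy of the descriptions of all requests still being handled. */
    List<String> snapshot() {
        return new ArrayList<>(active.values());
    }
}
```

Because begin() stores the thread name, an entry in the snapshot can be matched against a jstack or VisualVM thread dump taken at the same time.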

java gwt a script is causing the browser to run slowly

I have a web application written in Java GWT. When I open the website in IE8, a message always pops up saying 'A script on this page is causing your web browser to run slowly'.
The message only appears in IE8, not in higher versions and not in FF or Chrome!
Since the application is written in Java GWT, it's pretty difficult to debug the JavaScript code. Is there another way to determine the problem?
The application also makes many asynchronous calls to a database. Might that be the problem?

This message means that JavaScript is blocking the browser thread for quite a long time.
Its implementation in IE8 is really silly. It counts the number of JavaScript statements (instructions) it executes, and if the count reaches a certain threshold, this message is shown.
This limit is actually configured in the Windows registry; by default it is 5000000 or something like that. It can be increased, which is of course not a recommended solution.
One way to avoid this message is to use GWT DeferredCommand. If you can split the work into chunks small enough not to trigger the IE8 guard, you will be fine. Also try to merge multiple asynchronous requests into as few as possible and improve the rendering logic, potentially shifting from Widgets to UI Binder or plain DOM.
This is a related question: Disabling the long-running-script message in Internet Explorer

I would slightly disagree with "java gwt its pretty difficult to debug the javascript code".
The MSDN article on disabling the slow script warning only hides the problem.
The slow script warning occurs when you have a heavy for loop or a deep recursive call. This can happen in 2 scenarios:
1) Poorly coded client-side processing logic, for example tree navigation
2) A deep object graph in RPC
You can quickly isolate the trouble spot if you familiarize yourself with:
1) Using Speed Tracer - https://developers.google.com/web-toolkit/speedtracer/
2) Using GWT logging - https://developers.google.com/web-toolkit/doc/latest/DevGuideLogging
3) Using Chrome Dev Tools and Firebug to capture the timeline, profiling, etc.
4) IE8 has profiling, but it is darn slow and cumbersome.
5) Using GWT Pretty mode instead of OBF mode when profiling.
Once you are sure which part of the code is causing the slow script warning, just fix it.

Because some scripts may take an excessive amount of time to run, Internet Explorer prompts the user to decide whether they would like to continue running the slow script.
If the generated cache.js JavaScript file is fairly large, that message may appear.
Internet Explorer versions 4.0, 5.0, 6, 7, and 8 all show this message box.
Read this article on MS blog
And refer the below question
a script on this page is causing ie to run slowly

multithreaded web application in java

I am building a web application which has Java as a front end and shell scripts as a back end. The concept is that I need to process multiple files in the back end. I get a date range from the user (for example July 1st-8th) and for each day process around 100 files, so in total I have 800 files to process.
I get these details from a JSP, delegate a background call to the shell script, get back the results and display them to the user.
So far I have done all of this sequentially, by which I mean without threads. There is only one main thread that executes, and the user has to wait until all 800 files have been processed one after another. This is really slow, and because of this I am thinking of using threads. Since I am a beginner with threads, I read some material on the subject and came up with the following idea:
As I understand it, the work has to be split between threads, so I thought of splitting the 8 days of work across 4 threads, where each thread would handle 2 days of work.
I would like to know whether I am following a correct approach. My major concerns are:
Is it recommended to spawn multiple threads from a web application?
Whether or not this is a good approach.
Some guidance on how to proceed with this. An example would be great. Thank you.

Yes, you can run the long processing job in a multi-threaded or any other high-performance environment. You should also use Servlet 3.0 Asynchronous Request Processing to suspend the request thread and wait until the long processing task is done.

Yes, there's nothing wrong with spawning multiple threads from a web application. In fact, if you're running a Servlet container (which you most likely are since you're using Java), it's already spawning multiple threads for you. In general a Servlet container will automatically spawn a new thread (or reuse one out of a pool) to handle each request it receives.
Your approach is fine, though you'll want to fine-tune the number of threads to something suitable for the hardware configuration of your system and the amount of concurrent load on your web service. Also note that while spinning up a bunch of threads will reduce the total amount of time needed to process all the data, it will still leave a potentially large chunk of time before any data is ready to go back to the user. So you might get a better result by doing smaller work units sequentially and posting each batch of results to the user interface as soon as it is ready. It will still take a long while for the user to have all the data, but they can start viewing at least a portion of it almost immediately.
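Since an example was asked for, here is a minimal sketch of the thread-pool approach using java.util.concurrent. DayBatchRunner and processDay are made-up names; in the real application processDay would presumably launch the shell script for one day (for instance through ProcessBuilder) and return its output. The pool size is a parameter so it can be tuned as described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: processes each day on a fixed-size thread pool.
final class DayBatchRunner {

    /** Runs processDay for every day in parallel and returns the results in the order of the input. */
    static List<String> processAll(List<String> days, int threads, Function<String, String> processDay) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String day : days) {
                Callable<String> task = () -> processDay.apply(day); // one task per day
                pending.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                try {
                    results.add(future.get());                       // waits for that day to finish
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting for results", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("Processing a day failed", e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown();                                         // always release the threads
        }
    }
}
```

With 8 days and threads set to 4, at most four days are processed at once and the rest queue up, so there is no need to assign two days to each thread by hand. In a servlet container you would normally create the pool once at startup and share it, rather than creating one per request as this sketch does.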

The way to improve the user experience is not to parallelize at the Servlet level on 100000 threads, but rather to provide incremental rendering of the view. First of all, it would be useful to separate your application into multiple layers, according to the MVC pattern for example.
That said, you will have to look at how to:
Create a service that is able to return partial answers and a last answer, the latter meaning that all available data has been returned. Each of these answers can be computed in parallel to improve performance.
Fill a web page incrementally, typically by calling back this service, which returns a JSON string you use to add data to the DOM. Every time you get an answer, if it is a partial answer, you call the service again, providing the previous sequence number.
If you look at Liligo, you will see how this works. The technique I described is known as polling, but there are other techniques to obtain similar asynchronous results at the UI level. In general, you don't want to work directly with the Servlet API, which is a very low-level API, but rather use a reasonable framework or abstraction for that.
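To make the sequence-number idea concrete, here is a minimal server-side sketch. PartialResults and Page are made-up names; JSON encoding and the HTTP layer are left out. Worker threads call publish() as each piece of work completes and finish() at the end, while the page repeatedly calls poll() with the nextSeq value from the previous answer until last is true.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for results that become available piece by piece.
final class PartialResults {

    /** One poll response: the new items, the sequence number to send next time, and an end marker. */
    static final class Page {
        final List<String> items;
        final int nextSeq;
        final boolean last;

        Page(List<String> items, int nextSeq, boolean last) {
            this.items = items;
            this.nextSeq = nextSeq;
            this.last = last;
        }
    }

    private final List<String> published = new ArrayList<>();
    private boolean finished = false;

    /** Called by a worker whenever one unit of work is done. */
    synchronized void publish(String item) {
        published.add(item);
    }

    /** Called once when no further items will be published. */
    synchronized void finish() {
        finished = true;
    }

    /** Returns everything published after the given sequence number. */
    synchronized Page poll(int afterSeq) {
        int from = Math.max(0, Math.min(afterSeq, published.size())); // clamp bad client input
        List<String> fresh = new ArrayList<>(published.subList(from, published.size()));
        return new Page(fresh, published.size(), finished);
    }
}
```

The first call is poll(0). A partial answer has last set to false, and the client simply asks again with the returned nextSeq after a short pause.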
If you want a piece of friendly advice, have a look at HTTP streaming in the Play! framework: http://www.playframework.org/documentation/2.0.2/JavaStream

Creating threads in a web application is not a good solution. It is bad design, because normally it is the container (web server) that is in charge of that activity. So I think you have to find another solution.
I suggest putting the shell scripts in cron, scheduled to run every minute, and "activating" them by touching files that act as semaphores. On each run, the scripts check whether the web application has touched the semaphore file; if so, they read the date interval from the file and start processing.
