More threads = Less requests per second?

I'm currently writing a crawler in Java, and I'm stuck on something.
In my crawler, I have threads downloading a static remote page using HttpURLConnection.
I tried downloading one small file (2 KB) with different parameters. The connection has a timeout set to 1 s.
I noticed that with 100 threads I manage about 3 times more requests per second (~10k requests per second) than with 500 threads, where I manage "only" 4k requests per second.
I would have expected 500 threads to manage at least as many requests per second as 100 threads.
Could you explain why it behaves this way, and whether there is some parameter I can set to increase the maximum number of parallel connections?
Thanks :)

I think it's just a matter of your CPU: at a certain point, switching between threads costs more than the time gained by not waiting on a single connection.
I would try to maximize parallel connections by setting an upper limit on the number of threads.
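One way to set such an upper limit is a fixed-size thread pool that queues the downloads. The sketch below is only an illustration: `fetch` is a hypothetical stand-in for the real HttpURLConnection download (it just sleeps), the URLs are placeholders, and the cap of 20 is an arbitrary value you would tune by measurement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedCrawler {
    // Downloads currently in flight, and the highest value observed.
    static final AtomicInteger inFlight = new AtomicInteger();
    static final AtomicInteger peak = new AtomicInteger();

    // Hypothetical stand-in for the real HttpURLConnection download.
    static int fetch(String url) {
        int now = inFlight.incrementAndGet();
        peak.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(5); // simulated network wait
            return url.length();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during download", e);
        } finally {
            inFlight.decrementAndGet();
        }
    }

    // Runs every download on at most maxThreads threads; returns how many completed.
    static int crawl(List<String> urls, int maxThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (String url : urls) {
                tasks.add(() -> fetch(url));
            }
            int done = 0;
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                f.get(); // rethrows any failure from the download
                done++;
            }
            return done;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("crawl failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < 200; i++) {
            urls.add("http://example.invalid/page" + i); // placeholder URLs
        }
        int done = crawl(urls, 20);
        System.out.println(done + " downloads, peak concurrency " + peak.get());
    }
}
```

Extra work waits in the pool's queue instead of adding threads, so the number of simultaneous connections never exceeds the cap.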

Related

Gatling (and JMeter) struggling to maintain Requests per Second (RPS)?

I'm load testing an API. We have a problem: our response times are too long, sometimes close to a minute. We want them to be under a second. But that is beside the point.
When I use a load testing tool such as Gatling, the RPS sent seems to stall. As you can see in the attached image, there is an initial 15 seconds of 20 RPS, and then suddenly almost no RPS at all. How can I maintain a constant RPS? It probably has to do with the poor response times, but what if I don't care about the response times? I just want the RPS to be constant.
My initial tests with JMeter also show similar behaviour.

What injection strategy are you using? What does the scenario look like? Is every user making one request, a chain of requests, or either of those in a loop?
Assuming that you want to test a single endpoint, the best approach to get constant requests per second (not constant responses, as you already know) is to use a scenario that executes a single request and a strategy that injects a constant number of users per second, e.g.:
setUp(
scn.inject(constantUsersPerSec(25) during(15 minute))
)
If your user performs more than 1 request, there is an option to throttle requests, but remember that it will only throttle down, not up, so you need to make sure that the active users make enough requests per second to reach that limit, e.g.:
setUp(scn.inject(
constantUsersPerSec(10) during(15 minutes)
).throttle(
jumpToRps(25), holdFor(15 minutes)
))
So here, if e.g. a single user makes 5 requests, you could reach as much as 50 req/s, but it will be throttled to 25. Remember that new users are added every second, so if one user takes longer than a second to finish, the number of active users will grow. Also, if response time is high, the active users may not produce enough req/s, since they spend most of their time waiting for responses.

In JMeter, you can achieve that by using a Constant Throughput Timer at the test plan level.
The Constant Throughput Timer lets you hold the load sent to your server at a target throughput (requests/sec). It can only pause JMeter threads to slow them down to the target throughput; it cannot speed them up. Its target is also expressed per minute, so you need to calculate the ramp-up period properly and let your test run long enough.
In brief:
To achieve the target throughput, you need enough threads in your test plan.
To calculate the number of threads you need for this test, you can use the formula below:
threads = RPS * max response time in seconds
In your case, if you want 20 RPS and your max response time is 60 seconds, you need at least 1200 (20 * 60 = 1200) threads in your test plan.
Because the Constant Throughput Timer works on a per-minute level, to achieve 20 RPS you have to set "Target Throughput" to 1200/min and "Calculate Throughput based on" to "All active threads".
Constant Throughput Timer Config:
Now, if you have more than one request in your test plan (e.g. 4 requests), the 1200 requests/min will be distributed among the 4 samplers. That means you will get 5 RPS for each sampler.
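The arithmetic above can be written down as a small helper. This is just a sketch of the formula from this answer; the class and method names are made up for illustration and are not part of JMeter.

```java
class ThroughputPlan {
    // Threads needed to keep enough requests in flight: RPS * max response time (seconds).
    static long requiredThreads(double targetRps, double maxResponseSeconds) {
        return (long) Math.ceil(targetRps * maxResponseSeconds);
    }

    // The Constant Throughput Timer takes its target in samples per minute.
    static double targetPerMinute(double targetRps) {
        return targetRps * 60.0;
    }

    // When several samplers share one target, each gets an equal share.
    static double rpsPerSampler(double targetRps, int samplers) {
        return targetRps / samplers;
    }

    public static void main(String[] args) {
        System.out.println(requiredThreads(20, 60)); // 1200
        System.out.println(targetPerMinute(20));     // 1200.0
        System.out.println(rpsPerSampler(20, 4));    // 5.0
    }
}
```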
Now, for the Thread Group configuration: since "Calculate Throughput based on" in the Constant Throughput Timer is set to "All active threads", all 1200 threads need to be running against the server to achieve that 20 RPS. Use the Ramp-Up Period setting to control how these threads start.
The Ramp-Up Period is the time over which all the threads are started against your tested application server. So if you use 60 seconds, it will take 60 seconds to start all 1200 threads, and all 1200 will be active after 60 seconds.
You also need to set your test duration accordingly. Say you want to hold that 20 RPS for 5 minutes. In this case, you have to set your test duration to 7 minutes (the 2 extra minutes are the first minute of ramp-up while the 1200 threads start, and the last minute of ramp-down). Don't forget to set the loop count to Forever if you are using a Thread Group.
Thread Group Config for the above-mentioned scenario:
You can also use another handy JMeter plugin which is Ultimate Thread Group if you are confused with the default Thread Group configurations. You can download JMeter Plugins by using JMeter Plugins Manager.
Here is the Ultimate Thread Group Config for the above-mentioned scenario:
After the test finishes, you can check the results for those 5 minutes in which all 1200 threads were active by using the Hits Per Second listener as well as the Active Threads Over Time listener.
Do not use the JMeter GUI for load testing; use non-GUI mode. Also, remove any assertions from your test plan while you're trying to achieve a target RPS.

How does AWS Lambda serve multiple requests?

How does AWS Lambda serve multiple requests?
I want to know is it a multi-thread kind of a model here as well?
If I am calling a Lambda from an API gateway. And there are 1000 requests in 10 secs to the API. How many containers will be created and how many threads.

How does AWS Lambda serve multiple requests?
Independently.
I want to know is it a multi-thread kind of a model here as well?
No, it is not a multi-threaded model in the sense that you are asking.
Your code can, of course, be written to use multiple threads and/or child processes to accomplish whatever purpose it is intended to accomplish for one invocation, but Lambda doesn't send more than one invocation at a time to the same container. The container is not used for a second invocation until the first one finishes. If a second request arrives while a first one is running, the second one will run in a different container.
If I am calling a Lambda from an API gateway. And there are 1000 requests in 10 secs to the API. How many containers will be created and how many threads?
As many containers will be created as are needed to process each of the arriving requests in its own container.
The duration of each invocation will be the largest determinant of this.
1000 very quick requests in 10 seconds are roughly equivalent to 100 requests in 1 second. Assuming each request finishes in less than 1 second and arrival times are evenly-distributed, you could expect fewer than 100 containers to be created.
On the other hand, if 1000 requests arrived in 10 seconds and each request took 30 seconds to complete, you would have 1000 containers in existence during this event.
After a spike in traffic inflates the number of containers, they will all tend to linger for a few minutes, ready to handle the additional load if it arrives, and then Lambda will start terminating them.

AWS Lambda is capable of serving multiple requests by horizontally scaling for multiple containers. Lambda can support up to 1000 parallel container executions by default.
there are 1000 requests in 10 secs to the API. How many containers will be created and how many threads.
Requests per second = 1000/10 = 100
There will be about 100 parallel Lambda executions if each execution takes 1 second to complete; longer executions need proportionally more, shorter ones fewer.
Note: you can also spawn multiple threads, but it's difficult to predict the performance gain.
Also keep in mind that, having multiple threads is not always efficient. The CPU available to your Lambda function is shared between all threads and processes your Lambda function creates. Generally you will not get more CPU in a Lambda function by running work in parallel among multiple threads. Your code in this case isn’t actually running on two cores, but on two “hyperthreads” on a single core; depending on the workload, this may be better or worse than a single thread. The service team is looking at ways to better leverage multiple cores in the Lambda execution environment, and we will take your feedback as a +1 for that feature.
Reference: AWS Forum Post
For further details on concurrent executions of Lambda, refer to this AWS documentation.

There are a few angles to discuss.
AWS Lambda does support handling requests in parallel, but any single instance / container of a Lambda will only process one request at a time. If all existing instances are busy then new ones will be provisioned (depending on concurrency settings, discussed below).
Within a single Lambda instance multi-threading is supported, but still only one request will be handled per instance. In practice parallelization is rarely beneficial in Lambda: it adds significant overhead and is best used for processing very large sets. Additionally, a Lambda needs more than 1 virtual core for it to have any benefit. Cores are configured by raising the memory setting; many Lambdas run with a low enough memory setting to have just one core.
Determining exactly how many containers / instances are created isn't always possible because many factors are involved:
- Lambda will reuse any existing, paused instances
- Existing instances are often very fast to handle requests; a small number of warm instances can process many, many requests in the time it takes to provision new instances (especially with runtimes like Java or .NET Core, which often have startup times of 1+ seconds)
- The concurrency settings of your Lambda are a significant factor:
  - If you have Reserved Concurrency of X, you will never have more than X instances
  - If you have unreserved concurrency, then the limit is based on available concurrency. This defaults to 1000 instances per account, so if 990 instances of any Lambdas already exist then only 10 could be created
  - If you have provisioned concurrency then you will always have a minimum number of instances, reducing cold starts
But, to try to answer your story problem, let's assume you are sending your 1000 requests at a steady pace over 10 minutes. That's one request every 600 milliseconds. Let's also assume your Java app is given a fairly high memory allocation, and its initialization is relatively quick -- let's say 1 second for a cold start. Once the cold start is complete, invocation is fast -- let's say 10ms. And let's assume there are no instances when the traffic begins.
The first request will see a response time of ~1,010ms -- 1 second for a cold start, and 10ms for handling the request. A second request will arrive while the first is still processing, so it's likely that Lambda will provision a second instance, and the second request will see a similar response time.
By the time the third request comes in (1800ms after the start), the first instance is idle again and can be reused, while the second is still finishing its cold start--so this request will not experience a cold start, and the response time will be 10ms. From this point forward it's likely that no additional instances are needed--but this all assumes a steady rate of requests.
But--changing any variable can have a big impact.
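To play with those variables, here is a toy simulation of the "one request per instance at a time" rule. It is only a sketch under strong assumptions: requests arrive at perfectly even intervals, an idle instance is always reused, and a new instance is created only when none is idle. Lambda's real provisioning decisions are not modelled, and all names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class LambdaScalingSketch {
    // Returns how many instances end up being created for evenly spaced requests.
    static int instancesNeeded(int requests, long intervalMs, long coldStartMs, long invokeMs) {
        List<Long> busyUntil = new ArrayList<>(); // one entry per instance
        for (int i = 0; i < requests; i++) {
            long arrival = i * intervalMs;
            int idle = -1;
            for (int j = 0; j < busyUntil.size(); j++) {
                if (busyUntil.get(j) <= arrival) { // this instance has finished its last request
                    idle = j;
                    break;
                }
            }
            if (idle >= 0) {
                busyUntil.set(idle, arrival + invokeMs);        // warm reuse
            } else {
                busyUntil.add(arrival + coldStartMs + invokeMs); // cold start on a new instance
            }
        }
        return busyUntil.size();
    }

    public static void main(String[] args) {
        // One request every 600 ms, 1 s cold start, 10 ms per invocation.
        System.out.println(instancesNeeded(1000, 600, 1000, 10));   // 2
        // 1000 requests within 10 s (one every 10 ms), each taking 30 s.
        System.out.println(instancesNeeded(1000, 10, 1000, 30000)); // 1000
    }
}
```

The second call shows how much the result moves when only the arrival rate and the invocation duration change.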

What's a good sleep time to avoid java.net.SocketException: Unexpected end of file from server?

Okay, so I'm using over 300 threads that use buffered readers to get information from over 300 sites at a reasonable speed. The result is that it spams this exception many times each second, and it ends up getting less than 50% of the information from the sites. So I was wondering what would be a good sleep time to wait before starting each new thread.

...only experience will tell, I guess. This depends a lot on your infrastructure, the quality of your connection, how well the OS manages the sockets...
Just try some small sleeps or fewer simultaneous connections, and tune them to satisfy your needs.
You could also try a simple "sleep and retry" policy, where each successive error increases the sleep time before the next request.
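A "sleep and retry" policy along those lines might look like the sketch below. The doubling factor, the attempt limit and the initial sleep are arbitrary assumptions to be tuned, and the flaky task in `main` is a hypothetical stand-in for the real read.

```java
import java.net.SocketException;
import java.util.concurrent.Callable;

class BackoffRetry {
    // Calls the task, sleeping longer after each consecutive failure.
    static <T> T callWithBackoff(Callable<T> task, int maxAttempts, long initialSleepMs) {
        long sleepMs = initialSleepMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // no point sleeping after the final attempt
                }
                try {
                    Thread.sleep(sleepMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while backing off", ie);
                }
                sleepMs *= 2; // double the wait before the next try
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical flaky read: fails twice, then succeeds.
        String result = callWithBackoff(() -> {
            if (++calls[0] < 3) {
                throw new SocketException("Unexpected end of file from server");
            }
            return "page body";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```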

I think your problem is not the sleep time, but the number of simultaneous operations doing reads. You can use as many threads as you want, but allow only N of them to do I/O at once. Consider using the Semaphore class to guard the I/O sections, or commons-pool from Apache, or anything similar.
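For illustration, guarding the I/O section with java.util.concurrent.Semaphore could look roughly like this. The `IoGate` class, the limit of 10 and the `simulatedRead` method (which only sleeps) are assumptions made up for this sketch; the real read through a BufferedReader would go where `simulatedRead` is called.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class IoGate {
    private final Semaphore permits;
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    IoGate(int maxConcurrentReads) {
        permits = new Semaphore(maxConcurrentReads);
    }

    // Blocks until a permit is free, runs the I/O section, then always releases the permit.
    void read(Runnable ioSection) {
        permits.acquireUninterruptibly();
        try {
            peak.accumulateAndGet(active.incrementAndGet(), Math::max);
            ioSection.run();
        } finally {
            active.decrementAndGet();
            permits.release();
        }
    }

    // Hypothetical stand-in for reading one site through a BufferedReader.
    static void simulatedRead() {
        try {
            Thread.sleep(2);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Starts the given number of threads and returns the highest number of simultaneous reads seen.
    static int runWorkers(int threads, int maxConcurrentReads) {
        IoGate gate = new IoGate(maxConcurrentReads);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> gate.read(IoGate::simulatedRead));
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return gate.peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrent reads: " + runWorkers(300, 10));
    }
}
```

All 300 threads exist, but no more than 10 are ever inside the read at the same moment.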

Minimum size for a piece of work to be beneficially executed on another thread?

I have a low latency system that receives UDP messages. Depending on the message, the system responds by sending out 0 to 5 messages. Figuring out each possible response takes 50 us (microseconds), so if we have to send 5 responses, it takes 250 us.
I'm considering splitting the system up so that each possible response is calculated by a different thread, but I'm curious about the minimum "work time" needed to make that better. While I know I need to benchmark this to be sure, I'm interested in opinions about the minimum piece of work that should be done on a separate thread.
If I have 5 threads waiting on a signal to do 50 us of work, and they don't contend much, will the total time before all 5 are done be more or less than 250 us?

Passing data from one thread to another is very fast, 1-4 us, provided the receiving thread is already running on a core (and not sleeping, waiting or yielding). If the thread has to wake up, that can take 15 us, and the task itself will also take longer because the cache is likely to have lots of misses. This means the task can take 2-3x longer.

Is that 50 us compute-bound or IO-bound? If compute-bound, do you have multiple cores available to run these in parallel?
Sorry, lots of questions, but your particular environment will affect the answer. You need to profile and determine what makes a difference in your particular scenario (perhaps run tests with differently sized thread pools?).
Also, don't forget that threads take up a significant amount of memory for their stack (by default, 512k, IIRC), and that could affect performance too (through paging requests etc.).

If you have more cores than threads, and if the threads are truly independent, then I would not be surprised if the multi-threaded approach took less than 250 us. Whether it does or not will depend on the overhead of creating and destroying threads. Your situation seems ideal, however.
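Since the answer depends so much on the environment, a rough timing harness is one way to check it on your own machine. The sketch below is an assumption-laden illustration, not a rigorous benchmark: `spin` busy-waits to simulate 50 us of computation, the pool threads park between rounds (so wake-up cost is included), and JIT warm-up is handled only by repeating the measurement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class SplitWorkTiming {
    // Busy-spins for roughly the given number of microseconds (simulated computation).
    static long spin(long micros) {
        long end = System.nanoTime() + micros * 1_000;
        long n = 0;
        while (System.nanoTime() < end) {
            n++;
        }
        return n;
    }

    // Time to compute all responses one after another on the calling thread.
    static long sequentialNanos(int tasks, long microsEach) {
        long start = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            spin(microsEach);
        }
        return System.nanoTime() - start;
    }

    // Time to hand the same work to an already-running pool and wait for all of it.
    static long parallelNanos(ExecutorService pool, int tasks, long microsEach) {
        List<Callable<Long>> work = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            work.add(() -> spin(microsEach));
        }
        long start = System.nanoTime();
        try {
            pool.invokeAll(work); // returns once every task has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(5);
        try {
            long seq = 0;
            long par = 0;
            int rounds = 2_000;
            for (int r = 0; r < rounds; r++) { // repeated so warm-up matters less
                seq += sequentialNanos(5, 50);
                par += parallelNanos(pool, 5, 50);
            }
            System.out.println("sequential avg us: " + seq / rounds / 1_000);
            System.out.println("pooled avg us:     " + par / rounds / 1_000);
        } finally {
            pool.shutdown();
        }
    }
}
```

For serious numbers a dedicated harness such as JMH would be a better fit; this only gives a first impression of whether the hand-off cost eats the gain.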

limit of connections with database and number of java threads in an application

I am developing a JMS application (a standalone multithreaded Java application) which can receive 100 messages at a time. They need to be processed, and database procedures need to be called to insert/update data. The procedures are very heavy, as validations are also performed in them. Each procedure takes about 30 to 50 seconds to execute, and they can run concurrently.
My concern is to execute 100 procedures for all 100 messages and also send the replies from the JMS application within a time limit of 90 seconds.
No application server is to be used (a requirement), and the database is Teradata (RDBMS).
I am using a connection pool and a thread pool in the Java code, and I am testing the code with 90 connections.
Question is :
(1) What should be the limit on number of connections with database at a time?
(2) How many threads at a time are recommended?
Thanks,
Jyoti

90 seems like a lot. My recommendation is to benchmark this. Your criteria are unique, and you need to make sure you get the maximum throughput.
I would make the code configurable with how many concurrent connections you use and run it with 10 ... 100 connections going up 10 at a time. This should not take long. When you start slowing down then you know you have exceeded the benefits of running concurrently.
Do it several times to make sure your results are predictable.
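A configurable sweep of that kind could be sketched as follows. Everything here is a placeholder: `callProcedure` only sleeps and stands in for the real stored-procedure call over a pooled connection, so with this stub more threads always look faster; the slowdown described above only shows up once the real database is behind it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PoolSizeSweep {
    // Hypothetical stand-in for one stored-procedure call; replace with the real JDBC call.
    static void callProcedure() {
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Processes `messages` calls using `connections` worker threads; returns elapsed milliseconds.
    static long runBatchMillis(int messages, int connections) {
        ExecutorService pool = Executors.newFixedThreadPool(connections);
        List<Callable<Void>> batch = new ArrayList<>();
        for (int i = 0; i < messages; i++) {
            batch.add(() -> {
                callProcedure();
                return null;
            });
        }
        long start = System.nanoTime();
        try {
            pool.invokeAll(batch); // waits for the whole batch
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // 10 ... 100 connections, going up 10 at a time, for a batch of 100 messages.
        for (int connections = 10; connections <= 100; connections += 10) {
            System.out.println(connections + " connections: "
                    + runBatchMillis(100, connections) + " ms");
        }
    }
}
```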

Another concern is your statement of 'procedure is taking about 30 to 50 seconds to run'. How much of this time is processing via Java and how much time is waiting for the database to process an SQL statement? Should both times really be added to determine the max number of connections you need?
Generally speaking, you should get a connection, use it, and close it as quickly as possible once your Java logic is done. If possible, avoid the pattern of getting a connection, doing a bunch of Java-side processing, calling the database, doing more Java processing, and only then closing the connection. There is probably no need to hold the connection open that long. One consideration to keep in mind with this approach is which processing (including database access) you need to keep in a single transaction.
If for example, of the 50 seconds to run, only 1 second of database access is necessary, then you probably don't need such a high max number of connections.
