Multiple request handling in Java using the concurrency API - java

I have a server which can handle 1000 threads simultaneously.
To handle requests, I implemented the producer-consumer pattern in my code, similar to what a servlet container does.
At a time we can have more than 3000 requests, so:
What should the queue size be for this scenario, and why?
Let's assume a queue size of 2000; what should we do if we get 4000 requests? How can we handle this scenario? (The easiest way is to discard the extra requests, but we need to handle each and every request.)
I want to generate 20 parallel threads, just like JMeter does. How can I do that using the Java concurrency API?
In the above scenario, what type of thread pool should we use, e.g. CachedThreadPool or another, and why?

There are two dimensions of bounds to think about: one dimension is how many threads you can use (which is ultimately usually bounded by having enough memory for the corresponding thread stacks -- on a modern JVM the default is a megabyte of stack per thread so 1000 threads is a gigabyte of memory for thread stacks alone). The other dimension is the bounds on your work queue.
For request serving, you probably want a fixed size thread pool (sized according to Little's Law for your workload) and a queue that can grow as needed. For a request-based server an unbounded queue probably has the best graceful degradation result, although you might want to experiment with a bounded queue and a load-shedding RejectedExecutionPolicy.
All of these options can be configured in ThreadPoolExecutor, which is probably the implementation you want to use.
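Concretely, the combination suggested above can be sketched with a plain ThreadPoolExecutor. This is a scaled-down illustration, not a sizing recommendation (the class name and sizes are mine); CallerRunsPolicy is one way to honour "handle every request" with a bounded queue: when the queue fills up, the submitting thread runs the task itself, which throttles producers instead of discarding work.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class RequestPoolDemo {
    static int process(int totalRequests) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                8, 8,                               // fixed-size pool (stand-in for your 1000 threads)
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(16),       // bounded work queue (stand-in for 2000)
                new ThreadPoolExecutor.CallerRunsPolicy()); // on overflow, the caller runs the task itself
        AtomicInteger handled = new AtomicInteger();
        for (int i = 0; i < totalRequests; i++) {
            pool.execute(handled::incrementAndGet); // trivial stand-in for request handling
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return handled.get();
    }

    public static void main(String[] args) {
        System.out.println(process(4000)); // all 4000 requests are handled, none discarded
    }
}
```

The same executor with a different RejectedExecutionHandler (e.g. AbortPolicy or DiscardPolicy) gives the load-shedding variant mentioned above.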

Based on your requirement to process each and every request, you should consider using a queue (ActiveMQ, for example, but it can be any queue implementation).
Put every incoming request into the queue and have a thread pool of consumers consume from it. This way you can guarantee that every request gets processed, and your application can scale without further changes.
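A minimal in-process sketch of that suggestion (an external broker such as ActiveMQ plays the same role, just across processes; the names and sizes here are illustrative): every request is put on a BlockingQueue, and a fixed pool of consumers drains it, so nothing is dropped.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class QueuedRequests {
    static int handleAll(int requests, int consumers) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>(); // unbounded: accepts every request
        AtomicInteger processed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(requests);
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        for (int c = 0; c < consumers; c++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        queue.take();              // blocks until a request arrives
                        processed.incrementAndGet();
                        done.countDown();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        try {
            for (int i = 0; i < requests; i++) queue.put(i); // producer side
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdownNow();
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println(handleAll(4000, 20)); // 4000 requests, 20 consumer threads
    }
}
```

The 20 consumers also answer the "20 parallel threads like JMeter" part: Executors.newFixedThreadPool(20) is the concurrency-API way to get exactly that.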


Default behavior of a REST API: how multiple requests to the same API are handled

I have an API (GET, REST) developed using Java and Quarkus. I want to understand the default mechanism for how this API handles multiple requests. Is there any queuing mechanism used by default? Is there any multithreading used by default?
Please help me understand this.
Quarkus became popular for its efficient use of resources and its benchmarks on heavily loaded systems. By default it uses two different kinds of threads:
I/O threads, otherwise called event-loop threads
Worker threads
I/O threads, otherwise called event-loop threads. These threads are responsible, among other things, for reading bytes from the HTTP request and writing bytes back to the HTTP response. The important point is that these threads are usually not blocked at all.
The number of these I/O threads, as described in the documentation:
The number of IO threads used to perform IO. This will be
automatically set to a reasonable value based on the number of CPU
cores if it is not provided. If this is set to a higher value than the
number of Vert.x event loops then it will be capped at the number of
event loops. In general this should be controlled by setting
quarkus.vertx.event-loops-pool-size; this setting should only be used
if you want to limit the number of HTTP io threads to a smaller number
than the total number of IO threads.
Worker threads. Here again a pool of threads is maintained by the system, and the system assigns a worker thread to execute some scheduled work for a request. Afterwards, the same thread can be used to execute some other task. These threads normally take over long-running tasks or blocking code.
The default number of these threads is 20, if not otherwise configured, as indicated by the documentation.
So, to sum up, a request in Quarkus will be executed either by some I/O thread or by some worker thread, and those threads are shared between requests. An I/O thread will normally take over non-blocking tasks that do not take long to execute. A worker thread will normally take over blocking tasks and long-running processes.
Taking the above into consideration, it makes sense that Quarkus configures many more worker threads in the worker thread pool than I/O threads in the I/O thread pool.
What is very important to take from the above information is the following:
A worker thread will serve a specific request (e.g. request1), and if during this work it gets blocked on some I/O operation, it will keep waiting for that I/O in order to complete the request it serves. Only when the request is finished can the thread move on and serve another request (e.g. request2).
An I/O thread (event-loop thread) will serve a specific request (e.g. request1), and if the request needs to wait for some I/O operation, the thread will park that request and move on to serve another one (e.g. request2). When the I/O of the first request completes, the scheduler hands request1 back to an event-loop thread, which continues from where it left off.
Now someone may ask: since almost every request requires some type of I/O operation, how can I/O threads ever deliver better performance? In that case the programmer has two choices when declaring a Quarkus controller method to run on an I/O thread:
Manually spawn, inside the controller method, another thread to execute the blocking code, while the outer thread serving the request remains an I/O thread (reading HTTP request data, writing the HTTP response). The manually spawned thread can be a worker thread inside some service layer. This is a somewhat complicated approach.
Use an external library for I/O operations that follows the same approach as Quarkus's I/O threads. For example, database I/O can be handled by the hibernate-reactive library. This way the full benefits of the I/O approach can be achieved.
Some side notes
Considering that we are in the Java ecosystem, it is worth mentioning that the above architecture and resource efficiency is similar (though not identical) to Spring Reactive (WebFlux).
Quarkus, however, is based on JAX-RS and provides this efficient architecture by default, whether or not you write reactive code. With Spring Boot, to get similar efficiency you have to use Spring Reactive (WebFlux).
If you use basic Spring Boot Web, the architecture is one thread per incoming request. A thread in this case cannot switch between requests; it must complete one request before it can handle the next.
Also, in Quarkus, making a controller method execute on an I/O thread is as simple as placing the @NonBlocking annotation on that method; likewise @Blocking for an endpoint method that needs to execute on a worker thread.
In Spring Boot, however, switching between these two types of threads may mean switching between spring-boot-web and spring-boot-webflux. Spring-boot-web does now have some Servlet 3 support to optimize its approach (there are articles describing such optimizations), but this requires programming effort rather than out-of-the-box functionality.

How to set the HTTP thread pool max size on Quarkus

Sorry, I don't really know how the HTTP thread pool works on Quarkus.
I want to set the HTTP thread pool max size, but the Quarkus docs have a lot of config properties about threads.
I tried setting each of them and checked whether it worked in Prometheus, but the Prometheus metric base_thread_max_count never matched my config.
So I want to know how to set this and how to verify it.
Thanks so much
If you mean the pool that handles HTTP I/O events only, then that's quarkus.http.io-threads
If you mean the worker pool that handles blocking operations, that's quarkus.vertx.worker-pool-size
The metric base_thread_max_count isn't really relevant here: it shows the maximum number of threads ever active in that JVM, so it includes various threads unrelated to the HTTP layer. To see the number of active IO threads, I'd suggest taking a thread dump and counting the threads named vert.x-eventloop-thread-*; for worker threads it is executor-thread-*.
Also bear in mind that worker threads are created lazily, so setting a high number to quarkus.vertx.worker-pool-size might not have any immediate effect.
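Putting both properties together, a minimal application.properties fragment might look like this (the values are purely illustrative, not recommendations):

```properties
# Cap the HTTP I/O (event-loop) threads
quarkus.http.io-threads=4
# Cap the worker pool used for blocking operations
quarkus.vertx.worker-pool-size=50
```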

SimpleMessageListener vs DirectMessageListener

I'm trying to see the difference between DirectMessageListener and SimpleMessageListener. I have this drawing, just to ask if it is correct.
Let me try to describe how I understood it, and maybe you can tell me if it is correct.
In front of spring-rabbit there is the rabbit-client Java library, which connects to the RabbitMQ server and delivers messages to spring-rabbit. This client has a ThreadPoolExecutor (which in this case has, I think, 16 threads). So it does not matter how many queues there are in RabbitMQ: if there is a single connection, I get 16 threads. These same threads are reused if I use DirectMessageListener, and the listen handler method executes on all of these 16 threads as messages arrive. So if I do something complex in the handler, rabbit-client must wait for a thread to become free in order to deliver the next message on it. Also, if I increase setConsumersPerQueue to, say, 20, it will create 20 consumers per queue, but not threads. So these 20*5 consumers, in my case, will all reuse the 16 threads offered by the ThreadPoolExecutor?
SimpleMessageListener, on the other hand, has its own threads. If concurrent consumers == 1 (I guess the default, as in my case) it has only one thread. Whenever there is a message on any of the secondUseCase* queues, the rabbit-client library will use one of its 16 threads (in my case) to forward the message to the single internal thread of the SimpleMessageListener. As soon as it is forwarded, the rabbit-client thread is freed and can go back to fetching more messages from the RabbitMQ server.
Your understanding is correct.
The main difference is that, with the DMLC, all listeners in all listener containers are called on the shared thread pool in the amqp-client (you can increase the 16 if needed). You need to ensure the pool is large enough to handle your expected concurrency across all containers, otherwise you will get starvation.
It's more efficient because threads are shared.
With the SMLC, you don't have to worry about that, but at the expense of having a thread per concurrency. In that case, a small pool in the amqp-client will generally be sufficient.

Camel Split EIP and use of threads in pool doesn't seem to go over min threads

I am using Apache Camel 2.15 and found an interesting behavior.
I am placing data received through a REST API call into a Camel route that is a direct endpoint. This route in turn uses the split EIP and calls another Camel route that is also a direct endpoint.
Here is what the relevant Camel code looks like
from("direct:activationInputChannel").routeId("cbr_activation_route")
// removed some processes
.split(simple("${body}"))
.parallelProcessing()
.to("direct:activationItemEndPoint")
.end()
// removed some processes
and
from("direct:activationItemEndPoint").routeId("cbr_activation_item_route")
.process(exchange -> this.doSomething(exchange))
// removed some processes
The use of the direct endpoint should cause the calls to be synchronous and run on the same thread. As expected the split/parallelProcessing usage causes the second route to run on a separate thread. The splitter is using the default thread pool.
When I ran some load tests against the application using JMeter, I found that the split route was becoming a bottleneck. In a load test with 200 threads, I observed in JConsole that the Tomcat HTTP thread pool had a currentThreadCount of 200. I also observed that the Camel route cbr_activation_route had 200 ExchangesInflight.
The problem was the cbr_activation_item_route only had 50 ExchangesInflight. The number 50 corresponded to the poolSize set for the default pool. The maxPoolSize was set to 500 and the maxQueueSize was set to 1000 (the default).
The number of inflight exchanges for this route never rose above the min pool size. Even though there were plenty of requests queued up and threads available. When I changed the poolSize in the Camel default thread pool to 200 then the cbr_activation_item_route used the new min value and had 200 ExchangesInflight. It seems that Camel would not use more threads than the minimum even when there were more threads available and even when under load.
Is there a setting or something that I could be missing that is causing this behavior? Why wouldn't Camel use 200 threads in the first test run when the minimum was set to 50?
Thanks
I agree with Frederico's answer about the behavior of Java's ThreadPoolExecutor: it prefers to add new requests to the queue instead of creating more threads after 'corePoolSize' threads have been reached.
If you want your TPE to add more threads as requests come in after 'corePoolSize' has been reached, there is a slightly hacky way of achieving this, based on the fact that the executor calls the offer() method on the BlockingQueue to queue requests. If offer() returns false, the executor creates a new thread if it can, and otherwise calls the rejectedExecutionHandler. It is possible to override offer() and create your own version of the ThreadPoolExecutor that scales the number of threads based on load.
I found an example of this here: https://github.com/kimchy/kimchy.github.com/blob/master/_posts/2008-11-23-juc-executorservice-gotcha.textile
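The linked example boils down to something like the following sketch (simplified; the class and method names are mine): the overridden offer() always reports the queue as full, so the executor keeps adding threads up to maximumPoolSize, and the rejection handler then genuinely enqueues the task.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ScalingPoolSketch {
    static ThreadPoolExecutor newScalingPool(int core, int max) {
        LinkedBlockingQueue<Runnable> queue = new LinkedBlockingQueue<Runnable>() {
            @Override
            public boolean offer(Runnable r) {
                return false; // always claim "full" so the executor tries to grow the pool first
            }
        };
        return new ThreadPoolExecutor(core, max, 60L, TimeUnit.SECONDS, queue,
                (r, executor) -> {
                    try {
                        executor.getQueue().put(r); // pool is at max: now really queue the task
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
    }

    // Demo: with core=1 and max=4, four long-running tasks grow the pool to 4 immediately.
    static int poolSizeUnderLoad() {
        ThreadPoolExecutor pool = newScalingPool(1, 4);
        CountDownLatch hold = new CountDownLatch(1);
        for (int i = 0; i < 4; i++) {
            pool.execute(() -> {
                try { hold.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
        }
        int size = pool.getPoolSize(); // 4, not 1: the pool grew past core without a full queue
        hold.countDown();
        pool.shutdown();
        try { pool.awaitTermination(5, TimeUnit.SECONDS); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return size;
    }

    public static void main(String[] args) {
        System.out.println(poolSizeUnderLoad());
    }
}
```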
That's the expected behavior. This has nothing to do with Camel itself, but with Java's ThreadPoolExecutor in general.
If you read the linked docs, there it says:
If there are more than corePoolSize but less than maximumPoolSize threads running, a new thread will be created only if the queue is full.
If you set maxQueueSize to 1000, you have to create 1050 requests before new threads are created, up to 200. Try telling Camel to use a SynchronousQueue if you don't want your requests to be queued (not recommended, IMHO).
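The quoted rule is easy to see in a tiny standalone demo (the sizes are deliberately small): with corePoolSize=1, maximumPoolSize=2 and a queue of capacity 1, the second thread is only created once the queue has filled.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class QueueFirstDemo {
    static int poolSizeAfterThreeTasks() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 2, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1));     // tiny queue so it fills fast
        CountDownLatch hold = new CountDownLatch(1);
        Runnable blocker = () -> {
            try { hold.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        };
        pool.execute(blocker); // taken by the core thread
        pool.execute(blocker); // queued (queue is now full)
        pool.execute(blocker); // queue full -> second thread created
        int size = pool.getPoolSize(); // 2: only the full queue triggered the extra thread
        hold.countDown();
        pool.shutdown();
        try { pool.awaitTermination(5, TimeUnit.SECONDS); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return size;
    }

    public static void main(String[] args) {
        System.out.println(poolSizeAfterThreeTasks()); // 2
    }
}
```

Scale the same numbers up (core 50, queue 1000, max 500) and you get exactly the 1050-requests-before-growth behavior described above.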

Best way to configure a Threadpool for a Java RIA client app

I have a Java client which accesses our server side over HTTP, making several small requests to load each new page of data. We maintain a thread pool to handle all non-UI processing, so any background client-side tasks and any tasks which want to make a connection to the server. I've been looking into some performance issues, and I'm not certain we've got our thread pool set up as well as possible. Currently we use a ThreadPoolExecutor with a core pool size of 8, and we use a LinkedBlockingQueue for the work queue, so the max pool size is ignored. No doubt there's no simple "do this in all situations" answer, but are there any best practices? My thinking at the moment is:
1) I'll switch to using a SynchronousQueue instead of a LinkedBlockingQueue so the pool can grow to the max pool size figure.
2) I'll set the max pool size to be unlimited.
Basically my current fear is that occasional performance issues on the server side are causing unrelated client side processing to halt due to the upper limit on the thread pool size. My fear with unbounding it is the additional hit on managing those threads on the client, possibly just the better of 2 evils.
Any suggestions, best practices or useful references?
Cheers,
Robin
It sounds like you'd probably be better off limiting the queue size: does your application still behave properly when there are many requests queued (is it acceptable for all tasks to be queued for a long time, or are some more important than others)? What happens if there are still queued tasks left and the user quits the application? If the queue grows very large, is there a chance that the server will catch up (soon enough) to hide the problem completely from the user?
I'd say create one queue for requests whose response is needed to update the user interface, and keep its queue very small. If this queue gets too big, notify the user.
For real background tasks keep a separate pool, with a longer queue, but not infinite. Define graceful behavior for this pool when it grows or when the user wants to quit but there are tasks left, what should happen?
In general, network latencies are easily orders of magnitude higher than anything that can be happening in regards to memory allocation or thread management on the client side. So, as a general rule, if you are running into a performance bottleneck, look first and foremost at the networking link.
If the issue is that your server simply cannot keep up with the requests from the clients, bumping up the threads on the client side is not going to help matters: you'll simply progress from having 8 threads waiting for a response to more threads waiting (and you may even aggravate the server-side issues by increasing its load due to the higher number of connections it is managing).
Both of the concurrent queues in the JDK are high performers; the choice really boils down to usage semantics. If you have non-blocking plumbing, then it is more natural to use the non-blocking queue. If you don't, then using the blocking queues makes more sense. (You can always specify Integer.MAX_VALUE as the limit.) If FIFO processing is not a requirement, make sure you do not specify fair ordering, as that entails a substantial performance hit.
As alphazero said, if you've got a bottleneck, your number of client side waiting jobs will continue to grow regardless of what approach you use.
The real question is how you want to deal with the bottleneck. Or more correctly, how you want your users to deal with the bottleneck.
If you use an unbounded queue, then you don't get feedback that the bottleneck has occurred. And in some applications, this is fine: if the user is kicking off asynchronous tasks, then there's no need to report a backlog (assuming it eventually clears). However, if the user needs to wait for a response before doing the next client-side task, this is very bad.
If you use LinkedBlockingQueue.offer() on a bounded queue, then you'll immediately get a response that says the queue is full, and can take action such as disabling certain application features, popping a dialog, whatever. This will, however, require more work on your part, particularly if requests can be submitted from multiple places. I'd suggest, if you don't have it already, you create a GUI-aware layer over the server queue to provide common behavior.
And, of course, never ever call LinkedBlockingQueue.put() from the event thread (unless you don't mind a hung client, that is).
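The offer()-based feedback above can be sketched in a few lines; the "server busy" message is a hypothetical stand-in for whatever GUI reaction (dialog, disabled feature) fits your app.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedSubmit {
    // offer() on a bounded queue fails fast instead of blocking, so the
    // caller (e.g. the event thread) can react rather than hang.
    static boolean trySubmit(BlockingQueue<Runnable> queue, Runnable task) {
        boolean accepted = queue.offer(task);
        if (!accepted) {
            // Hypothetical hook: surface the backlog to the user.
            System.out.println("Server busy, request not queued");
        }
        return accepted;
    }

    public static void main(String[] args) {
        BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(2);
        System.out.println(trySubmit(queue, () -> {})); // true
        System.out.println(trySubmit(queue, () -> {})); // true
        System.out.println(trySubmit(queue, () -> {})); // false: queue full
    }
}
```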
Why not create an unbounded queue but reject tasks (and maybe even inform the user that the server is busy; app-dependent!) when the queue reaches a certain size? You can then log this event and find out what happened on the server side to cause the backlog. Additionally, unless you are connecting to multiple remote servers, there is probably not much point in having more than a couple of threads in the pool, although this does depend on your app, what it does, and who it talks to.
Having an unbounded pool is usually dangerous, as it generally doesn't degrade gracefully. Better to log the problem, raise an alert, prevent further actions from being queued, and figure out how to scale the server side (if the problem is there) to prevent this from happening again.
