I am working on a Java web application for Tomcat 6 that offers suggest functionality: a user types free text and gets suggestions for completing the input. To be useful, the web application has to react very fast.
The web application makes suggestions for data that can be modified at any time. When new data is available, the suggest index is rebuilt completely in the background by a daemon thread. Once the data preparation process finishes, the old index is thrown away and the new index takes over, so there are no service gaps.
The background data preparation costs a lot of CPU power. It sometimes causes the service to hang for more than a second, which makes the service less usable and leads to a bad user experience.
My first attempt to solve the problem was to pause all background data preparation threads whenever a request has to be processed. This narrows the problem a bit, but the service is still not smooth. It is not that the code generating the suggestions itself gets slower; rather, it seems as if Tomcat does not always start the thread for the request immediately because of the high load in the other threads (I guess).
I have no idea how to approach this problem. Can I interact with Tomcat's thread scheduling and tell it to prioritize request threads? As expected, adjusting the thread priority did not help either, and I did not find any Tomcat configuration options that would. Or is there no way to deal with this, so that I have to change the software design? I am stuck. Do you have any hints on how to tackle this?
JanP
I would not change the thread priority. By doing that you slow down other threads and therefore other users. And if you have synchronized data, you will run into priority inversion problems, where your faster threads end up waiting on lower-priority threads to release locks on the data.
Instead I would look at how to optimize the data generation process. What are you doing there?
EDIT:
You could create an ExecutorService and send messages to it through a queue, as in this example: java thread pool keep running. To be able to change the thread priority of the tasks, instead of calling ExecutorService pool = Executors.newFixedThreadPool(3); you would create a ThreadFactory, have that ThreadFactory lower the priority of the threads it creates, and then call ExecutorService pool = Executors.newSingleThreadExecutor(threadFactory);
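A minimal sketch of that idea (the thread name, the daemon flag and the rebuildIndex() call are placeholders for your own setup):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

// Factory for the index-building threads: daemon threads with the lowest priority,
// so Tomcat's request threads are preferred by the scheduler.
ThreadFactory threadFactory = new ThreadFactory() {
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r, "suggest-index-builder");
        t.setDaemon(true);
        t.setPriority(Thread.MIN_PRIORITY);
        return t;
    }
};

// A single background thread is enough for rebuilding the index.
ExecutorService pool = Executors.newSingleThreadExecutor(threadFactory);
pool.submit(new Runnable() {
    public void run() {
        rebuildIndex(); // placeholder for your data preparation code
    }
});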
Related
I am trying to implement pagination in LDAP using VLV, following the documentation at https://docs.ldap.com/ldap-sdk/docs/javadoc/com/unboundid/ldap/sdk/controls/VirtualListViewRequestControl.html
It works fine with a single thread, and also with up to 5 concurrent threads. But as the number of threads increases, only 5 threads run successfully; the extra threads fail with the error below:
LDAPException(resultCode=51 (busy), numEntries=0, numReferences=0, diagnosticMessage='Other sort requests already in progress', ldapSDKVersion=5.1.1..
I am using OpenLDAP and the UnboundID API for the connection from Java. The data size is around 100k entries.
I tried with a single connection and with multiple connections (with multiple concurrent threads) and get the same error in both cases.
I tried synchronizing the block that fetches the data.
On exception, I made the thread wait and try again.
None of the above worked; the threads still cannot fetch data from LDAP.
After closing and reconnecting the connection as described in https://www.openldap.org/lists/openldap-technical/201107/msg00006.html
a failed thread can eventually fetch data, but only after retrying a lot of times; in my case a thread retried about 2k times before it started fetching data.
Is there a better solution? Retrying 2k times to get a result is not a good option.
From my experience in Java, it is better to use thread pools, which shifts your solution from "how do I manage threads" to a more robust, task-oriented one.
To the point of your use case: you may want to define a thread pool with a fixed number of threads. The pool manages all incoming load by re-using the threads in the pool. This is very efficient because more threads do not equal more performance; you want a mechanism that re-uses threads rather than one that opens and closes threads and uses too many of them.
You may start with something similar to this:
// Size the pool to what the LDAP server can handle concurrently
// (in your tests 5 threads worked, so even 10 may be too many for VLV sorts).
ExecutorService executorService = Executors.newFixedThreadPool(10);
Future<SearchResult> task1 = executorService.submit(() -> {
    // your LDAP search logic goes here
    return searchResult; // placeholder for the result of that search
});
SearchResult result = task1.get(); // blocks until the task has finished
This is an oversimplified piece of code, but you can clearly see that:
Tasks can be submitted dynamically, as the work comes in
Results are fetched only when they are ready (Future.get() blocks until the result is available - no polling needed)
The thread pool manages the load - so you can tweak your configuration and boost performance without changing your code (perfect for environments that want to configure your solution to suit their hardware profile)
I think you should give it a try... after all, retrying 2000 times before success is really not ideal 🙃
To give some context here, I have been following Project Loom for some time now. I have read The state of Loom. I have done asynchronous programming.
Asynchronous programming (as provided by Java NIO) returns the thread to the thread pool when the task waits, and it goes to great lengths not to block threads. This gives a large performance gain: we can now handle many more requests, as they are no longer directly bound to the number of OS threads. But what we lose is the context. A task is no longer associated with just one thread; all the context is lost once we dissociate tasks from threads. Exception traces do not provide very useful information and debugging is difficult.
In comes Project Loom with virtual threads that become the single unit of concurrency. And now you can perform a single task on a single virtual thread.
It's all fine until now, but the article goes on to state, with Project Loom:
A simple, synchronous web server will be able to handle many more requests without requiring more hardware.
I don't understand how we get performance benefits with Project Loom over asynchronous APIs. Asynchronous APIs already make sure not to keep any thread idle. So what does Project Loom do to make it more efficient and performant than asynchronous APIs?
EDIT
Let me re-phrase the question. Let's say we have an HTTP server that takes in requests and does some CRUD operations against a backing persistent database. Say this HTTP server handles a lot of requests - 100K RPM. Two ways of implementing this:
The HTTP server has a dedicated pool of threads. When a request comes in, a thread carries the task up until it reaches the DB, wherein the task has to wait for the response from DB. At this point, the thread is returned to the thread pool and goes on to do the other tasks. When DB responds, it is again handled by some thread from the thread pool and it returns an HTTP response.
The HTTP server just spawns virtual threads for every request. If there is an IO, the virtual thread just waits for the task to complete. And then returns the HTTP Response. Basically, there is no pooling business going on for the virtual threads.
Given that the hardware and the throughput remain the same, would either solution fare better than the other in terms of response times or handling more throughput?
My guess is that there would not be any difference w.r.t performance.
We don't get a benefit over asynchronous APIs. What we potentially get is performance similar to asynchronous code, but written as synchronous code.
The answer by #talex puts it crisply. Adding further to it.
Loom is more about a native concurrency abstraction which additionally helps one write asynchronous code. Given it is a VM-level abstraction, rather than just a code-level one (like what we have been doing till now with CompletableFuture etc.), it lets one implement asynchronous behavior but with reduced boilerplate.
With Loom, a more powerful abstraction is the savior. We have seen this repeatedly: abstraction with syntactic sugar lets one write programs effectively, whether it was FunctionalInterfaces in JDK 8 or for-comprehensions in Scala.
With Loom, there isn't a need to chain multiple CompletableFutures (to save on resources). One can write the code synchronously, and with each blocking operation encountered (ReentrantLock, I/O, JDBC calls) the virtual thread gets parked. Because these are lightweight threads, the context switch is way cheaper than for kernel threads.
When blocked, the actual carrier thread (the one that was running the run body of the virtual thread) gets engaged in executing some other virtual thread's run body. So the carrier thread is not sitting idle but executing other work, and it comes back to continue the execution of the original virtual thread whenever it is unparked - just like how a thread pool works. But here, a single carrier thread in effect executes the bodies of multiple virtual threads, switching from one to another when blocked.
We get the same behavior (and hence performance) as manually written asynchronous code, while avoiding the boilerplate to do the same thing.
Consider the case of a web framework where there is a separate thread pool to handle I/O and another for the execution of HTTP requests. For simple HTTP requests, one might serve the request from the HTTP-pool thread itself. But if there are any blocking (or high-CPU) operations, we let this activity happen asynchronously on a separate thread.
This thread collects the information from an incoming request, spawns a CompletableFuture, and chains it into a pipeline (read from the database as one stage, followed by a computation on it, followed by another stage to write back to the database, make web-service calls, etc.). Each one is a stage, and the resultant CompletableFuture is returned to the web framework.
When the resultant future completes, the web framework relays the result back to the client. This is how Play Framework and others have been dealing with it, providing isolation between the HTTP thread-handling pool and the execution of each request. But if we dive deeper into this, why do we do it?
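A rough sketch of such a pipeline (the stage methods, the Response type, requestId and the two executors are made-up placeholders, not any particular framework's API):

// Each stage runs on a pooled thread; the http thread is never blocked.
CompletableFuture<Response> pipeline =
        CompletableFuture.supplyAsync(() -> readFromDatabase(requestId), ioPool)
                .thenApplyAsync(row -> compute(row), workerPool)
                .thenApplyAsync(result -> writeBackToDatabase(result), ioPool)
                .thenApply(saved -> toHttpResponse(saved));
// The framework registers a callback and relays the response once the future completes.
pipeline.thenAccept(response -> relayToClient(response));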
One core reason is to use the resources effectively, particularly around blocking calls. Hence we chain with thenApply etc. so that no thread is blocked on any activity, and we do more with a smaller number of threads.
This works great, but it is quite verbose. Debugging is indeed painful, and if one of the intermediary stages results in an exception, the control flow goes haywire, requiring further code to handle it.
With Loom, we write synchronous code, and let someone else decide what to do when blocked, rather than sleep and do nothing.
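A sketch of the same flow written synchronously on virtual threads (assuming the Java 21 API; the stage methods and the Row/Result/Saved/Response types are the same placeholders as above):

// One virtual thread per request; each blocking call parks the virtual thread
// and frees its carrier thread to run other requests in the meantime.
ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();
Future<Response> future = executor.submit(() -> {
    Row row = readFromDatabase(requestId);      // blocks -> the virtual thread parks
    Result result = compute(row);
    Saved saved = writeBackToDatabase(result);  // blocks again
    return toHttpResponse(saved);
});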
The http server has a dedicated pool of threads ....
How big a pool? (Number of CPUs)*N + C? With N>1 you can fall into anti-scaling, as lock contention extends latency; whereas N=1 can under-utilize the available bandwidth. There is a good analysis here.
The http server just spawns...
That would be a very naive implementation of this concept. A more realistic one would strive to draw from a dynamic pool that keeps one real thread for every blocked system call plus one for every real CPU. At least that is what the folks behind Go came up with.
The crux is to keep the {handlers, callbacks, completions, virtual threads, goroutines: all PEAs in a pod} from fighting over internal resources, so that they do not lean on system-based blocking mechanisms until absolutely necessary. This falls under the banner of lock avoidance, and might be accomplished with various queuing strategies (see libdispatch), etc. Note that this leaves the PEA divorced from the underlying system thread, because they are internally multiplexed between them. This is your concern about divorcing the concepts. In practice, you pass around your favourite language's abstraction of a context pointer.
As 1 indicates, there are tangible results that can be directly linked to this approach, and a few intangibles. Locking is easy -- you just make one big lock around your transactions and you are good to go. That doesn't scale; but fine-grained locking is hard: hard to get working, hard to choose the fineness of the grain. When to use { locks, CVs, semaphores, barriers, ... } is obvious in textbook examples, a little less so in deeply nested logic. Lock avoidance makes that, for the most part, go away, limited to contended leaf components like malloc().
I maintain some skepticism, as the research typically shows a poorly scaled system being transformed into a lock-avoidance model and then shown to be better. I have yet to see one that unleashes some experienced developers to analyze the synchronization behavior of the system, transform it for scalability, and then measure the result. But even if that were a win, experienced developers are a rare(ish) and expensive commodity; the heart of scalability is really financial.
I have a question regarding performance tuning.
I'm using a 64-bit Linux server, Java 1.8 and WildFly 10.0.0.Final. I developed a web service which uses a thread factory and a managed executor service through the WildFly configuration.
The purpose of my web service is to receive a request containing large data, save the data, create a new thread to process it, and then return the response. This way the web service can respond quickly without waiting for the data processing to finish.
The configured managed-executor-service holds a thread pool config specifically for this purpose.
From my understanding of the configuration, core-thread defines how many threads are kept alive in the thread pool. When the core threads are all busy, new requests are put in the queue; when the queue is full, additional threads are created, but these newly created threads are terminated after some time.
I'm trying to figure out the best combination of settings for the thread pool. These are my concerns:
If core-thread is set too small (like 5), the response time may suffer because only 5 active threads process data while the rest sit in the queue until it is full. Response times won't look good under heavy load.
If I set core-thread to a big value (like 100), then even when the system is not busy there will be 100 live threads in the pool. I don't see any configuration option that allows these core threads to be terminated, and I'm concerned that is too many idle threads.
Does anyone have suggestions on how to set the parameters to handle both heavy-load and light-load situations without leaving too many idle threads in the pool? I'm not really familiar with this area - for example, how many idle threads is too many, and how would I measure that?
The following is the configuration for the thread factory and managed-executor-service:
<managed-thread-factory name="UploadThreadFactory" jndi-name="java:jboss/ee/concurrency/factory/uploadThreadFactory"/>
<managed-executor-service name="UploadManagedExecutor" jndi-name="java:jboss/ee/concurrency/executor/uploadManagedExecutor" context-service="default" thread-factory="UploadThreadFactory" hung-task-threshold="60000" core-thread="5" max-thread="100" keep-alive-time="5000" queue-length="500"/>
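For what it's worth, the semantics you describe map roughly onto a plain java.util.concurrent pool like the sketch below. This is not the WildFly implementation, just an illustration of the core/queue/max interplay; the values mirror your configuration and the keep-alive unit is assumed to be milliseconds:

ThreadPoolExecutor executor = new ThreadPoolExecutor(
        5,                                       // core-thread
        100,                                     // max-thread
        5000, TimeUnit.MILLISECONDS,             // keep-alive-time for threads above the core size
        new LinkedBlockingQueue<Runnable>(500)); // queue-length

// With a plain ThreadPoolExecutor you can also let the core threads time out
// when idle, so the pool shrinks again under light load.
executor.allowCoreThreadTimeOut(true);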
Thanks a lot for your help,
Helen
I'm really new to programming and am having performance problems with my software. Basically I get some data and run a 100-iteration loop over it (i=0; i<100; i++), and during that loop my program makes one of 3 decisions: keep the data it's working on, discard it, or send a version of it back to the queue for processing. The individual work each thread does is very small, but there's a lot of it (which is why I'm using a queue server to scale horizontally).
My problem is that it never comes close to using my entire CPU; my program runs at around 40% per core. After profiling, it seems the majority of the time is spent sending/receiving data from the queue (approx. 64% in com.rabbitmq.client.impl.Frame.readFrom(DataInputStream) and com.rabbitmq.client.impl.SocketFrameHandler.readFrame(), approx. 17% getting the data into the format for the queue - which I brought down from 40% - and the rest is my program's logic). Obviously I want my work to be done faster and to spend less time in the queue, and I'm wondering if there's a better design I can use.
My code is actually quite large, but here's an overview of what it does:
I create a connection to the queue server (RabbitMQ, Java client)
I fork as many threads as I have CPU cores (using the same connection)
The data flow in each thread is (roughly sketched in code after this list):
Each thread creates its own channel to the queue server using the shared connection.
A while loop polls the server and gets X messages without acknowledgments
Once I get a message, I use a thread executor to send the acknowledgment while my job is running
I parse the message and run my loop
If data is sent back to the queue, I send it to a thread executor that sends it back so my program can proceed with the next data set.
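Roughly, each worker's loop looks something like this (a sketch using the older QueueingConsumer API; the queue name, prefetch count, running flag, ackExecutor and process() are placeholders, and most error handling is omitted):

final Channel channel = connection.createChannel();     // one channel per worker thread
channel.basicQos(50);                                   // fetch up to X unacknowledged messages
QueueingConsumer consumer = new QueueingConsumer(channel);
channel.basicConsume("work-queue", false, consumer);    // false = manual acks

while (running) {
    final QueueingConsumer.Delivery delivery = consumer.nextDelivery();
    final long tag = delivery.getEnvelope().getDeliveryTag();
    ackExecutor.submit(new Runnable() {                 // ack off the worker thread
        public void run() {
            try {
                channel.basicAck(tag, false);
            } catch (IOException e) { /* log */ }
        }
    });
    process(delivery.getBody());                        // the 100-iteration loop: keep, discard, or requeue
}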
One weird thing I did: although I use a thread executor for acknowledgments and for sending back to the queue, my main worker threads are just forked threads (using public void run()). Because my program is dedicated to this single process, I did that to make sure there were always X threads ready to work (with no shutting down/respawning of them). The rest runs in executor threads because I figured that work could wait/be queued while my main program runs.
I'm not sure how to design this better so it spends less time gathering/sending data. Are there any designs, or RabbitMQ or Java features, I can use to help?
If it's not IO wait, then I suspect that it's down to some locking going on inside those methods.
It looks to me like your threads are spending a significant amount of time waiting for those calls to return. Somewhat counter-intuitively, you may well be able to increase your performance by cutting down on the number of threads, since they'll spend less time tripping over each other and more time actively doing something.
Give it a try and see what effect it has on the profile.
I have a Java client which accesses our server side over HTTP, making several small requests to load each new page of data. We maintain a thread pool to handle all non-UI processing, so any background client-side tasks and any tasks which want to make a connection to the server. I've been looking into some performance issues and I'm not certain we've set up our thread pool as well as possible. Currently we use a ThreadPoolExecutor with a core pool size of 8 and a LinkedBlockingQueue for the work queue, so the max pool size is ignored. No doubt there's no simple "do this one thing in all situations" answer, but are there any best practices? My thinking at the moment is:
1) I'll switch to using a SynchronousQueue instead of a LinkedBlockingQueue so the pool can grow to the max pool size figure.
2) I'll set the max pool size to be unlimited.
Basically my current fear is that occasional performance issues on the server side are causing unrelated client-side processing to halt because of the upper limit on the thread pool size. My fear with unbounding it is the additional cost of managing those threads on the client - possibly just the lesser of two evils.
Any suggestions, best practices or useful references?
Cheers,
Robin
It sounds like you'd probably be better off limiting the queue size: does your application still behave properly when many requests are queued (is it acceptable for all tasks to be queued for a long time, or are some more important than others)? What happens if there are still queued tasks left when the user quits the application? If the queue grows very large, is there a chance that the server will catch up (soon enough) to hide the problem completely from the user?
I'd say create one queue for requests whose response is needed to update the user interface, and keep that queue very small. If it gets too big, notify the user.
For real background tasks, keep a separate pool with a longer queue, but not an infinite one. Define graceful behavior for this pool when it grows, or when the user wants to quit but tasks are left - what should happen? See the sketch below.
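Something along these lines, using java.util.concurrent (the pool sizes and queue lengths are arbitrary placeholders):

// Small pool with a short bounded queue for requests the UI is actually waiting on;
// if this queue fills up, notify the user.
ExecutorService uiRequestPool = new ThreadPoolExecutor(
        4, 4, 0L, TimeUnit.MILLISECONDS,
        new LinkedBlockingQueue<Runnable>(10));

// Separate pool with a longer, but still bounded, queue for real background work.
ExecutorService backgroundPool = new ThreadPoolExecutor(
        2, 2, 0L, TimeUnit.MILLISECONDS,
        new LinkedBlockingQueue<Runnable>(200));

By default a full queue makes execute()/submit() throw RejectedExecutionException, which gives you the hook for notifying the user or implementing whatever graceful behavior you decide on.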
In general, network latencies are easily orders of magnitude higher than anything happening around memory allocation or thread management on the client side. So, as a general rule, if you are running into a performance bottleneck, look first and foremost at the network link.
If the issue is that your server simply cannot keep up with the requests from the clients, bumping up the number of threads on the client side will not help matters: you'll simply go from having 8 threads waiting for a response to having more threads waiting (and you may even aggravate the server-side issues by increasing its load through the higher number of connections it has to manage).
Both of the concurrent queues in the JDK are high performers; the choice really boils down to usage semantics. If you have non-blocking plumbing, it is more natural to use the non-blocking queue. If you don't, then using the blocking queues makes more sense. (You can always specify Integer.MAX_VALUE as the limit.) If FIFO processing is not a requirement, make sure you do not specify fair ordering, as that will entail a substantial performance hit.
As alphazero said, if you've got a bottleneck, the number of waiting client-side jobs will continue to grow regardless of which approach you use.
The real question is how you want to deal with the bottleneck. Or more correctly, how you want your users to deal with the bottleneck.
If you use an unbounded queue, then you don't get feedback that the bottleneck has occurred. And in some applications, this is fine: if the user is kicking off asynchronous tasks, then there's no need to report a backlog (assuming it eventually clears). However, if the user needs to wait for a response before doing the next client-side task, this is very bad.
If you use LinkedBlockingQueue.offer() on a bounded queue, you'll immediately get a response that says the queue is full, and you can take action such as disabling certain application features, popping up a dialog, whatever. This will, however, require more work on your part, particularly if requests can be submitted from multiple places. I'd suggest, if you don't have it already, that you create a GUI-aware layer over the server queue to provide common behavior.
And, of course, never ever call LinkedBlockingQueue.put() from the event thread (unless you don't mind a hung client, that is).
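A sketch of such a GUI-aware layer, using java.util.concurrent (the sizes and the busy-notification are placeholders):

class ServerRequestGateway {
    private final LinkedBlockingQueue<Runnable> queue = new LinkedBlockingQueue<Runnable>(50);
    private final ThreadPoolExecutor pool =
            new ThreadPoolExecutor(8, 8, 0L, TimeUnit.MILLISECONDS, queue);

    ServerRequestGateway() {
        pool.prestartAllCoreThreads(); // workers must already exist, since we bypass execute() below
    }

    // Safe to call from the event thread: offer() never blocks.
    void submit(Runnable request) {
        if (!queue.offer(request)) {
            notifyServerBusy(); // e.g. disable features or pop a dialog
        }
    }

    private void notifyServerBusy() { /* app-specific */ }
}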
Why not create an unbounded queue, but reject tasks (and maybe even inform the user that the server is busy - app dependent!) when the queue reaches a certain size? You can then log this event and find out what happened on the server side to cause the backlog. Additionally, unless you are connecting to multiple remote servers, there is probably not much point in having more than a couple of threads in the pool, although this does depend on your app, what it does and who it talks to.
Having an unbounded pool is usually dangerous, as it generally doesn't degrade gracefully. Better to log the problem, raise an alert, prevent further actions from being queued, and figure out how to scale the server side - if the problem is there - to prevent this from happening again.