I am working with the Volley library in Android for HTTP communication. By default, Volley keeps 4 threads that take HTTP 'Request' objects (a Request object contains all the details needed to make an HTTP request: URL, HTTP method, data to be posted, etc.) from a BlockingQueue and make HTTP requests concurrently. Looking at my app's requirements, I will be using all 4 threads at once less than 10% of the time; the rest of the time I will be using only 1 or 2 threads from that pool. So in effect 2 to 3 threads will be in wait() mode almost 90% of the time.
So here is my question:
1) What is the overhead of a thread which is in wait() mode? Does it consume a significant amount of CPU cycles, and is it a good idea for me to keep all those threads waiting?
I assume that since a waiting thread will be continuously checking a monitor/lock in a loop or so (in the internal implementation) in order to wake up, it might consume a considerable amount of CPU cycles to maintain a waiting thread. Correct me if I am wrong.
Thanks.
What is the overhead of a thread which is in wait() mode
None. A waiting thread doesn't consume any CPU cycles at all; it just waits to be awakened. So don't worry about it.
I assume that since a waiting thread will be continuously polling a monitor/lock internally in order to wake up, it might consume a considerable amount of CPU cycles to maintain a waiting thread. Correct me if I am wrong.
That's not true. A waiting thread doesn't do any polling on a monitor/lock/anything.
The only situation where a large number of threads can hurt performance is when there are many active threads (many more than the number of CPUs/cores) that are frequently switched back and forth, because CPU context switching also comes with a cost. Waiting threads only consume memory, not CPU.
If you want to look at the internal implementation of threads, I have to disappoint you. Methods like wait()/notify() are native, which means that their implementation depends on the JVM. In the case of the HotSpot JVM you can take a look at its source code (written in C++, with a bit of assembler).
But do you really need this? Why don't you want to trust the JVM documentation?
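For illustration only, here is a minimal sketch (the class and method names are my own) of the wait()/notify() hand-off the documentation describes: while the consumer is inside wait() it is descheduled by the OS and consumes no CPU until the producer calls notify().

    import java.util.ArrayDeque;
    import java.util.Queue;

    class Mailbox {
        private final Queue<String> messages = new ArrayDeque<>();

        // Consumer side: blocks inside wait() without burning CPU cycles.
        public synchronized String take() throws InterruptedException {
            while (messages.isEmpty()) {
                wait();                 // releases the monitor and sleeps until notified
            }
            return messages.poll();
        }

        // Producer side: hands over a message and wakes one waiting consumer.
        public synchronized void put(String message) {
            messages.add(message);
            notify();                   // wakes a thread blocked in wait(), if any
        }
    }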
Related
I have a third party API, which I call using an HTTP GET request. Each request takes a few seconds to get a response.
Currently I am using a CompletableFuture which I execute on a FixedThreadPool of size 64. This causes the threads to block until they receive a response to the GET request, i.e. the threads sit idle after sending the GET request until they receive the response. So the maximum number of simultaneous requests I can send out is limited by my thread pool size, i.e. 64 here.
What can I use instead of CompletableFuture so that my threads don't sit idle waiting for the response?
As #user207421 says:
A truly asynchronous (i.e. event driven) HTTP client application is complicated.
A multi-threaded (but fundamentally synchronous) HTTP client application is simpler, and scales to as many threads as you have memory for.
Assuming that you have 64 worker threads processing requests, the actual bottleneck is likely to be EITHER your physical network bandwidth, OR your available client-side CPU. If you have hit those limits, then:
increasing the number of worker threads is not going to help, and
switching to an asynchronous (event driven) model is not going to help.
A third possibility is that the bottleneck is server-side resource limits or rate limiting. In this scenario, increasing the client-side thread count might help, have no effect, or make the problem worse. It will depend on how the server is implemented, the nature of the requests, etc.
If your bottleneck really is the number of threads, then a simple thing to try is reducing the worker thread stack size so that you can run more of them. The default stack size is typically 1MB, and that is likely to be significantly more than it needs to be. (This will also reduce the ... erm ... memory overhead of idle threads, if that is a genuine issue.)
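To illustrate the stack-size point, a hedged sketch (the 256 KB value and the class name are assumptions of mine, not a recommendation): the stack can be shrunk globally with the -Xss JVM option, or per thread via the Thread constructor that takes a stackSize hint.

    import java.util.concurrent.ThreadFactory;

    // A ThreadFactory that requests a smaller stack; the value is only a hint to the JVM.
    class SmallStackThreadFactory implements ThreadFactory {
        private static final long STACK_SIZE_BYTES = 256 * 1024; // assumed to be enough for the workload

        @Override
        public Thread newThread(Runnable task) {
            return new Thread(null, task, "http-worker", STACK_SIZE_BYTES);
        }
    }

You would then pass such a factory to the executor that creates the worker threads, e.g. Executors.newFixedThreadPool(64, new SmallStackThreadFactory()).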
There are a few Java asynchronous HTTP client libraries around. But I have never used one and cannot recommend one. And like #user207421, I am not convinced that the effort of changing will actually pay off.
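For completeness, if you do want to experiment with the asynchronous route, recent JDKs (11+) ship java.net.http.HttpClient, whose sendAsync method returns a CompletableFuture without tying up a worker thread. A rough sketch (the URL is a placeholder):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.concurrent.CompletableFuture;

    public class AsyncGetExample {
        public static void main(String[] args) {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(
                    URI.create("https://example.com/api"))   // placeholder URL
                    .GET()
                    .build();

            // sendAsync returns immediately; no thread sits blocked waiting for the response.
            CompletableFuture<Void> done = client
                    .sendAsync(request, HttpResponse.BodyHandlers.ofString())
                    .thenAccept(response -> System.out.println(response.statusCode()));

            done.join();  // only for the demo, so the JVM doesn't exit before the response arrives
        }
    }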
What can I [do] so that my threads don't sit idle waiting for the response?
Idle threads are actually not the problem. An idle thread only uses memory (and has some possible secondary effects which probably don't matter here). Unless you are short of memory, it will make little difference.
Note: if there is something else for your client to do while a thread is waiting for a server response, the OS thread scheduler will switch to a different thread.
So my maximum number of simultaneous requests I can send out is limited by my thread [pool] size i.e. 64 here.
That is true. However, sending more simultaneous requests probably won't help. If the client-side threads are sitting idle, that probably means that the bottleneck is either the network, or something on the server side. If this is the case, adding more threads won't increase throughput. Instead individual requests will take (on average) longer, and throughput will stay the same ... or possibly drop if the server starts dropping requests from its request queue.
Finally, if you are worried about the overhead of a large pool of worker threads sitting idle (waiting for the next task to do), use an execution service or connection pool that can shrink and grow its thread pool to meet changing workloads.
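A hedged sketch of such an elastic pool (all sizes and timeouts are illustrative assumptions of mine, not recommendations):

    import java.util.concurrent.SynchronousQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class ElasticPool {
        public static ThreadPoolExecutor create() {
            ThreadPoolExecutor pool = new ThreadPoolExecutor(
                    4, 64,                        // core and maximum pool size (illustrative)
                    30, TimeUnit.SECONDS,         // idle threads above the core size die after 30 s
                    new SynchronousQueue<>(),     // hand tasks straight to a thread, like newCachedThreadPool
                    new ThreadPoolExecutor.CallerRunsPolicy()); // if all 64 threads are busy, the caller runs the task
            pool.allowCoreThreadTimeOut(true);    // let even the core threads shrink away when idle
            return pool;
        }
    }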
I have developed a Java-based server with a thread pool that grows dynamically with the client request rate. This strategy is known as FBOS (Frequency Based Optimization Strategy) for thread pool systems.
For example, if the request rate is 5 requests per second then my thread pool will have 5 threads to service the clients' requests. The client requests are I/O-bound jobs of 1 second each, i.e. each request is a runnable Java object that calls sleep() to simulate the I/O operation.
If the client request rate is 10 requests per second then my thread pool will have 10 threads in it to process clients. Each thread has an internal timer object that is activated when its corresponding thread is idle; when the idle time reaches 5 seconds, the timer deletes its thread from the pool to dynamically shrink the thread pool.
My strategy works well for short I/O intensities. My server works nicely for small request rates, but for large request rates my thread pool ends up holding a large number of threads. For example, if the request rate is 100 requests per second then my thread pool will have 100 threads in it.
Now I have 3 questions in my mind:
(1) Can I face memory leaks using this strategy for a large request rate?
(2) Can the OS or JVM face excessive thread-management overhead at a large request rate that will slow down the system?
(3) Last and very important: I am very curious to implement my thread pool in a clustered environment (I am a dummy in clustering).
I just want advice from all of you on how a clustered environment can benefit me in the scenario of a frequency-based thread pool for I/O-bound jobs only. That is, can a clustered environment give me the benefit of using the memory of other systems (nodes)?
The simplest solution is to use a cached thread pool; see Executors. I suggest you try this first. It will create as many threads as you need at once. For an I/O-bound workload, a single machine can easily expand to thousands of threads without needing an additional server.
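A minimal sketch of that suggestion, using the question's own numbers (100 requests per second, each a 1-second sleep to simulate I/O):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class CachedPoolExample {
        public static void main(String[] args) {
            // Creates threads on demand, reuses idle ones, and removes threads idle for 60 seconds.
            ExecutorService pool = Executors.newCachedThreadPool();

            for (int i = 0; i < 100; i++) {          // e.g. 100 requests arriving in one second
                pool.execute(() -> {
                    try {
                        Thread.sleep(1000);          // the 1-second I/O-bound job from the question
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            pool.shutdown();                         // already submitted tasks still run to completion
        }
    }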
Can I face memory leaks using this strategy for a large request rate?
No, 100 per second is not particularly high. If you are talking about over 10,000 per second, you might have a problem (or need another server).
Can the OS or JVM face excessive thread-management overhead at a large request rate that will slow down the system?
Yes, my rule of thumb is that 10,000 threads waste about 1 CPU in overhead.
Last and very important: I am very curious to implement my thread pool in a clustered environment (I am a dummy in clustering).
Given you appear to be using up to 1% of one machine, I wouldn't worry about using multiple machines to do the I/O. Most likely you want to process the results, but without more information it's impossible to say whether more machines would help or not.
Can a clustered environment give me the benefit of using the memory of other systems (nodes)?
It can help if you need it, or it can add complexity you don't need if you don't.
I suggest you start with a real problem and look for a solution to solve it, rather than start with a cool solution and try to find a problem for it to solve.
In my application there are several services, each processing information on its own thread; when they are done they post a message to the next service, which then continues the work on its own thread. The handover of messages is done via a LinkedBlockingQueue. The handover normally takes 50-80 us (from putting a message on the queue until the consumer starts to process it).
To speed up the handover on the most important services I wanted to use a busy spin instead of a blocking approach (I have 12 processor cores and want to dedicate 3 to these important services).
So.. I changed LinkedBlockingQueue to ConcurrentLinkedQueue
and did
for (;;) {
    Message m = queue.poll();   // non-blocking; returns null when the queue is empty
    if (m != null) {
        // ... process the message
    }
}
Now... the result is that the first message pass takes 1 us, but then the latency increases over the next 25 handovers until it reaches 500 us, and then the latency suddenly drops back to 1 us and starts to increase again. So I have latency cycles of 25 iterations where latency starts at 1 us and ends at 500 us. (Messages are passed approximately 100 times per second.)
With an average latency of 250 us, it is not exactly the performance gain I was looking for.
I also tried to use the LMAX Disruptor ring buffer instead of the ConcurrentLinkedQueue. That framework has its own built-in busy-spin implementation and a quite different queue implementation, but the result was the same. So I'm quite certain that it's not the fault of the queue, or of me abusing something.
Question is... what the heck is going on here? Why am I seeing these strange latency cycles?
Cheers!!
As far as I know, the thread scheduler can deliberately pause a thread for a longer time if it detects that this thread is using the CPU quite intensively, in order to distribute CPU time between threads more fairly. Try adding LockSupport.park() in the consumer when the queue is empty and LockSupport.unpark() in the producer after adding a message; it might make the latency less variable. Whether it will actually be better than the blocking queue is a big question, though.
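Roughly what that suggestion could look like (the Message type, queue and field names are placeholders of mine):

    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.locks.LockSupport;

    public class ParkingHandover {
        private final Queue<String> queue = new ConcurrentLinkedQueue<>();
        private volatile Thread consumer;

        // Consumer: park instead of spinning when the queue is empty.
        public void consume() {
            consumer = Thread.currentThread();
            for (;;) {
                String m = queue.poll();
                if (m != null) {
                    // ... process the message
                } else {
                    LockSupport.park();       // sleeps until the producer unparks us (or a spurious wakeup)
                }
            }
        }

        // Producer: enqueue and wake the (possibly parked) consumer.
        public void produce(String m) {
            queue.add(m);
            LockSupport.unpark(consumer);     // no effect if consumer is null; if it is not yet parked,
                                              // the pending permit makes its next park() return immediately
        }
    }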
If you really need to do the job the way you described (and not the way Andrey Nudko replied at Jan 5 at 13:22), then you definitely need to look at the problem from other viewpoints as well.
Just some hints:
Try checking your overall environment (outside the JVM). For example:
The OS CPU scheduler has a huge impact on this; currently the default is very likely
http://en.wikipedia.org/wiki/Completely_Fair_Scheduler
number of running processes, etc.
"problems" inside your JVM
garbage collector (try a different one: http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html#1.1.%20Types%20of%20Collectors%7Coutline)
Try changing thread priorities: Setting priority to Java's threads
This is just wild speculation (since, as others have mentioned, you're not gathering any information on queue length, failed polls, null polls, etc.):
I used the force and read the source of ConcurrentLinkedQueue, or rather, briefly leafed through it for a minute or two. Polling is not quite your trivial O(1) operation. It might be the case that you're traversing more than a few nodes which have become stale, holding null; and there may be additional transitory states involving nodes linking to themselves as the next node, as an indication of staleness/removal from the queue. It may be that the queue is starting to build up garbage due to thread scheduling. Try following the links to the abstract algorithm mentioned in the code:
Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms by Maged M. Michael and Michael L. Scott (the link has a PDF and pseudocode).
Here are my 2 cents. If you are running on Linux/Unix-based systems, there is a way to dedicate a certain CPU to a certain thread. In essence, you can make the OS ignore that CPU for any scheduling. Check out CPU isolation.
I'm trying to use a thread pool in Java, but the number of threads is unknown, so I'm trying to find a solution. Two questions occurred to me:
I'm looking for a way to grow the size of the thread pool for some time, but I haven't come up with anything yet. Any suggestions for that? Some say Executors.newCachedThreadPool() should work, but the method's documentation says it's for short-lived tasks.
What if I set the size of the thread pool to a big number like 50 or 100? Does that work fine?
You can use Executors.newCachedThreadPool for longer-lived tasks too, but the thing is that if you have long-running tasks, and they are added constantly and more frequently than existing tasks are completed, the number of threads will spin out of control. In such a case it might be a better idea to use a (larger) fixed-size thread pool and let further tasks wait in the queue for a free thread.
This will only mean you'll (probably) have lots of live threads that are sleeping (idle) most of the time. Basically the things to consider are:
How many threads your system can handle (i.e. how many threads can be created in total; on Windows machines this can be less than 1000, on Linux you can get tens of thousands of threads, and even more with some tweaking of the system configuration)
Each thread consumes memory for its stack (on Linux, this can be something like 1-8 MB per thread by default; again, it can be tweaked via ulimit and the JVM's -Xss parameter)
At least with NPTL, there should be minimal or almost zero context-switching penalty for sleeping threads, so excess threads aren't "heavy" in terms of CPU usage
That being said, it'd probably be best to use ThreadPoolExecutor's constructors directly to get exactly the kind of pooling you want; a sketch follows.
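A hedged sketch of what using the constructor directly might look like (every number here is a placeholder to tune for your workload):

    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class BoundedPool {
        public static ThreadPoolExecutor create() {
            return new ThreadPoolExecutor(
                    10, 50,                            // core and maximum pool size (placeholders)
                    60, TimeUnit.SECONDS,              // idle threads above the core count die after 60 s
                    new LinkedBlockingQueue<>(1000),   // waiting room; threads beyond the core size are only
                                                       // created once this queue is full
                    new ThreadPoolExecutor.CallerRunsPolicy()); // back-pressure when pool and queue are both full
        }
    }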
Executors.newCachedThreadPool() allows you to create threads on demand. I think you can start by using this. I cannot see where it's stated that it's for short-lived threads, but I bet the reason is that since you are re-using available threads, short-lived tasks let you keep the number of simultaneously active threads quite low.
Unless you've got too many threads running (you can check this using JVisualVM or JConsole), I would suggest sticking with that solution, especially because the number of expected threads is undefined. Then analyze the VM and tune your pool accordingly.
For question 2: were you referring to using something like Executors.newFixedThreadPool(int)? If yes, remember that going above the number of threads you defined when you created the ThreadPool will make tasks wait, unlike newCachedThreadPool, in which new threads are created dynamically.
I would like to ask whether Java will use more CPU resources when threads are blocked, i.e. waiting to lock a monitor that is currently held by another thread.
I am now looking at a thread dump in which some threads are blocked waiting to lock a monitor, and I am unsure whether that may account for the high CPU usage.
Thanks!
EDIT (6 May 2011): I forgot to mention: is this behavior relevant for Java SE 1.4.2?
Threads consume resources such as memory. Blocking/unblocking a thread incurs a one-off cost. If a thread blocks/unblocks tens of thousands of times per second, this can waste a significant amount of CPU.
However, once a thread is blocked, it doesn't matter how long it is blocked for; there is no ongoing cost.
The answer is not so simple. There may be cases where threads that go into the blocked state may end up causing CPU utilization.
Most JVMs employ tiered locking algorithms. These often involve techniques such as spinlocks, especially for locks held for a short duration. When a thread tries to acquire a monitor and finds it cannot, the JVM may actually put it in a loop and have the thread repeatedly attempt to acquire the monitor, rather than context-switching it out immediately. If the thread fails to acquire the lock after a certain number of tries or a certain duration (depending on the specific JVM implementation), the JVM switches to a "fat lock" or "inflated lock" mode where it does context-switch out the thread.
It is with the spinlock behavior that you may incur CPU costs. If you have code that holds locks for a very short duration and contention is high, then you may see an appreciable bump in CPU utilization. For some discussion of the various techniques JVMs use to reduce the cost of contention, see http://www.ibm.com/developerworks/java/library/j-jtp10185/index.html.
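A toy illustration of that scenario (the thread count and the work done under the lock are arbitrary choices of mine): many threads repeatedly acquiring a lock that is held only for a single increment, which is exactly the short-hold, high-contention pattern where adaptive spinning can show up as extra CPU usage.

    public class ShortLockContention {
        private static final Object lock = new Object();
        private static long counter;

        public static void main(String[] args) {
            for (int i = 0; i < 16; i++) {            // 16 contending threads (arbitrary)
                new Thread(() -> {
                    for (;;) {
                        synchronized (lock) {          // the lock is held only for one increment
                            counter++;
                        }
                    }
                }).start();
            }
            // Watching this process in top or JVisualVM, part of the CPU time is spent
            // acquiring the contended lock (spinning) rather than doing useful work.
        }
    }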
No, threads that are blocked on a monitor do not take up additional CPU time.
Suspended or blocked threads do not consume any CPU time.