Why is an ExecutorService created via newCachedThreadPool evil?

Why is an ExecutorService created via newCachedThreadPool evil? - java

Paul Tyma presentation has this line:
Executors.newCacheThreadPool evil, die die die
Why is it evil ?
I will hazard a guess: is it because the number of threads will grow in an unbounded fashion. Thus a server that has been slashdotted, would probably die if the JVM's max thread count was reached ?

(This is Paul)
The intent of the slide was (apart from having facetious wording) that, as you mention, that thread pool grows without bound creating new threads.
A thread pool inherently represents a queue and transfer point of work within a system. That is, something is feeding it work to do (and it may be feeding work elsewhere too). If a thread pool starts to grow its because it cannot keep up with demand.
In general, that's fine as computer resources are finite and that queue is built to handle bursts of work. However, that thread pool doesn't give you control over being able to push the bottleneck forward.
For example, in a server scenario, a few threads might be accepting on sockets and handing a thread pool the clients for processing. If that thread pool starts to grow out of control - the system should stop accepting new clients (in fact, the "acceptor" threads then often hop into the thread-pool temporarily to help process clients).
The effect is similar if you use a fixed thread pool with an unbounded input queue. Anytime you consider the scenario of the queue filling out of control - you realize the problem.
IIRC, Matt Welsh's seminal SEDA servers (which are asynchronous) created thread pools which modified their size according to server characteristics.
The idea of stop accepting new clients sounds bad until you realize the alternative is a crippled system which is processing no clients. (Again, with the understanding that computers are finite - even an optimally tuned system has a limit)
Incidentally, JVMs limit threads to 16k (usually) or 32k threads depending on the JVM. But if you are CPU bound, that limit isn't very relevant - starting yet another thread on a CPU-bound system is counterproductive.
I've happily run systems at 4 or 5 thousand threads. But nearing the 16k limit things tend to bog down (this limit JVM enforced - we had many more threads in linux C++) even when not CPU bound.

The problem with Executors.newCacheThreadPool() is that the executor will create and start as many threads as necessary to execute the tasks submitted to it. While this is mitigated by the fact that the completed threads are released (the thresholds are configurable), this can indeed lead to severe resource starvation, or even crash the JVM (or some badly designed OS).

There are a couple of issues with it. Unbounded growth in terms of threads an obvious issue – if you have cpu bound tasks then allowing many more than the available CPUs to run them is simply going to create scheduler overhead with your threads context switching all over the place and none actually progressing much. If your tasks are IO bound though things get more subtle. Knowing how to size pools of threads that are waiting on network or file IO is much more difficult, and depends a lot on the latencies of those IO events. Higher latencies mean you need (and can support) more threads.
The cached thread pool continues adding new threads as the rate of task production outstrips the rate of execution. There are a couple of small barriers to this (such as locks that serialise new thread id creation) but this can unbound growth can lead to out-of-memory errors.
The other big problem with the cached thread pool is that it can be slow for task producer thread. The pool is configured with a SynchronousQueue for tasks to be offered to. This queue implementation basically has zero size and only works when there is a matching consumer for a producer (there is a thread polling when another is offering). The actual implementation was significantly improved in Java6, but it is still comparatively slow for the producer, particularly when it fails (as the producer is then responsible for creating a new thread to add to the pool). Often it is more ideal for the producer thread to simply drop the task on an actual queue and continue.
The problem is, no-one has a pool that has a small core set of threads which when they are all busy creates new threads up to some max and then enqueues subsequent tasks. Fixed thread pools seem to promise this, but they only start adding more threads when the underlying queue rejects more tasks (it is full). A LinkedBlockingQueue never gets full so these pools never grow beyond the core size. An ArrayBlockingQueue has a capacity, but as it only grows the pool when capacity is reached this doesn't mitigate the production rate until it is already a big problem. Currently the solution requires using a good rejected execution policy such as caller-runs, but it needs some care.
Developers see the cached thread pool and blindly use it without really thinking through the consequences.

Related

Is it safe to have multiple Executors.newCachedThreadPool() in a Java program?

The spec for this method: https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Executors.html#newCachedThreadPool()
Creates a thread pool that creates new threads as needed, but will reuse previously constructed threads when they are available. These pools will typically improve the performance of programs that execute many short-lived asynchronous tasks. Calls to execute will reuse previously constructed threads if available. If no existing thread is available, a new thread will be created and added to the pool. Threads that have not been used for sixty seconds are terminated and removed from the cache. Thus, a pool that remains idle for long enough will not consume any resources. Note that pools with similar properties but different details (for example, timeout parameters) may be created using ThreadPoolExecutor constructors.
It's not clear to me from this description - is it safe to have several of these pools in a single program? Or would I potentially run into a situation where one pool stalls on many threads and freezes up other pools?

I don't think there is a clear yes / no answer on this.
On the one hand, there is not a finite number of threads that ThreadPoolExecutor instances consume. The JVM architecture itself doesn't limit the number of threads.
On the the second hand, the OS / environment may place some limits:
The OS may have hard limits on the total number of native threads it will support.
The OS may restrict the number of native threads that a given process (in this case the JVM) can create. This could be done using ulimit or cgroup limits, and potentially other ways.
A Java thread stack has a size of 1MB (by default) on a typical 64 bit JVM. If you attempt to start() too many threads, you may run out of memory and get an OOME.
If there are a large enough number of threads and/or too much thread context switching, the thread scheduler (in the OS) may struggle.
(Context switching typically happens when a thread does a blocking syscall or has to wait on a lock or a notification. Each time you switch context there are hardware related overheads: saving and restoring registers, switching virtual memory contexts, flushing memory caches, etc.)
On the third hand, there are other things than just the number and size of thread pools that could cause problems. For example, if the thread tasks interact with each other, you could experience problems due to:
deadlocking when locking shared objects,
too much contention on shared locks leading to resource starvation,
too much work leading to timeouts, or
priority inversion problems ... if you try to use priorities to "manage" the workload.
So ...
Is it safe to have several of these pools in a single program?
Or would I potentially run into a situation where one pool stalls on many threads and freezes up other pools.
It is unlikely you would get a "stall" ... unless the tasks are interacting in some way.
But if you have too many runnable threads competing for CPU, each one will get (on average) a smaller share of the finite number of cores available. And lock contention or too much context switching can slow things down further.

how can i utilize the power of CLUSTER ENVIRONMENT for my thread pool that is dealing with I/O bound jobs?

I have developed a JAVA based server having a Thread Pool that is dynamically growing with respect to client request rate.This strategy is known as FBOS(Frequency Based Optimization Strategy) FBOS for Thread pool System.
For example if request rate is 5 requests per second then my thread pool will have 5 threads to service client's requests. The client requests are I/O bound jobs of 1 seconds.i.e. each request is a runnable java object that have a sleep() method to simulate I/O operation.
If client request rate is 10 requests per second then my thread pool will have 10 threads inside in it to process clients. Each Thread have an internal timer object that is activated when its corresponding thread is idle and when its idle time becomes 5 seconds the timer will delete its corresponding thread from the Thread Pool to dynamically shrink the Thread Pool.
My strategy is working well for short I/O intensities.My server is working nicely for small request rate but for large request rate my Thread pool have large number of threads inside it. For example if request rate is 100 request per second then my Thread Pool will have 100 threads inside it.
Now I have 3 questions in my mind
(1) Can i face memory leaks using this strategy, for large request rate?
(2) Can OS or JVM face excessive Thread management overhead on large request rate that will slow down the system
(3) Last and very important question is that ,I am very curious to implement my thread Pool in a clustered environment(I am DUMMY in clustering).
I just want to take advice from all of you that how a clustering environment can give me more benefit in the scenario of Frequency Based Thread Pool for I/O bound jobs only. That is can a clustering environment give me benefit of using memories of other systems(nodes)?

The simplest solution to use is a cached thread pool, see Executors I suggest you try this first. This will create the number of threads to need at once. For an IO bound request, a single machine can easily expand to 1000s of threads without needing an additional server.
Can i face memory leaks using this strategy, for large request rate?
No, 100 per second is not particularly high. If you are talking over 10,000 per second, you might have a problem (or need another server)
Can OS or JVM face excessive Thread management overhead on large request rate that will slow down the system
Yes, my rule of thumb is that 10,000 threads wastes about 1 cpu in overhead.
Last and very important question is that ,I am very curious to implement my thread Pool in a clustered environment(I am DUMMY in clustering).
Given you look to be using up to 1% of one machine, I wouldn't worry about using multiple machines to do the IO. Most likely you want to process the results, but without more information you couldn't say whether more machines would help or not.
can a clustering environment give me benefit of using memories of other systems(nodes)?
It can help if you need it or it can add complexity you don't need if you don't need it.
I suggest you start with a real problem and look for a solution to solve it, rather than start with a cool solution and try to find a problem for it to solve.

Optimization of Thread Pool Executor-java

I am using ThreadPoolexecutor by replacing it with legacy Thread.
I have created executor as below:
pool = new ThreadPoolExecutor(coreSize, size, 0L, TimeUnit.MILLISECONDS,
new LinkedBlockingQueue<Runnable>(coreSize),
new CustomThreadFactory(name),
new CustomRejectionExecutionHandler());
pool.prestartAllCoreThreads();
Here core size is maxpoolsize/5. I have pre-started all the core threads on start up of application roughly around 160 threads.
In legacy design we were creating and starting around 670 threads.
But the point is even after using Executor and creating and replacing legacy design we are not getting much better results.
For results memory management we are using top command to see memory usage.
For time we have placed loggers of System.currentTime in millis to check the usage.
Please tell how to optimize this design. Thanks.

But the point is even after using Executor and creating and replacing legacy design we are not getting much better results.
I am assuming that you are looking at the overall throughput from your application and you are not seeing a better performance as opposed to running each task in its own thread -- i.e. not with a pool?
This sounds like you were not being blocked because of context switching. Maybe your application is IO bound or otherwise waiting on some other system resource. 670 threads sounds like a lot and you would have been using a lot of thread stack memory but otherwise it may not have been holding back the performance of your application.
Typically we use the ExecutorService classes not necessarily because they are faster than raw threads but because the code is easier to manage. The concurrent classes take care of a lot of locking, queueing, etc. out of your hands.
Couple code comments:
I'm not sure you want the LinkedBlockingQueue to be limited by core-size. Those are two different numbers. core-size is the minimum number of threads in the pool. The size of the BlockingQueue is how many jobs can be queued up waiting for a free thread.
As an aside, the ThreadPoolExecutor will never allocate a thread past the core thread number, unless the BlockingQueue is full. In your case, if all of the core-threads are busy and the queue is full with the core-size number of queued tasks is when the next thread is forked.
I've never had to use pool.prestartAllCoreThreads();. The core threads will be started once tasks are submitted to the pool so I don't think it buys you much -- at least not with a long running application.
For time we have placed loggers of System.currentTime in millis to check the usage.
Be careful on this. Too many loggers could affect performance of your application more than re-architecting it. But I assume you added the loggers after you didn't see a performance improvement.

The executor merely wraps the creation/usage of Threads, so it's not doing anything magical.
It sounds like you have a bottleneck elsewhere. Are you locking on a single object ? Do you have a single single-threaded resource that every thread hits ? In such a case you wouldn't see any change in behaviour.
Is your process CPU-bound ? If so your threads should (very roughly speaking) match the number of processing cores available. Note that each thread you create consumes memory for its stack, and if you're memory bound, then creating multiple threads won't help here.

Number of processor core vs the size of a thread pool

Many times I've heard that it is better to maintain the number of threads in a thread pool below the number of cores in that system. Having twice or more threads than the number of cores is not only a waste, but also could cause performance degradation.
Are those true? If not, what are the fundamental principles that debunk those claims (specifically relating to java)?

Many times I've heard that it is better to maintain the number of threads in a thread pool below the number of cores in that system. Having twice or more threads than the number of cores is not only a waste, but also could cause performance degradation.
The claims are not true as a general statement. That is to say, sometimes they are true (or true-ish) and other times they are patently false.
A couple things are indisputably true:
More threads means more memory usage. Each thread requires a thread stack. For recent HotSpot JVMs, the minimum thread stack size is 64Kb, and the default can be as much as 1Mb. That can be significant. In addition, any thread that is alive is likely to own or share objects in the heap whether or not it is currently runnable. Therefore is is reasonable to expect that more threads means a larger memory working set.
A JVM cannot have more threads actually running than there are cores (or hyperthread cores or whatever) on the execution hardware. A car won't run without an engine, and a thread won't run without a core.
Beyond that, things get less clear cut. The "problem" is that a live thread can in a variety of "states". For instance:
A live thread can be running; i.e. actively executing instructions.
A live thread can be runnable; i.e. waiting for a core so that it can be run.
A live thread can by synchronizing; i.e. waiting for a signal from another thread, or waiting for a lock to be released.
A live thread can be waiting on an external event; e.g. waiting for some external server / service to respond to a request.
The "one thread per core" heuristic assumes that threads are either running or runnable (according to the above). But for a lot of multi-threaded applications, the heuristic is wrong ... because it doesn't take account of threads in the other states.
Now "too many" threads clearly can cause significant performance degradation, simple by using too much memory. (Imagine that you have 4Gb of physical memory and you create 8,000 threads with 1Mb stacks. That is a recipe for virtual memory thrashing.)
But what about other things? Can having too many threads cause excessive context switching?
I don't think so. If you have lots of threads, and your application's use of those threads can result in excessive context switches, and that is bad for performance. However, I posit that the root cause of the context switched is not the actual number of threads. The root of the performance problems are more likely that the application is:
synchronizing in a particularly wasteful way; e.g. using Object.notifyAll() when Object.notify() would be better, OR
synchronizing on a highly contended data structure, OR
doing too much synchronization relative to the amount of useful work that each thread is doing, OR
trying to do too much I/O in parallel.
(In the last case, the bottleneck is likely to be the I/O system rather than context switches ... unless the I/O is IPC with services / programs on the same machine.)
The other point is that in the absence of the confounding factors above, having more threads is not going to increase context switches. If your application has N runnable threads competing for M processors, and the threads are purely computational and contention free, then the OS'es thread scheduler is going to attempt to time-slice between them. But the length of a timeslice is likely to be measured in tenths of a second (or more), so that the context switch overhead is negligible compared with the work that a CPU-bound thread actually performs during its slice. And if we assume that the length of a time slice is constant, then the context switch overhead will be constant too. Adding more runnable threads (increasing N) won't change the ratio of work to overhead significantly.
In summary, it is true that "too many threads" is harmful for performance. However, there is no reliable universal "rule of thumb" for how many is "too many". And (fortunately) you generally have considerable leeway before the performance problems of "too many" become significant.

Having fewer threads than cores generally means you can't take advantage of all available cores.
The usual question is how many more threads than cores you want. That, however, varies, depending on the amount of time (overall) that your threads spend doing things like I/O vs. the amount of time they spend doing computation. If they're all doing pure computation, then you'd normally want about the same number of threads as cores. If they're doing a fair amount of I/O, you'd typically want quite a few more threads than cores.
Looking at it from the other direction for a moment, you want enough threads running to ensure that whenever one thread blocks for some reason (typically waiting on I/O) you have another thread (that's not blocked) available to run on that core. The exact number that takes depends on how much of its time each thread spends blocked.

That's not true, unless the number of threads is vastly more than the number of cores. The reasoning is that additional threads will mean additional context switches. But it's not true because an operating system will only make unforced context switches if those context switches are beneficial, and additional threads don't force additional context switches.
If you create an absurd number of threads, that wastes resources. But none of this is anything compared to how bad creating too few threads is. If you create too few threads, an unexpected block (such as a page fault) can result in CPUs sitting idle, and that swamps any possible harm from a few extra context switches.

Not exactly true, this depends on the overall software architecture. There's a reason of keeping more threads than available cores in case some of the threads are suspended by the OS because they're waiting for an I/O to complete. This may be an explicit I/O invocation (such as synchronous reading from file), as well as implicit, such as system paging handling.
Actually I've read in one book that keeping the number of threads twice the number of CPU cores is is a good practice.

For REST API calls or say I/O-bound operations, having more threads than the number of cores can potentially improve the performance by allowing multiple API requests to be processed in parallel. However, the optimal number of threads depends on various factors such as the API request frequency, the complexity of the request processing, and the resources available on the server.
If the API request processing is CPU-bound and requires a lot of computation, having too many threads may cause resource contention and lead to reduced performance. In such cases, the number of threads should be limited to the number of cores available.
On the other hand, if the API request processing is I/O-bound and involves a lot of waiting for responses from external resources such as databases, having more threads may improve performance by allowing multiple requests to be processed in parallel.
In any case, it is recommended to perform performance testing to determine the optimal number of threads for your specific use case and monitor the system performance using metrics such as response time, resource utilization, and error rate.

Java - Managing Size of Thread Pool (Increasing mostly)

I'm trying to use thread pool in Java. But the number of threads is unknown, so I'm trying to find a solution. Then two questions occured:
I'm looking for increasing size of thread pool for some time, but I couldn't come up with something yet. Any suggestions for that? Some say Executors.newCachedThreadPool() should work but in definition of the method it says it's for short-time threads.
What if I set the size of the thread pool as a big number like 50 or 100? Does it work fine?

You can use Executors.newCachedThreadPool for more long-lived tasks also, but the thing is that if you have long running tasks, and they're added constantly and more frequently than existing tasks are being completed, the amount of threads will spin out of control. In such case it might be a better idea to use a (larger) fixed-size thread pool and let the further tasks wait in queue for a free thread.
This will only mean you'll (probably) have lots of alive threads that are sleeping (idle) most of the time. Basically the things to consider are
How many threads can your system handle (ie. how many threads can be created in total, in Windows-machines this can be less than 1000, in Linuces you can get tens of thousands of thread and even more with some tweaking of the system configuration)
Each thread consumes at least the stack size of a single thread in terms of memory (in Linux, this can be something like 1-8MB per thread by default, again it can be tweaked from ulimits and the JVM's -Xss -parameter)
At least with NPTL, there should minimal or almost zero context-switching penalty for sleeping threads, so excess threads aren't "heavy" in terms of cpu usage
That being said, it'd probably be best to use the ThreadPoolExecutor's constructors directly to get the kind of pooling you want.

Executors.newCachedThreadPool() allows you to create thread on demands. I think you can start by using this - I cannot see where it's stated that it's for short-time threads, but I bet the reason is since you are re-using available threads, having short threads allows you to keep the number of simultaneous active threads quite low.
Unless you've got too many threads running (you can check it using JVisualVM or JConsole), I would suggest sticking with that solution - specially because number of expected threads is undefined. Analyze then the VM and tune your pool accordingly.
For question 2 - were you referring to using something like Executors.newFixedThreadPool(int)? If yes, remember that going aobve the number of threads you defined when you've created the ThreadPool will make threads wait - instead of newCachedThreadPool in which new threads are dynamically created.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.