I would like to know which one I should use in the following scenario:
There are many tasks to process, usually about 400k.
Most of the tasks take less than 4 seconds to process, but some of them (300 to 500 tasks) take a long time, usually between 10 and 30 minutes.
Currently, we use a FixedThreadPool of size 200.
I am wondering whether we could do better with a CachedThreadPool.
I would also like to know what the impact on the server will be, as only one server is dedicated to this task.
All tasks perform only calculations; there are no I/O operations.
In your case, the thread pool type does not affect performance, because the cost of thread management is very small compared to the cost of each task (anywhere from a few seconds to 30 minutes).
The number of parallel threads that are running is more important. If each task does not perform any I/O operation, probably the correct number of parallel threads is the number of cores of your hardware. If your tasks involve network or disk I/O, it is more difficult to determine the correct level of parallelism to maximize performance.
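As a minimal sketch of that advice, assuming the tasks are purely computational, the pool can be sized from Runtime.getRuntime().availableProcessors() instead of a fixed 200. The class name CpuBoundPool and the busy-work method simulateCalculation are hypothetical placeholders for the real tasks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CpuBoundPool {
    // Hypothetical stand-in for one pure-calculation task (no I/O).
    static long simulateCalculation(int seed) {
        long acc = seed;
        for (int i = 0; i < 100_000; i++) {
            acc = acc * 31 + i; // overflow wraps deterministically
        }
        return acc;
    }

    // Runs taskCount tasks on a fixed pool with one thread per available processor
    // and returns the sum of their results.
    static long runAll(int taskCount) {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Long>> futures = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                final int seed = i;
                futures.add(CompletableFuture.supplyAsync(() -> simulateCalculation(seed), pool));
            }
            long sum = 0;
            for (CompletableFuture<Long> f : futures) {
                sum += f.join(); // blocks until that task has finished
            }
            return sum;
        } finally {
            pool.shutdown(); // lets worker threads exit once the queue drains
        }
    }
}
```

Switching the factory method to newCachedThreadPool would not make any single calculation faster; what the pool size controls is how many calculations compete for the cores at once.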
Starting point
Here's what stands out in your question:
There are ~400,000 tasks to process
Most tasks (~399,500 or ~99.875%) take 4 seconds or less to complete
Some tasks (~500 or ~0.125%) usually take 10-30 minutes to complete
Tasks perform "no I/O operations"
Current approach uses a FixedThreadPool with size 200
Overview
Given that the tasks perform "no I/O operations", this implies:
no disk I/O
no network I/O
The tasks are then bound (limited) either by CPU or memory.
A first step would be to understand which of the two is the limiter: CPU or memory.
Nothing in your problem statement sounds like the choice of thread pool is a factor.
Limited by CPU
If the work is CPU-bound, in general nothing will be improved by increasing the thread pool size beyond the number of available CPU cores. So if you have 32 CPU cores available, a thread pool with a larger number of active threads (for example, 100) would incur overhead (run slower) due to context switching. Things won't "go faster" with more threads if the underlying contended resource is CPU.
With a CPU-bound problem, I would first set the thread pool size no higher than the total number of CPU cores on the machine (and probably lower). So if your machine has 32 cores, try a thread pool of 16 or maybe 20 to start. Then process real-world data, make observations about performance, and possibly make further changes based on those test runs.
Besides your own program, there are always other things running on any computer system, so it isn't a given that 16 (for example) is "low enough" – it depends on what else is running on the system. That's the importance of doing a test run though – set it to 16, look for signs of CPU contention, possibly reduce below 16 if needed; or maybe there's plenty of idle CPU available with 16, so it could be safe/fine to increase higher.
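A minimal sketch of that "start low, then adjust" approach, assuming the pool size is read from a JVM system property so it can be changed between test runs without recompiling. The property name taskPool.size and the class name TunablePool are hypothetical; the default of half the available processors mirrors the 16-of-32 example above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class TunablePool {
    // Hypothetical property name; override with e.g. -DtaskPool.size=20 on the command line.
    static final String SIZE_PROPERTY = "taskPool.size";

    // Default: half the available processors, but never fewer than one thread.
    static int defaultSize() {
        return Math.max(1, Runtime.getRuntime().availableProcessors() / 2);
    }

    // Reads the configured size, falling back to the default when unset or invalid.
    static int configuredSize() {
        String raw = System.getProperty(SIZE_PROPERTY);
        if (raw == null) {
            return defaultSize();
        }
        try {
            int size = Integer.parseInt(raw.trim());
            return size > 0 ? size : defaultSize();
        } catch (NumberFormatException e) {
            return defaultSize();
        }
    }

    static ExecutorService create() {
        return Executors.newFixedThreadPool(configuredSize());
    }
}
```

Between runs you would change only the -D value while watching CPU usage (for example in top), lowering it on signs of contention or raising it if there is idle CPU to spare.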
Limited by memory
If the work is memory-bound, the thread pool size isn't as directly tied to the contended resource (like it is with CPU cores). It might take additional understanding to decide if or how to tune the system to avoid memory contention.
As with CPU-bound problems, you should be able to start with a fixed size (something smaller than 200), and make observations using real-world data sets.
Some pattern should emerge; for example, perhaps the ~500 or so 10-30 minute tasks use far more memory than all the other tasks.
Related
I'm confused about something.
What I know is that the maximum number of threads that can run concurrently on a typical modern CPU ranges from 8 to 16.
On the other hand, on GPUs thousands of threads can run concurrently without the scheduler interrupting any thread to schedule another one.
In several posts, such as:
Java virtual machine - maximum number of threads https://community.oracle.com/message/10312772
people state that they run thousands of Java threads concurrently on ordinary CPUs.
How can this be?
And how can I find out the maximum number of threads that can run concurrently, so that my code adjusts itself dynamically to the underlying architecture?
Threads aren't tied to or limited by the number of available processors/cores. The operating system scheduler can switch back and forth between any number of threads on a single CPU. This is the meaning of "preemptive multitasking."
Of course, if you have more threads than cores, not all threads will be executing simultaneously. Some will be on hold, waiting for a time slot.
In practice, the number of threads you can have is limited by the scheduler - but that number is usually very high (thousands or more). It will vary from OS to OS and with individual versions.
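To see that software threads are not capped by the core count, the sketch below (class name ManyThreads is hypothetical) starts a given number of threads that all park on a CountDownLatch, counts how many are alive at the same moment, and then releases them. A count such as 500 is an arbitrary assumption, chosen to exceed typical core counts while staying well below OS limits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

class ManyThreads {
    // Starts `count` threads that all wait on one latch and returns how many were alive at once.
    static int aliveAtOnce(int count) throws InterruptedException {
        CountDownLatch started = new CountDownLatch(count);
        CountDownLatch release = new CountDownLatch(1);
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Thread t = new Thread(() -> {
                started.countDown();
                try {
                    release.await(); // parked: this thread is not using a core while it waits
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            threads.add(t);
        }
        started.await(); // every thread has begun running
        int alive = 0;
        for (Thread t : threads) {
            if (t.isAlive()) alive++;
        }
        release.countDown(); // let them all finish
        for (Thread t : threads) t.join();
        return alive;
    }
}
```

Calling aliveAtOnce(500) on a machine where Runtime.getRuntime().availableProcessors() reports, say, 8 still yields 500 live threads; the scheduler gives each one a turn only when it becomes runnable.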
As far as how many threads are useful from a performance standpoint, as you said it depends on the number of available processors and on whether the task is IO or CPU bound. Experiment to find the optimal number and make it configurable if possible.
There is hardware concurrency and software concurrency. The 8 to 16 threads refer to the hardware you have: one or more CPUs with hardware to execute 8 to 16 threads in parallel. The thousands of threads refer to software threads; the scheduler has to swap them in and out so that every software thread gets its time slice on the hardware.
To get the number of hardware threads available to the JVM, you can use Runtime.getRuntime().availableProcessors().
At any given time, a processor runs at most as many threads as it has cores. This means that on a uniprocessor system, only one thread (or no thread) is running at any given moment.
However, processors do not run each thread to completion one after another; rather, they switch between multiple threads rapidly to simulate concurrent execution. If this weren't the case, you couldn't even run multiple applications at once, let alone multiple threads.
A Java thread is, compared to processor instructions, a very high-level abstraction of a set of instructions for the CPU to process. At the processor level, there is no guarantee which threads will run on which core at any given time. But because processors rapidly switch between these threads, it is possible to create a very large number of threads, limited in practice by memory and operating system limits, and at the cost of performance.
If you think about it, a modern computer has thousands of threads alive at the same time (across all applications) while having only 1 to 16 cores in the typical case. Without this task switching, nothing would ever get done.
If you are optimizing your application, you should choose the number of threads based on the work at hand, not on the underlying architecture. Performance gains from parallelism should be weighed against the increasing overhead of thread execution. Since every machine and every runtime environment is different, it is impractical to work out a golden thread count (although a ballpark estimate can be made by benchmarking and looking at the number of cores).
The other answers have already explained well how you can theoretically have thousands of threads in your application, at the cost of memory and other overheads. It is worth noting, however, that the default concurrencyLevel of ConcurrentHashMap in the java.util.concurrent package is 16.
You may run into contention issues if you don't account for this.
Using a significantly higher value than you need can waste space and time, and a significantly lower value can lead to thread contention.
Make sure you have set an appropriate concurrencyLevel if you are running into concurrency-related issues with a higher number of threads.
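A sketch of passing an explicit concurrencyLevel through the three-argument ConcurrentHashMap(initialCapacity, loadFactor, concurrencyLevel) constructor, assuming you know roughly how many threads will update the map. The class and method names are hypothetical, and 0.75 is simply the usual default load factor. Note that since Java 8 the Javadoc describes concurrencyLevel only as a sizing hint, so it matters far less than it did with the segmented Java 7 implementation.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class SizedMaps {
    // Creates a map sized for `expectedEntries`, hinting at `writerThreads` concurrent updaters.
    static <K, V> ConcurrentMap<K, V> forWriters(int expectedEntries, int writerThreads) {
        // Constructor arguments: (initialCapacity, loadFactor, concurrencyLevel).
        // concurrencyLevel must be positive, hence the Math.max guard.
        return new ConcurrentHashMap<>(expectedEntries, 0.75f, Math.max(1, writerThreads));
    }
}
```

For example, SizedMaps.forWriters(10_000, Runtime.getRuntime().availableProcessors()) ties the hint to the same number used to size a CPU-bound pool.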
I'm using a thread pool whose size is one larger than the number of CPUs to execute tasks that are mostly CPU-bound with a bit of I/O:
Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors() + 1)
Assuming a simple program that submits all its tasks to this executor and does little else, I assume that a larger thread pool would slow things down, because the OS would have to time-slice its CPUs more often to give each thread in the pool a chance to run.
Is that correct? And if so, is this a real problem or mostly theoretical; i.e., if I increased the thread pool size to 1000, would I notice a massive difference?
If you have CPU-bound tasks, then as you increase the number of threads you get increasing overhead and slower performance. Note: having more threads than waiting tasks is just a waste of resources, but may not slow the tasks down much.
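One way to answer the "would 1000 threads make a massive difference" question for your own machine is a rough timing harness like the one below. It is a sketch, not a rigorous benchmark (single run, no JIT warm-up; a dedicated tool such as JMH would be more reliable), and busyWork is a hypothetical CPU-bound stand-in for the real tasks. It runs the same tasks on a pool of availableProcessors() + 1 threads and on a pool of 1000 threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PoolSizeTiming {
    // Hypothetical CPU-bound task: pure arithmetic, no I/O, no shared state.
    static long busyWork(int seed) {
        long acc = seed;
        for (int i = 0; i < 200_000; i++) {
            acc ^= (acc << 13) + i;
        }
        return acc;
    }

    // Runs `tasks` copies of busyWork on a fixed pool of `threads` threads.
    // Returns {elapsedNanos, checksum}.
    static long[] run(int threads, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long start = System.nanoTime();
            List<CompletableFuture<Long>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final int seed = i;
                futures.add(CompletableFuture.supplyAsync(() -> busyWork(seed), pool));
            }
            long checksum = 0;
            for (CompletableFuture<Long> f : futures) {
                checksum += f.join();
            }
            return new long[] {System.nanoTime() - start, checksum};
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        for (int threads : new int[] {cpus + 1, 1000}) {
            long[] r = run(threads, 2000);
            System.out.printf("%4d threads: %d ms%n", threads, r[0] / 1_000_000);
        }
    }
}
```

The checksum is returned only so the two runs can be checked for identical results; compare the printed times on your own hardware, since the gap depends on the machine, the JVM, and the workload.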
I would use a multiple (e.g. 1 or 2) of the number of CPUs rather than adding just one, as having one thread too many can have a surprising amount of overhead.
For reference, check this description.
http://codeidol.com/java/java-concurrency/Applying-Thread-Pools/Sizing-Thread-Pools/
In short, what you have (No. CPU + 1) is optimal on average.
I have a large array of type C and a pool of threads. Each thread has a range of indexes (the ranges don't overlap) and does some CPU-bound operations to populate them.
After submitting the tasks to the executor (created with newFixedThreadPool), I monitor the output of the 'top' command and notice that the CPU spends a significant amount of time in kernel space ("%sy" in the 'top' output), between 15 and 25%, during the execution of those tasks (before, it is low, and afterwards it drops again).
On some test runs it does happen that "%sy" stays close to 0 and then the execution is much faster.
The number of threads is equal to the number of logical CPUs on the test machine, and this is also the number of tasks that I submit to the executor (so it is one thread per CPU-bound task). Therefore I wouldn't expect much context switching here.
In this part of code there is no explicit synchronization done by me, I rely only on the guarantees provided by the executor service as the threads don't share any variables.
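The setup described above (one non-overlapping index range per logical CPU) can be sketched roughly as follows. A double[] stands in for the array of type C, and computeValue is a hypothetical placeholder for the CPU-bound work; neither comes from the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class RangeFill {
    // Hypothetical placeholder for the CPU-bound computation of one element.
    static double computeValue(int index) {
        return Math.sqrt(index) + index * 0.5;
    }

    // Fills `out` using one task per logical CPU, each writing only its own index range.
    static void fill(double[] out) {
        int cpus = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cpus);
        try {
            int chunk = (out.length + cpus - 1) / cpus; // ceiling division
            List<CompletableFuture<Void>> tasks = new ArrayList<>();
            for (int t = 0; t < cpus; t++) {
                final int from = t * chunk;
                final int to = Math.min(out.length, from + chunk);
                if (from >= to) break; // more CPUs than elements
                tasks.add(CompletableFuture.runAsync(() -> {
                    for (int i = from; i < to; i++) {
                        out[i] = computeValue(i); // ranges never overlap, so no locking
                    }
                }, pool));
            }
            // Waiting on the futures also makes the workers' writes visible to this thread.
            CompletableFuture.allOf(tasks.toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
    }
}
```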
Operating system is Amazon Linux AMI 2014.09, the program runs on Java 8.
Any ideas why this could happen? How can I debug such an issue?
You might need to use a profiler.
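Before reaching for a full profiler, a lightweight first check is to measure how much of each worker's CPU time is spent in kernel mode, which corresponds roughly to the %sy figure in top. The sketch below uses java.lang.management.ThreadMXBean: getCurrentThreadCpuTime() returns user plus system time and getCurrentThreadUserTime() returns user time only, so their difference approximates system time. The class and method names are hypothetical, and per-thread CPU timing may be unsupported or disabled on some JVMs, which the code checks.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

class KernelTimeProbe {
    private static final ThreadMXBean MX = ManagementFactory.getThreadMXBean();

    // Runs `work` on the current thread and returns {userNanos, systemNanos} it consumed,
    // or null if per-thread CPU time is not available on this JVM.
    static long[] measure(Runnable work) {
        if (!MX.isCurrentThreadCpuTimeSupported() || !MX.isThreadCpuTimeEnabled()) {
            return null;
        }
        long cpu0 = MX.getCurrentThreadCpuTime();   // user + system
        long user0 = MX.getCurrentThreadUserTime(); // user only
        work.run();
        long cpu = MX.getCurrentThreadCpuTime() - cpu0;
        long user = MX.getCurrentThreadUserTime() - user0;
        // The two counters can have different granularity, so clamp the difference at zero.
        return new long[] {user, Math.max(0, cpu - user)};
    }
}
```

Wrapping each task body in measure(...) and seeing a consistently high system share would suggest the time is going to kernel work rather than to your own computation, which narrows down what to profile next.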
Many times I've heard that it is better to keep the number of threads in a thread pool below the number of cores in the system, and that having twice as many threads as cores, or more, is not only wasteful but could also cause performance degradation.
Are these claims true? If not, what are the fundamental principles that debunk them (specifically relating to Java)?
The claims are not true as a general statement. That is to say, sometimes they are true (or true-ish) and other times they are patently false.
A couple of things are indisputably true:
More threads means more memory usage. Each thread requires a thread stack. For recent HotSpot JVMs, the minimum thread stack size is 64 KB, and the default can be as much as 1 MB. That can be significant. In addition, any thread that is alive is likely to own or share objects in the heap whether or not it is currently runnable. Therefore it is reasonable to expect that more threads means a larger memory working set.
A JVM cannot have more threads actually running than there are cores (or hyperthread cores or whatever) on the execution hardware. A car won't run without an engine, and a thread won't run without a core.
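If per-thread stack memory matters for a large pool, one option is a ThreadFactory that requests a smaller stack through the four-argument Thread(ThreadGroup, Runnable, String, long stackSize) constructor. This is a sketch: the 256 KB value and the class name are assumptions, and the Thread Javadoc states that the JVM may treat stackSize as a suggestion or ignore it entirely; the -Xss option changes the default for all threads instead.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

class SmallStackThreadFactory implements ThreadFactory {
    private final long stackSizeBytes;
    private final AtomicInteger counter = new AtomicInteger();

    SmallStackThreadFactory(long stackSizeBytes) {
        this.stackSizeBytes = stackSizeBytes;
    }

    @Override
    public Thread newThread(Runnable r) {
        // A null group means the current thread's group is used.
        // The stack size is only a request; the JVM may round it or ignore it.
        return new Thread(null, r, "worker-" + counter.incrementAndGet(), stackSizeBytes);
    }

    // Fixed pool whose threads each request roughly 256 KB of stack (an assumed value).
    static ExecutorService pool(int threads) {
        return Executors.newFixedThreadPool(threads, new SmallStackThreadFactory(256 * 1024));
    }
}
```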
Beyond that, things get less clear-cut. The "problem" is that a live thread can be in a variety of "states". For instance:
A live thread can be running; i.e. actively executing instructions.
A live thread can be runnable; i.e. waiting for a core so that it can be run.
A live thread can be synchronizing; i.e. waiting for a signal from another thread, or waiting for a lock to be released.
A live thread can be waiting on an external event; e.g. waiting for some external server / service to respond to a request.
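Java exposes a coarse version of these states through Thread.getState(). The sketch below (class name StateDemo is hypothetical) puts one thread into each synchronizing situation: a thread trying to enter a monitor held by another thread reports BLOCKED, and a thread waiting for a signal on a CountDownLatch reports WAITING. Note that Java's RUNNABLE state covers both "running" and "runnable" from the list above, and per its Javadoc a RUNNABLE thread may also be waiting for resources from the operating system, so getState() alone cannot tell those cases apart.

```java
import java.util.concurrent.CountDownLatch;

class StateDemo {
    // Polls until `t` reaches `expected` or the timeout elapses; returns the last observed state.
    static Thread.State awaitState(Thread t, Thread.State expected, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        Thread.State s = t.getState();
        while (s != expected && System.currentTimeMillis() < deadline) {
            Thread.sleep(5);
            s = t.getState();
        }
        return s;
    }

    // Returns {state of a lock-waiting thread, state of a signal-waiting thread}.
    static Thread.State[] observe() throws InterruptedException {
        Object lock = new Object();
        CountDownLatch signal = new CountDownLatch(1);

        Thread lockWaiter = new Thread(() -> {
            synchronized (lock) {
                // entered only after the main thread releases the lock
            }
        });
        Thread signalWaiter = new Thread(() -> {
            try {
                signal.await(); // waiting for a signal from another thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread.State blocked;
        synchronized (lock) { // hold the lock so lockWaiter cannot enter
            lockWaiter.start();
            blocked = awaitState(lockWaiter, Thread.State.BLOCKED, 2000);
        }
        signalWaiter.start();
        Thread.State waiting = awaitState(signalWaiter, Thread.State.WAITING, 2000);
        signal.countDown();
        lockWaiter.join();
        signalWaiter.join();
        return new Thread.State[] {blocked, waiting};
    }
}
```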
The "one thread per core" heuristic assumes that threads are either running or runnable (according to the above). But for a lot of multi-threaded applications, the heuristic is wrong ... because it doesn't take account of threads in the other states.
Now "too many" threads clearly can cause significant performance degradation, simply by using too much memory. (Imagine that you have 4 GB of physical memory and you create 8,000 threads with 1 MB stacks: that is up to 8 GB of stack space, a recipe for virtual memory thrashing.)
But what about other things? Can having too many threads cause excessive context switching?
I don't think so. If you have lots of threads and your application's use of them results in excessive context switches, that is bad for performance. However, I posit that the root cause of the context switches is not the actual number of threads. The root of the performance problems is more likely that the application is:
synchronizing in a particularly wasteful way; e.g. using Object.notifyAll() when Object.notify() would be better, OR
synchronizing on a highly contended data structure, OR
doing too much synchronization relative to the amount of useful work that each thread is doing, OR
trying to do too much I/O in parallel.
(In the last case, the bottleneck is likely to be the I/O system rather than context switches ... unless the I/O is IPC with services / programs on the same machine.)
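To illustrate the "highly contended data structure" case, the sketch below (names are hypothetical) has every worker update one shared counter. With a synchronized method, all threads serialize on a single monitor; java.util.concurrent.atomic.LongAdder spreads updates across internal cells to reduce that contention. Both give the same total; timing each call with System.nanoTime() would show the cost coming from contention on the shared lock rather than from the thread count as such.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.LongSupplier;

class CounterContention {
    // Every increment takes the same monitor, so concurrent callers serialize here.
    static final class LockedCounter {
        private long value;
        synchronized void increment() { value++; }
        synchronized long get() { return value; }
    }

    // Runs `threads` workers, each calling `increment` `perThread` times, and returns the total.
    static long run(int threads, int perThread, Runnable increment, LongSupplier total) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) increment.run();
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return total.getAsLong();
    }

    static long withLock(int threads, int perThread) {
        LockedCounter c = new LockedCounter();
        return run(threads, perThread, c::increment, c::get);
    }

    static long withLongAdder(int threads, int perThread) {
        LongAdder adder = new LongAdder();
        return run(threads, perThread, adder::increment, adder::sum);
    }
}
```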
The other point is that, in the absence of the confounding factors above, having more threads is not going to increase context switches. If your application has N runnable threads competing for M processors, and the threads are purely computational and contention-free, then the OS's thread scheduler will time-slice between them. A time slice is typically several milliseconds or longer, whereas a context switch costs on the order of microseconds, so the switching overhead is negligible compared with the work that a CPU-bound thread actually performs during its slice. And if we assume that the length of a time slice is constant, then the context-switch overhead will be constant too. Adding more runnable threads (increasing N) won't change the ratio of work to overhead significantly.
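A back-of-the-envelope version of that ratio, using assumed round numbers rather than measurements: a time slice of t_s = 10 ms and a direct context-switch cost of t_cs = 5 µs per slice.

```latex
\text{overhead fraction} = \frac{t_{cs}}{t_s + t_{cs}}
  = \frac{5\,\mu\text{s}}{10\,000\,\mu\text{s} + 5\,\mu\text{s}}
  \approx 5 \times 10^{-4} \approx 0.05\%
```

Neither t_s nor t_cs depends on N, so adding runnable threads leaves this fraction where it is. Indirect costs such as refilling caches after a switch are ignored here and can make the real figure larger.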
In summary, it is true that "too many threads" is harmful for performance. However, there is no reliable universal "rule of thumb" for how many is "too many". And (fortunately) you generally have considerable leeway before the performance problems of "too many" become significant.
Having fewer threads than cores generally means you can't take advantage of all available cores.
The usual question is how many more threads than cores you want. That, however, varies, depending on the amount of time (overall) that your threads spend doing things like I/O vs. the amount of time they spend doing computation. If they're all doing pure computation, then you'd normally want about the same number of threads as cores. If they're doing a fair amount of I/O, you'd typically want quite a few more threads than cores.
Looking at it from the other direction for a moment, you want enough threads running to ensure that whenever one thread blocks for some reason (typically waiting on I/O) you have another thread (that's not blocked) available to run on that core. The exact number that takes depends on how much of its time each thread spends blocked.
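Brian Goetz's Java Concurrency in Practice expresses this idea as N_threads = N_cpu × U_cpu × (1 + W/C), where U_cpu is the target CPU utilization and W/C is the ratio of time spent waiting to time spent computing. A minimal sketch of that formula follows; the class name is hypothetical, and W and C are assumed to be measured averages from your own workload.

```java
class PoolSizeEstimate {
    /**
     * Estimates a pool size as N_cpu * U_cpu * (1 + W/C).
     *
     * @param cpus          number of processors, e.g. Runtime.getRuntime().availableProcessors()
     * @param targetUtil    desired CPU utilization, in (0, 1]
     * @param waitMillis    measured average time a task spends blocked (e.g. on I/O)
     * @param computeMillis measured average time a task spends computing
     */
    static int estimate(int cpus, double targetUtil, double waitMillis, double computeMillis) {
        if (cpus < 1 || targetUtil <= 0 || targetUtil > 1 || computeMillis <= 0 || waitMillis < 0) {
            throw new IllegalArgumentException("invalid sizing inputs");
        }
        double n = cpus * targetUtil * (1 + waitMillis / computeMillis);
        return Math.max(1, (int) Math.round(n));
    }
}
```

For pure computation (W = 0) this reduces to one thread per core; with 8 cores and tasks that wait 90 ms for every 10 ms of computation, estimate(8, 1.0, 90, 10) suggests 80 threads, about ten per core.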
That's not true, unless the number of threads is vastly more than the number of cores. The reasoning is that additional threads will mean additional context switches. But it's not true because an operating system will only make unforced context switches if those context switches are beneficial, and additional threads don't force additional context switches.
If you create an absurd number of threads, that wastes resources. But none of this is anything compared to how bad creating too few threads is. If you create too few threads, an unexpected block (such as a page fault) can result in CPUs sitting idle, and that swamps any possible harm from a few extra context switches.
Not exactly true; this depends on the overall software architecture. There is a reason to keep more threads than available cores: some of the threads may be suspended by the OS while they wait for I/O to complete. This may be an explicit I/O invocation (such as a synchronous file read) or an implicit one, such as the system handling a page fault.
In fact, I've read in a book that keeping the number of threads at twice the number of CPU cores is good practice.
For REST API calls or other I/O-bound operations, having more threads than cores can improve performance by allowing multiple API requests to be processed in parallel. However, the optimal number of threads depends on various factors such as the API request frequency, the complexity of the request processing, and the resources available on the server.
If the API request processing is CPU-bound and requires a lot of computation, having too many threads may cause resource contention and lead to reduced performance. In such cases, the number of threads should be limited to the number of cores available.
On the other hand, if the API request processing is I/O-bound and involves a lot of waiting for responses from external resources such as databases, having more threads may improve performance by allowing multiple requests to be processed in parallel.
In any case, it is recommended to perform performance testing to determine the optimal number of threads for your specific use case and monitor the system performance using metrics such as response time, resource utilization, and error rate.
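A small simulation of the I/O-bound case, where Thread.sleep stands in for waiting on a remote API or database. That substitution is an assumption: real calls also consume some CPU. Because a sleeping thread does not occupy a core, a pool much larger than the core count finishes the same batch far sooner. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class IoBoundSimulation {
    // Simulated blocking call: the thread waits but uses almost no CPU.
    static void fakeRemoteCall(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns wall-clock milliseconds to run `calls` simulated calls on `threads` threads.
    static long elapsedMillis(int threads, int calls, long callMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long start = System.nanoTime();
            List<CompletableFuture<Void>> futures = new ArrayList<>();
            for (int i = 0; i < calls; i++) {
                futures.add(CompletableFuture.runAsync(() -> fakeRemoteCall(callMillis), pool));
            }
            futures.forEach(CompletableFuture::join);
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }
}
```

With 2 threads, 40 calls of 50 ms run in roughly 20 sequential rounds; with 40 threads they overlap almost entirely. Measuring your real request mix this way, rather than with sleeps, is what the performance testing above amounts to.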