Java Thread Pool Size For Production [duplicate]

I'm using a thread pool, sized one larger than the number of CPUs, to execute tasks that are mostly CPU-based with a bit of I/O:
Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors() + 1)
Assuming a simple program that submits all of its tasks to this executor and does little else, I assume that a larger thread pool would slow things down because the OS would have to timeslice its CPUs more often to give each thread in the pool a chance to run.
Is that correct? If so, is it a real problem or mostly theoretical, i.e. if I increased the pool size to 1000, would I notice a massive difference?

If you have CPU-bound tasks, then as you increase the number of threads you get increasing overhead and slower performance. Note: having more threads than waiting tasks is just a waste of resources, but it may not slow the tasks down much.
I would use a multiple (e.g. 1 or 2 times) of the number of CPUs rather than adding just one, as having even one thread too many can add a surprising amount of overhead.

For reference, check this description.
http://codeidol.com/java/java-concurrency/Applying-Thread-Pools/Sizing-Thread-Pools/
In short, what you have (number of CPUs + 1) is optimal on average.
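As a rough sketch of that advice (a multiple of the core count rather than cores + 1), something like the following could be a starting point; the multiplier of 2 is only an illustrative value to benchmark against:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSizing {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();

        // Start from a multiple of the core count and measure,
        // rather than hard-coding cores + 1.
        int poolSize = cores * 2; // illustrative multiplier, tune by benchmarking

        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        // submit tasks, then shut down when done
        pool.shutdown();
    }
}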

Related

CachedThreadPool vs FixedThreadPool for huge number of task

I would like to know which one I should use in this particular scenario:
There are many tasks to process, usually around 400k.
Most of the tasks take less than 4 seconds to process, but some of them (300 to 500 tasks) take a long time, usually between 10 and 30 minutes.
Currently, we have a FixedThreadPool of size 200.
I am wondering if we can do better with a CachedThreadPool?
I also want to know what the impact on the server will be, as only one server is dedicated to this work.
All tasks perform just calculations; there are no I/O operations.
The thread pool type, in your case, does not impact performance, because the cost of thread management is very small compared to the cost of each task (from 4 seconds to 30 minutes).
The number of threads running in parallel is more important. If each task performs no I/O, the correct number of parallel threads is probably the number of cores of your hardware. If your tasks involve network or disk I/O, it is more difficult to determine the level of parallelism that maximizes performance.
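As a minimal sketch of that advice, assuming the tasks really are pure computation: size a fixed pool to the core count and let the executor's queue hold the remaining tasks. The task body and the 400k count below are placeholders taken from the question:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CpuBoundBatch {

    // Placeholder for the real calculation from the question.
    static void doWork(int taskId) {
        Math.sqrt(taskId);
    }

    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();

        // One worker thread per core; the executor's queue holds the rest
        // of the ~400k tasks until a worker becomes free.
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        for (int i = 0; i < 400_000; i++) {
            int taskId = i;
            pool.submit(() -> doWork(taskId));
        }

        pool.shutdown();
        pool.awaitTermination(8, TimeUnit.HOURS);
    }
}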
Starting point
Here's what stands out in your question:
There are ~400,000 tasks to process
Most tasks (~399,500 or ~99.875%) take 4 seconds or less to complete
Some tasks (~500 or ~0.125%) usually take 10-30 minutes to complete
Tasks perform "no I/O operations"
Current approach uses a FixedThreadPool with size 200
Overview
Given that the tasks perform "no I/O operations", this implies:
no disk I/O
no network I/O
The tasks are then bound (limited) either by CPU or memory.
A first step would be to understand which of the two is the limiter: CPU or memory.
Nothing in your problem statement sounds like the choice of thread pool is a factor.
Limited by CPU
If the work is CPU-bound, in general nothing will be improved by increasing the thread pool size beyond the number of available CPU cores. So if you have 32 CPU cores available, a thread pool with a larger number of active threads (for example, 100) would incur overhead (run slower) due to context switching. Things won't "go faster" with more threads if the underlying contended resource is the CPU.
With a CPU-bound problem, I would first set the thread pool no higher than the total CPU cores on the machine (and probably less). So if your machine has 32 cores, try a thread pool of 16 or maybe 20 to start. Then process real-world data and make observations about performance, possibly making additional changes based on those test runs.
Besides your own program, there are always other things running on any computer system, so it isn't a given that 16 (for example) is "low enough" – it depends on what else is running on the system. That's the importance of doing a test run though – set it to 16, look for signs of CPU contention, possibly reduce below 16 if needed; or maybe there's plenty of idle CPU available with 16, so it could be safe/fine to increase higher.
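A sketch of that starting point, assuming (purely for illustration) that half the cores is a reasonable first guess; the idea is to make the pool size a single tunable number you can change between test runs:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CpuBoundStartingPoint {
    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();

        // First guess: stay below the core count and leave headroom
        // for the OS and anything else running on the machine.
        int poolSize = Math.max(1, cores / 2); // e.g. 16 on a 32-core machine

        ExecutorService pool = Executors.newFixedThreadPool(poolSize);

        long start = System.nanoTime();
        // ... submit the real workload here ...
        pool.shutdown();
        pool.awaitTermination(8, TimeUnit.HOURS);

        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("poolSize=" + poolSize + " took " + elapsedMillis + " ms");
    }
}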
Limited by memory
If the work is memory-bound, the thread pool size isn't as directly tied to the contended resource (like it is with CPU cores). It might take additional understanding to decide if or how to tune the system to avoid memory contention.
As with CPU-bound problems, you should be able to start with a fixed size (something smaller than 200), and make observations using real-world data sets.
There should be some pattern that emerges, perhaps (for example) that the ~500 or so 10-30 minute tasks use way more memory than all other tasks.
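One lightweight way to gather those observations (only a sketch, not a substitute for a profiler or GC logs): wrap each task with a rough heap measurement and look for the pattern described above, e.g. whether the long-running tasks are also the memory-hungry ones. The allocation in main is just a demo workload:

public class TaskMemoryProbe {
    // Rough heap snapshot around a task, to spot memory-hungry tasks.
    // A real investigation would use a profiler or GC logs; this is only a first look.
    static void runAndMeasure(String name, Runnable task) {
        Runtime rt = Runtime.getRuntime();
        long beforeUsed = rt.totalMemory() - rt.freeMemory();
        long startNanos = System.nanoTime();

        task.run();

        long elapsedMillis = (System.nanoTime() - startNanos) / 1_000_000;
        long afterUsed = rt.totalMemory() - rt.freeMemory();
        System.out.printf("%s: %d ms, used-heap delta ~%d MB%n",
                name, elapsedMillis, (afterUsed - beforeUsed) / (1024 * 1024));
    }

    public static void main(String[] args) {
        runAndMeasure("example", () -> {
            int[] big = new int[10_000_000]; // allocate ~40 MB as a demo workload
            big[0] = 1;
        });
    }
}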

How many threads should I create in my case?

I searched Google for a solution, but I'm still a bit confused about how many threads I should use in my particular case.
I have two usages of threads. First, I have a folder with 10 files in it which I want to parse in parallel (independently of each other). Second, I have a shared data object on which 100 tasks run. Each task consists of reading the data object and writing to a shared structure (a HashMap).
Should I use only as many threads as there are CPU cores? Or should I use a ThreadPoolExecutor with a minimum of 2 threads and a maximum of 999 (so that 100 threads get created)?
Consider using Executors.newCachedThreadPool(). This creates a thread pool with as many threads as needed and reuses idle threads.
I can't tell you how many threads will be created for your 100 tasks. If the tasks are long-running, 100 threads will be created to start all tasks in parallel immediately. If the tasks are very short, or if you don't submit them all at the same moment, the first threads will be reused to execute further tasks.
Keep in mind that creating a thread has some cost (CPU and memory), and too many threads can be useless given the limited number of cores. In that case you can also limit the number of threads using Executors.newFixedThreadPool(int nThreads).
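A small side-by-side sketch of the two factory methods discussed here (the fixed size of 8 is just an example value):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolChoices {
    public static void main(String[] args) {
        // Grows on demand, reuses idle threads, shrinks after 60s of idleness.
        // With 100 long-running tasks submitted at once, up to 100 threads may be created.
        ExecutorService cached = Executors.newCachedThreadPool();

        // Hard cap on concurrency; extra tasks wait in the queue.
        ExecutorService fixed = Executors.newFixedThreadPool(8); // example size

        cached.shutdown();
        fixed.shutdown();
    }
}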
A widespread practice is to use (number of cores) x 2 for the thread count.
ThreadPoolExecutor is only a higher-level way to apply multithreading; the substance doesn't change, but it can be helpful for managing the threads.
There are no hard rules; it all depends on the type of processing and on the I/O and sync/async work involved.
Normally, to evaluate the optimal number of threads for batch processing, I start with a thread count equal to the number of CPUs and then estimate by trial whether increasing it is beneficial; depending on the tasks involved, a slightly higher number of threads than the number of cores can benefit performance.
For example, you can try starting with 1.5 * cpus and verify the performance difference against 1 * cpus and 2 * cpus.
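A hedged sketch of that trial-and-error approach: run the same batch with 1, 1.5, and 2 times the core count and compare wall-clock times. The dummy workload and batch size are placeholders; you would substitute your real tasks:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolSizeTrial {
    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        double[] multipliers = {1.0, 1.5, 2.0};

        for (double m : multipliers) {
            int poolSize = Math.max(1, (int) (cores * m));
            ExecutorService pool = Executors.newFixedThreadPool(poolSize);

            List<Callable<Double>> batch = new ArrayList<>();
            for (int i = 0; i < 10_000; i++) {
                int n = i;
                batch.add(() -> dummyWork(n)); // placeholder workload
            }

            long start = System.nanoTime();
            pool.invokeAll(batch); // runs the whole batch and waits for completion
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println("x" + m + " (" + poolSize + " threads): " + millis + " ms");

            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }

    // Stand-in for a mostly CPU-bound task.
    static double dummyWork(int seed) {
        double x = seed;
        for (int i = 0; i < 100_000; i++) {
            x = Math.sqrt(x + i);
        }
        return x;
    }
}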
Using Executors is recommended because the pool bounds thread creation and reuses its threads; creating a separate thread for each task can result in far too many threads.

Increasing Thread Count in Executors.newFixedThreadPool() slows down

If the number of threads is increased from nThread to nThread + 1, the speed drops by half.
ExecutorService executor = Executors.newFixedThreadPool(nThread);
If I just set nThread to 1, it doesn't use all my cores. What's going on?
My task doesn't involve reading files or the network. It creates objects and computes. However, it reads data from a Vector.
Can multiple threads reading data from the same Vector decrease performance? If so, how can I fix it?
Vector is an old List implementation that relies on a lock (synchronized methods) to provide thread safety. If multiple threads access that Vector at the same time, they will suffer from lock contention, and that is probably what you are experiencing now.
If the Vector is only read from, I would replace it with an ArrayList (or an array): no locking is done, and for a read-only data structure none is needed.
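A minimal sketch of that swap, assuming the data is fully populated before the worker threads start and that Java 10+'s List.copyOf is available: copy the Vector into an unmodifiable list once, then let every thread read it without any locking.

import java.util.List;
import java.util.Vector;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ReadOnlySharing {
    public static void main(String[] args) {
        Vector<Integer> legacy = new Vector<>(List.of(1, 2, 3, 4, 5));

        // Copy once into an unmodifiable, unsynchronized list.
        // Reads from it need no lock, so threads no longer contend.
        List<Integer> shared = List.copyOf(legacy);

        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        for (int t = 0; t < 4; t++) {
            pool.submit(() -> {
                long sum = 0;
                for (int v : shared) {
                    sum += v;
                }
                System.out.println("sum=" + sum);
            });
        }
        pool.shutdown();
    }
}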
If there are too many threads then, irrespective of the number of tasks, context switching will slow things down: the threads in a thread pool executor have the same priority, and the CPU has to be shared amongst them. Also, the more threads there are, the greater the chance of threads waiting on a monitor.
Even without any synchronization, a large number of threads can heavily affect performance.
In one application I worked on, an XML-parsing task that took 100 ms increased to 5 seconds when the number of threads was increased from 10 to 50.
Configuring a thread pool is a matter of measuring and adjusting. It does depend on the number of cores in the CPU; more cores allow more parallel processing.

Optimising max number of threads running on a CPU

I'm just wondering: what is the best way to decide when to stop creating new threads on a single-core machine that runs multiple instances of the same program as threads?
The threads are fetching web content and doing a bit of processing, which means the load of each thread is not constant all the way until the thread terminates.
I'm thinking of having a thread that monitors the CPU/RAM load and stops creating threads if the load reaches a certain threshold, but also stops creating threads once a certain thread count has been reached, to make sure the CPU doesn't get overloaded.
Any feedback on what techniques are out there to achieve this?
It is going to be difficult to do this by monitoring the CPU used by the current process. Those numbers tend to lag reality and the result is going to be peaks and valleys to a large degree. The problem is that your threads are mostly going to be blocked by IO and there is not any good way to anticipate when bytes will be available to be read in the near future.
That said, you could start out with a ThreadPoolExecutor at a certain maximum thread count (for a single processor, let's say 4) and then check the load average every 10 seconds or so. If the load average is below what you want, you could call setMaximumPoolSize(...) with a larger value to increase it for the next 10 seconds. You may need to wait 30 or more seconds between each calculation to smooth out the performance of your application.
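A sketch of that polling loop, assuming getSystemLoadAverage() is supported on the platform (it returns a negative value where it isn't); the thresholds, step size, and cap are arbitrary illustration values:

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class LoadAwarePool {
    public static void main(String[] args) {
        // Start small; newFixedThreadPool returns a ThreadPoolExecutor we can resize later.
        ThreadPoolExecutor pool = (ThreadPoolExecutor) Executors.newFixedThreadPool(4);
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();

        ScheduledExecutorService monitor = Executors.newSingleThreadScheduledExecutor();
        monitor.scheduleAtFixedRate(() -> {
            double load = os.getSystemLoadAverage(); // -1 if unsupported on this platform
            int size = pool.getCorePoolSize();
            // Because a fixed pool uses an unbounded queue, the core size (not just the
            // maximum) has to change for the actual concurrency to change.
            if (load >= 0 && load < 0.7 && size < 32) {        // arbitrary threshold and cap
                pool.setMaximumPoolSize(size + 2);
                pool.setCorePoolSize(size + 2);
            } else if (load > 1.5 && size > 2) {
                pool.setCorePoolSize(size - 2);
                pool.setMaximumPoolSize(size - 2);
            }
        }, 30, 30, TimeUnit.SECONDS);

        // ... submit the web-fetching tasks to `pool` here ...
    }
}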
You could use the following code to track the total CPU time used by all threads. I'm not sure it's the best way to do it:
// Sum CPU time across all live threads (ThreadMXBean from java.lang.management).
ThreadMXBean threadMxBean = ManagementFactory.getThreadMXBean();
long total = 0;
for (long id : threadMxBean.getAllThreadIds()) {
    long cpuTime = threadMxBean.getThreadCpuTime(id); // nanoseconds, or -1 if unsupported
    if (cpuTime > 0) {
        total += cpuTime;
    }
}
// getThreadCpuTime reports nanoseconds, so convert to milliseconds
long currentCpuMillis = total / 1_000_000;
Instead of trying to maximize the CPU level for your spider, you might consider trying to maximize throughput. Sample the number of pages spidered per unit of time and increase or decrease the max number of threads in your ExecutorService until that rate is maximized.
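A rough sketch of that feedback loop; the counter hook, interval, and step size are all illustrative, and a real spider would want some hysteresis so the pool size doesn't oscillate:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class ThroughputTuner {
    // Worker tasks would increment this each time a page is finished (placeholder hook).
    static final AtomicLong pagesSpidered = new AtomicLong();

    public static void main(String[] args) {
        ThreadPoolExecutor pool = (ThreadPoolExecutor) Executors.newFixedThreadPool(10);

        ScheduledExecutorService tuner = Executors.newSingleThreadScheduledExecutor();
        final long[] lastCount = {0};
        final long[] lastRate = {-1};

        tuner.scheduleAtFixedRate(() -> {
            long count = pagesSpidered.get();
            long rate = count - lastCount[0];   // pages completed in the last interval
            lastCount[0] = count;

            int size = pool.getCorePoolSize();
            // Grow while throughput keeps improving, otherwise step back down.
            int newSize = (rate >= lastRate[0]) ? size + 2 : Math.max(2, size - 2);
            if (newSize > size) {
                pool.setMaximumPoolSize(newSize); // max first, so core <= max always holds
                pool.setCorePoolSize(newSize);
            } else if (newSize < size) {
                pool.setCorePoolSize(newSize);
                pool.setMaximumPoolSize(newSize);
            }
            lastRate[0] = rate;
        }, 60, 60, TimeUnit.SECONDS);

        // ... submit spidering tasks that call pagesSpidered.incrementAndGet() ...
    }
}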
One thing to consider is to use NIO and selectors so your threads are always busy as opposed to always waiting for IO. Here's a good example tutorial about NIO/Selectors. You might also consider using Pyronet which seems to provide some good features around NIO.
If async I/O is not a good fit, I would consider using thread pools, e.g. ThreadPoolExecutor, so you don't have the overhead of creating, destroying and recreating threads.
Then I would do performance testing to find the maximum number of threads that offers the best performance.
You could start with 10 threads, then rerun your performance test with 20 threads, until you home in on an optimal value. At the same time I would use system tools (depending on your OS) to monitor the thread run queue, the JVM, etc.
For the performance test you would have to ensure that your test is repeatable (i.e. using the same inputs) and representative of the actual input that your program would be using.

