Thread utilization strategy - Java

In my Java system, I have X persons, and each person has Y strings, where Y >> X. I need to run some complex calculation on each string. To speed up the process, I run the string computation in separate threads (thread count = CPU cores * 2). My question is: should I also put each person's processing in a separate thread, or is it enough to run only the string processing in separate threads?
Should I process persons in separate threads in addition to the thread-based string computation? Or, since I'm already using the optimal number of threads for my number of CPU cores for the string processing, will I gain nothing by putting the persons in separate threads as well?
All persons are independent of each other.
All person's strings are independent of each other.

I think creating additional threads can slow down the processing because of the overhead of creating new threads. But to be sure, run an experiment: try different numbers of threads, then choose the optimal number.
P.S. Like other people in this topic, I would recommend using a thread pool for this task.
P.P.S. Consider using java.util.concurrent's newFixedThreadPool (launches n threads; if there are more tasks, they wait for a free thread) or newCachedThreadPool (creates a new thread if there are more tasks than threads, otherwise reuses existing idle threads).
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Executors.html#newFixedThreadPool(int)
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Executors.html#newCachedThreadPool()
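For illustration, a minimal sketch of the fixed-pool variant; processString() and the placeholder data are just stand-ins for your actual per-string calculation:

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class StringProcessing {

    // Stand-in for the "complex calculation" on one string.
    static void processString(String s) {
        // ... expensive work on s ...
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> allStrings = Arrays.asList("a", "b", "c"); // placeholder data
        int nThreads = Runtime.getRuntime().availableProcessors() * 2;
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);

        // One task per string; surplus tasks wait in the pool's queue for a free thread.
        for (String s : allStrings) {
            pool.execute(() -> processString(s));
        }

        pool.shutdown();                          // stop accepting new tasks
        pool.awaitTermination(1, TimeUnit.HOURS); // wait for the submitted work to finish
    }
}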

I am first assuming that the threads are native threads (not green threads, for performance reasons).
There isn't really a performance penalty in passing object references into a thread, other than making the GC skip those references during cleanup, which is still more efficient than serializing/deserializing the object into the thread.
Long story short: avoid creating any unnecessary threads beyond the hardware capacity if you know the running threads have a high utilization rate (i.e. they very rarely block on IO/network/database/etc.); otherwise you will force the CPU to perform thread context switches, which are very expensive.

I would likely create a thread pool with a configurable size, which processes a queue of person objects.
A thread can then access, update and process an entire person's data without concern about conflicts with other threads.
If there is IO within the process, you might be able to increase your thread pool size, or decrease it if you are over-utilising the CPU.
Hope that helps.
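Something along these lines, for example; the Person type, the pool.size system property and the process() stub are only illustrative:

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical Person type: each person carries its own strings.
class Person {
    final List<String> strings;
    Person(List<String> strings) { this.strings = strings; }
}

public class PersonQueueProcessing {

    static void process(String s) { /* expensive per-string work */ }

    public static void main(String[] args) {
        int poolSize = Integer.getInteger("pool.size",
                Runtime.getRuntime().availableProcessors()); // configurable, e.g. -Dpool.size=8
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);

        List<Person> persons = Arrays.asList(
                new Person(Arrays.asList("a", "b")),
                new Person(Arrays.asList("c", "d")));

        // One task per person: the worker owns all of that person's data,
        // so no synchronization with other workers is needed.
        for (Person p : persons) {
            pool.execute(() -> p.strings.forEach(PersonQueueProcessing::process));
        }
        pool.shutdown();
    }
}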

If processing each string takes on the order of 1 µs or more, you should be fine putting each string's processing in its own Runnable and passing that job to a thread pool with as many worker threads as you have logical CPUs. If the tasks are faster than that, you should batch them so there is less overhead in handling the job queue.
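If batching is needed, a rough sketch could look like this; the batch size and the process() stub are placeholders to be tuned or replaced:

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BatchedSubmission {

    static void process(String s) { /* very short per-string work */ }

    static void submitInBatches(List<String> strings, int batchSize, ExecutorService pool) {
        for (int start = 0; start < strings.size(); start += batchSize) {
            // subList is a view; each task handles one contiguous batch,
            // so the queue holds far fewer (and coarser) jobs.
            List<String> batch =
                    strings.subList(start, Math.min(start + batchSize, strings.size()));
            pool.execute(() -> batch.forEach(BatchedSubmission::process));
        }
    }

    public static void main(String[] args) {
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        submitInBatches(Arrays.asList("a", "b", "c", "d"), 2, pool);
        pool.shutdown();
    }
}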

Related

How many threads should I create in my case?

I searched Google for a solution but I'm still a bit confused about how many threads I should use in my particular case.
I have two uses for threads. First, I have a folder with 10 files in it which I want to parse in parallel (independently of each other). Second, I have a shared data object on which 100 tasks run. Each task consists of reading the data object and writing to a shared structure (a HashMap).
Should I use only as many threads as there are CPU cores? Or should I use a ThreadPoolExecutor with a minimum of 2 threads and a maximum of 999 (in which case 100 threads are created)?
Consider using Executors.newCachedThreadPool(). This creates a thread pool with as many threads as needed and reuses idle threads.
I can't tell you how many threads will be created for your 100 tasks. If a task takes long to execute, 100 threads will be created to start all tasks in parallel immediately. If a task is very short, or if you don't submit all tasks at the same moment, the first thread will be reused to execute more tasks (not only one).
By the way, creating a thread has a cost (CPU and memory), and too many threads can be useless given the limited number of cores. In that case, you can also limit the number of threads using Executors.newFixedThreadPool(int nThreads).
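For illustration, a small sketch of the two factory methods discussed above; the task bodies are placeholders:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolChoices {
    public static void main(String[] args) {
        // Bounded pool: at most nThreads tasks run concurrently, the rest wait in the queue.
        int nThreads = Runtime.getRuntime().availableProcessors();
        ExecutorService fixed = Executors.newFixedThreadPool(nThreads);

        // Unbounded pool: creates a new thread whenever no idle thread is available,
        // reuses threads that become free, and retires threads idle for 60 seconds.
        ExecutorService cached = Executors.newCachedThreadPool();

        for (int i = 0; i < 100; i++) {
            fixed.execute(() -> { /* CPU-bound task reading the shared data object */ });
        }
        fixed.shutdown();
        cached.shutdown();
    }
}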
A widespread practice is to use number of cores x 2 for the thread count.
The ThreadPoolExecutor is only a higher-level way to apply multithreading; the substance doesn't change, but its use can be helpful for management.
There are no real rules; it all depends on the type of processing, the IO, and the sync/async tasks involved.
Normally, for batch processing, to estimate the needed/optimal number of threads I start with a thread count equal to the number of CPUs, then by trial I check whether increasing it is beneficial; depending on the type of tasks involved, a slightly higher number of threads than the number of cores can benefit performance.
For example, you can try starting with 1.5*cpu threads and compare the performance against 1*cpu and 2*cpu.
Bye
Using Executors is recommended because you then have a minimum threshold for thread creation and threads are reused; otherwise, creating a separate thread for each task may result in the creation of far too many threads.

Increasing Thread Count in Executors.newFixedThreadPool() slows down

If the number of threads is increased from nThread to nThread + 1, the speed decreases by half.
ExecutorService executor = Executors.newFixedThreadPool(nThread);
If I just set nThread to 1, it doesn't use all my cores. What's going on?
My task doesn't involve reading files or the network. It creates objects and computes. However, it reads data from a Vector.
Can multiple threads reading data from the same Vector decrease performance? If so, how can I fix it?
A Vector is an old list implementation that relies on a lock to provide thread safety. If multiple threads access that Vector at the same time, they will suffer from lock contention, and that is probably what you are experiencing now.
If the Vector is only read from, I would replace it with an ArrayList (or an array): no locking is done there, and for a read-only data structure none is needed.
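A small sketch of that change, assuming the data is filled once and then only read:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Vector;

public class ReadOnlyData {
    public static void main(String[] args) {
        Vector<Double> legacy = new Vector<>();   // every get() takes the Vector's lock
        legacy.add(1.0);
        legacy.add(2.0);

        // Copy once into an unsynchronized list; reads from many threads need no lock
        // as long as nothing mutates the list after this point.
        List<Double> shared = Collections.unmodifiableList(new ArrayList<>(legacy));

        Runnable reader = () -> {
            double sum = 0;
            for (double d : shared) sum += d;     // lock-free reads
        };
        new Thread(reader).start();
        new Thread(reader).start();
    }
}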
If there are more threads then, irrespective of the number of tasks, context switching will be slow, because threads in a thread pool executor have the same priority and the CPU has to be shared amongst them. Also, the more threads there are, the greater the chance of threads waiting on a monitor.
Even if there is no synchronization, a larger number of threads can heavily affect performance.
In one of the applications I have worked on, an XML-parsing task that took 100 ms increased to 5 seconds when the number of threads went from 10 to 50.
Configuring a thread pool is a learn-and-iterate exercise. It does depend on the number of cores in the CPU; more cores allow more parallel processing.

Clarification on Thread performance processing 1000's of log files

I am extracting lines matching a pattern from log files. I therefore allotted each log file to a Runnable object which writes the matched lines to a result file (with well-synchronised writer methods).
Important snippet under discussion :
ExecutorService executor = Executors.newFixedThreadPool(NUM_THREAD);
for (File eachLogFile : hundredsOfLogFilesArrayObject) {
    executor.execute(new RunnableSlavePatternMatcher(eachLogFile));
}
Important Criteria :
The number of log files could be very few, like 20, or for some users the number of log files could exceed 1000. I recorded a series of tests in an Excel sheet and I am really concerned about the results marked in red. 1. I assumed that if the number of threads created is equal to the number of files to be processed, the processing time would be lower than when the number of threads is smaller than the number of files, which didn't happen (please advise me if my understanding is wrong).
Result :
I would like to identify a value for NUM_THREAD which is efficient for a small number of files as well as for thousands of files.
Please suggest answers for questions 1 & 2.
Thanks !
Chandru
You just found out that your program is not CPU bound but (likely) IO bound.
This means that beyond 10 threads the OS can't keep up with the requested reads of all the threads that want their data, and additional threads just sit waiting for their next block of data.
Also, because writing the output is synchronized across all threads, that may even be the biggest bottleneck in your program (a producer-consumer solution, sketched below, may be the answer here, to minimize the time threads spend waiting to write output).
The optimal number of threads depends on how fast you can read the files (the faster you can read, the more threads are useful).
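A minimal producer-consumer sketch of that idea: matcher threads only enqueue hits, and a single writer thread drains the queue to the result file, so workers never block on the synchronized writer. The class/field names and the poison-pill marker are illustrative:

import java.io.PrintWriter;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MatchWriter {
    private static final String POISON_PILL = "<<EOF>>";   // illustrative end-of-stream marker
    private final BlockingQueue<String> matches = new LinkedBlockingQueue<>();

    // Called by the matcher threads: cheap, non-blocking hand-off of a matched line.
    void publish(String line) throws InterruptedException {
        matches.put(line);
    }

    // A single consumer thread owns the file, so no synchronization on the writer is needed.
    void drainTo(PrintWriter out) throws InterruptedException {
        while (true) {
            String line = matches.take();
            if (POISON_PILL.equals(line)) break;
            out.println(line);
        }
        out.flush();
    }

    // Call once after all matcher tasks have completed to stop the writer thread.
    void finish() throws InterruptedException {
        matches.put(POISON_PILL);
    }
}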
It appears that 2 threads are enough to use all your processing power. Most likely you have two cores with hyper-threading.
Mine is an Intel i5 2.4 GHz, 4 CPUs, 8 GB RAM. Is this detail helpful?
Depending on the model, this has 2 cores and hyper-threading.
I assume that if the number of threads created is equal to the number of files to be processed then the processing time would be less,
This will maximise the overhead, but won't give you more cores than you already have.
When parallelizing, using many more threads than you have available CPU cores will usually increase the overall time. Your system will spend some overhead time switching from thread to thread on one CPU core instead of executing the tasks one after another.
If you have 8 CPU cores on your computer, you might observe some improvement using 8/9/10 threads instead of only 1, while using 20+ threads will actually be less efficient.
One problem is that I/O doesn't parallelize well, especially if you don't have an SSD, since sequential reads (what happens when one thread reads a file) are much faster than random reads (when the read head has to jump around between the different files being read by several threads). I would guess you could speed up the program by reading the files from the thread that sends the jobs to the executor:
for (File file : hundredsOfLogFilesArrayObject) {
    byte[] fileContents = readContentsOfFile(file);
    executor.execute(new RunnableSlavePatternMatcher(fileContents));
}
As for the optimal thread count, that depends.
If your app is I/O bound (which is quite possible if you're not doing extremely heavy processing of the contents), a single worker thread which can process the file contents while the original thread reads the next file will probably suffice.
If you're CPU bound, you probably don't want many more threads than you've got cores:
ExecutorService executor = Executors.newFixedThreadPool(
        Runtime.getRuntime().availableProcessors());
Although, if your threads get suspended a lot (waiting for synchronization locks, or something), you may get better results with more threads. Or, if you've got other CPU-munching activities going on, you may want fewer threads.
You can try using a cached thread pool.
public static ExecutorService newCachedThreadPool()
Creates a thread pool that creates new threads as needed, but will reuse previously constructed threads when they are available. These pools will typically improve the performance of programs that execute many short-lived asynchronous tasks. Calls to execute will reuse previously constructed threads if available.
You can read more here

Optimization of Thread Pool Executor-java

I am replacing legacy Thread usage with a ThreadPoolExecutor.
I have created the executor as below:
pool = new ThreadPoolExecutor(coreSize, size, 0L, TimeUnit.MILLISECONDS,
        new LinkedBlockingQueue<Runnable>(coreSize),
        new CustomThreadFactory(name),
        new CustomRejectionExecutionHandler());
pool.prestartAllCoreThreads();
Here coreSize is the max pool size / 5. I have pre-started all the core threads on application start-up, roughly 160 threads.
In the legacy design we were creating and starting around 670 threads.
But the point is that even after switching to the Executor and replacing the legacy design, we are not getting much better results.
For memory we are using the top command to watch memory usage.
For timing we have placed loggers using System.currentTimeMillis() to measure elapsed time.
Please tell me how to optimize this design. Thanks.
But the point is that even after switching to the Executor and replacing the legacy design, we are not getting much better results.
I am assuming that you are looking at the overall throughput from your application and you are not seeing a better performance as opposed to running each task in its own thread -- i.e. not with a pool?
This sounds like you were not being blocked because of context switching. Maybe your application is IO bound or otherwise waiting on some other system resource. 670 threads sounds like a lot and you would have been using a lot of thread stack memory but otherwise it may not have been holding back the performance of your application.
Typically we use the ExecutorService classes not necessarily because they are faster than raw threads but because the code is easier to manage. The concurrent classes take a lot of the locking, queueing, etc. out of your hands.
A couple of code comments:
I'm not sure you want the LinkedBlockingQueue to be limited by core-size. Those are two different numbers: core-size is the minimum number of threads in the pool, while the size of the BlockingQueue is how many jobs can be queued up waiting for a free thread.
As an aside, the ThreadPoolExecutor will never allocate a thread past the core thread number unless the BlockingQueue is full. In your case, the next thread is forked only when all of the core threads are busy and the queue already holds core-size queued tasks (see the sketch after these comments).
I've never had to use pool.prestartAllCoreThreads(). The core threads will be started once tasks are submitted to the pool, so I don't think it buys you much -- at least not with a long running application.
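For illustration, a sketch of keeping the core size and the queue capacity as separate numbers; all values here are placeholders, not recommendations:

import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolSizingSketch {
    public static void main(String[] args) {
        int coreSize = 32;          // minimum number of threads kept in the pool
        int maxSize = 160;          // extra threads are forked only once the queue is full
        int queueCapacity = 10_000; // independent of coreSize: how many tasks may wait

        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                coreSize, maxSize,
                60L, TimeUnit.SECONDS,                        // idle non-core threads retire after 60s
                new LinkedBlockingQueue<Runnable>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());   // back-pressure instead of rejecting tasks

        pool.execute(() -> { /* task */ });
        pool.shutdown();
    }
}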
For timing we have placed loggers using System.currentTimeMillis() to measure elapsed time.
Be careful with this. Too many loggers could affect the performance of your application more than re-architecting it. But I assume you added the loggers after you didn't see a performance improvement.
The executor merely wraps the creation/usage of Threads, so it's not doing anything magical.
It sounds like you have a bottleneck elsewhere. Are you locking on a single object? Do you have a single single-threaded resource that every thread hits? In such a case you wouldn't see any change in behaviour.
Is your process CPU-bound? If so, your threads should (very roughly speaking) match the number of processing cores available. Note that each thread you create consumes memory for its stack, and if you're memory bound, then creating more threads won't help here.

Java - Managing Size of Thread Pool (Increasing mostly)

I'm trying to use a thread pool in Java. But the number of threads is unknown, so I'm trying to find a solution. Two questions occurred to me:
I'm looking for a way to increase the size of the thread pool over time, but I haven't come up with anything yet. Any suggestions? Some say Executors.newCachedThreadPool() should work, but the definition of the method says it is for short-lived tasks.
What if I set the size of the thread pool to a big number like 50 or 100? Does that work fine?
You can use Executors.newCachedThreadPool for longer-lived tasks as well, but the thing is that if you have long-running tasks, and they are added constantly and more frequently than existing tasks complete, the number of threads will spin out of control. In such a case it might be a better idea to use a (larger) fixed-size thread pool and let further tasks wait in the queue for a free thread.
This will only mean you'll (probably) have lots of live threads that are sleeping (idle) most of the time. Basically the things to consider are:
How many threads your system can handle (i.e. how many threads can be created in total; on Windows machines this can be less than 1000, on Linux you can get tens of thousands of threads, and even more with some tweaking of the system configuration)
Each thread consumes at least its stack size in memory (on Linux this can be something like 1-8 MB per thread by default; again, it can be tweaked via ulimits and the JVM's -Xss parameter)
At least with NPTL, there should be minimal or almost zero context-switching penalty for sleeping threads, so excess threads aren't "heavy" in terms of CPU usage
That being said, it'd probably be best to use the ThreadPoolExecutor's constructors directly to get the kind of pooling you want.
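For example, something like the following (a sketch only; the upper bound of 50 is a placeholder) gives you cached-pool behaviour with a hard cap on the number of threads:

import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedElasticPool {
    public static void main(String[] args) {
        int maxThreads = 50;   // placeholder upper bound on pool growth

        // Behaves like newCachedThreadPool (grows on demand, retires idle threads after 60s),
        // but never grows past maxThreads; once all threads are busy, CallerRunsPolicy makes
        // the submitting thread run the task itself, which naturally throttles submission.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                0, maxThreads,
                60L, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>(),
                new ThreadPoolExecutor.CallerRunsPolicy());

        pool.execute(() -> { /* long-running task */ });
        pool.shutdown();
    }
}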
Executors.newCachedThreadPool() allows you to create threads on demand. I think you can start by using this - I cannot see where it's stated that it's for short-lived threads, but I bet the reason is that since you are re-using available threads, having short tasks keeps the number of simultaneously active threads quite low.
Unless you've got too many threads running (you can check using JVisualVM or JConsole), I would suggest sticking with that solution, especially because the number of expected threads is undefined. Then analyze the VM and tune your pool accordingly.
For question 2: were you referring to using something like Executors.newFixedThreadPool(int)? If yes, remember that going above the number of threads you defined when you created the ThreadPool will make tasks wait, unlike newCachedThreadPool, in which new threads are created dynamically.
