Specifying an applications number of work threads

Specifying an applications number of work threads - java

I'm working on an application that does some heavy number crunching. It is intended to run on a single computer. Recently, we've started looking at multithreading to speed up the calculations. Some of the algorithms can be made to run parallel without much effort, and I use a fixed thread pool to run each of the sub tasks.
What I was wondering is: how is the number of threads (the size of the pool), typically specified for these kinds of algorithms? I suspect this is typically done by using either a configuration file, or a commandline parameter, but I haven't seen any examples like that, so I was wondering if there are better ways.
Related to this: Is it even relevant to specify the number of threads? I was thinking that setting the pool size to the number of assignable cores will likely run the fastest, but is the thread-contention for processing power in the case of over assignment even relevant for performance? Eg: will setting 20 max threads on a 4 core machine be worse than setting 4 max threads?
Edit: the application is intended for sale, so I have no idea what computers it will be run on. I'm looking for general guidelines, and best practices.

A rule of thumb is to use NUMBER_OF_CORES + 1 threads (certain parallel build systems use this) - one "spare" thread to work during another thread's IO. To find the number of cores, refer to this SO question.

The number of threads depends on several factors, mainly the number of HW cores like you mention, and also on the problem being solved/worked on in the threads. It might be best to do some trial and error with different thread pool sizes.
If the work that the threads are doing is hard-core number crunching like you mention, then I wouldnt expect much of an increase in performance by using more threads than HW cores. If the threads are doing work where they have to wait on external entities (like networking) then you may see an increase in performance by using more threads.

Related

Optimal number of threads [duplicate]

Let's say I have a 4-core CPU, and I want to run some process in the minimum amount of time. The process is ideally parallelizable, so I can run chunks of it on an infinite number of threads and each thread takes the same amount of time.
Since I have 4 cores, I don't expect any speedup by running more threads than cores, since a single core is only capable of running a single thread at a given moment. I don't know much about hardware, so this is only a guess.
Is there a benefit to running a parallelizable process on more threads than cores? In other words, will my process finish faster, slower, or in about the same amount of time if I run it using 4000 threads rather than 4 threads?

If your threads don't do I/O, synchronization, etc., and there's nothing else running, 1 thread per core will get you the best performance. However that very likely not the case. Adding more threads usually helps, but after some point, they cause some performance degradation.
Not long ago, I was doing performance testing on a 2 quad-core machine running an ASP.NET application on Mono under a pretty decent load. We played with the minimum and maximum number of threads and in the end we found out that for that particular application in that particular configuration the best throughput was somewhere between 36 and 40 threads. Anything outside those boundaries performed worse. Lesson learned? If I were you, I would test with different number of threads until you find the right number for your application.
One thing for sure: 4k threads will take longer. That's a lot of context switches.

I agree with #Gonzalo's answer. I have a process that doesn't do I/O, and here is what I've found:
Note that all threads work on one array but different ranges (two threads do not access the same index), so the results may differ if they've worked on different arrays.
The 1.86 machine is a macbook air with an SSD. The other mac is an iMac with a normal HDD (I think it's 7200 rpm). The windows machine also has a 7200 rpm HDD.
In this test, the optimal number was equal to the number of cores in the machine.

I know this question is rather old, but things have evolved since 2009.
There are two things to take into account now: the number of cores, and the number of threads that can run within each core.
With Intel processors, the number of threads is defined by the Hyperthreading which is just 2 (when available). But Hyperthreading cuts your execution time by two, even when not using 2 threads! (i.e. 1 pipeline shared between two processes -- this is good when you have more processes, not so good otherwise. More cores are definitively better!) Note that modern CPUs generally have more pipelines to divide the workload, so it's no really divided by two anymore. But Hyperthreading still shares a lot of the CPU units between the two threads (some call those logical CPUs).
On other processors you may have 2, 4, or even 8 threads. So if you have 8 cores each of which support 8 threads, you could have 64 processes running in parallel without context switching.
"No context switching" is obviously not true if you run with a standard operating system which will do context switching for all sorts of other things out of your control. But that's the main idea. Some OSes let you allocate processors so only your application has access/usage of said processor!
From my own experience, if you have a lot of I/O, multiple threads is good. If you have very heavy memory intensive work (read source 1, read source 2, fast computation, write) then having more threads doesn't help. Again, this depends on how much data you read/write simultaneously (i.e. if you use SSE 4.2 and read 256 bits values, that stops all threads in their step... in other words, 1 thread is probably a lot easier to implement and probably nearly as speedy if not actually faster. This will depend on your process & memory architecture, some advanced servers manage separate memory ranges for separate cores so separate threads will be faster assuming your data is properly filed... which is why, on some architectures, 4 processes will run faster than 1 process with 4 threads.)

The answer depends on the complexity of the algorithms used in the program. I came up with a method to calculate the optimal number of threads by making two measurements of processing times Tn and Tm for two arbitrary number of threads ‘n’ and ‘m’. For linear algorithms, the optimal number of threads will be N = sqrt ( (mn(Tm*(n-1) – Tn*(m-1)))/(nTn-mTm) ) .
Please read my article regarding calculations of the optimal number for various algorithms: pavelkazenin.wordpress.com

The actual performance will depend on how much voluntary yielding each thread will do. For example, if the threads do NO I/O at all and use no system services (i.e. they're 100% cpu-bound) then 1 thread per core is the optimal. If the threads do anything that requires waiting, then you'll have to experiment to determine the optimal number of threads. 4000 threads would incur significant scheduling overhead, so that's probably not optimal either.

I thought I'd add another perspective here. The answer depends on whether the question is assuming weak scaling or strong scaling.
From Wikipedia:
Weak scaling: how the solution time varies with the number of processors for a fixed problem size per processor.
Strong scaling: how the solution time varies with the number of processors for a fixed total problem size.
If the question is assuming weak scaling then #Gonzalo's answer suffices. However if the question is assuming strong scaling, there's something more to add. In strong scaling you're assuming a fixed workload size so if you increase the number of threads, the size of the data that each thread needs to work on decreases. On modern CPUs memory accesses are expensive and would be preferable to maintain locality by keeping the data in caches. Therefore, the likely optimal number of threads can be found when the dataset of each thread fits in each core's cache (I'm not going into the details of discussing whether it's L1/L2/L3 cache(s) of the system).
This holds true even when the number of threads exceeds the number of cores. For example assume there's 8 arbitrary unit (or AU) of work in the program which will be executed on a 4 core machine.
Case 1: run with four threads where each thread needs to complete 2AU. Each thread takes 10s to complete (with a lot of cache misses). With four cores the total amount of time will be 10s (10s * 4 threads / 4 cores).
Case 2: run with eight threads where each thread needs to complete 1AU. Each thread takes only 2s (instead of 5s because of the reduced amount of cache misses). With four cores the total amount of time will be 4s (2s * 8 threads / 4 cores).
I've simplified the problem and ignored overheads mentioned in other answers (e.g., context switches) but hope you get the point that it might be beneficial to have more number of threads than the available number of cores, depending on the data size you're dealing with.

4000 threads at one time is pretty high.
The answer is yes and no. If you are doing a lot of blocking I/O in each thread, then yes, you could show significant speedups doing up to probably 3 or 4 threads per logical core.
If you are not doing a lot of blocking things however, then the extra overhead with threading will just make it slower. So use a profiler and see where the bottlenecks are in each possibly parallel piece. If you are doing heavy computations, then more than 1 thread per CPU won't help. If you are doing a lot of memory transfer, it won't help either. If you are doing a lot of I/O though such as for disk access or internet access, then yes multiple threads will help up to a certain extent, or at the least make the application more responsive.

Benchmark.
I'd start ramping up the number of threads for an application, starting at 1, and then go to something like 100, run three-five trials for each number of threads, and build yourself a graph of operation speed vs. number of threads.
You should that the four thread case is optimal, with slight rises in runtime after that, but maybe not. It may be that your application is bandwidth limited, ie, the dataset you're loading into memory is huge, you're getting lots of cache misses, etc, such that 2 threads are optimal.
You can't know until you test.

You will find how many threads you can run on your machine by running htop or ps command that returns number of process on your machine.
You can use man page about 'ps' command.
man ps
If you want to calculate number of all users process, you can use one of these commands:
ps -aux| wc -l
ps -eLf | wc -l
Calculating number of an user process:
ps --User root | wc -l
Also, you can use "htop" [Reference]:
Installing on Ubuntu or Debian:
sudo apt-get install htop
Installing on Redhat or CentOS:
yum install htop
dnf install htop [On Fedora 22+ releases]
If you want to compile htop from source code, you will find it here.

The ideal is 1 thread per core, as long as none of the threads will block.
One case where this may not be true: there are other threads running on the core, in which case more threads may give your program a bigger slice of the execution time.

One example of lots of threads ("thread pool") vs one per core is that of implementing a web-server in Linux or in Windows.
Since sockets are polled in Linux a lot of threads may increase the likelihood of one of them polling the right socket at the right time - but the overall processing cost will be very high.
In Windows the server will be implemented using I/O Completion Ports - IOCPs - which will make the application event driven: if an I/O completes the OS launches a stand-by thread to process it. When the processing has completed (usually with another I/O operation as in a request-response pair) the thread returns to the IOCP port (queue) to wait for the next completion.
If no I/O has completed there is no processing to be done and no thread is launched.
Indeed, Microsoft recommends no more than one thread per core in IOCP implementations. Any I/O may be attached to the IOCP mechanism. IOCs may also be posted by the application, if necessary.

speaking from computation and memory bound point of view (scientific computing) 4000 threads will make application run really slow. Part of the problem is a very high overhead of context switching and most likely very poor memory locality.
But it also depends on your architecture. From where I heard Niagara processors are suppose to be able to handle multiple threads on a single core using some kind of advanced pipelining technique. However I have no experience with those processors.

Hope this makes sense, Check the CPU and Memory utilization and put some threshold value. If the threshold value is crossed,don't allow to create new thread else allow...

difference between 96 threads and 6500 threads?

I recently read in a post that I can run 96 threads. However, when I look at my PC's performance, and posts like this, I see thousands of threads being able to run.
I assume the word thread is being used for two different things here (correct me if I'm wrong and explain). I want to know what the difference between the two "thread"s are. Why is one post saying 92 but another saying 6500?

The answer is: both links "talk" about the same thread.
The major difference is: the first is effectively asking about the number of threads that a given CPU can really execute in parallel.
The other link talks about the fact how many threads you can coexist within a certain scope (for example one JVM).
In other words: the major "idea" behind threads is that ... most of the time, they are idle! So having 6400 threads can work out - assuming that your workload is such, that 99.9% of the time, each thread is just doing nothing (like: waiting for something to happen). But of course: such a high number is probably not a good idea, unless we are talking about a really huge server that has zillions of cores to work with. One has to keep in mind that threads are also a resource, owned by the operating system, and many problems that you did solve using "more threads" in the past have no different answers (like using nio packages and non-blocking io instead of having zillions of threads waiting for responses for example).
Meaning: when you write example code where each thread just computes something (so, if run alone, that thread would consume 100% of the available CPU cycles) - then adding more threads just creates more load on the system.
Typically, a modern day CPU has c cores. And each cores can run t threads in parallel. So, you got often like 4 x 2 threads that can occupy the CPU in parallel. But as soon as your threads spent more time doing nothing (waiting for a disk read or network request to come back), you can easily create, manage, and utilize hundreds or even thousands of threads.

Will threading help improve efficiency in Java?

My application is supposed to have a "realtime with pause" functionality. The user can pause execution, do some things that modify what's going to happen, then unpause and let stuff happen. Stuff happens at regular intervals as specified by the user, can be slow, can be fast.
My goal at using threading here is to improve performance on multicore systems. The amount of data that the application is supposed to crunch at the time intervals is supposed to be arbitrarily large (I expect lots and lots of loops over collections, modifying object properties and generating random numbers, but precious little disk access). I don't want the application to be constrained by the capacity of a single core, if it can use more to run faster.
Will this actually work this way?
I've run some tests (made a program crunch numbers a lot, and looked at CPU usage during its activity), but it's not really conclusive - usage is certainly in the proximity of 100% on my dual core machine, but hardly ever 100%. Does a single-threaded (main only) Java application use all available cores for computation?

Does a single-threaded (main only) Java application use all available cores for computation?
No, it will normally use a single core.
Making a program do computations in parallel with multiple threads may make it faster, but it's not a magical solution for any kind of problem. Whether this is a suitable solution for your program depends on what your program is doing exactly, and if the algorithm can be parallelized. If, for example, you are doing lots of computations where the next computation depends on the result of the previous computation, then making it multi-threaded will not help a lot, because you can't do the computations at the same time - the next one first has to wait for the answer of the previous one. So, you first have to think about what computations in your program could be run in parallel.
Java has a lot of support for multi-threading. You can program with threads directly, or use an executor service, or use the fork/join framework. Whatever is appropriate depends on what exactly you want to do.

Does a single-threaded (main only) Java application use all available cores for computation?
Not usually, but you could make use of some higher level apis in java that is actually using threads for you and youre not even usinfpg threads directly, more obviousiously fork/join and executors, less obvious the new Streams API on collections (ie parallelStream).
In general, though, to make use of all cores, you need to do some kind of concurrency. Further...its really hard to just observe you OS monitor to see what is going on (especially with only 2 cores)...your OS has other things going on (trying to manage itself, running your IDE, running crontab, running a browers to post to stackoverflow ;).
Finally, just implementing (concurrency) itself may not help, you have to do it "right" for your code/algorithm.

a java thread will run in a single cpu. to use multiple CPUs, you should have multiple threads.

Imagine that u have to do various tasks using your hand. You will do it slowly using one hand and more effciently using both your hands. Similarly, in java or in any other language multi threading provides the system with many hands. The good news is that you can have many threads to do different tasks. Running operations in a single thread will make the program sluggish and sometimes unresponsive. A good practice is to do long running tasks in a separate thread. For example loading large chunks of data from a database should be processed in a separate thread. Downloading data from the internet should also be processed in a separate thread. What happens if you do long running operations in the main thread? The program HANGS and will become unresponsive till the task gets completed and the user will think that there is someting wrong. I hope you get it

How many threads is it advisable to have running at the same time in Java?

I am new to multithreading in Java, after looking at Java virtual machine - maximum number of threads it would appear there isn't a limit to how many threads a Java/Android app can run. However, is there an advisable limit? What I mean by this is, is there a number of threads where if you run past this number then it is unwise because you are unable to determine what thread does what at what time? I hope my question makes sense.

There are some advisable limits, however they don't really have anything to do with keeping track of them.
Most multithreading comes with locking. If you are using central data storage or global mutable state then the more threads you have, the more lock contention you will get. This is app-specific and depends on how much of said state you have and how often threads read and write it.
There are no limits in desktop JVMs by default, but there are OS limits.It should be in the tens of thousands for modern Windows machines, but don't rely on the ability to create much more than that.
Running multiple tasks in parallel is great, but the hardware can only cope with so much. If you are using small threads that get fired up sometimes, and spend most their time idle, that's no biggie (Java servers were written like this for years). However if your threads are very intensive, making more of them than the number of cores you have is not likely to give you any benefit. (I believe the standard practice is twice the number of cores if you anticipate threads going idle sometimes).
Threads have a cost to them. Whenever you switch Threads you switch context, and while it isn't that expensive, doing it constantly will hurt performance. It's not a good idea to create a Thread to sum up two integers and write back a result.
If Threads need visibility of each others state, then they are greatly slowed down, since a lot of their writes have to be written back to main memory. Threads are best used for standalone tasks that require little interaction with each other.
TL;DR
Depends on OS and Hardware: on servers creating thousands of threads is fine, on desktop machines you should limit yourself to 50-200 and choose carefully what you do with them.
Note: Androids default and suggested "UI multithread helper" - the AsyncTask is not actually a thread. It's a task invoked from a ThreadPool, and as such there is no limit or penalty to using it. It has an upper limit on the number of threads it spawns and reuses them rather than creating new ones. Most Android apps should use it instead of spawning their own threads. In general, Thread Pools are fairly widespread and are a great choice unless you are forced into blocking operations.

How to decide the suitable number of threads to create in java?

I have a java application that creates SSL socket with remote hosts. I want to employ threads to hasten the process.
I want the maximum possible utilization that does not affect the program performance. How can I decide the suitable number of threads to use? After running the following line: Runtime.getRuntime().availableProcessors(); I got 4. My processor is an Intel core i7, with 8 GB RAM.

If you have 4 cores, then in theory you should have exactly four worker threads going at any given time for maximum optimization. Unfortunately what happens in theory never happens in practice. You may have worker threads who, for whatever reason, have significant amounts of downtime. Perhaps they're hitting the web for more data, or reading from a disk, much of which is just waiting and not utilizing the cpu.
Depending on how much waiting you're doing, you'll want to bump up the number. The costs to increasing the number of threads is that you'll have more context switching and competition for resources. The benefits are that you'll have another thread ready to work in case one of the other threads decides it has to break for something.
Your best bet is to set it to something (let's start with 4) and work your way up. Profile your code with each setting, and see if your benchmarks go up or down. You should soon see a pattern and a break-even point.
When it comes to optimization, you can theorize all you want about what should be the fastest, but you won't beat actually running and timing your code to truly answer this question.

As DarthVader said, you can use a ThreadPool (CachedThreadPool). With this construct you don't have to specify a concrete number of threads.
From the oracle site:
The newCachedThreadPool method creates an executor with an expandable thread pool. This executor is suitable for applications that launch many short-lived tasks.
Maybe thats what you are looking for.
About the number of cores is hard to say. You have 4 hyperthreading cores, at least one core you should leave for your OS. i would say 4-6 Threads.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.