Program performance in Java fluctuates with thread count - java

The title, I admit, is a bit misleading, but I am somewhat confused about why this happens.
I've written a Java program that takes an argument x and instantiates x threads to do the program's work. The machine I'm running it on has 8 cores and can handle 32 threads in parallel (each core runs 4 hyperthreads). When I run the program with more than 8 threads (e.g. 22), I notice that it runs faster with an even number of threads than with an odd number such as 23 (which is actually slower). The performance difference is about 10% between the two. Why would this be? Thread overhead doesn't really account for it, and I would expect that, as long as I'm running fewer than 32 threads, the program should only get faster as I increase the number of threads.
To give you an idea of what the program is doing: it takes a 1000 x 1000 array, and each thread is assigned a portion of that array to update (when the division is uneven, the leftover rows go to the last thread instantiated).
Is there any good reason for the odd/even thread performance difference?

Two reasons I can imagine:
The need to synchronize the memory accesses of your cores/threads. This will keep invalidating CPU core caches and the like, which brings down performance. Try giving the threads really disjoint tasks; don't let them work on the same parts of the array. Remember that memory isn't managed in individual bytes but in whole cache lines, so neighbouring elements updated by different threads can cause false sharing (a row-partitioning sketch follows after this answer).
Hyperthreaded CPUs often don't deliver full per-thread performance. The hardware threads may, for example, have to share floating-point units. That doesn't matter when, say, one thread is integer-math heavy and the other is float-heavy, but having four threads that all need the floating-point units probably means waiting, switching contexts, signalling the other thread, switching context back, waiting again...
These are just two guesses. For a better answer, you should have given the actual CPU you are using, the partitioning scheme you are using, and a more detailed description of the computational task.
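For illustration, here is a minimal sketch of a disjoint row-based partitioning of a 1000 x 1000 array, with the leftover rows going to the last thread as described in the question; the per-element update and the class name are placeholders made up for this example.

public class RowPartitionSketch {
    static final int N = 1000;
    static final double[][] data = new double[N][N];

    public static void main(String[] args) throws InterruptedException {
        int threadCount = Integer.parseInt(args[0]);
        int rowsPerThread = N / threadCount;
        Thread[] workers = new Thread[threadCount];

        for (int t = 0; t < threadCount; t++) {
            final int from = t * rowsPerThread;
            // the last thread picks up the leftover rows when N is not evenly divisible
            final int to = (t == threadCount - 1) ? N : from + rowsPerThread;
            workers[t] = new Thread(() -> {
                for (int row = from; row < to; row++) {
                    for (int col = 0; col < N; col++) {
                        data[row][col] += 1.0;   // placeholder for the real update
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
    }
}

Because each thread writes only to its own rows, writes from different threads rarely land on the same cache line, which limits the false-sharing effect described above.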

Related

How many threads can be created and executed on a single CPU core?

If I have a multithreaded program and one dual-core CPU in my machine, how many threads can I create in parallel for these 2 cores? In some articles I read that a CPU core can handle only one thread. Does that mean I can create only 2 threads? Or can I create multiple threads, with only 2 of them being executed by the cores at any one instant? I know this question is simple, but I'm a little confused.
Modern hardware and OS combinations can easily handle thousands of threads, and they can all be in the 'running' state.
However, let's say you have 2000 threads and 2 CPUs in the box, each with 16 cores; then certainly at most 32 of your running threads are truly 'executing stuff' at any instant. The other ones are ready to go whenever any of the 32 currently executing threads does something that isn't CPU-bound (for example, waiting for bytes to flow in from the network, or for bytes to be read from a disk), or simply as time passes (eventually the CPU, OS and JVM work together to pre-empt an executing thread, pausing it so that it doesn't hog the resources).
The bottleneck you're most likely to run into first is memory: each thread needs a stack, and stacks take memory. If, for example, you are working with 1MB stacks, creating 2000 threads means you now have 2GB worth of stack space alone, and the Java heap (where all objects live) probably also takes a GB or 2, so that's 4GB. On your average provisioned IaaS box you don't have that much.
A simple solution to that problem is that you can actually control the stack size when you create a new Thread object. You can use this to make much smaller stacks, for example 64k stacks - now 2000 threads take only an eighth of a GB. Of course, a 64k stack is not capable of running particularly deeply nested methods; usually, if you're creating thousands of threads, the code they actually end up running should be relatively simple and not build up large stacks or make recursive calls.
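As a minimal sketch of that, the four-argument Thread constructor lets you request a smaller stack per thread. The Runnable body, class name and thread name here are placeholders, and the JVM is free to treat the requested stack size only as a suggestion; the default can also be changed globally with the -Xss option.

public class SmallStackSketch {
    public static void main(String[] args) {
        Runnable task = () -> {
            // relatively flat, non-recursive work goes here
        };
        long stackSize = 64 * 1024;  // 64k, as in the example above
        // constructor signature: Thread(ThreadGroup group, Runnable target, String name, long stackSize)
        Thread worker = new Thread(null, task, "small-stack-worker", stackSize);
        worker.start();
    }
}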
If you mess this up, you'll get a StackOverflowError. Then you know to either adjust the code or increase the stack sizes.
If you're not making thousands of threads (merely, say, hundreds), don't worry about it. Just make 'em and run 'em, and trust the OS, CPU and JVM to sort it out for you.

Does the number of processes that can be executed depend on the number of cores? [duplicate]

I'm confused about something.
What I know is that the maximum number of threads that can run concurrently on a normal CPU of a modern computer ranges from 8 to 16 threads.
On the other hand, using GPUs thousands of threads can run concurrently without the scheduler interrupting any thread to schedule another one.
In several posts, such as:
Java virtual machine - maximum number of threads https://community.oracle.com/message/10312772
people state that they run thousands of Java threads concurrently on normal CPUs.
How can this be?
And how can I know the maximum number of threads that can run concurrently, so that my code adjusts itself dynamically to the underlying architecture?
Threads aren't tied to or limited by the number of available processors/cores. The operating system scheduler can switch back and forth between any number of threads on a single CPU. This is the meaning of "preemptive multitasking."
Of course, if you have more threads than cores, not all threads will be executing simultaneously. Some will be on hold, waiting for a time slot.
In practice, the number of threads you can have is limited by the scheduler - but that number is usually very high (thousands or more). It will vary from OS to OS and with individual versions.
As far as how many threads are useful from a performance standpoint, as you said it depends on the number of available processors and on whether the task is IO or CPU bound. Experiment to find the optimal number and make it configurable if possible.
There is hardware concurrency and software concurrency. The 8 to 16 threads refers to the hardware you have - that is, one or more CPUs with hardware able to execute 8 to 16 threads in parallel. The thousands of threads refers to the number of software threads; the scheduler has to swap them in and out so that every software thread gets its time slice on the hardware.
To get the number of hardware threads you can try Runtime.getRuntime().availableProcessors().
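As a small illustrative sketch of that distinction (the thread bodies are trivial placeholders), the following queries the hardware thread count and then starts far more software threads than that; the scheduler simply time-slices them onto the available cores:

public class HardwareVsSoftwareThreads {
    public static void main(String[] args) {
        int hardwareThreads = Runtime.getRuntime().availableProcessors();
        System.out.println("Hardware threads: " + hardwareThreads);

        // start far more software threads than the hardware can run at once
        for (int i = 0; i < hardwareThreads * 10; i++) {
            final int id = i;
            new Thread(() -> System.out.println("software thread " + id + " got its time slice")).start();
        }
    }
}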
At any given time, a processor runs a number of threads equal to the number of cores it contains. This means that on a uniprocessor (single-core) system, only one thread (or no thread) is running at any given moment.
However, processors do not run each thread to completion one after another; rather, they switch between multiple threads rapidly to simulate concurrent execution. If this weren't the case, then, never mind creating multiple threads, you wouldn't even be able to start multiple applications.
A Java thread is (compared to processor instructions) a very high-level abstraction over a set of instructions for the CPU to process. When it gets down to the processor level, there is no guarantee which thread will run on which core at any given time. But given that processors rapidly switch between these threads, it is theoretically possible to create an almost unlimited number of threads, albeit at the cost of performance.
If you think about it, a modern computer has thousands of threads running 'at the same time' (across all applications combined) while having only 1 to 16 cores in the typical case. Without this task switching, nothing would ever get done.
If you are optimizing your application, you should determine the number of threads you need from the work at hand, not from the underlying architecture. Performance gains from parallelism should be weighed against the increasing overhead of thread execution. Since every machine and every runtime environment is different, it is impractical to work out some golden thread count (though a ballpark estimate can be made by benchmarking and looking at the number of cores).
The other answers have already explained how you can theoretically have thousands of threads in your application, at the cost of memory and the other overheads described above. It is, however, worth noting that the default concurrencyLevel for the data structures provided in the java.util.concurrent package is 16.
You will run into contention issues if you don't account for this.
Using a significantly higher value than you need can waste space and time, and a significantly lower value can lead to thread contention.
Make sure you set an appropriate concurrencyLevel if you run into concurrency-related issues with a higher number of threads.
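As a minimal sketch (the map contents and the assumed 64 writer threads are made up for illustration), the concurrency level can be passed explicitly through the three-argument ConcurrentHashMap constructor; note that since Java 8 this argument is treated only as a sizing hint:

import java.util.concurrent.ConcurrentHashMap;

public class ConcurrencyLevelSketch {
    // assumed for illustration: roughly 64 threads will update the map concurrently
    static final int WRITER_THREADS = 64;

    // initial capacity 1024, load factor 0.75, concurrency level hint of 64
    static final ConcurrentHashMap<String, Long> COUNTERS =
            new ConcurrentHashMap<>(1024, 0.75f, WRITER_THREADS);
}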

new String(byte[]) too slow in multithreaded applications

I noticed that when I call new String(byte[]) from a single thread, it's pretty fast. But when I call it from several different threads it becomes painfully slow.
For example, I have a parser that makes calls to new String(bytes). If I call the parser 50 times sequentially, each parse takes about 100ms, but if I make 50 threads and call the parser, each parse takes between 12000ms and 21000ms! (It becomes slower in the later threads.) It seems as if the String(bytes) construction were synchronized.
The profiler tracked the bottleneck to new String(bytes), and indeed when I changed it to new String("Hello") the parser became as fast multi-threaded as it was single-threaded.
Does anyone know why this is the case? And what's the workaround?
Update:
My verification test was wrong, because apparently the Java compiler has some internal optimizations that share String objects instead of creating new ones when I call new String("Hello"). That's why it was faster when I made that change. I will rewrite my test code and update this question.
Answer:
Both #nafas's and #peter's answers below are correct. The String construction itself was not slow; the profiler mistakenly identified it as the bottleneck. The real culprits were:
Having more threads than available cores.
The garbage collector pausing execution in the middle of the operation because too many temporary objects were being created and destroyed.
Swap
If RAM is low, the JVM together with the OS swaps to the HDD. Increase the available -Xmx and -Xms for better performance.
I'm pretty sure it's not the String class: String objects are immutable, so they have nothing to do with synchronization or threads.
If the number of cores is relatively low (say 2) and you make 50 threads, that reduces speed by quite a large factor, because you have created extra complexity in your program.
Each time your CPU switches between threads costs some time.
There is no hard rule of thumb for choosing the number of threads; I normally go with:
number of cores + 1
NOTE:
If you have, say, lots of API calls, I would go for a bigger number of threads, whereas if the threads are purely using CPU (as yours are) I go for a smaller number of threads (threads = #cores + 1).
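A minimal sketch of that rule of thumb for CPU-bound work, using a fixed-size executor; the submitted task body is a placeholder for the real parsing work.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSizingSketch {
    public static void main(String[] args) {
        // CPU-bound work: number of cores + 1, as suggested above
        int poolSize = Runtime.getRuntime().availableProcessors() + 1;
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);

        for (int i = 0; i < 50; i++) {
            final int job = i;
            pool.submit(() -> {
                // placeholder for the CPU-heavy parse
                System.out.println("parsed job " + job);
            });
        }
        pool.shutdown();
    }
}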

Java threads are not actually executed in parallel?

Until now I was under the impression that 2 threads that start at the same time are also executed in parallel (both running their pieces of code at the same time), but I recently read some documentation and understood that they actually take turns executing their code, so no piece of code from the first thread runs at the same time as a piece of code from the second thread.
Is my understanding correct?
If yes, then how is multi-threading faster than single-threaded execution?
I'm asking this because the only difference is that a single thread executes the code sequentially, while multiple threads take turns executing it, which should still take the same amount of time since nothing is done in parallel.
a) On multi-processor machines, threads can actually run in parallel (one per CPU/core).
b) If your thread calls Thread.sleep(), waits for IO, etc., it makes resources available to other threads. So multi-threaded applications are actually faster than single-threaded ones when dealing with external resources.
Java threads are executed in parallel if there are enough CPUs available to the JVM. You can't run 2 computations at the same time on a machine with a single computing element, so that computing element is used either by the first or by the second computation at any given time. Probably what you've read concerned these circumstances.
No, Java threads are executed in parallel (unlike some other platforms like CPython). However, whether that gives performance improvements depends on the code you execute.
If you test with easily parallelizable and CPU-intensive tasks, like calculating pi with a parallelizable algorithm or resizing lots of images, you can easily demonstrate that performance increases basically linearly (2 CPUs = x2, 4 CPUs = x4, etc.).
EDIT:
When you only have one CPU, multi-threading is still beneficial. For example, you can have one thread reading images from the disk while the other thread resizes the images. This will also improve the performance because you can utilize the CPU without waste.
EDIT2:
When you read and resize images (note the plural) in a single thread, you will see that CPU usage won't be at 100% all the time. This is because while the thread is reading from a file, it can't perform the resizing. If you had more than one thread, by the time one resize has finished another file would already have been read into memory. If you are dealing with big images, it's relatively easy to peg the CPU at 100% with this design.
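A minimal sketch of that reader/resizer split, assuming a hypothetical "images" directory and a placeholder resize step; termination handling (e.g. a poison-pill element) is omitted for brevity.

import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ReadResizeSketch {
    public static void main(String[] args) {
        BlockingQueue<byte[]> loaded = new ArrayBlockingQueue<>(4);

        // I/O-bound producer: reads raw image bytes from disk
        Thread reader = new Thread(() -> {
            try (DirectoryStream<Path> files = Files.newDirectoryStream(Paths.get("images"))) {
                for (Path file : files) {
                    loaded.put(Files.readAllBytes(file));
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        });

        // CPU-bound consumer: resizes whatever the reader has already loaded
        Thread resizer = new Thread(() -> {
            try {
                while (true) {
                    resize(loaded.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        reader.start();
        resizer.start();
    }

    // placeholder for the actual CPU-heavy resizing logic
    static void resize(byte[] rawImage) { }
}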
Well, the answer to your question depends on the number of CPUs the system has.
Keep in mind that a single CPU core can process only one thread at a time, but the context switching between threads is so fast that the threads appear to run concurrently.
On your second question, "If yes, then how is multi-threading faster than one-thread execution?":
Multithreading makes better use of the CPU cycles. Say one thread is blocked on some resource; other threads then get a chance to run.
On a side note, go through this blog page if you want some basic multithreading tutorials: http://javasolutionsonline.blogspot.in/p/java-concurrency.html
Umm... threads do run in parallel, just not on the conventional PCs that had single cores.
If you have a multi-core chip or many CPUs, then they can run in parallel.
Imagine one thread running on each of the quad cores...
Threads give you many other advantages as well, as you must already know.

How many threads should I use in my Java program?

I recently inherited a small Java program that takes information from a large database, does some processing and produces a detailed image regarding the information. The original author wrote the code using a single thread, then later modified it to allow it to use multiple threads.
In the code he defines a constant:
// number of threads
public static final int THREADS = Runtime.getRuntime().availableProcessors();
Which then sets the number of threads that are used to create the image.
I understand his reasoning that the number of threads cannot usefully be greater than the number of available processors, so he set it to that amount to get the full potential out of the processor(s). Is this correct? Or is there a better way to utilize the full potential of the processor(s)?
EDIT: To give some more clarification, the specific algorithm that is being threaded scales with the resolution of the picture being created (1 thread per pixel). That is obviously not the best solution, though. The work this algorithm does is what takes all the time, and it is wholly mathematical operations; there are no locks or other factors that will cause any given thread to sleep. I just want to maximize the program's CPU utilization to decrease the time to completion.
Threads are fine, but as others have noted, you have to be highly aware of your bottlenecks. Your algorithm sounds like it would be susceptible to cache contention between multiple CPUs - this is particularly nasty because it has the potential to hit the performance of all of your threads (normally you think of using multiple threads to continue processing while waiting for slow or high latency IO operations).
Cache contention is a very important aspect of using multiple CPUs to process a highly parallelized algorithm: make sure you take your memory utilization into account. If you can construct your data objects so each thread has its own memory that it is working on, you can greatly reduce cache contention between the CPUs. For example, it may be easier to have one big array of ints and have different threads working on different parts of that array - but in Java, the bounds checks on that array are all going to hit the same address in memory, which can cause a given CPU to have to reload data from the L2 or L3 cache.
Split the data into its own data structures, and configure those data structures so they are thread-local (it might even be more optimal to use ThreadLocal - that actually uses constructs in the OS that provide guarantees the CPU can use to optimize caching).
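A minimal sketch of the "give each thread its own memory" idea, assuming the per-element work can be accumulated independently and merged at the end; the actual math and the class name are placeholders.

public class ThreadLocalBufferSketch {
    public static void main(String[] args) throws InterruptedException {
        int threads = Runtime.getRuntime().availableProcessors();
        int size = 1_000_000;
        long[] partial = new long[threads];          // one result slot per thread
        Thread[] workers = new Thread[threads];
        int chunk = size / threads;

        for (int t = 0; t < threads; t++) {
            final int id = t;
            final int from = t * chunk;
            final int to = (t == threads - 1) ? size : from + chunk;
            workers[t] = new Thread(() -> {
                long local = 0;                      // thread-private accumulator, no shared writes in the loop
                for (int i = from; i < to; i++) {
                    local += i;                      // placeholder for the per-element math
                }
                partial[id] = local;                 // a single write at the end, not one per iteration
            });
            workers[t].start();
        }

        long total = 0;
        for (int t = 0; t < threads; t++) {
            workers[t].join();
            total += partial[t];
        }
        System.out.println("result: " + total);
    }
}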
The best piece of advice I can give you is test, test, test. Don't make assumptions about how CPUs will perform - there is a huge amount of magic going on in CPUs these days, often with counterintuitive results. Note also that the JIT runtime optimization will add an additional layer of complexity here (maybe good, maybe not).
On the one hand, you'd like to think Threads == CPUs/cores makes perfect sense. Why have a thread if there's nothing to run it on?
The detail boils down to "what are the threads doing". A thread that's idle waiting for a network packet or a disk block is CPU time wasted.
If your threads are CPU-heavy, then a 1:1 correlation makes some sense. If you have a single "read the DB" thread that feeds the other threads, and a single "dump the data" thread that pulls results from the CPU threads and creates output, those two could most likely easily share a CPU while the CPU-heavy threads keep churning away.
The real answer, as with all sorts of things, is to measure it. Since the number is configurable (apparently), configure it! Run it with 1:1 threads to CPUs, 2:1, 1.5:1, whatever, and time the results. The fastest one wins.
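A minimal sketch of that measurement loop; runWorkload is a hypothetical placeholder for running the real image generation with a given thread count.

public class ThreadCountBenchmarkSketch {
    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        int[] candidates = { cores, cores * 3 / 2, cores * 2, cores * 4 };

        for (int threads : candidates) {
            long start = System.nanoTime();
            runWorkload(threads);                              // hypothetical: the real job with this many threads
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println(threads + " threads: " + millis + " ms");
        }
    }

    static void runWorkload(int threads) throws InterruptedException {
        // placeholder: partition the real work across 'threads' workers
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> { /* work slice */ });
            workers[i].start();
        }
        for (Thread w : workers) {
            w.join();
        }
    }
}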
The number that your application needs; no more, and no less.
Obviously, if you're writing an application which contains some parallelisable algorithm, then you can probably start benchmarking to find a good balance in the number of threads, but bear in mind that hundreds of threads won't speed up any operation.
If your algorithm can't be parallelised, then no number of additional threads is going to help.
Yes, that's a perfectly reasonable approach. One thread per processor/core will maximize processing power and minimize context switching. I'd probably leave that as-is unless I found a problem via benchmarking/profiling.
One thing to note is that the JVM does not guarantee availableProcessors() will be constant, so technically, you should check it immediately before spawning your threads. I doubt that this value is likely to change at runtime on typical computers, though.
P.S. As others have pointed out, if your process is not CPU-bound, this approach is unlikely to be optimal. Since you say these threads are being used to generate images, though, I assume you are CPU bound.
The number of processors is a good start, but if those threads do a lot of I/O, you might be better off with more... or fewer.
First think about what resources are available and what you want to optimise (least time to finish, least impact on other tasks, etc.), then do the math.
Sometimes it can be better to dedicate a thread or two to each I/O resource and let the others fight for the CPU; the analysis is usually easier with these designs.
The benefit of using threads is to reduce the wall-clock execution time of your program by allowing it to work on a different part of the job while another part is waiting for something to happen (usually I/O). If your program is totally CPU-bound, adding more threads than you have processors will only slow it down. If it is fully or partially I/O-bound, adding threads may help, but there is a balance to be struck between the overhead of adding threads and the additional work that gets accomplished. Making the number of threads equal to the number of processors yields peak performance if the program is totally, or near-totally, CPU-bound.
As with many questions containing the word "should", the answer is "it depends". If you think you can get better performance, adjust the number of threads up or down and benchmark the application's performance. Also take into account any other factors that might influence the decision (if your application is eating 100% of the computer's available horsepower, the performance of other applications will be reduced).
This assumes that the multi-threaded code is written properly etc. If the original developer only had one CPU, he would never have had a chance to experience problems with poorly-written threading code. So you should probably test behaviour as well as performance when adjusting the number of threads.
By the way, you might want to consider allowing the number of threads to be configured at run time instead of compile time to make this whole process easier.
After seeing your edit, it's quite possible that one thread per CPU is as good as it gets. Your application seems quite parallelizable. If you have extra hardware you can use GridGain to grid-enable your app and have it run on multiple machines. That's probably about the only thing, beyond buying faster / more cores, that will speed it up.
