Java multithreading doesn't use the threads at 100%

I have a program that has to run a fairly heavy method 10k times, each time with different input.
I tried both threading it naturally (one thread per input) and using nr_threads = Runtime.getRuntime().availableProcessors() threads, or 2-3 times that amount. (The method's complexity varies with the input, so I found that if I deploy exactly nr_threads threads, one of them is usually still alive when all the others have finished.)
When I run it locally on my computer (4 physical cores, 8 logical), it runs at 100%, but every time I run it on a server (Amazon instances, if that matters) with 36 or 72 cores, the average load per thread is between 15 and 25%.
I use a Callable class for the multithreading, and from it I call only static methods. I also update a matrix, but I'm sure that no two threads ever access the same cell, so it shouldn't be a concurrency issue. RAM usage is also fairly safe (40 GB out of 60), so I would rule out intense GC activity, but I have no idea how to test GC activity.
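One way to check GC activity from inside the program is to read the standard GarbageCollectorMXBean counters before and after the workload and compare the GC time with the wall-clock time. The sketch below assumes the heavy work can be placed between the two readings; the allocation loop is only a placeholder for that work. Starting the JVM with -verbose:gc is a quick alternative that logs every collection.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {
    // Sums collection counts and accumulated collection time (ms) across all collectors.
    static long[] totals() {
        long count = 0, timeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined for a collector.
            count += Math.max(0, gc.getCollectionCount());
            timeMs += Math.max(0, gc.getCollectionTime());
        }
        return new long[] {count, timeMs};
    }

    public static void main(String[] args) {
        long wallStart = System.nanoTime();
        long[] before = totals();
        // Placeholder for the real workload: allocate some short-lived garbage.
        long sink = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sink += new int[16].length;
        }
        long[] after = totals();
        long wallMs = (System.nanoTime() - wallStart) / 1_000_000;
        System.out.printf("GC runs: %d, GC time: %d ms, wall time: %d ms (sink=%d)%n",
                after[0] - before[0], after[1] - before[1], wallMs, sink);
    }
}
```

If the GC time is a large fraction of the wall time, collection is a likely suspect; if it is small, the cause is probably elsewhere.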
Does anyone know why it's using only 25% of each thread?
Also, I understand that 10k threads might be overwhelming for the machine to handle, but I found no information about it. Is there a best practice for how many threads to deploy?
This is another section of the program, which deploys 10000 threads, all of them heavily loaded. It takes 15 minutes to complete.
This is the code I am referring to; it deploys 5 times the available processors (so 180 threads in total). It takes around 10 minutes to complete. Nothing changes if I run 10000 threads instead.
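The original code isn't shown, so here is only a sketch of a common alternative to both "one thread per input" and "nr_threads threads with pre-split work": a fixed pool sized to availableProcessors() that receives one small Callable per input. Idle threads keep taking the next task from the pool's queue, so one expensive input does not leave the other cores idle at the end. heavyWork is a hypothetical stand-in for the real method. This addresses the load-balancing part of the question; it does not by itself explain the 15 to 25% load on the server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolSketch {
    // Hypothetical stand-in for the heavy per-input method; its cost varies with the input.
    static long heavyWork(int input) {
        long acc = 0;
        for (int i = 0; i < (input % 100 + 1) * 1_000; i++) {
            acc += (long) i * input % 7;
        }
        return acc;
    }

    static long runAll(int inputs) throws InterruptedException, ExecutionException {
        int nThreads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            // One small task per input: free threads pull the next task from the shared queue.
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int i = 0; i < inputs; i++) {
                final int input = i;
                tasks.add(() -> heavyWork(input));
            }
            long total = 0;
            for (Future<Long> f : pool.invokeAll(tasks)) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAll(10_000));
    }
}
```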

Related

How many threads can be created and executed on a single CPU core?

If I have a multithreaded program and a single dual-core CPU on the working machine, how many threads can I create to run in parallel on these 2 cores? In some articles I saw that a CPU core can handle only one thread. Does that mean I can create only 2 threads? Or can I create multiple threads, with only 2 of them executed by the cores at any one tick? I know this question is simple, but I'm a little confused.
Modern hardware and OS combinations can easily handle thousands of them. And they can all be in the 'running state'.
However, let's say you have 2000 threads, 2 CPUs in the box, and each CPU has 16 cores: then certainly at most 32 of your running threads are truly 'executing stuff'. The other ones are ready to go if any of the 32 currently executing threads does something that isn't CPU-dependent (for example, waiting for some bytes to flow in from the network, or bytes to be read from a disk), or simply as time passes (eventually the CPU+OS+JVM work together to pre-empt an executing thread, pausing it so that it doesn't hog the resources).
The bottleneck you're most likely to run into first is memory: each thread needs a stack, and stacks take memory. If, for example, you are working with 1 MB stacks, creating 2000 threads means you now have 2 GB worth of stack space alone, and the Java heap (where all objects live) probably also takes a GB or 2, so that's about 4 GB. On your average provisioned IaaS box you don't have that much.
A simple solution to that problem is that you can actually control the stack size when you create a new thread object. You can use this to make much smaller stacks, for example 64k stacks; now 2000 threads take only about an eighth of a GB. Of course, a 64k stack is not capable of running particularly deeply nested methods. Usually, if you're creating thousands of threads, the code they actually run should be relatively simple and should not create large stacks or make recursive calls.
If you mess this up, you'll get StackOverflowError. Then you know to either adjust the code or increase the stack sizes.
If you're not making thousands of threads (merely, say, hundreds), don't worry about it. Just make 'em and run 'em, and trust the OS+CPU+JVM to sort it out for you.
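Setting the stack size per thread goes through the four-argument Thread(ThreadGroup, Runnable, String, long stackSize) constructor. The sketch below starts a few thousand trivial threads with a requested 64 KB stack; as the Thread javadoc notes, the value is only a hint that the JVM or OS may round up or ignore on some platforms. The -Xss JVM option changes the default stack size for all threads instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class SmallStacks {
    static int runThreads(int count, long stackSizeBytes) throws InterruptedException {
        AtomicInteger done = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // The stack size is a request, not a guarantee; platforms may adjust or ignore it.
            Thread t = new Thread(null, done::incrementAndGet, "worker-" + i, stackSizeBytes);
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runThreads(2000, 64 * 1024)); // prints 2000
    }
}
```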

Executors.newFixedThreadPool() - how expensive is this operation

I have a requirement where I need to process some tasks for current live shows.
This is a scheduled task that runs every minute.
At any given minute, there can be any number of live shows (though the number cannot be that large, approx. max 10). There are more than 20 functionalities that need to be done for all the live shows; say there are 20 worker classes, each doing its own job.
Let's say for the first functionality there are 5 shows, then after a few minutes the shows drop to 2, then after a few more minutes they increase to 7.
Currently I am doing something like this:
int totalShowsCount = getCurrentShowsCount();
ExecutorService executor = Executors.newFixedThreadPool(showIds.size());
The above statements get executed every minute.
Problem Statement
1.) How expensive is the above operation, i.e. creating a fixed thread pool every minute?
2.) What can I do to optimize my solution? Should I use a fixed thread pool of, say, 10, with maybe 3, 5, 6 or any number of threads being utilized at any given minute?
Can I create a fixed thread pool at the worker level, maintain it and utilize that?
FYI, I'm using Java 8, in case a better approach is available there.
How expensive is the above operation, i.e. creating a fixed thread pool every minute?
Creating a thread pool is a relatively expensive operation which can take milliseconds. You don't want to be doing this many times per second.
A second is an eternity for a computer: if you have a 36-core machine, it can execute as many as 100 billion instructions in that time. A minute is a very, very long time, and if you only do something once a minute, you could even restart your JVM every minute and still get reasonable throughput.
What can I do to optimize my solution? Should I use a fixed thread pool of, say, 10, with maybe 3, 5, 6 or any number of threads being utilized at any given minute?
Possibly; it depends on what you are doing. Without more analysis, you couldn't say for sure. Note: if you are using parallelStream() (and if not, you should see whether you can), you can use the built-in ForkJoinPool.commonPool() and not need to create another pool. But again, this depends on what you are doing.
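For the "create the pool once at the worker level and reuse it" option, a minimal Java 8 sketch could look like this: the worker pool is created once (sized to the stated maximum of about 10 shows) and a single-threaded ScheduledExecutorService triggers the per-minute run, so no pool is built each minute. processShow and the Supplier of current show ids are hypothetical placeholders for the real worker logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class ShowScheduler {
    // Created once and reused every minute; 10 matches the "approx max 10" live shows.
    private final ExecutorService workers = Executors.newFixedThreadPool(10);
    private final ScheduledExecutorService ticker = Executors.newSingleThreadScheduledExecutor();

    // Hypothetical stand-in for one worker class's job on a single show.
    static String processShow(int showId) {
        return "processed show " + showId;
    }

    // Submits the current shows to the shared pool and waits for all of them.
    List<String> processAll(List<Integer> showIds) throws Exception {
        List<Future<String>> futures = new ArrayList<>();
        for (int id : showIds) {
            futures.add(workers.submit(() -> processShow(id)));
        }
        List<String> results = new ArrayList<>();
        for (Future<String> f : futures) {
            results.add(f.get());
        }
        return results;
    }

    void start(Supplier<List<Integer>> currentShows) {
        ticker.scheduleAtFixedRate(() -> {
            try {
                processAll(currentShows.get());
            } catch (Exception e) {
                // An exception escaping this Runnable would suppress all later runs.
                e.printStackTrace();
            }
        }, 0, 1, TimeUnit.MINUTES);
    }

    void stop() {
        ticker.shutdown();
        workers.shutdown();
    }
}
```

When only 2 shows are live, the other 8 pool threads simply stay idle, which costs very little.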

Program execution slows down the more threads I have running (Java)

I'm experiencing some strange behaviour in a Java program. Basically, I have a list of items to process, which I can choose to process one at a time, or all at once (which means 3-4 at a time). Each item needs about 10 threads to be processed, so processing 1 item at a time = 10 threads, 2 at a time = 20 threads, 4 at a time = 40 threads, etc.
Here's the strange thing: if I process just one item, it's done in approx. 50-150 ms. But if I process 2 at a time, it goes up to 200-300 ms per item; 3 at a time = 300-500 ms per item, 4 at a time = 400-700 ms per item, etc.
Why is this happening? I've done prior research which says that the JVM can handle up to 3000-4000 threads easily, so why does it slow down with just 30-40 threads for me? Is this normal behavior? I thought that having 40 threads would mean each thread would work in parallel rather than in a queue, as it seems to be doing.
How many CPU cores do you have?
If I have one CPU core and I max it out with a single-threaded application, the CPU is always busy. If I give it two threads, both doing this heavy task, I don't get double the CPU; they each get ~0.5 seconds per second of CPU time, minus the time the OS needs to switch threads.
So it doubles the time taken for each thread to do its work, but they might finish at about the same time (depending on the scheduler).
If you have two CPU cores, then (theoretically, again) both threads would finish in the same time as one thread alone, because one thread can't use two CPU cores at the same time.
Then there are hardware threads; some threads yield or sleep; if they're reading or writing, the OS will run other threads while they are blocked; and so forth.
Does this help?
It would be nice to see some source code.
Without it, I have only 4 assumptions:
1) You haven't done any load balancing. You should consider the optimal number of threads.
2) The work executed by each thread does not justify the time needed to set up and start the thread (plus context-switching time).
3) There are real problems with your code quality.
4) Weak hardware
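On the "optimal number of threads" point, one widely cited heuristic (from Goetz et al., Java Concurrency in Practice) is threads = cores * targetUtilization * (1 + waitTime / computeTime). The sketch below only evaluates that formula; the wait and compute times are inputs you would have to measure for your own tasks, and the result is a starting point for experiments rather than a guarantee.

```java
public class PoolSizing {
    // threads = cores * targetUtilization * (1 + waitTime / computeTime), at least 1.
    static int suggestedThreads(int cores, double targetUtilization, double waitTime, double computeTime) {
        if (cores < 1 || targetUtilization <= 0 || targetUtilization > 1
                || computeTime <= 0 || waitTime < 0) {
            throw new IllegalArgumentException("invalid sizing inputs");
        }
        return Math.max(1, (int) Math.round(cores * targetUtilization * (1 + waitTime / computeTime)));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Purely CPU-bound work (no waiting): roughly one thread per core.
        System.out.println(suggestedThreads(cores, 1.0, 0, 1));
        // Work that waits 9 ms (e.g. on I/O) for every 1 ms of CPU: about 10 threads per core.
        System.out.println(suggestedThreads(cores, 1.0, 9, 1));
    }
}
```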

Program Performance in java fluctuates with thread variation

I admit the title is a bit misleading, but I am somewhat confused about why this happens.
I've written a program in Java that takes an argument x and instantiates x threads to do the work of the program. The machine I'm running it on has 8 cores and can handle 32 threads in parallel (each core has 4 hyperthreads). When I run the program with more than 8 threads (e.g. 22), I notice that with an even number of threads the program runs faster than with 23 threads (which is actually slower). The performance difference between the two is about 10%. Why would this be? Thread overhead doesn't really account for this, and I feel that as long as I'm running fewer than 32 threads, it should only get faster as I increase the number of threads.
To give you an idea of what the program is doing: it takes a 1000 * 1000 array, and each thread is assigned a portion of that array to update (the leftovers from an uneven split are given to the last thread instantiated).
Is there any good reason for the odd/even thread performance difference?
Two reasons I can imagine:
The need to synchronize the memory access of your cores/threads. This will eventually invalidate CPU core caches and the like, which brings down performance. Try giving them really disjoint tasks; don't let them work on the same array. Remember that memory isn't managed in individual bytes: caches work on whole cache lines.
Hyperthreading CPUs often don't deliver full performance per hardware thread. They may, for example, have to share some floating-point units. This doesn't matter when, e.g., one thread is integer-math heavy and the other is float-heavy. But having four threads each needing the floating-point units probably means waiting, switching contexts, signalling the other thread, switching context back, waiting again...
These are just two guesses. For a better answer, you should have given the actual CPU you are using, the partitioning scheme you are using, and a more detailed hint about the computational task.
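Related to the partitioning scheme: giving all leftovers to the last thread makes that thread's block larger than the others, and the whole run waits for it. A sketch of an alternative split, assuming the array is partitioned by rows, spreads the remainder so block sizes differ by at most one row and each thread owns whole rows. This is not claimed to explain the odd/even difference; it only removes one source of imbalance.

```java
public class RowPartition {
    // Returns {start, endExclusive} for worker `index`; the first `rows % threads` workers get one extra row.
    static int[] range(int rows, int threads, int index) {
        int base = rows / threads;
        int extra = rows % threads;
        int start = index * base + Math.min(index, extra);
        int size = base + (index < extra ? 1 : 0);
        return new int[] {start, start + size};
    }

    public static void main(String[] args) throws InterruptedException {
        int n = 1000, threads = 23;
        double[][] grid = new double[n][n];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            int[] r = range(n, threads, t);
            // Each worker owns whole rows, so no two threads write to the same row array.
            workers[t] = new Thread(() -> {
                for (int i = r[0]; i < r[1]; i++) {
                    for (int j = 0; j < n; j++) {
                        grid[i][j] = i + j;
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        System.out.println(grid[n - 1][n - 1]); // prints 1998.0
    }
}
```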

Time required to finish a multithreaded program?

A Java process starts 5 threads, and each thread takes 5 minutes. What will be the minimum and maximum time taken by the process? It would be a great help if someone could explain this in terms of both Java threads and OS threads.
Edit: I want to know how Java schedules threads at the OS level.
This depends on the number of logical processor cores you have, the processes already running, and the priority of the threads. The theoretical minimum would be 5 minutes plus the small overhead of starting and controlling threads, if you have at least five logical processor cores. The theoretical maximum would be 25 minutes plus the small overhead, if you have only one logical processor core available. The mentioned overhead is usually not more than a few milliseconds.
The theoretical maximum can, however, be unpredictably (much) higher if, at the same time, many other threads with a higher priority are running in processes other than the JVM.
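Under idealized assumptions (n independent, equally long, CPU-bound threads of t minutes each, c available logical cores, perfectly fair time-slicing, no competing higher-priority work, and ε the small start/scheduling overhead), the estimate above can be written as:

```latex
T \approx t \cdot \max\left(1, \frac{n}{c}\right) + \varepsilon
\qquad\text{so with } n = 5,\ t = 5\,\text{min:}\qquad
T_{c \ge 5} \approx 5\,\text{min},\quad
T_{c = 2} \approx 12.5\,\text{min},\quad
T_{c = 1} \approx 25\,\text{min}
```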
Edit: I want to know how Java schedules threads at the OS level.
The JVM just spawns another native thread, which is assigned to the process associated with the JVM itself.
Minimum time, 5 minutes, assuming that threads run entirely concurrently with no interdependencies and have a dedicated core available. Maximum time, 25 minutes, assuming that each thread has to have exclusive use of some global resource and so can't run in parallel with any other thread.
A glib (but realistic) answer for the maximum is that they might take an infinite amount of time to complete, as multi-threaded programs often contain deadlock bugs.
It depends! There isn't enough information to quantify this.
Missing info: Hardware: how many threads can run at the same time on your CPU? Workload: does it take 5 minutes because it's doing something for 5 minutes, or is it performing some calculation that usually takes about 5 minutes and uses a lot of CPU resources?
When you run multiple threads concurrently, there can be lock waits for resources, or the threads may even have to take turns executing; although they have been running for 5 minutes, they may only have had a few CPU seconds.
5 threads never equals 5X output. It can get close, but it will never reach 5X.
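A standard way to model why the speedup stays below 5X is Amdahl's law, where p is the fraction of the work that can run in parallel and n is the number of threads. It ignores the context-switch and lock-wait costs mentioned above, so it is an optimistic bound. For example, if 95% of the work parallelizes perfectly:

```latex
S(n) = \frac{1}{(1 - p) + \frac{p}{n}}
\qquad\Rightarrow\qquad
S(5)\big|_{p = 0.95} = \frac{1}{0.05 + 0.19} = \frac{1}{0.24} \approx 4.17
```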
I am not sure whether you are looking for the CPU time spent by the thread. If that is the case, you can measure the CPU time as follows:
ThreadMXBean tb = ManagementFactory.getThreadMXBean();
long startTime = tb.getCurrentThreadCpuTime();
Call the above at the start of the thread's work, from inside the thread itself (getCurrentThreadCpuTime() measures the calling thread), and call the following when that work finishes:
long endTime = tb.getCurrentThreadCpuTime();
The difference endTime - startTime is the CPU time, in nanoseconds, that the thread used.
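A self-contained sketch of the same idea, comparing the thread's CPU time with its wall-clock time. The sleep stands in for a thread that is "running" but waiting, which illustrates the point above that a thread can be alive for a long time while using little CPU. The code checks that CPU-time measurement is supported and enabled before relying on it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuTimeDemo {
    static final ThreadMXBean TB = ManagementFactory.getThreadMXBean();

    // Runs `work` on the calling thread and returns {cpuNanos, wallNanos}.
    static long[] measure(Runnable work) {
        // getCurrentThreadCpuTime() measures the calling thread, in nanoseconds.
        long cpuStart = TB.getCurrentThreadCpuTime();
        long wallStart = System.nanoTime();
        work.run();
        long wall = System.nanoTime() - wallStart;
        long cpu = TB.getCurrentThreadCpuTime() - cpuStart;
        return new long[] {cpu, wall};
    }

    public static void main(String[] args) throws InterruptedException {
        if (!TB.isCurrentThreadCpuTimeSupported() || !TB.isThreadCpuTimeEnabled()) {
            System.out.println("CPU time measurement not available on this JVM");
            return;
        }
        Thread t = new Thread(() -> {
            long[] r = measure(() -> {
                try {
                    Thread.sleep(200); // waiting: wall time passes, almost no CPU time is used
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            System.out.printf("cpu=%d ms, wall=%d ms%n", r[0] / 1_000_000, r[1] / 1_000_000);
        });
        t.start();
        t.join();
    }
}
```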
