Program execution slows down the more threads I have running (Java)


I'm seeing some strange behaviour in a Java program. I have a list of items to process, which I can process either one at a time or all at once (which in practice means 3-4 at a time). Each item needs about 10 threads, so 1 item at a time = 10 threads, 2 at a time = 20 threads, 4 at a time = 40 threads, etc.
Here's the strange thing: if I process just one item, it's done in approx 50-150 ms. But if I process 2 at a time, it goes up to 200-300 ms per item; 3 at a time = 300-500 ms per item, 4 at a time = 400-700 ms per item, etc.
Why is this happening? My prior research says the JVM can handle up to 3000-4000 threads easily, so why does it slow down with just 30-40 threads for me? Is this normal behaviour? I thought that with 40 threads, each thread would run in parallel rather than in a queue, as seems to be happening.

How many CPU cores do you have?
If I have one CPU core and max it out with a single-threaded application, the CPU is always busy. If I give it two threads that both do this heavy task, I don't get double the CPU; each thread gets about 0.5 seconds of CPU time per second, minus the time the OS needs to switch between threads.
So each thread takes twice as long to do its work, but they might finish at about the same time (depending on the scheduler).
If you have two CPU cores, then the two threads would (theoretically again) finish in the same time as one thread would, because one thread can't use two CPU cores at the same time.
Then there are hardware threads to consider; some threads yield or sleep, and if threads are reading or writing, the OS will run other threads while they are blocked, and so forth.
Does this help?
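To see this effect on your own machine, here is a minimal sketch (not the asker's code) that starts a given number of CPU-bound threads at the same moment and reports the average wall-clock time each one needed. The class name, the `burn` workload, and the iteration counts are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

final class OversubscriptionDemo {

    // Written by every worker so the JIT cannot discard the busy loop as dead code.
    static volatile long sink;

    // Hypothetical CPU-bound workload: pure arithmetic, no I/O, no locks.
    static long burn(long iterations) {
        long acc = 0;
        for (long i = 0; i < iterations; i++) {
            acc += (i * 31) ^ (acc >>> 3);
        }
        return acc;
    }

    // Releases `threads` workers together, each running burn(iterations), and
    // returns the average wall-clock milliseconds a single worker took.
    static double averageMillisPerThread(int threads, long iterations) {
        long[] elapsedNanos = new long[threads];
        CountDownLatch start = new CountDownLatch(1);
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            final int slot = i;
            Thread t = new Thread(() -> {
                try {
                    start.await(); // wait until all workers exist
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                long begin = System.nanoTime();
                sink = burn(iterations);
                elapsedNanos[slot] = System.nanoTime() - begin;
            });
            workers.add(t);
            t.start();
        }
        start.countDown();
        long total = 0;
        for (int i = 0; i < threads; i++) {
            try {
                workers.get(i).join(); // join makes the worker's write visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            total += elapsedNanos[i];
        }
        return total / 1_000_000.0 / threads;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        for (int threads : new int[] {1, cores, 2 * cores, 4 * cores}) {
            System.out.printf("%d threads on %d cores: %.1f ms per thread%n",
                    threads, cores, averageMillisPerThread(threads, 50_000_000L));
        }
    }
}
```

If the work is CPU-bound, the per-thread time should stay roughly flat up to the number of cores and then grow roughly in proportion to the number of threads, which is the pattern described in the question.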

It would be nice to see some source code.
Without it I can only offer 4 guesses:
1) You haven't done any load balancing. You should work out the optimal number of threads.
2) The work done by each thread does not justify the time needed to set up and start the thread (plus context-switching time).
3) There are real problems with your code quality.
4) Weak hardware.

Related

Context switching in for loop

I have the following Java code. Will any "context switching" occur during its execution?
Collection<MyBusinessClass> myCollection = getMyCollection(); // has 1000 items
for (MyBusinessClass item : myCollection) {
    // starts one new thread per item
    new Thread(() -> {
        MyLongRunningTask();
    }).start();
}
Thanks.

Unless you have enough cores to host all 1000 threads (plus the main thread, briefly) and the few threads the JVM itself needs, such as GC and the finalizer, your threads will have to share cores, and hence context switches will occur. I am assuming here that MyLongRunningTask actually runs long enough for all of the threads to still be active by the time the last ones are spawned; otherwise the required number of cores is probably a bit lower.
We could concoct a scenario in which the scheduler runs ALL tasks sequentially, never overlapping, by having very short tasks (or a fairly crazy scheduler), so that you could get away with a small number of CPU cores. But that seems beside the point.
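If the goal is to keep the number of runnable threads close to the number of cores, one common alternative to a thread per item is a fixed-size pool. The sketch below is an illustration, not part of the original question: `BoundedPoolExample` and `processAll` are hypothetical names, and the one-hour timeout is an arbitrary choice.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

final class BoundedPoolExample {

    // Runs `task` once per item on a pool with one thread per available core,
    // waits for the pool to drain, and returns how many tasks completed.
    static <T> int processAll(List<T> items, Consumer<T> task) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        AtomicInteger completed = new AtomicInteger();
        for (T item : items) {
            // Items beyond `cores` wait in the pool's queue instead of
            // each getting a thread of their own.
            pool.execute(() -> {
                task.accept(item);
                completed.incrementAndGet();
            });
        }
        pool.shutdown(); // accept no new tasks, finish the queued ones
        try {
            if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }
}
```

Context switches still happen (the OS runs other processes too), but the 1000 tasks no longer compete with each other for cores.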

Java multithreading doesn't use the threads at 100%

I have a program that has to run a specific, fairly heavy method 10k times, each time with different input.
I tried threading it in the natural way (one thread per input), and I also tried using nr_threads = Runtime.getRuntime().availableProcessors() threads, or 2-3 times that amount. (The method's complexity varies with the input, so I found that if I deploy exactly nr_threads threads, one is usually still alive when all the other threads are dead.)
When I run it locally on my computer (4 physical cores, 8 counting the virtual ones) it runs at 100%, but every time I run it on a server (Amazon instances, if that matters) with 36 or 72 cores, the average load per thread is between 15 and 25%.
I use a Callable class for the multithreading, and from it I call only static methods. I also update a matrix, but I'm sure that no two threads try to access the same cell, so it shouldn't be a concurrency issue. RAM usage is also fairly safe (40 GB out of 60), so I would rule out intense GC activity. But I have no idea how to test GC activity.
Does anyone know why it's using only 25% of each thread?
Also, I get that 10k threads might be a bit overwhelming for the computer to handle, but I found no info about it. Is there a best practice for the number of threads to deploy?
This is another section of the program that deploys 10000 threads, all of them heavily loaded. It takes 15 minutes to complete.
This is the code I am referring to; it deploys 5 times the available processors (so 180 threads in total). It takes around 10 minutes to complete. Nothing changes if I run 10000 threads instead.
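On the "how to test GC activity" part: one option is the standard `GarbageCollectorMXBean` interface, which reports accumulated collection counts and times. The following is a rough sketch only; `GcActivity` and its method names are hypothetical, and the reported time is the JVM's own approximation, so treat the fraction as an indicator rather than an exact figure.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcActivity {

    // Total accumulated GC time in milliseconds across all collectors.
    // A collector reports -1 when the value is undefined; those are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Total number of collections across all collectors (-1 values skipped).
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Runs `work` and returns reported GC time divided by elapsed wall-clock time.
    static double gcFraction(Runnable work) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        work.run();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        long gcMillis = totalGcMillis() - gcBefore;
        return elapsedMillis == 0 ? 0.0 : (double) gcMillis / elapsedMillis;
    }
}
```

Wrapping the whole 10k-task run in `gcFraction(...)` gives a first impression of whether collection time is significant. Running the JVM with `-verbose:gc` is another way to get the same kind of information as a log.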

Executors.newFixedThreadPool() - how expensive is this operation

I have a requirement to process some tasks for the currently live shows.
This is a scheduled task that runs every minute.
At any given minute there can be any number of live shows (though the number cannot be that large, approx. 10 at most). More than 20 functionalities need to be run for all the live shows; say there are 20 worker classes, each doing its own job.
Let's say that for the first functionality there are 5 shows, then after a few minutes the number of shows drops to 2, then after a few more minutes it rises to 7.
Currently I am doing something like this:
int totalShowsCount = getCurrentShowsCount();
ExecutorService executor = Executors.newFixedThreadPool(showIds.size());
The above statements get executed every minute.
Problem Statement
1.) How expensive is the above operation, i.e. creating a fixed thread pool every minute?
2.) What can I do to optimize my solution? Should I use a fixed thread pool of, say, 10 threads, with maybe 3 or 5 or 6 of them in use at any given minute?
Can I create a fixed thread pool at the worker level, keep it around, and reuse it?
FYI, I'm using Java 8, in case a better approach is available.

How expensive is the above operation, i.e. creating a fixed thread pool every minute?
Creating a thread pool is a relatively expensive operation which can take milliseconds. You don't want to be doing this many times per second.
However, a second is an eternity for a computer; a 36-core machine can execute as many as 100 billion instructions in that amount of time. A minute is a very, very long time, and if you only do something once a minute you could even restart your JVM every minute and still get reasonable throughput.
What can I do to optimize my solution? Should I use a fixed thread pool of, say, 10 threads, with maybe 3 or 5 or 6 of them in use at any given minute?
Possibly; it depends on what you are doing. Without more analysis you couldn't say for sure. Note: if you are using parallelStream() (and if not, you should see whether you can), you can use the built-in ForkJoinPool.commonPool() and not need to create another pool. But again, this depends on what you are doing.
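For the "create the pool once at worker level and reuse it" idea from the question, a minimal sketch could look like the following. Everything here is illustrative: `ShowWorker`, `handleShow`, and the pool size of 10 (taken from the "approx max 10" figure in the question) are assumptions, not the asker's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ShowWorker {

    // Assumed upper bound on simultaneously live shows.
    private static final int MAX_SHOWS = 10;

    // Created once; every scheduled run reuses the same threads.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(MAX_SHOWS);

    // Placeholder for one of the per-show functionalities.
    static String handleShow(int showId) {
        return "processed-" + showId;
    }

    // Called by the scheduler every minute with the currently live show ids.
    // Results come back in the same order as the ids.
    static List<String> processLiveShows(List<Integer> showIds) {
        List<Callable<String>> tasks = new ArrayList<>();
        for (int id : showIds) {
            tasks.add(() -> handleShow(id));
        }
        List<String> results = new ArrayList<>();
        try {
            for (Future<String> future : POOL.invokeAll(tasks)) {
                results.add(future.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        return results;
    }

    // Call once when the application stops; idle pool threads otherwise keep the JVM alive.
    static void shutdown() {
        POOL.shutdown();
    }
}
```

With 2 live shows only 2 of the 10 threads do any work in that minute; the idle ones cost little beyond their stack memory.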

Optimising max number of threads running on a CPU

Just wondering what the best way is to decide when to stop creating new threads on a single-core machine that runs the same program multiple times, each run as a thread?
The threads fetch web content and do a bit of processing, which means the load of each thread is not constant over the thread's lifetime.
I'm thinking of having a thread that monitors the CPU/RAM load and stops creating threads if the load reaches a certain threshold, and that also stops creating threads once a certain thread count has been reached, to make sure the CPU doesn't get overloaded.
Any feedback on what techniques are out there to achieve this?
Many thanks,
Vladimir

It is going to be difficult to do this by monitoring the CPU used by the current process. Those numbers tend to lag reality, and the result is going to be large peaks and valleys. The problem is that your threads are mostly going to be blocked on IO, and there is no good way to anticipate when bytes will become available to read.
That said, you could start out with a ThreadPoolExecutor at a certain max thread number (for a single processor, let's say 4) and then check the load average every 10 seconds or so. If the load average is below what you want, you could call setMaximumPoolSize(...) with a larger value to increase it for the next 10 seconds. You may need to wait 30 seconds or more between calculations to smooth out the performance of your application.
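As a rough sketch of that adjustment step (with hypothetical names `ResizablePool`, `resize`, and `nextSize`): note that with an unbounded work queue a ThreadPoolExecutor never grows past its core size, so this version moves the core and maximum sizes together. That pairing is an assumption of the sketch, not something stated in the answer.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class ResizablePool {

    // Pool with core == max and an unbounded queue; idle threads above the
    // core size time out after 30 seconds.
    static ThreadPoolExecutor create(int threads) {
        return new ThreadPoolExecutor(threads, threads, 30, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>());
    }

    // Order matters: the setters reject a core size above the maximum, so
    // raise the maximum first when growing and lower the core first when shrinking.
    static void resize(ThreadPoolExecutor pool, int threads) {
        if (threads > pool.getMaximumPoolSize()) {
            pool.setMaximumPoolSize(threads);
            pool.setCorePoolSize(threads);
        } else {
            pool.setCorePoolSize(threads);
            pool.setMaximumPoolSize(threads);
        }
    }

    // One tuning step: add a thread while the load is below target, remove one
    // otherwise, clamped to [min, max]. A negative load average means the
    // platform could not provide one, so the size is left unchanged.
    static int nextSize(int current, double loadAverage, double targetLoad, int min, int max) {
        if (loadAverage < 0) {
            return current;
        }
        int next = loadAverage < targetLoad ? current + 1 : current - 1;
        return Math.max(min, Math.min(max, next));
    }
}
```

A periodic task could read `ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage()` and pass it to `nextSize`, then call `resize` with the result.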
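You could use the following code to track the total CPU time for all threads. I'm not sure it's the best way to do it:
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
ThreadMXBean threadMxBean = ManagementFactory.getThreadMXBean();
long total = 0;
for (long id : threadMxBean.getAllThreadIds()) {
    // returns -1 if the thread is no longer alive or CPU time measurement is disabled
    long cpuTime = threadMxBean.getThreadCpuTime(id);
    if (cpuTime > 0) {
        total += cpuTime;
    }
}
// CPU time is reported in nanoseconds
long currentCpuMillis = total / 1000000;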
Instead of trying to maximize the CPU level for your spider, you might consider trying to maximize throughput. Sample the number of pages spidered per unit of time, and increase or decrease the max number of threads in your ExecutorService until this is maximized.
One thing to consider is to use NIO and selectors so your threads are always busy as opposed to always waiting for IO. Here's a good example tutorial about NIO/Selectors. You might also consider using Pyronet which seems to provide some good features around NIO.

If async I/O is not a good fit, I would consider using thread pools, e.g. ThreadPoolExecutor, so you don't have the overhead of creating, destroying and recreating threads.
Then I would do performance testing to find the max number of threads that offers the best performance.
You could start with 10 threads, then rerun your performance test with 20 threads, and so on, until you home in on an optimal value. At the same time I would use system tools (depending on your OS) to monitor the thread run queue, JVM, etc.
For the performance test you would have to ensure that your test is repeatable (i.e. using the same inputs) and representative of the actual input that your program would be using.
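A sweep like that can be automated. The sketch below is only an outline under simplifying assumptions: `PoolSizeSweep` is a hypothetical name, the same `Runnable` is reused as a stand-in for a repeatable workload, and a real test would add warm-up runs and repeat each measurement.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class PoolSizeSweep {

    // Runs `taskCount` copies of `task` on a pool of `threads` threads and
    // returns the measured throughput in tasks per second.
    static double throughput(int threads, int taskCount, Runnable task) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(task);
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        double seconds = Math.max(1, System.nanoTime() - start) / 1e9;
        return taskCount / seconds;
    }

    // Measures the same workload once per candidate pool size, in the given order.
    static Map<Integer, Double> sweep(int[] sizes, int taskCount, Runnable task) {
        Map<Integer, Double> result = new LinkedHashMap<>();
        for (int size : sizes) {
            result.put(size, throughput(size, taskCount, task));
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for an I/O-bound page fetch: mostly waiting, little CPU.
        Runnable fakeFetch = () -> {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        sweep(new int[] {10, 20, 40, 80}, 400, fakeFetch)
                .forEach((size, rate) -> System.out.printf("%d threads: %.0f tasks/s%n", size, rate));
    }
}
```

The pool size after which throughput stops improving is a reasonable starting point for the max thread count.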

time required to finish the multithreaded program?

A Java process starts 5 threads, and each thread takes 5 minutes. What will be the minimum and maximum time taken by the process? It would be a great help if someone could explain this in terms of Java threads and OS threads.
Edit: I want to know how Java schedules threads at the OS level.

This depends on the number of logical processor cores you have, the processes already running, and the priority of the threads. The theoretical minimum would be 5 minutes plus the small overhead of starting and controlling the threads, if you have at least five logical processor cores. The theoretical maximum would be 25 minutes plus that small overhead, if you have only one logical processor core available. The overhead is usually not more than a few milliseconds.
The theoretical maximum can, however, be unpredictably (much) higher if a lot of other threads with a higher priority, from processes other than the JVM, are running at the same time.
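The two bounds above follow from simple arithmetic, sketched here under idealized assumptions: the threads are purely CPU-bound, the scheduler shares cores fairly, nothing else is running, and overhead is ignored. `WallTimeEstimate` and `idealMinutes` are hypothetical names.

```java
final class WallTimeEstimate {

    // Ideal wall-clock minutes for `threads` CPU-bound threads that each need
    // `minutesPerThread` of CPU time on `cores` logical cores. At most
    // min(threads, cores) threads make progress at any instant, so the total
    // CPU demand is divided by that number.
    static double idealMinutes(int threads, double minutesPerThread, int cores) {
        if (threads <= 0 || cores <= 0) {
            throw new IllegalArgumentException("threads and cores must be positive");
        }
        int running = Math.min(threads, cores);
        return threads * minutesPerThread / running;
    }

    public static void main(String[] args) {
        System.out.println(idealMinutes(5, 5.0, 5)); // five cores
        System.out.println(idealMinutes(5, 5.0, 1)); // one core
        System.out.println(idealMinutes(5, 5.0, 2)); // two cores
    }
}
```

With five or more cores the estimate is 5 minutes, with one core it is 25 minutes, and with two cores it lands in between at 12.5 minutes.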
Edit: I want to know how Java schedules threads at the OS level.
The JVM just spawns another native thread, which gets assigned to the process associated with the JVM itself.

Minimum time, 5 minutes, assuming that threads run entirely concurrently with no interdependencies and have a dedicated core available. Maximum time, 25 minutes, assuming that each thread has to have exclusive use of some global resource and so can't run in parallel with any other thread.

A glib (but realistic) answer for the maximum is that they might take an infinite amount of time to complete, as multi-threaded programs often contain deadlock bugs.

It depends! There isn't enough information to quantify this.
Missing info: Hardware - how many threads can run at the same time on your CPU? Workload - does it take 5 minutes because it's doing something for 5 minutes, or is it performing some calculation that usually takes about 5 minutes and uses a lot of CPU resources?
When you run multiple threads concurrently, there can be lock waits for resources, or the threads may even have to take turns executing, so although they have been running for 5 minutes they may only have had a few seconds of CPU time.
5 threads never equals 5X output. It can get close but will never reach 5X.

I am not sure whether you are looking for the CPU time spent by the thread. If so, you can measure the CPU time as follows:
ThreadMXBean tb = ManagementFactory.getThreadMXBean();
long startTime = tb.getCurrentThreadCpuTime();
Call the above from inside the thread when it starts its work, and the following when it finishes:
long endTime = tb.getCurrentThreadCpuTime();
The difference endTime - startTime is the CPU time (in nanoseconds) that the thread used.
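Put together as a self-contained helper, the measurement might look like this. It is a sketch with hypothetical names (`CpuTimer`, `cpuNanos`); the support checks are there because CPU time measurement is optional for a JVM and can be switched off.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

final class CpuTimer {

    // Returns the CPU nanoseconds the calling thread consumed while running
    // `work`, or -1 if this JVM does not support the measurement or has it disabled.
    static long cpuNanos(Runnable work) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported() || !bean.isThreadCpuTimeEnabled()) {
            return -1;
        }
        long start = bean.getCurrentThreadCpuTime();
        work.run(); // must run on this same thread for the difference to be meaningful
        return bean.getCurrentThreadCpuTime() - start;
    }
}
```

Comparing this value with the elapsed wall-clock time from `System.nanoTime()` shows how much of the "5 minutes" a thread actually spent on a core rather than waiting.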
