I have a Java application which runs threads, each of which performs some job N times; the number of threads is configurable.
Each job iteration takes from 20 seconds to 1-1.5 minutes, and each thread has to do about 25,000-100,000 iterations of its job.
The result is that jobs added earlier have more iterations done and seem to get higher priority (that is how it looks from the JVM's behavior; I don't set any priorities, and programmatically they are all equal). But I need the threads to make even progress across all jobs after new tasks are added.
For example, there are 5000 threads doing 100 jobs of 100,000 iterations each:
Old job #1 doing it
Old job #2 doing it
...
Old job #100 doing it
But when I add, for example, job #101, the threads don't pick it up as quickly as the earlier jobs.
I've tried yield() and sleep(50), but the results don't seem good.
So can you please tell me what I'm doing wrong, and how to get good performance with this many threads?
Threads are scheduled by the OS scheduler, and you can't expect them to execute in a fixed order like this; you can only expect each thread to get some allocated time from the scheduler. If the tasks are independent of each other, you shouldn't care about the order anyway. If they're not independent, you should make them cooperate to ensure everything is executed in the appropriate order.
Having 5000 threads is probably far too many. How many processors does your machine have? What are the tasks executed by the threads? If they're CPU-bound, your best bet is to have a number of threads equal to the number of processors, or to the number of processors + 1.
It's hard to tell you what you are doing wrong (if anything) because you didn't tell us anything about how you exactly implemented this.
You might want to use an ExecutorService (thread pool) that can automatically distribute jobs for you over a number of threads. See Executor Interfaces from Oracle's Java Tutorials.
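For illustration, here is a minimal sketch of that approach (the class name, counts, and doOneIteration are hypothetical stand-ins, not from the question). Submitting each iteration as its own small task, in rounds, is what keeps progress even across old and newly added jobs:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class JobRunner {
    public static void main(String[] args) throws InterruptedException {
        // Size the pool to the hardware, not to the number of jobs.
        int poolSize = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);

        // One iteration of every job per round: a newly added job simply
        // joins the rounds instead of waiting behind older jobs' threads.
        for (int i = 0; i < 1000; i++) {                 // iterations per job
            for (int jobId = 1; jobId <= 101; jobId++) {
                final int id = jobId;
                pool.submit(() -> doOneIteration(id));
            }
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.DAYS);
    }

    private static void doOneIteration(int jobId) {
        // placeholder for the 20 s to 1.5 min of real work
    }
}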
You didn't tell us anything about what the program is doing, but maybe the Java 7 Fork/Join framework might be useful, if the problem is suited for that.
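If the work does split recursively, a hedged sketch of the Fork/Join style might look like this (the range size and splitting threshold are arbitrary assumptions):

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class IterationRange extends RecursiveAction {
    private final int from, to;
    IterationRange(int from, int to) { this.from = from; this.to = to; }

    @Override
    protected void compute() {
        if (to - from <= 100) {            // small enough: run directly
            for (int i = from; i < to; i++) { /* one job iteration */ }
        } else {                           // otherwise split in half and recurse
            int mid = (from + to) / 2;
            invokeAll(new IterationRange(from, mid), new IterationRange(mid, to));
        }
    }
}
// Usage: new ForkJoinPool().invoke(new IterationRange(0, 100_000));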
Related
I am working on a micro-service which has the following flow of operations:
A request comes in to execute some number of tasks. After some pre-processing I know that I need to execute, say, 10 tasks. The tasks are independent of each other, so they can be executed in parallel. Each task has some processing steps and some external API calls, and after all tasks are completed the combined results need to be returned.
This is one request; obviously this micro-service can receive many such requests in parallel too.
The API calls are the most time-consuming operations here, and the other work takes comparatively little time. So I want to design this so that as many tasks as possible execute in parallel, because the tasks would mostly be blocked on API calls anyway.
A simple solution I see is a thread pool via ExecutorService, but it doesn't seem ideal: say I create a thread pool of 32 threads and I get 60 tasks. Then only 32 execute at a time, even though those 32 are blocked on API calls and barely using any CPU time.
Is this possible to achieve without breaking the task as a single unit ?
The optimal number of threads depends on the number of cores the server has and the time the I/O workload takes. See http://baddotrobot.com/blog/2013/06/01/optimum-number-of-threads/ to calculate that.
In short it states: threads = number of cores * (1 + wait time / service time)
The timings have to come from your own observations and measurements.
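For a hypothetical illustration: with 8 cores and tasks that wait 900 ms on I/O for every 100 ms of CPU work, the formula suggests roughly 8 * (1 + 900/100) = 80 threads. Treat such a number as a starting point for measurement, not a final answer.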
For the rest, you can use CompletableFuture as mentioned in the comments, or you can use the Executors class: Executors.newFixedThreadPool(<num of threads>);
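A minimal sketch combining the two (the pool size comes from the formula above; callApi and the class name are placeholders for your own code):

import java.util.List;
import java.util.concurrent.*;
import java.util.stream.Collectors;

class TaskFanOut {
    private final ExecutorService pool = Executors.newFixedThreadPool(80);

    List<String> runAll(List<String> tasks) {
        // Fan every task out to the pool; each future completes independently.
        List<CompletableFuture<String>> futures = tasks.stream()
                .map(t -> CompletableFuture.supplyAsync(() -> callApi(t), pool))
                .collect(Collectors.toList());
        // join() blocks until each task is done; then the results are combined.
        return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }

    private String callApi(String task) { return task; } // stand-in for the external call
}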
You will have to do some benchmarking to figure out what is optimal for your setup. You might want to look into ThreadPoolExecutor, which can scale the number of threads up and down between a lower and an upper bound depending on the workload. The two parameters to adjust in your benchmarks are corePoolSize and maximumPoolSize.
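A sketch of such a pool (the bounds are illustrative and need benchmarking; the zero-capacity SynchronousQueue forces the pool to grow rather than queue):

import java.util.concurrent.*;

// Grows from 8 core threads up to 64 under load; threads above the core
// size die after 30 s idle. With a SynchronousQueue, a task is rejected
// rather than queued if all 64 threads are busy.
ExecutorService executor = new ThreadPoolExecutor(
        8, 64,                        // corePoolSize, maximumPoolSize
        30L, TimeUnit.SECONDS,        // keep-alive for the extra threads
        new SynchronousQueue<>());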
One way to deal with this is to ensure that the thread pool always contains at least n threads in the runnable state (where n is usually the number of CPU cores). This implies managing the blocking yourself: whenever a thread starts blocking, add a thread to the pool, and remove it again once the blocking call returns.
Java's ForkJoinPool.ManagedBlocker is part of a solution for a similar problem when working with parallel streams.
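As a hedged sketch of how that mechanism is used (the blocking call here is a placeholder):

import java.util.concurrent.ForkJoinPool;

// Wraps a blocking call so the ForkJoinPool can compensate by activating
// a spare thread while this one waits.
class ApiCallBlocker implements ForkJoinPool.ManagedBlocker {
    private String result;

    @Override
    public boolean block() {
        result = callExternalApi();   // placeholder for the blocking I/O call
        return true;                  // true = no further blocking is necessary
    }

    @Override
    public boolean isReleasable() {
        return result != null;        // already done, no need to block at all
    }

    String result() { return result; }

    private String callExternalApi() { return "response"; }
}
// Usage inside a ForkJoinPool task:
//   ApiCallBlocker b = new ApiCallBlocker();
//   ForkJoinPool.managedBlock(b);   // may start a compensation thread
//   String response = b.result();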
Scala generalises and simplifies this aspect through the ExecutionContext used when working with futures.
The ThreadPoolExecutor has control parameters for this (core pool size (< 32), maximum pool size (60)); allowing 60 threads on 32 cores works well when 28 of the active threads are blocked.
The setup you describe would often use a task queue, but you asked for the strategy that maximizes CPU utilization. With microservices, though, aspects other than core count play a role too.
I have used ThreadPoolExecutor for years, and one of the main reasons is that it is designed to process many requests faster, thanks to parallelism and 'ready-to-go' threads (there are other reasons too).
Now I'm stuck on a detail of its inner design that I thought I knew well.
Here is a snippet from the Java 8 ThreadPoolExecutor:
public void execute(Runnable command) {
    ...
    /*
     * Proceed in 3 steps:
     *
     * 1. If fewer than corePoolSize threads are running, try to
     * start a new thread with the given command as its first
     * task. The call to addWorker atomically checks runState and
     * workerCount, and so prevents false alarms that would add
     * threads when it shouldn't, by returning false.
     */
    ...
    int c = ctl.get();
    if (workerCountOf(c) < corePoolSize) {
        if (addWorker(command, true))
            return;
        c = ctl.get();
    }
    ...
I'm interested in this very first step because in most cases you do not want the thread pool executor to store 'unprocessed requests' in its internal queue; it is better to leave them in an external input Kafka topic, JMS queue, etc. So I usually design my performance/parallelism-oriented executors with zero internal queue capacity and a 'caller runs' rejection policy. You choose some sanely large number of parallel threads, plus a core pool timeout so that the large value doesn't scare anyone ;). Since I don't use the internal queue and I want tasks to start being processed as early as possible, it has effectively become a 'fixed thread pool executor', and in most cases execution lands in this 'first step' of the method's logic.
Here is the question: is it really the case that the pool will not 'reuse' existing threads but will create a new one each time it is 'under core size' (which for me is most cases)? Would it not be better to add a new core thread only if all the others are busy, rather than risk stalling for a while on thread creation? Am I missing anything?
The doc describes the relationship between the corePoolSize, maxPoolSize, and the task queue, and what happens when a task is submitted.
...but will create a new one [thread] each time it is 'under core size'...
Yes. From the doc:
When a new task is submitted in method execute(Runnable), and fewer
than corePoolSize threads are running, a new thread is created to
handle the request, even if other worker threads are idle.
Would it not be better to add a new core thread only if all the others are busy...
Since you don't want to use the internal queue, this seems reasonable. So set corePoolSize and maxPoolSize to be the same; once the ramp-up of creating the threads is complete, there won't be any more thread creation.
However, using CallerRunsPolicy would seem to hurt performance if the external queue grows faster than can be processed.
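A sketch of that configuration (the pool size of 64 is illustrative):

import java.util.concurrent.*;

// corePoolSize == maximumPoolSize: all threads are created during ramp-up
// and reused afterwards. The zero-capacity queue never holds tasks, and
// CallerRunsPolicy pushes back on the submitter when every thread is busy.
ThreadPoolExecutor executor = new ThreadPoolExecutor(
        64, 64, 0L, TimeUnit.MILLISECONDS,
        new SynchronousQueue<>(),
        new ThreadPoolExecutor.CallerRunsPolicy());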
Here is the question: is it really the case that the pool will not 'reuse' existing threads but will create a new one each time it is 'under core size' (most cases)?
Yes that is how the code is documented and written.
Am I missing anything?
Yes, I think you are missing the whole point of "core" threads. Core threads are defined in the Executors docs as:
... threads to keep in the pool, even if they are idle.
That's the definition. Thread startup is a non-trivial process, so if you have 10 core threads in a pool, the first 10 requests to the pool each start a thread until all of the core threads are live. This spreads the startup load across the first X requests. This is not about getting the tasks done; it's about initializing the TPE and spreading the thread-creation load out. You can call prestartAllCoreThreads() if you don't want this behavior.
The whole purpose of the core threads is to have threads already started, running, and available to work on tasks immediately. If we had to start a thread each time we needed one, there would be unnecessary resource-allocation time and thread start/stop overhead consuming compute and OS resources. If you don't want the core threads, you can let them time out and pay the startup cost instead.
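For reference, pre-starting looks like this (the pool parameters are illustrative; note you need a ThreadPoolExecutor reference, not the ExecutorService interface):

import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

ThreadPoolExecutor pool = new ThreadPoolExecutor(
        10, 10, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
int started = pool.prestartAllCoreThreads(); // starts all 10 core threads up front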
I have used ThreadPoolExecutor for years, and one of the main reasons is that it is designed to process many requests faster, thanks to parallelism and 'ready-to-go' threads (there are other reasons too).
TPE is not necessarily "faster". We use it because manually managing and communicating with a number of threads is hard and easy to get wrong. That's why the TPE code is so powerful. It is the OS threads that give us parallelism.
Since I don't use the internal queue and I want tasks to start being processed as early as possible, ...
The entire point of a threaded program is to maximize throughput. If you run 100 threads on a 4-core system and the tasks are CPU-intensive, you are going to pay for the increased context switching, and the overall time to process a large number of requests is going to increase. Your application is also most likely competing for resources on the server with other programs, and you don't want it to slow to a crawl if hundreds of jobs try to run in a thread pool at the same time.
The whole point of limiting your core threads (i.e. not making them a "sane big amount") is that there is an optimal number of concurrent threads that will maximize the overall throughput of your application. It can be really hard to find the optimal core thread size but experimentation, if possible, would help.
It depends highly on the degree of CPU versus IO in a task. If the tasks are making remote RPC calls to a slow service then it might make sense to have a large number of core threads in your pool. If they are predominantly CPU tasks, however, you are going to want to be closer to the number of CPU/cores and then queue the rest of the tasks. Again it is all about overall throughput.
To reuse threads, one needs some way to hand a task to an existing thread.
This pushed me towards a synchronous queue and a zero core pool size.
return new ThreadPoolExecutor(0, maxThreadsCount,
        10L, TimeUnit.SECONDS,
        new SynchronousQueue<Runnable>(),
        // BasicThreadFactory is from Apache Commons Lang (commons-lang3)
        new BasicThreadFactory.Builder().namingPattern("processor-%d").build());
This really reduced the number of 500-1500 ms latency peaks on my 'main flow'.
But this works only for a zero-sized queue; for a non-zero-sized queue the question is still open.
I have 3 instances of a class which implements the Runnable interface. I instantiate my executor like below;
executor = Executors.newScheduledThreadPool(2);<--- talking about this part
executor.scheduleAtFixedRate(unassignedRunnable, 0, refreshTime, TimeUnit.SECONDS);
executor.scheduleAtFixedRate(assignedToMeRunnable, 2, refreshTime, TimeUnit.SECONDS);
executor.scheduleAtFixedRate(createTicketsFromFile, 3, refreshTime * 2, TimeUnit.SECONDS);
My question is: does it make any difference if I change the thread pool count from 2 to 1 or 3? I tried, and it gained me almost nothing. Can anyone explain the real use of the thread pool count? Maybe my tasks are too lightweight?
You need to understand that no matter how many threads you create, they ultimately execute on the available cores. Now, as per the documentation, the pool size is "the number of threads to keep in the pool, even if they are idle."
Can you tell me what is the real use of thread pool count
executor = Executors.newScheduledThreadPool(2);
The line of code above creates 2 threads in the thread pool, but it doesn't mean both will always be doing work. At the same time, a thread that finishes one task can be reused for another task submitted to the pool.
So it is better to understand your requirements before picking the total number of threads to create. (I usually derive the number from the available core count.)
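For example (a hedged sketch of that rule of thumb):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

// Derive the pool size from the hardware instead of hard-coding it.
int cores = Runtime.getRuntime().availableProcessors();
ScheduledExecutorService executor = Executors.newScheduledThreadPool(cores);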
That is, corePoolSize is the number of threads in the pool. An available thread picks up an eligible task and runs it in that same thread; if no thread is available, a task that is eligible to run will not execute, because all threads are busy. In your case the tasks may simply be very short-lived. To see this for yourself, create a pool with corePoolSize 1, submit a long-running task followed by a light task, and watch the behavior; then increase corePoolSize to 2 and compare.
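A hedged sketch of that demo (durations and messages are arbitrary):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// With one thread the long task blocks the light one; change the pool
// size to 2 and the two tasks run concurrently.
ScheduledExecutorService executor = Executors.newScheduledThreadPool(1);
executor.schedule(() -> {
    try { Thread.sleep(60_000); } catch (InterruptedException ignored) { }
    System.out.println("long task done");
}, 0, TimeUnit.SECONDS);
executor.schedule(() -> System.out.println("light task done"), 0, TimeUnit.SECONDS);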
It depends on the number of CPU cores of the machine on which you are running your application. With more CPU cores, more threads can run in parallel and the performance of the overall system can improve, provided your application is not I/O-bound.
A CPU-bound application benefits from more cores and threads.
If you have a 4-core CPU, you can configure the value as 4. If your machine has a single CPU core, there won't be any benefit in setting the pool size to 4.
Related SE questions:
Java: How to scale threads according to cpu cores?
Is multithreading faster than single thread?
So, I have a loop where I create thousands of threads that process my data.
I checked, and storing the Threads slows down my app.
It's from my loop:
Record r = new Record(id, data, outPath, debug);
//r.start();
threads.add(r);
//id is 4 digits
//data is something like 500 chars long
It stalls my for loop for a while (one run takes a second or more, which is too much!).
Only init > duration: 0:00:06.369
With adding thread to ArrayList > duration: 0:00:07.348
Questions:
* what is the best way of storing Threads?
* how to make the Threads faster?
* should I create the Threads and run them with a special executor, say 10 at once, then the next 10, and so on? (if yes, then how?)
Consider that having a very high number of threads is not very useful.
At most you can execute in parallel as many threads as your CPU has cores.
The best approach is to reuse existing threads. To do that you can use the Executor framework.
For example, to create an executor that internally handles at most 10 threads, you can do the following:
List<Record> records = ...;
ExecutorService executor = Executors.newFixedThreadPool(10);
for (Record r : records) {
executor.submit(r);
}
// At the end stop the executor
executor.shutdown();
With code similar to this you can submit many thousands of commands (Runnable implementations), but no more than 10 threads will ever be created.
I'm guessing that it is not the .add method that is really slowing you down; my guess is that the hundreds of Threads already running in parallel are the real problem. Even a simple call like add will be queued in the pipeline and can take long to execute, even if the execution itself is fast. It is also possible that your data structure has an add method that is O(n), although ArrayList.add is amortized O(1).
Possible solutions for this:
* Find a real wait-free solution for this. E.g. prioritising threads.
* Add them all to your data-structure before executing them
While it is possible to work like this, it is strongly discouraged to create more than a handful of Threads for a job like this. You should use a thread executor, as David Lorenzo already pointed out.
I have a loop where I create thousands of threads...
That's a bad sign right there. Creating threads is expensive.
Presumably your program creates thousands of threads because it has thousands of tasks to perform. The trick is to decouple the threads from the tasks: create just a few threads, and reuse them.
That's what a thread pool does for you.
Learn about the java.util.concurrent.ThreadPoolExecutor class and related classes (e.g., Future). It implements a thread pool, and chances are very likely that it provides all of the features that you need.
If your needs are simple enough, you can use one of the static methods in java.util.concurrent.Executors to create and configure a thread pool. (e.g., Executors.newFixedThreadPool(N) will create a new thread pool with exactly N threads.)
If your tasks are all compute bound, then there's no reason to have any more threads than the number of CPUs in the machine. If your tasks spend time waiting for something (e.g., waiting for commands from a network client), then the decision of how many threads to create becomes more complicated: It depends on how much of what resources those threads use. You may need to experiment to find the right number.
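For the Record tasks from the question, a hedged sketch (assuming Record is usable as a Runnable, as the original loop suggests):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

void runAll(List<Record> records) throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(
            Runtime.getRuntime().availableProcessors());
    List<Future<?>> futures = new ArrayList<>();
    for (Record r : records) {
        futures.add(pool.submit(r));   // queue the task; no thread per Record
    }
    for (Future<?> f : futures) {
        f.get();                       // block until done; rethrows task failures
    }
    pool.shutdown();
}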
I am running a genome assembly program (Trinity, http://trinityrnaseq.sourceforge.net/, if interested) on one of the XSEDE resources. The hardware limits the number of threads to 2500, which the program always wants to exceed. Is there an easy way to limit the number of threads it creates? I have tried -XX:ParallelGCThreads=16, but this seems to introduce new errors.
So, is there a runtime option to limit the total number of threads?
Use an Executor or ExecutorService. It does what bragboy suggests, but it's built into Java.
You can use a custom queue, run as a separate process, that enforces the thread limit. The advantage is that you can opt either to cap the number of threads or to keep adding more. You would have something like an addToQueue(Thread t) method and, subsequently, a consumer consuming these threads. The queue knows how many threads are actively running; a daemon process fires the queue's consume() method at will while the count is within range, and every thread reports back to the queue when it finishes or completes its job. The queue you maintain can be a priority queue if you feel the running tasks should have priorities. This not only removes the dependency on the JVM but also makes your program look cleaner.
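A hedged sketch of the same idea using a Semaphore to cap concurrency (the class and method names are illustrative, not from an existing library):

import java.util.concurrent.Semaphore;

// Caps how many submitted tasks run at once; each task frees its slot
// when it finishes, which is the "reporting back" described above.
class ThreadLimiter {
    private final Semaphore slots;

    ThreadLimiter(int maxConcurrent) { this.slots = new Semaphore(maxConcurrent); }

    void addToQueue(Runnable task) throws InterruptedException {
        slots.acquire();                  // wait until a slot is free
        new Thread(() -> {
            try { task.run(); }
            finally { slots.release(); }  // report back, freeing the slot
        }).start();
    }
}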