How to optimize number of threads to speed up processing

How to optimize number of threads to speed up processing - java

I have read many similar questions . However I was not quite satisfied with answers.
I would like to build an algorithm that would adjust the number of threads depending on the average speed.
Let's say as I introduce a new thread, the average speed of task execution increases , it means that the new thread is good. Then the algorithm should try to add another thread ... until the optimal number of threads is achieved .......
Also the algorithm should be keeping track of the average speed. If at some point the average speed goes down significantly, let's say by 10 % (for any reason e.g. i open a different application or whatever) , then the algorithm should terminate one thread and see if the speed goes up ...
Maybe such an API exists. Please, give me any directions or any code example how I could implement such an algorithm
Thank You !

I do not know self-tune system that you are describing but it sounds like not so complicated task once you are using ready thread pool. Take thread pool from concurrency package, implement class TimeConsumptionCallable implements Callable that wraps any other callable and just measures the execution time.
Now you just have to change (increase or decrease) number of working threads when average execution time increases or decreases.
Just do not forget that you need enough statistics before you decide to change number of working threads. Otherwise various random effects that do not depend on your application can cause your thread pool to grow and go down all the time that can itself kill overall performance.

newCachedThreadPool() V/s newFixedThreadPool suggests that perhaps you should be looking at ExecutorService.newCachedThreadPool().
Creates a thread pool that creates new threads as needed, but will reuse previously constructed threads when they are available. These pools will typically improve the performance of programs that execute many short-lived asynchronous tasks. Calls to execute will reuse previously constructed threads if available. If no existing thread is available, a new thread will be created and added to the pool. Threads that have not been used for sixty seconds are terminated and removed from the cache. Thus, a pool that remains idle for long enough will not consume any resources. Note that pools with similar properties but different details (for example, timeout parameters) may be created using ThreadPoolExecutor constructors.

If your threads do not block at any time, then the maximum execution speed is reached when you have as many threads as cores, as simply more than 100% CPU usage is not possible.
In other situations it is very difficult to measure how much a new thread will increase/decrease the execution speed, as you just watch a moment in time and make assumptions based on something that could be entirely different the next second.
One idea would be to use an Executor class in combination with a Queue that you specified. So you can measure the size of the queue and make assumptions based on that. If the queue is empty, threads are idle and you can remove one. If the queue fills up, threads cannot handle the load, you need to add more. If the queue is stable, you are about right.

You can come up with your own algorithm by using existing API of java :
public void setCorePoolSize(int corePoolSize) in ThreadPoolExecutor
Sets the core number of threads. This overrides any value set in the constructor.
If the new value is smaller than the current value, excess existing threads will be terminated when they next become idle.
If larger, new threads will, if needed, be started to execute any queued tasks.
Initialization:
ExecutorService service = Executors.newFixedThreadPool(5); // initializaiton
On your need basis, resize the pool by using below API
((ThreadPoolExecutor)service).setCorePoolSize(newLimit);//newLimit is new size of the pool
And one important point: If the queue is full, and new value of number of threads is greater than or equal to maxPoolSize defined earlier, Task will be rejected.
Be careful when setting maxPoolSize so that setCorePoolSize works properly.

Related

ThreadPoolExecutor#execute. How to reuse running threads?

I used to use ThreadPoolExecutors for years and one of the main reasons - it is designed to 'faster' process many requests because of parallelism and 'ready-to-go' threads (there are other though).
Now I'm stuck on minding inner design well known before.
Here is snippet from java 8 ThreadPoolExecutor:
public void execute(Runnable command) {
...
/*
* Proceed in 3 steps:
*
* 1. If fewer than corePoolSize threads are running, try to
* start a new thread with the given command as its first
* task. The call to addWorker atomically checks runState and
* workerCount, and so prevents false alarms that would add
* threads when it shouldn't, by returning false.
*/
...
int c = ctl.get();
if (workerCountOf(c) < corePoolSize) {
if (addWorker(command, true))
return;
c = ctl.get();
}
...
I'm interested in this very first step as in most cases you do not want thread poll executor to store 'unprocessed requests' in the internal queue, it is better to leave them in external input Kafka topic / JMS queue etc. So I'm usually designing my performance / parallelism oriented executor to have zero internal capacity and 'caller runs rejection policy'. You chose some sane big amount of parallel threads and core pool timeout not scare others and show how big the value is ;). I don't use internal queue and I want tasks to start to be processed the earlier the better, thus it has become 'fixed thread pool executor'. Thus in most cases I'm under this 'first step' of the method logic.
Here is the question: is this really the case that it will not 'reuse' existing threads but will create new one each time it is 'under core size' (most cases)? Would it be not better to 'add new core thread only if all others are busy' and not 'when we have a chance to suck for a while on another thread creation'? Am I missing anything?

The doc describes the relationship between the corePoolSize, maxPoolSize, and the task queue, and what happens when a task is submitted.
...but will create new one [thread] each time it is 'under core size...'
Yes. From the doc:
When a new task is submitted in method execute(Runnable), and fewer
than corePoolSize threads are running, a new thread is created to
handle the request, even if other worker threads are idle.
Would it be not better to add new core thread only if all others are busy...
Since you don't want to use the internal queue this seems reasonable. So set the corePoolSize and maxPoolSize to be the same. Once the ramp up of creating the threads is complete there won't be any more creation.
However, using CallerRunsPolicy would seem to hurt performance if the external queue grows faster than can be processed.

Here is the question: is this really the case that it will not 'reuse' existing threads but will create new one each time it is 'under core size' (most cases)?
Yes that is how the code is documented and written.
Am I missing anything?
Yes, I think you are missing the whole point of "core" threads. Core threads are defined in the Executors docs are:
... threads to keep in the pool, even if they are idle.
That's the definition. Thread startup is a non trivial process and so if you have 10 core threads in a pool, the first 10 requests to the pool each start a thread until all of the core threads are live. This spreads the startup load across the first X requests. This is not about getting the tasks done, this is about initializing the TPE and spreading the thread creation load out. You could call prestartAllCoreThreads() if you don't want this behavior.
The whole purpose of the core threads is to have threads already started and running available to work on tasks immediately. If we had to start a thread each time we needed one, there would be unnecessary resource allocation time and thread start/stop overhead taking compute and OS resources. If you don't want the core threads then you can let them timeout and pay for the startup time.
I used to use ThreadPoolExecutors for years and one of the main reasons - it is designed to 'faster' process many requests because of parallelism and 'ready-to-go' threads (there are other though).
TPE is not necessarily "faster". We use it because to manually manage and communicate with a number of threads is hard and easy to get wrong. That's why the TPE code is so powerful. It is the OS threads that give us parallelism.
I don't use internal queue and I want tasks to start to be processed the earlier the better,
The entire point of a threaded program is the maximize throughput. If you run 100 threads on a 4 core system and the tasks are CPU intensive, you are going to pay for the increased context switching and the overall time to process a large number of requests is going to decrease. Your application is also most likely competing for resources on the server with other programs and you don't want to cause it to slow to a crawl if 100s of jobs try to run in a thread pool at the same time.
The whole point of limiting your core threads (i.e. not making them a "sane big amount") is that there is an optimal number of concurrent threads that will maximize the overall throughput of your application. It can be really hard to find the optimal core thread size but experimentation, if possible, would help.
It depends highly on the degree of CPU versus IO in a task. If the tasks are making remote RPC calls to a slow service then it might make sense to have a large number of core threads in your pool. If they are predominantly CPU tasks, however, you are going to want to be closer to the number of CPU/cores and then queue the rest of the tasks. Again it is all about overall throughput.

To reuse threads one need somehow to transfer task to existing thread.
This pushed me towards synchronous queue and zero core pool size.
return new ThreadPoolExecutor(0, maxThreadsCount,
10L, SECONDS,
new SynchronousQueue<Runnable>(),
new BasicThreadFactory.Builder().namingPattern("processor-%d").build());
I have really reduced amounts of 'peaks' of 500 - 1500 (ms) on my 'main flow'.
But this will work only for zero-sized queue. For non-zero-sized queue question is still open.

CachedThreadPool vs FixedThreadPool

I would like to know which to use CachedThreadPool or FixedThreadPool in this particular scenario.
When the user logins into the app, a list of addresses will be obtained about 10 addresses. I need to do the following:
Convert the address into latitude and longitude for which I am calling a Google API
Obtain distance between the above fetched latitude and longitude with user's current location also with the help of a Google API
So, I have created a class GetDistance which implements Runnable. In this class I am first calling the Google API and parsing the response to get respective latitude and longitude and then calling and parsing result of another Google API to get driving distance.
private void getDistanceOfAllAddresses(List<Items> itemsList) {
ExecutorService exService = newCachedThreadPool(); //Executors.newFixedThreadPool(3);
for(int i =0; i<itemsList.size(); i++) {
exService.submit(new GetDistance(i,usersCurrentLocation));
}
exService.shutdown();
}
I have tried with both CachedThreadPool and FixedThreadPool, time taken is almost the same. I am in favour of CachedThreadPool as it is recommended for small operations, but I have some concerns. Lets assume CachedThreadPool creates 10 threads (worst case) to complete the process (10 items), will it be an issue if my app is running on lower end devices? As number of threads created will also affect the RAM of the device.
I want to know your thoughts and opinions on this. Which is better to use?

Go with newCachedThreadPool it is better fit for this situation, because your task are small and I/O (network) bound. Which means you should create threads (usually 1.5x ~ 2x times) greater than number of processor cores to get optimum output, but here I guess newCachedThreadPool will manage itself. So, newCachedThreadPool will have less overhead as compared to newFixedThreadPool and will help in your situation.
If you had CPU intensive tasks then newFixedThreadPool could have been a better choice.
Update
A list of addresses will be obtained about 10 addresses.
If you need only 10 address always, then it doesn't matter, go with newCachedThreadPool. But if you think that number of address can increase then use newFixedThreadPool with number of threads <= 1.5x to 2x times number of cores available.
From Java docs:
newFixedThreadPool
Creates a thread pool that reuses a
fixed number of threads operating off
a shared unbounded queue. At any
point, at most nThreads threads will
be active processing tasks. If
additional tasks are submitted when
all threads are active, they will wait
in the queue until a thread is
available. If any thread terminates
due to a failure during execution
prior to shutdown, a new one will take
its place if needed to execute
subsequent tasks. The threads in the
pool will exist until it is explicitly
shutdown.
newCachedThreadPool
Creates a thread pool that creates new
threads as needed, but will reuse
previously constructed threads when
they are available. These pools will
typically improve the performance of
programs that execute many short-lived
asynchronous tasks. Calls to execute
will reuse previously constructed
threads if available. If no existing
thread is available, a new thread will
be created and added to the pool.
Threads that have not been used for
sixty seconds are terminated and
removed from the cache. Thus, a pool
that remains idle for long enough will
not consume any resources. Note that
pools with similar properties but
different details (for example,
timeout parameters) may be created
using ThreadPoolExecutor constructors.

Java Thread storing

So, I have a loop where I create thousands of threads which process my data.
I checked and storing a Thread slows down my app.
It's from my loop:
Record r = new Record(id, data, outPath, debug);
//r.start();
threads.add(r);
//id is 4 digits
//data is something like 500 chars long
It stop my for loop for a while (it takes a second or more for one run, too much!).
Only init > duration: 0:00:06.369
With adding thread to ArrayList > duration: 0:00:07.348
Questions:
what is the best way of storing Threads?
how to make Threads faster?
should I create Threads and run them with special executor, means for example 10 at once, then next 10 etc.? (if yes, then how?)

Consider that having a number of threads that is very high is not very useful.
At least you can execute at the same time a number of threads equals to the number of core of your cpu.
The best is to reuse existing threads. To do that you can use the Executor framework.
For example to create an Executor that handle internally at most 10 threads you can do the followig:
List<Record> records = ...;
ExecutorService executor = Executors.newFixedThreadPool(10);
for (Record r : records) {
executor.submit(r);
}
// At the end stop the executor
executor.shutdown();
With a code similar to this one you can submit also many thousands of commands (Runnable implementations) but no more than 10 threads will be created.

I'm guessing that it is not the .add method that is really slowing you down. My guess is that the hundreds of Threads running in parallel is what really is the problem. Of course a simple command like "add" will be queued in the pipeline and can take long to be executed, even if the execution itself is fast. Also it is possible that your data-structure has an add method that is in O(n).
Possible solutions for this:
* Find a real wait-free solution for this. E.g. prioritising threads.
* Add them all to your data-structure before executing them
While it is possible to work like this it is strongly discouraged to create more than some Threads for stuff like this. You should use the Thread Executor as David Lorenzo already pointed out.

I have a loop where I create thousands of threads...
That's a bad sign right there. Creating threads is expensive.
Presumeably your program creates thousands of threads because it has thousands of tasks to perform. The trick is, to de-couple the threads from the tasks. Create just a few threads, and re-use them.
That's what a thread pool does for you.
Learn about the java.util.concurrent.ThreadPoolExecutor class and related classes (e.g., Future). It implements a thread pool, and chances are very likely that it provides all of the features that you need.
If your needs are simple enough, you can use one of the static methdods in java.util.concurrent.Executors to create and configure a thread pool. (e.g., Executors.newFixedThreadPool(N) will create a new thread pool with exactly N threads.)
If your tasks are all compute bound, then there's no reason to have any more threads than the number of CPUs in the machine. If your tasks spend time waiting for something (e.g., waiting for commands from a network client), then the decision of how many threads to create becomes more complicated: It depends on how much of what resources those threads use. You may need to experiment to find the right number.

Sleep until system has spare resources? Low priority task scheduling

There's a list of things that my program needs to periodically check - no events can be assigned to trigger when their state changes. These things are stored in array list as Robot class instances:
public class RobotManager extends Thread {
protected final List<Robot> robots = new ArrayList<>();
}
Every robot has canRun task which returns true if there's someting the robot can do. This includes updating availability of GUI buttons and so on.
My current plan was to sleep for some while, like 800ms, then loop through list and canRun (and eventually start()) every Robot in the list. But this doesn't seem very nice - if there's sufficient number of tasks, the program will lag the system every 800ms. It would be much nicer if the program could:
Tell the OS to sleep for something around 800ms with less precision and try to run where there are spare resources
Do these unprecise sleeps while looping the list to reduce the peak in required resources.
In other words: Can I, in Java, make sleep less precise in favour of running when system has spare resources?

I think you are looking for the Thread.yield()method.
Javadoc:
A hint to the scheduler that the current thread is willing to yield
its current use of a processor. The scheduler is free to ignore this
hint.
Yield is a heuristic attempt to improve relative progression
between threads that would otherwise over-utilise a CPU. Its use
should be combined with detailed profiling and benchmarking to
ensure that it actually has the desired effect.
It is rarely appropriate to use this method. It may be useful
for debugging or testing purposes, where it may help to reproduce
bugs due to race conditions. It may also be useful when designing
concurrency control constructs such as the ones in the
java.util.concurrent.locks package.
With a combination of sleep(...) and yield() you can find a tradeoff between "robots list is not processed often enough" and "it's eating up to much cpu". The amount of time you sleep and the number of yield calls (within the robots and/or between robot calls) depends on the stuff your robots actually do.

What you should do is to set the process/thread priority to Idle or very low priority. A thread/process that has an Idle priority will only be scheduled if no other tasks with higher priority is ready to run. Note that this opens the possibility of starvation if the current machine is actively very busy, the idle thread won't run at all. A low priority thread would still let you get some time slice, only that it'll yield to higher priority threads first. The specific behaviour of thread priority varies depending on the JVM implementation and the OS, but generally a low priority thread will likely be preempted if a higher priority thread becomes ready to run, and it will be less likely to be scheduled if a higher priority thread is ready to run.
Another comment is that I'd recommend to avoid polling for available task, but rather use a BlockingQueue instead of ArrayList and sleeping. If your thread is waiting for a BlockingQueue, it won't be scheduled until there's something in the queue, so you don't have the unpredictable wake-up checks. It's also nicer to the machine as a blocked thread would allow the CPU to enter low power mode (unlike a constantly waking thread, which keeps the CPU at its toes), this can be important if your program is running on a machine with battery.

Java multithreading in CPU load

I have a bit of an issue with an application running multiple Java threads.
The application runs a number of working threads that peek continuously at an input queue and if there are messages in the queue they pull them out and process them.
Among those working threads there is another verification thread scheduled to perform at a fixed period a check to see if the host (on which the application runs) is still in "good shape" to run the application. This thread updates an AtomicBoolean value which in turn is verified by the working thread before they start peeking to see if the host is OK.
My problem is that in cases with high CPU load the thread responsible with the verification will take longer because it has to compete with all the other threads. If the AtomicBoolean does not get updated after a certain period it is automatically set to false, causing me a nasty bottleneck.
My initial approach was to increase the priority of the verification thread, but digging into it deeper I found that this is not a guaranteed behavior and an algorithm shouldn't rely on thread priority to function correctly.
Anyone got any alternative ideas? Thanks!

Instead of peeking into a regular queue data structure, use the java.util.concurrent package's LinkedBlockingQueue.
What you can do is, run an pool of threads (you could use executer service's fixed thread pool, i.e., a number of workers of your choice) and do LinkedBlockingQueue.take().
If a message arrives at the queue, it is fed to one of the waiting threads (yeah, take does block the thread until there is something to be fed with).
Java API Reference for Linked Blocking Queue's take method
HTH.

One old school approach to throttling rate of work, that does not use a health check thread at all (and so by-passes these problems) is to block or reject requests to add to the queue if the queue is longer than say 100. This applies dynamic back pressure on to the clients generating the load, slowing them down when the worker threads are over loaded.
This approach was added to the Java 1.5 library, see java.util.concurrent.ArrayBlockingQueue. Its put(o) method blocks if the queue is full.

Are u using Executor framework (from Java's concurrency package)? If not give it a shot. You could try using ScheduledExecutorService for the verification thread.

More threads does not mean better performance. Usually if you have dual core, 2 threads gives best performance, 3 or more starts getting worse. Quad core should handle 4 threads best, etc. So be careful how much threads you use.
You can put the other threads to sleep after they perform their work, and allow other threads to do their part. I believe Thread.yield() will pause the current thread to give time to other threads.
If you want your thread to run continuously, I would suggest creating two main threads, thread A and B. Use A for the verification thread, and from B, create the other threads. Therefore thread A gets more execution time.

Seems you need to utilize Condition variables. Peeking will take cpu cycles.
http://docs.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/locks/Condition.html

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.