How many Futures is too many in Java?

How many Futures is too many in Java? - java

When trying to determine how the tasks should break down in a data processing server in Java, I need to know how many Futures is too many for ExecutorService.
To my understanding, ExecutorServices with a pool of heavyweight threads, handles Futures like they are green thread, meaning the cost to perform a context switch between Futures is very small. Is this true?
Should I submit millions of Futures to ExecutorService (using fixed number of threads in the pool)?
Can I expect to submit many very-short-lived Futures (10 ms) into Executor service without seeing severe performance degradation?

You're conflating a Future, which represents the possible result of an asynchronous operation with a Thread which represents the ability to perform processing on a Callable (in the case of an Executor at least).
There's nothing to stop you calling submit on a thread pool millions of times and get a huge list of Future objects for you to wait on. You don't even need to wait for them to finish if the application will continue running and you have no need to process the result.
But.
If you create all these jobs, they are going to require memory to hold their state. If that memory is somehow part of the input to the job, or the result of executing the job, then you will commit heap space to all these tasks. You can't do this forever. Essentially, you need to think of some sort of throttling, if you're going to pull huge amount of work into a process to run in the background.

To my understanding, ExecutorServices with a pool of heavyweight threads, handles Futures like they are green thread
That's not correct. If we ignore the bells and whistles, an ExecutorService consists of a collection of worker threads, and a blocking queue of tasks. Each task in the queue is a wrapper containing one of your tasks, and a Future.
Each worker thread loops forever,
Picks a task from the queue,
Calls your Runnable or Callable object's run() or call(...) method,
Completes the Future with the value returned by your method or, with an exception that was thrown by your method.
Goes back to wait for another task.
The only threads are the "heavy weight" worker threads. Once one of the worker threads starts to work on a task, it won't do anything else until the task is complete. Tasks that haven't yet been started are just objects in a queue, and the Executor forgets about each task and Future object as soon as the Future is completed. Those won't continue exist after your own code has discarded the references to them.

Should I submit millions of Futures to ExecutorService?
you can, but you should evaluate possible time overhead. The overhead of handling separate Future object is small, but greater than zero. So the less number of tasks, the better. On the other hand, when the number of tasks becomes less then the number of processors (that is, the number of processor cores with respect to hyperthreading), then the level of parallelism is reduced and the overall execution time increases.
Let you have 1 million of 10-ms tasks, and your computer have 8 cores. Then the overall execution time of 1250 sec is increased by (10*8/2) = 40 ms because of decreasing parallelism at the end, and plus 125 ms for task switch (I evaluate it as little as 1 useq for each task switch). if your have 100000 100-ms tasks, then the execution time is still expected to be 1250 sec, plus 400 ms for tail and 12.5 ms for switch. Either way, the time overhead is neglidgible, but it can increase if your tasks are significantly shorter or longer than 10...100 interval.

Related

Single ScheduledExecutorService instance vs Multiple ScheduledExecutorService instances

I have a service which schedules async tasks using ScheduledExecutorService for the user. Each user will trigger the service to schedule two tasks. (The 1st Task schedule the 2nd task with a fixed delay, such as 10 seconds interval)
pseudocode code illustration:
task1Future = threadPoolTaskScheduler.schedule(task1);
for(int i = 0; i< 10000; ++i) {
task2Future = threadPoolTaskScheduler.schedule(task2);
task2Future.get(); // Takes long time
Thread.sleep(10);
}
task1.Future.get();
Suppose I have a potential of 10000 users using the service at the same time, we can have two kinds of ScheduledExecutorService configuration for my service:
A single ScheduledExecutorService for all the users.
Create a ScheduledExecutorService for each user.
What I can think about the first method:
Pros:
Easy to control the number of threads in the thread pool.
Avoid creating new threads for scheduled tasks.
Cons:
Always keeping multiple number of threads available could waste computer resources.
May cause the hang of the service because of lacking available threads. (For example, set the thread pool size to 10, and then there is a 100 person using the service the same time, then after entering the 1st task and it tries to schedule the 2nd task, then finding out there is no thread available for scheduling the 2nd task)
What I can think about the second method
Pros:
Avoiding always keep many threads available when the number of user is small.
Can always provide threads for a large number of simultaneously usage.
Cons:
Creating new threads creates overheads.
Don't know how to control the number of maximum threads for the service. May cause the RAM out of space.
Any ideas about which way is better?

Single ScheduledExecutorService drives many tasks
The entire point of a ScheduledExecutorService is to maintain a collection of tasks to be executed after a certain amount of time elapses.
So given the scenario you describe, you need only a single ScheduledExecutorService object. Submit your 10,000 tasks to that one object. Each task will be executed approximately when its designated delay elapses. Simple, and easy.
Thread pool size
The real issue is deciding how many threads to assign to the ScheduledExecutorService.
Threads, as currently implemented in the OpenJDK project, are mapped directly to host OS threads. This makes them relatively heavyweight in terms of CPU and memory usage. In other words, currently Java threads are “expensive”.
There is no simple easy answer to calculating thread pool size. The optimal number is the least amount of threads that can keep up with the workload without over-burdening the host machine’s limited number of cores and limited memory. If you search Stack Overflow, you’ll find many discussions on the topic of deciding how many threads to use in a pool.
Project Loom
And keep tabs with the progress of Project Loom and its promise to bring virtual threads to Java. That technology has the potential to radically alter the calculus of deciding thread pool size. Virtual threads will be more efficient with CPU and with memory. In other words, virtual threads will be quite “cheap”, “inexpensive”.
How executor service works
You said:
entering the 1st task and it tries to schedule the 2nd task, then finding out there is no thread available for scheduling the 2nd task
That is not how the scheduled executor service (SES) works.
If a task being currently executed by a SES needs to schedule itself or some other task to later execution, that submitted task is added to the queue maintained internally by the SES. There is no need to have a thread immediately available. Nothing happens immediately except that queue addition. Later, when the added task’s specified delay has elapsed, the SES looks for an available thread in its thread-pool to execute that task that was queued a while back in time.
You seem to feel a need to manage the time of each task’s execution on certain threads. But that is the job of the scheduled executor service. The SES tracks the tasks submitted for execution, notices when their specified delay elapses, and schedules their execution on a thread from its managed pool of threads. You don’t need to manage any of that. Your only challenge is to assign an appropriate number of threads to the pool.
Multiple executor services
You commented:
why don't use multiple ScheduledExecutorService instances
Because in your scenario, there is no benefit. Your Question implies that you have many tasks all similar with none being prioritized. In such a case, just use one executor service. One scheduled executor service with 12 threads will get the same amount of work accomplished as 3 services with 4 threads each.
As for excess threads, they are not a burden. Any thread without a task to execute uses virtually no CPU time. A pool may or may not choose to close some unused threads after a while. But such a policy is up to the implementation of the thread pool of the executor service, and is transparent to us as calling programmers.
If the scenario were different, where some of the tasks block for long periods of time, or where you need to prioritize certain tasks, then you may want to segregate those into a separate executor service.
In today's Java (before Project Loom with virtual threads), when code in a thread blocks, that thread sits there doing nothing but waiting to unblock. Blocking means your code is performing an operation that awaits a response. For example, making network calls to a socket or web service blocks, writing to storage blocks, and accessing an external database blocks. Ideally, you would not write code that blocks for long periods of time. But sometimes you must.
In such a case where some tasks run long, or conversely you have some tasks that must be prioritized for fast execution, then yes, use multiple executor services.
For example, say you have a 16-core machine with not much else running except your Java app. You might have one executor service with a thread pool size of 4 maximum for long-running tasks, one executor service with a thread pool with a size of 7 maximum for many run-of-the-mill tasks, and a third executor service with a thread pool maximum size of 2 for very few tasks that run short but must run quickly. (The numbers here are arbitrary examples, not a recommendation.)
Other approaches
As commented, there are other frameworks for managing concurrency. The ScheduledExecutorService discussed here is general purpose.
For example, Swing, JavaFX, Spring, and Jakarta EE each have their own concurrency management. Consider using those where approriate to your particular project.

ThreadPoolExecutor#execute. How to reuse running threads?

I used to use ThreadPoolExecutors for years and one of the main reasons - it is designed to 'faster' process many requests because of parallelism and 'ready-to-go' threads (there are other though).
Now I'm stuck on minding inner design well known before.
Here is snippet from java 8 ThreadPoolExecutor:
public void execute(Runnable command) {
...
/*
* Proceed in 3 steps:
*
* 1. If fewer than corePoolSize threads are running, try to
* start a new thread with the given command as its first
* task. The call to addWorker atomically checks runState and
* workerCount, and so prevents false alarms that would add
* threads when it shouldn't, by returning false.
*/
...
int c = ctl.get();
if (workerCountOf(c) < corePoolSize) {
if (addWorker(command, true))
return;
c = ctl.get();
}
...
I'm interested in this very first step as in most cases you do not want thread poll executor to store 'unprocessed requests' in the internal queue, it is better to leave them in external input Kafka topic / JMS queue etc. So I'm usually designing my performance / parallelism oriented executor to have zero internal capacity and 'caller runs rejection policy'. You chose some sane big amount of parallel threads and core pool timeout not scare others and show how big the value is ;). I don't use internal queue and I want tasks to start to be processed the earlier the better, thus it has become 'fixed thread pool executor'. Thus in most cases I'm under this 'first step' of the method logic.
Here is the question: is this really the case that it will not 'reuse' existing threads but will create new one each time it is 'under core size' (most cases)? Would it be not better to 'add new core thread only if all others are busy' and not 'when we have a chance to suck for a while on another thread creation'? Am I missing anything?

The doc describes the relationship between the corePoolSize, maxPoolSize, and the task queue, and what happens when a task is submitted.
...but will create new one [thread] each time it is 'under core size...'
Yes. From the doc:
When a new task is submitted in method execute(Runnable), and fewer
than corePoolSize threads are running, a new thread is created to
handle the request, even if other worker threads are idle.
Would it be not better to add new core thread only if all others are busy...
Since you don't want to use the internal queue this seems reasonable. So set the corePoolSize and maxPoolSize to be the same. Once the ramp up of creating the threads is complete there won't be any more creation.
However, using CallerRunsPolicy would seem to hurt performance if the external queue grows faster than can be processed.

Here is the question: is this really the case that it will not 'reuse' existing threads but will create new one each time it is 'under core size' (most cases)?
Yes that is how the code is documented and written.
Am I missing anything?
Yes, I think you are missing the whole point of "core" threads. Core threads are defined in the Executors docs are:
... threads to keep in the pool, even if they are idle.
That's the definition. Thread startup is a non trivial process and so if you have 10 core threads in a pool, the first 10 requests to the pool each start a thread until all of the core threads are live. This spreads the startup load across the first X requests. This is not about getting the tasks done, this is about initializing the TPE and spreading the thread creation load out. You could call prestartAllCoreThreads() if you don't want this behavior.
The whole purpose of the core threads is to have threads already started and running available to work on tasks immediately. If we had to start a thread each time we needed one, there would be unnecessary resource allocation time and thread start/stop overhead taking compute and OS resources. If you don't want the core threads then you can let them timeout and pay for the startup time.
I used to use ThreadPoolExecutors for years and one of the main reasons - it is designed to 'faster' process many requests because of parallelism and 'ready-to-go' threads (there are other though).
TPE is not necessarily "faster". We use it because to manually manage and communicate with a number of threads is hard and easy to get wrong. That's why the TPE code is so powerful. It is the OS threads that give us parallelism.
I don't use internal queue and I want tasks to start to be processed the earlier the better,
The entire point of a threaded program is the maximize throughput. If you run 100 threads on a 4 core system and the tasks are CPU intensive, you are going to pay for the increased context switching and the overall time to process a large number of requests is going to decrease. Your application is also most likely competing for resources on the server with other programs and you don't want to cause it to slow to a crawl if 100s of jobs try to run in a thread pool at the same time.
The whole point of limiting your core threads (i.e. not making them a "sane big amount") is that there is an optimal number of concurrent threads that will maximize the overall throughput of your application. It can be really hard to find the optimal core thread size but experimentation, if possible, would help.
It depends highly on the degree of CPU versus IO in a task. If the tasks are making remote RPC calls to a slow service then it might make sense to have a large number of core threads in your pool. If they are predominantly CPU tasks, however, you are going to want to be closer to the number of CPU/cores and then queue the rest of the tasks. Again it is all about overall throughput.

To reuse threads one need somehow to transfer task to existing thread.
This pushed me towards synchronous queue and zero core pool size.
return new ThreadPoolExecutor(0, maxThreadsCount,
10L, SECONDS,
new SynchronousQueue<Runnable>(),
new BasicThreadFactory.Builder().namingPattern("processor-%d").build());
I have really reduced amounts of 'peaks' of 500 - 1500 (ms) on my 'main flow'.
But this will work only for zero-sized queue. For non-zero-sized queue question is still open.

Java Thread storing

So, I have a loop where I create thousands of threads which process my data.
I checked and storing a Thread slows down my app.
It's from my loop:
Record r = new Record(id, data, outPath, debug);
//r.start();
threads.add(r);
//id is 4 digits
//data is something like 500 chars long
It stop my for loop for a while (it takes a second or more for one run, too much!).
Only init > duration: 0:00:06.369
With adding thread to ArrayList > duration: 0:00:07.348
Questions:
what is the best way of storing Threads?
how to make Threads faster?
should I create Threads and run them with special executor, means for example 10 at once, then next 10 etc.? (if yes, then how?)

Consider that having a number of threads that is very high is not very useful.
At least you can execute at the same time a number of threads equals to the number of core of your cpu.
The best is to reuse existing threads. To do that you can use the Executor framework.
For example to create an Executor that handle internally at most 10 threads you can do the followig:
List<Record> records = ...;
ExecutorService executor = Executors.newFixedThreadPool(10);
for (Record r : records) {
executor.submit(r);
}
// At the end stop the executor
executor.shutdown();
With a code similar to this one you can submit also many thousands of commands (Runnable implementations) but no more than 10 threads will be created.

I'm guessing that it is not the .add method that is really slowing you down. My guess is that the hundreds of Threads running in parallel is what really is the problem. Of course a simple command like "add" will be queued in the pipeline and can take long to be executed, even if the execution itself is fast. Also it is possible that your data-structure has an add method that is in O(n).
Possible solutions for this:
* Find a real wait-free solution for this. E.g. prioritising threads.
* Add them all to your data-structure before executing them
While it is possible to work like this it is strongly discouraged to create more than some Threads for stuff like this. You should use the Thread Executor as David Lorenzo already pointed out.

I have a loop where I create thousands of threads...
That's a bad sign right there. Creating threads is expensive.
Presumeably your program creates thousands of threads because it has thousands of tasks to perform. The trick is, to de-couple the threads from the tasks. Create just a few threads, and re-use them.
That's what a thread pool does for you.
Learn about the java.util.concurrent.ThreadPoolExecutor class and related classes (e.g., Future). It implements a thread pool, and chances are very likely that it provides all of the features that you need.
If your needs are simple enough, you can use one of the static methdods in java.util.concurrent.Executors to create and configure a thread pool. (e.g., Executors.newFixedThreadPool(N) will create a new thread pool with exactly N threads.)
If your tasks are all compute bound, then there's no reason to have any more threads than the number of CPUs in the machine. If your tasks spend time waiting for something (e.g., waiting for commands from a network client), then the decision of how many threads to create becomes more complicated: It depends on how much of what resources those threads use. You may need to experiment to find the right number.

ThreadPoolExecutor - ArrayBlockingQueue ... to wait before it removes an element form the Queue

I am trying to Tune a thread which does the following:
A thread pool with just 1 thread [CorePoolSize =0, maxPoolSize = 1]
The Queue used is a ArrayBlockingQueue
Quesize = 20
BackGround:
The thread tries to read a request and perform an operation on it.
HOWEVER, eventually the requests have increased so much that the thread is always busy and consume 1 CPU which makes it a resource hog.
What I want to do it , instead sample the requests at intervals and process them . Other requests can be safely ignored.
What I would have to do is put a sleep in "operation" function so that for each task the thread sleeps for sometime and releases the CPU.
Quesiton:
However , I was wondering if there is a way to use a queue which basically itself sleeps for sometime before it reads the next element. This would be ideal since sleeping a task in the middle of execution and keeping the execution incomplete just doesn't sound the best to me.
Please let me know if you have any other suggestions as well for the tasks
Thanks.
Edit:
I have added a follow-up question here
corrected the maxpool size to be 1 [written in a haste] .. thanks tim for pointing it out.

No, you can't make the thread sleep while it's in the pool. If there's a task in the queue, it will be executed.
Pausing within a queued task is the only way to force the thread to be idle in spite of queued tasks. Now, the "sleep" doesn't have to be in the same task as the "work"—you could queue a separate rest task after each real task, which might make for a cleaner implementation. More importantly, if the work is a Callable that returns a result, separating into two tasks will allow you to obtain the result as soon as possible.
As a refinement, rather than sleeping for a fixed interval between every task, you could "throttle" execution to a specified rate. This would allow you to avoid waiting unnecessarily between tasks, yet avoid executing too many tasks within a specified time interval. You can read another answer of mine for a simple way to implement this with a DelayQueue.

You could subclass ThreadPool and override beforeExecute to sleep for some time:
#Overrides
protected void beforeExecute(Thread t,
Runnable r){
try{
Thread.sleep( millis); // will sleep the correct thread, see JavaDoc
}
catch (InterruptedException e){}
}
But see AngerClown's comment about artificially slowing down the queue probably not being a good idea.

This might not work for you, but you could try setting the executor's thread priority to low.
Essentially, create the ThreadPoolExecutor with a custom ThreadFactory. Have the ThreadFactory.newThread() method return Threads with a priority of Thread.MIN_PRIORITY. This will cause the executor service you use to only be scheduled if there is an available core to run it.
The implication: On a system that strictly uses time slicing, you will only be given a time slice to execute if there is no other Thread in the entire program with a greater priority asking to be scheduled. Depending on how busy your application really is, you might get scheduled every once in awhile, or you might not be scheduled at all.

The reason the thread is consuming 100% CPU is because it is given more work than it can process. Adding a delay between tasks is not going to fix this problem. It is just make things worse.
Instead you should look at WHY your tasks are consuming so much CPU e.g. with a profiler and change them so that consume less CPU until you find that your thread can keep up and it no longer consumes 100% cpu.

What is the best approach to schedule Callables which may time out in a polling situation?

I have several Callables which query for some JMX Beans, so each one may time out. I want to poll for values lets say every second. The most naive approach would be to start each in a separate thread, but I want to minimize the number of threads. Which options do I have to do it in a better way?

My interpretation is that you have a bunch of Callable objects which need to be polled at some interval. The trouble if you use a thread pool is that the pool will become contaminated with the slowest members, and your faster ones will be starved.
It sounds like you have control over the scheduling, so you might consider an exponential backoff approach. That is, after Callable X has run (and perhaps timed out), you wait 2 seconds instead of 1 second before rescheduling it. If it still fails, go to 4s, then 8s, etc. If you use a ScheduledThreadPoolExecutor, it comes with a built-in way to do this, allowing you to schedule your executions after a set delay.
If you set a constant timeout, this strategy will reduce your pool's susceptibility to monopolization by the slow ones. It is very difficult to get rid of this problem completely. Using a separate thread per queried object is really the only way to make sure you don't get starvation, and that can be very resource-intensive, as you say.
Another strategy is to bucket your pool into a fast one and a slow one. If an object is timing out (say more than N times), you move it to the slow pool. This keeps your fast pool fast, and while the slow ones all get in each others' way, at least they don't clog up the fast pool. If they have good statistics for a while, you can promote them to the fast pool again.

As soon as you submit a Callable you receive a Future - a handle to the future result. You can decide to wait for its completion for a given amount of time:
Future<String> future = executorService.submit(callable);
try {
future.get(1, TimeUnit.SECONDS);
} catch ( TimeoutException e ) {
future.cancel(true);
} catch ...
Calling get with a timeout allows you to receive an exception if the task has not been completed. This does not distinguish between not started tasks and started but not completed. On the other hand cancel will take a boolean parameter mayStopIfRunning so you can choose to e.g. only cancel tasks not yet scheduled.

i agree with robbotic...implementing a 'cachedThreadPool' will solve your problem as it will restrict the number of threads to the optimum level at the same time has timeouts which will free your un-utilized resources

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.