ScheduledExecutorService with repeating tasks that have max duration - java

I have the following problem. I need multiple tasks to be performed simultaneously and repeatedly, each in a separate thread. Each task has its own repeat interval. Each task execution should take no more than a maximum duration (timeout), which differs from task to task. If a single execution exceeds that duration, it needs to be restarted. The current implementation uses a ScheduledThreadPoolExecutor and its scheduleAtFixedRate(...) method. The timeout is enforced by creating a new thread within each task that performs the task logic; that thread is joined for the given duration, then interrupted if it is still alive. This is not memory efficient because it effectively doubles the number of threads: one per task in the scheduled executor and one more within each task to manage the expiration.
Another idea I have is to use a managing thread that constantly iterates through the tasks (the ScheduledFuture instances returned by the executor), checks the current execution duration, and cancels and reschedules them if needed. The problem with such an approach is that ScheduledFuture doesn't record how long the current execution has taken. I can maintain a Map<Future, MyTask> and store the execution duration in the MyTask class, but that already starts to look like too much machinery.
Do you think there's some better solution that I can use to solve this particular problem?

Related

How many Futures is too many in Java?

When trying to determine how the tasks should break down in a data processing server in Java, I need to know how many Futures is too many for ExecutorService.
To my understanding, an ExecutorService with a pool of heavyweight threads handles Futures like they are green threads, meaning the cost to perform a context switch between Futures is very small. Is this true?
Should I submit millions of Futures to an ExecutorService (using a fixed number of threads in the pool)?
Can I expect to submit many very short-lived Futures (10 ms) to an ExecutorService without seeing severe performance degradation?
You're conflating a Future, which represents the possible result of an asynchronous operation, with a Thread, which represents the ability to perform processing on a Callable (in the case of an Executor at least).
There's nothing to stop you calling submit on a thread pool millions of times and get a huge list of Future objects for you to wait on. You don't even need to wait for them to finish if the application will continue running and you have no need to process the result.
But.
If you create all these jobs, they are going to require memory to hold their state. If that memory is somehow part of the input to the job, or the result of executing the job, then you will commit heap space to all these tasks. You can't do this forever. Essentially, you need to think of some sort of throttling if you're going to pull a huge amount of work into a process to run in the background.
To my understanding, an ExecutorService with a pool of heavyweight threads handles Futures like they are green threads
That's not correct. If we ignore the bells and whistles, an ExecutorService consists of a collection of worker threads, and a blocking queue of tasks. Each task in the queue is a wrapper containing one of your tasks, and a Future.
Each worker thread loops forever:
1. Picks a task from the queue.
2. Calls your Runnable or Callable object's run() or call() method.
3. Completes the Future with the value returned by your method, or with the exception that your method threw.
4. Goes back to wait for another task.
The only threads are the "heavyweight" worker threads. Once one of the worker threads starts to work on a task, it won't do anything else until the task is complete. Tasks that haven't yet been started are just objects in a queue, and the Executor forgets about each task and Future object as soon as the Future is completed. Those won't continue to exist after your own code has discarded the references to them.
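The loop described above can be sketched roughly as follows. This is a simplified illustration, not the actual ThreadPoolExecutor source: the class name TinyExecutor and its methods are hypothetical, and features such as shutdown, rejection and thread replacement are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical, heavily simplified executor: a few worker threads plus one blocking queue.
class TinyExecutor {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    TinyExecutor(int workers) {
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(this::workerLoop, "tiny-worker-" + i);
            t.setDaemon(true); // the loop never ends, so don't keep the JVM alive
            t.start();
        }
    }

    // Wraps the task in a FutureTask (the task and its Future in one object) and queues it.
    <T> Future<T> submit(Callable<T> task) {
        FutureTask<T> wrapper = new FutureTask<>(task);
        queue.add(wrapper);
        return wrapper;
    }

    private void workerLoop() {
        try {
            while (true) {
                Runnable next = queue.take(); // wait for a task
                next.run();                   // FutureTask.run() calls call() and completes the Future
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // interrupted while waiting: stop this worker
        }
    }

    // Submits `count` tiny tasks and sums their results; all tasks share `workers` threads.
    static long sumOfSquares(int workers, int count) {
        TinyExecutor executor = new TinyExecutor(workers);
        List<Future<Long>> futures = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            final long n = i;
            futures.add(executor.submit(() -> n * n));
        }
        long sum = 0;
        try {
            for (Future<Long> f : futures) {
                sum += f.get(); // blocks until a worker has completed this Future
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(4, 1000));
    }
}
```

Here a thousand Futures are served by four threads. A queued task costs one object on the heap, and no thread exists for it until a worker picks it up.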
Should I submit millions of Futures to ExecutorService?
You can, but you should evaluate the possible time overhead. The overhead of handling each separate Future object is small, but greater than zero, so the fewer tasks, the better. On the other hand, when the number of tasks drops below the number of processors (that is, the number of processor cores, taking hyperthreading into account), the level of parallelism is reduced and the overall execution time increases.
Suppose you have 1 million 10 ms tasks and your computer has 8 cores. Then the overall execution time of 1250 s is increased by (10*8/2) = 40 ms because of decreasing parallelism at the end, plus 125 ms for task switching (I estimate it at as little as 1 µs per task switch). If you have 100000 100 ms tasks, the execution time is still expected to be 1250 s, plus 400 ms for the tail and 12.5 ms for switching. Either way, the time overhead is negligible, but it can increase if your tasks are significantly shorter or longer than the 10 to 100 ms range.
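The arithmetic in this estimate can be reproduced with a few lines of code. The sketch below only encodes the answer's own assumptions (a tail penalty of task length times cores divided by two, and 1 µs per task switch spread over the cores); the class and method names are made up for illustration, and nothing here is a measurement.

```java
// Back-of-the-envelope model of the estimate above; not a benchmark.
class OverheadEstimate {
    // Ideal wall-clock time in ms when `tasks` tasks of `taskMs` each are spread evenly over `cores`.
    static double idealMs(long tasks, double taskMs, int cores) {
        return tasks * taskMs / cores;
    }

    // Tail penalty as estimated in the answer: taskMs * cores / 2.
    static double tailMs(double taskMs, int cores) {
        return taskMs * cores / 2;
    }

    // Switching overhead in ms, assuming 1 microsecond per task, spread over the cores.
    static double switchMs(long tasks, int cores) {
        return tasks / 1000.0 / cores;
    }

    public static void main(String[] args) {
        int cores = 8;
        long[] taskCounts = {1_000_000L, 100_000L};
        double[] taskLengthsMs = {10, 100};
        for (int i = 0; i < taskCounts.length; i++) {
            System.out.printf("%d tasks of %.0f ms: ideal %.0f s, tail %.0f ms, switching %.1f ms%n",
                    taskCounts[i], taskLengthsMs[i],
                    idealMs(taskCounts[i], taskLengthsMs[i], cores) / 1000,
                    tailMs(taskLengthsMs[i], cores),
                    switchMs(taskCounts[i], cores));
        }
    }
}
```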

Does Java Timer create a new thread?

I created a Timer object scheduled to run every 1 second, and the run method takes 20 seconds to complete. The Timer.schedule method works as expected: it starts the next execution immediately after the previous one completes, 20 seconds later.
But the Timer.scheduleAtFixedRate method also behaves in the same way. This is what is in the documentation:
In fixed-rate execution, each execution is scheduled relative to the scheduled execution time of the initial execution. If an execution is delayed for any reason (such as garbage collection or other background activity), two or more executions will occur in rapid succession to "catch up."
I expected multiple threads to be spun up to catch up, but this is not happening.
How can this be explained? What is a good example to demonstrate the difference between these methods?
Java documentation for the Timer class:
Corresponding to each Timer object is a single background thread that is used to execute all of the timer's tasks, sequentially. Timer tasks should complete quickly. If a timer task takes excessive time to complete, it "hogs" the timer's task execution thread. This can, in turn, delay the execution of subsequent tasks, which may "bunch up" and execute in rapid succession when (and if) the offending task finally completes.
The expectation that additional threads will be created to catch up is incorrect. According to the documentation, Timer tasks should complete quickly. A Timer task should not take 20 seconds to complete. An alternative is the ScheduledThreadPoolExecutor class:
A ThreadPoolExecutor that can additionally schedule commands to run after a given delay, or to execute periodically. This class is preferable to Timer when multiple worker threads are needed, or when the additional flexibility or capabilities of ThreadPoolExecutor (which this class extends) are required.
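The single background thread mentioned in the Timer documentation is easy to observe. The following is a minimal sketch (the class name TimerThreadDemo and the thread name "demo-timer" are arbitrary choices for this example): it schedules several independent tasks on one Timer and records which threads ran them.

```java
import java.util.Set;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class TimerThreadDemo {
    // Schedules `taskCount` one-shot tasks on a single Timer and returns the names
    // of all threads that executed them.
    static Set<String> threadsUsed(int taskCount) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(taskCount);
        Timer timer = new Timer("demo-timer", true); // true = run the timer thread as a daemon
        for (int i = 0; i < taskCount; i++) {
            timer.schedule(new TimerTask() {
                @Override
                public void run() {
                    names.add(Thread.currentThread().getName());
                    done.countDown();
                }
            }, 0);
        }
        try {
            done.await(5, TimeUnit.SECONDS); // wait until every task has run
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            timer.cancel(); // stops the timer thread
        }
        return names;
    }

    public static void main(String[] args) {
        // Every task reports the same thread name, however many tasks are scheduled.
        System.out.println(threadsUsed(5));
    }
}
```

Because all tasks share that one thread, a 20-second task blocks everything else on the same Timer, whichever scheduling method was used.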
To answer the second question: The difference is that the schedule method "schedules the specified task for repeated fixed-delay execution" and the scheduleAtFixedRate method "schedules the specified task for repeated fixed-rate execution". This answer explains this difference well.
Yes, a Java Timer object can be created so that its associated tasks run on a daemon thread.
https://www.geeksforgeeks.org/java-util-timer-class-java/

Configure a threadpool for scheduling tasks and their cancellations also

In our current service architecture, we have a single scheduled threadpool, that is used for executing some computation tasks (for real-time data gathering). These tasks are time-bound, so in case they don't complete in a set amount of time, they are interrupted by scheduling cancellation tasks, which are submitted along with the original tasks. The issue we're seeing is that the cancellations are not always on time, and that some of the tasks can take longer to be cancelled than expected.
My guess is that this happens because we are using a common threadpool, where the cancellation tasks are sometimes not picked up on time. But I'm not sure about this. Is there a way to confirm this, or to find the actual cause of the delays in cancellations?
- they are interrupted by scheduling cancellation tasks
Have you scheduled a task on a scheduled threadpool to interrupt the original?
If so, I think the scheduled thread pool only guarantees to wait at least for the delay period, i.e. there is no guarantee that the task will run precisely when the delay period ends.
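One way to test the shared-pool hypothesis from the question is to move the cancellations onto their own scheduler, so that they never queue behind the computation tasks. The sketch below assumes that setup; the class TimeBoundedRunner, its method names and the pool sizes are hypothetical. Note that cancel(true) only interrupts the worker thread, so a task that ignores interruption will still overrun.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper: work runs on one pool, cancellations on a dedicated scheduler.
class TimeBoundedRunner {
    private final ExecutorService workers = Executors.newFixedThreadPool(4);
    private final ScheduledExecutorService canceller = Executors.newSingleThreadScheduledExecutor();

    // The timeout is counted from submission, so time spent queued in `workers` counts against it.
    Future<?> submitWithTimeout(Runnable task, long timeout, TimeUnit unit) {
        Future<?> work = workers.submit(task);
        canceller.schedule(() -> { work.cancel(true); }, timeout, unit); // no-op if already done
        return work;
    }

    void shutdown() {
        workers.shutdownNow();
        canceller.shutdownNow();
    }

    // Demo: a task that would sleep for 10 s is interrupted after roughly 100 ms.
    static boolean longTaskIsInterrupted() {
        TimeBoundedRunner runner = new TimeBoundedRunner();
        AtomicBoolean interrupted = new AtomicBoolean(false);
        CountDownLatch finished = new CountDownLatch(1);
        runner.submitWithTimeout(() -> {
            try {
                Thread.sleep(10_000);
            } catch (InterruptedException e) {
                interrupted.set(true); // the cancellation arrived as an interrupt
            } finally {
                finished.countDown();
            }
        }, 100, TimeUnit.MILLISECONDS);
        try {
            finished.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            runner.shutdown();
        }
        return interrupted.get();
    }

    public static void main(String[] args) {
        System.out.println("interrupted: " + longTaskIsInterrupted());
    }
}
```

If cancellations become punctual with a dedicated scheduler, contention in the common pool was the likely cause. If they are still late, the tasks themselves are probably slow to react to the interrupt.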

Using a ScheduledThreadPool with parallel execution

Currently, I have an application that collects data every second and sends it to an API endpoint. To run every second, I am using a ScheduledThreadPoolExecutor that runs the task which sends the data. The issue is that sending the data sometimes takes more than one second, so the next batch of data is collected more than a second later. Is there any way this can be changed (or can other libraries be used) so that even if one thread has not finished sending the data, another thread can start running in parallel?
The usual way to deal with the desire for overlapping executions of the same scheduled task is to execute the (time consuming) business logic of the task asynchronously.
In other words, when the once-per-second task is triggered, submit the real work to an ExecutorService (either the one you are using for the scheduled tasks or another one). This way, the scheduled task has already finished its work (to queue the actual work) long before it is time for it to execute again.
Separate out the data collection and send tasks.
Run data collection on a separate thread pool (or a single scheduled thread) and submit the data to another pool whose job is to publish it.
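A minimal sketch of this hand-off, assuming a cached thread pool for the slow work (the class name HandOffDemo and the helper maxOverlap are made up for this example): the scheduled task only submits the work, so ticks stay on schedule while several sends overlap.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class HandOffDemo {
    // Triggers work every `periodMs`; each piece of work takes `workMs`.
    // Returns the highest number of work items that were running at the same time.
    static int maxOverlap(long periodMs, long workMs, long runForMs) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        ExecutorService workers = Executors.newCachedThreadPool();
        AtomicInteger running = new AtomicInteger();
        AtomicInteger max = new AtomicInteger();

        Runnable slowWork = () -> {
            max.accumulateAndGet(running.incrementAndGet(), Math::max);
            try {
                Thread.sleep(workMs); // stands in for sending the data
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                running.decrementAndGet();
            }
        };

        // The scheduled task returns as soon as the work is queued.
        scheduler.scheduleAtFixedRate(() -> workers.submit(slowWork), 0, periodMs, TimeUnit.MILLISECONDS);

        try {
            Thread.sleep(runForMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        scheduler.shutdownNow();
        workers.shutdownNow();
        return max.get();
    }

    public static void main(String[] args) {
        // Work takes three times the period, so several items run in parallel.
        System.out.println("max concurrent work items: " + maxOverlap(100, 300, 600));
    }
}
```

An unbounded pool like this one can grow without limit if the work is consistently slower than the period, so a bounded pool or some other throttle is worth considering in real code.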
Assuming you are not concerned about out of order invocations on the "API endpoint" then you can create the ScheduledThreadPoolExecutor with a corePoolSize > 1. In this way, every time the scheduler kicks in it will use the first available thread in the pool. And given a corePoolSize > 1 you would need several invocations to take more than 1s before you'd run out of threads.
For additional context: a ScheduledThreadPoolExecutor has a scheduling thread which checks for tasks and on finding one it delegates the task to a worker thread from its internal pool. If the internal pool has a single thread (i.e. corePoolSize=1) then all tasks are executed serially and you cannot guarantee that the tasks will be executed every _wait_period_ (though you can be certain about ordering). If you want to insist on the tasks running on schedule and you are not concerned about ordering then you can configure the pool with a corePoolSize which ensures that there is always an available thread in the 'worker' pool every time the scheduler finds a task.
Edit 1: if you are using scheduleAtFixedRate then the other answer, which refers to delegating the scheduled invocation to a separate thread pool, is an option. If you adopt this approach then corePoolSize=1 will be sufficient since the 'worker' thread is then only responsible for delegating the task to a separate pool.

I'm confused about ScheduledThreadPoolExecutor.scheduleAtFixedRate and possibly concurrency. Why is there a thread pool?

My confusion starts with this snippet from the API description for the method scheduleAtFixedRate.
If any execution of this task takes longer than its period, then subsequent executions may start late, but will not concurrently execute.
If there will be no concurrent execution, why is there a Thread pool?
Also, is there a way to get concurrent execution? I want them running at the exact period even if the prior task hasn't finished yet. I want concurrent execution.
The documentation should be read as:
..subsequent executions [of the supplied Runnable task, per registration] .. will not concurrently execute
This does not mean that there is no concurrency across all scheduled tasks. Rather, for each task (created by an invocation of scheduleAtFixedRate), the Runnable only executes on one thread at a time, even if the execution time overruns the interval.
This is an explicit design choice, as in most situations concurrent execution of task callbacks is undesirable and leads to out-of-control resource spirals. For instance, a "task bomb" could form if an increasing number of (the same) Runnables were executed concurrently.
The Thread Pool moniker is accurate, as the implementation does what it is advertised to do: reuse threads across the execution of tasks.
While there is no standard [Thread Pool] Executor that has the requested behavior, it can be emulated in a limited fashion. This is because the concurrency restriction applies per registered task (not per Runnable), and multiple 'identical' tasks can be registered:
long targetPeriod = ..;
long n = targetPeriod * 2;
task1 = executor.scheduleAtFixedRate(runnable, 0, n, ..);
task2 = executor.scheduleAtFixedRate(runnable, n/2, n, ..);
The actual execution/timing behavior depends on various other factors, but both of these registered tasks could execute concurrently if they take over ~n/2 (the target period) to execute.
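A runnable variant of this idea, with concrete values filled in purely for illustration (the millisecond figures, the class name StaggeredRegistrations and the generalisation to an arbitrary number of registrations are assumptions made here, not part of the answer). It counts how many executions of the same Runnable are in flight at once.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class StaggeredRegistrations {
    // Registers the same Runnable `registrations` times, each with period
    // registrations * targetPeriodMs and an initial delay staggered by targetPeriodMs.
    // Returns the highest number of simultaneous executions observed.
    static int maxOverlap(int registrations, long targetPeriodMs, long workMs, long runForMs) {
        ScheduledExecutorService executor = Executors.newScheduledThreadPool(registrations);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger max = new AtomicInteger();

        Runnable runnable = () -> {
            max.accumulateAndGet(running.incrementAndGet(), Math::max);
            try {
                Thread.sleep(workMs); // deliberately longer than the target period
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                running.decrementAndGet();
            }
        };

        long n = targetPeriodMs * registrations;
        for (int k = 0; k < registrations; k++) {
            executor.scheduleAtFixedRate(runnable, k * targetPeriodMs, n, TimeUnit.MILLISECONDS);
        }

        try {
            Thread.sleep(runForMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        executor.shutdownNow();
        return max.get();
    }

    public static void main(String[] args) {
        // One registration never overlaps with itself; two staggered registrations can.
        System.out.println("1 registration:  " + maxOverlap(1, 200, 300, 1000));
        System.out.println("2 registrations: " + maxOverlap(2, 200, 300, 1000));
    }
}
```

The overlap is capped at the number of registrations, which is why the answer calls the emulation limited.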
is there a way to get concurrent execution? I want them running at the exact period even if the prior task hasn't finished yet. I want concurrent execution.
You could launch another thread for (or submit to another executor) the actual task from the scheduled task (assuming the rate is long enough to launch a thread or submit the task). If the task consistently takes longer than the rate you want you may eventually run out of resources (as described in the other answer).
For example:
ScheduledExecutorService ex = Executors.newScheduledThreadPool(1);
// The scheduled task only starts a new thread, so it returns almost immediately
ex.scheduleAtFixedRate(() -> {
    new Thread(() -> {
        // do task
    }).start();
}, 0, 1, TimeUnit.MINUTES);
