I'd like to use some lightweight task management (e.g. ScheduledThreadPoolExecutor) to periodically run tasks which might block (e.g. while waiting to acquire a monitor/lock).
In such a case the task management should detect that situation and spawn another task/thread of the same kind as the one which blocks.
How can this be achieved?
And as a bonus question:
The documentation of ScheduledThreadPoolExecutor states that "If any execution of the task encounters an exception, subsequent executions are suppressed". In my case I would rather restart the task which failed. Is there a way to alter this behaviour?
For the first question: Use a java.util.concurrent.locks.Lock and call tryLock() with a timeout. If the timeout expires (say, after 5 seconds), then create a new task of the same kind as the current one, pass it to the executor, and go back to waiting for the lock, this time in a blocking way.
For the second question, I would consider enclosing the scheduled job in a big try/catch block to prevent unexpected exceptions from bubbling up to the executor itself.
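A rough sketch of the try/catch idea, under assumptions: the wrapper below (the names GuardedTask and guarded are made up for illustration, they are not JDK API) catches runtime exceptions inside the job, so the executor never sees them and keeps scheduling later runs. How a failure should be logged or counted is left open here.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the JDK.
final class GuardedTask {

    // Wraps a job so that a RuntimeException is logged instead of reaching the executor.
    static Runnable guarded(Runnable job) {
        return () -> {
            try {
                job.run();
            } catch (RuntimeException e) {
                // Swallowing the exception here keeps the periodic schedule alive.
                System.err.println("Job failed, next run stays scheduled: " + e);
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        // Without guarded(...), the first exception would suppress all later executions.
        scheduler.scheduleAtFixedRate(
                guarded(() -> { throw new IllegalStateException("boom"); }),
                0, 100, TimeUnit.MILLISECONDS);
        Thread.sleep(350);       // let the job fail a few times
        scheduler.shutdownNow(); // stop the demo
    }
}
```

Catching only RuntimeException is a design choice of this sketch; Errors still propagate and would still end the schedule.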
Have you thought of using third-party open source software? The Quartz scheduler (http://www.quartz-scheduler.org/) is very flexible and is worth trying.
I'm trying to run a number of jobs concurrently using Java's ForkJoinPool. The main task (which is already running in the pool) spawns all the jobs and then does a series of joins. I was sure that a task calling join would free the thread it is running in, but it seems like it is actually blocked on it, and therefore it is "wasting" the thread, i.e., since the number of threads equals the number of CPU cores, one core will be inactive.
I know that if I run invokeAll instead, then the first of the sub-jobs gets to run in the same thread, and indeed this works. However, this seems sub-optimal, because if the first task is actually a very fast one, I have the same problem: one of the threads is blocked waiting on join. There are more jobs than threads, so I would rather another one of the jobs got started.
I can try to bypass all this manually, but it's not so nice, and it seems like I am redoing what ForkJoinPool is supposed to do.
So the question is: Am I understanding ForkJoinPool wrong? Or, if what I'm saying is correct, is there a simple way to utilize the threads more efficiently?
ForkJoinPool is designed to prevent you having to think about thread utilization in this way. The 'work stealing' algorithm ensures that each thread is always busy so long as there are tasks in the queue.
Check out these notes for a high-level discussion:
https://www.dre.vanderbilt.edu/~schmidt/cs891f/2018-PDFs/L4-ForkJoinPool-pt3.pdf
To see the ugly details go down the rabbit hole of the ForkJoinPool#awaitJoin source.
Roughly, if I'm reading the (very complex) code correctly: When a thread joins a sub-task, it attempts to complete that task itself, otherwise if the sub-task's worker queue is non-empty (i.e. it is also depending on other tasks), the joining thread repeatedly attempts to complete one of those tasks, via ForkJoinPool#tryHelpStealer, whose Javadoc entry provides some insight:
Tries to locate and execute tasks for a stealer of the given
task, or in turn one of its stealers, Traces currentSteal ->
currentJoin links looking for a thread working on a descendant
of the given task and with a non-empty queue to steal back and
execute tasks from. The first call to this method upon a
waiting join will often entail scanning/search, (which is OK
because the joiner has nothing better to do), but this method
leaves hints in workers to speed up subsequent calls. The
implementation is very branchy to cope with potential
inconsistencies or loops encountering chains that are stale,
unknown, or so long that they are likely cyclic.
Notice that ForkJoinTask does not extend Thread, so 'blocking' of the join operation means something different here than usual. It doesn't mean that the underlying thread is in a blocked state, rather it means that the computation of the current task is held up further up the call stack while join goes off and attempts to resolve the tree of sub-tasks impeding progress.
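To see this behaviour without reading the pool's source, here is a minimal sketch (RangeSum, its threshold and the sum helper are made-up names for illustration). It runs a fork/join recursion on a pool with a parallelism of one. If join really parked the only worker thread, the computation could never finish; it does finish, because the joining thread runs the forked work itself.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example task: sums the integers in [from, to).
final class RangeSum extends RecursiveTask<Long> {
    private static final long THRESHOLD = 10_000;
    private final long from; // inclusive
    private final long to;   // exclusive

    RangeSum(long from, long to) {
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (long i = from; i < to; i++) {
                sum += i;
            }
            return sum;
        }
        long mid = from + (to - from) / 2;
        RangeSum left = new RangeSum(from, mid);
        RangeSum right = new RangeSum(mid, to);
        left.fork();                     // queue the left half
        long rightSum = right.compute(); // work on the right half directly
        return left.join() + rightSum;   // join runs the queued half itself if nobody stole it
    }

    // Sums 0..n-1 on a pool with the given parallelism.
    static long sum(long n, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.invoke(new RangeSum(0, n));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // A single worker thread is enough; prints 499999500000
        System.out.println(sum(1_000_000, 1));
    }
}
```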
A few words about what I'm planning to do. I need to create a task executor that will poll tasks from a queue and just execute the code in each task. For this I need to implement some interrupt mechanism that lets the user stop a task.
So I see two possible solutions: 1. start a pool of threads and stop them by using .destroy() method of a thread. (I will not use any shared objects) 2. Use pool of separated processes and System.exit() or kill signal to process. Option 2. looks much safer for me as I can ensure that thread killing will not lead to any concurrency problems. But I'm not sure that it won't produce a big overhead.
Also, I'm not sure about the JVM: if I use separate processes, each process will run its own JVM, and that may bring a lot of overhead. Or not. So that is my question. Choosing a different language without a runtime for the worker processes is a possible option for me, but I still don't have enough experience with processes and don't know about the overhead.
start a pool of threads and stop them by using .destroy() method of a thread. (I will not use any shared objects)
You can't stop threads on modern VMs unless said thread is 'in on it'. destroy and friends do not actually do what you want, and this is unsafe. The right way is to call interrupt(). If the thread wants to annoy you and not actually stop in the face of an interrupt call, it can. The solution is to fix the code so that it doesn't do that anymore. Note that raising the interrupt flag is guaranteed to stop any blocking method which is specced to throw InterruptedException (sleep, wait, etc), and on most OSes it will also cause any I/O call that is currently frozen to exit by throwing an IOException, but there is no guarantee for this.
Use pool of separated processes and System.exit() or kill signal to process.
Hella expensive; a VM is not a light thing to spin up; it'll have its own copy of all the classes (even something as simple as java.lang.String and company). 10 VMs is a stretch. Whereas 1000 threads is no problem.
And for this I need to implement some interrupt mechanism to enable user to stop this task.
The real problem is that this is very difficult to guarantee. But if you control the code that needs interrupting, then usually no big deal. Just use the interrupt() mechanism.
EDIT: In case you're wondering how to do the interrupt thing: Raising the interrupt flag on a thread just raises the flag; nothing else happens unless you write code that interacts with it, or call a method that does.
There are 3 main interactions:
All things that block and are declared to throw InterruptedException will lower the flag and throw InterruptedException. If the flag is up and you call Thread.sleep, that will immediately clear the flag and throw that exception without ever even waiting. Thus, catch that exception, and return/abort/break off the task.
Thread.interrupted() will lower the flag and return true (thus, it does so only once). Put this in your event loops. It's not public void run() {while (true) { ... }} or while (running) {} or whatnot, it's while (!Thread.interrupted()) or possibly while (running && !Thread.interrupted()).
Any other blocking method may or may not react; java intentionally doesn't specify either way because it depends on OS and architecture. If they do (and many do), they can't throw InterruptedException, as e.g. FileInputStream.read isn't specced to throw it. They throw IOException with a message indicating an abort happened.
Ensure that these 3 code paths one way or another lead to a task that swiftly ends, and you have what you want: user-interruptible tasks.
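The first two code paths can be sketched roughly like this (InterruptibleWorker and stopsWithin are made-up names, and the sleep is only a stand-in for real work). The third path, I/O calls that fail with an IOException, is not shown because it depends on the OS.

```java
// Hypothetical worker that ends promptly once its thread is interrupted.
final class InterruptibleWorker implements Runnable {

    @Override
    public void run() {
        // Path 2: check (and lower) the flag at the top of the event loop.
        while (!Thread.interrupted()) {
            try {
                // Stand-in for one unit of real work that may block.
                Thread.sleep(50);
            } catch (InterruptedException e) {
                // Path 1: the blocking call lowered the flag and threw; end the task.
                return;
            }
        }
    }

    // Starts a worker, interrupts it, and reports whether it ended within the timeout.
    static boolean stopsWithin(long timeoutMillis) {
        Thread worker = new Thread(new InterruptibleWorker());
        worker.start();
        worker.interrupt();
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("stopped: " + stopsWithin(2_000));
    }
}
```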
Executors framework
Java already provides a facility with your desired features, the Executors framework.
You said:
I need to create some task executor, that will poll tasks from queue and just execute code in this task.
The ExecutorService interface does just that.
Choose an implementation meeting your needs from the Executors class. For example, if you want to run your tasks in the sequence of their submission, use a single-threaded executor service. You have several others to choose from if you want other behavior.
ExecutorService executorService = Executors.newSingleThreadExecutor() ;
You said:
start a pool of threads
The executor service may be backed by a pool of threads.
ExecutorService executorService = Executors.newFixedThreadPool( 3 ) ; // Create a pool of exactly three threads to be used for any number of submitted tasks.
You said:
just execute code in this task
Define your task as a class implementing either Runnable or Callable. That means your class carries a run method, or a call method.
Runnable task = ( ) -> System.out.println( "Doing this work on a background thread. " + Instant.now() );
You said:
will poll tasks from queue
Submit your tasks to be run. You can submit many tasks, either of the same class or of different classes. The executor service maintains a queue of submitted tasks.
executorService.submit( task );
Optionally, you may capture the Future object returned.
Future<?> future = executorService.submit( task );
That Future object lets you check to see if the task has finished or has been cancelled.
if( future.isDone() ) { … }
You said:
enable user to stop this task
If you want to cancel the task, call Future::cancel.
Pass true if you want to interrupt the task if it has already begun execution.
Pass false if you only want to cancel the task before it has begun execution.
future.cancel( true );
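Putting the pieces together, a minimal sketch (CancelDemo and cancelLongTask are made-up names; the 60-second sleep stands in for long-running work). Note that cancel( true ) only interrupts the worker thread, so the task itself has to react to the interrupt for the cancellation to take effect quickly.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class.
final class CancelDemo {

    // Returns true if the task was cancelled and the executor terminated in time.
    static boolean cancelLongTask() {
        ExecutorService executorService = Executors.newSingleThreadExecutor();
        try {
            Future<?> future = executorService.submit(() -> {
                try {
                    Thread.sleep(60_000); // stand-in for long-running work
                } catch (InterruptedException e) {
                    // cancel(true) interrupted the worker thread: stop promptly.
                    Thread.currentThread().interrupt();
                }
            });
            future.cancel(true);       // interrupt the task if it has already started
            executorService.shutdown(); // accept no further tasks
            boolean terminated = executorService.awaitTermination(5, TimeUnit.SECONDS);
            return future.isCancelled() && terminated;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executorService.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("cancelled cleanly: " + cancelLongTask());
    }
}
```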
You said:
looks much safer for me as I can ensure that thread killing will not lead to any concurrency problems.
Using the Executors framework, you would not be creating or killing any threads. The executor service implementation handles the threads. Your code never addresses the Thread class directly.
So no concurrency problems of that kind.
But you may have other concurrency problems if you share any resources across threads. I highly recommend reading Java Concurrency in Practice by Brian Goetz et al.
You said:
But I'm not sure that it won't produce a big overhead.
As the correct Answer by rzwitserloot explained, your approach would certainly create much more overhead than would the use of the Executors framework.
FYI, in the future Project Loom will bring virtual threads (fibers) to the Java platform. This will generally make background threading even faster, and will make practical having many thousands or even millions of non-CPU-bound tasks. Special builds available now on early-access Java 16.
ExecutorService executorService = newVirtualThreadExecutor() ;
executorService.submit( task ) ;
I had some queries regarding Future usage. Please go through below example before addressing my queries.
http://javarevisited.blogspot.in/2015/01/how-to-use-future-and-futuretask-in-Java.html
The main purpose of using thread pools & Executors is to execute tasks asynchronously without blocking the main thread. But once you use Future, it blocks the calling thread. Do we have to create a separate new thread/thread pool to analyse the results of Callable tasks? Or is there any other good solution?
Since the Future call blocks the caller, is it worth using this feature? If I want to analyse the result of a task, I can make a synchronous call and check its result without Future.
What is the best way to handle rejected tasks with a RejectedExecutionHandler? If a task is rejected, is it good practice to submit the task to another Thread or ThreadPool, or to submit the same task to the current ThreadPoolExecutor again?
Please correct me if my thought process is wrong about this feature.
Your question is about performing an action when an asynchronous action has been done. Futures on the other hand are good if you have an unrelated activity which you can perform while the asynchronous action is running. Then you may regularly poll the action represented by the Future via isDone() and do something else if not or call the blocking get() if you have no more unrelated work for your current thread.
If you want to schedule an on-completion action without blocking the current thread, you may instead use CompletableFuture which offers such functionality.
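A small sketch of that idea (OnCompletionDemo and describeAsync are made-up names, and the doubling is a stand-in for real work): the follow-up step is attached to the future instead of being run after a blocking get().

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical demo class.
final class OnCompletionDemo {

    // Computes asynchronously, then formats the result once it is available.
    static CompletableFuture<String> describeAsync(int input) {
        return CompletableFuture
                .supplyAsync(() -> input * 2)               // runs on the common pool by default
                .thenApply(doubled -> "result=" + doubled); // on-completion step, caller not blocked
    }

    public static void main(String[] args) {
        CompletableFuture<Void> done = describeAsync(21).thenAccept(System.out::println);
        // The caller could do unrelated work here.
        done.join(); // only so the demo JVM does not exit before the callback ran; prints result=42
    }
}
```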
CompletableFuture is the solution for queries 1 and 2, as suggested by @Holger.
I want to add an update about the RejectedExecutionHandler mechanism regarding query 3.
Java provides four predefined handler policies, as per the javadocs.
In the default ThreadPoolExecutor.AbortPolicy, the handler throws a runtime RejectedExecutionException upon rejection.
In ThreadPoolExecutor.CallerRunsPolicy, the thread that invokes execute itself runs the task. This provides a simple feedback control mechanism that will slow down the rate that new tasks are submitted.
In ThreadPoolExecutor.DiscardPolicy, a task that cannot be executed is simply dropped.
In ThreadPoolExecutor.DiscardOldestPolicy, if the executor is not shut down, the task at the head of the work queue is dropped, and then execution is retried (which can fail again, causing this to be repeated.)
CallerRunsPolicy: The rejected task runs on the thread that called execute (often, but not necessarily, the main thread), so that thread cannot submit further work until the task finishes. If you have many tasks in the task queue, this policy will degrade throughput, so be careful. If running the rejected task is critical for your application and you have a limited task queue, you can use this policy.
DiscardPolicy: If discarding a non-critical event does not bother you, then you can use this policy.
DiscardOldestPolicy: Discards the oldest queued task (the one at the head of the work queue) and then retries submitting the new one.
If none of them suits your needs, you can implement your own RejectedExecutionHandler.
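As a rough sketch of a custom handler (CountingRejectionHandler and its demo method are made-up names; what to do with a rejected task, e.g. logging, retrying later or persisting it, is an assumption that depends on your application), this one only counts and logs rejections. The demo uses one worker and a queue of capacity one, so the third task is rejected.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical handler that counts and logs rejected tasks instead of throwing.
final class CountingRejectionHandler implements RejectedExecutionHandler {
    private final AtomicInteger rejected = new AtomicInteger();

    @Override
    public void rejectedExecution(Runnable task, ThreadPoolExecutor executor) {
        rejected.incrementAndGet();
        System.err.println("Rejected " + task + " (queue size " + executor.getQueue().size() + ")");
    }

    int rejectedCount() {
        return rejected.get();
    }

    // One worker plus a queue of capacity one: the third task has nowhere to go.
    static int demo() {
        CountingRejectionHandler handler = new CountingRejectionHandler();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(1), handler);
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocker = () -> {
            try {
                release.await(); // keep the worker busy until released
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        pool.execute(blocker); // occupies the only worker
        pool.execute(blocker); // fills the queue
        pool.execute(blocker); // rejected, ends up in the handler
        release.countDown();
        pool.shutdown();
        return handler.rejectedCount();
    }

    public static void main(String[] args) {
        System.out.println("rejected tasks: " + demo());
    }
}
```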
Use case: tasks are generated in one thread, need to be distributed for computation to many threads and finally the generating task shall reap the results and mark the tasks as done.
I found the class ExecutorCompletionService which fits the use case nearly perfectly --- except that I see no good solution for non-idle waiting. Let me explain.
In principle my code would look like
while (true) {
    MyTask t = generateNextTask();
    if (t != null) {
        completionService.submit(t);
    }
    MyTask finished;
    while (null != (finished = completionService.poll())) {
        retireTask(finished);
    }
}
Both generateNextTask() and completionService.poll() may return null: the former if there are currently no new tasks available, the latter if no task has currently returned from the CompletionService.
In these cases, the loop degenerates into an ugly idle-wait. I could poll() with a timeout or add a Thread.sleep() for the double-null case, but I consider this a bad workaround, because it nevertheless wastes CPU and is not as responsive as possible, due to the wait.
Suppose I replace generateNextTask() by a poll() on a BlockingQueue. Is there a good way to poll the queue as well as the CompletionService in parallel, so as to be woken up for work on whichever end something becomes available?
Actually this reminds me of Selector. Is something like it available for queues?
You should use CompletionService.take() to wait until the next task completes and retrieve its Future. poll() is the non-blocking version, returning null if no task is currently completed.
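A minimal sketch of a take() loop (TakeLoop, squaresInCompletionOrder and the squaring tasks are made up for illustration; the pool size of 4 is arbitrary). The consuming thread sleeps inside take() until a result is ready, so there is no idle spinning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class.
final class TakeLoop {

    // Submits n squaring tasks and collects the results in completion order.
    static List<Integer> squaresInCompletionOrder(int n) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CompletionService<Integer> completionService = new ExecutorCompletionService<>(pool);
        try {
            for (int i = 1; i <= n; i++) {
                final int value = i;
                completionService.submit(() -> value * value);
            }
            List<Integer> results = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                // take() blocks until some task has completed; no busy waiting.
                results.add(completionService.take().get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for results", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // The order of the printed values may vary from run to run.
        System.out.println(squaresInCompletionOrder(5));
    }
}
```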
Also, your code seems to be inefficient, because you produce and consume tasks one at a time, instead of allowing multiple tasks to be processed in parallel. Consider having a different thread for task generation and for task results consumption.
-- Edit --
I think that given the constraints you mention in your comments, you can't achieve all your requirements.
Requiring the main thread to be producer and consumer, and disallowing any busy loop or timed loop, you can't avoid the scenario where a blocking wait for a task completion takes too long and no other task gets processed in the meanwhile.
Since you "can replace generateNextTask() by a poll() on a BlockingQueue", I assume incoming tasks can be put in a queue by some other thread, and the problem is, you cannot execute take() on 2 queues simultaneously. The solution is to simply put both incoming and finished tasks in the same queue. To differentiate, wrap them in objects of different types, and then check that type in the loop after take().
This solution works, but we can go further. You said you don't want to use 2 threads for handling tasks - then you can use zero threads. Let the wrappers implement Runnable and, instead of checking the type, just call take().run(). This way your thread becomes a single-threaded Executor. But we already have an Executor (CompletionService), can we use it? The problem is, handling of incoming and finished tasks should be done serially, not in parallel. So we need the SerialExecutor described in the Javadoc of java.util.concurrent.Executor, which accepts Runnables and executes them serially, but on another executor. This way no thread is wasted.
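The single-queue variant with take().run() could look roughly like the sketch below (SingleQueueLoop, runLoop and the doubling "computation" are made up for illustration; for brevity the incoming events are queued up front, whereas in the scenario above another thread would add them over time). Both kinds of events are Runnables in one BlockingQueue, so the loop thread blocks in exactly one place.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo class.
final class SingleQueueLoop {

    // Processes taskCount tasks and returns the sum of their results.
    static int runLoop(int taskCount) {
        BlockingQueue<Runnable> events = new LinkedBlockingQueue<>();
        ExecutorService workers = Executors.newFixedThreadPool(2);
        int[] retired = {0}; // only touched by the loop thread
        int[] total = {0};   // only touched by the loop thread

        for (int i = 0; i < taskCount; i++) {
            final int id = i;
            // "Incoming task" event: when the loop runs it, the work goes to the pool.
            events.add(() -> workers.execute(() -> {
                int result = id * 2; // stand-in for the real computation
                // "Finished task" event: posted back to the same queue.
                events.add(() -> {
                    total[0] += result;
                    retired[0]++;
                });
            }));
        }

        try {
            while (retired[0] < taskCount) {
                events.take().run(); // blocks until either kind of event is available
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            workers.shutdown();
        }
        return total[0];
    }

    public static void main(String[] args) {
        System.out.println(runLoop(5)); // 0 + 2 + 4 + 6 + 8, prints 20
    }
}
```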
And finally, you mentioned Selector as a possible solution. I must say, it is an outdated approach. Learn dataflow and actor computing. A nice introduction is here. Look at the Dataflow4java project of mine; it has a MultiPortActorTest.java example, where the class Accum does what you need, with all the boilerplate of wrapper Runnables and serial executors hidden in the supporting library.
What you need is a ListenableFuture from Guava. ListenableFutureExplained
I have a series of concurrent tasks to run. If any one of them fails, I want to interrupt them all and await termination. But assuming none of them fail, I want to wait for all of them to finish.
ExecutorCompletionService seems like almost what I want here, but there doesn't appear to be a way to tell if all of my tasks are done, except by keeping a separate count of the number of tasks. (Note that both of the examples in the Javadoc for ExecutorCompletionService keep track of the count "n" of the tasks, and use that to determine if the service is finished.)
Am I overlooking something, or do I really have to write this code myself?
Yes, you do need to keep track if you're using an ExecutorCompletionService. Typically, you would call get() on the futures to see if an error occurred. Without iterating over the tasks, how else could you tell that one failed?
If your series of tasks is of a known size, then you should use the second example in the javadoc.
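For the known-size case, a sketch in the spirit of that Javadoc example, not a copy of it (FailFastRunner and runAll are made-up names; the pool size and the 10-second wait are arbitrary choices): count down the submitted tasks with take(), and on the first failure cancel whatever is still outstanding and wait for the pool to terminate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical helper class.
final class FailFastRunner {

    // Returns true if every task completed normally, false as soon as one fails.
    static <T> boolean runAll(List<Callable<T>> tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CompletionService<T> completion = new ExecutorCompletionService<>(pool);
        List<Future<T>> futures = new ArrayList<>();
        boolean allSucceeded = true;
        try {
            for (Callable<T> task : tasks) {
                futures.add(completion.submit(task));
            }
            // The count of submitted tasks tells us when we are done.
            for (int remaining = futures.size(); remaining > 0; remaining--) {
                try {
                    completion.take().get(); // throws if that task failed
                } catch (ExecutionException e) {
                    allSucceeded = false;
                    break;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            allSucceeded = false;
        } finally {
            for (Future<T> future : futures) {
                future.cancel(true); // interrupts running tasks, no effect on finished ones
            }
            pool.shutdown();
            try {
                pool.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return allSucceeded;
    }
}
```

The tasks still need to respond to interruption for the cancellation to take effect quickly.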
However, if you don't know the number of tasks which you will submit to the CompletionService, then you have a sort of Producer-Consumer problem. One thread is producing tasks and placing them in the ECS, another would be consuming the task futures via take(). A shared Semaphore could be used, allowing the Producer to call release() and the Consumer to call acquire(). Completion semantics would depend on your application, but a volatile or atomic boolean on the producer to indicate that it is done would suffice.
I suggest a Semaphore over wait/notify with poll() because there is a non-deterministic delay between the time a task is produced and the time that task's future is available for consumption. Therefore the consumer and producer need to be just slightly smarter.