I have an asynchronous task processor that I am looking to migrate to the new CompletableFuture API, as that initially seems a more natural way to manage my task dependencies. None of the guides on CompletableFuture that I've found cover my case, so I am looking for advice on the best approach:
This is a command line app: the user supplies a list of tasks, then waits for them to complete.
The task list is in the order of 50 tasks that do various data-imports against a database.
The tasks all return an import report containing a summary report of times, data processed etc... that gets output to the user.
While some of the tasks have no dependencies, most of them will depend on one or more other tasks completing (where more is an arbitrary number, most likely to be 2-3, but sometimes as high as 10-15).
If a task depends on other tasks, it always requires all of those tasks to have completed (so CompletableFuture.allOf).
If a task depends on other tasks it never depends on the import report that is returned, just on the database state, so doesn't need to consume the result of the previous Future.
Some of the tasks are quite long running
As the user is waiting, I want to be outputting some status every so often to let them know it is still running.
Tasks can fail with exceptions for a variety of reasons, no exception = success.
If a task fails, I need to stop dependent tasks executing
If a task fails, I'd like to stop any new tasks executing and ideally let the current tasks run to completion, then shutdown.
The current implementation uses a ThreadPoolExecutor and ExecutorCompletionService:
A dependency tree for all tasks is built
All tasks with no dependencies are submitted
The flow control code loops around an AtomicInteger of number of tasks in progress calling CompletionService.poll(timeout)
If it gets a completed task, it calls Future.get() to collect the report, then runs some dependency management code to submit any newly unblocked tasks to the executor (and adjusts the number of tasks in progress accordingly).
If a number of polls go by with nothing completing, it outputs a message to the user (there is also some timeout and give-up code, but it is basically the same).
In the case of an exception in Future.get(), it calls ExecutorService.shutdownNow() to clear the queue and stop any further tasks, then outputs the results it has along with as helpful an error message as it can make.
Once the number of tasks in progress hits 0 the main control loop finishes, the code performs a tidy shutdown and outputs a report for the user about what their tasks did.
The downside is there is a massive amount of boilerplate around this, managing the polling, exception handling and submitting relevant dependencies and the dependency manager and executor are very tightly coupled.
On the surface CompletableFuture seemed a very useful fit. Having built the dependency tree and ordered it, I can iterate over the tree and use CompletableFuture.supplyAsync() for tasks with no dependencies and when I hit a task with dependencies use CompletableFuture.allOf()
CompletableFuture<Report>[] dependencies = getBlockingTasks(task);
CompletableFuture<Void> allOf = CompletableFuture.allOf(dependencies);
CompletableFuture<Report> future = allOf.thenApplyAsync(v -> executeTask(task), threadPoolExecutor);
I can also chain thenApply and thenAccept so that each worker thread processes the Report and logs it.
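The wiring described above can be sketched roughly as follows. This is only an illustration under assumptions: the task names, the `deps` map (iterating in dependency order) and the string "report" are hypothetical stand-ins for the real dependency tree, executeTask(task) and Report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class DependencyGraphSketch {

    /**
     * Wires one future per task. `deps` maps a task name to the names of the
     * tasks it must wait for, and is assumed to iterate in dependency order.
     * Returns the task names in the order they actually ran.
     */
    static List<String> runAll(Map<String, List<String>> deps, ExecutorService pool) {
        List<String> ranInOrder = Collections.synchronizedList(new ArrayList<>());
        Map<String, CompletableFuture<String>> futures = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : deps.entrySet()) {
            String task = entry.getKey();
            CompletableFuture<?>[] blockers = entry.getValue().stream()
                    .map(futures::get)
                    .toArray(CompletableFuture[]::new);
            // allOf() with no arguments is already complete, so independent tasks start at once.
            CompletableFuture<String> report = CompletableFuture.allOf(blockers)
                    .thenApplyAsync(ignored -> {
                        ranInOrder.add(task);        // stand-in for executeTask(task)
                        return "report for " + task;
                    }, pool);
            futures.put(task, report);
        }
        // Block until every task has finished.
        CompletableFuture.allOf(futures.values().toArray(new CompletableFuture[0])).join();
        return new ArrayList<>(ranInOrder);
    }

    static List<String> demo() {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("loadUsers", Collections.emptyList());
        deps.put("loadOrders", Collections.emptyList());
        deps.put("linkOrders", Arrays.asList("loadUsers", "loadOrders"));
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            return runAll(deps, pool);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because a stage created with thenApplyAsync is skipped when the stage it depends on completes exceptionally, a failed task keeps its dependents from running without extra code.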
The main area I am not sure of is the exception handling and polling / waiting. If I didn't want to give updates when things are taking a while I would use:
CompletableFuture<Void> allOf = CompletableFuture.allOf(allMyTasks);
allOf.join();
I am not sure about the best way to manage exception handling at all.
So (other than whether the snippets I have posted are sensible), what is the best way to keep track of the completion state of my tasks, so I can continue to output things to the user if things are taking a while? What is the best way to manage exceptions? And, perhaps most importantly, is this actually a sensible pattern, or am I being drawn in by the "new and shiny and does some things I want well"?
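One possible way to combine the periodic status output with exception handling is to replace join() with a timed get() in a loop. The sketch below assumes a hypothetical helper awaitWithStatus and a fixed polling interval; it is one option among several, not an established pattern from the CompletableFuture documentation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class StatusWaitSketch {

    /** Waits for `all`, printing a status line after each timeout; returns how many were printed. */
    static int awaitWithStatus(CompletableFuture<?> all, long pollMillis) {
        int statusLines = 0;
        while (true) {
            try {
                all.get(pollMillis, TimeUnit.MILLISECONDS);
                return statusLines;                  // everything finished normally
            } catch (TimeoutException e) {
                statusLines++;
                System.out.println("Still running...");
            } catch (ExecutionException e) {
                // getCause() is the exception thrown by a failed task
                throw new CompletionException(e.getCause());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new CompletionException(e);
            }
        }
    }

    static int demoSlow() {
        CompletableFuture<Void> slow = CompletableFuture.runAsync(() -> sleep(400));
        return awaitWithStatus(CompletableFuture.allOf(slow), 100);
    }

    static String demoFailure() {
        CompletableFuture<Void> ok = CompletableFuture.runAsync(() -> sleep(50));
        CompletableFuture<Void> bad = CompletableFuture.runAsync(() -> {
            throw new IllegalStateException("import failed");
        });
        try {
            awaitWithStatus(CompletableFuture.allOf(ok, bad), 100);
            return "no failure";
        } catch (CompletionException e) {
            return e.getCause().getMessage();
        }
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("status lines: " + demoSlow());
        System.out.println("failure: " + demoFailure());
    }
}
```

To stop tasks that have not started yet after a failure, one option would be a shared flag that each task checks before doing any work; that part is left out of the sketch.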
Related
I had some queries regarding Future usage. Please go through below example before addressing my queries.
http://javarevisited.blogspot.in/2015/01/how-to-use-future-and-futuretask-in-Java.html
The main purpose of using thread pools & Executors is to execute task asynchronously without blocking main thread. But once you use Future, it is blocking calling thread. Do we have to create separate new thread/thread pool to analyse the results of Callable tasks? OR is there any other good solution?
Since the Future call blocks the caller, is it worth using this feature? If I want to analyse the result of a task, I can make a synchronous call and check its result without a Future.
What is the best way to handle rejected tasks with a RejectedExecutionHandler? If a task is rejected, is it good practice to submit it to another thread or thread pool, or to resubmit it to the current ThreadPoolExecutor?
Please correct me if my thought process is wrong about this feature.
Your question is about performing an action when an asynchronous action has been done. Futures on the other hand are good if you have an unrelated activity which you can perform while the asynchronous action is running. Then you may regularly poll the action represented by the Future via isDone() and do something else if not or call the blocking get() if you have no more unrelated work for your current thread.
If you want to schedule an on-completion action without blocking the current thread, you may instead use CompletableFuture which offers such functionality.
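As a minimal sketch of such an on-completion action (the computation and the AtomicReference holder are placeholders chosen for illustration):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

class OnCompletionSketch {

    static String demo() {
        AtomicReference<String> seen = new AtomicReference<>("nothing yet");
        CompletableFuture<Void> handled = CompletableFuture
                .supplyAsync(() -> 6 * 7)                             // runs on a pool thread
                .thenAccept(value -> seen.set("result=" + value));   // runs when the value is ready
        // The calling thread is not blocked here and could do unrelated work.
        handled.join();   // only so the demo can read the value before returning
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```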
CompletableFuture is the solution for queries 1 and 2, as suggested by @Holger.
I want to update about RejectedExecutionHandler mechanism regarding query 3.
Java provides four types of Rejection Handler policies as per javadocs.
In the default ThreadPoolExecutor.AbortPolicy, the handler throws a runtime RejectedExecutionException upon rejection.
In ThreadPoolExecutor.CallerRunsPolicy, the thread that invokes execute itself runs the task. This provides a simple feedback control mechanism that will slow down the rate that new tasks are submitted.
In ThreadPoolExecutor.DiscardPolicy, a task that cannot be executed is simply dropped.
In ThreadPoolExecutor.DiscardOldestPolicy, if the executor is not shut down, the task at the head of the work queue is dropped, and then execution is retried (which can fail again, causing this to be repeated.)
CallerRunsPolicy: Rejected tasks are executed by the thread that called execute (often the main thread), so if tasks are rejected frequently, this policy will slow that thread down and degrade performance. If running every rejected task is critical for your application and you have a limited task queue, you can use this policy.
DiscardPolicy: If discarding a non-critical event does not bother you, then you can use this policy.
DiscardOldestPolicy: Discards the oldest unhandled task (the one at the head of the work queue) and then retries executing the new one.
If none of them suits your needs, you can implement your own RejectedExecutionHandler.
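A custom handler only has to implement the single rejectedExecution method. The sketch below is a deliberately simple, hypothetical handler that counts rejected tasks and drops them; the pool size and queue capacity are chosen only to force rejections.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class CountingRejectionSketch {

    static int demo() {
        AtomicInteger rejected = new AtomicInteger();
        // Custom policy: count the rejection and drop the task.
        RejectedExecutionHandler countAndDrop = (task, executor) -> rejected.incrementAndGet();
        // One worker thread and one queue slot, so a third pending task is rejected.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(1), countAndDrop);

        CountDownLatch gate = new CountDownLatch(1);
        Runnable blocked = () -> {
            try {
                gate.await();                 // keep the worker busy until the gate opens
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (int i = 0; i < 5; i++) {
            pool.execute(blocked);            // 1 runs, 1 is queued, 3 are rejected
        }
        gate.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return rejected.get();
    }

    public static void main(String[] args) {
        System.out.println("rejected tasks: " + demo());
    }
}
```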
Use case: tasks are generated in one thread, need to be distributed for computation to many threads and finally the generating task shall reap the results and mark the tasks as done.
I found the class ExecutorCompletionService which fits the use case nearly perfectly --- except that I see no good solution for non-idle waiting. Let me explain.
In principle my code would look like
while (true) {
    MyTask t = generateNextTask();
    if (t != null) {
        completionService.submit(t);
    }
    // poll() returns the Future of a completed task, or null if none is ready
    Future<MyTask> finished;
    while (null != (finished = completionService.poll())) {
        retireTask(finished);
    }
}
Both, generateNextTask() and completionService.poll() may return null if there are currently no new tasks available and if currently no task has returned from the CompletionService respectively.
In these cases, the loop degenerates into an ugly idle-wait. I could poll() with a timeout or add a Thread.sleep() for the double-null case, but I consider this a bad workaround, because it nevertheless wastes CPU and is not as responsive as possible, due to the wait.
Suppose I replace generateNextTask() by a poll() on a BlockingQueue, is there good way to poll the queue as well as the CompletionService in parallel to be woken up for work on whichever end something becomes available?
Actually this reminds me of Selector. Is something like it available for queues?
You should use CompletionService.take() to wait until the next task completes and retrieve its Future. poll() is the non-blocking version, returning null if no task is currently completed.
Also, your code seems to be inefficient, because you produce and consume tasks one at a time, instead of allowing multiple tasks to be processed in parallel. Consider having a different thread for task generation and for task results consumption.
-- Edit --
I think that given the constraints you mention in your comments, you can't achieve all your requirements.
Requiring the main thread to be producer and consumer, and disallowing any busy loop or timed loop, you can't avoid the scenario where a blocking wait for a task completion takes too long and no other task gets processed in the meanwhile.
Since you "can replace generateNextTask() by a poll() on a BlockingQueue", I assume incoming tasks can be put in a queue by some other thread, and the problem is, you cannot execute take() on 2 queues simultaneously. The solution is to simply put both incoming and finished tasks in the same queue. To differentiate, wrap them in objects of different types, and then check that type in the loop after take().
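The single-queue idea can be sketched as below. The event classes NewTask and Finished are hypothetical wrappers invented for this illustration, and the incoming tasks are pre-loaded here, where in practice another thread would add them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

class MergedQueueSketch {

    interface Event { }

    static final class NewTask implements Event {
        final int id;
        NewTask(int id) { this.id = id; }
    }

    static final class Finished implements Event {
        final int id;
        Finished(int id) { this.id = id; }
    }

    static List<String> demo() {
        BlockingQueue<Event> events = new LinkedBlockingQueue<>();
        ExecutorService workers = Executors.newFixedThreadPool(2);
        List<String> log = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            events.add(new NewTask(i));       // normally produced by another thread
        }
        int outstanding = 3;                  // tasks not yet retired
        try {
            while (outstanding > 0) {
                Event event = events.take();  // one blocking wait covers both kinds of event
                if (event instanceof NewTask) {
                    int id = ((NewTask) event).id;
                    // The worker reports completion by posting to the same queue.
                    workers.execute(() -> events.add(new Finished(id)));
                    log.add("submitted " + id);
                } else {
                    log.add("retired " + ((Finished) event).id);
                    outstanding--;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            workers.shutdown();
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```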
This solution works, but we can go further. You said you don't want to use 2 threads for handling tasks, so you can use zero threads. Let the wrappers implement Runnable and, instead of checking the type, just call take().run(). This way your thread becomes a single-threaded Executor. But we already have an Executor (CompletionService), so can we use it? The problem is that incoming and finished tasks should be handled serially, not in parallel. So we need the SerialExecutor described in api/java/util/concurrent/Executor, which accepts Runnables and executes them serially, but on another executor. This way no thread is wasted.
And finally, you mentioned Selector as possible solution. I must say, it is an outdated approach. Learn dataflow and actor computing. Nice introduction is here. Look at Dataflow4java project of mine, it has MultiPortActorTest.java example, where class Accum does what you need, with all the boilerplate with wrapper Runnables and serial executors hidden in the supporting library.
What you need is a ListenableFuture from Guava. ListenableFutureExplained
I am creating an HTTP proxy server in Java. I have a class named Handler which is responsible for processing the requests and responses going between the web browser and the web server. I also have another class named Copy which copies an InputStream to an OutputStream. Both these classes implement the Runnable interface. I would like to use thread pooling in my design, but I don't know how to go about it. Any hint or idea would be highly appreciated.
I suggest you look at Executor and ExecutorService. They add a lot of good stuff to make it easier to use Thread pools.
...
@Azad provided some good information and links. You should also buy and read the book Java Concurrency in Practice (often abbreviated as JCiP). Note to stackoverflow big-wigs - how about some revenue link to Amazon???
Below is my brief summary of how to use and take advantage of ExecutorService with thread pools. Let's say you want 8 threads in the pool.
You can create one using the full featured constructors of ThreadPoolExecutor, e.g.
ExecutorService service = new ThreadPoolExecutor(8,8, more args here...);
or you can use the simpler but less customizable Executors factories, e.g.
ExecutorService service = Executors.newFixedThreadPool(8);
One advantage you immediately get is the ability to shutdown() or shutdownNow() the thread pool, and to check this status via isShutdown() or isTerminated().
If you don't care much about the Runnable you wish to run, or they are very well written, self-contained, never fail or log any errors appropriately, etc... you can call
execute(Runnable r);
If you do care about either the result (say, it calculates pi or downloads an image from a webpage) and/or you care whether there was an Exception, you should use one of the submit methods that returns a Future. That allows you, at some time in the future, to check whether the task isDone() and to retrieve the result via get(). If there was an Exception, get() will throw it (wrapped in an ExecutionException). Note - even if your Future doesn't "return" anything (it is of type Void), it may still be good practice to call get() (ignoring the void result) to test for an Exception.
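A small sketch of submit and get(), with one task that succeeds and one that throws (both tasks are trivial placeholders):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SubmitSketch {

    static String demo() {
        ExecutorService service = Executors.newFixedThreadPool(2);
        Callable<Integer> working = () -> 1 + 1;
        Callable<Integer> failing = () -> {
            throw new IllegalStateException("boom");
        };
        try {
            Future<Integer> ok = service.submit(working);
            Future<Integer> bad = service.submit(failing);
            int sum = ok.get();               // blocks until the result is available
            try {
                bad.get();
                return sum + " / no failure";
            } catch (ExecutionException e) {
                // The task's own exception is available as the cause.
                return sum + " / " + e.getCause().getMessage();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            service.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```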
However, this checking the Future is a bit of chicken and egg problem. The whole point of a thread pool is to submit tasks without blocking. But Future.get() blocks, and Future.isDone() begs the questions of which thread is calling it, and what it does if it isn't done - do you sleep() and block?
If you are submitting a known chunk of related tasks simultaneously, e.g., you are performing some big mathematical calculation like a matrix multiply that can be done in parallel, and there is no particular advantage to obtaining partial results, you can call invokeAll(). The calling thread will then block until all the tasks are complete, at which point you can call Future.get() on all the Futures.
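For instance, a batch of independent calculations might be run with invokeAll() like this; the sum-of-squares workload is only a stand-in for a real parallel computation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class InvokeAllSketch {

    static int sumOfSquares(int n) {
        ExecutorService service = Executors.newFixedThreadPool(4);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final int x = i;
                tasks.add(() -> x * x);
            }
            int total = 0;
            // invokeAll blocks until every task is done, so get() returns immediately.
            for (Future<Integer> result : service.invokeAll(tasks)) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            service.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(4));
    }
}
```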
What if the tasks are more disjointed, or you really want to use the partial results? Use ExecutorCompletionService, which wraps an ExecutorService. As tasks get completed, they are added to a queue. This makes it easy for a single thread to poll and remove events from the queue. JCiP has a great example of a web page app that downloads all the images in parallel, and renders them as soon as they become available for responsiveness.
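A rough sketch of consuming results as they complete is shown below. The sleeping tasks are placeholders for real work such as downloads, and the delays are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CompletionServiceSketch {

    static List<String> demo() {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        CompletionService<String> ecs = new ExecutorCompletionService<>(pool);
        long[] delaysMillis = {300, 100, 200};
        List<String> finished = new ArrayList<>();
        try {
            for (long delay : delaysMillis) {
                ecs.submit(() -> {
                    Thread.sleep(delay);      // stand-in for real work
                    return "task-" + delay;
                });
            }
            // take() blocks until the next task, in completion order, is done.
            for (int i = 0; i < delaysMillis.length; i++) {
                finished.add(ecs.take().get());
            }
            return finished;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The results come back in the order the tasks finish rather than the order they were submitted, so the shortest task is normally handled first.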
I hope the following will help you:
Interface Executor
An object that executes submitted Runnable tasks. This interface provides a way of decoupling task submission from the mechanics of how each task will be run, including details of thread use, scheduling, etc. An Executor is normally used instead of explicitly creating threads. For example, rather than invoking new Thread(new RunnableTask()).start() for each of a set of tasks, you might use:
Executor executor = anExecutor;
executor.execute(new RunnableTask1());
executor.execute(new RunnableTask2());
...
class ScheduledThreadPoolExecutor
A ThreadPoolExecutor that can additionally schedule commands to run after a given delay, or to execute periodically. This class is preferable to Timer when multiple worker threads are needed, or when the additional flexibility or capabilities of ThreadPoolExecutor (which this class extends) are required.
Delayed tasks execute no sooner than they are enabled, but without any real-time guarantees about when, after they are enabled, they will commence. Tasks scheduled for exactly the same execution time are enabled in first-in-first-out (FIFO) order of submission.
and
Interface ExecutorService
An Executor that provides methods to manage termination and methods that can produce a Future for tracking progress of one or more asynchronous tasks.
An ExecutorService can be shut down, which will cause it to stop accepting new tasks. After being shut down, the executor will eventually terminate, at which point no tasks are actively executing, no tasks are awaiting execution, and no new tasks can be submitted.
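The shutdown lifecycle described above can be observed with a few calls; the printed messages and the five-second wait are arbitrary choices for this sketch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

class ShutdownSketch {

    static boolean demo() {
        ExecutorService service = Executors.newFixedThreadPool(2);
        service.execute(() -> System.out.println("working"));
        service.shutdown();                   // already submitted tasks still run

        boolean rejected = false;
        try {
            service.execute(() -> System.out.println("too late"));
        } catch (RejectedExecutionException e) {
            rejected = true;                  // no new tasks are accepted after shutdown()
        }

        boolean terminated;
        try {
            terminated = service.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            terminated = false;
        }
        return rejected && terminated && service.isShutdown() && service.isTerminated();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```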
Edited:
You can find examples of using Executor and ExecutorService here, here and here. This question may also be useful to you.
I want to know how to execute more than one job in Eclipse at a time. I want to run more than one job concurrently in RCP.
A Job more or less wraps a Thread and all started (scheduled) job instances run in parallel.
Suggestion for further reading:
On the Job: the Eclipse Jobs API
Use Threading to run more than one job at a time.
Thread th = new Thread() {
public void run() {
//Here is a thread that you can use wherever you want in your code
}
};
th.start();
See Eclipse RCP: Only one Job runs at a time?
Jobs can optionally finish their execution asynchronously (in another thread) by returning a result status of ASYNC_FINISH. Jobs that finish asynchronously must specify the execution thread by calling setThread, and must indicate when they are finished by calling the method done.
A few years later, you now have the opposite issue, which is to limit the number of concurrent jobs.
That is why Eclipse 4.5M4 (Q4 2014) will include support for Job Groups with throttling.
See bug 432049:
Eclipse provides a simple Jobs API to perform different tasks in parallel and in asynchronous fashion. One limitation of the Eclipse Jobs is that there is no easy way to limit the number of worker threads being used to execute jobs.
This may lead to a thread pool explosion when many jobs are scheduled in quick succession. Due to that it’s easy to use Jobs to perform different unrelated tasks in parallel, but hard to implement thousands of Jobs co-operating to complete a single large task.
Eclipse currently supports the concept of Job Families, which provides one way of grouping with support for join, cancel, sleep, and wakeup operations on the whole family.
To address all these issues we would like to propose a simple way to group a set of Eclipse Jobs that are responsible for pieces of the same large task.
The API would support throttling, join, cancel, combined progress and error reporting for all of the jobs in the group and the job grouping functionality can be used to rewrite performance critical algorithms to use parallel execution of cooperating jobs.
You can see the implementation in this commit 26471fa
I have a series of concurrent tasks to run. If any one of them fails, I want to interrupt them all and await termination. But assuming none of them fail, I want to wait for all of them to finish.
ExecutorCompletionService seems like almost what I want here, but there doesn't appear to be a way to tell if all of my tasks are done, except by keeping a separate count of the number of tasks. (Note that both of the examples in the Javadoc for ExecutorCompletionService keep track of the count "n" of the tasks, and use that to determine if the service is finished.)
Am I overlooking something, or do I really have to write this code myself?
Yes, you do need to keep track if you're using an ExecutorCompletionService. Typically, you would call get() on the futures to see if an error occurred. Without iterating over the tasks, how else could you tell that one failed?
If your series of tasks is of a known size, then you should use the second example in the javadoc.
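For a batch of known size, the counting approach combined with cancel-on-first-failure might look like the sketch below. It is loosely modelled on that javadoc pattern rather than copied from it; runAll and the two demo tasks are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class FailFastSketch {

    static String runAll(List<Callable<String>> tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
        CompletionService<String> ecs = new ExecutorCompletionService<>(pool);
        List<Future<String>> futures = new ArrayList<>();
        String outcome;
        try {
            for (Callable<String> task : tasks) {
                futures.add(ecs.submit(task));
            }
            // The count n of submitted tasks tells us when everything is done.
            for (int n = tasks.size(); n > 0; n--) {
                ecs.take().get();             // throws if the completed task failed
            }
            outcome = "all done";
        } catch (ExecutionException e) {
            for (Future<String> f : futures) {
                f.cancel(true);               // interrupt everything still running
            }
            outcome = "failed: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            outcome = "interrupted";
        }
        pool.shutdownNow();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);   // await termination of the workers
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return outcome;
    }

    static String demo() {
        Callable<String> slow = () -> {
            Thread.sleep(10_000);             // interrupted by cancel(true)
            return "slow";
        };
        Callable<String> broken = () -> {
            throw new IllegalStateException("boom");
        };
        return runAll(Arrays.asList(slow, broken));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```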
However, if you don't know the number of tasks which you will submit to the CompletionService, then you have a sort of Producer-Consumer problem. One thread is producing tasks and placing them in the ECS, another would be consuming the task futures via take(). A shared Semaphore could be used, allowing the Producer to call release() and the Consumer to call acquire(). Completion semantics would depend on your application, but a volatile or atomic boolean on the producer to indicate that it is done would suffice.
I suggest a Semaphore over wait/notify with poll() because there is a non-deterministic delay between the time a task is produced and the time that task's future is available for consumption. Therefore the consumer and producer need to be just slightly smarter.