I'm preparing an application where a single producer generates several million tasks, which will then be processed by a configurable number of consumers. Communication from producer to consumer is (probably) going to be queue-based.
From the thread that runs the producer/generates the tasks, what method can I use to wait for completion of all tasks? I'd rather not resort to periodic polling to see if my task queue is empty. In any case, the task queue being empty isn't actually a guarantee that the last tasks have completed. Those tasks can be relatively long-running, so it's quite possible that the queue is empty while the consumer threads are still happily processing.
Rgds, Maarten
You might want to have a look at the java.util.concurrent package.
ExecutorService
Executors
Future
The executor framework already provides the means to execute tasks via a thread pool. The Future abstraction allows you to wait for the completion of tasks.
Putting both together allows you to coordinate the executions easily, decoupling tasks, activities (threads) and results.
Example:
ExecutorService executorService = Executors.newFixedThreadPool(16);
List<Callable<Void>> tasks = new ArrayList<>();
//TODO: fill tasks;
//dispatch; invokeAll blocks until every task has completed
List<Future<Void>> results = executorService.invokeAll(tasks);
//Check each result so that an exception thrown inside a task is rethrown here
for (Future<Void> result : results) {
    result.get();
}
Edit: Alternative Version using CountDownLatch
ExecutorService executorService = Executors.newFixedThreadPool(16);
final CountDownLatch latch;
List<Callable<Void>> tasks = new ArrayList<>();
//TODO: fill tasks;
latch = new CountDownLatch(tasks.size());
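//dispatch without blocking (invokeAll would itself wait for every task, making the latch redundant)
for (Callable<Void> task : tasks) {
    executorService.submit(task);
}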
//Wait until all tasks have completed
latch.await();
And inside your tasks:
Callable<Void> task = new Callable<Void>()
{
    @Override
    public Void call() throws Exception
    {
        try {
            // TODO: do your stuff
        } finally {
            latch.countDown(); //<---- important part; in finally so a failing task still counts
        }
        return null;
    }
};
You want to know when every task completes. I would have another queue of completed-task reports (one object/message per task). When the number of reports reaches the number of tasks you created, they have all completed. Each report can also carry any errors and timing information for the task.
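A minimal sketch of the report-queue idea, assuming a fixed thread pool as the consumers. The Report class, its fields, and the runAll helper are hypothetical names chosen for illustration, and Math.sqrt stands in for the real work:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

final class CompletionReports {

    /** Hypothetical report type: one instance per finished task. */
    static final class Report {
        final int taskId;
        final long elapsedNanos;
        final Throwable error; // null when the task succeeded

        Report(int taskId, long elapsedNanos, Throwable error) {
            this.taskId = taskId;
            this.elapsedNanos = elapsedNanos;
            this.error = error;
        }
    }

    /** Runs taskCount placeholder tasks and returns how many reported success. */
    static int runAll(int consumers, int taskCount) {
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        BlockingQueue<Report> reports = new LinkedBlockingQueue<>();
        try {
            for (int i = 0; i < taskCount; i++) {
                final int id = i;
                pool.execute(() -> {
                    long start = System.nanoTime();
                    Throwable error = null;
                    try {
                        Math.sqrt(id); // placeholder for the real work
                    } catch (Throwable t) {
                        error = t;
                    } finally {
                        // Always report, even on failure, so the count adds up.
                        reports.add(new Report(id, System.nanoTime() - start, error));
                    }
                });
            }
            int succeeded = 0;
            // take() blocks until a report arrives, so the producer never polls.
            for (int received = 0; received < taskCount; received++) {
                Report report = reports.take();
                if (report.error == null) {
                    succeeded++;
                }
            }
            return succeeded;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for reports", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAll(4, 1000) + " tasks reported success");
    }
}
```

Because the producer counts reports rather than checking whether the task queue is empty, long-running tasks that are still in progress are never mistaken for finished ones.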
You could have each consumer check to see if the queue is empty when they dequeue, and, if it is, pulse a condvar (or a Monitor, since I believe that's what Java has) on which the main thread is waiting.
Having the threads check a global boolean variable (marked as volatile) is a way to let the threads know that they should stop.
You can call join() on each thread, so that your main thread does not continue until all the threads are done. This way you can actually find out whether all the threads have finished.
Related
I have a scenario that involves inserting millions of records into the back end, and I currently use the executor framework to load them. I will explain my problem in simpler terms.
In the case below, I have 10 runnables and three threads to execute them. Assume each runnable performs an insert operation that takes some time to complete. As I understand it, if all the threads are busy, the remaining tasks go to the queue, and once a thread completes its task, it fetches the next task from the queue and runs it.
So in this case, SampleRunnable objects 4 to 10 will be created and will wait in the queue.
Problem: Since I need to load millions of tasks, I cannot put all the records in the queue, as that can lead to memory issues. So my question is: instead of holding all tasks in the queue, is it possible to make the main thread wait until one of the executor's worker threads becomes available?
I tried the following approaches as a workaround instead of queuing this many tasks:
Approach 1: Used an ArrayBlockingQueue for the executor and set its size to 5 (for example).
So in this case, when the 9th task arrives, the executor throws RejectedExecutionException; in the catch clause, I sleep for 1 minute and retry recursively. The task gets picked up on one of the retries once a thread is available.
Approach 2: Used shutdown and awaitTermination, i.e. once the task count reaches 5, I call shutdown and awaitTermination. In the awaitTermination 'if' block (executor.awaitTermination(60000,TimeUnit.SECONDS)), I instantiate the thread pool again.
public class SampleMain {
    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(3);
        for (int i = 0; i < 10; i++) {
            executor.execute(new SampleRunnable(i));
        }
        executor.shutdown();
    }
}
Sounds like the problem is, you want to throttle the main thread, so that it does not get ahead of the workers. If that's the case, then consider explicitly constructing a ThreadPoolExecutor instance instead of calling Executors.newFixedThreadPool().
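That class has several different constructors, and all of them let you supply your own blocking queue. Be aware that a bounded queue does not block the submitter by itself: execute() hands tasks to the queue with the non-blocking offer(), so once an ArrayBlockingQueue of limited size is full and the pool is at its maximum size, the task goes to the RejectedExecutionHandler, which by default throws RejectedExecutionException. To throttle the main thread, combine the bounded queue with a handler such as ThreadPoolExecutor.CallerRunsPolicy, which runs the rejected task on the submitting thread, or with a custom handler that waits for room in the queue.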
final int work_queue_size = 30;
BlockingQueue<Runnable> work_queue = new ArrayBlockingQueue<>(work_queue_size);
ExecutorService executor = new ThreadPoolExecutor(..., work_queue);
for (int i = 0; i < 10; i++) {
    executor.execute(new SampleRunnable(i));
}
...
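A minimal sketch of one way to throttle the submitting thread, assuming a bounded ArrayBlockingQueue combined with ThreadPoolExecutor.CallerRunsPolicy. The choice of policy, the class name ThrottledSubmit, and the counter task are assumptions made for illustration. When the queue is full, the rejection handler runs the task on the calling thread, which keeps the producer from getting ahead of the workers:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ThrottledSubmit {

    /** Pushes taskCount trivial tasks through a bounded pool and returns how many ran. */
    static int run(int threads, int queueSize, int taskCount) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                threads, threads,                             // fixed number of workers
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueSize),  // at most queueSize waiting tasks
                new ThreadPoolExecutor.CallerRunsPolicy());   // full queue: the submitter runs the task
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            // Placeholder task; a real one would do the insert.
            executor.execute(() -> completed.incrementAndGet());
        }
        executor.shutdown(); // no more submissions; queued tasks still run
        try {
            if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(run(3, 5, 100_000) + " tasks completed");
    }
}
```

At no point are more than queueSize tasks waiting in memory, regardless of how many are submitted in total.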
I have created ExecutorService like:
private static final java.util.concurrent.ExecutorService EXECUTOR_SERVICE = new java.util.concurrent.ThreadPoolExecutor(
    5, // core thread pool size
    10, // maximum thread pool size (must not be smaller than the core size, or the constructor throws IllegalArgumentException)
    1, // keep-alive time for idle threads above the core size
    java.util.concurrent.TimeUnit.MINUTES,
    new java.util.concurrent.ArrayBlockingQueue<Runnable>(MAX_THREADS, true),
    new java.util.concurrent.ThreadPoolExecutor.CallerRunsPolicy());
and submitted tasks to it with the code below:
EXECUTOR_SERVICE.submit(thread);
Now I want to know when all tasks in EXECUTOR_SERVICE have finished, so that I can run some dependent tasks.
Kindly suggest a way to achieve this.
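You could use:
executor.shutdown(); // awaitTermination can only report completion after a shutdown request
try {
    executor.awaitTermination(1, TimeUnit.SECONDS); // returns false if the timeout elapsed first
} catch (InterruptedException e) {
    // Report the InterruptedException
}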
Use CountDownLatch. I have used this in the past with great success.
A synchronization aid that allows one or more threads to wait until a set of operations being performed in other threads completes.
The Javadoc link has a great example.
As per the Javadoc, the signature of the submit method is <T> Future<T> submit(Callable<T> task), and it is documented as follows:
Submits a value-returning task for execution and returns a Future representing the pending results of the task. The Future's get method will return the task's result upon successful completion.
If you would like to immediately block waiting for a task, you can use constructions of the form result = exec.submit(aCallable).get();
Note: The Executors class includes a set of methods that can convert some other common closure-like objects, for example, PrivilegedAction to Callable form so they can be submitted.
It returns a Future representing pending completion of the task.
Without modifying your submitted tasks, you are left to either query the internal state of ThreadPoolExecutor, subclass ThreadPoolExecutor to track task completion according to your requirements, or collect all of the returned Futures from task submission and wait on them each in turn until they are all done.
In no particular order:
Option 1: Query the state of ThreadPoolExecutor:
You can use ThreadPoolExecutor.getActiveCount() if you keep your reference typed to ThreadPoolExecutor instead of ExecutorService.
From ThreadPoolExecutor source:
/**
 * Returns the approximate number of threads that are actively executing tasks.
 * Returns:
 * the number of threads
 **/
public int getActiveCount() {
    final ReentrantLock mainLock = this.mainLock;
    mainLock.lock();
    try {
        int n = 0;
        for (Worker w : workers)
            if (w.isLocked())
                ++n;
        return n;
    } finally {
        mainLock.unlock();
    }
}
The JavaDoc there that mentions "approximate" should concern you, however, since given concurrent execution it is not necessarily guaranteed to be accurate. Looking at the code though, it does lock and assuming it is not queried in another thread before all of your tasks have been added, it appears to be sufficient to test for task completeness.
A drawback here is that you are left to monitor the value continuously in a check / sleep loop.
Option 2: Subclass ThreadPoolExecutor:
Another solution (or perhaps a complementary solution) is to subclass ThreadPoolExecutor and override the afterExecute method in order to keep track of completed executions and take appropriate action. You could design your subclass such that it will call a callback once X tasks have been completed, or the number of remaining tasks drops to 0 (some concurrency concerns there since this could trigger before all tasks have been added) etc.
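A minimal sketch of such a subclass, assuming the total number of tasks is known up front (which sidesteps the concern about the count reaching zero before all tasks have been added). CountingExecutor, awaitAll, and demo are hypothetical names for illustration:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical subclass: counts finished tasks and releases waiters after an expected total. */
final class CountingExecutor extends ThreadPoolExecutor {
    private final CountDownLatch remaining;

    CountingExecutor(int threads, int expectedTasks) {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        this.remaining = new CountDownLatch(expectedTasks);
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        super.afterExecute(r, t);
        remaining.countDown(); // invoked once per executed task, whether it failed or not
    }

    /** Blocks until the expected number of tasks has run; returns false on timeout or interrupt. */
    boolean awaitAll(long timeout, TimeUnit unit) {
        try {
            return remaining.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Demo: submits empty tasks and waits for all of them without a check/sleep loop. */
    static boolean demo(int threads, int tasks) {
        CountingExecutor executor = new CountingExecutor(threads, tasks);
        try {
            for (int i = 0; i < tasks; i++) {
                executor.execute(() -> { });
            }
            return executor.awaitAll(30, TimeUnit.SECONDS);
        } finally {
            executor.shutdown();
        }
    }
}
```

The submitted tasks themselves stay unmodified; all bookkeeping lives in the executor.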
Option 3: Collect task Futures (probably the best option):
Each submission to the ExecutorService returns a Future which can be collected in a list. A loop could then run through and wait on each future in turn until all tasks are complete.
E.g.
List<Future<?>> futures = new ArrayList<Future<?>>();
futures.add(executorService.submit(myTask1));
futures.add(executorService.submit(myTask2));
for (Future<?> future : futures) {
    // TODO time limit, exception handling, etc etc.
    future.get();
}
I have a looper thread to execute tasks. Other threads can submit tasks to this looper thread. Some tasks are immediate tasks, others are future tasks, which are to be executed after T seconds after submission. I use PriorityBlockingQueue to store tasks, where time is used as the priority, so that the first task of the queue is the most imminent task to be executed.
The looper's main loop is as follows:
PriorityBlockingQueue<Task> taskQueue = ...
while (true) {
    if (taskQueue.isEmpty())
        <something>.wait(); // wait indefinitely
    else
        <something>.wait(taskQueue.peek().time - NOW()); // wait for the first task
    Task task = taskQueue.poll(); // take the first task without waiting
    if (task != null && task.time <= NOW())
        task.execute();
    else if (task != null)
        taskQueue.offer(task); // not yet runnable, put it back
}
The looper allows other threads (or itself) to submit tasks:
public void submitTask (Task task) { // time information is contained in the task.
    taskQueue.offer(task);
    <something>.signal(); // informs the loop thread that a new task is available.
}
Here, I have only one thread calling wait() and multiple threads calling signal(). My question is: what synchronization primitive should I use in place of <something>? There are so many primitives in the java.util.concurrent and java.util.concurrent.locks packages. There are also the synchronized keyword and Object.wait()/notify(). Which one fits best here?
You don't need to do any of this.
The whole point of the BlockingQueue is that it already manages thread synchronization for you. You do not need to inform other threads that something new is available now.
Just use
taskQueue.take(); // blocks until something is there
or
taskQueue.poll(1, SECONDS); // wait for a while then give up
For your "future tasks" that should not be processed immediately, I would not add them to this queue at all. You can use a ScheduledExecutorService to add them to the task queue once it is time (in effect, a second queue).
Come to think of it, you can do away with the BlockingQueue altogether and just use the ScheduledExecutorService (backed by a single thread, your "looper") for all your tasks.
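A minimal sketch of that last suggestion, assuming a single-threaded ScheduledExecutorService serves as the looper. The class SingleThreadLooper and its method names are hypothetical:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class SingleThreadLooper {
    // One worker thread executes everything, so tasks never run concurrently with each other.
    private final ScheduledExecutorService loop = Executors.newSingleThreadScheduledExecutor();

    /** Immediate task: runs as soon as the looper thread is free. */
    void submitNow(Runnable task) {
        loop.execute(task);
    }

    /** Future task: runs no earlier than delayMillis after submission. */
    void submitAfter(long delayMillis, Runnable task) {
        loop.schedule(task, delayMillis, TimeUnit.MILLISECONDS);
    }

    /** Stops accepting tasks, then waits for those already submitted. */
    boolean shutdownAndWait(long timeoutMillis) {
        loop.shutdown();
        try {
            return loop.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Demo: returns the names of the tasks that ran, in execution order. */
    static List<String> demo() {
        List<String> ran = new CopyOnWriteArrayList<>();
        SingleThreadLooper looper = new SingleThreadLooper();
        looper.submitAfter(200, () -> ran.add("delayed"));
        looper.submitNow(() -> ran.add("immediate"));
        looper.shutdownAndWait(5_000);
        return ran;
    }
}
```

No explicit wait/signal primitive appears anywhere: the executor's internal queue handles both the ordering by due time and the wake-up of the worker thread.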
The java.util.concurrent package contains DelayQueue, which can solve your problem.
Every queued object should implement the Delayed interface with its getDelay(..) method.
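A minimal sketch of a Delayed element and its use with DelayQueue. The class DelayedTask, its fields, and the firstReady helper are hypothetical; a real task would carry something to execute rather than just a name:

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

final class DelayedTask implements Delayed {
    final String name;
    private final long runAtNanos; // absolute due time on the System.nanoTime() clock

    DelayedTask(String name, long runAtNanos) {
        this.name = name;
        this.runAtNanos = runAtNanos;
    }

    @Override
    public long getDelay(TimeUnit unit) {
        // Remaining delay; zero or negative means the task is due.
        return unit.convert(runAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
        return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
    }

    /** Demo: queues two tasks and returns the name of the one that becomes due first. */
    static String firstReady() {
        DelayQueue<DelayedTask> queue = new DelayQueue<>();
        long now = System.nanoTime();
        queue.put(new DelayedTask("later", now + TimeUnit.SECONDS.toNanos(5)));
        queue.put(new DelayedTask("sooner", now + TimeUnit.MILLISECONDS.toNanos(20)));
        try {
            // take() blocks until the head of the queue has expired.
            return queue.take().name;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```

A looper thread built on this only needs to call take() in a loop: inserting a task with an earlier due time wakes the waiting thread automatically.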
There is a generic queue of tasks to which new tasks get added. I want to write code that creates more work by adding tasks to the queue. The task that added the work will then wait for all of those tasks to complete by polling the queue.
What would be the best way to implement this in Java? I was thinking of something along the lines of a simple thread implementing the Runnable interface that runs in an infinite loop, sleeps in between, and wakes up to see if there is any progress. If progress is still being made, it keeps looping; if the work has completed, it breaks out of the loop. Is there a better, more efficient way to implement this?
How do the tasks complete?
The tasks are submitted to a Queue. The Queue is polled by an executor and it runs the tasks.
What do I want to do?
Poll that queue to see if the task has completed or is still executing.
What you're describing here may be a rough sketch of a work queue. You could enqueue processes for asynchronous processing, wait for a notification of completion, and then terminate. This works, but there are newer concurrency tools available. I recommend reading the Java Concurrency Lesson.
The new model for concurrency allows you to separate the concurrency concerns from the thread via tasks (Runnable and Callable) and the ExecutorService. Rather than working directly with threads and building your own thread pool, try to let the Executor do the heavy lifting for you.
...
ExecutorService executor = Executors.newSingleThreadExecutor();
....
You may hand tasks, in the form of Runnables and Callables, to the ExecutorService and receive in return Future objects which may be used to monitor the task's progress.
Future<String> f = executor.submit(new Foo());
....
class Foo implements Callable<String> {
    @Override
    public String call() throws Exception {
        return "Bar";
    }
}
You may use an ExecutorCompletionService to monitor the completion of tasks for you:
CompletionService<String> cs = new ExecutorCompletionService<String>(executor);
Future<String> f = cs.submit(new Foo());
... // Let's say you've added TASK_COUNT tasks
for (int i = 0; i < TASK_COUNT; i++) {
    try {
        String str = cs.take().get(); // blocks until the next task finishes
        if (str != null) {
            System.out.println(str); // Handle the result of the Callable
        }
    } catch (ExecutionException e) {
        // The task threw; e.getCause() holds the original exception
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // restore the flag and stop waiting
        break;
    }
}
Now that you've received a result per Callable, every task has finished and there is nothing left to cancel. If you stop waiting early, you can cancel a task that is still outstanding through the Future f you received earlier from cs.submit(new Foo()), by invoking
f.cancel(true)
on it. And finally, don't forget to shut down your executor with
executor.shutdown();
There is a lot more to concurrency than this, but I believe that the above illustrates a means to meet your needs. I'd recommend reading the JavaDoc as well.
Use java.util.concurrent.Future and a java.util.concurrent.CompletionService.
You can use the Fork/Join framework from Java 7.
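A minimal sketch of how Fork/Join lets a task create subtasks and wait for them, assuming the work can be split recursively. Summing a range of integers is a stand-in for the real workload, and SumTask, THRESHOLD, and sumRange are hypothetical names:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

final class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // below this size, compute directly
    private final int from; // inclusive
    private final int to;   // exclusive

    SumTask(int from, int to) {
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += i;
            }
            return sum;
        }
        int mid = from + (to - from) / 2;
        SumTask left = new SumTask(from, mid);
        SumTask right = new SumTask(mid, to);
        left.fork();                      // queue the left half for another worker
        long rightSum = right.compute();  // work on the right half in this thread
        return left.join() + rightSum;    // join() waits for the forked subtask
    }

    /** Sums the integers in [from, to) and returns once every subtask has finished. */
    static long sumRange(int from, int to) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new SumTask(from, to));
        } finally {
            pool.shutdown();
        }
    }
}
```

invoke() returns only after the whole tree of subtasks has completed, so the submitting code needs no polling loop.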
I am trying to work with Java's FutureTask, Future, Runnable, Callable and ExecutorService types.
What is the best practice for composing those building blocks?
Suppose I have multiple FutureTasks and I want to execute them in sequence.
Of course I could make another FutureTask that submits each subtask and waits for its result in sequence, but I want to avoid blocking calls.
Another option would be to let those subtasks invoke a callback when they complete, and schedule the next task in the callback. But going that route, how do I create a proper outer FutureTask object that also handles exceptions in the subtasks without producing that much boilerplate?
Am I missing something here?
A very important point, though one usually not covered in tutorials:
Runnables to be executed on an ExecutorService should not block. This is because each blocking call takes a worker thread out of service: if the ExecutorService has a limited number of worker threads, there is a risk of deadlock (thread starvation), and if it has an unlimited number of worker threads, there is a risk of running out of memory. Blocking operations in tasks simply destroy all the advantages of an ExecutorService, so use blocking operations on ordinary threads only.
FutureTask.get() is a blocking operation, so it can be used on ordinary threads but not from an ExecutorService task. That is, it cannot serve as a building block, only as a way to deliver the result of execution to the master thread.
The right approach to building execution from tasks is to start the next task only when all of its input data is ready, so that the task does not have to block waiting for input. So you need a kind of gate that stores intermediate results and starts a new task when all arguments have arrived. Tasks then do not need to start other tasks explicitly. A gate, which consists of input sockets for arguments and a Runnable to compute on them, can be considered the right building block for computations on ExecutorServices.
This approach is called dataflow, or workflow if gates cannot be created dynamically.
Actor frameworks like Akka use this approach, but are limited by the fact that an actor is a gate with a single input socket.
I have written a true dataflow library published at https://github.com/rfqu/df4j.
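To make the gate idea concrete, here is a minimal sketch of a gate with two input sockets. It is not taken from df4j; SumGate, its methods, and the demo are hypothetical and only illustrate the principle that the follow-up task is scheduled when the last argument arrives, without any pool thread blocking:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.IntConsumer;

/** Hypothetical two-input gate: schedules its action once both arguments have arrived. */
final class SumGate {
    private final Executor executor;
    private final IntConsumer onResult;
    private Integer left;  // null until the first argument arrives
    private Integer right; // null until the second argument arrives

    SumGate(Executor executor, IntConsumer onResult) {
        this.executor = executor;
        this.onResult = onResult;
    }

    synchronized void setLeft(int value) {
        left = value;
        fireIfReady();
    }

    synchronized void setRight(int value) {
        right = value;
        fireIfReady();
    }

    // Called with the lock held; it never waits, it only schedules the next task.
    private void fireIfReady() {
        if (left != null && right != null) {
            final int a = left;
            final int b = right;
            executor.execute(() -> onResult.accept(a + b));
        }
    }

    /** Demo: two producer tasks feed the gate; the result is handed to the calling thread. */
    static int demo() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        BlockingQueue<Integer> out = new LinkedBlockingQueue<>();
        SumGate gate = new SumGate(pool, sum -> out.add(sum));
        try {
            pool.execute(() -> gate.setLeft(40));
            pool.execute(() -> gate.setRight(2));
            // Blocking is acceptable here: this is an ordinary thread, not a pool worker.
            Integer sum = out.poll(10, TimeUnit.SECONDS);
            if (sum == null) {
                throw new IllegalStateException("gate did not fire");
            }
            return sum;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Whichever producer finishes last triggers the computation, so neither producer has to wait for the other.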
I tried to do something similar with a ScheduledFuture, trying to cause a delay before things were displayed to the user. This is what I came up with: simply reuse the same ScheduledFuture variable for all your 'delays'. The code was:
public static final ScheduledExecutorService scheduler = Executors
        .newScheduledThreadPool(1);
public ScheduledFuture<?> delay = null;
delay = scheduler.schedule(new Runnable() {
    @Override
    public void run() {
        //do something
    }
}, 1000, TimeUnit.MILLISECONDS);
delay = scheduler.schedule(new Runnable() {
    @Override
    public void run() {
        //do something else
    }
}, 2000, TimeUnit.MILLISECONDS);
Hope this helps
Andy
The usual approach is to:
* Decide about the ExecutorService (which type, how many threads).
* Decide about the task queue (for how long it could be non-blocking).
If you have some external code that waits for the task result:
* Submit tasks as Callables (this is non-blocking as long as the queue has room).
* Call get on the Future.
If you want some actions to be taken automatically after the task is finished:
* You can submit tasks as Callables or Runnables.
* Just add what you need to do as the last code inside the task. Use Activity.runOnUiThread if these final actions need to modify the GUI.
Normally, you should not actively check when you can submit one more task, or schedule a callback just to submit them. The task queue (blocking, if preferred) will handle this for you.