I have a list of tasks [Task-A,Task-B,Task-C,Task-D, ...].
One task can be optionally dependent on other tasks.
For example:
A can be dependent on 3 tasks: B, C and D
B can be dependent on 2 tasks: C and E
It's basically a directed acyclic graph, and a task should execute only after the tasks it depends on have executed.
At any point in time, there may be multiple tasks that are ready for execution. In that case, we can run them in parallel.
Any idea how to implement such an execution with as much parallelism as possible?
class Task{
private String name;
private List<Task> dependentTasks;
public void run(){
// business logic
}
}
The other answer works fine but is too complicated.
A simpler way is to run Kahn's algorithm, but in parallel.
The key is to execute in parallel all tasks whose dependencies have all been executed.
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
class DependencyManager {
private final ConcurrentHashMap<String, List<String>> _dependencies = new ConcurrentHashMap<>();
private final ConcurrentHashMap<String, List<String>> _reverseDependencies = new ConcurrentHashMap<>();
private final ConcurrentHashMap<String, Runnable> _tasks = new ConcurrentHashMap<>();
private final ConcurrentHashMap<String, Integer> _numDependenciesExecuted = new ConcurrentHashMap<>();
private final AtomicInteger _numTasksExecuted = new AtomicInteger(0);
private final ExecutorService _executorService = Executors.newFixedThreadPool(16);
private static Runnable getRunnable(DependencyManager dependencyManager, String taskId){
return () -> {
try {
Thread.sleep(2000); // A task takes 2 seconds to finish.
dependencyManager.taskCompleted(taskId);
} catch (InterruptedException e) {
e.printStackTrace();
}
};
}
/**
* In case a vertex is disconnected from the rest of the graph.
* @param taskId The task id
*/
public void addVertex(String taskId) {
_dependencies.putIfAbsent(taskId, new ArrayList<>());
_reverseDependencies.putIfAbsent(taskId, new ArrayList<>());
_tasks.putIfAbsent(taskId, getRunnable(this, taskId));
_numDependenciesExecuted.putIfAbsent(taskId, 0);
}
private void addEdge(String dependentTaskId, String dependeeTaskId) {
_dependencies.get(dependentTaskId).add(dependeeTaskId);
_reverseDependencies.get(dependeeTaskId).add(dependentTaskId);
}
public void addDependency(String dependentTaskId, String dependeeTaskId) {
addVertex(dependentTaskId);
addVertex(dependeeTaskId);
addEdge(dependentTaskId, dependeeTaskId);
}
private void taskCompleted(String taskId) {
System.out.println(String.format("%s:: Task %s done!!", Instant.now(), taskId));
// incrementAndGet() returns the updated count, so exactly one thread observes the final value.
int numTasksExecuted = _numTasksExecuted.incrementAndGet();
_reverseDependencies.get(taskId).forEach(nextTaskId -> {
// Use the value returned by the atomic update; a separate get() could race with another
// dependency finishing at the same time and submit the task twice.
int numDependenciesExecuted = _numDependenciesExecuted.merge(nextTaskId, 1, Integer::sum);
int numDependencies = _dependencies.get(nextTaskId).size();
if (numDependenciesExecuted == numDependencies) {
// All dependencies have been executed, so we can submit this task to the thread pool.
_executorService.submit(_tasks.get(nextTaskId));
}
});
if (numTasksExecuted == _tasks.size()) {
topoSortCompleted();
}
}
private void topoSortCompleted() {
System.out.println("Topo sort complete!!");
_executorService.shutdownNow();
}
public void executeTopoSort() {
System.out.println(String.format("%s:: Topo sort started!!", Instant.now()));
_dependencies.forEach((taskId, dependencies) -> {
if (dependencies.isEmpty()) {
_executorService.submit(_tasks.get(taskId));
}
});
}
}
public class TestParallelTopoSort {
public static void main(String[] args) {
DependencyManager dependencyManager = new DependencyManager();
dependencyManager.addDependency("8", "5");
dependencyManager.addDependency("7", "5");
dependencyManager.addDependency("7", "6");
dependencyManager.addDependency("6", "3");
dependencyManager.addDependency("6", "4");
dependencyManager.addDependency("5", "1");
dependencyManager.addDependency("5", "2");
dependencyManager.addDependency("5", "3");
dependencyManager.addDependency("4", "1");
dependencyManager.executeTopoSort();
// Parallel version takes 8 seconds to execute.
// Serial version would have taken 16 seconds.
}
}
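The directed acyclic graph constructed in this example has the following edges (dependent -> dependency): 8 -> 5, 7 -> 5, 7 -> 6, 6 -> 3, 6 -> 4, 5 -> 1, 5 -> 2, 5 -> 3, 4 -> 1.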
We can create a DAG where each vertex of the graph is one of the tasks.
After that, we can compute its topological order.
We can then decorate the Task class with a priority field and run the ThreadPoolExecutor with a PriorityBlockingQueue which compares Tasks using the priority field.
The final trick is to override run() so that it first waits for all the tasks it depends on to finish.
Since each task waits indefinitely for its dependencies to finish, we cannot afford to let the thread pool be completely occupied by tasks that come later in the topological order; they would all block on tasks still sitting in the queue, and the pool would be stuck forever.
To avoid this, we just have to assign priorities to tasks according to the topological order.
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
public class Testing {
private static Callable<Void> getCallable(String taskId){
return () -> {
System.out.println(String.format("Task %s result", taskId));
Thread.sleep(100);
return null;
};
}
public static void main(String[] args) throws ExecutionException, InterruptedException {
Callable<Void> taskA = getCallable("A");
Callable<Void> taskB = getCallable("B");
Callable<Void> taskC = getCallable("C");
Callable<Void> taskD = getCallable("D");
Callable<Void> taskE = getCallable("E");
PrioritizedFutureTask<Void> pfTaskA = new PrioritizedFutureTask<>(taskA);
PrioritizedFutureTask<Void> pfTaskB = new PrioritizedFutureTask<>(taskB);
PrioritizedFutureTask<Void> pfTaskC = new PrioritizedFutureTask<>(taskC);
PrioritizedFutureTask<Void> pfTaskD = new PrioritizedFutureTask<>(taskD);
PrioritizedFutureTask<Void> pfTaskE = new PrioritizedFutureTask<>(taskE);
// Create a DAG graph.
pfTaskB.addDependency(pfTaskC).addDependency(pfTaskE);
pfTaskA.addDependency(pfTaskB).addDependency(pfTaskC).addDependency(pfTaskD);
// Now that we have a graph, we can just get its topological sorted order.
List<PrioritizedFutureTask<Void>> topological_sort = new ArrayList<>();
topological_sort.add(pfTaskE);
topological_sort.add(pfTaskC);
topological_sort.add(pfTaskB);
topological_sort.add(pfTaskD);
topological_sort.add(pfTaskA);
ThreadPoolExecutor executor = new ThreadPoolExecutor(5, 5, 0L, TimeUnit.MILLISECONDS,
new PriorityBlockingQueue<Runnable>(1, new CustomRunnableComparator()));
// It's important to insert the tasks in topologically sorted order; otherwise the thread pool could be stuck forever.
for (int i = 0; i < topological_sort.size(); i++) {
PrioritizedFutureTask<Void> pfTask = topological_sort.get(i);
pfTask.setPriority(i);
// The lower the priority, the sooner it will run.
executor.execute(pfTask);
}
}
}
class PrioritizedFutureTask<T> extends FutureTask<T> implements Comparable<PrioritizedFutureTask<T>> {
private Integer _priority = 0;
private final Callable<T> callable;
private final List<PrioritizedFutureTask> _dependencies = new ArrayList<>();
public PrioritizedFutureTask(Callable<T> callable) {
super(callable);
this.callable = callable;
}
public PrioritizedFutureTask(Callable<T> callable, Integer priority) {
this(callable);
_priority = priority;
}
public Integer getPriority() {
return _priority;
}
public PrioritizedFutureTask<T> setPriority(Integer priority) {
_priority = priority;
return this;
}
public PrioritizedFutureTask<T> addDependency(PrioritizedFutureTask dep) {
this._dependencies.add(dep);
return this;
}
@Override
public void run() {
for (PrioritizedFutureTask dep : _dependencies) {
try {
dep.get();
} catch (InterruptedException e) {
e.printStackTrace();
} catch (ExecutionException e) {
e.printStackTrace();
}
}
super.run();
}
@Override
public int compareTo(PrioritizedFutureTask<T> other) {
if (other == null) {
throw new NullPointerException();
}
return getPriority().compareTo(other.getPriority());
}
}
class CustomRunnableComparator implements Comparator<Runnable> {
@Override
public int compare(Runnable task1, Runnable task2) {
return ((PrioritizedFutureTask) task1).compareTo((PrioritizedFutureTask) task2);
}
}
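Output (the order of independent tasks such as C, D and E can vary between runs; only the dependency order is guaranteed):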
Task E result
Task C result
Task B result
Task D result
Task A result
PS: Here is a well-tested and simple implementation of topological sort in Python, which you can easily port to Java.
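The example above hardcodes the topological order. As a minimal sketch of how that order could be computed in Java instead, here is Kahn's algorithm over a map from each task id to the ids it depends on. The class name TopologicalSortSketch and the map-based graph representation are assumptions made for illustration; they are not part of the answer's code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original answer.
final class TopologicalSortSketch {

    /**
     * Kahn's algorithm. The map gives, for each task id, the ids it depends on.
     * Returns the ids ordered so that every task appears after its dependencies.
     */
    static List<String> sort(Map<String, List<String>> dependencies) {
        Map<String, Integer> remaining = new HashMap<>();        // unmet dependency count per task
        Map<String, List<String>> dependents = new HashMap<>();  // reverse edges
        for (Map.Entry<String, List<String>> entry : dependencies.entrySet()) {
            remaining.merge(entry.getKey(), entry.getValue().size(), Integer::sum);
            for (String dep : entry.getValue()) {
                remaining.putIfAbsent(dep, 0); // a dependency may have no entry of its own
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(entry.getKey());
            }
        }

        // Start with every task that has no dependencies.
        Deque<String> ready = new ArrayDeque<>();
        remaining.forEach((task, count) -> {
            if (count == 0) {
                ready.add(task);
            }
        });

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String task = ready.poll();
            order.add(task);
            // Finishing a task releases one unmet dependency for each of its dependents.
            for (String dependent : dependents.getOrDefault(task, Collections.emptyList())) {
                if (remaining.merge(dependent, -1, Integer::sum) == 0) {
                    ready.add(dependent);
                }
            }
        }
        if (order.size() != remaining.size()) {
            throw new IllegalStateException("Cycle detected: the graph is not a DAG");
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new HashMap<>();
        deps.put("A", Arrays.asList("B", "C", "D"));
        deps.put("B", Arrays.asList("C", "E"));
        // The relative order of independent tasks (C, D, E) depends on map iteration order.
        System.out.println(sort(deps));
    }
}
```

The index of each task in the returned list could then be passed to setPriority, as the loop in the example above does.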
Related
I was recently introduced to the LMAX Disruptor and decided to give it a try. Thanks to the developers, the setup was quick and hassle-free. But I think I am running into an issue, and I hope someone can help me with it.
The issue:
I was told that when the producer publishes an event, it should block until the consumer has had a chance to retrieve it before wrapping around. I have a sequence barrier on the consumer side, and I can confirm that if no data has been published by the producer, the consumer's waitFor call blocks. But the producer doesn't seem to be regulated in any way and will just wrap around and overwrite unprocessed data in the ring buffer.
I have a producer as a Runnable object running on a separate thread.
public class Producer implements Runnable {
private final RingBuffer<Event> ringbuffer;
public Producer(RingBuffer<Event> rb) {
ringbuffer = rb;
}
public void run() {
long next = 0L;
while(true) {
try {
next = ringbuffer.next();
Event e = ringbuffer.get(next);
... do stuff...
e.set(... stuff...);
}
finally {
ringbuffer.publish(next);
}
}
}
}
I have a consumer running on the main thread.
public class Consumer {
private final ExecutorService exec;
private final Disruptor<Event> disruptor;
private final RingBuffer<Event> ringbuffer;
private final SequenceBarrier seqbar;
private long seq = 0L;
public Consumer() {
exec = Executors.newCachedThreadPool();
disruptor = new Disruptor<>(Event.EVENT_FACTORY, 1024, Executors.defaultThreadFactory());
ringbuffer = disruptor.start();
seqbar = ringbuffer.newBarrier();
Producer producer = new Producer(ringbuffer);
exec.submit(producer);
}
public Data getData() {
seqbar.waitFor(seq);
Event e = ringbuffer.get(seq);
seq++;
return e.get();
}
}
Finally, I run the code like so:
public class DisruptorTest {
public static void main(String[] args){
Consumer c = new Consumer();
while (true) {
c.getData();
... Do stuff ...
}
}
}
You need to add a gating sequence (com.lmax.disruptor.Sequence) to the ring buffer; this sequence must be updated to reflect how far your consumer has progressed.
You can implement your event handling with the EventHandler interface and use the provided BatchEventProcessor (com.lmax.disruptor.BatchEventProcessor), which comes with a built-in sequence.
Here's a fully working example:
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import com.lmax.disruptor.BatchEventProcessor;
import com.lmax.disruptor.EventHandler;
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.SequenceBarrier;
import com.lmax.disruptor.dsl.Disruptor;
public class Main {
static class Event {
int id;
}
static class Producer implements Runnable {
private final RingBuffer<Event> ringbuffer;
public Producer(RingBuffer<Event> rb) {
ringbuffer = rb;
}
@Override
public void run() {
long next = 0L;
int id = 0;
while (true) {
try {
next = ringbuffer.next();
Event e = ringbuffer.get(next);
e.id = id++;
} finally {
ringbuffer.publish(next);
}
}
}
}
static class Consumer {
private final ExecutorService exec;
private final Disruptor<Event> disruptor;
private final RingBuffer<Event> ringbuffer;
private final SequenceBarrier seqbar;
private BatchEventProcessor<Event> processor;
public Consumer() {
exec = Executors.newCachedThreadPool();
disruptor = new Disruptor<>(() -> new Event(), 1024, Executors.defaultThreadFactory());
ringbuffer = disruptor.start();
seqbar = ringbuffer.newBarrier();
processor = new BatchEventProcessor<Main.Event>(
ringbuffer, seqbar, new Handler());
ringbuffer.addGatingSequences(processor.getSequence());
Producer producer = new Producer(ringbuffer);
exec.submit(producer);
}
}
static class Handler implements EventHandler<Event> {
@Override
public void onEvent(Event event, long sequence, boolean endOfBatch) throws Exception {
System.out.println("Handling event " + event.id);
}
}
public static void main(String[] args) throws Exception {
Consumer c = new Consumer();
while (true) {
c.processor.run();
}
}
}
I'm looking for a class that will allow me to add items to process and, when the item count equals the batch size, performs some operation. I would use it something like this:
Batcher<Token> batcher = new Batcher<Token>(500, Executors.newFixedThreadPool(4)) {
public void onFlush(List<Token> tokens) {
rest.notifyBatch(tokens);
}
};
tokens.forEach((t)->batcher.add(t));
batcher.awaitDone();
After #awaitDone I know that all tokens have been notified. The #onFlush might do anything; for example, I might want to batch inserts into a database. I would like #onFlush invocations to be submitted to an Executor.
I came up with a solution for this, but it seems like a lot of code, so my question is: is there a better way to do this? Is there an existing class other than the one I implemented, or a better way to implement it? My solution seems to have a lot of moving pieces.
Here's the code I came up with:
/**
* Simple class to allow the batched processing of items and then to alternatively wait
* for all batches to be completed.
*/
public abstract class Batcher<T> {
private final int batchSize;
private final ArrayBlockingQueue<T> batch;
private final Executor executor;
private final Phaser phaser = new Phaser(1);
private final AtomicInteger processed = new AtomicInteger(0);
public Batcher(int batchSize, Executor executor) {
this.batchSize = batchSize;
this.executor = executor;
this.batch = new ArrayBlockingQueue<>(batchSize);
}
public void add(T item) {
processed.incrementAndGet();
while (!batch.offer(item)) {
flush();
}
}
public void addAll(Iterable<T> items) {
for (T item : items) {
add(item);
}
}
public int getProcessedCount() {
return processed.get();
}
public void flush() {
if (batch.isEmpty())
return;
final List<T> batched = new ArrayList<>(batchSize);
batch.drainTo(batched, batchSize);
if (!batched.isEmpty())
executor.execute(new PhasedRunnable(batched));
}
public abstract void onFlush(List<T> batch);
public void awaitDone() {
flush();
phaser.arriveAndAwaitAdvance();
}
public void awaitDone(long duration, TimeUnit unit) throws TimeoutException {
flush();
try {
phaser.awaitAdvanceInterruptibly(phaser.arrive(), duration, unit);
}
catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
}
private class PhasedRunnable implements Runnable {
private final List<T> batch;
private PhasedRunnable(List<T> batch) {
this.batch = batch;
phaser.register();
}
@Override
public void run() {
try {
onFlush(batch);
}
finally {
phaser.arrive();
}
}
}
}
A Java 8 solution would be great. Thanks.
What strikes me is that your code doesn't work with more than one thread adding items to a single Batcher instance. If we turn this limitation into the specified use case, there is no need to use specialized concurrent classes internally. So we can accumulate into an ordinary ArrayList and swap this list with a new one when the capacity is exhausted, without the need to copy items. This allows simplifying the code to:
public class Batcher<T> implements Consumer<T> {
private final int batchSize;
private final Executor executor;
private final Consumer<List<T>> actualAction;
private final Phaser phaser = new Phaser(1);
private ArrayList<T> batch;
private int processed;
public Batcher(int batchSize, Executor executor, Consumer<List<T>> c) {
this.batchSize = batchSize;
this.executor = executor;
this.actualAction = c;
this.batch = new ArrayList<>(batchSize);
}
public void accept(T item) {
processed++;
if(batch.size()==batchSize) flush();
batch.add(item);
}
public int getProcessedCount() {
return processed;
}
public void flush() {
List<T> current = batch;
if (batch.isEmpty())
return;
batch = new ArrayList<>(batchSize);
phaser.register();
executor.execute(() -> {
try {
actualAction.accept(current);
}
finally {
phaser.arrive();
}
});
}
public void awaitDone() {
flush();
phaser.arriveAndAwaitAdvance();
}
public void awaitDone(long duration, TimeUnit unit) throws TimeoutException {
flush();
try {
phaser.awaitAdvanceInterruptibly(phaser.arrive(), duration, unit);
}
catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
}
}
Regarding Java 8 specific improvements, it uses a Consumer, which allows specifying the final action via a lambda expression without the need to subclass Batcher. Further, the PhasedRunnable is replaced by a lambda expression. As another simplification, Batcher<T> implements Consumer<T>, which eliminates the need for an addAll method, as every Iterable supports forEach(Consumer<? super T>).
So the use case now looks like:
Batcher<Token> batcher = new Batcher<>(
500, Executors.newFixedThreadPool(4), currTokens -> rest.notifyBatch(currTokens));
tokens.forEach(batcher);
batcher.awaitDone();
I'm looking for a Java collection that supports blocking reads on a predicate. I wrote a simple version, but it seems like this must have been invented already.
For example:
interface PredicateConsumerCollection<T> {
public void put(T t);
@Nullable
public T get(Predicate<T> p, long millis) throws InterruptedException;
}
put() delivers its argument to a waiting consumer with a matching predicate, or stashes it in a store. A get() returns immediately if a suitable T is already in the store, or blocks till a suitable value is put(), or times out. Consumers compete but fairness isn't critical in my case.
Is anyone aware of such a collection?
There is no ready-made class that solves your problem, but a combination of a ConcurrentHashMap and a BlockingQueue could be a solution.
The hash map is defined as:
final ConcurrentHashMap<Predicate, LinkedBlockingQueue<Result>> lookup;
The put needs to ensure that a queue is added to the map for each Predicate; this can be done in a thread-safe way using putIfAbsent.
If you have a fixed set of Predicates, you can simply pre-fill the map; a consumer can then simply call lookup.get(predicate).take().
If the set of Predicates is unknown or too large, you need to write your own wait/notify implementation for consumers, for the case where a Predicate is not yet in the map.
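For the fixed-set case, a rough sketch might look like the following. The class name PredicateQueues is hypothetical, and the sketch makes two assumptions that the answer does not state: the registered predicates do not overlap (each item matches at most one of them), and predicates are looked up by object identity, so consumers must pass the same Predicate instance that was registered.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

// Hypothetical sketch of the map-of-queues idea for a fixed set of predicates.
final class PredicateQueues<T> {
    // One queue per registered predicate; keys are compared by identity.
    private final ConcurrentHashMap<Predicate<T>, LinkedBlockingQueue<T>> lookup =
            new ConcurrentHashMap<>();

    PredicateQueues(List<Predicate<T>> predicates) {
        for (Predicate<T> p : predicates) {
            lookup.putIfAbsent(p, new LinkedBlockingQueue<>());
        }
    }

    /** Routes the item to the queue of the predicate it satisfies (assumed to be at most one). */
    void put(T item) {
        lookup.forEach((predicate, queue) -> {
            if (predicate.test(item)) {
                queue.add(item);
            }
        });
    }

    /** Waits up to the timeout for an item matching a registered predicate; returns null on timeout. */
    T get(Predicate<T> predicate, long millis) throws InterruptedException {
        LinkedBlockingQueue<T> queue = lookup.get(predicate);
        if (queue == null) {
            throw new IllegalArgumentException("Predicate was not registered");
        }
        return queue.poll(millis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        Predicate<String> isShort = s -> s.length() < 4;
        Predicate<String> isLong = s -> s.length() >= 4;
        PredicateQueues<String> queues = new PredicateQueues<>(Arrays.asList(isShort, isLong));
        queues.put("hello");
        queues.put("hi");
        System.out.println(queues.get(isShort, 100)); // prints "hi"
        System.out.println(queues.get(isLong, 100));  // prints "hello"
    }
}
```

Items that match no registered predicate are silently dropped in this sketch; a real implementation would need a policy for them.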
I also needed something very similar for testing that a certain asynchronous JMS message has been received within a certain timeout. It turns out that this is relatively easy to implement using basic wait/notify, as explained in the Oracle tutorials. The idea is to make the put and query methods synchronized and let the query method do a wait. The put method calls notifyAll to wake up any threads waiting in the query method. The query method must then check whether the predicate is matched. The trickiest part is getting the timeout right, because a waiting thread can wake up when the predicate does not match, and because of possible "spurious wakeups". I found this Stack Overflow post that provides the answer.
Here is the implementation I came up with:
import java.util.ArrayList;
import java.util.List;
// import net.jcip.annotations.GuardedBy;
import com.google.common.base.Predicate;
import com.google.common.collect.Iterables;
public class PredicateConsumerCollectionImpl<T> implements
PredicateConsumerCollection<T> {
// @GuardedBy("this")
private List<T> elements = new ArrayList<>();
@Override
public synchronized void put(T t) {
elements.add(t);
notifyAll();
}
// Note: this method is named query() here, whereas the interface in the question calls it get().
@Override
public synchronized T query(Predicate<T> p, long millis)
throws InterruptedException {
T match = null;
long nanosOfOneMilli = 1000000L;
long endTime = System.nanoTime() + millis * nanosOfOneMilli;
while ((match = Iterables.find(elements, p, null)) == null) {
long sleepTime = endTime - System.nanoTime();
if (sleepTime <= 0) {
return null;
}
wait(sleepTime / nanosOfOneMilli,
(int) (sleepTime % nanosOfOneMilli));
}
return match;
}
synchronized boolean contains(T t) {
return elements.contains(t);
}
}
And here is a JUnit test that proves that the code works as intended:
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;
import static org.junit.Assert.fail;
import org.junit.Before;
import org.junit.Test;
import com.google.common.base.Predicate;
/**
* Unit test for the {@link PredicateConsumerCollection} implementation.
*
* <p>
* The tests act as consumers waiting for the test Producer to put a certain
* String.
*/
public class PredicateConsumerCollectionTest {
private static class Producer implements Runnable {
private PredicateConsumerCollection<String> collection;
public Producer(PredicateConsumerCollection<String> collection) {
this.collection = collection;
collection.put("Initial");
}
@Override
public void run() {
try {
int millis = 50;
collection.put("Hello");
Thread.sleep(millis);
collection.put("I");
Thread.sleep(millis);
collection.put("am");
Thread.sleep(millis);
collection.put("done");
Thread.sleep(millis);
collection.put("so");
Thread.sleep(millis);
collection.put("goodbye!");
} catch (InterruptedException e) {
e.printStackTrace();
fail("Unexpected InterruptedException");
}
}
}
private PredicateConsumerCollectionImpl<String> collection;
private Producer producer;
@Before
public void setup() {
collection = new PredicateConsumerCollectionImpl<>();
producer = new Producer(collection);
}
@Test(timeout = 2000)
public void wait_for_done() throws InterruptedException {
assertTrue(collection.contains("Initial"));
assertFalse(collection.contains("Hello"));
Thread producerThread = new Thread(producer);
producerThread.start();
String result = collection.query(new Predicate<String>() {
@Override
public boolean apply(String s) {
return "done".equals(s);
}
}, 1000);
assertEquals("done", result);
assertTrue(collection.contains("Hello"));
assertTrue(collection.contains("done"));
assertTrue(producerThread.isAlive());
assertFalse(collection.contains("goodbye!"));
producerThread.join();
assertTrue(collection.contains("goodbye!"));
}
@Test(timeout = 2000)
public void wait_for_done_immediately_happens() throws InterruptedException {
Thread producerThread = new Thread(producer);
producerThread.start();
String result = collection.query(new Predicate<String>() {
@Override
public boolean apply(String s) {
return "Initial".equals(s);
}
}, 1000);
assertEquals("Initial", result);
assertFalse(collection.contains("I"));
producerThread.join();
assertTrue(collection.contains("goodbye!"));
}
@Test(timeout = 2000)
public void wait_for_done_never_happens() throws InterruptedException {
Thread producerThread = new Thread(producer);
producerThread.start();
assertTrue(producerThread.isAlive());
String result = collection.query(new Predicate<String>() {
@Override
public boolean apply(String s) {
return "DONE".equals(s);
}
}, 1000);
assertEquals(null, result);
assertFalse(producerThread.isAlive());
assertTrue(collection.contains("goodbye!"));
}
}
I want to use a CompletionService to process the results from a series of threads as they are completed. I have the service in a loop to take the Future objects it provides as they become available, but I don't know the best way to determine when all the threads have completed (and thus to exit the loop):
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
public class Bar {
final static int MAX_THREADS = 4;
final static int TOTAL_THREADS = 20;
public static void main(String[] args) throws Exception{
final ThreadPoolExecutor threadPool = (ThreadPoolExecutor) Executors.newFixedThreadPool(MAX_THREADS);
final CompletionService<Integer> service = new ExecutorCompletionService<Integer>(threadPool);
for (int i=0; i<TOTAL_THREADS; i++){
service.submit(new MyCallable(i));
}
int finished = 0;
Future<Integer> future = null;
do{
future = service.take();
int result = future.get();
System.out.println(" took: " + result);
finished++;
}while(finished < TOTAL_THREADS);
System.out.println("Shutting down");
threadPool.shutdown();
}
public static class MyCallable implements Callable<Integer>{
final int id;
public MyCallable(int id){
this.id = id;
System.out.println("Submitting: " + id);
}
@Override
public Integer call() throws Exception {
Thread.sleep(1000);
System.out.println("finished: " + id);
return id;
}
}
}
I've tried checking the state of the ThreadPoolExecutor, but I know the getCompletedTaskCount and getTaskCount methods are only approximations and shouldn't be relied upon. Is there a better way to ensure that I've retrieved all the Futures from the CompletionService than counting them myself?
Edit: Both the link that Nobeh provided, and this link suggest that counting the number of tasks submitted, then calling take() that many times, is the way to go. I'm just surprised there isn't a way to ask the CompletionService or its Executor what's left to be returned.
See http://www.javaspecialists.eu/archive/Issue214.html for a decent suggestion on how to extend the ExecutorCompletionService to do what you're looking for. I've pasted the relevant code below for your convenience. The author also suggests making the service implement Iterable, which I think would be a good idea.
FWIW, I agree with you that this really should be part of the standard implementation, but alas, it's not.
import java.util.concurrent.*;
import java.util.concurrent.atomic.*;
public class CountingCompletionService<V> extends ExecutorCompletionService<V> {
private final AtomicLong submittedTasks = new AtomicLong();
private final AtomicLong completedTasks = new AtomicLong();
public CountingCompletionService(Executor executor) {
super(executor);
}
public CountingCompletionService(
Executor executor, BlockingQueue<Future<V>> queue) {
super(executor, queue);
}
public Future<V> submit(Callable<V> task) {
Future<V> future = super.submit(task);
submittedTasks.incrementAndGet();
return future;
}
public Future<V> submit(Runnable task, V result) {
Future<V> future = super.submit(task, result);
submittedTasks.incrementAndGet();
return future;
}
public Future<V> take() throws InterruptedException {
Future<V> future = super.take();
completedTasks.incrementAndGet();
return future;
}
public Future<V> poll() {
Future<V> future = super.poll();
if (future != null) completedTasks.incrementAndGet();
return future;
}
public Future<V> poll(long timeout, TimeUnit unit)
throws InterruptedException {
Future<V> future = super.poll(timeout, unit);
if (future != null) completedTasks.incrementAndGet();
return future;
}
public long getNumberOfCompletedTasks() {
return completedTasks.get();
}
public long getNumberOfSubmittedTasks() {
return submittedTasks.get();
}
public boolean hasUncompletedTasks() {
return completedTasks.get() < submittedTasks.get();
}
}
The code below is inspired by @Mark's answer, but I find it more convenient to use:
package com.example;
import java.util.Iterator;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
public class CompletionIterator<T> implements Iterator<T>, AutoCloseable {
private AtomicInteger count = new AtomicInteger(0);
private CompletionService<T> completer;
private ExecutorService executor = Executors.newWorkStealingPool(100);
public CompletionIterator() {
this.completer = new ExecutorCompletionService<>(executor);
}
public void submit(Callable<T> task) {
completer.submit(task);
count.incrementAndGet();
}
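@Override
public boolean hasNext() {
// Side-effect free, so calling it repeatedly does not skip results.
return count.get() > 0;
}
@Override
public T next() {
try {
return completer.take().get();
} catch (InterruptedException | ExecutionException e) {
throw new RuntimeException(e);
} finally {
// One outstanding result is consumed per call, whether it succeeded or failed.
count.decrementAndGet();
}
}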
@Override
public void close() {
try {
executor.shutdown();
executor.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
executor = null;
completer = null;
count = null;
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
This is how it can be used:
try (CompletionIterator<Integer> service = new CompletionIterator<>()) {
service.submit(task1);
service.submit(task2);
// All tasks must be submitted before iterating, to avoid a race condition.
// CompletionIterator is an Iterator (not an Iterable), and next() returns the result itself.
while (service.hasNext()) {
System.out.printf("Job %d is done%n", service.next());
}
}
Does answering these questions give you the answer?
Do your asynchronous tasks create other tasks that are submitted to the CompletionService?
Is service the only object that is supposed to handle the tasks created in your application?
Based on the reference documentation, CompletionService follows a producer/consumer approach and takes advantage of an internal Executor. So, as long as you produce the tasks in one place and consume them in another, CompletionService.take() will denote whether there are any more results to give out.
I believe this question will also help you.
My take, based on Alex R's variant. It assumes the iterator is only used from one thread, so it uses a plain int counter instead of atomics:
public class CompletionIterator<T> implements Iterable<T> {
private int _count = 0;
private final CompletionService<T> _completer;
public CompletionIterator(ExecutorService executor) {
this._completer = new ExecutorCompletionService<>(executor);
}
public void submit(Callable<T> task) {
_completer.submit(task);
_count++;
}
@Override
public Iterator<T> iterator() {
return new Iterator<T>() {
@Override
public boolean hasNext() {
return _count > 0;
}
@Override
public T next() {
try {
T ret = _completer.take().get();
_count--;
return ret;
} catch (InterruptedException | ExecutionException e) {
throw new RuntimeException(e);
}
}
};
}
}
I have a process which delegates asynchronous tasks to a pool of threads. I need to ensure that certain tasks are executed in order.
So, for example:
Tasks arrive in this order:
Tasks a1, b1, c1, d1, e1, a2, a3, b2, f1
Tasks can be executed in any order except where there is a natural dependency, so a1, a2, a3 must be processed in that order, either by allocating them to the same thread or by blocking them until I know the previous a# task has completed.
Currently it doesn't use the Java concurrency package, but I'm considering changing it to take advantage of the thread management.
Does anyone have a similar solution, or suggestions for how to achieve this?
I wrote my own Executor that guarantees ordering for tasks with the same key. It uses a map of queues to order tasks with the same key. Each keyed task, when it finishes, executes the next task with the same key.
This solution doesn't handle RejectedExecutionException or other exceptions from the delegate Executor! So the delegate Executor should be "unlimited".
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.Executor;
/**
* This Executor guarantees task ordering for tasks with the same key (the key has to implement the hashCode and equals methods correctly).
*/
public class OrderingExecutor implements Executor{
private final Executor delegate;
private final Map<Object, Queue<Runnable>> keyedTasks = new HashMap<Object, Queue<Runnable>>();
public OrderingExecutor(Executor delegate){
this.delegate = delegate;
}
@Override
public void execute(Runnable task) {
// task without key can be executed immediately
delegate.execute(task);
}
public void execute(Runnable task, Object key) {
if (key == null){ // if key is null, execute without ordering
execute(task);
return;
}
boolean first;
Runnable wrappedTask;
synchronized (keyedTasks){
Queue<Runnable> dependencyQueue = keyedTasks.get(key);
first = (dependencyQueue == null);
if (dependencyQueue == null){
dependencyQueue = new LinkedList<Runnable>();
keyedTasks.put(key, dependencyQueue);
}
wrappedTask = wrap(task, dependencyQueue, key);
if (!first)
dependencyQueue.add(wrappedTask);
}
// execute method can block, call it outside synchronize block
if (first)
delegate.execute(wrappedTask);
}
private Runnable wrap(Runnable task, Queue<Runnable> dependencyQueue, Object key) {
return new OrderedTask(task, dependencyQueue, key);
}
class OrderedTask implements Runnable{
private final Queue<Runnable> dependencyQueue;
private final Runnable task;
private final Object key;
public OrderedTask(Runnable task, Queue<Runnable> dependencyQueue, Object key) {
this.task = task;
this.dependencyQueue = dependencyQueue;
this.key = key;
}
@Override
public void run() {
try{
task.run();
} finally {
Runnable nextTask = null;
synchronized (keyedTasks){
if (dependencyQueue.isEmpty()){
keyedTasks.remove(key);
}else{
nextTask = dependencyQueue.poll();
}
}
if (nextTask!=null)
delegate.execute(nextTask);
}
}
}
}
When I've done this in the past I've usually had the ordering handled by a component which then submits callables/runnables to an Executor.
Something like:
Got a list of tasks to run, some with dependencies
Create an Executor and wrap with an ExecutorCompletionService
Search all tasks, any with no dependencies, schedule them via the completion service
Poll the completion service
As each task completes
Add it to a "completed" list
Reevaluate any waiting tasks with respect to the "completed" list to see if they are "dependency complete". If so, schedule them
Rinse repeat until all tasks are submitted/completed
The completion service is a nice way of getting the tasks as they complete rather than trying to poll a bunch of Futures. However, you will probably want to keep a Map<Future, TaskIdentifier> which is populated when a task is scheduled via the completion service, so that when the completion service gives you a completed Future you can figure out which TaskIdentifier it belongs to.
If you ever find yourself in a state where tasks are still waiting to run, but nothing is running and nothing can be scheduled, then you have a circular dependency problem.
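The steps above could be sketched roughly as follows. This is only an illustration under some assumptions: the class name DependencyScheduler, the method runAll, and the use of String task ids are all made up for the example. Instead of a Map<Future, TaskIdentifier>, the sketch passes the task id as the result value to CompletionService.submit(Runnable, V), so the completed Future hands the id back directly.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;

// Hypothetical helper, not part of the original answer.
class DependencyScheduler {
    /**
     * Runs each task only after all of its prerequisites (deps) have completed.
     * Returns the task ids in the order in which they completed.
     */
    static List<String> runAll(Map<String, Runnable> tasks,
                               Map<String, Set<String>> deps,
                               ExecutorService executor)
            throws InterruptedException, ExecutionException {
        CompletionService<String> completer = new ExecutorCompletionService<>(executor);
        Set<String> waiting = new HashSet<>(tasks.keySet());
        Set<String> completed = new HashSet<>();
        List<String> order = new ArrayList<>();
        int running = 0;

        while (!waiting.isEmpty() || running > 0) {
            // Schedule every waiting task whose prerequisites are all in "completed".
            for (String id : new ArrayList<>(waiting)) {
                if (completed.containsAll(deps.getOrDefault(id, Collections.emptySet()))) {
                    waiting.remove(id);
                    completer.submit(tasks.get(id), id); // the Future yields the task id
                    running++;
                }
            }
            // Tasks are waiting, nothing is running, nothing could be scheduled.
            if (running == 0) {
                throw new IllegalStateException("Circular or missing dependency among: " + waiting);
            }
            // Block until any running task finishes, then record it as completed.
            String done = completer.take().get();
            running--;
            completed.add(done);
            order.add(done);
        }
        return order;
    }
}
```

Rescanning all waiting tasks after every completion is quadratic in the worst case; for large graphs, a per-task counter of unfinished prerequisites would probably be a better fit.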
When you submit a Runnable or Callable to an ExecutorService you receive a Future in return. Pass a1's Future to the tasks that depend on a1 and have them call Future.get(). This will block until a1 completes.
So:
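ExecutorService exec = Executors.newFixedThreadPool(5);
Runnable a1 = ...
final Future<?> f1 = exec.submit(a1);
Runnable a2 = new Runnable() {
@Override
public void run() {
try {
f1.get(); // blocks until a1 has finished
} catch (InterruptedException | ExecutionException e) {
throw new RuntimeException(e);
}
... // do stuff
}
};
exec.submit(a2);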
and so on.
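One caveat with blocking on Future.get() is that every waiting task occupies a pool thread while it waits. As a possible alternative (a different technique from the one in this answer), CompletableFuture can chain the tasks so that a2 is only scheduled once a1 has finished. A minimal sketch, where the class name ChainedTasks and the log list are invented for the example:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

// Hypothetical example class; the task bodies just record their names.
class ChainedTasks {
    static List<String> runInOrder(ExecutorService exec) {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        CompletableFuture<Void> a1 = CompletableFuture.runAsync(() -> log.add("a1"), exec);
        // thenRunAsync submits the next task only after the previous one completes,
        // so no pool thread sits blocked waiting for a1 or a2.
        CompletableFuture<Void> a2 = a1.thenRunAsync(() -> log.add("a2"), exec);
        CompletableFuture<Void> a3 = a2.thenRunAsync(() -> log.add("a3"), exec);
        a3.join(); // only the caller waits, and only for the end of the chain
        return log;
    }
}
```

This works even with a single-threaded pool, where the Future.get() version would deadlock if a2 happened to be picked up before a1.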
You can use Executors.newSingleThreadExecutor(), but it will use only one thread to execute your tasks. Another option is to use CountDownLatch. Here is a simple example:
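import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
public class Main2 {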
public static void main(String[] args) throws InterruptedException {
final CountDownLatch cdl1 = new CountDownLatch(1);
final CountDownLatch cdl2 = new CountDownLatch(1);
final CountDownLatch cdl3 = new CountDownLatch(1);
List<Runnable> list = new ArrayList<Runnable>();
list.add(new Runnable() {
public void run() {
System.out.println("Task 1");
// inform that task 1 is finished
cdl1.countDown();
}
});
list.add(new Runnable() {
public void run() {
// wait until task 1 is finished
try {
cdl1.await();
} catch (InterruptedException e) {
e.printStackTrace();
}
System.out.println("Task 2");
// inform that task 2 is finished
cdl2.countDown();
}
});
list.add(new Runnable() {
public void run() {
// wait until task 2 is finished
try {
cdl2.await();
} catch (InterruptedException e) {
e.printStackTrace();
}
System.out.println("Task 3");
// inform that task 3 is finished
cdl3.countDown();
}
});
ExecutorService es = Executors.newFixedThreadPool(200);
for (int i = 0; i < 3; i++) {
es.submit(list.get(i));
}
es.shutdown();
es.awaitTermination(1, TimeUnit.MINUTES);
}
}
Another option is to create your own executor, call it OrderedExecutor, and create an array of encapsulated ThreadPoolExecutor objects, with 1 thread per internal executor. You then supply a mechanism for choosing one of the internal objects, e.g. by providing an interface that the user of your class can implement:
executor = new OrderedExecutor( 10 /* pool size */, new OrderedExecutor.Chooser() {
public int choose( Runnable runnable ) {
MyRunnable myRunnable = (MyRunnable)runnable;
return myRunnable.someId();
}
});
executor.execute( new MyRunnable() );
The implementation of OrderedExecutor.execute() will then use the Chooser to get an int, take that value modulo the pool size, and use the result as the index into the internal array. The idea is that "someId()" will return the same value for all the "a's", etc.
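A minimal sketch of what such an OrderedExecutor might look like is below. The answer only describes the idea, so everything beyond the OrderedExecutor and Chooser names is an assumption: the lanes field, the shutdownAndAwait method, and the use of Executors.newSingleThreadExecutor() in place of hand-built one-thread ThreadPoolExecutor objects.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch only: names other than OrderedExecutor and Chooser are hypothetical.
class OrderedExecutor {
    /** Maps a task to an int; tasks that must run in order should map to the same value. */
    interface Chooser {
        int choose(Runnable runnable);
    }

    private final ExecutorService[] lanes; // one single-threaded executor per slot
    private final Chooser chooser;

    OrderedExecutor(int poolSize, Chooser chooser) {
        this.lanes = new ExecutorService[poolSize];
        for (int i = 0; i < poolSize; i++) {
            lanes[i] = Executors.newSingleThreadExecutor();
        }
        this.chooser = chooser;
    }

    void execute(Runnable runnable) {
        // floorMod keeps the index non-negative even if the chooser returns a negative id.
        int index = Math.floorMod(chooser.choose(runnable), lanes.length);
        lanes[index].execute(runnable);
    }

    /** Stops accepting tasks and waits for each lane to drain; returns false on timeout. */
    boolean shutdownAndAwait(long timeout, TimeUnit unit) throws InterruptedException {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        boolean allDone = true;
        for (ExecutorService lane : lanes) {
            allDone &= lane.awaitTermination(timeout, unit);
        }
        return allDone;
    }
}
```

Because each lane has exactly one thread and a FIFO queue, tasks with the same id run in submission order. The trade-off is that unrelated ids which happen to land on the same lane are also serialized, even when other lanes are idle.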
I created an OrderedExecutor for this problem. If you pass the same key to the execute() method with different runnables, the runnables with the same key will be executed in the order in which execute() was called and will never overlap.
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executor;
/**
* Special executor which can order the tasks if a common key is given.
* Runnables submitted with a non-null key are guaranteed to run in order for the same key.
*
*/
public class OrderedExecutor {
private static final Queue<Runnable> EMPTY_QUEUE = new QueueWithHashCodeAndEquals<Runnable>(
new ConcurrentLinkedQueue<Runnable>());
private ConcurrentMap<Object, Queue<Runnable>> taskMap = new ConcurrentHashMap<Object, Queue<Runnable>>();
private Executor delegate;
private volatile boolean stopped;
public OrderedExecutor(Executor delegate) {
this.delegate = delegate;
}
public void execute(Runnable runnable, Object key) {
if (stopped) {
return;
}
if (key == null) {
delegate.execute(runnable);
return;
}
Queue<Runnable> queueForKey = taskMap.computeIfPresent(key, (k, v) -> {
v.add(runnable);
return v;
});
if (queueForKey == null) {
// There was no running task with this key
Queue<Runnable> newQ = new QueueWithHashCodeAndEquals<Runnable>(new ConcurrentLinkedQueue<Runnable>());
newQ.add(runnable);
// Use putIfAbsent because this execute() method can be called concurrently as well
queueForKey = taskMap.putIfAbsent(key, newQ);
if (queueForKey != null)
queueForKey.add(runnable);
delegate.execute(new InternalRunnable(key));
}
}
public void shutdown() {
stopped = true;
taskMap.clear();
}
/**
* Own Runnable used by OrderedExecutor.
* The runnable is associated with a specific key - the Queue<Runnable> for this
* key is polled.
* If the queue is empty, it tries to remove the queue from taskMap.
*
*/
private class InternalRunnable implements Runnable {
private Object key;
public InternalRunnable(Object key) {
this.key = key;
}
@Override
public void run() {
while (true) {
// There must be at least one task now
Runnable r = taskMap.get(key).poll();
while (r != null) {
r.run();
r = taskMap.get(key).poll();
}
// The queue emptied
// Remove from the map if and only if the queue is really empty
boolean removed = taskMap.remove(key, EMPTY_QUEUE);
if (removed) {
// The queue has been removed from the map,
// if a new task arrives with the same key, a new InternalRunnable
// will be created
break;
} // If the queue has not been removed from the map it means that someone put a task into it
// so we can safely continue the loop
}
}
}
/**
* Special Queue implementation, with equals() and hashCode() methods.
* By default, Java SE queues use identity equals() and default hashCode() methods.
* This implementation uses Arrays.equals(Queue::toArray()) and Arrays.hashCode(Queue::toArray()).
*
* @param <E> The type of elements in the queue.
*/
private static class QueueWithHashCodeAndEquals<E> implements Queue<E> {
private Queue<E> delegate;
public QueueWithHashCodeAndEquals(Queue<E> delegate) {
this.delegate = delegate;
}
public boolean add(E e) {
return delegate.add(e);
}
public boolean offer(E e) {
return delegate.offer(e);
}
public int size() {
return delegate.size();
}
public boolean isEmpty() {
return delegate.isEmpty();
}
public boolean contains(Object o) {
return delegate.contains(o);
}
public E remove() {
return delegate.remove();
}
public E poll() {
return delegate.poll();
}
public E element() {
return delegate.element();
}
public Iterator<E> iterator() {
return delegate.iterator();
}
public E peek() {
return delegate.peek();
}
public Object[] toArray() {
return delegate.toArray();
}
public <T> T[] toArray(T[] a) {
return delegate.toArray(a);
}
public boolean remove(Object o) {
return delegate.remove(o);
}
public boolean containsAll(Collection<?> c) {
return delegate.containsAll(c);
}
public boolean addAll(Collection<? extends E> c) {
return delegate.addAll(c);
}
public boolean removeAll(Collection<?> c) {
return delegate.removeAll(c);
}
public boolean retainAll(Collection<?> c) {
return delegate.retainAll(c);
}
public void clear() {
delegate.clear();
}
@Override
public boolean equals(Object obj) {
if (!(obj instanceof QueueWithHashCodeAndEquals)) {
return false;
}
QueueWithHashCodeAndEquals<?> other = (QueueWithHashCodeAndEquals<?>) obj;
return Arrays.equals(toArray(), other.toArray());
}
@Override
public int hashCode() {
return Arrays.hashCode(toArray());
}
}
}
In the Habanero-Java library, there is a concept of data-driven tasks which can be used to express dependencies between tasks and avoid thread-blocking operations. Under the covers, the Habanero-Java library uses the JDK's ForkJoinPool (i.e. an ExecutorService).
For example, your use case for tasks A1, A2, A3, ... could be expressed as follows:
HjFuture a1 = future(() -> { doA1(); return true; });
HjFuture a2 = futureAwait(a1, () -> { doA2(); return true; });
HjFuture a3 = futureAwait(a2, () -> { doA3(); return true; });
Note that a1, a2, and a3 are just references to objects of type HjFuture and can be maintained in your custom data structures to specify the dependencies as and when the tasks A2 and A3 come in at runtime.
There are some tutorial slides available.
You can find further documentation as javadoc, API summary and primers.
I have written my own executor service which is sequence aware. It sequences tasks that share a related reference and are currently in flight.
You can go through the implementation at https://github.com/nenapu/SequenceAwareExecutorService