Are tasks parallelized when executed via an ExecutorCompletionService?


I submitted 5 jobs to an ExecutorCompletionService, but it seems like the jobs are executed in sequence. The ExecutorService that is passed to the constructor of ExecutorCompletionService is created using newCacheThreadPool. Am I doing anything wrong?
UPDATE Each job is basically doing a database query & some calculation. The code for the ExecutorCompletionService is lifted as-is off the javadoc. I just replaced the Callables with my own custom Callable implementations.

The ExecutorCompletionService has nothing to do with how jobs are executed, it's simply a convenient way of retrieving the results.
Executors.newCachedThreadPool by default executes tasks in separate threads, which can be parallel, given that:
tasks are independent, and don't e.g. synchronize on the same object inside;
you have multiple hardware CPU threads.
The last point deserves an explanation. Although there are no guarantees, in practice the Sun JVM tends to favour the currently executing thread, so a short task may not be swapped out in favour of another one before it finishes. That means that your 5 tasks might end up being executed serially due to the JVM implementation and not having e.g. a multi-core machine.
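One way to check this on your own machine is a small timing experiment. The following is only a sketch: the class name CompletionOrderDemo and the sleep durations are made up for illustration, and Thread.sleep stands in for the database query. It shows that the completion service hands results back in completion order while the pool runs the tasks concurrently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CompletionOrderDemo {

    // Submits one sleeping task per delay and returns the delays in the order
    // in which the CompletionService hands the finished tasks back.
    static List<Long> completionOrder(long... delaysMillis)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            CompletionService<Long> ecs = new ExecutorCompletionService<>(pool);
            for (long delay : delaysMillis) {
                ecs.submit(() -> {
                    Thread.sleep(delay); // hypothetical stand-in for the real work
                    return delay;
                });
            }
            List<Long> order = new ArrayList<>();
            for (int i = 0; i < delaysMillis.length; i++) {
                order.add(ecs.take().get()); // take() blocks until some task is done
            }
            return order;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        System.out.println(completionOrder(300, 100, 200));
        System.out.println("elapsed ms: " + (System.nanoTime() - start) / 1_000_000);
    }
}
```

If the tasks ran one after another, the elapsed time would be close to the sum of the delays; when they overlap, it should be close to the longest single delay.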

I assume you meant Executors.newCachedThreadPool(). If so, execution should be parallelized as you expect.

Each job is basically doing a database query & some calculation. The code for the ExecutorCompletionService is lifted as-is off the javadoc. I just replaced the Callables with my own custom Callable implementations.
In that case, are you sure you're not mistaken in thinking they're executed sequentially because you're retrieving the results sequentially?
Throw in some debug logging lines in your callables to rule this out, and/or have a look at this limited usage scenario:
public static void main(String... args) throws InterruptedException, ExecutionException {
List<Callable<String>> list = new ArrayList<Callable<String>>();
list.add(new PowersOfX(2));
list.add(new PowersOfX(3));
list.add(new PowersOfX(5));
solve(Executors.newCachedThreadPool(), list);
}
static void solve(Executor e, Collection<Callable<String>> solvers) throws InterruptedException, ExecutionException {
CompletionService<String> ecs = new ExecutorCompletionService<String>(e);
for (Callable<String> s : solvers)
ecs.submit(s);
int n = solvers.size();
for (int i = 0; i < n; ++i) {
String r = ecs.take().get();
if (r != null)
System.out.println("Retrieved: " + r);
}
}
static class PowersOfX implements Callable<String> {
int x;
public PowersOfX(int x) {this.x = x;}
@Override
public String call() throws Exception {
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 10; i++) {
sb.append(Math.pow(x, i)).append('\t'); // use this task's base x, not a hard-coded 2
System.out.println(Math.pow(x, i));
Thread.sleep(2000);
}
return sb.toString();
}
}
Executing this, you'll see that the numbers are generated intermixed (and thus the tasks are executed concurrently), but retrieving the results alone won't show you this level of detail.

The execution will depend on a number of things. For example:
the length of time it takes to complete a job
the number of threads in the thread pool (a cached thread pool will only create threads if it thinks they are needed)
Executing in sequence is not necessarily wrong.

Related

Get results of scheduled non-blocking operations in Java

I am trying to do some blocking operations (say HTTP request) in a scheduled and non-blocking manner. Let's say I have 10 requests and one request takes 3 seconds but I would like not to wait for 3 seconds but wait 1 second and send the next one. After all executions are finished I would like to gather all results in a list and return to the user.
Below, there is a prototype of my scenario (thread sleep used as blocking operation instead of HTTP req.)
public static List<Integer> getResults(List<Integer> inputs) throws InterruptedException, ExecutionException {
List<Integer> results = new LinkedList<Integer>();
Queue<Callable<Integer>> tasks = new LinkedList<Callable<Integer>>();
List<Future<Integer>> futures = new LinkedList<Future<Integer>>();
for (Integer input : inputs) {
Callable<Integer> task = new Callable<Integer>() {
public Integer call() throws InterruptedException {
Thread.sleep(3000);
return input + 1000;
}
};
tasks.add(task);
}
ExecutorService es = Executors.newCachedThreadPool();
ScheduledExecutorService ses = Executors.newScheduledThreadPool(1);
ses.scheduleAtFixedRate(new Runnable() {
@Override
public void run() {
Callable<Integer> task = tasks.poll();
if (task == null) {
ses.shutdown();
es.shutdown();
return;
}
futures.add(es.submit(task));
}
}, 0, 1000, TimeUnit.MILLISECONDS);
while(true) {
if(futures.size() == inputs.size()) {
for (Future<Integer> future : futures) {
Integer result = future.get();
results.add(result);
}
return results;
}
}
}
public static void main(String[] args) throws InterruptedException, ExecutionException {
List<Integer> results = getResults(new LinkedList<Integer>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)));
System.out.println(Arrays.toString(results.toArray()));
}
I am waiting in a while loop until all tasks return a proper result. But it never satisfies the exit condition and loops forever. Whenever I add an I/O operation such as a logger call, or even a breakpoint, it leaves the while loop and everything works.
I am relatively new to Java concurrency and am trying to understand what is happening and whether this is the correct way to do it. I guess the I/O operation triggers something in the thread scheduler and makes it check the collections' sizes.

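You need to synchronize your threads. You have two different threads (the main thread and the executor service thread) accessing the futures list, and since LinkedList is not synchronized, these two threads can see two different states of futures. Note that both sides have to use the same lock: the futures.add(...) call in the scheduled task needs a synchronized(futures) block as well, not only the size check below.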
while(true) {
synchronized(futures) {
if(futures.size() == inputs.size()) {
...
}
}
}
This happens because threads in java use the cpu cache to improve performance. So each thread could have different values of a variable until they are synchronized.
This SO question has more information on this.
Also from this answer:
It's all about memory. Threads communicate through shared memory, but when there are multiple CPUs in a system, all trying to access the same memory system, then the memory system becomes a bottleneck. Therefore, the CPUs in a typical multi-CPU computer are allowed to delay, re-order, and cache memory operations in order to speed things up.
That works great when threads are not interacting with one another, but it causes problems when they actually do want to interact: If thread A stores a value into an ordinary variable, Java makes no guarantee about when (or even if) thread B will see the value change.
In order to overcome that problem when it's important, Java gives you certain means of synchronizing threads. That is, getting the threads to agree on the state of the program's memory. The volatile keyword and the synchronized keyword are two means of establishing synchronization between threads.
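And finally, the main thread never observes the updated futures list in your code because it spins in the infinite while block without any synchronization point. Doing an I/O operation inside the loop often happens to fix this, because calls such as System.out.println typically synchronize internally and so make the thread re-read shared state, but that is a side effect you should not rely on.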
An infinite while loop is generally a bad idea because it is very resource intensive. Adding a small delay before the next iteration can make it a little better (though still inefficient).
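One way to avoid both the shared list and the busy loop is to let the calling thread schedule every task itself and keep the returned futures. This is only a sketch under assumptions: the class name StaggeredRequests and the intervalMillis/workMillis parameters are invented here, and Thread.sleep again stands in for the HTTP request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class StaggeredRequests {

    // Starts one task every intervalMillis; each task blocks for workMillis
    // (a stand-in for the HTTP request) and returns input + 1000.
    static List<Integer> getResults(List<Integer> inputs, long intervalMillis, long workMillis)
            throws InterruptedException, ExecutionException {
        // One thread per input is more than strictly needed; it keeps the sketch simple.
        ScheduledExecutorService ses = Executors.newScheduledThreadPool(Math.max(1, inputs.size()));
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            long delay = 0;
            for (Integer input : inputs) {
                // Only the calling thread touches 'futures', so no synchronization is needed.
                futures.add(ses.schedule(() -> {
                    Thread.sleep(workMillis);
                    return input + 1000;
                }, delay, TimeUnit.MILLISECONDS));
                delay += intervalMillis;
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : futures) {
                results.add(future.get()); // blocks until that task is done, no spinning
            }
            return results;
        } finally {
            ses.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(getResults(Arrays.asList(1, 2, 3, 4, 5), 1000, 3000));
    }
}
```

Future.get() provides the visibility guarantee that the unsynchronized size check lacks, so the main thread never has to poll.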

Why is CompletableFuture.supplyAsync succeeding a random number of times?

I'm new to both lambdas and asynchronous code in Java 8. I keep getting some weird results...
I have the following code:
import java.util.concurrent.CompletableFuture;
public class Program {
public static void main(String[] args) {
for (int i = 0; i < 100; i++) {
String test = "Test_" + i;
final int a = i;
CompletableFuture<Boolean> cf = CompletableFuture.supplyAsync(() -> doPost(test));
cf.thenRun(() -> System.out.println(a)) ;
}
}
private static boolean doPost(String t) {
System.out.println(t);
return true;
}
}
The actual code is a lot longer, as the doPost method will post some data to a web service. However, I'm able to replicate my issue with this bare-bones code.
I want to have the doPost method execute 100 times, but asynchronously for performance reasons (in order to push data to the web service faster than doing 100 synchronous calls would be).
In the code above, the doPost method is run a random number of times, but always no more than 20-25 times. There are no exceptions thrown. It seems that either some thread handling mechanism is silently refusing to create new threads and execute their code, or the threads are silently crashing without crashing the program.
I also have an issue where, if I add more functionality to the doPost method than shown above, it reaches a point where the method simply silently breaks. I've tried adding a System.out.println("test") right before the return statement in that case, but it is never called. The loop which loops 100 times does run 100 iterations though.
This behaviour is confusing, to say the least.
What am I missing? Why is the function supplied as an argument to supplyAsync run a seemingly random number of times?
EDIT: Just wanted to point out that the situation is not exactly the same as in the question this was marked as a possible duplicate of, as that question dealt with arbitrarily deeply nested futures, and this one deals with parallel ones. However, the reason why they are failing is virtually identical. The cases seem distinct enough to merit separate questions to me, but others might disagree...

By default, CompletableFuture uses ForkJoinPool.commonPool() (see the CompletableFuture implementation). This default pool creates only daemon threads, i.e. they won't keep the application from terminating while they are still running. Your main method returns as soon as the loop ends, so the JVM can exit before most of the queued tasks have run.
You have the following choices:
Collect all the CompletableFutures in an array and then call CompletableFuture.allOf(...).join(); this guarantees that all stages have completed before execution continues past join().
Use the *Async operations with your own thread pool that contains only non-daemon threads, as in the following example:
public static void main(String[] args) throws InterruptedException {
ExecutorService pool = Executors.newFixedThreadPool(10, r -> {
Thread t = new Thread(r);
t.setDaemon(false); // must be not daemon
return t;
});
for (int i = 0; i < 100; i++) {
final int a = i;
// the operation must be Async with our thread pool
CompletableFuture<Boolean> cf = CompletableFuture.supplyAsync(() -> doPost(a), pool);
cf.thenRun(() -> System.out.printf("%s: Run_%s%n", Thread.currentThread().getName(), a));
}
pool.shutdown(); // without this the main application will be blocked forever
}
private static boolean doPost(int t) {
System.out.printf("%s: Post_%s%n", Thread.currentThread().getName(), t);
return true;
}
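The first option above (waiting with allOf(...).join()) could look roughly like this. It is a sketch only: the class name WaitForAllPosts and the postAll helper are invented for illustration, and doPost is a placeholder for the real web-service call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class WaitForAllPosts {

    // Placeholder for the real web-service call.
    static boolean doPost(String payload) {
        System.out.println(payload);
        return true;
    }

    // Starts 'count' asynchronous posts on the common pool, waits for all of
    // them, and returns how many reported success.
    static int postAll(int count) {
        List<CompletableFuture<Boolean>> futures = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String payload = "Test_" + i;
            futures.add(CompletableFuture.supplyAsync(() -> doPost(payload)));
        }
        // join() blocks the calling thread until every future has completed.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        int succeeded = 0;
        for (CompletableFuture<Boolean> f : futures) {
            if (f.join()) {
                succeeded++;
            }
        }
        return succeeded;
    }

    public static void main(String[] args) {
        System.out.println("posts completed: " + postAll(100));
    }
}
```

Because main blocks in join(), the daemon threads of the common pool get the chance to finish all queued work before the JVM exits.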

Massive tasks alternative pattern for Runnable or Callable

For massively parallel computing I tend to use executors and callables. When I have thousands of objects to compute, I don't feel good about instantiating thousands of Runnables, one for each object.
So I have two approaches to solve this:
I. Split the workload into a small amount of x-workers giving y-objects each. (splitting the object list into x-partitions with y/x-size each)
public static <V> List<List<V>> partitions(List<V> list, int chunks) {
final ArrayList<List<V>> lists = new ArrayList<List<V>>();
final int size = Math.max(1, list.size() / chunks + 1);
final int listSize = list.size();
for (int i = 0; i <= chunks; i++) {
final List<V> vs = list.subList(Math.min(listSize, i * size), Math.min(listSize, i * size + size));
if(vs.size() == 0) break;
lists.add(vs);
}
return lists;
}
II. Creating x-workers which fetch objects from a queue.
Questions:
Is creating thousand of Runnables really expensive and to be avoided?
Is there a generic pattern/recommendation how to do it by solution II?
Are you aware of a different approach?

Creating thousands of Runnable (objects implementing Runnable) is not more expensive than creating a normal object.
Creating and running thousands of Threads can be very heavy, but you can use Executors with a pool of threads to solve this problem.

As for the different approach, you might be interested in java 8's parallel streams.

Combining various answers here :
Is creating thousand of Runnables really expensive and to be avoided?
No, it's not in and of itself. It's how you will make them execute that may prove costly (spawning a few thousand threads certainly has its cost).
So you would not want to do this :
List<Computation> computations = ...
List<Thread> threads = new ArrayList<>();
for (Computation computation : computations) {
Thread thread = new Thread(new Computation(computation));
threads.add(thread);
thread.start();
}
// If you need to wait for completion:
for (Thread t : threads) {
t.join();
}
Because it would 1) be unnecessarily costly in terms of OS resources (native threads, each with its own stack), 2) spam the OS scheduler with a massively concurrent workload, most certainly leading to plenty of context switches and associated cache invalidations at the CPU level, and 3) be a nightmare to catch and deal with exceptions (your threads should probably define an uncaught exception handler, and you'd have to deal with it manually).
You'd probably prefer an approach where a finite Thread pool (of a few threads, "a few" being closely related to your number of CPU cores) handles many many Callables.
List<Computation> computations = ...
// someNumber: pool size, typically close to the number of CPU cores
ExecutorService pool = Executors.newFixedThreadPool(someNumber);
List<Future<Result>> results = new ArrayList<>();
for (Computation computation : computations) {
results.add(pool.submit(new ComputationCallable(computation)));
}
for (Future<Result> result : results) {
doSomething(result.get()); // blocks until that computation has finished
}
pool.shutdown();
The fact that you reuse a limited number of threads should yield a really nice improvement.
Is there a generic pattern/recommendation how to do it by solution II?
There are. First, your partition code (getting from a List to a List<List>) can be found inside collection tools such as Guava, with more generic and more robust implementations.
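If you would rather not pull in a library, a JDK-only version is short. The sketch below is not the Guava implementation; the class name Partitioner is made up, and like Guava's Lists.partition it takes a chunk size rather than a chunk count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Partitioner {

    // Splits 'list' into consecutive sublist views of at most 'size' elements each.
    static <V> List<List<V>> partition(List<V> list, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<V>> parts = new ArrayList<>();
        for (int from = 0; from < list.size(); from += size) {
            // subList returns a view, so no elements are copied
            parts.add(list.subList(from, Math.min(list.size(), from + size)));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(partition(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), 4));
    }
}
```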
But more than this, two patterns come to mind for what you are achieving :
Use the Fork/Join Pool with Fork/Join tasks (that is, spawn a task with your whole list of items, and each task will fork sub tasks with half of that list, up to the point where each task manages a small enough list of items). It's divide and conquer. See: http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ForkJoinTask.html
If your computation were to be "add integers from a list", it could look like (there might be a boundary bug in there, I did not really check) :
public static class Adder extends RecursiveTask<Integer> {
protected List<Integer> globalList;
protected int start;
protected int stop;
public Adder(List<Integer> globalList, int start, int stop) {
super();
this.globalList = globalList;
this.start = start;
this.stop = stop;
System.out.println("Creating for " + start + " => " + stop);
}
@Override
protected Integer compute() {
if (stop - start > 1000) {
// Too many arguments, we split the list
Adder subTask1 = new Adder(globalList, start, start + (stop-start)/2);
Adder subTask2 = new Adder(globalList, start + (stop-start)/2, stop);
subTask2.fork();
return subTask1.compute() + subTask2.join();
} else {
// Manageable size of arguments, we deal in place
int result = 0;
for(int i = start; i < stop; i++) {
result += globalList.get(i); // add the list element, not the loop index
}
return result;
}
}
}
public void doWork() throws Exception {
List<Integer> computation = new ArrayList<>();
for(int i = 0; i < 10000; i++) {
computation.add(i);
}
ForkJoinPool pool = new ForkJoinPool();
RecursiveTask<Integer> masterTask = new Adder(computation, 0, computation.size());
Future<Integer> future = pool.submit(masterTask);
System.out.println(future.get());
}
Use Java 8 parallel streams in order to launch multiple parallel computations easily (under the hood, parallel streams run on the common Fork/Join pool by default).
Others have shown what this might look like.
Are you aware of a different approach?
For a different take at concurrent programming (without explicit task / thread handling), have a look at the actor pattern. https://en.wikipedia.org/wiki/Actor_model
Akka comes to mind as a popular implementation of this pattern...

@Aaron is right, you should take a look at Java 8's parallel streams:
void processInParallel(List<V> list) {
list.parallelStream().forEach(item -> {
// do something
});
}
If you need to specify chunks, you could use a ForkJoinPool as described here:
void processInParallel(List<V> list, int chunks) {
ForkJoinPool forkJoinPool = new ForkJoinPool(chunks);
forkJoinPool.submit(() -> {
list.parallelStream().forEach(item -> {
// do something with each item
});
});
}
You could also have a functional interface as an argument:
void processInParallel(List<V> list, int chunks, Consumer<V> processor) {
ForkJoinPool forkJoinPool = new ForkJoinPool(chunks);
forkJoinPool.submit(() -> {
list.parallelStream().forEach(item -> processor.accept(item));
});
}
Or in shorthand notation:
void processInParallel(List<V> list, int chunks, Consumer<V> processor) {
new ForkJoinPool(chunks).submit(() -> list.parallelStream().forEach(processor::accept));
}
And then you would use it like:
processInParallel(myList, 2, item -> {
// do something with each item
});
Depending on your needs, the ForkJoinPool#submit() returns an instance of ForkJoinTask, which is a Future and you may use it to check for the status or wait for the end of your task.
You'd most probably want the ForkJoinPool instantiated only once (not instantiate it on every method call) and then reuse it to prevent CPU choking if the method is called multiple times.
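Both points, reusing one pool and waiting for the task, can be combined as in the sketch below. The class name SharedPoolProcessor and the pool size of 4 are arbitrary choices for illustration, and running a parallel stream inside a custom ForkJoinPool relies on implementation behaviour rather than a documented guarantee.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.function.Consumer;

class SharedPoolProcessor {

    // One pool for the whole application instead of one per call; size 4 is arbitrary.
    private static final ForkJoinPool POOL = new ForkJoinPool(4);

    static <V> void processInParallel(List<V> list, Consumer<V> processor)
            throws InterruptedException, ExecutionException {
        // get() blocks until the stream has finished and rethrows a failure
        // of the processor wrapped in an ExecutionException.
        POOL.submit(() -> list.parallelStream().forEach(processor)).get();
    }

    public static void main(String[] args) throws Exception {
        processInParallel(Arrays.asList("a", "b", "c"),
                item -> System.out.println(item + " on " + Thread.currentThread().getName()));
        POOL.shutdown();
    }
}
```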

Is creating thousand of Runnables really expensive and to be avoided?
Not at all. The Runnable and Callable interfaces have only one method to implement each, and the amount of "extra" code in each task depends on the code you are running. Any overhead is certainly no fault of the Runnable/Callable interfaces.
Is there a generic pattern/recommendation how to do it by solution II?
Pattern 2 is more favorable than pattern 1. This is because pattern 1 assumes that each worker will finish at the exact same time. If some workers finish before other workers, they could just be sitting idle since they only are able to work on the y/x-size queues you assigned to each of them. In pattern 2 however, you will never have idle worker threads (unless the end of the work queue is reached and numWorkItems < numWorkers).
An easy way to use the preferred pattern, pattern 2, is to use the ExecutorService invokeAll(Collection<? extends Callable<T>> list) method.
Here is an example usage:
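List<Callable<Object>> workList = ...; // a single list of all of your work
// a bounded pool, so that the tasks queue up and idle workers pull the next one
ExecutorService es = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
es.invokeAll(workList); // blocks until every task has completed
es.shutdown();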
Fairly readable and straightforward usage, and the ExecutorService implementation will automatically use solution 2 for you, so you know that each worker thread has their use time maximized.
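For comparison, a hand-rolled version of solution II might look like the following sketch. The class name QueueWorkers and the sum-of-squares workload are invented for illustration; each worker simply keeps polling a shared queue until it is empty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class QueueWorkers {

    // Sums the squares of 'items' with 'workers' threads that pull from one shared queue.
    static long sumOfSquares(List<Integer> items, int workers) throws InterruptedException {
        Queue<Integer> queue = new ConcurrentLinkedQueue<>(items);
        AtomicLong total = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.execute(() -> {
                Integer item;
                // poll() returns null once the queue is empty, which ends this worker.
                while ((item = queue.poll()) != null) {
                    total.addAndGet((long) item * item);
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return total.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> items = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) {
            items.add(i);
        }
        System.out.println(sumOfSquares(items, 4));
    }
}
```

Only as many Runnables as workers are created here, but in most cases the executor's own queue gives the same load balancing with less code.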
Are you aware of a different approach?
Solutions 1 and 2 are two common approaches for generic work. There are many different implementations available for you to choose from (such as java.util.concurrent, Java 8 parallel streams, or Fork/Join pools), but the concept behind each implementation is generally the same. The only exception is if you have specific tasks in mind with non-standard running behavior.

Java concurrency counter not properly clean up

This is a Java concurrency question. 10 jobs need to be done, and each of them has 32 worker threads. Each worker thread increases a counter. Once the counter reaches 32, the job is done and its entry is removed from the counter map. From the console output, I expect "done" to be printed 10 times, the pool size to be 0 and the countThreadMap size to be 0.
The issues are:
Most of the time, "pool size: 0 and countThreadMap size:3" is printed. Even though all threads are gone, 3 jobs are not finished yet.
Sometimes I see a NullPointerException in line 27. I have used ConcurrentHashMap and AtomicLong, so why do I still get a concurrency exception?
Thanks
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.atomic.AtomicLong;
public class Test {
final ConcurrentHashMap<Long, AtomicLong[]> countThreadMap = new ConcurrentHashMap<Long, AtomicLong[]>();
final ExecutorService cachedThreadPool = Executors.newCachedThreadPool();
final ThreadPoolExecutor tPoolExecutor = ((ThreadPoolExecutor) cachedThreadPool);
public void doJob(final Long batchIterationTime) {
for (int i = 0; i < 32; i++) {
Thread workerThread = new Thread(new Runnable() {
@Override
public void run() {
if (countThreadMap.get(batchIterationTime) == null) {
AtomicLong[] atomicThreadCountArr = new AtomicLong[2];
atomicThreadCountArr[0] = new AtomicLong(1);
atomicThreadCountArr[1] = new AtomicLong(System.currentTimeMillis()); //start up time
countThreadMap.put(batchIterationTime, atomicThreadCountArr);
} else {
AtomicLong[] atomicThreadCountArr = countThreadMap.get(batchIterationTime);
atomicThreadCountArr[0].getAndAdd(1);
countThreadMap.put(batchIterationTime, atomicThreadCountArr);
}
if (countThreadMap.get(batchIterationTime)[0].get() == 32) {
System.out.println("done");
countThreadMap.remove(batchIterationTime);
}
}
});
tPoolExecutor.execute(workerThread);
}
}
public void report(){
while(tPoolExecutor.getActiveCount() != 0){
//
}
System.out.println("pool size: "+ tPoolExecutor.getActiveCount() + " and countThreadMap size:"+countThreadMap.size());
}
public static void main(String[] args) throws Exception {
Test test = new Test();
for (int i = 0; i < 10; i++) {
Long batchIterationTime = System.currentTimeMillis();
test.doJob(batchIterationTime);
}
test.report();
System.out.println("All Jobs are done");
}
}

Let’s dig through all the thread-related programming mistakes one can make:
Thread workerThread = new Thread(new Runnable() {
…
tPoolExecutor.execute(workerThread);
You create a Thread but don’t start it but submit it to an executor. It’s a historical mistake of the Java API to let Thread implement Runnable for no good reason. Now, every developer should be aware, that there is no reason to treat a Thread as a Runnable. If you don’t want to start a thread manually, don’t create a Thread. Just create the Runnable and pass it to execute or submit.
I want to emphasize the latter as it returns a Future which gives you for free what you are attempting to implement: the information when a task has been finished. It’s even easier when using invokeAll which will submit a bunch of Callables and return when all are done. Since you didn’t tell us anything about your actual task, it’s not clear whether you can let your tasks simply implement Callable (may return null) instead of Runnable.
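A minimal sketch of the invokeAll variant follows. The class name InvokeAllJob is made up, and the returned number is just a placeholder for whatever the real workers compute, since the actual task is unknown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class InvokeAllJob {

    // Runs 32 workers for one job and returns only after every worker has finished.
    static int runJob(ExecutorService pool, long batchId)
            throws InterruptedException, ExecutionException {
        List<Callable<Long>> workers = new ArrayList<>();
        for (int i = 0; i < 32; i++) {
            final int workerId = i;
            workers.add(() -> batchId * 100 + workerId); // placeholder for the real work
        }
        int finished = 0;
        // invokeAll blocks until all workers are done, so no manual counter is needed.
        for (Future<Long> f : pool.invokeAll(workers)) {
            f.get(); // rethrows a worker's exception, if there was one
            finished++;
        }
        return finished;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            for (long batch = 0; batch < 10; batch++) {
                System.out.println("job " + batch + ": " + runJob(pool, batch) + " workers done");
            }
        } finally {
            pool.shutdown();
        }
    }
}
```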
If you can’t use Callables or don’t want to wait immediately on submission, you have to remember the returned Futures and query them at a later time:
static final ExecutorService cachedThreadPool = Executors.newCachedThreadPool();
public static List<Future<?>> doJob(final Long batchIterationTime) {
final Random r=new Random();
List<Future<?>> list=new ArrayList<>(32);
for (int i = 0; i < 32; i++) {
Runnable job=new Runnable() {
public void run() {
// pretend to do something
LockSupport.parkNanos(TimeUnit.SECONDS.toNanos(r.nextInt(10)));
}
};
list.add(cachedThreadPool.submit(job));
}
return list;
}
public static void main(String[] args) throws Exception {
Test test = new Test();
Map<Long,List<Future<?>>> map=new HashMap<>();
for (int i = 0; i < 10; i++) {
Long batchIterationTime = System.currentTimeMillis();
while(map.containsKey(batchIterationTime))
batchIterationTime++;
map.put(batchIterationTime,doJob(batchIterationTime));
}
// print some statistics, if you really need
int overAllDone=0, overallPending=0;
for(Map.Entry<Long,List<Future<?>>> e: map.entrySet()) {
int done=0, pending=0;
for(Future<?> f: e.getValue()) {
if(f.isDone()) done++;
else pending++;
}
System.out.println(e.getKey()+"\t"+done+" done, "+pending+" pending");
overAllDone+=done;
overallPending+=pending;
}
System.out.println("Total\t"+overAllDone+" done, "+overallPending+" pending");
// wait for the completion of all jobs
for(List<Future<?>> l: map.values())
for(Future<?> f: l)
f.get();
System.out.println("All Jobs are done");
}
But note that if you don’t need the ExecutorService for subsequent tasks, it’s much easier to wait for all jobs to complete:
cachedThreadPool.shutdown();
cachedThreadPool.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
System.out.println("All Jobs are done");
But regardless of how unnecessary the manual tracking of the job status is, let’s delve into your attempt, so you may avoid the mistakes in the future:
if (countThreadMap.get(batchIterationTime) == null) {
The ConcurrentMap is thread safe, but this does not turn your concurrent code into sequential one (that would render multi-threading useless). The above line might be processed by up to all 32 threads at the same time, all finding that the key does not exist yet so possibly more than one thread will then be going to put the initial value into the map.
AtomicLong[] atomicThreadCountArr = new AtomicLong[2];
atomicThreadCountArr[0] = new AtomicLong(1);
atomicThreadCountArr[1] = new AtomicLong(System.currentTimeMillis());
countThreadMap.put(batchIterationTime, atomicThreadCountArr);
That’s why this is called the “check-then-act” anti-pattern. If more than one thread is going to process that code, they all will put their new value, being confident that this was the right thing as they have checked the initial condition before acting but for all but one thread the condition has changed when acting and they are overwriting the value of a previous put operation.
} else {
AtomicLong[] atomicThreadCountArr = countThreadMap.get(batchIterationTime);
atomicThreadCountArr[0].getAndAdd(1);
countThreadMap.put(batchIterationTime, atomicThreadCountArr);
Since you are modifying the AtomicLong that is already stored in the map, the put operation is useless; it puts the very array that was retrieved before. If it weren’t for the mistake that there can be multiple initial values, as described above, the put operation would have no effect.
}
if (countThreadMap.get(batchIterationTime)[0].get() == 32) {
Again, the use of a ConcurrentMap doesn’t turn the multi-threaded code into sequential code. While it is clear that only the last thread will update the atomic counter to 32 (when the initial race condition doesn’t materialize), it is not guaranteed that all other threads have already passed this if statement. Therefore more than one thread, up to all of them, can still be at this point of execution and see the value 32. Or…
System.out.println("done");
countThreadMap.remove(batchIterationTime);
One of the threads which have seen the value 32 might execute this remove operation. At this point, there might still be threads that have not executed the above if statement; they no longer see the value 32 but produce a NullPointerException, as the array supposed to contain the AtomicLong is not in the map anymore. This is what happens, occasionally…
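If you do want to keep a manual counter, the races described above can be avoided by creating the counter atomically and letting the return value of the increment decide who cleans up. This is a sketch under assumptions: the class name BatchCounter is invented, the start-time slot of the original array is dropped, and the batch ids are assumed to be unique (System.currentTimeMillis() in a tight loop may not be).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

class BatchCounter {

    static final int WORKERS_PER_BATCH = 32;

    final ConcurrentHashMap<Long, AtomicLong> counters = new ConcurrentHashMap<>();
    final AtomicInteger finishedBatches = new AtomicInteger();

    // Called once by each worker when it has finished its share of the batch.
    void workerDone(Long batchId) {
        // computeIfAbsent creates the counter atomically, so all workers share one instance.
        AtomicLong counter = counters.computeIfAbsent(batchId, id -> new AtomicLong());
        // incrementAndGet returns 32 to exactly one caller: the last worker of the batch.
        if (counter.incrementAndGet() == WORKERS_PER_BATCH) {
            counters.remove(batchId);
            finishedBatches.incrementAndGet();
        }
    }

    // Runs 'batches' batches of 32 workers each and waits for all of them.
    static BatchCounter runBatches(int batches) throws InterruptedException {
        BatchCounter bc = new BatchCounter();
        ExecutorService pool = Executors.newCachedThreadPool();
        for (long batch = 0; batch < batches; batch++) {
            Long batchId = batch; // unique id per batch (an assumption of this sketch)
            for (int i = 0; i < WORKERS_PER_BATCH; i++) {
                pool.execute(() -> bc.workerDone(batchId));
            }
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return bc;
    }

    public static void main(String[] args) throws InterruptedException {
        BatchCounter bc = runBatches(10);
        System.out.println("finished batches: " + bc.finishedBatches.get()
                + ", map size: " + bc.counters.size());
    }
}
```

No worker reads the map again after its own increment, so the removal by the last worker cannot cause a NullPointerException in the others.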

After creating your 10 jobs, your main thread is still running; it doesn't wait for your jobs to complete before it calls report on the test. You try to overcome this with the while loop, but tPoolExecutor.getActiveCount() can come out as 0 before a workerThread has started executing, and the countThreadMap.size() call then happens while entries are still in your map.
There are a number of ways to fix this - but I will let another answer-er do that because I have to leave at the moment.

Which ThreadPool in Java should I use?

There is a huge number of tasks.
Each task belongs to a single group. The requirement is that each group of tasks is executed serially, just as if it were executed in a single thread, while throughput is maximized in a multi-core (or multi-CPU) environment. Note: there is also a huge number of groups, proportional to the number of tasks.
The naive solution is to use a ThreadPoolExecutor and synchronize (or lock). However, threads would block each other and throughput is not maximized.
Any better idea? Or is there a third-party library that satisfies the requirement?

A simple approach would be to "concatenate" all group tasks into one super task, thus making the sub-tasks run serially. But this will probably cause delay in other groups that will not start unless some other group completely finishes and makes some space in the thread pool.
As an alternative, consider chaining a group's tasks. The following code illustrates it:
public class MultiSerialExecutor {
private final ExecutorService executor;
public MultiSerialExecutor(int maxNumThreads) {
executor = Executors.newFixedThreadPool(maxNumThreads);
}
public void addTaskSequence(List<Runnable> tasks) {
executor.execute(new TaskChain(tasks));
}
public void shutdown() {
executor.shutdown();
}
private class TaskChain implements Runnable {
private List<Runnable> seq;
private int ind;
public TaskChain(List<Runnable> seq) {
this.seq = seq;
}
@Override
public void run() {
seq.get(ind++).run(); //NOTE: No special error handling
if (ind < seq.size())
executor.execute(this);
}
}
}
The advantage is that no extra resource (thread/queue) is being used, and that the granularity of tasks is better than the one in the naive approach. The disadvantage is that all group's tasks should be known in advance.
--edit--
To make this solution generic and complete, you may want to decide on error handling (i.e. whether a chain continues even if an error occurs), and it would also be a good idea to implement ExecutorService and delegate all calls to the underlying executor.

I would suggest to use task queues:
For every group of tasks, create a queue and insert all tasks from that group into it.
Now all your queues can be executed in parallel, while the tasks inside one queue are executed serially.
A quick Google search did not turn up a ready-made per-group task queue in the Java API, but there are many tutorials available on coding one. Feel free to list good tutorials or implementations if you know some.
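A per-group queue on top of a shared pool can be sketched as follows. The inner class is adapted from the SerialExecutor example in the java.util.concurrent.Executor javadoc; the wrapper GroupedSerialExecutor and the use of String group keys are assumptions made for this illustration.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class GroupedSerialExecutor {

    private final Executor pool;
    private final ConcurrentHashMap<String, SerialExecutor> groups = new ConcurrentHashMap<>();

    GroupedSerialExecutor(Executor pool) {
        this.pool = pool;
    }

    // Queues 'task' behind all earlier tasks of the same group.
    void execute(String group, Runnable task) {
        groups.computeIfAbsent(group, g -> new SerialExecutor(pool)).execute(task);
    }

    // One queue per group; at most one task of the group is handed to the pool at a time.
    private static final class SerialExecutor implements Executor {
        private final Queue<Runnable> tasks = new ArrayDeque<>();
        private final Executor pool;
        private Runnable active;

        SerialExecutor(Executor pool) {
            this.pool = pool;
        }

        @Override
        public synchronized void execute(Runnable r) {
            tasks.add(() -> {
                try {
                    r.run();
                } finally {
                    scheduleNext(); // hand the group's next task to the pool
                }
            });
            if (active == null) {
                scheduleNext();
            }
        }

        private synchronized void scheduleNext() {
            active = tasks.poll();
            if (active != null) {
                pool.execute(active);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        GroupedSerialExecutor executor = new GroupedSerialExecutor(pool);
        CountDownLatch done = new CountDownLatch(6);
        for (int i = 0; i < 3; i++) {
            for (String group : new String[] {"A", "B"}) {
                int step = i;
                executor.execute(group, () -> {
                    System.out.println(group + step + " on " + Thread.currentThread().getName());
                    done.countDown();
                });
            }
        }
        done.await();    // shut down only after the chained tasks have all run
        pool.shutdown();
    }
}
```

No pool thread ever blocks waiting for a group, and tasks can be added to a group at any time. The sketch never removes finished groups from the map, which would need attention with a very large number of groups.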

I mostly agree on Dave's answer, but if you need to slice CPU time across all "groups", i.e. all task groups should progress in parallel, you might find this kind of construct useful (using removal as "lock". This worked fine in my case although I imagine it tends to use more memory):
class TaskAllocator {
private final ConcurrentLinkedQueue<Queue<Runnable>> entireWork
= childQueuePerTaskGroup();
public Queue<Runnable> lockTaskGroup(){
return entireWork.poll();
}
public void release(Queue<Runnable> taskGroup){
entireWork.offer(taskGroup);
}
}
and
class DoWork implements Runnable {
private final TaskAllocator allocator;
public DoWork(TaskAllocator allocator){
this.allocator = allocator;
}
public void run(){
for(;;){
Queue<Runnable> taskGroup = allocator.lockTaskGroup();
if(taskGroup==null){
//No more work
return;
}
Runnable work = taskGroup.poll();
if(work == null){
//This group is done
continue;
}
//Do work, but never forget to release the group to
// the allocator.
try {
work.run();
} finally {
allocator.release(taskGroup);
}
}//for
}
}
You can then use an optimal number of threads to run the DoWork task. It's a kind of round-robin load balancing.
You can even do something more sophisticated by using this instead of a simple queue in TaskAllocator (task groups with more tasks remaining tend to get executed first):
ConcurrentSkipListSet<MyQueue<Runnable>> sophisticatedQueue =
new ConcurrentSkipListSet<>(new SophisticatedComparator());
where SophisticatedComparator is
class SophisticatedComparator implements Comparator<MyQueue<Runnable>> {
public int compare(MyQueue<Runnable> o1, MyQueue<Runnable> o2){
int diff = o2.size() - o1.size();
if(diff==0){
//This is crucial. You must assign unique ids to your
//Subqueue and break the equality if they happen to have same size.
//Otherwise your queues will disappear...
return o1.id - o2.id;
}
return diff;
}
}

Actors are another solution for this type of problem.
Scala has actors, and so does Java via Akka.

I had a problem similar to yours, and I used an ExecutorCompletionService, which works with an Executor to complete collections of tasks.
Here is an extract from the java.util.concurrent API documentation (Java 7):
Suppose you have a set of solvers for a certain problem, each returning a value of some type Result, and would like to run them concurrently, processing the results of each of them that return a non-null value, in some method use(Result r). You could write this as:
void solve(Executor e, Collection<Callable<Result>> solvers)
throws InterruptedException, ExecutionException {
CompletionService<Result> ecs = new ExecutorCompletionService<Result>(e);
for (Callable<Result> s : solvers)
ecs.submit(s);
int n = solvers.size();
for (int i = 0; i < n; ++i) {
Result r = ecs.take().get();
if (r != null)
use(r);
}
}
So, in your scenario, every task will be a single Callable<Result>, and tasks will be grouped in a Collection<Callable<Result>>.
Reference:
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorCompletionService.html
