There are a huge number of tasks.
Each task belongs to a single group. The requirement is that each group of tasks should be executed serially, just as if they ran on a single thread, while throughput should be maximized in a multi-core (or multi-CPU) environment. Note: the number of groups is also huge and proportional to the number of tasks.
The naive solution is to use a ThreadPoolExecutor and synchronize (or lock) per group. However, threads would block each other and throughput would not be maximized.
Any better ideas? Or is there a third-party library that satisfies the requirement?
A simple approach would be to "concatenate" all of a group's tasks into one super task, thus making the sub-tasks run serially. But this will likely delay other groups, which cannot start until some other group finishes completely and frees a slot in the thread pool.
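For illustration, a minimal sketch of that "super task" idea, assuming the group's tasks are collected in a list and a shared ExecutorService is available:

// Sketch: wrap a whole group's tasks in one Runnable so they run back to back
// on whichever pool thread picks the group up.
void submitGroup(ExecutorService executor, List<Runnable> groupTasks) {
    executor.execute(() -> {
        for (Runnable task : groupTasks) {
            task.run();
        }
    });
}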
As an alternative, consider chaining a group's tasks. The following code illustrates it:
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MultiSerialExecutor {
    private final ExecutorService executor;

    public MultiSerialExecutor(int maxNumThreads) {
        executor = Executors.newFixedThreadPool(maxNumThreads);
    }

    public void addTaskSequence(List<Runnable> tasks) {
        executor.execute(new TaskChain(tasks));
    }

    private void shutdown() {
        executor.shutdown();
    }

    private class TaskChain implements Runnable {
        private List<Runnable> seq;
        private int ind;

        public TaskChain(List<Runnable> seq) {
            this.seq = seq;
        }

        @Override
        public void run() {
            seq.get(ind++).run(); // NOTE: no special error handling
            if (ind < seq.size())
                executor.execute(this);
        }
    }
}
The advantage is that no extra resource (thread/queue) is used, and that the granularity of tasks is better than in the naive approach. The disadvantage is that all of a group's tasks must be known in advance.
--edit--
To make this solution generic and complete, you may want to decide on error handling (i.e. whether a chain continues even if an error occurs), and it would also be a good idea to implement ExecutorService and delegate all calls to the underlying executor.
I would suggest using task queues:
For every group of tasks you have, create a queue and insert all tasks from that group into it.
Now all your queues can be executed in parallel while the tasks inside one queue are executed serially.
A quick Google search suggests that the Java API has no ready-made task queues of this kind by itself, but there are many tutorials available on coding one. Feel free to list good tutorials/implementations if you know of any:
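If it helps, here is a rough sketch of that idea (GroupedExecutor is a made-up name; the per-group scheduling borrows the SerialExecutor example from the java.util.concurrent.Executor javadoc): each group key gets its own queue, and at most one task per group is handed to the shared pool at a time, so a group's tasks run serially while different groups run in parallel.

import java.util.ArrayDeque;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

class GroupedExecutor {
    private final Executor pool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
    private final Map<String, SerialQueue> groups = new ConcurrentHashMap<>();

    public void submit(String group, Runnable task) {
        groups.computeIfAbsent(group, g -> new SerialQueue()).add(task);
    }

    // One queue per group; only one of its tasks is ever queued on the pool at a time.
    private class SerialQueue {
        private final Queue<Runnable> tasks = new ArrayDeque<>();
        private Runnable active;

        synchronized void add(Runnable task) {
            tasks.add(() -> {
                try {
                    task.run();
                } finally {
                    next();
                }
            });
            if (active == null) {
                next();
            }
        }

        private synchronized void next() {
            active = tasks.poll();
            if (active != null) {
                pool.execute(active);
            }
        }
    }
}

The fixed pool size bounds the parallelism no matter how many groups exist, which matches the original requirement.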
I mostly agree with Dave's answer, but if you need to slice CPU time across all "groups", i.e. all task groups should progress in parallel, you might find this kind of construct useful (it uses removal as a "lock"; this worked fine in my case, although I imagine it tends to use more memory):
class TaskAllocator {
    private final ConcurrentLinkedQueue<Queue<Runnable>> entireWork
            = childQueuePerTaskGroup();

    public Queue<Runnable> lockTaskGroup() {
        return entireWork.poll();
    }

    public void release(Queue<Runnable> taskGroup) {
        entireWork.offer(taskGroup);
    }
}
and
class DoWork implements Runnable {
    private final TaskAllocator allocator;

    public DoWork(TaskAllocator allocator) {
        this.allocator = allocator;
    }

    public void run() {
        for (;;) {
            Queue<Runnable> taskGroup = allocator.lockTaskGroup();
            if (taskGroup == null) {
                // No more work
                return;
            }

            Runnable work = taskGroup.poll();
            if (work == null) {
                // This group is done
                continue;
            }

            // Do work, but never forget to release the group to
            // the allocator.
            try {
                work.run();
            } finally {
                allocator.release(taskGroup);
            }
        } //for
    }
}
You can then use the optimum number of threads to run the DoWork task. It's kind of a round-robin load balancer.
You can even do something more sophisticated by using this instead of a simple queue in TaskAllocator (task groups with more tasks remaining tend to get executed first):
ConcurrentSkipListSet<MyQueue<Runnable>> sophisticatedQueue =
        new ConcurrentSkipListSet<>(new SophisticatedComparator());
where SophisticatedComparator is
class SophisticatedComparator implements Comparator<MyQueue<Runnable>> {
    public int compare(MyQueue<Runnable> o1, MyQueue<Runnable> o2) {
        int diff = o2.size() - o1.size();
        if (diff == 0) {
            // This is crucial. You must assign unique ids to your
            // sub-queues and break the equality if they happen to have the same size.
            // Otherwise your queues will disappear...
            return o1.id - o2.id;
        }
        return diff;
    }
}
Actors are another solution for this type of problem.
Scala has actors, and so does Java, provided by Akka.
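As a sketch of how the actor route maps onto the original problem (this uses Akka's classic Java API and assumes Akka is on the classpath): one actor per group means that actor's messages are processed strictly one at a time, while different actors run in parallel on the dispatcher's thread pool.

import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;

// One actor per task group; each incoming Runnable is executed in turn.
class GroupActor extends AbstractActor {
    @Override
    public Receive createReceive() {
        return receiveBuilder()
                .match(Runnable.class, task -> task.run())
                .build();
    }
}

// Usage sketch:
// ActorSystem system = ActorSystem.create("tasks");
// ActorRef group = system.actorOf(Props.create(GroupActor.class));
// group.tell(someTask, ActorRef.noSender());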
I had a problem similar to yours, and I used an ExecutorCompletionService, which works with an Executor to complete collections of tasks.
Here is an extract from the java.util.concurrent API docs (Java 7):
Suppose you have a set of solvers for a certain problem, each returning a value of some type Result, and would like to run them concurrently, processing the results of each of them that return a non-null value, in some method use(Result r). You could write this as:
void solve(Executor e, Collection<Callable<Result>> solvers)
        throws InterruptedException, ExecutionException {
    CompletionService<Result> ecs = new ExecutorCompletionService<Result>(e);
    for (Callable<Result> s : solvers)
        ecs.submit(s);
    int n = solvers.size();
    for (int i = 0; i < n; ++i) {
        Result r = ecs.take().get();
        if (r != null)
            use(r);
    }
}
So, in your scenario, every task will be a single Callable<Result>, and tasks will be grouped in a Collection<Callable<Result>>.
Reference:
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorCompletionService.html
I want to create two threads in my application that will run two methods. I'm using the builder design pattern, and inside the build method I have something like this, where request is the object that is passed:
Rules rule;

Request build() {
    Request request = new Request(this);
    // I want one thread to call this method
    Boolean isExceeding = this.rule.volumeExceeding(request);
    // Another thread to call this method
    Boolean isRepeating = this.rule.volumeRepeating(request);
    // Some sort of timer that will wait until both values are received;
    // if one value takes too long to be received, kill the thread and continue with
    // whatever value was received.
    ..Logic based on 2 booleans..
    return request;
}
Here's what this class looks like:
public class Rules {
    public Boolean volumeExceeding(Request request) {
        ...some...logic...
        return true/false;
    }

    public Boolean volumeRepeating(Request request) {
        ...some...logic...
        return true/false;
    }
}
I have commented in the code what I'd like to happen. Basically, I'd like to create two threads that will each run their respective method. The build should wait until both are finished; however, if one takes too long (for example, more than 10 ms), it should continue with whatever value was completed. How do I create this? I'm trying to understand the multithreading tutorials, but the examples are so generic that it's hard to take what they did and apply it to something more complicated.
One way to do that is to use CompletableFutures:
import java.util.concurrent.CompletableFuture;

class Main {

    private static final long timeout = 1_000; // 1 second

    static Boolean volumeExceeding(Object request) {
        System.out.println(Thread.currentThread().getName());
        final long startpoint = System.currentTimeMillis();
        // do stuff with request but we do dummy stuff
        for (int i = 0; i < 1_000_000; i++) {
            if (System.currentTimeMillis() - startpoint > timeout) {
                return false;
            }
            Math.log(Math.sqrt(i));
        }
        return true;
    }

    static Boolean volumeRepeating(Object request) {
        System.out.println(Thread.currentThread().getName());
        final long startpoint = System.currentTimeMillis();
        // do stuff with request but we do dummy stuff
        for (int i = 0; i < 1_000_000_000; i++) {
            if (System.currentTimeMillis() - startpoint > timeout) {
                return false;
            }
            Math.log(Math.sqrt(i));
        }
        return true;
    }

    public static void main(String[] args) {
        final Object request = new Object();

        CompletableFuture<Boolean> isExceedingFuture = CompletableFuture.supplyAsync(
                () -> Main.volumeExceeding(request));
        CompletableFuture<Boolean> isRepeatingFuture = CompletableFuture.supplyAsync(
                () -> Main.volumeRepeating(request));

        Boolean isExceeding = isExceedingFuture.join();
        Boolean isRepeating = isRepeatingFuture.join();

        System.out.println(isExceeding);
        System.out.println(isRepeating);
    }
}
Notice that one task takes significantly longer than the other.
What's happening? You supply those tasks to the common pool by using CompletableFuture for execution. Both tasks are executed by two different threads. What you've asked for is that a task is stopped when it takes too long. Therefore you can simply remember the time when a task has started and periodically check it against a timeout. Important: Do this check when the task would return while leaving the data in a consistent state. Also note that you can place multiple checks of course.
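As an aside, on Java 9 and later you could instead let the future itself fall back to a default value after the timeout, roughly like this (a sketch; the 10 ms value is taken from the question, java.util.concurrent.TimeUnit must be imported, and the underlying computation keeps running unless it also checks a flag):

CompletableFuture<Boolean> isExceedingFuture = CompletableFuture
        .supplyAsync(() -> Main.volumeExceeding(request))
        .completeOnTimeout(false, 10, TimeUnit.MILLISECONDS);
Boolean isExceeding = isExceedingFuture.join(); // false if not finished within 10 ms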
Here's a nice guide about CompletableFuture: Guide To CompletableFuture
If I understand your question correctly, then you should do this with a ticketing system (also known as provider-consumer pattern or producer-consumer pattern), so your threads are reused (which is a significant performance boost, if those operations are time critical).
The general idea should be:
application initialization
Initialize 2 or more "consumer" threads, which can work tickets (also called jobs).
runtime
Feed the consumer threads tickets (or jobs) and wait on each for (about) as long as you like. However, depending on the JVM, the waiting period will most likely not be exactly n milliseconds, since schedulers tend to be 'lax' about timeout periods; e.g. Thread.sleep() will almost always be off by a few milliseconds (always late, never early, to my knowledge).
If the thread does not return within the given waiting period, then that result must be neglected (according to your logic), and the ticket (and thus the thread) must be told to abandon that ticket. It is important that you not interrupt the thread, since that can lead to exceptions or prevent locks from being unlocked.
Remember that halting or stopping threads from the outside is almost always problematic with locks, so I would suggest that your jobs periodically visit a possible exit point, so that once you stop caring about a result they can terminate safely. A rough sketch of this idea follows.
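Here is a minimal sketch of that idea (the names and the 10 ms timeout are illustrative; the pool threads are the reusable consumers, and the abandoned flag is the safe exit point the worker polls instead of being interrupted):

import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicBoolean;

class TicketExample {
    public static void main(String[] args) throws Exception {
        ExecutorService consumers = Executors.newFixedThreadPool(2);
        AtomicBoolean abandoned = new AtomicBoolean(false);

        Future<Boolean> result = consumers.submit(() -> {
            for (int i = 0; i < 1_000_000; i++) {
                if (abandoned.get()) {    // a safe exit point
                    return false;
                }
                Math.sqrt(i);             // stand-in for real work
            }
            return true;
        });

        Boolean value;
        try {
            value = result.get(10, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            abandoned.set(true);          // tell the ticket to wind down
            value = false;                // continue with the fallback
        }
        System.out.println(value);
        consumers.shutdown();
    }
}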
For massively parallel computing I tend to use executors and callables. When I have thousands of objects to be computed, I feel uneasy instantiating thousands of Runnables, one for each object.
So I have two approaches to solve this:
I. Split the workload among a small number of x workers with y objects each (splitting the object list into x partitions with y/x size each).
public static <V> List<List<V>> partitions(List<V> list, int chunks) {
    final ArrayList<List<V>> lists = new ArrayList<List<V>>();
    final int size = Math.max(1, list.size() / chunks + 1);
    final int listSize = list.size();
    for (int i = 0; i <= chunks; i++) {
        final List<V> vs = list.subList(Math.min(listSize, i * size), Math.min(listSize, i * size + size));
        if (vs.isEmpty()) break;
        lists.add(vs);
    }
    return lists;
}
II. Creating x-workers which fetch objects from a queue.
Questions:
Is creating thousands of Runnables really expensive and to be avoided?
Is there a generic pattern/recommendation for how to do it with solution II?
Are you aware of a different approach?
Creating thousands of Runnables (objects implementing Runnable) is no more expensive than creating normal objects.
Creating and running thousands of Threads can be very heavy, but you can use Executors with a pool of threads to solve this problem.
As for a different approach, you might be interested in Java 8's parallel streams.
Combining various answers here:
Is creating thousands of Runnables really expensive and to be avoided?
No, it's not in and of itself. It's how you make them execute that may prove costly (spawning a few thousand threads certainly has its cost).
So you would not want to do this:
List<Computation> computations = ...
List<Thread> threads = new ArrayList<>();
for (Computation computation : computations) {
    Thread thread = new Thread(new Computation(computation));
    threads.add(thread);
    thread.start();
}
// If you need to wait for completion:
for (Thread t : threads) {
    t.join();
}
Because it would 1) be unnecessarily costly in terms of OS resources (native threads, each with its own stack), 2) spam the OS scheduler with a vastly concurrent workload, almost certainly leading to plenty of context switches and associated cache invalidations at the CPU level, and 3) be a nightmare for catching and dealing with exceptions (your threads should probably define an uncaught-exception handler, and you'd have to deal with it manually).
You'd probably prefer an approach where a finite thread pool (of a few threads, "a few" being closely related to your number of CPU cores) handles many, many Callables.
List<Computation> computations = ...
ExecutorService pool = Executors.newFixedThreadPool(someNumber);
List<Future<Result>> results = new ArrayList<>();
for (Computation computation : computations) {
    results.add(pool.submit(new ComputationCallable(computation)));
}
for (Future<Result> result : results) {
    doSomething(result.get());
}
The fact that you reuse a limited number of threads should yield a really nice improvement.
Is there a generic pattern/recommendation for how to do it with solution II?
There are. First, your partition code (getting from a List to a List<List>) can be found in collection utilities such as Guava, with more generic and more robust implementations.
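For example, with Guava on the classpath, the partitioning step is a one-liner (a sketch; the chunk size of 100 is arbitrary):

// Guava's Lists.partition returns consecutive sublists of the given size.
List<List<V>> chunks = com.google.common.collect.Lists.partition(list, 100);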
But more than this, two patterns come to mind for what you are trying to achieve:
Use the Fork/Join pool with Fork/Join tasks: that is, spawn a task with your whole list of items, and each task will fork sub-tasks with half of that list, up to the point where each task manages a small enough list of items. It's divide and conquer. See: http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ForkJoinTask.html
If your computation were "add the integers from a list", it could look like this (there might be a boundary bug in there, I did not really check):
public static class Adder extends RecursiveTask<Integer> {
    protected List<Integer> globalList;
    protected int start;
    protected int stop;

    public Adder(List<Integer> globalList, int start, int stop) {
        super();
        this.globalList = globalList;
        this.start = start;
        this.stop = stop;
        System.out.println("Creating for " + start + " => " + stop);
    }

    @Override
    protected Integer compute() {
        if (stop - start > 1000) {
            // Too many arguments, we split the list
            Adder subTask1 = new Adder(globalList, start, start + (stop - start) / 2);
            Adder subTask2 = new Adder(globalList, start + (stop - start) / 2, stop);
            subTask2.fork();
            return subTask1.compute() + subTask2.join();
        } else {
            // Manageable number of arguments, we deal with them in place
            int result = 0;
            for (int i = start; i < stop; i++) {
                result += globalList.get(i);
            }
            return result;
        }
    }
}

public void doWork() throws Exception {
    List<Integer> computation = new ArrayList<>();
    for (int i = 0; i < 10000; i++) {
        computation.add(i);
    }
    ForkJoinPool pool = new ForkJoinPool();
    RecursiveTask<Integer> masterTask = new Adder(computation, 0, computation.size());
    Future<Integer> future = pool.submit(masterTask);
    System.out.println(future.get());
}
Use Java 8 parallel streams in order to launch multiple parallel computations easily (under the hood, Java parallel streams run on the common Fork/Join pool).
Others have shown what this might look like.
Are you aware of a different approach?
For a different take on concurrent programming (without explicit task/thread handling), have a look at the actor pattern: https://en.wikipedia.org/wiki/Actor_model
Akka comes to mind as a popular implementation of this pattern...
@Aaron is right; you should take a look at Java 8's parallel streams:
void processInParallel(List<V> list) {
    list.parallelStream().forEach(item -> {
        // do something
    });
}
If you need to specify chunks, you could use a ForkJoinPool as described here:
void processInParallel(List<V> list, int chunks) {
    ForkJoinPool forkJoinPool = new ForkJoinPool(chunks);
    forkJoinPool.submit(() -> {
        list.parallelStream().forEach(item -> {
            // do something with each item
        });
    });
}
You could also have a functional interface as an argument:
void processInParallel(List<V> list, int chunks, Consumer<V> processor) {
    ForkJoinPool forkJoinPool = new ForkJoinPool(chunks);
    forkJoinPool.submit(() -> {
        list.parallelStream().forEach(item -> processor.accept(item));
    });
}
Or in shorthand notation:
void processInParallel(List<V> list, int chunks, Consumer<V> processor) {
    new ForkJoinPool(chunks).submit(() -> list.parallelStream().forEach(processor::accept));
}
And then you would use it like:
processInParallel(myList, 2, item -> {
    // do something with each item
});
Depending on your needs: ForkJoinPool#submit() returns an instance of ForkJoinTask, which is a Future, and you may use it to check the status of your task or wait for it to finish.
You'd most probably want the ForkJoinPool instantiated only once (not on every method call) and then reused, to prevent CPU choking if the method is called multiple times.
Is creating thousands of Runnables really expensive and to be avoided?
Not at all; the Runnable/Callable interfaces each have only one method to implement, and the amount of "extra" code in each task depends on the code you are running. But that is certainly no fault of the Runnable/Callable interfaces.
Is there a generic pattern/recommendation for how to do it with solution II?
Pattern 2 is more favorable than pattern 1, because pattern 1 assumes that each worker will finish at exactly the same time. If some workers finish before others, they may just sit idle, since they can only work on the y/x-sized queues you assigned to each of them. In pattern 2, however, you will never have idle worker threads (unless the end of the work queue is reached and numWorkItems < numWorkers).
An easy way to use the preferred pattern, pattern 2, is to use the ExecutorService invokeAll(Collection<? extends Callable<T>> list) method.
Here is an example usage:
List<Callable<Object>> workList = // a single list of all of your work
ExecutorService es = Executors.newCachedThreadPool();
es.invokeAll(workList);
Fairly readable and straightforward usage, and the ExecutorService implementation will automatically use solution 2 for you, so you know that each worker thread's time is used to the maximum.
Are you aware of a different approach?
Solutions 1 and 2 are two common approaches for generic work. There are many different implementations available for you to choose from (such as java.util.concurrent, Java 8 parallel streams, or Fork/Join pools), but the concept of each is generally the same. The only exception is if you have specific tasks in mind with non-standard running behavior.
So I have a method that starts five threads. I want to write a unit test just to check that the five threads have been started. How do I do that? Sample code is much appreciated.
Instead of writing your own method to start threads, why not use an Executor, which can be injected into your class? Then you can easily test it by passing in a dummy Executor.
Edit: Here's a simple example of how your code could be structured:
public class ResultCalculator {
    private final ExecutorService pool;
    private final List<Future<Integer>> pendingResults;

    public ResultCalculator(ExecutorService pool) {
        this.pool = pool;
        this.pendingResults = new ArrayList<Future<Integer>>();
    }

    public void startComputation() {
        for (int i = 0; i < 5; i++) {
            Future<Integer> future = pool.submit(new Robot(i));
            pendingResults.add(future);
        }
    }

    public int getFinalResult() throws ExecutionException, InterruptedException {
        int total = 0;
        for (Future<Integer> robotResult : pendingResults) {
            total += robotResult.get();
        }
        return total;
    }
}
public class Robot implements Callable<Integer> {
    private final int input;

    public Robot(int input) {
        this.input = input;
    }

    @Override
    public Integer call() throws InterruptedException {
        // Some very long calculation
        Thread.sleep(10000);
        return input * input;
    }
}
And here's how you'd call it from your main():
public static void main(String[] args) throws Exception {
    // Note that the number of threads is now specified here
    ExecutorService pool = Executors.newFixedThreadPool(5);
    ResultCalculator calc = new ResultCalculator(pool);
    try {
        calc.startComputation();
        // Maybe do something while we're waiting
        System.out.printf("Result is: %d\n", calc.getFinalResult());
    } finally {
        pool.shutdownNow();
    }
}
And here's how you'd test it (assuming JUnit 4 and Mockito):
@Test
@SuppressWarnings("unchecked")
public void testStartComputationAddsRobotsToQueue() {
    ExecutorService pool = mock(ExecutorService.class);
    Future<Integer> future = mock(Future.class);
    when(pool.submit(any(Callable.class))).thenReturn(future);

    ResultCalculator calc = new ResultCalculator(pool);
    calc.startComputation();

    verify(pool, times(5)).submit(any(Callable.class));
}
Note that all this code is just a sketch which I have not tested or even tried to compile yet. But it should give you an idea of how the code can be structured.
Rather than saying you are going to "test the five threads have been started", it would be better to step back and think about what the five threads are actually supposed to do. Then test to make sure that that "something" is actually being done.
If you really just want to test that the threads have been started, there are a few things you could do. Are you keeping references to the threads somewhere? If so, you could retrieve the references, count them, and call isAlive() on each one (checking that it returns true).
I believe there is some method on some Java platform class which you can call to find how many threads are running, or to find all the threads which are running in a ThreadGroup, but you would have to search to find out what it is.
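For what it's worth, a couple of the platform calls alluded to above, as a sketch (whether they are useful depends on how and where your threads are created):

// Estimated number of active threads in the current thread's group:
int inGroup = Thread.currentThread().getThreadGroup().activeCount();

// All live threads in the JVM, via the stack-trace snapshot:
java.util.Set<Thread> allThreads = Thread.getAllStackTraces().keySet();
long live = allThreads.stream().filter(Thread::isAlive).count();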
More thoughts in response to your comment
If your code is as simple as new Thread(runnable).start(), I wouldn't bother to test that the threads are actually starting. If you do so, you're basically just testing that the Java platform works (it does). If your code for initializing and starting the threads is more complicated, I would stub out the thread.start() part and make sure that the stub is called the desired number of times, with the correct arguments, etc.
Regardless of what you do about that, I would definitely test that the task is completed correctly when running in multithreaded mode. From personal experience, I can tell you that as soon as you start doing anything remotely complicated with threads, it is devilishly easy to get subtle bugs which only show up under certain conditions, and perhaps only occasionally. Dealing with the complexity of multithreaded code is a very slippery slope.
Because of that, if you can do it, I would highly recommend you do more than just simple unit testing. Do stress tests where you run your task with many threads, on a multicore machine, on very large data sets, and make sure all the answers are exactly as expected.
Also, although you are expecting a performance increase from using threads, I highly recommend that you benchmark your program with varying numbers of threads, to make sure that the desired performance increase is actually achieved. Depending on how your system is designed, it's possible to wind up with concurrency bottlenecks which may make your program hardly faster with threads than without. In some cases, it can even be slower!
I'm firing off tasks using an ExecutorService, dispatching tasks that need to be grouped by task-specific criteria:
Task[type=a]
Task[type=b]
Task[type=a]
...
Periodically I want to output the average length of time that each task took (grouped by type) along with statistical information such as mean/median and standard deviation.
This needs to be pretty fast, of course, and ideally should not cause the various threads to synchronize when they report statistics. What's a good architecture for doing this?
ThreadPoolExecutor provides beforeExecute and afterExecute methods that you can override. You could use those to record your statistics in a single ConcurrentHashMap (a member variable of your executor subclass) keyed on some unique identifier for your tasks, storing the type, start time, and end time.
Calculate the statistics from the ConcurrentHashMap when you are ready to look at them.
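A sketch of what that could look like (TimedExecutor and TaskTiming are made-up names; beforeExecute/afterExecute are the real ThreadPoolExecutor hooks, and the map here is simply keyed on the Runnable the pool hands back, which will be the FutureTask wrapper if you used submit()):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class TimedExecutor extends ThreadPoolExecutor {

    static class TaskTiming {
        volatile long startNanos;
        volatile long endNanos;
    }

    private final Map<Runnable, TaskTiming> timings = new ConcurrentHashMap<>();

    TimedExecutor(int threads) {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
    }

    @Override
    protected void beforeExecute(Thread t, Runnable r) {
        super.beforeExecute(t, r);
        TaskTiming timing = new TaskTiming();
        timing.startNanos = System.nanoTime();
        timings.put(r, timing);
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        timings.get(r).endNanos = System.nanoTime();
        super.afterExecute(r, t);
    }

    Map<Runnable, TaskTiming> timings() {
        return timings;
    }
}

The task type could be stored in TaskTiming as well if your Runnables expose it, and the statistics can be computed from the map whenever you want to report them.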
Subclass ThreadPoolExecutor and track the execution events:
start on beforeExecute
end on afterExecute
It's worth noting that these methods are invoked by the worker thread that executes the task, so you need to ensure thread safety in the execution-tracking code.
Also, the Runnables you receive will most likely not be your own Runnables, but your tasks wrapped in FutureTasks.
Another way is to use the wrapper/decorator pattern:
public class Job implements Runnable {
    private Runnable _task;
    private Statistics _statistics;

    public Job(Runnable task, Statistics statistics) {
        this._task = task;
        this._statistics = statistics;
    }

    public void run() {
        long s = System.currentTimeMillis();
        _task.run();
        long e = System.currentTimeMillis();
        long executionTime = e - s;
        _statistics.updateStatistics(executionTime);
    }
}
I believe the two other answers are correct, but maybe a bit too complicated (although my answer, while simple, is probably not quite as performant as theirs).
Why not just use atomic variables to keep track of your stats? For example, the number of tasks run and the total execution time (divide the latter by the former and you get the average execution time). Pass these variables into your Runnable for each task. Unless your tasks are extremely short-lived, I do not think the overhead of updating an atomic variable will impact you.
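For example, a minimal sketch per task type (TypeStats is a made-up name; AtomicLong is the real class):

import java.util.concurrent.atomic.AtomicLong;

// One instance per task type; record() is called by the task's Runnable wrapper.
class TypeStats {
    final AtomicLong count = new AtomicLong();
    final AtomicLong totalMillis = new AtomicLong();

    void record(long elapsedMillis) {
        count.incrementAndGet();
        totalMillis.addAndGet(elapsedMillis);
    }

    double averageMillis() {
        long n = count.get();
        return n == 0 ? 0.0 : (double) totalMillis.get() / n;
    }
}

Standard deviation needs a running sum of squared durations as well (a third counter); a median, however, cannot be derived from simple counters and needs the individual samples.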
I agree with @Robert Munteanu. The beforeExecute hook on the thread pool is not really worth much here, even though the docs say it can be used for statistics gathering, because in our situation we cannot check the Runnable's identity.
I think a wrapper can achieve this.
public interface ICallableHook<V> {
    void beforeExecute(Thread t, Callable<V> callable);

    void afterExecute(Callable<V> callable, V result, Throwable e);
}

private class CallableWrapper<V> implements Callable<V> {
    private ICallableHook<V> hooker;
    private Callable<V> callable;

    CallableWrapper(Callable<V> callable, ICallableHook<V> hooker) {
        this.callable = callable;
        this.hooker = hooker;
    }

    @Override
    public V call() throws Exception {
        if (hooker != null) {
            hooker.beforeExecute(Thread.currentThread(), callable);
        }
        V result = null;
        Exception exception = null;
        try {
            result = callable.call();
        } catch (Exception e) {
            exception = e;
            throw e;
        } finally {
            if (hooker != null) {
                hooker.afterExecute(callable, result, exception);
            }
        }
        return result;
    }
}
Usage looks like this:
for (Callable<XXX> callable : callableList) {
    CallableWrapper<XXX> callableWrapper = new CallableWrapper<>(callable, hooker);
    Future<XXX> task = completionService.submit(callableWrapper);
}
I am fairly naive when it comes to the world of Java Threading and Concurrency. I am currently trying to learn. I made a simple example to try to figure out how concurrency works.
Here is my code:
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class ThreadedService {

    private ExecutorService exec;

    /**
     * @param poolSize
     */
    public ThreadedService(int poolSize) {
        if (poolSize < 1) {
            this.exec = Executors.newCachedThreadPool();
        } else {
            this.exec = Executors.newFixedThreadPool(poolSize);
        }
    }

    public void add(final String str) {
        exec.execute(new Runnable() {
            public void run() {
                System.out.println(str);
            }
        });
    }

    public static void main(String args[]) {
        ThreadedService t = new ThreadedService(25);
        for (int i = 0; i < 100; i++) {
            t.add("ADD: " + i);
        }
    }
}
What do I need to do to make the code print out the numbers 0-99 in sequential order?
Thread pools are usually used for operations which do not need synchronization or are highly parallel.
Printing the numbers 0-99 sequentially is not a concurrent problem and requires threads to be synchronized to avoid printing out of order.
I recommend taking a look at the Java concurrency lesson to get an idea of concurrency in Java.
The idea of threads is not to do things sequentially.
You will need some shared state to coordinate. Adding instance fields to your outer class will work in this example. Remove the parameter from add, and add a lock object and a counter: grab the lock, print the counter, increment it, and release the lock. A rough sketch of that change follows.
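Something along these lines inside ThreadedService (a sketch; the field names are made up):

private final Object lock = new Object();
private int next = 0;

public void add() {
    exec.execute(new Runnable() {
        public void run() {
            synchronized (lock) {
                System.out.println("ADD: " + next);
                next++;
            }
        }
    });
}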
The simplest solution to your problem is to use a ThreadPool size of 1. However, this isn't really the kind of problem one would use threads to solve.
To expand, if you create your executor with:
this.exec = Executors.newSingleThreadExecutor();
then your tasks will all be scheduled and executed in the order in which they were submitted for execution. There are a few scenarios where this is a logical thing to do, but in most cases threads are the wrong tool to use to solve this problem.
This kind of thing makes sense to do when you need to execute the task in a different thread -- perhaps it takes a long time to execute and you don't want to block a GUI thread -- but you don't need or don't want the submitted tasks to run at the same time.
The problem is by definition not suited to threads. Threads are run independently and there isn't really a way to predict which thread is run first.
If you want to change your code to run sequentially, change add to:
public void add(final String str) {
    System.out.println(str);
}
You are not using threads (not your own at least) and everything happens sequentially.