The JavaDoc of ForkJoinTask says:
[R]eturns (joins) should be performed innermost-first. For example, a.fork(); b.fork(); b.join(); a.join(); is likely to be substantially more efficient than joining a before b.
I can't quite get my head around why (and under which circumstances) the order of join()s would matter, assuming I need to join both a and b and get their results before continuing my computation.
Specifically, I have a couple dozen fork()ed tasks and I need to wait for all of them to return their results, much like invokeAll() would do. However, I can still perform some work after fork()ing but before join()ing, so I implemented something like a joinAll(), to be called only when I know that I cannot continue without the results from the forked tasks.
The question is, how should this joinAll() be implemented? Does it matter in which order this code actually calls join() on the tasks?
While preparing a lecture on the Fork/Join framework, I also stumbled upon this statement in the docs and wanted to know why it is that way.
First, I want to note that I do not have the definitive answer to your question, but I want to share what I found:
In the original paper written by Doug Lea (http://gee.cs.oswego.edu/dl/papers/fj.pdf), his implementation of the work-stealing algorithm is described in more detail in section 2.1: Subtasks generated by worker threads (using fork) are pushed onto their own deque. The worker threads process their own deque LIFO (youngest-first), while workers steal from other deques FIFO (oldest-first).
And then the important part I think is: "When a worker thread encounters a join operation, it processes other tasks, if available, until the target task is noticed to have completed (via isDone). All tasks otherwise run to completion without blocking."
Therefore, it is more efficient to first join the tasks that the worker itself will process next (the most recently forked ones, at the top of its own deque), instead of joining older tasks, which may already have been stolen by other workers. Otherwise there is probably more thread management overhead due to potential context switches and lock contention.
At least for me, this reasoning would make sense regarding the description in the JavaDoc.
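To make the reasoning above concrete, here is a minimal sketch of a joinAll() that joins in reverse fork order, so the most recently forked task (the one on top of the worker's own deque) is joined first. The class names Square and SumOfSquares, the pool size of 4, and the shape of joinAll() are assumptions made up for illustration; they are not taken from the question or the JavaDoc.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class JoinAllSketch {

    /** Hypothetical leaf task: squares its input. */
    static class Square extends RecursiveTask<Integer> {
        private final int n;
        Square(int n) { this.n = n; }
        @Override protected Integer compute() { return n * n; }
    }

    /** Hypothetical parent task: forks all children, then joins them newest-first. */
    static class SumOfSquares extends RecursiveTask<Integer> {
        private final int count;
        SumOfSquares(int count) { this.count = count; }

        @Override protected Integer compute() {
            List<Square> forked = new ArrayList<>();
            for (int i = 1; i <= count; i++) {
                Square s = new Square(i);
                s.fork();               // pushed onto this worker's own deque
                forked.add(s);
            }
            // ... work that does not need the results could go here ...
            return joinAll(forked);
        }
    }

    /** Joins in reverse fork order: the last task forked is the first one joined. */
    static int joinAll(List<Square> forked) {
        int sum = 0;
        for (int i = forked.size() - 1; i >= 0; i--) {
            sum += forked.get(i).join();
        }
        return sum;
    }

    static int sumOfSquares(int count) {
        ForkJoinPool pool = new ForkJoinPool(4);   // pool size is an arbitrary choice
        try {
            return pool.invoke(new SumOfSquares(count));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(24));
    }
}
```

The result is the same whichever order the joins happen in; only the amount of helping, stealing and waiting differs, which is why the order is a performance hint rather than a correctness requirement.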
I'm trying to run a number of jobs concurrently using Java's ForkJoinPool. The main task (which is already running in the pool) spawns all the jobs and then does a series of joins. I was sure that a task calling join would free the thread it is running in, but it seems like it is actually blocked on it, and therefore it is "wasting" the thread, i.e., since the number of threads equals the number of CPU cores, one core will be inactive.
I know that if I run invokeAll instead, then the first of the sub-jobs gets to run in the same thread, and indeed this works. However, this seems sub-optimal, because if the first task is actually a very fast one, I have the same problem: one of the threads is blocked waiting on join. There are more jobs than threads, so I would rather another one of the jobs gets started.
I can try to bypass all this manually, but it's not so nice, and it seems like I would be redoing what ForkJoinPool is supposed to do.
So the question is: Am I understanding ForkJoinPool wrong? Or, if what I'm saying is correct, is there a simple way to utilize the threads more efficiently?
ForkJoinPool is designed to prevent you having to think about thread utilization in this way. The 'work stealing' algorithm ensures that each thread is always busy so long as there are tasks in the queue.
Check out these notes for a high-level discussion:
https://www.dre.vanderbilt.edu/~schmidt/cs891f/2018-PDFs/L4-ForkJoinPool-pt3.pdf
To see the ugly details go down the rabbit hole of the ForkJoinPool#awaitJoin source.
Roughly, if I'm reading the (very complex) code correctly: When a thread joins a sub-task, it attempts to complete that task itself. Otherwise, if the sub-task's worker queue is non-empty (i.e. it is also depending on other tasks), the joining thread repeatedly attempts to complete one of those tasks, via ForkJoinPool#tryHelpStealer, whose Javadoc entry provides some insight:
Tries to locate and execute tasks for a stealer of the given
task, or in turn one of its stealers, Traces currentSteal ->
currentJoin links looking for a thread working on a descendant
of the given task and with a non-empty queue to steal back and
execute tasks from. The first call to this method upon a
waiting join will often entail scanning/search, (which is OK
because the joiner has nothing better to do), but this method
leaves hints in workers to speed up subsequent calls. The
implementation is very branchy to cope with potential
inconsistencies or loops encountering chains that are stale,
unknown, or so long that they are likely cyclic.
Notice that ForkJoinTask does not extend Thread, so 'blocking' of the join operation means something different here than usual. It doesn't mean that the underlying thread is in a blocked state, rather it means that the computation of the current task is held up further up the call stack while join goes off and attempts to resolve the tree of sub-tasks impeding progress.
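A small experiment can illustrate this. The sketch below limits the pool to a single worker; the main job forks several sub-jobs and then joins them, and the names of the threads that ran sub-jobs are recorded so they can be inspected. MainJob, SubJob and threadsSeen are hypothetical names made up for this illustration, and what the program prints may depend on the JDK version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class JoinHelpsSketch {
    /** Names of the threads that actually ran a sub-job. */
    static final Set<String> threadsSeen = ConcurrentHashMap.newKeySet();

    /** Hypothetical sub-job: records its thread and returns its id. */
    static class SubJob extends RecursiveTask<Integer> {
        private final int id;
        SubJob(int id) { this.id = id; }
        @Override protected Integer compute() {
            threadsSeen.add(Thread.currentThread().getName());
            return id;
        }
    }

    /** Hypothetical main job: spawns all sub-jobs, then does a series of joins. */
    static class MainJob extends RecursiveTask<Integer> {
        private final int jobs;
        MainJob(int jobs) { this.jobs = jobs; }
        @Override protected Integer compute() {
            List<SubJob> forked = new ArrayList<>();
            for (int i = 1; i <= jobs; i++) {
                SubJob job = new SubJob(i);
                job.fork();
                forked.add(job);
            }
            int sum = 0;
            // Join newest-first; join() may run the sub-job in the joining thread.
            for (int i = forked.size() - 1; i >= 0; i--) {
                sum += forked.get(i).join();
            }
            return sum;
        }
    }

    static int runWithSingleWorker(int jobs) {
        ForkJoinPool pool = new ForkJoinPool(1);   // parallelism of one
        try {
            return pool.invoke(new MainJob(jobs));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("sum = " + runWithSingleWorker(8));
        System.out.println("threads that ran sub-jobs: " + threadsSeen);
    }
}
```

Even though the only worker is occupied by the main job, all sub-jobs complete, which fits the description above: the joining thread works on the pending sub-tasks instead of merely waiting for them.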
Use case: tasks are generated in one thread, need to be distributed for computation to many threads and finally the generating task shall reap the results and mark the tasks as done.
I found the class ExecutorCompletionService which fits the use case nearly perfectly --- except that I see no good solution for non-idle waiting. Let me explain.
In principle my code would look like
while (true) {
    MyTask t = generateNextTask();           // may return null
    if (t != null) {
        completionService.submit(t);
    }
    Future<MyTask> finished;                 // poll() returns a Future, or null
    while (null != (finished = completionService.poll())) {
        retireTask(finished);
    }
}
Both generateNextTask() and completionService.poll() may return null: the former if no new task is currently available, the latter if no task has finished in the CompletionService yet.
In these cases, the loop degenerates into an ugly idle-wait. I could poll() with a timeout or add a Thread.sleep() for the double-null case, but I consider this a bad workaround, because it nevertheless wastes CPU and is not as responsive as possible, due to the wait.
Suppose I replace generateNextTask() by a poll() on a BlockingQueue. Is there a good way to poll the queue as well as the CompletionService in parallel, so that I am woken up for work on whichever end something becomes available?
Actually this reminds me of Selector. Is something like it available for queues?
You should use CompletionService.take() to wait until the next task completes and retrieve its Future. poll() is the non-blocking version, returning null if no task is currently completed.
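As a minimal sketch of the difference, the following example submits a few tasks and then collects them with take(), which blocks until some task has completed instead of returning null. The class and method names, the pool size of 4, and the trivial task body are assumptions made up for illustration.

```java
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TakeSketch {

    /** Submits taskCount trivial tasks and sums their results as they complete. */
    static int sumResults(int taskCount) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CompletionService<Integer> completionService =
                new ExecutorCompletionService<>(pool);
        try {
            for (int i = 1; i <= taskCount; i++) {
                final int n = i;
                completionService.submit(() -> n * 10);   // placeholder computation
            }
            int sum = 0;
            for (int i = 0; i < taskCount; i++) {
                // take() blocks until the next task has completed; no idle loop needed.
                Future<Integer> done = completionService.take();
                sum += done.get();
            }
            return sum;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumResults(5));
    }
}
```

Note that this only works as written because the number of submitted tasks is known up front; it does not by itself solve waiting on two sources at once.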
Also, your code seems to be inefficient, because you produce and consume tasks one at a time, instead of allowing multiple tasks to be processed in parallel. Consider having a different thread for task generation and for task results consumption.
-- Edit --
I think that given the constraints you mention in your comments, you can't achieve all your requirements.
Requiring the main thread to be producer and consumer, and disallowing any busy loop or timed loop, you can't avoid the scenario where a blocking wait for a task completion takes too long and no other task gets processed in the meanwhile.
Since you "can replace generateNextTask() by a poll() on a BlockingQueue", I assume incoming tasks can be put in a queue by some other thread, and the problem is, you cannot execute take() on 2 queues simultaneously. The solution is to simply put both incoming and finished tasks in the same queue. To differentiate, wrap them in objects of different types, and then check that type in the loop after take().
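A rough sketch of the single-queue idea, under some assumptions: the Event, Incoming and Finished types are hypothetical wrappers, a separate producer thread stands in for whatever feeds the incoming queue, and squaring a number stands in for the real computation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class SingleQueueSketch {
    /** Hypothetical marker type for everything the main loop can receive. */
    interface Event {}

    static final class Incoming implements Event {
        final int input;
        Incoming(int input) { this.input = input; }
    }

    static final class Finished implements Event {
        final int result;
        Finished(int result) { this.result = result; }
    }

    /** Runs the loop until taskCount tasks have been retired; returns their results. */
    static List<Integer> runLoop(int taskCount) {
        BlockingQueue<Event> events = new LinkedBlockingQueue<>();
        ExecutorService workers = Executors.newFixedThreadPool(4);

        // Stand-in for the external source of new tasks.
        Thread producer = new Thread(() -> {
            for (int i = 1; i <= taskCount; i++) {
                events.add(new Incoming(i));
            }
        });
        producer.start();

        List<Integer> retired = new ArrayList<>();
        try {
            while (retired.size() < taskCount) {
                Event e = events.take();          // one blocking wait for both kinds
                if (e instanceof Incoming) {
                    int input = ((Incoming) e).input;
                    // The worker reports completion through the same queue.
                    workers.execute(() -> events.add(new Finished(input * input)));
                } else if (e instanceof Finished) {
                    retired.add(((Finished) e).result);
                }
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        } finally {
            workers.shutdown();
        }
        return retired;
    }

    public static void main(String[] args) {
        System.out.println(runLoop(5));
    }
}
```

The main thread sleeps inside take() whenever there is nothing to do and wakes up as soon as either a new task or a finished one arrives. The order of the retired results is not deterministic.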
This solution works, but we can go further. You said you don't want to use 2 threads for handling tasks; in that case you can use zero dedicated threads. Let the wrappers implement Runnable and, instead of checking the type, just call take().run(). This way your thread becomes a single-threaded Executor. But we already have an Executor (the one behind the CompletionService), so can we use it? The problem is that incoming and finished tasks must be handled serially, not in parallel. So we need the SerialExecutor described in the Javadoc of java.util.concurrent.Executor, which accepts Runnables and executes them serially, but on another executor. This way no thread is wasted.
And finally, you mentioned Selector as a possible solution. I must say, it is an outdated approach. Learn dataflow and actor computing. A nice introduction is here. Look at my Dataflow4java project: it has a MultiPortActorTest.java example, where the class Accum does what you need, with all the boilerplate of wrapper Runnables and serial executors hidden in the supporting library.
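For reference, here is a sketch of such a serial executor, closely following the example in the Javadoc of java.util.concurrent.Executor. The surrounding runInOrder() method, the pool size and the counter-style tasks are assumptions added only to show how it might be used.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SerialExecutorSketch {

    /** Runs submitted tasks one at a time, in order, on a backing executor. */
    static class SerialExecutor implements Executor {
        private final Queue<Runnable> tasks = new ArrayDeque<>();
        private final Executor executor;
        private Runnable active;

        SerialExecutor(Executor executor) { this.executor = executor; }

        @Override public synchronized void execute(Runnable r) {
            tasks.add(() -> {
                try {
                    r.run();
                } finally {
                    scheduleNext();      // hand the next queued task to the pool
                }
            });
            if (active == null) {
                scheduleNext();
            }
        }

        private synchronized void scheduleNext() {
            if ((active = tasks.poll()) != null) {
                executor.execute(active);
            }
        }
    }

    /** Submits n handlers and returns the order in which they actually ran. */
    static List<Integer> runInOrder(int n) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        Executor serial = new SerialExecutor(pool);
        List<Integer> seen = new ArrayList<>();   // only touched by serialized tasks
        CountDownLatch done = new CountDownLatch(n);
        for (int i = 0; i < n; i++) {
            final int id = i;
            serial.execute(() -> {
                seen.add(id);
                done.countDown();
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(runInOrder(10));
    }
}
```

Handlers for incoming and finished tasks could be submitted to the serial executor, while the computations themselves go directly to the pool; the handlers then never run concurrently with each other, yet no thread sits in a loop waiting.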
What you need is a ListenableFuture from Guava. See the Guava wiki page ListenableFutureExplained.
Which is easier and more suitable to use for running things in another thread, notably so that the program waits for the result but doesn't lock up an ui.
There may be a method that is better than either of these also, but I don't know of them.
Thanks :)
Runnable represents the code to be executed.
Executor and its subclasses represent execution strategies.
This means that the former is actually consumed by the latter. What you probably meant is: between simple threads and executors, which are more suitable?
The answer to this question is basically: it depends.
Executors are sophisticated tools, which let you choose how many concurrent tasks may be running, and tune different aspects of the execution context. They also provide facilities to monitor the tasks' executions, by returning a token (called a Future or sometimes a promise) which lets the code requesting the task execution query for that task's completion.
Threads are a less elaborate (or more bare-bones) solution for executing code asynchronously. You can still have them return a Future by hand, or simply check whether the thread is still running.
So, depending on how much sophistication you require, you will pick one or the other: Executors for more streamlined requirements (many tasks to execute and monitor), Threads for one-shot or simpler situations.
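To compare the two side by side, here is a small sketch that computes the same value once with a bare Thread and once with an ExecutorService and a Future. The method slowAnswer() and the value it returns are placeholders made up for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadVsExecutorSketch {

    /** Placeholder for some long-running computation. */
    static int slowAnswer() {
        return 42;
    }

    /** Bare thread: the result has to be handed back by hand. */
    static int viaThread() {
        int[] result = new int[1];
        Thread t = new Thread(() -> result[0] = slowAnswer());
        t.start();
        try {
            t.join();                       // wait for the thread to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    /** Executor: submit() returns a Future that carries the result. */
    static int viaExecutor() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Callable<Integer> job = ThreadVsExecutorSketch::slowAnswer;
            Future<Integer> future = executor.submit(job);
            // A UI could check future.isDone() periodically instead of blocking here.
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(viaThread() + " " + viaExecutor());
    }
}
```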
I am reading the book Java Concurrency in Practice where it says,
CyclicBarrier allows a fixed number of parties to rendezvous repeatedly at a barrier point and is useful in parallel iterative algorithms that break down a problem into a fixed number of independent subproblems.
Can someone give an example of how it breaks down a problem into multiple independent subproblems?
You have to break the problem down into multiple independent subproblems yourself.
Barriers ensure that each party completes the first subproblem before any of them start on the second subproblem. This ensures that all of the data from the first subproblem is available before the second subproblem is started.
A CyclicBarrier specifically is used when the same barrier is needed again and again when each step is effectively identical. For example, this could occur when doing any sort of multithreaded reality simulation which is done in steps. The CyclicBarrier would ensure that each thread has completed a given step before all threads will begin the next step.
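A toy version of such a stepwise simulation might look like the sketch below. Each worker owns one cell and advances it once per step; the barrier action (which runs once per step, after all workers have arrived) records the total. The cell model, the method name simulate() and the recorded totals are assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

public class BarrierStepsSketch {

    /** Runs the toy simulation and returns the total recorded after each step. */
    static List<Integer> simulate(int workers, int steps) {
        int[] cells = new int[workers];
        List<Integer> totals = new ArrayList<>();

        // The barrier action runs once per step, before any worker is released.
        CyclicBarrier barrier = new CyclicBarrier(workers, () -> {
            int sum = 0;
            for (int c : cells) {
                sum += c;
            }
            totals.add(sum);
        });

        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            final int id = w;
            threads[w] = new Thread(() -> {
                try {
                    for (int s = 0; s < steps; s++) {
                        cells[id]++;         // this worker's share of the step
                        barrier.await();     // wait until every worker finished the step
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
            });
            threads[w].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        System.out.println(simulate(3, 4));   // prints [3, 6, 9, 12]
    }
}
```

The same barrier object is reused for every step, which is what the "cyclic" in the name refers to.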
There is yet another important difference between CountDownLatch and CyclicBarrier: a thread synchronized on a CountDownLatch cannot indicate to the other threads that something has gone wrong with it, so the other threads have no way to choose between continuing and aborting the entire cooperative operation.
In the case of a CyclicBarrier, if one thread is waiting on await() and some other waiting thread is interrupted or times out, then a BrokenBarrierException is thrown in the current thread, indicating that something has gone wrong in one of the cooperating threads.
BrokenBarrierException will also occur in other circumstances which you can find in Javadoc on await() method.
Out-of-the-box, CountDownLatch does not offer this feature.
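The following sketch shows the broken-barrier behaviour in isolation: one thread is interrupted while waiting at a two-party barrier, and a later await() by another thread then fails with BrokenBarrierException. The class and method names and the 1 ms polling sleep are assumptions made for the sake of a deterministic demonstration.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

public class BrokenBarrierSketch {

    /** Returns true if the second party observes the barrier as broken. */
    static boolean secondPartySeesBreakage() {
        CyclicBarrier barrier = new CyclicBarrier(2);

        Thread waiter = new Thread(() -> {
            try {
                barrier.await();
            } catch (InterruptedException | BrokenBarrierException e) {
                // Being interrupted while waiting puts the barrier in the broken state.
            }
        });
        waiter.start();

        try {
            // Wait until the first party is actually parked at the barrier.
            while (barrier.getNumberWaiting() == 0) {
                Thread.sleep(1);
            }
            waiter.interrupt();              // something "goes wrong" in one party
            waiter.join();

            barrier.await();                 // the other party now arrives
            return false;                    // not reached if the barrier is broken
        } catch (BrokenBarrierException e) {
            return true;                     // the failure was signalled to this party
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("barrier broken: " + secondPartySeesBreakage());
    }
}
```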
IF you have an algorithm that can be broken down in independent subproblems,
THEN a CyclicBarrier is useful for all your threads to meet at the end of their calculation and, for example, merge their results.
Note that the Fork/Join framework introduced in Java 7 enables you to do something similar without needing to use a CyclicBarrier.
We are developing a Java application with several worker threads. These threads will have to deliver a lot of computation results to our UI thread. The order in which the results are delivered does not matter.
Right now, all threads simply push their results onto a synchronized Stack - but this means that every thread must wait for the other threads before results can be delivered.
Is there a data structure that supports simultaneous insertions with each insertion completing in constant time?
Thanks,
Martin
ConcurrentLinkedQueue is designed for high contention. Producers enqueue stuff on one end and consumers collect elements at the other end, so everything will be processed in the order it's added.
ArrayBlockingQueue is a better choice for lower contention, with lower space overhead.
Edit: Although that's not what you asked for. Simultaneous inserts? You may want to give every thread one output queue (say, an ArrayBlockingQueue) and then have the UI thread poll the separate queues. However, I'd think you'll find one of the two Queue implementations above sufficient.
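A minimal sketch of the single shared ConcurrentLinkedQueue variant: several worker threads offer results without locking, while the calling thread (standing in for the UI thread) polls them. The method name collect(), the constant result value 1 and the use of Thread.yield() when the queue is empty are assumptions for illustration; a real UI would more likely drain the queue from a timer or event callback.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class ResultQueueSketch {

    /** Starts the workers and returns the sum of all results received. */
    static int collect(int workers, int resultsPerWorker) {
        Queue<Integer> results = new ConcurrentLinkedQueue<>();

        for (int w = 0; w < workers; w++) {
            new Thread(() -> {
                for (int i = 0; i < resultsPerWorker; i++) {
                    results.offer(1);        // non-blocking insert, no lock held
                }
            }).start();
        }

        // Consumer side: poll() never blocks and returns null when nothing is queued.
        int expected = workers * resultsPerWorker;
        int received = 0;
        int sum = 0;
        while (received < expected) {
            Integer r = results.poll();
            if (r == null) {
                Thread.yield();              // nothing yet; let the workers run
            } else {
                received++;
                sum += r;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(collect(4, 1000));
    }
}
```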
"Right now, all threads simply push their results onto a synchronized Stack - but this means that every thread must wait for the other threads before results can be delivered."
Do you have any evidence indicating that this is actually a problem? If the computation performed by those threads is even the least little bit complex (and you don't have literally millions of threads), then lock contention on the result stack is simply a non-issue because when any given thread delivers its results, all others are most likely busy doing their computations.
Take a step back and evaluate whether performance is the key design consideration here. Don't think, know: does profiling back it up?
If not, I'd say a bigger concern is clarity and readability of design, and not introducing new code to maintain. It just so happens that, if you're using Swing, there is a library for doing exactly what you're trying to do, called SwingWorker.
Take a look at java.util.concurrent.ConcurrentLinkedQueue, java.util.concurrent.ConcurrentHashMap or java.util.concurrent.ConcurrentSkipListSet. They might do what you need. ConcurrentSkipListSet, for instance, claims to have "expected average log(n) time cost for the contains, add and remove operations and their variants. Insertion, removal, and access operations safely execute concurrently by multiple threads."
Two other patterns you might want to look at are
each thread has its own collection, when polled it returns the collection and creates a new one, so the collection only holds the pending items between polls. The thread needs to protect operations on its collection, but there is no contention between threads. This is blocking (each thread cannot add to its collection while the UI thread pulls updates from it), but can reduce contention (no contention between threads).
each thread has its own collection, and appends the results to a common queue which is protected using a Lock.tryLock(). The thread continues processing if it fails to acquire the lock. This makes it less likely that a thread will block waiting for the shared queue.
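The second pattern could be sketched roughly as follows. Each worker buffers results locally and only moves them to the shared list when tryLock() succeeds; a final blocking lock() flushes whatever is left. The method name collect(), the use of ArrayList and ReentrantLock, and the final blocking flush are assumptions of this sketch, not something prescribed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class TryLockSketch {

    /** Runs the workers and returns how many results reached the shared list. */
    static int collect(int workers, int resultsPerWorker) {
        List<Integer> shared = new ArrayList<>();    // guarded by lock
        ReentrantLock lock = new ReentrantLock();

        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            threads[w] = new Thread(() -> {
                List<Integer> local = new ArrayList<>();   // private to this worker
                for (int i = 0; i < resultsPerWorker; i++) {
                    local.add(i);                          // placeholder result
                    if (lock.tryLock()) {                  // never waits for the lock
                        try {
                            shared.addAll(local);
                            local.clear();
                        } finally {
                            lock.unlock();
                        }
                    }
                    // If the lock was busy, keep computing and retry later.
                }
                lock.lock();                               // final flush may block
                try {
                    shared.addAll(local);
                } finally {
                    lock.unlock();
                }
            });
            threads[w].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared.size();
    }

    public static void main(String[] args) {
        System.out.println(collect(4, 1000));
    }
}
```

A consumer such as the UI thread would take the same lock when reading from the shared list.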