I'm trying to run a number of jobs concurrently using Java's ForkJoinPool. The main task (which is already running in the pool) spawns all the jobs and then does a series of joins. I was sure that a task calling join() would free the thread it is running on, but it seems like the thread is actually blocked on the join and is therefore "wasted", i.e., since the number of threads equals the number of CPU cores, one core will sit idle.
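In rough code, the structure is something like this (a simplified, self-contained sketch; Job stands in for the real work):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class ForkThenJoin {

    // Placeholder for one of the real jobs.
    static class Job extends RecursiveAction {
        private final int id;
        Job(int id) { this.id = id; }
        @Override
        protected void compute() {
            long sum = 0;                                  // simulate some CPU-bound work
            for (int i = 0; i < 1_000_000; i++) sum += i ^ id;
        }
    }

    // The main task: fork all jobs first, then join them one by one.
    static class MainTask extends RecursiveAction {
        @Override
        protected void compute() {
            List<Job> jobs = new ArrayList<>();
            for (int i = 0; i < 20; i++) {
                Job job = new Job(i);
                job.fork();                                // queue the job for the pool
                jobs.add(job);
            }
            for (Job job : jobs) {
                job.join();                                // wait for each job in turn
            }
        }
    }

    public static void main(String[] args) {
        new ForkJoinPool(Runtime.getRuntime().availableProcessors()).invoke(new MainTask());
    }
}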
I know that if I run invokeAll instead, the first of the sub-jobs gets to run in the same thread, and indeed this works. However, this seems sub-optimal, because if that first task happens to be a very fast one, I have the same problem: one of the threads is blocked waiting on join. There are more jobs than threads, so I would rather another one of the jobs got started.
I can try to bypass all this manually, but it's not so nice, and it seems like I would be redoing what ForkJoinPool is supposed to do.
So the question is: am I misunderstanding ForkJoinPool? Or, if what I'm saying is correct, is there a simple way to utilize the threads more efficiently?
ForkJoinPool is designed to prevent you from having to think about thread utilization in this way. The 'work stealing' algorithm ensures that each thread stays busy as long as there are tasks in the queue.
Check out these notes for a high-level discussion:
https://www.dre.vanderbilt.edu/~schmidt/cs891f/2018-PDFs/L4-ForkJoinPool-pt3.pdf
To see the ugly details, go down the rabbit hole of the ForkJoinPool#awaitJoin source.
Roughly, if I'm reading the (very complex) code correctly: when a thread joins a sub-task, it attempts to complete that task itself; otherwise, if the sub-task's worker queue is non-empty (i.e. it also depends on other tasks), the joining thread repeatedly attempts to complete one of those tasks via ForkJoinPool#tryHelpStealer, whose Javadoc entry provides some insight:
Tries to locate and execute tasks for a stealer of the given task, or in turn one of its stealers. Traces currentSteal -> currentJoin links looking for a thread working on a descendant of the given task and with a non-empty queue to steal back and execute tasks from. The first call to this method upon a waiting join will often entail scanning/search, (which is OK because the joiner has nothing better to do), but this method leaves hints in workers to speed up subsequent calls. The implementation is very branchy to cope with potential inconsistencies or loops encountering chains that are stale, unknown, or so long that they are likely cyclic.
Notice that ForkJoinTask does not extend Thread, so the 'blocking' of the join operation means something different here than usual. It doesn't mean that the underlying thread is in a blocked state; rather, it means that the computation of the current task is held up further up the call stack while join goes off and attempts to resolve the tree of sub-tasks impeding progress.
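You can observe this with a small experiment (a sketch, not production code): fork a batch of subtasks from a root task, record which worker thread runs each one, and you will typically see the root's own worker thread among them, even though it is nominally 'blocked' in join the whole time.

import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class JoinIsNotIdle {
    static final Set<String> WORKERS = ConcurrentHashMap.newKeySet();

    static class Leaf extends RecursiveAction {
        @Override
        protected void compute() {
            WORKERS.add(Thread.currentThread().getName()); // record which worker ran this leaf
            long x = 0;
            for (int i = 0; i < 5_000_000; i++) x += i;    // some busywork
        }
    }

    static class Root extends RecursiveAction {
        @Override
        protected void compute() {
            String me = Thread.currentThread().getName();
            List<Leaf> leaves = new ArrayList<>();
            for (int i = 0; i < 32; i++) {
                Leaf leaf = new Leaf();
                leaf.fork();
                leaves.add(leaf);
            }
            for (Leaf leaf : leaves) leaf.join();          // "blocks", but keeps executing subtasks
            System.out.println("root ran on:     " + me);
            System.out.println("leaves ran on:   " + WORKERS);
            System.out.println("root helped out: " + WORKERS.contains(me));
        }
    }

    public static void main(String[] args) {
        new ForkJoinPool(4).invoke(new Root());
    }
}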
Related
The JavaDoc of ForkJoinTask says:
[R]eturns (joins) should be performed innermost-first. For example, a.fork(); b.fork(); b.join(); a.join(); is likely to be substantially more efficient than joining a before b.
I can't quite get my head around why (and in which circumstances) the order of the join()s would matter, assuming I need to join both a and b and get their results before continuing my computation.
Specifically, I have a couple dozen fork()ed tasks and I need to wait for all of them to return their results; much like invokeAll() would do, except that I can still perform some work after fork()ing but before join()ing. So I implemented something like a joinAll() to be called only when I know that I cannot continue without the results from the forked tasks.
The question is, how should this joinAll() be implemented? Does it matter in which order this code actually calls join() on the tasks?
While preparing a lecture on the Fork/Join framework, I also stumbled upon this statement in the docs and wanted to know why it is that way.
First, I want to note that I do not have a definitive answer to your question, but I want to share what I found:
In the original paper by Doug Lea (http://gee.cs.oswego.edu/dl/papers/fj.pdf), his implementation of the work-stealing algorithm is described in more detail in section 2.1: subtasks generated by a worker thread (using fork) are pushed onto that worker's own deque. Worker threads process their own deque LIFO (youngest-first), while workers steal from other deques FIFO (oldest-first).
And then the important part I think is: "When a worker thread encounters a join operation, it processes other tasks, if available, until the target task is noticed to have completed (via isDone). All tasks otherwise run to completion without blocking."
Therefore, it is more efficient to join first on the tasks which the worker itself will process next, instead of joining on tasks which might have been stolen by other workers. Otherwise there is probably more thread-management overhead due to potential context switches and lock contention.
At least for me, this reasoning would make sense regarding the description in the JavaDoc.
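If that reasoning holds, a joinAll() along these lines, which joins in the reverse of the fork order, should usually be the efficient choice (a sketch; it assumes the tasks were collected in the order they were forked):

import java.util.List;
import java.util.concurrent.ForkJoinTask;

public final class JoinAll {
    // Join in the reverse of the fork order: the most recently forked task sits
    // on top of the calling worker's own deque, so it can be popped and run
    // directly instead of waiting on a task that another worker may have stolen.
    public static <T> void joinAll(List<? extends ForkJoinTask<T>> tasksInForkOrder) {
        for (int i = tasksInForkOrder.size() - 1; i >= 0; i--) {
            tasksInForkOrder.get(i).join();
        }
    }
}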
I already understand that forking and joining are used for multithreading, but I don't understand what exactly happens when a task is forked. Does forking a task cause that forked task to go back to the beginning of the compute method? Or does the task do something else? If I want a task to jump to a method other than compute and run that when forked, how would I tell it to do that? Is there some sort of extension to (instance).fork(); that I can include to specify this?
A task that uses the fork/join framework is recursively split into smaller subtasks, so that they can be executed concurrently.
By forking, each subtask can be executed in parallel on a different CPU core, or by different threads on the same core.
After the execution of all subtasks is finished, the join part begins.
In this process, the results of all the subtasks are recursively joined into a single result.
This whole process happens 'behind the scenes' and can be implemented using a pool of threads called ForkJoinPool which manages threads of type ForkJoinWorkerThread.
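For example, here is a small divide-and-conquer sketch (summing an array). When a subtask is forked, the pool eventually calls that subtask's compute() from the beginning; if you want different behavior, you give the subtask its own class, or have compute() delegate to whatever method you like.

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000;
    private final long[] data;
    private final int from, to;

    public SumTask(long[] data, int from, int to) {
        this.data = data; this.from = from; this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            return sumDirectly();               // small enough: just do the work
        }
        int mid = (from + to) >>> 1;
        SumTask left  = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                            // the pool will call left.compute() later
        long rightResult = right.compute();     // compute the other half in this thread
        return left.join() + rightResult;       // join collects the forked half's result
    }

    // compute() can delegate to any method you like; fork() itself always ends up
    // invoking compute() on the forked task instance.
    private long sumDirectly() {
        long sum = 0;
        for (int i = from; i < to; i++) sum += data[i];
        return sum;
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        long total = new ForkJoinPool().invoke(new SumTask(data, 0, data.length));
        System.out.println(total);
    }
}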
I'm trying to debug a race condition where one of our application's poller threads never returns, causing future pollers to never get scheduled. In abstract terms, to hide our business logic while capturing the problem, here's what our code path looks like.
We have to update some state X of resource Y in a remote server. We have a resource manager, which changes the resource state and updates X as a side effect of the change. This manager polls the resource continually, and when it believes the resource has been updated, it uses a ThreadPoolExecutor to do the work. This thread pool executor has a reasonably sized blocking queue but a fairly small maximum number of threads. The hang itself, judging from the thread dump, happens in the invokeAll call (among other things).
We have reason to believe that the core/max threads in this pool executor are busy doing other things (more resource state updates, if you will).
Since invokeAll returns futures which we wait on, the question is: does invokeAll hang even if the blocking queue used by the executor is big enough to take in the work passed in via invokeAll, but there are not enough threads available?
As other users have pointed out, without some code (even pseudo-code), and a clearer understanding of what "state X" is, and what "resource Y" is, it is virtually impossible for anybody here to provide an intelligent answer. In short, you need an SSCCE. Nevertheless, I'll do my best here ;-). And if you do post code and/or provide more info, I'll update my answer accordingly.
From the Java 7 ExecutorService#invokeAll javadoc:
Executes the given tasks, returning a list of Futures holding their status and results when all complete. Future.isDone() is true for each element of the returned list. Note that a completed task could have terminated either normally or by throwing an exception. The results of this method are undefined if the given collection is modified while this operation is in progress.
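To make that concrete, here is a small, self-contained demo (unrelated to your code): even when the executor's queue has plenty of room, invokeAll only returns once every task has finished, so a single stuck task is enough to hang the caller. The untimed invokeAll would block forever here; the timed overload gives up after the timeout and cancels the unfinished task.

import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class InvokeAllHangDemo {
    public static void main(String[] args) throws Exception {
        // 2 threads, but a roomy work queue: queue capacity is not the limiting factor.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0, TimeUnit.SECONDS, new ArrayBlockingQueue<>(100));

        CountDownLatch neverReleased = new CountDownLatch(1);
        Callable<String> fast  = () -> "fast task done";
        Callable<String> stuck = () -> { neverReleased.await(); return "never happens"; };

        System.out.println("calling invokeAll...");
        // Blocks until *all* tasks complete (or the timeout expires); the untimed
        // overload would never return because of the stuck task.
        List<Future<String>> results = pool.invokeAll(List.of(fast, stuck), 3, TimeUnit.SECONDS);
        for (Future<String> f : results) {
            System.out.println("done=" + f.isDone() + " cancelled=" + f.isCancelled());
        }
        pool.shutdownNow();
    }
}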
From your description (and again, I can't tell for sure because of the lack of details), one of your worker threads is hanging. Since you're calling invokeAll(...), the caller hangs too, because invokeAll waits for every task to finish, and the task on the hung thread never does. Now, as to why you're getting a hung thread, that's an entirely different issue, and we would definitely need to see some code. HTH.
I have a bit of an issue with an application running multiple Java threads.
The application runs a number of worker threads that peek continuously at an input queue; if there are messages in the queue, they pull them out and process them.
Among those worker threads there is another, verification thread, scheduled to check at a fixed period whether the host (on which the application runs) is still in "good shape" to run the application. This thread updates an AtomicBoolean value, which in turn is checked by the worker threads before they start peeking, to see if the host is OK.
My problem is that under high CPU load the verification thread takes longer to run, because it has to compete with all the other threads. If the AtomicBoolean does not get updated within a certain period, it is automatically set to false, causing a nasty bottleneck.
My initial approach was to increase the priority of the verification thread, but digging deeper I found that this is not guaranteed behavior, and an algorithm shouldn't rely on thread priority to function correctly.
Anyone got any alternative ideas? Thanks!
Instead of peeking into a regular queue data structure, use LinkedBlockingQueue from the java.util.concurrent package.
What you can do is run a pool of threads (you could use an executor service's fixed thread pool, i.e. with a number of workers of your choice) that call LinkedBlockingQueue.take().
If a message arrives at the queue, it is handed to one of the waiting threads (yes, take() blocks the thread until there is something to take).
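Something along these lines (a rough sketch; the message type and handle() are placeholders for your actual processing):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class BlockingWorkers {
    private final LinkedBlockingQueue<String> inbox = new LinkedBlockingQueue<>();
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    public void start() {
        for (int i = 0; i < 4; i++) {
            workers.submit(() -> {
                try {
                    while (!Thread.currentThread().isInterrupted()) {
                        String message = inbox.take();   // blocks until a message arrives; no busy peeking
                        handle(message);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();  // exit cleanly on shutdown
                }
            });
        }
    }

    public void publish(String message) throws InterruptedException {
        inbox.put(message);
    }

    private void handle(String message) {
        System.out.println(Thread.currentThread().getName() + " handled " + message);
    }

    public void stop() {
        workers.shutdownNow();                           // interrupts the blocked take() calls
    }
}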
Java API Reference for Linked Blocking Queue's take method
HTH.
One old-school approach to throttling the rate of work, which does not use a health-check thread at all (and so bypasses these problems), is to block or reject requests to add to the queue if the queue is longer than, say, 100. This applies dynamic back pressure to the clients generating the load, slowing them down when the worker threads are overloaded.
This approach was added to the Java 1.5 library; see java.util.concurrent.ArrayBlockingQueue. Its put(o) method blocks if the queue is full.
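A sketch of that idea (the bound of 100 is arbitrary; pick it to match how far behind you can afford to fall):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedInbox {
    private final BlockingQueue<Runnable> work = new ArrayBlockingQueue<>(100);

    // Producers call this: it blocks once 100 items are pending, slowing the clients down.
    public void submit(Runnable job) throws InterruptedException {
        work.put(job);
    }

    // Workers call this: it blocks until there is something to do.
    public Runnable next() throws InterruptedException {
        return work.take();
    }
}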
Are you using the Executor framework (from Java's concurrency package)? If not, give it a shot. You could try using a ScheduledExecutorService for the verification thread.
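For example, something like this (a sketch; checkHostHealth() stands in for your actual verification logic):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class HealthChecker {
    private final AtomicBoolean hostHealthy = new AtomicBoolean(true);
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void start() {
        // Re-run the check every 5 seconds on a dedicated thread,
        // independently of how busy the worker pool is.
        scheduler.scheduleAtFixedRate(
                () -> hostHealthy.set(checkHostHealth()),
                0, 5, TimeUnit.SECONDS);
    }

    public boolean isHostHealthy() {
        return hostHealthy.get();   // workers read this before polling the queue
    }

    private boolean checkHostHealth() {
        return true;                // placeholder for the real check
    }

    public void stop() {
        scheduler.shutdownNow();
    }
}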
More threads do not automatically mean better performance. For CPU-bound work, if you have a dual core, 2 threads usually give the best performance and 3 or more start making things worse; a quad core should handle 4 threads best, and so on. So be careful how many threads you use.
You can also put the other threads to sleep after they perform their work, to allow the remaining threads to do their part. I believe Thread.yield() hints to the scheduler that the current thread is willing to give up its time slice to other threads.
If you want your thread to run continuously, I would suggest creating two main threads, thread A and B. Use A for the verification thread, and from B, create the other threads. Therefore thread A gets more execution time.
It seems you need to use condition variables. Peeking in a loop wastes CPU cycles.
http://docs.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/locks/Condition.html
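A minimal sketch of what that might look like (the queue and consumer logic are stand-ins for your own):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class SignalledQueue<T> {
    private final Deque<T> items = new ArrayDeque<>();
    private final Lock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition();

    public void add(T item) {
        lock.lock();
        try {
            items.addLast(item);
            notEmpty.signal();          // wake one waiting consumer
        } finally {
            lock.unlock();
        }
    }

    public T next() throws InterruptedException {
        lock.lock();
        try {
            while (items.isEmpty()) {
                notEmpty.await();       // sleep instead of burning CPU by peeking
            }
            return items.removeFirst();
        } finally {
            lock.unlock();
        }
    }
}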
I'm comparing two variations on a test program. Both are operating with a 4-thread ForkJoinPool on a machine with four cores.
In 'mode 1', I use the pool very much like an executor service: I toss a pile of tasks into ExecutorService.invokeAll. I get better performance than from an ordinary fixed-thread executor service (even though the tasks make calls to Lucene that do some I/O).
There is no divide-and-conquer here. Literally, I do
ExecutorService es = new ForkJoinPool(4);
es.invokeAll(collection_of_Callables);
In 'mode 2', I submit a single task to the pool, and in that task call ForkJoinTask.invokeAll to submit the subtasks. So I have an object that inherits from RecursiveAction, and it is submitted to the pool. In the compute method of that class, I call invokeAll on a collection of objects from a different class that also inherits from RecursiveAction. For testing purposes, I submit only one of the first-level objects at a time. What I naively expected to see was all four threads busy, as the thread calling invokeAll would grab one of the subtasks for itself instead of just sitting and blocking. I can think of some reasons why it might not work that way.
Watching in VisualVM, in mode 2, one thread is pretty nearly always waiting. What I expected to see was the thread calling invokeAll immediately going to work on one of the invoked tasks rather than just sitting still. This is certainly better than the deadlocks that would result from trying this scheme with an ordinary thread pool, but still, what's up? Is it holding one thread back in case something else gets submitted? And if so, why isn't that the same problem in mode 1?
So far I've been running this using the jsr166 jar added to java 1.6's boot class path.
ForkJoinTask.invokeAll forks all of the tasks except the first in the list. The first task it runs itself, and then it joins the other tasks. Its thread is not released to the pool in any way, so what you see is that thread blocking until the other tasks are complete.
The classic use of invokeAll with a fork/join pool is to fork one task and compute the other (in the executing thread). The thread that does not fork joins after it computes. Work stealing comes into play as both tasks compute: when each task computes, it is expected to fork its own subtasks (until some threshold is met).
I am not sure what your RecursiveAction.compute() is calling invokeAll on, but if it is the invokeAll that takes two RecursiveActions, it will fork one, compute the other, and wait for the forked task to finish.
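For the two-RecursiveAction form, the behavior is roughly the hand-written idiom below (a sketch of the idiom, not the library's actual code; Half and Whole are made-up names):

import java.util.concurrent.RecursiveAction;

class Half extends RecursiveAction {
    @Override
    protected void compute() {
        // work on one half of the problem, forking further subtasks if it is still too big
    }
}

class Whole extends RecursiveAction {
    @Override
    protected void compute() {
        Half a = new Half();
        Half b = new Half();
        // invokeAll(a, b) behaves roughly like the three lines below:
        b.fork();      // hand b to the pool; an idle worker may steal it
        a.compute();   // this worker computes a itself rather than sitting idle
        b.join();      // then it waits for (or helps finish) b
    }
}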
This is different from a plain executor service, because each task of an ExecutorService is simply a Runnable on a queue; there is no need for one task of an ExecutorService to know the outcome of another. Needing that is the primary use case of an FJ pool.