Here's what I need:
A Master task will create a bunch of Worker tasks.
Once each worker finishes the job, it needs to report back to the master.
As soon as the master receives a predefined number of responses, it will save these results. This is needed because inserting the results one by one would take much more time than inserting a batch of them at once, while waiting for all the results before saving might result in an OutOfMemoryException.
I've looked into having each worker call a method on the master and synchronizing this with wait() and notify(), and also into using ThreadPoolExecutor and its afterExecute(..) method for getting the results from the workers, but I'm still not sure of the best way to achieve what I need.
Edit: I should also mention that this is a Java app.
Use a BlockingQueue where the master waits (queue.take()) for a worker to place a result (queue.put()).
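For illustration, here is a rough sketch of that idea; the batch size of 100, the String results, and the saveBatch method are all placeholders for your own types and persistence code. Workers drop their results into the shared queue, and the master takes them off and saves them in batches:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.LinkedBlockingQueue;

    public class Master {
        private static final int BATCH_SIZE = 100;   // predefined number of responses per save

        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<String> results = new LinkedBlockingQueue<>();
            ExecutorService workers = Executors.newFixedThreadPool(8);

            int totalJobs = 1000;
            for (int i = 0; i < totalJobs; i++) {
                final int jobId = i;
                workers.execute(() -> results.add("result-" + jobId)); // worker reports back
            }

            List<String> batch = new ArrayList<>(BATCH_SIZE);
            for (int received = 0; received < totalJobs; received++) {
                batch.add(results.take());            // master blocks until a result arrives
                if (batch.size() == BATCH_SIZE) {
                    saveBatch(batch);                 // bulk insert instead of one-by-one
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                saveBatch(batch);                     // flush the remainder
            }
            workers.shutdown();
        }

        // hypothetical: stand-in for your bulk database insert
        private static void saveBatch(List<String> batch) {
            System.out.println("Saving " + batch.size() + " results");
        }
    }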
I'm trying to run a number of jobs concurrently using Java's ForkJoinPool. The main task (which is already running in the pool) spawns all the jobs and then does a series of joins. I was sure that a task calling join would free the thread it is running in, but it seems like it is actually blocked on it, and therefore it is "wasting" the thread, i.e., since the number of threads equals the number of CPU cores, one core will be inactive.
I know that if I run invokeAll instead, then the first of the sub-jobs gets to run in the same thread, and indeed this works. However, this seems sub-optimal, because if the first task is actually a very fast one, I have the same problem: one of the threads is blocked waiting on join. There are more jobs than threads, so I would rather another one of the jobs got started.
I can try to bypass all this manually, but it's not so nice, and it seems like I would be redoing what ForkJoinPool is supposed to do.
So the question is: am I misunderstanding ForkJoinPool? Or, if what I'm saying is correct, is there a simple way to utilize the threads more efficiently?
ForkJoinPool is designed to save you from having to think about thread utilization in this way. The 'work stealing' algorithm ensures that each thread is kept busy as long as there are tasks in the queue.
Check out these notes for a high-level discussion:
https://www.dre.vanderbilt.edu/~schmidt/cs891f/2018-PDFs/L4-ForkJoinPool-pt3.pdf
To see the ugly details, go down the rabbit hole of the ForkJoinPool#awaitJoin source.
Roughly, if I'm reading the (very complex) code correctly: when a thread joins a sub-task, it attempts to complete that task itself; otherwise, if the sub-task's worker queue is non-empty (i.e. it also depends on other tasks), the joining thread repeatedly attempts to complete one of those tasks via ForkJoinPool#tryHelpStealer, whose Javadoc entry provides some insight:
Tries to locate and execute tasks for a stealer of the given
task, or in turn one of its stealers, Traces currentSteal ->
currentJoin links looking for a thread working on a descendant
of the given task and with a non-empty queue to steal back and
execute tasks from. The first call to this method upon a
waiting join will often entail scanning/search, (which is OK
because the joiner has nothing better to do), but this method
leaves hints in workers to speed up subsequent calls. The
implementation is very branchy to cope with potential
inconsistencies or loops encountering chains that are stale,
unknown, or so long that they are likely cyclic.
Notice that ForkJoinTask does not extend Thread, so 'blocking' of the join operation means something different here than usual. It doesn't mean that the underlying thread is in a blocked state, rather it means that the computation of the current task is held up further up the call stack while join goes off and attempts to resolve the tree of sub-tasks impeding progress.
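As a concrete illustration (not your code; the array sum is just a placeholder workload), here is the standard fork/join idiom. The thread that joins the forked left half is not idle while it "waits": it computes the right half itself and, inside join(), may help with other queued sub-tasks.

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveTask;

    // Minimal fork/join sketch: sums an array by splitting it into sub-tasks.
    public class SumTask extends RecursiveTask<Long> {
        private static final int THRESHOLD = 1000;
        private final long[] data;
        private final int from, to;

        public SumTask(long[] data, int from, int to) {
            this.data = data;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {
                long sum = 0;
                for (int i = from; i < to; i++) sum += data[i];
                return sum;
            }
            int mid = (from + to) >>> 1;
            SumTask left = new SumTask(data, from, mid);
            SumTask right = new SumTask(data, mid, to);
            left.fork();                       // push the left half onto this worker's queue
            long rightSum = right.compute();   // work on the right half in the current thread
            long leftSum = left.join();        // join may run or steal other tasks while waiting
            return leftSum + rightSum;
        }

        public static void main(String[] args) {
            long[] data = new long[1_000_000];
            java.util.Arrays.fill(data, 1L);
            long sum = ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
            System.out.println(sum); // 1000000
        }
    }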
Use case: tasks are generated in one thread, need to be distributed to many threads for computation, and finally the generating thread shall reap the results and mark the tasks as done.
I found the class ExecutorCompletionService which fits the use case nearly perfectly --- except that I see no good solution for non-idle waiting. Let me explain.
In principle my code would look like
while (true) {
    MyTask t = generateNextTask();
    if (t != null) {
        completionService.submit(t);
    }
    Future<MyTask> finished;
    while ((finished = completionService.poll()) != null) {
        retireTask(finished);
    }
}
Both generateNextTask() and completionService.poll() may return null: the former if there are currently no new tasks available, the latter if no task has currently returned from the CompletionService.
In these cases, the loop degenerates into an ugly idle-wait. I could poll() with a timeout or add a Thread.sleep() for the double-null case, but I consider this a bad workaround, because it still wastes CPU and is not as responsive as it could be, due to the wait.
Suppose I replace generateNextTask() by a poll() on a BlockingQueue: is there a good way to poll the queue as well as the CompletionService in parallel, so that I am woken up for work whenever something becomes available on either end?
Actually this reminds me of Selector. Is something like it available for queues?
You should use CompletionService.take() to wait until the next task completes and retrieve its Future. poll() is the non-blocking version, returning null if no task is currently completed.
Also, your code seems to be inefficient, because you produce and consume tasks one at a time instead of allowing multiple tasks to be processed in parallel. Consider using separate threads for task generation and for consuming the task results.
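For reference, a minimal sketch of the take()-based pattern; the sleeping Callables are just stand-ins for real work:

    import java.util.concurrent.CompletionService;
    import java.util.concurrent.ExecutorCompletionService;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class CompletionServiceDemo {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(4);
            CompletionService<String> completionService = new ExecutorCompletionService<>(pool);

            int submitted = 10;
            for (int i = 0; i < submitted; i++) {
                final int id = i;
                completionService.submit(() -> {
                    Thread.sleep(100);           // simulate work
                    return "task-" + id;
                });
            }

            for (int i = 0; i < submitted; i++) {
                Future<String> done = completionService.take(); // blocks until some task finishes
                System.out.println("Retired " + done.get());    // results in completion order
            }
            pool.shutdown();
        }
    }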
-- Edit --
I think that given the constraints you mention in your comments, you can't achieve all your requirements.
With the main thread required to be both producer and consumer, and with busy loops and timed loops disallowed, you can't avoid the scenario where a blocking wait for one task's completion takes too long and no other task gets processed in the meantime.
Since you "can replace generateNextTask() by a poll() on a BlockingQueue", I assume incoming tasks can be put in a queue by some other thread, and the problem is that you cannot execute take() on two queues simultaneously. The solution is to simply put both incoming and finished tasks in the same queue. To differentiate them, wrap them in objects of different types, and then check that type in the loop after take().
This solution works, but we can go further. You said you don't want to use two extra threads for handling tasks; then you can use zero threads. Let the wrappers implement Runnable and, instead of checking the type, just call take().run(). This way your thread becomes a single-threaded Executor. But we already have an Executor (the CompletionService); can we use it? The problem is that handling of incoming and finished tasks should be done serially, not in parallel. So we need the SerialExecutor described in the java.util.concurrent.Executor Javadoc, which accepts Runnables and executes them serially, but on another executor. This way no thread is wasted.
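To make the single-queue idea concrete, here is a rough sketch; the class and method names (SingleQueueLoop, handleNewTask, handleFinishedTask) are made up for illustration. Both kinds of events go into one BlockingQueue of Runnables, and the main loop blocks on a single take():

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.LinkedBlockingQueue;

    // Sketch of the "one queue, zero extra threads" idea: newly generated tasks and
    // completed tasks are both wrapped as Runnables and handled serially by the main thread.
    public class SingleQueueLoop {
        private final BlockingQueue<Runnable> mailbox = new LinkedBlockingQueue<>();
        private final ExecutorService pool = Executors.newFixedThreadPool(4);

        // Called by any producer thread when a new task arrives.
        public void enqueueNewTask(MyTask t) {
            mailbox.add(() -> handleNewTask(t));
        }

        private void handleNewTask(MyTask t) {
            pool.submit(() -> {
                t.run();                                   // do the actual work on the pool
                mailbox.add(() -> handleFinishedTask(t));  // report completion back to the mailbox
            });
        }

        private void handleFinishedTask(MyTask t) {
            // retire the task: store results, mark done, etc.
        }

        public void loop() throws InterruptedException {
            while (true) {
                mailbox.take().run();   // single blocking wait, woken by either kind of event
            }
        }

        interface MyTask extends Runnable {}
    }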
And finally, you mentioned Selector as a possible solution. I must say, it is an outdated approach; learn dataflow and actor computing instead. A nice introduction is available here. Look at my Dataflow4java project; it has a MultiPortActorTest.java example, where the class Accum does what you need, with all the boilerplate of wrapper Runnables and serial executors hidden in the supporting library.
What you need is a ListenableFuture from Guava; see ListenableFutureExplained.
I have a bit of an issue with an application running multiple Java threads.
The application runs a number of worker threads that continuously peek at an input queue; if there are messages in the queue, they pull them out and process them.
Among those worker threads there is another verification thread, scheduled to check at a fixed period whether the host (on which the application runs) is still in "good shape" to run the application. This thread updates an AtomicBoolean value, which in turn is checked by the worker threads before they start peeking, to see if the host is OK.
My problem is that in cases of high CPU load the thread responsible for the verification takes longer, because it has to compete with all the other threads. If the AtomicBoolean does not get updated after a certain period, it is automatically set to false, causing a nasty bottleneck.
My initial approach was to increase the priority of the verification thread, but digging into it deeper I found that this is not a guaranteed behavior and an algorithm shouldn't rely on thread priority to function correctly.
Anyone got any alternative ideas? Thanks!
Instead of peeking into a regular queue data structure, use the java.util.concurrent package's LinkedBlockingQueue.
What you can do is run a pool of threads (you could use an executor service's fixed thread pool, i.e., a number of workers of your choice) and have each of them call LinkedBlockingQueue.take().
When a message arrives in the queue, it is handed to one of the waiting threads (yes, take() does block the thread until there is something to take).
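A rough sketch of that, assuming String messages and a made-up process() method:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.LinkedBlockingQueue;

    public class QueueWorkers {
        public static void main(String[] args) {
            LinkedBlockingQueue<String> inbox = new LinkedBlockingQueue<>();
            ExecutorService workers = Executors.newFixedThreadPool(4);

            for (int i = 0; i < 4; i++) {
                workers.execute(() -> {
                    try {
                        while (!Thread.currentThread().isInterrupted()) {
                            String message = inbox.take();   // blocks; no busy peeking
                            process(message);
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();  // exit on shutdownNow()
                    }
                });
            }

            inbox.add("hello");  // producers just add messages; a waiting worker wakes up
        }

        private static void process(String message) {
            System.out.println("Processing " + message);
        }
    }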
Java API Reference for Linked Blocking Queue's take method
HTH.
One old-school approach to throttling the rate of work, which does not use a health-check thread at all (and so bypasses these problems), is to block or reject requests to add to the queue if the queue is longer than, say, 100 items. This applies dynamic back pressure to the clients generating the load, slowing them down when the worker threads are overloaded.
Support for this approach was added to the Java 1.5 library; see java.util.concurrent.ArrayBlockingQueue. Its put(o) method blocks if the queue is full.
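Something like this minimal sketch; the class and method names are made up, and Runnable stands in for whatever your messages are:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class BackPressureQueue {
        // Bounded queue: once 100 requests are waiting, put() blocks the producer,
        // applying back pressure instead of letting work pile up unbounded.
        private final BlockingQueue<Runnable> inbox = new ArrayBlockingQueue<>(100);

        public void accept(Runnable request) throws InterruptedException {
            inbox.put(request);               // blocks while the queue is full
        }

        public boolean tryAccept(Runnable request) {
            return inbox.offer(request);      // non-blocking variant: false means "reject/slow down"
        }

        public Runnable nextForWorker() throws InterruptedException {
            return inbox.take();              // workers drain the queue
        }
    }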
Are you using the Executor framework (from Java's concurrency package)? If not, give it a shot. You could try using a ScheduledExecutorService for the verification thread.
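For example, something along these lines (a minimal sketch; the 5-second interval and checkHostHealth() are placeholders for your actual verification logic):

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicBoolean;

    public class HealthChecker {
        private final AtomicBoolean hostHealthy = new AtomicBoolean(true);
        private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

        public void start() {
            // Runs the check every 5 seconds on a dedicated thread, independent of the worker pool.
            scheduler.scheduleAtFixedRate(
                    () -> hostHealthy.set(checkHostHealth()),
                    0, 5, TimeUnit.SECONDS);
        }

        public boolean isHealthy() {
            return hostHealthy.get();   // workers consult this before taking work
        }

        private boolean checkHostHealth() {
            // placeholder for the real verification (disk, memory, dependencies, ...)
            return true;
        }
    }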
More threads do not necessarily mean better performance. For CPU-bound work, if you have a dual core, 2 threads usually give the best performance and 3 or more start getting worse; a quad core should handle 4 threads best, and so on. So be careful how many threads you use.
You can have threads yield after they perform their work, allowing other threads to do their part. I believe Thread.yield() will pause the current thread to give time to other threads.
If you want your thread to run continuously, I would suggest creating two main threads, A and B. Use A as the verification thread, and create the other threads from B. That way, thread A gets more execution time.
It seems you need to utilize Condition variables. Peeking will burn CPU cycles.
http://docs.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/locks/Condition.html
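A minimal sketch of a condition-based queue, so the workers sleep instead of peeking (this is essentially what BlockingQueue already does for you internally):

    import java.util.ArrayDeque;
    import java.util.Queue;
    import java.util.concurrent.locks.Condition;
    import java.util.concurrent.locks.Lock;
    import java.util.concurrent.locks.ReentrantLock;

    public class ConditionQueue<T> {
        private final Queue<T> items = new ArrayDeque<>();
        private final Lock lock = new ReentrantLock();
        private final Condition notEmpty = lock.newCondition();

        public void put(T item) {
            lock.lock();
            try {
                items.add(item);
                notEmpty.signal();          // wake one waiting consumer
            } finally {
                lock.unlock();
            }
        }

        public T take() throws InterruptedException {
            lock.lock();
            try {
                while (items.isEmpty()) {
                    notEmpty.await();       // releases the lock and sleeps; no CPU spent peeking
                }
                return items.remove();
            } finally {
                lock.unlock();
            }
        }
    }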
If I create 1K threads and launch them at the same time using a latch, once the threads complete my process ends.
What I want to do is, as the thread ends, start up another thread to work on the same task (or somehow get the same thread to continue processing with the same task again).
Scenario:
I want to start 1K threads, and don't want the performance penalty of starting another 1K threads when they finish processing.
The threads simply make an HTTP URL connection to a URL: http://www.example.com/some/page
What I want to do is continuously run for x seconds, and always have 1K threads running.
I don't want to use an executor for this, both to learn how to do it without one and because I believe that, since the executor framework separates tasks from threads, it doesn't guarantee how many threads are running at the same time.
You'll have to do it in the Runnable itself. Create a simple loop surrounding your actions.
If you want them all to synchronize at a certain point, create a CountDownLatch with a count of 1000 and, at the end of every iteration, call countDown() and await(). (If the rendezvous has to repeat on every iteration, a CyclicBarrier is the reusable equivalent.)
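Here is a rough sketch of the loop-in-the-Runnable idea, using a one-shot CountDownLatch as a start gate so all 1K threads begin together; the URL and the 30-second duration are just placeholders:

    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.concurrent.CountDownLatch;

    public class LoadClient implements Runnable {
        private final CountDownLatch startGate;
        private final long deadlineMillis;

        public LoadClient(CountDownLatch startGate, long deadlineMillis) {
            this.startGate = startGate;
            this.deadlineMillis = deadlineMillis;
        }

        @Override
        public void run() {
            try {
                startGate.await();                       // all 1K threads start together
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            while (System.currentTimeMillis() < deadlineMillis) {
                try {
                    hitUrl();                            // loop instead of dying after one request
                } catch (Exception e) {
                    // ignore individual request failures and keep going
                }
            }
        }

        private void hitUrl() throws Exception {
            HttpURLConnection conn = (HttpURLConnection)
                    new URL("http://www.example.com/some/page").openConnection();
            try (InputStream in = conn.getInputStream()) {
                while (in.read() != -1) { /* drain the response */ }
            }
        }

        public static void main(String[] args) {
            CountDownLatch startGate = new CountDownLatch(1);
            long deadline = System.currentTimeMillis() + 30_000;   // run for 30 seconds
            for (int i = 0; i < 1000; i++) {
                new Thread(new LoadClient(startGate, deadline)).start();
            }
            startGate.countDown();                       // release all threads at once
        }
    }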
Apache JMeter is a free performance-testing tool that you can easily configure to test URLs in multiple threads. It can also distribute the tests to have, e.g., 10 clients running 100 threads each instead.
Use a loop in your run() method.
As close as I can tell, you want to have a large number of server threads, have them execute a piece of work from a list, then come back and wait for another piece of work to be specified (or work on another already-present piece in the list).
This is what you use a queue for. Probably a BlockingQueue is the simplest form to use that will suit your purposes, and there are several implementations of this in the JDK.
I have a project made using Java.
I have some complex processing: from one single process I create 10 different threads, and then the process waits for those threads to complete their processing. The threads that were created do some database processing and finally generate their output. The problem is that the process that has been waiting then needs to process all the data produced by the threads it created, as a sort of aggregated result.
I am almost clueless about what needs to be done.
Regards
You could use a java.util.concurrent.ConcurrentLinkedQueue. Have each thread put its results on the queue when it's done. The main thread just watches the queue and processes the results as they come in.
Another alternative is to use Futures. Instead of raw threads, just use a Future for each of the processes. The main thread will block while waiting for each future to finish its processing.
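A minimal sketch of the Futures approach; the string results stand in for whatever your database processing produces:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class AggregateWithFutures {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(10);

            // Submit the 10 database-processing jobs and keep their Futures.
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < 10; i++) {
                final int id = i;
                futures.add(pool.submit(() -> "partial-result-" + id));  // stand-in for DB work
            }

            // The main thread blocks on each get() and aggregates the partial results.
            List<String> aggregated = new ArrayList<>();
            for (Future<String> f : futures) {
                aggregated.add(f.get());
            }
            System.out.println("Aggregated " + aggregated.size() + " results");
            pool.shutdown();
        }
    }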
You might consider using a BlockingQueue to aggregate all your data in one data structure.
This queue can then be used by your main process (even before all your threads actually finished their work).
You'll need to start 10 threads in your main thread and wait for them to finish. This can be done by calling Thread.join() on each of the 10 started threads (after they have all been started).
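Roughly like this (the partial results are placeholders for the data your threads produce):

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class JoinAndAggregate {
        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<String> results = new LinkedBlockingQueue<>();

            Thread[] workers = new Thread[10];
            for (int i = 0; i < 10; i++) {
                final int id = i;
                workers[i] = new Thread(() -> results.add("partial-" + id)); // stand-in for DB work
                workers[i].start();
            }

            for (Thread worker : workers) {
                worker.join();   // the main thread waits for every child to finish
            }

            System.out.println("All done; aggregating " + results.size() + " partial results");
        }
    }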
For more information about threads, read the Java tutorial about concurrency.
If your difficulty is how to wait in the main thread until the child threads complete their work, then you can call childThread.join() on the child threads from the main thread. If you are troubled by how to make the results the child threads bring back from the DB available to the main thread for processing, then use some shared data structure which is populated by the child threads and is then accessed by the main thread. (Make sure you synchronize properly.)
For all such tasks, however, it is best to use the Executor framework in Java 1.6.
You could just use a shared object to add data to it.
If I understand right, then:
Create a class that will hold all the data in the end (for example MyData). This class could have a getData() method that returns the data and an add() method that adds data to some collection of your choice (array, list, ...).
Then, when a thread is done processing the data, it calls:
MyData.add(partialDataFromThread)
And in the end your main class will do:
MainClass.process(MyData.getData());
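A minimal, hypothetical version of such a MyData holder, using a synchronized list so the threads can add to it safely:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // Threads call add() as they finish; the main class reads everything back with getData().
    public class MyData {
        private static final List<String> data =
                Collections.synchronizedList(new ArrayList<>());

        public static void add(String partialDataFromThread) {
            data.add(partialDataFromThread);
        }

        public static List<String> getData() {
            synchronized (data) {
                return new ArrayList<>(data);   // defensive copy for the main thread
            }
        }
    }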
Hope it helps...
You can use java.util.concurrent.CompletionService to submit tasks and poll for their completion.
Alternatively, look into the CountDownLatch or CyclicBarrier classes.
Let me know if you need examples, though I assume the internet is already flooded with them; also, the Javadocs are pretty good, and it is always good learning to do it first-hand.