Assuming that I have a ForkJoinPool setup with degree of parallelism n, and that I call a parallel computation like this:
workpool.submit(
() -> {
objects.values().parallelStream().forEach(obj -> {
obj.foo();
});
});
I do this to ensure that the threads spawned there are created inside the workpool (I have different components of the system that need to be isolated). Now assume that the thread in which this is called, is also executing inside this workpool, and I do:
Future<?> wait = workpool.submit(
() -> {
objects.values().parallelStream().forEach(obj -> {
obj.foo();
});
});
wait.get();
1) Am I blocking a thread in the ForkJoinPool? If I were to have n threads all block on futures while trying to schedule a task in the workpool, would this lead to deadlock? It's not clear to me whether the "maximum degree of parallelism" in the ForkJoinPool means that (if there are n non-blocked tasks) there will always be n threads executing, or whether there is a fixed number of threads regardless of whether some of them are blocked. What if I use wait.join() instead? (I do not need checked exceptions, as any exception thrown in this code will already generate a RuntimeException. If I understand correctly, join() will allow threads to execute queued tasks while waiting.)
2) Am I still getting the benefit of the light-weight fork/join tasks of the parallel stream if I am creating a Runnable "wrapper" class by doing () -> {}?
3) Is there any downside/upside to using this instead (assuming that the .join() does indeed implement the work-stealing behaviour that I think it does):
CompletableFuture.supplyAsync(this::mylambdafunction, workpool)
.thenAccept(this::mynextfunction);
Response to point 1: It's difficult to know whether your code will block without seeing the actual method implementations. One approach to dealing with blocking code is to increase the number of threads in the ForkJoinPool. A common rule of thumb for compute-intensive tasks is n+1 threads, where n is the number of processors. If you have blocking I/O, you could use a ManagedBlocker instead.
Response to point 2: Yes.
Response to point 3: The obvious upside of your CompletableFuture code is that thenAccept is non-blocking. Control immediately moves past the CompletableFuture block to the next statement without waiting, whereas in the earlier code you wrote with the ForkJoinPool, wait.get() blocks until the result is available and won't proceed until then.
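To illustrate the ManagedBlocker suggestion from point 1, here is a minimal sketch. The class and method names (ManagedBlockSketch, LatchBlocker, waiterCompletes) are made up for the example, and a CountDownLatch stands in for whatever the real code blocks on. The pool has a parallelism of 1, the first task blocks through ForkJoinPool.managedBlock, and the second task releases it; the idea is that the pool is allowed to add a compensating worker while the first one is blocked.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.TimeUnit;

class ManagedBlockSketch {

    /** Wraps a latch wait so the pool knows this worker is about to block. */
    static final class LatchBlocker implements ForkJoinPool.ManagedBlocker {
        private final CountDownLatch latch;

        LatchBlocker(CountDownLatch latch) {
            this.latch = latch;
        }

        @Override
        public boolean block() throws InterruptedException {
            latch.await();
            return true; // no further blocking is necessary
        }

        @Override
        public boolean isReleasable() {
            return latch.getCount() == 0;
        }
    }

    /** Returns true if the blocked task finished within the timeout. */
    static boolean waiterCompletes() {
        ForkJoinPool pool = new ForkJoinPool(1); // parallelism of 1, as an assumption
        CountDownLatch latch = new CountDownLatch(1);
        try {
            // First task: blocks inside the pool, but tells the pool about it.
            ForkJoinTask<?> waiter = pool.submit(() -> {
                try {
                    ForkJoinPool.managedBlock(new LatchBlocker(latch));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            // Second task: needs a free worker to release the first one.
            pool.submit(() -> latch.countDown());
            waiter.get(10, TimeUnit.SECONDS);
            return true;
        } catch (Exception e) {
            return false; // timeout, interruption or task failure
        } finally {
            pool.shutdownNow();
        }
    }
}
```

If the first task called latch.await() directly instead, the single worker would be parked and the second task would have nobody to run it.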
Is it possible to configure ForkJoinPool to use 1 execution thread?
I am executing code that invokes Random inside a ForkJoinPool. Every time it runs, I end up with different runtime behavior, making it difficult to investigate regressions.
I would like the codebase to offer "debug" and "release" modes. "debug" mode would configure Random with a fixed seed, and ForkJoinPool with a single execution thread. "release" mode would use system-provided Random seeds and use the default number of ForkJoinPool threads.
I tried configuring ForkJoinPool with a parallelism of 1, but it uses 2 threads (main and a second worker thread). Any ideas?
So, it turns out I was wrong.
When you configure a ForkJoinPool with parallelism set to 1, only one thread executes the tasks. The main thread is blocked on ForkJoinTask.get(). It doesn't actually execute any tasks.
That said, it turns out that it is really tricky providing deterministic behavior. Here are some of the problems I had to correct:
ForkJoinPool was executing tasks using different worker threads (with different names) if the worker thread became idle long enough. For example, if the main thread got suspended on a debugging breakpoint, the worker thread would become idle and shut down. When I resumed execution, ForkJoinPool would spin up a new worker thread with a different name. To solve this, I had to provide a custom ForkJoinWorkerThreadFactory implementation that ensures only one thread runs at a time, and that its name is hard-coded. I also had to ensure that my code was returning the same Random instance even if a worker thread shut down and came back again.
Collections with non-deterministic iteration order such as HashMap or HashSet led to elements grabbing random numbers in a different order on every run. I corrected this by using LinkedHashMap and LinkedHashSet.
Objects with non-deterministic hashCode() implementations, such as Enum.hashCode(). I forget what problems this caused but I corrected it by calculating the hashCode() myself instead of relying on the built-in method.
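A minimal sketch of the iteration-order point, under stated assumptions: Item is a hypothetical element type that keeps the default identity-based hashCode(), and assign is a made-up helper. Because the elements live in a LinkedHashSet and the Random has a fixed seed, each element draws its number in insertion order and the result is the same on every call.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

class DeterministicDraws {

    /** Hypothetical element type; it inherits the identity-based Object.hashCode(). */
    static final class Item {
        final String name;

        Item(String name) {
            this.name = name;
        }
    }

    /** Each item grabs one random number, in insertion order. */
    static List<String> assign(long seed) {
        Random random = new Random(seed); // "debug" mode: fixed seed
        Set<Item> items = new LinkedHashSet<>(); // insertion order, not hash order
        items.add(new Item("a"));
        items.add(new Item("b"));
        items.add(new Item("c"));

        List<String> draws = new ArrayList<>();
        for (Item item : items) {
            draws.add(item.name + "=" + random.nextInt(100));
        }
        return draws;
    }
}
```

With a plain HashSet the same loop could visit the items in a different order from run to run, so the same seed would hand different numbers to each item.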
Here is a sample implementation of ForkJoinWorkerThreadFactory:
class MyForkJoinWorkerThread extends ForkJoinWorkerThread
{
MyForkJoinWorkerThread(ForkJoinPool pool)
{
super(pool);
// Change thread name after ForkJoinPool.registerWorker() does the same
setName("DETERMINISTIC_WORKER");
}
}
ForkJoinWorkerThreadFactory factory = new ForkJoinWorkerThreadFactory()
{
private WeakReference<Thread> currentWorker = new WeakReference<>(null);
@Override
public synchronized ForkJoinWorkerThread newThread(ForkJoinPool pool)
{
// If the pool already has a live thread, wait for it to shut down.
Thread thread = currentWorker.get();
if (thread != null && thread.isAlive())
{
try
{
thread.join();
}
catch (InterruptedException e)
{
log.error("", e);
}
}
ForkJoinWorkerThread result = new MyForkJoinWorkerThread(pool);
currentWorker = new WeakReference<>(result);
return result;
}
};
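For completeness, a sketch of how such a factory might be plugged into the pool. This is a simplified stand-in, not the factory above: NamedWorker and SingleWorkerPoolSketch are names invented for the example, and the "wait for the previous worker" logic is left out. It uses the four-argument ForkJoinPool constructor (parallelism, factory, uncaught exception handler, asyncMode).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinWorkerThread;
import java.util.concurrent.TimeUnit;

class SingleWorkerPoolSketch {

    /** Worker with a hard-coded name, set after the superclass constructor has run. */
    static final class NamedWorker extends ForkJoinWorkerThread {
        NamedWorker(ForkJoinPool pool) {
            super(pool);
            setName("DETERMINISTIC_WORKER");
        }
    }

    /** "Debug" pool: parallelism 1, custom factory, no handler, default (LIFO) mode. */
    static ForkJoinPool newDebugPool() {
        return new ForkJoinPool(1, NamedWorker::new, null, false);
    }

    /** Submits a task from the calling thread and reports which thread ran it. */
    static String workerName() {
        ForkJoinPool pool = newDebugPool();
        try {
            Callable<String> task = () -> Thread.currentThread().getName();
            return pool.submit(task).get(10, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling workerName() from the main thread is a quick way to check that the task really runs on the named worker rather than on the submitting thread.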
The main thread is always the first thread your application creates. So when you create a ForkJoinPool with a parallelism of 1, you are creating another thread. Effectively, there are now two threads in the application (because you created a pool of threads).
If you need only one thread, namely main, you can execute your code sequentially (and not in parallel at all).
In a web app I have a method that generates reports. If the number of customers is 10 or fewer, it waits for another thread to generate the reports; if it is greater than 10, I start the thread without calling join(), and when the thread finishes I notify the customer by e-mail.
I'm a little worried about orphan threads that run for a long time and their impact on the server.
Is it good practice to launch a "heavy" process in the background (asynchronously) without calling join(), or is there a better way to do it?
try {
thread.start();
if(flagSendEmail > 10){
return "{\"message\":\"success\", \"text\":\"you will be notified by email\"}";
}else{
thread.join(); //the customer waits until finish
}
} catch (InterruptedException e) {
LogError.saveErrorApp(e.getMessage(), e);
return "{\"message\":\"danger\", \"text\":\"can't generate the reports\"}";
}
Orphan threads aren't the problem; simply make sure that the run() method has a finally block that sends out the email.
The problem is that you have no control over the number of threads and that's got nothing to do with calling join(). (Unless you always wait for every single thread in the caller, at which point there's no point launching a background thread in the first place.)
The solution is to use an ExecutorService, which gives you a thread pool, and thus precise control over how many of these background threads are running at any one time. If you submit more tasks than the executor can handle at a given time, the remaining ones are queued up, waiting to be run. This way you can control the load on your server.
An added bonus is that because an executor service will typically recycle the same worker threads, the overhead of submitting a new task is less, meaning that you don't need to bother about whether you've got more than 10 items or not, everything can be run the same way.
In your case you could even consider using two separate executors: one for running the report generation and another one for sending out the emails. The reason for this is that you may want to limit the number of emails sent out in a busy period but without slowing report generation down.
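A rough sketch of the executor approach, assuming a pool size of 2 and using made-up names (ReportDispatcher, generate, sendEmail, shutdownAndWait); the returned strings just echo the messages from the question. Every report goes through the same fixed pool. Small jobs wait on the Future, large jobs return immediately and send the email from a finally block.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class ReportDispatcher {
    // At most 2 reports run at once (an arbitrary limit); the rest wait in the queue.
    private final ExecutorService reportPool = Executors.newFixedThreadPool(2);
    private final Runnable sendEmail; // hypothetical notification hook

    ReportDispatcher(Runnable sendEmail) {
        this.sendEmail = sendEmail;
    }

    String generate(int customerCount, Runnable reportJob) {
        boolean background = customerCount > 10;
        Future<?> pending = reportPool.submit(() -> {
            try {
                reportJob.run();
            } finally {
                if (background) {
                    sendEmail.run(); // runs even if the report job throws
                }
            }
        });
        if (background) {
            return "you will be notified by email";
        }
        try {
            pending.get(); // the customer waits until the report is done
            return "success";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "can't generate the reports";
        } catch (ExecutionException e) {
            return "can't generate the reports";
        }
    }

    /** Stops accepting work and waits for queued reports; true if all finished in time. */
    boolean shutdownAndWait() {
        reportPool.shutdown();
        try {
            return reportPool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A second executor dedicated to sending emails, as suggested above, could replace the direct sendEmail.run() call.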
There's no point in starting a thread if the very next thing you do is join() it.
I'm not sure I understand what you're trying to do, but if your example is on the right path, then this would be even better because it avoids creating and destroying a new thread (expensive) in the flagSendEmail <= 10 case:
Runnable r = ...;
if (flagSendEmail > 10) {
Thread thread = new Thread(r);
thread.start();
return "...";
} else {
r.run();
return ???
}
But chances are, you should not be explicitly creating new Threads at all. Any time a program continually creates and destroys threads, that's a sign that it should be using a thread pool instead. (See the javadoc for java.util.concurrent.ThreadPoolExecutor)
By the way: t.join() does not do anything to thread t. It doesn't do anything at all except wait until thread t is dead.
Yes, it is safe; I don't recall seeing any actual Thread#join() invocations.
But it depends on what you are trying to do. I don't know if you mean to use a pool of threads that generate reports or that have some resource assigned. In any case, you should limit yourself to a maximum number of threads for reports. If they get blocked or stuck in a loop (because of a bug or poor synchronization), allowing more and more threads will utterly clog your application.
Thread#join waits for the referenced thread to die. Are those threads actually ending? Are you waiting for a thread to die just to launch another thread? Usually synchronization is done with wait() and notify() on a shared synchronization object.
Launching a process (Runtime#exec()) probably will make things even worse, unless it helps work around some weird limitation.
There are some tools like JConsole which can give you some heads up about threads getting locked and other issues.
I have a rather massive number of threads being created inside a clojure program:
(import '(java.util.concurrent Executors))
(def *pool*
(Executors/newCachedThreadPool))
(defn do-something []
; work
(Thread/sleep 200)
; repeat
)
(dotimes [i 10000]
(.submit *pool* do-something))
It's been a while between JVMs for me, and I am basically wondering if there is any argument against using sleep or yield inside the function that is being executed by the Executor. If I understand correctly, in this case every one of my workers has its own thread, and therefore there should be no side effects.
If the Executor is using a FixedThreadPool:
(Executors/newFixedThreadPool 1000)
Things become more complicated because threads will not be returned to the pool until their work is complete, meaning the other queued workers will take longer to complete if the threads are sleeping.
Is my understanding of threading in this instance correct?
(Note: I suspect my design is actually wrong, but just want to make sure I am on the right page)
An executor is conceptually a task queue + a worker pool. Your explanation of what will happen here is basically correct. When you submit a task to the executor, the work is enqueued until a thread can execute the task. When it is executing the task, that task owns the thread and sleeping will block other tasks from being executed on that worker thread.
Depending on what you're doing that may be ok (although it is unusual and probably bad form to sleep inside a task). It's more common to block a thread as a side effect of waiting on IO (blocked on a socket or db call for example).
Generally if you are doing periodic work, it is better to handle that outside the pool and fire tasks when they should be executed, or better yet, use a ScheduledExecutorService instead from Executors/newScheduledThreadPool.
The other main mechanism in Java for performing time-based tasks is java.util.Timer, which is a bit easier to use but not as robust as the ScheduledExecutorService.
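As a small Java sketch of the ScheduledExecutorService suggestion (PeriodicSketch and runPeriodically are names made up for the example): instead of sleeping inside the task, the scheduler re-runs a short task with a fixed delay, so the worker thread is free between runs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PeriodicSketch {

    /** Runs a small unit of work 'times' times, 'delayMillis' apart; returns the run count. */
    static int runPeriodically(int times, long delayMillis) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        AtomicInteger runs = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(times);

        ScheduledFuture<?> handle = scheduler.scheduleWithFixedDelay(() -> {
            if (done.getCount() > 0) { // ignore a run that sneaks in before cancel()
                runs.incrementAndGet(); // the "work"
                done.countDown();
            }
        }, 0, delayMillis, TimeUnit.MILLISECONDS);

        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            handle.cancel(false); // stop rescheduling
            scheduler.shutdown();
        }
        return runs.get();
    }
}
```

From Clojure the same calls are reachable through interop on the object returned by Executors/newScheduledThreadPool.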
Another alternative from Clojure is to explicitly put the worker into a background thread managed by Clojure instead of by you:
(defn do-task []
(println (java.util.Date.) "doing task"))
(defn worker [f n wait]
(doseq [task (repeat n f)]
(f)
(Thread/sleep wait)))
;; use future to execute worker in a background thread managed by Clojure
(future (worker do-task 10 1000))
;; the call to future returns immediately but in the background console
;; you will see the tasks being run.
An alternative to sleeping your threads is to give each worker a "sleepUntil" long value. When your executor calls a worker that is still sleeping, the worker returns immediately. Otherwise, it does its work, then returns. This can help keep your thread count down, because a fixed thread pool will be able to handle many more workers than it has threads if most of them are flagged as sleeping and return quickly.
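One way to read the "sleepUntil" idea, as a sketch only: NappingWorker and its methods are hypothetical names, the clock is passed in so the behaviour is easy to follow, and the class is not thread-safe as written. Something else (for example a scheduler) still has to keep re-submitting the worker.

```java
class NappingWorker {
    private final long napMillis;
    private long sleepUntil; // timestamp in millis; work is allowed once "now" reaches it
    private int workDone;

    NappingWorker(long napMillis) {
        this.napMillis = napMillis;
    }

    /**
     * Called by the executor. Returns false right away while the worker is
     * "asleep", so the pool thread is released instead of being parked in sleep().
     */
    boolean runOnce(long nowMillis) {
        if (nowMillis < sleepUntil) {
            return false;
        }
        workDone++; // the actual work would go here
        sleepUntil = nowMillis + napMillis;
        return true;
    }

    int workDone() {
        return workDone;
    }
}
```

In real use the argument would typically be System.currentTimeMillis().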
I can't use shutdown() and awaitTermination() because it is possible new tasks will be added to the ThreadPoolExecutor while it is waiting.
So I'm looking for a way to wait until the ThreadPoolExecutor has emptied its queue and finished all of its tasks without stopping new tasks from being added before that point.
If it makes any difference, this is for Android.
Thanks
Update: Many weeks later after revisiting this, I discovered that a modified CountDownLatch worked better for me in this case. I'll keep the answer marked because it applies more to what I asked.
If you are interested in knowing when a certain task completes, or a certain batch of tasks, you may use ExecutorService.submit(Runnable). Invoking this method returns a Future object which may be placed into a Collection which your main thread will then iterate over calling Future.get() for each one. This will cause your main thread to halt execution until the ExecutorService has processed all of the Runnable tasks.
Collection<Future<?>> futures = new LinkedList<Future<?>>();
futures.add(executorService.submit(myRunnable));
for (Future<?> future:futures) {
future.get();
}
My scenario is a web crawler that fetches some information from a web site and then processes it. A ThreadPoolExecutor is used to speed up the process because many pages can be loaded at the same time. New tasks are created from within existing tasks because the crawler follows the hyperlinks in each page. The problem is the same: the main thread does not know when all the tasks are completed so that it can start processing the results. I use a simple way to determine this. It is not very elegant, but it works in my case:
while (executor.getTaskCount()!=executor.getCompletedTaskCount()){
System.err.println("count="+executor.getTaskCount()+","+executor.getCompletedTaskCount());
Thread.sleep(5000);
}
executor.shutdown();
executor.awaitTermination(60, TimeUnit.SECONDS);
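An alternative to polling the counters, sketched under assumptions: CrawlTracker is a made-up class, and a "page" here is just a depth value that spawns two child pages. The tracker counts in-flight tasks itself; each task is counted before it is handed to the pool and uncounted in a finally block, so the count can only reach zero once no task is running and no task is queued.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class CrawlTracker {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final AtomicInteger pagesVisited = new AtomicInteger();
    private int pending; // guarded by "this"

    /** Counts the task first, then hands it to the pool. */
    void submit(int depth) {
        synchronized (this) {
            pending++;
        }
        pool.execute(() -> {
            try {
                pagesVisited.incrementAndGet(); // "fetch the page"
                if (depth > 0) { // "follow two hyperlinks"
                    submit(depth - 1);
                    submit(depth - 1);
                }
            } finally {
                synchronized (this) {
                    if (--pending == 0) {
                        notifyAll(); // children were counted before we got here
                    }
                }
            }
        });
    }

    synchronized void awaitIdle() throws InterruptedException {
        while (pending > 0) {
            wait();
        }
    }

    /** Crawls a binary tree of the given depth and returns the number of pages visited. */
    static int crawl(int depth) {
        CrawlTracker tracker = new CrawlTracker();
        tracker.submit(depth);
        try {
            tracker.awaitIdle();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            tracker.pool.shutdown();
        }
        return tracker.pagesVisited.get();
    }
}
```

The pool is only shut down after awaitIdle() returns, so tasks can keep adding new tasks right up to that point.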
Maybe you are looking for a CompletionService to manage batches of task, see also this answer.
(This is an attempt to reproduce Thilo's earlier, deleted answer with my own adjustments.)
I think you may need to clarify your question since there is an implicit infinite condition... at some point you have to decide to shut down your executor, and at that point it won't accept any more tasks. Your question seems to imply that you want to wait until you know that no further tasks will be submitted, which you can only know in your own application code.
The following answer will allow you to smoothly transition to a new TPE (for whatever reason), completing all the currently-submitted tasks, and not rejecting new tasks to the new TPE. It might answer your question. @Thilo's might also.
Assuming you have defined somewhere a visible TPE in use as such:
AtomicReference<ThreadPoolExecutor> publiclyAvailableTPE = ...;
You can then write the TPE swap routine as such. It could also be written using a synchronized method, but I think this is simpler:
void rotateTPE() throws InterruptedException
{
ThreadPoolExecutor newTPE = createNewTPE();
// atomic swap with publicly-visible TPE
ThreadPoolExecutor oldTPE = publiclyAvailableTPE.getAndSet(newTPE);
oldTPE.shutdown();
// and if you want this method to block awaiting completion of old tasks in
// the previously visible TPE (awaitTermination requires a timeout)
oldTPE.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
}
Alternatively, if you really do want to kill the thread pool, then your submitter side will need to cope with rejected tasks at some point, and you could use null for the new TPE:
void killTPE() throws InterruptedException
{
ThreadPoolExecutor oldTPE = publiclyAvailableTPE.getAndSet(null);
oldTPE.shutdown();
// and if you want this method to block awaiting completion of old tasks in
// the previously visible TPE (awaitTermination requires a timeout)
oldTPE.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
}
This could cause upstream problems: the caller would need to know what to do with a null.
You could also swap out with a dummy TPE that simply rejected every new execution, but that's equivalent to what happens if you call shutdown() on the TPE.
If you don't want to use shutdown, follow one of the approaches below:
1) Iterate through all the Future objects returned by submit on the ExecutorService and check their status with the blocking call get() on each Future, as suggested by Tim Bender.
2) Use one of:
   - invokeAll on ExecutorService
   - CountDownLatch
   - ForkJoinPool or newWorkStealingPool of Executors (since Java 8)
invokeAll() on the executor service also achieves the same purpose as CountDownLatch.
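A short sketch of the invokeAll option (InvokeAllSketch and sumOfSquares are names chosen for the example). invokeAll blocks until every task in the batch has completed and returns the futures in the same order as the tasks, so no shutdown is needed to know the batch is done.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class InvokeAllSketch {

    /** Computes 1^2 + 2^2 + ... + n^2 with one task per term. */
    static int sumOfSquares(int n) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final int value = i;
                tasks.add(() -> value * value);
            }
            int sum = 0;
            // invokeAll returns only after all tasks in the batch have finished.
            for (Future<Integer> result : pool.invokeAll(tasks)) {
                sum += result.get(); // already done, so this does not block
            }
            return sum;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown(); // cleanup for the example only; the pool could be reused
        }
    }
}
```

Note that this only covers a batch known up front; tasks that spawn further tasks still need their own tracking.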
Related SE question:
How to wait for a number of threads to complete?
You could call the waitTillDone() on Runner class:
Runner runner = Runner.runner(10);
runner.runIn(2, SECONDS, runnable);
runner.run(runnable); // each of this runnables could submit more tasks
runner.waitTillDone(); // blocks until all tasks are finished (or failed)
// and now reuse it
runner.runIn(500, MILLISECONDS, callable);
runner.waitTillDone();
runner.shutdown();
To use it add this gradle/maven dependency to your project: 'com.github.matejtymes:javafixes:1.0'
For more details look here: https://github.com/MatejTymes/JavaFixes or here: http://matejtymes.blogspot.com/2016/04/executor-that-notifies-you-when-task.html
Try using queue size and active tasks count as shown below
while (executor.getThreadPoolExecutor().getActiveCount() != 0 || !executor.getThreadPoolExecutor().getQueue().isEmpty()){
try {
Thread.sleep(500);
} catch (InterruptedException e) {
Thread.currentThread().interrupt(); // preserve the interrupt and stop polling
break;
}
}
I have a series of concurrent tasks to run. If any one of them fails, I want to interrupt them all and await termination. But assuming none of them fail, I want to wait for all of them to finish.
ExecutorCompletionService seems like almost what I want here, but there doesn't appear to be a way to tell if all of my tasks are done, except by keeping a separate count of the number of tasks. (Note that both of the examples in the Javadoc for ExecutorCompletionService keep track of the count "n" of the tasks, and use that to determine if the service is finished.)
Am I overlooking something, or do I really have to write this code myself?
Yes, you do need to keep track if you're using an ExecutorCompletionService. Typically, you would call get() on the futures to see if an error occurred. Without iterating over the tasks, how else could you tell that one failed?
If your series of tasks is of a known size, then you should use the second example in the javadoc.
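For the known-size case, a sketch along the lines of what the question asks for (FailFastSketch, runAll and demo are made-up names, and this is not the Javadoc example itself). It takes exactly as many futures as were submitted; the first failure cancels the remaining tasks with interruption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class FailFastSketch {

    /** Returns true if every task succeeded; on the first failure, cancels the rest. */
    static boolean runAll(ExecutorService pool, List<Callable<Void>> tasks)
            throws InterruptedException {
        CompletionService<Void> ecs = new ExecutorCompletionService<>(pool);
        List<Future<Void>> futures = new ArrayList<>();
        for (Callable<Void> task : tasks) {
            futures.add(ecs.submit(task));
        }
        try {
            for (int i = 0; i < tasks.size(); i++) {
                ecs.take().get(); // completion order; throws if that task failed
            }
            return true;
        } catch (ExecutionException e) {
            for (Future<Void> future : futures) {
                future.cancel(true); // interrupts tasks that are still running
            }
            return false;
        }
    }

    /** Small driver: two short tasks, plus optionally one that fails. */
    static boolean demo(boolean includeFailure) {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        List<Callable<Void>> tasks = new ArrayList<>();
        tasks.add(() -> { Thread.sleep(50); return null; });
        tasks.add(() -> { Thread.sleep(50); return null; });
        if (includeFailure) {
            tasks.add(() -> { throw new IllegalStateException("boom"); });
        }
        try {
            boolean ok = runAll(pool, tasks);
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS); // await termination either way
            return ok;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pool.shutdownNow();
            return false;
        }
    }
}
```

Cancellation only helps if the tasks respond to interruption, as Thread.sleep does here.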
However, if you don't know the number of tasks which you will submit to the CompletionService, then you have a sort of Producer-Consumer problem. One thread is producing tasks and placing them in the ECS, another would be consuming the task futures via take(). A shared Semaphore could be used, allowing the Producer to call release() and the Consumer to call acquire(). Completion semantics would depend on your application, but a volatile or atomic boolean on the producer to indicate that it is done would suffice.
I suggest a Semaphore over wait/notify with poll() because there is a non-deterministic delay between the time a task is produced and the time that task's future is available for consumption. Therefore the consumer and producer need to be just slightly smarter.