CompletableFuture supplyAsync(supplier) vs supplyAsync(supplier, executor) - java

The Javadoc says CompletableFuture#supplyAsync(Supplier<U> supplier) runs the task in the ForkJoinPool#commonPool(), whereas CompletableFuture#supplyAsync(supplier, executor) runs it in the given executor.
I'm trying to figure out which one to use. So my questions are:
What is the ForkJoinPool#commonPool()?
When should I use supplyAsync(supplier) vs supplyAsync(supplier, executor)?

ForkJoinPool#commonPool() is the common pool of threads that Java API provides. If you ever used stream API, then the parallel operations are also done in this thread pool.
The advantage of using the common thread pool is that Java API would manage that for you - from creation to destruction. The disadvantage is that you would expect a lot of classes to share the usage of this pool.
If you use an executor, it is like owning a private pool, so nothing is going to fight with you over its usage. You have to create the executor yourself and pass it into CompletableFuture. However, do note that the actual performance will still depend on what is being done in the threads and on your hardware.
Generally, I find it fine to use the common thread pool for doing more computationally intensive stuff, while executor would be better for doing things that would have to wait for things (like IO). When you "sleep" in common thread pool thread, it is like using a cubicle in a public washroom to play games on your mobile phone - someone else could be waiting for the cubicle.
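To make the distinction concrete, here is a minimal sketch of both overloads side by side. The class name, the pool size of 4 and the slowLookup method (a stand-in for blocking I/O, simulated with a sleep) are assumptions made up for illustration, not anything from the question.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class SupplyAsyncDemo {

    // CPU-bound work: reasonable to run on the shared common pool.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    // Hypothetical stand-in for blocking I/O (a real task might call a database or a web service).
    static String slowLookup(String key) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "value-for-" + key;
    }

    public static void main(String[] args) {
        // No executor argument: the task normally runs on ForkJoinPool.commonPool().
        CompletableFuture<Long> computed =
                CompletableFuture.supplyAsync(() -> sumOfSquares(1_000));

        // Blocking work goes to a private pool so it cannot tie up common-pool threads.
        ExecutorService ioPool = Executors.newFixedThreadPool(4); // size is an assumption
        try {
            CompletableFuture<String> fetched =
                    CompletableFuture.supplyAsync(() -> slowLookup("user-42"), ioPool);

            System.out.println(computed.join());
            System.out.println(fetched.join());
        } finally {
            ioPool.shutdown(); // a private pool has to be shut down by its owner
        }
    }
}
```

The private pool costs a few extra lines (creation and shutdown), which is the trade-off described above for not sharing threads with the rest of the JVM.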

Related

Situational use: Run tasks in ForkJoinPool vs. new Thread

Since I'm trying to understand a few of the new features provided in Java 8, I came upon the following situation:
I wish to implement asynchronous method calls in my application (in JavaFX). The idea was to provide/use a separate thread for everything related to the GUI so that background tasks won't block/delay the visible output in my application.
For the background tasks, I thought about either using a pool of threads or simply running them in the main thread of the application for now. Then I came upon the ForkJoinPool, which the CompletableFuture class uses by default when doing something like this:
CompletableFuture.runAsync(task);
where task is a Runnable.
In most tutorials and the Javadoc, the ForkJoinPool is described as "a pool which contains threads waiting for tasks to run". Also, the size of the ForkJoinPool is usually the number of cores of the user's machine, or double that if hyperthreading is supported.
What advantages does the ForkJoinPool give me over the traditional Thread when I want to run a task asynchronously?
ForkJoinPool is not comparable with threads; it is comparable with thread pools. Creating new threads directly in code is often a bad idea and can lead to OutOfMemoryErrors because the number of threads is not controlled. Depending on your use case you might need a ForkJoinPool or a different pool, but make sure you use one. And by the way, all asynchronous CompletableFuture methods have overloads that allow you to pass your own thread pool.
Some benefits of ForkJoinPool:
It is already initialised for you and shut down on JVM shutdown, so you don't need to worry about that
Its size can be controlled through a VM parameter
It's perfect for compute-bound tasks where work stealing can be beneficial, although this could be a problem depending on the case.
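As a rough sketch of the points above: the common pool's size can be inspected at runtime, and runAsync accepts a private pool as a second argument. The class name, the pool size and the thread name "background-worker" are made up for the example; the system property mentioned in the comment is the one documented for ForkJoinPool.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;

class RunAsyncPools {

    // Runs one task on a private pool and returns the name of the thread that executed it.
    static String runOnPrivatePool() {
        ExecutorService background = Executors.newFixedThreadPool(2, r -> {
            Thread t = new Thread(r, "background-worker"); // hypothetical thread name
            t.setDaemon(true);
            return t;
        });
        try {
            String[] threadName = new String[1];
            CompletableFuture
                    .runAsync(() -> threadName[0] = Thread.currentThread().getName(), background)
                    .join(); // wait for the task so the result is visible here
            return threadName[0];
        } finally {
            background.shutdown();
        }
    }

    public static void main(String[] args) {
        // Target parallelism of the common pool. It can be overridden at startup with
        // -Djava.util.concurrent.ForkJoinPool.common.parallelism=N
        System.out.println("common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());

        System.out.println("task ran on: " + runOnPrivatePool());
    }
}
```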

Are there any disadvantages of using a thread pool?

I know the thread pool is a good thing because it can reuse threads and thus save the cost of creating new threads. But my question is, are there any disadvantages of using a thread pool? In which situation is using a thread pool not as good as using just individual threads?
In which situation is using a thread pool not as good as using just individual threads?
The only time I can think of is when you have a single thread that only needs to do a single task for the life of your program. Something like a background thread attached to a permanent cache. That's about the only time I fork a thread directly as opposed to using an ExecutorService. Even then, using Executors.newSingleThreadExecutor() would be fine. The overhead of the thread pool itself is maybe a bit more logic and some memory, but it is very hard to see a pressing downside.
Certainly anytime you need multiple threads to perform tasks, a thread pool is warranted. What the ExecutorService code does is reduce the amount of code you need to write to manage the threads. The improvements in readability and code maintainability are a big win.
A thread pool is suitable only when you use it for operations that take little time to complete. Thread pool threads are not suitable for long-running operations, as these can easily lead to thread starvation.
If you require your thread to have a specific priority, then threadpool thread is not suitable.
You have tasks that cause the thread to block for long periods of time. The thread pool has a maximum number of threads, so a large number of blocked thread pool threads might prevent tasks from starting.
You've got a bunch of different answers here. I think one reason for that is the question is incomplete. You are asking for "disadvantages of using a thread pool," but you didn't say, disadvantages compared to what?
A thread pool solves a particular problem. There are other problems where "thread" or "threads" is part of the solution, but "thread pool" is not. "Thread pool" usually is the answer, when the question is, how to achieve parallel execution of many, short-lived, CPU-intensive tasks, on a multi-processor system.
Threads are useful, even on a uni-processor, for other purposes. The first question I ask about any long-running thread, for example, is "what does it wait for." Threads are an excellent tool for organizing a program that has to wait for different kinds of event. You would not use a thread pool for that, though.
In addition to Gray's answer.
Another case is when you use thread-locals, use the thread as a key in some kind of hash table, or have a stateful custom Thread implementation. In this case you have to take care of cleaning up the state when a particular task finishes using the thread, even if it failed. Otherwise some surprises are possible: the next task that runs on a thread with leftover state can start misbehaving.
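A small sketch of the leftover-state problem, assuming a single pooled thread and a hypothetical CURRENT_USER thread-local (both names are made up for illustration). The first task sets the value; the second task never does, so whatever it reads was left behind by the first.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadLocalCleanup {

    // Hypothetical per-task context stored on the worker thread.
    static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    // Runs two tasks back to back on the same pooled thread and returns what the second one sees.
    static String secondTaskSees(boolean cleanUp) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            pool.submit(() -> {
                CURRENT_USER.set("alice");
                try {
                    // ... work that relies on CURRENT_USER would go here ...
                } finally {
                    if (cleanUp) {
                        CURRENT_USER.remove(); // runs even if the task throws
                    }
                }
            }).get();
            // The second task never sets the value, so anything it reads is left over.
            return pool.submit(() -> String.valueOf(CURRENT_USER.get())).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("without cleanup: " + secondTaskSees(false)); // alice
        System.out.println("with cleanup:    " + secondTaskSees(true));  // null
    }
}
```

With a freshly created thread per task the stale value could not appear, which is why this only becomes a concern once threads are reused.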
Thread pools of limited size are dangerous if the tasks running on them exchange information via blocking queues. This may cause thread starvation (see: What is starvation?). A good rule is to never use blocking operations in tasks running on a thread pool.
Threads are better when you don't plan to stop using the thread, for instance in an infinite loop. Thread pools are best when doing many tasks that don't all happen at the same time. Especially when the tasks are short, the savings in overhead and the clarity gained from reusing the same threads are greater.
It depends on the situation in which you are going to use the thread pool. For example, if your system does not need to perform tasks in parallel, a thread pool would be of no use. It would keep unnecessary threads ready for work that will never come. In such cases you can use a single-thread executor anyway. Check this link if you haven't already; it may clarify things: Thread Pool Pattern

Java's Fork/Join vs ExecutorService - when to use which?

I just finished reading this post: What's the advantage of a Java-5 ThreadPoolExecutor over a Java-7 ForkJoinPool? and felt that the answer is not straight enough.
Can you explain in simple language and examples, what are the trade-offs between Java 7's Fork-Join framework and the older solutions?
I also read Google's #1 hit on the topic, Java Tip: When to use ForkJoinPool vs ExecutorService from javaworld.com, but the article doesn't answer the title question of when; it talks mostly about API differences ...
Fork-join allows you to easily execute divide-and-conquer jobs, which have to be implemented manually if you want to execute them in an ExecutorService. In practice, ExecutorService is usually used to process many independent requests (aka transactions) concurrently, and fork-join when you want to accelerate one coherent job.
Fork-join is particularly good for recursive problems, where a task involves running subtasks and then processing their results. (This is typically called "divide and conquer" ... but that doesn't reveal the essential characteristics.)
If you try to solve a recursive problem like this using conventional threading (e.g. via an ExecutorService) you end up with threads tied up waiting for other threads to deliver results to them.
On the other hand, if the problem doesn't have those characteristics, there is no real benefit from using fork-join.
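For illustration, here is a minimal sketch of such a recursive task: summing an array by splitting it in half until the pieces are small. The class name and the threshold of 1,000 elements are arbitrary assumptions; a real cut-off would need measuring.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class RangeSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // assumed cut-off; tune for real workloads

    private final long[] data;
    private final int from;
    private final int to;

    RangeSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: sum directly instead of splitting further.
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        RangeSum left = new RangeSum(data, from, mid);
        RangeSum right = new RangeSum(data, mid, to);
        left.fork();                     // queue the left half so an idle worker can steal it
        long rightSum = right.compute(); // work on the right half in this thread
        return left.join() + rightSum;   // combine the subtask results
    }

    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new RangeSum(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[100_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sum(data)); // 5000050000
    }
}
```

While a task waits in join(), a fork-join worker can run other queued subtasks instead of just blocking, which is what avoids the "threads tied up waiting" problem described above.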
References:
Java Tutorials: Fork/Join.
Java Tip: When to use ForkJoinPool vs ExecutorService:
Java 8 provides one more API in Executors
static ExecutorService newWorkStealingPool()
Creates a work-stealing thread pool using all available processors as its target parallelism level.
With the addition of this API, Executors provides different types of ExecutorService options.
Depending on your requirement, you can choose one of them or you can look out for ThreadPoolExecutor which provides better control on Bounded Task Queue Size, RejectedExecutionHandler mechanisms.
static ExecutorService newFixedThreadPool(int nThreads)
Creates a thread pool that reuses a fixed number of threads operating off a shared unbounded queue.
static ScheduledExecutorService newScheduledThreadPool(int corePoolSize)
Creates a thread pool that can schedule commands to run after a given delay, or to execute periodically.
static ExecutorService newCachedThreadPool(ThreadFactory threadFactory)
Creates a thread pool that creates new threads as needed, but will reuse previously constructed threads when they are available, and uses the provided ThreadFactory to create new threads when needed.
static ExecutorService newWorkStealingPool(int parallelism)
Creates a thread pool that maintains enough threads to support the given parallelism level, and may use multiple queues to reduce contention.
Each of these APIs is targeted at a particular need of your application. Which one to use will depend on your use case.
e.g.
If you want to process all submitted tasks in order of arrival, just use newFixedThreadPool(1)
If you want to optimize performance of big computation of recursive tasks, use ForkJoinPool or newWorkStealingPool
If you want to execute some tasks periodically or at certain time in future, use newScheduledThreadPool
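Two of these choices can be sketched briefly. The class and method names below are made up for illustration: the first method relies on a single worker draining its queue in submission order, the second runs a task once after a delay.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

class ExecutorChoices {

    // In-order processing: one worker thread takes tasks from the queue in submission order.
    static List<Integer> processInOrder(int taskCount) {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        ExecutorService single = Executors.newFixedThreadPool(1);
        for (int i = 0; i < taskCount; i++) {
            final int id = i;
            single.execute(() -> seen.add(id));
        }
        single.shutdown(); // already-submitted tasks still run
        try {
            single.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }

    // Delayed execution: the task runs once, after the given delay.
    static String runAfterDelay(long delayMillis) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        try {
            ScheduledFuture<String> result =
                    scheduler.schedule(() -> "done", delayMillis, TimeUnit.MILLISECONDS);
            return result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            scheduler.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(processInOrder(5)); // [0, 1, 2, 3, 4]
        System.out.println(runAfterDelay(100)); // done
    }
}
```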
Have a look at one more nice article by PeterLawrey on ExecutorService use cases.
Related SE question:
java Fork/Join pool, ExecutorService and CountDownLatch
Brian Goetz describes the situation best: https://www.ibm.com/developerworks/library/j-jtp11137/index.html
Using conventional thread pools to implement fork-join is also challenging because fork-join tasks spend much of their lives waiting for other tasks. This behavior is a recipe for thread starvation deadlock, unless the parameters are carefully chosen to bound the number of tasks created or the pool itself is unbounded. Conventional thread pools are designed for tasks that are independent of each other and are also designed with potentially blocking, coarse-grained tasks in mind — fork-join solutions produce neither.
I recommend reading the whole post, as it has a good example of why you'd want to use a fork-join pool. It was written before ForkJoinPool became official, so the coInvoke() method he refers to became invokeAll().
Fork-Join framework is an extension to Executor framework to particularly address 'waiting' issues in recursive multi-threaded programs. In fact, the new Fork-Join framework classes all extend from the existing classes of the Executor framework.
There are 2 characteristics central to Fork-Join framework
Work Stealing (An idle thread steals work from a thread having tasks queued up more than it can process currently)
Ability to recursively decompose the tasks and collect the results. (Apparently, this requirement must have popped up along with the conception of the notion of parallel processing... but lacked a solid implementation framework in Java till Java 7)
If the parallel processing needs are strictly recursive, there is no choice but to go for Fork-Join, otherwise either of executor or Fork-Join framework should do, though Fork-Join can be said to better utilize the resources because of the idle threads 'stealing' some tasks from busier threads.
ForkJoinPool is an implementation of ExecutorService. The main difference is that this implementation gives each worker its own deque (double-ended queue) of tasks, where tasks are inserted at one end but can be withdrawn from either end. If you create a new ForkJoinPool(), it checks the number of available CPUs and uses that number as its parallelism level (the target number of worker threads). It then distributes the load across the threads. If one thread is working slowly and the others are fast, the fast ones will pick up tasks from the slow thread's deque, taking them from the back. The steps below illustrate the stealing.
Stage 1 (initially):
W1 -> 5,4,3,2,1
W2 -> 10,9,8,7,6
Stage 2:
W1 -> 5,4
W2 -> 10,9,8,7
Stage 3:
W1 -> 10,5,4
W2 -> 9,8,7
An ordinary executor service, by contrast, creates the requested number of threads and uses a blocking queue to hold all the remaining waiting tasks. If you use a cached thread pool (Executors.newCachedThreadPool()), it creates a new thread for each job when no idle thread is available, and there is no waiting queue.
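Since ForkJoinPool implements ExecutorService, it can be used through that interface like any other pool. A minimal sketch, with made-up class and method names, that submits a batch of small independent tasks and adds up their results:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;

class ForkJoinAsExecutor {

    // Computes the length of each word as a separate task and adds the results up.
    static int sumOfLengths(List<String> words) {
        // The no-argument constructor uses the number of available processors as parallelism.
        ExecutorService pool = new ForkJoinPool();
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (String word : words) {
                tasks.add(word::length);
            }
            int total = 0;
            for (Future<Integer> result : pool.invokeAll(tasks)) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfLengths(Arrays.asList("fork", "join", "pool"))); // 12
    }
}
```

Which worker ends up running which task is up to the pool, so the stealing itself is not directly visible in this sketch.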

How do I reuse Threads with different ExecutorService objects?

Is it possible to have one thread pool for my whole program so that the threads are reused, or do I need to make the ExecutorService global or pass it to all objects using it?
To be more precise I have multiple tasks that run in my program but they do not run extremely often.
ScheduledExecutorService executorService = Executors.newScheduledThreadPool(1);
I believe that it would be unnecessary to have a full thread running all the time for every single task but it might also be costly to restart the thread every single time when a task is executed.
Is there a better alternative to making the Thread pool global?
How do I reuse Threads with different ExecutorService objects?
It is not possible to re-use threads across different ExecutorService thread-pools. You can certainly submit vastly different types of Runnable classes to a common thread-pool however.
Is there a better alternative to making the Thread pool global?
I don't see a problem with a "global" thread-pool in your application. Someone needs to know when to call shutdown() on it of course but that's the only problem I see with it. If you have a lot of disparate classes which are submitting tasks, they all could access this set (or 1) of common background threads.
You may find however that different tasks may want to use a cached thread pool while others need a fixed sized pool so that multiple pools are still necessary.
I believe that it would be unnecessary to have a full thread running all the time for every single task but it might also be costly to restart the thread every single time when a task is executed.
In general, unless you are forking tons and tons of threads, the relative cost of starting one up every so often is relatively small. Unless you have evidence from a profiler or some other source, this may be premature optimization.
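One way a "global" pool could look, as a rough sketch: a small holder class that owns the scheduler and a single place to shut it down. The names BackgroundTasks, SharedPoolDemo and "app-background" are hypothetical, and using daemon threads is an assumption so that a forgotten shutdown() does not keep the JVM alive.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;

// Application-wide holder: every class that needs background work uses this one pool.
final class BackgroundTasks {
    private static final ScheduledExecutorService SCHEDULER =
            Executors.newScheduledThreadPool(1, r -> {
                Thread t = new Thread(r, "app-background"); // hypothetical thread name
                t.setDaemon(true);
                return t;
            });

    private BackgroundTasks() { }

    static ScheduledExecutorService scheduler() {
        return SCHEDULER;
    }

    // Someone has to call this once, for example when the application exits.
    static void shutdown() {
        SCHEDULER.shutdown();
    }
}

class SharedPoolDemo {

    // Submits a task the way any class in the application could and reports the worker's name.
    static String workerName() {
        Future<String> name =
                BackgroundTasks.scheduler().submit(() -> Thread.currentThread().getName());
        try {
            return name.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        System.out.println(workerName()); // app-background
        System.out.println(workerName()); // same thread, reused
        BackgroundTasks.shutdown();
    }
}
```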
With Java 8 there is a new solution.
The fork join global thread pool:
http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ForkJoinPool.html#commonPool--

Examples of when it is convenient to use Executors.newSingleThreadExecutor()

Could somebody please tell me a real-life example where it's convenient to use this factory method rather than others?
newSingleThreadExecutor
public static ExecutorService newSingleThreadExecutor()
Creates an Executor that uses a single worker thread operating off an unbounded queue. (Note however that if this single thread terminates due to a failure during execution prior to shutdown, a new one will take its place if needed to execute subsequent tasks.) Tasks are guaranteed to execute sequentially, and no more than one task will be active at any given time. Unlike the otherwise equivalent newFixedThreadPool(1) the returned executor is guaranteed not to be reconfigurable to use additional threads.
Thanks in advance.
Could please somebody tell me a real life example where it's convenient to use [the newSingleThreadExecutor() factory method] rather than others?
I assume you are asking about when you use a single-threaded thread-pool as opposed to a fixed or cached thread pool.
I use a single threaded executor when I have many tasks to run but I only want one thread to do it. This is the same as using a fixed thread pool of 1 of course. Often this is because we don't need them to run in parallel, they are background tasks, and we don't want to take too many system resources (CPU, memory, IO). I want to deal with the various tasks as Callable or Runnable objects so an ExecutorService is optimal but all I need is a single thread to run them.
For example, I have a number of timer tasks that I inject with Spring. I have two kinds of tasks, and my "short-run" tasks run in a single-threaded pool. There is only one thread that executes them all, even though there are a couple of hundred in my system. They do routine tasks such as checking for disk space, cleaning up logs, dumping statistics, etc. For the tasks that are time critical, I run in a cached thread pool.
Another example is that we have a series of partner integration tasks. They don't take very long and they run rather infrequently and we don't want them to compete with other system threads so they run in a single threaded executor.
A third example is that we have a finite state machine where each of the state mutators takes the job from one state to another and is registered as a Runnable in a single thread-pool. Even though we have hundreds of mutators, only one task is valid at any one point in time so it makes no sense to allocate more than one thread for the task.
Apart from the reasons already mentioned, you would want to use a single-threaded executor when you want ordering guarantees, i.e., you need to make sure that the tasks being submitted will always run in the order they were submitted.
The difference between Executors.newSingleThreadExecutor() and Executors.newFixedThreadPool(1) is small but can be helpful when designing a library API. If you expose the returned ExecutorService to users of your library and the library works correctly only when the executor uses a single thread (tasks are not thread safe), it is preferable to use Executors.newSingleThreadExecutor(). Otherwise the user of your library could break it by doing this:
ExecutorService e = myLibrary.getBackgroundTaskExecutor();
((ThreadPoolExecutor)e).setCorePoolSize(10);
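This is not possible with the executor returned by Executors.newSingleThreadExecutor(): it is wrapped so that it cannot be cast to ThreadPoolExecutor, and the cast fails with a ClassCastException.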
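The difference can be checked with a short sketch. The class and method names are made up for illustration. One deviation from the snippet above: the sketch raises the maximum pool size before the core size, because as far as I know newer JDKs reject a core size above the current maximum with an IllegalArgumentException.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

class ReconfigureDemo {

    // Tries the cast-and-resize trick; returns true if the executor allowed it.
    static boolean tryResize(ExecutorService executor) {
        try {
            ThreadPoolExecutor pool = (ThreadPoolExecutor) executor;
            pool.setMaximumPoolSize(10); // raise the maximum first (assumption: needed on newer JDKs)
            pool.setCorePoolSize(10);
            return true;
        } catch (ClassCastException e) {
            return false; // the executor is not a plain ThreadPoolExecutor
        } finally {
            executor.shutdown();
        }
    }

    static boolean fixedPoolCanBeResized() {
        return tryResize(Executors.newFixedThreadPool(1));
    }

    static boolean singleThreadExecutorCanBeResized() {
        return tryResize(Executors.newSingleThreadExecutor());
    }

    public static void main(String[] args) {
        System.out.println("newFixedThreadPool(1):     " + fixedPoolCanBeResized());            // true
        System.out.println("newSingleThreadExecutor(): " + singleThreadExecutorCanBeResized()); // false
    }
}
```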
It is helpful when you need a lightweight service which only makes it convenient to defer task execution, and you want to ensure only one thread is used for the job.
