I am trying to implement a thread pool using ExecutorService and CompletableFuture in java. Here I have passed the fixed-size pool to completable future tasks. if I don't pass the executor service here in completable future tasks it will use Fork/Join common pool internally as I read. Now my question is should I pass executor service here externally or not and let it use Fork/Join common pool internally? which is better in which case?
ExecutorService es = Executors.newFixedThreadPool(4);
List<CompletableFuture<?>> futures = new ArrayList<>();
for(item i: item[]){
CompletableFuture<Void> future = CompletableFuture.runAsync(() -> MyClass.service(i), es);
futures.add(future);
}
CompletableFuture.allOf(futures.toArray(new CompletableFuture[]{})).join();
It's not possible to give a fully accurate answer to your question without knowing the specifics. Which kinds of tasks do you intend to run using CompletableFuture? How many CPUs does your machine have?
In general, I would suggest avoiding using the common pool, since it may bring you some unexpected bugs up to blocking all parts of your application that rely on using the common pool.
Talking more specifically, you need to understand which type of tasks will you be running: blocking or non-blocking? Blocking tasks are those ones that use any external dependencies like database invocation and communication with other services/applications, synchronization, thread sleeps, or any other blocking utilities.
The best approach will be the creation of a separate thread pool for blocking tasks that you can fully manage. This will allow you to isolate other parts of your application and have stable and predictable execution and performance behavior for all your non-blockings tasks.
However, there is one a bit advanced workaround that may allow you to use the common pool even for blocking tasks. It's ForkJoinPool.ManagedBlocker. You may check the official documentation if you want to do some digging and experiments.
Related
Java Docs says CompletableFuture:supplyAsync(Supplier<U> supplier) runs the task in the ForkJoinPool#commonPool() whereas the CompletableFuture:suppleAsync(supplier, executor) runs it in the given executor.
I'm trying to figure out which one to use. So my questions are:
What is the ForkJoinPool#commonPool()?
When should I use supplyAsync(supplier) vs supplyAsync(supplier, executor)?
ForkJoinPool#commonPool() is the common pool of threads that Java API provides. If you ever used stream API, then the parallel operations are also done in this thread pool.
The advantage of using the common thread pool is that Java API would manage that for you - from creation to destruction. The disadvantage is that you would expect a lot of classes to share the usage of this pool.
If you used an executor, then it is like owning a private pool, so nothing is going to fight with you over the usage. You make to create the executor yourself, and pass it into CompletableFuture. However, do note that, eventually the actual performance would still depend on what is being done in the threads, and by your hardware.
Generally, I find it fine to use the common thread pool for doing more computationally intensive stuff, while executor would be better for doing things that would have to wait for things (like IO). When you "sleep" in common thread pool thread, it is like using a cubicle in a public washroom to play games on your mobile phone - someone else could be waiting for the cubicle.
I just finished reading this post: What's the advantage of a Java-5 ThreadPoolExecutor over a Java-7 ForkJoinPool? and felt that the answer is not straight enough.
Can you explain in simple language and examples, what are the trade-offs between Java 7's Fork-Join framework and the older solutions?
I also read the Google's #1 hit on the topic Java Tip: When to use ForkJoinPool vs ExecutorService from javaworld.com but the article doesn't answer the title question when, it talks about api differences mostly ...
Fork-join allows you to easily execute divide and conquer jobs, which have to be implemented manually if you want to execute it in ExecutorService. In practice ExecutorService is usually used to process many independent requests (aka transaction) concurrently, and fork-join when you want to accelerate one coherent job.
Fork-join is particularly good for recursive problems, where a task involves running subtasks and then processing their results. (This is typically called "divide and conquer" ... but that doesn't reveal the essential characteristics.)
If you try to solve a recursive problem like this using conventional threading (e.g. via an ExecutorService) you end up with threads tied up waiting for other threads to deliver results to them.
On the other hand, if the problem doesn't have those characteristics, there is no real benefit from using fork-join.
References:
Java Tutorials: Fork/Join.
Java Tip: When to use ForkJoinPool vs ExecutorService:
Java 8 provides one more API in Executors
static ExecutorService newWorkStealingPool()
Creates a work-stealing thread pool using all available processors as its target parallelism level.
With addition of this API,Executors provides different types of ExecutorService options.
Depending on your requirement, you can choose one of them or you can look out for ThreadPoolExecutor which provides better control on Bounded Task Queue Size, RejectedExecutionHandler mechanisms.
static ExecutorService newFixedThreadPool(int nThreads)
Creates a thread pool that reuses a fixed number of threads operating off a shared unbounded queue.
static ScheduledExecutorService newScheduledThreadPool(int corePoolSize)
Creates a thread pool that can schedule commands to run after a given delay, or to execute periodically.
static ExecutorService newCachedThreadPool(ThreadFactory threadFactory)
Creates a thread pool that creates new threads as needed, but will reuse previously constructed threads when they are available, and uses the provided ThreadFactory to create new threads when needed.
static ExecutorService newWorkStealingPool(int parallelism)
Creates a thread pool that maintains enough threads to support the given parallelism level, and may use multiple queues to reduce contention.
Each of these APIs are targeted to fulfil respective business needs of your application. Which one to use will depend on your use case requirement.
e.g.
If you want to process all submitted tasks in order of arrival, just use newFixedThreadPool(1)
If you want to optimize performance of big computation of recursive tasks, use ForkJoinPool or newWorkStealingPool
If you want to execute some tasks periodically or at certain time in future, use newScheduledThreadPool
Have a look at one more nice article by PeterLawrey on ExecutorService use cases.
Related SE question:
java Fork/Join pool, ExecutorService and CountDownLatch
Brian Goetz describes the situation best: https://www.ibm.com/developerworks/library/j-jtp11137/index.html
Using conventional thread pools to implement fork-join is also challenging because fork-join tasks spend much of their lives waiting for other tasks. This behavior is a recipe for thread starvation deadlock, unless the parameters are carefully chosen to bound the number of tasks created or the pool itself is unbounded. Conventional thread pools are designed for tasks that are independent of each other and are also designed with potentially blocking, coarse-grained tasks in mind — fork-join solutions produce neither.
I recommend reading the whole post, as it has a good example of why you'd want to use a fork-join pool. It was written before ForkJoinPool became official, so the coInvoke() method he refers to became invokeAll().
Fork-Join framework is an extension to Executor framework to particularly address 'waiting' issues in recursive multi-threaded programs. In fact, the new Fork-Join framework classes all extend from the existing classes of the Executor framework.
There are 2 characteristics central to Fork-Join framework
Work Stealing (An idle thread steals work from a thread having tasks
queued up more than it can process currently)
Ability to recursively decompose the tasks and collect the results.
(Apparently, this requirement must have popped up along with the
conception of the notion of parallel processing... but lacked a solid
implementation framework in Java till Java 7)
If the parallel processing needs are strictly recursive, there is no choice but to go for Fork-Join, otherwise either of executor or Fork-Join framework should do, though Fork-Join can be said to better utilize the resources because of the idle threads 'stealing' some tasks from busier threads.
Fork Join is an implementation of ExecuterService. The main difference is that this implementation creates DEQUE worker pool. Where task is inserted from oneside but withdrawn from any side. It means if you have created new ForkJoinPool() it will look for the available CPU and create that many worker thread. It then distribute the load evenly across each thread. But if one thread is working slowly and others are fast, they will pick the task from the slow thread. from the backside. The below steps will illustrate the stealing better.
Stage 1 (initially):
W1 -> 5,4,3,2,1
W2 -> 10,9,8,7,6
Stage 2:
W1 -> 5,4
W2 -> 10,9,8,7,
Stage 3:
W1 -> 10,5,4
W2 -> 9,8,7,
Whereas Executor service creates asked number of thread, and apply a blocking queue to store all the remaining waiting task. If you have used cachedExecuterService, it will create single thread for each job and there will be no waiting queue.
Java 5 has introduced support for asynchronous task execution by a thread pool in the form of the Executor framework, whose heart is the thread pool implemented by java.util.concurrent.ThreadPoolExecutor. Java 7 has added an alternative thread pool in the form of java.util.concurrent.ForkJoinPool.
Looking at their respective API, ForkJoinPool provides a superset of ThreadPoolExecutor's functionality in standard scenarios (though strictly speaking ThreadPoolExecutor offers more opportunities for tuning than ForkJoinPool). Adding to this the observation that
fork/join tasks seem to be faster (possibly due to the work stealing scheduler), need definitely fewer threads (due to the non-blocking join operation), one might get the impression that ThreadPoolExecutor has been superseded by ForkJoinPool.
But is this really correct? All the material I have read seems to sum up to a rather vague distinction between the two types of thread pools:
ForkJoinPool is for many, dependent, task-generated, short, hardly ever blocking (i.e. compute-intensive) tasks
ThreadPoolExecutor is for few, independent, externally-generated, long, sometimes blocking tasks
Is this distinction correct at all? Can we say anything more specific about this?
ThreadPool (TP) and ForkJoinPool (FJ) are targeted towards different use cases. The main difference is in the number of queues employed by the different executors which decide what type of problems are better suited to either executor.
The FJ executor has n (aka parallelism level) separate concurrent queues (deques) while the TP executor has only one concurrent queue (these queues/deques maybe custom implementations not following the JDK Collections API). As a result, in scenarios where you have a large number of (usually relatively short running) tasks generated, the FJ executor will perform better as the independent queues will minimize concurrent operations and infrequent steals will help with load balancing. In TP due to the single queue, there will be concurrent operations every time work is dequeued and it will act as a relative bottleneck and limit performance.
In contrast, if there are relatively fewer long-running tasks the single queue in TP is no longer a bottleneck for performance. However, the n-independent queues and relatively frequent work-stealing attempts will now become a bottleneck in FJ as there can be possibly many futile attempts to steal work which add to overhead.
In addition, the work-stealing algorithm in FJ assumes that (older) tasks stolen from the deque will produce enough parallel tasks to reduce the number of steals. E.g. in quicksort or mergesort where older tasks equate to larger arrays, these tasks will generate more tasks and keep the queue non-empty and reduce the number of overall steals. If this is not the case in a given application then the frequent steal attempts again become a bottleneck. This is also noted in the javadoc for ForkJoinPool:
this class provides status check methods (for example getStealCount())
that are intended to aid in developing, tuning, and monitoring
fork/join applications.
Recommended Reading http://gee.cs.oswego.edu/dl/jsr166/dist/docs/
From the docs for ForkJoinPool:
A ForkJoinPool differs from other kinds of ExecutorService mainly by
virtue of employing work-stealing: all threads in the pool attempt to
find and execute tasks submitted to the pool and/or created by other
active tasks (eventually blocking waiting for work if none exist).
This enables efficient processing when most tasks spawn other subtasks
(as do most ForkJoinTasks), as well as when many small tasks are
submitted to the pool from external clients. Especially when setting
asyncMode to true in constructors, ForkJoinPools may also be
appropriate for use with event-style tasks that are never joined.
The fork join framework is useful for parallel execution while executor service allows for concurrent execution and there is a difference. See this and this.
The fork join framework also allows for work stealing (usage of a Deque).
This article is a good read.
AFAIK, ForkJoinPool works best if you a large piece of work and you want it broken up automatically. ThreadPoolExecutor is a better choice if you know how you want the work broken up. For this reason I tend to use the latter because I have determined how I want the work broken up. As such its not for every one.
Its worth nothing that when it comes to relatively random pieces of business logic a ThreadPoolExecutor will do everything you need, so why make it more complicated than you need.
Let's compare the differences in constructors:
ThreadPoolExecutor
ThreadPoolExecutor(int corePoolSize,
int maximumPoolSize,
long keepAliveTime,
TimeUnit unit,
BlockingQueue<Runnable> workQueue,
ThreadFactory threadFactory,
RejectedExecutionHandler handler)
ForkJoinPool
ForkJoinPool(int parallelism,
ForkJoinPool.ForkJoinWorkerThreadFactory factory,
Thread.UncaughtExceptionHandler handler,
boolean asyncMode)
The only advantage I have seen in ForkJoinPool: Work stealing mechanism by idle threads.
Java 8 has introduced one more API in Executors - newWorkStealingPool to create work stealing pool. You don't have to create RecursiveTask and RecursiveAction but still can use ForkJoinPool.
public static ExecutorService newWorkStealingPool()
Creates a work-stealing thread pool using all available processors as its target parallelism level.
Advantages of ThreadPoolExecutor over ForkJoinPool:
You can control task queue size in ThreadPoolExecutor unlike in ForkJoinPool.
You can enforce Rejection Policy when you ran out of your capacity unlike in ForkJoinPool
I like these two features in ThreadPoolExecutor which keeps health of system in good state.
EDIT:
Have a look at this article for use cases of various types of Executor Service thread pools and evaluation of ForkJoin Pool features.
I am creating a http proxy server in java. I have a class named Handler which is responsible for processing the requests and responses coming and going from web browser and to web server respectively. I have also another class named Copy which copies the inputStream object to outputStream object . Both these classes implement Runnable interface. I would like to use the concept of Thread pooling in my design, however i don't know how to go about that! Any hint or idea would be highly appreciated.
I suggest you look at Executor and ExecutorService. They add a lot of good stuff to make it easier to use Thread pools.
...
#Azad provided some good information and links. You should also buy and read the book Java Concurrency in Practice. (often abbreviated as JCiP) Note to stackoverflow big-wigs - how about some revenue link to Amazon???
Below is my brief summary of how to use and take advantage of ExecutorService with thread pools. Let's say you want 8 threads in the pool.
You can create one using the full featured constructors of ThreadPoolExecutor, e.g.
ExecutorService service = new ThreadPoolExecutor(8,8, more args here...);
or you can use the simpler but less customizable Executors factories, e.g.
ExecutorService service = Executors.newFixedThreadPool(8);
One advantage you immediately get is the ability to shutdown() or shutdownNow() the thread pool, and to check this status via isShutdown() or isTerminated().
If you don't care much about the Runnable you wish to run, or they are very well written, self-contained, never fail or log any errors appropriately, etc... you can call
execute(Runnable r);
If you do care about either the result (say, it calculates pi or downloads an image from a webpage) and/or you care if there was an Exception, you should use one of the submit methods that returns a Future. That allows you, at some time in the future, check if the task isDone() and to retrieve the result via get(). If there was an Exception, get() will throw it (wrapped in an ExecutionException). Note - even of your Future doesn't "return" anything (it is of type Void) it may still be good practice to call get() (ignoring the void result) to test for an Exception.
However, this checking the Future is a bit of chicken and egg problem. The whole point of a thread pool is to submit tasks without blocking. But Future.get() blocks, and Future.isDone() begs the questions of which thread is calling it, and what it does if it isn't done - do you sleep() and block?
If you are submitting a known chunk of related of tasks simultaneously, e.g., you are performing some big mathematical calculation like a matrix multiply that can be done in parallel, and there is no particular advantage to obtaining partial results, you can call invokeAll(). The calling thread will then block until all the tasks are complete, when you can call Future.get() on all the Futures.
What if the tasks are more disjointed, or you really want to use the partial results? Use ExecutorCompletionService, which wraps an ExecutorService. As tasks get completed, they are added to a queue. This makes it easy for a single thread to poll and remove events from the queue. JCiP has a great example of an web page app that downloads all the images in parallel, and renders them as soon as they become available for responsiveness.
I hope below will help you:,
class Executor
An object that executes submitted Runnable tasks. This interface provides a way of decoupling task submission from the mechanics of how each task will be run, including details of thread use, scheduling, etc. An Executor is normally used instead of explicitly creating threads. For example, rather than invoking new Thread(new(RunnableTask())).start() for each of a set of tasks, you might use:
Executor executor = anExecutor;
executor.execute(new RunnableTask1());
executor.execute(new RunnableTask2());
...
class ScheduledThreadPoolExecutor
A ThreadPoolExecutor that can additionally schedule commands to run after a given delay, or to execute periodically. This class is preferable to Timer when multiple worker threads are needed, or when the additional flexibility or capabilities of ThreadPoolExecutor (which this class extends) are required.
Delayed tasks execute no sooner than they are enabled, but without any real-time guarantees about when, after they are enabled, they will commence. Tasks scheduled for exactly the same execution time are enabled in first-in-first-out (FIFO) order of submission.
and
Interface ExecutorService
An Executor that provides methods to manage termination and methods that can produce a Future for tracking progress of one or more asynchronous tasks.
An ExecutorService can be shut down, which will cause it to stop accepting new tasks. After being shut down, the executor will eventually terminate, at which point no tasks are actively executing, no tasks are awaiting execution, and no new tasks can be submitted.
Edited:
you can find example to use Executor and ExecutorService herehereand here Question will be useful for you.
What is the advantage of using ExecutorService over running threads passing a Runnable into the Thread constructor?
ExecutorService abstracts away many of the complexities associated with the lower-level abstractions like raw Thread. It provides mechanisms for safely starting, closing down, submitting, executing, and blocking on the successful or abrupt termination of tasks (expressed as Runnable or Callable).
From JCiP, Section 6.2, straight from the horse's mouth:
Executor may be a simple interface, but it forms the basis for a flexible and powerful framework for asynchronous task execution that supports a wide variety of task execution policies. It provides a standard means of decoupling task submission from task execution, describing tasks as Runnable. The Executor implementations also provide lifecycle support and hooks for adding statistics gathering, application management, and monitoring.
...
Using an Executor is usually the easiest path to implementing a producer-consumer design in your application.
Rather than spending your time implementing (often incorrectly, and with great effort) the underlying infrastructure for parallelism, the j.u.concurrent framework allows you to instead focus on structuring tasks, dependencies, potential parallelism. For a large swath of concurrent applications, it is straightforward to identify and exploit task boundaries and make use of j.u.c, allowing you to focus on the much smaller subset of true concurrency challenges which may require more specialized solutions.
Also, despite the boilerplate look and feel, the Oracle API page summarizing the concurrency utilities includes some really solid arguments for using them, not least:
Developers are likely to already
understand the standard library
classes, so there is no need to learn
the API and behavior of ad-hoc
concurrent components. Additionally,
concurrent applications are far
simpler to debug when they are built
on reliable, well-tested components.
Java concurrency in practice is a good book on concurrency. If you haven't already, get yourself a copy. The comprehensive approach to concurrency presented there goes well beyond this question, and will save you a lot of heartache in the long run.
An advantage I see is in managing/scheduling several threads. With ExecutorService, you don't have to write your own thread manager which can be plagued with bugs. This is especially useful if your program needs to run several threads at once. For example you want to execute two threads at a time, you can easily do it like this:
ExecutorService exec = Executors.newFixedThreadPool(2);
exec.execute(new Runnable() {
public void run() {
System.out.println("Hello world");
}
});
exec.shutdown();
The example may be trivial, but try to think that the "hello world" line consists of a heavy operation and you want that operation to run in several threads at a time in order to improve your program's performance. This is just one example, there are still many cases that you want to schedule or run several threads and use ExecutorService as your thread manager.
For running a single thread, I don't see any clear advantage of using ExecutorService.
The following limitations from traditional Thread overcome by Executor framework(built-in Thread Pool framework).
Poor Resource Management i.e. It keep on creating new resource for every request. No limit to creating resource. Using Executor framework we can reuse the existing resources and put limit on creating resources.
Not Robust : If we keep on creating new thread we will get StackOverflowException exception consequently our JVM will crash.
Overhead Creation of time : For each request we need to create new resource. To creating new resource is time consuming. i.e. Thread Creating > task. Using Executor framework we can get built in Thread Pool.
Benefits of Thread Pool
Use of Thread Pool reduces response time by avoiding thread creation during request or task processing.
Use of Thread Pool allows you to change your execution policy as you need. you can go from single thread to multiple thread by just replacing ExecutorService implementation.
Thread Pool in Java application increases stability of system by creating a configured number of threads decided based on system load and available resource.
Thread Pool frees application developer from thread management stuff and allows to focus on business logic.
Source
Below are some benefits:
Executor service manage thread in asynchronous way
Use Future callable to get the return result after thread completion.
Manage allocation of work to free thread and resale completed work from thread for assigning new work automatically
fork - join framework for parallel processing
Better communication between threads
invokeAll and invokeAny give more control to run any or all thread at once
shutdown provide capability for completion of all thread assigned work
Scheduled Executor Services provide methods for producing repeating invocations of runnables and callables
Hope it will help you
Is it really that expensive to create a new thread?
As a benchmark, I just created 60,000 threads with Runnables with empty run() methods. After creating each thread, I called its start(..) method immediately. This took about 30 seconds of intense CPU activity. Similar experiments have been done in response to this question. The summary of those is that if the threads do not finish immediately, and a large number of active threads accumulate (a few thousand), then there will be problems: (1) each thread has a stack, so you will run out of memory, (2) there might be a limit on the number of threads per process imposed by the OS, but not necessarily, it seems.
So, as far as I can see, if we're talking about launching say 10 threads per second, and they all finish faster than new ones start, and we can guarantee that this rate won't be exceeded too much, then the ExecutorService doesn't offer any concrete advantage in visible performance or stability. (Though it may still make it more convenient or readable to express certain concurrency ideas in code.) On the other hand, if you might be scheduling hundreds or thousands of tasks per second, which take time to run, you could run into big problems straight away. This might happen unexpectedly, e.g. if you create threads in response to requests to a server, and there is a spike in the intensity of requests that your server receives. But e.g. one thread in response to every user input event (key press, mouse motion) seems to be perfectly fine, as long as the tasks are brief.
ExecutorService also gives access to FutureTask which will return to the calling class the results of a background task once completed. In the case of implementing Callable
public class TaskOne implements Callable<String> {
#Override
public String call() throws Exception {
String message = "Task One here. . .";
return message;
}
}
public class TaskTwo implements Callable<String> {
#Override
public String call() throws Exception {
String message = "Task Two here . . . ";
return message;
}
}
// from the calling class
ExecutorService service = Executors.newFixedThreadPool(2);
// set of Callable types
Set<Callable<String>>callables = new HashSet<Callable<String>>();
// add tasks to Set
callables.add(new TaskOne());
callables.add(new TaskTwo());
// list of Future<String> types stores the result of invokeAll()
List<Future<String>>futures = service.invokeAll(callables);
// iterate through the list and print results from get();
for(Future<String>future : futures) {
System.out.println(future.get());
}
Prior to java 1.5 version, Thread/Runnable was designed for two separate services
Unit of work
Execution of that unit of work
ExecutorService decouples those two services by designating Runnable/Callable as unit of work and Executor as a mechanism to execute ( with lifecycling) the unit of work
Executor Framework
//Task
Runnable someTask = new Runnable() {
#Override
public void run() {
System.out.println("Hello World!");
}
};
//Thread
Thread thread = new Thread(someTask);
thread.start();
//Executor
Executor executor = new Executor() {
#Override
public void execute(Runnable command) {
Thread thread = new Thread(someTask);
thread.start();
}
};
Executor is just an interface which accept Runnable. execute() method can just call command.run() or working with other classes which use Runnable(e.g. Thread)
interface Executor
execute(Runnable command)
ExecutorService interface which extends Executor and adds methods for managing - shutdown() and submit() which returns Future[About] - get(), cancel()
interface ExecutorService extends Executor
Future<?> submit(Runnable task)
shutdown()
...
ScheduledExecutorService extends ExecutorService for planning executing tasks
interface ScheduledExecutorService extends ExecutorService
schedule()
Executors class which is a Factory to provide ExecutorService realisations for running async tasks[About]
class Executors
newFixedThreadPool() returns ThreadPoolExecutor
newCachedThreadPool() returns ThreadPoolExecutor
newSingleThreadExecutor() returns FinalizableDelegatedExecutorService
newWorkStealingPool() returns ForkJoinPool
newSingleThreadScheduledExecutor() returns DelegatedScheduledExecutorService
newScheduledThreadPool() returns ScheduledThreadPoolExecutor
...
Conclusion
Working with Thread is an expensive operation for CPU and memory.
ThreadPoolExecutor consist of Task Queue(BlockingQueue) and Thread Pool(Set of Worker) which have better performance and API to handle async tasks
Creating a large number of threads with no restriction to the maximum threshold can cause application to run out of heap memory. Because of that creating a ThreadPool is much better solution. Using ThreadPool we can limit the number of threads can be pooled and reused.
Executors framework facilitate process of creating Thread pools in java. Executors class provide simple implementation of ExecutorService using ThreadPoolExecutor.
Source:
What is Executors Framework