newCachedThreadPool() versus newFixedThreadPool()
When should I use one or the other? Which strategy is better in terms of resource utilization?
I think the docs explain the difference and usage of these two functions pretty well:
newFixedThreadPool
Creates a thread pool that reuses a fixed number of threads operating off a shared unbounded queue. At any point, at most nThreads threads will be active processing tasks. If additional tasks are submitted when all threads are active, they will wait in the queue until a thread is available. If any thread terminates due to a failure during execution prior to shutdown, a new one will take its place if needed to execute subsequent tasks. The threads in the pool will exist until it is explicitly shutdown.
newCachedThreadPool
Creates a thread pool that creates new threads as needed, but will reuse previously constructed threads when they are available. These pools will typically improve the performance of programs that execute many short-lived asynchronous tasks. Calls to execute will reuse previously constructed threads if available. If no existing thread is available, a new thread will be created and added to the pool. Threads that have not been used for sixty seconds are terminated and removed from the cache. Thus, a pool that remains idle for long enough will not consume any resources. Note that pools with similar properties but different details (for example, timeout parameters) may be created using ThreadPoolExecutor constructors.
In terms of resources, newFixedThreadPool will keep all the threads running until they are explicitly terminated. With newCachedThreadPool, threads that have not been used for sixty seconds are terminated and removed from the cache.
Given this, resource consumption will depend very much on the situation. For instance, if you have a huge number of long-running tasks I would suggest the FixedThreadPool. As for the CachedThreadPool, the docs say that "These pools will typically improve the performance of programs that execute many short-lived asynchronous tasks".
Just to complete the other answers, I would like to quote Effective Java, 2nd Edition, by Joshua Bloch, Chapter 10, Item 68:
"Choosing the executor service for a particular application can be tricky. If you’re writing a small program, or a lightly loaded server, using Executors.newCachedThreadPool is generally a good choice, as it demands no configuration and generally “does the right thing.” But a cached thread pool is not a good choice for a heavily loaded production server!
In a cached thread pool, submitted tasks are not queued but immediately handed off to a thread for execution. If no threads are available, a new one is created. If a server is so heavily loaded that all of its CPUs are fully utilized, and more tasks arrive, more threads will be created, which will only make matters worse.
Therefore, in a heavily loaded production server, you are much better off using Executors.newFixedThreadPool, which gives you a pool with a fixed number of threads, or using the ThreadPoolExecutor class directly, for maximum control."
If you look at the source code (for example on grepcode), you will see that both factory methods call the ThreadPoolExecutor constructor internally and simply set different properties. You can create your own ThreadPoolExecutor to have better control over your requirements.
public static ExecutorService newFixedThreadPool(int nThreads) {
    return new ThreadPoolExecutor(nThreads, nThreads,
                                  0L, TimeUnit.MILLISECONDS,
                                  new LinkedBlockingQueue<Runnable>());
}
public static ExecutorService newCachedThreadPool() {
return new ThreadPoolExecutor(0, Integer.MAX_VALUE,
60L, TimeUnit.SECONDS,
new SynchronousQueue<Runnable>());
}
The ThreadPoolExecutor class is the base implementation for the executors that are returned from many of the Executors factory methods. So let's approach Fixed and Cached thread pools from ThreadPoolExecutor's perspective.
ThreadPoolExecutor
The main constructor of this class looks like this:
public ThreadPoolExecutor(
int corePoolSize,
int maximumPoolSize,
long keepAliveTime,
TimeUnit unit,
BlockingQueue<Runnable> workQueue,
ThreadFactory threadFactory,
RejectedExecutionHandler handler
)
Core Pool Size
The corePoolSize determines the minimum size of the target thread pool. The implementation would maintain a pool of that size even if there are no tasks to execute.
Maximum Pool Size
The maximumPoolSize is the maximum number of threads that can be active at once.
After the thread pool grows beyond the corePoolSize threshold, the executor can terminate idle threads and shrink back to corePoolSize.
If allowCoreThreadTimeOut is true, then the executor can even terminate core pool threads if they were idle more than keepAliveTime threshold.
So the bottom line is that if threads remain idle for more than the keepAliveTime threshold, they may get terminated, since there is no demand for them.
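As a small illustration of that keep-alive behavior (the pool sizes and the 30-second timeout are assumed values for the sketch only):
import java.util.concurrent.*;

public class CoreTimeoutSketch {
    public static void main(String[] args) {
        // Hypothetical pool: 4 core threads, up to 8, idle threads expire after 30 s.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 8, 30L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

        // By default only the non-core threads time out; with this flag even
        // the 4 core threads are reclaimed after 30 idle seconds.
        pool.allowCoreThreadTimeOut(true);

        pool.shutdown();
    }
}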
Queuing
What happens when a new task comes in and all core threads are occupied? The new tasks will be queued inside that BlockingQueue<Runnable> instance. When a thread becomes free, one of those queued tasks can be processed.
There are different implementations of the BlockingQueue interface in Java, so we can implement different queuing approaches like:
Bounded Queue: New tasks would be queued inside a bounded task queue.
Unbounded Queue: New tasks would be queued inside an unbounded task queue. So this queue can grow as much as the heap size allows.
Synchronous Handoff: We can also use the SynchronousQueue to queue the new tasks. In that case, when queuing a new task, another thread must already be waiting for that task.
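Here is a short sketch of the three flavors; the capacity of 100 is arbitrary, and the remainingCapacity() calls are only there to show how differently they behave:
import java.util.concurrent.*;

public class QueueChoices {
    public static void main(String[] args) {
        // Bounded: at most 100 tasks can wait; further offers fail.
        BlockingQueue<Runnable> bounded = new ArrayBlockingQueue<>(100);

        // Unbounded: grows as long as the heap allows.
        BlockingQueue<Runnable> unbounded = new LinkedBlockingQueue<>();

        // Synchronous hand-off: stores nothing; offer() succeeds only if a
        // consumer is already waiting in take().
        BlockingQueue<Runnable> handoff = new SynchronousQueue<>();

        System.out.println(bounded.remainingCapacity());   // 100
        System.out.println(unbounded.remainingCapacity()); // Integer.MAX_VALUE
        System.out.println(handoff.remainingCapacity());   // 0
    }
}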
Work Submission
Here's how the ThreadPoolExecutor executes a new task:
If fewer than corePoolSize threads are running, it tries to start a new thread with the given task as its first job.
Otherwise, it tries to enqueue the new task using the
BlockingQueue#offer method. The offer method won't block if the queue is full and immediately returns false.
If it fails to queue the new task (i.e. offer returns false), then it tries to add a new thread to the thread pool with this task as its first job.
If it fails to add the new thread, then the executor is either shut down or saturated. Either way, the new task would be rejected using the provided RejectedExecutionHandler.
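The following sketch makes that three-step flow observable. With an assumed configuration of 2 core threads, 4 maximum threads, and a queue of 2 slots, tasks 1 and 2 start core threads, 3 and 4 are queued, 5 and 6 spawn extra threads, and task 7 is rejected:
import java.util.concurrent.*;

public class SubmissionFlowDemo {
    public static void main(String[] args) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 4, 30L, TimeUnit.SECONDS, new ArrayBlockingQueue<>(2));

        // Keep every worker busy so that all three steps are exercised.
        Runnable sleepy = () -> {
            try { Thread.sleep(5_000); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        };

        for (int i = 1; i <= 7; i++) {
            try {
                pool.execute(sleepy);
                System.out.printf("task %d accepted: poolSize=%d, queued=%d%n",
                        i, pool.getPoolSize(), pool.getQueue().size());
            } catch (RejectedExecutionException e) {
                // Step 3 failed: 4 threads busy and the queue is full.
                System.out.printf("task %d rejected%n", i);
            }
        }
        pool.shutdown();
    }
}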
The main difference between the fixed and cached thread pools boils down to these three factors:
Core Pool Size
Maximum Pool Size
Queuing
+-----------+-----------+-------------------+---------------------------------+
| Pool Type | Core Size | Maximum Size | Queuing Strategy |
+-----------+-----------+-------------------+---------------------------------+
| Fixed | n (fixed) | n (fixed) | Unbounded `LinkedBlockingQueue` |
+-----------+-----------+-------------------+---------------------------------+
| Cached | 0 | Integer.MAX_VALUE | `SynchronousQueue` |
+-----------+-----------+-------------------+---------------------------------+
Fixed Thread Pool
Here's how Executors.newFixedThreadPool(n) works:
public static ExecutorService newFixedThreadPool(int nThreads) {
return new ThreadPoolExecutor(nThreads, nThreads,
0L, TimeUnit.MILLISECONDS,
new LinkedBlockingQueue<Runnable>());
}
As you can see:
The thread pool size is fixed.
If there is high demand, it won't grow.
If threads are idle for quite some time, it won't shrink.
Suppose all those threads are occupied with long-running tasks and the arrival rate is still pretty high. Since the executor is using an unbounded queue, it may consume a huge part of the heap. If we're unfortunate enough, we may experience an OutOfMemoryError.
When should I use one or the other? Which strategy is better in terms of resource utilization?
A fixed-size thread pool seems to be a good candidate when we're going to limit the number of concurrent tasks for resource management purposes.
For example, if we're going to use an executor to handle web server requests, a fixed executor can handle the request bursts more reasonably.
For even better resource management, it's highly recommended to create a custom ThreadPoolExecutor with a bounded BlockingQueue<Runnable> implementation coupled with a reasonable RejectedExecutionHandler.
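One possible shape for such a custom executor, shown here only as a sketch (the pool size of 8 and the queue capacity of 1,000 are assumptions): a bounded queue plus CallerRunsPolicy, so a saturated pool pushes work back onto the submitting thread instead of piling it up or dropping it.
import java.util.concurrent.*;

public class BackpressurePoolSketch {
    public static void main(String[] args) {
        ExecutorService serverPool = new ThreadPoolExecutor(
                8, 8,                                       // fixed number of threads
                0L, TimeUnit.MILLISECONDS,                  // no idle timeout needed
                new ArrayBlockingQueue<>(1_000),            // bounded backlog
                Executors.defaultThreadFactory(),
                new ThreadPoolExecutor.CallerRunsPolicy()); // run on the caller when saturated

        serverPool.shutdown();
    }
}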
Cached Thread Pool
Here's how Executors.newCachedThreadPool() works:
public static ExecutorService newCachedThreadPool() {
return new ThreadPoolExecutor(0, Integer.MAX_VALUE,
60L, TimeUnit.SECONDS,
new SynchronousQueue<Runnable>());
}
As you can see:
The thread pool can grow from zero threads to Integer.MAX_VALUE. Practically, the thread pool is unbounded.
If any thread is idle for more than 1 minute, it may get terminated. So the pool can shrink if threads remain idle for too long.
If all allocated threads are occupied while a new task comes in, then it creates a new thread, as offering a new task to a SynchronousQueue always fails when there is no one on the other end to accept it!
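That last point is easy to verify directly; this tiny demo just offers a task to a SynchronousQueue with no consumer waiting:
import java.util.concurrent.SynchronousQueue;

public class HandoffDemo {
    public static void main(String[] args) {
        SynchronousQueue<Runnable> queue = new SynchronousQueue<>();

        // offer() only succeeds if another thread is already blocked in take().
        // With nobody waiting it returns false immediately, which is exactly
        // what makes the cached pool spin up a new thread instead.
        boolean accepted = queue.offer(() -> System.out.println("task"));
        System.out.println("accepted without a waiting consumer? " + accepted); // false
    }
}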
When should I use one or the other? Which strategy is better in terms of resource utilization?
Use it when you have a lot of predictable short-running tasks.
If you are not worried about an unbounded queue of Callable/Runnable tasks, you can use one of them. As suggested by bruno, between these two I too prefer newFixedThreadPool to newCachedThreadPool.
But ThreadPoolExecutor provides more flexible features compared to either newFixedThreadPool or newCachedThreadPool:
ThreadPoolExecutor(int corePoolSize, int maximumPoolSize, long keepAliveTime,
TimeUnit unit, BlockingQueue<Runnable> workQueue, ThreadFactory threadFactory,
RejectedExecutionHandler handler)
Advantages:
You have full control of the BlockingQueue size. It's not unbounded, unlike the earlier two options, so you won't get an out-of-memory error due to a huge pile-up of pending Callable/Runnable tasks when there is unexpected turbulence in the system.
You can implement a custom rejection handling policy or use one of the provided policies:
In the default ThreadPoolExecutor.AbortPolicy, the handler throws a runtime RejectedExecutionException upon rejection.
In ThreadPoolExecutor.CallerRunsPolicy, the thread that invokes execute itself runs the task. This provides a simple feedback control mechanism that will slow down the rate that new tasks are submitted.
In ThreadPoolExecutor.DiscardPolicy, a task that cannot be executed is simply dropped.
In ThreadPoolExecutor.DiscardOldestPolicy, if the executor is not shut down, the task at the head of the work queue is dropped, and then execution is retried (which can fail again, causing this to be repeated.)
You can implement a custom ThreadFactory for the use cases below (a sketch follows the list):
To set a more descriptive thread name
To set thread daemon status
To set thread priority
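Here is a minimal sketch of such a factory; the naming scheme, daemon flag, and priority are illustrative choices, not requirements:
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Covers the three use cases above: descriptive names, daemon status, priority.
class NamedThreadFactory implements ThreadFactory {
    private final AtomicInteger counter = new AtomicInteger(1);

    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r, "worker-" + counter.getAndIncrement());
        t.setDaemon(true);                     // do not keep the JVM alive
        t.setPriority(Thread.NORM_PRIORITY);   // explicit priority
        return t;
    }
}

// Usage with the full ThreadPoolExecutor constructor:
// ExecutorService pool = new ThreadPoolExecutor(4, 4, 0L, TimeUnit.MILLISECONDS,
//         new ArrayBlockingQueue<>(100), new NamedThreadFactory(),
//         new ThreadPoolExecutor.AbortPolicy());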
That’s right, Executors.newCachedThreadPool() isn't a great choice for server code that's servicing multiple clients and concurrent requests.
Why? There are basically two (related) problems with it:
It's unbounded, which means that you're opening the door for anyone to cripple your JVM by simply injecting more work into the service (DoS attack). Threads consume a non-negligible amount of memory and also increase memory consumption based on their work-in-progress, so it's quite easy to topple a server this way (unless you have other circuit-breakers in place).
The unbounded problem is exacerbated by the fact that the Executor is fronted by a SynchronousQueue which means there's a direct handoff between the task-giver and the thread pool. Each new task will create a new thread if all existing threads are busy. This is generally a bad strategy for server code. When the CPU gets saturated, existing tasks take longer to finish. Yet more tasks are being submitted and more threads created, so tasks take longer and longer to complete. When the CPU is saturated, more threads is definitely not what the server needs.
Here are my recommendations:
Use a fixed-size thread pool (Executors.newFixedThreadPool) or a ThreadPoolExecutor with a set maximum number of threads;
Use newCachedThreadPool only when you have short-lived asynchronous tasks, as stated in the Javadoc; if you submit tasks that take longer to process, you will end up creating too many threads. You may hit 100% CPU if you submit long-running tasks at a fast rate to newCachedThreadPool (http://rashcoder.com/be-careful-while-using-executors-newcachedthreadpool/).
I did some quick tests and have the following findings:
1) if using SynchronousQueue:
After the threads reach the maximum size, any new work will be rejected with an exception like the one below:
Exception in thread "main" java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@3fee733d rejected from java.util.concurrent.ThreadPoolExecutor@5acf9800[Running, pool size = 3, active threads = 3, queued tasks = 0, completed tasks = 0]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
2) if using LinkedBlockingQueue:
The thread count never increases from the minimum (core) size to the maximum size, meaning the thread pool stays fixed at the minimum size.
Please note that I usually ask a question only after googling the issue more than 20 times. But I still can't understand it, so I need your help.
Basically, I don't understand the exact usage of newFixedThreadPool.
Does newFixedThreadPool(10) mean having ten different threads? Or does it mean it can have 10 of the same thread? Or both?
I executed the submit() method more than 20 times and it's working.
Does submit() print a value? Or does it put threads into the ExecutorService?
Briefly, tasks are small units of code that could be executed in parallel (code sections). The threads (in a thread pool) are what execute them. You can think of the threads like workers and the tasks like jobs. Jobs can be done in parallel, and workers can work in parallel. Workers work on jobs.
So, to answer your questions:
newFixedThreadPool(int nThreads) creates a thread pool of nThreads threads that operate on the same input queue. nThreads is the maximum number of threads that can be running at any given time. Each thread can run a different task. With your example, you can be running up to 10 tasks at the same time. (The documentation can be found here, with credit to @hovercraft-full-of-eels.)
submit() pushes the given task into an event queue that is shared by the threads in the thread pool. Once a thread is available, it will take a task from the front of the queue and execute it. It shouldn't print anything, unless the Runnable you pass it has a print statement in it. However, the print statement may not be printed right when you submit the task! It will print once a thread is executing that particular task. (The documentation can be found here)
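Putting both answers together, here is a small runnable sketch of your exact scenario: a pool of 10 threads receiving 20 submitted tasks. At most 10 run at the same time; the rest wait in the shared queue.
import java.util.concurrent.*;

public class FixedPoolDemo {
    public static void main(String[] args) throws InterruptedException {
        // 10 worker threads sharing one task queue.
        ExecutorService pool = Executors.newFixedThreadPool(10);

        for (int i = 1; i <= 20; i++) {
            final int taskId = i;
            pool.submit(() -> System.out.println(
                    "task " + taskId + " on " + Thread.currentThread().getName()));
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}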
Just refer to the Java docs or the Java API description rather than googling it.
For your questions I have the below comments.
Question 1 ->
ExecutorService executorService = Executors.newFixedThreadPool(10);
First an ExecutorService is created using the Executors newFixedThreadPool() factory method. This creates a thread pool with 10 threads executing tasks.
The Executors.newFixedThreadPool API creates a thread pool that reuses a fixed number of threads, and these threads work on a shared unbounded queue.
At any point, at most nThreads threads will be active processing tasks.
If additional tasks are submitted when all threads are active, they will wait in the queue until a thread is available.
If any thread terminates due to a failure during execution prior to shutdown, a new one will take its place if needed to execute subsequent tasks. The threads in the pool will exist until it is explicitly SHUTDOWN.
After submitting even 20 tasks, it worked with this thread pool.
Internally it calls the below lines of code:
public static ExecutorService newFixedThreadPool(int nThreads) {
    return new ThreadPoolExecutor(nThreads, nThreads,
                                  0L, TimeUnit.MILLISECONDS,
                                  new LinkedBlockingQueue<Runnable>());
}
Question 2 -> submit() submits a Runnable task for execution in the queue and returns a Future object representing the task. We can use the Future's get method to check whether the submitted task completed successfully, because it will return null upon successful completion.
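A minimal sketch of that behavior (the task body is just a placeholder print):
import java.util.concurrent.*;

public class SubmitFutureDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);

        Future<?> future = pool.submit(() -> System.out.println("doing work"));

        // get() blocks until the task finishes; for a Runnable it returns null
        // on success and throws ExecutionException if the task threw.
        Object result = future.get();
        System.out.println("completed, result = " + result);   // null

        pool.shutdown();
    }
}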
How do I implement such a function?
I have a dynamic queue which gets filled at unknown times with runnables, which have to be executed. The ExecutorService should only start a limited number of threads; when the maximum thread count is reached, it should not start more threads until one finishes, and then the next task should be executed.
So far I came across this:
ExecutorService executor = new ThreadPoolExecutor(20, Integer.MAX_VALUE,
60L, TimeUnit.SECONDS,
databaseConnectionQueue);
The ExecutorService is created before the queue is filled, and should stay alive until the queue is deleted, not just until it's empty, because the queue can become empty temporarily. Can anybody help me?
ThreadPoolExecutor will not shut down when it is empty. From the JavaDoc:
A pool that is no longer referenced in a program AND has no remaining threads will be shutdown automatically.
I believe you should use FixedThreadPool (Executors.newFixedThreadPool)
As per javadoc [Executors.newFixedThreadPool]
Creates a thread pool that reuses a fixed number of threads operating off a shared unbounded queue. At any point, at most nThreads threads will be active processing tasks. If additional tasks are submitted when all threads are active, they will wait in the queue until a thread is available. If any thread terminates due to a failure during execution prior to shutdown, a new one will take its place if needed to execute subsequent tasks. The threads in the pool will exist until it is explicitly shutdown.
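A minimal sketch of how that could look for your case; the class and method names (enqueue, dispose) are made up for illustration, and 20 is just your suggested thread count:
import java.util.concurrent.*;

public class DynamicSubmissionSketch {
    // At most 20 tasks run concurrently; everything else waits in the
    // pool's own unbounded queue, and the pool stays alive (even when
    // idle) until shutdown() is called explicitly.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(20);

    // Call this whenever a new runnable shows up, at any time.
    public static void enqueue(Runnable task) {
        POOL.submit(task);
    }

    // Call this only when you are done with the pool for good.
    public static void dispose() {
        POOL.shutdown();
    }
}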
Hope it helps. Thanks.
I've inherited some code that uses Executors.newFixedThreadPool(4); to run the 4 long-lived threads that do all the work of the application.
Is this recommended? I've read the Java Concurrency in Practice book and there does not seem to be much guidance around how to manage long-lived application threads.
What is the recommended way to start and manage several threads that each live for the entire live of the application?
You mentioned that the code is using Executors; it should be returning an ExecutorService:
ExecutorService executor = Executors.newFixedThreadPool(NTHREDS);
ExecutorService is an Executor that provides methods to manage termination and methods that can produce a Future for tracking progress of one or more asynchronous tasks.
As long as the returned ExecutorService performs a graceful shutdown, there should not be an issue.
You can check that your code is doing a shutdown by finding the following in your code:
// This will make the executor accept no new tasks
// and finish all existing tasks in the queue
executor.shutdown();
// Wait until all tasks are finished (awaitTermination requires a timeout
// and a TimeUnit, and may throw InterruptedException)
executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
Cheers !!
I assume that your long-lived threads do some periodic job in a loop. What you can do is the following:
Make sure that each runnable in the pool checks the pool's state before looping.
while( ! pool.isShutdown() ) { ... }
Your runnables must thus have a reference to their parent pool.
Install a JVM shutdown hook with Runtime.addShutdownHook(). The hook calls pool.shutdown() then pool.awaitTermination(). The pool will transition to the SHUTDOWN state and eventually the threads will stop, after which it will transition to the TERMINATED state.
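A minimal sketch of that hook; the 30-second grace period and the pool size of 4 are arbitrary choices:
import java.util.concurrent.*;

public class ShutdownHookSketch {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);

        // The hook runs when the JVM begins an orderly shutdown (e.g. Ctrl+C).
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            pool.shutdown();                 // stop accepting new tasks
            try {
                // give in-flight tasks a chance to finish
                pool.awaitTermination(30, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));

        // ... submit the long-lived/periodic work to the pool here ...
    }
}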
--
That said, I'm a bit suspicious of your 4 threads. Shouldn't there be only 1 long-lived thread, which fetches tasks and submits them to an executor service? Do you really have 4 different long-lived processes? (This consideration is orthogonal to the main question.)
When I create an ExecutorService with the below code in Java, can someone explain how the ExecutorService works?
ExecutorService executor = Executors.newFixedThreadPool(400);
for (int i = 0; i < 500; i++) {
Runnable worker = new MyRunnable(10000000L + i);
executor.execute(worker);
}
I believe that there will be a single queue of work and my for loop will add 500 Runnable tasks to this queue. Now the ExecutorService has been created with a thread pool of 400 threads.
So of those 500 tasks in the queue, the 400 threads in the ExecutorService will execute 400 tasks at a time, and the remaining ones as slots are freed up?
Am I correct in my understanding?
JavaDoc newFixedThreadPool
Creates a thread pool that reuses a fixed number of threads operating off a shared unbounded queue. At any point, at most nThreads threads will be active processing tasks. If additional tasks are submitted when all threads are active, they will wait in the queue until a thread is available. If any thread terminates due to a failure during execution prior to shutdown, a new one will take its place if needed to execute subsequent tasks. The threads in the pool will exist until it is explicitly shutdown.
If there are more tasks than processing threads, the tasks which haven't been picked up by a thread will wait. Once a thread completes one task, it will pick up one more waiting task.
But these thread pools (unlike ForkJoinPool) are not efficient at stealing tasks from other worker threads.
Assume that one thread has a backlog of 10 tasks to be executed and it's running the first one. At the same time, some other thread in the pool is idle. In this scenario, once a task is allocated to a thread, only that thread will execute it, even though other threads are idle.
ForkJoinPool differs from other kinds of ExecutorService mainly by virtue of employing work-stealing: all threads in the pool attempt to find and execute tasks submitted to the pool and/or created by other active tasks (eventually blocking waiting for work if none exist)
One more API was added in Java 8:
public static ExecutorService newWorkStealingPool()
Creates a work-stealing thread pool using all available processors as its target parallelism level.
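A small usage sketch; the 8 identical Callables and the 100 ms sleep are just placeholders to show that it is used like any other ExecutorService:
import java.util.concurrent.*;

public class WorkStealingDemo {
    public static void main(String[] args) throws Exception {
        // Parallelism defaults to the number of available processors.
        ExecutorService pool = Executors.newWorkStealingPool();

        Callable<Integer> task = () -> {
            Thread.sleep(100);
            return 42;
        };

        // invokeAll blocks until every task has completed.
        for (Future<Integer> f : pool.invokeAll(java.util.Collections.nCopies(8, task))) {
            System.out.println(f.get());
        }
        pool.shutdown();
    }
}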
Related SE question: ThreadPoolExecutor vs ForkJoinPool: stealing subtasks