I have a rather massive number of threads being created inside a Clojure program:
(import '(java.util.concurrent Executors))
(def *pool* (Executors/newCachedThreadPool))
(defn do-something []
  ;; work
  (Thread/sleep 200)
  ;; repeat
  )
(dotimes [i 10000]
  (.submit *pool* do-something))
It's been a while between JVMs for me, and I am basically wondering whether there is any argument against using sleep or yield inside the function being executed by the Executor. If I understand correctly, in this case every one of my workers has its own thread, and therefore there should be no side effects.
If the Executor is using a FixedThreadPool:
(Executors/newFixedThreadPool 1000)
Things become more complicated because threads will not be returned to the pool until their work is complete, meaning the other queued workers will take longer to complete if the threads are sleeping.
Is my understanding of threading in this instance correct?
(Note: I suspect my design is actually wrong, but just want to make sure I am on the right page)
An executor is conceptually a task queue + a worker pool. Your explanation of what will happen here is basically correct. When you submit a task to the executor, the work is enqueued until a thread can execute the task. When it is executing the task, that task owns the thread and sleeping will block other tasks from being executed on that worker thread.
Depending on what you're doing that may be ok (although it is unusual and probably bad form to sleep inside a task). It's more common to block a thread as a side effect of waiting on IO (blocked on a socket or db call for example).
Generally, if you are doing periodic work, it is better to handle that outside the pool and fire tasks when they should be executed, or better yet, use a ScheduledExecutorService from Executors/newScheduledThreadPool instead.
The other main mechanism in Java for performing time-based tasks is java.util.Timer, which is a bit easier to use but not as robust as the ScheduledExecutorService.
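For illustration, a minimal Java sketch of the ScheduledExecutorService approach (the pool size and the 200 ms delay are arbitrary choices, not from the question):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PeriodicWork {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(4);
        // Repeat the task, waiting 200 ms between the end of one run and
        // the start of the next -- no sleeping inside the task itself.
        scheduler.scheduleWithFixedDelay(
                () -> System.out.println("doing work on " + Thread.currentThread().getName()),
                0, 200, TimeUnit.MILLISECONDS);
    }
}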
Another alternative from Clojure is to explicitly put the worker into a background thread managed by Clojure instead of by you:
(defn do-task []
  (println (java.util.Date.) "doing task"))
(defn worker [f n wait]
  (doseq [task (repeat n f)]
    (task)
    (Thread/sleep wait)))
;; use future to execute worker in a background thread managed by Clojure
(future (worker do-task 10 1000))
;; the call to future returns immediately but in the background console
;; you will see the tasks being run.
An alternative to sleeping your threads is to give each worker a "sleepUntil" long value. When your executor calls a worker that is sleeping, it returns immediately; otherwise, it does its work and then returns. This can help keep your thread count down, because a fixed-size thread pool will be able to handle many more workers than it has threads if most of them are flagged as sleeping and return quickly.
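A minimal sketch of that idea (the Worker class, its field, and the 200 ms rest are hypothetical, not from any library):

// Hypothetical worker that skips its work while "sleeping".
class Worker implements Runnable {
    private volatile long sleepUntil = 0;

    @Override
    public void run() {
        if (System.currentTimeMillis() < sleepUntil) {
            return; // still "sleeping": hand the thread back immediately
        }
        doWork();
        sleepUntil = System.currentTimeMillis() + 200; // rest for 200 ms
    }

    private void doWork() { /* the real work goes here */ }
}

The executor can then resubmit each worker in a loop; workers that are resting cost almost nothing.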
Related
I have been reading a lot on this, but I am not sure what the most elegant way of handling my use case is. I have an application that starts a background scheduled thread using a ScheduledThreadPoolExecutor. This scheduled thread in turn has an ExecutorService with a pool size of 20. Each new thread submitted to this pool will in turn again have an ExecutorService with a pool size of, let's say, 50. The lowest-level thread doesn't do much other than loop through some standard tasks, each task taking anywhere from a second to ten seconds.
As this is a background agent application performing background tasks, we should be able to stop them cleanly at any time. The problem is that I am not sure how to trickle an interruption/shutdown signal down three levels to the lowest thread so I can break out of the loop and shut down all the threads neatly.
I was looking into Runtime.addShutdownHook(), but I wasn't exactly sure how it would be useful in my use case. I was also looking into checking isInterrupted() at the lowest possible thread level, but then I wasn't sure whether Ctrl + C or a kill -9 / kill -15 command is actually transformed into an interrupt signal inside the application. And if so, how would it trickle down three levels of threads, or would I have to manually interrupt each thread inside Runtime.addShutdownHook()?
I am trying to find a solution that is most elegant and safe.
The interrupted flag has nothing to do with native OS-level signals sent to the process hosting the JVM. You can set the interrupted flag on any thread by calling thread.interrupt().
For your problem I would suggest accumulating all your ExecutorServices into a global collection so that you can call shutdownNow() on each of them upon termination. If you use a gentle-enough signal to terminate your process, the shutdown hooks will be executed, and there you can try to shut down your executor services. Note, however, that each task you submit must be interruptible, which means that it must respond to the setting of the interrupted flag by actually finishing its work. This will not happen implicitly.
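A minimal sketch of that suggestion (the registry class and its name are an assumption, not part of any framework):

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;

public final class ExecutorRegistry {
    // Hypothetical global registry of every pool the application creates.
    private static final List<ExecutorService> POOLS = new CopyOnWriteArrayList<>();

    static {
        // Runs on SIGINT/SIGTERM (Ctrl + C, kill -15), but never on kill -9.
        Runtime.getRuntime().addShutdownHook(new Thread(() ->
                POOLS.forEach(ExecutorService::shutdownNow)));
    }

    public static ExecutorService register(ExecutorService pool) {
        POOLS.add(pool);
        return pool;
    }
}

shutdownNow() interrupts the worker threads, which only helps if the tasks themselves check the interrupted flag.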
I must add that I find your solution with numerous executor services quite odd. A single, properly configured thread pool should be all you need in addition to the scheduled executor.
Suppose there are three tasks submitted to an executor service, and I want t2 to start running after t1, and t3 to start running after t2. How do I achieve this kind of scenario in the case of a thread pool?
If these were normal threads created using thread.start(), I could have waited using the join() method. But how do I handle the above scenario?
Tasks t1, t2 and t3 can implement the Callable interface, and from the call method you can return some value.
Based on the return value, after t1 returns you can initiate t2, and similarly for t3.
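A minimal sketch of that approach, using Future.get() to wait for each stage before submitting the next (the task bodies are placeholders):

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Sequential {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        Callable<String> t1 = () -> "result of t1";
        Callable<String> t2 = () -> "result of t2";
        Callable<String> t3 = () -> "result of t3";

        String r1 = pool.submit(t1).get(); // blocks until t1 completes
        String r2 = pool.submit(t2).get(); // t2 starts only after t1 is done
        String r3 = pool.submit(t3).get(); // t3 starts only after t2 is done

        System.out.println(r1 + ", " + r2 + ", " + r3);
        pool.shutdown();
    }
}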
"Callable" is the answer for it
You are confusing the notion of threads with what is executed on a thread. It doesn't matter when a thread "starts" in a thread pool, but when execution of your processing begins or continues. So the better statement is that you have three Callables or Runnables and you need one of them to wait for the other two before continuing. This is done using a CountDownLatch. Create a shared latch with a count of 2. Two of the Callables will call countDown() on the latch; the one that should wait will call await() (possibly with a timeout).
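A minimal sketch of that answer (assuming the third task should run only after the other two have finished):

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class LatchDemo {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        CountDownLatch latch = new CountDownLatch(2); // shared latch with a count of 2

        Runnable producer = () -> {
            // ... do some work ...
            latch.countDown(); // signal completion
        };
        pool.submit(producer);
        pool.submit(producer);

        pool.submit(() -> {
            try {
                latch.await(); // block until both producers have counted down
                // ... continue with the dependent work ...
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        pool.shutdown();
    }
}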
Jobs submitted to an ExecutorService must be mutually independent. If you try to establish dependencies by waiting on Semaphores, CountDownLatches or similar, you run the risk of blocking the whole service when all available worker threads are executing jobs that wait for a job that has been submitted but is still behind the current jobs in the queue. You want to make sure you have more workers than potentially blocking jobs. In most cases, it is better to use more than one ExecutorService and submit each job of a dependent group to a different service.
A few options:
If this is the only scenario you have to deal with (t1->t2->t3), don't use a thread pool. Run the three tasks sequentially.
Use some inter-thread notification mechanism (e.g. BlockingQueue, CountDownLatch). This requires your tasks to hold a shared reference to the synchronization instrument you choose.
Wrap each dependent sequence in a new runnable/callable to be submitted as a single task (see the sketch after this list). This approach is simple, but won't deal correctly with non-linear dependency topologies.
Every task that depends on another task should submit the other task for execution and wait for its completion. This is a generic approach for thread pools with dependencies, but it requires careful tuning to avoid possible deadlocks (running tasks may wait for tasks which don't have an available thread to run on; see my response here for a simple solution).
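For the third option, a minimal sketch (the task bodies are placeholders):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class WrappedSequence {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        Runnable t1 = () -> System.out.println("t1");
        Runnable t2 = () -> System.out.println("t2");
        Runnable t3 = () -> System.out.println("t3");

        // The whole dependency chain becomes one task, so the pool
        // never has to coordinate between the three stages.
        pool.submit(() -> {
            t1.run();
            t2.run();
            t3.run();
        });
        pool.shutdown();
    }
}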
I'm comparing two variations on a test program. Both are operating with a 4-thread ForkJoinPool on a machine with four cores.
In 'mode 1', I use the pool very much like an executor service. I toss a pile of tasks into ExecutorService.invokeAll. I get better performance than from an ordinary fixed-thread executor service (even though there are calls to Lucene, which do some I/O, in there).
There is no divide-and-conquer here. Literally, I do
ExecutorService es = new ForkJoinPool(4);
es.invokeAll(collection_of_Callables);
In 'mode 2', I submit a single task to the pool, and in that task call ForkJoinTask.invokeAll to submit the subtasks. So, I have an object that inherits from RecursiveAction, and it is submitted to the pool. In the compute method of that class, I call invokeAll on a collection of objects from a different class that also inherits from RecursiveAction. For testing purposes, I submit the first-level objects only one at a time. What I naively expected to see was all four threads busy, as the thread calling invokeAll would grab one of the subtasks for itself instead of just sitting and blocking. I can think of some reasons why it might not work that way.
Watching in VisualVM, in mode 2, one thread is pretty nearly always waiting. What I expect to see is the thread calling invokeAll immediately going to work on one of the invoked tasks rather than just sitting still. This is certainly better than the deadlocks that would result from trying this scheme with an ordinary thread pool, but still, what's up? Is it holding one thread back in case something else gets submitted? And if so, why isn't there the same problem in mode 1?
So far I've been running this using the jsr166 jar added to java 1.6's boot class path.
ForkJoinTask.invokeAll forks all of the tasks except the first in the list. The first task it runs itself; then it joins the other tasks. Its thread is not released to the pool in any way, so what you see is its thread blocking until the other tasks are complete.
The classic use of invokeAll for a fork/join pool is to fork one task and compute another (in the executing thread). The thread that does not fork will join after it computes. The work stealing comes in when both tasks are computing: as each task computes, it is expected to fork its own subtasks (until some threshold is met).
I am not sure what invokeAll is being called for in your RecursiveAction.compute(), but if it is the invokeAll that takes two RecursiveActions, it will fork one, compute the other and wait for the forked task to finish.
This is different from a plain executor service, where each task is simply a Runnable on a queue. There is no need for one task of an ExecutorService to know the outcome of another; that is the primary use case of an FJ pool.
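For illustration, a minimal sketch of the classic fork/compute/join pattern described above (the threshold and the summing work are placeholders):

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Divide-and-conquer sum: fork one half, compute the other in this thread.
class SumTask extends RecursiveTask<Long> {
    private final long[] values;
    private final int from, to;

    SumTask(long[] values, int from, int to) {
        this.values = values;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= 1000) { // below the threshold: compute directly
            long sum = 0;
            for (int i = from; i < to; i++) sum += values[i];
            return sum;
        }
        int mid = (from + to) / 2;
        SumTask left = new SumTask(values, from, mid);
        SumTask right = new SumTask(values, mid, to);
        left.fork();                     // hand one half to the pool
        long rightSum = right.compute(); // compute the other half ourselves
        return left.join() + rightSum;   // join only after computing
    }
}

Invoked with something like new ForkJoinPool(4).invoke(new SumTask(data, 0, data.length)), the submitting thread does its share of the work instead of only waiting.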
I am trying to tune a thread which does the following:
A thread pool with just one thread [corePoolSize = 0, maxPoolSize = 1]
The queue used is an ArrayBlockingQueue
Queue size = 20
Background:
The thread tries to read a request and perform an operation on it.
However, eventually the requests have increased so much that the thread is always busy and consumes one full CPU, which makes it a resource hog.
What I want to do instead is sample the requests at intervals and process them. Other requests can be safely ignored.
What I would have to do is put a sleep in the "operation" function so that for each task the thread sleeps for some time and releases the CPU.
Question:
However, I was wondering if there is a way to use a queue which itself sleeps for some time before it reads the next element. This would be ideal, since pausing a task in the middle of execution and leaving the execution incomplete just doesn't sound best to me.
Please let me know if you have any other suggestions for the tasks as well.
Thanks.
Edit:
I have added a follow-up question here
Corrected the max pool size to be 1 [written in haste]. Thanks Tim for pointing it out.
No, you can't make the thread sleep while it's in the pool. If there's a task in the queue, it will be executed.
Pausing within a queued task is the only way to force the thread to be idle in spite of queued tasks. Now, the "sleep" doesn't have to be in the same task as the "work"—you could queue a separate rest task after each real task, which might make for a cleaner implementation. More importantly, if the work is a Callable that returns a result, separating into two tasks will allow you to obtain the result as soon as possible.
As a refinement, rather than sleeping for a fixed interval between every task, you could "throttle" execution to a specified rate. This would allow you to avoid waiting unnecessarily between tasks, yet avoid executing too many tasks within a specified time interval. You can read another answer of mine for a simple way to implement this with a DelayQueue.
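A minimal sketch of the separate rest-task idea (the 200 ms pause is an arbitrary choice):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class RestTasks {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Runnable work = () -> System.out.println("processing request");
        Runnable rest = () -> {
            try {
                TimeUnit.MILLISECONDS.sleep(200); // idle the worker between tasks
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (int i = 0; i < 5; i++) {
            pool.submit(work); // the real task completes first...
            pool.submit(rest); // ...then the pool rests before the next one
        }
        pool.shutdown();
    }
}

Because the work and the rest are separate tasks, a Future for the work resolves as soon as the work itself is done.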
You could subclass ThreadPoolExecutor and override beforeExecute to sleep for some time:

@Override
protected void beforeExecute(Thread t, Runnable r) {
    try {
        Thread.sleep(millis); // will sleep the correct thread, see JavaDoc
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // restore the interrupt status
    }
}
But see AngerClown's comment about artificially slowing down the queue probably not being a good idea.
This might not work for you, but you could try setting the executor's thread priority to low.
Essentially, create the ThreadPoolExecutor with a custom ThreadFactory. Have the ThreadFactory.newThread() method return Threads with a priority of Thread.MIN_PRIORITY. This will cause the executor service you use to only be scheduled if there is an available core to run it.
The implication: on a system that strictly uses time slicing, you will only be given a time slice to execute if there is no other thread in the entire program with a greater priority asking to be scheduled. Depending on how busy your application really is, you might get scheduled every once in a while, or you might not be scheduled at all.
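A minimal sketch of that low-priority factory (the pool size is a placeholder):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

public class LowPriorityPool {
    public static void main(String[] args) {
        ThreadFactory lowPriority = r -> {
            Thread t = new Thread(r);
            t.setPriority(Thread.MIN_PRIORITY); // scheduled only when a core is otherwise free
            return t;
        };
        ExecutorService pool = Executors.newFixedThreadPool(1, lowPriority);
        pool.submit(() -> System.out.println("low-priority work"));
        pool.shutdown();
    }
}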
The reason the thread is consuming 100% CPU is that it is given more work than it can process. Adding a delay between tasks is not going to fix this problem; it will just make things worse.
Instead you should look at WHY your tasks are consuming so much CPU, e.g. with a profiler, and change them so that they consume less CPU, until you find that your thread can keep up and no longer consumes 100% CPU.
This question is a followup on this one.
Essentially what I am doing is declaring a ThreadPoolExecutor with just one thread. I am overriding the beforeExecute() method to put in a sleep so that each of my tasks is executed with some delay after the previous one. This is basically to give the CPU away to other threads, since my thread is kind of thrashing.
So the expected behavior is:
For each new task in the ThreadPoolExecutor, it calls the beforeExecute function before executing the task, and hence it sleeps for, say, 20 s before executing the task.
However this is what I see:
For each new task submitted:
It executes the task
Calls the beforeExecute method
sleeps for say 20s
RE-EXECUTES the task!
The order of 1. & 2. is not the same all the time.
Here are my questions:
It appears that a new thread comes in after/during the sleep and goes ahead and executes my task right away while the actual thread is sleeping.
So does the ThreadPoolExecutor spawn a new thread as soon as an existing thread sleeps [thinking that the thread has terminated]?
I tried to set keepAliveTime > sleep time, so that if the above assertion is true, it at least waits for more than the sleep time before spawning a new thread [hoping that in the meantime the sleeping thread would wake up and the ThreadPoolExecutor would drop the idea of spawning a new thread].
Even if it does spawn a new thread and execute my task right away, why would the task be re-executed after the sleeping thread wakes up? Shouldn't the task be taken out of the task queue before that?
Am I missing something here ? Any other way to debug this scenario ?
=> An alternative method I was considering to accomplish the desired behavior [rather than solve the problem] was to wrap the runnable in another runnable and sleep in the outer runnable before calling the inner one.
I think what you're looking for is a ScheduledExecutorService
From what I understand of your question, scheduleAtFixedRate(...) should do the trick:
scheduleAtFixedRate(Runnable command, long initialDelay, long period, TimeUnit unit)
Creates and executes a periodic action that becomes enabled first after the given initial delay, and subsequently with the given period; that is, executions will commence after initialDelay, then initialDelay + period, then initialDelay + 2 * period, and so on.
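For illustration, a minimal sketch of that suggestion (the 20-second period matches the delay mentioned in the question):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class FixedRateDemo {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // The task runs once every 20 seconds; no beforeExecute() sleep is needed.
        scheduler.scheduleAtFixedRate(
                () -> System.out.println("running task at " + new java.util.Date()),
                0, 20, TimeUnit.SECONDS);
    }
}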
No, that is not how it works. The ThreadPoolExecutor knows it has a worker thread, even if that worker is RUNNABLE, WAITING, BLOCKED, or any other state.
The task is removed from the BlockingQueue long before the beforeExecute method is invoked.
You can look at the code for the API yourself and determine what it is doing. Every Java JDK installation includes a "src.zip" file which contains the entire Java library. If you haven't already, you can attach this source in Eclipse; then, while debugging in Eclipse, diving into a library method will show you the source instead of just the class file.