I'm building a multithreaded application using WorkerThreads that process Tasks from BlockingQueues. The worker looks as follows (it is an abstract class; subclasses implement process()).
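import java.util.concurrent.BlockingQueue;

abstract class WorkerThread extends Thread {
    BlockingQueue<Task> q;
    int tasksInSystem; // globally available

    @Override
    public void run() {
        try {
            while (!interrupted()) {
                Task t = q.take(); // blocks; throws InterruptedException when interrupted
                process(t);
                tasksInSystem--;   // note: not atomic if several workers share the counter
            }
        } catch (InterruptedException e) {
            // interrupted while waiting for a task: let the thread exit
        }
    }

    abstract void process(Task t);
}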
The special thing is that I'd like to wait for all tasks to complete.
My first idea was to:
count each added task
decrease the counter when processing completed.
But there are different types of Tasks, different worker implementations, and multiple queues, so I would have to maintain many different counters.
What I'd like to have:
q.waitForEmptyAndCompleted()
That would require the queue to keep track of the Tasks "in flight" and require the workers to signal when they are done (instead of tasksInSystem--;).
The worker cannot increase that counter itself, because it could only count a task after taking it from the queue. Another thread may run right after the take() call, before the worker has had a chance to increase the counter.
Hence, the counter increase and take() must be tied together (atomic), which leads me to a specialized BlockingQueue.
I didn't find a premade solution. So my best guess is to implement my own BlockingQueue. Is there something that I could use instead (to avoid implementing and testing a thread-safe blocking queue on my own)? Or do you have any idea to implement that wait call differently?
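One way to sidestep the atomic take-and-count problem is to count a task when it is added rather than when it is taken, and have workers report completion. Below is a minimal sketch of that idea, assuming an unbounded queue; TrackingQueue, taskDone() and demo() are hypothetical names made up for illustration, not JDK API.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not a JDK class): a queue that also counts unfinished tasks.
final class TrackingQueue<T> {
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
    private final Object lock = new Object();
    private int unfinished; // queued plus in-flight tasks; guarded by lock

    void put(T task) {
        synchronized (lock) { unfinished++; } // count before the task becomes visible
        queue.add(task);                      // unbounded queue, so add() never blocks
    }

    T take() throws InterruptedException {
        return queue.take();                  // no counter update needed here
    }

    void taskDone() {                         // workers call this after processing
        synchronized (lock) {
            if (--unfinished == 0) lock.notifyAll();
        }
    }

    void waitForEmptyAndCompleted() throws InterruptedException {
        synchronized (lock) {
            while (unfinished > 0) lock.wait();
        }
    }

    // Demo: two workers process 100 tasks; returns the number processed.
    static int demo() {
        TrackingQueue<Integer> q = new TrackingQueue<>();
        AtomicInteger processed = new AtomicInteger();
        Runnable worker = () -> {
            try {
                while (true) {
                    q.take();
                    processed.incrementAndGet();
                    q.taskDone();
                }
            } catch (InterruptedException e) {
                // interrupted: exit the worker loop
            }
        };
        Thread[] workers = { new Thread(worker), new Thread(worker) };
        for (Thread w : workers) { w.setDaemon(true); w.start(); }
        for (int i = 0; i < 100; i++) q.put(i);
        try {
            q.waitForEmptyAndCompleted();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        for (Thread w : workers) w.interrupt();
        return processed.get();
    }
}
```

Because the count goes up before the task is visible to any worker, the counter can never read zero while a task is queued or being processed, so take() itself needs no special treatment.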
OK, since a general ExecutorService is not enough, perhaps ForkJoinPool will work. It does not expose its queue explicitly, but it should be very easy to use given what you have described.
The key method is awaitQuiescence(long timeout, TimeUnit unit), which waits until all submitted tasks have finished execution or the timeout elapses.
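A minimal sketch of that approach, assuming Java 8 or later; QuiescenceDemo and runAll are illustrative names, and the pool size and timeout are arbitrary choices.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class QuiescenceDemo {
    // Submits taskCount trivial tasks and waits for the pool to become quiescent.
    // Returns the number of completed tasks, or -1 if the timeout elapsed first.
    static int runAll(int taskCount) {
        ForkJoinPool pool = new ForkJoinPool(4);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> done.incrementAndGet()); // stands in for real work
        }
        // Blocks (and may help run tasks) until no task is queued or running.
        boolean quiescent = pool.awaitQuiescence(1, TimeUnit.MINUTES);
        pool.shutdown();
        return quiescent ? done.get() : -1;
    }
}
```

Tasks may submit further tasks to the same pool; the pool only counts as quiescent once those have finished as well.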
Use case: tasks are generated in one thread, need to be distributed for computation to many threads and finally the generating task shall reap the results and mark the tasks as done.
I found the class ExecutorCompletionService which fits the use case nearly perfectly --- except that I see no good solution for non-idle waiting. Let me explain.
In principle my code would look like
while (true) {
    MyTask t = generateNextTask();
    if (t != null) {
        completionService.submit(t);
    }
    Future<MyTask> finished; // poll() returns a Future, or null if nothing has completed
    while ((finished = completionService.poll()) != null) {
        retireTask(finished.get());
    }
}
Both, generateNextTask() and completionService.poll() may return null if there are currently no new tasks available and if currently no task has returned from the CompletionService respectively.
In these cases, the loop degenerates into an ugly idle-wait. I could poll() with a timeout or add a Thread.sleep() for the double-null case, but I consider this a bad workaround, because it nevertheless wastes CPU and is not as responsive as possible, due to the wait.
Suppose I replace generateNextTask() by a poll() on a BlockingQueue, is there good way to poll the queue as well as the CompletionService in parallel to be woken up for work on whichever end something becomes available?
Actually this reminds me of Selector. Is something like it available for queues?
You should use CompletionService.take() to wait until the next task completes and retrieve its Future. poll() is the non-blocking version, returning null if no task is currently completed.
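For illustration, here is a small sketch of draining results with take(); TakeDemo and sumOfSquares are made-up names, and squaring a number stands in for the real task.

```java
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class TakeDemo {
    // Submits n tasks, then collects the n results in completion order.
    static int sumOfSquares(int n) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CompletionService<Integer> cs = new ExecutorCompletionService<>(pool);
        try {
            for (int i = 1; i <= n; i++) {
                final int x = i;
                cs.submit(() -> x * x);
            }
            int sum = 0;
            for (int i = 0; i < n; i++) {
                sum += cs.take().get(); // blocks until some task has completed
            }
            return sum;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The calling thread sleeps inside take() while nothing has completed, so there is no idle spinning.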
Also, your code seems to be inefficient, because you produce and consume tasks one at a time, instead of allowing multiple tasks to be processed in parallel. Consider having a different thread for task generation and for task results consumption.
-- Edit --
I think that given the constraints you mention in your comments, you can't achieve all your requirements.
Requiring the main thread to be producer and consumer, and disallowing any busy loop or timed loop, you can't avoid the scenario where a blocking wait for a task completion takes too long and no other task gets processed in the meanwhile.
Since you "can replace generateNextTask() by a poll() on a BlockingQueue", I assume incoming tasks can be put in a queue by some other thread, and the problem is, you cannot execute take() on 2 queues simultaneously. The solution is to simply put both incoming and finished tasks in the same queue. To differentiate, wrap them in objects of different types, and then check that type in the loop after take().
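A rough sketch of the single-queue idea, with everything simplified: Event, Incoming and Finished are hypothetical wrapper types, and doubling a number stands in for the real computation.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

final class MergedQueueDemo {
    // Hypothetical wrapper types that tell the two kinds of event apart.
    interface Event {}
    static final class Incoming implements Event {
        final int value;
        Incoming(int value) { this.value = value; }
    }
    static final class Finished implements Event {
        final int result;
        Finished(int result) { this.result = result; }
    }

    // Handles n incoming tasks and their n results from one queue; returns the sum of results.
    static int run(int n) {
        BlockingQueue<Event> events = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // In a real program another thread would add the incoming tasks.
        for (int i = 1; i <= n; i++) events.add(new Incoming(i));
        int retired = 0;
        int total = 0;
        try {
            while (retired < n) {
                Event e = events.take(); // the only blocking wait in the loop
                if (e instanceof Incoming) {
                    int v = ((Incoming) e).value;
                    // The worker posts its result back onto the same queue.
                    pool.execute(() -> events.add(new Finished(v * 2)));
                } else {
                    total += ((Finished) e).result;
                    retired++;
                }
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return total;
    }
}
```

The main thread wakes up for whichever kind of event arrives first, which is the selector-like behaviour the question asks for.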
This solution works, but we can go further. You said you don't want to use 2 threads for handling tasks; then you can use zero dedicated threads. Let the wrappers implement Runnable and, instead of checking the type, just call take().run(). This way your thread becomes a single-threaded Executor. But we already have an Executor (the one behind the CompletionService); can we use it? The problem is that incoming and finished tasks should be handled serially, not in parallel. So we need the SerialExecutor described in the Javadoc of java.util.concurrent.Executor, which accepts Runnables and executes them serially, but on another executor. This way no thread is wasted.
And finally, you mentioned Selector as a possible solution. I must say, it is an outdated approach. Learn dataflow and actor computing. A nice introduction is here. Look at my Dataflow4java project; it has a MultiPortActorTest.java example, where the class Accum does what you need, with all the boilerplate of wrapper Runnables and serial executors hidden in the supporting library.
What you need is a ListenableFuture from Guava. ListenableFutureExplained
I have some number of consumer threads, any of which can also act as producer. How should I know when they all have finished their work?
class Worker extends Thread{
void process(Task t){
...
if(needsMoreWork(t)){
queue.addAll(extractTasks(t));
}
}
public void run(){
while(isRunning){
Task t = queue.take();//I need to finish somehow.
process(t);
}
}
...
}
Rather than using Threads manually, submit your tasks to an ExecutorService, and use a CountDownLatch, CyclicBarrier, or Phaser to synchronize them, depending on whether you need multiple cycles of your job and whether you have the same number of task components in each cycle.
Depending on what specifically your process consists of, a ForkJoinPool might be an option to consider; it basically wraps up the idea of "perform this same operation on a bunch of items and collect the results".
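When consumers can also produce, one option is to count outstanding tasks and release a CountDownLatch when the count drops to zero. This is only a sketch of that idea: PendingCounterDemo is a made-up class, and the "each task spawns two children until depth 0" rule stands in for needsMoreWork()/extractTasks().

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

final class PendingCounterDemo {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final AtomicInteger pending = new AtomicInteger();   // submitted but not finished
    private final AtomicInteger processed = new AtomicInteger();
    private final CountDownLatch allDone = new CountDownLatch(1);

    private void submit(int depth) {
        pending.incrementAndGet();              // count before the task can run
        pool.execute(() -> {
            try {
                processed.incrementAndGet();
                if (depth > 0) {                // hypothetical "needs more work" rule
                    submit(depth - 1);
                    submit(depth - 1);
                }
            } finally {
                // Children were counted above, so zero really means "nothing left".
                if (pending.decrementAndGet() == 0) allDone.countDown();
            }
        });
    }

    // Runs a task tree of the given depth and returns how many tasks were processed.
    static int run(int depth) {
        PendingCounterDemo demo = new PendingCounterDemo();
        demo.submit(depth);
        try {
            demo.allDone.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        demo.pool.shutdown();
        return demo.processed.get();
    }
}
```

The important detail is ordering: a task registers its children before it decrements the counter for itself, so the counter cannot reach zero while work is still being generated.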
I'm looking for a Java Executor that allows me to specify throttling/throughput/pacing limitations, for example, no more than say 100 tasks can be processed in a second -- if more tasks get submitted they should get queued and executed later. The main purpose of this is to avoid running into limits when hitting foreign APIs or servers.
I'm wondering whether either base Java (which I doubt, because I checked) or somewhere else reliable (e.g. Apache Commons) provides this, or if I have to write my own. Preferably something lightweight. I don't mind writing it myself, but if there's a "standard" version out there somewhere I'd at least like to look at it first.
Take a look at guavas RateLimiter:
A rate limiter. Conceptually, a rate limiter distributes permits at a
configurable rate. Each acquire() blocks if necessary until a permit
is available, and then takes it. Once acquired, permits need not be
released. Rate limiters are often used to restrict the rate at which
some physical or logical resource is accessed. This is in contrast to
Semaphore which restricts the number of concurrent accesses instead of
the rate (note though that concurrency and rate are closely related,
e.g. see Little's Law).
It's thread-safe, but still @Beta. Might be worth a try anyway.
You would have to wrap each call to the Executor with respect to the rate limiter. For a more clean solution you could create some kind of wrapper for the ExecutorService.
From the javadoc:
final RateLimiter rateLimiter = RateLimiter.create(2.0); // rate is "2 permits per second"
void submitTasks(List<Runnable> tasks, Executor executor) {
for (Runnable task : tasks) {
rateLimiter.acquire(); // may wait
executor.execute(task);
}
}
The Java Executor doesn't offer such a limitation, only limitation by amount of threads, which is not what you are looking for.
In general the Executor is the wrong place to limit such actions anyway, it should be at the moment where the Thread tries to call the outside server. You can do this for example by having a limiting Semaphore that threads wait on before they submit their requests.
Calling Thread:
public void run() {
// ...
requestLimiter.acquire();
connection.send();
// ...
}
At the same time, you schedule a (single) secondary thread that periodically (for example, every 60 seconds) releases the acquired permits:
public void run() {
// ...
requestLimiter.drainPermits(); // make sure not more than max are released by draining the Semaphore empty
requestLimiter.release(MAX_NUM_REQUESTS);
// ...
}
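Putting the two halves together, a self-contained sketch could look like this. WindowedLimiter is a hypothetical class name, and a ScheduledExecutorService plays the role of the secondary thread; the window length and permit count are arbitrary.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

final class WindowedLimiter {
    private final Semaphore permits;
    private final ScheduledExecutorService refiller =
            Executors.newSingleThreadScheduledExecutor();

    WindowedLimiter(int maxPerWindow, long windowMillis) {
        Semaphore s = new Semaphore(maxPerWindow);
        this.permits = s;
        // At the start of each window, reset the semaphore to exactly maxPerWindow permits.
        refiller.scheduleAtFixedRate(() -> {
            s.drainPermits();
            s.release(maxPerWindow);
        }, windowMillis, windowMillis, TimeUnit.MILLISECONDS);
    }

    void acquire() throws InterruptedException {
        permits.acquire(); // call this right before contacting the outside server
    }

    void shutdown() {
        refiller.shutdownNow();
    }

    // Demo: 5 acquisitions with 3 permits per 200 ms window must wait for a refill.
    static long demoElapsedMillis() {
        long start = System.nanoTime();
        WindowedLimiter limiter = new WindowedLimiter(3, 200);
        try {
            for (int i = 0; i < 5; i++) limiter.acquire();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            limiter.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

Note that this caps requests per window rather than smoothing them, so a burst at the end of one window followed by a burst at the start of the next is still possible.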
no more than say 100 tasks can be processed in a second -- if more
tasks get submitted they should get queued and executed later
You need to look into Executors.newFixedThreadPool(int nThreads). This lets you limit the number of tasks that execute simultaneously. If you submit more tasks than there are threads, the extra tasks are queued and executed later.
ExecutorService threadPool = Executors.newFixedThreadPool(100);
Future<?> result1 = threadPool.submit(runnable1);
Future<?> result2 = threadPool.submit(runnable2);
Future<SomeClass> result3 = threadPool.submit(callable1);
...
The snippet above shows how you would work with an ExecutorService that allows no more than 100 tasks to execute simultaneously.
Update:
After going over the comments, here is what I have come up with (kinda stupid). How about manually keeping track of the tasks that are to be executed? You could store them in an ArrayList first and then submit them to the Executor based on how many tasks have already been executed in the last second.
So, let's say 200 tasks have been added to our ArrayList. We can iterate and submit 100 of them to the Executor. When a second has passed, we can submit a few more based on how many have completed in the Executor, and so on.
Depending on the scenario, and as suggested in one of the previous responses, the basic functionalities of a ThreadPoolExecutor may do the trick.
But if the threadpool is shared by multiple clients and you want to throttle, to restrict the usage of each one of them, making sure that one client won't use all the threads, then a BoundedExecutor will do the work.
More details can be found in the following example:
http://jcip.net/listings/BoundedExecutor.java
Personally I found this scenario quite interesting. In my case, I wanted to stress that the interesting side to throttle is the consuming side, as in classical producer/consumer theory. That's the opposite of some of the answers suggested before. That is, we don't want to block the submitting thread, but block the consuming threads based on a rate (tasks/second) policy. So, even if there are tasks ready in the queue, the executing/consuming threads may block waiting to meet the throttle policy.
That said, I think a good candidate would be Executors.newScheduledThreadPool(int corePoolSize). You would need a simple queue in front of the executor (a LinkedBlockingQueue would suit), and then schedule a periodic task that picks actual tasks from the queue (ScheduledExecutorService.scheduleAtFixedRate). So it is not a straightforward solution, but it should perform well enough if you want to throttle the consumers as discussed above.
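As a rough sketch of that setup: submitters only add to a queue, and a periodic job takes at most one task per tick. PacedConsumer and runAndTimeMillis are illustrative names, and one task per period is an assumed policy.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class PacedConsumer {
    // Queues the given number of tasks, runs at most one per period,
    // and returns how long it took (in ms) until all of them had run.
    static long runAndTimeMillis(int tasks, long periodMillis) {
        BlockingQueue<Runnable> backlog = new LinkedBlockingQueue<>();
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) backlog.add(done::countDown); // submitters never block

        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        long start = System.nanoTime();
        scheduler.scheduleAtFixedRate(() -> {
            Runnable next = backlog.poll();     // null if nothing is waiting
            if (next != null) {
                try {
                    next.run();
                } catch (RuntimeException e) {
                    // swallow: an escaping exception would cancel the periodic schedule
                }
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

Here the tasks run on the scheduler thread itself; handing them to a separate worker pool from inside the periodic job would keep slow tasks from delaying the ticks.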
You can limit it inside the Runnable:
public static Runnable throttle (Runnable realRunner, long delay) {
Runnable throttleRunner = new Runnable() {
// whether is waiting to run
private boolean _isWaiting = false;
// target time to run realRunner
private long _timeToRun;
// specified delay time to wait
private long _delay = delay;
// Runnable that has the real task to run
private Runnable _realRunner = realRunner;
@Override
public void run() {
// current time
long now;
synchronized (this) {
// another thread is waiting, skip
if (_isWaiting) return;
now = System.currentTimeMillis();
// update time to run
// do not update it each time since
// you do not want to postpone it unlimited
_timeToRun = now+_delay;
// set waiting status
_isWaiting = true;
}
try {
Thread.sleep(_timeToRun-now);
} catch (InterruptedException e) {
e.printStackTrace();
} finally {
// clear waiting status before run
_isWaiting = false;
// do the real task
_realRunner.run();
}
}};
return throttleRunner;
}
Taken from JAVA Thread Debounce and Throttle
I try to work with Java's FutureTask, Future, Runnable, Callable and ExecutorService types.
What is the best practice to compose those building blocks?
Suppose I have multiple FutureTasks and I want to execute them in sequence.
Of course I could make another FutureTask which submits each subtask and waits for its result in sequence, but I want to avoid blocking calls.
Another option would be to let those subtasks invoke a callback when they complete, and schedule the next task in the callback. But going that route, how do I create a proper outer FutureTask object which also handles exceptions in the subtasks without producing that much boilerplate?
Do I miss something here?
Very important thing, though usually not described in tutorials:
Runnables to be executed on an ExecutorService should not block. Each blocking call takes a worker thread out of service. If the ExecutorService has a limited number of worker threads, there is a risk of deadlock (thread starvation); if it has an unlimited number of worker threads, there is a risk of running out of memory. Blocking operations in tasks simply destroy all the advantages of an ExecutorService, so use blocking operations on ordinary threads only.
FutureTask.get() is a blocking operation, so it can be used on ordinary threads but not from an ExecutorService task. That is, it cannot serve as a building block, only as a way to deliver the result of execution to the master thread.
The right approach to building an execution out of tasks is to start the next task when all of its input data is ready, so that the task does not have to block waiting for input. So you need a kind of gate which stores intermediate results and starts a new task when all arguments have arrived. Thus tasks do not have to start other tasks explicitly. A gate, which consists of input sockets for arguments and a Runnable to compute on them, can be considered the right building block for computations on ExecutorServices.
This approach is called dataflow or workflow (if gates cannot be created dynamically).
Actor frameworks like Akka use this approach but are limited in the fact that an actor is a gate with single input socket.
I have written a true dataflow library published at https://github.com/rfqu/df4j.
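If Java 8 or later is available, the JDK's CompletableFuture offers a comparable "start the next step when its input is ready" style. Note that this is a different mechanism from the library mentioned above, shown only as a sketch; SequenceDemo and the string-appending steps are made up for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class SequenceDemo {
    // Runs three steps in sequence on a pool; no pool thread blocks waiting for another.
    static String run() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<String> chain = CompletableFuture
                    .supplyAsync(() -> "a", pool)           // first subtask
                    .thenApplyAsync(s -> s + "b", pool)     // starts only when "a" is ready
                    .thenApplyAsync(s -> s + "c", pool)     // starts only when "ab" is ready
                    .exceptionally(t -> "failed: " + t);    // one place for subtask failures
            return chain.join(); // only the calling (non-pool) thread waits here
        } finally {
            pool.shutdown();
        }
    }
}
```

The chain object plays the role of the outer future: it completes when the last step does, and a failure in any step is routed to the single exceptionally handler.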
I tried to do something similar with a ScheduledFuture, trying to cause a delay before things were displayed to the user. This is what I came up with: simply use the same ScheduledFuture for all your 'delays'. The code was:
public static final ScheduledExecutorService scheduler = Executors
.newScheduledThreadPool(1);
public ScheduledFuture delay = null;
delay = scheduler.schedule(new Runnable() {
@Override
public void run() {
//do something
}
}, 1000, TimeUnit.MILLISECONDS);
delay = scheduler.schedule(new Runnable() {
@Override
public void run() {
//do something else
}
}, 2000, TimeUnit.MILLISECONDS);
Hope this helps
Andy
The usual approach is to:
Decide about ExecutorService (which type, how many threads).
Decide about the task queue (for how long it could be non-blocking).
If you have some external code that waits for the task result:
* Submit tasks as Callables (this is non blocking as long as you do not run out of the queue).
* Call get on the Future.
If you want some actions to be taken automatically after the task is finished:
You can submit as Callables or Runnables.
Just add what you need to do at the end as the last code inside the task. Use Activity.runOnUiThread if these final actions need to modify the GUI.
Normally, you should not actively check when you can submit one more task or schedule callback in order just to submit them. The thread queue (blocking, if preferred) will handle this for you.
I have a program that creates hundreds of instances of a class, each of which listens to another thread which simply fires an event on a regular timed schedule (so that they all perform at the same speed). What I'd like is for each of the hundreds of instances to be its own thread, so that when an event is fired, they can all work in parallel. What makes sense to me is to have these classes extend the Thread class and then have this code inside them...
public class IteratorStepListener implements StepEventListener {
public void actionPerformed(ActionEvent e) {
start();
}
}
public void run() {
doStuff();
}
This doesn't seem to work though. Clearly I'm not understanding something basic here. What's the proper way to do this?
Okay, first thing: overcome the notion that your hundreds of threads will run in parallel. At the very best, they will run concurrently, ie, time-sliced. As you get into the hundreds of threads, you will see the bearings on the scheduling algorithm start to glow; in the thousands they'll smoke and eventually seize up, and you'll get no more threads.
Now, that said, we don't have near enough code to understand what you're really doing, but one thing that I note is you don't seem to be making new Threads. Remember that a thread is an object; the canonical way to start a thread is
Thread t = new Thread(r); // r is a Runnable
t.start();                // start() runs r on the new thread; run() would execute it on the caller's thread
What it looks like is that you're trying to start the same thread over and over again; this way lies madness. Have a look at Wiki on Event Driven Programming. If you really want to have a separate thread for handling each event, you'll want a scheme something like this (pseudocode):
processEvents: function
eventQueue: queue of Events
event: implements Runnable
-- something produces events and puts them on the queue
loop -- forever
do
Event ev := eventQueue.front
new Thread(ev).start();
od
end -- processEvents
It sounds like the event is going to be fired more than once... but you can't start the same thread more than once.
It sounds like your listener should implement the interface but start a thread directly in actionPerformed (or better, use an Executor so that it could use a thread pool). So instead of your current implementation, you could use:
// Assuming the listener implements runnable; you may want to
// delegate that to a separate class for separation of concerns.
public void actionPerformed(ActionEvent e) {
new Thread(this).start();
}
or
public void actionPerformed(ActionEvent e) {
executor.execute(this);
}
What I'd like is for each of the hundreds of instances to be its own thread, so that when an event is fired, they can all work in parallel.
I don't think this is a good approach.
Unless you have hundreds of processors, the threads cannot possibly all work in parallel. You'll end up with the threads running them one at a time (one per processor), or time-slicing between processors.
Each thread actually ties down a significant slice of the JVM's resources, even when inactive. IIRC, the default stack size is about 1 Mbyte.
The example code in your question shows the event calling start() on the thread. Unfortunately, you can only call start() on a thread once. Once the thread has terminated it cannot be restarted.
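The restart restriction is easy to see in isolation. A tiny sketch (RestartDemo is just an illustrative name):

```java
final class RestartDemo {
    // Returns true if starting an already-finished thread a second time is rejected.
    static boolean secondStartFails() {
        Thread t = new Thread(() -> { /* nothing to do */ });
        t.start();
        try {
            t.join(); // wait until the thread has terminated
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        try {
            t.start(); // second start on the same Thread object
            return false;
        } catch (IllegalThreadStateException expected) {
            return true;
        }
    }
}
```

This is why a listener that calls start() on itself works for the first event only.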
A better approach would be to create an executor with a bounded thread pool, and have each event cause a new task to be submitted to the executor. Something like this:
ThreadPoolExecutor executor = new ThreadPoolExecutor(corePoolSize, maxPoolSize,
keepAliveTime, timeUnit, workQueue);
...
public class IteratorStepListener implements StepEventListener, Runnable {
public void actionPerformed(ActionEvent e) {
executor.submit(this);
}
public void run() {
doStuff();
}
}
You can't use threads like that in Java. This is because Java threads map directly to underlying OS threads (at least on the JVM implementations that I'm aware of), and OS threads don't scale like that. A rule of thumb is to keep the total number of threads in an app to around a hundred. A few hundred is probably OK. A few thousand usually gets problematic, depending on the hardware you are using.
Using threads the way you described is a valid implementation strategy in languages like Erlang, for example. Meanwhile, if you are stuck with Java this time, creating a shared thread pool and submitting your tasks to it, instead of allowing all tasks to run concurrently, might be a good alternative. In this case, you can choose a suitable number of threads (the best number depends on the nature of the task; if you have no idea, the number of available CPU cores times 2 is a good start), and have that number of tasks run concurrently.
If you absolutely need all tasks to proceed concurrently, it could get a little complicated, but that's doable as well.
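As a sketch of the shared-pool alternative, the following runs one "step" for many instances on a pool sized at twice the core count. SharedPoolDemo and stepAll are hypothetical names, and incrementing a counter stands in for doStuff().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

final class SharedPoolDemo {
    // Runs one step for each instance on a shared pool; returns how many steps ran.
    static int stepAll(int instances) {
        int threads = Runtime.getRuntime().availableProcessors() * 2; // starting point only
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger stepped = new AtomicInteger();
        List<Callable<Void>> work = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            work.add(() -> {
                stepped.incrementAndGet(); // stands in for the instance's doStuff()
                return null;
            });
        }
        try {
            pool.invokeAll(work); // returns once every task has completed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return stepped.get();
    }
}
```

In a long-running program the pool would be created once and reused for every event rather than created per step as in this demo.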