I have some number of consumer threads, any of which can also act as producer. How should I know when they all have finished their work?
class Worker extends Thread{
void process(Task t){
...
if(needsMoreWork(t)){
queue.addAll(extractTasks(t));
}
}
public void run(){
while(isRunning){
Task t = queue.take();//I need to finish somehow.
process(t);
}
}
...
}
Rather than using Threads manually, submit your tasks to an ExecutorService, and use a CountDownLatch, CyclicBarrier, or Phaser to synchronize them, depending on whether you need multiple cycles of your job and whether you have the same number of task components in each cycle.
Depending on what specifically your process consists of, a ForkJoinPool might be an option to consider; it basically wraps up the idea of "perform this same operation on a bunch of items and collect the results".
Related
im an building a multithreaded application, using WorkerThreads which process Tasks from BlockingQueues. The worker looks as follws (as an abstract class. subclasses implement processItem()).
abstract class WorkerThread extends Thread {
BlockingQueue<Task> q;
int tasksInSystem; // globally available
public void run() {
while(!interrupted()) {
Task t = q.take();
process(t);
tasksInSystem--;
}
}
abstract void process(Task t);
}
The special thing is that i'd like to wait for all tasks to complete.
My first idea was to:
count each added task
decrease the counter when processing completed.
But:
But there are different types of Tasks and different worker implementations and multiple queues. So I would have to maintain tons of different counters.
What I'd like to have:
q.waitForEmptyAndCompleted()
That would require the queue to keep track of the Tasks "in flight" and require the Worker Processes to signal when they are done (instead of tasksInsystem---;).
The worker is not able to increase that counter, because he would have to count the tasks after he took them from the queue. But another thread may become running right after the take() call, such that the worker was not able to increase the counter beforehand.
Hence, the counter increase and take() must be tied together (atomar). Which leads me to a specialized BlockingQueue.
I didn't find a premade solution. So my best guess is to implement my own BlockingQueue. Is there something that I could use instead (to avoid implementing and testing a thread-safe blocking queue on my own)? Or do you have any idea to implement that wait call differently?
OK, since general ExecutorService is not enough perhaps ForkJoinPool will work, it does not expose queue explicitly, but should be very easy to use given what you have described.
Key method is awaitQuiescence(long timeout, TimeUnit unit) which will wait until all submitted tasks have finished execution.
I'm looking for a Java Executor that allows me to specify throttling/throughput/pacing limitations, for example, no more than say 100 tasks can be processed in a second -- if more tasks get submitted they should get queued and executed later. The main purpose of this is to avoid running into limits when hitting foreign APIs or servers.
I'm wondering whether either base Java (which I doubt, because I checked) or somewhere else reliable (e.g. Apache Commons) provides this, or if I have to write my own. Preferably something lightweight. I don't mind writing it myself, but if there's a "standard" version out there somewhere I'd at least like to look at it first.
Take a look at guavas RateLimiter:
A rate limiter. Conceptually, a rate limiter distributes permits at a
configurable rate. Each acquire() blocks if necessary until a permit
is available, and then takes it. Once acquired, permits need not be
released. Rate limiters are often used to restrict the rate at which
some physical or logical resource is accessed. This is in contrast to
Semaphore which restricts the number of concurrent accesses instead of
the rate (note though that concurrency and rate are closely related,
e.g. see Little's Law).
Its threadsafe, but still #Beta. Might be worth a try anyway.
You would have to wrap each call to the Executor with respect to the rate limiter. For a more clean solution you could create some kind of wrapper for the ExecutorService.
From the javadoc:
final RateLimiter rateLimiter = RateLimiter.create(2.0); // rate is "2 permits per second"
void submitTasks(List<Runnable> tasks, Executor executor) {
for (Runnable task : tasks) {
rateLimiter.acquire(); // may wait
executor.execute(task);
}
}
The Java Executor doesn't offer such a limitation, only limitation by amount of threads, which is not what you are looking for.
In general the Executor is the wrong place to limit such actions anyway, it should be at the moment where the Thread tries to call the outside server. You can do this for example by having a limiting Semaphore that threads wait on before they submit their requests.
Calling Thread:
public void run() {
// ...
requestLimiter.acquire();
connection.send();
// ...
}
While at the same time you schedule a (single) secondary thread to periodically (like every 60 seconds) releases acquired resources:
public void run() {
// ...
requestLimiter.drainPermits(); // make sure not more than max are released by draining the Semaphore empty
requestLimiter.release(MAX_NUM_REQUESTS);
// ...
}
no more than say 100 tasks can be processed in a second -- if more
tasks get submitted they should get queued and executed later
You need to look into Executors.newFixedThreadPool(int limit). This will allow you to limit the number of threads that can be executed simultaneously. If you submit more than one thread, they will be queued and executed later.
ExecutorService threadPool = Executors.newFixedThreadPool(100);
Future<?> result1 = threadPool.submit(runnable1);
Future<?> result2 = threadPool.submit(runnable2);
Futurte<SomeClass> result3 = threadPool.submit(callable1);
...
Snippet above shows how you would work with an ExecutorService that allows no more than 100 threads to be executed simultaneously.
Update:
After going over the comments, here is what I have come up with (kinda stupid). How about manually keeping a track of threads that are to be executed ? How about storing them first in an ArrayList and then submitting them to the Executor based on how many threads have already been executed in the last one second.
So, lets say 200 tasks have been submitted into our maintained ArrayList, We can iterate and add 100 to the Executor. When a second passes, we can add few more threads based on how many have completed in theExecutor and so on
Depending on the scenario, and as suggested in one of the previous responses, the basic functionalities of a ThreadPoolExecutor may do the trick.
But if the threadpool is shared by multiple clients and you want to throttle, to restrict the usage of each one of them, making sure that one client won't use all the threads, then a BoundedExecutor will do the work.
More details can be found in the following example:
http://jcip.net/listings/BoundedExecutor.java
Personally I found this scenario quite interesting. In my case, I wanted to stress that the interesting phase to throttle is the consuming side one, as in classical Producer/Consumer concurrent theory. That's the opposite of some of the suggested answers before. This is, we don't want to block the submitting thread, but block the consuming threads based in a rate (tasks/second) policy. So, even if there are tasks ready in the queue, executing/consuming Threads may block waiting to meet the throtle policy.
That said, I think a good candidate would be the Executors.newScheduledThreadPool(int corePoolSize). This way you would need a simple queue in front of the executor (a simple LinkedBlockingQueue would suit), and then schedule a periodic task to pick actual tasks from the queue (ScheduledExecutorService.scheduleAtFixedRate). So, is not an straightforward solution, but it should perform goog enough if you try to throttle the consumers as discussed before.
Can limit it inside Runnable:
public static Runnable throttle (Runnable realRunner, long delay) {
Runnable throttleRunner = new Runnable() {
// whether is waiting to run
private boolean _isWaiting = false;
// target time to run realRunner
private long _timeToRun;
// specified delay time to wait
private long _delay = delay;
// Runnable that has the real task to run
private Runnable _realRunner = realRunner;
#Override
public void run() {
// current time
long now;
synchronized (this) {
// another thread is waiting, skip
if (_isWaiting) return;
now = System.currentTimeMillis();
// update time to run
// do not update it each time since
// you do not want to postpone it unlimited
_timeToRun = now+_delay;
// set waiting status
_isWaiting = true;
}
try {
Thread.sleep(_timeToRun-now);
} catch (InterruptedException e) {
e.printStackTrace();
} finally {
// clear waiting status before run
_isWaiting = false;
// do the real task
_realRunner.run();
}
}};
return throttleRunner;
}
Take from JAVA Thread Debounce and Throttle
I try to work with Java's FutureTask, Future, Runnable, Callable and ExecutorService types.
What is the best practice to compose those building blocks?
Given that I have multiple FutureTasks and and I want to execute them in sequence.
Ofcourse I could make another FutureTask which is submitting / waiting for result for each subtask in sequence, but I want to avoid blocking calls.
Another option would be to let those subtasks invoke a callback when they complete, and schedule the next task in the callback. But going that route, how to I create a proper outer FutureTask object which also handles exceptions in the subtask without producing that much of a boilerplate?
Do I miss something here?
Very important thing, though usually not described in tutorials:
Runnables to be executed on an ExecutorService should not block. This is because each blocking switches off a working thread, and if ExecutorService has limited number of working threads, there is a risk to fall into deadlock (thread starvation), and if ExecutorService has unlimited number of working threads, then there is a risk to run out of memory. Blocking operations in the tasks simply destroy all advantages of ExecutorService, so use blocking operations on usual threads only.
FutureTask.get() is blocking operation, so can be used on ordinary threads and not from an ExecutorService task. That is, it cannot serve as a building block, but only to deliver result of execution to the master thread.
Right approach to build execution from tasks is to start next task when all input data for the next task is ready, so that the task do not have to block waiting for input data. So you need a kind of a gate which stores intermediate results and starts new task when all arguments have arrived. Thus tasks do not bother explicitly to start other tasks. So a gate, which consists of input sockets for arguments and a Runnable to compute them, can be considered as a right building block for computations on ExcutorServices.
This approach is called dataflow or workflow (if gates cannot be created dynamically).
Actor frameworks like Akka use this approach but are limited in the fact that an actor is a gate with single input socket.
I have written a true dataflow library published at https://github.com/rfqu/df4j.
I tried to do something similar with a ScheduledFuture, trying to cause a delay before things were displayed to the user. This is what I come up with, simply use the same ScheduledFuture for all your 'delays'. The code was:
public static final ScheduledExecutorService scheduler = Executors
.newScheduledThreadPool(1);
public ScheduledFuture delay = null;
delay = scheduler.schedule(new Runnable() {
#Override
public void run() {
//do something
}
}, 1000, TimeUnit.MILLISECONDS);
delay = scheduler.schedule(new Runnable() {
#Override
public void run() {
//do something else
}
}, 2000, TimeUnit.MILLISECONDS);
Hope this helps
Andy
The usual approach is to:
Decide about ExecutorService (which type, how many threads).
Decide about the task queue (for how long it could be non-blocking).
If you have some external code that waits for the task result:
* Submit tasks as Callables (this is non blocking as long as you do not run out of the queue).
* Call get on the Future.
If you want some actions to be taken automatically after the task is finished:
You can submit as Callables or Runnables.
Just add that you need to do at the end as the last code inside the task. Use
Activity.runOnUIThread these final actions need to modify GUI.
Normally, you should not actively check when you can submit one more task or schedule callback in order just to submit them. The thread queue (blocking, if preferred) will handle this for you.
I have a program that creates hundreds of instances of a class, each of which listens to another thread which simply fires an event on a regular timed schedule (so that they all perform at the same speed). What I'd like is for each of the hundreds of instances to be its own thread, so that when an event is fired, they can all work in parallel. What makes sense to me is to have these classes extend the Thread class and then have this code inside them...
public class IteratorStepListener implements StepEventListener {
public void actionPerformed(ActionEvent e) {
start();
}
}
public void run() {
doStuff();
}
This doesn't seem to work though. Clearly I'm not understanding something basic here. What's the proper way to do this?
Okay, first thing: overcome the notion that your hundreds of threads will run in parallel. At the very best, they will run concurrently, ie, time-sliced. As you get into the hundreds of threads, you will see the bearings on the scheduling algorithm start to glow; in the thousands they'll smoke and eventually seize up, and you'll get no more threads.
Now, that said, we don't have near enough code to understand what you're really doing, but one thing that I note is you don't seem to be making new Threads. Remember that a thread is an object; the canonical way to start a thread is
Thread t = new Thread(Runnable r);
t.run();
What it looks like is that you're trying to run() the same thread over and over again; this way lies madness. Have a look at Wiki on Event Driven Programming. If you really want to have a separate thread for handling each event, you'll want a scheme something like this (pseudocode):
processEvents: function
eventQueue: queue of Events
event: implements Runnable
-- something produces events and puts them on the queue
loop -- forever
do
Event ev := eventQueue.front
new Thread(ev).run();
od
end -- processEvents
It sounds like the event is going to be fired more than once... but you can't start the same thread more than once.
It sounds like your listener should implement the interface but start a thread directly in actionPerformed (or better, use an Executor so that it could use a thread pool). So instead of your current implementation, you could use:
// Assuming the listener implements runnable; you may want to
// delegate that to a separate class for separation of concerns.
public void actionPerformed(ActionEvent e) {
new Thread(this).start();
}
or
public void actionPerformed(ActionEvent e) {
executor.execute(this);
}
What I'd like is for each of the hundreds of instances to be its own thread, so that when an event is fired, they can all work in parallel.
I don't think this is a good approach.
Unless you have hundreds of processors, the threads cannot possibly all work in parallel. You'll end up with the threads running them one at a time (one per processor), or time-slicing between processors.
Each thread actually ties down a significant slice of the JVM's resources, even when inactive. IIRC, the default stack size is about 1 Mbyte.
The example code in your question shows the event calling start() on the thread. Unfortunately, you can only call start() on a thread once. Once the thread has terminated it cannot be restarted.
A better approach would be to create an executor with a bounded thread pool, and have each event cause a new task to be submitted to the executor. Something like this:
ThreadPoolExecutor executor = new ThreadPoolExecutor(corePoolSize, maxPoolSize,
keepAliveTime, timeUnit, workQueue);
...
public class IteratorStepListener implements StepEventListener, Runnable {
public void actionPerformed(ActionEvent e) {
executor.submit(this);
}
public void run() {
doStuff();
}
}
You can't use threads like that in Java. This is because Java threads directly map to underlying OS threads (at least on JVM implementations that I'm aware of), and OS threads can't scale like that. A rule of thumb is, you want to keep total number of threads within hundred or something in an app. A few hundred is probably ok. A few thousand gets usually problematic, depending on the HW you are using.
The use of threads like you described is a valid implementation strategy in languages like Erlang for example. Meanwhile, if you are stuck with Java this time, creating a shared thread pool and submitting your tasks to this instead of allowing all tasks to run concurrently might be a good alternative. In this case, you can choose a suitable number of threads (best number depends on the nature of the task. If you have no idea, number of CPU core available times 2 is a good start), and have that number of tasks run concurrently.
If you absolutely need all tasks to proceed concurrently, it could get a little complicated, but that's doable as well.
What do you think is the best way for obtaining the results of the work of a thread? Imagine a Thread which does some calculations, how do you warn the main program the calculations are done?
You could poll every X milliseconds for some public variable called "job finished" or something by the way, but then you'll receive the results later than when they would be available... the main code would be losing time waiting for them. On the other hand, if you use a lower X, the CPU would be wasted polling so many times.
So, what do you do to be aware that the Thread, or some Threads, have finished their work?
Sorry if it looks similar to this other question, that's probably the reason for the eben answer, I suppose. What I meant was running lots of threads and know when all of them have finished, without polling them.
I was thinking more in the line of sharing the CPU load between multiple CPU's using batches of Threads, and know when a batch has finished. I suppose it can be done with Futures objects, but that blocking get method looks a lot like a hidden lock, not something I like.
Thanks everybody for your support. Although I also liked the answer by erickson, I think saua's the most complete, and the one I'll use in my own code.
Don't use low-level constructs such as threads, unless you absolutely need the power and flexibility.
You can use a ExecutorService such as the ThreadPoolExecutor to submit() Callables. This will return a Future object.
Using that Future object you can easily check if it's done and get the result (including a blocking get() if it's not yet done).
Those constructs will greatly simplify the most common threaded operations.
I'd like to clarify about the blocking get():
The idea is that you want to run some tasks (the Callables) that do some work (calculation, resource access, ...) where you don't need the result right now. You can just depend on the Executor to run your code whenever it wants (if it's a ThreadPoolExecutor then it will run whenever a free Thread is available). Then at some point in time you probably need the result of the calculation to continue. At this point you're supposed to call get(). If the task already ran at that point, then get() will just return the value immediately. If the task didn't complete, then the get() call will wait until the task is completed. This is usually desired since you can't continue without the tasks result anyway.
When you don't need the value to continue, but would like to know about it if it's already available (possibly to show something in the UI), then you can easily call isDone() and only call get() if that returns true).
You could create a lister interface that the main program implements wich is called by the worker once it has finished executing it's work.
That way you do not need to poll at all.
Here is an example interface:
/**
* Listener interface to implement to be called when work has
* finished.
*/
public interface WorkerListener {
public void workDone(WorkerThread thread);
}
Here is an example of the actual thread which does some work and notifies it's listeners:
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
/**
* Thread to perform work
*/
public class WorkerThread implements Runnable {
private List listeners = new ArrayList();
private List results;
public void run() {
// Do some long running work here
try {
// Sleep to simulate long running task
Thread.sleep(5000);
} catch (InterruptedException e) {
e.printStackTrace();
}
results = new ArrayList();
results.add("Result 1");
// Work done, notify listeners
notifyListeners();
}
private void notifyListeners() {
for (Iterator iter = listeners.iterator(); iter.hasNext();) {
WorkerListener listener = (WorkerListener) iter.next();
listener.workDone(this);
}
}
public void registerWorkerListener(WorkerListener listener) {
listeners.add(listener);
}
public List getResults() {
return results;
}
}
And finally, the main program which starts up a worker thread and registers a listener to be notified once the work is done:
import java.util.Iterator;
import java.util.List;
/**
* Class to simulate a main program
*/
public class MainProg {
public MainProg() {
WorkerThread worker = new WorkerThread();
// Register anonymous listener class
worker.registerWorkerListener(new WorkerListener() {
public void workDone(WorkerThread thread) {
System.out.println("Work done");
List results = thread.getResults();
for (Iterator iter = results.iterator(); iter.hasNext();) {
String result = (String) iter.next();
System.out.println(result);
}
}
});
// Start the worker thread
Thread thread = new Thread(worker);
thread.start();
System.out.println("Main program started");
}
public static void main(String[] args) {
MainProg prog = new MainProg();
}
}
Polling a.k.a busy waiting is not a good idea. As you mentioned, busy waiting wastes CPU cycles and can cause your application to appear unresponsive.
My Java is rough, but you want something like the following:
If one thread has to wait for the output of another thread you should make use of a condition variable.
final Lock lock = new ReentrantLock();
final Condition cv = lock.newCondition();
The thread interested in the output of the other threat should call cv.wait(). This will cause the current thread to block. When the worker thread is finished working, it should call cv.signal(). This will cause the blocked thread to become unblocked, allowing it to inspect the output of the worker thread.
As an alternative to the concurrency API as described by Saua (and if the main thread doesn't need to know when a worker thread finishes) you could use the publish/subscribe pattern.
In this scenario the child Thread/Runnable is given a listener that knows how to process the result and which is called back to when child Thread/Runnable completes.
Your scenario is still a little unclear.
If you are running a batch job, you may want to use invokeAll. This will block your main thread until all the tasks are complete. There is no "busy waiting" with this approach, where the main thread would waste CPU polling the isDone method of a Future. While this method returns a list of Futures, they are already "done". (There's also an overloaded version that can timeout before completion, which might be safer to use with some tasks.) This can be a lot cleaner than trying to gather up a bunch of Future objects yourself and trying to check their status or block on their get methods individually.
If this is an interactive application, with tasks sporadically spun off to be executed in the background, using a callback as suggested by nick.holt is a great approach. Here, you use the submit a Runnable. The run method invokes the callback with the result when it's been computed. With this approach, you may discard the Future returned by submit, unless you want to be able to cancel running tasks without shutting down the whole ExecutorService.
If you want to be able to cancel tasks or use the timeout capabilities, an important thing to remember is that tasks are canceled by calling interrupt on their thread. So, your task needs to check its interrupted status periodically and abort as needed.
Subclass Thread, and give your class a method that returns the result. When the method is called, if the result hasn't been created, yet, then join() with the Thread. When join() returns, your Thread's work will be done and the result should be available; return it.
Use this only if you actually need to fire off an asynchronous activity, do some work while you're waiting, and then obtain the result. Otherwise, what's the point of a Thread? You might as well just write a class that does the work and returns the result in the main thread.
Another approach would be a callback: have your constructor take an argument that implements an interface with a callback method that will be called when the result is computed. This will make the work completely asynchronous. But if you at all need to wait for the result at some point, I think you're still going to need to call join() from the main thread.
As noted by saua: use the constructs offered by java.util.concurrent. If you're stuck with a pre 1.5 (or 5.0) JRE, you ,might resort to kind of rolling your own, but you're still better of by using a backport: http://backport-jsr166.sourceforge.net/