I'm wrestling with the best way to implement my processing pipeline.
My producers feed work to a BlockingQueue. On the consumer side, I poll the queue, wrap what I get in a Runnable task, and submit it to an ExecutorService.
while (!isStopping())
{
    String work = workQueue.poll(1000L, TimeUnit.MILLISECONDS);
    if (work == null)
    {
        break;
    }
    executorService.execute(new Worker(work)); // needs to block if no threads!
}
This is not ideal; the ExecutorService has its own queue, of course, so what's really happening is that I'm always fully draining my work queue and filling the task queue, which slowly empties as the tasks complete.
I realize that I could queue tasks at the producer end, but I'd really rather not do that - I like the indirection/isolation of my work queue being dumb strings; it really isn't any business of the producer what's going to happen to them. Forcing the producer to queue a Runnable or Callable breaks an abstraction, IMHO.
But I do want the shared work queue to represent the current processing state. I want to be able to block the producers if the consumers aren't keeping up.
I'd love to use Executors, but I feel like I'm fighting their design. Can I partially drink the Kool-Aid, or do I have to gulp it? Am I being wrong-headed in resisting queueing tasks? (I suspect I could set up a ThreadPoolExecutor to use a 1-task queue and override its execute method to block rather than reject when the queue is full, but that feels gross.)
Suggestions?
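One way to get the "block rather than reject" behaviour from the last paragraph without subclassing ThreadPoolExecutor is to throttle submissions with a Semaphore, the bounded-executor idea described in Java Concurrency in Practice. This is a minimal sketch under that assumption; BlockingExecutor, demo() and the pool and bound sizes are hypothetical, and the counter increment stands in for new Worker(work).

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: execute() blocks the caller once `bound` tasks are running or queued,
// instead of letting the executor's own queue grow without limit.
class BlockingExecutor {
    private final ExecutorService delegate;
    private final Semaphore slots;

    BlockingExecutor(int threads, int bound) {
        this.delegate = Executors.newFixedThreadPool(threads);
        this.slots = new Semaphore(bound);
    }

    void execute(Runnable task) throws InterruptedException {
        slots.acquire(); // waits here while `bound` tasks are in flight
        try {
            delegate.execute(() -> {
                try {
                    task.run();
                } finally {
                    slots.release(); // free the slot when the task finishes
                }
            });
        } catch (RuntimeException e) {
            slots.release(); // e.g. RejectedExecutionException after shutdown
            throw e;
        }
    }

    void shutdownAndWait() throws InterruptedException {
        delegate.shutdown();
        delegate.awaitTermination(1, TimeUnit.MINUTES);
    }

    // Demo: 20 tasks, 2 threads, at most 4 tasks in flight at any time.
    static int demo() throws InterruptedException {
        BlockingExecutor executor = new BlockingExecutor(2, 4);
        AtomicInteger processed = new AtomicInteger();
        for (int i = 0; i < 20; i++) {
            executor.execute(processed::incrementAndGet); // stands in for new Worker(work)
        }
        executor.shutdownAndWait();
        return processed.get();
    }
}
```

In the consumer loop above, executorService.execute(new Worker(work)) would become a call to this execute(), so the loop stops draining the work queue once `bound` tasks are in flight and the shared queue keeps reflecting the backlog.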
I want the shared work queue to represent the current processing state.
Try using a shared BlockingQueue and have a pool of Worker threads taking work items off of the Queue.
I want to be able to block the producers if the consumers aren't keeping up.
Both ArrayBlockingQueue and LinkedBlockingQueue support bounded queues such that they will block on put when full. Using the blocking put() methods ensures that producers are blocked if the queue is full.
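A small sketch of the difference between the blocking and non-blocking insert methods on a bounded queue; the capacity of 2 and the class name are arbitrary choices for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Shows how a bounded BlockingQueue pushes back on producers once it is full.
class BoundedQueueDemo {
    static boolean[] demo() throws InterruptedException {
        BlockingQueue<String> workQueue = new LinkedBlockingQueue<>(2); // capacity 2
        workQueue.put("a"); // returns immediately: space available
        workQueue.put("b"); // the queue is now full

        // offer() never blocks: it reports failure instead
        boolean offered = workQueue.offer("c");
        // the timed offer waits up to 100 ms for space, then gives up
        boolean offeredWithTimeout = workQueue.offer("c", 100, TimeUnit.MILLISECONDS);

        // put("c") at this point would block until a consumer takes something
        workQueue.take(); // a consumer frees one slot
        boolean offeredAfterTake = workQueue.offer("c");
        return new boolean[] {offered, offeredWithTimeout, offeredAfterTake};
    }
}
```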
Here is a rough start. You can tune the number of workers and queue size:
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class WorkerTest<T> {
    private final BlockingQueue<T> workQueue;
    private final ExecutorService service;

    public WorkerTest(int numWorkers, int workQueueSize) {
        // Bounded queue: put() blocks producers once workQueueSize items are waiting.
        workQueue = new LinkedBlockingQueue<T>(workQueueSize);
        service = Executors.newFixedThreadPool(numWorkers);
        for (int i = 0; i < numWorkers; i++) {
            service.submit(new Worker<T>(workQueue));
        }
    }

    public void produce(T item) {
        try {
            workQueue.put(item);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }

    // Interrupts the workers, which are normally blocked in take().
    public void shutdown() {
        service.shutdownNow();
    }

    private static class Worker<T> implements Runnable {
        private final BlockingQueue<T> workQueue;

        public Worker(BlockingQueue<T> workQueue) {
            this.workQueue = workQueue;
        }

        @Override
        public void run() {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    T item = workQueue.take();
                    // Process item
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
    }
}
"find an available existing worker thread if one exists, create one if necessary, kill them if they go idle."
Managing all those worker states is as unnecessary as it is perilous. I would create one monitor thread that constantly runs in the background, whose only task is to fill up the queue and spawn consumers... why not make the worker threads short-lived daemons that simply exit as soon as they complete their work? If you attach them all to one ThreadGroup you can dynamically re-size the pool... for example:
// workerGroup: the ThreadGroup all workers are attached to; spawnDaemonWorkers() is your own helper
while (!queue.isEmpty() && workerGroup.activeCount() < UPPER_LIMIT) {
    spawnDaemonWorkers(queue.poll());
}
You could have your consumer call Runnable::run directly instead of starting a new thread for each task. Combine this with a blocking queue with a maximum size and I think you will get what you want. Your consumer becomes a worker that executes tasks inline based on the work items on the queue. Consumers only dequeue items as fast as they process them, so your producers will block when your consumers stop consuming.
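A minimal sketch of that arrangement, assuming a hypothetical end-of-work marker string to stop the consumers; the counter increment stands in for running new Worker(work), and the queue capacity of 4 is arbitrary.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: consumers take work strings from a bounded queue and run the task
// inline on their own thread; producers block in put() when consumers lag.
class InlineConsumers {
    private static final String POISON = "__stop__"; // hypothetical end-of-work marker

    static int demo(int consumers, int items) throws InterruptedException {
        BlockingQueue<String> workQueue = new ArrayBlockingQueue<>(4);
        AtomicInteger processed = new AtomicInteger();
        Thread[] threads = new Thread[consumers];
        for (int i = 0; i < consumers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    while (true) {
                        String work = workQueue.take();
                        if (POISON.equals(work)) {
                            return; // no more work for this consumer
                        }
                        Runnable task = processed::incrementAndGet; // stands in for new Worker(work)
                        task.run(); // runs on this consumer thread, no hand-off
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        for (int i = 0; i < items; i++) {
            workQueue.put("work-" + i); // blocks while 4 items are already waiting
        }
        for (int i = 0; i < consumers; i++) {
            workQueue.put(POISON); // one marker per consumer
        }
        for (Thread t : threads) {
            t.join();
        }
        return processed.get();
    }
}
```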
Related
I want to understand the logic of a thread pool; below is a simple, incorrect and incomplete implementation of one:
class ThreadPool {
    private BlockingQueue<Runnable> taskQueue;

    public ThreadPool(int numberOfThreads) {
        taskQueue = new LinkedBlockingQueue<Runnable>(10);
        for (int i = 0; i < numberOfThreads; i++) {
            new PoolThread(taskQueue).start();
        }
    }

    public void execute(Runnable task) throws InterruptedException {
        taskQueue.put(task);
    }
}

class PoolThread extends Thread {
    private BlockingQueue<Runnable> taskQueue;

    public PoolThread(BlockingQueue<Runnable> queue) {
        taskQueue = queue;
    }

    public void run() {
        while (true) {
            try {
                taskQueue.take().run();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}
What if the number of threads to execute exceeds the taskQueue size? Will the calling thread be blocked? For ThreadPoolExecutor, I can see that in this case it's the job of the rejected execution handler, but I still cannot understand how it works. Thanks in advance for any help.
EDIT:
Set the max size of the blocking queue to 10.
Imagine a group of bricklayers (your threads) building a wall, and a pile of bricks (your BlockingQueue).
Each bricklayer takes a brick from the pile, positions it, and then picks another one (taskQueue.take()); as long as there are bricks in the pile, the bricklayers are kept busy.
A truck arrives from time to time, filling the pile with more bricks, but there is only limited space on the pile: if there is no space, the truck stops and waits until enough bricks have been used by the bricklayers.
As long as there are enough bricks in the pile (more than the number of bricklayers) you can rest assured all bricklayers will have enough to work with, but when the pile starts to run empty the bricklayers will have to stop working until new bricks are delivered.
You have to pick a suitable number of bricklayers: too few and the truck will often be waiting for space in the pile, too many and most of them will be idle waiting for new bricks.
Implementation-wise, Java generally gives you a thread pool; you rarely create your own:
ExecutorService threadExecutor = Executors.newFixedThreadPool( 3 );
and then you call:
threadExecutor.submit(Runnable...);
to add a task to the queue.
What if the number of threads to execute exceeds the taskQueue size? Will the calling thread be blocked?
The size of the queue is the number of tasks which are NOT running. Typically it will be empty even when the threads are busy. Having a queue length which matches the number of threads has no significance and nothing special happens at this point.
For ThreadPoolExecutor, I can see that in this case it's the job of the rejected execution handler
The rejection handler is only called if the queue is full. Your queue has no limit so it wouldn't be called even if you supported this feature.
However, if it did have a limit and it supported this feature, the typical behaviour is to throw an exception. You can make it do other things such as block, have the current thread run the task (which is my preference) or ignore the task.
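For the "run it on the current thread" option, ThreadPoolExecutor ships ThreadPoolExecutor.CallerRunsPolicy. This sketch uses one worker and a queue of capacity 1 (arbitrary sizes) so that the third task cannot be queued and therefore runs on the submitting thread; the latch only keeps the worker busy long enough to make that deterministic. The class name is illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// One worker thread, room for one queued task: the third task is rejected,
// so CallerRunsPolicy runs it on the thread that called execute().
class CallerRunsDemo {
    static List<String> demo() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.CallerRunsPolicy());
        List<String> ranOn = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch release = new CountDownLatch(1);
        String caller = Thread.currentThread().getName();

        pool.execute(() -> { // task 1 occupies the only worker
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        pool.execute(() -> ranOn.add("queued")); // task 2 waits in the queue
        pool.execute(() -> ranOn.add(             // task 3 is rejected and runs right here
                Thread.currentThread().getName().equals(caller) ? "caller" : "worker"));

        release.countDown(); // let the worker finish task 1, then task 2
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return ranOn;
    }
}
```

While the caller is busy running a rejected task it is not submitting new ones, which gives a simple form of back-pressure without blocking on the queue.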
I still cannot understand how it works.
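When you offer() a task to a queue, it returns false if the queue cannot accept it. When this happens (and ThreadPoolExecutor cannot start another thread because it is already at its maximum pool size), the rejected execution handler is called.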
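A toy model of that control flow, deliberately simplified: the real ThreadPoolExecutor.execute also tries to start extra threads up to maximumPoolSize before rejecting. ToyPool and RejectionHandler are hypothetical names, and no worker threads are started so that the result is deterministic.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model (not the JDK source): execute() offers the task to a bounded queue
// and, if offer() returns false, hands it to a rejection handler instead of blocking.
class ToyPool {
    interface RejectionHandler {
        void rejected(Runnable task);
    }

    private final BlockingQueue<Runnable> taskQueue;
    private final RejectionHandler handler;

    ToyPool(int capacity, RejectionHandler handler) {
        this.taskQueue = new ArrayBlockingQueue<>(capacity);
        this.handler = handler;
    }

    void execute(Runnable task) {
        if (!taskQueue.offer(task)) { // never blocks; false when the queue is full
            handler.rejected(task);
        }
    }

    // Worker threads would normally loop on take(); here the queue is drained
    // on the calling thread so the demo is deterministic.
    int drain() {
        int ran = 0;
        Runnable task;
        while ((task = taskQueue.poll()) != null) {
            task.run();
            ran++;
        }
        return ran;
    }

    // Demo: 15 tasks into a queue of 10 -> 10 run, 5 rejected.
    static int[] demo() {
        AtomicInteger rejected = new AtomicInteger();
        ToyPool pool = new ToyPool(10, task -> rejected.incrementAndGet());
        for (int i = 0; i < 15; i++) {
            pool.execute(() -> { });
        }
        return new int[] {pool.drain(), rejected.get()};
    }
}
```

The question's own ThreadPool uses put() instead of offer(), so there the calling thread blocks rather than triggering any handler.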
I have a situation where different threads populate a queue (producers) and one consumer retrieves elements from this queue. My problem is that when elements are retrieved from the queue, some are missed (a missed signal?). The producer code is:
class Producer implements Runnable {
    private Consumer consumer;

    Producer(Consumer consumer) { this.consumer = consumer; }

    @Override
    public void run() {
        consumer.send("message");
    }
}
and they are created and run with:
ExecutorService executor = Executors.newSingleThreadExecutor();
for (int i = 0; i < 20; i++) {
    executor.execute(new Producer(consumer));
}
Consumer code is:
class Consumer implements Runnable {
    private Queue<String> queue = new ConcurrentLinkedQueue<String>();

    void send(String message) {
        synchronized (queue) {
            queue.add(message);
            System.out.println("SIZE: " + queue.size());
            queue.notify();
        }
    }

    @Override
    public void run() {
        int counter = 0;
        synchronized (queue) {
            while (true) {
                try {
                    System.out.println("SLEEP");
                    queue.wait(10);
                } catch (InterruptedException e) {
                    Thread.interrupted();
                }
                System.out.println(counter);
                if (!queue.isEmpty()) {
                    queue.poll();
                    counter++;
                }
            }
        }
    }
}
When I run the code, sometimes 20 elements are added and 20 retrieved, but in other cases fewer than 20 are retrieved. Any idea how to fix that?
I'd suggest you use a BlockingQueue instead of a Queue. A LinkedBlockingDeque might be a good candidate for you.
Your code would look like this:
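private final BlockingQueue<String> queue = new LinkedBlockingDeque<String>();

void send(String message) throws InterruptedException {
    queue.put(message); // the queue does its own locking; no synchronized block needed
    System.out.println("SIZE: " + queue.size());
}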
and then you just need to call queue.take() on your consumer thread.
The idea is that .take() will block until an item is available in the queue and then return exactly one (which is where I think your implementation suffers: missing notification while polling). .put() is responsible for doing all the notifications for you. No wait/notifies needed.
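A fuller sketch of the Consumer from the question built on this idea. It uses LinkedBlockingQueue rather than the suggested LinkedBlockingDeque (either works here), sends from the main thread instead of the Producer tasks, and adds a hypothetical expected message count so the demo thread can finish.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: send() never loses a message, and take() blocks until one is
// available, so no wait()/notify() or synchronized blocks are needed.
class QueueConsumer implements Runnable {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final int expected; // hypothetical stop condition for the demo
    private int counter;        // written only by the consumer thread

    QueueConsumer(int expected) {
        this.expected = expected;
    }

    void send(String message) throws InterruptedException {
        queue.put(message); // never blocks on an unbounded LinkedBlockingQueue
    }

    @Override
    public void run() {
        try {
            while (counter < expected) {
                queue.take(); // blocks until a message is available
                counter++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static int demo() throws InterruptedException {
        QueueConsumer consumer = new QueueConsumer(20);
        Thread consumerThread = new Thread(consumer);
        consumerThread.start();
        for (int i = 0; i < 20; i++) {
            consumer.send("message");
        }
        consumerThread.join(); // join() makes the consumer's writes visible here
        return consumer.counter;
    }
}
```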
The issue in your code is probably that you are using notify instead of notifyAll. The former only wakes up a single thread, if there is one waiting on the lock. This allows a race condition where no thread is waiting and the signal is lost. A notifyAll will force correctness at a minor performance cost by requiring all threads to wake up to check whether they can obtain the lock.
This is best explained in Effective Java 1st ed (see p.150). The 2nd edition removed this tip since programmers are expected to use java.util.concurrent which provides stronger correctness guarantees.
It looks like a bad idea to use ConcurrentLinkedQueue and synchronization at the same time. It defeats the purpose of concurrent data structures in the first place.
There is no problem with the ConcurrentLinkedQueue data structure, and replacing it with a BlockingQueue will solve the problem, but that is not the root cause.
The problem is with queue.wait(10). This is a timed wait: once 10 ms have elapsed, the consumer wakes up and reacquires the lock even if nobody called notify().
A notification (queue.notify()) sent after those 10 ms have elapsed gets lost, because no consumer thread is waiting on it.
Meanwhile the producers cannot add to the queue, since they can't acquire the lock while the consumer holds it again.
Moving to BlockingQueue solved your problem because you removed your wait(10) code and wait and notify was taken care by BlockingQueue data structure.
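If you want to keep wait()/notify(), the usual fix is a guarded wait: wait without a timeout, recheck the condition in a loop, and drain everything that is queued. A minimal sketch, swapping the ConcurrentLinkedQueue for a plain ArrayDeque since every access is synchronized anyway; the expected count is a hypothetical stop condition for the demo.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch: wait() with no timeout, guarded by a loop on the real condition.
// A notifyAll() sent while the consumer is busy is harmless, because the
// consumer checks the queue under the lock before it ever waits.
class GuardedConsumer implements Runnable {
    private final Queue<String> queue = new ArrayDeque<>(); // guarded by itself
    private final int expected; // hypothetical stop condition for the demo
    private int counter;        // written only by the consumer thread

    GuardedConsumer(int expected) {
        this.expected = expected;
    }

    void send(String message) {
        synchronized (queue) {
            queue.add(message);
            queue.notifyAll();
        }
    }

    @Override
    public void run() {
        try {
            while (counter < expected) {
                synchronized (queue) {
                    while (queue.isEmpty()) { // recheck after every wakeup
                        queue.wait();         // releases the lock while waiting
                    }
                    while (!queue.isEmpty()) { // take everything that is there
                        queue.poll();
                        counter++;
                    }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static int demo() throws InterruptedException {
        GuardedConsumer consumer = new GuardedConsumer(20);
        Thread consumerThread = new Thread(consumer);
        consumerThread.start();
        for (int i = 0; i < 20; i++) {
            consumer.send("message");
        }
        consumerThread.join();
        return consumer.counter;
    }
}
```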
I would like to ask a basic question about Java threads. Let's consider a producer-consumer scenario. Say there is one producer and n consumers. Consumers arrive at random times, and once they are served they go away, meaning each consumer runs on its own thread. Should I still use a run-forever condition for the consumer?
public class Consumer extends Thread {
    public void run() {
        while (true) {
        }
    }
}
Won't this keep the thread running forever?
I wouldn't extend Thread, instead I would implement Runnable.
If you want the thread to run forever, I would have it loop forever.
A common alternative is to use
while(!Thread.currentThread().isInterrupted()) {
or
while(!Thread.interrupted()) {
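A minimal sketch of the interrupt-based loop with a blocking take(), assuming the consumer is fed through a BlockingQueue; the class name and the "customer" item are illustrative.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: loop "forever", but treat interruption as the signal to stop.
// take() throws InterruptedException (which clears the interrupt flag), so the
// handler restores the flag and the loop condition then ends the loop.
class InterruptibleConsumer implements Runnable {
    private final BlockingQueue<String> queue;

    InterruptibleConsumer(BlockingQueue<String> queue) {
        this.queue = queue;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                String customer = queue.take(); // blocks until someone arrives
                System.out.println("Serving " + customer);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag
            }
        }
    }

    static boolean stillRunningAfterInterrupt() throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread consumer = new Thread(new InterruptibleConsumer(queue));
        consumer.start();
        queue.put("customer");
        consumer.interrupt(); // ask the consumer to stop
        consumer.join(5000);  // wait up to 5 s for it to finish
        return consumer.isAlive();
    }
}
```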
It will, so you might want to do something like
while(beingServed)
{
    //check if the customer is done being served (set beingServed to false)
}
This way you'll escape the loop when the thread is meant to die.
Why not use a boolean that represents the presence of the Consumer?
public class Consumer extends Thread {
    private volatile boolean present;

    public Consumer() {
        present = true;
    }

    public void run() {
        while (present) {
            // Do Stuff
        }
    }

    public void consumerLeft() {
        present = false;
    }
}
First, you can create a thread for each consumer; once the consumer finishes its job, it returns from run() and the thread dies, so there is no need for an infinite loop. However, creating a thread for each consumer is not a good idea, since thread creation is expensive from a performance point of view: threads are very expensive resources. In addition, I agree with the answers above that it is better to implement Runnable than to extend Thread. Extend Thread only when you wish to customize the thread itself.
I strongly suggest you use a thread pool, with the consumer being the Runnable object run by a thread in the pool.
The code could look like this:
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ConsumerMgr {
    int poolSize = 2;
    int maxPoolSize = 2;
    long keepAliveTime = 10;
    ThreadPoolExecutor threadPool = null;
    final ArrayBlockingQueue<Runnable> queue = new ArrayBlockingQueue<Runnable>(5);

    public ConsumerMgr() {
        threadPool = new ThreadPoolExecutor(poolSize, maxPoolSize,
                keepAliveTime, TimeUnit.SECONDS, queue);
    }

    // Each consumer is a Runnable: at most maxPoolSize run at once, up to 5 more
    // wait in the queue, and further submissions are rejected (default AbortPolicy).
    public void runTask(Runnable task) {
        threadPool.execute(task);
        System.out.println("Queued tasks: " + queue.size());
    }
}
It is not a good idea to extend Thread (unless you are coding a new kind of thread, i.e. never).
The best approach is to pass a Runnable to the Thread's constructor, like this:
public class Consumer implements Runnable {
    public void run() {
        while (true) {
            // Do something
        }
    }
}

new Thread(new Consumer()).start();
In general, while(true) is OK, but you have to handle being interrupted and, if you block in wait(), spurious wakeups as well. There are many examples out there on the web.
I recommend reading Java Concurrency in Practice.
For the producer-consumer pattern you'd better use wait() and notify(). See this tutorial. This is far more efficient than spinning in a while(true) loop.
If you want your thread to process messages until you kill it (or it is killed in some way), then inside while (true) there would be some synchronized call to your producer thread (or a synchronized queue, or a queuing system) which blocks until a message becomes available. Once a message is consumed, the loop restarts and waits again.
If you want to manually instantiate a bunch of threads which each pull a message from a producer just once and then die, don't use while (true).
I'm trying to find a less clunky solution to a Java concurrency problem.
The gist of the problem is that I need a shutdown call to block while there are still worker threads active, but the crucial aspect is that the worker tasks are each spawned and completed asynchronously so the hold and release must be done by different threads. I need them to somehow send a signal to the shutdown thread once their work has completed. Just to make things more interesting, the worker threads cannot block each other so I'm unsure about the application of a Semaphore in this particular instance.
I have a solution which I think safely does the job, but my unfamiliarity with the Java concurrency utils leads me to think that there might be a much easier or more elegant pattern. Any help in this regard would be greatly appreciated.
Here's what I have so far, fairly sparse except for the comments:
final private ReentrantReadWriteLock shutdownLock = new ReentrantReadWriteLock();
volatile private int activeWorkerThreads;
private boolean isShutdown;

private void workerTask()
{
    try
    {
        // Point A: Worker tasks mustn't block each other.
        shutdownLock.readLock().lock();
        // Point B: I only want worker tasks to continue if the shutdown signal
        // hasn't already been received.
        if (isShutdown)
            return;
        activeWorkerThreads ++;
        // Point C: This async method call returns immediately, soon after which
        // we release our lock. The shutdown thread may then acquire the write lock
        // but we want it to continue blocking until all of the asynchronous tasks
        // have completed.
        executeAsynchronously(new Runnable()
        {
            @Override
            final public void run()
            {
                try
                {
                    // Do stuff.
                }
                finally
                {
                    // Point D: Release of shutdown thread loop, if there are no other
                    // active worker tasks.
                    activeWorkerThreads --;
                }
            }
        });
    }
    finally
    {
        shutdownLock.readLock().unlock();
    }
}

final public void shutdown()
{
    try
    {
        // Point E: Shutdown thread must block while any worker threads
        // have breached Point A.
        shutdownLock.writeLock().lock();
        isShutdown = true;
        // Point F: Is there a better way to wait for this signal?
        while (activeWorkerThreads > 0)
            ;
        // Do shutdown operation.
    }
    finally
    {
        shutdownLock.writeLock().unlock();
    }
}
Thanks in advance for any help!
Russ
Declaring activeWorkerThreads as volatile doesn't make activeWorkerThreads++ safe, as ++ is just shorthand for
activeWorkerThreads = activeWorkerThreads + 1;
which is a separate read and write, not an atomic operation. Use AtomicInteger instead.
Does executeAsynchronously() send jobs to an ExecutorService? If so, you can just use the awaitTermination method, so your shutdown hook will be:
executor.shutdown();
executor.awaitTermination(1, TimeUnit.MINUTES);
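A sketch that combines both suggestions, assuming the asynchronous work goes to an ExecutorService; the pool size, task count and class name are arbitrary.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: an AtomicInteger replaces the volatile int counter, and
// shutdown() + awaitTermination() replace the spin loop.
class AwaitTerminationDemo {
    static int[] demo() throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        AtomicInteger activeWorkerThreads = new AtomicInteger();
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < 10; i++) {
            activeWorkerThreads.incrementAndGet(); // atomic, unlike ++ on a volatile int
            executor.execute(() -> {
                try {
                    completed.incrementAndGet(); // "Do stuff."
                } finally {
                    activeWorkerThreads.decrementAndGet();
                }
            });
        }
        executor.shutdown(); // no new tasks; already-submitted ones still run
        boolean terminated = executor.awaitTermination(1, TimeUnit.MINUTES);
        return new int[] {terminated ? 1 : 0, activeWorkerThreads.get(), completed.get()};
    }
}
```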
You can use a semaphore in this scenario and avoid a busy wait in the shutdown() call. Think of it as a set of tickets that are handed out to workers to indicate that they are in flight. If the shutdown() method can acquire all of the tickets, then it knows that it has drained all workers and there is no activity. Because #acquire() is a blocking call, shutdown() won't spin. I've used this approach for a distributed master-worker library, and it's easy to extend it to handle timeouts and retries.
Executor executor = // ...
final int permits = // ...
final Semaphore semaphore = new Semaphore(permits);

void schedule(final Runnable task) throws InterruptedException {
    semaphore.acquire(); // take a ticket: this task is now in flight
    try {
        executor.execute(new Runnable() {
            @Override public void run() {
                try {
                    task.run();
                } finally {
                    semaphore.release(); // hand the ticket back when done
                }
            }
        });
    } catch (RejectedExecutionException e) {
        semaphore.release();
        throw e;
    }
}

void shutDown() {
    // Blocks until every ticket has been returned, i.e. no task is in flight.
    semaphore.acquireUninterruptibly(permits);
    // do stuff
}
ExecutorService should be a preferred solution as sbridges mentioned.
As an alternative, if the number of worker threads is fixed, then you can use CountDownLatch:
final CountDownLatch latch = new CountDownLatch(numberOfWorkers);
Pass the latch to every worker thread and call latch.countDown() when task is done.
Call latch.await() from the main thread to wait for all tasks to complete.
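A minimal sketch of the latch approach, with plain threads standing in for the workers and a counter standing in for their tasks; the class name is illustrative.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: each worker counts the latch down once; await() blocks until it reaches zero.
class LatchDemo {
    static int demo(int numberOfWorkers) throws InterruptedException {
        final CountDownLatch latch = new CountDownLatch(numberOfWorkers);
        AtomicInteger finished = new AtomicInteger();
        for (int i = 0; i < numberOfWorkers; i++) {
            new Thread(() -> {
                try {
                    finished.incrementAndGet(); // stands in for the worker's task
                } finally {
                    latch.countDown(); // count down even if the task throws
                }
            }).start();
        }
        latch.await(); // the main thread waits for all workers
        return finished.get();
    }
}
```

A CountDownLatch cannot be reset, so this fits a fixed batch of workers; a new latch is needed for each batch.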
Whoa nelly. Never do this:
// Point F: Is there a better way to wait for this signal?
while (activeWorkerThreads > 0)
;
You're spinning and consuming CPU. Use a proper notification:
First: synchronize on an object, then check activeWorkerThreads, and wait() on the object if it's still > 0:
synchronized (mutexObject) {
    while (activeWorkerThreads > 0) {
        mutexObject.wait();
    }
}
Second: Have the workers notify() the object after they decrement the activeWorkerThreads count. You must synchronize on the object before calling notify.
synchronized (mutexObject) {
    activeWorkerThreads--;
    mutexObject.notify();
}
Third: Seeing as you are (after implementing 1 & 2) synchronizing on an object whenever you touch activeWorkerThreads, use it as protection; there is no need for the variable to be volatile.
Then: the same object you use as a mutex for controlling access to activeWorkerThreads could also be used to control access to isShutdown. Example:
synchronized (mutexObject) {
    if (isShutdown) {
        return;
    }
}
This won't cause workers to block each other except for immeasurably small amounts of time (which you likely do not avoid by using a read-write lock anyway).
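The steps above can be assembled into one small class. This is a sketch, not the asker's code: ActiveTaskTracker, tryBegin(), end() and shutdownAndWait() are hypothetical names, and it uses notifyAll() rather than notify(), which is equivalent with a single waiting shutdown thread and stays correct if more than one thread ever waits.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: counter and shutdown flag guarded by one mutex; no volatile, no spinning.
class ActiveTaskTracker {
    private final Object mutexObject = new Object();
    private int activeWorkerThreads; // guarded by mutexObject
    private boolean isShutdown;      // guarded by mutexObject

    // Worker side, before starting: returns false once shutdown has begun.
    boolean tryBegin() {
        synchronized (mutexObject) {
            if (isShutdown) {
                return false;
            }
            activeWorkerThreads++;
            return true;
        }
    }

    // Worker side, in a finally block when the task ends.
    void end() {
        synchronized (mutexObject) {
            activeWorkerThreads--;
            mutexObject.notifyAll();
        }
    }

    // Shutdown side: refuse new work, then wait (not spin) until the count is 0.
    void shutdownAndWait() throws InterruptedException {
        synchronized (mutexObject) {
            isShutdown = true;
            while (activeWorkerThreads > 0) {
                mutexObject.wait();
            }
        }
    }

    // Demo: returns whether a worker could still start after shutdownAndWait().
    static boolean demo() throws InterruptedException {
        ActiveTaskTracker tracker = new ActiveTaskTracker();
        ExecutorService executor = Executors.newFixedThreadPool(3);
        for (int i = 0; i < 6; i++) {
            if (tracker.tryBegin()) {
                executor.execute(() -> {
                    try {
                        Thread.sleep(20); // "Do stuff."
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        tracker.end();
                    }
                });
            }
        }
        tracker.shutdownAndWait();
        boolean startedAfterShutdown = tracker.tryBegin();
        executor.shutdown();
        return startedAfterShutdown;
    }
}
```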
This is more like a comment on sbridges' answer, but it was a bit too long to submit as a comment.
Anyway, just one comment.
When you shut down the executor, submitting a new task to it results in an unchecked RejectedExecutionException if you use the default implementations (like Executors.newSingleThreadExecutor()). So in your case you probably want to use the following code.
code:
new ThreadPoolExecutor(1, 1, 1, TimeUnit.HOURS,
        new LinkedBlockingQueue<Runnable>(),
        new ThreadPoolExecutor.DiscardPolicy());
This way, tasks that are submitted to the executor after shutdown() has been called are simply ignored. The parameters above (1, 1, ... etc.) produce an executor that is basically a single-thread executor, but doesn't throw the runtime exception.
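A quick check of both behaviours, using the same constructor arguments as above plus a second executor that keeps the default AbortPolicy for comparison; the class name is illustrative.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: compare DiscardPolicy with the default AbortPolicy after shutdown().
class DiscardAfterShutdownDemo {
    static boolean[] demo() throws InterruptedException {
        ThreadPoolExecutor discarding = new ThreadPoolExecutor(1, 1, 1, TimeUnit.HOURS,
                new LinkedBlockingQueue<Runnable>(),
                new ThreadPoolExecutor.DiscardPolicy());
        discarding.shutdown();
        AtomicBoolean ran = new AtomicBoolean(false);
        discarding.execute(() -> ran.set(true)); // silently dropped, no exception
        discarding.awaitTermination(1, TimeUnit.SECONDS);

        ThreadPoolExecutor aborting = new ThreadPoolExecutor(1, 1, 1, TimeUnit.HOURS,
                new LinkedBlockingQueue<Runnable>()); // default handler: AbortPolicy
        aborting.shutdown();
        boolean threw = false;
        try {
            aborting.execute(() -> { });
        } catch (RejectedExecutionException e) {
            threw = true;
        }
        return new boolean[] {ran.get(), threw};
    }
}
```

Silently dropping work can hide bugs, so this choice fits best when late submissions are genuinely expected and harmless.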
I have a queue of tasks in Java. This queue is in a table in the DB.
I need:
1 thread per task only
No more than N threads running at the same time. This is because the threads interact with the DB and I don't want to have a bunch of DB connections open.
I think I could do something like:
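final Semaphore semaphore = new Semaphore(N);
while (isOnJob) {
    List<JobTask> tasks = getJobTasks();
    if (!tasks.isEmpty()) {
        final CountDownLatch cdl = new CountDownLatch(tasks.size());
        for (final JobTask task : tasks) {
            Thread tr = new Thread(new Runnable() {
                @Override
                public void run() {
                    try {
                        semaphore.acquire(); // at most N tasks past this point
                        try {
                            task.doWork();
                        } finally {
                            semaphore.release();
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        cdl.countDown(); // count down even if doWork() throws
                    }
                }
            });
            tr.start();
        }
        cdl.await();
    }
}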
I know that an ExecutorService class exists, but I'm not sure whether I can use it for this.
So, do you think this is the best way to do it? Or could you explain how an ExecutorService works so that I can solve this with it?
Final solution:
I think the best solution is something like:
while (isOnJob) {
    ExecutorService executor = Executors.newFixedThreadPool(N);
    List<JobTask> tasks = getJobTasks();
    if (!tasks.isEmpty()) {
        for (final JobTask task : tasks) {
            executor.submit(new Runnable() {
                @Override
                public void run() {
                    task.doWork();
                }
            });
        }
    }
    executor.shutdown();
    executor.awaitTermination(Long.MAX_VALUE, TimeUnit.HOURS);
}
Thanks a lot for the answers. BTW, I am using a connection pool, but the queries to the DB are very heavy and I don't want an uncontrolled number of tasks running at the same time.
You can indeed use an ExecutorService. For instance, create a fixed thread pool using the newFixedThreadPool method. This way, besides reusing threads, you also guarantee that no more than N threads are running at the same time.
Something along these lines:
private static final ExecutorService executor = Executors.newFixedThreadPool(N);
// ...
while (isOnJob) {
    List<JobTask> tasks = getJobTasks();
    if (!tasks.isEmpty()) {
        List<Future<?>> futures = new ArrayList<Future<?>>();
        for (final JobTask task : tasks) {
            Future<?> future = executor.submit(new Runnable() {
                @Override
                public void run() {
                    task.doWork();
                }
            });
            futures.add(future);
        }
        // you no longer need to use await
        for (Future<?> fut : futures) {
            fut.get();
        }
    }
}
Note that you no longer need to use the latch, as get will wait for the computation to complete, if necessary.
I agree with JG that ExecutorService is the way to go... but I think you're both making it more complicated than it needs to be.
Rather than creating a large number of threads (1 per task) why not just create a fixed sized thread pool (with Executors.newFixedThreadPool(N)) and submit all the tasks to it? No need for a semaphore or anything like that - just submit the jobs to the thread pool as you get them, and the thread pool will handle them with up to N threads at a time.
If you aren't going to use more than N threads at a time, why would you want to create them?
Use a ThreadPoolExecutor instance with an unbounded queue and a fixed maximum number of threads, e.g. Executors.newFixedThreadPool(N). This will accept a large number of tasks but will only execute N of them concurrently.
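A sketch that makes the concurrency limit observable: it records the peak number of tasks running at once in a fixed pool. The short sleep stands in for task.doWork(), and the class name and numbers are arbitrary.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: submit many tasks to a fixed pool of n threads and record the
// highest number of tasks that were ever running at the same time.
class FixedPoolLimitDemo {
    static int maxConcurrent(int n, int tasks) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(n);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger max = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            executor.submit(() -> {
                int now = running.incrementAndGet();
                max.accumulateAndGet(now, Math::max); // remember the peak
                try {
                    Thread.sleep(10); // stands in for task.doWork()
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.MINUTES);
        return max.get();
    }
}
```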
If you choose a bounded queue instead (with a capacity of N), the Executor will reject tasks once the queue is full and all threads are busy (how exactly depends on the policy you can configure when working with ThreadPoolExecutor directly instead of using the Executors factory; see RejectedExecutionHandler).
If you need "real" congestion control, you should set up a bounded BlockingQueue with a capacity of N. Fetch the tasks you want done from the database and put them into the queue; if it's full, the calling thread will block. In another thread (perhaps also started using the Executor API) you take tasks from the BlockingQueue and submit them to the Executor. If the BlockingQueue is empty, the taking thread will also block. To signal that you're done, use a "special" object (e.g. a singleton which marks the last/final item in the queue).
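A minimal sketch of that pipeline, assuming the database fetch is replaced by a simple loop and using a dedicated Runnable instance as the "special" poison-pill object; names and sizes are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: the fetching thread put()s tasks into a bounded queue (blocking when
// it is full), a dispatcher thread take()s them and submits them to a fixed
// pool, and a poison-pill object marks the end of the stream.
class PoisonPillPipeline {
    private static final Runnable POISON_PILL = () -> { }; // hypothetical end marker

    static int run(int n, int taskCount) throws InterruptedException {
        BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(n);
        ExecutorService workers = Executors.newFixedThreadPool(n);
        AtomicInteger done = new AtomicInteger();

        Thread dispatcher = new Thread(() -> {
            try {
                while (true) {
                    Runnable task = queue.take(); // blocks while the queue is empty
                    if (task == POISON_PILL) {     // compared by identity
                        break;
                    }
                    workers.execute(task);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                workers.shutdown(); // let already-submitted tasks finish
            }
        });
        dispatcher.start();

        for (int i = 0; i < taskCount; i++) { // stands in for fetching rows from the DB
            queue.put(done::incrementAndGet); // blocks while n tasks are waiting
        }
        queue.put(POISON_PILL);
        dispatcher.join();
        workers.awaitTermination(1, TimeUnit.MINUTES);
        return done.get();
    }
}
```

Note that the fixed pool's own internal queue is unbounded, so in this sketch the bounded queue limits how far the fetching thread can run ahead of the dispatcher rather than how many tasks the pool holds.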
Achieving good performance also depends on the kind of work done in the threads. If your DB is the processing bottleneck, I would start paying attention to how your threads access the DB. Using a connection pool is probably in order. This might help you achieve more throughput, since worker threads can reuse DB connections from the pool.