I have two threads, a consumer and a producer. The consumer thread is the main thread while the producer thread is created by a third party library I use.
You request a List of data from the producer thread by calling ProducerChannel's requestData(), which returns immediately. The producer thread then generates the data one by one, asynchronously, and uses a callback method to deliver each item. I want the method that requests the data to return the result synchronously. The most straightforward way would be to use wait() and notify(), like below.
public class DataFeed {
    boolean done;
    private List<Data> dataList;
    private ProducerChannel producerChannel;

    // This method should be synchronous.
    public List<Data> getDataList() {
        this.producerChannel.requestData();
        while (!done) {
            wait();
        }
        List<Data> dataList = this.dataList;
        this.dataList = null;
        return dataList;
    }

    // This is the call back method invoked by the producer thread.
    public void generated(Data data) {
        if (data == null) {
            done = true; // End of data.
            notify();
        } else {
            this.dataList.add(data);
        }
    }
}
Note that there's only one consumer thread in the entire application. That's why DataFeed has only one List to hold the result for each request. I learned that the Executor framework is now the preferred way to manage threads. How can I refactor this class so that it does not use Thread objects explicitly while not creating additional threads?
I think you should take a look at the producer-consumer example on the BlockingQueue javadoc:
http://download.oracle.com/javase/6/docs/api/java/util/concurrent/BlockingQueue.html
I don't really follow you. This class doesn't create any threads, so using the Executor framework won't help you here.
Instead of using explicit synchronization and thread communication with wait() and notify(), you could just use a BlockingQueue. The producer adds the list to the blocking queue when the data is ready, and the consumer blocks while the queue is empty.
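For example, here is a rough sketch of that approach applied to your class (Data and ProducerChannel are the types from your question; the choice of a SynchronousQueue to hand over the finished list is just my assumption, any BlockingQueue would do):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.SynchronousQueue;

public class DataFeed {
    // Hands a finished list from the producer thread to the consumer thread.
    private final BlockingQueue<List<Data>> results = new SynchronousQueue<List<Data>>();
    private List<Data> dataList = new ArrayList<Data>();
    private ProducerChannel producerChannel;

    // Synchronous from the caller's point of view: blocks until the list arrives.
    public List<Data> getDataList() throws InterruptedException {
        producerChannel.requestData();
        return results.take(); // blocks while the queue is empty
    }

    // Callback invoked by the producer thread.
    public void generated(Data data) throws InterruptedException {
        if (data == null) { // end of data
            results.put(dataList); // wakes the waiting consumer
            dataList = new ArrayList<Data>();
        } else {
            dataList.add(data);
        }
    }
}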
If you don't care about the size of the resulting collection or getting items by index, then BlockingQueue might be what you need. If you DO care about those things (i.e. you actually need a java.util.List instead of a Collection, or if you're not OK using a delayed collection for some reason), then you can still use wait() and notify().
You'll need to add the synchronized keyword to your getDataList and generated methods; otherwise you'll get an IllegalMonitorStateException.
Like this: public synchronized List<Data> getDataList() throws InterruptedException {
But if that's the route that you want to go, use caution. Any number of threads would be able to call getDataList() at the same time. Even though it's synchronized, you're releasing the monitor lock by calling wait(), so another caller could grab it and start a second request while the first is still waiting.
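For reference, a compact sketch of the two methods with those changes applied (resetting done and creating a fresh list per request are my own additions, not part of the original question):

public synchronized List<Data> getDataList() throws InterruptedException {
    done = false;
    dataList = new ArrayList<Data>(); // fresh list per request
    producerChannel.requestData();
    while (!done) {
        wait(); // releases the monitor while waiting
    }
    List<Data> result = dataList;
    dataList = null;
    return result;
}

public synchronized void generated(Data data) {
    if (data == null) {
        done = true; // end of data
        notify(); // wakes the consumer blocked in wait()
    } else {
        dataList.add(data);
    }
}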
Personally, I'd go with the BlockingQueue
Here is an example of how to use a BlockingQueue to tackle producer-consumer problems.
I'm the OP. While BlockingQueue would certainly work, I learned that Semaphore would be a simpler solution.
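For the record, a rough sketch of what I mean: a Semaphore with zero initial permits, acquired by the consumer and released by the producer callback (this glosses over the same single-consumer assumptions as my original code):

private final Semaphore ready = new Semaphore(0);

public List<Data> getDataList() throws InterruptedException {
    dataList = new ArrayList<Data>();
    producerChannel.requestData();
    ready.acquire(); // blocks until the callback releases a permit
    List<Data> result = dataList;
    dataList = null;
    return result;
}

// Callback invoked by the producer thread.
public void generated(Data data) {
    if (data == null) {
        ready.release(); // end of data: wake the consumer
    } else {
        dataList.add(data);
    }
}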
I want to know if I am correct about the code sample below.
I have two threads in Java, Thread_W and Thread_R.
Both can access the Queue<String> queue.
Thread_W has a method called put:
private void put(String email) {
    queue.offer(email);
}
And in Thread_R there is a method called get, which is called once when Thread_R starts:
public void get() {
    while (true) {
        if (!queue.isEmpty()) {
            String to = queue.poll();
            // thread will consume some time here ... maybe 5-10 seconds.
        }
    }
}
So the put method in Thread_W will be called repeatedly by another method in Thread_W, maybe in a while loop.
If I use this code in my Java project, will Thread_R lose any of the emails put into the queue?
P.S. I really need a Buffer
You should use an implementation of the BlockingQueue interface, as those are thread-safe.
The interface offers the methods put() and take(), which block until they can proceed. This way the reading thread doesn't consume a lot of CPU cycles, and the writing thread doesn't write if the queue is full.
Your current busy wait
while (true) {
    if (!queue.isEmpty()) {
        //...
    }
}
isn't very efficient. It is better to use a blocking method call, so you won't need to check whether the queue is empty (or full).
Also, you can't overflow the queue's buffer if your writing thread is way faster than the reading one, as put() waits for space to become available.
Remember that you can always reserve a bigger buffer for the queue by setting the capacity in its constructor beforehand, e.g. ArrayBlockingQueue(int capacity).
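As an illustration, a rough sketch of the two methods rewritten against a bounded BlockingQueue (the capacity of 100 is an arbitrary choice, and both methods now have to deal with InterruptedException):

private final BlockingQueue<String> queue = new ArrayBlockingQueue<String>(100);

// Called from Thread_W: blocks if the queue is full, so no email is dropped.
private void put(String email) throws InterruptedException {
    queue.put(email);
}

// Called once when Thread_R starts: take() blocks while the queue is empty,
// so there is no busy waiting.
public void get() throws InterruptedException {
    while (true) {
        String to = queue.take();
        // thread will consume some time here ... maybe 5-10 seconds.
    }
}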
If you want to use an unbounded concurrent queue, I would recommend taking a look at thread-safe implementations of Deque, for instance LinkedBlockingDeque. LinkedBlockingDeque can be unbounded, and take() will block the calling thread if the queue is empty. You do not need to worry about synchronization if you use classes from the java.util.concurrent package.
What I am trying to do is basically kick off new threads, add them to a queue, and then execute the rest of the code when they are dequeued. I'm not sure what the best way is to add them to a queue, or how I can pause a thread at a point and notify it when it is dequeued. I haven't really done too much concurrent programming in Java before. Any help or suggestions would be greatly appreciated! Thanks
You could use a ThreadPoolExecutor, basically creating a pool of threads according to multiple customizable rules.
And to be sure that all threads have done their respective jobs before your process goes on to the remaining code, you just have to call ThreadPoolExecutor's awaitTermination method, preceded by a call to its shutdown method.
You could also send a notify/notifyAll after the call to awaitTermination in order to wake up some other result-dependent threads.
A sample is written in the ExecutorService documentation (implemented by ThreadPoolExecutor).
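A rough sketch of that sequence (the pool size, the timeout, and the runAll wrapper are illustrative, not taken from your code):

void runAll(List<Runnable> tasks) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    for (Runnable task : tasks) {
        pool.submit(task); // queue the work instead of starting raw threads
    }
    pool.shutdown(); // stop accepting new tasks
    if (!pool.awaitTermination(1, TimeUnit.MINUTES)) { // wait for all submitted tasks to finish
        pool.shutdownNow(); // give up and interrupt whatever is still running
    }
    // all tasks are done (or cancelled) here; continue with the rest of the code
}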
wait() and notify() can be used for this, as such:
class QueuedThread extends Thread {

    // Volatile, because otherwise the thread running run() might cache this value
    // and run into an endless loop.
    private volatile boolean wait = true;

    public void deQueue() {
        synchronized (this) {
            wait = false;
            this.notify();
        }
    }

    public void run() {
        synchronized (this) {
            // You need this extra loop because wait() can wake up spuriously, so it's
            // a safeguard against that (you NEED to have called deQueue() to continue).
            while (wait) {
                try {
                    this.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                    return;
                }
            }
        }
        // REST OF RUN METHOD HERE
    }
}
Just call queuedThread.deQueue() when it should be de-queued.
I am interested in a data structure identical to the Java BlockingQueue, with the exception that it must be able to batch objects in the queue. In other words, I would like the producer to be able to put objects into the queue, but have the consumer block on take() until the queue reaches a certain size (the batch size).
Then, once the queue has reached the batch size, the producer must block on put() until the consumer has consumed all of the elements in the queue (in which case the producer will start producing again and the consumer blocks until the batch is reached again).
Does a similar data structure exist? Or should I write it (which I don't mind)? I just don't want to waste my time if there is something out there.
UPDATE
Maybe to clarify things a bit:
The situation will always be as follows. There can be multiple producers adding items to the queue, but there will never be more than one consumer taking items from the queue.
Now, the problem is that there are multiple of these setups in parallel and serial. In other words, producers produce items for multiple queues, while consumers in their own right can also be producers. This can be more easily thought of as a directed graph of producers, consumer-producers, and finally consumers.
The reason that producers should block until the queues are empty (@Peter Lawrey) is that each of these will be running in a thread. If you leave them to simply produce as space becomes available, you will end up with a situation where you have too many threads trying to process too many things at once.
Maybe coupling this with an execution service could solve the problem?
I would suggest you use BlockingQueue.drainTo(Collection, int). You can use it with take() to ensure you get a minimum number of elements.
The advantage of using this approach is that your batch size grows dynamically with the workload and the producer doesn't have to block when the consumer is busy, i.e. it self-optimises for latency and throughput.
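Something along these lines (queue, minBatch, maxBatch and the Task type are placeholders, not from the question):

List<Task> batch = new ArrayList<Task>();
while (batch.size() < minBatch) {
    batch.add(queue.take()); // block until the minimum number of elements is reached
}
queue.drainTo(batch, maxBatch - batch.size()); // then grab whatever else is already queued
// process the batch; its size grows with the workload, up to maxBatch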
To implement exactly as asked (which I think is a bad idea) you can use a SynchronousQueue with a busy consuming thread.
i.e. the consuming thread does a
list.clear();
while (list.size() < required) {
    list.add(queue.take());
}
// process list.
The producer will block whenever the consumer is busy.
Here is a quick (= simple but not fully tested) implementation that I think may be suitable for your requirements; you should be able to extend it to support the full queue interface if you need to.
To increase performance you can switch to a ReentrantLock instead of using the synchronized keyword.
import java.util.ArrayList;
import java.util.concurrent.Semaphore;

public class BatchBlockingQueue<T> {
    private final ArrayList<T> queue;
    private final Semaphore readerLock;
    private final Semaphore writerLock;
    private final int batchSize;

    public BatchBlockingQueue(int batchSize) {
        this.queue = new ArrayList<>(batchSize);
        this.readerLock = new Semaphore(0);
        this.writerLock = new Semaphore(batchSize);
        this.batchSize = batchSize;
    }

    public void put(T e) throws InterruptedException {
        // Acquire the permit outside the synchronized block; a writer blocked on the
        // semaphore while holding the monitor would deadlock the reader (and vice versa).
        writerLock.acquire();
        synchronized (this) {
            queue.add(e);
            if (queue.size() == batchSize) {
                readerLock.release(batchSize);
            }
        }
    }

    public T poll() throws InterruptedException {
        readerLock.acquire();
        synchronized (this) {
            T ret = queue.remove(0);
            if (queue.isEmpty()) {
                writerLock.release(batchSize);
            }
            return ret;
        }
    }
}
Hope you find it useful.
I recently developed this utility, which batches BlockingQueue elements, using a flush timeout in case the queue doesn't reach the batch size. It also supports a fanOut pattern, using multiple instances to process the same set of data:
// Instantiate the registry
FQueueRegistry registry = new FQueueRegistry();

// Build FQueue consumer
registry.buildFQueue(String.class)
        .batch()
        .withChunkSize(5)
        .withFlushTimeout(1)
        .withFlushTimeUnit(TimeUnit.SECONDS)
        .done()
        .consume(() -> (broadcaster, elms) -> System.out.println("elms batched are: " + elms.size()));

// Push data into queue
for (int i = 0; i < 10; i++) {
    registry.sendBroadcast("Sample" + i);
}
More info here!
https://github.com/fulmicotone/io.fulmicotone.fqueue
Not that I am aware of. If I understand correctly, you want either the producer to work (while the consumer is blocked) until it fills the queue, or the consumer to work (while the producer blocks) until it clears the queue. If that's the case, may I suggest that you don't need a data structure but a mechanism to block one party while the other is working, in a mutex fashion. You can lock on an object for that, and internally have the logic of whether the queue is full or empty decide when to release the lock and pass it to the other party. So in short, you should write it yourself :)
This sounds like how the RingBuffer works in the LMAX Disruptor pattern. See http://code.google.com/p/disruptor/ for more.
A very rough explanation: your main data structure is the RingBuffer. Producers put data into the ring buffer in sequence, and consumers can pull off as much data as the producer has put into the buffer (so essentially batching). If the buffer is full, the producer blocks until the consumer has finished and freed up slots in the buffer.
What do you think is the best way to obtain the results of a thread's work? Imagine a Thread that does some calculations; how do you tell the main program that the calculations are done?
You could poll some public variable called "job finished" or something every X milliseconds, but then you'd receive the results later than they were actually available... the main code would be losing time waiting for them. On the other hand, if you use a lower X, CPU time would be wasted on polling so many times.
So, what do you do to be aware that the Thread, or some Threads, have finished their work?
Sorry if it looks similar to this other question; that's probably the reason for the eben answer, I suppose. What I meant was running lots of threads and knowing when all of them have finished, without polling them.
I was thinking more along the lines of sharing the CPU load between multiple CPUs using batches of Threads, and knowing when a batch has finished. I suppose it can be done with Future objects, but that blocking get method looks a lot like a hidden lock, which is not something I like.
Thanks everybody for your support. Although I also liked the answer by erickson, I think saua's the most complete, and the one I'll use in my own code.
Don't use low-level constructs such as threads, unless you absolutely need the power and flexibility.
You can use an ExecutorService such as ThreadPoolExecutor to submit() Callables. This will return a Future object.
Using that Future object you can easily check if it's done and get the result (including a blocking get() if it's not yet done).
Those constructs will greatly simplify the most common threaded operations.
I'd like to clarify about the blocking get():
The idea is that you want to run some tasks (the Callables) that do some work (calculation, resource access, ...) where you don't need the result right now. You can just depend on the Executor to run your code whenever it wants (if it's a ThreadPoolExecutor then it will run whenever a free Thread is available). Then at some point in time you probably need the result of the calculation to continue. At this point you're supposed to call get(). If the task already ran at that point, then get() will just return the value immediately. If the task didn't complete, then the get() call will wait until the task is completed. This is usually desired since you can't continue without the tasks result anyway.
When you don't need the value to continue, but would like to know about it if it's already available (possibly to show something in the UI), then you can easily call isDone() and only call get() if that returns true.
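A small, self-contained sketch of that flow (the summing loop is just a stand-in for real work):

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureExample {
    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();

        // Submit the calculation; control returns immediately with a Future.
        Future<Long> future = executor.submit(new Callable<Long>() {
            public Long call() {
                long sum = 0;
                for (int i = 1; i <= 1000000; i++) {
                    sum += i;
                }
                return sum;
            }
        });

        // ... do other work on the main thread here ...

        if (future.isDone()) {
            System.out.println("Already finished");
        }
        Long result = future.get(); // blocks only if the task hasn't completed yet
        System.out.println("Result = " + result);
        executor.shutdown();
    }
}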
You could create a listener interface that the main program implements, which is called by the worker once it has finished executing its work.
That way you do not need to poll at all.
Here is an example interface:
/**
 * Listener interface to implement to be called when work has
 * finished.
 */
public interface WorkerListener {
    public void workDone(WorkerThread thread);
}
Here is an example of the actual thread, which does some work and notifies its listeners:
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Thread to perform work
 */
public class WorkerThread implements Runnable {
    private List listeners = new ArrayList();
    private List results;

    public void run() {
        // Do some long running work here
        try {
            // Sleep to simulate long running task
            Thread.sleep(5000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        results = new ArrayList();
        results.add("Result 1");

        // Work done, notify listeners
        notifyListeners();
    }

    private void notifyListeners() {
        for (Iterator iter = listeners.iterator(); iter.hasNext();) {
            WorkerListener listener = (WorkerListener) iter.next();
            listener.workDone(this);
        }
    }

    public void registerWorkerListener(WorkerListener listener) {
        listeners.add(listener);
    }

    public List getResults() {
        return results;
    }
}
And finally, the main program which starts up a worker thread and registers a listener to be notified once the work is done:
import java.util.Iterator;
import java.util.List;

/**
 * Class to simulate a main program
 */
public class MainProg {

    public MainProg() {
        WorkerThread worker = new WorkerThread();

        // Register anonymous listener class
        worker.registerWorkerListener(new WorkerListener() {
            public void workDone(WorkerThread thread) {
                System.out.println("Work done");
                List results = thread.getResults();
                for (Iterator iter = results.iterator(); iter.hasNext();) {
                    String result = (String) iter.next();
                    System.out.println(result);
                }
            }
        });

        // Start the worker thread
        Thread thread = new Thread(worker);
        thread.start();
        System.out.println("Main program started");
    }

    public static void main(String[] args) {
        MainProg prog = new MainProg();
    }
}
Polling, a.k.a. busy waiting, is not a good idea. As you mentioned, busy waiting wastes CPU cycles and can cause your application to appear unresponsive.
My Java is rough, but you want something like the following:
If one thread has to wait for the output of another thread you should make use of a condition variable.
final Lock lock = new ReentrantLock();
final Condition cv = lock.newCondition();
The thread interested in the output of the other thread should acquire the lock and call cv.await(). This will cause the current thread to block. When the worker thread is finished working, it should acquire the lock and call cv.signal(). This will cause the blocked thread to become unblocked, allowing it to inspect the output of the worker thread.
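Roughly like this (workerDone is an assumed flag shared between the two threads and guarded by the lock; the while loop protects against spurious wake-ups):

// waiting (main) thread
lock.lock();
try {
    while (!workerDone) {
        cv.await();
    }
    // inspect the worker's output here
} finally {
    lock.unlock();
}

// worker thread, when it finishes
lock.lock();
try {
    workerDone = true;
    cv.signal();
} finally {
    lock.unlock();
}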
As an alternative to the concurrency API as described by Saua (and if the main thread doesn't need to know when a worker thread finishes) you could use the publish/subscribe pattern.
In this scenario, the child Thread/Runnable is given a listener that knows how to process the result and which is called back when the child Thread/Runnable completes.
Your scenario is still a little unclear.
If you are running a batch job, you may want to use invokeAll. This will block your main thread until all the tasks are complete. There is no "busy waiting" with this approach, where the main thread would waste CPU polling the isDone method of a Future. While this method returns a list of Futures, they are already "done". (There's also an overloaded version that can timeout before completion, which might be safer to use with some tasks.) This can be a lot cleaner than trying to gather up a bunch of Future objects yourself and trying to check their status or block on their get methods individually.
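A sketch of that approach (Result and buildTasks are placeholders, and the checked exceptions thrown by get() are not handled here):

ExecutorService pool = Executors.newFixedThreadPool(4);
List<Callable<Result>> tasks = buildTasks(); // hypothetical helper returning the batch of tasks

// invokeAll blocks until every task has completed; no polling of isDone() needed.
List<Future<Result>> futures = pool.invokeAll(tasks);
for (Future<Result> f : futures) {
    Result r = f.get(); // already done, so this returns immediately (or throws ExecutionException)
    // ... aggregate r ...
}
pool.shutdown();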
If this is an interactive application, with tasks sporadically spun off to be executed in the background, using a callback as suggested by nick.holt is a great approach. Here, you submit a Runnable whose run method invokes the callback with the result once it's been computed. With this approach, you may discard the Future returned by submit, unless you want to be able to cancel running tasks without shutting down the whole ExecutorService.
If you want to be able to cancel tasks or use the timeout capabilities, an important thing to remember is that tasks are canceled by calling interrupt on their thread. So, your task needs to check its interrupted status periodically and abort as needed.
Subclass Thread, and give your class a method that returns the result. When the method is called, if the result hasn't been created yet, then join() with the Thread. When join() returns, your Thread's work will be done and the result should be available; return it.
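A minimal sketch of that idea (the class name, the field, and the fake calculation are purely illustrative):

public class ResultThread extends Thread {
    private long result; // visible to the caller after join() returns

    @Override
    public void run() {
        result = 6 * 7; // stand-in for the real calculation
    }

    public long getResult() throws InterruptedException {
        join(); // returns immediately if the work is already done, otherwise waits for it
        return result;
    }
}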
Use this only if you actually need to fire off an asynchronous activity, do some work while you're waiting, and then obtain the result. Otherwise, what's the point of a Thread? You might as well just write a class that does the work and returns the result in the main thread.
Another approach would be a callback: have your constructor take an argument that implements an interface with a callback method that will be called when the result is computed. This will make the work completely asynchronous. But if you at all need to wait for the result at some point, I think you're still going to need to call join() from the main thread.
As noted by saua: use the constructs offered by java.util.concurrent. If you're stuck with a pre-1.5 (or 5.0) JRE, you might resort to rolling your own, but you're still better off using a backport: http://backport-jsr166.sourceforge.net/
Using Java 6:
I have a method that uses a Thread to run a task in the background. This task accesses files, so the method should not be able to have multiple threads running.
I am trying to figure out if there is a way that I can search for active Threads at the beginning of my method. I want to know if there is an active Thread that is already running my task, so that I can handle the situation properly.
Is this possible without having an actual instance of a previous Thread handy? I would like to avoid saving instances of the Thread globally.
You may want something a little more robust, like using a ReentrantLock to prevent concurrent access to those resources.
Just for reference: you can get all active threads in the current thread's group and its subgroups (for a standalone program, this usually can get you all threads) with java.lang.Thread.enumerate(Thread[]). But this is not the way to solve your problem - as Brian said, use a lock.
Use ReentrantLock.tryLock() (http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/locks/ReentrantLock.html#tryLock()). If it returns false, bingo! Some other thread currently holds the lock.
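A sketch of guarding the method that way (taskLock and runTask are assumed names, not from your code):

private final ReentrantLock taskLock = new ReentrantLock();

public void runTask() {
    if (!taskLock.tryLock()) {
        // another thread is already running the task; handle that case here
        return;
    }
    try {
        // ... access the files ...
    } finally {
        taskLock.unlock();
    }
}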
Even if you had access to these Threads, what would you do with that knowledge? How would you tell what that Thread is currently doing?
If you have a service that can be accessed from multiple places, but you want to guarantee only a single thread will be used by this service, you can set up a work queue like this:
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class FileService
{
    // Declared as BlockingQueue so the blocking take() method is available.
    private final BlockingQueue<Object> workQueue = new ArrayBlockingQueue<Object>(100 /* capacity */);

    public FileService()
    {
        new Thread()
        {
            public void run()
            {
                while (true)
                {
                    try
                    {
                        Object request = workQueue.take(); // blocks until available
                        doSomeWork(request);
                    }
                    catch (InterruptedException e)
                    {
                        return; // stop the worker if interrupted
                    }
                }
            }
        }.start();
    }

    public boolean addTask(Object param)
    {
        return workQueue.offer(param); // return true on success
    }

    private void doSomeWork(Object request)
    {
        // access the files here
    }
}
Here, the ArrayBlockingQueue takes care of all the thread-safety issues. addTask() can be called safely from any other thread; it will simply add a "job" to the workQueue. A separate internal thread constantly reads from the workQueue and performs some operation if there's work to do; otherwise it waits quietly.