design of a Producer/Consumer app

I have a producer app that generates an index (stores it in some in-memory tree data structure). And a consumer app will use the index to search for partial matches.
I don't want the consumer UI to have to block (e.g. via some progress bar) while the producer is indexing the data. Basically if the user wishes to use the partial index, it will just do so. In this case, the producer will potentially have to stop indexing for a while until the user goes away to another screen.
Roughly, I know I will need the wait/notify protocol to achieve this. My question: is it possible to interrupt the producer thread using wait/notify while it is doing its business? What java.util.concurrent primitives do I need to achieve this?

The way you've described this, there's no reason that you need wait/notify. Simply synchronize access to your data structure, to ensure that it is in a consistent state when accessed.
Edit: by "synchronize access", I do not mean synchronize the entire data structure (which would end up blocking either producer or consumer). Instead, synchronize only those bits that are being updated, and only at the time that you update them. You'll find that most of the producer's work can take place in an unsynchronized manner: for example, if you're building a tree, you can identify the node where the insert needs to happen, synchronize on that node, do the insert, then continue on.

In your producer thread, you are likely to have some kind of main loop. This is probably the best place to interrupt your producer. Instead of using wait() and notify(), I suggest you use the synchronization objects introduced in Java 5 (java.util.concurrent.locks).
You could potentially do something like this:
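import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
class Indexer {
    private final Lock lock = new ReentrantLock();
    public void index() {
        while (somecondition) {
            // take the lock for one step only, so lookups can get in between steps
            lock.lock();
            try {
                // perform one indexing step
            } finally {
                lock.unlock();
            }
        }
    }
    public Item lookup() {
        lock.lock();
        try {
            // perform your lookup
        } finally {
            lock.unlock();
        }
    }
}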
You need to make sure that each time the indexer releases the lock, your index is in a consistent, legal state. In this scenario, when the indexer releases the lock, it leaves a chance for a new or waiting lookup() operation to take the lock, complete and release the lock, at which point your indexer can proceed to its next step. If no lookup() is currently waiting, then your indexer just reacquires the lock itself and goes on with its next operation.
If you think you might have more than one thread trying to do the lookup at the same time, you might want to have a look at the ReadWriteLock interface and the ReentrantReadWriteLock implementation.
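To make the ReadWriteLock variant a bit more concrete, here is a minimal sketch. The PrefixIndex class, its TreeSet storage and the method names are invented for illustration and are not from the question; the real index structure will differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical index: a sorted set of words guarded by a read/write lock.
class PrefixIndex {
    private final TreeSet<String> words = new TreeSet<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // One indexing step: the write lock is held for a single insert only.
    void add(String word) {
        lock.writeLock().lock();
        try {
            words.add(word);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Partial-match lookup: several readers may hold the read lock at once.
    List<String> startingWith(String prefix) {
        lock.readLock().lock();
        try {
            List<String> result = new ArrayList<>();
            // In sorted order, all words with this prefix are contiguous
            // and begin at the first element >= prefix.
            for (String w : words.tailSet(prefix)) {
                if (!w.startsWith(prefix)) {
                    break;
                }
                result.add(w);
            }
            return result;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Lookups running at the same time do not block each other here; they only wait while an insert holds the write lock.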
Of course this solution is the simple way to do it. It will block whichever thread does not hold the lock. You may want to check if you can just synchronize on your data structure directly, but that might prove tricky since building indexes tends to use some sort of balanced tree or B-Tree or whatnot where node insertion is far from trivial.
I suggest you first try that simple approach, then see if the way it behaves suits you. If it doesn't, you may either try breaking up the indexing steps into smaller steps, or try synchronizing on only parts of your data structure.
Don't worry too much about the performance of locking. In Java, uncontended locking (when only one thread is trying to take the lock) is cheap. As long as most of your locking is uncontended, locking performance is nothing to be concerned about.

The producer application can have two indices: published and in-work. The producer works only with the in-work index, and the consumer works only with the published one. Once the producer is done indexing, it can replace the published index with the in-work one (usually by swapping one pointer). The producer may also publish a copy of the partial index if that brings value. This way you avoid long-term locks, which is useful when the index is accessed by lots of consumers.
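A rough sketch of the two-index idea, assuming a single producer thread and a set of strings as the index. PublishingIndexer and its methods are hypothetical names, and copying the whole set on every publish is just the simplest way to show it.

```java
import java.util.Collections;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical indexer with an in-work index and a published snapshot.
class PublishingIndexer {
    // Touched only by the (single) producer thread.
    private final TreeSet<String> inWork = new TreeSet<>();

    // Consumers read this reference; volatile makes each new snapshot visible to them.
    private volatile NavigableSet<String> published = Collections.emptyNavigableSet();

    // Producer side: no locking, because consumers never see inWork.
    void index(String word) {
        inWork.add(word);
    }

    // Publish a read-only copy of the partial (or finished) index
    // by swapping a single reference.
    void publish() {
        published = Collections.unmodifiableNavigableSet(new TreeSet<>(inWork));
    }

    // Consumer side: works on whatever snapshot was published last.
    boolean contains(String word) {
        return published.contains(word);
    }
}
```

Consumers never wait for the producer; the price is that they only see what was indexed up to the last publish() call.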

No, that's not possible.
The only way of notifying a thread without any explicit code in the thread itself is to use Thread.interrupt(), which sets the thread's interrupt flag and makes blocking calls such as wait() or sleep() throw InterruptedException. interrupt() is usually not very reliable though, because handling an exception that can surface at any blocking call is a nightmare to get right in all code paths. Besides that, a single try{}catch(Throwable){} somewhere in the thread (including any libraries that you use) could be enough to swallow the signal.
In most cases, the only correct solution is to use a shared flag or a queue that the consumer can use to pass messages to the producer. If you worry about the producer being unresponsive or freezing, run it in a separate thread and require it to send heartbeat messages every n seconds. If it does not send a heartbeat, kill it. (Note that determining whether a producer is actually freezing, and not just waiting for an external event, is often very hard to get right as well.)
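The shared-flag approach could look roughly like this. FlaggedProducer, requestStop() and demo() are made-up names, and the counter merely stands in for real indexing work.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical producer that checks a shared flag between work steps.
class FlaggedProducer implements Runnable {
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private final AtomicLong steps = new AtomicLong();

    @Override
    public void run() {
        // The flag is polled once per step, so the producer always
        // stops at a point where its data is consistent.
        while (!stopRequested.get()) {
            steps.incrementAndGet(); // stand-in for one indexing step
            Thread.yield();
        }
    }

    // Called by the consumer (e.g. the UI) to ask the producer to stop.
    void requestStop() {
        stopRequested.set(true);
    }

    long stepsDone() {
        return steps.get();
    }

    // Starts a producer, asks it to stop and reports whether it terminated.
    static boolean demo() {
        FlaggedProducer producer = new FlaggedProducer();
        Thread thread = new Thread(producer);
        thread.start();
        producer.requestStop();
        try {
            thread.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !thread.isAlive();
    }
}
```

The smaller each step is, the faster the producer reacts to the flag.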

Related

Java, why need to use synchronization? instead of using a single thread?

While reading about Java's synchronized keyword, I wondered: if the processing has to be synchronized anyway, why not just create a single thread (not the main thread) and process the work one item at a time, instead of creating multiple threads?
With 'synchronized', all other threads just wait while a single thread runs, so it seems that only one thread is working at any given time.
Please tell me what I'm missing.
I would very much appreciate it if you could give some use cases.
I read an example about accessing a bank account from 2 ATM devices, but it confused me even more: I think the blocking (lock) should be done on the database side, and I don't think 'synchronized' would work across multiple EC2 instances.
If my thinking is wrong, please correct me.

If all the code you run with several threads is within a synchronized block, then indeed it makes no difference vs. using a single thread.
However in general your code contains parts which can be run on several threads in parallel and parts which can't. The latter need synchronization but not the former. By using several threads you can speed up the "parallelisable" bits.
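As a small illustration, assuming the job is to sum the numbers 1..n (ParallelSum is a made-up class): each thread does its share of the additions without any locking, and only the final merge into the shared total is synchronized.

```java
// Hypothetical example: the per-thread work runs in parallel,
// only the merge of the partial results is synchronized.
class ParallelSum {
    private long total = 0;

    // The only part that must not run concurrently.
    private synchronized void addToTotal(long partial) {
        total += partial;
    }

    private synchronized long total() {
        return total;
    }

    // Sums 1..n using the given number of threads.
    static long sum(int n, int threads) {
        ParallelSum result = new ParallelSum();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int offset = t;
            workers[t] = new Thread(() -> {
                long partial = 0;
                // Parallelisable part: no shared state is touched here.
                for (int i = offset + 1; i <= n; i += threads) {
                    partial += i;
                }
                // Serialized part: one short synchronized call per thread.
                result.addToTotal(partial);
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return result.total();
    }
}
```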

Let's consider the following use-case :
Your application is an internet browser game. Every player has a score and can click a button. Every time a player clicks the button, their score is increased and their opponent's is decreased. The first player to reach 10 wins.
As per the nature of the game, and to single out a unique winner, you have to treat the two counter updates (and the check for the winner) as one atomic operation.
You'll have each player send clickEvents on their own thread and every event will be translated into the increase of the owner's counter, the check on whether the counter reached 10 and the decrease of the opponent's counter.
This is very easily done by synchronizing the method which handles modifying the counters : every concurrent thread will try to obtain the lock, and when they do, they'll execute the code (and finally release the lock).
The locking mechanism is pretty lightweight and only requires a single keyword of code.
If we follow your suggestion to implement another thread that will handle the execution, we'd have to implement the whole thread management logic (more code) and initialize that Thread (more resources), and even so, to guarantee fairness in the handling of events, you still need a way for your client threads to pass the event to your executor thread. The only way I see to do so is to use a BlockingQueue, which is also synchronized to prevent the race condition that naturally occurs when trying to add elements from two other threads.
I honestly don't see a way to resolve this very simple use-case without synchronization (or implementing your own locking algorithm that basically does the same).
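A minimal sketch of the synchronized handler for this game, assuming two players numbered 0 and 1 (ClickGame and its methods are invented for illustration):

```java
// Hypothetical two-player click game; players are numbered 0 and 1.
class ClickGame {
    private final int[] score = new int[2];
    private int winner = -1; // -1 until a player reaches 10

    // Increase, decrease and winner check form one atomic step,
    // so two simultaneous clicks can never both be declared the winner.
    synchronized void click(int player) {
        if (winner != -1) {
            return; // game already decided, ignore late clicks
        }
        score[player]++;
        score[1 - player]--;
        if (score[player] == 10) {
            winner = player;
        }
    }

    synchronized int winner() {
        return winner;
    }

    synchronized int score(int player) {
        return score[player];
    }
}
```

Each client thread simply calls click(player); the synchronized keyword on the method is all the locking code there is.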

You can have a single thread and process one-by-one (and this is done), but there are considerable overheads in doing so and it does not remove the need for synchronization.
You are in a situation where you are starting with multiple threads (for example, you have lots of simultaneous web sessions). You want to do a part of the processing in a single thread, let's say updating some common structure with some new data. You need to pass the new data to the single thread. How do you get it there? You would have to use some kind of message queue (or an equivalent thing) and have the single thread pick requests off the message queue, and that would have to be synchronized anyway. On top of that there is the overhead of managing the queue, plus the issue that you need to get a reply back from the single thread asynchronously. So you are back to square one.
This technique is used where the processing you need to do is considerable and you don't want to block your main threads for a long time.
In summary: having a single thread does not remove the need for synchronization.
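Here is roughly what that looks like in code, as a sketch under assumptions: the "common structure" is just a running total, and SingleWriter and Request are hypothetical names. Note that the queue (a thread-safe LinkedBlockingQueue) and the CompletableFuture used for the reply both do synchronization internally.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical single-threaded updater fed through a message queue.
class SingleWriter {
    // A request carries the new data plus a future for the asynchronous reply.
    private static final class Request {
        final int delta;
        final CompletableFuture<Integer> reply = new CompletableFuture<>();

        Request(int delta) {
            this.delta = delta;
        }
    }

    private final BlockingQueue<Request> inbox = new LinkedBlockingQueue<>();
    private int total = 0; // touched only by the worker thread

    SingleWriter() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    Request request = inbox.take(); // thread-safe hand-over
                    total += request.delta;
                    request.reply.complete(total);  // send the reply back
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true); // do not keep the JVM alive for this sketch
        worker.start();
    }

    // Called from any thread; the returned future completes with the new total.
    CompletableFuture<Integer> add(int delta) {
        Request request = new Request(delta);
        inbox.add(request);
        return request.reply;
    }
}
```

The caller does not block unless it chooses to wait on the future, which is the case the answer describes as worthwhile when the processing is considerable.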

Does a 'blocking' queue defeat the very purpose of multi threading

The ArrayBlockingQueue will block the producer thread if the queue is full and it will block the consumer thread if the queue is empty.
Doesn't this concept of blocking go against the very idea of multithreading? Suppose I have a 'main' thread and I want to delegate all logging activities to another thread. So inside my main thread, I create a Runnable to log the output and put the Runnable on an ArrayBlockingQueue. The whole purpose of doing this is to have the 'main' thread return immediately without wasting any time on an expensive logging operation.
But if the queue is full, the main thread will be blocked and will wait until a spot is available. So how does it help us?

The queue doesn't block out of spite, it blocks to introduce an additional quality into the system. In this case, it's prevention of starvation.
Picture a set of threads, one of which produces work units really fast. If the queue were to be allowed unbounded growth, potentially, the "rapid producer" queue could hog all the producing capacity. Sometimes, prevention of such side-effects is more important than having all threads unblocked.

I think this is the designer's decision. If the designer chooses blocking mode, ArrayBlockingQueue provides it with the put method. If the designer doesn't want blocking mode, ArrayBlockingQueue has the offer method, which returns false when the queue is full, but then the designer needs to decide what to do with the rejected logging event.
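For example, the non-blocking variant could be sketched like this (OfferDemo is a made-up class, and "dropping" rejected events is only one possible policy):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class OfferDemo {
    // Returns how many of the given number of log events were rejected
    // by a queue with the given capacity when nobody is consuming.
    static int rejected(int capacity, int events) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        int dropped = 0;
        for (int i = 0; i < events; i++) {
            // offer() never blocks: it returns false when the queue is full.
            if (!queue.offer("log event " + i)) {
                dropped++; // here the caller decides what to do with the event
            }
        }
        return dropped;
    }
}
```

With put() instead of offer(), the loop would simply wait at the first event that does not fit.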

In your example I would consider blocking to be a feature: It prevents an OutOfMemoryError.
Generally speaking, one of your threads is just not fast enough to cope with the assigned load. So the others must slow down somehow in order not to endanger the whole application.
On the other hand, if the load is balanced, the queue will not block.

Blocking is a necessary function of multithreading. You must block to have synchronized access to data. It does not defeat the purpose of multithreading.
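I would suggest throwing an exception when the producer attempts to submit an item to a queue which is full. ArrayBlockingQueue's add method already behaves this way (it throws IllegalStateException when the queue is full), and remainingCapacity() lets you test for free space beforehand.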
This would allow the invoking code to decide how it wants to handle a full queue.
If execution order when processing items from the queue is unimportant, I recommend using a threadpool (known as an ExecutorService in Java).
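A small sketch of the exception-based approach, using ArrayBlockingQueue.add(), which throws IllegalStateException on a full queue. FullQueueDemo and the "accepted"/"rejected" strings are made up; real code would do something more useful in the catch block.

```java
import java.util.concurrent.ArrayBlockingQueue;

class FullQueueDemo {
    // Tries to enqueue the item and reports what happened.
    static String submit(ArrayBlockingQueue<String> queue, String item) {
        try {
            queue.add(item); // throws IllegalStateException if the queue is full
            return "accepted";
        } catch (IllegalStateException e) {
            // The invoking code decides how to handle a full queue.
            return "rejected";
        }
    }
}
```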

It depends on the nature of your multi threading philosophy. For those of us who favour Communicating Sequential Processes a blocking queue is nearly perfect. In fact, the ideal would be one where no message can be put into the queue at all unless the receiver is ready to receive it.
So no, I don't think that a blocking queue goes against the very purpose of multi-threading. In fact, the scenario that you describe (the main thread eventually getting stalled) is a good illustration of the major problem with the actor-model of multi-threading; you've no idea whether or not it will deadlock / block, and you can't exhaustively test for it either.
In contrast, imagine a blocking queue that is zero messages deep. That way for the system to work at all you'd have to find a way to ensure that the logger is always guaranteed to be able to receive a message from the main thread. That's CSP. It might mean that in your hypothetical logger thread you have to have application defined buffering (as opposed to some framework developer's best guess of how deep a FIFO should be), a fast I/O subsystem, checks for keeping up, ways of dealing with falling behind, etc. In short it doesn't let you get away with it, you're forced to address every aspect of your system's performance.
That is of course harder, but that way you end up with a system that's definitely OK rather than the questionable "maybe" that you have if your blocking queues are an unknown number of messages deep.
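In Java, the closest standard thing to a "zero messages deep" queue is java.util.concurrent.SynchronousQueue. The sketch below (RendezvousDemo is a hypothetical name) shows that a message cannot be handed over unless a receiver is waiting for it.

```java
import java.util.concurrent.SynchronousQueue;

class RendezvousDemo {
    // With no receiver waiting, a zero-capacity queue accepts nothing.
    static boolean offerWithoutReceiver() {
        SynchronousQueue<String> channel = new SynchronousQueue<>();
        return channel.offer("msg");
    }

    // put() blocks until the logger thread is ready to take the message.
    static String handOff(String message) {
        SynchronousQueue<String> channel = new SynchronousQueue<>();
        String[] received = new String[1];
        Thread logger = new Thread(() -> {
            try {
                received[0] = channel.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        logger.start();
        try {
            channel.put(message);
            logger.join(); // after join, the write to received[0] is visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during hand-off", e);
        }
        return received[0];
    }
}
```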

It sounds like you have the general idea right of why you'd use something like an ArrayBlockingQueue to talk between threads.
Having a blocking queue gives you the option to do something different in case something goes wrong with your background worker threads, rather than blindly adding more requests to the queue. If there is room in the queue, there is no blocking.
For your specific use case, though, I would use ExecutorService rather than reading/writing queues directly, which creates a pool of background worker threads:
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ExecutorService.html
pool = Executors.newFixedThreadPool(poolSize);
pool.submit(myRunnable);

A multithreaded program is non-deterministic insofar as you can't say beforehand: n producer actions will take exactly as long as m consumer actions. Therefore, synchronization between n producers and m consumers is necessary in every case.
You'll want to choose the queue size so that the number of active producers and consumers is maximized most of the time. But the thread model of java does not guarantee that any consumer will run unless it is the only unblocked thread. (Yet, of course, on multi-core CPUs it is very likely that the consumer will run).

You have to make a choice about what to do when a queue is full. In the case of an ArrayBlockingQueue's put method, that choice is to wait.
Another option would be to just throw away new Objects if the queue was full; you can achieve this with offer.
You have to make a trade-off.

Is there a non-reentrant ReadWriteLock I can use?

I need a ReadWriteLock that is NOT reentrant, because the lock may be released by a different thread than the one that acquired it. (I realized this when I started to get IllegalMonitorStateException intermittently.)
I'm not sure if non-reentrant is the right term. A ReentrantLock allows the thread that currently holds the lock to acquire it again. I do NOT want this behaviour, therefore I'm calling it "non-reentrant".
The context is that I have a socket server using a thread pool. There is NOT a thread per connection. Requests may get handled by different threads. A client connection may need to lock in one request and unlock in another request. Since the requests may be handled by different threads, I need to be able to lock and unlock in different threads.
Assume for the sake of this question that I need to stay with this configuration and that I do really need to lock and unlock in different requests and therefore possibly different threads.
It's a ReadWriteLock because I need to allow multiple "readers" OR an exclusive "writer".
It looks like this could be written using AbstractQueuedSynchronizer but I'm afraid if I write it myself I'll make some subtle mistake. I can find various examples of using AbstractQueuedSynchronizer but not a ReadWriteLock.
I could take the OpenJDK ReentrantReadWriteLock source and try to remove the reentrant part but again I'm afraid I wouldn't get it quite right.
I've looked in Guava and Apache Commons but didn't find anything suitable. Apache Commons has RWLockManager which might do what I need but I'm not sure and it seems more complex than I need.

A Semaphore allows different threads to perform the acquire and release of permits. An exclusive write is equivalent to having all of the permits, as the thread waits until all have been released and no additional permits can be acquired by other threads.
final int PERMITS = Integer.MAX_VALUE;
Semaphore semaphore = new Semaphore(PERMITS);
// read
semaphore.acquire(1);
try {
    ...
} finally {
    semaphore.release(1);
}
// write
semaphore.acquire(PERMITS);
try {
    ...
} finally {
    semaphore.release(PERMITS);
}
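Wrapped into a class, the same idea might look like the sketch below. SemaphoreRwLock and its method names are invented. Creating the semaphore in fair mode is my own assumption (so that a waiting writer is not overtaken forever by new readers), and acquireUninterruptibly is used only to keep the example free of checked exceptions.

```java
import java.util.concurrent.Semaphore;

// Hypothetical read/write lock that may be released by a different thread.
class SemaphoreRwLock {
    private static final int PERMITS = Integer.MAX_VALUE;
    // Fair mode is an assumption of this sketch, see the note above.
    private final Semaphore semaphore = new Semaphore(PERMITS, true);

    void lockRead()    { semaphore.acquireUninterruptibly(1); }
    void unlockRead()  { semaphore.release(1); }
    void lockWrite()   { semaphore.acquireUninterruptibly(PERMITS); }
    void unlockWrite() { semaphore.release(PERMITS); }

    // Non-blocking attempt to become the exclusive writer.
    boolean tryLockWrite() { return semaphore.tryAcquire(PERMITS); }

    // Takes a read lock on one thread and releases it on another.
    static boolean demo() {
        SemaphoreRwLock lock = new SemaphoreRwLock();
        lock.lockRead();                               // acquired on the calling thread
        boolean writerBlocked = !lock.tryLockWrite();  // a reader is active
        Thread other = new Thread(lock::unlockRead);   // released on another thread
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        boolean writerAdmitted = lock.tryLockWrite();  // all permits are free again
        return writerBlocked && writerAdmitted;
    }
}
```

Unlike a ReentrantReadWriteLock, nothing here tracks ownership, so an unmatched release() goes unnoticed; the bookkeeping is entirely up to the caller.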

I know you've already accepted another answer. But I still think that you are going to create quite a nightmare for yourself. Eventually, a client is going to fail to come back and release those permits and you'll begin to wonder why the "writer" never writes.
If I were doing it, I would do it like this:
Client issues a request to start a transaction
The initial request creates a task (Runnable/Callable) and places it in an Executor for execution
The initial request also registers that task in a Map by transaction id
Client issues the second request to close the transaction
The close request finds the task by transaction id in a map
The close request calls a method on the task to indicate that it should close (probably a signal on a Condition or if data needs to be passed, placing an object in a BlockingQueue)
Now, the transaction task would have code like this:
public void run() {
    readWriteLock.readLock().lock();
    try {
        // do stuff for initializing this transaction
        if (condition.await(someDurationAsLong, someTimeUnit)) {
            // do the rest of the transaction stuff
        } else {
            // do some other stuff to back out the transaction
        }
    } finally {
        readWriteLock.readLock().unlock();
    }
}

Not entirely sure what you need, especially why it should be a read/write lock, but if you have tasks that need to be handled by many threads, and you don't want them to be processed/accessed concurrently, I'd actually use a ConcurrentMap (etc.).
You can remove the task from the map or substitute it with a special "lock object" to indicate it's locked. You could return the task with an updated state to the map to let another thread take over, or alternatively you can pass the task directly to the next thread and let it return the task to the map instead.
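The "remove it from the map" variant could be sketched as follows. TaskRegistry is a made-up class and tasks are plain strings here just to keep it short.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical registry: a task is "locked" while it is absent from the map.
class TaskRegistry {
    private final ConcurrentMap<String, String> tasks = new ConcurrentHashMap<>();

    void register(String id, String task) {
        tasks.put(id, task);
    }

    // remove() is atomic: if several threads race for the same id,
    // exactly one gets the task and the others get null.
    String claim(String id) {
        return tasks.remove(id);
    }

    // Any thread may hand the (possibly updated) task back.
    void release(String id, String task) {
        tasks.put(id, task);
    }
}
```

Because claiming and releasing are just map operations, it does not matter which thread performs which step.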

They seem to have dropped the ball on this one by deprecating com.sun.corba.se.impl.orbutil.concurrent.Mutex;
I mean, who in his right mind thinks that we won't need non-reentrant locks? Here we are, wasting our time arguing over the definition of reentrant (which can slightly change in meaning per framework, btw). Yes, I want to tryLock on the same thread; is that such a bad thing? It won't deadlock because I'll else out of it. A non-reentrant lock that locks in the same thread can be very useful to prevent errors in GUI apps where the user presses the same button rapidly and repeatedly. Been there, done that, Qt was right... again.
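For what it's worth, a one-permit java.util.concurrent.Semaphore behaves like such a non-reentrant lock. A sketch for the button case (ButtonGuard and onClick are invented names, not an existing API):

```java
import java.util.concurrent.Semaphore;

// Hypothetical guard that refuses re-entry, even from the same thread.
class ButtonGuard {
    private final Semaphore busy = new Semaphore(1);

    // Runs the action unless one is already running; returns whether it ran.
    boolean onClick(Runnable action) {
        if (!busy.tryAcquire()) {
            return false; // already handling a click: "else out of it"
        }
        try {
            action.run();
            return true;
        } finally {
            busy.release();
        }
    }

    // A click arriving while the handler is still running is refused.
    static boolean nestedClickRefused() {
        ButtonGuard guard = new ButtonGuard();
        boolean[] inner = new boolean[1];
        boolean outer = guard.onClick(() -> inner[0] = guard.onClick(() -> { }));
        return outer && !inner[0];
    }
}
```

With a ReentrantLock in place of the semaphore, the nested tryLock() on the same thread would succeed and the second click would run.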

SwingWorker synchronized method queue blocking or what?

Theoretical question. If I have two SwingWorkers and an outputObject with method
public synchronized void outputToPane(String output)
If each SwingWorker has a loop in it as shown:
// SwingWorker1
while (true) {
    outputObject.outputToPane("garbage");
}
// SwingWorker2
Integer i = 0;
while (true) {
    outputObject.outputToPane(i.toString());
    i++;
}
How would those interact? does the outputToPane method receive an argument from one thread and block the other one until it finishes with the first, or does it build a queue of tasks that will execute in the order received, or some other option?
The reason I ask:
I have two threads that will be doing some heavy number crunching, one with a non-pausable data stream and the other from a file. I would like them both to output to a central messaging area when they hit certain milestones; however, I CANNOT risk the data stream getting blocked while it waits for the other thread to finish with the output. I will risk losing data then.

synchronized only guarantees mutual exclusion. It is not fair, which in practice means that your workers might alternate quite nicely, or the first one might get precedence and block the second one completely until finished, or anything in between.
See the ReentrantLock docs for more about fairness. Maybe you could consider using it instead of synchronized. Probably an even better alternative would be to use a Queue.
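A sketch of the queue alternative, assuming the workers only need to drop off text messages and a single thread (for instance the Swing event thread, driven by a timer) displays them later. MessageArea, post() and drain() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical message area: workers post, one output thread drains.
class MessageArea {
    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();

    // Called by the workers. ConcurrentLinkedQueue is unbounded and
    // non-blocking, so a worker never waits for the other worker or the display.
    void post(String message) {
        pending.offer(message);
    }

    // Called by the single output thread: collects everything posted so far.
    List<String> drain() {
        List<String> out = new ArrayList<>();
        for (String m = pending.poll(); m != null; m = pending.poll()) {
            out.add(m);
        }
        return out;
    }

    // Two threads post concurrently; returns how many messages arrived.
    static int postFromTwoThreads(int perThread) {
        MessageArea area = new MessageArea();
        Runnable poster = () -> {
            for (int i = 0; i < perThread; i++) {
                area.post("milestone " + i);
            }
        };
        Thread a = new Thread(poster);
        Thread b = new Thread(poster);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        return area.drain().size();
    }
}
```

The trade-off is memory: if the display falls behind for long, the unbounded queue keeps growing.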

I would advise you to have two output objects in your messaging area, because if one thread starts to modify the output then the other one will have to wait for it to finish. Even if you can optimize it to make it fast enough, the actual display of info would make your threads slow each other down over time.
Although you might try to synchronize them, the result might not always be 100% safe.

What collection supports multiple simultaneous insertions?

We are developing a Java application with several worker threads. These threads will have to deliver a lot of computation results to our UI thread. The order in which the results are delivered does not matter.
Right now, all threads simply push their results onto a synchronized Stack - but this means that every thread must wait for the other threads before results can be delivered.
Is there a data structure that supports simultaneous insertions with each insertion completing in constant time?
Thanks,
Martin

ConcurrentLinkedQueue is designed for high contention. Producers enqueue stuff on one end and consumers collect elements at the other end, so everything will be processed in the order it's added.
ArrayBlockingQueue is better for lower contention, with lower space overhead.
Edit: Although that's not what you asked for. Simultaneous inserts? You may want to give every thread one output queue (say, an ArrayBlockingQueue) and then have the UI thread poll the separate queues. However, I'd think you'll find one of the two above Queue implementations sufficient.

"Right now, all threads simply push their results onto a synchronized Stack - but this means that every thread must wait for the other threads before results can be delivered."
Do you have any evidence indicating that this is actually a problem? If the computation performed by those threads is even the least little bit complex (and you don't have literally millions of threads), then lock contention on the result stack is simply a non-issue because when any given thread delivers its results, all others are most likely busy doing their computations.

Take a step back and evaluate whether performance is the key design consideration here. Don't think, know: does profiling back it up?
If not, I'd say a bigger concern is clarity and readability of design, and not introducing new code to maintain. It just so happens that, if you're using Swing, there is a library for doing exactly what you're trying to do, called SwingWorker.

Take a look at java.util.concurrent.ConcurrentLinkedQueue, java.util.concurrent.ConcurrentHashMap or java.util.concurrent.ConcurrentSkipListSet. They might do what you need. ConcurrentSkipListSet, for instance, claims to have "expected average log(n) time cost for the contains, add and remove operations and their variants. Insertion, removal, and access operations safely execute concurrently by multiple threads."

Two other patterns you might want to look at are
each thread has its own collection, when polled it returns the collection and creates a new one, so the collection only holds the pending items between polls. The thread needs to protect operations on its collection, but there is no contention between threads. This is blocking (each thread cannot add to its collection while the UI thread pulls updates from it), but can reduce contention (no contention between threads).
each thread has its own collection, and appends the results to a common queue which is protected using a Lock.tryLock(). The thread continues processing if it fails to acquire the lock. This makes it less likely that a thread will block waiting for the shared queue.
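The second pattern might look roughly like this. SharedResults, tryFlush() and snapshot() are made-up names, and results are plain Integers for brevity; each worker keeps its own local list and calls tryFlush(local) whenever it has something to deliver.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical shared result list that workers flush into without waiting.
class SharedResults {
    private final ReentrantLock lock = new ReentrantLock();
    private final List<Integer> shared = new ArrayList<>();

    // Moves the caller's local results into the shared list if the lock
    // is free right now. Never waits: on failure the worker keeps its
    // results, carries on computing and retries later.
    boolean tryFlush(List<Integer> local) {
        if (!lock.tryLock()) {
            return false;
        }
        try {
            shared.addAll(local);
            local.clear();
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Used by the UI thread; this one does wait for the lock.
    List<Integer> snapshot() {
        lock.lock();
        try {
            return new ArrayList<>(shared);
        } finally {
            lock.unlock();
        }
    }
}
```

A worker should make one last blocking flush (or keep retrying) before it exits, otherwise results still sitting in its local list are lost.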
