Do I need to synchronize my calls to BlockingQueue (java)?


I want to store a list of objects in a thread-safe manner while maintaining priority. Originally I started out using a BlockingQueue for this, as it's thread-safe and has the ability to maintain a custom priority.
I'm wondering if I need to synchronize my methods? My code looks like:
void addToQueue(SomeObject obj) {
    ... put it on my priority queue
    ... do some logging
}
What I have noticed is the logging is happening out of order, when accessing addToQueue from multiple threads. So I wrapped my method like so:
void addToQueue(SomeObject obj) {
    synchronized (myMutex) {
        ... put it on my priority queue
        ... do some logging
    }
}
This seemed to keep the logging in order. So now I've come to the conclusion that if I'm going this route, then maybe my code would be more efficient by not using a BlockingQueue but instead use a Set or List and manage the priority myself.
Maybe I have some misunderstanding of the BlockingQueue.

A Queue that additionally supports operations that wait for the queue to become non-empty when retrieving an element, and wait for space to become available in the queue when storing an element.
This is the javadoc for BlockingQueue. You use it if you need this blocking behavior, otherwise you don't.
BlockingQueue is an interface and does not itself maintain any priority; common implementations such as LinkedBlockingQueue and ArrayBlockingQueue are strictly first-in-first-out. Perhaps you are using PriorityBlockingQueue?
Coming to your pseudocode:
void addToQueue(SomeObject obj) {
    ... put it on my priority queue
    ... do some logging
}
The queue is thread-safe, but that only means that multiple threads can concurrently run the "put it on my priority queue" step without any data corruption. It does not guarantee any of the following:
If there are multiple threads blocked, which one will succeed first.
If a thread X completes the put before a thread Y, then thread X will also complete the logging before thread Y.
If you need all of addToQueue to occur without interleaving from other threads, then you need to synchronize. Note that you can use the queue object itself:
void addToQueue(SomeObject obj) {
    synchronized (queue) {
        ... put it on my priority queue
        ... do some logging
    }
}
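As a rough, self-contained illustration of the synchronized variant above: the class name LoggedPriorityQueue, the in-memory log list, and the demo helper are all made up for this sketch, and Integer stands in for SomeObject. Because the put and the log entry happen under one lock, the recorded queue size grows by exactly one per log line, even with several threads adding at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;

class LoggedPriorityQueue {
    private final PriorityBlockingQueue<Integer> queue = new PriorityBlockingQueue<>();
    private final List<String> log = new ArrayList<>(); // guarded by "lock"
    private final Object lock = new Object();

    void addToQueue(int priority) {
        synchronized (lock) {
            // put() never blocks here: PriorityBlockingQueue is unbounded
            queue.put(priority);
            log.add("added " + priority + ", size now " + queue.size());
        }
    }

    // Smallest value first (natural ordering); returns null when empty.
    Integer pollHighestPriority() {
        return queue.poll();
    }

    List<String> logSnapshot() {
        synchronized (lock) {
            return new ArrayList<>(log);
        }
    }

    // Hypothetical helper: several threads add concurrently, then the log is returned.
    static List<String> demo(int threads, int perThread) {
        LoggedPriorityQueue q = new LoggedPriorityQueue();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    q.addToQueue(i);
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return q.logSnapshot();
    }
}
```

Without the synchronized block the queue itself would still be safe, but two threads could interleave between the put and the log call, which is the out-of-order logging described in the question.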

http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html
As with other concurrent collections, actions in a thread prior to placing an object into a BlockingQueue happen-before actions subsequent to the access or removal of that element from the BlockingQueue in another thread.
If you want to use that safety to get ordered logging, you have to log before putting items into the queue and after taking an item from the queue.
I wouldn’t use synchronized for getting ordered logging. Multi-threading means parallel execution, and that implies that certain actions don’t have an ordering. Log records can have a time-stamp, and seeing them in the wrong order, e.g. in the console, looks like a minor glitch to me that is not worth sacrificing the advantages of parallel execution for.

Related

Wait till a Blocking Queue is full

I'm looking for a way to synchronize multiple asynchronous operations. I'd like to use a BlockingQueue with a size equal to the number of my operations, but how can I wait till the queue is full?
I'm looking for something like a reversed BlockingQueue.
I need to gather the results of each thread at the end.
The AsyncHandler is fixed; it already has a thread executor underneath, and I cannot start new threads.
//3 Times
makeAsync(new AsyncHandler() {
onSuccess() {
..
queue.put(result)
}
onFailure() {
..
}
});
//Blocking till the Queue is full
List<Results> = queue.takeAll()
Bonus Question: I need a way to end the wait when one of my Requests fails

I've never had need to do this sort of thing, but you might have some luck using a CountDownLatch or CyclicBarrier from your various threads.

What you describe with
//Blocking till the Queue is full
List<Results> results = queue.takeAll();
does not differ semantically from “take as many items as the queue’s capacity”. If you know the capacity you can achieve this by:
// preferably a constant which you also use to construct the bounded queue
int capacity;
…
List<Results> results = new ArrayList<>(capacity);
while (results.size() < capacity) {
    // take() blocks until at least one item is available
    results.add(queue.take());
    // then grab whatever else is already queued, without blocking
    queue.drainTo(results, capacity - results.size());
}
This will block until it has received as many items as the capacity, which is, as said, the same as waiting for the queue to become full (has a size equal to its capacity) and then taking all items. Note that drainTo() on its own never blocks, which is why the loop waits in take(). The only difference is that the event of the queue becoming full is not guaranteed to happen, e.g. if you intend your async operations to offer items until the queue is full, it does not work this way.
If you don’t know the capacity, you are out of luck. There is not even a guarantee that an arbitrary BlockingQueue is bounded; that is, it might have an unlimited capacity.
On the other hand, if the asynchronous operations are able to detect when they have finished, they could simply collect the items in a list locally and put the entire list into a BlockingQueue<List<Results>> as a single item once they are done. Then your code waiting for it needs only a single take to get the entire list.

If you're using Java 8, do the following:
With each call to makeAsync, create a CompletableFuture<Result> instance and make it available to the AsyncHandler, and have the caller keep a reference too, say in a list.
When an async task completes normally, have it call complete(result) on its CompletableFuture instance.
When an async task completes with an error, have it call completeExceptionally(exception) on its CompletableFuture instance.
After initiating all the asynchronous tasks, have the caller call CompletableFuture.allOf(cfArray).join(). Unfortunately this takes an array, not a list, so you have to convert. The join() call will throw an exception if any one of the tasks completed with an error. Otherwise, you can collect the results from the individual CompletableFuture instances by calling their get() methods.
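A minimal sketch of these steps follows. The AsyncHandler interface and makeAsync method here are hypothetical stand-ins for the asker's API (the real signatures are not shown in the question), and String stands in for the result type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class GatherResults {
    // Hypothetical stand-in for the question's AsyncHandler.
    interface AsyncHandler {
        void onSuccess(String result);
        void onFailure(Exception error);
    }

    // Hypothetical stand-in for makeAsync: invokes the handler on another thread.
    static void makeAsync(int id, boolean fail, AsyncHandler handler) {
        new Thread(() -> {
            if (fail) {
                handler.onFailure(new IllegalStateException("request " + id + " failed"));
            } else {
                handler.onSuccess("result-" + id);
            }
        }).start();
    }

    // Starts "count" requests; the one whose id equals failingId (if any) fails.
    static List<String> gather(int count, int failingId) {
        List<CompletableFuture<String>> futures = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            CompletableFuture<String> cf = new CompletableFuture<>();
            futures.add(cf);
            makeAsync(i, i == failingId, new AsyncHandler() {
                public void onSuccess(String result) { cf.complete(result); }
                public void onFailure(Exception error) { cf.completeExceptionally(error); }
            });
        }
        // allOf takes an array; join() throws CompletionException if any future failed.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        List<String> results = new ArrayList<>();
        for (CompletableFuture<String> f : futures) {
            results.add(f.join());
        }
        return results;
    }
}
```

One caveat for the bonus question: allOf(...).join() reports a failure only after every future has completed, so it does not end the wait early on the first error.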
If you don't have Java 8, you'll have to roll your own mechanism. Initialize a CountDownLatch to the number of async tasks you're going to fire off. Have each async task store its result (or an exception, or some other means of indicating failure) into a thread-safe data structure and then decrement (countDown()) the latch. Have the caller wait for the latch to reach zero and then collect the results and errors. This isn't terribly difficult, but you have to determine a means for storing valid results as well as recording whether an error occurred, and also maintain a count manually.
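The latch-based approach might look roughly like this. The class name LatchGather, the two queues, and the simulated tasks are assumptions made for the sketch; in real code the countDown() call would sit in the onSuccess and onFailure callbacks.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class LatchGather {
    final Queue<String> results = new ConcurrentLinkedQueue<>();
    final Queue<Exception> errors = new ConcurrentLinkedQueue<>();

    // Runs taskCount simulated tasks; the task with id == failingId fails.
    // Returns true if all tasks reported back within five seconds.
    boolean run(int taskCount, int failingId) {
        CountDownLatch latch = new CountDownLatch(taskCount);
        for (int i = 0; i < taskCount; i++) {
            final int id = i;
            new Thread(() -> {
                try {
                    if (id == failingId) {
                        throw new IllegalStateException("task " + id + " failed");
                    }
                    results.add("result-" + id);
                } catch (Exception e) {
                    errors.add(e);
                } finally {
                    latch.countDown(); // always count down, success or failure
                }
            }).start();
        }
        try {
            return latch.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

After run() returns true, the caller can inspect results and errors; the order of results depends on thread scheduling.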

If you can modify makeAsync(), then it's as simple as counting down a CountDownLatch each time you put an element in the queue and having the main thread wait on that CountDownLatch.
If unfortunately you cannot modify makeAsync(), then simply wrap the queue, give it a CountDownLatch, and override the add() method to count down this latch. The main method just waits for the latch to reach zero.
Having said the above, your program structure does not look well organized.

When should I use SynchronousQueue over LinkedBlockingQueue

new SynchronousQueue()
new LinkedBlockingQueue(1)
What is the difference? When I should use SynchronousQueue against LinkedBlockingQueue with capacity 1?

The SynchronousQueue is more of a handoff, whereas the LinkedBlockingQueue just allows a single element. The difference is that the put() call to a SynchronousQueue will not return until there is a corresponding take() call, but with a LinkedBlockingQueue of size 1, the put() call (to an empty queue) will return immediately.
I can't say that I have ever used the SynchronousQueue directly myself, but it is the default BlockingQueue used for the Executors.newCachedThreadPool() methods. It's essentially the BlockingQueue implementation for when you don't really want a queue (you don't want to maintain any pending data).

As far as I understand code above do the same things.
No, the code is not the same at all.
SynchronousQueue requires a waiting consumer for offer() to succeed. LinkedBlockingQueue keeps the item, and offer() returns immediately even if there is no waiter.
SynchronousQueue is useful for handing off tasks. Imagine you have a list of pending tasks and 3 threads waiting on the queue: try offer() with 1/4 of the list, and if it is not accepted, the current thread can run the task on its own. [The last 1/4 should be handled by the current thread, if you wonder why 1/4 and not 1/3.]
Think of trying to hand the task to a worker: if none is available, you have the option to execute the task on your own (or throw an exception). On the contrary, with LinkedBlockingQueue, leaving the task in the queue doesn't guarantee any execution.
Note: the case with consumers and publishers is the same, i.e. the publisher may block and wait for consumers, but after offer or poll returns, it is certain the task/element will be handled.
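The "hand it to a worker or run it yourself" idea can be sketched as follows. The HandOff class and its submit method are invented for illustration; only offer() is the real BlockingQueue API.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;

class HandOff {
    // Tries to hand the task over; if nobody accepts it, the caller runs it.
    static String submit(BlockingQueue<Runnable> queue, Runnable task) {
        if (queue.offer(task)) {
            return "handed to queue";
        }
        task.run();
        return "ran in caller";
    }

    public static void main(String[] args) {
        Runnable task = () -> System.out.println("task executed");
        // No thread is waiting in take(), so the SynchronousQueue refuses the offer.
        System.out.println(submit(new SynchronousQueue<Runnable>(), task));
        // An empty LinkedBlockingQueue of capacity 1 accepts it, but nothing runs it.
        System.out.println(submit(new LinkedBlockingQueue<Runnable>(1), task));
    }
}
```

With the SynchronousQueue, a successful offer() means a consumer already holds the task; with the LinkedBlockingQueue, success only means the task is stored.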

One reason to use SynchronousQueue is to improve application performance. If you must have a hand-off between threads, you will need some synchronization object. If you can satisfy the conditions required for its use, SynchronousQueue is the fastest synchronization object I have found. Others agree. See: Implementation of BlockingQueue: What are the differences between SynchronousQueue and LinkedBlockingQueue

[Just trying to put it in (possibly) clearer words.]
I believe the SynchronousQueue API docs state things very clearly:
A blocking queue in which each insert operation must wait for a corresponding remove operation by another thread, and vice versa.
A synchronous queue does not have any internal capacity, not even a capacity of one. You cannot peek at a synchronous queue because an element is only present when you try to remove it; you cannot insert an element (using any method) unless another thread is trying to remove it; you cannot iterate as there is nothing to iterate.
The head of the queue is the element that the first queued inserting thread is trying to add to the queue; if there is no such queued thread then no element is available for removal and poll() will return null.
And BlockingQueue API docs:
A Queue that additionally supports operations that wait for the queue to become non-empty when retrieving an element, and wait for space to become available in the queue when storing an element.
So the difference is clear but subtle, especially the third point below:
If the queue is empty when you are retrieving from a BlockingQueue, the operation blocks until a new element is inserted. Also, if the queue is full when you are inserting into the BlockingQueue, the operation blocks until an element is removed from the queue and space is made for the new element. However, note that in SynchronousQueue an operation blocks until the opposite operation (insert and remove are opposites of each other) occurs on another thread. So, unlike other BlockingQueue implementations, the blocking depends on the existence of the opposite operation, instead of the existence or non-existence of an element.
As the blocking depends on the existence of the opposite operation, the element never really gets inserted in the queue. That's why the second point holds: "A synchronous queue does not have any internal capacity, not even a capacity of one."
As a consequence, peek() always returns null (again, check the API doc) and iterator() returns an empty iterator in which hasNext() always returns false. (API doc). However, note that the poll() method neatly retrieves and removes the head of this queue, if another thread is currently making an element available and if no such thread exists, it returns null. (API doc)
Finally, a small note, both SynchronousQueue and LinkedBlockingQueue classes implement BlockingQueue interface.
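The documented "always empty" behavior is easy to observe from a single thread. The class and method names below are made up for the sketch; the queue calls are the standard ones.

```java
import java.util.concurrent.SynchronousQueue;

class EmptyByDesign {
    // Queries a SynchronousQueue while no other thread is offering an element.
    static String describe() {
        SynchronousQueue<String> queue = new SynchronousQueue<>();
        return queue.peek()                    // always null
                + "," + queue.poll()           // null: nobody is handing over an element
                + "," + queue.size()           // always 0
                + "," + queue.iterator().hasNext() // always false
                + "," + queue.remainingCapacity(); // always 0
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints "null,null,0,false,0"
    }
}
```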

SynchronousQueue works in a similar fashion with following major differences:
1) The size of SynchronousQueue is 0
2) put() method will only insert an element if take() method will be able to fetch that element from the queue at the same moment i.e. an element cannot be inserted if the consumer take() call is going to take some time to consume it.
SynchronousQueue - Insert only when someone will receive it at that moment itself.

Synchronous queues are basically used for handoff purposes. They do not have any capacity, and a put operation blocks until some other thread performs a take operation.
If we want to safely share a variable between two threads, we can put that variable in a synchronous queue and let the other thread take it from the queue.
Code Sample from https://www.baeldung.com/java-synchronous-queue
ExecutorService executor = Executors.newFixedThreadPool(2);
SynchronousQueue<Integer> queue = new SynchronousQueue<>();
Runnable producer = () -> {
    Integer producedElement = ThreadLocalRandom
        .current()
        .nextInt();
    try {
        // blocks until the consumer calls take()
        queue.put(producedElement);
    } catch (InterruptedException ex) {
        ex.printStackTrace();
    }
};
Runnable consumer = () -> {
    try {
        Integer consumedElement = queue.take();
    } catch (InterruptedException ex) {
        ex.printStackTrace();
    }
};
executor.execute(producer);
executor.execute(consumer);
// shut down first, then wait for both tasks to finish
executor.shutdown();
executor.awaitTermination(500, TimeUnit.MILLISECONDS);
assertEquals(0, queue.size());
They are also used in the cached thread pool (Executors.newCachedThreadPool()) to achieve the effect of practically unlimited (Integer.MAX_VALUE) thread creation as tasks arrive.
The cached pool has a corePoolSize of 0 and a maximumPoolSize of Integer.MAX_VALUE, with a SynchronousQueue as its work queue.
When a task is submitted, the executor offers it to the queue. Since the queue has no capacity, the offer succeeds only if an idle worker thread is already waiting to take a task; otherwise the pool creates a new thread to run it. This continues until thread creation reaches maximumPoolSize. Threads that stay idle longer than the keep-alive time (60 seconds for the cached pool) are terminated, and new ones are created on demand without exceeding maximumPoolSize.
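To make the cached-pool behavior concrete, here is a small sketch that builds a ThreadPoolExecutor with the same parameters that Executors.newCachedThreadPool() is documented to use and counts the threads it creates. The class name and the latch that keeps the tasks busy are assumptions for the demonstration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class CachedPoolSketch {
    // Submits "tasks" long-running tasks and reports how many threads the pool created.
    static int poolSizeAfterSubmitting(int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                0, Integer.MAX_VALUE,          // corePoolSize, maximumPoolSize
                60L, TimeUnit.SECONDS,         // idle threads die after 60 seconds
                new SynchronousQueue<Runnable>());
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                // Every worker is busy waiting on the latch, so no thread is polling
                // the queue and each execute() has to start a new thread.
                pool.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return pool.getPoolSize();
        } finally {
            release.countDown(); // let the tasks finish
            pool.shutdown();
        }
    }
}
```

Calling poolSizeAfterSubmitting(4) yields one thread per task, since none of the earlier workers is free to accept a handoff.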

How to access the underlying queue of a ThreadPoolExecutor in a thread safe way

The getQueue() method provides access to the underlying blocking queue in the ThreadPoolExecutor, but this does not seem to be safe.
A traversal over the queue returned by this function might miss updates made to the queue by the ThreadPoolExecutor.
"Method getQueue() allows access to the work queue for purposes of monitoring and debugging. Use of this method for any other purpose is strongly discouraged."
What would you do if you wanted to traverse the workQueue used by the ThreadPoolExecutor? Or is there an alternate approach?
This is a continuation of..
Choosing a data structure for a variant of producer consumer problem
Now I am trying the multiple-producer, multiple-consumer case, but I want to use an existing thread pool, since I don't want to manage the thread pool myself. I also want a callback when the ThreadPoolExecutor has finished executing a task, along with the ability to examine the "in-progress transactions" data structure in a thread-safe way.

You can override the beforeExecute and afterExecute methods to let you know that a task has started and finished. You can override execute() to know when a task is added.
The problem you have is that the Queue is not designed to be queried, and a task can be consumed before you see it. One way around this is to create your own implementation of a Queue (perhaps overriding/wrapping a ConcurrentLinkedQueue).
BTW: The queue is thread-safe, however it is not guaranteed you will see every entry.
A ConcurrentLinkedQueue.iterator() is documented as
Returns an iterator over the elements in this queue in proper sequence. The returned iterator is a "weakly consistent" iterator that will never throw ConcurrentModificationException, and guarantees to traverse elements as they existed upon construction of the iterator, and may (but is not guaranteed to) reflect any modifications subsequent to construction.
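One way to follow the beforeExecute/afterExecute suggestion is to keep your own thread-safe set of running tasks instead of inspecting the executor's queue. This is only a sketch: TrackingExecutor, inProgressSnapshot, and demo are invented names, and the pool configuration is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class TrackingExecutor extends ThreadPoolExecutor {
    private final Set<Runnable> inProgress = ConcurrentHashMap.newKeySet();

    TrackingExecutor(int threads) {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
    }

    @Override
    protected void beforeExecute(Thread t, Runnable r) {
        super.beforeExecute(t, r);
        inProgress.add(r); // runs in the worker thread, just before the task
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        inProgress.remove(r); // runs in the worker thread, just after the task
        super.afterExecute(r, t);
    }

    List<Runnable> inProgressSnapshot() {
        return new ArrayList<>(inProgress);
    }

    // Hypothetical demo: returns {tasks running mid-flight, tasks running after shutdown}.
    static int[] demo() {
        TrackingExecutor pool = new TrackingExecutor(2);
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        pool.execute(() -> {
            started.countDown();
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            started.await();
            int during = pool.inProgressSnapshot().size();
            release.countDown();
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
            return new int[] { during, pool.inProgressSnapshot().size() };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Note that with submit() the Runnable seen by these hooks is the FutureTask wrapper rather than your original object; with execute() it is the Runnable you passed in.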

If you wish to copy the items in the queue and ensure that what you have in the queue has not been executed, you might try this:
a) Introduce the ability to pause and resume execution. See: http://download.oracle.com/javase/1,5.0/docs/api/java/util/concurrent/ThreadPoolExecutor.html
b) first pause the queue, then copy the queue, then resume the queue.
And then I have my own question. The problem I see is that when you submit your "Runnable", it is not that "Runnable" that is placed in the queue but a FutureTask "wrapper", and I cannot find any way to determine which one of my runnables I'm looking at. So grabbing and examining the queue is pretty useless. Does anybody know what I missed there?

If you are following Jon Skeet's advice in your accepted answer from your previous question, then you'll be controlling access to your queues via locks. If you acquire a lock on the in-progress queue then you can guarantee that a traversal will not miss any items in it.
The problem with this of course is that while you are doing the traverse all other operations on the queue (other producers and consumers trying to access it) will block, which could have a pretty dire effect on performance.

design of a Producer/Consumer app

I have a producer app that generates an index (stores it in some in-memory tree data structure). And a consumer app will use the index to search for partial matches.
I don't want the consumer UI to have to block (e.g. via some progress bar) while the producer is indexing the data. Basically if the user wishes to use the partial index, it will just do so. In this case, the producer will potentially have to stop indexing for a while until the user goes away to another screen.
Roughly, I know I will need the wait/notify protocol to achieve this. My question: is it possible to interrupt the producer thread using wait/notify while it is doing its business ? What java.util.concurrent primitives do I need to achieve this ?

The way you've described this, there's no reason that you need wait/notify. Simply synchronize access to your data structure, to ensure that it is in a consistent state when accessed.
Edit: by "synchronize access", I do not mean synchronize the entire data structure (which would end up blocking either producer or consumer). Instead, synchronize only those bits that are being updated, and only at the time that you update them. You'll find that most of the producer's work can take place in an unsynchronized manner: for example, if you're building a tree, you can identify the node where the insert needs to happen, synchronize on that node, do the insert, then continue on.

In your producer thread, you are likely to have some kind of main loop. This is probably the best place to interrupt your producer. Instead of using wait() and notify(), I suggest you use the Java synchronization objects introduced in Java 5.
You could potentially do something like this:
class Indexer {
    Lock lock = new ReentrantLock();
    public void index(){
        while(somecondition){
            this.lock.lock();
            try{
                // perform one indexing step
            }finally{
                lock.unlock();
            }
        }
    }
    public Item lookup(){
        this.lock.lock();
        try{
            // perform your lookup
        }finally{
            lock.unlock();
        }
    }
}
You need to make sure that each time the indexer releases the lock, your index is in a consistent, legal state. In this scenario, when the indexer releases the lock, it leaves a chance for a new or waiting lookup() operation to take the lock, complete, and release the lock, at which point your indexer can proceed to its next step. If no lookup() is currently waiting, then your indexer just reacquires the lock itself and goes on with its next operation.
If you think you might have more than one thread trying to do the lookup at the same time, you might want to have a look at the ReadWriteLock interface and the ReentrantReadWriteLock implementation.
Of course this solution is the simple way to do it. It will block whichever thread doesn't have the lock. You may want to check if you can just synchronize on your data structure directly, but that might prove tricky, since building indexes tends to use some sort of balanced tree or B-tree where node insertion is far from trivial.
I suggest you first try that simple approach, then see if the way it behaves suits you. If it doesn't, you may either try breaking up the indexing steps into smaller steps, or try synchronizing on only parts of your data structure.
Don't worry too much about the performance of locking; in Java, uncontended locking (when only one thread is trying to take the lock) is cheap. As long as most of your locking is uncontended, locking performance is nothing to be concerned about.
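For the case with several concurrent lookups, a ReentrantReadWriteLock variant could look like the sketch below. The RwIndex class, the TreeSet used as the index, and the prefix lookup are assumptions chosen to keep the example small; a real index would be more elaborate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class RwIndex {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final TreeSet<String> terms = new TreeSet<>(); // guarded by "lock"

    // One indexing step: takes the write lock only for the duration of the insert.
    void indexStep(String term) {
        lock.writeLock().lock();
        try {
            terms.add(term);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Partial-match lookup: many readers may hold the read lock at the same time.
    List<String> lookup(String prefix) {
        lock.readLock().lock();
        try {
            // All strings in [prefix, prefix + '\uffff') start with the prefix.
            return new ArrayList<>(terms.subSet(prefix, prefix + Character.MAX_VALUE));
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

The result is copied into a new list before the lock is released, so callers never iterate over the live TreeSet while the indexer modifies it.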

The producer application can have two indices: published and in-work. The producer works only with the in-work index, the consumer only with the published one. Once the producer is done indexing, it replaces the published index with the in-work one (usually by swapping one pointer). The producer may also publish a copy of the partial index if that brings value. This way you avoid long-term locks, which is useful when the index is accessed by lots of consumers.
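A minimal sketch of the published/in-work idea, assuming a single producer thread and using a volatile reference as the "pointer" that gets swapped. PublishedIndex and its methods are illustrative names, and a plain set of strings stands in for the real index structure.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

class PublishedIndex {
    // Read by consumers; replaced wholesale by the producer, never mutated in place.
    private volatile Set<String> published = Collections.emptySet();
    // Touched only by the single producer thread.
    private final TreeSet<String> inWork = new TreeSet<>();

    // Producer side: keep indexing without any locking.
    void index(String term) {
        inWork.add(term);
    }

    // Producer side: publish an immutable copy of the current (possibly partial) index.
    void publish() {
        published = Collections.unmodifiableSet(new TreeSet<String>(inWork));
    }

    // Consumer side: always sees a complete, consistent snapshot.
    boolean contains(String term) {
        return published.contains(term);
    }
}
```

Publishing a copy costs memory and time proportional to the index size, so how often to publish is a trade-off between freshness and overhead.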

No, that's not possible.
The only way of notifying a thread without any explicit code in the thread itself is to use Thread.interrupt(), which sets the thread's interrupt flag and makes blocking calls such as wait() or sleep() throw InterruptedException. interrupt() is usually not very reliable though, because handling an exception that can surface at any blocking call is a nightmare to get right in all code paths. Besides that, a single try{}catch(Throwable){} somewhere in the thread (including any libraries that you use) could be enough to swallow the signal.
In most cases, the only correct solution is to use a shared flag or a queue that the consumer can use to pass messages to the producer. If you worry about the producer being unresponsive or freezing, run it in a separate thread and require it to send heartbeat messages every n seconds. If it does not send a heartbeat, kill it. (Note that determining whether a producer is actually frozen, and not just waiting for an external event, is often very hard to get right as well.)

How to use ConcurrentLinkedQueue?

How do I use a ConcurrentLinkedQueue in Java?
Using this LinkedQueue, do I need to be worried about concurrency in the queue? Or do I just have to define two methods (one to retrieve elements from the list and another to add elements to the list)?
Note: obviously these two methods have to be synchronized. Right?
EDIT: What I'm trying to do is this: I have a class (in Java) with one method to retrieve items from the queue and another class with one method to add items to the queue. The items added and retrieved from the list are objects of my own class.
One more question: do I need to do this in the remove method:
while (queue.size() == 0){
wait();
queue.poll();
}
I only have one consumer and one producer.

No, the methods don't need to be synchronized, and you don't need to define any methods; they are already in ConcurrentLinkedQueue, just use them. ConcurrentLinkedQueue handles all the thread-safety concerns internally (it is lock-free, so there is no locking for you to manage); your producer(s) add data into the queue, and your consumers poll for it.
First, create your queue:
Queue<YourObject> queue = new ConcurrentLinkedQueue<YourObject>();
Now, wherever you are creating your producer/consumer objects, pass in the queue so they have somewhere to put their objects (you could use a setter for this, instead, but I prefer to do this kind of thing in a constructor):
YourProducer producer = new YourProducer(queue);
and:
YourConsumer consumer = new YourConsumer(queue);
and add stuff to it in your producer:
queue.offer(myObject);
and take stuff out in your consumer (if the queue is empty, poll() will return null, so check it):
YourObject myObject = queue.poll();
For more info see the Javadoc
EDIT:
If you need to block waiting for the queue to not be empty, you probably want to use a LinkedBlockingQueue, and use the take() method. However, LinkedBlockingQueue has a maximum capacity (defaults to Integer.MAX_VALUE, which is over two billion) and thus may or may not be appropriate depending on your circumstances.
If you only have one thread putting stuff into the queue, and another thread taking stuff out of the queue, ConcurrentLinkedQueue is probably overkill. It's more for when you may have hundreds or even thousands of threads accessing the queue at the same time. Your needs will probably be met by using:
List<YourObject> queue = Collections.synchronizedList(new LinkedList<YourObject>());
A plus of this is that it locks on the instance (queue), so you can synchronize on queue to ensure atomicity of composite operations (as explained by Jared). You CANNOT do this with a ConcurrentLinkedQueue, as all operations are done WITHOUT locking on the instance (using java.util.concurrent.atomic variables). You will NOT need to do this if you just want to wait while a ConcurrentLinkedQueue is empty, because poll() will simply return null while the queue is empty, and poll() is atomic. Check to see if poll() returns null. If it does, pause briefly (for example with Thread.sleep()), then try again. No need to lock.
Finally:
Honestly, I'd just use a LinkedBlockingQueue. It is still overkill for your application, but odds are it will work fine. If it isn't performant enough (PROFILE!), you can always try something else, and it means you don't have to deal with ANY synchronized stuff:
BlockingQueue<YourObject> queue = new LinkedBlockingQueue<YourObject>();
queue.put(myObject); // Blocks until queue isn't full.
YourObject myObject = queue.take(); // Blocks until queue isn't empty.
Everything else is the same. Put probably won't block, because you aren't likely to put two billion objects into the queue.
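Putting the LinkedBlockingQueue suggestion together for the one-producer, one-consumer case from the question, a sketch might look like this. The ProducerConsumer class and its run method are invented for the example, and integers stand in for YourObject.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ProducerConsumer {
    // Moves "count" integers from a producer thread to a consumer thread.
    static List<Integer> run(int count) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        List<Integer> consumed = new ArrayList<>(); // written only by the consumer thread

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    queue.put(i); // would block only if the queue were full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    consumed.add(queue.take()); // blocks while the queue is empty
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        consumer.start();
        producer.start();
        try {
            producer.join();
            consumer.join(); // after join, "consumed" is safe to read here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return consumed;
    }
}
```

There is no while (queue.size() == 0) wait() loop anywhere: take() does the waiting, which is the main reason to prefer a BlockingQueue here.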

This is largely a duplicate of another question.
Here's the section of that answer that is relevant to this question:
Do I need to do my own synchronization if I use java.util.ConcurrentLinkedQueue?
Atomic operations on the concurrent collections are synchronized for you. In other words, each individual call to the queue is guaranteed thread-safe without any action on your part. What is not guaranteed thread-safe are any operations you perform on the collection that are non-atomic.
For example, this is threadsafe without any action on your part:
queue.add(obj);
or
queue.poll();
However, non-atomic calls to the queue are not automatically thread-safe. For example, the following operations are not automatically threadsafe:
if(!queue.isEmpty()) {
    queue.poll();
}
That last one is not threadsafe, as it is very possible that between the time isEmpty is called and the time poll is called, other threads will have added or removed items from the queue. The threadsafe way to perform this is to hold a lock across both calls, which only helps if every other thread that touches the queue synchronizes on the same object:
synchronized(queue) {
    if(!queue.isEmpty()) {
        queue.poll();
    }
}
Again...atomic calls to the queue are automatically thread-safe. Non-atomic calls are not.

This is probably what you're looking for in terms of thread safety & "prettyness" when trying to consume everything in the queue:
for (YourObject obj = queue.poll(); obj != null; obj = queue.poll()) {
}
This will guarantee that you quit when the queue is empty, and that you continue to pop objects off of it as long as it's not empty.
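To see that this loop stays correct with several consumers, the following sketch lets a few threads drain one ConcurrentLinkedQueue and checks that every element was taken exactly once. The class name and the bookkeeping (a concurrent set plus a counter) are assumptions added for the check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

class ConcurrentDrain {
    // Returns the number of distinct items consumed, or -1 if any item was seen twice.
    static int drain(int items, int threads) {
        Queue<Integer> queue = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < items; i++) {
            queue.add(i);
        }
        Set<Integer> seen = ConcurrentHashMap.newKeySet();
        AtomicInteger polled = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                // poll() is atomic: it either returns an element nobody else got, or null.
                for (Integer obj = queue.poll(); obj != null; obj = queue.poll()) {
                    seen.add(obj);
                    polled.incrementAndGet();
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return polled.get() == seen.size() ? seen.size() : -1;
    }
}
```

The null check on the value returned by poll() replaces the isEmpty()-then-poll() pair, so no external lock is needed.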

Use poll to get the first element, and add to add a new last element. That's it, no synchronization or anything else.

The ConcurrentLinkedQueue is a very efficient wait-free/lock-free implementation (see the javadoc for reference), so not only do you not need to synchronize, but the queue will not lock anything, thus being virtually as fast as a non-synchronized (not thread-safe) one.

Just use it as you would a non-concurrent collection. The Concurrent[Collection] classes wrap the regular collections so that you don't have to think about synchronizing access.
Edit: ConcurrentLinkedQueue isn't actually just a wrapper, but rather a better concurrent implementation. Either way, you don't have to worry about synchronization.
