Is there a bounded lock-free blocking queue? - java

Currently we have LinkedBlockingQueue and ConcurrentLinkedQueue.
LinkedBlockingQueue can be bounded, but it uses locks.
ConcurrentLinkedQueue doesn't use locks, but it is not bounded. And it doesn't block, which makes it hard to poll.
Obviously I can't have a queue that both blocks and is lock-free in the strict sense (wait-free, non-blocking, or whatever the exact term is). I'm not asking for academic definitions.
Does anyone know a queue implementation that is mostly lock-free (doesn't use a lock in the hot path), blocks when empty (no need to busy waiting), and is bounded (blocking when full)? Off-heap solution is welcome as well.
I heard about LMAX Disruptor, but it doesn't look like a queue at all.
I am happy to know non-general solutions too (Single-Producer-Single-Consumer, SPMC, MPSC)
If there are no known implementations, I am also happy to know possible algorithms.

The lock-free data structures use atomic reads and writes (e.g. compare-and-swap) to eliminate the need for locks. Naturally, these data structures never block.
What you describe is a queue that uses lock-free mechanisms for the non-blocking calls, e.g. remove() on a non-empty queue, while using a lock to block for e.g. remove() on an empty queue.
As you might realize, this is not possible to implement naively. If, for example, after a failed pop you were to check whether the queue was empty and then proceed to block, the queue might already have had one or more items inserted by another thread by the time you block (the classic lost-wakeup race).
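One common workaround is to pair a non-blocking queue with counting semaphores for the blocking edges: the hot path stays CAS-based, and threads only park when the queue really is empty or full. A minimal sketch (the class name is illustrative, and this ignores details like timeouts and capacity accounting under cancellation):

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;

// Sketch: a bounded, blocking facade over a non-blocking queue.
// The enqueue/dequeue hot path is the CAS-based offer/poll of
// ConcurrentLinkedQueue; the semaphores only park threads when
// the queue is actually full or empty.
public class BoundedConcurrentQueue<E> {
    private final ConcurrentLinkedQueue<E> queue = new ConcurrentLinkedQueue<>();
    private final Semaphore freeSlots;               // permits = remaining capacity
    private final Semaphore items = new Semaphore(0); // permits = queued items

    public BoundedConcurrentQueue(int capacity) {
        this.freeSlots = new Semaphore(capacity);
    }

    public void put(E e) throws InterruptedException {
        freeSlots.acquire();  // blocks only when the queue is full
        queue.offer(e);       // lock-free enqueue
        items.release();      // publish: an item is now available
    }

    public E take() throws InterruptedException {
        items.acquire();      // blocks only when the queue is empty
        E e = queue.poll();   // lock-free dequeue; guaranteed non-null here
        freeSlots.release();
        return e;
    }
}
```

Acquiring an items permit happens-after the matching release in put, so the lost-wakeup race above cannot occur: a consumer that gets past items.acquire() is guaranteed to find an element.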

Related

How do we maintain order of elements in a Bounded Blocking Queue

A queue, by virtue of its definition, should be a FIFO kind of structure. When we make it blocking, several threads might be blocked trying to add new elements once the size of the queue has reached its MAX_LIMIT. Now, if one element gets dequeued from the queue, how do we ensure that the thread that has been waiting the longest is the one that gets to proceed?
If you read the documentation of a particular implementation, you will e.g. find:
ArrayBlockingQueue
This class supports an optional fairness policy for ordering waiting producer and consumer threads. By default, this ordering is not guaranteed. However, a queue constructed with fairness set to true grants threads access in FIFO order. Fairness generally decreases throughput but reduces variability and avoids starvation.
LinkedBlockingQueue
No fairness guarantees are available.
SynchronousQueue
This class supports an optional fairness policy for ordering waiting producer and consumer threads. By default, this ordering is not guaranteed. However, a queue constructed with fairness set to true grants threads access in FIFO order.
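The fairness policy quoted above is just a constructor argument; for example:

```java
import java.util.concurrent.ArrayBlockingQueue;

public class FairQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        // Capacity 100, fairness = true: waiting producers and consumers
        // are granted access in FIFO order (at some throughput cost).
        ArrayBlockingQueue<String> fair = new ArrayBlockingQueue<>(100, true);
        fair.put("first");

        // Default: fairness = false, no ordering guarantee, higher throughput.
        ArrayBlockingQueue<String> unfair = new ArrayBlockingQueue<>(100);
        unfair.put("first");

        System.out.println(fair.take() + " " + unfair.take()); // first first
    }
}
```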
See #Andreas' answer for a summary of how regular queue classes handle this.
But I'm going to suggest an alternative approach / way of looking at this.
You are proposing a scenario where threads are blocking because they can't add to a queue. If this is occurring, then your biggest problem from a performance perspective is the resources and locks used / held by the blocked threads.
In general, there are two possible causes for this:
You have a short-term imbalance between the rate at which queue entries are being added and removed. This can be solved by simply increasing the queue bounds.
You have a long-term imbalance between the rate at which queue entries are being added and removed. This can only be solved by adding more consumer threads and/or removing producer threads.
The point is that if you can arrange for threads not to need to block, you don't need to worry about the fairness of blocked threads.
The other issue is whether fairness actually matters. Does it really matter that entries are added to the queue in a strictly fair fashion? Does it impact the correctness of the application? (Would users even be able to tell when their request is held back ... or overtakes another user's request?)
(I can imagine some scenarios where strict fairness is a requirement. But they are a tiny minority.)
And if strict fairness is a requirement, what about the handling that happens before a request is added to the queue? Does that need to be fair too? The JVM and the OS do not provide any guarantees that thread scheduling is fair.
Basically, making request handling totally fair is an intractable problem. So for most scenarios there is little point in implementing strictly fair FIFO queue submission.

Are Synchronized Blocks needed for Blocking Queues

public BlockingQueue<Message> Queue;
Queue = new LinkedBlockingQueue<>();
I know that if I use, say, a synchronized List, I need to surround it with synchronized blocks to safely use it across threads.
Is that the same for Blocking Queues?
No, you do not need to surround them with synchronized blocks.
From the JDK javadocs...
BlockingQueue implementations are thread-safe. All queuing methods achieve their effects atomically using internal locks or other forms of concurrency control. However, the bulk Collection operations addAll, containsAll, retainAll and removeAll are not necessarily performed atomically unless specified otherwise in an implementation. So it is possible, for example, for addAll(c) to fail (throwing an exception) after adding only some of the elements in c.
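For instance, a producer and a consumer can share a LinkedBlockingQueue directly, with no synchronized block around the queuing calls:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class NoSyncNeeded {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();

        Thread producer = new Thread(() -> {
            for (int i = 0; i < 5; i++) {
                queue.add(i); // thread-safe on its own: no synchronized needed
            }
        });
        producer.start();

        int sum = 0;
        for (int i = 0; i < 5; i++) {
            sum += queue.take(); // blocks until an element is available
        }
        producer.join();
        System.out.println("sum = " + sum); // sum = 10
    }
}
```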
Just want to point out that, in my experience, the classes in the java.util.concurrent package of the JDK do not need synchronized blocks. Those classes manage the concurrency for you and are thread-safe. Whether intentional or not, java.util.concurrent has largely superseded the need for synchronized blocks in modern Java code.
It depends on the use case. Below are two scenarios: one where you don't need synchronized blocks and one where you do.
Case 1: Not required when using the queuing methods, e.g. put, take, etc.
Why it is not required is explained in the javadoc; the important line is:
BlockingQueue implementations are thread-safe. All queuing methods achieve their effects atomically using internal locks or other forms of concurrency control.
Case 2: Required when iterating over blocking queues and most concurrent collections.
Since the iterator (one example from the comments) is weakly consistent, it reflects some but not necessarily all of the changes that have been made to its backing collection since it was created. So if you care about seeing all changes, you need to use synchronized blocks / locks around both the iteration and the modifications.
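To illustrate the weak consistency: an iterator obtained before a modification never throws ConcurrentModificationException, but it is not guaranteed to reflect that modification either. A small sketch:

```java
import java.util.Iterator;
import java.util.concurrent.ConcurrentLinkedQueue;

public class WeaklyConsistentDemo {
    public static void main(String[] args) {
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();
        queue.add(1);
        queue.add(2);

        Iterator<Integer> it = queue.iterator();
        // Modifying during iteration is legal here: the iterator is weakly
        // consistent, so it never throws ConcurrentModificationException,
        // but whether it observes this addition is unspecified.
        queue.add(3);

        int seen = 0;
        while (it.hasNext()) {
            it.next();
            seen++;
        }
        // 'seen' is at least the 2 elements present at iterator creation;
        // seeing the third is allowed but not guaranteed by the spec.
        System.out.println("seen = " + seen);
    }
}
```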
You are thinking about synchronization at too low a level. It doesn't have anything to do with what classes you use. It's about protecting data and objects that are shared between threads.
If one thread is able to modify any single data object or group of related data objects while other threads are able to look at or modify the same object(s) at the same time, then you probably need synchronization. The reason is, it often is not possible for one thread to modify data in a meaningful way without temporarily putting the data into an invalid state.
The purpose of synchronization is to prevent other threads from seeing the invalid state and possibly doing bad things to the same data or to other data as a result.
Java's Collections.synchronizedList(...) gives you a way for two or more threads to share a List such that the list itself is safe from being corrupted by the actions of the different threads. But it does not offer any protection for the data objects that are in the List. If your application needs that protection, then it's up to you to supply it.
If you need the equivalent protection for a queue, you can use any of the several classes that implement java.util.concurrent.BlockingQueue. But beware! The same caveat applies. The queue itself will be protected from corruption, but the protection does not automatically extend to the objects that your threads pass through the queue.
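A sketch of that caveat: the queue hands the object over safely, but if both sides keep a reference and mutate it after the handoff, the object itself needs its own synchronization. The Message class here is hypothetical:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SharedElementDemo {
    // Hypothetical message type: mutable, so it needs its own locking
    // if producer and consumer can both touch it after the handoff.
    static class Message {
        private String body;
        synchronized void setBody(String b) { body = b; }
        synchronized String getBody() { return body; }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Message> queue = new LinkedBlockingQueue<>();
        Message m = new Message();
        m.setBody("hello");
        queue.put(m);                    // the queue itself is safe...
        Message received = queue.take();
        // ...but if the producer kept mutating 'm' here, only the
        // synchronized accessors on Message keep its state consistent.
        System.out.println(received.getBody()); // hello
    }
}
```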

Blocking queue method in concurrent package

Wondering why the method drainTo is present only in the concurrent collection framework (BlockingQueue in particular) and not in the regular one. Is there any reason for that?
Thanks in advance.
As always with that sort of question it is difficult to say without asking the author of the class himself. But we can make some educated guesses.
The javadoc states:
Removes all available elements from this queue and adds them to the given collection. This operation may be more efficient than repeatedly polling this queue.
So the underlying reason is probably for efficiency.
drainTo is essentially equivalent (in a single threaded environment, for simplicity) to:
while ((e = queue.poll()) != null) collection.add(e);
With a blocking queue, each iteration is (most likely) going to acquire some lock and release it again, which is not optimal. If you look at the implementation in ArrayBlockingQueue for example, you will see that the lock is acquired once for the whole iteration, probably because the authors of the library found that it was more efficient.
The point here is that all locking and signalling happens outside of the pseudocoded while loop, so yes, it is for efficiency only. For non-concurrent queues there is no such protection anyway, so the while loop would be enough.
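For example, draining a queue into a list in one locked pass, rather than polling element by element:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class DrainToDemo {
    public static void main(String[] args) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(10);
        for (int i = 0; i < 5; i++) queue.add(i);

        List<Integer> batch = new ArrayList<>();
        // One lock acquisition for the whole transfer, instead of one
        // per element as in the poll() loop above.
        int moved = queue.drainTo(batch);

        System.out.println(moved + " " + batch); // 5 [0, 1, 2, 3, 4]
    }
}
```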

Is multithreading a part of Queue contract?

Currently I have an algorithm which somewhat looks like web-spiders or file search systems - it has a collection of the elements to process and processing elements can lead to enqueuing more elements.
However, this algorithm is single-threaded: I fetch data from the db and would like to have only a single db connection at once.
In my current situation performance is not critical - I'm doing this only for the visualization purposes to ease up debugging.
For me it seems natural to use a queue abstraction, but it seems that using queues implies multithreading - as I understand it, most of the standard Java queue implementations reside in the java.util.concurrent package.
I understand that I can get by with any data structure that supports push and pull, but I would like to know which data structure is most natural to use in this case (is it OK to use a queue in a single-threaded application?).
It's basically fine to use the java.util.concurrent structures with a single thread.
The main thing to watch out for is blocking calls. If you use a bounded structure like an ArrayBlockingQueue and you call the put method on a queue that's full, the calling thread will block until there is space in the queue. If you use any blocking queue and you call take when it's empty, the calling thread will block until there's something in the queue. In a single-threaded application there is no other thread that could ever free up space or supply an element, so a blocked call would block forever.
To avoid put blocking, you could use an unbounded structure like a LinkedBlockingQueue. To avoid blocking on removal, use a non-blocking operation - remove throws an exception if the queue is empty, and poll returns null.
Having said that, there are implementations of the Queue interface that are not in java.util.concurrent. ArrayDeque would probably be a good choice.
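A single-threaded sketch with ArrayDeque, using only the non-blocking Queue methods; the spider-like grow-while-processing loop mirrors the use case described in the question:

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class SingleThreadedQueueDemo {
    public static void main(String[] args) {
        // ArrayDeque: a plain, fast Queue with no locks and no blocking.
        Queue<String> work = new ArrayDeque<>();
        work.offer("seed");

        // Processing an element may enqueue more elements, as in a
        // web spider or file-system walk.
        int processed = 0;
        while (!work.isEmpty()) {
            String item = work.poll(); // returns null only when empty
            processed++;
            if (item.equals("seed")) {
                work.offer("child-1");
                work.offer("child-2");
            }
        }
        System.out.println("processed = " + processed); // processed = 3
    }
}
```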
Queue is defined in java.util. LinkedList is a Queue and is not particularly concurrency-friendly. None of the Queue methods block, so they are safe from a single-threaded perspective.
It is ok to use any queue in a single threaded application. Synchronization overhead, in absence of concurrent threads, should be negligible, and is noticeable only if element processing time is very short.
If you want to use a Queue with a thread pool, I suggest using an ExecutorService, which combines both for you. The pools created by Executors.newFixedThreadPool use a LinkedBlockingQueue by default.
http://tutorials.jenkov.com/java-util-concurrent/executorservice.html
http://recursor.blogspot.co.uk/2007/03/mini-executorservice-future-how-to-for.html
http://www.vogella.com/articles/JavaConcurrency/article.html

LinkedBlockingQueue vs ConcurrentLinkedQueue

My question relates to this question asked earlier. In situations where I am using a queue for communication between producer and consumer threads would people generally recommend using LinkedBlockingQueue or ConcurrentLinkedQueue?
What are the advantages / disadvantages of using one over the other?
The main difference I can see from an API perspective is that a LinkedBlockingQueue can be optionally bounded.
For a producer/consumer thread, I'm not sure that ConcurrentLinkedQueue is even a reasonable option - it doesn't implement BlockingQueue, which is the fundamental interface for producer/consumer queues IMO. You'd have to call poll(), wait a bit if you hadn't found anything, and then poll again etc... leading to delays when a new item comes in, and inefficiencies when it's empty (due to waking up unnecessarily from sleeps).
From the docs for BlockingQueue:
BlockingQueue implementations are designed to be used primarily for producer-consumer queues
I know it doesn't strictly say that only blocking queues should be used for producer-consumer queues, but even so...
This question deserves a better answer.
Java's ConcurrentLinkedQueue is based on the famous algorithm by Maged M. Michael and Michael L. Scott for non-blocking lock-free queues.
"Non-blocking" as a term here for a contended resource (our queue) means that regardless of what the platform's scheduler does (like interrupting a thread), or how slow the thread in question is, other threads contending for the same resource can still make progress. If a lock is involved, for example, the thread holding the lock could be interrupted, and all threads waiting for that lock would be blocked. Intrinsic locks (the synchronized keyword) in Java can also carry a severe performance penalty, such as when biased locking is involved under contention, or after the VM decides to "inflate" the lock after a spin grace period and block contending threads. This is why, in many contexts (scenarios of low/medium contention), doing compare-and-sets on atomic references can be much more efficient, and it is exactly what many non-blocking data structures do.
Java's ConcurrentLinkedQueue is not only non-blocking, but it has the awesome property that the producer does not contend with the consumer. In a single producer / single consumer scenario (SPSC), this really means that there will be no contention to speak of. In a multiple producer / single consumer scenario, the consumer will not contend with the producers. This queue does have contention when multiple producers try to offer(), but that's concurrency by definition. It's basically a general purpose and efficient non-blocking queue.
As for it not being a BlockingQueue, well, blocking a thread to wait on a queue is a freakishly terrible way of designing concurrent systems. Don't. If you can't figure out how to use a ConcurrentLinkedQueue in a consumer/producer scenario, then just switch to higher-level abstractions, like a good actor framework.
LinkedBlockingQueue blocks the consumer or the producer when the queue is empty or full and the respective consumer/producer thread is put to sleep. But this blocking feature comes with a cost: every put or take operation is lock contended between the producers or consumers (if many), so in scenarios with many producers/consumers the operation might be slower.
ConcurrentLinkedQueue does not use locks but CAS on its add/poll operations, potentially reducing contention with many producer and consumer threads. But being a non-blocking data structure, ConcurrentLinkedQueue will not block when empty, meaning that the consumer has to deal with poll() returning null, for example by "busy waiting", with the consumer thread eating up CPU.
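The consumer-side difference in code (the sleep interval below is an arbitrary illustration of backoff, not a recommendation):

```java
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ConsumerStyles {
    // With ConcurrentLinkedQueue: poll() returns null when empty, so the
    // consumer must spin or sleep and retry (burning CPU or adding latency).
    static Integer pollWithBackoff(Queue<Integer> queue) throws InterruptedException {
        Integer item;
        while ((item = queue.poll()) == null) {
            Thread.sleep(1); // arbitrary backoff; tune for your workload
        }
        return item;
    }

    public static void main(String[] args) throws InterruptedException {
        Queue<Integer> clq = new ConcurrentLinkedQueue<>();
        clq.offer(42);
        System.out.println(pollWithBackoff(clq)); // 42

        // With LinkedBlockingQueue: take() parks the thread until an element
        // arrives, with no spinning but at the cost of lock contention.
        BlockingQueue<Integer> lbq = new LinkedBlockingQueue<>();
        lbq.put(42);
        System.out.println(lbq.take()); // 42
    }
}
```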
So which one is "better" depends on the number of consumer threads, on the rate they consume/produce, etc. A benchmark is needed for each scenario.
One particular use case where ConcurrentLinkedQueue is clearly better is when producers first produce their work, finish their job by placing it in the queue, and only afterwards the consumers start consuming, knowing they are done when the queue is empty. (Here there is no concurrency between producer and consumer, only between producer-producer and consumer-consumer.)
Another solution (that does not scale well) is rendezvous channels: java.util.concurrent's SynchronousQueue.
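A SynchronousQueue has no capacity at all; every put waits for a matching take, making it a pure handoff point between threads:

```java
import java.util.concurrent.SynchronousQueue;

public class RendezvousDemo {
    public static void main(String[] args) throws InterruptedException {
        SynchronousQueue<String> channel = new SynchronousQueue<>();

        // put() blocks until a consumer is ready to take(), so the
        // handoff must come from another thread.
        Thread producer = new Thread(() -> {
            try {
                channel.put("handoff");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        String received = channel.take(); // rendezvous with the producer
        producer.join();
        System.out.println(received); // handoff
    }
}
```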
If your queue is fixed-size and has only one producer thread and one consumer thread, you can use a lock-free queue (you don't need to lock the data access).
