Wondering why the method drainTo is present only in the concurrent collection framework (BlockingQueue in particular) and not in the regular one. Is there any reason for that?
Thanks in advance.
As always with that sort of question it is difficult to say without asking the author of the class himself. But we can make some educated guesses.
The javadoc states:
Removes all available elements from this queue and adds them to the given collection. This operation may be more efficient than repeatedly polling this queue.
So the underlying reason is probably for efficiency.
drainTo is essentially equivalent (in a single threaded environment, for simplicity) to:
while ((e = queue.poll()) != null) collection.add(e);
With a blocking queue, each iteration is (most likely) going to acquire some lock and release it again, which is not optimal. If you look at the implementation in ArrayBlockingQueue for example, you will see that the lock is acquired once for the whole iteration, probably because the authors of the library found that it was more efficient.
The point here is that all locking and signalling happens outside the pseudocoded while loop, so yes, it is purely for efficiency. Non-concurrent queues have no such locking to begin with, so the plain while loop is enough.
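To make the comparison concrete, here is a minimal sketch of both ways of emptying a queue. The class and method names (DrainExample, drainByPolling, drainInBulk) are made up for illustration; only poll() and drainTo() come from the BlockingQueue API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical helper class, for illustration only.
class DrainExample {
    // Empties the queue one element at a time. With a lock-based
    // implementation such as ArrayBlockingQueue, every poll() call
    // acquires and releases the queue's lock.
    static List<Integer> drainByPolling(BlockingQueue<Integer> queue) {
        List<Integer> out = new ArrayList<>();
        Integer e;
        while ((e = queue.poll()) != null) {
            out.add(e);
        }
        return out;
    }

    // Empties the queue with a single bulk call.
    static List<Integer> drainInBulk(BlockingQueue<Integer> queue) {
        List<Integer> out = new ArrayList<>();
        queue.drainTo(out);
        return out;
    }

    public static void main(String[] args) {
        BlockingQueue<Integer> a = new ArrayBlockingQueue<>(8);
        BlockingQueue<Integer> b = new ArrayBlockingQueue<>(8);
        for (int i = 0; i < 5; i++) {
            a.add(i);
            b.add(i);
        }
        // Both approaches should yield the same elements in the same order.
        System.out.println(drainByPolling(a).equals(drainInBulk(b)));
    }
}
```

With a single thread both methods give the same result; the difference only shows up as lock traffic when other threads use the queue at the same time.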
public BlockingQueue<Message> Queue;
Queue = new LinkedBlockingQueue<>();
I know that if I use, say, a synchronized List, I need to surround it in synchronized blocks to use it safely across threads.
Is that the same for blocking queues?
No, you do not need to surround it with synchronized blocks.
From the JDK javadocs...
BlockingQueue implementations are thread-safe. All queuing methods achieve their effects atomically using internal locks or other forms of concurrency control. However, the bulk Collection operations addAll, containsAll, retainAll and removeAll are not necessarily performed atomically unless specified otherwise in an implementation. So it is possible, for example, for addAll(c) to fail (throwing an exception) after adding only some of the elements in c.
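As a rough illustration of the javadoc statement, the sketch below lets two producer threads and one consumer share a LinkedBlockingQueue without any synchronized block. The class name QueueHandOff and the method produceAndConsume are invented for this example.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical example class, for illustration only.
class QueueHandOff {
    // Starts two producers and sums everything they put on the queue.
    static long produceAndConsume(int perProducer) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        Runnable producer = () -> {
            for (int i = 1; i <= perProducer; i++) {
                queue.add(i); // no synchronized block around the call
            }
        };
        Thread p1 = new Thread(producer);
        Thread p2 = new Thread(producer);
        p1.start();
        p2.start();

        long sum = 0;
        try {
            for (int i = 0; i < 2 * perProducer; i++) {
                sum += queue.take(); // blocks until an element is available
            }
            p1.join();
            p2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(produceAndConsume(1000));
    }
}
```

If the queue lost or duplicated elements under concurrent access, the sum would not match twice the sum of 1..perProducer.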
Just want to point out that, in my experience, the classes in the java.util.concurrent package of the JDK do not need synchronized blocks. Those classes manage the concurrency for you and are typically thread-safe. Whether intentional or not, it seems that java.util.concurrent has removed much of the need for synchronized blocks in modern Java code.
It depends on the use case. Here are two scenarios: one where you do not need synchronized blocks and one where you may.
Case 1: Not required while using queuing methods, e.g. put, take etc.
The javadoc explains why it is not required; the important lines are below:
BlockingQueue implementations are thread-safe. All queuing methods
achieve their effects atomically using internal locks or other forms
of concurrency control.
Case 2: Required while iterating over blocking queues and most concurrent collections
The iterator (one example from the comments) is weakly consistent, meaning it reflects some but not necessarily all of the changes that have been made to its backing collection since it was created. So if you need the iteration to reflect all changes, you need synchronized blocks or Locks while iterating, and every thread that modifies the queue has to take the same lock for this to help.
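A small single-threaded sketch of what "weakly consistent" means in practice: the queue is modified in the middle of an iteration and the iterator carries on without throwing ConcurrentModificationException. The class name WeakIteratorExample is made up. Whether the element added mid-iteration is seen is not guaranteed, so the example makes no claim about it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical example class, for illustration only.
class WeakIteratorExample {
    // Iterates the queue while adding to it and returns what the iterator saw.
    static List<String> iterateWhileModifying() {
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("a");
        queue.add("b");

        List<String> seen = new ArrayList<>();
        Iterator<String> it = queue.iterator();
        while (it.hasNext()) {
            String s = it.next();
            seen.add(s);
            if (s.equals("a")) {
                // The same modification during iteration over an ArrayList
                // would make its fail-fast iterator throw.
                queue.add("c");
            }
        }
        // "a" and "b" existed when the iterator was created, so they are
        // traversed; "c" may or may not show up.
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(iterateWhileModifying());
    }
}
```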
You are thinking about synchronization at too low a level. It doesn't have anything to do with what classes you use. It's about protecting data and objects that are shared between threads.
If one thread is able to modify any single data object or group of related data objects while other threads are able to look at or modify the same object(s) at the same time, then you probably need synchronization. The reason is, it often is not possible for one thread to modify data in a meaningful way without temporarily putting the data into an invalid state.
The purpose of synchronization is to prevent other threads from seeing the invalid state and possibly doing bad things to the same data or to other data as a result.
Java's Collections.synchronizedList(...) gives you a way for two or more threads to share a List in such a way that the list itself is safe from being corrupted by the actions of the different threads. But it does not offer any protection for the data objects that are in the List. If your application needs that protection, then it's up to you to supply it.
If you need the equivalent protection for a queue, you can use any of the several classes that implement java.util.concurrent.BlockingQueue. But beware! The same caveat applies. The queue itself will be protected from corruption, but the protection does not automatically extend to the objects that your threads pass through the queue.
Currently we have LinkedBlockingQueue and ConcurrentLinkedQueue.
LinkedBlockingQueue can be bounded, but it uses locks.
ConcurrentLinkedQueue doesn't use locks, but it is not bounded. And it doesn't block, which makes it hard to poll.
Obviously I can't have a queue that both blocks and is lock-free (wait-free or non-blocking or something else). I am not asking for academic definitions.
Does anyone know a queue implementation that is mostly lock-free (doesn't use a lock in the hot path), blocks when empty (no busy waiting needed), and is bounded (blocks when full)? An off-heap solution is welcome as well.
I heard about LMAX Disruptor, but it doesn't look like a queue at all.
I am happy to know non-general solutions too (Single-Producer-Single-Consumer, SPMC, MPSC)
If there are no known implementations, I am also happy to know possible algorithms.
Lock-free data structures use atomic reads and writes (e.g. compare-and-swap) to eliminate the need for locks. Naturally, these data structures never block.
What you describe is a queue that uses lock-free mechanisms for non-blocking calls, e.g. remove() on a non-empty queue, but uses a lock to block on, e.g., remove() on an empty queue.
As you might realize, this cannot be done with a naive check-then-block. If, for example, you checked after a pop operation whether the queue was in fact empty and then proceeded to block, another thread might already have inserted one or more items by the time you block.
Do mutex locks ensure the bounded waiting condition? Is it possible that two threads are trying to get hold of a lock, but only one of them (just by luck) gets it again and again? Since Peterson's Algorithm ensures bounded waiting, is it better to use that instead of mutex locks?
It is possible to have unbounded waiting with mutexes: if locking attempts keep coming in on a mutex, there is no guaranteed first-come, first-served order, at least with C++ std::mutex.
However, this shouldn't really be a concern unless you have a lock that many, many threads are locking all the time (and even in that case it is very unlikely to cause starvation).
The best thing to do is always to use the standard library locking mechanisms and not write your own mutexes.
A mutex with a "bounded waiting" guarantee is sometimes called fair. As gbehar correctly mentions above, the C++ standard doesn't define fairness for std::mutex. If you really need a fair mutex, you can look at Intel TBB, where fairness is guaranteed for some kinds of mutex. I would like to point out that fairness does not come without overhead.
See https://www.threadingbuildingblocks.org/docs/help/tbb_userguide/Mutex_Flavors.html for details.
Update: Current link https://github.com/oneapi-src/oneTBB/blob/master/doc/main/tbb_userguide/Mutex_Flavors.rst
Currently I have an algorithm which somewhat looks like web-spiders or file search systems - it has a collection of the elements to process and processing elements can lead to enqueuing more elements.
However this algorithm is single threaded - it's because I fetch data from the db and would like to have only single db connection at once.
In my current situation performance is not critical - I'm doing this only for the visualization purposes to ease up debugging.
For me it seems natural to use a queue abstraction; however, it seems that using queues implies multithreading: as I understand it, most of the standard Java queue implementations reside in the java.util.concurrent package.
I understand that I can go with any data structure that supports pull and push, but I would like to know which data structure is more natural to use in this case (is it OK to use a queue in a single-threaded application?).
It's basically fine to use the java.util.concurrent structures with a single thread.
The main thing to watch out for is blocking calls. If you use a bounded-size structure like an ArrayBlockingQueue, and you call the put method on a queue that's full, then the calling thread will block until there is space in the queue. If you use any kind of blocking queue and you call take when it's empty, the calling thread will block until there's something in the queue. If your application is single-threaded, then those things can never happen, so that means blocking forever.
To avoid put blocking, you could use an unbounded structure like a LinkedBlockingQueue. To avoid blocking on removal, use a non-blocking operation - remove throws an exception if the queue is empty, and poll returns null.
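A short sketch of the non-blocking calls, using a deliberately tiny ArrayBlockingQueue so that both the empty and the full case are easy to hit. The class name NonBlockingOps and its helper method are invented for illustration; offer() is not mentioned above but is the standard non-blocking counterpart of put().

```java
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical example class, for illustration only.
class NonBlockingOps {
    // Returns true if remove() on the given queue throws instead of blocking.
    static boolean removeThrows(BlockingQueue<String> queue) {
        try {
            queue.remove();
            return false;
        } catch (NoSuchElementException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
        System.out.println(queue.poll());          // empty: poll() returns null
        System.out.println(removeThrows(queue));   // empty: remove() throws
        System.out.println(queue.offer("first"));  // there is room for one element
        System.out.println(queue.offer("second")); // full: offer() returns false, put() would block
    }
}
```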
Having said that, there are implementations of the Queue interface that are not in java.util.concurrent. ArrayDeque would probably be a good choice.
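For the spider-like algorithm in the question, a single-threaded work list on top of ArrayDeque could look roughly like this. The class WorkList, the method visitAll and the links map are assumptions standing in for whatever the real element lookup is (e.g. the database query).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical example class, for illustration only.
class WorkList {
    // Processes elements in FIFO order; processing one may enqueue more.
    static List<String> visitAll(String start, Map<String, List<String>> links) {
        Queue<String> pending = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        List<String> order = new ArrayList<>();

        pending.add(start);
        seen.add(start);
        String current;
        while ((current = pending.poll()) != null) { // poll() returns null when empty
            order.add(current);
            List<String> next = links.getOrDefault(current, Collections.<String>emptyList());
            for (String candidate : next) {
                if (seen.add(candidate)) { // false if the element was already seen
                    pending.add(candidate);
                }
            }
        }
        return order;
    }
}
```

Note that ArrayDeque does not accept null elements, which is what makes the poll() != null loop condition safe.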
Queue is defined in java.util. LinkedList is a Queue and is not thread-safe. None of the Queue methods block, so they are safe to use from a single thread.
It is ok to use any queue in a single threaded application. Synchronization overhead, in absence of concurrent threads, should be negligible, and is noticeable only if element processing time is very short.
If you want to use a Queue with a thread pool, I suggest using an ExecutorService, which combines both for you. The pools created by Executors.newFixedThreadPool use a LinkedBlockingQueue by default.
http://tutorials.jenkov.com/java-util-concurrent/executorservice.html
http://recursor.blogspot.co.uk/2007/03/mini-executorservice-future-how-to-for.html
http://www.vogella.com/articles/JavaConcurrency/article.html
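A minimal sketch of the ExecutorService suggestion: tasks are submitted to a fixed pool, which queues them internally, and the results are collected through Futures. The class name PoolExample, the method squareAll and the pool size of 2 are arbitrary choices for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class, for illustration only.
class PoolExample {
    // Squares each input on a small thread pool; results come back in input order.
    static List<Integer> squareAll(List<Integer> inputs) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (Integer n : inputs) {
                // Tasks wait in the pool's internal queue until a worker is free.
                futures.add(pool.submit(() -> n * n));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get()); // blocks until that task has finished
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown(); // let the worker threads exit
        }
    }
}
```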
I have a producer app that generates an index (stores it in some in-memory tree data structure). And a consumer app will use the index to search for partial matches.
I don't want the consumer UI to have to block (e.g. via some progress bar) while the producer is indexing the data. Basically if the user wishes to use the partial index, it will just do so. In this case, the producer will potentially have to stop indexing for a while until the user goes away to another screen.
Roughly, I know I will need the wait/notify protocol to achieve this. My question: is it possible to interrupt the producer thread using wait/notify while it is doing its work? What java.util.concurrent primitives do I need to achieve this?
The way you've described this, there's no reason that you need wait/notify. Simply synchronize access to your data structure, to ensure that it is in a consistent state when accessed.
Edit: by "synchronize access", I do not mean synchronize the entire data structure (which would end up blocking either producer or consumer). Instead, synchronize only those bits that are being updated, and only at the time that you update them. You'll find that most of the producer's work can take place in an unsynchronized manner: for example, if you're building a tree, you can identify the node where the insert needs to happen, synchronize on that node, do the insert, then continue on.
In your producer thread, you are likely to have some kind of main loop. This is probably the best place to interrupt your producer. Instead of using wait() and notify(), I suggest you use the Java synchronization objects introduced in Java 5.
You could do something like this:
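import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

class Indexer {
    private final Lock lock = new ReentrantLock();

    public void index() {
        while (somecondition) { // placeholder: true while there is more to index
            lock.lock();
            try {
                // perform one indexing step
            } finally {
                lock.unlock();
            }
        }
    }

    public Item lookup() { // Item is a placeholder for your result type
        lock.lock();
        try {
            // perform your lookup
            return null; // placeholder: return the item that was found
        } finally {
            lock.unlock();
        }
    }
}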
You need to make sure that each time the indexer releases the lock, your index is in a consistent, legal state. In this scenario, when the indexer releases the lock, it leaves a chance for a new or waiting lookup() operation to take the lock, complete and release the lock, at which point your indexer can proceed to its next step. If no lookup() is currently waiting, then your indexer just reacquires the lock itself and goes on with its next operation.
If you think you might have more than one thread trying to do the lookup at the same time, you might want to have a look at the ReadWriteLock interface and the ReentrantReadWriteLock implementation.
Of course this solution is the simple way to do it. It blocks whichever thread does not hold the lock. You may want to check if you can just synchronize on your data structure directly, but that might prove tricky, since building indexes tends to use some sort of balanced tree or B-Tree or whatnot, where node insertion is far from trivial.
I suggest you first try that simple approach, then see if the way it behaves suits you. If it doesn't, you may either try breaking up the indexing steps into smaller steps, or try synchronizing on only parts of your data structure.
Don't worry too much about the performance of locking: in Java, uncontended locking (when only one thread is trying to take the lock) is cheap. As long as most of your locking is uncontended, locking performance is nothing to be concerned about.
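For the case with several lookup threads mentioned above, a sketch using ReentrantReadWriteLock might look like the following. The class SharedIndex, its methods, and the TreeMap used as the index are stand-ins chosen for illustration, not the asker's actual data structure.

```java
import java.util.TreeMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical example class, for illustration only.
class SharedIndex {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    // Stand-in for the real index structure.
    private final TreeMap<String, Integer> entries = new TreeMap<>();

    // One indexing step: exclusive access while the structure changes.
    void put(String key, int value) {
        lock.writeLock().lock();
        try {
            entries.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Partial-match lookup: several readers may hold the read lock at once.
    int countWithPrefix(String prefix) {
        lock.readLock().lock();
        try {
            int count = 0;
            // Keys starting with the prefix are contiguous from the prefix onward.
            for (String key : entries.tailMap(prefix, true).keySet()) {
                if (!key.startsWith(prefix)) {
                    break;
                }
                count++;
            }
            return count;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Lookups then only wait for the indexer, not for each other.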
The producer application can have two indices: published and in-work. The producer works only with the in-work index, and the consumer works only with the published one. Once the producer is done with indexing, it can replace the published index with the in-work one (usually by swapping one pointer). The producer may also publish a copy of the partial index if that brings value. This way you avoid long-held locks, which is useful when the index is accessed by lots of consumers.
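One possible shape for the published/in-work idea, assuming a single producer thread and using an AtomicReference as the "pointer" that gets swapped. The class PublishedIndex and the List<String> index are simplifications made up for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical example class, for illustration only.
class PublishedIndex {
    // Consumers only ever read the snapshot behind this reference.
    private final AtomicReference<List<String>> published =
            new AtomicReference<>(Collections.<String>emptyList());
    // Assumption: only the single producer thread touches the in-work list.
    private final List<String> inWork = new ArrayList<>();

    // Producer side: extend the in-work index.
    void add(String term) {
        inWork.add(term);
    }

    // Producer side: publish an immutable copy of the work so far.
    void publish() {
        List<String> copy = Collections.unmodifiableList(new ArrayList<>(inWork));
        published.set(copy); // the swap is a single reference write
    }

    // Consumer side: never blocks and never sees a half-built index.
    List<String> snapshot() {
        return published.get();
    }
}
```

Copying on every publish costs memory and time proportional to the index size, so how often to publish a partial index is a trade-off.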
No, that's not possible.
The only way of notifying a thread without any explicit code in the thread itself is to use Thread.interrupt(), which sets the thread's interrupt flag and makes blocking calls such as wait() or sleep() throw InterruptedException. interrupt() is usually not very reliable though, because handling an exception that can surface at any blocking call is a nightmare to get right in all code paths. Besides that, a single try{}catch(Throwable){} somewhere in the thread (including any libraries that you use) could be enough to swallow the signal.
In most cases, the only correct solution is to use a shared flag or a queue that the consumer can use to pass messages to the producer. If you worry about the producer being unresponsive or freezing, run it in a separate thread and require it to send heartbeat messages every n seconds. If it does not send a heartbeat, kill it. (Note that determining whether a producer is actually freezing, and not just waiting for an external event, is often very hard to get right as well.)
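The shared-flag variant could be sketched as follows. The class CooperativeProducer and its methods are hypothetical, and the counter increment stands in for one real unit of indexing work.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class, for illustration only.
class CooperativeProducer implements Runnable {
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private final AtomicInteger stepsDone = new AtomicInteger();

    void requestStop() {
        stopRequested.set(true);
    }

    int stepsDone() {
        return stepsDone.get();
    }

    @Override
    public void run() {
        // The flag is only checked between steps, so the producer
        // always stops at a point where its data is consistent.
        while (!stopRequested.get()) {
            stepsDone.incrementAndGet(); // stands in for one unit of real work
            Thread.yield();
        }
    }

    // Runs a producer on its own thread, asks it to stop, and waits for it.
    static int runUntilStopped() {
        CooperativeProducer producer = new CooperativeProducer();
        Thread worker = new Thread(producer);
        worker.start();
        while (producer.stepsDone() == 0) {
            Thread.yield(); // wait until at least one step has run
        }
        producer.requestStop();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return producer.stepsDone();
    }
}
```

The producer reacts to the flag only as often as it checks it, so the size of one step bounds how long the consumer has to wait.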