Currently I have an algorithm that works somewhat like a web spider or a file search system: it has a collection of elements to process, and processing an element can lead to enqueuing more elements.
However this algorithm is single threaded - it's because I fetch data from the db and would like to have only single db connection at once.
In my current situation performance is not critical - I'm doing this only for the visualization purposes to ease up debugging.
To me it seems natural to use a queue abstraction; however, it seems that using queues implies multithreading. As I understand it, most of the standard Java queue implementations reside in the java.util.concurrent package.
I understand that I can go with any data structure that supports push and pull, but I would like to know which data structure is more natural to use in this case (is it OK to use a queue in a single-threaded application?).
It's basically fine to use the java.util.concurrent structures with a single thread.
The main thing to watch out for is blocking calls. If you use a bounded structure like an ArrayBlockingQueue and call the put method on a queue that's full, the calling thread will block until there is space in the queue. If you call take on any kind of blocking queue when it's empty, the calling thread will block until there's something in the queue. If your application is single-threaded, no other thread can ever free up space or add an element, so the call blocks forever.
To avoid put blocking, you could use an unbounded structure like a LinkedBlockingQueue. To avoid blocking on removal, use a non-blocking operation - remove throws an exception if the queue is empty, and poll returns null.
Having said that, there are implementations of the Queue interface that are not in java.util.concurrent. ArrayDeque would probably be a good choice.
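As a rough illustration of the single-threaded case, here is a minimal sketch of a spider-like worklist built on ArrayDeque. The class name WorklistSketch, the traverse method and the links map (a stand-in for fetching neighbours from the db) are assumptions made for the example, not anything from the question.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

class WorklistSketch {
    // Visits every element reachable from start, in FIFO order, on the calling thread.
    // The links map is a hypothetical stand-in for "fetch the neighbours from the db".
    static List<String> traverse(String start, Map<String, List<String>> links) {
        Queue<String> pending = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        List<String> visited = new ArrayList<>();

        pending.add(start);
        seen.add(start);

        String current;
        // poll() returns null on an empty queue instead of blocking or throwing.
        while ((current = pending.poll()) != null) {
            visited.add(current);
            for (String next : links.getOrDefault(current, List.of())) {
                if (seen.add(next)) {   // enqueue each element only once
                    pending.add(next);
                }
            }
        }
        return visited;
    }
}
```

Because the loop ends as soon as poll() returns null, nothing here can block, which is the property that matters when only one thread exists.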
Queue is defined in java.util. LinkedList is a Queue and is not thread-safe. None of the methods declared by Queue itself block, so they are safe to use from a single thread.
It is ok to use any queue in a single threaded application. Synchronization overhead, in absence of concurrent threads, should be negligible, and is noticeable only if element processing time is very short.
If you want to use a Queue with a thread pool, I suggest using an ExecutorService, which combines both for you. The pools created by Executors.newFixedThreadPool use a LinkedBlockingQueue by default.
http://tutorials.jenkov.com/java-util-concurrent/executorservice.html
http://recursor.blogspot.co.uk/2007/03/mini-executorservice-future-how-to-for.html
http://www.vogella.com/articles/JavaConcurrency/article.html
Related
Currently we have LinkedBlockingQueue and ConcurrentLinkedQueue.
LinkedBlockingQueue can be bounded, but it uses locks.
ConcurrentLinkedQueue doesn't use locks, but it is not bounded. It also doesn't block, which makes it hard to poll.
Obviously I can't have a queue that both blocks and is lock-free (wait-free or non-blocking or something else). I'm not asking for academic definitions.
Does anyone know a queue implementation that is mostly lock-free (doesn't use a lock in the hot path), blocks when empty (no need to busy-wait), and is bounded (blocks when full)? An off-heap solution is welcome as well.
I heard about LMAX Disruptor, but it doesn't look like a queue at all.
I am happy to know non-general solutions too (Single-Producer-Single-Consumer, SPMC, MPSC)
If there are no known implementations, I am also happy to know possible algorithms.
Lock-free data structures use atomic operations (e.g. compare-and-swap) to eliminate the need for locks. Naturally, these data structures never block.
What you describe is a queue that uses lock-free mechanisms for calls that don't need to block, e.g. remove() on a non-empty queue, and uses a lock to block otherwise, e.g. remove() on an empty queue.
As you might realize, this is not possible to implement. If, for example, you checked after a pop operation whether the queue was in fact empty and then proceeded to block, another thread might already have inserted one or more items by the time you block.
I got to know that we can use BlockingQueue instead of the classical wait() and notify() when implementing the Producer Consumer pattern. My question is, which implementation is more efficient? An article about blocking queues says: "you don't require to use wait and notify to communicate between Producer and Consumer" (http://javarevisited.blogspot.com/2012/02/producer-consumer-design-pattern-with.html#ixzz2lczIZ3Mo). Does this simplicity come at the cost of efficiency?
A BlockingQueue will usually be at least as fast as a hand-written version, because the standard implementations do not use synchronized or wait/notify for queue access. ArrayBlockingQueue and LinkedBlockingQueue are built on ReentrantLock and Condition (LinkedBlockingQueue uses separate locks for put and take), while classes such as ConcurrentLinkedQueue use lock-free algorithms based on atomic compare-and-swap operations.
Think about a queue of 100 elements, and 1000 threads wanting to do their work. With a synchronized implementation, for each element 999 threads need to wait until 1 thread has picked its task. With a lock-free algorithm, 100 threads simultaneously pick their task, and only the other 900 have to wait.
If the number of objects produced/consumed every second is less than 100000, then you'll be unable to see the difference for standard or your own implementations.
Otherwise, you have the following options to speed up your code:
- Use ArrayBlockingQueue instead of LinkedBlockingQueue: there is no need to create a wrapper object for each transferred message. Another advantage of ArrayBlockingQueue is that the producer thread is blocked if the queue is full. Indeed, the producer should slow down if the consumer is not fast enough, otherwise we will end up exhausting memory.
- Send messages in batches, say in arrays of 10 messages each. This reduces thread contention on the shared object.
If you have to send tens of millions messages per second, look at Lmax Disruptor.
BlockingQueue is simply a class that puts wait() and notify() to this common use. Generally, doing it yourself is just reinventing the wheel, and only worth it if you have lots of producers and consumers and you can optimize in some way that's specific to your code.
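For comparison with a hand-rolled wait()/notify() version, here is a minimal sketch of a producer and a consumer that communicate only through an ArrayBlockingQueue. The class name ProducerConsumerSketch, the run method and the POISON end-of-stream marker are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class ProducerConsumerSketch {
    private static final int POISON = -1;   // hypothetical end-of-stream marker

    // Produces 0..count-1 on a second thread and consumes them on the calling thread.
    static List<Integer> run(int count, int capacity) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    queue.put(i);       // blocks while the queue is full
                }
                queue.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        List<Integer> consumed = new ArrayList<>();
        try {
            while (true) {
                int item = queue.take();    // blocks while the queue is empty
                if (item == POISON) {
                    break;
                }
                consumed.add(item);
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while consuming", e);
        }
        return consumed;
    }
}
```

All the waiting and signalling is hidden inside put() and take(), so there is no explicit monitor code to get wrong.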
The ArrayBlockingQueue will block the producer thread if the queue is full and it will block the consumer thread if the queue is empty.
Doesn't this concept of blocking go against the very idea of multithreading? Suppose I have a 'main' thread and want to delegate all 'Logging' activities to another thread. So basically, inside my main thread, I create a Runnable to log the output and put the Runnable on an ArrayBlockingQueue. The whole purpose of doing this is to have the 'main' thread return immediately without wasting any time on an expensive logging operation.
But if the queue is full, the main thread will be blocked and will wait until a spot is available. So how does it help us?
The queue doesn't block out of spite, it blocks to introduce an additional quality into the system. In this case, it's prevention of starvation.
Picture a set of threads, one of which produces work units really fast. If the queue were allowed to grow without bound, the "rapid producer" thread could potentially hog all of the processing capacity. Sometimes, prevention of such side effects is more important than having all threads unblocked.
I think this is the designer's decision. If the designer chooses blocking mode, ArrayBlockingQueue provides it with the put method. If the designer doesn't want blocking, ArrayBlockingQueue has the offer method, which returns false when the queue is full, but then the designer needs to decide what to do with the rejected logging event.
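A minimal sketch of the non-blocking choice, assuming the caller simply counts rejected events. The class name OfferSketch and the enqueueAll helper are made up for the example; offer() itself is the standard BlockingQueue method.

```java
import java.util.concurrent.BlockingQueue;

class OfferSketch {
    // Tries to enqueue every event without blocking; returns how many were rejected.
    static int enqueueAll(BlockingQueue<String> queue, String... events) {
        int rejected = 0;
        for (String event : events) {
            // offer() returns false immediately when a bounded queue is full,
            // whereas put() would block the caller until space is available.
            if (!queue.offer(event)) {
                rejected++;   // the caller decides here: drop, count, report elsewhere
            }
        }
        return rejected;
    }
}
```

With a queue created as new ArrayBlockingQueue<>(2) and three events, the third is rejected rather than stalling the calling thread.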
In your example I would consider blocking to be a feature: It prevents an OutOfMemoryError.
Generally speaking, one of your threads is just not fast enough to cope with the assigned load. So the others must slow down somehow in order not to endanger the whole application.
On the other hand, if the load is balanced, the queue will not block.
Blocking is a necessary function of multithreading. You must block to have synchronized access to data. It does not defeat the purpose of multithreading.
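I would suggest throwing an exception when the producer attempts to submit an item to a queue which is full. BlockingQueue.add() already does this (it throws IllegalStateException when a bounded queue is full), and remainingCapacity() lets you check for space beforehand, although the result of that check can be stale by the time you insert.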
This would allow the invoking code to decide how it wants to handle a full queue.
If execution order when processing items from the queue is unimportant, I recommend using a threadpool (known as an ExecutorService in Java).
It depends on the nature of your multi threading philosophy. For those of us who favour Communicating Sequential Processes a blocking queue is nearly perfect. In fact, the ideal would be one where no message can be put into the queue at all unless the receiver is ready to receive it.
So no, I don't think that a blocking queue goes against the very purpose of multi-threading. In fact, the scenario that you describe (the main thread eventually getting stalled) is a good illustration of the major problem with the actor-model of multi-threading; you've no idea whether or not it will deadlock / block, and you can't exhaustively test for it either.
In contrast, imagine a blocking queue that is zero messages deep. That way for the system to work at all you'd have to find a way to ensure that the logger is always guaranteed to be able to receive a message from the main thread. That's CSP. It might mean that in your hypothetical logger thread you have to have application defined buffering (as opposed to some framework developer's best guess of how deep a FIFO should be), a fast I/O subsystem, checks for keeping up, ways of dealing with falling behind, etc. In short it doesn't let you get away with it, you're forced to address every aspect of your system's performance.
That is of course harder, but that way you end up with a system that's definitely OK rather than the questionable "maybe" that you have if your blocking queues are an unknown number of messages deep.
It sounds like you have the general idea right of why you'd use something like an ArrayBlockingQueue to talk between threads.
Having a blocking queue gives you the option to do something different in case something goes wrong with your background worker threads, rather than blindly adding more requests to the queue. If there is room in the queue, there is no blocking.
For your specific use case, though, I would use an ExecutorService, which creates a pool of background worker threads, rather than reading/writing queues directly:
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ExecutorService.html
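import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
ExecutorService pool = Executors.newFixedThreadPool(poolSize);
pool.submit(myRunnable);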
A multithreaded program is non-deterministic insofar as you can't say beforehand: n producer actions will take exactly as long as m consumer actions. Therefore, synchronization between n producers and m consumers is necessary in every case.
You'll want to choose the queue size so that the number of active producers and consumers is maximized most of the time. But the thread model of java does not guarantee that any consumer will run unless it is the only unblocked thread. (Yet, of course, on multi-core CPUs it is very likely that the consumer will run).
You have to make a choice about what to do when a queue is full. With the put method of an ArrayBlockingQueue, that choice is to wait.
Another option would be to just throw away new objects if the queue is full; you can achieve this with offer.
You have to make a trade-off.
The getQueue() method provides access to the underlying blocking queue in the ThreadPoolExecutor, but this does not seem to be safe.
A traversal over the queue returned by this function might miss updates made to the queue by the ThreadPoolExecutor.
"Method getQueue() allows access to the work queue for purposes of monitoring and debugging. Use of this method for any other purpose is strongly discouraged."
What would you do if you wanted to traverse the workQueue used by the ThreadPoolExecutor? Or is there an alternate approach?
This is a continuation of "Choosing a data structure for a variant of producer consumer problem".
Now I am trying the multiple-producer, multiple-consumer case, but I want to use an existing thread pool, since I don't want to manage the thread pool myself. I also want a callback when the ThreadPoolExecutor has finished executing a task, along with the ability to examine the "in-progress transactions" data structure in a thread-safe way.
You can override the beforeExecute and afterExecute methods to let you know that a task has started and finished. You can override execute() to know when a task is added.
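A minimal sketch of that idea, assuming a fixed-size pool and a concurrent set of in-progress tasks. The class name TrackingExecutor and the inProgressSnapshot method are made up for the example; beforeExecute and afterExecute are the real ThreadPoolExecutor hooks.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class TrackingExecutor extends ThreadPoolExecutor {
    // Tasks currently being run by a worker thread.
    private final Set<Runnable> inProgress = ConcurrentHashMap.newKeySet();

    TrackingExecutor(int threads) {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    }

    @Override
    protected void beforeExecute(Thread t, Runnable r) {
        super.beforeExecute(t, r);
        inProgress.add(r);      // called on the worker thread just before r runs
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        inProgress.remove(r);   // called on the worker thread after r has finished
        super.afterExecute(r, t);
    }

    // Copy of the tasks that are running at the moment of the call.
    Set<Runnable> inProgressSnapshot() {
        return new HashSet<>(inProgress);
    }
}
```

Note that the hooks see whatever object the pool actually runs: with execute() that is your own Runnable, while with submit() it is a FutureTask wrapper around it.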
The problem you have is that the Queue is not designed to be queried, and a task can be consumed before you see it. One way around this is to create your own implementation of a Queue (perhaps overriding/wrapping a ConcurrentLinkedQueue).
BTW: The queue is thread-safe, however it is not guaranteed you will see every entry.
A ConcurrentLinkedQueue.iterator() is documented as
Returns an iterator over the elements in this queue in proper sequence. The returned iterator is a "weakly consistent" iterator that will never throw ConcurrentModificationException, and guarantees to traverse elements as they existed upon construction of the iterator, and may (but is not guaranteed to) reflect any modifications subsequent to construction.
If you wish to copy the items in the queue and ensure that what you have in the queue has not been executed, you might try this:
a) Introduce the ability to pause and resume execution. See: http://download.oracle.com/javase/1,5.0/docs/api/java/util/concurrent/ThreadPoolExecutor.html
b) first pause the queue, then copy the queue, then resume the queue.
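The pause, copy, resume sequence could look roughly like the sketch below, which follows the pausable-executor extension example in the ThreadPoolExecutor documentation. The class name PausableExecutor and the queuedSnapshot method are assumptions added for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class PausableExecutor extends ThreadPoolExecutor {
    private final ReentrantLock pauseLock = new ReentrantLock();
    private final Condition unpaused = pauseLock.newCondition();
    private boolean paused;   // guarded by pauseLock

    PausableExecutor(int threads) {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    }

    @Override
    protected void beforeExecute(Thread t, Runnable r) {
        super.beforeExecute(t, r);
        pauseLock.lock();
        try {
            while (paused) {
                unpaused.await();   // workers wait here while the pool is paused
            }
        } catch (InterruptedException e) {
            t.interrupt();
        } finally {
            pauseLock.unlock();
        }
    }

    void pause() {
        pauseLock.lock();
        try {
            paused = true;
        } finally {
            pauseLock.unlock();
        }
    }

    void resume() {
        pauseLock.lock();
        try {
            paused = false;
            unpaused.signalAll();
        } finally {
            pauseLock.unlock();
        }
    }

    // Copies the tasks still waiting in the work queue.
    List<Runnable> queuedSnapshot() {
        return new ArrayList<>(getQueue());
    }
}
```

One caveat: a worker that has already taken a task from the queue waits inside beforeExecute while paused, so that task is no longer part of the snapshot even though it has not started running.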
And then I have my own question. The problem I see is that when you submit your Runnable, it is not the Runnable itself that is placed in the queue but a FutureTask wrapper, and I cannot find any way to determine which one of my runnables I'm looking at. So grabbing and examining the queue is pretty useless. Does anybody know what I missed there?
If you are following Jon Skeet's advice in your accepted answer from your previous question, then you'll be controlling access to your queues via locks. If you acquire a lock on the in-progress queue then you can guarantee that a traversal will not miss any items in it.
The problem with this of course is that while you are doing the traverse all other operations on the queue (other producers and consumers trying to access it) will block, which could have a pretty dire effect on performance.
I have a producer app that generates an index (stores it in some in-memory tree data structure). And a consumer app will use the index to search for partial matches.
I don't want the consumer UI to have to block (e.g. via some progress bar) while the producer is indexing the data. Basically if the user wishes to use the partial index, it will just do so. In this case, the producer will potentially have to stop indexing for a while until the user goes away to another screen.
Roughly, I know I will need the wait/notify protocol to achieve this. My question: is it possible to interrupt the producer thread using wait/notify while it is doing its business? What java.util.concurrent primitives do I need to achieve this?
The way you've described this, there's no reason that you need wait/notify. Simply synchronize access to your data structure, to ensure that it is in a consistent state when accessed.
Edit: by "synchronize access", I do not mean synchronize the entire data structure (which would end up blocking either producer or consumer). Instead, synchronize only those bits that are being updated, and only at the time that you update them. You'll find that most of the producer's work can take place in an unsynchronized manner: for example, if you're building a tree, you can identify the node where the insert needs to happen, synchronize on that node, do the insert, then continue on.
In your producer thread, you are likely to have some kind of main loop. This is probably the best place to interrupt your producer. Instead of using wait() and notify(), I suggest you use the Java synchronization objects introduced in Java 5.
You could potentially do something like this:
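import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

class Indexer {
    private final Lock lock = new ReentrantLock();

    public void index() {
        while (somecondition) {
            lock.lock();
            try {
                // perform one indexing step
            } finally {
                // release between steps so that a waiting lookup() can run
                lock.unlock();
            }
        }
    }

    public Item lookup() {
        lock.lock();
        try {
            // perform your lookup and return the result
        } finally {
            lock.unlock();
        }
    }
}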
You need to make sure that each time the indexer releases the lock, your index is in a consistent, legal state. In this scenario, when the indexer releases the lock, it leaves a chance for a new or waiting lookup() operation to take the lock, complete and release the lock, at which point your indexer can proceed to its next step. If no lookup() is currently waiting, then your indexer just reacquires the lock itself and goes on with its next operation.
If you think you might have more than one thread trying to do the lookup at the same time, you might want to have a look at the ReadWriteLock interface and the ReentrantReadWriteLock implementation.
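A minimal sketch of the read/write variant, assuming the index is a plain TreeSet of terms. The class name RwIndexSketch and its methods are made up for the example and stand in for whatever index structure is really used.

```java
import java.util.TreeSet;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class RwIndexSketch {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final TreeSet<String> index = new TreeSet<>();   // stand-in for the real index

    // One indexing step: held exclusively, so readers never see a half-done insert.
    void add(String term) {
        lock.writeLock().lock();
        try {
            index.add(term);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Partial-match lookup: several threads may hold the read lock at the same time.
    boolean hasPrefix(String prefix) {
        lock.readLock().lock();
        try {
            // The smallest term >= prefix starts with prefix if any term does.
            String candidate = index.ceiling(prefix);
            return candidate != null && candidate.startsWith(prefix);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Lookups then only wait for an indexing step in progress, never for each other.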
Of course this solution is the simple way to do it. It will block either one of the threads that doesn't have the lock. You may want to check if you can just synchronize on your data structure directly, but that might prove tricky since building indexes tends to use some sort of balanced tree or B-Tree or whatnot where node insertion is far from being trivial.
I suggest you first try that simple approach, then see if the way it behaves suits you. If it doesn't, you may either try breaking up the indexing steps into smaller steps, or try synchronizing on only parts of your data structure.
Don't worry too much about the performance of locking: in Java, uncontended locking (when only one thread is trying to take the lock) is cheap. As long as most of your locking is uncontended, locking performance is nothing to be concerned about.
The producer application can have two indices: published and in-work. The producer works only with the in-work index, and the consumer works only with the published one. Once the producer is done with indexing, it can replace the published index with the in-work one (usually by swapping one pointer). The producer may also publish a copy of the partial index if that brings value. This way you avoid long-term locks, which is useful when the index is accessed by lots of consumers.
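A minimal sketch of the two-index idea, assuming a single producer thread and an AtomicReference as the swapped pointer. The class name PublishedIndexSketch and its methods are made up for the example.

```java
import java.util.Collections;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicReference;

class PublishedIndexSketch {
    // Consumers only ever read the published snapshot.
    private final AtomicReference<SortedSet<String>> published =
            new AtomicReference<>(Collections.unmodifiableSortedSet(new TreeSet<String>()));

    // Only the single producer thread touches the in-work index.
    private final TreeSet<String> inWork = new TreeSet<>();

    // Producer side: one indexing step, invisible to consumers until publish().
    void index(String term) {
        inWork.add(term);
    }

    // Producer side: publishes a copy, so later indexing cannot change what consumers see.
    void publish() {
        published.set(Collections.unmodifiableSortedSet(new TreeSet<String>(inWork)));
    }

    // Consumer side: reads whichever snapshot is current, without taking any lock.
    boolean contains(String term) {
        return published.get().contains(term);
    }
}
```

The cost is the copy made on each publish(), so how often to publish a partial index is a trade-off between freshness and copying work.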
No, that's not possible.
The only way of notifying a thread without any explicit code in the thread itself is to use Thread.interrupt(). It sets the thread's interrupt flag and causes blocking calls such as wait(), sleep() or join() to throw InterruptedException. interrupt() is usually not very reliable though, because handling that exception correctly in all code paths is a nightmare to get right. Besides that, a single try{}catch(Throwable){} somewhere in the thread (including any libraries that you use) could be enough to swallow the signal.
In most cases, the only correct solution is to use a shared flag or a queue that the consumer can use to pass messages to the producer. If you worry about the producer being unresponsive or freezing, run it in a separate thread and require it to send heartbeat messages every n seconds. If it does not send a heartbeat, kill it. (Note that determining whether a producer is actually freezing, and not just waiting for an external event, is often very hard to get right as well).
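A minimal sketch of the shared-flag approach, assuming the producer's work can be split into short steps. The class name StoppableProducerSketch and its methods are made up for the example.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class StoppableProducerSketch implements Runnable {
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private final AtomicInteger steps = new AtomicInteger();

    @Override
    public void run() {
        // The producer checks the flag between steps, so it only stops at a safe point.
        while (!stopRequested.get()) {
            steps.incrementAndGet();   // stand-in for one indexing step
            Thread.yield();
        }
    }

    // Called from the consumer side to ask the producer to stop.
    void requestStop() {
        stopRequested.set(true);
    }

    int stepsDone() {
        return steps.get();
    }
}
```

The producer reacts only as quickly as its steps are short, which is the price of never being stopped halfway through an update.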