I am having some trouble grasping the idea of a concurrent queue. I understand a queue is a FIFO, or first-in first-out, data structure.
Now when we add the concurrency part, which I interpret as thread safety (please let me know if that is incorrect), things get a bit fuzzy. By concurrency, do we mean the way various threads can add to the queue, or delete (service an item) from the queue? Does concurrency provide a sense of ordering to these operations?
I would greatly appreciate a general description of the functionality of a concurrent queue. A similar post here is not as general as I hoped.
Also is there such a thing as a concurrent priority queue? What would be its usage?
Many thanks in advance, for any brief explanations or helpful links on this subject.
The notion that a BlockingQueue offers little overhead is a bit misleading. Acquiring a lock incurs pretty substantial overhead; along with the context switching, we are talking thousands of instructions. Not just that, but the progress of one thread will directly affect another thread. Now, it's not as bad as it was years ago, but compared to non-blocking, it is substantial.
BlockingQueues use locks for mutual exclusion:
ArrayBlockingQueue, LinkedBlockingQueue, and PriorityBlockingQueue are three blocking queues, while
ConcurrentLinkedQueue and Java 7's LinkedTransferQueue use the Michael and Scott non-blocking queue algorithm.
Under moderate to low contention (which is more of a real-world scenario), the non-blocking queues significantly outperform blocking queues.
And to note on Steve's comment about the lack of bottlenecks: under heavy contention a non-blocking algorithm can bottleneck on the constant CAS attempts, while a blocking one will suspend the threads. We then see that a BlockingQueue under heavy contention slightly outperforms a non-blocking queue, but that type of contention isn't the norm by any means.
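To make the two styles concrete, here is a minimal sketch contrasting a lock-based BlockingQueue with the lock-free ConcurrentLinkedQueue; the element values and capacity are made up for illustration:

    import java.util.Queue;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ConcurrentLinkedQueue;

    public class QueueFlavors {
        public static void main(String[] args) throws InterruptedException {
            // Lock-based: put/take acquire an internal lock and block when full/empty.
            BlockingQueue<String> blocking = new ArrayBlockingQueue<>(16);
            blocking.put("job-1");             // blocks if the queue is full
            String job = blocking.take();      // blocks until an element exists

            // CAS-based (Michael & Scott algorithm): operations never block.
            Queue<String> nonBlocking = new ConcurrentLinkedQueue<>();
            nonBlocking.offer("job-2");        // lock-free enqueue
            String maybe = nonBlocking.poll(); // returns null if empty instead of waiting
            System.out.println(job + ", " + maybe);
        }
    }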
I understand by "concurrency" that the queue is thread-safe. This does not mean that it will be efficient. However, I would imagine that the Java queues use a lock-free implementation, which means that there is little or no penalty when two threads attempt a push or a pop at the same time. What generally happens is that they use atomic compare-and-swap instructions at the assembler level, which ensure that the same object cannot be popped twice.
I once wrote a lock-free FIFO queue (in Delphi) which worked very well. Much more efficient than a previous version which used critical sections. The critical-section version ground to a halt, especially with many threads all trying to access the queue. The lock-free version, however, had no bottlenecks despite many threads accessing it a lot.
You should start by checking out the BlockingQueue interface definition, as this is the cornerstone for using queues for communication between threads and contains utility methods to allow producer and consumer threads to access the queue in either a blocking or non-blocking fashion. This, along with thread-safe access, is my understanding of what constitutes a "concurrent queue" (although I've never heard of that phrase; BlockingQueue merely exists in the java.util.concurrent package).
To answer the second part of your question, the priority queue implementation you should study is PriorityBlockingQueue. This may be useful if your producer thread(s) are producing tasks of varying priorities (e.g. requests from "normal users" and "power users") and you wish to control the order in which tasks are processed by your consumer thread(s). One possible pitfall to avoid is the starvation of low priority tasks that are never removed from the queue due to the constant influx of higher priority tasks.
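As a small illustration of that second point, here is a sketch of a PriorityBlockingQueue holding a hypothetical Request type; the class and its priority scheme are invented for the example:

    import java.util.concurrent.PriorityBlockingQueue;

    public class PriorityDemo {
        // Hypothetical task type: a lower number means a higher priority.
        static class Request implements Comparable<Request> {
            final int priority;
            final String user;
            Request(int priority, String user) { this.priority = priority; this.user = user; }
            public int compareTo(Request other) { return Integer.compare(priority, other.priority); }
        }

        public static void main(String[] args) throws InterruptedException {
            PriorityBlockingQueue<Request> queue = new PriorityBlockingQueue<>();
            queue.put(new Request(5, "normal user"));
            queue.put(new Request(1, "power user"));
            // Items come out in priority order, not insertion order:
            System.out.println(queue.take().user); // prints "power user"
        }
    }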
Just leaving a link here to the java.util.concurrent package documentation, which I think contains very important information about some of the questions raised here.
See: Concurrent Collections and Memory Consistency Properties
Related
Given we have an application that is heavily polluted with concurrency constructs: multiple techniques are used (different people worked without a clear architecture in mind), and there are multiple questionable locks that are there "just in case", plus thread-safe queues. CPU usage is around 20%.
Now my goal is to optimize it such that it is making better use of caches and generally improve its performance and service time.
I'm considering pinning the parent process to a single core, removing everything that causes memory barriers, replacing all thread-safe data structures, and replacing all locks with some UnsafeReentrantLock which would simply use a normal reference field but take care of the exclusive execution needs...
I expect that we would end up with a much more cache-friendly application, since we wouldn't have rapid cache flushes all the time (no memory barriers). We would have less overhead since we wouldn't need thread-safe data structures, volatiles, or atomics, and I would assume that service time would improve as well, since we would no longer synchronize on multiple thread-safe queues...
Is there something that I'm overlooking here?
Maybe the blocking operations would also need attention, since they would not show up in that 20% usage?
In event processing, one function puts values into a collection and another removes them from the same collection. The items should be placed into the collection in the order they are received from the source (sockets) and read in the same order, or else the results will change.
A queue is the collection most people recommend, but is the queue locked while an item is being added, so that the other function has to wait until the add completes? That would make it inefficient, and the operational latency would grow over time.
For example, one thread reads from a queue and another writes to the same queue; only one operation can run on the queue at a time, until the lock is released. Is there any data structure that avoids this?
ConcurrentLinkedQueue is one example. Please see the other classes in java.util.concurrent.
There are even more performant third-party libraries for specific cases, e.g. the LMAX Disruptor.
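A minimal sketch of the reader/writer scenario above using ConcurrentLinkedQueue; the event values and counts are invented, and note that the consumer spins rather than blocks, since this queue has no blocking API:

    import java.util.concurrent.ConcurrentLinkedQueue;

    public class EventPipe {
        public static void main(String[] args) throws InterruptedException {
            ConcurrentLinkedQueue<Integer> events = new ConcurrentLinkedQueue<>();

            Thread producer = new Thread(() -> {
                for (int i = 0; i < 5; i++) {
                    events.offer(i); // lock-free: never makes the reader wait
                }
            });
            Thread consumer = new Thread(() -> {
                int seen = 0;
                while (seen < 5) {
                    Integer e = events.poll(); // null when empty; we spin instead of blocking
                    if (e != null) {
                        System.out.println("got " + e); // delivered in insertion order
                        seen++;
                    }
                }
            });
            producer.start(); consumer.start();
            producer.join(); consumer.join();
        }
    }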
In fact, the LinkedBlockingQueue is the easiest to use in many cases because of its blocking put and take methods, which wait until there's an item to take, or space for another item to insert in case an upper size limit (named capacity) has been activated. Setting a capacity is optional; without one, the queue can grow indefinitely.
The ArrayBlockingQueue, on the other hand, is the most efficient and beautiful of them: it internally uses a ring buffer and therefore must have a fixed capacity. It is way faster than the LinkedBlockingQueue, yet far from the maximum throughput you can achieve with a disruptor :)
In both cases, blocking is purely optional on both sides. The non-blocking API of all concurrent queues is also supported. The blocking and non-blocking APIs can be mixed.
In many cases the queue is not the bottleneck, and when it really is, using a disruptor is often the sensible thing to do. It is not a queue but a ring buffer shared between participating threads with different roles, i.e. typically one producer, n workers, and one consumer. A bit more cumbersome to set up, but speeds around 100 million transactions per second are possible on modern hardware, because it does not require expensive volatile variables but instead relies on more subtle, machine-dependent ways of serialising reads and writes (you basically need to write parts of such a thing in assembler) :)
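A minimal producer/consumer sketch with a bounded ArrayBlockingQueue, showing the blocking put/take style described above; the capacity and item names are arbitrary:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class BoundedPipe {
        public static void main(String[] args) throws InterruptedException {
            // Ring-buffer backed, fixed capacity of 2: put() blocks when full.
            BlockingQueue<String> queue = new ArrayBlockingQueue<>(2);

            Thread producer = new Thread(() -> {
                try {
                    for (int i = 0; i < 4; i++) {
                        queue.put("item-" + i); // waits while the queue is full
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            producer.start();
            for (int i = 0; i < 4; i++) {
                System.out.println(queue.take()); // waits while the queue is empty
            }
            producer.join();
        }
    }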
I was looking at ArrayBlockingQueue
For the fair option I can pass in the constructor, what does it actually mean to be fair?
fair - if true then queue accesses for threads blocked on insertion or removal, are processed in FIFO order; if false the access order is unspecified.
From what I understand, fair means FIFO? Is that not what I need, e.g. to ensure one thread does not keep hogging access to the queue?
FAIR is to implement a fair scheduling policy, or to allow the implementation to choose one. Fair scheduling sounds like the better alternative, since it avoids the possibility that an unlucky thread might be delayed indefinitely, but in practice the benefits it provides are rarely important enough to justify incurring the large overhead that it imposes on a queue's operation. If fair scheduling is not specified, ArrayBlockingQueue will normally approximate fair operation, but with no guarantees.
Reference with Code
Fair means guaranteed FIFO access. Java 7 will literally maintain a queue of the threads that attempt to access the queue while its lock has already been taken.
Fair queues will be significantly slower than unfair queues on a system with heavy usage of the ArrayBlockingQueue, due to the maintenance of the queue for thread ordering. If it isn't extremely important that all threads progress at a very similar rate, it's probably worth keeping the queue unfair.
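In code, fairness is just the optional second constructor argument; a small sketch with made-up capacities:

    import java.util.concurrent.ArrayBlockingQueue;

    public class FairnessDemo {
        public static void main(String[] args) {
            // Default (unfair): blocked threads are granted access in unspecified order.
            ArrayBlockingQueue<String> unfair = new ArrayBlockingQueue<>(10);

            // Fair: threads blocked on put()/take() are served in FIFO order,
            // at the cost of noticeably lower throughput.
            ArrayBlockingQueue<String> fair = new ArrayBlockingQueue<>(10, true);

            System.out.println(unfair.remainingCapacity() + " " + fair.remainingCapacity());
        }
    }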
Depending on your problem, you can define what is fair. You can say that fair is a timeslot in which a thread can access a resource. Or you can define fair as a thread accessing a resource in a first-come, first-served manner.
FIFO is fair in the order of access to a resource.
In a life without Java Executors, a new thread would have to be created for each Runnable task. Creating new threads requires thread overhead (creation and teardown) that adds complexity and wasted time to a non-Executor program.
Referring to code:
no Java Executor -
new Thread(aRunnableObject).start();
with Java Executor -
Executor executor = Executors.newFixedThreadPool(4); // or any other Executors factory method
executor.execute(aRunnable);
Bottom line is that Executors abstract the low-level details of how to manage threads.
Is that true?
Thanks.
Bottom line is that Executors abstract the low-level details of how to manage threads. Is that true?
Yes.
They deal with issues such as creating the thread objects, maintaining a pool of threads, controlling the number of threads that are running, and graceful / less-than-graceful shutdown. Doing these things by hand is non-trivial.
EDIT
There may or may not be a performance hit in doing this ... compared with a custom implementation perfectly tuned to the precise needs of your application. But the chances are that:
your custom implementation wouldn't be perfectly tuned, and
the performance difference wouldn't be significant anyway.
Besides, the Executor support classes allow you to simply tune various parameters (e.g. thread pool sizes) if there is an issue that needs to be addressed. I don't see how garbage collection overheads would be significantly impacted by using Executors, one way or the other.
As a general rule, you should focus on writing your applications simply and robustly (e.g. using the high level concurrency support classes), and only worry about performance if:
your application is running "too slow", and
the profiling tools tell you that you've got a problem in a particular area.
A couple of benefits of Executors as against normal threads:
Throttling can be achieved easily by varying the size of the thread pools. This helps keep control of the number of threads flowing through your application, which is particularly helpful when benchmarking your application for load bearing.
Better management of rejected Runnable tasks can be achieved using a RejectedExecutionHandler, as in the sketch after this list.
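A minimal sketch of both points: a bounded ThreadPoolExecutor whose saturation policy is one of the built-in RejectedExecutionHandler implementations; the pool size, queue capacity, and task count are arbitrary:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class ThrottledPool {
        public static void main(String[] args) {
            // At most 4 threads and 100 queued tasks; when saturated, the
            // submitting thread runs the task itself (built-in back-pressure).
            ThreadPoolExecutor pool = new ThreadPoolExecutor(
                    4, 4,                      // core and maximum pool size
                    0L, TimeUnit.MILLISECONDS, // keep-alive for idle threads
                    new ArrayBlockingQueue<>(100),
                    new ThreadPoolExecutor.CallerRunsPolicy());

            for (int i = 0; i < 10; i++) {
                final int id = i;
                pool.execute(() -> System.out.println(
                        "task " + id + " on " + Thread.currentThread().getName()));
            }
            pool.shutdown();
        }
    }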
I think all that Executors do is take care of the low-level tasks for you, but you still have to judiciously decide which thread pool you want. I mean, if your use case needs a maximum of 5 threads and you go and use a thread pool with 100 threads, then it is certainly going to have an impact on performance. Other than this, nothing extra is being done at the low level which is going to halt the system. And last of all, it is always better to get an idea of what is being done at the low level, so that we have a fair idea of the underlying machinery.
Can somebody tell me how I can find out how many threads are in a deadlock condition in a Java multithreaded application? What is the way to find out the list of deadlocked threads?
I have heard about thread dumps and stack traces, but I don't know how to obtain or use them.
I also want to know what new features have been introduced in Java 5 for Threading?
Please let me know with your comments and suggestions.
Way of obtaining thread dumps:
ctrl-break (Windows) or ctrl-\, possibly ctrl-4 and kill -3 on Linux/UNIX
jstack and your process id (use jps)
jconsole or visualvm
just about any debugger
Major new threading features in J2SE 5.0 (released 2004, now in its End of Service Life period):
java.util.concurrent
New Java Memory Model.
Use kill -3 on the process id; this will print a thread dump and an overview of thread contention to the console.
From within your program, the ThreadMXBean class has a method findMonitorDeadlockedThreads(), as well as methods for querying the current stack traces of threads. From the console in Windows, doing Ctrl+Break gives you a list of stack traces and indicates deadlocked threads.
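Here is a minimal sketch of the programmatic route via ThreadMXBean; it only reports deadlocks involving object monitors, and the printed fields are just examples of what ThreadInfo exposes:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public class DeadlockDetector {
        public static void main(String[] args) {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();

            // IDs of threads deadlocked on monitors, or null if there are none.
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids == null) {
                System.out.println("No deadlocked threads.");
                return;
            }
            for (ThreadInfo info : mx.getThreadInfo(ids, Integer.MAX_VALUE)) {
                System.out.println(info.getThreadName()
                        + " is blocked on " + info.getLockName());
            }
        }
    }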
As well as some tweaks to the Java memory model that tidy up some concurrency "loopholes", the most significant underlying feature of Java 5 is that it exposes compare-and-set (CAS) operations to the programmer. On the back of this, a whole raft of concurrency utilities is provided in the platform. There's really a whole host of stuff, but they include:
concurrent collections
executors, which effectively allow you to implement things such as thread pools
other common concurrency constructs (queues, latches, barriers)
atomic variables
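As a tiny illustration of the CAS primitive mentioned above, here is a sketch of the classic retry loop using AtomicInteger; the counter semantics are invented for the example:

    import java.util.concurrent.atomic.AtomicInteger;

    public class CasCounter {
        private final AtomicInteger count = new AtomicInteger(0);

        // Classic CAS loop: read, compute, attempt the swap, retry on failure.
        public int addTwo() {
            while (true) {
                int current = count.get();
                int next = current + 2;
                if (count.compareAndSet(current, next)) {
                    return next; // the swap succeeds only if nobody raced us
                }
            }
        }

        public static void main(String[] args) {
            CasCounter c = new CasCounter();
            System.out.println(c.addTwo()); // prints 2
        }
    }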
You may be interested in some tutorials I've written on many of the Java 5 concurrency features.
If you want to learn about the new concurrency features in Java 5, you could do a lot worse than getting a copy of Java Concurrency in Practice by Brian Goetz (Goetz and a number of the coauthors designed the Java 5 concurrency libraries). It is both highly readable and authoritative, combining practical examples and theory.
The executive summary of the new concurrent utilities is as follows:
Task Scheduling Framework - The Executor framework is a framework for standardizing invocation, scheduling, execution, and control of asynchronous tasks according to a set of execution policies. Implementations are provided that allow tasks to be executed within the submitting thread, in a single background thread (as with events in Swing), in a newly created thread, or in a thread pool, and developers can create Executors supporting arbitrary execution policies. The built-in implementations offer configurable policies such as queue length limits and saturation policy, which can improve the stability of applications by preventing runaway resource consumption.
Concurrent Collections - Several new Collections classes have been added, including the new Queue and BlockingQueue interfaces, and high-performance, concurrent implementations of Map, List, and Queue.
Atomic Variables - Classes for atomically manipulating single variables (primitive types or references), providing high-performance atomic arithmetic and compare-and-set methods. The atomic variable implementations in java.util.concurrent.atomic offer higher performance than would be available by using synchronization (on most platforms), making them useful for implementing high-performance concurrent algorithms as well as conveniently implementing counters and sequence number generators.
Synchronizers - General purpose synchronization classes, including semaphores, mutexes, barriers, latches, and exchangers, which facilitate coordination between threads.
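A minimal sketch of one such synchronizer, CountDownLatch, with an arbitrary worker count:

    import java.util.concurrent.CountDownLatch;

    public class LatchDemo {
        public static void main(String[] args) throws InterruptedException {
            int workers = 3;
            CountDownLatch done = new CountDownLatch(workers);

            for (int i = 0; i < workers; i++) {
                final int id = i;
                new Thread(() -> {
                    System.out.println("worker " + id + " finished");
                    done.countDown(); // decrement the latch
                }).start();
            }

            done.await(); // blocks until every worker has counted down
            System.out.println("all workers done");
        }
    }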
Locks - While locking is built into the Java language via the synchronized keyword, there are a number of inconvenient limitations to built-in monitor locks. The java.util.concurrent.locks package provides a high-performance lock implementation with the same memory semantics as synchronization, but which also supports specifying a timeout when attempting to acquire a lock, multiple condition variables per lock, non-lexically scoped locks, and support for interrupting threads which are waiting to acquire a lock.
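For example, a timed lock acquisition is impossible with synchronized but straightforward with ReentrantLock; a sketch with an arbitrary timeout:

    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.locks.ReentrantLock;

    public class TimedLock {
        private final ReentrantLock lock = new ReentrantLock();

        public boolean updateIfAvailable() throws InterruptedException {
            // Unlike synchronized, we can give up after a timeout, and we
            // remain interruptible while waiting for the lock.
            if (!lock.tryLock(100, TimeUnit.MILLISECONDS)) {
                return false; // lock not acquired; caller can back off and retry
            }
            try {
                // ... critical section ...
                return true;
            } finally {
                lock.unlock(); // always release in a finally block
            }
        }
    }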
Nanosecond-granularity timing - The System.nanoTime method enables access to a nanosecond-granularity time source for making relative time measurements, and methods which accept timeouts (such as the BlockingQueue.offer, BlockingQueue.poll, Lock.tryLock, Condition.await, and Thread.sleep) can take timeout values in nanoseconds. The actual precision of System.nanoTime is platform-dependent.
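A small sketch combining both: measuring how long a timed queue operation actually waited (the 50 ms timeout is arbitrary):

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.TimeUnit;

    public class NanoTiming {
        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);

            long start = System.nanoTime();
            // Wait at most 50 ms for an element; returns null on timeout.
            String item = queue.poll(50, TimeUnit.MILLISECONDS);
            long elapsed = System.nanoTime() - start;

            // nanoTime is only meaningful for differences, not wall-clock time.
            System.out.println("item=" + item + ", waited "
                    + elapsed / 1_000_000 + " ms");
        }
    }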