I was looking at ArrayBlockingQueue
For the fair option I can pass in the constructor, what does it actually mean to be fair?
fair - if true then queue accesses for threads blocked on insertion or
removal, are processed in FIFO order; if false the access order is
unspecified.
From what I understand, fair simply means FIFO? Is that not what I need, e.g. to ensure that one thread does not keep monopolising access to the queue?
FAIR is to implement a fair scheduling policy or to allow the
implementation to choose one. Fair scheduling sounds like the better
alternative, since it avoids the possibility that an unlucky thread
might be delayed indefinitely but, in practice, the benefits it
provides are rarely important enough to justify incurring the large
overhead that it imposes on a queue's operation. If fair scheduling is
not specified, ArrayBlockingQueue will normally approximate fair
operation, but with no guarantees.
Reference with Code
Fair means guaranteed FIFO access. Java 7 will literally maintain a queue of the threads that attempt to access the queue while its lock is already taken.
Fair queues will be significantly slower than unfair queues on a system that uses the ArrayBlockingQueue heavily, due to the bookkeeping required to maintain the thread ordering. Unless it is extremely important that all threads progress at a very similar rate, it's probably worth keeping the queue unfair.
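To make the flag concrete, here is a minimal sketch of selecting fair versus unfair access when constructing the queue (the class and variable names are invented for the example):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class FairQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        // capacity 10, fair = true: threads blocked on put/take acquire the
        // internal lock in FIFO (arrival) order
        BlockingQueue<String> fairQueue = new ArrayBlockingQueue<>(10, true);

        // fair = false (the default): the handoff order is unspecified,
        // which usually gives better throughput
        BlockingQueue<String> unfairQueue = new ArrayBlockingQueue<>(10);

        fairQueue.put("task-1");              // blocks while the queue is full
        System.out.println(fairQueue.take()); // blocks while the queue is empty
    }
}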
Depending on your problem, you can define what fair means. You can say that fair is a time slot in which a thread can access a resource, or you can define fair as threads accessing a resource in a first-come, first-served manner.
FIFO is fair in the order of access to a resource.
Related
AFAIK, every object in Java has a header whose first word (the mark word) is used for storing locking information: either a flag when only one thread acquires the lock, or a pointer to a monitor object when there is contention between threads. In both cases, a compare-and-swap (CAS) is used to acquire the lock.
But according to this link -
https://www.baeldung.com/lmax-disruptor-concurrency
To deal with the write contention, a queue often uses locks, which can cause a context switch to the kernel. When this happens the processor involved is likely to lose the data in its caches.
What am I missing?
Neither synchronized nor the standard Lock implementations require a context switch into the kernel when acquiring an uncontended lock or when unlocking. These operations indeed boil down to an atomic CAS or write.
The performance-critical aspect is contention, i.e. trying to acquire the monitor or lock when it is not available. Waiting for the availability of the monitor or lock implies putting the thread into a waiting state and reactivating it when the resource becomes available. The performance impact of this is so large that you don't need to worry about CPU caches at all.
For this reason, typical implementations perform some amount of spinning, rechecking the availability of the monitor or lock in a loop for a short time when there is a chance of it becoming available within that time (the amount of spinning is usually tied to the number of CPU cores). If the resource becomes available in that time, these costs can be avoided. This, however, usually requires the acquisition to be allowed to be unfair, as a spinning acquirer may overtake an already waiting thread.
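As a rough user-level illustration of that spin-then-park idea (the JVM does something similar internally for monitors; the SPIN_LIMIT value and the class name are made up for this sketch):

import java.util.concurrent.locks.ReentrantLock;

public class SpinThenBlock {
    // Spin only on multi-core machines; the limit here is an arbitrary choice
    private static final int SPIN_LIMIT =
            Runtime.getRuntime().availableProcessors() > 1 ? 64 : 0;

    private final ReentrantLock lock = new ReentrantLock(); // unfair by default

    void withLock(Runnable criticalSection) {
        boolean acquired = false;
        // Optimistically retry for a short while; an unfair tryLock() may
        // "barge" ahead of threads that are already parked and waiting.
        for (int i = 0; i < SPIN_LIMIT && !acquired; i++) {
            acquired = lock.tryLock();
        }
        if (!acquired) {
            lock.lock(); // give up spinning and park until the lock is free
        }
        try {
            criticalSection.run();
        } finally {
            lock.unlock();
        }
    }
}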
Note that the linked article says before your cited sentence:
Queues are typically always close to full or close to empty due to the differences in pace between consumers and producers.
In such a scenario, the faster threads will sooner or later enter a condition wait, waiting for new space or new items in a queue, even when they acquired the lock without contention. So in this specific scenario, the associated costs are indeed there and unavoidable when using a simple queue implementation.
In event processing, one function puts values into a collection and another removes them from the same collection. The items should be placed into the collection in the order they are received from the source (sockets) and read in the same order, or else the results will change.
A queue is the collection most people recommend, but is the queue blocked while an item is being added, so that the other function has to wait until the add completes? That would make it inefficient, and the operational latency would increase over time.
For example, one thread reads from a queue and another writes to the same queue. Only one operation at a time can run on the queue until it releases the lock. Is there any data structure that avoids this?
ConcurrentLinkedQueue is one of the examples. Please see other classes from java.util.concurrent.
There are even more performant third party libraries for specific cases, e.g. LMAX Disruptor
In fact, the LinkedBlockingQueue is the easiest to use in many cases because of its blocking put and take methods, which wait until there is an item to take, or until there is space for another item to insert if an upper size limit (the capacity) has been set. Setting a capacity is optional; without one, the queue can grow indefinitely.
The ArrayBlockingQueue, on the other hand, is the most efficient and beautiful of them: it internally uses a ring buffer and therefore must have a fixed capacity. It is much faster than the LinkedBlockingQueue, yet far from the maximum throughput you can achieve with a disruptor :)
In both cases, blocking is purely optional on both sides. The non-blocking API of all concurrent queues is also supported. The blocking and non-blocking APIs can be mixed.
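A minimal producer/consumer sketch with a LinkedBlockingQueue, using the blocking put/take calls and noting the non-blocking alternative in a comment (the capacity and counts are arbitrary):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ProducerConsumerSketch {
    public static void main(String[] args) throws InterruptedException {
        // optional capacity; without it the queue can grow indefinitely
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>(1024);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 10_000; i++) {
                    queue.put(i); // blocks while the queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            long sum = 0;
            try {
                for (int i = 0; i < 10_000; i++) {
                    sum += queue.take(); // blocks while the queue is empty
                    // non-blocking alternative: queue.poll() returns null when empty
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            System.out.println("sum = " + sum);
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
    }
}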
In many cases, the queue is not the bottleneck, and when it really is, using a disruptor is often the sensible thing to do. It is not a queue but a ring buffer shared between participating threads with different roles, i.e. typically one producer, n workers, and one consumer. It is a bit more cumbersome to set up, but speeds of around 100 million transactions per second are possible on modern hardware, because it does not require expensive volatile variables and instead relies on more subtle, machine-dependent ways of serialising reads and writes (you basically need to write parts of such a thing in assembler) :)
I was going through the source code of ArrayBlockingQueue and LinkedBlockingQueue. LinkedBlockingQueue has a putLock and a takeLock for insertion and removal respectively but ArrayBlockingQueue uses only 1 lock. I believe LinkedBlockingQueue was implemented based on the design described in Simple, Fast, and Practical Non-Blocking and Blocking
Concurrent Queue Algorithms. In this paper, they mention that they keep a dummy node so that enqueuers never have to access head and dequeuers never have to access tail which avoids deadlock scenarios. I was wondering why ArrayBlockingQueue doesn't borrow the same idea and use 2 locks instead.
ArrayBlockingQueue has to avoid overwriting entries, so it needs to know where the start and the end are. A LinkedBlockingQueue doesn't need to know this, as it lets the GC worry about cleaning up the Nodes in the queue.
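A toy illustration of that point: in an array-backed ring buffer, both ends consult the same indices and count, so a put and a take always touch shared state. This is a simplified sketch under a single lock, not the real ArrayBlockingQueue code:

import java.util.concurrent.locks.ReentrantLock;

public class RingBufferSketch<E> {
    private final Object[] items;
    private int putIndex, takeIndex, count; // shared by both ends
    private final ReentrantLock lock = new ReentrantLock();

    public RingBufferSketch(int capacity) {
        items = new Object[capacity];
    }

    public boolean offer(E e) {
        lock.lock();
        try {
            if (count == items.length) {
                return false; // full: inserting would overwrite an unread entry
            }
            items[putIndex] = e;
            putIndex = (putIndex + 1) % items.length; // wrap around
            count++;
            return true;
        } finally {
            lock.unlock();
        }
    }

    @SuppressWarnings("unchecked")
    public E poll() {
        lock.lock();
        try {
            if (count == 0) {
                return null; // empty
            }
            E e = (E) items[takeIndex];
            items[takeIndex] = null;
            takeIndex = (takeIndex + 1) % items.length;
            count--;
            return e;
        } finally {
            lock.unlock();
        }
    }
}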
I was wondering why ArrayBlockingQueue doesn't borrow the same idea and use 2 locks instead.
Because the ArrayBlockingQueue uses a much simpler data structure to hold the queue items.
The ArrayBlockingQueue stores its data in a single private final E[] items array. For multiple threads to work with this shared storage, whether adding or dequeuing, they have to use the same lock. This is not only a matter of memory barriers but of mutual exclusion, since they are modifying the same array.
LinkedBlockingQueue, on the other hand, is a linked list of queue nodes, which is a completely different structure and allows for a dual lock. It is the internal storage of the elements in the queue that enables the different lock configurations.
Two locks are used in LinkedBlockingQueue to restrict concurrent access to the head and the tail. The take (head) lock prevents two elements from being removed concurrently, and the put (tail) lock prevents two elements from being added concurrently; the two locks together prevent races.
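To make the two-lock structure concrete, here is a stripped-down sketch in the spirit of the Michael & Scott two-lock algorithm that LinkedBlockingQueue follows. Blocking on empty/full and capacity handling are omitted, and the class is only illustrative:

import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class TwoLockQueue<E> {
    private static final class Node<T> {
        T item;
        Node<T> next;
        Node(T item) { this.item = item; }
    }

    private final AtomicInteger count = new AtomicInteger(); // also provides visibility
    private Node<E> head = new Node<>(null); // dummy node, head.item is always null
    private Node<E> tail = head;
    private final ReentrantLock putLock = new ReentrantLock();
    private final ReentrantLock takeLock = new ReentrantLock();

    public void put(E e) {
        Node<E> node = new Node<>(e);
        putLock.lock();
        try {
            tail.next = node; // producers only ever touch the tail
            tail = node;
            count.incrementAndGet();
        } finally {
            putLock.unlock();
        }
    }

    public E poll() {
        if (count.get() == 0) {
            return null;
        }
        takeLock.lock();
        try {
            Node<E> first = head.next; // consumers only ever touch the head
            if (first == null) {
                return null;
            }
            E item = first.item;
            first.item = null;
            head = first; // 'first' becomes the new dummy node
            count.decrementAndGet();
            return item;
        } finally {
            takeLock.unlock();
        }
    }
}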
I think it's possible for ABQ to borrow the same idea as LBQ. Please refer to my code at http://pastebin.com/ZD1uFy7S and a similar question I asked on SO: ArrayBlockingQueue: concurrent put and take.
The reason they didn't use it is mainly the complexity of the implementation (especially the iterators); the trade-off between that complexity and the performance gain was not that attractive.
For more reference please have a look at http://jsr166-concurrency.10961.n7.nabble.com/ArrayBlockingQueue-concurrent-put-and-take-tc1306.html .
Paul Tyma presentation has this line:
Executors.newCacheThreadPool evil, die die die
Why is it evil ?
I will hazard a guess: is it because the number of threads can grow in an unbounded fashion, so a server that has been slashdotted would probably die once the JVM's maximum thread count is reached?
(This is Paul)
The intent of the slide was (apart from the facetious wording) that, as you mention, that thread pool grows without bound, creating new threads.
A thread pool inherently represents a queue and transfer point of work within a system. That is, something is feeding it work to do (and it may be feeding work elsewhere too). If a thread pool starts to grow, it's because it cannot keep up with demand.
In general, that's fine as computer resources are finite and that queue is built to handle bursts of work. However, that thread pool doesn't give you control over being able to push the bottleneck forward.
For example, in a server scenario, a few threads might be accepting on sockets and handing a thread pool the clients for processing. If that thread pool starts to grow out of control - the system should stop accepting new clients (in fact, the "acceptor" threads then often hop into the thread-pool temporarily to help process clients).
The effect is similar if you use a fixed thread pool with an unbounded input queue. Any time you consider the scenario of the queue filling up out of control, you realize the problem.
IIRC, Matt Welsh's seminal SEDA servers (which are asynchronous) created thread pools which modified their size according to server characteristics.
The idea of stop accepting new clients sounds bad until you realize the alternative is a crippled system which is processing no clients. (Again, with the understanding that computers are finite - even an optimally tuned system has a limit)
Incidentally, JVMs limit you to roughly 16k (usually) or 32k threads, depending on the JVM. But if you are CPU bound, that limit isn't very relevant; starting yet another thread on a CPU-bound system is counterproductive.
I've happily run systems at 4 or 5 thousand threads. But nearing the 16k limit, things tend to bog down (that limit is JVM-enforced; we had many more threads in Linux C++) even when not CPU bound.
The problem with Executors.newCachedThreadPool() is that the executor will create and start as many threads as necessary to execute the tasks submitted to it. While this is mitigated by the fact that idle threads are eventually released (the timeouts are configurable), it can indeed lead to severe resource starvation, or even crash the JVM (or some badly designed OS).
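For reference, the factory method's Javadoc documents it as roughly equivalent to the construction below, which makes the unbounded maximum pool size and the SynchronousQueue explicit:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CachedPoolEquivalent {
    public static void main(String[] args) {
        // What Executors.newCachedThreadPool() builds under the hood:
        // core size 0, an effectively unbounded maximum, a 60-second idle
        // timeout, and a SynchronousQueue, which never holds tasks itself.
        ExecutorService cached = new ThreadPoolExecutor(
                0, Integer.MAX_VALUE,
                60L, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>());

        cached.execute(() -> System.out.println("runs on a fresh or reused thread"));
        cached.shutdown();
    }
}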
There are a couple of issues with it. Unbounded growth in terms of threads is an obvious one: if you have CPU-bound tasks, then allowing many more threads than the available CPUs to run them is simply going to create scheduler overhead, with your threads context-switching all over the place and none actually progressing much. If your tasks are IO-bound, though, things get more subtle. Knowing how to size pools of threads that are waiting on network or file IO is much more difficult, and depends a lot on the latencies of those IO events. Higher latencies mean you need (and can support) more threads.
The cached thread pool continues adding new threads as the rate of task production outstrips the rate of execution. There are a couple of small barriers to this (such as locks that serialise new thread id creation), but this unbounded growth can lead to out-of-memory errors.
The other big problem with the cached thread pool is that it can be slow for the task-producing thread. The pool is configured with a SynchronousQueue for tasks to be offered to. This queue implementation basically has zero size and only works when there is a matching consumer for a producer (one thread is polling while another is offering). The actual implementation was significantly improved in Java 6, but it is still comparatively slow for the producer, particularly when it fails (as the producer is then responsible for creating a new thread to add to the pool). Often it is better for the producer thread to simply drop the task onto an actual queue and continue.
The problem is that no one provides a pool that has a small core set of threads, creates new threads up to some maximum when they are all busy, and only then enqueues subsequent tasks. Fixed thread pools seem to promise this, but they only start adding more threads when the underlying queue rejects more tasks (i.e. it is full). A LinkedBlockingQueue never gets full, so these pools never grow beyond the core size. An ArrayBlockingQueue has a capacity, but since the pool only grows once that capacity is reached, this doesn't mitigate the production rate until it is already a big problem. Currently the solution requires using a good rejected-execution policy such as caller-runs, but it needs some care.
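One way to approximate the "small core, bounded growth, then back-pressure" behaviour described above is a bounded queue plus a caller-runs rejection handler. The sizes below are arbitrary, and as noted this needs some care, since the pool only adds threads beyond the core once the queue is already full:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPoolSketch {
    public static void main(String[] args) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4,                              // core threads
                32,                             // hard upper bound on threads
                60L, TimeUnit.SECONDS,          // idle time before extra threads die
                new ArrayBlockingQueue<>(1000), // bounded task queue
                // back-pressure: once queue and pool are both full, the
                // submitting thread runs the task itself
                new ThreadPoolExecutor.CallerRunsPolicy());

        for (int i = 0; i < 10_000; i++) {
            final int n = i;
            pool.execute(() -> process(n));
        }
        pool.shutdown();
    }

    private static void process(int n) {
        // placeholder for real work
    }
}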
Developers see the cached thread pool and blindly use it without really thinking through the consequences.
I am having some trouble grasping the idea of a concurrent queue. I understand a queue is a FIFO, or first come, first served, data structure.
Now when we add the concurrency part, which I interpret as thread safety (please let me know if that is incorrect), things get a bit fuzzy. By concurrency, do we mean the way various threads can add to the queue, or delete (service) an item from the queue? Does concurrency provide a sense of ordering to these operations?
I would greatly appreciate a general description of the functionality of a concurrent queue. A similar post here is not as general as I hoped.
Also is there such a thing as a concurrent priority queue? What would be its usage?
Many thanks in advance, for any brief explanations or helpful links on this subject.
The notion that a BlockingQueue offers little overhead is a bit misleading. Acquiring a lock incurs pretty substantial overhead; along with the context switching, we are talking thousands of instructions. Not just that, but the progress of one thread will directly affect another thread. Now, it's not as bad as it was years ago, but compared to non-blocking it is substantial.
BlockingQueues use locks for mutual exclusion.
ArrayBlockingQueue, LinkedBlockingQueue, and PriorityBlockingQueue are three blocking queues, while
ConcurrentLinkedQueue and Java 7's LinkedTransferQueue use the Michael & Scott non-blocking queue algorithm.
Under moderate to low contention (which is more of a real-world scenario), the non-blocking queues significantly outperform the blocking queues.
And to note on Steve's comment about the lack of bottlenecks: under heavy contention a non-blocking algorithm can bottleneck on the constant CAS attempts, while a blocking one will suspend the contending threads. We then see that a BlockingQueue under heavy contention slightly outperforms a non-blocking queue, but that type of contention isn't the norm by any means.
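For comparison, a minimal example of the non-blocking style: offer() and poll() on ConcurrentLinkedQueue never block, and poll() simply returns null when the queue is empty:

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class NonBlockingQueueDemo {
    public static void main(String[] args) {
        // Lock-free queue based on the Michael & Scott CAS algorithm
        Queue<String> queue = new ConcurrentLinkedQueue<>();

        queue.offer("a"); // never blocks; always succeeds (the queue is unbounded)
        queue.offer("b");

        String item;
        while ((item = queue.poll()) != null) { // returns null instead of blocking
            System.out.println(item);
        }
    }
}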
I understand by "concurrency" that the queue is thread-safe. This does not mean that it will be efficient. However, I would imagine that the Java queues use a lock-free implementation, which means that there is little or no penalty when two threads attempt a push or a pop at the same time. What generally happens is that they use atomic locking at an assembler level, which ensures that the same object cannot be popped twice.
I once wrote a lock-free FIFO queue (in Delphi) which worked very well. Much more efficient than a previous version which used critical sections. The CS version ground to a halt, especially with many threads all trying to access the queue. The lock-free version, however, had no bottlenecks despite many threads accessing it a lot.
You should start by checking out the BlockingQueue interface definition, as this is the cornerstone for using queues for communication between threads and contains utility methods that allow producer and consumer threads to access the queue in either a blocking or non-blocking fashion. This, along with thread-safe access, is my understanding of what constitutes a "concurrent queue" (although I've never heard of that phrase; BlockingQueue merely exists in the java.util.concurrent package).
To answer the second part of your question, the priority queue implementation you should study is PriorityBlockingQueue. This may be useful if your producer thread(s) are producing tasks of varying priorities (e.g. requests from "normal users" and "power users") and you wish to control the order in which tasks are processed by your consumer thread(s). One possible pitfall to avoid is the starvation of low priority tasks that are never removed from the queue due to the constant influx of higher priority tasks.
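A small sketch of that priority scenario; the Task class and the priority values are invented for the example:

import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

public class PrioritySketch {
    // A made-up task type: lower number means higher priority
    static class Task {
        final String name;
        final int priority;
        Task(String name, int priority) { this.name = name; this.priority = priority; }
    }

    public static void main(String[] args) throws InterruptedException {
        PriorityBlockingQueue<Task> queue = new PriorityBlockingQueue<>(
                16, Comparator.comparingInt((Task t) -> t.priority));

        queue.put(new Task("normal user request", 5));
        queue.put(new Task("power user request", 1));

        // take() always returns the highest-priority element first, which is why
        // a constant influx of high-priority tasks can starve the low-priority ones.
        System.out.println(queue.take().name); // power user request
        System.out.println(queue.take().name); // normal user request
    }
}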
Just leaving here a link to the java.util.concurrent package that I think contains very important information about some questions raised here.
See: Concurrent Collections and Memory Consistency Properties