Using LinkedBlockingQueue may cause null pointer exception - java

I have been learning Java concurrent programming recently. I know that the final keyword can guarantee safe publication. However, when I read the LinkedBlockingQueue source code, I found that the head and last fields are not declared final. The put method calls enqueue, and enqueue assigns directly to last.next. At that point last may be null, because last is not declared final. Is my understanding correct? The lock guarantees thread-safe reads and writes of last, but can it also guarantee that last holds its correctly initialized value instead of null?
public class LinkedBlockingQueue<E> extends AbstractQueue<E>
implements BlockingQueue<E>, java.io.Serializable {
transient Node<E> head;
private transient Node<E> last;
public LinkedBlockingQueue(int capacity) {
if (capacity <= 0) throw new IllegalArgumentException();
this.capacity = capacity;
last = head = new Node<E>(null);
}
private void enqueue(Node<E> node) {
// assert putLock.isHeldByCurrentThread();
// assert last.next == null;
last = last.next = node;
}
public void put(E e) throws InterruptedException {
if (e == null) throw new NullPointerException();
// Note: convention in all put/take/etc is to preset local var
// holding count negative to indicate failure unless set.
int c = -1;
Node<E> node = new Node<E>(e);
final ReentrantLock putLock = this.putLock;
final AtomicInteger count = this.count;
putLock.lockInterruptibly();
try {
/*
* Note that count is used in wait guard even though it is
* not protected by lock. This works because count can
* only decrease at this point (all other puts are shut
* out by lock), and we (or some other waiting put) are
* signalled if it ever changes from capacity. Similarly
* for all other uses of count in other wait guards.
*/
while (count.get() == capacity) {
notFull.await();
}
enqueue(node);
c = count.getAndIncrement();
if (c + 1 < capacity)
notFull.signal();
} finally {
putLock.unlock();
}
if (c == 0)
signalNotEmpty();
}
}

According to this blog post https://shipilev.net/blog/2014/safe-public-construction/ even writing to a single final field in the constructor is enough to achieve safe initialization (and thus your object will always be published safely). And the capacity field is declared final.
In short, we emit a trailing barrier in three cases:
A final field was written. Notice we do not care about what field was actually written, we unconditionally emit the barrier before exiting the (initializer) method. That means if you have at least one final field write, the final fields semantics extend to every other field written in constructor.

Maybe you misunderstand how Java's chained assignment works.
// first, last is initialized in the constructor
last = head = new Node<E>(null); // only the node's fields (item and next) are null, not last itself
// enqueue
last = last.next = node;
// is equivalent to:
last.next = node;
last = last.next;
An NPE can occur only when last.next is evaluated while last itself is null; otherwise there is no NPE.
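The equivalence above can be checked with a small single-threaded sketch. ChainNode and ChainedAssignmentDemo are hypothetical stand-ins written for illustration, not the JDK classes, and the sketch says nothing about cross-thread visibility; it only shows how the chained assignment is evaluated.

```java
// Hypothetical stand-in for the queue's internal node type.
class ChainNode {
    final String item;
    ChainNode next;

    ChainNode(String item) {
        this.item = item;
    }
}

// Hypothetical demo class mirroring the constructor and enqueue shown above.
class ChainedAssignmentDemo {
    ChainNode head;
    ChainNode last;

    ChainedAssignmentDemo() {
        // head and last both refer to one dummy node; only its item and next are null.
        last = head = new ChainNode(null);
    }

    void enqueue(ChainNode node) {
        // Parsed as last = (last.next = node):
        // the old last node's next field is set first, then last is moved to node.
        last = last.next = node;
    }

    public static void main(String[] args) {
        ChainedAssignmentDemo q = new ChainedAssignmentDemo();
        ChainNode a = new ChainNode("a");
        q.enqueue(a);
        System.out.println(q.head.next == a); // true: the dummy node now links to a
        System.out.println(q.last == a);      // true: last was moved to a
    }
}
```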

You are correct that last is equal to a node with a null value. However this is intentional. The lock is only meant to ensure that each thread can perform modifications in this class correctly.
Sometimes using null values is intentional, to indicate a lack of value (empty queue in this case). Because the variable is private it can only be modified from within the class, so as long as the one writing the class is aware of the possibility of null, everything is alright.
I think you are confusing multiple different concepts which are not necessarily connected. Note that because last is private there is no publication. In addition head and last are meant to be modified, so they can't be final.
Edit
Perhaps I misunderstood your question...
null is never assigned to last directly. So the only place this could happen is in the constructor, before last is assigned new Node<E>(null). Although we can be sure that the constructor finishes before it is used by many threads, there is no visibility guarantee for the values.
However, put uses a lock, which does guarantee visibility. If no lock were used, last could actually be null.

Related

Reduce thread competition by using notify in place of notifyAll

I saw this self-implemented bounded blocking queue.
A change was made to it, aiming to eliminate competition by replacing notifyAll with notify.
But I don't quite get what's the point of the 2 extra variables added: waitOfferCount and waitPollCount.
Their initial values are both 0.
Diff before and after they were added: [images for Offer and Poll not reproduced]
My understanding is that the purpose of the 2 variables is to avoid useless notify calls when nothing is waiting on the object. But what harm would it do if not done this way?
Another thought is that they may have something to do with the switch from notifyAll to notify, but again I think we can safely use notify even without them?
Full code below:
class FairnessBoundedBlockingQueue implements Queue {
protected final int capacity;
protected Node head;
protected Node tail;
// guard: canPollCount, head
protected final Object pollLock = new Object();
protected int canPollCount;
protected int waitPollCount;
// guard: canOfferCount, tail
protected final Object offerLock = new Object();
protected int canOfferCount;
protected int waitOfferCount;
public FairnessBoundedBlockingQueue(int capacity) {
this.capacity = capacity;
this.canPollCount = 0;
this.canOfferCount = capacity;
this.waitPollCount = 0;
this.waitOfferCount = 0;
this.head = new Node(null);
this.tail = head;
}
public boolean offer(Object obj) throws InterruptedException {
synchronized (offerLock) {
while (canOfferCount <= 0) {
waitOfferCount++;
offerLock.wait();
waitOfferCount--;
}
Node node = new Node(obj);
tail.next = node;
tail = node;
canOfferCount--;
}
synchronized (pollLock) {
++canPollCount;
if (waitPollCount > 0) {
pollLock.notify();
}
}
return true;
}
public Object poll() throws InterruptedException {
Object result;
synchronized (pollLock) {
while (canPollCount <= 0) {
waitPollCount++;
pollLock.wait();
waitPollCount--;
}
result = head.next.value;
head.next.value = null;
head = head.next;
canPollCount--;
}
synchronized (offerLock) {
canOfferCount++;
if (waitOfferCount > 0) {
offerLock.notify();
}
}
return result;
}
}
You would need to ask the authors of that change what they thought they were achieving with that change.
My take is as follows:
Changing from notifyAll() to notify() is a good thing. If there are N threads waiting on a queue's offerLock or pollLock, then this avoids N - 1 unnecessary wakeups.
It seems that the counters are being used to avoid calling notify() when there isn't a thread waiting. This looks to me like a doubtful optimization. AFAIK a notify on a mutex when nothing is waiting is very cheap. So this may make a small difference ... but it is unlikely to be significant.
If you really want to know, write some benchmarks. Write 4 versions of this class with no optimization, the notify optimization, the counter optimization and both of them. Then compare the results ... for different levels of queue contention.
I'm not sure what "fairness" is supposed to mean here, but I can't see anything in this class to guarantee that threads that are waiting in offer or poll get treated fairly.
Another thought is that they may have something to do with the switch from notifyAll to notify, but again I think we can safely use notify even without them?
Yes, since two locks (pollLock and offerLock) are used, it is no problem to change notifyAll to notify without these two variables. But if you were using a single lock shared by both sides, you would have to use notifyAll.
My understanding is that the purpose of the 2 variables is to avoid useless notify calls when nothing is waiting on the object. But what harm would it do if not done this way?
Yes, these two variables are there to avoid useless notify calls. They also introduce additional operations, so I think benchmarking may be needed to determine performance in different scenarios.
Besides:
1. As a blocking queue, it should implement the BlockingQueue interface, and both the poll and offer methods should be non-blocking. The blocking operations should be take and put.
2. This is not a fair queue.
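To illustrate the split between non-blocking offer/poll and blocking put/take, here is a minimal sketch. SketchBoundedQueue is a hypothetical class name; it uses a single ReentrantLock with two Condition objects rather than the two-monitor design from the question, and it does not implement the full BlockingQueue interface. Because each Condition has its own wait set, a single signal() cannot wake the wrong kind of waiter.

```java
import java.util.ArrayDeque;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical single-lock sketch, not the queue from the question.
class SketchBoundedQueue<E> {
    private final ArrayDeque<E> items = new ArrayDeque<>();
    private final int capacity;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notFull = lock.newCondition();   // producers wait here
    private final Condition notEmpty = lock.newCondition();  // consumers wait here

    SketchBoundedQueue(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException();
        this.capacity = capacity;
    }

    // Non-blocking: returns false immediately when the queue is full.
    boolean offer(E e) {
        lock.lock();
        try {
            if (items.size() == capacity) return false;
            items.addLast(e);
            notEmpty.signal();
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Non-blocking: returns null immediately when the queue is empty.
    E poll() {
        lock.lock();
        try {
            E e = items.pollFirst();
            if (e != null) notFull.signal();
            return e;
        } finally {
            lock.unlock();
        }
    }

    // Blocking: waits until there is room.
    void put(E e) throws InterruptedException {
        lock.lockInterruptibly();
        try {
            while (items.size() == capacity) notFull.await();
            items.addLast(e);
            notEmpty.signal();
        } finally {
            lock.unlock();
        }
    }

    // Blocking: waits until an element is available.
    E take() throws InterruptedException {
        lock.lockInterruptibly();
        try {
            while (items.isEmpty()) notEmpty.await();
            E e = items.pollFirst();
            notFull.signal();
            return e;
        } finally {
            lock.unlock();
        }
    }
}
```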

Thread safe switching of Linked-List nodes without locking the entire list

I'm learning about threads, locks, etc. Therefore, I don't want to use the synchronized keyword or any thread-safe class other than Semaphore and ReentrantLock (and no atomic variables).
I want a kind of synchronized LinkedList<T> of Node<T>, ordered by the size of T (assume that T implements an interface that has size and increment functions as well as lock and unlock functions). I want to be able to swap two nodes based on their T.getSize() values without locking the whole list.
For example, if I only had one thread, the function would be a "classic" swap function, something like this:
public void IncrementAndcheckReplace(Node<T> node)
{
node.getData().incrementSize();
Node nextNode = node.getNext();
if(nextNode == null)
return;
while(node.getData().getSize() > nextNode.getData().getSize())
{
node.getPrev().setNext(nextNode);
nextNode.setPrev(node.getPrev());
Node nextnext = nextNode.getNext();
nextNode.setNext(node);
node.setPrev(nextNode);
node.setNext(nextnext);
nextnext.setPrev(node);
nextNode = node.getNext();
if(nextNode == null)
break;
}
}
Now let's get to the synchronization problem.
I thought about doing something like this, with a lock for each Node:
public void IncrementAndcheckReplace(Node<T> node)
{
node.lock(); //using fair ReentrantLock for specific node
node.getData().incrementSize();
Node nextNode = node.getNext();
if(nextNode == null)
{
node.unlock();
return;
}
nextNode.lock();
while(node.getData().getSize() > nextNode.getData().getSize())
{
Node prev = node.getPrev();
if(prev != null)
{
prev.lock();
prev.setNext(nextNode);
}
nextNode.setPrev(prev);
Node nextnext = nextNode.getNext();
if(nextnext != null)
{
nextnext.lock();
nextnext.setPrev(node);
}
nextNode.setNext(node);
node.setPrev(nextNode);
node.setNext(nextnext);
if(prev!=null)
prev.unlock();
if(nextnext!=null)
nextnext.unlock();
nextNode.unlock();
nextNode = node.getNext();
if(nextNode == null)
break;
nextNode.lock();
}
node.unlock();
}
The problem is that this is not thread-safe at all, and deadlock may happen. For example, assume we have Node a and Node b with a.next == b and b.prev == a. If thread A calls the replace function on a while thread B calls it on b, each will block waiting for the other, and neither will ever make progress.
How can I make the replace function thread-safe without locking the entire list? I want to avoid deadlock and starvation.
Thanks!
The most general answer, permitting the most concurrency, is to lock all of the four nodes involved in the reordering. After they are all locked, check that the ordering hasn't changed - perhaps aborting and retrying if it has - then do the reordering, then release the locks in reverse order.
The tricky part is that, to avoid deadlock, the nodes have to be locked in order according to some fixed order. Unfortunately, ordering them by position in the list won't work, since that ordering can change. The nodes' hashCode and identityHashCode aren't guaranteed to work, since there can be collisions. You'll need to provide some ordering of your own, for example by giving each node a unique permanent ID on construction, which can then be used for the locking order.
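The fixed locking order described above might be sketched as follows. OrderedLockNode, lockAll and unlockAll are hypothetical names; the ID counter is guarded by a plain ReentrantLock to stay within the question's constraints (no synchronized, no atomic variables). The caller would still need to re-check the neighbours after locking and retry if they changed, which this sketch does not show.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical node type; the list links (prev/next/data) are omitted.
class OrderedLockNode {
    private static final ReentrantLock ID_LOCK = new ReentrantLock();
    private static long nextId = 0;

    final long id = newId();                          // unique and permanent
    final ReentrantLock lock = new ReentrantLock();

    private static long newId() {
        ID_LOCK.lock();
        try {
            return nextId++;
        } finally {
            ID_LOCK.unlock();
        }
    }

    // Locks the given nodes in ascending id order, ignoring nulls and duplicates.
    // Returns the nodes in the order in which they were locked.
    static OrderedLockNode[] lockAll(OrderedLockNode... nodes) {
        OrderedLockNode[] sorted = Arrays.stream(nodes)
                .filter(n -> n != null)
                .distinct()
                .sorted(Comparator.comparingLong(n -> n.id))
                .toArray(OrderedLockNode[]::new);
        for (OrderedLockNode n : sorted) {
            n.lock.lock();
        }
        return sorted;
    }

    // Releases the locks in the reverse of the order in which they were taken.
    static void unlockAll(OrderedLockNode[] locked) {
        for (int i = locked.length - 1; i >= 0; i--) {
            locked[i].lock.unlock();
        }
    }
}
```

Because every thread acquires locks in the same global order, two threads working on overlapping neighbourhoods cannot each hold a lock the other is waiting for.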

ArrayDeque isn't empty but returns null for the poll method

I have this code (Obfuscated) as part of a large application and it's getting a NullPointerException on the object.doSomething() line. Since we just checked the isEmpty() call and there is no other thread polling this queue, how is this possible? There are other threads adding to the queue; is it possible concurrent adds permanently screwed up the queue?
I tried reading the source code of ArrayDeque and it uses head == tail as the check for isEmpty(). Is it possible that some weird collision during adds made head != tail while head points to null?
private final Queue<Task> active = new ArrayDeque<Task>();
if (!this.active.isEmpty()) {
SomeType object = null;
object = this.active.poll();
object.doSomething();
}
Even if there is no other thread polling, there are probably other threads pushing.
This means in a concurrent access that tail can be modified incorrectly, and if tail is corrupted you may never get to the point where head == tail, hence the NullPointerException.
As @dacwe stated, the documentation clearly specifies that you (or the developers of this obfuscated application) should not use ArrayDeque in a concurrent environment; this is one of the possible problems with concurrency.
They are not thread-safe; in the absence of external synchronization, they do not support concurrent access by multiple threads.
If you want a thread-safe Queue you can use LinkedBlockingQueue; if you need a Deque you can use LinkedBlockingDeque.
Resources:
Javadoc: ArrayDeque
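As a rough sketch of that suggestion, the consumer below uses a LinkedBlockingQueue and takes each element with a single poll() whose result is checked for null, rather than calling isEmpty() and poll() as two separate steps. SafePollDemo and drain are hypothetical names, and Runnable stands in for the question's Task type.

```java
import java.util.Queue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; Runnable stands in for the question's Task type.
class SafePollDemo {
    // Runs every task currently in the queue and returns how many were run.
    // poll() returns null when the queue is empty, so no separate isEmpty() check is needed.
    static int drain(Queue<Runnable> queue) {
        int ran = 0;
        Runnable task;
        while ((task = queue.poll()) != null) {
            task.run();
            ran++;
        }
        return ran;
    }

    public static void main(String[] args) throws InterruptedException {
        Queue<Runnable> active = new LinkedBlockingQueue<>();
        AtomicInteger done = new AtomicInteger();

        // Several producer threads add to the queue concurrently.
        Thread[] producers = new Thread[4];
        for (int i = 0; i < producers.length; i++) {
            producers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    active.add(done::incrementAndGet);
                }
            });
            producers[i].start();
        }
        for (Thread t : producers) {
            t.join();
        }
        System.out.println(drain(active) + " tasks ran");
    }
}
```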
As stated in the api:
They are not thread-safe; in the absence of external synchronization, they do not support concurrent access by multiple threads.
Consider the case where active.poll() accesses the old elements[] array that is replaced in ArrayDeque.doubleCapacity() because the deque becomes full at the same time.
One possible timeline:
The polling thread checks active.isEmpty(), which returns false
The polling thread calls active.pollFirst() to access elements[], which is not atomic
One or more other threads call active.addLast() in bursts so that active becomes full and doubleCapacity() is triggered
In doubleCapacity(), elements[] is replaced with a newly allocated array, so the old elements[] is no longer the deque's backing array
The polling thread now references the old elements[] and possibly gets null.
My guess is that you would like to avoid synchronization for polling while the queue is not empty. To avoid the race due to doubleCapacity(), make sure the queue is allocated with a sufficiently large capacity and will not be full anytime addLast() is called. However, there might be other races you need to consider depending on the actual implementation.
The following sources from openJDK are appended FYI.
public E pollFirst() {
int h = head;
@SuppressWarnings("unchecked")
E result = (E) elements[h];
// Element is null if deque empty
if (result == null)
return null;
elements[h] = null; // Must null out slot
head = (h + 1) & (elements.length - 1);
return result;
}
public void addLast(E e) {
if (e == null)
throw new NullPointerException();
elements[tail] = e;
if ( (tail = (tail + 1) & (elements.length - 1)) == head)
doubleCapacity();
}
private void doubleCapacity() {
assert head == tail;
int p = head;
int n = elements.length;
int r = n - p; // number of elements to the right of p
int newCapacity = n << 1;
if (newCapacity < 0)
throw new IllegalStateException("Sorry, deque too big");
Object[] a = new Object[newCapacity];
System.arraycopy(elements, p, a, 0, r);
System.arraycopy(elements, 0, a, r, p);
elements = a;
head = 0;
tail = n;
}

How to make my data structure thread safe?

I defined an Element class:
class Element<T> {
T value;
Element<T> next;
Element(T value) {
this.value = value;
}
}
I also defined a List class based on Element. It is a typical list, just like in any data structures book, with addHead, delete, etc. operations:
public class List<T> implements Iterable<T> {
private Element<T> head;
private Element<T> tail;
private long size;
public List() {
this.head = null;
this.tail = null;
this.size = 0;
}
public void insertHead (T node) {
Element<T> e = new Element<T>(node);
if (size == 0) {
head = e;
tail = e;
} else {
e.next = head;
head = e;
}
size++;
}
//Other method code omitted
}
How do I make this List class thread safe?
Put synchronized on all methods? That does not seem to work: two threads may run different methods at the same time and collide.
If I had used an array to hold all the elements in the class, I might use volatile on the array to make sure only one thread is working with the internal elements. But currently all the elements are linked through the object reference in each one's next pointer, so I have no way to use volatile.
Use volatile on head, tail and size? This may cause deadlocks if two threads running different methods each hold a resource the other is waiting for.
Any suggestions?
If you put synchronized on every method, the data structure WILL BE thread-safe. Because by definition, only one thread will be executing any method on the object at a time, and inter-thread ordering and visibility is also ensured. So it is as good as if one thread is doing all operations.
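A minimal sketch of that approach, assuming a simplified version of the question's List: SynchronizedSketchList and removeHead are hypothetical names, and only a few methods are shown. Every public method is synchronized on the same instance, so two threads cannot run different methods of one list at the same time.

```java
// Hypothetical class name; mirrors the List from the question with every method synchronized.
class SynchronizedSketchList<T> {
    private static final class Element<T> {
        final T value;
        Element<T> next;

        Element(T value) {
            this.value = value;
        }
    }

    private Element<T> head;
    private Element<T> tail;
    private long size;

    public synchronized void insertHead(T value) {
        Element<T> e = new Element<>(value);
        if (size == 0) {
            head = e;
            tail = e;
        } else {
            e.next = head;
            head = e;
        }
        size++;
    }

    // Removes and returns the first value, or null when the list is empty.
    public synchronized T removeHead() {
        if (size == 0) {
            return null;
        }
        T value = head.value;
        head = head.next;
        if (head == null) {
            tail = null;
        }
        size--;
        return value;
    }

    public synchronized long size() {
        return size;
    }
}
```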
Putting a synchronized(this) block won't be any different if the area the block covers is the whole method. You might get better performance if the area is smaller than that.
Doing something like
private final Object LOCK = new Object();
public void method(){
synchronized(LOCK){
doStuff();
}
}
This is considered good practice, although not for better performance. Doing this ensures that nobody else can use your lock and unintentionally create a deadlock-prone implementation.
In your case, I think you could use ReadWriteLock to get better read performance. As the name suggests, a ReadWriteLock lets multiple threads through if they are calling "read methods", methods that do not mutate the state of the object (of course, you have to correctly identify which of your methods are "read methods" and which are "write methods", and use the ReadWriteLock accordingly!). It also ensures that no other thread is accessing the object while a "write method" is executing, and it takes care of scheduling the read and write threads.
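A sketch of how that might look for the list in the question, using ReentrantReadWriteLock from java.util.concurrent.locks. ReadWriteSketchList and contains are hypothetical names chosen for illustration; insertHead is treated as a write method, contains and size as read methods.

```java
import java.util.Objects;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical class name; a simplified version of the question's List.
class ReadWriteSketchList<T> {
    private static final class Element<T> {
        final T value;
        Element<T> next;

        Element(T value) {
            this.value = value;
        }
    }

    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    private Element<T> head;
    private long size;

    // Write method: exclusive access while the links are changed.
    public void insertHead(T value) {
        rw.writeLock().lock();
        try {
            Element<T> e = new Element<>(value);
            e.next = head;
            head = e;
            size++;
        } finally {
            rw.writeLock().unlock();
        }
    }

    // Read method: several readers may traverse the list at the same time.
    public boolean contains(T value) {
        rw.readLock().lock();
        try {
            for (Element<T> e = head; e != null; e = e.next) {
                if (Objects.equals(e.value, value)) {
                    return true;
                }
            }
            return false;
        } finally {
            rw.readLock().unlock();
        }
    }

    // Read method.
    public long size() {
        rw.readLock().lock();
        try {
            return size;
        } finally {
            rw.readLock().unlock();
        }
    }
}
```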
Another well-known way of making a class thread-safe is "CopyOnWrite", where you copy the whole data structure upon mutation. This is only recommended when the object is mostly "read" and rarely "written".
Here is a sample implementation of that strategy.
http://www.codase.com/search/smart?join=class+java.util.concurrent.CopyOnWriteArrayList
private volatile transient E[] array;
/**
* Returns the element at the specified position in this list.
*
* @param index index of element to return.
* @return the element at the specified position in this list.
* @throws IndexOutOfBoundsException if index is out of range <tt>(index
* < 0 || index >= size())</tt>.
*/
public E get(int index) {
E[] elementData = array();
rangeCheck(index, elementData.length);
return elementData[index];
}
/**
* Appends the specified element to the end of this list.
*
* @param element element to be appended to this list.
* @return true (as per the general contract of Collection.add).
*/
public synchronized boolean add(E element) {
int len = array.length;
E[] newArray = (E[]) new Object[len+1];
System.arraycopy(array, 0, newArray, 0, len);
newArray[len] = element;
array = newArray;
return true;
}
Here, the read method runs without taking any lock, while the write method has to be synchronized. Inter-thread ordering and visibility for read methods are ensured by declaring the array volatile.
The reason write methods have to "copy" is that the assignment array = newArray has to be "one shot" (in Java, assignment of an object reference is atomic), and you may not touch the original array during the manipulation.
I'd look at the source code for the java.util.LinkedList class for a real implementation.
Synchronized by default will lock on the instance of the class - which may not be what you want. (esp. if Element is externally accessible). If you synchronize all the methods on the same lock, then you'll have terrible concurrent performance, but it'll prevent them from executing at the same time - effectively single-threading access to the class.
Also - I see a tail reference, but don't see Element with a corresponding previous field, for a double linked-list - reason?
I'd suggest using a ReentrantLock that you pass to every element of the list, although you will have to use a factory to instantiate every element.
Any time you need to take something out of the list, you will acquire the very same lock, so you can be sure that no two threads are accessing it at the same time.

Synchronization of a Queue

I've been reading Doug Lea's 'Concurrent Programming in Java' book. As you may know, Doug originally wrote the Java Concurrency API. However, something has caused me some confusion and I was hoping to gain a few opinions on this little conundrum!
Take the following code from Doug Lea's queuing example...
class LinkedQueue {
protected Node head = new Node(null);
protected Node last = head;
protected final Object pollLock = new Object();
protected final Object putLock = new Object();
public void put(Object x) {
Node node = new Node(x);
synchronized (putLock) { // insert at end of list
synchronized (last) {
last.next = node; // extend list
last = node;
}
}
}
public Object poll() { // returns null if empty
synchronized (pollLock) {
synchronized (head) {
Object x = null;
Node first = head.next; // get to first real node
if (first != null) {
x = first.object;
first.object = null; // forget old object
head = first; // first becomes new head
}
return x;
}
}
}
static class Node { // local node class for queue
Object object;
Node next = null;
Node(Object x) { object = x; }
}
}
This is quite a nice Queue. It uses two monitors so a Producer and a Consumer can access the Queue at the same time. Nice! However, the synchronization on 'last' and 'head' is confusing me here. The book states this is needed for the situation where the Queue currently has, or is about to have, 0 entries. OK, fair enough, and this kind of makes sense.
However, then I looked at the Java Concurrency LinkedBlockingQueue. The original version of the Queue doesn't synchronize on head or tail (I also wanted to post another link to the modern version, which also suffers from the same problem, but I couldn't do so because I'm a newbie). I wonder why not? Am I missing something here? Is there some part of the idiosyncratic nature of the Java Memory Model I'm missing? I would have thought that this synchronization is needed for visibility purposes. I'd appreciate some expert opinions!
In the version you linked to, as well as the version in the latest JRE, the item inside the Node class is volatile, which forces reads and writes to be visible to all other threads. Here is a more in-depth explanation: http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html#volatile
The subtlety here is that synchronized(null) would throw a NullPointerException, so neither head nor last is allowed to become null. They are both initialized to the same dummy node, which is never returned or removed from the list.
put() and poll() are synchronized on two different locks. The methods would need to synchronize on the same lock to be thread-safe with respect to one another if they could modify the same value from different threads. The only situation in which this is a problem is when head == last (i.e. they are the same object, referenced through different member variables). This is why the code synchronizes on head and last: most of the time these will be fast, uncontended locks, but occasionally head and last will be the same instance and one of the threads will have to block the other.
The only time that visibility is an issue is when the queue is nearly empty; the rest of the time put() and poll() work on different ends of the queue and don't interfere with each other.
