I defined an Element class:
class Element<T> {
    T value;
    Element<T> next;

    Element(T value) {
        this.value = value;
    }
}
I also defined a List class based on Element. It is a typical singly linked list, like the ones in data structure books, with addHead, delete, and other operations:
public class List<T> implements Iterable<T> {
    private Element<T> head;
    private Element<T> tail;
    private long size;

    public List() {
        this.head = null;
        this.tail = null;
        this.size = 0;
    }

    public void insertHead(T node) {
        Element<T> e = new Element<T>(node);
        if (size == 0) {
            head = e;
            tail = e;
        } else {
            e.next = head;
            head = e;
        }
        size++;
    }

    // Other method code omitted
}
How do I make this List class thread safe?
Put synchronized on all methods? That doesn't seem to work: two threads may be working in different methods at the same time and cause a collision.
If I had used an array to keep all the elements, I might put volatile on the array to make sure only one thread works with the internal elements. But currently all the elements are linked through object references via each node's next pointer, so I have no way to use volatile.
Use volatile on head, tail, and size? That may cause deadlocks if two threads running different methods each hold a resource the other is waiting for.
Any suggestions?
If you put synchronized on every method, the data structure WILL be thread-safe: by definition, only one thread at a time will be executing any method on the object, and inter-thread ordering and visibility are also ensured. So it is as good as one thread doing all the operations.
Putting a synchronized(this) block inside a method is no different if the block covers the whole method body. You might get better performance if the covered area is smaller than that.
Doing something like
private final Object LOCK = new Object();

public void method() {
    synchronized (LOCK) {
        doStuff();
    }
}
is considered good practice, although not for performance reasons: a private lock ensures that nobody else can synchronize on your lock and unintentionally create a deadlock-prone implementation.
In your case, I think you could use a ReadWriteLock to get better read performance. As the name suggests, a ReadWriteLock lets multiple threads through at once if they are calling "read methods", i.e. methods that do not mutate the state of the object. (Of course, you have to correctly identify which of your methods are read methods and which are write methods, and use the ReadWriteLock accordingly!) It also ensures that no other thread accesses the object while a "write method" executes, and it takes care of scheduling the reader and writer threads.
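For instance, a read-write-locked variant of the list above might look like this. This is only a sketch under my own naming (RwList is not from the question; Element is reproduced inline, and only insertHead and size are shown):

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RwList<T> {
    private static class Element<T> {
        T value;
        Element<T> next;
        Element(T value) { this.value = value; }
    }

    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private Element<T> head;
    private Element<T> tail;
    private long size;

    // "write method": takes the exclusive write lock
    public void insertHead(T value) {
        lock.writeLock().lock();
        try {
            Element<T> e = new Element<T>(value);
            if (size == 0) {
                head = e;
                tail = e;
            } else {
                e.next = head;
                head = e;
            }
            size++;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // "read method": any number of readers may hold the read lock at once
    public long size() {
        lock.readLock().lock();
        try {
            return size;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

The unlock calls sit in finally blocks so the lock is released even if a method throws.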
Another well-known way of making a class thread-safe is "copy on write", where you copy the whole data structure upon every mutation. This is only recommended when the object is mostly read and rarely written.
Here is a sample implementation of that strategy.
http://www.codase.com/search/smart?join=class+java.util.concurrent.CopyOnWriteArrayList
private volatile transient E[] array;

/**
 * Returns the element at the specified position in this list.
 *
 * @param index index of element to return.
 * @return the element at the specified position in this list.
 * @throws IndexOutOfBoundsException if index is out of range <tt>(index
 *         < 0 || index >= size())</tt>.
 */
public E get(int index) {
    E[] elementData = array();
    rangeCheck(index, elementData.length);
    return elementData[index];
}

/**
 * Appends the specified element to the end of this list.
 *
 * @param element element to be appended to this list.
 * @return true (as per the general contract of Collection.add).
 */
public synchronized boolean add(E element) {
    int len = array.length;
    E[] newArray = (E[]) new Object[len + 1];
    System.arraycopy(array, 0, newArray, 0, len);
    newArray[len] = element;
    array = newArray;
    return true;
}
Here, the read method accesses the array without going through any lock, while the write method has to be synchronized. Inter-thread ordering and visibility for the read methods are ensured by the volatile on the array.
The reason the write methods have to copy is that the assignment array = newArray has to be "one shot" (in Java, assignment of an object reference is atomic), and you may not touch the original array during the manipulation.
I'd look at the source code for the java.util.LinkedList class for a real implementation.
synchronized by default locks on the instance of the class, which may not be what you want (especially if Element is externally accessible). If you synchronize all the methods on the same lock, you'll have poor concurrent performance, but it will prevent them from executing at the same time, effectively single-threading access to the class.
Also: I see a tail reference, but no corresponding previous field in Element for a doubly linked list. Is there a reason?
I'd suggest using a ReentrantLock, which you can pass to every element of the list; you will have to use a factory to instantiate each element.
Any time you need to take something out of the list, you acquire that same lock, so you can be sure that no two threads are accessing it at the same time.
I saw this self-implemented bounded blocking queue.
A change was made to it, aiming to eliminate contention by replacing notifyAll with notify.
But I don't quite get the point of the two extra variables added: waitOfferCount and waitPollCount.
Their initial values are both 0.
The diff from before to after they were added is below:
Offer:
Poll:
My understanding is that the two variables' purpose is to avoid useless notify calls when nothing is waiting on the object. But what harm would it do if it were not done this way?
Another thought is that they may have something to do with the switch from notifyAll to notify, but again I think we could safely use notify even without them?
Full code below:
class FairnessBoundedBlockingQueue implements Queue {
    protected final int capacity;
    protected Node head;
    protected Node tail;

    // guard: canPollCount, head
    protected final Object pollLock = new Object();
    protected int canPollCount;
    protected int waitPollCount;

    // guard: canOfferCount, tail
    protected final Object offerLock = new Object();
    protected int canOfferCount;
    protected int waitOfferCount;

    public FairnessBoundedBlockingQueue(int capacity) {
        this.capacity = capacity;
        this.canPollCount = 0;
        this.canOfferCount = capacity;
        this.waitPollCount = 0;
        this.waitOfferCount = 0;
        this.head = new Node(null);
        this.tail = head;
    }

    public boolean offer(Object obj) throws InterruptedException {
        synchronized (offerLock) {
            while (canOfferCount <= 0) {
                waitOfferCount++;
                offerLock.wait();
                waitOfferCount--;
            }
            Node node = new Node(obj);
            tail.next = node;
            tail = node;
            canOfferCount--;
        }
        synchronized (pollLock) {
            ++canPollCount;
            if (waitPollCount > 0) {
                pollLock.notify();
            }
        }
        return true;
    }

    public Object poll() throws InterruptedException {
        Object result;
        synchronized (pollLock) {
            while (canPollCount <= 0) {
                waitPollCount++;
                pollLock.wait();
                waitPollCount--;
            }
            result = head.next.value;
            head.next.value = null;
            head = head.next;
            canPollCount--;
        }
        synchronized (offerLock) {
            canOfferCount++;
            if (waitOfferCount > 0) {
                offerLock.notify();
            }
        }
        return result;
    }
}
You would need to ask the authors of that change what they thought they were achieving with that change.
My take is as follows:
Changing from notifyAll() to notify() is a good thing. If there are N threads waiting on a queue's offerLock or pollLock, then this avoids N - 1 unnecessary wakeups.
It seems that the counters are being used to avoid calling notify() when no thread is waiting. This looks to me like a doubtful optimization. AFAIK a notify on a mutex when nothing is waiting is very cheap, so this may make a small difference, but it is unlikely to be significant.
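As an aside, the bookkeeping these counters do by hand is close to what java.util.concurrent's Condition objects provide; Condition.signal, like notify, wakes a single waiter. Here is a sketch of a bounded queue using one ReentrantLock and two conditions (class and member names are mine, not from the code under discussion):

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class CondQueue {
    private final Object[] items;
    private int putIndex, takeIndex, count;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notFull = lock.newCondition();
    private final Condition notEmpty = lock.newCondition();

    CondQueue(int capacity) { items = new Object[capacity]; }

    void put(Object x) throws InterruptedException {
        lock.lock();
        try {
            while (count == items.length) notFull.await();
            items[putIndex] = x;
            putIndex = (putIndex + 1) % items.length;
            count++;
            notEmpty.signal();  // wakes exactly one waiting taker
        } finally {
            lock.unlock();
        }
    }

    Object take() throws InterruptedException {
        lock.lock();
        try {
            while (count == 0) notEmpty.await();
            Object x = items[takeIndex];
            items[takeIndex] = null;
            takeIndex = (takeIndex + 1) % items.length;
            count--;
            notFull.signal();   // wakes exactly one waiting putter
            return x;
        } finally {
            lock.unlock();
        }
    }
}
```

Unlike the two-lock version above, this uses a single lock, so producers and consumers do contend with each other; the point is only to show signal targeting one waiter without hand-maintained wait counts.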
If you really want to know, write some benchmarks. Write 4 versions of this class with no optimization, the notify optimization, the counter optimization and both of them. Then compare the results ... for different levels of queue contention.
I'm not sure what "fairness" is supposed to mean here, but I can't see anything in this class to guarantee that threads that are waiting in offer or poll get treated fairly.
Another thought is that they may have something to do with the switch from notifyAll to notify, but again I think we could safely use notify even without them?
Yes, since two locks (pollLock and offerLock) are used, it is no problem to change notifyAll to notify even without these two variables. But if you were using a single lock for both producers and consumers, you would have to use notifyAll.
My understanding is that the two variables' purpose is to avoid useless notify calls when nothing is waiting on the object. But what harm would it do if it were not done this way?
Yes, these two variables are there to avoid useless notify calls, but they also bring in additional operations. I think benchmarking may be needed to determine the performance in different scenarios.
Besides:
1. As a blocking queue, it should implement the BlockingQueue interface, where both poll and offer are defined as non-blocking; the blocking operations should be named take and put.
2. This is not a fair queue.
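If fairness is actually required, the standard library already offers it: ArrayBlockingQueue takes a fairness flag in its constructor, which makes blocked producers and consumers acquire the internal lock in FIFO order, at some cost in throughput. A minimal illustration (FairQueueDemo is my name for the sketch):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class FairQueueDemo {
    static String demo() throws InterruptedException {
        // second constructor argument 'true' requests a fair lock:
        // threads blocked in put/take are granted access in FIFO order
        BlockingQueue<String> q = new ArrayBlockingQueue<>(2, true);
        q.put("a");   // blocking insert (would wait if the queue were full)
        q.put("b");
        return q.take();  // element order is FIFO as well
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo());  // prints "a"
    }
}
```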
I have been learning Java concurrent programming recently. I know that the final keyword can guarantee safe publication. However, when reading the LinkedBlockingQueue source code, I noticed that the head and last fields do not use the final keyword. The enqueue method, which is called from put, assigns directly to last.next. At that point, couldn't last be null, since it is not declared final? Is my understanding correct? A lock can guarantee thread-safe reads and writes of last, but can it guarantee that last holds its correctly initialized value rather than null?
public class LinkedBlockingQueue<E> extends AbstractQueue<E>
        implements BlockingQueue<E>, java.io.Serializable {

    transient Node<E> head;
    private transient Node<E> last;

    public LinkedBlockingQueue(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException();
        this.capacity = capacity;
        last = head = new Node<E>(null);
    }

    private void enqueue(Node<E> node) {
        // assert putLock.isHeldByCurrentThread();
        // assert last.next == null;
        last = last.next = node;
    }

    public void put(E e) throws InterruptedException {
        if (e == null) throw new NullPointerException();
        // Note: convention in all put/take/etc is to preset local var
        // holding count negative to indicate failure unless set.
        int c = -1;
        Node<E> node = new Node<E>(e);
        final ReentrantLock putLock = this.putLock;
        final AtomicInteger count = this.count;
        putLock.lockInterruptibly();
        try {
            /*
             * Note that count is used in wait guard even though it is
             * not protected by lock. This works because count can
             * only decrease at this point (all other puts are shut
             * out by lock), and we (or some other waiting put) are
             * signalled if it ever changes from capacity. Similarly
             * for all other uses of count in other wait guards.
             */
            while (count.get() == capacity) {
                notFull.await();
            }
            enqueue(node);
            c = count.getAndIncrement();
            if (c + 1 < capacity)
                notFull.signal();
        } finally {
            putLock.unlock();
        }
        if (c == 0)
            signalNotEmpty();
    }
}
According to this blog post, https://shipilev.net/blog/2014/safe-public-construction/, writing even one final field in the constructor is enough to achieve safe initialization (and thus your object is always published safely), and the capacity field is declared final.
In short, we emit a trailing barrier in three cases:
A final field was written. Notice we do not care about what field was actually written, we unconditionally emit the barrier before exiting the (initializer) method. That means if you have at least one final field write, the final fields semantics extend to every other field written in constructor.
Maybe you are misunderstanding Java's chained assignment.
// first, last is initialized in the constructor
last = head = new Node<E>(null); // only the fields inside last (item & next) are null

// enqueue
last = last.next = node;
// is equivalent to:
last.next = node;
last = last.next;
An NPE could occur only if last itself were null when last.next is dereferenced, and it never is.
You are correct that last is equal to a node with a null value. However this is intentional. The lock is only meant to ensure that each thread can perform modifications in this class correctly.
Sometimes using null values is intentional, to indicate a lack of value (empty queue in this case). Because the variable is private it can only be modified from within the class, so as long as the one writing the class is aware of the possibility of null, everything is alright.
I think you are confusing multiple different concepts which are not necessarily connected. Note that because last is private there is no publication. In addition head and last are meant to be modified, so they can't be final.
Edit
Perhaps I misunderstood your question...
null is never assigned to last directly, so the only place this could happen is in the constructor, before last is assigned new Node<E>(null). Although we can be sure the constructor finishes before the object is used by many threads, that by itself gives no visibility guarantee for the values.
However, put uses a lock, which does guarantee visibility in use. If no lock were used, then last could actually be seen as null.
While using Java threading primitives to construct a thread-safe bounded queue, what is the difference between these two constructs?
1. Creating an explicit lock object.
2. Using the list as the lock and waiting on it.
Example of 1
private final Object lock = new Object();
private ArrayList<String> list = new ArrayList<String>();

public String dequeue() {
    synchronized (lock) {
        while (list.size() == 0) {
            lock.wait();
        }
        String value = list.remove(0);
        lock.notifyAll();
        return value;
    }
}

public void enqueue(String value) {
    synchronized (lock) {
        while (list.size() == maxSize) {
            lock.wait();
        }
        list.add(value);
        lock.notifyAll();
    }
}
Example of 2
private ArrayList<String> list = new ArrayList<String>();

public String dequeue() {
    synchronized (list) { // lock on list
        while (list.size() == 0) {
            list.wait(); // wait on list
        }
        String value = list.remove(0);
        list.notifyAll();
        return value;
    }
}

public void enqueue(String value) {
    synchronized (list) { // lock on list
        while (list.size() == maxSize) {
            list.wait(); // wait on list
        }
        list.add(value);
        list.notifyAll();
    }
}
Note
This is a bounded list
No other operation is being performed apart from enqueue and dequeue.
I could use a blocking queue, but this question is more about improving my limited knowledge of threading.
If this question is repeated please let me know.
The short answer is: no, there is no functional difference, other than the memory overhead of maintaining the separate lock object. However, there are a couple of semantics-related items I would consider before making a final decision.
Will I ever need to perform synchronized operations on more than just my internal list?
Let's say you wanted to maintain a parallel data structure to your ArrayList, such that all operations on the list and that parallel data structure needed to be synchronized. In this case, it might be best to use the external lock, as locking on either the list or the structure might be confusing to future development efforts on this class.
Will I ever give access to my list outside of my queue class?
Let's say you wanted to provide an accessor method for your list, or make it visible to extensions of your Queue class. If you were using an external lock object, classes that retrieved references to the list would never be able to perform thread-safe operations on that list. In that case, it'd be better to synchronize on the list and make it clear in the API that external accesses/modifications to the list must also synchronize on that list.
I'm sure there are more reasons why you might choose one over the other, but these are the two big ones I can think of.
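The second point is exactly the convention that Collections.synchronizedList documents: the wrapper locks on the list object itself, so callers can join in for compound operations such as iteration. A small illustration (ClientSideLocking and joinAll are my names for the sketch):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ClientSideLocking {
    static String joinAll(List<String> list) {
        // Iteration is a compound operation: callers must hold the same
        // lock the wrapper uses internally, which is the list itself.
        StringBuilder sb = new StringBuilder();
        synchronized (list) {
            for (String s : list) {
                sb.append(s);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> list = Collections.synchronizedList(new ArrayList<String>());
        list.add("a");
        list.add("b");
        System.out.println(joinAll(list));  // prints "ab"
    }
}
```

If the lock were a private Object instead, outside code holding a reference to the list would have no way to make its iteration atomic with respect to the queue's own operations.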
I've been reading Doug Lea's 'Concurrent Programming in Java' book. As you may know, Doug originally wrote the Java Concurrency API. However, something has caused me some confusion, and I was hoping to get a few opinions on this little conundrum!
Take the following code from Doug Lea's queuing example...
class LinkedQueue {
    protected Node head = new Node(null);
    protected Node last = head;
    protected final Object pollLock = new Object();
    protected final Object putLock = new Object();

    public void put(Object x) {
        Node node = new Node(x);
        synchronized (putLock) { // insert at end of list
            synchronized (last) {
                last.next = node; // extend list
                last = node;
            }
        }
    }

    public Object poll() { // returns null if empty
        synchronized (pollLock) {
            synchronized (head) {
                Object x = null;
                Node first = head.next; // get to first real node
                if (first != null) {
                    x = first.object;
                    first.object = null; // forget old object
                    head = first; // first becomes new head
                }
                return x;
            }
        }
    }

    static class Node { // local node class for queue
        Object object;
        Node next = null;
        Node(Object x) { object = x; }
    }
}
This is quite a nice queue. It uses two monitors so a producer and a consumer can access the queue at the same time. Nice! However, the synchronization on last and head is confusing me. The book states this is needed for the situation where the queue currently has, or is about to have, zero entries. OK, fair enough, and this kind of makes sense.
However, I then looked at the java.util.concurrent LinkedBlockingQueue. The original version of the queue doesn't synchronize on head or tail. (I also wanted to post a link to the modern version, which suffers from the same problem, but I couldn't because I'm a newbie.) I wonder why not? Am I missing something here? Is there some idiosyncrasy of the Java Memory Model I'm missing? I would have thought this synchronization was needed for visibility purposes. I'd appreciate some expert opinions!
In the version you linked, as well as the version in the latest JRE, the item field inside the Node class is volatile, which enforces that reads and writes of it are visible to all other threads. Here is a more in-depth explanation: http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html#volatile
The subtlety here is that synchronized(null) would throw a NullPointerException, so neither head nor last is allowed to become null. They are both initialized to the same dummy node that is never returned or removed from the list.
put() and poll() are synchronized on two different locks. The methods would need to synchronize on the same lock to be thread-safe with respect to one another if they could modify the same value from different threads. The only situation in which this is a problem is when head == last (i.e. they are the same object, referenced through different member variables). This is why the code synchronizes on head and last: most of the time these will be fast, uncontended locks, but occasionally head and last will be the same instance and one of the threads will have to block the other.
The only time that visibility is an issue is when the queue is nearly empty, the rest of the time put() and poll() work on different ends of the queue and don't interfere with each other.
In a multi-threaded environment, in order to have thread-safe array element swapping, we perform synchronized locking.
// a is a char array.
synchronized (a) {
    char tmp = a[1];
    a[1] = a[0];
    a[0] = tmp;
}
Is it possible to make use of the following API in the above situation, so that we can have lock-free array element swapping? If yes, how?
http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/atomic/AtomicReferenceFieldUpdater.html#compareAndSet%28T,%20V,%20V%29
Regardless of the API used, you won't be able to achieve both thread-safe and lock-free array element swapping in Java.
Element swapping requires multiple read and update operations that need to be performed atomically. To get that atomicity you need a lock.
EDIT:
An alternative to a lock-free algorithm might be micro-locking: instead of locking the entire array, it's possible to lock only the elements that are being swapped.
The value of this approach is questionable. That is to say, if the algorithm that requires swapping can guarantee that different threads work on different parts of the array, then no synchronisation is required at all.
In the opposite case, when different threads can attempt to swap overlapping elements, thread execution order matters. For example, if one thread tries to swap elements 0 and 1 of the array while another simultaneously attempts to swap 1 and 2, the result depends entirely on the order of execution: starting from {'a','b','c'} you can end up with either {'b','c','a'} or {'c','a','b'}. Hence you'd require more sophisticated synchronisation.
Here is a quick and dirty class for character arrays that implements micro locking:
import java.util.concurrent.atomic.AtomicIntegerArray;

class SyncCharArray {
    final private char array[];
    final private AtomicIntegerArray locktable;

    SyncCharArray(char array[]) {
        this.array = array;
        // create a lock table the size of the array
        // to track currently locked elements
        this.locktable = new AtomicIntegerArray(array.length);
        for (int i = 0; i < array.length; i++) unlock(i);
    }

    void swap(int idx1, int idx2) {
        // return if the same element
        if (idx1 == idx2) return;
        // lock the element with the smaller index first to avoid possible deadlock
        lock(Math.min(idx1, idx2));
        lock(Math.max(idx1, idx2));
        char tmp = array[idx1];
        array[idx1] = array[idx2];
        unlock(idx1);
        array[idx2] = tmp;
        unlock(idx2);
    }

    private void lock(int idx) {
        // if the required element is locked, then wait ...
        while (!locktable.compareAndSet(idx, 0, 1)) Thread.yield();
    }

    private void unlock(int idx) {
        locktable.set(idx, 0);
    }
}
You’d need to create the SyncCharArray and then pass it to all threads that require swapping:
char array[] = {'a', 'b', 'c', 'd', 'e', 'f'};
SyncCharArray sca = new SyncCharArray(array);

// then pass sca to any threads that require swapping;
// within a thread:
sca.swap(5, 3);
Hope that makes some sense.
UPDATE:
Some testing demonstrated that unless you have a great number of threads accessing the array simultaneously (100+ on run-of-the-mill hardware), a simple synchronized (array) {...} block works much faster than the elaborate synchronisation.
// lock-free swap of array[i] and array[j]
// (assumes the array contains no null elements)
static <T> void swap(AtomicReferenceArray<T> array, int i, int j) {
    while (true) {
        T ai = array.getAndSet(i, null);
        if (ai == null) continue;
        T aj = array.getAndSet(j, null);
        if (aj == null) {
            array.set(i, ai);
            continue;
        }
        array.set(i, aj);
        array.set(j, ai);
        break;
    }
}
The closest you're going to get is java.util.concurrent.atomic.AtomicReferenceArray, which offers CAS-based operations such as boolean compareAndSet(int i, E expect, E update). It does not have a swap(int pos1, int pos2) operation though so you're going to have to emulate it with two compareAndSet calls.
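To illustrate the per-slot guarantee and its limit, here is a minimal AtomicReferenceArray sketch (CasSlotDemo is my name for it). Each compareAndSet call is atomic on its own, but two independent calls are not atomic as a pair, so another thread can interleave between them:

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

public class CasSlotDemo {
    public static void main(String[] args) {
        AtomicReferenceArray<String> a =
                new AtomicReferenceArray<>(new String[] { "x", "y" });

        // succeeds only if slot 0 still holds "x"
        boolean first = a.compareAndSet(0, "x", "y");
        // fails: slot 0 no longer holds "x"
        boolean second = a.compareAndSet(0, "x", "z");

        System.out.println(first + " " + second + " " + a.get(0));
        // prints "true false y"
    }
}
```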
"The principal threat to scalability in concurrent applications is the exclusive resource lock." - Java Concurrency in Practice.
I think you need a lock, but as others mention that lock can be more granular than it is at present.
You can use lock striping like java.util.concurrent.ConcurrentHashMap.
The API you mentioned, as others have already stated, can only be used to set the value of a single field of a single object, not an array element, and not even two objects simultaneously, so you wouldn't have a safe swap anyway.
The solution depends on your specific situation. Can the array be replaced by another data structure? Is it also changing in size concurrently?
If you must use an array, you could change it to hold updatable objects (not primitive types or Character) and synchronize over both objects being swapped. A data structure like this would work:
public class CharValue {
    public char c;
}

CharValue[] a = new CharValue[N];
Remember to use a deterministic synchronization order to avoid deadlocks (http://en.wikipedia.org/wiki/Deadlock#Circular_wait_prevention)! You could simply follow index ordering to achieve it.
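A sketch of that index-ordered locking for the CharValue idea above (OrderedLockSwap and its swap method are my names, and a constructor is added for convenience):

```java
public class OrderedLockSwap {
    public static class CharValue {
        public char c;
        public CharValue(char c) { this.c = c; }
    }

    // Always lock the lower-index element first, so two concurrent swaps
    // over overlapping indices acquire their locks in the same order and
    // cannot form a circular wait.
    static void swap(CharValue[] a, int i, int j) {
        if (i == j) return;
        CharValue first = a[Math.min(i, j)];
        CharValue second = a[Math.max(i, j)];
        synchronized (first) {
            synchronized (second) {
                char tmp = a[i].c;
                a[i].c = a[j].c;
                a[j].c = tmp;
            }
        }
    }

    public static void main(String[] args) {
        CharValue[] a = { new CharValue('a'), new CharValue('b') };
        swap(a, 0, 1);
        System.out.println("" + a[0].c + a[1].c);  // prints "ba"
    }
}
```

This assumes the CharValue objects themselves are never replaced in the array while swaps are in flight; only their c fields change.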
If items must also be added or removed concurrently from the collection, you could use a Map instead, synchronize swaps on the Map.Entry instances, and use a synchronized Map implementation. A plain List wouldn't do, because it has no isolated structures for retaining the values (or you don't have access to them).
I don't think the AtomicReferenceFieldUpdater is meant for array access, and even if it were, it only provides atomic guarantees on one reference at a time. AFAIK, all the classes in java.util.concurrent.atomic only provide atomic access to one reference at a time. In order to change two or more references as one atomic operation, you must use some kind of locking.