Lock Free Array Element Swapping


In a multi-threaded environment, to make array element swapping thread-safe, we synchronize on the array:
// a is a char array.
synchronized(a) {
char tmp = a[1];
a[1] = a[0];
a[0] = tmp;
}
Is it possible to use the following API in the above situation to get lock-free array element swapping? If so, how?
http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/atomic/AtomicReferenceFieldUpdater.html#compareAndSet%28T,%20V,%20V%29

Regardless of the API used, you won't be able to achieve both thread-safe and lock-free array element swapping in Java.
The element swapping requires multiple read and update operations that need to be performed atomically. To simulate the atomicity you need a lock.
EDIT:
An alternative to lock-free algorithm might be micro-locking: instead of locking the entire array it’s possible to lock only elements that are being swapped.
The value of this approach is questionable. That is to say, if the algorithm that requires swapping elements can guarantee that different threads are going to work on different parts of the array, then no synchronisation is required.
In the opposite case, when different threads can actually attempt to swap overlapping elements, thread execution order will matter. For example, if one thread tries to swap elements 0 and 1 of the array and another simultaneously attempts to swap 1 and 2, the result will depend entirely on the order of execution: for the initial {'a','b','c'} you can end up with either {'b','c','a'} or {'c','a','b'}. Hence you'd need more sophisticated synchronisation.
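The ordering point above can be replayed deterministically. The following is a minimal single-threaded sketch (the names `OverlappingSwaps` and `run` are illustrative, not from the original answer). Each swap holds the array's monitor, so each swap is atomic on its own, yet the two possible serial orders still give different results.

```java
// Sketch: each swap is atomic (it holds the array's monitor), yet with two
// overlapping swaps the final contents still depend on which one runs first.
class OverlappingSwaps {
    static void swap(char[] a, int i, int j) {
        synchronized (a) {
            char tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
        }
    }

    // Replays one of the two possible serial orders of swap(0,1) and swap(1,2).
    static String run(boolean zeroOneFirst) {
        char[] a = {'a', 'b', 'c'};
        if (zeroOneFirst) {
            swap(a, 0, 1);
            swap(a, 1, 2);
        } else {
            swap(a, 1, 2);
            swap(a, 0, 1);
        }
        return new String(a);
    }

    public static void main(String[] args) {
        System.out.println(run(true));  // bca
        System.out.println(run(false)); // cab
    }
}
```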
Here is a quick and dirty class for character arrays that implements micro locking:
import java.util.concurrent.atomic.AtomicIntegerArray;
class SyncCharArray {
final private char array [];
final private AtomicIntegerArray locktable;
SyncCharArray (char array[])
{
this.array = array;
// create a lock table the size of the array
// to track currently locked elements
this.locktable = new AtomicIntegerArray(array.length);
for (int i = 0;i<array.length;i++) unlock(i);
}
void swap (int idx1, int idx2)
{
// return if the same element
if (idx1==idx2) return;
// lock element with the smaller index first to avoid possible deadlock
lock(Math.min(idx1,idx2));
lock(Math.max(idx1,idx2));
char tmp = array[idx1];
array [idx1] = array[idx2];
unlock(idx1);
array[idx2] = tmp;
unlock(idx2);
}
private void lock (int idx)
{
// if the required element is locked, then wait ...
while (!locktable.compareAndSet(idx,0,1)) Thread.yield();
}
private void unlock (int idx)
{
locktable.set(idx,0);
}
}
You’d need to create the SyncCharArray and then pass it to all threads that require swapping:
char array [] = {'a','b','c','d','e','f'};
SyncCharArray sca = new SyncCharArray(array);
// then pass sca to any threads that require swapping
// then within a thread
sca.swap(5,3);
Hope that makes some sense.
UPDATE:
Some testing demonstrated that unless you have a great number of threads accessing the array simultaneously (100+ on run-of-the-mill hardware), a simple synchronized (array) {} block works much faster than the elaborate synchronisation.

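// "lock-free" swap of array[i] and array[j] (assumes the array contains only non-null elements).
// Note: a slot temporarily set to null acts like a per-slot spin lock; other swappers spin until it is restored.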
static <T> void swap(AtomicReferenceArray<T> array, int i, int j) {
while (true) {
T ai = array.getAndSet(i, null);
if (ai == null) continue;
T aj = array.getAndSet(j, null);
if (aj == null) {
array.set(i, ai);
continue;
}
array.set(i, aj);
array.set(j, ai);
break;
}
}

The closest you're going to get is java.util.concurrent.atomic.AtomicReferenceArray, which offers CAS-based operations such as boolean compareAndSet(int i, E expect, E update). It does not have a swap(int pos1, int pos2) operation, though, so you're going to have to emulate it with two compareAndSet calls. That pair of calls is not atomic as a whole: another thread can observe or change the array between them.
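You can see that two compareAndSet calls are not atomic together without using any threads. The hypothetical `TwoCasSwap.demo()` below performs the two calls in sequence and records the array contents in between. Any other thread reading at that moment would see a state that is neither the old array nor the new one.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Sketch: a swap of slots 0 and 1 emulated with two compareAndSet calls.
// Each call is atomic on its own, but the pair is not.
class TwoCasSwap {
    // Returns "<state between the two calls>-><final state>".
    static String demo() {
        AtomicReferenceArray<String> arr =
                new AtomicReferenceArray<>(new String[] {"a", "b"});
        String x = arr.get(0);
        String y = arr.get(1);
        boolean first = arr.compareAndSet(0, x, y);  // slot 0: "a" -> "b"
        // Window: a thread reading here sees ["b", "b"], which is
        // neither the old nor the new state of the array.
        String middle = arr.get(0) + arr.get(1);
        boolean second = arr.compareAndSet(1, y, x); // slot 1: "b" -> "a"
        if (!first || !second) {
            // If only the first CAS succeeded, it cannot be undone atomically.
            throw new IllegalStateException("a concurrent update interfered");
        }
        return middle + "->" + arr.get(0) + arr.get(1);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // bb->ba
    }
}
```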

"The principal threat to scalability in concurrent applications is the exclusive resource lock." - Java Concurrency in Practice.
I think you need a lock, but as others mention that lock can be more granular than it is at present.
You can use lock striping like java.util.concurrent.ConcurrentHashMap.
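Here is a minimal sketch of lock striping applied to swaps. It assumes a fixed stripe count of 16 (an arbitrary choice) and uses the hypothetical class name `StripedCharArray`. Element `i` is guarded by stripe `i % 16`. The two stripes are always locked in ascending order, so overlapping swaps cannot deadlock. The `stress` method is only a rough self-check that concurrent swaps neither lose nor duplicate elements.

```java
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;

// Sketch of lock striping for swaps. Element i is guarded by
// locks[i % STRIPES]; stripes are always taken in ascending order.
class StripedCharArray {
    private static final int STRIPES = 16; // arbitrary; more stripes = less contention
    private final char[] array;
    private final Object[] locks = new Object[STRIPES];

    StripedCharArray(char[] array) {
        this.array = array;
        for (int s = 0; s < STRIPES; s++) locks[s] = new Object();
    }

    void swap(int i, int j) {
        int lo = Math.min(i % STRIPES, j % STRIPES);
        int hi = Math.max(i % STRIPES, j % STRIPES);
        synchronized (locks[lo]) {
            synchronized (locks[hi]) { // may be the same monitor; Java monitors are reentrant
                char tmp = array[i];
                array[i] = array[j];
                array[j] = tmp;
            }
        }
    }

    // Runs concurrent random swaps, then checks no element was lost or duplicated.
    static boolean stress(int threads, int swapsPerThread) {
        char[] data = new char[100];
        for (int k = 0; k < data.length; k++) data[k] = (char) ('A' + k);
        char[] expected = data.clone(); // already in ascending order
        StripedCharArray sca = new StripedCharArray(data);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                for (int n = 0; n < swapsPerThread; n++) {
                    sca.swap(rnd.nextInt(data.length), rnd.nextInt(data.length));
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join(); // join() also makes the swaps visible here
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        char[] actual = data.clone();
        Arrays.sort(actual);
        return Arrays.equals(actual, expected);
    }

    public static void main(String[] args) {
        System.out.println(stress(4, 100_000)); // true
    }
}
```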

The API you mentioned, as others have already stated, can only be used to update a field of a single object, not an array element. Nor can it update two objects at once, so you wouldn't get a safe swap anyway.
The solution depends on your specific situation. Can the array be replaced by another data structure? Is it also changing in size concurrently?
If you must use an array, it could be changed to hold mutable holder objects (not primitive types or Character), and you can synchronize on both of the objects being swapped. A data structure like this would work:
public class CharValue {
public char c;
}
CharValue[] a = new CharValue[N];
Remember to use a deterministic locking order to avoid deadlocks (http://en.wikipedia.org/wiki/Deadlock#Circular_wait_prevention)! You could simply follow index ordering.
If items may also be added or removed concurrently, you could use a Map instead, synchronize swaps on the Map.Entry objects, and use a synchronized Map implementation. A simple List wouldn't do, because it has no separate per-element objects holding the values (or you don't have access to them).
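The following sketch shows the holder-object approach. It assumes the array slots themselves are never reassigned, so only each holder's `c` field changes. `HolderSwap` is a hypothetical name. As suggested above, locks are taken lower index first.

```java
// Sketch: each slot is its own CharValue object, so a swap locks only the
// two holders involved, lower index first, to rule out circular waits.
class CharValue {
    char c; // guarded by this holder's monitor
    CharValue(char c) { this.c = c; }
}

class HolderSwap {
    static void swap(CharValue[] a, int i, int j) {
        if (i == j) return;
        CharValue first = a[Math.min(i, j)];
        CharValue second = a[Math.max(i, j)];
        synchronized (first) {
            synchronized (second) {
                char tmp = first.c;
                first.c = second.c;
                second.c = tmp;
            }
        }
    }

    // Readers take the same monitor so they see a consistent, visible value.
    static char get(CharValue[] a, int i) {
        synchronized (a[i]) {
            return a[i].c;
        }
    }

    public static void main(String[] args) {
        CharValue[] a = { new CharValue('a'), new CharValue('b'), new CharValue('c') };
        swap(a, 2, 0);
        System.out.println("" + get(a, 0) + get(a, 1) + get(a, 2)); // cba
    }
}
```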

I don't think the AtomicReferenceFieldUpdater is meant for array access, and even if it were, it only provides atomic guarantees on one reference at a time. AFAIK, all the classes in java.util.concurrent.atomic only provide atomic access to one reference at a time. In order to change two or more references as one atomic operation, you must use some kind of locking.

Related

Java ArrayList thread unsafe example explanation

import java.util.ArrayList;
class ThreadUnsafe {
static final int THREAD_NUMBER = 2;
static final int LOOP_NUMBER = 200;
public static void main(String[] args) {
ThreadUnsafe test = new ThreadUnsafe();
for (int i = 0; i < THREAD_NUMBER; i++) {
new Thread(() -> {
test.method1(LOOP_NUMBER);
}, "Thread" + i).start();
}
}
ArrayList<String> list = new ArrayList<>();
public void method1(int loopNumber) {
for (int i = 0; i < loopNumber; i++) {
method2();
method3();
}
}
private void method2() {
list.add("1");
}
private void method3() {
list.remove(0);
}
}
The code above throws
java.lang.IndexOutOfBoundsException: Index: 0, Size: 1
I know ArrayList is not thread-safe, but in this example I think every remove() call is guaranteed to be preceded by at least one add() call, so the code should be OK even if the order is messed up like the following:
thread0: method2()
thread1: method2()
thread1: method3()
thread0: method3()
Some explanations needed here, please.

If every add() or remove() call always finished completely before another one started, your reasoning would be correct. But ArrayList doesn't guarantee that, as its methods aren't synchronized. So two threads can be in the middle of modifying calls at the same time.
Let's look at the internals of e.g. the add() method to understand one possible failure mode.
When adding an element, ArrayList increases the size using size++. And this is not atomic.
Now imagine the list being empty, and two threads A and B adding an element at exactly the same moment, doing the size++ in parallel (maybe in different CPU cores). Let's imagine things happen in the following order:
A reads size as 0.
B reads size as 0.
A adds one to its value, giving 1.
B adds one to its value, giving 1.
A writes its new value back into the size field, resulting in size=1.
B writes its new value back into the size field, resulting in size=1.
Although we had 2 add() calls, the size is only 1. If now you try to remove 2 elements (and this time it happens sequentially), the second remove() will fail.
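You can reproduce the interleaving above with a plain `int` field that stands in for `ArrayList`'s internal `size`. This is an assumption for illustration, since the real field is private. The minimal sketch below compares the plain field with an `AtomicInteger`, using the hypothetical class name `LostUpdates`.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: plain++ is a separate read, add and write, so concurrent
// increments can be lost; incrementAndGet() is one atomic step.
class LostUpdates {
    static int plain;
    static final AtomicInteger atomic = new AtomicInteger();

    // Returns {plain count, atomic count} after threads * increments increments.
    static int[] run(int threads, int increments) {
        plain = 0;
        atomic.set(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int n = 0; n < increments; n++) {
                    plain++;                  // racy read-modify-write
                    atomic.incrementAndGet(); // atomic read-modify-write
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return new int[] {plain, atomic.get()};
    }

    public static void main(String[] args) {
        int[] r = run(4, 1_000_000);
        System.out.println(r[0] + " vs " + r[1]);
    }
}
```

The plain count varies from run to run and is often below the expected total. The atomic count always matches the expected total.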
To achieve thread safety, no other thread should be able to mess around with the internals like size (or the elements array) while one access is currently in progress.
Multi-threading is inherently complex: calls from multiple threads can not only happen in any (expected or unexpected) order, but can also overlap, unless protected by some mechanism like synchronized. On the other hand, excessive synchronization can easily lead to poor multi-threaded performance, and also to deadlocks.

As a supplement to @RalfKleberhoff's answer:
I think every remove() call is guaranteed to be preceded by at least one add() call,
Yes.
so the code should be OK even the order is messed up
No, that is not a valid inference with respect to a multithreaded program.
Your program contains data races as a result of two threads both accessing the same shared, non-atomic object, with some of those accesses being writes, without appropriate synchronization. The whole behavior of a program that contains data races is undefined, so in fact you cannot draw any conclusions at all about its behavior.
Do not try to cheat or scrimp on synchronization. Do minimize the amount of it that you need by limiting your use of shared objects, but where you need it, you need it, and the rules for determining when and where you need it are not that hard to learn.

ArrayList in java docs says,
Note that this implementation is not synchronized. If multiple threads
access an ArrayList instance concurrently, and at least one of the
threads modifies the list structurally, it must be synchronized
externally.
Why is this code not thread-safe?
Multiple threads running on a machine run independently of each other.
public void method1(int loopNumber) {
for (int i = 0; i < loopNumber; i++) {
method2();
method3();
}
}
Here method2() and method3() are processed sequentially within a thread, but not across threads. The ArrayList list is shared by both threads, and on a multi-core system it can be left in an inconsistent state as seen by the two threads.
An interesting test would be to add an empty check in method3() and set LOOP_NUMBER = 10000:
private void method3()
{
if (!list.isEmpty())
list.remove(0);
}
As a result you should still get the same runtime exception, something like java.lang.IndexOutOfBoundsException: Index: 0, Size: 1 or java.lang.IndexOutOfBoundsException: Index: 0, Size: 0, for the same reason: the list's internal state (i.e. size) is inconsistent. The isEmpty() check and the remove() call are also not atomic together.
To fix this issue, you could add synchronized as below, or use a synchronized list (Collections.synchronizedList):
public void method1(int loopNumber)
{
for (int i = 0; i < loopNumber; i++)
{
synchronized (list)
{
method2();
method3();
}
}
}
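Here is a sketch of the synchronized-list alternative mentioned above. It uses `Collections.synchronizedList` with the question's add-then-remove loop; the class name `SafeAddRemove` is hypothetical. Each call is atomic, and a thread only removes after its own add has completed, so `remove(0)` always finds an element.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: every add()/remove() call is made atomic by the synchronized wrapper.
class SafeAddRemove {
    // Runs the question's loop on several threads; returns the final size.
    static int run(int threads, int loops) {
        List<String> list = Collections.synchronizedList(new ArrayList<>());
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < loops; i++) {
                    list.add("1");  // method2()
                    list.remove(0); // method3()
                }
            }, "Thread" + t);
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return list.size();
    }

    public static void main(String[] args) {
        System.out.println(run(2, 200)); // 0
    }
}
```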

Volatile arrays and memory barriers and visibility in Java

I am having difficulties understanding memory barriers and cache coherence in Java, and how these concepts relate to arrays.
I have the following scenario, where one thread modifies an array (both the reference to it and one of its internal values) and another thread reads from it.
int[] integers;
volatile boolean memoryBarrier;
public void resizeAndAddLast(int value) {
integers = Arrays.copyOf(integers, integers.length + 1);
integers[integers.length - 1] = value;
memoryBarrier = true;
}
public int read(int index) {
boolean memoryBarrier = this.memoryBarrier;
return integers[index];
}
My question is, does this do what I think it does, i.e. does "publishing" to memoryBarrier and subsequently reading the variable force a cache-coherence action and make sure that the reader thread will indeed get both the latest array reference and the correct underlying value at the specified index?
My understanding is that the array reference does not have to be declared volatile, it should be enough to force a cache-coherence action using any volatile field. Is this reasoning correct?
EDIT: there is precisely one writer thread and many reader threads.

Nope, your code is thread-unsafe. A variation which would make it safe is as follows:
void raiseFlag() {
if (memoryBarrier == true)
throw new IllegalStateException("Flag already raised");
memoryBarrier = true;
}
public int read(int index) {
if (memoryBarrier == false)
throw new IllegalStateException("Flag not raised yet");
return integers[index];
}
You only get to raise the flag once and you don't get to publish more than one integers array. This would be quite useless for your use case, though.
Now, as to the why... You do not guarantee that between the first and second line of read() there wasn't an intervening write to integers which was observed by the second line. The lack of a memory barrier does not prevent another thread from observing an action. It makes the result unspecified.
There is a simple idiom that would make your code thread-safe (specialized for the assumption that a single thread calls resizeAndAddLast, otherwise more code is necessary and an AtomicReference):
volatile int[] integers;
public void resizeAndAddLast(int value) {
int[] copy = Arrays.copyOf(integers, integers.length + 1);
copy[copy.length - 1] = value;
integers = copy;
}
public int read(int index) {
return integers[index];
}
In this code you never touch an array once it got published, therefore whatever you dereference from read will be observed as intended, with the index updated.

There are multiple reasons why it won't work in general:
Java doesn't say anything about memory barriers or about the ordering of unrelated variables. Global memory barriers are a side effect of x86.
Even with global memory barriers, the write order of the array reference and the indexed array value is undefined. It is guaranteed that both happen before the memory barrier, but in which order? An unsynchronized read may see the reference but not the array value. Your read barrier doesn't help here in the case of multiple reads/writes.
Beware of arrays of references: visibility of the referenced values requires special attention.
A slightly better approach would be to declare the array itself as volatile and treat its values as immutable:
volatile int[] integers; // volatile (or maybe better AtomicReference)
public void resizeAndAddLast(int value) {
// enforce exactly one volatile read!
int[] copy = integers;
copy = Arrays.copyOf(copy, copy.length + 1);
copy[copy.length - 1] = value;
// may lose concurrent updates. Add synchronization or a compareExchange-loop!
integers = copy;
}
public int read(int index) {
return integers[index];
}
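For the multi-writer case that the comment above alludes to, a compare-and-set loop on an `AtomicReference<int[]>` is one option. This sketch works under that assumption; the class `CowIntArray` is hypothetical. Published arrays are never modified, and a writer retries if another writer published first.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReference;

// Sketch: copy-on-write array with a CAS retry loop for several writers.
class CowIntArray {
    private final AtomicReference<int[]> ref = new AtomicReference<>(new int[0]);

    void addLast(int value) {
        while (true) {
            int[] current = ref.get(); // exactly one read of the reference
            int[] copy = Arrays.copyOf(current, current.length + 1);
            copy[copy.length - 1] = value;
            if (ref.compareAndSet(current, copy)) {
                return; // published; copy is never modified again
            }
            // another writer published in between: retry on the newer array
        }
    }

    int read(int index) {
        return ref.get()[index];
    }

    int size() {
        return ref.get().length;
    }

    // Several writers append concurrently; returns the final length.
    static int stress(int writers, int perWriter) {
        CowIntArray arr = new CowIntArray();
        Thread[] ts = new Thread[writers];
        for (int t = 0; t < writers; t++) {
            final int id = t;
            ts[t] = new Thread(() -> {
                for (int n = 0; n < perWriter; n++) arr.addLast(id);
            });
            ts[t].start();
        }
        try {
            for (Thread th : ts) th.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return arr.size();
    }

    public static void main(String[] args) {
        System.out.println(stress(4, 2_000)); // 8000
    }
}
```

Each append copies the whole array, so this approach suits cases where reads greatly outnumber writes.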

Unless you declare a variable volatile (or otherwise synchronize access to it), there is no guarantee that another thread will see its latest value. Volatile guarantees that a write to the variable is visible to subsequent reads of it by other threads, together with the writes that preceded it in the writing thread.
You will also need synchronization so that the reading thread does not read before the write is complete. Is there any reason to use an array rather than an ArrayList, given that you are already resizing it with Arrays.copyOf?

Java fork join concurrent HashMap solve lost updates

I have a list of users, and each user has a sequence of places he has visited (e.g. list = 1,2,3,1,2,8,10,1... and so on). Now I want to figure out how often each place has been visited. Furthermore, I really want to use fork/join for that. My actual question is: do you know a way to use the ConcurrentHashMap here? The current problem is that there are lost updates at
map.put(i, map.get(i)+1);// lost updates here
Do you have a nice idea to solve that without locking the whole map (is there a partial lock for parts of the map, as there is for put()?). I know I could create a map for each user and then join them again, but I thought perhaps someone has a better solution.
public class ForkUsers extends RecursiveAction{
ArrayList<User>users;
ConcurrentHashMap<Integer,Integer>map;
int indexfrom;
int indexto;
ForkUsers(ArrayList<User>users,ConcurrentHashMap<Integer,Integer> map,int indexfrom,int indexto){
this.users=users;
this.map=map;
this.indexfrom=indexfrom;
this.indexto=indexto;
}
void computeDirectly(User user){
for(Integer i:user.getVisitedPlaces()){
if(map.get(i)==null){
map.putIfAbsent(i, 1);
}else{
map.put(i, map.get(i)+1);// lost updates here
}
}
}
protected void compute() {
if(indexfrom==indexto){
computeDirectly(users.get(indexfrom));
}else{
int half=(indexfrom+indexto)/2;
invokeAll(new ForkUsers(users,map,indexfrom,half),new ForkUsers(users,map,half+1,indexto));
}
}
}

Even though you're using a ConcurrentHashMap, that doesn't prevent read-update-write race conditions; both threads call get, then both add 1, then both put the value with just the single update back. You can either synchronize the whole read-update-write operation or (my preference) use an AtomicInteger for the value and use incrementAndGet instead.
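Here is a minimal sketch of the `AtomicInteger` suggestion. It assumes `computeDirectly` is changed to take the user's visited places, and `VisitCounter` and `parallelDemo` are hypothetical names. `computeIfAbsent` installs one counter per place atomically, and `incrementAndGet` updates it without a get-then-put race.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

// Sketch: one AtomicInteger per place; only the affected entry is involved,
// never the whole map.
class VisitCounter {
    final ConcurrentHashMap<Integer, AtomicInteger> counts = new ConcurrentHashMap<>();

    // Would replace the body of computeDirectly(user) with user.getVisitedPlaces().
    void countAll(List<Integer> visitedPlaces) {
        for (Integer place : visitedPlaces) {
            counts.computeIfAbsent(place, k -> new AtomicInteger()).incrementAndGet();
        }
    }

    // Hypothetical driver: `users` identical users processed in parallel
    // (parallel streams run on the common ForkJoinPool).
    static VisitCounter parallelDemo(int users) {
        VisitCounter vc = new VisitCounter();
        List<Integer> places = List.of(1, 2, 3, 1);
        IntStream.range(0, users).parallel().forEach(u -> vc.countAll(places));
        return vc;
    }

    public static void main(String[] args) {
        System.out.println(parallelDemo(1000).counts.get(1)); // 2000
    }
}
```

With a `ConcurrentHashMap<Integer, Integer>`, `map.merge(i, 1, Integer::sum)` is another option, because `ConcurrentHashMap.merge` is performed atomically.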

"Atomically" update an entire array

I have a single writer thread and a single reader thread that update and process a pool of arrays (references stored in a map). The ratio of writes to reads is almost 5:1 (latency of writes is a concern).
The writer thread needs to update a few elements of an array in the pool based on some events. The entire write operation (all elements) needs to be atomic.
I want to ensure that the reader thread reads the previously updated array if the writer thread is updating it (something like volatile, but on the entire array rather than individual fields). Basically, I can afford to read stale values, but not to block.
Also, since the writes are so frequent, it would be really expensive to create new objects or lock the entire array while read/write.
Is there a more efficient data structure that could be used or use cheaper locks ?

How about this idea: The writer thread does not mutate the array. It simply queues the updates.
The reader thread, whenever it enters a read session that requires a stable snapshot of the array, applies the queued updates to the array, then reads the array.
class Update
{
int position;
Object value;
}
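LinkedBlockingQueue<Update> updates = new LinkedBlockingQueue<>(); // unbounded and linked; new ArrayBlockingQueue<>(Integer.MAX_VALUE) would try to allocate a huge backing array
void write()
{
updates.offer(new Update(...)); // never blocks on an unbounded queue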
}
Object[] read()
{
Update update;
while((update=updates.poll())!=null)
array[update.position] = update.value;
return array;
}
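Here is a runnable sketch of this idea. It assumes exactly one reader thread and `double` values. It deviates from the snippet above in one respect: each `write` enqueues a whole batch (a hypothetical `Batch` of positions and values) rather than a single element. That way a multi-element write is applied all at once, as the question requires.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch: the writer only enqueues; the single reader owns the array and
// applies queued batches before each read.
class QueuedArray {
    // Hypothetical batch type: parallel arrays of positions and new values.
    private static final class Batch {
        final int[] positions;
        final double[] values;
        Batch(int[] positions, double[] values) {
            this.positions = positions.clone(); // never mutated after enqueue
            this.values = values.clone();
        }
    }

    private final ConcurrentLinkedQueue<Batch> updates = new ConcurrentLinkedQueue<>();
    private final double[] array; // only ever touched by the reader thread

    QueuedArray(int length) {
        array = new double[length];
    }

    // Writer thread: never blocks and never touches the array.
    void write(int[] positions, double[] values) {
        updates.offer(new Batch(positions, values));
    }

    // Reader thread: apply every batch queued so far, then read.
    double[] read() {
        Batch b;
        while ((b = updates.poll()) != null) {
            for (int k = 0; k < b.positions.length; k++) {
                array[b.positions[k]] = b.values[k];
            }
        }
        return array;
    }

    public static void main(String[] args) {
        QueuedArray qa = new QueuedArray(3);
        qa.write(new int[] {0, 2}, new double[] {1.5, 2.5});
        double[] snapshot = qa.read();
        System.out.println(snapshot[0] + " " + snapshot[1] + " " + snapshot[2]); // 1.5 0.0 2.5
    }
}
```

`read()` returns the reader's own array, so only the reader thread should call it. If writes arrive faster than the reader drains them, the loop in `read()` may keep running; a later answer addresses this concern.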

Is there a more efficient data structure?
Yes, absolutely! They're called persistent data structures. They are able to represent a new version of a vector/map/etc. merely by storing the differences with respect to a previous version. All versions are immutable, which makes them appropriate for concurrency (writers don't interfere with or block readers, and vice versa).
In order to express change, one stores references to a persistent data structure in a reference type such as AtomicReference, and changes what those references point to, not the structures themselves.
Clojure provides a top-notch implementation of persistent data structures. They're written in pure, efficient Java.
The following program shows how one would approach your described problem using persistent data structures.
import clojure.lang.IPersistentVector;
import clojure.lang.PersistentVector;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.atomic.AtomicReference;
public class AtomicArrayUpdates {
public static Map<Integer, AtomicReference<IPersistentVector>> pool
= new HashMap<>();
public static Random rnd = new Random();
public static final int SIZE = 60000;
// For simulating the reads/writes ratio
public static final int SLEEP_TIME = 5;
static {
for (int i = 0; i < SIZE; i++) {
pool.put(i, new AtomicReference<IPersistentVector>(PersistentVector.EMPTY));
}
}
public static class Writer implements Runnable {
@Override public void run() {
while (true) {
try {
Thread.sleep(SLEEP_TIME);
} catch (InterruptedException e) {}
int index = rnd.nextInt(SIZE);
IPersistentVector vec = pool.get(index).get();
// note how we repeatedly assign vec to a new value
// cons() means "append a value".
vec = vec.cons(rnd.nextInt(SIZE + 1));
// assocN(): "update" at index 0
vec = vec.assocN(0, 42);
// appended values are nonsense, just an example!
vec = vec.cons(rnd.nextInt(SIZE + 1));
pool.get(index).set(vec);
}
}
}
public static class Reader implements Runnable {
@Override public void run() {
while (true) {
try {
Thread.sleep(SLEEP_TIME * 5);
} catch (InterruptedException e) {}
IPersistentVector vec = pool.get(rnd.nextInt(SIZE)).get();
// Now you can do whatever you want with vec.
// nothing can mutate it, and reading it doesn't block writers!
}
}
}
public static void main(String[] args) {
new Thread(new Writer()).start();
new Thread(new Reader()).start();
}
}

Another idea, given that the array contains only 20 doubles.
Have two arrays, one for write, one for read.
Reader locks the read array during read.
read()
lock();
read stuff
unlock();
Writer first modifies the write array, then tryLock the read array, if locking fails, fine, write() returns; if locking succeeds, copy the write array to the read array, then release the lock.
write()
update write array
if tryLock()
copy write array to read array
unlock()
Reader can be blocked, but only for the time it takes to copy the 20 doubles, which is short.
Reader should use spin lock, like do{}while(tryLock()==false); to avoid being suspended.
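Here is a sketch of the two-array scheme using `ReentrantLock.tryLock()`. It assumes one writer thread, one reader thread, and a hypothetical `DoubleBufferedArray` class. `Thread.onSpinWait()` requires Java 9 or later.

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch: writer never waits (tryLock only); reader spins briefly on the lock.
class DoubleBufferedArray {
    private final double[] writeArray; // touched only by the writer
    private final double[] readArray;  // guarded by lock
    private final ReentrantLock lock = new ReentrantLock();

    DoubleBufferedArray(int length) {
        writeArray = new double[length];
        readArray = new double[length];
    }

    // Writer: update the private array, then publish only if the lock is free.
    void write(int[] positions, double[] values) {
        for (int k = 0; k < positions.length; k++) {
            writeArray[positions[k]] = values[k];
        }
        if (lock.tryLock()) {
            try {
                System.arraycopy(writeArray, 0, readArray, 0, writeArray.length);
            } finally {
                lock.unlock();
            }
        }
        // If tryLock() failed, this write is published by the next successful write.
    }

    // Reader: spin until the lock is free, then copy a consistent snapshot.
    void read(double[] dest) {
        while (!lock.tryLock()) {
            Thread.onSpinWait(); // hint that we are busy-waiting
        }
        try {
            System.arraycopy(readArray, 0, dest, 0, readArray.length);
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        DoubleBufferedArray d = new DoubleBufferedArray(3);
        d.write(new int[] {1}, new double[] {4.0});
        double[] out = new double[3];
        d.read(out);
        System.out.println(out[1]); // 4.0
    }
}
```

This scheme has one caveat. If the writer's `tryLock()` fails on its last write and no further write follows, the reader keeps seeing the previous snapshot.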

I would do as follows:
synchronize the whole thing and see if the performance is good enough. Considering you only have one writer thread and one reader thread, contention will be low and this could work well enough
private final Map<Key, double[]> map = new HashMap<> ();
public synchronized void write(Key key, double value, int index) {
double[] array = map.get(key);
array[index] = value;
}
public synchronized double[] read(Key key) {
return map.get(key).clone(); // copy under the lock so the caller never sees a half-finished write
}
If it is too slow, I would have the writer make a copy of the array, change some values and put the new array back into the map. Note that array copies are very fast: copying a 20-item array would most likely take less than 100 nanoseconds.
//If all the keys and arrays are constructed before the writer/reader threads
//start, no need for a ConcurrentMap - otherwise use a ConcurrentMap
private final Map<Key, AtomicReference<double[]>> map = new HashMap<> ();
public void write(Key key, double value, int index) {
AtomicReference<double[]> ref = map.get(key);
double[] oldArray = ref.get();
double[] newArray = oldArray.clone();
newArray[index] = value;
//you might want to check the return value to see if it worked
//or you might just skip the update if another writes was performed
//in the meantime
ref.compareAndSet(oldArray, newArray);
}
public double[] read(Key key) {
return map.get(key).get(); //check for null
}
since the writes are so frequent, it would be really expensive to create new objects or lock the entire array while read/write.
How frequent? Unless there are hundreds of them every millisecond you should be fine.
Also note that:
object creation is fairly cheap in Java (think around 10 CPU cycles = a few nanoseconds)
garbage collection of short-lived objects is generally free (as long as an object stays in the young generation: if it is unreachable, it is not visited by the GC)
whereas long-lived objects have a GC performance impact because they need to be copied across to the old generation

The following variation is inspired by both my previous answer and one of zhong.j.yu's.
Writers don't interfere/block readers and vice versa, and there are no thread safety/visibility issues, or delicate reasoning going on.
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.atomic.AtomicReference;
public class V2 {
static Map<Integer, AtomicReference<double[]>> committed = new HashMap<>();
static Random rnd = new Random();
static class Writer {
private Map<Integer, double[]> writeable = new HashMap<>();
void write() {
int i = rnd.nextInt(writeable.size());
// manipulate writeable.get(i)...
// publish a copy so the reader never sees the writer's private array
committed.get(i).set(writeable.get(i).clone());
}
}
static class Reader{
void read() {
double[] arr = committed.get(rnd.nextInt(committed.size())).get();
// do something useful with arr...
}
}
}

You need two static references, readArray and writeArray, and a simple mutex (used as a flag) to track when writeArray has been changed.
Have a synchronized function called changeWriteArray that makes changes to a deep copy of writeArray:
synchronized String[] changeWriteArray(String[] writeArrayCopy, other params go here){
// here make changes to deepCopy of writeArray
//then return deepCopy
return writeArrayCopy;
}
Notice that changeWriteArray is functional programming with effectively no side effect since it is returning a copy that is neither readArray nor writeArray.
Whoever calls changeWriteArray must call it as writeArray = changeWriteArray(writeArray.deepCopy()).
The mutex is changed by both changeWriteArray and updateReadArray, but is only checked by updateReadArray. If the mutex is set, updateReadArray will simply point the readArray reference at the actual block of writeArray.
EDIT:
@vemv, concerning the answer you mentioned: while the ideas are the same, the difference is significant. The two references are static so that no time is spent actually copying the changes into readArray; rather, the readArray pointer is moved to point to writeArray. Effectively we are swapping by means of a tmp array that changeWriteArray generates as necessary. Also, the locking here is minimal, as reading does not require locking, in the sense that you can have more than one reader at any given time.
In fact, with this approach, you can keep a count of concurrent readers and check the counter to be zero for when to update readArray with writeArray; again, furthering that reading requires no lock at all.

Improving on @zhong.j.yu's answer: it is really a good idea to queue the writes instead of trying to perform them when they occur. However, we must tackle the problem of updates coming in so fast that the reader chokes on the continuous stream. My idea is that the reader only performs the writes that were queued before the read, ignoring subsequent writes (those would be handled by the next read).
You will need to write your own synchronized queue. It will be based on a linked list, and would contain only two methods:
public synchronized void enqueue(Write write);
This method will atomically enqueue a write. Writers can contend for the lock if writes come in faster than they can be enqueued, but I think there would have to be hundreds of thousands of writes every second to reach that point.
public synchronized Element cut();
This will atomically empty the queue and returns its head (or tail) as the Element object. It will contain a chain of other Elements (Element.next, etc..., just the usual linked list stuff), all those representing a chain of writes since last read. The queue would then be empty, ready to accept new writes. The reader then can trace the Element chain (which will be standalone by then, untouched by subsequent writes), perform the writes, and finally perform the read. While the reader processes the read, new writes would be enqueued in the queue, but those will be next read's problem.
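The following sketch implements the queue described above, with a generic payload `W` standing in for the hypothetical `Write` type. `enqueue` and `cut` are synchronized. The reader replays the detached chain without holding the lock, because no writer can reach that chain after `cut()` returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch: a synchronized linked queue whose cut() detaches everything at once.
class CutQueue<W> {
    static final class Element<T> {
        final T write;
        Element<T> next;
        Element(T write) { this.write = write; }
    }

    private Element<W> head; // oldest queued write
    private Element<W> tail; // newest queued write

    synchronized void enqueue(W write) {
        Element<W> e = new Element<>(write);
        if (tail == null) {
            head = e;
        } else {
            tail.next = e;
        }
        tail = e;
    }

    // Atomically detaches the whole chain (oldest first) and empties the queue.
    synchronized Element<W> cut() {
        Element<W> first = head;
        head = null;
        tail = null;
        return first;
    }

    // Reader: replay only the writes queued before this call, outside the lock.
    void applyPending(Consumer<? super W> apply) {
        for (Element<W> e = cut(); e != null; e = e.next) {
            apply.accept(e.write);
        }
    }

    public static void main(String[] args) {
        CutQueue<Integer> q = new CutQueue<>();
        q.enqueue(1);
        q.enqueue(2);
        List<Integer> seen = new ArrayList<>();
        q.applyPending(seen::add);
        System.out.println(seen); // [1, 2]
    }
}
```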
I wrote this once, albeit in C++, to represent a sound data buffer. There were more writes (driver sends more data), than reads (some mathematical stuff over the data), while the writes had to finish as soon as possible. (The data came in real-time, so I needed to save them before next batch was ready in the driver.)

I've got a funny solution using three arrays and a volatile boolean toggle. Basically, each thread has its own array. Additionally, there's a shared array controlled via the toggle.
When the writer finishes and the toggle allows it, it copies the newly written array into the shared array and flips the toggle.
Similarly, before the reader starts, when the toggle allows it, it copies the shared array into its own array and flips the toggle.
import java.util.function.Consumer;
public class MolecularArray {
private final double[] writeArray;
private final double[] sharedArray;
private final double[] readArray;
private volatile boolean writerOwnsShared;
MolecularArray(int length) {
writeArray = new double[length];
sharedArray = new double[length];
readArray = new double[length];
}
void read(Consumer<double[]> reader) {
if (!writerOwnsShared) {
copyFromTo(sharedArray, readArray);
writerOwnsShared = true;
}
reader.accept(readArray);
}
void write(Consumer<double[]> writer) {
writer.accept(writeArray);
if (writerOwnsShared) {
copyFromTo(writeArray, sharedArray);
writerOwnsShared = false;
}
}
private void copyFromTo(double[] from, double[] to) {
System.arraycopy(from, 0, to, 0, from.length);
}
}
It depends on the "single writer thread and single reader" assumption.
It never blocks.
It uses a constant (albeit huge) amount of memory.
Repeated calls to read without any intervening write do no copying and vice versa.
The reader does not necessarily see the most recent data, but it sees the data from the first write started after the previous read, if any.
I guess, this could be improved using two shared arrays.

How to make my data structure thread safe?

I defined an Element class:
class Element<T> {
T value;
Element<T> next;
Element(T value) {
this.value = value;
}
}
I also defined a List class based on Element. It is a typical list, just like in any data structure book, with insertHead, delete, and other operations:
public class List<T> implements Iterable<T> {
private Element<T> head;
private Element<T> tail;
private long size;
public List() {
this.head = null;
this.tail = null;
this.size = 0;
}
public void insertHead (T node) {
Element<T> e = new Element<T>(node);
if (size == 0) {
head = e;
tail = e;
} else {
e.next = head;
head = e;
}
size++;
}
//Other method code omitted
}
How do I make this List class thread safe?
Put synchronized on all methods? That doesn't seem to work: two threads may work on different methods at the same time and cause a collision.
If I used an array to keep all the elements in the class, I could use volatile on the array to make sure only one thread is working with the internal elements. But currently all the elements are linked through object references in each element's next pointer, so I have no way to use volatile.
Using volatile on head, tail and size? This may cause deadlocks if two thread running different methods holding on the resource each other waiting for.
Any suggestions?

If you put synchronized on every method, the data structure WILL BE thread-safe. Because by definition, only one thread will be executing any method on the object at a time, and inter-thread ordering and visibility is also ensured. So it is as good as if one thread is doing all operations.
Putting a synchronized(this) block won't be any different if the area the block covers is the whole method. You might get better performance if the area is smaller than that.
Doing something like
private final Object LOCK = new Object();
public void method(){
synchronized(LOCK){
doStuff();
}
}
This is considered good practice, although not for better performance. It ensures that no other code can use your lock and unintentionally create a deadlock-prone implementation.
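Here is a sketch of the private-lock idiom applied to the question's `List`, renamed `LockedList` here. Only `insertHead`, `peekHead` and `size` are shown, and the omitted methods would follow the same pattern. Every access to `head`, `tail` and `size` holds the same private lock, which provides both mutual exclusion and visibility.

```java
// Sketch: all state of the list is guarded by one private lock object.
class LockedList<T> {
    private static final class Element<T> {
        final T value;
        Element<T> next;
        Element(T value) { this.value = value; }
    }

    private final Object lock = new Object(); // not reachable by outside code
    private Element<T> head;
    private Element<T> tail;
    private long size;

    public void insertHead(T value) {
        Element<T> e = new Element<>(value);
        synchronized (lock) {
            if (size == 0) {
                tail = e;
            } else {
                e.next = head;
            }
            head = e;
            size++;
        }
    }

    public T peekHead() {
        synchronized (lock) {
            return head == null ? null : head.value;
        }
    }

    public long size() {
        synchronized (lock) {
            return size;
        }
    }

    // Several threads insert concurrently; returns the final size.
    static long stress(int threads, int perThread) {
        LockedList<Integer> list = new LockedList<>();
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            ts[t] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) list.insertHead(n);
            });
            ts[t].start();
        }
        try {
            for (Thread th : ts) th.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return list.size();
    }

    public static void main(String[] args) {
        System.out.println(stress(4, 10_000)); // 40000
    }
}
```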
In your case, I think you could use a ReadWriteLock to get better read performance. As the name suggests, a ReadWriteLock lets multiple threads through at once if they are accessing "read methods", i.e. methods that do not mutate the state of the object (of course, you have to correctly identify which of your methods are "read methods" and which are "write methods", and use the ReadWriteLock accordingly!). It also ensures that no other thread is accessing the object while a "write method" is executing. And it takes care of the scheduling of the reader and writer threads.
Another well-known way of making a class thread-safe is "copy-on-write", where you copy the whole data structure upon every mutation. This is only recommended when the object is mostly read and rarely written.
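Here is a minimal sketch of the `ReadWriteLock` approach for the same list; the class name `RwList` and the `contains` method are hypothetical. Mutating methods take the exclusive write lock, and read-only methods share the read lock.

```java
import java.util.Objects;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch: readers share the read lock, writers take the exclusive write lock.
class RwList<T> {
    private static final class Element<T> {
        final T value;
        Element<T> next;
        Element(T value) { this.value = value; }
    }

    private final ReadWriteLock rw = new ReentrantReadWriteLock();
    private Element<T> head;
    private Element<T> tail;
    private long size;

    public void insertHead(T value) { // "write method"
        Element<T> e = new Element<>(value);
        rw.writeLock().lock();
        try {
            if (size == 0) {
                tail = e;
            } else {
                e.next = head;
            }
            head = e;
            size++;
        } finally {
            rw.writeLock().unlock();
        }
    }

    public boolean contains(T value) { // "read method"
        rw.readLock().lock();
        try {
            for (Element<T> e = head; e != null; e = e.next) {
                if (Objects.equals(e.value, value)) return true;
            }
            return false;
        } finally {
            rw.readLock().unlock();
        }
    }

    public long size() { // "read method"
        rw.readLock().lock();
        try {
            return size;
        } finally {
            rw.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        RwList<String> l = new RwList<>();
        l.insertHead("a");
        l.insertHead("b");
        System.out.println(l.contains("a") + " " + l.size()); // true 2
    }
}
```

Whether this beats a plain lock depends on the read/write mix, so it is worth measuring.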
Here is a sample implementation of that strategy.
http://www.codase.com/search/smart?join=class+java.util.concurrent.CopyOnWriteArrayList
private volatile transient E[] array;
/**
* Returns the element at the specified position in this list.
*
* @param index index of element to return.
* @return the element at the specified position in this list.
* @throws IndexOutOfBoundsException if index is out of range <tt>(index
* < 0 || index >= size())</tt>.
*/
public E get(int index) {
E[] elementData = array();
rangeCheck(index, elementData.length);
return elementData[index];
}
/**
* Appends the specified element to the end of this list.
*
* @param element element to be appended to this list.
* @return true (as per the general contract of Collection.add).
*/
public synchronized boolean add(E element) {
int len = array.length;
E[] newArray = (E[]) new Object[len+1];
System.arraycopy(array, 0, newArray, 0, len);
newArray[len] = element;
array = newArray;
return true;
}
Here, read method is accessing without going through any lock, while write method has to be synchronized. Inter-thread ordering and visibility for read methods are ensured by the use of volatile for the array.
The reason that write methods have to "copy" is because the assignment array = newArray has to be "one shot" (in java, assignment of object reference is atomic), and you may not touch the original array during the manipulation.

I'd look at the source code for the java.util.LinkedList class for a real implementation.
Synchronized by default will lock on the instance of the class - which may not be what you want. (esp. if Element is externally accessible). If you synchronize all the methods on the same lock, then you'll have terrible concurrent performance, but it'll prevent them from executing at the same time - effectively single-threading access to the class.
Also - I see a tail reference, but don't see Element with a corresponding previous field, for a double linked-list - reason?

I'd suggest using a ReentrantLock which you pass to every element of the list, but you will have to use a factory to instantiate every element.
Any time you need to take something out of the list, you acquire that very same lock, so you can be sure that no two threads access the list at the same time.
