In Java Concurrency in Practice the author gives the following example of a class that is not thread-safe: behind the scenes it invokes an iterator on a set, and if multiple threads are involved this may cause a ConcurrentModificationException. That part is understood: one thread is modifying the collection while another is iterating over it, and boom!
What I do not understand is the author's claim that this code can be fixed by wrapping the HashSet with Collections.synchronizedSet(). How would that fix the problem? Even though access to all the methods will be synchronized and guarded by the same intrinsic lock, once an iterator object is obtained there is no guarantee that another thread won't modify the collection while the iteration is in progress.
Quote from the book:
If HiddenIterator wrapped the HashSet with a synchronizedSet, encapsulating the synchronization, this sort of error would not occur.
public class HiddenIterator {
    // Solution:
    // If HiddenIterator wrapped the HashSet with a synchronizedSet, encapsulating the
    // synchronization, this sort of error would not occur.

    //@GuardedBy("this")
    private final Set<Integer> set = new HashSet<Integer>();

    public synchronized void add(Integer i) {
        set.add(i);
    }

    public synchronized void remove(Integer i) {
        set.remove(i);
    }

    public void addTenThings() {
        Random r = new Random();
        for (int i = 0; i < 10; i++)
            add(r.nextInt());
        /* The string concatenation gets turned by the compiler into a call to
         * StringBuilder.append(Object), which in turn invokes the collection's toString
         * method - and the implementation of toString in the standard collections iterates
         * the collection and calls toString on each element to produce a nicely formatted
         * representation of the collection's contents. */
        System.out.println("DEBUG: added ten elements to " + set);
    }
}
If someone could help me understand that, I'd be grateful.
Here is how I think it could've been fixed:
public class HiddenIterator {
    private final Set<Integer> set = Collections.synchronizedSet(new HashSet<Integer>());

    public void add(Integer i) {
        set.add(i);
    }

    public void remove(Integer i) {
        set.remove(i);
    }

    public void addTenThings() {
        Random r = new Random();
        for (int i = 0; i < 10; i++)
            add(r.nextInt());
        // synchronizing on the set's intrinsic lock
        synchronized (set) {
            System.out.println("DEBUG: added ten elements to " + set);
        }
    }
}
Or, as an alternative, one could keep the synchronized keyword on the add() and remove() methods, so that we synchronize on this. We would then also have to add a synchronized block (again locked on this) to addTenThings(), containing the single operation that logs with an implicit iteration:
public class HiddenIterator {
    private final Set<Integer> set = new HashSet<Integer>();

    public synchronized void add(Integer i) {
        set.add(i);
    }

    public synchronized void remove(Integer i) {
        set.remove(i);
    }

    public void addTenThings() {
        Random r = new Random();
        for (int i = 0; i < 10; i++)
            add(r.nextInt());
        synchronized (this) {
            System.out.println("DEBUG: added ten elements to " + set);
        }
    }
}
Collections.synchronizedSet() wraps the collection in an instance of an internal class called SynchronizedSet, which extends SynchronizedCollection. Now let's look at how SynchronizedCollection.toString() is implemented:
public String toString() {
    synchronized (mutex) { return c.toString(); }
}
Basically the iteration is still there, hidden in the c.toString() call, but it's already synchronized with all other methods of this wrapper collection. So you don't need to repeat the synchronization in your code.
Edited
synchronizedSet()::toString()
As Sergei Petunin rightly pointed out, the toString() method of the set returned by Collections.synchronizedSet() internally takes care of synchronization, so no manual synchronization is necessary in this case.
external iteration on synchronizedSet()
once the iterator object is obtained, there is no guarantee that the other thread won't modify the collection once an iteration is being made.
In cases of external iteration, such as a for-each loop or using an Iterator directly, wrapping the iteration in a synchronized(set) block is both required and sufficient.
That's why the JavaDoc of Collections.synchronizedSortedSet() states that
It is imperative that the user manually synchronize on the returned
sorted set when iterating over it or any of its subSet, headSet, or
tailSet views.
SortedSet s = Collections.synchronizedSortedSet(new TreeSet());
    ...
synchronized (s) {
    Iterator i = s.iterator(); // Must be in the synchronized block
    while (i.hasNext())
        foo(i.next());
}
manual synchronization
Your second version, with the synchronized add/remove methods of HiddenIterator and a synchronized(this) block, would work too, but it introduces unnecessary overhead: adding and removing would be synchronized twice (once by HiddenIterator and once by Collections.synchronizedSet(..)).
However, in this case you could omit the Collections.synchronizedSet(..) as HiddenIterator takes care of all the synchronization required when accessing the private Set field.
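To make that concrete, here is a minimal sketch of that variant, where any iteration (explicit or hidden) simply runs while holding the same lock that guards add() and remove(); the dump() method and its name are my own illustration, not from the book:

import java.util.HashSet;
import java.util.Set;

public class HiddenIterator {
    //@GuardedBy("this")
    private final Set<Integer> set = new HashSet<Integer>();

    public synchronized void add(Integer i) {
        set.add(i);
    }

    public synchronized void remove(Integer i) {
        set.remove(i);
    }

    // Any iteration, explicit or hidden in toString/for-each, must hold the same
    // intrinsic lock that guards add() and remove().
    public synchronized String dump() {
        StringBuilder sb = new StringBuilder();
        for (Integer i : set) {       // iteration happens entirely under 'this'
            sb.append(i).append(' ');
        }
        return sb.toString().trim();
    }
}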
Related
I have an ArrayList filled with 'someObject'. I need to iterate over this list with 4 different threads (using Futures & Callables). Each thread will keep the top 5 highest-valued objects it comes across. I first tried creating a parallel stream, but that didn't work out so well. Is there some obvious thing I'm not thinking of, so that each thread can iterate over the objects without possibly grabbing the same object twice?
You can use an AtomicInteger to iterate over the list:
class MyRunnable implements Runnable {
    final List<SomeObject> list;
    final AtomicInteger counter; // initialize to 0

    MyRunnable(List<SomeObject> list, AtomicInteger counter) {
        this.list = list;
        this.counter = counter;
    }

    @Override
    public void run() {
        while (true) {
            int index = counter.getAndIncrement();
            if (index < list.size()) {
                // do something with list.get(index);
            } else {
                return;
            }
        }
    }
}
So long as every MyRunnable shares the same AtomicInteger reference, they won't duplicate indices.
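For illustration, a hedged, self-contained sketch of how that shared counter could be wired up with a thread pool; Worker, SharedCounterDemo and the String payload are placeholders standing in for the question's SomeObject and scoring logic:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedCounterDemo {

    // Same idea as MyRunnable above, with String standing in for SomeObject.
    static class Worker implements Runnable {
        final List<String> list;
        final AtomicInteger counter;

        Worker(List<String> list, AtomicInteger counter) {
            this.list = list;
            this.counter = counter;
        }

        @Override
        public void run() {
            while (true) {
                int index = counter.getAndIncrement();  // claim the next unprocessed index
                if (index >= list.size()) {
                    return;                             // nothing left to do
                }
                System.out.println(Thread.currentThread().getName() + " -> " + list.get(index));
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> items = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            items.add("item-" + i);
        }

        AtomicInteger counter = new AtomicInteger(0);   // one counter shared by every worker
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) {
            pool.submit(new Worker(items, counter));    // every worker gets the same references
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}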
You don't need AtomicInteger or any other synchronization for that matter.
You should simply logically partition your list (whose size is known upfront) based on the number of processing threads (whose number is also known upfront) and let each of them operate on its own section of [from, to) of the list.
This avoids the need for any synchronization at all (even an optimized form such as AtomicInteger), which is what you should always strive for (as long as it's safe).
Pseudo code
abstract class Worker<T> implements Runnable {
    final List<T> toProcess;

    protected Worker(List<T> list, int fromInc, int toExcl) {
        // Note: this does not allow passing an empty list or specifying an empty work
        // section, but you can relax that if you wish.
        // It also implicitly checks the list for null.
        // (Preconditions here is e.g. Guava's com.google.common.base.Preconditions.)
        Preconditions.checkArgument(fromInc >= 0 && fromInc < list.size());
        Preconditions.checkArgument(toExcl > fromInc && toExcl <= list.size());
        // Note: subList does not create a copy, only a view, so it's very cheap.
        toProcess = list.subList(fromInc, toExcl);
    }

    @Override
    public final void run() {
        for (final T t : toProcess) {
            process(t);
        }
    }

    protected abstract void process(T t);
}
As with the AtomicInteger solution (really any solution which does not involve copying the list), this solution also assumes that you will not be modifying the list once you have handed it off to each thread and processing has commenced. Modifying the list while processing is in progress will result in undefined behavior.
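For illustration, a sketch of how the [from, to) sections could be computed and handed out; PartitionDemo, the chunk arithmetic and the String payload are my own illustrative choices, not part of the answer above:

import java.util.ArrayList;
import java.util.List;

public class PartitionDemo {
    public static void main(String[] args) throws InterruptedException {
        List<String> items = new ArrayList<>();
        for (int i = 0; i < 103; i++) {
            items.add("item-" + i);
        }

        int threads = 4;
        int chunk = (items.size() + threads - 1) / threads;  // ceiling division
        List<Thread> workers = new ArrayList<>();

        for (int t = 0; t < threads; t++) {
            int from = t * chunk;
            int to = Math.min(from + chunk, items.size());   // last chunk may be shorter
            if (from >= to) {
                break;                                       // more threads than elements
            }
            List<String> slice = items.subList(from, to);    // a view, not a copy
            Thread worker = new Thread(() -> {
                for (String s : slice) {                     // no shared mutable state, no locks
                    System.out.println(Thread.currentThread().getName() + " -> " + s);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread w : workers) {
            w.join();
        }
    }
}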
I have a method that's supposed to feed a map from a queue, and it only does so if the map size does not exceed a certain number. This caused a concurrency problem, because the size each thread sees is not globally coherent. I reproduced the problem with this code:
import java.sql.Timestamp;
import java.util.Date;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrenthashMapTest {
    private ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<Integer, Integer>();
    private ThreadUx[] tArray = new ThreadUx[999];

    public void parallelMapFilling() {
        for (int i = 0; i < 999; i++) {
            tArray[i] = new ThreadUx(i);
        }
        for (int i = 0; i < 999; i++) {
            tArray[i].start();
        }
    }

    public class ThreadUx extends Thread {
        private int seq = 0;

        public ThreadUx(int i) {
            seq = i;
        }

        @Override
        public void run() {
            while (map.size() < 2) {
                map.put(seq, seq);
                System.out.println(Thread.currentThread().getName() + " || The size is: " + map.size() + " || " + new Timestamp(new Date().getTime()));
            }
        }
    }

    public static void main(String[] args) {
        new ConcurrenthashMapTest().parallelMapFilling();
    }
}
Normally I should get only one line of output, with the size never exceeding 1, but I get output like this:
Thread-1 || The size is: 2 || 2016-06-07 18:32:55.157
Thread-0 || The size is: 2 || 2016-06-07 18:32:55.157
I tried marking the whole run method as synchronized, but that didn't work. It only worked when I did this:
@Override
public void run() {
    synchronized (map) {
        if (map.size() < 1) {
            map.put(seq, seq);
            System.out.println(Thread.currentThread().getName() + " || The size is: " + map.size() + " || " + new Timestamp(new Date().getTime()));
        }
    }
}
It worked, but why does only the synchronized block work and not the synchronized method? Also, I don't want to use something as old-fashioned as a synchronized block since I am working on a Java EE app; is there a Spring or Java EE task executor or annotation that can help?
From Java Concurrency in Practice:
The semantics of methods of ConcurrentHashMap that operate on the entire Map, such as size and isEmpty, have been slightly weakened to reflect the concurrent nature of the collection. Since the result of size could be out of date by the time it is computed, it is really only an estimate, so size is allowed to return an approximation instead of an exact count. While at first this may seem disturbing, in reality methods like size and isEmpty are far less useful in concurrent environments because these quantities are moving targets. So the requirements for these operations were weakened to enable performance optimizations for the most important operations, primarily get, put, containsKey, and remove.
The one feature offered by the synchronized Map implementations but not by ConcurrentHashMap is the ability to lock the map for exclusive access. With Hashtable and synchronizedMap, acquiring the Map lock prevents any other thread from accessing it. This might be necessary in unusual cases such as adding several mappings atomically, or iterating the Map several times and needing to see the same elements in the same order. On the whole, though, this is a reasonable tradeoff: concurrent collections should be expected to change their contents continuously.
Solutions:
1. Refactor the design and do not use the size method under concurrent access.
2. To use methods such as size and isEmpty, you can use a synchronized collection via Collections.synchronizedMap. Synchronized collections achieve their thread safety by serializing all access to the collection's state. The cost of this approach is poor concurrency; when multiple threads contend for the collection-wide lock, throughput suffers. You will also need to synchronize the check-then-put block on the map instance, because it's a compound action (a sketch of this follows the bounded map discussion below).
3. Use a third-party implementation or write your own:
public class BoundConcurrentHashMap<K, V> {
    private final Map<K, V> m;
    private final Semaphore semaphore;

    public BoundConcurrentHashMap(int size) {
        m = new ConcurrentHashMap<K, V>();
        semaphore = new Semaphore(size);
    }

    public V get(K key) {
        return m.get(key);
    }

    public boolean put(K key, V value) {
        boolean hasSpace = semaphore.tryAcquire();
        if (hasSpace) {
            m.put(key, value);
        }
        return hasSpace;
    }

    public void remove(Object key) {
        m.remove(key);
        semaphore.release();
    }

    // approximation, do not trust this method
    public int size() {
        return m.size();
    }
}
The BoundConcurrentHashMap class is about as efficient as ConcurrentHashMap and almost thread-safe: removing an element and releasing the semaphore in remove are not performed atomically, as they ideally should be, but in this case that is tolerable. The size method still returns an approximate value, but put will not allow the map to grow beyond its bound.
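Going back to option 2 above, here is a minimal sketch of guarding the check-then-put as a single compound action; the class name, LIMIT and putIfBelowLimit are my own illustrative choices, not a canonical API:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class BoundedSynchronizedMap {
    private static final int LIMIT = 2;
    private final Map<Integer, Integer> map =
            Collections.synchronizedMap(new HashMap<Integer, Integer>());

    // The size check and the put form a compound action, so they must run under
    // the same lock that the synchronized wrapper uses (the wrapper object itself).
    public boolean putIfBelowLimit(Integer key, Integer value) {
        synchronized (map) {
            if (map.size() < LIMIT) {
                map.put(key, value);
                return true;
            }
            return false;
        }
    }
}

Compared to the semaphore-based BoundConcurrentHashMap above, this trades throughput for an exact size check.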
You are using ConcurrentHashMap, and according to the API doc:
Bear in mind that the results of aggregate status methods including
size, isEmpty, and containsValue are typically useful only when a map
is not undergoing concurrent updates in other threads. Otherwise the
results of these methods reflect transient states that may be adequate
for monitoring or estimation purposes, but not for program control.
This means you cannot get an accurate result unless you explicitly synchronize access around size().
Adding synchronized to the run method does not work because the threads are not synchronizing on the same lock object: each one locks on itself (its own Thread instance).
Synchronizing on the map itself definitely works, but IMHO it's not a good choice, because you then lose the performance advantage that ConcurrentHashMap can provide.
In conclusion you need to reconsider the design.
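To illustrate the lock-object point with a hedged sketch (SharedLockDemo and its fields are my own, not from the question): a synchronized run() would lock each worker's own Thread instance, so the threads must agree on one shared monitor for the check-then-act to be atomic.

public class SharedLockDemo {
    private static final Object SHARED_LOCK = new Object();
    private static final int LIMIT = 2;
    private static int sharedCount = 0;   // guarded by SHARED_LOCK

    static class Worker extends Thread {
        @Override
        public void run() {
            // A synchronized run() would lock 'this', i.e. a different Thread
            // instance per worker, and therefore exclude nobody. All workers
            // have to agree on one shared monitor instead.
            synchronized (SHARED_LOCK) {
                if (sharedCount < LIMIT) {   // check-then-act, atomic under the shared lock
                    sharedCount++;
                    System.out.println(getName() + " incremented count to " + sharedCount);
                }
            }
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 4; i++) {
            new Worker().start();
        }
    }
}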
I am facing a problem in my program when multiple threads access the same server over RMI. The server contains a list used as a cache and performs some expensive computation that sometimes changes that list. After the computation finishes, the list is serialized and sent to the client.
First problem: if the list is changed while being serialized (e.g. by a different client requesting some data), a ConcurrentModificationException is (probably) thrown, resulting in an EOFException for the RMI call / the deserialization on the client side.
Therefore I need some kind of list structure that is "stable" for serialization while possibly being changed by a different thread.
Solutions we tried:
regular ArrayList / Set - not working because of concurrency
deep-copying the entire structure before every serialization - faaar too expensive
CopyOnWriteArrayList - expensive as well, since it copies the list, and it reveals the second problem: we need to be able to atomically replace any element in the list, which is currently not thread-safe (first delete, then add, which is even more expensive) or only doable by locking the list and therefore serializing the different threads.
Therefore my question is:
Do you know of a Collection implementation which allows us to serialize the Collection in a thread-safe way while other threads modify it, and which provides some way of atomically replacing elements?
A bonus would be if the list did not need to be copied before serialization! Creating a snapshot for every serialization would be okay, but still meh :/
Illustration of the problem (C=compute, A=add to list, R=remove from list, S=serialize)
Thread1    Thread2
C
A
A          C
C          A
S          C
S          R   <---- Remove and add have to be performed without Thread1 serializing
S          A   <---- anything in between (atomically) - and it has to be done without
S          S         blocking other threads' computations and serializations for long,
S                    and no third thread may be allowed to start serializing in this
S                    in-between state
S
The simplest solution would be to apply external synchronization to the ArrayList, possibly via a read-write lock like this:
public class SyncList<T> implements Serializable {
    private static final long serialVersionUID = -6184959782243333803L;

    private List<T> list = new ArrayList<>();
    private transient Lock readLock, writeLock;

    public SyncList() {
        ReentrantReadWriteLock readWriteLock = new ReentrantReadWriteLock();
        readLock = readWriteLock.readLock();
        writeLock = readWriteLock.writeLock();
    }

    public void add(T element) {
        writeLock.lock();
        try {
            list.add(element);
        } finally {
            writeLock.unlock();
        }
    }

    public T get(int index) {
        readLock.lock();
        try {
            return list.get(index);
        } finally {
            readLock.unlock();
        }
    }

    public String dump() {
        readLock.lock();
        try {
            return list.toString();
        } finally {
            readLock.unlock();
        }
    }

    public boolean replace(T old, T newElement) {
        writeLock.lock();
        try {
            int pos = list.indexOf(old);
            if (pos < 0)
                return false;
            list.set(pos, newElement);
            return true;
        } finally {
            writeLock.unlock();
        }
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        readLock.lock();
        try {
            out.writeObject(list);
        } finally {
            readLock.unlock();
        }
    }

    @SuppressWarnings("unchecked")
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        list = (List<T>) in.readObject();
        ReentrantReadWriteLock readWriteLock = new ReentrantReadWriteLock();
        readLock = readWriteLock.readLock();
        writeLock = readWriteLock.writeLock();
    }
}
Provide any operations you like, just properly use either read-lock or write-lock.
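A small usage sketch, assuming the SyncList class above is available on the classpath (SyncListDemo and its numbers are my own illustration): serialization goes through writeObject, which takes the read lock, so it always sees a consistent list even while another thread keeps adding.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public class SyncListDemo {
    public static void main(String[] args) throws IOException, InterruptedException {
        SyncList<Integer> list = new SyncList<>();

        // One thread keeps adding...
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) {
                list.add(i);
            }
        });
        writer.start();

        // ...while this thread serializes; writeObject takes the read lock,
        // so it never observes a half-updated ArrayList.
        ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream());
        out.writeObject(list);
        out.close();

        writer.join();
    }
}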
My initial thought that CopyOnWriteArrayList was a bad idea because it copies everything was wrong: of course it only performs a shallow copy, copying the references, not a deep copy of all the objects as well.
Therefore we went with CopyOnWriteArrayList, because it already offers a lot of the needed functionality. The only remaining problem was the replace operation, which grew into the more complex addIfAbsentOrReplace.
We tried CopyOnWriteArraySet, but it did not fit our needs because it only offers addIfAbsent. In our case we had an instance c1 of a class C which we needed to store and later replace with an updated new instance c2. Of course we override equals and hashCode. Now we had to choose whether we wanted equality to return true or false for the two only minimally different objects. Neither option works, because
true would mean the objects are the same, and the set would not even bother adding the new object c2, because c1 is already in it
false would mean c2 would be added, but c1 would not be removed
Therefore CopyOnWriteArrayList. That list already offers a
public void replaceAll(UnaryOperator<E> operator) { ... }
which somewhat fits our needs. It lets us replace the object we need via custom comparison.
We utilized it in the following way:
protected <T extends OurSpecialClass> void addIfAbsentOrReplace(T toAdd, List<T> elementList) {
    OurSpecialClassReplaceOperator<T> op = new OurSpecialClassReplaceOperator<>(toAdd);
    synchronized (elementList) {
        elementList.replaceAll(op);
        if (!op.isReplaced()) {
            elementList.add(toAdd);
        }
    }
}

private class OurSpecialClassReplaceOperator<T extends OurSpecialClass> implements UnaryOperator<T> {
    private boolean replaced = false;
    private T toAdd;

    public OurSpecialClassReplaceOperator(T toAdd) {
        this.toAdd = toAdd;
    }

    @Override
    public T apply(T existing) {
        if (this.toAdd.getID().equals(existing.getID())) {
            replaced = true;
            return this.toAdd;
        }
        return existing;
    }

    public boolean isReplaced() {
        return replaced;
    }
}
How does Java's AtomicReference work under the hood? I tried looking at the code, but it is based on sun.misc.Unsafe, so perhaps another question is: how does Unsafe work?
This is specific to the current implementation and can change, and it isn't necessarily documented.
How java AtomicReference works under the hood
There are two operations. Single read/writes or atomic swaps.
Single read/writes are simple volatile loads or stores.
The atomic swaps need processor-level instructions. The most common implementations are Compare and Swap (CAS), found on sparc-TSO, x86, and ia64, and LL/SC, found on ARM, PPC, and Alpha. I am sure there are more that I am missing, but this gives you an idea of the scope.
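To show how such a CAS ends up being used from Java code, here is a hedged sketch of the standard retry loop on AtomicReference; the Counter type and the demo class are my own illustration, not JDK internals:

import java.util.concurrent.atomic.AtomicReference;

public class CasLoopDemo {
    // Immutable state object; updates swap in a whole new instance.
    static final class Counter {
        final int value;
        Counter(int value) { this.value = value; }
    }

    private static final AtomicReference<Counter> REF = new AtomicReference<>(new Counter(0));

    static void increment() {
        while (true) {
            Counter current = REF.get();              // volatile read
            Counter next = new Counter(current.value + 1);
            if (REF.compareAndSet(current, next)) {   // a single CAS instruction underneath
                return;                               // success: we swapped the reference
            }
            // another thread won the race; re-read and retry
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] ts = new Thread[4];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < 10_000; j++) {
                    increment();
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            t.join();
        }
        System.out.println(REF.get().value);          // prints 40000
    }
}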
another question is how Unsafe works?
Unsafe works via native methods leveraging processor instructions.
Sources:
http://gee.cs.oswego.edu/dl/jmm/cookbook.html
Some important elementary facts are as follows:
1. Different threads can only contend for instance and static member variables in the heap space.
2. Volatile reads and writes are completely atomic, serialized (happens-before ordered), and done only against memory. By this I mean that any read will see the result of the previous write to memory, and any write happens after the previous read from memory. So any thread working with a volatile will always see the most up-to-date value. AtomicReference uses this property of volatile.
Following is some of the source code of AtomicReference. AtomicReference holds a reference to an object. This reference is a volatile member variable in the AtomicReference instance, as below:
private volatile V value;
get() simply returns the latest value of the variable (as volatiles do in a "happens before" manner).
public final V get()
Following is the most important method of AtomicReference.
public final boolean compareAndSet(V expect, V update) {
    return unsafe.compareAndSwapObject(this, valueOffset, expect, update);
}
The compareAndSet(expect, update) method calls the compareAndSwapObject() method of Java's Unsafe class. This call into Unsafe invokes a native method, which issues a single instruction to the processor. "expect" and "update" each reference an object.
If and only if the AtomicReference instance member variable "value" refers to the same object that is referred to by "expect", "update" is assigned to this instance variable and "true" is returned. Otherwise, false is returned. The whole thing is done atomically; no other thread can intercept in between. As this is a single processor operation (magic of modern computer architecture), it's often faster than using a synchronized block. But remember that when multiple variables need to be updated atomically, AtomicReference won't help.
I would like to add full, runnable code, which can be run in Eclipse. It should clear up a lot of confusion. Here 22 users (MyTh threads) are trying to book 20 seats. First a code snippet, followed by the full code.
Code snippet where 22 users are trying to book 20 seats.
for (int i = 0; i < 20; i++) { // 20 seats
    seats.add(new AtomicReference<Integer>());
}
Thread[] ths = new Thread[22]; // 22 users
for (int i = 0; i < ths.length; i++) {
    ths[i] = new MyTh(seats, i);
    ths[i].start();
}
Following is the full running code.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class Solution {
    static List<AtomicReference<Integer>> seats; // Movie seats numbered as per list index

    public static void main(String[] args) throws InterruptedException {
        seats = new ArrayList<>();
        for (int i = 0; i < 20; i++) { // 20 seats
            seats.add(new AtomicReference<Integer>());
        }
        Thread[] ths = new Thread[22]; // 22 users
        for (int i = 0; i < ths.length; i++) {
            ths[i] = new MyTh(seats, i);
            ths[i].start();
        }
        for (Thread t : ths) {
            t.join();
        }
        for (AtomicReference<Integer> seat : seats) {
            System.out.print(" " + seat.get());
        }
    }

    /**
     * id is the id of the user
     *
     * @author sankbane
     */
    static class MyTh extends Thread { // each thread is a user
        static AtomicInteger full = new AtomicInteger(0);
        List<AtomicReference<Integer>> l; // seats
        int id; // id of the user
        int seats;

        public MyTh(List<AtomicReference<Integer>> list, int userId) {
            l = list;
            this.id = userId;
            seats = list.size();
        }

        @Override
        public void run() {
            boolean reserved = false;
            try {
                while (!reserved && full.get() < seats) {
                    Thread.sleep(50);
                    int r = ThreadLocalRandom.current().nextInt(0, seats); // excludes the upper bound (seats)
                    AtomicReference<Integer> el = l.get(r);
                    reserved = el.compareAndSet(null, id); // null means no user has reserved this seat
                    if (reserved)
                        full.getAndIncrement();
                }
                if (!reserved && full.get() == seats)
                    System.out.println("user " + id + " did not get a seat");
            } catch (InterruptedException ie) {
                // log it
            }
        }
    }
}
AtomicReference has two fields:
* value, which is the reference
* valueOffset, which is the position of value in bytes from 'this', i.e. from the AtomicReference itself
In compareAndSwap(expected, updated), the object at this-location + valueOffset is compared using == semantics with "expected", and if it is ==, the field is updated with "updated".
This is a single hardware instruction, and thus guaranteed to either update or fail with a false return value, atomically.
Read Unsafe source code from openJDK.
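To make the == (reference identity) comparison concrete, here is a small illustrative sketch of my own:

import java.util.concurrent.atomic.AtomicReference;

public class ReferenceEqualityDemo {
    public static void main(String[] args) {
        AtomicReference<String> ref = new AtomicReference<>(new String("abc"));

        // equals()-equal but a different object: CAS fails, because the
        // comparison is done with == on the reference, not with equals().
        System.out.println(ref.compareAndSet(new String("abc"), "xyz")); // false

        // the exact same reference: CAS succeeds
        String current = ref.get();
        System.out.println(ref.compareAndSet(current, "xyz"));           // true
    }
}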
From the CopyOnWriteArrayList.java, the add method is as follows:
public boolean add(E e) {
    final ReentrantLock lock = this.lock;
    lock.lock();
    try {
        Object[] elements = getArray();
        int len = elements.length;
        Object[] newElements = Arrays.copyOf(elements, len + 1);
        newElements[len] = e;
        setArray(newElements);
        return true;
    } finally {
        lock.unlock();
    }
}
It's not hard to understand why the add operation needs a lock; what confuses me is that it copies the old data into a new array and abandons the previous one.
Meanwhile, the get method is as follows:
public E get(int index) {
    return (E) (getArray()[index]);
}
There is no lock in the get method.
I found some explanations; some say that copying to a new array avoids having add and get operate on the same array.
My question is: why can't two threads read and write at the same time?
If you just look at the top of the class CopyOnWriteArrayList, at the declaration of the array reference variable, there is the answer to your question:
private volatile transient Object[] array; // this is volatile
return (E)(getArray()[index]);
which returns the element from the latest published array, so this is thread-safe.
final Object[] getArray() {
    return array;
}
getArray returns a reference to the array.
Actually the reason that the write path locks is not because it needs to provide thread safety considering the read path, but because it wants to serialize writers. Since the copy-on-write technique replaces the volatile reference, it's usually best to serialize that operation.
The key to this idea is that writes are accomplished by copying the existing value, modifying it, and replacing the reference. It also follows that once set the object pointed by the reference is always read only (i.e. no mutation is done directly on the object referred by the reference). Therefore, readers can access it safely without synchronization.
Reads and writes can happen concurrently. However, the implication is that the reads will see the soon-to-be-stale state until the volatile reference set is done.
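As a simplified sketch of that copy-modify-replace idea (my own illustration, not the actual JDK implementation):

import java.util.Arrays;

public class TinyCopyOnWriteList<E> {
    // Readers only ever dereference this volatile field; the array it points to
    // is never mutated after publication, so reads need no lock.
    private volatile Object[] array = new Object[0];

    public synchronized void add(E e) {            // writers are serialized
        Object[] current = array;
        Object[] next = Arrays.copyOf(current, current.length + 1);
        next[current.length] = e;                  // mutate the private copy only
        array = next;                              // publish via a volatile write
    }

    @SuppressWarnings("unchecked")
    public E get(int index) {
        return (E) array[index];                   // volatile read of the current snapshot
    }

    public int size() {
        return array.length;
    }
}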
At the time of get(), if multiple threads try to read from the list, there will be no issue: thanks to the volatile array field, they always read the latest published array and return the element from it.
But during add() or set(), a new array is created every time to avoid mutual exclusion problems; this is one way to make objects thread-safe: make them effectively immutable.
If the same array object were used during add or set, traversal would have to be synchronized, or an exception could be thrown if any thread added or removed an object from the list during traversal.
As per java doc
A thread-safe variant of java.util.ArrayList in which all mutative operations (add, set, and so on) are implemented by making a fresh copy of the underlying array.
This is ordinarily too costly, but may be more efficient than alternatives when traversal operations vastly outnumber mutations, and is useful when you cannot or don't want to synchronize traversals
See this example:
package com.concurrent;

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class CopyOnWriteArrayListTest {

    public static void main(String[] args) {
        CopyOnWriteArrayList<Integer> list = new CopyOnWriteArrayList<>();
        Viewer viewer = new Viewer();
        viewer.setList(list);
        Thread t1 = new Thread(viewer);
        Adder adder = new Adder();
        adder.setList(list);
        Thread t = new Thread(adder);
        t.start();
        t1.start();
    }

    static class Adder implements Runnable {
        private List<Integer> list;

        public void setList(List<Integer> list) {
            this.list = list;
        }

        @Override
        public void run() {
            for (int i = 0; i < 100; i++) {
                list.add(i);
                System.out.println("Added-" + i);
                try {
                    Thread.sleep(500);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }
    }

    static class Viewer implements Runnable {
        private List<Integer> list;

        public void setList(List<Integer> list) {
            this.list = list;
        }

        @Override
        public void run() {
            while (true) {
                System.out.println("Length of list->" + list.size());
                for (Integer i : list) {
                    System.out.println("Reading-" + i);
                    try {
                        Thread.sleep(500);
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                }
            }
        }
    }
}