Difference between CopyOnWriteArrayList and synchronizedList - java

As per my understanding concurrent collection classes preferred over synchronized collections because the concurrent collection classes don't take a lock on the complete collection object. Instead they take locks on a small segment of the collection object.
But when I checked the add method of CopyOnWriteArrayList, we are acquiring a lock on complete collection object. Then how come CopyOnWriteArrayList is better than a list returned by Collections.synchronizedList? The only difference I see in the add method of CopyOnWriteArrayList is that we are creating copy of that array each time the add method is called.
public boolean add(E e) {
final ReentrantLock lock = this.lock;
lock.lock();
try {
Object[] elements = getArray();
int len = elements.length;
Object[] newElements = Arrays.copyOf(elements, len + 1);
newElements[len] = e;
setArray(newElements);
return true;
} finally {
lock.unlock();
}
}

As per my understanding concurrent collection classes preferred over synchronized collection because concurrent collection classes don't take lock on complete collection object. Instead it takes lock on small segment of collection object.
This is true for some collections but not all. A map returned by Collections.synchronizedMap locks the entire map around every operation, whereas ConcurrentHashMap locks only one hash bucket for some operations, or it might use a non-blocking algorithm for others.
For other collections, the algorithms in use, and thus the tradeoffs, are different. This is particularly true of lists returned by Collections.synchronizedList compared to CopyOnWriteArrayList. As you noted, both synchronizedList and CopyOnWriteArrayList take a lock on the entire array during write operations. So why are the different?
The difference emerges if you look at other operations, such as iterating over every element of the collection. The documentation for Collections.synchronizedList says,
It is imperative that the user manually synchronize on the returned list when iterating over it:
List list = Collections.synchronizedList(new ArrayList());
...
synchronized (list) {
Iterator i = list.iterator(); // Must be in synchronized block
while (i.hasNext())
foo(i.next());
}
Failure to follow this advice may result in non-deterministic behavior.
In other words, iterating over a synchronizedList is not thread-safe unless you do locking manually. Note that when using this technique, all operations by other threads on this list, including iterations, gets, sets, adds, and removals, are blocked. Only one thread at a time can do anything with this collection.
By contrast, the doc for CopyOnWriteArrayList says,
The "snapshot" style iterator method uses a reference to the state of the array at the point that the iterator was created. This array never changes during the lifetime of the iterator, so interference is impossible and the iterator is guaranteed not to throw ConcurrentModificationException. The iterator will not reflect additions, removals, or changes to the list since the iterator was created.
Operations by other threads on this list can proceed concurrently, but the iteration isn't affected by changes made by any other threads. So, even though write operations lock the entire list, CopyOnWriteArrayList still can provide higher throughput than an ordinary synchronizedList. (Provided that there is a high proportion of reads and traversals to writes.)

For write (add) operation, CopyOnWriteArrayList uses ReentrantLock and creates a backup copy of the data and the underlying volatile array reference is only updated via setArray(Any read operation on the list during before setArray will return the old data before add).Moreover, CopyOnWriteArrayList provides snapshot fail-safe iterator and doesn't throw ConcurrentModifficationException on write/ add.
But when I checked add method of CopyOnWriteArrayList.class, we are acquiring lock on complete collection object. Then how come CopyOnWriteArrayList is better than synchronizedList. The only difference I see in add method of CopyOnWriteArrayList is we are creating copy of that array each time add method get called.
No, the lock is not on the entire Collection object. As stated above it is a ReentrantLock and it is different from the intrinsic object lock.
The add method will always create a copy of the existing array and do the modification on the copy and then finally update the volatile reference of the array to point to this new array. And that's why we have the name "CopyOnWriteArrayList" - makes copy when you write into it.. This also avoids the ConcurrentModificationException

1) get and other read operation on CopyOnWriteArrayList are not synchronized.
2) CopyOnWriteArrayList's iterator never throws ConcurrentModificationException while Collections.synchronizedList's iterator may throw it.

Related

Synchronizing ArrayList

In ArrayList api we have:
Note that this implementation is not synchronized. If multiple threads
access an ArrayList instance concurrently, and at least one of the
threads modifies the list structurally, it must be synchronized
externally. (A structural modification is any operation that adds or
deletes one or more elements, or explicitly resizes the backing array;
merely setting the value of an element is not a structural
modification.) This is typically accomplished by synchronizing on some
object that naturally encapsulates the list. If no such object exists,
the list should be "wrapped" using the Collections.synchronizedList
method.
Here what is meant by "This is typically accomplished by synchronizing on some object that naturally encapsulates the list"? How this related to concurrent modification exception?
from ArrayList
This is typically accomplished by synchronizing on some object that naturally encapsulates the list. If no such object exists, the
list should be "wrapped" using the Collections.synchronizedList
method. This is best done at creation time, to prevent accidental
unsynchronized access to the list:
List list = Collections.synchronizedList(new ArrayList(...));
By "naturally encapsulates" it means that if the list is a field of an object, but the list is not publically accessible suppose the following:
class ParkingLot{
private ArrayList<Cars> spots;
public boolean park(int spotNumber, Car car){
if( spots.get(spotNumber)==null){
spot.set(spotNumber,car);
return true;
}
return false;
}
}
In this case ParkinLot would encapulate the list spot. If you were to try and call park(), you'd want to synchronize on the ParkingLot object to prevent two threads trying to park a car in the same spot at the same time.
It is related to a ConcurrentModificationException in that it prevents you from changing the list from separate threads simultaneously (by synchronizing), which could leave the list in an inconsistent state (ie Two cars parking at the same time thinking they've successfully parked).
Here what is meant by "This is typically accomplished by synchronizing
on some object that naturally encapsulates the list"
If you have a class which encapsulates an ArrayList then if you synchronize on the wrapper object than the underlying ArrayList will also be synchronized. e.g.
class MyClasss{
private final ArrayList list;
......
......
......
}
If you synchronize on the instance of MyClass than the underlying list is also synchronized and the all the read / write will be serialized e.g.
class MyClasss{
private final ArrayList list;
......
......
......
public void fun(){
synchronized(this){
list.add(....)
}
}
How this related to concurrent modification exception?
Before answering the above question you need to understand how ConcurrentModificationException is thrown by JVM.
ConcurrentModificationException is implemented in java by checking the modification count of each Collection. Every time you do an operation it compares the modification count before doing the operation and after doing the operation.
So if you synchronize the Collection then simultaneous modification of the modification count will not happen resulting in not throwing the ConcurrentModificationException.
Hope it helps.
Why not use Vector instead? It's already synchronized.

Need to manually synchronize the Synchronized list while iteration when it could be avoided?

My question is about synchronizedList method Collections Class.
Javadocs say:
It is imperative that the user manually synchronize on the returned list when iterating over it:
List list = Collections.synchronizedList(new ArrayList());
...
synchronized(list) {
Iterator i = list.iterator(); // Must be in synchronized block
while (i.hasNext())
foo(i.next());
}
Though manually synchroniziation is not required for other methods. I looked into the source code of Collections class
and found shyncronization has already been taken care for all methods like add
public boolean add(E e) {
synchronized(list) {return c.add(e);}
}
but not for iterator method. I think iterator method could have also handled synchronization in the same fashion
as above method (it would have avoided the extra work i.e manual synchronization for programmers). i am sure there
must be some concrete reason behind it but i am missing it?
public Iterator<E> iterator() {
return c.iterator(); // Must be manually synched by user!
}
A way to avoid manual synchronization from Programmer
public Iterator<E> iterator() {
synchronized(list) {
return c.iterator(); // No need to manually synched by user!
}
}
I think iterator method could have also handled synchronization in the same fashion as above method
No, it absolutely couldn't.
The iterator has no control over what your code does between calls to the individual methods on it. That's the point. Your iteration code will call hasNext() and next() repeatedly, and synchronization during those calls is feasible but irrelevant - what's important is that no other code tries to modify the list across the whole time you're iterating.
So imagine a timeline of:
t = 0: call iterator()
t = 1: call hasNext()
t = 2: call next()
// Do lots of work with the returned item
t = 10: call hasNext()
The iterator can't synchronize between the end of the call to next() at t=2 and the call to hasNext() at t=10. So if another thread tries to (say) add an item to the list at t=7, how is the iterator meant to stop it from doing so?
This is the overall problem with synchronized collections: each individual operation is synchronized, whereas typically you want a whole chunky operation to be synchronized.
If you don't synchronize the entire iteration, another thread could modify the collection as you iterate, leading to a ConccurentModificationException.
Also, the returned iterator is not thread-safe.
They could fix that by wrapping the iterator in a SynchronizedIterator that locks every method in the iterator, but that wouldn't help either – another thread could still modify the collection between two iterations, and break everything.
This is one of the reasons that the Collections.synchronized*() methods are completely useless.
For more information about proper thread-safe collection usage, see my blog.
If you want to avoid manual synchronization, you have to use a Collection like java.util.concurrent.CopyOnWriteArrayList. Every time an object is added to the list, the underlying datastructure is copyied to avaoid a concurrent modification exception.
The reason why you need manual serialization on the Iterator in your example is that the Iterator uses the same internal datastructure as the list but they are independend objects and both Iterator and list can be accessed by different threads at any arbitrary moment in time.
Another aproach would be to make a local copy of the list and iterate over the copy.

Necessary to synchronize a concurrent HashMap when calling values()?

In the following code:
private final Map<A, B> entriesMap = Collections
.synchronizedMap(new HashMap<A, B>());
// ...
List<B> entries = new ArrayList<>(this.entriesMap.values());
If entriesMap is being accessed/modified by multiple threads in other methods, is it necessary to synchronize on entriesMap? In other words:
List<B> entries;
synchronize (this.entriesMap) {
entries = new ArrayList<>(this.entriesMap.values());
}
If I am correct, values() is not an atomic operation, unlike put() and get(), right?
Thanks!
The problem is that even if values() itself were atomic, the act of iterating over it is isn't. The ArrayList constructor can't take a copy of the values in an atomic way - and the iterator will be invalidated if another thread changes the map while it's copying them.
Well, calling values might well be an atomic operation, but the Collection it returns is not a snapshot copy, but backed by the underlying Map, so it will croak when there are concurrent modifications to the Map as you iterate it afterwards (when copying it into the ArrayList).
Note that this (ConcurrentModificationException) also happens when there is just one thread, as long as you iterate the values and modify the Map in an interleaved fashion, so this is not really a problem of thread synchronization.
Further note that there is ConcurrentHashMap, which does provide for a snapshot iterator, that you can iterate while modifying the Map (modifications are not reflected in the iterator). But even with ConcurrentHashMap, the Collection of values() is not a snapshot, it works just like for the normal HashMap.
Yes, you are right.
When put the values of the map in the ArrayList, an iteration is performed on the values. So you need the synchronized block. See this page.
Collections.synchronizedMap() guarantees that each atomic operation you want to run on the map will be synchronized. and values() is also atomic , but when you are putting values to ArrayList that will not be synchronized in this case you need to synchronized the list also.

Why does it.next() throw java.util.ConcurrentModificationException?

final Multimap<Term, BooleanClause> terms = getTerms(bq);
for (Term t : terms.keySet()) {
Collection<BooleanClause> C = new HashSet(terms.get(t));
if (!C.isEmpty()) {
for (Iterator<BooleanClause> it = C.iterator(); it.hasNext();) {
BooleanClause c = it.next();
if(c.isSomething()) C.remove(c);
}
}
}
Not a SSCCE, but can you pick up the smell?
The Iterator for the HashSet class is a fail-fast iterator. From the documentation of the HashSet class:
The iterators returned by this class's iterator method are fail-fast:
if the set is modified at any time after the iterator is created, in
any way except through the iterator's own remove method, the Iterator
throws a ConcurrentModificationException. Thus, in the face of
concurrent modification, the iterator fails quickly and cleanly,
rather than risking arbitrary, non-deterministic behavior at an
undetermined time in the future.
Note that the fail-fast behavior of an iterator cannot be guaranteed
as it is, generally speaking, impossible to make any hard guarantees
in the presence of unsynchronized concurrent modification. Fail-fast
iterators throw ConcurrentModificationException on a best-effort
basis. Therefore, it would be wrong to write a program that depended
on this exception for its correctness: the fail-fast behavior of
iterators should be used only to detect bugs.
Note the last sentence - the fact that you are catching a ConcurrentModificationException implies that another thread is modifying the collection. The same Javadoc API page also states:
If multiple threads access a hash set concurrently, and at least one
of the threads modifies the set, it must be synchronized externally.
This is typically accomplished by synchronizing on some object that
naturally encapsulates the set. If no such object exists, the set
should be "wrapped" using the Collections.synchronizedSet method. This
is best done at creation time, to prevent accidental unsynchronized
access to the set:
Set s = Collections.synchronizedSet(new HashSet(...));
I believe the references to the Javadoc are self explanatory in what ought to be done next.
Additionally, in your case, I do not see why you are not using the ImmutableSet, instead of creating a HashSet on the terms object (which could possibly be modified in the interim; I cannot see the implementation of the getTerms method, but I have a hunch that the underlying keyset is being modified). Creating a immutable set will allow the current thread to have it's own defensive copy of the original key-set.
Note, that although a ConcurrentModificationException can be prevented by using a synchronized Set (as noted in the Java API documentation), it is a prerequisite that all threads access the synchronized collection and not the backing collection directly (which might be untrue in your case as the HashSet is probably created in one thread, while the underlying collection for the MultiMap is modified by other threads). The synchronized collection classes actually maintain an internal mutex for threads to acquire access to; since you cannot access the mutex directly from other threads (and it would be quite ridiculous to do so here), you ought to look at using a defensive copy of either the keyset or of the MultiMap itself using the unmodifiableMultimap method of the MultiMaps class (you'll need to return an unmodifiable MultiMap from the getTerms method). You could also investigate the necessity of returning a synchronized MultiMap, but then again, you'll need to ensure that the mutex must be acquired by any thread to protect the underlying collection from concurrent modifications.
Note, I have deliberately omitted mentioning the use of a thread-safe HashSet for the sole reason that I'm unsure of whether concurrent access to the actual collection will be ensured; it most likely will not be the case.
Edit: ConcurrentModificationExceptions thrown on Iterator.next in a single-threaded scenario
This is with respect to the statement: if(c.isSomething()) C.remove(c); that was introduced in the edited question.
Invoking Collection.remove changes the nature of the question, for it now becomes possible to have ConcurrentModificationExceptions thrown even in a single-threaded scenario.
The possibility arises out of the use of the method itself, in conjunction with the use of the Collection's iterator, in this case the variable it that was initialized using the statement : Iterator<BooleanClause> it = C.iterator();.
The Iterator it that iterates over Collection C stores state pertinent to the current state of the Collection. In this particular case (assuming a Sun/Oracle JRE), a KeyIterator (an internal inner class of the HashMap class that is used by the HashSet) is used to iterate through the Collection. A particular characteristic of this Iterator is that it tracks the number of structural modifications performed on the Collection (the HashMap in this case) via it's Iterator.remove method.
When you invoke remove on the Collection directly, and then follow it up with an invocation of Iterator.next, the iterator throws a ConcurrentModificationException, as Iterator.next verifies whether any structural modifications of the Collection have occurred that the Iterator is unaware of. In this case, Collection.remove causes a structural modification, that is tracked by the Collection, but not by the Iterator.
To overcome this part of the problem, you must invoke Iterator.remove and not Collection.remove, for this ensures that the Iterator is now aware of the modification to the Collection. The Iterator in this case, will track the structural modification occurring through the remove method. Your code should therefore look like the following:
final Multimap<Term, BooleanClause> terms = getTerms(bq);
for (Term t : terms.keySet()) {
Collection<BooleanClause> C = new HashSet(terms.get(t));
if (!C.isEmpty()) {
for (Iterator<BooleanClause> it = C.iterator(); it.hasNext();) {
BooleanClause c = it.next();
if(c.isSomething()) it.remove(); // <-- invoke remove on the Iterator. Removes the element returned by it.next.
}
}
}
The reason is that you are trying to modify the collection outside iterator.
How it works :
When you create an iterator the collection maintains a modificationNum-variable for both the collection and the iterator independently.
1. The variable for collection is being incremented for each change made to the collection and and iterator.
2. The variable for iterator is being incremented for each change made to the iterator.
So when you call it.remove() through iterator that increases the value of both the modification-number-variable by 1.
But again when you call collection.remove() on collection directly, that increments only the value of the modification-numbervariable for the collection, but not the variable for the iterator.
And rule is : whenever the modification-number value for the iterator does not match with the original collection modification-number value, it gives ConcurrentModificationException.
Vineet Reynolds has explained in great details the reasons why collections throw a ConcurrentModificationException (thread-safety, concurrency). Swagatika has explained in great details the implementation details of this mechanism (how collection and iterator keep count of the number of modifications).
Their answers were interesting, and I upvoted them. But, in your case, the problem does not come from concurrency (you have only one thread), and implementation details, while interesting, should not be considered here.
You should only consider this part of the HashSet javadoc:
The iterators returned by this class's iterator method are fail-fast:
if the set is modified at any time after the iterator is created, in
any way except through the iterator's own remove method, the Iterator
throws a ConcurrentModificationException. Thus, in the face of
concurrent modification, the iterator fails quickly and cleanly,
rather than risking arbitrary, non-deterministic behavior at an
undetermined time in the future.
In your code, you iterate over your HashSet using its iterator, but you use the HashSet's own remove method to remove elements ( C.remove(c) ), which causes the ConcurrentModificationException. Instead, as explained in the javadoc, you should use the Iterator's own remove() method, which removes the element being currently iterated from the underlying collection.
Replace
if(c.isSomething()) C.remove(c);
with
if(c.isSomething()) it.remove();
If you want to use a more functional approach, you could create a Predicate and use Guava's Iterables.removeIf() method on the HashSet:
Predicate<BooleanClause> ignoredBooleanClausePredicate = ...;
Multimap<Term, BooleanClause> terms = getTerms(bq);
for (Term term : terms.keySet()) {
Collection<BooleanClause> booleanClauses = Sets.newHashSet(terms.get(term));
Iterables.removeIf(booleanClauses, ignoredBooleanClausePredicate);
}
PS: note that in both cases, this will only remove elements from the temporary HashSet. The Multimap won't be modified.

java - general synchronizedList question

I have a general question regarding synchronized List.
Lets say that in the constructor I am createing a list
List synchronizedList = Collections.synchronizedList(list);
and I have one method adds an object to the list.
public void add(String s){
synchronizedList.add(s)
}
There is another thread that checks every few seconds if there are a few rows , dump it to a file and deletes them all.
Now lets say I iterate each row and save it to the db.
after all iteration I clear the list.
How does the multithread support help me?
I could add an element to the list just before the clear() in the other thread occurs .
Unless I manage the lock myself (which I dont realy need a synched list for that ) it myself.
The synchronized list returned by Collections won't help in your case. It's only good if you need to guarantee serial access to individual method calls. If you need to synchronize around a larger set of operations, then you need to manually wrap that code in a synchronized block. The Javadoc states:
It is imperative that the user manually synchronize on the returned list when iterating over it.
If your list is used elsewhere you can at least safeguard it from individual method calls that would otherwise not be thread-safe. If you're entirely managing the list however, you can just add a synchronized block to your add method and use the same lock that you'll use when iterating over it.
synchronizedList indeed only guarantees that every method call on the list is synchronized. If you need multiple operations to be done in a synchronized way, you have to handle the synchronization yourself.
BTW, this is explicitely said in the javadoc for Collections.synchronizedList :
It is imperative that the user
manually synchronize on the returned
list when iterating over it:
List list = Collections.synchronizedList(new ArrayList());
...
synchronized(list) {
Iterator i = list.iterator(); // Must be in synchronized block
while (i.hasNext())
foo(i.next());
}
synchronized list means that all the operations on that list are guaranteed to be atomic. The scenario you describe requires to have some locking outside the list. Consider semaphores or making synchronized block to implement monitors. Take a look at java.util.concurrent.

Categories

Resources