Synchronizing ArrayList - java

In the ArrayList API documentation we have:
Note that this implementation is not synchronized. If multiple threads
access an ArrayList instance concurrently, and at least one of the
threads modifies the list structurally, it must be synchronized
externally. (A structural modification is any operation that adds or
deletes one or more elements, or explicitly resizes the backing array;
merely setting the value of an element is not a structural
modification.) This is typically accomplished by synchronizing on some
object that naturally encapsulates the list. If no such object exists,
the list should be "wrapped" using the Collections.synchronizedList
method.
What is meant here by "This is typically accomplished by synchronizing on some object that naturally encapsulates the list"? How is this related to ConcurrentModificationException?

from ArrayList
This is typically accomplished by synchronizing on some object that naturally encapsulates the list. If no such object exists, the
list should be "wrapped" using the Collections.synchronizedList
method. This is best done at creation time, to prevent accidental
unsynchronized access to the list:
List list = Collections.synchronizedList(new ArrayList(...));
By "naturally encapsulates" it means that the list is a field of an object and is not publicly accessible. Suppose the following:
class ParkingLot {
    private ArrayList<Car> spots;
    public boolean park(int spotNumber, Car car) {
        if (spots.get(spotNumber) == null) {
            spots.set(spotNumber, car);
            return true;
        }
        return false;
    }
}
In this case ParkingLot would encapsulate the list spots. If you were to call park(), you'd want to synchronize on the ParkingLot object to prevent two threads from trying to park a car in the same spot at the same time.
It is related to a ConcurrentModificationException in that synchronizing prevents you from changing the list from separate threads simultaneously, which could leave the list in an inconsistent state (i.e. two cars parking at the same time, each thinking it has successfully parked).
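A minimal sketch of what synchronizing on the encapsulating object could look like for the parking lot above. The Car class, the fixed-size constructor, and the occupant() accessor are assumptions added for illustration; they are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the Car type used in the answer.
class Car {
    final String plate;
    Car(String plate) { this.plate = plate; }
}

class SynchronizedParkingLot {
    // The list never escapes this class, so the lot "naturally encapsulates" it.
    private final List<Car> spots;

    SynchronizedParkingLot(int size) {
        // Pre-fill with nulls so every spot index is valid and empty.
        spots = new ArrayList<>(Collections.nCopies(size, (Car) null));
    }

    // synchronized locks on this lot, making the check and the set one atomic step.
    synchronized boolean park(int spotNumber, Car car) {
        if (spots.get(spotNumber) == null) {
            spots.set(spotNumber, car);
            return true;
        }
        return false;
    }

    // Reads take the same lock, so they always see a consistent list.
    synchronized Car occupant(int spotNumber) {
        return spots.get(spotNumber);
    }
}
```

Because both methods lock on the same ParkingLot instance, two threads calling park() for the same spot cannot both see it as empty.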

Here what is meant by "This is typically accomplished by synchronizing
on some object that naturally encapsulates the list"
If you have a class that encapsulates an ArrayList, then synchronizing on the wrapper object also guards the underlying ArrayList, provided every access to the list goes through that lock. e.g.
class MyClass {
    private final ArrayList list;
    ......
    ......
    ......
}
If you synchronize on the instance of MyClass, then access to the underlying list is also synchronized and all reads and writes are serialized, e.g.
class MyClass {
    private final ArrayList list;
    ......
    ......
    ......
    public void fun() {
        synchronized (this) {
            list.add(....)
        }
    }
}
How is this related to ConcurrentModificationException?
Before answering that question you need to understand how ConcurrentModificationException gets thrown.
Java's fail-fast collections such as ArrayList keep a modification count. An iterator records that count when it is created and compares it with the list's current count on each step; if the two differ, the list was structurally modified behind the iterator's back and the iterator throws ConcurrentModificationException.
So if every thread holds the same lock while it iterates over or modifies the collection, no modification can happen in the middle of an iteration, and the ConcurrentModificationException is not thrown.
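The modification-count check can be seen even without threads. The sketch below is only an illustration (the class and method names are made up): adding to an ArrayList in the middle of a for-each loop trips the check, while set(), which is not a structural modification, does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

class ModCountDemo {
    // Returns true if adding during iteration triggers the fail-fast check.
    static boolean addDuringIteration() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3));
        try {
            for (Integer value : list) {   // the iterator records the current modification count
                if (value == 1) {
                    list.add(99);          // structural change bumps the count
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;                   // the following next() call noticed the mismatch
        }
    }

    // set() replaces an element without changing the modification count.
    static boolean setDuringIteration() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3));
        try {
            for (Integer value : list) {
                list.set(0, value * 10);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }
}
```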
Hope it helps.

Why not use Vector instead? It's already synchronized.

Related

Synchronization on Collections.unmodifiableList

I have a question. I think I know the answer, but for some reason I prefer to ask about it here.
So here is the scenario:
I have an object which has a list as a field. Then I have a method that returns the list as an unmodifiableList.
The object's class has other methods that add elements to the list.
So let's imagine a case where one thread is iterating through the unmodifiable list while another thread is adding elements to the list using the object's methods.
How do I make this thread-safe? If I synchronize on the unmodifiableList and on the list itself, will that make it thread-safe? After all, they are two different objects, where the unmodifiableList has a field which is the naked list itself.
You need to make the "naked" list synchronized:
private List<Foo> list = Collections.synchronizedList(new ArrayList<Foo>());
But beware: that will only make sure the list's internal state is coherent. As soon as you iterate over the list, you can't prevent a modification from happening between two calls to the list iterator. So nothing will prevent a ConcurrentModificationException from happening in that case. To prevent that, you should not return any reference (even an indirect one) to the list. All modifications and iterations of the list should be encapsulated in your class, and properly synchronized.
You can return an unmodifiable copy of the original list to the caller.
The disadvantage is that the caller may end up with a "stale" version of the list. However, this way you get safe iteration. In a concurrent world, it is OK to return the last successfully updated data to the caller.
public List<Thing> getThings() {
List<Thing> copyOfThings = new ArrayList<>();
copyOfThings.addAll(_things); //original list items.
return Collections.unmodifiableList(copyOfThings);
}
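The copy itself also has to be taken under the same lock that guards writes, otherwise the copy could run while another thread is in the middle of an add. A rough sketch of that idea, assuming a hypothetical ThingStore class that holds String elements:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ThingStore {
    // The mutable list never leaves this class.
    private final List<String> things = new ArrayList<>();

    synchronized void addThing(String thing) {
        things.add(thing);
    }

    // Copy under the same lock as addThing, so the copy never overlaps with an add.
    synchronized List<String> getThings() {
        return Collections.unmodifiableList(new ArrayList<>(things));
    }
}
```

A caller can iterate over the returned list for as long as it likes, even while other threads keep adding; it simply will not see the later additions.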
There are a couple of ways you could do this:
Return a copy of the list, rather than an unmodifiable view of it
Rather than using the iterator, use List.get(int)

Difference between CopyOnWriteArrayList and synchronizedList

As per my understanding concurrent collection classes preferred over synchronized collections because the concurrent collection classes don't take a lock on the complete collection object. Instead they take locks on a small segment of the collection object.
But when I checked the add method of CopyOnWriteArrayList, we are acquiring a lock on complete collection object. Then how come CopyOnWriteArrayList is better than a list returned by Collections.synchronizedList? The only difference I see in the add method of CopyOnWriteArrayList is that we are creating copy of that array each time the add method is called.
public boolean add(E e) {
final ReentrantLock lock = this.lock;
lock.lock();
try {
Object[] elements = getArray();
int len = elements.length;
Object[] newElements = Arrays.copyOf(elements, len + 1);
newElements[len] = e;
setArray(newElements);
return true;
} finally {
lock.unlock();
}
}
As per my understanding concurrent collection classes preferred over synchronized collection because concurrent collection classes don't take lock on complete collection object. Instead it takes lock on small segment of collection object.
This is true for some collections but not all. A map returned by Collections.synchronizedMap locks the entire map around every operation, whereas ConcurrentHashMap locks only one hash bucket for some operations, or it might use a non-blocking algorithm for others.
For other collections, the algorithms in use, and thus the tradeoffs, are different. This is particularly true of lists returned by Collections.synchronizedList compared to CopyOnWriteArrayList. As you noted, both synchronizedList and CopyOnWriteArrayList take a lock on the entire array during write operations. So why are they different?
The difference emerges if you look at other operations, such as iterating over every element of the collection. The documentation for Collections.synchronizedList says,
It is imperative that the user manually synchronize on the returned list when iterating over it:
List list = Collections.synchronizedList(new ArrayList());
...
synchronized (list) {
Iterator i = list.iterator(); // Must be in synchronized block
while (i.hasNext())
foo(i.next());
}
Failure to follow this advice may result in non-deterministic behavior.
In other words, iterating over a synchronizedList is not thread-safe unless you do locking manually. Note that when using this technique, all operations by other threads on this list, including iterations, gets, sets, adds, and removals, are blocked. Only one thread at a time can do anything with this collection.
By contrast, the doc for CopyOnWriteArrayList says,
The "snapshot" style iterator method uses a reference to the state of the array at the point that the iterator was created. This array never changes during the lifetime of the iterator, so interference is impossible and the iterator is guaranteed not to throw ConcurrentModificationException. The iterator will not reflect additions, removals, or changes to the list since the iterator was created.
Operations by other threads on this list can proceed concurrently, but the iteration isn't affected by changes made by any other threads. So, even though write operations lock the entire list, CopyOnWriteArrayList still can provide higher throughput than an ordinary synchronizedList. (Provided that there is a high proportion of reads and traversals to writes.)
For a write (add) operation, CopyOnWriteArrayList acquires a ReentrantLock, builds a new copy of the data, and only then updates the underlying volatile array reference via setArray. (Any read of the list that happens before setArray runs returns the old data, from before the add.) Moreover, CopyOnWriteArrayList provides a snapshot iterator that doesn't throw ConcurrentModificationException when the list is written to.
But when I checked add method of CopyOnWriteArrayList.class, we are acquiring lock on complete collection object. Then how come CopyOnWriteArrayList is better than synchronizedList. The only difference I see in add method of CopyOnWriteArrayList is we are creating copy of that array each time add method get called.
No, the lock is not the collection object's intrinsic lock. As stated above, it is a separate ReentrantLock, and it is taken only by write operations; reads never acquire it.
The add method will always create a copy of the existing array, do the modification on the copy, and then finally update the volatile reference of the array to point to this new array. That's why we have the name "CopyOnWriteArrayList": it makes a copy when you write into it. This also avoids the ConcurrentModificationException.
1) get and other read operation on CopyOnWriteArrayList are not synchronized.
2) CopyOnWriteArrayList's iterator never throws ConcurrentModificationException while Collections.synchronizedList's iterator may throw it.
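The snapshot behaviour is easy to observe in a single thread. In this small illustration (the class and method names are made up), the iterator is created before an add and therefore never sees the new element, and no exception is thrown:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class SnapshotIteratorDemo {
    // Returns what an iterator sees when the list is modified after the iterator was created.
    static List<String> iterateWhileAdding() {
        List<String> list = new CopyOnWriteArrayList<>(Arrays.asList("a", "b"));
        Iterator<String> it = list.iterator();   // pins the current backing array
        list.add("c");                           // the write goes to a fresh copy of the array
        List<String> seen = new ArrayList<>();
        while (it.hasNext()) {
            seen.add(it.next());                 // no ConcurrentModificationException here
        }
        return seen;
    }
}
```

Doing the same with an ArrayList or a Collections.synchronizedList wrapper would throw ConcurrentModificationException on the first next() after the add.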

Java concurrency with a Map of Lists

I have a java class that is accessed by a lot of threads at once and want to make sure it is thread safe. The class has one private field, which is a Map of Strings to Lists of Strings. I've implemented the Map as a ConcurrentHashMap to ensure gets and puts are thread safe:
public class ListStore {
private Map<String, List<String>> innerListStore;
public ListStore() {
innerListStore = new ConcurrentHashMap<String, List<String>>();
}
...
}
So given that gets and puts to the Map are thread safe, my concern is with the lists that are stored in the Map. For instance, consider the following method that checks if a given entry exists in a given list in the store (I've omitted error checking for brevity):
public boolean listEntryExists(String listName, String listEntry) {
List<String> listToSearch = innerListStore.get(listName);
for (String entryName : listToSearch) {
if(entryName.equals(listEntry)) {
return true;
}
}
return false;
}
It would seem that I need to synchronize the entire contents of this method because if another method changed the contents of the list at innerListStore.get(listName) while this method is iterating over it, a ConcurrentModificationException would be thrown.
Is that correct and if so, do I synchronize on innerListStore or would synchronizing on the local listToSearch variable work?
UPDATE: Thanks for the responses. It sounds like I can synchronize on the list itself. For more information, here is the add() method, which can be running at the same time the listEntryExists() method is running in another thread:
public void add(String listName, String entryName) {
List<String> addTo = innerListStore.get(listName);
if (addTo == null) {
addTo = Collections.synchronizedList(new ArrayList<String>());
List<String> added = innerListStore.putIfAbsent(listName, addTo);
if (added != null) {
addTo = added;
}
}
addTo.add(entryName);
}
If this is the only method that modifies the underlying lists stored in the map and no public methods return references to the map or entries in the map, can I synchronize iteration on the lists themselves and is this implementation of add() sufficient?
You can synchronize on listToSearch ("synchronized(listToSearch) {...}"). Make sure that there is no race condition creating the lists (use innerListStore.putIfAbsent to create them).
You could synchronize on just listToSearch, there's no reason to lock the entire map any time anyone is using just one entry.
Just remember though, that you need to synchronize on the list everywhere it is modified! Synchronizing the iterator doesn't automagically block other people from doing an add() or whatnot if you passed out to them references to the unsynchronized list.
It would be safest to just store synchronized lists in the Map and then lock on them when you iterate, and also document, when you return a reference to the list, that the user must synchronize on it if they iterate. Synchronization is pretty cheap in modern JVMs when no actual contention is happening. Of course, if you never let a reference to one of the lists escape your class, you can handle it internally with a finer comb.
Alternately you can use a threadsafe list such as CopyOnWriteArrayList that uses snapshot iterators. What kind of point in time consistency you need is a design decision we can't make for you. The javadoc also includes a helpful discussion of performance characteristics.
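Putting the suggestions above together, one possible shape for the class is sketched below. It is only an illustration built from the question's own methods; the class name SynchronizedListStore and the null check in listEntryExists are additions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class SynchronizedListStore {
    private final ConcurrentMap<String, List<String>> innerListStore = new ConcurrentHashMap<>();

    void add(String listName, String entryName) {
        List<String> addTo = innerListStore.get(listName);
        if (addTo == null) {
            List<String> fresh = Collections.synchronizedList(new ArrayList<String>());
            // putIfAbsent returns the list another thread installed first, or null if ours won.
            List<String> existing = innerListStore.putIfAbsent(listName, fresh);
            addTo = (existing != null) ? existing : fresh;
        }
        addTo.add(entryName);   // a single call; the wrapper locks on itself
    }

    boolean listEntryExists(String listName, String listEntry) {
        List<String> listToSearch = innerListStore.get(listName);
        if (listToSearch == null) {
            return false;
        }
        // Hold the same lock the wrapper uses, so no add() can interleave with the loop.
        synchronized (listToSearch) {
            for (String entryName : listToSearch) {
                if (entryName.equals(listEntry)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Only the one list being searched is locked, so threads working on other entries of the map are not blocked.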
It would seem that I need to synchronize the entire contents of this method because if another method changed the contents of the list at innerListStore.get(listName) while this method is iterating over it, a ConcurrentModificationException would be thrown.
Are other threads accessing the List itself, or only though operations exposed by ListStore?
Will operations invoked by other threads result in the contents of a List stored in the Map being changed? Or will entries only be added/removed from the Map?
You would only need to synchronize access to the List stored within the Map if different threads can result in changes to the same List instances. If the threads are only allowed to add/remove List instances from the Map (i.e. change the structure of the Map), then synchronization is not necessary.
If the lists stored in the map are of a type that doesn't throw CME (CopyOnWriteArrayList, for example), you can iterate at will.
This can introduce some races, though, if you're not careful.
If the Map is already thread-safe, then I think synchronizing on listToSearch should work. I'm not 100% sure, but I think it should work
synchronized(listToSearch)
{
}
You could use another abstraction from Guava
Note that this will synchronize on the whole map, so it might be not that useful for you.
As you haven't provided any client for the map of lists apart from the boolean listEntryExists(String listName, String listEntry) method, I wonder why you are storing lists at all? This structure seems to be more naturally a Map<String, Set<String>> and the listEntryExists should use the contains method (available on List as well, but O(n) to the size of the list):
public boolean listEntryExists(String name, String entry) {
    Set<String> set = map.get(name);
    return (set == null) ? false : set.contains(entry);
}
Now, the contains call can encapsulate whatever internal concurrency protocol you want it to.
For the add you can either use a synchronized wrapper (simple, but maybe slow) or if writes are infrequent compared to reads, utilise ConcurrentMap.replace to implement your own copy-on-write strategy. For instance, using Guava ImmutableSet:
public boolean add(String name, String entry) {
    while (true) {
        Set<String> set = map.get(name);
        if (set == null) {
            // putIfAbsent returns null when no mapping existed, i.e. our set was installed
            if (map.putIfAbsent(name, ImmutableSet.of(entry)) == null)
                return true;
            continue; // another thread got there first, retry
        }
        if (set.contains(entry))
            return false; // no need to change, already exists
        Set<String> newSet = ImmutableSet.copyOf(Iterables.concat(set, ImmutableSet.of(entry)));
        if (map.replace(name, set, newSet))
            return true;
    }
}
This is now an entirely thread-safe lock-free structure, where concurrent readers and writers will not block each other (modulo the lock-freeness of the underlying ConcurrentMap implementation). This implementation does have an O(n) in its write, where your original implementation was O(n) in the read. Again, if you are read-mostly rather than write-mostly this could be a big win.
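If Guava is not available, roughly the same copy-on-write strategy can be written with JDK classes only. This is a sketch under that assumption, not the answer's own code: the class name is made up, and Collections.singleton and Collections.unmodifiableSet stand in for ImmutableSet.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class CopyOnWriteSetStore {
    private final ConcurrentMap<String, Set<String>> map = new ConcurrentHashMap<>();

    boolean listEntryExists(String name, String entry) {
        Set<String> set = map.get(name);
        return set != null && set.contains(entry);
    }

    boolean add(String name, String entry) {
        while (true) {
            Set<String> set = map.get(name);
            if (set == null) {
                // putIfAbsent returns null when our value was installed.
                if (map.putIfAbsent(name, Collections.singleton(entry)) == null) {
                    return true;
                }
                continue;   // another thread created the set first; retry
            }
            if (set.contains(entry)) {
                return false;   // already present, nothing to change
            }
            Set<String> copy = new HashSet<>(set);
            copy.add(entry);
            // replace succeeds only if the mapping still equals the set we copied from.
            if (map.replace(name, set, Collections.unmodifiableSet(copy))) {
                return true;
            }
        }
    }
}
```

The sets stored in the map are never mutated after they are published, which is what lets readers call contains() without any locking.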

java - general synchronizedList question

I have a general question regarding synchronized List.
Let's say that in the constructor I am creating a list
List synchronizedList = Collections.synchronizedList(list);
and I have one method that adds an object to the list.
public void add(String s){
synchronizedList.add(s);
}
There is another thread that checks every few seconds whether there are a few rows, dumps them to a file, and deletes them all.
Now let's say I iterate over each row and save it to the db.
After the whole iteration I clear the list.
How does the multithread support help me?
I could add an element to the list just before the clear() in the other thread occurs.
Unless I manage the lock myself (in which case I don't really need a synchronized list).
The synchronized list returned by Collections won't help in your case. It's only good if you need to guarantee serial access to individual method calls. If you need to synchronize around a larger set of operations, then you need to manually wrap that code in a synchronized block. The Javadoc states:
It is imperative that the user manually synchronize on the returned list when iterating over it.
If your list is used elsewhere you can at least safeguard it from individual method calls that would otherwise not be thread-safe. If you're entirely managing the list however, you can just add a synchronized block to your add method and use the same lock that you'll use when iterating over it.
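For the dump-and-clear scenario in the question, that could look roughly like the sketch below. The RowBuffer class and its drain() method are hypothetical names; the point is that the copy and the clear() happen inside one synchronized block on the list, so no add() can land between them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class RowBuffer {
    private final List<String> rows = Collections.synchronizedList(new ArrayList<String>());

    // A single call; the synchronized wrapper makes it safe on its own.
    void add(String row) {
        rows.add(row);
    }

    // Copy and clear under one lock so no row can slip in between the two steps.
    List<String> drain() {
        synchronized (rows) {
            List<String> batch = new ArrayList<>(rows);
            rows.clear();
            return batch;
        }
    }
}
```

The background thread can then write the returned batch to the file or database at its own pace, without holding the lock while it does the slow I/O.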
synchronizedList indeed only guarantees that every method call on the list is synchronized. If you need multiple operations to be done in a synchronized way, you have to handle the synchronization yourself.
BTW, this is explicitly said in the javadoc for Collections.synchronizedList:
It is imperative that the user
manually synchronize on the returned
list when iterating over it:
List list = Collections.synchronizedList(new ArrayList());
...
synchronized(list) {
Iterator i = list.iterator(); // Must be in synchronized block
while (i.hasNext())
foo(i.next());
}
A synchronized list means that each individual operation on that list is guaranteed to be atomic. The scenario you describe requires some locking outside the list. Consider semaphores, or synchronized blocks to implement monitors. Take a look at java.util.concurrent.

If all collection attributes are thread-safe, can we say that this collection is thread-safe?

If all attributes (or item fields, or data members) of a Java collection are thread-safe (CopyOnWriteArraySet, ConcurrentHashMap, BlockingQueue, ...), can we say that this collection is thread-safe?
An example:
public class AmIThreadSafe {
    private CopyOnWriteArraySet threadSafeAttribute;
    public void add(Object o) {
        threadSafeAttribute.add(o);
    }
    public void clear() {
        threadSafeAttribute.clear();
    }
}
In this sample, can we say that AmIThreadSafe is thread-safe?
Assuming by "attributes" you mean "what the collection holds", then no. Just because the Collection holds thread-safe items does not mean that the Collection's implementation implements add(), clear(), remove(), etc., in a thread-safe manner.
Short answer: No.
Slightly longer answer: because add() and clear() are not in any way synchronized, and HashSet isn't itself synchronized, it's possible for multiple threads to be in them at the same time.
Edit following comment: Ah. Now the short answer is Yes, sorta. :)
The reason for the "sorta" (American slang meaning partially, btw) is that it's possible for two operations to be atomically safe, but to be unsafe when used in combination to make a compound operation.
In your given example, where only add() and clear() are supported, this can't happen.
But in a more complete class, where we would have more of the Collection interface, imagine a caller who needs to add an entry to the set iff the set has no more than 100 entries already.
This caller would want to write a method something like this:
void addIfNotOverLimit (AmIThreadSafe set, Object o, int limit) {
if (set.size() < limit) // ## thread-safe call 1
set.add(o); // ## thread-safe call 2
}
The problem is that while each call is itself thread-safe, two threads could be in addIfNotOverLimit at once (or, for that matter, adding through another method altogether). Thread A could call size() and get 99, but be interrupted before it calls add(); thread B could then add an entry, and once thread A resumes and adds its own, the set would be over its limit.
Moral? Compound operations make the definition of 'thread safe' more complex.
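One way to make such a compound operation safe is to put the check and the add under a single lock inside the class. The sketch below assumes a hypothetical BoundedSet class; fillConcurrently is only a small stress helper added to show that the limit holds when several threads add at once.

```java
import java.util.HashSet;
import java.util.Set;

class BoundedSet {
    private final Set<Object> items = new HashSet<>();
    private final int limit;

    BoundedSet(int limit) {
        this.limit = limit;
    }

    // The size check and the add share one lock, so they act as a single atomic step.
    synchronized boolean addIfNotOverLimit(Object o) {
        if (items.size() < limit) {
            return items.add(o);
        }
        return false;
    }

    synchronized int size() {
        return items.size();
    }

    // Hypothetical stress helper: each thread tries to add 'perThread' distinct items.
    static int fillConcurrently(int limit, int threads, int perThread) {
        BoundedSet set = new BoundedSet(limit);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    set.addIfNotOverLimit(id + ":" + i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return set.size();
    }
}
```

Because the underlying HashSet is only ever touched while holding the BoundedSet's lock, it does not need to be a thread-safe collection itself.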
No, because the state of an object is the "sum" of all of its attributes.
For instance, you could have 2 thread-safe collections as attributes in your object. Additionally, your object could depend on some sort of correlation between these 2 collections (e.g. if an object is in 1 collection, it is in the other collection, and vice versa). Simply using 2 thread-safe collections will not ensure that that correlation is true at all points in time. You would need additional concurrency control in your object to ensure that this constraint holds across the 2 collections.
Since most non-trivial objects have some type of correlation relationship across their attributes, using thread-safe collections as attributes is not sufficient to make an object thread-safe.
What is thread safety?
Thread safety simply means that the
fields of an object or class always
maintain a valid state, as observed by
other objects and classes, even when
used concurrently by multiple threads.
A thread-safe object is one that
always maintains a valid state, as
observed by other classes and objects,
even in a multithreaded environment.
According to the API documentation, you have to use this function to ensure thread-safety:
synchronizedCollection(Collection c)
Returns a synchronized (thread-safe) collection
backed by the specified collection
Reading that, it is my opinion that you have to use the above function to ensure a thread-safe Collection. However, you do not have to use them for all Collections and there are faster Collections that are thread-safe such as ConcurrentHashMap. The underlying nature of CopyOnWriteArraySet ensures thread-safe operations.
