synchronizedCollection and contains--do I need to synchronize manually? - java

I'm using Collections.synchronizedCollection in Java to protect a Set that I know is getting accessed concurrently by many threads. The Java API warns:
" It is imperative that the user manually synchronize on the returned collection when iterating over it:
Collection c = Collections.synchronizedCollection(myCollection);
...
synchronized(c) {
Iterator i = c.iterator(); // Must be in the synchronized block
while (i.hasNext())
foo(i.next());
}
"
If I use c.contains(obj), is that thread-safe? Internally, obviously, this is iterating over the Collection and seeing if any of the objects in it are equal to obj. My instinct is to assume that this is probably synchronized (it would seem to be a major failing if not), but given previous pains with synchronization, it seems wise to double-check, and a Google search for answers on this hasn't turned up anything.

In itself, a call to contains is safe.
The problem is that one often tests whether a collection contains an element then does something to the collection based on the result.
Most likely, the test and the action should be treated as a single, atomic operation. In that case, a lock on the collection should be obtained, and both operations should be performed in the synchronized block.

Collections.synchronizedCollection() will return a thread safe collection which means
any single method call is thread safe by itself. It depends what you want do. If you want to call couple of methods, java cannot make it thread safe together.

It's safe, because contains itself is synchronized.

Related

Why do we use synchronized collection if it doesn't guarantee the synchronized access on iterators?

For example, in the code below, we have to wrap list in a synchronized block when doing the iteration. Does the Collections.synchronizedList make the list synchronized? Why do we do this if it doesn't provide any convenience? Thanks!
List<Integer> list = Collections.synchronizedList( new ArrayList<>(Arrays.asList(4,3,52)));
synchronized(list) {
for(int data: list)
System.out.print(data+" ");
}
See https://docs.oracle.com/javase/tutorial/collections/implementations/wrapper.html
The reason is that iteration is accomplished via multiple calls into the collection, which must be composed into a single atomic operation.
Also see https://www.baeldung.com/java-synchronized-collections
Why do we do this if it doesn't provide any convenience
That it does not help you when iterating is not the same as providing no convenience.
All of the methods - get, size, set, isEmpty etc - are synchronized. This means that they have visibility of all writes made in any thread.
Without the synchronization, there is no guarantee that updates made in one thread are visible to any other threads, so one thread might see a size of 5 which another sees a size of 6, for example.
The mechanism for making the list synchronized is to make all of its methods synchronized: this effectively means that the body of the method is wrapped in a synchronized (this) { ... } block.
This is still true of the iterator() method: that too is synchronized. But the synchronized block finishes when iterator() returns, not when you finish iterating. It's a fundamental limitation of the way the language is designed.
So you have to help the language by adding the synchronized block yourself.
Wrapper is used to synchronize addition and removal elements from wrapped collection.
JavaDoc mentions that iteration is not synchronized an you need to synchronize it yourself.
* It is imperative that the user manually synchronize on the returned
* list when iterating over it
But other access operations are thread-safe and also establish happens before relation (since they use synchronized).
Collections.synchronizedList method synchronises methods like add, remove. However, it does not synzhronize iterator() method. Consider the following scenario:
Thread 1 is iterating through the list
Thread 2 is adding an element into it
In this case, you will get ConcurrentModificationException and hence, it's imperative to synzhronize the calls to iterator() method.

synchronized ArrayList vs synchronized method block

I have 2 code snippets which will do the same thing which makes thread safe. first one does it using Collections.synchronizedList, Example:
DataServiceRequest request = Collections.synchronizedList(new ArrayList<DataServiceRequest>());
Second one do the same thing by synchronizing the method, Example:
public synchronized void addRequest(DataServiceRequest request) {
this.getRequests().add(request);
}
What would be the most efficient and safest way When comparing with performance from above 2 examples?
The first is really just syntactic sugar for the second (it returns a wrapper list that puts synchronized (mutex) around each call), so it is unlikely to make any difference from a performance point of view.
As for "which is the safest way" - that depends on your coding standards. You must pay attention to the documents for Collections.synchronizedList if you use it, particularly:
it is critical that all access to the backing list is accomplished through the returned list.
and
It is imperative that the user manually synchronize on the returned list when iterating over it
You'll still have the same issue when iterating a list that you control the synchronization of - this is just saying that the mutex in use for synchronizedList is the list itself. If you control the synchronization you just need to consistently use the same mutex for all thread-safe access to the backing list.
Your question might imply that you don't plan to synchronize on all list operations, not just those that change the list. If so, then this would be wrong thinking. But even if not so, using synchronizedList wrapper takes that worry away from your program because it guarantees that all method calls are synchronized.
The one thing that synchronizedList cannot guarantee is synchronization over the block of code which consumes a list iterator. This is still something you'll need to do inside your own synchronized block.

Hashtable: why is get method synchronized?

I known a Hashtable is synchronized, but why its get() method is synchronized?
Is it only a read method?
If the read was not synchronized, then the Hashtable could be modified during the execution of read. New elements could be added, the underlying array could become too small and could be replaced by a bigger one, etc. Without sequential execution, it is difficult to deal with these situations.
However, even if get would not crash when the Hashtable is modified by another thread, there is another important aspect of the synchronized keyword, namely cache synchronization. Let's use a simplified example:
class Flag {
bool value;
bool get() { return value; } // WARNING: not synchronized
synchronized void set(bool value) { this->value = value; }
}
set is synchronized, but get isn't. What happens if two threads A and B simultaneously read and write to this class?
1. A calls read
2. B calls set
3. A calls read
Is it guaranteed at step 3 that A sees the modification of thread B?
No, it isn't, as A could be running on a different core, which uses a separate cache where the old value is still present. Thus, we have to force B to communicate the memory to other core, and force A to fetch the new data.
How can we enforce it? Everytime, a thread enters and leaves a synchronized block, an implicit memory barrier is executed. A memory barrier forces the cache to be updated. However, it is required that both the writer and the reader have to execute the memory barrier. Otherwise, the information is not properly communicated.
In our example, thread B already uses the synchronized method set, so its data modification is communicated at the end of the method. However, A does not see the modified data. The solution is to make get synchronized, so it is forced to get the updated data.
Have a look in Hashtable source code and you can think of lots of race conditions that can cause problem in a unsynchronized get() .
(I am reading JDK6 source code)
For example, a rehash() will create a empty array, and assign it to the instance var table, and put the entries from old table to the new one. Therefore if your get occurs after the empty array assignment, but before actually putting entries in it, you cannot find your key even it is in the table.
Another example is, there is a loop iterate thru the linked list at the table index, if in middle in your iteration, rehash happens. You may also failed to find the entry even it exists in the hashtable.
Hashtable is synchronized meaning the whole class is thread-safe
Inside the Hashtable, not only get() method is synchronized but also many other methods are. And particularly put() method is synchronized like Tom said.
A read method must be synchronized as a write method because it will make sure the visibility and the consistency of the variable.

Manually need to synchronize access to synchronized list/map/set etc

Collection class provides various methods to get thread safe collections . Then why is it necessary to manually synchronize access while iterating ?
Each method is thread safe. If you make multiple calls to a synchronized collection this is not thread safe unless you hold a lock explicitly. Using an Iterator involves making multiple calls to the iterator implicitly so there is no way around this.
What some of the Concurrency Libraries collections do is provide weak consistency. They provide a pragmatic solution which is that an added or removed element may, or may not be seen when Iterating.
A simple example of a thread safe collection used in an unsafe manner.
private final List<String> list = Collections.synchronizedList(
new ArrayList<String>());
list.add("hello");
String hi = list.remove(list.size()-1);
Both add and remove are thread safe and you won't get an error using them individually. The problem is another thread can alter the collection BETWEEN calls (not within calls) causing this code to break in a number of ways.

How to prevent nested synchronized blocks when iterating over a collection

In a multithreaded Java application I need to iterate over a collection of objects. Since both the collection and the objects could be modified by another thread while I iterate over them, I need to use synchronization.
However nested synchronized blocks are not recommended since they could lead to deadlocks. How would I solve this problem?
Collection<Data> dataCollection = something.getDataCollection();
synchronized ( dataCollection ) {
for ( final Data data : dataCollection ) {
synchronized ( data ) {
data.doSomething(); // doSomething() changes object state
}
}
}
I think you can use CopyOnWriteArrayList instead of the outer synchronization.
A thread-safe variant of ArrayList in which all mutative operations (add, set, and so on) are implemented by making a fresh copy of the underlying array.
This is ordinarily too costly, but may be more efficient than alternatives when traversal operations vastly outnumber mutations, and is useful when you cannot or don't want to synchronize traversals, yet need to preclude interference among concurrent threads
You can take a copy of the collection and only lock one object at a time.
Collection<Data> dataCollection = something.getDataCollection();
Collection<Data> copy;
synchronized ( dataCollection ) {
copy = new ArrayList<Data>(dataCollection);
}
for (Data data : copy) {
synchronized ( data ) {
data.doSomething(); // doSomething() changes object state
}
}
Can't believe nobody pointed out that the number one way to avoid synchronizing on the Data object is to have this object itself thread-safe! It's also the correct way of handling synchronization - if you know that your object will be accessed by multiple threads, handle synchronization the way you see fit inside the class, not in the code that may access it. You will also certainly be more efficient because you can limit synchronization to just the critical blocks, use ReadWriteLock, j.u.c.atomic, etc
Nested synchronization can lead to deadlock, but it doesn't have to. One way to avoid deadlocks is to define an order that you synchronize objects and always follow it.
If you always synchronize the dataCollection object before you synchronize the data objects, you won't deadlock.
Take a look at ReentrantReadWriteLock. With this class you can implement a lock that makes it possible for any number of non-modifying (reading) threads to access the shared property simultaneously, but only one modifying (writing) thread to access it at a time (all other readers and writers are blocked until the writing thread releases the write-lock). Remember to test your implementation thorougly, as wrong usage of the locks can still lead to race condition and/or deadlocks.
Whether you use CopyOnWriteArrayList as Bozho said or copy the list before iterating as Peter says should depend on how much you expect the list to be edited compared to iterated over.
Use CopyOnWriteArrayList when you expect the list to be iterated over far more than it is modified.
Use copying the list if you think it will be modified far more than it is iterated over.
These should be the first options because concurrency solutions should be simple unless unavoidable, but if neither situation applies you will need to pick one of the more complicated strategies outlined in the comments here.
Good luck!

Categories

Resources