In a multithreaded Java application I need to iterate over a collection of objects. Since both the collection and the objects could be modified by another thread while I iterate over them, I need to use synchronization.
However, nested synchronized blocks are discouraged because they can lead to deadlocks. How should I solve this problem?
Collection<Data> dataCollection = something.getDataCollection();
synchronized ( dataCollection ) {
for ( final Data data : dataCollection ) {
synchronized ( data ) {
data.doSomething(); // doSomething() changes object state
}
}
}
I think you can use CopyOnWriteArrayList instead of the outer synchronization.
A thread-safe variant of ArrayList in which all mutative operations (add, set, and so on) are implemented by making a fresh copy of the underlying array.
This is ordinarily too costly, but may be more efficient than alternatives when traversal operations vastly outnumber mutations, and is useful when you cannot or don't want to synchronize traversals, yet need to preclude interference among concurrent threads
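A minimal sketch of the snapshot behaviour described in that quote. The class and method names are made up for illustration. The loop appends to the list while iterating over it, which would throw a ConcurrentModificationException with a plain ArrayList; with CopyOnWriteArrayList the iterator keeps reading the array as it was when the loop started.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class CowIterationSketch {

    /** Appends to the list while iterating it; returns how many elements the loop visited. */
    static int visitWhileAppending(List<String> list) {
        int visited = 0;
        for (String s : list) {
            list.add(s + "-copy"); // each add copies the backing array; the iterator keeps its old snapshot
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        List<String> list = new CopyOnWriteArrayList<>(Arrays.asList("a", "b", "c"));
        int visited = visitWhileAppending(list);
        // The loop only saw the three original elements, although the list has grown.
        System.out.println(visited + " visited, size now " + list.size());
    }
}
```

Note that this only replaces the outer lock. The elements themselves still need their own protection if they are mutated.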
You can take a copy of the collection and only lock one object at a time.
Collection<Data> dataCollection = something.getDataCollection();
Collection<Data> copy;
synchronized ( dataCollection ) {
copy = new ArrayList<Data>(dataCollection);
}
for (Data data : copy) {
synchronized ( data ) {
data.doSomething(); // doSomething() changes object state
}
}
Can't believe nobody pointed out that the number one way to avoid synchronizing on the Data object is to make this object itself thread-safe! It's also the correct way of handling synchronization: if you know that your object will be accessed by multiple threads, handle synchronization the way you see fit inside the class, not in the code that may access it. It will also almost certainly be more efficient, because you can limit synchronization to just the critical blocks, use ReadWriteLock, j.u.c.atomic, etc.
Nested synchronization can lead to deadlock, but it doesn't have to. One way to avoid deadlocks is to define an order in which you lock objects and always follow it.
If you always synchronize the dataCollection object before you synchronize the data objects, you won't deadlock.
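The same idea can be extended to the case where two element locks are needed at once. The following is only a sketch: the Account class and its id field are hypothetical and not part of the question. Both locks are always taken in ascending id order, so two threads transferring in opposite directions cannot end up waiting on each other.

```java
public class LockOrderingSketch {

    /** Hypothetical shared object; the id defines the global lock order. */
    static final class Account {
        final int id;
        private long balance;

        Account(int id, long balance) {
            this.id = id;
            this.balance = balance;
        }

        synchronized long balance() {
            return balance;
        }
    }

    /** Always locks the account with the smaller id first, whatever the transfer direction. */
    static void transfer(Account from, Account to, long amount) {
        Account first = from.id < to.id ? from : to;
        Account second = (first == from) ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }

    /** Runs transfers in both directions concurrently and returns the combined balance. */
    static long runOpposingTransfers() {
        Account a = new Account(1, 1000);
        Account b = new Account(2, 1000);
        Thread t1 = new Thread(() -> { for (int i = 0; i < 10_000; i++) transfer(a, b, 1); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 10_000; i++) transfer(b, a, 1); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return a.balance() + b.balance();
    }

    public static void main(String[] args) {
        System.out.println("combined balance: " + runOpposingTransfers());
    }
}
```

If transfer locked from before to instead, the two threads could each hold one lock and wait forever for the other.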
Take a look at ReentrantReadWriteLock. With this class you can implement a lock that makes it possible for any number of non-modifying (reading) threads to access the shared property simultaneously, but only one modifying (writing) thread to access it at a time (all other readers and writers are blocked until the writing thread releases the write lock). Remember to test your implementation thoroughly, as wrong usage of the locks can still lead to race conditions and/or deadlocks.
Whether you use CopyOnWriteArrayList as Bozho said or copy the list before iterating as Peter says should depend on how much you expect the list to be edited compared to iterated over.
Use CopyOnWriteArrayList when you expect the list to be iterated over far more than it is modified.
Copy the list before iterating if you expect it to be modified far more often than it is iterated over.
These should be the first options because concurrency solutions should be simple unless unavoidable, but if neither situation applies you will need to pick one of the more complicated strategies outlined in the comments here.
Good luck!
Related
A bit of (simplified) context.
Let's say I have an ArrayList<ContentStub> where ContentStub is:
public class ContentStub {
ContentType contentType;
Object content;
}
And I have multiple implementations of classes that "inflate" stubs for each ContentType, e.g.
public class TypeAStubInflater {
public void inflate(List<ContentStub> contentStubs) {
contentStubs.forEach(stub ->
{
if(stub.contentType == ContentType.TYPE_A) {
stub.content = someService.getContent();
}
});
}
}
The idea being, there is TypeAStubInflater which only modifies items ContentType.TYPE_A running in one thread, and TypeBStubInflater which only modifies items ContentType.TYPE_B, etc. - but each instance's inflate() method is modifying items in the same contentStubs List, in parallel.
However:
No thread ever changes the size of the ArrayList
No thread ever attempts to modify a value that's being modified by another thread
No thread ever attempts to read a value written by another thread
Given all this, it seems that no additional measures to ensure thread-safety are necessary. From a (very) quick look at the ArrayList implementation, it seems that there is no risk of a ConcurrentModificationException - however, that doesn't mean that something else can't go wrong. Am I missing something, or is this safe to do?
In general, that will work, because you are not modifying the state of the List itself, which would throw a ConcurrentModificationException if any iterator is active at the time of looping, but rather are modifying just an object inside the list, which is fine from the list's POV.
I would recommend splitting up your list into a Map<ContentType, List<ContentStub>> and then starting threads with those specific lists.
You could convert your list to such a map with this:
Map<ContentType, List<ContentStub>> typeToStubsMap = stubs.stream().collect(Collectors.groupingBy(stub -> stub.contentType));
If your List is not that big (<1000 entries), I would even recommend not using any threading at all and just iterating with a plain indexed for loop, or even .forEach if its small extra overhead is no concern.
Let's assume the thread A writes TYPE_A content and thread B writes TYPE_B content. The List contentStubs is only used to obtain instances of ContentStub: read-access only. So from the perspective of A, B and contentStubs, there is no problem. However, the updates done by threads A and B will likely never be seen by another thread, e.g. another thread C will likely conclude that stub.content == null for all elements in the list.
The reason for this is the Java Memory Model. If you don't use constructs like locks, synchronization, volatile and atomic variables, the memory model gives no guarantee if and when modifications of an object by one thread are visible for another thread. To make this a little more practical, let's have an example.
Imagine that a thread A executes the following code:
stub.content = someService.getContent(); // happens to be element[17]
List element 17 is a reference to a ContentStub object on the global heap. The VM is allowed to make a private, thread-local copy of that object. All subsequent accesses to that reference in thread A use the copy. The VM is free to decide when and if to update the original object on the global heap.
Now imagine a thread C that executes the following code:
ContentStub stub = contentStubs.get(17);
The VM will likely do the same trick with a private copy in thread C.
If thread C already accessed the object before thread A updated it, thread C will likely use the – not updated – copy and ignore the global original for a long time. But even if thread C accesses the object for the first time after thread A updated it, there is no guarantee that the changes in the private copy of thread A already ended up in the global heap.
In short: without a lock or synchronization, thread C will almost certainly only read null values in each stub.content.
The reason for this memory model is performance. On modern hardware, there is a trade-off between performance and consistency across all CPUs/cores. If the memory model of a modern language requires consistency, that is very hard to guarantee on all hardware and it will likely impact performance too much. Modern languages therefore embrace low consistency and offer the developer explicit constructs to enforce it when needed. In combination with instruction reordering by both compilers and processors, that makes old-fashioned linear reasoning about your program code … interesting.
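One way to get the missing visibility guarantee without touching ContentStub is to read the results only after joining the writer threads, since Thread.join() establishes a happens-before edge. The sketch below assumes the reader is the thread that started the inflaters. The string content is a stand-in for someService.getContent(), and the class name is made up.

```java
import java.util.ArrayList;
import java.util.List;

public class JoinVisibilitySketch {

    enum ContentType { TYPE_A, TYPE_B }

    static final class ContentStub {
        final ContentType contentType;
        Object content; // plain field: no volatile, no lock

        ContentStub(ContentType contentType) {
            this.contentType = contentType;
        }
    }

    /** Fills in the content of every stub of the given type. */
    static void inflate(List<ContentStub> stubs, ContentType type) {
        for (ContentStub stub : stubs) {
            if (stub.contentType == type) {
                stub.content = "content for " + type; // stand-in for someService.getContent()
            }
        }
    }

    /** Runs one inflater thread per type, joins both, then counts stubs that are still empty. */
    static long inflateInParallel(List<ContentStub> stubs) {
        Thread a = new Thread(() -> inflate(stubs, ContentType.TYPE_A));
        Thread b = new Thread(() -> inflate(stubs, ContentType.TYPE_B));
        a.start();
        b.start();
        try {
            a.join(); // all writes made by thread a happen-before join() returns
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return stubs.stream().filter(s -> s.content == null).count();
    }

    public static void main(String[] args) {
        List<ContentStub> stubs = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            stubs.add(new ContentStub(i % 2 == 0 ? ContentType.TYPE_A : ContentType.TYPE_B));
        }
        System.out.println("empty stubs after join: " + inflateInParallel(stubs));
    }
}
```

A reader thread that does not join the writers (the thread C of the example above) would still need a volatile field, a lock, or some other synchronization action to be sure of seeing the content.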
I tried to search but couldn't find the exact answer I was looking for, hence this new question.
If you wish to share any mutable object(s) between multiple threads, are there any best practices/principles/guidelines to do it ?
Or will it simply vary case by case ?
Sharing mutable objects between threads is risky.
The safest way is to make the objects immutable, you can then share them freely.
If they must be mutable, then each object needs to ensure its own thread safety using the usual methods (synchronized, AtomicX classes, etc.).
The ways to protect the individual objects will vary a lot though depending on how you are using them and what you are using them for.
In Java, the easiest way is to synchronize every method that reads or changes the state of a shared object.
Other strategies are:
make use of thread-safe classes (ConcurrentHashMap, for example)
use of locks
use of the volatile keyword, to avoid stale values (it can sometimes be used as a lightweight synchronizer)
The key is to synchronize your updates and reads to guarantee a consistent state; how you do it can vary a lot.
The problems with sharing objects between threads are caused by having the two threads access the same data structure at the same time, with one mutating the structure while the other depends on the structure to be complete, correct or stable. Which of these cause the problem is important and should be considered when choosing the strategy.
These are the strategies I use.
Use immutable objects as much as possible.
This removes the issue of changing the data structure altogether. There are however a lot of useful patterns that cannot be written using this approach. Also, unless you are using a language/API which promotes immutability, it can be inefficient. Adding an entry to a Scala list is much faster than making a copy of a Java list and adding an entry to the copy.
Use the synchronized keyword.
This ensures that only one thread at a time is allowed to change the object. It is important to choose which object to synchronize on. Changing a part of a structure might put the whole structure in an illegal state until another change is made. Also, synchronized removes many of the benefits of going multithreaded in the first place.
The Actor model.
The actor model organizes the world in actors sending immutable messages to each other. Each actor only has one thread at once. The actor can contain the mutability.
There are platforms, like Akka, which provide the fundamentals for this approach.
Use the atomic classes. (java.util.concurrent.atomic)
These gems have methods like incrementAndGet. They can be used to achieve many of the effects of synchronized without the overhead.
Use concurrent data structures.
The Java api contains concurrent data structures created for this purpose.
Risk doing stuff twice.
When designing a cache it is often a good idea to risk doing the work twice instead of using synchronized. Say you have a cache of compiled expressions from a DSL. If an expression is compiled twice, that is OK as long as it eventually ends up in the cache. By allowing some extra work during initialization you may not need to use the synchronized keyword during cache access.
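The last strategy might be sketched as follows. This is an illustration under assumptions, not a production cache: the class name and the compute function are hypothetical, and it relies on a concurrent map from the standard library rather than on synchronized. Two threads that miss at the same time may both compute the value, but only one result is kept.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

public class RacyCacheSketch<K, V> {

    private final ConcurrentMap<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> compute; // e.g. "compile this expression"; must not return null

    public RacyCacheSketch(Function<K, V> compute) {
        this.compute = compute;
    }

    public V get(K key) {
        V cached = cache.get(key);
        if (cached != null) {
            return cached; // fast path: no lock taken by this class
        }
        V fresh = compute.apply(key);              // may run in several threads at once
        V winner = cache.putIfAbsent(key, fresh);  // the first result to arrive stays in the cache
        return winner != null ? winner : fresh;
    }

    public static void main(String[] args) {
        RacyCacheSketch<String, Integer> lengths = new RacyCacheSketch<>(String::length);
        System.out.println(lengths.get("hello"));
    }
}
```

This only makes sense when the computation is free of side effects and any of the duplicate results is acceptable.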
Here is an example. StringBuilder is not thread-safe, so without the synchronized (builder) blocks the result will be broken. Try it and see.
Some objects are thread safe (for example StringBuffer), so no need to use synchronized blocks with them.
public static void main(String[] args) throws InterruptedException {
StringBuilder builder = new StringBuilder("");
Thread one = new Thread() {
public void run() {
for (int i = 0; i < 1000; i++) {
//synchronized (builder) {
builder.append("thread one\n");
//}
}
}
};
Thread two = new Thread() {
public void run() {
for (int i = 0; i < 1000; i++) {
//synchronized (builder) {
builder.append("thread two\n");
//}
}
}
};
one.start();
two.start();
one.join();
two.join();
System.out.println(builder);
}
There are some good answers already posted, but here is what I found while reading Java Concurrency in Practice, Chapter 3 - Sharing Objects.
Quote from the book.
The publication requirements for an object depend on its mutability:
Immutable objects can be published through any mechanism;
Effectively immutable objects (whose state will not be modified after publication) must be safely published;
Mutable objects must be safely published, and must be either threadsafe or guarded by a lock.
The book lists the ways to publish an object safely:
To publish an object safely, both the reference to the object and the object's state must be made visible to other threads at the same time. A properly constructed object can be safely published by:
Initializing an object reference from a static initializer;
Storing a reference to it into a volatile field or AtomicReference;
Storing a reference to it into a final field of a properly constructed object; or
Storing a reference to it into a field that is properly guarded by a lock.
The last point refers to mechanisms such as concurrent data structures and/or the synchronized keyword.
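As a small illustration of the second publication mechanism in the quote (a volatile field), here is a sketch with a hypothetical Settings class. The object is effectively immutable: it is filled in during construction and never changed afterwards, so safe publication is all it needs.

```java
import java.util.HashMap;
import java.util.Map;

public class SafePublicationSketch {

    /** Hypothetical settings object: mutable internals, but never modified after construction. */
    static final class Settings {
        private final Map<String, String> values = new HashMap<>();

        Settings(String key, String value) {
            values.put(key, value);
        }

        String get(String key) {
            return values.get(key);
        }
    }

    // The volatile write publishes the reference together with the state built before the write.
    private static volatile Settings current;

    static void publish(Settings settings) {
        current = settings;
    }

    static Settings latest() {
        return current;
    }

    public static void main(String[] args) {
        publish(new Settings("mode", "fast"));
        System.out.println(latest().get("mode"));
    }
}
```

If Settings were modified after publication, the volatile field alone would no longer be enough and the object would have to be thread-safe or guarded by a lock, as the third rule in the quote says.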
I have 2 code snippets which do the same thing: make adding to a list thread-safe. The first one does it using Collections.synchronizedList, for example:
List<DataServiceRequest> requests = Collections.synchronizedList(new ArrayList<DataServiceRequest>());
The second one does the same thing by synchronizing the method, for example:
public synchronized void addRequest(DataServiceRequest request) {
this.getRequests().add(request);
}
Which of the two examples above is the most efficient and the safest?
The first is really just syntactic sugar for the second (it returns a wrapper list that puts synchronized (mutex) around each call), so it is unlikely to make any difference from a performance point of view.
As for "which is the safest way" - that depends on your coding standards. You must pay attention to the documents for Collections.synchronizedList if you use it, particularly:
it is critical that all access to the backing list is accomplished through the returned list.
and
It is imperative that the user manually synchronize on the returned list when iterating over it
You'll still have the same issue when iterating a list that you control the synchronization of - this is just saying that the mutex in use for synchronizedList is the list itself. If you control the synchronization you just need to consistently use the same mutex for all thread-safe access to the backing list.
Your question might imply that you plan to synchronize only the operations that change the list, rather than all list operations. If so, that would be a mistake. But even if not, using the synchronizedList wrapper takes that worry away from your program because it guarantees that all method calls are synchronized.
The one thing that synchronizedList cannot guarantee is synchronization over the block of code which consumes a list iterator. This is still something you'll need to do inside your own synchronized block.
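A minimal sketch of that manual synchronization, following the pattern in the Collections.synchronizedList documentation. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SynchronizedListIterationSketch {

    /** Sums the lengths of all strings; the whole traversal runs under the wrapper's own lock. */
    static int totalLength(List<String> syncList) {
        int total = 0;
        synchronized (syncList) { // the returned wrapper itself is the mutex its methods use
            for (String s : syncList) {
                total += s.length();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> requests = Collections.synchronizedList(new ArrayList<String>());
        requests.add("first");   // single calls are already synchronized by the wrapper
        requests.add("second");
        System.out.println(totalLength(requests));
    }
}
```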
I've got a gigantic Trove map and a method that I need to call very often from multiple threads. Most of the time this method shall return true. The threads are doing heavy number crunching and I noticed that there was some contention due to the following method (it's just an example, my actual code is bit different):
synchronized boolean containsSpecial() {
return troveMap.contains(key);
}
Note that it's an "append only" map: once a key is added, it stays in there forever (which is important for what comes next I think).
I noticed that by changing the above to:
boolean containsSpecial() {
if ( troveMap.contains(key) ) {
// most of the time (>90%) we shall pass here, dodging lock-acquisition
return true;
}
synchronized (this) {
return troveMap.contains(key);
}
}
I get a 20% speedup on my number crunching (verified on lots of runs, running during long times etc.).
Does this optimization look correct (knowing that once a key is there it shall stay there forever)?
What is the name for this technique?
EDIT
The code that updates the map is called way less often than the containsSpecial() method and looks like this (I've synchronized the entire method):
synchronized void addSpecialKeyValue( key, value ) {
....
}
This code is not correct.
Trove doesn't handle concurrent use itself; it's like java.util.HashMap in that regard. So, like HashMap, even seemingly innocent, read-only methods like containsKey() could throw a runtime exception or, worse, enter an infinite loop if another thread modifies the map concurrently. I don't know the internals of Trove, but with HashMap, rehashing when the load factor is exceeded, or removing entries can cause failures in other threads that are only reading.
If the operation takes a significant amount of time compared to lock management, using a read-write lock to eliminate the serialization bottleneck will improve performance greatly. In the class documentation for ReentrantReadWriteLock, there are "Sample usages"; you can use the second example, for RWDictionary, as a guide.
In this case, the map operations may be so fast that the locking overhead dominates. If that's the case, you'll need to profile on the target system to see whether a synchronized block or a read-write lock is faster.
Either way, the important point is that you can't safely remove all synchronization, or you'll have consistency and visibility problems.
It's called wrong locking ;-) Actually, it is some variant of the double-checked locking approach. And the original version of that approach is just plain wrong in Java.
Java threads are allowed to keep private copies of variables in their local memory (think: core-local cache of a multi-core machine). Any Java implementation is allowed to never write changes back into the global memory unless some synchronization happens.
So, it is very well possible that one of your threads has a local memory in which troveMap.contains(key) evaluates to true. Therefore, it never synchronizes and it never gets the updated memory.
Additionally, what happens when contains() sees an inconsistent memory state of the troveMap data structure?
Look up the Java memory model for the details. Or have a look at this book: Java Concurrency in Practice.
This looks unsafe to me. Specifically, the unsynchronized calls will be able to see partial updates, either due to memory visibility (a previous put not getting fully published, since you haven't told the JMM it needs to be) or due to a plain old race. Imagine if TroveMap.contains has some internal variable that it assumes won't change during the course of contains. This code lets that invariant break.
Regarding the memory visibility, the problem with that isn't false negatives (you use the synchronized double-check for that), but that trove's invariants may be violated. For instance, if they have a counter, and they require that counter == someInternalArray.length at all times, the lack of synchronization may be violating that.
My first thought was to make troveMap's reference volatile, and to re-write the reference every time you add to the map:
synchronized (this) {
troveMap.put(key, value);
troveMap = troveMap;
}
That way, you're setting up a memory barrier such that anyone who reads the troveMap will be guaranteed to see everything that had happened to it before its most recent assignment -- that is, its latest state. This solves the memory issues, but it doesn't solve the race conditions.
Depending on how quickly your data changes, maybe a Bloom filter could help? Or some other structure that's more optimized for certain fast paths?
Under the conditions you describe, it's easy to imagine a map implementation for which you can get false negatives by failing to synchronize. The only way I can imagine obtaining false positives is an implementation in which key insertions are non-atomic and a partial key insertion happens to look like another key you are testing for.
You don't say what kind of map you have implemented, but the stock map implementations store keys by assigning references. According to the Java Language Specification:
Writes to and reads of references are always atomic, regardless of whether they are implemented as 32 or 64 bit values.
If your map implementation uses object references as keys, then I don't see how you can get in trouble.
EDIT
The above was written in ignorance of Trove itself. After a little research, I found the following post by Rob Eden (one of the developers of Trove) on whether Trove maps are concurrent:
Trove does not modify the internal structure on retrievals. However, this is an implementation detail not a guarantee so I can't say that it won't change in future versions.
So it seems like this approach will work for now but may not be safe at all in a future version. It may be best to use one of Trove's synchronized map classes, despite the penalty.
I think you would be better off with a ConcurrentHashMap which doesn't need explicit locking and allows concurrent reads
boolean containsSpecial() {
return troveMap.containsKey(key); // on ConcurrentHashMap, contains(Object) tests values, not keys
}
void addSpecialKeyValue( key, value ) {
troveMap.putIfAbsent(key,value);
}
another option is using a ReadWriteLock which allows concurrent reads but no concurrent writes
ReadWriteLock rwlock = new ReentrantReadWriteLock();
boolean containsSpecial() {
rwlock.readLock().lock();
try{
return troveMap.contains(key);
}finally{
rwlock.readLock().unlock();
}
}
void addSpecialKeyValue( key, value ) {
rwlock.writeLock().lock();
try{
//...
troveMap.put(key,value);
}finally{
rwlock.writeLock().unlock();
}
}
Why reinvent the wheel?
Simply use ConcurrentHashMap.putIfAbsent
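A sketch of what that could look like, assuming the Trove map can be swapped for a ConcurrentHashMap. The Long/String key and value types are placeholders, since the question does not say what the real ones are.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class SpecialKeysSketch {

    // Placeholder types: the question does not state the real key and value types.
    private final ConcurrentMap<Long, String> specials = new ConcurrentHashMap<>();

    /** Lock-free for callers; safe to call while other threads insert. */
    boolean containsSpecial(long key) {
        return specials.containsKey(key);
    }

    /** Returns true only for the call that actually inserted the key. */
    boolean addSpecialKeyValue(long key, String value) {
        return specials.putIfAbsent(key, value) == null;
    }

    public static void main(String[] args) {
        SpecialKeysSketch sketch = new SpecialKeysSketch();
        System.out.println(sketch.addSpecialKeyValue(42L, "answer")); // first insert
        System.out.println(sketch.addSpecialKeyValue(42L, "other"));  // key already present
        System.out.println(sketch.containsSpecial(42L));
    }
}
```

Boxing primitive keys costs memory compared with a Trove map, so for a gigantic map this trade-off would need measuring.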
I'm using Collections.synchronizedCollection in Java to protect a Set that I know is getting accessed concurrently by many threads. The Java API warns:
" It is imperative that the user manually synchronize on the returned collection when iterating over it:
Collection c = Collections.synchronizedCollection(myCollection);
...
synchronized(c) {
Iterator i = c.iterator(); // Must be in the synchronized block
while (i.hasNext())
foo(i.next());
}
"
If I use c.contains(obj), is that thread-safe? Internally, obviously, this is iterating over the Collection and seeing if any of the objects in it are equal to obj. My instinct is to assume that this is probably synchronized (it would seem to be a major failing if not), but given previous pains with synchronization, it seems wise to double-check, and a Google search for answers on this hasn't turned up anything.
In itself, a call to contains is safe.
The problem is that one often tests whether a collection contains an element then does something to the collection based on the result.
Most likely, the test and the action should be treated as a single, atomic operation. In that case, a lock on the collection should be obtained, and both operations should be performed in the synchronized block.
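A sketch of such a compound check-then-act operation. The helper name addIfAbsent is made up; the lock is the synchronized wrapper itself, as in the API documentation quoted in the question.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;

public class CheckThenActSketch {

    /** Adds the item only if it is not present yet; the test and the add share one lock. */
    static <T> boolean addIfAbsent(Collection<T> syncCollection, T item) {
        synchronized (syncCollection) { // same mutex the wrapper uses internally
            if (syncCollection.contains(item)) {
                return false;
            }
            return syncCollection.add(item);
        }
    }

    public static void main(String[] args) {
        Collection<String> c = Collections.synchronizedCollection(new ArrayList<String>());
        System.out.println(addIfAbsent(c, "x"));
        System.out.println(addIfAbsent(c, "x"));
        System.out.println(c.size());
    }
}
```

Without the synchronized block, two threads could both see contains return false and both add the item.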
Collections.synchronizedCollection() returns a thread-safe collection, which means any single method call is thread-safe by itself. It depends on what you want to do. If you want to call a couple of methods in sequence, Java cannot make the combination thread-safe for you.
It's safe, because contains itself is synchronized.