We have a spring boot service that simply provides data from a map. The map is updated regularly, triggered by a scheduler, meaning we build a new intermediate map loading all the data needed and as soon as it is finished we assign it. To overcome concurrency issues we introduced a ReentrantReadWriteLock that opens a write lock just in the moment the assignment of the intermediate map happens and of course read locks while accessing the map. Please see simplified code below
#Service
public class MyService {
private final Lock readLock;
private final Lock writeLock;
private Map<String, SomeObject> myMap = new HashMap<>();
public MyService() {
final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
readLock = rwLock.readLock();
writeLock = rwLock.writeLock();
}
protected SomeObject getSomeObject(String key) {
readLock.lock();
try {
return this.myMap.get(key);
}
} finally {
readLock.unlock();
}
return null;
}
private void loadData() {
Map<String, SomeObject> intermediateMyMap = new HashMap<>();
// Now do some heavy processing till the new data is loaded to intermediateMyMap
//clear maps
writeLock.lock();
try {
myMap = intermediateMyMap;
} finally {
writeLock.unlock();
}
}
}
If we set the service under load accessing the map a lot we still saw the java.util.ConcurrentModificationException happening in the logs and I have no clue why.
BTW: Meanwhile I also saw this question, which seems also to be a solution. Nevertheless, I would like to know what I did wrong or if I misunderstood the concept of ReentrantReadWriteLock
EDIT: Today I was provided with the full stacktrace. As argued by some of you guys, the issue is really not related to this piece of code, it just happened coincidently in the same time the reload happened.
The problem actually was really in the access to getSomeObject(). In the real code SomeObject is again a Map and this inner List gets sorted each time it is accessed (which is bad anyways, but that is another issue). So basically we ran into this issue
I see nothing obviously wrong with the code. ReadWriteLock should provide the necessary memory ordering guarantees (See Memory Synchronization section at https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/Lock.html)
The problem might well be in the "heavy processing" part. A ConcurrentModificationException could also be caused by modifying the map while iterating over it from a single thread, but then you would see the same problem regardless of the load on the system.
As you already mentioned, for this pattern of replacing the whole map I think a volatile field or an AtomicReference would be the better and much simpler solution.
ReentrantReadWriteLock only guarantees the thread that holds the lock on the map can hold on to the lock if needed.
It does not guarantee myMap has not been cached behind the scenes.
A cached value could result in a stale read.
A stale read will give you the java.util.ConcurrentModificationException
myMap needs to be declared volatile to make the update visible to other threads.
From Java Concurrency in Practice:
volatile variables, to ensure that updates to a variable are
propagated predictably to other threads. When a field is declared
volatile, the compiler and runtime are put on notice that this
variable is shared and that operations on it should not be reordered
with other memory operations. Volatile variables are not cached in
registers or in caches where they are hidden from other processors, so
a read of a volatile variable always returns the most recent write by
any thread.
Peierls, Tim. Java Concurrency in Practice
an alternative would be to use syncronized on getSomeObject and a synchonized block on this around myMap = intermediateMyMap;
Related
When reading JDK codes, I tried to find some usage of ReentrantReadWriteLock, and found that the only usage is in javax.swing.plaf.nimbus.ImageCache.
I have two questions with the usage of ReentrantReadWriteLock here:
I can understand the readLock used in the getImage method, and the writeLock used in the setImage method, but why is a readLock used in the flush method? Isn't the flush method also some kind of "write", since it changes the map:
public void flush() {
lock.readLock().lock();
try {
map.clear();
} finally {
lock.readLock().unlock();
}
}
The other question: why not use a ConcurrentHashMap here, since it will provide some concurrent writes to different mapEntries and provide more concurrency than ReadWriteLock?
Second Question First:
ReentrantReadWriteLocks can be used to improve concurrency in some uses of some kinds of Collections. This is typically worthwhile only when the collections are expected to be large, accessed by more reader threads than writer threads, and entail operations with overhead that outweighs synchronization overhead. - from ReentrantReadWriteLock Documentation
All of the points mentioned above correspond to an image cache. As for "why not use a ConcurrentHashMap?" - ImageCache uses a LinkedHashMap which has no concurrent implementation. For speculation as to why, refer to this SO question: Why there is no ConcurrentLinkedHashMap class in jdk?
First Question:
I too question why the flush method doesn't use the writeLock like the setImage method. After all it is structurally modifying the map.
After reviewing the javax.swing.plaf.nimbus.ImageCache and PixelCountSoftReference sources along with the ReentrantReadWriteLock and LinkedHashMap documentations, I'm left without a definitive answer.
Although I'm further confused by flush using a readLock, since ReentrantReadWriteLock's documentation has the following example, where a writeLock is used when clearing a TreeMap.
// For example, here is a class using a TreeMap that is expected to be
// large and concurrently accessed.
class RWDictionary {
private final Map<String, Data> m = new TreeMap<String, Data>();
private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
private final Lock w = rwl.writeLock();
// other code left out for brevity
public void clear() {
w.lock(); // write lock
try { m.clear(); } // clear the TreeMap
finally { w.unlock(); }
}
}
The only thing I can do is speculate.
Speculation:
Maybe the author(s) made a mistake, highly unlikely but not impossible.
It's intentional. I have some ideas as to why it may be intentional, but I'm not sure how to word them and they're probably wrong.
The author(s) were the only ones using the ImageCache code and knew when and how (not) to use the flush method. This is unlikely as well.
It would be interesting to ask the author(s) why they used a readLock instead of a writeLock, via email, but no authors or emails are listed in the source. Perhaps sending an email to Oracle would result in an answer, I'm not to sure how to go about that.
Hopefully someone will come along and provide an actual answer. Good question.
The someParameters hashmap is loaded from a .csv file every twenty minutes or so by one thread and set by the setParameters method.
It is very frequently read by multiple threads calling getParameters: to perform a lookup translation of one value into a corresponding value.
Is the code unsafe and/ or the "wrong" way to achieve this (particularly in terms of performance)? I know about ConcurrentHashMap but am trying to get a more fundamental understanding of concurrency, rather than using classes that are inherrently thread-safe.
One potential risk I see is that the object reference someParameters could be reset whilst another thread is reading the copy, so the other thread might not have the latest values (which wouldn't matter to me).
public class ConfigObject {
private static HashMap<String, String> someParameters = new HashMap<String, String>();
public HashMap<String, String> getParameters(){
return new HashMap<String, String>(someParameters);
//to some thread which will only ever iterate or get
}
public void setParameters(HashMap<String, String> newParameters){
//could be called by any thread at any time
someParameters = newParameters;
}
}
There are two problems here
Visibility problem, as someParameters after update might not be visible to other thread, to fix this mark someParameters as volatile.
Other problem is performance one due to creating new HashMap in get method, to fix that use Utility method Collections.unmodifiableMap() this just wrap original map and disallowing put/remove method.
If I understand your problem correctly, you need to change/replace many parameters at once (atomically). Unfortunately, ConcurrentHashMap doesn't support atomic bulk inserts/updates.
To achieve this, you should use shared ReadWriteLock. Advantage comparing to Collections.synchronized... is that concurrent reads can be performed simultaneously: if readLock is acquired from some thread, readLock().lock() called from another thread will not block.
ReadWriteLock lock = new ReadWriteLock();
// on write:
lock.writeLock().lock();
try {
// write/update operation,
// e. g. clear map and write new values
} finally {
lock.writeLock().unlock();
}
// on read:
lock.readLock().lock();
try {
// read operation
} finally {
lock.readLock().unlock();
}
So far, I have used double-checked locking as follows:
class Example {
static Object o;
volatile static boolean setupDone;
private Example() { /* private constructor */ }
getInstance() {
if(!setupDone) {
synchronized(Example.class) {
if(/*still*/ !setupDone) {
o = new String("typically a more complicated operation");
setupDone = true;
}
}
}
return o;
}
}// end of class
Now, because we have groups of threads that all share this class, we changed the boolean to a ConcurrentHashMap as follows:
class Example {
static ConcurrentHashMap<String, Object> o = new ConcurrentHashMap<String, Object>();
static volatile ConcurrentHashMap<String, Boolean> setupsDone = new ConcurrentHashMap<String, Boolean>();
private Example() { /* private constructor */ }
getInstance(String groupId) {
if (!setupsDone.containsKey(groupId)) {
setupsDone.put(groupId, false);
}
if(!setupsDone.get(groupId)) {
synchronized(Example.class) {
if(/*still*/ !setupsDone.get(groupId)) {
o.put(groupId, new String("typically a more complicated operation"));
setupsDone.put(groupId, true); // will this still maintain happens-before?
}
}
}
return o.get(groupId);
}
}// end of class
My question now is: If I declare a standard Object as volatile, I will only get a happens-before relationship established when I read or write its reference. Therefore writing an element within that Object (if it is e.g. a standard HashMap, performing a put() operation on it) will not establish such a relationship. Is that correct? (What about reading an element; wouldn't that require to read the reference as well and thus establish the relationship?)
Now, with using a volatile ConcurrentHashMap, will writing an element to it establish the happens-before relationship, i.e. will the above still work?
Update: The reason for this question and why double-checked locking is important:
What we actually set up (instead of an Object) is a MultiThreadedHttpConnectionManager, to which we pass some settings, and which we then pass into an HttpClient, that we set up, too, and that we return. We have up to 10 groups of up to 100 threads each, and we use double-checked locking as we don't want to block each of them whenever they need to acquire their group's HttpClient, as the whole setup will be used to help with performance testing. Because of an awkward design and an odd platform we run this on we cannot just pass objects in from outside, so we hope to somehow make this setup work. (I realise the reason for the question is a bit specific, but I hope the question itself is interesting enough: Is there a way to get that ConcurrentHashMap to use "volatile behaviour", i.e. establish a happens-before relationship, as the volatile boolean did, when performing a put() on the ConcurrentHashMap? ;)
Yes, it is correct. volatile protects only that object reference, but nothing else.
No, putting an element to a volatile HashMap will not create a happens-before relationship, not even with a ConcurrentHashMap.
Actually ConcurrentHashMap does not hold lock for read operations (e.g. containsKey()). See ConcurrentHashMap Javadoc.
Update:
Reflecting your updated question: you have to synchronize on the object you put into the CHM. I recommend to use a container object instead of directly storing the Object in the map:
public class ObjectContainer {
volatile boolean isSetupDone = false;
Object o;
}
static ConcurrentHashMap<String, ObjectContainer> containers =
new ConcurrentHashMap<String, ObjectContainer>();
public Object getInstance(String groupId) {
ObjectContainer oc = containers.get(groupId);
if (oc == null) {
// it's enough to sync on the map, don't need the whole class
synchronized(containers) {
// double-check not to overwrite the created object
if (!containers.containsKey(groupId))
oc = new ObjectContainer();
containers.put(groupId, oc);
} else {
// if another thread already created, then use that
oc = containers.get(groupId);
}
} // leave the class-level sync block
}
// here we have a valid ObjectContainer, but may not have been initialized
// same doublechecking for object initialization
if(!oc.isSetupDone) {
// now syncing on the ObjectContainer only
synchronized(oc) {
if(!oc.isSetupDone) {
oc.o = new String("typically a more complicated operation"));
oc.isSetupDone = true;
}
}
}
return oc.o;
}
Note, that at creation, at most one thread may create ObjectContainer. But at initialization each groups may be initialized in parallel (but at most 1 thread per group).
It may also happen that Thread T1 will create the ObjectContainer, but Thread T2 will initialize it.
Yes, it is worth to keep the ConcurrentHashMap, because the map reads and writes will happen at the same time. But volatile is not required, since the map object itself will not change.
The sad thing is that the double-check does not always work, since the compiler may create a bytecode where it is reusing the result of containers.get(groupId) (that's not the case with the volatile isSetupDone). That's why I had to use containsKey for the double-checking.
Therefore writing an element within that Object (if it is e.g. a standard HashMap, performing a put() operation on it) will not establish such a relationship. Is that correct?
Yes and no. There is always a happens-before relationship when you read or write a volatile field. The issue in your case is that even though there is a happens-before when you access the HashMap field, there is no memory synchronization or mutex locking when you are actually operating on the HashMap. So multiple threads can see different versions of the same HashMap and can create a corrupted data structure depending on race conditions.
Now, with using a volatile ConcurrentHashMap, will writing an element to it establish the happens-before relationship, i.e. will the above still work?
Typically you do not need to mark a ConcurrentHashMap as being volatile. There are memory barriers that are crossed internal to the ConcurrentHashMap code itself. The only time I'd use this is if the ConcurrentHashMap field is being changed often -- i.e. is non-final.
Your code really seems like premature optimization. Has a profiler shown you that it is a performance problem? I would suggest that you just synchronize on the map and me done with it. Having two ConcurrentHashMap to solve this problem seems like overkill to me.
private static Map<Integer, String> map = null;
public static String getString(int parameter){
if(map == null){
map = new HashMap<Integer, String>();
//map gets filled here...
}
return map.get(parameter);
}
Is that code unsafe as multithreading goes?
As mentioned, it's definitely not safe. If the contents of the map are not based on the parameter in getString(), then you would be better served by initializing the map as a static initializer as follows:
private static final Map<Integer, String> MAP = new HashMap<Integer,String>();
static {
// Populate map here
}
The above code gets called once, when the class is loaded. It's completely thread safe (although future modification to the map are not).
Are you trying to lazy load it for performance reasons? If so, this is much safer:
private static Map<Integer, String> map = null;
public synchronized static String getString(int parameter){
if(map == null){
map = new HashMap<Integer, String>();
//map gets filled here...
}
return map.get(parameter);
}
Using the synchronized keyword will make sure that only a single thread can execute the method at any one time, and that changes to the map reference are always propagated.
If you're asking this question, I recommend reading "Java Concurrency in Practice".
Race condition? Possibly.
If map is null, and two threads check if (map == null) at the same time, each would allocate a separate map. This may or may not be a problem, depending mainly on whether map is invariant. Even if the map is invariant, the cost of populating the map may also become an issue.
Memory leak? No.
The garbage collector will do its job correctly regardless of the race condition.
You do run the risk of initializing map twice in a multi-threaded scenario.
In a managed language, the garbage collector will eventually dispose of the no-longer-referenced instance. In an unmanaged language, you will never free the memory allocated for the overwritten map.
Either way, initialization should be properly protected so that multiple threads do not run initialization code at the same time.
One reason: The first thread could be in the middle of initializing the HashMap, while a second thread comes a long, sees that map is not null, and merrily tries to use the partially-initialized data structure.
It is unsafe in multithreading case due to race condition.
But do you really need the lazy initialization for the map? If the map is going to be used anyway, seems you could just do eager initialization for it..
The above code isn't thread-safe, as others have mentioned, your map can be initialized twice. You may be tempted to try and fix the above code by adding some synchronization, this is known as "double checked locking", Here is an article that describes the problems with this approach, as well as some potential fixes.
The simplest solution is to make the field a static field in a separate class:
class HelperSingleton {
static Helper singleton = new Helper();
}
it can also be fixed using the volatile keyword, as described in Bill Pugh's article.
No, this code is not safe for use by multiple threads.
There is a race condition in the initialization of the map. For example, multiple threads could initialize the map simultaneously and clobber each others' writes.
There are no memory barriers to ensure that modifications made by a thread are visible to other threads. For example, each thread could use its own copy of the map because they never "see" the values written by another thread.
There is no atomicity to ensure that invariants are preserved as the map is accessed concurrently. For example, a thread that's performing a get() operation could get into an infinite loop because another thread rehashed the buckets during a simultaneous put() operation.
If you are using Java 6, use ConcurrentHashMap
ConcurrentHashMap JavaDoc
I'm writing an analogue of DatabaseConfiguration class which reads configuration from database and I need some advice regards synchronization.
For example,
public class MyDBConfiguration{
private Connection cn;
private String table_name;
private Map<String, String> key_values = new HashMap<String,String>();
public MyDBConfiguration (Connection cn, String table_name) {
this.cn = cn;
this.table_name = table_name;
reloadConfig();
}
public String getProperty(String key){
return this.key_values.get(key);
}
public void reloadConfig() {
Map<String, String> tmp_map = new HashMap<String,String> ();
// read data from database
synchronized(this.key_values)
{
this.key_values = tmp_map;
}
}
}
So I have a couple questions.
1. Assuming properties are read only , do I have use synchronize in getProperty ?
2. Does it make sense to do this.key_values = Collections.synchronizedMap(tmp_map) in reloadConfig?
Thank you.
If multiple threads are going to share an instance, you must use some kind of synchronization.
Synchronization is needed mainly for two reasons:
It can guarantee that some operations are atomic, so the system will keep consistent
It guarantees that every threads sees the same values in the memory
First of all, since you made reloadConfig() public, your object does not really look immutable. If the object is really immutable, that is, if after initialization of its values they cannot change (which is a desired property to have in objects that are shared).
For the above reason, you must synchronize all the access to the map: suppose a thread is trying to read from it while another thread is calling reloadConfig(). Bad things will happen.
If this is really the case (mutable settings), you must synchronize in both reads and writes (for obvious reasons). Threads must synchronize on a single object (otherwise there's no synchronization). The only way to guarantee that all the threads will synchronize on the same object is to synchronize on the object itself or in a properly published, shared, lock, like this:
// synchronizes on the in instance itself:
class MyDBConfig1 {
// ...
public synchronized String getProperty(...) { ... }
public synchronized reloadConfig() { ... }
}
// synchronizes on a properly published, shared lock:
class MyDBConfig2 {
private final Object lock = new Object();
public String getProperty(...) { synchronized(lock) { ... } }
public reloadConfig() { synchronized(lock) { ... } }
}
The properly publication here is guaranteed by the final keyword. It is subtle: it guarantees that the value of this field is visible to every thread after initialization (without it, a thread might see that lock == null, and bad things will happen).
You could improve the code above by using a (properly published) ReadWriteReentrantLock. It might improve concurrency a bit if that's a concern for you.
Supposing your intention was to make MyDBConfig immutable, you do not need to serialize access to the hash map (that is, you don't necessarily need to add the synchronized keyword). You might improve concurrency.
First of all, make reloadConfig() private (this will indicate that, for consumers of this object, it is indeed immutable: the only method they see is getProperty(...), which, by its name, should not modify the instance).
Then, you only need to guarantee that every thread will see the correct values in the hash map. To do so, you could use the same techniques presented above, or you could use a volatile field, like this:
class MyDBConfig {
private volatile boolean initialized = false;
public String getProperty(...) { if (initialized) { ... } else { throw ... } }
private void reloadConfig() { ...; initialized = true; }
public MyDBConfig(...) { ...; reloadConfig(); }
}
The volatile keyword is very subtle. Volatile writes and volatile reads have a happens-before relationship. A volatile write is said to happen-before a subsequent volatile read of the same (volatile) field. What this means is that all the memory locations that have been modified before (in program order) a volatile write are visible to every other thread after they have executed a subsequente volatile read of the same (volatile) field.
In the code above, you write true to the volatile field after all the values have been set. Then, the method reading values (getProperty(...)) begins by executing a volatile read of the same field. Then this method is guaranteed to see the correct values.
In the example above, if you don't publish the instance before the constructor finishes, it is guaranteed that the exception won't get thrown in the method getProperty(...) (because before the constructor finishes, you have written true to initialized).
Assuming that key_values will not be put to after reloadConfig you will need to synchronize access to both reads and writes of the map. You are violating this by only synchronizing on the assignment. You can solve this by removing the synchronized block and assigning the key_values as volatile.
Since the HashMap is effectively read only I wouldn't assign Collections.synchronizedMap rather Collections.unmodifiableMap (this wouldn't effect the Map itself, just prohibit from accidental puts from someone else possible using this class).
Note: Also, you should never synchronize a field that will change. The results are very unpredictable.
Edit: In regards to the other answers. It is highly suggested that all shared mutable data must be synchronized as the effects are non-deterministic. The key_values field is a shared mutable field and assignments to it must be synchronized.
Edit: And to clear up any confusion with Bruno Reis. The volatilefield would be legal if you still fill the tmp_map and after its finished being filled assign it to this.key_values it would look like:
private volatile Map<String, String> key_values = new HashMap<String,String>();
..rest of class
public void reloadConfig() {
Map<String, String> tmp_map = new HashMap<String,String> ();
// read data from database
this.key_values = tmp_map;
}
You still need the same style or else as Bruno Reis noted it would not be thread-safe.
I would say that if you guarantee that no code will structurally modify your map, then there is no need to synchronize it.
If multiple threads access a hash map concurrently, and at least one
of the threads modifies the map structurally, it must be synchronized
externally.
http://download.oracle.com/javase/6/docs/api/java/util/HashMap.html
The code you have shown provides only read access to the map. Client code cannot make a structural modification.
Since your reload method alters a temporary map and then changes key_values to point to the new map, again I'd say no synchronization is required. The worst that can happen is someone reads from old copy of the map.
I'm going to keep my head down and wait for the downvotes now ;)
EDIT
As suggested by Bruno, the fly in the ointment is inheritance. If you cannot guarantee that your class will not be sub-classed, then you should be more defensive.
EDIT
Just to refer back to the specific questions posed by the OP...
Assuming properties are read only , do I have use synchronize in getProperty ?
Does it make sense to do this.key_values = Collections.synchronizedMap(tmp_map) in reloadConfig?
... I am genuinely interested to know if my answers are wrong. So I won't give up and delete my answer for a while ;)