While reading the JDK source, I looked for usages of ReentrantReadWriteLock and found that the only one is in javax.swing.plaf.nimbus.ImageCache.
I have two questions about the usage of ReentrantReadWriteLock here:
I can understand the readLock used in the getImage method, and the writeLock used in the setImage method, but why is a readLock used in the flush method? Isn't the flush method also some kind of "write", since it changes the map:
public void flush() {
    lock.readLock().lock();
    try {
        map.clear();
    } finally {
        lock.readLock().unlock();
    }
}
The other question: why not use a ConcurrentHashMap here, since it will provide some concurrent writes to different mapEntries and provide more concurrency than ReadWriteLock?
Second Question First:
ReentrantReadWriteLocks can be used to improve concurrency in some uses of some kinds of Collections. This is typically worthwhile only when the collections are expected to be large, accessed by more reader threads than writer threads, and entail operations with overhead that outweighs synchronization overhead. - from ReentrantReadWriteLock Documentation
All of the points mentioned above correspond to an image cache. As for "why not use a ConcurrentHashMap?" - ImageCache uses a LinkedHashMap which has no concurrent implementation. For speculation as to why, refer to this SO question: Why there is no ConcurrentLinkedHashMap class in jdk?
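To make the LinkedHashMap point concrete, here is a minimal sketch of a bounded cache guarded by a ReentrantReadWriteLock. It is not the Nimbus ImageCache source; the class name BoundedCacheSketch and its methods are hypothetical, and it assumes an insertion-ordered LinkedHashMap, where get() does not modify the map (in access-order mode get() reorders entries, so a read lock would not be enough).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration, not the JDK's ImageCache.
class BoundedCacheSketch<K, V> {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<K, V> map;

    BoundedCacheSketch(final int maxEntries) {
        // Insertion-ordered LinkedHashMap that evicts the eldest entry once full.
        this.map = new LinkedHashMap<K, V>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    V get(K key) {
        lock.readLock().lock(); // many readers may hold this at once
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    void put(K key, V value) {
        lock.writeLock().lock(); // structural modification: exclusive
        try {
            map.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    void flush() {
        lock.writeLock().lock(); // clear() also modifies the map, so it takes the write lock
        try {
            map.clear();
        } finally {
            lock.writeLock().unlock();
        }
    }

    int size() {
        lock.readLock().lock();
        try {
            return map.size();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

The eviction hook (removeEldestEntry) is the kind of feature that ConcurrentHashMap does not offer, which is one plausible reason to keep a LinkedHashMap behind a lock.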
First Question:
I too question why the flush method doesn't use the writeLock like the setImage method. After all it is structurally modifying the map.
After reviewing the javax.swing.plaf.nimbus.ImageCache and PixelCountSoftReference sources along with the ReentrantReadWriteLock and LinkedHashMap documentation, I'm left without a definitive answer.
I'm further confused by flush using a readLock because ReentrantReadWriteLock's documentation has the following example, where a writeLock is used when clearing a TreeMap:
// For example, here is a class using a TreeMap that is expected to be
// large and concurrently accessed.
class RWDictionary {
    private final Map<String, Data> m = new TreeMap<String, Data>();
    private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
    private final Lock w = rwl.writeLock();
    // other code left out for brevity
    public void clear() {
        w.lock(); // write lock
        try { m.clear(); } // clear the TreeMap
        finally { w.unlock(); }
    }
}
The only thing I can do is speculate.
Speculation:
Maybe the author(s) made a mistake, highly unlikely but not impossible.
It's intentional. I have some ideas as to why it may be intentional, but I'm not sure how to word them and they're probably wrong.
The author(s) were the only ones using the ImageCache code and knew when and how (not) to use the flush method. This is unlikely as well.
It would be interesting to ask the author(s) by email why they used a readLock instead of a writeLock, but no authors or email addresses are listed in the source. Perhaps sending an email to Oracle would result in an answer, but I'm not too sure how to go about that.
Hopefully someone will come along and provide an actual answer. Good question.
We have a Spring Boot service that simply provides data from a map. The map is updated regularly, triggered by a scheduler: we build a new intermediate map, load all the data needed into it, and assign it as soon as it is finished. To overcome concurrency issues we introduced a ReentrantReadWriteLock that takes the write lock only at the moment the intermediate map is assigned, and read locks while accessing the map. Please see the simplified code below:
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import org.springframework.stereotype.Service;

@Service
public class MyService {
    private final Lock readLock;
    private final Lock writeLock;
    private Map<String, SomeObject> myMap = new HashMap<>();

    public MyService() {
        final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
        readLock = rwLock.readLock();
        writeLock = rwLock.writeLock();
    }

    protected SomeObject getSomeObject(String key) {
        readLock.lock();
        try {
            // returns null when the key is absent
            return this.myMap.get(key);
        } finally {
            readLock.unlock();
        }
    }

    private void loadData() {
        Map<String, SomeObject> intermediateMyMap = new HashMap<>();
        // Now do some heavy processing till the new data is loaded to intermediateMyMap
        // swap in the new map
        writeLock.lock();
        try {
            myMap = intermediateMyMap;
        } finally {
            writeLock.unlock();
        }
    }
}
If we put the service under load with many accesses to the map, we still saw java.util.ConcurrentModificationException in the logs, and I have no clue why.
BTW: Meanwhile I also saw this question, which also seems to be a solution. Nevertheless, I would like to know what I did wrong or whether I misunderstood the concept of ReentrantReadWriteLock.
EDIT: Today I was provided with the full stack trace. As some of you argued, the issue is not related to this piece of code; it just happened coincidentally at the same time as the reload.
The problem was actually in the access to getSomeObject(). In the real code SomeObject is again a Map and this inner List gets sorted each time it is accessed (which is bad anyway, but that is another issue). So basically we ran into this issue
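One way to avoid sorting a shared list on every access is to sort a private copy. The sketch below only illustrates that idea; the class and method names are hypothetical, and it assumes the shared list is not modified by anyone after it has been published to readers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not taken from the code in the question.
class SortedViewSketch {
    // Returns a sorted copy and leaves the shared list untouched, so
    // concurrent readers never see the list being modified in place.
    static List<String> sortedCopy(List<String> shared) {
        List<String> copy = new ArrayList<>(shared);
        Collections.sort(copy);
        return copy;
    }
}
```

Sorting once while the intermediate map is built, before it is assigned, would avoid the repeated work altogether.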
I see nothing obviously wrong with the code. ReadWriteLock should provide the necessary memory ordering guarantees (See Memory Synchronization section at https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/Lock.html)
The problem might well be in the "heavy processing" part. A ConcurrentModificationException could also be caused by modifying the map while iterating over it from a single thread, but then you would see the same problem regardless of the load on the system.
As you already mentioned, for this pattern of replacing the whole map I think a volatile field or an AtomicReference would be the better and much simpler solution.
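As a rough sketch of the AtomicReference variant, assuming the map is never modified after it has been published (the class name SnapshotServiceSketch and the String value type are placeholders, not the code from the question):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration of swapping a whole map atomically.
class SnapshotServiceSketch {
    private final AtomicReference<Map<String, String>> current =
            new AtomicReference<>(Collections.<String, String>emptyMap());

    String get(String key) {
        // Readers take no lock; they see either the old or the new map, never a mix.
        return current.get().get(key);
    }

    void reload(Map<String, String> freshData) {
        // Build the replacement privately, then publish it in a single step.
        Map<String, String> next = new HashMap<>(freshData);
        current.set(Collections.unmodifiableMap(next));
    }
}
```

Wrapping the published map in unmodifiableMap guards against accidental writes through the shared reference.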
ReentrantReadWriteLock only guarantees the thread that holds the lock on the map can hold on to the lock if needed.
It does not guarantee myMap has not been cached behind the scenes.
A cached value could result in a stale read.
A stale read will give you the java.util.ConcurrentModificationException
myMap needs to be declared volatile to make the update visible to other threads.
From Java Concurrency in Practice:
volatile variables, to ensure that updates to a variable are
propagated predictably to other threads. When a field is declared
volatile, the compiler and runtime are put on notice that this
variable is shared and that operations on it should not be reordered
with other memory operations. Volatile variables are not cached in
registers or in caches where they are hidden from other processors, so
a read of a volatile variable always returns the most recent write by
any thread.
Peierls, Tim. Java Concurrency in Practice
An alternative would be to use synchronized on getSomeObject and a synchronized block on this around myMap = intermediateMyMap;
I have a class that is accessed by multiple threads, and I want to make sure it's thread safe. Plus it needs to be as fast as possible. This is just an example:
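import com.google.common.util.concurrent.Striped;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;

public class SharedClass {
    private final Map<String, String> data = new HashMap<>();
    private final Striped<ReadWriteLock> rwLockStripes = Striped.readWriteLock(100);

    public void setSomethingFastVersion(String key, String value) {
        // ReadWriteLock has no lock() of its own; take the stripe's write lock
        Lock writeLock = rwLockStripes.get(key).writeLock();
        writeLock.lock();
        try {
            // the guarded operation must run while the lock is held
            data.put(key, value);
        } finally {
            writeLock.unlock();
        }
    }

    public synchronized void setSomethingSlowVersion(String key, String value) {
        data.put(key, value);
    }
}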
I'm using Striped locks from Google Guava in one version, and plain synchronized in the other one.
Am I right saying that the Guava version should be faster?
If so, what would be a good use case for synchronized, where the StripedLocks would not fit?
BTW, I know I could use a simple ConcurrentHashMap here, but I'm adding the example code to make sure you understand my question.
Synchronized has been around for ages. It's not really surprising that we nowadays have more advanced mechanisms for concurrent programming.
However striped locks are advantageous only in cases where something can be partitioned or striped, such as locking parts of a map allowing different parts to be manipulated at the same time, but blocking simultaneous manipulations to the same stripe. In many cases you don't have that kind of partitioning, you're just looking for a mutex. In those cases synchronized is still a viable option, although a ReadWriteLock might be a better choice depending on the situation.
A ConcurrentHashMap has internal partitioning similar to stripes, but it applies only to the map operations such as put(). With an explicit StripedLock you can make longer operations atomic, while still allowing concurrency when operations don't touch the same stripe.
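To illustrate the point about longer atomic operations, here is a small sketch of lock striping built only from JDK classes. It is a hand-rolled stand-in for Guava's Striped, not its implementation; the class StripedCounterSketch and the incrementIfBelow operation are invented for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of lock striping with plain JDK locks.
class StripedCounterSketch {
    private final ReentrantLock[] stripes;
    private final ConcurrentHashMap<String, Integer> data = new ConcurrentHashMap<>();

    StripedCounterSketch(int stripeCount) {
        stripes = new ReentrantLock[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new ReentrantLock();
        }
    }

    // Keys with the same hash bucket share a lock; other keys proceed in parallel.
    private ReentrantLock stripeFor(String key) {
        return stripes[Math.floorMod(key.hashCode(), stripes.length)];
    }

    // A multi-step check-then-act that is atomic per key because the
    // stripe lock is held across the read, the check, and the write.
    boolean incrementIfBelow(String key, int limit) {
        ReentrantLock stripe = stripeFor(key);
        stripe.lock();
        try {
            int current = data.getOrDefault(key, 0);
            if (current >= limit) {
                return false;
            }
            data.put(key, current + 1);
            return true;
        } finally {
            stripe.unlock();
        }
    }

    int get(String key) {
        return data.getOrDefault(key, 0);
    }
}
```

The backing map is a ConcurrentHashMap on purpose: the stripes serialize work per key, but threads on different stripes still touch the same map concurrently, which a plain HashMap would not tolerate.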
Let me put it this way. Say you have 1000 instances of a class, and 1000 threads trying to access those instances. Each instance will acquire a lock for each thread. So that is 1000 locks, which will lead to huge memory consumption. In this case striped locks could come in handy.
But in the normal case where you have a singleton class, you may not need striped locks and can go ahead and use the synchronized keyword.
So, I hope I answered when to use what.
Use a ConcurrentHashMap so you won't have to do any of your own synchronizing.
The ReadWriteLock javadoc at Oracle and its implementation describe what the lock does and how to use it but don't say anything about whether to use the volatile keyword.
This is not the same question as do-all-mutable-variables-need-to-be-volatile-when-using-locks because I'm happy that the lock will synchronise access and visibility properly, but is the use of volatile for the variables still a good idea, e.g. for compiler optimisations or any other reasons?
My cached data consists of a rarely changed List and several Maps mapping the objects in the list using various attributes of the objects.
private void reload() {
    Set<Registration> newBeans = dao.listRegistered();
    beans = Collections.unmodifiableSet(newBeans);
    codeToBeanMap.clear();
    userToBeanMap.clear();
    nameToBeanMap.clear();
    idToBeanMap.clear();
    for (Registration bean : newBeans) {
        codeToBeanMap.put(bean.getCode(), bean);
        userToBeanMap.put(bean.getUser(), bean);
        nameToBeanMap.put(bean.getName(), bean);
        idToBeanMap.put(bean.getId(), bean);
    }
}
What would be the best declarations? I have this:
private Set<Registration> beans; // the field assigned in reload()
private final Map<String, Registration> codeToBeanMap =
        new HashMap<String, Registration>();
private final Map<String, Registration> userToBeanMap =
        new HashMap<String, Registration>();
private final Map<String, Registration> nameToBeanMap =
        new HashMap<String, Registration>();
private final Map<String, Registration> idToBeanMap =
        new HashMap<String, Registration>();
If you are going to access a variable exclusively from code guarded by a lock, then it would be redundant to specify that variable as volatile and would incur a certain performance hit (it's in the nanosecond range).
Your declarations like this are fine for concurrency:
private final Map<String, Registration> idToBeanMap = new HashMap<>();
... because Java guarantees that fields declared final will be initialised before other threads can access them (Java 1.5 onwards, at least). The volatile keyword is not needed (and makes no sense with final anyway).
However, this says nothing about the contents of the HashMap - this would not be thread safe and different threads may see different or inconsistent content unless you use synchronized code blocks. Your simplest solution would be to use the ConcurrentHashMap instead. This would guarantee that all threads would see the expected Map content.
However, this only means operations on the maps would be atomic. In your code above it is possible you might have cleared one map but not the others (for example) at the point another thread tries to read the data which might give inconsistent results.
So, because it looks like your code requires multiple Maps to remain in sync and consistent with each other the only way is to make your reload method synchronized and also make clients reading the data do so through a synchronized method on the same object.
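Since the question is about ReadWriteLock, the same idea can be sketched with a single ReentrantReadWriteLock guarding all the maps instead of synchronized methods. This is only an illustration: the Registration class here is a two-field stand-in, only two of the four maps are shown, and the reload method takes its data as a parameter rather than from a DAO.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration: one lock keeps several maps consistent with each other.
class RegistrationCacheSketch {
    static final class Registration {
        final String code;
        final String user;

        Registration(String code, String user) {
            this.code = code;
            this.user = user;
        }
    }

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, Registration> codeToBeanMap = new HashMap<>();
    private final Map<String, Registration> userToBeanMap = new HashMap<>();

    void reload(List<Registration> fresh) {
        lock.writeLock().lock(); // readers wait until every map has been rebuilt
        try {
            codeToBeanMap.clear();
            userToBeanMap.clear();
            for (Registration bean : fresh) {
                codeToBeanMap.put(bean.code, bean);
                userToBeanMap.put(bean.user, bean);
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    Registration byCode(String code) {
        lock.readLock().lock(); // shared with other readers
        try {
            return codeToBeanMap.get(code);
        } finally {
            lock.readLock().unlock();
        }
    }

    Registration byUser(String user) {
        lock.readLock().lock();
        try {
            return userToBeanMap.get(user);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Because every access goes through the lock, the map fields need neither volatile nor a concurrent implementation.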
but is the use of volatile for the variables still a good idea, e.g. for compiler optimisations or any other reasons?
If anything volatile prevents compiler optimizations instead of enabling them.
The ReadWriteLock javadoc at Oracle and its implementation describe what the lock does and how to use it but don't say anything about whether to use the volatile keyword.
It does. The read and write locks both implement the Lock interface, whose documentation states that implementations must provide the same memory visibility guarantees as a synchronized block:
All Lock implementations must enforce the same memory synchronization semantics as provided by the built-in monitor lock, as described in section 17.4 of The Java™ Language Specification:
A successful lock operation has the same memory synchronization effects as a successful Lock action.
A successful unlock operation has the same memory synchronization effects as a successful Unlock action.
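A small sketch of what that guarantee means in practice: a plain, non-volatile field that is only ever touched while holding the lock. The class GuardedCounterSketch and its runDemo helper are made up for this illustration.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration: the lock alone provides visibility and atomicity.
class GuardedCounterSketch {
    private final Lock lock = new ReentrantLock();
    private int count; // deliberately not volatile

    void increment() {
        lock.lock();
        try {
            count++;
        } finally {
            lock.unlock();
        }
    }

    int get() {
        lock.lock();
        try {
            return count;
        } finally {
            lock.unlock();
        }
    }

    // Starts the given number of threads, each incrementing the shared counter,
    // waits for them to finish, and returns the final value.
    static int runDemo(int threadCount, final int incrementsPerThread) {
        final GuardedCounterSketch counter = new GuardedCounterSketch();
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < incrementsPerThread; n++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.get();
    }
}
```

Every increment happens after the previous unlock, so no update is lost and the final read sees all of them.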
I'm still quite shaky on multi-threading in Java. What I describe here is at the very heart of my application and I need to get this right. The solution needs to work fast and it needs to be practically safe. Will this work? Any suggestions/criticism/alternative solutions welcome.
Objects used within my application are somewhat expensive to generate but change rarely, so I am caching them in *.temp files. It is possible for one thread to try and retrieve a given object from cache, while another is trying to update it there. Cache operations of retrieve and store are encapsulated within a CacheService implementation.
Consider this scenario:
Thread 1: retrieve cache for objectId "page_1".
Thread 2: update cache for objectId "page_1".
Thread 3: retrieve cache for objectId "page_2".
Thread 4: retrieve cache for objectId "page_3".
Thread 5: retrieve cache for objectId "page_4".
Note: thread 1 appears to retrieve an obsolete object, because thread 2 has a newer copy of it. This is perfectly OK so I do not need any logic that will give thread 2 priority.
If I synchronize retrieve/store methods on my service, then I'm unnecessarily slowing things down for threads 3, 4 and 5. Multiple retrieve operations will be effective at any given time but the update operation will be called rarely. This is why I want to avoid method synchronization.
I gather I need to synchronize on an object that is exclusively common to thread 1 and 2, which implies a lock object registry. Here, an obvious choice would be a Hashtable but again, operations on Hashtable are synchronized, so I'm trying a HashMap. The map stores a string object to be used as a lock object for synchronization and the key/value would be the id of the object being cached. So for object "page_1" the key would be "page_1" and the lock object would be a string with a value of "page_1".
If I've got the registry right, then additionally I want to protect it from being flooded with too many entries. Let's not get into details why. Let's just assume, that if the registry has grown past defined limit, it needs to be reinitialized with 0 elements. This is a bit of a risk with an unsynchronized HashMap but this flooding would be something that is outside of normal application operation. It should be a very rare occurrence and hopefully never takes place. But since it is possible, I want to protect myself from it.
@Service
public class CacheServiceImpl implements CacheService {
    private static ConcurrentHashMap<String, String> objectLockRegistry = new ConcurrentHashMap<>();

    public Object getObject(String objectId) {
        String objectLock = getObjectLock(objectId);
        if (objectLock != null) {
            synchronized (objectLock) {
                // read object from objectInputStream
            }
        }
        return null; // placeholder: the real code returns the object read above
    }

    public boolean storeObject(String objectId, Object object) {
        String objectLock = getObjectLock(objectId);
        synchronized (objectLock) {
            // write object to objectOutputStream
        }
        return true; // placeholder: the real code reports whether the write succeeded
    }

    private String getObjectLock(String objectId) {
        int objectLockRegistryMaxSize = 100_000;
        // reinitialize registry if necessary
        if (objectLockRegistry.size() > objectLockRegistryMaxSize) {
            // hoping to never reach this point but it is not impossible to get here
            synchronized (objectLockRegistry) {
                if (objectLockRegistry.size() > objectLockRegistryMaxSize) {
                    objectLockRegistry.clear();
                }
            }
        }
        // add lock to registry if necessary
        objectLockRegistry.putIfAbsent(objectId, new String(objectId));
        String objectLock = objectLockRegistry.get(objectId);
        return objectLock;
    }
}
If you are reading from disk, lock contention is not going to be your performance issue.
You can have both threads grab the lock for the entire cache, do a read, if the value is missing, release the lock, read from disk, acquire the lock, and then if the value is still missing write it, otherwise return the value that is now there.
The only issue you will have with that is the concurrent reads thrashing the disk... but the OS caches will be hot, so the disk shouldn't be overly thrashed.
If that is an issue then switch your cache to holding a Future<V> in place of a <V>.
The get method will become something like:
public V get(K key) throws InterruptedException, ExecutionException {
    Future<V> future;
    synchronized (this) {
        future = backingCache.get(key);
        if (future == null) {
            future = executorService.submit(new LoadFromDisk(key));
            backingCache.put(key, future);
        }
    }
    // blocks until the load has finished; Future.get() throws checked exceptions
    return future.get();
}
Yes, that is a global lock... but you're reading from disk, and don't optimize until you have a proven performance bottleneck...
Oh. First optimization, replace the map with a ConcurrentHashMap and use putIfAbsent and you'll have no lock at all! (BUT only do that when you know this is an issue)
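A sketch of that putIfAbsent variant, under the assumption that the load can run on the calling thread via a FutureTask rather than on an executor. The class FutureCacheSketch and its loader function are hypothetical stand-ins for the real disk read.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.function.Function;

// Hypothetical illustration of a lock-free cache of futures.
class FutureCacheSketch<K, V> {
    private final ConcurrentMap<K, FutureTask<V>> backingCache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // stands in for the disk read

    FutureCacheSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        FutureTask<V> future = backingCache.get(key);
        if (future == null) {
            FutureTask<V> created = new FutureTask<>(() -> loader.apply(key));
            // Only one thread wins the race to register its task for this key.
            future = backingCache.putIfAbsent(key, created);
            if (future == null) {
                future = created;
                created.run(); // the winner performs the load on its own thread
            }
        }
        try {
            return future.get(); // everyone else waits for the winner's result
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            backingCache.remove(key, future); // do not cache a failed load
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

Each key is loaded at most once per successful load, and threads asking for different keys never block one another.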
The complexity of your scheme has already been discussed. That leads to hard to find bugs. For example, not only do you lock on non-final variables, but you even change them in the middle of synchronized blocks that use them as a lock. Multi-threading is very hard to reason about, this kind of code makes it almost impossible:
synchronized (objectLockRegistry) {
    if (objectLockRegistry.size() > objectLockRegistryMaxSize) {
        objectLockRegistry = new HashMap<>(); //brrrrrr...
    }
}
In particular, 2 simultaneous calls to get a lock on a specific string might actually return 2 different instances of the same string, each stored in a different instance of your hashmap (unless they are interned), and you won't be locking on the same monitor.
You should either use an existing library or keep it a lot simpler.
If your question includes the keywords "optimize", "concurrent", and your solution includes a complicated locking scheme ... you're doing it wrong. It is possible to succeed at this sort of venture, but the odds are stacked against you. Prepare to diagnose bizarre concurrency bugs, including but not limited to, deadlock, livelock, cache incoherency... I can spot multiple unsafe practices in your example code.
Pretty much the only way to create a safe and effective concurrent algorithm without being a concurrency god is to take one of the pre-baked concurrent classes and adapt them to your need. It's just too hard to do unless you have an exceptionally convincing reason.
You might take a look at ConcurrentMap. You might also like CacheBuilder.
Using Threads and synchronized directly is covered at the beginning of most tutorials about multithreading and concurrency. However, many real-world examples require more sophisticated locking and concurrency schemes, which are cumbersome and error-prone if you implement them yourself. To avoid reinventing the wheel over and over again, the Java concurrency library was created. There, you can find many classes that will be of great help to you. Try googling for tutorials about Java concurrency and locks.
As an example for a lock which might help you, see http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/ReadWriteLock.html .
Rather than roll your own cache I would take a look at Google's MapMaker. Something like this will give you a lock cache that automatically expires unused entries as they are garbage collected:
ConcurrentMap<String,String> objectLockRegistry = new MapMaker()
    .softValues()
    .makeComputingMap(new Function<String,String>() {
        public String apply(String s) {
            // a fresh String instance to serve as the lock object
            return new String(s);
        }
    });
With this, the whole getObjectLock implementation is simply return objectLockRegistry.get(objectId) - the map takes care of all the "create if not already present" stuff for you in a safe way.
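The same "create if not already present" behaviour can be sketched with only the JDK, using ConcurrentHashMap.computeIfAbsent in place of a computing map. This is a different technique from the MapMaker one: the class LockRegistrySketch is hypothetical, and unlike softValues() it never evicts entries.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical illustration of a per-key lock registry without Guava.
class LockRegistrySketch {
    private final ConcurrentMap<String, Object> locks = new ConcurrentHashMap<>();

    // Atomically returns the one lock object for this id, creating it on first use,
    // so two concurrent callers with equal ids always get the same monitor.
    Object lockFor(String objectId) {
        return locks.computeIfAbsent(objectId, id -> new Object());
    }
}
```

A caller would then write synchronized (registry.lockFor(objectId)) { ... } around the file read or write.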
I would do it similarly to you: just create a map of lock objects (new Object()).
Unlike you, I would use a TreeMap<String, Object>
or a HashMap.
Call it the lockMap, with one entry per file to lock. The lockMap is publicly available to all participating threads.
Each read and write of a specific file gets its lock from the map and uses synchronized(lock) on that lock object.
If the lockMap is not fixed and its content can change, then reading and writing the map must be synchronized too (synchronized (this.lockMap) {...}).
But your getObjectLock() is not safe; synchronize all of it with your lock. (Double-checked locking is not thread safe in Java!) A recommended book: Doug Lea, Concurrent Programming in Java.
I'm somewhat new to multithreaded environments and I'm trying to come up with the best solution for the following situation:
I read data from a database once daily in the morning and store it in a HashMap in a Singleton object. I have a setter method that is called only when an intra-day DB change occurs (which will happen 0-2 times a day).
I also have a getter which returns an element in the map, and this method is called hundreds of times a day.
I'm worried about the case where the getter is called while I'm emptying and recreating the HashMap, thus trying to find an element in an empty/malformed list. If I make these methods synchronized, it prevents two readers from accessing the getter at the same time, which could be a performance bottleneck. I don't want to take too much of a performance hit since writes happen so infrequently. If I use a ReentrantReadWriteLock, will this force a queue on anyone calling the getter until the write lock is released? Does it allow multiple readers to access the getter at the same time? Will it enforce only one writer at a time?
Is coding this just a matter of...
private final ReentrantReadWriteLock readWriteLock = new ReentrantReadWriteLock();
private final Lock read = readWriteLock.readLock();
private final Lock write = readWriteLock.writeLock();

public HashMap getter(String a) {
    read.lock();
    try {
        return myStuff_.get(a);
    } finally {
        read.unlock();
    }
}

public void setter()
{
    write.lock();
    try {
        myStuff_ = // my logic
    } finally {
        write.unlock();
    }
}
Another way to achieve this (without using locks) is the copy-on-write pattern. It works well when you do not write often. The idea is to copy and replace the field itself. It may look like the following:
private volatile Map<String,HashMap> myStuff_ = new HashMap<String,HashMap>();

public HashMap getter(String a) {
    return myStuff_.get(a);
}

public synchronized void setter() {
    // create a copy from the original
    Map<String,HashMap> copy = new HashMap<String,HashMap>(myStuff_);
    // populate the copy
    // replace the original with the copy
    myStuff_ = copy;
}
With this, the readers are fully concurrent, and the only penalty they pay is a volatile read on myStuff_ (which is very little). The writers are synchronized to ensure mutual exclusion.
Yes, if the write lock is held by a thread then other threads accessing the getter method would block since they cannot acquire the read lock. So you are fine here. For more details please read the JavaDoc of ReentrantReadWriteLock - http://download.oracle.com/javase/6/docs/api/java/util/concurrent/locks/ReentrantReadWriteLock.html
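The sharing rules can be probed with a short experiment. The sketch below (class and method names are invented) holds the read lock on one thread and lets a second thread try both locks without blocking.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical probe of read/write lock sharing.
class ReadWriteProbeSketch {
    // Returns { secondReaderGotIn, writerGotIn }, measured while
    // the calling thread is holding the read lock.
    static boolean[] probeWhileReadLocked() {
        final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        final boolean[] result = new boolean[2];
        rw.readLock().lock();
        try {
            Thread probe = new Thread(() -> {
                result[0] = rw.readLock().tryLock();  // another reader
                if (result[0]) {
                    rw.readLock().unlock();
                }
                result[1] = rw.writeLock().tryLock(); // a writer
                if (result[1]) {
                    rw.writeLock().unlock();
                }
            });
            probe.start();
            probe.join(); // join makes the probe's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            rw.readLock().unlock();
        }
        return result;
    }
}
```

The second reader gets in while the writer does not, which matches the behaviour described above: readers share the lock, and a writer needs it exclusively.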
You're kicking this thing off at the start of the day, you'll update it 0-2 times a day, and you're reading it hundreds of times per day. Even assuming that each read takes, say, 1 full second (a long time), in an 8-hour day (28,800 seconds) you've still got a very low read load. Looking at the docs for ReentrantReadWriteLock, you can tweak the mode so that it is "fair", which means the thread that has been waiting the longest gets the lock. So if you set it to be fair, I don't think that your write thread(s) are going to be starved.
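Fair mode is selected through the constructor argument. A minimal sketch, with a made-up class name and a String-to-String map standing in for the real data:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration of a lookup table behind a fair read/write lock.
class FairLookupSketch {
    // 'true' requests the fair ordering policy; the no-arg constructor is non-fair.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
    private Map<String, String> data = new HashMap<>();

    boolean isFair() {
        return lock.isFair();
    }

    String get(String key) {
        lock.readLock().lock();
        try {
            return data.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    void replaceAll(Map<String, String> fresh) {
        Map<String, String> next = new HashMap<>(fresh); // build outside the lock
        lock.writeLock().lock();
        try {
            data = next; // hold the write lock only for the swap
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Fair locks generally trade some throughput for predictable ordering, which should not matter at a few hundred reads a day.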
References
ReentrantReadWriteLock