Synchronized vs striped lock - Java

I have a class that is accessed by multiple threads, and I want to make sure it's thread safe. Plus it needs to be as fast as possible. This is just an example:
public class SharedClass {
    private final Map<String, String> data = new HashMap<>();
    private final Striped<ReadWriteLock> rwLockStripes = Striped.readWriteLock(100);

    public void setSomethingFastVersion(String key, String value) {
        ReadWriteLock rwLock = rwLockStripes.get(key);
        rwLock.writeLock().lock();
        try {
            data.put(key, value);
        } finally {
            rwLock.writeLock().unlock();
        }
    }

    public synchronized void setSomethingSlowVersion(String key, String value) {
        data.put(key, value);
    }
}
I'm using StripedLock from Google Guava in one version, and a normal synchronized on the other one.
Am I right saying that the Guava version should be faster?
If so, what would be a good use case for synchronized, where the StripedLocks would not fit?
BTW, I know I could use a simple ConcurrentHashMap here, but I'm adding the example code to make sure you understand my question.

Synchronized has been around for ages. It's not really surprising that we nowadays have more advanced mechanisms for concurrent programming.
However, striped locks are advantageous only where something can be partitioned or striped: for example, locking parts of a map so that different parts can be manipulated at the same time while simultaneous manipulation of the same stripe is blocked. In many cases you don't have that kind of partitioning and are just looking for a mutex. In those cases synchronized is still a viable option, although a ReadWriteLock might be a better choice depending on the situation.
A ConcurrentHashMap has internal partitioning similar to stripes, but it applies only to the map operations such as put(). With an explicit StripedLock you can make longer operations atomic, while still allowing concurrency when operations don't touch the same stripe.
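To make that concrete, here is a hedged sketch (class and method names are illustrative): the striped lock makes a compound get-then-put atomic per key, while the backing ConcurrentHashMap keeps the individual map operations safe across stripes.

import com.google.common.util.concurrent.Striped;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.Lock;

public class StripedCounter {
    // ConcurrentHashMap keeps single operations safe even across stripes
    private final Map<String, Integer> counts = new ConcurrentHashMap<>();
    private final Striped<Lock> stripes = Striped.lock(100);

    public void increment(String key) {
        Lock lock = stripes.get(key);
        lock.lock();
        try {
            // this read-modify-write is atomic for the key's stripe;
            // keys on other stripes proceed concurrently
            Integer current = counts.get(key);
            counts.put(key, current == null ? 1 : current + 1);
        } finally {
            lock.unlock();
        }
    }
}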

Let me put it this way. Say you have 1000 instances of a class and 1000 threads trying to access those instances. If each instance holds its own lock, that is 1000 lock objects, which can add up to significant memory consumption. In that case striped locks come in handy: a fixed pool of locks is shared across all instances.
But in the normal case, where you have a singleton class, you may not need striped locks and can go ahead and use the synchronized keyword.
So, I hope I answered when to use what; a sketch of the first case follows.
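A hedged sketch of the memory point (the class is made up for illustration): one static Striped<Lock> bounds the total number of lock objects no matter how many instances exist.

import com.google.common.util.concurrent.Striped;
import java.util.concurrent.locks.Lock;

public class Worker {
    // 64 locks shared by all Worker instances, instead of one lock per instance
    private static final Striped<Lock> LOCKS = Striped.lock(64);

    private final String id;

    public Worker(String id) {
        this.id = id;
    }

    public void doWork() {
        // instances whose ids hash to the same stripe share a lock
        Lock lock = LOCKS.get(id);
        lock.lock();
        try {
            // critical section
        } finally {
            lock.unlock();
        }
    }
}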

Use a ConcurrentHashMap so you won't have to do any of your own synchronizing.
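For the code in the question, that reduces to something like this sketch:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SharedClass {
    private final Map<String, String> data = new ConcurrentHashMap<>();

    public void setSomething(String key, String value) {
        data.put(key, value); // thread safe without any external locking
    }
}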

Related

Using Threads and Flyweight pattern together in Java?

I'm new to both multi-threading and design patterns.
I have some threads using explicit multi-threading, and each is supposed to compute the factorial of a number if it hasn't ever been computed by any thread. I'm using the Flyweight pattern for this.
public class Fact {
    private final long comp;
    private static Map<String, Fact> instances = new HashMap<String, Fact>();

    private Fact(long comp) {
        this.comp = comp;
    }

    public static Fact getInstance(int num) {
        String key = String.valueOf(num);
        if (!instances.containsKey(key)) {
            long comp = ...; // calculate factorial of num
            instances.put(key, new Fact(comp));
        }
        return instances.get(key);
    }

    public long getComp() {
        return this.comp;
    }
}
public class Th implements Runnable {
    // code elided
    @Override
    public void run() {
        // get the number and check if it's already in the HashMap; if not, compute
    }
}
If I do so, is it right to say that my threads Th are computing the factorials?
If I add the computation to the Fact (flyweight) class, does it remain a flyweight? I guess yes.
Any other way of doing what I wish would be highly appreciated as well.
There are a couple of aims you might have here; what to do depends on what you are trying to achieve.
In this case it seems you are attempting to avoid repeated computation, but the computation is not particularly expensive, so a shared map could run into lock contention. To make it thread safe without any shared locking, use a ThreadLocal<Map<String, Fact>>, potentially an InheritableThreadLocal<Map<String, Fact>> where childValue copies the Map.
Often there are a known set of values that are likely to be common, and you just want these. In that case, compute a Map (or array) during class static initialisation.
If you want the flyweights to be shared between threads and be unique, use a ConcurrentHashMap together with the Map.computeIfAbsent method.
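A minimal sketch of that option, adapted to the question's Fact class and keyed on Integer (the factorial helper is illustrative):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public final class Fact {
    private static final Map<Integer, Fact> INSTANCES = new ConcurrentHashMap<>();

    private final long comp;

    private Fact(long comp) {
        this.comp = comp;
    }

    public static Fact getInstance(int num) {
        // computeIfAbsent is atomic per key: the mapping function runs
        // at most once per key, so each flyweight stays unique
        return INSTANCES.computeIfAbsent(num, n -> new Fact(factorial(n)));
    }

    private static long factorial(int n) {
        long result = 1;
        for (int i = 2; i <= n; i++) {
            result *= i;
        }
        return result;
    }

    public long getComp() {
        return comp;
    }
}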
If you want the flyweights to be shared between threads, be unique, and you want to make sure the computation is done only once, it gets a bit more difficult. You need to put (if absent) a placeholder into the ConcurrentMap; if the current thread wins, replace that with the computed value and notify, otherwise wait for the computation.
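Here is a hedged sketch of that placeholder approach, using a FutureTask as the placeholder (essentially the memoizer from Java Concurrency in Practice; it assumes Fact exposes a suitable constructor):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

public final class FactCache {
    private final ConcurrentMap<Integer, Future<Fact>> cache = new ConcurrentHashMap<>();

    public Fact getInstance(final int num) throws InterruptedException, ExecutionException {
        Future<Fact> f = cache.get(num);
        if (f == null) {
            FutureTask<Fact> task = new FutureTask<>(() -> new Fact(factorial(num)));
            f = cache.putIfAbsent(num, task); // the placeholder; only one thread wins
            if (f == null) {
                f = task;
                task.run(); // the winner performs the computation exactly once
            }
        }
        return f.get(); // losers block here until the winner finishes
    }

    private static long factorial(int n) {
        long result = 1;
        for (int i = 2; i <= n; i++) {
            result *= i;
        }
        return result;
    }
}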
Now if you want the flyweights to be garbage collected, you would want a WeakHashMap. This cannot be a ConcurrentMap using the Java SE collections, which makes it a bit hopeless; you can use good old fashioned locking. Alternatively the value can be a WeakReference<Fact>, but you'll need to manage eviction yourself.
It may be that a strong reference to Fact is kept only intermittently but you don't want it to be recreated too often, in which case you will need SoftReference instead of WeakReference. Indeed, WeakHashMap can behave surprisingly, in some circumstances causing performance to drop to unusable after previously working fine.
(Note, in this case your Map would be better keyed on Integer.)
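And a sketch of the WeakReference-value variant under the caveats above (SoftReference can be swapped in the same way, and again an accessible Fact constructor is assumed). ConcurrentMap.compute keeps instances unique, and a local strong reference stops a freshly built Fact from being collected before it is returned; cleared entries still have to be pruned by hand:

import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public final class WeakFactCache {
    private final ConcurrentMap<Integer, WeakReference<Fact>> cache = new ConcurrentHashMap<>();

    public Fact getInstance(int num) {
        Fact[] strong = new Fact[1]; // keeps the instance reachable until we return it
        cache.compute(num, (k, ref) -> {
            Fact existing = (ref == null) ? null : ref.get();
            if (existing == null) {
                existing = new Fact(factorial(k)); // absent or already collected
            }
            strong[0] = existing;
            return new WeakReference<>(existing);
        });
        // cleared WeakReferences leave stale entries behind; eviction
        // (e.g. an occasional sweep) is left to the caller, as noted above
        return strong[0];
    }

    private static long factorial(int n) {
        long result = 1;
        for (int i = 2; i <= n; i++) {
            result *= i;
        }
        return result;
    }
}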

Usage of ReentrantReadWriteLock in javax.swing.plaf.nimbus.ImageCache

When reading JDK codes, I tried to find some usage of ReentrantReadWriteLock, and found that the only usage is in javax.swing.plaf.nimbus.ImageCache.
I have two questions with the usage of ReentrantReadWriteLock here:
I can understand the readLock used in the getImage method, and the writeLock used in the setImage method, but why is a readLock used in the flush method? Isn't the flush method also some kind of "write", since it changes the map:
public void flush() {
    lock.readLock().lock();
    try {
        map.clear();
    } finally {
        lock.readLock().unlock();
    }
}
The other question: why not use a ConcurrentHashMap here, since it would allow concurrent writes to different map entries and provide more concurrency than a ReadWriteLock?
Second Question First:
ReentrantReadWriteLocks can be used to improve concurrency in some uses of some kinds of Collections. This is typically worthwhile only when the collections are expected to be large, accessed by more reader threads than writer threads, and entail operations with overhead that outweighs synchronization overhead. - from ReentrantReadWriteLock Documentation
All of the points mentioned above correspond to an image cache. As for "why not use a ConcurrentHashMap?" - ImageCache uses a LinkedHashMap which has no concurrent implementation. For speculation as to why, refer to this SO question: Why there is no ConcurrentLinkedHashMap class in jdk?
First Question:
I too question why the flush method doesn't use the writeLock like the setImage method. After all, it is structurally modifying the map.
After reviewing the javax.swing.plaf.nimbus.ImageCache and PixelCountSoftReference sources along with the ReentrantReadWriteLock and LinkedHashMap documentations, I'm left without a definitive answer.
Although I'm further confused by flush using a readLock, since ReentrantReadWriteLock's documentation has the following example, where a writeLock is used when clearing a TreeMap.
// For example, here is a class using a TreeMap that is expected to be
// large and concurrently accessed.
class RWDictionary {
    private final Map<String, Data> m = new TreeMap<String, Data>();
    private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
    private final Lock w = rwl.writeLock();

    // other code left out for brevity

    public void clear() {
        w.lock(); // write lock
        try {
            m.clear(); // clear the TreeMap
        } finally {
            w.unlock();
        }
    }
}
The only thing I can do is speculate.
Speculation:
Maybe the author(s) made a mistake, highly unlikely but not impossible.
It's intentional. I have some ideas as to why it may be intentional, but I'm not sure how to word them and they're probably wrong.
The author(s) were the only ones using the ImageCache code and knew when and how (not) to use the flush method. This is unlikely as well.
It would be interesting to ask the author(s) why they used a readLock instead of a writeLock, via email, but no authors or emails are listed in the source. Perhaps sending an email to Oracle would get an answer; I'm not too sure how to go about that.
Hopefully someone will come along and provide an actual answer. Good question.

How to optimize concurrent operations in Java?

I'm still quite shaky on multi-threading in Java. What I describe here is at the very heart of my application and I need to get this right. The solution needs to work fast and it needs to be practically safe. Will this work? Any suggestions/criticism/alternative solutions welcome.
Objects used within my application are somewhat expensive to generate but change rarely, so I am caching them in *.temp files. It is possible for one thread to try and retrieve a given object from cache, while another is trying to update it there. Cache operations of retrieve and store are encapsulated within a CacheService implementation.
Consider this scenario:
Thread 1: retrieve cache for objectId "page_1".
Thread 2: update cache for objectId "page_1".
Thread 3: retrieve cache for objectId "page_2".
Thread 4: retrieve cache for objectId "page_3".
Thread 5: retrieve cache for objectId "page_4".
Note: thread 1 appears to retrieve an obsolete object, because thread 2 has a newer copy of it. This is perfectly OK so I do not need any logic that will give thread 2 priority.
If I synchronize retrieve/store methods on my service, then I'm unnecessarily slowing things down for threads 3, 4 and 5. Multiple retrieve operations will be effective at any given time but the update operation will be called rarely. This is why I want to avoid method synchronization.
I gather I need to synchronize on an object that is exclusively common to thread 1 and 2, which implies a lock object registry. Here, an obvious choice would be a Hashtable but again, operations on Hashtable are synchronized, so I'm trying a HashMap. The map stores a string object to be used as a lock object for synchronization and the key/value would be the id of the object being cached. So for object "page_1" the key would be "page_1" and the lock object would be a string with a value of "page_1".
If I've got the registry right, then additionally I want to protect it from being flooded with too many entries. Let's not get into details why; let's just assume that if the registry has grown past a defined limit, it needs to be reinitialized with 0 elements. This is a bit of a risk with an unsynchronized HashMap, but this flooding would be something outside of normal application operation. It should be a very rare occurrence and hopefully never take place. But since it is possible, I want to protect myself from it.
@Service
public class CacheServiceImpl implements CacheService {

    private static ConcurrentHashMap<String, String> objectLockRegistry = new ConcurrentHashMap<>();

    public Object getObject(String objectId) {
        String objectLock = getObjectLock(objectId);
        if (objectLock != null) {
            synchronized (objectLock) {
                // read object from objectInputStream
            }
        }
        // ... (rest elided in the question)
    }

    public boolean storeObject(String objectId, Object object) {
        String objectLock = getObjectLock(objectId);
        synchronized (objectLock) {
            // write object to objectOutputStream
        }
        // ... (rest elided in the question)
    }

    private String getObjectLock(String objectId) {
        int objectLockRegistryMaxSize = 100_000;
        // reinitialize registry if necessary
        if (objectLockRegistry.size() > objectLockRegistryMaxSize) {
            // hoping to never reach this point, but it is not impossible to get here
            synchronized (objectLockRegistry) {
                if (objectLockRegistry.size() > objectLockRegistryMaxSize) {
                    objectLockRegistry.clear();
                }
            }
        }
        // add lock to registry if necessary
        objectLockRegistry.putIfAbsent(objectId, new String(objectId));
        return objectLockRegistry.get(objectId);
    }
}
If you are reading from disk, lock contention is not going to be your performance issue.
You can have each thread grab the lock for the entire cache and do a read; if the value is missing, release the lock, read from disk, re-acquire the lock, and then, if the value is still missing, write it; otherwise return the value that is now there.
The only issue you will have with that is concurrent reads thrashing the disk... but the OS caches will be hot, so the disk shouldn't be thrashed too badly.
If that is an issue, then switch your cache to holding a Future<V> in place of a V.
The get method will become something like:
public V get(K key) throws InterruptedException, ExecutionException {
    Future<V> future;
    synchronized (this) {
        future = backingCache.get(key);
        if (future == null) {
            future = executorService.submit(new LoadFromDisk(key));
            backingCache.put(key, future);
        }
    }
    return future.get();
}
Yes, that is a global lock... but you're reading from disk; don't optimize until you have a proven performance bottleneck...
Oh, and a first optimization: replace the map with a ConcurrentHashMap and use putIfAbsent, and you'll have no explicit lock at all! (But only do that when you know this is an issue.)
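Here is a hedged sketch of that lock-free version, reusing the names from the snippet above and assuming backingCache is now a ConcurrentMap<K, Future<V>> and LoadFromDisk is the same Callable<V> as before:

public V get(K key) throws InterruptedException, ExecutionException {
    Future<V> future = backingCache.get(key);
    if (future == null) {
        FutureTask<V> task = new FutureTask<>(new LoadFromDisk(key));
        future = backingCache.putIfAbsent(key, task); // atomic; no explicit lock anywhere
        if (future == null) {
            future = task;
            task.run(); // only the thread that won the putIfAbsent reads from disk
        }
    }
    return future.get(); // other threads block on the same Future
}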
The complexity of your scheme has already been discussed; it leads to hard-to-find bugs. For example, not only do you lock on non-final variables, but you even change them in the middle of synchronized blocks that use them as a lock. Multi-threading is very hard to reason about; this kind of code makes it almost impossible:
synchronized (objectLockRegistry) {
    if (objectLockRegistry.size() > objectLockRegistryMaxSize) {
        objectLockRegistry = new HashMap<>(); // brrrrrr...
    }
}
In particular, 2 simultaneous calls to get a lock on a specific string might actually return 2 different instances of the same string, each stored in a different instance of your hashmap (unless they are interned), and you won't be locking on the same monitor.
You should either use an existing library or keep it a lot simpler.
If your question includes the keywords "optimize", "concurrent", and your solution includes a complicated locking scheme ... you're doing it wrong. It is possible to succeed at this sort of venture, but the odds are stacked against you. Prepare to diagnose bizarre concurrency bugs, including but not limited to, deadlock, livelock, cache incoherency... I can spot multiple unsafe practices in your example code.
Pretty much the only way to create a safe and effective concurrent algorithm without being a concurrency god is to take one of the pre-baked concurrent classes and adapt them to your need. It's just too hard to do unless you have an exceptionally convincing reason.
You might take a look at ConcurrentMap. You might also like CacheBuilder.
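For instance, here is a hedged sketch using Guava's CacheBuilder (readObjectFromDisk is a hypothetical helper standing in for your deserialization code):

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

LoadingCache<String, Object> cache = CacheBuilder.newBuilder()
        .maximumSize(100_000)
        .build(new CacheLoader<String, Object>() {
            @Override
            public Object load(String objectId) throws Exception {
                return readObjectFromDisk(objectId); // hypothetical helper
            }
        });

// Loads each key at most once; concurrent callers for the same key block
// until the load completes, while other keys proceed independently.
Object value = cache.getUnchecked("page_1");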
Using Thread and synchronized directly is covered at the beginning of most tutorials about multithreading and concurrency. However, many real-world problems require more sophisticated locking and concurrency schemes, which are cumbersome and error prone to implement yourself. To avoid reinventing the wheel over and over again, the Java concurrency library was created. There you can find many classes that will be of great help to you; try googling for tutorials about Java concurrency and locks.
As an example for a lock which might help you, see http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/ReadWriteLock.html .
Rather than roll your own cache I would take a look at Google's MapMaker. Something like this will give you a lock cache that automatically expires unused entries as they are garbage collected:
ConcurrentMap<String, String> objectLockRegistry = new MapMaker()
        .softValues()
        .makeComputingMap(new Function<String, String>() {
            public String apply(String s) {
                return new String(s);
            }
        });
With this, the whole getObjectLock implementation is simply return objectLockRegistry.get(objectId) - the map takes care of all the "create if not already present" stuff for you in a safe way.
I would do it similarly to you: just create a map of plain lock objects (new Object()). But unlike you, I would use a TreeMap<String, Object> or a HashMap.
Call that the lockMap, with one entry per file to lock; the lockMap is available to all participating threads.
Each read and write to a specific file gets the lock from the map and uses synchronized(lock) on that lock object.
If the lockMap is not fixed and its contents can change, then reading and writing the map itself must be synchronized too (synchronized(this.lockMap) {...}).
But your getObjectLock() is not safe; synchronize all of it with your lock, as in the sketch below. (Double-checked locking is not thread safe in Java!) A recommended book: Doug Lea, Concurrent Programming in Java.
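A minimal sketch of the idea, where readFromDisk is a placeholder for your deserialization code:

import java.util.HashMap;
import java.util.Map;

public class FileCache {
    private final Map<String, Object> lockMap = new HashMap<>();

    private Object getObjectLock(String objectId) {
        synchronized (lockMap) { // guard the registry itself; no double-checked locking
            Object lock = lockMap.get(objectId);
            if (lock == null) {
                lock = new Object();
                lockMap.put(objectId, lock);
            }
            return lock;
        }
    }

    public Object getObject(String objectId) {
        synchronized (getObjectLock(objectId)) { // then lock per file
            return readFromDisk(objectId); // placeholder
        }
    }
}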

What is the name of this locking technique?

I've got a gigantic Trove map and a method that I need to call very often from multiple threads. Most of the time this method shall return true. The threads are doing heavy number crunching, and I noticed that there was some contention due to the following method (it's just an example; my actual code is a bit different):
synchronized boolean containsSpecial() {
    return troveMap.contains(key);
}
Note that it's an "append only" map: once a key is added, it stays in there forever (which is important for what comes next, I think).
I noticed that by changing the above to:
boolean containsSpecial() {
    if (troveMap.contains(key)) {
        // most of the time (>90%) we shall pass here, dodging lock acquisition
        return true;
    }
    synchronized (this) {
        return troveMap.contains(key);
    }
}
I get a 20% speedup in my number crunching (verified over lots of runs with long running times, etc.).
Does this optimization look correct (knowing that once a key is there it shall stay there forever)?
What is the name for this technique?
EDIT
The code that updates the map is called way less often than the containsSpecial() method and looks like this (I've synchronized the entire method):
synchronized void addSpecialKeyValue( key, value ) {
    ....
}
This code is not correct.
Trove doesn't handle concurrent use itself; it's like java.util.HashMap in that regard. So, like HashMap, even seemingly innocent, read-only methods like containsKey() could throw a runtime exception or, worse, enter an infinite loop if another thread modifies the map concurrently. I don't know the internals of Trove, but with HashMap, rehashing when the load factor is exceeded, or removing entries can cause failures in other threads that are only reading.
If the operation takes a significant amount of time compared to lock management, using a read-write lock to eliminate the serialization bottleneck will improve performance greatly. In the class documentation for ReentrantReadWriteLock, there are "Sample usages"; you can use the second example, for RWDictionary, as a guide.
In this case, the map operations may be so fast that the locking overhead dominates. If that's the case, you'll need to profile on the target system to see whether a synchronized block or a read-write lock is faster.
Either way, the important point is that you can't safely remove all synchronization, or you'll have consistency and visibility problems.
It's called wrong locking ;-) Actually, it is some variant of the double-checked locking approach. And the original version of that approach is just plain wrong in Java.
Java threads are allowed to keep private copies of variables in their local memory (think: core-local cache of a multi-core machine). Any Java implementation is allowed to never write changes back into the global memory unless some synchronization happens.
So, it is very well possible that one of your threads has a local memory in which troveMap.contains(key) evaluates to true. Therefore, it never synchronizes and it never gets the updated memory.
Additionally, what happens when contains() sees an inconsistent memory state of the troveMap data structure?
Lookup the Java memory model for the details. Or have a look at this book: Java Concurrency in Practice.
This looks unsafe to me. Specifically, the unsynchronized calls will be able to see partial updates, either due to memory visibility (a previous put not getting fully published, since you haven't told the JMM it needs to be) or due to a plain old race. Imagine if TroveMap.contains has some internal variable that it assumes won't change during the course of contains. This code lets that invariant break.
Regarding the memory visibility, the problem with that isn't false negatives (you use the synchronized double-check for that), but that trove's invariants may be violated. For instance, if they have a counter, and they require that counter == someInternalArray.length at all times, the lack of synchronization may be violating that.
My first thought was to make troveMap's reference volatile, and to re-write the reference every time you add to the map:
synchronized (this) {
    troveMap.put(key, value);
    troveMap = troveMap;
}
That way, you're setting up a memory barrier such that anyone who reads the troveMap will be guaranteed to see everything that had happened to it before its most recent assignment -- that is, its latest state. This solves the memory issues, but it doesn't solve the race conditions.
Depending on how quickly your data changes, maybe a Bloom filter could help? Or some other structure that's more optimized for certain fast paths?
Under the conditions you describe, it's easy to imagine a map implementation for which you can get false negatives by failing to synchronize. The only way I can imagine obtaining false positives is an implementation in which key insertions are non-atomic and a partial key insertion happens to look like another key you are testing for.
You don't say what kind of map you have implemented, but the stock map implementations store keys by assigning references. According to the Java Language Specification:
Writes to and reads of references are always atomic, regardless of whether they are implemented as 32 or 64 bit values.
If your map implementation uses object references as keys, then I don't see how you can get in trouble.
EDIT
The above was written in ignorance of Trove itself. After a little research, I found the following post by Rob Eden (one of the developers of Trove) on whether Trove maps are concurrent:
Trove does not modify the internal structure on retrievals. However, this is an implementation detail not a guarantee so I can't say that it won't change in future versions.
So it seems like this approach will work for now but may not be safe at all in a future version. It may be best to use one of Trove's synchronized map classes, despite the penalty.
I think you would be better off with a ConcurrentHashMap, which doesn't need explicit locking and allows concurrent reads:
boolean containsSpecial() {
    return troveMap.contains(key);
}

void addSpecialKeyValue( key, value ) {
    troveMap.putIfAbsent(key, value);
}
another option is using a ReadWriteLock which allows concurrent reads but no concurrent writes
ReadWriteLock rwlock = new ReentrantReadWriteLock();

boolean containsSpecial() {
    rwlock.readLock().lock();
    try {
        return troveMap.contains(key);
    } finally {
        rwlock.readLock().unlock();
    }
}

void addSpecialKeyValue( key, value ) {
    rwlock.writeLock().lock();
    try {
        //...
        troveMap.put(key, value);
    } finally {
        rwlock.writeLock().unlock();
    }
}
Why reinvent the wheel?
Simply use ConcurrentHashMap.putIfAbsent.

Collection.synchronizedMap vs synchronizing individual methods in HashMap

What is the difference between Collections.synchronizedMap() and a wrapper around a HashMap with all of the methods synchronized? I don't see any difference, because Collections.synchronizedMap() internally maintains the same lock for all methods.
Basically, what is the difference between the following code snippets
class C {
    Object o;

    public void foo() {
        synchronized (o) {
            // thread safe code here
        }
    }
}
and
class C {
    Object o;

    public synchronized void foo() {
    }
}
There is only one difference:
Collections.synchronizedMap is able to use a different monitor than itself.
Using synchronized methods is the same as using synchronized(this) blocks, which means the wrapper would be the monitor and could be locked from outside the wrapper.
If you don't want outside code to lock your monitor, you need to hide it.
On the other side, if you want to call multiple methods in a thread safe fashion, the easiest way is to lock the whole collection (though it's not very scalable, indeed).
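For example, the documented pattern for compound operations on a synchronized wrapper is to hold the returned map's own monitor (the no-argument Collections.synchronizedMap uses the returned map itself as the mutex):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

Map<String, String> map = Collections.synchronizedMap(new HashMap<String, String>());

// Each individual call is thread safe, but the pair below is a compound
// check-then-act, so it must run under the wrapper's monitor to be atomic.
synchronized (map) {
    if (!map.containsKey("key")) {
        map.put("key", "value");
    }
}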
PS: For reuse, it's better to delegate the method calls to a backing Map than to extend the class, because you can then switch to another Map implementation later without changing your wrapper.
Both approaches acquire a monitor on the object and so should perform exactly the same. The main reason for the difference is architectural: the synchronized wrapper allows extending the basic non-thread-safe variant easily.
Having said that, don't use either; use ConcurrentHashMap. It uses lock striping, so it's much quicker than either approach (as they are the same in terms of overhead plus contention). Lock striping allows segments of the backing array to be locked independently, which means it's less probable that two threads will need to acquire the same lock.
Do not reinvent the wheel; use what is provided by the API.
You should always decorate rather than lumping everything and all features into one big class.
Always take the plain Map and decorate it with Collections, or use java.util.concurrent and a real lock, so one can atomically inspect and update the map. Tomorrow you might want to change the Hashtable to a TreeMap, and you will be in trouble if you're stuck with a Hashtable.
So, why do you ask? :) Do you really believe that if a class is placed in the java.util package, some magic happens and its Java code works in some tricky way?
It really just wraps all methods with synchronized {} blocks and nothing more.
UPD: the difference is that you have much less chance of making a mistake if you use the synchronized collection instead of doing all the synchronization yourself.
UPD 2: as you can see in the sources, they use a 'mutex' object as the monitor. When you use the synchronized modifier in a method signature (i.e. synchronized void doSmth()), the current instance of your object (i.e. this) is used as the monitor. The two blocks of code below are the same:
1.
synchronized public void doSmth() {
    someLogic();
    moreLogic();
}

synchronized public static void doSmthStatic() {
    someStaticLogic();
    moreStaticLogic();
}
2.
public void doSmth() {
    synchronized (this) {
        someLogic();
        moreLogic();
    }
}

public static void doSmthStatic() {
    synchronized (ClassName.class) {
        someStaticLogic();
        moreStaticLogic();
    }
}
If thread safety is the concern, use the java.util.concurrent data structures. Using the wrapper class reduces all accesses to the Map to a sequential queue.
a) Threads waiting to do operations at totally different points in the Map will be waiting for the same lock. Depending on the number of threads, this can hurt application performance.
b) Consider compound operations on the Map. Using a wrapper with a single lock will not help, for example with "look if present, then add" kinds of operations: thread synchronization will again become an issue, as the sketch below shows.
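A hedged sketch of point b): the wrapper makes each individual call atomic, but not the pair, whereas ConcurrentHashMap offers the compound operation atomically (computeValue() is a hypothetical stand-in for the real work):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

Map<String, Integer> wrapped = Collections.synchronizedMap(new HashMap<String, Integer>());
// Race: two threads can both see "absent" here and both compute and put,
// because the lock is released between the two calls.
if (!wrapped.containsKey("k")) {
    wrapped.put("k", computeValue()); // computeValue() is hypothetical
}

ConcurrentMap<String, Integer> concurrent = new ConcurrentHashMap<>();
// Atomic check-then-act, no external locking needed:
concurrent.putIfAbsent("k", computeValue());
// Or, computing only when actually absent:
concurrent.computeIfAbsent("k", k -> computeValue());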
