ConcurrentHashMap vs ReentrantReadWriteLock based Custom Map for Reloading

Java Gurus,
Currently we have a HashMap<String,SomeApplicationObject> which is read frequently and modified occasionally. During modification/reloading, a read operation can return null, which is not acceptable.
To fix this I have the following options:
A. Use ConcurrentHashMap
This looks like the first choice, but the operation we are talking about is reload(), meaning clear() followed by replaceAll(). So if the Map is read after clear() and before replaceAll(), it returns null, which is not desirable. Even if I synchronize, this doesn't resolve the issue.
B. Create another implementation based upon ReentrantReadWriteLock
Here I would acquire the write lock before the reload() operation. This seems more appropriate, but I feel there must be something already available for this and I need not reinvent the wheel.
What is the best way out?
EDIT: Is any Collection already available with such a feature?

Since you are reloading the map, I would replace it on a reload.
You can do this by using a volatile Map, which you replace in full when it is updated.

It seems you are not sure as to how what Peter Lawrey suggests can be implemented. It could look like this:
class YourClass {
    private volatile Map<String, SomeApplicationObject> map;
    //constructors etc.
    public void reload() {
        Map<String,SomeApplicationObject> newMap = getNewValues();
        map = Collections.unmodifiableMap(newMap);
    }
}
There are no concurrency issues because:
The new map is created via a local variable, which by definition is not shared; getNewValues does not need to be synchronized or atomic.
The assignment to map is atomic.
map is volatile, which guarantees that other threads will see the change.
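A self-contained sketch of the same idea, assuming Java 10+ for Map.copyOf. The class name ReloadableLookup and the Supplier-based loader are hypothetical stand-ins for YourClass and getNewValues(). One detail worth showing: a reader that needs several lookups to agree with each other should read the volatile field once and work on that snapshot.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical holder; V stands in for SomeApplicationObject.
final class ReloadableLookup<V> {
    private final Supplier<Map<String, V>> loader; // stands in for getNewValues()
    private volatile Map<String, V> map;

    ReloadableLookup(Supplier<Map<String, V>> loader) {
        this.loader = loader;
        this.map = Map.copyOf(loader.get());
    }

    // Single volatile read: sees either the old or the new map, never a half-filled one.
    V get(String key) {
        return map.get(key);
    }

    // Immutable snapshot for callers that need several consistent lookups.
    Map<String, V> snapshot() {
        return map;
    }

    // Build the replacement off to the side, then publish it with one volatile write.
    void reload() {
        map = Map.copyOf(loader.get());
    }
}
```

Note that Map.copyOf rejects null keys and values, so this only fits data without nulls.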

This sounds a lot like Guava's Cache, though it really depends how you're populating the map, and how you compute the values. (Disclosure: I contribute to Guava.)
The real question is whether or not you can specify how to compute your SomeApplicationObject given the input String. Just based on what you've told us so far, it might look something like this...
LoadingCache<String, SomeApplicationObject> cache = CacheBuilder.newBuilder()
    .build(
        new CacheLoader<String, SomeApplicationObject>() {
            public SomeApplicationObject load(String key) throws AnyException {
                return computeSomeApplicationObject(key);
            }
        });
Then, whenever you wanted to rebuild the cache, you just call cache.invalidateAll(). With a LoadingCache, you can then call cache.get(key) and if it hasn't computed the value already, it'll get recomputed. Or maybe after calling cache.invalidateAll(), you can call cache.loadAll(allKeys), though you'd still need to be able to load single elements at a time in case any queries come in between the invalidateAll and loadAll.
If this isn't acceptable -- if you can't load one value individually, you have to load them all at once -- then I'd go ahead with Peter Lawrey's approach -- keep a volatile reference to a map (ideally an ImmutableMap), recompute the whole map and assign the new map to the reference when you're done.

Related

How to make operations on the value of a concurrent map atomic?

Let's say I have the following field inside of a class:
ConcurrentHashMap<SomeClass, Set<SomeOtherClass>> myMap = new ConcurrentHashMap<SomeClass, Set<SomeOtherClass>>();
An instance of this class is shared among many threads.
If I want to add or remove an element from a set associated with a key, I could do:
Set<SomeOtherClass> setVal = myMap.get(someKeyValue);
setVal.add(new SomeOtherClass());
The get operation is atomic, therefore thread-safe. However, there's no guarantee that, in between the get and the add instruction, some other thread won't modify the structure messing with the first one's execution.
What would be the best way to make the whole operation atomic?
Here's what I've come up with, but I don't believe it to be very efficient (or to be making the best use out of Java's structures):
I have a ReentrantLock field, so my class looks like this:
class A {
    ReentrantLock lock = new ReentrantLock();
    ConcurrentHashMap<SomeClass, Set<SomeOtherClass>> myMap = new ConcurrentHashMap<SomeClass, Set<SomeOtherClass>>();
}
Then the method call looks something like this:
lock.lock();
Set<SomeOtherClass> setVal = myMap.get(someKeyValue);
synchronized(setVal) {
    lock.unlock();
    setVal.add(new SomeOtherClass());
}
The idea being that we let go of the lock once we have made sure no one else will access the Set we're trying to modify. However, I don't think this is making the best use of the ConcurrentMap or that it makes much sense to have a lock, a concurrent structure, and a synchronized block all used to achieve one operation.
Is there a better way to go about this?

ConcurrentHashMap guarantees that the entire method invocation of compute (or computeIfAbsent or computeIfPresent) is done atomically. So, e.g., you could do something like this:
myMap.compute(someKeyValue, (k, v) -> {v.add(new SomeOtherClass()); return v;});
Note:
Using compute is analogous to the original snippet, which assumes that someKeyValue is present in the map. Using computeIfPresent is probably safer, though.
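A minimal sketch of both variants, assuming Java 8+. The class MultiMapSketch and its method names are hypothetical; K and V stand in for SomeClass and SomeOtherClass. The add path creates the set when the key is absent, and the remove path drops the mapping once the set is empty, each inside a single atomic call.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical wrapper around a map of sets.
final class MultiMapSketch<K, V> {
    private final ConcurrentHashMap<K, Set<V>> myMap = new ConcurrentHashMap<>();

    // Creates the set if the key is absent and adds the element in one atomic compute call.
    void add(K key, V value) {
        myMap.compute(key, (k, set) -> {
            if (set == null) {
                set = ConcurrentHashMap.newKeySet();
            }
            set.add(value);
            return set;
        });
    }

    // Returning null from the lambda removes the mapping when the set becomes empty.
    void remove(K key, V value) {
        myMap.computeIfPresent(key, (k, set) -> {
            set.remove(value);
            return set.isEmpty() ? null : set;
        });
    }

    // Reads happen outside compute, which is why the sets themselves are concurrent.
    boolean contains(K key, V value) {
        Set<V> set = myMap.get(key);
        return set != null && set.contains(value);
    }

    boolean hasKey(K key) {
        return myMap.containsKey(key);
    }
}
```

The sets come from ConcurrentHashMap.newKeySet() rather than HashSet because contains reads them without going through compute.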

Does double-checked locking work with a final Map in Java?

I'm trying to implement a thread-safe Map cache, and I want the cached Strings to be lazily initialized. Here's my first pass at an implementation:
public class ExampleClass {
    private static final Map<String, String> CACHED_STRINGS = new HashMap<String, String>();
    public String getText(String key) {
        String string = CACHED_STRINGS.get(key);
        if (string == null) {
            synchronized (CACHED_STRINGS) {
                string = CACHED_STRINGS.get(key);
                if (string == null) {
                    string = createString();
                    CACHED_STRINGS.put(key, string);
                }
            }
        }
        return string;
    }
}
After writing this code, Netbeans warned me about "double-checked locking," so I started researching it. I found The "Double-Checked Locking is Broken" Declaration and read it, but I'm unsure if my implementation falls prey to the issues it mentioned. It seems like all the issues mentioned in the article are related to object instantiation with the new operator within the synchronized block. I'm not using the new operator, and Strings are immutable, so I'm not sure whether the article is relevant to this situation. Is this a thread-safe way to cache strings in a HashMap? Does the thread-safety depend on what action is taken in the createString() method?

No, it's not correct, because the first access is done outside of a sync block.
It's somewhat down to how get and put might be implemented. You must bear in mind that they are not atomic operations.
For example, what if they were implemented like this:
public String get(String key) {
    Entry e = findEntry(key);
    return e.value;
}
public void put(String key, String value) {
    Entry e = addNewEntry(key);
    // danger for get while in between these lines
    e.value = value;
}
private Entry addNewEntry(String key) {
    Entry entry = new Entry(key, ""); // a new entry starts with an empty string, not null!
    addToBuckets(entry); // now it's findable by get
    return entry;
}
Now the get might not return null when the put operation is still in progress, and the whole getText method could return the wrong value.
The example is a bit convoluted, but you can see that correct behaviour of your code relies on the inner workings of the map class. That's not good.
And while you can look that code up, you cannot account for compiler, JIT and processor optimisations and inlining which effectively can change the order of operations just like the wacky but correct way I chose to write that map implementation.

Consider use of a concurrent hashmap and the method Map.computeIfAbsent() which takes a function to call to compute a default value if key is absent from the map.
Map<String, String> cache = new ConcurrentHashMap<>( );
cache.computeIfAbsent( "key", key -> "ComputedDefaultValue" );
Javadoc: If the specified key is not already associated with a value, attempts to compute its value using the given mapping function and enters it into this map unless null. The entire method invocation is performed atomically, so the function is applied at most once per key. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this map.
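Applied to the getText method from the question, this could look roughly like the sketch below. The class name LazyStringCache is hypothetical, and the Function passed to the constructor stands in for createString(), which the question does not show.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical rewrite of ExampleClass without double-checked locking.
final class LazyStringCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> creator; // stands in for createString()

    LazyStringCache(Function<String, String> creator) {
        this.creator = creator;
    }

    // computeIfAbsent applies the creator at most once per key; concurrent
    // callers for the same key wait and then see the stored value.
    String getText(String key) {
        return cache.computeIfAbsent(key, creator);
    }
}
```

As the Javadoc warns, the creator should be short and must not touch other mappings of the same map.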

Non-trivial problem domains:
Concurrency is easy to do and hard to do correctly.
Caching is easy to do and hard to do correctly.
Both are right up there with Encryption in the category of hard to get right without an intimate understanding of the problem domain and its many subtle side effects and behaviors.
Combine them and you get a problem an order of magnitude harder than either one.
This is a non-trivial problem that your naive implementation will not solve in a bug-free manner. The HashMap you are using is not going to be thread-safe unless every access is checked and serialized; and if every access is serialized, it will not be performant and will cause a lot of contention, blocking and latency, depending on the use.
The proper way to implement a lazy-loading cache is to use something like Guava Cache with a CacheLoader; it takes care of all the concurrency and cache race conditions for you transparently. A cursory glance through the source code shows how they do it.

No, and ConcurrentHashMap would not help.
Recap: the double check idiom is typically about assigning a new instance to a variable/field; it is broken because the compiler can reorder instructions, meaning the field can be assigned with a partially constructed object.
For your setup, you have a distinct issue: the map.get() is not safe from the put() which may be occurring thus possibly rehashing the table. Using a Concurrent hash map fixes ONLY that but not the risk of a false positive (that you think the map has no entry but it is actually being made). The issue is not so much a partially constructed object but the duplication of work.
As for the avoidable guava cacheloader: this is just a lazy-init callback that you give to the map so it can create the object if missing. This is essentially the same as putting all the 'if null' code inside the lock, which is certainly NOT going to be faster than good old direct synchronization. (The only time it makes sense to use a cacheloader is for plugging in a factory of such missing objects when you are passing the map to classes that don't know how to make missing objects and don't want to be told how.)

least blocking java cache

Suppose we want to implement a cache for a particular entity.
class Cache {
    private static Map<String, Object> cache = new HashMap<>();
    public static Object get(String id) {
        assert notNullOrEmpty(id);
        return cache.get(id);
    }
    public static Object add(String id, Object element) {
        assert notNullOrEmpty(id) && notNull(element);
        if(cache.containsKey(id)) return cache.get(id);
        cache.put(id, element);
        return element;
    }
}
Now we want to ensure this is thread-safe and, most importantly, optimal when it comes to data access and performance (we don't want to block when it's not necessary). For example, if we mark both methods as synchronized, we will uselessly block two concurrent get() calls which could work perfectly well without blocking.
So we want to block get() only if an add() is in progress, and block add() only if at least one get() or add() is in progress. Multiple concurrent get() executions should not block each other...
How do we do this?
UPDATE
In fact this is not a cache but just a use case I've come up with to describe the problem; the actual purpose is to create a store of singleton instances...
For example, there is a Currency type which is only instantiated through its builder and is immutable. The builder itself, after verifying that the parameters passed in are valid, checks this so-called global cache in a static context to see if there is an instance already created... well, you get the idea...
This is not an enum use case because the system will dynamically add new Currency, Market or even Exchange instances, which all should be loosely coupled and instantiated only once... (also to prevent heavy GC)
So to clarify the question... think of the global problem of concurrency, not the particular example.
I've found this link quite helpful: http://tutorials.jenkov.com/java-concurrency/read-write-locks.html
I guess there are some lock types already in the JDK for this purpose, but I'm not sure yet.

Actually I gave a talk on this just today at the FOSDEM conference in Brussels. See the slides here: http://www.slideshare.net/cruftex/cache2k-java-caching-turbo-charged-fosdem-2015
Basically you can use Google Guava; however, since Guava is a cache which uses LRU, a synchronized block is still needed. Something I am exploring in cache2k is an advanced eviction algorithm that needs no list manipulation on cache access, and so no locks whatsoever.
cache2k is on maven central, add cache2k-api and cache2k-core as dependency and initialize the cache with:
cache =
    CacheBuilder.newCache(String.class, Object.class)
        .implementation(ClockProPlusCache.class)
        .build();
If you have only cache hits, cache2k is about 5x faster than Guava and 10x faster than EHCache. For your usage pattern, e.g. with the Currency type, you can run the cache in a read-through configuration and add a cache source which is responsible for constructing the Currency instances.
So, you don't necessarily need to look for a cache. For the currency example you don't need a cache, since there is a limited space of currency instances. If you want to do the same with a possibly unlimited space, a cache is the more universal solution, since you have to limit the resource consumption. One example I explored is using this for formatted dates. See: https://github.com/headissue/cache2k-benchmark/blob/master/zoo/src/test/java/org/cache2k/benchmark/DateFormattingBenchmark.java
For general questions on cache2k, feel free to post them on Stack Overflow.

How to optimize concurrent operations in Java?

I'm still quite shaky on multi-threading in Java. What I describe here is at the very heart of my application and I need to get this right. The solution needs to work fast and it needs to be practically safe. Will this work? Any suggestions/criticism/alternative solutions welcome.
Objects used within my application are somewhat expensive to generate but change rarely, so I am caching them in *.temp files. It is possible for one thread to try and retrieve a given object from cache, while another is trying to update it there. Cache operations of retrieve and store are encapsulated within a CacheService implementation.
Consider this scenario:
Thread 1: retrieve cache for objectId "page_1".
Thread 2: update cache for objectId "page_1".
Thread 3: retrieve cache for objectId "page_2".
Thread 4: retrieve cache for objectId "page_3".
Thread 5: retrieve cache for objectId "page_4".
Note: thread 1 appears to retrieve an obsolete object, because thread 2 has a newer copy of it. This is perfectly OK so I do not need any logic that will give thread 2 priority.
If I synchronize retrieve/store methods on my service, then I'm unnecessarily slowing things down for threads 3, 4 and 5. Multiple retrieve operations will be effective at any given time but the update operation will be called rarely. This is why I want to avoid method synchronization.
I gather I need to synchronize on an object that is exclusively common to thread 1 and 2, which implies a lock object registry. Here, an obvious choice would be a Hashtable but again, operations on Hashtable are synchronized, so I'm trying a HashMap. The map stores a string object to be used as a lock object for synchronization and the key/value would be the id of the object being cached. So for object "page_1" the key would be "page_1" and the lock object would be a string with a value of "page_1".
If I've got the registry right, then additionally I want to protect it from being flooded with too many entries. Let's not get into details why. Let's just assume, that if the registry has grown past defined limit, it needs to be reinitialized with 0 elements. This is a bit of a risk with an unsynchronized HashMap but this flooding would be something that is outside of normal application operation. It should be a very rare occurrence and hopefully never takes place. But since it is possible, I want to protect myself from it.
@Service
public class CacheServiceImpl implements CacheService {
    private static ConcurrentHashMap<String, String> objectLockRegistry = new ConcurrentHashMap<>();
    public Object getObject(String objectId) {
        String objectLock = getObjectLock(objectId);
        if (objectLock != null) {
            synchronized (objectLock) {
                // read object from objectInputStream
            }
        }
        return null; // placeholder: the object read above would be returned here
    }
    public boolean storeObject(String objectId, Object object) {
        String objectLock = getObjectLock(objectId);
        synchronized (objectLock) {
            // write object to objectOutputStream
        }
        return true; // placeholder: report whether the write succeeded
    }
    private String getObjectLock(String objectId) {
        int objectLockRegistryMaxSize = 100_000;
        // reinitialize registry if necessary
        if (objectLockRegistry.size() > objectLockRegistryMaxSize) {
            // hoping to never reach this point but it is not impossible to get here
            synchronized (objectLockRegistry) {
                if (objectLockRegistry.size() > objectLockRegistryMaxSize) {
                    objectLockRegistry.clear();
                }
            }
        }
        // add lock to registry if necessary
        objectLockRegistry.putIfAbsent(objectId, new String(objectId));
        String objectLock = objectLockRegistry.get(objectId);
        return objectLock;
    }
}

If you are reading from disk, lock contention is not going to be your performance issue.
You can have both threads grab the lock for the entire cache and do a read; if the value is missing, release the lock, read from disk, acquire the lock again, and then, if the value is still missing, write it; otherwise return the value that is now there.
The only issue you will have with that is the concurrent reads thrashing the disk... but the OS caches will be hot, so the disk shouldn't be overly thrashed.
If that is an issue, then switch your cache to holding a Future<V> in place of a V.
The get method will become something like:
public V get(K key) {
    Future<V> future;
    synchronized(this) {
        future = backingCache.get(key);
        if (future == null) {
            future = executorService.submit(new LoadFromDisk(key));
            backingCache.put(key, future);
        }
    }
    return future.get();
}
Yes that is a global lock... but you're reading from disk, and don't optimize until you have a proved performance bottleneck...
Oh. First optimization, replace the map with a ConcurrentHashMap and use putIfAbsent and you'll have no lock at all! (BUT only do that when you know this is an issue)
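That lock-free variant could be sketched as follows. This is only an illustration under assumptions: the class name FutureCache is hypothetical, the Function stands in for LoadFromDisk, and the load runs on the calling thread through a FutureTask instead of an executor service. Wrapping failures in IllegalStateException is likewise just one possible choice.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.function.Function;

// Hypothetical cache of futures; at most one load is started per key.
final class FutureCache<K, V> {
    private final ConcurrentMap<K, Future<V>> backingCache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // stands in for LoadFromDisk

    FutureCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        Future<V> future = backingCache.get(key);
        if (future == null) {
            Callable<V> load = () -> loader.apply(key);
            FutureTask<V> task = new FutureTask<>(load);
            future = backingCache.putIfAbsent(key, task);
            if (future == null) { // this thread won the race, so it runs the load
                future = task;
                task.run();
            }
        }
        try {
            return future.get(); // other threads block here until the load finishes
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for " + key, e);
        } catch (ExecutionException e) {
            backingCache.remove(key, future); // do not keep a failed load in the cache
            throw new IllegalStateException("load failed for " + key, e.getCause());
        }
    }
}
```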

The complexity of your scheme has already been discussed. That leads to hard to find bugs. For example, not only do you lock on non-final variables, but you even change them in the middle of synchronized blocks that use them as a lock. Multi-threading is very hard to reason about, this kind of code makes it almost impossible:
synchronized(objectLockRegistry) {
    if(objectLockRegistry.size() > objectLockRegistryMaxSize) {
        objectLockRegistry = new HashMap<>(); //brrrrrr...
    }
}
In particular, 2 simultaneous calls to get a lock on a specific string might actually return 2 different instances of the same string, each stored in a different instance of your hashmap (unless they are interned), and you won't be locking on the same monitor.
You should either use an existing library or keep it a lot simpler.

If your question includes the keywords "optimize", "concurrent", and your solution includes a complicated locking scheme ... you're doing it wrong. It is possible to succeed at this sort of venture, but the odds are stacked against you. Prepare to diagnose bizarre concurrency bugs, including but not limited to, deadlock, livelock, cache incoherency... I can spot multiple unsafe practices in your example code.
Pretty much the only way to create a safe and effective concurrent algorithm without being a concurrency god is to take one of the pre-baked concurrent classes and adapt them to your need. It's just too hard to do unless you have an exceptionally convincing reason.
You might take a look at ConcurrentMap. You might also like CacheBuilder.

Using Threads and synchronized directly is covered at the beginning of most tutorials about multithreading and concurrency. However, many real-world examples require more sophisticated locking and concurrency schemes, which are cumbersome and error-prone if you implement them yourself. To prevent reinventing the wheel over and over again, the Java concurrency library was created. There, you can find many classes that will be of great help to you. Try googling for tutorials about Java concurrency and locks.
As an example for a lock which might help you, see http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/ReadWriteLock.html .
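A minimal sketch of how such a lock is typically used, with a hypothetical ReadWriteStore class and an in-memory HashMap standing in for the file-backed cache from the question. Note this uses one lock for the whole store rather than one per object: any number of retrieve calls can run at once, and they wait only while a store call holds the write lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical store guarded by a single read-write lock.
final class ReadWriteStore<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Many readers may hold the read lock at the same time.
    V retrieve(K key) {
        lock.readLock().lock();
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // The write lock is exclusive: it waits for readers to finish and blocks new ones.
    void store(K key, V value) {
        lock.writeLock().lock();
        try {
            map.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```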

Rather than roll your own cache I would take a look at Google's MapMaker. Something like this will give you a lock cache that automatically expires unused entries as they are garbage collected:
ConcurrentMap<String,String> objectLockRegistry = new MapMaker()
    .softValues()
    .makeComputingMap(new Function<String,String>() {
        public String apply(String s) {
            return new String(s);
        }
    });
With this, the whole getObjectLock implementation is simply return objectLockRegistry.get(objectId) - the map takes care of all the "create if not already present" stuff for you in a safe way.
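On Java 8 and later, a similar "create if not already present" registry can be sketched with the JDK alone. The class name LockRegistry is hypothetical, and unlike the softValues() map above this version never expires entries, so it only suits a bounded set of ids.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical JDK-only lock registry; entries are never evicted.
final class LockRegistry {
    private final ConcurrentMap<String, Object> locks = new ConcurrentHashMap<>();

    // Returns the same lock object for equal ids; it is created atomically on first use.
    Object lockFor(String objectId) {
        return locks.computeIfAbsent(objectId, id -> new Object());
    }
}
```

A caller would then write synchronized (registry.lockFor(objectId)) { ... } around the read or write of that object.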

I would do it similarly to you: just create a map of lock objects (new Object()).
But unlike you, I would use a TreeMap<String, Object>
or HashMap.
Call that the lockMap: one entry per file to lock. The lockMap is publicly available to all participating threads.
Each read and write to a specific file gets the lock from the map and uses synchronized(lock) on that lock object.
If the lockMap is not fixed and its content can change, then reading and writing to the map must be synchronized too (synchronized (this.lockMap) {....}).
But your getObjectLock() is not safe; synchronize all of it with your lock. (Double-checked locking is not thread-safe in Java!) A recommended book: Doug Lea, Concurrent Programming in Java

Building an object for a cache, "asynchronously" (kind of), in Java: howto?

Say you have this kind of code:
public final class SomeClass
{
    private final Map<SomeKey, SomeValue> map = new HashMap<SomeKey, SomeValue>();
    // ...
    public SomeValue getFromCache(final SomeKey key)
    {
        SomeValue ret;
        synchronized(map) {
            ret = map.get(key);
            if (ret == null) {
                ret = buildValue(key);
                map.put(key, ret);
            }
        }
        return ret;
    }
    //etc
}
The problem is performance: if buildValue() is an expensive function, then one caller having to build its value will block all other callers, whose value may already exist. I'd like to find a mechanism in which a caller having to build a value will not block other callers.
I cannot believe this problem hasn't been tackled (and solved) already. I tried to google around for a solution but couldn't find one so far. Do you have a link to do that?
I was thinking about using a ReentrantReadWriteLock, but couldn't come up with anything yet.

Guava has a very solid solution to this, based on some work by Doug Lea, who wrote most of java.util.concurrent. (Disclosure: I contribute to Guava, though I haven't worked on caching at all.)
The user guide article on Guava's Cache package is here, but the syntax looks like this...
LoadingCache<Key, Graph> graphs = CacheBuilder.newBuilder()
    .maximumSize(1000)
    .expireAfterWrite(10, TimeUnit.MINUTES)
    .removalListener(MY_LISTENER)
    .build(
        new CacheLoader<Key, Graph>() {
            public Graph load(Key key) throws AnyException {
                return createExpensiveGraph(key);
            }
        });

I think part of the issue is that the get method you have appears to be synchronous. That alone makes it difficult to do anything async. It seems that your get should take in a callback.
The Java Concurrency in Practice book details an awesome cache using ConcurrentHashMap and FutureTasks - check out http://jcip.net/listings/Memoizer.java
You'll still need to tweak that class to make it asynchronous though - it still blocks the current thread as the task is computed, but it prevents two threads from computing the same thing. If one thread is already computing it and another wants it, it will wait till the computation is finished rather than starting a new computation

ReentrantReadWriteLock could be a solution: use the read lock when you get data from the map, and the write lock when you build the data and put it into the map. The disadvantage of this solution is that the write lock locks the whole map, so buildValue effectively becomes a synchronized method; when many values have to be written into the map, they have to be written one by one.
Another method is to use java.util.concurrent.ConcurrentMap; then you don't need any lock if you use putIfAbsent when you put data into the map. The disadvantage is that values for the same key may be built concurrently in different threads, but that won't be a problem unless your application needs strict memory usage.
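The putIfAbsent variant might look like the sketch below. The class name RacyBuildCache is hypothetical and the Function stands in for buildValue. Two threads may both build a value for the same key, but only the first one stored is ever returned, so callers always agree on the instance.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical lock-free cache that tolerates duplicate builds.
final class RacyBuildCache<K, V> {
    private final ConcurrentMap<K, V> map = new ConcurrentHashMap<>();
    private final Function<K, V> builder; // stands in for buildValue

    RacyBuildCache(Function<K, V> builder) {
        this.builder = builder;
    }

    V getFromCache(K key) {
        V value = map.get(key);
        if (value == null) {
            V built = builder.apply(key);             // may run in several threads at once
            V existing = map.putIfAbsent(key, built); // first writer wins
            value = (existing != null) ? existing : built;
        }
        return value;
    }
}
```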
