Volatile HashMap vs ConcurrentHashMap


I have a cache class which contains a volatile HashMap<T> to store cache items.
I'm curious what the consequences of changing the volatile HashMap to a ConcurrentHashMap would be.
Would I gain a performance increase? This cache is a read-only cache.
What would be the best option to use? Just a HashMap? The cache is populated on an interval.

First, it appears you don't understand what the volatile keyword does. It makes sure that if the reference value held by the variable declared volatile changes, other threads will see the new value rather than a cached copy. It has nothing to do with thread safety when accessing the HashMap itself.
Given that, and the fact that you say the HashMap is read-only, you certainly don't need anything that provides thread safety, including a ConcurrentHashMap.
Edit to add: In your last edit you now say "The cache is being populated on an interval".
That's not read-only then, is it?
If you're going to have threads reading from it while you are writing (updating the existing HashMap), then you should use a ConcurrentHashMap, yes.
If you are populating an entirely new HashMap and then assigning it to the existing variable, then you use volatile.
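For the first case (writing into the existing map while readers are active), a minimal sketch could look like the following. The class name InPlaceCache, the String key/value types and the refresh method are assumptions made for illustration, not part of the original question.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache that is updated in place while readers are active.
final class InPlaceCache {
    // final is enough here: the reference never changes, only the contents do.
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    String get(String key) {
        return cache.get(key);
    }

    // Called by the refresh task; each individual operation is thread-safe.
    void refresh(Map<String, String> fresh) {
        cache.putAll(fresh);
        // Drop entries that no longer exist in the fresh data.
        cache.keySet().retainAll(fresh.keySet());
    }
}
```

Note that readers can observe a mix of old and new entries while refresh is running; only the individual operations are atomic, not the refresh as a whole.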

You say the cache is read-only, but also being updated on an interval which seems contradictory.
If the whole cache gets updated on an interval, I'd keep using the volatile.
The volatile will make sure that the updated map is safely published.
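    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    public final class Cache<K, V>
    {
        // start with an empty map so readers never see null
        private volatile Map<K, V> cache = Collections.emptyMap();
        private void mapUpdate() {
            Map<K, V> newCache = new HashMap<>();
            // populate the map
            // update the reference with an immutable collection
            cache = Collections.unmodifiableMap(newCache);
        }
    }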
If the interval update is modifying the same cache, then you probably want to use a ConcurrentHashMap, or copy the map, update the copy, and update the reference.
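    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    public final class Cache<K, V>
    {
        // start with an empty map so the first copy below does not hit null
        private volatile Map<K, V> cache = Collections.emptyMap();
        private void mapUpdate() {
            // copy the current contents, then modify only the copy
            Map<K, V> newCache = new HashMap<>(cache);
            // update the map
            // update the reference with an immutable collection
            cache = Collections.unmodifiableMap(newCache);
        }
    }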

I have a similar use case in my web application. I am using a HashMap as my in-memory cache. The use case is as follows:
A user request comes in and first checks the cache for an existing record using an input key. This is done in the add method.
If the object is not present, it inserts the new record into the cache.
Similarly, the remove method first checks for the record in the cache using the key and, if found, removes it.
If two threads execute concurrently, one in add and the other in remove, will this approach make sure that each of them sees the latest data in the cache at that point? If I am not wrong, a synchronized method takes care of thread safety, whereas volatile takes care of visibility.
    private volatile HashMap<String, String> activeRequests = new HashMap<String, String>();
    public synchronized boolean add(String pageKey, String space, String pageName) {
        if (activeRequests.get(pageKey) != null) {
            return false;
        }
        activeRequests.put(pageKey, space + ":" + pageName);
        return true;
    }
    public synchronized void remove(String pageKey) {
        if (activeRequests.get(pageKey) != null)
            activeRequests.remove(pageKey);
    }
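Since every access in the snippet above goes through synchronized methods on the same object and the reference is never reassigned, the synchronized keyword already provides visibility as well as mutual exclusion, so a final field would do in place of volatile. As an alternative, the same check-then-act logic can be sketched with the atomic operations of ConcurrentHashMap; the class name ActiveRequests is made up for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical lock-free variant of the add/remove pair.
final class ActiveRequests {
    private final ConcurrentMap<String, String> activeRequests = new ConcurrentHashMap<>();

    // putIfAbsent checks and inserts atomically; it returns null only
    // for the caller that actually inserted the key.
    boolean add(String pageKey, String space, String pageName) {
        return activeRequests.putIfAbsent(pageKey, space + ":" + pageName) == null;
    }

    // remove is already a no-op when the key is absent, so no prior check is needed.
    void remove(String pageKey) {
        activeRequests.remove(pageKey);
    }
}
```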

AFAIK, although the first answer explains this correctly, depending on the use case, using volatile on a cache that is refreshed and replaced frequently is unnecessary overhead and can actually be bad or inconsistent, assuming the cache is just a static metadata snapshot and is not updated by other threads.
Take the example of an HTTP request that reads everything it needs from the cache. The request takes the map reference and starts reading some keys through it; then, halfway through, the cache reference is updated to a new HashMap (a refresh). The request now starts reading a different state of the cache, and the result can be inconsistent if the entries are not all from one snapshot at time T. With volatile, you read Key1:Val1 at T1 and Key2:Val2 at T2, whereas you need Val1 and Val2 to come from the same snapshot at T1. With volatile your reference is always up to date, so you could read Key1:Val1 the first time and Key1:Val2 the second time, giving different data within the same request.
Without volatile, the request will use a reference that always points to one snapshot until it has completed processing. Without volatile, you will always read Key1:Val1 at T1 and a value from that same snapshot at T2. Once all requests using this reference have completed, the old, no longer referenced map will be garbage collected.
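One caveat worth spelling out: the Java memory model does not promise that a non-volatile field keeps returning the old reference, so dropping volatile does not by itself pin a request to one snapshot. What does pin it is reading the field once into a local variable and using only that local for the rest of the request, which works with a volatile field too. A minimal sketch, with the class name SnapshotCache and the String types assumed for illustration:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache whose readers work on one consistent snapshot per request.
final class SnapshotCache {
    private volatile Map<String, String> cache = Collections.emptyMap();

    // Refresh thread: build a new map, then publish it with a single volatile write.
    void refresh(Map<String, String> fresh) {
        cache = Collections.unmodifiableMap(new HashMap<>(fresh));
    }

    // Request thread: one volatile read; the caller keeps the returned
    // reference in a local variable for the whole request.
    Map<String, String> snapshot() {
        return cache;
    }
}
```

A request would call snapshot() once and do all its lookups on the returned map; a refresh that happens in the meantime only affects later requests.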

Related

Java Synchronizing for Integer values. Cleaning up map that tracks synchronize objects

I have a portion of code that needs to be thread safe. It is code that loads and modifies an object from the database based on its ID. I want to avoid synchronizing on just the Integer ID variable, so I am attempting to implement the solution offered in this thread: https://stackoverflow.com/a/659939/3561422
However, I am not creating a cache so I have nothing in place to manage the objects added to the map. I want to avoid a memory leak situation. I have looked into using a WeakHashMap, but that is apparently not thread-safe. I have created a map as follows, but the GC does not appear to be cleaning up the references I create.
    private static Map<Integer, Object> locks = Collections.synchronizedMap(new WeakHashMap<Integer, Object>());
Is there something I am missing here that would make this solution work? Is WeakHashMap actually safe for me to use here?
Some example code:
    public static void mainMethod(Integer id) {
        Object lockObject = getMapObject(id);
        // declared outside the synchronized block so it is still in scope for actOnObj
        Object dbObj;
        synchronized (lockObject) {
            dbObj = loadDBObjFromDB(id);
            // Do pre-execution checks
            if (dbObj.isInUse()) {
                // fail here
            }
            dbObj.setAsInUseAndCommitToDB();
        }
        actOnObj(dbObj);
    }
    private static Object getMapObject(final Integer id) {
        locks.putIfAbsent(id, new Object());
        return locks.get(id);
    }
Basically, I need to mark something in the database as in use. If another thread comes in and wants to do something with it, it needs to see whether it is already in use. If it is, I fail and give the user feedback. I need to lock around loading, checking whether it is in use, and updating it to be in use. I would like to use the map to avoid locking on an Integer object.

I think that what you are looking for here is an implementation of ConcurrentHashSet (there are several out there; I'd look at Guava's). It is the same idea as a ConcurrentHashMap without needing a value (in fact, Guava's is backed by a ConcurrentHashMap per the documentation). Another alternative is simply doing what you are doing, and only using a single, statically created object as the value (since the value here is irrelevant):
    private static final Object MAP_VALUE = new Object();
    private static Object getMapObject(final Integer id) {
        locks.putIfAbsent(id, MAP_VALUE);
        return locks.get(id);
    }
For the map, just make it a regular ConcurrentHashMap. No need to worry about weak references or weak hashmaps.
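A minimal sketch of the lookup with a regular ConcurrentHashMap follows. Note one deliberate difference from the snippet above: it assumes each ID should get its own lock object, because with a single shared value object every ID would end up synchronizing on the same monitor. The names IdLocks and lockFor are made up for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical per-ID lock registry.
final class IdLocks {
    private static final ConcurrentMap<Integer, Object> LOCKS = new ConcurrentHashMap<>();

    // Atomically returns the lock object for this id, creating it on first use.
    // Keys are compared with equals, so distinct Integer instances with the
    // same value map to the same lock.
    static Object lockFor(Integer id) {
        return LOCKS.computeIfAbsent(id, k -> new Object());
    }
}
```

Entries are never removed in this sketch, so the map grows with the number of distinct IDs seen; whether that is acceptable depends on how many IDs there are.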

Storing object reference into a volatile field

I'm using the following field:
    private DateDao dateDao;
    private volatile Map<String, Date> dates;
    public Map<String, Date> getDates() {
        return Collections.unmodifiableMap(dates);
    }
    public void retrieveDates() {
        dates = dateDao.retrieveDates();
    }
Where
    public interface DateDao {
        // Currently returns a HashMap instance
        public Map<String, Date> retrieveDates();
    }
Is it safe to publish the map of dates that way? As I understand it, a volatile field means that the reference stored in the field won't be cached in CPU registers and will be read from memory every time it is accessed.
So we might still read a stale value for the state of the map, because HashMap doesn't do any synchronization.
Is it safe to do so?
UPD: For instance, assume that the DAO method is implemented in the following way:
    public Map<String, Date> retrieveDates() {
        Map<String, Date> retVal = new HashMap<>();
        retVal.put("SomeString", new Date());
        // and so forth...
        return retVal;
    }
As can be seen, the DAO method doesn't do any synchronization, and both HashMap and Date are mutable and not thread-safe. Now, we've created and published them as shown above. Is it guaranteed that any subsequent read of dates from another thread will observe not only the correct reference to the Map object, but also its "fresh" state?
I'm not sure whether the thread could observe some stale value (e.g. dates.get("SomeString") returning null).

I think you're asking two questions:
1. Given that DAO code, is it possible for your code using it to use the object reference it gets here:
   dates = dateDao.retrieveDates();
   before the dateDao.retrieveDates method as quoted is done adding to that object? E.g., do the memory model's statement reordering semantics allow the retrieveDates method to return the reference before the last put (etc.) is complete?
2. Once your code has the dates reference, is there an issue with unsynchronized access to dates in your code and also via the read-only view of it you return from getDates?
Whether your field is volatile has no bearing on either of those questions. The only thing that making your field volatile does is prevent a thread calling getDates from getting an out-of-date value for your dates field. That is:
Thread A                                              Thread B
--------                                              --------
1. Updates `dates` from dateDao.retrieveDates
2. Updates `dates` from dateDao.retrieveDates again
                                                      3. getDates returns read-only
                                                         view of `dates` from #1
Without volatile, the scenario above is possible (but harmless). With volatile, it isn't: Thread B will see the value of dates from #2, not #1.
But that doesn't relate to either of the questions I think you're asking.
Question 1
No, your code in retrieveDates cannot see the object reference returned by dateDao.retrieveDates before dateDao.retrieveDates is done filling in that map. The memory model allows reordering statements, but:
...compilers are allowed to reorder the instructions in either thread, when this does not affect the execution of that thread in isolation
(My emphasis.) Returning the reference to your code before dateDao.retrieveDates would obviously affect the execution of the thread in isolation.
Question 2
The DAO code you've shown can never modify the map it returns to you, since it doesn't keep a copy of it, so we don't need to worry about the DAO.
In your code, you haven't shown anything that modifies the contents of dates. If your code doesn't modify the contents of dates, then there's no need for synchronization, since the map is unchanging. You might want to make that a guarantee by wrapping dates in the read-only view when you get it, rather than when you return it:
    dates = Collections.unmodifiableMap(dateDao.retrieveDates());
If your code does modify dates somewhere you haven't shown, then yes, there's potential for trouble because Collections.unmodifiableMap does nothing to synchronize map operations. It just creates a read-only view.
If you wanted to ensure synchronization, you'd want to wrap dates in a Collections.synchronizedMap instance:
dates = Collections.synchronizedMap(dateDao.retrieveDates());
Then all access to it in your code will be synchronized, and all access to it via the read-only view you return will also be synchronized, as they all go through the synchronized map.
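To illustrate the "read-only view" point: the sketch below (the class name ReadOnlyViewDemo and the keys are made up for illustration) shows that Collections.unmodifiableMap rejects writes made through the view but still reflects writes made to the backing map, which is why it provides no synchronization or snapshot on its own.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class ReadOnlyViewDemo {
    // Returns what the view sees after the backing map was modified.
    static String demo() {
        Map<String, String> backing = new HashMap<>();
        Map<String, String> view = Collections.unmodifiableMap(backing);

        backing.put("SomeString", "v1");        // a write to the backing map...
        String seen = view.get("SomeString");   // ...is visible through the view

        try {
            view.put("Other", "v2");            // writes through the view are rejected
            return "unexpected: view accepted a write";
        } catch (UnsupportedOperationException expected) {
            return seen;
        }
    }
}
```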

As far as I can tell, declaring a map volatile won't synchronize access to it (i.e. readers could read the map while it is being updated by the DAO). However, it guarantees that the map lives in shared memory, so every thread will see the same values in it at any given time. What I usually do when I need both synchronization and freshness is use a lock object, something like the following:
    private DateDao dateDao;
    private volatile Map<String, Date> dates;
    private final Object _lock = new Object();
    public Map<String, Date> getDates() {
        synchronized (_lock) {
            return Collections.unmodifiableMap(dates);
        }
    }
    public void retrieveDates() {
        synchronized (_lock) {
            dates = dateDao.retrieveDates();
        }
    }
This provides reader/writer synchronization (but note that writers are not prioritized, i.e. if a reader is getting the map, the writers will have to wait) and "data freshness" via volatile. Moreover, this is a pretty basic approach, and there are other ways of achieving the same thing (e.g. Locks and Semaphores), but most of the time this does the trick for me.

Best data structure in Java when using HashSet as a cache

My use case is this:
I need to cache a set of strings for frequent read access. The cache is updated by a daemon thread periodically. Moreover, the cache elements will never get updated individually; the update is always set.clear(); set.addAll(List). I'm currently using a HashSet protected by a ReentrantReadWriteLock. Is there a better way to do this?

One option would be a volatile set:
    private volatile Set<String> set = new HashSet<>();
    public void update() {
        Set<String> newSet = getNewData();
        set = newSet;
    }
This is thread safe (if you don't let other code access the set itself) and does not require locking. One drawback is that you hold both sets in memory until the next GC (not sure how much space is used per entry - to be tested).

Updating BigDecimal concurrently within ConcurrentHashMap thread safe

Is the code below thread/concurrency safe when multiple threads call the totalBadRecords() method from inside another method? Both map parameters to this method are ConcurrentHashMap instances. I want to ensure that each call updates the total properly.
If it is not safe, please explain what I have to do to ensure thread safety.
Do I need to synchronize the add/put, or is there a better way?
Do I need to synchronize the get method in TestVO? TestVO is a simple Java bean with getter/setter methods.
Below is my sample code:
    public void totalBadRecords(final Map<Integer, TestVO> sourceMap,
                                final Map<String, String> logMap) {
        BigDecimal badCharges = new BigDecimal(0);
        boolean badRecordsFound = false;
        for (Entry<Integer, TestVO> e : sourceMap.entrySet()) {
            // braces added so the flag is set only when a bad record is actually seen
            if ("Y".equals(e.getValue().getInd())) {
                badCharges = badCharges.add(e.getValue().getAmount());
                badRecordsFound = true;
            }
        }
        if (badRecordsFound)
            logMap.put("badRecordsFound:", badCharges.toPlainString());
    }

That depends on how your objects are used in your whole application.
If each call to totalBadRecords takes a different sourceMap and the map (and its content) is not mutated while counting, it's thread-safe:
badCharges is a local variable; it can't be shared between threads and is thus thread-safe (no need to synchronize add).
logMap can be shared between calls to totalBadRecords: the put method of ConcurrentHashMap is already synchronized (or behaves as if it were).
If instances of TestVO are not mutated, the values from getValue() and getInd() are always coherent with one another.
the sourceMap is not mutated, so you can iterate over it.
Actually, in this case, you don't need a concurrent map for sourceMap. You could even make it immutable.
If the instances of TestVO and the sourceMap can change while counting, then of course you could be counting wrongly.

It depends on what you mean by thread-safe. And that boils down to what the requirements for this method are.
At the data structure level, the method will not corrupt any data structures, because the only data structures that could be shared with other threads are ConcurrentHashMap instances, and they are safe against that kind of problem.
The potential thread-safety issue is that iterating a ConcurrentHashMap is not an atomic operation. The guarantees for the iterators are such that you are not guaranteed to see all entries in the iteration if the map is updated (e.g. by another thread) while you are iterating. That means that the totalBadRecords method may not give an accurate count if some other thread modifies the map during the call. Whether this is a real thread-safety issue depends on whether or not the totalBadRecords is required to give an accurate result in that circumstance.
If you need to get an accurate count, then you have to (somehow) lock out updates to the sourceMap while making the totalBadRecords call. AFAIK, there is no way to do this using (just) the ConcurrentHashMap API, and I can't think of a way to do it that doesn't make the map a concurrency bottleneck.
In fact, if you need to calculate accurate counts, you have to use external locking for (at least) the counting operation, and all operations that could change the outcome of the counting. And even that doesn't deal with the possibility that some thread may modify one of the TestVO objects while you are counting records, and cause the TestVO to change from "good" to "bad" or vice-versa.
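A minimal sketch of what such external locking could look like, assuming a simplified model in which the amounts are stored directly (the class LockedCharges and its methods are made up for illustration and do not use TestVO): every operation that can change the total, and the counting itself, take the same lock.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder where updates and the total are guarded by one lock.
final class LockedCharges {
    private final Object lock = new Object();
    // A plain HashMap is sufficient because every access holds the lock.
    private final Map<Integer, BigDecimal> badAmounts = new HashMap<>();

    void put(Integer id, BigDecimal amount) {
        synchronized (lock) {
            badAmounts.put(id, amount);
        }
    }

    // No update can interleave with the loop, so the sum is an accurate
    // total for one point in time.
    BigDecimal total() {
        synchronized (lock) {
            BigDecimal sum = BigDecimal.ZERO;
            for (BigDecimal amount : badAmounts.values()) {
                sum = sum.add(amount);
            }
            return sum;
        }
    }
}
```

As noted above, the price is that this single lock becomes a concurrency bottleneck for both writers and the counting thread.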

You could use something like the following.
That would guarantee that, after a call to the totalBadRecords method, the String representing the bad charges in the logMap is accurate and that no updates are lost. Of course, a phantom read can always happen, as you do not lock the sourceMap.
    private static final String BAD_RECORDS_KEY = "badRecordsFound:";
    public void totalBadRecords(final ConcurrentMap<Integer, TestVO> sourceMap,
                                final ConcurrentMap<String, String> logMap) {
        while (true) {
            // get the old value that is going to be replaced.
            String oldValue = logMap.get(BAD_RECORDS_KEY);
            // calculate new value
            BigDecimal badCharges = BigDecimal.ZERO;
            for (TestVO e : sourceMap.values()) {
                if ("Y".equals(e.getInd()))
                    badCharges = badCharges.add(e.getAmount());
            }
            final String newValue = badCharges.toPlainString();
            // insert into map if there was no mapping before
            if (oldValue == null) {
                oldValue = logMap.putIfAbsent(BAD_RECORDS_KEY, newValue);
                if (oldValue == null) {
                    oldValue = newValue;
                }
            }
            // replace the entry in the map
            if (logMap.replace(BAD_RECORDS_KEY, oldValue, newValue)) {
                // update succeeded -> there were no updates to the logMap while calculating the bad charges.
                break;
            }
        }
    }

The right way to synchronize access to read only map in Java

I'm writing an analogue of the DatabaseConfiguration class which reads configuration from a database, and I need some advice regarding synchronization.
For example,
    public class MyDBConfiguration {
        private Connection cn;
        private String table_name;
        private Map<String, String> key_values = new HashMap<String, String>();
        public MyDBConfiguration(Connection cn, String table_name) {
            this.cn = cn;
            this.table_name = table_name;
            reloadConfig();
        }
        public String getProperty(String key) {
            return this.key_values.get(key);
        }
        public void reloadConfig() {
            Map<String, String> tmp_map = new HashMap<String, String>();
            // read data from database
            synchronized (this.key_values)
            {
                this.key_values = tmp_map;
            }
        }
    }
So I have a couple questions.
1. Assuming properties are read-only, do I have to use synchronized in getProperty?
2. Does it make sense to do this.key_values = Collections.synchronizedMap(tmp_map) in reloadConfig?
Thank you.

If multiple threads are going to share an instance, you must use some kind of synchronization.
Synchronization is needed mainly for two reasons:
It can guarantee that some operations are atomic, so the system stays consistent.
It guarantees that every thread sees the same values in memory.
First of all, since you made reloadConfig() public, your object does not really look immutable. An object is really immutable only if its values cannot change after initialization (which is a desirable property for objects that are shared).
For the above reason, you must synchronize all the access to the map: suppose a thread is trying to read from it while another thread is calling reloadConfig(). Bad things will happen.
If this is really the case (mutable settings), you must synchronize both reads and writes (for obvious reasons). Threads must synchronize on a single object (otherwise there's no synchronization). The only way to guarantee that all the threads will synchronize on the same object is to synchronize on the object itself or on a properly published, shared lock, like this:
    // synchronizes on the instance itself:
    class MyDBConfig1 {
        // ...
        public synchronized String getProperty(...) { ... }
        public synchronized void reloadConfig() { ... }
    }
    // synchronizes on a properly published, shared lock:
    class MyDBConfig2 {
        private final Object lock = new Object();
        public String getProperty(...) { synchronized (lock) { ... } }
        public void reloadConfig() { synchronized (lock) { ... } }
    }
Proper publication here is guaranteed by the final keyword. It is subtle: it guarantees that the value of this field is visible to every thread after initialization (without it, a thread might see lock == null, and bad things will happen).
You could improve the code above by using a (properly published) ReentrantReadWriteLock. It might improve concurrency a bit if that's a concern for you.
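A minimal sketch of the read/write lock variant, under the assumption that the new configuration is passed in as a map standing in for the database read (the class name MyDBConfig3 and that parameter are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical variant: many concurrent readers, one exclusive writer.
final class MyDBConfig3 {
    // final, so the lock itself is properly published
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private Map<String, String> keyValues = new HashMap<>();

    String getProperty(String key) {
        lock.readLock().lock();       // readers do not block each other
        try {
            return keyValues.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // 'fresh' stands in for the data read from the database.
    void reloadConfig(Map<String, String> fresh) {
        Map<String, String> tmp = new HashMap<>(fresh); // build outside the lock
        lock.writeLock().lock();      // waits for active readers, blocks new ones
        try {
            keyValues = tmp;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```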
Supposing your intention was to make MyDBConfig immutable, you do not need to serialize access to the hash map (that is, you don't necessarily need to add the synchronized keyword). You might improve concurrency.
First of all, make reloadConfig() private (this will indicate that, for consumers of this object, it is indeed immutable: the only method they see is getProperty(...), which, by its name, should not modify the instance).
Then, you only need to guarantee that every thread will see the correct values in the hash map. To do so, you could use the same techniques presented above, or you could use a volatile field, like this:
    class MyDBConfig {
        private volatile boolean initialized = false;
        public String getProperty(...) { if (initialized) { ... } else { throw ... } }
        private void reloadConfig() { ...; initialized = true; }
        public MyDBConfig(...) { ...; reloadConfig(); }
    }
The volatile keyword is very subtle. Volatile writes and volatile reads have a happens-before relationship. A volatile write is said to happen-before a subsequent volatile read of the same (volatile) field. What this means is that all the memory locations that have been modified before (in program order) a volatile write are visible to every other thread after it has executed a subsequent volatile read of the same (volatile) field.
In the code above, you write true to the volatile field after all the values have been set. Then, the method reading values (getProperty(...)) begins by executing a volatile read of the same field. Then this method is guaranteed to see the correct values.
In the example above, if you don't publish the instance before the constructor finishes, it is guaranteed that the exception won't get thrown in the method getProperty(...) (because before the constructor finishes, you have written true to initialized).
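Filling in the elided parts, the flag-based publication described above might look like the sketch below. The class name FlagPublishedConfig, the map field and the constructor argument (standing in for the database read) are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical config class that publishes plain writes via a volatile flag.
final class FlagPublishedConfig {
    private Map<String, String> keyValues;          // deliberately not volatile
    private volatile boolean initialized = false;

    // 'source' stands in for the data read from the database.
    FlagPublishedConfig(Map<String, String> source) {
        keyValues = new HashMap<>(source);          // plain writes come first...
        initialized = true;                         // ...then the volatile write
    }

    String getProperty(String key) {
        // Volatile read: if it observes true, everything written before the
        // volatile write above is visible to this thread as well.
        if (!initialized) {
            throw new IllegalStateException("configuration not initialized");
        }
        return keyValues.get(key);
    }
}
```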

Assuming that key_values will not be put to after reloadConfig, you will need to synchronize access to both reads and writes of the map. You are violating this by only synchronizing on the assignment. You can solve this by removing the synchronized block and declaring key_values as volatile.
Since the HashMap is effectively read-only, I wouldn't assign Collections.synchronizedMap but rather Collections.unmodifiableMap (this wouldn't affect the Map itself; it just prevents accidental puts by someone else who may be using this class).
Note: Also, you should never synchronize on a field that will change. The results are very unpredictable.
Edit: In regard to the other answers: it is highly suggested that all shared mutable data be synchronized, as the effects are otherwise non-deterministic. The key_values field is a shared mutable field, and assignments to it must be synchronized.
Edit: And to clear up any confusion with Bruno Reis: the volatile field would be legal if you still fill the tmp_map and, after it has finished being filled, assign it to this.key_values. It would look like:
    private volatile Map<String, String> key_values = new HashMap<String, String>();
    // ... rest of class
    public void reloadConfig() {
        Map<String, String> tmp_map = new HashMap<String, String>();
        // read data from database
        this.key_values = tmp_map;
    }
You still need the same style, or else, as Bruno Reis noted, it would not be thread-safe.

I would say that if you guarantee that no code will structurally modify your map, then there is no need to synchronize it.
If multiple threads access a hash map concurrently, and at least one
of the threads modifies the map structurally, it must be synchronized
externally.
http://download.oracle.com/javase/6/docs/api/java/util/HashMap.html
The code you have shown provides only read access to the map. Client code cannot make a structural modification.
Since your reload method alters a temporary map and then changes key_values to point to the new map, again I'd say no synchronization is required. The worst that can happen is that someone reads from an old copy of the map.
I'm going to keep my head down and wait for the downvotes now ;)
EDIT
As suggested by Bruno, the fly in the ointment is inheritance. If you cannot guarantee that your class will not be sub-classed, then you should be more defensive.
EDIT
Just to refer back to the specific questions posed by the OP...
Assuming properties are read-only, do I have to use synchronized in getProperty?
Does it make sense to do this.key_values = Collections.synchronizedMap(tmp_map) in reloadConfig?
... I am genuinely interested to know if my answers are wrong. So I won't give up and delete my answer for a while ;)
