Concurrency level for ConcurrentHashMap in synchronized method

I'm trying to fix a memory leak. Heap dump analysis shows that a ConcurrentHashMap occupies around 98% of heap memory. The code instantiates the ConcurrentHashMap with the no-argument constructor, so concurrencyLevel takes its default value of 16. After the map is instantiated, a synchronized method puts data into it.
Since data is put into the map only from a synchronized method, is it safe to set the ConcurrentHashMap's concurrencyLevel to 1?
Following is the sample code snippet:
private volatile Map<String, Integer> storeCache;
public void someMethod() {
storeCache = new ConcurrentHashMap<String, Integer>();
syncMethod();
}
private synchronized void syncMethod() {
storeCache.put("Test", 1);
}

Since data is put into the map only from a synchronized method, is it safe to set the ConcurrentHashMap's concurrencyLevel to 1?
It's certainly safe, in the sense that it's not going to cause any Map corruption. However, it's not going to fix your memory leak. In fact, you probably don't want to synchronize access to the ConcurrentHashMap, which already guarantees safe reads and writes from multiple threads. Synchronizing externally is going to single-thread access to your CHM, which is going to eliminate many of the benefits of the CHM over a HashMap. If you remove the synchronized and specify a concurrencyLevel equal to the estimated number of concurrent writes, you'll probably achieve much better performance.
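As a rough illustration of the point about dropping the external synchronized, here is a minimal sketch in which several threads put into a ConcurrentHashMap with no lock around put(). The class name UnsynchronizedPutDemo, the key format, and the thread counts are made up for the example, and it assumes Java 8 or later for the lambda syntax.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class UnsynchronizedPutDemo {
    // Starts `threads` writers that each put `perThread` distinct keys,
    // with no synchronized block or method around the put() calls.
    static int fill(int threads, int perThread) {
        Map<String, Integer> storeCache = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    storeCache.put("key-" + id + "-" + i, i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // wait for every writer before reading size()
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return storeCache.size();
    }

    public static void main(String[] args) {
        System.out.println(fill(4, 1000));
    }
}
```

Every key is distinct, so no entry should be lost: four writers with 1000 keys each leave 4000 entries. This only shows that the puts are safe without external locking; it says nothing about the leak itself.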
As for your memory leak, the keys and values in the CHM are strong references, meaning the Java garbage collector won't collect them, even if they're no longer referenced anywhere else in your code. So if you're using the CHM as a cache for temporary values, you'll need to .remove() them when your application no longer needs them.
(If you want the semantics of a ConcurrentMap without the strong keys, you can't get that out-of-the-box, but Guava provides a pretty good alternative.)
You may also want to check that the keys that you're .put()ing into the map have properly implemented .equals() and .hashCode().
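To make the equals()/hashCode() point concrete, the sketch below compares a key class that relies on identity equality with one that defines value-based equality. BadKey, GoodKey and KeyEqualityDemo are hypothetical names for illustration only; the original code uses String keys, which already implement both methods correctly.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class KeyEqualityDemo {
    // Hypothetical key that inherits identity-based equals/hashCode from Object.
    static final class BadKey {
        final String id;
        BadKey(String id) { this.id = id; }
    }

    // Hypothetical key with value-based equals/hashCode.
    static final class GoodKey {
        final String id;
        GoodKey(String id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof GoodKey && ((GoodKey) o).id.equals(id);
        }
        @Override public int hashCode() { return id.hashCode(); }
    }

    static int sizeWithBadKeys(int puts) {
        Map<BadKey, Integer> cache = new ConcurrentHashMap<>();
        for (int i = 0; i < puts; i++) {
            cache.put(new BadKey("Test"), i); // every instance counts as a new key
        }
        return cache.size();
    }

    static int sizeWithGoodKeys(int puts) {
        Map<GoodKey, Integer> cache = new ConcurrentHashMap<>();
        for (int i = 0; i < puts; i++) {
            cache.put(new GoodKey("Test"), i); // same logical key, value is overwritten
        }
        return cache.size();
    }
}
```

With the identity-based key the map grows by one entry per put, which in a long-running application looks exactly like a leak; with the value-based key it stays at a single entry.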

Related

ConcurrentModificationException when adding value to hashmap

The code below throws a ConcurrentModificationException when two threads access it at the same time.
I would like to know:
Can this exception be avoided by using a ConcurrentHashMap?
If we use a ConcurrentHashMap, will there be any other issue in a multithreaded environment?
Or is there any other way to prevent this exception?
I do not intend to use synchronized, because this code is used during polling and one thread might have to wait for another to finish execution.
The code is
HashMap<Integer, myModel> aServiceHash = new HashMap<Integer, myModel>();
HashMap<Integer, myModel> rServiceHash = new HashMap<Integer, myModel>();
// Index the accepted services by their original date.
for (myModel ser : serAccepted) {
    aServiceHash.put(ser.getOriginalDate(), ser);
}
// A requested service replaces an accepted one with the same date; otherwise it is stored separately.
for (myModel ser : serRequested) {
    if (aServiceHash.containsKey(ser.getDate())) {
        aServiceHash.put(ser.getDate(), ser);
    } else {
        rServiceHash.put(ser.getDate(), ser);
    }
}
Referred http://examples.javacodegeeks.com/java-basics/exceptions/java-util-concurrentmodificationexception-how-to-handle-concurrent-modification-exception/
http://www.journaldev.com/378/how-to-avoid-concurrentmodificationexception-when-using-an-iterator
How to avoid HashMap "ConcurrentModificationException" while manipulating `values()` and `put()` in concurrent threads?
Using JSF 2.1,JDK 7.1.

HashMap is not thread-safe; ConcurrentHashMap is. When a map is accessed from different threads, prefer a concurrent map for thread safety.
And yes, it will avoid the exception. There will be no multithreading issues from that direction. You'll still need to make sure that no thread removes something you intend to use later.
Another way to prevent the exception is to lock the map before each insert (and around any read or iteration that can run at the same time), whether through a synchronized block or a Lock object.
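The difference in iterator behaviour can be reproduced deterministically in a single thread, because the exception is triggered by a structural modification during iteration rather than by threads as such. The sketch below is an assumption-laden illustration (the class IterationDemo and its keys are invented for the example), not the asker's code.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class IterationDemo {
    // Returns true if inserting a new key while iterating throws
    // ConcurrentModificationException for the given map implementation.
    static boolean throwsOnInsertDuringIteration(Map<Integer, String> map) {
        for (int i = 0; i < 10; i++) {
            map.put(i, "v" + i);
        }
        try {
            for (Integer key : map.keySet()) {
                // Keys 0..9 map to -1..-10; a visited negative key maps back to an
                // existing key, so the map stops growing after 20 entries.
                map.put(-key - 1, "added");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    static boolean hashMapThrows() {
        return throwsOnInsertDuringIteration(new HashMap<>());
    }

    static boolean concurrentHashMapThrows() {
        return throwsOnInsertDuringIteration(new ConcurrentHashMap<>());
    }
}
```

HashMap's fail-fast iterator notices the insert on its next step and throws, while ConcurrentHashMap's weakly consistent iterator carries on. With two real threads the HashMap failure depends on timing, which is why it tends to show up only intermittently.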

Depending on your usage patterns and performance requirements, you could also build a copy-on-write map using a volatile HashMap delegate. This will give you one volatile read for each access whereas in ConcurrentHashMap you have a lot more, and they are a bit more expensive than ordinary reads or writes. Of course, copy-on-write schemes have their own drawbacks when you write to the map. But if you create the map from a pre-populated initial map and treat it as read-only afterwards, copy-on-write will be more efficient.
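A minimal sketch of such a copy-on-write map might look like the following. The class name CopyOnWriteMap and its small API are assumptions made for illustration; a real implementation would probably implement the full Map interface.

```java
import java.util.HashMap;
import java.util.Map;

final class CopyOnWriteMap<K, V> {
    // Readers only ever see a fully built HashMap that is never mutated again.
    private volatile Map<K, V> delegate;

    CopyOnWriteMap(Map<K, V> initial) {
        this.delegate = new HashMap<>(initial);
    }

    // One volatile read, then an ordinary HashMap lookup.
    V get(K key) {
        return delegate.get(key);
    }

    // Writers serialize on this object, copy the whole map, and publish the copy.
    synchronized V put(K key, V value) {
        Map<K, V> copy = new HashMap<>(delegate);
        V previous = copy.put(key, value);
        delegate = copy; // the volatile write publishes the new snapshot
        return previous;
    }

    int size() {
        return delegate.size();
    }
}
```

Each put copies the entire map, so this only pays off when writes are rare compared with reads, which matches the drawback mentioned above.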

Is this code multi-thread safe?

private static Map<Integer, String> map = null;
public static String getString(int parameter){
if(map == null){
map = new HashMap<Integer, String>();
//map gets filled here...
}
return map.get(parameter);
}
Is that code unsafe as multithreading goes?

As mentioned, it's definitely not safe. If the contents of the map are not based on the parameter in getString(), then you would be better served by initializing the map as a static initializer as follows:
private static final Map<Integer, String> MAP = new HashMap<Integer,String>();
static {
// Populate map here
}
The above code runs once, when the class is initialized. It's completely thread-safe (although future modifications to the map are not).
Are you trying to lazy load it for performance reasons? If so, this is much safer:
private static Map<Integer, String> map = null;
public synchronized static String getString(int parameter){
if(map == null){
map = new HashMap<Integer, String>();
//map gets filled here...
}
return map.get(parameter);
}
Using the synchronized keyword will make sure that only a single thread can execute the method at any one time, and that changes to the map reference are always propagated.
If you're asking this question, I recommend reading "Java Concurrency in Practice".

Race condition? Possibly.
If map is null, and two threads check if (map == null) at the same time, each would allocate a separate map. This may or may not be a problem, depending mainly on whether map is invariant. Even if the map is invariant, the cost of populating the map may also become an issue.
Memory leak? No.
The garbage collector will do its job correctly regardless of the race condition.

You do run the risk of initializing map twice in a multi-threaded scenario.
In a managed language, the garbage collector will eventually dispose of the no-longer-referenced instance. In an unmanaged language, you will never free the memory allocated for the overwritten map.
Either way, initialization should be properly protected so that multiple threads do not run initialization code at the same time.
One reason: The first thread could be in the middle of initializing the HashMap, while a second thread comes along, sees that map is not null, and merrily tries to use the partially-initialized data structure.

It is unsafe in the multithreaded case because of a race condition.
But do you really need lazy initialization for the map? If the map is going to be used anyway, it seems you could just initialize it eagerly.

The above code isn't thread-safe; as others have mentioned, your map can be initialized twice. You may be tempted to fix it by adding some synchronization inside the null check; this is known as "double-checked locking". Here is an article that describes the problems with this approach, as well as some potential fixes.
The simplest solution is to make the field a static field in a separate class:
class HelperSingleton {
static Helper singleton = new Helper();
}
It can also be fixed using the volatile keyword, as described in Bill Pugh's article.
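Applied to the map from the question, the separate-class approach could be sketched roughly as follows. The names StringLookup and Holder and the sample entries are placeholders, since the question elides how the map is actually filled.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class StringLookup {
    private StringLookup() {}

    // The JVM initializes Holder on first access to Holder.MAP, exactly once,
    // and class initialization safely publishes the map to all threads.
    private static final class Holder {
        static final Map<Integer, String> MAP = build();

        private static Map<Integer, String> build() {
            Map<Integer, String> m = new HashMap<>();
            m.put(1, "one"); // placeholder data, not from the original question
            m.put(2, "two");
            return Collections.unmodifiableMap(m);
        }
    }

    static String getString(int parameter) {
        return Holder.MAP.get(parameter);
    }
}
```

This keeps the lazy loading without any explicit locking, and wrapping the result in an unmodifiable view guards against later writes, which would not be thread-safe.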

No, this code is not safe for use by multiple threads.
There is a race condition in the initialization of the map. For example, multiple threads could initialize the map simultaneously and clobber each others' writes.
There are no memory barriers to ensure that modifications made by a thread are visible to other threads. For example, each thread could use its own copy of the map because they never "see" the values written by another thread.
There is no atomicity to ensure that invariants are preserved as the map is accessed concurrently. For example, a thread that's performing a get() operation could get into an infinite loop because another thread rehashed the buckets during a simultaneous put() operation.

If you are using Java 6, use ConcurrentHashMap
ConcurrentHashMap JavaDoc

Java synchronized block vs concurrentHashMap vs Collections.synchronizedMap

Say I have a synchronized method, and within that method I update a HashMap like this:
public synchronized void method1()
{
myHashMap.clear();
//populate the hashmap, takes about 5 seconds.
}
Now, while method1 is running and the HashMap is being repopulated, if other threads try to get values from the HashMap, I assume they will be blocked?
Now, instead of using a synchronized method, if I change the HashMap to a ConcurrentHashMap like below, what is the behaviour?
public void method1()
{
myConcurrentHashMap.clear();
//populate the hashmap, takes about 5 seconds.
}
What if I use Collections.synchronizedMap? Is it the same?

Instead of synchronizing every method on a common lock, which restricts access to a single thread at a time, CHM (ConcurrentHashMap) uses a finer-grained locking mechanism called lock striping to allow a greater degree of shared access. Arbitrarily many reading threads can access the map concurrently, readers can access the map concurrently with writers, and a limited number of writers can modify the map concurrently. The result is far higher throughput under concurrent access, with little performance penalty for single-threaded access.
ConcurrentHashMap, along with the other concurrent collections, further improves on the synchronized collection classes by providing iterators that do not throw ConcurrentModificationException, thus eliminating the need to lock the collection during iteration.
As with all improvements, there are still a few tradeoffs. The semantics of methods that operate on the entire Map, such as size and isEmpty, have been slightly weakened to reflect the concurrent nature of the collection. Since the result of size could be out of date by the time it is computed, it is really only an estimate, so size is allowed to return an approximation instead of an exact count. While at first this may seem disturbing, in reality methods like size and isEmpty are far less useful in concurrent environments because these quantities are moving targets.
Secondly, Collections.synchronizedMap: it's just a wrapper that synchronizes every method of the underlying map (a plain HashMap here). I'd call it effectively obsolete now that CHM exists.

If you want all read and write actions on your HashMap to be synchronized, you need to synchronize every method that accesses the HashMap, all on the same lock; it is not enough to synchronize just one method.
ConcurrentHashMap allows thread-safe access to your data without locking the whole map. That means you can add or remove values in one thread and at the same time get values in another thread without running into an exception. See also the documentation of ConcurrentHashMap.

You could probably do:
private volatile HashMap map = newMap();
private HashMap newMap() {
HashMap map = new HashMap();
//populate the hashmap, takes about 5 seconds
return map;
}
public void updateMap() {
map = newMap();
}
A reader sees a constant map, so reads don't require synchronization, and are not blocked.

concurrent HashMap: checking size

ConcurrentHashMap can solve the synchronization issues seen with HashMap, so adding and removing should be faster than using the synchronized keyword with a HashMap. What about checking the size when multiple threads check a ConcurrentHashMap's size? Do we still need the synchronized keyword, as in the following?
public static synchronized int getSize() {
    return aConcurrentHashmap.size();
}

concurrentHashMap.size() returns the size known at the moment of the call, but it might be a stale value by the time you use that number, because another thread may have added or removed items in the meantime.
However, the whole purpose of ConcurrentMap implementations is that you don't need to synchronize them, as they are thread-safe collections.

You can simply call aConcurrentHashmap.size(). However, you have to bear in mind that by the time you get the answer it might already be obsolete. This would happen if another thread were to modify the map concurrently.

You don't need to use synchronized with ConcurrentHashMap except on very rare occasions where you need to perform multiple operations atomically.
To just get the size, you can call it without synchronization.
To clarify when I would use synchronization with ConcurrentHashMap...
Say you have an expensive object you want to create on demand. You want concurrent reads, but also want to ensure that values are only created once.
public ExpensiveObject get(String key) {
return map.get(key); // can work concurrently.
}
public void put(String key, ExpensiveBuilder builder) {
// cannot use putIfAbsent because it needs the object before checking.
synchronized(map) {
if (!map.containsKey(key))
map.put(key, builder.create());
}
}
Note: This requires that all writes are synchronized, but reads can still be concurrent.
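On Java 8 and later, ConcurrentHashMap.computeIfAbsent covers the same create-once requirement without an explicit synchronized block. The sketch below assumes that Java version; OnDemandCache, its factory parameter and the creation counter are hypothetical names added only to make the behaviour observable.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class OnDemandCache<V> {
    private final ConcurrentHashMap<String, V> map = new ConcurrentHashMap<>();
    private final AtomicInteger creations = new AtomicInteger();

    // computeIfAbsent on ConcurrentHashMap is atomic: the factory is applied
    // at most once per absent key, even if several threads ask at once.
    V get(String key, Function<String, V> factory) {
        return map.computeIfAbsent(key, k -> {
            creations.incrementAndGet();
            return factory.apply(k);
        });
    }

    int creations() {
        return creations.get();
    }

    // Has `threads` threads request the same key and reports how many values were built.
    static int concurrentCreations(int threads) {
        OnDemandCache<String> cache = new OnDemandCache<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> cache.get("shared", k -> "built-" + k));
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return cache.creations();
    }
}
```

Other threads updating the same bin may block while the factory runs, so the factory should be reasonably short and must not modify the same map.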

The designers of ConcurrentHashMap chose to prioritize individual operations such as get(), put() and remove() over methods that operate on the complete map, such as isEmpty() or size(). This is because the chances of those whole-map methods being called are (in general) lower than for the individual operations.
Synchronization for size() is not needed here. We can get the size by calling the concurrentHashMap.size() method. This method may return a stale value, as another thread might modify the map in the meantime. But this is an explicitly accepted tradeoff, since these operations are deprioritized.

ConcurrentHashMap is fail-safe: its iterators are weakly consistent and won't throw ConcurrentModificationException. It works well for multithreaded operations.
HashMap, by contrast, does no locking at all. Its iterators are fail-fast: if the map is structurally modified while it is being iterated, the iterator throws ConcurrentModificationException.
In ConcurrentHashMap, locking happens at the bucket level (at the segment level before Java 8), so iteration never throws ConcurrentModificationException.
So, to answer your question: the size of a ConcurrentHashMap is of limited use, because it keeps changing as other threads modify the map. It has a size() method, just as HashMap does.

What is the name of this locking technique?

I've got a gigantic Trove map and a method that I need to call very often from multiple threads. Most of the time this method shall return true. The threads are doing heavy number crunching and I noticed that there was some contention due to the following method (it's just an example, my actual code is a bit different):
synchronized boolean containsSpecial() {
return troveMap.contains(key);
}
Note that it's an "append only" map: once a key is added, it stays in there forever (which is important for what comes next, I think).
I noticed that by changing the above to:
boolean containsSpecial() {
if ( troveMap.contains(key) ) {
// most of the time (>90%) we shall pass here, dodging lock-acquisition
return true;
}
synchronized (this) {
return troveMap.contains(key);
}
}
I get a 20% speedup on my number crunching (verified on lots of runs, running during long times etc.).
Does this optimization look correct (knowing that once a key is there it shall stay there forever)?
What is the name for this technique?
EDIT
The code that updates the map is called way less often than the containsSpecial() method and looks like this (I've synchronized the entire method):
synchronized void addSpecialKeyValue( key, value ) {
....
}

This code is not correct.
Trove doesn't handle concurrent use itself; it's like java.util.HashMap in that regard. So, like HashMap, even seemingly innocent, read-only methods like containsKey() could throw a runtime exception or, worse, enter an infinite loop if another thread modifies the map concurrently. I don't know the internals of Trove, but with HashMap, rehashing when the load factor is exceeded, or removing entries can cause failures in other threads that are only reading.
If the operation takes a significant amount of time compared to lock management, using a read-write lock to eliminate the serialization bottleneck will improve performance greatly. In the class documentation for ReentrantReadWriteLock, there are "Sample usages"; you can use the second example, for RWDictionary, as a guide.
In this case, the map operations may be so fast that the locking overhead dominates. If that's the case, you'll need to profile on the target system to see whether a synchronized block or a read-write lock is faster.
Either way, the important point is that you can't safely remove all synchronization, or you'll have consistency and visibility problems.

It's called wrong locking ;-) Actually, it is some variant of the double-checked locking approach. And the original version of that approach is just plain wrong in Java.
Java threads are allowed to keep private copies of variables in their local memory (think: core-local cache of a multi-core machine). Any Java implementation is allowed to never write changes back into the global memory unless some synchronization happens.
So, it is very well possible that one of your threads has a local memory in which troveMap.contains(key) evaluates to true. Therefore, it never synchronizes and it never gets the updated memory.
Additionally, what happens when contains() sees an inconsistent view of the troveMap data structure in memory?
Look up the Java memory model for the details. Or have a look at this book: Java Concurrency in Practice.

This looks unsafe to me. Specifically, the unsynchronized calls will be able to see partial updates, either due to memory visibility (a previous put not getting fully published, since you haven't told the JMM it needs to be) or due to a plain old race. Imagine if TroveMap.contains has some internal variable that it assumes won't change during the course of contains. This code lets that invariant break.
Regarding the memory visibility, the problem with that isn't false negatives (you use the synchronized double-check for that), but that trove's invariants may be violated. For instance, if they have a counter, and they require that counter == someInternalArray.length at all times, the lack of synchronization may be violating that.
My first thought was to make troveMap's reference volatile, and to re-write the reference every time you add to the map:
synchronized (this) {
troveMap.put(key, value);
troveMap = troveMap;
}
That way, you're setting up a memory barrier such that anyone who reads the troveMap will be guaranteed to see everything that had happened to it before its most recent assignment -- that is, its latest state. This solves the memory issues, but it doesn't solve the race conditions.
Depending on how quickly your data changes, maybe a Bloom filter could help? Or some other structure that's more optimized for certain fast paths?

Under the conditions you describe, it's easy to imagine a map implementation for which you can get false negatives by failing to synchronize. The only way I can imagine obtaining false positives is an implementation in which key insertions are non-atomic and a partial key insertion happens to look like another key you are testing for.
You don't say what kind of map you have implemented, but the stock map implementations store keys by assigning references. According to the Java Language Specification:
Writes to and reads of references are always atomic, regardless of whether they are implemented as 32 or 64 bit values.
If your map implementation uses object references as keys, then I don't see how you can get in trouble.
EDIT
The above was written in ignorance of Trove itself. After a little research, I found the following post by Rob Eden (one of the developers of Trove) on whether Trove maps are concurrent:
Trove does not modify the internal structure on retrievals. However, this is an implementation detail not a guarantee so I can't say that it won't change in future versions.
So it seems like this approach will work for now but may not be safe at all in a future version. It may be best to use one of Trove's synchronized map classes, despite the penalty.

I think you would be better off with a ConcurrentHashMap, which doesn't need explicit locking and allows concurrent reads:
boolean containsSpecial() {
    // containsKey, not contains: on ConcurrentHashMap, contains() tests values
    return troveMap.containsKey(key);
}
void addSpecialKeyValue( key, value ) {
    troveMap.putIfAbsent(key,value);
}
Another option is to use a ReadWriteLock, which allows concurrent reads but no concurrent writes:
ReadWriteLock rwlock = new ReentrantReadWriteLock();
boolean containsSpecial() {
    rwlock.readLock().lock();
    try {
        return troveMap.contains(key);
    } finally {
        rwlock.readLock().unlock(); // Lock has unlock(), not release()
    }
}
void addSpecialKeyValue( key, value ) {
    rwlock.writeLock().lock();
    try {
        //...
        troveMap.put(key,value);
    } finally {
        rwlock.writeLock().unlock();
    }
}

Why reinvent the wheel?
Simply use ConcurrentHashMap.putIfAbsent.
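As a typed, runnable sketch of that suggestion, the class below keeps an append-only set of special keys in a ConcurrentHashMap. The String/Integer types and the class name SpecialKeys are assumptions, since the question does not show the real key and value types, and switching away from Trove gives up its primitive-collection memory savings.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class SpecialKeys {
    private final ConcurrentMap<String, Integer> specials = new ConcurrentHashMap<>();

    // Read path needs no synchronized: containsKey is safe alongside concurrent writes.
    boolean containsSpecial(String key) {
        return specials.containsKey(key);
    }

    // Append-only insert: returns true if this call added the key,
    // false if it was already present (the existing value is kept).
    boolean addSpecialKeyValue(String key, int value) {
        return specials.putIfAbsent(key, value) == null;
    }

    Integer valueOf(String key) {
        return specials.get(key);
    }
}
```

Because putIfAbsent never replaces an existing mapping, the "once a key is there it stays there" property from the question holds without any explicit lock.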
