concurrent HashMap: checking size

concurrent HashMap: checking size - java

Concurrent Hashmap could solve synchronization issue which is seen in hashmap. So adding and removing would be fast if we are using synchronize key work with hashmap. What about checking hashmap size, if mulitple threads checking concurrentHashMap size? do we still need synchronzation key word: something as follows:
public static synchronized getSize(){
return aConcurrentHashmap.size();
}

concurentHashMap.size() will return the size known at the moment of the call, but it might be a stale value when you use that number because another thread has added / removed items in the meantime.
However the whole purpose of ConcurrentMaps is that you don't need to synchronize it as it is a thread safe collection.

You can simply call aConcurrentHashmap.size(). However, you have to bear in mind that by the time you get the answer it might already be obsolete. This would happen if another thread where to concurrently modify the map.

You don't need to use synchronized with ConcurretnHashMap except in very rare occasions where you need to perform multiple operations atomically.
To just get the size, you can call it without synchronization.
To clarify when I would use synchronization with ConcurrentHashMap...
Say you have an expensive object you want to create on demand. You want concurrent reads, but also want to ensure that values are only created once.
public ExpensiveObject get(String key) {
return map.get(key); // can work concurrently.
}
public void put(String key, ExepensiveBuilder builder) {
// cannot use putIfAbsent because it needs the object before checking.
synchronized(map) {
if (!map.containsKey(key))
map.put(key, builder.create());
}
}
Note: This requires that all writes are synchronized, but reads can still be concurrent.

The designers of ConcurrentHashMap thought of giving weightage to individual operations like : get(), put() and remove() over methods which operate over complete HashMap like isEmpty() or size(). This is done because the changes of these methods getting called (in general) are less than the other individual methods.
A synchronization for size() is not needed here. We can get the size by calling concurentHashMap.size() method. This method may return stale values as other thread might modify the map in the meanwhile. But, this is explicitely assumed to be broken as these operations are deprioritized.

ConcorrentHashMap is fail-safe. it won't give any concurrent modification exceptions. it works good for multi threaded operations.
The whole implementation of ConcurrentHashMap is same as HashMap but the while retrieving the elements , HashMap locks whole map restricting doing further modifications which gives concurrent modification exception.'
But in ConcurrentHashMap, the locking happens at bucket level so the chance of giving concurrent modification exception is not present.
So to answer you question here, checking size of ConcurrentHashMap doesn't help because , it keeps chaining based on the operations or modification code that you write on the map. It has size method which is same from the HashMap.

Related

How to prevent threads from reading inconsistent data from ConcurrentHashMap?

I'm using a ConcurrentHashMap<String, String> that works as a cache, and where read operations are performed to validate if an element is already in the cache and write operations to add an element to the cache.
So, my question is: what are the best practices to always read the most recent ConcorrentHashMap values?
I want to ensure data consistency and not have cases like:
With the map.get("key") method, the first thread validates that this key does not yet exist in the map, then it does the map.put("value")
The second thread reads the data before the first thread puts the element on the map, leading to inconsistent data.
Code example:
Optional<String> cacheValue = Optional.ofNullable(cachedMap.get("key"));
if (cacheValue.isPresent()) {
// Perform actions
} else {
cachedMap.putIfAbsent("key", "value");
// Perform actions
}
How can I ensure that my ConcurrentHashMap is synchronized and doesn't retrieve inconsistent data?
Should I perform these map operations inside a synchronized block?

You probably need to do it this way:
if (cachedMap.putIfAbsent("key", "value") == null) {
// Perform actions "IS NOT PRESENT"
} else {
// Perform actions "IS PRESENT"
}
Doing it in two checks is obviously not atomic, so if you're having problems with the wrong values getting put in the cache, then that's likely your problem.

what are the best practices to always read the most recent ConcurrentHashMap values?
Oracle's Javadoc for ConcurrentHashMap says, "Retrievals reflect the results of the most recently completed update operations holding upon their onset." In other words, any time you call map.get(...) or any other method on the map, you are always working with the "most recent" content.
*BUT*
Is that enough? Maybe not. If your program threads expect any kind of consistency between two or more keys in the map, or if your threads expect any kind of consistency between something that is stored in the map and something that is stored elsewhere, then you are going to need to provide some explicit higher-level synchronization between the threads.
I can't provide an example that would be specific to the problem that's puzzling you because your question doesn't really say what that problem is.

Concurrency level for ConcurrentHashMap in synchronized method

I'm trying to fix a memory leak issue. Heap dump analysis shows that a ConcurrentHashMap is occupying around 98% of heap memory. Checked the code and it turns out that ConcurrentHashMap instantiation is using a constructor with no parameter. The default configuration for concurrencyLevel is 16. After this map instantiation I see a synchronized method call where data is being put in the map.
I would like to know that since data is being put only in synchronized method, is it safe to set concurrencyLevel of ConcurrentHashMap to 1?
Following is the sample code snippet:
private volatile Map<String, Integer> storeCache;
public void someMethod() {
storeCache = new ConcurrentHashMap<String, Integer>();
syncMethod();
}
private synchronized void syncMethod() {
storeCache.put("Test", 1);
}

I would like to know that since data is being put only in synchronized method, is it safe to set concurrencyLevel of ConcurrentHashMap to 1?
It's certainly safe, in the sense that it's not going to cause any Map corruption. However, it's not going to fix your memory leak. In fact, you probably don't want to synchronize access to the ConcurrentHashMap, which already guarantees safe reads and writes from multiple threads. Synchronizing externally is going to single-thread access to your CHM, which is going to eliminate many of the benefits of the CHM over a HashMap. If you remove the synchronized and specify a concurrencyLevel equal to the estimated number of concurrent writes, you'll probably achieve much better performance.
As for your memory leak, the keys and values in the CHM are strong references, meaning the Java garbage collector won't collect them, even if they're no longer referenced anywhere else in your code. So if you're using the CHM as a cache for temporary values, you'll need to .remove() them when your application no longer needs them.
(If you want the semantics of a ConcurrentMap without the strong keys, you can't get that out-of-the-box, but Guava provides a pretty good alternative.)
You may also want to check that the keys that you're .put()ing into the map have properly implemented .equals() and .hashCode().

Hashtable: why is get method synchronized?

I known a Hashtable is synchronized, but why its get() method is synchronized?
Is it only a read method?

If the read was not synchronized, then the Hashtable could be modified during the execution of read. New elements could be added, the underlying array could become too small and could be replaced by a bigger one, etc. Without sequential execution, it is difficult to deal with these situations.
However, even if get would not crash when the Hashtable is modified by another thread, there is another important aspect of the synchronized keyword, namely cache synchronization. Let's use a simplified example:
class Flag {
bool value;
bool get() { return value; } // WARNING: not synchronized
synchronized void set(bool value) { this->value = value; }
}
set is synchronized, but get isn't. What happens if two threads A and B simultaneously read and write to this class?
1. A calls read
2. B calls set
3. A calls read
Is it guaranteed at step 3 that A sees the modification of thread B?
No, it isn't, as A could be running on a different core, which uses a separate cache where the old value is still present. Thus, we have to force B to communicate the memory to other core, and force A to fetch the new data.
How can we enforce it? Everytime, a thread enters and leaves a synchronized block, an implicit memory barrier is executed. A memory barrier forces the cache to be updated. However, it is required that both the writer and the reader have to execute the memory barrier. Otherwise, the information is not properly communicated.
In our example, thread B already uses the synchronized method set, so its data modification is communicated at the end of the method. However, A does not see the modified data. The solution is to make get synchronized, so it is forced to get the updated data.

Have a look in Hashtable source code and you can think of lots of race conditions that can cause problem in a unsynchronized get() .
(I am reading JDK6 source code)
For example, a rehash() will create a empty array, and assign it to the instance var table, and put the entries from old table to the new one. Therefore if your get occurs after the empty array assignment, but before actually putting entries in it, you cannot find your key even it is in the table.
Another example is, there is a loop iterate thru the linked list at the table index, if in middle in your iteration, rehash happens. You may also failed to find the entry even it exists in the hashtable.

Hashtable is synchronized meaning the whole class is thread-safe
Inside the Hashtable, not only get() method is synchronized but also many other methods are. And particularly put() method is synchronized like Tom said.
A read method must be synchronized as a write method because it will make sure the visibility and the consistency of the variable.

Java synchronized block vs concurrentHashMap vs Collections.synchronizedMap

Say If have a synchronized method and within that method, I update a hashmap like this:
public synchronized void method1()
{
myHashMap.clear();
//populate the hashmap, takes about 5 seconds.
}
now while the method1 is running and the hashmap is being re-populated, if there are other threads tring to get the value of the hashmap, I assume they will get blocked?
Now instead of using sync method, if I change hashmap to ConcurrentHashMap like below, what's the behaviour?
public void method1()
{
myConcurrentHashMap.clear();
//populate the hashmap, takes about 5 seconds.
}
what if i use Collections.synchronizedMap ? is it the same?

CHM(ConcurrentHashMap), instead of synchronizing every method on a common lock, restricting access to a single thread
at a time, it uses a finer-grained locking mechanism called lock striping to allow a greater degree of shared access. Arbitrarily many reading threads
can access the map concurrently, readers can access the map concurrently with
writers, and a limited number of writers can modify the map concurrently. The result
is far higher throughput under concurrent access, with little performance penalty for
single-threaded access.
ConcurrentHashMap, along with the other concurrent collections, further improve on
the synchronized collection classes by providing iterators that do not throw
ConcurrentModificationException, thus eliminating the need to lock the collection
during iteration.
As with all improvements, there are still a few tradeoffs. The semantics of methods
that operate on the entire Map, such as size and isEmpty, have been slightly
weakened to reflect the concurrent nature of the collection. Since the result of size
could be out of date by the time it is computed, it is really only an estimate, so size
is allowed to return an approximation instead of an exact count. While at first this
may seem disturbing, in reality methods like size and isEmpty are far less useful in
concurrent environments because these quantities are moving targets.
Secondly, Collections.synchronizedMap
It's just simple HashMap with synchronized methods - I'd call it deprecated dute to CHM

If you want to have all read and write actions to your HashMap synchronized, you need to put the synchronize on all methods accessing the HashMap; it is not enough to block just one method.
ConcurrentHashMap allows thread-safe access to your data without locking. That means you can add/remove values in one thread and at the same time get values out in another thread without running into an exception. See also the documentation of ConcurrentHashMap

you could probably do
volatile private HashMap map = newMap();
private HashMap newMap() {
HashMap map = new HashMap();
//populate the hashmap, takes about 5 seconds
return map;
}
public void updateMap() {
map = newMap();
}
A reader sees a constant map, so reads don't require synchronization, and are not blocked.

Concurrent Modification Exceptions

I am getting the following java.util.ConcurrentModificationException on this method
private AtomicReference<HashMap<String, Logger>> transactionLoggerMap = new AtomicReference<HashMap<String,Logger>>();
public void rolloutFile() {
// Get all the loggers and fire a temp log line.
Set<String> transactionLoggerSet = (Set<String>) transactionLoggerMap.get().keySet();
Iterator<String> transactionLoggerSetIter = transactionLoggerSet.iterator();
while(transactionLoggerSetIter.hasNext()){
String key = (String) transactionLoggerSetIter.next();
Logger txnLogger = transactionLoggerMap.get().get(key);
localLogger.trace("About to do timer task rollover:");
txnLogger.info(DataTransformerConstants.IGNORE_MESSAGE);
}
}
Please suggest, if I am using a Atomic Reference, how do i get a como?

A ConcurrentModificationException means that you have modified the collection outside of the iterator. I don't see any modifications in your loop so I assume that there is another thread that is also adding or removing from the transactionLoggerMap at the same time you are iterating across it.
Even though you have it wrapped in an AtomicReference, you still cannot have two threads making changes to the same unsynchronized collection at the same time. AtomicReference does not synchronize the object it is wrapping -- it just gives you a way to atomically get and set that reference.
You will need to make this a synchronized collection either by using the ConcurrentHashMap class or wrapping your HashMap using the Collections.synchronizedMap(map) method.

Because you are not protecting against concurrent modifications during your iteration. The Atomic reference only makes sure you get your Map (and it's contents).

Maybe you can consider iterating over a local copy of the collection instead of the same collection. It would be an easy way to ensure your collection will not be modified while you're looping over it. It is advisable to use immutable objects on a multi-threaded environment and you prevent this kind of problems for free.
Hope it helps.

Remove the redundant
Set<String> transactionLoggerSet = (Set<String>) transactionLoggerMap.get().keySet();
You would still have to use synchronization as you are iterating over the map. SynchronizedMap guarentees consistency for its API methods. For the rest, you would need to do client side synchronization

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.