Is synchronization necessary for unmodifiable maps? - java

I'm using JDK 1.4, so I don't have access to the nice concurrency utilities in 1.5+.
Consider the following class fragment:
private Map map = Collections.EMPTY_MAP;
public Map getMap() {
    return map;
}
public synchronized void updateMap(Object key, Object value) {
    Map newMap = new HashMap(map);
    newMap.put(key, value);
    map = Collections.unmodifiableMap(newMap);
}
Is it necessary to synchronize access to the map reference (or declare it volatile), given that the map can only be updated via the updateMap method, which is synchronized?
The map object will be accessed (read) in multiple threads, especially via Iterators. Knowing that Iterators will throw an exception if the back-end map's structure is changed, I figured I would make the map unmodifiable. So as I change the structure of the map via updateMap, the existing iterators will continue working on the "old" map (which is fine for my purposes).
The side effect is that I don't have to worry about synchronizing reads. In essence, reads will outnumber writes by a large margin. Threads that are currently iterating over the map object will continue to do so, and any new threads that start will grab the latest map. (Well, I assume they will, considering erickson's comments here: Java concurrency scenario -- do I need synchronization or not?)
Can somebody comment on whether or not this idea is a good one?
Thanks!

You should use the volatile keyword to ensure that threads will see the most recent Map version. Otherwise, without synchronization, there is no guarantee that other threads will ever see anything except the empty map.
Since your updateMap() is synchronized, each access to it will see the latest value for map. Thus, you won't lose any updates. This is guaranteed. However, since your getMap() is not synchronized and map is not volatile, there is no guarantee that a thread will see the latest value for map unless that thread itself was the most recent thread to update the map. Use of volatile will fix this.
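A minimal sketch of the fix this answer describes, assuming Java 5 or later (generics, Collections.emptyMap, and the JSR-133 memory model, under which a volatile write also publishes the ordinary writes made before it; the JDK 1.4 memory model gave volatile weaker ordering guarantees). The class name VolatileMapHolder is hypothetical; on JDK 1.4 you would drop the type parameters and keep Collections.EMPTY_MAP.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder illustrating the question's copy-on-write map
// with the reference declared volatile (Java 5+ syntax and memory model).
final class VolatileMapHolder<K, V> {
    // volatile: a reader always sees the most recently assigned map and,
    // under the Java 5+ memory model, the entries put into it before assignment.
    private volatile Map<K, V> map = Collections.emptyMap();

    // Lock-free read: callers get an immutable snapshot they can iterate freely.
    Map<K, V> getMap() {
        return map;
    }

    // Writers stay serialized so that two concurrent updates cannot
    // copy the same old map and silently drop one of the puts.
    synchronized void updateMap(K key, V value) {
        Map<K, V> newMap = new HashMap<K, V>(map);
        newMap.put(key, value);
        map = Collections.unmodifiableMap(newMap);
    }
}
```

Iterators taken from an earlier getMap() call keep walking their own snapshot while later updates publish new maps, which is the behaviour the question is after.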
However, you do have access to the Java 1.5 and 1.6 concurrency additions. A backport exists. I highly recommend use of the backport, as it will allow easy migration to the JDK concurrency classes when you are able to migrate to a later JDK, and it allows higher performance than your method does. (Although if updates to your map are rare, your performance should be OK.)

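Technically, if you make map volatile, you don't need to synchronize updateMap for visibility: updating 'map' is atomic, and all other instructions in that method operate on objects local to the current thread. You still need it if several threads may call updateMap at once, though, since two concurrent calls could copy the same old map and one of the updates would be lost.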
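Dropping synchronized from updateMap is only safe when a single thread writes: volatile alone does not stop two concurrent calls from copying the same old map and losing one put. A minimal sketch of a lock-free writer that avoids this, assuming Java 5+ for java.util.concurrent.atomic.AtomicReference (not available in plain JDK 1.4); the class name CasMapHolder is hypothetical:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical lock-free variant: the current map lives in an AtomicReference
// and writers retry until compareAndSet succeeds, so no update is lost.
final class CasMapHolder<K, V> {
    private final AtomicReference<Map<K, V>> ref =
            new AtomicReference<Map<K, V>>(Collections.<K, V>emptyMap());

    // Same visibility guarantees as reading a volatile field.
    Map<K, V> getMap() {
        return ref.get();
    }

    void updateMap(K key, V value) {
        while (true) {
            Map<K, V> current = ref.get();
            Map<K, V> copy = new HashMap<K, V>(current);
            copy.put(key, value);
            // Succeeds only if no other thread replaced the map since we read it;
            // otherwise loop and rebuild the copy from the newer map.
            if (ref.compareAndSet(current, Collections.unmodifiableMap(copy))) {
                return;
            }
        }
    }
}
```

Under heavy write contention the retry loop copies the map repeatedly, so this only pays off when writes are rare, as in the question.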

I know this isn't part of your question, but if you are worried about concurrency and cannot use Java 1.5's ConcurrentHashMap, use a Hashtable. It is a blocking, fully synchronized map implementation that handles concurrent writes and reads.

Related

Is Hashmap's containsKey method threadsafe if the map is initialized once, and is never modified again

Can we use HashMap's containsKey() method without synchronizing in a multi-threaded environment?
Note: threads are only going to read the HashMap. The map is initialized once, and is never modified again.
It really depends on how/when your map is accessed.
Assuming the map is initialized once, and never modified again, then methods that don't modify the internal state like containsKey() should be safe.
In this case though, you should make sure your map really is immutable, and is published safely.
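One way to get both properties, sketched under the assumption that the map's contents are known when the owning object is constructed (the class ReadOnlyLookup is hypothetical): build the map in the constructor, wrap it unmodifiable, and store it in a final field. The Java memory model's final-field rules then make the fully populated map visible to any thread that sees the constructed object, as long as this does not escape during construction.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical read-only lookup table: built completely in the constructor,
// never modified afterwards.
final class ReadOnlyLookup {
    // final: any thread that obtains a reference to a constructed ReadOnlyLookup
    // is guaranteed to see the map as it was when the constructor finished.
    private final Map<String, String> map;

    ReadOnlyLookup(Map<String, String> source) {
        // Defensive copy, then wrap so no caller can modify it structurally.
        this.map = Collections.unmodifiableMap(new HashMap<String, String>(source));
    }

    // Read-only access; safe to call from any number of threads.
    boolean containsKey(String key) {
        return map.containsKey(key);
    }
}
```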
Now if in your particular case the state does change during the course of your program, then no, it is not safe.
From the documentation:
Note that this implementation is not synchronized.
If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.
In this case, you should use ConcurrentHashMap, or synchronize externally.
You shouldn't look at a single method this way. A HashMap is not meant to be used in a multi-threaded setup.
Having said that, the one exception would be: a map that gets created once (single threaded), and afterwards is "read" only. In other words: if a map doesn't get changed anymore, then you can have as many threads reading it as you want.
From that point of view, containsKey() calls alone shouldn't cause a problem. The problem arises when the information this method relies on changes over time.
No, it is not thread-safe for any operations. You need to synchronise all access or use something like ConcurrentHashMap.
My favourite production system troubleshooting horror story is when we found that HashMap.get went into an infinite loop locking up 100% CPU forever because of missing synchronisation. This happened because the linked lists that were used within each bucket got into an inconsistent state. The same could happen with containsKey.
You should be safe if no one modifies the HashMap after it has been initially published, but better use an implementation that guarantees this explicitly (such as ImmutableMap or, again, a ConcurrentMap).
No. It is not thread-safe, not at all.
It's complicated, but, mostly, no.
The spec of HashMap makes no guarantees whatsoever. It therefore reserves the right to blast Yankee Doodle Dandy from your speakers if you try: you're just not supposed to use it that way.
... however, in practice, whilst the API of HashMap makes no guarantees, it generally works out. But mind the horror story in Thilo's answer.
... buuut, the Java Memory Model works like this: you should consider that each thread gets an individual copy of each and every field across the entire heap of the VM. These individual copies are then synced up at indeterminate times. That means all sorts of code simply isn't going to work right: you add an entry to the map from one thread, and if you then access that map from another thread, it's theoretically possible that you won't see it, even though a lot of time has passed. Also, internally, the map uses multiple fields, and presumably these fields must be consistent with each other or you'll get weird behaviours (exceptions and wrong results). The JMM makes no guarantees about consistency either. The way out of this dilemma is that the JMM offers 'happens-before' relationships, which guarantee that changes have been synced up. Using the synchronized keyword is one easy way to establish such relationships.
Why not use a ConcurrentHashMap, which has all the bells and whistles built in? It does in fact guarantee that adding an entry from thread A and then querying it via containsKey from thread B will get you a consistent answer. That answer might still be 'no, that key is not in the map', because perhaps thread B got there slightly before or slightly after thread A, and there's no way for you to know. But it won't throw any exceptions or do something really bizarre, such as suddenly returning 'false' for things you added ages ago.
So, whilst it's complicated, the answer is basically: Don't do that; either use a synchronized guard, or, probably the better choice: ConcurrentHashMap.
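A minimal sketch of the synchronized-guard option, assuming the map can be kept private to one owning object (GuardedRegistry is a hypothetical name). Every access, reads included, takes the same lock; the release of that lock by a writer happens-before the next acquisition by a reader, which is what makes earlier puts visible.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry that owns its HashMap and guards every access with one lock.
final class GuardedRegistry {
    private final Object lock = new Object();
    private final Map<String, Integer> map = new HashMap<>();

    void put(String key, int value) {
        synchronized (lock) {   // unlocking here happens-before the next lock of 'lock'
            map.put(key, value);
        }
    }

    boolean containsKey(String key) {
        synchronized (lock) {   // acquiring the same lock makes earlier puts visible
            return map.containsKey(key);
        }
    }
}
```

Swapping the field for a ConcurrentHashMap and removing the synchronized blocks gives the other option from this answer.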
No. Read the highlighted part of the HashMap documentation:
Note that this implementation is not synchronized.
So you should handle it:
If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.
And suggested solutions:
This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method
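A sketch of the second suggestion, with a hypothetical SharedCounts class: the HashMap is wrapped at creation so no unsynchronized reference escapes. Single calls are synchronized by the wrapper, but the Collections.synchronizedMap Javadoc also requires callers to hold the wrapper's lock manually while iterating over any of its collection views, since a traversal spans many calls.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class using a synchronized wrapper created at construction time.
final class SharedCounts {
    private final Map<String, Integer> counts =
            Collections.synchronizedMap(new HashMap<String, Integer>());

    void record(String key, int value) {
        counts.put(key, value);        // single call: synchronized by the wrapper
    }

    boolean has(String key) {
        return counts.containsKey(key); // single call: synchronized by the wrapper
    }

    List<String> sortedKeys() {
        List<String> keys;
        synchronized (counts) {        // hold the wrapper's own lock for the whole traversal
            keys = new ArrayList<String>(counts.keySet());
        }
        Collections.sort(keys);        // sorting the private copy needs no lock
        return keys;
    }
}
```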
@user7294900 is right.
If your application builds the HashMap in a thread-safe way, never modifies it structurally afterwards, and only invokes containsKey, it's thread-safe.
For instance, I've used HashMap like this:
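import java.util.HashMap;
import java.util.Map;
import javax.annotation.PostConstruct; // jakarta.annotation.PostConstruct on Spring 6+
import org.springframework.stereotype.Component;
@Component
public class SpringSingletonBean {
    // Populated once in init(), then only read.
    private Map<String, String> map = new HashMap<>();
    public void doSomething() {
        // ...
        if (map.containsKey("aaaa")) {
            // do something
        }
    }
    @PostConstruct
    public void init() {
        // do something to initialize the map
    }
}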
It works well.

ConcurrentHashMap in Java locking mechanism for computeIfPresent

I'm using Java 8 and would like to know if the computeIfPresent operation of the ConcurrentHashMap does lock the whole table/map or just the bin containing the key.
From the documentation of the computeIfPresent method:
Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this map
This looks like the whole map is locked when invoking this method for a key. Why does the whole map have to be locked if a value of a certain key is updated? Wouldn't it be better to just lock the bin containing the key/value pair?
Judging by implementation (Oracle JDK 1.8.0_101), just the corresponding bin is locked. This does not contradict the documentation snippet you've cited, since it mentions that some update operations may be blocked, not necessarily all. Of course, it'd be clearer if the docs stated explicitly what gets locked, but that'd be leaking implementation details to what is de facto a part of the interface.
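For reference, a typical use that stays within the documented contract (a short, side-effect-free remapping function that touches no other key). HitCounter and its methods are hypothetical names; computeIfPresent returns the new value, or null if the key was absent, in which case nothing is inserted.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-page hit counter built on computeIfPresent (Java 8+).
final class HitCounter {
    private final ConcurrentHashMap<String, Integer> hits = new ConcurrentHashMap<>();

    void register(String page) {
        hits.putIfAbsent(page, 0);
    }

    // Atomically increments the counter only if the page is registered.
    // The function is short and updates no other mapping, as the Javadoc requires.
    Integer hit(String page) {
        return hits.computeIfPresent(page, (key, count) -> count + 1);
    }

    Integer count(String page) {
        return hits.get(page);
    }
}
```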
If you look at the source code of ConcurrentHashMap#computeIfPresent, you'll notice that the synchronisation is made directly on the node itself.
So an operation will only block if you attempt to update any node that is being computed. You should not have any problems with other nodes.
From my understanding, the synchronization directly on nodes is actually the major add-on of ConcurrentHashMap vs old Hashtable.
If you look at the source code of Hashtable, you'll notice that the synchronization is much wider: its methods synchronize on the whole table. By contrast, any synchronization in ConcurrentHashMap happens directly on nodes.
The end of the Hashtable documentation also suggests this:
[...] Hashtable is synchronized. If a thread-safe implementation is not
needed, it is recommended to use HashMap in place of Hashtable. If a
thread-safe highly-concurrent implementation is desired, then it is
recommended to use ConcurrentHashMap in place of Hashtable.

Get and Put to Map at same time in Java

I have a question. What will happen if I get from a map at the same time as I am putting data into it?
What I mean is: if map.get() and map.put() are called by two separate threads at the same time, will get() wait until put() has finished?
It depends on which Map implementation you are using.
For example, ConcurrentHashMap supports full concurrency of retrievals, and get() will not wait for put() to finish, as stated in the Javadoc:
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset.
Other implementations (such as HashMap) don't support concurrency and shouldn't be used by multiple threads at the same time.
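A small illustration of the non-blocking behaviour, assuming Java 8+ for lambdas (ConcurrentFill is a hypothetical name): two threads put into a ConcurrentHashMap while the calling thread reads from it. The concurrent get() returns either null or the stored value without waiting for the writers, and Thread.join() is what guarantees that every put is visible afterwards.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo: concurrent writers plus a concurrent reader on one map.
final class ConcurrentFill {
    static Map<Integer, Integer> fill() throws InterruptedException {
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        Thread evens = new Thread(() -> {
            for (int i = 0; i < 10_000; i += 2) map.put(i, i);
        });
        Thread odds = new Thread(() -> {
            for (int i = 1; i < 10_000; i += 2) map.put(i, i);
        });
        evens.start();
        odds.start();

        // Runs while the puts are in progress and does not wait for them:
        // the result is either null (not written yet) or 42.
        Integer seen = map.get(42);
        System.out.println("get(42) during the puts returned " + seen);

        evens.join();
        odds.join(); // join() makes every put visible to this thread
        return map;
    }
}
```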
It might throw ConcurrentModificationException (I'm not sure about that). It is always better to use synchronizedMap. This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:
Map map = Collections.synchronizedMap(new HashMap(...));
Map is an interface, so the answer depends on the implementation you're using.
Generally speaking, the simpler implementations of this interface, such as HashMap and TreeMap, are not thread-safe. If you don't have some synchronization built around them, concurrently putting and getting will result in undefined behavior: you may get the new value, you may get the old one, or you may get a ConcurrentModificationException, or something worse.
If you want to handle the same Map from different threads, either use one of the implementations of ConcurrentMap (e.g., ConcurrentHashMap), which guarantees a happens-before ordering (i.e., if the get was fired before the put, you'd get the old value, even if the put is ongoing, and vice versa), or synchronize access to the Map (e.g., by calling Collections#synchronizedMap(Map)).

Why should we use HashMap in multi-threaded environments?

Today I was reading about how HashMap works in Java. I came across a blog, and I am quoting directly from its article. I have gone through this article on Stack Overflow. Still, I want to know the details.
So the answer is yes, there is a potential race condition while resizing a HashMap in Java: two threads may find at the same time that the HashMap needs resizing, and both try to resize it. During resizing, the elements of a bucket, which are stored in a linked list, get reversed in order as they migrate to the new bucket, because Java's HashMap doesn't append a new element at the tail; it inserts it at the head to avoid traversing the list. If the race condition happens, you will end up with an infinite loop.
It states that, because HashMap is not thread-safe, a race condition can occur while it is being resized. Yet I have even seen people in our office projects using HashMaps extensively, knowing they are not thread-safe. If it is not thread-safe, why should we use HashMap at all? Is it just a lack of knowledge among developers, who might not be aware of structures like ConcurrentHashMap, or is there some other reason? Can anyone shed some light on this puzzle?
I can confidently say ConcurrentHashMap is a pretty ignored class. Not many people know about it and not many people care to use it. The class offers a very robust and fast method of synchronizing a Map collection. I have read a few comparisons of HashMap and ConcurrentHashMap on the web. Let me just say that they’re totally wrong. There is no way you can compare the two, one offers synchronized methods to access a map while the other offers no synchronization whatsoever.
What most of us fail to notice is that while our applications, web applications especially, work fine during the development and testing phase, they usually fall over under heavy (or even moderately heavy) load. This is because we expect our HashMaps to behave a certain way, but under load they often misbehave. Hashtables offer concurrent access to their entries, with a small caveat: the entire map is locked to perform any sort of operation.
While this overhead is negligible in a web application under normal load, under heavy load it can lead to delayed response times and overtaxing of your server for no good reason. This is where ConcurrentHashMap steps in. It offers all the features of Hashtable with performance almost as good as a HashMap, and it accomplishes this with a very simple mechanism.
Instead of a map-wide lock, the collection maintains 16 locks by default, each of which guards one segment (a group of buckets) of the map. This effectively means that up to 16 threads can modify the collection at the same time, as long as they are working on different segments. In fact, common operations such as get and put never lock the entire map. (This describes the Java 5 to 7 implementation; Java 8 locks individual bins instead.)
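The "several independent locks" idea (usually called lock striping) can be sketched in a few lines. This StripedMap is a hypothetical, simplified illustration, not ConcurrentHashMap's actual code: keys are hashed to one of N partitions, each a plain HashMap guarded by its own lock, so threads working on different partitions never contend. It assumes non-null keys and Java 8+ (Math.floorMod).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified lock-striped map (not ConcurrentHashMap's real code).
final class StripedMap<K, V> {
    private final Object[] locks;
    private final Map<K, V>[] parts;

    @SuppressWarnings("unchecked")
    StripedMap(int stripes) {
        locks = new Object[stripes];
        parts = (Map<K, V>[]) new Map[stripes];
        for (int i = 0; i < stripes; i++) {
            locks[i] = new Object();
            parts[i] = new HashMap<K, V>();
        }
    }

    // Chooses a partition from the key's hash code (non-negative modulo).
    private int stripeFor(Object key) {
        return Math.floorMod(key.hashCode(), locks.length);
    }

    V put(K key, V value) {
        int s = stripeFor(key);
        synchronized (locks[s]) {   // only this partition is locked
            return parts[s].put(key, value);
        }
    }

    V get(Object key) {
        int s = stripeFor(key);
        synchronized (locks[s]) {
            return parts[s].get(key);
        }
    }

    // Visits every partition one lock at a time, so the total is not an atomic snapshot.
    int size() {
        int total = 0;
        for (int i = 0; i < locks.length; i++) {
            synchronized (locks[i]) {
                total += parts[i].size();
            }
        }
        return total;
    }
}
```

The size() method shows the cost of striping: any whole-map operation has to visit every partition, and unless it takes all the locks at once, the result is not an atomic snapshot.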
There are several aspects to this. First of all, most collections are not thread-safe. If you want a thread-safe collection, you can call Collections.synchronizedCollection or Collections.synchronizedMap.
But the main point is this: you want your threads to run in parallel with no synchronization at all, if possible. This is something you should strive for, but of course it cannot be achieved every time you deal with multithreading.
But there is no point in making the default collection/map thread safe, because it should be an edge case that a map is shared. Synchronization means more work for the jvm.
In a multithreaded environment, you have to ensure that it is not modified concurrently, or its internal state can become corrupted, because it is not synchronized in any way.
Just check the API; I used to think the same way.
I thought that the solution was to use the static Collections.synchronizedMap method, expecting it to return a better implementation. But if you look at the source code, you will see that all it does is wrap each call in a synchronized block on a mutex, which by default is the wrapper map itself, so reads cannot occur concurrently.
In the Jakarta commons project, there is an implementation that is called FastHashMap. This implementation has a property called fast. If fast is true, then the reads are non-synchronized, and the writes will perform the following steps:
1. Clone the current structure
2. Perform the modification on the clone
3. Replace the existing structure with the modified clone
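import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;
public class FastSynchronizedMap<K, V> implements Map<K, V>, Serializable {
    private final Map<K, V> m;
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // ... constructor and other Map methods omitted ...
    public V get(Object key) {
        lock.readLock().lock();        // shared: many readers may hold it at once
        try {
            return m.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }
    public V put(K key, V value) {
        lock.writeLock().lock();       // exclusive: blocks readers and other writers
        try {
            return m.put(key, value);
        } finally {
            lock.writeLock().unlock(); // always release, even if put() throws
        }
    }
    // ... remaining Map methods omitted ...
}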
Note the try/finally blocks: they guarantee that the lock is released no matter what problem is encountered inside the try block.
This implementation works well when you have almost no write operations, and mostly read operations.
HashMap can be used when only a single thread accesses it. However, when multiple threads start accessing the HashMap, there are two main problems:
1. Resizing of the HashMap is not guaranteed to work as expected.
2. A ConcurrentModificationException may be thrown. This can also happen with a single thread, if it modifies the HashMap while iterating over it.
A workaround sometimes used in multi-threaded environments is to initialize the HashMap with the expected number of entries, avoiding the need for resizing. This only removes the resize race, however; concurrent writes still need synchronization.

How do I get the latest view of a ConcurrentHashMap?

You can ensure that changes one thread makes to a variable are seen by other threads by making the variable volatile, or by having both threads synchronize on something. If the thing being changed is a java.util.concurrent.ConcurrentHashMap, does it make sense to create a memory barrier by declaring the variable holding this map as volatile, or are readers accessing the map (say via myMap.values()) going to get the latest possible view anyway? For context, I have a read-heavy, write-light scenario in which I am switching my lock-free read solution to a ConcurrentHashMap.
ConcurrentHashMap guarantees that there is a happens-before relationship between writes and subsequent reads. So yes, when you read (get), you will see the most recent changes that have been "committed" (put has returned).
Note: this does not apply to iterators as explained in the javadoc.
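To make the iterator caveat concrete: ConcurrentHashMap iterators are weakly consistent, meaning they never throw ConcurrentModificationException and may or may not reflect changes made after they were created. A minimal sketch (WeaklyConsistentDemo is a hypothetical name) that adds keys during iteration:

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo of a weakly consistent iterator.
final class WeaklyConsistentDemo {
    // Adds a new key for every original key while iterating and returns how many
    // keys the iterator visited. A HashMap's fail-fast iterator would normally
    // throw ConcurrentModificationException here; this one does not.
    static int iterateWhileAdding(ConcurrentHashMap<Integer, String> map) {
        int visited = 0;
        for (Integer key : map.keySet()) {
            if (key < 100) {
                map.putIfAbsent(key + 100, "added"); // structural change mid-iteration
            }
            visited++;
        }
        return visited;
    }
}
```

Starting from five keys, whether the newly added keys are visited depends on where they land relative to the iterator's position, so the visit count lies anywhere between 5 and 10.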
The variable "holding" the map is a reference, or pointer, to the map object (simplified: the memory address where the map is stored). Making it volatile would only affect the reference, not the map object itself. As long as you always use the same Map object and ensure that the map is fully initialized before the threads use it, you don't need a volatile reference to it. The concurrency is handled transparently inside the ConcurrentHashMap.
Yes, ConcurrentHashMap gives the latest view. If you refer to the Javadoc at http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ConcurrentHashMap.html#get(java.lang.Object), it clearly states that:
Retrievals reflect the results of the most recently completed update
operations holding upon their onset
It has some more details and I would suggest you go and read it.
Besides, as already noted, using volatile is not what you want, as it would only affect the pointer and not the actual contents of the map.
All you need to do is make sure that the reference holding the map is final, so you get a final field fence that guarantees you see a properly initialised map and that the reference itself is not changed.
As others point out, ConcurrentHashMap will guarantee visibility/happens-before of writes internally, as all of the java.util.concurrent.* collections do. You should however use the conditional writes exposed on the ConcurrentMap interface to avoid data-races in your writes.
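A sketch of both points, with hypothetical names (Sessions, firstLogin, incrementLogins): the map sits in a final field, so no volatile is needed, and writes that depend on the current value use ConcurrentMap's conditional operations, putIfAbsent and replace(key, oldValue, newValue), instead of a separate check followed by a put.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical login counter using only conditional writes.
final class Sessions {
    // final: the reference is safely published and never changes; the map
    // itself guarantees visibility of its entries, so volatile adds nothing.
    private final ConcurrentMap<String, Integer> logins = new ConcurrentHashMap<>();

    // Check-then-act such as "if (!logins.containsKey(u)) logins.put(u, 1);" is racy:
    // two threads can both pass the check. putIfAbsent checks and inserts atomically.
    boolean firstLogin(String user) {
        return logins.putIfAbsent(user, 1) == null;
    }

    // Atomic increment built from conditional writes, retrying on conflict.
    void incrementLogins(String user) {
        while (true) {
            Integer old = logins.putIfAbsent(user, 1);
            if (old == null) {
                return; // no previous value: we stored the first one
            }
            if (logins.replace(user, old, old + 1)) {
                return; // value was still 'old', so our increment won
            }
            // another thread changed the value in between: retry
        }
    }

    Integer count(String user) {
        return logins.get(user);
    }
}
```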
