Understanding `structural modification` in HashMap - java

In the doc, it says
If multiple threads access a hash map concurrently, and at least one
of the threads modifies the map structurally, it must be synchronized
externally. (A structural modification is any operation that adds or
deletes one or more mappings; merely changing the value associated
with a key that an instance already contains is not a structural
modification.)
This seems to indicate that changing the value associated with a key that an instance already contains does not need external synchronization. But I think that is not thread-safe, right?

For thread visibility purposes yes, you'll need external syncing if you have two threads that communicate using the map. But unsynchronized structural changes have a chance of corrupting the map completely (imagine when 2 threads put a new mapping and both start to rehash the map), whereas changing a mapped value will have less dramatic effects.
Even with only one thread doing structural modifications it's problematic if the backing array is grown/rehashed. Other threads using the same array (or the old one, if the array is grown) can encounter lost updates (thread puts value in the old array instead of new array), disappearing mappings (thread puts value in the array, while another thread is rehashing the same array, value gets put in the wrong bucket) and so on.
So when is it safe to not synchronize? Almost never. A safe situation would be a pre-built map with threads only accessing "their" entries, like
thread1: map.get("A");
thread2: map.put("B", "1"); // Assume "B" was in the map already
thread3: map.get("C");
No problems because no structural changes and threads are not sharing keys. As soon as you start sharing keys between threads, you can get race conditions and visibility issues. If you introduce structural changes, those visibility issues can result in data loss in the map.
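The "pre-built map, each thread touches only its own key" situation can be sketched as runnable code. This is only an illustration of the pseudocode above; the class name and the initial values are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

class PerThreadKeysDemo {
    // Runs three threads against a pre-built HashMap; each thread touches only "its" key.
    static Map<String, String> run() {
        Map<String, String> map = new HashMap<>();
        map.put("A", "a");
        map.put("B", "b");
        map.put("C", "c"); // the map is fully built before any thread starts

        Thread t1 = new Thread(() -> map.get("A"));
        Thread t2 = new Thread(() -> map.put("B", "1")); // "B" already exists: not structural
        Thread t3 = new Thread(() -> map.get("C"));

        t1.start();
        t2.start();
        t3.start();
        try {
            // join() makes the writes done by the threads visible to the caller
            t1.join();
            t2.join();
            t3.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> result = run();
        System.out.println(result.size() + " entries, B=" + result.get("B"));
    }
}
```

The map still has three entries afterwards because no mapping was added or removed. If thread2 put a key that was not yet present, that would be a structural modification and the sketch would no longer be safe.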

Related

Why are we here, specifically, saying that ArrayList is not thread safe?

Description: If we use the same object reference among multiple threads, no object is thread-safe. Similarly, if any collection reference is shared among multiple threads, then that collection is not thread-safe, since other threads can access it. So why are we specifically saying that ArrayList is not thread-safe? What about the other collections?
You misunderstand the meaning of "thread-safe."
When we say "class X is thread-safe," we are not saying that you don't have to worry about the thread-safety of a program that uses it. If you build a program using nothing but thread-safe objects, that does not guarantee that your program will be thread-safe.
So what does it guarantee?
Suppose you have a List. Suppose that two threads, A and B, each write different values to the same index in the list, suppose that some thread C reads from that index, and suppose that none of those three threads uses any synchronization.
If the list is "thread-safe," then you can be assured that thread C will get one of three possible values:
The value that thread A wrote,
The value that thread B wrote,
The value that was stored at that index before either thread A or thread B wrote.
If the list is not thread-safe, then any of those same three things could happen, but also, other things could happen:
Thread C could get a value that was never in the list,
The list could behave in broken ways in the future for thread C even if no other thread continues to use it,
The program could crash,
etc. (I don't know how many other strange things could happen.)
When we say that a class is "thread-safe" we are saying that it will always behave in predictable, reasonable ways, even when its methods are concurrently called by multiple threads.
If you write a program that uses a "thread-safe" list, and if it depends on thread C reading one particular value of the three possibilities that I listed above, then your program has a thread-safety problem, even though the list itself does not.
I haven't checked but I think that all standard Collection implementations state if they are thread-safe or not. So you know if you can share that collection among different threads without synchronization.
CopyOnWriteArrayList for example is a thread-safe List implementation.
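As a rough sketch of what "sharing the collection without synchronization" looks like, here four threads append to the same list with no external locking. The class and method names are invented for this example.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class SharedListDemo {
    // Four threads each append 1,000 elements to the same list with no external locking.
    static int concurrentAdds(List<Integer> list) {
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 1_000; i++) {
                    list.add(i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return list.size();
    }

    public static void main(String[] args) {
        // With a thread-safe list no add is lost, so this prints 4000.
        System.out.println(concurrentAdds(new CopyOnWriteArrayList<>()));
    }
}
```

Passing a plain ArrayList instead is exactly the unsafe case discussed above: elements may be lost, or a worker thread may fail with an exception.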
ArrayList is unsynchronized in its implementation. When an object is unsynchronized, it means that it is not locked while being modified structurally. A structural modification is any operation that adds or deletes one or more elements, or explicitly resizes the backing array; merely setting the value of an element is not a structural modification.
What you are referring to is an array that elements are being added to or deleted from; that differs from merely setting the value of an existing element.
The reference points to the start of the array, but the number of elements is what is in question. If an unsynchronized list is structurally modified while another thread iterates over it, the integrity of the elements in the list is hard to guarantee. I hope I was able to convey the message plainly.
Look for more details here in Oracle: Array List and ConcurrentModificationException
ArrayList:
Note that this implementation is not synchronized. If multiple threads access an ArrayList instance concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more elements, or explicitly resizes the backing array; merely setting the value of an element is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the list. If no such object exists, the list should be "wrapped" using the Collections.synchronizedList method.
ConcurrentModificationException:
Note that fail-fast behavior cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification.
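The difference between a structural modification and merely setting an element can be seen even in a single thread, because ArrayList's fail-fast iterator reacts only to structural changes. A minimal sketch (the class and method names are made up for the example):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

class StructuralModificationDemo {
    // Returns true if the iterator detects the change made by 'change' mid-iteration.
    static boolean iteratorFails(Consumer<List<String>> change) {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        Iterator<String> it = list.iterator();
        it.next();           // start iterating
        change.accept(list); // modify the list behind the iterator's back
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(iteratorFails(l -> l.set(0, "z"))); // set(): not structural, prints false
        System.out.println(iteratorFails(l -> l.add("d")));    // add(): structural, prints true
    }
}
```

This only demonstrates the definition. As the quote above says, the fail-fast check is best-effort and is no substitute for synchronization when several threads are involved.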

What happens if an object is modified outside of the concurrent map in java?

Say, I have a concurrent hash map and in one thread I retrieve a value from that map and modify its state. Will the other threads see the modification without an additional synchronization?
The key concept in the Java Memory Model is the happens before relation. You can only rely on things happening between threads if there is a happens before relationship between the two of them.
In the case of ConcurrentHashMap, there is a happens-before relationship between a put and a subsequent get of the value for the same key: updating the value and putting it into the map happens before getting the value and reading its state. Because of that happens before relationship, the update happens before the reading of the state, so you will see the updated state.
So, if you update the state of an object, and then put it into the map, if you subsequently get it from the map, you are guaranteed to see that updated state (until such time as you put again).
But, if you have a reference to that object outside the context of a ConcurrentHashMap, there is no automatic happens-before relationship. You have to create that relationship for yourself.
One way of doing this is with synchronization (as in using synchronized, on the same object in all threads); other ways include:
writing and reading a volatile variable
using a Lock
putting the object into the map again, and then getting from the map before you start using it in the other thread.
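The last option can be sketched as follows. The Box class is a hypothetical stand-in for "an object with plain, non-volatile state"; the only thing connecting the two threads is the map.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class PublishThroughMapDemo {
    // Deliberately not thread-safe on its own: a plain, non-volatile field.
    static final class Box {
        int state;
    }

    static int readAfterPublish() {
        ConcurrentMap<String, Box> map = new ConcurrentHashMap<>();

        Thread writer = new Thread(() -> {
            Box box = new Box();
            box.state = 42;      // 1. update the object's state...
            map.put("box", box); // 2. ...then put it into the map
        });
        writer.start();

        Box seen;
        while ((seen = map.get("box")) == null) {
            Thread.yield();      // wait until the put has completed
        }
        // The put happens-before this successful get, so the write of 42 is visible.
        return seen.state;
    }

    public static void main(String[] args) {
        System.out.println(readAfterPublish()); // prints 42
    }
}
```

If the writer changed box.state again after the put, without putting the object again or using some other synchronization, the reader would have no guarantee of seeing that later change.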
Short answer is no.
A concurrent map will only synchronize the access to the map. That is, if one thread writes the map, all other threads can see that without additional synchronization.
If you retrieve an object from the map and modify it without synchronization, and if another thread retrieves the same object to read it, then you have a race without explicit synchronization between those threads.

Is it safe to replace all the occurrences of Hashtable with ConcurrentHashmap?

Our legacy multi-threaded application makes heavy use of Hashtable. Is it safe to replace the Hashtable instances with ConcurrentHashMap instances for a performance gain? Will there be any side effects?
Is it safe to replace the Hashtable instances with ConcurrentHashMap instances for a performance gain?
In most cases it should be safe and yield better performance. The effort of changing depends on whether you used the Map interface or Hashtable directly.
Will there be any side effect?
There might be side effects if your application expects to immediately be able to access elements that were put into the map by another thread.
From the JavaDoc on ConcurrentHashMap:
Retrieval operations (including get) generally do not block, so may overlap
with update operations (including put and remove). Retrievals reflect the
results of the most recently completed update operations holding upon their onset.
Edit: to clarify "immediately", consider that thread 1 adds element A to the map and, while that write is executing, thread 2 tries to check whether A exists in the map. With Hashtable, thread 2 would be blocked until after the write, so the check would return true. With ConcurrentHashMap it may return false, since thread 2 would not be blocked and the write operation is not yet completed (thus thread 2 would see an outdated version of the bucket).
Depending on the size of your Hashtable objects, you might get some performance gains by switching to ConcurrentHashMap.
In Java 7 and earlier, ConcurrentHashMap is divided into segments, which allows the table to be only partially locked; since Java 8 it locks individual bins instead. Either way, you can get more accesses per second than with a Hashtable, which locks the entire table on every operation.
The tables themselves are both thread safe and both implement the Map interface, so replacement should be relatively easy.
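To illustrate why coding against the Map interface makes the swap easy, here is a small sketch in which the same method runs unchanged on both implementations. It assumes Java 8 or later (for Map.merge); the class name, method name, and the "hits" key are invented for the example.

```java
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class MapSwapDemo {
    // Depends only on the Map interface, so either implementation can be passed in.
    static int countConcurrently(Map<String, Integer> counts) {
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) {
                    // merge() is synchronized in Hashtable and atomic in ConcurrentHashMap
                    counts.merge("hits", 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counts.get("hits");
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(new Hashtable<>()));         // prints 40000
        System.out.println(countConcurrently(new ConcurrentHashMap<>())); // prints 40000
    }
}
```

Code that declares its fields and parameters as Hashtable, or that locks on the table object to make several calls atomic, needs a closer look before switching.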

ConcurrentHashMap operations are thread safe

The Javadoc for ConcurrentHashMap says,
even though all operations are thread-safe
What is the meaning when we say all operations of ConcurrentHashMap are thread safe?
EDIT:
What I mean to ask is this: suppose there is a put() operation. According to the statement above, put() in CHM is thread-safe. What does this mean?
From Wikipedia:
A piece of code is thread-safe if it only manipulates shared data structures in a manner that guarantees safe execution by multiple threads at the same time.
To answer your expanded question, if multiple threads were to execute put() for the same key, the effect would be that the last one to run would set the value for that key in the map. All of the puts would happen in some sequence, but they would not interfere with each other. How might they interfere without a concurrency guarantee? Well, put() returns the previous value associated with the key, or null if there was none. If two puts happened on a non-concurrent map, they could both get the same return value from the put.
This sequence is possible without a concurrency guarantee:
Thread1: map.put("key1", "value1") => null
then
Thread2: map.put("key1", "value2") => "value1"
Thread3: map.put("key1", "value3") => "value1"
If Thread3 got in just after Thread2, it might see "value1" rather than "value2", even though that's not what it replaces. This won't happen in a concurrent map.
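A small sketch of that guarantee: several threads put different values under the same key, and the previous values returned by put() are collected. On a ConcurrentHashMap the puts take effect in some order, so no two of them can report the same previous value. The names used here are made up for the example.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class PutReturnValueDemo {
    // Each worker puts its own value under "key1"; returns the distinct previous
    // values reported by put(), with null recorded as "<none>".
    static Set<String> previousValues(int threads) {
        ConcurrentMap<String, String> map = new ConcurrentHashMap<>();
        Set<String> previous = Collections.synchronizedSet(new HashSet<>());
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            String value = "value" + t;
            workers[t] = new Thread(() -> {
                String old = map.put("key1", value);
                previous.add(old == null ? "<none>" : old);
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return previous;
    }

    public static void main(String[] args) {
        // Eight puts report eight different previous values: null once, then
        // every value except the one that ended up last in the map.
        System.out.println(previousValues(8).size()); // prints 8
    }
}
```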
What thread safety means is that you are permitted to share a ConcurrentHashMap object across multiple threads, and to access/modify that object concurrently without external locking.
Thread-safety means that an object can be used simultaneously by multiple threads while still operating correctly. In the specific case of ConcurrentHashMap, these characteristics are guaranteed:
Iterators produced by the map never throw ConcurrentModificationException. They traverse the elements as they existed when the iterator was created, and they may or may not reflect modifications made after that point. Ordinary HashMap iterators are fail-fast: they will typically throw an exception if the map is structurally modified while a thread is iterating over it.
Insertion and removal operations are thread-safe. Ordinary HashMaps might get into an inconsistent internal state if multiple threads tried to insert or remove items simultaneously, especially if modifications required a rehash.
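The iterator difference can be shown without any threads at all, by removing entries while iterating. A minimal sketch, with an invented class and method name:

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class IteratorBehaviorDemo {
    // Removes every key while iterating over the key set; reports whether that threw.
    static boolean removeWhileIteratingThrows(Map<String, Integer> map) {
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        try {
            for (String key : map.keySet()) {
                map.remove(key); // structural modification during iteration
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(removeWhileIteratingThrows(new HashMap<>()));           // prints true
        System.out.println(removeWhileIteratingThrows(new ConcurrentHashMap<>())); // prints false
    }
}
```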
It means that if two threads concurrently try to perform operations on the ConcurrentHashMap, you are guaranteed that the operations will not leave the data structure in an inconsistent state.
That's not something that non-concurrent data structures guarantee.
It means that all the operations that add or delete objects in the map are thread-safe. Retrievals are thread-safe too, but they generally do not block: a get that overlaps with a put is not guaranteed to see the new mapping. Once the put has completed, the mapping is visible to every thread that retrieves it from the map.

advantages of java's ConcurrentHashMap for a get-only map?

Consider these two situations:
a map which you are going to populate once at the beginning and then will be accessed from many different threads.
a map which you are going to use as a cache that will be accessed from many different threads. You would like to avoid computing the result that will be stored in the map unless it is missing, so the get-computation-store block will be synchronized (and the map will not otherwise be used).
In either of these cases, does ConcurrentHashMap offer you anything additional in terms of thread safety above an ordinary HashMap?
In the first case, it should not matter in practice, but there is no guarantee that modifications written to a regular hashmap will ever be seen by other threads. So if one thread initially creates and populates the map, and that thread never synchronized with your other threads, then those threads may never see the initial values set into the map.
The above situation is unlikely in practice, and would only take a single synchronization event or happens before guarantee between the threads (read / write to a volatile variable for instance) to ensure even theoretical correctness.
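For the populate-once case, one way to get that guarantee is to publish the map through a final field, which is an alternative to the volatile variable mentioned above and not something the question itself proposes. A minimal sketch with a hypothetical class name and made-up contents:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class ReadOnlyLookup {
    // A final field assigned in the constructor is safely published together with
    // the object; the unmodifiable wrapper rules out later structural changes.
    private final Map<String, Integer> table;

    ReadOnlyLookup() {
        Map<String, Integer> m = new HashMap<>();
        m.put("one", 1);
        m.put("two", 2);
        table = Collections.unmodifiableMap(m);
    }

    // Safe to call from any thread that obtained a reference to this object.
    Integer lookup(String key) {
        return table.get(key);
    }

    public static void main(String[] args) {
        ReadOnlyLookup lookup = new ReadOnlyLookup();
        System.out.println(lookup.lookup("two")); // prints 2
    }
}
```

Starting the reader threads only after the map has been populated also works, because Thread.start() establishes a happens-before relationship with everything the starting thread did before it.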
In the second case, there is a concern since access to a HashMap that modifies it structurally (adding a value) requires synchronization. Furthermore, you need some type of synchronization to establish a happens-before relationship / shared visibility with the other threads or there is no guarantee that the other threads will see the new values you put in. ConcurrentHashMap offers these guarantees and will not break when one thread modifies it structurally.
There is no difference in thread safety, no. For scenario #2 there is a difference in performance and a small difference in timing guarantees.
There will be no synchronization for your scenario #2, so threads that want to use the cache don't have to queue up and wait for others to finish. However, in order to get that benefit you don't have hard happens-before relationships at the synchronization boundaries, so it's possible two threads will compute the same cached value more or less at the same time. This is generally harmless as long as the computation is repeatable.
(There is also the slight difference that ConcurrentHashMap does not allow null to be used as a key or as a value.)
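If computing the same cached value twice is a concern, one option on Java 8 or later is ConcurrentHashMap.computeIfAbsent, which applies the function at most once per key. This is an addition to the answers above rather than something they propose; the cache class and the "expensive" method are hypothetical stand-ins.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

class ComputeOnceCache {
    private final ConcurrentMap<String, Integer> cache = new ConcurrentHashMap<>();
    private final AtomicInteger computations = new AtomicInteger();

    // Hypothetical stand-in for an expensive computation; counts how often it runs.
    private Integer expensive(String key) {
        computations.incrementAndGet();
        return key.length();
    }

    Integer get(String key) {
        // Atomic in ConcurrentHashMap: the function runs at most once per key.
        return cache.computeIfAbsent(key, this::expensive);
    }

    // Lets 'threads' workers request the same key at once; returns how many
    // times the computation actually ran.
    static int computationsFor(int threads) {
        ComputeOnceCache c = new ComputeOnceCache();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> c.get("hello"));
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return c.computations.get();
    }

    public static void main(String[] args) {
        System.out.println(computationsFor(8)); // prints 1
    }
}
```

The trade-off is that other threads asking for the same key wait while the function runs, so the computation should be short and should not touch the same map.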
