What are the not thread-Safe cases when using HashMap in Java? - java

In the API documents, we can see:
If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be
synchronized externally. (A structural modification is any operation
that adds or deletes one or more mappings; merely changing the value
associated with a key that an instance already contains is not a
structural modification.)
I'm thinking if the "put" method should be synchronized ? It said only the structural modification. Can you give some unsafe cases for the HashMap. And when I view the source code of "HashTable", the "get" method is also been synchronized, why not only the write operations be synchronized?

There is a general rule of thumb:
If you have more than one thread accessing a collection and at least one thread modifies the collection at some point, you need to synchronize all accesses to the collection.
If you think about it, its very clear: If a collection is modified while another thread reads from it (e.g. iterates), read and write operation can interfere with each other (the read seeing a partial write, e.g. entry created but value not yet set or entry not properly linked yet).
Exempt from this are collections one thread creates and modifies, then hands of to "the world" but never modifies them after publishing their reference.

why not only the write operations be synchronized?
If the reads are not synchronized as well, you might encounter visibility issues. Not only that, but it is also possible to completely thrash the object, if it performs structural changes!
The JVM specification gives a few guarantees regarding when modifications to a memory location made by one thread will be visible to other threads. One such guarantee is that modifications by a thread prior to releasing a lock are visible to threads that subsequently acquire the same lock. That's why you need to synchronized the read operations as well, even in the absence of concurrent structural modifications to the object.
Note that this releasing/acquiring locks is not the only way to guarantee visibility of memory modifications, but it's the easiest. Others include order of starting threads, class initialization, reads/writes to memory locations... more sophisticated stuff (and possibly more scalable on a highly concurrent environment, due to a reduced level of contention).
If you don't use any of those other techniques to ensure visibility, then simply locking only on write operations is wrong code. You might or might not encounter visibility issues though -- there's no guarantee that the JVM will fail, but it's possible, so... wrong code.
I'd suggest you read the book "Java Concurrency in Practice", one of the best texts on the subject I've ever read, after the JVM spec itself. Obviously, the book is way easier (still far from trivial!) and more fun to read than the spec...

One example would be:
Thread 1:
Iterator<YourType> it = yourMapInstance.entrySet().iterator();
while(it.hasNext()) {
it.next().getValue().doSth();
Thread.sleep(1000);
}
}
Thread 2:
for(int i = 0; i < 10; i++) {
if(Math.random() < 0.5) {
yourMapInstance.clear();
Thread.sleep(500);
}
}
Now, if both threads are executed concurrently, at some point there might be a situation, that you have a value in your iterator, while the other thread has already deleted everything from the map. In this case, synchronization is required.

Related

ConcurrentHashMap needed with ReadWriteLock?

I have a Map which is read by multiple threads but which is (from time to time) cleared and rebuilt by another thread.
I have surrounded all the access to this map with
readWriteLock.readLock().lock()
try {
... access myMap here...
} finally {
readWriteLock.readLock().unlock()
}
... or the writeLock() equivalents, depending on the type of access.
My question is... will the ReadWriteLock ensure the updates to myMap are visible to the other threads (since they must wait until after the unlock() is called by the writing thread? Or, do I also need to make myMap a concurrent map, like ConcurrentHashMap?
I will probably do that, just to be safe, but I'd like to understand better.
Yes, this should be fine even without a thread-aware map. The Javadoc for ReadWriteLock explicitly says:
All ReadWriteLock implementations must guarantee that the memory synchronization effects of writeLock operations (as specified in the Lock interface) also hold with respect to the associated readLock. That is, a thread successfully acquiring the read lock will see all updates made upon previous release of the write lock.
(Of course, by using a reader/writer lock at all you depend on the map supporting concurrent lookups from different threads. One could imagine clever data structure that try to save time overall by mutating some internal cached state during a lookup. But the standard collections such as HashMap will not do that).

How does a function becomes atomic?

I have been reading the book called art of multiprocessor programming and came across functions such as get(), getandset(), compareandset(), getandIncrease(), getandIncrease() etc.
In the book it says that all the above function are atomic and I agree but I had my own doubts on how some function becomes a atomic function.
Why does the function with get or compare become atomic ? - because it has to wait till it gets the value or waits till some condition becomes true which creates a barrier, hence atomic.
Am I right in thinking this way? is there any thing that I have missed ?
When I do
if (tail_index.get() == (head_index.getAndIncrement())
is this atomic?
A method is made atomic relative to some instance by adding explicit thread-safety. In many cases this is done by marking the method as synchronized. There is not magic, if you look at the source code of the thread-safe class that claims that methods are atomic, you will see the locking.
WRT to your second part, No it is not atomic. Each method call is atomic but when you put two together the combination is not atomic. get and getAndIncrement have been explicitly made atomic. Once you add other code (or a combination of the calls) it is not atomic unless you make it so.
A function is atomic if it appears to occur instantaneously.[1]
Here, "appears to" means from the point of view of the rest of the system. For instance, consider a synchronized function that reverses a linked list. To an outside observer, the operation clearly does not occur instantaneously: it takes many reads and writes to update all the list pointers. However, as a lock is held the entire time, no other part of the system may read the list during this time, so to them, the update appears instantaneous.
Equally, CAS (compare-and-set) operations do not actually occur instantly on modern computers. It takes time for one CPU core to obtain exclusive write access to the value, and then it takes more time for another core to re-obtain read access afterwards to see the new value. During this time, the CPU is executing other instructions in parallel. To ensure the illusion of instantaneous execution is preserved, the JVM issues CPU instructions before and after the CAS operation to ensure no logically subsequent reads get pulled up and executed before the CAS finishes (which would allow you to read a part of the linked list before you had actually taken the lock, for instance), and that no logically preceding writes get delayed and executed after the CAS finishes (which would allow another thread to take the lock before the linked list was completely updated).
These CPU ordering instructions are the key difference between AtomicInteger.compareAndSet and AtomicInteger.weakCompareAndSet (the "may fail spuriously" bit is easily rectified with a loop). Without the ordering guarantees, the weak CAS operation cannot be used to implement most concurrent algorithms, and "is only rarely an appropriate alternative to compareAndSet".
If this is sounding complicated...well...it is! Which is why you can still get a PhD by designing a concurrent algorithm. To show correctness for a concurrent algorithm, you have to consider what every other thread may possibly be doing to mess you around. It may help if you think of them as adversaries, trying to break the illusion of atomicity. For instance, let's consider your example:
if (tail_index.get() == (head_index.getAndIncrement()))
I assume this is part of a method to pop an item off a stack implemented as a cyclic array with index counters, and execute the body of the "if" if the stack is now empty. As head_index and tail_index are being accessed separately, your adversary can "divide" them with as many operations as he likes. (Imagine, for instance, that your thread is interrupted by the OS between the get and the getAndIncrement.) So it would be easy for him to add dozens of items to the stack, then remove all but one, leaving head_index above tail_index; your if block will then never execute, even though you are removing the last item on the stack.
So, when your book says get(), getAndSet(), etc. are atomic, it is not making a general statement about any possible implementation of those methods. It's telling you that the Java standard guarantees that they are atomic, and does so by careful use of the available CPU instructions, in a way that would be impossible to do in plain Java (synchronized lets you emulate it, but is more costly).
No, function, using get() is not atomic. But, for example, getAndIncrement or compareAndSet are atomic themselves. That means that it guaranteed, that all the logic is made atomically. For get() there is one another assurance: when you publish atomic value into one thread, it immediately becomes visible to another threads (just like volatile fields). Non-volatile and non-atomic values dont: there are cases, where value being set to non-volatile fiels is not visible to another threads; these threads get an old value reading field's value.
But you always can write atomic function using Atomic* classes and other synchonization primitives.

Synchronization decision built into Java using intrinsic locks (good or bad)

In Java an Object itself can act as a lock for guarding its own state . This convention is used in many built in classes like Vector and other synchronized collections where every method is synchronized and thus guarded by the intrinsic lock of the object itself . Is this good or bad ? Please give reasons also .
Pros
It's simple.
You can control the lock externally.
Cons
It breaks encapuslation.
You can't change its locking behaviour without changing its implied contract.
For the most part, it doesn't matter unless you are developing an API which will be widely used. So while using synchronised(this) is not ideal, it is simple.
Well Vector, Hashtable, etc. were synchronized like this internally and we all know what happened to them...
I honestly can't find any good reason to do synchronization like this. Here are the disadvantages that I see:
There's almost always a more efficient way of ensuring thread-safety than just putting a lock on the entire method.
It slows down the code in single threaded environments because you pay the overhead of locking and unlocking without actually needing the lock.
It gives a false sense of security because although each operation is synchronized, sequences of operations are not and you can still accidentally create data races. Imagine a collection which is synchronized on each method and the following code:
if(collection.isEmpty()) {
collection.add(...);
}
Assuming the aim is to have only a single item added, the above code is not thread safe because a thread can be interrupted between the if check and the actual call to add, even though both operations are synchronized individually, so it is possible to actually get two items in the collection.

advantages of java's ConcurrentHashMap for a get-only map?

Consider these two situations:
a map which you are going to populate once at the beginning and then will be accessed from many different threads.
a map which you are going to use as cache that will be accessed from many different threads. you would like to avoid computing the result that will be stored in the map unless it is missing, the get-computation-store block will be synchronized. (and the map will not otherwise be used)
In either of these cases, does ConcurrentHashMap offer you anything additional in terms of thread safety above an ordinary HashMap?
In the first case, it should not matter in practice, but there is no guarantee that modifications written to a regular hashmap will ever be seen by other threads. So if one thread initially creates and populates the map, and that thread never synchronized with your other threads, then those threads may never see the initial values set into the map.
The above situation is unlikely in practice, and would only take a single synchronization event or happens before guarantee between the threads (read / write to a volatile variable for instance) to ensure even theoretical correctness.
In the second case, there is a concern since access to a HashMap that modifies it structurally (adding a value) requires synchronization. Furthermore, you need some type of synchronization to establish a happens-before relationship / shared visibility with the other threads or there is no guarantee that the other threads will see the new values you put in. ConcurrentHashMap offers these guarantees and will not break when one thread modifies it structurally.
There is no difference in thread safety, no. For scenario #2 there is a difference in performance and a small difference in timing guarantees.
There will be no synchronization for your scenario #2, so threads that want to use the cache don't have to queue up and wait for others to finish. However, in order to get that benefit you don't have hard happens-before relationships at the synchronization boundaries, so it's possible two threads will compute the same cached value more or less at the same time. This is generally harmless as long as the computation is repeatable.
(There is also the slight difference that ConcurrentHashMap does not allow null to be used as a key.)

Some questions on java multithreading,

I have a set of questions regarding Java multithreading issues. Please provide me with as much help as you can.
0) Assume we have 2 banking accounts and we need to transfer money between them in a thread-safe way.
i.e.
accountA.money += transferSum;
accountB.money -= transferSum;
Two requirements exist:
no one should be able to see the intermediate results of the operation (i.e. one acount sum is increased, but others is not yet decreased)
reading access should not be blocked during the operation (i.e. old values of account sums should be shown during the operation goes on)
Can you suggest some ideas on this?
1) Assume 2 threads modify some class field via synchronized method or utilizing an explicit lock. Regardless of synchronization, there are no guarantee that this field will be visible to threads, that read it via NOT synchronized method. - is it correct?
2) How long a thread that is awoken by notify method can wait for a lock? Assume we have a code like this:
synchronized(lock) {
lock.notifyall();
//do some very-very long activity
lock.wait() //or the end of synchronized block
}
Can we state that at least one thread will succeed and grab the lock? Can a signal be lost due to some timeout?
3) A quotation from Java Concurrency Book:
"Single-threaded executors also provide sufficient internal synchronization to guarantee that any memory writes made by tasks are visible to subsequent tasks; this means that objects can be safely confined to the "task thread" even though that thread may be replaced with another from time to time."
Does this mean that the only thread-safety issue that remains for a code being executed in single-threaded executor is data race and we can abandon the volatile variables and overlook all visibility issues? It looks like a universal way to solve a great part of concurrency issues.
4) All standard getters and setters are atomic. They need not to be synchronized if the field is marked as volatile. - is it correct?
5) The initiation of static fields and static blocks is accomplished by one thread and thus need not to be synchronized. - is it correct?
6) Why a thread needs to notify others if it leaves the lock with wait() method, but does not need to do this if it leaves the lock by exiting the synchronized block?
0: You can't.
Assuring an atomic update is easy: you synchronize on whatever object holds the bank accounts. But then you either block all readers (because they synchronize as well), or you can't guarantee what the reader will see.
BUT, in a large-scale system such as a banking system, locking on frequently-accessed objects is a bad idea, as it introduces waits into the system. In the specific case of changing two values, this might not be an issue: it will happen so fast that most accesses will be uncontended.
There are certainly ways to avoid such race conditions. Databases do a pretty good job for ba nk accounts (although ultimately they rely on contended access to the end of a transaction).
1) To the best of my knowledge, there are no guarantees other than those established by synchronized or volatile. If one thread makes a synchronized access and one thread does not, the unsynchronized access does not have a memory barrier. (if I'm wrong, I'm sure that I'll be corrected or at least downvoted)
2) To quote that JavaDoc: "The awakened threads will not be able to proceed until the current thread relinquishes the lock on this object." If you decide to throw a sleep into that synchronized block, you'll be unhappy.
3) I'd have to read that quote several times to be sure, but I believe that "single-threaded executor" is the key phrase. If the executor is running only a single thread, then there is a strict happens-before relationship for all operations on that thread. It does not mean that other threads, running in other executors, can ignore synchronization.
4) No. long and double are not atomic (see the JVM spec). Use an AtomicXXX object if you want unsynchronized access to member variables.
5) No. I couldn't find an exact reference in the JVM spec, but section 2.17.5 implies that multiple threads may initialize classes.
6) Because all threads wait until one thread does a notify. If you're in a synchronized block, and leave it with a wait and no notify, every thread will be waiting for a notification that will never happen.
0) Is a difficult problem because you don't want intermediate results to be visible or to lock readers during the operation. To be honest I'm not sure it's possible at all, in order to ensure no thread sees intermediate results you need to block readers while doing both writes.
If you dont want intermediate results visible then you have to lock both back accounts before doing your writing. The best way to do this is to make sure you get and release the locks in the same order each time (otherwise you get a deadlock). E.G. get the lock on the lower account number first and then the greater.
1) Correct, all access must be via a lock/synchronized or use volatile.
2) Forever
3) Using a Single Threaded Executor means that as long as all access is doen by tasks run by that executor you dont need to worry about thread safety/visibilty.
4) Not sure what you mean by standard getters and setters but writes to most variable types (except double and long) are atomic and so don't need sync, just volatile for visibility. Try using the Atomic variants instead.
5) No, it is possible for two threads to try an init some static code, making naive implementations of Singleton unsafe.
6) Sync and Wait/Notify are two different but related mechanisms. Without wait/notify you'd have to spin lock (i.e. keep getting a lock and polling )on a object to get updates
5) The initiation of static fields and static blocks is accomplished by one thread and thus need not to be synchronized. - is it correct?
VM executes static initialization in a synchronized(clazz) block.
static class Foo {
static {
assert Thread.holdsLock(Foo.class); // true
synchronized(Foo.class){ // redundant, already under the lock
....
0) The only way I can see to do this to to store accountA and accountB in an object stored in an AtomicReference. You then make a copy of the object, modify it, and update the reference if it is still the same as the original reference.
AtomicReference<Accounts> accountRef;
Accounts origRef;
Accounts newRef;
do {
origRef = accountRef.get();
// make a deep copy of origRef
newRef.accountA.money += transferSum;
newRef.accountB.money -= transferSum;
} while(accountRef.compareAndSet(origRef, newRef);

Categories

Resources