In Java, an object itself can act as a lock for guarding its own state. This convention is used in many built-in classes such as Vector and the other synchronized collections, where every method is synchronized and thus guarded by the intrinsic lock of the object itself. Is this good or bad? Please give reasons as well.
Pros
It's simple.
You can control the lock externally.
Cons
It breaks encapsulation.
You can't change its locking behaviour without changing its implied contract.
For the most part, it doesn't matter unless you are developing an API which will be widely used. So while using synchronized(this) is not ideal, it is simple.
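A minimal sketch of the encapsulation point, assuming two hypothetical counter classes (PublicLockCounter, PrivateLockCounter and the helper are illustrative names, not from any library). It tries to show that code holding the object's monitor can stall a class that synchronizes on this, while a class guarded by a private lock object is unaffected.

```java
// Hypothetical class: guards its state with its own intrinsic lock.
class PublicLockCounter {
    private int count;
    synchronized void increment() { count++; }
    synchronized int get() { return count; }
}

// Hypothetical class: guards its state with a lock object that outside code cannot reach.
class PrivateLockCounter {
    private final Object lock = new Object();
    private int count;
    void increment() { synchronized (lock) { count++; } }
    int get() { synchronized (lock) { return count; } }
}

class LockEncapsulationDemo {
    // Runs 'task' on another thread while the caller holds 'monitor'.
    // Returns true if the task finished within the timeout, i.e. was not blocked.
    static boolean finishesWhileMonitorHeld(Object monitor, Runnable task, long timeoutMillis) {
        Thread worker = new Thread(task);
        boolean finished;
        synchronized (monitor) {
            worker.start();
            try {
                worker.join(timeoutMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            finished = !worker.isAlive();
        }
        // The monitor is released here, so a blocked worker can now complete.
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return finished;
    }

    public static void main(String[] args) {
        PublicLockCounter a = new PublicLockCounter();
        PrivateLockCounter b = new PrivateLockCounter();
        System.out.println("synchronized(this) style finished while monitor held: "
                + finishesWhileMonitorHeld(a, a::increment, 300));
        System.out.println("private lock style finished while monitor held: "
                + finishesWhileMonitorHeld(b, b::increment, 300));
    }
}
```

The same property is what the "You can control the lock externally" pro relies on: a caller can group several calls under one lock only because the lock is publicly reachable.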
Well Vector, Hashtable, etc. were synchronized like this internally and we all know what happened to them...
I honestly can't find any good reason to do synchronization like this. Here are the disadvantages that I see:
There's almost always a more efficient way of ensuring thread-safety than just putting a lock on the entire method.
It slows down the code in single threaded environments because you pay the overhead of locking and unlocking without actually needing the lock.
It gives a false sense of security because although each operation is synchronized, sequences of operations are not, and you can still accidentally create race conditions. Imagine a collection which is synchronized on each method and the following code:
if (collection.isEmpty()) {
    collection.add(...);
}
Assuming the aim is to have only a single item added, the above code is not thread safe: a thread can be preempted between the isEmpty check and the call to add, and another thread can add an item in that gap. Even though both operations are synchronized individually, it is possible to end up with two items in the collection.
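One possible fix for the check-then-act snippet, sketched under the assumption that the collection is a Collections.synchronizedList wrapper, whose documentation tells clients to synchronize on the returned list for compound operations such as iteration. The helper names addIfEmpty and raceToAddFirst are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CheckThenActDemo {
    // Hypothetical helper: holds the wrapper's own lock (the list itself) across
    // both the check and the add, so no other thread can slip in between them.
    static <T> boolean addIfEmpty(List<T> syncList, T item) {
        synchronized (syncList) {
            if (syncList.isEmpty()) {
                return syncList.add(item);
            }
            return false;
        }
    }

    // Starts several threads that all race to add the first item and
    // returns the size of the list once they are done.
    static int raceToAddFirst(int threads) {
        List<String> list = Collections.synchronizedList(new ArrayList<>());
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            String item = "item-" + i;
            workers[i] = new Thread(() -> addIfEmpty(list, item));
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return list.size();
    }

    public static void main(String[] args) {
        System.out.println("size after race: " + raceToAddFirst(8));
    }
}
```

This only works if every client that performs the compound operation uses the same lock, which is exactly the contract that synchronizing on the object itself exposes.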
Related
Joshua Bloch's Effective Java, Second Edition, Item 69, states that
[...] To provide high concurrency, these implementations manage their own synchronization internally (Item 67). Therefore, it is impossible to exclude concurrent activity from a concurrent collection; locking it will have no effect but to slow the program.
Is this last statement correct? If two threads lock the collection and perform several operations within that lock, these operations might still be interleaved?
For the statement to be correct, I would expect that either these collections run threads internally with which you cannot synchronize, or they somehow "override" the standard synchronization behavior such that a statement like synchronized(map){ ... } behaves differently than on a 'normal' object. From the answers/comments to related questions I think neither of these is true:
Exclusively Locking ConcurrentHashMap
ConcurrentHashMap and compound operations
To avoid possible misinterpretation:
I'm aware that concurrent collections are designed exactly to avoid this global locking, my question is whether it's possible in principle
I find Effective Java an excellent book and I'm just seeking clarity on a particular item.
Sources suggest that ConcurrentHashMap uses an internal mechanism for locking (static final class Segment<K,V> extends ReentrantLock) and therefore does not use any synchronized methods for its locking mechanism.
It should therefore be simple to use the Map as a lock and synchronize on it - in the same way you could use a new Object() or your own ReentrantLock. However, it would not affect the inner workings of the Map which is - I think - what he is trying to say.
This might clarify it (a hint from Item 67):
It is not possible for clients to perform external synchronization on such a method because there can be no guarantee that unrelated clients will do likewise.
Your code is a client of these internally synchronized concurrent implementations. Even if you use an external lock (and slow yourself down), other clients may not, and they will still execute the internal implementation concurrently.
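A small sketch of that point, assuming a ConcurrentHashMap and a made-up helper name. One thread holds the map's monitor while another client, which never uses that monitor, calls put. As far as I know no JDK implementation of ConcurrentHashMap locks on the map object itself, so the put should go through regardless.

```java
import java.util.concurrent.ConcurrentHashMap;

class ExternalLockDemo {
    // Holds the map's monitor on the calling thread while a second thread,
    // which ignores that monitor, calls put. Returns true if the put
    // completed while the monitor was still held.
    static boolean putCompletesWhileMapMonitorHeld(long timeoutMillis) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        Thread otherClient = new Thread(() -> map.put("key", 1));
        synchronized (map) {
            otherClient.start();
            try {
                otherClient.join(timeoutMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return !otherClient.isAlive() && map.containsKey("key");
        }
    }

    public static void main(String[] args) {
        System.out.println("put completed despite external lock: "
                + putCompletesWhileMapMonitorHeld(2000));
    }
}
```

So synchronized(map) still works as an ordinary lock between clients that agree to use it; it simply does not exclude anyone else.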
In the API documents, we can see:
If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.)
I'm wondering whether the "put" method should be synchronized. The documentation only mentions structural modification. Can you give some unsafe cases for HashMap? Also, when I look at the source code of Hashtable, the "get" method is synchronized as well. Why aren't only the write operations synchronized?
There is a general rule of thumb:
If you have more than one thread accessing a collection and at least one thread modifies the collection at some point, you need to synchronize all accesses to the collection.
If you think about it, it's very clear: if a collection is modified while another thread reads from it (e.g. iterates), the read and write operations can interfere with each other (the read may see a partial write, e.g. an entry that has been created but whose value is not yet set, or an entry that is not properly linked yet).
Exempt from this are collections that one thread creates and modifies, then hands off to "the world", but never modifies after publishing their reference.
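A minimal sketch of that exemption, assuming a hypothetical PublishedSettings class. The map is filled while the object is being constructed and is never modified afterwards; storing it in a final field is one way (an assumption of this sketch, not the only way) to publish it safely.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder: populated once during construction, read-only afterwards.
final class PublishedSettings {
    // Final field: a thread that obtains a reference to a fully constructed
    // instance sees the map as it was when the constructor finished,
    // provided 'this' does not escape during construction.
    private final Map<String, String> settings;

    PublishedSettings(Map<String, String> source) {
        // Defensive copy plus unmodifiable view, so nothing can modify
        // the map after it has been published.
        this.settings = Collections.unmodifiableMap(new HashMap<>(source));
    }

    // Safe to call from any thread without locking.
    String get(String key) {
        return settings.get(key);
    }
}
```

Readers need no locking here because there are no writes after publication; the rule of thumb above only applies once some thread modifies the collection.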
why not only the write operations be synchronized?
If the reads are not synchronized as well, you might encounter visibility issues. Not only that, but a reader can also observe the object in a corrupted state if a writer performs structural changes at the same time!
The Java memory model gives a few guarantees regarding when modifications to a memory location made by one thread will be visible to other threads. One such guarantee is that modifications made by a thread prior to releasing a lock are visible to threads that subsequently acquire the same lock. That's why you need to synchronize the read operations as well, even in the absence of concurrent structural modifications to the object.
Note that releasing and acquiring locks is not the only way to guarantee visibility of memory modifications, but it's the easiest. Others include the order of starting threads, class initialization, reads/writes of volatile memory locations... more sophisticated stuff (and possibly more scalable in a highly concurrent environment, due to a reduced level of contention).
If you don't use any of those other techniques to ensure visibility, then locking only on write operations is wrong code. You might or might not encounter visibility issues, though: there's no guarantee that the code will fail, but it's possible, so... wrong code.
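A minimal sketch of locking both reads and writes, assuming a hypothetical GuardedRegistry wrapper around a plain HashMap. The class and method names are made up for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper: every access, read or write, goes through the same lock.
class GuardedRegistry {
    private final Object lock = new Object();
    private final Map<String, String> entries = new HashMap<>();

    void register(String key, String value) {
        synchronized (lock) {
            entries.put(key, value);
        }
    }

    // Reads take the lock too: acquiring the lock a writer released earlier
    // is what makes that writer's changes visible here.
    String lookup(String key) {
        synchronized (lock) {
            return entries.get(key);
        }
    }

    int size() {
        synchronized (lock) {
            return entries.size();
        }
    }

    // Demo helper: several threads register distinct keys concurrently;
    // returns the number of entries once all of them are done.
    static int registerFromThreads(int threads, int perThread) {
        GuardedRegistry registry = new GuardedRegistry();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    registry.register(id + ":" + i, "v");
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return registry.size();
    }
}
```

Dropping the synchronized block from lookup would still compile and would often appear to work, which is what makes that kind of bug hard to spot.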
I'd suggest you read the book "Java Concurrency in Practice", one of the best texts on the subject I've ever read, after the JVM spec itself. Obviously, the book is way easier (still far from trivial!) and more fun to read than the spec...
One example would be:
Thread 1:
    // YourKey and YourType are placeholders for the map's key and value types
    Iterator<Map.Entry<YourKey, YourType>> it = yourMapInstance.entrySet().iterator();
    while (it.hasNext()) {
        it.next().getValue().doSth();
        Thread.sleep(1000);
    }
Thread 2:
    for (int i = 0; i < 10; i++) {
        if (Math.random() < 0.5) {
            yourMapInstance.clear();
            Thread.sleep(500);
        }
    }
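Now, if both threads run concurrently, at some point the iterator in thread 1 may still be handing out entries while thread 2 has already deleted everything from the map. With a plain HashMap the iterator will typically fail with a ConcurrentModificationException, although even that is only detected on a best-effort basis. In this case, synchronization is required.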
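A sketch of one way to synchronize this scenario, assuming the map is wrapped with Collections.synchronizedMap, whose documentation requires callers to hold the map's lock while iterating over its views. The class and method names are invented for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class IterateVersusClearDemo {
    // Runs an iterating thread and a clearing thread against the same map.
    // Returns true if the iteration finished without a runtime exception.
    static boolean iterateWhileClearing() {
        Map<Integer, String> map = Collections.synchronizedMap(new HashMap<>());
        for (int i = 0; i < 1000; i++) {
            map.put(i, "value-" + i);
        }

        boolean[] failed = {false};
        Thread reader = new Thread(() -> {
            try {
                // Hold the wrapper's lock for the whole traversal, so clear()
                // runs either entirely before or entirely after the loop.
                synchronized (map) {
                    for (Map.Entry<Integer, String> entry : map.entrySet()) {
                        entry.getValue().length();
                    }
                }
            } catch (RuntimeException e) {
                failed[0] = true;
            }
        });
        // clear() on the wrapper acquires the same lock internally.
        Thread clearer = new Thread(map::clear);

        reader.start();
        clearer.start();
        for (Thread t : new Thread[] {reader, clearer}) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return !failed[0];
    }

    public static void main(String[] args) {
        System.out.println("iteration completed cleanly: " + iterateWhileClearing());
    }
}
```

The cost is that the clearing thread waits for the whole traversal, which is why sleeping inside the loop, as in the snippet above, would be a poor fit for this approach.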
I would like to be able to mix the use of the synchronized block with more explicit locking by calling lock and unlock methods directly when appropriate. This would give me the syntactic sugar of synchronized(myObject) when I can get away with it, while still letting me call myObject.lock and myObject.unlock directly for those times when a synchronized block is not flexible enough to do what I need.
I know that every single Object implicitly has a reentrant lock built into it, which is used by the synchronized block. Effectively, the lock method is called on the object's internal reentrant lock every time one enters a synchronized block, and unlock is called on the same reentrant lock when one leaves the synchronized block. It would seem easy enough to allow one to manually lock/unlock this implicit reentrant lock, thus allowing a mix of synchronized blocks and explicit locking.
However, as far as I know there is no way to do this. And because of the way synchronized blocks work, I don't believe there is a convenient way to mix them with explicit locking otherwise. It seems as if this would be rather convenient, and easily added by extending the Object API with lock/unlock methods.
The question I have is, why doesn't this exist? I'm certain there is a reason, but I don't know what it is. I thought the issue might be encapsulation, the same reason you don't want to do synchronized(this). However, if I am already calling synchronized(myObject), then by definition anyone who knows about myObject can likewise synchronize on it and cause a deadlock if done foolishly. The question of encapsulation comes down to who can access the object you synchronized on, regardless of whether you use a synchronized block or lock the object manually; at least as I see it. So is there some other advantage to not allowing one to manually lock an object?
The lock of an object is tightly bound to the instance itself, and the structure of synchronized blocks and methods is very strict. If you, as a programmer, had the ability to interfere with how the virtual machine manages these locks, it could cause serious problems:
You could eventually release a lock that was created by a synchronized block
You create a lock that another synchronized block will release
You create more lock entries than exits
You create more lock exits than entries
There are even specific bytecodes defined for the lock and release operations (monitorenter and monitorexit). If you had a "method" for the lock/unlock operation, it would have to be compiled to these bytecodes. So it is really a low-level operation, and very different from other object-level features in Java.
Synchronisation is a very strong contract. I think that the designers of the JLS did not want to allow the possibility to break this contract.
Chapter 17 of the JLS describes the expected behaviour in more detail.
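For the cases where a synchronized block is not flexible enough, the standard library offers explicit locks in java.util.concurrent.locks. A minimal sketch, assuming a hypothetical ExplicitLockAccount class; only ReentrantLock and its lock, unlock and tryLock methods come from the JDK.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical class guarded by an explicit lock instead of its intrinsic monitor.
class ExplicitLockAccount {
    private final ReentrantLock lock = new ReentrantLock();
    private long balance;

    void deposit(long amount) {
        lock.lock();            // explicit acquire
        try {
            balance += amount;
        } finally {
            lock.unlock();      // explicit release; pairing them is now the programmer's job
        }
    }

    // Something a synchronized block cannot express: give up instead of blocking.
    boolean tryDeposit(long amount) {
        if (!lock.tryLock()) {
            return false;
        }
        try {
            balance += amount;
            return true;
        } finally {
            lock.unlock();
        }
    }

    long balance() {
        lock.lock();
        try {
            return balance;
        } finally {
            lock.unlock();
        }
    }

    // Demo helper: calls tryDeposit from another thread while the calling
    // thread holds the lock, and reports what that other thread got back.
    static boolean tryDepositWhileLocked(ExplicitLockAccount account, long amount) {
        boolean[] result = new boolean[1];
        account.lock.lock();
        try {
            Thread other = new Thread(() -> result[0] = account.tryDeposit(amount));
            other.start();
            try {
                other.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        } finally {
            account.lock.unlock();
        }
        return result[0];
    }
}
```

The try/finally pairing is the price of the flexibility: the mismatched enter/exit problems listed above become possible again, but they stay confined to a lock object the class chose to create rather than to every object's monitor.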
We know that ConcurrentHashMap can provide concurrent access to multiple threads to boost performance, and that inside this class the segments are synchronized (am I right?). The question is, can this design guarantee thread safety? Say we have 30+ threads accessing and changing an object mapped by the same key in a ConcurrentHashMap instance; my guess is that they still have to line up for that, don't they?
From my recollection, the book "Java Concurrency in Practice" says that ConcurrentHashMap provides concurrent reading and a decent level of concurrent writing. In the aforementioned scenario, and if my guess is correct, the performance won't be better than using the static synchronization wrapper API in Collections?
Thanks for clarifying,
John
You will still have to synchronize any access to the object being modified, and as you suspect all access to the same key will still have contention. The performance improvement comes in access to different keys, which is of course the more typical case.
All a ConcurrentMap can give you with respect to concurrency is that modifications to the map itself are done atomically, and that any writes happen-before any reads (this is important as it provides safe publishing of any reference from the map).
Safe-publishing means that any (mutable) object retrieved from the map will be seen with all writes to it before it was placed in the map. It won't help for publishing modifications that are made after retrieving it though.
However, concurrency and thread-safety is generally hard to reason about and make correct if you have mutable objects that are being modified by multiple parties. Usually you have to lock in order to get it right. A better approach is often to use immutable objects in conjunction with the ConcurrentMap conditional putIfAbsent/replace methods and linearize your algorithm that way. This lock-free style tends to be easier to reason about.
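A sketch of that lock-free style, assuming a hypothetical ImmutableValueCounter class. It uses only putIfAbsent and the three-argument replace from ConcurrentMap, with immutable Integer values.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ImmutableValueCounter {
    private final ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();

    // Lock-free increment: Integer is immutable, so every update installs a
    // new value and retries if another thread changed the mapping in between.
    void increment(String key) {
        while (true) {
            Integer current = counts.putIfAbsent(key, 1);
            if (current == null) {
                return;     // we installed the first value
            }
            if (counts.replace(key, current, current + 1)) {
                return;     // nobody interfered between our read and our write
            }
            // Another thread won the race; loop and try again with the new value.
        }
    }

    int get(String key) {
        Integer value = counts.get(key);
        return value == null ? 0 : value;
    }

    // Demo helper: several threads increment the same key; returns the final count.
    static int countFromThreads(int threads, int perThread) {
        ImmutableValueCounter counter = new ImmutableValueCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.increment("hits");
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.get("hits");
    }
}
```

Each successful replace is a single atomic step on the map, so the updates are linearized without any client-side lock. Under heavy contention on one key the retries add up, which matches the point that access to the same key is still serialized.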
Question is, can this design guarantee the thread safety?
It guarantees the thread safety of the map; i.e. that access and updates on the map have a well defined and orderly behaviour in the presence of multiple threads performing updates simultaneously.
It does not guarantee thread safety of the key or value objects. And it does not provide any form of higher level synchronization.
Say we have 30+ threads accessing and changing an object mapped by the same key in a ConcurrentHashMap instance; my guess is that they still have to line up for that, don't they?
If you have multiple threads trying to use the same key, then their operations will inevitably be serialized to some degree. That is unavoidable.
In fact, from briefly looking at the source code, it looks like ConcurrentHashMap falls back to using conventional locks if there is too much contention for a particular segment of the map. And if you have multiple threads trying to access AND update the same key simultaneously, that will trigger locking.
First, remember that a thread-safe tool doesn't in and of itself guarantee thread-safe usage of it.
For example, writing if (!map.containsKey(k)) map.put(k, v); instead of calling putIfAbsent is not thread safe.
And each access to or modification of a value still has to be made thread safe independently.
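A small sketch of the difference, with made-up class and method names. All threads race to install a value for the same key; putIfAbsent performs the check and the insert as one atomic step, so only one of them should be told (by a null return) that its value was inserted.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class PutIfAbsentDemo {
    // Returns how many of the racing threads had their value inserted.
    static int countWinners(int threads) {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        AtomicInteger winners = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            String value = "from-thread-" + i;
            workers[i] = new Thread(() -> {
                // Racy alternative: if (!map.containsKey("k")) map.put("k", value);
                // Two threads could both pass the check before either one puts.
                if (map.putIfAbsent("k", value) == null) {
                    winners.incrementAndGet();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return winners.get();
    }

    public static void main(String[] args) {
        System.out.println("threads whose value was inserted: " + countWinners(16));
    }
}
```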
Reads are concurrent, even for the same key, so performance will be better for typical applications.
When using any of the java.util.concurrent classes, do I still need to synchronize access to the instance to avoid visibility issues between different threads?
Elaborating the question a bit more
When using an instance of a java.util.concurrent class, is it possible that one thread modifies the instance (e.g. puts an element into a concurrent hash map) and a subsequent thread won't see the modification?
My question arises from the fact that The Java Memory Model allows threads to cache values instead of fetching them directly from memory if the access to the value is not synchronized.
For the memory consistency properties of the java.util.concurrent package, you can check the package's Javadoc:
The methods of all classes in java.util.concurrent and its subpackages extend these guarantees to higher-level synchronization. In particular:
Actions in a thread prior to placing an object into any concurrent collection happen-before actions subsequent to the access or removal of that element from the collection in another thread.
[...]
Actions prior to "releasing" synchronizer methods such as Lock.unlock, Semaphore.release, and CountDownLatch.countDown happen-before actions subsequent to a successful "acquiring" method such as Lock.lock, Semaphore.acquire, Condition.await, and CountDownLatch.await on the same synchronizer object in another thread.
[...]
So the classes in this package take care of concurrency themselves, using a set of classes for thread control (Lock, Semaphore, etc.). These classes handle the happens-before logic programmatically, e.g. by managing a FIFO queue of waiting threads and blocking and releasing them (internally via LockSupport.park() and LockSupport.unpark() rather than Object.wait() or the deprecated Thread.resume()).
So, in theory, you don't need to synchronize the statements that access these classes, because the classes control concurrent access programmatically.
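A minimal sketch of the first guarantee quoted above, assuming a hypothetical Message class with plain, non-volatile fields. The producer fills in the object and only then places it in a BlockingQueue; the consumer reads the fields after take() without any extra synchronization.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class QueueHandoffDemo {
    // Plain mutable object: no volatile fields, no locking of its own.
    static final class Message {
        String text;
        int priority;
    }

    // A producer thread builds a Message and puts it into the queue;
    // the calling thread takes it out and reads its fields.
    static String handOff(String text, int priority) {
        BlockingQueue<Message> queue = new ArrayBlockingQueue<>(1);
        Thread producer = new Thread(() -> {
            Message message = new Message();
            message.text = text;
            message.priority = priority;   // writes made before put() ...
            try {
                queue.put(message);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        try {
            Message received = queue.take();   // ... are visible after take()
            return received.text + "/" + received.priority;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

The guarantee covers writes made before the object was placed in the collection. Changes made to the Message after the hand-off would need their own synchronization.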
Because the ConcurrentHashMap (for example) is designed to be used from a concurrent context, you don't need to synchronise it further. In fact, doing so could undermine the optimisations it introduces.
For example, Collections.synchronizedMap(...) represents one way to make a map thread safe. As I understand it, it works essentially by wrapping all the calls in synchronized blocks. Something like ConcurrentHashMap, on the other hand, creates synchronized "buckets" across the elements in the collection, which gives finer-grained concurrency control and therefore less lock contention under heavy usage. It may also not lock on reads, for example. If you wrap this again with some synchronized access, you could undermine this. With the older approach you have to be careful that all access to the collection is synchronized, which is another advantage of the newer library; you don't have to worry (as much!).
The java.util.concurrent collections may implement their thread safety via synchronized, in which case the language specification guarantees visibility. They may also implement things without using locks. I'm not as clear on this, but I assume the same visibility guarantees would be in place here.
If you're seeing what looks like lost updates in your code, it may be that it's just a race condition. Something like ConcurrentHashMap will give you the most recently completed value on a read, and a write that is still in progress may not be reflected yet. It's often a trade-off between accuracy and performance.
The point is, the java.util.concurrent classes are meant to do exactly this, so I'd be confident that they ensure visibility and that volatile and/or additional synchronization shouldn't be needed.