Local References in ConcurrentHashMap - java

In ConcurrentHashMap, segments is marked final (and thus will never change), but the method ensureSegment creates a method-local copy, ss, of segments upon which to operate.
Does anybody know the purpose of this? What benefit does it bring?
Update:
I searched on Google and found a page that explains ConcurrentHashMap in JDK 7, "The Concurrency Of ConcurrentHashMap". Below is an excerpt:
Local References
Even though segments is marked final (and thus will never change), Doug Lea prudently creates a method-local copy, ss, of segments upon which to operate. Such defensive programming allows a programmer to not worry about otherwise-volatile instance member references changing during execution of a method (i.e. inconsistent reads). Of course, this is simply a new reference and does not prevent your method from seeing changes to the referent.
Can anyone explain the bold text?

There is no semantic difference between accessing a final field and accessing a local variable holding a copy of the final field’s value. However, it is an established pattern to copy fields into local variables in performance critical code.
Even in cases where it does not make a difference (that depends on the state of the HotSpot optimizer), it will save at least some bytes in the method’s code.
Each access to an instance field, be it final or not (the only exception being compile-time constants), will get compiled as two instructions, first pushing the this reference onto the operand stack via aload_0, then performing a getfield operation, which has a two byte index to the constant pool entry describing the field. In other words, each field access needs four bytes in the method’s code, whereas reading a local variable needs only one byte if the variable is one of the method’s first four (counting this as a local variable), which is the case here (see aload_n).
So storing a field’s value in a local variable when it is going to be accessed multiple times is good practice: it protects against changes to a mutable variable and avoids the cost of repeated volatile reads. It does no harm even when it is unnecessary, as in the case of a final field, because it still produces more compact bytecode. And in simple interpreted execution, i.e. before the optimizer kicks in, the single local variable access might indeed be faster than the detour via the this instance.
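As a minimal sketch of this pattern (the class, field, and method names are hypothetical and not taken from the JDK), the field is read once into a local variable and every later use goes through the local:

```java
// Hypothetical example class; not JDK code.
class LocalCopyDemo {
    // volatile: every direct read of this field is a volatile read
    private volatile int[] data = {1, 2, 3};

    // Reads the field once, then works only on the local copy.
    int sum() {
        final int[] d = this.data; // single field read
        int total = 0;
        for (int i = 0; i < d.length; i++) {
            total += d[i]; // d cannot be swapped for another array mid-loop
        }
        return total;
    }

    // Another thread may replace the array at any time.
    void replace(int[] newData) {
        this.data = newData;
    }

    public static void main(String[] args) {
        LocalCopyDemo demo = new LocalCopyDemo();
        System.out.println(demo.sum()); // prints 6
    }
}
```

Had the loop used this.data directly, the bound check and the element read could refer to two different arrays if replace ran in between.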

The array is marked final, but the array elements are not (and cannot be). So each array element can be set or replaced, for example using:
this.segments[i] = ...
ensureSegment in Java 7 uses a different method to set the array element, using sun.misc.Unsafe, for improved performance when concurrently calling the method. This is not what a "regular" developer should do.
A local variable
final Segment<K,V>[] ss = this.segments;
is typically used to ensure (or rather, to increase the probability) that ss is kept in a CPU register, and not re-read from memory, for the duration of this method. I guess that would not be needed in this case, as the compiler can infer that. Using ss does make the following lines slightly shorter; maybe that's why it is used.
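To illustrate the point that a local copy is only a new reference to the same array, here is a small sketch. The class and its names are made up for illustration, and it uses a plain array write rather than sun.misc.Unsafe:

```java
// Hypothetical example class; not the ConcurrentHashMap source.
class FinalArrayDemo {
    // The reference is final; the elements are still mutable.
    private final String[] slots = new String[2];

    String readThroughLocal() {
        final String[] ss = this.slots; // local alias of the same array
        this.slots[0] = "updated";      // write through the field
        return ss[0];                   // the alias observes the write
    }

    public static void main(String[] args) {
        System.out.println(new FinalArrayDemo().readThroughLocal()); // prints "updated"
    }
}
```

The local variable fixes which array is used, not what the array contains.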

Related

Is the word 'mutable variable' in java concurrency programming same as the meaning in functional programming?

In the book 'Java Concurrency in Practice', when talking about 'locking and visibility', the author said:
We can now give the other reason for the rule requiring all threads to synchronize on the same lock when accessing a shared mutable variable—to guarantee that values written by one thread are made visible to other threads. Otherwise, if a thread reads a variable without holding the appropriate lock, it might see a stale value.
Here is the figure:
I'm curious about the meaning of 'mutable' here. As per my knowledge in functional programming, 'immutable' means unchangeable and 'mutable' opposite. The variable x in the figure is what the author refers to as shared mutable variable. Is x(an integer or some other similar) mutable?
A shared variable is a placeholder for a location within the shared memory. There might be some confusion due to the fact that you can have an immutable reference variable pointing to an object with mutable instance variables.
But you can always decompose any object graph into a set of simple variables. If all these variables are immutable, then the entire object graph is immutable. But if some of these variables are mutable, we may enter the discussion about the possibility of data races if one or more of these variables are modified in one thread and read by another thread.
For this discussion, their place in the complex object graph is irrelevant, which is why the discussion uses just two mutable variables, x and y, apparently of type int. They may still be members of, e.g., a Point instance stored in a HashMap, but the only thing that matters is that these x and y variables are being modified and, as explained in the cited book, the unlocking of M will make these modifications visible to any thread subsequently locking M, as this applies to all variables, regardless of their place within the heap memory or object graph.
Note that the mutable nature of x and y implies that there might be older values they had before the x=1 resp. y=1 assignments, which can show up when being read without synchronization. This includes the default values (0) they have before the first assignment.
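A minimal sketch of the situation in the figure, assuming x and y are int fields guarded by a lock object standing in for M (the class and method names are hypothetical):

```java
// Hypothetical example class modelling the shared variables x and y guarded by lock M.
class SharedXY {
    private final Object m = new Object(); // plays the role of lock M
    private int x; // default value 0 until first assignment
    private int y;

    // Writer: the assignments happen while holding M.
    void write() {
        synchronized (m) {
            x = 1;
            y = 1;
        }
    }

    // Reader: locking the same M guarantees visibility of writes made before M was last unlocked.
    int[] read() {
        synchronized (m) {
            return new int[] { x, y };
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SharedXY s = new SharedXY();
        Thread writer = new Thread(s::write);
        writer.start();
        writer.join();
        int[] v = s.read();
        System.out.println(v[0] + " " + v[1]); // prints "1 1"
    }
}
```

If read did not synchronize on the same lock, it would be allowed to return stale values, including the default 0.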

Java final keyword semantics with respect to cache

What is the behavior of Java final keyword with respect of caches?
Quote from the JSR-133 FAQ:
The values for an object's final fields are set in its constructor.
Assuming the object is constructed "correctly", once an object is
constructed, the values assigned to the final fields in the
constructor will be visible to all other threads without
synchronization. In addition, the visible values for any other object
or array referenced by those final fields will be at least as
up-to-date as the final fields.
I don't understand what it refers to when it says "as up-to-date as the final fields":
In addition, the visible values for any other object or array
referenced by those final fields will be at least as up-to-date as the
final fields.
My guess is, for example:
public class CC {
    private final Mutable mutable; // final field
    private Other other;           // non-final field

    public CC(Mutable m, Other o) {
        mutable = m;
        other = o;
    }
}
When the constructor CC returns, besides the pointer value of mutable, all values in the object graph rooted at m, if they exist in the local processor cache, will be flushed to main memory. At the same time, the corresponding cache lines in other processors' local caches will be marked as invalid.
Is that the case? What does it look like in assembly? How do they actually implement it?
Is that the case?
The actual guarantee is that any thread that can see an instance of CC created using that constructor is guaranteed to see the mutable reference and also the state of the Mutable object's fields as of the time that the constructor completed.
It does not guarantee that the state of all values in the closure of the Mutable instance will be visible. However, any writes (in the closure or not) made by the thread that executed the constructor prior to the constructor completing will be visible. (By "happens-before" analysis.)
Note that the behavior is specified in terms what one thread is guaranteed to see, not in terms of cache flushing / invalidation. The latter is a way of implementing the behavior that the specification requires. There may be other ways.
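As a rough sketch of what this guarantee covers, here is a simplified variant of the CC class from the question. The Mutable class, its value field, and the use of an int for the non-final field are assumptions made for illustration:

```java
// Hypothetical stand-in for the Mutable type from the question.
class Mutable {
    int value;

    Mutable(int value) {
        this.value = value;
    }
}

class CC {
    private final Mutable mutable; // covered by the final-field guarantee
    private int other;             // non-final: no guarantee without synchronization

    CC(Mutable m, int o) {
        m.value = 42; // written before the constructor completes, so it is
                      // visible to any thread that reads it through 'mutable'
        mutable = m;
        other = o;
    }

    int mutableValue() {
        return mutable.value;
    }

    public static void main(String[] args) {
        CC cc = new CC(new Mutable(1), 7);
        System.out.println(cc.mutableValue()); // prints 42
    }
}
```

A write to the Mutable instance made after the constructor has completed is not covered; a thread that obtained the CC instance through a data race may or may not see it.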
What does it look like in assembly?
That will be version / platform / etc specific. There is a way to get the JIT compiler to dump out the compiled code, if you want to investigate what the native code looks like for your hardware.
How to see JIT-compiled code in JVM?
How do they actually implement it?
See above.

Is assigning a frequently used field to a local variable more efficient?

I was reading the source of java.util.HashMap and noticed that it almost always assigns the table field to a local variable if the value is used more than once in the method.
Since this class is documented to be not thread-safe and the field is not volatile, what's the point of this? Does it make the code more efficient?
By putting a member field into the local scope (that is, the current stackframe), you fix the reference for the entire execution of the method. So you have the same reference to the same object for every use.
Without putting it into the local scope, every access to the field goes through the this reference (implicitly or explicitly). So for every access, the JVM has to get the current value of the field, which theoretically may have changed since the last access.
On top of being more reliable, the local copy may allow the JIT to optimize the access, e.g. in loops (inlining values and so on).
Impacts on performance are rather small, but measurable.
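The "fixed reference" effect can be seen even in a single thread. The following is a sketch with a made-up class, not the HashMap source: the local keeps pointing to the old array after the field has been reassigned.

```java
import java.util.Arrays;

// Hypothetical class for illustration; this is not the HashMap source.
class TinyTable {
    private Object[] table = new Object[2]; // non-final, non-volatile field

    // Returns {length seen through the local, length seen through the field}.
    int[] lengthsAroundResize() {
        Object[] tab = table; // fix the reference for the rest of the method
        resize();             // the field now points to a larger array
        return new int[] { tab.length, table.length };
    }

    private void resize() {
        table = Arrays.copyOf(table, table.length * 2);
    }

    public static void main(String[] args) {
        int[] lengths = new TinyTable().lengthsAroundResize();
        System.out.println(lengths[0] + " " + lengths[1]); // prints "2 4"
    }
}
```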

How do final fields prevent other threads from seeing partially constructed objects?

I was looking into creating an immutable datatype that has final fields (including an array that is constructed and filled prior to being assigned to the final member field), and noticed that it seems that the JVM is specified to guarantee that any other thread that gets a reference to this object will see the initialized fields and array values (assuming no pointers to this are published within the constructor, see What is an "incompletely constructed object"? and How do JVM's implicit memory barriers behave when chaining constructors?).
I am curious how this is achieved without synchronizing every access to this object, or otherwise paying some significant performance penalty. According to my understanding, the JVM can achieve this by doing the following:
Issue a write-fence at the end of the constructor
Publish the reference to the new object only after the write-fence
Issue a read-fence any time you refer to a final field of an object
I can't think of a simpler or cheaper way of eliminating the risk of other threads seeing uninitialized final fields (or recursive references through final fields).
This seems like it could impose a severe performance penalty due to all of the read-fences in the other threads reading the object, but eliminating the read-fences introduces the possibility that the object reference is seen in another processor before it issues a read-fence or otherwise sees the updates to the memory locations corresponding to the newly initialized final fields.
Does anyone know how this works? And whether this introduces a significant performance penalty?
See the "Memory Barriers" section in this writeup.
A StoreStore barrier is required after final fields are set and before the object reference is assigned to another variable. This is the key piece of info you're asking about.
According to the "Reordering" section there, a store of a final field can not be reordered with respect to a store of a reference to the object containing the final field.
Additionally, it states that in v.afield = 1; x.finalField = v; ... ; sharedRef = x;, neither of the first two can be reordered with respect to the third; which ensures that stores to the fields of an object that is stored as a final field are themselves guaranteed to be visible to other threads before a reference to the object containing the final field is stored.
Together, this means that all stores to final fields must be visible to all threads before a reference to the object containing the field is stored.
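For the immutable datatype described in the question, a minimal sketch might look like the following. The class name and contents are hypothetical; the point is only the order of operations inside the constructor:

```java
// Hypothetical immutable type; relies on final-field semantics for safe publication.
final class ImmutableInts {
    private final int[] values;

    ImmutableInts(int n) {
        int[] tmp = new int[n];
        for (int i = 0; i < n; i++) {
            tmp[i] = i * i;  // fill the array first
        }
        this.values = tmp;   // then store it in the final field
        // 'this' is never published from inside the constructor
    }

    int get(int i) {
        return values[i];
    }

    int size() {
        return values.length;
    }

    public static void main(String[] args) {
        ImmutableInts squares = new ImmutableInts(4);
        System.out.println(squares.get(3)); // prints 9
    }
}
```

Under the rules quoted above, the element stores and the store to values cannot be reordered past a later store that publishes the ImmutableInts reference, so readers need no synchronization of their own.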

Why is the String class immutable even though it has a non-final field called "hash"?

I was reading through Item 15 of Effective Java by Joshua Bloch. In Item 15, which is about minimizing mutability, he gives five rules for making objects immutable. One of them is to make all fields final. Here is the rule:
Make all fields final : This clearly expresses your intent in a manner that is enforced
by the system. Also, it is necessary to ensure correct behavior if a reference
to a newly created instance is passed from one thread to another without
synchronization, as spelled out in the memory model [JLS, 17.5; Goetz06 16].
I know that the String class is an example of an immutable class. Going through the source code, I see that it actually has a hash field which is not final.
//Cache the hash code for the string
private int hash; // Default to 0
How is String immutable, then?
The remark explains why this is not final:
//Cache the hash code for the string
It's a cache. If you don't call hashCode, the value will never be set. It could have been set during the creation of the string, but that would mean a longer creation time for a feature you might not need (the hash code). On the other hand, it would be wasteful to calculate the hash each time it's asked for, given that the string is immutable and the hash code will never change.
The fact that there's a non-final field does somewhat contradict that definition you quote, but here it's not part of the object's interface. It's merely an internal implementation detail, which has no effect on the mutability of the string (as a characters container).
Edit - due to popular demand, completing my answer: although hash is not directly part of the public interface, it could have affected the behavior of that interface, as hashCode returns its value. Now, since hashCode is not synchronized, it is possible for hash to be set more than once if more than one thread calls that method concurrently. However, the value that is assigned to hash is always the result of a stable calculation, which relies only on final fields (value, offset and count). Therefore, every calculation of the hash yields the exact same result. For an external user, this is just as if hash were calculated once, and just as if it were calculated each and every time, as the contract of hashCode requires that it consistently return the same result for a given value. Bottom line: even though hash is not final, its mutability is never visible to an external viewer, hence the class can be considered immutable.
String is immutable because as far as its users are concerned, it can never be modified and will always look the same to all threads.
hashCode() is computed using the racy single-check idiom (EJ item 71), and it's safe because it doesn't hurt anybody if hashCode() is computed more than once accidentally.
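A minimal sketch of the racy single-check idiom, using a made-up value class rather than the actual String source:

```java
// Hypothetical immutable value class with a lazily cached hash code.
final class Name {
    private final String first;
    private final String last;
    private int hash; // cache; 0 means "not computed yet"

    Name(String first, String last) {
        this.first = first;
        this.last = last;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Name)) {
            return false;
        }
        Name other = (Name) o;
        return first.equals(other.first) && last.equals(other.last);
    }

    @Override
    public int hashCode() {
        int h = hash; // single read of the non-final field
        if (h == 0) {
            // Depends only on final fields, so every thread computes the same value.
            h = 31 * first.hashCode() + last.hashCode();
            hash = h; // racy write; harmless because the value is always the same
        }
        return h;
    }
}
```

Reading hash exactly once into a local is what makes the idiom safe: the method returns the local, never a second read of the field. If the computed value happens to be 0, it is simply recomputed on every call.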
Making all fields final is the easiest and simplest way to make classes immutable, but it's not strictly required. So long as all methods return the same thing no matter which thread calls it when, the class is immutable.
Even though String is immutable, it can change through reflection. If you make hash final, you could mess things up royally were this to occur. The hash field is different too in that it is there mainly as a cache, a way to speed up the calculation of hashCode() and should really be thought of as a calculated field, less so a constant.
There are many situations in which it may be helpful for a class which is logically immutable have several different representations for the same observable state, and for instances of the class to be able to switch among them. The hashcode value that will be returned from a string whose hash field is zero will be the same as the value that would be returned if the hash field held the result of an earlier hashcode call. Consequently, changing the hash value from the former to the latter will not change the object's observable state, but will cause future operations to run faster.
The biggest difficulties with coding things in those ways are
If an object is changed from holding a reference to some particular immutable object to holding a reference to a different object with identical semantic content, such a change shouldn't affect the observable state of the object holding the reference, but if it turns out the supposedly-identical object wasn't really identical, bad things can happen, especially if the object supposedly holding the reference was assumed to be substitutable for other semantically-identical objects.
Even if there aren't any mistakes in which objects are "identical", there may still be a danger that objects which appear identical to a thread which makes a substitution may not appear identical to other threads. This scenario isn't likely to occur, but if it does occur the effects may be very bad.
Still, there can be some advantages to making substitutions of immutable objects. For example, if a program will be comparing many objects which hold long strings, and many of them, though separately generated, will be identical to each other, it may be useful to use a WeakDictionary to build a pool of distinct string instances, and replace any string which is found to be identical to one in the pool with a reference to the pool copy. Doing that would cause many strings which are identical to be mapped to the same string, thus greatly accelerating any future comparisons between them. Of course, as noted, it's very important that the objects are properly logically immutable and that the comparisons are done correctly. Any problems in that regard can turn what should be an optimization into a mess.
To make an object immutable, you need to make the class final and all its members final, so that once an object is created no one can modify its state. You can achieve the same result by making the members non-final but private and not modifying them except in the constructor.
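Java has no class called WeakDictionary, so the following sketch of such a pool substitutes java.util.WeakHashMap with weakly referenced values. That substitution, and the class and method names, are assumptions made for illustration:

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical pool of canonical string instances.
class StringPool {
    // Keys are held weakly; values are wrapped so they do not keep the keys alive.
    private final Map<String, WeakReference<String>> pool = new WeakHashMap<>();

    // Returns the pooled instance equal to s, adding s to the pool if none exists.
    synchronized String canonical(String s) {
        WeakReference<String> ref = pool.get(s);
        String existing = (ref != null) ? ref.get() : null;
        if (existing != null) {
            return existing;
        }
        pool.put(s, new WeakReference<>(s));
        return s;
    }

    public static void main(String[] args) {
        StringPool pool = new StringPool();
        String a = pool.canonical(new String("hello"));
        String b = pool.canonical(new String("hello"));
        System.out.println(a == b); // prints true
    }
}
```

Once two holders share the pooled instance, String.equals returns immediately on its reference-equality check instead of comparing characters.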
EDIT:
Notice :
When hashing a string, Java also caches the hash value in the hash attribute, but only if the result is different from zero.
