immutable objects and lazy initialization. - java

http://www.javapractices.com/topic/TopicAction.do?Id=29
I am reading the article above. It says that immutable objects greatly simplify your program, since they:
allow hashCode to use lazy initialization, and to cache its return value
Can anyone explain what the author means by that line?
Is my class immutable if it is marked final but its instance variables are not final? And what about the reverse: final instance variables in a class that is not final?

As explained by others, because the state of the object won't change the hashcode can be calculated only once.
The easy solution is to precalculate it in the constructor and place the result in a final variable (which guarantees thread safety).
If you want to have a lazy calculation (hashcode only calculated if needed) it is a little more tricky if you want to keep the thread safety characteristics of your immutable objects.
The simplest way is to declare a private volatile int hash; and run the calculation if it is 0. You will get laziness except for objects whose hashcode really is 0 (1 in 4 billion if your hash method is well distributed).
Alternatively you could couple it with a volatile boolean but need to be careful about the order in which you update the two variables.
Finally, for extra performance, you can use the approach taken by the String class, which reads the field into a local variable once, allowing you to drop the volatile keyword while still guaranteeing correctness. This last method is error-prone if you don't fully understand why it is done the way it is done.
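To make the volatile variant described above concrete, here is a minimal sketch. LazyHashPoint and its fields are made-up names for illustration, and Objects.hash is just one possible way to combine the fields.

```java
import java.util.Objects;

// Hypothetical immutable class, used only to illustrate the volatile approach.
final class LazyHashPoint {
    private final int x;
    private final int y;

    // 0 means "not computed yet"; volatile makes the cached value visible to other threads.
    private volatile int hash;

    LazyHashPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof LazyHashPoint)) return false;
        LazyHashPoint other = (LazyHashPoint) o;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        int h = hash;                 // read the cached value
        if (h == 0) {
            h = Objects.hash(x, y);   // recomputed on each call only if the real hash is 0
            hash = h;                 // two threads may both compute, but they store the same value
        }
        return h;
    }
}
```

Because the fields are final and the computation is deterministic, a race between two threads only costs a duplicate calculation; both end up caching the same number.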

If your object is immutable, it can't change its state, and therefore its hash code can't change. That allows you to calculate the value when you first need it and to cache it, since it will always stay the same. It is in fact a very bad idea to implement your own hashCode function based on mutable state, since e.g. HashMap assumes that the hash can't change and will break if it does.
The benefit of lazy initialization is that the hash code calculation is delayed until it is required. Many objects never need it at all, so you save some calculations. Expensive hash calculations, such as those on long Strings, benefit the most.
class FinalObject {
private final int a, b;
public FinalObject(int value1, int value2) {
a = value1;
b = value2;
}
// not calculated at the beginning - lazy once required
private int hashCode;
@Override
public int hashCode() {
int h = hashCode; // read
if (h == 0) {
h = a + b; // calculation
hashCode = h; // write
}
return h; // return local variable instead of second read
}
}
Edit: as pointed out by @assylias, unsynchronized / non-volatile code is only guaranteed to work if there is only one read of hashCode, because any subsequent read of that field could return 0 even though the first read already saw a different value. The version above fixes that problem.
Edit2: replaced with more obvious version, slightly less code but roughly equivalent in bytecode
public int hashCode() {
int h = hashCode; // only read
return h != 0 ? h : (hashCode = a + b);
// ^- just a (racy) write to hashCode, no read
}
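To illustrate the warning above about hash codes based on mutable state, here is a small sketch of how a hash-based collection can lose track of a key. MutableKey and MutableKeyDemo are made-up names, and the exact behaviour depends on the collection's internals, so treat it as a demonstration rather than a guarantee.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical mutable key, for illustration only.
final class MutableKey {
    int value;   // deliberately mutable

    MutableKey(int value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MutableKey && ((MutableKey) o).value == value;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(value);   // depends on mutable state
    }
}

final class MutableKeyDemo {
    // Reports whether the set still finds the key after the key was mutated.
    static boolean foundAfterMutation() {
        Set<MutableKey> set = new HashSet<>();
        MutableKey key = new MutableKey(1);
        set.add(key);               // stored in the bucket for hash 1
        key.value = 2;              // the hash code changes while the key is in the set
        return set.contains(key);   // the lookup now searches the bucket for hash 2
    }
}
```

With the standard HashSet the lookup typically fails even though the very same object is still in the set.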

What that line means is, since the object is immutable, then the hashCode has to only be computed once. Further, it doesn't have to be computed when the object is constructed - it only has to be computed when the function is first called. If the object's hashCode is never used then it is never computed. So the hashCode function can look something like this:
@Override public int hashCode() {
synchronized (this) {
if (!this.computedHashCode) {
this.hashCode = expensiveComputation();
this.computedHashCode = true;
}
}
return this.hashCode;
}

To add to the other answers:
An immutable object cannot be changed. The final keyword is enough for primitive types such as int. For a reference to a custom object, though, final only prevents the reference from being reassigned; the object it points to can still be modified, so immutability has to be enforced inside your implementation.
The following code results in a compilation error, because it tries to reassign a final reference.
final MyClass m = new MyClass();
m = new MyClass();
However this code would work.
final MyClass m = new MyClass();
m.changeX();
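As a rough sketch of what "done internally in your implementation" can mean, compare the two holders below. LeakyHolder and ImmutableHolder are hypothetical names; both have a final field, but only the second one takes a defensive, unmodifiable copy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class: final field, but NOT immutable.
final class LeakyHolder {
    private final List<String> names;   // final reference to a mutable list

    LeakyHolder(List<String> names) {
        this.names = names;             // keeps the caller's list as-is
    }

    List<String> getNames() {
        return names;                   // exposes the internal list
    }
}

// Hypothetical class: final field plus a defensive copy.
final class ImmutableHolder {
    private final List<String> names;

    ImmutableHolder(List<String> names) {
        // Copy first, then wrap, so later changes to the caller's list are not visible here.
        this.names = Collections.unmodifiableList(new ArrayList<>(names));
    }

    List<String> getNames() {
        return names;                   // callers get UnsupportedOperationException on modification
    }
}
```

This works here because String elements are themselves immutable; a list of mutable elements would need deeper copying.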

Related

Lazy initialization of hashcode in Java

Why do we say that immutable objects use lazy hash code initialization? For mutable objects too, we can calculate hashcode only when required right causing lazy initialization?
For mutable classes, it usually doesn't make much sense to store the hashCode, as you'd have to update it every time the object is modified (or at least nullify it so you can recalculate it next time hashCode() is called).
For immutable classes, it makes a lot of sense to store the hash code - once it's calculated, it will never change (since the object is immutable), and there's no need to keep re-calculating every time hashCode() is called. As a further optimization, we can avoid calculating this value until the first time it's needed (i.e., hashCode() is called) - i.e., use lazy initialization.
There's nothing that prohibits you from doing the same on a mutable object, it's just generally not a very good idea.
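For comparison, this is roughly what caching a hash code on a mutable class would involve. MutableName is a made-up class; the point is only that every mutator has to remember to reset the cache, and the sketch is not thread-safe.

```java
// Hypothetical mutable class, for illustration only (not thread-safe).
final class MutableName {
    private String first;
    private String last;
    private int cachedHash;   // 0 means "not computed yet" or "invalidated"

    MutableName(String first, String last) {
        this.first = first;
        this.last = last;
    }

    void setFirst(String first) {
        this.first = first;
        cachedHash = 0;       // forget this line in one setter and the cache goes stale
    }

    void setLast(String last) {
        this.last = last;
        cachedHash = 0;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MutableName)) return false;
        MutableName other = (MutableName) o;
        return first.equals(other.first) && last.equals(other.last);
    }

    @Override
    public int hashCode() {
        int h = cachedHash;
        if (h == 0) {
            h = 31 * first.hashCode() + last.hashCode();
            cachedHash = h;
        }
        return h;
    }
}
```

Even with the resets in place, the object is still unsafe as a key in a hash-based collection once it has been modified.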
The advantage of lazy initialization is that the hash code computation is deferred until it is required. Many objects never need it at all, so you save some computation, particularly when the hash is expensive to compute. Look at the example below:
class FinalObject {
private final int a, b;
public FinalObject(int value1, int value2) {
a = value1;
b = value2;
}
// not calculated at the beginning - lazy once required
private int hashCode;
@Override
public int hashCode() {
int h = hashCode; // read
if (h == 0) {
h = a + b; // calculation
hashCode = h; // write
}
return h; // return local variable instead of second read
}
}

Debugging Challenge in regards to TreeSet

So this was going to be my question, but I actually figured out the problem while I was writing it. Perhaps this will be useful for others (I will remove the question if it's a duplicate or is deemed inappropriate for this site). I know of two possible solutions to my problem, but perhaps someone will come up with a better one than I thought of.
I don't understand why TreeSet isn't removing the first element here. The size of my TreeSet is supposed to stay bounded, but it appears to grow without bound.
Here is what I believe to be the relevant code:
This code resides inside of a double for loop. NUM_GROUPS is a static final int which is set to 100. newGroups is a TreeSet<TeamGroup> object which is initialized (with no elements) before the double for loop (the variables group and team are from the two for-each loops).
final TeamGroup newGroup = new TeamGroup(group, team);
newGroups.add(newGroup);
System.err.println("size of newGroups: " + newGroups.size());
if (newGroups.size() > NUM_GROUPS) {
System.err.println("removing first from newGroups");
newGroups.remove(newGroups.first());
System.err.println("new size of newGroups: "
+ newGroups.size());
}
I included my debugging statements to show that the problem really does appear to happen. I get the following types of output:
size of newGroups: 44011
removing first from newGroups
new size of newGroups: 44011
You see that although the if statement is clearly being entered, the size of the TreeSet<TeamGroup> newGroups isn't being decremented. It would seem to me that the only way for this to happen is if the remove call doesn't remove anything. But how can it fail to remove the result of a call to first(), which should definitely be an element of the TreeSet?
Here is the compareTo method in my TeamGroup class (score is an int which could very reasonably be the same for many different TeamGroup objects hence why I use the R_ID field as a tie-breaker):
public int compareTo(TeamGroup o) {
// sorts low to high so that when you pop off of the TreeSet object, the
// lowest value gets popped off (and keeps the highest values).
if (o.score == this.score)
return this.R_ID - o.R_ID;
return this.score - o.score;
}
Here is the equals method for my TeamGroup class:
@Override
public boolean equals(final Object o) {
return this.R_ID == ((TeamGroup) o).R_ID;
}
...I'm not worried about a ClassCastException here because this is specifically pertaining to my above problem where I never try to compare a TeamGroup object with anything but another TeamGroup object--and this is definitely not the problem (at least not a ClassCastException problem).
The R_ID's are supposed to be unique and I guarantee this by the following:
private static final double WIDTH = (double) Integer.MAX_VALUE
- (double) Integer.MIN_VALUE;
private static final Map<Integer, Integer> MAPPED_IDS =
new HashMap<Integer, Integer>(50000);
...
public final int R_ID = TeamGroup.getNewID();
...
private static int getNewID() {
int randID = randID();
while (MAPPED_IDS.get(randID) != null) {
randID = randID();
}
MAPPED_IDS.put(randID, randID);
return randID;
}
private static int randID() {
return (int) (Integer.MIN_VALUE + Math.random() * WIDTH);
}
The problem is here:
return this.R_ID - o.R_ID;
It should be:
return Integer.compare(this.R_ID, o.R_ID);
Taking the difference of two int or Integer values works if the values are both guaranteed to be non-negative. However, in your example, you are using ID values across the entire range of int / Integer and that means that the subtraction can lead to overflow ... and an incorrect result for compareTo.
The incorrect implementation leads to situations where the compareTo method is not transitive; i.e. there are integers I1, I2 and I3 for which compareTo says that I1 < I2 and I2 < I3, but also I3 < I1. When you plug this into TreeSet, elements get inserted into the tree in the wrong place, and strange behaviours happen. Precisely what happens is hard to predict; it depends on the objects that are inserted and the order in which they are inserted.
TreeSet.first() should definitely return an object which belongs to the set, right?
Probably ...
So then why can it not remove this object?
Probably because it can't find it ... because of the broken compareTo.
To understand exactly what is going on, you would need to single-step through the TreeSet code.
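The overflow is easy to reproduce in isolation. The sketch below uses a made-up helper class, SubtractionCompareDemo, and three hand-picked values to show the subtraction idiom producing a cycle, while Integer.compare gives the expected ordering.

```java
// Hypothetical helper, for illustration only.
final class SubtractionCompareDemo {
    // The subtraction idiom from the question: overflows when the true
    // difference does not fit in an int.
    static int bySubtraction(int a, int b) {
        return a - b;
    }

    // Overflow-safe comparison.
    static int safe(int a, int b) {
        return Integer.compare(a, b);
    }

    // True if subtraction claims i1 < i2, i2 < i3 and also i3 < i1.
    static boolean formsCycle(int i1, int i2, int i3) {
        return bySubtraction(i1, i2) < 0
            && bySubtraction(i2, i3) < 0
            && bySubtraction(i3, i1) < 0;
    }
}
```

For example, with -2000000000, 0 and 2000000000, formsCycle returns true: the last difference is 4000000000, which wraps around to a negative int. Integer.compare(2000000000, -2000000000) correctly returns a positive value.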

String immutability allows hashcode value to be cached

Among the many reasons why Strings are immutable, one that is often cited is:
String immutability allows hashcode value to be cached.
I did not really understand this. What is meant by caching hashcode values? Where are these values cached? Even if Strings would have been mutable, this cached hashcode value could always be updated as required; so what's the big deal?
What is meant by caching hashcode values? Where are these values cached?
After the hash code is calculated, it is stored in a variable in String.
Looking at the source of String makes this clearer:
public final class String implements ... {
...
/** Cache the hash code for the string */
private int hash; // Default to 0
...
public int hashCode() {
int h = hash;
if (h == 0 && ...) {
...
hash = h;
}
return h;
}
...
}
Even if Strings would have been mutable, this cached hashcode value could always be updated as required
True. But it would have to be recalculated / reset in every modification function. While this is possible, it's not good design.
All in all, the reason probably would've been better if it were as follows:
String immutability makes it easier to cache the hashcode value.

Caching objects built with multiple parameters

I have a factory that creates objects of class MyClass, returning already generated ones when they exist. As I have the creation method (getOrCreateMyClass) taking multiple parameters, which is the best way to use a Map to store and retrieve the objects?
My current solution is the following, but it doesn't sound too clear to me.
I use the hashCode method (slightly modified) of class MyClass to build an int based on the parameters of class MyClass, and I use it as the key of the Map.
import java.util.HashMap;
import java.util.Map;
public class MyClassFactory {
static Map<Integer, MyClass> cache = new HashMap<Integer, MyClass>();
private static class MyClass {
private String s;
private int i;
public MyClass(String s, int i) {
}
public static int getHashCode(String s, int i) {
final int prime = 31;
int result = 1;
result = prime * result + i;
result = prime * result + ((s == null) ? 0 : s.hashCode());
return result;
}
@Override
public int hashCode() {
return getHashCode(this.s, this.i);
}
}
public static MyClass getOrCreateMyClass(String s, int i) {
int hashCode = MyClass.getHashCode(s, i);
MyClass a = cache.get(hashCode);
if (a == null) {
a = new MyClass(s, i);
cache.put(hashCode , a);
}
return a;
}
}
Your getOrCreateMyClass doesn't seem to add to the cache if it creates.
I think this will also not perform correctly when hashcodes collide. Identical hashcodes do not imply equal objects. This could be the source of the bug you mentioned in a comment.
You might consider creating a generic Pair class with actual equals and hashCode methods and using Pair<String, Integer> class as the map key for your cache.
Edit:
The issue of extra memory consumption by storing both a Pair<String, Integer> key and a MyClass value might be best dealt with by making the Pair<String, Integer> into a field of MyClass and thereby having only one reference to this object.
With all of this though, you might have to worry about threading issues that don't seem to be addressed yet, and which could be another source of bugs.
And whether it is actually a good idea at all depends on whether the creation of MyClass is much more expensive than the creation of the map key.
Another Edit:
ColinD's answer is also reasonable (and I've upvoted it), as long as the construction of MyClass is not expensive.
Another approach that might be worth consideration is to use a nested map Map<String, Map<Integer, MyClass>>, which would require a two-stage lookup and complicate the cache updating a bit.
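A minimal sketch of the key-class suggestion might look like the following. PairKey, Widget and WidgetFactory are made-up names (Widget stands in for MyClass), and the sketch is not thread-safe. The important part is that the map key has real equals and hashCode methods, so two parameter sets that happen to share a hash code are still kept apart.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical composite key with value-based equals and hashCode.
final class PairKey {
    private final String s;
    private final int i;

    PairKey(String s, int i) {
        this.s = s;
        this.i = i;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PairKey)) return false;
        PairKey other = (PairKey) o;
        return i == other.i && Objects.equals(s, other.s);
    }

    @Override
    public int hashCode() {
        return Objects.hash(s, i);
    }
}

// Hypothetical stand-in for MyClass.
final class Widget {
    final String s;
    final int i;

    Widget(String s, int i) {
        this.s = s;
        this.i = i;
    }
}

final class WidgetFactory {
    private static final Map<PairKey, Widget> CACHE = new HashMap<>();

    // Returns the cached instance for (s, i), creating it on first use.
    static Widget getOrCreate(String s, int i) {
        return CACHE.computeIfAbsent(new PairKey(s, i), k -> new Widget(s, i));
    }
}
```

"Aa" and "BB" have the same String hash code, so getOrCreate("Aa", 1) and getOrCreate("BB", 1) would collide under the int-keyed cache from the question; here they yield two distinct objects.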
You really shouldn't be using the hashcode as the key in your map. A class's hashcode is not intended to necessarily guarantee that it will not be the same for any two non-equal instances of that class. Indeed, your hashcode method could definitely produce the same hashcode for two non-equal instances. You do need to implement equals on MyClass to check that two instances of MyClass are equal based on the equality of the String and int they contain. I'd also recommend making the s and i fields final to provide a stronger guarantee of the immutability of each MyClass instance if you're going to be using it this way.
Beyond that, I think what you actually want here is an interner.... that is, something to guarantee that you'll only ever store at most 1 instance of a given MyClass in memory at a time. The correct solution to this is a Map<MyClass, MyClass>... more specifically a ConcurrentMap<MyClass, MyClass> if there's any chance of getOrCreateMyClass being called from multiple threads. Now, you do need to create a new instance of MyClass in order to check the cache when using this approach, but that's inevitable really... and it's not a big deal because MyClass is easy to create.
Guava has something that does all the work for you here: its Interner interface and corresponding Interners factory/utility class. Here's how you might use it to implement getOrCreateMyClass:
private static final Interner<MyClass> interner = Interners.newStrongInterner();
public static MyClass getOrCreateMyClass(String s, int i) {
return interner.intern(new MyClass(s, i));
}
Note that using a strong interner will, like your example code, keep each MyClass it holds in memory as long as the interner is in memory, regardless of whether anything else in the program has a reference to a given instance. If you use newWeakInterner instead, when there isn't anything elsewhere in your program using a given MyClass instance, that instance will be eligible for garbage collection, helping you not waste memory with instances you don't need around.
If you choose to do this yourself, you'll want to use a ConcurrentMap cache and use putIfAbsent. You can take a look at the implementation of Guava's strong interner for reference I imagine... the weak reference approach is much more complicated though.
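If you'd rather not pull in Guava, the putIfAbsent approach might look roughly like this. Token and TokenInterner are made-up names standing in for MyClass and its factory; this is a sketch of a strong interner only, not Guava's implementation.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for MyClass, with final fields and value-based equality.
final class Token {
    private final String s;
    private final int i;

    Token(String s, int i) {
        this.s = s;
        this.i = i;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Token)) return false;
        Token other = (Token) o;
        return i == other.i && Objects.equals(s, other.s);
    }

    @Override
    public int hashCode() {
        return Objects.hash(s, i);
    }
}

final class TokenInterner {
    // Each instance is its own key, so the map holds one canonical object per value.
    private static final ConcurrentMap<Token, Token> CACHE = new ConcurrentHashMap<>();

    static Token intern(String s, int i) {
        Token candidate = new Token(s, i);
        // putIfAbsent returns the previously stored instance, or null if ours was stored.
        Token existing = CACHE.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }
}
```

A throwaway candidate is created on every call, which is the trade-off mentioned above: it is only worthwhile when instances are cheap to construct.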

What does AtomicReference.compareAndSet() use for determination?

Say you have the following class
public class AccessStatistics {
private final int noPages, noErrors;
public AccessStatistics(int noPages, int noErrors) {
this.noPages = noPages;
this.noErrors = noErrors;
}
public int getNoPages() { return noPages; }
public int getNoErrors() { return noErrors; }
}
and you execute the following code
private AtomicReference<AccessStatistics> stats =
new AtomicReference<AccessStatistics>(new AccessStatistics(0, 0));
public void incrementPageCount(boolean wasError) {
AccessStatistics prev, newValue;
do {
prev = stats.get();
int noPages = prev.getNoPages() + 1;
int noErrors = prev.getNoErrors();
if (wasError) {
noErrors++;
}
newValue = new AccessStatistics(noPages, noErrors);
} while (!stats.compareAndSet(prev, newValue));
}
In the last line while (!stats.compareAndSet(prev, newValue)) how does the compareAndSet method determine equality between prev and newValue? Is the AccessStatistics class required to implement an equals() method? If not, why? The javadoc states the following for AtomicReference.compareAndSet
Atomically sets the value to the given updated value if the current value == the expected value.
... but this assertion seems very general, and the tutorials I've read on AtomicReference never suggest implementing an equals() for a class wrapped in an AtomicReference.
If classes wrapped in AtomicReference are required to implement equals() then for objects more complex than AccessStatistics I'm thinking it may be faster to synchronize methods that update the object and not use AtomicReference.
It compares the references exactly as if you had used the == operator. That means that the references must be pointing to the same instance. Object.equals() is not used.
Actually, it does not compare prev and newValue!
Instead it compares the value stored within stats to prev, and only when those are the same does it update the value stored within stats to newValue. As said above, it uses the equality operator (==) to do so. This means that stats will be updated only when prev points to the same object that is stored in stats.
It simply checks the object reference equality (aka ==), so if object reference held by AtomicReference had changed after you got the reference, it won't change the reference, so you'll have to start over.
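A quick way to convince yourself of this is to try compareAndSet with an object that is equal to, but not the same instance as, the stored one. CasIdentityDemo below is a made-up class for that purpose.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class, for illustration only.
final class CasIdentityDemo {
    // Returns { result with an equal-but-distinct expected value,
    //           result with the same-instance expected value }.
    static boolean[] run() {
        String original = new String("v1");
        AtomicReference<String> ref = new AtomicReference<>(original);

        String equalCopy = new String("v1");   // equals(original) is true, but it is a different object

        boolean withCopy = ref.compareAndSet(equalCopy, "v2");   // fails: reference differs
        boolean withSame = ref.compareAndSet(original, "v2");    // succeeds: same reference

        return new boolean[] { withCopy, withSame };
    }
}
```

The first call returns false and leaves the reference untouched; the second returns true. That is why the loop in the question works without AccessStatistics defining equals().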
Following is some of the source code of AtomicReference. An AtomicReference wraps an object reference, which is held in a volatile member variable of the AtomicReference instance, as below.
private volatile V value;
get() simply returns the latest value of the variable (as volatile reads do, with "happens-before" semantics).
public final V get() { return value; }
Following is the most important method of AtomicReference.
public final boolean compareAndSet(V expect, V update) {
return unsafe.compareAndSwapObject(this, valueOffset, expect, update);
}
The compareAndSet(expect, update) method calls the compareAndSwapObject() method of Java's Unsafe class. That call goes to native code, which issues a single instruction to the processor. "expect" and "update" each reference an object.
If and only if the AtomicReference member variable "value" refers to the same object that "expect" refers to, "update" is assigned to that variable and true is returned. Otherwise, false is returned. The whole thing is done atomically; no other thread can intervene in between. As this is a single processor operation (the magic of modern computer architecture), it is often faster than using a synchronized block. But remember that when multiple variables need to be updated atomically, AtomicReference won't help.
I would like to add full running code, which can be run in Eclipse. It should clear up much of the confusion. Here 22 users (MyTh threads) are trying to book 20 seats. Following is the code snippet, followed by a link to the full code.
Code snippet where 22 users are trying to book 20 seats.
for (int i = 0; i < 20; i++) {// 20 seats
seats.add(new AtomicReference<Integer>());
}
Thread[] ths = new Thread[22];// 22 users
for (int i = 0; i < ths.length; i++) {
ths[i] = new MyTh(seats, i);
ths[i].start();
}
Following is the GitHub link for those who want to see the full running code, which is small and concise.
https://github.com/sankar4git/atomicReference/blob/master/Solution.java
