"hash" variable in String class

"hash" variable in String class - java

what is the use of private "hash" variable in java.lang.String class. It is private and calculated/re-calculated every time hashcode method is called.
http://hg.openjdk.java.net/jdk7u/jdk7u6/jdk/file/8c2c5d63a17e/src/share/classes/java/lang/String.java

It's used to cache the hashCode of the String. Because String is immutable, its hashCode will never change, so attempting to recalculate it after it's already been calculated is pointless.
In the code that you've posted, it's only recalculated when the value of hash is 0, which can either occur if the hashCode hasn't been calculated yet or if the hashCode of the String is actually 0, which is possible!
For example, the hashCode of "aardvark polycyclic bitmap" is 0.
This oversight seems to have been corrected in Java 13 with the introduction of a hashIsZero field:
public int hashCode() {
// The hash or hashIsZero fields are subject to a benign data race,
// making it crucial to ensure that any observable result of the
// calculation in this method stays correct under any possible read of
// these fields. Necessary restrictions to allow this to be correct
// without explicit memory fences or similar concurrency primitives is
// that we can ever only write to one of these two fields for a given
// String instance, and that the computation is idempotent and derived
// from immutable state
int h = hash;
if (h == 0 && !hashIsZero) {
h = isLatin1() ? StringLatin1.hashCode(value)
: StringUTF16.hashCode(value);
if (h == 0) {
hashIsZero = true;
} else {
hash = h;
}
}
return h;
}

Related

Lazy initialization of hashcode in Java

Why do we say that immutable objects use lazy hash code initialization? For mutable objects too, we can calculate hashcode only when required right causing lazy initialization?

For mutable classes, it usually doesn't make much sense to store the hashCode, as you'd have to update it every time the object is modified (or at least nullify it so you can recalculate it next time hashCode() is called).
For immutable classes, it makes a lot of sense to store the hash code - once it's calculated, it will never change (since the object is immutable), and there's no need to keep re-calculating every time hashCode() is called. As a further optimization, we can avoid calculating this value until the first time it's needed (i.e., hashCode() is called) - i.e., use lazy initialization.
There's nothing that prohibits you from doing the same on a mutable object, it's just generally not a very good idea.

The advantage of lazy initialization is that hashcode computation is suspended until it is required. Many objects don't need it at all, so you save some computations. Particularly when you have high hash computations. Look at the example below :
class FinalObject {
private final int a, b;
public FinalObject(int value1, int value2) {
a = value1;
b = value2;
}
// not calculated at the beginning - lazy once required
private int hashCode;
#Override
public int hashCode() {
int h = hashCode; // read
if (h == 0) {
h = a + b; // calculation
hashCode = h; // write
}
return h; // return local variable instead of second read
}
}

Deciding the right HashCode [duplicate]

How do we decide on the best implementation of hashCode() method for a collection (assuming that equals method has been overridden correctly) ?

The best implementation? That is a hard question because it depends on the usage pattern.
A for nearly all cases reasonable good implementation was proposed in Josh Bloch's Effective Java in Item 8 (second edition). The best thing is to look it up there because the author explains there why the approach is good.
A short version
Create a int result and assign a non-zero value.
For every field f tested in the equals() method, calculate a hash code c by:
If the field f is a boolean:
calculate (f ? 0 : 1);
If the field f is a byte, char, short or int: calculate (int)f;
If the field f is a long: calculate (int)(f ^ (f >>> 32));
If the field f is a float: calculate Float.floatToIntBits(f);
If the field f is a double: calculate Double.doubleToLongBits(f) and handle the return value like every long value;
If the field f is an object: Use the result of the hashCode() method or 0 if f == null;
If the field f is an array: see every field as separate element and calculate the hash value in a recursive fashion and combine the values as described next.
Combine the hash value c with result:
result = 37 * result + c
Return result
This should result in a proper distribution of hash values for most use situations.

If you're happy with the Effective Java implementation recommended by dmeister, you can use a library call instead of rolling your own:
#Override
public int hashCode() {
return Objects.hash(this.firstName, this.lastName);
}
This requires either Guava (com.google.common.base.Objects.hashCode) or the standard library in Java 7 (java.util.Objects.hash) but works the same way.

Although this is linked to Android documentation (Wayback Machine) and My own code on Github, it will work for Java in general. My answer is an extension of dmeister's Answer with just code that is much easier to read and understand.
#Override
public int hashCode() {
// Start with a non-zero constant. Prime is preferred
int result = 17;
// Include a hash for each field.
// Primatives
result = 31 * result + (booleanField ? 1 : 0); // 1 bit » 32-bit
result = 31 * result + byteField; // 8 bits » 32-bit
result = 31 * result + charField; // 16 bits » 32-bit
result = 31 * result + shortField; // 16 bits » 32-bit
result = 31 * result + intField; // 32 bits » 32-bit
result = 31 * result + (int)(longField ^ (longField >>> 32)); // 64 bits » 32-bit
result = 31 * result + Float.floatToIntBits(floatField); // 32 bits » 32-bit
long doubleFieldBits = Double.doubleToLongBits(doubleField); // 64 bits (double) » 64-bit (long) » 32-bit (int)
result = 31 * result + (int)(doubleFieldBits ^ (doubleFieldBits >>> 32));
// Objects
result = 31 * result + Arrays.hashCode(arrayField); // var bits » 32-bit
result = 31 * result + referenceField.hashCode(); // var bits » 32-bit (non-nullable)
result = 31 * result + // var bits » 32-bit (nullable)
(nullableReferenceField == null
? 0
: nullableReferenceField.hashCode());
return result;
}
EDIT
Typically, when you override hashcode(...), you also want to override equals(...). So for those that will or has already implemented equals, here is a good reference from my Github...
#Override
public boolean equals(Object o) {
// Optimization (not required).
if (this == o) {
return true;
}
// Return false if the other object has the wrong type, interface, or is null.
if (!(o instanceof MyType)) {
return false;
}
MyType lhs = (MyType) o; // lhs means "left hand side"
// Primitive fields
return booleanField == lhs.booleanField
&& byteField == lhs.byteField
&& charField == lhs.charField
&& shortField == lhs.shortField
&& intField == lhs.intField
&& longField == lhs.longField
&& floatField == lhs.floatField
&& doubleField == lhs.doubleField
// Arrays
&& Arrays.equals(arrayField, lhs.arrayField)
// Objects
&& referenceField.equals(lhs.referenceField)
&& (nullableReferenceField == null
? lhs.nullableReferenceField == null
: nullableReferenceField.equals(lhs.nullableReferenceField));
}

It is better to use the functionality provided by Eclipse which does a pretty good job and you can put your efforts and energy in developing the business logic.

First make sure that equals is implemented correctly. From an IBM DeveloperWorks article:
Symmetry: For two references, a and b, a.equals(b) if and only if b.equals(a)
Reflexivity: For all non-null references, a.equals(a)
Transitivity: If a.equals(b) and b.equals(c), then a.equals(c)
Then make sure that their relation with hashCode respects the contact (from the same article):
Consistency with hashCode(): Two equal objects must have the same hashCode() value
Finally a good hash function should strive to approach the ideal hash function.

about8.blogspot.com, you said
if equals() returns true for two objects, then hashCode() should return the same value. If equals() returns false, then hashCode() should return different values
I cannot agree with you. If two objects have the same hashcode it doesn't have to mean that they are equal.
If A equals B then A.hashcode must be equal to B.hascode
but
if A.hashcode equals B.hascode it does not mean that A must equals B

If you use eclipse, you can generate equals() and hashCode() using:
Source -> Generate hashCode() and equals().
Using this function you can decide which fields you want to use for equality and hash code calculation, and Eclipse generates the corresponding methods.

There's a good implementation of the Effective Java's hashcode() and equals() logic in Apache Commons Lang. Checkout HashCodeBuilder and EqualsBuilder.

Just a quick note for completing other more detailed answer (in term of code):
If I consider the question how-do-i-create-a-hash-table-in-java and especially the jGuru FAQ entry, I believe some other criteria upon which a hash code could be judged are:
synchronization (does the algo support concurrent access or not) ?
fail safe iteration (does the algo detect a collection which changes during iteration)
null value (does the hash code support null value in the collection)

If I understand your question correctly, you have a custom collection class (i.e. a new class that extends from the Collection interface) and you want to implement the hashCode() method.
If your collection class extends AbstractList, then you don't have to worry about it, there is already an implementation of equals() and hashCode() that works by iterating through all the objects and adding their hashCodes() together.
public int hashCode() {
int hashCode = 1;
Iterator i = iterator();
while (i.hasNext()) {
Object obj = i.next();
hashCode = 31*hashCode + (obj==null ? 0 : obj.hashCode());
}
return hashCode;
}
Now if what you want is the best way to calculate the hash code for a specific class, I normally use the ^ (bitwise exclusive or) operator to process all fields that I use in the equals method:
public int hashCode(){
return intMember ^ (stringField != null ? stringField.hashCode() : 0);
}

#about8 : there is a pretty serious bug there.
Zam obj1 = new Zam("foo", "bar", "baz");
Zam obj2 = new Zam("fo", "obar", "baz");
same hashcode
you probably want something like
public int hashCode() {
return (getFoo().hashCode() + getBar().hashCode()).toString().hashCode();
(can you get hashCode directly from int in Java these days? I think it does some autocasting.. if that's the case, skip the toString, it's ugly.)

As you specifically asked for collections, I'd like to add an aspect that the other answers haven't mentioned yet: A HashMap doesn't expect their keys to change their hashcode once they are added to the collection. Would defeat the whole purpose...

Use the reflection methods on Apache Commons EqualsBuilder and HashCodeBuilder.

I use a tiny wrapper around Arrays.deepHashCode(...) because it handles arrays supplied as parameters correctly
public static int hash(final Object... objects) {
return Arrays.deepHashCode(objects);
}

any hashing method that evenly distributes the hash value over the possible range is a good implementation. See effective java ( http://books.google.com.au/books?id=ZZOiqZQIbRMC&dq=effective+java&pg=PP1&ots=UZMZ2siN25&sig=kR0n73DHJOn-D77qGj0wOxAxiZw&hl=en&sa=X&oi=book_result&resnum=1&ct=result ) , there is a good tip in there for hashcode implementation (item 9 i think...).

I prefer using utility methods fromm Google Collections lib from class Objects that helps me to keep my code clean. Very often equals and hashcode methods are made from IDE's template, so their are not clean to read.

Here is another JDK 1.7+ approach demonstration with superclass logics accounted. I see it as pretty convinient with Object class hashCode() accounted, pure JDK dependency and no extra manual work. Please note Objects.hash() is null tolerant.
I have not include any equals() implementation but in reality you will of course need it.
import java.util.Objects;
public class Demo {
public static class A {
private final String param1;
public A(final String param1) {
this.param1 = param1;
}
#Override
public int hashCode() {
return Objects.hash(
super.hashCode(),
this.param1);
}
}
public static class B extends A {
private final String param2;
private final String param3;
public B(
final String param1,
final String param2,
final String param3) {
super(param1);
this.param2 = param2;
this.param3 = param3;
}
#Override
public final int hashCode() {
return Objects.hash(
super.hashCode(),
this.param2,
this.param3);
}
}
public static void main(String [] args) {
A a = new A("A");
B b = new B("A", "B", "C");
System.out.println("A: " + a.hashCode());
System.out.println("B: " + b.hashCode());
}
}

The standard implementation is weak and using it leads to unnecessary collisions. Imagine a
class ListPair {
List<Integer> first;
List<Integer> second;
ListPair(List<Integer> first, List<Integer> second) {
this.first = first;
this.second = second;
}
public int hashCode() {
return Objects.hashCode(first, second);
}
...
}
Now,
new ListPair(List.of(a), List.of(b, c))
and
new ListPair(List.of(b), List.of(a, c))
have the same hashCode, namely 31*(a+b) + c as the multiplier used for List.hashCode gets reused here. Obviously, collisions are unavoidable, but producing needless collisions is just... needless.
There's nothing substantially smart about using 31. The multiplier must be odd in order to avoid losing information (any even multiplier loses at least the most significant bit, multiples of four lose two, etc.). Any odd multiplier is usable. Small multipliers may lead to faster computation (the JIT can use shifts and additions), but given that multiplication has latency of only three cycles on modern Intel/AMD, this hardly matters. Small multipliers also leads to more collision for small inputs, which may be a problem sometimes.
Using a prime is pointless as primes have no meaning in the ring Z/(2**32).
So, I'd recommend using a randomly chosen big odd number (feel free to take a prime). As i86/amd64 CPUs can use a shorter instruction for operands fitting in a single signed byte, there is a tiny speed advantage for multipliers like 109. For minimizing collisions, take something like 0x58a54cf5.
Using different multipliers in different places is helpful, but probably not enough to justify the additional work.

When combining hash values, I usually use the combining method that's used in the boost c++ library, namely:
seed ^= hasher(v) + 0x9e3779b9 + (seed<<6) + (seed>>2);
This does a fairly good job of ensuring an even distribution. For some discussion of how this formula works, see the StackOverflow post: Magic number in boost::hash_combine
There's a good discussion of different hash functions at: http://burtleburtle.net/bob/hash/doobs.html

For a simple class it is often easiest to implement hashCode() based on the class fields which are checked by the equals() implementation.
public class Zam {
private String foo;
private String bar;
private String somethingElse;
public boolean equals(Object obj) {
if (this == obj) {
return true;
}
if (obj == null) {
return false;
}
if (getClass() != obj.getClass()) {
return false;
}
Zam otherObj = (Zam)obj;
if ((getFoo() == null && otherObj.getFoo() == null) || (getFoo() != null && getFoo().equals(otherObj.getFoo()))) {
if ((getBar() == null && otherObj. getBar() == null) || (getBar() != null && getBar().equals(otherObj. getBar()))) {
return true;
}
}
return false;
}
public int hashCode() {
return (getFoo() + getBar()).hashCode();
}
public String getFoo() {
return foo;
}
public String getBar() {
return bar;
}
}
The most important thing is to keep hashCode() and equals() consistent: if equals() returns true for two objects, then hashCode() should return the same value. If equals() returns false, then hashCode() should return different values.

Is there a reason I should call Integer.hashCode()?

Maybe there is a reason I don't know, but I see it's being used in my code to calculate hashcode of a complex object.
Does it provide anything comparing to putting Integer itself there? (I hope not), or it's just for a better clarity?
class SomeClass() {
private Integer myIntegerField1;
private Integer myIntegerField2;
...
public int hashCode() {
final int prime = 31;
int result =1;
result = prime * result + ((myIntegerField1 == null) ? 0 : myIntegerField1.hashCode());
result = prime * result + ....
...
return result;
}
}

The javadoc of Integer.hashCode() says:
Returns: a hash code value for this object, equal to the
primitive int value represented by this
Integer object.
So using Integer.hashCode() or Integer.intValue(), or using auto-unboxing leads to exactly the same value.

Your posted code was auto-generated by an IDE. The code generator has no special cases to handle Integer or other primitive type wrappers, and there isn't a really good reason for it to have one: the way it is implemented now is 100% by the book and on a general level of consideration is the right thing to do.
If you replaced myIntegerField1.hashCode() with just myIntegerField1, the real effect would be a change from a hashCode() call to an intValue() call, and if you check out the source code, you'll find that these two methods are exactly the same.

Composite objects can use combined hashes of their internal state to calculate their own hash code. Example:
public class Person
{
private Integer id;
private String name;
#Override
public int hashCode()
{
int hash = getClass().getName().hashCode();
if (id != null)
{
hash ^= id.hashCode();
}
if (name != null)
{
hash ^= name.hashCode();
}
return hash;
}
}
Don't make hashes overly complicated, and base hashes only on some values which don't change, or are otherwise likely to be stable. Hash codes, by their very nature, are not required to be unique or collision-free.
The hash code is just a quick and dirty finger print that allows for a quick determination whether two instances are NOT equal (if they were equal, they would have to have the same hash code), so the actual equals() check has to be executed only for instances whose hash is equals (again, same hash does NOT imply that they are equal).

There is no reason to explicitly use the hashcode of an Integer. The source code just returns the value of the Integer:
public int hashCode(){
return value;
}
So use the value of the Integer rather than the hash code.
What is the reason why this method is included in the source? What would happen if you had an Object that points to an Integer? Explicitly including the method in the source code ensures proper results.

Here you are trying to find the hashcode of SomeClass type objects.
public int hashCode() {
final int prime = 31;
int result =1;
result = prime * result + ((myIntegerField1 == null) ? 0 : myIntegerField1.hashCode());
result = prime * result + ....
...
return result;
}
In
result = prime * result + ((myIntegerField1 == null) ? 0 : myIntegerField1.hashCode());
you are trying to check if myIntegerField1==null, return hashCode as 0 else hashCode of Integer myIntegerField1.
Remember : myIntegerField1.hashCode() and myIntegerField1.intValue() will return same value as myIntegerField1.

String immutability allows hashcode value to be cached

Among the many reasons to why Strings are immutable, one of the reasons is cited as
String immutability allows hashcode value to be cached.
I did not really understand this. What is meant by caching hashcode values? Where are these values cached? Even if Strings would have been mutable, this cached hashcode value could always be updated as required; so what's the big deal?

What is meant by caching hashcode values? Where are these values cached?
After the hash code is calculated, it is stored in a variable in String.
Looking at the source of String makes this clearer:
public final class String implements ... {
...
/** Cache the hash code for the string */
private int hash; // Default to 0
...
public int hashCode() {
int h = hash;
if (h == 0 && ...) {
...
hash = h;
}
return h;
}
...
}
Even if Strings would have been mutable, this cached hashcode value could always be updated as required
True. But it would have to be recalculated / reset in every modification function. While this is possible, it's not good design.
All in all, the reason probably would've been better if it were as follows:
String immutability makes it easier to cache the hashcode value.

immutable objects and lazy initialization.

http://www.javapractices.com/topic/TopicAction.do?Id=29
Above is the article which i am looking at. Immutable objects greatly simplify your program, since they:
allow hashCode to use lazy initialization, and to cache its return value
Can anyone explain me what the author is trying to say on the above
line.
Is my class immutable if its marked final and its instance variable
still not final and vice-versa my instance variables being final and class being normal.

As explained by others, because the state of the object won't change the hashcode can be calculated only once.
The easy solution is to precalculate it in the constructor and place the result in a final variable (which guarantees thread safety).
If you want to have a lazy calculation (hashcode only calculated if needed) it is a little more tricky if you want to keep the thread safety characteristics of your immutable objects.
The simplest way is to declare a private volatile int hash; and run the calculation if it is 0. You will get laziness except for objects whose hashcode really is 0 (1 in 4 billion if your hash method is well distributed).
Alternatively you could couple it with a volatile boolean but need to be careful about the order in which you update the two variables.
Finally for extra performance, you can use the methodology used by the String class which uses an extra local variable for the calculation, allowing to get rid of the volatile keyword while guaranteeing correctness. This last method is error prone if you don't fully understand why it is done the way it is done...

If your object is immutable it can't change it's state and therefore it's hashcode can't change. That allows you to calculate the value once you need it and to cache the value since it will always stay the same. It's in fact a very bad idea to implement your own hasCode function based on mutable state since e.g. HashMap assumes that the hash can't change and it will break if it does change.
The benefit of lazy initialization is that hashcode calculation is delayed until it is required. Many object don't need it at all so you save some calculations. Especially expensive hash calculations like on long Strings benefit from that.
class FinalObject {
private final int a, b;
public FinalObject(int value1, int value2) {
a = value1;
b = value2;
}
// not calculated at the beginning - lazy once required
private int hashCode;
#Override
public int hashCode() {
int h = hashCode; // read
if (h == 0) {
h = a + b; // calculation
hashCode = h; // write
}
return h; // return local variable instead of second read
}
}
Edit: as pointed out by #assylias, using unsynchronized / non volatile code is only guaranteed to work if there is only 1 read of hashCode because every consecutive read of that field could return 0 even though the first read could already see a different value. Above version fixes the problem.
Edit2: replaced with more obvious version, slightly less code but roughly equivalent in bytecode
public int hashCode() {
int h = hashCode; // only read
return h != 0 ? h : (hashCode = a + b);
// ^- just a (racy) write to hashCode, no read
}

What that line means is, since the object is immutable, then the hashCode has to only be computed once. Further, it doesn't have to be computed when the object is constructed - it only has to be computed when the function is first called. If the object's hashCode is never used then it is never computed. So the hashCode function can look something like this:
#Override public int hashCode(){
synchronized (this) {
if (!this.computedHashCode) {
this.hashCode = expensiveComputation();
this.computedHashCode = true;
}
}
return this.hashCode;
}

And to add to other answers.
Immutable object cannot be changed. The final keyword works for basic data types such as int. But for custom objects it doesn't mean that - it has to be done internally in your implementation:
The following code would result in a compilation error, because you are trying to change a final reference/pointer to an object.
final MyClass m = new MyClass();
m = new MyClass();
However this code would work.
final MyClass m = new MyClass();
m.changeX();

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

"hash" variable in String class - java

what is the use of private "hash" variable in java.lang.String class. It is private and calculated/re-calculated every time hashcode method is called. http://hg.openjdk.java.net/jdk7u/jdk7u6/jdk/file/8c2c5d63a17e/src/share/classes/java/lang/String.java

Related

Lazy initialization of hashcode in Java

Deciding the right HashCode [duplicate]

Is there a reason I should call Integer.hashCode()?

String immutability allows hashcode value to be cached

immutable objects and lazy initialization.

Categories

Resources