Explanation of the IntelliJ default hashCode() implementation - Java

I took a look at the IntelliJ default hashCode() implementation and was wondering why they implemented it the way they did. I'm quite new to the hashing concept and have found some contradictory statements that need clarification:
public int hashCode() {
    // creationDate is of type Date
    int result = this.creationDate != null ? this.creationDate.hashCode() : 0;
    // id is of type Long (wrapper class)
    result = 31 * result + (this.id != null ? this.id.hashCode() : 0);
    // code is of type String
    result = 31 * result + (this.code != null ? this.code.hashCode() : 0);
    // revision is of type int
    result = 31 * result + this.revision;
    return result;
}
IMO, the best source on this topic seemed to be this JavaWorld article, because I found its arguments most convincing. So I was wondering:
Among other arguments, the source above states that multiplication is one of the slower operations. So wouldn't it be better to skip the multiplication by a prime number whenever I call the hashCode() method of a reference type? Most of the time, that call already includes such a multiplication.
JavaWorld also states that bitwise XOR ^ improves the computation, for reasons it doesn't explain :( What exactly might be the advantage compared to regular addition?
Wouldn't it be better to return different values when the respective class field is null? It would make the result more distinguishable, wouldn't it? Are there any big disadvantages to using non-zero values?
Their example code looks more appealing to my eye, tbh:
public int hashCode() {
    return
        (name == null ? 17 : name.hashCode()) ^
        (birth == null ? 31 : birth.hashCode());
}
But I'm not sure whether that's objectively true. I'm also a little suspicious of IntelliJ, because its default code for equals(Object) compares with instanceof instead of comparing the instance classes directly. And I agree with the JavaWorld article that this doesn't seem to fulfill the contract correctly.

As for hashCode(), I would consider minimizing collisions (two different objects having the same hashCode()) more important than the speed of the hashCode() computation. Yes, hashCode() should be fast (constant-time if possible), but for huge data structures that use hashCode() (maps, sets, etc.), collisions are the more important factor.
If your hashCode() function runs in constant time (independent of data and input size) and produces a good hash (few collisions), then asymptotically the map operations (get, contains, put) will run in constant time.
If your hashCode() function produces a lot of collisions, performance will suffer. In the extreme case, you can always return 0 from hashCode(): the function itself will be super-fast, but the map operations will run in linear time (i.e., they grow with the map size).
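To see how bad the degenerate case gets, here is a hedged sketch (all class names are made up): a key type with a constant hashCode() forces every entry into the same bucket. (Java 8 HashMaps partially mitigate this by converting large buckets into trees, but lookups still degrade sharply compared to a well-distributed hash.)

import java.util.HashMap;
import java.util.Map;

// Hypothetical key whose hashCode() is constant: every instance lands in
// the same bucket, so each put must compare against all colliding keys.
class BadKey {
    final int value;
    BadKey(int value) { this.value = value; }
    @Override public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).value == value;
    }
    @Override public int hashCode() { return 0; } // super-fast, but terrible
}

class CollisionDemo {
    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        long start = System.nanoTime();
        for (int i = 0; i < 20_000; i++) {
            map.put(new BadKey(i), i); // all 20,000 keys collide
        }
        System.out.printf("20k puts with constant hashCode: %d ms%n",
                (System.nanoTime() - start) / 1_000_000);
    }
}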
Multiplying the intermediate result before adding the next field's sub-hash usually yields fewer collisions. This is a heuristic based on the observation that fields often contain similar data and/or small numbers.
Consider an example of class Person:
class Person {
    int age;
    int heightCm;
    int weightKg;
}
If you just added the numbers together to compute the hash code, the result would lie somewhere between 60 and 500 for all persons. If you multiply the way IDEA does, you get hash codes between roughly 2,000 and more than 100,000 - a much bigger space and therefore a lower chance of collisions.
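For intuition, a small illustrative calculation (the field values are made up): plain addition stays in a narrow band, while the 31-multiplier spreads the result out.

int age = 30, heightCm = 180, weightKg = 80;

// Plain addition: always lands in roughly 60..500 for realistic persons.
int sumHash = age + heightCm + weightKg;              // 290

// IDEA-style combination: 961 * age + 31 * heightCm + weightKg
int ideaHash = 31 * (31 * age + heightCm) + weightKg; // 34490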
Using XOR to combine fields is not a very good idea: for example, if you have a class Rectangle with fields height and width, all squares would get the same hash code - 0.
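A quick sketch of that pitfall, assuming hypothetical height and width fields:

int height = 5, width = 5;          // any square
int xorHash = height ^ width;       // 0 - every square collides
int mulHash = 31 * height + width;  // 160 - squares of different sizes differ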
As for equals() using instanceof vs. getClass().equals(), I've never seen a conclusive debate on this. Both have their advantages and disadvantages, and both can cause trouble if you're not careful:
If you use instanceof, any subclass that overrides your equals() will likely break the symmetry requirement (see the sketch after this list)
If you use getClass().equals(), it will not work well with frameworks like Hibernate that produce their own subclasses of your classes to store their own technical information
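To make the first point concrete, here is a minimal sketch of the symmetry violation, adapted from the classic Effective Java example (the classes are illustrative):

class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
    @Override public boolean equals(Object o) {
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return p.x == x && p.y == y;
    }
    @Override public int hashCode() { return 31 * x + y; }
}

class ColorPoint extends Point {
    final int color;
    ColorPoint(int x, int y, int color) { super(x, y); this.color = color; }
    @Override public boolean equals(Object o) {
        if (!(o instanceof ColorPoint)) return false;
        return super.equals(o) && ((ColorPoint) o).color == color;
    }
}

// new Point(1, 2).equals(new ColorPoint(1, 2, 0xFF0000))  -> true
// new ColorPoint(1, 2, 0xFF0000).equals(new Point(1, 2))  -> false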

Related

Can I use identityHashCode to produce a compareTo between Objects respecting same-ness?

I want to implement a simple comparator between two Objects, whose only requirements are that
it is a valid comparator (i.e. defines a linear order on all objects) and
.compare will return 0 if and only if the objects are the same.
Will Comparator.comparing(System::identityHashCode) work? Is there another way?
Motivation:
I want to build a collection that will allow me to store time-stamped messages in a thread-safe collection, which will support queries like "get me all the messages whose timestamp lies in [a,b)".
It seems that Guava's TreeMultimap uses a global lock (edit: if wrapped with the synchronizedSortedSetMultimap wrapper), and ConcurrentSkipListMap seems to support only one entry per time (it is a map, not a multimap). So I thought of using just a set of pairs:
ConcurrentSkipListSet<ImmutablePair<Float,Message>> db,
where the pairs are lexically ordered, first by the times (using Float.compareTo) and then by something like Comparator.nullsFirst(Comparator.comparing(System::identityHashCode)).
The nullsFirst is there just so db.subSet(ImmutablePair.of(a,null), ImmutablePair.of(b,null)) queries the half-open time interval [a,b).
You see why I care about the comparator preserving sameness: if the message comparator returns zero for non-same messages, messages may be deleted.
You also see why I don't need much else from the comparator: it's just there so I can use the storage mechanism of ConcurrentSkipListSet. I certainly don't want to impose on the user (well, just me :-) to implement a comparator for Message.
Another possible solution is to use a ConcurrentSkipListMap<Float, Set<Message>> (with thread-safe Set<> instances), but that seems a bit wasteful in terms of memory, and I would need to remove empty sets myself once messages are deleted.
EDIT: As several people noted, identityHashCode may produce collisions, and in fact I've now confirmed that such collisions exist in my setup (which is roughly equivalent to having 4K collections as above, each populated with 4K messages per time bin). This is most likely the reason I see some messages dropped. So I'm now even more interested than ever in finding some way to have an "agnostic" comparison operator, that truly respects sameness. Actually, a 64 bit hash value (instead of the 32bit value provided by identityHashCode) would probably suffice.
While it's not guaranteed, I suspect the chances of this causing a problem are vanishingly small.
System.identityHashCode returns the value that Object.hashCode would return if not overridden, including this in the documentation:
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects.
So is "as much as is reasonably practical" sufficient? While it's not guaranteed, I would be very surprised if you ever ran into a situation where it causes a problem. You'd have to have two messages with exactly the same timestamp and where the JVM's Object.hashCode implementation returns the same value for the two messages.
If the result of that coincidence were to be "nuclear power plant explodes" then I wouldn't risk it. If the result of that coincidence were to be "we fail to bill a customer" - or even "we bill a customer twice, and might get sued" I'd probably accept that chance, if no better alternatives are suggested.
As @StuartMarks noted in his comment, Guava provides Ordering.arbitrary(), which offers thread-safe collision handling. The implementation makes efficient use of identityHashCode:
@Override
public int compare(Object left, Object right) {
    if (left == right) {
        return 0;
    } else if (left == null) {
        return -1;
    } else if (right == null) {
        return 1;
    }
    int leftCode = identityHashCode(left);
    int rightCode = identityHashCode(right);
    if (leftCode != rightCode) {
        return leftCode < rightCode ? -1 : 1;
    }
    // identityHashCode collision (rare, but not as rare as you'd think)
    int result = getUid(left).compareTo(getUid(right));
    if (result == 0) {
        throw new AssertionError(); // extremely, extremely unlikely.
    }
    return result;
}
so getUid (which uses a memoized AtomicInteger counter to allocate uids) is invoked only if there is a hash collision.
It's also quite easy to write (though perhaps less easy to read?) the desired timestamped message container in "one" line:
db = new ConcurrentSkipListSet<>(
    Ordering.<Float>natural()
        .<ImmutablePair<Float, Message>>onResultOf(x -> x.left)
        .compound(Ordering.arbitrary()
            .nullsFirst()
            .<ImmutablePair<Float, Message>>onResultOf(x -> x.right)));
Will Comparator.comparing(System::identityHashCode) work? Is there another way?
As mentioned, identityHashCode is not unique.
Actually, a 64 bit hash value (instead of the 32bit value provided by identityHashCode) would probably suffice
I think this would just reduce the chance of overlap, not remove it. Hash algorithms are designed to limit collisions, but they typically come with no guarantee against them. For example, MD5 is 128-bit and still has collisions.
How about just assigning a unique number to each message with an AtomicLong? Then your comparison function would:
Compare by time. I would use long if possible instead of float.
If same time then compare by the unique value.
If you have multiple systems ingesting these messages, then you will need to record a unique system ID together with the message number to ensure uniqueness.
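A minimal sketch of that idea (StampedMessage and its fields are hypothetical): each message gets a sequence number at construction, so the comparator returns 0 only for the very same instance.

import java.util.Comparator;
import java.util.concurrent.atomic.AtomicLong;

class StampedMessage {
    private static final AtomicLong SEQ = new AtomicLong();
    final long time;                         // long timestamp, as suggested above
    final long seq = SEQ.getAndIncrement();  // unique per message
    final String payload;
    StampedMessage(long time, String payload) {
        this.time = time;
        this.payload = payload;
    }
}

Comparator<StampedMessage> byTimeThenSeq =
        Comparator.comparingLong((StampedMessage m) -> m.time)
                  .thenComparingLong(m -> m.seq);  // 0 only for the same instance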

How do I prove that Object.hashCode() can produce the same hash code for two different objects in Java?

I had a discussion with an interviewer about the internal implementation of Java HashMaps and how it would behave if we override equals() but not the hashCode() method for an Employee<Emp_ID, Emp_Name> object.
I was told that the hash code for two different objects would never be the same with the default Object.hashCode() implementation, unless we overrode hashCode() ourselves.
From what I remembered, I told him that the Java hashCode contract says that two different objects "may" have the same hashCode(), not that they "must".
According to my interviewer, the default Object.hashCode() never returns the same hash code for two different objects. Is this true?
Is it even remotely possible to write code that demonstrates a collision? From what I understand, Object.hashCode() can produce 2^30 unique values; how does one produce a collision, with such a low probability of collision, to demonstrate that two different objects can get the same hashCode() from the Object class's method?
Or is he right that, with the default Object.hashCode() implementation, we will never have a collision, i.e. two different objects can never have the same hash code? If so, why don't the many Java manuals explicitly say so?
How can I write some code to demonstrate this? By demonstrating it, I could also show that a bucket in a HashMap can contain entries with different hash codes. (I tried to show him the debugger with the HashMap expanded, but he told me that was just the logical implementation and not the internal algorithm.)
2^30 unique values sounds like a lot but the birthday problem means we don't need many objects to get a collision.
The following program works for me in about a second and finds a collision between objects 196 and 121949. I suspect the exact result will depend heavily on your system configuration, compiler version, etc.
As you can see from the implementation of the Hashable class, every instance is guaranteed to be unique, and yet there are still collisions.
class HashCollider
{
    static class Hashable
    {
        private static int curr_id = 0;
        public final int id;

        Hashable()
        {
            id = curr_id++;
        }
    }

    public static void main(String[] args)
    {
        final int NUM_OBJS = 200000; // birthday problem suggests
                                     // this will be plenty
        Hashable[] objs = new Hashable[NUM_OBJS];
        for (int i = 0; i < NUM_OBJS; ++i) objs[i] = new Hashable();

        for (int i = 0; i < NUM_OBJS; ++i)
        {
            for (int j = i + 1; j < NUM_OBJS; ++j)
            {
                if (objs[i].hashCode() == objs[j].hashCode())
                {
                    System.out.println("Objects with IDs " + objs[i].id
                            + " and " + objs[j].id + " collided.");
                    System.exit(0);
                }
            }
        }
        System.out.println("No collision");
    }
}
If you have a large enough heap (assuming a 64-bit address space) and objects are small enough (the smallest object size on a 64-bit JVM is 8 bytes), then you will be able to have more than 2^32 objects reachable at the same time. At that point, the objects' identity hash codes cannot all be unique.
However, you don't need a monstrous heap. If you create a large enough pool of objects (e.g. in a large array) and randomly delete and recreate them, it is (I think) guaranteed that you will get a hashcode collision ... if you continue doing this long enough.
The default algorithm for hashcode in older versions of Java is based on the address of the object when hashcode is first called. If the garbage collector moves an object, and another one is created at the original address of the first one, and identityHashCode is called, then the two objects will have the same identity hashcode.
The current (Java 8) default algorithm uses a PRNG. The "birthday paradox" formula will tell you the probability that one object's identity hash code is the same as one or more of the others'.
The -XX:hashCode=n option that @BastianJ mentioned has the following behavior:
hashCode == 0: Returns a freshly generated pseudo-random number
hashCode == 1: XORs the object address with a pseudo-random number that changes occasionally.
hashCode == 2: The hashCode is 1! (Hence @BastianJ's "cheat" answer.)
hashCode == 3: The hashcode is an ascending sequence number.
hashCode == 4: the bottom 32 bits of the object address
hashCode >= 5: This is the default algorithm for Java 8. It uses Marsaglia's xor-shift PRNG with a thread-specific seed.
If you have downloaded the OpenJDK Java 8 source code, you will find the implementation in hotspot/src/share/vm/runtime/synchronizer.cpp. Look for the get_next_hash() function.
So that is another way to prove it. Show him the source code!
Use the Oracle JVM and set -XX:hashCode=2. If I remember correctly, this sets the default implementation to "constant 1". Just for the purpose of proving you're right.
I have little to add to Michael's answer (+1) except a bit of code golfing and statistics.
The Wikipedia article on the Birthday problem that Michael linked to has a nice table of the number of events necessary to get a collision, with a desired probability, given a value space of a particular size. For example, Java's hashCode has 32 bits, giving a value space of 4 billion. To get a collision with a probability of 50%, about 77,000 events are necessary.
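For reference, the approximation behind that number: with m = 2^32 possible hash values and a target collision probability of 50%, the number of objects needed is about

n ≈ √(2m · ln 2) = √(2 · 2^32 · 0.693...) ≈ 77,163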
Here's a simple way to find two instances of Object that have the same hashCode:
static int findCollision() {
    Map<Integer, Object> map = new HashMap<>();
    Object n, o;
    do {
        n = new Object();
        o = map.put(n.hashCode(), n);
    } while (o == null);
    assert n != o && n.hashCode() == o.hashCode();
    return map.size() + 1;
}
This returns the number of attempts it took to get a collision. I ran this a bunch of times and generated some statistics:
System.out.println(
    IntStream.generate(HashCollisions::findCollision)
             .limit(1000)
             .summaryStatistics());
IntSummaryStatistics{count=1000, sum=59023718, min=635, average=59023.718000, max=167347}
This seems quite in line with the numbers from the Wikipedia table. Incidentally, this took only about 10 seconds to run on my laptop, so this is far from a pathological case.
You were right in the first place, but it bears repeating: hash codes are not unique!

Is the hashCode function generated by Eclipse any good?

Eclipse's Source menu has a "generate hashCode / equals method" action, which generates functions like the ones below.
String name;

@Override
public int hashCode()
{
    final int prime = 31;
    int result = 1;
    result = prime * result + ((name == null) ? 0 : name.hashCode());
    return result;
}

@Override
public boolean equals(Object obj)
{
    if (this == obj)
        return true;
    if (obj == null)
        return false;
    if (getClass() != obj.getClass())
        return false;
    CompanyRole other = (CompanyRole) obj;
    if (name == null)
    {
        if (other.name != null)
            return false;
    }
    else if (!name.equals(other.name))
        return false;
    return true;
}
If I select multiple fields when generating hashCode() and equals() Eclipse uses the same pattern shown above.
I am not an expert on hash functions and I would like to know how "good" the generated hash function is? What are situations where it will break down and cause too many collisions?
You can see the implementation of the hashCode function in java.util.ArrayList:
public int hashCode() {
    int hashCode = 1;
    Iterator<E> i = iterator();
    while (i.hasNext()) {
        E obj = i.next();
        hashCode = 31 * hashCode + (obj == null ? 0 : obj.hashCode());
    }
    return hashCode;
}
This is one such example, and your Eclipse-generated code follows a similar way of implementing it. But if you feel you have to implement hashCode on your own, there are some good guidelines given by Joshua Bloch in his famous book Effective Java. I will post the important points from Item 9 of that book below; a short sketch applying these steps follows the list.
Store some constant nonzero value, say, 17, in an int variable called result.
For each significant field f in your object (each field taken into account by the equals method, that is), do the following:
a. Compute an int hash code c for the field:
i. If the field is a boolean, compute (f ? 1 : 0).
ii. If the field is a byte, char, short, or int, compute (int) f.
iii. If the field is a long, compute (int) (f ^ (f >>> 32)).
iv. If the field is a float, compute Float.floatToIntBits(f).
v. If the field is a double, compute Double.doubleToLongBits(f), and then hash the resulting long as in step 2.a.iii.
vi. If the field is an object reference and this class’s equals method compares the field by recursively invoking equals, recursively invoke hashCode on the field. If a more complex comparison is required, compute a “canonical representation” for this field and invoke hashCode on the canonical representation. If the value of the field is null, return 0 (or some other constant, but 0 is traditional)
vii. If the field is an array, treat it as if each element were a separate field. That is, compute a hash code for each significant element by applying these rules recursively, and combine these values per step 2.b. If every element in an array field is significant, you can use one of the Arrays.hashCode methods added in release 1.5.
b. Combine the hash code c computed in step 2.a into result as follows:
result = 31 * result + c;
Return result.
When you are finished writing the hashCode method, ask yourself whether equal instances have equal hash codes. Write unit tests to verify your intuition! If equal instances have unequal hash codes, figure out why and fix the problem.
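Here is a minimal sketch applying these steps to a hypothetical class with one field of each common kind (the class and field names are illustrative):

class Sample {
    boolean flag;
    int count;
    long timestamp;
    double ratio;
    String name;  // may be null

    @Override
    public int hashCode() {
        int result = 17;                                                // step 1
        result = 31 * result + (flag ? 1 : 0);                          // 2.a.i
        result = 31 * result + count;                                   // 2.a.ii
        result = 31 * result + (int) (timestamp ^ (timestamp >>> 32));  // 2.a.iii
        long bits = Double.doubleToLongBits(ratio);                     // 2.a.v
        result = 31 * result + (int) (bits ^ (bits >>> 32));
        result = 31 * result + (name == null ? 0 : name.hashCode());    // 2.a.vi
        return result;                                                  // step 3
    }
}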
Java language designers and Eclipse seem to follow similar guidelines I suppose. Happy coding. Cheers.
Since Java 7 you can use java.util.Objects to write short and elegant methods:
class Foo {
    private String name;
    private String id;

    @Override
    public int hashCode() {
        return Objects.hash(name, id);
    }

    @Override
    public boolean equals(Object obj) {
        if (obj instanceof Foo) {
            Foo right = (Foo) obj;
            return Objects.equals(name, right.name) && Objects.equals(id, right.id);
        }
        return false;
    }
}
Generally it is good, but:
Guava does it somewhat better; I prefer it. [EDIT: It seems that as of JDK 7, Java provides a similar hash function.]
Some frameworks can cause problems when fields are accessed directly instead of through setters/getters - Hibernate, for example. For fields that Hibernate creates lazily, it creates a proxy, not the real object. Only calling the getter makes Hibernate fetch the real value from the database.
Yes, it is perfect :) You will see this approach almost everywhere in the Java source code.
It's a standard way of writing hash functions. However, you can improve/simplify it if you have some knowledge about the fields. E.g., you can omit the null check if your class guarantees that the field is never null (this applies to equals() as well). Or you can simply delegate to the field's hash code if only one field is used.
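For example, a sketch of the single-field case (assuming the class guarantees that name is never null):

@Override
public int hashCode() {
    return name.hashCode();  // delegate: name is the only significant field
}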
I would also like to add a reference to Item 9, in Effective Java 2nd Edition by Joshua Bloch.
Here is the recipe from Item 9: ALWAYS OVERRIDE HASHCODE WHEN YOU OVERRIDE EQUALS
Store some constant nonzero value, say, 17, in an int variable called result.
For each significant field f in your object (each field taken into account by the equals method, that is), do the following:
a. Compute an int hash code c for the field:
i. If the field is a boolean, compute (f ? 1 : 0).
ii. If the field is a byte, char, short, or int, compute (int) f.
iii. If the field is a long, compute (int) (f ^ (f >>> 32)).
iv. If the field is a float, compute Float.floatToIntBits(f).
v. If the field is a double, compute Double.doubleToLongBits(f), and then hash the resulting long as in step 2.a.iii.
vi. If the field is an object reference and this class’s equals method compares the field by recursively invoking equals, recursively invoke hashCode on the field. If a more complex comparison is required, compute a “canonical representation” for this field and invoke hashCode on the canonical representation. If the value of the field is null, return 0 (or some other constant, but 0 is traditional).
vii. If the field is an array, treat it as if each element were a separate field. That is, compute a hash code for each significant element by applying these rules recursively, and combine these values per step 2.b. If every element in an array field is significant, you can use one of the Arrays.hashCode methods added in release 1.5.
b. Combine the hash code c computed in step 2.a into result as follows: result = 31 * result + c;
3. Return result.
4. When you are finished writing the hashCode method, ask yourself whether equal instances have equal hash codes. Write unit tests to verify your intuition! If equal instances have unequal hash codes, figure out why and fix the problem.
If you are using the Apache Commons Lang library, the classes below will help you generate hashCode/equals/toString methods using reflection.
You don't need to worry about regenerating hashCode/equals/toString methods when you add or remove instance variables.
EqualsBuilder - This class provides methods to build a good equals method for any class. It follows rules laid out in Effective Java , by Joshua Bloch. In particular the rule for comparing doubles, floats, and arrays can be tricky. Also, making sure that equals() and hashCode() are consistent can be difficult.
HashCodeBuilder - This class enables a good hashCode method to be built for any class. It follows the rules laid out in the book Effective Java by Joshua Bloch. Writing a good hashCode method is actually quite difficult. This class aims to simplify the process.
ReflectionToStringBuilder - This class uses reflection to determine the fields to append. Because these fields are usually private, the class uses AccessibleObject.setAccessible(java.lang.reflect.AccessibleObject[], boolean) to change the visibility of the fields. This will fail under a security manager, unless the appropriate permissions are set up correctly.
Maven Dependency:
<dependency>
    <groupId>commons-lang</groupId>
    <artifactId>commons-lang</artifactId>
    <version>${commons.lang.version}</version>
</dependency>
Sample Code:
import org.apache.commons.lang.builder.EqualsBuilder;
import org.apache.commons.lang.builder.HashCodeBuilder;
import org.apache.commons.lang.builder.ReflectionToStringBuilder;

public class Test {

    // instance variables...
    // ....
    // getter/setter methods...
    // ....

    @Override
    public String toString() {
        return ReflectionToStringBuilder.toString(this);
    }

    @Override
    public int hashCode() {
        return HashCodeBuilder.reflectionHashCode(this);
    }

    @Override
    public boolean equals(Object obj) {
        return EqualsBuilder.reflectionEquals(this, obj);
    }
}
One potential drawback is that every object whose name is null gets the same hash code, 31, so there could be many collisions between objects that contain only null fields. This would make for slower lookups in Maps.
This can occur when you have a Map whose key type has multiple subclasses. For example, in a HashMap<Object, Object> you could have many key values whose hash code is 31. Admittedly, this won't occur that often. If you like, you can change the prime to some value other than 31 to lessen the probability of collisions.

Why does Object.hashCode() have conflicts in Java?

I ran the code below on HotSpot JDK 1.6 on Windows XP.
I ran it twice and got the two results below.
So basically it seems Object.hashCode() can also have conflicts?
It looks like it's not returning the memory address in the VM.
However, a comment in the JDK says the values should be distinct - can anyone explain?
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)
@return a hash code value for this object.
@see java.lang.Object#equals(java.lang.Object)
@see java.util.Hashtable
This is the first result:
i,hashcode(): 361,9578500
i,hashcode(): 1886,9578500
conflict:1886, 361
i,hashcode(): 1905,14850080
i,hashcode(): 2185,14850080
conflict:2185, 1905
9998
This is the 2nd result:
i,hashcode(): 361,5462872
i,hashcode(): 1886,29705835
conflict:1887, 362
i,hashcode(): 1905,9949222
i,hashcode(): 2185,2081190
conflict:2186, 1906
9998
10000
My code:
@Test
public void testAddr()
{
    Set<Integer> s = new TreeSet<Integer>();
    Map<Integer, Integer> m = new TreeMap<Integer, Integer>();
    Set<Object> os = new HashSet<Object>();
    for (int i = 0; i < 10000; ++i)
    {
        Object o = new Object();
        os.add(o);
        Integer h = o.hashCode();
        if ((i == 361) || (i == 1886) || (i == 2185) || (i == 1905))
        {
            System.out.println("i,hashcode(): " + i + "," + h);
        }
        if (s.contains(h))
        {
            System.out.println("conflict:" + i + ", " + m.get(h));
        }
        else
        {
            s.add(h);
            m.put(h, i);
        }
    }
    System.out.println(s.size());

    int c = 0;
    for (Object o : os)
    {
        c++;
    }
    System.out.println(c);
}
hashCode() is meant to be used for placing objects in hash tables. The rule for hashCode is not that it should never generate conflicts, although that is a desirable property, but that equal objects must have equal hash codes. This does not preclude non-equal objects from having equal hash codes.
You have found a case where the default Object.hashCode() implementation does generate equal hash codes for non-equal objects. (The hash code of an object is required not to change unless some field affecting its equality with other objects changes.) One possible cause is that the garbage collector rearranged memory so that a later instantiation of o sat at the same address as an earlier instantiation of o: you allocated two objects o in the loop, and in between the two allocations the garbage collector moved the old o out of its memory location, so the new o was allocated at that location. Since the default hash code is derived from the address where the object is stored, and the hash code of the old o cannot change, the hash code of the new o happens to equal that of the old o.
It's an unfortunately common misinterpretation of the API docs. From a still-unfixed (1 vote) bug report filed about this some time ago:
(spec) System.identityHashCode doc inadequate, Object.hashCode default
implementation docs mislead
[...]
From Usenet discussions and Open Source Software it appears that
many, perhaps majority, of programmers take this to mean that the
default implementation, and hence System.identityHashCode, will
produce unique hashcodes.
The suggested implementation technique is not even appropriate to
modern handleless JVMs, and should go the same way as JVM Spec Chapter
9.
The qualification "As much as is reasonably practical," is, in
practice, insufficient to make clear that hashcodes are not, in
practice, distinct.
It is possible that a long-running program may create, call hashCode() upon, and abandon many billions of objects during the time that it runs. Thus, it would be mathematically impossible to ensure that once some object's hashCode() returned a particular number, no other object would ever return that same number for the life of the program. Even if hashCode() somehow managed to return unique values for the first 4,294,967,296 objects, it would have no choice but to return an already-used value for the next one (since the previous call would have used the last remaining formerly-unused value).
The fact that hashCode() clearly cannot guarantee that hash values won't get reused for the life of the program does not mean it couldn't guarantee that hash codes won't get reused during the lifetime of the objects in question. Indeed, for some memory-management schemes, such a guarantee could be made relatively cheaply. For example, the 1984 Macintosh split the heap into two parts, one of which held fixed-sized object descriptors, and one of which held variable-sized object data. The object descriptors, once created, would never move; if any objects were deleted, the space used by their descriptors would get reused when new objects were created. Under such a scheme, the address of an object descriptor would represent a unique and unchanging representation of its identity for as long as the object existed, and could thus be used as a hashCode() value. Unfortunately, such schemes tend to have more overhead than some other approaches in which objects have no fixed address associated with them.
The comment does not say that it is distinct.
It says that it is distinct as much as is reasonably practical.
Apparently, you found a case where it wasn't practical.
Hash codes do not have to be unique, just consistent. That said, more often than not they are fairly unique in practice.
In addition to your excerpt above, the Object documentation has the following to say:
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)
Object Doc

What is the best practice for writing a hash function in Java?

I'm wondering what the best practice is for writing a hashCode() method in Java.
A good description can be found here. Is it that good?
Here's a quote from Effective Java 2nd Edition, Item 9: "Always override hashCode when you override equals":
While the recipe in this item yields reasonably good hash functions, it does not yield state-of-the-art hash functions, nor do Java platform libraries provide such hash functions as of release 1.6. Writing such hash functions is a research topic, best left to mathematicians and computer scientists. [... Nonetheless,] the techniques described in this item should be adequate for most applications.
Josh Bloch's recipe
Store some constant nonzero value, say 17, in an int variable called result
Compute an int hashcode c for each field f that defines equals:
If the field is a boolean, compute (f ? 1 : 0)
If the field is a byte, char, short, int, compute (int) f
If the field is a long, compute (int) (f ^ (f >>> 32))
If the field is a float, compute Float.floatToIntBits(f)
If the field is a double, compute Double.doubleToLongBits(f), then hash the resulting long as in above
If the field is an object reference and this class's equals method compares the field by recursively invoking equals, recursively invoke hashCode on the field. If the value of the field is null, return 0
If the field is an array, treat it as if each element is a separate field. If every element in an array field is significant, you can use one of the Arrays.hashCode methods added in release 1.5
Combine the hashcode c into result as follows: result = 31 * result + c;
Now, of course that recipe is rather complicated, but luckily, you don't have to reimplement it every time, thanks to java.util.Arrays.hashCode(Object[]).
@Override public int hashCode() {
    return Arrays.hashCode(new Object[] {
        myInt,    // auto-boxed
        myDouble, // auto-boxed
        myString,
    });
}
As of Java 7 there is a convenient varargs variant in java.util.Objects.hash(Object...).
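For comparison, here is the same example using the Java 7+ variant (reusing the hypothetical fields from the snippet above); Objects.hash likewise auto-boxes its arguments:

@Override public int hashCode() {
    return Objects.hash(myInt, myDouble, myString);  // java.util.Objects, Java 7+
}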
A great reference for an implementation of hashCode() is described in the book Effective Java. After you understand the theory behind generating a good hash function, you may check out HashCodeBuilder from Apache Commons Lang, which implements what's described in the book. From the docs:
This class enables a good hashCode method to be built for any class. It follows the rules laid out in the book Effective Java by Joshua Bloch. Writing a good hashCode method is actually quite difficult. This class aims to simplify the process.
It's good, as @leonbloy says, to understand it well. Even then, however, one "best" practice is to simply let your IDE write the function for you. It won't be optimal under some circumstances - and in some very rare circumstances it won't even be good - but for most situations, it's easy, repeatable, error-free, and as good (as a hash code) as it needs to be. Sure, read the docs and understand it well - but don't complicate it unnecessarily.
