hashCode() method [duplicate] - java

I need some help better understanding the hashCode() method in a theoretical way. I've read (emphasis mine):
When hashCode() is called on two separate objects (which are equal according to the equals() method) it returns the same hash code value. However, if it is called on two unequal objects, it will not necessarily return different integer values.
Where can said exceptions occur?

Suppose you have a class with two String fields whose hash code is calculated by summing the hash codes of those two fields. Further suppose its equals simply checks whether the corresponding fields are equal.
class Test {
    String a;
    String b;
    public Test(String a, String b) {
        this.a = a;
        this.b = b;
    }
    @Override
    public int hashCode() {
        return a.hashCode() + b.hashCode();
    }
    @Override
    public boolean equals(Object o) { // simplified: no null or type check
        Test other = (Test) o;
        return a.equals(other.a) && b.equals(other.b);
    }
}
Let's see if non-equal instances can have the same hash code:
Test t1 = new Test("hello", "world");
Test t2 = new Test("world", "hello");
System.out.println(t1.equals(t2)); // false
System.out.println(t1.hashCode() == t2.hashCode()); // true
Are we still respecting hashCode's contract?
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
Well, yes, since it only depends on a and b and we're using their hashCode method which we can assume respects the contract.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It does
Test t1 = new Test("hello", "world");
Test t2 = new Test("hello", "world");
System.out.println(t1.equals(t2)); // true
System.out.println(t1.hashCode() == t2.hashCode()); // true
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
That's what we were trying to demonstrate in the first place. It's not a requirement.

public int hashCode() {
    return 27;
}
Believe it or not, this is a valid implementation of hashCode, although not a very efficient one, since it respects the contract with the equals method. This implementation will cause exactly what you describe: unequal objects returning the same hash code.
The hashCode is used to limit the number of cases to compare to.
For instance, suppose you are looking for a student in a high school and you only know the student's name, gender and age. Are you going to look through all the students, or only the ones with that age and gender?
The hashCode does the same, for certain data structures. When looking for an item, it will first make a sub-list/collection of the items with an identical hashCode, and then, it searches for the exact item within that sub-list/collection.
The more specific your hash code, the more efficient the search.
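As a rough illustration of the "sub-list by hashCode, then exact search" idea above, here is a toy sketch. ToyBucketSet and its members are hypothetical names made up for this example, and real collections such as HashMap are implemented differently (they map hash codes onto an array of buckets).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy structure: groups items by hashCode(), then uses equals()
// only inside the matching group. Not how java.util.HashMap really works.
final class ToyBucketSet<T> {
    private final Map<Integer, List<T>> buckets = new HashMap<>();
    int equalsCalls = 0; // how many equals() comparisons the lookups needed

    void add(T item) {
        buckets.computeIfAbsent(item.hashCode(), h -> new ArrayList<>()).add(item);
    }

    boolean contains(T item) {
        List<T> bucket = buckets.get(item.hashCode());
        if (bucket == null) {
            return false; // nothing shares this hash code, so nothing can be equal
        }
        for (T candidate : bucket) {
            equalsCalls++;
            if (candidate.equals(item)) {
                return true;
            }
        }
        return false;
    }
}

final class ToyBucketDemo {
    public static void main(String[] args) {
        ToyBucketSet<String> names = new ToyBucketSet<>();
        names.add("alice");
        names.add("bob");
        names.add("carol");
        System.out.println(names.contains("bob"));  // true
        System.out.println(names.contains("dave")); // false
        System.out.println("equals() calls: " + names.equalsCalls);
    }
}
```

The fewer items share a hash code, the fewer equals() calls a lookup needs, which is the efficiency point made above.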

Related

Item-9: "Always override hashCode() when you override equals"

With respect to 3 contracts mentioned below:
1) Whenever hashCode() is invoked on the same object more than once during an execution of an application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
From this statement, I understand that, within a single execution of an application, if hashCode() is called one or more times on the same object, it should return the same value.
2) If two objects are equal according to the equals(Object) method, then calling the hashCode() method on each of the two objects must produce the same integer result.
From this statement, I understand that when implementing equality (in a broad sense) in a subclass, there are at least four different degrees of equality.
(a) Reference equality(==), comparing the internal address of two reference type objects.
(b) Shallow structural equality: two objects are "equals" if all their fields are ==.
{ For example, two SingleLinkedList whose "size" fields are equal and whose "head" field point to the same SListNode.}
(c) Deep structural equality: two objects are "equals" if all their fields are "equals".
{For example, two SingleLinkedList that represent the same sequence of "items" (though the SListNodes may be different).}
(d) Logical equality. {Two examples:
(a) Two "Set" objects are "equals" if they contain the same elements, even if the underlying lists store the elements in different orders.
(b) The Fractions 1/3 and 2/6 are "equals", even though their numerators and denominators are all different.}
Based on the above four categories of equality, the second contract holds only if hashCode() follows the same notion of equality as equals(): if, say, equals() returns its truth value based on logical equality between two objects, then hashCode() must also take logical equality into account when generating the integer for every new object, instead of using the internal address of the new object.
But I have a problem understanding the third contract.
3) IT IS NOT REQUIRED that if two objects are unequal according to the equals(Object) method, then calling the hashCode() method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
The second contract says that hashCode() should be implemented consistently with equals() (for example, by considering logical equality before generating the integer). Given that, it seems wrong to me to say, as the third contract does, that hashCode() may produce the same integer for two objects that are unequal according to equals(Object). By the argument in the second contract, hashCode() must produce distinct integers. Someone who just writes return 42 in hashCode() is breaking the second contract!
Please help me understand this point!
It would be impossible for hashCode() to always return different values for unequal objects. For example, there are 2^64 different Long values, but only 2^32 possible int values. Therefore the hashCode() method for Long has to have some repeats. In situations like this you have to try hard to ensure that your hashCode() method distributes values as evenly as possible, and is unlikely to produce repeats for the instances you are most likely to use in practice.
The second condition just says that two instances that are equal according to equals() must return the same hashCode() value, so this program must print true:
Long a = Long.MIN_VALUE;
Long b = Long.MIN_VALUE;
System.out.println(a.hashCode() == b.hashCode()); // a.equals(b), so must print true.
However this program also prints true:
Long c = 0L;
Long d = 4294967297L;
System.out.println(c.hashCode() == d.hashCode()); // prints true even though !c.equals(d)
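The collision above follows from the formula documented for Long.hashCode(), which folds the upper 32 bits into the lower 32 bits with XOR. A small sketch (the names LongHashSketch and foldedHash are made up for illustration):

```java
final class LongHashSketch {
    // Mirrors the formula given in the Javadoc of java.lang.Long#hashCode().
    static int foldedHash(long value) {
        return (int) (value ^ (value >>> 32));
    }

    public static void main(String[] args) {
        long c = 0L;
        long d = 4294967297L; // 2^32 + 1: upper half is 1, lower half is 1, and 1 ^ 1 == 0

        System.out.println(foldedHash(c) == Long.hashCode(c)); // true
        System.out.println(foldedHash(d) == Long.hashCode(d)); // true
        System.out.println(foldedHash(c) == foldedHash(d));    // true: both are 0
    }
}
```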
hashCode() does not have to produce a distinct result. return 0; is a perfectly legal implementation of hashCode() - it ensures that two equal objects will have the same hash code. But it will ensure dismal performance when using HashMaps and HashSets.
It's preferable for hashCode() to return distinct values (i.e., objects that are not equal should have different hash codes), but it's not required.
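To make the "legal but slow" point concrete, here is a minimal sketch with a constant hash code. ConstantHashKey is a hypothetical class invented for this example; the set still behaves correctly, it just has to fall back on equals() for every key because all keys collide.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical key type whose hashCode() is constant.
final class ConstantHashKey {
    final String name;

    ConstantHashKey(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ConstantHashKey && name.equals(((ConstantHashKey) o).name);
    }

    @Override
    public int hashCode() {
        return 0; // legal: equal objects trivially share it, but every key collides
    }
}

final class ConstantHashDemo {
    static Set<ConstantHashKey> build() {
        Set<ConstantHashKey> set = new HashSet<>();
        set.add(new ConstantHashKey("a"));
        set.add(new ConstantHashKey("b"));
        set.add(new ConstantHashKey("a")); // duplicate, detected through equals()
        return set;
    }

    public static void main(String[] args) {
        Set<ConstantHashKey> set = build();
        System.out.println(set.size());                             // 2
        System.out.println(set.contains(new ConstantHashKey("b"))); // true
    }
}
```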
The second contract states what happens when equals() returns true. It does not say anything about the case when equals() returns false.
The third contract is just a reminder about that fact. It reminds you that when equals() is false for two objects, there is no connection between their hash codes. They may be same or different, as the implementation happens to make them.
The third point means that you can have many unequal objects with the same hash code. For example, two String objects can have the same hash code. The second point states that two equal objects must have the same hash code. return 5 is a valid hash implementation because it returns the same value for two equal objects.
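A well-known pair of unequal strings with the same hash code is "Aa" and "BB". The sketch below just checks that; StringCollisionDemo and sameHash are illustrative names.

```java
final class StringCollisionDemo {
    static boolean sameHash(String first, String second) {
        return first.hashCode() == second.hashCode();
    }

    public static void main(String[] args) {
        // String.hashCode() is specified as s[0]*31^(n-1) + ... + s[n-1], so:
        // "Aa" -> 65 * 31 + 97 = 2112 and "BB" -> 66 * 31 + 66 = 2112
        System.out.println("Aa".equals("BB"));    // false
        System.out.println(sameHash("Aa", "BB")); // true
    }
}
```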

How come this equals & hashCode override does NOT cause an exception or error?

When I compile and run the code below, I get the following results:
o1==o2 ? true
Hash codes: 0 | 0
o1==o2 ? true
Hash codes: 1 | 8
o1==o2 ? true
Hash codes: 7 | 3
o1==o2 ? true
Hash codes: 68 | 10
o1==o2 ? true
Hash codes: 5 | 4
From what I've read, if two objects are equal, their hashCodes must also be equal. So, how does this code not cause an exception or error?
import java.io.*;
import java.lang.*;
public class EqualsAndHashCode {
    private int num1;
    private int num2;
    public EqualsAndHashCode(int num1, int num2) {
        this.num1 = num1;
        this.num2 = num2;
    }
    public static void main(String[] args) {
        for (int x = 0; x < 5; x++) {
            EqualsAndHashCode o1 = new EqualsAndHashCode(x, x);
            EqualsAndHashCode o2 = new EqualsAndHashCode(x, x);
            System.out.println("o1==o2 ? " + o1.equals(o2));
            System.out.println("Hash codes: " + o1.hashCode() + " | " + o2.hashCode());
        }
    }
    public boolean equals(Object o) {
        return (this.getNum1() == ((EqualsAndHashCode) o).getNum1()) && (this.getNum2() == ((EqualsAndHashCode) o).getNum2());
    }
    public int hashCode() {
        return (int) (this.getNum1() / Math.random());
    }
    public int getNum1() { return num1; }
    public int getNum2() { return num2; }
}
EDIT I
The premise behind my question was the wording surrounding the hashCode contract (http://docs.oracle.com/javase/6/docs/api/java/lang/Object.html#hashCode()):
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
I assumed that this rule would have been enforced by the JVM at compile or run time and I would have seen errors or exceptions right away when the contract was violated...
Because the JVM does not check or validate that the method contract holds true. They're just methods, and they can return whatever they want.
However, any code that depends on them honoring the method contract may fail. You will not be able to use your EqualsAndHashCode objects reliably as keys in a HashMap, for example: no exception is thrown, but lookups will not return correct values in most cases.
It is the same with compareTo() and TreeMap: compareTo() can return any int it wants, but if it doesn't define a consistent ordering as required by the contract of the Comparable interface, your TreeMap will behave unpredictably (for example, failing to find keys that were inserted).
So, how does this code not cause an exception or error?
Well, breaking the contract of equals and hashCode never throws an exception or error by itself. You just see weird behaviour when you use objects of that class in hash-based collections such as HashSet or HashMap.
For example, if you use objects of your class as keys in a HashMap, you might not be able to find a key again when you try to fetch it, because even if your keys are equal, their hash codes might differ. HashMap searches for keys first by their hash codes, and then by equals.
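A small sketch of that "lost key" effect. To keep it deterministic it does not use Math.random() as in the question; instead the hypothetical BrokenKey class gives every instance a different hash code via a counter, which breaks the contract in the same way.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key type: equals() compares num, but hashCode() differs per instance.
final class BrokenKey {
    private static int nextSalt = 0;
    private final int num;
    private final int salt = nextSalt++; // unique per instance

    BrokenKey(int num) {
        this.num = num;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof BrokenKey && ((BrokenKey) o).num == num;
    }

    @Override
    public int hashCode() {
        return salt; // contract violation: equal objects report different hash codes
    }
}

final class BrokenKeyDemo {
    static String lookup() {
        Map<BrokenKey, String> map = new HashMap<>();
        map.put(new BrokenKey(1), "found");
        // An equal key with a different hash code: HashMap never reaches equals().
        return map.get(new BrokenKey(1));
    }

    public static void main(String[] args) {
        System.out.println(new BrokenKey(1).equals(new BrokenKey(1))); // true
        System.out.println(lookup()); // null, and no exception is thrown
    }
}
```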
if two objects are equal, their hashCodes must also be equal
The above is part of the hashCode contract, but it is not enforced by the JVM.
The idea behind this rule is to let hashed collections such as HashMap locate an element by its hash code; well-distributed hash codes additionally mean fewer collisions when storing elements.
A very good article on the need of hashcode, rules for equals and hashcode, etc:
http://www.ibm.com/developerworks/java/library/j-jtp05273/index.html
How could they possibly be the same given the fact that you are dividing by a Random number?
The typical approach is to build the hashCode of the object from the hashCode values of the individual fields (for primitive fields, as in this case, the values themselves are used). You also typically multiply by prime numbers.
// adapted from Effective Java
public int hashCode() {
    int p = 17, q = 37;
    p = q * p + num1;
    p = q * p + num2;
    return p;
}
Use this for hashCode:
public int hashCode() {
    int result = num1;
    result = 31 * result + num2;
    return result;
}
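As an alternative to hand-written arithmetic, java.util.Objects.hash (available since Java 7) combines field values in a similar 31-based way. A minimal sketch follows, where NumPair is a hypothetical stand-in for the EqualsAndHashCode class from the question:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for the class in the question.
final class NumPair {
    private final int num1;
    private final int num2;

    NumPair(int num1, int num2) {
        this.num1 = num1;
        this.num2 = num2;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof NumPair)) {
            return false; // also covers null
        }
        NumPair other = (NumPair) o;
        return num1 == other.num1 && num2 == other.num2;
    }

    @Override
    public int hashCode() {
        return Objects.hash(num1, num2); // uses exactly the fields equals() compares
    }
}

final class NumPairDemo {
    public static void main(String[] args) {
        Set<NumPair> set = new HashSet<>();
        set.add(new NumPair(1, 2));
        System.out.println(set.contains(new NumPair(1, 2))); // true
        System.out.println(set.contains(new NumPair(2, 1))); // false
    }
}
```

Objects.hash boxes its arguments into an array on each call, so the hand-written versions above may be preferable on very hot paths.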
The default implementations of hashCode() and equals() are inherited from the Object class by every class that you define. For your code to behave correctly, especially when it is used in data structures such as HashMap, you should override the default implementations so that if two instances of your class are equal, they return the same value when hashCode() is called.
Definition of equality of two objects depends on domain concept their classes represent, and hence, only the author of the class is best suited to implement "equals" and "hashcode" methods. For example, two Employee objects are considered equal if they have same value for "employeeId" attribute. These two may be different instances, but in the realm of domain (say, Human Resources System), they are equal due to equality of their employee IDs. Now, the author of the Employee class should implement "equals" method that compares "employeeId" attributes and returns true if they are same. Similarly, the author should ensure that hashCode() of two Employee instances are same if their employee IDs are same.
And if you are worried about how to write hashCode that meets the above Java recommendation, then, you can generate the hashCode and equals using Eclipse.
Although the JVM does not enforce the rule "if two objects are equal, their hashCodes must also be equal", you should be aware that your code can start misbehaving if objects of your class are used in a Set, Map, etc. and you don't write equals and hashCode methods that comply with it. The only time you might ignore this rule is when you are sure that instances of your class will never be tested for equality. An example of such a class is a DAO class or a Service class, which is typically instantiated and used as a singleton, and no one (in normal scenarios) compares two DAO or Service instances.
The purpose of method contracts is in most cases to allow other code to assume that certain conditions will hold true. In particular, the purpose of the hashCode and equals contracts is to allow a collection to assume that if an object Foo has a particular hashcode (e.g. 24601), and a collection of objects Bar is known not to contain any objects with that hashcode, one may infer from that information that Bar doesn't contain Foo. As a bonus, if a collection of objects contains a variety of hashcodes including that of Foo, and if one has precalculated the hashcodes of all the objects in the collection, one may check the hashcode of each object against that of Foo before looking at the object itself. Comparing two objects' already-computed hash values will be fast, no matter how complicated the objects are.
For all this to work, it is imperative that an object which will report itself equal to another object must always have reported the same hash code as that other object. Because it is always possible to obey this rule, there is seldom any good reason to disobey it. Even if the only immutable characteristic of an object which would be used in determining equality is its type, it's still possible to obey the rule by having all objects of that type return the same hash value. Having objects which will always be different report different hash values may improve performance by several orders of magnitude, but given a choice between behavior which is slow but correct, and behavior which is fast but wrong, the former should generally be preferred.

Does it matter if two hashCodes are equal, even if the two objects aren't from the same type?

Let's say I have two types A and B that both have a unique id field, here is how I usually implement the equals() and hashCode() methods:
@Override
public boolean equals(Object obj) {
    return obj instanceof ThisType && obj.hashCode() == hashCode();
}
@Override
public int hashCode() {
    return Arrays.hashCode(new Object[] { id });
}
In that case, given that A and B both have a 1-arg constructor to set their respective id field,
new A(1).equals(new A(1)) // prints true as expected,
new A(1).equals(new A(2)) // prints false as expected,
new A(1).equals(new B(1)) // prints false as expected.
But also,
new A(1).hashCode() == new B(1).hashCode() // prints true.
I wonder if it matters if two hashCodes are equal, even if the two objects aren't from the same type? Could hashCode() be used somewhere else than in equals()? If yes, to what purpose?
I thought about implementing the two methods as follow:
@Override
public boolean equals(Object obj) {
    return obj != null && obj.hashCode() == hashCode();
}
@Override
public int hashCode() {
    return Arrays.hashCode(new Object[] { getClass(), id });
}
Adding the class to the hashCode generation would solve this potential problem. What do you think? Is it necessary?
For objects of different classes the same hashCode() does not matter. The hashCode() only says that the objects are possibly the same. If e.g. HashSet encounters the same hashCode() it will test for equality with equals().
The rule is simple:
A.equals(B) implies B.hashcode() == A.hashcode()
B.hashcode() != A.hashcode() implies !A.equals(B)
There should be no other relations between the two. If you use hashcode() inside equals(), you should have a warning.
Hashcode is definitely not used in equals; it is used by collections based on the data structure called a hash table. It is always OK from the correctness standpoint for two hashcodes to equal each other; this is called a hash collision, it is unavoidable in the general case, and the only consequence is weaker performance.
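To see why equals() should compare fields rather than hash codes, consider a sketch where the id is a String (an assumption made here so that a collision is easy to show; with a single int id the hash happens to be unique). HashComparedId copies the question's approach and FieldComparedId compares the field directly. Both names are hypothetical.

```java
import java.util.Arrays;

// Copies the question's approach: equality decided by comparing hash codes.
final class HashComparedId {
    final String id;

    HashComparedId(String id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof HashComparedId && obj.hashCode() == hashCode();
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(new Object[] { id });
    }
}

// Compares the actual field; hashCode() is only a hint for hash tables.
final class FieldComparedId {
    final String id;

    FieldComparedId(String id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof FieldComparedId && id.equals(((FieldComparedId) obj).id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }
}

final class EqualsViaHashDemo {
    public static void main(String[] args) {
        // "Aa" and "BB" have the same String hash code, so the first variant
        // reports two different ids as equal.
        System.out.println(new HashComparedId("Aa").equals(new HashComparedId("BB")));   // true (wrong)
        System.out.println(new FieldComparedId("Aa").equals(new FieldComparedId("BB"))); // false
    }
}
```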
There is nothing wrong with two different objects (even of the same type) having the same hash code, but your second variant of equals() looks odd to me. It will work only if you can guarantee that your objects will be compared only to objects of the same type.
Could hashCode() be used somewhere else than in equals()?
From the Javadoc: "This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable."
Also
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
Note that when A extends B, or B extends A, your equals method is faulty, since:
a.equals(b) != b.equals(a)
if a and b happen to have the same hash code.
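A minimal sketch of that asymmetry, assuming the instanceof-plus-hashCode style of equals from the question is copied into both a superclass and a subclass. BaseEntity and DerivedEntity are hypothetical stand-ins for B and A.

```java
import java.util.Arrays;

// Hypothetical superclass using the question's equals() pattern.
class BaseEntity {
    final int id;

    BaseEntity(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof BaseEntity && obj.hashCode() == hashCode();
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(new Object[] { id });
    }
}

// Hypothetical subclass repeating the same pattern with its own type.
class DerivedEntity extends BaseEntity {
    DerivedEntity(int id) {
        super(id);
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof DerivedEntity && obj.hashCode() == hashCode();
    }
}

final class SymmetryDemo {
    public static void main(String[] args) {
        BaseEntity base = new BaseEntity(1);
        BaseEntity derived = new DerivedEntity(1);
        System.out.println(base.equals(derived)); // true: derived is a BaseEntity with the same hash
        System.out.println(derived.equals(base)); // false: base is not a DerivedEntity
    }
}
```

The two calls disagree, which violates the symmetry requirement of the equals contract.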

java LinkedHashSet

I've been studying for OCJP (former SCJP) and I came across the following example which uses LinkedHashSet:
public class Test {
    int size;
    public Test(int s) {
        this.size = s;
    }
    @Override
    public boolean equals(Object obj) {
        return (this.size == ((Test) obj).size);
    }
    public static void main(String[] args) {
        LinkedHashSet<Test> s = new LinkedHashSet<Test>();
        s.add(new Test(1));
        s.add(new Test(2));
        s.add(new Test(1));
        System.out.println(s.size());
    }
}
Now, the question is what is displayed if :
1) implementation stays as is
2) override of hashCode is inserted in the class Test as follows:
public int hashCode() { return size / 5; }
Compiling and running the code shows that the size of the set is 3 in the first case and 2 in the second.
Why?
In case 1, although the equals method is overridden, it is never invoked. Does that mean that the add() method does not check for object equality if the hashCode method is not overridden?
In case 2, hashCode with the given implementation and the given set of Test objects always returns the same number. How is that different from the default hashCode implementation, and why does it cause equals to be invoked?
If you don't override hashCode(), each of your instances gets the hash code computed by the default implementation in the Object class. So your instances will most likely all have different hash code values (though this is not guaranteed), which means each instance will typically go into its own bucket.
Now, even if your overridden equals() method makes two instances equal based on some attribute, their hash codes are still different.
A hash-based set never compares two instances with different hash codes using equals(), so they are never treated as duplicates. That is why the size of the set is 3.
But when you override hashCode() with the following implementation:
public int hashCode() { return size / 5; }
It returns the same value for the same size. So instances with the same size have the same hash code, and since equals() also compares them by size, they are equal. They are therefore considered duplicates in your Set, and the second one is not added. So Set.size() is 2.
Moral: You should always override hashCode() whenever you override the equals() method, to maintain the general contract between the two methods.
General contract between the hashCode and equals methods:
When two objects are equal, their hashcode must be equal
When two objects are not equal, their hashcode can be equal
The hashCode algorithm should always generate same value for same object.
If hashCode for two objects are different, they will not be equal
Always use same attributes to calculate hashCode that you used to compare the two instances
Strongly suggested reading: Effective Java, Item 9: Always override hashCode when you override equals
Hashing structures rely on a hashing algorithm, which is represented by hashCode() in Java. When you put something into a HashMap (or a LinkedHashSet in your case), the collection invokes hashCode() on the objects being inserted. When it is not overridden, the default hashCode() from the Object class is used, so distinct instances almost always get different hash codes and end up in their own buckets, even if equals() would consider them equal.
When you override hashCode() the way shown in your example, all of the objects in your example get into the same bucket and are then (when added one after another) compared with equals(). That's why in the first case (when equals() is not called) you get a size of 3, and in the second a size of 2.
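Here is a sketch of case 2 that also counts equals() calls, to show that the set really does consult equals() once hash codes match. SizedItem is a hypothetical stand-in for the Test class from the question; the exact number of equals() calls is an implementation detail, so the sketch only checks that it is non-zero.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for the Test class from the question.
final class SizedItem {
    static int equalsCalls = 0; // counts equals() invocations for the demo
    final int size;

    SizedItem(int size) {
        this.size = size;
    }

    @Override
    public boolean equals(Object obj) {
        equalsCalls++;
        return obj instanceof SizedItem && ((SizedItem) obj).size == size;
    }

    @Override
    public int hashCode() {
        return size / 5; // sizes 1 and 2 both give 0, so they share a bucket
    }
}

final class SizedItemDemo {
    static int distinctCount() {
        Set<SizedItem> set = new LinkedHashSet<>();
        set.add(new SizedItem(1));
        set.add(new SizedItem(2));
        set.add(new SizedItem(1)); // same hash code and equals() is true: not added
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount());           // 2
        System.out.println(SizedItem.equalsCalls > 0); // true: equals() was consulted
    }
}
```

Removing the hashCode() override would typically give a size of 3, because the default hash codes of the three instances almost certainly differ.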

Java: overriding equals method doesn't do the trick when looking for a key of hashtable?

I have a hashtable looking like this:
Hashtable<Mapping, Integer> mappingCount = new Hashtable<Mapping, Integer>();
I want to use this code:
if (mappingCount.get(currentMapping) != null)
    mappingCount.put(currentMapping, mappingCount.get(currentMapping) + 1);
else
    mappingCount.put(currentMapping, 1);
In order to be able to get the value from the hashtable, for the class Mapping I did the following:
@Override
public boolean equals(Object obj) {
    return ((Mapping) obj).mappingXML.equals(this.mappingXML);
}
However, this doesn't do the trick since mappingCount.get(currentMapping) always results in null. To be sure that something's not wrong, I did the following:
if (aaa.contains(currentMapping.getMappingXML()))
    System.out.println("found it!");
else
    aaa.add(currentMapping.getMappingXML());
where aaa is List<String> aaa = new ArrayList<String>(). Of course, found it is printed many times. What am I doing wrong?
You also need to override the hashCode() method.
From the JavaDocs:
To successfully store and retrieve objects from a hashtable, the objects used as keys must implement the hashCode method and the equals method.
The reason for this is that Hashtable uses hashCode as a preliminary test to see whether two objects might be equal. If the hash codes match, it then uses equals to rule out a collision.
The default implementation of hashCode() typically returns a value derived from the object's identity (often described as its memory address), so two distinct instances usually get different hash codes even when your equals says they are equal. For two objects that are equal, the hash codes must also be equal.
Also look at the general contract for hashCode().
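A minimal sketch of the fix for the counting code in the question, assuming the key's equality is defined entirely by its mappingXML string. XmlMapping is a hypothetical stand-in for the Mapping class, and the counting uses Map.merge (Java 8+) instead of the get/put pair.

```java
import java.util.Hashtable;
import java.util.Map;

// Hypothetical stand-in for the Mapping class from the question.
final class XmlMapping {
    private final String mappingXML;

    XmlMapping(String mappingXML) {
        this.mappingXML = mappingXML;
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof XmlMapping && ((XmlMapping) obj).mappingXML.equals(mappingXML);
    }

    @Override
    public int hashCode() {
        return mappingXML.hashCode(); // built from the same field equals() compares
    }
}

final class MappingCountDemo {
    static Map<XmlMapping, Integer> count(String... xmls) {
        Map<XmlMapping, Integer> counts = new Hashtable<>();
        for (String xml : xmls) {
            counts.merge(new XmlMapping(xml), 1, Integer::sum); // insert 1 or add 1
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<XmlMapping, Integer> counts = count("<a/>", "<b/>", "<a/>");
        System.out.println(counts.get(new XmlMapping("<a/>"))); // 2
        System.out.println(counts.get(new XmlMapping("<b/>"))); // 1
    }
}
```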
All of the recommendations to override equals and hash code correctly are spot on; Joshua Bloch tells you how to do it properly.
But an equally important requirement is that keys in maps must be immutable. If your class can change its values, then the equals and hash code can change after you add it to the map; disaster ensues.
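The "disaster" with mutable keys can be sketched as follows. MutableKey is a hypothetical class whose hash code depends on a field that is changed after insertion.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical key whose equals()/hashCode() depend on a mutable field.
final class MutableKey {
    int value; // deliberately mutable

    MutableKey(int value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MutableKey && ((MutableKey) o).value == value;
    }

    @Override
    public int hashCode() {
        return value;
    }
}

final class MutableKeyDemo {
    static boolean stillFound() {
        Set<MutableKey> set = new HashSet<>();
        MutableKey key = new MutableKey(1);
        set.add(key);  // stored under the hash code for value 1
        key.value = 2; // hash code changes while the object sits in the set
        return set.contains(key); // looked up under the hash code for value 2
    }

    public static void main(String[] args) {
        System.out.println(stillFound()); // false: the very same object is no longer found
    }
}
```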
Whenever you override equals, you must override hashCode as well.
You need to override hashCode as well.
From the Object#hashCode doc:
Returns a hash code value for the object. This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable.
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)
You have to implement hashCode() as well!
Example:
public class Employee {
    int employeeId;
    String name;
    Department dept;
    // other methods would be in here
    @Override
    public int hashCode() {
        int hash = 1;
        hash = hash * 17 + employeeId;
        hash = hash * 31 + name.hashCode();
        hash = hash * 13 + (dept == null ? 0 : dept.hashCode());
        return hash;
    }
}
