How to ensure hashCode() is consistent with equals()? - java

When overriding the equals() function of java.lang.Object, the javadocs suggest that,
it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.
The hashCode() method must return a unique integer for each object (this is easy to do when comparing objects based on memory location, simply return the unique integer address of the object)
How should a hashCode() method be overridden so that it returns a unique integer for each object based only on that object's properties?
public class People {
    public String name;
    public int age;

    public int hashCode() {
        // How to get a unique integer based on name and age?
    }
}
/*******************************/
public class App {
    public static void main(String[] args) {
        People mike = new People();
        People melissa = new People();
        mike.name = "mike";
        mike.age = 23;
        melissa.name = "melissa";
        melissa.age = 24;
        System.out.println(mike.hashCode());    // output?
        System.out.println(melissa.hashCode()); // output?
    }
}

It doesn't say the hash code for an object has to be completely unique, only that two equal objects must return the same hash code. It's entirely legal for two non-equal objects to return the same hash code. However, the more evenly the hash codes are distributed over a set of objects, the better performance you'll get out of HashMaps and other operations that use the hashCode.
IDEs such as IntelliJ Idea have built-in generators for equals and hashCode that generally do a pretty good job at coming up with "good enough" code for most objects (and probably better than some hand-crafted overly-clever hash functions).
For example, here's a hashCode function that Idea generates for your People class:
public int hashCode() {
    int result = name != null ? name.hashCode() : 0;
    result = 31 * result + age;
    return result;
}

I won't go into the details of hashCode uniqueness as Marc has already addressed it. For your People class, you first need to decide what equality of a person means. Maybe equality is based solely on their name, maybe it's based on name and age. It will be domain specific. Let's say equality is based on name and age. Your overridden equals would look like:
public boolean equals(Object obj) {
    if (this == obj) return true;
    if (obj == null) return false;
    if (!getClass().equals(obj.getClass())) return false;
    People other = (People) obj;
    return (name == null ? other.name == null : name.equals(other.name)) &&
           age == other.age;
}
Any time you override equals you must override hashCode. Furthermore, hashCode can't use any more fields in its computation than equals did. Most of the time you must add or exclusive-or the hash code of the various fields (hashCode should be fast to compute). So a valid hashCode method might look like:
public int hashCode() {
    return (name == null ? 17 : name.hashCode()) ^ age;
}
Note that the following is not valid as it uses a field that equals didn't (height). In this case two "equals" objects could have a different hash code.
public int hashCode() {
    return (name == null ? 17 : name.hashCode()) ^ age ^ height;
}
Also, it's perfectly valid for two non-equals objects to have the same hash code:
public int hashCode() {
    return age;
}
In this case Jane age 30 is not equal to Bob age 30, yet both their hash codes are 30. While valid this is undesirable for performance in hash-based collections.
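Side note (not from the original answer): on Java 7 and later, java.util.Objects can do this null-safe combining for you. A minimal sketch, assuming the same name and age fields as above:

@Override
public int hashCode() {
    // java.util.Objects.hash combines the fields' hash codes and handles a null name safely
    return java.util.Objects.hash(name, age);
}

Objects.hash(Object...) simply wraps its arguments in an array and delegates to Arrays.hashCode, so it is convenient rather than blazing fast; the hand-written XOR or 31 * result forms above remain fine.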

Another question asks if there are some basic low-level things that all programmers should know, and I think hash lookups are one of those. So here goes.
A hash table (note that I'm not using an actual classname) is basically an array of linked lists. To find something in the table, you first compute the hashcode of that something, then mod it by the size of the table. This is an index into the array, and you get a linked list at that index. You then traverse the list until you find your object.
Since array retrieval is O(1), and linked list traversal is O(n), you want a hash function that creates as random a distribution as possible, so that objects will be hashed to different lists. Every object could return the value 0 as its hashcode, and a hash table would still work, but it would essentially be a long linked-list at element 0 of the array.
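To make the "mod it by the size of the table" step concrete, here is a toy sketch (not the real HashMap code; the class and method names are invented for illustration):

class ToyHashLookup {
    // Bucket index as described above: the hash code reduced to the table's range.
    // Math.floorMod keeps the index non-negative even when hashCode() is negative.
    static int bucketIndex(Object key, int tableLength) {
        return Math.floorMod(key.hashCode(), tableLength);
    }

    public static void main(String[] args) {
        System.out.println(bucketIndex("mike", 16));    // some bucket in [0, 15]
        System.out.println(bucketIndex("melissa", 16)); // quite possibly a different bucket
    }
}

Whatever lands in the same bucket is then compared one by one with equals(), which is the linked-list traversal described above.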
You also generally want the array to be large, which increases the chances that the object will be in a list of length 1. The Java HashMap, for example, increases the size of the array when the number of entries in the map is greater than 75% of the size of the array. There's a tradeoff here: you can have a huge array with very few entries and waste memory, or a smaller array where each element is a list with more than one entry and waste time traversing. A perfect hash would assign each object to a unique location in the array, with no wasted space.
The term "perfect hash" is a real term, and in some cases you can create a hash function that provides a unique number for each object. This is only possible when you know the set of all possible values. In the general case, you can't achieve this, and there will be some values that return the same hashcode. This is simple mathematics: if you have a string that's more than 4 bytes long, you can't create a unique 4-byte hashcode.
One interesting tidbit: hash arrays are generally sized based on prime numbers, to give the best chance for random allocation when you mod the results, regardless of how random the hashcodes really are.
Edit based on comments:
1) A linked list is not the only way to represent the objects that have the same hashcode, although that is the method used by the JDK 1.5 HashMap. Although less memory-efficient than a simple array, it does arguably create less churn when rehashing (because the entries can be unlinked from one bucket and relinked to another).
2) As of JDK 1.4, the HashMap class uses an array sized as a power of 2; prior to that it used 2^N+1, which I believe is prime for N <= 32. This does not speed up array indexing per se, but does allow the array index to be computed with a bitwise AND rather than a division, as noted by Neil Coffey. Personally, I'd question this as premature optimization, but given the list of authors on HashMap, I'll assume there is some real benefit.

In general the hash code cannot be unique, as there are more values than possible hash codes (integers).
A good hash code distributes the values well over the integers.
A bad one could always give the same value and still be logically correct, it would just lead to unacceptably inefficient hash tables.
Equal values must have the same hash value for hash tables to work correctly.
Otherwise you could add a key to a hash table, then try to look it up via an equal value with a different hash code and not find it.
Or you could put an equal value with a different hash code and have two equal values at different places in the hash table.
In practice you usually select a subset of the fields to be taken into account in both the hashCode() and the equals() method.

I think you misunderstood it. The hashcode does not have to be unique to each object (after all, it is a hash code) though you obviously don't want it to be identical for all objects. You do, however, need it to be identical to all objects that are equal, otherwise things like the standard collections would not work (e.g., you'd look up something in the hash set but would not find it).
For straightforward attributes, some IDEs have hashcode function builders.
If you don't use an IDE, consider using Apache Commons and the class HashCodeBuilder.
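For example, a sketch using the Commons Lang builders (assuming the People class from the question and the commons-lang3 artifact on the classpath):

import org.apache.commons.lang3.builder.EqualsBuilder;
import org.apache.commons.lang3.builder.HashCodeBuilder;

public class People {
    public String name;
    public int age;

    @Override
    public int hashCode() {
        // 17 and 31 are arbitrary non-zero odd seed/multiplier values
        return new HashCodeBuilder(17, 31).append(name).append(age).toHashCode();
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof People)) return false;
        People other = (People) obj;
        return new EqualsBuilder().append(name, other.name).append(age, other.age).isEquals();
    }
}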

The only contractual obligation for hashCode is for it to be consistent. The fields used in creating the hashCode value must be the same or a subset of the fields used in the equals method. This means returning 0 for all values is valid, although not efficient.
One can check whether hashCode is consistent via a unit test. I wrote an abstract class called EqualityTestCase, which does a handful of hashCode checks. One simply has to extend the test case and implement two or three factory methods. The test does a very crude job of testing whether the hashCode is efficient.

This is what the documentation (the javadoc of Object.hashCode) tells us about the hashCode method:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.

There is the notion of a business key, which determines the uniqueness of separate instances of the same type. Each specific type (class) that models a separate entity from the target domain (e.g. a vehicle in a fleet system) should have a business key, which is represented by one or more class fields. The equals() and hashCode() methods should both be implemented using the fields that make up the business key. This ensures that the two methods are consistent with each other.
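As a sketch of that idea (the Vehicle class and its vin field are invented here purely for illustration):

public class Vehicle {
    // Business key: uniquely identifies a vehicle in the fleet domain
    private final String vin;
    // Non-key attribute: free to change without affecting equality
    private String driver;

    public Vehicle(String vin) {
        this.vin = vin;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof Vehicle)) return false;
        return vin.equals(((Vehicle) obj).vin);
    }

    @Override
    public int hashCode() {
        // Uses exactly the same field(s) as equals(), keeping the two methods consistent
        return vin.hashCode();
    }
}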

Related

Should hashCode() only use the subset of immutable fields of those used in equals()?

Situation
I needed to override equals(), and as is recommended I also overrode the hashCode() method using the same fields. Then, when I was looking at a set that contained only the one object, I got the frustrating result of
set.contains(object)
=> false
while
set.stream().findFirst().get().equals(object)
=> true
I understand now that this is due to changes that were made to object after it was added to set, which in turn changed its hashCode. contains then looks in the wrong bucket and can't find the object.
My requirements for the implementation are
mutable fields are needed to correctly implement equals()
use these objects safely in hash-based Collections or Maps such as HashSet even if they are prone to changes.
which conflicts with the convention that
equals() and hashCode() should use the same fields in order to avoid surprises (as argued here: https://stackoverflow.com/a/22827702).
Question
Are there any dangers to using only a subset of fields which are used in equals() to calculate hashCode() instead of using all?
More specifically this would mean: equals() uses a number of fields of the object whereas hashCode() only uses those fields that are used in equals() and that are immutable.
I think this should be okay, because
the contract is fulfilled: equal objects will produce the same hashCode, while the same hashCode does not necessarily mean that the objects are equal.
The hashCode of an object stays the same, even if an object is exposed to changes and therefore will be found in a HashSet before and after those changes.
Related posts that helped me understand my problem but not how to solve it: What issues should be considered when overriding equals and hashCode in Java? and Different fields for equals and hashcode
It's ok for hashCode() to use a subset of the fields that equals() uses, although it may possibly give you a slight performance drop.
Your problem seems to be caused by modifying the object, while still inside the set, in a way that alters the functioning of hashCode() and/or equals(). Whenever you add an object to a HashSet (or as the key in a HashMap), you must not subsequently modify any fields of that object that are used by equals() and/or hashCode(). Ideally, all fields used by equals() should be final. If they can't be, you must treat them as though they are final whilst the object is in the set.
The same goes for TreeSet/TreeMap, too, but applies to fields used by compareTo().
If you really need to modify the fields that are used by equals() (or by compareTo() in the case of a TreeSet/TreeMap), you must:
First, remove that object from the set;
Then modify the object;
And finally add it back to the set.
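A minimal sketch of that remove/modify/re-add sequence (the Label class and its text field are invented for illustration):

import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class Label {
    String text; // used by equals() and hashCode(), so it must not change while in a set

    Label(String text) { this.text = text; }

    @Override
    public boolean equals(Object o) {
        return o instanceof Label && Objects.equals(text, ((Label) o).text);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(text);
    }

    public static void main(String[] args) {
        Set<Label> set = new HashSet<>();
        Label l = new Label("draft");
        set.add(l);

        set.remove(l);    // 1. remove while the old hash is still valid
        l.text = "final"; // 2. modify the field used by equals()/hashCode()
        set.add(l);       // 3. re-add under the new hash

        System.out.println(set.contains(l)); // true
    }
}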
The contract would indeed be fulfilled. The contract requires that .equals() objects ALWAYS have the same .hashCode(). The opposite doesn't have to be true, and I wonder at the insistence of some people and IDEs on using exactly the same fields in both methods. If equal hash codes implied equal objects for all possible field combinations, you would have discovered the perfect hash function.
BTW, IntelliJ offers a nice wizard when generating hashCode and equals, treating the two methods separately and allowing you to differentiate your selection. Obviously, the opposite, i.e. using more fields in hashCode() and fewer fields in equals(), would violate the contract.
For HashSet and similar collections/maps, it's a valid solution to have hashCode() use only a subset of the fields from the equals() method. Of course, you have to think about how useful the hash code is to reduce collisions in the map.
But be aware that the problem comes back if you want to use ordered collections like TreeSet. There you need a comparator, and whenever it gives a collision (returns zero) for two "different" objects, the set can only contain one of them. Your equals() description implies that multiple objects will exist that differ only in the mutable fields, and then you lose either way:
Including the mutable fields in the compareTo() method can change the comparison sign, so that the object needs to move to a different branch in the tree.
Excluding the mutable fields in the compareTo() method limits you to have maximum one colliding element in the TreeSet.
So I'd strongly recommend thinking about your object class's concept of equality and mutability again.
That's perfectly valid to me. Suppose you have a Person:
final String name; // used in hashCode
int income; // name + income used in equals
name decides where the entry will go (think HashMap) or which bucket will be chosen.
You put a Person as a key inside a HashMap: according to its hash code it goes to some bucket, the second for example. You then update the income and search for that Person in the map. According to the hash code it must be in the second bucket, but according to equals it's not there:
static class Person {
    private final String name;
    private int income;

    public Person(String name) {
        super();
        this.name = name;
    }

    public int getIncome() {
        return income;
    }

    public void setIncome(int income) {
        this.income = income;
    }

    public String getName() {
        return name;
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }

    @Override
    public boolean equals(Object other) {
        Person right = (Person) other;
        return getIncome() == right.getIncome() && getName().equals(right.getName());
    }
}
And a test:
HashSet<Person> set = new HashSet<>();
Person bob = new Person("bob");
bob.setIncome(100);
set.add(bob);
Person sameBob = new Person("bob");
sameBob.setIncome(200);
System.out.println(set.contains(sameBob)); // false
What you are missing, I think, is the fact that the hash code decides the bucket an entry goes into (there could be many entries in that bucket); that's only the first step, and equals then decides whether an entry in that bucket is, well, an equal entry.
The example that you provide is perfectly legal; but the one you linked is the other way around - it uses more fields in hashCode than in equals, which makes it incorrect.
Once you understand these details - that hashCode is used first to find the bucket where an Entry might reside, and only then are the entries in that bucket compared via equals - your example makes sense.

Why does Java need equals() if there is hashCode()?

If two objects return the same hashCode, doesn't it mean that they are equal? Or do we need equals to prevent collisions?
And can I implement equals by comparing hashCodes?
If two objects have the same hashCode then they are NOT necessarily equal. Otherwise you will have discovered the perfect hash function. But the opposite is true - if the objects are equal, then they must have the same hashCode.
hashCode and equals convey different information about objects.
Consider the analogy to persons, where the hash code is the birthday: in that scenario, you and many other people share the same birthday (same hash code), yet you are not all the same person.
Why does Java need equals() if there is hashCode()?
Java needs equals() because it is the method through which object equality is tested by examining classes, fields, and other conditions the designer considers to be part of an equality test.
The purpose of hashCode() is to provide a hash value primarily for use by hash tables; though it can also be used for other purposes. The value returned is based on an object's fields and hash codes of its composite and/or aggregate objects. The method does not take into account the class or type of object.
The relationship between equals() and hashCode() is an implication.
Two objects that are equal implies that they have the same hash code.
Two objects having the same hash code does not imply that they are equal.
The latter does not hold for several reasons:
There is a chance that two distinct objects may return the same hash code. Keep in mind that a hash value folds information from a large amount of data into a smaller number.
Two objects from different classes with similar fields will most likely use the same type of hash function, and return equal hash values; yet, they are not the same.
hashCode() can be implementation-specific returning different values on different JVMs or JVM target installations.
Within the same JVM, hashCode() can be used as a cheap precursor to an equality test: compare the hash codes first and only test actual equality if they are the same. This pays off provided that the equality test is significantly more expensive than generating a hash code.
And can I implement equals by comparing hashCodes?
No. As mentioned, equal hash codes does not imply equal objects.
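To illustrate the "cheap precursor" idea mentioned above, here is a sketch (the class name and the cached field are invented; this is one possible pattern, not a prescribed one):

public final class BigDocument {
    private final String content;  // assume comparing this is expensive
    private final int cachedHash;  // computed once, used as a fast pre-check

    public BigDocument(String content) {
        this.content = content;
        this.cachedHash = content.hashCode();
    }

    @Override
    public int hashCode() {
        return cachedHash;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof BigDocument)) return false;
        BigDocument other = (BigDocument) obj;
        // Different hash codes => definitely not equal; only fall through to the
        // expensive content comparison when the hashes match.
        return cachedHash == other.cachedHash && content.equals(other.content);
    }
}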
The hashCode method, as stated in the Oracle docs, is a numeric representation of an object in Java. This hash code has a limited range of possible values (those that can be stored in an int).
For a more complex class, there is a high possibility that you will find two different objects which have the same hash code value. Also, no one stops you from doing this inside any class.
class Test {
    @Override
    public int hashCode() {
        return 0;
    }
}
So, it is not recommended to implement the equals method by comparing hash codes. You should use them for comparison only if you can guarantee that each object has a unique hash code. In most cases, your only certainty is that if two objects are equal using o1.equals(o2) then o1.hashCode() == o2.hashCode().
In the equals method you can define a more complex logic for comparing two objects of the same class.
If two objects return same hashCode, doesn't it mean that they are equal?
No it doesn't mean that.
The javadocs for Object state this:
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. ...
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. ...
Note the last statement. It plainly says "No" to your question.
There is another way to look at this.
The hashCode returns an int.
There are only 2^32 distinct values that an int can take.
If a.hashCode() == b.hashCode() implied a.equals(b), then there could be only 2^32 distinct (i.e. mutually unequal) objects at any given time in a running Java application.
That last point is plainly not true. Indeed, it is demonstrably not true if you have a large enough heap to hold 2^32 instances of java.lang.Object ... in a 64-bit JVM.
And a third way is to note some well-known examples where two different two-character strings have the same hash code.
Given that your assumption is incorrect, the reasoning that follows from it is also incorrect.
Java does need an equals method.
You generally cannot implement equals using just hashCode.
You may be able to use hashCode to implement a faster equals method, but only if calling hashCode twice is faster than comparing two objects. It generally isn't.
hashCodes are equal -> objects might be equal -> further comparison is required
hashCodes are different -> objects are not equal (if hashCode is implemented right)
That's how hash-based comparisons are implemented: first check whether the hash codes are equal; if yes, compare the class fields to see whether it really is the same object; if the hash codes are different, you can be sure that the objects are not equal.
Sometimes (very often?) you don't!
These answers are not untrue. But they don't tell the whole story.
One example would be where you are creating a load of objects of class SomeClass, and each instance that is created is given a unique ID by incrementing a static variable, nInstanceCount, or some such, in the constructor:
iD = nInstanceCount++;
Your hash function could then be
int hashCode() {
    return iD;
}
and your equals could then be
boolean equals(Object obj) {
    if (!(obj instanceof SomeClass)) {
        return false;
    }
    return hashCode() == obj.hashCode();
}
... under such circumstances your idea that "equals is superfluous" is effectively true: if all classes behaved like this, Java 10 (or Java 23) might say, ah, let's just get rid of silly old equals, what's the point? (NB backwards compatibility would then go out the window).
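Assembled into one compilable sketch, the fragments above look roughly like this:

public class SomeClass {
    private static int nInstanceCount = 0;
    private final int iD;

    public SomeClass() {
        iD = nInstanceCount++; // each instance gets its own ever-increasing ID
    }

    @Override
    public int hashCode() {
        return iD;
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof SomeClass)) {
            return false;
        }
        // IDs are unique per instance, so this is effectively identity equality
        return hashCode() == obj.hashCode();
    }
}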
There are two essential points:
you couldn't then create more than MAXINT instances of SomeClass. Or... you could ... if you set up a system for reassigning the IDs of previously destroyed instances. IDs are typically long rather than int ... but this wouldn't work because hashCode() returns int.
none of these objects could then be "equal" to another one, since equality = identity for this particular class, as you have defined it. Often this is desirable. Often it shuts off whole avenues of possibilities...
The necessary implication of your question is, perhaps, what's the use of these two methods which, in a rather annoying way, have to "cooperate"? Frelling, in his/her answer, alluded to the crucial point: hash codes are needed for sorting into "buckets" with classes like HashMap. It's well worth reading up on this: the amount of advanced maths that has gone into designing efficient "bucket" mechanisms for classes like HashMap is quite frightening. After reading up on it you may come to have (like me) a bit of understanding and reverence about how and why you should bother implementing hashCode() with a bit of thought!

Consistent hashcode for an object in java

In Java, is it possible to get a consistent hash code for an object when running the application multiple times?
Sure. If it is a String for example, then String.hashCode() gives a consistent hashcode each time you run the application.
You only get into trouble if the hash code incorporates something other than the values of the object's component fields; e.g. an identity hash code. And of course, this means that the object's class needs to override Object.hashCode() at some point, because that method gives you an identity hash code.
FOLLOW UP
Judging from comments on other answers, the OP still seems to be pursuing the illusory goal of a unique hash function; i.e. some function that will map (for example) any String to a hashcode that is unique for all possible Strings.
Unfortunately this is impossible in the general case, and in this case. Furthermore, it is a simple matter to construct a proof that a String to int hash function that generates unique int values is mathematically impossible. (I won't bore you with the details ... but the basis of the proof is that there are more String values than int values.)
In fact, the only situation where such a hash function is possible is when the set of all possible values of input type has a size that is no greater than the number of possible values of the integer type. There are hash functions that will map a byte, char, short or int to a unique int, but a hash function that maps long values to unique int values is impossible.
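As a concrete illustration of why folding a long into an int must lose information, the JDK's own Long.hashCode() XORs the two 32-bit halves of the value, so distinct longs can collide:

public class LongHashCollision {
    public static void main(String[] args) {
        long a = 0L;
        long b = (1L << 32) + 1L; // 4294967297: high half = 1, low half = 1

        // Long.hashCode() is (int)(value ^ (value >>> 32)), so both collapse to 0
        System.out.println(Long.valueOf(a).hashCode()); // 0
        System.out.println(Long.valueOf(b).hashCode()); // 0
        System.out.println(a == b);                     // false: distinct values, same hash
    }
}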
It depends on the implementation of the object's hashCode() method.
It could even be:
public int hashCode() {
    return 1;
}
No, not for objects in general. Objects with their own hashcode method will probably be consistent across runs.
Implement/override the public int hashCode() method all objects have?
You have to decide what makes the object the same. Usually it is based on the content of one or more fields. In this case, you should make the hashCode based on these fields. (And equals())
However, I would suggest you shouldn't rely on the hashCode being the same between runs of the application. This is highly likely to break when you change code and very hard to fix when it does, e.g. if you add or remove a field which is part of the hashCode, change the way the hashCode is calculated, or change anything it depends on, the hashCode will change.
What are you trying to do? This sounds like a problem where a different solution would be better.
Looking in the contract of hashCode:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
So it is not guaranteed that the hashCode is equal between invocations of the application. In reality, though, quite a few hashCode implementations do return the same value across invocations: String and all the types used for boxing (like Integer) have a hashCode that depends only on the value. Objects that merely combine member hashCodes, where each member has such a consistent hash, also have this property. So in practice it is rather common for hashCode to be consistent across invocations.

Internal implementation of java.util.HashMap and HashSet

I have been trying to understand the internal implementation of java.util.HashMap and java.util.HashSet.
Following are the doubts popping in my mind for a while:
What is the importance of @Override public int hashCode() in a HashMap/HashSet? Where is this hash code used internally?
I have generally seen the key of the HashMap be a String, as in myMap<String, Object>. Can I map the values against someObject (instead of String), as in myMap<someObject, Object>? What contracts do I need to obey for this to happen successfully?
Thanks in advance !
EDIT:
Are we saying that the hash code of the key (check!) is the actual thing against which the value is mapped in the hash table? And when we do myMap.get(someKey), Java internally calls someKey.hashCode() to find the slot in the hash table where the resulting value is looked up?
Answer: Yes.
EDIT 2:
In a java.util.HashSet, from where is the key generated for the hash table? Is it from the object that we are adding, e.g. with mySet.add(myObject), so that myObject.hashCode() decides where this is placed in the hash table? (as we don't give keys in a HashSet).
Answer: The object added becomes the key. The value is dummy!
The answer to question 2 is easy - yes you can use any Object you like. Maps that have String type keys are widely used because they are typical data structures for naming services. But in general, you can map any two types like Map<Car,Vendor> or Map<Student,Course>.
For the hashCode() method it's as answered before - whenever you override equals(), you have to override hashCode() to obey the contract. On the other hand, if you're happy with the standard implementation of equals(), then you shouldn't touch hashCode() either (overriding only one of the two is what risks breaking the contract).
Practical side note: Eclipse (and probably other IDEs as well) can auto-generate a pair of equals() and hashCode() implementations for your class, just based on the class members.
Edit
For your additional question: yes, exactly. Look at the source code for HashMap.get(Object key); it calls key.hashCode() to calculate the position (bin) in the internal hash table and returns the value at that position (if there is one).
But be careful with 'handmade' hashcode/equals methods - if you use an object as a key, make sure that the hashcode doesn't change afterwards, otherwise you won't find the mapped values anymore. In other words, the fields you use to calculate equals and hashcode should be final (or 'unchangeable' after creation of the object).
Assume, we have a contact with String name and String phonenumber and we use both fields to calculate equals() and hashcode(). Now we create "John Doe" with his mobile phone number and map him to his favorite Donut shop. hashcode() is used to calculate the index (bin) in the hash table and that's where the donut shop is stored.
Now we learn that he has a new phone number and we change the phone number field of the John Doe object. This results in a new hash code. And this hash code resolves to a new hash table index - which usually isn't the position where John Doe's favorite donut shop was stored.
The problem is clear: In this case we wanted to map "John Doe" to the Donut shop, and not "John Doe with a specific phone number". So, we have to be careful with autogenerated equals/hashcode to make sure they're what we really want, because they might use unwanted fields, introducing trouble with HashMaps and HashSets.
Edit 2
If you add an object to a HashSet, the Object is the key for the internal hash table, the value is set but unused (just a static instance of Object). Here's the implementation from the openjdk 6 (b17):
// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();

private transient HashMap<E,Object> map;

public boolean add(E e) {
    return map.put(e, PRESENT) == null;
}
Hashing containers like HashMap and HashSet provide fast access to elements stored in them by splitting their contents into "buckets".
For example the list of numbers: 1, 2, 3, 4, 5, 6, 7, 8 stored in a List would look (conceptually) in memory something like: [1, 2, 3, 4, 5, 6, 7, 8].
Storing the same set of numbers in a Set would look more like this: [1, 2] [3, 4] [5, 6] [7, 8]. In this example the list has been split into 4 buckets.
Now imagine you want to find the value 6 in both the List and the Set. With the list you would have to start at the beginning and check each value until you get to 6; this takes 6 steps. With the set you find the correct bucket, then check each of the items in that bucket (only 2 in our example), making this a 3-step process. The value of this approach increases dramatically the more data you have.
But wait how did we know which bucket to look in? That is where the hashCode method comes in. To determine the bucket in which to look for an item Java hashing containers call hashCode then apply some function to the result. This function tries to balance the numbers of buckets and the number of items for the fastest lookup possible.
During lookup once the correct bucket has been found each item in that bucket is compared one at a time as in a list. That is why when you override hashCode you must also override equals. So if an object of any type has both an equals and a hashCode method it can be used as a key in a Map or an entry in a Set. There is a contract that must be followed to implement these methods correctly the canonical text on this is from Josh Bloch's great book Effective Java: Item 8: Always override hashCode when you override equals
What is the importance of @Override public int hashCode() in a HashMap/HashSet?
This allows the instance of the map to produce a useful hash code depending on the content of the map. Two maps with the same content will produce the same hash code. If the content is different, the hash code will be different.
Where is this hash code used internally?
Never. This code only exists so you can use a map as a key in another map.
Can I map the values against someObject (instead of String) like myMap<someObject, Object>?
Yes, but someObject must be a class, not an object (your name suggests that you want to pass in an object; it should be SomeObject to make it clear you're referring to the type).
What all contracts do I need to obey for this happen successfully?
The class must implement hashCode() and equals().
[EDIT]
Are we saying that the hash code of the key (check!) is the actual thing against which the value is mapped in the hash table?
Yes.
Yes. You can use any object as the key in a HashMap. In order to do so, these are the steps you have to follow.
Override equals.
Override hashCode.
The contracts for both the methods are very clearly mentioned in documentation of java.lang.Object. http://java.sun.com/javase/6/docs/api/java/lang/Object.html
And yes, the hashCode() method is used internally by HashMap, hence returning a proper value is important for performance.
Here is the put() method from HashMap, which shows how the key's hash code is used:
public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(hash, key, value, i);
    return null;
}
It is clear from the above code that the hashCode of each key is not just used for the map's own hashCode(), but also for finding the bucket in which to place the key/value pair. That is why hashCode() is related to the performance of the HashMap.
Any Object in Java must have a hashCode() method; HashMap and HashSet are no exceptions. This hash code is used if you insert the hash map/set into another hash map/set.
Any class type can be used as the key in a HashMap/HashSet. This requires that the hashCode() method returns equal values for equal objects, and that the equals() method is implemented according to contract (reflexive, transitive, symmetric). The default implementations from Object already obey these contracts, but you may want to override them if you want value equality instead of reference equality.
There is an intricate relationship between equals(), hashCode() and hash tables in general in Java (and .NET too, for that matter). To quote from the documentation:
public int hashCode()
Returns a hash code value for the object. This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable.
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)
The line
@Override public int hashCode()
just tells you that the hashCode() method is overridden. This is usually a sign that it's safe to use the type as a key in a HashMap.
And yes, you can easily use any object which obeys the contract for equals() and hashCode() as a key in a HashMap.
In answer to question 2: though any class can be used as the key in a HashMap, the best practice is to use immutable classes as keys. Or, at the least, if your hashCode and equals implementations depend on some of the attributes of your class, you should take care not to provide methods that alter those attributes.
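A sketch of such an immutable key class (the CourseKey name and fields are invented for illustration):

import java.util.Objects;

public final class CourseKey {
    private final String department;
    private final int number;

    public CourseKey(String department, int number) {
        this.department = department;
        this.number = number;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof CourseKey)) return false;
        CourseKey other = (CourseKey) obj;
        return number == other.number && Objects.equals(department, other.department);
    }

    @Override
    public int hashCode() {
        return Objects.hash(department, number);
    }
    // No setters: the fields used by equals()/hashCode() can never change,
    // so the key's bucket stays valid for the lifetime of the map entry.
}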
Aaron Digulla is absolutely correct. An interesting additional note that people don't seem to realise is that the key object's hashCode() method is not used verbatim. It is, in fact, rehashed by the HashMap, i.e. it calls hash(someKey.hashCode()), where hash() is an internal hashing method.
To see this, have a look at the source: http://kickjava.com/src/java/util/HashMap.java.htm
The reason for this is that some people implement hashCode() poorly and the hash() function gives a better hash distribution. It's basically done for performance reasons.
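For reference, later JDKs use a very small spreading function; OpenJDK 8's HashMap.hash() is essentially the following (shown here as a standalone sketch):

class SupplementalHash {
    // XOR the high 16 bits of the hash code into the low 16 bits, so that small
    // tables (which only look at the low bits) still benefit from the high bits.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
}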
The hashCode method, for collection classes like HashSet, Hashtable, HashMap etc., returns an integer for the object in support of hashing. By default it is implemented by converting the internal address of the object into an integer. The hashCode method should be overridden in every class that overrides the equals method.
Three general points of the hashCode contract:
For two objects that are equal according to the equals method, calling hashCode on both objects must produce the same integer value.
If it is called several times on a single object, it must consistently return the same integer value (provided no information used in equals comparisons is modified).
For two objects that are unequal according to the equals method, calling hashCode on both objects is not required to produce distinct values.

How to compute the hashCode() from the object's address?

In Java, I have a subclass Vertex of the Java3D class Point3f. Now Point3f computes equals() based on the values of its coordinates, but for my Vertex class I want to be stricter: two vertices are only equal if they are the same object. So far, so good:
class Vertex extends Point3f {
    // ...
    public boolean equals(Object other) {
        return this == other;
    }
}
I know this violates the contract of equals(), but since I'll only compare vertices to other vertices this is not a problem.
Now, to be able to put vertices into a HashMap, the hashCode() method must return results consistent with equals(). It currently does that, but probably bases its return value on the fields of the Point3f, and therefore will give hash collisions for different Vertex objects with the same coordinates.
Therefore I would like to base the hashCode() on the object's address, instead of computing it from the Vertex's fields. I know that the Object class does this, but I cannot call its hashCode() method because Point3f overrides it.
So, actually my question is twofold:
Should I even want such a shallow equals()?
If yes, then, how do I get the object's address to compute the hash code from?
Edit: I just thought of something... I could generate a random int value on object creation, and use that for the hash code. Is that a good idea? Why (not)?
Either use System.identityHashCode() or use an IdentityHashMap.
System.identityHashCode() returns the same hash code for the given object as would be returned by the default method hashCode(), whether or not the given object's class overrides hashCode().
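A brief sketch of both options (plain Object stands in for Vertex here, since Point3f is not on the standard classpath):

import java.util.IdentityHashMap;
import java.util.Map;

class VertexIdentityDemo {
    public static void main(String[] args) {
        Object v1 = new Object();
        Object v2 = new Object();

        // Option 1: base hashCode() on identity inside your class, e.g.
        //     @Override public int hashCode() { return System.identityHashCode(this); }
        System.out.println(System.identityHashCode(v1) == System.identityHashCode(v1)); // true

        // Option 2: leave the class alone and use a map keyed by reference identity
        Map<Object, String> labels = new IdentityHashMap<>();
        labels.put(v1, "first");
        labels.put(v2, "second"); // kept separate even if v1.equals(v2) were true
        System.out.println(labels.size()); // 2
    }
}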
You could use a delegate, even though the answer above is probably better:
class Vertex extends Point3f {
    private final Object equalsDelegate = new Object();

    public boolean equals(Object vertex) {
        if (vertex instanceof Vertex) {
            return this.equalsDelegate.equals(((Vertex) vertex).equalsDelegate);
        } else {
            return super.equals(vertex);
        }
    }

    public int hashCode() {
        return this.equalsDelegate.hashCode();
    }
}
Just FYI, your equals method does NOT violate the equals contract (the base Object contract, that is)... it is basically the equals method of the base Object class, so if you want identity equality instead of the Vertex equality, that is fine.
As for the hash code, you really don't need to change it, though the accepted answer is a good option and will be a lot more efficient if your hash table contains a lot of vertex keys that have the same values.
The reason you don't need to change it is because it is completely fine for the hash code to return the same value for objects that equals returns false for... it is even valid to just return 0 all the time for EVERY instance. Whether this is efficient for hash tables is a completely different issue... you will get a lot more collisions if a lot of your objects have the same hash code (which may be the case if you left the hash code alone and had a lot of vertices with the same values).
Please don't accept this as the answer though of course (what you chose is much more practical), I just wanted to give you a little more background info about hash codes and equals ;-)
Why do you want to override hashCode() in the first place? You'd want to do it if you want to work with some other definition of equality. For example
public class A {
    int id;
    public A(int id) { this.id = id; }
    @Override
    public boolean equals(Object other) { return other instanceof A && ((A) other).id == id; }
    @Override
    public int hashCode() { return id; }
}
where you want to be clear that if the ids are the same then the objects are the same, and you override hashCode so that you can't do this:
HashSet<A> hash = new HashSet<>();
hash.add(new A(1));
hash.add(new A(1));
and get 2 identical (from the point of view of your definition of equality) A's.
The correct behavior would then be that you'd only have 1 object in the set; the second add would simply have no effect.
Since you are not using equals as a logical comparison, but a physical one (i.e. it is the same object), the only way you will guarantee that the hashcode will return a unique value, is to implement a variation of your own suggestion. Instead of generating a random number, use UUID to generate an actual unique value for each object.
The System.identityHashCode() will work, most of the time, but is not guaranteed as the Object.hashCode() method is not guaranteed to return a unique value for every object. I have seen the marginal case happen, and it will probably be dependent on the VM implementation, which is not something you will want your code be dependent on.
Excerpt from the javadocs for Object.hashCode():
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)
The problem this addresses is the case of two separate point objects overwriting each other when inserted into the hash map because they both have the same hash. Since there is no logical equals with an accompanying override of hashCode(), the identityHashCode method can actually cause this scenario to occur. Where the logical case would only replace hash entries for the same logical point, using the system-based hash can cause it to occur with any two objects; equality (and even class) is no longer a factor.
The function hashCode() is inherited from Object and works exactly as you intend (on object level, not coordinate-level). There should be no need to change it.
As for your equals method, there is no reason to even use it, since you can just do obj1 == obj2 in your code instead of calling equals; the inherited equals is meant for sorting and similar uses, where comparing coordinates makes a lot more sense.
