Let's say I have an object.
It contains width and height, x and y coordinates, and x and y representing velocity.
I have overridden equals method and I compare by comparing width, height, x and y and velocity.
What should I do with hash code?
I am confused because it is a moving object, and I am not sure which fields I should use to calculate the hash code: the values will be changing constantly, and the only thing that stays fixed is really the size.
According to Object.hashCode() there is a clause that can help with your decision:
Whenever it is invoked on the same object more than once during an
execution of a Java application, the hashCode method must consistently
return the same integer, provided no information used in equals
comparisons on the object is modified. This integer need not remain
consistent from one execution of an application to another execution
of the same application.
Since your equals() compares width, height, x, y and velocity, your hashCode() will not return the same value once any of these fields changes.
Sample hashcode() for you:
@Override
public int hashCode() {
    int hash = 1;
    hash = hash * 17 + width;
    hash = hash * 31 + height;
    hash = hash * 13 + x;
    hash = hash * 13 + y;
    hash = hash * 13 + velocity.hashCode();
    return hash;
}
You can go further by storing the hashcode in a private variable, and use a dirty flag to know to recalculate the hashcode of your object if any parameters change. Since you are not doing anything expensive within the hashcode() method, I don't think you would need to do that.
If you want your hash code to be used for comparing equality, then using the changing values is just fine, because the hash is (or should be) calculated on the fly when you need it. So if you have all these "moving" things and you want to know whether they are equal (same location, velocity, or whatever), the result will be accurate.
If you want to use it in a hash table then don't override it and just use the default (The address in memory). Or if you want more control set a static counter and use that to create IDs for these objects.
Just make sure equals() and hashCode() are consistent.
That means, whatever fields are used to determine equality should be used to compute the hash code so that equal objects (with a certain state) will have the same hash code (for that same state).
But you may want to consider whether your equals() implementation is correct in the first place, or whether you even need to override the default implementation.
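As a rough sketch of that consistency rule: the MovingBox class below is hypothetical, its field names are assumptions taken from the question, and the velocity is flattened into two ints (vx, vy) for brevity. The point is only that equals() and hashCode() read exactly the same fields.

```java
import java.util.Objects;

final class ConsistentHashDemo {
    // Hypothetical class; field names are assumptions based on the question.
    static final class MovingBox {
        int width, height, x, y, vx, vy;

        MovingBox(int width, int height, int x, int y, int vx, int vy) {
            this.width = width;
            this.height = height;
            this.x = x;
            this.y = y;
            this.vx = vx;
            this.vy = vy;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof MovingBox)) return false;
            MovingBox other = (MovingBox) o;
            return width == other.width && height == other.height
                    && x == other.x && y == other.y
                    && vx == other.vx && vy == other.vy;
        }

        @Override
        public int hashCode() {
            // Same fields as equals(), so equal boxes always hash alike.
            return Objects.hash(width, height, x, y, vx, vy);
        }
    }

    public static void main(String[] args) {
        MovingBox a = new MovingBox(10, 20, 0, 0, 1, 1);
        MovingBox b = new MovingBox(10, 20, 0, 0, 1, 1);
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```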
To be honest, the only reason I wanted to implement hashCode is that the Java rules state that if equals is overridden then hashCode should be too. However, none of the objects will be used in a map.
In that case, you could implement it as follows:
public int hashCode() {
throw new UnsupportedOperationException("hashCode is not implemented");
}
This has the advantage that if you accidentally use the object as a hash key you will get an immediate exception rather than unexpected behavior if the object is mutated at the wrong time.
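To make the "mutated at the wrong time" problem concrete, here is a small sketch. The Point class is hypothetical and only stands in for any mutable object whose hashCode() depends on fields that change while it sits in a hash-based collection.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class MutableKeyDemo {
    // Hypothetical mutable point; equals() and hashCode() both use x and y.
    static final class Point {
        int x, y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    public static void main(String[] args) {
        Set<Point> set = new HashSet<>();
        Point p = new Point(1, 2);
        set.add(p);
        System.out.println(set.contains(p)); // true

        p.x = 5; // changes the hash code while p is stored in the set
        System.out.println(set.contains(p)); // false: the set remembers the old hash
        System.out.println(set.size());      // 1: the element is still in there
    }
}
```

The element is not lost, it is just filed under a hash code that no longer matches, which is exactly the kind of silent failure the exception-throwing hashCode() would turn into a loud one.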
Typically, I would assign an ID to the object as a member variable and return the ID as the hashcode value.
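A minimal sketch of the ID approach, assuming a hypothetical Sprite class and a per-class counter (both names are made up for illustration). Note that this only honours the contract if equals() is based on the ID as well, which is a different notion of equality from the one in the question.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class IdHashDemo {
    // Hypothetical class: identity comes from an ID assigned at construction.
    static final class Sprite {
        private static final AtomicInteger NEXT_ID = new AtomicInteger();

        private final int id = NEXT_ID.getAndIncrement();
        int x, y; // mutable state, deliberately ignored by equals/hashCode

        @Override
        public boolean equals(Object o) {
            return o instanceof Sprite && ((Sprite) o).id == id;
        }

        @Override
        public int hashCode() {
            return id; // never changes, no matter how the sprite moves
        }
    }

    public static void main(String[] args) {
        Sprite s = new Sprite();
        int before = s.hashCode();
        s.x = 100;
        s.y = 200;
        System.out.println(before == s.hashCode()); // true
        System.out.println(s.equals(new Sprite())); // false: different IDs
    }
}
```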
Related
We know that hashCode() method of an object gives a hash-code based on the memory address of the instance of the object. So when we have two objects of a same class with same data it will still give different hash-code as they are stored in different memory location.
Now, when we create two string objects using new String("Some_Name") we will have two objects that are stored in different address. When we see the hashcode for these two objects we should get different hashcodes as they are stored in different memory location. But we end up getting same hashcode as result.
Employee empObject = new Employee("Some_Name");
Employee empObject1 =new Employee("Some_Name");
String stringObject= new String("Some_Name");
String stringObject1=new String("Some_Name");
//Output
System.out.println(empObject.hashCode()); //1252169911
System.out.println(empObject1.hashCode()); //2101973421
System.out.println(stringObject.hashCode()); //1418906358
System.out.println(stringObject1.hashCode()); //1418906358
Does this mean that String object has overridden hashCode() method from Object. If so in the overridden method it has to search for other String objects with same data in the heap and put a constant hashCode for all. Help me if my basics understanding itself is wrong.
Note: It is not about String literal in-fact it is about String Object as literals are stored in String Constant Pool and String Object is created outside the pool inside heap as Object.
This is what I found with a simple Google search:
An object’s hash code allows algorithms and data structures to put objects into compartments, just like letter types in a printer’s type case. The printer puts all A types into the compartment for A, and he looks for an A only in this one compartment. This simple system lets him find types much faster than searching in an unsorted drawer. That’s also the idea of hash-based collections, such as HashMap and HashSet.
The contract is explained in the hashCode method’s JavaDoc. It can be roughly summarized with this statement:
Objects that are equal must have the same hash code within a running process
Unequal objects must have different hash codes – WRONG!
Objects with the same hash code must be equal – WRONG!
The contract allows for unequal objects to share the same hash code, such as the A and µ objects in the sketch above. In math terms, the mapping from objects to hash codes doesn’t have to be injective or even bijective. This is obvious because the number of possible distinct objects is usually bigger than the number of possible hash codes 2^32.
Hope it helps
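A quick way to see that unequal objects may share a hash code is the well-known pair "Aa" and "BB". The snippet below is only an illustration of the point above, not part of the quoted article.

```java
final class CollisionDemo {
    public static void main(String[] args) {
        String a = "Aa";
        String b = "BB";

        System.out.println(a.equals(b)); // false: different content
        System.out.println(a.hashCode()); // 2112
        System.out.println(b.hashCode()); // 2112, the same hash code
    }
}
```

Both strings evaluate to 65*31 + 97 = 66*31 + 66 = 2112, so a hash-based collection puts them in the same bucket and relies on equals() to tell them apart.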
Explanation
Your first paragraph describes the default behavior of hashCode. But usually classes override it and create a content-based solution (same as for equals). This especially applies to the String class.
Default hashCode
The default implementation is not done in Java but directly implemented in the JVM, it has a native keyword. You can always get hands on the original hashCode by using System#identityHashCode, see its documentation:
Returns the same hash code for the given object as would be returned by the default method hashCode(), whether or not the given object's class overrides hashCode(). The hash code for the null reference is zero.
Note that the default implementation of hashCode is not necessarily based on the memory location. It often is related, but you can by no means rely on that (see How is hashCode() calculated in Java). Here is the documentation of Object#hashCode:
Returns a hash code value for the object. This method is supported for the benefit of hash tables such as those provided by HashMap.
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
The relevant parts are the second and third requirement. It must behave the same as equals and hash-collisions are okay (but not optimal).
And Object#equals is typically used to create custom content-based comparisons (see documentation).
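To see the difference between the default hash code and an overridden one, a small sketch along the lines of the question's example. The identity hash values are printed without an expected value because they vary between runs and are not even guaranteed to differ.

```java
final class IdentityHashDemo {
    public static void main(String[] args) {
        String s1 = new String("Some_Name");
        String s2 = new String("Some_Name");

        System.out.println(s1 == s2);                       // false: two distinct objects
        System.out.println(s1.equals(s2));                  // true: same content
        System.out.println(s1.hashCode() == s2.hashCode()); // true: String hashes its content

        // What Object#hashCode would have returned for each instance.
        System.out.println(System.identityHashCode(s1));
        System.out.println(System.identityHashCode(s2));

        System.out.println(System.identityHashCode(null));  // 0
    }
}
```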
String hashCode
Now let us take a look at the implementation of String#hashCode. As said, the class overrides the method and implements a content-based solution. So the hash for "hello" will always be the same as for "hello". Even if you force new instances using the constructor:
// Will have the same hash
new String("hello").hashCode()
new String("hello").hashCode()
It works exactly as equals, which would output true here as well:
new String("hello").equals(new String("hello")) // true
as required by the contract of the hashCode method (see documentation).
Here is the implementation of the method (JDK 10):
/**
 * Returns a hash code for this string. The hash code for a
 * {@code String} object is computed as
 * <blockquote><pre>
 * s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
 * </pre></blockquote>
 * using {@code int} arithmetic, where {@code s[i]} is the
 * <i>i</i>th character of the string, {@code n} is the length of
 * the string, and {@code ^} indicates exponentiation.
 * (The hash value of the empty string is zero.)
 *
 * @return a hash code value for this object.
 */
public int hashCode() {
int h = hash;
if (h == 0 && value.length > 0) {
hash = h = isLatin1() ? StringLatin1.hashCode(value)
: StringUTF16.hashCode(value);
}
return h;
}
Which just forwards to either StringLatin1 or StringUTF16, let us see what they have:
// StringLatin1
public static int hashCode(byte[] value) {
int h = 0;
for (byte v : value) {
h = 31 * h + (v & 0xff);
}
return h;
}
// StringUTF16
public static int hashCode(byte[] value) {
int h = 0;
int length = value.length >> 1;
for (int i = 0; i < length; i++) {
h = 31 * h + getChar(value, i);
}
return h;
}
As you see, both of them just do some simple math based on the individual characters in the string. So it is completely content-based and will thus obviously result in the same result for the same characters always.
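If you want to convince yourself, you can re-implement the documented formula on top of charAt and compare it with the real method. The helper name polynomialHash is made up for this sketch.

```java
final class StringHashFormulaDemo {
    // Evaluates s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] in int arithmetic,
    // using the same step-by-step (Horner) form as the JDK loops shown above.
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(polynomialHash("hello"));              // 99162322
        System.out.println("hello".hashCode());                   // 99162322
        System.out.println(new String("hello").hashCode());       // 99162322
        System.out.println(polynomialHash("") == "".hashCode());  // true, both are 0
    }
}
```

No lookup of "other String objects in the heap" is involved: every instance with the same characters simply computes the same number.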
I'm trying to learn the full story behind hashCode. In most implementations hashCode is fully deterministic, like in StringUTF16 class:
public static int hashCode(byte[] value) {
int h = 0;
int length = value.length >> 1;
for (int i = 0; i < length; i++) {
h = 31 * h + getChar(value, i);
}
return h;
}
I think that such implementation is not great: it's easy to construct examples which have the same hashCode. For example, a user of a system can submit exactly the words with the same hashCode for DOS attack. It doesn't work with String, since it implements Comparable (and HashMap is an over-hacked mess), but it won't help with classes which don't implement Comparable.
A better approach seems to be to use a random factor (instead of 31), so that the user doesn't know how to construct bad examples (and it also has some theoretical properties), like this:
class ImmutableArray{
// Note static keyword. It guarantees that for the same run all objects use the same x.
private static final int x = generateRandomPrime();
int[] values;
public int hashCode() {
int res = 5;
for (int v : values) {
res = res * x + v;
}
return res;
}
...
}
Now, my question: is there anything bad about this implementation? The only problem I can see is that it will return different hashCodes for different runs of the program, but I can't imagine a concrete scenario where something can go wrong.
It is NOT a requirement that hashCode gives the same values in different JVMs. For example, the HashMap class does not persist the hashCode values of the map's keys when it is serialized. Instead, the hashCode values are recomputed when the map is deserialized.
The only potential problem I can see is that recomputing the hashCode on each call is inefficient. You can address that by computing it lazily (like String::hashCode does for example).
But if you implement lazy hashCode calculation, you need to declare the field where you store it as transient. Otherwise, the hashCode value in a de-persisted key instance won't == the hashCode value computed for another instance that is "equal" to the key. (In other words, the hashcode / equals contract is broken!) This will lead to lookup failure.
If you do this properly, there should be no problem vis-a-vis serialization of HashMap. For example, you could follow the approach of String::hashCode and use zero as the cached hashCode value which means "the code needs to be calculated" to the hashCode() method.
(If your key class doesn't have a field to hold a cached hashCode value, the problem with persisting that value doesn't arise.)
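Here is a rough sketch of the lazy, transient variant. It is only an illustration under assumptions: the random odd multiplier X stands in for the question's generateRandomPrime(), the roundTrip helper is made up, and zero is used as the "not computed yet" marker as in String::hashCode.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;

final class LazyHashDemo {
    static final class ImmutableArray implements Serializable {
        private static final long serialVersionUID = 1L;
        // Chosen once per JVM run; a stand-in for generateRandomPrime().
        private static final int X = ThreadLocalRandom.current().nextInt() | 1;

        private final int[] values;
        // transient: never written to the stream, so it is 0 after deserialization
        private transient int hash;

        ImmutableArray(int... values) {
            this.values = values.clone();
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof ImmutableArray
                    && Arrays.equals(values, ((ImmutableArray) o).values);
        }

        @Override
        public int hashCode() {
            int h = hash;
            if (h == 0) { // 0 means "not computed yet"
                h = 5;
                for (int v : values) {
                    h = h * X + v;
                }
                hash = h;
            }
            return h;
        }
    }

    // Serializes and deserializes the object in memory (hypothetical helper).
    static ImmutableArray roundTrip(ImmutableArray a) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(a);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ImmutableArray) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ImmutableArray original = new ImmutableArray(1, 2, 3);
        original.hashCode(); // fills the cache before serializing
        ImmutableArray copy = roundTrip(original);
        System.out.println(copy.equals(original));                  // true
        System.out.println(copy.hashCode() == original.hashCode()); // true
    }
}
```

Within one JVM the round trip is harmless either way. The transient keyword matters when the bytes are read back in a different run, where X has a different value and a persisted cache would be stale.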
The other thing to note is that modifying the key class to implement Comparable would be another defense against DOS-based attacks. In your example class, the implementation of the compareTo method is simple. Note that the ordering that you implement doesn't need to be semantically meaningful. It just needs to be stable and consistent.
I don't see this as much of an issue unless you get into specialized applications of serialization. In most scenarios, the way you have it setup is basically equivalent to adding an arbitrary 31 value as far as the runtime is concerned (the value does not change).
Though, through reflection 'trickery' you could potentially alter the value and throw the whole system off track (think setAccessible and modifier flags).
In the event there's a setup that depends on hash codes staying consistent when objects are serialized and transferred to different environments, I see a greater chance for problems. The hash codes computed in the two separate environments are highly likely to differ when they actually should not.
I have the funny situation, that I store a Coordinate into a HashMap<Coordinate, GUIGameField>.
Now, the strange thing about it is, that I have a fragment of code, which should guard, that no coordinate should be used twice. But if I debug this code:
if (mapForLevel.containsKey(coord)) {
throw new IllegalStateException("This coordinate is already used!");
} else {
...do stuff...
}
... the containsKey always returns false, although I stored a coordinate with a hashcode of 9731 into the map and the current coord also has the hashcode 9731.
After that, the mapForLevel.entrySet() looks like:
(java.util.HashMap$EntrySet) [(270,90)=gui.GUIGameField@29e357, (270,90)=gui.GUIGameField@ca470]
What could I have possibly done wrong? I ran out of ideas. Thanks for any help!
public class Coordinate {
int xCoord;
int yCoord;
public Coordinate(int x, int y) {
...store params in attributes...
}
...getters & setters...
@Override
public int hashCode() {
int hash = 1;
hash = hash * 41 + this.xCoord;
hash = hash * 31 + this.yCoord;
return hash;
}
}
You should override equals in addition to hashCode for it to work correctly.
EDIT : I have wrongly stated that you should use hashCode in your equals - this was not correct. While hashCode must return the same result for two equal objects, it still may return the same result for different objects.
It seems that you forgot to implement the equals() method for your Coordinate class. This is required by the contract. HashMap compares two entries with the same hash code using equals. In your case Object.equals() is called, which always returns false for two different objects because it is based on the reference to the object in memory.
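A minimal sketch of the fix, assuming the Coordinate class from the question with the elided parts filled in the obvious way. The fields are made final here so that a key cannot change while it is in the map, and a String stands in for GUIGameField.

```java
import java.util.HashMap;
import java.util.Map;

final class CoordinateDemo {
    static final class Coordinate {
        final int xCoord;
        final int yCoord;

        Coordinate(int x, int y) {
            this.xCoord = x;
            this.yCoord = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Coordinate)) return false;
            Coordinate other = (Coordinate) o;
            return xCoord == other.xCoord && yCoord == other.yCoord;
        }

        @Override
        public int hashCode() {
            // Same computation as in the question.
            int hash = 1;
            hash = hash * 41 + xCoord;
            hash = hash * 31 + yCoord;
            return hash;
        }
    }

    public static void main(String[] args) {
        Map<Coordinate, String> mapForLevel = new HashMap<>();
        mapForLevel.put(new Coordinate(270, 90), "field");

        Coordinate coord = new Coordinate(270, 90);
        System.out.println(coord.hashCode());               // 9731
        System.out.println(mapForLevel.containsKey(coord)); // true, thanks to equals()
    }
}
```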
The reason why you need to implement equals alongside with hashCode is because of the way hash tables work.
Hash tables associate an integer value (the hash of the Key) with the Value.
Think of it as an array of Value objects. When you insert in this table, value is stored at position key.hashCode().
This allows you to find any object in the table "straight away". You just have to compute the hashCode for that object and you'll know where it is in the table. (Think of it as opposed to a tree, in which you would need to navigate the tree to find the object.)
However, there is a problem with this approach: there might be more than one object with the same hash code.
This would cause you to mistakenly associate two (or more) keys with the same value. This is called a collision.
There's a simple way to solve this problem: instead of mapping each hashcode to one Value, you can map it to a list of pairs Key-Value.
Now every time you're looking for an object in the hash map, after computing the hash you need to go through that list (the list of 'values' that are related to that hashcode) and find the correct one.
This is why you always need to implement equals on the key of the hash map.
Note: Hash tables are actually a bit more complex than this, but the idea is the same. You can read more about collision resolution here.
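The idea described above can be sketched as a toy hash map. This is only an illustration of the bucket-plus-list scheme, not how java.util.HashMap is actually written; the class name ToyHashMap is made up, there is no resizing, and null keys are not supported.

```java
import java.util.ArrayList;
import java.util.List;

final class ToyHashMapDemo {
    static final class ToyHashMap<K, V> {
        private static final class Entry<K, V> {
            final K key;
            V value;

            Entry(K key, V value) {
                this.key = key;
                this.value = value;
            }
        }

        // One list of key-value pairs per bucket.
        private final List<List<Entry<K, V>>> buckets = new ArrayList<>();

        ToyHashMap(int bucketCount) {
            for (int i = 0; i < bucketCount; i++) {
                buckets.add(new ArrayList<>());
            }
        }

        // hashCode() picks the bucket; floorMod keeps the index non-negative.
        private List<Entry<K, V>> bucketFor(K key) {
            return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
        }

        void put(K key, V value) {
            List<Entry<K, V>> bucket = bucketFor(key);
            for (Entry<K, V> e : bucket) {
                if (e.key.equals(key)) { // equals() finds the right pair
                    e.value = value;
                    return;
                }
            }
            bucket.add(new Entry<>(key, value));
        }

        V get(K key) {
            for (Entry<K, V> e : bucketFor(key)) {
                if (e.key.equals(key)) {
                    return e.value;
                }
            }
            return null;
        }
    }

    public static void main(String[] args) {
        ToyHashMap<String, Integer> map = new ToyHashMap<>(16);
        map.put("Aa", 1); // "Aa" and "BB" have the same hash code,
        map.put("BB", 2); // so they land in the same bucket
        System.out.println(map.get("Aa")); // 1
        System.out.println(map.get("BB")); // 2
    }
}
```

Without a proper equals() on the key, the loop inside get could never recognise a second, equal key instance, which is what happened with Coordinate.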
Define an equals method in your Coordinate class (hashCode is already there). Make sure that equal objects return the same hash code; unequal objects should ideally, but need not, return different ones.
When overriding the equals() function of java.lang.Object, the javadocs suggest that,
it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.
The hashCode() method must return a unique integer for each object (this is easy to do when comparing objects based on memory location, simply return the unique integer address of the object)
How should a hashCode() method be overridden so that it returns a unique integer for each object based only on that object's properties?
public class People{
public String name;
public int age;
public int hashCode(){
// How to get a unique integer based on name and age?
}
}
/*******************************/
public class App{
public static void main( String args[] ){
People mike = new People();
People melissa = new People();
mike.name = "mike";
mike.age = 23;
melissa.name = "melissa";
melissa.age = 24;
System.out.println( mike.hashCode() ); // output?
System.out.println( melissa.hashCode() ); // output?
}
}
It doesn't say the hashcode for an object has to be completely unique, only that the hashcode for two equal objects returns the same hashcode. It's entirely legal to have two non-equal objects return the same hashcode. However, the more unique a hashcode distribution is over a set of objects, the better performance you'll get out of HashMaps and other operations that use the hashCode.
IDEs such as IntelliJ Idea have built-in generators for equals and hashCode that generally do a pretty good job at coming up with "good enough" code for most objects (and probably better than some hand-crafted overly-clever hash functions).
For example, here's a hashCode function that Idea generates for your People class:
public int hashCode() {
int result = name != null ? name.hashCode() : 0;
result = 31 * result + age;
return result;
}
I won't go in to the details of hashCode uniqueness as Marc has already addressed it. For your People class, you first need to decide what equality of a person means. Maybe equality is based solely on their name, maybe it's based on name and age. It will be domain specific. Let's say equality is based on name and age. Your overridden equals would look like
public boolean equals(Object obj) {
    if (this==obj) return true;
    if (obj==null) return false;
    if (!getClass().equals(obj.getClass())) return false;
    People other = (People)obj;
    return (name==null ? other.name==null : name.equals(other.name)) &&
        age==other.age;
}
Any time you override equals you must override hashCode. Furthermore, hashCode can't use any more fields in its computation than equals did. Most of the time you must add or exclusive-or the hash code of the various fields (hashCode should be fast to compute). So a valid hashCode method might look like:
public int hashCode() {
return (name==null ? 17 : name.hashCode()) ^ age;
}
Note that the following is not valid as it uses a field that equals didn't (height). In this case two "equals" objects could have a different hash code.
public int hashCode() {
return (name==null ? 17 : name.hashCode()) ^ age ^ height;
}
Also, it's perfectly valid for two non-equals objects to have the same hash code:
public int hashCode() {
return age;
}
In this case Jane age 30 is not equal to Bob age 30, yet both their hash codes are 30. While valid this is undesirable for performance in hash-based collections.
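For what it's worth, the same pair of methods can be written more compactly with the java.util.Objects helpers. This is just an alternative sketch for the People class from the question (with a constructor added for convenience), assuming equality is based on name and age.

```java
import java.util.Objects;

final class PeopleDemo {
    static final class People {
        final String name;
        final int age;

        People(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (obj == null || getClass() != obj.getClass()) return false;
            People other = (People) obj;
            // Objects.equals handles a null name on either side.
            return Objects.equals(name, other.name) && age == other.age;
        }

        @Override
        public int hashCode() {
            // Uses exactly the fields that equals() compares.
            return Objects.hash(name, age);
        }
    }

    public static void main(String[] args) {
        People mike1 = new People("mike", 23);
        People mike2 = new People("mike", 23);
        People jane = new People("Jane", 30);
        People bob = new People("Bob", 30);

        System.out.println(mike1.equals(mike2));                  // true
        System.out.println(mike1.hashCode() == mike2.hashCode()); // true
        System.out.println(jane.equals(bob));                     // false
    }
}
```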
Another question asks if there are some basic low-level things that all programmers should know, and I think hash lookups are one of those. So here goes.
A hash table (note that I'm not using an actual classname) is basically an array of linked lists. To find something in the table, you first compute the hashcode of that something, then mod it by the size of the table. This is an index into the array, and you get a linked list at that index. You then traverse the list until you find your object.
Since array retrieval is O(1), and linked list traversal is O(n), you want a hash function that creates as random a distribution as possible, so that objects will be hashed to different lists. Every object could return the value 0 as its hashcode, and a hash table would still work, but it would essentially be a long linked-list at element 0 of the array.
You also generally want the array to be large, which increases the chances that the object will be in a list of length 1. The Java HashMap, for example, increases the size of the array when the number of entries in the map is > 75% of the size of the array. There's a tradeoff here: you can have a huge array with very few entries and waste memory, or a smaller array where each element in the array is a list with > 1 entries, and waste time traversing. A perfect hash would assign each object to a unique location in the array, with no wasted space.
The term "perfect hash" is a real term, and in some cases you can create a hash function that provides a unique number for each object. This is only possible when you know the set of all possible values. In the general case, you can't achieve this, and there will be some values that return the same hashcode. This is simple mathematics: if you have a string that's more than 4 bytes long, you can't create a unique 4-byte hashcode.
One interesting tidbit: hash arrays are generally sized based on prime numbers, to give the best chance for random allocation when you mod the results, regardless of how random the hashcodes really are.
Edit based on comments:
1) A linked list is not the only way to represent the objects that have the same hashcode, although that is the method used by the JDK 1.5 HashMap. Although less memory-efficient than a simple array, it does arguably create less churn when rehashing (because the entries can be unlinked from one bucket and relinked to another).
2) As of JDK 1.4, the HashMap class uses an array sized as a power of 2; prior to that it used 2^N+1, which I believe is prime for N <= 32. This does not speed up array indexing per se, but does allow the array index to be computed with a bitwise AND rather than a division, as noted by Neil Coffey. Personally, I'd question this as premature optimization, but given the list of authors on HashMap, I'll assume there is some real benefit.
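The bitwise AND trick from point 2 can be checked with a few lines. The method names are made up for this sketch, and it assumes the capacity is a power of two; the real HashMap also mixes the high bits of the hash code into the low ones before masking, which is left out here.

```java
final class BucketIndexDemo {
    // Valid only when capacity is a power of two.
    static int indexByMask(int hash, int capacity) {
        return hash & (capacity - 1);
    }

    // Works for any positive capacity; floorMod never returns a negative index.
    static int indexByMod(int hash, int capacity) {
        return Math.floorMod(hash, capacity);
    }

    public static void main(String[] args) {
        int capacity = 16;
        System.out.println(indexByMask(9731, capacity)); // 3
        System.out.println(indexByMod(9731, capacity));  // 3
        System.out.println(indexByMask(-7, capacity));   // 9
        System.out.println(indexByMod(-7, capacity));    // 9
    }
}
```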
In general the hash code cannot be unique, as there are more values than possible hash codes (integers).
A good hash code distributes the values well over the integers.
A bad one could always give the same value and still be logically correct, it would just lead to unacceptably inefficient hash tables.
Equal values must have the same hash value for hash tables to work correctly.
Otherwise you could add a key to a hash table, then try to look it up via an equal value with a different hash code and not find it.
Or you could put an equal value with a different hash code and have two equal values at different places in the hash table.
In practice you usually select a subset of the fields to be taken into account in both the hashCode() and the equals() method.
I think you misunderstood it. The hashcode does not have to be unique to each object (after all, it is a hash code) though you obviously don't want it to be identical for all objects. You do, however, need it to be identical to all objects that are equal, otherwise things like the standard collections would not work (e.g., you'd look up something in the hash set but would not find it).
For straightforward attributes, some IDEs have hashcode function builders.
If you don't use an IDE, consider using Apache Commons and its HashCodeBuilder class.
The only contractual obligation for hashCode is for it to be consistent. The fields used in creating the hashCode value must be the same or a subset of the fields used in the equals method. This means returning 0 for all values is valid, although not efficient.
One can check whether hashCode is consistent via a unit test. I have written an abstract class called EqualityTestCase, which does a handful of hashCode checks. One simply has to extend the test case and implement two or three factory methods. The test also does a very crude job of testing whether the hashCode is efficient.
This is what documentation tells us as for hash code method
# javadoc
Whenever it is invoked on
the same object more than once during
an execution of a Java application,
the hashCode method must consistently
return the same integer, provided no
information used in equals comparisons
on the object is modified. This
integer need not remain consistent
from one execution of an application
to another execution of the same
application.
There is a notion of a business key, which determines the uniqueness of separate instances of the same type. Each specific type (class) that models a separate entity from the target domain (e.g. a vehicle in a fleet system) should have a business key, which is represented by one or more class fields. The methods equals() and hashCode() should both be implemented using the fields that make up the business key. This ensures that both methods are consistent with each other.