Good hashCode() Implementation - java

The accepted answer in Best implementation for hashCode method gives a seemingly good method for finding Hash Codes. But I'm new to Hash Codes, so I don't quite know what to do.
For 1), does it matter what nonzero value I choose? Is 1 just as good as other numbers such as the prime 31?
For 2), do I add each value to c? What if I have two fields that are both a long, int, double, etc?
Did I interpret it right in this class:
public MyClass{
long a, b, c; // these are the only fields
//some code and methods
public int hashCode(){
return 37 * (37 * ((int) (a ^ (a >>> 32))) + (int) (b ^ (b >>> 32)))
+ (int) (c ^ (c >>> 32));
}
}

The value is not important, it can be whatever you want. Prime numbers will result in a better distribution of the hashCode values therefore they are preferred.
You do not necessary have to add them, you are free to implement whatever algorithm you want, as long as it fulfills the hashCode contract:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
There are some algorithms which can be considered as not good hashCode implementations, simple adding of the attributes values being one of them. The reason for that is, if you have a class which has two fields, Integer a, Integer b and your hashCode() just sums up these values then the distribution of the hashCode values is highly depended on the values your instances store. For example, if most of the values of a are between 0-10 and b are between 0-10 then the hashCode values are be between 0-20. This implies that if you store the instance of this class in e.g. HashMap numerous instances will be stored in the same bucket (because numerous instances with different a and b values but with the same sum will be put inside the same bucket). This will have bad impact on the performance of the operations on the map, because when doing a lookup all the elements from the bucket will be compared using equals().
Regarding the algorithm, it looks fine, it is very similar to the one that Eclipse generates, but it is using a different prime number, 31 not 37:
#Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + (int) (a ^ (a >>> 32));
result = prime * result + (int) (b ^ (b >>> 32));
result = prime * result + (int) (c ^ (c >>> 32));
return result;
}

A well-behaved hashcode method already exists for long values - don't reinvent the wheel:
int hashCode = Long.hashCode((a * 31 + b) * 31 + c); // Java 8+
int hashCode = Long.valueOf((a * 31 + b) * 31 + c).hashCode() // Java <8
Multiplying by a prime number (usually 31 in JDK classes) and cumulating the sum is a common method of creating a "unique" number from several numbers.
The hashCode() method of Long keeps the result properly distributed across the int range, making the hash "well behaved" (basically pseudo random).

Related

Generating hashCode() for a custom class

I have a class named Dish and I handle it inside ArrayLists
So I had to override default hashCode() method.
#Override
public int hashCode() {
int hash =7;
hash = 3*hash + this.getId();
hash = 3*hash + this.getQuantity();
return hash;
}
When I get two dishes with id=4,quan=3 and id=5,quan=0, hashCode() for both is same;
((7*3)+4)*3+3 = 78
((7*3)+5)*3+0 = 78
What am I doing wrong? Or the magic numbers 7 and 3 I have chosen is wrong?
How do I properly override hashCode() so that it generates unique hashes?
PS: From what I searched from google and SO, people use different numbers but the same method. If the problem is with the numbers, how do I wisely choose numbers that doesn't actually increase the cost for multiplication and at the same time works well even for more number of attributes.
Say I had 7 int attributes and my second magic no. is 31, the final hash will be first magic no. * 27512614111 even if all my attributes are 0. So how do I do it without having my hashed value in billions so as keep my processor burden-free?
You can use something like this
public int hashCode() {
int result = 17;
result = 31 * result + getId();
result = 31 * result + getQuantity();
return result;
}
One more thing if your id is unique for each object then no need of using quantity while calculating hashcode.
Here is extract from Effective Java by Joshua bloch telling how implement hashcode method
Store some constant nonzero value, say, 17, in an int variable called result.
For each significant field f in your object (each field taken into account by the equals method, that is), do the following:
a. Compute an int hash code c for the field:
i. If the field is a boolean, compute (f ? 1 : 0).
ii. If the field is a byte , char, short, or int, compute (int) f .
iii. If the field is a long , compute (int) (f ^ (f >>> 32)) .
iv. If the field is a float , compute Float.floatToIntBits(f) .
v. If the field is a double, compute Double.doubleToLongBits(f) , and then hash the resulting long as in step 2.a.iii.
vi. If the field is an object reference and this class’s equals method compares the field by recursively invoking equals, recursively invoke hashCode on the field. If a more complex comparison is required, compute a “canonical representation” for this field and invoke hashCode on the canonical representation. If the value of the field is null, return 0 (or some other constant, but 0 is traditional).
vii. If the field is an array, treat it as if each element were a separate field. That is, compute a hash code for each **significant element** by applying these rules recursively, and combine these values per step 2.b. If every element in an array field is significant, you can use one of the Arrays.hashCode methods added in release 1.5.
b. Combine the hash code c computed in step 2.a into result as follows:
result = 31 * result + c;
Return result .
When you are finished writing the hashCode method, ask yourself whether equal instances have equal hash codes. Write unit tests to verify your intuition!If equal instances have unequal hash codes, figure out why and fix the problem.
Source: Effective Java by Joshua Bloch
This is perfectly OK. The hashing function is not supposed to be universally unique - it just gives a quick hint about which elements might be equal and should be checked in more depth by a call to equals().
From the name of class looks like quantity is the number of dish. So, There is chance that many times it will be zero. I would say in case getquantity() is zero use a variable say x in the hash fucntion.like this:
#Override
public int hashCode() {
int hash =7;int x =0;
if(getQuantity==0)
{
x = getQuantity+getId();
}
else
{
x = getquantity;
}
hash = 3*hash + this.getId();
hash = 3*hash + x;
return hash;
}
I believe this should reduce the collision of hash.since the getId() you have is a unique number.it makes the x a unique number too.

Calling hashCode() from equals()

I have defined hashCode() for my class, with a lengthy list of class attributes.
Per the contract, I also need to implement equals(), but is it possible to implement it simply by comparing hashCode() inside, to avoid all the extra code? Are there any dangers of doing so?
e.g.
#Override
public int hashCode()
{
return new HashCodeBuilder(17, 37)
.append(field1)
.append(field2)
// etc.
// ...
}
#Override
public boolean equals(Object that) {
// Quick special cases
if (that == null) {
return false;
}
if (this == that) {
return true;
}
// Now consider all main cases via hashCode()
return (this.hashCode() == that.hashCode());
}
Don't do that.
The contract for hashCode() says that two objects that are equal must have the same hashcode. It doesn't guarantee anything for objects that are not equal. What this means is that you could have two objects that are completely different but, by chance, happen to have the same hashcode, thus breaking your equals().
It is not hard to get hashcode collisions between strings. Consider the core loop from the JDK 8 String.hashCode() implementation:
for (int i = 0; i < value.length; i++) {
h = 31 * h + val[i];
}
Where the initial value for h is 0 and val[i] is the numerical value for the character in the ith position in the given string. If we take, for example, a string of length 3, this loop can be written as:
h = 31 * (31 * val[0] + val[1]) + val[2];
If we choose an arbitrary string, such as "abZ", we have:
h("abZ") = 31 * (31 * 'a' + 'b') + 'Z'
h("abZ") = 31 * (31 * 97 + 98) + 90
h("abZ") = 96345
Then we can subtract 1 from val[1] while adding 31 to val[2], which gives us the string "aay":
h("aay") = 31 * (31 * 'a' + 'a') + 'y'
h("aay") = 31 * (31 * 97 + 97) + 121
h("aay") = 96345
Resulting in a collision: h("abZ") == h("aay") == 96345.
Also, note that your equals() implementation does not check if you are comparing objects of the same type. So, supposing you had this.hashCode() == 96345, the following statement would return true:
yourObject.equals(Integer.valueOf(96345))
Which is probably not what you want.
It is definitely not safe to just compare the hashCode() of your objects.
Your objects can have more different states than hash codes: Hash code is an int, that means it is limited to 2^32 = 4,294,967,296 possible values, but your object will probably have more than one single int field.
So it is proven, that there might be two different objects (according to equals) that have the same hash code.
But of course, you can first compare the hash codes for performance reasons (if hash code computation is faster than field comparison): If the hash codes are not equal, the objects are unequal too, so you can safely return false immediately!

How do I correct my hash function?

This question is a result of the responses submitted to my post at CodeReview.
I have a class called Point, which is basically "intended to encapsulate a point represented in 2D space." I have overrided the hashcode() function which is as follows:
...
#Override
public int hashCode() {
int hash = 0;
hash += (int) (Double.doubleToLongBits(this.getX())
^ (Double.doubleToLongBits(this.getX()) >>> 32));
hash += (int) (Double.doubleToLongBits(this.getY())
^ (Double.doubleToLongBits(this.getY()) >>> 32));
return hash;
}
...
Let me clarify (for those who didn't check the above link) that my Point uses the two doubles: x and y to represent its coordinates.
Problem:
My Problem is evident when I run this method:
public static void main(String[] args) {
Point p1 = Point.getCartesianPoint(12, 0);
Point p2 = Point.getCartesianPoint(0, 12);
System.out.println(p1.hashCode());
System.out.println(p2.hashCode());
}
I get the Output:
1076363264
1076363264
This is clearly a problem. Basically I intend my hashcode() to return equal hashcodes for equal Points. If I reverse the order in one of the parameter declarations (i.e. swap 12 with 1 in one of them to get equal Points), I get the correct (same) result. How can I correct my approach while maintaining the quality or uniqueness of the hash?
You cannot get an integer hash code for two doubles that is unique, without some further information about the numbers being made available about the nature of the numbers in the doubles.
Why?
int is stored as a 32bit representation, double as a 64 bit representation (see the Java tutorial).
So you are trying to store 128 bits of information in a 32 bit space, so it can never give an unique hash.
However
This really isn't the purpose of a hash code, hash codes
just need to have fairly uncommon collisions to be useful.
If you
know something about the double numbers, that reduces their
entropy/information content then you could use this to compress the
number of bits they use. This will be dependant on the application
of this class that you have not discussed yet.
This is why equals
normally does not make use of the hashcode to check for equality,
use getX and getY of each Point to do the comparison instead.
Try this
public int hashCode() {
long bits = Double.doubleToLongBits(x);
int hash = 31 + (int) (bits ^ (bits >>> 32));
bits = Double.doubleToLongBits(y);
hash = 31 * hash + (int) (bits ^ (bits >>> 32));
return hash;
}
this implementation follows Arrays.hashCode(double a[]) pattern.
It produces these hash codes:
-992476223
1076364225
You can find suggestions how to write good hashCode in Effective Java Item. 9
Can't you use the code that is present in Arrays.hashCode already?
Arrays.hashCode(new double[]{x,y});
This is what guava for example is using in Objects.hashCode.
If you have Java 7, simply:
Objects.hash(x,y)
It might be a silly idea, bt you are using + which is a symetric operation and you are getting symetric problems. What if you ue a non-symmetric operation such as division (check for denominator == 0 though) or minus? Or any other that you cna find in literature or invent yourself.

How to decide hashcode value?

How to decide hashcode value? Recently I faced an interview question that "Is 17 a valid hashcode?". Is there any mechanism to define hashcode value? or we can give any number for hashcode value?
Hashcodes should have good dispersion, so that different objects will be saved in different positions of the hash table (otherwise performance could be degraded).
From that point, while 17 us a "valid" hash code (in the sense that it is a 32-bit signed integer), it is suspicious how the hash function was defined.
For instance, a naive approach for hashing a string is just adding the value of each character. This result in similar hash values for simple strings (such as "tar" and "rat" that sum up to the same value).
A common trick is multiplying each value by a small prime, so that simple inputs will return different values, e.g.;
int result = 1;
result = 31 * result + a;
result = 31 * result + b;
or
int h=0;
for (int i = 0; i < len; i++) {
h = 31*h + val[off++];
}
(the latter, from the JRE implementation of String.hashCode)
Yes, 17 is a perfectly valid hashcode.
Whatever method you select to derive the hashcode, it should always return the same integer for the object (as long as its state remains the same).
Ya 17 is valid. Usually prime numbers like shown in the link are used, you can implement hashcode using the id of that entity which is a primary key
public int hashCode()
{
int result = 17;
result = 37 * result + (getId() == null ? 0 : this.getId().hashCode());
return result;
}
this provides different ways for implementing the hascode

A faster hash function

I'm trying to implement my own hash function, i add up the ASCII numbers of each string, using java. I find the hash code by finding the mod of the size of the hash table and the sum. size%sum. I was wondering if there was a way to use the same process but reduce collisions, when searching for the string?
Thanks in advance.
I would look at the code for String and HashMap as these have a low collision rate and don't use % and handle negative numbers.
From the source for String
public int hashCode() {
int h = hash;
if (h == 0 && value.length > 0) {
char val[] = value;
for (int i = 0; i < value.length; i++) {
h = 31 * h + val[i];
}
hash = h;
}
return h;
}
From the source for HashMap
/**
* Retrieve object hash code and applies a supplemental hash function to the
* result hash, which defends against poor quality hash functions. This is
* critical because HashMap uses power-of-two length hash tables, that
* otherwise encounter collisions for hashCodes that do not differ
* in lower bits. Note: Null keys always map to hash 0, thus index 0.
*/
final int hash(Object k) {
int h = 0;
if (useAltHashing) {
if (k instanceof String) {
return sun.misc.Hashing.stringHash32((String) k);
}
h = hashSeed;
}
h ^= k.hashCode();
// This function ensures that hashCodes that differ only by
// constant multiples at each bit position have a bounded
// number of collisions (approximately 8 at default load factor).
h ^= (h >>> 20) ^ (h >>> 12);
return h ^ (h >>> 7) ^ (h >>> 4);
}
As the HashMap is always a power of 2 in size you can use
hash = (null != key) ? hash(key) : 0;
bucketIndex = indexFor(hash, table.length);
and
/**
* Returns index for hash code h.
*/
static int indexFor(int h, int length) {
return h & (length-1);
}
Using & is much faster than % and only return positive numbers as length is positive.
The Java String.hashcode() makes a tradeoff between being a really good hash function and being as efficient as possible. Simply adding up the character values in a string is not a reliable hash function.
For example, consider the two strings dog and god. Since they both contain a 'd', 'g', and an 'o', no method involving only addition will ever result in a different hash code.
Joshua Bloch, who implemented a good part of Java, discusses the String.hashCode() method in his book Effective Java and talks about how, in versions of Java prior to 1.3, the String.hashCode() function used to consider only 16 characters in a given String. This ran somewhat faster than the current implementation, but resulted is shockingly poor performance in certain situations.
In general, if your specific data set is very well-defined and you could exploit some uniqueness in it, you could probably make a better hash function. For general purpose Strings, good luck.

Categories

Resources