Generating hashCode() for a custom class

Generating hashCode() for a custom class - java

I have a class named Dish and I handle it inside ArrayLists
So I had to override default hashCode() method.
#Override
public int hashCode() {
int hash =7;
hash = 3*hash + this.getId();
hash = 3*hash + this.getQuantity();
return hash;
}
When I get two dishes with id=4,quan=3 and id=5,quan=0, hashCode() for both is same;
((7*3)+4)*3+3 = 78
((7*3)+5)*3+0 = 78
What am I doing wrong? Or the magic numbers 7 and 3 I have chosen is wrong?
How do I properly override hashCode() so that it generates unique hashes?
PS: From what I searched from google and SO, people use different numbers but the same method. If the problem is with the numbers, how do I wisely choose numbers that doesn't actually increase the cost for multiplication and at the same time works well even for more number of attributes.
Say I had 7 int attributes and my second magic no. is 31, the final hash will be first magic no. * 27512614111 even if all my attributes are 0. So how do I do it without having my hashed value in billions so as keep my processor burden-free?

You can use something like this
public int hashCode() {
int result = 17;
result = 31 * result + getId();
result = 31 * result + getQuantity();
return result;
}
One more thing if your id is unique for each object then no need of using quantity while calculating hashcode.
Here is extract from Effective Java by Joshua bloch telling how implement hashcode method
Store some constant nonzero value, say, 17, in an int variable called result.
For each significant field f in your object (each field taken into account by the equals method, that is), do the following:
a. Compute an int hash code c for the field:
i. If the field is a boolean, compute (f ? 1 : 0).
ii. If the field is a byte , char, short, or int, compute (int) f .
iii. If the field is a long , compute (int) (f ^ (f >>> 32)) .
iv. If the field is a float , compute Float.floatToIntBits(f) .
v. If the field is a double, compute Double.doubleToLongBits(f) , and then hash the resulting long as in step 2.a.iii.
vi. If the field is an object reference and this class’s equals method compares the field by recursively invoking equals, recursively invoke hashCode on the field. If a more complex comparison is required, compute a “canonical representation” for this field and invoke hashCode on the canonical representation. If the value of the field is null, return 0 (or some other constant, but 0 is traditional).
vii. If the field is an array, treat it as if each element were a separate field. That is, compute a hash code for each **significant element** by applying these rules recursively, and combine these values per step 2.b. If every element in an array field is significant, you can use one of the Arrays.hashCode methods added in release 1.5.
b. Combine the hash code c computed in step 2.a into result as follows:
result = 31 * result + c;
Return result .
When you are finished writing the hashCode method, ask yourself whether equal instances have equal hash codes. Write unit tests to verify your intuition!If equal instances have unequal hash codes, figure out why and fix the problem.
Source: Effective Java by Joshua Bloch

This is perfectly OK. The hashing function is not supposed to be universally unique - it just gives a quick hint about which elements might be equal and should be checked in more depth by a call to equals().

From the name of class looks like quantity is the number of dish. So, There is chance that many times it will be zero. I would say in case getquantity() is zero use a variable say x in the hash fucntion.like this:
#Override
public int hashCode() {
int hash =7;int x =0;
if(getQuantity==0)
{
x = getQuantity+getId();
}
else
{
x = getquantity;
}
hash = 3*hash + this.getId();
hash = 3*hash + x;
return hash;
}
I believe this should reduce the collision of hash.since the getId() you have is a unique number.it makes the x a unique number too.

Related

Calling hashCode() from equals()

I have defined hashCode() for my class, with a lengthy list of class attributes.
Per the contract, I also need to implement equals(), but is it possible to implement it simply by comparing hashCode() inside, to avoid all the extra code? Are there any dangers of doing so?
e.g.
#Override
public int hashCode()
{
return new HashCodeBuilder(17, 37)
.append(field1)
.append(field2)
// etc.
// ...
}
#Override
public boolean equals(Object that) {
// Quick special cases
if (that == null) {
return false;
}
if (this == that) {
return true;
}
// Now consider all main cases via hashCode()
return (this.hashCode() == that.hashCode());
}

Don't do that.
The contract for hashCode() says that two objects that are equal must have the same hashcode. It doesn't guarantee anything for objects that are not equal. What this means is that you could have two objects that are completely different but, by chance, happen to have the same hashcode, thus breaking your equals().
It is not hard to get hashcode collisions between strings. Consider the core loop from the JDK 8 String.hashCode() implementation:
for (int i = 0; i < value.length; i++) {
h = 31 * h + val[i];
}
Where the initial value for h is 0 and val[i] is the numerical value for the character in the ith position in the given string. If we take, for example, a string of length 3, this loop can be written as:
h = 31 * (31 * val[0] + val[1]) + val[2];
If we choose an arbitrary string, such as "abZ", we have:
h("abZ") = 31 * (31 * 'a' + 'b') + 'Z'
h("abZ") = 31 * (31 * 97 + 98) + 90
h("abZ") = 96345
Then we can subtract 1 from val[1] while adding 31 to val[2], which gives us the string "aay":
h("aay") = 31 * (31 * 'a' + 'a') + 'y'
h("aay") = 31 * (31 * 97 + 97) + 121
h("aay") = 96345
Resulting in a collision: h("abZ") == h("aay") == 96345.
Also, note that your equals() implementation does not check if you are comparing objects of the same type. So, supposing you had this.hashCode() == 96345, the following statement would return true:
yourObject.equals(Integer.valueOf(96345))
Which is probably not what you want.

It is definitely not safe to just compare the hashCode() of your objects.
Your objects can have more different states than hash codes: Hash code is an int, that means it is limited to 2^32 = 4,294,967,296 possible values, but your object will probably have more than one single int field.
So it is proven, that there might be two different objects (according to equals) that have the same hash code.
But of course, you can first compare the hash codes for performance reasons (if hash code computation is faster than field comparison): If the hash codes are not equal, the objects are unequal too, so you can safely return false immediately!

How do I correct my hash function?

This question is a result of the responses submitted to my post at CodeReview.
I have a class called Point, which is basically "intended to encapsulate a point represented in 2D space." I have overrided the hashcode() function which is as follows:
...
#Override
public int hashCode() {
int hash = 0;
hash += (int) (Double.doubleToLongBits(this.getX())
^ (Double.doubleToLongBits(this.getX()) >>> 32));
hash += (int) (Double.doubleToLongBits(this.getY())
^ (Double.doubleToLongBits(this.getY()) >>> 32));
return hash;
}
...
Let me clarify (for those who didn't check the above link) that my Point uses the two doubles: x and y to represent its coordinates.
Problem:
My Problem is evident when I run this method:
public static void main(String[] args) {
Point p1 = Point.getCartesianPoint(12, 0);
Point p2 = Point.getCartesianPoint(0, 12);
System.out.println(p1.hashCode());
System.out.println(p2.hashCode());
}
I get the Output:
1076363264
1076363264
This is clearly a problem. Basically I intend my hashcode() to return equal hashcodes for equal Points. If I reverse the order in one of the parameter declarations (i.e. swap 12 with 1 in one of them to get equal Points), I get the correct (same) result. How can I correct my approach while maintaining the quality or uniqueness of the hash?

You cannot get an integer hash code for two doubles that is unique, without some further information about the numbers being made available about the nature of the numbers in the doubles.
Why?
int is stored as a 32bit representation, double as a 64 bit representation (see the Java tutorial).
So you are trying to store 128 bits of information in a 32 bit space, so it can never give an unique hash.
However
This really isn't the purpose of a hash code, hash codes
just need to have fairly uncommon collisions to be useful.
If you
know something about the double numbers, that reduces their
entropy/information content then you could use this to compress the
number of bits they use. This will be dependant on the application
of this class that you have not discussed yet.
This is why equals
normally does not make use of the hashcode to check for equality,
use getX and getY of each Point to do the comparison instead.

Try this
public int hashCode() {
long bits = Double.doubleToLongBits(x);
int hash = 31 + (int) (bits ^ (bits >>> 32));
bits = Double.doubleToLongBits(y);
hash = 31 * hash + (int) (bits ^ (bits >>> 32));
return hash;
}
this implementation follows Arrays.hashCode(double a[]) pattern.
It produces these hash codes:
-992476223
1076364225
You can find suggestions how to write good hashCode in Effective Java Item. 9

Can't you use the code that is present in Arrays.hashCode already?
Arrays.hashCode(new double[]{x,y});
This is what guava for example is using in Objects.hashCode.
If you have Java 7, simply:
Objects.hash(x,y)

It might be a silly idea, bt you are using + which is a symetric operation and you are getting symetric problems. What if you ue a non-symmetric operation such as division (check for denominator == 0 though) or minus? Or any other that you cna find in literature or invent yourself.

Good hashCode() Implementation

The accepted answer in Best implementation for hashCode method gives a seemingly good method for finding Hash Codes. But I'm new to Hash Codes, so I don't quite know what to do.
For 1), does it matter what nonzero value I choose? Is 1 just as good as other numbers such as the prime 31?
For 2), do I add each value to c? What if I have two fields that are both a long, int, double, etc?
Did I interpret it right in this class:
public MyClass{
long a, b, c; // these are the only fields
//some code and methods
public int hashCode(){
return 37 * (37 * ((int) (a ^ (a >>> 32))) + (int) (b ^ (b >>> 32)))
+ (int) (c ^ (c >>> 32));
}
}

The value is not important, it can be whatever you want. Prime numbers will result in a better distribution of the hashCode values therefore they are preferred.
You do not necessary have to add them, you are free to implement whatever algorithm you want, as long as it fulfills the hashCode contract:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
There are some algorithms which can be considered as not good hashCode implementations, simple adding of the attributes values being one of them. The reason for that is, if you have a class which has two fields, Integer a, Integer b and your hashCode() just sums up these values then the distribution of the hashCode values is highly depended on the values your instances store. For example, if most of the values of a are between 0-10 and b are between 0-10 then the hashCode values are be between 0-20. This implies that if you store the instance of this class in e.g. HashMap numerous instances will be stored in the same bucket (because numerous instances with different a and b values but with the same sum will be put inside the same bucket). This will have bad impact on the performance of the operations on the map, because when doing a lookup all the elements from the bucket will be compared using equals().
Regarding the algorithm, it looks fine, it is very similar to the one that Eclipse generates, but it is using a different prime number, 31 not 37:
#Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + (int) (a ^ (a >>> 32));
result = prime * result + (int) (b ^ (b >>> 32));
result = prime * result + (int) (c ^ (c >>> 32));
return result;
}

A well-behaved hashcode method already exists for long values - don't reinvent the wheel:
int hashCode = Long.hashCode((a * 31 + b) * 31 + c); // Java 8+
int hashCode = Long.valueOf((a * 31 + b) * 31 + c).hashCode() // Java <8
Multiplying by a prime number (usually 31 in JDK classes) and cumulating the sum is a common method of creating a "unique" number from several numbers.
The hashCode() method of Long keeps the result properly distributed across the int range, making the hash "well behaved" (basically pseudo random).

How to decide hashcode value?

How to decide hashcode value? Recently I faced an interview question that "Is 17 a valid hashcode?". Is there any mechanism to define hashcode value? or we can give any number for hashcode value?

Hashcodes should have good dispersion, so that different objects will be saved in different positions of the hash table (otherwise performance could be degraded).
From that point, while 17 us a "valid" hash code (in the sense that it is a 32-bit signed integer), it is suspicious how the hash function was defined.
For instance, a naive approach for hashing a string is just adding the value of each character. This result in similar hash values for simple strings (such as "tar" and "rat" that sum up to the same value).
A common trick is multiplying each value by a small prime, so that simple inputs will return different values, e.g.;
int result = 1;
result = 31 * result + a;
result = 31 * result + b;
or
int h=0;
for (int i = 0; i < len; i++) {
h = 31*h + val[off++];
}
(the latter, from the JRE implementation of String.hashCode)

Yes, 17 is a perfectly valid hashcode.
Whatever method you select to derive the hashcode, it should always return the same integer for the object (as long as its state remains the same).

Ya 17 is valid. Usually prime numbers like shown in the link are used, you can implement hashcode using the id of that entity which is a primary key
public int hashCode()
{
int result = 17;
result = 37 * result + (getId() == null ? 0 : this.getId().hashCode());
return result;
}
this provides different ways for implementing the hascode

Implement "tolerant" `equals` & `hashCode` for a class with floating point members

I have a class with a float field. For example:
public class MultipleFields {
final int count;
final float floatValue;
public MultipleFields(int count, float floatValue) {
this.count = count;
this.floatValue = floatValue;
}
}
I need to be able to compare instances by value. Now how do I properly implement equals & hashCode?
The usual way to implement equals and hashCode is to just consider all fields. E.g. Eclipse will generate the following equals:
public boolean equals(Object obj) {
// irrelevant type checks removed
....
MultipleFields other = (MultipleFields) obj;
if (count != other.count)
return false;
if (Float.floatToIntBits(floatValue) != Float.floatToIntBits(other.floatValue))
return false;
return true;
}
(and a similar hashCode, that essentially computes count* 31 + Float.floatToIntBits(floatValue)).
The problem with this is that my FP values are subject to rounding errors (they may come from user input, from a DB, etc.). So I need a "tolerant" comparison.
The common solution is to compare using an epsilon value (see e.g. Comparing IEEE floats and doubles for equality). However, I'm not quite certain how I can implement equals using this method, and still have a hashCode that is consisten with equals.
My idea is to define the number of significant digits for comparison, then always round to that number of digits in both equals and hashCode:
long comparisonFloatValue = Math.round(floatValue* (Math.pow(10, RELEVANT_DIGITS)));
Then if I replace all uses of floatValue with comparisonFloatValue in equals and hashCode, I should get a "tolerant" comparison, which is consistent with hashCode.
Will this work?
Do you see any problems with this approach?
Is there a better way to do this? It seems rather complicated.

The big problem with it is that two float values could still be very close together but still compare unequal. Basically you're dividing the range of floating point values into buckets - and two values could be very close together without being in the same bucket. Imagine you were using two significant digits, applying truncation to obtain the bucket, for example... then 11.999999 and 12.000001 would be unequal, but 12.000001 and 12.9999999 would be equal despite being much further apart from each other.
Unfortunately, if you don't bucket values like this, you can't implement equals appropriately because of transitivity: x and y may be close together, y and z may be close together, but that doesn't mean that x and z are close together.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Generating hashCode() for a custom class - java

This is perfectly OK. The hashing function is not supposed to be universally unique - it just gives a quick hint about which elements might be equal and should be checked in more depth by a call to equals().

Related

Calling hashCode() from equals()

How do I correct my hash function?

Good hashCode() Implementation

How to decide hashcode value?

Implement "tolerant" `equals` & `hashCode` for a class with floating point members

Categories

Resources