How should a hashcode value be chosen? I was recently asked in an interview, "Is 17 a valid hashcode?" Is there a mechanism for defining the hashcode value, or can we use any number?
Hashcodes should have good dispersion, so that different objects will be saved in different positions of the hash table (otherwise performance could be degraded).
From that point of view, while 17 is a "valid" hash code (in the sense that it is a 32-bit signed integer), it raises the question of how the hash function was defined.
For instance, a naive approach for hashing a string is just adding the value of each character. This results in identical hash values for simple strings (such as "tar" and "rat", which sum to the same value).
A common trick is multiplying the running result by a small prime before adding each value, so that simple inputs return different values, e.g.:
int result = 1;
result = 31 * result + a;
result = 31 * result + b;
or
int h = 0;
for (int i = 0; i < len; i++) {
    h = 31*h + val[off++];
}
(the latter, from the JRE implementation of String.hashCode)
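To make the "tar"/"rat" point concrete, here is a minimal sketch that compares the two approaches. The class and method names (AnagramHashDemo, additiveHash, polynomialHash) are made up for illustration; polynomialHash simply repeats the multiply-by-31 recurrence quoted above.

```java
// Hypothetical demo class; the names are illustrative, not from any library.
class AnagramHashDemo {
    // Naive approach: sum of the character values (ignores character order).
    static int additiveHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h += s.charAt(i);
        }
        return h;
    }

    // Multiply-by-31 approach: same recurrence as the String.hashCode loop.
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // Anagrams collide under the additive hash: prints "327 327"
        System.out.println(additiveHash("tar") + " " + additiveHash("rat"));
        // ...but not under the polynomial hash: prints "114597 112677"
        System.out.println(polynomialHash("tar") + " " + polynomialHash("rat"));
    }
}
```

The multiplication makes each character's contribution depend on its position, which is why reordering the characters changes the result.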
Yes, 17 is a perfectly valid hashcode.
Whatever method you select to derive the hashcode, it should always return the same integer for the object (as long as its state remains the same).
Yes, 17 is valid. Prime numbers, like those shown in the link, are commonly used. You can implement hashCode using the id of the entity, which is a primary key:
public int hashCode()
{
    int result = 17;
    result = 37 * result + (getId() == null ? 0 : this.getId().hashCode());
    return result;
}
This provides different ways of implementing hashCode.
I have a class named Dish and I handle it inside ArrayLists.
So I had to override the default hashCode() method.
@Override
public int hashCode() {
    int hash = 7;
    hash = 3*hash + this.getId();
    hash = 3*hash + this.getQuantity();
    return hash;
}
When I get two dishes with id=4, quan=3 and id=5, quan=0, hashCode() returns the same value for both:
((7*3)+4)*3+3 = 78
((7*3)+5)*3+0 = 78
What am I doing wrong? Or are the magic numbers 7 and 3 that I have chosen wrong?
How do I properly override hashCode() so that it generates unique hashes?
PS: From what I found on Google and SO, people use different numbers but the same method. If the problem is with the numbers, how do I wisely choose numbers that don't increase the cost of multiplication and at the same time work well even with a larger number of attributes?
Say I had 7 int attributes and my second magic number is 31; the final hash will be the first magic number * 27512614111 even if all my attributes are 0. So how do I do it without my hash value running into the billions, so as to keep my processor burden-free?
You can use something like this:
public int hashCode() {
    int result = 17;
    result = 31 * result + getId();
    result = 31 * result + getQuantity();
    return result;
}
One more thing: if your id is unique for each object, there is no need to include quantity when calculating the hash code.
Here is an extract from Effective Java by Joshua Bloch explaining how to implement the hashCode method:
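As a quick check with the two dishes from the question, the sketch below runs the same two-step combination with the original constants (7 and 3) and with 17 and 31. DishHashDemo and hashWith are hypothetical names used only for this illustration. It also touches on the worry about results "in billions": int multiplication in Java wraps around on overflow, so the result always fits in 32 bits.

```java
// Hypothetical demo class; not part of the original Dish code.
class DishHashDemo {
    // Combines id and quantity the same way as the hashCode() methods above.
    static int hashWith(int seed, int multiplier, int id, int quantity) {
        int result = seed;
        result = multiplier * result + id;        // int arithmetic wraps on overflow
        result = multiplier * result + quantity;
        return result;
    }

    public static void main(String[] args) {
        // Constants from the question: both dishes give 78.
        System.out.println(hashWith(7, 3, 4, 3));    // 78
        System.out.println(hashWith(7, 3, 5, 0));    // 78
        // Constants 17 and 31: the two dishes no longer collide.
        System.out.println(hashWith(17, 31, 4, 3));  // 16464
        System.out.println(hashWith(17, 31, 5, 0));  // 16492
    }
}
```

With a multiplier of 3, a change of 1 in id is cancelled by a change of 3 in quantity; with 31, quantity would have to differ by 31 to cancel it. Collisions remain possible, they just need less common inputs.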
1. Store some constant nonzero value, say, 17, in an int variable called result.
2. For each significant field f in your object (each field taken into account by the equals method, that is), do the following:
  a. Compute an int hash code c for the field:
    i. If the field is a boolean, compute (f ? 1 : 0).
    ii. If the field is a byte, char, short, or int, compute (int) f.
    iii. If the field is a long, compute (int) (f ^ (f >>> 32)).
    iv. If the field is a float, compute Float.floatToIntBits(f).
    v. If the field is a double, compute Double.doubleToLongBits(f), and then hash the resulting long as in step 2.a.iii.
    vi. If the field is an object reference and this class’s equals method compares the field by recursively invoking equals, recursively invoke hashCode on the field. If a more complex comparison is required, compute a “canonical representation” for this field and invoke hashCode on the canonical representation. If the value of the field is null, return 0 (or some other constant, but 0 is traditional).
    vii. If the field is an array, treat it as if each element were a separate field. That is, compute a hash code for each **significant element** by applying these rules recursively, and combine these values per step 2.b. If every element in an array field is significant, you can use one of the Arrays.hashCode methods added in release 1.5.
  b. Combine the hash code c computed in step 2.a into result as follows:
    result = 31 * result + c;
3. Return result.
When you are finished writing the hashCode method, ask yourself whether equal instances have equal hash codes. Write unit tests to verify your intuition! If equal instances have unequal hash codes, figure out why and fix the problem.
Source: Effective Java by Joshua Bloch
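The recipe above can be sketched for a class with several field types. ReadingSketch and all of its fields are hypothetical and exist only to show one line per rule; the samples array is assumed to be non-null, and equals is included so that the two methods stay consistent.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical class used only to illustrate the recipe; not from the book.
final class ReadingSketch {
    private final boolean active;
    private final int count;
    private final long timestamp;
    private final double weight;
    private final String label;    // may be null
    private final int[] samples;   // assumed non-null

    ReadingSketch(boolean active, int count, long timestamp,
                  double weight, String label, int[] samples) {
        this.active = active;
        this.count = count;
        this.timestamp = timestamp;
        this.weight = weight;
        this.label = label;
        this.samples = samples.clone();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ReadingSketch)) return false;
        ReadingSketch r = (ReadingSketch) o;
        return active == r.active
                && count == r.count
                && timestamp == r.timestamp
                && Double.doubleToLongBits(weight) == Double.doubleToLongBits(r.weight)
                && Objects.equals(label, r.label)
                && Arrays.equals(samples, r.samples);
    }

    @Override
    public int hashCode() {
        int result = 17;                                                 // step 1
        result = 31 * result + (active ? 1 : 0);                         // boolean
        result = 31 * result + count;                                    // int
        result = 31 * result + (int) (timestamp ^ (timestamp >>> 32));   // long
        long w = Double.doubleToLongBits(weight);                        // double
        result = 31 * result + (int) (w ^ (w >>> 32));
        result = 31 * result + (label == null ? 0 : label.hashCode());   // reference
        result = 31 * result + Arrays.hashCode(samples);                 // array
        return result;                                                   // step 3
    }
}
```

Only fields that equals compares are hashed, so two instances that are equal are guaranteed to produce the same hash code.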
This is perfectly OK. The hashing function is not supposed to be universally unique - it just gives a quick hint about which elements might be equal and should be checked in more depth by a call to equals().
From the name of the class, it looks like quantity is the number of dishes, so there is a chance that it will often be zero. I would suggest that when getQuantity() is zero you use a variable, say x, in the hash function, like this:
@Override
public int hashCode() {
    int hash = 7;
    int x;
    if (getQuantity() == 0) {
        x = getQuantity() + getId();
    } else {
        x = getQuantity();
    }
    hash = 3 * hash + this.getId();
    hash = 3 * hash + x;
    return hash;
}
I believe this should reduce hash collisions. Since your getId() is a unique number, it makes x a unique number too whenever the quantity is zero.
I have defined hashCode() for my class, with a lengthy list of class attributes.
Per the contract, I also need to implement equals(), but is it possible to implement it simply by comparing hashCode() inside, to avoid all the extra code? Are there any dangers of doing so?
e.g.
@Override
public int hashCode()
{
    return new HashCodeBuilder(17, 37)
        .append(field1)
        .append(field2)
        // etc.
        // ...
}
@Override
public boolean equals(Object that) {
    // Quick special cases
    if (that == null) {
        return false;
    }
    if (this == that) {
        return true;
    }
    // Now consider all main cases via hashCode()
    return (this.hashCode() == that.hashCode());
}
Don't do that.
The contract for hashCode() says that two objects that are equal must have the same hashcode. It doesn't guarantee anything for objects that are not equal. What this means is that you could have two objects that are completely different but, by chance, happen to have the same hashcode, thus breaking your equals().
It is not hard to get hashcode collisions between strings. Consider the core loop from the JDK 8 String.hashCode() implementation:
for (int i = 0; i < value.length; i++) {
    h = 31 * h + val[i];
}
Where the initial value for h is 0 and val[i] is the numerical value for the character in the ith position in the given string. If we take, for example, a string of length 3, this loop can be written as:
h = 31 * (31 * val[0] + val[1]) + val[2];
If we choose an arbitrary string, such as "abZ", we have:
h("abZ") = 31 * (31 * 'a' + 'b') + 'Z'
h("abZ") = 31 * (31 * 97 + 98) + 90
h("abZ") = 96345
Then we can subtract 1 from val[1] while adding 31 to val[2], which gives us the string "aay":
h("aay") = 31 * (31 * 'a' + 'a') + 'y'
h("aay") = 31 * (31 * 97 + 97) + 121
h("aay") = 96345
Resulting in a collision: h("abZ") == h("aay") == 96345.
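The arithmetic above can be confirmed directly against String.hashCode(). This is just a small sketch; StringCollisionDemo and sameHashButNotEqual are hypothetical names.

```java
// Hypothetical demo class confirming the worked example above.
class StringCollisionDemo {
    // True when two strings share a hash code without being equal.
    static boolean sameHashButNotEqual(String a, String b) {
        return a.hashCode() == b.hashCode() && !a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println("abZ".hashCode());                   // 96345
        System.out.println("aay".hashCode());                   // 96345
        System.out.println(sameHashButNotEqual("abZ", "aay"));  // true
    }
}
```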
Also, note that your equals() implementation does not check if you are comparing objects of the same type. So, supposing you had this.hashCode() == 96345, the following statement would return true:
yourObject.equals(Integer.valueOf(96345))
Which is probably not what you want.
It is definitely not safe to just compare the hashCode() of your objects.
Your objects can have more different states than hash codes: Hash code is an int, that means it is limited to 2^32 = 4,294,967,296 possible values, but your object will probably have more than one single int field.
So, by the pigeonhole principle, there must be two different objects (according to equals) that have the same hash code.
But of course, you can first compare the hash codes for performance reasons (if hash code computation is faster than field comparison): If the hash codes are not equal, the objects are unequal too, so you can safely return false immediately!
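A minimal sketch of that idea follows: the hash comparison is used only as an early exit, and the fields still decide equality. ItemSketch and its two fields are hypothetical; whether the early exit actually saves time depends on how expensive hashCode() is compared with the field comparison (it helps most when the hash is cached), so treat it as optional.

```java
import java.util.Objects;

// Hypothetical class; field1 and field2 stand in for the real attributes.
final class ItemSketch {
    private final String field1;
    private final int field2;

    ItemSketch(String field1, int field2) {
        this.field1 = field1;
        this.field2 = field2;
    }

    @Override
    public int hashCode() {
        return Objects.hash(field1, field2);
    }

    @Override
    public boolean equals(Object that) {
        if (this == that) {
            return true;
        }
        // instanceof is false for null and for objects of other types.
        if (!(that instanceof ItemSketch)) {
            return false;
        }
        ItemSketch other = (ItemSketch) that;
        // Optional fast path: different hash codes imply different objects.
        if (hashCode() != other.hashCode()) {
            return false;
        }
        // Equal hash codes prove nothing, so the fields must still be compared.
        return field2 == other.field2 && Objects.equals(field1, other.field1);
    }
}
```

For example, new ItemSketch("abZ", 1) and new ItemSketch("aay", 1) have the same hash code (the two strings collide), yet equals correctly returns false for them.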
This question is a result of the responses submitted to my post at CodeReview.
I have a class called Point, which is basically "intended to encapsulate a point represented in 2D space." I have overridden the hashCode() function as follows:
...
@Override
public int hashCode() {
    int hash = 0;
    hash += (int) (Double.doubleToLongBits(this.getX())
            ^ (Double.doubleToLongBits(this.getX()) >>> 32));
    hash += (int) (Double.doubleToLongBits(this.getY())
            ^ (Double.doubleToLongBits(this.getY()) >>> 32));
    return hash;
}
...
Let me clarify (for those who didn't check the above link) that my Point uses the two doubles: x and y to represent its coordinates.
Problem:
My Problem is evident when I run this method:
public static void main(String[] args) {
    Point p1 = Point.getCartesianPoint(12, 0);
    Point p2 = Point.getCartesianPoint(0, 12);
    System.out.println(p1.hashCode());
    System.out.println(p2.hashCode());
}
I get the Output:
1076363264
1076363264
This is clearly a problem. Basically I intend my hashCode() to return equal hash codes for equal Points. If I reverse the order in one of the parameter declarations (i.e. swap 12 with 0 in one of them to get equal Points), I get the correct (same) result. How can I correct my approach while maintaining the quality or uniqueness of the hash?
You cannot get a unique integer hash code for two doubles without further information about the nature of the numbers stored in them.
Why?
An int is stored as a 32-bit representation and a double as a 64-bit representation (see the Java tutorial).
So you are trying to store 128 bits of information in a 32-bit space, which can never give a unique hash.
However
This really isn't the purpose of a hash code; hash codes just need to have fairly uncommon collisions to be useful.
If you know something about the double numbers that reduces their entropy/information content, then you could use this to compress the number of bits they use. This will depend on the application of this class, which you have not discussed yet.
This is why equals normally does not make use of the hash code to check for equality; use getX and getY of each Point to do the comparison instead.
Try this:
public int hashCode() {
    long bits = Double.doubleToLongBits(x);
    int hash = 31 + (int) (bits ^ (bits >>> 32));
    bits = Double.doubleToLongBits(y);
    hash = 31 * hash + (int) (bits ^ (bits >>> 32));
    return hash;
}
This implementation follows the Arrays.hashCode(double a[]) pattern.
It produces these hash codes:
-992476223
1076364225
You can find suggestions on how to write a good hashCode in Effective Java, Item 9.
Can't you use the code that is present in Arrays.hashCode already?
Arrays.hashCode(new double[]{x,y});
This is what Guava, for example, uses in Objects.hashCode.
If you have Java 7, simply:
Objects.hash(x,y)
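Here is a minimal sketch of a point class built on Objects.hash. PointSketch is a hypothetical stand-in for the Point class from the question (which is not shown in full), with a plain constructor instead of getCartesianPoint.

```java
import java.util.Objects;

// Hypothetical stand-in for the question's Point class.
final class PointSketch {
    private final double x;
    private final double y;

    PointSketch(double x, double y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PointSketch)) return false;
        PointSketch p = (PointSketch) o;
        return Double.compare(x, p.x) == 0 && Double.compare(y, p.y) == 0;
    }

    @Override
    public int hashCode() {
        // Boxes x and y, then combines their hash codes with the 31 multiplier.
        return Objects.hash(x, y);
    }

    public static void main(String[] args) {
        System.out.println(new PointSketch(12, 0).hashCode()); // -992476223
        System.out.println(new PointSketch(0, 12).hashCode()); // 1076364225
    }
}
```

Because the combination is order-sensitive, (12, 0) and (0, 12) now hash differently. Objects.hash allocates an array and boxes the doubles on every call, so the hand-written version may be preferable if hashCode is called very frequently.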
It might be a silly idea, but you are using +, which is a symmetric operation, and you are getting symmetry problems. What if you use a non-symmetric operation such as division (check for denominator == 0 though) or subtraction? Or any other that you can find in the literature or invent yourself.
The accepted answer in Best implementation for hashCode method gives a seemingly good method for finding Hash Codes. But I'm new to Hash Codes, so I don't quite know what to do.
For 1), does it matter what nonzero value I choose? Is 1 just as good as other numbers such as the prime 31?
For 2), do I add each value to c? What if I have two fields that are both a long, int, double, etc?
Did I interpret it right in this class:
public class MyClass {
    long a, b, c; // these are the only fields
    //some code and methods
    public int hashCode() {
        return 37 * (37 * ((int) (a ^ (a >>> 32))) + (int) (b ^ (b >>> 32)))
                + (int) (c ^ (c >>> 32));
    }
}
The value is not important, it can be whatever you want. Prime numbers will result in a better distribution of the hashCode values therefore they are preferred.
You do not necessarily have to add them; you are free to implement whatever algorithm you want, as long as it fulfills the hashCode contract:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
Some algorithms are considered poor hashCode implementations, simple addition of the attribute values being one of them. The reason is that if you have a class with two fields, Integer a and Integer b, and your hashCode() just sums these values, then the distribution of the hashCode values is highly dependent on the values your instances store. For example, if most of the values of a are between 0 and 10 and most of the values of b are between 0 and 10, then the hashCode values will be between 0 and 20. This implies that if you store instances of this class in, e.g., a HashMap, numerous instances will be stored in the same bucket (because instances with different a and b values but the same sum end up in the same bucket). This has a bad impact on the performance of operations on the map, because a lookup has to compare all the elements in the bucket using equals().
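The effect described above can be counted directly. The sketch below enumerates all pairs with a and b between 0 and 10 and counts how many distinct hash values each scheme produces; DistributionDemo and its methods are hypothetical names, and the second scheme uses the same 31-based combination as the generated code discussed in this answer.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo: counts distinct hash values for small (a, b) pairs.
class DistributionDemo {
    // Hash = a + b, for all pairs with 0 <= a, b <= max.
    static int distinctSumHashes(int max) {
        Set<Integer> seen = new HashSet<>();
        for (int a = 0; a <= max; a++) {
            for (int b = 0; b <= max; b++) {
                seen.add(a + b);
            }
        }
        return seen.size();
    }

    // Hash = 31 * (31 * 1 + a) + b, for all pairs with 0 <= a, b <= max.
    static int distinctPrimeHashes(int max) {
        Set<Integer> seen = new HashSet<>();
        for (int a = 0; a <= max; a++) {
            for (int b = 0; b <= max; b++) {
                int result = 1;
                result = 31 * result + a;
                result = 31 * result + b;
                seen.add(result);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctSumHashes(10));   // 21 values for 121 pairs
        System.out.println(distinctPrimeHashes(10)); // 121 values for 121 pairs
    }
}
```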
Regarding the algorithm, it looks fine, it is very similar to the one that Eclipse generates, but it is using a different prime number, 31 not 37:
@Override
public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + (int) (a ^ (a >>> 32));
    result = prime * result + (int) (b ^ (b >>> 32));
    result = prime * result + (int) (c ^ (c >>> 32));
    return result;
}
A well-behaved hashCode method already exists for long values, so don't reinvent the wheel:
int hashCode = Long.hashCode((a * 31 + b) * 31 + c); // Java 8+
int hashCode = Long.valueOf((a * 31 + b) * 31 + c).hashCode(); // Java <8
Multiplying by a prime number (usually 31 in JDK classes) and cumulating the sum is a common method of creating a "unique" number from several numbers.
The hashCode() method of Long keeps the result properly distributed across the int range, making the hash "well behaved" (basically pseudo random).
I was going through HashMap and read the following analysis ..
An instance of HashMap has two parameters that affect its performance: initial capacity and load factor.
The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created.
The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased.
When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.
The default initial capacity is 16, the default load factor is 0.75. You can supply other values in the map's constructor.
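As a small illustration of these two parameters, the sketch below passes them to the HashMap constructor and computes the entry count at which a resize is triggered according to the rule quoted above. resizeThreshold is a hypothetical helper written for this example; HashMap does not expose its current capacity through a public method, so the value is computed rather than read from the map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; resizeThreshold is not a HashMap method.
class HashMapTuningDemo {
    // Product of capacity and load factor, per the documentation quoted above.
    static int resizeThreshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // Defaults: initial capacity 16, load factor 0.75.
        Map<String, String> defaults = new HashMap<>();
        // Explicit initial capacity and load factor.
        Map<String, String> tuned = new HashMap<>(64, 0.5f);
        defaults.put("Amit", "Java");
        tuned.put("Amit", "Java");

        System.out.println(resizeThreshold(16, 0.75f)); // 12
        System.out.println(resizeThreshold(64, 0.5f));  // 32
    }
}
```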
Now suppose I have a map..
HashMap map=new HashMap();//HashMap key random order.
System.out.println("Amit".hashCode());
map.put("Amit","Java");
map.put("mAit","J2EE");
map.put("Saral","J2rrrEE");
I want a collision to occur. Please advise how a collision would occur.
I believe the exact hashmap behavior is implementation dependent. Just look at however your class library is doing the hashing and construct a collision. It's pretty simple.
If you want collisions on arbitrary objects instead of strings, it's a lot easier. Just create a class with a custom hashCode() that always returns 0.
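A minimal sketch of that approach: CollidingKey is a hypothetical key class whose hashCode() always returns 0, so every key lands in the same bucket. The map still behaves correctly because equals() tells the keys apart; only the lookup cost suffers.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key class: every instance has the same hash code.
final class CollidingKey {
    private final String name;

    CollidingKey(String name) {
        this.name = name;
    }

    @Override
    public int hashCode() {
        return 0; // forces all keys into the same bucket
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CollidingKey && ((CollidingKey) o).name.equals(name);
    }
}

class CollidingKeyDemo {
    static Map<CollidingKey, String> build() {
        Map<CollidingKey, String> map = new HashMap<>();
        map.put(new CollidingKey("Amit"), "Java");
        map.put(new CollidingKey("mAit"), "J2EE");
        map.put(new CollidingKey("Saral"), "J2rrrEE");
        return map;
    }

    public static void main(String[] args) {
        Map<CollidingKey, String> map = build();
        System.out.println(map.size());                        // 3
        System.out.println(map.get(new CollidingKey("mAit"))); // J2EE
    }
}
```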
If you really want a collision to occur, it is better to write your own custom hash code. For example, if you want a collision between Amit and mAit, you can use the sum of the ASCII values of the characters as the hash code. You will then get collisions for different keys.
A collision happens when two keys have the same hash code.
I didn't calculate the hash codes of your keys, but I don't think they are the same, so no collision will occur.
If you put the same string in as a key twice, both puts produce the same hash code (although the map then simply replaces the existing value).
Collision here is definitely possible and not tied to hash table implementation.
HashMap works internally by using Object.hashCode to map objects to buckets, and then uses a collision resolution mechanism (the OpenJDK implementation uses separate-chaining) with Object.equals.
To answer your question, String.hashCode is well-defined for compatibility...
Returns a hash code for this string. The hash code for a String object is computed as
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
using int arithmetic, where s[i] is the i-th character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)
Or, in code (from OpenJDK):
public int hashCode() {
    int h = hash;
    if (h == 0 && count > 0) {
        int off = offset;
        char val[] = value;
        int len = count;
        for (int i = 0; i < len; i++) {
            h = 31*h + val[off++];
        }
        hash = h;
    }
    return h;
}
As with any hash function, collisions are possible. According to the Wikipedia article, "FB" and "Ea", for example, result in the same value.
If you want more, finding strings that share a hash value here should be a trivial brute-force problem.
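A brute-force search along those lines might look like the sketch below, which groups all two-letter strings (A-Z and a-z only, an arbitrary choice for this example) by their hashCode(). CollisionSearch and groupByHash are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical brute-force search over two-letter strings.
class CollisionSearch {
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Maps each hash code to the two-letter strings that produce it.
    static Map<Integer, List<String>> groupByHash() {
        Map<Integer, List<String>> groups = new HashMap<>();
        for (int i = 0; i < ALPHABET.length(); i++) {
            for (int j = 0; j < ALPHABET.length(); j++) {
                String s = "" + ALPHABET.charAt(i) + ALPHABET.charAt(j);
                groups.computeIfAbsent(s.hashCode(), k -> new ArrayList<>()).add(s);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> groups = groupByHash();
        // The group for "FB" also contains "Ea".
        System.out.println(groups.get("FB".hashCode()));
        // Print every group that holds more than one string.
        for (List<String> group : groups.values()) {
            if (group.size() > 1) {
                System.out.println(group);
            }
        }
    }
}
```

The keys from the question, "Amit" and "mAit", are not such a pair: their String.hashCode() values differ, so with String keys they only collide under a custom hash such as the character sum.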
As a side note, I thought I'd point out how similar this is to the function in the second edition of The C Programming Language:
#define HASHSIZE 100
unsigned hash(char *s)
{
    unsigned hashval;
    for (hashval = 0; *s != '\0'; s++)
        hashval = *s + 31 * hashval;
    return hashval % HASHSIZE;
}