Calling hashCode() from equals() - java

I have defined hashCode() for my class, with a lengthy list of class attributes.
Per the contract, I also need to implement equals(), but is it possible to implement it simply by comparing hashCode() inside, to avoid all the extra code? Are there any dangers of doing so?
e.g.
@Override
public int hashCode()
{
return new HashCodeBuilder(17, 37)
.append(field1)
.append(field2)
// etc.
// ...
}
@Override
public boolean equals(Object that) {
// Quick special cases
if (that == null) {
return false;
}
if (this == that) {
return true;
}
// Now consider all main cases via hashCode()
return (this.hashCode() == that.hashCode());
}

Don't do that.
The contract for hashCode() says that two objects that are equal must have the same hashcode. It doesn't guarantee anything for objects that are not equal. What this means is that you could have two objects that are completely different but, by chance, happen to have the same hashcode, thus breaking your equals().
It is not hard to get hashcode collisions between strings. Consider the core loop from the JDK 8 String.hashCode() implementation:
for (int i = 0; i < value.length; i++) {
h = 31 * h + val[i];
}
Where the initial value for h is 0 and val[i] is the numerical value for the character in the ith position in the given string. If we take, for example, a string of length 3, this loop can be written as:
h = 31 * (31 * val[0] + val[1]) + val[2];
If we choose an arbitrary string, such as "abZ", we have:
h("abZ") = 31 * (31 * 'a' + 'b') + 'Z'
h("abZ") = 31 * (31 * 97 + 98) + 90
h("abZ") = 96345
Then we can subtract 1 from val[1] while adding 31 to val[2], which gives us the string "aay":
h("aay") = 31 * (31 * 'a' + 'a') + 'y'
h("aay") = 31 * (31 * 97 + 97) + 121
h("aay") = 96345
Resulting in a collision: h("abZ") == h("aay") == 96345.
Also, note that your equals() implementation does not check if you are comparing objects of the same type. So, supposing you had this.hashCode() == 96345, the following statement would return true:
yourObject.equals(Integer.valueOf(96345))
Which is probably not what you want.
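For comparison, here is a minimal sketch of a field-by-field equals() with a type check. The class name Label and its single String field are hypothetical, chosen only so that the two colliding strings from the example above can be reused. It shows that equal hash codes do not make the objects equal.

```java
import java.util.Objects;

// Hypothetical class used only for illustration.
final class Label {
    private final String text;

    Label(String text) {
        this.text = text;
    }

    @Override
    public boolean equals(Object that) {
        if (this == that) {
            return true;
        }
        // Rejects null and objects of any other class, e.g. an Integer.
        if (that == null || getClass() != that.getClass()) {
            return false;
        }
        Label other = (Label) that;
        return Objects.equals(text, other.text);
    }

    @Override
    public int hashCode() {
        // Delegates to String.hashCode() (or 0 for a null field).
        return Objects.hashCode(text);
    }

    public static void main(String[] args) {
        Label a = new Label("abZ");
        Label b = new Label("aay");
        System.out.println(a.hashCode() == b.hashCode()); // hash codes collide
        System.out.println(a.equals(b));                  // but the objects differ
        System.out.println(a.equals(Integer.valueOf(a.hashCode())));
    }
}
```

The getClass() comparison is one common way to do the type check; instanceof is another, with different trade-offs for subclasses.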

It is definitely not safe to just compare the hashCode() of your objects.
Your objects can have more distinct states than there are hash codes: a hash code is an int, which means it is limited to 2^32 = 4,294,967,296 possible values, but your object will probably have more than a single int field.
So, by the pigeonhole principle, there must be two different objects (according to equals) that have the same hash code.
But of course, you can first compare the hash codes for performance reasons (if hash code computation is faster than field comparison): If the hash codes are not equal, the objects are unequal too, so you can safely return false immediately!
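A minimal sketch of that fast path, assuming a hypothetical Person class with two String fields (none of these names come from the question). The hash comparison is only used to return false early; the fields are still compared when the hash codes match.

```java
import java.util.Objects;

// Hypothetical class; the field names are placeholders.
final class Person {
    private final String firstName;
    private final String lastName;

    Person(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    @Override
    public int hashCode() {
        return Objects.hash(firstName, lastName);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof Person)) {
            return false;
        }
        Person other = (Person) obj;
        // Fast path: different hash codes prove the objects differ.
        if (hashCode() != other.hashCode()) {
            return false;
        }
        // Equal hash codes prove nothing, so the fields are still compared.
        return Objects.equals(firstName, other.firstName)
                && Objects.equals(lastName, other.lastName);
    }
}
```

This is only likely to pay off when the hash code is cheap to obtain, for example when an immutable object caches it. Here it is recomputed on every call, so the sketch shows the structure rather than a real speed-up.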

Related

Regarding equals() and == in a generic class [duplicate]

I just saw code similar to this:
public class Scratch
{
public static void main(String[] args)
{
Integer a = 1000, b = 1000;
System.out.println(a == b);
Integer c = 100, d = 100;
System.out.println(c == d);
}
}
When run, this block of code will print out:
false
true
I understand why the first is false: because the two objects are separate objects, so the == compares the references. But I can't figure out, why is the second statement returning true? Is there some strange autoboxing rule that kicks in when an Integer's value is in a certain range? What's going on here?
The true line is actually guaranteed by the language specification. From section 5.1.7:
If the value p being boxed is true, false, a byte, a char in the range \u0000 to \u007f, or an int or short number between -128 and 127, then let r1 and r2 be the results of any two boxing conversions of p. It is always the case that r1 == r2.
The discussion goes on, suggesting that although your second line of output is guaranteed, the first isn't (see the last paragraph quoted below):
Ideally, boxing a given primitive value p, would always yield an identical reference. In practice, this may not be feasible using existing implementation techniques. The rules above are a pragmatic compromise. The final clause above requires that certain common values always be boxed into indistinguishable objects. The implementation may cache these, lazily or eagerly.
For other values, this formulation disallows any assumptions about the identity of the boxed values on the programmer's part. This would allow (but not require) sharing of some or all of these references.
This ensures that in most common cases, the behavior will be the desired one, without imposing an undue performance penalty, especially on small devices. Less memory-limited implementations might, for example, cache all characters and shorts, as well as integers and longs in the range of -32K - +32K.
public class Scratch
{
public static void main(String[] args)
{
Integer a = 1000, b = 1000; //1
System.out.println(a == b);
Integer c = 100, d = 100; //2
System.out.println(c == d);
}
}
Output:
false
true
Yes, the first output comes from comparing references: 'a' and 'b' refer to two different objects. At point 1, two separate objects are created, which is similar to:
Integer a = new Integer(1000);
Integer b = new Integer(1000);
The second output is produced because the JVM tries to save memory when the Integer value falls in the range -128 to 127. At point 2, no new Integer object is created for 'd'. Instead, 'd' is assigned the previously created object that 'c' already references. All of this is done by the JVM.
These memory-saving rules are not limited to Integer. Two instances of the following wrapper types, when created through boxing, will always be == if their primitive values are the same:
Boolean
Byte
Character from \u0000 to \u007f (7f is 127 in decimal)
Short and Integer from -128 to 127
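The list above can be checked with a small sketch. The class and method names are hypothetical. Each method boxes the same primitive value twice and compares the two references, and only values inside the guaranteed ranges are shown, since the result for other values is not specified.

```java
final class BoxingIdentityDemo {
    // Each method boxes the same primitive twice and compares the references.
    static boolean sameBoolean(boolean v) {
        Boolean a = v, b = v;
        return a == b;
    }

    static boolean sameByte(byte v) {
        Byte a = v, b = v;
        return a == b;
    }

    static boolean sameChar(char v) {
        Character a = v, b = v;
        return a == b;
    }

    static boolean sameShort(short v) {
        Short a = v, b = v;
        return a == b;
    }

    static boolean sameInt(int v) {
        Integer a = v, b = v;
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(sameBoolean(true));
        System.out.println(sameByte((byte) -128));
        System.out.println(sameChar('\u007f'));
        System.out.println(sameShort((short) 127));
        System.out.println(sameInt(-128));
        System.out.println(sameInt(127));
    }
}
```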
Integer objects in some range (I think maybe -128 through 127) get cached and re-used. Integers outside that range get a new object each time.
Integer Cache is a feature that was introduced in Java version 5, basically for:
Saving memory space
Improving performance
Integer number1 = 127;
Integer number2 = 127;
System.out.println("number1 == number2: " + (number1 == number2));
OUTPUT: number1 == number2: true
Integer number1 = 128;
Integer number2 = 128;
System.out.println("number1 == number2: " + (number1 == number2));
OUTPUT: number1 == number2: false
HOW?
Actually, when we assign a value to an Integer object, autoboxing takes place under the hood.
Integer object = 100;
is actually a call to the Integer.valueOf() method:
Integer object = Integer.valueOf(100);
Nitty-gritty details of valueOf(int)
public static Integer valueOf(int i) {
if (i >= IntegerCache.low && i <= IntegerCache.high)
return IntegerCache.cache[i + (-IntegerCache.low)];
return new Integer(i);
}
Description:
This method will always cache values in the range -128 to 127,
inclusive, and may cache other values outside of this range.
When a value in the range -128 to 127 is requested, the same cached object is returned every time.
However, when we need a value that is greater than 127 (or less than -128),
return new Integer(i);
returns a new object every time.
Furthermore, the == operator in Java compares two object references, not their values.
Object1 is located at, say, address 1000 and contains the value 6.
Object2 is located at, say, address 1020 and contains the value 6.
Object1 == Object2 is false because they have different memory locations, even though they contain the same value.
Yes, there is a strange autoboxing rule that kicks in when the values are in a certain range. When you assign a constant to an Object variable, nothing in the language definition says a new object must be created. It may reuse an existing object from cache.
In fact, the JVM will usually store a cache of small Integers for this purpose, as well as values such as Boolean.TRUE and Boolean.FALSE.
My guess is that Java keeps a cache of small integers that are already 'boxed' because they are so very common and it saves a heck of a lot of time to re-use an existing object than to create a new one.
That is an interesting point.
The book Effective Java suggests always overriding equals for your own classes, and always using the equals method to check the equality of two object instances of a Java class.
public class Scratch
{
public static void main(String[] args)
{
Integer a = 1000, b = 1000;
System.out.println(a.equals(b));
Integer c = 100, d = 100;
System.out.println(c.equals(d));
}
}
returns:
true
true
Direct assignment of an int literal to an Integer reference is an example of auto-boxing, where the literal value to object conversion code is handled by the compiler.
So during compilation phase compiler converts Integer a = 1000, b = 1000; to Integer a = Integer.valueOf(1000), b = Integer.valueOf(1000);.
So it is Integer.valueOf() method which actually gives us the integer objects, and if we look at the source code of Integer.valueOf() method we can clearly see the method caches integer objects in the range -128 to 127 (inclusive).
/**
* Returns an {@code Integer} instance representing the specified
* {@code int} value. If a new {@code Integer} instance is not
* required, this method should generally be used in preference to
* the constructor {@link #Integer(int)}, as this method is likely
* to yield significantly better space and time performance by
* caching frequently requested values.
*
* This method will always cache values in the range -128 to 127,
* inclusive, and may cache other values outside of this range.
*
* @param i an {@code int} value.
* @return an {@code Integer} instance representing {@code i}.
* @since 1.5
*/
public static Integer valueOf(int i) {
if (i >= IntegerCache.low && i <= IntegerCache.high)
return IntegerCache.cache[i + (-IntegerCache.low)];
return new Integer(i);
}
So instead of creating and returning new Integer objects, the Integer.valueOf() method returns Integer objects from the internal IntegerCache if the passed int value is between -128 and 127 (inclusive).
Java caches these integer objects because this range of integers gets used a lot in day to day programming which indirectly saves some memory.
The cache is initialized on the first usage when the class gets loaded into memory because of the static block. The max range of the cache can be controlled by the -XX:AutoBoxCacheMax JVM option.
This caching behaviour is not applicable for Integer objects only, similar to Integer.IntegerCache we also have ByteCache, ShortCache, LongCache, CharacterCache for Byte, Short, Long, Character respectively.
You can read more on my article Java Integer Cache - Why Integer.valueOf(127) == Integer.valueOf(127) Is True.
In Java, boxed Integer values are cached in the range -128 to 127. Within this range, == works because both references point to the same cached object. For Integer objects outside the range you have to use equals.
If we check the source code of the Integer class, we can find the source of the valueOf method just like this:
public static Integer valueOf(int i) {
if (i >= IntegerCache.low && i <= IntegerCache.high)
return IntegerCache.cache[i + (-IntegerCache.low)];
return new Integer(i);
}
This explains why Integer objects in the range from -128 (IntegerCache.low) to 127 (IntegerCache.high) are the same referenced objects during autoboxing. And we can see there is a class IntegerCache that takes care of the Integer cache array, which is a private static inner class of the Integer class.
There is another interesting example that may help to understand this weird situation:
public static void main(String[] args) throws ReflectiveOperationException {
Class cache = Integer.class.getDeclaredClasses()[0];
Field myCache = cache.getDeclaredField("cache");
myCache.setAccessible(true);
Integer[] newCache = (Integer[]) myCache.get(cache);
newCache[132] = newCache[133];
Integer a = 2;
Integer b = a + a;
System.out.printf("%d + %d = %d", a, a, b); // The output is: 2 + 2 = 5
}
In Java 5, a new feature was introduced to save memory and improve performance when handling Integer objects. Integer objects are cached internally and reused via the same references.
This applies to Integer values in the range -128 to +127 by default.
This Integer caching works only with autoboxing (and Integer.valueOf()). Integer objects are not cached when they are built using the constructor.
For more detail, see the link below:
Integer Cache in Detail
Class Integer contains a cache of values between -128 and 127, as required by JLS 5.1.7 (Boxing Conversion). So when you use == to check the equality of two Integers in this range, you get the same cached object, and if you compare two Integers outside this range, you get two different objects.
You can increase the cache upper bound by changing the JVM parameters:
-XX:AutoBoxCacheMax=<cache_max_value>
or
-Djava.lang.Integer.IntegerCache.high=<cache_max_value>
See inner IntegerCache class:
/**
* Cache to support the object identity semantics of autoboxing for values
* between -128 and 127 (inclusive) as required by JLS.
*
* The cache is initialized on first usage. The size of the cache
* may be controlled by the {@code -XX:AutoBoxCacheMax=<size>} option.
* During VM initialization, java.lang.Integer.IntegerCache.high property
* may be set and saved in the private system properties in the
* sun.misc.VM class.
*/
private static class IntegerCache {
static final int low = -128;
static final int high;
static final Integer cache[];
static {
// high value may be configured by property
int h = 127;
String integerCacheHighPropValue =
sun.misc.VM.getSavedProperty("java.lang.Integer.IntegerCache.high");
if (integerCacheHighPropValue != null) {
try {
int i = parseInt(integerCacheHighPropValue);
i = Math.max(i, 127);
// Maximum array size is Integer.MAX_VALUE
h = Math.min(i, Integer.MAX_VALUE - (-low) -1);
} catch( NumberFormatException nfe) {
// If the property cannot be parsed into an int, ignore it.
}
}
high = h;
cache = new Integer[(high - low) + 1];
int j = low;
for(int k = 0; k < cache.length; k++)
cache[k] = new Integer(j++);
// range [-128, 127] must be interned (JLS7 5.1.7)
assert IntegerCache.high >= 127;
}
private IntegerCache() {}
}
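To see where the cache ends on a given JVM, a small probe like the following can help. CacheProbe and isCached are hypothetical names. The probe relies only on Integer.valueOf(), so it reports whatever the running JVM actually does rather than assuming a particular upper bound.

```java
final class CacheProbe {
    // True if two boxing calls for the same value yield the same object,
    // i.e. the value is served from the cache on this JVM.
    static boolean isCached(int value) {
        return Integer.valueOf(value) == Integer.valueOf(value);
    }

    public static void main(String[] args) {
        for (int v : new int[] {-128, 127, 128, 1000}) {
            System.out.println(v + " cached: " + isCached(v));
        }
    }
}
```

With default settings, -128 and 127 must report true, and 128 and 1000 typically report false. Starting the JVM with the -XX:AutoBoxCacheMax option mentioned above (for example a value of 1000) should change the result for the larger values on JVMs that honour that option.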

correct way to compare double in equals()-method [duplicate]

I have double types within my class and have to override equals()/hashCode(). So I need to compare double values.
Which is the correct way?
Version 1:
boolean isEqual(double a, double b) {
    return Double.doubleToLongBits(a) == Double.doubleToLongBits(b);
}
Version 2:
boolean isEqual(double a, double b) {
    final double THRESHOLD = .0001;
    return Math.abs(a - b) < THRESHOLD;
}
Or should I avoid primitive double at all and use its wrapper type Double ? With this I can use Objects.equals(a,b), if a and b are Double.
The recommended way for use in equals/hashCode methods[citation needed] is to use Double.doubleToLongBits() and Double.hashCode() respectively.
This is because the equals/hashCode contract requires two objects to compare as 'different' whenever their hash codes are different. There is no such restriction the other way around.
(Note: It turns out that Double.compare() internally uses doubleToLongBits() but this is not specified by the API. As such I won't recommend it. On the other hand, hashCode() does specify that it uses doubleToLongBits().)
Practical example:
@Override
public boolean equals(Object obj) {
if (obj == null || getClass() != obj.getClass())
return false;
Vector2d other = (Vector2d)obj;
return Double.doubleToLongBits(x) == Double.doubleToLongBits(other.x) &&
Double.doubleToLongBits(y) == Double.doubleToLongBits(other.y);
}
@Override
public int hashCode() {
int hash = 0x811C9DC5;
hash ^= Double.hashCode(x);
hash *= 0x01000193;
hash ^= Double.hashCode(y);
hash *= 0x01000193;
return hash;
}
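The practical difference between == and a doubleToLongBits() comparison shows up for NaN and signed zero. The following sketch uses a hypothetical helper named bitsEqual to make that visible. It follows the documented behaviour of Double.doubleToLongBits(), which maps every NaN to one canonical bit pattern.

```java
final class DoubleEqualityDemo {
    // Same comparison as Version 1 in the question.
    static boolean bitsEqual(double a, double b) {
        return Double.doubleToLongBits(a) == Double.doubleToLongBits(b);
    }

    public static void main(String[] args) {
        double nan = Double.NaN;
        System.out.println(nan == nan);            // false: NaN is never == to anything
        System.out.println(bitsEqual(nan, nan));   // true: same canonical bit pattern
        System.out.println(0.0 == -0.0);           // true: numerically equal
        System.out.println(bitsEqual(0.0, -0.0));  // false: the sign bit differs
    }
}
```

The bitwise version is reflexive even for NaN, which equals() needs, and it agrees with what Double.equals() does for boxed values.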
double values should not be used as a component to establish object equality, and therefore not for its hash code either.
This comes from the fact that there is inherent imprecision in floating-point numbers, and that double saturates artificially at +/-Infinity.
To illustrate this problem:
System.out.println(Double.compare(0.1d + 0.2d, 0.3d));
System.out.println(Double.compare(Math.pow(3e27d, 127d), 17e256d / 7e-128d));
prints:
1
0
... which translates to the following 2 false statements:
0.1 + 0.2 > 0.3
(3 * 10^27)^127 == 17 * 10^256 / (7 * 10^-128)
So your software will make you act on 2 equal numbers being unequal, or 2 very large or very small unequal numbers being equal.

Generating hashCode() for a custom class

I have a class named Dish and I handle it inside ArrayLists
So I had to override default hashCode() method.
@Override
public int hashCode() {
int hash =7;
hash = 3*hash + this.getId();
hash = 3*hash + this.getQuantity();
return hash;
}
When I get two dishes with id=4,quan=3 and id=5,quan=0, hashCode() for both is same;
((7*3)+4)*3+3 = 78
((7*3)+5)*3+0 = 78
What am I doing wrong? Or the magic numbers 7 and 3 I have chosen is wrong?
How do I properly override hashCode() so that it generates unique hashes?
PS: From what I searched from google and SO, people use different numbers but the same method. If the problem is with the numbers, how do I wisely choose numbers that doesn't actually increase the cost for multiplication and at the same time works well even for more number of attributes.
Say I had 7 int attributes and my second magic no. is 31, the final hash will be first magic no. * 27512614111 even if all my attributes are 0. So how do I do it without having my hashed value in billions so as keep my processor burden-free?
You can use something like this
public int hashCode() {
int result = 17;
result = 31 * result + getId();
result = 31 * result + getQuantity();
return result;
}
One more thing: if your id is unique for each object, then there is no need to use quantity when calculating the hash code.
Here is an extract from Effective Java by Joshua Bloch explaining how to implement the hashCode method:
1. Store some constant nonzero value, say, 17, in an int variable called result.
2. For each significant field f in your object (each field taken into account by the equals method, that is), do the following:
  a. Compute an int hash code c for the field:
    i. If the field is a boolean, compute (f ? 1 : 0).
    ii. If the field is a byte, char, short, or int, compute (int) f.
    iii. If the field is a long, compute (int) (f ^ (f >>> 32)).
    iv. If the field is a float, compute Float.floatToIntBits(f).
    v. If the field is a double, compute Double.doubleToLongBits(f), and then hash the resulting long as in step 2.a.iii.
    vi. If the field is an object reference and this class’s equals method compares the field by recursively invoking equals, recursively invoke hashCode on the field. If a more complex comparison is required, compute a “canonical representation” for this field and invoke hashCode on the canonical representation. If the value of the field is null, return 0 (or some other constant, but 0 is traditional).
    vii. If the field is an array, treat it as if each element were a separate field. That is, compute a hash code for each significant element by applying these rules recursively, and combine these values per step 2.b. If every element in an array field is significant, you can use one of the Arrays.hashCode methods added in release 1.5.
  b. Combine the hash code c computed in step 2.a into result as follows:
    result = 31 * result + c;
3. Return result.
When you are finished writing the hashCode method, ask yourself whether equal instances have equal hash codes. Write unit tests to verify your intuition! If equal instances have unequal hash codes, figure out why and fix the problem.
Source: Effective Java by Joshua Bloch
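The recipe quoted above can be sketched as follows, assuming a hypothetical class Sample with one field of each kind it mentions (the class and all field names are invented for illustration). The comments refer to the step labels used in the extract, and equals() compares exactly the fields that hashCode() uses.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical class with one field per case covered by the recipe.
final class Sample {
    private final boolean flag;
    private final int count;
    private final long stamp;
    private final float ratio;
    private final double weight;
    private final String name;   // may be null
    private final int[] scores;  // may be null

    Sample(boolean flag, int count, long stamp, float ratio,
           double weight, String name, int[] scores) {
        this.flag = flag;
        this.count = count;
        this.stamp = stamp;
        this.ratio = ratio;
        this.weight = weight;
        this.name = name;
        this.scores = scores;
    }

    @Override
    public int hashCode() {
        int result = 17;                                              // step 1
        result = 31 * result + (flag ? 1 : 0);                        // 2.a.i
        result = 31 * result + count;                                 // 2.a.ii
        result = 31 * result + (int) (stamp ^ (stamp >>> 32));        // 2.a.iii
        result = 31 * result + Float.floatToIntBits(ratio);           // 2.a.iv
        long bits = Double.doubleToLongBits(weight);                  // 2.a.v
        result = 31 * result + (int) (bits ^ (bits >>> 32));
        result = 31 * result + (name == null ? 0 : name.hashCode());  // 2.a.vi
        result = 31 * result + Arrays.hashCode(scores);               // 2.a.vii
        return result;                                                // step 3
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof Sample)) {
            return false;
        }
        Sample o = (Sample) obj;
        return flag == o.flag
                && count == o.count
                && stamp == o.stamp
                && Float.floatToIntBits(ratio) == Float.floatToIntBits(o.ratio)
                && Double.doubleToLongBits(weight) == Double.doubleToLongBits(o.weight)
                && Objects.equals(name, o.name)
                && Arrays.equals(scores, o.scores);
    }
}
```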
This is perfectly OK. The hashing function is not supposed to be universally unique - it just gives a quick hint about which elements might be equal and should be checked in more depth by a call to equals().
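A short sketch of why the collision from the question is harmless. It reuses the question's hashCode() for Dish, while the equals() method is an assumption, since the question does not show one. A HashSet still keeps the two colliding dishes apart because it falls back to equals() within a bucket.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the Dish class; equals() is assumed, not taken from the question.
final class Dish {
    private final int id;
    private final int quantity;

    Dish(int id, int quantity) {
        this.id = id;
        this.quantity = quantity;
    }

    @Override
    public int hashCode() {
        // Same formula as in the question.
        int hash = 7;
        hash = 3 * hash + id;
        hash = 3 * hash + quantity;
        return hash;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof Dish)) {
            return false;
        }
        Dish other = (Dish) obj;
        return id == other.id && quantity == other.quantity;
    }

    public static void main(String[] args) {
        Set<Dish> dishes = new HashSet<>();
        dishes.add(new Dish(4, 3));
        dishes.add(new Dish(5, 0)); // same hash code (78), different dish
        dishes.add(new Dish(4, 3)); // duplicate, rejected via equals()
        System.out.println(dishes.size()); // 2
    }
}
```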
From the name of the class, it looks like quantity is the number of dishes, so there is a good chance that it will often be zero. I would suggest that when getQuantity() is zero, you use a variable, say x, in the hash function, like this:
@Override
public int hashCode() {
    int hash = 7;
    int x;
    if (getQuantity() == 0) {
        x = getQuantity() + getId();
    } else {
        x = getQuantity();
    }
    hash = 3 * hash + this.getId();
    hash = 3 * hash + x;
    return hash;
}
I believe this should reduce hash collisions. Since the getId() you have is a unique number, it makes x a unique number too.

Good hashCode() Implementation

The accepted answer in Best implementation for hashCode method gives a seemingly good method for finding Hash Codes. But I'm new to Hash Codes, so I don't quite know what to do.
For 1), does it matter what nonzero value I choose? Is 1 just as good as other numbers such as the prime 31?
For 2), do I add each value to c? What if I have two fields that are both a long, int, double, etc?
Did I interpret it right in this class:
public class MyClass {
    long a, b, c; // these are the only fields
    // some code and methods
    public int hashCode() {
        return 37 * (37 * ((int) (a ^ (a >>> 32))) + (int) (b ^ (b >>> 32)))
                + (int) (c ^ (c >>> 32));
    }
}
The value is not important, it can be whatever you want. Prime numbers will result in a better distribution of the hashCode values therefore they are preferred.
You do not necessarily have to add them. You are free to implement whatever algorithm you want, as long as it fulfills the hashCode contract:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
Some algorithms are not considered good hashCode implementations, simply adding up the attribute values being one of them. The reason is that if you have a class with two fields, Integer a and Integer b, and your hashCode() just sums these values, then the distribution of the hashCode values is highly dependent on the values your instances store. For example, if most of the values of a are between 0 and 10 and those of b are between 0 and 10, then the hashCode values will be between 0 and 20. This implies that if you store instances of this class in e.g. a HashMap, numerous instances will be stored in the same bucket (because numerous instances with different a and b values but with the same sum will be put inside the same bucket). This will have a bad impact on the performance of operations on the map, because when doing a lookup all the elements in the bucket will be compared using equals().
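The effect described above can be counted directly. The sketch below uses hypothetical names (HashSpreadDemo, sumHash, primeHash) and compares a summing hash with a multiplier-based one over all pairs of a and b in the range 0 to 10. The seed 17 and multiplier 31 are just the values commonly used in this thread.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntBinaryOperator;

final class HashSpreadDemo {
    // Naive hash: just the sum of the two fields.
    static int sumHash(int a, int b) {
        return a + b;
    }

    // Multiplier-based hash in the style discussed in this thread.
    static int primeHash(int a, int b) {
        int result = 17;
        result = 31 * result + a;
        result = 31 * result + b;
        return result;
    }

    // Counts distinct hash codes over all pairs (a, b) with 0 <= a, b <= max.
    static int countDistinct(int max, IntBinaryOperator hash) {
        Set<Integer> seen = new HashSet<>();
        for (int a = 0; a <= max; a++) {
            for (int b = 0; b <= max; b++) {
                seen.add(hash.applyAsInt(a, b));
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // 121 pairs in total
        System.out.println(countDistinct(10, HashSpreadDemo::sumHash));   // 21
        System.out.println(countDistinct(10, HashSpreadDemo::primeHash)); // 121
    }
}
```

With the summing hash, 121 different instances share only 21 hash codes. With the multiplier-based hash, each pair in this small range gets its own value.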
Regarding the algorithm, it looks fine, it is very similar to the one that Eclipse generates, but it is using a different prime number, 31 not 37:
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + (int) (a ^ (a >>> 32));
result = prime * result + (int) (b ^ (b >>> 32));
result = prime * result + (int) (c ^ (c >>> 32));
return result;
}
A well-behaved hashcode method already exists for long values - don't reinvent the wheel:
int hashCode = Long.hashCode((a * 31 + b) * 31 + c); // Java 8+
int hashCode = Long.valueOf((a * 31 + b) * 31 + c).hashCode(); // Java <8
Multiplying by a prime number (usually 31 in JDK classes) and accumulating the sum is a common method of creating a "unique" number from several numbers.
The hashCode() method of Long folds the upper and lower 32 bits of the value together with XOR, so all 64 bits contribute to the resulting int.
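As a sketch of how this fits into a class, assume a hypothetical LongTriple with three long fields (the name and the fold helper are invented here). The fold method is the manual expression from the question, included to show that it gives the same result as Long.hashCode().

```java
final class LongTriple {
    private final long a;
    private final long b;
    private final long c;

    LongTriple(long a, long b, long c) {
        this.a = a;
        this.b = b;
        this.c = c;
    }

    // Manual fold of a long into an int, as written in the question.
    static int fold(long v) {
        return (int) (v ^ (v >>> 32));
    }

    @Override
    public int hashCode() {
        // Combine the fields into one long, then let Long.hashCode fold it.
        return Long.hashCode((a * 31 + b) * 31 + c);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof LongTriple)) {
            return false;
        }
        LongTriple o = (LongTriple) obj;
        return a == o.a && b == o.b && c == o.c;
    }
}
```

Note the design difference from the Eclipse-style version: here the fields are combined as longs first and folded once, rather than folded individually and combined as ints. Both satisfy the contract.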

How to decide hashcode value?

How to decide hashcode value? Recently I was asked in an interview, "Is 17 a valid hashcode?". Is there any mechanism to define a hashcode value, or can we use any number as a hashcode value?
Hashcodes should have good dispersion, so that different objects will be saved in different positions of the hash table (otherwise performance could be degraded).
From that point of view, while 17 is a "valid" hash code (in the sense that it is a 32-bit signed integer), it raises the question of how the hash function was defined.
For instance, a naive approach for hashing a string is just adding the value of each character. This results in identical hash values for simple strings (such as "tar" and "rat", which sum to the same value).
A common trick is multiplying each value by a small prime, so that simple inputs will return different values, e.g.:
int result = 1;
result = 31 * result + a;
result = 31 * result + b;
or
int h=0;
for (int i = 0; i < len; i++) {
h = 31*h + val[off++];
}
(the latter, from the JRE implementation of String.hashCode)
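The "tar" and "rat" example can be checked with a small sketch. The class and method names are hypothetical. additiveHash is the naive sum, and polynomialHash repeats the 31-multiplier loop shown above.

```java
final class AnagramHashDemo {
    // Naive hash: sum of the character values, so order is ignored.
    static int additiveHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h += s.charAt(i);
        }
        return h;
    }

    // Multiplier-based hash: each position is weighted by a power of 31.
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(additiveHash("tar") == additiveHash("rat"));     // true
        System.out.println(polynomialHash("tar") == polynomialHash("rat")); // false
    }
}
```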
Yes, 17 is a perfectly valid hashcode.
Whatever method you select to derive the hashcode, it should always return the same integer for the object (as long as its state remains the same).
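To illustrate that a constant such as 17 satisfies the contract, here is a sketch with a hypothetical key class whose hashCode() always returns 17. A HashMap still behaves correctly because equals() distinguishes the keys. The cost is that every key lands in the same bucket, so lookups get slower as the map grows.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key class with a deliberately constant hash code.
final class ConstantHashKey {
    private final String name;

    ConstantHashKey(String name) {
        this.name = name;
    }

    @Override
    public int hashCode() {
        return 17; // valid: equal objects trivially have equal hash codes
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof ConstantHashKey)) {
            return false;
        }
        return name.equals(((ConstantHashKey) obj).name);
    }

    public static void main(String[] args) {
        Map<ConstantHashKey, Integer> map = new HashMap<>();
        map.put(new ConstantHashKey("a"), 1);
        map.put(new ConstantHashKey("b"), 2);
        map.put(new ConstantHashKey("a"), 3); // replaces the first entry
        System.out.println(map.size());                        // 2
        System.out.println(map.get(new ConstantHashKey("b"))); // 2
    }
}
```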
Yes, 17 is valid. Usually prime numbers, like those shown in the link, are used. You can implement hashCode using the id of the entity, which is its primary key:
public int hashCode()
{
int result = 17;
result = 37 * result + (getId() == null ? 0 : this.getId().hashCode());
return result;
}
This provides different ways of implementing hashCode.
