Can I implement hashCode and equals this way for (Hash)Sets? - java

I know that if a.equals(b), we must have a.hashCode() == b.hashCode() else we get strange results, but I'm wondering if the converse is also required.
More specifically, I have a hashCode() function that uses the field id to calculate the hashCode. However, my equals() function only uses the simple comparison "==" to check for equality. This may seem strange but unless more details are required, it's simply how I've implemented it.
Now the question is will this mess anything up? Specifically for HashSets, but more generally, for any (common) implementations of Set.
The way I understand it, a Set will first check the hashCode and then the equals method to see if a duplicate object exists. In this case, it should work, right? If two objects are the same instance, they will produce the same hashCode as well as return true for equals(), and thus the Set will only allow the instance to be added once.
For two separate instances with the same id, the hashCode will be identical but then the equals() operator will return false and thus allow both objects to enter the Set, which is what I hope to accomplish.
Is this a beginner's mistake? Am I missing something? Will this have unexpected results for any collection types other than Set?
edit:
I guess I should explain myself. I have a Hibernate object FooReference which implements both a hashCode and equals method using the id. This object is guaranteed to always have a unique id. However, before this object is persisted, I use a Foo object which has a default id of -1. So when putting it in a Set (to be saved) I know each Foo is unique (thus the basic == operator). So this Foo which extends FooReference overrides the equals method with a basic ==. For my purposes this should work... hopefully.

Objects are allowed to have the same hashcode without being equal to each other. In fact, it's perfectly valid (though inefficient and a bad idea) to implement hashCode as simply return 0, giving every instance the same hashcode.
All that's required is that if two objects are equal (as determined by the equals method), they have the same hashcode.
However, if your equals method just compares the two objects using == internally, no two (distinct) instances will ever be equal to each other, so there's no point defining your own hashCode and equals methods at all. The default implementations will produce the same behavior.
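A minimal sketch of the situation in the question, assuming a hypothetical Foo class (the class, field, and method names are illustrative, not the asker's actual Hibernate code) whose hashCode() uses id while equals() compares by identity:

```java
import java.util.HashSet;
import java.util.Set;

class IdentityEqualsDemo {
    // Hypothetical stand-in for the asker's unsaved Foo: every instance has id -1.
    static class Foo {
        final int id = -1;

        @Override
        public int hashCode() {
            return Integer.hashCode(id); // identical for every unsaved Foo
        }

        @Override
        public boolean equals(Object other) {
            return this == other; // identity comparison only
        }
    }

    // Adds one instance twice and a second instance once, then reports the set size.
    static int sizeAfterAdds() {
        Foo first = new Foo();
        Foo second = new Foo();
        Set<Foo> set = new HashSet<>();
        set.add(first);
        set.add(first);  // rejected: same instance
        set.add(second); // accepted: same hash code, but equals() is false
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterAdds()); // prints 2
    }
}
```

All unsaved instances share one hash code, so they collide in the same bucket. The set still behaves correctly; only lookup speed suffers as more such instances are added.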

Related

Why does Java need equals() if there is hashCode()?

If two objects return same hashCode, doesn't it mean that they are equal? Or we need equals to prevent collisions?
And can I implement equals by comparing hashCodes?
If two objects have the same hashCode then they are NOT necessarily equal. Otherwise you will have discovered the perfect hash function. But the opposite is true - if the objects are equal, then they must have the same hashCode.
hashCode and equals convey different information about objects.
Consider the analogy to persons, where the hashcode is the birthday:
in that scenario, you and many other people have the same birthday (same hashcode), yet you are not all the same person.
Why does Java need equals() if there is hashCode()?
Java needs equals() because it is the method through which object equality is tested by examining classes, fields, and other conditions the designer considers to be part of an equality test.
The purpose of hashCode() is to provide a hash value primarily for use by hash tables; though it can also be used for other purposes. The value returned is based on an object's fields and hash codes of its composite and/or aggregate objects. The method does not take into account the class or type of object.
The relationship between equals() and hashCode() is an implication.
Two objects being equal implies that they have the same hash code.
Two objects having the same hash code does not imply that they are equal.
The latter does not hold for several reasons:
There is a chance that two distinct objects may return the same hash code. Keep in mind that a hash value folds information from a large amount of data into a smaller number.
Two objects from different classes with similar fields will most likely use the same type of hash function, and return equal hash values; yet, they are not the same.
hashCode() can be implementation-specific returning different values on different JVMs or JVM target installations.
Within the same JVM, hashCode() can be used as a cheap precursor to an equality test: compare the hash codes first, and test actual equality only if they match. This pays off only when the equality test is significantly more expensive than generating a hash code.
And can I implement equals by comparing hashCodes?
No. As mentioned, equal hash codes do not imply equal objects.
The hashCode method as stated in the Oracle Docs is a numeric representation of an object in Java. This hash code has limited possible values (represented by the values which can be stored in an int).
For a more complex class, there is a high possibility that you will find two different objects which have the same hash code value. Also, no one stops you from doing this inside any class.
class Test {
    @Override
    public int hashCode() {
        return 0;
    }
}
So, it is not recommended to implement the equals method by comparing hash codes. You should use them for comparison only if you can guarantee that each object has a unique hash code. In most cases, your only certainty is that if two objects are equal using o1.equals(o2) then o1.hashCode() == o2.hashCode().
In the equals method you can define a more complex logic for comparing two objects of the same class.
If two objects return same hashCode, doesn't it mean that they are equal?
No it doesn't mean that.
The javadocs for Object state this:
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently
return the same integer, provided no information used in equals
comparisons on the object is modified. ...
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must
produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on
each of the two objects must produce distinct integer results. ...
Note the last statement. It plainly says "No" to your question.
There is another way to look at this.
The hashCode returns an int.
There are only 2^32 distinct values that an int can take.
If a.hashCode() == b.hashCode() implies a.equals(b), then there can be only 2^32 distinct (i.e. mutually unequal) objects at any given time in a running Java application.
That last point is plainly not true. Indeed, it is demonstrably not true if you have a large enough heap to hold 2^32 instances of java.lang.Object ... in a 64-bit JVM.
And a third way is to look at some well-known examples where two different two-character strings have the same hashcode.
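One such pair is "Aa" and "BB". The sketch below (class and method names are illustrative) relies only on the documented String.hashCode() formula, under which both strings hash to 65*31 + 97 = 66*31 + 66 = 2112:

```java
class StringCollisionDemo {
    // True when the two strings share a hash code yet are not equal.
    static boolean collideButDiffer(String a, String b) {
        return a.hashCode() == b.hashCode() && !a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode());                // prints 2112
        System.out.println("BB".hashCode());                // prints 2112
        System.out.println(collideButDiffer("Aa", "BB"));   // prints true
    }
}
```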
Given that your assumption is incorrect, the reasoning that follows from it is also incorrect.
Java does need an equals method.
You generally cannot implement equals using just hashCode.
You may be able to use hashCode to implement a faster equals method, but only if calling hashCode twice is faster than comparing two objects. It generally isn't.
hashCodes are equal -> Objects might be equal -> further comparison is required
hashCodes are different -> Objects are not equal (if hashCode is implemented right)
That's how hash-based comparisons work. First you check whether the hashCodes are equal. If they are, you still need to check the class fields to see whether the two really represent the same object. If the hashCodes are different, you can be sure that the objects are not equal.
Sometimes (very often?) you don't!
These answers are not untrue. But they don't tell the whole story.
One example would be where you are creating a load of objects of class SomeClass, and each instance that is created is given a unique ID by incrementing a static variable, nInstanceCount, or some such, in the constructor:
iD = nInstanceCount++;
Your hash function could then be
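@Override
public int hashCode() {
    // each instance's unique ID doubles as its hash code
    return iD;
}
and your equals could then be
@Override
public boolean equals(Object obj) {
    if (!(obj instanceof SomeClass)) {
        return false;
    }
    // IDs are unique, so equal hash codes can only mean the same instance
    return hashCode() == obj.hashCode();
}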
... under such circumstances your idea that "equals is superfluous" is effectively true: if all classes behaved like this, Java 10 (or Java 23) might say, ah, let's just get rid of silly old equals, what's the point? (NB backwards compatibility would then go out the window).
There are two essential points:
you couldn't then create more than MAXINT instances of SomeClass. Or... you could ... if you set up a system for reassigning the IDs of previously destroyed instances. IDs are typically long rather than int ... but this wouldn't work because hashCode() returns int.
none of these objects could then be "equal" to another one, since equality = identity for this particular class, as you have defined it. Often this is desirable. Often it shuts off whole avenues of possibilities...
The necessary implication of your question is, perhaps, what's the use of these two methods which, in a rather annoying way, have to "cooperate"? Frelling, in his/her answer, alluded to the crucial point: hash codes are needed for sorting into "buckets" with classes like HashMap. It's well worth reading up on this: the amount of advanced maths that has gone into designing efficient "bucket" mechanisms for classes like HashMap is quite frightening. After reading up on it you may come to have (like me) a bit of understanding and reverence about how and why you should bother implementing hashCode() with a bit of thought!

Why override equals instead of using another method name

This seems like a silly question, but why do we override the equals method instead of creating a new method with a new name and comparing with that?
If I don't override equals, does that mean both == and equals check whether both references point to the same memory location?
This seems like a silly question, but why do we override the equals method instead of creating a new method with a new name and comparing with that?
Because all standard collections (ArrayList, LinkedList, HashSet, HashMap, ...) use equals when deciding if two objects are equal.
If you invent a new method these collections wouldn't know about it and not work as intended.
The following is very important to understand: if a collection such as ArrayList calls Object.equals, this call will, at runtime, resolve to the overridden method. So even though you invent classes that the collections are not aware of, they can still invoke methods, such as equals, on those classes.
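As a rough illustration of that dispatch, the sketch below uses two hypothetical classes: Point overrides equals, while PlainPoint only offers a custom sameAs method that ArrayList never calls.

```java
import java.util.ArrayList;
import java.util.List;

class DispatchDemo {
    // Hypothetical value class that overrides equals (and hashCode, to keep the contract).
    static class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() { return 31 * x + y; }
    }

    // Same data, but the comparison lives in a method the collections know nothing about.
    static class PlainPoint {
        final int x, y;
        PlainPoint(int x, int y) { this.x = x; this.y = y; }

        boolean sameAs(PlainPoint p) { return x == p.x && y == p.y; }
    }

    static boolean listFindsPoint() {
        List<Point> points = new ArrayList<>();
        points.add(new Point(1, 2));
        return points.contains(new Point(1, 2)); // uses Point.equals
    }

    static boolean listFindsPlainPoint() {
        List<PlainPoint> points = new ArrayList<>();
        points.add(new PlainPoint(1, 2));
        return points.contains(new PlainPoint(1, 2)); // falls back to Object.equals (==)
    }

    public static void main(String[] args) {
        System.out.println(listFindsPoint());      // prints true
        System.out.println(listFindsPlainPoint()); // prints false
    }
}
```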
If I don't override equals, does that mean both == and equals check whether both references point to the same memory location?
Yes. The implementation of Object.equals just performs a == check.
You override equals if you are using classes that rely on equals, such as HashMap, HashSet, ArrayList etc...
For example, if you store elements of your class in a HashSet, you must override hashCode and equals if you want the uniqueness of elements to be determined not by simple reference equality.
Yes, if you don't override equals, the default equals implementation (as implemented in the Object class) is the same as ==.
In addition to the main reason, already given in other answers, consider program readability.
If you override equals and hashCode anyone reading your code knows what the methods are for. Doing so tells the reader the criteria for value equality between instances of your class. Someone reading your code that uses equals will immediately know you are checking for value equality.
If you use some other name, it will only confuse readers and cost them extra time reading your JavaDocs to find out what your method is for.
Because equals() is a method of the Object class, which is the superclass of all classes, it is inherently present in every class you write. Hence every collection class and other standard classes use equals() for object comparison. If you want objects of your custom class to be supported for equality by other classes, you have to override equals() (since other classes know to call equals() for object comparison). If you are only using your own classes, you might create a new method and make sure everything uses your custom method for comparison.
The equals and hashCode methods are special methods, widely used across Java's utility classes, especially the collections framework, and classes such as String and the wrapper classes (e.g. Integer) override them. For example, suppose you place an object with correct equals and hashCode implementations into a HashSet. To maintain uniqueness, the HashSet uses the hash code to look for existing elements with a matching hash code. If it finds one, it calls equals to double-check whether the two are really equal, and if that check also passes, the incoming object is rejected. If no hash code matches, the HashSet does not call equals at all and places the object straight into the set. So we need to make sure the implementations of equals and hashCode are logically consistent.
A class like HashMap<T,U> needs to have some means of identifying which item in the collection, if any, should be considered equivalent to a given item. There are two general means via which this can be accomplished:
Requiring that anything to be stored in a collection must include virtual methods to perform such comparison (and preferably provide a quick means (e.g. hashCode()) of assigning partial equivalence classes).
Require that code which creates the collection must supply an object which can accept references to other objects and perform equivalence-related operations upon them.
It would have been possible to omit equals and hashCode() from Object, and have types like HashMap only be usable with key types that implement an equatable interface that includes such members; code which wishes to use collections of references keyed by identity would have to use IdentityHashMap instead. Such a design would not have been unreasonable, but the present design makes it possible for a general-purpose collection-of-collections type which uses HashMap to be usable with things that are compared by value as well as by identity, rather than having to define separate types for collection-of-HashMap and collection-of-IdentityHashMap.
An alternative design might have been to have a GeneralHashMap type whose constructor requires specifying a comparison function, and have IdentityHashMap and HashMap both derive from that; the latter would constrain its type to equatable and have its identity functions chain to those of the objects contained therein. There would probably have been nothing particularly wrong with that design, but that's not how things were done.
In any case, there needs to be some standard means by which collections can identify items that should be considered equivalent; using virtual equals(Object) and hashCode() is a way of doing that.
Question 1
There are two things:
equals() is located in the Object class.
The collections framework uses the equals() and hashCode() methods when comparing objects.
Question 2
Yes, for comparing two Objects. But when you compare two String objects using equals(), it checks only the value.

Can you just return a field's hashCode() value in a hashCode() method?

While reviewing a large code base, I've often come across cases like this:
@Override
public int hashCode()
{
    return someFieldValue.hashCode();
}
where the programmer, instead of generating their own unique hash code for the class, simply inherits the hash code from a field value. My gut feeling (which might just as well be digestive problems) tells me that this is wrong, but I can't put my finger on it. What problems can arise, if any, with this sort of implementation?
This is fine if you want to hash your object based on a single property.
For example, in a Person class you might have an ID property that uniquely identifies a Person, so the hashCode() of Person can simply be the hash of that ID.
In addition, the hashCode() is related to the implementation of equals. If two objects are equal, they must have the same hashCode (the opposite doesn't have to be true - two non equal objects may still have the same hashCode). Therefore, if equality is determined by a single property (such as a unique ID), the hashCode method must also use only that single property.
This can be seen in the JavaDoc of hashCode :
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
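The Person example can be sketched as follows. The field names and the choice of a String ID are assumptions for illustration; the point is only that equals() and hashCode() both rely on the same single property.

```java
import java.util.Objects;

class PersonDemo {
    static final class Person {
        private final String id;   // hypothetical unique identifier
        private final String name; // deliberately not part of equality

        Person(String id, String name) {
            this.id = Objects.requireNonNull(id);
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Person)) return false;
            return id.equals(((Person) o).id);
        }

        @Override
        public int hashCode() {
            return id.hashCode(); // delegate to the only field equals() looks at
        }
    }

    public static void main(String[] args) {
        Person a = new Person("42", "Ann");
        Person b = new Person("42", "Ann B.");
        System.out.println(a.equals(b));                  // prints true
        System.out.println(a.hashCode() == b.hashCode()); // prints true
    }
}
```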
Technically speaking, you can return any consistent number from hashCode, even a constant value. The only requirement the contract places upon you is that equal objects must return the same hash code:
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
Theoretically, if all objects return, say, zero for their hashCode, the contract is formally satisfied. However, this makes hashCode completely useless.
The real question is whether you should do it or not. The answer depends on how unique the field is whose hash code you are returning. It is not uncommon to return the hashCode of a unique identifier of an object for the object's hashCode. On the other hand, if a significant percentage of objects have the same value of someFieldValue, you would be better off using a different strategy for making the hash code of your object.
hashCode() has to go with equals().
If the only property defining equality is, for example, an ID, you HAVE TO take care that your hash codes are equal when the ID is equal.
The easiest way to accomplish this is by taking the hashCode() of your ID.
This is fine, if you really want to uniquely identify your object by this single property. Here is an article that explains what object identity really is.
As noted in the documentation of Object, your equals() and hashCode() need to incorporate the same properties, be sure to verify that.
So this means that you should ask yourself the question: do I really want the objects to be equal if only this single property is equal?
Finally, do take great care when subclassing objects with a custom equals() and hashCode() implementation. If you want to add properties to the identity of the object, you will break the requirement that a.equals(b) == b.equals(a) (to see why this fails, think of a as being the superclass and b as being the subclass).
Yes, technically you can do it; you need a non-primitive someFieldValue for that.

Different fields for equals and hashcode

I agree with the statement from this post What issues should be considered when overriding equals and hashCode in Java?
Use the same set of fields that you use to compute equals() to compute hashCode().
But I have some doubts:
Is it absolutely necessary to use the same fields?
If yes, what if I don't use the same fields?
Will it affect HashMap performance or HashMap accuracy?
The fields don't have to be the same. The requirement is for two objects that are equal, they must have the same hash code. If they have the same hash code, they don't have to be equal. From the javadocs:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
For example, you could return 1 as your hash code always, and you would obey the hash code contract, no matter what fields you used in your equals method.
Returning 1 all the time would improve the computation time of hashCode, but HashMap's performance would drop since it would have to resort to equals() more often.
Is it absolutely necessary to use the same fields?
Yes, if you don't want any surprises.
If yes, what if I don't use the same fields?
You might get different hash codes for objects that are equal as per the equals() method, which violates the equals and hashCode contract.
For example, suppose you have 3 fields: a, b, c. You use a and b in the equals() method, and all 3 fields in the hashCode() method. Then, for 2 objects where a and b are equal and c is different, the objects will be equal but have different hash codes.
Will it affect HashMap performance or HashMap accuracy?
It's not about performance, but yes, your map will not behave as expected.
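The a, b, c scenario can be reproduced with a small sketch. The Triple class is hypothetical; its equals() ignores c while its hashCode() includes it, so a HashSet fails to find an element that equals() says is present.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class MismatchedFieldsDemo {
    static final class Triple {
        final int a, b, c;
        Triple(int a, int b, int c) { this.a = a; this.b = b; this.c = c; }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Triple)) return false;
            Triple t = (Triple) o;
            return a == t.a && b == t.b; // c is ignored here...
        }

        @Override
        public int hashCode() {
            return Objects.hash(a, b, c); // ...but included here: contract violation
        }
    }

    // Stores (1, 2, 3) and then looks up the "equal" object (1, 2, 4).
    static boolean foundInSet() {
        Set<Triple> set = new HashSet<>();
        set.add(new Triple(1, 2, 3));
        return set.contains(new Triple(1, 2, 4));
    }

    public static void main(String[] args) {
        Triple x = new Triple(1, 2, 3);
        Triple y = new Triple(1, 2, 4);
        System.out.println(x.equals(y));                  // prints true
        System.out.println(x.hashCode() == y.hashCode()); // prints false
        System.out.println(foundInSet());                 // prints false
    }
}
```

The lookup fails because HashSet compares hash codes before it ever calls equals(), so the differing hash code rules the element out.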
Fields used in hashcode can be a subset of fields used in equals.
It will still abide by this rule "Whenever a.equals(b), then a.hashCode() must be same as b.hashCode()"
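A short sketch of the subset case, using a hypothetical Pair class whose equals() compares a and b while hashCode() uses only a. Equal objects always agree on a, so they always agree on the hash code, and a HashSet deduplicates as expected.

```java
import java.util.HashSet;
import java.util.Set;

class SubsetHashDemo {
    static final class Pair {
        final int a, b;
        Pair(int a, int b) { this.a = a; this.b = b; }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Pair)) return false;
            Pair p = (Pair) o;
            return a == p.a && b == p.b; // both fields
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(a); // subset of the equals() fields
        }
    }

    // Adds (1,2) twice and (1,3) once; the duplicate is rejected, the collision is kept.
    static int distinctCount() {
        Set<Pair> set = new HashSet<>();
        set.add(new Pair(1, 2));
        set.add(new Pair(1, 2));
        set.add(new Pair(1, 3));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount()); // prints 2
    }
}
```

The cost is extra collisions: (1, 2) and (1, 3) share a bucket, so equals() has to be called to tell them apart.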

Java: If I overwrite the .equals method, can I still test for reference equality with ==?

I have the following situation: I need to sort trees by height, so I made the Trees comparable using the height attribute. However, I was also told to overwrite the equals and hashCode methods to avoid unpredictable behaviour.
Still, sometimes I may want to compare the references of the roots or something along those lines using ==. Is that still possible or does the == comparison call the equals method?
equals() is meant to compare an object with rules set by the programmer. In your example you compare your trees by height, so you'll write equals() so it compares heights.
==, as you said, compares references. It is affected by neither equals() nor hashCode(), so you won't change its behaviour.
Yes, == will not call hashCode or equals. You can still test for reference equality like this.
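A minimal sketch, assuming a hypothetical Tree class whose equals() compares heights: two distinct trees of the same height are equal according to equals() but still fail the == check.

```java
class ReferenceVsEqualsDemo {
    static final class Tree {
        final int height;
        Tree(int height) { this.height = height; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Tree && ((Tree) o).height == height;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(height);
        }
    }

    // Plain reference comparison; never dispatches to equals().
    static boolean sameReference(Object x, Object y) {
        return x == y;
    }

    public static void main(String[] args) {
        Tree first = new Tree(5);
        Tree second = new Tree(5);
        System.out.println(first.equals(second));        // prints true
        System.out.println(sameReference(first, second)); // prints false
        System.out.println(sameReference(first, first));  // prints true
    }
}
```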
== does not call equals. So it's still fine for identity checks.
As many implementations of equals start with this == other check you would get a literal StackOverflow if it were calling equals behind the scenes.
I think that a bigger question here is whether it is appropriate to implement comparable on these objects. It may be more appropriate to use a Comparator for the operations that work on height, and not embed ordinal computation in the class itself.
My general philosophy on this is to only implement Comparable if there is a truly natural ordering for the object. In the case of a tree node, is height the only way that anyone could ever want to sort? Maybe this is a private class, and the answer is 'yes'. But even then, creating a Comparator isn't that much extra work, and it leaves things flexible in case you decide you want to make that tree node a protected or public class some day.
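A sketch of the Comparator approach, assuming a hypothetical Tree class with a height() accessor. The class itself carries no ordering, so equals() and hashCode() are left untouched and other orderings can be added later without changing Tree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class TreeSortDemo {
    static final class Tree {
        private final String name;
        private final int height;

        Tree(String name, int height) {
            this.name = name;
            this.height = height;
        }

        String name() { return name; }
        int height() { return height; }
    }

    // The ordering lives outside the class instead of in a compareTo method.
    static final Comparator<Tree> BY_HEIGHT = Comparator.comparingInt(Tree::height);

    // Returns the tree names ordered by ascending height, leaving the input untouched.
    static List<String> namesByHeight(List<Tree> trees) {
        List<Tree> sorted = new ArrayList<>(trees);
        sorted.sort(BY_HEIGHT);
        List<String> names = new ArrayList<>();
        for (Tree t : sorted) {
            names.add(t.name());
        }
        return names;
    }

    public static void main(String[] args) {
        List<Tree> trees = Arrays.asList(
                new Tree("oak", 3), new Tree("fern", 1), new Tree("pine", 2));
        System.out.println(namesByHeight(trees)); // prints [fern, pine, oak]
    }
}
```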
== tests referential equality. It will not call equals.
Overriding the equals() method will have NO effect on the == operator.
== is used to test if 2 references point to the same object.
equals() method "meaningfully" compares 2 objects.
It is important to realize the implication of the word "meaningful" here. Equality is easier to understand when you are comparing, for instance, 2 Strings or 2 integers. This is why the equals() method, inherited from the Object class, is already overridden by the String and wrapper classes (Integer, Float, etc.). However, what if you are comparing 2 objects of type Song? Here, equality can be established on the basis of
1) Artist name
2) Song name
3) or some other criterion
Therefore, you have to override the equals() method to "explicitly" determine "when" 2 Song objects are considered equal.
The "unpredictable behavior" you mentioned in your question relates to how objects like the one above (Song) behave when dealing with collections like Map. You SHOULD NOT use these objects in a map until you override both the equals() and hashCode() methods. The reason is how HashMap search and indexing work. Refer to the JavaDoc for the specific rules. What you should remember is:
If 2 objects are meaningfully equal, their hashcode should return the same value. However, it is not necessary for 2 objects to be equal, if they return the same hashcode. Again, Java doesn't enforce any rules regarding this. It is your responsibility to implement the equals() and hashcode() methods correctly.
