Is Java hashCode() method a reliable measure of object equality? - java

I am currently working on comparing two complex objects of the same type, with multiple fields consisting of data structures of custom object types. Assuming that none of the custom objects has overriden the hashCode() method, if I compare the hashcodes of every field in the objects, and they will turn out to be the same, do I have a 100% confidence that the content of the compared objects is the same? If not, which method would you recommend to compare two objects, assuming I can't use any external libraries.

Absolutely not. You should only use hashCode() as a first pass - if the hash codes are different, you can assume the objects are unequal. If the hash codes are the same, you should then call equals() to check for full equality.
Think about it this way: there are only 232 possible hash codes. How many possible different objects are there of type String, as an example? Far more than that. Therefore at least two non-equal strings must share the same hash code.
Eric Lippert writes well about hash codes - admittedly from a .NET viewpoint, but the principles are the same.

No, lack of hashCode() collision only means that the objects could be identical, it's never a guarantee.
The only guarantee is that if the hashCode() values are different (and the hashCode()/equals() implementations are correct), then the objects will not be equal.
Additionally if your custom types don't have a hashCode() implementation, then that value is entirely useless for comparing the content of the object, because it will be the identityHashCode().

If you haven't overriden the hashCode() method, all of your objects are unequal. By overriding it you provide the logic of the comparison. Remember, if you override hashCode(), you definitely should override equals().
EDIT:
there still can be a collisionm of course, but if you didn't override equal(), your objects will be compared by reference (an object is equal to itself).

The usual JVM implementation of Object.hashCode() is to return the memory address of the object in some format, so this would technically be used for what you want (as no two objects can share the same address).
However, the actual specification of Object.hashCode() makes no guarentees and should not be used for this purpose in any sensible or well-written piece of code.
I would suggest using the hashCode and equals builders available in the Apache commons library, or if you really can't use free external libs, just have a look at them for inspiration. The best method to use depends entirely on what "equal" actually means in the context of your application domain though.

Related

Why does Java need equals() if there is hashCode()?

If two objects return same hashCode, doesn't it mean that they are equal? Or we need equals to prevent collisions?
And can I implement equals by comparing hashCodes?
If two objects have the same hashCode then they are NOT necessarily equal. Otherwise you will have discovered the perfect hash function. But the opposite is true - if the objects are equal, then they must have the same hashCode.
hashCode and Equals are different information about objects
Consider the analogy to Persons where hashcode is the Birthday,
in that escenario, you and many other people have the same b-day (same hashcode), all you are not the same person however..
Why does Java need equals() if there is hashCode()?
Java needs equals() because it is the method through which object equality is tested by examining classes, fields, and other conditions the designer considers to be part of an equality test.
The purpose of hashCode() is to provide a hash value primarily for use by hash tables; though it can also be used for other purposes. The value returned is based on an object's fields and hash codes of its composite and/or aggregate objects. The method does not take into account the class or type of object.
The relationship between equals() and hashCode() is an implication.
Two objects that are equal implies that the have the same hash code.
Two objects having the same hash code does not imply that they are equal.
The latter does not hold for several reasons:
There is a chance that two distinct objects may return the same hash code. Keep in mind that a hash value folds information from a large amount of data into a smaller number.
Two objects from different classes with similar fields will most likely use the same type of hash function, and return equal hash values; yet, they are not the same.
hashCode() can be implementation-specific returning different values on different JVMs or JVM target installations.
Within the same JVM, hashCode() can be used as a cheap precursor for equality by testing for a known hash code first and only if the same testing actual equality; provided that the equality test is significantly more expensive than generating a hash code.
And can I implement equals by comparing hashCodes?
No. As mentioned, equal hash codes does not imply equal objects.
The hashCode method as stated in the Oracle Docs is a numeric representation of an object in Java. This hash code has limited possible values (represented by the values which can be stored in an int).
For a more complex class, there is a high possibility that you will find two different objects which have the same hash code value. Also, no one stops you from doing this inside any class.
class Test {
#Override
public int hashCode() {
return 0;
}
}
So, it is not recommended to implement the equals method by comparing hash codes. You should use them for comparison only if you can guarantee that each object has an unique hash code. In most cases, your only certainty is that if two objects are equal using o1.equals(o2) then o1.hashCode() == o2.hashCode().
In the equals method you can define a more complex logic for comparing two objects of the same class.
If two objects return same hashCode, doesn't it mean that they are equal?
No it doesn't mean that.
The javadocs for Object state this:
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently
return the same integer, provided no information used in equals
comparisons on the object is modified. ...
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must
produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCodemethod on
each of the two objects must produce distinct integer results. ...
Note the highlighted statement. It plainly says "No" to your question.
There is another way to look at this.
The hashCode returns an int.
There are only 232 distinct values that an int can take.
If a.hashCode() == b.hashCode() implies a.equals(b), then there can be only 232 distinct (i.e. mutually unequal) objects at any given time in a running Java application.
That last point is plainly not true. Indeed, it is demonstrably not true if you have a large enough heap to hold 232 instances of java.lang.Object ... in a 64-bit JVM.
And a third way is to some well-known examples where two different two character strings have the same hashcode.
Given that your assumption is incorrect, the reasoning that follows from it is also incorrect.
Java does need an equals method.
You generally cannot implement equals using just hashCode.
You may be able to use hashCode to implement a faster equals method, but only if calling hashCode twice is faster than comparing two objects. It generally isn't.
hashCodes are equal -> Objects might be equal -> further comparision is required
hashCodes are different -> Object are not equal (if hashCode is implemented right)
That's how equals method are implemented. At first you check if hashCodes are equal. If yes, you need to check class fields to see if it represents the exact same object. If hashCodes are different, you can be sure that objects are not equal.
Sometimes (very often?) you don't!
These answers are not untrue. But they don't tell the whole story.
One example would be where you are creating a load of objects of class SomeClass, and each instance that is created is given a unique ID by incrementing a static variable, nInstanceCount, or some such, in the constructor:
iD = nInstanceCount++;
Your hash function could then be
int hashCode(){
return iD;
}
and your equals could then be
boolean equals( Object obj ){
if( ! ( obj instanceof SomeClass )){
return false;
}
return hashCode() == obj.hashCode();
}
... under such circumstances your idea that "equals is superfluous" is effectively true: if all classes behaved like this, Java 10 (or Java 23) might say, ah, let's just get rid of silly old equals, what's the point? (NB backwards compatibility would then go out the window).
There are two essential points:
you couldn't then create more than MAXINT instances of SomeClass. Or... you could ... if you set up a system for reassigning the IDs of previously destroyed instances. IDs are typically long rather than int ... but this wouldn't work because hashCode() returns int.
none of these objects could then be "equal" to another one, since equality = identity for this particular class, as you have defined it. Often this is desirable. Often it shuts off whole avenues of possibilities...
The necessary implication of your question is, perhaps, what's the use of these two methods which, in a rather annoying way, have to "cooperate"? Frelling, in his/her answer, alluded to the crucial point: hash codes are needed for sorting into "buckets" with classes like HashMap. It's well worth reading up on this: the amount of advanced maths that has gone into designing efficient "bucket" mechanisms for classes like HashMap is quite frightening. After reading up on it you may come to have (like me) a bit of understanding and reverence about how and why you should bother implementing hashCode() with a bit of thought!

Why override equals instead of using another method name

This seems like a silly question but why do we override equals method instead of creating a new method with new name and compare using it?
If I didn't override equals that means both == and equals check whether both references are pointed to same memory location?
This seems like a silly question but why do we override equals method instead of creating a new method with new name and compare using it?
Because all standard collections (ArrayList, LinkedList, HashSet, HashMap, ...) use equals when deciding if two objects are equal.
If you invent a new method these collections wouldn't know about it and not work as intended.
The following is very important to understand: If a collection such as ArrayList calls Object.equals this call will, in runtime, resolve to the overridden method. So even though you invent classes that the collections are not aware of, they can still invoke methods, such as equals, on those classes.
If I didn't override equals that means both == and equals check whether both references are pointed to same memory location?
Yes. The implementation of Object.equals just performs a == check.
You override equals if you are using classes that rely on equals, such as HashMap, HashSet, ArrayList etc...
For example, if you store elements of your class in a HashSet, you must override hashCode and equals if you want the uniqueness of elements to be determined not by simple reference equality.
Yes, if you don't override equals, the default equals implementation (as implemented in the Object class) is the same as ==.
In addition to the main reason, already given in other answers, consider program readability.
If you override equals and hashCode anyone reading your code knows what the methods are for. Doing so tells the reader the criteria for value equality between instances of your class. Someone reading your code that uses equals will immediately know you are checking for value equality.
If you use some other name, it will only confuse readers and cost them extra time reading your JavaDocs to find out what your method is for.
Because equals() is a method of the Object class, which is the superclass of all classes, and due to which it is inherently present in every class you write. Hence every collection class or other standard classes use equals() for object comparsion. If you want your custom class objects to be supported for equality by other classes, you have to override equals() only (since other classes know that for object comparion call equals()). If you are only using your own classes, you might create a new method and make sure everything uses your custom method for comparison.
The equals and hashcode method are special methods, widely used across the java's utility classes specially collection framework, and the wrpper classes e.g. String, Integer have overridden this method, So e.g. if you are placing any Object of your choice which has correct equals and hashcode implementation inside the HashSet, to maintain the property of uniqueness the hashcode will compare with all the existing object in hashset, and if it finds any of the hashcode matching then it looks into the equals method to double check if both are really equal and if that equality check also is pass then incoming object is rejected, but if the hashcode equality check is not passed then hashset will not go for the equals method and straight way place that object into the hashset. So we need to make sure the implementation of equals and hashcode is logically proper.
A class like HashMap<T,U> needs to have some means of identifying which item in the collection, if any, should be considered equivalent to a given item. There are two general means via which this can be accomplished:
Requiring that anything to be stored in a collection must include virtual methods to perform such comparison (and preferably provide a quick means (e.g. hashCode()) of assigning partial equivalence classes).
Require that code which creates the collection must supply an object which can accept references to other objects and perform equivalence-related operations upon them.
It would have been possible to omit equals and hashCode() from Object, and have types like HashMap only be usable with key types that implement an equatable interface that includes such members; code which wishes to use collections of references keyed by identity would have to use IdentityHashMap instead. Such a design would not have been unreasonable, but the present design makes it possible for a general-purpose collection-of-collections type which uses HashMap to be usable with things that are compared by value as well as by identity, rather than having to define a separate types for collection-of-HashMap and collection-of-IdentityHashMap.
An alternative design might have been to have a GeneralHashMap type whose constructor requires specifying a comparison function, and have IdentityHashMap and HashMap both derive from that; the latter would constrain its type to equatable and have its identity functions chain to those of the objects contained therein. There would probably have been nothing particularly wrong with that design, but that's not how things were done.
In any case, there needs to be some standard means by which collections can identify items that should be considered equivalent; using virtual equals(Object) and getHashCode() is a way of doing that.
Question 1
There are Two things.
equals() is Located inside Object class
Collection framework using equals() and hashcode() methods when comparing objects
Question 2
Yes for comparing two Object. but when You comparing two String Objects using equals() its only checking the value.

When is hashcode useful if I am never using hashtable?

Lets say I am implementing a class called Car, with 2 member variables int numDoors, and String color.
In a hypothetical case, I am never going to use such a car in hashtable or hashmap or any structure that needs a hash, time immemorial.
Now, why is it still required to override hashCode along with equals ?
Note: all answers I checkout include use in hashtable / hashmap. I have tried extensively to get this answer, so as a request dont mark it as a duplicate. Thanks
It's the general convention:
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
However, it's not entirely enforceable.
There are times in which you would believe that you don't need to have hashCode defined and implemented for your object, and if you don't use any structure that relies on a hash to store or reference it, you'd be correct.
But, there are third-party libraries in which your object may come into contact with, and they may very well be using a Map or Set to do their work, and they'd have the expectation that you followed conventions.
It's up to you to not implement hashCode along with equals - you're certainly not forced to (although many would argue that this is a bug), but beware that your object may not work as well with a third party library for this reason.
The only conceivable types which would not be able to override hashCode method in a fashion consistent with the hashCode and equals contract would be those which are unable to override hashCode [e.g. because a base class declared it final]. There is thus almost never any reason for a type not to legitimately implement hashCode(). Even if a type cannot guarantee that instances which are unequal won't spontaneously become equal, the author of the type may still legitimately implement hashCode() by picking a 32-big int value [e.g. 8675309] and implementing hashCode() as #override int hashCode() { return 8675309; }. Doing this will allow all of the hash-table-based collection types to work correctly. Storing very many such items into a hash table will severely degrade performance, but hash tables with just a few items will work just fine and generally perform decently. By contrast, if one doesn't override hashCode then even a hash table will likely work incorrectly if even a single item is stored into it.
Incidentally, in some cases there may be advantages to implementing hashCode even when not using hashed collections. For example, some immutable collection types which support deep comparison might call hashCode() on the items stored therein. If a collection is large, and/or comparison operations on the items stored within it are expensive, the efficiency of testing two collections for equality ("do they contain equal items") may be enhanced by using a three-step process:
Compare the aggregate hashcode of two collections. If they're not equal, no reason to look any further. Will often yield instant results, no matter the size of the collections.
Compare the cached hash codes of all the items. If the collections' contents match except for the last couple items, and if comparisons between items may be expensive (e.g. the items are thousand-character strings) this will often avoid the need to compare all but one of the items for equality [note that if all but one of the items matched, and its hash code differed, then the aggregate hash code would differ and we wouldn't have gotten this far].
If all the hash codes match, then call equals on each pair of items that don't compare reference-equal.
Note that if two collections contain distinct items with equal content, a comparison is going to need to deeply examine all of the items; hashCode can't do anything to help with that case. On the other hand, in most cases where things are compared they are not equal, and using cached hashCode() values may facilitate orders-of-magnitude speedups in those cases.

Why there is no Hashable interface in Java

Object in Java has hashCode method, however, it is being used only in associative containers like HashSet or HashMap. Why was it designed like that? Hashable interface having hashCode method looks as much more elegant solution.
The major argument seems to me that there is a well-defined default hashCode that can be calculated for any Java object, along with an equally well-defined equals. There is simply no good reason to withhold this function from all objects, and of course there are plenty reasons not to withhold it. So it's a no-brainer in my book.
This question is claimed as a duplicate from another which asks why there's no interface which behaves like Comparator (as distinct from Comparable) but for hashing. .NET includes such an interface, called IEqualityComparer, and it would seem like Java could as well. As it is, if someone wants to e.g. have a Java collection which e.g. maps strings to other objects in case-insensitive fashion (perhaps the most common use of IEqualityComparer) one must wrap the strings in objects whose hashCode and equals methods act on a case-insensitive basis.
I suspect the big issue is that while an "equalityComparer" interface could be convenient, in many cases efficiently testing an equivalence relation would require caching information. For example, while a case-insensitive string-hashing function could make an uppercase-only copy of the passed-in string and call hashCode on that, it would be difficult to avoid having every request for the hashcode of a particular string repeat the conversion to uppercase and the hashing of that uppercase value. By contrast, a "case-insensitive string" object could include fields for an uppercase-only copy of the string, which would then only have to be generated once for the instance.
An EqualityComparer could achieve reasonable performance if it included something like a WeakHashMap<string,string> to convert raw strings to uppercase-only strings, but such a design would either require different threads to use different EqualityComparer instances despite the lack of externally visible state, or else require performance-robbing locking and synchronization code even in single-threaded scenarios.
Incidentally, a second issue that arises with comparator-style interfaces is that a collection type which uses an externally-supplied comparator (whether it compares for rank or equality) is that the comparator itself becomes part of the state of the class which uses it. If hash tables use different EqualityComparer instances, there may be no way to know that they can safely be considered equivalent, even if the two comparators would behave identically in all circumstances.

What's the best way to compare two JavaScriptObjects in GWT?

It looks like hashCode() and equals() are declared as final. So overriding the implementation is not possible. It also states that equals() returns true if the objects are JavaScript identical (triple-equals). I'm not quite sure what that means as creating two identical JavaScriptObject's in GWT and comparing them with equals() returns false. Also it looks like hashcode() uses a monotonically increasing counter to assign a hash code to the underlying JavaScript object. If I wanted to store JavaScriptObjects in a Set this would complicate things. Any help would be much appreciated.
It depends on which equality criteria you want to use for your situation.
If you want object identity, you can use the predefined equals() and hashCode() - and put the JavaScriptObjects directly in a HashSet.
If you need a different equality notion, you can write your own Comparator and put the JavaScriptObjects in e.g. a TreeSet, created by new TreeSet(comparator).
If you need to put JavaScriptObjects in a HashSet (not a TreeSet), and still need a different equality notion, you can't put the JavaScriptObjects directly in the Set. Then you'd have to write a wrapper class, which contains the JavaScriptObject, and implements equals() and hashCode().

Categories

Resources