General contract for object comparison: equals() and hashCode() - java

One point of the general contract of the equals method says that if you define equals(), you should also define hashCode(), and that if o1.equals(o2), then o1.hashCode() == o2.hashCode() must hold.
So my question is: what if I break this contract? Where can things fail when o1.equals(o2) but o1.hashCode() != o2.hashCode()?

It will lead to unexpected behavior in hash-based data structures such as HashMap. Read up on how a hash table works.

HashMap/HashTable/HashSet/etc will put your object into one of several buckets based on its hashCode, and then check to see if any other objects already in that bucket are equal.
Because these classes assume the equals/hashCode contract, they won't check for equality against objects in other buckets. After all, any object in another bucket must have a different hashCode, and thus (by the contract) cannot be equal to the object in question. If two objects are equal but have different hash codes, they could end up in different buckets, in which case the HashMap/Table/Set/etc won't have a chance to compare them.
So, you could end up with a Set that contains two objects which are equal -- which it's not supposed to do; or a Map that contains two values for the same key (since the buckets are by key); or a Map where you can't look up a key (since the lookup checks both the hash code and equality); or any number of similar bugs.

If you break the contract, your objects won't work with hash-based containers (and anything else that uses hashCode() and relies on its contract).
The basic intuition is as follows: to see whether two objects are the same, the container could call hashCode() on both, and compare the results. If the hash codes are different, the container is allowed to short-circuit by assuming that the two objects are not equal.
To give a specific example, if o1.equals(o2) but o1.hashCode() != o2.hashCode(), you'll likely be able to insert both objects into a HashSet, or use both as keys in a HashMap, even though these containers are meant to hold unique elements and unique keys.
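The failure described above can be reproduced with a small sketch. BrokenKey and BrokenContractDemo are hypothetical names invented for this illustration; the hashCode() here deliberately breaks the contract by handing out a different value to every instance, so the outcome is deterministic rather than dependent on identity hash codes.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class: equals() compares names, but hashCode() deliberately
// violates the contract by returning a different value for every instance.
class BrokenKey {
    private static int nextHash = 0;

    private final String name;
    private final int brokenHash = nextHash++; // differs per instance

    BrokenKey(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof BrokenKey && ((BrokenKey) obj).name.equals(name);
    }

    @Override
    public int hashCode() {
        return brokenHash; // equal objects end up with unequal hash codes
    }
}

class BrokenContractDemo {
    // Adds two equal keys to a HashSet and reports how many elements it kept.
    static int storedCount() {
        Set<BrokenKey> set = new HashSet<>();
        set.add(new BrokenKey("k"));
        set.add(new BrokenKey("k"));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(new BrokenKey("k").equals(new BrokenKey("k"))); // true
        System.out.println(storedCount()); // 2, although a set should keep only one
    }
}
```

Because the set compares hash codes before it ever calls equals(), the second add() never sees the first element as a duplicate.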

Related

How can I take into consideration the object itself when calculating a hash for an object in Java?

I was working on some algorithmic problems when I got to this and it seemed interesting to me. If I have two lists (so two different objects), with the same values, the hashcode is the same. After some reading, I understand that this is how it should behave. For example:
List<String> lst1 = new LinkedList<>(Arrays.asList("str1", "str2"));
List<String> lst2 = new LinkedList<>(Arrays.asList("str1", "str2"));
System.out.println(lst1.hashCode() + " " + lst2.hashCode());
...........
Result: 2640541 2640541
My purpose would be to differentiate between lst1 and lst2 in a list for example.
Is there a structure (like a HashSet for example) that takes into consideration the actual object and not only the values inside the object when calculating the hashcode for something?
Yes, you can use Java's java.util.IdentityHashMap, or Guava's identity hash set.
The hashes of the two lists must be equal, because the objects are equal. But the identity map and set above are based on the identity of the list objects, not their hash.
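As a rough sketch of the difference, the following compares a regular HashSet with an identity-based set built from the JDK alone via Collections.newSetFromMap over an IdentityHashMap. The class and method names (IdentitySetDemo, sizeInHashSet, sizeInIdentitySet) are made up for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

class IdentitySetDemo {
    // Two distinct list objects with equal contents, as in the question.
    static final List<String> LST1 = new LinkedList<>(Arrays.asList("str1", "str2"));
    static final List<String> LST2 = new LinkedList<>(Arrays.asList("str1", "str2"));

    // A HashSet relies on equals()/hashCode(), so the two equal lists collapse into one element.
    static int sizeInHashSet() {
        Set<List<String>> set = new HashSet<>();
        set.add(LST1);
        set.add(LST2);
        return set.size();
    }

    // A set backed by IdentityHashMap compares elements with ==, so both lists are kept.
    static int sizeInIdentitySet() {
        Set<List<String>> set = Collections.newSetFromMap(new IdentityHashMap<>());
        set.add(LST1);
        set.add(LST2);
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(LST1.equals(LST2));   // true
        System.out.println(LST1 == LST2);        // false
        System.out.println(sizeInHashSet());     // 1
        System.out.println(sizeInIdentitySet()); // 2
    }
}
```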
If I have two lists (so two different objects), with the same values, the hashcode is the same. After some reading, I understand that this is how it should behave.
Yes, this is part of the specification of java.util.List.
Is there a structure (like a HashSet for example) that takes into consideration the actual object and not only the values inside the object when calculating the hashcode for something?
My purpose would be to differentiate between lst1 and lst2 in a list for example
It is unclear what "in a list" means here. For example, Collection.contains() and List.equals() are defined in terms of members' equals() methods, and likewise the behavior of List.remove(Object). Although distinct objects, your two Lists will compare equal to each other, so those methods will not distinguish between them, neither directly nor as members of another list. You can always compare them for reference equality (==), however, to determine that they are not the same object despite being equals() each other.
As far as a collection that takes members' object identity into account, you could consider java.util.IdentityHashMap. Two such maps having keys and associated values that are pairwise equals() each other but not identical will not compare equals() to each other. Such maps will typically have different hash codes than each other, though that cannot be guaranteed. Note well, however, the warnings throughout the documentation of IdentityHashMap that although it implements the Map API, many of the behavioral details are inconsistent with the requirements of that interface.
Note also that
most of the above is relevant only for collections whose members are of a type that overrides equals() and hashCode(). The implementations of or inherited from Object differentiate between objects on a reference-equality basis, so the ordinary collections classes have no surprises for you there.
identical string literals are not required to represent distinct objects, so the lst1 and lst2 in your example code may in fact contain identical elements, in the reference equality sense.
Not generally in collections, because you generally want two collections with all the same items to be equal (which is why they implement it like this- equals will return true and the hash codes are the same).
You can subclass a list and have it not do that; it would just not be widely useful and would cause a lot of confusion if other programmers read your code. In that case, you'd just want equals to return the result of == and hashCode to return System.identityHashCode(this) (the same behavior that Object's own equals and hashCode provide).

Why does Java need equals() if there is hashCode()?

If two objects return same hashCode, doesn't it mean that they are equal? Or we need equals to prevent collisions?
And can I implement equals by comparing hashCodes?
If two objects have the same hashCode then they are NOT necessarily equal. Otherwise you will have discovered the perfect hash function. But the opposite is true - if the objects are equal, then they must have the same hashCode.
hashCode and equals provide different information about objects.
Consider the analogy to persons, where the hash code is the birthday:
in that scenario, you and many other people share the same birthday (same hash code), yet you are not all the same person.
Why does Java need equals() if there is hashCode()?
Java needs equals() because it is the method through which object equality is tested by examining classes, fields, and other conditions the designer considers to be part of an equality test.
The purpose of hashCode() is to provide a hash value primarily for use by hash tables; though it can also be used for other purposes. The value returned is based on an object's fields and hash codes of its composite and/or aggregate objects. The method does not take into account the class or type of object.
The relationship between equals() and hashCode() is an implication.
Two objects being equal implies that they have the same hash code.
Two objects having the same hash code does not imply that they are equal.
The latter does not hold for several reasons:
There is a chance that two distinct objects may return the same hash code. Keep in mind that a hash value folds information from a large amount of data into a smaller number.
Two objects from different classes with similar fields will most likely use the same type of hash function, and return equal hash values; yet, they are not the same.
hashCode() can be implementation-specific returning different values on different JVMs or JVM target installations.
Within the same JVM, hashCode() can be used as a cheap precursor to an equality test: compare against a known hash code first, and test actual equality only if the hash codes match. This pays off only when the equality test is significantly more expensive than generating a hash code.
And can I implement equals by comparing hashCodes?
No. As mentioned, equal hash codes do not imply equal objects.
The hashCode method as stated in the Oracle Docs is a numeric representation of an object in Java. This hash code has limited possible values (represented by the values which can be stored in an int).
For a more complex class, there is a high possibility that you will find two different objects which have the same hash code value. Also, no one stops you from doing this inside any class.
class Test {
    @Override
    public int hashCode() {
        return 0;
    }
}
So, it is not recommended to implement the equals method by comparing hash codes. You should use them for comparison only if you can guarantee that each object has a unique hash code. In most cases, your only certainty is that if two objects are equal using o1.equals(o2) then o1.hashCode() == o2.hashCode().
In the equals method you can define a more complex logic for comparing two objects of the same class.
If two objects return same hashCode, doesn't it mean that they are equal?
No it doesn't mean that.
The javadocs for Object state this:
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently
return the same integer, provided no information used in equals
comparisons on the object is modified. ...
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must
produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on
each of the two objects must produce distinct integer results. ...
Note the last statement. It plainly says "No" to your question.
There is another way to look at this.
The hashCode returns an int.
There are only 2^32 distinct values that an int can take.
If a.hashCode() == b.hashCode() implies a.equals(b), then there can be only 2^32 distinct (i.e. mutually unequal) objects at any given time in a running Java application.
That last point is plainly not true. Indeed, it is demonstrably not true if you have a large enough heap to hold 2^32 instances of java.lang.Object ... in a 64-bit JVM.
And a third way is to look at some well-known examples where two different two-character strings have the same hash code.
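One commonly cited pair is "Aa" and "BB". Since String.hashCode() is specified as s[0]*31^(n-1) + ... + s[n-1], both work out to 2112 (65*31 + 97 and 66*31 + 66). A minimal check, with StringCollisionDemo being just an illustrative class name:

```java
class StringCollisionDemo {
    public static void main(String[] args) {
        String a = "Aa";
        String b = "BB";
        System.out.println(a.hashCode()); // 2112
        System.out.println(b.hashCode()); // 2112
        System.out.println(a.equals(b));  // false: same hash code, different strings
    }
}
```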
Given that your assumption is incorrect, the reasoning that follows from it is also incorrect.
Java does need an equals method.
You generally cannot implement equals using just hashCode.
You may be able to use hashCode to implement a faster equals method, but only if calling hashCode twice is faster than comparing two objects. It generally isn't.
hashCodes are equal -> Objects might be equal -> further comparison is required
hashCodes are different -> Objects are not equal (if hashCode is implemented right)
That's how hash-based collections compare keys. First they check whether the hash codes are equal. If they are, equals is called to check the class fields and see whether the two objects really represent the same thing. If the hash codes are different, you can be sure that the objects are not equal.
Sometimes (very often?) you don't!
These answers are not untrue. But they don't tell the whole story.
One example would be where you are creating a load of objects of class SomeClass, and each instance that is created is given a unique ID by incrementing a static variable, nInstanceCount, or some such, in the constructor:
iD = nInstanceCount++;
Your hash function could then be
@Override
public int hashCode() {
    return iD;
}
and your equals could then be
@Override
public boolean equals(Object obj) {
    if (!(obj instanceof SomeClass)) {
        return false;
    }
    return hashCode() == obj.hashCode();
}
... under such circumstances your idea that "equals is superfluous" is effectively true: if all classes behaved like this, Java 10 (or Java 23) might say, ah, let's just get rid of silly old equals, what's the point? (NB backwards compatibility would then go out the window).
There are two essential points:
you couldn't then create more than MAXINT instances of SomeClass. Or... you could ... if you set up a system for reassigning the IDs of previously destroyed instances. IDs are typically long rather than int ... but this wouldn't work because hashCode() returns int.
none of these objects could then be "equal" to another one, since equality = identity for this particular class, as you have defined it. Often this is desirable. Often it shuts off whole avenues of possibilities...
The necessary implication of your question is, perhaps, what's the use of these two methods which, in a rather annoying way, have to "cooperate"? Frelling, in his/her answer, alluded to the crucial point: hash codes are needed for sorting into "buckets" with classes like HashMap. It's well worth reading up on this: the amount of advanced maths that has gone into designing efficient "bucket" mechanisms for classes like HashMap is quite frightening. After reading up on it you may come to have (like me) a bit of understanding and reverence about how and why you should bother implementing hashCode() with a bit of thought!

what is the disadvantage of overriding equals and not hashcode and vice versa? [duplicate]

I know there are lots of similar questions out there, but I have not been satisfied by the answers I have read. I tried to figure it out but I still did not get the idea.
What I know is that these two are important when using a set or map, especially HashSet, HashMap or hash objects in general, which use a hash mechanism for storing element objects.
Both methods are used to test if two objects are equal or not.
For two objects A and B to be equal, first they need to have the same hash value (they have to be in the same bucket) and second we have to get true when executing A.equals(B).
What I do not understand is WHY it is necessary to override both of these methods.
WHAT if we do not override hashCode? IS IT A MUST TO OVERRIDE BOTH? If it is not, what is the disadvantage of overriding equals and not hashCode, and vice versa?
Properly implementing hashCode is necessary for your object to be a key in hash-based containers. It is not necessary for anything else.
Here's why it is important for hash-based containers such as HashMap, HashSet, ConcurrentHashMap etc.
At a high level, a HashMap is an array, indexed by a value derived from the hashCode of the key, whose entries are "chains" - lists of (key, value) pairs where all keys in a particular chain map to the same index. For a refresher on hash tables, see Wikipedia.
Consider what happens if two keys a and b are equal, but have different hash codes - for example, a.hashCode() == 42 and b.hashCode() == 37. Suppose you write:
hashTable.put(a, "foo");
hashTable.get(b);
Since the keys are equal, you would like the result to be "foo", right?
However, get(b) will look into the chain corresponding to hash 37, while the pair (a, "foo") is located in the chain corresponding to hash 42, so the lookup will fail and you'll get null.
This is why it is important that equal objects have equal hash codes if you intend to use the object as a key in a hash-based container.
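The scenario above can be sketched as runnable code. FixedHashKey and LostLookupDemo are hypothetical names for this illustration: every FixedHashKey is equal to every other one, while the hash code is whatever the constructor is given, which lets the example use exactly the values 42 and 37.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key: all instances are equal to each other,
// but the hash code is whatever the caller passes in.
class FixedHashKey {
    private final int hash;

    FixedHashKey(int hash) {
        this.hash = hash;
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof FixedHashKey;
    }

    @Override
    public int hashCode() {
        return hash;
    }
}

class LostLookupDemo {
    // Stores a value under key a, then looks it up with the equal key b.
    static String lookup() {
        FixedHashKey a = new FixedHashKey(42);
        FixedHashKey b = new FixedHashKey(37);
        Map<FixedHashKey, String> hashTable = new HashMap<>();
        hashTable.put(a, "foo");
        return hashTable.get(b);
    }

    public static void main(String[] args) {
        System.out.println(new FixedHashKey(42).equals(new FixedHashKey(37))); // true
        System.out.println(lookup()); // null: the lookup never finds the entry stored under a
    }
}
```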
Note that if you use a non-hash based container, such as TreeMap, then you don't have to implement hashCode because the container doesn't use it. Instead, in case of TreeMap, you should implement compareTo - other types of containers may have their own requirements.
Yes, that is correct: when you override the equals method, you have to override the hashCode method as well. The reason is that hash-based collections treat two objects as equal only if their equals method returns true and their hashCode methods return the same integer value. In a hash-based collection (such as a hash map), an equality check on two objects consults their hash codes first; only if both are the same is the equals method called. If the hash codes differ, the collection simply considers the objects unequal. By default, hashCode returns a value that is typically distinct for distinct objects, so if you make two objects equal under some specific condition by overriding equals, the collection still won't treat them as equal, because their hash codes differ. To make their hash codes equal you have to override hashCode too. Otherwise you won't be able to use the object reliably as a key in your hash map.

Can you just return a field's hashCode() value in a hashCode() method?

While reviewing a large code base, I've often come across cases like this:
@Override
public int hashCode()
{
    return someFieldValue.hashCode();
}
where the programmer, instead of generating their own unique hash code for the class, simply inherits the hash code from a field value. My gut feeling (which might just as well be digestive problems) tells me that this is wrong, but I can't put my finger on it. What problems can arise, if any, with this sort of implementation?
This is fine if you want to hash your object based on a single property.
For example, in a Person class you might have an ID property that uniquely identifies a Person, so the hashCode() of Person can simply be the hash of that ID.
In addition, the hashCode() is related to the implementation of equals. If two objects are equal, they must have the same hashCode (the opposite doesn't have to be true - two non equal objects may still have the same hashCode). Therefore, if equality is determined by a single property (such as a unique ID), the hashCode method must also use only that single property.
This can be seen in the JavaDoc of hashCode :
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
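A minimal sketch of the Person case mentioned above, assuming a String id and an extra displayName field that is deliberately left out of equality; both field names and the PersonDemo helper are invented for this illustration.

```java
import java.util.Objects;

// Hypothetical Person class whose equality is defined solely by its id.
final class Person {
    private final String id;
    private final String displayName; // not part of equality

    Person(String id, String displayName) {
        this.id = Objects.requireNonNull(id);
        this.displayName = displayName;
    }

    String displayName() {
        return displayName;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof Person)) {
            return false;
        }
        return id.equals(((Person) obj).id);
    }

    @Override
    public int hashCode() {
        return id.hashCode(); // uses the same single field as equals()
    }
}

class PersonDemo {
    public static void main(String[] args) {
        Person p1 = new Person("42", "Ann");
        Person p2 = new Person("42", "Ann B.");
        System.out.println(p1.equals(p2));                  // true
        System.out.println(p1.hashCode() == p2.hashCode()); // true
    }
}
```

Delegating to the field's hashCode() keeps the two methods in step: whenever equals() says two Person objects match, their ids match, and so do their hash codes.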
Technically speaking, you can return any consistent number from hashCode, even a constant value. The only requirement the contract places upon you is that equal objects must return the same hash code:
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
Theoretically, if all objects return, say, zero for their hashCode, the contract is formally satisfied. However, this makes hashCode completely useless.
The real question is whether you should do it or not. The answer depends on how unique the field is whose hash code you are returning. It is not uncommon to return the hashCode of a unique identifier of an object for the object's hashCode. On the other hand, if a significant percentage of objects have the same value of someFieldValue, you would be better off using a different strategy for making the hash code of your object.
hashCode() has to go with equals().
If the only property defining equalness is, for example, an ID, you HAVE TO take care that your hash codes are equal when the ID is equal.
The easiest way to accomplish this is by taking the hashCode() of your ID.
This is fine, if you really want to uniquely identify your object by this single property. Here is an article that explains what object identity really is.
As noted in the documentation of Object, your equals() and hashCode() need to incorporate the same properties, be sure to verify that.
So this means that you should ask yourself the question: do I really want the objects to be equal if only this single property is equal?
Finally, do take great care when subclassing objects with a custom equals() and hashCode() implementation: if you add properties to the identity of the object, you will break the requirement that a.equals(b) == b.equals(a) (to see why this fails, think about a being an instance of the superclass and b an instance of the subclass).
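The symmetry problem can be illustrated with a sketch along the lines of the familiar point/colored-point example. Point, ColorPoint and SymmetryDemo are hypothetical classes written for this illustration, not taken from the question.

```java
// Hypothetical superclass: equality is based on the coordinates only.
class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof Point)) {
            return false;
        }
        Point p = (Point) obj;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return 31 * x + y;
    }
}

// Hypothetical subclass that adds a property to the notion of equality.
class ColorPoint extends Point {
    final String color;

    ColorPoint(int x, int y, String color) {
        super(x, y);
        this.color = color;
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof ColorPoint)) {
            return false;
        }
        return super.equals(obj) && color.equals(((ColorPoint) obj).color);
    }
    // hashCode() is inherited: equal ColorPoints share coordinates, so their hash codes still match.
}

class SymmetryDemo {
    public static void main(String[] args) {
        Point a = new Point(1, 2);
        ColorPoint b = new ColorPoint(1, 2, "red");
        System.out.println(a.equals(b)); // true: Point only looks at the coordinates
        System.out.println(b.equals(a)); // false: a is not a ColorPoint
    }
}
```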
Yes, technically you can do it, but you need a non-primitive someFieldValue for that.

Confusion with hashmap in Java?

I've read all the posts on the topic and I am still confused about the following: when does a value get overwritten, and when can a collision happen? From what I've read I see:
Whenever two objects are the same in terms of the equals() method, their hash codes must be the same
Whenever two objects are not the same in terms of the equals() method, we have no guarantee about their hashCode(), i.e. it might be the same, it might be different
When we use HashMap.put(key, value), HashMap compares objects by their equals() method. If the two keys are equal, then the old value is overwritten by the new one
If two keys have the same hash code, then a collision occurs and Java deals with it
However, if two keys are equal then the value is overwritten, BUT that also implies that their hashCode() must be the same, so a collision must happen, which contradicts the previous point?
Can someone please clarify these steps for me?
Think of a hashmap as a set of pigeon holes. Each pigeon hole can contain more than one object.
The hashCode() return is used to select the pigeon hole which either contains or would contain that object.
The equals() is used as the criterion to identify a specific object (e.g. for replacement).
The aim of hashCode() is to disperse typical objects uniformly across the pigeon holes. Once a particular pigeon hole has been identified as possibly containing an object then all objects in that particular group have to be examined. That operation is expensive since equals() needs to be called.
Your point #3 comes too soon: HashMap compares for equality only when the hashCode is the same.
HashMap checks hash code first to determine the placement of the object in a bucket. The regular HashMap keeps only items with identical hash codes (modulo a certain number) in the same bucket, and checks equality only for objects within the same bucket.
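A small sketch may help separate the two cases. It relies on the strings "Aa" and "BB", which have the same hash code (2112) without being equal; CollisionVsOverwriteDemo and build() are names made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionVsOverwriteDemo {
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2); // same hash code as "Aa" but not equal: kept as a second entry
        map.put("Aa", 3); // equal to an existing key: the old value 1 is replaced
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = build();
        System.out.println(map.size());    // 2
        System.out.println(map.get("Aa")); // 3
        System.out.println(map.get("BB")); // 2
    }
}
```

So a shared hash code only means that two keys land in the same bucket; a value is replaced only when equals() also returns true for a key already in that bucket.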
