Let's say I am implementing a class called Car, with two member variables: int numDoors and String color.
Hypothetically, I am never going to use such a car in a Hashtable, a HashMap, or any other structure that needs a hash.
Why is it still required to override hashCode along with equals?
Note: all the answers I have checked out involve use in a Hashtable or HashMap. I have searched extensively for this answer, so please don't mark it as a duplicate. Thanks.
It's the general convention:
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
However, it's not entirely enforceable.
There are times in which you would believe that you don't need to have hashCode defined and implemented for your object, and if you don't use any structure that relies on a hash to store or reference it, you'd be correct.
But there are third-party libraries with which your object may come into contact, and they may very well be using a Map or Set to do their work, and they'd expect that you followed conventions.
It's up to you to not implement hashCode along with equals - you're certainly not forced to (although many would argue that this is a bug), but beware that your object may not work as well with a third party library for this reason.
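As a minimal sketch of what following the convention looks like for the Car class from the question (the constructor and the final modifiers are assumptions added for illustration), both methods use exactly the same two fields:

```java
import java.util.Objects;

// Hypothetical Car class based on the question; only the two fields come from the question text.
final class Car {
    private final int numDoors;
    private final String color;

    Car(int numDoors, String color) {
        this.numDoors = numDoors;
        this.color = color;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof Car)) {
            return false;
        }
        Car other = (Car) obj;
        return numDoors == other.numDoors && Objects.equals(color, other.color);
    }

    // Hashes exactly the fields that equals compares, so equal cars share a hash code.
    @Override
    public int hashCode() {
        return Objects.hash(numDoors, color);
    }
}
```

With this in place, the object behaves correctly if a library ever puts it into a Set or uses it as a Map key, even if your own code never does.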
The only conceivable types which would not be able to override hashCode in a fashion consistent with the hashCode and equals contract would be those which are unable to override hashCode at all [e.g. because a base class declared it final]. There is thus almost never any reason for a type not to implement hashCode() legitimately. Even if a type cannot guarantee that instances which are unequal won't spontaneously become equal, the author of the type may still legitimately implement hashCode() by picking a 32-bit int value [e.g. 8675309] and implementing hashCode() as @Override public int hashCode() { return 8675309; }. Doing this will allow all of the hash-table-based collection types to work correctly. Storing very many such items into a hash table will severely degrade performance, but hash tables with just a few items will work just fine and generally perform decently. By contrast, if one overrides equals without overriding hashCode, then a hash table will likely work incorrectly even if only a single item is stored into it.
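To illustrate the constant-hash-code point, here is a small sketch. ConstantHashKey and ConstantHashDemo are hypothetical names made up for this example; the constant is the one used above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical key type: equals compares names, hashCode is a constant.
final class ConstantHashKey {
    private final String name;

    ConstantHashKey(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof ConstantHashKey && name.equals(((ConstantHashKey) obj).name);
    }

    @Override
    public int hashCode() {
        return 8675309; // every instance lands in the same bucket: correct, but slow at scale
    }
}

final class ConstantHashDemo {
    // Returns true when an equal-but-distinct key is found in the set.
    static boolean lookupFindsEqualKey() {
        Set<ConstantHashKey> set = new HashSet<>();
        set.add(new ConstantHashKey("a"));
        set.add(new ConstantHashKey("b"));
        return set.contains(new ConstantHashKey("a")) && set.size() == 2;
    }

    public static void main(String[] args) {
        System.out.println(lookupFindsEqualKey());
    }
}
```

If the hashCode override were removed, the lookup with a fresh but equal key would almost certainly fail, because the set would search the bucket chosen by the identity hash code of the new instance.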
Incidentally, in some cases there may be advantages to implementing hashCode even when not using hashed collections. For example, some immutable collection types which support deep comparison might call hashCode() on the items stored therein. If a collection is large, and/or comparison operations on the items stored within it are expensive, the efficiency of testing two collections for equality ("do they contain equal items") may be enhanced by using a three-step process:
1. Compare the aggregate hash codes of the two collections. If they're not equal, there is no reason to look any further. This will often yield instant results, no matter the size of the collections.
2. Compare the cached hash codes of all the items. If the collections' contents match except for the last couple of items, and if comparisons between items may be expensive (e.g. the items are thousand-character strings), this will often avoid the need to compare all but one of the items for equality [note that if all but one of the items matched, and its hash code differed, then the aggregate hash code would differ and we wouldn't have gotten this far].
3. If all the hash codes match, then call equals on each pair of items that don't compare reference-equal.
Note that if two collections contain distinct items with equal content, a comparison is going to need to deeply examine all of the items; hashCode can't do anything to help with that case. On the other hand, in most cases where things are compared they are not equal, and using cached hashCode() values may facilitate orders-of-magnitude speedups in those cases.
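The three-step comparison can be sketched roughly as follows. HashedList is a hypothetical wrapper invented for illustration, not an existing collection type, and it assumes non-null items whose hash codes do not change after construction.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical immutable list wrapper that caches item hash codes at construction.
final class HashedList<T> {
    private final Object[] items;
    private final int[] itemHashes;  // cached hashCode() of each item
    private final int aggregateHash; // combined hash of all item hashes

    HashedList(List<T> source) {
        items = source.toArray();
        itemHashes = new int[items.length];
        for (int i = 0; i < items.length; i++) {
            itemHashes[i] = items[i].hashCode(); // assumes non-null items
        }
        aggregateHash = Arrays.hashCode(itemHashes);
    }

    boolean sameContents(HashedList<?> other) {
        // Step 1: different aggregate hashes (or sizes) mean the contents differ.
        if (aggregateHash != other.aggregateHash || items.length != other.items.length) {
            return false;
        }
        // Step 2: compare the cached per-item hashes before touching the items.
        if (!Arrays.equals(itemHashes, other.itemHashes)) {
            return false;
        }
        // Step 3: call equals only on pairs that are not the same reference.
        for (int i = 0; i < items.length; i++) {
            if (items[i] != other.items[i] && !items[i].equals(other.items[i])) {
                return false;
            }
        }
        return true;
    }
}
```

In the common case where the collections differ, the method returns after one or two cheap integer comparisons and never calls equals on the items.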
If two objects return same hashCode, doesn't it mean that they are equal? Or we need equals to prevent collisions?
And can I implement equals by comparing hashCodes?
If two objects have the same hashCode then they are NOT necessarily equal. Otherwise you will have discovered the perfect hash function. But the opposite is true - if the objects are equal, then they must have the same hashCode.
hashCode and Equals are different information about objects
Consider the analogy to persons, where the hash code is the birthday:
in that scenario, you and many other people have the same birthday (same hash code), yet you are not all the same person.
Why does Java need equals() if there is hashCode()?
Java needs equals() because it is the method through which object equality is tested by examining classes, fields, and other conditions the designer considers to be part of an equality test.
The purpose of hashCode() is to provide a hash value primarily for use by hash tables; though it can also be used for other purposes. The value returned is based on an object's fields and hash codes of its composite and/or aggregate objects. The method does not take into account the class or type of object.
The relationship between equals() and hashCode() is an implication.
Two objects being equal implies that they have the same hash code.
Two objects having the same hash code does not imply that they are equal.
The latter does not hold for several reasons:
There is a chance that two distinct objects may return the same hash code. Keep in mind that a hash value folds information from a large amount of data into a smaller number.
Two objects from different classes with similar fields will most likely use the same type of hash function, and return equal hash values; yet, they are not the same.
hashCode() can be implementation-specific returning different values on different JVMs or JVM target installations.
Within the same JVM, hashCode() can be used as a cheap precursor to an equality test: compare the hash codes first, and test actual equality only if they match. This pays off only when the equality test is significantly more expensive than generating a hash code.
And can I implement equals by comparing hashCodes?
No. As mentioned, equal hash codes do not imply equal objects.
The hashCode method as stated in the Oracle Docs is a numeric representation of an object in Java. This hash code has limited possible values (represented by the values which can be stored in an int).
For a more complex class, there is a high possibility that you will find two different objects which have the same hash code value. Also, no one stops you from doing this inside any class.
class Test {
    @Override
    public int hashCode() {
        return 0;
    }
}
So, it is not recommended to implement the equals method by comparing hash codes. You should use them for comparison only if you can guarantee that each object has a unique hash code. In most cases, your only certainty is that if two objects are equal using o1.equals(o2) then o1.hashCode() == o2.hashCode().
In the equals method you can define a more complex logic for comparing two objects of the same class.
If two objects return same hashCode, doesn't it mean that they are equal?
No it doesn't mean that.
The javadocs for Object state this:
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently
return the same integer, provided no information used in equals
comparisons on the object is modified. ...
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must
produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on
each of the two objects must produce distinct integer results. ...
Note the last statement. It plainly says "No" to your question.
There is another way to look at this.
The hashCode returns an int.
There are only 2^32 distinct values that an int can take.
If a.hashCode() == b.hashCode() implies a.equals(b), then there can be only 2^32 distinct (i.e. mutually unequal) objects at any given time in a running Java application.
That last point is plainly not true. Indeed, it is demonstrably not true if you have a large enough heap to hold 2^32 instances of java.lang.Object ... in a 64-bit JVM.
A third way is to look at some well-known examples where two different two-character strings have the same hash code.
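One such pair is "Aa" and "BB". The sketch below only relies on the documented String.hashCode formula; the class and method names are made up for the example.

```java
final class StringCollisionDemo {
    // True when the two strings are unequal yet share a hash code.
    static boolean collides(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // String.hashCode is specified as s[0]*31^(n-1) + ... + s[n-1].
        System.out.println("Aa".hashCode());      // 65*31 + 97 = 2112
        System.out.println("BB".hashCode());      // 66*31 + 66 = 2112
        System.out.println(collides("Aa", "BB")); // true
    }
}
```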
Given that your assumption is incorrect, the reasoning that follows from it is also incorrect.
Java does need an equals method.
You generally cannot implement equals using just hashCode.
You may be able to use hashCode to implement a faster equals method, but only if calling hashCode twice is faster than comparing two objects. It generally isn't.
hashCodes are equal -> objects might be equal -> further comparison is required
hashCodes are different -> objects are not equal (if hashCode is implemented correctly)
That's how an equals method can be implemented. First you check whether the hashCodes are equal. If they are, you need to check the class fields to see whether they represent the same object. If the hashCodes are different, you can be sure that the objects are not equal.
Sometimes (very often?) you don't!
These answers are not untrue. But they don't tell the whole story.
One example would be where you are creating a load of objects of class SomeClass, and each instance that is created is given a unique ID by incrementing a static variable, nInstanceCount, or some such, in the constructor:
iD = nInstanceCount++;
Your hash function could then be
@Override
public int hashCode() {
    return iD;
}
and your equals could then be
@Override
public boolean equals(Object obj) {
    if (!(obj instanceof SomeClass)) {
        return false;
    }
    return hashCode() == obj.hashCode();
}
... under such circumstances your idea that "equals is superfluous" is effectively true: if all classes behaved like this, Java 10 (or Java 23) might say, ah, let's just get rid of silly old equals, what's the point? (NB backwards compatibility would then go out the window).
There are two essential points:
you couldn't then create more than MAXINT instances of SomeClass. Or... you could ... if you set up a system for reassigning the IDs of previously destroyed instances. IDs are typically long rather than int ... but this wouldn't work because hashCode() returns int.
none of these objects could then be "equal" to another one, since equality = identity for this particular class, as you have defined it. Often this is desirable. Often it shuts off whole avenues of possibilities...
The necessary implication of your question is, perhaps, what's the use of these two methods which, in a rather annoying way, have to "cooperate"? Frelling, in his/her answer, alluded to the crucial point: hash codes are needed for sorting into "buckets" with classes like HashMap. It's well worth reading up on this: the amount of advanced maths that has gone into designing efficient "bucket" mechanisms for classes like HashMap is quite frightening. After reading up on it you may come to have (like me) a bit of understanding and reverence about how and why you should bother implementing hashCode() with a bit of thought!
While reviewing a large code base, I've often come across cases like this:
@Override
public int hashCode()
{
    return someFieldValue.hashCode();
}
where the programmer, instead of generating their own unique hash code for the class, simply inherits the hash code from a field value. My gut feeling (which might just as well be digestive problems) tells me that this is wrong, but I can't put my finger on it. What problems can arise, if any, with this sort of implementation?
This is fine if you want to hash your object based on a single property.
For example, in a Person class you might have an ID property that uniquely identifies a Person, so the hashCode() of Person can simply be the hash of that ID.
In addition, the hashCode() is related to the implementation of equals. If two objects are equal, they must have the same hashCode (the opposite doesn't have to be true - two non equal objects may still have the same hashCode). Therefore, if equality is determined by a single property (such as a unique ID), the hashCode method must also use only that single property.
This can be seen in the JavaDoc of hashCode :
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
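As a minimal sketch of the single-property case, assume a hypothetical Person class whose equality is defined by a unique, non-null id (the field names are illustrative only). Because equals looks only at id, hashCode does too:

```java
// Hypothetical Person class: equality is defined by the id field alone.
final class Person {
    private final Long id;     // assumed unique and non-null
    private final String name; // deliberately not part of equality

    Person(Long id, String name) {
        this.id = id;
        this.name = name;
    }

    String getName() {
        return name;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof Person)) {
            return false;
        }
        return id.equals(((Person) obj).id);
    }

    // Delegates to the one field that equals uses.
    @Override
    public int hashCode() {
        return id.hashCode();
    }
}
```

Including name in hashCode here would break the contract, since two persons with the same id but different names are equal and must therefore share a hash code.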
Technically speaking, you can return any consistent number from hashCode, even a constant value. The only requirement the contract places upon you is that equal objects must return the same hash code:
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
Theoretically, if all objects return, say, zero for their hashCode, the contract is formally satisfied. However, this makes hashCode completely useless.
The real question is whether you should do it or not. The answer depends on how unique the field is whose hash code you are returning. It is not uncommon to return the hashCode of an object's unique identifier as the object's hashCode. On the other hand, if a significant percentage of objects have the same value of someFieldValue, you would be better off using a different strategy for making the hash code of your object.
hashCode() has to go with equals().
If the only property defining equalness is, for example, an ID, you HAVE TO take care that your hash codes are equal when the ID is equal.
The easiest way to accomplish this is by taking the hashCode() of your ID.
This is fine, if you really want to uniquely identify your object by this single property. Here is an article that explains what object identity really is.
As noted in the documentation of Object, your equals() and hashCode() need to incorporate the same properties, be sure to verify that.
So this means that you should ask yourself the question: do I really want the objects to be equal if only this single property is equal?
Finally, take great care when subclassing objects with a custom equals() and hashCode() implementation. If you add properties to the identity of the object, you will break the requirement that a.equals(b) == b.equals(a) (to see why this fails, think of a as an instance of the superclass and b as an instance of the subclass).
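A small sketch of how the symmetry requirement breaks, using two hypothetical classes (Point and ColorPoint are invented for this illustration):

```java
// Hypothetical superclass: equality is based on x and y.
class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof Point)) {
            return false;
        }
        Point other = (Point) obj;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        return 31 * x + y;
    }
}

// Hypothetical subclass that adds color to the identity of the object.
class ColorPoint extends Point {
    final String color;

    ColorPoint(int x, int y, String color) {
        super(x, y);
        this.color = color;
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof ColorPoint)) {
            return false; // a plain Point is never equal to a ColorPoint
        }
        return super.equals(obj) && color.equals(((ColorPoint) obj).color);
    }
    // hashCode is inherited; equal ColorPoints share x and y, so that part of the contract holds.
}
```

With a = new Point(1, 2) and b = new ColorPoint(1, 2, "red"), a.equals(b) is true because Point only looks at x and y, while b.equals(a) is false because a is not a ColorPoint.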
Yes, technically you can do it; you need a non-primitive someFieldValue for that.
I have two simple wrapper classes around an Integer field, where I had to override equals() and hashCode(). In the end, they both use the same algorithm for hashCode(), so if the Integer field is the same, the hash codes collide.
Since the Objects are different types does this matter, or should I only care if I expect to mix them as keys in the same HashMap?
hashCode() being equal for two objects says "there's a chance these objects are equal, take a closer look by calling equals()". As long as the equals() methods for those classes are correct, the hash codes being the same is not a problem.
The general rule for hashCode() is that if two objects are equal, their hash codes should also be equal. Note that the rule is not "if two objects have the same hash code, then they should be equal."
If you are likely to have a hash map with objects of both types with the same values, then that is obviously going to be a potential performance problem. HashMap and the like don't look at the actual runtime class - and indeed there isn't a standard way to tell whether two objects of different classes can be equal (for instance, Lists with the same values in the same order generated by ArrayList and Arrays.asList should compare equal). For HashMap, I'm guessing the hit wont be too bad, but could be worse for, say, a probing implementation where there is a significant gain for getting a hit on first inspection.
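A quick sketch of the situation in the question, with two hypothetical wrapper classes (UserId and OrderId are invented names) used as keys in the same HashMap. The hash codes collide, but lookups stay correct because each equals checks the type:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper around an Integer.
final class UserId {
    private final Integer value;

    UserId(Integer value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof UserId && value.equals(((UserId) obj).value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }
}

// Second hypothetical wrapper using the same hashing algorithm.
final class OrderId {
    private final Integer value;

    OrderId(Integer value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof OrderId && value.equals(((OrderId) obj).value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }
}

final class MixedKeysDemo {
    // Builds a map holding one key of each type with the same wrapped value.
    static Map<Object, String> build() {
        Map<Object, String> map = new HashMap<>();
        map.put(new UserId(42), "user");
        map.put(new OrderId(42), "order");
        return map;
    }
}
```

The only cost is that both keys share a bucket, so a lookup may need one extra equals call.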
I'm using Java. I want to know whether there is any algorithm that will give me a hash code that is both unique and the same each time I run the application, so that hash code collisions are avoided.
I know that for equal objects the JVM returns the same hash code, and for different objects it may return the same or different hash codes. But I want some logic that will help to generate a unique hash code for every object.
Unique means that the hash code of one object should not collide with any other object's hash code. Same means that when I run the application multiple times, it should return the same hash code it returned previously.
The default hash code function in Java might return different hash codes on each JVM invocation, because it is able to use the memory address of the object, mangle it, and return it.
This is however not good coding practice, since objects which are equal should always return the same hash code! Please read about the hash code contract to learn more. Most classes in Java already have a hashCode function implemented that returns the same value on each JVM invocation.
To make it simple: all your data-holding objects which might be stored in some collection should have an equals and hashCode implementation. If you code with Eclipse or any other reasonable IDE, you can use a wizard that creates the functions automatically.
And while we are at it: it is IMHO good practice to also implement the Comparable<T> interface, so you can use the objects within SortedSets and TreeMaps, too.
While we are at it: if others should use your objects, don't forget Serializable and Cloneable.
Unique means that hashcode of one object should not collide with any other object's hashcode. Same means when I run the application multiple times, it should return me the same hash code whatever it returned me previously.
It is impossible to meet these requirements for a number of reasons:
It is not possible to guarantee that hashcodes are unique. Whatever you do in your classes hashcode method, some other classes hashcode method may give a value for some instance that is the same as the hashcode of one of your instances.
It is impossible to guarantee that hashcodes are unique across application runs even just for instances of your class.
The second requires justification. The way to create a unique hashcode is to do something like this:
static HashSet<Integer> usedCodes = ...
static IdentityHashMap<YourClass, Integer> codeMap = ...

@Override
public int hashCode() {
    Integer code = codeMap.get(this);
    if (code == null) {
        code = // generate value-based hashcode for 'this'
        while (usedCodes.contains(code)) {
            code = rehash(code);
        }
        usedCodes.add(code);
        codeMap.put(this, code);
    }
    return code;
}
This gives the hashcodes with the desired uniqueness property, but the sameness property is not guaranteed ... unless the application always generates / accesses the hashcodes for all objects in the same order.
The only way to get this to work would be to persist the usedCodes and codeMap data structures in a suitable form. Even (just) storing the unique hashcodes as part of the persisted objects is not sufficient, because there is a risk that the application may reissue a hashcode to a newly created object before reading the existing object that has the hashcode.
Finally, it should be noted that you have to be careful with using identity hashcodes anywhere in the solution. Identity hashcodes are not unique across different runs of an application. Indeed, if there are differences in any inputs, or if there is any non-determinism, it is highly likely that a given object will have a different identity hashcode value each time you run the application.
FOLLOW UP
Suppose you are storing millions of urls in database. While retrieving these urls, I want to generate unique hashcode that will make searching faster.
You need to store the hashcodes in a separate column of the table. But given the constraints discussed above, I don't see how this is going to make search faster. Basically you have to search the database for the URL in order to work out its unique hashcode.
I think you are better off using hashcodes that are not unique with a small probability. If you use a good enough "cryptographic" hashing function and a large enough hash size you can (in theory) make the probability of collision arbitrarily small ... but not zero.
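As a rough sketch of that suggestion, the URL could be digested with SHA-256 and the hex string stored in an indexed column. UrlDigest and sha256Hex are hypothetical names, and the choice of SHA-256 is an assumption; the answer only asks for a good enough cryptographic hash.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class UrlDigest {
    // Returns the SHA-256 digest of the URL as a 64-character lowercase hex string.
    static String sha256Hex(String url) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(url.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```

The result depends only on the bytes of the URL, so it is the same on every run and every JVM. Collisions remain possible in principle, just extremely unlikely.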
Based on my understanding of your question...
If it is your custom object, then you can override the hashCode method (along with equals) to get a consistent hash code based on the instance variables of your class. You can even return a constant hash code; it will still satisfy the hashCode contract.
The hashCode of a java Hashtable element is always unique?
If not, how can I guarantee that one search will give me the right element?
Not necessarily. Two distinct (and not-equal) objects can have the same hashcode.
First thing first.
You should consider using HashMap instead of Hashtable, as the latter is considered obsolete (it enforces implicit synchronization, which is not required most of the time; if you need a synchronized HashMap, that is easily doable).
Now, regarding your question.
A hash code is not mathematically guaranteed to be unique;
however, when you're using HashMap (or Hashtable), it does not matter.
If two keys generate the same hash code, equals is automatically invoked on the keys in that bucket to guarantee that the correct object is retrieved.
If you're using a String as your key, you're worry-free.
But if you're using your own object as the key, you should override both the equals and the hashCode methods.
The equals method is mandatory for the proper operation of HashMap, whereas the hashCode method should be coded such that the hash table will be relatively sparse (otherwise your hash map will be little more than one long list).
If you're using Eclipse there's an easy way to generate hashCode and equals, it basically does all the work for you.
From the Java documentation:
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an
execution of a Java application, the hashCode method must consistently
return the same integer, provided no information used in equals comparisons
on the object is modified. This integer need not remain consistent
from one execution of an application to another execution of the same
application.
If two objects are equal according to the equals(Object)
method, then calling the hashCode method on each of the two objects must
produce the same integer result.
It is not required that if two objects are unequal according to the
equals(java.lang.Object) method, then calling the hashCode method on each of
the two objects must produce distinct integer results. However, the
programmer should be aware that producing distinct integer results for
unequal objects may improve the performance of hashtables.
As much as is reasonably practical,
the hashCode method defined by class
Object does return distinct integers
for distinct objects. (This is
typically implemented by converting
the internal address of the object
into an integer, but this
implementation technique is not
required by the Java™ programming
language.)
So yes, you can typically expect the default hashCode for an Object to be unique. However, if the method has been overridden by the class you are storing in the Hashtable, all bets are off.
Ideally, yes. In reality, collisions do occasionally happen.
The hashCode of a java Hashtable element is always unique?
They should. At least within the same class.
If not, how can I guarantee that one search will give me the right element?
By giving your class a good hashCode implementation: override equals() and hashCode().