My understanding of the hashCode method in Java: it computes an object's hash code, which in turn is used to compute the object's index / bucket location in a hashed data structure such as HashMap.
So is it correct to say that a class that will not be used with a hashed data structure doesn't need to implement hashCode()? In other words, is overriding equals() enough for non-hashed data structures?
Please correct me if my assumption is wrong.
In theory, you are correct: if you know that your object will never be used in any way that requires the hash code, then not implementing hashCode will not cause anything bad to happen.
In practice there are reasons not to rely on this fact:
code changes, and objects that were initially planned to exist only in non-hashed structures get put into sets or used as keys in maps, because requirements change. If you haven't implemented hashCode by then, things can go bad.
if you implement equals and you don't implement hashCode, then you are almost certainly breaking the contract of hashCode, which requires it to be consistent with equals. If your code breaks a contract that other code depends on, then that other code can silently fail in unexpected/weird ways.
in order to avoid mistakes it's usually best to let your IDE generate equals anyway, and if you do that, generating the appropriate hashCode at the same time is no extra effort.
Note that all of this assumes you even want to implement equals: if you don't care about equality for a type at all (which is actually very common, not every type needs a specific equality definition), then you can just leave equals and hashCode out of your code and it's all conforming (though the default identity-based equality might then not match the "intuitive" notion of equality for your type).
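To make the "consistent with equals" point concrete, here is a minimal sketch. The Point class and its fields are hypothetical and only serve as an illustration; the key idea is that hashCode uses exactly the fields that equals compares.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical value class; the name and fields are illustrative only.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Point)) return false;
        Point p = (Point) other;
        return x == p.x && y == p.y;
    }

    // Uses exactly the fields that equals compares, so equal points
    // always produce equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}

class PointDemo {
    public static void main(String[] args) {
        Set<Point> seen = new HashSet<>();
        seen.add(new Point(1, 2));
        // Lookup with an equal but distinct instance succeeds.
        System.out.println(seen.contains(new Point(1, 2))); // prints true
    }
}
```

If requirements change later and Point ends up in a set or as a map key, nothing else needs to be touched.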
I am working on an assignment where I am supposed to implement my own hashcode and equals functions for a class called Book. The hashtable has its own abstract class called ProbingHashTable (there are also other classes for implementing different types of probing that extends the ProbingHashTable class, for another part of the assignment).
I would like to include key % tableSize (among other things) in the hashcode function if possible. The ProbingHashTable has a size() function, but of course the size can change if the hashtable gets rehashed etc. Is it possible to access the current size of the table from the Book class somehow?
It's somewhat unclear what you want, but it seems like you want to change what you return in hashCode based on some state unrelated to the object hashCode is being called on.
That's tricky to do and, more importantly, almost certainly breaks the contract of hashCode. If you check the documentation, the first requirement for hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified
This means that your hashCode method must not consider any data that isn't also considered in the equals method. And since "the size of a map that is mostly unrelated to this object" isn't anything that should be used in your equals method it's definitely not something that you should use in your hashCode.
You could hold a reference to the map in your Book class and just query the current size in the hashCode method, but it's bound to cause a good number of issues and not really do anything good for you.
That being said: the calculation you describe (like key % tableSize) is usually done in the map implementation after the hashCode method was called.
In other words: it's perfectly fine if the map implementation (or whoever gets and uses the hashCode result) further manipulates the return value to fit its requirements. And that's what you probably should work towards.
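As a rough sketch of that split of responsibilities: the Book fields below and the BucketIndex helper are assumptions made up for illustration (they are not part of your assignment's ProbingHashTable). Book only hashes its own state, and the table reduces the result to a slot using whatever its current size happens to be.

```java
import java.util.Objects;

// Hypothetical Book; the fields are assumptions, not taken from the assignment.
final class Book {
    private final String isbn;
    private final String title;

    Book(String isbn, String title) {
        this.isbn = isbn;
        this.title = title;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Book)) return false;
        Book b = (Book) other;
        return Objects.equals(isbn, b.isbn) && Objects.equals(title, b.title);
    }

    // Depends only on the book's own state, never on any table.
    @Override
    public int hashCode() {
        return Objects.hash(isbn, title);
    }
}

// Hypothetical helper showing the step a table could perform internally.
final class BucketIndex {
    private BucketIndex() {}

    // floorMod keeps the result in [0, tableSize) even for negative hash codes.
    static int indexFor(Object key, int tableSize) {
        return Math.floorMod(key.hashCode(), tableSize);
    }
}
```

After a rehash the table simply calls indexFor again with the new size; the Book never needs to know.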
Let's say I am implementing a class called Car, with 2 member variables: int numDoors and String color.
In a hypothetical case, I am never, ever going to use such a car in a hashtable or hashmap or any structure that needs a hash.
Now, why is it still required to override hashCode along with equals?
Note: all the answers I have checked out involve use in a hashtable / hashmap. I have searched extensively for this answer, so please don't mark it as a duplicate. Thanks
It's the general convention:
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
However, it's not entirely enforceable.
There are times in which you would believe that you don't need to have hashCode defined and implemented for your object, and if you don't use any structure that relies on a hash to store or reference it, you'd be correct.
But there are third-party libraries with which your object may come into contact, and they may very well be using a Map or Set to do their work, and they'd expect that you followed conventions.
It's up to you to not implement hashCode along with equals - you're certainly not forced to (although many would argue that this is a bug), but beware that your object may not work as well with a third party library for this reason.
The only conceivable types which would not be able to override the hashCode method in a fashion consistent with the hashCode and equals contract would be those which are unable to override hashCode at all [e.g. because a base class declared it final]. There is thus almost never any reason for a type not to implement a legitimate hashCode(). Even if a type cannot guarantee that instances which are unequal won't spontaneously become equal, the author of the type may still legitimately implement hashCode() by picking a 32-bit int value [e.g. 8675309] and implementing hashCode() as @Override public int hashCode() { return 8675309; }. Doing this will allow all of the hash-table-based collection types to work correctly. Storing very many such items into a hash table will severely degrade performance, but hash tables with just a few items will work just fine and generally perform decently. By contrast, if one overrides equals but doesn't override hashCode, a hash table will likely work incorrectly even if only a single item is stored into it.
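A minimal sketch of the constant-hash approach, assuming a hypothetical mutable Tag class whose equality depends on a field that can change. Because every instance reports the same hash code, a lookup still works after a mutation, at the cost of every item landing in the same bucket.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical mutable type whose equality can change over time.
final class Tag {
    private String label;

    Tag(String label) {
        this.label = label;
    }

    void setLabel(String label) {
        this.label = label;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof Tag && label.equals(((Tag) other).label);
    }

    // Constant hash: trivially consistent with equals, but every
    // instance collides, so large tables degrade to linear scans.
    @Override
    public int hashCode() {
        return 8675309;
    }
}

class TagDemo {
    public static void main(String[] args) {
        Set<Tag> tags = new HashSet<>();
        Tag t = new Tag("draft");
        tags.add(t);
        t.setLabel("final"); // the hash code is unaffected by the mutation
        System.out.println(tags.contains(new Tag("final"))); // prints true
    }
}
```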
Incidentally, in some cases there may be advantages to implementing hashCode even when not using hashed collections. For example, some immutable collection types which support deep comparison might call hashCode() on the items stored therein. If a collection is large, and/or comparison operations on the items stored within it are expensive, the efficiency of testing two collections for equality ("do they contain equal items") may be enhanced by using a three-step process:
Compare the aggregate hashcode of two collections. If they're not equal, no reason to look any further. Will often yield instant results, no matter the size of the collections.
Compare the cached hash codes of all the items. If the collections' contents match except for the last couple items, and if comparisons between items may be expensive (e.g. the items are thousand-character strings) this will often avoid the need to compare all but one of the items for equality [note that if all but one of the items matched, and its hash code differed, then the aggregate hash code would differ and we wouldn't have gotten this far].
If all the hash codes match, then call equals on each pair of items that don't compare reference-equal.
Note that if two collections contain distinct items with equal content, a comparison is going to need to deeply examine all of the items; hashCode can't do anything to help with that case. On the other hand, in most cases where things are compared they are not equal, and using cached hashCode() values may facilitate orders-of-magnitude speedups in those cases.
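The three steps could look roughly like the sketch below. HashedList is a hypothetical wrapper invented for this illustration, and it assumes the items themselves are immutable so that the hash codes cached at construction stay valid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical immutable list wrapper that caches hash codes at construction.
// Assumes the items are immutable, otherwise the cached values go stale.
final class HashedList<T> {
    private final List<T> items;
    private final int[] itemHashes;
    private final int aggregateHash;

    HashedList(List<T> source) {
        this.items = Collections.unmodifiableList(new ArrayList<>(source));
        this.itemHashes = new int[items.size()];
        int h = 1;
        for (int i = 0; i < itemHashes.length; i++) {
            itemHashes[i] = Objects.hashCode(items.get(i));
            h = 31 * h + itemHashes[i];
        }
        this.aggregateHash = h;
    }

    boolean contentEquals(HashedList<?> other) {
        if (items.size() != other.items.size()) return false;
        // Step 1: compare the aggregate hash codes.
        if (aggregateHash != other.aggregateHash) return false;
        // Step 2: compare the cached per-item hash codes.
        if (!Arrays.equals(itemHashes, other.itemHashes)) return false;
        // Step 3: call equals only on pairs that are not reference-equal.
        for (int i = 0; i < items.size(); i++) {
            Object a = items.get(i);
            Object b = other.items.get(i);
            if (a != b && !Objects.equals(a, b)) return false;
        }
        return true;
    }
}
```

Step 3 is still required: "Aa" and "BB" have the same String hash code, so two lists differing only in those items pass steps 1 and 2.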
I am currently working on comparing two complex objects of the same type, with multiple fields consisting of data structures of custom object types. Assuming that none of the custom objects has overridden the hashCode() method, if I compare the hash codes of every field in the objects and they turn out to be the same, can I be 100% confident that the content of the compared objects is the same? If not, which method would you recommend to compare two objects, assuming I can't use any external libraries?
Absolutely not. You should only use hashCode() as a first pass - if the hash codes are different, you can assume the objects are unequal. If the hash codes are the same, you should then call equals() to check for full equality.
Think about it this way: there are only 2^32 possible hash codes. How many possible different objects are there of type String, as an example? Far more than that. Therefore at least two non-equal strings must share the same hash code.
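Here is a small sketch of the "first pass" idea. The HashFirstPass helper is a hypothetical name, and it assumes both arguments are non-null and that their class overrides equals and hashCode consistently. The strings "Aa" and "BB" are a well-known pair that share a hash code under the documented String.hashCode formula.

```java
// Hypothetical helper: use hashCode as a cheap first pass, then equals.
final class HashFirstPass {
    private HashFirstPass() {}

    static boolean sameContent(Object a, Object b) {
        // Different hash codes: the objects cannot be equal.
        if (a.hashCode() != b.hashCode()) return false;
        // Same hash code proves nothing, so fall back to equals.
        return a.equals(b);
    }
}

class HashFirstPassDemo {
    public static void main(String[] args) {
        System.out.println("Aa".hashCode()); // prints 2112
        System.out.println("BB".hashCode()); // prints 2112
        System.out.println(HashFirstPass.sameContent("Aa", "BB")); // prints false
    }
}
```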
Eric Lippert writes well about hash codes - admittedly from a .NET viewpoint, but the principles are the same.
No, matching hashCode() values only mean that the objects could be equal; it's never a guarantee.
The only guarantee is that if the hashCode() values are different (and the hashCode()/equals() implementations are correct), then the objects will not be equal.
Additionally if your custom types don't have a hashCode() implementation, then that value is entirely useless for comparing the content of the object, because it will be the identityHashCode().
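A quick way to see this for yourself, using a hypothetical Plain class that overrides neither method. The actual hash values vary from run to run, so the sketch only compares them and states no expected numbers.

```java
// Hypothetical class that overrides neither equals nor hashCode.
final class Plain {
    final int value;

    Plain(int value) {
        this.value = value;
    }
}

class PlainDemo {
    public static void main(String[] args) {
        Plain a = new Plain(1);
        Plain b = new Plain(1);
        // Without an override, hashCode() is the identity hash code.
        System.out.println(a.hashCode() == System.identityHashCode(a)); // prints true
        // Same content, yet the inherited equals compares references only.
        System.out.println(a.equals(b)); // prints false
        // These values say nothing about the content of a and b.
        System.out.println(a.hashCode() + " vs " + b.hashCode());
    }
}
```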
If you haven't overridden the hashCode() method, all of your objects are unequal. By overriding it you provide the logic of the comparison. Remember, if you override hashCode(), you definitely should override equals().
EDIT:
there can still be a collision, of course, but if you didn't override equals(), your objects will be compared by reference (an object is equal only to itself).
The usual JVM implementation of Object.hashCode() is to return the memory address of the object in some format, so this could technically be used for what you want (as no two objects can share the same address).
However, the actual specification of Object.hashCode() makes no guarantees, and it should not be used for this purpose in any sensible or well-written piece of code.
I would suggest using the hashCode and equals builders available in the Apache commons library, or if you really can't use free external libs, just have a look at them for inspiration. The best method to use depends entirely on what "equal" actually means in the context of your application domain though.
In my implementation, I have a class A which overrides equals(Object) and hashCode(). But I have one doubt: when an instance of A is added to a HashSet/HashMap the value of hashCode() is x, and some time later the value of hashCode() for the same instance changes to y. Will that affect anything?
The hash code mustn't change after it's been added to a map / set. It's okay for it to change before that, although it generally makes the type easier to work with if it doesn't change.
If the hash code changes, the key won't be found in the map / set, as even if it ends up in the same bucket, the hash code will be checked first.
When the return value of hashCode() or equals() changes while the object is contained in HashMap/HashSet etc., the behavior is undefined (you could get all kinds of strange behavior). So one must avoid such mutation of keys while the object is contained in such collections etc.
It is considered best to use only immutable objects as keys (or as elements of a HashSet etc.). Python, for example, does not allow its built-in mutable containers such as lists to be used as dictionary keys. Using mutable objects as keys is permitted and common in Java, but in that case it is advisable to make such objects "effectively immutable", i.e. not to change their state at all after instantiation.
To give an example, using a list as a key in a Map is usually considered okay, but you should avoid mutating such lists at any point of your application to avoid getting bitten by nasty bugs.
As long as you don't change the return value of hashCode() and equals() while the objects are in the container, you should be ok on paper. But one could easily introduce nasty, hard to find bugs by mistake so it's better to avoid the situation altogether.
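To illustrate how such a bug shows up, here is a minimal sketch with a hypothetical MutableKey class whose hash code depends on a mutable field. With the standard java.util.HashSet, the element becomes unreachable once the field changes while it is in the set.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical key whose hash code depends on mutable state.
final class MutableKey {
    private int id;

    MutableKey(int id) {
        this.id = id;
    }

    void setId(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof MutableKey && id == ((MutableKey) other).id;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id);
    }
}

class MutableKeyDemo {
    // Adds a key, mutates it while it is in the set, then looks it up again.
    static boolean foundAfterMutation() {
        Set<MutableKey> set = new HashSet<>();
        MutableKey key = new MutableKey(1);
        set.add(key);
        key.setId(2); // hash code changes while the key is stored
        return set.contains(key);
    }

    public static void main(String[] args) {
        // The set still holds the object, but the lookup misses it.
        System.out.println(foundAfterMutation());
    }
}
```

Looking up new MutableKey(1) fails as well: the stored hash matches, but equals now compares against id 2.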
Yes, the hash code of an object must not change during its lifetime. If it does, you need to notify the container (if that's possible); otherwise you can get wrong results.
Edit: As pointed out, it depends on the container. Obviously, if the container never uses your hashCode or equals methods, nothing will go wrong. But as soon as it tries to compare things for equality (all maps and sets), you'll get yourself in trouble.
Yes. Many people have answered the question here; I just want to offer an analogy. A hash code is something like an address in a hash-based collection:
Imagine you check in to a hotel under the name "Mike", and after that you change your name to "GreatMike" on the check-in form. Then when someone looks for you under the name "Mike", they cannot find you anymore.
Is the hashCode of a Java Hashtable element always unique?
If not, how can I guarantee that one search will give me the right element?
Not necessarily. Two distinct (and not-equal) objects can have the same hashcode.
First things first.
You should consider using HashMap instead of Hashtable, as the latter is considered obsolete (it enforces implicit synchronization, which is not required most of the time; if you need a synchronized HashMap, that is easily doable).
Now, regarding your question.
A hash code is not mathematically guaranteed to be unique;
however, when you're using HashMap (or Hashtable), it does not matter.
If two keys generate the same hash code, an equals is automatically invoked on each one of the keys to guarantee that the correct object will be retrieved.
If you're using a String as your key, you're worry-free.
But if you're using your own object as the key, you should override the equals and the hashCode methods.
The equals method is mandatory for the proper operation of HashMap, whereas the hashCode method should be coded such that the hash table will be relatively sparse (otherwise your HashMap will behave like one long list).
If you're using Eclipse there's an easy way to generate hashCode and equals, it basically does all the work for you.
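As a small sketch of the collision handling described above: EmployeeId is a hypothetical key class with a deliberately weak hashCode (for demonstration only), so that the keys 1 and 11 collide. HashMap still returns the right value for each, because equals decides between keys with the same hash code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key class with a deliberately weak hash function.
final class EmployeeId {
    private final int number;

    EmployeeId(int number) {
        this.number = number;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof EmployeeId && number == ((EmployeeId) other).number;
    }

    // Weak on purpose: 1, 11, 21, ... all share a hash code.
    // It is still consistent with equals, so lookups stay correct.
    @Override
    public int hashCode() {
        return number % 10;
    }
}

class EmployeeIdDemo {
    public static void main(String[] args) {
        Map<EmployeeId, String> names = new HashMap<>();
        names.put(new EmployeeId(1), "Ann");
        names.put(new EmployeeId(11), "Bob");
        System.out.println(names.get(new EmployeeId(1)));  // prints Ann
        System.out.println(names.get(new EmployeeId(11))); // prints Bob
    }
}
```

A better hashCode here would simply return Integer.hashCode(number), which spreads the keys across buckets.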
From the Java documentation:
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an
execution of a Java application, the hashCode method must consistently
return the same integer, provided no information used in equals comparisons
on the object is modified. This integer need not remain consistent
from one execution of an application to another execution of the same
application.
If two objects are equal according to the equals(Object)
method, then calling the hashCode method on each of the two objects must
produce the same integer result.
It is not required that if two objects are unequal according to the
equals(java.lang.Object) method, then calling the hashCode method on each of
the two objects must produce distinct integer results. However, the
programmer should be aware that producing distinct integer results for
unequal objects may improve the performance of hashtables.
As much as is reasonably practical, the hashCode method defined by class
Object does return distinct integers for distinct objects. (This is
typically implemented by converting the internal address of the object
into an integer, but this implementation technique is not required by the
Java(TM) programming language.)
So yes, you can typically expect the default hashCode for an Object to be unique. However, if the method has been overridden by the class you are storing in the Hashtable, all bets are off.
Ideally, yes. In reality, collisions do occasionally happen.
Is the hashCode of a Java Hashtable element always unique?
It should be. At least within the same class.
If not, how can I guarantee that one search will give me the right element?
By providing a good hashCode implementation for your class yourself: override equals() and hashCode().