Hash Code method in Object class [duplicate] - java

This question already has answers here:
Why equals and hashCode were defined in Object?
(10 answers)
Closed 5 years ago.
I was wondering why hashCode() is implemented in the Object class when its purpose is served only when using collections like HashMap. Shouldn't hashCode() instead be declared in the interfaces that the Map implementations rely on?

It is not accurate to say that the hashCode implementation is used
by collections only.
In the Java API documentation, the general contract of hashCode is given as:
Whenever it is invoked on the same object more than once during an
execution of a Java application, the hashCode method must
consistently return the same integer, provided no information used
in equals comparisons on the object is modified. This integer need
not remain consistent from one execution of an application to
another execution of the same application.
If two objects are equal according to the equals(Object) method,
then calling the hashCode method on each of the two objects must
produce the same integer result.
It is not required that if two objects are unequal according to the
equals(java.lang.Object) method, then calling the hashCode method on
each of the two objects must produce distinct integer results.
However, the programmer should be aware that producing distinct
integer results for unequal objects may improve the performance of
hashtables.
So hashCode is a property of Object itself. Collections merely benefit
from this feature for their own use cases, e.g. checking whether objects have the same hash code or storing objects based on their hash code.
Note: collections don't use the hashCode value to sort objects.
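The contract quoted above can be illustrated with a small sketch. GridPoint is a hypothetical class invented for this example; its hashCode is deliberately coarse to show that unequal objects may legally share a hash code, while equal objects must not differ.

```java
class HashContractDemo {
    // Hypothetical value class used only for this illustration.
    static final class GridPoint {
        private final int x;
        private final int y;

        GridPoint(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof GridPoint)) return false;
            GridPoint that = (GridPoint) other;
            return x == that.x && y == that.y;
        }

        // Deliberately coarse: only x is hashed, so unequal points can collide.
        // That is still legal under the contract, just slower in hash tables.
        @Override
        public int hashCode() {
            return Integer.hashCode(x);
        }
    }

    public static void main(String[] args) {
        GridPoint a = new GridPoint(1, 2);
        GridPoint b = new GridPoint(1, 2);
        GridPoint c = new GridPoint(1, 3);

        // Equal objects must report equal hash codes.
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode()); // true
        // Unequal objects are allowed to share a hash code.
        System.out.println(!a.equals(c) && a.hashCode() == c.hashCode()); // true
    }
}
```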

The hashCode() method is used mainly by hash-based collections like HashMap and HashSet. The hash code returned by this method is used to calculate the hash index, or bucket index.
A hashCode implementation is needed by every class, POJO or bean whose instances are compared for equality, because equal objects must report equal hash codes.
Suppose we need to compare two objects independently of the Collections API; there should be a way to achieve that.
If hashCode were not part of the Object class, it would be difficult to calculate the hash every time, and that would be an extra burden.
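As a rough sketch of what "bucket index" means: a hash table can reduce a hash code to a slot number with a modulo operation. The bucketIndex helper below is hypothetical and simplified; it is not how java.util.HashMap computes its index internally.

```java
class BucketIndexDemo {
    // Maps any hash code, including negative ones, to a slot in [0, bucketCount).
    static int bucketIndex(Object key, int bucketCount) {
        return Math.floorMod(key.hashCode(), bucketCount);
    }

    public static void main(String[] args) {
        int buckets = 16;
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + " -> bucket " + bucketIndex(key, buckets));
        }
    }
}
```

Equal keys always land in the same bucket because they report the same hash code, which is why the lookup only has to search one bucket.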

It's a pragmatic design decision, but the question is essentially correct. A purist analysis would say that it's an example of Interface Bloat or a Fat Interface.
The Java java.lang.Object has more methods than are strictly required by (or even meaningful for) all objects. It's not just hashCode() either.
It's arguable that the only method on Object that makes sense for all objects is getClass().
Not all applications are concurrent, let alone in need of their own monitors. So a purist object model would move notify(), notifyAll() and the 3 versions of wait() to an interface called (say) Monitored and then only permit synchronized to be used with objects implementing that.
It's very common for it to be either invalid or unnecessary to clone() objects, though that method is fortunately protected. Again, it would be best off in an interface, say interface Cloneable<T>.
Object identity comparisons (are these references to the same object?) are provided by the intrinsic operator ==, so equals(Object) should (still being a purist) be in a ValueComparable<T> interface for objects that have that semantic (many don't).
Being very pure, even then you'd push hashCode() into another interface, (say) HashCodable.
finalize() could also be put in an interface HasFinalize. Indeed, that could make the garbage collector's life a bit easier, especially given that its use is so rare and specialized.
However there is a clear design decision in Java to simplify things and the designers decided to put a number of methods in Object that are apparently 'frequently used' or useful rather than 'strictly part of the minimal nature of being an object' which is (in the Java model at least) 'being an instance of some class of objects (having common methods, interfaces and semantics)'.
IMHO hashCode() is probably the least out of place!
It is totally unnecessary to provide a monitor on every object, and it leaves implementers with the headache of supporting those methods on every object knowing they will be called for a minuscule number of them. Don't underestimate the overhead that might cause, given that it may be necessary to allocate things like mutexes, or a whole cache line (typically tens of bytes), to every one of millions of objects when there is no sane way they would ever get used.
I'm not suggesting for a second 'Java is broken' or 'badly designed'. I am not here to knock Java. It is a great language. As with the design of generics, it has always chosen to make things simple and been willing to make some compromises on performance for simplicity, and as a result it has produced a very powerful and accessible language in which, thanks to great implementation, those performance overheads only occasionally grate.
But to repeat the point, I think we should recognise that those methods are not in the intrinsic nature of all objects.
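To make the purist model concrete, here is a minimal sketch of what opt-in interfaces could look like. ValueComparable and HashCodable are the hypothetical names suggested above; neither exists in the JDK, and the method names sameValueAs and hashValue, like the Money class, are invented for this example.

```java
class PuristModelDemo {
    // Hypothetical opt-in capability: value equality.
    interface ValueComparable<T> {
        boolean sameValueAs(T other);
    }

    // Hypothetical opt-in capability: hashing.
    interface HashCodable {
        int hashValue();
    }

    // A type opts in only to the capabilities that make sense for it.
    static final class Money implements ValueComparable<Money>, HashCodable {
        private final long cents;

        Money(long cents) {
            this.cents = cents;
        }

        @Override
        public boolean sameValueAs(Money other) {
            return other != null && cents == other.cents;
        }

        @Override
        public int hashValue() {
            return Long.hashCode(cents);
        }
    }

    // A hash-based container could then demand both capabilities from its keys.
    static <K extends ValueComparable<K> & HashCodable> boolean sameKey(K a, K b) {
        return a.hashValue() == b.hashValue() && a.sameValueAs(b);
    }

    public static void main(String[] args) {
        System.out.println(sameKey(new Money(100), new Money(100))); // true
        System.out.println(sameKey(new Money(100), new Money(250))); // false
    }
}
```

The cost of this design is visible in the generic bound on sameKey: every container has to spell out which capabilities it needs, which is the complexity Java avoided by putting the methods on Object.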

Related

Find a specific instance, what is the best approach

So imagine I have two instances of a class:
public class MyClass {
    public void sayHello() {
        System.out.println("Hello");
    }
}
MyClass a = new MyClass();
MyClass b = new MyClass();
Now I add those to another object, such as:
import java.util.ArrayList;

public class OtherClass {
    private ArrayList<MyClass> myClsList = new ArrayList<>();
    public void add(MyClass obj) {
        myClsList.add(obj);
    }
    public void remove(MyClass obj) {
        // ????
    }
}
OtherClass c = new OtherClass();
c.add(a);
c.add(b);
Now I want to remove one specific instance, e.g.
c.remove(a);
Could I just iterate over them and test for equality? I mean, this should theoretically work, since the two instances have distinct "internal pointers".
I guess a HashMap-based approach would be more efficient, but what can I use as a key there (suppose I can't add unique instance IDs or something)?
EDIT: There is some confusion as to what exactly I'd like to know.
The key here is that I'd like to know if there is any way of removing that specific instance from c's ArrayList or whatever Aggregator Object I might use, just by providing the respective object reference.
I imagine this could be done by just keeping the ArrayList and testing for equality (although I'm not 100% sure), but it would be cleaner if it were possible without iterating through the whole list.
I'd just like to know if anything of the like is possible in Java. (I know how to work around it by using additional information, but the point is to have just the respective object reference for filtering/retrieving purposes.)
You can use a.toString(). According to the Javadoc:
The toString method for class Object returns a string consisting of
the name of the class of which the object is an instance, the at-sign
character `@', and the unsigned hexadecimal representation of the hash
code of the object.
This will usually give you a distinct identifier for each class instance, so you can use it as a hash key without storing or creating any extra identifiers.
NB: Be careful with this practice. Don't rely on the value returned by Object.toString() being unique or being related to the actual object address; see detailed explanation here.
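The documented format can be checked directly. The describe helper below is a hypothetical name; it simply rebuilds the string the way the Javadoc for Object.toString() specifies.

```java
class DefaultToStringDemo {
    // Rebuilds the default toString() value as documented for java.lang.Object.
    static String describe(Object o) {
        return o.getClass().getName() + "@" + Integer.toHexString(o.hashCode());
    }

    public static void main(String[] args) {
        Object a = new Object();
        Object b = new Object();
        System.out.println(a.toString().equals(describe(a))); // true
        // The two strings usually differ, but that is not guaranteed,
        // because two distinct objects may share a hash code.
        System.out.println(a + " vs " + b);
    }
}
```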
While your question is one that many beginners have (including myself), I believe that your concern is not justified in this case. The features you are asking for are already built into the Java language at the specification level.
First of all, let's look at Object.equals(). On the one hand, the Language Specification states that
The method equals defines a notion of object equality, which is based on value, not reference, comparison.
However, the documentation for Object.equals() clearly states that
The equals method for class Object implements the most discriminating possible equivalence relation on objects; that is, for any non-null reference values x and y, this method returns true if and only if x and y refer to the same object (x == y has the value true).
This means that you can safely redirect OtherClass.remove to ArrayList.remove(). Whatever Object.equals is comparing works exactly like a unique ID. In fact, in many (but not all) implementations, it compares the memory addresses of the objects, which are a form of unique ID.
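A minimal sketch of that delegation, assuming MyClass overrides neither equals nor hashCode. The size() accessor is a hypothetical addition so that the effect is observable.

```java
import java.util.ArrayList;
import java.util.List;

class ListRemovalDemo {
    // Stand-in for the question's MyClass: no equals/hashCode override.
    static class MyClass { }

    static class OtherClass {
        private final List<MyClass> myClsList = new ArrayList<>();

        void add(MyClass obj) {
            myClsList.add(obj);
        }

        // ArrayList.remove(Object) scans the list and uses equals(), which for
        // MyClass is the inherited identity comparison from Object.
        boolean remove(MyClass obj) {
            return myClsList.remove(obj);
        }

        int size() {
            return myClsList.size();
        }
    }

    public static void main(String[] args) {
        MyClass a = new MyClass();
        MyClass b = new MyClass();
        OtherClass c = new OtherClass();
        c.add(a);
        c.add(b);
        System.out.println(c.remove(a));             // true: that exact instance was found
        System.out.println(c.remove(new MyClass())); // false: a different instance
        System.out.println(c.size());                // 1
    }
}
```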
Quite understandably, you do not wish to use linear iteration every time. As it happens, the machinery of Object is perfectly suited for use with something like a HashSet, which, by the way is the solution I recommend you use in this case.
If you are not dealing with some huge data set, we do not need to discuss the optimization of Object.hashCode(). You just need to know that it will implement whatever contract is necessary to work correctly with Object.equals to make HashSet.remove work correctly.
The spec itself only states that
The method hashCode is very useful, together with the method equals, in hashtables such as java.util.HashMap.
This does not really say much, so we turn to the API reference. The two relevant points are:
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
Simply put, the hashCode of equal objects must be the same, but an equal hashCode does not necessarily mean equal objects. Object implements this contract, so you can use it with a HashSet, which is backed by a HashMap.
The one piece of information that is missing to make this a formal argument in favor of not doing any additional work, is why I keep citing the API reference as if it was the language specification. As it happens:
As noted above, this specification often refers to classes of the Java SE platform API. In particular, some classes have a special relationship with the Java programming language. Examples include classes such as Object, Class, ClassLoader, String, Thread, and the classes and interfaces in package java.lang.reflect, among others. This specification constrains the behavior of such classes and interfaces, but does not provide a complete specification for them. The reader is referred to the Java SE platform API documentation.
[emphasis mine], but you get the idea. The Java SE API reference is the language spec as far as the behavior of the methods of Object is concerned.
As an aside, you will probably want to stay away from something like TreeSet, because that will require you to add a bunch of machinery to your implementation. As a minimum, MyClass instances will have to be orderable, either by implementing Comparable, or by assigning a custom Comparator to the Set.
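The extra machinery can be seen in a short sketch. The id field is a hypothetical addition that exists only so that instances can be ordered; the original MyClass has no such field.

```java
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

class TreeSetRequirementDemo {
    // Stand-in for MyClass with a hypothetical id used for ordering.
    static class MyClass {
        final int id;

        MyClass(int id) {
            this.id = id;
        }
    }

    // Returns true if a TreeSet without a Comparator rejects MyClass.
    static boolean rejectsUnorderedElements() {
        Set<MyClass> plain = new TreeSet<>();
        try {
            plain.add(new MyClass(1));
            return false;
        } catch (ClassCastException e) {
            return true; // MyClass does not implement Comparable
        }
    }

    // Supplying a Comparator is one way to make the elements orderable.
    static Set<MyClass> orderedById() {
        return new TreeSet<>(Comparator.comparingInt((MyClass m) -> m.id));
    }

    public static void main(String[] args) {
        System.out.println(rejectsUnorderedElements()); // true
        Set<MyClass> set = orderedById();
        set.add(new MyClass(2));
        set.add(new MyClass(1));
        System.out.println(set.size()); // 2
    }
}
```

Note that a TreeSet treats two elements as duplicates when the comparator returns 0, so the ordering key has to be unique per instance for this to behave like the identity-based options.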
TL;DR
The language specification states that you have at least the following two options available to you with no additional effort on your part:
Make myClsList an ArrayList and use the appropriate add()/remove() methods as you see fit.
Make myClsList a HashSet and use the appropriate add()/remove() methods.
I recommend the second option. In fact, instead of containment, you may consider extending HashSet so you don't have to bother implementing your own add/remove methods.
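The second option might look like the sketch below. It uses containment rather than extending HashSet, and the field name myClsSet and the size() accessor are assumptions made for the example.

```java
import java.util.HashSet;
import java.util.Set;

class SetRemovalDemo {
    // Stand-in for the question's MyClass: no equals/hashCode override.
    static class MyClass { }

    static class OtherClass {
        private final Set<MyClass> myClsSet = new HashSet<>();

        boolean add(MyClass obj) {
            return myClsSet.add(obj);
        }

        // Uses the inherited hashCode() to find the bucket and the inherited
        // equals() (reference identity) to confirm the match; no full scan.
        boolean remove(MyClass obj) {
            return myClsSet.remove(obj);
        }

        int size() {
            return myClsSet.size();
        }
    }

    public static void main(String[] args) {
        MyClass a = new MyClass();
        MyClass b = new MyClass();
        OtherClass c = new OtherClass();
        c.add(a);
        c.add(b);
        System.out.println(c.remove(a)); // true
        System.out.println(c.remove(a)); // false: already gone
        System.out.println(c.size());    // 1
    }
}
```

Unlike the ArrayList version, a HashSet does not keep insertion order and stores each instance at most once.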
Final Note
All this works as long as MyClass overrides neither Object.equals nor Object.hashCode. The moment you do that, you put the burden of satisfying contractual requirements entirely on yourself.

Is equals() supposed to be recursive/deep?

No one really talks about this aspect of equals() and hashCode(), but it has a potentially massive impact on equals() and hashCode() behavior. Massive when dealing with somewhat more complex objects that reference other objects.
Joshua Bloch in his Effective Java does not even mention it in his "overriding equals() method" chapter. All his examples are trivialities like Point and ColorPoint, all with just primitive or nearly primitive types.
Can recursivity be avoided? Sometimes hardly. Assume:
class Person {
    String name;
    Address address;
}
Both fields have to go into the business key (as the Hibernate guys call it); they are both value components (as Joshua Bloch has it). And Address is a complex object itself. Recursion.
Be aware that IDEs like Eclipse and IntelliJ generate recursive equals() and hashCode().
By default they use all fields. If you apply generator tools en masse, you are asking for trouble.
One problem is that you can get a StackOverflowError. Here is my simple test proving it.
All that is needed is a class having another object as a "value component", forming an object graph, and the recommended equals() implementation. Yes, you need a cycle in that graph, but that is nothing unrealistic (imagine molecules, paths on a map, interlinked transactions...).
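The overflow is easy to reproduce. The sketch below is not the test linked above; Account and its partner field are hypothetical names, and the equals/hashCode pair follows the all-fields style that IDE generators typically produce.

```java
import java.util.Objects;

class CyclicHashDemo {
    static class Account {
        final String name;
        Account partner; // forms a cycle when two accounts reference each other

        Account(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Account)) return false;
            Account other = (Account) o;
            return Objects.equals(name, other.name)
                    && Objects.equals(partner, other.partner);
        }

        @Override
        public int hashCode() {
            // Recurses into partner.hashCode(), which recurses back into ours.
            return Objects.hash(name, partner);
        }
    }

    // Returns true if hashing a two-node cycle exhausts the stack.
    static boolean overflows() {
        Account a = new Account("a");
        Account b = new Account("b");
        a.partner = b;
        b.partner = a;
        try {
            a.hashCode();
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(overflows()); // true
    }
}
```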
Another problem is performance. What is recommended for equals() is in fact a comparison of two object graphs, potentially huge graphs; one can end up comparing thousands of nodes without knowing it. And not all of them are necessarily in memory! Consider that some objects may be lazily loaded. One can end up loading half of the database on one equals() or hashCode() call.
The paradox is that the more rigorously you override equals() and hashCode(), as you are encouraged to do, the more likely you are to get into trouble.
Ideally, the equals() method should test logical equality. In some cases, that may descend more deeply than the physical object, and in others, it may not.
If testing logical equality is not feasible, due to performance or other concerns, then you can leave the default implementation provided by Object, and not rely on equals(). For example, you don't have to use your object graph as a key in a collection.
Bloch does say this:
The easiest way to avoid problems is not to override the equals method, in which case each instance of the class is equal only to itself.
There are at least two logical questions, which would be meaningful for any two references of any type, which it would at various times be useful for equals to test:
Can the type promise that the two references will forevermore identify equivalent objects?
Can the type promise that the two references will identify equivalent objects as long as the code which holds the references neither modifies the objects, nor exposes them to code that will?
If a reference identifies an object that might change at any time without notice, the only references that should be considered equivalent are those which identify the same object. If a reference identifies an object of a deeply-immutable type, and is never used in ways that test its identity (e.g. locking, IdentityHashMap, etc.) then all references to objects holding equal content should be considered equivalent. In both of the above situations, the proper behavior of equals is clear and unambiguous, since in the former case the proper answer for both questions would be obtained by testing reference identity, and in the latter case the proper answer would be obtained by testing deep equality.
Unfortunately, there's a very common scenario where the answers to the two questions diverge: when the only extant references to objects of mutable type are held by code which knows that no references to those objects will ever be held by code that might mutate them nor test them for reference identity. In that scenario, if two such objects presently encapsulate the same state, they will forever more do so, and thus equivalence should be based upon equivalence of constituents rather than upon reference identity. In other words, equals should be based upon how nested objects answer the second question.
Because the meaning of equality depends upon information which is only known by the holder of a reference, and not by the object identified by the reference, it's not really possible for an equals method to know what style of equality is appropriate. Types that know that the things to which they hold references might spontaneously change should test reference equality of those constituent parts, while types that know they won't change should generally test deep equality. Things like collections should allow the owner to specify whether the things stored in the collections could spontaneously change, and test equality on that basis; unfortunately, relatively few of the built-in collections include any such facility (code can select between e.g. HashMap and IdentityHashMap to distinguish what kind of test is appropriate for keys, but most kinds of collections have no equivalent choice). The best one can do is probably have each new collection type offer in its constructor a choice of encapsulation mode: regard the collection itself as something that might be changed without notice (report reference equality on the collection), assume the collection will hold an unchanging set of references to things that might change without notice (test reference equality on the contents), assume that neither the collection nor the constituent objects will change (test equals of each constituent object), or, for collections of arrays or of nested collections that don't support deep-equality testing, perform super-deep equality testing to a specified depth.
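The one built-in choice mentioned above can be shown in a few lines. This sketch compares java.util.HashMap (keys matched by equals) with java.util.IdentityHashMap (keys matched by reference); the countKeys helper is a hypothetical name.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

class EqualityModeDemo {
    // Inserts two keys with equal content but distinct identity
    // and reports how many entries the map ends up with.
    static int countKeys(Map<String, Integer> map) {
        String first = new String("key");
        String second = new String("key"); // equal content, distinct object
        map.put(first, 1);
        map.put(second, 2);
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(countKeys(new HashMap<>()));         // 1: value equality
        System.out.println(countKeys(new IdentityHashMap<>())); // 2: reference equality
    }
}
```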

Why there is no Hashable interface in Java

Object in Java has a hashCode method; however, it is used only in associative containers like HashSet or HashMap. Why was it designed like that? A Hashable interface with a hashCode method looks like a much more elegant solution.
The major argument seems to me that there is a well-defined default hashCode that can be calculated for any Java object, along with an equally well-defined equals. There is simply no good reason to withhold this function from all objects, and of course there are plenty of reasons not to withhold it. So it's a no-brainer in my book.
This question is claimed as a duplicate of another which asks why there's no interface which behaves like Comparator (as distinct from Comparable) but for hashing. .NET includes such an interface, called IEqualityComparer, and it would seem like Java could as well. As it is, if someone wants to have a Java collection which, e.g., maps strings to other objects in case-insensitive fashion (perhaps the most common use of IEqualityComparer), one must wrap the strings in objects whose hashCode and equals methods act on a case-insensitive basis.
I suspect the big issue is that while an "equalityComparer" interface could be convenient, in many cases efficiently testing an equivalence relation would require caching information. For example, while a case-insensitive string-hashing function could make an uppercase-only copy of the passed-in string and call hashCode on that, it would be difficult to avoid having every request for the hashcode of a particular string repeat the conversion to uppercase and the hashing of that uppercase value. By contrast, a "case-insensitive string" object could include fields for an uppercase-only copy of the string, which would then only have to be generated once for the instance.
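A minimal sketch of such a wrapper, assuming the folded form is computed once in the constructor. CaseInsensitiveKey is a hypothetical class, not part of the JDK, and folding with Locale.ROOT is one possible choice among several.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class CaseInsensitiveKeyDemo {
    static final class CaseInsensitiveKey {
        private final String original;
        private final String folded; // uppercase-only copy, generated once

        CaseInsensitiveKey(String value) {
            this.original = value;
            this.folded = value.toUpperCase(Locale.ROOT);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof CaseInsensitiveKey
                    && folded.equals(((CaseInsensitiveKey) o).folded);
        }

        @Override
        public int hashCode() {
            return folded.hashCode(); // String caches its own hash after first use
        }

        @Override
        public String toString() {
            return original;
        }
    }

    public static void main(String[] args) {
        Map<CaseInsensitiveKey, Integer> map = new HashMap<>();
        map.put(new CaseInsensitiveKey("Hello"), 1);
        System.out.println(map.get(new CaseInsensitiveKey("HELLO"))); // 1
        System.out.println(map.get(new CaseInsensitiveKey("World"))); // null
    }
}
```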
An EqualityComparer could achieve reasonable performance if it included something like a WeakHashMap<String,String> to convert raw strings to uppercase-only strings, but such a design would either require different threads to use different EqualityComparer instances despite the lack of externally visible state, or else require performance-robbing locking and synchronization code even in single-threaded scenarios.
Incidentally, a second issue that arises with comparator-style interfaces is that when a collection type uses an externally-supplied comparator (whether it compares for rank or equality), the comparator itself becomes part of the state of the collection which uses it. If two hash tables use different EqualityComparer instances, there may be no way to know that they can safely be considered equivalent, even if the two comparators would behave identically in all circumstances.

Is Java hashCode() method a reliable measure of object equality?

I am currently working on comparing two complex objects of the same type, with multiple fields consisting of data structures of custom object types. Assuming that none of the custom objects has overridden the hashCode() method, if I compare the hash codes of every field in the objects and they turn out to be the same, can I be 100% confident that the content of the compared objects is the same? If not, which method would you recommend to compare two objects, assuming I can't use any external libraries?
Absolutely not. You should only use hashCode() as a first pass - if the hash codes are different, you can assume the objects are unequal. If the hash codes are the same, you should then call equals() to check for full equality.
Think about it this way: there are only 2^32 possible hash codes. How many possible different objects are there of type String, as an example? Far more than that. Therefore at least two non-equal strings must share the same hash code.
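A concrete collision is easy to find: "Aa" and "BB" have the same String hash code, because 65*31 + 97 and 66*31 + 66 both equal 2112. The sameContent helper below is a hypothetical sketch of the "first pass" idea, with equals() having the final say.

```java
class HashFirstPassDemo {
    static boolean sameContent(Object a, Object b) {
        if (a == b) return true;
        if (a == null || b == null) return false;
        // Different hash codes rule out equality (given correct implementations)...
        if (a.hashCode() != b.hashCode()) return false;
        // ...but equal hash codes prove nothing, so equals() decides.
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true: both are 2112
        System.out.println(sameContent("Aa", "BB"));            // false
        System.out.println(sameContent("Aa", new String("Aa"))); // true
    }
}
```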
Eric Lippert writes well about hash codes - admittedly from a .NET viewpoint, but the principles are the same.
No, matching hashCode() values only mean that the objects could be identical; it's never a guarantee.
The only guarantee is that if the hashCode() values are different (and the hashCode()/equals() implementations are correct), then the objects will not be equal.
Additionally if your custom types don't have a hashCode() implementation, then that value is entirely useless for comparing the content of the object, because it will be the identityHashCode().
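This can be checked with a class that overrides nothing. Box is a hypothetical class for the example; the point is that its hash code says nothing about its content.

```java
class IdentityHashDemo {
    // No equals() or hashCode() override: both are inherited from Object.
    static class Box {
        final int value;

        Box(int value) {
            this.value = value;
        }
    }

    public static void main(String[] args) {
        Box first = new Box(42);
        Box second = new Box(42);

        // Same content, but the inherited equals() compares references.
        System.out.println(first.equals(second)); // false
        // The inherited hashCode() is the identity hash code.
        System.out.println(first.hashCode() == System.identityHashCode(first)); // true
    }
}
```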
If you haven't overridden equals() and hashCode(), all of your distinct objects are unequal. By overriding them you provide the logic of the comparison. Remember, if you override hashCode(), you definitely should override equals() too.
EDIT:
There can still be a collision, of course, but if you didn't override equals(), your objects will be compared by reference (an object is equal only to itself).
Object.hashCode() is often described as returning the memory address of the object in some format, so it might seem technically usable for what you want (as no two objects can share the same address).
However, the actual specification of Object.hashCode() makes no such guarantees, and it should not be used for this purpose in any sensible or well-written piece of code.
I would suggest using the hashCode and equals builders available in the Apache commons library, or if you really can't use free external libs, just have a look at them for inspiration. The best method to use depends entirely on what "equal" actually means in the context of your application domain though.
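Without external libraries, java.util.Objects covers most of what those builders do. The sketch below assumes that "equal" means "same customer and same items in the same order"; Order and its fields are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class ManualEqualsDemo {
    static final class Order {
        private final String customer;
        private final List<String> items;

        Order(String customer, List<String> items) {
            this.customer = customer;
            this.items = items;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Order)) return false;
            Order other = (Order) o;
            // Objects.equals is null-safe; List.equals compares element by element.
            return Objects.equals(customer, other.customer)
                    && Objects.equals(items, other.items);
        }

        @Override
        public int hashCode() {
            // Must use the same fields as equals().
            return Objects.hash(customer, items);
        }
    }

    public static void main(String[] args) {
        Order a = new Order("ann", Arrays.asList("pen", "ink"));
        Order b = new Order("ann", Arrays.asList("pen", "ink"));
        Order c = new Order("ann", Arrays.asList("pen"));
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
        System.out.println(a.equals(c));                  // false
    }
}
```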

hash of java hashtable

Is the hashCode of a Java Hashtable element always unique?
If not, how can I guarantee that one search will give me the right element?
Not necessarily. Two distinct (and not-equal) objects can have the same hashcode.
First things first.
You should consider using HashMap instead of Hashtable, as the latter is considered obsolete (it enforces implicit synchronization, which is not required most of the time; if you need a synchronized HashMap, it is easily doable).
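One way to get a synchronized map without Hashtable is the standard wrapper Collections.synchronizedMap; java.util.concurrent.ConcurrentHashMap is another common choice. The helper name newSynchronizedMap is an assumption made for this sketch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SynchronizedMapDemo {
    // Wraps a plain HashMap so that each individual call is synchronized.
    static Map<String, Integer> newSynchronizedMap() {
        return Collections.synchronizedMap(new HashMap<>());
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = newSynchronizedMap();
        counts.put("a", 1);
        counts.put("b", 2);
        System.out.println(counts.get("a")); // 1

        // Iteration is not covered by the wrapper and needs manual locking.
        synchronized (counts) {
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                System.out.println(e.getKey() + "=" + e.getValue());
            }
        }
    }
}
```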
Now, regarding your question.
A hash code is not mathematically guaranteed to be unique;
however, when you're using HashMap (or Hashtable), it does not matter.
If two keys generate the same hash code, equals is automatically invoked on the keys to guarantee that the correct object is retrieved.
If you're using a String as your key, you're worry-free.
But if you're using your own object as the key, you should override the equals and the hashCode methods.
The equals method is mandatory for the proper operation of HashMap, whereas the hashCode method should be coded so that keys spread across the buckets of the hash table (otherwise your HashMap will degrade into little more than a long list).
If you're using Eclipse, there's an easy way to generate hashCode and equals; it basically does all the work for you.
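To see that collisions do not break lookups, consider a key whose hashCode is deliberately constant. BadKey is a hypothetical class written for this sketch; every instance lands in the same bucket, and equals alone tells the keys apart.

```java
import java.util.HashMap;
import java.util.Map;

class CollidingKeysDemo {
    static final class BadKey {
        private final String id;

        BadKey(String id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && id.equals(((BadKey) o).id);
        }

        @Override
        public int hashCode() {
            return 1; // legal but poor: every key collides
        }
    }

    public static void main(String[] args) {
        Map<BadKey, String> map = new HashMap<>();
        map.put(new BadKey("a"), "first");
        map.put(new BadKey("b"), "second");

        System.out.println(map.size());               // 2
        System.out.println(map.get(new BadKey("b"))); // second
    }
}
```

The map stays correct; what suffers is speed, since every lookup has to compare against the other keys in the shared bucket.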
From the Java documentation:
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an
execution of a Java application, the hashCode method must consistently
return the same integer, provided no information used in equals comparisons
on the object is modified. This integer need not remain consistent
from one execution of an application to another execution of the same
application.
If two objects are equal according to the equals(Object)
method, then calling the hashCode method on each of the two objects must
produce the same integer result.
It is not required that if two objects are unequal according to the
equals(java.lang.Object) method, then calling the hashCode method on each of
the two objects must produce distinct integer results. However, the
programmer should be aware that producing distinct integer results for
unequal objects may improve the performance of hashtables.
As much as is reasonably practical, the hashCode method defined by class
Object does return distinct integers for distinct objects. (This is
typically implemented by converting the internal address of the object
into an integer, but this implementation technique is not required by the
Java(TM) programming language.)
So yes, you can typically expect the default hashCode for an Object to be unique. However, if the method has been overridden by the class you are storing in the Hashtable, all bets are off.
Ideally, yes. In reality, collisions do occasionally happen.
The hashCode of a java Hashtable element is always unique?
They should. At least within the same class.
If not, how can I guarantee that one search will give me the right element?
By writing a good hashCode implementation for your class yourself: override equals() and hashCode().
