Object in Java has a hashCode method; however, it is used only in associative containers like HashSet or HashMap. Why was it designed like that? A Hashable interface with a hashCode method looks like a much more elegant solution.
The major argument, it seems to me, is that there is a well-defined default hashCode that can be calculated for any Java object, along with an equally well-defined equals. There is simply no good reason to withhold this function from all objects, and of course there are plenty of reasons not to withhold it. So it's a no-brainer in my book.
This question is claimed as a duplicate of another which asks why there's no interface which behaves like Comparator (as distinct from Comparable) but for hashing. .NET includes such an interface, called IEqualityComparer, and it would seem like Java could as well. As it is, if someone wants to e.g. have a Java collection which maps strings to other objects in case-insensitive fashion (perhaps the most common use of IEqualityComparer), one must wrap the strings in objects whose hashCode and equals methods act on a case-insensitive basis.
I suspect the big issue is that while an "equalityComparer" interface could be convenient, in many cases efficiently testing an equivalence relation would require caching information. For example, while a case-insensitive string-hashing function could make an uppercase-only copy of the passed-in string and call hashCode on that, it would be difficult to avoid having every request for the hashcode of a particular string repeat the conversion to uppercase and the hashing of that uppercase value. By contrast, a "case-insensitive string" object could include fields for an uppercase-only copy of the string, which would then only have to be generated once for the instance.
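For illustration, here is a minimal sketch of such a wrapper (the class name and structure are my own, not from any standard library, and locale subtleties of case conversion are ignored); the uppercase-derived hash is computed once in the constructor, which addresses exactly the repeated-work problem described above:

import java.util.Objects;

public final class CaseInsensitiveString {
    private final String value;
    private final int hash; // computed once per instance, as discussed above

    public CaseInsensitiveString(String value) {
        this.value = Objects.requireNonNull(value);
        this.hash = value.toUpperCase().hashCode();
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CaseInsensitiveString
                && value.equalsIgnoreCase(((CaseInsensitiveString) o).value);
    }

    @Override
    public int hashCode() {
        return hash; // no repeated uppercasing on each call
    }
}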
An EqualityComparer could achieve reasonable performance if it included something like a WeakHashMap<String,String> to map raw strings to their uppercase-only forms, but such a design would either require different threads to use different EqualityComparer instances despite the lack of externally visible state, or else require performance-robbing locking and synchronization code even in single-threaded scenarios.
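A rough sketch of what such a caching comparer might look like (the class and method names are hypothetical; there is no such standard interface in Java). As the paragraph above notes, the hidden cache makes it unsafe to share across threads without synchronization:

import java.util.Map;
import java.util.WeakHashMap;

class CaseInsensitiveEqualityComparer {
    // Cache of raw string -> uppercase form; WeakHashMap lets entries be
    // collected once the raw string is no longer referenced elsewhere.
    private final Map<String, String> upperCache = new WeakHashMap<>();

    private String upper(String s) {
        return upperCache.computeIfAbsent(s, String::toUpperCase);
    }

    public boolean areEqual(String a, String b) {
        return a.equalsIgnoreCase(b);
    }

    public int hashOf(String s) {
        return upper(s).hashCode(); // cached after the first request
    }
}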
Incidentally, a second issue that arises with comparator-style interfaces is that the comparator itself becomes part of the state of any collection type which uses an externally-supplied comparator (whether it compares for rank or for equality). If hash tables use different EqualityComparer instances, there may be no way to know that they can safely be considered equivalent, even if the two comparators would behave identically in all circumstances.
Related
So imagine I have two instances of a class:
public class MyClass {
    public void sayHello() {
        System.out.println("Hello");
    }
}

MyClass a = new MyClass();
MyClass b = new MyClass();
Now I add those to another object, such as:
public class OtherClass {
    private ArrayList<MyClass> myClsList = new ArrayList<>();

    public void add(MyClass obj) {
        myClsList.add(obj);
    }

    public void remove(MyClass obj) {
        // ????
    }
}

OtherClass c = new OtherClass();
c.add(a);
c.add(b);
Now I want to remove one specific instance, e.g.
c.remove(a);
Could I just iterate over the list and test for equality? I mean, this should theoretically work, since the two instances have distinct "internal pointers", right?
I guess a HashMap-based approach would be more efficient, but what could I use as a key there (suppose I can't add unique instance IDs or anything like that)?
EDIT: There is some confusion as to what exactly I'd like to know.
The key here is that I'd like to know if there is any way of removing that specific instance from c's ArrayList (or whatever aggregator object I might use) just by providing the respective object reference.
I imagine this could be done by just keeping the ArrayList and testing for equality (although I'm not 100% sure), but it would be cleaner if it were possible without iterating through the whole list.
I'd just like to know if anything of the like is possible in Java. (I know how to work around it by using additional information, but the point is to have just the respective object reference for filtering/retrieving purposes.)
You can use a.toString(). According to the Javadoc,
The toString method for class Object returns a string consisting of the name of the class of which the object is an instance, the at-sign character `@', and the unsigned hexadecimal representation of the hash code of the object.
This should give you a practically unique identifier for your class instance, hence you can use it as a hash key without storing or creating any extra identifiers.
NB: Be careful with this practice; don't rely on the value returned by Object.toString() being related to the actual object address, see the detailed explanation here.
While your question is one that many beginners have (including myself), I believe that your concern is not justified in this case. The features you are asking for are already built into the Java language at the specification level.
First of all, let's look at Object.equals(). On the one hand, the Language Specification states that
The method equals defines a notion of object equality, which is based on value, not reference, comparison.
However, the documentation for Object.equals() clearly states that
The equals method for class Object implements the most discriminating possible equivalence relation on objects; that is, for any non-null reference values x and y, this method returns true if and only if x and y refer to the same object (x == y has the value true).
This means that you can safely redirect OtherClass.remove to ArrayList.remove. Whatever Object.equals is comparing works exactly like a unique ID. In fact, in many (but not all) implementations, it compares the memory addresses of the objects, which are a form of unique ID.
Quite understandably, you do not wish to use linear iteration every time. As it happens, the machinery of Object is perfectly suited for use with something like a HashSet, which, by the way, is the solution I recommend you use in this case.
If you are not dealing with some huge data set, we do not need to discuss the optimization of Object.hashCode(). You just need to know that it implements whatever contract is necessary to work correctly with Object.equals, so that HashSet.remove works correctly.
The spec itself only states that
The method hashCode is very useful, together with the method equals, in hashtables such as java.util.HashMap.
This does not really say much, so we turn to the API reference. The two relevant points are:
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
Simply put, the hashCode of equal objects must be the same, but an equal hashCode does not necessarily mean equal objects. Object implements this contract, so you can use it with a HashSet, which is backed by a HashMap.
The one piece of information that is missing to make this a formal argument in favor of not doing any additional work is why I keep citing the API reference as if it were the language specification. As it happens:
As noted above, this specification often refers to classes of the Java SE platform API. In particular, some classes have a special relationship with the Java programming language. Examples include classes such as Object, Class, ClassLoader, String, Thread, and the classes and interfaces in package java.lang.reflect, among others. This specification constrains the behavior of such classes and interfaces, but does not provide a complete specification for them. The reader is referred to the Java SE platform API documentation.
[emphasis mine], but you get the idea. The Java SE API reference is the language spec as far as the behavior of the methods of Object is concerned.
As an aside, you will probably want to stay away from something like TreeSet, because that will require you to add a bunch of machinery to your implementation. At a minimum, MyClass instances will have to be orderable, either by implementing Comparable or by supplying a custom Comparator to the Set.
TL;DR
The language specification states that you have at least the following two options available to you with no additional effort on your part:
Make myClsList an ArrayList and use the appropriate add()/remove() methods as you see fit.
Make myClsList a HashSet and use the appropriate add()/remove() methods.
I recommend the second option. In fact, instead of containment, you may consider extending HashSet so you don't have to bother implementing your own add/remove methods.
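As a sketch of the containment variant of that second option (keeping the field name from the question, though a Set would normally get a different name):

import java.util.HashSet;
import java.util.Set;

public class OtherClass {
    // With no equals()/hashCode() overrides in MyClass, the set compares
    // and hashes by identity, so it stores and removes exactly the
    // instances you pass in, in amortized constant time.
    private final Set<MyClass> myClsList = new HashSet<>();

    public void add(MyClass obj) {
        myClsList.add(obj);
    }

    public void remove(MyClass obj) {
        myClsList.remove(obj); // removes that specific instance
    }
}

With this, c.remove(a) removes exactly the instance a while leaving b in place, with no iteration in your own code.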
Final Note
All this works as long as MyClass overrides neither Object.equals nor Object.hashCode. The moment you do that, you put the burden of satisfying contractual requirements entirely on yourself.
I was wondering why hashCode is implemented in the Object class when its purpose is served only when using collections like HashMap. So shouldn't hashCode be declared in the interfaces implementing Map?
It's not a good idea to say that the hashCode implementation is used in collections only.
In the Java API documentation, the general contract of hashCode is given as:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.

If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
So hashCode has to do only with Object. Collections just benefit from this feature for their own use cases, e.g., checking for objects having the same hash code, storing objects based on hash code, etc.
Note: collections do not use the hashCode value to sort objects.
The hashCode() method is used mainly in hash-based collections like HashMap and HashSet. The hash code returned by this method is used to calculate the hash index, or bucket index.
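As a simplified illustration (not the exact library code), a hash-based container whose table capacity is a power of two might derive the bucket index like this; the bit-spreading step is similar in spirit to what HashMap does internally:

static int bucketIndex(Object key, int capacity) {
    int h = key.hashCode();
    h ^= (h >>> 16);           // mix high bits into the low bits
    return h & (capacity - 1); // same as h % capacity for power-of-two capacity
}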
A hashCode function is a necessity for any class, POJO, or bean where we need comparison or an equality check.
Suppose we need to compare two objects irrespective of the Collection API; there should be a way to achieve that.
If hashCode were not part of the Object class, then calculating a hash would be difficult every time and an extra burden.
It's a pragmatic design decision, but the question is essentially correct. A purist analysis would say that it's an example of Interface Bloat, or a Fat Interface.
Java's java.lang.Object has more methods than are strictly required by (or even meaningful for) all objects. It's not just hashCode() either.
It's arguable that the only method on Object that makes sense for all objects is getClass().
Not all applications are concurrent, let alone need their own monitors. So a purist object model would move notify(), notifyAll(), and the three versions of wait() to an interface called (say) Monitored, and then only permit synchronized to be used with objects implementing it.
It's very common for it to be either invalid or unnecessary to clone() objects, though that method is fortunately protected. Again, it would be best off in an interface, say Cloneable<T>.
Object identity comparison (are these references to the same object?) is already provided by the intrinsic operator ==, so equals(Object) should (still being a purist) be in a ValueComparable<T> interface for objects that have that semantic (many don't).
Being very pure, even then you'd push hashCode() into another interface, say HashCodable.
finalize() could also be put in an interface, HasFinalize. Indeed, that could make the garbage collector's life a bit easier, especially given that its use is so rare and specialized.
However, there was a clear design decision in Java to simplify things, and the designers decided to put into Object a number of methods that are apparently 'frequently used' or useful, rather than only those 'strictly part of the minimal nature of being an object', which is (in the Java model at least) 'being an instance of some class of objects (having common methods, interfaces and semantics)'.
IMHO hashCode() is probably the least out of place!
It is totally unnecessary to provide a monitor on every object, and it leaves implementers with the headache of supporting those methods on every object while knowing they will be called on a minuscule fraction of them. Don't underestimate the overhead that can cause: it may be necessary to allocate things like mutexes (a whole cache line, typically tens of bytes) to every one of millions of objects, with no sane way they would ever get used.
I'm not suggesting for a second that 'Java is broken' or 'badly designed'. I am not here to knock Java; it is a great language. As with the design of generics, it has always chosen to keep things simple, and it has been willing to trade some performance for that simplicity. As a result it is a very powerful and accessible language in which, thanks to great implementations, those performance overheads only occasionally grate.
But to repeat the point, I think we should recognise that those methods are not part of the intrinsic nature of all objects.
No one really talks about this aspect of equals() and hashCode(), but there is a potentially massive impact on equals() and hashCode() behavior. Massive when dealing with slightly more complex objects that reference other objects.
Joshua Bloch, in Effective Java, does not even mention it in his "overriding equals() method" chapter. All his examples are trivialities like Point and ColorPoint, with just primitive or nearly primitive fields.
Can recursion be avoided? Sometimes hardly. Assume:

class Person {
    String name;
    Address address;
}
Both fields have to go into the business key (as the Hibernate guys call it); they are both value components (as Joshua Bloch has it). And Address is a complex object itself. Recursion.
Be aware that IDEs like Eclipse and IntelliJ generate recursive equals() and hashCode(): by default they use all fields. If you apply generator tools en masse, you are asking for trouble.
One trouble is that you can get a StackOverflowError. Here is my simple test proving it.
All that is needed is a class having another object as a "value component", forming an object graph, and the recommended equals() implementation. Yes, you need a cycle in that graph, but that is nothing unrealistic (imagine molecules, paths on a map, interlinked transactions...). A minimal reproduction along these lines is sketched below.
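The class here is made up, but the equals()/hashCode() bodies are what a field-based IDE generator would typically produce:

import java.util.Objects;

class Node {
    Node neighbour; // value component referencing another Node

    @Override
    public boolean equals(Object o) {
        return o instanceof Node
                && Objects.equals(neighbour, ((Node) o).neighbour);
    }

    @Override
    public int hashCode() {
        return Objects.hash(neighbour); // recurses into neighbour.hashCode()
    }
}

public class CycleDemo {
    public static void main(String[] args) {
        Node a = new Node();
        Node b = new Node();
        a.neighbour = b;
        b.neighbour = a;  // cycle in the object graph
        a.hashCode();     // throws StackOverflowError
    }
}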
Another trouble is performance. What is recommended for equals() is in fact a comparison of two object graphs, potentially huge graphs; one can end up comparing thousands of nodes without knowing it. And not all of them are necessarily in memory! Consider that some objects may be lazily loaded. One can end up loading half of the database with a single equals() or hashCode() call.
The paradox is, the more rigorously you override equals() and hashCode() as you are encouraged to do, the more likely you are to get into trouble.
Ideally, the equals() method should test logical equality. In some cases, that may descend more deeply than the physical object, and in others, it may not.
If testing logical equality is not feasible, due to performance or other concerns, then you can leave the default implementation provided by Object, and not rely on equals(). For example, you don't have to use your object graph as a key in a collection.
Bloch does say this:
The easiest way to avoid problems is not to override the equals method, in which case each instance of the class is equal only to itself.
There are at least two logical questions which would be meaningful for any two references of any type, and which it would at various times be useful for equals to test:
Can the type promise that the two references will forevermore identify equivalent objects?
Can the type promise that the two references will identify equivalent objects as long as the code which holds the references neither modifies the objects, nor exposes them to code that will?
If a reference identifies an object that might change at any time without notice, the only references that should be considered equivalent are those which identify the same object. If a reference identifies an object of a deeply-immutable type, and is never used in ways that test its identity (e.g. locking, IdentityHashSet, etc.) then all references to objects holding equal content should be considered equivalent. In both of the above situations, the proper behavior of equals is clear and unambiguous, since in the former case the proper answer for both questions would be obtained by testing reference identity, and in the latter case the proper answer would be obtained by testing deep equality.
Unfortunately, there's a very common scenario where the answers to the two questions diverge: when the only extant references to objects of mutable type are held by code which knows that no references to those objects will ever be held by code that might mutate them nor test them for reference identity. In that scenario, if two such objects presently encapsulate the same state, they will forever more do so, and thus equivalence should be based upon equivalence of constituents rather than upon reference identity. In other words, equals should be based upon how nested objects answer the second question.
Because the meaning of equality depends upon information which is only known by the holder of a reference, and not by the object identified by the reference, it's not really possible for an equals method to know what style of equality is appropriate. Types that know that the things to which they hold references might spontaneously change should test reference equality of those constituent parts, while types that know they won't change should generally test deep equality.

Things like collections should allow the owner to specify whether the things stored in them could spontaneously change, and test equality on that basis; unfortunately, relatively few of the built-in collections include any such facility (code can select between e.g. HashMap and IdentityHashMap to distinguish what kind of test is appropriate for keys, but most kinds of collections have no equivalent choice).

The best one can do is probably have each new collection type offer in its constructor a choice of encapsulation mode: regard the collection itself as something that might be changed without notice (report reference equality on the collection); assume the collection will hold an unchanging set of references to things that might change without notice (test reference equality on the contents); assume that neither the collection nor the constituent objects will change (test equals of each constituent object); or, for collections of arrays or of nested collections that don't support deep-equality testing, perform super-deep equality testing to a specified depth.
I am currently working on comparing two complex objects of the same type, with multiple fields consisting of data structures of custom object types. Assuming that none of the custom objects has overridden the hashCode() method, if I compare the hash codes of every field in the objects and they turn out to be the same, do I have 100% confidence that the content of the compared objects is the same? If not, which method would you recommend for comparing two objects, assuming I can't use any external libraries?
Absolutely not. You should only use hashCode() as a first pass: if the hash codes are different, you can assume the objects are unequal. If the hash codes are the same, you should then call equals() to check for full equality.
Think about it this way: there are only 2^32 possible hash codes. How many possible different objects are there of type String, as an example? Far more than that. Therefore at least two non-equal strings must share the same hash code.
Eric Lippert writes well about hash codes - admittedly from a .NET viewpoint, but the principles are the same.
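In code, the first-pass idea looks something like this (a sketch; the helper name is mine, and it assumes the objects' equals()/hashCode() implementations honor the contract):

// hashCode() can only prove inequality; equals() must confirm equality.
static boolean sameContent(Object a, Object b) {
    if (a.hashCode() != b.hashCode()) {
        return false;   // cheap rejection: unequal hashes imply unequal objects
    }
    return a.equals(b); // equal hashes still require the authoritative check
}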
No. Matching hashCode() values only mean that the objects could be identical; they are never a guarantee.
The only guarantee is that if the hashCode() values are different (and the hashCode()/equals() implementations are correct), then the objects will not be equal.
Additionally, if your custom types don't have a hashCode() implementation, then the value is entirely useless for comparing the contents of the objects, because it will be the identity hash code (identityHashCode()).
If you haven't overridden the hashCode() method, all of your objects are unequal. By overriding it you provide the logic of the comparison. Remember: if you override hashCode(), you definitely should override equals().
EDIT:
There can still be a collision, of course, but if you didn't override equals(), your objects will be compared by reference (an object is equal to itself).
The usual JVM implementation of Object.hashCode() is to return the memory address of the object in some format, so it would technically work for what you want (as no two live objects can share the same address).
However, the actual specification of Object.hashCode() makes no guarantees, and it should not be used for this purpose in any sensible or well-written piece of code.
I would suggest using the hashCode and equals builders available in the Apache Commons Lang library, or, if you really can't use free external libraries, just have a look at them for inspiration. The best method to use depends entirely on what "equal" actually means in the context of your application domain, though.
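If the Commons Lang builders are available, usage looks roughly like this (the Widget class and its fields are invented for illustration):

import org.apache.commons.lang3.builder.EqualsBuilder;
import org.apache.commons.lang3.builder.HashCodeBuilder;

public class Widget {
    private String name;
    private int size;

    @Override
    public int hashCode() {
        // the two numbers are arbitrary non-zero odd seeds
        return new HashCodeBuilder(17, 37).append(name).append(size).toHashCode();
    }

    @Override
    public boolean equals(Object obj) {
        if (obj == this) return true;
        if (!(obj instanceof Widget)) return false;
        Widget other = (Widget) obj;
        return new EqualsBuilder()
                .append(name, other.name)
                .append(size, other.size)
                .isEquals();
    }
}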
There are some cases where the key objects used in a map do not override hashCode() and equals() from Object, for example, using a socket Connection or java.lang.Class as keys.
Is there any potential defect to use these objects as keys in a HashMap?
Should I use IdentityHashMap in these cases?
If equals() and hashCode() are not overridden on key objects, HashMap and IdentityHashMap should have the same semantics. The default equals() implementation uses reference semantics, and the default hashCode() is the system identity hash code of the object.
This is only harmful in cases where different instances of an object can be considered logically equal. For example, you would not want to use IdentityHashMap if your keys were:
new Integer(1)
and
new Integer(1)
Since these are technically different instances of the Integer class. (You should really be using Integer.valueOf(1), but that's getting off-topic.)
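To make the difference concrete, here is a small sketch; it uses the deprecated Integer constructor deliberately, precisely to force distinct instances:

import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

public class IdentityDemo {
    public static void main(String[] args) {
        Map<Integer, String> byEquals = new HashMap<>();
        Map<Integer, String> byIdentity = new IdentityHashMap<>();

        Integer a = new Integer(1); // distinct instances with equal values
        Integer b = new Integer(1);

        byEquals.put(a, "x");
        byIdentity.put(a, "x");

        System.out.println(byEquals.get(b));   // "x": lookup uses equals()
        System.out.println(byIdentity.get(b)); // null: lookup uses ==
    }
}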
Class objects as keys should be okay, except in very special circumstances (for example, the Hibernate ORM library generates subclasses of your classes at runtime in order to implement a proxy). As a developer I would be skeptical of code which stores Connection objects as keys in a Map (maybe you should be using a connection pool, if you are managing database connections?). Whether or not they will work depends on the implementation (since Connection is just an interface).
Also, it's important to note that HashMap expects the equals() and hashCode() determination to remain constant. In particular, if you implement some custom hashCode() which uses mutable fields on the key object, changing a key field may make the key get 'lost' in the wrong hashtable bucket of the HashMap. In these cases, you may be able to use IdentityHashMap (depending on the object and your particular use case), or you might just need a different equals()/hashCode() implementation.
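A quick sketch of that "lost key" effect (the class and field names are hypothetical):

import java.util.HashMap;
import java.util.Map;

class MutableKey {
    int id; // mutable field used in hashCode()

    MutableKey(int id) { this.id = id; }

    @Override
    public int hashCode() { return id; }

    @Override
    public boolean equals(Object o) {
        return o instanceof MutableKey && ((MutableKey) o).id == id;
    }
}

public class LostKeyDemo {
    public static void main(String[] args) {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey(1);
        map.put(key, "value");

        key.id = 2;                       // key now hashes to a different bucket
        System.out.println(map.get(key)); // usually null: the entry is "lost"
    }
}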
From a mobile code security point of view, there are situations where using IdentityHashMap or similar becomes necessary. Malicious implementations of non-final key classes can override hashCode and equals to misbehave. They can, for instance, claim equality to different instances, acquire a reference to other instances they are compared to, etc. I suggest breaking with standard practice by staying safe and using IdentityHashMap where you want identity semantics. There rarely is a good reason to change the meaning of equality in a subclass where the superclass is already being compared. I guess the most likely scenario is a broken, non-symmetric proxy.
The implementation of IdentityHashMap is quite different from HashMap: it uses linear probing rather than Entry objects as links in a chain. This leads to a slight reduction in the number of objects, although a pretty small difference in total memory use. I don't have any good performance statistics I can cite. There used to be a performance difference between using (non-overridden) Object.hashCode and System.identityHashCode, but that got cleared up a few years ago.
In the situation you describe, the behavior of HashMap and IdentityHashMap is identical.
In contrast, if the keys override equals() and hashCode(), the behavior of the two maps differs.
See java.util.IdentityHashMap's Javadoc:
This class implements the Map interface with a hash table, using reference-equality in place of object-equality when comparing keys (and values). In other words, in an IdentityHashMap, two keys k1 and k2 are considered equal if and only if (k1==k2). (In normal Map implementations (like HashMap) two keys k1 and k2 are considered equal if and only if (k1==null ? k2==null : k1.equals(k2)).)
In summary, my answer is that:
Is there any potential defect to use these objects as keys in a HashMap?
--> No
Should I use IdentityHashMap in these cases? --> No
While there's no theoretical problem, you should avoid IdentityHashMap unless you have an explicit reason to use it. It provides no appreciable performance or other benefit in the general case, and when you inevitably start introducing objects into the map that do override equals() and hashCode(), you'll end up with subtle, hard-to-diagnose bugs.
If you think you need IdentityHashMap for performance reasons, use a profiler to confirm that suspicion before you make the switch. My guess is you'll find plenty of other opportunities for optimization that are both safer and make a bigger difference.
As far as I know, the only problem with a hashmap with bad keys arises with very big hashmaps: your keys could be so bad that you get O(n) retrieval time instead of O(1). If it does break anything else, I would be interested to hear about it though :)