So imagine I have two instances of a class:
public class MyClass {
    public void sayHello() {
        System.out.println("Hello");
    }
}

MyClass a = new MyClass();
MyClass b = new MyClass();
Now I add those to another object, such as:
import java.util.ArrayList;

public class OtherClass {
    private ArrayList<MyClass> myClsList = new ArrayList<>();

    public void add(MyClass obj) {
        myClsList.add(obj);
    }

    public void remove(MyClass obj) {
        // ????
    }
}

OtherClass c = new OtherClass();
c.add(a);
c.add(b);
Now I want to remove one specific instance, e.g.
c.remove(a);
Could I just iterate over them and test for equality? This should theoretically work, since the two instances have distinct "internal pointers".
I guess a HashMap-based approach would be more efficient, but what could I use as a key there (suppose I can't add unique instance IDs or anything like that)?
EDIT: There is some confusion as to what exactly I'd like to know.
The key here is that I'd like to know if there is any way of removing that specific instance from c's ArrayList or whatever Aggregator Object I might use, just by providing the respective object reference.
I imagine this could be done by just keeping the ArrayList and testing for equality (although I'm not 100% sure), but it would be cleaner if it were possible without iterating through the whole list.
I'd just like to know if anything of the sort is possible in Java. (I know how to work around it using additional information, but the point is to use just the respective object reference for filtering/retrieving purposes.)
You can use a.toString(); according to the Javadoc,
The toString method for class Object returns a string consisting of the name of the class of which the object is an instance, the at-sign character `@', and the unsigned hexadecimal representation of the hash code of the object.
This should give you a practically unique identifier for your class instance, so you can use it as a hash key without storing or creating any extra identifiers.
NB: Be careful with this practice; don't rely on the value returned by Object.toString() as being related to the actual object address, see the detailed explanation here.
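A minimal sketch of the approach described here, reusing MyClass from the question (mindful of the caveat above that the default toString()/hashCode() values are not formally guaranteed to be unique):

import java.util.HashMap;
import java.util.Map;

public class ToStringKeyDemo {
    public static void main(String[] args) {
        MyClass a = new MyClass();
        MyClass b = new MyClass();

        // The default toString() looks like "MyClass@1b6d3586"
        Map<String, MyClass> registry = new HashMap<>();
        registry.put(a.toString(), a);
        registry.put(b.toString(), b);

        registry.remove(a.toString());        // removes exactly the entry for a
        System.out.println(registry.size());  // 1
    }
}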
While your question is one that many beginners have (including myself), I believe that your concern is not justified in this case. The features you are asking for are already built into the Java language at the specification level.
First of all, let's look at Object.equals(). On the one hand, the Language Specification states that
The method equals defines a notion of object equality, which is based on value, not reference, comparison.
However, the documentation for Object.equals() clearly states that
The equals method for class Object implements the most discriminating possible equivalence relation on objects; that is, for any non-null reference values x and y, this method returns true if and only if x and y refer to the same object (x == y has the value true).
This means that you can safely redirect OtherClass.remove to ArrayList.remove(). Whatever Object.equals is comparing works exactly like a unique ID. In fact, in many (but not all) implementations, it compares the memory addresses of the objects, which are a form of unique ID.
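A quick check of that behavior, as a minimal sketch (plain Object instances stand in for MyClass):

import java.util.ArrayList;
import java.util.List;

public class RemoveByReferenceDemo {
    public static void main(String[] args) {
        Object a = new Object();
        Object b = new Object();

        List<Object> list = new ArrayList<>();
        list.add(a);
        list.add(b);

        // ArrayList.remove(Object) uses equals(), which for plain
        // Objects is reference identity, so exactly a is removed
        list.remove(a);

        System.out.println(list.size());      // 1
        System.out.println(list.get(0) == b); // true
    }
}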
Quite understandably, you do not wish to use linear iteration every time. As it happens, the machinery of Object is perfectly suited for use with something like a HashSet, which, by the way, is the solution I recommend you use in this case.
If you are not dealing with some huge data set, we do not need to discuss the optimization of Object.hashCode(). You just need to know that it implements whatever contract is necessary to work correctly with Object.equals, so that HashSet.remove works correctly.
The spec itself only states that
The method hashCode is very useful, together with the method equals, in hashtables such as java.util.HashMap.
This does not really say much, so we turn to the API reference. The two relevant points are:
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
Simply put, the hashCode of equal objects must be the same, but an equal hashCode does not necessarily mean equal objects. Object implements this contract, so you can use it with a HashSet, which is backed by a HashMap.
The one piece of information that is missing to make this a formal argument in favor of not doing any additional work, is why I keep citing the API reference as if it was the language specification. As it happens:
As noted above, this specification often refers to classes of the Java SE platform API. In particular, some classes have a special relationship with the Java programming language. Examples include classes such as Object, Class, ClassLoader, String, Thread, and the classes and interfaces in package java.lang.reflect, among others. This specification constrains the behavior of such classes and interfaces, but does not provide a complete specification for them. The reader is referred to the Java SE platform API documentation.
[emphasis mine], but you get the idea. The Java SE API reference is the language spec as far as the behavior of the methods of Object is concerned.
As an aside, you will probably want to stay away from something like TreeSet, because that will require you to add a bunch of machinery to your implementation. As a minimum, MyClass instances will have to be orderable, either by implementing Comparable, or by assigning a custom Comparator to the Set.
TL;DR
The language specification states that you have at least the following two options available to you with no additional effort on your part:
Keep myClsList an ArrayList and use the appropriate add()/remove() methods as you see fit.
Make myClsList a HashSet and use the appropriate add()/remove() methods.
I recommend the second option. In fact, instead of containment, you may consider extending HashSet so you don't have to bother implementing your own add/remove methods.
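For concreteness, a minimal sketch of the second option using containment, keeping the names from the question:

import java.util.HashSet;
import java.util.Set;

public class OtherClass {
    // HashSet uses the default Object.equals/hashCode, i.e.
    // reference identity, which is exactly what we want here
    private final Set<MyClass> myClsList = new HashSet<>();

    public void add(MyClass obj) {
        myClsList.add(obj);
    }

    public void remove(MyClass obj) {
        myClsList.remove(obj); // expected O(1), no iteration
    }
}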
Final Note
All this works as long as MyClass overrides neither Object.equals nor Object.hashCode. The moment you do that, you put the burden of satisfying contractual requirements entirely on yourself.
Related
Say we have a class Foo like
class Foo {
    private int attr1;
    private String attr2;
    // getters and setters.
    // hashCode and equals not overridden.
}
So while adding references of Foo to a Set or a Map (as a key), duplicates will be identified based on their address locations. Now if I override hashCode and equals based on attr2, duplicates will be identified based on the value of attr2. That's how duplicate filtering works in Java: look for a user-defined mechanism; if present, use that, otherwise use the default mechanism.
If we try to add references of Foo to a sorted collection like TreeSet or TreeMap, it will throw a ClassCastException saying that there is no comparison mechanism. So we can either make Foo Comparable or supply a Comparator, and thereby define a comparison mechanism.
So my question is: while finding duplicates, if the user hasn't defined any mechanism, Java looks for the default mechanism; but while sorting or comparing, it insists that the user define a mechanism. Why won't it go for a default mechanism, for example comparing references based on their hash codes? Is it because some OOP concept, or some concept in Java, would be violated if it went for a default comparison?
TL;DR Give me a sensible way of defining a total ordering between opaque objects.
It is sensible to say, lacking any other information, that objects are the same only if they are physically the same. This is what the default equals and hashCode do.
It is not sensible, and indeed makes no sense, to say that one object is "bigger" than another because a digest of its memory location is bigger.
Even more damningly for your proposed mechanism, in modern Java the old adage that a hashCode is a memory location is actually incorrect. An Object is assigned a random number for this purpose, and that is used as the hashCode.
So you are really proposing:
"Lacking any information about the objects to be ordered, we will order them completely at random"
I cannot think of any situation where this default behaviour is:
expected
useful
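For contrast, here is what the language forces you to do instead; the Comparator below implements exactly the "random" ordering the question proposes, and it is legal but arbitrary:

import java.util.Comparator;
import java.util.TreeSet;

public class DefaultOrderingDemo {
    public static void main(String[] args) {
        // new TreeSet<>().add(new Object()) would throw
        // ClassCastException: Object is not Comparable.

        // The explicit, and arbitrary, alternative:
        TreeSet<Object> set = new TreeSet<>(
                Comparator.comparingInt(System::identityHashCode));
        set.add(new Object());
        set.add(new Object());
        System.out.println(set.size()); // 2 (barring a hash collision)
    }
}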
Related
I was wondering why the hash code is implemented in the Object class when its purpose is served only when using collections like HashMap. So shouldn't the hash code be implemented in the interfaces implementing Map?
It's not a good idea to say that the hashcode implementation is used in collections only.
In the Java API documentation, the general contract of hashCode is given as:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
So hashCode has to do only with Object. Collections just benefit from this feature for their own use cases, e.g. checking whether objects have the same hash code, storing objects based on their hash codes, etc.
Note: collections don't use the hashCode value to sort objects.
The hashCode() method is used mainly by hash-based collections like HashMap and HashSet. The hash code returned by this method is used to calculate the hash index, or bucket index.
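A simplified sketch of that calculation (java.util.HashMap does something very similar, spreading the high bits before masking with the table size):

public class BucketIndexDemo {
    // Fold the high bits into the low bits, as HashMap's hash() does,
    // so that small tables still see the whole hash code
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        Object key = new Object();
        int capacity = 16; // HashMap table sizes are powers of two
        int index = spread(key.hashCode()) & (capacity - 1);
        System.out.println("bucket index: " + index); // 0..15
    }
}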
The hashCode function is a necessity for all classes, POJOs, and beans where we need to compare or check equality.
Suppose we need to compare two objects irrespective of the collections API; there should be a way to achieve it.
If hashCode were not part of the Object class, then calculating a hash every time would be difficult, and an extra burden.
It's a pragmatic design decision, but the question is essentially correct. A purist analysis would say that it's an example of Interface Bloat, or a Fat Interface.
Java's java.lang.Object has more methods than are strictly required by (or even meaningful for) all objects. And it's not just hashCode() either.
It's arguable that the only method on Object that makes sense for all objects is getClass().
Not all applications are concurrent, let alone needing their own monitors. So a purist object model would move notify(), notifyAll() and the three versions of wait() into an interface called (say) Monitored, and then only permit synchronized to be used with objects implementing that.
It's very common for it to be either invalid or unnecessary to clone() objects, though that method is fortunately protected. Again, it would be best off in an interface, say interface Cloneable<T>.
Object identity comparison (are these references to the same object?) is provided by the intrinsic operator ==, so equals(Object) should (still being purist) be in a ValueComparable<T> interface for objects that have that semantic (many don't).
Being very pure, even then you'd push hashCode() into another interface, (say) HashCodable.
finalize() could also be put in an interface, HasFinalize. Indeed, that could make the garbage collector's life a bit easier, especially given its use is so rare and specialized.
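A sketch of what that purist decomposition might look like; these interfaces are hypothetical, nothing like them exists in the JDK, and the method names are made up:

// Hypothetical interfaces splitting up Object's responsibilities
interface Monitored {
    void signalOne();        // in place of Object.notify()
    void signalAll();        // in place of Object.notifyAll()
    void await() throws InterruptedException; // in place of wait()
}

interface ValueComparable<T> {
    boolean valueEquals(T other); // in place of Object.equals()
}

interface HashCodable {
    int hash(); // in place of Object.hashCode()
}

// A class opts in only to the capabilities that make sense for it:
class Money implements ValueComparable<Money>, HashCodable {
    private final long cents;
    Money(long cents) { this.cents = cents; }

    @Override public boolean valueEquals(Money other) {
        return other != null && cents == other.cents;
    }

    @Override public int hash() {
        return Long.hashCode(cents);
    }
}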
However, there was a clear design decision in Java to simplify things, and the designers decided to put a number of methods in Object that are apparently 'frequently used' or useful, rather than 'strictly part of the minimal nature of being an object', which is (in the Java model at least) 'being an instance of some class of objects (having common methods, interfaces and semantics)'.
IMHO hashCode() is probably the least out of place!
It is totally unnecessary to provide a monitor on every object, and it leaves implementers with the headache of supporting these methods on every object, knowing they will be called on only a minuscule fraction of them. Don't underestimate the overhead that might cause: it may be necessary to allocate things like mutexes, a whole cache line (typically tens of bytes), for every one of millions of objects, with no sane way they would ever get used.
I'm not suggesting for a second that 'Java is broken' or 'badly designed'. I am not here to knock Java; it is a great language. As with the design of generics, it has always chosen to make things simple, and has been willing to make some compromises on performance for simplicity. As a result it has produced a very powerful and accessible language in which, thanks to great implementations, those performance overheads only occasionally grate.
But to repeat the point I think we should recognise those methods are not in the intrinsic nature of all objects.
I recently discovered that class ProcessBuilder in JDK6 does not override equals(). Is there a reason? Since the class is mutable, I can understand why it does not override hashCode().
I was surprised to see this code not work:
ProcessBuilder x = new ProcessBuilder("abc", "def");
ProcessBuilder y = new ProcessBuilder("abc", "def");
if (x.equals(y)) { // they are never equal
    // something important here
}
I looked into the JDK6 source code for class ProcessBuilder, and I do not see an override for equals().
I have a feeling there is a deeper reason, beyond this one class. Perhaps this is intentional?
It is considered best practice to make mutable objects not equal unless they are the same object. This is because the object could change later. Consider the following:
Set<ProcessBuilder> pbSet = new HashSet<>();
pbSet.add(x);
pbSet.add(y);
// if x and y were equal, pbSet would have one element
y.command("ghi"); // mutate y after insertion
// should pbSet now have one element or two?
Worse than this is the opposite case, where two objects could be different but later made the same. This would mean the Set could hold a duplicate object.
What is interesting is that collections are mutable but still have equals and hashCode. I think the reason is that there are no immutable collections. E.g. String overrides equals(); StringBuilder does not.
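A quick demonstration of that last point:

public class EqualsMutabilityDemo {
    public static void main(String[] args) {
        // String is immutable and overrides equals(): compares content
        System.out.println(new String("abc").equals(new String("abc"))); // true

        // StringBuilder is mutable and inherits Object.equals(): compares identity
        System.out.println(new StringBuilder("abc").equals(new StringBuilder("abc"))); // false
    }
}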
To complement @PeterLawrey's answer: for objects which are mutable by nature, implementing equals and hashCode is risky in any event. You have no guarantee that such objects are safely published at all. It therefore makes sense that the authors of such classes just "gave up" on equals and hashCode for them.
HOWEVER: if you are reasonably sure that you can control this equality, there is something for you: Guava's Equivalence. If you can ensure sufficiently controlled access to highly mutable classes, this allows you to define an equals/hashcode strategy for such objects so that you can even use them in, say, a HashSet.
More about this Equivalence: for an "unstable" class X which is mutable by nature, but for which you can guarantee equivalence in a given context, you implement an Equivalence<X>. Then you "wrap" these instances of X into, for instance, a:
Set<Equivalence.Wrapper<X>>
You'll then add to this set using:
set.add(eq.wrap(x));
where eq is your implementation of an Equivalence.
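Putting it together, a minimal sketch (assuming Guava is on the classpath; the Mutable class is made up for illustration):

import com.google.common.base.Equivalence;
import java.util.HashSet;
import java.util.Set;

public class EquivalenceDemo {
    // A mutable class that sensibly keeps the default equals/hashCode
    static class Mutable {
        int value;
        Mutable(int value) { this.value = value; }
    }

    public static void main(String[] args) {
        // Equivalence by current value, for a context where we can
        // guarantee the objects won't be mutated underneath us
        Equivalence<Mutable> eq = new Equivalence<Mutable>() {
            @Override protected boolean doEquivalent(Mutable a, Mutable b) {
                return a.value == b.value;
            }
            @Override protected int doHash(Mutable m) {
                return Integer.hashCode(m.value);
            }
        };

        Set<Equivalence.Wrapper<Mutable>> set = new HashSet<>();
        set.add(eq.wrap(new Mutable(42)));
        set.add(eq.wrap(new Mutable(42))); // equivalent: not added twice

        System.out.println(set.size()); // 1
    }
}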
Nobody really talks about this aspect of equals() and hashCode(), but there is a potentially massive impact on equals() and hashCode() behavior: massive when dealing with somewhat more complex objects referencing other objects.
Joshua Bloch, in his Effective Java, does not even mention it in his "overriding equals() method" chapter. All his examples are trivialities like Point and ColorPoint, all with just primitive or nearly primitive types.
Can recursion be avoided? Sometimes hardly. Assume:
class Person {
    String name;
    Address address;
}
Both fields have to go into the business key (as the Hibernate guys call it); they are both value components (as Joshua Bloch has it). And Address is a complex object itself. Recursion.
Be aware that IDEs like Eclipse and IntelliJ do generate recursive equals() and hashCode(). By default they use all fields. If you apply the generator tools en masse, you are asking for trouble.
One trouble is that you can get a StackOverflowError. Here is my simple test proving it.
All that is needed is a class having another object as a "value component", forming an object graph, plus the recommended equals() implementation. Yes, you need a cycle in that graph, but that is nothing unrealistic (imagine molecules, paths on a map, interlinked transactions...).
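A minimal sketch of such a cycle with IDE-style generated equals() methods (hashCode() omitted for brevity); running this ends in a StackOverflowError:

import java.util.Objects;

public class RecursiveEqualsDemo {
    static class Person {
        String name;
        Address address;
        @Override public boolean equals(Object o) {
            if (!(o instanceof Person)) return false;
            Person p = (Person) o;
            return Objects.equals(name, p.name)
                    && Objects.equals(address, p.address); // recurses into Address
        }
    }

    static class Address {
        String street;
        Person resident; // back-reference closes the cycle
        @Override public boolean equals(Object o) {
            if (!(o instanceof Address)) return false;
            Address a = (Address) o;
            return Objects.equals(street, a.street)
                    && Objects.equals(resident, a.resident); // recurses into Person
        }
    }

    public static void main(String[] args) {
        Person p1 = new Person(); Address a1 = new Address();
        p1.address = a1; a1.resident = p1;

        Person p2 = new Person(); Address a2 = new Address();
        p2.address = a2; a2.resident = p2;

        System.out.println(p1.equals(p2)); // StackOverflowError
    }
}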
Another trouble is performance. What is recommended for equals() is in fact a comparison of two object graphs, potentially huge graphs; one can end up comparing thousands of nodes without knowing it. And not all of them are necessarily in memory! Consider that some objects may be lazily loadable. One can end up loading half of the database on a single equals() or hashCode() call.
The paradox is: the more rigorously you override equals() and hashCode(), as you are encouraged to do, the more likely you are to get into trouble.
Ideally, the equals() method should test logical equality. In some cases, that may descend more deeply than the physical object, and in others, it may not.
If testing logical equality is not feasible, due to performance or other concerns, then you can leave the default implementation provided by Object, and not rely on equals(). For example, you don't have to use your object graph as a key in a collection.
Bloch does say this:
The easiest way to avoid problems is not to override the equals method, in which case each instance of the class is equal only to itself.
There are at least two logical questions that would be meaningful for any two references of any type, and that it would at various times be useful for equals to test:
Can the type promise that the two references will forevermore identify equivalent objects?
Can the type promise that the two references will identify equivalent objects as long as the code which holds the references neither modifies the objects, nor exposes them to code that will?
If a reference identifies an object that might change at any time without notice, the only references that should be considered equivalent are those which identify the same object. If a reference identifies an object of a deeply-immutable type, and is never used in ways that test its identity (e.g. locking, IdentityHashSet, etc.) then all references to objects holding equal content should be considered equivalent. In both of the above situations, the proper behavior of equals is clear and unambiguous, since in the former case the proper answer for both questions would be obtained by testing reference identity, and in the latter case the proper answer would be obtained by testing deep equality.
Unfortunately, there's a very common scenario where the answers to the two questions diverge: when the only extant references to objects of mutable type are held by code which knows that no references to those objects will ever be held by code that might mutate them nor test them for reference identity. In that scenario, if two such objects presently encapsulate the same state, they will forever more do so, and thus equivalence should be based upon equivalence of constituents rather than upon reference identity. In other words, equals should be based upon how nested objects answer the second question.
Because the meaning of equality depends upon information which is known only by the holder of a reference, and not by the object identified by the reference, it's not really possible for an equals method to know what style of equality is appropriate. Types that know that the things to which they hold references might spontaneously change should test reference equality of those constituent parts, while types that know they won't change should generally test deep equality.
Things like collections should allow the owner to specify whether the things stored in the collections could spontaneously change, and test equality on that basis; unfortunately, relatively few of the built-in collections include any such facility (code can select between e.g. HashMap and IdentityHashMap to distinguish what kind of test is appropriate for keys, but most kinds of collections have no equivalent choice). The best one could probably do is have each new collection type offer in its constructor a choice of encapsulation mode:
regard the collection itself as something that might be changed without notice (report reference equality on the collection);
assume the collection will hold an unchanging set of references to things that might change without notice (test reference equality on the contents);
assume that neither the collection nor the constituent objects will change (test equals on each constituent object);
or, for collections of arrays or of nested collections that don't support deep-equality testing, perform super-deep equality testing to a specified depth.
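The one built-in choice mentioned above, illustrated as a minimal sketch:

import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

public class IdentityVsEqualsDemo {
    public static void main(String[] args) {
        String k1 = new String("key");
        String k2 = new String("key"); // equal content, distinct object

        Map<String, Integer> byEquals = new HashMap<>();
        byEquals.put(k1, 1);
        byEquals.put(k2, 2);                 // replaces the first mapping
        System.out.println(byEquals.size()); // 1

        Map<String, Integer> byIdentity = new IdentityHashMap<>();
        byIdentity.put(k1, 1);
        byIdentity.put(k2, 2);                 // two distinct keys
        System.out.println(byIdentity.size()); // 2
    }
}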
Object in Java has a hashCode method; however, it is used only in associative containers like HashSet or HashMap. Why was it designed like that? A Hashable interface having a hashCode method looks like a much more elegant solution.
The major argument seems to me to be that there is a well-defined default hashCode that can be calculated for any Java object, along with an equally well-defined equals. There is simply no good reason to withhold this function from all objects, and of course there are plenty of reasons not to withhold it. So it's a no-brainer in my book.
This question is claimed as a duplicate of another which asks why there's no interface which behaves like Comparator (as distinct from Comparable) but for hashing. .NET includes such an interface, called IEqualityComparer, and it would seem like Java could as well. As it is, if someone wants to, e.g., have a Java collection which maps strings to other objects in case-insensitive fashion (perhaps the most common use of IEqualityComparer), one must wrap the strings in objects whose hashCode and equals methods act on a case-insensitive basis.
I suspect the big issue is that while an "equalityComparer" interface could be convenient, in many cases efficiently testing an equivalence relation would require caching information. For example, while a case-insensitive string-hashing function could make an uppercase-only copy of the passed-in string and call hashCode on that, it would be difficult to avoid having every request for the hashcode of a particular string repeat the conversion to uppercase and the hashing of that uppercase value. By contrast, a "case-insensitive string" object could include fields for an uppercase-only copy of the string, which would then only have to be generated once for the instance.
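A sketch of such a wrapper, caching the case-folded copy once per instance (the class name is made up for illustration):

import java.util.Locale;

public final class CaseInsensitiveString {
    private final String original;
    private final String folded; // cached once, so hashing stays cheap

    public CaseInsensitiveString(String s) {
        this.original = s;
        this.folded = s.toUpperCase(Locale.ROOT);
    }

    @Override public boolean equals(Object o) {
        return o instanceof CaseInsensitiveString
                && folded.equals(((CaseInsensitiveString) o).folded);
    }

    @Override public int hashCode() {
        return folded.hashCode();
    }

    @Override public String toString() {
        return original;
    }
}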
An EqualityComparer could achieve reasonable performance if it included something like a WeakHashMap<String, String> to convert raw strings to uppercase-only strings, but such a design would either require different threads to use different EqualityComparer instances despite the lack of externally visible state, or else require performance-robbing locking and synchronization code even in single-threaded scenarios.
Incidentally, a second issue that arises with comparator-style interfaces is that a collection type which uses an externally-supplied comparator (whether it compares for rank or equality) makes the comparator itself part of the state of the class which uses it. If hash tables use different EqualityComparer instances, there may be no way to know that they can safely be considered equivalent, even if the two comparators would behave identically in all circumstances.