How to test for equivalence in a generic class?


Scheme knows three different equivalence operators: eq?, eqv?, equal?. See here for details. In short: eq? tests references, eqv? tests values and equal? recursively tests the values of lists. I would like to write a Java generic which needs the functionality of Scheme's equal?.
I tried to use Java's equals() method, because I thought it does a value comparison: for a reference comparison the == operator exists, so there is no need for equals to do the same. But this assumption is completely wrong, because equals in Java is completely unreliable. Sometimes it does a value comparison and sometimes it does a reference comparison. And one cannot be sure which class does a reference comparison and which class does a value comparison.
This means that equals cannot be used in a generic, because the generic would not do the same for all types. And it is also not possible to restrict the generic type so that only types which implement the correct value comparison are accepted.
So the question is: how to do a reliable value comparison in a generic? Do I have to write it on my own from scratch?
By the way: I think Java's equals failure does not start with arrays. It starts already with Object. I think it is wrong that equals for two bare objects returns false. It must return true, because if you do a value comparison of something which does not have a value, the values cannot differ and therefore they must be the same. Scheme does it that way and it is perfectly reasonable: (equal? (vector) (vector)) -> #t.

In Scheme, list equivalence is based completely on the structure of the items.
In Java, by comparison, equality depends on the type of object, and may use some or all of the internal structure in its equivalence calculation. What it means for two objects of the same type to be "equal" is up to the object type to determine, so long as the general contract for equals is met (most notably that it forms an equivalence relation with all other objects).
Assuming all types used in a program have a reasonable equals definition, they should have a "reliable" value comparison, at least in the sense of the object oriented paradigm.
Returning to a Java analogue of equal?: it's a bit hard to piece together from the question's phrasing, but from context clues it appears that you are also attempting to operate on lists of items. The equals method on Java's List type already implements behavior directly analogous to Scheme's equal? operation:
Compares the specified object with this list for equality. Returns true if and only if the specified object is also a list, both lists have the same size, and all corresponding pairs of elements in the two lists are equal. (Two elements e1 and e2 are equal if Objects.equals(e1, e2).) In other words, two lists are defined to be equal if they contain the same elements in the same order.
This definition also means that nested list structures are compared recursively, much as with Scheme's equal? operation.
Note that the List behavior is notably different from that of Java's array type (which you mention in your question). Arrays in Java are a fairly low-level type, and do not support much of the typical object-oriented functionality one might expect. Of particular note, for equality, arrays are compared by object reference rather than by a structural comparison of the items in the array. There are ways to do sensible equality comparison on arrays using methods in the Arrays class (e.g. Arrays.equals and Arrays.deepEquals).
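As a rough illustration of the difference described above, the following sketch compares nested lists and arrays side by side. The class and method names are invented for this example; only List.equals, Arrays.equals and Arrays.deepEquals come from the JDK.

```java
import java.util.Arrays;
import java.util.List;

class ListVsArrayEquality {
    // Lists compare element by element, and nested lists are compared the same way.
    static boolean nestedListsEqual() {
        List<List<Integer>> a = Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3));
        List<List<Integer>> b = Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3));
        return a.equals(b);
    }

    // Arrays inherit equals from Object, so this is a reference comparison.
    static boolean arraysEqualViaEquals() {
        int[] a = {1, 2, 3};
        int[] b = {1, 2, 3};
        return a.equals(b);
    }

    // Arrays.equals compares the elements of one-dimensional arrays.
    static boolean arraysEqualViaArraysEquals() {
        return Arrays.equals(new int[] {1, 2, 3}, new int[] {1, 2, 3});
    }

    // Arrays.deepEquals also descends into nested arrays.
    static boolean nestedArraysEqualViaDeepEquals() {
        int[][] a = {{1, 2}, {3}};
        int[][] b = {{1, 2}, {3}};
        return Arrays.deepEquals(a, b);
    }

    public static void main(String[] args) {
        System.out.println(nestedListsEqual());               // true
        System.out.println(arraysEqualViaEquals());           // false
        System.out.println(arraysEqualViaArraysEquals());     // true
        System.out.println(nestedArraysEqualViaDeepEquals()); // true
    }
}
```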
As an aside, to address your postscript about the equality of two bare Objects.
assert !(new Object().equals(new Object()));
From an object-oriented perspective, it is sensible that two bare objects be equal only if they're the same reference. First, as mentioned above, there is not a direct relation between an object's internal structure and its equality, so there's no need for them to be equal. There is virtually no context as to what two different instances of Object represent from an object-modeling perspective, so there's no inherent conceptual way to tell that these two objects are logically the "same" thing.
In summary, assuming all the types in your list have a sensible version of equals() defined per their object's type, Java's List.equals() behaves directly analogously to Scheme's equal? operation.
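For the generic class the question asks about, one possible sketch is a container that delegates to Objects.deepEquals, which falls back to the element's own equals() for ordinary objects and compares arrays by content. Box is a hypothetical class made up for this illustration, not something from the question or the JDK.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical generic holder that compares its content by value.
final class Box<T> {
    private final T value;

    Box(T value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Box)) return false;
        Box<?> other = (Box<?>) o;
        // Uses equals() for non-arrays and element-wise comparison for arrays.
        return Objects.deepEquals(value, other.value);
    }

    @Override
    public int hashCode() {
        // deepHashCode hashes array contents, which keeps it consistent with equals above.
        return Arrays.deepHashCode(new Object[] {value});
    }

    public static void main(String[] args) {
        System.out.println(new Box<>("a").equals(new Box<>("a")));
        System.out.println(new Box<>(new int[] {1, 2}).equals(new Box<>(new int[] {1, 2})));
        System.out.println(new Box<>(new Object()).equals(new Box<>(new Object())));
    }
}
```

The result still depends on T having a sensible equals(): two boxes holding distinct bare Object instances compare as unequal, which matches the reasoning above.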

Related

How can I take into consideration the object itself when calculating a hash for an object in Java?

I was working on some algorithmic problems when I got to this and it seemed interesting to me. If I have two lists (so two different objects), with the same values, the hashcode is the same. After some reading, I understand that this is how it should behave. For example:
List<String> lst1 = new LinkedList<>(Arrays.asList("str1", "str2"));
List<String> lst2 = new LinkedList<>(Arrays.asList("str1", "str2"));
System.out.println(lst1.hashCode() + " " + lst2.hashCode());
...........
Result: 2640541 2640541
My purpose would be to differentiate between lst1 and lst2 in a list for example.
Is there a structure (like a HashSet for example) that takes into consideration the actual object and not only the values inside the object when calculating the hashcode for something?

Yes, you can use Java's java.util.IdentityHashMap, or Guava's identity hash set.
The hashes of the two lists must be equal, because the objects are equal. But the identity map and set above are based on the identity of the list objects, not their hash.
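A minimal sketch of the identity-based approach using only the JDK: Collections.newSetFromMap wraps an IdentityHashMap to give a set that distinguishes members by reference. The class and method names are invented for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

class IdentitySetDemo {
    // Returns {size of a value-based set, size of an identity-based set}.
    static int[] sizes() {
        List<String> lst1 = new LinkedList<>(Arrays.asList("str1", "str2"));
        List<String> lst2 = new LinkedList<>(Arrays.asList("str1", "str2"));

        // HashSet uses equals()/hashCode(), so the two equal lists collapse into one entry.
        Set<List<String>> byValue = new HashSet<>();
        byValue.add(lst1);
        byValue.add(lst2);

        // An identity-based set keeps both, because lst1 != lst2.
        Set<List<String>> byIdentity =
                Collections.newSetFromMap(new IdentityHashMap<List<String>, Boolean>());
        byIdentity.add(lst1);
        byIdentity.add(lst2);

        return new int[] {byValue.size(), byIdentity.size()};
    }

    public static void main(String[] args) {
        int[] s = sizes();
        System.out.println(s[0] + " " + s[1]); // 1 2
    }
}
```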

If I have two lists (so two different objects), with the same values, the hashcode is the same. After some reading, I understand that this is how it should behave.
Yes, this is part of the specification of java.util.List.
Is there a structure (like a HashSet for example) that takes into consideration the actual object and not only the values inside the object when calculating the hashcode for something?
My purpose would be to differentiate between lst1 and lst2 in a list for example
It is unclear what "in a list" means here. For example, Collection.contains() and List.equals() are defined in terms of members' equals() methods, and likewise the behavior of List.remove(Object). Although distinct objects, your two Lists will compare equal to each other, so those methods will not distinguish between them, either directly or as members of another list. You can always compare them for reference equality (==), however, to determine that they are not the same object despite being equals() each other.
As far as a collection that takes members' object identity into account, you could consider java.util.IdentityHashMap. Two such maps having keys and associated values that are pairwise equals() each other but not identical will not compare equals() to each other. Such maps will typically have different hash codes than each other, though that cannot be guaranteed. Note well, however, the warnings throughout the documentation of IdentityHashMap that although it implements the Map API, many of the behavioral details are inconsistent with the requirements of that interface.
Note also that
most of the above is relevant only for collections whose members are of a type that overrides equals() and hashCode(). The implementations of or inherited from Object differentiate between objects on a reference-equality basis, so the ordinary collections classes have no surprises for you there.
identical string literals are not required to represent distinct objects, so the lst1 and lst2 in your example code may in fact contain identical elements, in the reference equality sense.

Not generally in collections, because you generally want two collections with all the same items to be equal (which is why they implement it like this- equals will return true and the hash codes are the same).
You can subclass a list and have it not do that; it would just not be widely useful and would cause a lot of confusion if other programmers read your code. In that case, you'd just want equals to return the result of == and hashCode to return System.identityHashCode(this), which is the same behavior that Object's own equals and hashCode provide.

Why does Java need equals() if there is hashCode()?

If two objects return same hashCode, doesn't it mean that they are equal? Or we need equals to prevent collisions?
And can I implement equals by comparing hashCodes?

If two objects have the same hashCode then they are NOT necessarily equal. Otherwise you will have discovered the perfect hash function. But the opposite is true - if the objects are equal, then they must have the same hashCode.

hashCode and equals convey different information about objects.
Consider an analogy with people, where the hash code is the birthday:
in that scenario, you and many other people share the same birthday (same hash code), yet you are not all the same person.

Why does Java need equals() if there is hashCode()?
Java needs equals() because it is the method through which object equality is tested by examining classes, fields, and other conditions the designer considers to be part of an equality test.
The purpose of hashCode() is to provide a hash value primarily for use by hash tables; though it can also be used for other purposes. The value returned is based on an object's fields and hash codes of its composite and/or aggregate objects. The method does not take into account the class or type of object.
The relationship between equals() and hashCode() is an implication.
Two objects that are equal implies that they have the same hash code.
Two objects having the same hash code does not imply that they are equal.
The latter does not hold for several reasons:
There is a chance that two distinct objects may return the same hash code. Keep in mind that a hash value folds information from a large amount of data into a smaller number.
Two objects from different classes with similar fields will most likely use the same type of hash function, and return equal hash values; yet, they are not the same.
hashCode() can be implementation-specific returning different values on different JVMs or JVM target installations.
Within the same JVM, hashCode() can be used as a cheap precursor to an equality test: compare the hash codes first and run the actual equality test only if they match. This pays off only when the equality test is significantly more expensive than generating a hash code.
And can I implement equals by comparing hashCodes?
No. As mentioned, equal hash codes do not imply equal objects.
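The "cheap precursor" idea can be sketched as follows, assuming an immutable class whose hash code is computed once and cached. Blob is a hypothetical class invented for this illustration; the hash comparison only short-circuits the unequal case, and the content comparison still decides equality.

```java
import java.util.Arrays;

// Hypothetical immutable byte container with a cached hash code.
final class Blob {
    private final byte[] data;
    private final int hash; // safe to cache because data is never modified

    Blob(byte[] data) {
        this.data = data.clone();
        this.hash = Arrays.hashCode(this.data);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Blob)) return false;
        Blob other = (Blob) o;
        // Different hashes prove inequality; equal hashes prove nothing,
        // so the contents are still compared.
        return hash == other.hash && Arrays.equals(data, other.data);
    }

    @Override
    public int hashCode() {
        return hash;
    }

    public static void main(String[] args) {
        Blob x = new Blob(new byte[] {0, 31});
        Blob y = new Blob(new byte[] {1, 0});
        // These two arrays happen to produce the same Arrays.hashCode value.
        System.out.println(x.hashCode() == y.hashCode());
        System.out.println(x.equals(y));
    }
}
```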

The hashCode method as stated in the Oracle Docs is a numeric representation of an object in Java. This hash code has limited possible values (represented by the values which can be stored in an int).
For a more complex class, there is a high possibility that you will find two different objects which have the same hash code value. Also, no one stops you from doing this inside any class.
class Test {
    @Override
    public int hashCode() {
        // Legal, but every instance collides with every other one.
        return 0;
    }
}
So, it is not recommended to implement the equals method by comparing hash codes. You should use them for comparison only if you can guarantee that each object has a unique hash code. In most cases, your only certainty is that if two objects are equal using o1.equals(o2) then o1.hashCode() == o2.hashCode().
In the equals method you can define a more complex logic for comparing two objects of the same class.
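To illustrate why a constant hash code is legal but says nothing about equality, here is a small sketch. Label and ConstantHashDemo are hypothetical names; the point is only that HashSet still relies on equals() to tell colliding entries apart.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class whose hashCode always collides.
final class Label {
    private final String text;

    Label(String text) {
        this.text = text;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Label && text.equals(((Label) o).text);
    }

    @Override
    public int hashCode() {
        return 0; // satisfies the contract, but degrades hash table performance
    }
}

class ConstantHashDemo {
    static int distinctCount() {
        Set<Label> labels = new HashSet<>();
        labels.add(new Label("a"));
        labels.add(new Label("b"));
        labels.add(new Label("a")); // rejected by equals(), not by hashCode()
        return labels.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount()); // 2
    }
}
```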

If two objects return same hashCode, doesn't it mean that they are equal?
No it doesn't mean that.
The javadocs for Object state this:
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently
return the same integer, provided no information used in equals
comparisons on the object is modified. ...
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must
produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on
each of the two objects must produce distinct integer results. ...
Note the last statement. It plainly says "No" to your question.
There is another way to look at this.
The hashCode returns an int.
There are only 2^32 distinct values that an int can take.
If a.hashCode() == b.hashCode() implies a.equals(b), then there can be only 2^32 distinct (i.e. mutually unequal) objects at any given time in a running Java application.
That last point is plainly not true. Indeed, it is demonstrably not true if you have a large enough heap to hold 2^32 instances of java.lang.Object ... in a 64-bit JVM.
And a third way is to look at some well-known examples where two different two-character strings have the same hash code (for example, "Aa" and "BB").
Given that your assumption is incorrect, the reasoning that follows from it is also incorrect.
Java does need an equals method.
You generally cannot implement equals using just hashCode.
You may be able to use hashCode to implement a faster equals method, but only if calling hashCode twice is faster than comparing two objects. It generally isn't.
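A commonly cited pair of colliding two-character strings is "Aa" and "BB". Since String.hashCode is specified by the Javadoc, the following sketch should give the same result on any JVM; the class name is invented for this example.

```java
class StringCollisionDemo {
    static boolean sameHash() {
        // 'A' * 31 + 'a' = 65 * 31 + 97 = 2112, and 'B' * 31 + 'B' = 66 * 31 + 66 = 2112
        return "Aa".hashCode() == "BB".hashCode();
    }

    static boolean sameValue() {
        return "Aa".equals("BB");
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112
        System.out.println(sameValue());     // false
    }
}
```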

hashCodes are equal -> objects might be equal -> further comparison is required
hashCodes are different -> objects are not equal (if hashCode is implemented correctly)
That is how an equals method can be implemented. First you check whether the hash codes are equal. If they are, you need to check the class fields to see whether the two objects really represent the same value. If the hash codes are different, you can be sure that the objects are not equal.

Sometimes (very often?) you don't!
These answers are not untrue. But they don't tell the whole story.
One example would be where you are creating a load of objects of class SomeClass, and each instance that is created is given a unique ID by incrementing a static variable, nInstanceCount, or some such, in the constructor:
iD = nInstanceCount++;
Your hash function could then be
@Override
public int hashCode() {
    return iD;
}
and your equals could then be
@Override
public boolean equals(Object obj) {
    if (!(obj instanceof SomeClass)) {
        return false;
    }
    return hashCode() == obj.hashCode();
}
... under such circumstances your idea that "equals is superfluous" is effectively true: if all classes behaved like this, Java 10 (or Java 23) might say, ah, let's just get rid of silly old equals, what's the point? (NB backwards compatibility would then go out the window).
There are two essential points:
you couldn't then create more than MAXINT instances of SomeClass. Or... you could ... if you set up a system for reassigning the IDs of previously destroyed instances. IDs are typically long rather than int ... but this wouldn't work because hashCode() returns int.
none of these objects could then be "equal" to another one, since equality = identity for this particular class, as you have defined it. Often this is desirable. Often it shuts off whole avenues of possibilities...
The necessary implication of your question is, perhaps, what's the use of these two methods which, in a rather annoying way, have to "cooperate"? Frelling, in his/her answer, alluded to the crucial point: hash codes are needed for sorting into "buckets" with classes like HashMap. It's well worth reading up on this: the amount of advanced maths that has gone into designing efficient "bucket" mechanisms for classes like HashMap is quite frightening. After reading up on it you may come to have (like me) a bit of understanding and reverence about how and why you should bother implementing hashCode() with a bit of thought!

is equals() supposed to be recursive/deep?

No one really talks about this aspect of equals() and hashCode(), but there is a potentially massive impact on equals() and hashCode() behavior, especially when dealing with somewhat more complex objects that reference other objects.
Joshua Bloch in his Effective Java does not even mention it in his "overriding equals() method" chapter. All his examples are trivialities like Point and ColorPoint, all with just primitive or nearly primitive types.
Can recursivity be avoided? Sometimes hardly. Assume:
class Person {
    String name;
    Address address;
}
Both fields have to go into the business key (as the Hibernate guys call it); they are both value components (as Joshua Bloch has it). And Address is a complex object itself. Recursion.
Be aware that IDEs like Eclipse and IntelliJ generate recursive equals() and hashCode().
By default they use all fields. If you apply generator tools en masse, you are asking for trouble.
One problem is that you can get a StackOverflowError. Here is my simple test proving it.
All that is needed is a class that has another object as a "value component", forming an object graph, together with the recommended equals() implementation. Yes, you need a cycle in that graph, but that is nothing unrealistic (imagine molecules, paths on a map, interlinked transactions...).
Another problem is performance. What is recommended for equals() is in fact a comparison of two object graphs, potentially huge graphs; one can end up comparing thousands of nodes without knowing it. And not all of them are necessarily in memory! Consider that some objects may be lazily loaded. One can end up loading half of the database on one equals() or hashCode() call.
The paradox is that the more rigorously you override equals() and hashCode(), as you are encouraged to do, the more likely you are to get into trouble.
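The StackOverflowError scenario can be reproduced with a small sketch, assuming a field-by-field equals() of the kind IDE generators typically produce. PairedNode and CyclicEqualsDemo are hypothetical names, not the original poster's test.

```java
import java.util.Objects;

// Hypothetical node whose equals()/hashCode() include a reference to another node.
class PairedNode {
    final String name;
    PairedNode partner;

    PairedNode(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PairedNode)) return false;
        PairedNode other = (PairedNode) o;
        // Recurses into partner, which recurses back into this node's counterpart.
        return name.equals(other.name) && Objects.equals(partner, other.partner);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, partner);
    }
}

class CyclicEqualsDemo {
    // Builds a two-node cycle: first <-> second.
    static PairedNode cycle() {
        PairedNode first = new PairedNode("a");
        PairedNode second = new PairedNode("b");
        first.partner = second;
        second.partner = first;
        return first;
    }

    // Returns true if comparing two structurally identical cycles overflows the stack.
    static boolean equalsOverflows() {
        PairedNode x = cycle();
        PairedNode y = cycle();
        try {
            x.equals(y);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(equalsOverflows());
    }
}
```

The this == o shortcut does not help here, because the two graphs are structurally identical but made of distinct objects, so the recursion never reaches a pair of identical references.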

Ideally, the equals() method should test logical equality. In some cases, that may descend more deeply than the physical object, and in others, it may not.
If testing logical equality is not feasible, due to performance or other concerns, then you can leave the default implementation provided by Object, and not rely on equals(). For example, you don't have to use your object graph as a key in a collection.
Bloch does say this:
The easiest way to avoid problems is not to override the equals method, in which case each instance of the class is equal only to itself.

There are at least two logical questions, which would be meaningful for any two references of any type, which it would at various times be useful for equals to test:
Can the type promise that the two references will forevermore identify equivalent objects?
Can the type promise that the two references will identify equivalent objects as long as the code which holds the references neither modifies the objects, nor exposes them to code that will?
If a reference identifies an object that might change at any time without notice, the only references that should be considered equivalent are those which identify the same object. If a reference identifies an object of a deeply-immutable type, and is never used in ways that test its identity (e.g. locking, IdentityHashSet, etc.) then all references to objects holding equal content should be considered equivalent. In both of the above situations, the proper behavior of equals is clear and unambiguous, since in the former case the proper answer for both questions would be obtained by testing reference identity, and in the latter case the proper answer would be obtained by testing deep equality.
Unfortunately, there's a very common scenario where the answers to the two questions diverge: when the only extant references to objects of mutable type are held by code which knows that no references to those objects will ever be held by code that might mutate them nor test them for reference identity. In that scenario, if two such objects presently encapsulate the same state, they will forever more do so, and thus equivalence should be based upon equivalence of constituents rather than upon reference identity. In other words, equals should be based upon how nested objects answer the second question.
Because the meaning of equality depends upon information which is only known by the holder of a reference, and not by the object identified by the reference, it's not really possible for an equals method to know what style of equality is appropriate. Types that know that the things to which they hold references might spontaneously change should test reference equality of those constituent parts, while types that know they won't change should generally test deep equality. Things like collections should allow the owner to specify whether the things stored in the collections could spontaneously change, and test equality on that basis; unfortunately, relatively few of the built-in collections include any such facility (code can select between e.g. HashMap and IdentityHashMap to distinguish what kind of test is appropriate for keys, but most kinds of collections have no equivalent choice). The best one can do is probably have each new collection type offer in its constructor a choice of encapsulation mode: regard the collection itself as something that might be changed without notice (report reference equality on the collection), assume the collection will hold an unchanging set of references to things that might change without notice (test reference equality on the contents), assume that neither the collection nor the constituent objects will change (test equals of each constituent object), or--for collections of arrays or of nested collections that don't support deep-equality testing--perform super-deep equality testing to a specified depth.

Difference between the two Java operators: != vs !equals

Is this code:
elem1!=elem2
equivalent to this one?
!elem1.equals(elem2)
It compiles both ways, but I'm still unsure about it...

== (and by extension !=) check for object identity, that is, if both of the objects refer to the very same instance. equals checks for a higher level concept of identity, usually whether the "values" of the objects are equal. What this means is up to whoever implemented equals on that particular object. Therefore they are not the same thing.
A common example where these two are not the same thing are strings, where two different instances might have the same content (the same string of characters), in which case a == comparison is false but equals returns true.
The default implementation of equals (on Object) uses == inside, so the results will be the same for objects that do not override equals (excluding nulls, of course).
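The string case mentioned above can be checked with a short sketch. new String(...) is used deliberately so that two distinct instances exist; the class and method names are invented for this example.

```java
class IdentityVsEquality {
    // Two distinct String instances with the same characters.
    static boolean sameReference() {
        String a = new String("hello");
        String b = new String("hello");
        return a == b; // compares references
    }

    static boolean sameValue() {
        String a = new String("hello");
        String b = new String("hello");
        return a.equals(b); // compares character content
    }

    // Object does not override equals, so equals behaves like == here.
    static boolean bareObjectsEqual() {
        return new Object().equals(new Object());
    }

    public static void main(String[] args) {
        System.out.println(sameReference());    // false
        System.out.println(sameValue());        // true
        System.out.println(bareObjectsEqual()); // false
    }
}
```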

In general, no, they're not the same. The first version checks whether elem1 and elem2 are references to the same object (assuming that they're not primitive types). The second version calls a type-specific method to check whether two (possibly distinct) objects are "equal", in some sense (often, this is just a check that all their member fields are identical).
I don't think this has anything to do with generics, as such.

Java: If I override the .equals method, can I still test for reference equality with ==?

I have the following situation: I need to sort trees by height, so I made the Trees Comparable using the height attribute. However, I was also told to override the equals and hashCode methods to avoid unpredictable behaviour.
Still, sometimes I may want to compare the references of the roots or something along those lines using ==. Is that still possible or does the == comparison call the equals method?

equals() is meant to compare an object with rules set by the programmer. In your example you compare your trees by height, so you'll write equals() so it compares heights.
==, as you said, compares references. These are touched neither by equals() nor by hashCode(). So you won't change its behaviour.
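A minimal sketch of this, assuming a tree class whose equality is defined by height as described in the question. HeightTree is a hypothetical stand-in for the asker's Tree class.

```java
// Hypothetical tree type that is "equal" to any other tree of the same height.
final class HeightTree {
    final int height;

    HeightTree(int height) {
        this.height = height;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof HeightTree && height == ((HeightTree) o).height;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(height);
    }

    public static void main(String[] args) {
        HeightTree t1 = new HeightTree(3);
        HeightTree t2 = new HeightTree(3);
        System.out.println(t1.equals(t2)); // true: same height
        System.out.println(t1 == t2);      // false: different objects
        System.out.println(t1 == t1);      // true: same object
    }
}
```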

Yes, == will not call hashCode or equals. You can still test for reference equality like this.

== does not call equals. So it's still fine for identity checks.
As many implementations of equals start with a this == other check, you would get a literal StackOverflowError if == were calling equals behind the scenes.

I think that a bigger question here is whether it is appropriate to implement Comparable on these objects. It may be more appropriate to use a Comparator for the operations that work on height, and not embed ordinal computation in the class itself.
My general philosophy on this is to only implement Comparable if there is a truly natural ordering for the object. In the case of a tree node, is height the only way that anyone could ever want to sort? Maybe this is a private class, and the answer is 'yes'. But even then, creating a Comparator isn't that much extra work, and it leaves things flexible in case you decide you want to make that tree node a protected or public class some day.
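The Comparator alternative might look like the following sketch. SortableNode, its fields and the sample labels are assumptions made for illustration; the class itself does not implement Comparable, and equals/hashCode are left untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical tree node with no natural ordering of its own.
class SortableNode {
    final String label;
    final int height;

    SortableNode(String label, int height) {
        this.label = label;
        this.height = height;
    }
}

class HeightSortDemo {
    // Ordering by height lives outside the class, so other orderings can be added later.
    static final Comparator<SortableNode> BY_HEIGHT =
            Comparator.comparingInt((SortableNode n) -> n.height);

    static List<String> sortedLabels() {
        List<SortableNode> nodes = new ArrayList<>(Arrays.asList(
                new SortableNode("oak", 3),
                new SortableNode("fern", 1),
                new SortableNode("pine", 2)));
        nodes.sort(BY_HEIGHT);

        List<String> labels = new ArrayList<>();
        for (SortableNode n : nodes) {
            labels.add(n.label);
        }
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(sortedLabels()); // [fern, pine, oak]
    }
}
```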

== tests referential equality. It will not call equals.

Overriding the equals() method will have NO effect on the == operator.
== is used to test if 2 references point to the same object.
equals() method "meaningfully" compares 2 objects.
It is important to realize the implication of the word "meaningful" here. Equality is easier to understand when you are comparing, for instance, 2 Strings or 2 integers. This is why the equals() method, inherited from the Object class, is already overridden by String and the wrapper classes (Integer, Float, etc). However, what if you are comparing 2 objects of type Song? Here, equality can be established on the basis of
1) Artist name
2) Song name
3) or some other criterion
Therefore, you have to override the equals() method to "explicitly" determine "when" 2 Song objects are considered equal.
The "unpredictable behavior" you mentioned in your question relates to how objects like the one above (Song) behave when used with collections like Map. You SHOULD NOT use these objects as keys in a map until you override both the equals() and hashCode() methods. The reason is how HashMap lookup and indexing work. Refer to the Javadoc for the specific rules. What you should remember is:
If 2 objects are meaningfully equal, their hashCode() methods must return the same value. However, 2 objects that return the same hash code are not necessarily equal. Again, Java doesn't enforce any rules regarding this. It is your responsibility to implement the equals() and hashCode() methods correctly.
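As a sketch of the Song example, assuming equality is defined by artist name and song name (the first two criteria listed above). The field names and the sample data are invented for this illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical Song class: two songs are equal when artist and title match.
final class Song {
    private final String artist;
    private final String title;

    Song(String artist, String title) {
        this.artist = artist;
        this.title = title;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Song)) return false;
        Song other = (Song) o;
        return artist.equals(other.artist) && title.equals(other.title);
    }

    @Override
    public int hashCode() {
        // Built from the same fields as equals, so equal songs share a hash code.
        return Objects.hash(artist, title);
    }
}

class SongMapDemo {
    // Looks up a play count using a different but equal Song instance as the key.
    static Integer lookup() {
        Map<Song, Integer> playCounts = new HashMap<>();
        playCounts.put(new Song("Artist A", "Title X"), 5);
        return playCounts.get(new Song("Artist A", "Title X"));
    }

    public static void main(String[] args) {
        System.out.println(lookup()); // 5
    }
}
```

Without the two overrides, the lookup would use Object's identity-based methods and return null for the second instance.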
