Java, Object.hashCode() result constant across all JVMs/Systems?


Is the output of Object.hashCode() required to be the same on all JVM implementations for the same Object?
For example, if "test".hashCode() returns 1 on Java 1.4, could it potentially return 2 on 1.6? Or what if the operating systems, or the processor architectures, differed between instances?

No. The output of hashCode is liable to change between JVM implementations and even between different executions of a program on the same JVM.
However, in the specific example you gave, the value of "test".hashCode() will actually be consistent because the implementation of hashCode for String objects is part of the API of String (see the Javadocs for java.lang.String and this other SO post).
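For reference, the String Javadoc specifies the result as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], computed with int arithmetic (and 0 for the empty string). The sketch below recomputes that formula by hand. StringHashDemo and specifiedHash are illustrative names, not part of any API. Because the formula is part of the specification, a conforming JVM should always agree with it:

```java
class StringHashDemo {
    // Recomputes String.hashCode() from the formula documented in the String Javadoc:
    // s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], with int overflow wrapping around.
    static int specifiedHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // Horner's rule form of the same polynomial
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(specifiedHash("test")); // 3556498
        System.out.println("test".hashCode());     // same value, because the formula is specified
    }
}
```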

From the API
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)

No, the result of hashCode() is only constant during a single execution. You should not expect the result of the function to be the same between executions, let alone between JRE versions or platforms.

First of all, the result of hashCode depends heavily on the object's type and its implementation. Every class, including its subclasses, can define its own behavior. You can rely on it following the general contract as outlined in the Javadoc and in other answers, but the value is not required to stay the same after a VM restart, especially if it depends on the hashCode implementations of third-party classes.
When referring to the concrete implementation of the String class, you should not depend on the return value. If your program is executed in a different VM, it could potentially change.
If you refer solely to the Sun VM, it could be argued that Sun will not break existing code, even badly written code, so "test".hashCode() will always return exactly 3556498 for any version of the Sun VM.
If you want to deliberately shoot yourself in the foot, go ahead and depend on this. People who will need to fix your code running on the "2015 Nintendo Java VM for Hairdryer" will cry out your name at night.

As noted, for many implementations the default behavior of hashCode() is to return the address of the object. Obviously this can be different each time the program is run. This is also consistent with the default behavior of equals(): two objects are equal only if they are the same object (where x and y are both non-null, x.equals(y) if and only if x == y).
For any classes where hashCode() and equals() are overridden, generally they are calculated in a deterministic way based on the values of some or all of the members. Thus, in practice it is likely that if an object in one run of the program can be said to be equal to an object in another run of the program, and the source code is the same (including such things as the source code for String.hashCode() if that is called by the hashCode() override), the hash codes will be the same.
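To illustrate, here is a sketch of a class (GridPoint is a hypothetical name) whose hashCode() is derived only from its fields via java.util.Objects.hash. Objects.hash is defined in terms of Arrays.hashCode and the fields' own hashCode() methods, so the result depends on the field values and the source code, not on memory layout or on the particular run:

```java
import java.util.Objects;

// A value class whose equals/hashCode depend only on its fields, so the hash code
// is computed the same way in every run (given the same field values and JDK code).
final class GridPoint {
    private final int x;
    private final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GridPoint)) return false;
        GridPoint p = (GridPoint) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        // Objects.hash(x, y) == Arrays.hashCode(new Object[] {x, y}) == 31 * (31 * 1 + x) + y
        return Objects.hash(x, y);
    }
}
```

For example, new GridPoint(1, 2).hashCode() evaluates to 31 * (31 + 1) + 2 = 994.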
It is not guaranteed, although it is hard to think of a reasonable real-world example.

The only truth: the hash code is the same within a single run of the application. Another run may give other hash codes.
When you ask for an object's identity hash code, the JVM generates it using one of several algorithms and stores it in the object's header for future use.
Just look at the get_next_hash function in OpenJDK.
The algorithm is configurable with the JVM argument -XX:hashCode=x,
where x is a digit:
0 – Park-Miller RNG (the default in older JDKs such as 6 and 7)
1 – a function of the object's address and global state
2 – constant 1
3 – sequential counter
4 – object's address in heap
5 – Marsaglia's Xorshift (the fastest; the default since JDK 8)
When the hash code equals the address in the heap, this is sometimes awkward, because the GC can move objects to other locations in the heap, so the stored hash code no longer matches the object's current address.
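The point about moving objects can be checked with a small sketch: the identity hash is fixed the first time it is requested, so it survives garbage collections within the same run. This only demonstrates the within-run guarantee; whether the object is actually moved is up to the collector, and the value itself will usually differ between runs. The class and method names are illustrative:

```java
class IdentityHashDemo {
    // Returns true if the identity hash of an object is unchanged after heavy allocation
    // and a GC request. A moving collector may relocate the object in between.
    static boolean stableAcrossGc() {
        Object o = new Object();
        int before = System.identityHashCode(o); // first request fixes the value for this run

        long sink = 0;
        for (int i = 0; i < 100_000; i++) {
            sink += new byte[128].length; // short-lived garbage to provoke young collections
        }
        System.gc(); // only a hint; the JVM may ignore it

        int after = System.identityHashCode(o);
        return sink > 0 && before == after;
    }

    public static void main(String[] args) {
        Object o = new Object();
        // For a class that does not override hashCode(), both calls return the same value.
        System.out.println(o.hashCode() == System.identityHashCode(o)); // true
        System.out.println(stableAcrossGc());                           // true
    }
}
```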

Related

Java collection performance when comparing items

A basic performance question from someone coming from C/C++.
I'm using a Collection (ArrayDeque) to simply hold, add, remove items by identity. I know the contract is for the collection to use equals() when checking equality, for example during remove(obj), but in my case I want to use reference semantics (like IdentityHashMap but don't need the map). So I am fine to just know that I will never override the equals() on any of the objects held inside the collection (which is declared to hold an interface).
Coming from native programming I can't avoid asking myself, will the compiled code of remove(obj) traverse items and perform a virtual call on Object.equals() only to end up comparing addresses? Since I'm storing interface references, there is no way (?) to optimise this using final so the compiler doesn't bother making the useless calls (i.e. inline them) - but now I'm getting ahead of myself because it may be such optimisation is not necessary anyway and JVM has other means (devirtualisation?) to generate optimal code in this case.
Assuming my code needs the level of optimisation that can be obtained by thinking about this aspect in the first place - is my understanding correct? What is a good design for this case?

Making the method final won't avoid the virtual call, because the invokevirtual opcode will be used anyway and there is no way for the JVM to tell if the method was final or not.
The good news is that the JVM might be able to inline it or avoid the virtual call if it can see that the method is not overridden by any loaded class, so your performance will improve as your program runs.

When you use the remove method, it calls equals for comparison. Ideally, you should override equals and hashCode on the element types if you rely on such methods; otherwise the default implementation from Object applies, which simply compares references (this == obj). It is highly recommended to define your own equals and hashCode implementations when using collection methods that depend on them.
Regarding performance, yes, you are right: the elements of the collection are scanned linearly until a matching one is found. It is a linear search, so removal takes O(n) time.
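If reference semantics are what you want, you can avoid equals() altogether. A minimal sketch (IdentityCollections and its method names are hypothetical): scan with == and remove through the iterator, which is still O(n), or, if insertion order is not needed, build an identity-based set from IdentityHashMap, which the question already mentions:

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Iterator;
import java.util.Set;

class IdentityCollections {
    // Removes the first element that is the *same object* as target (reference equality),
    // never calling equals(). Still a linear O(n) scan, like ArrayDeque.remove(Object).
    static <T> boolean removeByIdentity(ArrayDeque<T> deque, Object target) {
        for (Iterator<T> it = deque.iterator(); it.hasNext(); ) {
            if (it.next() == target) {
                it.remove();
                return true;
            }
        }
        return false;
    }

    // Expected O(1) add/remove with no equals() calls: IdentityHashMap compares keys
    // with == and hashes them with System.identityHashCode.
    static <T> Set<T> newIdentitySet() {
        return Collections.newSetFromMap(new IdentityHashMap<>());
    }
}
```

The trade-off is that the set loses the deque's ordering; keep the deque plus the identity scan if order matters.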

Value-based Classes confusion

I'm seeking some clarification to the definition of Value-based Classes. I can't imagine, how is the last bullet point (6) supposed to work together with the first one
(1) they are final and immutable (though may contain references to mutable objects)
(6) they are freely substitutable when equal, meaning that interchanging any two instances x and y that are equal according to equals() in any computation or method invocation should produce no visible change in behavior.
Optional is such a class.
Optional<ArrayList<String>> a = Optional.of(new ArrayList<String>());
Optional<ArrayList<String>> b = Optional.of(new ArrayList<String>());
assertEquals(a, b); // passes as `equals` delegated to the lists
b.get().add("a");
// now bite the last bullet
assertTrue(a.get().isEmpty()); // passes
assertTrue(b.get().isEmpty()); // throws
Am I reading it incorrectly, or would it need to get more precise?
Update
The answer by Eran makes sense (they are no longer equal), but let me move the target:
...
assertEquals(a, b); // now, they are still equal
assertEquals(m(a, b), m(a, a)); // this will throw
assertEquals(a, b); // now, they are equal, too
Let's define a funny method m, which does some mutation and undoes it again:
int m(Optional<ArrayList<String>> x, Optional<ArrayList<String>> y) {
x.get().add("");
int result = x.get().size() + y.get().size();
x.get().remove(x.get().size() - 1);
return result;
}
It's a strange method, I know. But I guess it qualifies as "any computation or method invocation", doesn't it?

they are freely substitutable when equal, meaning that interchanging any two instances x and y that are equal according to equals() in any computation or method invocation should produce no visible change in behavior
Once b.get().add("a"); is executed, a is no longer equal to b, so you have no reason to expect assertTrue(a.get().isEmpty()); and assertTrue(b.get().isEmpty()); to produce the same result.
The fact that a value based class is immutable doesn't mean you can't mutate the values stored in instances of such classes (as stated in though may contain references to mutable objects). It only means that once you create an Optional instance with Optional a = Optional.of(new ArrayList<String>()), you can't mutate a to hold a reference to a different ArrayList.

You can derive the invalidity of your actions from the specification you’re referring to:
A program may produce unpredictable results if it attempts to distinguish two references to equal values of a value-based class, whether directly via reference equality or indirectly via an appeal to synchronization, identity hashing, serialization, or any other identity-sensitive mechanism. Use of such identity-sensitive operations on instances of value-based classes may have unpredictable effects and should be avoided.
(emphasis mine)
Modifying an object is an identity-sensitive operation, as it only affects the object with the specific identity represented by the reference you are using for the modification.
When you call x.get().add(""), you are performing an operation that lets you recognize whether x and y represent the same instance; in other words, you are performing an identity-sensitive operation.
Still, I expect that if a future JVM truly tries to substitute value-based instances, it will have to exclude instances referring to mutable objects, to ensure compatibility. If you perform an operation that produces an Optional followed by extracting the value from the Optional, e.g. … stream.findAny().get(), it would be disastrous/unacceptable if the intermediate operation allowed the element to be substituted by another object that happened to be equal at the point of the intermediate Optional use (if the element is not itself a value type)…

I think a more interesting example is as follows:
void foo() {
List<String> list = new ArrayList<>();
Optional<List<String>> a = Optional.of(list);
Optional<List<String>> b = Optional.of(list);
bar(a, b);
}
It's clear that a.equals(b) is true. Furthermore, since Optional is final (cannot be subclassed), immutable, and both a and b refer to the same list, a.equals(b) will always be true. (Well, almost always, subject to race conditions where another thread is modifying the list while this one is comparing them.) Thus, this seems like it would be a case where it would be possible for the JVM to substitute b for a or vice-versa.
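The claim that a.equals(b) stays true can be checked directly. In this sketch (the class and method names are illustrative), the shared list is mutated between two comparisons; because both Optionals wrap the very same list, their equals() result cannot diverge. The a == b comparison is deliberately left out, since the specification says not to rely on it:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

class SharedListOptionals {
    // Two Optionals wrapping the same list remain equal no matter how the list changes,
    // because Optional.equals delegates to the list's equals, and a list equals itself.
    static boolean stillEqualAfterMutation() {
        List<String> list = new ArrayList<>();
        Optional<List<String>> a = Optional.of(list);
        Optional<List<String>> b = Optional.of(list);
        boolean before = a.equals(b);
        list.add("x"); // mutate through the shared reference
        return before && a.equals(b);
    }
}
```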
As things stand today (Java 8 and 9 and 10) we can write a == b and the result will be false. The reason is that we know that Optional is an instance of an ordinary reference type, and the way things are currently implemented, Optional.of(x) will always return a new instance, and two new instances are never == to each other.
However, the paragraph at the bottom of the value-based classes definition says:
A program may produce unpredictable results if it attempts to distinguish two references to equal values of a value-based class, whether directly via reference equality or indirectly via an appeal to synchronization, identity hashing, serialization, or any other identity-sensitive mechanism. Use of such identity-sensitive operations on instances of value-based classes may have unpredictable effects and should be avoided.
In other words, "don't do that," or at least, don't rely on the result. The reason is that tomorrow the semantics of the == operation might change. In a hypothetical future value-typed world, == might be redefined for value types to be the same as equals, and Optional might change from being a value-based class to being a value type. If this happens, then a == b will be true instead of false.
One of the main ideas about value types is that they have no notion of identity (or perhaps their identity isn't detectable to Java programs). In such a world, how could we tell whether a and b "really" are the same or different?
Suppose we were to instrument the bar method via some means (say, a debugger) such that we can inspect the attributes of the parameter values in a way that can't be done through the programming language, such as by looking at machine addresses. Even if a == b is true (remember, in a value-typed world, == is the same as equals) we might be able to ascertain that a and b reside at different addresses in memory.
Now suppose the JIT compiler compiles foo and inlines the calls to Optional.of. Seeing that there are now two hunks of code that return two results that are always equals, the compiler eliminates one of the hunks and then uses the same result wherever a or b is used. Now, in our instrumented version of bar, we might observe that the two parameter values are the same. The JIT compiler is allowed to do this because of the sixth bullet item, which allows substitution of values that are equals.
Note that we're only able to observe this difference because we're using an extra-linguistic mechanism such as a debugger. Within the Java programming language, we can't tell the difference at all, and thus this substitution can't affect the result of any Java program. This lets the JVM choose any implementation strategy it sees fit. The JVM is free to allocate a and b on the heap, on the stack, one on each, as distinct instances, or as the same instances, as long as Java programs can't tell the difference. When the JVM is granted freedom of implementation choices, it can make programs go a lot faster.
That's the point of the sixth bullet item.

When you execute the lines:
Optional<ArrayList<String>> a = Optional.of(new ArrayList<String>());
Optional<ArrayList<String>> b = Optional.of(new ArrayList<String>());
assertEquals(a, b); // passes as `equals` delegated to the lists
In assertEquals(a, b), according to the Optional.equals API, the other object is considered equal if:
it is also an Optional, and
both instances have no value present, or
the present values are "equal to" each other via equals() (in your example, this is ArrayList's equals).
So, when you change one of the ArrayList the Optional instance is pointing to, the assert will fail in the third point.
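A small sketch of that rule (names are illustrative): the two Optionals themselves are never modified, yet their equality flips because Optional.equals delegates to ArrayList.equals, which compares contents:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

class OptionalEqualsDemo {
    // Returns {equal before mutation, equal after mutation}.
    static boolean[] equalityBeforeAndAfterMutation() {
        Optional<List<String>> a = Optional.of(new ArrayList<>());
        Optional<List<String>> b = Optional.of(new ArrayList<>());
        boolean before = a.equals(b); // two empty lists are equal
        b.get().add("a");             // mutates the list inside b, not b itself
        boolean after = a.equals(b);  // [] vs [a]: no longer equal
        return new boolean[] { before, after };
    }
}
```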

Point 6 says that if a and b are equal, they can be used interchangeably. For example, if a method expects two instances of class A and you have created instances a and b that satisfy point 6, you may pass (a, a), (b, b), or (a, b), and all three will give the same output.

Does Object.toString or Object.hashCode ever give the memory address of the object

It is often claimed that the implementation of Object.hashCode() (the default implementation for all objects) gives the memory address of the object. That claim is often attached to an explanation of the peculiar output produced by Object.toString().
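The Object.toString() Javadoc specifies that output as getClass().getName() + '@' + Integer.toHexString(hashCode()). A sketch reproducing it (defaultToString is an illustrative helper) shows that the hex suffix is simply whatever hashCode() returns, so it is only an address if hashCode() happens to be one:

```java
class ToStringDemo {
    // Reproduces the expression documented for Object.toString().
    static String defaultToString(Object o) {
        return o.getClass().getName() + "@" + Integer.toHexString(o.hashCode());
    }

    public static void main(String[] args) {
        Object plain = new Object();
        Object fixedHash = new Object() {
            @Override
            public int hashCode() {
                return 255; // toString() uses this overridden value, printed as "ff"
            }
        };
        System.out.println(defaultToString(plain).equals(plain.toString()));        // true
        System.out.println(fixedHash.toString().endsWith("@ff"));                   // true
    }
}
```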
See here for an example.
This is certainly not the case for any JVMs/JREs I am aware of. Not least because addresses are usually 64 bits long now. But also, garbage collectors relocate objects, so the address changes. I've seen claims it can be the initial memory address of the object. But as many objects would then have similar addresses, that would be a poor choice for a hash code.
Are there, or have there ever been, any widely used JVMs/JREs for which it was the (initial) memory address of the object.
I am aware that the JavaDoc for the Object class suggests that the hashCode for an implementation might be the memory address. But I suspect that is a grossly out of date statement that has never been updated.
Indeed, the current Oracle JVM does not use the memory address (but can be configured to do so):
https://stackoverflow.com/a/16105878/545127
The idea that the hashCode is a memory address is a historical artefact:
https://stackoverflow.com/a/13860488/545127
My question is whether (and which) any widely used JVM used the memory address as its (default) implementation.

Since the default hash code of an object does not need to be unique, returning the whole address is not necessary. An implementation could grab a group of bits from the address, say bits 3 through 34 on a 64-bit system, or an XOR between the upper 32 bits and the lower 32 bits, or simply the lower 32 bits.
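The last two folding strategies can be written as follows. This is only an illustration on a plain long standing in for an address; no JVM exposes object addresses this way, and AddressFolding is a hypothetical name. The XOR fold is the same expression that Long.hashCode(long) is documented to use:

```java
class AddressFolding {
    // Fold a 64-bit value into 32 bits by XOR-ing its upper and lower halves.
    static int foldXor(long address) {
        return (int) (address ^ (address >>> 32));
    }

    // Keep only the lower 32 bits.
    static int lower32(long address) {
        return (int) address;
    }
}
```

Numerically close inputs produce numerically close outputs with either fold, which, as noted next, is acceptable for hash codes.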
But as many objects would then have similar addresses [due to garbage collection], that would be a poor choice for a hash code.
Hash codes that are numerically close to each other are OK. Even a small number of identical hash codes would not create a problem, because equality is used to resolve any ties. The situations when the default hash code implementation is used are generally limited, because objects that are used as keys in hash-based containers are expected to provide "good" implementations of hashCode method.
Oracle says that the default implementation of their JVM uses the internal address of the object, whatever that means, to compute its hashCode. However, other JVM implementations are not required to do the same:
Here is a quote from Oracle's documentation:
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)
You can find the actual implementation of the algorithm here. Search for get_next_hash function for details. It appears that computing hash based on address is done with a simple conversion:
value = intptr_t(obj) ;

Why does Java's Area#equals method not override Object#equals?

I just ran into a problem caused by Java's java.awt.geom.Area#equals(Area) method. The problem can be simplified to the following unit test:
@org.junit.Test
public void testEquals() {
java.awt.geom.Area a = new java.awt.geom.Area();
java.awt.geom.Area b = new java.awt.geom.Area();
assertTrue(a.equals(b)); // -> true
java.lang.Object o = b;
assertTrue(a.equals(o)); // -> false
}
After some head scratching and debugging, I finally saw in the JDK source, that the signature of the equals method in Area looks like this:
public boolean equals(Area other)
Note that it does not @Override the normal equals method from Object, but instead just overloads the method with a more concrete type. Thus, the two calls in the example above end up calling different implementations of equals.
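The mechanism can be reproduced without AWT. In this sketch, Tile is a hypothetical stand-in for Area: the overload is selected at compile time from the static type of the argument, so passing the same object as a Tile or as an Object reaches different methods:

```java
class Tile {
    final int size;

    Tile(int size) {
        this.size = size;
    }

    // Overload, NOT an override: the parameter type is Tile, not Object.
    // Like Area, there is no equals(Object) or hashCode() override.
    public boolean equals(Tile other) {
        return other != null && size == other.size;
    }
}

class OverloadDemo {
    public static void main(String[] args) {
        Tile a = new Tile(1);
        Tile b = new Tile(1);
        Object o = b;
        System.out.println(a.equals(b)); // true:  static type Tile selects equals(Tile)
        System.out.println(a.equals(o)); // false: static type Object selects Object.equals
    }
}
```

Collections only ever call equals(Object), which is why a HashSet containing a cannot find b.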
As this behavior has been present since Java 1.2, I assume it is not considered a bug. I am, therefore, more interested in finding out why the decision was made not to properly override the equals method, but at the same time to provide an overloaded variant. (Another hint that this was a deliberate decision is the absence of an overridden hashCode() method.)
My only guess would be that the authors feared that the slow equals implementation for areas is unsuitable for comparing equality when placing Areas in Set, Map, etc. data structures. (In the above example, you could add a to a HashSet, and although b is equal to a, calling contains(b) will fail.) Then again, why did they not just name the questionable method in a way that does not clash with such a fundamental concept as the equals method?

RealSkeptic linked to JDK-4391558 in a comment above. The comment in that bug explains the reasoning:
The problem with overriding equals(Object) is that you must also
override hashCode() to return a value which guarantees that equals()
is true only if the hashcodes of the two objects are also equal.
but:
The problem here is that Area.equals(Area) does not perform a very
straight-forward comparison. It painstakingly examines each and every
piece of geometry in the two Areas and tests to see if they cover the
same enclosed spaces. Two Area objects could have a completely
different description of the same enclosed space and equals(Area)
would detect that they were the same.
So basically we're left with an array of not-so-pleasant options, such as:
deprecate equals(Area) and create an alternate name for that
operation, such as "areasEqual" so as to avoid the confusion.
Unfortunately, the old method would remain and would be linkable and
would trap many people who were intending to invoke the equals(Object)
version.
or:
deprecate equals(Area) and change its implementation to be exactly
that of equals(Object) so as to avoid semantic problems if the wrong
method is called. Create a new method with a different name to avoid
confusion to implement the old functionality provided by equals(Area).
or:
implement equals(Object) to call equals(Area) and implement a dummy
hashCode() which honors the equals/hashCode contract in a degenerate
way by returning a constant. This would make the hashCode method
essentially useless and make Area objects nearly useless as keys in a
HashMap or Hashtable.
or other ways to modify the equals(Area) behavior that would either change its semantics or make it inconsistent with hashCode.
Looks like changing this method is deemed by the maintainers to be neither feasible (because neither option outlined in the bug comment quite solves the problem) nor important (since the method, as implemented, is quite slow and would probably only ever return true when comparing an instance of an Area with itself, as the commenter suggests).
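To make the third option from the bug comment concrete, here is a sketch on a hypothetical Span class (not Area) whose equality is "semantic" in the same way: two different descriptions can cover the same range. The constant hashCode() honors the contract and keeps HashSet lookups correct, but every instance lands in the same bucket, so hash lookups degrade toward a linear scan. For Span a canonical form is trivial, so a real hash would be easy; the constant is only there to show the degenerate trade-off:

```java
// Hypothetical class: (3, 1) and (1, 3) describe the same span, like two Areas
// built from different geometry that enclose the same space.
final class Span {
    private final int from;
    private final int to;

    Span(int from, int to) {
        this.from = from;
        this.to = to;
    }

    private boolean coversSameAs(Span other) {
        return Math.min(from, to) == Math.min(other.from, other.to)
            && Math.max(from, to) == Math.max(other.from, other.to);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Span && coversSameAs((Span) o);
    }

    // Degenerate but contract-honoring: equal objects trivially share this value.
    @Override
    public int hashCode() {
        return 42;
    }
}
```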

"Why does Java's Area#equals method not override Object#equals?"
Because overriding is not required here: a method with the same name but a different parameter type is an overload, not an override.
An overriding method has the same name, number of parameters, and parameter types as the method in the parent class (and a compatible return type); only the body of the method differs.
This case is overloading rather than overriding, because it satisfies at least one of these rules:
1.) The number of parameters is different for the methods.
2.) The parameter types are different (like changing a parameter that was a float to an int).
"why did they not just name the questionable method in a way that does not clash with such a fundamental concept as the equals method?"
Because this could trip people up going forward. If we had a time machine back to the 1990s, we could do it without this concern.

How to test for equality of complex object graphs?

Say I have a unit test that wants to compare two complex objects for equality. The objects contain many other deeply nested objects. All of the objects' classes have correctly defined equals() methods.
This isn't difficult:
@Test
public void objectEquality() {
Object o1 = ...
Object o2 = ...
assertEquals(o1, o2);
}
Trouble is, if the objects are not equal, all you get is a fail, with no indication of which part of the object graph didn't match. Debugging this can be painful and frustrating.
My current approach is to make sure everything implements toString(), and then compare for equality like this:
assertEquals(o1.toString(), o2.toString());
This makes it easier to track down test failures, since IDEs like Eclipse have a special visual comparator for displaying string differences in failed tests. Essentially, the object graphs are represented textually, so you can see where the difference is. As long as toString() is well written, it works great.
It's all a bit clumsy, though. Sometimes you want to design toString() for other purposes, like logging; maybe you only want to render some of the object's fields rather than all of them; or maybe toString() isn't defined at all, and so on.
I'm looking for ideas for a better way of comparing complex object graphs. Any thoughts?

The Atlassian Developer Blog had a few articles on this very same subject, and how the Hamcrest library can make debugging this kind of test failure very very simple:
How Hamcrest Can Save Your Soul (part 1)
Hamcrest saves your soul - Now with less suffering! (part 2)
Basically, for an assertion like this:
assertThat(lukesFirstLightsaber, is(equalTo(maceWindusLightsaber)));
Hamcrest will give you back the output like this (in which only the fields that are different are shown):
Expected: is {singleBladed is true, color is PURPLE, hilt is {...}}
but: is {color is GREEN}

What you could do is render each object to XML using XStream, and then use XMLUnit to perform a comparison on the XML. If they differ, then you'll get the contextual information (in the form of an XPath, IIRC) telling you where the objects differ.
e.g. from the XMLUnit doc:
Comparing test xml to control xml [different]
Expected element tag name 'uuid' but was 'localId' -
comparing <uuid...> at /msg[1]/uuid[1] to <localId...> at /msg[1]/localId[1]
Note the XPath indicating the location of the differing elements.
Probably not fast, but that may not be an issue for unit tests.

Because of the way I tend to design complex objects, I have a very easy solution here.
When designing a complex object for which I need to write an equals method (and therefore a hashCode method), I tend to write a string renderer, and use the String class equals and hashCode methods.
The renderer, of course, is not toString: it doesn't really have to be easy for humans to read, and includes all and only the values I need to compare, and by habit I put them in the order which controls the way I'd want them to sort; none of which is necessarily true of the toString method.
Naturally, I cache this rendered string (and the hashCode value as well). It's normally private, but leaving the cached string package-private would let you see it from your unit tests.
Incidentally, this isn't always what I end up with in delivered systems, of course - if performance testing shows that this method is too slow, I'm prepared to replace it, but that's a rare case. So far, it's only happened once, in a system in which mutable objects were being rapidly changed and frequently compared.
The reason I do this is that writing a good hashCode isn't trivial, and requires testing(*), while making use of the one in String avoids the testing.
(* Consider that step 3 in Josh Bloch's recipe for writing a good hashCode method is to test it to make sure that "equal" objects have equal hashCode values, and making sure that you've covered all possible variations isn't trivial in itself. More subtle, and even harder to test well, is the distribution.)
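A minimal sketch of the renderer approach, assuming an immutable class (Order and its fields are hypothetical). The key holds exactly the compared values in sort order, separated by a NUL character so that different field splits cannot render to the same string; equals, hashCode and compareTo all delegate to String. The key is package-private so a unit test can inspect it, and String already caches its own hash:

```java
import java.util.Objects;

final class Order implements Comparable<Order> {
    private final String customer;
    private final int quantity; // assumed non-negative so zero-padding sorts correctly
    private final String sku;
    final String key;           // cached rendering; package-private for unit tests

    Order(String customer, int quantity, String sku) {
        this.customer = Objects.requireNonNull(customer);
        this.quantity = quantity;
        this.sku = Objects.requireNonNull(sku);
        // Values in the order that should drive sorting; '\0' separates them.
        this.key = customer + '\0' + String.format("%010d", quantity) + '\0' + sku;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Order && key.equals(((Order) o).key);
    }

    @Override
    public int hashCode() {
        return key.hashCode(); // String's well-tested hash, cached inside the String
    }

    @Override
    public int compareTo(Order other) {
        return key.compareTo(other.key);
    }

    @Override
    public String toString() { // human-readable, independent of the key
        return "Order(" + customer + ", " + quantity + " x " + sku + ")";
    }
}
```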

The code for this problem exists at http://code.google.com/p/deep-equals/
Use DeepEquals.deepEquals(a, b) to compare two Java objects for semantic equality. This will compare the objects using any custom equals() methods they may have (if they have an equals() method implemented other than Object.equals()). If not, this method will then proceed to compare the objects field by field, recursively. As each field is encountered, it will attempt to use the derived equals() if it exists, otherwise it will continue to recurse further.
This method will work on a cyclic Object graph like this: A->B->C->A. It has cycle detection so ANY two objects can be compared, and it will never enter into an endless loop.
Use DeepEquals.deepHashCode(obj) to compute a hashCode() for any object. Like deepEquals(), it will call the object's own hashCode() method if a custom hashCode() (other than Object.hashCode()) is implemented; otherwise it will compute the hash code field by field, recursively (deep). Also like deepEquals(), this method handles object graphs with cycles, for example A->B->C->A. In this case, hashCode(A) == hashCode(B) == hashCode(C). DeepEquals.deepHashCode() has cycle detection and therefore will work on ANY object graph.
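The field-by-field recursion with cycle detection can be sketched with plain reflection. This is a simplified, hypothetical illustration of the idea, not the deep-equals library's code: it defers to a class's own equals() when one is declared, handles arrays, tracks pairs already being compared by identity, and will not cope with every JDK-internal class on Java 9+ (reflective access may be denied):

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashSet;
import java.util.Set;

final class MiniDeepEquals {
    private MiniDeepEquals() {}

    static boolean deepEquals(Object a, Object b) {
        return deepEquals(a, b, new HashSet<>());
    }

    private static boolean deepEquals(Object a, Object b, Set<IdentityPair> inProgress) {
        if (a == b) return true;
        if (a == null || b == null || a.getClass() != b.getClass()) return false;
        if (hasCustomEquals(a.getClass())) return a.equals(b); // defer to the class's own equals
        // Pair already being compared higher up the stack means a cycle: assume equal here
        // and let the remaining fields decide.
        if (!inProgress.add(new IdentityPair(a, b))) return true;
        if (a.getClass().isArray()) {
            int n = Array.getLength(a);
            if (n != Array.getLength(b)) return false;
            for (int i = 0; i < n; i++) {
                if (!deepEquals(Array.get(a, i), Array.get(b, i), inProgress)) return false;
            }
            return true;
        }
        // Compare the instance fields declared by the class and all its superclasses.
        for (Class<?> c = a.getClass(); c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers())) continue;
                f.setAccessible(true);
                try {
                    if (!deepEquals(f.get(a), f.get(b), inProgress)) return false;
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return true;
    }

    private static boolean hasCustomEquals(Class<?> c) {
        try {
            return c.getMethod("equals", Object.class).getDeclaringClass() != Object.class;
        } catch (NoSuchMethodException e) {
            return false; // not expected: every class inherits the public Object.equals
        }
    }

    // Pair compared by identity, so the visited set never calls user equals/hashCode.
    private static final class IdentityPair {
        private final Object a;
        private final Object b;

        IdentityPair(Object a, Object b) {
            this.a = a;
            this.b = b;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof IdentityPair && ((IdentityPair) o).a == a && ((IdentityPair) o).b == b;
        }

        @Override
        public int hashCode() {
            return 31 * System.identityHashCode(a) + System.identityHashCode(b);
        }
    }
}

// Example type with no equals override that can form cycles (A->B->A).
class Node {
    String name;
    Node next;

    Node(String name) {
        this.name = name;
    }
}
```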

Unit tests should test a single, well-defined thing. This means that, in the end, there should be a single, well-defined thing that can differ between those two objects. If there are too many things that can differ, I would suggest splitting this test into several smaller tests.

I followed the same track you are on. I also had additional troubles:
we can't modify classes (for equals or toString) that we don't own (JDK classes, arrays, etc.)
equality is sometimes different in various contexts
For example, tracking entity equality might rely on database ids when available (the "same row" concept), or on the equality of some fields (the business key) for unsaved objects. For JUnit assertions, you might want equality of all fields.
So I ended up creating objects that run through a graph, doing their job as they go.
There is typically a superclass Crawling object:
crawl through all properties of the objects; stop at:
enums,
framework classes (if applicable),
at unloaded proxies or distant connections,
at objects already visited (to avoid looping)
at many-to-one relationships, if they indicate a parent (usually not included in the equals semantics)
...
configurable so that it can stop at some point (stop completely, or stop crawling inside the current property):
when mustStopCurrent() or mustStopCompletely() methods return true,
when encountering some annotations on a getter or a class,
when the current (class, getter) belong to a list of exceptions
...
From that Crawling superclass, subclasses are made for many needs:
For creating a debug string (calling toString as needed, with special cases for collections and arrays that don't have a nice toString; handling a size limit, and much more).
For creating several Equalizers (as said before, for Entities using ids, for all fields, or solely based on equals ;). These equalizers often need special cases also (for example for classes outside your control).
Back to the question: these Equalizers could remember the path to the differing values, which would be very useful in your JUnit case to understand the difference.
For creating Orderers. For example, saving entities needs to be done in a specific order, and efficiency dictates that saving instances of the same class together gives a huge boost.
For collecting a set of objects that can be found at various levels in the graph. Looping on the result of the Collector is then very easy.
As a complement, I must say that, except for entities where performance is a real concern, I chose that technique to implement toString(), hashCode(), equals() and compareTo() on my entities.
For example, if a business key on one or more fields is defined in Hibernate via a @UniqueConstraint on the class, let's pretend that all my entities have a getIdent() property implemented in a common superclass.
My entities superclass has a default implementation of these 4 methods that relies on this knowledge, for example (nulls need to be taken care of):
toString() prints "myClass(key1=value1, key2=value2)"
hashCode() is "value1.hashCode() ^ value2.hashCode()"
equals() is "value1.equals(other.value1) && value2.equals(other.value2)"
compareTo() combines the comparison of the class, value1 and value2.
For entities where performance is a concern, I simply override these methods to not use reflection. I can test in regression JUnit tests that the two implementations behave identically.
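A reflection-free sketch of such a superclass, under stated assumptions: BaseEntity, businessKey() and Customer are hypothetical names, each subclass simply returns its business-key values in a fixed order, compareTo() is omitted, and Arrays.hashCode is swapped in for the plain XOR described above because it is null-safe and order-sensitive:

```java
import java.util.Arrays;

// Hypothetical superclass: subclasses expose their business key values in a fixed order.
abstract class BaseEntity {
    protected abstract Object[] businessKey();

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || o.getClass() != getClass()) return false;
        // Arrays.equals compares element by element and treats null == null
        return Arrays.equals(businessKey(), ((BaseEntity) o).businessKey());
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(businessKey()); // null-safe, unlike value1.hashCode() ^ ...
    }

    @Override
    public String toString() {
        return getClass().getSimpleName() + Arrays.toString(businessKey()); // e.g. Customer[FR, 1]
    }
}

final class Customer extends BaseEntity {
    private final String country;
    private final String taxId;

    Customer(String country, String taxId) {
        this.country = country;
        this.taxId = taxId;
    }

    @Override
    protected Object[] businessKey() {
        return new Object[] { country, taxId };
    }
}
```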

We use a library called junitx to test the equals contract on all of our "common" objects:
http://www.extreme-java.de/junitx/
The only way I can think of to test the different parts of your equals() method is to break down the information into something more granular. If you are testing a deeply-nested tree of objects, what you are doing is not truly a unit test. You need to test the equals() contract on each individual object in the graph with a separate test case for that type of object. You can use stub objects with a simplistic equals() implementation for the class-typed fields on the object under test.
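As a library-free illustration of such a per-class contract test (EqualsContract and check are hypothetical names, not junitx's API), three instances that should be equal and one that should not can be run through the core equals/hashCode rules, in both directions:

```java
// Hypothetical helper checking the basic equals/hashCode contract for three instances
// that are supposed to be equal (x, y, z) and one that is not (other).
final class EqualsContract {
    private EqualsContract() {}

    static void check(Object x, Object y, Object z, Object other) {
        require(x.equals(x), "reflexive");
        require(x.equals(y) && y.equals(x), "symmetric");
        require(y.equals(z) && x.equals(z), "transitive");
        require(!x.equals(null), "not equal to null");
        require(x.hashCode() == y.hashCode() && y.hashCode() == z.hashCode(),
                "equal objects must have equal hash codes");
        require(!x.equals(other) && !other.equals(x), "unequal instance differs in both directions");
    }

    private static void require(boolean condition, String property) {
        if (!condition) {
            throw new AssertionError("equals contract violated: " + property);
        }
    }
}
```

Each class in the graph gets its own test built on a helper like this, so a failure names the class and the broken property.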
HTH

I would not use toString() because, as you say, it is usually more useful for creating a readable representation of the object for display or logging purposes.
It sounds to me like your "unit" test is not isolating the unit under test. If, for example, your object graph is A-->B-->C and you are testing A, your unit test for A should not care whether the equals() method in C is working. Your unit test for C would make sure it works.
So I would test the following in the test for A's equals() method:
- compare two A objects that have identical B's, in both directions, e.g. a1.equals(a2) and a2.equals(a1).
- compare two A objects that have different B's, in both directions
By doing it this way, with a JUnit assert for each comparison, you will know where the failure is.
Obviously, if your class has more children that are part of determining equality, you would need to test many more combinations. What I'm trying to get at, though, is that your unit test should not care about the behavior of anything beyond the classes it has direct contact with. In my example, that means you would assume C.equals() works correctly.
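The test plan above can be sketched in plain Java so that it runs without JUnit; in a real suite each check(...) would be its own JUnit assertTrue or assertFalse. The classes A, B and C and the helper AEqualsTest are hypothetical stand-ins for the graph described in the answer.

```java
import java.util.Objects;

// Hypothetical classes for the A --> B --> C graph.
final class C {
    final int value;
    C(int value) { this.value = value; }
    @Override public boolean equals(Object o) { return o instanceof C && ((C) o).value == value; }
    @Override public int hashCode() { return Integer.hashCode(value); }
}

final class B {
    final C c;
    B(C c) { this.c = c; }
    @Override public boolean equals(Object o) { return o instanceof B && Objects.equals(((B) o).c, c); }
    @Override public int hashCode() { return Objects.hashCode(c); }
}

final class A {
    final B b;
    A(B b) { this.b = b; }
    @Override public boolean equals(Object o) { return o instanceof A && Objects.equals(((A) o).b, b); }
    @Override public int hashCode() { return Objects.hashCode(b); }
}

// Tests A.equals() only: one check per comparison, in both directions.
final class AEqualsTest {
    static void run() {
        B sharedB = new B(new C(1));
        A a1 = new A(sharedB);
        A a2 = new A(sharedB); // identical B (same instance), so C.equals() is not exercised
        check(a1.equals(a2), "a1 equals a2");
        check(a2.equals(a1), "a2 equals a1");

        A a3 = new A(new B(new C(2))); // different B
        check(!a1.equals(a3), "a1 differs from a3");
        check(!a3.equals(a1), "a3 differs from a1");
    }

    private static void check(boolean ok, String what) {
        if (!ok) throw new AssertionError("failed: " + what);
    }
}
```

Because each comparison has its own labelled check, a failure message names the exact direction and case that broke.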
One wrinkle may be if you are comparing collections. In that case I would use a utility for comparing collections, such as commons-collections CollectionUtils.isEqualCollection(). Of course, only for collections in your unit under test.
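If commons-collections is not on the classpath, an order-insensitive comparison in the same spirit as CollectionUtils.isEqualCollection() (same elements, same number of occurrences) can be written with the JDK alone. This is a substitute sketch, not the commons-collections code, and Bags.sameElements is a hypothetical name; it relies on the elements' own equals() and hashCode().

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Order-insensitive comparison: true when both collections contain the same elements
// with the same cardinalities (a multiset comparison).
final class Bags {
    static boolean sameElements(Collection<?> a, Collection<?> b) {
        if (a.size() != b.size()) return false;
        Map<Object, Integer> counts = new HashMap<>(); // HashMap tolerates null elements as keys
        for (Object x : a) counts.merge(x, 1, Integer::sum);
        for (Object y : b) {
            Integer n = counts.get(y);
            if (n == null) return false;          // y occurs more often in b than in a
            if (n == 1) counts.remove(y); else counts.put(y, n - 1);
        }
        return counts.isEmpty();
    }
}
```

For example, [1, 2, 2] and [2, 1, 2] match, while [1, 2, 2] and [1, 1, 2] do not.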

If you're willing to write your tests in Scala, you could use matchete. It is a collection of matchers that can be used with JUnit and provides, among other things, the ability to compare object graphs:
case class Person(name: String, age: Int, address: Address)
case class Address(street: String)
Person("john",12, Address("rue de la paix")) must_== Person("john",12,Address("rue du bourg"))
This will produce the following error message:
org.junit.ComparisonFailure: Person(john,12,Address(rue de la paix)) is not equal to Person(john,12,Address(rue du bourg))
Got : address.street = 'rue de la paix'
Expected : address.street = 'rue du bourg'
As you can see, I've been using case classes, which matchete recognizes in order to dive into the object graph.
This is done through a type class called Diffable. I'm not going to discuss type classes here; let's just say it is the cornerstone of this mechanism, which compares two instances of a given type. Types that are not case classes (so basically all Java types) get a default Diffable that uses equals. This isn't very useful unless you provide a Diffable for your particular type:
// your java object
public class Person {
public String name;
public Address address;
}
// your Scala test code
implicit val personDiffable : Diffable[Person] = Diffable.forFields(_.name,_.address)
// there you go: you can now compare two Persons exactly the way you did
// with the case classes
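On the Java side, the same idea of naming the fields that take part in a comparison can be approximated without Scala. The sketch below is not matchete's API and does not reproduce how Diffable works internally; FieldDiff and JPerson are hypothetical names, and each selected field is compared with Objects.equals().

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical field-selecting comparator: the caller names each field and how to read it,
// and diff() reports the fields whose values differ, in registration order.
final class FieldDiff<T> {
    private final Map<String, Function<T, ?>> fields = new LinkedHashMap<>();

    FieldDiff<T> field(String name, Function<T, ?> getter) {
        fields.put(name, getter);
        return this;
    }

    List<String> diff(T expected, T actual) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Function<T, ?>> f : fields.entrySet()) {
            Object e = f.getValue().apply(expected);
            Object a = f.getValue().apply(actual);
            if (!Objects.equals(e, a)) {
                out.add(f.getKey() + ": expected '" + e + "' but got '" + a + "'");
            }
        }
        return out;
    }
}

// Hypothetical plain Java class with public fields.
class JPerson {
    public String name;
    public String city;

    JPerson(String name, String city) {
        this.name = name;
        this.city = city;
    }
}
```

Usage: new FieldDiff<JPerson>().field("name", p -> p.name).field("city", p -> p.city) builds a comparator that reports only the selected fields.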
So we've seen that matchete works well with a Java code base. As a matter of fact, I used matchete at my last job on a large Java project.
Disclaimer: I'm the matchete author :)


Java, Object.hashCode() result constant across all JVMs/Systems? - java

No, the result of hashCode() is only guaranteed to be constant during a single execution. You should not expect it to be the same between executions, let alone between JRE versions or platforms.
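Classes whose Javadoc fixes the algorithm are the exception to this rule. String.hashCode() is specified as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] in int arithmetic, so its value can be recomputed by hand, whereas Object.hashCode() carries no such guarantee. StringHash.specifiedHash is a hypothetical helper used only for illustration.

```java
// Recomputes String.hashCode() from the formula documented in its Javadoc:
// s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], evaluated with int overflow.
final class StringHash {
    static int specifiedHash(String s) {
        int h = 0; // the empty string hashes to 0
        for (int i = 0; i < s.length(); i++) {
            // Horner's rule: multiply the running total by 31, then add the next char.
            h = 31 * h + s.charAt(i);
        }
        return h;
    }
}
```

On any compliant JVM, "test".hashCode() evaluates to 3556498, while new Object().hashCode() may print a different value on every run.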
