Using UUIDs for cheap equals() and hashCode()

I have an immutable class, TokenList, which consists of a list of Token objects, which are also immutable:
@Immutable
public final class TokenList {
    private final List<Token> tokens;
    public TokenList(List<Token> tokens) {
        this.tokens = Collections.unmodifiableList(new ArrayList<>(tokens));
    }
    public List<Token> getTokens() {
        return tokens;
    }
}
I do several operations on these TokenLists that take multiple TokenLists as inputs and return a single TokenList as the output. There can be arbitrarily many TokenLists going in, and each can have arbitrarily many Tokens.
These operations are expensive, and there is a good chance that the same operation (i.e. the same inputs) will be performed multiple times, so I would like to cache the outputs. However, performance is critical, and I am worried about the expense of performing hashCode() and equals() on objects that may contain arbitrarily many elements (since they are immutable, hashCode could be cached, but equals would still be expensive).
This led me to wonder whether I could use a UUID to provide equals() and hashCode() simply and cheaply by making the following updates to TokenList:
@Immutable
public final class TokenList {
    private final List<Token> tokens;
    private final UUID uuid;
    public TokenList(List<Token> tokens) {
        this.tokens = Collections.unmodifiableList(new ArrayList<>(tokens));
        this.uuid = UUID.randomUUID();
    }
    public List<Token> getTokens() {
        return tokens;
    }
    public UUID getUuid() {
        return uuid;
    }
}
And something like this to act as a cache key:
@Immutable
public final class TokenListCacheKey {
    private final UUID[] uuids;
    public TokenListCacheKey(TokenList... tokenLists) {
        uuids = new UUID[tokenLists.length];
        for (int i = 0; i < uuids.length; i++) {
            uuids[i] = tokenLists[i].getUuid();
        }
    }
    @Override
    public int hashCode() {
        return Arrays.hashCode(uuids);
    }
    @Override
    public boolean equals(Object other) {
        if (other == this) return true;
        if (other instanceof TokenListCacheKey)
            return Arrays.equals(uuids, ((TokenListCacheKey) other).uuids);
        return false;
    }
}
I figure that there are 2^122 possible random UUIDs (a version 4 UUID has 122 random bits) and I will probably have at most around 1,000,000 TokenList objects active in the application at any time. Given this, and the fact that the UUIDs are used combinatorially in cache keys, it seems that the chance of this producing a wrong result is vanishingly small. Nevertheless, I feel uneasy about going ahead with it, as it just feels 'dirty'. Are there any reasons I should not use this system? Will the performance cost of the SecureRandom used by UUID.randomUUID() outweigh the gains (especially since I expect multiple threads to be doing this at the same time)? Are collisions going to be more likely than I think? Basically, is there anything wrong with doing it this way?
Thanks.

What you are trying to do is tricky and needs careful analysis, so answer the questions below before deciding on an approach.
These operations are expensive, and there is a good chance that the same operation (i.e. the same inputs) will be performed multiple times
1) When you say "same input" in the line above, what exactly do you mean? The exact same object, i.e. one object referred to through several references (the same memory location)? Or separate objects in memory that hold logically the same data?
If it is the same object, i.e. the same memory location, then an == comparison will do. For this you have to keep the object reference itself as the key in the cache.
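For the same-object case, a minimal sketch of such a reference-based key might look like this. The class name IdentityCacheKey is hypothetical; it compares its inputs with == and hashes them with System.identityHashCode, so it never looks inside the TokenLists. Note that it keeps strong references to its inputs for as long as the cache entry lives.

```java
/** Hypothetical cache key that compares its inputs by reference (==), never by content. */
public final class IdentityCacheKey {
    private final Object[] inputs;
    private final int hash;

    public IdentityCacheKey(Object... inputs) {
        this.inputs = inputs.clone(); // defensive copy of the varargs array
        int h = 1;
        for (Object o : this.inputs) {
            h = 31 * h + System.identityHashCode(o); // identity-based, cost depends only on the number of inputs
        }
        this.hash = h;
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof IdentityCacheKey)) {
            return false;
        }
        Object[] theirs = ((IdentityCacheKey) other).inputs;
        if (theirs.length != inputs.length) {
            return false;
        }
        for (int i = 0; i < inputs.length; i++) {
            if (inputs[i] != theirs[i]) { // reference comparison only
                return false;
            }
        }
        return true;
    }
}
```

This only helps if callers really pass the same TokenList instances again; two logically equal but separate lists produce different keys.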
But if it's the second case, i.e. separate objects in memory that are logically the same, then I don't think a UUID will help you, because you would have to make sure that two such separate objects get the same UUID. That won't be easy, since you would have to go through the whole TokenList data anyway to ensure it.
2) Is it safe to use the hash code in the cache? I suggest not using the hash code as the key, because two different objects may have the same hash code, and your logic could then go horribly wrong.
So get clear answers to these questions first, and only then think about the approach.

SecureRandom won't give you any boost; it is just "more" random than a normal Random. The chance of a collision is on the order of the number of UUIDs squared divided by the total number of possible UUIDs (the birthday bound, roughly n²/2N), so a very small number. Still, I wouldn't rely on the number always being unique. You could try this, but it would be best to check that a newly generated UUID isn't already in use somewhere else. Otherwise, you might get yourself into some very weird problems...
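To put a rough number on that collision estimate, the sketch below (class and method names are hypothetical) applies the standard birthday approximation p ≈ 1 - exp(-n(n-1)/2N). It assumes version 4 UUIDs from UUID.randomUUID(), which have 122 random bits, and the roughly 1,000,000 live objects mentioned in the question.

```java
/** Hypothetical helper: birthday-bound estimate of a collision among n random values of the given bit width. */
public final class CollisionOdds {

    static double collisionProbability(double n, int bits) {
        double space = Math.pow(2, bits);              // N, the number of possible values
        double exponent = -n * (n - 1) / (2 * space);  // -n(n-1)/2N
        return -Math.expm1(exponent);                  // 1 - e^x, computed accurately for tiny x
    }

    public static void main(String[] args) {
        // Version 4 UUIDs: 128 bits, of which 6 are fixed version/variant bits, leaving 122 random bits.
        System.out.println(collisionProbability(1_000_000, 122)); // prints roughly 9.4E-26
    }
}
```

With these numbers the estimate is around 9.4e-26: astronomically unlikely, though not impossible.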

Related

How to implement Comparable so it is consistent with identity-equality

I have a class for which equality (as per equals()) must be defined by the object identity, i.e. this == other.
I want to implement Comparable to order such objects (say by some getName() property). To be consistent with equals(), compareTo() must not return 0, even if two objects have the same name.
Is there a way to compare object identities in the sense of compareTo? I could compare System.identityHashCode(o), but that would still return 0 in case of hash collisions.

I think the real answer here is: then don't implement Comparable. Implementing this interface implies that your objects have a natural order, and if you follow that thought through, things that are "equal" should end up in the same place.
If anything, you should use a custom comparator... but even that doesn't make much sense. If the thing that defines a < b is not allowed to give you a == b (when a and b are "equal" according to your < relation), then the whole idea of comparing is broken for your use case.
In other words: just because you can put code into a class that "somehow" results in what you want ... doesn't make it a good idea to do so.

If you assign each object a universally unique identifier (UUID), also called a globally unique identifier (GUID), as its identity property, that UUID is comparable, and its ordering is consistent with equals. Java already has a UUID class, and once generated, you can just use the string representation for persistence. The dedicated property will also ensure that the identity is stable across versions/threads/machines. You could also just use an incrementing ID if you have a way of ensuring everything gets a unique ID, but using a standard UUID implementation will protect you from problems when sets are merged or when parallel systems generate data at the same time.
If you use anything else for the comparison, that means the object is comparable in a way separate from its identity/value. So you will need to define what comparable means for this object, and document that. For example, people are comparable by name, DOB, height, or a combination in order of precedence; most naturally by name as a convention (for easier lookup by humans), which is separate from whether two people are the same person. You will also have to accept that compareTo and equals are disjoint, because they are based on different things.
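A minimal sketch of the UUID-as-identity idea, assuming a hypothetical Entity class: equals, hashCode and compareTo all delegate to the UUID, and java.util.UUID's own compareTo is consistent with its equals, so the ordering stays consistent with equality. The toString()/UUID.fromString() round trip stands in for persistence.

```java
import java.util.UUID;

/** Hypothetical entity whose identity is a UUID assigned once, at creation. */
public final class Entity implements Comparable<Entity> {
    private final UUID id;

    public Entity() {
        this(UUID.randomUUID()); // brand-new identity
    }

    public Entity(UUID id) {
        this.id = id; // e.g. an identity restored from UUID.fromString(storedString)
    }

    public UUID getId() {
        return id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Entity && id.equals(((Entity) o).id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }

    @Override
    public int compareTo(Entity other) {
        return id.compareTo(other.id); // returns 0 exactly when the ids are equal
    }

    @Override
    public String toString() {
        return id.toString(); // string form suitable for persistence
    }
}
```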

You could add a second property (say int id or long id) which would be unique for each instance of your class (you can have a static counter variable and use it to initialize the id in your constructor).
Then your compareTo method can first compare the names, and if the names are equal, compare the ids.
Since each instance has a different id, compareTo will never return 0.
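A sketch of this counter-based tie-breaker, using a hypothetical Person class. It swaps the plain static int counter for an AtomicLong so that ids stay unique when instances are created from several threads; equals and hashCode are left as Object's identity versions, which is what the question asks for.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical class ordered by name, with ties broken by a unique per-instance id. */
public final class Person implements Comparable<Person> {
    private static final AtomicLong NEXT_ID = new AtomicLong(); // thread-safe instance counter

    private final long id = NEXT_ID.getAndIncrement();
    private final String name;

    public Person(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    @Override
    public int compareTo(Person other) {
        int byName = name.compareTo(other.name);
        if (byName != 0) {
            return byName;
        }
        return Long.compare(id, other.id); // 0 only when other is this same instance
    }
    // equals()/hashCode() are inherited from Object (identity), matching compareTo returning 0 only for this == other.
}
```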

While I stick by my original answer that you should use a UUID property for a stable and consistent compare/equality setup, I figured I'd go ahead and answer the question of "how far could you go if you were REALLY paranoid and wanted a guaranteed unique identity for Comparable".
Basically, in short: if you don't trust UUID uniqueness or identity-hash uniqueness, just use as many UUIDs as it takes to prove god is actively conspiring against you. (Note that while it is not technically guaranteed never to throw an exception, needing a second UUID should be overkill in any sane universe.)
import java.util.ArrayList;
import java.util.UUID;
public class Test implements Comparable<Test> {
    private final UUID antiCollisionProp = UUID.randomUUID();
    private final ArrayList<UUID> antiuniverseProp = new ArrayList<UUID>();
    private UUID getParanoiaLevelId(int i) {
        // Lazily generate UUIDs until index i exists (<=, otherwise get(i) is out of bounds)
        while (antiuniverseProp.size() <= i) {
            antiuniverseProp.add(UUID.randomUUID());
        }
        return antiuniverseProp.get(i);
    }
    @Override
    public int compareTo(Test o) {
        if (this == o)
            return 0;
        // Integer.compare instead of subtraction, which can overflow and flip the sign
        int temp = Integer.compare(System.identityHashCode(this), System.identityHashCode(o));
        if (temp != 0)
            return temp;
        // If the universe hates you
        temp = this.antiCollisionProp.compareTo(o.antiCollisionProp);
        if (temp != 0)
            return temp;
        // If the universe is actively out to get you
        temp = Integer.compare(System.identityHashCode(this.antiCollisionProp), System.identityHashCode(o.antiCollisionProp));
        if (temp != 0)
            return temp;
        for (int i = 0; i < Integer.MAX_VALUE; i++) {
            UUID id1 = this.getParanoiaLevelId(i);
            UUID id2 = o.getParanoiaLevelId(i);
            temp = id1.compareTo(id2);
            if (temp != 0)
                return temp;
            temp = Integer.compare(System.identityHashCode(id1), System.identityHashCode(id2));
            if (temp != 0)
                return temp;
        }
        // If you reach this point, I have no idea what you did to deserve this
        throw new IllegalStateException("RAGNAROK HAS COME! THE MIDGARD SERPENT AWAKENS!");
    }
}

Assume that for two objects with the same name, if equals() returns false then compareTo() should not return 0. If that is what you want, the following can help:
Override hashCode() and make sure it doesn't rely solely on the name
Implement compareTo() as follows:
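public int compareTo(MyObject object) {
    int byName = this.getName().compareTo(object.getName());
    // Same name but not equal: break the tie by hashCode (still 0 on a hash collision)
    return (byName != 0 || this.equals(object)) ? byName : Integer.compare(this.hashCode(), object.hashCode());
}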

Your objects are unique, but as Eran said, you may need an extra counter or rehash for any collisions.
private static Set<Pair<C, C>> collisions = ...;
@Override
public boolean equals(Object other) {
    return this == other;
}
@Override
public int compareTo(C other) {
    ...
    if (this == other) {
        return 0;
    }
    if (super.equals(other)) {
        // Some stable order would be fine: return either -1 or 1,
        // but the reverse comparison must return the opposite sign
        if (collisions.contains(new Pair<>(other, this))) {
            return -1;
        } else if (!collisions.contains(new Pair<>(this, other))) {
            collisions.add(new Pair<>(this, other));
        }
        return 1;
    }
    ...
}
So go with Eran's answer, or state the requirement as such in the question.
One might consider the cost of 0 comparisons between non-identical objects negligible.
One might also look into perfect hash functions, if at some point no more instances are created. This implies you have a collection of all instances.

There are times (although rare) when it is necessary to implement an identity-based compareTo override. In my case, I was implementing java.util.concurrent.Delayed.
Since the JDK also implements this class, I thought I would share the JDK's solution, which uses an atomically incrementing sequence number. Here is a snippet from ScheduledThreadPoolExecutor (slightly modified for clarity):
/**
* Sequence number to break scheduling ties, and in turn to
* guarantee FIFO order among tied entries.
*/
private static final AtomicLong sequencer = new AtomicLong();
private class ScheduledFutureTask<V>
extends FutureTask<V> implements RunnableScheduledFuture<V> {
/** Sequence number to break ties FIFO */
private final long sequenceNumber = sequencer.getAndIncrement();
}
If all the other fields used in compareTo are tied, this sequenceNumber value is used to break the tie. The range of a 64-bit integer (long) is large enough to count on this.
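To show how such a tie-breaker can fit into a full Delayed implementation, here is a hypothetical DelayedTask modeled loosely on the approach described above (it is not the JDK's code). Tasks are ordered by their nanoTime-based deadline, and equal deadlines fall back to the sequence number, so compareTo returns 0 only for the same instance.

```java
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical Delayed task: ordered by trigger time, ties broken FIFO by a sequence number. */
public final class DelayedTask implements Delayed {
    private static final AtomicLong SEQUENCER = new AtomicLong();

    private final long triggerNanos;                                  // absolute deadline in System.nanoTime() terms
    private final long sequenceNumber = SEQUENCER.getAndIncrement();  // unique per instance

    DelayedTask(long triggerNanos) {
        this.triggerNanos = triggerNanos;
    }

    public static DelayedTask after(long delay, TimeUnit unit) {
        return new DelayedTask(System.nanoTime() + unit.toNanos(delay));
    }

    @Override
    public long getDelay(TimeUnit unit) {
        return unit.convert(triggerNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
        if (other == this) {
            return 0;
        }
        if (other instanceof DelayedTask) {
            DelayedTask x = (DelayedTask) other;
            long diff = triggerNanos - x.triggerNanos; // nanoTime values are compared via their difference
            if (diff != 0) {
                return diff < 0 ? -1 : 1;
            }
            return Long.compare(sequenceNumber, x.sequenceNumber); // tie: earlier-created task first
        }
        return Long.signum(getDelay(TimeUnit.NANOSECONDS) - other.getDelay(TimeUnit.NANOSECONDS));
    }
}
```

Instances like these could be placed in a java.util.concurrent.DelayQueue, which relies on this ordering.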

How to maintain several collections where an object can only exist in one?

I have a class and an enum that look something like this:
class Container{
static int next_id = 0;
final int id = next_id++;
State state = State.one;
}
enum State{
one, two, three, four, five;
}
I want to maintain several collections of Container, but ensure that any instance of Container is present in only one collection. The collections need to be thread-safe, and I cannot store the Container directly in a hash-based collection, as its hash will change based on its current state.
--edit--
To further clarify, the goal of this is to be able to retrieve all Containers that are in a given state, without having to inspect every single container's state as there are several thousand containers.

You won't be able to ensure that without writing some code yourself, as maintaining consistency, i.e. a Container belonging to exactly one collection depending on its state, requires not only the collections' but also the members' support.
Disregarding that the target language is Java, I would probably suggest linked lists, something like this:
class Container{
static int next_id = 0;
final int id = next_id++;
final Node<Container> node;
State state;
public Container() {
node = new Node<>(this);
setState(State.one);
}
public void setState(State state) {
if(this.state != null) node.unlink();
this.state = state;
if(this.state != null) this.state.containers.add(node);
}
}
enum State{
one, two, three, four, five;
final List<Container> containers = ...;
}
This code only works efficiently because, knowing the node, an element can be unlinked from the linked list in O(1) (add is of course O(1) as well). Java's default linked list (java.util.LinkedList) does not expose its nodes, so every removal requires a linear search, which is inefficient.
In light of this, the next best thing is a hash-based approach. Some of the code can still remain:
Setting the state is somewhat complex, so the logic should be put in a setter, not in the client doing container.state = ...
As long as the collections of containers are application-wide (hint: in a managed environment, like an application server, they hardly are!), they can be maintained directly by the enum. Otherwise, give the containers some kind of Context that holds an EnumMap<State, Map<Integer, Container>> or similar.
The result:
class Container{
static int next_id = 0;
final int id = next_id++;
State state;
public Container() {
setState(State.one);
}
public void setState(State state) {
if(this.state != null) this.state.containers.remove(id);
this.state = state;
if(this.state != null) this.state.containers.put(id, this);
}
}
enum State{
one, two, three, four, five;
final Map<Integer, Container> containers = new HashMap<>();
}
By using the id as the key, you don't have to worry about your own hashCode and equals implementations.
Regarding visibility: you really, really want to make a bunch of this private, or at the very least package-private (I'm looking at the maps in State).
Regarding thread safety: I don't want to go out on a limb here (I'm no expert), but using ConcurrentHashMaps will probably do much of what you need. Also, synchronize setState; if you don't, two concurrent updates could insert the container into two maps. Finally, final int id = next_id++; is not thread-safe to my knowledge, because ++ is not atomic. You could use AtomicInteger here.
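Putting those three suggestions together, a thread-safe variant might look roughly like this. It is a sketch that assumes application-wide, enum-held maps are acceptable; the getId/getState accessors are additions for illustration. setState is synchronized, so two updates to the same container cannot interleave and leave it in two maps, although another thread may briefly see it in neither map during a transition.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

enum State {
    one, two, three, four, five;

    // Package-private, thread-safe index of the containers currently in this state (keyed by id).
    final Map<Integer, Container> containers = new ConcurrentHashMap<>();
}

class Container {
    private static final AtomicInteger NEXT_ID = new AtomicInteger(); // atomic replacement for next_id++

    private final int id = NEXT_ID.getAndIncrement();
    private State state;

    Container() {
        setState(State.one);
    }

    int getId() {
        return id;
    }

    synchronized State getState() {
        return state;
    }

    /** Moves this container out of its old state's map and into the new one. */
    synchronized void setState(State newState) {
        if (state != null) {
            state.containers.remove(id);
        }
        state = newState;
        if (state != null) {
            state.containers.put(id, this);
        }
    }
}
```

Retrieving all containers in a given state is then just State.two.containers.values(), with no need to inspect every container.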

Just override the hashCode() method to not factor in its State state so the hash would remain the same no matter the content of the state variable.
Even better, because your Container's id variable is what you are sorting by, and there exists only one Container per id, you could just do:
@Override
public int hashCode() {
return this.id;
}
If you don't want the hash codes to be linear, you could XOR the id with some constant; XOR with a fixed value is a bijection on int, so the result is still guaranteed unique per id:
@Override
public int hashCode() {
return (2807 ^ this.id);
}

You can probably get the effect that you have described (add it to the proper collection on state change and have it automatically disappear from the prior collection) by using HashMap for your collections. I say 'probably' because there may be other, unstated requirements.
As you note, when you update the state, the hashcode value will change. This means you will no longer be able to retrieve it from the HashMap that has had the prior state when it was put, and it effectively disappears from the prior HashMap. Whenever you update state you should put it in the HashMap relevant for the new state.
Your unit test would call the "update state" method with various state changes and validate that when you get from each collection that it is found in only one collection and it is the right one.

java - live view on collection contained within a collection contained within ... etc

I have a class A which can contain many instances of class B which may in turn contain many instances of Class C, which can contain many instances of class D
Now, in class A I have a method getAllD. Currently every time this is called there is a lot of iterating that takes place, and a rather large list is freshly created and returned. This cannot be very efficient.
I was wondering how I could do this better. This question Combine multiple Collections into a single logical Collection? seems to touch upon a similar topic, but I'm not really sure how I could apply it to my situation.
All comments are much appreciated!

I would combine Iterables.concat with Iterables.transform to obtain a live view of Ds:
public class A {
private Collection<B> bs;
/**
* @return a live concatenated view of the Ds contained in the Cs
* contained in the Bs contained in this A.
*/
public Iterable<D> getDs() {
Iterable<C> cs = Iterables.concat(Iterables.transform(bs, BToCsFunction.INSTANCE));
Iterable<D> ds = Iterables.concat(Iterables.transform(cs, CToDsFunction.INSTANCE));
return ds;
}
private enum BToCsFunction implements Function<B, Collection<C>> {
INSTANCE;
@Override
public Collection<C> apply(B b) {
return b.getCs();
}
}
private enum CToDsFunction implements Function<C, Collection<D>> {
INSTANCE;
@Override
public Collection<D> apply(C c) {
return c.getDs();
}
}
}
public class B {
private Collection<C> cs;
public Collection<C> getCs() {
return cs;
}
}
public class C {
private Collection<D> ds;
public Collection<D> getDs() {
return ds;
}
}
This works well if your goal is simply to iterate over the Ds and you don't really need a collection view. It avoids the instantiation of a big temporary collection.
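If Guava isn't available, a similar live view can be sketched with Java 8 streams (this is a swapped-in technique, not part of the answer above). The minimal A/B/C/D classes here are hypothetical stand-ins. Because the returned Iterable is a lambda, each for-each loop re-walks the current structure, so later additions are visible.

```java
import java.util.Collection;

// Hypothetical minimal versions of the question's classes.
class D {}

class C {
    private final Collection<D> ds;
    C(Collection<D> ds) { this.ds = ds; }
    Collection<D> getDs() { return ds; }
}

class B {
    private final Collection<C> cs;
    B(Collection<C> cs) { this.cs = cs; }
    Collection<C> getCs() { return cs; }
}

class A {
    private final Collection<B> bs;
    A(Collection<B> bs) { this.bs = bs; }

    /** Live view: every call to iterator() walks the current B -> C -> D structure again. */
    Iterable<D> getDs() {
        return () -> bs.stream()
                .flatMap(b -> b.getCs().stream())
                .flatMap(c -> c.getDs().stream())
                .iterator();
    }
}
```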

The answer to your question is going to depend on the specifics of your situation. Are these collections static or dynamic? How big is your collection of B's in A? Are you only going to access the Ds from A, or will you sometimes want to be farther down in the tree or returning Bs or Cs? How frequently are you going to want to access the same set of Ds from a particular A? Can a D (or C or B) be associated with more than 1 A?
If everything is dynamic, then the best chance of improving performance is to have parent references from the Cs to A, and then update the parent whenever a C's list of Ds changes. This way, you can keep a collection of Ds in your A object and update A whenever one of the Cs gets a new D or has one deleted.
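A rough sketch of the parent-reference idea, with hypothetical classes and method names (dAdded, dRemoved) and the B level left out for brevity: each C tells its root A about every D it gains or loses, so A can hand out its aggregate list without any iteration. Note that List.remove is linear, and nothing here is thread-safe.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: each C holds a reference to its top-level A and keeps A's flat list in sync.
class D {}

class A {
    private final List<D> allDs = new ArrayList<>();

    List<D> getAllD() {
        return Collections.unmodifiableList(allDs); // read-only view, no copying
    }

    void dAdded(D d) { allDs.add(d); }

    void dRemoved(D d) { allDs.remove(d); }
}

class C {
    private final A root; // parent reference (the B level is skipped for brevity)
    private final List<D> ds = new ArrayList<>();

    C(A root) { this.root = root; }

    void addD(D d) {
        ds.add(d);
        root.dAdded(d); // keep the aggregate in sync
    }

    void removeD(D d) {
        if (ds.remove(d)) {
            root.dRemoved(d);
        }
    }
}
```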
If everything is static and there is some reuse of the D collections from each A, then caching may be a good choice, particularly if there are a lot of Bs. A would have a map with a key of B and a value of a collection of Ds. The getAllDs() method would first check to see if the map had a key for B and if so return its collection of Ds. If not, then it would generate the collection, store it into the cache map, and return the collection.
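The caching variant could be sketched as follows, again with hypothetical stand-in classes. It reads the described getAllDs() as iterating over the Bs and using Map.computeIfAbsent for the check-then-populate step per B. As the answer says, this is only correct if the structure is static: once a B's Ds are cached, later changes under that B are not picked up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of per-B caching, valid only if the B -> C -> D structure never changes.
class D {}

class C {
    final List<D> ds = new ArrayList<>();
}

class B {
    final List<C> cs = new ArrayList<>();
}

class A {
    final List<B> bs = new ArrayList<>();
    private final Map<B, List<D>> cache = new HashMap<>(); // B -> flattened Ds under that B

    List<D> getAllDs() {
        List<D> all = new ArrayList<>();
        for (B b : bs) {
            // Reuse the cached list for b if present; otherwise build it, store it, and use it.
            all.addAll(cache.computeIfAbsent(b, A::collectDs));
        }
        return all;
    }

    private static List<D> collectDs(B b) {
        List<D> result = new ArrayList<>();
        for (C c : b.cs) {
            result.addAll(c.ds);
        }
        return result;
    }
}
```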
You could also use a tree to store the objects, particularly if they were fairly simple. For example, you could create an XML DOM object and use XPath expressions to pull out the subset of Ds that you wanted. This would allow far more dynamic access to the sets of objects you were interested in.
Each of these solutions has different tradeoffs in terms of cost to setup, cost to maintain, timeliness of results, flexibility of use, and cost to fetch results. Which you should choose is going to depend on your context.

Actually, I think Iterables.concat (or IteratorChain from Apache Commons) would work fine for your case:
import com.google.common.collect.Iterators;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

class A {
    Collection<B> children;
    Iterator<D> getAllD() {
        List<Iterator<D>> iters = new ArrayList<>();
        for (B child : children) {
            iters.add(child.getAllD());
        }
        // Guava's Iterators.concat chains iterators (Iterables.concat takes Iterables)
        return Iterators.concat(iters.iterator());
    }
}
class B {
    Collection<C> children;
    Iterator<D> getAllD() {
        List<Iterator<D>> iters = new ArrayList<>();
        for (C child : children) {
            iters.add(child.getAllD());
        }
        return Iterators.concat(iters.iterator());
    }
}
class C {
    Collection<D> children;
    Iterator<D> getAllD() {
        return children.iterator();
    }
}

This cannot be very efficient.
Iterating in memory is pretty damn fast. Also, creating one ArrayList of 10k elements won't be drastically more or less efficient than creating 10 ArrayLists of 1k elements each. So, in conclusion, you should probably first just go with the most straightforward iteration. Chances are that it works just fine.
Even if you have a gazillion elements, it is probably wise to implement straightforward iteration anyway for comparison. Otherwise you won't know whether you are actually optimizing or slowing things down by being clever.
Having said that, if you want to optimize sequential read access to all Ds, I'd maintain an "index" outside. The index could be a LinkedList, ArrayList, TreeList, etc., depending on your situation. For example, if you aren't sure of the length of the index, it is probably wise to avoid ArrayList. If you want to efficiently remove arbitrary elements by reference, an ordered set such as LinkedHashSet might be much better than a list, etc.
When you do this, you have to worry about the consistency between the index and the actual references in your classes, i.e. more complexity = more places for bugs to hide. So, unless performance testing shows it is necessary, it is really not advisable to attempt this optimization.
(By the way, avoiding instantiation of new collection objects is unlikely to make things much faster unless you are talking about EXTREMELY high-performance code. Object instantiation in modern JVMs takes only a few tens of nanoseconds or so. Also, you could mistakenly use an ArrayList with a small initial capacity or something and make things worse.)

Caching objects built with multiple parameters

I have a factory that creates objects of class MyClass, returning already generated ones when they exist. Since the creation method (getOrCreateMyClass) takes multiple parameters, what is the best way to use a Map to store and retrieve the objects?
My current solution is the following, but it doesn't sound too clear to me.
I use the hashCode method (slightly modified) of class MyClass to build an int based on the parameters of class MyClass, and I use it as the key of the Map.
import java.util.HashMap;
import java.util.Map;
public class MyClassFactory {
static Map<Integer, MyClass> cache = new HashMap<Integer, MyClass>();
private static class MyClass {
private String s;
private int i;
public MyClass(String s, int i) {
this.s = s;
this.i = i;
}
public static int getHashCode(String s, int i) {
final int prime = 31;
int result = 1;
result = prime * result + i;
result = prime * result + ((s == null) ? 0 : s.hashCode());
return result;
}
@Override
public int hashCode() {
return getHashCode(this.s, this.i);
}
}
public static MyClass getOrCreateMyClass(String s, int i) {
int hashCode = MyClass.getHashCode(s, i);
MyClass a = cache.get(hashCode);
if (a == null) {
a = new MyClass(s, i);
cache.put(hashCode , a);
}
return a;
}
}

Your getOrCreateMyClass doesn't seem to add to the cache if it creates.
I think this will also not perform correctly when hashcodes collide. Identical hashcodes do not imply equal objects. This could be the source of the bug you mentioned in a comment.
You might consider creating a generic Pair class with actual equals and hashCode methods and using Pair<String, Integer> class as the map key for your cache.
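A minimal sketch of such a key, using a hypothetical Key class for the (String, int) parameters. The point is that equals really compares both fields, so two parameter sets whose hash codes collide (for example, the strings "Aa" and "BB" both have hash code 2112) still map to different cache entries.

```java
import java.util.Objects;

/** Hypothetical immutable cache key for the (String, int) parameters of getOrCreateMyClass. */
public final class Key {
    private final String s;
    private final int i;

    public Key(String s, int i) {
        this.s = s;
        this.i = i;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof Key)) {
            return false;
        }
        Key other = (Key) o;
        return i == other.i && Objects.equals(s, other.s); // real equality, not just equal hashes
    }

    @Override
    public int hashCode() {
        return Objects.hash(s, i);
    }
}
```

The cache would then be a Map<Key, MyClass>, and getOrCreateMyClass could use cache.computeIfAbsent(new Key(s, i), k -> new MyClass(s, i)), which is still not thread-safe with a plain HashMap.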
Edit:
The issue of extra memory consumption by storing both a Pair<String, Integer> key and a MyClass value might be best dealt with by making the Pair<String, Integer> into a field of MyClass and thereby having only one reference to this object.
With all of this though, you might have to worry about threading issues that don't seem to be addressed yet, and which could be another source of bugs.
And whether it is actually a good idea at all depends on whether the creation of MyClass is much more expensive than the creation of the map key.
Another Edit:
ColinD's answer is also reasonable (and I've upvoted it), as long as the construction of MyClass is not expensive.
Another approach that might be worth consideration is to use a nested map Map<String, Map<Integer, MyClass>>, which would require a two-stage lookup and complicate the cache updating a bit.

You really shouldn't be using the hash code as the key in your map. A class's hash code is not guaranteed to differ between two non-equal instances of that class. Indeed, your hashCode method could definitely produce the same hash code for two non-equal instances. You do need to implement equals on MyClass to check that two instances of MyClass are equal based on the equality of the String and int they contain. I'd also recommend making the s and i fields final to provide a stronger guarantee of the immutability of each MyClass instance if you're going to be using it this way.
Beyond that, I think what you actually want here is an interner.... that is, something to guarantee that you'll only ever store at most 1 instance of a given MyClass in memory at a time. The correct solution to this is a Map<MyClass, MyClass>... more specifically a ConcurrentMap<MyClass, MyClass> if there's any chance of getOrCreateMyClass being called from multiple threads. Now, you do need to create a new instance of MyClass in order to check the cache when using this approach, but that's inevitable really... and it's not a big deal because MyClass is easy to create.
Guava has something that does all the work for you here: its Interner interface and corresponding Interners factory/utility class. Here's how you might use it to implement getOrCreateMyClass:
private static final Interner<MyClass> interner = Interners.newStrongInterner();
public static MyClass getOrCreateMyClass(String s, int i) {
return interner.intern(new MyClass(s, i));
}
Note that using a strong interner will, like your example code, keep each MyClass it holds in memory as long as the interner is in memory, regardless of whether anything else in the program has a reference to a given instance. If you use newWeakInterner instead, when there isn't anything elsewhere in your program using a given MyClass instance, that instance will be eligible for garbage collection, helping you not waste memory with instances you don't need around.
If you choose to do this yourself, you'll want to use a ConcurrentMap cache and putIfAbsent. I imagine you can look at the implementation of Guava's strong interner for reference... the weak-reference approach is much more complicated, though.
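A do-it-yourself strong interner along those lines might look like the following sketch (SimpleInterner is a hypothetical name, not Guava's class). ConcurrentMap.putIfAbsent is atomic, so when two threads intern equal values concurrently, both get back the same canonical instance. It assumes T has proper equals/hashCode and that values are never null, since ConcurrentHashMap rejects null keys and values.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical strong interner: keeps one canonical instance per equal value, forever. */
public final class SimpleInterner<T> {
    private final ConcurrentMap<T, T> canonical = new ConcurrentHashMap<>();

    public T intern(T sample) {
        // putIfAbsent is atomic: it returns the existing value, or null if sample was just stored.
        T existing = canonical.putIfAbsent(sample, sample);
        return existing != null ? existing : sample;
    }
}
```

getOrCreateMyClass could then be written as return INTERNER.intern(new MyClass(s, i)); with a static SimpleInterner<MyClass> INTERNER field.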

Is there a Java utility to do a deep comparison of two objects? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 2 years ago.
How to "deep"-compare two objects that do not implement the equals method based on their field values in a test?
Original Question (closed because lack of precision and thus not fulfilling SO standards), kept for documentation purposes:
I'm trying to write unit tests for a variety of clone() operations inside a large project and I'm wondering if there is an existing class somewhere that is capable of taking two objects of the same type, doing a deep comparison, and saying if they're identical or not?

Unitils has this functionality:
Equality assertion through reflection, with different options like ignoring Java default/null values and ignoring order of collections

I love this question! Mainly because it is hardly ever answered or answered badly. It's like nobody has figured it out yet. Virgin territory :)
First off, don't even think about using equals. The contract of equals, as defined in the javadoc, is an equivalence relation (reflexive, symmetric, and transitive), not an equality relation. For that, it would also have to be antisymmetric. The only implementation of equals that is (or ever could be) a true equality relation is the one in java.lang.Object. Even if you did use equals to compare everything in the graph, the risk of breaking the contract is quite high. As Josh Bloch pointed out in Effective Java, the contract of equals is very easy to break:
"There is simply no way to extend an instantiable class and add an aspect while preserving the equals contract"
Besides, what good does a boolean method really do you anyway? It'd be nice to actually encapsulate all the differences between the original and the clone, don't you think? Also, I'll assume here that you don't want to be bothered with writing/maintaining comparison code for each object in the graph, but rather you're looking for something that will scale with the source as it changes over time.
Soooo, what you really want is some kind of state comparison tool. How that tool is implemented is really dependent on the nature of your domain model and your performance restrictions. In my experience, there is no generic magic bullet. And it will be slow over a large number of iterations. But for testing the completeness of a clone operation, it'll do the job pretty well. Your two best options are serialization and reflection.
Some issues you will encounter:
Collection order: Should two collections be considered similar if they hold the same objects, but in a different order?
Which fields to ignore: Transient? Static?
Type equivalence: Should field values be of exactly the same type? Or is it ok for one to extend the other?
There's more, but I forget...
XStream is pretty fast and, combined with XMLUnit, will do the job in just a few lines of code. XMLUnit is nice because it can report all the differences, or just stop at the first one it finds. And its output includes the XPath to the differing nodes, which is nice. By default it doesn't allow unordered collections, but it can be configured to do so. Injecting a special difference handler (called a DifferenceListener) allows you to specify the way you want to deal with differences, including ignoring order. However, as soon as you want to do anything beyond the simplest customization, it becomes difficult to write and the details tend to be tied down to a specific domain object.
My personal preference is to use reflection to cycle through all the declared fields and drill down into each one, tracking differences as I go. Word of warning: Don't use recursion unless you like stack overflow exceptions. Keep things in scope with a stack (use a LinkedList or something). I usually ignore transient and static fields, and I skip object pairs that I've already compared, so I don't end up in infinite loops if someone decided to write self-referential code (However, I always compare primitive wrappers no matter what, since the same object refs are often reused). You can configure things up front to ignore collection ordering and to ignore special types or fields, but I like to define my state comparison policies on the fields themselves via annotations. This, IMHO, is exactly what annotations were meant for, to make meta data about the class available at runtime. Something like:
@StatePolicy(unordered=true, ignore=false, exactTypesOnly=true)
private List<StringyThing> _mylist;
I think this is actually a really hard problem, but totally solvable! And once you have something that works for you, it is really, really, handy :)
So, good luck. And if you come up with something that's just pure genius, don't forget to share!
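A minimal sketch of the reflection-based approach described in this answer, with hypothetical names: it walks fields with an explicit ArrayDeque stack instead of recursion, skips static and transient fields, remembers already-compared object pairs in an IdentityHashMap to avoid infinite loops, and returns the paths of all differences rather than a boolean. To keep it short, JDK types (Strings, boxed primitives, collections and so on) and enums are treated as leaves and compared with equals() rather than walked, which also avoids setAccessible failures on JDK internals under the module system; collection-order, annotation and type-equivalence policies are left out.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Minimal sketch: reflection-based deep comparison driven by an explicit stack instead of recursion. */
public final class DeepDiff {

    private static final class Task {
        final String path;
        final Object left;
        final Object right;

        Task(String path, Object left, Object right) {
            this.path = path;
            this.left = left;
            this.right = right;
        }
    }

    /** Returns the paths of all differing values; an empty list means "deeply equal". */
    public static List<String> diff(Object a, Object b) {
        List<String> differences = new ArrayList<>();
        Deque<Task> stack = new ArrayDeque<>();
        Map<Object, Object> visited = new IdentityHashMap<>(); // left object -> right object already compared
        stack.push(new Task("root", a, b));
        while (!stack.isEmpty()) {
            Task t = stack.pop();
            Object l = t.left;
            Object r = t.right;
            if (l == r) {
                continue; // same reference, or both null
            }
            if (l == null || r == null || l.getClass() != r.getClass()) {
                differences.add(t.path); // null vs non-null, or exact types differ
                continue;
            }
            Class<?> type = l.getClass();
            if (l instanceof Enum<?> || type.getName().startsWith("java.")) {
                if (!l.equals(r)) { // leaves: enums, Strings, boxed primitives, JDK collections...
                    differences.add(t.path);
                }
                continue;
            }
            if (visited.get(l) == r) {
                continue; // pair already compared: breaks cycles
            }
            visited.put(l, r);
            if (type.isArray()) {
                int length = Array.getLength(l);
                if (length != Array.getLength(r)) {
                    differences.add(t.path);
                    continue;
                }
                for (int i = 0; i < length; i++) {
                    stack.push(new Task(t.path + "[" + i + "]", Array.get(l, i), Array.get(r, i)));
                }
                continue;
            }
            // Walk declared fields up the class hierarchy, stopping at JDK superclasses.
            for (Class<?> c = type; c != null && !c.getName().startsWith("java."); c = c.getSuperclass()) {
                for (Field f : c.getDeclaredFields()) {
                    int mods = f.getModifiers();
                    if (Modifier.isStatic(mods) || Modifier.isTransient(mods)) {
                        continue; // ignore static and transient fields
                    }
                    f.setAccessible(true);
                    try {
                        stack.push(new Task(t.path + "." + f.getName(), f.get(l), f.get(r)));
                    } catch (IllegalAccessException e) {
                        throw new IllegalStateException(e);
                    }
                }
            }
        }
        return differences;
    }
}
```

For a clone test, an empty result from DeepDiff.diff(original, clone) would indicate that every reachable, non-transient field matched.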

In AssertJ, you can do:
Assertions.assertThat(actualObject).isEqualToComparingFieldByFieldRecursively(expectedObject);
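It probably won't work in all cases, but it will work in more cases than you'd think. (In newer AssertJ versions this method is deprecated in favor of assertThat(actual).usingRecursiveComparison().isEqualTo(expected).)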
Here's what the documentation says:
Assert that the object under test (actual) is equal to the given object based on recursive a property/field by property/field comparison (including inherited ones). This can be useful if actual's equals implementation does not suit you. The recursive property/field comparison is not applied on fields having a custom equals implementation, i.e. the overridden equals method will be used instead of a field by field comparison.
The recursive comparison handles cycles. By default floats are compared with a precision of 1.0E-6 and doubles with 1.0E-15.
You can specify a custom comparator per (nested) fields or type with respectively usingComparatorForFields(Comparator, String...) and usingComparatorForType(Comparator, Class).
The objects to compare can be of different types but must have the same properties/fields. For example if actual object has a name String field, it is expected the other object to also have one. If an object has a field and a property with the same name, the property value will be used over the field.

Override The equals() Method
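You can simply override the equals() method of the class using Apache Commons Lang's EqualsBuilder.reflectionEquals():
@Override
public boolean equals(Object obj) {
    return EqualsBuilder.reflectionEquals(this, obj);
}
If you do this, override hashCode() consistently as well, for example with HashCodeBuilder.reflectionHashCode(this).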

I just had to implement a comparison of two entity instances revised by Hibernate Envers. I started writing my own differ, but then found the following framework:
https://github.com/SQiShER/java-object-diff
You can compare two objects of the same type, and it will show changes, additions and removals. If there are no changes, then the objects are equal (in theory). Annotations are provided for getters that should be ignored during the check. The framework has far wider applications than equality checking; for example, I am using it to generate a change log.
Its performance is OK. When comparing JPA entities, be sure to detach them from the entity manager first.

I am using XStream:
/**
 * @see java.lang.Object#equals(java.lang.Object)
 */
@Override
public boolean equals(Object o) {
    XStream xstream = new XStream();
    String oxml = xstream.toXML(o);
    String myxml = xstream.toXML(this);
    return myxml.equals(oxml);
}
/**
 * @see java.lang.Object#hashCode()
 */
@Override
public int hashCode() {
    XStream xstream = new XStream();
    String myxml = xstream.toXML(this);
    return myxml.hashCode();
}

Unitils has reflection-based assertions; see http://www.unitils.org/tutorial-reflectionassert.html
import static org.unitils.reflectionassert.ReflectionAssert.assertReflectionEquals;
public class User {
    private long id;
    private String first;
    private String last;
    public User(long id, String first, String last) {
        this.id = id;
        this.first = first;
        this.last = last;
    }
}
User user1 = new User(1, "John", "Doe");
User user2 = new User(1, "John", "Doe");
assertReflectionEquals(user1, user2);

Hamcrest has the matcher samePropertyValuesAs, but it relies on the JavaBeans convention (it uses getters and setters). If the objects being compared do not have getters and setters for their attributes, this will not work.
import static org.hamcrest.beans.SamePropertyValuesAs.samePropertyValuesAs;
import static org.junit.Assert.assertThat;
import org.junit.Test;
public class UserTest {
    @Test
    public void comparesUsersByPropertyValues() {
        User user1 = new User(1, "John", "Doe");
        User user2 = new User(1, "John", "Doe");
        assertThat(user1, samePropertyValuesAs(user2)); // all good
        user2 = new User(1, "John", "Do");
        assertThat(user1, samePropertyValuesAs(user2)); // will fail
    }
}
The User bean, with getters and setters:
public class User {
private long id;
private String first;
private String last;
public User(long id, String first, String last) {
this.id = id;
this.first = first;
this.last = last;
}
public long getId() {
return id;
}
public void setId(long id) {
this.id = id;
}
public String getFirst() {
return first;
}
public void setFirst(String first) {
this.first = first;
}
public String getLast() {
return last;
}
public void setLast(String last) {
this.last = last;
}
}
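To see what "relies on the JavaBeans convention" means in practice, here is a rough sketch using the JDK's own java.beans.Introspector: read every property that has a getter and compare the values with Objects.equals. This illustrates the convention and is not Hamcrest's actual implementation; this sketch needs only getters, and the BeanCompare and Person names are hypothetical:

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Objects;

final class BeanCompare {
    // True if a and b have the same class and every readable property (getter) returns equal values.
    static boolean samePropertyValues(Object a, Object b) {
        if (a == b) return true;
        if (a == null || b == null || a.getClass() != b.getClass()) return false;
        try {
            // Stop at Object.class so the implicit "class" property (getClass()) is skipped.
            BeanInfo info = Introspector.getBeanInfo(a.getClass(), Object.class);
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                Method getter = pd.getReadMethod();
                if (getter == null) continue; // write-only property: nothing to read
                if (!Objects.equals(getter.invoke(a), getter.invoke(b))) return false;
            }
            return true;
        } catch (IntrospectionException | IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical bean: only state exposed through getters takes part in the comparison.
    public static class Person {
        private final String name;
        private final int age;
        private final String nickname; // no getter, so invisible to the check
        public Person(String name, int age, String nickname) {
            this.name = name;
            this.age = age;
            this.nickname = nickname;
        }
        public String getName() { return name; }
        public int getAge() { return age; }
    }
}
```

Note the consequence: two Person objects that differ only in nickname compare as equal, because that field has no getter.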

If your objects implement Serializable, you can use this:
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
// Serializes both objects and compares the raw bytes.
public static boolean deepCompare(Object o1, Object o2) {
    return Arrays.equals(serialize(o1), serialize(o2));
}
private static byte[] serialize(Object o) {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
        oos.writeObject(o);
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
    return baos.toByteArray();
}

Your Linked List example is not that difficult to handle. As the code traverses the two object graphs, it places visited objects in a Set or Map. Before traversing into another object reference, this set is tested to see if the object has already been traversed. If so, no need to go further.
I agree with the person above who said to use a LinkedList (like a Stack, but without synchronized methods, so it is faster). Traversing the object graph using a stack, while using reflection to get each field, is the ideal solution. Written once, this "external" equals() and "external" hashCode() is what all equals() and hashCode() methods should call. Never again do you need a custom equals() method.
I wrote a bit of code that traverses a complete object graph, listed over at Google Code. See json-io (http://code.google.com/p/json-io/). It serializes a Java object graph into JSON and deserializes it back. It handles all Java objects, with or without public constructors, Serializable or not, etc. This same traversal code will be the basis for the external "equals()" and external "hashCode()" implementation. Btw, the JsonReader / JsonWriter (json-io) is usually faster than the built-in ObjectInputStream / ObjectOutputStream.
This JsonReader / JsonWriter could be used for comparison, but it will not help with hashCode. If you want a universal hashCode() and equals(), it needs its own code. I may be able to pull this off with a generic graph visitor. We'll see.
Other considerations: static fields are easy. They can be skipped, because static fields are shared across all instances, so every instance sees the same values.
As for transient fields, that will be a selectable option: sometimes you may want transients to count, other times not. "Sometimes you feel like a nut, sometimes you don't."
Check back at the json-io project (for my other projects) and you will find the external equals() / hashCode() project. I don't have a name for it yet, but it will be obvious.
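The approach described in the last two answers (explicit stack instead of recursion, reflection over each field, a visited set for cycles, static fields skipped, transients optional) can be sketched as below. This is not json-io's code. Assumptions: visited pairs are tracked by identity, and a revisited pair is treated as equal because its comparison is already scheduled. Lists are compared element by element. Other JDK types (names starting with "java.") fall back to their own equals(), partly because Java 9+ module encapsulation blocks setAccessible on JDK internals. The Employee class is hypothetical:

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DeepEquals {
    // A pair of objects keyed by identity, so the visited set never calls user equals().
    private static final class Pair {
        final Object a;
        final Object b;
        Pair(Object a, Object b) { this.a = a; this.b = b; }
        @Override public boolean equals(Object o) {
            return o instanceof Pair && ((Pair) o).a == a && ((Pair) o).b == b;
        }
        @Override public int hashCode() {
            return 31 * System.identityHashCode(a) + System.identityHashCode(b);
        }
    }

    static boolean deepEquals(Object x, Object y, boolean compareTransients) {
        Deque<Pair> stack = new ArrayDeque<>(); // explicit stack instead of recursion
        Set<Pair> visited = new HashSet<>();    // pairs already expanded
        stack.push(new Pair(x, y));
        while (!stack.isEmpty()) {
            Pair p = stack.pop();
            Object a = p.a;
            Object b = p.b;
            if (a == b) continue;
            if (a == null || b == null || a.getClass() != b.getClass()) return false;
            Class<?> type = a.getClass();
            if (isValueType(type)) {             // wrappers, strings, enums: always compared
                if (!a.equals(b)) return false;
                continue;
            }
            if (!visited.add(p)) continue;       // seen before: don't loop on cycles
            if (type.isArray()) {
                int n = Array.getLength(a);
                if (n != Array.getLength(b)) return false;
                for (int i = 0; i < n; i++) stack.push(new Pair(Array.get(a, i), Array.get(b, i)));
            } else if (a instanceof List) {      // ordered collection: compare element-wise
                List<?> la = (List<?>) a;
                List<?> lb = (List<?>) b;
                if (la.size() != lb.size()) return false;
                for (int i = 0; i < la.size(); i++) stack.push(new Pair(la.get(i), lb.get(i)));
            } else if (type.getName().startsWith("java.")) {
                if (!a.equals(b)) return false;  // other JDK types: fall back to their equals()
            } else {
                pushFields(type, a, b, compareTransients, stack);
            }
        }
        return true;
    }

    // Pushes every instance field of a and b, including inherited ones from non-JDK superclasses.
    private static void pushFields(Class<?> type, Object a, Object b,
                                   boolean compareTransients, Deque<Pair> stack) {
        for (Class<?> c = type; c != null && !c.getName().startsWith("java."); c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                int mods = f.getModifiers();
                if (Modifier.isStatic(mods)) continue; // shared by all instances
                if (Modifier.isTransient(mods) && !compareTransients) continue;
                f.setAccessible(true);
                try {
                    stack.push(new Pair(f.get(a), f.get(b)));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
    }

    private static boolean isValueType(Class<?> type) {
        return type == String.class || type == Boolean.class || type == Character.class
                || Number.class.isAssignableFrom(type) || Enum.class.isAssignableFrom(type);
    }
}

// Hypothetical example class: a self-referencing manager link and a transient cache field.
class Employee {
    String name;
    Employee manager;
    transient int cachedHash;
    Employee(String name) { this.name = name; }
}
```

Sets and maps are left to their own equals() here; comparing them deeply would need an unordered matching step on top of this loop.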

I think the easiest solution, inspired by Ray Hulha's solution, is to serialize the object and then deep compare the raw result.
The serialization could be bytes, JSON, XML or a simple toString(). toString() seems to be cheaper, and Lombok generates an easily customizable toString() for free. See the example below.
import lombok.Getter;
import lombok.Setter;
import lombok.ToString;
@ToString @Getter @Setter
class Foo {
    boolean foo1;
    String foo2;
    public boolean deepCompare(Object other) { // for cohesiveness
        return other != null && this.toString().equals(other.toString());
    }
}

I guess you know this, but in theory you're supposed to always override .equals to assert that two objects are truly equal. This would imply that they check the overridden .equals methods on their members.
This kind of thing is why .equals is defined in Object.
If this were done consistently, you wouldn't have a problem.
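A minimal sketch of that discipline, using hypothetical Address and Customer classes: each class compares its own fields and delegates to its members' equals(), and hashCode() is built from the same fields so that equal objects hash alike.

```java
import java.util.Objects;

final class Address {
    private final String city;
    private final String street;
    Address(String city, String street) { this.city = city; this.street = street; }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Address)) return false;
        Address other = (Address) o;
        return Objects.equals(city, other.city) && Objects.equals(street, other.street);
    }

    @Override public int hashCode() { return Objects.hash(city, street); }
}

final class Customer {
    private final String name;
    private final Address address;
    Customer(String name, Address address) { this.name = name; this.address = address; }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Customer)) return false;
        Customer other = (Customer) o;
        // Delegates to Address.equals(), so the comparison goes as deep as the members' equals().
        return Objects.equals(name, other.name) && Objects.equals(address, other.address);
    }

    @Override public int hashCode() { return Objects.hash(name, address); }
}
```

Both classes are final, so the instanceof check keeps equals() symmetric.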

A halting guarantee for such a deep comparison might be a problem. What should the following do? (If you implement such a comparator, this would make a good unit test.)
LinkedListNode a = new LinkedListNode();
a.next = a;
LinkedListNode b = new LinkedListNode();
b.next = b;
System.out.println(DeepCompare(a, b));
Here's another:
LinkedListNode c = new LinkedListNode();
LinkedListNode d = new LinkedListNode();
c.next = d;
d.next = c;
System.out.println(DeepCompare(c, d));
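One possible answer, sketched under the assumption that LinkedListNode has nothing but a next field: walk both chains in lockstep and treat a pair of nodes that has already been seen as equal. Once a pair repeats, neither chain can ever reach null, so both are infinite with identical (empty) content. With that rule both examples above terminate and compare equal. The NodeCompare class and the deepCompare name are illustrative:

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical node type matching the examples above: a single next pointer, no data.
class LinkedListNode {
    LinkedListNode next;
}

final class NodeCompare {
    // Walks both chains in lockstep; a pair of nodes seen before is assumed equal,
    // which guarantees termination on cycles.
    static boolean deepCompare(LinkedListNode a, LinkedListNode b) {
        Map<LinkedListNode, Set<LinkedListNode>> seen = new IdentityHashMap<>();
        while (a != null && b != null) {
            Set<LinkedListNode> partners = seen.computeIfAbsent(a,
                    k -> Collections.newSetFromMap(new IdentityHashMap<LinkedListNode, Boolean>()));
            if (!partners.add(b)) return true; // pair repeats: both chains cycle forever
            a = a.next;
            b = b.next;
        }
        return a == null && b == null;         // finite chains must end together
    }
}
```

A cyclic chain compared with a finite one still returns false, because the finite side reaches null first.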

Apache Commons Lang gives you an option: convert both objects to strings and compare the strings. For this you have to override toString():
obj1.toString().equals(obj2.toString())
Override toString()
If all fields are primitive types:
import org.apache.commons.lang3.builder.ReflectionToStringBuilder;
@Override
public String toString() {
    return ReflectionToStringBuilder.toString(this);
}
If you have non-primitive fields and/or collections and/or maps:
// Within class
import org.apache.commons.lang3.builder.ReflectionToStringBuilder;
@Override
public String toString() {
    return ReflectionToStringBuilder.toString(this, new MultipleRecursiveToStringStyle());
}
// New class extended from Apache ToStringStyle
import org.apache.commons.lang3.builder.ReflectionToStringBuilder;
import org.apache.commons.lang3.builder.ToStringStyle;
import java.util.Collection;
import java.util.Map;
public class MultipleRecursiveToStringStyle extends ToStringStyle {
    private static final int INFINITE_DEPTH = -1;
    private final int maxDepth;
    private int depth;
    public MultipleRecursiveToStringStyle() {
        this(INFINITE_DEPTH);
    }
    public MultipleRecursiveToStringStyle(int maxDepth) {
        setUseShortClassName(true);
        setUseIdentityHashCode(false);
        this.maxDepth = maxDepth;
    }
    @Override
    protected void appendDetail(StringBuffer buffer, String fieldName, Object value) {
        appendValue(buffer, value);
    }
    @Override
    protected void appendDetail(StringBuffer buffer, String fieldName, Collection<?> coll) {
        buffer.append(getArrayStart());
        boolean first = true;
        for (Object value : coll) {
            if (!first) buffer.append(getArraySeparator()); // keeps ["ab","c"] distinct from ["a","bc"]
            first = false;
            appendValue(buffer, value);
        }
        buffer.append(getArrayEnd());
    }
    @Override
    protected void appendDetail(StringBuffer buffer, String fieldName, Map<?, ?> map) {
        buffer.append(getArrayStart());
        boolean first = true;
        for (Map.Entry<?, ?> entry : map.entrySet()) {
            if (!first) buffer.append(getArraySeparator());
            first = false;
            appendValue(buffer, entry.getKey());
            buffer.append('=');
            appendValue(buffer, entry.getValue());
        }
        buffer.append(getArrayEnd());
    }
    // Appends nulls, java.lang values and anything past maxDepth directly; recurses into other objects.
    private void appendValue(StringBuffer buffer, Object value) {
        if (value == null) {
            buffer.append(getNullText());
        } else if (value.getClass().getName().startsWith("java.lang.")
                || (maxDepth != INFINITE_DEPTH && depth >= maxDepth)) {
            buffer.append(value);
        } else {
            depth++;
            try {
                buffer.append(ReflectionToStringBuilder.toString(value, this));
            } finally {
                depth--;
            }
        }
    }
}
