How to implement Comparable so it is consistent with identity-equality - java

I have a class for which equality (as per equals()) must be defined by the object identity, i.e. this == other.
I want to implement Comparable to order such objects (say by some getName() property). To be consistent with equals(), compareTo() must not return 0, even if two objects have the same name.
Is there a way to compare object identities in the sense of compareTo? I could compare System.identityHashCode(o), but that would still return 0 in case of hash collisions.

I think the real answer here is: don't implement Comparable then. Implementing this interface implies that your objects have a natural order. Things that are "equal" should be in the same place when you follow up that thought.
If at all, you should use a custom comparator ... but even that doesn't make much sense. If the thing that defines a < b ... is not allowed to give you a == b (when a and b are "equal" according to your < relation), then the whole approach of comparing is broken for your use case.
In other words: just because you can put code into a class that "somehow" results in what you want ... doesn't make it a good idea to do so.

If you assign each object a universally unique identifier (UUID), also called a globally unique identifier (GUID), as its identity property, then that identifier is comparable and consistent with equals. Java already has a UUID class, and once generated, you can just use the string representation for persistence. The dedicated property will also ensure that the identity is stable across versions/threads/machines. You could also just use an incrementing ID if you have a way of ensuring everything gets a unique ID, but using a standard UUID implementation will protect you from issues caused by set merges and parallel systems generating data at the same time.
If you use anything else for the comparison, that means the object is comparable in a way that is separate from its identity/value. So you will need to define what comparable means for this object, and document that. For example, people are comparable by name, DOB, height, or a combination in order of precedence; most naturally by name as a convention (for easier lookup by humans), which is separate from whether two people are the same person. You will also have to accept that compareTo and equals are inconsistent with each other, because they are based on different things.

You could add a second property (say int id or long id) which would be unique for each instance of your class (you can have a static counter variable and use it to initialize the id in your constructor).
Then your compareTo method can first compare the names, and if the names are equal, compare the ids.
Since each instance has a different id, compareTo will never return 0.
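A minimal sketch of this idea, assuming a hypothetical class Item with a name property. The class name, the NEXT_ID counter, and the choice of AtomicLong (so that ids stay unique even when instances are created from several threads) are illustrative assumptions, not part of the question:

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical class: equality stays identity-based because equals/hashCode are not overridden.
class Item implements Comparable<Item> {
    // Shared counter; AtomicLong keeps ids unique under concurrent construction.
    private static final AtomicLong NEXT_ID = new AtomicLong();

    private final long id = NEXT_ID.getAndIncrement();
    private final String name;

    Item(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    @Override
    public int compareTo(Item other) {
        int byName = name.compareTo(other.name);
        // Names tie: fall back to the unique id, so only this == other yields 0.
        return byName != 0 ? byName : Long.compare(id, other.id);
    }
}

class ItemDemo {
    public static void main(String[] args) {
        Set<Item> items = new TreeSet<>();
        items.add(new Item("b"));
        items.add(new Item("a"));
        items.add(new Item("a")); // same name, different identity: kept as a separate element
        System.out.println(items.size()); // prints 3
    }
}
```

With this ordering, same-named items sort next to each other in creation order, and a TreeSet keeps all of them.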

While I stand by my original answer that you should use a UUID property for a stable and consistent compare/equality setup, I figured I'd go ahead and answer the question of "how far could you go if you were REALLY paranoid and wanted a guaranteed unique identity for Comparable".
In short, if you don't trust UUID uniqueness or identity uniqueness, just use as many UUIDs as it takes to prove god is actively conspiring against you. (Note that while this is technically not guaranteed never to throw an exception, needing even 2 UUIDs should be overkill in any sane universe.)
import java.util.ArrayList;
import java.util.UUID;
public class Test implements Comparable<Test> {
    private final UUID antiCollisionProp = UUID.randomUUID();
    private final ArrayList<UUID> antiuniverseProp = new ArrayList<UUID>();
    // Lazily generate UUIDs until index i exists
    private UUID getParanoiaLevelId(int i) {
        while (antiuniverseProp.size() <= i) {
            antiuniverseProp.add(UUID.randomUUID());
        }
        return antiuniverseProp.get(i);
    }
    @Override
    public int compareTo(Test o) {
        if (this == o)
            return 0;
        // Integer.compare avoids the overflow that subtracting two hash codes can produce
        int temp = Integer.compare(System.identityHashCode(this), System.identityHashCode(o));
        if (temp != 0)
            return temp;
        // If the universe hates you
        temp = this.antiCollisionProp.compareTo(o.antiCollisionProp);
        if (temp != 0)
            return temp;
        // If the universe is actively out to get you
        temp = Integer.compare(System.identityHashCode(this.antiCollisionProp), System.identityHashCode(o.antiCollisionProp));
        if (temp != 0)
            return temp;
        for (int i = 0; i < Integer.MAX_VALUE; i++) {
            UUID id1 = this.getParanoiaLevelId(i);
            UUID id2 = o.getParanoiaLevelId(i);
            temp = id1.compareTo(id2);
            if (temp != 0)
                return temp;
            temp = Integer.compare(System.identityHashCode(id1), System.identityHashCode(id2));
            if (temp != 0)
                return temp;
        }
        // If you reach this point, I have no idea what you did to deserve this
        throw new IllegalStateException("RAGNAROK HAS COME! THE MIDGARD SERPENT AWAKENS!");
    }
}

Assuming that for two objects with the same name, compareTo() should not return 0 when equals() returns false, the following can help:
Override hashCode() and make sure it doesn't rely solely on name.
Implement compareTo() as follows:
public int compareTo(MyObject object) {
    if (this.equals(object)) return 0;
    int byName = this.getName().compareTo(object.getName());
    // Names tie: fall back to the hash codes (this can still yield 0 on a hash collision)
    return byName != 0 ? byName : Integer.compare(this.hashCode(), object.hashCode());
}

Your objects are unique, but as Eran said, you may need an extra counter or rehash code to handle any collisions.
private static Set<Pair<C, C>> collisions = ...;
@Override
public boolean equals(Object other) {
    return this == other;
}
@Override
public int compareTo(C other) {
    ...
    if (this == other) {
        return 0;
    }
    if (System.identityHashCode(this) == System.identityHashCode(other)) {
        // Hash collision between distinct objects.
        // Some stable order would be fine: return either -1 or 1,
        // remembering the pair so both directions stay consistent.
        if (collisions.contains(new Pair<>(other, this))) {
            return -1;
        }
        collisions.add(new Pair<>(this, other));
        return 1;
    }
    ...
}
So go with Eran's answer, or state this requirement explicitly in the question.
One might consider the impact of an occasional 0 result for non-identical objects negligible.
One might look into perfect hash functions if, at some point, no more instances are created. This implies you have a collection of all instances.

There are times (although rare) when it is necessary to implement an identity-based compareTo override. In my case, I was implementing java.util.concurrent.Delayed.
Since the JDK also implements this class, I thought I would share the JDK's solution, which uses an atomically incrementing sequence number. Here is a snippet from ScheduledThreadPoolExecutor (slightly modified for clarity):
/**
* Sequence number to break scheduling ties, and in turn to
* guarantee FIFO order among tied entries.
*/
private static final AtomicLong sequencer = new AtomicLong();
private class ScheduledFutureTask<V>
extends FutureTask<V> implements RunnableScheduledFuture<V> {
/** Sequence number to break ties FIFO */
private final long sequenceNumber = sequencer.getAndIncrement();
}
If all other fields compared in compareTo tie, this sequenceNumber value is used to break the tie. The range of a 64-bit integer (long) is large enough that the counter will not wrap around in practice.

Related

Is there any chance for the hash codes of two different objects of being same? [duplicate]

In Java, obj.hashCode() returns some value. What is the use of this hash code in programming?
hashCode() is used for bucketing in hash-based implementations like HashMap, Hashtable, HashSet, etc.
The value received from hashCode() is used to compute the bucket number for storing elements of the set/map. This bucket number is the position of the element's bucket inside the set/map's internal table.
When you do contains() it will take the hash code of the element, then look for the bucket where hash code points to. If more than 1 element is found in the same bucket (multiple objects can have the same hash code), then it uses the equals() method to evaluate if the objects are equal, and then decide if contains() is true or false, or decide if element could be added in the set or not.
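The lookup described above can be observed with a small sketch. CollidingKey is a hypothetical class whose hash codes always collide, so every element lands in the same bucket and HashSet has to rely on equals() to tell elements apart:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical key type whose hash codes always collide.
class CollidingKey {
    private final String value;

    CollidingKey(String value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CollidingKey && ((CollidingKey) o).value.equals(value);
    }

    @Override
    public int hashCode() {
        return 42; // legal but deliberately poor: every instance shares one bucket
    }
}

class CollisionDemo {
    public static void main(String[] args) {
        Set<CollidingKey> set = new HashSet<>();
        set.add(new CollidingKey("x"));
        set.add(new CollidingKey("y"));
        set.add(new CollidingKey("x")); // equal to the first element, so not added again
        System.out.println(set.size());                          // prints 2
        System.out.println(set.contains(new CollidingKey("y"))); // prints true
        System.out.println(set.contains(new CollidingKey("z"))); // prints false
    }
}
```

The set still behaves correctly; the constant hash code only costs performance, because every lookup has to compare against the elements sharing the bucket.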
From the Javadoc:
Returns a hash code value for the object. This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable.
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java programming language.)
hashCode() is a function that takes an object and outputs a numeric value. The hashcode for an object is always the same if the object doesn't change.
Classes like HashMap, Hashtable, HashSet, etc. that need to store objects reduce the hashCode to an index into their internal array (for example, the hash code modulo the array size) to choose in what "memory position" (i.e. array position) to store the object.
There are some cases where collisions may occur (two objects end up with the same hashcode), and that, of course, needs to be solved carefully.
The value returned by hashCode() is the object's hash code, which is the object's memory address in hexadecimal.
By definition, if two objects are equal, their hash code must also be equal. If you override the equals() method, you change the way two objects are equated and Object's implementation of hashCode() is no longer valid. Therefore, if you override the equals() method, you must also override the hashCode() method as well.
This answer is from the java SE 8 official tutorial documentation
A hashcode is a number generated from any object.
This is what allows objects to be stored/retrieved quickly in a Hashtable.
Imagine the following simple example:
On the table in front of you, you have nine boxes, each marked with a number 1 to 9. You also have a pile of wildly different objects to store in these boxes, but once they are in there you need to be able to find them as quickly as possible.
What you need is a way of instantly deciding which box you have put each object in. It works like an index: you decide to find the cabbage, so you look up which box the cabbage is in, then go straight to that box to get it.
Now imagine that you don't want to bother with the index, you want to be able to find out immediately from the object which box it lives in.
In the example, let's use a really simple way of doing this - the number of letters in the name of the object. So the cabbage goes in box 7, the pea goes in box 3, the rocket in box 6, the banjo in box 5 and so on.
What about the rhinoceros, though? It has 10 characters, so we'll change our algorithm a little and "wrap around" so that 10-letter objects go in box 1, 11 letters in box 2 and so on. That should cover any object.
Sometimes a box will have more than one object in it, but if you are looking for a rocket, it's still much quicker to compare a peanut and a rocket, than to check a whole pile of cabbages, peas, banjos, and rhinoceroses.
That's a hash code. A way of getting a number from an object so it can be stored in a Hashtable. In Java, a hash code can be any integer, and each object type is responsible for generating its own. Lookup the "hashCode" method of Object.
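The box scheme from this example can be written down directly. This is only a sketch of the analogy (the names BoxHash and boxFor are made up here), not how Java's own collections compute bucket positions:

```java
class BoxHash {
    static final int BOXES = 9;

    // Box number for a non-empty object name: its letter count, wrapped around into 1..9.
    static int boxFor(String name) {
        return (name.length() - 1) % BOXES + 1;
    }

    public static void main(String[] args) {
        String[] names = {"cabbage", "pea", "rocket", "banjo", "rhinoceros", "peanut"};
        for (String name : names) {
            System.out.println(name + " -> box " + boxFor(name));
        }
        // cabbage -> box 7, pea -> box 3, rocket -> box 6,
        // banjo -> box 5, rhinoceros -> box 1, peanut -> box 6
    }
}
```

The rocket and the peanut share box 6, which is the collision case: finding the rocket means opening box 6 and comparing it against the few objects stored there.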
Although the hash code has nothing to do with your business logic, you have to take care of it in most cases, because when your object is put into a hash-based container (HashSet, HashMap, ...), the container uses the element's hash code to store and find it.
hashCode() returns an integer code for an object. The default implementation in Object returns distinct values for distinct objects as far as is practical, but the code is not guaranteed to be unique.
We use hashCode() in hashing-related data structures such as Hashtable and HashMap.
The advantage of hashCode() is that it makes searching cheap: the hash code narrows the search down to one bucket, and equals() then picks out the object.
But we can't say hashCode() is the address of an object. It is a code produced by the JVM for the object, and the specification does not require it to be derived from the address.
That is why hash-based lookup is so widely used for searching.
One of the uses of hashCode() is building a caching mechanism.
Look at this example:
import java.util.HashMap;
import java.util.List;
import java.util.Map;
class Point
{
    public int x, y;
    public Point(int x, int y)
    {
        this.x = x;
        this.y = y;
    }
    @Override
    public boolean equals(Object o)
    {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Point point = (Point) o;
        if (x != point.x) return false;
        return y == point.y;
    }
    @Override
    public int hashCode()
    {
        int result = x;
        result = 31 * result + y;
        return result;
    }
}
class Line
{
    public Point start, end;
    public Line(Point start, Point end)
    {
        this.start = start;
        this.end = end;
    }
    @Override
    public boolean equals(Object o)
    {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Line line = (Line) o;
        if (!start.equals(line.start)) return false;
        return end.equals(line.end);
    }
    @Override
    public int hashCode()
    {
        int result = start.hashCode();
        result = 31 * result + end.hashCode();
        return result;
    }
}
class LineToPointAdapter implements Iterable<Point>
{
    private static int count = 0;
    // Cache keyed by the line's hash code
    private static Map<Integer, List<Point>> cache = new HashMap<>();
    private int hash;
    public LineToPointAdapter(Line line)
    {
        hash = line.hashCode();
        if (cache.get(hash) != null) return; // we already have it
        System.out.println(
            String.format("%d: Generating points for line [%d,%d]-[%d,%d] (no caching)",
                ++count, line.start.x, line.start.y, line.end.x, line.end.y));
    }

Reason behind JVM's default Object.HashCode() implementation

I am trying to understand why the JVM's default implementation does not return the same hashCode() value for all objects...
I have written a program where I have overridden equals() but not hashCode(), and the consequences are scary.
HashSet adds two objects even though they are equal according to equals().
TreeSet throws an exception with a Comparable implementation...
And many more...
Had the default Object.hashCode() implementation returned the same int value, all these issues could have been avoided...
I understand there's a lot written and discussed about hashCode() and equals(), but I am not able to understand why things can't be handled by default; this is error-prone and the consequences could be really bad and scary...
Here's my sample program:
import java.util.HashSet;
import java.util.Set;
public class HashcodeTest {
public static void main(String...strings ) {
Car car1 = new Car("honda", "red");
Car car2 = new Car("honda", "red");
Set<Car> set = new HashSet<Car>();
set.add(car1);
set.add(car2);
System.out.println("size of Set : "+set.size());
System.out.println("hashCode for car1 : "+car1.hashCode());
System.out.println("hashCode for car2 : "+car2.hashCode());
}
}
class Car{
private String name;
private String color;
public Car(String name, String color) {
super();
this.name = name;
this.color = color;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public String getColor() {
return color;
}
public void setColor(String color) {
this.color = color;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Car other = (Car) obj;
if (color == null) {
if (other.color != null)
return false;
} else if (!color.equals(other.color))
return false;
if (name == null) {
if (other.name != null)
return false;
} else if (!name.equals(other.name))
return false;
return true;
}
}
Output:
size of Set : 2
hashCode for car1 : 330932989
hashCode for car2 : 8100393
It seems that you want to propose to calculate hashCode by default just by taking all the object fields and combining their hashCodes using some formula. Such approach is wrong and may lead to many unpleasant circumstances. In your case it would work, because your object is very simple. But real life objects are much more complex. A few examples:
Objects are connected into double-linked list (every object has previous and next fields). How default implementation would calculate the hashCode? If it should check the fields, it will end up with infinite recursion.
Ok, suppose that we can detect infinite recursion. Let's just have single-linked list. In this case the hashCode of every node should be calculated from all the successor nodes? What if this list contains millions of nodes? All of them should be checked to generate the hashCode?
Suppose you have two HashSet objects. First is created like:
HashSet<Integer> a = new HashSet<>();
a.add(1);
The second is created like this:
HashSet<Integer> b = new HashSet<>();
for(int i=1; i<1000; i++) b.add(i);
for(int i=2; i<1000; i++) b.remove(i);
From user's point of view both contain only one element. But programmatically the second one holds big hash-table inside (like array of 2048 entries of which only one is not null), because when you added many elements, the hash-table was resized. In contrast, the first one holds small hash-table inside (e.g. 16 elements). So programmatically objects are very different: one has big array, other has small array. But they are equal and have the same hashCode, thanks to custom implementation of hashCode and equals.
Suppose you have different List implementations. For example, ArrayList and LinkedList. Both contain the same elements and from the user's point of view they are equal and should have the same hashCode. And they indeed equal and have the same hashCode. However their internal structure is completely different: ArrayList contains an array while LinkedList contains pointers to the objects representing head and tail. So you cannot just generate the hashCode based on their fields: it surely will be different.
Some object may contain the field which is lazily initialized (initialized to null and calculated from other fields only when necessary). What if you have two otherwise equal objects and one has its lazy field initialized while other is not? We should exclude this lazy field from hashCode calculation.
So, there are many cases when universal hashCode approach would not work and may even produce problems (like making your program crash with StackOverflowError or stuck enumerating all the linked objects). Due to this the simplest implementation was selected which is based on object identity. Note that the contract of hashCode and equals requires them to be consistent, and it's fulfilled by default implementation. If you redefine equals, you just must redefine hashCode as well.
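As a sketch of what redefining hashCode alongside equals can look like for the Car class from the question: the version below is trimmed to the two fields and uses java.util.Objects helpers, which is one common way to do it rather than the only one.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class Car {
    private final String name;
    private final String color;

    Car(String name, String color) {
        this.name = name;
        this.color = color;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        Car other = (Car) obj;
        return Objects.equals(name, other.name) && Objects.equals(color, other.color);
    }

    @Override
    public int hashCode() {
        // Built from exactly the fields that equals() compares.
        return Objects.hash(name, color);
    }
}

class CarDemo {
    public static void main(String[] args) {
        Set<Car> set = new HashSet<>();
        set.add(new Car("honda", "red"));
        set.add(new Car("honda", "red"));
        System.out.println("size of Set : " + set.size()); // prints size of Set : 1
    }
}
```

With both methods consistent, the second equal car is recognized as a duplicate and the set holds one element.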
You broke the contract.
hashCode and equals should be written in such a way that when equals returns true, the objects have the same hash code.
If you override equals, then you must provide a hashCode that works properly.
The default implementation can't handle this, because it doesn't know which fields are important. An automatic implementation would also not be efficient: the purpose of the hash code is to speed up operations like lookup in data structures, and if it is implemented improperly, performance will suffer.
From the Docs
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)
From documentation:
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
So if you override how equals() behaves, you must override hashCode() as well.
Also, from docs of equals() -
Note that it is generally necessary to override the hashCode
method whenever this method is overridden, so as to maintain the
general contract for the hashCode method, which states
that equal objects must have equal hash codes.
From javadoc of Object class:
Returns a hash code value for the object. This method is supported for the benefit of hash tables such as those provided by HashMap.
Thus if default implementation provides the same hash, it defeats the purpose.
And for a default implementation, it cannot assume all the classes are of value class, thus the last sentence from doc:
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects.

Is it safe to return 0 as hashcode

I have a class that has a List of Article (or whatever you want). Is it safe to implement/override the hashCode() function in that class like this:
class Place{
    List<Article> list = new ArrayList<Article>();
    public void add(Article article) {
        list.add(article);
    }
    @Override
    public int hashCode() {
        if(list.isEmpty())
            return 0; //is it safe to return 0
        else
            return list.get(0).hashCode();
    }
}
public class Main{
    private HashSet<Place> places = new HashSet<>();
    //this function must be done like this
    public boolean isArticleIn(Article article){
        Place place = new Place();
        place.add(article);
        return places.contains(place);
    }
}
Is it possible for a list that is not empty to return 0?
If you want to store objects of your class in a container which uses hashCode, then you should make sure that "if two objects are equal then they should return the same hash code" (otherwise the container may store duplicates / generally get confused). Will objects of your class compare equal if they both have an empty list?
Best advice on how to implement equals and hashcode so that they capture all the information you want while remaining consistent is available here (using EqualsBuilder and HashCodeBuilder from Apache Commons Lang library recommended). It does seem likely that all the elements in your list should contribute to the hashcode - after all, if the second or third element of the list is different, then your objects will return false from 'equals' - right?
It is safe. It's immutable, which is good.
However your hashed collections won't be performant, since you're not providing an evenly distributed range of hashcodes, and so your hash populations won't be evenly distributed. See here for an explanation of what's going on inside a hashed collection.
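A minimal sketch of a Place whose equals and hashCode both delegate to the list, so every element contributes and the two methods stay consistent. String stands in for the question's Article type here, which is an assumption made only to keep the example self-contained:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of Place; String replaces the question's Article type (assumption).
class Place {
    private final List<String> articles = new ArrayList<>();

    void add(String article) {
        articles.add(article);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Place && articles.equals(((Place) o).articles);
    }

    @Override
    public int hashCode() {
        // List.hashCode() combines the hash codes of all elements.
        return articles.hashCode();
    }
}

class PlaceDemo {
    public static void main(String[] args) {
        Set<Place> places = new HashSet<>();
        Place stored = new Place();
        stored.add("news");
        places.add(stored);

        Place probe = new Place();
        probe.add("news");
        System.out.println(places.contains(probe)); // prints true

        Place other = new Place();
        other.add("sports");
        System.out.println(places.contains(other)); // prints false
    }
}
```

One caveat with this design: adding articles to a Place after it has been put into the HashSet changes its hash code, and the set may then fail to find it.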

Is there a Java utility to do a deep comparison of two objects? [closed]

How to "deep"-compare two objects that do not implement the equals method based on their field values in a test?
Original Question (closed because lack of precision and thus not fulfilling SO standards), kept for documentation purposes:
I'm trying to write unit tests for a variety of clone() operations inside a large project and I'm wondering if there is an existing class somewhere that is capable of taking two objects of the same type, doing a deep comparison, and saying if they're identical or not?
Unitils has this functionality:
Equality assertion through reflection, with different options like ignoring Java default/null values and ignoring order of collections
I love this question! Mainly because it is hardly ever answered or answered badly. It's like nobody has figured it out yet. Virgin territory :)
First off, don't even think about using equals. The contract of equals, as defined in the javadoc, is an equivalence relation (reflexive, symmetric, and transitive), not an equality relation. For that, it would also have to be antisymmetric. The only implementation of equals that is (or ever could be) a true equality relation is the one in java.lang.Object. Even if you did use equals to compare everything in the graph, the risk of breaking the contract is quite high. As Josh Bloch pointed out in Effective Java, the contract of equals is very easy to break:
"There is simply no way to extend an instantiable class and add an aspect while preserving the equals contract"
Besides what good does a boolean method really do you anyway? It'd be nice to actually encapsulate all the differences between the original and the clone, don't you think? Also, I'll assume here that you don't want to be bothered with writing/maintaining comparison code for each object in the graph, but rather you're looking for something that will scale with the source as it changes over time.
Soooo, what you really want is some kind of state comparison tool. How that tool is implemented is really dependent on the nature of your domain model and your performance restrictions. In my experience, there is no generic magic bullet. And it will be slow over a large number of iterations. But for testing the completeness of a clone operation, it'll do the job pretty well. Your two best options are serialization and reflection.
Some issues you will encounter:
Collection order: Should two collections be considered similar if they hold the same objects, but in a different order?
Which fields to ignore: Transient? Static?
Type equivalence: Should field values be of exactly the same type? Or is it ok for one to extend the other?
There's more, but I forget...
XStream is pretty fast and combined with XMLUnit will do the job in just a few lines of code. XMLUnit is nice because it can report all the differences, or just stop at the first one it finds. And its output includes the xpath to the differing nodes, which is nice. By default it doesn't allow unordered collections, but it can be configured to do so. Injecting a special difference handler (Called a DifferenceListener) allows you to specify the way you want to deal with differences, including ignoring order. However, as soon as you want to do anything beyond the simplest customization, it becomes difficult to write and the details tend to be tied down to a specific domain object.
My personal preference is to use reflection to cycle through all the declared fields and drill down into each one, tracking differences as I go. Word of warning: Don't use recursion unless you like stack overflow exceptions. Keep things in scope with a stack (use a LinkedList or something). I usually ignore transient and static fields, and I skip object pairs that I've already compared, so I don't end up in infinite loops if someone decided to write self-referential code (However, I always compare primitive wrappers no matter what, since the same object refs are often reused). You can configure things up front to ignore collection ordering and to ignore special types or fields, but I like to define my state comparison policies on the fields themselves via annotations. This, IMHO, is exactly what annotations were meant for, to make meta data about the class available at runtime. Something like:
@StatePolicy(unordered=true, ignore=false, exactTypesOnly=true)
private List<StringyThing> _mylist;
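For an annotation like this to be visible to a reflective comparison tool, it has to be retained at runtime. The following is only a sketch of how such an annotation might be declared and read. The attribute defaults, the `Inventory` example class, and the `StatePolicies` helper are all assumptions made for illustration and are not part of the original answer.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.List;

/** Hypothetical declaration of the policy annotation; the defaults are assumptions. */
@Retention(RetentionPolicy.RUNTIME) // without RUNTIME retention, reflection cannot see it
@Target(ElementType.FIELD)
@interface StatePolicy {
    boolean unordered() default false;
    boolean ignore() default false;
    boolean exactTypesOnly() default true;
}

/** Hypothetical domain class carrying a policy on one of its fields. */
final class Inventory {
    @StatePolicy(unordered = true)
    private List<String> items;

    private String label; // no annotation: the comparison tool falls back to its defaults
}

/** Hypothetical helper showing how a comparison tool could look a policy up. */
final class StatePolicies {
    static boolean isUnordered(Class<?> type, String fieldName) {
        try {
            Field field = type.getDeclaredField(fieldName);
            StatePolicy policy = field.getAnnotation(StatePolicy.class);
            return policy != null && policy.unordered();
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("No field " + fieldName + " on " + type, e);
        }
    }
}
```

A comparison tool would call something like `StatePolicies.isUnordered(Inventory.class, "items")` before deciding whether to compare a collection field element by element or as an unordered bag.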
I think this is actually a really hard problem, but totally solvable! And once you have something that works for you, it is really, really, handy :)
So, good luck. And if you come up with something that's just pure genius, don't forget to share!
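To make the reflection approach described above more concrete, here is a minimal sketch, not a finished tool. It uses an explicit stack instead of recursion, skips static and transient fields, and remembers object pairs it has already compared so that cyclic graphs terminate. The names `StateDiff`, `Pending`, and `Node` are hypothetical. As a simplifying assumption, JDK types (strings, wrappers, collections) and enums are treated as leaves and compared with their own equals(), partly because reflective access to JDK internals is restricted on recent Java versions.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper (not a library class): lists the field paths where two graphs differ. */
final class StateDiff {

    /** One pending comparison: two objects plus the field path that led to them. */
    private static final class Pending {
        final Object left;
        final Object right;
        final String path;

        Pending(Object left, Object right, String path) {
            this.left = left;
            this.right = right;
            this.path = path;
        }
    }

    /** Assumption: JDK types and enums are compared via equals() rather than walked. */
    private static boolean isLeaf(Class<?> type) {
        return type.isEnum() || type.getName().startsWith("java.");
    }

    static List<String> diff(Object a, Object b) {
        List<String> differences = new ArrayList<>();
        // For each left-hand object, the right-hand objects it was already paired with (by identity).
        Map<Object, Set<Object>> seen = new IdentityHashMap<>();
        Deque<Pending> stack = new ArrayDeque<>(); // explicit stack instead of recursion
        stack.push(new Pending(a, b, "root"));

        while (!stack.isEmpty()) {
            Pending p = stack.pop();
            if (p.left == p.right) {
                continue; // same reference, or both null
            }
            if (p.left == null || p.right == null || p.left.getClass() != p.right.getClass()) {
                differences.add(p.path + ": null or type mismatch");
                continue;
            }
            Class<?> type = p.left.getClass();
            if (isLeaf(type)) {
                if (!p.left.equals(p.right)) {
                    differences.add(p.path + ": " + p.left + " != " + p.right);
                }
                continue;
            }
            Set<Object> partners = seen.computeIfAbsent(p.left,
                    k -> Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>()));
            if (!partners.add(p.right)) {
                continue; // pair already visited: this is what makes cyclic graphs terminate
            }
            if (type.isArray()) {
                int length = Array.getLength(p.left);
                if (length != Array.getLength(p.right)) {
                    differences.add(p.path + ": array lengths differ");
                    continue;
                }
                for (int i = 0; i < length; i++) {
                    stack.push(new Pending(Array.get(p.left, i), Array.get(p.right, i), p.path + "[" + i + "]"));
                }
                continue;
            }
            // Walk the declared fields of the class and its non-JDK superclasses.
            for (Class<?> c = type; c != null && !isLeaf(c); c = c.getSuperclass()) {
                for (Field f : c.getDeclaredFields()) {
                    int mods = f.getModifiers();
                    if (Modifier.isStatic(mods) || Modifier.isTransient(mods)) {
                        continue;
                    }
                    try {
                        f.setAccessible(true);
                        stack.push(new Pending(f.get(p.left), f.get(p.right), p.path + "." + f.getName()));
                    } catch (IllegalAccessException e) {
                        throw new IllegalStateException("Cannot read " + f, e);
                    }
                }
            }
        }
        return differences;
    }
}

/** Hypothetical example type for trying the sketch out. */
final class Node {
    int value;
    Node next;

    Node(int value) {
        this.value = value;
    }
}
```

With two chains `Node(1) -> Node(2)` and `Node(1) -> Node(3)`, `StateDiff.diff` returns the single entry `root.next.value: 2 != 3`, and two self-referencing nodes with the same value produce an empty list. Because collections are compared with equals() here, ordering policies and per-element drilling are left out of this sketch.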
In AssertJ, you can do:
Assertions.assertThat(expectedObject).isEqualToComparingFieldByFieldRecursively(actualObject);
It probably won't work in all cases; however, it will work in more cases than you'd think.
Here's what the documentation says:
Assert that the object under test (actual) is equal to the given
object based on recursive a property/field by property/field
comparison (including inherited ones). This can be useful if actual's
equals implementation does not suit you. The recursive property/field
comparison is not applied on fields having a custom equals
implementation, i.e. the overridden equals method will be used instead
of a field by field comparison.
The recursive comparison handles cycles. By default floats are
compared with a precision of 1.0E-6 and doubles with 1.0E-15.
You can specify a custom comparator per (nested) fields or type with
respectively usingComparatorForFields(Comparator, String...) and
usingComparatorForType(Comparator, Class).
The objects to compare can be of different types but must have the
same properties/fields. For example if actual object has a name String
field, it is expected the other object to also have one. If an object
has a field and a property with the same name, the property value will
be used over the field.
Override The equals() Method
You can simply override the equals() method of the class using the EqualsBuilder.reflectionEquals() as explained here:
public boolean equals(Object obj) {
return EqualsBuilder.reflectionEquals(this, obj);
}
Just had to implement comparison of two entity instances revised by Hibernate Envers. I started writing my own differ but then found the following framework.
https://github.com/SQiShER/java-object-diff
You can compare two objects of the same type and it will show changes, additions and removals. If there are no changes, then the objects are equal (in theory). Annotations are provided for getters that should be ignored during the check. The framework has far wider applications than equality checking; for example, I am using it to generate a change-log.
Its performance is OK. When comparing JPA entities, be sure to detach them from the entity manager first.
I am using XStream:
/**
 * @see java.lang.Object#equals(java.lang.Object)
 */
@Override
public boolean equals(Object o) {
    // Two objects are equal when their XML serializations match.
    XStream xstream = new XStream();
    String oxml = xstream.toXML(o);
    String myxml = xstream.toXML(this);
    return myxml.equals(oxml);
}
/**
 * @see java.lang.Object#hashCode()
 */
@Override
public int hashCode() {
    // Hash the same XML form so that hashCode() stays consistent with equals().
    XStream xstream = new XStream();
    String myxml = xstream.toXML(this);
    return myxml.hashCode();
}
http://www.unitils.org/tutorial-reflectionassert.html
public class User {
private long id;
private String first;
private String last;
public User(long id, String first, String last) {
this.id = id;
this.first = first;
this.last = last;
}
}
User user1 = new User(1, "John", "Doe");
User user2 = new User(1, "John", "Doe");
assertReflectionEquals(user1, user2);
Hamcrest has the Matcher samePropertyValuesAs. But it relies on the JavaBeans Convention (uses getters and setters). Should the objects that are to be compared not have getters and setters for their attributes, this will not work.
import static org.hamcrest.beans.SamePropertyValuesAs.samePropertyValuesAs;
import static org.junit.Assert.assertThat;
import org.junit.Test;
public class UserTest {
    @Test
    public void asfd() {
        User user1 = new User(1, "John", "Doe");
        User user2 = new User(1, "John", "Doe");
        assertThat(user1, samePropertyValuesAs(user2)); // all good
        user2 = new User(1, "John", "Do");
        assertThat(user1, samePropertyValuesAs(user2)); // will fail
    }
}
The user bean - with getters and setters
public class User {
private long id;
private String first;
private String last;
public User(long id, String first, String last) {
this.id = id;
this.first = first;
this.last = last;
}
public long getId() {
return id;
}
public void setId(long id) {
this.id = id;
}
public String getFirst() {
return first;
}
public void setFirst(String first) {
this.first = first;
}
public String getLast() {
return last;
}
public void setLast(String last) {
this.last = last;
}
}
If your objects implement Serializable you can use this:
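import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.Arrays;

public static boolean deepCompare(Object o1, Object o2) {
    try {
        // Serialize the first object graph into a byte array.
        ByteArrayOutputStream baos1 = new ByteArrayOutputStream();
        ObjectOutputStream oos1 = new ObjectOutputStream(baos1);
        oos1.writeObject(o1);
        oos1.close();
        // Serialize the second object graph the same way.
        ByteArrayOutputStream baos2 = new ByteArrayOutputStream();
        ObjectOutputStream oos2 = new ObjectOutputStream(baos2);
        oos2.writeObject(o2);
        oos2.close();
        // The graphs are considered equal when the serialized bytes match exactly.
        return Arrays.equals(baos1.toByteArray(), baos2.toByteArray());
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}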
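One thing worth knowing about comparing serialized bytes: the stream records shared references, so two graphs whose fields are all equal() can still serialize differently when one of them reuses an object and the other holds a separate copy. The sketch below illustrates this with a hypothetical `NamePair` class and a hypothetical `SerializedForm` helper. Neither comes from the answer above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;

/** Hypothetical two-field value class used only for this demonstration. */
final class NamePair implements Serializable {
    private static final long serialVersionUID = 1L;
    final String first;
    final String second;

    NamePair(String first, String second) {
        this.first = first;
        this.second = second;
    }
}

/** Hypothetical helper that compares objects by their Java-serialized bytes. */
final class SerializedForm {
    static byte[] toBytes(Object o) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    static boolean sameBytes(Object a, Object b) {
        return Arrays.equals(toBytes(a), toBytes(b));
    }
}
```

Given `String s = "Doe"`, `sameBytes(new NamePair("John", "Doe"), new NamePair("John", "Doe"))` is true, but `sameBytes(new NamePair(s, s), new NamePair(s, new String(s)))` is false: the first pair writes its second field as a back-reference to the first, while the second pair writes the string out twice. Whether that difference matters depends on whether your clone test cares about aliasing.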
Your Linked List example is not that difficult to handle. As the code traverses the two object graphs, it places visited objects in a Set or Map. Before traversing into another object reference, this set is tested to see if the object has already been traversed. If so, no need to go further.
I agree with the person above who said to use a LinkedList (like a Stack, but without synchronized methods, so it is faster). Traversing the object graph using a stack, while using reflection to get each field, is the ideal solution. Written once, this "external" equals() and "external" hashCode() is what all equals() and hashCode() methods should call. Never again do you need a custom equals() method.
I wrote a bit of code that traverses a complete object graph, listed over at Google Code. See json-io (http://code.google.com/p/json-io/). It serializes a Java object graph into JSON and deserializes from it. It handles all Java objects, with or without public constructors, Serializable or not, etc. This same traversal code will be the basis for the external "equals()" and external "hashCode()" implementation. Btw, the JsonReader / JsonWriter (json-io) is usually faster than the built-in ObjectInputStream / ObjectOutputStream.
This JsonReader / JsonWriter could be used for comparison, but it will not help with hashCode(). If you want a universal hashCode() and equals(), it needs its own code. I may be able to pull this off with a generic graph visitor. We'll see.
Other considerations: static fields are easy. They can be skipped, because all instances being compared would have the same value for static fields, as static fields are shared across all instances.
As for transient fields, that will be a selectable option. Sometimes you may want transients to count, other times not. "Sometimes you feel like a nut, sometimes you don't."
Check back to the json-io project (for my other projects) and you will find the external equals() / hashcode() project. I don't have a name for it yet, but it will be obvious.
I think the easiest solution, inspired by Ray Hulha's solution, is to serialize the object and then deep compare the raw result.
The serialization could be to bytes, JSON, XML, or a simple toString(), etc. toString() seems to be the cheapest. Lombok generates an easily customizable toString() for us for free. See the example below.
@ToString @Getter @Setter
class foo{
boolean foo1;
String foo2;
public boolean deepCompare(Object other) { //for cohesiveness
return other != null && this.toString().equals(other.toString());
}
}
I guess you know this, but in theory you're supposed to always override .equals to assert that two objects are truly equal. This would imply that they check the overridden .equals methods on their members.
This kind of thing is why .equals is defined in Object.
If this were done consistently you wouldn't have a problem.
A halting guarantee for such a deep comparison might be a problem. What should the following do? (If you implement such a comparator, this would make a good unit test.)
LinkedListNode a = new LinkedListNode();
a.next = a;
LinkedListNode b = new LinkedListNode();
b.next = b;
System.out.println(DeepCompare(a, b));
Here's another:
LinkedListNode c = new LinkedListNode();
LinkedListNode d = new LinkedListNode();
c.next = d;
d.next = c;
System.out.println(DeepCompare(c, d));
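One way to get a halting guarantee for cases like these is to remember every pair of nodes already compared and to stop as soon as a pair repeats. The following is a minimal sketch under assumptions: `LinkedListNode` is given a hypothetical `value` field so there is something to compare, and `CycleSafeCompare` is an illustrative name. Under this definition two cyclic lists are considered equal if no difference shows up before the traversal starts repeating itself. That is a design choice, not the only possible answer.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

/** Node type from the examples above, with a hypothetical payload field added. */
final class LinkedListNode {
    int value; // assumption: not part of the original snippet
    LinkedListNode next;
}

/** Hypothetical comparison that terminates on cyclic lists. */
final class CycleSafeCompare {
    static boolean deepCompare(LinkedListNode a, LinkedListNode b) {
        // For each node of the first list, the nodes of the second list it was paired with.
        Map<LinkedListNode, Set<LinkedListNode>> seen = new IdentityHashMap<>();
        while (a != null && b != null) {
            if (a == b) {
                return true; // both lists continue through the very same node
            }
            Set<LinkedListNode> partners = seen.computeIfAbsent(a,
                    k -> Collections.newSetFromMap(new IdentityHashMap<LinkedListNode, Boolean>()));
            if (!partners.add(b)) {
                return true; // pair seen before: we looped without finding a difference
            }
            if (a.value != b.value) {
                return false;
            }
            a = a.next;
            b = b.next;
        }
        return a == b; // equal only if both lists ended (reached null) together
    }
}
```

Since a finite graph has only finitely many node pairs and every loop iteration either returns or records a new pair, the loop always terminates. With this sketch, both examples above evaluate to true.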
Apache Commons Lang gives you one option: convert both objects to strings and compare the strings. You do have to override toString(), though.
obj1.toString().equals(obj2.toString())
Override toString()
If all fields are primitive types :
import org.apache.commons.lang3.builder.ReflectionToStringBuilder;
@Override
public String toString() {
    return ReflectionToStringBuilder.toString(this);
}
If you have non primitive fields and/or collection and/or map :
// Within class
import org.apache.commons.lang3.builder.ReflectionToStringBuilder;
@Override
public String toString() {
    return ReflectionToStringBuilder.toString(this, new MultipleRecursiveToStringStyle());
}
// New class extended from Apache ToStringStyle
import org.apache.commons.lang3.builder.ReflectionToStringBuilder;
import org.apache.commons.lang3.builder.ToStringStyle;
import java.util.*;
public class MultipleRecursiveToStringStyle extends ToStringStyle {
private static final int INFINITE_DEPTH = -1;
private int maxDepth;
private int depth;
public MultipleRecursiveToStringStyle() {
this(INFINITE_DEPTH);
}
public MultipleRecursiveToStringStyle(int maxDepth) {
setUseShortClassName(true);
setUseIdentityHashCode(false);
this.maxDepth = maxDepth;
}
@Override
protected void appendDetail(StringBuffer buffer, String fieldName, Object value) {
if (value.getClass().getName().startsWith("java.lang.")
|| (maxDepth != INFINITE_DEPTH && depth >= maxDepth)) {
buffer.append(value);
} else {
depth++;
buffer.append(ReflectionToStringBuilder.toString(value, this));
depth--;
}
}
@Override
protected void appendDetail(StringBuffer buffer, String fieldName,
Collection<?> coll) {
for(Object value: coll){
if (value.getClass().getName().startsWith("java.lang.")
|| (maxDepth != INFINITE_DEPTH && depth >= maxDepth)) {
buffer.append(value);
} else {
depth++;
buffer.append(ReflectionToStringBuilder.toString(value, this));
depth--;
}
}
}
@Override
protected void appendDetail(StringBuffer buffer, String fieldName, Map<?, ?> map) {
for(Map.Entry<?,?> kvEntry: map.entrySet()){
Object value = kvEntry.getKey();
if (value.getClass().getName().startsWith("java.lang.")
|| (maxDepth != INFINITE_DEPTH && depth >= maxDepth)) {
buffer.append(value);
} else {
depth++;
buffer.append(ReflectionToStringBuilder.toString(value, this));
depth--;
}
value = kvEntry.getValue();
if (value.getClass().getName().startsWith("java.lang.")
|| (maxDepth != INFINITE_DEPTH && depth >= maxDepth)) {
buffer.append(value);
} else {
depth++;
buffer.append(ReflectionToStringBuilder.toString(value, this));
depth--;
}
}
}}
