I get this suggestion from IntelliJ IDEA when using #Data annotation from lombok.
The class in question is an #Entity.
Can someone explain:
what does it do exactly (especially the part with Hibernate)
Is this method preferred over comparing every field one-by-one? If yes, why?
#Override
public boolean equals(Object o) {
if (this == o)
return true;
if (o == null || Hibernate.getClass(this) != Hibernate.getClass(o))
return false;
MyObject that = (MyObject ) o;
return id != null && Objects.equals(id, that.id);
}
The project contains/uses Spring boot, Hibernate, Lombok.
Thank you
There's a fundamental problem at work, one inherent to JPA/Hibernate. For this example, let's say we have a db table named User, and we have a class also named User that models it.
The problem boils down to simply this:
What does the java class User represent? Does it represent 'a row in the database table "User"', or does it represent a User?
Depending on your answer, you get a wildly different requirement for the equals method. Depending on which equals method you chose, answering this question incorrectly leads to code bugs. As far as I know, there is no actual 'standard', people just sort of do something and most aren't aware that this is a fundamental problem.
It represents a row in the DB
Such an interpretation would then suggest the following implementation of your equals method:
If all fields that model the primary key columns in the DB definition are equal between the two instances, then they are equal, even if the other (non-primary-key) fields are different. After all, that's how the DB determines equality, so java code should match it.
The java code should be like SQL when dealing with NULLs. That is to say, quite unlike just about every equality definition, equals method code generator (including lombok, intellij, and eclipse), and even the Objects.equals method, in this mode, null == null should be FALSE, as it is in SQL! Specifically, if any of the primary key fields have a null value, that object cannot be equal to any other, even a carbon copy of itself; to stick to java rules, it can (must, really) be equal to its own reference.
In other words:
Any 2 objects are equal if either [A] they are literally the same object (this == other), or [B] BOTH object's unid field is initialized and equal. Whether you use null or 0 to track 'not written to DB yet', that value instantly disqualifies that row from being equal to any other, even another one with 100% identical values.
After all, if you make 2 separate new objects and save() them both, they would turn into 2 separate rows.
It represents a user object
Then what happens is that the equals rules do a 180. The primary key, assuming its an unid style primary key and not a natural primary key, are inherently an implementation detail. Imagine that somehow in your DB you end up with 2 rows for the exact same user (presumably somebody messed up and failed to add a UNIQUE constraint on username, perhaps). In the semantic model of users on the system, users are uniquely identified by their username, therefore, equality is defined by username alone. 2 objects with identical username but different unid values are nevertheless equal.
So which one do I take?
I have no idea. Fortunately, your question asked for explanation and not an answer!
What IntelliJ is telling you is to go with the first interpretation (row in the DB), and even applies the wonky null stuff correctly, so whomever wrote the suggestion tool in intellij at least seems to understand what's going on.
For what its worth, I think 'represents a row in the DB' is the more 'useful' interpretation (because not doing this involves invoking getters which make equality checks incredibly pricey, as it may result in hundreds of SELECT calls and a gigantic bunch of heap mem as you pull half the DB in!), however, the 'an instance of class User represents a user in the system' is the more java-like interpretation and the one that most java programmers would (erroneously then, if you use intellij's suggestion here) silently presume.
I've solved this problem in my own programming endeavours by never using hibernate/JPA in the first place, and using tools like JOOQ or JDBI instead. But, the downside is that generally you end up with more code – you really do sometimes have an object, e.g. called UserRow, representing a user row, and an object e.g. called User that represents a user on-system.
Another trick could be to decide to name all your Hibernate model classes as XRow. Names are important and the best documentation around: This makes no bones about it and clues in all users of this code about how they are to interpret its semantic meaning: Row in DB. Thus, the intellij suggestion would then be your equals implementation.
NB: Lombok is java and not Hibernate specific, so it makes the 'represents a user in the system' choice. You can try to push lombok towards the 'row in DB' interpretation by telling lombok to only use the id field (stick an #EqualsAndHashCode.Include on that field), but lombok would still consider 2 null values / 2 0 values identical even though it shouldn't. This is on hibernate, as it is breaking all sorts of rules and specs.
(NB: Added due to a comment on another answer)
Why is .getClass() being invoked?
Java has sensible rules about what equals is supposed to mean. This is in the javadoc of the equals method and these rules can be relied upon (and are, by e.g. HashSet and co). The rules are:
If aequals(b) is true , a.hashCode() == b.hashCode() must also be true.
a.equals(a) must be true.
If a.equals(b) then b.equals(a) must also be true.
If a.equals(b) and b.equals(c) then a.equals(c) must also be true.
Sensible and simple, right?
Nope. That's actually really complex.
Imagine you make a subclass of ArrayList: You decide to give lists a colour. You can have a blue list of strings and a red list of strings.
Right now the equality method of ArrayList checks if the that is a list and if so, compares elements. Seems sensible, right? We can see it in action:
List<String> a = new ArrayList<String>();
a.add("Hello");
List<String> b = new LinkedList<String>();
b.add("Hello");
System.out.println(a.equals(b));
This prints true.
Let's now make our coloured arraylist implementation: class ColoredList<T> extends ArrayList<T> { .. }. Surely, a red empty list is no longer equal to a blue empty list right?
Nope, you'd be breaking rules if you do that!
List<String> a = new ArrayList<String>();
List<String> b = new ColoredList<String>(Color.RED);
List<String> c = new ColoredList<String>(Color.BLUE);
System.out.println(a.equals(b));
System.out.println(a.equals(c));
System.out.println(b.equals(c));
That prints true/true/false which is invalid. The conclusion is that it is in fact impossible to make any list subclass that adds some semantically relevant information. The only subclasses that can exist are those which either actively break spec (bad idea), or whose additions have no impact on equality.
There is a different view of things which says that you ought to be able to make such classes. Again we're struggling, just like with the JPA/Hibernate case, about what equals is even supposed to mean.
A more common and far better default behaviour for your equals implementations is to simply state that any 2 objects can only be equal if they are of the exact same type: An instance of Dog cannot be equal to an instance of Animal.
The only way to accomplish this, given that the rule a.equals(b)? Then b.equals(a) exists, is that animal checks the class of that and returns false if it isn't exactly Animal. In other words:
Animal a = new Animal("Betsy");
Cow c = new Cow("Betsy");
a.equals(c); // must return false!!
The .getClass() check accomplishes this.
Lombok gives you the best of both worlds. It can't perform miracles, so it won't take away the rule that at the type level you need to choose extensibility, but lombok has the canEqual system to deal with this: The equals code of Animal will ask the that code if the two things can be equal. In this mode, if you have some non-semantically-different subclass of animal (such as ArrayList, which is a subclass of AbstractList and doesn't change the semantics at all, it just adds implementation details that have no bearing on equality), it can say that it can be equal, whereas if you have one that is semantically different, such as your coloured list, it can say that none are.
In other words, going back to the coloured lists, IF ArrayList and co were written with lombok's canEqual system, this could have worked out, you could have had the results (where a is an arraylist, b is a red list, and c is a blur list):
a.equals(b); // false, even though same items
a.equals(c); // false, same reason.
b.equals(c); // false and now it's not a conflict.
Lombok's default behaviour is that all subtypes add semantic load and therefore any X cannot be equal to any Y where Y is a subclass of X, but you can override this by writing out the canEqual method in Y. You would do that if you write a subclass that doesn't add semantic load.
This isn't going to help you in the slightest with the problems above about hibernate.
Who knew something as seemingly simple as equality is hiding 2 intractably difficult philosophical treatises, huh?
For more info on canEqual, see lombok's #EqualsAndHashCode documentation.
I'm not trying to undermine ~rzwitserloot 's excellent answer, just trying to help you figure out why it uses Hibernate.getClass(this) for you instead of this.getClass().
It doesn't do it for me, but I don't have Hibernate in my project anyway.
The code is generated using velocity macros as seen here:
The IntelliJ default uses a file 'equalsHelper.vm'. I found a possible source of that file version at https://github.com/JetBrains/intellij-community/blob/master/java/java-impl/src/com/intellij/codeInsight/generation/equalsHelper.vm
It contains this:
#macro(addInstanceOfToText)
#if ($checkParameterWithInstanceof)
if(!($paramName instanceof $classname)) return false;
#else
if($paramName == null || getClass() != ${paramName}.getClass()) return false;
#end
#end
So apparently you have a different version of that file? Or you use a different template? Maybe some plugin changed it?
Two objects are not equal if they are of different class.
For 'preferred', it depends on what an 'id' is. The last line seems a little redundant; it could have been
return Objects.equals(id, that.id);
since the null case is handled by Objects.equals. But to my taste, it's clearer to write
return id != null && id.equals(that.id);
The extra layer adds nothing that I can see in the example.
Related
Question ahead:
why does in Java the call coll.contains(null) fail for ImmutableCollections?
I know, that immutable collections cannot contain null-elements, and I do not want to discuss whether that's good or bad.
But when I write a Function, that takes a (general, not explicit immutable) Collection, it fails upon checking for nulls. Why does the implementation not return false (which is actually the 'correct' answer)?
And how can I properly check for nulls in a Collection in general?
Edit:
with some discussions (thanks to the commenters!) I realized, that I mixed up two things: ImmutableCollection from the guava library, and the List returned by java.util.List.of, being some class from ImmutableCollections. However, both classes throw an NPE on .contains(null).
My problem was with the List.of result, but technically the same would happen with guaves implementation. [edit: It does not]
I am distressed by this discussion!
Collections that do this have been a pet peeve of mine since before I wrote the first collections that eventually became Guava. If you find any Guava collection that throws NPE just because you asked it a perfectly innocent question like .contains(null), please file a bug! We hate that crap.
EDIT: I was so distressed that I had to go back to look at my 2007 changelist that first created ImmutableSet and saw literally this:
#Override public boolean contains(#Nullable Object target) {
if (target == null) {
return false;
}
ahhhhh.
why does in Java the call coll.contains(null) fail for ImmutableCollections?
Because the design team (the ones who have created guava) decided that, for their collections, null is unwanted, and therefore any interaction between their collections and a null check, even in this case, should just throw to highlight to the programmer, at the earliest possible opportunity, that there is a mismatch. Even where the established behaviour (as per the existing implementations in the core runtime itself, such as ArrayList and friends, as well as the javadoc), rather explicitly go the other way and say that a non-sequitur check (is this pear part of this list of apples?) strongly suggests that the right move is to just return false and not throw.
In other words, guava messed up. But now that they have done so, going back is potentially backwards compatibility breaking. It really isn't very - you are replacing an exception thrown with a false return value; presumably code could be out there that relies on the NPE (catching it and doing something different from what the code would do had contains(null) returned false instead of throwing) - but that's a rare case, and guava breaks backwards compatibility all the time.
And how can I properly check for nulls in a Collection in general?
By calling .contains(null), just as you are. The fact that guava doesn't do it right doesn't change the answer. You might as well ask 'how do I add elements to a list', and counter the answer of "well, you call list.add(item) to do that" with: Well, I have this implementation of the List interface that plays Rick Astley over the speaker instead of adding to the list, so, I reject your answer.
That's.. how java and interfaces work: You can have implementations of them, and the only guardianship that they do what the interface dictates they must, is that the author understands there is a contract that needs to be followed.
Now, normally a library so badly written they break contract for no good reason*, isn't popular. But guava IS popular. Very popular. That gets at a simple truth: No library is perfect. Guava's API design is generally quite good (in my opinion, vastly superior to e.g. Apache commons libraries), and the team actively spends a lot of time debating proper API design, in the sense that the code that one would write using guava is nice (as defined by: Easy to understand, has few surprises, easy to maintain, easy to test, and probably easy to mutate to deal with changing requirements - the only useful definition for nebulous terms like 'nice' or 'elegant' code - it's code that does those things, anything else is pointless aesthetic drivel). In other words, they are actively trying, and they usually get it right.
Just, not in this case. Work around it: return item != null && coll.contains(item); will get the job done.
There is one major argument in favour of guava's choice: They 'contract break' is an implicit break - one would expect that .contains(null) works, and always returns false, but it's not explicitly stated in the javadoc that one must do this. Contrast to e.g. IdentityHashMap, which uses identity equivalence (a==b) and not value equality (a.equals(b)) in its .containsKey etc implementations, which explicitly goes against the javadoc contract as stated in the j.u.Map interface. IHM has an excellent reason for it, and highlights the discrepancy, plus explains the reason, in the javadoc. Guava isn't nearly as clear about their bizarre null behaviour, but, here's a crucial thing about null in java:
Its meaning is nebulous. Sometimes it means 'empty', which is bad design: You should never write if (x == null || x.isEmpty()) - that implies some API is badly coded. If null is semantically equivalent to some value (such as "" or List.of()), then you should just return "" or List.of(), and not null. However, in such a design, list.contains(null) == false) would make sense.
But sometimes null means not found, irrelevant, not applicable, or unknown (for example, if map.get(k) returns null, that's what it means: Not found. Not 'I found an empty value for you'). This matches with what NULL means in e.g. SQL. In all those cases, .contains(null) should be returning neither true nor false. If I hand you a bag of marbles and ask you if there is a marble in there that is grue, and you have no idea what grue means, you shouldn't answer either yes or no to my query: Either answer is a meaningless guess. You should tell me that the question cannot be answered. Which is best represented in java by throwing, which is precisely what guava does. This also matches with what NULL does in SQL. In SQL, v IN (x) returns one of 3 values, not 2 values: It can resolve to true, false, or null. v IN (NULL) would resolve to NULL and not false. It is answering a question that can't be answered with the NULL value, which is to be read as: Don't know.
In other words, guava made a call on what null implies which evidently does not match with your definitions, as you expect .contains(null) to return false. I think your viewpoint is more idiomatic, but the point is, guava's viewpoint is different but also consistent, and the javadoc merely insinuates, but does not explicitly demand, that .contains(null) returns false.
That's not useful whatsoever in fixing your code, but hopefully it gives you a mental model, and answers your question of "why does it work like this?".
We are checking the quality of our code using Sonar, and Sonar found code which compares objects for identity like this:
if (cellOfInterest == currentCell) { … }
Sonar finds this kind of identity check peculiar enough to call it critical and proposes to replace the identity check (using ==) with a check for equality (using .equals() instead). Identity checks, so the rationale behind this, are often not what is meant.
In our case, however, we iterate through a list of Cells and check in each iteration (currentCell) whether we are handling a special cell we already have (cellOfInterest).
I'd like to hear if there are other patterns than ours which are common and which avoid this issue simply by using a different design. Or what solutions do you propose to avoid using an identity check in the mentioned situation?
We considered a replacement of the identity check with an equality check as described above but it does not seem applicable in our situation because other cells might be "equal" as well but not identical.
All ideas are welcome!
If identity is what you need then it is what you need. The warning makes sense as it is often a bug but in this case it's not.
As an example, IdentityHashMap (which works with identity vs. equality) has this in its javadoc:
This class is not a general-purpose Map implementation! [...] This class is designed for use only in the rare cases wherein reference-equality semantics are required.
So it is rarely useful but has its uses. You are in a similar position. And of course, its internal code does exactly what you expect, it uses == to get a key.
Bottom line: I don't see why you would need to make the code more complex just because some static code analysis tool says it may be a problem - such a tool is supposed to help you, not to force you into some weird construct that will essentially do the same thing. I would explain why == is used in a comment for clarity, mark it as a false positive and move on.
If you really want to remove the warning, the only option I can think of is to use equals and change the equals method to either:
the default Object#equals which is based on identity
some implementation that uniquely identifies the cells, maybe based on some unique id or coordinates?
For Strings and most other Java objects, it is possible to have 2 instances which are identity unequal but are actually equivalent by .equals. It's conventional to avoid == for comparison (using equals or compareTo instead) but if it works, it works. You can mark the item as a false positive in SonarQube.
At first you will not run into problems if you just replace your identity check with an equals call (except that you will have to check for null values on cellOfInterest), as the default implementation of equals in Object is the identity check.
if (cellOfInterest != null && cellOfInterest.equals(currentCell)) { … }
will not break the code. It behaves exactly the same way as your code, when I may suppose that currentCell will not be null.
To omit the null check (and retain the behaviour on both values being null) you can also use (since Java 7)
if(Objects.equals(cellOfInterest, currentCell)) { ...}
In general using equals is the better architecture.
To view both cases you mentioned:
If the class is changeable by yourself, you might (or might not) come to the conclusion that there are better ways for equality than identity; so you just change the equals (and do not forget the hashCode!) in your class.
If you cannot change the class, you have to trust in a meaningful implementation of equals and hashCode by the provider of the class.
I commonly find myself writing code like this:
private List<Foo> fooList = new ArrayList<Foo>();
public Foo findFoo(FooAttr attr) {
for(Foo foo : fooList) {
if (foo.getAttr().equals(attr)) {
return foo;
}
}
}
However, assuming I properly guard against null input, I could also express the loop like this:
for(Foo foo : fooList) {
if (attr.equals(foo.getAttr()) {
return foo;
}
}
I'm wondering if one of the above forms has a performance advantage over the other. I'm well aware of the dangers of premature optimization, but in this case, I think the code is equally legible either way, so I'm looking for a reason to prefer one form over another, so I can build my coding habits to favor that form. I think given a large enough list, even a small performance advantage could amount to a significant amount of time.
In particular, I'm wondering if the second form might be more performant because the equals() method is called repeatedly on the same object, instead of different objects? Maybe branch prediction is a factor?
I would offer 2 pieces of advice here:
Measure it
If nothing else points you in any given direction, prefer the form which makes most sense and sounds most natural when you say it out loud (or in your head!)
I think that considering branch prediction is worrying about efficiency at too low of a level. However, I find the second example of your code more readable because you put the consistent object first. Similarly, if you were comparing this to some other object that, I would put the this first.
Of course, equals is defined by the programmer so it could be asymmetric. You should make equals an equivalence relation so this shouldn't be the case. Even if you have an equivalence relation, the order could matter. Suppose that attr is a superclass of the various foo.getAttr and the first test of your equals method checks if the other object is an instance of the same class. Then attr.equals(foo.getAttr()) will pass the first check but foo.getAttr().equals(attr) will fail the first check.
However, worrying about efficiency at this level seldom has benefits.
This depends on the implementation of the equals methods. In this situation I assume that both objects are instances of the same class. So that would mean that the methods are equal. This makes no performance difference.
If both objects are of the same type, then they should perform the same. If not, then you can't really know in advance what's going to happen, but usually it will be stopped quite quickly (with an instanceof or something else).
For myself, I usually start the method with a non-null check on the given parameter and I then use the attr.equals(foo.getAttr()) since I don't have to check for null in the loop. Just a question of preference I guess.
The only thing which does affect performance is code which does nothing.
In some cases you have code which is much the same or the difference is so small it just doesn't matter. This is the case here.
Where its is useful to swap the .equals() around is when you have a known value which cannot be null (This doesn't appear to be the cases here) of the type you are using is known.
e.g.
Object o = (Integer) 123;
String s = "Hello";
o.equals(s); // the type of equals is unknown and a virtual table look might be required
s.equals(o); // the type of equals is known and the class is final.
The difference is so small I wouldn't worry about it.
DEVENTER (n) A decision that's very hard to make because so little depends on it, such as which way to walk around a park
-- The Deeper Meaning of Liff by Douglas Adams and John Lloyd.
The performance should be the same, but in terms of safety, it's usually best to have the left operand be something that you are sure is not null, and have your equals method deal with null values.
Take for instance:
String s1 = null;
s1.equals("abc");
"abc".equals(s1);
The two calls to equals are not equivalent as one would issue a NullPointerException (the first one), and the other would return false.
The latter form is generally preferred for comparing with string constants for exactly this reason.
Say I have a unit test that wants to compare two complex for objects for equality. The objects contains many other deeply nested objects. All of the objects' classes have correctly defined equals() methods.
This isn't difficult:
#Test
public void objectEquality() {
Object o1 = ...
Object o2 = ...
assertEquals(o1, o2);
}
Trouble is, if the objects are not equal, all you get is a fail, with no indication of which part of the object graph didn't match. Debugging this can be painful and frustrating.
My current approach is to make sure everything implements toString(), and then compare for equality like this:
assertEquals(o1.toString(), o2.toString());
This makes it easier to track down test failures, since IDEs like Eclipse have a special visual comparator for displaying string differences in failed tests. Essentially, the object graphs are represented textually, so you can see where the difference is. As long as toString() is well written, it works great.
It's all a bit clumsy, though. Sometimes you want to design toString() for other purposes, like logging, maybe you only want to render some of the objects fields rather than all of them, or maybe toString() isn't defined at all, and so on.
I'm looking for ideas for a better way of comparing complex object graphs. Any thoughts?
The Atlassian Developer Blog had a few articles on this very same subject, and how the Hamcrest library can make debugging this kind of test failure very very simple:
How Hamcrest Can Save Your Soul (part 1)
Hamcrest saves your soul - Now with less suffering! (part 2)
Basically, for an assertion like this:
assertThat(lukesFirstLightsaber, is(equalTo(maceWindusLightsaber)));
Hamcrest will give you back the output like this (in which only the fields that are different are shown):
Expected: is {singleBladed is true, color is PURPLE, hilt is {...}}
but: is {color is GREEN}
What you could do is render each object to XML using XStream, and then use XMLUnit to perform a comparison on the XML. If they differ, then you'll get the contextual information (in the form of an XPath, IIRC) telling you where the objects differ.
e.g. from the XMLUnit doc:
Comparing test xml to control xml [different]
Expected element tag name 'uuid' but was 'localId' -
comparing <uuid...> at /msg[1]/uuid[1] to <localId...> at /msg[1]/localId[1]
Note the XPath indicating the location of the differing elements.
Probably not fast, but that may not be an issue for unit tests.
Because of the way I tend to design complex objects, I have a very easy solution here.
When designing a complex object for which I need to write an equals method (and therefore a hashCode method), I tend to write a string renderer, and use the String class equals and hashCode methods.
The renderer, of course, is not toString: it doesn't really have to be easy for humans to read, and includes all and only the values I need to compare, and by habit I put them in the order which controls the way I'd want them to sort; none of which is necessarily true of the toString method.
Naturally, I cache this rendered string (and the hashCode value as well). It's normally private, but leaving the cached string package-private would let you see it from your unit tests.
Incidentally, this isn't always what I end up with in delivered systems, of course - if performance testing shows that this method is too slow, I'm prepared to replace it, but that's a rare case. So far, it's only happened once, in a system in which mutable objects were being rapidly changed and frequently compared.
The reason I do this is that writing a good hashCode isn't trivial, and requires testing(*), while making use of the one in String avoids the testing.
(* Consider that step 3 in Josh Bloch's recipe for writing a good hashCode method is to test it to make sure that "equal" objects have equal hashCode values, and making sure that you've covered all possible variations are covered isn't trivial in itself. More subtle and even harder to test well is distribution)
The code for this problem exists at http://code.google.com/p/deep-equals/
Use DeepEquals.deepEquals(a, b) to compare two Java objects for semantic equality. This will compare the objects using any custom equals() methods they may have (if they have an equals() method implemented other than Object.equals()). If not, this method will then proceed to compare the objects field by field, recursively. As each field is encountered, it will attempt to use the derived equals() if it exists, otherwise it will continue to recurse further.
This method will work on a cyclic Object graph like this: A->B->C->A. It has cycle detection so ANY two objects can be compared, and it will never enter into an endless loop.
Use DeepEquals.hashCode(obj) to compute a hashCode() for any object. Like deepEquals(), it will attempt to call the hashCode() method if a custom hashCode() method (below Object.hashCode()) is implemented, otherwise it will compute the hashCode field by field, recursively (Deep). Also like deepEquals(), this method will handle Object graphs with cycles. For example, A->B->C->A. In this case, hashCode(A) == hashCode(B) == hashCode(C). DeepEquals.deepHashCode() has cycle detection and therefore will work on ANY object graph.
Unit tests should have well-defined, single thing they test. This means that in the end you should have well-defined, single thing that can be different about those two object. If there are too many things that can differ, I would suggest splitting this test into several smaller tests.
I followed the same track you are on. I also had additionnal troubles:
we can't modify classes (for equals or toString) that we don't own (JDK), arrays etc.
equality is sometimes different in various contexts
For example, tracking entities equality might rely on database ids when available ("same row" concept), rely the equality of some fields (the business key) (for unsaved objects). For Junit assertion, you might want all fields equality.
So I ended up creating objects that run through a graph, doing their job as they go.
There is typically a superclass Crawling object:
crawl through all properties of the objects ; stop at:
enums,
framework classes (if applicable),
at unloaded proxies or distant connections,
at objects already visited (to avoid looping)
at Many-To-One relationship, if they indicate a parent (usually not included in the equals semantic)
...
configurable so that it can stop at some point (stop completely, or stop crawling inside the current property):
when mustStopCurrent() or mustStopCompletely() methods return true,
when encountering some annotations on a getter or a class,
when the current (class, getter) belong to a list of exceptions
...
From that Crawling superclass, subclasses are made for many needs:
For creating a debug string (calling toString as needed, with special cases for Collections and arrays that don't have a nice toString ; handling a size limit, and much more).
For creating several Equalizers (as said before, for Entities using ids, for all fields, or solely based on equals ;). These equalizers often need special cases also (for example for classes outside your control).
Back to the question : These Equalizers could remember the path to the differing values, that would be very useful your JUnit case to understand the difference.
For creating Orderers. For example, saving entities need to be done is a specific order, and efficiency will dictate that saving the same classes together will give a huge boost.
For collecting a set of objects that can be found at various levels in the graph. Looping on the result of the Collector is then very easy.
As a complement, I must say that, except for entities where performance is a real concern, I did choose that technology to implements toString(), hashCode(), equals() and compareTo() on my entities.
For example, if a business key on one or more fields is defined in Hibernate via a #UniqueConstraint on the class, let's pretend that all my entities have a getIdent() property implemented in a common superclass.
My entities superclass has a default implementation of these 4 methods that relies on this knowledge, for example (nulls need to be taken care of):
toString() prints "myClass(key1=value1, key2=value2)"
hashCode() is "value1.hashCode() ^ value2.hashCode()"
equals() is "value1.equals(other.value1) && value2.equals(other.value2)"
compareTo() is combine the comparison of the class, value1 and value2.
For entities where performance is of concern, I simply override these methods to not use reflexion. I can test in regression JUnit tests that the two implementations behave identically.
We use a library called junitx to test the equals contract on all of our "common" objects:
http://www.extreme-java.de/junitx/
The only way I can think of to test the different parts of your equals() method is to break down the information into something more granular. If you are testing a deeply-nested tree of objects, what you are doing is not truly a unit test. You need to test the equals() contract on each individual object in the graph with a separate test case for that type of object. You can use stub objects with a simplistic equals() implementation for the class-typed fields on the object under test.
HTH
I would not use the toString() because as you say, it is usually more useful for creating a nice representation of the object for display or logging purposes.
It sounds to me that your "unit" test is not isolating the unit under test. If, for example, your object graph is A-->B-->C and you are testing A, your unit test for A should not care that the equals() method in C is working. Your unit test for C would make sure it works.
So I would test the following in the test for A's equals() method:
- compare two A objects that have identical B's, in both directions, e.g. a1.equals(a2) and a2.equals(a1).
- compare two A objects that have different B's, in both directions
By doing it this way, with a JUnit assert for each comparison, you will know where the failure is.
Obviously if your class has more children that are part of determining equality, you would need to test many more combinations. What I'm trying to get at though is that your unit test should not care about the behavior of anything beyond the classes it has direct contact with. In my example, that means, you would assume C.equals() works correctly.
One wrinkle may be if you are comparing collections. In that case I would use a utility for comparing collections, such as commons-collections CollectionUtils.isEqualCollection(). Of course, only for collections in your unit under test.
If you're willing to have your tests written in scala you could use matchete. It is a collection of matchers that can be used with JUnit and provide amongst other things the ability to compare objects graphs:
case class Person(name: String, age: Int, address: Address)
case class Address(street: String)
Person("john",12, Address("rue de la paix")) must_== Person("john",12,Address("rue du bourg"))
Will produce the following error message
org.junit.ComparisonFailure: Person(john,12,Address(street)) is not equal to Person(john,12,Address(different street))
Got : address.street = 'rue de la paix'
Expected : address.street = 'rue du bourg'
As you can see here I've been using case classes, which are recognized by matchete in order to dive into the object graph.
This is done through a type-class called Diffable. I'm not going to discuss type-classes here, so let's say that it is the corner stone for this mechanism, which compare 2 instances of a given type. Types that are not case-classes (so basically all types in Java) get a default Diffable that uses equals. This isn't very useful, unless you provide a Diffable for your particular type:
// your java object
public class Person {
public String name;
public Address address;
}
// you scala test code
implicit val personDiffable : Diffable[Person] = Diffable.forFields(_.name,_.address)
// there you go you can now compare two person exactly the way you did it
// with the case classes
So we've seen that matchete works well with a java code base. As a matter of fact I've been using matchete at my last job on a large Java project.
Disclaimer : i'm the matchete author :)
This question is specifically related to overriding the equals() method for objects with a large number of fields. First off, let me say that this large object cannot be broken down into multiple components without violating OO principles, so telling me "no class should have more than x fields" won't help.
Moving on, the problem came to fruition when I forgot to check one of the fields for equality. Therefore, my equals method was incorrect. Then I thought to use reflection:
--code removed because it was too distracting--
The purpose of this post isn't necessarily to refactor the code (this isn't even the code I am using), but instead to get input on whether or not this is a good idea.
Pros:
If a new field is added, it is automatically included
The method is much more terse than 30 if statements
Cons:
If a new field is added, it is automatically included, sometimes this is undesirable
Performance: This has to be slower, I don't feel the need to break out a profiler
Whitelisting certain fields to ignore in the comparison is a little ugly
Any thoughts?
If you did want to whitelist for performance reasons, consider using an annotation to indicate which fields to compare. Also, this implementation won't work if your fields don't have good implementations for equals().
P.S. If you go this route for equals(), don't forget to do something similar for hashCode().
P.P.S. I trust you already considered HashCodeBuilder and EqualsBuilder.
Use Eclipse, FFS!
Delete the hashCode and equals methods you have.
Right click on the file.
Select Source->Generate hashcode and equals...
Done! No more worries about reflection.
Repeat for each field added, you just use the outline view to delete your two methods, and then let Eclipse autogenerate them.
If you do go the reflection approach, EqualsBuilder is still your friend:
public boolean equals(Object obj) {
return EqualsBuilder.reflectionEquals(this, obj);
}
Here's a thought if you're worried about:
1/ Forgetting to update your big series of if-statements for checking equality when you add/remove a field.
2/ The performance of doing this in the equals() method.
Try the following:
a/ Revert back to using the long sequence of if-statements in your equals() method.
b/ Have a single function which contains a list of the fields (in a String array) and which will check that list against reality (i.e., the reflected fields). It will throw an exception if they don't match.
c/ In your constructor for this object, have a synchronized run-once call to this function (similar to a singleton pattern). In other words, if this is the first object constructed by this class, call the checking function described in (b) above.
The exception will make it immediately obvious when you run your program if you haven't updated your if-statements to match the reflected fields; then you fix the if-statements and update the field list from (b) above.
Subsequent construction of objects will not do this check and your equals() method will run at it's maximum possible speed.
Try as I might, I haven't been able to find any real problems with this approach (greater minds may exist on StackOverflow) - there's an extra condition check on each object construction for the run-once behaviour but that seems fairly minor.
If you try hard enough, you could still get your if-statements out of step with your field-list and reflected fields but the exception will ensure your field list matches the reflected fields and you just make sure you update the if-statements and field list at the same time.
You can always annotate the fields you do/do not want in your equals method, that should be a straightforward and simple change to it.
Performance is obviously related to how often the object is actually compared, but a lot of frameworks use hash maps, so your equals may be being used more than you think.
Also, speaking of hash maps, you have the same issue with the hashCode method.
Finally, do you really need to compare all of the fields for equality?
You have a few bugs in your code.
You cannot assume that this and obj are the same class. Indeed, it's explicitly allowed for obj to be any other class. You could start with if ( ! obj instanceof myClass ) return false; however this is still not correct because obj could be a subclass of this with additional fields that might matter.
You have to support null values for obj with a simple if ( obj == null ) return false;
You can't treat null and empty string as equal. Instead treat null specially. Simplest way here is to start by comparing Field.get(obj) == Field.get(this). If they are both equal or both happen to point to the same object, this is fast. (Note: This is also an optimization, which you need since this is a slow routine.) If this fails, you can use the fast if ( Field.get(obj) == null || Field.get(this) == null ) return false; to handle cases where exactly one is null. Finally you can use the usual equals().
You're not using foundMismatch
I agree with Hank that [HashCodeBuilder][1] and [EqualsBuilder][2] is a better way to go. It's easy to maintain, not a lot of boilerplate code, and you avoid all these issues.
You could use Annotations to exclude fields from the check
e.g.
#IgnoreEquals
String fieldThatShouldNotBeCompared;
And then of course you check the presence of the annotation in your generic equals method.
If you have access to the names of the fields, why don't you make it a standard that fields you don't want to include always start with "local" or "nochk" or something like that.
Then you blacklist all fields that begin with this (code is not so ugly then).
I don't doubt it's a little slower. You need to decide whether you want to swap ease-of-updates against execution speed.
Take a look at org.apache.commons.EqualsBuilder:
http://commons.apache.org/proper/commons-lang/javadocs/api-3.2/org/apache/commons/lang3/builder/EqualsBuilder.html