Checking for deep equality in JUnit tests

I am writing unit tests for objects that are cloned, serialized, and/or written to an XML file. In all three cases I would like to verify that the resulting object is the "same" as the original one. I have gone through several iterations in my approach and, having found fault with all of them, am wondering what other people do.
My first idea was to manually implement the equals method in all the classes, and use assertEquals. I abandoned this approach after deciding that overriding equals to perform a deep compare on mutable objects is a bad thing, as you almost always want collections to use reference equality for mutable objects they contain[1].
Then I figured I could just rename the method to contentEquals or something. However, after thinking more, I realized this wouldn't help me find the sort of regressions I was looking for. If a programmer adds a new (mutable) field, and forgets to add it to the clone method, then he will probably forget to add it to the contentEquals method too, and all these regression tests I'm writing will be worthless.
I then wrote a nifty assertContentEquals function that uses reflection to check the value of all the (non-transient) members of an object, recursively if necessary. This avoids the problems with the manual compare method above since it assumes by default that all fields must be preserved and the programmer must explicitly declare fields to skip. However, there are legitimate cases when a field really shouldn't be the same after cloning[2]. I put in an extra parameter to assertContentEquals that lists which fields to ignore, but since this list is declared in the unit test, it gets real ugly real fast in the case of recursive checking.
So I am now thinking of moving back to including a contentEquals method in each class being tested, but this time implemented using a helper function similar to the assertContentEquals described above. This way when operating recursively, the exemptions will be defined in each individual class.
Any comments? How have you approached this issue in the past?
Edited to expound on my thoughts:
[1] I got the rationale for not overriding equals on mutable classes from this article. Once you stick a mutable object in a Set/Map, if a field changes then its hash will change but its bucket will not, breaking things. So the options are to not override equals/hashCode on mutable objects or to have a policy of never changing a mutable object once it has been put into a collection.
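To make the bucket problem concrete, here is a minimal sketch. MutablePoint and MutableKeyDemo are hypothetical names made up for illustration; they are not from our codebase or from the article.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical mutable class whose equals/hashCode depend on a mutable field.
final class MutablePoint {
    int x;

    MutablePoint(int x) {
        this.x = x;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof MutablePoint && ((MutablePoint) other).x == x;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(x);
    }
}

final class MutableKeyDemo {
    // Returns whether the set still finds the element after its field changed.
    static boolean foundAfterMutation() {
        Set<MutablePoint> points = new HashSet<>();
        MutablePoint p = new MutablePoint(1);
        points.add(p);  // stored under the hash computed while x == 1
        p.x = 2;        // the hash changes, but the stored entry does not move
        return points.contains(p);
    }
}
```

Here foundAfterMutation() returns false even though the very same instance is still in the set, which is the breakage described above.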
I didn't mention that I am implementing these regression tests on an existing codebase. In this context, the idea of changing the definition of equals, and then having to find all instances where it could change the behavior of the software, frightens me. I feel like I could easily break more than I fix.
[2] One example in our code base is a graph structure, where each node needs a unique identifier that is used to link the nodes when eventually written to XML. When we clone these objects we want the identifier to be different, but everything else to remain the same. After ruminating about it more, it seems like the questions "is this object already in this collection" and "are these objects defined the same" use fundamentally different concepts of equality in this context. The first is asking about identity and I would want the ID included if doing a deep compare, while the second is asking about similarity and I don't want the ID included. This is making me lean more against implementing the equals method.
Do you guys agree with this decision, or do you think that implementing equals is the better way to go?

I would go with the reflection approach and define a custom Annotation with RetentionPolicy.RUNTIME to allow the implementers of the tested classes to mark the fields that are expected to change after cloning. You can then check the annotation with reflection and skip the marked fields.
This way you can keep your test code generic and simple and have a convenient means to mark exceptions directly in the code without affecting the design or runtime behavior of the code that needs to be tested.
The annotation could look like this:
import java.lang.annotation.*;
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.FIELD})
public @interface ChangesOnClone
{
}
This is how it can be used in the code that is to be tested:
class ABC
{
    private String name;
    @ChangesOnClone
    private Cache cache;
}
And finally the relevant part of the test code:
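for ( Field field : fields )
{
    // getAnnotation() returns the annotation (or null), not a boolean,
    // so use isAnnotationPresent() for the check
    if( field.isAnnotationPresent( ChangesOnClone.class ) )
        continue; // expected to change after cloning, so skip it
    // else test it
}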
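For completeness, here is a rough, self-contained sketch of how the loop above could sit inside a recursive comparison helper. It is only one possible shape under stated assumptions: ContentAssert and GraphNode are hypothetical names, JDK types, enums and arrays are treated as leaves and compared by value, and there is no cycle detection, so it would need more work for cyclic graphs.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Objects;

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface ChangesOnClone {
}

// Hypothetical helper, not part of JUnit.
final class ContentAssert {
    private ContentAssert() {
    }

    static void assertContentEquals(Object expected, Object actual) {
        if (expected == actual) {
            return;
        }
        if (expected == null || actual == null || expected.getClass() != actual.getClass()) {
            throw new AssertionError("expected <" + expected + "> but was <" + actual + ">");
        }
        Class<?> type = expected.getClass();
        if (isLeaf(type)) {
            if (!Objects.deepEquals(expected, actual)) {
                throw new AssertionError("expected <" + expected + "> but was <" + actual + ">");
            }
            return;
        }
        // Walk the class hierarchy, stopping at JDK classes such as Object.
        for (Class<?> c = type; c != null && !isLeaf(c); c = c.getSuperclass()) {
            for (Field field : c.getDeclaredFields()) {
                if (isSkipped(field)) {
                    continue;
                }
                field.setAccessible(true);
                try {
                    assertContentEquals(field.get(expected), field.get(actual));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                } catch (AssertionError e) {
                    // Prefix the field name so the failing path is visible.
                    throw new AssertionError(c.getSimpleName() + "." + field.getName() + ": " + e.getMessage());
                }
            }
        }
    }

    // Assumption: JDK types, enums and arrays are compared by value, not by reflection.
    private static boolean isLeaf(Class<?> type) {
        return type.isArray() || type.isEnum() || type.getName().startsWith("java.");
    }

    private static boolean isSkipped(Field field) {
        int mods = field.getModifiers();
        return Modifier.isStatic(mods) || Modifier.isTransient(mods)
                || field.isSynthetic() || field.isAnnotationPresent(ChangesOnClone.class);
    }
}

// Hypothetical class under test: the id may differ between original and clone.
class GraphNode {
    @ChangesOnClone
    private final int id;
    private final String label;

    GraphNode(int id, String label) {
        this.id = id;
        this.label = label;
    }
}
```

With this sketch, two GraphNode instances that differ only in id compare as equal, while a differing label raises an AssertionError naming GraphNode.label.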

AssertJ offers a recursive comparison function:
assertThat(newObject).usingRecursiveComparison().isEqualTo(oldObject);
See the AssertJ documentation for details: https://assertj.github.io/doc/#basic-usage
Prerequisites for using AssertJ:
import:
import static org.assertj.core.api.Assertions.*;
maven dependency:
<!-- test -->
<dependency>
<groupId>org.assertj</groupId>
<artifactId>assertj-core</artifactId>
<version>3.19.0</version>
<scope>test</scope>
</dependency>

Related

How to unit test a method in an enum that might change

So I have an autogenerated enum where each enum contains several fields and I wish to test some of the logic of the methods contained in the enum. Examples could be "find all enums with this value in this field". However the enum can possibly change, more specifically, the values and the number of enum elements, but not the number of fields in each enum. This also includes the possibility of mocking the values() method.
Now I'm afraid if I make tests using specific values, those tests might fail if the values are no longer present in the enum.
So my options are either: Add elements to the existing enum that I might then use in the test or mock the entire enum with new values I can use in the test.
Now my question: what is good practice? I've read about PowerMock, however there seem to be differing opinions on this. Any better solutions? Am I looking at this wrong?

The part that can be easily answered: you don't need a mocking framework here.
You have enums of some content - and when you want to test their internals, a mocking framework is of no use. There is no point in mocking the values() when your goal is to test certain properties of these generated enums.
In other words: your test cases should boil down into code that fetches values and then somehow asserts() something on them. Worst case, you might have to use reflection, as in:
somehow collect the names of all enum classes to test (could be achieved by scanning class path content for example)
for each such enum - maybe use reflection to acquire certain fields - to then assert against expected results.
But most likely, the real answer is completely different: it is wrong to unit test generated code in the first place. Rather have unit tests to verify the code generator instead.
After all, when your unit tests find a problem in the generated enum, what will you do? Probably change your generator.
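If you do want tests on the enum itself, here is a minimal sketch of the "fetch values, then assert" idea, written as invariants that hold for whatever constants the generator emits. Planet, kind() and withKind() are hypothetical stand-ins for the generated enum and its lookup method; they are not names from the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the generated enum.
enum Planet {
    MERCURY("rock"), VENUS("rock"), JUPITER("gas");

    private final String kind;

    Planet(String kind) {
        this.kind = kind;
    }

    String kind() {
        return kind;
    }

    // The logic under test: find all constants with this value in this field.
    static List<Planet> withKind(String kind) {
        List<Planet> result = new ArrayList<>();
        for (Planet p : values()) {
            if (p.kind.equals(kind)) {
                result.add(p);
            }
        }
        return result;
    }
}

final class EnumInvariantCheck {
    // Asserts properties that do not depend on which constants exist.
    static void checkWithKind() {
        for (Planet p : Planet.values()) {
            List<Planet> matches = Planet.withKind(p.kind());
            if (!matches.contains(p)) {
                throw new AssertionError(p + " missing from its own kind lookup");
            }
            for (Planet m : matches) {
                if (!m.kind().equals(p.kind())) {
                    throw new AssertionError(m + " has the wrong kind");
                }
            }
        }
    }
}
```

Because the check iterates over values() instead of naming constants, it keeps passing when the generator adds or removes elements.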

How to determine if a Java method modifies an object passed as parameter

I come from a C++ background and I am currently learning Java. One question arose when I have tried using some third party libraries. How do I determine if the call to a method taking an object reference as parameter modifies the object?
In C++ this is clear thanks to the use of the const keyword. If the method signature is:
void foo(Boo& boo);
I know that the referenced object might be modified, while if the method signature is:
void foo(const Boo& boo);
The compiler guarantees that the referenced object is not modified.
I haven't seen something analogous in Java, as only the reference itself can be declared final, not the referenced object, and a final argument doesn't make much sense in the first place since it is passed by value anyway. Therefore, when I see a method such as:
void foo(Boo boo) {...}
How do I determine if the object referenced by boo is modified inside the body of the function (maybe using annotations)? If there is no way to know, is there some widely used convention or some best practices to avoid confusion and bugs?

how do I determine if the object referenced by boo is modified inside the body of the function (maybe using annotations)?
The only way is to read the code unfortunately.
If there is no way to know, is there some widely used convention or some best practices to avoid confusion and bugs?
The common convention is to pass an object which cannot be modified, using a wrapper if needed. This ensures the class cannot modify the object.
List<String> readOnly = Collections.unmodifiableList(list);
If the object is Cloneable, you can also use clone() but another common approach is to use a copy.
List<String> readOnly = new ArrayList<>(list);
If you care about such behaviour, unit tests can show whether a method modifies an object or not. If you have unit tests already, it is usually one or two lines extra to check for this.
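As a rough sketch of such a check (MutationCheck and its method names are made up for illustration, and the method under test is assumed to take a List<String>):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical test helper, assuming the method under test takes a List<String>.
final class MutationCheck {
    private MutationCheck() {
    }

    // Runs the method on a private copy and reports whether the copy changed.
    static boolean modifies(Consumer<List<String>> method, List<String> input) {
        List<String> before = new ArrayList<>(input);
        List<String> passed = new ArrayList<>(input);
        method.accept(passed);
        return !before.equals(passed);
    }

    // Passes a read-only view; a method that tries to modify it fails loudly.
    static boolean rejectsReadOnly(Consumer<List<String>> method, List<String> input) {
        try {
            method.accept(Collections.unmodifiableList(input));
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }
}
```

The snapshot comparison relies on the element type having a meaningful equals, which holds for String but would need a deep comparison for arbitrary objects.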

There's no such facility built in to the language, unfortunately. A good defensive practice is to define the data objects you pass around as immutable (i.e., without any public method that allows modifying their state). If you are really concerned about this, you could copy/clone an object before passing it to a method you don't trust, but this is usually a redundant precaution.
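A minimal sketch of such an immutable data object, using a hypothetical Basket class (the name and fields are assumptions for illustration): all fields are final, the constructor takes a defensive copy, and "changes" return a new instance.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable data object: no method changes its state.
final class Basket {
    private final String owner;
    private final List<String> items;

    Basket(String owner, List<String> items) {
        this.owner = owner;
        // Defensive copy, so later changes to the caller's list are not visible here.
        this.items = Collections.unmodifiableList(new ArrayList<>(items));
    }

    String owner() {
        return owner;
    }

    List<String> items() {
        return items; // read-only view, safe to hand out
    }

    // "Modification" returns a new object and leaves this one untouched.
    Basket withItem(String item) {
        List<String> copy = new ArrayList<>(items);
        copy.add(item);
        return new Basket(owner, copy);
    }
}
```

Any method that receives a Basket can read it freely, but cannot change what the caller sees afterwards.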

NOTE: this answer is a more detailed version of
You can also write purity or side-effect annotations in your code — mernst
There is the Checker Framework; among the various things it can check at compile time via annotations, it includes the IGJ immutability checker. This checker allows you to annotate object references with @Immutable or @ReadOnly.
The problem is that you often would have to annotate the library yourself. To ease your task the Checker Framework can automatically infer part of the annotations; you will still have to do much yourself.

A side effect analysis is not built into the Java language.
You can perform side effect analysis via manual inspection, but several tools exist to automate the process.
You can use an inference tool (1, 2, 3) to detect whether your code side-effects a parameter.
You can also write purity or side-effect annotations in your code and then use a checking/verification tool (1, 2) to ensure that your code conforms to the annotations you have written.
All of the above-linked tools have limitations, but you might find them useful. If you know of other tools, mention them in comments.

How do I determine if the object referenced by boo is modified inside
the body of the function (maybe using annotations)?
I must agree with the other answers that there is no direct way to determine whether a method will modify your object. To make sure that a method cannot modify your object, you have to take care of it on your side.
If there is no way to know, is there some widely used convention or
some best practices to avoid confusion and bugs?
This is where the method name comes in. Following naming conventions, some method declarations make it clear that your object will not be changed at all.
For example, you know that Arrays.copyOf will not change your actual array, and System.out.println(boo) will not change your boo.
Method names are a real weapon for giving the method user as much information as possible. (Yes, it's not always possible, but it is quite a good practice to follow.)
In your case, say printBoo will only print, copyBoo will only copy, clearBoo will reset all attributes, and checkAndCreateNewBoo will check your boo object and create a new one if required.
So, ultimately, if methods are named properly, the caller can be assured that the object will remain the same after calling the method.

As everyone says, prefer using immutable objects, and also avoid void methods.
The only possible purposes of a method like this
void foo(Boo boo) {...}
are to change the state of the object itself or to change the object passed as a parameter:
void completOrder(Order order) { ... }
//or
void parserTokenEnded(String str) { ... }

There is a convention: the method developer can mark a parameter as final if the method is not going to modify it.
public void test(final Object param)
However, very few people follow this, so it is difficult to rely on. Good programmers follow this rule, especially when writing an API. If you write a method and expose it, make the parameter final to indicate that the passed object is not going to be modified. Note that this is only a convention: final prevents reassigning param inside the method, not modifying the object it refers to.
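A small sketch of what final on a parameter does and does not cover. FinalParamDemo and addDefault are hypothetical names chosen for illustration.

```java
import java.util.List;

final class FinalParamDemo {
    private FinalParamDemo() {
    }

    // 'final' only stops the parameter variable from being reassigned.
    static void addDefault(final List<String> names) {
        names.add("default"); // compiles: the referenced list is still mutable
        // names = new java.util.ArrayList<>(); // would not compile: names is final
    }
}
```

So a caller who sees final in a signature still cannot conclude anything about the object being left untouched, unless the team treats it as a documented convention.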

Is passing 'this' in a method call accepted practice in java

Is it good/bad/acceptable practice to pass the current object in a method call? As in:
public class Bar{
public Bar(){}
public void foo(Baz baz){
// modify some values of baz
}
}
public class Baz{
//constructor omitted
public void method(){
Bar bar = new Bar();
bar.foo(this);
}
}
Specifically, is the line bar.foo(this) acceptable?

There's nothing wrong with that. What is NOT a good practice is to do the same inside constructors, because you would give a reference to a not-yet-completely-initialized object.
There is a sort of similar post here: Java leaking this in constructor
where they give an explanation of why the latter is a bad practice.
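A minimal sketch of that constructor problem, with hypothetical Recorder and Leaky classes made up for illustration: the receiver sees the object before its final field has been assigned.

```java
// Hypothetical collaborator that inspects whatever it is given.
final class Recorder {
    String observedName;

    void record(Leaky leaky) {
        observedName = leaky.name; // reads the field at the moment of the call
    }
}

final class Leaky {
    final String name;

    Leaky(Recorder recorder) {
        recorder.record(this); // 'this' escapes before name is assigned
        this.name = "ready";
    }
}
```

After new Leaky(recorder), recorder.observedName is null even though name is final and ends up as "ready". Moving the record(this) call out of the constructor, for example into a factory method that runs after construction, avoids the problem.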

There's no reason not to use it, this is the current instance and it's perfectly legitimate to use. In fact there's often no clean way to omit it.
So use it.
As it's hard to convince anyone that it's acceptable without an example (a negative answer to such a question is always easier to argue), I just opened one of the most common java.lang classes, String, and of course I found instances of this use, for example
// Argument is a String
if (cs.equals(this))
    return true;
Look for (this in big "accepted" projects, you won't fail to find it.

Yes, but you should be careful about two things
Passing this when the object has not been constructed yet (i.e. in its constructor)
Passing this to a long-living object, that will keep the reference alive and will prevent the this object from being garbage collected.

It's perfectly normal and perfectly acceptable.

this stands for the current object. What you are doing is syntactically correct, but I don't see a need for this if you are calling the method in the same class.

It is bad practice to pass the current object in a method call if there are less complex alternatives to achieve the same behaviour.
By definition, a bidirectional association is created as soon as this is passed from one object to another.
To quote Refactoring, by Martin Fowler:
Change Bidirectional Association to Unidirectional (200)
Bidirectional associations are useful, but they carry a price. The
price is the added complexity of maintaining the two-way links and
ensuring that objects are properly created and removed. Bidirectional
associations are not natural for many programmers, so they often are a
source of errors
...
You should use bidirectional associations when you need to but not
when you don’t. As soon as you see a bidirectional association is no
longer pulling its weight, drop the unnecessary end.
So, theoretically, we should be hearing alarm bells when we find we need to pass this and try really hard to think of other ways to solve the problem at hand. There are, of course, times when, at last resort, it makes sense to do it.
Also it is often necessary to corrupt your design temporarily, doing 'bad practice things', during a longer term refactoring of your code for an overall improvement. (One step back, two steps forward).
In practice I have found my code has improved massively by avoiding bidirectional links like the plague.

Yes, you can use it. It's quite common in programming to pass this. There are pros and cons to doing so, but it is not hazardous.

Just to add one more example where passing this is correct and follows good design: Visitor pattern. In Visitor design pattern, method accept(Visitor v) is typically implemented in a way it just calls v.visit(this).
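A compact sketch of that shape, with hypothetical Shape, Circle, Square and AreaVisitor types invented for illustration:

```java
// Hypothetical visitor over two shape types.
interface ShapeVisitor<R> {
    R visit(Circle circle);

    R visit(Square square);
}

interface Shape {
    <R> R accept(ShapeVisitor<R> visitor);
}

final class Circle implements Shape {
    final double radius;

    Circle(double radius) {
        this.radius = radius;
    }

    @Override
    public <R> R accept(ShapeVisitor<R> visitor) {
        return visitor.visit(this); // passing 'this' selects visit(Circle)
    }
}

final class Square implements Shape {
    final double side;

    Square(double side) {
        this.side = side;
    }

    @Override
    public <R> R accept(ShapeVisitor<R> visitor) {
        return visitor.visit(this); // passing 'this' selects visit(Square)
    }
}

final class AreaVisitor implements ShapeVisitor<Double> {
    @Override
    public Double visit(Circle circle) {
        return Math.PI * circle.radius * circle.radius;
    }

    @Override
    public Double visit(Square square) {
        return square.side * square.side;
    }
}
```

Here passing this is what makes the double dispatch work: the static type of this inside each accept method picks the matching visit overload.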

Acceptable
Snippet from Oracle JAVA docs:
Within an instance method or a constructor, this is a reference to the
current object — the object whose method or constructor is being
called. You can refer to any member of the current object from within
an instance method or a constructor by using this.
Using this with a Field
The most common reason for using the this keyword is because a field
is shadowed by a method or constructor parameter.

Everything in java is passed by value. But objects are NEVER passed to the method!
When Java passes an object to a method, it first makes a copy of the reference to the object, not a copy of the object itself. Hence passing this is a perfectly valid technique in Java, and a very commonly used one.
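A short sketch of the difference between reassigning the copied reference and mutating the shared object. ReferenceCopyDemo is a hypothetical name used only for illustration.

```java
final class ReferenceCopyDemo {
    private ReferenceCopyDemo() {
    }

    // Reassigning the parameter only changes the method's own copy of the reference.
    static void reassign(StringBuilder text) {
        text = new StringBuilder("replaced");
    }

    // Mutating through the copied reference changes the one shared object.
    static void mutate(StringBuilder text) {
        text.append(" world");
    }
}
```

A caller holding a StringBuilder containing "hello" still sees "hello" after reassign, and sees "hello world" after mutate.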

Java: What is the fastest way to inject fields using reflection?

Suppose I have a lot of classes which are constructed using Java reflection (for some reason). Now I need to post-inject values into fields which are annotated with @PostInject.
public class SomeClass {
    @PostInject
    private final String someString = null;
    public void someMethod() {
        // here, someString has a value.
    }
}
My question is: what is a fast way to set a field using reflection?
Remember, I need to do this very often on a lot of classes, that's why performance is relevant.
What I would do by intuition is shown by this pseudo-code:
get all fields of the class (getDeclaredFields, since the fields are private)
clazz.getDeclaredFields();
check which are annotated with @PostInject
eachField.getAnnotation(PostInject.class);
make these fields accessible
eachAnnotatedField.setAccessible(true);
set them to a certain value on the instance
eachAnnotatedField.set(instance, someValue);
I'm afraid that getting all fields is the slowest thing to do.
Can I somehow get a field directly, when I know it from the beginning?
NOTE: I can't just let the classes implement some interface, which would allow to set the fields using a method. I need POJOs.
NOTE2: Why I want post-field injection: From the point of view of an API user, it must be possible to use final fields. Furthermore, when the types and number of fields are not known by the API a priori, it is impossible to achieve field initialization using an interface.
NOTE2b: From the point of view of the user, the final contract is not broken. It stays final. First, a field gets initialized, then it can't be changed. By the way: there are a lot of APIs which use this concept, one of them is JAXB (part of the JDK).

How about doing steps 1 to 3 just after you constructed the object and saving the set of annotated fields that you obtain either in the object itself or by keeping a separate map of class to set-of-annotated-fields?
Then, when you need to update the injected fields in an object, retrieve the set from either the object or the separate map and perform step 4.
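A rough sketch of the separate-map variant, assuming a hypothetical PostInjector helper and a Greeter sample class (neither is from the question). The demo field is deliberately non-final, because whether reflection may write to a final field depends on the JDK version and configuration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface PostInject {
}

// Hypothetical helper: steps 1 to 3 run once per class, step 4 runs per object.
final class PostInjector {
    private static final Map<Class<?>, List<Field>> CACHE = new ConcurrentHashMap<>();

    private PostInjector() {
    }

    static List<Field> annotatedFields(Class<?> type) {
        return CACHE.computeIfAbsent(type, PostInjector::scan);
    }

    private static List<Field> scan(Class<?> type) {
        List<Field> found = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            if (field.isAnnotationPresent(PostInject.class)) {
                field.setAccessible(true);
                found.add(field);
            }
        }
        return Collections.unmodifiableList(found);
    }

    // Sets every annotated field whose type accepts the given value.
    static void inject(Object target, Object value) {
        for (Field field : annotatedFields(target.getClass())) {
            if (field.getType().isInstance(value)) {
                try {
                    field.set(target, value);
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
    }
}

// Hypothetical POJO; the field is non-final here to keep the demo portable.
class Greeter {
    @PostInject
    private String greeting;

    String greeting() {
        return greeting;
    }
}
```

After the first call for a class, only the Field.set calls remain on the hot path; the scan of declared fields and annotations is not repeated.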

Don't know if it's any good, but this project looks like it would do what you want. Quote:
A set of reflection utilities and
miscellaneous utilities related to
working with classes and their fields
with no dependencies which is
compatible with java 1.5 and generics.
The utilities cache reflection data
for high performance operation but
uses weak/soft caching to avoid
holding open ClassLoaders and causing
the caches to exist in memory
permanently. The ability to override
the caching mechanism with your own is
supported.

Another option, as you say you know the few fields concerned from the beginning, is to ask only for those fields or methods.
Example : see getDeclaredMethod or getDeclaredField in java/lang/Class.html
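As a small sketch of that idea, assuming a hypothetical Settings class with a private field named endpoint: the Field is looked up once by name and kept in a static constant.

```java
import java.lang.reflect.Field;

// Hypothetical POJO with a field whose name is known up front.
class Settings {
    private String endpoint;

    String endpoint() {
        return endpoint;
    }
}

final class KnownFieldAccess {
    // Looked up once by name and reused for every object.
    private static final Field ENDPOINT = lookup(Settings.class, "endpoint");

    private KnownFieldAccess() {
    }

    private static Field lookup(Class<?> type, String name) {
        try {
            Field field = type.getDeclaredField(name);
            field.setAccessible(true);
            return field;
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    static void setEndpoint(Settings target, String value) {
        try {
            ENDPOINT.set(target, value);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

This skips the scan over all declared fields, at the cost of hard-coding the field name.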

You can exploit existing frameworks that allow you to inject dependencies on object construction. For example, Spring allows you to do that with AspectJ weaving. The general idea is that you define bean dependencies at Spring level and just mark target classes in order to advise their object creation. Actual dependency resolution logic is injected directly into the class byte-code (it's possible to use either compile-time or load-time weaving).

Fastest way to do anything with reflection is to cache the actual Reflection API classes whenever possible. For example I very recently made a yet-another-dynamic-POJO-manipulator which I believe is one of those things everyone ends up doing at some point which enables me to do this:
Object o = ...
BeanPropertyController c = BeanPropertyController.of(o);
for (String propertyName : c.getPropertyNames()) {
if (c.access(propertyName) == null &&
c.typeOf(propertyName).equals(String.class)) {
c.mutate(propertyName, "");
}
}
The way it works is that it basically has that one controller object which lazyloads all the properties of the bean (note: some magic involved) and then reuses them as long as the actual controller object is alive. All I can say is that by just saving the Method objects themselves I managed to turn that thing into a damn fast thing and I'm quite proud of it and even considering releasing it assuming I can manage to sort out copyrights etc.

trying to use only one method name

When I was programming a Form Validator in PHP, when creating new methods, I needed to increase the number of arguments in old methods.
Later, when learning Java, I read that extends exists so that you do not have to touch previously tested, working code. That made me think I should not have increased the number of arguments in the old methods, but should instead have overridden the old methods with new ones.
Imagine you have to verify whether a field is empty in one part of the form, in another, and in yet another.
If the arguments are different, you can overload isEmpty. But if the arguments are the same, is it right to use isEmpty, isEmpty2, isEmpty3, or three classes with one isEmpty per class? If both are wrong, what should I have done?

So the question is:
If I need different behaviors for a method isEmpty which receives the same number of arguments, what should I do?
Use different names? ( isEmpty, isEmpty2, isEmpty3 )
Have three classes with a single isEmpty method?
Other?
If that's the question, then I think you should use:
Different names, when they belong to the same logical unit (they are the same sort of validation). But don't use numbers as versions; it is better to name them after what they do: isEmptyUser, isEmptyAddress, isEmptyWhatever.
Separate classes, when the validator object could be computed in one place and passed around during the program lifecycle. Let's say: Validator v = Validator.getInstance( ... ); and then use it as v.isEmpty() and let polymorphism do its job.
Alternatively you could pack the arguments in one class and pass it to the isEmpty method, although you'll end up with pretty much the same problem of the name. Still it's easier to refactor from there and have the new class doing the validation for you.
isEmpty( new Arguments(a,b,c ) ); => arguments.isEmpty();

The Open/Closed Principle [usually attributed to Bertrand Meyer] says that "software entities (classes, modules, functions, etc.) should be open for extension, but closed for modification". This might be the principle that you came across in your Java days. In real life this applies to completed code where the cost of modification, re-testing and re-certification outweighs the benefit of the simplicity gained by making a direct change.
If you are changing a method because it needs an additional argument, you might choose to use the following steps:
Copy the old method.
Remove the implementation from the copy.
Change the signature of the original method to add the new argument.
Update the implementation of the original method to use the new argument.
Implement the copy in terms of the new method with a default value for the argument.
If your implementation language doesn't support method overloading then the principle is the same but you need to find a new name for the new method signature.
The advantage of this approach is that you have added the new argument to the method, and your existing client code will continue to compile and run.
This works well if there is an obvious default for the new argument, and less well if there isn't.
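The end result of those steps can be sketched like this, assuming a hypothetical FieldChecks class and an ignoreWhitespace flag as the new argument (both invented for illustration):

```java
// Hypothetical validator after adding one argument to isEmpty.
final class FieldChecks {
    private FieldChecks() {
    }

    // Old signature: kept so existing callers compile, now a thin wrapper
    // that supplies the default for the new argument.
    static boolean isEmpty(String value) {
        return isEmpty(value, false);
    }

    // New signature: holds the real implementation.
    static boolean isEmpty(String value, boolean ignoreWhitespace) {
        if (value == null) {
            return true;
        }
        return ignoreWhitespace ? value.trim().isEmpty() : value.isEmpty();
    }
}
```

Existing calls such as FieldChecks.isEmpty(text) behave as before, while new code can opt in with FieldChecks.isEmpty(text, true).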

Since Java 5 you can use a variable-length argument list, as in void foo(Object... params)
You will need to come up with creative names for your methods, since you can't overload methods that have the same type and number of arguments (or based on return type). I actually personally prefer this to overloading anyway. So you can have isEmpty and isEmptyWhenFoo and isEmptyWhenIHaveTheseArguments (well, maybe not the last one :)

Not sure if this actually answers your question, but the best way to think about OO in "real life" is to think of the Nygaard Classification:
ObjectOrientedProgramming. A program execution is regarded as a physical model, simulating the behavior of either a real or imaginary part of the world.
So how would you build a physical device to do what you are trying to do in code? You'd probably have some kind of "Form" object, and the form object would have little tabs or bits connected to it to represent the different Form variables, and then you would build a Validator object that would take the Form object in a slot and then flash one light if the form was valid and another if it was invalid. Or your Validator could take a Form object in one slot and return a Form object out (possibly the same one), but modified in various ways (that only the Validator understood) to make it "valid". Or maybe a Validator is part of a Form, and so the Form has this Validator thingy sticking out of it...
My point is, try to imagine what such a machine would look like and how it would work. Then think of all of the parts of that machine, and make each one an object. That's how "object-oriented" things work in "real life", right?
With that said, what is meant by "extending" a class? Well, a class is a "template" for objects -- each object instance is made by building it from a class. A subclass is simply a class that "inherits" from a parent class. In Java at least, there are two kinds of inheritance: interface inheritance and implementation inheritance. In Java, you are allowed to inherit implementation (actual method code) from at most one class at a time, but you can inherit many interfaces -- which are basically just collections of attributes that someone can see from outside your class.
Additionally, a common way of thinking about OO programming is to think about "messages" instead of "method calls" (in fact, this is the original term invented by Alan Kay for Smalltalk, which was the first language to actually be called "object-oriented"). So when you send an isEmpty message to the object, how do you want it to respond? Do you want to be able to send different arguments with the isEmpty message and have it respond differently? Or do you want to send the isEmpty message to different objects and have them respond differently? Either is an appropriate answer, depending on the design of your code.

Instead of having one class provide multiple versions of isEmpty with differing names, try breaking down your model into finer-grained pieces that can be put together in more flexible ways.
Create an interface called Empty with one method, isEmpty(String value).
Create implementations of this interface, like EmptyIgnoreWhiteSpace and EmptyIgnoreZero.
Create a FormField class that has validation methods which delegate to implementations of Empty.
Your Form object will have instances of FormField which will know how to validate themselves.
Now you have a lot of flexibility: you can combine your Empty implementation classes to make new classes like EmptyIgnoreWhiteSpaceAndZero. You can use them in other places that have nothing to do with form field validation.
You don't have to have multiple similarly named methods polluting your object model.
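A rough sketch of those pieces. The exact rules are assumptions made for illustration: here EmptyIgnoreWhiteSpace treats blank strings as empty and EmptyIgnoreZero treats "0" as empty, which may not match what your forms need.

```java
// One small strategy interface instead of isEmpty, isEmpty2, isEmpty3.
interface Empty {
    boolean isEmpty(String value);
}

// Assumption: whitespace-only input counts as empty.
final class EmptyIgnoreWhiteSpace implements Empty {
    @Override
    public boolean isEmpty(String value) {
        return value == null || value.trim().isEmpty();
    }
}

// Assumption: the literal "0" counts as empty.
final class EmptyIgnoreZero implements Empty {
    @Override
    public boolean isEmpty(String value) {
        return value == null || value.isEmpty() || value.equals("0");
    }
}

// Combination built from the existing pieces: trim first, then apply the zero rule.
final class EmptyIgnoreWhiteSpaceAndZero implements Empty {
    private final Empty zeroRule = new EmptyIgnoreZero();

    @Override
    public boolean isEmpty(String value) {
        return zeroRule.isEmpty(value == null ? null : value.trim());
    }
}

// A field delegates the decision to whichever rule it was given.
final class FormField {
    private final String value;
    private final Empty emptiness;

    FormField(String value, Empty emptiness) {
        this.value = value;
        this.emptiness = emptiness;
    }

    boolean isEmpty() {
        return emptiness.isEmpty(value);
    }
}
```

Each form part picks its rule when it builds its FormField, so the call site is always field.isEmpty() regardless of which behavior applies.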
