Purpose of CompareObjectsWithEquals PMD rule - java

I've just fallen foul of the CompareObjectsWithEquals rule in PMD, because I've compared two object references using '==' instead of equals(), but I'm struggling to see why this is a problem and can't find any justification for this restriction.
I appreciate that Object.equals() compares references and therefore has the same effect, but I'm not using a raw Object, so I can't guarantee that method won't be overridden at some point somewhere in the hierarchy.
I want to do a reference comparison, and I want to be sure that it will always be a reference comparison. Why would PMD try to force me to call equals()?
Is it just me, or is this a really stupid rule??
Edited:
Just to be clear - I am not asking what the difference is between == and equals() (as per What is the difference between == vs equals() in Java?) - I understand this perfectly. I am asking why PMD would force me to always use equals() when the caller may legitimately want to ensure that a reference comparison is performed.

In your case, you know what you are doing and need to compare references, so the rule certainly does not apply, and you have to use ==.
But most of the time, using == is a mistake made by new Java developers who try to compare the values of objects with == instead of .equals().

Addition to @YMomb:
PMD and static analysis tools of that kind always leave the final decision to the user. You are fully entitled to ignore any rule if you believe that your design is correct.
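If you do keep ==, PMD lets you silence this one rule locally rather than globally, either with @SuppressWarnings("PMD.CompareObjectsWithEquals") on the enclosing method or class, or with a // NOPMD comment on the offending line. The sketch below uses the annotation; the class and method names are hypothetical, and the comment records why identity is intended so that a later reader does not "fix" it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical utility: finds the position of one specific instance in a list.
final class IdentitySearch {

    // PMD skips this rule for the annotated method only.
    @SuppressWarnings("PMD.CompareObjectsWithEquals")
    static int indexOfInstance(List<?> items, Object target) {
        for (int i = 0; i < items.size(); i++) {
            // Deliberate reference comparison: an equal but distinct object must not match.
            if (items.get(i) == target) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList(new String("x"), new String("x"));
        System.out.println(items.indexOf(items.get(1)));          // 0: equals() matches the first element
        System.out.println(indexOfInstance(items, items.get(1))); // 1: only the same instance matches
    }
}
```

The contrast with List.indexOf, which uses equals(), is exactly the case where switching to equals() would silently change the result.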


Checking objects for identity and Sonar issues

We are checking the quality of our code using Sonar, and Sonar found code which compares objects for identity like this:
if (cellOfInterest == currentCell) { … }
Sonar finds this kind of identity check peculiar enough to flag it as critical, and proposes replacing the identity check (using ==) with an equality check (using .equals()). The rationale is that an identity check is often not what was meant.
In our case, however, we iterate through a list of Cells and check in each iteration (currentCell) whether we are handling a special cell we already have (cellOfInterest).
I'd like to hear whether there are other common patterns, besides ours, that avoid this issue simply through a different design. Alternatively, what solutions would you propose to avoid an identity check in the situation described?
We considered a replacement of the identity check with an equality check as described above but it does not seem applicable in our situation because other cells might be "equal" as well but not identical.
All ideas are welcome!
If identity is what you need then it is what you need. The warning makes sense as it is often a bug but in this case it's not.
As an example, IdentityHashMap (which works with identity vs. equality) has this in its javadoc:
This class is not a general-purpose Map implementation! [...] This class is designed for use only in the rare cases wherein reference-equality semantics are required.
So it is rarely useful, but it has its uses. You are in a similar position. And of course, its internal code does exactly what you would expect: it uses == to look up a key.
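A quick way to see the difference between equality and identity semantics is to put two equal but distinct keys into each kind of map. This is a minimal sketch; the class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

final class IdentityMapDemo {
    // Puts two distinct String objects with the same contents into the given map.
    static int sizeAfterPuts(Map<String, Integer> map) {
        String k1 = new String("cell");
        String k2 = new String("cell");
        map.put(k1, 1);
        map.put(k2, 2);
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterPuts(new HashMap<>()));         // 1: the keys are equal()
        System.out.println(sizeAfterPuts(new IdentityHashMap<>())); // 2: the keys are not ==
    }
}
```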
Bottom line: I don't see why you would need to make the code more complex just because some static code analysis tool says it may be a problem - such a tool is supposed to help you, not to force you into some weird construct that will essentially do the same thing. I would explain why == is used in a comment for clarity, mark it as a false positive and move on.
If you really want to remove the warning, the only option I can think of is to use equals and change the equals method to either:
the default Object#equals which is based on identity
some implementation that uniquely identifies the cells, maybe based on some unique id or coordinates?
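For the second option, a minimal sketch might look like the following. It assumes (hypothetically, since the question does not show the class) that a Cell is identified by its row and column and that any other state is left out of equality. Whatever identifies the cell, hashCode() has to be overridden to agree with equals().

```java
import java.util.Objects;

// Hypothetical Cell identified by its grid position.
final class Cell {
    private final int row;
    private final int column;
    private String value; // mutable content, deliberately left out of equality

    Cell(int row, int column, String value) {
        this.row = row;
        this.column = column;
        this.value = value;
    }

    void setValue(String value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof Cell)) { // Cell is final, so no subclass can break symmetry
            return false;
        }
        Cell other = (Cell) o;
        return row == other.row && column == other.column;
    }

    @Override
    public int hashCode() {
        return Objects.hash(row, column); // uses the same fields as equals()
    }
}
```

This only avoids the identity check if the coordinates really are unique per cell; if two distinct cells can share a position, identity is still the right comparison.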
For Strings and most other Java objects, it is possible to have two instances that are not identical (== is false) but are equal according to .equals(). It's conventional to avoid == for comparison (using equals or compareTo instead), but if it works, it works. You can mark the item as a false positive in SonarQube.
Initially, you will not run into problems if you simply replace your identity check with an equals call (except that you will have to check cellOfInterest for null), because the default implementation of equals in Object is the identity check.
if (cellOfInterest != null && cellOfInterest.equals(currentCell)) { … }
will not break the code. It behaves exactly the same way as your code, provided that currentCell is never null.
To avoid the explicit null check (and keep the original result when both values are null), you can also use the following, available since Java 7:
if(Objects.equals(cellOfInterest, currentCell)) { ...}
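The null handling differs slightly between the three forms in this answer, which matters if both references can be null. The sketch below (hypothetical class and method names, with plain Object standing in for the Cell type) puts them side by side: only the hand-written check returns false when both arguments are null.

```java
import java.util.Objects;

final class NullSafeCompare {
    // The original identity check.
    static boolean identity(Object cellOfInterest, Object currentCell) {
        return cellOfInterest == currentCell;
    }

    // The hand-written form: always false when cellOfInterest is null.
    static boolean manual(Object cellOfInterest, Object currentCell) {
        return cellOfInterest != null && cellOfInterest.equals(currentCell);
    }

    // Java 7+: true when both are null, false when exactly one is null,
    // otherwise the result of cellOfInterest.equals(currentCell).
    static boolean viaObjects(Object cellOfInterest, Object currentCell) {
        return Objects.equals(cellOfInterest, currentCell);
    }

    public static void main(String[] args) {
        System.out.println(identity(null, null));   // true
        System.out.println(manual(null, null));     // false
        System.out.println(viaObjects(null, null)); // true
    }
}
```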
In general using equals is the better architecture.
To view both cases you mentioned:
If you can change the class yourself, you might (or might not) conclude that there is a better notion of equality than identity; in that case, just change equals (and do not forget hashCode!) in your class.
If you cannot change the class, you have to trust in a meaningful implementation of equals and hashCode by the provider of the class.

I don't understand this ("string" == "string") example

I found this java code on a java tutorial page:
if ("progress" == evt.getPropertyName())
http://download.oracle.com/javase/tutorial/uiswing/examples/components/index.html
How could this work? I thought we HAVE TO use the equals() method for this situation (string.equals("bla"))? Could we use equals() here too? Would it be better? Any ideas?
Edit: So if equals() would be better, I really don't understand why a serious Oracle tutorial page didn't use it. I also don't understand why it works, because I thought a string is an object. If I write object == object, that's a big problem.
Yes, equals() would definitely be better and correct. In Java, a pool of string constants is maintained and reused intelligently for performance. So this can work, but it is only guaranteed if evt.getPropertyName() is assured to return constants.
Also, the more correct version would be "progress".equals(evt.getPropertyName()), in case evt.getPropertyName() is null. Note that the implementation of String.equals starts with using == as a first test before doing char-by-char comparison, so performance will not be much affected versus the original code.
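Putting the literal first is what makes the check null-safe. In this sketch, name stands in for the value returned by evt.getPropertyName(); the class and method names are hypothetical.

```java
final class PropertyNameCheck {
    // Literal first: returns false instead of throwing when name is null.
    static boolean isProgress(String name) {
        return "progress".equals(name);
    }

    // Variable first: throws NullPointerException when name is null.
    static boolean isProgressVariableFirst(String name) {
        return name.equals("progress");
    }

    public static void main(String[] args) {
        System.out.println(isProgress(null));                   // false
        System.out.println(isProgress(new String("progress"))); // true: compares characters
        try {
            isProgressVariableFirst(null);
        } catch (NullPointerException e) {
            System.out.println("NPE from name.equals(...)");
        }
    }
}
```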
Which demo are we looking at?
This explains equals() vs ==
http://www.java-samples.com/showtutorial.php?tutorialid=221
It is important to understand that the equals( ) method and the == operator perform two different operations. As just explained, the equals( ) method compares the characters inside a String object. The == operator compares two object references to see whether they refer to the same instance. The following program shows how two different String objects can contain the same characters, but references to these objects will not compare as equal:
So in your particular example, I believe it is comparing the references to see whether they are the same reference, not whether the string characters match.
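The distinction described in the quoted tutorial can be reproduced with new String(...), which always creates a new object, so comparing it to the literal fails with == even though equals() succeeds. A minimal sketch with hypothetical names:

```java
final class StringIdentityDemo {
    static boolean sameInstance(String a, String b) {
        return a == b; // compares references
    }

    static boolean sameCharacters(String a, String b) {
        return a.equals(b); // compares contents
    }

    public static void main(String[] args) {
        String literal = "progress";
        String copy = new String("progress"); // new always allocates a distinct object
        System.out.println(sameInstance(literal, copy));   // false
        System.out.println(sameCharacters(literal, copy)); // true
    }
}
```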
The correct version of this code should be:
if ("progress".equals(evt.getPropertyName()))
This could work because of the way the JVM handles string constants: each string constant is interned (as if by String.intern()). So if evt.getPropertyName() returns a reference to a string constant, then using == will work. But it is bad form, and in general it will not work.
This would only work if evt.getPropertyName() returns a constant string with the value "progress".
By constant string, I mean one evaluated at compile time.
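The compile-time distinction can be seen directly. In this sketch (hypothetical class and method names), a concatenation of constant variables is a constant expression and therefore yields the interned instance, a concatenation involving a non-final local variable is evaluated at run time and produces a new String, and intern() maps that new String back to the pooled instance.

```java
final class ConstantStrings {
    static final String PREFIX = "prog"; // final and initialised with a constant: a constant variable

    static boolean constantExpression() {
        // Folded at compile time, so the result is the interned "progress".
        return PREFIX + "ress" == "progress";
    }

    static boolean runtimeConcatenation() {
        String prefix = "prog";         // not declared final: not a constant variable
        String built = prefix + "ress"; // computed at run time into a new String object
        return built == "progress";
    }

    static boolean afterIntern() {
        String prefix = "prog";
        String built = prefix + "ress";
        return built.intern() == "progress"; // intern() returns the pooled instance
    }

    public static void main(String[] args) {
        System.out.println(constantExpression());   // true
        System.out.println(runtimeConcatenation()); // false
        System.out.println(afterIntern());          // true
    }
}
```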
In most cases, when comparing Strings, using equals is best. However, if you know you'll be comparing the exact same String objects (not just two strings that have the same content), or if you're dealing entirely with constant Strings and you really care about performance, using == will be somewhat faster than using equals. You should normally use equals since you normally don't care about performance sufficiently to think about all the other prerequisites for using ==.
In this case, the author of the progress demo should probably have used equals - that code isn't especially performance-critical. However, in this particular case, the code will be dealing entirely with constant strings, so whilst it's probably not the best choice, especially for a demo, it is a valid choice.

Why doesn't Java warn about a == "something"?

This might sound stupid, but why doesn't the Java compiler warn about the expression in the following if statement:
String a = "something";
if(a == "something"){
System.out.println("a is equal to something");
}else{
System.out.println("a is not equal to something");
}
I realize why the expression is untrue, but AFAIK, a can never be equal to the String literal "something". The compiler should realize this and at least warn me that I'm an idiot who is coding way too late at night.
Clarification
This question is not about comparing two String object variables, it is about comparing a String object variable to a String literal. I realize that the following code is useful and would produce different results than .equals():
String a = iReturnAString();
String b = iReturnADifferentString();
if(a == b){
System.out.println("a is equal to b");
}else{
System.out.println("a is not equal to b");
}
In this case a and b might actually point to the same area in memory, even if it's not because of interning. In the first example though, the only reason it would be true is if Java is doing something behind the scenes which is not useful to me, and which I can't use to my advantage.
Follow-up question
Even if a and the string-literal both point to the same area in memory, how is that useful for me in an expression like the one above. If that expression returns true, there isn't really anything useful I could do with that knowledge, is there? If I was comparing two variables, then yes, that info would be useful, but with a variable and a literal it's kinda pointless.
Actually, they can indeed be the same reference, because Java interns string literals. String interning is the notion of keeping only one instance of each distinct string value at run time.
http://en.wikipedia.org/wiki/String_intern_pool
Java notes about string interning
http://javatechniques.com/blog/string-equality-and-interning/
Compiler warnings tend to be about things that are either blatantly wrong (conditionals that can never be true or false) or unsafe (unchecked casts). The use of == is valid, and in some rare cases intentional.
I believe all of Checkstyle, FindBugs and PMD will warn about this, and optionally a lot of other bad practices we tend to have when half asleep or otherwise incapacitated ;).
Because:
you might actually want to use ==, if working with constants and interned strings
the compiler would have to make an exception for String alone, and no other type. That is, whenever the compiler encounters ==, it would have to check whether the operands are Strings in order to issue a warning. And what if the arguments are Strings but are referred to as Object or CharSequence?
The rationale given by Checkstyle for issuing an error is that novice programmers often do this. And if you are a novice, it would be hard to configure Checkstyle (or PMD), or even to know about them.
Another thing is the actual scenario when strings are compared and there is a literal as one of the operands. First, it would be better to use a constant (static final) instead of a literal. And where would the other operand come from? It is likely that it will come from the same constant / literal, somewhere else in the code. So == would work.
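As a minimal sketch of that scenario, assuming a hypothetical event source that always hands back the same static final constant, == keeps working only as long as every caller passes that constant; a String with the same characters built any other way fails the check.

```java
// Hypothetical event source and listener sharing a single constant.
final class SharedConstant {
    static final String PROGRESS = "progress";

    // Stand-in for something like evt.getPropertyName() that always returns the constant.
    static String propertyName() {
        return PROGRESS;
    }

    static boolean isProgress(String name) {
        // Works with == only because callers pass the same constant instance;
        // equals() would be needed if the name could be built at run time.
        return name == PROGRESS;
    }

    public static void main(String[] args) {
        System.out.println(isProgress(propertyName()));         // true
        System.out.println(isProgress(new String("progress"))); // false
    }
}
```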
Depending on the context, both identity comparisons and value comparisons can be legitimate.
I can think of very few queries where there is a deterministic automated algorithm to figure out unambiguously that one of them is an error.
Therefore, there's no attempt to do this automatically.
If you think about things like caching, then there are situations where you would want to do this test.
Actually, it may sometimes be true, depending on whether Java takes an existing String from its internal String cache: it creates the String for the first declaration, stores it, and then reuses it for both declarations.
The compiler doesn't care that you're trying to do identity comparison against a literal. It could also be argued that it's not the compiler's job to be a code nanny. Look for a lint-like tool if you want to catch situations like this.
"I realize why the expression is untrue, but AFAIK, a can never be equal to the String literal "something"."
To clarify: in the example given, the expression is always true. a can be both == and equals() to the String literal, and in the example given it always is.
It is ironic that you appear to have given the rare counterexample to your own argument.
There are cases where you actually care whether you're dealing with exactly the same object rather than whether two objects are equal. In such cases, you need == rather than equals(). The compiler has no way of knowing whether you really wanted to compare the references for equality or the objects that they point to.
Now, it's far less likely that you're going to want == for strings than for a user-defined type, but that doesn't guarantee that you wouldn't want it. And even if it did, the compiler would have to special-case strings and specifically check that you didn't use == on them.
In addition, because strings are immutable, the JVM is free to make strings that would be equal per equals() share the same instance (to save memory), in which case they would also be equal per ==. So, depending on what the JVM does, == could very well return true. The example you gave is actually one where there's a decent chance of it, because both operands are string literals, so it would be fairly easy for the JVM to make them the same string, and it probably would. And, of course, if you want to see whether the JVM is making two strings share the same instance, you have to use == rather than equals(), so there's a legitimate reason to use == on strings right there.
So, the compiler has no way of knowing enough of what you're doing to know that using == instead of equals() should be an error. This can lead to bugs if you're not careful (especially if you're used to a language like C++ which overloads == instead of having a separate equals() function), but the compiler can only do so much for you. There are legitimate reasons for using == instead of equals(), so the compiler isn't going to flag it as an error.
There exist tools that will warn you about these constructs; feel free to use them. However there are valid cases when you want to use == on Strings, and it is much worse language design to warn a user about a perfectly valid construct than to fail to warn them. When you have been using Java a year or so (and I will bet good money that you haven't reached that stage yet) you will find avoiding constructs like this is second nature.

What sort of equality does the Apache Commons ObjectUtils equals method test for?

I have always understood there to be two types of equality in Java:
value equality: uses the .equals() method to test that two objects implement an equivalence relation on non-null object references.
reference equality: uses the == operator to test that two primitive values or memory locations are equal.
The following pages describe these language fundamentals in more detail.
Wikibooks Java Programming : Java Programming/Comparing Objects
xyzws Java EE FAQ : What are the differences between the equality operator and the equals method?
Java Platform API : Javadoc for Object.equals()
Java Language Specification : Equality Operators
What none of these links explicitly specify is what should happen if two null object references are compared for value equality. The implicit assumption is that a NullPointerException should be thrown but this is not what is done by the ObjectUtils.equals() method, which might be considered a best practice utility method.
What worries me is that Apache Commons seems to have effectively introduced a third measure of equality into Java by the back door and that the already confusing state of affairs might have been made greatly more complex. I call it a third measure of equality because it attempts to test for value equality and when that fails it falls back to testing for reference equality. The Apache Commons equality test has many similarities with the value equality and reference equality but is also distinctly different.
Am I right to be concerned and to want to avoid using ObjectUtils.equals() wherever possible?
Is there an argument for claiming that ObjectUtils.equals() provides a useful union of the other two measures of equality?
Chosen Answer
There doesn't seem to be a consensus opinion on this question but I decided to mark Bozho's as correct because he best drew my attention to what I now see as the greatest problem with null-safe equals checks. We should all be writing fail-fast code that addresses the root cause of why two null objects are being compared for value equality rather than trying to sweep the problem under the carpet.
Here's the code of ObjectUtils.equals(..):
public static boolean equals(Object object1, Object object2) {
if (object1 == object2) {
return true;
}
if ((object1 == null) || (object2 == null)) {
return false;
}
return object1.equals(object2);
}
ObjectUtils docs state clearly that the objects passed can be null.
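Since Java 7, java.util.Objects.equals has the same contract as the ObjectUtils.equals code quoted above: true for two nulls, false if exactly one is null, and otherwise the result of equals(). The sketch below (hypothetical class name) places the two side by side.

```java
import java.util.Objects;

final class EqualsSemantics {
    // Same three steps as the ObjectUtils.equals code quoted above.
    static boolean commonsStyle(Object object1, Object object2) {
        if (object1 == object2) {
            return true; // also covers null == null
        }
        if (object1 == null || object2 == null) {
            return false;
        }
        return object1.equals(object2);
    }

    // JDK 7+ equivalent.
    static boolean jdk(Object object1, Object object2) {
        return Objects.equals(object1, object2);
    }
}
```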
Now on the matter whether true should be returned if you compare two nulls. In my opinion - no, because:
when you compare two objects, you are probably going to do something with them later on. This will lead to a NullPointerException
passing two nulls to compare means that they came from somewhere instead of "real" objects, perhaps due to some problem. In that case, just comparing them is wrong: the program flow should have halted before that point.
In a custom library we're using here we have a method called equalOrBothNull() - which differs from the equals method in this utility in the null comparison.
Am I right to be concerned and to want to avoid using the ObjectUtils.equals() wherever possible?
No. What you need to consider equal depends on your requirements. And wanting to consider two nulls equal, and any non-null value unequal to null, without having to deal with NullPointerExceptions is a very, very common requirement (e.g. when you want to fire value-change events from a setter).
Actually, that's how equals() should work in general, and typically half of that behaviour is implemented (the API doc of Object.equals() states "For any non-null reference value x, x.equals(null) should return false."). That it doesn't work the other way round is mainly due to technical restrictions (the language was designed without multiple dispatch, to be simpler).
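As a sketch of the setter use case, assuming a hypothetical bean with a nullable property and a simple listener list (rather than java.beans.PropertyChangeSupport), the null-safe comparison decides whether a change event is needed:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Consumer;

// Hypothetical bean that notifies listeners only when its value really changes.
final class Person {
    private String nickname; // null means "no nickname" and is a legal value
    private final List<Consumer<String>> listeners = new ArrayList<>();

    void addNicknameListener(Consumer<String> listener) {
        listeners.add(listener);
    }

    void setNickname(String newNickname) {
        String oldNickname = this.nickname;
        this.nickname = newNickname;
        // null -> null is "no change"; null -> "x" and "x" -> null are changes.
        if (!Objects.equals(oldNickname, newNickname)) {
            for (Consumer<String> listener : listeners) {
                listener.accept(newNickname);
            }
        }
    }
}
```

Without the two-nulls-are-equal rule, the setter would need its own special case for null before it could call equals().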
If you are concerned about this, you could either 1) not use this method, or 2) write your own method that wraps it:
public class MyObjectUtils {
public static boolean equals(Object obj1, Object obj2) {
return obj1 != null && obj2 != null && ObjectUtils.equals(obj1, obj2);
}
}
To me it seems weird to allow for null to be equals to null, but this doesn't seem like a large problem. For the most part, I wouldn't expect my application to even get into code paths that involve equality tests if one or more objects are null.

Java equals(): to reflect or not to reflect

This question is specifically related to overriding the equals() method for objects with a large number of fields. First off, let me say that this large object cannot be broken down into multiple components without violating OO principles, so telling me "no class should have more than x fields" won't help.
Moving on, the problem came to light when I forgot to check one of the fields for equality, so my equals method was incorrect. Then I thought of using reflection:
--code removed because it was too distracting--
The purpose of this post isn't necessarily to refactor the code (this isn't even the code I am using), but instead to get input on whether or not this is a good idea.
Pros:
If a new field is added, it is automatically included
The method is much more terse than 30 if statements
Cons:
If a new field is added, it is automatically included, sometimes this is undesirable
Performance: This has to be slower, I don't feel the need to break out a profiler
Whitelisting certain fields to ignore in the comparison is a little ugly
Any thoughts?
If you did want to whitelist for performance reasons, consider using an annotation to indicate which fields to compare. Also, this implementation won't work if your fields don't have good implementations for equals().
P.S. If you go this route for equals(), don't forget to do something similar for hashCode().
P.P.S. I trust you already considered HashCodeBuilder and EqualsBuilder.
Use Eclipse, FFS!
Delete the hashCode and equals methods you have.
Right click on the file.
Select Source > Generate hashCode() and equals()...
Done! No more worries about reflection.
Repeat for each field added: just use the Outline view to delete your two methods, and then let Eclipse regenerate them.
If you do go the reflection approach, EqualsBuilder is still your friend:
@Override
public boolean equals(Object obj) {
return EqualsBuilder.reflectionEquals(this, obj);
}
Here's a thought if you're worried about:
1/ Forgetting to update your big series of if-statements for checking equality when you add/remove a field.
2/ The performance of doing this in the equals() method.
Try the following:
a/ Revert back to using the long sequence of if-statements in your equals() method.
b/ Have a single function which contains a list of the fields (in a String array) and which will check that list against reality (i.e., the reflected fields). It will throw an exception if they don't match.
c/ In your constructor for this object, have a synchronized run-once call to this function (similar to a singleton pattern). In other words, if this is the first object constructed by this class, call the checking function described in (b) above.
The exception will make it immediately obvious when you run your program if you haven't updated your if-statements to match the reflected fields; then you fix the if-statements and update the field list from (b) above.
Subsequent construction of objects will not do this check, and your equals() method will run at its maximum possible speed.
Try as I might, I haven't been able to find any real problems with this approach (greater minds may exist on StackOverflow) - there's an extra condition check on each object construction for the run-once behaviour but that seems fairly minor.
If you try hard enough, you could still get your if-statements out of step with your field-list and reflected fields but the exception will ensure your field list matches the reflected fields and you just make sure you update the if-statements and field list at the same time.
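A minimal sketch of steps a/ to c/, assuming a hypothetical class with three fields: the expected field list lives next to a hand-written equals(), and a synchronized run-once check in the constructor compares it against the fields reported by reflection. Static fields are skipped, since the check's own bookkeeping is static and such fields would not take part in equals() anyway.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical class illustrating the "check the field list once" idea.
final class BigObject {
    // (b) The fields equals() is supposed to compare.
    private static final String[] EQUALS_FIELDS = {"id", "name", "amount"};
    private static boolean fieldsChecked = false; // guarded by the class lock

    private final int id;
    private final String name;
    private final long amount;

    BigObject(int id, String name, long amount) {
        checkFieldsOnce(); // (c) cheap after the first call, apart from taking the lock
        this.id = id;
        this.name = name;
        this.amount = amount;
    }

    private static synchronized void checkFieldsOnce() {
        if (fieldsChecked) {
            return;
        }
        Set<String> actual = new HashSet<>();
        for (Field f : BigObject.class.getDeclaredFields()) {
            // Skip static bookkeeping and compiler-generated fields.
            if (!Modifier.isStatic(f.getModifiers()) && !f.isSynthetic()) {
                actual.add(f.getName());
            }
        }
        Set<String> expected = new HashSet<>(Arrays.asList(EQUALS_FIELDS));
        if (!actual.equals(expected)) {
            throw new IllegalStateException(
                "equals() field list " + expected + " does not match declared fields " + actual);
        }
        fieldsChecked = true;
    }

    // (a) Plain hand-written equals(): no reflection at comparison time.
    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof BigObject)) {
            return false;
        }
        BigObject o = (BigObject) obj;
        return id == o.id && amount == o.amount && Objects.equals(name, o.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, name, amount);
    }
}
```

Adding a fourth instance field without updating EQUALS_FIELDS makes the very first construction throw, which is the fail-fast behaviour the answer describes.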
You can always annotate the fields you do/do not want in your equals method, that should be a straightforward and simple change to it.
Performance is obviously related to how often the object is actually compared, but a lot of frameworks use hash maps, so your equals may be being used more than you think.
Also, speaking of hash maps, you have the same issue with the hashCode method.
Finally, do you really need to compare all of the fields for equality?
You have a few bugs in your code.
You cannot assume that this and obj are of the same class. Indeed, obj is explicitly allowed to be of any other class. You could start with if (!(obj instanceof MyClass)) return false; however, this is still not correct, because obj could be an instance of a subclass with additional fields that might matter.
You have to support null values for obj with a simple if ( obj == null ) return false;
You can't treat null and an empty string as equal. Instead, treat null specially. The simplest way here is to start by comparing field.get(obj) == field.get(this). If both are null or both point to the same object, this is fast. (Note: this is also an optimization, which you need since this is a slow routine.) If this fails, you can use the fast if (field.get(obj) == null || field.get(this) == null) return false; to handle cases where exactly one is null. Finally, you can use the usual equals().
You're not using foundMismatch
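A sketch of a reflective equals() with the fixes above applied: an identity fast path, a null check, an exact runtime-class check (so a subclass with extra fields is never considered equal), and per-field null handling. The helper and sample class names are hypothetical, and getDeclaredFields() only sees fields declared in the class itself, not inherited ones.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Objects;

// Hypothetical helper that compares the declared instance fields of two objects.
final class ReflectiveEquals {
    static boolean reflectiveEquals(Object self, Object obj) {
        if (self == obj) {
            return true;
        }
        if (obj == null) {
            return false; // x.equals(null) must be false
        }
        if (self.getClass() != obj.getClass()) {
            return false; // exact class match, so subclasses never compare equal
        }
        try {
            for (Field field : self.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                    continue; // only real instance state takes part
                }
                field.setAccessible(true);
                Object mine = field.get(self);
                Object theirs = field.get(obj);
                if (mine == theirs) {
                    continue; // same object, or both null: cheap fast path
                }
                if (mine == null || theirs == null) {
                    return false; // exactly one is null; null is not treated like ""
                }
                if (!mine.equals(theirs)) {
                    return false;
                }
            }
            return true;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Hypothetical sample class using the helper.
final class Sample {
    private final int count;
    private final String label;

    Sample(int count, String label) {
        this.count = count;
        this.label = label;
    }

    @Override
    public boolean equals(Object obj) {
        return ReflectiveEquals.reflectiveEquals(this, obj);
    }

    @Override
    public int hashCode() {
        return Objects.hash(count, label); // must use the same fields as equals()
    }
}
```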
I agree with Hank that HashCodeBuilder and EqualsBuilder are a better way to go. They are easy to maintain, don't need a lot of boilerplate code, and avoid all these issues.
You could use Annotations to exclude fields from the check
e.g.
@IgnoreEquals
String fieldThatShouldNotBeCompared;
And then of course you check the presence of the annotation in your generic equals method.
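A minimal sketch of that idea. The @IgnoreEquals annotation is hypothetical (it is not a standard annotation), and it needs runtime retention to be visible through reflection. The hashCode() has to skip the same fields, otherwise equal objects could have different hash codes.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Objects;

// Hypothetical marker annotation, retained at run time so reflection can see it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface IgnoreEquals {}

// Hypothetical class with one field excluded from equality.
final class Account {
    private final String id;
    @IgnoreEquals
    private final long lastAccessedMillis; // bookkeeping, not part of identity

    Account(String id, long lastAccessedMillis) {
        this.id = id;
        this.lastAccessedMillis = lastAccessedMillis;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj == null || getClass() != obj.getClass()) {
            return false;
        }
        try {
            for (Field f : getClass().getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.isAnnotationPresent(IgnoreEquals.class)) {
                    continue; // skip static and explicitly ignored fields
                }
                f.setAccessible(true);
                if (!Objects.equals(f.get(this), f.get(obj))) {
                    return false;
                }
            }
            return true;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public int hashCode() {
        return Objects.hash(id); // skips lastAccessedMillis, like equals()
    }
}
```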
If you have access to the names of the fields, why don't you make it a standard that fields you don't want to include always start with "local" or "nochk" or something like that.
Then you blacklist all fields that begin with this (code is not so ugly then).
I don't doubt it's a little slower. You need to decide whether you want to swap ease-of-updates against execution speed.
Take a look at org.apache.commons.lang3.builder.EqualsBuilder:
http://commons.apache.org/proper/commons-lang/javadocs/api-3.2/org/apache/commons/lang3/builder/EqualsBuilder.html
