Does equality test order affect performance in Java? - java

I commonly find myself writing code like this:
private List<Foo> fooList = new ArrayList<Foo>();
public Foo findFoo(FooAttr attr) {
    for (Foo foo : fooList) {
        if (foo.getAttr().equals(attr)) {
            return foo;
        }
    }
    return null; // no match found
}
However, assuming I properly guard against null input, I could also express the loop like this:
for (Foo foo : fooList) {
    if (attr.equals(foo.getAttr())) {
        return foo;
    }
}
I'm wondering if one of the above forms has a performance advantage over the other. I'm well aware of the dangers of premature optimization, but in this case, I think the code is equally legible either way, so I'm looking for a reason to prefer one form over another, so I can build my coding habits to favor that form. I think given a large enough list, even a small performance advantage could amount to a significant amount of time.
In particular, I'm wondering if the second form might be more performant because the equals() method is called repeatedly on the same object, instead of different objects? Maybe branch prediction is a factor?

I would offer 2 pieces of advice here:
Measure it
If nothing else points you in any given direction, prefer the form which makes most sense and sounds most natural when you say it out loud (or in your head!)

I think that considering branch prediction is worrying about efficiency at too low a level. However, I find the second example of your code more readable because you put the consistent object first. Similarly, if you were comparing this to some other object, I would put this first.
Of course, equals is defined by the programmer, so it could be asymmetric. You should make equals an equivalence relation so that this isn't the case. Even if you have an equivalence relation, the order could matter. Suppose that attr is an instance of a superclass of the various foo.getAttr() classes, and the first test of your equals method checks whether the other object is an instance of the same class. Then attr.equals(foo.getAttr()) will pass the first check but foo.getAttr().equals(attr) will fail the first check.
However, worrying about efficiency at this level seldom has benefits.
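A minimal sketch of how an instanceof-style first check can make the two call orders behave differently. BaseAttr and TaggedAttr are hypothetical classes, not from the question, and the pair is deliberately flawed: here the two orders even return different results, which is the asymmetry a well-behaved equals should avoid.

```java
// Hypothetical classes: a base attribute and a subclass, each of which
// starts equals() with an instanceof check against its own type.
class BaseAttr {
    final int id;

    BaseAttr(int id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof BaseAttr)) return false; // accepts subclasses too
        return id == ((BaseAttr) o).id;
    }

    @Override
    public int hashCode() { return id; }
}

class TaggedAttr extends BaseAttr {
    final String tag;

    TaggedAttr(int id, String tag) { super(id); this.tag = tag; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof TaggedAttr)) return false; // rejects a plain BaseAttr
        TaggedAttr other = (TaggedAttr) o;
        return id == other.id && tag.equals(other.tag);
    }
    // hashCode is inherited and depends only on id.
}

class EqualsOrderDemo {
    public static void main(String[] args) {
        BaseAttr attr = new BaseAttr(1);
        BaseAttr fromFoo = new TaggedAttr(1, "x");
        System.out.println(attr.equals(fromFoo)); // true
        System.out.println(fromFoo.equals(attr)); // false
    }
}
```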

This depends on the implementation of the equals methods. In this situation I assume that both objects are instances of the same class. So that would mean that the methods are equal. This makes no performance difference.

If both objects are of the same type, then they should perform the same. If not, then you can't really know in advance what's going to happen, but usually it will be stopped quite quickly (with an instanceof or something else).
For myself, I usually start the method with a non-null check on the given parameter and I then use the attr.equals(foo.getAttr()) since I don't have to check for null in the loop. Just a question of preference I guess.
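The approach described above could look roughly like this. It is a sketch only: the nested Foo class is a stand-in with a String attribute, and Objects.requireNonNull is one possible way to do the up-front check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

class GuardedFinder {
    // Stand-in for the question's Foo; the attribute is a String for brevity.
    static final class Foo {
        private final String attr;
        Foo(String attr) { this.attr = attr; }
        String getAttr() { return attr; }
    }

    private final List<Foo> fooList = new ArrayList<>();

    void add(Foo foo) { fooList.add(foo); }

    Foo findFoo(String attr) {
        Objects.requireNonNull(attr, "attr"); // one check, outside the loop
        for (Foo foo : fooList) {
            // attr is known to be non-null, so a null getAttr() just yields false
            if (attr.equals(foo.getAttr())) {
                return foo;
            }
        }
        return null; // no match
    }

    public static void main(String[] args) {
        GuardedFinder finder = new GuardedFinder();
        finder.add(new Foo(null));
        finder.add(new Foo("b"));
        System.out.println(finder.findFoo("b").getAttr()); // b
    }
}
```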

The only thing which doesn't affect performance is code which does nothing.
In some cases you have code which is much the same or the difference is so small it just doesn't matter. This is the case here.
Where it is useful to swap the .equals() around is when you have a known value which cannot be null (this doesn't appear to be the case here) or the type you are using is known.
e.g.
Object o = (Integer) 123;
String s = "Hello";
o.equals(s); // the type of equals is unknown and a virtual table lookup might be required
s.equals(o); // the type of equals is known and the class is final.
The difference is so small I wouldn't worry about it.
DEVENTER (n) A decision that's very hard to make because so little depends on it, such as which way to walk around a park
-- The Deeper Meaning of Liff by Douglas Adams and John Lloyd.

The performance should be the same, but in terms of safety, it's usually best to have the left operand be something that you are sure is not null, and have your equals method deal with null values.
Take for instance:
String s1 = null;
s1.equals("abc");
"abc".equals(s1);
The two calls to equals are not equivalent as one would issue a NullPointerException (the first one), and the other would return false.
The latter form is generally preferred for comparing with string constants for exactly this reason.
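The two call orders can be compared side by side in a small sketch. The method names are made up for illustration.

```java
class NullSafeEqualsDemo {
    // Constant first: never throws, returns false for null input.
    static boolean isAbc(String s) {
        return "abc".equals(s);
    }

    // Variable first: throws NullPointerException when s is null.
    static boolean isAbcUnsafe(String s) {
        return s.equals("abc");
    }

    public static void main(String[] args) {
        System.out.println(isAbc(null));  // false
        System.out.println(isAbc("abc")); // true
        try {
            isAbcUnsafe(null);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException");
        }
    }
}
```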

Related

Java: Use getter vs caching value

I have a getter that returns a String and I am comparing it to some other String. I check the returned value for null, so my if statement looks like this (and I really do exit early if it is true):
if (someObject.getFoo() != null && someObject.getFoo().equals(someOtherString)) {
return;
}
Performance-wise, would it be better to store the returned String rather than calling the getter twice like this? Does it even matter?
String foo = someObject.getFoo();
if (foo != null && foo.equals(someOtherString)) {
return;
}
To answer questions from the comments, this check is not performed very often and the getter is fairly simple. I am mostly curious how allocating a new local variable compares to executing the getter an additional time.
It depends entirely on what the getter does. If it's a simple getter (retrieving a data member), then the JVM will be able to inline it on-the-fly if it determines that code is a hot spot for performance. This is actually why Oracle/Sun's JVM is called "HotSpot". :-) It will apply aggressive JIT optimization where it sees that it needs it (when it can). If the getter does something complex, though, naturally it could be slower to use it and have it repeat that work.
If the code isn't a hot spot, of course, you don't care whether there's a difference in performance.
Someone once told me that the inlined getter can sometimes be faster than the value cached to a local variable, but I've never proven that to myself and don't know the theory behind why it would be the case.
Use the second block. The first block will most likely get optimized to the second anyway, and the second is more readable. But the main reason is that, if someObject is ever accessed by other threads, and if the optimization somehow gets disabled, the first block will throw no end of NullPointerException exceptions.
Also: even without multi-threading, if someObject is by any chance made volatile, the optimization will disappear. (Bad for performance, and, of course, really bad with multiple threads.) And lastly, the second block will make using a debugger easier (not that that would ever be necessary.)
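The race described above is hard to reproduce on demand, so the sketch below fakes it deterministically. Holder is a hypothetical class whose getter returns a value on the first read and null on the second, standing in for another thread clearing the field at an unlucky moment.

```java
class GetterTwiceDemo {
    // Hypothetical holder whose value disappears between two reads.
    static final class Holder {
        private int reads = 0;

        String getFoo() {
            return (reads++ == 0) ? "x" : null;
        }
    }

    // First form from the question: two getter calls.
    static boolean checkTwice(Holder h, String other) {
        return h.getFoo() != null && h.getFoo().equals(other);
    }

    // Second form: one getter call, result kept in a local variable.
    static boolean checkOnce(Holder h, String other) {
        String foo = h.getFoo();
        return foo != null && foo.equals(other);
    }

    public static void main(String[] args) {
        System.out.println(checkOnce(new Holder(), "x")); // true
        try {
            checkTwice(new Holder(), "x");
        } catch (NullPointerException e) {
            System.out.println("second read returned null");
        }
    }
}
```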
You can omit the first null check since equals does that for you:
The result is true if and only if the argument is not null and is a String object that represents the same sequence of characters as this object.
So the best solution is simply:
if (someOtherString.equals(someObject.getFoo()))
They both look the same, even performance-wise. Use the 1st block if you are sure you won't be using the returned value further; if not, use the 2nd block.
I prefer the second code block because it assigns foo and then foo cannot change to null/notnull.
Nulls are often required, and Java should solve this by adding the 'Elvis' operator:
if (someObject.getFoo()?.equals(someOtherString)) {
return;
}
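Java itself has no ?. operator, so the snippet above is wishful syntax rather than compilable code. One rough equivalent in existing Java uses Optional. This is a sketch of the idea, with safeEquals as a made-up helper name.

```java
import java.util.Optional;

class SafeNavigationDemo {
    // Roughly what a hypothetical foo?.equals(other) would mean:
    // false when foo is null, otherwise foo.equals(other).
    static boolean safeEquals(String foo, String other) {
        return Optional.ofNullable(foo)
                .map(f -> f.equals(other))
                .orElse(false);
    }

    public static void main(String[] args) {
        System.out.println(safeEquals(null, "a")); // false
        System.out.println(safeEquals("a", "a"));  // true
    }
}
```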

String intern in equals method

Is it good practice to use String#intern() in the equals method of a class? Suppose we have a class:
public class A {
private String field;
private int number;
@Override
public boolean equals(Object obj) {
if (obj == null) {
return false;
}
if (getClass() != obj.getClass()) {
return false;
}
final A other = (A) obj;
if ((this.field == null) ? (other.field != null) : !this.field.equals(other.field)) {
return false;
}
if (this.number != other.number) {
return false;
}
return true;
}
}
Will it be faster to use field.intern() != other.field.intern() instead of !this.field.equals(other.field)?
No! Using String.intern() implicitly like this is not a good idea:
It will not be faster. As a matter of fact it will be slower due to the use of a hash table in the background. A get() operation in a hash table contains a final equality check, which is what you want to avoid in the first place. Used like this, intern() will be called each and every time you call equals() for your class.
String.intern() has a lot of memory/GC implications that you should not implicitly force on users of this class.
If you want to avoid full blown equality checks when possible, consider the following avenues:
If you know that the set of strings is limited and you have repeated equality checks, you can use intern() for the field at object creation, so that any subsequent equality checks will come down to an identity comparison.
Use an explicit HashMap or WeakHashMap instead of intern() to avoid storing strings in the GC permanent generation - this was an issue in older JVMs, not sure if it is still a valid concern.
Keep in mind that if the set of strings is unbounded, you will have memory issues.
That said, all this sounds like premature optimization to me. String.equals() is pretty fast in the general case, since it compares the string lengths before comparing the strings themselves. Have you profiled your code?
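The "explicit map instead of intern()" idea could be sketched as below. StringCanonicalizer is a hypothetical helper, not a JDK class. It holds strong references, so it grows without bound if the set of strings does, which is the memory issue noted above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class StringCanonicalizer {
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    // Returns the first instance seen for any given character sequence.
    String canonical(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public static void main(String[] args) {
        StringCanonicalizer c = new StringCanonicalizer();
        String a = c.canonical(new String("test"));
        String b = c.canonical(new String("test"));
        System.out.println(a == b); // true: both calls return the same instance
    }
}
```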
Good practice : Nope. You're doing something tricky, and that makes for brittle, less readable code. Unless this equals() method needs to be crazy performant (and your performance tests validate that it is in fact faster), it's not worth it.
Faster : Could be. But don't forget that you can have unintended side effects from using the intern() method: http://www.onkarjoshi.com/blog/213/6-things-to-remember-about-saving-memory-with-the-string-intern-method/
Any benefit gained by performing an identity comparison on the interned Strings is likely to be outweighed by the associated cost of interning the Strings.
In the above case you could consider interning the String when you instantiate the class, providing the field is constant (in which case you should also mark it as final). You could also check for null on instantiation to avoid having to check on each call to equals (assuming you disallow null Strings).
However, in general these types of micro-optimisation offer little gain in performance.
Let's go through this one step at a time...
The idea here is that if you use String#intern, you'll be given a canonical representation of that String. A pool of Strings is kept internally and each entry is guaranteed to be unique for that pool with regard to equals. If you call intern() on a String, then either a previously pooled identical String is going to be returned, or the String you called intern on is going to be pooled and returned.
So if we have two Strings s1 and s2 and we assume neither is null, then the following two lines of code are considered equivalent:
s1.equals(s2);
s1.intern() == s2.intern();
Let's investigate two assumptions we've made now:
s1.intern() and s2.intern() really will return the same object if s1.equals(s2) evaluates to true.
Using the == operator on two interned references to the same String will be more efficient than using the equals method.
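For reference, the three comparisons can be tried on one pair of strings. The String.intern Javadoc documents that s.intern() == t.intern() holds exactly when s.equals(t). The sketch only illustrates that on a single pair and says nothing about how the pool is managed over time.

```java
class InternDemo {
    static boolean sameAfterIntern(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        String s1 = new String("test"); // new String(...) always creates a distinct object
        String s2 = new String("test");
        System.out.println(s1 == s2);                // false
        System.out.println(s1.equals(s2));           // true
        System.out.println(sameAfterIntern(s1, s2)); // true
    }
}
```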
The first assumption is probably the most dangerous of all. The JavaDoc for the intern method tells us that using this method will return a canonical representation for an internally kept pool of Strings. But it doesn't tell us anything about that pool. Once an entry has been added to the pool, can it ever be removed again? Will the pool keep growing indefinitely or will entries occasionally be culled to make it act as a limited-size cache? You'd have to check the actual specifications of the Java Language and Virtual Machine to get any certainty, if they offer it at all. Having to check specs for a limited optimization is usually a big warning sign. Checking the source code for Sun's JDK 7, I see that intern is specified as a native method. So not only is the implementation likely to be vendor-specific, it might vary across platforms as well for VMs from the same vendor. All bets are off regarding stuff that's not in the spec.
On to our second assumption. Let's consider for a moment what it would take to intern a String... First of all, we'll need to check if the String is already in the pool. We'll assume they've tried to get an O(1) complexity going there to keep this fast by using some hashing scheme. But that's assuming we've got a hash of the String. Since this is a native method, I'm not certain what would be used... Some hash of the native representation or simply what hashCode() returns. I know from the source code of Sun's JDK that a String instance caches its hash code. It'll only be calculated the first time the method is called, and after that the calculated value will be returned. So at the very least, a hash must be calculated at least once if we're to use that. Getting a reliable hash of a String will probably involve arithmetic on each and every character, which can be expensive for lengthy values. Even once we have the hash and thus a set of Strings that are candidates for being matches in the interned pool, we'd still have to verify if one of these really is an exact match which would involve... an equality check. Meaning going through each and every character of the Strings and seeing if they match, if trivial cases like unequal length can't be applied first. Worse still, we might have to do this for more than one other String, unlike with a regular equals, since multiple Strings in the pool might have the same hash or end up in the same hash bucket.
So, that stuff we need to do to find out if a String was already interned or not sounds suspiciously like what equals would need to do. Basically, we've gained nothing and might even have made our equals implementation more expensive. At least, if we're going to call intern each and every time. So maybe we should intern the String right away and simply always use that interned reference. Let's check how class A would look if that were the case. I'm assuming the String field is initialized on construction:
public class A {
private final String field;
public A(final String s) {
field = s.intern();
}
}
That's looking a little more sensible. Any Strings that are passed to the constructor and are equal will end up being the same reference. Now we can safely use == between the field field of A instances for equality checks, right?
Well, it'd be useless. Why? If you check the source for equals in class String, you'll find that any implementation made by someone with half a brain will first do a == check to catch the trivial case where the instance and the argument are the same reference first. That could save a potentially heavy char-by-char comparison. I know the JDK 7 source I'm using for reference does this. So you're still better off using equals because it does that reference check anyway.
The second reason this'd be a bad idea is that first point way up above... We simply don't know if the instances are going to be kept in the pool indefinitely. Check this scenario, which may or may not occur depending on JVM implementation:
String s1 = ... //Somehow gets passed a non-interned "test" value
A a1 = new A(s1);
//Lots of time passes... winter comes and goes and spring returns the land to a lush green...
String s2 = ... //Somehow gets passed a non-interned "test" value
A a2 = new A(s2);
a1.equals(a2); //Totally returns the wrong result
What happened? Well, if it turns out the interned String pool will sometimes be culled of certain entries, then that first construction of an A could have s1 interned, only to see it being removed from the pool, to have it later replaced by that s2 instance. Since s1 and s2 are conceivably different instances, the == check fails. Can this happen? I've got no idea. I certainly won't go check the specs and native code to find out. Will the programmer that's going through your code with a debugger to find out why the hell "test" is not considered the same as "test"?
It's no problem if we're using equals. It'll catch the same instance case early for optimal results, which will benefit us when we've interned our Strings, but we won't have to worry about cases where the instances still end up being different because then equals is gonna do the classic compare work. It just goes to show that it's best not to second-guess the actual runtime implementation or compiler, because these things were made by people who know the specs like the back of their hands and really worry about performance.
So String interning manually can be of benefit when you make sure that...
you're not interning each and every time, but just intern a String once like when initializing a field and then keep using that interned instance;
you still use equals to make sure implementation details won't ruin your day and your code doesn't actually rely on that interning, instead relying on the implementation of the method to catch the trivial cases.
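Put together, the two conditions above might look like this. InternedA is a hypothetical variant of the question's class A, written as a sketch. Correctness does not depend on the intern() call, which can be removed without changing the result of equals.

```java
import java.util.Objects;

final class InternedA {
    private final String field; // interned once, at construction
    private final int number;

    InternedA(String field, int number) {
        this.field = Objects.requireNonNull(field, "field").intern();
        this.number = number;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        InternedA other = (InternedA) obj;
        // equals, not ==: correctness must not rely on the interning
        return number == other.number && field.equals(other.field);
    }

    @Override
    public int hashCode() {
        return 31 * field.hashCode() + number;
    }

    public static void main(String[] args) {
        InternedA a1 = new InternedA(new String("test"), 1);
        InternedA a2 = new InternedA(new String("test"), 1);
        System.out.println(a1.equals(a2)); // true
    }
}
```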
After keeping this in mind, surely it's worth using intern()? Well, we still don't know how expensive intern() is. It's a native method so it might be really fast. But we're not sure unless we check the code for our target platform and JVM implementation. We've also had to make sure we understand exactly what interning does and what assumptions we've made about it. Are you sure the next person reading your code will have the same level of understanding? They might be bewildered about this weird method they've never seen before that dabbles in JVM internals and might spend an hour reading the same gibberish I'm typing right now, instead of getting work done.
That's the problem right there... Before, it was simple. You used equals and were done. Now, you've added another little thing that can nestle itself in your mind and cause you to wake up screaming one night because you've just realized that oh my God you forgot to take out one of the == uses and that piece of code is used in a routine controlling the killer bots' appraisal of citizen disobedience and you've heard its JVM isn't too solid!
Donald Knuth was famously attributed the quote...
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil"
Knuth was clever enough to add in that 97% detail. Sometimes, thoroughly micro-optimizing a small portion of code can make a big difference. Say, if that piece of code takes up 30% of the program's runtime execution. The problem with micro-optimizations is that they tend to work on assumptions. When you start using intern() and believe that from then on it'll be safe to make reference equality checks, you've made a hell of a lot of assumptions. And even if you go down to implementation level to check if they're right, are you sure they will be in the next JRE version?
I myself have used intern() manually. Did it in some piece of code where the same handful of Strings are gonna end up in hundreds if not thousands of object instances as fields. Those fields are gonna be used as keys in HashMaps and are frequently used while doing some validation over those instances. I figured interning was worth it for two purposes: reducing memory overhead by making all those equal Strings one single instance and speeding up the map lookups, since they're using hashCode() and equals. But I've made damn sure that you can take all those intern() calls out of the code and everything will still work fine. The interning is just some icing on the cake in this case, a little extra that may or may not make a bit of difference along the road. But it's not an essential part of my code's correctness.
Long post, eh? Why'd I go through the trouble of typing all of this up? To show you that if you make micro-optimizations, you'd better know damn well what you're doing and be willing to document it so thoroughly that you might as well not have bothered.
This is hard to say given that you have not specified hardware. Timing tests are difficult to get right and are not universal. Have you done a timing test yourself?
My feeling is that the intern pattern would not be faster as each string would need to be matched to a possible string in a dictionary of all interned strings.

Java = Return Object list/array vs. Result-Object (the same with method parameters)

This might seem to be a strange question: I am struggling to decide whether it is a good practice and "efficient" to work with "Typed Objects" on a very granular level.
public Object[] doSomething() {
Object[] resultList = new Object[] {new Foo(), new Bar()};
return resultList;
}
versus
public Result doSomething() {
Result result = new Result();
result.foo = new Foo();
result.bar = new Bar();
return result;
}
public class Result{
Foo foo;
Bar bar;
}
My question is concrete as follows:
In terms of CPU Cycles (as a relative figure), how much does the second approach consume more resources. (like 100% more)
The same question in regard to memory consumption
NB (these two are questions to understand it more, it's not about premature optimization)
In terms of "good design practice". Do you think version 1 is an absolute No-Go or do you rather think it actually does not matter...Or would you propose never returning "object Arrays" (((in an object oriented programming language)))...
This is something, I am always wondering if I should create dedicated Objects for everything (for passing values) or I should rather use generic objects (and common method parameters...)
The question also applies to
public doSomething(Query query )
versus
public doSomething(Foo foo, Bar bar, Aaaa a, Bbbbb b)
thanks
Markus
3.) In terms of "good design practice". Do you think version 1 is an absolute No-Go or do you rather think it actually does not matter...Or would you propose never returning "object Arrays" (((in an object oriented programming language/regarding encapsulation ...)))...
Version 1 is absolutely a no-go. It's almost completely untyped. The caller has to know the actual types and where they are in the array, and cast appropriately. You lose any useful compile-time type checking, and the code itself is significantly less clear.
I would never return an Object[] unless the values it contained were constructed with new Object().
I don't believe that defining a Result class and returning that consumes any more resources at run time than constructing an Object[]. (Granted, there's a minuscule cost for storing and loading the class definition.) Do you have data that indicate otherwise?
Returning an untyped object array is poor practice for various reasons, among which are:
It's prone to error.
It's harder to maintain.
Casting back to the "real" type is not free, either.
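The difference for the caller can be seen in a small sketch. Foo and Bar are empty placeholder classes, and the Result constructor is an addition for brevity.

```java
class ResultDemo {
    static final class Foo { }
    static final class Bar { }

    // Typed holder: field names and types are checked by the compiler.
    static final class Result {
        final Foo foo;
        final Bar bar;

        Result(Foo foo, Bar bar) {
            this.foo = foo;
            this.bar = bar;
        }
    }

    static Object[] doSomethingUntyped() {
        return new Object[] { new Foo(), new Bar() };
    }

    static Result doSomethingTyped() {
        return new Result(new Foo(), new Bar());
    }

    public static void main(String[] args) {
        Object[] raw = doSomethingUntyped();
        Foo first = (Foo) raw[0]; // caller must know the index and cast
        try {
            Foo wrong = (Foo) raw[1]; // compiles, but fails at run time
        } catch (ClassCastException e) {
            System.out.println("ClassCastException");
        }
        Foo typed = doSomethingTyped().foo; // no index, no cast
    }
}
```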
Regarding your other query:
public doSomething(Query query)
versus
public doSomething(Foo foo, Bar bar)
This is less clear-cut. If packaging up a Foo and a Bar into a Query object makes sense in the problem domain, then I would definitely do it. If it's just a packaging up for the sake of minimizing the number of arguments (that is, there's no "query object" concept in your problem domain), then I would probably not do it. If it's a question of run-time performance, then the answer is (as always) to profile.
I'd have to do an experiment to really know, but I'd guess that the object array would not be significantly faster. It might even be slower. After all, in either case you have to create an object: either the array object or the Result object. With the Result object you have to read the class definition from disk the first time you use it, and the class definition has to float around in memory, so there'd be some extra cost there. But with the array object you have to do casts when you pull the data out, and the JVM has to do bounds checking on the array (What happens if the caller tries to retrieve resultList[12]?), which also involves extra work. My guess is that if you do it only once or twice, the array would be faster (because of the class load time), but if you do it many times, the dedicated object would be faster (because of the cast and array access time). But I admit I'm just guessing.
In any case, even if the array does have a slight performance edge, the loss in readability and maintainability of the code almost surely isn't worth it.
The absolute worst thing that can happen is if values you're returning in the array are of the same class but have different semantic meanings. Like suppose you did this:
public Object[] getCustomerData(int customerid)
{
    String customerName=... however you get it ...
    BigDecimal currentDue=...
    BigDecimal pastDue=...
    return new Object[] {customerName, pastDue, currentDue};
}
... meanwhile, back at the ranch ...
Object[] customerData=getCustomerData(customerid);
BigDecimal pastDue=(BigDecimal)customerData[2];
if (pastDue.compareTo(BigDecimal.ZERO)>0) // BigDecimal cannot be compared with >
    sendNastyCollectionLetter();
Do you see the error? I retrieve entry #2 as pastDue when it's supposed to be #1. You could easily imagine this happening if a programmer in a moment of thoughtlessness counted the fields starting from one instead of zero. Or in a long list if he miscounted and said #14 when it's really #15. As both have the same data type, this will compile and run just fine. But we'll be sending inappropriate collection letters to customers who are not overdue. This would be very bad for customer relations.
Okay, maybe this is a bad example -- I just pulled it off the top of my head -- because we would be likely to catch that in testing. But what if the values we switched were rarely used, so that no one thought to include a test scenario for them. Or their effect was subtle, so that an error might slip through testing. For that matter, maybe you wouldn't catch this one in testing if you were rushing a change through, or if the tester slipped up, etc etc.
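With a dedicated class, the caller reads the value by name, so the index mix-up above cannot happen at the call site. This is a sketch only: CustomerData, its fixed example values, and needsCollectionLetter are all hypothetical.

```java
import java.math.BigDecimal;

class CustomerDataDemo {
    static final class CustomerData {
        final String customerName;
        final BigDecimal currentDue;
        final BigDecimal pastDue;

        CustomerData(String customerName, BigDecimal currentDue, BigDecimal pastDue) {
            this.customerName = customerName;
            this.currentDue = currentDue;
            this.pastDue = pastDue;
        }
    }

    // Hypothetical lookup; fixed values stand in for the real data source.
    static CustomerData getCustomerData(int customerId) {
        return new CustomerData("Example Customer", new BigDecimal("120.00"), BigDecimal.ZERO);
    }

    static boolean needsCollectionLetter(CustomerData data) {
        // The field is read by name, not by array position.
        return data.pastDue.compareTo(BigDecimal.ZERO) > 0;
    }

    public static void main(String[] args) {
        System.out.println(needsCollectionLetter(getCustomerData(42))); // false
    }
}
```

The two BigDecimal constructor arguments can still be swapped by mistake, but that risk now lives in one place instead of at every caller.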

extract boolean checks to local variables

Sometimes I extract boolean checks into local variables to achieve better readability.
What do you think?
Any disadvantages?
Does the compiler inline them or something if the variable isn't used anywhere else? I also thought about reducing the scope with an additional block "{}".
if (person.getAge() > MINIMUM_AGE && person.getTall() > MAXIMUM_SIZE && person.getWeight() < MAXIMUM_WEIGHT) {
    // do something
}
final boolean isOldEnough = person.getAge() > MINIMUM_AGE;
final boolean isTallEnough = person.getTall() > MAXIMUM_SIZE;
final boolean isNotToHeavy = person.getWeight() < MAXIMUM_WEIGHT;
if (isOldEnough && isTallEnough && isNotToHeavy) {
    // do something
}
I do this all the time. The code is much more readable that way. The only reason for not doing this is that it inhibits the runtime from doing shortcut optimisation, although a smart VM might figure that out.
The real risk in this approach is that it loses responsiveness to changing values.
Yes, people's age, weight, and height don't change very often, relative to the runtime of most programs, but they do change, and if, for example, age changes while the object your snippet read it from is still alive, your final isOldEnough could now yield a wrong answer.
And yet I don't believe putting isEligible into Person is appropriate either, since the knowledge of what constitutes eligibility seems to be of a larger scope. One must ask: eligible for what?
All in all, in a code review, I'd probably recommend that you add methods in Person instead.
boolean isOldEnough (int minimumAge) { return (this.getAge() > minimumAge); }
And so on.
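Carried through for all three checks, that suggestion might look like the sketch below. The field names, units, and thresholds are assumptions made for illustration. The caller supplies the limits, so the knowledge of what counts as eligible stays outside Person.

```java
class EligibilityDemo {
    static final class Person {
        private final int age;
        private final int heightCm;
        private final int weightKg;

        Person(int age, int heightCm, int weightKg) {
            this.age = age;
            this.heightCm = heightCm;
            this.weightKg = weightKg;
        }

        // Each check reads the current state at the time of the call.
        boolean isOldEnough(int minimumAge) { return age > minimumAge; }
        boolean isTallEnough(int minimumHeightCm) { return heightCm > minimumHeightCm; }
        boolean isLightEnough(int maximumWeightKg) { return weightKg < maximumWeightKg; }
    }

    // Hypothetical caller-side rule with made-up limits.
    static boolean mayRide(Person p) {
        return p.isOldEnough(12) && p.isTallEnough(140) && p.isLightEnough(120);
    }

    public static void main(String[] args) {
        System.out.println(mayRide(new Person(30, 180, 80))); // true
        System.out.println(mayRide(new Person(10, 180, 80))); // false
    }
}
```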
Your two blocks of code are not equivalent.
There are many cases that could be used to show this, but I will use one. Suppose that person.getAge() > MINIMUM_AGE were false and person.getTall() threw an exception.
In the first case, && short-circuits, so getTall() is never called and the if block is simply skipped, while the second case will throw an exception. In computability theory, a computation that throws an exception or never terminates is said to evaluate to 'the bottom element'. It has been shown that if a program terminates (does not resolve to bottom) under eager evaluation semantics (as in your second example), then it is guaranteed to terminate under lazy evaluation (your first example) as well. This is an important tenet of programming. Notice that you cannot write Java's && operator yourself as an ordinary method, because method arguments are always evaluated before the call.
While it is unlikely that your getTall() method will throw an exception, you cannot apply your reasoning to the general case.
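The difference shows up as soon as a later check can throw. In this sketch failingCheck is a made-up method that always throws, and the first operand is false.

```java
class ShortCircuitExceptionDemo {
    static boolean failingCheck() {
        throw new IllegalStateException("check failed");
    }

    // Form 1: && skips failingCheck() when the first operand is false.
    static boolean inlineForm(boolean first) {
        return first && failingCheck();
    }

    // Form 2: both operands are evaluated before they are combined.
    static boolean localsForm(boolean first) {
        final boolean a = first;
        final boolean b = failingCheck();
        return a && b;
    }

    public static void main(String[] args) {
        System.out.println(inlineForm(false)); // false
        try {
            localsForm(false);
        } catch (IllegalStateException e) {
            System.out.println("locals form threw: " + e.getMessage()); // locals form threw: check failed
        }
    }
}
```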
I think the checks probably belong in the Person class. You could pass in the min/max values, but calling person.isEligible() would be a better solution in my opinion.
You could go one step further and create subtypes of the Person:
Teenager extends Person
ThirdAgePerson extends Person
Kid extends Person
Subclasses will be overriding Person's methods in their own way.
One advantage to the latter case is that you will have the isOldEnough, isTallEnough, and isNotToHeavy (sic) variables available for reuse later in the code. It is also more easily readable.
You might want to consider abstracting those boolean checks into their own methods, or combining the check into a method. For example a person.isOldEnough() method which would return the value of the boolean check. You could even give it an integer parameter that would be your minimum age, to give it more flexible functionality.
I think this is a matter of personal taste. I find your refactoring quite readable.
In this particular case I might refactor the whole test into a
isThisPersonSuitable()
method.
If there were a lot of such code, I might even create a PersonInterpreter (maybe inner) class which holds a person and answers questions about their eligibility.
Generally I would tend to favour readability over any minor performance considerations.
The only possible negative is that you lose the benefits of the AND being short-circuited. But in reality this is only really of any significance if any of your checks is largely more expensive than the others, for example if person.getWeight() was a significant operation and not just an accessor.
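If both the names and the short-circuiting matter, one option (not proposed in the answers here, just a sketch) is to name the checks as BooleanSupplier lambdas, so each one runs only when && reaches it. expensiveWeight and the thresholds are made up for illustration.

```java
import java.util.function.BooleanSupplier;

class LazyNamedChecksDemo {
    static int weightLookups = 0;

    // Stand-in for a getter that does significant work.
    static int expensiveWeight() {
        weightLookups++;
        return 80;
    }

    static boolean eligible(int age) {
        BooleanSupplier isOldEnough = () -> age > 18;
        BooleanSupplier isNotTooHeavy = () -> expensiveWeight() < 100;
        // Each supplier is evaluated only if && reaches it.
        return isOldEnough.getAsBoolean() && isNotTooHeavy.getAsBoolean();
    }

    public static void main(String[] args) {
        System.out.println(eligible(10));  // false
        System.out.println(weightLookups); // 0
        System.out.println(eligible(30));  // true
        System.out.println(weightLookups); // 1
    }
}
```

Whether this reads better than plain local variables is a matter of taste, and for cheap accessors the extra indirection is unlikely to be worth it.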
I have nothing against your construct, but it seems to me that in this case the readability gain could be achieved by simply putting in line breaks, i.e.
if (person.getAge() > MINIMUM_AGE
    && person.getTall() > MAXIMUM_SIZE
    && person.getWeight() < MAXIMUM_WEIGHT)
{
    // do something
}
The bigger issue that other answers brought up is whether this belongs inside the Person object. I think the simple answer to that is: If there are several places where you do the same test, it belongs in Person. If there are places where you do similar but different tests, then they belong in the calling class.
Like, if this is a system for a site that sells alcohol and you have many places where you must test if the person is of legal drinking age, then it makes sense to have a Person.isLegalDrinkingAge() function. If the only factor is age, then having a MINIMUM_DRINKING_AGE constant would accomplish the same result, I guess, but once there's other logic involved, like different legal drinking ages in different legal jurisdictions or there are special cases or exceptions, then it really should be a member function.
On the other hand, if you have one place where you check if someone is over 18 and somewhere else where you check if he's over 12 and somewhere else where you check if he's over 65 etc etc, then there's little to be gained by pushing this function into Person.

Java equals(): to reflect or not to reflect

This question is specifically related to overriding the equals() method for objects with a large number of fields. First off, let me say that this large object cannot be broken down into multiple components without violating OO principles, so telling me "no class should have more than x fields" won't help.
Moving on, the problem came to fruition when I forgot to check one of the fields for equality. Therefore, my equals method was incorrect. Then I thought to use reflection:
--code removed because it was too distracting--
The purpose of this post isn't necessarily to refactor the code (this isn't even the code I am using), but instead to get input on whether or not this is a good idea.
Pros:
If a new field is added, it is automatically included
The method is much more terse than 30 if statements
Cons:
If a new field is added, it is automatically included, sometimes this is undesirable
Performance: This has to be slower, I don't feel the need to break out a profiler
Whitelisting certain fields to ignore in the comparison is a little ugly
Any thoughts?
If you did want to whitelist for performance reasons, consider using an annotation to indicate which fields to compare. Also, this implementation won't work if your fields don't have good implementations for equals().
P.S. If you go this route for equals(), don't forget to do something similar for hashCode().
P.P.S. I trust you already considered HashCodeBuilder and EqualsBuilder.
Use Eclipse, FFS!
Delete the hashCode and equals methods you have.
Right click on the file.
Select Source -> Generate hashCode() and equals()...
Done! No more worries about reflection.
Repeat for each field added: use the Outline view to delete your two methods, then let Eclipse regenerate them.
If you do go the reflection approach, EqualsBuilder is still your friend:
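// EqualsBuilder here is org.apache.commons.lang3.builder.EqualsBuilder (Apache Commons Lang 3)
@Override
public boolean equals(Object obj) {
    return EqualsBuilder.reflectionEquals(this, obj);
}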
Here's a thought if you're worried about:
1/ Forgetting to update your big series of if-statements for checking equality when you add/remove a field.
2/ The performance of doing this in the equals() method.
Try the following:
a/ Revert back to using the long sequence of if-statements in your equals() method.
b/ Have a single function which contains a list of the fields (in a String array) and which will check that list against reality (i.e., the reflected fields). It will throw an exception if they don't match.
c/ In your constructor for this object, have a synchronized run-once call to this function (similar to a singleton pattern). In other words, if this is the first object constructed by this class, call the checking function described in (b) above.
The exception will make it immediately obvious when you run your program if you haven't updated your if-statements to match the reflected fields; then you fix the if-statements and update the field list from (b) above.
Subsequent construction of objects will not do this check and your equals() method will run at its maximum possible speed.
Try as I might, I haven't been able to find any real problems with this approach (greater minds may exist on StackOverflow) - there's an extra condition check on each object construction for the run-once behaviour but that seems fairly minor.
If you try hard enough, you could still get your if-statements out of step with your field-list and reflected fields but the exception will ensure your field list matches the reflected fields and you just make sure you update the if-statements and field list at the same time.
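Steps (a) to (c) could be sketched roughly as follows. The Account class, its three fields, and the helper names are hypothetical; this is one possible reading of the answer, not the answerer's actual code:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class; the name Account and its fields are illustrative only.
class Account {
    // (b) Hand-maintained list of the fields that equals() compares.
    private static final String[] EQUALS_FIELDS = { "id", "owner", "balance" };
    private static boolean fieldListChecked = false;

    private final long id;
    private final String owner;
    private final long balance;

    Account(long id, String owner, long balance) {
        checkFieldListOnce(); // (c) only the first construction does real work
        this.id = id;
        this.owner = owner;
        this.balance = balance;
    }

    private static synchronized void checkFieldListOnce() {
        if (!fieldListChecked) {
            verifyFieldList(Account.class, EQUALS_FIELDS);
            fieldListChecked = true;
        }
    }

    // Throws if the listed names differ from the declared instance fields.
    static void verifyFieldList(Class<?> type, String[] listedNames) {
        Set<String> declared = new TreeSet<>();
        for (Field f : type.getDeclaredFields()) {
            // Skip static and compiler-generated fields.
            if (!Modifier.isStatic(f.getModifiers()) && !f.isSynthetic()) {
                declared.add(f.getName());
            }
        }
        Set<String> listed = new TreeSet<>(Arrays.asList(listedNames));
        if (!declared.equals(listed)) {
            throw new IllegalStateException(
                "equals() field list is out of date: declared=" + declared + ", listed=" + listed);
        }
    }

    // (a) Plain hand-written comparison, no reflection when comparing.
    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof Account)) return false;
        Account other = (Account) obj;
        return id == other.id
            && balance == other.balance
            && Objects.equals(owner, other.owner);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, owner, balance);
    }
}
```

Adding a field to Account without touching EQUALS_FIELDS makes the first constructor call throw, which is the reminder to update equals() and hashCode() as well.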
You can always annotate the fields you do/do not want in your equals method, that should be a straightforward and simple change to it.
Performance is obviously related to how often the object is actually compared, but a lot of frameworks use hash maps, so your equals may be being used more than you think.
Also, speaking of hash maps, you have the same issue with the hashCode method.
Finally, do you really need to compare all of the fields for equality?
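To illustrate the hash map point, here is a small sketch with a hypothetical CountingKey class that counts how often its equals() is invoked. Looking up a HashMap entry with a key that is equal to, but not the same instance as, the stored key goes through both hashCode() and equals():

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key class that counts how often equals() runs.
final class CountingKey {
    static int equalsCalls = 0;
    private final String value;

    CountingKey(String value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object obj) {
        equalsCalls++;
        return obj instanceof CountingKey && value.equals(((CountingKey) obj).value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }

    // Stores under one key instance, then looks up with a different but equal one.
    static String lookupWithEqualKey() {
        Map<CountingKey, String> map = new HashMap<>();
        map.put(new CountingKey("k"), "found");
        return map.get(new CountingKey("k"));
    }
}
```

After calling CountingKey.lookupWithEqualKey(), the equalsCalls counter is above zero even though the calling code never invoked equals() directly.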
You have a few bugs in your code.
You cannot assume that this and obj are the same class. Indeed, it's explicitly allowed for obj to be any other class. You could start with if ( !(obj instanceof MyClass) ) return false; however, this is still not correct because obj could be an instance of a subclass with additional fields that might matter.
You have to support null values for obj with a simple if ( obj == null ) return false;
You can't treat null and empty string as equal. Instead, treat null specially. The simplest way here is to start by comparing field.get(obj) == field.get(this). If both are null or both happen to point to the same object, this is fast. (Note: This is also an optimization, which you need since this is a slow routine.) If this fails, you can use the fast if ( field.get(obj) == null || field.get(this) == null ) return false; to handle cases where exactly one is null. Finally, you can use the usual equals().
You're not using foundMismatch
I agree with Hank that HashCodeBuilder and EqualsBuilder are a better way to go. They're easy to maintain, don't need a lot of boilerplate code, and you avoid all these issues.
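For reference, a reflective equals() that applies the checks listed above might look like the sketch below. Since the original code was removed from the question, the Labeled class and its fields are hypothetical. The class is declared final and compares getClass() so the subclass problem cannot arise; that is a choice made for this example, not something the question specified:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Objects;

// Hypothetical class; final so that no subclass can add fields this equals() would miss.
final class Labeled {
    private final String label;
    private final Integer count;

    Labeled(String label, Integer count) {
        this.label = label;
        this.count = count;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        // Handles null and any other class in one check.
        if (obj == null || obj.getClass() != getClass()) return false;
        try {
            for (Field f : getClass().getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
                Object mine = f.get(this);
                Object theirs = f.get(obj);
                if (mine == theirs) continue;                      // both null or same reference
                if (mine == null || theirs == null) return false;  // exactly one is null
                if (!mine.equals(theirs)) return false;            // fall back to equals()
            }
            return true;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public int hashCode() {
        return Objects.hash(label, count);
    }
}
```

With this version a null label and an empty-string label are different values, and comparing against null or against an object of another class returns false instead of throwing.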
You could use Annotations to exclude fields from the check
e.g.
@IgnoreEquals
String fieldThatShouldNotBeCompared;
And then of course you check the presence of the annotation in your generic equals method.
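A rough sketch of that idea follows. The IgnoreEquals annotation is not a standard one, so it is declared here, and the Customer class is hypothetical. The annotation needs runtime retention, otherwise reflection cannot see it:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Objects;

// Custom marker annotation (an assumption; not part of the JDK).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface IgnoreEquals {}

// Hypothetical class used only to demonstrate the annotation check.
final class Customer {
    private final String name;

    @IgnoreEquals
    private final String fieldThatShouldNotBeCompared;

    Customer(String name, String fieldThatShouldNotBeCompared) {
        this.name = name;
        this.fieldThatShouldNotBeCompared = fieldThatShouldNotBeCompared;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || obj.getClass() != getClass()) return false;
        try {
            for (Field f : getClass().getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
                if (f.isAnnotationPresent(IgnoreEquals.class)) continue; // excluded field
                if (!Objects.equals(f.get(this), f.get(obj))) return false;
            }
            return true;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    // Must ignore the same fields that equals() ignores.
    @Override
    public int hashCode() {
        return Objects.hash(name);
    }
}
```

Two Customer objects with the same name compare equal here even when the annotated field differs.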
If you have access to the names of the fields, why don't you make it a standard that fields you don't want to include always start with "local" or "nochk" or something like that?
Then you blacklist all fields whose names begin with that prefix (the code is not so ugly then).
I don't doubt it's a little slower. You need to decide whether you want to trade execution speed for ease of updates.
Take a look at org.apache.commons.lang3.builder.EqualsBuilder:
http://commons.apache.org/proper/commons-lang/javadocs/api-3.2/org/apache/commons/lang3/builder/EqualsBuilder.html
