Empty String Validation - Java

I've read through this SO thread: Java, check whether a string is not null and not empty?
The question that arises (and was not answered in that thread), is:
Why is string.isEmpty() better than using string.equals("")? In the same answer, the poster states that prior to Java 6, people would use string.length() == 0 as a way to check for empty strings. Am I missing something? They all seem to do exactly the same thing, making it just a matter of beautifying your code.

The best option is
"".equals(yourString)
as this avoids a NullPointerException.
If you use
string.equals("")
and your string is null, it will throw a NullPointerException.
isEmpty() has the same problem. Its Javadoc says it
returns true if, and only if, length() is 0
so if your string is null, calling isEmpty() on it will also cause an NPE.
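A minimal sketch of the difference (the variable name `s` is illustrative):

```java
public class NullSafeEmptyCheck {
    public static void main(String[] args) {
        String s = null;

        // Constant-first comparison: safe even when s is null.
        System.out.println("".equals(s));  // false, no exception

        // Calling any method on a null reference throws.
        try {
            s.isEmpty();
        } catch (NullPointerException e) {
            System.out.println("NPE as expected");
        }
    }
}
```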

Why is string.isEmpty() better than using string.equals("") ?
Just look at the source of String#isEmpty():
public boolean isEmpty() {
return count == 0;
}
It simply compares the count variable stored as part of the class with 0. No heavy computation; the value is already there.
The String.equals() method, on the other hand, does more work: a reference comparison, an instanceof check, a cast, and a length comparison before the character-by-character loop (see the String#equals(Object) source code). So, to avoid those runtime operations, you can simply use isEmpty() to check for empty strings.
The difference is only minute, though.
Note: The Oracle Java 7 version of String.isEmpty() uses value.length == 0, as it no longer stores the count and offset variables. In OpenJDK 7, though, it still uses the count variable.
But still, the value.length computation will be a bit faster than all those operations in equals() method.
Apart from the performance difference, which is really not of much concern, and the difference if any would be minute, the String.isEmpty() method seems more clear about your intents. So, I prefer to use that method for checking for empty strings.
And finally, of course, don't just take anyone's word for it: benchmark your code using both methods and see whether there is any measurable difference.

The call to isEmpty() expresses your intentions better, thus improving readability. Readers spend less time working out the intention behind the check: rather than wondering whether you are interested in the string's equality to something or in its length, they see that all you want to know is whether the string is empty.
There is no performance difference, as isEmpty() is implemented by checking the length.
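For a non-null string, all three historical idioms agree, which is why the choice is purely about readability:

```java
public class EmptyChecks {
    public static void main(String[] args) {
        String s = "";
        // Three equivalent checks for a non-null string.
        System.out.println(s.isEmpty());      // true
        System.out.println(s.equals(""));     // true
        System.out.println(s.length() == 0);  // true
    }
}
```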

As Lucas stated, one is not better than the other. isEmpty() is a little more readable, nothing more, nothing less. There's no significant performance hit either way.

There are many ways to get the answer; it all depends on the programming language and how you program.
For example, in C#:
if (String.IsNullOrEmpty(s))
return "is null or empty";

Use the util method provided by apache commons
http://commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/apache/commons/lang/StringUtils.html#isEmpty%28java.lang.String%29
If you feel like you can even write your own isEmpty(String input) method
This approach has the following advantages:
It makes your intentions clear
You are safe in case of nulls.
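A sketch of such a helper, roughly mirroring what the Apache Commons method does (the class and method names here are my own):

```java
public final class StringUtil {
    private StringUtil() {}

    // Null-safe: returns true for both null and "".
    public static boolean isNullOrEmpty(String input) {
        return input == null || input.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isNullOrEmpty(null));  // true
        System.out.println(isNullOrEmpty(""));    // true
        System.out.println(isNullOrEmpty("x"));   // false
    }
}
```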


Why does ImmutableCollection.contains(null) fail?

Question ahead:
why does in Java the call coll.contains(null) fail for ImmutableCollections?
I know, that immutable collections cannot contain null-elements, and I do not want to discuss whether that's good or bad.
But when I write a Function, that takes a (general, not explicit immutable) Collection, it fails upon checking for nulls. Why does the implementation not return false (which is actually the 'correct' answer)?
And how can I properly check for nulls in a Collection in general?
Edit:
with some discussions (thanks to the commenters!) I realized, that I mixed up two things: ImmutableCollection from the guava library, and the List returned by java.util.List.of, being some class from ImmutableCollections. However, both classes throw an NPE on .contains(null).
My problem was with the List.of result, but technically the same would happen with Guava's implementation. [edit: It does not]
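The List.of behaviour described above is easy to reproduce:

```java
import java.util.List;

public class ContainsNullDemo {
    public static void main(String[] args) {
        List<String> list = List.of("a", "b");
        try {
            // Lists from List.of are specified to throw for contains(null).
            list.contains(null);
            System.out.println("no exception");
        } catch (NullPointerException e) {
            System.out.println("NPE from contains(null)");
        }
    }
}
```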
I am distressed by this discussion!
Collections that do this have been a pet peeve of mine since before I wrote the first collections that eventually became Guava. If you find any Guava collection that throws NPE just because you asked it a perfectly innocent question like .contains(null), please file a bug! We hate that crap.
EDIT: I was so distressed that I had to go back to look at my 2007 changelist that first created ImmutableSet and saw literally this:
@Override public boolean contains(@Nullable Object target) {
  if (target == null) {
    return false;
  }
ahhhhh.
why does in Java the call coll.contains(null) fail for ImmutableCollections?
Because the design team (the ones who created Guava) decided that, for their collections, null is unwanted, and therefore any interaction between their collections and a null, even a mere check, should just throw, to highlight to the programmer at the earliest possible opportunity that there is a mismatch. This is so even though the established behaviour (as per the existing implementations in the core runtime itself, such as ArrayList and friends, as well as the javadoc) rather explicitly goes the other way and says that a non-sequitur check (is this pear part of this list of apples?) strongly suggests that the right move is to just return false and not throw.
In other words, Guava messed up. But now that they have done so, going back is potentially backwards-compatibility breaking. It really isn't much of a break - you are replacing a thrown exception with a false return value; presumably code could be out there that relies on the NPE (catching it and doing something different from what the code would do had contains(null) returned false instead of throwing) - but that's a rare case, and Guava breaks backwards compatibility all the time.
And how can I properly check for nulls in a Collection in general?
By calling .contains(null), just as you are. The fact that guava doesn't do it right doesn't change the answer. You might as well ask 'how do I add elements to a list', and counter the answer of "well, you call list.add(item) to do that" with: Well, I have this implementation of the List interface that plays Rick Astley over the speaker instead of adding to the list, so, I reject your answer.
That's.. how java and interfaces work: You can have implementations of them, and the only guardianship that they do what the interface dictates they must, is that the author understands there is a contract that needs to be followed.
Now, normally a library so badly written they break contract for no good reason*, isn't popular. But guava IS popular. Very popular. That gets at a simple truth: No library is perfect. Guava's API design is generally quite good (in my opinion, vastly superior to e.g. Apache commons libraries), and the team actively spends a lot of time debating proper API design, in the sense that the code that one would write using guava is nice (as defined by: Easy to understand, has few surprises, easy to maintain, easy to test, and probably easy to mutate to deal with changing requirements - the only useful definition for nebulous terms like 'nice' or 'elegant' code - it's code that does those things, anything else is pointless aesthetic drivel). In other words, they are actively trying, and they usually get it right.
Just, not in this case. Work around it: return item != null && coll.contains(item); will get the job done.
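The suggested workaround can be wrapped in a small helper (the name `containsSafe` is mine):

```java
import java.util.Collection;
import java.util.List;

public class NullSafeContains {
    // Short-circuits on null instead of delegating to a
    // possibly null-hostile collection implementation.
    static boolean containsSafe(Collection<?> coll, Object item) {
        return item != null && coll.contains(item);
    }

    public static void main(String[] args) {
        List<String> list = List.of("a", "b");
        System.out.println(containsSafe(list, "a"));   // true
        System.out.println(containsSafe(list, null));  // false, no NPE
    }
}
```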
There is one major argument in favour of Guava's choice: the 'contract break' is an implicit one - one would expect that .contains(null) works and always returns false, but it's not explicitly stated in the javadoc that one must do this. Contrast with e.g. IdentityHashMap, which uses identity equivalence (a == b) and not value equality (a.equals(b)) in its .containsKey etc. implementations, which explicitly goes against the javadoc contract as stated in the j.u.Map interface. IHM has an excellent reason for it, and highlights the discrepancy, plus explains the reason, in its javadoc. Guava isn't nearly as clear about its bizarre null behaviour, but here's a crucial thing about null in Java:
Its meaning is nebulous. Sometimes it means 'empty', which is bad design: you should never write if (x == null || x.isEmpty()) - that implies some API is badly coded. If null is semantically equivalent to some value (such as "" or List.of()), then you should just return "" or List.of(), and not null. However, in such a design, list.contains(null) == false would make sense.
But sometimes null means not found, irrelevant, not applicable, or unknown (for example, if map.get(k) returns null, that's what it means: Not found. Not 'I found an empty value for you'). This matches with what NULL means in e.g. SQL. In all those cases, .contains(null) should be returning neither true nor false. If I hand you a bag of marbles and ask you if there is a marble in there that is grue, and you have no idea what grue means, you shouldn't answer either yes or no to my query: Either answer is a meaningless guess. You should tell me that the question cannot be answered. Which is best represented in java by throwing, which is precisely what guava does. This also matches with what NULL does in SQL. In SQL, v IN (x) returns one of 3 values, not 2 values: It can resolve to true, false, or null. v IN (NULL) would resolve to NULL and not false. It is answering a question that can't be answered with the NULL value, which is to be read as: Don't know.
In other words, guava made a call on what null implies which evidently does not match with your definitions, as you expect .contains(null) to return false. I think your viewpoint is more idiomatic, but the point is, guava's viewpoint is different but also consistent, and the javadoc merely insinuates, but does not explicitly demand, that .contains(null) returns false.
That's not useful whatsoever in fixing your code, but hopefully it gives you a mental model, and answers your question of "why does it work like this?".

Java: Use getter vs caching value

I have a getter that returns a String, and I am comparing it to some other String. I check the returned value for null, so my if statement looks like this (and I really do exit early if it is true):
if (someObject.getFoo() != null && someObject.getFoo().equals(someOtherString)) {
return;
}
Performancewise, would it be better to store the returned String rather than calling the getter twice like this? Does it even matter?
String foo = someObject.getFoo();
if (foo != null && foo.equals(someOtherString)) {
return;
}
To answer questions from the comments, this check is not performed very often and the getter is fairly simple. I am mostly curious how allocating a new local variable compares to executing the getter an additional time.
It depends entirely on what the getter does. If it's a simple getter (retrieving a data member), then the JVM will be able to inline it on-the-fly if it determines that code is a hot spot for performance. This is actually why Oracle/Sun's JVM is called "HotSpot". :-) It will apply aggressive JIT optimization where it sees that it needs it (when it can). If the getter does something complex, though, naturally it could be slower to use it and have it repeat that work.
If the code isn't a hot spot, of course, you don't care whether there's a difference in performance.
Someone once told me that the inlined getter can sometimes be faster than the value cached to a local variable, but I've never proven that to myself and don't know the theory behind why it would be the case.
Use the second block. The first block will most likely get optimized to the second anyway, and the second is more readable. But the main reason is that, if someObject is ever accessed by other threads, and if the optimization somehow gets disabled, the first block will throw no end of NullPointerException exceptions.
Also: even without multi-threading, if someObject is by any chance made volatile, the optimization will disappear. (Bad for performance, and, of course, really bad with multiple threads.) And lastly, the second block will make using a debugger easier (not that that would ever be necessary.)
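A sketch of the failure mode described above, using a hypothetical `Holder` class whose field another thread might null out between the two getter calls:

```java
// Hypothetical holder; another thread may set foo to null at any time.
class Holder {
    volatile String foo = "bar";
    String getFoo() { return foo; }
}

public class RaceSketch {
    public static void main(String[] args) {
        Holder h = new Holder();

        // Unsafe under concurrent mutation: two separate reads. If foo
        // becomes null between them, the second call's equals() throws.
        // if (h.getFoo() != null && h.getFoo().equals("bar")) { ... }

        // Safe: a single read into a local variable, which cannot change.
        String foo = h.getFoo();
        if (foo != null && foo.equals("bar")) {
            System.out.println("matched");
        }
    }
}
```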
You can omit the first null check since equals does that for you:
The result is true if and only if the argument is not null and is a String object that represents the same sequence of characters as this object.
So the best solution is simply:
if (someOtherString.equals(someObject.getFoo()))
They both look the same, even performance-wise. Use the first block if you are sure you won't be using the returned value further; if not, use the second block.
I prefer the second code block because it reads foo once, and the local then cannot change between the null check and the equals call.
Nulls are often required, and Java could solve this with a safe-navigation ('Elvis'-style) operator. Note that Java does not actually have one, so the following is hypothetical syntax:
if (someObject.getFoo()?.equals(someOtherString)) {
return;
}

String intern in equals method

Is it good practice to use String#intern() in the equals method of a class? Suppose we have a class:
public class A {
private String field;
private int number;
@Override
public boolean equals(Object obj) {
if (obj == null) {
return false;
}
if (getClass() != obj.getClass()) {
return false;
}
final A other = (A) obj;
if ((this.field == null) ? (other.field != null) : !this.field.equals(other.field)) {
return false;
}
if (this.number != other.number) {
return false;
}
return true;
}
}
Will it be faster to use this.field.intern() != other.field.intern() instead of !this.field.equals(other.field)?
No! Using String.intern() implicitly like this is not a good idea:
It will not be faster. As a matter of fact it will be slower due to the use of a hash table in the background. A get() operation in a hash table contains a final equality check, which is what you want to avoid in the first place. Used like this, intern() will be called each and every time you call equals() for your class.
String.intern() has a lot of memory/GC implications that you should not implicitly force on users of this class.
If you want to avoid full blown equality checks when possible, consider the following avenues:
If you know that the set of strings is limited and you have repeated equality checks, you can use intern() for the field at object creation, so that any subsequent equality checks will come down to an identity comparison.
Use an explicit HashMap or WeakHashMap instead of intern() to avoid storing strings in the GC permanent generation - this was an issue in older JVMs, not sure if it is still a valid concern.
Keep in mind that if the set of strings is unbounded, you will have memory issues.
That said, all this sounds like premature optimization to me. String.equals() is pretty fast in the general case, since it compares the string lengths before comparing the strings themselves. Have you profiled your code?
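A minimal interner built on an explicit map, along the lines the answer suggests (class and method names are mine; note that this simple version grows without bound, per the caveat above - a WeakHashMap or size-bounded cache would address that):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class MapInterner {
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    // Returns a canonical instance for equal strings, like intern(),
    // but the pool is ordinary heap memory under your own control.
    public String intern(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public static void main(String[] args) {
        MapInterner interner = new MapInterner();
        String a = interner.intern(new String("hello"));
        String b = interner.intern(new String("hello"));
        System.out.println(a == b);  // true: same canonical instance
    }
}
```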
Good practice : Nope. You're doing something tricky, and that makes for brittle, less readable code. Unless this equals() method needs to be crazy performant (and your performance tests validate that it is in fact faster), it's not worth it.
Faster : Could be. But don't forget that you can have unintended side effects from using the intern() method: http://www.onkarjoshi.com/blog/213/6-things-to-remember-about-saving-memory-with-the-string-intern-method/
Any benefit gained by performing an identity comparison on the interned Strings is likely to be outweighed by the associated cost of interning the Strings.
In the above case you could consider interning the String when you instantiate the class, providing the field is constant (in which case you should also mark it as final). You could also check for null on instantiation to avoid having to check on each call to equals (assuming you disallow null Strings).
However, in general these types of micro-optimisation offer little gain in performance.
Let's go through this one step at a time...
The idea here is that if you use String#intern, you'll be given a canonical representation of that String. A pool of Strings is kept internally and each entry is guaranteed to be unique for that pool with regard to equals. If you call intern() on a String, then either a previously pooled identical String is going to be returned, or the String you called intern on is going to be pooled and returned.
So if we have two Strings s1 and s2, and we assume neither is null, then the following two lines of code are equivalent:
s1.equals(s2);
s1.intern() == s2.intern();
Let's investigate two assumptions we've made now:
s1.intern() and s2.intern() really will return the same object if s1.equals(s2) evaluates to true.
Using the == operator on two interned references to the same String will be more efficient than using the equals method.
The first assumption is probably the most dangerous of all. The JavaDoc for the intern method tells us that using this method will return a canonical representation from an internally kept pool of Strings. But it doesn't tell us anything about that pool. Once an entry has been added to the pool, can it ever be removed again? Will the pool keep growing indefinitely, or will entries occasionally be culled to make it act as a limited-size cache? You'd have to check the actual specifications of the Java Language and Virtual Machine to get any certainty, if they offer it at all. Having to check specs for a limited optimization is usually a big warning sign. Checking the source code for Sun's JDK 7, I see that intern is specified as a native method. So not only is the implementation likely to be vendor-specific, it might vary across platforms as well for VMs from the same vendor. All bets are off regarding stuff that's not in the spec.
On to our second assumption. Let's consider for a moment what it would take to intern a String... First of all, we'll need to check if the String is already in the pool. We'll assume they've tried to get O(1) complexity going there to keep this fast by using some hashing scheme. But that's assuming we've got a hash of the String. Since this is a native method, I'm not certain what would be used... some hash of the native representation, or simply what hashCode() returns. I know from the source code of Sun's JDK that a String instance caches its hash code. It'll only be calculated the first time the method is called, and after that the cached value will be returned. So at the very least, a hash must be calculated at least once if we're to use that. Getting a reliable hash of a String will probably involve arithmetic on each and every character, which can be expensive for lengthy values. Even once we have the hash, and thus a set of Strings that are candidates for being matches in the interned pool, we'd still have to verify whether one of these really is an exact match, which would involve... an equality check. Meaning going through each and every character of the Strings and seeing if they match, if trivial shortcuts like unequal lengths can't be applied first. Worse still, we might have to do this for more than one other String, just as with a regular equals, since multiple Strings in the pool might have the same hash or end up in the same hash bucket.
So, that stuff we need to do to find out if a String was already interned or not sounds suspiciously like what equals would need to do. Basically, we've gained nothing and might even have made our equals implementation more expensive. At least, if we're going to call intern each and every time. So maybe we should intern the String right away and simply always use that interned reference. Let's check how class A would look if that were the case. I'm assuming the String field is initialized on construction:
public class A {
private final String field;
public A(final String s) {
field = s.intern();
}
}
That's looking a little more sensible. Any Strings that are passed to the constructor and are equal will end up being the same reference. Now we can safely use == between the field field of A instances for equality checks, right?
Well, it'd be useless. Why? If you check the source for equals in class String, you'll find that any implementation made by someone with half a brain will first do a == check to catch the trivial case where the instance and the argument are the same reference first. That could save a potentially heavy char-by-char comparison. I know the JDK 7 source I'm using for reference does this. So you're still better off using equals because it does that reference check anyway.
The second reason this'd be a bad idea is that first point way up above... We simply don't know if the instances are going to be kept in the pool indefinitely. Check this scenario, which may or may not occur depending on JVM implementation:
String s1 = ... //Somehow gets passed a non-interned "test" value
A a1 = new A(s1);
//Lots of time passes... winter comes and goes and spring returns the land to a lush green...
String s2 = ... //Somehow gets passed a non-interned "test" value
A a2 = new A(s2);
a1.equals(a2); //Totally returns the wrong result
What happened? Well, if it turns out the interned String pool is sometimes culled of certain entries, then that first construction of an A could have s1 interned, only for it to be removed from the pool and later replaced by that s2 instance. Since s1 and s2 are conceivably different instances, the == check fails. Can this happen? I've got no idea. I certainly won't go check the specs and native code to find out. And pity the programmer who has to go through your code with a debugger to find out why the hell "test" is not considered the same as "test".
It's no problem if we're using equals. It'll catch the same instance case early for optimal results, which will benefit us when we've interned our Strings, but we won't have to worry about cases where the instances still end up being different because then equals is gonna do the classic compare work. It just goes to show that it's best not to second-guess the actual runtime implementation or compiler, because these things were made by people who know the specs like the back of their hands and really worry about performance.
So String interning manually can be of benefit when you make sure that...
you're not interning each and every time, but just intern a String once, like when initializing a field, and then keep using that interned instance;
you still use equals to make sure implementation details won't ruin your day and your code doesn't actually rely on that interning, instead relying on the implementation of the method to catch the trivial cases.
After keeping this in mind, surely it's worth using intern()? Well, we still don't know how expensive intern() is. It's a native method so it might be really fast. But we're not sure unless we check the code for our target platform and JVM implementation. We've also had to make sure we understand exactly what interning does and what assumptions we've made about it. Are you sure the next person reading your code will have the same level of understanding? They might be bewildered about this weird method they've never seen before that dabbles in JVM internals and might spend an hour reading the same gibberish I'm typing right now, instead of getting work done.
That's the problem right there... Before, it was simple. You used equals and were done. Now, you've added another little thing that can nestle itself in your mind and cause you to wake up screaming one night, because you've just realized that oh my God you forgot to take out one of the == uses, and that piece of code is used in a routine controlling the killer bots' appraisal of citizen disobedience, and you've heard its JVM isn't too solid!
Donald Knuth is famously credited with the quote...
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil"
Knuth was clever enough to add in that 97% detail. Sometimes, thoroughly micro-optimizing a small portion of code can make a big difference. Say, if that piece of code takes up 30% of the program's runtime execution. The problem with micro-optimizations is that they tend to work on assumptions. When you start using intern() and believe that from then on it'll be safe to make reference equality checks, you've made a hell of a lot of assumptions. And even if you go down to implementation level to check if they're right, are you sure they will be in the next JRE version?
I myself have used intern() manually. Did it in some piece of code where the same handful of Strings are gonna end up in hundreds if not thousands of object instances as fields. Those fields are gonna be used as keys in HashMaps and are frequently used while doing some validation over those instances. I figured interning was worth it for two purposes: reducing memory overhead by making all those equal Strings one single instance and speeding up the map lookups, since they're using hashCode() and equals. But I've made damn sure that you can take all those intern() calls out of the code and everything will still work fine. The interning is just some icing on the cake in this case, a little extra that may or may not make a bit of difference along the road. But it's not an essential part of my code's correctness.
Long post, eh? Why'd I go through the trouble of typing all of this up? To show you that if you make micro-optimizations, you'd better know damn well what you're doing, and be willing to document it so thoroughly that you might as well not have bothered.
This is hard to say, given that you have not specified hardware. Timing tests are difficult to get right and are not universal. Have you done a timing test yourself?
My feeling is that the intern pattern would not be faster as each string would need to be matched to a possible string in a dictionary of all interned strings.

Why is myString.equals("aString"); different from "aString".equals(myString);?

I heard several times that in using boolean equals(Object o) to compare Strings, it's better to put the constant on the left side of the function as in the following:
Bad: myString.equals("aString");
Good: "aString".equals(myString);
Why is this?
Because if myString is null you get an exception. You know "aString" will never be null, so you can avoid that problem.
Often you'll see libraries that use nullSafeEquals(myString,"aString"); everywhere to avoid exactly that (since most times you compare objects, they aren't generated by the compiler!)
This is a defensive technique to protect against NullPointerExceptions. If your constant is always on the left, no chance you will get a NPE on that equals call.
This is poor design, because you are hiding NullPointerExceptions. Instead of being alerted that string is null, you will instead get some weird program behaviour and an exception being thrown somewhere else.
But that all depends if 'null' is a valid state for your string. In general 'null's should never be considered a reasonable object state for passing around.
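Since Java 7, java.util.Objects.equals offers the null-safe comparison the nullSafeEquals-style helpers mentioned above provide, without reversing the operands:

```java
import java.util.Objects;

public class NullSafeEqualsDemo {
    public static void main(String[] args) {
        String myString = null;
        // Null-safe in either operand position, and reads left-to-right.
        System.out.println(Objects.equals(myString, "aString"));   // false, no NPE
        System.out.println(Objects.equals("aString", "aString"));  // true
        System.out.println(Objects.equals(null, null));            // true
    }
}
```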

Java : "xx".equals(variable) better than variable.equals("xx") , TRUE?

I'm reviewing a manual of best practices and coding recommendations for Java, and I think one of them is doubtful.
Recommendation:
String variable;
"xx".equals(variable) // OK
variable.equals("xx") // Not recommended
The reason given is that it prevents NullPointerExceptions that are not handled.
Is this true?
This is a very common technique that causes the test to return false if the variable is null instead of throwing a NullPointerException. But I guess I'll be different and say that I wouldn't regard this as a recommendation that you always should follow.
I definitely think it is something that all Java programmers should be aware of as it is a common idiom.
It's also a useful technique to make code more concise (you can handle the null and not null case at the same time).
But:
It makes your code harder to read: "If blue is the sky..."
If you have just checked that your argument is not null on the previous line then it is unnecessary.
If you forgot to test for null, and someone comes along with a null argument that you weren't expecting, then a NullPointerException is not necessarily the worst possible outcome. Pretending everything is OK and carrying on until it eventually fails later is not really a better alternative. Failing fast is good.
Personally I don't think usage of this technique should be required in all cases. I think it should be left to the programmer's judgement on a case-by-case basis. The important thing is to make sure you've handled the null case in an appropriate manner and how you do that depends on the situation. Checking correct handling of null values could be part of the testing / code review guidelines.
It is true. If variable is null in your example,
variable.equals("xx");
will throw a NPE because you can't call a method (equals) on a null object. But
"xx".equals(variable);
will just return false without error.
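Both behaviours are easy to verify in a few lines:

```java
public class ConstantFirstDemo {
    public static void main(String[] args) {
        String variable = null;

        // Constant on the left: the null simply compares unequal.
        System.out.println("xx".equals(variable));  // false, no exception

        // Variable on the left: dereferencing null throws.
        try {
            variable.equals("xx");
        } catch (NullPointerException e) {
            System.out.println("NPE from variable.equals");
        }
    }
}
```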
Actually, I think that the original recommendation is true. If you use variable.equals("xx"), then you will get a NullPointerException if variable is null. Putting the constant string on the left hand side avoids this possibility.
It's up to you whether this defense is worth the pain of what many people consider an unnatural idiom.
This is a common technique used in Java (and C#) programs. The first form avoids the null pointer exception because the .equals() method is called on the constant string "xx", which is never null. A non-null string compared to a null is false.
If you know that variable will never be null (and your program is incorrect in some other way if it is ever null), then using variable.equals("xx") is fine.
It's true that using any property of an object that way helps you avoid the NPE.
But that's why we have exceptions: to handle those kinds of things.
Maybe if you use "xx".equals(variable) you will never know whether the value of variable is null or just isn't equal to "xx". IMO it's best to know that you are getting a null value in your variable, so you can reassign it, rather than just ignore it.
You are correct about the order of the check--if the variable is null, calling .equals on the string constant will prevent an NPE--but I'm not sure I consider this a good idea; Personally I call it "slop".
Slop is when you don't detect an abnormal condition, but instead build up habits to personally avoid its detection. Passing around a null as a string for an extended period of time will eventually lead to errors that may be obscure and hard to find.
Coding for slop is the opposite of "Fail fast fail hard".
Using a null as a string can occasionally make a great "Special" value, but the fact that you are trying to compare it to something indicates that your understanding of the system is incomplete (at best)--the sooner you find this fact out, the better.
On the other hand, making all variables final by default, using Generics and minimizing visibility of all objects/methods are habits that reduce slop.
If you need to check for null, I find this better readable than
if (variable != null && variable.equals("xx")). It's more a matter of personal preference.
As a side note, here is a pattern where this recommendation makes no difference, because the value is guarded by an .isPresent() check (or delivered through ifPresent) and so is never null:
Optional<String> gender = Optional.of("MALE");
if (gender.isPresent()) {
System.out.println("Value available.");
} else {
System.out.println("Value not available.");
}
gender.ifPresent(g -> System.out.println("Consumer: equals: " + g.equals("whatever")));
