Checking for a not null, not blank String in Java

I am trying to check if a Java String is not null, not empty and not whitespace.
In my mind, this code should have been quite up to the job.
public static boolean isEmpty(String s) {
if ((s != null) && (s.trim().length() > 0))
return false;
else
return true;
}
As per documentation, String.trim() should work thus:
Returns a copy of the string, with leading and trailing whitespace omitted.
If this String object represents an empty character sequence, or the first and last characters of character sequence represented by this String object both have codes greater than '\u0020' (the space character), then a reference to this String object is returned.
However, apache/commons/lang/StringUtils.java does it a little differently.
public static boolean isBlank(String str) {
int strLen;
if (str == null || (strLen = str.length()) == 0) {
return true;
}
for (int i = 0; i < strLen; i++) {
if ((Character.isWhitespace(str.charAt(i)) == false)) {
return false;
}
}
return true;
}
As per documentation, Character.isWhitespace():
Determines if the specified character is white space according to Java. A character is a Java whitespace character if and only if it satisfies one of the following criteria:
It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', '\u2007', '\u202F').
It is '\t', U+0009 HORIZONTAL TABULATION.
It is '\n', U+000A LINE FEED.
It is '\u000B', U+000B VERTICAL TABULATION.
It is '\f', U+000C FORM FEED.
It is '\r', U+000D CARRIAGE RETURN.
It is '\u001C', U+001C FILE SEPARATOR.
It is '\u001D', U+001D GROUP SEPARATOR.
It is '\u001E', U+001E RECORD SEPARATOR.
It is '\u001F', U+001F UNIT SEPARATOR.
If I am not mistaken (or maybe I am just not reading it correctly), String.trim() should take away any of the characters that are being checked by Character.isWhitespace(). All of them seem to be at or below '\u0020'.
In this case, the simpler isEmpty function seems to be covering all the scenarios that the lengthier isBlank is covering.
Is there a string that will make the isEmpty and isBlank behave differently in a test case?
Assuming there are none, is there any other consideration because of which I should choose isBlank and not use isEmpty?
For those interested in actually running a test, here are the methods and unit tests.
public class StringUtil {
public static boolean isEmpty(String s) {
if ((s != null) && (s.trim().length() > 0))
return false;
else
return true;
}
public static boolean isBlank(String str) {
int strLen;
if (str == null || (strLen = str.length()) == 0) {
return true;
}
for (int i = 0; i < strLen; i++) {
if ((Character.isWhitespace(str.charAt(i)) == false)) {
return false;
}
}
return true;
}
}
And the unit tests:
@Test
public void test() {
String s = null;
assertTrue(StringUtil.isEmpty(s)) ;
assertTrue(StringUtil.isBlank(s)) ;
s = "";
assertTrue(StringUtil.isEmpty(s)) ;
assertTrue(StringUtil.isBlank(s));
s = " ";
assertTrue(StringUtil.isEmpty(s)) ;
assertTrue(StringUtil.isBlank(s)) ;
s = " ";
assertTrue(StringUtil.isEmpty(s)) ;
assertTrue(StringUtil.isBlank(s)) ;
s = " a ";
assertTrue(StringUtil.isEmpty(s)==false) ;
assertTrue(StringUtil.isBlank(s)==false) ;
}
Update: This was a really interesting discussion, and this is why I love Stack Overflow and the folks here. Coming back to the question, we got:
A program showing which characters make the two methods behave differently. The code is at https://ideone.com/ELY5Wv. Thanks @Dukeling.
A performance-related reason for choosing the standard isBlank(). Thanks @devconsole.
A comprehensive explanation by @nhahtdh. Thanks, mate.

Is there a string that will make the isEmpty and isBlank behave differently in a test case?
Note that Character.isWhitespace can recognize Unicode characters and return true for Unicode whitespace characters.
Determines if the specified character is white space according to Java. A character is a Java whitespace character if and only if it satisfies one of the following criteria:
It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', '\u2007', '\u202F').
[...]
On the other hand, the trim() method trims all control characters whose code points are below U+0020, as well as the space character (U+0020).
Therefore, the two methods behave differently in the presence of a Unicode whitespace character, for example "\u2008", or when the string contains control characters that are not considered whitespace by the Character.isWhitespace method, for example "\002".
If you were to write a regular expression to do this (which is slower than looping through the string and checking each character):
isEmpty() would be equivalent to .matches("[\\x00-\\x20]*")
isBlank() would be equivalent to .matches("\\p{javaWhitespace}*")
(The isEmpty() and isBlank() methods both allow for a null String reference, so they are not exactly equivalent to the regex solution, but putting that aside, they are equivalent.)
Note that \p{javaWhitespace}, as its name implies, is Java-specific syntax to access the character class defined by the Character.isWhitespace method.
Assuming there are none, is there any other consideration because of which I should choose isBlank and not use isEmpty?
It depends. However, I think the explanation in the part above should be sufficient for you to decide. To sum up the difference:
isEmpty() will consider the string empty if it contains only control characters [1] below U+0020 and the space character (U+0020).
isBlank() will consider the string empty if it contains only whitespace characters as defined by the Character.isWhitespace method, which includes Unicode whitespace characters.
[1] There is also the control character at U+007F DELETE, which is not trimmed by the trim() method.
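The difference summed up above can be checked with a small sketch. The class and method names are illustrative assumptions; the two methods simply mirror the trim()-based isEmpty() from the question and the loop from StringUtils.isBlank().

```java
class TrimVersusIsWhitespace {

    // Mirrors the question's isEmpty(): null, or nothing left after trim().
    static boolean isEmptyByTrim(String s) {
        return s == null || s.trim().length() == 0;
    }

    // Mirrors the isBlank() loop: null, empty, or only Character.isWhitespace() characters.
    static boolean isBlankByIsWhitespace(String s) {
        if (s == null) {
            return true;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isWhitespace(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String punctuationSpace = "\u2008"; // Unicode space separator, above U+0020
        String startOfText = "\u0002";      // control character, below U+0020

        System.out.println(isEmptyByTrim(punctuationSpace));         // false
        System.out.println(isBlankByIsWhitespace(punctuationSpace)); // true
        System.out.println(isEmptyByTrim(startOfText));              // true
        System.out.println(isBlankByIsWhitespace(startOfText));      // false
    }
}
```

Either string is enough to make a unit test that passes for one method fail for the other.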

The purpose of the two standard methods is to distinguish between these two cases:
org.apache.commons.lang.StringUtils.isBlank(" ") (will return true).
org.apache.commons.lang.StringUtils.isEmpty(" ") (will return false).
Your custom implementation of isEmpty() will return true.
UPDATE:
org.apache.commons.lang.StringUtils.isEmpty() is used to find out whether the String is null or has length 0.
org.apache.commons.lang.StringUtils.isBlank() takes it a step further. It not only checks whether the String is null or has length 0, but also whether it consists only of whitespace.
In your case, you are trimming the String in your isEmpty method. The one difference that could occur (the case where you give it " ") cannot occur, because trimming removes the leading and trailing whitespace, which in this case means removing all the spaces.

I would choose isBlank() over isEmpty() because trim() creates a new String object whenever there is something to trim, and that object has to be garbage collected later. isBlank(), on the other hand, does not create any objects.

You could take a look at JSR 303 Bean Validation, which contains the annotations @NotEmpty and @NotNull. Bean Validation is cool because you can separate validation issues from the original intent of the method.

Why can't you simply use a nested ternary operator to achieve this? Please look at the sample code:
public static void main(String[] args)
{
String s = null;
String s1="";
String s2="hello";
System.out.println(" 1 "+check(s));
System.out.println(" 2 "+check(s1));
System.out.println(" 3 "+check(s2));
}
public static boolean check(String data)
{
return (data==null?false:(data.isEmpty()?false:true));
}
and the output is as follows:
1 false 2 false 3 true
Here the first two scenarios (null and empty) return false, and the third scenario returns true.

<%
System.out.println(request.getParameter("userName")+"*");
if (request.getParameter("userName") == null || request.getParameter("userName").trim().length() == 0) { %>
<jsp:forward page="HandleIt.jsp" />
<% }
else { %>
Hello ${param.userName}
<%} %>

This simple code will do enough:
public static boolean isNullOrEmpty(String str) {
return str == null || str.trim().equals("");
}
And the unit tests:
@Test
public void testIsNullOrEmpty() {
assertEquals(true, AcdsUtils.isNullOrEmpty(""));
assertEquals(true, AcdsUtils.isNullOrEmpty((String) null));
assertEquals(false, AcdsUtils.isNullOrEmpty("lol "));
assertEquals(false, AcdsUtils.isNullOrEmpty("HallO"));
}

With Java 8, you could also use Optional with filtering. The check is pure Java SE and needs no additional library.
The following code illustrates two isBlank() implementations.
String.trim() behaviour
!Optional.ofNullable(toCheck).filter(e -> e.trim().length() > 0).isPresent()
StringUtils.isBlank() behaviour
!Optional.ofNullable(toCheck)
.filter(e ->
{
// keep the value only if it has at least one non-whitespace character
for (int i = 0; i < e.length(); i++) {
if (!Character.isWhitespace(e.charAt(i))) {
return true;
}
}
return false;
})
.isPresent()

Related

Check if string contains only Unicode values [\u0030-\u0039] or [\u0660-\u0669]

I need to check, in java, if a string is composed only of Unicode values [\u0030-\u0039] or [\u0660-\u0669]. What is the most efficient way of doing this?
Use \x for unicode characters:
^([\x{0030}-\x{0039}\x{0660}-\x{0669}]+)$
If the pattern should match an empty string too, use * instead of +.
Use this if you don't want to allow mixing characters from both sets you provided:
^([\x{0030}-\x{0039}]+|[\x{0660}-\x{0669}]+)$
https://regex101.com/r/xqWL4q/6
As Holger mentioned in the comments, \x{0030}-\x{0039} is equivalent to [0-9], so it could be substituted and would be more readable.
As said here, it’s not clear whether you want to check for probably mixed occurrences of these digits or check for either of these ranges.
A simple check for mixed digits would be string.matches("[0-9٠-٩]*") or, to avoid confusing changes of the read/write direction, or if your source code encoding doesn’t support all characters, string.matches("[0-9\u0660-\u0669]*").
Checking whether the string matches either range can be done using
string.matches("[0-9]*")||string.matches("[٠-٩]*") or
string.matches("[0-9]*")||string.matches("[\u0660-\u0669]*").
An alternative would be
string.chars().allMatch(c -> c >= '0' && c <= '9' || c >= '٠' && c <= '٩').
Or to check for either, string.chars().allMatch(c -> c >= '0' && c <= '9') || string.chars().allMatch(c -> c >= '٠' && c <= '٩')
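To make the two readings of the question concrete, here is a minimal sketch of both stream-based checks side by side. The class and method names are assumptions made for illustration only.

```java
class DigitRangeChecks {

    static boolean isAsciiDigit(int c) {
        return c >= '0' && c <= '9';
    }

    static boolean isArabicIndicDigit(int c) {
        return c >= '\u0660' && c <= '\u0669';
    }

    // "Mixed" reading: every character comes from one range or the other.
    static boolean allFromEitherRange(String s) {
        return s.chars().allMatch(c -> isAsciiDigit(c) || isArabicIndicDigit(c));
    }

    // "Either" reading: the whole string comes from a single range.
    static boolean allFromOneRange(String s) {
        return s.chars().allMatch(DigitRangeChecks::isAsciiDigit)
                || s.chars().allMatch(DigitRangeChecks::isArabicIndicDigit);
    }
}
```

For example, "12\u0663" passes allFromEitherRange but not allFromOneRange. Both methods accept the empty string, just like the regexes that use *.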
Since these codepoints represent numerals in two different unicode blocks,
I suggest to check if respective character is a numeral:
boolean isNumerals(String s) {
return !s.chars().anyMatch(v -> !Character.isDigit(v));
}
This will definitely match more than asked for, but in some cases or in more controlled environment it may be useful to make code more readable.
(edit)
Java API also allows to determine a unicode block of a specific character:
Character.UnicodeBlock arabic = Character.UnicodeBlock.ARABIC;
Character.UnicodeBlock latin = Character.UnicodeBlock.BASIC_LATIN;
boolean isValidBlock(String s) {
return s.chars().allMatch(v ->
Character.UnicodeBlock.of(v).equals(arabic) ||
Character.UnicodeBlock.of(v).equals(latin)
);
}
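Combined with the check above, this comes close to what the OP asked for. Note that the Arabic block also contains the Extended Arabic-Indic digits (U+06F0 to U+06F9), so the match is slightly wider than the two ranges in the question.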
On the plus side - higher abstraction gives more flexibility, makes code more readable and is not dependent on exact encoding of string passed.
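As a rough sketch of how the digit check and the block check could be combined (the class and method names are assumptions), the following accepts a string only if every character is a digit from the BASIC_LATIN or ARABIC block. It also shows that the Extended Arabic-Indic digits pass, because they live in the same Arabic block.

```java
class BlockAndDigitCheck {

    // True if every character is a digit AND belongs to the Basic Latin or Arabic block.
    static boolean isDigitInArabicOrLatinBlock(String s) {
        return s.chars().allMatch(c ->
                Character.isDigit(c)
                && (Character.UnicodeBlock.ARABIC.equals(Character.UnicodeBlock.of(c))
                    || Character.UnicodeBlock.BASIC_LATIN.equals(Character.UnicodeBlock.of(c))));
    }

    public static void main(String[] args) {
        System.out.println(isDigitInArabicOrLatinBlock("12\u0663")); // true
        System.out.println(isDigitInArabicOrLatinBlock("\u06F5"));   // true: Extended Arabic-Indic digit five
        System.out.println(isDigitInArabicOrLatinBlock("\u0967"));   // false: Devanagari digit one
    }
}
```

If only the two exact ranges should pass, an explicit range comparison is the safer choice.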
A simple solution using a regex:
(see also the much better explanation by @Predicate: https://stackoverflow.com/a/60597367/12558456)
private boolean legalRegex(String s) {
return s.matches("^([\u0030-\u0039]|[\u0660-\u0669])*$");
}
A faster but uglier solution (needs a HashSet of allowed chars):
private boolean legalCharactersOnly(String s) {
for (char c:s.toCharArray()) {
if (!allowedCharacters.contains(c)) {
return false;
}
}
return true;
}
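The snippet above leaves allowedCharacters undefined. One possible way to build it, shown here as a sketch with assumed class and field names, is to fill a HashSet once in a static initializer:

```java
import java.util.HashSet;
import java.util.Set;

class AllowedCharacterSet {

    // Hypothetical set holding the two digit ranges from the question.
    private static final Set<Character> ALLOWED_CHARACTERS = new HashSet<>();

    static {
        for (char c = '0'; c <= '9'; c++) {
            ALLOWED_CHARACTERS.add(c);
        }
        for (char c = '\u0660'; c <= '\u0669'; c++) {
            ALLOWED_CHARACTERS.add(c);
        }
    }

    static boolean legalCharactersOnly(String s) {
        for (char c : s.toCharArray()) {
            if (!ALLOWED_CHARACTERS.contains(c)) {
                return false;
            }
        }
        return true;
    }
}
```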
Here is a solution which works without regex for arbitrary Unicode code points (including those outside of the Basic Multilingual Plane).
private final Set<Integer> codePoints = new HashSet<Integer>();
public boolean test(String string) {
for (int i = 0, codePoint = 0; i < string.length(); i += Character.charCount(codePoint)) {
codePoint = string.codePointAt(i);
if (!codePoints.contains(codePoint)) {
return false;
}
}
return true;
}

String Predicates to validate if a String contains numeric Value in java

Is there any Predicate validation in Java that checks whether a String contains numbers?
I want to allow special characters but no numbers or spaces. There are Predicates that check for alphabets, but they do not allow special characters. I need something that allows only alphabets and special characters and returns false if the String contains spaces or numerals.
I will use a regex to show my understanding of the question. You want a Predicate<String> that returns true for any string matching
[a-zA-Z_]*
One way to do this regexlessly is to use a for loop and check each character:
Predicate<String> predicate = x -> {
for (int i = 0 ; i < x.length() ; i++) {
if (!Character.isLetter(x.charAt(i)) && x.charAt(i) != '_') {
return false;
}
}
return true;
};
Here is a method that does the same thing:
public static boolean test(String x) {
for (int i = 0 ; i < x.length() ; i++) {
if (!Character.isLetter(x.charAt(i)) && x.charAt(i) != '_') {
return false;
}
}
return true;
}
It may be done in a more elegant way:
Predicate<String> p = (s -> s.matches("[a-zA-Z\\_]*"));
Returning true for any string matching [a-zA-Z_]*.
Since your predicate shall return false if the string contains at least one digit or space character and else true, you can do the following:
Predicate<String> p = s -> !s.matches(".*[ \\d].*");
The advantage of this method is that every Unicode letter and every special character is valid in p. For some reason the other answers allow only ASCII letters ([a-zA-Z]) and only underscores. I guess the question has been rewritten in the meantime.
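A regex-free variant of the same idea can be sketched with a stream over the characters. The names below are assumptions. One difference worth knowing: \d in a Java regex matches only ASCII digits by default, while Character.isDigit also rejects digits from other scripts.

```java
import java.util.function.Predicate;

class NoDigitsOrSpaces {

    // True when the string contains neither a space character nor any digit.
    static final Predicate<String> NO_DIGITS_OR_SPACES =
            s -> s.chars().noneMatch(c -> c == ' ' || Character.isDigit(c));

    public static void main(String[] args) {
        System.out.println(NO_DIGITS_OR_SPACES.test("h\u00e9llo_w#rld!")); // true
        System.out.println(NO_DIGITS_OR_SPACES.test("abc 1"));             // false
    }
}
```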

Checking to see if a string is letters + spaces ONLY?

I want to write a static method that is passed a string and that checks to see if the string is made up of just letters and spaces. I can use String's methods length() and charAt(i) as needed.
I was thinking something like the following: (Sorry about the pseudocode)
public static boolean onlyLettersSpaces(String s){
for(i=0;i<s.length();i++){
if (s.charAt(i) != a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z) {
return false;
break;
}else {
return true;
}
}
I know there is probably an error in my coding, and there is probably a much easier way to do it, but please let me know your suggestions!
Use a regex. This one matches only if the string starts with, contains, and ends with only letters and spaces.
^[ A-Za-z]+$
In Java, initialize this as a pattern and check if it matches your strings.
Pattern p = Pattern.compile("^[ A-Za-z]+$");
Matcher m = p.matcher("aaaaab");
boolean b = m.matches();
That isn't how you test character equality; one easy fix would be
public static boolean onlyLettersSpaces(String s){
for(int i=0;i<s.length();i++){
char ch = s.charAt(i);
if (Character.isLetter(ch) || ch == ' ') {
continue;
}
return false;
}
return true;
}
For the constraints you mentioned (use of only length() and charAt()), you got it almost right.
You do loop over each character and check whether it is one of the acceptable characters; that's the right way. If you find a non-acceptable character, you immediately return false, which is also good. What's wrong is that you return true as soon as you decide to accept a single character. The definition says to return true only if all characters are accepted, so you need to move the "return true" to after the loop (that's the point at which you know that all characters were accepted).
So you change your pseudocode to:
for (all characters in string) {
if (character is bad) {
// one bad character means reject the string, we're done.
return false;
}
}
// we now know all chars are good
return true;
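Translated into Java under the stated constraint (only length() and charAt()), the pseudocode could look like the sketch below. Treating "letters" as the ASCII ranges a-z and A-Z is an assumption taken from the question's own list; the class name is made up for the example.

```java
class LettersAndSpaces {

    static boolean onlyLettersSpaces(String s) {
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            boolean isLetter = (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
            if (!isLetter && ch != ' ') {
                // one bad character means reject the string, we're done
                return false;
            }
        }
        // every character was a letter or a space
        return true;
    }
}
```

Note that an empty string is accepted, because the loop never finds a bad character.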

Java function to return if string contains illegal characters

I have the following characters that I would like to be considered "illegal":
~, #, @, *, +, %, {, }, <, >, [, ], |, “, ”, \, _, ^
I'd like to write a method that inspects a string and determines (true/false) if that string contains these illegals:
public boolean containsIllegals(String toExamine) {
return toExamine.matches("^.*[~#@*+%{}<>[]|\"\\_^].*$");
}
However, a simple matches(...) check isn't feasible for this. I need the method to scan every character in the string and make sure it's not one of these characters. Of course, I could do something horrible like:
public boolean containsIllegals(String toExamine) {
for(int i = 0; i < toExamine.length(); i++) {
char c = toExamine.charAt(i);
if(c == '~')
return true;
else if(c == '#')
return true;
// etc...
}
}
Is there a more elegant/efficient way of accomplishing this?
You can make use of the Pattern and Matcher classes here. Put all the filtered characters in a character class, and use the Matcher#find() method to check whether your pattern occurs in the string.
You can do it like this:
public boolean containsIllegals(String toExamine) {
Pattern pattern = Pattern.compile("[~#@*+%{}<>\\[\\]|\"\\_^]");
Matcher matcher = pattern.matcher(toExamine);
return matcher.find();
}
The find() method will return true if the given pattern is found in the string, even once.
Another way that has not yet been pointed out is using String#split(regex). We can split the string on the given pattern and check the length of the array. If the length is 1, then the pattern was not in the string.
public boolean containsIllegals(String toExamine) {
String[] arr = toExamine.split("[~#@*+%{}<>\\[\\]|\"\\_^]", 2);
return arr.length > 1;
}
If arr.length > 1, the string contained one of the characters in the pattern, which is why it was split. I have passed limit = 2 as the second parameter to split, because a single split is enough.
I need the method to scan every character in the string
If you must do it character-by-character, regexp is probably not a good way to go. However, since all characters on your "blacklist" have codes less than 128, you can do it with a small boolean array:
static final boolean blacklist[] = new boolean[128];
static {
// Unassigned elements of the array are set to false
blacklist[(int)'~'] = true;
blacklist[(int)'#'] = true;
blacklist[(int)'@'] = true;
blacklist[(int)'*'] = true;
blacklist[(int)'+'] = true;
// ...
}
static boolean isBad(char ch) {
return (ch < 128) && blacklist[(int)ch];
}
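A complete version of the lookup-table idea might look like the sketch below. The class name and the exact character list are assumptions: the list follows the question, includes '@', and uses the straight double quote in place of the curly quotes so that every entry stays below 128.

```java
class IllegalCharacterTable {

    // Assumed list of illegal characters, all with codes below 128.
    private static final String ILLEGAL = "~#@*+%{}<>[]|\"\\_^";

    private static final boolean[] BLACKLIST = new boolean[128];

    static {
        // Unassigned elements stay false; mark only the illegal characters.
        for (int i = 0; i < ILLEGAL.length(); i++) {
            BLACKLIST[ILLEGAL.charAt(i)] = true;
        }
    }

    static boolean isBad(char ch) {
        return ch < 128 && BLACKLIST[ch];
    }

    static boolean containsIllegals(String toExamine) {
        for (int i = 0; i < toExamine.length(); i++) {
            if (isBad(toExamine.charAt(i))) {
                return true;
            }
        }
        return false;
    }
}
```

Filling the table from a string keeps the list of characters in one place, so adding or removing an entry is a one-character change.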
Use a constant to avoid recompiling the regex on every validation.
private static final Pattern INVALID_CHARS_PATTERN =
Pattern.compile("^.*[~#@*+%{}<>\\[\\]|\"\\_].*$");
And change your code to:
public boolean containsIllegals(String toExamine) {
return INVALID_CHARS_PATTERN.matcher(toExamine).matches();
}
This is the most efficient way with Regex.
If you can't use a matcher, then you can do something like this, which is cleaner than a bunch of different if statements or a boolean array.
for(int i = 0; i < toExamine.length(); i++) {
char c = toExamine.charAt(i);
if("~#@*+%{}<>[]|\"_^".indexOf(c) >= 0){
return true;
}
}
Try the negation of a character class containing all the blacklisted characters:
public boolean containsIllegals(String toExamine) {
return !toExamine.matches("[^~#@*+%{}<>\\[\\]|\"\\_^]*");
}
The pattern matches only strings made up entirely of legal characters, so negating the result returns true if the string contains illegals (your original function seemed to return false in that case).
The caret ^ just to the right of the opening bracket [ negates the character class. Note that in String.matches() you don't need the anchors ^ and $ because it automatically matches the whole string.
A pretty compact way of doing this would be to rely on the String.replaceAll method:
public boolean containsIllegal(final String toExamine) {
return toExamine.length() != toExamine.replaceAll(
"[~#@*+%{}<>\\[\\]|\"\\_^]", "").length();
}

Whitespaces in java

What are kinds of whitespaces in Java?
I need to check in my code if the text contains any whitespaces.
My code is:
if (text.contains(" ") || text.contains("\t") || text.contains("\r")
|| text.contains("\n"))
{
//code goes here
}
I already know about \n ,\t ,\r and space.
For a non-regular expression approach, you can check Character.isWhitespace for each character.
boolean containsWhitespace(String s) {
for (int i = 0; i < s.length(); ++i) {
if (Character.isWhitespace(s.charAt(i))) {
return true;
}
}
return false;
}
Which are the white spaces in Java?
The documentation specifies what Java considers to be whitespace:
public static boolean isWhitespace(char ch)
Determines if the specified character is white space according to Java. A character is a Java whitespace character if and only if it satisfies one of the following criteria:
It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) but is not also a non-breaking space
('\u00A0', '\u2007', '\u202F').
It is '\u0009', HORIZONTAL TABULATION.
It is '\u000A', LINE FEED.
It is '\u000B', VERTICAL TABULATION.
It is '\u000C', FORM FEED.
It is '\u000D', CARRIAGE RETURN.
It is '\u001C', FILE SEPARATOR.
It is '\u001D', GROUP SEPARATOR.
It is '\u001E', RECORD SEPARATOR.
It is '\u001F', UNIT SEPARATOR.
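A few sample characters make the list above easier to picture. This is only a sketch with assumed names; the values follow directly from the documented criteria.

```java
class WhitespaceKinds {

    static boolean containsWhitespace(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isWhitespace(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(Character.isWhitespace(' '));      // true: SPACE
        System.out.println(Character.isWhitespace('\t'));     // true: HORIZONTAL TABULATION
        System.out.println(Character.isWhitespace('\u2003')); // true: EM SPACE, a space separator
        System.out.println(Character.isWhitespace('\u00A0')); // false: non-breaking space is excluded
    }
}
```

So a check that only looks for space, \t, \r and \n misses characters such as EM SPACE, while Character.isWhitespace deliberately skips the non-breaking spaces.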
boolean containsWhitespace = false;
for (int i = 0; i < text.length() && !containsWhitespace; i++) {
if (Character.isWhitespace(text.charAt(i))) {
containsWhitespace = true;
}
}
return containsWhitespace;
or, using Guava,
boolean containsWhitespace = CharMatcher.whitespace().matchesAnyOf(text);
If you want to consider a regular expression based way of doing it
if(text.split("\\s", -1).length > 1){
//text contains whitespace
}
Use Character.isWhitespace() rather than creating your own.
In Java how does one turn a String into a char or a char into a String?
If you can use apache.commons.lang in your project, the easiest way would be just to use the method provided there:
public static boolean containsWhitespace(CharSequence seq)
Check whether the given CharSequence contains any whitespace characters.
Parameters:
seq - the CharSequence to check (may be null)
Returns:
true if the CharSequence is not empty and contains at least 1 whitespace character
It handles empty and null parameters and provides the functionality at a central place.
From sun docs:
\s A whitespace character: [ \t\n\x0B\f\r]
The simplest way is to use it with regex.
boolean whitespaceSearchRegExp(String input) {
return java.util.regex.Pattern.compile("\\s").matcher(input).find();
}
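Because \s covers only the six ASCII characters quoted above by default, the regex and Character.isWhitespace can disagree. The sketch below (class and method names are assumptions) compares the default \s, \s with Pattern.UNICODE_CHARACTER_CLASS, and \p{javaWhitespace}. The patterns are compiled once as constants instead of on every call.

```java
import java.util.regex.Pattern;

class RegexWhitespace {

    private static final Pattern ASCII_WHITESPACE = Pattern.compile("\\s");
    private static final Pattern UNICODE_WHITESPACE =
            Pattern.compile("\\s", Pattern.UNICODE_CHARACTER_CLASS);
    private static final Pattern JAVA_WHITESPACE = Pattern.compile("\\p{javaWhitespace}");

    static boolean hasAsciiWhitespace(String input) {
        return ASCII_WHITESPACE.matcher(input).find();
    }

    static boolean hasUnicodeWhitespace(String input) {
        return UNICODE_WHITESPACE.matcher(input).find();
    }

    static boolean hasJavaWhitespace(String input) {
        return JAVA_WHITESPACE.matcher(input).find();
    }

    public static void main(String[] args) {
        String emSpace = "a\u2003b"; // EM SPACE between two letters
        System.out.println(hasAsciiWhitespace(emSpace));   // false
        System.out.println(hasUnicodeWhitespace(emSpace)); // true
        System.out.println(hasJavaWhitespace(emSpace));    // true
    }
}
```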
Why don't you check whether text.trim() has a different length? Note that this only detects leading and trailing whitespace:
if(text.length() != text.trim().length() || otherConditions){
//your code
}
