String predicate to validate whether a String contains a numeric value in Java

Is there any Predicate validation in Java that checks whether a String contains numbers?
I want to allow special characters but no numbers or spaces. There are Predicates that check for letters, but they do not allow special characters. I need something that allows only letters and special characters and returns false if the String contains spaces or digits.

I will use a regex to show my understanding of the question. You want a Predicate<String> that returns true for any string matching
[a-zA-Z_]*
One way to do this regexlessly is to use a for loop and check each character:
Predicate<String> predicate = x -> {
for (int i = 0 ; i < x.length() ; i++) {
if (!Character.isLetter(x.charAt(i)) && x.charAt(i) != '_') {
return false;
}
}
return true;
};
Here is a method that does the same thing:
public static boolean test(String x) {
for (int i = 0 ; i < x.length() ; i++) {
if (!Character.isLetter(x.charAt(i)) && x.charAt(i) != '_') {
return false;
}
}
return true;
}

It may be done in a more elegant way:
Predicate<String> p = (s -> s.matches("[a-zA-Z\\_]*"));
This returns true for any string matching [a-zA-Z_]*.

Since your predicate shall return false if the string contains at least one digit or space character and else true, you can do the following:
Predicate<String> p = s -> !s.matches(".*[ \\d].*");
The advantage of this method is that every Unicode letter and every special character is valid in p. For some reason the other answers allow only ASCII letters ([a-zA-Z]) and only underscores. I guess the question has been rewritten in the meantime.
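The same rule can also be written without a regex. The following is only a sketch: the class and constant names are made up for illustration, and it is slightly stricter than the regex above because it rejects any Unicode digit and any Java whitespace character, not just ASCII digits and the space character.

```java
import java.util.function.Predicate;

final class NoDigitsOrSpaces {
    // Hypothetical name: accepts a string only if no code point is a digit or whitespace.
    static final Predicate<String> NO_DIGITS_OR_SPACES =
            s -> s.codePoints().noneMatch(cp -> Character.isDigit(cp) || Character.isWhitespace(cp));

    public static void main(String[] args) {
        System.out.println(NO_DIGITS_OR_SPACES.test("abc$%_"));  // true
        System.out.println(NO_DIGITS_OR_SPACES.test("abc 123")); // false
        System.out.println(NO_DIGITS_OR_SPACES.test("abc1"));    // false
    }
}
```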

Related

Regex for a pattern XXYYZZ

I want to validate a string which should follow the pattern XXYYZZ where X, Y, Z can be any letter a-z, A-Z or 0-9.
Example of valid strings:
RRFFKK
BB7733
WWDDMM
5599AA
Not valid:
555677
AABBCD
For now I am splitting the string using the regex (?<=(.))(?!\\1) and iterating over the resulting array and checking if each substring has a length of 2.
String str = "AABBEE";
boolean isValid = checkPattern(str);
public static boolean checkPattern(String str) {
String[] splited = str.split("(?<=(.))(?!\\1)");
for (String s : splited) {
if (s.length() != 2) {
return false;
}
}
return true;
}
I would like to replace my way of checking with String#matches and get rid of the loop, but I can't come up with a valid regex. Can someone help with what to put in someRegex in the snippet below?
public static boolean checkPattern(String str) {
return str.matches(someRegex);
}
You can use
s.matches("(\\p{Alnum})\\1(?!\\1)(\\p{Alnum})\\2(?!\\1|\\2)(\\p{Alnum})\\3")
See the regex demo.
Details
\A - (implicit in String#matches) - start of string
(\p{Alnum})\1 - an alphanumeric char (captured into Group 1) and an identical char right after
(?!\1) - the next char cannot be the same as in Group 1
(\p{Alnum})\2 - an alphanumeric char (captured into Group 2) and an identical char right after
(?!\1|\2) - the next char cannot be the same as in Group 1 or Group 2
(\p{Alnum})\3 - an alphanumeric char (captured into Group 3) and an identical char right after
\z - (implicit in String#matches) - end of string.
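To try the pattern against the sample strings from the question, a small harness like the one below can be used. It is only a sketch; the class name and the REGEX constant are illustrative, and the regex itself is copied unchanged from the answer.

```java
final class PairPatternDemo {
    // Regex from the answer above, written as a Java string literal.
    static final String REGEX =
            "(\\p{Alnum})\\1(?!\\1)(\\p{Alnum})\\2(?!\\1|\\2)(\\p{Alnum})\\3";

    static boolean checkPattern(String str) {
        // String#matches anchors the pattern at both ends of the input.
        return str.matches(REGEX);
    }

    public static void main(String[] args) {
        String[] samples = { "RRFFKK", "BB7733", "WWDDMM", "5599AA", "555677", "AABBCD" };
        for (String s : samples) {
            System.out.println(s + " -> " + checkPattern(s));
        }
    }
}
```

The first four samples print true and the last two print false.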
Since you know a valid pattern will always be six characters long with three pairs of equal characters which are different from each other, a short series of explicit conditions may be simpler than a regex:
public static boolean checkPattern(String str) {
return str.length() == 6 &&
str.charAt(0) == str.charAt(1) &&
str.charAt(2) == str.charAt(3) &&
str.charAt(4) == str.charAt(5) &&
str.charAt(0) != str.charAt(2) &&
str.charAt(0) != str.charAt(4) &&
str.charAt(2) != str.charAt(4);
}
Would the following work for you?
^(([A-Za-z\d])\2(?!.*\2)){3}$
See the online demo
^ - Start string anchor.
( - Open 1st capture group.
( - Open 2nd capture group.
[A-Za-z\d] - Any alphanumeric character.
) - Close 2nd capture group.
\2 - Match exactly what was just captured.
(?!.*\2) - Negative lookahead to make sure the same character is not used elsewhere.
) - Close 1st capture group.
{3} - Repeat the above three times.
$ - End string anchor.
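In Java source the backslashes in this pattern have to be doubled. A minimal sketch of how it might be wired up with a precompiled Pattern follows; the class, field and method names are assumptions for illustration only.

```java
import java.util.regex.Pattern;

final class RepeatedGroupDemo {
    // Same regex as above; \d and \2 become \\d and \\2 inside a Java string literal.
    private static final Pattern THREE_DISTINCT_PAIRS =
            Pattern.compile("^(([A-Za-z\\d])\\2(?!.*\\2)){3}$");

    static boolean checkPattern(String str) {
        return THREE_DISTINCT_PAIRS.matcher(str).matches();
    }

    public static void main(String[] args) {
        System.out.println(checkPattern("RRFFKK")); // true
        System.out.println(checkPattern("555677")); // false
        System.out.println(checkPattern("AABBCD")); // false
    }
}
```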
Well, here's another solution that uses regex and streams in combination.
It breaks the string into groups of two identical characters, keeps the distinct groups, and returns true if the string has six characters and the count of distinct groups is 3.
String[] data = { "AABBBB", "AABBCC", "AAAAAA","AABBAA", "ABC", "AAABCC",
"RRABBCCC" };
String pat = "(?:\\G(.)\\1)+";
Pattern pattern = Pattern.compile(pat);
for (String str : data) {
Matcher m = pattern.matcher(str);
boolean isValid = str.length() == 6 && m.results().map(MatchResult::group).distinct().count() == 3;
System.out.printf("%8s -> %s%n",
str, isValid ? "Valid" : "Not Valid");
}
Prints
AABBBB -> Not Valid
AABBCC -> Valid
AAAAAA -> Not Valid
AABBAA -> Not Valid
ABC -> Not Valid
AAABCC -> Not Valid
RRABBCCC -> Not Valid
You can check if a character matches with its following character and also if the count of distinct characters is 3.
Demo:
public class Main {
public static void main(String[] args) {
// Test
System.out.println(isValidPattern("RRFFKK"));
System.out.println(isValidPattern("BBAABB"));
System.out.println(isValidPattern("555677"));
}
static boolean isValidPattern(String str) {
return str.length() == 6 &&
str.charAt(0) == str.charAt(1) &&
str.charAt(2) == str.charAt(3) &&
str.charAt(4) == str.charAt(5) &&
str.chars().distinct().count() == 3;
}
}
Output:
true
false
false
Note: String#chars is inherited from CharSequence and has been available since Java 8.

How to check if String contains Latin letters without regex

I want to check that a String contains only Latin letters, but it may also contain numbers and other symbols such as _, /, +, ), etc.
The String utm_source=google should pass, and utm_source=google&2019_and_2020! should pass too. But utm_ресурс=google should not pass (because of the Cyrillic letters). I know how to do it with a regex, but how can I do it without a regex and without a classic for loop, maybe with Streams and the Character class?
Use this code
public static boolean isValidUsAscii (String s) {
return Charset.forName("US-ASCII").newEncoder().canEncode(s);
}
For restricted "latin" (no é etcetera), it must be either US-ASCII (7 bits), or ISO-8859-1 but without accented letters.
boolean isBasicLatin(String s) {
return s.codePoints().allMatch(cp -> cp < 128 || (cp < 256 && !Character.isLetter(cp)));
}
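If accented Latin letters such as é should pass as well, one possible variation is to look at the Unicode script of each letter instead of its numeric range. This is a sketch under that assumption, not what the answers above do; the class and method names are invented for illustration.

```java
final class LatinLettersCheck {
    // Returns true if every letter in the string belongs to the Latin script.
    // Digits, punctuation and other non-letters are ignored.
    static boolean hasOnlyLatinLetters(String s) {
        return s.codePoints()
                .filter(Character::isLetter)
                .allMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.LATIN);
    }

    public static void main(String[] args) {
        System.out.println(hasOnlyLatinLetters("utm_source=google"));                // true
        System.out.println(hasOnlyLatinLetters("utm_source=google&2019_and_2020!")); // true
        System.out.println(hasOnlyLatinLetters("utm_ресурс=google"));                // false
    }
}
```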
Less of a neat single line approach but really all you need to do is check whether the numeric value of the character is within certain limits like so:
public boolean isQwerty(String text) {
int length = text.length();
for(int i = 0; i < length; i++) {
char character = text.charAt(i);
int ascii = character;
if(ascii<32||ascii>126) {
return false;
}
}
return true;
}
Test Run
ä returns false
abc returns true

Java string matching with wildcards

I have a pattern string with a wild card say X (E.g.: abc*).
Also I have a set of strings which I have to match against the given pattern.
E.g.:
abf - false
abc_fgh - true
abcgafa - true
fgabcafa - false
I tried using regex for the same, it didn't work.
Here is my code
String pattern = "abc*";
String str = "abcdef";
Pattern regex = Pattern.compile(pattern);
return regex.matcher(str).matches();
This returns false
Is there any other way to make this work?
Thanks
Just use a bash-style (glob) pattern to Java regex converter:
public static void main(String[] args) {
String patternString = createRegexFromGlob("abc*");
List<String> list = Arrays.asList("abf", "abc_fgh", "abcgafa", "fgabcafa");
list.forEach(it -> System.out.println(it.matches(patternString)));
}
private static String createRegexFromGlob(String glob) {
StringBuilder out = new StringBuilder("^");
for(int i = 0; i < glob.length(); ++i) {
final char c = glob.charAt(i);
switch(c) {
case '*': out.append(".*"); break;
case '?': out.append('.'); break;
case '.': out.append("\\."); break;
case '\\': out.append("\\\\"); break;
default: out.append(c);
}
}
out.append('$');
return out.toString();
}
Is there an equivalent of java.util.regex for “glob” type patterns?
Convert wildcard to a regex expression
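The converter above escapes only '.' and '\', so other regex metacharacters in the glob (for example '+' or '(') would still be interpreted as regex syntax. One way around that, sketched here with hypothetical names, is to pass every literal run through Pattern.quote and translate only '*' and '?'.

```java
import java.util.regex.Pattern;

final class GlobSketch {
    // Converts a glob with '*' (any run of characters) and '?' (one character) to a Pattern.
    // Everything else is treated as literal text via Pattern.quote.
    static Pattern globToPattern(String glob) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < glob.length(); i++) {
            char c = glob.charAt(i);
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    public static void main(String[] args) {
        Pattern p = globToPattern("abc*");
        for (String s : new String[] { "abf", "abc_fgh", "abcgafa", "fgabcafa" }) {
            System.out.println(s + " - " + p.matcher(s).matches());
        }
    }
}
```

For the strings from the question this prints false, true, true, false.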
You can use stringVariable.startsWith("abc").
abc* would be the RegEx that matches ab, abc, abcc, abccc and so on.
What you want is abc.* - if abc is supposed to be the beginning of the matched string and it's optional if anything follows it.
Otherwise you could prepend .* to also match strings with abc in the middle: .*abc.*
Generally I recommend playing around with an online regex tester to learn RegEx. You are asking for a pretty basic pattern, but it's hard to say what exactly you need. Good luck!
EDIT:
It seems like you want the user to type a part of a file name (or so) and you want to offer something like a search functionality (you could have made that clear in your question IMO). In this case you could bake your own RegEx from the users' input:
private Pattern getSearchRegEx(String userInput){
return Pattern.compile(".*" + userInput + ".*");
}
Of course that's just a very simple example. You could modify this and then use the RegEx to match file names.
So I think here is your answer:
The regexp that you are looking for is this : [a][b][c].*
Here is my code that works:
String first = "abc"; // true
String second = "abctest"; // true
String third = "sthabcsth"; // false
Pattern pattern = Pattern.compile("[a][b][c].*");
System.out.println(first.matches(pattern.pattern())); // true
System.out.println(second.matches(pattern.pattern())); // true
System.out.println(third.matches(pattern.pattern())); // false
But if you want to check only if starts with or ends with you can use the methods of String: .startsWith() and endsWith()
// The main function that checks if two given strings match. The pattern string may contain
// wildcard characters
default boolean matchPattern(String pattern, String str) {
// If we reach at the end of both strings, we are done
if (pattern.length() == 0 && str.length() == 0) return true;
// Make sure that the characters after '*' are present in str string. This function assumes that
// the pattern string will not contain two consecutive '*'
if (pattern.length() > 1 && pattern.charAt(0) == '*' && str.length() == 0) return false;
// If the pattern has '?' at the current position, or the current characters of both strings match
// (both strings must be non-empty, otherwise substring(1) would throw)
if (pattern.length() != 0 && str.length() != 0
&& (pattern.charAt(0) == '?' || pattern.charAt(0) == str.charAt(0)))
return matchPattern(pattern.substring(1), str.substring(1));
// If there is *, then there are two possibilities
// a: We consider current character of str string
// b: We ignore current character of str string.
if (pattern.length() > 0 && pattern.charAt(0) == '*')
return matchPattern(pattern.substring(1), str) || matchPattern(pattern, str.substring(1));
return false;
}
public static void main(String[] args) {
test("w*ks", "weeks"); // Yes
test("we?k*", "weekend"); // Yes
test("g*k", "gee"); // No because 'k' is not in second
test("*pqrs", "pqrst"); // No because 't' is not in first
test("abc*bcd", "abcdhghgbcd"); // Yes
test("abc*c?d", "abcd"); // No because second must have 2 instances of 'c'
test("*c*d", "abcd"); // Yes
test("*?c*d", "abcd"); // Yes
}
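The recursive version creates many substrings and can become slow when a pattern contains several '*'. An iterative variant that remembers the position of the last '*' and backtracks to it is sketched below. This is an alternative technique, not the code from the answer above, and the class and method names are illustrative.

```java
final class WildcardSketch {
    // '*' matches any run of characters (including none), '?' matches exactly one character.
    static boolean matches(String pattern, String text) {
        int p = 0, t = 0;
        int starP = -1; // index of the most recent '*' in the pattern, or -1 if none seen yet
        int starT = 0;  // text position where that '*' currently stops matching
        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                starP = p++;
                starT = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == text.charAt(t))) {
                p++;
                t++;
            } else if (starP != -1) {
                // Mismatch: let the last '*' absorb one more character and retry.
                p = starP + 1;
                t = ++starT;
            } else {
                return false;
            }
        }
        // Any remaining pattern characters must all be '*'.
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }

    public static void main(String[] args) {
        System.out.println(matches("w*ks", "weeks"));   // true
        System.out.println(matches("g*k", "gee"));      // false
        System.out.println(matches("*pqrs", "pqrst"));  // false
    }
}
```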

Checking for a not null, not blank String in Java

I am trying to check if a Java String is not null, not empty and not whitespace.
In my mind, this code should have been quite up for the job.
public static boolean isEmpty(String s) {
if ((s != null) && (s.trim().length() > 0))
return false;
else
return true;
}
As per documentation, String.trim() should work thus:
Returns a copy of the string, with leading and trailing whitespace omitted.
If this String object represents an empty character sequence, or the first and last characters of character sequence represented by this String object both have codes greater than '\u0020' (the space character), then a reference to this String object is returned.
However, apache/commons/lang/StringUtils.java does it a little differently.
public static boolean isBlank(String str) {
int strLen;
if (str == null || (strLen = str.length()) == 0) {
return true;
}
for (int i = 0; i < strLen; i++) {
if ((Character.isWhitespace(str.charAt(i)) == false)) {
return false;
}
}
return true;
}
As per documentation, Character.isWhitespace():
Determines if the specified character is white space according to Java. A character is a Java whitespace character if and only if it satisfies one of the following criteria:
It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', '\u2007', '\u202F').
It is '\t', U+0009 HORIZONTAL TABULATION.
It is '\n', U+000A LINE FEED.
It is '\u000B', U+000B VERTICAL TABULATION.
It is '\f', U+000C FORM FEED.
It is '\r', U+000D CARRIAGE RETURN.
It is '\u001C', U+001C FILE SEPARATOR.
It is '\u001D', U+001D GROUP SEPARATOR.
It is '\u001E', U+001E RECORD SEPARATOR.
It is '\u001F', U+001F UNIT SEPARATOR.
If I am not mistaken (or perhaps I am just not reading it correctly), String.trim() should strip any of the characters that are checked by Character.isWhitespace(). All of them seem to be above '\u0020'.
In this case, the simpler isEmpty function seems to be covering all the scenarios that the lengthier isBlank is covering.
Is there a string that will make the isEmpty and isBlank behave differently in a test case?
Assuming there are none, is there any other consideration because of which I should choose isBlank and not use isEmpty?
For those interested in actually running a test, here are the methods and unit tests.
public class StringUtil {
public static boolean isEmpty(String s) {
if ((s != null) && (s.trim().length() > 0))
return false;
else
return true;
}
public static boolean isBlank(String str) {
int strLen;
if (str == null || (strLen = str.length()) == 0) {
return true;
}
for (int i = 0; i < strLen; i++) {
if ((Character.isWhitespace(str.charAt(i)) == false)) {
return false;
}
}
return true;
}
}
And unit tests
@Test
public void test() {
String s = null;
assertTrue(StringUtil.isEmpty(s)) ;
assertTrue(StringUtil.isBlank(s)) ;
s = "";
assertTrue(StringUtil.isEmpty(s)) ;
assertTrue(StringUtil.isBlank(s));
s = " ";
assertTrue(StringUtil.isEmpty(s)) ;
assertTrue(StringUtil.isBlank(s)) ;
s = " ";
assertTrue(StringUtil.isEmpty(s)) ;
assertTrue(StringUtil.isBlank(s)) ;
s = " a ";
assertTrue(StringUtil.isEmpty(s)==false) ;
assertTrue(StringUtil.isBlank(s)==false) ;
}
Update: It was a really interesting discussion, and this is why I love Stack Overflow and the folks here. By the way, coming back to the question, we got:
A program showing all the characters that make the two methods behave differently. The code is at https://ideone.com/ELY5Wv. Thanks @Dukeling.
A performance-related reason for choosing the standard isBlank(). Thanks @devconsole.
A comprehensive explanation by @nhahtdh. Thanks mate.
Is there a string that will make the isEmpty and isBlank behave differently in a test case?
Note that Character.isWhitespace can recognize Unicode characters and return true for Unicode whitespace characters.
Determines if the specified character is white space according to Java. A character is a Java whitespace character if and only if it satisfies one of the following criteria:
It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', '\u2007', '\u202F').
[...]
On the other hand, trim() method would trim all control characters whose code points are below U+0020 and the space character (U+0020).
Therefore, the two methods behave differently in the presence of a Unicode whitespace character, for example "\u2008", or when the string contains control characters that are not considered whitespace by the Character.isWhitespace method, for example "\002".
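A small sketch that makes the difference visible. The two helper methods mirror the trim-based and isWhitespace-based checks from the question, but their names and the class name are made up here.

```java
final class TrimVsIsWhitespace {
    // Mirrors the trim()-based check from the question.
    static boolean isEmptyAfterTrim(String s) {
        return s == null || s.trim().length() == 0;
    }

    // Mirrors the Character.isWhitespace-based check from the question.
    static boolean isBlankByWhitespace(String s) {
        if (s == null) {
            return true;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isWhitespace(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String punctuationSpace = "\u2008"; // Unicode space above U+0020
        String startOfText = "\u0002";      // control character below U+0020

        System.out.println(isEmptyAfterTrim(punctuationSpace));    // false
        System.out.println(isBlankByWhitespace(punctuationSpace)); // true
        System.out.println(isEmptyAfterTrim(startOfText));         // true
        System.out.println(isBlankByWhitespace(startOfText));      // false
    }
}
```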
If you were to write a regular expression to do this (which is slower than doing a loop through the string and check):
isEmpty() would be equivalent to .matches("[\\x00-\\x20]*")
isBlank() would be equivalent to .matches("\\p{javaWhitespace}*")
(The isEmpty() and isBlank() method both allow for null String reference, so it is not exactly equivalent to the regex solution, but putting that aside, it is equivalent).
Note that \p{javaWhitespace}, as its name implied, is Java-specific syntax to access the character class defined by Character.isWhitespace method.
Assuming there are none, is there any other consideration because of which I should choose isBlank and not use isEmpty?
It depends. However, I think the explanation in the part above should be sufficient for you to decide. To sum up the difference:
isEmpty() will consider the string empty if it contains only control characters [1] below U+0020 and the space character (U+0020).
isBlank() will consider the string empty if it contains only whitespace characters as defined by the Character.isWhitespace method, which includes Unicode whitespace characters.
[1] There is also the control character at U+007F DELETE, which is not trimmed by the trim() method.
The purpose of the two standard methods is to distinguish between these two cases:
org.apache.commons.lang.StringUtils.isBlank(" ") (will return true).
org.apache.commons.lang.StringUtils.isEmpty(" ") (will return false).
Your custom implementation of isEmpty() will return true.
UPDATE:
org.apache.commons.lang.StringUtils.isEmpty() is used to find out whether the String has length 0 or is null.
org.apache.commons.lang.StringUtils.isBlank() takes it a step further. It not only checks whether the String has length 0 or is null, but also checks whether it consists only of whitespace.
In your case, you're trimming the String in your isEmpty method. The only difference that could occur (the case where you pass it " ") can't occur now, because you're trimming it (removing the leading and trailing whitespace, which in this case removes all the spaces).
I would choose isBlank() over isEmpty() because trim() creates a new String object that has to be garbage collected later. isBlank() on the other hand does not create any objects.
You could take a look at JSR 303 Bean Validation, which contains the annotations @NotEmpty and @NotNull. Bean Validation is cool because you can separate validation issues from the original intent of the method.
Why can't you simply use a nested ternary operator to achieve this? Please look at the sample code:
public static void main(String[] args)
{
String s = null;
String s1="";
String s2="hello";
System.out.println(" 1 "+check(s));
System.out.println(" 2 "+check(s1));
System.out.println(" 3 "+check(s2));
}
public static boolean check(String data)
{
return (data==null?false:(data.isEmpty()?false:true));
}
and the output is as follows:
 1 false
 2 false
 3 true
Here the first two scenarios (null and empty) return false and the third scenario returns true.
<%
System.out.println(request.getParameter("userName")+"*");
if (request.getParameter("userName") == null || request.getParameter("userName").trim().length() == 0) { %>
<jsp:forward page="HandleIt.jsp" />
<% }
else { %>
Hello ${param.userName}
<%} %>
This simple code will do enough:
public static boolean isNullOrEmpty(String str) {
return str == null || str.trim().equals("");
}
And the unit tests:
@Test
public void testIsNullOrEmpty() {
assertEquals(true, AcdsUtils.isNullOrEmpty(""));
assertEquals(true, AcdsUtils.isNullOrEmpty((String) null));
assertEquals(false, AcdsUtils.isNullOrEmpty("lol "));
assertEquals(false, AcdsUtils.isNullOrEmpty("HallO"));
}
With Java 8, you could also use Optional with filtering. To check whether a string is blank, the code is pure Java SE without any additional library.
The following code illustrates an isBlank() implementation.
String.trim() behaviour
!Optional.ofNullable(tocheck).filter(e -> e != null && e.trim().length() > 0).isPresent()
StringUtils.isBlank() behaviour
!Optional.ofNullable(toCheck)
.filter(e ->
{
// Keep the value only if it contains at least one non-whitespace character
for (int i = 0; i < e.length(); i++) {
if (!Character.isWhitespace(e.charAt(i))) {
return true;
}
}
return false;
})
.isPresent()

Java function to return if string contains illegal characters

I have the following characters that I would like to be considered "illegal":
~, #, @, *, +, %, {, }, <, >, [, ], |, “, ”, \, _, ^
I'd like to write a method that inspects a string and determines (true/false) if that string contains these illegals:
public boolean containsIllegals(String toExamine) {
return toExamine.matches("^.*[~#@*+%{}<>[]|\"\\_^].*$");
}
However, a simple matches(...) check isn't feasible for this. I need the method to scan every character in the string and make sure it's not one of these characters. Of course, I could do something horrible like:
public boolean containsIllegals(String toExamine) {
for(int i = 0; i < toExamine.length(); i++) {
char c = toExamine.charAt(i);
if(c == '~')
return true;
else if(c == '#')
return true;
// etc...
}
}
Is there a more elegant/efficient way of accomplishing this?
You can make use of Pattern and Matcher class here. You can put all the filtered character in a character class, and use Matcher#find() method to check whether your pattern is available in string or not.
You can do it like this:
public boolean containsIllegals(String toExamine) {
Pattern pattern = Pattern.compile("[~#@*+%{}<>\\[\\]|\"\\_^]");
Matcher matcher = pattern.matcher(toExamine);
return matcher.find();
}
find() method will return true, if the given pattern is found in the string, even once.
Another way that has not yet been pointed out is using String#split(regex). We can split the string on the given pattern, and check the length of the array. If length is 1, then the pattern was not in the string.
public boolean containsIllegals(String toExamine) {
String[] arr = toExamine.split("[~#@*+%{}<>\\[\\]|\"\\_^]", 2);
return arr.length > 1;
}
If arr.length > 1, the string contained one of the characters in the pattern, which is why it was split. I have passed limit = 2 as the second parameter to split because a single split is enough.
I need the method to scan every character in the string
If you must do it character-by-character, regexp is probably not a good way to go. However, since all characters on your "blacklist" have codes less than 128, you can do it with a small boolean array:
static final boolean blacklist[] = new boolean[128];
static {
// Unassigned elements of the array are set to false
blacklist[(int)'~'] = true;
blacklist[(int)'#'] = true;
blacklist[(int)'@'] = true;
blacklist[(int)'*'] = true;
blacklist[(int)'+'] = true;
...
}
static boolean isBad(char ch) {
return (ch < 128) && blacklist[(int)ch];
}
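Put together as a complete method, the lookup-table idea might look like the sketch below. The class name, the ILLEGAL constant and the exact character set are assumptions based on the list in the question; adjust them as needed.

```java
final class IllegalCharsSketch {
    // Assumed set of illegal characters, taken from the question's list (all below code 128).
    private static final String ILLEGAL = "~#@*+%{}<>[]|\"\\_^";

    // One flag per ASCII code; entries not set explicitly stay false.
    private static final boolean[] BLACKLIST = new boolean[128];

    static {
        for (char c : ILLEGAL.toCharArray()) {
            BLACKLIST[c] = true;
        }
    }

    static boolean containsIllegals(String toExamine) {
        for (int i = 0; i < toExamine.length(); i++) {
            char c = toExamine.charAt(i);
            if (c < 128 && BLACKLIST[c]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsIllegals("hello world")); // false
        System.out.println(containsIllegals("snake_case"));  // true
        System.out.println(containsIllegals("50%"));         // true
    }
}
```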
Use a constant to avoid recompiling the regex on every validation.
private static final Pattern INVALID_CHARS_PATTERN =
Pattern.compile("^.*[~#@*+%{}<>\\[\\]|\"\\_].*$");
And change your code to:
public boolean containsIllegals(String toExamine) {
return INVALID_CHARS_PATTERN.matcher(toExamine).matches();
}
Reusing the compiled Pattern is generally the most efficient way to do this with a regex.
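A possible variation, shown only as a sketch: keep the precompiled constant but use Matcher#find() with just the character class, so no surrounding .* is needed. Note that in the patterns above the Java literal \\_ is the regex \_ (an escaped underscore), so a backslash in the input is not detected; the sketch adds \\\\ to cover it. The class and constant names are hypothetical.

```java
import java.util.regex.Pattern;

final class IllegalCharsRegexSketch {
    // Compiled once; the class lists each illegal character, including a literal backslash.
    private static final Pattern ILLEGAL_CHARS =
            Pattern.compile("[~#@*+%{}<>\\[\\]|\"\\\\_^]");

    static boolean containsIllegals(String toExamine) {
        // find() succeeds as soon as one illegal character occurs anywhere in the string.
        return ILLEGAL_CHARS.matcher(toExamine).find();
    }

    public static void main(String[] args) {
        System.out.println(containsIllegals("plain text")); // false
        System.out.println(containsIllegals("x^y"));        // true
        System.out.println(containsIllegals("a\\b"));       // true
    }
}
```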
If you can't use a matcher, then you can do something like this, which is cleaner than a bunch of different if statements or a byte array.
for(int i = 0; i < toExamine.length(); i++) {
char c = toExamine.charAt(i);
if("~#@*+%{}<>[]|\"_^".indexOf(c) >= 0){
return true;
}
}
Try the negation of a character class containing all the blacklisted characters:
public boolean containsIllegals(String toExamine) {
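return !toExamine.matches("[^~#@*+%{}<>\\[\\]|\"\\_^]*");
}
The negated character class matches only strings that consist entirely of legal characters, so negating the result makes the method return true if the string contains illegals (your original function seemed to return false in that case).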
The caret ^ just to the right of the opening bracket [ negates the character class. Note that in String.matches() you don't need the anchors ^ and $ because it automatically matches the whole string.
A pretty compact way of doing this would be to rely on the String.replaceAll method:
public boolean containsIllegal(final String toExamine) {
return toExamine.length() != toExamine.replaceAll(
"[~#@*+%{}<>\\[\\]|\"\\_^]", "").length();
}
