Java string matching with wildcards

I have a pattern string containing a wildcard (e.g. abc*).
Also I have a set of strings which I have to match against the given pattern.
E.g.:
abf - false
abc_fgh - true
abcgafa - true
fgabcafa - false
I tried using regex for the same, it didn't work.
Here is my code
String pattern = "abc*";
String str = "abcdef";
Pattern regex = Pattern.compile(pattern);
return regex.matcher(str).matches();
This returns false
Is there any other way to make this work?
Thanks

Just use bash style pattern to Java style pattern converter:
public static void main(String[] args) {
String patternString = createRegexFromGlob("abc*");
List<String> list = Arrays.asList("abf", "abc_fgh", "abcgafa", "fgabcafa");
list.forEach(it -> System.out.println(it.matches(patternString)));
}
private static String createRegexFromGlob(String glob) {
StringBuilder out = new StringBuilder("^");
for(int i = 0; i < glob.length(); ++i) {
final char c = glob.charAt(i);
switch(c) {
case '*': out.append(".*"); break;
case '?': out.append('.'); break;
case '.': out.append("\\."); break;
case '\\': out.append("\\\\"); break;
default: out.append(c);
}
}
out.append('$');
return out.toString();
}
Is there an equivalent of java.util.regex for “glob” type patterns?
Convert wildcard to a regex expression
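The converter above escapes only '.' and '\', so a glob containing other regex metacharacters such as '+' or '(' would be misread as regex syntax. A minimal sketch of one way around that, assuming only * and ? are wildcards and every other character is literal (GlobPatterns and globToPattern are illustrative names, not part of the answer):

```java
import java.util.regex.Pattern;

final class GlobPatterns {
    // Hypothetical helper: turns a glob with '*' and '?' into a compiled regex.
    static Pattern globToPattern(String glob) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < glob.length(); i++) {
            char c = glob.charAt(i);
            if (c == '*' || c == '?') {
                flushLiteral(literal, regex);
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        flushLiteral(literal, regex);
        // DOTALL lets the wildcards match line terminators as well.
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // Quotes the pending literal run so regex metacharacters lose their meaning.
    private static void flushLiteral(StringBuilder literal, StringBuilder regex) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }

    public static void main(String[] args) {
        Pattern p = globToPattern("abc*");
        for (String s : new String[] {"abf", "abc_fgh", "abcgafa", "fgabcafa"}) {
            System.out.println(s + " - " + p.matcher(s).matches());
        }
    }
}
```

Compiling once and reusing the Pattern also avoids recompiling the regex for every candidate string.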

you can use stringVariable.startsWith("abc")

abc* would be the RegEx that matches ab, abc, abcc, abccc and so on.
What you want is abc.* - if abc is supposed to be the beginning of the matched string and it's optional if anything follows it.
Otherwise you could prepend .* to also match strings with abc in the middle: .*abc.*
Generally i recommend playing around with a site like this to learn RegEx. You are asking for a pretty basic pattern but it's hard to say what you need exactly. Good Luck!
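The difference between the three patterns can be checked directly. This is a small sketch (the class and helper names are made up for illustration) that runs each regex against a few of the strings discussed here:

```java
import java.util.regex.Pattern;

final class StarVersusDotStar {
    // Convenience wrapper: true only if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // "abc*" is "ab" followed by zero or more 'c' characters.
        System.out.println(fullMatch("abc*", "ab"));
        System.out.println(fullMatch("abc*", "abccc"));
        // 'd', 'e' and 'f' are not allowed by "abc*", so this one fails.
        System.out.println(fullMatch("abc*", "abcdef"));
        // ".*" means "any characters", which is what the glob '*' intends.
        System.out.println(fullMatch("abc.*", "abcdef"));
        System.out.println(fullMatch(".*abc.*", "fgabcafa"));
    }
}
```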
EDIT:
It seems like you want the user to type a part of a file name (or so) and you want to offer something like a search functionality (you could have made that clear in your question IMO). In this case you could bake your own RegEx from the users' input:
private Pattern getSearchRegEx(String userInput){
return Pattern.compile(".*" + userInput + ".*");
}
Of course that's just a very simple example. You could modify this and then use the RegEx to match file names.
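One thing to watch with the method above: if the user types characters such as '.', '(' or '*', they are interpreted as regex syntax. If the input is meant as plain text, Pattern.quote can neutralise it. A hedged sketch (SearchPatterns and literalSearch are hypothetical names):

```java
import java.util.regex.Pattern;

final class SearchPatterns {
    // Treats userInput as literal text and matches any string containing it.
    static Pattern literalSearch(String userInput) {
        return Pattern.compile(".*" + Pattern.quote(userInput) + ".*");
    }

    public static void main(String[] args) {
        Pattern p = literalSearch("(1)");
        System.out.println(p.matcher("file(1).txt").matches());
        System.out.println(p.matcher("file1.txt").matches());
    }
}
```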

So I think here is your answer:
The regexp that you are looking for is this : [a][b][c].*
Here is my code that works:
String first = "abc"; // true
String second = "abctest"; // true
String third = "sthabcsth"; // false
Pattern pattern = Pattern.compile("[a][b][c].*");
System.out.println(first.matches(pattern.pattern())); // true
System.out.println(second.matches(pattern.pattern())); // true
System.out.println(third.matches(pattern.pattern())); // false
But if you want to check only if starts with or ends with you can use the methods of String: .startsWith() and endsWith()

// The main function that checks if two given strings match. The pattern string may contain
// wildcard characters
default boolean matchPattern(String pattern, String str) {
// If we reach at the end of both strings, we are done
if (pattern.length() == 0 && str.length() == 0) return true;
// Make sure that the characters after '*' are present in str string. This function assumes that
// the pattern string will not contain two consecutive '*'
if (pattern.length() > 1 && pattern.charAt(0) == '*' && str.length() == 0) return false;
// If the pattern has '?' or the current characters of both strings match, consume one
// character from each. str must be non-empty here, otherwise substring(1) would throw.
if (pattern.length() != 0 && str.length() != 0
&& (pattern.charAt(0) == '?' || pattern.charAt(0) == str.charAt(0)))
return matchPattern(pattern.substring(1), str.substring(1));
// If there is *, then there are two possibilities
// a: We consider current character of str string
// b: We ignore current character of str string.
if (pattern.length() > 0 && pattern.charAt(0) == '*')
return matchPattern(pattern.substring(1), str) || matchPattern(pattern, str.substring(1));
return false;
}
public static void main(String[] args) {
test("w*ks", "weeks"); // Yes
test("we?k*", "weekend"); // Yes
test("g*k", "gee"); // No because 'k' is not in second
test("*pqrs", "pqrst"); // No because 't' is not in first
test("abc*bcd", "abcdhghgbcd"); // Yes
test("abc*c?d", "abcd"); // No because second must have 2 instances of 'c'
test("*c*d", "abcd"); // Yes
test("*?c*d", "abcd"); // Yes
}
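The recursive version above creates many substrings and can revisit the same positions repeatedly. As an alternative (a different technique from the answer's recursion), here is a sketch of the common iterative matcher that remembers the most recent '*' and backtracks to it. The class and method names are illustrative:

```java
final class WildcardMatcher {
    // Returns true if text matches pattern, where '*' matches any run of
    // characters (including none) and '?' matches exactly one character.
    static boolean matches(String pattern, String text) {
        int p = 0;            // current index in pattern
        int t = 0;            // current index in text
        int starIdx = -1;     // index of the most recent '*' in pattern
        int resumeT = 0;      // text index where that '*' currently stops matching
        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                // Record the star and first try matching it against nothing.
                starIdx = p;
                resumeT = t;
                p++;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == text.charAt(t))) {
                p++;
                t++;
            } else if (starIdx != -1) {
                // Mismatch: let the last star absorb one more character and retry.
                p = starIdx + 1;
                resumeT++;
                t = resumeT;
            } else {
                return false;
            }
        }
        // Only trailing stars may remain in the pattern.
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }

    public static void main(String[] args) {
        System.out.println(matches("w*ks", "weeks"));
        System.out.println(matches("g*k", "gee"));
        System.out.println(matches("abc*c?d", "abcd"));
    }
}
```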

Related

Regex for a pattern XXYYZZ

I want to validate a string which should follow the pattern XXYYZZ where X, Y, Z can be any letter a-z, A-Z or 0-9.
Example of valid strings:
RRFFKK
BB7733
WWDDMM
5599AA
Not valid:
555677
AABBCD
For now I am splitting the string using the regex (?<=(.))(?!\\1) and iterating over the resulting array and checking if each substring has a length of 2.
String str = "AABBEE";
boolean isValid = checkPattern(str);
public static boolean checkPattern(String str) {
String[] splited = str.split("(?<=(.))(?!\\1)");
for (String s : splited) {
if (s.length() != 2) {
return false;
}
}
return true;
}
I would like to replace my way of checking with String#matches and get rid of the loop, but can't come up with a valid regex. Can someone suggest what to put in someRegex in the snippet below?
public static boolean checkPattern(String str) {
return str.matches(someRegex);
}
You can use
s.matches("(\\p{Alnum})\\1(?!\\1)(\\p{Alnum})\\2(?!\\1|\\2)(\\p{Alnum})\\3")
See the regex demo.
Details
\A - start of string (implicit in String#matches)
(\p{Alnum})\1 - an alphanumeric char (captured into Group 1) and an identical char right after
(?!\1) - the next char cannot be the same as in Group 1
(\p{Alnum})\2 - an alphanumeric char (captured into Group 2) and an identical char right after
(?!\1|\2) - the next char cannot be the same as in Group 1 or Group 2
(\p{Alnum})\3 - an alphanumeric char (captured into Group 3) and an identical char right after
\z - (implicit in String#matches) - end of string.
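To see the pattern in action, here is a small sketch that precompiles it and runs the valid and invalid samples from the question (PairPattern is an illustrative class name):

```java
import java.util.regex.Pattern;

final class PairPattern {
    // Three pairs of identical alphanumerics, each pair differing from the others.
    private static final Pattern XXYYZZ = Pattern.compile(
            "(\\p{Alnum})\\1(?!\\1)(\\p{Alnum})\\2(?!\\1|\\2)(\\p{Alnum})\\3");

    static boolean checkPattern(String str) {
        // matches() requires the whole string to match, so no anchors are needed.
        return XXYYZZ.matcher(str).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"RRFFKK", "BB7733", "WWDDMM", "5599AA", "555677", "AABBCD"};
        for (String s : samples) {
            System.out.println(s + " -> " + checkPattern(s));
        }
    }
}
```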
Since you know a valid pattern will always be six characters long with three pairs of equal characters which are different from each other, a short series of explicit conditions may be simpler than a regex:
public static boolean checkPattern(String str) {
return str.length() == 6 &&
str.charAt(0) == str.charAt(1) &&
str.charAt(2) == str.charAt(3) &&
str.charAt(4) == str.charAt(5) &&
str.charAt(0) != str.charAt(2) &&
str.charAt(0) != str.charAt(4) &&
str.charAt(2) != str.charAt(4);
}
Would the following work for you?
^(([A-Za-z\d])\2(?!.*\2)){3}$
See the online demo
^ - Start string anchor.
( - Open 1st capture group.
( - Open 2nd capture group.
[A-Za-z\d] - Any alphanumeric character.
) - Close 2nd capture group.
\2 - Match exactly what was just captured.
(?!.*\2) - Negative lookahead to make sure the same character is not used elsewhere.
) - Close 1st capture group.
{3} - Repeat the above three times.
$ - End string anchor.
Well, here's another solution that uses regex and streams in combination.
It breaks up the string into groups of two identical characters,
keeps the distinct groups,
and returns true if the count is 3.
String[] data = { "AABBBB", "AABBCC", "AAAAAA","AABBAA", "ABC", "AAABCC",
"RRABBCCC" };
String pat = "(?:\\G(.)\\1)+";
Pattern pattern = Pattern.compile(pat);
for (String str : data) {
Matcher m = pattern.matcher(str);
boolean isValid = str.length() == 6 && m.results().map(MatchResult::group).distinct().count() == 3;
System.out.printf("%8s -> %s%n",
str, isValid ? "Valid" : "Not Valid");
}
Prints
AABBBB -> Not Valid
AABBCC -> Valid
AAAAAA -> Not Valid
AABBAA -> Not Valid
ABC -> Not Valid
AAABCC -> Not Valid
RRABBCCC -> Not Valid
You can check if a character matches with its following character and also if the count of distinct characters is 3.
Demo:
public class Main {
public static void main(String[] args) {
// Test
System.out.println(isValidPattern("RRFFKK"));
System.out.println(isValidPattern("BBAABB"));
System.out.println(isValidPattern("555677"));
}
static boolean isValidPattern(String str) {
return str.length() == 6 &&
str.charAt(0) == str.charAt(1) &&
str.charAt(2) == str.charAt(3) &&
str.charAt(4) == str.charAt(5) &&
str.chars().distinct().count() == 3;
}
}
Output:
true
false
false
Note: String#chars (inherited from CharSequence) is available since Java 8.

String Predicates to validate if a String contains numeric Value in java

Is there any Predicate validation in Java that checks whether a String contains numbers?
I want to allow special characters but no numbers or spaces. There are Predicates that check for alphabetic characters, but they do not allow special characters. I need something that only allows letters and special characters and returns false if the String contains spaces or digits.
I will use a regex to show my understanding of the question. You want a Predicate<String> that returns true for any string matching
[a-zA-Z_]*
One way to do this regexlessly is to use a for loop and check each character:
Predicate<String> predicate = x -> {
for (int i = 0 ; i < x.length() ; i++) {
if (!Character.isLetter(x.charAt(i)) && x.charAt(i) != '_') {
return false;
}
}
return true;
};
Here is a method that does the same thing:
public static boolean test(String x) {
for (int i = 0 ; i < x.length() ; i++) {
if (!Character.isLetter(x.charAt(i)) && x.charAt(i) != '_') {
return false;
}
}
return true;
}
It may be done in a more elegant way:
Predicate<String> p = (s -> s.matches("[a-zA-Z\\_]*"));
Returning true for any string matching [a-zA-Z_]*.
Since your predicate shall return false if the string contains at least one digit or space character and else true, you can do the following:
Predicate<String> p = s -> !s.matches(".*[ \\d].*");
The advantage of this method is that every Unicode letter and every special character is valid in p. For some reason the other answers allow only ASCII letters ([a-zA-Z]) and only underscores. I guess the question has been rewritten in the meantime.
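The same rule can also be written without a regex. The sketch below is an assumption about what "numbers or spaces" should cover: it rejects anything Character.isDigit or Character.isWhitespace reports, which is broader than the ASCII digits and the single space character in the regex above. The class and constant names are illustrative:

```java
import java.util.function.Predicate;

final class NoDigitsOrSpaces {
    // True when the string contains no digit and no whitespace code point.
    static final Predicate<String> NO_DIGITS_OR_SPACES =
            s -> s.codePoints().noneMatch(
                    cp -> Character.isDigit(cp) || Character.isWhitespace(cp));

    public static void main(String[] args) {
        System.out.println(NO_DIGITS_OR_SPACES.test("h\u00e9llo!"));
        System.out.println(NO_DIGITS_OR_SPACES.test("abc1"));
        System.out.println(NO_DIGITS_OR_SPACES.test("a b"));
    }
}
```

Iterating over code points rather than chars keeps characters outside the Basic Multilingual Plane intact.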

inserting parentheses and asterisks into string according to some conditions

I have the following method which is used to insert parentheses and asterisks into a boolean expression when dealing with multiplication. For instance, an input of A+B+AB will give A+B+(A*B).
However, I also need to take into account the primes (apostrophes). The following are some examples of input/output:
A'B'+CD should give (A'*B')+(C*D)
A'B'C'D' should give (A'*B'*C'*D')
(A+B)'+(C'D') should give (A+B)'+(C'*D')
I have tried the following code but seems to have errors. Any thoughts?
public static String modify(String expression)
{
String temp = expression;
StringBuilder validated = new StringBuilder();
boolean inBrackets=false;
for(int idx=0; idx<temp.length()-1; idx++)
{
//no prime
if((Character.isLetter(temp.charAt(idx))) && (Character.isLetter(temp.charAt(idx+1))))
{
if(!inBrackets)
{
inBrackets = true;
validated.append("(");
}
validated.append(temp.substring(idx,idx+1));
validated.append("*");
}
//first prime
else if((Character.isLetter(temp.charAt(idx))) && (temp.charAt(idx+1)=='\'') && (Character.isLetter(temp.charAt(idx+2))))
{
if(!inBrackets)
{
inBrackets = true;
validated.append("(");
}
validated.append(temp.substring(idx,idx+2));
validated.append("*");
idx++;
}
//second prime
else if((Character.isLetter(temp.charAt(idx))) && (temp.charAt(idx+2)=='\'') && (Character.isLetter(temp.charAt(idx+1))))
{
if(!inBrackets)
{
inBrackets = true;
validated.append("(");
}
validated.append(temp.substring(idx,idx+1));
validated.append("*");
idx++;
}
else
{
validated.append(temp.substring(idx,idx+1));
if(inBrackets)
{
validated.append(")");
inBrackets=false;
}
}
}
validated.append(temp.substring(temp.length()-1));
if(inBrackets)
{
validated.append(")");
inBrackets=false;
}
return validated.toString();
}
Your help will greatly be appreciated. Thank you in advance! :)
I would suggest starting with the positions of the + characters in your string. If they differ by 1, you don't do anything. If they differ by two, there are two possibilities: AB or A'. So you check for that. If they differ by more than 2, just check for the ' symbol and insert the required symbol.
You can do it in 2 passes using regular expressions:
StringBuilder input = new StringBuilder("A'B'+(CDE)+A'B");
Pattern pattern1 = Pattern.compile("[A-Z]'?(?=[A-Z]'?)");
Matcher matcher1 = pattern1.matcher(input);
while (matcher1.find()) {
input.insert(matcher1.end(), '*');
matcher1.region(matcher1.end() + 1, input.length());
}
Pattern pattern2 = Pattern.compile("([A-Z]'?[*])+[A-Z]'?");
Matcher matcher2 = pattern2.matcher(input);
while (matcher2.find()) {
int start = matcher2.start();
int end = matcher2.end();
if (start==0||input.charAt(start-1) != '(') {
input.insert(start, '(');
end++;
}
if (input.length() == end || input.charAt(end) != ')') {
input.insert(end, ')');
end++;
}
matcher2.region(end, input.length());
}
It works as follows: the regex [A-Z]'? matches a capital letter (A-Z) optionally followed by an apostrophe, so it conveniently handles both the primed and unprimed cases. The regex [A-Z]'?(?=[A-Z]'?) then means "match a capital letter followed by an optional apostrophe, then look ahead for (but do not consume) a capital letter followed by an optional apostrophe". These are all the places after which you want to put an asterisk. We create a Matcher, find each match, and insert the asterisk after it. Since we modified the string, we need to update the Matcher's region for it to keep working properly.
In the second pass, we use the regex ([A-Z]'?[*])+[A-Z]'?, which looks for "a capital letter followed by an optional apostrophe and then an asterisk, one or more times, followed by a capital letter and an optional apostrophe". These are all the groups that need parentheses around them. So we create a Matcher and find the matches, then check whether there is already a parenthesis there (making sure not to go out of bounds). If not, we add one. We again need to update the Matcher since we inserted characters. Once this is over, we have our final string.
for more on regex:
Pattern documentation
Regex tutorial
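For testing against the examples in the question, the two passes can be wrapped in a method. This is only a repackaging sketch of the approach described above (ProductSyntax and the constant names are illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ProductSyntax {
    // A variable (optionally primed) that is directly followed by another variable.
    private static final Pattern ADJACENT = Pattern.compile("[A-Z]'?(?=[A-Z]'?)");
    // A run of variables joined by '*', i.e. one complete product term.
    private static final Pattern PRODUCT = Pattern.compile("([A-Z]'?[*])+[A-Z]'?");

    static String modify(String expression) {
        StringBuilder input = new StringBuilder(expression);

        // Pass 1: insert '*' between adjacent variables.
        Matcher m1 = ADJACENT.matcher(input);
        while (m1.find()) {
            int end = m1.end();
            input.insert(end, '*');
            m1.region(end + 1, input.length()); // skip past the inserted '*'
        }

        // Pass 2: wrap each product term in parentheses unless already wrapped.
        Matcher m2 = PRODUCT.matcher(input);
        while (m2.find()) {
            int start = m2.start();
            int end = m2.end();
            if (start == 0 || input.charAt(start - 1) != '(') {
                input.insert(start, '(');
                end++;
            }
            if (input.length() == end || input.charAt(end) != ')') {
                input.insert(end, ')');
                end++;
            }
            m2.region(end, input.length());
        }
        return input.toString();
    }

    public static void main(String[] args) {
        System.out.println(modify("A'B'+CD"));
        System.out.println(modify("(A+B)'+(C'D')"));
    }
}
```

Note that the "already has a parenthesis" check looks only at the neighbouring character, so an input such as (A+BC), where a product shares a parenthesis with a larger group, is outside what this heuristic handles.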

Java function to return if string contains illegal characters

I have the following characters that I would like to be considered "illegal":
~, #, #, *, +, %, {, }, <, >, [, ], |, “, ”, \, _, ^
I'd like to write a method that inspects a string and determines (true/false) if that string contains these illegals:
public boolean containsIllegals(String toExamine) {
return toExamine.matches("^.*[~##*+%{}<>[]|\"\\_^].*$");
}
However, a simple matches(...) check isn't feasible for this. I need the method to scan every character in the string and make sure it's not one of these characters. Of course, I could do something horrible like:
public boolean containsIllegals(String toExamine) {
for(int i = 0; i < toExamine.length(); i++) {
char c = toExamine.charAt(i);
if(c == '~')
return true;
else if(c == '#')
return true;
// etc...
}
}
Is there a more elegant/efficient way of accomplishing this?
You can make use of Pattern and Matcher class here. You can put all the filtered character in a character class, and use Matcher#find() method to check whether your pattern is available in string or not.
You can do it like this: -
public boolean containsIllegals(String toExamine) {
Pattern pattern = Pattern.compile("[~##*+%{}<>\\[\\]|\"\\_^]");
Matcher matcher = pattern.matcher(toExamine);
return matcher.find();
}
find() method will return true, if the given pattern is found in the string, even once.
Another way that has not yet been pointed out is using String#split(regex). We can split the string on the given pattern, and check the length of the array. If length is 1, then the pattern was not in the string.
public boolean containsIllegals(String toExamine) {
String[] arr = toExamine.split("[~##*+%{}<>\\[\\]|\"\\_^]", 2);
return arr.length > 1;
}
If arr.length > 1, the string contained one of the characters in the pattern, which is why it was split. I have passed limit = 2 as the second parameter to split, because a single split is enough.
I need the method to scan every character in the string
If you must do it character-by-character, regexp is probably not a good way to go. However, since all characters on your "blacklist" have codes less than 128, you can do it with a small boolean array:
static final boolean blacklist[] = new boolean[128];
static {
// Unassigned elements of the array are set to false
blacklist[(int)'~'] = true;
blacklist[(int)'#'] = true;
blacklist[(int)'#'] = true;
blacklist[(int)'*'] = true;
blacklist[(int)'+'] = true;
...
}
static boolean isBad(char ch) {
return (ch < 128) && blacklist[(int)ch];
}
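A complete variant of this lookup table can be filled from a string instead of one assignment per character. The sketch below is an assumption-laden illustration: the ILLEGAL list uses the straight double quote in place of the curly quotes from the question (which are above code 127), and the class and method names are made up:

```java
final class IllegalCharTable {
    // Assumed character list; adjust to the actual set of illegal characters.
    private static final String ILLEGAL = "~#*+%{}<>[]|\"\\_^";

    // Elements not assigned below stay false.
    private static final boolean[] BLACKLIST = new boolean[128];

    static {
        for (int i = 0; i < ILLEGAL.length(); i++) {
            BLACKLIST[ILLEGAL.charAt(i)] = true;
        }
    }

    static boolean isBad(char ch) {
        return ch < 128 && BLACKLIST[ch];
    }

    // Scans character by character and stops at the first illegal one.
    static boolean containsIllegals(String toExamine) {
        for (int i = 0; i < toExamine.length(); i++) {
            if (isBad(toExamine.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsIllegals("plain text"));
        System.out.println(containsIllegals("50% off"));
    }
}
```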
Use a constant to avoid recompiling the regex on every validation.
private static final Pattern INVALID_CHARS_PATTERN =
Pattern.compile("^.*[~##*+%{}<>\\[\\]|\"\\_].*$");
And change your code to:
public boolean containsIllegals(String toExamine) {
return INVALID_CHARS_PATTERN.matcher(toExamine).matches();
}
This is the most efficient way with Regex.
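Since only the presence of a single character matters, the precompiled constant can also hold just the character class and be used with Matcher#find(), which avoids the leading and trailing .* altogether. A sketch under two assumptions: the backslash from the question's list is meant to be illegal (in the patterns above, "\\_" only escapes the underscore and does not add a backslash to the class), and the straight double quote stands in for the curly quotes. The class name is illustrative:

```java
import java.util.regex.Pattern;

final class IllegalCharPattern {
    // "\\\\" in Java source is a regex-escaped literal backslash inside the class.
    private static final Pattern ILLEGAL =
            Pattern.compile("[~#*+%{}<>\\[\\]|\"\\\\_^]");

    static boolean containsIllegals(String toExamine) {
        // find() succeeds as soon as one illegal character is seen.
        return ILLEGAL.matcher(toExamine).find();
    }

    public static void main(String[] args) {
        System.out.println(containsIllegals("hello world"));
        System.out.println(containsIllegals("a^b"));
    }
}
```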
If you can't use a matcher, then you can do something like this, which is cleaner than a bunch of different if statements or a byte array.
for(int i = 0; i < toExamine.length(); i++) {
char c = toExamine.charAt(i);
if("~##*+%{}<>[]|\"_^".indexOf(c) >= 0){
return true;
}
}
Try the negation of a character class containing all the blacklisted characters:
public boolean containsIllegals(String toExamine) {
return !toExamine.matches("[^~##*+%{}<>\\[\\]|\"\\_^]*");
}
This will return true if the string contains illegals (your original function seemed to return false in that case).
The caret ^ just to the right of the opening bracket [ negates the character class. Note that in String.matches() you don't need the anchors ^ and $ because it automatically matches the whole string.
A pretty compact way of doing this would be to rely on the String.replaceAll method:
public boolean containsIllegal(final String toExamine) {
return toExamine.length() != toExamine.replaceAll(
"[~##*+%{}<>\\[\\]|\"\\_^]", "").length();
}

Comparing chars in Java

I want to check a char variable is one of 21 specific chars, what is the shortest way I can do this?
For example:
if(symbol == ('A'|'B'|'C')){}
Doesn't seem to be working. Do I need to write it like:
if(symbol == 'A' || symbol == 'B' etc.)
If your input is a character and the characters you are checking against are mostly consecutive you could try this:
if ((symbol >= 'A' && symbol <= 'Z') || symbol == '?') {
// ...
}
However if your input is a string a more compact approach (but slower) is to use a regular expression with a character class:
if (symbol.matches("[A-Z?]")) {
// ...
}
If you have a character you'll first need to convert it to a string before you can use a regular expression:
if (Character.toString(symbol).matches("[A-Z?]")) {
// ...
}
If you know all your 21 characters in advance you can write them all as one String and then check it like this:
char wanted = 'x';
String candidates = "abcdefghij...";
boolean hit = candidates.indexOf(wanted) >= 0;
I think this is the shortest way.
The first statement you have is probably not what you want... 'A'|'B'|'C' is actually doing bitwise operation :)
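To make the bitwise behaviour concrete: 'A', 'B' and 'C' are 0x41, 0x42 and 0x43, and OR-ing them gives 0x43, which is 'C' again. So the comparison compiles but is true only for 'C'. A tiny sketch (the class and method names are illustrative):

```java
final class BitwiseOrDemo {
    // Mirrors the condition from the question: compares against a single int value.
    static boolean looksLikeOneOf(char symbol) {
        return symbol == ('A' | 'B' | 'C');
    }

    public static void main(String[] args) {
        int combined = 'A' | 'B' | 'C';
        System.out.println(combined + " = '" + (char) combined + "'");
        System.out.println(looksLikeOneOf('A'));
        System.out.println(looksLikeOneOf('C'));
    }
}
```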
Your second statement is correct, but you will have 21 ORs.
If the 21 characters are "consecutive" the above solutions are fine.
If not you can pre-compute a hash set of valid characters and do something like
if (validCharHashSet.contains(symbol))...
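A minimal sketch of that precomputed set, with a placeholder list of characters standing in for the real 21 (ValidChars and its members are illustrative names):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class ValidChars {
    // Placeholder characters; replace with the actual set to check against.
    private static final Set<Character> VALID =
            new HashSet<>(Arrays.asList('A', 'D', 'E', '?'));

    static boolean isValid(char symbol) {
        // The char is autoboxed to Character for the lookup.
        return VALID.contains(symbol);
    }

    public static void main(String[] args) {
        System.out.println(isValid('A'));
        System.out.println(isValid('Y'));
    }
}
```

Building the set once in a static field keeps each check to a single hash lookup.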
you can use this:
if ("ABCDEFGHIJKLMNOPQRSTUVWXYZ".contains(String.valueOf(yourChar)))
note that you do not need to create a separate String with the letters A-Z.
It might be clearer written as a switch statement with fall through e.g.
switch (symbol){
case 'A':
case 'B':
// Do stuff
break;
default:
}
If you have a set of specific chars, it could be:
Collection<Character> specificChars = Arrays.asList('A', 'D', 'E'); // more chars
char symbol = 'Y';
System.out.println(specificChars.contains(symbol)); // false
symbol = 'A';
System.out.println(specificChars.contains(symbol)); // true
Using Guava:
if (CharMatcher.anyOf("ABC...").matches(symbol)) { ... }
Or if many of those characters are a range, such as "A" to "U" but some aren't:
CharMatcher.inRange('A', 'U').or(CharMatcher.anyOf("1379"))
You can also declare this as a static final field so the matcher doesn't have to be created each time.
private static final CharMatcher MATCHER = CharMatcher.anyOf("ABC...");
Option 2 will work. You could also use a Set<Character> or
char[] myCharSet = new char[] {'A', 'B', 'C', ...};
Arrays.sort(myCharSet);
if (Arrays.binarySearch(myCharSet, symbol) >= 0) { ... }
You can solve this easily by using the String.indexOf(char) method which returns -1 if the char is not in the String.
String candidates = "ABCDEFGHIJK";
if(candidates.indexOf(symbol) != -1){
//character in list of candidates
}
Yes, you need to write it like your second line. Java doesn't have the python style syntactic sugar of your first line.
Alternatively you could put your valid values into an array and check for the existence of symbol in the array.
pseudocode as I haven't got a java sdk on me:
char[] candidates = { 'A', 'B', /* ... */ 'G' };
for (char c : candidates)
{
if (symbol == c) { return true; }
}
return false;
One way to do it is with a List<Character> built using the convenience factory methods added in Java 9:
if (List.of('A','B','C','D','E').contains(symbol)) {
// do something
}
You can just write your chars as Strings and use the equals method.
For Example:
String firstChar = "A";
String secondChar = "B";
String thirdChar = "C";
if (firstChar.equalsIgnoreCase(secondChar) ||
(firstChar.equalsIgnoreCase(thirdChar))) // As many equals as you want
{
System.out.println(firstChar + " is the same as " + secondChar);
} else {
System.out.println(firstChar + " is different than " + secondChar);
}
