Java regex not working for operand groups of possible values - java

I'm trying to write a Java method that will determine (true or false) if a particular String matches a regex of animal<L_OPERAND,R_OPERAND>, where L_OPERAND can be any of the following values: dog, cat, sheep and R_OPERAND can be any one of the following values: red, blue. All values are case- and whitespace-sensitive.
Some examples:
animal<fizz,cat> => false; fizz is not a valid L_OPERAND value
animAl<dog,blue> => false; animAl contains an upper-case char (illegal)
animal<dog,sheep> => false; sheep is not a valid R_OPERAND value
animal<dog, blue> => false; contains whitespace between ',' and 'blue' (no whitespace allowed)
animal<dog,blue> => true; valid
animal<cat,red> => true; valid
animal<sheep,blue> => true; valid
My best attempt so far:
public class RegexExperiments {
public static void main(String[] args) {
boolean b = new RegexExperiments().isValidAnimalDef("animal<dog,blue>");
System.out.println(b);
}
public boolean isValidAnimalDef(String animalDef) {
String regex = "animal<[dog,cat,sheep],[red,blue]>";
if(animalDef.matches(regex)) {
return true;
} else {
return false;
}
}
}
Although I'm not getting any exceptions, I'm getting false for every type of input string (animalDef) I pass in. So obviously my regex is bad. Can anyone spot where I'm going awry?

Your problem lies within the [dog,cat,sheep] and [red,blue] structures. [] represents a character class, which matches a single character out of those contained inside. For the first one this would be one of ,acdeghopst and for the second one of ,bdelru. So you currently match strings like animal<d,b> or even animal<,,,>.
What you are after is a mix of a grouping structure and an alternation. Alternations are provided by |, so e.g. dog|cat|sheep would match dog or cat or sheep. As you want this alternation inside a larger pattern, you have to contain it inside a group. The simplest grouping structure for this case is a capturing group, which starts with ( and ends with ).
Your final pattern could then be animal<(dog|cat|sheep),(red|blue)>.
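To see the difference in practice, here is a minimal sketch that runs both patterns against a few inputs. The class name AnimalDefCheck and the two constant names are illustrative and not part of the question.

```java
import java.util.regex.Pattern;

class AnimalDefCheck {
    // Character classes: each [...] matches exactly ONE character from the set.
    static final Pattern CHAR_CLASSES =
            Pattern.compile("animal<[dog,cat,sheep],[red,blue]>");
    // Groups with alternation: each (...) matches one whole word.
    static final Pattern ALTERNATION =
            Pattern.compile("animal<(dog|cat|sheep),(red|blue)>");

    static boolean isValidAnimalDef(String animalDef) {
        // matches() succeeds only if the entire input fits the pattern.
        return ALTERNATION.matcher(animalDef).matches();
    }

    public static void main(String[] args) {
        System.out.println(CHAR_CLASSES.matcher("animal<d,b>").matches());      // true
        System.out.println(CHAR_CLASSES.matcher("animal<dog,blue>").matches()); // false
        System.out.println(isValidAnimalDef("animal<dog,blue>"));               // true
        System.out.println(isValidAnimalDef("animal<dog, blue>"));              // false
    }
}
```

The pattern is compiled once into a constant here, which is optional for a one-off check but avoids recompiling it on every call.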

Try
String regex = "animal<(dog|cat|sheep),(red|blue)>";

You can use RegEx animal<(dog|cat|sheep),(red|blue)>
Output
false
false
false
false
true
true
true
Code
import java.util.regex.*;
public class HelloWorld {
public static void main(String[] args) {
System.out.println(filterOut("animal<fizz,cat>"));
System.out.println(filterOut("animAl<dog,blue>"));
System.out.println(filterOut("animal<dog,sheep>"));
System.out.println(filterOut("animal<dog, blue>"));
System.out.println(filterOut("animal<dog,blue>"));
System.out.println(filterOut("animal<cat,red>"));
System.out.println(filterOut("animal<sheep,blue>"));
}
public static boolean filterOut(String str) {
Matcher m = Pattern.compile("animal<(dog|cat|sheep),(red|blue)>").matcher(str);
// matches() requires the whole string to fit; find() would also accept e.g. "xanimal<dog,blue>"
return m.matches();
}
}

Related

Words with two or more capital letters in Java

Words with at least 2 Capital letters and with any special letters (like ##$%^&*()_-+= and so on...) optional.
I tried:
public static boolean isWordHas2Caps(String s) {
return s.matches("\\b(?:\\p{Ll}*\\p{Lu}){2,}\\p{Ll}*\\b");
}
But, I am getting
System.out.println(isWordHas2Caps("eHJHJK"));
System.out.println(isWordHas2Caps("YUIYUI"));
System.out.println(isWordHas2Caps("LkfjkdJkdfj"));
System.out.println(isWordHas2Caps("LLdkjkd"));
System.out.println(isWordHas2Caps("OhdfjhdsjO"));
System.out.println(isWordHas2Caps("LLLuoiu9898"));
System.out.println(isWordHas2Caps("Ohdf&jh/dsjO"));
System.out.println(isWordHas2Caps("auuuu"));
System.out.println(isWordHas2Caps("JJJJJJJJ"));
System.out.println(isWordHas2Caps("YYYY99999"));
System.out.println(isWordHas2Caps("ooooPPPP"));
Output:
true eHJHJK
true YUIYUI
true LkfjkdJkdfj
true LLdkjkd
true OhdfjhdsjO
false LLLuoiu9898 It should be true but getting false
false Ohdf&jh/dsjO It should be true but getting false
false auuuu
true JJJJJJJJ
false YYYY99999 It should be true but getting false
true ooooPPPP
I think I should allow numbers and special characters in the regexp. How can I do that?
Update:
A valuable comment from anubhava:
Probably s.matches("(?:\\S*\\p{Lu}){2}\\S*"); may be better
Demo of the above solution.
Original answer:
You can use the regex, \b.*\p{Lu}.*\p{Lu}.*\b as shown below:
public static boolean isWordHas2Caps(String s) {
return s.matches("\\b.*\\p{Lu}.*\\p{Lu}.*\\b");
}
Demo:
public class Main {
public static void main(String[] args) {
System.out.println(isWordHas2Caps("eHJHJK"));
System.out.println(isWordHas2Caps("YUIYUI"));
System.out.println(isWordHas2Caps("LkfjkdJkdfj"));
System.out.println(isWordHas2Caps("LLdkjkd"));
System.out.println(isWordHas2Caps("OhdfjhdsjO"));
System.out.println(isWordHas2Caps("LLLuoiu9898"));
System.out.println(isWordHas2Caps("Ohdf&jh/dsjO"));
System.out.println(isWordHas2Caps("auuuu"));
System.out.println(isWordHas2Caps("JJJJJJJJ"));
System.out.println(isWordHas2Caps("YYYY99999"));
System.out.println(isWordHas2Caps("ooooPPPP"));
}
public static boolean isWordHas2Caps(String s) {
return s.matches("\\b.*\\p{Lu}.*\\p{Lu}.*\\b");
}
}
Output:
true
true
true
true
true
true
true
false
true
true
true
You want to check if there are at least two uppercase letters anywhere in a string that can contain arbitrary chars.
Then, you can use
public static boolean isWordHas2Caps(String s) {
return Pattern.compile("\\p{Lu}\\P{Lu}*\\p{Lu}").matcher(s).find();
}
See the Java demo.
Alternatively, if you still want to use String#matches you can use the following (keeping in mind that we need to match the entire string):
public static boolean isWordHas2Caps(String s) {
return s.matches("(?s)(?:\\P{Lu}*\\p{Lu}){2}.*");
}
The (?s)(?:\\P{Lu}*\\p{Lu}){2}.* regex matches
(?s) - the Pattern.DOTALL embedded flag option (makes . match any chars)
(?:\P{Lu}*\p{Lu}){2} - two occurrences of any zero or more chars other than uppercase letters and then an uppercase letter
.* - the rest of the string.
Your code did not return the expected results because the failing inputs all contain non-letter characters, while String#matches() requires the entire string to match the pattern, and your pattern matches only strings that consist of letters.
That is why you should
Make sure you can match anywhere inside a string, and Matcher.find does this job best
The \p{Lu}\P{Lu}*\p{Lu} pattern will find any sequence of an uppercase letter + zero or more characters other than uppercase letters + an uppercase letter
Alternatively, you can use (?s)(?:\P{Lu}*\p{Lu}){2}.* regex to match a full string that contains at least two uppercase letters.
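As a quick check of the two approaches described above, the following sketch runs both on some of the inputs from the question. The class and method names (TwoCapsCheck, hasTwoCapsFind, hasTwoCapsMatches) are made up for this illustration.

```java
import java.util.regex.Pattern;

class TwoCapsCheck {
    // An uppercase letter, any run of non-uppercase characters, another uppercase letter.
    static final Pattern TWO_CAPS = Pattern.compile("\\p{Lu}\\P{Lu}*\\p{Lu}");

    // find() looks for the pattern anywhere inside the string.
    static boolean hasTwoCapsFind(String s) {
        return TWO_CAPS.matcher(s).find();
    }

    // String#matches must consume the whole string, hence the trailing .* and (?s).
    static boolean hasTwoCapsMatches(String s) {
        return s.matches("(?s)(?:\\P{Lu}*\\p{Lu}){2}.*");
    }

    public static void main(String[] args) {
        String[] inputs = {"LLLuoiu9898", "Ohdf&jh/dsjO", "YYYY99999", "auuuu", "Abc"};
        for (String s : inputs) {
            System.out.println(hasTwoCapsFind(s) + " " + hasTwoCapsMatches(s) + " " + s);
        }
    }
}
```

Both methods should agree on every input; the find() variant is shorter because it does not need to describe the rest of the string.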

Remove a pair of chars in a string next to each other

In an interview, I have faced one problem, and I'm unable to find the logic for dynamic input.
Input: abbcaddaee
Given this input, we have to remove each pair of identical adjacent characters. For example, in abbcaddaee the pairs bb, dd, and ee are removed, which gives acaa. The same step is then applied to acaa (the pair aa is removed), so the final output is ac.
In general, this has to be repeated for as many iterations as it takes to remove all such pairs.
Input: aabbbcffjdddd (the pairs aa, bb, ff, dd, and dd are removed) → bcj
You can use regexp and a single do-while loop:
String str = "abbcaddaee";
do {
System.out.println(str);
} while (!str.equals(str = str.replaceAll("(.)\\1", "")));
Output:
abbcaddaee
acaa
ac
Explanation:
regexp (.)\\1 - any character followed by the same character;
str = str.replaceAll(...) - removes all duplicates and replaces current string;
!str.equals(...) - checks inequality of the current string with itself, but without duplicates.
See also: Iterate through a string and remove consecutive duplicates
I would use a regex replacement here:
String input = "aabbbcffjdddd";
String output = input.replaceAll("(.)\\1", "");
System.out.println(output); // bcj
The regex pattern (.)\1 matches any single character followed by that same character once. We replace such matches with empty string, effectively removing them.
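Note that a single replaceAll pass only removes the pairs that exist at that moment; pairs that form after a removal need another pass. The sketch below contrasts one pass with repeating until the string stops changing. The names PairRemoval, removePairsOnce and removePairsFully are hypothetical.

```java
class PairRemoval {
    // One pass: removes every pair of identical adjacent characters present right now.
    static String removePairsOnce(String s) {
        return s.replaceAll("(.)\\1", "");
    }

    // Repeats the single pass until a pass leaves the string unchanged.
    static String removePairsFully(String s) {
        String previous;
        do {
            previous = s;
            s = removePairsOnce(s);
        } while (!s.equals(previous));
        return s;
    }

    public static void main(String[] args) {
        System.out.println(removePairsOnce("abbcaddaee"));     // acaa
        System.out.println(removePairsFully("abbcaddaee"));    // ac
        System.out.println(removePairsFully("aabbbcffjdddd")); // bcj
    }
}
```

For aabbbcffjdddd one pass happens to be enough, which is why the single call above already prints bcj.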
In the following solution, I used the recursive method to give you the result you want.
For Pattern:
1st Capturing Group (.)
. matches any character (except for line terminators)
\1 matches the same text as most recently matched by the 1st capturing group
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
private static Pattern p = Pattern.compile("(.)\\1");
public static void main(String[] args) {
System.out.println(removePairChar("abbcaddaee"));
System.out.println(removePairChar("aabbbcffjdddd"));
}
public static String removePairChar(String input) {
Matcher matcher = p.matcher(input);
boolean matchFound = matcher.find();
if(matchFound) {
input = input.replaceAll(p.pattern(), "");
return removePairChar(input);
}
return input;
}
}
OUTPUT:
ac
bcj
The basic idea is to use a stack. This gives O(n) complexity, as opposed to the repeated passes of the while + replaceAll approach.
Loop through the code points; if the stack is empty or its top differs from the current code point, push it.
If the top of the stack equals the current code point, pop it.
import java.util.Optional;
import java.util.Stack;
public class Main {
public static void main(final String... params) {
System.out.println(Main.normalize("abbcaddaee"));
System.out.println(Main.normalize("aabbbcffjdddd"));
}
private static String normalize(final String input) {
final int length = Optional.ofNullable(input).map(String::length).orElse(0);
if (length < 2) {
return input;
}
Stack<Integer> buf = new Stack<Integer>();
input.codePoints().forEach(code -> {
if (buf.isEmpty() || buf.peek() != code) {
buf.push(code);
} else {
buf.pop();
}
});
return buf.stream().collect(
StringBuilder::new,
StringBuilder::appendCodePoint,
StringBuilder::append).
toString();
}
}

Why doesn't Pattern.matches("[a*mn]","aaaa") return true? What should be proper code to get the desired output?

I want to create a pattern where the desired string should either be multiples of a including null i.e. a*, or it should be one single m or single n. But the following code doesn't give the desired output.
class Solution {
public static void main(String args[]) {
System.out.println(Pattern.matches("[a*mn]", "aaaa"));
}
}
* within a character class ([]) is just a *, not a quantifier.
I want to create a pattern where the desired string should either be multiples of a including null i.e. a*, or it should be one single m or single n.
You'll need an alternation (|) for that: a*|[mn]:
Pattern.matches("a*|[mn]", "aaaa")
Live example:
import java.util.regex.Pattern;
class Example {
public static void main (String[] args) throws java.lang.Exception {
check("aaaa", true);
check("a", true);
check("", true);
check("m", true);
check("n", true);
check("mn", false);
check("q", false);
check("nnnn", false);
}
private static void check(String text, boolean expect) {
boolean result = Pattern.matches("a*|[mn]", text);
System.out.println(
(result ? "Match " : "No match") +
(result == expect ? " OK " : " ERROR ") +
": " + text
);
}
}
...though obviously if you were really using the pattern repeatedly, you'd want to compile it once and reuse the result.
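A minimal sketch of that compile-once variant could look like the following. The class name CompiledExample, the constant name and the isValid helper are assumptions for illustration only.

```java
import java.util.regex.Pattern;

class CompiledExample {
    // Compiled a single time when the class is loaded, then reused for every check.
    private static final Pattern A_STAR_OR_M_OR_N = Pattern.compile("a*|[mn]");

    static boolean isValid(String text) {
        // matches() requires the whole input to fit one of the alternatives.
        return A_STAR_OR_M_OR_N.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("aaaa")); // true
        System.out.println(isValid(""));     // true
        System.out.println(isValid("m"));    // true
        System.out.println(isValid("mn"));   // false
    }
}
```

Pattern instances are immutable and safe to share, so a static final field is a common place to keep one.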
Try this regex
(a*)|m|n
Pattern.matches("(a*)|m|n", "") // true, match 1st group
Pattern.matches("(a*)|m|n", "a") // true, match 1st group
Pattern.matches("(a*)|m|n", "aaaa") // true, match 1st group
Pattern.matches("(a*)|m|n", "m") // true, match `m`
Pattern.matches("(a*)|m|n", "n") // true, match `n`
Pattern.matches("(a*)|m|n", "man") // false
Pattern.matches("(a*)|m|n", "mn") // false
Inside the [] the "*" is not a quantifier, so [a*mn] matches exactly one character from the set; the result will be true only if the string is "a", "*", "m" or "n".
Everything else results in false.
Your regex should be:
([aa*]*|[mn])
It will be true for any sequence made up of "a" and "*" characters (multiples of "a", including "a*"), or for a single "m" or "n". Note that, unlike a*|[mn], this also accepts literal asterisks.
Check it with the following examples:
System.out.println(Pattern.matches("[aa*]*|[mn]", "m"));
System.out.println(Pattern.matches("[aa*]*|[mn]", "aaaaa"));
System.out.println(Pattern.matches("[aa*]*|[mn]", "a*a*"));

Java - Regex to have value validated

I am trying to write a general Java regex helper: given a value and a pattern, the method should return true if the value matches the pattern and false otherwise.
The following is the method I tried, with a simple alphanumeric pattern:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexMatches {
public static boolean isValidInput(String value, String pattern) {
boolean isValid = false;
Pattern walletInputPattern = Pattern.compile(pattern);
Matcher walletMatcher = walletInputPattern.matcher(value);
if (walletMatcher.matches()) {
isValid = true;
}
return isValid;
}
public static void main(String args[]) {
String pattern = "^[a-zA-Z0-9]*$";
String inputValue = "45645";
if (isValidInput(inputValue, pattern)) {
System.out.println("Alphanumeric");
} else {
System.out.println("OOPS");
}
}
}
But I gave it wrong input and it still reports a match.
What mistake am I making here?
thanks for your inputs and spending your valuable time :)
I believe this lookahead-based regex should work for you:
String pattern = "^(?=.*?[A-Za-z])(?=.*?[0-9])[a-zA-Z0-9]+$";
This ensures that:
There is at least one alphabetic character in the input
There is at least one digit in the input
The input is comprised of ONLY alphanumerics
It is the right result because 45645 is indeed an alphanumeric value.
If you want to make sure the value is a combination of numbers and letters then you need a different expression:
String pattern = "^(?!^[0-9]+$)(?!^[a-zA-Z]+$)[a-zA-Z0-9]+$";
(?!^[0-9]+$): This makes sure the string isn't just a combination of digits.
(?!^[a-zA-Z]+$): This makes sure the string isn't just a combination of letters.
[a-zA-Z0-9]+: This matches a combination of letters and digits.
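The two patterns suggested above (the positive-lookahead one and the negative-lookahead one) can be compared side by side. This is only a sketch; the class name AlphanumericMix and its method names are invented for the example.

```java
import java.util.regex.Pattern;

class AlphanumericMix {
    // Positive lookaheads: require at least one letter and at least one digit.
    static final Pattern LETTERS_AND_DIGITS =
            Pattern.compile("^(?=.*?[A-Za-z])(?=.*?[0-9])[a-zA-Z0-9]+$");
    // Negative lookaheads: reject all-digit and all-letter inputs.
    static final Pattern NOT_ONLY_ONE_KIND =
            Pattern.compile("^(?!^[0-9]+$)(?!^[a-zA-Z]+$)[a-zA-Z0-9]+$");

    static boolean byPositiveLookahead(String value) {
        return LETTERS_AND_DIGITS.matcher(value).matches();
    }

    static boolean byNegativeLookahead(String value) {
        return NOT_ONLY_ONE_KIND.matcher(value).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"45645", "abc", "abc123", "abc 123"};
        for (String s : inputs) {
            System.out.println(byPositiveLookahead(s) + " " + byNegativeLookahead(s) + " " + s);
        }
    }
}
```

Only abc123 should be accepted by both patterns: 45645 has no letters, abc has no digits, and abc 123 contains a space.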

Java function to return if string contains illegal characters

I have the following characters that I would like to be considered "illegal":
~, #, #, *, +, %, {, }, <, >, [, ], |, “, ”, \, _, ^
I'd like to write a method that inspects a string and determines (true/false) if that string contains these illegals:
public boolean containsIllegals(String toExamine) {
return toExamine.matches("^.*[~##*+%{}<>[]|\"\\_^].*$");
}
However, a simple matches(...) check isn't feasible for this. I need the method to scan every character in the string and make sure it's not one of these characters. Of course, I could do something horrible like:
public boolean containsIllegals(String toExamine) {
for(int i = 0; i < toExamine.length(); i++) {
char c = toExamine.charAt(i);
if(c == '~')
return true;
else if(c == '#')
return true;
// etc...
}
}
Is there a more elegant/efficient way of accomplishing this?
You can make use of the Pattern and Matcher classes here. Put all the filtered characters in a character class, and use the Matcher#find() method to check whether the pattern occurs anywhere in the string.
You can do it like this:
public boolean containsIllegals(String toExamine) {
Pattern pattern = Pattern.compile("[~##*+%{}<>\\[\\]|\"\\_^]");
Matcher matcher = pattern.matcher(toExamine);
return matcher.find();
}
find() method will return true, if the given pattern is found in the string, even once.
Another way that has not yet been pointed out is using String#split(regex). We can split the string on the given pattern, and check the length of the array. If length is 1, then the pattern was not in the string.
public boolean containsIllegals(String toExamine) {
String[] arr = toExamine.split("[~##*+%{}<>\\[\\]|\"\\_^]", 2);
return arr.length > 1;
}
If arr.length > 1, the string contained one of the characters in the pattern, which is why it was split. I have passed limit = 2 as the second parameter to split, because a single split is enough.
I need the method to scan every character in the string
If you must do it character-by-character, regexp is probably not a good way to go. However, since all characters on your "blacklist" have codes less than 128, you can do it with a small boolean array:
static final boolean blacklist[] = new boolean[128];
static {
// Unassigned elements of the array are set to false
blacklist[(int)'~'] = true;
blacklist[(int)'#'] = true;
blacklist[(int)'#'] = true;
blacklist[(int)'*'] = true;
blacklist[(int)'+'] = true;
// ...
}
static boolean isBad(char ch) {
return (ch < 128) && blacklist[(int)ch];
}
Use a constant to avoid recompiling the regex on every validation.
private static final Pattern INVALID_CHARS_PATTERN =
Pattern.compile("^.*[~##*+%{}<>\\[\\]|\"\\_].*$");
And change your code to:
public boolean containsIllegals(String toExamine) {
return INVALID_CHARS_PATTERN.matcher(toExamine).matches();
}
This avoids the pattern compilation that String.matches performs on every call.
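A variant that combines the precompiled constant with Matcher#find() might look like the sketch below. Two things here are assumptions rather than anything stated in the answers: the class name IllegalChars, and the decision to reject a backslash. In a Java string literal, \\_ produces the regex \_, which matches only an underscore, so the sketch writes \\\\ to match a literal backslash as well.

```java
import java.util.regex.Pattern;

class IllegalChars {
    // Assumption: a backslash counts as illegal too, so it is written as \\\\
    // in the Java string (regex \\), followed by the underscore and caret.
    private static final Pattern ILLEGAL =
            Pattern.compile("[~#*+%{}<>\\[\\]|\"\\\\_^]");

    static boolean containsIllegals(String toExamine) {
        // find() stops at the first illegal character; no .* wrappers are needed.
        return ILLEGAL.matcher(toExamine).find();
    }

    public static void main(String[] args) {
        System.out.println(containsIllegals("hello world")); // false
        System.out.println(containsIllegals("a_b"));         // true
        System.out.println(containsIllegals("a\\b"));        // true
        System.out.println(containsIllegals("line1\nline2~")); // true
    }
}
```

Because find() does not rely on . to skip over the rest of the input, it also works for strings that contain line terminators.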
If you can't use a matcher, then you can do something like this, which is cleaner than a bunch of different if statements or a boolean array.
for(int i = 0; i < toExamine.length(); i++) {
char c = toExamine.charAt(i);
// String.contains takes a CharSequence, so use indexOf for a single char
if("~##*+%{}<>[]|\"_^".indexOf(c) >= 0){
return true;
}
}
Try the negation of a character class containing all the blacklisted characters:
public boolean containsIllegals(String toExamine) {
return !toExamine.matches("[^~##*+%{}<>\\[\\]|\"\\_^]*");
}
The pattern matches only when every character is outside the blacklist, so negating the result returns true if the string contains illegals (your original function seemed to return false in that case).
The caret ^ just to the right of the opening bracket [ negates the character class. Note that in String.matches() you don't need the anchors ^ and $ because it automatically matches the whole string.
A pretty compact way of doing this would be to rely on the String.replaceAll method:
public boolean containsIllegal(final String toExamine) {
return toExamine.length() != toExamine.replaceAll(
"[~##*+%{}<>\\[\\]|\"\\_^]", "").length();
}
