Java - Why is this pattern matching not working?

public static String FILL_IN_THE_BLANK_REGEX = "\\[blank_.+\\]";
public static int getBlankCountForFillInTheBlank(String questionText) {
Matcher m = Pattern.compile(FILL_IN_THE_BLANK_REGEX).matcher(questionText);
int count = 0;
while (m.find()) ++count;
return count;
}
public static void main(String[] args) {
System.out.println(getBlankCountForFillInTheBlank("abc [blank_tag1] abc [blank_tag2]")); // prints 1
}
But if I do something like
public static String FILL_IN_THE_BLANK_REGEX = "\\[blank_tag.\\]";
it prints 2, which is correct.
The '+' does not work here and I don't know why.
(The blank tag can be anything, like [blank_someusertag].)

See the javadoc for Pattern. I believe it's because + is a greedy quantifier and therefore matches everything it can: .+ runs from the first [blank_ all the way to the last ] in the string, so you get a single match. You can add a ? after the + to make it reluctant.
public static String FILL_IN_THE_BLANK_REGEX = "\\[blank_.+?\\]";
will print
2

.+ will match ANY character (except line terminators, by default) 1 or more times, and as many times as it can.
Use the non-greedy ? so the match stops at the first ] that follows.
Your working expression: \\[blank_.+?\\]
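To make the difference concrete, here is a minimal sketch that counts matches with the greedy pattern, the reluctant pattern, and a negated character class ([^\]]+). The negated class is a common alternative that is not mentioned in the answers above, and the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BlankCountDemo {
    // Counts the non-overlapping matches of regex in text.
    static int countMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "abc [blank_tag1] abc [blank_tag2]";
        // Greedy: .+ runs to the last ']', so both tags end up in one match.
        System.out.println(countMatches("\\[blank_.+\\]", text));       // 1
        // Reluctant: .+? stops at the first ']' that lets the match succeed.
        System.out.println(countMatches("\\[blank_.+?\\]", text));      // 2
        // Negated class: the tag itself may not contain ']'.
        System.out.println(countMatches("\\[blank_[^\\]]+\\]", text));  // 2
    }
}
```

The reluctant and negated-class versions give the same count here; the negated class simply states the "stop at the first ]" rule directly instead of relying on backtracking.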

Related

Remove a pair of chars in a string next to each other

In an interview, I have faced one problem, and I'm unable to find the logic for dynamic input.
Input: abbcaddaee
Given this input, we have to remove each pair of adjacent identical chars. For example, in
abbcaddaee the pairs bb, dd, and ee are removed, and the output is acaa. Then we have to do the same for acaa, removing aa. The final output is ac.
Likewise, we have to keep iterating until no such pairs of the same char remain.
Input: aabbbcffjdddd → aabbbcffjdddd → bcj

You can use regexp and a single do-while loop:
String str = "abbcaddaee";
do {
System.out.println(str);
} while (!str.equals(str = str.replaceAll("(.)\\1", "")));
Output:
abbcaddaee
acaa
ac
Explanation:
regexp (.)\\1 - any character followed by the same character;
str = str.replaceAll(...) - removes all duplicates and replaces current string;
!str.equals(...) - checks inequality of the current string with itself, but without duplicates.
See also: Iterate through a string and remove consecutive duplicates

I would use a regex replacement here:
String input = "aabbbcffjdddd";
String output = input.replaceAll("(.)\\1", "");
System.out.println(output); // bcj
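The regex pattern (.)\1 matches any single character followed by that same character once. We replace such matches with the empty string, effectively removing them. Note that a single pass is enough for this input, but an input such as abbcaddaee needs the replacement repeated until the string stops changing.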

In the following solution, I used a recursive method to give you the result you want.
For Pattern:
1st Capturing Group (.)
. matches any character (except for line terminators)
\1 matches the same text as most recently matched by the 1st capturing group
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
private static Pattern p = Pattern.compile("(.)\\1");
public static void main(String[] args) {
System.out.println(removePairChar("abbcaddaee"));
System.out.println(removePairChar("aabbbcffjdddd"));
}
public static String removePairChar(String input) {
Matcher matcher = p.matcher(input);
boolean matchFound = matcher.find();
if(matchFound) {
input = input.replaceAll(p.pattern(), "");
return removePairChar(input);
}
return input;
}
}
OUTPUT:
ac
bcj

The basic idea is to use a Stack. In this case we get O(n) complexity, as opposed to the while + replaceAll approach.
Loop through the char codes; if the stack is empty or its head differs from the current char code, push the code.
If the stack head equals the current char code, pop it.
import java.util.Optional;
import java.util.Stack;
public class Main {
public static void main(final String... params) {
System.out.println(Main.normalize("abbcaddaee"));
System.out.println(Main.normalize("aabbbcffjdddd"));
}
private static String normalize(final String input) {
final int length = Optional.ofNullable(input).map(String::length).orElse(0);
if (length < 2) {
return input;
}
Stack<Integer> buf = new Stack<Integer>();
input.codePoints().forEach(code -> {
if (buf.isEmpty() || buf.peek() != code) {
buf.push(code);
} else {
buf.pop();
}
});
return buf.stream().collect(
StringBuilder::new,
StringBuilder::appendCodePoint,
StringBuilder::append).
toString();
}
}
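The same idea can be sketched without java.util.Stack by using a StringBuilder as the stack. This is a simplified variant under one stated assumption: it compares char values rather than code points, so it is only suitable for input without surrogate pairs. The class and method names are illustrative.

```java
public class PairRemover {
    // Removes adjacent equal pairs in a single left-to-right pass.
    // Assumption: input contains no surrogate pairs (chars, not code points).
    static String removePairs(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder stack = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            int top = stack.length() - 1;
            if (top >= 0 && stack.charAt(top) == c) {
                stack.setLength(top);   // pop: c cancels the char on top
            } else {
                stack.append(c);        // push
            }
        }
        return stack.toString();
    }

    public static void main(String[] args) {
        System.out.println(removePairs("abbcaddaee"));    // ac
        System.out.println(removePairs("aabbbcffjdddd")); // bcj
    }
}
```

Because the result is built in the StringBuilder itself, there is no separate step to collect the stack back into a String.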

Pattern count with 0's in between that start and end with "1"

A string contains many patterns of the form 1(0+)1 where (0+) represents any non-empty consecutive sequence of 0's. The patterns are allowed to overlap.
For example, consider string "1101001", we can see there are two consecutive sequences "1(0)1" and "1(00)1" which are of the form 1(0+)1.
public class Solution {
static int patternCount(String s){
String[] sArray = s.split("1");
for(String str : sArray) {
if(Pattern.matches("[0]+", str)) {
count++;
}
}
return count;
}
public static void main(String[] args) {
int result = patternCount("1001010001");
System.out.println(result);//3
}
}
Sample Input
100001abc101
1001ab010abc01001
1001010001
Sample Output
2
2
3
But I still feel something might fail in the future. Could you please help me optimize my code as per the requirement?

First: you did not declare the count variable.
Anyway, I think a better method is:
static int patternCount(String s){
Pattern pattern = Pattern.compile("(?<=1)[0]+(?=1)");
Matcher matcher = pattern.matcher(s);
int count = 0;
while (matcher.find())
count++;
return count;
}
You use more regex and less logic; and, from what I could see in my test, it is even faster.
In case you didn't know, the trick used in regex is called lookaround. More precisely, (?<=1) is positive lookbehind and (?=1) is positive lookahead.
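Why the lookarounds matter for overlapping patterns can be shown with a small comparison. This is only a sketch with made-up names: it counts matches of a plain consuming pattern (10+1) and of the lookaround pattern on the question's example string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OverlapDemo {
    // Counts the matches that Matcher.find() reports for regex in text.
    static int count(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String s = "1101001";
        // Consuming pattern: the '1' that closes "101" is used up,
        // so it cannot also open "1001".
        System.out.println(count("10+1", s));           // 1
        // Lookarounds only check the surrounding 1s without consuming them,
        // so the shared '1' can serve both runs of zeros.
        System.out.println(count("(?<=1)0+(?=1)", s));  // 2
    }
}
```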

Regex semicolon and words

I am having some difficulty with a regex in Java. I want an expression that validates that the input consists of one or more valid words and that the words are delimited by semicolons.
Examples:
VF;VM - Good
VF;GM - Bad
VF,VM - Bad
VF;VM;IF - Good
VF,VM;IF - Bad
I tried this one:
String regex = "(\\bVM\\b|\\bVF\\b|\\bTV\\b|\\bIM\\b|\\bIF\\b)|\\;";
But it doesn't work....
If you can help me I will be thankful.

Basically, you want a list of the valid words, and then an optional repeated group starting with a ; and the list of valid words:
String regex = "^(?:\\b(?:VM|VF|TV|IM|IF)\\b)(?:;\\b(?:VM|VF|TV|IM|IF)\\b)*$";
That:
Uses ^ at the beginning and $ at the end to match the full input.
Starts with VM, VF, TV, IM, or IF with word boundary assertions on either side.
Then allows zero or more repeats with a ; in front of it. All of your examples involve at least two "words," though, so if that's a requirement, change the * (repeat zero or more times) to a + (repeat one or more times) on the second group.
...and actually, as Toto points out, since we're using anchors and defining a specific separator (;), we don't need the word boundaries, so simply
String regex = "^(?:VM|VF|TV|IM|IF)(?:;(?:VM|VF|TV|IM|IF))*$";
...is sufficient, and simpler.
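If the list of valid words is likely to change, the same pattern can be assembled from a collection instead of being typed out twice. The following is a rough sketch, not part of the original answer; the class and method names are hypothetical, and Pattern.quote is used on the assumption that a word might one day contain regex metacharacters.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class WordListPattern {
    // Builds ^(?:w1|w2|...)(?:;(?:w1|w2|...))*$ from the given words.
    static Pattern build(List<String> words) {
        String alternation = words.stream()
                .map(Pattern::quote)   // treat each word as a literal
                .collect(Collectors.joining("|", "(?:", ")"));
        return Pattern.compile("^" + alternation + "(?:;" + alternation + ")*$");
    }

    public static void main(String[] args) {
        Pattern p = build(Arrays.asList("VM", "VF", "TV", "IM", "IF"));
        System.out.println(p.matcher("VF;VM;IF").matches());  // true
        System.out.println(p.matcher("VF,VM;IF").matches());  // false
        System.out.println(p.matcher("VF;VM;").matches());    // false
    }
}
```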
Tests:
class Example
{
private static String regex = "^(?:VM|VF|TV|IM|IF)(?:;(?:VM|VF|TV|IM|IF))*$";
public static void main (String[] args) throws java.lang.Exception
{
test("VF;VM", true);
test("VF;GM", false);
test("VF,VM", false);
test("VF;VM;IF", true);
test("VF,VM;IF", false);
}
private static void test(String str, boolean expectedResult) {
boolean result = str.matches(regex);
System.out.println(str + " -- " + (result ? "Good" : "Bad") + (result == expectedResult ? " - OK" : " - ERROR"));
}
}

This code might be easier to understand and modify than a big RegEx.
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
public class ValidateList
{
public static void main(String[] args) {
Set<String> validWords = new HashSet<String>(Arrays.asList(new String[] { "VM", "VF", "TV", "IM", "IF" }));
System.out.println(areAllWordsValid("VF;VM;IF", validWords));
System.out.println(areAllWordsValid("VF;VM;IF;", validWords));
System.out.println(areAllWordsValid("VF;GM;IF", validWords));
}
public static boolean areAllWordsValid(String string, Set<String> validWords) {
String[] words = string.split(";", -1);
for (String word : words) {
if (!validWords.contains(word)) {
return false;
}
}
return true;
}
}

Similar to the accepted answer but collapsed down a little bit (note that it is looser: it also accepts input such as VFVM or ;;):
^(?:VF|VM|IF|TV|IM|;)++$

Regex to add digit between delimiter characters if missing

I haven't used regex much and I need a little bit of help. I have a situation where I have digits separated by the dot char, something like this:
0.0.1
1.1.12.1
20.3.4.00.1
Now I would like to ensure that each number between . has two digits:
00.00.01
01.01.12.01
20.03.04.00.01
How can I accomplish that? Thank you for your help.

You can use String.split() to accomplish this:
public static void main(String[] args) {
String[] splitString = "20.3.4.00.1".split("\\.");
String output = "";
for(String a : splitString)
{
if(a.length() < 2)
{
a = "0" + a;
}
output += a + ".";
}
output = output.substring(0, output.length() - 1);
System.out.println(output);
}

use this pattern
\b(?=\d(?:\.|$))
and replace with 0
\b # <word boundary>
(?= # Look-Ahead
\d # <digit 0-9>
(?: # Non Capturing Group
\. # "."
| # OR
$ # End of string/line
) # End of Non Capturing Group
) # End of Look-Ahead
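In Java, the pattern above could be applied with String.replaceAll, roughly as follows. The class and method names are made up for this sketch; the backslashes are doubled because the pattern sits inside a Java string literal.

```java
public class PadWithLookahead {
    // Inserts "0" at every word boundary that is followed by exactly one
    // digit and then either a dot or the end of the input.
    static String pad(String s) {
        return s.replaceAll("\\b(?=\\d(?:\\.|$))", "0");
    }

    public static void main(String[] args) {
        System.out.println(pad("0.0.1"));        // 00.00.01
        System.out.println(pad("1.1.12.1"));     // 01.01.12.01
        System.out.println(pad("20.3.4.00.1"));  // 20.03.04.00.01
    }
}
```

The match is zero-width, so nothing is replaced; the 0 is simply inserted at each matching position.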

You can iterate over the matching groups retrieved from matching the following expression: /([^.]+)/g.
Example:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class StackOverFlow {
public static String text;
public static String pattern;
static {
text = "20.3.4.00.1";
pattern = "([^.]+)";
}
public static String appendLeadingZero(String text) {
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(text);
StringBuilder sb = new StringBuilder();
while (m.find()) {
String firstMatchingGroup = m.group(1);
if (firstMatchingGroup.length() < 2) {
sb.append("0" + firstMatchingGroup);
} else {
sb.append(firstMatchingGroup);
}
sb.append(".");
}
return sb.substring(0, sb.length() - 1);
}
public static void main(String[] args) {
System.out.println(appendLeadingZero(text));
}
}

I am going with the assumption that you want to ensure every integer is at least two digits, both between . and on the ends. This is what I came up with
public String ensureTwoDigits(String original){
return original.replaceAll("(?<!\\d)(\\d)(?!\\d)","0$1");
}
Test case
public static void main(String[] args) {
Foo f = new Foo();
List<String> values = Arrays.asList("1",
"1.1",
"01.1",
"01.01.1.1",
"01.2.01",
"01.01.01");
values.forEach(s -> System.out.println(s + " -> " + f.ensureTwoDigits(s)));
}
Test output
1 -> 01
1.1 -> 01.01
01.1 -> 01.01
01.01.1.1 -> 01.01.01.01
01.2.01 -> 01.02.01
01.01.01 -> 01.01.01
The regex (?<!\\d)(\\d)(?!\\d) uses both negative lookbehind and negative lookahead to match a single digit only when it has no other digit on either side. Without them, it would put a zero in front of every digit. The replacement string "0$1" says to put a 0 in front of the first capturing group. There really is only one, that being (\\d), the single-digit occurrence.
EDIT: I should note that I realize this is not a strict match to the original requirements. It won't matter what you use between single digits: letters, various punctuation, etc. will all pass through just fine, with a zero put in front of any single digit. If you want it to fail on or skip strings that contain characters other than digits and ., the regex would need to be changed.
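One possible way to get the stricter behaviour is to validate the overall shape first and only then pad. This is a sketch under my own assumptions: the input must be one or more digit groups separated by single dots, and anything else is rejected with an exception. The class name, method name, and choice of IllegalArgumentException are illustrative, not from the original answer.

```java
public class StrictPadder {
    // Pads single digits to two digits, but only for input that is
    // strictly digit groups separated by single dots (assumed requirement).
    static String padStrict(String s) {
        if (!s.matches("\\d+(?:\\.\\d+)*")) {
            throw new IllegalArgumentException("Not a dot-separated list of digits: " + s);
        }
        return s.replaceAll("(?<!\\d)(\\d)(?!\\d)", "0$1");
    }

    public static void main(String[] args) {
        System.out.println(padStrict("01.2.01"));  // 01.02.01
        try {
            padStrict("1a.2");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```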

you can use this simple regex:
\\b\\d\\b
and replace with 0$0

What's up with this regular expression not matching?

public class PatternTest {
public static void main(String[] args) {
System.out.println("117_117_0009v0_172_5738_5740".matches("^([0-9_]+v._.)"));
}
}
This program prints "false". What?!
I am expecting to match the prefix of the string: "117_117_0009v0_1"
I know this stuff, really I do... but for the life of me, I've been staring at this for 20 minutes and have tried every variation I can think of and I'm obviously missing something simple and obvious here.
Hoping the many eyes of SO can pick it out for me before I lose my mind over this.
Thanks!
The final working version ended up as:
String text = "117_117_0009v0_172_5738_5740";
String regex = "[0-9_]+v._.";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(text);
if (m.lookingAt()) {
System.out.println(m.group());
}
One non-obvious discovery/reminder for me was that one of matches(), lookingAt(), or find() must be called, and must succeed, before accessing matcher groups. If not, an IllegalStateException is thrown with the unhelpful message "No match found". Despite this, groupCount() still returns the number of capturing groups in the pattern whether or not anything matched, so it lies if you read it as a sign of a match. Do not believe it.
I forgot how ugly this API is. Argh...

String.matches() has to match the entire string, as if Java stuck in the ^ and $ operators by default, so something like this should work:
public class PatternTest {
public static void main(String[] args) {
System.out.println("117_117_0009v0_172_5738_5740".matches("^([0-9_]+v._.).*$"));
}
}
returns:
true
Match content:
117_117_0009v0_1
This is the code I used to extract the match:
Pattern p = Pattern.compile("^([0-9_]+v._.).*$");
String str = "117_117_0009v0_172_5738_5740";
Matcher m = p.matcher(str);
if (m.matches())
{
System.out.println(m.group(1));
}

If you want to check if a string starts with the certain pattern you should use Matcher.lookingAt() method:
Pattern pattern = Pattern.compile("([0-9_]+v._.)");
Matcher matcher = pattern.matcher("117_117_0009v0_172_5738_5740");
if (matcher.lookingAt()) {
int groupCount = matcher.groupCount();
for (int i = 0; i <= groupCount; i++) {
System.out.println(i + " : " + matcher.group(i));
}
}
Javadoc:
boolean java.util.regex.Matcher.lookingAt()
Attempts to match the input sequence, starting at the beginning of the region, against the pattern. Like the matches method, this method always starts at the beginning of the region; unlike that method, it does not require that the entire region be matched. If the match succeeds then more information can be obtained via the start, end, and group methods.
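To see the three matching modes side by side, here is a small sketch using the pattern and input from the question. The helper name prefixMatch is made up for illustration; a fresh Matcher is created for each call so that one attempt cannot affect the next.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchModes {
    // Returns the text matched at the start of input, or null if the
    // pattern does not match there.
    static String prefixMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("[0-9_]+v._.");
        String text = "117_117_0009v0_172_5738_5740";

        // matches(): the whole input must match.
        System.out.println(p.matcher(text).matches());        // false
        // lookingAt(): must match at the start, may stop early.
        System.out.println(prefixMatch(p, text));             // 117_117_0009v0_1
        // find(): may match anywhere in the input.
        System.out.println(prefixMatch(p, "x" + text));       // null
        System.out.println(p.matcher("x" + text).find());     // true
    }
}
```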

I don't know the Java flavor of regular expressions; however, this PCRE regular expression should work:
^([\d_]+v\d_\d).+
I don't know why you are using ._. instead of \d_\d
