Why does this regex pattern fail to match the groups in Java? When I run the same example in a bash shell with echo and sed, it works.
String s = "Match foo and bar and baz";
//Pattern p = Pattern.compile("Match (.*) or (.*) or (.*)"); //was a typo
Pattern p = Pattern.compile("Match (.*) and (.*) and (.*)");
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group(1));
}
I am expecting to match foo, bar, and baz.
$ echo "Match foo and bar and baz" | sed 's/Match \(.*\) and \(.*\) and \(.*\)/\1, \2, \3/'
foo, bar, baz
It is due to the greedy nature of .*. You can use this regex instead:
Pattern p = Pattern.compile("Match (\\S+) and (\\S+) and (\\S+)");
This regex uses \\S+, which matches one or more non-whitespace characters.
Full code
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group(1) + ", " + m.group(2) + ", " + m.group(3));
}
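To see where greedy .* and \\S+ actually differ, here is a small sketch. For the input in the question both patterns capture the same three groups; the difference only shows up once the text contains an extra " and ". The longer input string and the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Returns the three capture groups of the first match joined with " | ".
    static String capture(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "no match";
        }
        return m.group(1) + " | " + m.group(2) + " | " + m.group(3);
    }

    public static void main(String[] args) {
        // Hypothetical input with one more " and " than the question's string.
        String input = "Match foo and bar and baz and qux";

        // Greedy .* makes the first group as long as possible.
        System.out.println(capture("Match (.*) and (.*) and (.*)", input));
        // prints: foo and bar | baz | qux

        // \S+ cannot cross a space, so each group is a single word.
        System.out.println(capture("Match (\\S+) and (\\S+) and (\\S+)", input));
        // prints: foo | bar | baz
    }
}
```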
You're trying to match the whole String, so
while (m.find()) {
will only iterate once.
That single find() will capture all the groups. As such, you can print them out as
System.out.println(m.group(1) + " " + m.group(2) + " " + m.group(3));
Or use a for loop over the Matcher#groupCount().
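A minimal sketch of the groupCount() loop, assuming the corrected pattern from the question. The class and method names are only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    // Returns every capture group of the first match, or an empty list if nothing matches.
    static List<String> captures(String input) {
        Pattern p = Pattern.compile("Match (.*) and (.*) and (.*)");
        Matcher m = p.matcher(input);
        List<String> groups = new ArrayList<>();
        if (m.find()) {
            // Group 0 is the whole match, so capture groups run from 1 to groupCount() inclusive.
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", captures("Match foo and bar and baz")));
        // prints: foo, bar, baz
    }
}
```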
Your regex is correct, but you need to print all the groups, not only the first, e.g.:
while (m.find()) {
System.out.println(m.group(1));
System.out.println(m.group(2));
System.out.println(m.group(3));
}
It seems like a simple typo (or -> and):
Pattern p = Pattern.compile("Match (.*) and (.*) and (.*)");
UPDATE
To replace:
String s = "Match foo and bar and baz";
String replaced = s.replaceAll("Match (.*) and (.*) and (.*)", "$1, $2, $3");
System.out.println(replaced);
Related
I am trying to extract some information from a parse exception message which looks like the following:
"Encountered " <FUNCNAME> "FF "" at line 1, column 22.
Was expecting:
"DEF" ..."
From this message I would like to get the encountered token, which in the case above is "FUNCNAME", and also the expected token, which in this case is "DEF".
String[] REGEX = { "Encountered \" <(.*)> ", "Encountered (.*)." };
Pattern pattern = Pattern.compile(REGEX[0]);
Matcher matcher = pattern.matcher(message);
System.out.println("Matched: " + matcher.group(1));
I used the pattern above to get the encountered token (which works fine), but I am struggling to get the expected one because of the line breaks.
You need to slightly rework your regex pattern and also use the DOTALL modifier (?s) in the regex, so that .* can match across lines.
String message = "\"Encountered \" <FUNCNAME> \"FF \" at line 1, column 22.\nWas expecting:\n\"DEF\" ...";
String regex = "(?s)\"Encountered \" <(.*?)>.*?Was expecting:\\s+\"(.*?)\"";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(message);
if (m.find()) {
System.out.println("Matched: " + m.group(1) + ", " + m.group(2));
}
This prints:
Matched: FUNCNAME, DEF
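If you would rather not embed (?s) in the pattern string, the same behaviour should be available through the Pattern.DOTALL compile flag. A sketch under that assumption, reusing the message from above (class and method names are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotAllFlagDemo {
    // Same pattern as above minus the inline (?s); DOTALL is passed as a compile flag instead.
    private static final Pattern TOKENS = Pattern.compile(
            "\"Encountered \" <(.*?)>.*?Was expecting:\\s+\"(.*?)\"", Pattern.DOTALL);

    // Returns "encountered, expected", or null when the message has a different shape.
    static String tokens(String message) {
        Matcher m = TOKENS.matcher(message);
        return m.find() ? m.group(1) + ", " + m.group(2) : null;
    }

    public static void main(String[] args) {
        String message = "\"Encountered \" <FUNCNAME> \"FF \" at line 1, column 22.\nWas expecting:\n\"DEF\" ...";
        System.out.println("Matched: " + tokens(message));
        // prints: Matched: FUNCNAME, DEF
    }
}
```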
My Entries:
String e1 = "MyString=1234 MyString=5678";
String e2 = "MyString=1234\nMyString=5678";
What I'm doing:
String pattern = "MyString=(.*)";
Pattern patternObj = Pattern.compile(pattern);
Matcher matcher = patternObj.matcher(e1); //e1 or e2
if (matcher.find()) {
System.out.println("G1: " + matcher.group(1));
System.out.println("G2: " + matcher.group(2));
}
What I want in the output:
G1: 1234
G2: 5678
There's only one group that will be matched multiple times. You have to keep matching and printing group 1:
int i = 0;
while (matcher.find()) {
System.out.println("G" + (++i) + ": " + matcher.group(1));
}
Also, you need to update your pattern so it doesn't match the next MyString. You can use \d+ or \w+ or [^\s]+, depending on the type of values you're matching.
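Putting both points together, a sketch that assumes the values are always digits (hence \\d+) and collects every match of group 1. The helper name is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedMatchDemo {
    // Collects group 1 of every match; \d+ stops at the first non-digit,
    // so it never runs into the next "MyString=".
    static List<String> values(String input) {
        Matcher m = Pattern.compile("MyString=(\\d+)").matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String e1 = "MyString=1234 MyString=5678";
        String e2 = "MyString=1234\nMyString=5678";
        for (String entry : new String[] {e1, e2}) {
            int i = 0;
            for (String value : values(entry)) {
                System.out.println("G" + (++i) + ": " + value);
            }
        }
    }
}
```

Both entries should give G1: 1234 and G2: 5678, since a space and a newline are equally non-digit.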
The easiest "quick fix" is to replace . (any char but a newline) with \w (a letter, digit or an underscore):
String pattern = "MyString=(\\w*)"; // <---- HERE
Pattern patternObj = Pattern.compile(pattern);
Matcher matcher = patternObj.matcher(e1);
int i = 0;
while (matcher.find()) { // the pattern has only one group, so loop over the matches
System.out.println("G" + (++i) + ": " + matcher.group(1));
}
Now, MyString=(\\w*) matches the literal MyString= and then captures zero or more letters, digits, or underscores after it, stopping at the first whitespace, punctuation, or other non-word character.
NOTE: If you need to match any chars but whitespace, you may use \S instead of \w.
If it will always be numbers, you could use this as your regex:
String pattern = "MyString=([0-9]*)";
If it will contain letters as well as numbers, Wiktor Stribizew's comment on the original post is very helpful: he suggested \w, which matches word characters.
I have some strings which differ only in one word (e.g. foo byte bar and foo word bar). I used multiple regexes to parse them (e.g. (\w+) byte (\w+) -> $1 1 $2 and (\w+) word (\w+) -> $1 2 $2). Is it possible to choose the output depending on the input word (e.g. (\w+) (\w+) (\w+) -> $1 <depending on $2> $3)? Tell me if you need more examples.
This is best solved using Matcher.appendReplacement / Matcher.appendTail as follows:
String input = "hello byte world";
Pattern p = Pattern.compile("(\\w+) (\\w+) (\\w+)");
Matcher m = p.matcher(input);
StringBuffer sb = new StringBuffer();
while (m.find()) {
// Compute replacement for middle word
String w = m.group(2);
String s = w.equals("byte") ? "<A BYTE!>"
: w.equals("word") ? "<A WORD!>"
: "something else";
m.appendReplacement(sb, "$1 " + s + " $3");
}
m.appendTail(sb);
System.out.println(sb);
Output:
hello <A BYTE!> world
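The same idea scales to more words if the replacements live in a lookup table. A sketch using the byte -> 1 and word -> 2 mapping from the question; leaving unknown middle words unchanged is my assumption, as are the class and method names. Matcher.quoteReplacement guards against a replacement that happens to contain $ or a backslash.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConditionalReplaceDemo {
    private static final Pattern THREE_WORDS = Pattern.compile("(\\w+) (\\w+) (\\w+)");

    static String rewrite(String input) {
        // Mapping taken from the question: byte -> 1, word -> 2.
        Map<String, String> codes = new HashMap<>();
        codes.put("byte", "1");
        codes.put("word", "2");

        Matcher m = THREE_WORDS.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Assumption: a middle word without a mapping is kept as it is.
            String middle = codes.getOrDefault(m.group(2), m.group(2));
            // $1 and $3 are group references; the computed part is quoted so it is taken literally.
            m.appendReplacement(sb, "$1 " + Matcher.quoteReplacement(middle) + " $3");
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite("foo byte bar")); // prints: foo 1 bar
        System.out.println(rewrite("foo word bar")); // prints: foo 2 bar
    }
}
```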
I am trying to use regex to find a match for a string between Si and (P) or Si and (I).
Below is what I wrote. Why isn't it working and how do I fix it?
String Channel = "Si0/4(I) Si0/6( Si0/8K Si0/5(P)";
if (Channel.length() > 0) {
String pattern1 = "Si";
String pattern2 = "(P)";
String pattern3 = "(I)";
String P1 = Pattern.quote(pattern1) + "(.*?)[" + Pattern.quote(pattern2) + "|" + Pattern.quote(pattern3) + "]";
Pattern p = Pattern.compile(P1);
Matcher m = p.matcher(Channel);
while(m.find()){
if (m.group(1)!= null)
{
System.out.println(m.group(1));
}
else if (m.group(2)!= null)
{
System.out.println(m.group(2));
}
}
}
Expected output
0/4
0/5
Actual output
0/4
0/6
0/8K Si0/5
Use a lookbehind and a lookahead in your regex. You also need to add a space inside the negated character class, so that a match cannot run across the space after 0/8K and pick up 0/8K Si0/5.
(?<=Si)[^\\( ]*(?=\\((?:P|I)\\))
DEMO
String str="Si0/4(I) Si0/6( Si0/8K Si0/5(P)";
String regex="(?<=Si)[^\\( ]*(?=\\([PI]\\))";
Pattern pattern = Pattern.compile(regex);
Matcher matcher =pattern.matcher(str);
while(matcher.find()){
System.out.println(matcher.group(0));
}
Output:
0/4
0/5
You need to group your regex. It is currently
Si(.*?)[(P)|(I)]
Whereas it should be
Si(.*?)\(I\)|Si(.*?)\(P\)
See demo.
http://regex101.com/r/oO8zI4/8
[] means "any one of these characters", so each character in the brackets is treated as a separate alternative, as if they were joined with OR.
If the result you're looking for is always number/number, you can use:
Si(\d+\/\d+)(?:\(P\)|\(I\))
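In Java that could look like the sketch below. The backslashes are doubled for the string literal, and the / needs no escaping in Java; the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChannelDemo {
    // Captures number/number only when it is directly followed by (P) or (I).
    private static final Pattern CHANNEL = Pattern.compile("Si(\\d+/\\d+)(?:\\(P\\)|\\(I\\))");

    static List<String> channels(String input) {
        Matcher m = CHANNEL.matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String channel : channels("Si0/4(I) Si0/6( Si0/8K Si0/5(P)")) {
            System.out.println(channel);
        }
        // prints 0/4 and 0/5; Si0/6( and Si0/8K are skipped
    }
}
```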
I am trying to match a pattern like '#(a-zA-Z0-9)+' but not one like 'abc#test'.
So this is what I tried:
Pattern MY_PATTERN
= Pattern.compile("\\s#(\\w)+\\s?");
String data = "abc#gere.com #gogasig #jytaz #tibuage";
Matcher m = MY_PATTERN.matcher(data);
StringBuffer sb = new StringBuffer();
boolean result = m.find();
while(result) {
System.out.println (" group " + m.group());
result = m.find();
}
But I can only see '#jytaz', not '#tibuage'.
How can I fix my problem? Thank you.
This pattern should work: \B(#\w+)
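The \B asserts a non-word boundary in front of the #, so a # that directly follows a word character (as in abc#gere.com) is skipped. The \w+ already stops before the trailing space, so no \s? is needed. I've also moved the parentheses so that the # and the + are inside the group and the whole tag is captured. You should preferably use m.group(1) to get it.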
Here's the rewrite:
Pattern pattern = Pattern.compile("\\B(#\\w+)");
String data = "abc#gere.com #gogasig #jytaz #tibuage";
Matcher m = pattern.matcher(data);
while (m.find()) {
System.out.println(" group " + m.group(1));
}
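A likely reason the original pattern drops tags is that its optional trailing \\s consumes the space that the next match would need in front of its #. The sketch below runs both patterns side by side so you can compare them on your own data; the helper name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashtagDemo {
    // Returns the full text of every match of regex in data.
    static List<String> findAll(String regex, String data) {
        Matcher m = Pattern.compile(regex).matcher(data);
        List<String> hits = new ArrayList<>();
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String data = "abc#gere.com #gogasig #jytaz #tibuage";

        // Original pattern: each match may swallow the following space,
        // so the tag right after a match has no leading \s left to match.
        System.out.println(findAll("\\s#(\\w)+\\s?", data));

        // \B consumes nothing, so adjacent tags are all found.
        System.out.println(findAll("\\B(#\\w+)", data));
        // prints: [#gogasig, #jytaz, #tibuage]
    }
}
```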