I'm trying to capture a word or several words from a string like this:
input: "aa bb"
pattern: "(.*) bb"
expected group: "aa"
input: "aa yy bb xx"
pattern: "(.*) bb (.*)"
expected groups: "aa yy, xx"
But my attempts always capture the whole string. Where is my mistake?
String patternString = "(.*) bb";
Log("patternString: " + patternString);
Pattern p = Pattern.compile(patternString);
Matcher m = p.matcher("aa bb");
while(m.find()) {
Log("group: " + m.group());
//Log: group: aa bb
}
You want to get the first group not the entire match. You should use m.group(1) for this, instead of m.group() which returns the entire match.
See the documentation of Matcher for the available API. Matcher#groupCount() returns the number of capturing groups in the matcher's pattern (group 0, the entire match, is not counted).
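As a minimal sketch of the difference between group() and group(1), the class below runs both patterns from the question. The class name GroupDemo and the helper firstGroup are made up for illustration and are not part of the original post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Hypothetical helper: returns capture group 1 of the first match,
    // or null when the pattern does not match at all.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("(.*) bb (.*)").matcher("aa yy bb xx");
        if (m.find()) {
            // group() is shorthand for group(0): the entire match.
            System.out.println("whole match: " + m.group());
            // groupCount() is the number of capturing groups in the pattern,
            // so the numbered groups run from 1 to groupCount() inclusive.
            for (int i = 1; i <= m.groupCount(); i++) {
                System.out.println("group " + i + ": " + m.group(i));
            }
        }
        System.out.println("first group: " + firstGroup("(.*) bb", "aa bb"));
    }
}
```

The loop starts at 1 because index 0 always refers to the whole match rather than to a parenthesized group.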
I have one input String like this:
"I am Duc/N Ta/N Van/N"
The suffix "/N" marks a word as part of a person's name.
The expected output is:
Name: Duc Ta Van
How can I do this with a regular expression?
You can use Pattern and Matcher like this:
String input = "I am Duc/N Ta/N Van/N";
Pattern pattern = Pattern.compile("([^\\s]+)/N");
Matcher matcher = pattern.matcher(input);
String result = "";
while (matcher.find()) {
result+= matcher.group(1) + " ";
}
System.out.println("Name: " + result.trim());
Output
Name: Duc Ta Van
Another solution, using Java 9+
From Java 9 onward you can use Matcher::results like this:
String input = "I am Duc/N Ta/N Van/N";
String regex = "([^\\s]+)/N";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
String result = matcher.results().map(s -> s.group(1)).collect(Collectors.joining(" "));
System.out.println("Name: " + result); // Name: Duc Ta Van
Here is the regex to use to capture every "name" that is followed by /N:
(\w+)\/N
Validate with Regex101
Now you just need to loop over every match in the string and concatenate them to get the result:
String pattern = "(\\w+)\\/N";
String test = "I am Duc/N Ta/N Van/N";
Matcher m = Pattern.compile(pattern).matcher(test);
StringBuilder sbNames = new StringBuilder();
while(m.find()){
sbNames.append(m.group(1)).append(" ");
}
System.out.println(sbNames.toString());
Duc Ta Van
This covers the hardest part. I'll let you adapt it to your needs.
Note :
In Java, it is not necessary to escape a forward slash. To keep the regex the same throughout this answer I use "(\\w+)\\/N", but "(\\w+)/N" works as well.
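I've used "[/N]+" as the regular expression. Note that this is a character class: it matches any run of "/" and "N" characters, not only the exact sequence "/N", so a capital "N" inside a name would match as well.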
Regex101
[] = Matches characters inside the set
\/ = Matches the character / literally (case sensitive)
+ = Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
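The answer does not show how the pattern was applied, so the following is only a sketch under the assumption that it was passed to String#replaceAll to strip the markers. The class and method names are invented for the example. It contrasts the character class "[/N]+" with the literal sequence "/N".

```java
class CharClassDemo {
    // Assumed usage: delete every run of '/' or 'N' characters.
    static String stripWithClass(String s) {
        return s.replaceAll("[/N]+", "");
    }

    // Alternative: delete only the exact two-character sequence "/N".
    static String stripLiteral(String s) {
        return s.replaceAll("/N", "");
    }

    public static void main(String[] args) {
        String input = "I am Duc/N Ta/N Van/N";
        System.out.println(stripWithClass(input));
        System.out.println(stripLiteral(input));

        // A name that itself contains a capital N shows the difference:
        // the character class also removes the leading N.
        System.out.println(stripWithClass("Nam/N"));
        System.out.println(stripLiteral("Nam/N"));
    }
}
```

Both variants give the same result for the sample input. They differ only when a "/" or "N" appears somewhere other than in a "/N" marker.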
My Entries:
String e1 = "MyString=1234 MyString=5678";
String e2 = "MyString=1234\nMyString=5678";
What I'm doing:
String pattern = "MyString=(.*)";
Pattern patternObj = Pattern.compile(pattern);
Matcher matcher = patternObj.matcher(e1); //e1 or e2
if (matcher.find()) {
System.out.println("G1: " + matcher.group(1));
System.out.println("G2: " + matcher.group(2));
}
What I want in the output:
G1: 1234
G2: 5678
There's only one group that will be matched multiple times. You have to keep matching and printing group 1:
int i = 0;
while (matcher.find()) {
System.out.println("G" + (++i) + ": " + matcher.group(1));
}
Also, you need to update your pattern so it doesn't match the next MyString. You can use \d+ or \w+ or [^\s]+, depending on the type of values you're matching.
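A small sketch of both points, looping over find() and narrowing the captured value, might look like this. The class name and the allValues helper are assumptions made for the example. The inputs e1 and e2 are the ones from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // Hypothetical helper: collects group 1 from every successive match.
    static List<String> allValues(String regex, String input) {
        List<String> values = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {          // each find() advances to the next occurrence
            values.add(m.group(1)); // the pattern has one group, reused per match
        }
        return values;
    }

    public static void main(String[] args) {
        String e1 = "MyString=1234 MyString=5678";
        String e2 = "MyString=1234\nMyString=5678";

        // Greedy .* runs to the end of the line, swallowing the second entry.
        System.out.println(allValues("MyString=(.*)", e1));
        // \d+ stops at the first non-digit, so each entry is matched separately.
        System.out.println(allValues("MyString=(\\d+)", e1));
        // \S+ stops at any whitespace, including the newline in e2.
        System.out.println(allValues("MyString=(\\S+)", e2));
    }
}
```

With the original (.*) pattern, e2 happens to work because . does not match a newline by default, while e1 yields a single oversized value.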
The easiest "quick fix" is to replace . (any char but a newline) with \w (a letter, digit or an underscore):
String pattern = "MyString=(\\w*)"; // <---- HERE
Pattern patternObj = Pattern.compile(pattern);
Matcher matcher = patternObj.matcher(e1);
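// The pattern has a single group, so matcher.group(2) would throw
// IndexOutOfBoundsException; call find() once per occurrence instead.
int i = 0;
while (matcher.find()) {
    System.out.println("G" + (++i) + ": " + matcher.group(1));
}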
Now MyString=(\w*) matches the literal MyString= and then captures zero or more letters, digits, or underscores, stopping at the first whitespace, punctuation, or other non-word character.
NOTE: If you need to match any chars but whitespace, you may use \S instead of \w.
If it will always be numbers, you could use this as your regex:
String pattern = "MyString=([0-9]*)";
If it can contain letters as well as digits, use \w, which matches word characters (letters, digits, and underscore), as Wiktor Stribizew's comment on the original post suggests.
I have a string like this.
//Locaton;RowIndex;maxRows=New York, NY_10007;1;4
From this I need to extract only the name New York.
How can this be done in a single step?
I used:
String str = "Locaton;RowIndex;maxRows=New York, NY_10007;1;4 ";
str = str.split("=")[1];
str = str.split(",")[0];
The code above needs several splits. How can I avoid this?
I want to get only the name with a single expression.
Try the regular expression "=(.*?)," like this:
String str = "Locaton;RowIndex;maxRows=New York, NY_10007;1;4 ";
Pattern pattern = Pattern.compile("=(.*?),");
Matcher matcher = pattern.matcher(str);
if (matcher.find()) {
System.out.println(matcher.group(1));
}
Output:
New York
matcher.group(1) returns the text captured by the first pair of parentheses. Parentheses create a numbered capturing group, which stores the part of the string matched by the part of the regular expression inside them, so it is easy to extract one piece of the overall match.
Match: "=New York,"
Group 1: "New York"
Use capture groups in the regex; they are well suited to extracting specific data from a string.
String str = "Locaton;RowIndex;maxRows=New York, NY_10007;1;4 ";
String pattern = "(.*?=)(.*?)(,.*)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(str);
if (m.find()) {
System.out.println("Group 1: " + m.group(1));
System.out.println("Group 2: " + m.group(2));
System.out.println("Group 3: " + m.group(3));
}
Here is the output
Group 1: Locaton;RowIndex;maxRows=
Group 2: New York
Group 3: , NY_10007;1;4
I want to match every file name which ends with .js and is stored in a directory called lib.
Therefore I created the following regular expression: (lib/)(.*?).js$.
I tested the expression (lib/)(.*?).js$ in a Regex Tester and matched this filename: src/main/lib/abc/DocumentHandler.js.
To use my expression in Java, I escaped it to: (lib/)(.*?)\\.js$.
Nevertheless, Java tells me that my expression does not match.
Here is my code:
String regEx = "(lib/)(.*?).js$";
String escapedRegEx = "(lib/)(.*?)\\.js$";
Pattern pattern = Pattern.compile(escapedRegEx);
Matcher matcher = pattern.matcher("src/main/lib/abc/DocumentHandler.js");
System.out.println("Matches: " + matcher.matches()); // false :-(
Did I forget to escape something?
Use Matcher.find() instead of Matcher.matches() to look for a match in a substring of the input.
As per the Javadoc:
Matcher#matches()
Attempts to match the entire region against the pattern.
Matcher#find()
Attempts to find the next subsequence of the input sequence that matches the pattern.
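To make the contrast concrete, here is a minimal sketch that runs the question's pattern through both methods. The class name and the two helper methods are hypothetical. The pattern and file name come from the question.

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    // True only when the pattern consumes the entire input.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True when the pattern matches any subsequence of the input.
    static boolean findsAnywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String path = "src/main/lib/abc/DocumentHandler.js";
        String regex = "(lib/)(.*?)\\.js$";

        // The pattern starts at "lib/", so it cannot cover the leading "src/main/".
        System.out.println("matches(): " + matchesWhole(regex, path));
        System.out.println("find():    " + findsAnywhere(regex, path));

        // Prefixing .* lets matches() consume the leading directories as well.
        System.out.println("matches() with .* prefix: " + matchesWhole(".*" + regex, path));
    }
}
```

In short, matches() behaves as if the pattern were anchored at both ends of the input, while find() scans for the first place the pattern fits.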
sample code:
String regEx = "(lib/)(.*)\\.js$";
String str = "src/main/lib/abc/DocumentHandler.js";
Pattern pattern = Pattern.compile(regEx);
Matcher matcher = pattern.matcher(str);
if (matcher.find()) { // <== returns true if found
System.out.println("Matches: " + matcher.group());
System.out.println("Path: " + matcher.group(2));
}
output:
Matches: lib/abc/DocumentHandler.js
Path: abc/DocumentHandler
Use Matcher#group(index) to get the text captured by a group, that is, by a part of the pattern enclosed in parentheses (...).
You can use the String#matches() method to match the whole string.
String regEx = "(.*)(/lib/)(.*?)\\.js$";
String str = "src/main/lib/abc/DocumentHandler.js";
System.out.println("Matched :" + str.matches(regEx)); // Matched :true
Note: Don't forget to escape the dot (.), which has a special meaning in a regex pattern: it matches any character other than a newline.
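The effect of the unescaped dot is easy to miss because the unescaped pattern still accepts every real .js file. A short sketch, with made-up class, method, and file names, shows the input it wrongly accepts:

```java
import java.util.regex.Pattern;

class DotEscapeDemo {
    // Hypothetical helper: does the regex match somewhere in the input?
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String unescaped = "lib/(.*?).js$";    // . before "js" matches ANY character
        String escaped   = "lib/(.*?)\\.js$";  // \. matches only a literal dot

        // Both accept a genuine .js file.
        System.out.println(found(unescaped, "src/lib/app.js"));
        System.out.println(found(escaped, "src/lib/app.js"));

        // Only the unescaped version accepts a name that merely ends in "js".
        System.out.println(found(unescaped, "src/lib/appXjs"));
        System.out.println(found(escaped, "src/lib/appXjs"));
    }
}
```

In a Java string literal the backslash itself has to be doubled, which is why the regex \. is written as "\\.".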
Try this RegEx pattern
String regEx = "(.*)(lib\\/)(.*)(\\.js$)";
Pattern pattern = Pattern.compile(regEx);
Matcher matcher = pattern.matcher("src/main/lib/abc/DocumentHandler.js");
It's working for me:
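Firstly, the escaping is not what makes matches() fail (escaping the dot is still the stricter choice, since an unescaped . matches any character). Secondly, matches() must cover the whole input, and your pattern does not match the first part of the string.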
String regEx = "(.*)(lib/)(.*?).js$";
Pattern pattern = Pattern.compile(regEx);
Matcher matcher = pattern.matcher("src/main/lib/abc/DocumentHandler.js");
I need to get a word between two other words, and I'm using Pattern and Matcher.
Pattern p = Pattern.compile("Hello(.*?)GoodBye");
Matcher m = p.matcher(line);
In this example I'm getting the word between Hello and GoodBye, and it works.
What I want to do is replace Hello and GoodBye by variables such as:
String StartDelemiter = "Hello";
String EndDelemiter = "GoodBye";
How should I write it in Pattern p = Pattern.compile(---)? I tried:
Pattern p = Pattern.compile( "{ "+StartDelemiter +" (.*?) "+EndDelemiter+" }" );
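But the application crashes!
In a regex, { and } are quantifier syntax, so the unescaped "{ " makes Pattern.compile throw a PatternSyntaxException. If you want literal braces in the pattern, escape them with backslashes, something like the line below. (If the braces and extra spaces are not part of the text you are matching, drop them and simply concatenate the two delimiters around (.*?).)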
Pattern p = Pattern.compile( "\\{ "+StartDelemiter +" (.*?) "+EndDelemiter+" \\}" );
The curly braces are regex quantifiers:
<pattern>{n} Match exactly n times
<pattern>{n,} Match at least n times
<pattern>{n,m} Match at least n but not more than m times
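As a hedged sketch of building the pattern from variables, the example below concatenates the delimiters around (.*?) and wraps each one in Pattern.quote, so that any regex metacharacters they contain (including braces) are treated literally. The class and method names are invented for illustration. Pattern.quote is a standard java.util.regex method.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelimiterDemo {
    // Builds a pattern that captures the text between two literal delimiters.
    static Pattern between(String start, String end) {
        return Pattern.compile(Pattern.quote(start) + "(.*?)" + Pattern.quote(end));
    }

    // Returns the captured text of the first match, or null if there is none.
    static String extract(String start, String end, String line) {
        Matcher m = between(start, end).matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("Hello", "GoodBye", "Hello world GoodBye"));
        // Braces as delimiters are safe here because they are quoted.
        System.out.println(extract("{", "}", "x {y} z"));

        // Unquoted braces act as quantifiers on the preceding element.
        System.out.println("aaa".matches("a{3}"));   // exactly 3
        System.out.println("aaaa".matches("a{2,}")); // at least 2
        System.out.println("a".matches("a{2,3}"));   // between 2 and 3
    }
}
```

Quoting matters only when a delimiter may contain characters such as . * { or (. For plain words like Hello and GoodBye, simple concatenation gives the same result.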