Java regex matcher doesn't group as expected

I have a regex
.*?(\\d+.*?\\d*).*?-.*?(\\d+.*?\\d*).*?
I want to match any string that contains a numerical value followed by "-" and another number. Any string can be in between.
Also, I want to be able to extract the numbers using group function of Java Matcher class.
Pattern pattern = Pattern.compile(".*?(\\d+.*?\\d*).*?-.*?(\\d+.*?\\d*).*?");
Matcher matcher = pattern.matcher("13.9 mp - 14.9 mp");
matcher.matches();
I expect this result:
matcher.group(1) // this should be 13.9 but it is 13 instead
matcher.group(2) // this should be 14.9 but it is 14 instead
Any idea what I am missing?

Your current pattern has several problems. As others have pointed out, your dots should be escaped with two backslashes if you intend for them to be literal dots. I think the pattern you want to use to match a number which may or may not have a decimal component is this:
(\\d+(?:\\.\\d+)?)
This matches the following:
\\d+ one or more digits
(?:\\.\\d+)? followed by a decimal point and one or more digits,
this entire quantity being optional
Full code:
Pattern pattern = Pattern.compile(".*?(\\d+(?:\\.\\d+)?).*?-.*?(\\d+(?:\\.\\d+)?).*?");
Matcher matcher = pattern.matcher("13.9 mp - 14.9 mp");
while (matcher.find()) {
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}
Output:
13.9
14.9

.*?(\d+\.*\d*).*?-.*?(\d+\.*\d*).*?
The unescaped . between \d+ and \d* in your regex matches any character; change it to \. so that it matches only a literal dot.
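To see the difference side by side, here is a minimal sketch that runs the pattern from the question and the corrected pattern from the first answer against the same input. The class name RangeDemo and the helper extract are made up for illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RangeDemo {
    // Hypothetical helper: returns groups 1 and 2, or null if the whole input does not match.
    static String[] extract(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String input = "13.9 mp - 14.9 mp";
        // Pattern from the question: the lazy .*? and the optional \d* both match nothing.
        String original = ".*?(\\d+.*?\\d*).*?-.*?(\\d+.*?\\d*).*?";
        // Corrected pattern: an optional group with an escaped dot and more digits.
        String fixed = ".*?(\\d+(?:\\.\\d+)?).*?-.*?(\\d+(?:\\.\\d+)?).*?";

        System.out.println(String.join(" / ", extract(original, input))); // 13 / 14
        System.out.println(String.join(" / ", extract(fixed, input)));    // 13.9 / 14.9
    }
}
```

The original pattern stops at 13 because the lazy .*? inside the group is satisfied with zero characters and \d* then matches the empty string in front of the dot.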

Related

Searching for number after a specific word that does not immediately precede the number

I am trying to use a pattern to search for a Zip Code within a string. I cannot get it to work correctly.
A sample of the inputLine is
What is the weather in 75042?
What I am trying to use for a pattern is
public String getZipcode(String inputLine) {
Pattern pattern = Pattern.compile(".*weather.*([0-9]+).*");
Matcher matcher = pattern.matcher(inputLine);
if (matcher.find()) {
return matcher.group(1).toString();
}
return "Zipcode Not Found.";
}
If I am looking to only get 75042, what do I need to change? This only outputs the last digit of the number, 2. I am terribly confused and I do not completely understand the Javadocs for the Pattern class.
The reason is that the greedy .* before the group matches all but the last digit, leaving only one digit for your capturing group, so that .* has to go.
A simpler pattern can be used here: \D+(\d+)\D+, which means
some non-digits \D+, then some digits to capture (\d+), then some non-digits \D+
public String getZipcode(String inputLine) {
Pattern pattern = Pattern.compile("\\D+(\\d+)\\D+");
Matcher matcher = pattern.matcher(inputLine);
if (matcher.find()) {
return matcher.group(1).toString();
}
return "Zipcode Not Found.";
}
The problem is that your middle .* is too greedy and eats up 7504. One easy fix is to add a space before the capturing group: .*weather.* ([0-9]+).* or even use \\s. But the best fix is to use the non-greedy .*?, so the regexp should be .*weather.*?([0-9]+).*
Your regex does not account for whitespace (\s). You can use \s* or \s+ depending on your data:
Pattern pattern = Pattern.compile("weather\\s*\\w+\\s*(\\d+)");
Matcher matcher = pattern.matcher(inputLine);
Your .*weather.*([0-9]+).* pattern first grabs the whole line with the first .* and backtracks until it finds weather. The second .* then grabs the rest of the line after that word and backtracks again, but only far enough to find the last digit. Since a single digit satisfies [0-9]+, only that one digit is stored in capturing group 1. The last .* just consumes the line to its end.
You may solve the issue by just using ".*weather.*?([0-9]+).*" (making the second .* lazy), but since you are using Matcher#find(), you can use a simpler regex:
Pattern pattern = Pattern.compile("weather\\D*(\\d+)");
And after getting a match, retrieve the value with matcher.group(1).
Pattern details
weather - the literal word weather
\\D* - 0+ chars other than digits
(\\d+) - Capturing group 1: one or more digits
See the Java demo:
String inputLine = "What is the weather in 75042?";
Pattern pattern = Pattern.compile("weather\\D*(\\d+)");
Matcher matcher = pattern.matcher(inputLine);
if (matcher.find()) {
System.out.println(matcher.group(1)); // => 75042
}
I think all you need is \\d+
public String getZipcode(String inputLine) throws Exception {
Pattern pattern = Pattern.compile("\\d+");
Matcher matcher = pattern.matcher(inputLine);
if (matcher.find()) {
return matcher.group();
}
//A good practice is to throw an exception if no result found
throw new NoSuchElementException("Zipcode Not Found.");
}
In regular expressions, quantifiers such as * and + are greedy by default.
Perfectly good solutions have already been suggested.
I'm just adding one that is very close to yours and addresses the problem in a more isolated way:
If you use the regex
".*weather.*?([0-9]+).*" ... instead of ...
".*weather.*([0-9]+).*"
... your solution will work perfectly well. The '?' after the asterisk instructs the regex compiler to treat the asterisk as non-greedy.
Greedy means consuming as many characters as possible (from left to right) while still allowing the remainder of the regex to match.
Non-greedy means consuming as few characters as possible while still allowing the remainder of the regex to match.
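As a quick check of the greedy and non-greedy behaviour described in this thread, the following sketch runs three of the patterns discussed above against the sample line. The class GreedyVsLazyDemo and the helper firstGroup are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazyDemo {
    // Hypothetical helper: capturing group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "What is the weather in 75042?";

        // Greedy middle .* leaves only the last digit for the group.
        System.out.println(firstGroup(".*weather.*([0-9]+).*", input));  // 2
        // Lazy middle .*? stops at the first digit, so the group gets all of them.
        System.out.println(firstGroup(".*weather.*?([0-9]+).*", input)); // 75042
        // With find() the surrounding .* parts are not needed at all.
        System.out.println(firstGroup("weather\\D*(\\d+)", input));      // 75042
    }
}
```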

Regex to match words after forward slash or in between

I have this code that needs to get words after / or in between this character.
Pattern pattern = Pattern.compile("\\/([a-zA-Z0-9]{0,})"); // Regex: \/([a-zA-Z0-9]{0,})
Matcher matcher = pattern.matcher(path);
if(matcher.matches()){
return matcher.group(0);
}
The regex \/([a-zA-Z0-9]{0,}) works but not in Java, what could be the reason?
You need to get the value of Group 1 and use find to get a partial match:
Pattern pattern = Pattern.compile("/([a-zA-Z0-9]*)");
Matcher matcher = pattern.matcher(path);
if(matcher.find()){
return matcher.group(1); // Here, use Group 1 value
}
Matcher.matches requires the entire string to match the pattern, so only use it when that is what you want. Otherwise, use Matcher.find.
Since the value you need is captured into Group 1 (([a-zA-Z0-9]*), the subpattern enclosed with parentheses), you need to return that part.
You needn't escape the / in Java regex. Also, {0,} functions the same way as * quantifier (matches zero or more occurrences of the quantified subpattern).
Also, [a-zA-Z0-9] can be replaced with \p{Alnum} to match the same range of characters (see the Java regex syntax reference). The pattern declaration will look like
"/(\\p{Alnum}*)"
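The difference between matches and find, and between group(0) and group(1), can be sketched as follows. The sample path "/users/42" and the names PathSegmentsDemo and segments are assumptions made for this example; the question does not show what path actually contains.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PathSegmentsDemo {
    private static final Pattern SEGMENT = Pattern.compile("/(\\p{Alnum}*)");

    // Hypothetical helper: collects the text after every slash by calling find() repeatedly.
    static List<String> segments(String path) {
        List<String> result = new ArrayList<>();
        Matcher m = SEGMENT.matcher(path);
        while (m.find()) {
            result.add(m.group(1)); // group(0) would include the leading slash
        }
        return result;
    }

    public static void main(String[] args) {
        String path = "/users/42"; // made-up sample input

        // matches() fails because the pattern covers only "/users", not the whole string.
        System.out.println(SEGMENT.matcher(path).matches()); // false
        System.out.println(segments(path));                  // [users, 42]
    }
}
```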

Grouping multiple digits prior to a known value

I'm executing this regex code expecting a grouping value of 11, but am getting a 1. It seems like the grouping contains the correct regex for getting one or more digits prior to a known value. I'm sure it is simple, but I cannot seem to figure it out.
String mydata = "P0Y0M0W0DT11H0M0S";
Pattern pattern = Pattern.compile("P.*(\\d+)H.*");
Matcher matcher = pattern.matcher(mydata);
if (matcher.find()){
System.out.println(matcher.group(1));
}
Try this
public static void main(String a1[]) {
String mydata = "P0Y0M0W0DT11H0M0S";
Pattern pattern = Pattern.compile("P.*?(\\d+)H.*");
Matcher matcher = pattern.matcher(mydata);
if (matcher.find()){
System.out.println(matcher.group(1));
}
}
Output
11
The problem is that .* will try to consume/match as much as possible before the next part is checked. Thus in your regex P.*(\d+)H.* the first .* will match 0Y0M0W0DT1 since that's as much as can be matched with the group still being able to match a single digit afterwards.
If you make that quantifier lazy/reluctant (i.e. .*?), it will try to match as little as possible, so of the possible matches 0Y0M0W0DT1 and 0Y0M0W0DT it will select the shorter one and leave all the digits for the group to match.
Thus the regex P.*?(\d+)H.* should do what you want.
Additional note: since you're using Matcher#find(), you don't need the catch-all expression .* at the end. Note also that the regex matches any string that contains the character H preceded by at least one digit and a P somewhere in front of those digits. So if you want to be more restrictive, your regex needs to be tightened.
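The following sketch compares the greedy pattern from the question, the lazy pattern from the answers, and a shorter variant that relies on find() alone. HoursDemo and firstGroup are hypothetical names, and the shorter pattern (\d+)H is only one possible simplification, not something proposed in the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HoursDemo {
    // Hypothetical helper: capturing group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String mydata = "P0Y0M0W0DT11H0M0S";

        // Greedy .* swallows the first 1 of 11.
        System.out.println(firstGroup("P.*(\\d+)H.*", mydata));  // 1
        // Lazy .*? leaves both digits to the group.
        System.out.println(firstGroup("P.*?(\\d+)H.*", mydata)); // 11
        // With find(), neither the leading P.*? nor the trailing .* is required.
        System.out.println(firstGroup("(\\d+)H", mydata));       // 11
    }
}
```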

Extract any 5 digit number from a String using Java

I'm attempting to use REGEX to pull a 5 digit number from a larger String.
Here is the method I am using to do this however it simply returns null.
public void setCWBudgetCode(String webPage){
Pattern pattern = Pattern.compile("/\\b\\d{5}\b/g");
Matcher matcher = pattern.matcher(webPage);
if (matcher.find())
this.cwBudgetCode = matcher.group();
}
This is the problem:
Pattern pattern = Pattern.compile("/\\b\\d{5}\\b/g");
Instead of that use this pattern:
Pattern pattern = Pattern.compile("(\\b\\d{5}\\b)");
Unlike JavaScript, Java has no regex delimiters and of course no /g flag.
Also note that I have used parentheses around the regex so that you can use matcher.group(1).
You need to change your regex to save the number as a group: Pattern.compile("\\b(\\d{5})\\b");
This will make it work with your group(1) code.
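Since Java has no /g flag, the usual way to get every match is to call find() in a loop. Below is a minimal sketch of that idea; the class FiveDigitDemo, the helper findAll and the sample text are invented for illustration and are not part of the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FiveDigitDemo {
    // No slashes and no /g: the Java string contains only the pattern itself.
    private static final Pattern FIVE_DIGITS = Pattern.compile("\\b\\d{5}\\b");

    // Hypothetical helper: returns every standalone 5-digit number in the text.
    static List<String> findAll(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = FIVE_DIGITS.matcher(text);
        while (m.find()) {
            result.add(m.group()); // group() is the whole match, so no parentheses are needed
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample: 678901 has six digits, so the word boundaries reject it.
        String text = "Budget code 12345, ref 678901, zip 54321";
        System.out.println(findAll(text)); // [12345, 54321]
    }
}
```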

Regex pattern issue

I have the following string:
!date +10 (yyyy-MM-dd'T'HH:mm:ssz)
This string could also be (notice the minus instead of the plus):
!date -10 (yyyy-MM-dd'T'HH:mm:ssz)
I need a regex pattern that will extract the numeric digits after the + (or -). There could be more than one digit.
I also need a pattern to extract the contents of the brackets ().
I've had a play around on regex pal, but couldn't get a working pattern.
Cheers.
To pick out the number & bracket content, you could do:
String str = "date +10 (yyyy-MM-dd'T'HH:mm:ssz)";
Matcher m = Pattern.compile(".*[+-](\\d+).*\\((.*)\\).*").matcher(str);
if (m.matches()) {
System.out.println(m.group(1));
System.out.println(m.group(2));
}
This regex should give you a match with the digits after the +/- and the contents of the parentheses in the first and second capturing group, respectively:
"!date\\s[+-](\\d+)\\s\\(([^)]*)\\)"
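If the parentheses hold an actual timestamp made of digits rather than the format letters shown in the question, the following stricter regex also leads to 2 capturing groups with the contents you want: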
"!date\\s[+-](\\d+)\\s\\((\\d{4}-\\d{2}-\\d{2}'T'\\d{2}:\\d{2}:\\d{2}z)\\)"
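As a rough check, the sketch below applies the [^)]* pattern from this thread to both variants of the input string. DateCommandDemo and parse are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateCommandDemo {
    private static final Pattern COMMAND =
            Pattern.compile("!date\\s[+-](\\d+)\\s\\(([^)]*)\\)");

    // Hypothetical helper: returns {digits, bracket contents}, or null if the input does not match.
    static String[] parse(String input) {
        Matcher m = COMMAND.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] plus = parse("!date +10 (yyyy-MM-dd'T'HH:mm:ssz)");
        String[] minus = parse("!date -10 (yyyy-MM-dd'T'HH:mm:ssz)");
        System.out.println(plus[0] + " | " + plus[1]);   // 10 | yyyy-MM-dd'T'HH:mm:ssz
        System.out.println(minus[0] + " | " + minus[1]); // 10 | yyyy-MM-dd'T'HH:mm:ssz
    }
}
```

The sign itself is not captured here; wrapping [+-] in its own group would expose it if the caller needs to tell the two cases apart.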
