Regex pattern issue - java

I have the following string:
!date +10 (yyyy-MM-dd'T'HH:mm:ssz)
This string could also be (notice the minus instead of the plus):
!date -10 (yyyy-MM-dd'T'HH:mm:ssz)
I need a regex pattern that will extract the numeric digits after the + (or -). There could be more than one digit.
I also need a pattern to extract the contents of the parentheses ().
I've had a play around on RegexPal but couldn't get a working pattern.
Cheers.

To pick out the number & bracket content, you could do:
String str = "date +10 (yyyy-MM-dd'T'HH:mm:ssz)";
// Inside a character class | is a literal pipe, so [+-] is all that is needed here
Matcher m = Pattern.compile(".*[+-](\\d+).*\\((.*)\\).*").matcher(str);
if (m.matches()) {
System.out.println(m.group(1));
System.out.println(m.group(2));
}

This regex should give you a match with the digits after the +/- and the contents of the parentheses in the first and second capturing group, respectively:
"!date\\s[+-](\\d+)\\s\\(([^)]*)\\)"

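As a rough sketch of how the pattern from the preceding answer could be used, the class below wraps it in a helper. The class name DateOffsetParser, the method parse and the extra capturing group around the sign are assumptions made for illustration; they are not part of the original answers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateOffsetParser {
    // Group 1: the sign, group 2: the digits, group 3: the text inside the parentheses.
    // Capturing the sign is an addition for this sketch; the answer above only captures digits and format.
    private static final Pattern DATE_OFFSET =
            Pattern.compile("!date\\s([+-])(\\d+)\\s\\(([^)]*)\\)");

    /** Returns {sign, digits, format}, or null when the input does not match. */
    static String[] parse(String input) {
        Matcher m = DATE_OFFSET.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = parse("!date -10 (yyyy-MM-dd'T'HH:mm:ssz)");
        System.out.println(parts[0] + parts[1]); // -10
        System.out.println(parts[2]);            // yyyy-MM-dd'T'HH:mm:ssz
    }
}
```

Using [^)]* instead of .* for the parentheses keeps the match from running past the first closing parenthesis.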
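The following regex leads to 2 capturing groups with the contents you want. Note that it expects digits inside the parentheses (for example 2013-05-01'T'10:15:30z), so it does not match the literal format string yyyy-MM-dd'T'HH:mm:ssz shown in the question: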
"!date\\s[+-](\\d+)\\s\\((\\d{4}-\\d{2}-\\d{2}'T'\\d{2}:\\d{2}:\\d{2}z)\\)"

Related

Regular Expression For Duplicate Digits

What would be the regular expression to find a duplicated set of digits in a numeric string?
Suppose
String s="0.1234523452345234";
From this string I need to obtain "2345". I tried the following regex:
String s="0.1234523452345234";
String regex="(\\d+)\\1+\\b";
Pattern p=Pattern.compile(regex);
Matcher m=p.matcher(s);
if(m.find())
{
System.out.println(m.group(0));
}
But the output is
523452345234
while I need it to print
2345
"(\\d+)\\1+\\b" matches any sequence of digits followed immediately by the same sequence at least once. It can be followed by multiple occurrences of the sequence (the + quantifier). The regex also enforces a word boundary after the last matching sequence.
I think what you are looking for is the following regex:
"(\\d+).*\\1" (without a word boundary, with anything allowed between your sequences, and only one repetition of the sequence). Example:
0.1234789897897123499
  ^^^^         ^^^^---- (\\d+) and \\1
      ^^^^^^^^^-------- .*
If your longest run needs to be followed immediately by the duplicate (no fillers in between), then drop the .* from the regex.
group(0) will return the full match (e.g. 12347898978971234), group(1) will contain the first capturing group (e.g. 1234).
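To make the difference between group(0) and group(1) concrete, here is a small sketch that runs both variants (with and without the .* filler). The class name RepeatedDigits and the helper firstRepeat are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedDigits {
    /** Returns {group(0), group(1)} for the first match, or null when nothing matches. */
    static String[] firstRepeat(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? new String[] { m.group(), m.group(1) } : null;
    }

    public static void main(String[] args) {
        // No filler: the duplicate must follow the captured digits immediately.
        String[] adjacent = firstRepeat("(\\d+)\\1", "0.1234523452345234");
        System.out.println(adjacent[0]); // 23452345 (full match)
        System.out.println(adjacent[1]); // 2345     (captured sequence)

        // With .* between the two occurrences, other digits may sit in between.
        String[] spaced = firstRepeat("(\\d+).*\\1", "0.1234789897897123499");
        System.out.println(spaced[0]); // 12347898978971234
        System.out.println(spaced[1]); // 1234
    }
}
```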
I tried this regular expression, which finds a number that is immediately repeated once; m.group(1) returns the first occurrence:
String s="0.1234523452345234";
String regex="([0-9]+)\\1";
Pattern p=Pattern.compile(regex);
Matcher m=p.matcher(s);
if(m.find())
{
System.out.println(m.group(1));
}
Output :
2345

Searching for number after a specific word that does not immediately precede the number

I am trying to use a pattern to search for a Zip Code within a string. I cannot get it to work correctly.
A sample of the inputLine is
What is the weather in 75042?
What I am trying to use for a pattern is
public String getZipcode(String inputLine) {
Pattern pattern = Pattern.compile(".*weather.*([0-9]+).*");
Matcher matcher = pattern.matcher(inputLine);
if (matcher.find()) {
return matcher.group(1).toString();
}
return "Zipcode Not Found.";
}
If I want to get only 75042, what do I need to change? This outputs only the last digit of the number, 2. I am terribly confused and I do not completely understand the Javadocs for the Pattern class.
The reason is that the second .* is greedy: it matches the leading digits and leaves only the last one for your capturing group, so you have to drop it.
A simpler pattern can be used here: \D+(\d+)\D+, which means
some non-digits \D+, then some digits to capture (\d+), then some non-digits \D+
public String getZipcode(String inputLine) {
Pattern pattern = Pattern.compile("\\D+(\\d+)\\D+");
Matcher matcher = pattern.matcher(inputLine);
if (matcher.find()) {
return matcher.group(1).toString();
}
return "Zipcode Not Found.";
}
The problem is that your middle .* is too greedy and eats away 7504. One easy fix is to add a space before the capturing group: .*weather.* ([0-9]+).* or even use \\s. But the best option is the non-greedy version, .*?, so the regexp becomes .*weather.*?([0-9]+).*
Whitespace is missing from your regex (\s). You can use \s* or \s+, depending on your data:
Pattern pattern = Pattern.compile("weather\\s*\\w+\\s*(\\d+)");
Matcher matcher = pattern.matcher(inputLine);
Your .*weather.*([0-9]+).* pattern first grabs the whole line with the leading .* and then backtracks until it finds weather. The second .* then grabs the rest of the line and backtracks only as far as needed for [0-9]+ to match. One digit is enough to satisfy [0-9]+, so only the last digit ends up in capturing group 1. The final .* just consumes the rest of the line.
You can solve the issue by using ".*weather.*?([0-9]+).*" (making the second .* lazy), but since you are using Matcher#find(), a simpler regex is enough:
Pattern pattern = Pattern.compile("weather\\D*(\\d+)");
And after getting a match, retrieve the value with matcher.group(1).
Pattern details
weather - the literal word weather
\\D* - 0+ chars other than digits
(\\d+) - Capturing group 1: one or more digits
See the Java demo:
String inputLine = "What is the weather in 75042?";
Pattern pattern = Pattern.compile("weather\\D*(\\d+)");
Matcher matcher = pattern.matcher(inputLine);
if (matcher.find()) {
System.out.println(matcher.group(1)); // => 75042
}
I think all you need is \\d+
public String getZipcode(String inputLine) throws Exception {
Pattern pattern = Pattern.compile("\\d+");
Matcher matcher = pattern.matcher(inputLine);
if (matcher.find()) {
return matcher.group();
}
//A good practice is to throw an exception if no result found
throw new NoSuchElementException("Zipcode Not Found.");
}
In regular expressions, quantifiers that have no upper bound (*, +) are greedy by default.
Perfectly good solutions have already been suggested.
I'm just adding one that is very close to yours and addresses the problem in a more isolated way:
If you use the regex
".*weather.*?([0-9]+).*" ... instead of ...
".*weather.*([0-9]+).*"
... your solution will work perfectly well. The '?' after the asterisk instructs the regex compiler to treat the asterisk as non-greedy.
Greedy means consuming as many characters as possible (from left to right) while still allowing the remainder of the regex to match.
Non-greedy means consuming as few characters as possible while still allowing the remainder of the regex to match.
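The two definitions can be checked side by side with a short sketch. The class name GreedyVsLazy and the helper firstGroup are invented for this example; the patterns and the input line are the ones from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    /** Returns capturing group 1 of the first match, or null when nothing matches. */
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "What is the weather in 75042?";

        // Greedy .* keeps as many characters as it can and leaves a single digit for the group.
        System.out.println(firstGroup(".*weather.*([0-9]+).*", line));  // 2

        // Lazy .*? stops at the first digit, so the group receives the whole number.
        System.out.println(firstGroup(".*weather.*?([0-9]+).*", line)); // 75042
    }
}
```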

Java regex matcher doesn't group as expected

I have a regex
.*?(\\d+.*?\\d*).*?-.*?(\\d+.*?\\d*).*?
I want to match any string that contains a numerical value followed by "-" and another number. Any string can be in between.
Also, I want to be able to extract the numbers using group function of Java Matcher class.
Pattern pattern = Pattern.compile(".*?(\\d+.*?\\d*).*?-.*?(\\d+.*?\\d*).*?");
Matcher matcher = pattern.matcher("13.9 mp - 14.9 mp");
matcher.matches();
I expect this result:
matcher.group(1) // this should be 13.9 but it is 13 instead
matcher.group(2) // this should be 14.9 but it is 14 instead
Any idea what I am missing?
Your current pattern has several problems. As others have pointed out, your dots should be escaped with two backslashes if you intend for them to be literal dots. I think the pattern you want to use to match a number which may or may not have a decimal component is this:
(\\d+(?:\\.\\d+)?)
This matches the following:
\\d+ one or more digits
(?:\\.\\d+)? followed by a decimal point and one or more digits,
this entire group being optional
Full code:
Pattern pattern = Pattern.compile(".*?(\\d+(?:\\.\\d+)?).*?-.*?(\\d+(?:\\.\\d+)?).*?");
Matcher matcher = pattern.matcher("13.9 mp - 14.9 mp");
while (matcher.find()) {
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}
Output:
13.9
14.9
.*?(\d+\.*\d*).*?-.*?(\d+\.*\d*).*?
The . between \d+ and \d* in your regex matches any character; it should be changed to \. (an escaped, literal dot), as in the pattern above.
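A quick way to see the effect of escaping the dot is to run the question's pattern and the corrected one against the same input. This is only a sketch; the class name DecimalRange and the helper extract are assumptions for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DecimalRange {
    /** Returns {group(1), group(2)} when the whole input matches, otherwise null. */
    static String[] extract(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String input = "13.9 mp - 14.9 mp";

        // Unescaped, lazy .*? inside the group matches nothing, so the group stops at the dot.
        String[] original = extract(".*?(\\d+.*?\\d*).*?-.*?(\\d+.*?\\d*).*?", input);
        System.out.println(original[0] + " / " + original[1]); // 13 / 14

        // Escaped \\.* only matches literal dots, so the fractional part is captured.
        String[] fixed = extract(".*?(\\d+\\.*\\d*).*?-.*?(\\d+\\.*\\d*).*?", input);
        System.out.println(fixed[0] + " / " + fixed[1]); // 13.9 / 14.9
    }
}
```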

Constructing regex pattern to match sentence

I'm trying to write a regex pattern that will match any sentence that begins with one or more tabs and/or spaces.
For example, I want my regex pattern to be able to match " hello there I like regex!"
but I'm scratching my head over how to match the words after "hello". So far I have this:
String REGEX = "(?s)(\\p{Blank}+)([a-z][ ])*";
Pattern PATTERN = Pattern.compile(REGEX);
Matcher m = PATTERN.matcher(" asdsada adf adfah.");
if (m.matches()) {
System.out.println("hurray!");
}
Any help would be appreciated. Thanks.
String regex = "^\\s+[A-Za-z,;'\"\\s]+[.?!]$";
^ means "begins with"
\\s means white space
+ means 1 or more
[A-Za-z,;'"\\s] means any letter, ,, ;, ', ", or whitespace character
$ means "ends with"
An example regex to match sentences by the definition: "A sentence is a series of characters, starting with at least one whitespace character, that ends in one of ., ! or ?" is as follows:
\s+[^.!?]*[.!?]
Note that newline characters will also be included in this match.
A sentence starts with a word boundary (hence \b) and ends with one or more terminators. Thus:
\b[^.!?]+[.!?]+
https://regex101.com/r/7DdyM1/1
This gives pretty accurate results. However, it will not handle fractional numbers. E.g. This sentence will be interpreted as two sentences:
The value of PI is 3.141...
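The behaviour of the \b[^.!?]+[.!?]+ pattern, including the split on the decimal point, can be reproduced with a short sketch. The class name SentenceSplitter and the method sentences are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SentenceSplitter {
    // A word boundary, then anything up to a terminator, then one or more terminators.
    private static final Pattern SENTENCE = Pattern.compile("\\b[^.!?]+[.!?]+");

    /** Collects every match of the sentence pattern in the order found. */
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = SENTENCE.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sentences("Hello there. I like regex!"));
        // [Hello there., I like regex!]

        // The decimal point is treated as a terminator, so this is split in two.
        System.out.println(sentences("The value of PI is 3.141..."));
        // [The value of PI is 3., 141...]
    }
}
```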
If you are looking to match all strings starting with whitespace, you can try the regular expression "^\\s+.*".
This tool could help you to test your regular expression efficiently.
http://www.rubular.com/
Based upon what you desire and asked for, the following will work.
String s = " hello there I like regex!";
Pattern p = Pattern.compile("^\\s+[a-zA-Z\\s]+[.?!]$");
Matcher m = p.matcher(s);
if (m.matches()) {
System.out.println("hurray!");
}
String regex = "(?<=^|(\\.|!|\\?) |\n|\t|\r|\r\n) *\\(?[A-Z][^.!?]*((\\.|!|\\?)(?! |\n|\r|\r\n)[^.!?]*)*(\\.|!|\\?)(?= |\n|\r|\r\n)";
This matches any sentence that follows the definition 'a sentence starts with a capital letter and ends with a dot, exclamation mark or question mark'.
The below regex pattern matches sentences in a paragraph.
Pattern pattern = Pattern.compile("\\b[\\w\\p{Space}“”’\\p{Punct}&&[^.?!]]+[.?!]");
Reference: https://devsought.com/regex-pattern-to-match-sentence

Java Regex Behavior

I am trying to apply the below pattern:
Pattern p = Pattern.compile(".*?");
Matcher m = p.matcher("RAJ");
StringBuffer sb = new StringBuffer();
while(m.find()) {
m.appendReplacement(sb, "L");
}
m.appendTail(sb);
Expected Output : LLL
Actual output : LRLALJL
Does the dot (.) in the above regex match the position between the characters? If not, why do I get the above output?
The .*? matches any number of characters, but as few as necessary to match the whole regex (the ? makes the * reluctant (also known as lazy)). Since there's nothing after that in the regex, this will always match the empty string (a.k.a the place between characters).
If you want at least a single character to be matched try .+?. Note that this is the same as just . if there's nothing else after it in the regex.
You can get the expected output like this:
String s = "RAJ";
s = s.replaceAll(".","L");
System.out.println(s);
You could also do it with a Matcher and its find method, but replaceAll already accepts a regex.
It is not that . matches between the characters, but that * means 0 or more and the ? means as few as possible.
So "Zero or more things, and as few of them as possible" will always match Zero things, as that is the fewest possible, if it's not followed by something else the expression is looking for.
.{1} would result in an output of LLL as it matches anything once.
The * in your regex .*? means none or more repetitions. If you want to match at least a single character use the regex .+?.
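The answers above can be compared directly with String.replaceAll, which applies the same find-and-replace loop as the question's code. The class name LazyStarDemo and the helper replaceWithL are made up for this sketch.

```java
class LazyStarDemo {
    /** Replaces every match of the regex in the input with "L". */
    static String replaceWithL(String regex, String input) {
        return input.replaceAll(regex, "L");
    }

    public static void main(String[] args) {
        // .*? matches the empty string at every position, including the end of the input.
        System.out.println(replaceWithL(".*?", "RAJ")); // LRLALJL

        // .+? needs at least one character and, being lazy, takes exactly one.
        System.out.println(replaceWithL(".+?", "RAJ")); // LLL

        // A plain . behaves the same way here.
        System.out.println(replaceWithL(".", "RAJ"));   // LLL
    }
}
```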
