Regex matcher not giving expected result. Not matching number properly - java

I cannot understand why the 2nd group gives me only 0. I expect 3000. Could you also point me to a resource where I can understand this better?
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexMatches {
public static void main( String args[] ) {
// String to be scanned to find the pattern.
String line = "This order was placed for QT3000! OK?";
String pattern = "(.*)(\\d+)(.*)";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
if (m.find( )) {
System.out.println("Found value: " + m.group(0) );
System.out.println("Found value: " + m.group(1) );
System.out.println("Found value: " + m.group(2) );//?
System.out.println("Found value: " + m.group(3) );
}else {
System.out.println("NO MATCH");
}
}
}

Make the pattern more precise: add QT before the \d pattern, or use .*? instead of the first .* to match as few chars as possible.
String pattern = "(.*QT)(\\d+)(.*)";
or
String pattern = "(.*?)(\\d+)(.*)";
will do. See a Java demo.
The (.*QT)(\\d+)(.*) will match and capture into Group 1 any 0+ chars other than line break chars, as many as possible, up to the last occurrence of QT (followed with the subsequent subpatterns), then will match and capture 1+ digits into Group 2, and then will match and capture into Group 3 the rest of the line.
The .*? in the alternative pattern will match and capture into Group 1 any 0+ chars other than line break chars, as few as possible, up to the first chunk of 1 or more digits.
You may also use a simpler pattern like String pattern = "QT(\\d+)"; to get all digits after QT, and the result will be in Group 1 then (you won't have the text before and after the number).
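To see the difference concretely, here is a minimal sketch that runs the patterns discussed above against the line from the question and prints the captured number. The class name GreedyVsLazyDemo and the helper method group are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazyDemo {
    // Returns the text of the requested group, or null if the pattern is not found.
    static String group(String regex, String input, int groupIndex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(groupIndex) : null;
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        // Greedy: the first .* takes as much as it can and leaves one digit for \d+.
        System.out.println(group("(.*)(\\d+)(.*)", line, 2));   // 0
        // Lazy: the first .*? stops at the first digit, so \d+ gets the whole number.
        System.out.println(group("(.*?)(\\d+)(.*)", line, 2));  // 3000
        // Explicit context: QT pins down where the number starts.
        System.out.println(group("(.*QT)(\\d+)(.*)", line, 2)); // 3000
        // Simplest form: only the number is captured, in Group 1.
        System.out.println(group("QT(\\d+)", line, 1));         // 3000
    }
}
```

With the greedy pattern, Group 1 ends up as "This order was placed for QT300", which is why only the final 0 is left for Group 2.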

The * quantifier will try to match as many as possible, because it is a greedy quantifier.
You can make it non-greedy (lazy) by changing it to *?
Then, your regex will become :
(.*?)(\d+)(.*)
And you will match 3000 in the 2nd capturing group.
Here is a regex101 demo

Related

Merge two pattern into one

I need to write a pattern that removes the currency symbol and the commas, e.g. from Fr.-145,000.01
After matching, the pattern should return -145000.01.
The pattern I am using:
^[^0-9\\-]*([0-9\\-\\.\\,]*?)[^0-9\\-]*$
This returns -145,000.01
I then remove the commas to get -145000.01. Is it possible to change the pattern so that I get -145000.01 directly?
String pattern = "^[^0-9\\-]*([0-9\\-\\.\\,]*?)[^0-9\\-]*$";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(str);
if(m.matches()) {
System.out.println(m.group(1));
}
I expect the output to come back with the commas already removed.
You can simplify it with String.replaceAll() and a simpler regex (provided the input is reasonably sane, i.e. without multiple decimal points embedded in the numbers or multiple negative signs)
String str = "Fr.-145,000.01";
str = str.replaceAll("[^\\d-.]\\.?", "");
If you are going down this route, I would sanity check it by parsing the output with BigDecimal or Double.
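A minimal sketch of that sanity check, assuming BigDecimal is the target type. The class name CurrencyCleaner and the method parseAmount are made up here, and the character class is written as [^\\d.-] with the hyphen last, which is meant to be equivalent to the one above.

```java
import java.math.BigDecimal;

class CurrencyCleaner {
    // Strips everything that is not a digit, dot or minus (plus a dot that directly
    // follows such a character, e.g. the one in "Fr."), then lets BigDecimal validate.
    static BigDecimal parseAmount(String raw) {
        String cleaned = raw.replaceAll("[^\\d.-]\\.?", "");
        // Throws NumberFormatException if what is left is not a valid number.
        return new BigDecimal(cleaned);
    }

    public static void main(String[] args) {
        System.out.println(parseAmount("Fr.-145,000.01")); // -145000.01
    }
}
```

If the input turns out not to be sane after all (for example, nothing numeric is left), the BigDecimal constructor fails loudly instead of passing a broken string along.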
One approach would be to just collect our desired digits, ., + and - in a capturing group followed by an optional comma, and then join them:
([+-]?[0-9][0-9.]+),?
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = "([+-]?[0-9][0-9.]+),?";
final String string = "Fr.-145,000.01\n"
+ "Fr.-145,000\n"
+ "Fr.-145,000,000\n"
+ "Fr.-145\n"
+ "Fr.+145,000.01\n"
+ "Fr.+145,000\n"
+ "Fr.145,000,000\n"
+ "Fr.145\n"
+ "Fr.145,000,000,000.01";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}
Demo
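The test above prints each match separately; the joining step is not shown. A minimal sketch of it could look like the following, where the class name JoinNumberChunks and the method join are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JoinNumberChunks {
    private static final Pattern CHUNK = Pattern.compile("([+-]?[0-9][0-9.]+),?");

    // Appends Group 1 of every match, which drops the commas between the chunks.
    static String join(String input) {
        Matcher m = CHUNK.matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            sb.append(m.group(1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join("Fr.-145,000.01")); // -145000.01
        System.out.println(join("Fr.145,000,000")); // 145000000
    }
}
```

Note that [0-9][0-9.]+ needs at least two characters per chunk, so a single-digit amount would not be picked up by this pattern.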
String str = "Fr.-145,000.01";
Pattern regex = Pattern.compile("^[^0-9-]*(-?[0-9]+)(?:,([0-9]{3}))?(?:,([0-9]{3}))?(?:,([0-9]{3}))?(\\.[0-9]+)?[^0-9-]*$");
Matcher matcher = regex.matcher(str);
System.out.println(matcher.replaceAll("$1$2$3$4$5"));
Output:
-145000.01
It looks for a number with up to 3 commas (up to 999,999,999,999.99) and replaces the whole input with the captured sign, digits and decimal part.
My approach would be to remove all the unnecessary parts using replaceAll.
The unnecessary parts are, apparently:
Any sequence which is not digits or minus at the beginning of the string.
Commas
The first pattern is represented by ^[^\\d-]+. The second is merely ,.
Put them together with an |:
Pattern p = Pattern.compile("(^[^\\d-]+)|,");
Matcher m = p.matcher(str);
String result = m.replaceAll("");
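Wrapped into something runnable, the same idea might look like this. The class name StripPrefixAndCommas and the method clean are made up for the example; String.replaceAll is used as a shorthand for the Pattern/Matcher pair above.

```java
class StripPrefixAndCommas {
    // Removes a leading run of characters that are neither digits nor minus, and all commas.
    static String clean(String str) {
        return str.replaceAll("(^[^\\d-]+)|,", "");
    }

    public static void main(String[] args) {
        System.out.println(clean("Fr.-145,000.01")); // -145000.01
        System.out.println(clean("Fr.145,000,000")); // 145000000
    }
}
```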
You could use 2 capturing groups and make use of repeated matching using the \G anchor to assert the position at the end of the previous match.
(?:^[^0-9+-]+(?=[.+,\d-]*\.\d+$)([+-]?\d{1,3})|\G(?!^)),(\d{3})
In Java
String regex = "(?:^[^0-9+-]+(?=[.+,\\d-]*\\.\\d+$)([+-]?\\d{1,3})|\\G(?!^)),(\\d{3})";
Explanation
(?: Non capturing group
^[^0-9+-]+ Match 1+ times not a digit, + or -
(?= Positive lookahead, assert that what follows is:
[.+,\d-]*\.\d+$ Match 0+ times what is allowed and assert ending on . and 1+ digits
) Close positive lookahead
( Capturing group 1
[+-]?\d{1,3}) Match optional + or - followed by 1-3 digits
| Or
\G(?!^) Assert position at the end of previous match, not at the start
), Close non capturing group and match ,
(\d{3}) Capture in group 2 matching 3 digits
In the replacement use the 2 capturing groups $1$2
See the Regex demo | Java demo
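As a rough sketch of how the replacement plays out in Java (the class name ContinuedMatchDemo and the method clean are invented for this example):

```java
import java.util.regex.Pattern;

class ContinuedMatchDemo {
    private static final Pattern P = Pattern.compile(
        "(?:^[^0-9+-]+(?=[.+,\\d-]*\\.\\d+$)([+-]?\\d{1,3})|\\G(?!^)),(\\d{3})");

    // The first match removes the prefix and keeps sign + leading digits ($1) and the
    // next 3 digits ($2). Later matches continue at \G; Group 1 is then unset, which
    // Java's replaceAll treats as an empty string.
    static String clean(String s) {
        return P.matcher(s).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        System.out.println(clean("Fr.-145,000.01"));        // -145000.01
        System.out.println(clean("Fr.145,000,000,000.01")); // 145000000000.01
    }
}
```

Because of the lookahead, this pattern only touches inputs that end in a dot followed by digits.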

Regex to match strings in-between double quotes that are not containing some other strings

How to match words between double quotes in lines not containing specific words
input:
System.log("error");
new Exception("error");
view.setText("message");
From the above input, I would like to ignore lines with log and Exception words in them(Case sensitive) and match words in between double quotes.
Expected output
message
I have been trying to use look ahead without luck
(?s)^(?!log)".+"
I need this for a search in IntelliJ using regex
In your pattern (?s)^(?!log)".+" the negative lookahead does not contain a quantifier so it will assert that what is directly after the start of the string is not log
What you could do is use a quantifier .* with an alternation to match either log or Exception and add word boundaries \b to prevent them being part of a larger word.
Then you might use negated character classes [^"] to match not a double quote and use a capturing group ([^"]+) for the value between the double quotes.
^(?!.*\b(?:log|Exception)\b)[^"]*"([^"]+)"
In Java:
String regex = "^(?!.*\\b(?:log|Exception)\\b)[^\"]*\"([^\"]+)\"";
Regex demo
If you want to make the dot to match a newline you can prepend (?s) to the pattern.
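Outside IntelliJ, the pattern can be tried in plain Java. This is a minimal sketch that assumes the input is processed line by line via Pattern.MULTILINE; the class name QuotedTextFilter and the method extract are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedTextFilter {
    // MULTILINE makes ^ match at the start of every line, so each line is checked on its own.
    private static final Pattern P = Pattern.compile(
        "^(?!.*\\b(?:log|Exception)\\b)[^\"]*\"([^\"]+)\"", Pattern.MULTILINE);

    // Collects the quoted text of every line that contains neither log nor Exception.
    static List<String> extract(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = P.matcher(source);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "System.log(\"error\");\n"
                     + "new Exception(\"error\");\n"
                     + "view.setText(\"message\");";
        System.out.println(extract(input)); // [message]
    }
}
```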
My guess is that this expression would likely work for capturing the message,
^(?!.*log.*|.*exception.*).*?"(.+?)".*
Demo 1
Example
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = "^(?!.*log.*|.*exception.*).*?\"(.+?)\".*";
final String string = "System.log(\"error\");\n"
+ "new Exception(\"error\");\n"
+ "view.setText(\"message\");";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}

Get number into string with regex

I have to extract two numbers from a string that always looks like this:
file_Sig201701311539043872_1736587_614007_marketing.000
I need to save them in separate values:
1736587
614007
How can I do this?
I tried \_(.*?)\_ but it does not work properly.
Try the following pattern matcher:
final Pattern NUMBER_MATCHER = Pattern.compile("_(\\d+)_(\\d+)");
Matcher matcher = NUMBER_MATCHER.matcher(/* your file name */);
if (matcher.find()) {
System.out.println("matcher.group(1) = " + matcher.group(1));
System.out.println("matcher.group(2) = " + matcher.group(2));
}
which prints:
matcher.group(1) = 1736587
matcher.group(2) = 614007
For now the regex works for underscore followed by any number of digits followed by an underscore and then again any number of digits.
You can leverage lookarounds:
(?<=_)(\\d+)(?=_)
The captured groups would contain the required digits.
The zero width positive lookbehind, (?<=_) makes sure the match is preceded by a _
The zero width positive lookahead, (?=_) makes sure the match is followed by a _
(\d+) matches one or more digits and put in captured group
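A short sketch of the lookaround version in use. Since the underscores are not consumed, find() has to be called in a loop to get both numbers; the class name UnderscoreNumbers and the method extract are assumptions for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnderscoreNumbers {
    private static final Pattern P = Pattern.compile("(?<=_)(\\d+)(?=_)");

    // Returns every run of digits that has an underscore directly before and after it.
    static List<String> extract(String fileName) {
        List<String> numbers = new ArrayList<>();
        Matcher m = P.matcher(fileName);
        while (m.find()) {
            numbers.add(m.group(1));
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(extract("file_Sig201701311539043872_1736587_614007_marketing.000"));
        // [1736587, 614007]
    }
}
```

The digits in Sig201701311539043872 are skipped because the character before them is a letter or another digit, not an underscore.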
Try with this one.
Sample Code
final Pattern NUMBER_MATCHER = Pattern.compile("_(\\d*)_(\\d*)");
Matcher matcher = NUMBER_MATCHER.matcher("file_Sig201701311539043872_1736587_614007_marketing.000");
if (matcher.find()) {
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}

get everything after a particular string

I have a String coming as "process_client_123_Tree" and "process_abc_pqr_client_123_Tree". I want to extract everything after "process_client_" and "process_abc_pqr_client_" and store it in a String variable.
Here currentKey variable can contain either of above two strings.
String clientId = // how to use currentKey here so that I can get remaining portion in this variable
What is the right way to do this? Should I just use split here or some regex?
import java.util.regex.*;
class test
{
public static void main(String args[])
{
Pattern pattern=Pattern.compile("^process_(client_|abc_pqr_client_)(.*)$");
Matcher matcher = pattern.matcher("process_client_123_Tree");
while(matcher.find())
System.out.println("String 1 Group 2: "+matcher.group(2));
matcher = pattern.matcher("process_abc_pqr_client_123_Tree");
while(matcher.find())
System.out.println("String 2 Group 2: "+matcher.group(2));
System.out.println("Another way..");
System.out.println("String 1 Group 2: "+"process_client_123_Tree".replace("process_client_", ""));
System.out.println("String 2 Group 2: "+"process_abc_pqr_client_123_Tree".replace("process_abc_pqr_client_", ""));
}
}
Output:
$ java test
String 1 Group 2: 123_Tree
String 2 Group 2: 123_Tree
Another way..
String 1 Group 2: 123_Tree
String 2 Group 2: 123_Tree
Regex breakup:
^ match start of line
process_(client_|abc_pqr_client_) match "process_" followed by "client_" or "abc_pqr_client_" (captured as group 1)
(.*)$ . means any char and * means 0 or more times, so it matches the remaining chars in the string up to the end ($) and captures them as group 2
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Matchit{
public static void main(String []args){
String str = "process_abc_pqr_client_123_Tree";
Pattern p = Pattern.compile("process_abc_pqr_client_(.*)|process_client_(.*)");
Matcher m = p.matcher(str);
if (m.find( )) {
// Only one alternative matches, so the other group is null; print the one that matched.
System.out.println("Found value: " + (m.group(1) != null ? m.group(1) : m.group(2)) );
}
}
}
Gets you:
Found value: 123_Tree
The parentheses in the regexp define the capture groups. The pipe is a logical or. A dot means any character and a star means zero or more repetitions. So, I create a pattern object with that regexp and then use a matcher object to get the part of the string that has been matched.
A regex pattern could be: "process_(?:abc_pqr_)?client_(\\w+)" regex101 demo
(?:abc_pqr_)? is the optional part
(?: opens a non capture group )? zero or one times
\w+ matches one or more word characters [A-Za-z0-9_]
Demo at RegexPlanet. Matches will be in group(1) / first capturing group.
To extend it with limit to the right, match lazily up to the right token
"process_(?:abc_pqr_)?client_(\\w+?)_trace_count"
where \w+? matches as few as possible word characters to meet condition.
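For completeness, a minimal sketch of the optional non-capturing group applied to both inputs from the question. The class name ClientIdExtractor and the method clientId are made up; returning null when nothing matches is just one possible choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClientIdExtractor {
    private static final Pattern P = Pattern.compile("process_(?:abc_pqr_)?client_(\\w+)");

    // Returns everything after the "process_client_" / "process_abc_pqr_client_" prefix,
    // or null if currentKey has neither prefix.
    static String clientId(String currentKey) {
        Matcher m = P.matcher(currentKey);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(clientId("process_client_123_Tree"));         // 123_Tree
        System.out.println(clientId("process_abc_pqr_client_123_Tree")); // 123_Tree
    }
}
```

Because the optional part is non-capturing, the result is always in group(1), whichever prefix the key has.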

regex pattern won't match anything

I'd like my mPattern to match FFF1 or FFF3 strings at least 4 times in a search-string. I've written two pattern versions but neither of those give any matches.
Pattern mPattern = Pattern.compile("(FFF1|FFF3){4,}");
ver2:
Pattern mPattern = Pattern.compile("(FFF1{4,}|FFF3{4,})");
search-string is (example):
0DCB1C992B37173740244875C143D50ACDBA0422CD01D73D3C78F05ED7BBC2B33F9D78A7FFF342C0241C6B56B11EC1867984C20F42A4FAC5B9C0
42220314C006D94E124673CD4CC27FC2FCE12215410F12086BE5A3EDFC6DB2BEB0EAEC6EAAA4BF997FFB3337F914AB1A89C808EA6D338912D72E
99CE11E899999D3AE1092590FB2B71D736DC544B0AFD1035A3FFF340C00E178B62E5BE48C46F04B8EFC106AE3F17DDE08B5FD48672EBEABB216A
8438B6FB3B33BF91D3F3EBFCE14184320532ABA37FFD59BFF6ABAD1AA9AADEE73220679D2C7DDBAB766433A99D8CA752B383067465691750A24A
00F32A5078E29258F6D87A620AFFF342C00A158B22E5BE5944BAE8BA2C54739BE486B719A76DF5FD984D5257DBEAC43B238598EFAB3592DE8DD5
The pattern "(FFF1|FFF3){4,}" matches FFF1 or FFF3 only when they occur back to back, 4 or more times in a row. I guess there can be any characters between the occurrences. In that case, use the following regex:
"^(?:.*?(FFF1|FFF3)){4,}.*$"
.*? lazily matches any characters up to the next FFF1 or FFF3, which (FFF1|FFF3) then matches. This sequence is repeated 4 or more times (the quantifier applies to the entire non-capturing group).
You can use the above pattern directly with String#matches(String) method. Or, if you are building Pattern and Matcher objects, then just use the following pattern with Matcher#find() method:
"(?:.*?(FFF1|FFF3)){4,}"
Working code:
String str = "..."; // initialize
Pattern mPattern = Pattern.compile("(?x)" + // Ignore whitespace
"(?: " + // Non-capturing group
" .*? " + // 0 or more repetition of any character
" (FFF1|FFF3) " + // FFF1 or FFF3
"){4,} " // Group close. Match group 4 or more times
);
Matcher matcher = mPattern.matcher(str);
System.out.println(matcher.find());
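A different technique from the single regex above, offered only as an alternative sketch: count the individual FFF1/FFF3 occurrences with a find() loop and compare the count with 4. The class name MarkerCounter, its methods and the short test string are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MarkerCounter {
    private static final Pattern MARKER = Pattern.compile("FFF1|FFF3");

    // Counts non-overlapping occurrences of FFF1 or FFF3 anywhere in the input.
    static int count(String input) {
        Matcher m = MARKER.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    static boolean hasAtLeast(String input, int minimum) {
        return count(input) >= minimum;
    }

    public static void main(String[] args) {
        String sample = "FFF1xxFFF3yyFFF1zzFFF3"; // hypothetical input, not the one from the question
        System.out.println(count(sample));         // 4
        System.out.println(hasAtLeast(sample, 4)); // true
    }
}
```

One practical difference: the dot in .*? does not match line breaks by default, so if the search string contains line breaks the single-regex version would need Pattern.DOTALL, while the counting loop does not depend on the dot at all.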
