Java regex pattern group capture

I'm trying to split the string below into 3 groups, but it doesn't seem to work as expected with the pattern I'm using. Namely, when I invoke matcher.group(3), I get a null value instead of *;+g.3gpp.cs-voice;require. What's wrong with the pattern?
String: "*;+g.oma.sip-im;explicit,*;+g.3gpp.cs-voice;require"
Pattern: (\\*;.*)?(\\*;.*?\\+g.oma.sip-im.*?)(,\\*;.*)?
Expected:
Group 1: null,
Group 2: *;+g.oma.sip-im;explicit,
Group 3: ,*;+g.3gpp.cs-voice;require
Actual:
Group 1: null,
Group 2: *;+g.oma.sip-im,
Group 3: null

The result you get does match your pattern; it just matches lazily. Group 2 is expanded only to the shortest possible result
*;+g.oma.sip-im
and then the last group is left out: the trailing .*? can match zero characters and the final group is optional because of the question mark at the very end, so the engine can stop right after sip-im. It appears to me that you are building a regex that is far too complicated for your purpose.

The thing is that (,\*;.*)? does not match right after Group 2, because the text you expect it to capture is located further along in the string. You need to make the third group obligatory by removing the ? at its end, and wrap the whole .*? + Group 3 in an optional non-capturing group:
String pat = "(\\*;.*)?(\\*;.*?\\+g\\.oma\\.sip-im)(?:.*?(,\\*;.*))?";
See the regex demo.
Note that literal dots should be escaped in the regex pattern.
Details:
(\\*;.*)? - Group 1 (optional) capturing
\\*; - a *; string
.* - any zero or more chars other than linebreak symbols, as many as possible
(\\*;.*?\\+g\\.oma\\.sip-im) - Group 2 (obligatory) capturing
\\*; - a *; string
.*? - any zero or more chars other than linebreak symbols, as few as possible
\\+g\\.oma\\.sip-im - a literal string +g.oma.sip-im
(?:.*?(,\\*;.*))? - non-capturing group (optional) matching
.*? - any zero or more chars other than linebreak symbols, as few as possible
(,\\*;.*) - Group 3 (obligatory within the optional group) capturing a comma followed by the same pattern as in Group 1.
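A minimal, self-contained way to check this is to run both the pattern from the question and the fixed one against the sample string and print the three groups. The class name SipGroups and the helper groups() are placeholders for this sketch, and it assumes find() (first match) is how the groups are read.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SipGroups {
    static final String INPUT = "*;+g.oma.sip-im;explicit,*;+g.3gpp.cs-voice;require";

    // Pattern from the question: the optional Group 3 must start right after Group 2
    static final Pattern ORIGINAL =
            Pattern.compile("(\\*;.*)?(\\*;.*?\\+g.oma.sip-im.*?)(,\\*;.*)?");

    // Fixed pattern: the text between Group 2 and Group 3 is consumed
    // inside an optional non-capturing group
    static final Pattern FIXED =
            Pattern.compile("(\\*;.*)?(\\*;.*?\\+g\\.oma\\.sip-im)(?:.*?(,\\*;.*))?");

    // Returns groups 1..3 of the first match, or null if the pattern does not match
    static String[] groups(Pattern p, String s) {
        Matcher m = p.matcher(s);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        for (Pattern p : new Pattern[] {ORIGINAL, FIXED}) {
            String[] g = groups(p, INPUT);
            System.out.println(p.pattern());
            System.out.println("  Group 1: " + g[0]);
            System.out.println("  Group 2: " + g[1]);
            System.out.println("  Group 3: " + g[2]);
        }
    }
}
```

For the original pattern Group 3 comes back null, as in the question; for the fixed one it holds ,*;+g.3gpp.cs-voice;require, while Group 2 stays *;+g.oma.sip-im.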

Related

Trying to capture multiple groups in regex while skipping others

I am trying to create a single regex expression which will have the following result on two different example texts:
Example 1
Example text 1: "App Name: Person Name"
Captured group 1: "App Name"
Captured group 2: "Person Name"
Example 2
Example text 2: "App Name (1 factor): Person Name"
Captured group 1: "App Name"
Captured group 2: "Person Name"
The regex expression I have come up with is: (.*)(\s\(.*\))?:\s(.*)
But it doesn't seem to be capturing correctly and I can't see why.
I am trying this in Java on Android (and I am using a double backslash to escape in the string)
I think what you're looking for is something like:
([A-Za-z0-9\s]*)(\s\(.*\))?:\s(.*)
The (.*) in the first group you have is capturing every character greedily. You have to specify what kind of characters can come before the (. I used regex101.com to test, and it seems to work for your provided cases.
You may use
^(.*?)(?:\s*\([^()]*\))?:\s*(.*)$
See the regex demo.
Details
^ - start of string
(.*?) - Capturing group 1: any zero or more chars other than line break chars, as few as possible
(?:\s*\([^()]*\))? - an optional non-capturing group matching 1 or 0 occurrences of
\s* - 0+ whitespaces
\([^()]*\) - a (, zero or more chars other than ( and ) and then )
: - a colon
\s* - 0 or more whitespaces
(.*) - Capturing group 2: any zero or more chars other than line break chars, as many as possible
$ - end of string.
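As a sketch of how this could be used from Java (java.util.regex, which Android also provides), the pattern below is the same regex written as a Java string literal. The class AppNameSplit and its split() method are hypothetical names; matches() is used on the assumption that each input is one whole line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppNameSplit {
    // Java string form of ^(.*?)(?:\s*\([^()]*\))?:\s*(.*)$
    static final Pattern LINE =
            Pattern.compile("^(.*?)(?:\\s*\\([^()]*\\))?:\\s*(.*)$");

    // Returns {group 1, group 2}, or null when the line contains no colon
    static String[] split(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        String[] samples = {"App Name: Person Name", "App Name (1 factor): Person Name"};
        for (String s : samples) {
            String[] parts = split(s);
            System.out.println("Group 1: \"" + parts[0] + "\", Group 2: \"" + parts[1] + "\"");
        }
    }
}
```

Both samples give Group 1 "App Name" and Group 2 "Person Name": the lazy (.*?) stops as soon as the optional parenthesized part and the colon can match.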
Just exclude : (and line breaks) from both captured parts with a negated character class:
Pattern
([^:\n]+)\s*:\s*([^:\n]+)
See the demo here.

Trying to match possible tags in string by regex

These are my possible inputs:
"#smoke"
"#smoke,#Functional1" (OR condition)
"#smoke,#Functional1,#Functional2" (OR condition)
"#smoke","#Functional1" (AND condition),
"#smoke","~#Functional1" (SKIP condition),
"~#smoke","~#Functional1" (NOT condition)
(Please note: the input string for the regex stops at the last " character on each line; no space or comma follows it.)
The regex I came up with so far is
"((?:[~#]{1}\w*)+),?"
This matches in capturing groups for the samples 1, 4, 5 and 6 but NOT 2 and 3.
I am not sure how to continue tweaking it further, any suggestions?
I would like to capture the preceding boolean meaning of the tag (eg: ~) as well please.
If you have any suggestions to pre-process the string in Java before regex that would make it simpler, I am open to that possibility as well.
Thanks.
It seems that you want to match an optional ~ followed by a # and get iterative matches for group 1. You could make use of the \G anchor, which matches either at the start of the input or at the end of the previous match.
(?:"(?=.*"$)|\G(?!^))(~?#\w+(?:,~?#\w+)*)"?[,\h]?
Explanation
(?: Non capture group
"(?=.*"$) Match " and assert that the string ends with "
| Or
\G(?!^) Assert the position at the end of the previous match, not at the start
) Close non capture group
( Capture group 1
~?#\w+(?:,~?#\w+)* Match an optional ~, then # and 1+ word characters, and repeat that 0+ times with a comma prepended
)"? Close group 1 and match an optional "
[,\h]? Optionally match either a comma or a horizontal whitespace char.
Regex demo | Java demo
Example code
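import java.util.regex.Matcher;
import java.util.regex.Pattern;

String regex = "(?:\"(?=.*\"$)|\\G(?!^))(~?#\\w+(?:,~?#\\w+)*)\"?[,\\h]?";
String string = "\"#smoke\"\n"
        + "\"#smoke,#Functional1\"\n"
        + "\"#smoke,#Functional1,#Functional2\"\n"
        + "\"#smoke\",\"#Functional1\"\n"
        + "\"#smoke\",\"~#Functional1\"\n"
        + "\"~#smoke\",\"~#Functional1\"";
// MULTILINE lets $ in the lookahead match at the end of each line, not only at the end of the input
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);
// Each find() yields one tag group; \G lets the next match continue where the previous one ended
while (matcher.find()) {
    System.out.println(matcher.group(1));
}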
Output
#smoke
#smoke,#Functional1
#smoke,#Functional1,#Functional2
#smoke
#Functional1
#smoke
~#Functional1
~#smoke
~#Functional1
Edit
If there are no consecutive matches, you could also use:
"(~?#\w+(?:,~?#\w+)*)"
Regex demo
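Since the question mentions being open to pre-processing in Java, one option (an assumption, not part of the answer above) is to run the simpler pattern once per input line, so the tags of each line stay together in one list. QuotedTags and extract() are placeholder names for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedTags {
    // Java string form of "(~?#\w+(?:,~?#\w+)*)" (the quotes are part of the pattern)
    static final Pattern TAGS = Pattern.compile("\"(~?#\\w+(?:,~?#\\w+)*)\"");

    // Collects group 1 of every quoted tag list found in one input line
    static List<String> extract(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = TAGS.matcher(line);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] lines = {
            "\"#smoke\"",
            "\"#smoke,#Functional1\"",
            "\"#smoke,#Functional1,#Functional2\"",
            "\"#smoke\",\"#Functional1\"",
            "\"#smoke\",\"~#Functional1\"",
            "\"~#smoke\",\"~#Functional1\""
        };
        for (String line : lines) {
            // A comma-separated (OR) line yields one element; separately quoted parts yield several
            System.out.println(line + " -> " + extract(line));
        }
    }
}
```

The leading ~ stays inside each captured tag, so the SKIP/NOT meaning can still be read from the captured text.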

Regex to continue matching the similar pattern

I have a JCL statement to be matched with a regex pattern.
The statement looks like this:
//name JOB optionalParam,keyword=param,keyword=param,keyword=param
Actual statements look like this:
//ADBB503 JOB ,MSGCLASS=2,CLASS=P
//ABCD JOB Something,MSG=NTNG,CLASS=ABC
I have tried a regular expression to match the parts in groups, but the keyword=param pair can occur any number of times, and I need to keep matching as long as pairs remain.
String regex= (\/\/)(\w+)(\s+)(JOB)(\s+)(\w+)?(,)([\w+=\w+]+);
My trial is in the link given below
https://regex101.com/r/gUyRMV/1
The problem I am facing is that only one keyword=parameter pair matches. All N keyword/parameter pairs need to be matched.
You could match the job statement in the first capturing group and make use of \G to get the parameters in group 2:
(?:(//\w+\s+JOB(?: \w+)?)\h*|\G(?!^)),(\w+=\w+)
Explanation
(?: Non capturing group
( Capture group 1
//\w+\s+JOB Match //, 1+ word chars and JOB
(?: \w+)? Optionally match a space and the optional param
)\h* Close group and match 0+ horizontal whitespace chars
| Or
\G(?!^) Assert position at the end of previous match, not at the start
), Close non capturing group and match ,
( Capture group 2
\w+=\w+ Match 1+ word chars, an = sign, and 1+ word chars
) Close group
In java
String regex = "(?:(//\\w+\\s+JOB(?: \\w+)?)\\h*|\\G(?!^)),(\\w+=\\w+)";
Regex demo | Java demo
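A minimal sketch of collecting all keyword=param pairs with that Java string, assuming each statement is processed on its own. JclParams and params() are hypothetical names; note that \h needs Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JclParams {
    // Java string form of (?:(//\w+\s+JOB(?: \w+)?)\h*|\G(?!^)),(\w+=\w+)
    static final Pattern JOB =
            Pattern.compile("(?:(//\\w+\\s+JOB(?: \\w+)?)\\h*|\\G(?!^)),(\\w+=\\w+)");

    // Collects group 2 (keyword=param) from every match. Group 1 (the //name JOB part)
    // is only non-null on the first match; later matches come from the \G branch.
    static List<String> params(String statement) {
        List<String> result = new ArrayList<>();
        Matcher m = JOB.matcher(statement);
        while (m.find()) {
            result.add(m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] statements = {
            "//ADBB503 JOB ,MSGCLASS=2,CLASS=P",
            "//ABCD JOB Something,MSG=NTNG,CLASS=ABC"
        };
        for (String s : statements) {
            System.out.println(s + " -> " + params(s));
        }
    }
}
```

This prints [MSGCLASS=2, CLASS=P] for the first statement and [MSG=NTNG, CLASS=ABC] for the second, however many pairs follow.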

How to use regex groups in Java

I need to replace string 'name' with fullName in the following kind of strings:
software : (publisher:abc and name:oracle)
This needs to be replaced as:
software : (publisher:abc and fullName:xyz)
The "name:xyz" part can appear anywhere inside the parentheses, e.g.
software:(name:xyz)
I am trying to use groups, and the regex I built looks like this:
(\bsoftware\s*?:\s*?\()((.*?)(\s*?(and|or)\s*?))(\bname:.*?\)\s|:.*?\)$)
You may use
\b(software\s*:\s*\([^()]*)\bname:\w+
and replace with $1fullName:xyz. See the regex demo.
Details
\b - word boundary
(software\s*:\s*\([^()]*) - Capturing group 1 ($1 in the replacement pattern is a placeholder for the value captured in this group):
software - a word
\s*:\s* - a : enclosed with 0+ whitespaces
\( - a ( char
[^()]* - 0 or more chars other than ( and )
\bname - whole word name
: - colon
\w+ - 1 or more letters, digits or underscores.
Java sample code:
String s = "software : (publisher:abc and name:oracle)";
String result = s.replaceAll("\\b(software\\s*:\\s*\\([^()]*)\\bname:\\w+", "$1fullName:xyz");
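To see the replacement on both shapes from the question, plus a case the word boundary should leave alone, the call can be wrapped in a small helper. SoftwareRename, rename() and the "surname" input are made up for illustration.

```java
public class SoftwareRename {
    // Java string form of \b(software\s*:\s*\([^()]*)\bname:\w+
    static final String REGEX = "\\b(software\\s*:\\s*\\([^()]*)\\bname:\\w+";

    static String rename(String s) {
        // $1 puts back everything from "software" up to (not including) name:
        return s.replaceAll(REGEX, "$1fullName:xyz");
    }

    public static void main(String[] args) {
        String[] inputs = {
            "software : (publisher:abc and name:oracle)",
            "software:(name:xyz)",
            // \bname does not match inside "surname", so this line is left unchanged
            "software : (publisher:abc and surname:oracle)"
        };
        for (String in : inputs) {
            System.out.println(rename(in));
        }
    }
}
```

String.replaceAll compiles the pattern on every call; for many strings, a precompiled Pattern with matcher(s).replaceAll(...) avoids that.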

Regular expression to retrieve a portion of a string that does not contain a string

I have some strings like the following:
it.mycompany.db.beans.str1.PD_T_CLASS
it.mycompany.db.beans.join.PD_T_CLASS
it.mycompany.db.beans.str2.PD_T_CLASS_1
it.mycompany.db.beans.join.PD_T_CLASS_1
PD_T_CLASS myVar = new PD_T_CLASS();
myVar.setPD_T_CLASS(something);
I want to select the "PD_" part and replace it with "" (the empty string), but only if the entire line does not contain the string ".join.".
what I want to achieve is:
it.mycompany.db.beans.str1.T_CLASS
it.mycompany.db.beans.join.PD_T_CLASS
it.mycompany.db.beans.str2.T_CLASS_1
it.mycompany.db.beans.join.PD_T_CLASS_1
T_CLASS myVar = new T_CLASS();
myVar.setT_CLASS(something);
The substitution is not a problem, since I'm using the Eclipse search tool and will hit replace as soon as it shows me the right result.
I have tried:
^((?!\.join\.).)*(PD_)*$ // whole string selected
^((?!\.join\.).)*(\bPD_\b)*$ // whole string selected
I'm starting to get frustrated, since I've searched around a bit (the ^((?!join... parts come from those searches).
Can you help me?
You may use the following regex:
(?m)(?:\G(?!\A)|^(?!.*\.join\.))(.*?)PD_
and replace with
$1
See the regex demo
Details:
(?m) - a Pattern.MULTILINE inline modifier flag that makes ^ match at the beginning of every line rather than only at the beginning of the whole string
(?:\G(?!\A)|^(?!.*\.join\.)) - either of the two alternatives:
\G(?!\A) - the end of the previous successful match (the (?!\A) excludes the very start of the string, where \G would also match)
| - or
^(?!.*\.join\.) - start of a line that has no .join. text in it (as the (?!.*\.join\.) is a negative lookahead that will fail the match if it matches any 0+ chars other than line break chars (.*) and then .join.)
(.*?) - Capturing group #1 (referred to with the $1 backreference in the replacement pattern): any 0+ chars other than line breaks, as few as possible, up to the first occurrence of ...
PD_ - a literal PD_
The replacement is a $1 backreference to the first capturing group that will restore any text matched before PD_s.
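The same pattern can be tried outside Eclipse with java.util.regex; this sketch assumes Eclipse's find/replace treats these constructs the way java.util.regex does. StripPdPrefix and strip() are placeholder names.

```java
import java.util.regex.Pattern;

public class StripPdPrefix {
    // Java string form of (?m)(?:\G(?!\A)|^(?!.*\.join\.))(.*?)PD_
    static final Pattern PD =
            Pattern.compile("(?m)(?:\\G(?!\\A)|^(?!.*\\.join\\.))(.*?)PD_");

    static String strip(String text) {
        // Matcher.replaceAll performs successive finds, so \G can continue
        // on the same line after each removed PD_
        return PD.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "it.mycompany.db.beans.str1.PD_T_CLASS",
                "it.mycompany.db.beans.join.PD_T_CLASS",
                "it.mycompany.db.beans.str2.PD_T_CLASS_1",
                "it.mycompany.db.beans.join.PD_T_CLASS_1",
                "PD_T_CLASS myVar = new PD_T_CLASS();",
                "myVar.setPD_T_CLASS(something);");
        System.out.println(strip(text));
    }
}
```

The lines containing .join. keep their PD_ prefix, and every PD_ on the other lines is removed, which matches the desired result in the question.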
