Regex to continue matching the similar pattern - java

I have a JCL statement that I need to match with a regex pattern.
The statement has the following form:
//name JOB optionalParam,keyword=param,keyword=param,keyword=param
Actual statements look like this:
//ADBB503 JOB ,MSGCLASS=2,CLASS=P
//ABCD JOB Something,MSG=NTNG,CLASS=ABC
I have tried a regular expression that matches in groups, but the trailing keyword=param pair can occur any number of times, and I need to keep matching for as long as pairs exist.
String regex= (\/\/)(\w+)(\s+)(JOB)(\s+)(\w+)?(,)([\w+=\w+]+);
My trial is in the link given below
https://regex101.com/r/gUyRMV/1
The problem is that only one keyword=parameter pair is matched. Any number of keyword=parameter pairs needs to be matched.

You could match the job statement in the first capturing group and make use of \G to get the parameters in group 2:
(?:(//\w+\s+JOB(?: \w+)?)\h*|\G(?!^)),(\w+=\w+)
Explanation
(?: Non capturing group
( Capture group 1
//\w+\s+JOB Match //, 1+ word chars and JOB
(?: \w+)? Match optional param
)\h* Close group 1 and match 0+ horizontal whitespace chars
| Or
\G(?!^) Assert position at the end of previous match, not at the start
), Close non capturing group and match ,
( Capture group 2
\w+=\w+ Match 1+ word chars, =, and 1+ word chars
) Close group
In java
String regex = "(?:(//\\w+\\s+JOB(?: \\w+)?)\\h*|\\G(?!^)),(\\w+=\\w+)";
Regex demo | Java demo
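A minimal sketch of how the pattern above could be used to collect every keyword=param pair from one statement. The class name JclJobParams and the method extractParams are illustrative names, not part of the original answer; the sample statements are the ones from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JclJobParams {
    // Pattern from the answer: group 1 holds the JOB prefix (first match only),
    // group 2 holds one keyword=param pair per match.
    private static final Pattern JOB_PARAM = Pattern.compile(
            "(?:(//\\w+\\s+JOB(?: \\w+)?)\\h*|\\G(?!^)),(\\w+=\\w+)");

    // Hypothetical helper: returns all keyword=param pairs of a single statement.
    static List<String> extractParams(String statement) {
        List<String> params = new ArrayList<>();
        Matcher m = JOB_PARAM.matcher(statement);
        // Each find() continues at the end of the previous match thanks to \G.
        while (m.find()) {
            params.add(m.group(2));
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(extractParams("//ADBB503 JOB ,MSGCLASS=2,CLASS=P"));
        // prints [MSGCLASS=2, CLASS=P]
        System.out.println(extractParams("//ABCD JOB Something,MSG=NTNG,CLASS=ABC"));
        // prints [MSG=NTNG, CLASS=ABC]
    }
}
```

Because group 1 only participates in the first match of a statement, m.group(1) is null for the continuation matches.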

Related

Regex Pattern Match toString() in Java

I'm looking for some help with matching a pattern for a string (tostring() generated):
MyObject{nothingSpecial='Word1', secretData='Word2', privateEmail='Word3'}
I wanted a pattern that can match on the Word1, Word3. So I came up with:
(?x)(["]?(nothingSpecial|secretData)["]?\s*[:=]{1}\s*["]?)(?:[^"\n,]+)
That worked, but now I need to step it up so I can match on the object name too, e.g. MyObject
Examples:
Don't match since it's not MyObject: YourObject{nothingSpecial='Word1', secretData='Word2', privateEmail='Word3'}
Match to MyObject so look for nothingSpecial & privateEmail: MyObject { nothingSpecial='Word1', secretData='Word2', privateEmail='Word3'}
Don't match since it's not MyObject: TheirObject{nothingSpecial='Word1', secretData='Word2', privateEmail='Word3'}
Truthfully, I've never been great at RegEx so any help would be very much appreciated.
To match the words, you could make use of the \G anchor:
(?:\b(MyObject)\h*\{\h*(?=[^{}]*})|\G(?!^))(?:(?:nothingSpecial|privateEmail)='([^'\n,]+)'|[^\s=]+='[^'\n,]*')(?:,\h*)?
Regex demo | Java demo
(?: Non capture group
\b(MyObject)\h*\{\h* Capture MyObject in group 1 and match {
(?=[^{}]*}) Positive lookahead, assert that a closing } follows with no other { or } in between
| Or
\G(?!^) Assert the position at the end of the previous match
) Close the non capture group
(?: Non capture group
(?:nothingSpecial|privateEmail)= Match either nothingSpecial or privateEmail followed by =
'([^'\n,]+)' Capture group 2, match 1+ chars other than ', a newline or a comma between single quotes
| Or
[^\s=]+='[^'\n,]*' Match a key value pair with single quotes
) Close non capture group
(?:,\h*)? Optionally match a comma and horizontal whitespace chars
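As a rough illustration of the breakdown above, the following sketch runs the pattern against the sample strings from the question and keeps only the matches where group 2 took part. The names ToStringFields and wantedValues are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ToStringFields {
    // Pattern from the answer: group 1 is MyObject (first match only),
    // group 2 is the value of nothingSpecial or privateEmail.
    private static final Pattern FIELDS = Pattern.compile(
            "(?:\\b(MyObject)\\h*\\{\\h*(?=[^{}]*})|\\G(?!^))"
            + "(?:(?:nothingSpecial|privateEmail)='([^'\\n,]+)'|[^\\s=]+='[^'\\n,]*')"
            + "(?:,\\h*)?");

    // Hypothetical helper: values of the wanted keys, or an empty list
    // when the text does not describe a MyObject.
    static List<String> wantedValues(String text) {
        List<String> values = new ArrayList<>();
        Matcher m = FIELDS.matcher(text);
        while (m.find()) {
            // Other key='value' pairs are consumed without setting group 2.
            if (m.group(2) != null) {
                values.add(m.group(2));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(wantedValues(
                "MyObject{nothingSpecial='Word1', secretData='Word2', privateEmail='Word3'}"));
        // prints [Word1, Word3]
        System.out.println(wantedValues(
                "YourObject{nothingSpecial='Word1', secretData='Word2', privateEmail='Word3'}"));
        // prints []
    }
}
```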

Trying to match possible tags in string by regex

those are my possible inputs:
"#smoke"
"#smoke,#Functional1" (OR condition)
"#smoke,#Functional1,#Functional2" (OR condition)
"#smoke","#Functional1" (AND condition),
"#smoke","~#Functional1" (SKIP condition),
"~#smoke","~#Functional1" (NOT condition)
(Please note: the string input for the regex stops at the last " character on each line; no space or comma follows it!)
The regex I came up with so far is
"((?:[~#]{1}\w*)+),?"
This matches in capturing groups for the samples 1, 4, 5 and 6 but NOT 2 and 3.
I am not sure how to continue tweaking it further, any suggestions?
I would like to capture the preceding boolean meaning of the tag (eg: ~) as well please.
If you have any suggestions to pre-process the string in Java before regex that would make it simpler, I am open to that possibility as well.
Thanks.
It seems that you want to match an optional ~ followed by a # and get iterative matches for group 1. You could make use of the \G anchor, which matches either at the start of the string or at the end of the previous match.
(?:"(?=.*"$)|\G(?!^))(~?#\w+(?:,~?#\w+)*)"?[,\h]?
Explanation
(?: Non capture group
"(?=.*"$) Match " and assert that the string ends with "
| Or
\G(?!^) Assert the position at the end of the previous match, not at the start
) Close non capture group
( Capture group 1
~?#\w+(?:,~?#\w+)* Match an optional ~, then # and 1+ word characters, and repeat 0+ times with a comma prepended
)"? Close group 1 and match an optional "
[,\h]? Optionally match either a comma or a horizontal whitespace char
Regex demo | Java demo
Example code
String regex = "(?:\"(?=.*\"$)|\\G(?!^))(~?#\\w+(?:,~?#\\w+)*)\"?[,\\h]?";
String string = "\"#smoke\"\n"
+ "\"#smoke,#Functional1\"\n"
+ "\"#smoke,#Functional1,#Functional2\"\n"
+ "\"#smoke\",\"#Functional1\"\n"
+ "\"#smoke\",\"~#Functional1\"\n"
+ "\"~#smoke\",\"~#Functional1\"";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
Output
#smoke
#smoke,#Functional1
#smoke,#Functional1,#Functional2
#smoke
#Functional1
#smoke
~#Functional1
~#smoke
~#Functional1
Edit
If there are no consecutive matches, you could also use:
"(~?#\w+(?:,~?#\w+)*)"
Regex demo
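One possible way to post-process the simpler pattern in Java is sketched below. It assumes, following the labels in the question, that each quoted section is one AND term, that the comma-separated tags inside a section are OR alternatives, and that a leading ~ negates a tag. TagExpression, parse and isNegated are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagExpression {
    // The simpler pattern from the edit: group 1 is the content of one quoted section.
    private static final Pattern QUOTED_TAGS =
            Pattern.compile("\"(~?#\\w+(?:,~?#\\w+)*)\"");

    // One inner list per quoted section; the tags inside it were comma separated.
    static List<List<String>> parse(String line) {
        List<List<String>> sections = new ArrayList<>();
        Matcher m = QUOTED_TAGS.matcher(line);
        while (m.find()) {
            sections.add(Arrays.asList(m.group(1).split(",")));
        }
        return sections;
    }

    // Assumption: a leading ~ marks a negated tag.
    static boolean isNegated(String tag) {
        return tag.startsWith("~");
    }

    public static void main(String[] args) {
        System.out.println(parse("\"#smoke,#Functional1\""));
        // prints [[#smoke, #Functional1]]
        System.out.println(parse("\"#smoke\",\"~#Functional1\""));
        // prints [[#smoke], [~#Functional1]]
        System.out.println(isNegated("~#Functional1"));
        // prints true
    }
}
```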

Java regex. Match any "value" that is not preceded by given string

I need some help with a Java regexp.
I'm working with a file that has JSON similar format:
['zul.wgt.Label','f6DQof',{sclass:'class',style:'font-weight: bold;',prolog:' ',value:'xxxx'},{},[]],
['zul.wgt.Label','f6DQpf',{sclass:'class notranslate',style:'font-weight: bold;',prolog:' ',value:'xxxx'},
['zul.wgt.Label','f6DQof',{sclass:'class',style:'font-weight: bold;',prolog:' ',label:'xxxx'},{},[]]
['zul.wgt.Label','f6DQof',{sclass:'class',style:'font-weight: bold;',prolog:' ',label:'xxxx'},{},[]]
I need to match any label or value data that is not preceded by a "notranslate" value on the sclass property.
I've been working on an almost working regexp, but I need the final push to match only what I've previously written:
((?!.*?notranslate)sclass:'[\w\s]+'.*?)((value|label):'(.*?)')
Right now it matches anything from sclass that is not followed by 'notranslate'.
Thanks for your help
The values of your current regex are in the 4th capturing group
You could also use 1 capturing group instead of 4:
^(?!.*\bsclass:'[^']*\bnotranslate\b[^']*').*\b(?:label|value):'([^']+)'
Regex demo
That would match:
^ Assert start of the string
(?! Negative lookahead, assert that what is on the right is not
.*\bsclass: Match any character 0+ times followed by sclass:
'[^']*\bnotranslate\b[^']*' Match notranslate between single quotes and word boundaries
) Close negative lookahead
.* match any character 0+ times
\b(?:label|value): Match either label or value followed by :
'([^']+)' Match ', capture in a group matching not ' 1+ times and match '
Java demo
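The sketch below applies the single-group pattern line by line. The input lines are shortened, made-up variants of the data in the question, and TranslatableValues and extract are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TranslatableValues {
    // Pattern from the answer: the lookahead rejects the whole line when the
    // sclass property contains the word notranslate; group 1 is the label/value text.
    private static final Pattern VALUE = Pattern.compile(
            "^(?!.*\\bsclass:'[^']*\\bnotranslate\\b[^']*').*\\b(?:label|value):'([^']+)'");

    // Hypothetical helper: one extracted text per line that is allowed to be translated.
    static List<String> extract(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            Matcher m = VALUE.matcher(line);
            if (m.find()) {
                result.add(m.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "{sclass:'class',style:'font-weight: bold;',value:'Save'}",
                "{sclass:'class notranslate',style:'font-weight: bold;',value:'Acme'}",
                "{sclass:'class',prolog:' ',label:'Cancel'}");
        System.out.println(extract(lines));
        // prints [Save, Cancel]
    }
}
```

Because .* is greedy, a line that contains several label or value properties yields only the last one with this pattern.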

Java regex pattern group capture

I'm trying to split the string below into 3 groups, but it doesn't seem to be working as expected with the pattern that I'm using. Namely, when I invoke matcher.group(3), I'm getting a null value instead of *;+g.3gpp.cs-voice;require. What's wrong with the pattern?
String: "*;+g.oma.sip-im;explicit,*;+g.3gpp.cs-voice;require"
Pattern: (\\*;.*)?(\\*;.*?\\+g.oma.sip-im.*?)(,\\*;.*)?
Expected:
Group 1: null,
Group 2: *;+g.oma.sip-im;explicit,
Group 3: ,*;+g.3gpp.cs-voice;require
Actual:
Group 1: null,
Group 2: *;+g.oma.sip-im,
Group 3: null
The result you get does actually match your pattern in a non-greedy way. Group2 is expanded to the shortest possible result
*;+g.oma.sip-im
and then the last group is left out because of the question mark at the very end. It appears to me that you are building a far too complicated regex for your purpose.
The thing is that the (,\*;.*)? does not match as the text you expect is located further in the string. You need to make the third group obligatory by removing the ? at the end, but wrap the whole .*? + Group 3 within an optional non-capturing group:
String pat = "(\\*;.*)?(\\*;.*?\\+g\\.oma\\.sip-im)(?:.*?(,\\*;.*))?";
See the regex demo.
Note that literal dots should be escaped in the regex pattern.
Details:
(\\*;.*)? - Group 1 (optional) capturing
\\*; - a *; string
.* - any zero or more chars other than linebreak symbols, as many as possible
(\\*;.*?\\+g\\.oma\\.sip-im) - Group 2 (obligatory) capturing
\\*; - a *; string
.*? - any zero or more chars other than linebreak symbols, as few as possible
\\+g\\.oma\\.sip-im - a literal string +g.oma.sip-im
(?:.*?(,\\*;.*))? - non-capturing group (optional) matching
.*? - any zero or more chars other than linebreak symbols, as few as possible
(,\\*;.*) - Group 3 (obligatory) capturing a comma followed by the same pattern as in Group 1.
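To check the details above against the string from the question, a small sketch such as the following could be used. SipFeatureGroups and groups are hypothetical names; the pattern and the input are taken from the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SipFeatureGroups {
    // Pattern from the answer, with literal dots escaped.
    private static final Pattern FEATURES = Pattern.compile(
            "(\\*;.*)?(\\*;.*?\\+g\\.oma\\.sip-im)(?:.*?(,\\*;.*))?");

    // Hypothetical helper: groups 1 to 3, or null when the input does not match.
    static String[] groups(String input) {
        Matcher m = FEATURES.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] g = groups("*;+g.oma.sip-im;explicit,*;+g.3gpp.cs-voice;require");
        System.out.println("Group 1: " + g[0]); // Group 1: null
        System.out.println("Group 2: " + g[1]); // Group 2: *;+g.oma.sip-im
        System.out.println("Group 3: " + g[2]); // Group 3: ,*;+g.3gpp.cs-voice;require
    }
}
```

Note that Group 2 stops right after +g.oma.sip-im; the ;explicit part is consumed by the lazy .*? inside the optional non-capturing group and does not end up in any capture.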

Search phone number with regex Java

I want to find numbers like
(123).234.4567
(123)-234-4567
123.234.4567
123-456-4567
These digits represent US phone numbers. They can be separated with a - (dash) or a . (dot) and must have 3 digits, followed by another 3 digits, followed by 4 more. The first 3 digits may or may not be enclosed in brackets '()'.
The code is below:
Pattern regex = Pattern.compile("^(\\(\\d{3}\\))|^\\d{3}[.-]?\\d{3}[.-]?\\d{4}$");
Matcher matcher = regex.matcher("(425).882.8080 tel");
while(matcher.find()){
System.out.println(matcher.group());
}
But the result is :
(425)
What am I doing wrong? I want to print (425).882.8080 at once.
You can use a backreference to require that both separators are the same.
Backreferences match the same text as previously matched by a capturing group
^(\(\d{3}\)|\d{3})([.-])\d{3}\2\d{4}$
Capture 2nd group -----------^^^^ ^^-------- Same text as 2nd captured group
Groups are captured by enclosing them in parentheses (...) and can be referenced by \index
Here is online demo
Note: If you want to find substring in a string then remove ^ and $ that are used for beginning and ending of the string respectively.
Pattern explanation:
( group and capture to \1:
\( '('
\d{3} digits (0-9) (3 times)
\) ')'
| OR
\d{3} digits (0-9) (3 times)
) end of \1
( group and capture to \2:
[.-] any character of: '.', '-'
) end of \2
\d{3} digits (0-9) (3 times)
\2 what was matched by capture \2
\d{4} digits (0-9) (4 times)
Sample code:
String regex="(\\(\\d{3}\\)|\\d{3})([.-])\\d{3}\\2\\d{4}";
System.out.println("(123).234.4567".matches(regex)); // true
System.out.println("(123)-234-4567".matches(regex)); // true
System.out.println("123.234.4567".matches(regex)); // true
System.out.println("123-456-4567".matches(regex)); // true
System.out.println("(123)-234.4567".matches(regex)); // false
System.out.println("(123-234-4567".matches(regex)); // false
System.out.println("123.234-4567".matches(regex)); // false
System.out.println("123-456.4567".matches(regex)); // false
Sample code: (as per yours)
Matcher matcher = Pattern.compile(regex).matcher("(425).882.8080 tel");
while (matcher.find()) {
String str = matcher.group();
System.out.println(str); // (425).882.8080
}
You only have one matching group: (\\(\\d{3}\\))
Try
Pattern.compile("^((?:\\(\\d{3}\\)|\\d{3})[.-]?\\d{3}[.-]?\\d{4})");
this will provide you the whole matched number.
If you, instead, need all numbers separately you have to add multiple matching groups, like
Pattern.compile("^(\\(\\d{3}\\)|\\d{3})[.-]?(\\d{3})[.-]?(\\d{4})");
Btw. by using [.-]? you will also match 1232344567 (without dots/dashes). To fix this, drop the ? after [.-].
An optimized version could be:
Pattern.compile("^((\\((\\d{3})\\)|(\\d{3}))[.-](\\d{3})[.-](\\d{4}))");
This way you get the whole matched number as well as all included numbers separately.
Another point: your initial regexp would also match 123.234-4567. If that's not desirable, another OR is needed for all cases.
E.g.
Pattern.compile("^((?:\\((\\d{3})\\)|(\\d{3}))(?:\\.(\\d{3})\\.(\\d{4})|-(\\d{3})-(\\d{4})))");
Updated for your last edit:
Pattern.compile("^((?:\\(\\d{3}\\)|\\d{3})(?:\\.\\d{3}\\.\\d{4}|-\\d{3}-\\d{4}))");
You don't need to put start and end anchors in your regex if the phone number can appear anywhere in the input.
\d{3}([.-])?\d{3}\1?\d{4}|\(\d{3}\)([.-])?\d{3}\2?\d{4}
DEMO
Java regex would be,
"\\d{3}([.-])?\\d{3}\\1?\\d{4}|\\(\\d{3}\\)([.-])?\\d{3}\\2?\\d{4}"
Example:
Pattern regex = Pattern.compile("\\d{3}([.-])?\\d{3}\\1?\\d{4}|\\(\\d{3}\\)([.-])?\\d{3}\\2?\\d{4}");
Matcher matcher = regex.matcher("(425).882.8080 tel");
while(matcher.find()){
System.out.println(matcher.group());
} // (425).882.8080
