Java regex. Match any "value" that is not preceded by a given string - java

I need some help with a Java regexp.
I'm working with a file that has a JSON-like format:
['zul.wgt.Label','f6DQof',{sclass:'class',style:'font-weight: bold;',prolog:' ',value:'xxxx'},{},[]],
['zul.wgt.Label','f6DQpf',{sclass:'class notranslate',style:'font-weight: bold;',prolog:' ',value:'xxxx'},
['zul.wgt.Label','f6DQof',{sclass:'class',style:'font-weight: bold;',prolog:' ',label:'xxxx'},{},[]]
['zul.wgt.Label','f6DQof',{sclass:'class',style:'font-weight: bold;',prolog:' ',label:'xxxx'},{},[]]
I need to match any label or value data that is not preceded by a "notranslate" value on the sclass property.
I have a regexp that almost works, but I need a final push so that it matches only what I described above:
((?!.*?notranslate)sclass:'[\w\s]+'.*?)((value|label):'(.*?)')
Right now it matches anything from sclass that is not followed by 'notranslate'.
Thanks for your help

With your current regex, the values are in the 4th capturing group.
You could also use 1 capturing group instead of 4:
^(?!.*\bsclass:'[^']*\bnotranslate\b[^']*').*\b(?:label|value):'([^']+)'
That would match:
^ Assert the start of the string
(?! Negative lookahead, assert that what is on the right does not match
.*\bsclass: Match any character 0+ times followed by sclass:
'[^']*\bnotranslate\b[^']*' Match notranslate between single quotes, surrounded by word boundaries
) Close the negative lookahead
.* Match any character 0+ times
\b(?:label|value): Match either label or value followed by :
'([^']+)' Match ', capture in group 1 any character except ' 1+ times, and match '
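As a rough sketch of how the single-group pattern could be applied line by line in Java: the class name TranslatableValues and the helper extract are made up for illustration, and the sample values are changed to yyyy and zzzz so the output is easier to read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TranslatableValues {
    // Pattern from the answer: reject the line if the sclass value contains
    // "notranslate", otherwise capture the label/value text in group 1.
    private static final Pattern LINE = Pattern.compile(
        "^(?!.*\\bsclass:'[^']*\\bnotranslate\\b[^']*').*\\b(?:label|value):'([^']+)'");

    // Hypothetical helper: collects group 1 for every line that qualifies.
    static List<String> extract(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                result.add(m.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "['zul.wgt.Label','f6DQof',{sclass:'class',style:'font-weight: bold;',prolog:' ',value:'xxxx'},{},[]],",
            "['zul.wgt.Label','f6DQpf',{sclass:'class notranslate',style:'font-weight: bold;',prolog:' ',value:'yyyy'},",
            "['zul.wgt.Label','f6DQof',{sclass:'class',style:'font-weight: bold;',prolog:' ',label:'zzzz'},{},[]]");
        System.out.println(extract(lines)); // prints [xxxx, zzzz]
    }
}
```

Because the pattern starts with ^ and is compiled without MULTILINE, it is meant to be applied to one line at a time, as done here.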

Related

Regex Pattern Match toString() in Java

I'm looking for some help with matching a pattern for a string (tostring() generated):
MyObject{nothingSpecial='Word1', secretData='Word2', privateEmail='Word3'}
I wanted a pattern that can match on the Word1, Word3. So I came up with:
(?x)(["]?(nothingSpecial|secretData)["]?\s*[:=]{1}\s*["]?)(?:[^"\n,]+)
That worked, but now I need to step it up so I can match on the object name too, e.g. MyObject.
Examples:
Don't match since it's not MyObject: YourObject{nothingSpecial='Word1', secretData='Word2', privateEmail='Word3'}
Match to MyObject so look for nothingSpecial & privateEmail: MyObject { nothingSpecial='Word1', secretData='Word2', privateEmail='Word3'}
Don't match since it's not MyObject: TheirObject{nothingSpecial='Word1', secretData='Word2', privateEmail='Word3'}
Truthfully, I've never been great at RegEx, so any help would be very much appreciated.
To match those words, you could make use of the \G anchor:
(?:\b(MyObject)\h*\{\h*(?=[^{}]*})|\G(?!^))(?:(?:nothingSpecial|privateEmail)='([^'\n,]+)'|[^\s=]+='[^'\n,]*')(?:,\h*)?
(?: Non capture group
\b(MyObject)\h*\{\h* Capture MyObject in group 1 and match {
(?=[^{}]*}) Positive lookahead, assert that a closing } follows with no other { or } in between
| Or
\G(?!^) Assert the position at the end of the previous match
) Close the non capture group
(?: Non capture group
(?:nothingSpecial|privateEmail)= Match either nothingSpecial or privateEmail followed by =
'([^'\n,]+)' Capture group 2 Match any char except ' a newline or comma between single quotes
| Or
[^\s=]+='[^'\n,]*' Match a key value pair with single quotes
) Close non capture group
(?:,\h*)? Optionally match a comma and horizontal whitespace chars
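The \G approach above could be used from Java roughly as follows. This is only a sketch: the class name ToStringFields and the helper extract are invented here, and it assumes you only want the values captured in group 2 (group 2 is null for key-value pairs that are matched but not captured, such as secretData).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ToStringFields {
    // Same pattern as in the answer, split over three lines for readability.
    private static final Pattern FIELDS = Pattern.compile(
        "(?:\\b(MyObject)\\h*\\{\\h*(?=[^{}]*})|\\G(?!^))"
        + "(?:(?:nothingSpecial|privateEmail)='([^'\\n,]+)'|[^\\s=]+='[^'\\n,]*')"
        + "(?:,\\h*)?");

    // Hypothetical helper: returns the values of the wanted fields.
    static List<String> extract(String input) {
        List<String> values = new ArrayList<>();
        Matcher m = FIELDS.matcher(input);
        while (m.find()) {
            // Each find() consumes one key='value' pair; only the wanted
            // keys populate group 2.
            if (m.group(2) != null) {
                values.add(m.group(2));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(extract(
            "MyObject{nothingSpecial='Word1', secretData='Word2', privateEmail='Word3'}"));
        System.out.println(extract(
            "YourObject{nothingSpecial='Word1', secretData='Word2', privateEmail='Word3'}"));
    }
}
```

Note that group 1 (MyObject) is only set on the first match of a chain; later matches continue from \G and leave it null.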

Regex to continue matching the similar pattern

I have a JCL statement that I need to match with a regex pattern.
The statement has the following form:
//name JOB optionalParam,keyword=param,keyword=param,keyword=param
Actual statements look like this:
//ADBB503 JOB ,MSGCLASS=2,CLASS=P
//ABCD JOB Something,MSG=NTNG,CLASS=ABC
I have tried a regular expression that matches in groups, but the trailing keyword=param pair can occur any number of times, and I need to keep matching for as long as pairs exist.
String regex= (\/\/)(\w+)(\s+)(JOB)(\s+)(\w+)?(,)([\w+=\w+]+);
My trial is in the link given below
https://regex101.com/r/gUyRMV/1
The problem is that only one keyword=parameter pair is matched. Any number of keyword=parameter pairs needs to be matched.
You could match the job statement in the first capturing group and make use of \G to get the parameters in group 2:
(?:(//\w+\s+JOB(?: \w+)?)\h*|\G(?!^)),(\w+=\w+)
Explanation
(?: Non capturing group
( Capture group 1
//\w+\s+JOB Match //, 1+ word chars and JOB
(?: \w+)? Match optional param
)\h* Close group and match 0+ horizontal whitespace chars
| Or
\G(?!^) Assert position at the end of previous match, not at the start
), Close non capturing group and match ,
( Capture group 2
\w+=\w+ Match 1+ word chars, an =, and 1+ word chars
) Close group
In java
String regex = "(?:(//\\w+\\s+JOB(?: \\w+)?)\\h*|\\G(?!^)),(\\w+=\\w+)";
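A minimal sketch of looping over the matches with that regex, so that every keyword=param pair is collected. The class name JclJob and the helper parameters are hypothetical, and storing the pairs in a map is just one possible choice.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JclJob {
    private static final Pattern JOB = Pattern.compile(
        "(?:(//\\w+\\s+JOB(?: \\w+)?)\\h*|\\G(?!^)),(\\w+=\\w+)");

    // Hypothetical helper: returns keyword -> param in order of appearance.
    static Map<String, String> parameters(String statement) {
        Map<String, String> params = new LinkedHashMap<>();
        Matcher m = JOB.matcher(statement);
        while (m.find()) {
            // Group 2 always has the form keyword=param, so the split
            // yields exactly two parts.
            String[] pair = m.group(2).split("=", 2);
            params.put(pair[0], pair[1]);
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(parameters("//ADBB503 JOB ,MSGCLASS=2,CLASS=P"));
        // prints {MSGCLASS=2, CLASS=P}
        System.out.println(parameters("//ABCD JOB Something,MSG=NTNG,CLASS=ABC"));
        // prints {MSG=NTNG, CLASS=ABC}
    }
}
```

The job statement itself is available in group 1, but only on the first match; the following matches start at \G and carry only group 2.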

Get all unique file names

To preface, I am a beginner with regex. I have a string that looks something like:
my_folder/foo.xml::someextracontent
my_folder/foo.xml::someextracontent
another_folder/foo.xml::someextracontent
my_folder/bar.xml::someextracontent
my_folder/bar.xml::someextracontent
my_folder/hello.xml::someextracontent
I want to return unique XML files which are part of my_folder. So the regex will return:
my_folder/foo.xml
my_folder/bar.xml
my_folder/hello.xml
I've taken a look at Extract All Unique Lines which is close to what I need but I am not sure where to go from there.
The closest attempt I got was (?sm)(my_folder\/.*?.xml)(?=.*\1) which gets all the duplicates but I want the opposite, so I tried doing a negative lookahead instead (?sm)(my_folder\/.*?.xml)(?!.*\1) but the capture groups are totally wrong.
What am I missing here in my regex? Here's link to the regex: https://regex101.com/r/ggY2RB/1
This RegEx might help you to find the unique strings that you might be looking for:
/(\w+\/\w+\.xml)(?![\s\S]*\1)/s
If you only wish to match my_folder, you might try this:
/(my_folder\/\w+\.xml)(?![\s\S]*\1)/s
Instead of using a positive lookahead (?=, to get the unique strings you could use a negative lookahead (?! to assert what is on the right is not what you have captured in group 1.
In your pattern you make the dot match a newline using (?s) and use a non-greedy dot star .*?, but you might also use a negated character class matching any character except a newline or a forward slash.
If the folder can also contain nested folders, you might use a pattern that repeats 0+ times a group of 1+ characters other than a forward slash or a newline, followed by a forward slash.
(?s)(my_folder/(?:[^/\n]+/)*[^/\n]+\.xml)::(?!.*\1)
(?s)
( Capture group
my_folder/ Match literally
(?:[^/\n]+/)* Repeat 0+ times not a forward slash or a newline followed by a forward slash
[^/\n]+\.xml Match 1+ times not a forward slash or a newline followed by .xml
) Close capture group
::(?!.*\1) Match :: followed by asserting what is on the right does not contain what is captured in group 1
In Java
String regex = "(?s)(my_folder/(?:[^/\\n]+/)*[^/\\n]+\\.xml)::(?!.*\\1)";
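As an illustration, the pattern could be run over the sample input like this. The class name UniqueXmlFiles and the helper find are made up for this sketch. Since the negative lookahead rejects every occurrence that appears again later, each file is reported at its last occurrence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UniqueXmlFiles {
    private static final Pattern FILE = Pattern.compile(
        "(?s)(my_folder/(?:[^/\\n]+/)*[^/\\n]+\\.xml)::(?!.*\\1)");

    // Hypothetical helper: returns each my_folder XML path once.
    static List<String> find(String input) {
        List<String> files = new ArrayList<>();
        Matcher m = FILE.matcher(input);
        while (m.find()) {
            files.add(m.group(1));
        }
        return files;
    }

    public static void main(String[] args) {
        String input = String.join("\n",
            "my_folder/foo.xml::someextracontent",
            "my_folder/foo.xml::someextracontent",
            "another_folder/foo.xml::someextracontent",
            "my_folder/bar.xml::someextracontent",
            "my_folder/bar.xml::someextracontent",
            "my_folder/hello.xml::someextracontent");
        System.out.println(find(input));
        // prints [my_folder/foo.xml, my_folder/bar.xml, my_folder/hello.xml]
    }
}
```

One thing to keep in mind: the backreference in the lookahead is a plain substring check, so a path that also occurs later as part of a longer path would be treated as a duplicate.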

regex for string with backslash for escape

I'm trying to come up with a pattern for finding every text that is between double or single quotation marks in Java source code. This is what I have:
"(.*?)"|'(.*?)'
This works for almost every case I guess except one:
"text\"moretext\"evenmore"
This could be used as a valid String definition, because the quotes are escaped. The pattern does not recognize the inner part moretext.
Any ideas for a pattern that accounts for this case?
You can use this regex to match single- or double-quoted strings, ignoring all escaped quotes:
(["'])([^\\]*?(?:\\.[^\\]*?)*)\1
RegEx Breakup:
(["']): Match a single or double quote and capture it in group #1
(: Start capturing group #2
[^\\]*?: Match 0 or more of any characters that are not a \
(?:: Start non-capturing group
\\: Match a \
.: Followed by any character (the one being escaped)
[^\\]*?: Followed by 0 or more of any non-\ characters
)*: End non-capturing group. Match 0 or more of this non-capturing group
): End capturing group #2
\1: Match the closing quote, the same character that was captured in group #1
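A small sketch of using this regex from Java, mainly to show how the backslashes have to be doubled again inside a Java string literal. The class name QuotedStrings and the helper find are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedStrings {
    // Regex: (["'])([^\\]*?(?:\\.[^\\]*?)*)\1
    private static final Pattern QUOTED = Pattern.compile(
        "([\"'])([^\\\\]*?(?:\\\\.[^\\\\]*?)*)\\1");

    // Hypothetical helper: returns the text between the quotes (group 2),
    // with escape sequences left as they appear in the input.
    static List<String> find(String source) {
        List<String> contents = new ArrayList<>();
        Matcher m = QUOTED.matcher(source);
        while (m.find()) {
            contents.add(m.group(2));
        }
        return contents;
    }

    public static void main(String[] args) {
        // At runtime this is: String s = "text\"moretext\"evenmore";
        String source = "String s = \"text\\\"moretext\\\"evenmore\";";
        System.out.println(find(source)); // prints [text\"moretext\"evenmore]
    }
}
```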
This should work: "([^"\\]|\\.)*"|'([^'\\]|\\.)*'
Explanation:
" matches the opening ".
[^"\\]|\\. matches either any character other than " and \, or a \ followed by any character (so an escaped \" is consumed as a pair).
* repeats that group 0+ times.
" matches the closing ".
The same applies for '.
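For comparison, the alternation-based pattern might be used like this. The class name QuotedLiterals and the helper find are hypothetical, and this sketch returns the whole match including the surrounding quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedLiterals {
    // Regex: "([^"\\]|\\.)*"|'([^'\\]|\\.)*'
    private static final Pattern LITERAL = Pattern.compile(
        "\"([^\"\\\\]|\\\\.)*\"|'([^'\\\\]|\\\\.)*'");

    // Hypothetical helper: returns each quoted literal, quotes included.
    static List<String> find(String source) {
        List<String> literals = new ArrayList<>();
        Matcher m = LITERAL.matcher(source);
        while (m.find()) {
            literals.add(m.group());
        }
        return literals;
    }

    public static void main(String[] args) {
        // At runtime this is: x = "a\"b" + 'c'
        System.out.println(find("x = \"a\\\"b\" + 'c'")); // prints ["a\"b", 'c']
    }
}
```

A repeated group containing an alternation makes Java's regex engine recurse once per repetition, so very long string literals can end in a StackOverflowError with this form.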

Regular Expressions \w character class and equals sign

I am creating a regular expression to match the string
#servername:port:databasename
and through https://regex101.com/ I came up with
\#(((\w+.*-*)+)?\w+)(:\d+)(:\w+)
which matches
e.g. #CORA-PC:1111:databasename or #111.111.1.111:111:databasename
However when I use this regular expression to pattern match in my java code the String #CORA-PC:1111:database=name is also matched.
Why is \w matching the = equals sign? I also tried [0-9a-zA-Z] but it also matched the = equals sign?
Can anyone help me with this?
Thanks!
The .* is a greedy dot subpattern: it matches the rest of the line and then backtracks to accommodate the subsequent subpatterns. That is why the pattern can match a = symbol (Group 3 matches the part containing the =).
Your pattern is rather fragile, as the first part contains nested quantifiers with optional subpatterns that slow down the regex execution and cause other issues. You need to make it more linear:
#(\w+(?:[-.]\w+)*)?(:\d+)(:\w+)
The regex will match
# - # symbol
(\w+(?:[-.]\w+)*)? - an optional group matching
\w+ - 1+ word chars
(?:[-.]\w+)* - 0+ sequences of a - or . ([-.]) followed with 1+ word chars
(:\d+) - a : symbol followed with 1+ digits
(:\w+) - a : symbol followed with 1+ word chars
If you need to avoid partial matching, use String#matches(), which requires the whole string to match.
NOTE: In Java, backslashes must be doubled.
Code example (Java):
String s = "#CORA-PC:1111:databasename";
String rx = "#(?:\\w+(?:[-.]\\w+)*)?:\\d+:\\w+";
System.out.println(s.matches(rx));
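To illustrate the difference between partial and full matching, here is a short sketch using the capturing version of the pattern. The class name ServerSpec and both helper methods are made up for this example.

```java
import java.util.regex.Pattern;

class ServerSpec {
    private static final Pattern SPEC = Pattern.compile(
        "#(\\w+(?:[-.]\\w+)*)?(:\\d+)(:\\w+)");

    // Partial match: true if the pattern occurs anywhere in the input.
    static boolean containsSpec(String s) {
        return SPEC.matcher(s).find();
    }

    // Full match: true only if the entire input fits the pattern.
    static boolean isSpec(String s) {
        return SPEC.matcher(s).matches();
    }

    public static void main(String[] args) {
        String bad = "#CORA-PC:1111:database=name";
        System.out.println(containsSpec(bad)); // true, stops before "=name"
        System.out.println(isSpec(bad));       // false
        System.out.println(isSpec("#CORA-PC:1111:databasename"));      // true
        System.out.println(isSpec("#111.111.1.111:111:databasename")); // true
    }
}
```

With find(), the prefix #CORA-PC:1111:database is enough for a match, which would explain why a string containing = appears to be accepted.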
Code example (JS):
var str = '#CORA-PC:1111:databasename';
alert(/^#(?:\w+(?:[-.]\w+)*)?:\d+:\w+$/.test(str));
