Regex Pattern Match toString() in Java

I'm looking for some help with matching a pattern against a string generated by toString():
MyObject{nothingSpecial='Word1', secretData='Word2', privateEmail='Word3'}
I wanted a pattern that can match on Word1 and Word3, so I came up with:
(?x)(["]?(nothingSpecial|secretData)["]?\s*[:=]{1}\s*["]?)(?:[^"\n,]+)
That worked, but now I need to step it up so I can match on the object name too, e.g. MyObject.
Examples:
Don't match since it's not MyObject: YourObject{nothingSpecial='Word1', secretData='Word2', privateEmail='Word3'}
Match to MyObject so look for nothingSpecial & privateEmail: MyObject { nothingSpecial='Word1', secretData='Word2', privateEmail='Word3'}
Don't match since it's not MyObject: TheirObject{nothingSpecial='Word1', secretData='Word2', privateEmail='Word3'}
Truthfully, I've never been great at regex, so any help would be very much appreciated.

To match the words, you could make use of the \G anchor:
(?:\b(MyObject)\h*\{\h*(?=[^{}]*})|\G(?!^))(?:(?:nothingSpecial|privateEmail)='([^'\n,]+)'|[^\s=]+='[^'\n,]*')(?:,\h*)?
Regex demo | Java demo
(?: Non capture group
\b(MyObject)\h*\{\h* Capture MyObject in group 1 and match {
(?=[^{}]*}) Positive lookahead, assert a closing } to the right with no { or } in between
| Or
\G(?!^) Assert the position at the end of the previous match, not at the start
) Close the non capture group
(?: Non capture group
(?:nothingSpecial|privateEmail)= Match either nothingSpecial or privateEmail followed by =
'([^'\n,]+)' Capture group 2, match 1+ chars other than ', a newline or a comma between single quotes
| Or
[^\s=]+='[^'\n,]*' Match a key value pair with single quotes
) Close non capture group
(?:,\h*)? Optionally match a comma and horizontal whitespace chars
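
A minimal Java sketch of how the pattern above might be applied, assuming the wanted values end up in capture group 2 as described. The class name ToStringFieldMatcher and the method extractValues are made up for illustration, and the closing brace in the lookahead is escaped here, which does not change what it matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ToStringFieldMatcher {
    // Same pattern as above, split over three string literals for readability.
    private static final Pattern FIELDS = Pattern.compile(
            "(?:\\b(MyObject)\\h*\\{\\h*(?=[^{}]*\\})|\\G(?!^))"
            + "(?:(?:nothingSpecial|privateEmail)='([^'\\n,]+)'|[^\\s=]+='[^'\\n,]*')"
            + "(?:,\\h*)?");

    // Collects the values of nothingSpecial and privateEmail (group 2).
    static List<String> extractValues(String input) {
        List<String> values = new ArrayList<>();
        Matcher m = FIELDS.matcher(input);
        while (m.find()) {
            // Group 2 is null when the match consumed some other key=value pair.
            if (m.group(2) != null) {
                values.add(m.group(2));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(extractValues(
                "MyObject { nothingSpecial='Word1', secretData='Word2', privateEmail='Word3'}"));
        System.out.println(extractValues(
                "YourObject{nothingSpecial='Word1', secretData='Word2', privateEmail='Word3'}"));
    }
}
```

Group 1 is only set on the first match of a chain, so a caller that also needs the object name would read it there.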

Related

Trying to match possible tags in string by regex

those are my possible inputs:
"#smoke"
"#smoke,#Functional1" (OR condition)
"#smoke,#Functional1,#Functional2" (OR condition)
"#smoke","#Functional1" (AND condition),
"#smoke","~#Functional1" (SKIP condition),
"~#smoke","~#Functional1" (NOT condition)
(Please note: the string input for the regex stops at the last " character on each line; no space or comma follows it.)
The regex I came up with so far is
"((?:[~#]{1}\w*)+),?"
This matches in capturing groups for the samples 1, 4, 5 and 6 but NOT 2 and 3.
I am not sure how to continue tweaking it further, any suggestions?
I would like to capture the preceding boolean meaning of the tag (eg: ~) as well please.
If you have any suggestions to pre-process the string in Java before regex that would make it simpler, I am open to that possibility as well.
Thanks.

It seems that you want to match an optional ~ followed by a # and get iterative matches for group 1. You could make use of the \G anchor, which matches either at the start of the string or at the end of the previous match.
(?:"(?=.*"$)|\G(?!^))(~?#\w+(?:,~?#\w+)*)"?[,\h]?
Explanation
(?: Non capture group
"(?=.*"$) Match " and assert that the string ends with "
| Or
\G(?!^) Assert the position at the end of the previous match, not at the start
) Close non capture group
( Capture group 1
~?#\w+(?:,~?#\w+)* Match an optional ~, then # and 1+ word characters, and repeat 0+ times with a comma prepended
)"? Close group 1 and match an optional "
[,\h]? Optionally match either a comma or a horizontal whitespace char
Regex demo | Java demo
Example code
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String regex = "(?:\"(?=.*\"$)|\\G(?!^))(~?#\\w+(?:,~?#\\w+)*)\"?[,\\h]?";
String string = "\"#smoke\"\n"
        + "\"#smoke,#Functional1\"\n"
        + "\"#smoke,#Functional1,#Functional2\"\n"
        + "\"#smoke\",\"#Functional1\"\n"
        + "\"#smoke\",\"~#Functional1\"\n"
        + "\"~#smoke\",\"~#Functional1\"";
// MULTILINE lets the $ in the lookahead match at the end of every line
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    // Group 1 holds the tag list of the current quoted string
    System.out.println(matcher.group(1));
}
Output
#smoke
#smoke,#Functional1
#smoke,#Functional1,#Functional2
#smoke
#Functional1
#smoke
~#Functional1
~#smoke
~#Functional1
Edit
If there are no consecutive matches, you could also use:
"(~?#\w+(?:,~?#\w+)*)"
Regex demo
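
As a rough sketch of how the simpler pattern from the edit could feed the boolean interpretation asked for in the question: each quoted string becomes one group of tags (comma-separated tags inside one pair of quotes are the OR case, separate quoted strings the AND case), and a leading ~ marks a tag as negated. The class TagGroups and the methods parse and isNegated are hypothetical names, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagGroups {
    // One match per quoted, comma-separated tag list.
    private static final Pattern QUOTED =
            Pattern.compile("\"(~?#\\w+(?:,~?#\\w+)*)\"");

    // Returns one inner list per quoted string; the ~ prefix is kept on each tag.
    static List<List<String>> parse(String line) {
        List<List<String>> groups = new ArrayList<>();
        Matcher m = QUOTED.matcher(line);
        while (m.find()) {
            groups.add(Arrays.asList(m.group(1).split(",")));
        }
        return groups;
    }

    // A tag that starts with ~ is treated as negated.
    static boolean isNegated(String tag) {
        return tag.startsWith("~");
    }

    public static void main(String[] args) {
        for (List<String> group : parse("\"#smoke\",\"~#Functional1\"")) {
            for (String tag : group) {
                System.out.println(tag + " negated=" + isNegated(tag));
            }
        }
    }
}
```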

Regex to continue matching the similar pattern

I have a JCL statement to be matched with a regex pattern.
The statement would be like below
//name JOB optionalParam,keyword=param,keyword=param,keyword=param
Actual statement would be like below
//ADBB503 JOB ,MSGCLASS=2,CLASS=P
//ABCD JOB Something,MSG=NTNG,CLASS=ABC
I have tried a regular expression that matches in groups, but the trailing keyword=param pair can occur any number of times, and I need to keep matching for as long as such pairs exist.
String regex= (\/\/)(\w+)(\s+)(JOB)(\s+)(\w+)?(,)([\w+=\w+]+);
My trial is in the link given below
https://regex101.com/r/gUyRMV/1
The problem I am facing is that only one keyword=parameter pair is matched. Any number of keyword/parameter pairs needs to be matched.

You could match the job statement in the first capturing group and make use of \G to get the parameters in group 2:
(?:(//\w+\s+JOB(?: \w+)?)\h*|\G(?!^)),(\w+=\w+)
Explanation
(?: Non capturing group
( Capture group 1
//\w+\s+JOB Match //, 1+ word chars, 1+ whitespace chars and JOB
(?: \w+)? Match optional param
)\h* Close group and match 0+ horizontal whitespace chars
| Or
\G(?!^) Assert position at the end of previous match, not at the start
), Close non capturing group and match ,
( Capture group 2
\w+=\w+ Match 1+ word chars, = and 1+ word chars
) Close group
In Java
String regex = "(?:(//\\w+\\s+JOB(?: \\w+)?)\\h*|\\G(?!^)),(\\w+=\\w+)";
Regex demo | Java demo
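
A short sketch of looping over the matches in Java, assuming the two sample statements from the question as input. The class JclJobParams and the method keywordParams are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JclJobParams {
    private static final Pattern JOB = Pattern.compile(
            "(?:(//\\w+\\s+JOB(?: \\w+)?)\\h*|\\G(?!^)),(\\w+=\\w+)");

    // Collects every keyword=param pair (group 2) of one JOB statement.
    static List<String> keywordParams(String statement) {
        List<String> params = new ArrayList<>();
        Matcher m = JOB.matcher(statement);
        while (m.find()) {
            // Group 1 (the job statement itself) is only set on the first match.
            if (m.group(1) != null) {
                System.out.println("statement: " + m.group(1));
            }
            params.add(m.group(2));
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(keywordParams("//ADBB503 JOB ,MSGCLASS=2,CLASS=P"));
        System.out.println(keywordParams("//ABCD JOB Something,MSG=NTNG,CLASS=ABC"));
    }
}
```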

Java regex. Match any "value" that is not preceded by given string

I need some help with a Java regexp.
I'm working with a file in a JSON-like format:
['zul.wgt.Label','f6DQof',{sclass:'class',style:'font-weight: bold;',prolog:' ',value:'xxxx'},{},[]],
['zul.wgt.Label','f6DQpf',{sclass:'class notranslate',style:'font-weight: bold;',prolog:' ',value:'xxxx'},
['zul.wgt.Label','f6DQof',{sclass:'class',style:'font-weight: bold;',prolog:' ',label:'xxxx'},{},[]]
['zul.wgt.Label','f6DQof',{sclass:'class',style:'font-weight: bold;',prolog:' ',label:'xxxx'},{},[]]
I need to match any label or value data that is not preceded by a "notranslate" value on the sclass property.
I've been working on a regexp that almost works, but I need the final push to match only what I described above:
((?!.*?notranslate)sclass:'[\w\s]+'.*?)((value|label):'(.*?)')
Right now it matches anything from an sclass that is not followed by 'notranslate'.
Thanks for your help

The values matched by your current regex are in the 4th capturing group.
You could also use 1 capturing group instead of 4:
^(?!.*\bsclass:'[^']*\bnotranslate\b[^']*').*\b(?:label|value):'([^']+)'
Regex demo
That would match:
^ Assert start of the string
(?! Negative lookahead, assert that what is on the right is not
.*\bsclass: Match any character 0+ times followed by sclass:
'[^']*\bnotranslate\b[^']*' Match notranslate between single quotes and word boundaries
) Close the negative lookahead
.* Match any character 0+ times
\b(?:label|value): Match either label or value followed by :
'([^']+)' Match ', capture in a group matching not ' 1+ times and match '
Java demo
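
A possible way to apply this pattern line by line in Java, shown as a sketch. The class TranslatableValue and the method extract are hypothetical, and the inputs in main are shortened versions of the lines in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TranslatableValue {
    private static final Pattern VALUE = Pattern.compile(
            "^(?!.*\\bsclass:'[^']*\\bnotranslate\\b[^']*').*\\b(?:label|value):'([^']+)'");

    // Returns the label/value text of one line, or null when the line
    // carries notranslate in its sclass or has no label/value at all.
    static String extract(String line) {
        Matcher m = VALUE.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract(
                "{sclass:'class',style:'font-weight: bold;',prolog:' ',value:'xxxx'}"));
        System.out.println(extract(
                "{sclass:'class notranslate',style:'font-weight: bold;',prolog:' ',value:'xxxx'}"));
    }
}
```

Because the pattern is anchored with ^ and compiled without MULTILINE, it is meant to be run against one line at a time.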

regular expression retrieve a portion of a string that not contain a string

I have some strings like the following:
it.mycompany.db.beans.str1.PD_T_CLASS
it.mycompany.db.beans.join.PD_T_CLASS
it.mycompany.db.beans.str2.PD_T_CLASS_1
it.mycompany.db.beans.join.PD_T_CLASS_1
PD_T_CLASS myVar = new PD_T_CLASS();
myVar.setPD_T_CLASS(something);
and I want to select the "PD_" part and replace it with "" (the empty string), but only if the entire line does not contain the string ".join."
What I want to achieve is:
it.mycompany.db.beans.str1.T_CLASS
it.mycompany.db.beans.join.PD_T_CLASS
it.mycompany.db.beans.str2.T_CLASS_1
it.mycompany.db.beans.join.PD_T_CLASS_1
T_CLASS myVar = new T_CLASS();
myVar.setT_CLASS(something);
The substitution is not a problem, since I'm using the Eclipse search tool and will hit replace as soon as it shows me the right result.
I have tried:
^((?!\.join\.).)*(PD_)*$ // whole string selected
^((?!\.join\.).)*(\bPD_\b)*$ // whole string selected
I'm starting to get frustrated, since I've searched around a bit (the ^((?!join ... part comes from those searches).
Can you help me?

You may use the following regex:
(?m)(?:\G(?!\A)|^(?!.*\.join\.))(.*?)PD_
and replace with
$1
See the regex demo
Details:
(?m) - a Pattern.MULTILINE inline modifier flag that will force ^ to match the beginning of a line rather than a whole string
(?:\G(?!\A)|^(?!.*\.join\.)) - either of the two alternatives:
\G(?!\A) - the end of the previous successful match
| - or
^(?!.*\.join\.) - start of a line that has no .join. text in it (as the (?!.*\.join\.) is a negative lookahead that will fail the match if it matches any 0+ chars other than line break chars (.*) and then .join.)
(.*?) - Capturing group #1 (referred to with the $1 backreference in the replacement pattern): any 0+ chars other than line breaks, as few as possible, up to the first occurrence of ...
PD_ - a literal PD_
The replacement is a $1 backreference to the first capturing group, which restores the text matched before each PD_.
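
The question is about the Eclipse search dialog, but the same pattern and replacement can be tried out in plain Java. A minimal sketch, with PrefixStripper and stripPrefix as made-up names:

```java
import java.util.regex.Pattern;

class PrefixStripper {
    private static final Pattern PD = Pattern.compile(
            "(?m)(?:\\G(?!\\A)|^(?!.*\\.join\\.))(.*?)PD_");

    // Removes every PD_ on lines that do not contain ".join.".
    static String stripPrefix(String text) {
        // $1 puts back whatever was matched in front of PD_.
        return PD.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        String text = "it.mycompany.db.beans.str1.PD_T_CLASS\n"
                + "it.mycompany.db.beans.join.PD_T_CLASS\n"
                + "PD_T_CLASS myVar = new PD_T_CLASS();";
        System.out.println(stripPrefix(text));
    }
}
```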

remove part of matcher after the match in regex pattern

I need help writing a regex pattern that removes only part of the match from the original string.
Original String: 2017-02-15T12:00:00.268+00:00
Expected String: 2017-02-15T12:00:00+00:00
The expected string has the milliseconds removed.
My regex pattern looks like this: (:[0-5][0-9])\.[0-9]{1,3}
I need this regex to make sure I am removing only the milliseconds from a time field, not everything that comes after the dot. But with the regex above, the seconds part in front of the dot is removed as well. Please suggest a fix.

You have defined a capturing group with (...) in your pattern, and you want to have that part of string to be present after the replacement is performed. All you need is to use a backreference to the value stored in this capture. It can be done with $1:
String s = "2017-02-15T12:00:00.268+00:00";
String res = s.replaceFirst("(:[0-5][0-9])\\.[0-9]{1,3}", "$1");
System.out.println(res); // => 2017-02-15T12:00:00+00:00
See the Java demo and a regex demo.
The $1 in the replacement pattern tells the regex engine it should look up the captured group with ID 1 in the match object data. Since you only have one pair of unescaped parentheses (1 capturing group) the ID of the group is 1.

Change your pattern to (?::[0-5][0-9])(\.[0-9]{1,3}), run find on the matcher and remove whatever it finds in group(1).
The backslash will force the match with the '.' char, instead of any char, which is what the dot represents in a regex.
The (?: defines a non-capturing group, so it will not be considered in the group(...) on the matcher.
And adding a parenthesis around what you want will make it show up as group in the matcher, and in this case, the first group.
A good reference is the Pattern javadoc: http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html
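
One way this second approach might look in code, as a sketch rather than a definitive implementation: the offsets of group 1 are used to cut the milliseconds out of the original string. MillisStripper and stripMillis are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MillisStripper {
    // The seconds sit in a non-capturing group; group 1 is only ".268".
    private static final Pattern MILLIS =
            Pattern.compile("(?::[0-5][0-9])(\\.[0-9]{1,3})");

    static String stripMillis(String timestamp) {
        Matcher m = MILLIS.matcher(timestamp);
        if (!m.find()) {
            return timestamp; // nothing to remove
        }
        // Keep everything before and after group 1.
        return timestamp.substring(0, m.start(1)) + timestamp.substring(m.end(1));
    }

    public static void main(String[] args) {
        System.out.println(stripMillis("2017-02-15T12:00:00.268+00:00"));
        // prints 2017-02-15T12:00:00+00:00
    }
}
```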

Use the $1 and $2 backreferences in the replacement. Strings are immutable, so assign the result:
String result = string.replaceAll("(.*)\\.\\d{1,3}(.*)", "$1$2");
