How can I add multiple match conditions in a regex - Java

I have a String like this: String x = "return function ('ABC','DEF')";
I am using this:
Pattern pattern = Pattern.compile("'(.*?)'");
Matcher matcher = pattern.matcher(x);
while (matcher.find()) {
System.out.println("------> " + matcher.group());
}
to retrieve strings between single quotes.
My question is: how can I adapt this regex so that it matches strings between single quotes AND strings like ,'DEF' (meaning ones that start with ,' and end with ')?

You can use this pattern:
'[^']+'|"[^"]+"
To also match empty quoted strings, change + to *.
See test.

This pattern should do what you want:
"(?:,\\s*)?'[^']*'"
The ? means the first group will match zero or one times.
I used (?:...) because this is a non-capturing group. It is better to use when you don't need to capture that portion of the match.
Also, I replaced .*? with [^']*, meaning the single-quoted string contains anything that is not a single quote. This is more efficient and less likely to lead to mistakes in your regex than .*?.
(Note: this regex allows whitespace between the comma and the opening quote. At first glance I thought your example contained such a space, but now I see that it does not. Still, that might be useful depending on what your data looks like.)
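For reference, a minimal sketch that plugs this pattern into the find() loop from the question. The class name QuotedTokens and the findTokens helper are illustrative, not from the question. Because it collects the full match, the second token comes back with its leading comma (and any whitespace after that comma).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedTokens {
    // An optional comma (followed by optional whitespace), then a single-quoted
    // run of characters that contains no single quote.
    private static final Pattern TOKEN = Pattern.compile("(?:,\\s*)?'[^']*'");

    // Collects every full match, in order of appearance.
    static List<String> findTokens(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String x = "return function ('ABC','DEF')";
        for (String token : findTokens(x)) {
            System.out.println("------> " + token); // ------> 'ABC' then ------> ,'DEF'
        }
    }
}
```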

You could use the regex pattern:
Pattern.compile(",?'(.*?)'");
,? means 0 or 1 commas. The ? is greedy, so if there is a comma, it will be included in the match.
So this will match either:
A comma, followed by a string enclosed in single quotes
OR only a string enclosed in single quotes
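A short sketch of what this pattern returns, assuming you sometimes want the whole match (comma included) and sometimes only the text inside the quotes. The class CommaQuoted and its collect helper are hypothetical names; group(1) is the (.*?) capture from the pattern above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommaQuoted {
    // ",?" takes a leading comma if there is one; group 1 holds the text inside the quotes.
    private static final Pattern PATTERN = Pattern.compile(",?'(.*?)'");

    // Returns group(groupIndex) for every match: 0 = whole match, 1 = quoted content.
    static List<String> collect(String input, int groupIndex) {
        List<String> out = new ArrayList<>();
        Matcher matcher = PATTERN.matcher(input);
        while (matcher.find()) {
            out.add(matcher.group(groupIndex));
        }
        return out;
    }

    public static void main(String[] args) {
        String x = "return function ('ABC','DEF')";
        System.out.println(collect(x, 0)); // ['ABC', ,'DEF']
        System.out.println(collect(x, 1)); // [ABC, DEF]
    }
}
```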

Related

Regex pattern matching with multiple strings

Forgive me, I am not very familiar with regex patterns.
I have created a regex pattern as below.
String regex = Pattern.quote(value) + ", [NnoneOoff0-9\\-\\+\\/]+|[NnoneOoff0-9\\-\\+\\/]+, "
+ Pattern.quote(value);
This regex pattern gives different results for two different strings.
value = "207e/160";
Use Case 1 -
When channelStr = "207e/160, 149/80"
Then channelStr.matches(regex), returns "true".
Use Case 2 -
When channelStr = "207e/160, 149/80, 11"
Then channelStr.matches(regex), returns "false".
I can't figure out why. As far as I can tell, it may be because of the spaces involved when more than two strings are separated by commas.
I'm not sure what the correct pattern for more than two strings should be.
Any help will be appreciated.
If you print your pattern, it is:
\Q207e/160\E, [NnoneOoff0-9\-\+\/]+|[NnoneOoff0-9\-\+\/]+, \Q207e/160\E
It consists of an alternation | in which both the left and the right side match exactly one mandatory comma.
matches() must match the whole string, and that is the case for 207e/160, 149/80, so that is a match.
For the string 207e/160, 149/80, 11 there are 2 commas, so you do get a partial match for the first part of the string, but you don't match the whole string, so matches() returns false.
See the matches in this regex demo.
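To see the difference described above in plain Java, here is a small sketch that runs the question's pattern through both matches() and find(). AlternationDemo, wholeMatch and firstFind are made-up names for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationDemo {
    static final String VALUE = "207e/160";
    // The pattern from the question: VALUE, comma, item  OR  item, comma, VALUE.
    static final String REGEX = Pattern.quote(VALUE) + ", [NnoneOoff0-9\\-\\+\\/]+|[NnoneOoff0-9\\-\\+\\/]+, "
            + Pattern.quote(VALUE);

    // matches() succeeds only if the pattern covers the entire input.
    static boolean wholeMatch(String input) {
        return input.matches(REGEX);
    }

    // find() accepts the first substring that matches; returns null if there is none.
    static String firstFind(String input) {
        Matcher m = Pattern.compile(REGEX).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String channelStr = "207e/160, 149/80, 11";
        System.out.println(wholeMatch(channelStr)); // false
        System.out.println(firstFind(channelStr));  // 207e/160, 149/80
    }
}
```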
To match all the values, you can use a repeating pattern:
^[NnoeOf0-9+/-]+(?:,\h*[NnoeOf0-9+/-]+)*$
^ Start of string
[NnoeOf0-9+/-]+ Match 1+ of the characters listed in the character class
(?: Non capture group
,\h* Match a comma and optional horizontal whitespace chars
[NnoeOf0-9+/-]+ Match 1+ of the characters listed in the character class
)* Close the non capture group and optionally repeat it (if there should be at least 1 comma, then the quantifier can be + instead of *)
$ End of string
Regex demo
Example using matches():
String channelStr1 = "207e/160, 149/80";
String channelStr2 = "207e/160, 149/80, 11";
String regex = "^[NnoeOf0-9+/-]+(?:,\\h*[NnoeOf0-9+/-]+)*$";
System.out.println(channelStr1.matches(regex));
System.out.println(channelStr2.matches(regex));
Output
true
true
Note that in the character class you can put - at the end without having to escape it, and the + and / do not have to be escaped either. Also, NnoneOoff lists some letters twice, so NnoeOf matches exactly the same characters.
You can use regex101 to test your regex. It describes everything that's going on to help with debugging, and it has a quick reference section at the bottom right that shows what you can do, with examples.
A few things: you can match a character literally by escaping it with \, so \" is a literal double quote.
If you want the pattern to be one or more of something, you would use +. These are called quantifiers and can be applied to groups, tokens, etc. The token for a whitespace character is \s. So, one or more whitespace characters would be \s+.
It's difficult to tell exactly what you're trying to do, but hopefully pointing you to regex101 will help. If you want to provide examples of the current RegEx you have, what you want to match and then the strings you're using to test it I'll be happy to provide you with an example.
^(?:[NnoneOoff0-9\\-\\+\\/]+ *(?:, *(?!$)|$))+$
^ Start
(?: ... ) Non-capturing group that defines an item and its separator. After each item except the last, the separator (,) must appear; spaces (one, several, or none) can appear before and after the comma, as specified with *. The (?!$) after the comma prevents a trailing comma at the end of the string. This group can appear one or more times up to the end of the string, as specified by the + quantifier after the group's closing parenthesis.
$ End
Regex101 Test
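A quick way to check this pattern from Java, assuming it is used with matches() as in the question. ListPatternDemo and isValidList are hypothetical names. The last sample has a trailing comma and is rejected because of the (?!$) lookahead.

```java
import java.util.regex.Pattern;

public class ListPatternDemo {
    // Items separated by commas, optional spaces around each comma, no trailing comma.
    private static final Pattern LIST =
            Pattern.compile("^(?:[NnoneOoff0-9\\-\\+\\/]+ *(?:, *(?!$)|$))+$");

    static boolean isValidList(String input) {
        return LIST.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"207e/160, 149/80", "207e/160, 149/80, 11", "207e/160 ,149/80", "207e/160, 149/80,"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValidList(s));
        }
    }
}
```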

java regular expression and replace all occurrences

I want to replace one part of a big string, but I guess my regular expression is not right, because it's not working.
The main string (an SQL fragment in which something is to be replaced) is:
cond = emp.EMAIL_ID = 'xx#xx.com' AND
emp.PERMANENT_ADDR LIKE('%98n%')
AND hemp.EMPLOYEE_NAME = 'xxx' and is_active='Y'
The part to find and replace (depending on some condition) is:
hemp.EMPLOYEE_NAME = 'xxx'
I have tried this using the Pattern and Matcher classes:
Pattern pat1 = Pattern.compile("/^hemp.EMPLOYEE_NAME\\s=\\s\'\\w\'\\s[and|or]*/$", Pattern.CASE_INSENSITIVE);
Matcher mat = pat1.matcher(cond);
while (mat.find()) {
System.out.println("Match: " + mat.group());
cond = mat.replaceFirst("xx "+mat.group()+"x");
mat = pat1.matcher(cond);
}
It's not working, not entering the loop at all. Any help is appreciated.
Obviously not - your regexp pattern doesn't make any sense.
The opening /: In some languages, regexps aren't strings and start with an opening slash. Java is not one of those languages, and the slash is not part of regex syntax itself. So this looks for a literal slash in that SQL, which isn't there, thus failure.
^ is regexpese for 'start of string'. Your string does not start with hemp.EMPLOYEE_NAME, so that also doesn't work. Get rid of both / and ^ here.
\\s is one whitespace character (there are many whitespace characters; this matches any one of them, but exactly one). Your sample happens to have exactly one space in each of those spots, but your intent, surely, was \\s*, which matches 0 to many of them. In other words, \\s* means "whitespace is allowed here", while \\s means "there must be exactly one whitespace character here". Make all the \\s in your regexp an \\s*.
\\w is exactly one 'word' character (which is more or less a letter or digit), you obviously wanted \\w*.
[and|or] this is regexpese for: "An a, or an n, or a d, or an o, or an r, or a pipe symbol". Clearly you were looking for (and|or) which is regexpese for: Either the sequence "and", or the sequence "or".
* - so you want 0 to many 'and' or 'or', which makes no sense.
closing slash: You don't want this.
closing $: You don't want this - it means 'end of string'. Your string didn't end here.
The code itself:
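replaceFirst itself already does the regex matching, so you don't want to apply it on top of your own find() loop. Also, its replacement string is not literal: $ and \ in it are treated specially, so splicing mat.group() into it can break. That's not how you replace a found result.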
This is what you wanted:
Matcher mat = pat1.matcher(cond);
cond = mat.replaceFirst("replacement goes here");
where replacement can include references to groups in the match if you want to take parts of what you matched (i.e. don't use mat.group(), use those references).
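A sketch of the group-reference approach, assuming the goal is to swap only the quoted value. The replacement value 'yyy' and the names GroupReferenceDemo and replaceEmployeeName are made up for illustration. $1 re-inserts the captured column name and =, and Matcher.quoteReplacement keeps any $ or \ in the new value literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupReferenceDemo {
    // Group 1 captures the column name and '=' together with any surrounding whitespace;
    // the quoted value after it is the part that gets replaced.
    private static final Pattern PAT1 =
            Pattern.compile("(hemp\\.EMPLOYEE_NAME\\s*=\\s*)'\\w*'", Pattern.CASE_INSENSITIVE);

    // newValue is a hypothetical replacement value; quoteReplacement escapes $ and \ in it.
    static String replaceEmployeeName(String cond, String newValue) {
        return PAT1.matcher(cond).replaceFirst("$1'" + Matcher.quoteReplacement(newValue) + "'");
    }

    public static void main(String[] args) {
        String cond = "cond = emp.EMAIL_ID = 'xx#xx.com' AND\n"
                + "emp.PERMANENT_ADDR LIKE('%98n%')\n"
                + "AND hemp.EMPLOYEE_NAME = 'xxx' and is_active='Y'";
        System.out.println(replaceEmployeeName(cond, "yyy"));
    }
}
```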
More generally, did you read any regexp tutorial, do any testing, or read the javadoc of Pattern and Matcher?
I've been developing for a few years. It's just personal experience, perhaps, but reading is pretty fundamental.
Instead of the anchors ^ and $, you can use word boundaries \b to prevent a partial match.
If you want to match spaces on the same line only, you can use \h to match a horizontal whitespace character, as \s can also match a newline.
You can use replaceFirst on the string using $0 to get the full match, and an inline modifier (?i) for a case insensitive match.
Note that [and|or] is a character class matching one of the listed characters. Also escape the dot to match it literally; otherwise . matches any character except a newline.
(?i)\bhemp\.EMPLOYEE_NAME\h*=\h*'\w+'\h+(?:and|or)\b
See a regex demo or a Java demo
For example
String regex = "(?i)\\bhemp\\.EMPLOYEE_NAME\\h*=\\h*'\\w+'\\h+(?:and|or)\\b";
String string = "cond = emp.EMAIL_ID = 'xx#xx.com' AND\n"
+ "emp.PERMANENT_ADDR LIKE('%98n%') \n"
+ "AND hemp.EMPLOYEE_NAME = 'xxx' and is_active='Y'";
System.out.println(string.replaceFirst(regex, "xx$0x"));
Output
cond = emp.EMAIL_ID = 'xx#xx.com' AND
emp.PERMANENT_ADDR LIKE('%98n%')
AND xxhemp.EMPLOYEE_NAME = 'xxx' andx is_active='Y'

Java String Split using Regex with Escape Character

I have a string which needs to be split on a delimiter (:). This delimiter can be escaped by a character (say '?'), and it can be preceded by any number of escape characters. Consider the example string below:
a:b?:c??:d???????:e
After the split, it should give the following list of strings:
a
b?:c??
d???????:e
Basically, if the delimiter (:) is preceded by even number of escape characters, it should split. If it is preceded by odd number of escape characters, it should not split. Is there a solution to this with regex?
Any help would be greatly appreciated.
A similar question has been asked here before, but the answers don't work for this use case.
Update:
The solution with the regex (?:\?.|[^:?])* splits the string correctly. However, it also produces a few extra empty strings. If + is used instead of *, the genuine empty fields are ignored too (e.g. a::b gives only a, b).
Scenario 1: No empty matches
You may use
(?:\?.|[^:?])+
Or, following the pattern in the linked answer
(?:\?.|[^:?]++)+
See this regex demo
Details
(?: - start of a non-capturing group
\?. - a ? (the escape char) followed by any char
| - or
[^:?] - any char but the : (your delimiter char) and ? (the escape char)
)+ - 1 or more repetitions.
In Java:
String regex = "(?:\\?.|[^:?]++)+";
In case the input contains line breaks, prepend the pattern with (?s) (like (?s)(?:\\?.|[^:?])+) or compile the pattern with Pattern.DOTALL flag.
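A minimal sketch of Scenario 1 as a reusable method, using the possessive variant of the pattern. EscapedSplit and split are illustrative names. The escapes are kept in the output as-is, and empty fields are dropped, which is the limitation the question's update mentions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapedSplit {
    // Either "?" plus the char it escapes, or a run of chars that are neither ':' nor '?'.
    private static final Pattern FIELD = Pattern.compile("(?:\\?.|[^:?]++)+");

    // Returns the non-empty fields between unescaped ':' delimiters.
    static List<String> split(String input) {
        List<String> fields = new ArrayList<>();
        Matcher matcher = FIELD.matcher(input);
        while (matcher.find()) {
            fields.add(matcher.group());
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("a:b?:c??:d???????:e")); // [a, b?:c??, d???????:e]
    }
}
```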
Scenario 2: Empty matches included
You may add (?<=:)(?=:) alternative to the above pattern to match empty strings between : chars, see this regex demo:
String s = "::a:b?:c??::d???????:e::";
Pattern pattern = Pattern.compile("(?>\\?.|[^:?])+|(?<=:)(?=:)");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println("'" + matcher.group() + "'");
}
Output of the Java demo:
''
'a'
'b?:c??'
''
'd???????:e'
''
Note that if you want to also match empty strings at the start/end of the string, use (?<![^:])(?![^:]) rather than (?<=:)(?=:).
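A sketch of that variant, assuming you want one entry per field, including empty fields at the very start and end. EscapedSplitAll and split are hypothetical names. For "::a:b?:c??::d???????:e::" it yields eight fields: the first two and the last two are empty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapedSplitAll {
    // First alternative: a non-empty field (escaped chars or plain chars) in an atomic group.
    // Second alternative: an empty match where neither neighbour is a non-':' char,
    // i.e. between two ':' or between a ':' and the start/end of the string.
    private static final Pattern FIELD = Pattern.compile("(?>\\?.|[^:?])+|(?<![^:])(?![^:])");

    static List<String> split(String input) {
        List<String> fields = new ArrayList<>();
        Matcher matcher = FIELD.matcher(input);
        while (matcher.find()) {
            fields.add(matcher.group());
        }
        return fields;
    }

    public static void main(String[] args) {
        for (String field : split("::a:b?:c??::d???????:e::")) {
            System.out.println("'" + field + "'");
        }
    }
}
```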

remove part of matcher after the match in regex pattern

I need help writing a regex pattern that removes only part of the match from the original string.
Original String: 2017-02-15T12:00:00.268+00:00
Expected String: 2017-02-15T12:00:00+00:00
The expected string drops the milliseconds.
My regex pattern looks like this: (:[0-5][0-9])\.[0-9]{1,3}
I need this regex to make sure I am removing only the milliseconds from a time field, not everything that comes after a dot. But with the above regex, I am also removing the seconds part (:00). Please suggest a fix.
You have defined a capturing group with (...) in your pattern, and you want that part of the string to be kept after the replacement is performed. All you need is a backreference to the value stored in this capture, which can be done with $1:
String s = "2017-02-15T12:00:00.268+00:00";
String res = s.replaceFirst("(:[0-5][0-9])\\.[0-9]{1,3}", "$1");
System.out.println(res); // => 2017-02-15T12:00:00+00:00
See the Java demo and a regex demo.
The $1 in the replacement pattern tells the regex engine it should look up the captured group with ID 1 in the match object data. Since you only have one pair of unescaped parentheses (1 capturing group) the ID of the group is 1.
Change your pattern to (?::[0-5][0-9])(\.[0-9]{1,3}), run find() on the matcher, and remove whatever it finds in group(1).
The backslash will force the match with the '.' char, instead of any char, which is what the dot represents in a regex.
The (?: defines a non-capturing group, so it will not be considered in the group(...) on the matcher.
And adding a parenthesis around what you want will make it show up as group in the matcher, and in this case, the first group.
A good reference is the Pattern javadoc: http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html
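One way to "remove what group(1) finds" is to copy the input while skipping each group-1 span, using Matcher.start(1) and end(1). This is a sketch under that assumption; RemoveGroupDemo and removeMillis are made-up names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RemoveGroupDemo {
    // Non-capturing ":ss" anchors the match to the seconds; group 1 is the ".fff" fraction.
    private static final Pattern MILLIS = Pattern.compile("(?::[0-5][0-9])(\\.[0-9]{1,3})");

    // Copies the input while skipping every span matched by group 1.
    static String removeMillis(String s) {
        Matcher m = MILLIS.matcher(s);
        StringBuilder sb = new StringBuilder();
        int last = 0;
        while (m.find()) {
            sb.append(s, last, m.start(1)); // text up to the fraction
            last = m.end(1);                // resume after the fraction
        }
        sb.append(s, last, s.length());
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeMillis("2017-02-15T12:00:00.268+00:00")); // 2017-02-15T12:00:00+00:00
    }
}
```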
Use the $1 and $2 group references in the replacement:
string.replaceAll("(.*)\\.\\d{1,3}(.*)","$1$2");

capture all characters between match character (single or repeated) on string

I'm trying to extract the string preceding a specific character, even when the character is repeated (i.e. underscore '_'), like this:
this_is_my_example_line_0
this_is_my_example_line_1_
this_is_my_example_line_2___
_this_is_my_ _example_line_3_
__this_is_my___example_line_4__
and after running my regex I should get this (the regex should ignore any instances of the matching character in the middle of the string):
this_is_my_example_line_0
this_is_my_example_line_1
this_is_my_example_line_2
this_is_my_ _example_line_3
this_is_my___example_line_4
In other words I'm trying to 'trim' the matched character(s) at the beginning and end of string.
I'm trying to use a Regex in Java to accomplish this, my idea is to capture the group of characters between the special character(s) at the end or beginning of the line.
So far I can only do this successfully for example 3 with this regexp:
/[^_]+|_+(.*)[_$]+|_$+/
[^_]+ not 'underscore' once or more
| OR
_+ underscore once or more
(.*) capture all characters
[_$]+ not 'underscore' once or more followed by end of line
|_$+ OR 'underscore' once or more followed by end of line
I just realized that this excludes the first word of the message in examples 0, 1 and 2, since those strings don't start with an underscore and the match only starts after an underscore is found.
Is there an easier way not involving regex?
I don't really care about the first character (although it would be nice); I only need to ignore the repeating character at the end. According to this regex tester, it looks like just doing /()_+$/ would work: the empty parentheses match anything before a single or repeated underscore at the end of the line. Would that be correct?
Thank you!
There are a couple of options here: you could either replace matches of ^_+|_+$ with an empty string, or extract the contents of the first capture group from the match of ^_*(.*?)_*$. Note that if your strings may span multiple lines and you want to perform the replacement on each line, you will need to use the Pattern.MULTILINE flag for either approach. If your strings may span multiple lines and you only want the replacement to occur at the very beginning and end, don't use Pattern.MULTILINE, but use Pattern.DOTALL for the second approach.
For example: http://regexr.com?355ff
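Both options in Java, as a sketch that assumes the input is one newline-separated string (hence Pattern.MULTILINE). TrimUnderscores, trimByReplace and trimByCapture are hypothetical names. The lazy (.*?) in the second pattern stops before the trailing underscores, and since . does not match \n, each match stays within one line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TrimUnderscores {
    // MULTILINE makes ^ and $ match at every line start/end, not only at the ends of the input.
    private static final Pattern EDGES = Pattern.compile("^_+|_+$", Pattern.MULTILINE);
    private static final Pattern MIDDLE = Pattern.compile("^_*(.*?)_*$", Pattern.MULTILINE);

    // Approach 1: delete leading and trailing runs of '_' on each line.
    static String trimByReplace(String data) {
        return EDGES.matcher(data).replaceAll("");
    }

    // Approach 2: keep the lazily captured middle of each line (group 1).
    static List<String> trimByCapture(String data) {
        List<String> out = new ArrayList<>();
        Matcher m = MIDDLE.matcher(data);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String data = String.join("\n",
                "this_is_my_example_line_0",
                "this_is_my_example_line_1_",
                "this_is_my_example_line_2___",
                "_this_is_my_ _example_line_3_",
                "__this_is_my___example_line_4__");
        System.out.println(trimByReplace(data));
        System.out.println(trimByCapture(data));
    }
}
```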
How about this pattern: [^_\n\r](.*[^_\n\r])?
Demo
String data=
"this_is_my_example_line_0\n" +
"this_is_my_example_line_1_\n" +
"this_is_my_example_line_2___\n" +
"_this_is_my_ _example_line_3_\n" +
"__this_is_my___example_line_4__";
Pattern p=Pattern.compile("[^_\n\r](.*[^_\n\r])?");
Matcher m=p.matcher(data);
while(m.find()){
System.out.println(m.group());
}
output:
this_is_my_example_line_0
this_is_my_example_line_1
this_is_my_example_line_2
this_is_my_ _example_line_3
this_is_my___example_line_4
