Java/Groovy - string: replace characters on matched regex

I am having trouble writing a regex that matches strings such as NotificationGroup_n+En, where n is a number from 1 to 4. Once the number is matched, I want to remove it together with the plus sign.
String BEFORE process: NotificationGroup_4+E3
String AFTER process: NotificationGroup_E3
I removed n (a number from 1-4) and the plus sign, leaving _E followed by its number.
My question is: how do I write the regex for the string replace function so that it matches the number and then the plus sign, leaving only the string with _En?
def String string = "Notification_Group_4+E3";
println(removeChar(string));
public static def removeChar(String string) {
if ((string.contains("1+"))||(string.contains("2+")||(string.contains("3+"))||(string.contains("4+")))) {
def stringReplaced = string.replace('4+', "");
return stringReplaced;
}
}

in groovy:
def result = "Notification_Group_4+E3".replaceFirst(/_\d\+(.*)/, '_$1')
println result
output:
~>  groovy solution.groovy
Notification_Group_E3
~>
Regex explanation:
We use Groovy slashy strings /.../ to define the regex. This makes escaping simpler.
We first match on an underscore _
Then we match on a single digit (0-9) using the predefined character class \d as described in the javadoc for the java Pattern class.
We then match for one + character. We have to escape this with a backslash \ since + without escaping in regular expressions means "one or more" (see greedy quantifiers in the javadocs) . We don't want one or more, we want just a single + character.
We then create a regex capturing group as described in the logical operators part of the java Pattern regex using the parens expression (.*). We do this so that we are not locked into the input string ending with E3. This way the input string can end in an arbitrary string and the pattern will still work. This essentially says "capture a group and include any character (that is the . in regex) any number of times (that is the * in regex)" which translates to "just capture the rest of the line, whatever it is".
Finally we replace with _$1, i.e. just underscore followed by whatever the capturing group captured. The $1 is a "back reference" to the "first captured group" as documented in, for example, the java Matcher javadocs.
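For plain Java, the same idea can be sketched as below. This is only an illustration: the class and method names are made up, and the digit class is narrowed to [1-4] on the assumption that only the range from the question should be stripped. Note the doubled backslashes, which Java string literals need where Groovy slashy strings do not.

```java
class StripDigitPlus {
    // Hypothetical helper: removes "<digit 1-4>+" that directly follows an underscore.
    static String stripDigitPlus(String input) {
        // Regex: _[1-4]\+(.*)  -> underscore, one digit 1-4, a literal plus, then the rest.
        // Replacement: _$1     -> underscore followed by whatever group 1 captured.
        return input.replaceFirst("_[1-4]\\+(.*)", "_$1");
    }

    public static void main(String[] args) {
        System.out.println(stripDigitPlus("Notification_Group_4+E3"));
        System.out.println(stripDigitPlus("Notification_Group_7+E3")); // outside 1-4, left as-is
    }
}
```

If any single digit should be accepted, [1-4] can be swapped back to \\d.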

Try this regex: \d.*?\+
In Java:
String string = "Notification_Group_4+E3";
System.out.print(string.replaceAll("\\d.*?\\+", ""));
Output:
Notification_Group_E3

The simple one-liner:
String res = 'Notification_Group_4+E3'.replaceAll( /_\d+\+/, '_' )
assert 'Notification_Group_E3' == res

Related

java regular expression and replace all occurrences

I want to replace one string inside a larger string, but I guess my regular expression is wrong, because it's not working.
Main string is
Some sql part which is to be replaced
cond = emp.EMAIL_ID = 'xx#xx.com' AND
emp.PERMANENT_ADDR LIKE('%98n%')
AND hemp.EMPLOYEE_NAME = 'xxx' and is_active='Y'
String to find and replace is
Based on some condition sql part to be replaced
hemp.EMPLOYEE_NAME = 'xxx'
I have tried this using the Pattern and Matcher classes:
Pattern pat1 = Pattern.compile("/^hemp.EMPLOYEE_NAME\\s=\\s\'\\w\'\\s[and|or]*/$", Pattern.CASE_INSENSITIVE);
Matcher mat = pat1.matcher(cond);
while (mat.find()) {
System.out.println("Match: " + mat.group());
cond = mat.replaceFirst("xx "+mat.group()+"x");
mat = pat1.matcher(cond);
}
It's not working, not entering the loop at all. Any help is appreciated.
Obviously not - your regexp pattern doesn't make any sense.
The opening /: In some languages, regexps aren't strings and start with an opening slash. Java is not one of those languages, and it has nothing to do with regexps itself. So, this looks for a literal slash in that SQL, which isn't there, thus, failure.
^ is regexpese for 'start of string'. Your string does not start with hemp.EMPLOYEE_NAME, so that also doesn't work. Get rid of both / and ^ here.
\\s is one whitespace character (there are many whitespace characters - this matches any one of them, exactly one though). That only matches when the input has exactly one whitespace character at that position. Your intent, surely, was \\s* which matches 0 to many of them, i.e.: \\s* is: "Whitespace is allowed here". \\s is: "There must be exactly one whitespace character here". Make all the \\s in your regexp an \\s*.
\\w is exactly one 'word' character (which is more or less a letter or digit), you obviously wanted \\w*.
[and|or] this is regexpese for: "An a, or an n, or a d, or an o, or an r, or a pipe symbol". Clearly you were looking for (and|or) which is regexpese for: Either the sequence "and", or the sequence "or".
* - so you want 0 to many 'and' or 'or', which makes no sense.
closing slash: You don't want this.
closing $: You don't want this - it means 'end of string'. Your string didn't end here.
The code itself:
replaceFirst, itself, also does regexps. You don't want to double apply this stuff. That's not how you replace a found result.
This is what you wanted:
Matcher mat = pat1.matcher(cond);
cond = mat.replaceFirst("replacement goes here");
where replacement can include references to groups in the match if you want to take parts of what you matched (i.e. don't use mat.group(), use those references).
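Putting the points above together, a corrected version might look like the sketch below. The class and method names are invented for illustration, the input is shortened to a single line, and the replacement "xx $0x" is just one guess at what the original loop was trying to produce ($0 refers to the whole match).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceCondition {
    // Hypothetical helper: wraps the first "hemp.EMPLOYEE_NAME = '...' and|or" fragment.
    static String wrapCondition(String cond) {
        // No slashes, no ^ or $ anchors, escaped dot, \s* and \w* instead of \s and \w,
        // and a real alternation (and|or) instead of the character class [and|or].
        Pattern pat = Pattern.compile(
                "hemp\\.EMPLOYEE_NAME\\s*=\\s*'\\w*'\\s*(and|or)",
                Pattern.CASE_INSENSITIVE);
        Matcher mat = pat.matcher(cond);
        // replaceFirst returns a new string; the result has to be kept.
        return mat.replaceFirst("xx $0x");
    }

    public static void main(String[] args) {
        String cond = "emp.EMAIL_ID = 'xx#xx.com' AND hemp.EMPLOYEE_NAME = 'xxx' and is_active='Y'";
        System.out.println(wrapCondition(cond));
    }
}
```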
More generally did you read any regexp tutorial, did any testing, or did any reading of the javadoc of Pattern and Matcher?
I've been developing for a few years. It's just personal experience, perhaps, but, reading is pretty fundamental.
Instead of the anchors ^ and $, you can use word boundaries \b to prevent a partial match.
If you want to match spaces on the same line, you can use \h to match horizontal whitespace char, as \s can also match a newline.
You can use replaceFirst on the string using $0 to get the full match, and an inline modifier (?i) for a case insensitive match.
Note that [and|or] is a character class that matches a single one of the listed chars; use a group such as (?:and|or) instead. Also escape the dot to match it literally, or else . matches any char except a newline.
(?i)\bhemp\.EMPLOYEE_NAME\h*=\h*'\w+'\h+(?:and|or)\b
For example
String regex = "(?i)\\bhemp\\.EMPLOYEE_NAME\\h*=\\h*'\\w+'\\h+(?:and|or)\\b";
String string = "cond = emp.EMAIL_ID = 'xx#xx.com' AND\n"
+ "emp.PERMANENT_ADDR LIKE('%98n%') \n"
+ "AND hemp.EMPLOYEE_NAME = 'xxx' and is_active='Y'";
System.out.println(string.replaceFirst(regex, "xx$0x"));
Output
cond = emp.EMAIL_ID = 'xx#xx.com' AND
emp.PERMANENT_ADDR LIKE('%98n%')
AND xxhemp.EMPLOYEE_NAME = 'xxx' andx is_active='Y'

How do i check if string contains char sequence and backslash "\"?

I'm trying to get true in the following test. I have a string with a backslash that, for some reason, isn't recognized.
String s = "Good news\\ everyone!";
Boolean test = s.matches("(.*)news\\.");
System.out.println(test);
I've tried a lot of variants, but only (.*)news(.*) works. That allows any characters after news, though; I need it to match only when news is followed by \.
How can I do that?
Group the elements at the end: (.*)news\\(.*)
You can use this instead:
Boolean test = s.matches("(.*)news\\\\(.*)");
Try something like:
Boolean test = s.matches(".*news\\\\.*");
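Here .* means any number of characters, followed by news, followed by a literal backslash (written as four backslashes in Java source: two for the regex escape, each doubled again for the string literal), and then any number of characters after that (which can be zero as well).
With your regex, what it means is:
(.*) any number of characters
news the literal text "news"
\\. in Java source is the regex \. which matches a literal dot, not a backslash
That doesn't match the string in your program, "Good news\ everyone!", because news is followed by a backslash rather than a dot.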
You are testing for an escaped occurrence of a literal dot: ".".
Refactor your pattern as follows (inferring the last part as you need it for a full match):
String s = "Good news\\ everyone!";
System.out.println(s.matches("(.*)news\\\\.*"));
Output
true
Explanation
The back-slash is used to escape characters and the back-slash itself in Java Strings
In Java Pattern representations, you need to double-escape your back-slashes for representing a literal back-slash ("\\\\"), as double-back-slashes are already used to represent special constructs (e.g. \\p{Punct}), or escape them (e.g. the literal dot \\.).
String.matches will attempt to match the whole String against your pattern, so you need the terminal part of the pattern I've added
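The escaping layers described above can be checked with a small sketch. The class and method names are made up for illustration; the contains variant is an extra suggestion for the case where no regex is needed at all.

```java
class BackslashMatch {
    // Regex seen by the engine: .*news\\.*  (the \\ pair is one literal backslash).
    static boolean matchesNewsBackslash(String s) {
        return s.matches(".*news\\\\.*");
    }

    // Plain substring test: the Java literal "news\\" is the five characters n e w s \
    static boolean containsNewsBackslash(String s) {
        return s.contains("news\\");
    }

    public static void main(String[] args) {
        String s = "Good news\\ everyone!"; // runtime value: Good news\ everyone!
        System.out.println(matchesNewsBackslash(s));
        System.out.println(containsNewsBackslash(s));
        // The pattern from the question, (.*)news\. , requires a literal dot after "news".
        System.out.println(s.matches("(.*)news\\."));
    }
}
```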
you can try this :
String s = "Good news\\ everyone!";
Boolean test = s.matches("(.*)news\\\\(.*)");
System.out.println(test);

Regular expression to search for text using star characters

I'm writing a Java application where a user can reduce a List of strings based on a filter that the user supplies.
So for example, the user could enter a filter such as:
ABC*xyz
This means that user is looking for strings that start with ABC and have xyz that follow (That would be the same as doing a search for ABC*xyz*)
Another example of a filter the user could enter is:
*DEF*mno*rst
This means that the string can start with anything, but it must then follow with DEF, followed by mno, followed by rst.
How would I write the Java code to be able generate the regular expression that I need to figure out if my strings match the filter the user has specified?
If converting your syntax to regex, which is the "easy" way to do this (avoiding writing a lexer yourself), you must remember to escape your string appropriately.
So if going down this route, you should probably aim to quote the bits that aren't wildcards in your syntax and join them with regex .* (or .+ if you want your * to mean "at least one character"). This will avoid incorrect results when the input contains ., (, ) and all the other regex special characters.
Try something like:
public Pattern createPatternFromSearch(String query) {
StringBuilder sb = new StringBuilder();
for (String part : query.split("\\*")) {
if (part.length() > 0) {
sb.append(Pattern.quote(part));
}
sb.append(".*");
}
return Pattern.compile(sb.toString());
}
// ...
// then you can use it like....
Matcher matcher = createPatternFromSearch("*DEF*mno*rst").matcher(str);
if (matcher.matches()) {
// process the matching result
}
Note that by using Matcher#matches() (not find) and leaving the trailing .*, it will cater for your syntax that is anchored at the start only.
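As a self-contained sketch of how this could be applied to the list-filtering use case from the question: the class name, method names and sample data below are invented for illustration. It follows the same quote-and-join approach, with one small change: split is called with a limit of -1 so that a filter consisting only of * still yields ".*".

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class WildcardFilter {
    // Quote every literal part of the filter and join the parts with ".*".
    static Pattern toPattern(String filter) {
        StringBuilder sb = new StringBuilder();
        // limit -1 keeps trailing empty parts, so "*" alone becomes ".*.*" rather than "".
        for (String part : filter.split("\\*", -1)) {
            if (!part.isEmpty()) {
                sb.append(Pattern.quote(part));
            }
            sb.append(".*"); // also leaves the end of the string unanchored
        }
        return Pattern.compile(sb.toString());
    }

    // Keep only the candidates that the whole-string match accepts.
    static List<String> apply(String filter, List<String> candidates) {
        Pattern p = toPattern(filter);
        return candidates.stream()
                .filter(s -> p.matcher(s).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("ABC-xyz-tail", "xyzABC", "xDEFmnorst", "A.Cd", "ABC");
        System.out.println(apply("ABC*xyz", data));
        System.out.println(apply("*DEF*mno*rst", data));
        System.out.println(apply("A.C*", data)); // the dot is matched literally
    }
}
```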
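Replace * with .* and you have your regular expression. Use String.replace (a literal replacement) rather than replaceAll, because replaceAll treats "*" as a regex and throws a PatternSyntaxException. Note that this does not escape any other regex special characters in the filter.
String str = "*DEF*mno*rst";
String regex = str.replace("*", ".*");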

Java replaceAll regex error

I want to transform every "*" into ".*", except for "\*".
String regex01 = "\\*toto".replaceAll("[^\\\\]\\*", ".*");
assertTrue("*toto".matches(regex01));// True
String regex02 = "toto*".replaceAll("[^\\\\]\\*", ".*");
assertTrue("tototo".matches(regex02));// True
String regex03 = "*toto".replaceAll("[^\\\\]\\*", ".*");
assertTrue("tototo".matches(regex03));// Error
If the "*" is the first character, an error occurs:
java.util.regex.PatternSyntaxException:
Dangling meta character '*' near index 0
What is the correct regex?
This is currently the only solution capable of dealing with multiple escaped \ in a row:
String regex = input.replaceAll("\\G((?:[^\\\\*]|\\\\[\\\\*])*)[*]", "$1.*");
How it works
Let's print the string regex to have a look at the actual string being parsed by the regex engine:
\G((?:[^\\*]|\\[\\*])*)[*]
((?:[^\\*]|\\[\\*])*) matches a sequence of characters not \ or *, or escape sequence \\ or \*. We match all the characters that we don't want to touch, and put it in a capturing group so that we can put it back.
The above sequence is followed by an unescaped asterisk, as described by [*].
In order to make sure that we don't "jump" when the regex can't match an unescaped *, \G is used to make sure the next match can only start at the beginning of the string, or from where the last match ends.
Why such a long solution? It is necessary, since the look-behind construct to check whether the number of consecutive \ preceding a * is odd or even is not officially supported by Java regex. Therefore, we need to consume the string from left to right, taking into account escape sequences, until we encounter an unescaped * and replace it with .*.
Test program
String inputs[] = {
"toto*",
"\\*toto",
"\\\\*toto",
"*toto",
"\\\\\\\\*toto",
"\\\\*\\\\\\*\\*\\\\\\\\*"};
for (String input: inputs) {
String regex = input.replaceAll("\\G((?:[^\\\\*]|\\\\[\\\\*])*)[*]", "$1.*");
System.out.println(input);
System.out.println(Pattern.compile(regex));
System.out.println();
}
Sample output
toto*
toto.*

\*toto
\*toto

\\*toto
\\.*toto

*toto
.*toto

\\\\*toto
\\\\.*toto

\\*\\\*\*\\\\*
\\.*\\\*\*\\\\.*
You need to use a negative lookbehind here:
String regex01 = input.replaceAll("(?<!\\\\)\\*", ".*");
(?<!\\\\) is a negative lookbehind that means match * if it is not preceded by a backslash.
Examples:
regex01 = "\\*toto".replaceAll("(?<!\\\\)\\*", ".*");
//=> \*toto
regex01 = "*toto".replaceAll("(?<!\\\\)\\*", ".*");
//=> .*toto
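To see where the single-character lookbehind and a left-to-right \G scan differ, the following sketch runs both on the same inputs. The class and method names are made up; the two regexes are the ones quoted in this thread. The difference only shows up when the * is preceded by an escaped backslash (\\*), which the lookbehind still treats as an escaped star.

```java
class StarToDotStar {
    // Lookbehind variant: replace * unless the character right before it is a backslash.
    static String withLookbehind(String input) {
        return input.replaceAll("(?<!\\\\)\\*", ".*");
    }

    // \G variant: consume plain characters and escape pairs, then replace the next bare *.
    static String withContinuation(String input) {
        return input.replaceAll("\\G((?:[^\\\\*]|\\\\[\\\\*])*)[*]", "$1.*");
    }

    public static void main(String[] args) {
        // Runtime values: *toto   \*toto   \\*toto
        String[] inputs = { "*toto", "\\*toto", "\\\\*toto" };
        for (String input : inputs) {
            System.out.println(input + " -> lookbehind: " + withLookbehind(input)
                    + ", \\G scan: " + withContinuation(input));
        }
    }
}
```

For the first two inputs both variants agree; for the third, only the \G scan rewrites the star.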
You have to cater for the case of a string starting with * in your regex:
(^|[^\\\\])\\*
The single caret represents the 'beginning of the string' ( 'start anchor' ).
Edit
Apart from the correction above, the replacement string in the replaceAll call must be $1.* instead of .* lest a matched character before an unescaped * be lost.

Correct existing regular expression / create a new one

I am trying to learn regular expressions and want to replace punctuation in a string with whitespace, using a regular expression, before feeding it into a tokenizer. The string might contain many punctuation marks. However, I do not want to replace an apostrophe or hyphen that occurs inside a word.
For example,
six-pack => six-pack
He's => He's
This,that => This That
Initially I tried to replace all punctuation with whitespace, but that did not work.
I then tried to replace only certain punctuation by specifying word boundaries, as in
\B[^\p{L}\p{N}\s]+\B|\b[^\p{L}\p{N}\s]+\B|\B[^\p{L}\p{N}\s]+\b
But, I am not able to exclude the hyphen and apostrophe from them.
My guess is that the above regex is also very cumbersome and there should be a better way. Is there any?
So, all I am trying to do is:
Replace all punctuation with whitespace
Do not do the above for a hyphen/apostrophe
Do replace a hyphen/apostrophe if it occurs at the start/end of a word.
Any help is appreciated.
You can probably work out a set of punctuation characters that are ok between words, and another set that isn't, then define your regular expression based on that.
For instance:
String[] input = {
"six-pack", // => six-pack
"He's", // => He's
"This,that" // => This That
};
for (String s: input) {
System.out.println(s.replaceAll("(?<=\\w)[\\p{Punct}&&[^'-]](?=\\w)", " "));
}
Output
six-pack
He's
This that
Note
Here I'm defining the Pattern with a character class that takes the POSIX punctuation class \p{Punct} and intersects it (&&) with a negated class excluding ' and -. The lookbehind and lookahead require a word character immediately before and after the punctuation.
You can use this lookahead based regex:
(?!((?!^)['-].))\\p{Punct}
You could use a negative lookahead assertion, like below:
String s = "six-pack\n"
+ "He's\n"
+ "This,that";
System.out.println(s.replaceAll("(?m)^['-]|['-]$|(?!['-])\\p{Punct}", " "));
Output:
six-pack
He's
This that
Explanation:
(?m) Multiline Mode
^['-] Matches ' or - which are at the start.
| OR
['-]$ Matches ' or - which are at the end of the line.
| OR
(?!['-])\\p{Punct} Matches all the punctuations except these two ' or - . It won't touch the matched [-'] symbols (ie, at the start and end).
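A short sketch of how the three alternatives behave on hyphens and apostrophes at the start, in the middle, and at the end of a word. The class name, method name and sample words are invented for illustration; the regex is the one explained above.

```java
class PunctuationCleaner {
    // Hypothetical helper applying the three-way alternation to a single line or several lines.
    static String clean(String text) {
        return text.replaceAll("(?m)^['-]|['-]$|(?!['-])\\p{Punct}", " ");
    }

    public static void main(String[] args) {
        // Leading hyphen is replaced, the inner hyphen is kept.
        System.out.println("[" + clean("-six-pack") + "]");
        // Inner apostrophe is kept, trailing apostrophe is replaced.
        System.out.println("[" + clean("He's'") + "]");
        // Any other punctuation is replaced wherever it occurs.
        System.out.println("[" + clean("This,that") + "]");
    }
}
```

The brackets in the printed output are only there to make leading and trailing spaces visible.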
