I'm a bit rusty with my Java-Fu, but I ran across an issue, which I don't quite get.
I have a string test\\"escape\"test\ that I'm trying to transform into test\"escape"test\ (this is before string escaping).
I found this nifty replaceAll method with a function callback, and I'm trying to remove exactly one backslash from every substring that begins with backslashes and ends in a quote (like \\" to \", while any other occurrences of backslashes should stay unmodified).
Unfortunately I get back test"escape"test\, one less backslash at position 5 than expected.
Here is the code that I tried:
Pattern.compile("(\\\\+\")")
.matcher("test\\\\\"escape\\\"test\\")
.replaceAll(mr -> mr.group().substring(1));
What am I missing here?
The problem with your code is that the string returned by the lambda is treated as a replacement pattern, in which \ is an escape character. You need to use Matcher.quoteReplacement(mr.group().substring(1)):
String output = Pattern.compile("(\\\\+\")")
.matcher(text)
.replaceAll(mr -> Matcher.quoteReplacement(mr.group().substring(1)));
You do not need a lambda here; you can use
String text = "test\\\\\"escape\\\"test\\ that";
String output = text.replaceAll("\\\\(\\\\*\")", "$1");
System.out.println(output);
// => test\"escape"test\ that
The \\(\\*") regex matches a \ and then captures zero or more backslashes with a " after them into Group 1, and the replacement is the Group 1 value.
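To see the three variants side by side, here is a minimal sketch. The class and method names are made up for illustration, and it assumes Java 9 or later, since Matcher.replaceAll with a function argument is not available before that.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackslashQuoteDemo {
    // One or more backslashes followed by a double quote.
    private static final Pattern RUN = Pattern.compile("\\\\+\"");

    // Original attempt: the lambda result is parsed as a replacement pattern,
    // so a leading backslash in it is consumed as an escape character.
    static String unquoted(String text) {
        return RUN.matcher(text).replaceAll(mr -> mr.group().substring(1));
    }

    // Fix: quote the lambda result so it is inserted literally.
    static String quoted(String text) {
        return RUN.matcher(text)
                  .replaceAll(mr -> Matcher.quoteReplacement(mr.group().substring(1)));
    }

    // Alternative without a lambda: consume one backslash, keep the rest via group 1.
    static String viaGroup(String text) {
        return text.replaceAll("\\\\(\\\\*\")", "$1");
    }

    public static void main(String[] args) {
        String text = "test\\\\\"escape\\\"test\\";
        System.out.println(unquoted(text)); // test"escape"test\
        System.out.println(quoted(text));   // test\"escape"test\
        System.out.println(viaGroup(text)); // test\"escape"test\
    }
}
```

The group variant works because text inserted through $1 is copied as-is and is not parsed again as a replacement pattern.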
I'm having trouble writing a regex that matches strings such as NotificationGroup_n+En, where n is a number from 1 to 4, so that once the number is matched I can remove it.
String BEFORE process: NotificationGroup_4+E3
String AFTER process: NotificationGroup_E3
I removed n (a number from 1 to 4) and left _E with its number.
My question is how to write a regex for the string replace function that matches the number and the plus sign, leaving only the string with _En.
def String string = "Notification_Group_4+E3";
println(removeChar(string));
}
public static def removeChar(String string) {
if ((string.contains("1+"))||(string.contains("2+")||(string.contains("3+"))||(string.contains("4+")))) {
def stringReplaced = string.replace('4+', "");
return stringReplaced;
}
}
in groovy:
def result = "Notification_Group_4+E3".replaceFirst(/_\d\+(.*)/, '_$1')
println result
output:
~> groovy solution.groovy
Notification_Group_E3
~>
Regex explanation:
we use groovy slashy strings /.../ to define the regex. This makes escaping simpler
we first match on underscore _
Then we match on a single digit (0-9) using the predefined character class \d as described in the javadoc for the java Pattern class.
We then match for one + character. We have to escape this with a backslash \ since + without escaping in regular expressions means "one or more" (see greedy quantifiers in the javadocs) . We don't want one or more, we want just a single + character.
We then create a regex capturing group as described in the logical operators part of the java Pattern regex using the parens expression (.*). We do this so that we are not locked into the input string ending with E3. This way the input string can end in an arbitrary string and the pattern will still work. This essentially says "capture a group and include any character (that is the . in regex) any number of times (that is the * in regex)" which translates to "just capture the rest of the line, whatever it is".
Finally we replace with _$1, i.e. just underscore followed by whatever the capturing group captured. The $1 is a "back reference" to the "first captured group" as documented in, for example, the java Matcher javadocs.
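For readers on plain Java rather than Groovy, the same pattern can be sketched as below. The class and method names are hypothetical, and the second method is an assumption based on the question's "1 to 4" wording rather than part of the answer above. Without slashy strings, every regex backslash has to be doubled in the string literal.

```java
public class RemoveGroupNumber {
    // Same pattern as the Groovy slashy string /_\d\+(.*)/, written as a Java literal.
    static String stripNumber(String input) {
        return input.replaceFirst("_\\d\\+(.*)", "_$1");
    }

    // Stricter variant (assumption): only the digits 1-4 are removed.
    static String stripNumber1to4(String input) {
        return input.replaceFirst("_[1-4]\\+", "_");
    }

    public static void main(String[] args) {
        System.out.println(stripNumber("Notification_Group_4+E3"));     // Notification_Group_E3
        System.out.println(stripNumber1to4("Notification_Group_4+E3")); // Notification_Group_E3
        System.out.println(stripNumber1to4("Notification_Group_7+E3")); // Notification_Group_7+E3
    }
}
```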
Try this regex: \d.*?\+
In Java:
String string = "Notification_Group_4+E3";
System.out.print(string.replaceAll("\\d.*?\\+", ""));
output :
Notification_Group_E3
The simple one-liner:
String res = 'Notification_Group_4+E3'.replaceAll( /_\d+\+/, '_' )
assert 'Notification_Group_E3' == res
I'm trying to get true in the following test. I have a string with a backslash that, for some reason, isn't recognized.
String s = "Good news\\ everyone!";
Boolean test = s.matches("(.*)news\\.");
System.out.println(test);
I've tried a lot of variants, but only (.*)news(.*) works. That, however, accepts any characters after news, and I need it to match only when news is followed by \.
How can I do that?
Group the elements at the end:(.*)news\\(.*)
You can use this instead :
Boolean test = s.matches("(.*)news\\\\(.*)");
Try something like:
Boolean test = s.matches(".*news\\\\.*");
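Here .* means any number of characters, followed by news, followed by one literal backslash (written as four backslashes in the Java string literal: two for the regex escape, each doubled again for the string literal), and then any number of characters after that (possibly zero).
Your regex, written as the Java literal "(.*)news\\.", reaches the regex engine as (.*)news\. and means:
(.*) any number of characters
news the literal text news
\. a literal dot
That cannot match the string in your program, "Good news\ everyone!", which has a backslash (and more text) after news, not a dot.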
You are testing for an escaped occurrence of a literal dot: ".".
Refactor your pattern as follows (inferring the last part as you need it for a full match):
String s = "Good news\\ everyone!";
System.out.println(s.matches("(.*)news\\\\.*"));
Output
true
Explanation
The back-slash is used to escape characters and the back-slash itself in Java Strings
In Java Pattern representations, you need to double-escape your back-slashes for representing a literal back-slash ("\\\\"), as double-back-slashes are already used to represent special constructs (e.g. \\p{Punct}), or escape them (e.g. the literal dot \\.).
String.matches will attempt to match the whole String against your pattern, so you need the terminal part of the pattern I've added
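The escaping levels described above can be checked with a small sketch. The class and method names are invented for illustration. The last method uses Pattern.quote as one possible way to avoid counting backslashes by hand; it is an addition, not something the answers above rely on.

```java
import java.util.regex.Pattern;

public class BackslashMatchDemo {
    // The runtime value is: Good news\ everyone!
    static final String S = "Good news\\ everyone!";

    // Java literal "\\\\" -> regex \\ -> one literal backslash in the input.
    static boolean fourBackslashes() {
        return S.matches(".*news\\\\.*");
    }

    // Java literal "\\." -> regex \. -> a literal dot, which S does not have after "news".
    static boolean escapedDot() {
        return S.matches("(.*)news\\.");
    }

    // Pattern.quote turns the text into a literal-matching regex.
    static boolean quotedLiteral() {
        return S.matches(".*" + Pattern.quote("news\\") + ".*");
    }

    public static void main(String[] args) {
        System.out.println(fourBackslashes()); // true
        System.out.println(escapedDot());      // false
        System.out.println(quotedLiteral());   // true
    }
}
```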
you can try this :
String s = "Good news\\ everyone!";
Boolean test = s.matches("(.*)news\\\\(.*)");
System.out.println(test);
I try to explain my problem with a little example.
I implemented version 1 and version 2, but I didn't get the desired result. Which replacement-parameter do I have to use to get the desired result with the replaceAll method ?
Version 1:
String s = "TEST";
s = s.replaceAll("TEST", "TEST\nTEST");
System.out.println(s);
Output:
TEST
TEST
Version 2:
String s = "TEST";
s = s.replaceAll("TEST", "TEST\\nTEST");
System.out.println(s);
Output:
TESTnTEST
Desired Output:
TEST\nTEST
From the javadoc of String#replaceAll(String, String):
Note that backslashes (\) and dollar signs ($) in the replacement
string may cause the results to be different than if it were being
treated as a literal replacement string; see Matcher.replaceAll. Use
Matcher.quoteReplacement(java.lang.String) to suppress the special
meaning of these characters, if desired.
s = s.replaceAll("TEST", Matcher.quoteReplacement("TEST\\nTEST"));
You still need 2 backslashes, as \ is a metachar for string literals.
You can also use 4 backslashes without Matcher.quoteReplacement:
you want one \ in the output
you need to escape it with \, as \ is a metachar for replacement strings: \\
you need to escape both with \, as \ is a metachar for string literals: \\\\
s = s.replaceAll("TEST", "TEST\\\\nTEST");
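A short sketch that puts the variants next to each other (class and method names are hypothetical). It shows which replacement strings keep the backslash and which one loses it.

```java
import java.util.regex.Matcher;

public class ReplacementEscapingDemo {
    // quoteReplacement escapes the backslash for the replacement syntax.
    static String withQuoteReplacement(String s) {
        return s.replaceAll("TEST", Matcher.quoteReplacement("TEST\\nTEST"));
    }

    // Four backslashes in the literal -> two in the replacement -> one in the output.
    static String withFourBackslashes(String s) {
        return s.replaceAll("TEST", "TEST\\\\nTEST");
    }

    // Two backslashes in the literal -> one in the replacement, where it only escapes the n.
    static String withTwoBackslashes(String s) {
        return s.replaceAll("TEST", "TEST\\nTEST");
    }

    public static void main(String[] args) {
        System.out.println(withQuoteReplacement("TEST")); // TEST\nTEST
        System.out.println(withFourBackslashes("TEST"));  // TEST\nTEST
        System.out.println(withTwoBackslashes("TEST"));   // TESTnTEST
    }
}
```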
Don't use replaceAll()!
replaceAll() does a regex search and replace, but your task doesn't need regex. Just use the plain-text version replace(), which also replaces all occurrences.
You need a literal backslash, which is coded as two backslashes in a Java String literal:
String s = "TEST";
s = s.replace("TEST", "TEST\\nTEST");
System.out.println(s);
Output:
TEST\nTEST
My goal is to check whether some text contains any of a set of specific characters (*,^,+,?,$,[],[^]), like:
?test.test => true
test.test => false
test^test => true
test:test => false
test-test$ => true
test-test => false
I've already created a regex for the requirement above, but I'm not sure about it.
^(.*)([\[\]\^\$\?\*\+])(.*)$
It would be good to know whether it can be optimized in some way.
Your regex is already very simple, so there is little to optimize. You can only make it a bit shorter or more readable.
Also, if you use the matches() method of Java's String class, you don't need the ^ and $ at both ends.
.*([\\[\\]^$?*+]).*
Use double backslashes (\\) in a Java string literal; otherwise use a single backslash (\).
Note that I have removed the capture groups around .* along with the escape character \ in front of ^$?*+, since those characters are literal inside the character class [].
TL;DR
The quickest regex to do the job is
# ^[^\]\[^$?*+]*+([\]\[^$?*+])
^ #start of the string
[^ #any character BUT...
\]\[^$?*+ #...these ones (^$?*+ aren't special inside a character class)
]*+ #zero or more times (possessive quantifier)
([ #capture any of...
\]\[^$?*+ #...these characters
])
Be careful that in a java string, you need to escape the \ as well, so you should transform every \ into \\.
Discussion
At first, two regexes come to mind:
[\]\[^$?*+], which will match only the character you want inside the string.
^.*[\]\[^$?*+], which will match your string up to the desired character.
It's actually important performance-wise to understand the difference between the case with .* at the beginning and the one with no wildcard at all.
When searching for the pattern, the first .* will make the regex engine eat all the string, then backtrack character by character to see if it's a match for your character range [...]. So the regex will actually search from the end of the string.
This is an advantage when the character you want is near the end, and a disadvantage when it is near the beginning.
On the other case, the regex engine will try every character, beginning from the left, until it matches what you want.
You can see what I mean with these two examples from the excellent regex101.com:
with the .*, the match is found in 26 steps when the character is near the beginning, and in 8 when it's near the end: http://regex101.com/r/oI3pS1/#debugger
without it, it is found in 5 steps when near the beginning and 23 when near the end
Now, if you want to combine these two approaches you can use the tl;dr answer: you eat everything that isn't your character, then you match your character (or fail if there isn't one).
In our example, it takes 7 steps wherever your character is in the string (and 7 steps even if there is no such character, thanks to the possessive quantifier).
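As a rough Java sketch of the TL;DR pattern (the class and method names are made up, and the capture group is dropped because only a yes/no answer is needed here), with every \ doubled for the string literal. Since the pattern does not cover the whole string, it is used with find() instead of matches(). It makes no claim about step counts on the Java engine.

```java
import java.util.regex.Pattern;

public class SpecialCharCheck {
    // ^[^\]\[^$?*+]*+[\]\[^$?*+] as a Java literal:
    // possessively skip everything that is not a wanted character, then require one.
    private static final Pattern SPECIAL =
            Pattern.compile("^[^\\]\\[^$?*+]*+[\\]\\[^$?*+]");

    static boolean containsSpecial(String text) {
        return SPECIAL.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsSpecial("?test.test")); // true
        System.out.println(containsSpecial("test.test"));  // false
        System.out.println(containsSpecial("test^test"));  // true
        System.out.println(containsSpecial("test:test"));  // false
        System.out.println(containsSpecial("test-test$")); // true
        System.out.println(containsSpecial("test-test"));  // false
    }
}
```

Compiling the pattern once into a static field avoids recompiling it on every call, which String.matches would do.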
That should also work:
String regex = ".*[\\[\\]^$?*+].*";
String test1 = "?test.test";
String test2 = "test.test";
String test3 = "test^test";
String test4 = "test:test";
String test5 = "test-test$";
String test6 = "test-test";
System.out.println(test1.matches(regex));
System.out.println(test2.matches(regex));
System.out.println(test3.matches(regex));
System.out.println(test4.matches(regex));
System.out.println(test5.matches(regex));
System.out.println(test6.matches(regex));
I have a long Java String that contains lots of escaped double-quotes:
// Prints: \"Hello my name is Sam.\" \"And I am a good boy.\"
System.out.println(bigString);
I want to remove all the escaped double-quotes (\") and replace them with normal double-quotes (") so that I get:
// Prints: "Hello my name is Sam." "And I am a good boy."
System.out.println(bigString);
I thought this was a no-brainer. My best attempt of:
bigString = bigString.replaceAll("\\", "");
Throws the following exception:
Unexpected internal error near index 1
Any ideas? Thanks in advance.
Everybody is telling you to use replaceAll, but the better answer is really to use replace.
replaceAll requires a regular expression
replace is just a plain string search and replace
So like this:
bigString = bigString.replace("\\\"", "\"");
Note that this is also faster because no regular expression is needed.
replaceAll uses regular expressions, so add another set of \\
bigString = bigString.replaceAll("\\\\\"", "\"");
Explanation why:
"\\" is interpreted by Java as a single \. However, if you use only that as the parameter, it becomes the regular expression \. A \ in a regular expression escapes the next character. Since there is none, it throws an exception.
When you write "\\\\\"" in Java, the compiler first turns it into the regular expression \\", which the regular expression implementation then reads as "a backslash followed by a double quote".
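The two layers of escaping can be tried out with a sketch like the following. The class and method names are hypothetical, and the input string is built so that it really contains backslash-quote pairs.

```java
import java.util.regex.PatternSyntaxException;

public class UnescapeQuotesDemo {
    // Plain-text replacement: the Java literal "\\\"" is the two characters \ and ".
    static String withReplace(String s) {
        return s.replace("\\\"", "\"");
    }

    // Regex replacement: five backslashes plus a quote give the regex
    // "literal backslash, then double quote".
    static String withReplaceAll(String s) {
        return s.replaceAll("\\\\\"", "\"");
    }

    // The original attempt: the regex is a lone backslash, a dangling escape.
    static boolean singleBackslashRegexFails(String s) {
        try {
            s.replaceAll("\\", "");
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String bigString = "\\\"Hello my name is Sam.\\\" \\\"And I am a good boy.\\\"";
        System.out.println(withReplace(bigString));    // "Hello my name is Sam." "And I am a good boy."
        System.out.println(withReplaceAll(bigString)); // "Hello my name is Sam." "And I am a good boy."
        System.out.println(singleBackslashRegexFails(bigString)); // true
    }
}
```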
String str = "\\\"Hello my name is Sam.\\\" \\\"And I am a good boy.\\\"";
System.out.println(str.replace("\\\"", "\""));
Output:
"Hello my name is Sam." "And I am a good boy."
The first argument to replaceAll is a regular expression. You pass \, which is not a valid regex. Try:
bigString = bigString.replaceAll("\\\\", "");