I'll try to explain my problem with a small example.
I implemented version 1 and version 2, but neither gives the desired result. Which replacement parameter do I have to pass to the replaceAll method to get the desired output?
Version 1:
String s = "TEST";
s = s.replaceAll("TEST", "TEST\nTEST");
System.out.println(s);
Output:
TEST
TEST
Version 2:
String s = "TEST";
s = s.replaceAll("TEST", "TEST\\nTEST");
System.out.println(s);
Output:
TESTnTEST
Desired Output:
TEST\nTEST
From the javadoc of String#replaceAll(String, String):
Note that backslashes (\) and dollar signs ($) in the replacement
string may cause the results to be different than if it were being
treated as a literal replacement string; see Matcher.replaceAll. Use
Matcher.quoteReplacement(java.lang.String) to suppress the special
meaning of these characters, if desired.
s = s.replaceAll("TEST", Matcher.quoteReplacement("TEST\\nTEST"));
You still need 2 backslashes, as \ is a metachar for string literals.
You can also use 4 backslashes without Matcher.quoteReplacement:
you want one \ in the output
you need to escape it with \, as \ is a metachar for replacement strings: \\
you need to escape both with \, as \ is a metachar for string literals: \\\\
s = s.replaceAll("TEST", "TEST\\\\nTEST");
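The two variants above can be compared side by side. This is only a minimal sketch; the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;

class LiteralBackslashReplacement {
    // Lets Matcher.quoteReplacement escape the backslash for the replacement syntax.
    static String withQuoteReplacement(String input) {
        return input.replaceAll("TEST", Matcher.quoteReplacement("TEST\\nTEST"));
    }

    // Escapes by hand: four backslashes in the literal, two at runtime, one in the result.
    static String withFourBackslashes(String input) {
        return input.replaceAll("TEST", "TEST\\\\nTEST");
    }

    public static void main(String[] args) {
        System.out.println(withQuoteReplacement("TEST")); // TEST\nTEST
        System.out.println(withFourBackslashes("TEST"));  // TEST\nTEST
    }
}
```

Both calls should produce the same ten-character string, with a backslash and an n between the two words rather than a line break.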
Don't use replaceAll()!
replaceAll() does a regex search and replace, but your task doesn't need regex. Just use the plain-text version, replace(), which also replaces all occurrences.
You need a literal backslash, which is coded as two backslashes in a Java String literal:
String s = "TEST";
s = s.replace("TEST", "TEST\\nTEST");
System.out.println(s);
Output:
TEST\nTEST
Related
I have an HTML string read from a file. I need to escape all double quotes, so I do it this way:
String content=readFile(file.getAbsolutePath(), StandardCharsets.UTF_8);
content=content.replaceAll("\"","\\\"");
System.out.println(content);
However, the double quotes are not escaped and the string is the same as it was before replaceAll method. When I do
String content=readFile(file.getAbsolutePath(), StandardCharsets.UTF_8);
content=content.replaceAll("\"","^^^");
System.out.println(content);
All double quotes are replaced with ^^^.
Why doesn't content.replaceAll("\"","\\\""); work?
You need to use 4 backslashes to denote one literal backslash in the replacement pattern:
content=content.replaceAll("\"","\\\\\"");
Here, \\\\ means a literal \ and \" means a literal ".
More details at Java String#replaceAll documentation:
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string; see Matcher.replaceAll
And later in Matcher.replaceAll documentation:
Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string.
Another fun replacement is replacing quotes with dollar sign: the replacement is "\\$". The 2 \s turn into 1 literal \ for the regex engine and it escapes the special character $ used to define backreferences. So, now it is a literal inside the replacement pattern.
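As a rough illustration of the dollar-sign case, the sketch below replaces quotes with $ once by escaping by hand and once through Matcher.quoteReplacement. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;

class DollarReplacement {
    // "\\$" is the two characters \ and $ at runtime; the backslash makes $ a literal.
    static String escapedByHand(String input) {
        return input.replaceAll("\"", "\\$");
    }

    // quoteReplacement adds the escaping for us.
    static String quoted(String input) {
        return input.replaceAll("\"", Matcher.quoteReplacement("$"));
    }

    public static void main(String[] args) {
        System.out.println(escapedByHand("say \"hi\"")); // say $hi$
        System.out.println(quoted("say \"hi\""));        // say $hi$
    }
}
```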
You need to do :
String content = "some content with \" quotes.";
content = content.replaceAll("\"", "\\\\\"");
Why will this work?
The literal "\"" represents the " symbol, while you need \" in the result.
replaceAll treats \ in the replacement as an escape character, so to produce the symbol \ the replacement must contain \\, followed by ". The replacement that replaceAll sees must therefore be \\".
In a Java string literal each \ and the " must in turn be escaped with another \, so \\" is written as \\\\\":
x = x.replaceAll("\"", "\\\\\"");
It took me way too long in Java to discover Pattern.quote and Matcher.quoteReplacement. These will let you achieve what you are trying to do here, which is a simple "find" and "replace", without any regex and escape logic. The Pattern.quote here would not be necessary, but it shows how you can ensure that the "find" part is not interpreted as a regex string:
@Test
public void testEscapeQuotes()
{
String content="some content with \"quotes\".";
content=content.replaceAll(Pattern.quote("\""), Matcher.quoteReplacement("\\\""));
Assert.assertEquals("some content with \\\"quotes\\\".", content);
}
Remember that you can also use the simple .replace method which will also "replaceAll" but will not interpret your parameters as regular expressions:
@Test
public void testEscapeQuotes()
{
String content="some content with \"quotes\".";
content=content.replace("\"", "\\\"");
Assert.assertEquals("some content with \\\"quotes\\\".", content);
}
Much easier with Apache Commons Text:
System.out.println(StringEscapeUtils.escapeJava("\""));
Output:
\"
Honestly, I am surprised by the behaviour, but it seems like you need to double-escape the backslash:
System.out.println("\"Hello world\"".replaceAll("\"", "\\\\\""));
which outputs:
\"Hello world\"
I am making a program that replaces a certain part of the string.
String x = "hello";
x=x.replaceAll("e","\\\\s");
System.out.println(x);
output: h\sllo
but for
System.out.println("\\s");
output: \s
Why do we need extra escape characters in the first case?
You need \\ for a single \ character in a regex.
But a Java string literal also interprets the backslash, so you need to escape each \ for the string as well. Hence you need 2+2=4 backslashes to match a single \ (2 for the string literal and 2 for the regex engine).
Also note that the 2nd argument to the String#replaceAll method is also interpreted by the regex engine due to the potential presence of back-references, and that is why the same escaping rule applies to the replacement string.
Your replacement string therefore produces a literal \ followed by a literal s.
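To make the two layers visible, here is a small sketch (class and method names invented for the example) that checks how many characters the literal holds at runtime and what replaceAll makes of them.

```java
class EscapeLayers {
    // The literal "\\\\s" holds three characters at runtime: \ \ s
    static int runtimeLength() {
        return "\\\\s".length();
    }

    // The replacement engine then folds the two backslashes into one.
    static String replaced() {
        return "hello".replaceAll("e", "\\\\s");
    }

    public static void main(String[] args) {
        System.out.println(runtimeLength()); // 3
        System.out.println(replaced());      // h\sllo
    }
}
```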
I need to convert all occurrences of \\r to \r in a string.
My attempt is as follows:
String test = "\\r New Value";
System.out.println(test);
test = test.replaceAll("\\r", "\r");
System.out.println("output: " + test);
Output:
\r New Value
output: \r New Value
With replaceAll you would have to use .replaceAll("\\\\r", "\r"); because
to represent \ in a regex you need to escape it, so you need to pass \\ to the regex engine,
and to create a string literal for a single \ you need to write it as "\\".
A clearer way would be to use replace("\\r", "\r"); which will automatically escape all regex metacharacters.
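A quick way to see the difference is to run all three calls on the same input. In the sketch below (names are illustrative only), the original attempt leaves the string untouched because the regex \r looks for a real carriage return, which the input does not contain.

```java
class BackslashRToCarriageReturn {
    // Regex \r matches a carriage return character, not backslash + r, so nothing changes.
    static String originalAttempt(String s) {
        return s.replaceAll("\\r", "\r");
    }

    // Regex \\r matches a literal backslash followed by r.
    static String viaReplaceAll(String s) {
        return s.replaceAll("\\\\r", "\r");
    }

    // replace() takes the target literally: backslash followed by r.
    static String viaReplace(String s) {
        return s.replace("\\r", "\r");
    }

    public static void main(String[] args) {
        String test = "\\r New Value";
        System.out.println(originalAttempt(test).equals(test));  // true
        System.out.println((int) viaReplaceAll(test).charAt(0)); // 13
        System.out.println((int) viaReplace(test).charAt(0));    // 13
    }
}
```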
You will have to escape each of the \ symbols.
Try:
test = test.replaceAll("\\\\r", "\\r");
\ is used for escaping in Java - so every time you actually want to match a backslash in a string, you need to write it twice.
In addition, and this is what most people seem to be missing - replaceAll() takes a regex, and you just want to replace based on simple string substitution - so use replace() instead. (You can of course technically use replaceAll() on simple strings as well by escaping the regex, but then you get into either having to use Pattern.quote() on the parameters, or writing 4 backslashes just to match one backslash, because it's a special character in regular expressions as well as Java!)
A common misconception is that replace() just replaces the first instance, but this is not the case:
Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence.
...the only difference is that it works for literals, not for regular expressions.
In this case you're inadvertently escaping r, which produces a carriage return! So based on the above, to avoid this you actually want:
String test = "\\\\r New Value";
and:
test = test.replace("\\\\r", "\\r");
Character \r is a carriage return :)
What is the difference between \r and \n?
To print \, use the escape character \:
System.out.println("\\");
Solution
Escape \ with \ :)
public static void main(String[] args) {
String test = "\\\\r New Value";
System.out.println(test);
test = test.replaceAll("\\\\r", "\\r");
System.out.println("output: " + test);
}
result
\\r New Value
output: \r New Value
I have a string "\\u003c", which belongs to UTF-8 charset. I am unable to decode it to unicode because of the presence of double backslashes. How do I get "\u003c" from "\\u003c"? I am using Java.
I tried with,
myString.replace("\\\\", "\\");
but could not achieve what i wanted.
This is my code,
String myString = FileUtils.readFileToString(file);
String a = myString.replace("\\\\", "\\");
byte[] utf8 = a.getBytes();
// Convert from UTF-8 to Unicode
a = new String(utf8, "UTF-8");
System.out.println("Converted string is:"+a);
and content of the file is
\u003c
You can use String#replaceAll:
String str = "\\\\u003c";
str= str.replaceAll("\\\\\\\\", "\\\\");
System.out.println(str);
It looks weird because the first argument is a string defining a regular expression, and \ is a special character both in string literals and in regular expressions. To actually put a \ in our search string, we need to escape it (\\) in the literal. But to actually put a \ in the regular expression, we have to escape it at the regular expression level as well. So to literally get \\ in a string, we need write \\\\ in the string literal; and to get two literal \\ to the regular expression engine, we need to escape those as well, so we end up with \\\\\\\\. That is:
String Literal   String                      Meaning to Regex
−−−−−−−−−−−−−−   −−−−−−−−−−−−−−−−−−−−−−−−−   −−−−−−−−−−−−−−−−−−−−−−−−−
\                Escape the next character   Would depend on next char
\\               \                           Escape the next character
\\\\             \\                          Literal \
\\\\\\\\         \\\\                        Literal \\
In the replacement parameter, even though it's not a regex, it still treats \ and $ specially — and so we have to escape them in the replacement as well. So to get one backslash in the replacement, we need four in that string literal.
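The layers can be checked directly. The following is just a sketch with made-up names: it measures the eight-backslash literal at runtime, confirms that as a regex it matches exactly two backslashes, and uses it to collapse a doubled backslash.

```java
import java.util.regex.Pattern;

class BackslashLayers {
    // Eight backslashes in the literal, four at runtime, two literal backslashes to the regex engine.
    static final String REGEX = "\\\\\\\\";

    static int runtimeLength() {
        return REGEX.length();
    }

    static boolean matchesTwoBackslashes() {
        return Pattern.matches(REGEX, "\\\\");
    }

    // Four backslashes in the replacement literal produce one backslash in the result.
    static String collapse(String s) {
        return s.replaceAll(REGEX, "\\\\");
    }

    public static void main(String[] args) {
        System.out.println(runtimeLength());         // 4
        System.out.println(matchesTwoBackslashes()); // true
        System.out.println(collapse("\\\\u003c"));   // one backslash followed by u003c
    }
}
```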
Not sure if you're still looking for a solution to your problem (since you have an accepted answer) but I will still add my answer as a possible solution to the stated problem:
String str = "\\u003c";
Matcher m = Pattern.compile("(?i)\\\\u([\\da-f]{4})").matcher(str);
if (m.find()) {
String a = String.valueOf((char) Integer.parseInt(m.group(1), 16));
System.out.printf("Unicode String is: [%s]%n", a);
}
OUTPUT:
Unicode String is: [<]
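If the input can hold several escapes, the same idea can be extended to decode every occurrence. This is only a sketch, not part of the answer above: the class name is invented, it accepts a lowercase u only, and it wraps each decoded character in Matcher.quoteReplacement in case the character turns out to be $ or a backslash.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeEscapeDecoder {
    // Literal backslash, the letter u, then four hex digits captured as group 1.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String decode(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // Quote the replacement so a decoded '$' or backslash is taken literally.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("a \\u003c b \\u003e c")); // a < b > c
    }
}
```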
Regarding the problem of "replacing double backslashes with single backslashes" or, more generally, "replacing a simple string, containing \, with a different simple string, containing \" (which is not entirely the OP problem, but part of it):
Most of the answers in this thread mention replaceAll, which is a wrong tool for the job here. The easier tool is replace, but confusingly, the OP states that replace("\\\\", "\\") doesn't work for him, that's perhaps why all answers focus on replaceAll.
Important note for people with JavaScript background:
Note that replace(CharSequence, CharSequence) in Java does replace ALL occurrences of a substring - unlike in JavaScript, where it only replaces the first one!
Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence.
On the other hand, replaceAll(String regex, String replacement) -- more docs also here -- is treating both parameters as more than regular strings:
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string.
(this is because $ is used for backreferences to the captured regex groups and \ is used to escape characters, hence if you want to use them literally, you need to escape them).
In other words, both first and 2nd params of replace and replaceAll behave differently. For replace you need to double the \ in both params (standard escaping of a backslash in a string literal), whereas in replaceAll, you need to quadruple it! (standard string escape + function-specific escape)
To sum up, for simple replacements, one should stick to replace("\\\\", "\\") (it needs only one escaping, not two).
https://ideone.com/ANeMpw
System.out.println("a\\\\b\\\\c"); // "a\\b\\c"
System.out.println("a\\\\b\\\\c".replaceAll("\\\\\\\\", "\\\\")); // "a\b\c"
//System.out.println("a\\\\b\\\\c".replaceAll("\\\\\\\\", "\\")); // runtime error
System.out.println("a\\\\b\\\\c".replace("\\\\", "\\")); // "a\b\c"
https://www.ideone.com/Fj4RCO
String str = "\\\\u003c";
System.out.println(str); // "\\u003c"
System.out.println(str.replaceAll("\\\\\\\\", "\\\\")); // "\u003c"
System.out.println(str.replace("\\\\", "\\")); // "\u003c"
Another option, capture one of the two slashes and replace both slashes with the captured group:
public static void main(String args[])
{
String str = "C:\\\\";
str= str.replaceAll("(\\\\)\\\\", "$1");
System.out.println(str);
}
Try using,
myString.replaceAll("[\\\\]{2}", "\\\\");
This replaces each double backslash with a single backslash.
public static void main(String args[])
{
String str = "\\\\u003c"; // two backslashes at runtime
str= str.replaceAll("\\\\\\\\", "\\\\"); // regex matches two backslashes, replacement yields one
System.out.println(str);
}
"\\u003c" does not 'belong to UTF-8 charset' at all. It is five UTF-8 characters: '\', '0', '0', '3', and 'c'. The real question here is why are the double backslashes there at all? Or, are they really there? and is your problem perhaps something completely different? If the String "\\u003c" is in your source code, there are no double backslashes in it at all at runtime, and whatever your problem may be, it doesn't concern decoding in the presence of double backslashes.
The line
System.out.println("\\");
prints a single back-slash (\). And
System.out.println("\\\\");
prints double back-slashes (\\). Understood!
But why in the following code:
class ReplaceTest
{
public static void main(String[] args)
{
String s = "hello.world";
s = s.replaceAll("\\.", "\\\\");
System.out.println(s);
}
}
is the output:
hello\world
instead of
hello\\world
After all, the replaceAll() method is replacing a dot (\\.) with (\\\\).
Can someone please explain this?
When replacing characters using regular expressions, you're allowed to use backreferences in the replacement, such as $1, to insert a group captured by the match.
This, however, means that $ is a special character there, and so is the backslash that is used to escape it, so if you actually want to use a backslash it needs to be escaped.
Which means it needs to actually be escaped twice when using it in a Java string. (First for the string parser, then for the regex parser.)
The javadoc of replaceAll says:
Note that backslashes ( \ ) and dollar signs ($) in the replacement
string may cause the results to be different than if it were being
treated as a literal replacement string; see Matcher.replaceAll. Use
Matcher.quoteReplacement(java.lang.String) to suppress the special
meaning of these characters, if desired.
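Applied to the question's example, this could look like the sketch below (class and method names are hypothetical). Four backslashes in the literal give one in the output, so eight are needed for two, unless Matcher.quoteReplacement does the second round of escaping.

```java
import java.util.regex.Matcher;

class DotToBackslash {
    // Four backslashes in the literal: one backslash in the result.
    static String oneBackslash(String s) {
        return s.replaceAll("\\.", "\\\\");
    }

    // Eight backslashes in the literal: two backslashes in the result.
    static String twoBackslashes(String s) {
        return s.replaceAll("\\.", "\\\\\\\\");
    }

    // quoteReplacement handles the replacement-level escaping, leaving only the literal-level one.
    static String twoBackslashesQuoted(String s) {
        return s.replaceAll("\\.", Matcher.quoteReplacement("\\\\"));
    }

    public static void main(String[] args) {
        System.out.println(oneBackslash("hello.world"));         // hello\world
        System.out.println(twoBackslashes("hello.world"));       // hello\\world
        System.out.println(twoBackslashesQuoted("hello.world")); // hello\\world
    }
}
```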
This is a formatted addendum to my comment:
s = s.replaceAll("\\.", Matcher.quoteReplacement("\\"));
is more readable and meaningful than
s = s.replaceAll("\\.", "\\\\");
If you don't need regex for replacing and just need to replace exact strings, escape the regex control characters before calling replaceAll:
String trickyString = "$Ha!I'm tricky|.|";
String safeToUseInReplaceAllString = Pattern.quote(trickyString);
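Putting both quoting helpers together, a literal replacement through replaceAll might be sketched like this. The helper name replaceLiterally is made up for the example; for plain strings it should behave the same as String#replace.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteBothSides {
    // Quotes the search text for the regex engine and the replacement for the replacement syntax.
    static String replaceLiterally(String text, String target, String replacement) {
        return text.replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String trickyString = "$Ha!I'm tricky|.|";
        String text = "say " + trickyString + " now";
        System.out.println(replaceLiterally(text, trickyString, "$1\\")); // say $1\ now
        System.out.println(text.replace(trickyString, "$1\\"));           // say $1\ now
    }
}
```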
The backslash is an escape character in Java strings, i.e. the backslash has a predefined meaning in Java. You have to write "\\" to define a single backslash. If you want the regex \w, then you must write "\\w" in your string literal. If you want to use a backslash as a literal in the regex, you have to type \\\\, as \ is also an escape character in regular expressions.
I believe in this particular case it would be easier to use replace instead of replaceAll.
Reverend Gonzo has the correct answer when he talks about escaping the character.
Using replaceAll:
s = s.replaceAll("\\.", "\\\\\\\\");
Using replace:
s = s.replace(".", "\\\\");
replace just takes a string to match to, not a regular expression.
I don't like this implementation of regex. We should be able to escape characters with a single '\', not '\\'. But anyway, if you want to get THIS out of THIS.Out_Of_That you can do:
String prefix = role.replaceFirst("(\\.).*", "");
So you get prefix = THIS;