Java - escaping double quotes in string from file - java

I have an HTML string read from a file. I need to escape all double quotes, so I do it this way:
String content=readFile(file.getAbsolutePath(), StandardCharsets.UTF_8);
content=content.replaceAll("\"","\\\"");
System.out.println(content);
However, the double quotes are not escaped and the string is the same as it was before replaceAll method. When I do
String content=readFile(file.getAbsolutePath(), StandardCharsets.UTF_8);
content=content.replaceAll("\"","^^^");
System.out.println(content);
All double quotes are replaced with ^^^.
Why doesn't content.replaceAll("\"","\\\""); work?

You need to use 4 backslashes to denote one literal backslash in the replacement pattern:
content=content.replaceAll("\"","\\\\\"");
Here, \\\\ means a literal \ and \" means a literal ".
More details at Java String#replaceAll documentation:
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string; see Matcher.replaceAll
And later in Matcher.replaceAll documentation:
Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string.
Another fun replacement is replacing quotes with dollar sign: the replacement is "\\$". The 2 \s turn into 1 literal \ for the regex engine and it escapes the special character $ used to define backreferences. So, now it is a literal inside the replacement pattern.
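The two replacements described above can be sketched as follows. This is only an illustration: the class name ReplacementEscapes, the method names and the sample input are assumptions, not part of the original question.

```java
final class ReplacementEscapes {
    // Puts a backslash in front of every double quote using replaceAll.
    static String escapeQuotes(String s) {
        // The literal below holds three characters at runtime: \ \ "
        // The replacement parser reads the two backslashes as one literal backslash.
        return s.replaceAll("\"", "\\\\\"");
    }

    // Replaces every double quote with a dollar sign.
    static String quotesToDollar(String s) {
        // The literal below holds \ $ at runtime; the backslash stops the
        // dollar sign from being parsed as a group reference.
        return s.replaceAll("\"", "\\$");
    }

    public static void main(String[] args) {
        String html = "<a href=\"x\">";
        System.out.println(escapeQuotes(html));   // <a href=\"x\">
        System.out.println(quotesToDollar(html)); // <a href=$x$>
    }
}
```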

You need to do :
String content = "some content with \" quotes.";
content = content.replaceAll("\"", "\\\\\"");
Why will this work?
In a Java string literal, \" represents the " symbol, while the output you need is \".
To put a backslash in front of it, you have to escape that backslash in the literal as well, giving "\\\"". At runtime this string is \", where \ is the backslash character itself, not an escape.
However, replaceAll also treats \ in the replacement as an escape character, so it reads \" as just ". The backslash therefore has to be escaped once more for replaceAll, which means prefixing the literal with another \\:
x = x.replaceAll("\"", "\\\\\"");

It took me way too long in Java to discover Pattern.quote and Matcher.quoteReplacement. These help you achieve what you are trying to do here, which is a simple "find" and "replace", without any regex and escape logic. The Pattern.quote here is not necessary, but it shows how you can ensure that the "find" part is not interpreted as a regex:
@Test
public void testEscapeQuotes()
{
String content="some content with \"quotes\".";
content=content.replaceAll(Pattern.quote("\""), Matcher.quoteReplacement("\\\""));
Assert.assertEquals("some content with \\\"quotes\\\".", content);
}
Remember that you can also use the simple .replace method which will also "replaceAll" but will not interpret your parameters as regular expressions:
@Test
public void testEscapeQuotes()
{
String content="some content with \"quotes\".";
content=content.replace("\"", "\\\"");
Assert.assertEquals("some content with \\\"quotes\\\".", content);
}
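To show why the quoting helpers matter when the inputs do contain special characters, here is a hedged sketch. The helper name replaceLiteral, the class name and the sample strings are made up for illustration; only Pattern.quote, Matcher.quoteReplacement, replaceAll and replace come from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LiteralFindReplace {
    // Treats both arguments as plain text even though replaceAll is used.
    static String replaceLiteral(String s, String find, String replacement) {
        return s.replaceAll(Pattern.quote(find), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // "(n)" would be a capturing group and "$5" a group reference
        // if they were passed to replaceAll unquoted.
        System.out.println(replaceLiteral("price: (n)", "(n)", "$5")); // price: $5
        // String.replace gives the same result without any quoting.
        System.out.println("price: (n)".replace("(n)", "$5"));         // price: $5
    }
}
```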

Much easier with Apache Commons Text-
System.out.println(StringEscapeUtils.escapeJava("\""));
Output:
\"

Honestly, I am surprised by the behaviour, but it seems like you need to double-escape the backslash:
System.out.println("\"Hello world\"".replaceAll("\"", "\\\\\""));
which outputs:
\"Hello world\"

Related

nifi regex failed to escape backslash "\" [duplicate]

How to write a regular expression to match this \" (a backslash then a quote)? Assume I have a string like this:
click to search
I need to replace all the \" with a ", so the result would look like:
click to search
This one does not work: str.replaceAll("\\\"", "\"") because it only matches the quote. Not sure how to get around with the backslash. I could have removed the backslash first, but there are other backslashes in my string.
If you don't need any regex mechanisms (predefined character classes like \d, quantifiers, etc.), then instead of replaceAll, which expects a regex, use replace, which expects literals:
str = str.replace("\\\"","\"");
Both methods will replace all occurrences of targets, but replace will treat targets literally.
BUT if you really must use regex you are looking for
str = str.replaceAll("\\\\\"", "\"")
\ is a special character in regex (used, for instance, to create \d, the character class representing digits). To make the regex engine treat \ as a normal character you need to place another \ before it to turn off its special meaning (you need to escape it). So the regex we are trying to create is \\.
But to create a string literal representing the text \\, so you can pass it to the regex engine, you need to write it as four \ ("\\\\"), because \ is also a special character in String literals (the part of the code written as "..."), where it is used, for instance, as \t to represent a tab.
That is why you also need to escape \ there.
In short, you need to escape \ twice:
in regex \\
and then in String literal "\\\\"
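A minimal sketch of the double escaping just described, next to the literal alternative. The class name, method names and the sample input are assumptions chosen for illustration.

```java
final class UnescapeQuotes {
    // Turns every backslash-quote pair into a bare quote, using a regex.
    static String withRegex(String s) {
        // The first literal holds \ \ " at runtime, which the regex engine
        // reads as "a literal backslash followed by a quote".
        return s.replaceAll("\\\\\"", "\"");
    }

    // Same result with literal matching, so only one level of escaping.
    static String withLiteral(String s) {
        return s.replace("\\\"", "\"");
    }

    public static void main(String[] args) {
        // The path backslash has no quote after it, so it must survive.
        String in = "say \\\"hi\\\" in c:\\tmp";
        System.out.println(in);              // say \"hi\" in c:\tmp
        System.out.println(withRegex(in));   // say "hi" in c:\tmp
        System.out.println(withLiteral(in)); // say "hi" in c:\tmp
    }
}
```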
You don't need a regular expression.
str.replace("\\\"", "\"")
should work just fine.
The replace method takes two substrings and replaces all non-overlapping occurrences of the first with the second. Per the javadoc:
public String replace(CharSequence target,
CharSequence replacement)
Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence. The replacement proceeds from the beginning of the string to the end, for example, replacing "aa" with "b" in the string "aaa" will result in "ba" rather than "ab".
try this: str.replaceAll("\\\\\"", "\\\"")
because Java will replace \ twice:
(1) \\\\\" --> \\" (for string)
(2) \\" --> \" (for regex)

How Java replaceAll operation works with backslashes?

Why do I need four backslashes (\) to add one backslash into a String?
String replacedValue = neName.replaceAll(",", "\\\\,");
In the code above I have to replace every comma (,) with \, so why do I have to add three more backslashes (\)?
Can anybody explain this concept?
Escape once for Java, and a second time for regexp.
\ -> \\ -> \\\\
Or since you're not actually using regular expressions, take khelwood's advice and use replace(String,String) so you need to only escape once.
The documentation of String.replaceAll(regex, replacement) states:
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string; see Matcher.replaceAll.
The documentation of Matcher.replaceAll(replacement) then states:
backslashes are used to escape literal characters in the replacement string
So to put this more clearly, when you replace with \,, it is as if you were escaping the comma. But what you really want is the \ character, so you should escape it, giving \\,. Since \ also needs to be escaped in a Java string literal, the replacement String becomes "\\\\,".
If you are having a hard time remembering all this, you can use the method Matcher.quoteReplacement(s), whose goal is to correctly escape the replacement part. Your code would become:
String replacedValue = neName.replaceAll(",", Matcher.quoteReplacement("\\,"));
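The three spellings discussed in this thread can be compared side by side. This is a sketch only; the class and method names and the sample value are hypothetical.

```java
import java.util.regex.Matcher;

final class CommaEscapes {
    // Four backslashes in source: two at runtime, one in the output.
    static String viaReplaceAll(String s) {
        return s.replaceAll(",", "\\\\,");
    }

    // quoteReplacement adds the replacement-level escaping for us.
    static String viaQuoteReplacement(String s) {
        return s.replaceAll(",", Matcher.quoteReplacement("\\,"));
    }

    // replace() is literal, so only the string-literal escaping is needed.
    static String viaReplace(String s) {
        return s.replace(",", "\\,");
    }

    public static void main(String[] args) {
        String neName = "a,b,c";
        System.out.println(viaReplaceAll(neName));       // a\,b\,c
        System.out.println(viaQuoteReplacement(neName)); // a\,b\,c
        System.out.println(viaReplace(neName));          // a\,b\,c
    }
}
```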
\ starts an escape sequence in a string literal.
For example:
use \n or \r for a line break,
and \t for a tab.
Likewise, to print \ itself, which is special in a string literal, you have to escape it with another \, which gives us \\.
Now, replaceAll expects a regex. Since you're not using a regex, use replace as suggested in the comments.
String s = neName.replace(",", "\\,");
You have to first escape the backslash because it's a literal (giving \\), and then escape it again because of the regular expression (giving \\\\).
Therefore this -
String replacedValue = neName.replaceAll(",", "\\\\,"); // you need \\\\
You can use replace instead of replaceAll-
String replacedValue = neName.replace(",", "\\,");

How to split a string with double quotes " as the delimiter?

I tried splitting like this-
tableData.split("\\"")
but it does not work.
It seems that you tried to escape it the same way as you would escape |, which is "\\|". But the difference between | and " is that
| is metacharacter in regex engine (it represents OR operator)
" is metacharacter in Java language in string literal (it represents start/end of the string)
To escape any String metacharacter (like ") you need to place before it another String metacharacter responsible for escaping, which is \ (see note 1). So to create a String which contains ", like this is "quote", you would need to write it as
String s = "this is \"quote\"";
// ^^ ^^ these represent " literal, not end of string
Same idea is applied if we would like to create \ literal (we would need to escape it by placing another \ before it). For instance if we would want to create string representing c:\foo\bar we would need to write it as
String s = "c:\\foo\\bar";
// ^^ ^^ these will represent \ literal
So as you see \ is used to escape metacharacters (make them simple literals).
This character is used in Java language for Strings, but it also is used in regex engine to escape its metacharacters:
\, ^, $, ., |, ?, *, +, (, ), [, {.
If you would like to create regex which will match [ character you will need to use regex \[ but String representing this regex in Java needs to be written as
String leftBracketRegex = "\\[";
// ^^ - Remember what was said earlier?
// To create \ literal in String we need to escape it
So to split on [ we would need to invoke split("\\[") because regex representing [ is \[ which needs to be written as "\\[" in Java.
Since " is not special character in regex but it is special in String we need to escape it only in string literal by writing it as
split("\"");
1) \ is also used to create other characters, such as line separators (\n) and tab (\t). It can also be used to create Unicode characters like \uXXXX, where XXXX is the index of the character in the Unicode table in hexadecimal form.
You have escaped the \ by putting in \ twice, try
tableData.split("\"")
Why does this happen?
A backslash escapes the following character. Since the next character is another backslash, the second backslash will be escaped, thus the doublequote won't.
Your resulting escaped string is \", where it should really be just ".
Edit:
Also keep in mind, that String.split() interprets its pattern parameter as a regular expression, which has several special characters, which have to be escaped in the resulting string.
So if you want to split by a . (which is a special regex character), you need to specify it as String.split("\\."). The first backslash escapes the escaping function of the second backslash, so the regex engine receives "\.".
In the case of regex characters you could also just use Pattern.quote() to escape your desired delimiter, but this is far outside the scope the question originally had.
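The three cases mentioned above (a quote, a regex metacharacter, and Pattern.quote) can be sketched like this. The class name, method names and inputs are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class SplitDelimiters {
    // The quote is special only in the string literal: one backslash.
    static String[] onQuote(String s) {
        return s.split("\"");
    }

    // The dot is special only in regex: escape it for the regex engine,
    // then escape that backslash for the string literal.
    static String[] onDot(String s) {
        return s.split("\\.");
    }

    // Pattern.quote makes any delimiter literal for the regex engine.
    static String[] onLiteral(String s, String delimiter) {
        return s.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(onQuote("a\"b\"c")));      // [a, b, c]
        System.out.println(Arrays.toString(onDot("1.2.3")));          // [1, 2, 3]
        System.out.println(Arrays.toString(onLiteral("x|y|z", "|"))); // [x, y, z]
    }
}
```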
Try with single backslash \
tableData.split("\"")
Try like this by escaping " with single backslash \ :
tableData.split("\"")
You are not escaping properly. The snippet code will not even compile because of it. The correct way to do it is
tableData.split("\"");
A single backslash will do the trick.
Like this:
tableData.split("\"");
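A char literal needs no backslash for the double quote ('"'), but String.split only accepts a String regex, so tableData.split('"') does not compile. Converting the char to a String first works:
tableData.split(String.valueOf('"'));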

Replacing double backslashes with single backslash

I have a string "\\u003c", which belongs to UTF-8 charset. I am unable to decode it to unicode because of the presence of double backslashes. How do i get "\u003c" from "\\u003c"? I am using java.
I tried with,
myString.replace("\\\\", "\\");
but could not achieve what i wanted.
This is my code,
String myString = FileUtils.readFileToString(file);
String a = myString.replace("\\\\", "\\");
byte[] utf8 = a.getBytes();
// Convert from UTF-8 to Unicode
a = new String(utf8, "UTF-8");
System.out.println("Converted string is:"+a);
and content of the file is
\u003c
You can use String#replaceAll:
String str = "\\\\u003c";
str= str.replaceAll("\\\\\\\\", "\\\\");
System.out.println(str);
It looks weird because the first argument is a string defining a regular expression, and \ is a special character both in string literals and in regular expressions. To actually put a \ in our search string, we need to escape it (\\) in the literal. But to actually put a \ in the regular expression, we have to escape it at the regular expression level as well. So to literally get \\ in a string, we need to write \\\\ in the string literal; and to get two literal backslashes to the regular expression engine, we need to escape those as well, so we end up with \\\\\\\\. That is:
String Literal        String                      Meaning to Regex
−−−−−−−−−−−−−−−−−−−−− −−−−−−−−−−−−−−−−−−−−−−−−−−− −−−−−−−−−−−−−−−−−
\                     Escape the next character   Would depend on next char
\\                    \                           Escape the next character
\\\\                  \\                          Literal \
\\\\\\\\              \\\\                        Literal \\
In the replacement parameter, even though it's not a regex, it still treats \ and $ specially — and so we have to escape them in the replacement as well. So to get one backslash in the replacement, we need four in that string literal.
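One way to check the layers described above is to print the runtime length of each literal. The sketch below assumes a hypothetical class BackslashLayers and helper collapse; it is not taken from the original answer.

```java
final class BackslashLayers {
    // Collapses every pair of backslashes into a single backslash.
    static String collapse(String s) {
        // Pattern: eight in source, four at runtime, matches two.
        // Replacement: four in source, two at runtime, emits one.
        return s.replaceAll("\\\\\\\\", "\\\\");
    }

    public static void main(String[] args) {
        System.out.println("\\\\".length());     // 2
        System.out.println("\\\\\\\\".length()); // 4
        String str = "\\\\u003c";                // two backslashes at runtime
        System.out.println(str.length());           // 7
        System.out.println(collapse(str).length()); // 6
    }
}
```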
Not sure if you're still looking for a solution to your problem (since you have an accepted answer) but I will still add my answer as a possible solution to the stated problem:
String str = "\\u003c";
Matcher m = Pattern.compile("(?i)\\\\u([\\da-f]{4})").matcher(str);
if (m.find()) {
String a = String.valueOf((char) Integer.parseInt(m.group(1), 16));
System.out.printf("Unicode String is: [%s]%n", a);
}
OUTPUT:
Unicode String is: [<]
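It can help to see what the literal actually holds at runtime before decoding it. The following is a sketch under stated assumptions: the class name EscapeAtRuntime and the helper decode are hypothetical, and decode assumes the input is exactly a backslash, the letter u and four hex digits.

```java
final class EscapeAtRuntime {
    // Decodes one escape of the assumed form: backslash, u, four hex digits.
    static char decode(String escaped) {
        return (char) Integer.parseInt(escaped.substring(2, 6), 16);
    }

    public static void main(String[] args) {
        String escaped = "\\u003c"; // six characters, a single backslash
        String actual = "\u003c";   // the compiler already turned this into one character
        System.out.println(escaped.length());         // 6
        System.out.println(actual.length());          // 1
        System.out.println(escaped.contains("\\\\")); // false: no double backslash
        System.out.println(decode(escaped));          // <
    }
}
```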
Regarding the problem of "replacing double backslashes with single backslashes" or, more generally, "replacing a simple string, containing \, with a different simple string, containing \" (which is not entirely the OP problem, but part of it):
Most of the answers in this thread mention replaceAll, which is a wrong tool for the job here. The easier tool is replace, but confusingly, the OP states that replace("\\\\", "\\") doesn't work for him, that's perhaps why all answers focus on replaceAll.
Important note for people with JavaScript background:
Note that replace(CharSequence, CharSequence) in Java does replace ALL occurrences of a substring - unlike in JavaScript, where it only replaces the first one!
Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence.
On the other hand, replaceAll(String regex, String replacement) treats both parameters as more than regular strings:
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string.
(This is because $ is used to reference captured regex groups and \ is used to escape characters in the replacement, hence if you want to use them literally, you need to escape them.)
In other words, both first and 2nd params of replace and replaceAll behave differently. For replace you need to double the \ in both params (standard escaping of a backslash in a string literal), whereas in replaceAll, you need to quadruple it! (standard string escape + function-specific escape)
To sum up, for simple replacements, one should stick to replace("\\\\", "\\") (it needs only one escaping, not two).
https://ideone.com/ANeMpw
System.out.println("a\\\\b\\\\c"); // "a\\b\\c"
System.out.println("a\\\\b\\\\c".replaceAll("\\\\\\\\", "\\\\")); // "a\b\c"
//System.out.println("a\\\\b\\\\c".replaceAll("\\\\\\\\", "\\")); // runtime error
System.out.println("a\\\\b\\\\c".replace("\\\\", "\\")); // "a\b\c"
https://www.ideone.com/Fj4RCO
String str = "\\\\u003c";
System.out.println(str); // "\\u003c"
System.out.println(str.replaceAll("\\\\\\\\", "\\\\")); // "\u003c"
System.out.println(str.replace("\\\\", "\\")); // "\u003c"
Another option, capture one of the two slashes and replace both slashes with the captured group:
public static void main(String args[])
{
String str = "C:\\\\";
str= str.replaceAll("(\\\\)\\\\", "$1");
System.out.println(str);
}
Try using,
myString.replaceAll("[\\\\]{2}", "\\\\");
This is for replacing a double backslash with a single backslash:
public static void main(String args[])
{
// The literal needs four backslashes to hold two at runtime.
String str = "\\\\u003c";
// Regex: eight backslashes match two; replacement: four produce one.
str= str.replaceAll("\\\\\\\\", "\\\\");
System.out.println(str);
}
"\\u003c" does not 'belong to UTF-8 charset' at all. It is six characters: '\', 'u', '0', '0', '3', and 'c'. The real question here is why are the double backslashes there at all? Or, are they really there? And is your problem perhaps something completely different? If the String "\\u003c" is in your source code, there are no double backslashes in it at all at runtime, and whatever your problem may be, it doesn't concern decoding in the presence of double backslashes.

String's replaceAll() method and escape characters

The line
System.out.println("\\");
prints a single back-slash (\). And
System.out.println("\\\\");
prints double back-slashes (\\). Understood!
But why in the following code:
class ReplaceTest
{
public static void main(String[] args)
{
String s = "hello.world";
s = s.replaceAll("\\.", "\\\\");
System.out.println(s);
}
}
is the output:
hello\world
instead of
hello\\world
After all, the replaceAll() method is replacing a dot (\\.) with (\\\\).
Can someone please explain this?
When replacing using regular expressions, you're allowed to refer to captured groups in the replacement, such as $1 for the first group of the match.
To make that possible, the replacement string has its own escape character, the backslash, so if you actually want a literal backslash in the output it needs to be escaped.
Which means it needs to be escaped twice when written in a Java string literal. (First for the string parser, then for the replacement parser.)
The javadoc of replaceAll says:
Note that backslashes ( \ ) and dollar signs ($) in the replacement
string may cause the results to be different than if it were being
treated as a literal replacement string; see Matcher.replaceAll. Use
Matcher.quoteReplacement(java.lang.String) to suppress the special
meaning of these characters, if desired.
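Applied to the hello.world example from the question, the behaviour quoted above looks like this. It is a sketch with assumed class and method names; the replaceAll and Matcher.quoteReplacement calls are the standard JDK ones.

```java
import java.util.regex.Matcher;

final class DotToBackslash {
    // Four backslashes in source become one in the output.
    static String oneBackslash(String s) {
        return s.replaceAll("\\.", "\\\\");
    }

    // Eight backslashes in source become two in the output.
    static String twoBackslashes(String s) {
        return s.replaceAll("\\.", "\\\\\\\\");
    }

    // quoteReplacement lets the literal say what it means: two backslashes.
    static String twoBackslashesQuoted(String s) {
        return s.replaceAll("\\.", Matcher.quoteReplacement("\\\\"));
    }

    public static void main(String[] args) {
        System.out.println(oneBackslash("hello.world"));         // hello\world
        System.out.println(twoBackslashes("hello.world"));       // hello\\world
        System.out.println(twoBackslashesQuoted("hello.world")); // hello\\world
    }
}
```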
This is a formatted addendum to my comment
s = s.replaceAll("\\.", Matcher.quoteReplacement("\\"));
is more readable and meaningful than
s = s.replaceAll("\\.", "\\\\");
If you don't need regex for replacing and just need to replace exact strings, escape regex control characters before replace
String trickyString = "$Ha!I'm tricky|.|";
String safeToUseInReplaceAllString = Pattern.quote(trickyString);
The backslash is an escape character in Java strings, i.e. it has a predefined meaning in string literals. You have to write "\\" to define a single backslash. If you want the regex \w, you must write "\\w" in your literal. If you want to match a backslash literally, you have to type "\\\\", as \ is also an escape character in regular expressions.
I believe in this particular case it would be easier to use replace instead of replaceAll.
Reverend Gonzo has the correct answer when he talks about escaping the character.
Using replaceAll:
s = s.replaceAll("\\.", "\\\\\\\\");
Using replace:
s = s.replace(".", "\\\\");
replace just takes a string to match, not a regular expression.
I don't like this implementation of regex. We should be able to escape characters with a single '\', not '\\'. But anyway, if you want to get THIS from THIS.Out_Of_That you can do:
String prefix = role.replaceFirst("(\\.).*", "");
So you get prefix = THIS;
