Backslash (\) behaving differently - java

I have a small piece of code, shown below:
import java.util.Scanner;

public class Testing {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        String firstString = sc.next();
        System.out.println("First String : " + firstString);
        String secondString = "text\\";
        System.out.println("Second String : " + secondString);
    }
}
When I provide input as text\\ I get output as
First String : text\\
Second String : text\
Why am I getting two different strings when the input I provide for the first string is the same as the second string?
Demo at www.ideone.com

The double backslash you type in the console at runtime really is two backslashes. You simply typed the ASCII backslash character twice.
The double backslash inside the string literal means only one backslash, because you can't write a single backslash in a string literal. Why? Because the backslash is a special character that is used to "escape" other characters, e.g. tab, newline, backslash, double quote. As you can see, the backslash is itself one of the characters that needs to be escaped. How do you escape it? With a backslash. So escaping a backslash is done by putting another backslash in front of it, which gives two backslashes in the source. The compiler turns these into a single backslash.
Why do you have to escape characters? Look at this string: this "is" a string. If you want to write this as a string literal in Java, you might intuitively think that it would look like this:
String str = "this "is" a string";
As you can see, this won't compile. So escape them like this:
String str = "this \"is\" a string";
Right now, the compiler knows that the " doesn't close the string but really means character ", because you escaped it with a backslash.
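The difference can be checked directly. The following is a minimal sketch, assuming that typed console input can be simulated by feeding a Scanner from a String; the class name EscapeDemo and the helper readToken are illustrative and not part of the question.

```java
import java.util.Scanner;

class EscapeDemo {
    // Reads one whitespace-delimited token, the way sc.next() does on System.in.
    static String readToken(String simulatedInput) {
        try (Scanner sc = new Scanner(simulatedInput)) {
            return sc.next();
        }
    }

    public static void main(String[] args) {
        // Four backslashes in the literal are needed to simulate a user typing two.
        String typed = readToken("text\\\\");
        // Two backslashes in the literal compile to a single backslash character.
        String literal = "text\\";

        System.out.println(typed + " has length " + typed.length());     // text\\ has length 6
        System.out.println(literal + " has length " + literal.length()); // text\ has length 5
    }
}
```

The Scanner never sees source code, so it has nothing to unescape; only the compiler applies the escape rules, and only to literals.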

In string literals, \ is a special character; for example, you can write \n to produce a newline character. To turn off its special meaning you put another \ in front of it, as in \\. So in your second case \\ is interpreted as a single \ character.
When you read strings from outside sources (such as streams), Java treats every character as an ordinary character. Escape sequences are processed only by the compiler, in source code, so nothing is converted at runtime.

Java uses \ as an escape character in the second string.
In the first case, the input takes all the typed characters and wraps them in a String, so every character is printed (no evaluation: they are printed as they were read).
In the second case, the compiler processes the text between the double quotes character by character, and the first \ is read as a meta character protecting the second one, so it is not printed.

The sequence of chars that a String holds internally must not be confused with the sequence of characters between the double quotes, especially because the backslash has a special meaning:
"\n\r\t\\\0" => { (char)10,(char)13,(char)9,'\\',(char)0 }

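As a rough check of the mapping in the answer above, the sketch below prints the numeric code of each char in that literal. The class name CharCodes and the helper codes are made up for illustration.

```java
import java.util.Arrays;

class CharCodes {
    // Returns the numeric code of every char in the compiled string value.
    static int[] codes(String s) {
        int[] out = new int[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = s.charAt(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // Ten characters between the quotes, five chars in the resulting String.
        int[] c = codes("\n\r\t\\\0");
        System.out.println(Arrays.toString(c)); // [10, 13, 9, 92, 0]
    }
}
```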
Related

Why do ";" and "\\;" find the same?

I just found Java code like this:
"bla;bla;bla".split("\\;");
It returns:
["bla","bla","bla"] // String array of course
String.split does use regex, but from my research I found that ; is not a special character in regex and doesn't have to be escaped. So I tried replacing it with:
"bla;bla;bla;".split(";");
and it still does the same! So what is happening here? Is Java trying to be nice and ignoring a useless backslash in the regex? But I tried it with Notepad++ too, and there both patterns also find a single semicolon.
In the following code:
"bla;bla;bla".split("\\;");
String#split() executes in a regex context. Two backslashes \\ result in a literal backslash, and so you end up splitting on \;, which functionally is the same as just splitting on ;, because semicolon does not need to be escaped.
If you tried the following split, you would not get the result you expect:
"bla;bla;bla".split("\\\\;");
This would correspond, in regex terms, to splitting on a literal \;. Since that separator never appears in your string, you would just get an array whose only element is the input string.
See the answer by @AndyTurner for an explanation of why splitting on \; is allowed in the first place.
From the Javadoc of Pattern (emphasis mine):
The backslash character ('\') serves to introduce escaped constructs
...
It is an error to use a backslash prior to any alphabetic character that does not denote an escaped construct; these are reserved for future extensions to the regular-expression language. A backslash may be used prior to a non-alphabetic character regardless of whether that character is part of an unescaped construct.
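A small sketch of the behaviour described in the two answers above. The class name SplitEscapes and the helper compiles are illustrative, and \j is used only as an example of a letter that, as far as I know, has no escape meaning in java.util.regex.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SplitEscapes {
    // True if the regex compiles, false if Pattern rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String s = "bla;bla;bla";
        // Regex ; and regex \; both match a plain semicolon.
        System.out.println(Arrays.toString(s.split(";")));     // [bla, bla, bla]
        System.out.println(Arrays.toString(s.split("\\;")));   // [bla, bla, bla]
        // Regex \\; matches a backslash followed by a semicolon, which never occurs here.
        System.out.println(Arrays.toString(s.split("\\\\;"))); // [bla;bla;bla]
        // A backslash before a non-alphabetic character is accepted ...
        System.out.println(compiles("\\;")); // true
        // ... but before a letter with no assigned meaning it is rejected.
        System.out.println(compiles("\\j")); // false
    }
}
```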
The answers are fine. However, nobody mentioned Pattern.quote()
Java does not have a raw or literal string (e.g. like an @"..." verbatim string in C# or an r"..." raw string in Python). Nonetheless, for regular expressions we have the quote method that returns a literal pattern String for the specified String:
This method produces a String that can be used to create a Pattern
that would match the string s as if it were a literal pattern.
So, if you had used quote to specify your pattern, no split would have happened, as the following code sample illustrates:
import java.util.regex.Pattern;

class Example
{
    public static void main (String[] args) throws java.lang.Exception
    {
        String sourcestring = "bla;bla;bla";
        Pattern re = Pattern.compile(Pattern.quote("\\;"));
        String[] parts = re.split(sourcestring);
        for (int partsIdx = 0; partsIdx < parts.length; partsIdx++) {
            System.out.println("[" + partsIdx + "] = " + parts[partsIdx]);
        }
    }
}
Output:
[0] = bla;bla;bla
Otherwise, it's just an escaped semi-colon in the regex context of the split method as explained by Tim and Andy.

replace `\\r` with `\r` in string

I need to convert all the occurrences of \\r to \r in string.
My attempt is following:
String test = "\\r New Value";
System.out.println(test);
test = test.replaceAll("\\r", "\r");
System.out.println("output: " + test);
Output:
\r New Value
output: \r New Value
With replaceAll you would have to use .replaceAll("\\\\r", "\r"); because
to represent \ in a regex you need to escape it, so you need to pass \\ to the regex engine,
and to create a string literal for a single \ you need to write it as "\\".
A clearer way would be to use replace("\\r", "\r"); which treats the target as a literal string, so no regex escaping is needed.
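The two variants can be compared side by side. This is a minimal sketch, assuming the goal is to turn the two-character sequence backslash plus r into a real carriage return; the class and method names are illustrative.

```java
class CarriageReturnDemo {
    // Regex-based: the literal holds four backslashes so the regex engine sees \\r,
    // i.e. "a literal backslash followed by r".
    static String viaReplaceAll(String s) {
        return s.replaceAll("\\\\r", "\r");
    }

    // Literal-based: the target is simply the two characters backslash and r.
    static String viaReplace(String s) {
        return s.replace("\\r", "\r");
    }

    public static void main(String[] args) {
        String test = "\\r New Value";                  // backslash, r, space, ...
        String a = viaReplaceAll(test);
        String b = viaReplace(test);
        System.out.println(a.equals(b));                // true
        System.out.println((int) a.charAt(0));          // 13, the carriage return
        System.out.println(test.length() - a.length()); // 1
    }
}
```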
You will have to escape each of the \ symbols.
Try:
test = test.replaceAll("\\\\r", "\\r");
\ is used for escaping in Java - so every time you actually want to match a backslash in a string, you need to write it twice.
In addition, and this is what most people seem to be missing - replaceAll() takes a regex, and you just want to replace based on simple string substitution - so use replace() instead. (You can of course technically use replaceAll() on simple strings as well by escaping the regex, but then you get into either having to use Pattern.quote() on the parameters, or writing 4 backslashes just to match one backslash, because it's a special character in regular expressions as well as Java!)
A common misconception is that replace() just replaces the first instance, but this is not the case:
Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence.
...the only difference is that it works for literals, not for regular expressions.
In this case you're inadvertently escaping r, which produces a carriage return! So based on the above, to avoid this you actually want:
String test = "\\\\r New Value";
and:
test = test.replace("\\\\r", "\\r");
Character \r is a carriage return :)
What is the difference between \r and \n?
To print \, escape it with another \:
System.out.println("\\");
Solution
Escape \ with \ :)
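public static void main(String[] args) {
    String test = "\\\\r New Value"; // string value: two backslashes, then r
    System.out.println(test);
    // The regex matches the second backslash plus r; in the replacement,
    // the escaped r stands for a plain r, so one backslash is dropped.
    test = test.replaceAll("\\\\r", "\\r");
    System.out.println("output: " + test);
}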
result
\\r New Value
output: \r New Value

Remove escape char ' \' from string in java

I have to remove \ from the string.
My String is "SEPIMOCO EUROPE\119"
I tried replace, indexOf, Pattern but I am not able to remove this \ from this string
String strconst="SEPIMOCO EUROPE\119";
System.out.println(strconst.replace("\\", " ")); // Gives SEPIMOCO EUROPE 9
System.out.println(strconst.replace("\\\\", " ")); // Gives SEPIMOCO EUROPE 9
System.out.println(strconst.indexOf("\\",0)); //Gives -1
Any solutions for this ?
Your string doesn't actually contain a backslash. This part: "\11" is treated as an octal escape sequence (so it's really a tab - U+0009). If you really want a backslash, you need:
String strconst="SEPIMOCO EUROPE\\119";
It's not really clear where you're getting your input data from or what you're trying to achieve, but that explains everything you're seeing at the moment.
You have to distinguish between the string literal, i.e. the thing you write in your source code, enclosed with double quotes, and the string value it represents. When turning the former into the latter, escape sequences are interpreted, causing a difference between these two.
Stripping from string literals
\11 in the literal represents the character with octal value 11, i.e. a tab character, in the actual string value. \11 is equivalent to \t.
There is no way to reliably obtain the escaped version of a string literal. In other words, you cannot know whether the source code contained \11 or \t, because that information isn't present in the class file any more. Therefore, if you wanted to “strip backslashes” from the sequence, you wouldn't know whether 11 or t was the correct replacement.
For this reason, you should try to fix the string literals, either to not include the backslashes if you don't want them at all, or to contain proper backslashes, by escaping them in the literal as well. \\ in a string literal gives a single \ in the string it expresses.
Runtime strings
As your comments on other answers indicate that you're actually receiving this string at runtime, I would expect the string to contain a real backslash instead of a tab character. Unless you employ some fancy input method which parses escape sequences, you will still have the raw backslash. In order to simulate that situation in testing code, you should include a real backslash in your string, i.e. a double backslash \\ in your string literal.
When you have a real backslash in your string, strconst.replace("\\", " ") should do what you want it to do:
String strconst="SEPIMOCO EUROPE\\119";
System.out.println(strconst.replace("\\", " ")); // Gives SEPIMOCO EUROPE 119
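To make the difference between the two literals visible, here is a small sketch; the class name OctalEscapeDemo and the helper stripBackslashes are hypothetical.

```java
class OctalEscapeDemo {
    // Replaces every real backslash in the string value with a space.
    static String stripBackslashes(String s) {
        return s.replace("\\", " ");
    }

    public static void main(String[] args) {
        String octal = "SEPIMOCO EUROPE\119";    // \11 is an octal escape for the tab character
        String escaped = "SEPIMOCO EUROPE\\119"; // \\ is one real backslash

        System.out.println(octal.indexOf('\\'));       // -1
        System.out.println(octal.charAt(15) == '\t');  // true
        System.out.println(octal.length());            // 17
        System.out.println(escaped.length());          // 19
        System.out.println(stripBackslashes(escaped)); // SEPIMOCO EUROPE 119
    }
}
```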
Where does your String come from? If you declare it like in the example you will want to add another escaping backslash before the one you have there.

Replacing double backslashes with single backslash

I have a string "\\u003c", which belongs to UTF-8 charset. I am unable to decode it to unicode because of the presence of double backslashes. How do I get "\u003c" from "\\u003c"? I am using Java.
I tried with,
myString.replace("\\\\", "\\");
but could not achieve what i wanted.
This is my code,
String myString = FileUtils.readFileToString(file);
String a = myString.replace("\\\\", "\\");
byte[] utf8 = a.getBytes();
// Convert from UTF-8 to Unicode
a = new String(utf8, "UTF-8");
System.out.println("Converted string is:"+a);
and content of the file is
\u003c
You can use String#replaceAll:
String str = "\\\\u003c";
str= str.replaceAll("\\\\\\\\", "\\\\");
System.out.println(str);
It looks weird because the first argument is a string defining a regular expression, and \ is a special character both in string literals and in regular expressions. To actually put a \ in our search string, we need to escape it (\\) in the literal. But to actually put a \ in the regular expression, we have to escape it at the regular expression level as well. So to literally get \\ in a string, we need write \\\\ in the string literal; and to get two literal \\ to the regular expression engine, we need to escape those as well, so we end up with \\\\\\\\. That is:
String Literal        String                      Meaning to Regex
−−−−−−−−−−−−−−−−−−−−− −−−−−−−−−−−−−−−−−−−−−−−−−−− −−−−−−−−−−−−−−−−−−−−−−−−−
\                     Escape the next character   Would depend on next char
\\                    \                           Escape the next character
\\\\                  \\                          Literal \
\\\\\\\\              \\\\                        Literal \\
In the replacement parameter, even though it's not a regex, it still treats \ and $ specially — and so we have to escape them in the replacement as well. So to get one backslash in the replacement, we need four in that string literal.
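If counting backslashes by hand feels error-prone, the standard library can add the regex-level layer for you. The sketch below assumes Pattern.quote for the search string and Matcher.quoteReplacement for the replacement; the class name QuoteHelpers and the method collapse are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteHelpers {
    // Collapses every pair of backslashes into one, letting the library
    // do the regex-level and replacement-level escaping.
    static String collapse(String s) {
        String twoBackslashes = "\\\\"; // string value: two backslashes
        String oneBackslash = "\\";     // string value: one backslash
        return s.replaceAll(Pattern.quote(twoBackslashes),
                            Matcher.quoteReplacement(oneBackslash));
    }

    public static void main(String[] args) {
        String str = "\\\\u003c"; // string value: two backslashes, then u003c
        System.out.println(collapse(str));
        // Same result as the hand-escaped version with eight and four backslashes.
        System.out.println(collapse(str).equals(str.replaceAll("\\\\\\\\", "\\\\"))); // true
    }
}
```

Only the string-literal level of escaping remains visible in the source, which is the same amount of escaping that replace needs.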
Not sure if you're still looking for a solution to your problem (since you have an accepted answer) but I will still add my answer as a possible solution to the stated problem:
String str = "\\u003c";
Matcher m = Pattern.compile("(?i)\\\\u([\\da-f]{4})").matcher(str);
if (m.find()) {
String a = String.valueOf((char) Integer.parseInt(m.group(1), 16));
System.out.printf("Unicode String is: [%s]%n", a);
}
OUTPUT:
Unicode String is: [<]
Here is online demo of the above code
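The snippet above decodes a single escape. A possible extension that decodes every \uXXXX sequence in a string is sketched below; the class name UnicodeUnescape and the method decode are hypothetical, and surrogate pairs or malformed escapes are not given any special treatment.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeUnescape {
    // A backslash, the letter u, then exactly four hex digits (captured as group 1).
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replaces each such six-character sequence in the input with the char it names.
    static String decode(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps a decoded backslash or dollar sign
            // from being treated specially by appendReplacement.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("a \\u003c b \\u003E c")); // a < b > c
    }
}
```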
Regarding the problem of "replacing double backslashes with single backslashes" or, more generally, "replacing a simple string, containing \, with a different simple string, containing \" (which is not entirely the OP problem, but part of it):
Most of the answers in this thread mention replaceAll, which is a wrong tool for the job here. The easier tool is replace, but confusingly, the OP states that replace("\\\\", "\\") doesn't work for him, that's perhaps why all answers focus on replaceAll.
Important note for people with JavaScript background:
Note that replace(CharSequence, CharSequence) in Java does replace ALL occurrences of a substring - unlike in JavaScript, where it only replaces the first one!
Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence.
On the other hand, replaceAll(String regex, String replacement) treats both parameters as more than regular strings:
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string.
(this is because \ and $ can be used as backreferences to the captured regex groups, hence if you want to use them literally, you need to escape them).
In other words, the first and second parameters behave differently in replace and in replaceAll. For replace you need to double the \ in both params (standard escaping of a backslash in a string literal), whereas in replaceAll you need to quadruple it (standard string escaping plus function-specific escaping).
To sum up, for simple replacements, one should stick to replace("\\\\", "\\") (it needs only one level of escaping, not two).
https://ideone.com/ANeMpw
System.out.println("a\\\\b\\\\c"); // "a\\b\\c"
System.out.println("a\\\\b\\\\c".replaceAll("\\\\\\\\", "\\\\")); // "a\b\c"
//System.out.println("a\\\\b\\\\c".replaceAll("\\\\\\\\", "\\")); // runtime error
System.out.println("a\\\\b\\\\c".replace("\\\\", "\\")); // "a\b\c"
https://www.ideone.com/Fj4RCO
String str = "\\\\u003c";
System.out.println(str); // "\\u003c"
System.out.println(str.replaceAll("\\\\\\\\", "\\\\")); // "\u003c"
System.out.println(str.replace("\\\\", "\\")); // "\u003c"
Another option, capture one of the two slashes and replace both slashes with the captured group:
public static void main(String args[])
{
    String str = "C:\\\\";
    str = str.replaceAll("(\\\\)\\\\", "$1");
    System.out.println(str);
}
Try using,
myString.replaceAll("[\\\\]{2}", "\\\\");
This is for replacing the double back slash to single back slash
public static void main(String args[])
{
    String str = "\\u003c";
    str = str.replaceAll("\\\\", "\\\\");
    System.out.println(str);
}
"\\u003c" does not 'belong to UTF-8 charset' at all. It is six characters: '\', 'u', '0', '0', '3', and 'c'. The real question here is why are the double backslashes there at all? Or, are they really there? And is your problem perhaps something completely different? If the String "\\u003c" is in your source code, there are no double backslashes in it at all at runtime, and whatever your problem may be, it doesn't concern decoding in the presence of double backslashes.

String format using java

I have to build the statement below as a string. I am trying, but it gives an invalid character sequence error. I know it is basic, but I am not able to do it. Any help on this is appreciated.
String str="_1";
'\str%' ESCAPE '\'
Output should be: '\_1%' ESCAPE '\'.
Thanks,
Chaitu
String result = "'\\" + str + "%' ESCAPE '\\'";
Inside a string, a backslash character will "escape" the character after it - which causes that character to be treated differently.
Since \ has this special meaning, if you actually want the \ character itself in the string, you need to put \\. The first backslash escapes the second, causing it to be treated as a literal \ inside the string.
Knowing this, you should be able to construct the resulting string you need. Hope this helps.
String str="_1";
String source = "'\\str%' ESCAPE '\\'";
String result = source.replaceAll("str", str);
Another way to implement string interpolation. The replaceAll function finds all occurrences of str in the source string and replaces them by the passed argument.
To encode the backslash \ in a Java string, you have to duplicate it, because a single backslash works as an escape character.
Beware that the first argument of replaceAll is actually a regular expression, so some characters have a special meaning, but for simple words it will work as expected.
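The same caution applies to the second argument when the inserted value is not under your control. A minimal sketch, assuming the value may contain a dollar sign (the class name InterpolationDemo and the method fill are made up for illustration), uses replace so that both arguments are taken literally:

```java
class InterpolationDemo {
    static final String TEMPLATE = "'\\str%' ESCAPE '\\'";

    // Literal substitution: neither argument is interpreted as a regex or a group reference.
    static String fill(String value) {
        return TEMPLATE.replace("str", value);
    }

    public static void main(String[] args) {
        System.out.println(fill("_1")); // '\_1%' ESCAPE '\'
        System.out.println(fill("$1")); // '\$1%' ESCAPE '\'
        try {
            // With replaceAll, "$1" is read as a reference to capture group 1;
            // the pattern has no such group, so this throws at runtime.
            TEMPLATE.replaceAll("str", "$1");
        } catch (RuntimeException e) {
            System.out.println("replaceAll rejected the replacement: " + e.getClass().getSimpleName());
        }
    }
}
```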
String str="_1";
String output = String.format("'\\%s%%' ESCAPE '\\'",str);
System.out.println(output);//prints '\_1%' ESCAPE '\'
