converting string of unicode "\u0063" into "c"

I'm doing some cryptanalysis homework and was trying to write code that computes a + b = c. My idea was to use Unicode values: b + (b - a) = c. The problem is that my code returns the Unicode escape text for c, not the String "c", and I can't convert it.
Can someone please explain the difference between the string below called unicodeOfC and those called test and test2? Also, is there any way I could get the string unicodeOfC to print "c"?
//this calculates the unicode value for c
String unicodeOfC = ("\\u" + Integer.toHexString('b'+('b'-'a') | 0x10000).substring(1));
//this prints \u0063
System.out.println(unicodeOfC);
String test = "\u0063";
//this prints c
System.out.println(test);
//this is false
System.out.println(test.equals(unicodeOfC));
String test2 = "\u0063";
//this is true
System.out.println(test.equals(test2));

There is no difference between test and test2. They are both String literals referring to the same String. This String literal is made up of a single Unicode escape.
A compiler for the Java programming language ("Java compiler") first
recognizes Unicode escapes in its input, translating the ASCII
characters \u followed by four hexadecimal digits to the UTF-16 code
unit (§3.1) for the indicated hexadecimal value, and passing all other
characters unchanged.
So the compiler translates this Unicode escape into the corresponding UTF-16 code unit. That is, the Unicode escape \u0063 becomes the character c before the literal is even parsed.
In this
String unicodeOfC = ("\\u" + Integer.toHexString('b'+('b'-'a') | 0x10000).substring(1));
the String literal "\\u" (which uses a \ character to escape a \ character) has a runtime value of \u, i.e. the two characters \ and u. Separately, toHexString(..) returns 10063 and substring(1) trims that to 0063. The two parts are then concatenated and the result is assigned to unicodeOfC. So the String value is \u0063, i.e. the 6 characters \, u, 0, 0, 6, and 3.
Also is there any way I could get the string unicodeOfC to print "c"?
Similarly to how you created it, you need to get the numerical part of the unicode escape,
String numerical = unicodeOfC.replace("\\u", "");
int val = Integer.parseInt(numerical, 16);
System.out.println((char) val);
You can then print it out.
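The steps above can be put together as a small sketch. The class name UnicodeEscapeDemo and the helpers decode and next are made up for illustration, and decode assumes its input is exactly a backslash, a u, and four hexadecimal digits. The second helper shows that, for the original b + (b - a) goal, plain char arithmetic may be enough and the escape text is not needed at all.

```java
class UnicodeEscapeDemo {

    // Hypothetical helper: turns the six-character escape text into the single
    // character it names. Assumes the input is "\\u" followed by four hex digits.
    static char decode(String escapeText) {
        String hex = escapeText.substring(2);     // drop the backslash and the 'u'
        return (char) Integer.parseInt(hex, 16);  // 0x0063 -> 'c'
    }

    // Char arithmetic without any string round trip: 'b' + ('b' - 'a') is 99, i.e. 'c'.
    static char next(char a, char b) {
        return (char) (b + (b - a));
    }

    public static void main(String[] args) {
        String unicodeOfC = "\\u" + Integer.toHexString('b' + ('b' - 'a') | 0x10000).substring(1);
        System.out.println(decode(unicodeOfC));  // c
        System.out.println(next('a', 'b'));      // c
    }
}
```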

I think you're not understanding how string escaping works.
In Java, the backslash is an escape character that lets you put characters such as newlines \n, tabs \t, or Unicode escapes like \u0063 into strings.
Suppose I am writing code and I need to print a newline. I would do this: System.out.println("\n");
Now let's say I want to show a backslash. System.out.println("\"); will be a compile error, but System.out.println("\\"); will print \.
So your first string is printing the literal backslash character then the letter u then the hexadecimal number.
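One way to see the difference is to compare lengths. This is only an illustrative sketch; the class and constant names are invented for the example.

```java
class EscapeLengths {
    // The compiler replaces the escape with the one character it denotes.
    static final String COMPILED = "\u0063";
    // The escaped backslash stays a backslash, so this is six ordinary characters.
    static final String SPELLED = "\\u0063";

    public static void main(String[] args) {
        System.out.println(COMPILED.length());          // 1
        System.out.println(SPELLED.length());           // 6
        System.out.println(COMPILED.equals("c"));       // true
        System.out.println(SPELLED.charAt(0) == '\\');  // true
    }
}
```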

Related

What can I do with a hex String literal? [duplicate]

I'm learning Java, and I'm on a book chapter about hex String literals. It tells me that I can create a hex String literal in this format: "\uxxxx". So I tried this:
char c = '\u0010';
int x = c;
System.out.println(x); // prints 16.
Firstly, why does the following hex String literal cause a compilation error? I was expecting that 'a' in hex would equal 10 in decimal.
char c = '\u000a';
Returns the following error:
..\src\pkgs\main\Main.java:360: error: illegal line end in character literal
char c = '\u000a';
Secondly, because of my novice Java status, I'm currently not able to appreciate what hex String literals are used for. Why would I want to use one? Can someone please provide me with a "real world" example of their use? Thanks a lot.

The compiler gives an error because it translates the Unicode escape \u000a into an actual line feed (LF) before it parses the character literal.
char A = '\u000A';
therefore becomes...
char A ='
';
which results in a compile-time error. To avoid this error, always use the special escape characters '\n' (line feed) and '\r' (carriage return).

As noted already, Unicode escapes are actually processed during compilation as a replacement:
Because Unicode escapes are processed very early, it is not correct to write '\u000a' for a character literal whose value is linefeed (LF); the Unicode escape \u000a is transformed into an actual linefeed in translation step 1 (§3.3) and the linefeed becomes a LineTerminator in step 2 (§3.4), and so the character literal is not valid in step 3. Instead, one should use the escape sequence '\n' (§3.10.6). Similarly, it is not correct to write '\u000d' for a character literal whose value is carriage return (CR). Instead, use '\r'.
This aspect of Unicode escapes is not just limited to character literals. For example, the following will print "hello world":
// \u000A System.out.println("hello world");
Another way to get special characters beyond an escape is to use an integer literal:
static final char NUL = 0x0000;
As for their usefulness, for one, because otherwise you'd have to copy and paste special characters or type them in with some keyboard combination. The other reason is that certain characters don't have a proper visual representation. Examples of this are null, escape, backspace and delete. Also code point 7, the bell character, which is actually an instruction for the computer to emit a beep when it gets printed.
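A short sketch of the three spellings mentioned in this answer. The class and constant names are invented for illustration.

```java
class ControlChars {
    static final char LINE_FEED = '\n';    // escape sequence, safe for U+000A
    static final char BELL = 0x0007;       // integer literal, no escape processing involved
    static final char E_ACUTE = '\u00e9';  // Unicode escape for a printable non-ASCII letter

    public static void main(String[] args) {
        System.out.println((int) LINE_FEED);  // 10
        System.out.println((int) BELL);       // 7
        System.out.println((int) E_ACUTE);    // 233
    }
}
```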

A char in Java is 2 bytes (one UTF-16 code unit), so you can use it to print Unicode characters.
So if you know a character's Unicode code, you can store it in a char as a hex literal and use characters from other languages.
For more on the use of hex literals, you can visit this link:
http://voices.yahoo.com/how-print-unicode-characters-java-12507717.html

Determine if there is/are escape character(s) in string

Let say I have
String str="hello\" world\\";
when printing str, the output is
hello" world\
even when printing str.length() the output is
13
Is there any way to prove that str value has escape character(s)?

There is no such thing as escape characters at run time.
Escape characters appear only in String literals. For example,
String literal = "Some\nEscape\rSequence\\\"";
At compilation time, the compiler replaces each escape sequence with the character it denotes and stores the resulting String value in the class file (encoded in a modified form of UTF-8). The JVM uses that String value directly.
You wrote
I am thinking that whenever we print a string and the output contains
character such as " and \, then we can conclude that those character,
" and \ was escaped?
This is not true, those characters might have been read from a file or some other InputStream. They were definitely not escaped in a text file.
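As a rough illustration, the same 13-character value can be built without writing any escape sequence, and the two strings are indistinguishable. The class and method names here are hypothetical.

```java
class NoEscapesAtRuntime {
    // The literal from the question: escapes exist only in this source text.
    static final String STR = "hello\" world\\";

    // Builds the same value from character codes, with no escape sequences.
    static String built() {
        return "hello" + (char) 34 + " world" + (char) 92;
    }

    public static void main(String[] args) {
        System.out.println(STR.length());          // 13
        System.out.println(STR.charAt(5) == '"');  // true
        System.out.println(STR.equals(built()));   // true
    }
}
```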

Yes.
Use the Apache Commons Library, specifically StringEscapeUtils#escapeJava.
jshell> StringEscapeUtils.escapeJava("Newline \n here \u0344 and unicode \f\n\r\t\"\0\13 and more")
$136 ==> "Newline \\n here \\u0344 and unicode \\f\\n\\r\\t\\\"\\u0000\\u000B and more"
This prepends a backslash to each escape sequence and also swaps the variable-width octal sequences for fixed-width Unicode sequences. This means that every escape sequence in the displayed result starts with two backslashes ("\\"), followed by one of {n, b, r, t, f, ", \}, or by a 'u' character plus exactly four hexadecimal digits [0-9A-F].
If you just want to know whether or not the original String contains escape sequences, search for "\\" in the Apache-fied string. If you want to find the positions of those sequences, it's a bit more involved.
See more at this Gist.
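If pulling in Apache Commons is not an option, a rough check can be sketched by hand. The class and method names below are hypothetical, and the rule it applies (a double quote, a backslash, or a control character below U+0020) is a simplifying assumption, not a reimplementation of escapeJava. It only tells you that the value contains characters that would usually be written with an escape in a literal; it cannot tell how the value was originally produced.

```java
class EscapeCheck {

    // Hypothetical helper: true if s contains a double quote, a backslash,
    // or a control character below U+0020.
    static boolean wouldNeedEscaping(String s) {
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch == '"' || ch == '\\' || ch < 0x20) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(wouldNeedEscaping("hello world"));       // false
        System.out.println(wouldNeedEscaping("hello\" world\\"));   // true
        System.out.println(wouldNeedEscaping("tab\there"));         // true
    }
}
```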

Remove escape char ' \' from string in java

I have to remove \ from the string.
My String is "SEPIMOCO EUROPE\119"
I tried replace, indexOf, Pattern but I am not able to remove this \ from this string
String strconst="SEPIMOCO EUROPE\119";
System.out.println(strconst.replace("\\", " ")); // Gives SEPIMOCO EUROPE 9
System.out.println(strconst.replace("\\\\", " ")); // Gives SEPIMOCO EUROPE 9
System.out.println(strconst.indexOf("\\",0)); //Gives -1
Any solutions for this ?

Your string doesn't actually contain a backslash. This part: "\11" is treated as an octal escape sequence (so it's really a tab - U+0009). If you really want a backslash, you need:
String strconst="SEPIMOCO EUROPE\\119";
It's not really clear where you're getting your input data from or what you're trying to achieve, but that explains everything you're seeing at the moment.

You have to distinguish between the string literal, i.e. the thing you write in your source code, enclosed with double quotes, and the string value it represents. When turning the former into the latter, escape sequences are interpreted, causing a difference between these two.
Stripping from string literals
\11 in the literal represents the character with octal value 11, i.e. a tab character, in the actual string value. \11 is equivalent to \t.
There is no way to reliably obtain the escaped version of a string literal. In other words, you cannot know whether the source code contained \11 or \t, because that information isn't present in the class file any more. Therefore, if you wanted to “strip backslashes” from the sequence, you wouldn't know whether 11 or t was the correct replacement.
For this reason, you should try to fix the string literals, either to not include the backslashes if you don't want them at all, or to contain proper backslashes, by escaping them in the literal as well. \\ in a string literal gives a single \ in the string it expresses.
Runtime strings
As your comments on other answers indicate that you're actually receiving this string at runtime, I would expect the string to contain a real backslash instead of a tab character. Unless you employ some fancy input method which parses escape sequences, you will still have the raw backslash. To simulate that situation in testing code, you should include a real backslash in your string, i.e. a double backslash \\ in your string literal.
When you have a real backslash in your string, strconst.replace("\\", " ") should do what you want it to do:
String strconst="SEPIMOCO EUROPE\\119";
System.out.println(strconst.replace("\\", " ")); // Gives SEPIMOCO EUROPE 119
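The two literals can be compared side by side in a small sketch. The class and constant names are invented for the example.

```java
class OctalEscapeDemo {
    // Octal escape 11 is a tab, so this value holds a tab followed by the digit 9.
    static final String WITH_TAB = "SEPIMOCO EUROPE\119";
    // Escaped backslash: the value holds a real backslash followed by 119.
    static final String WITH_BACKSLASH = "SEPIMOCO EUROPE\\119";

    public static void main(String[] args) {
        System.out.println(WITH_TAB.indexOf('\\'));             // -1
        System.out.println(WITH_TAB.indexOf('\t'));             // 15
        System.out.println(WITH_BACKSLASH.indexOf('\\'));       // 15
        System.out.println(WITH_BACKSLASH.replace("\\", " "));  // SEPIMOCO EUROPE 119
    }
}
```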

Where does your String come from? If you declare it like in the example you will want to add another escaping backslash before the one you have there.

Backslash (\) behaving differently

I have small code as shown below
public class Testing {
public static void main(String[] args) {
Scanner sc = new Scanner(System.in);
String firstString = sc.next();
System.out.println("First String : " + firstString);
String secondString = "text\\";
System.out.println("Second String : " + secondString);
}
}
When I provide input as text\\ I get output as
First String : text\\
Second String : text\
Why am I getting two different strings when the input I provide for the first string is the same as the second string?
Demo at www.ideone.com

The double backslash you type in the console at runtime really is two backslashes. You simply wrote the ASCII backslash character twice.
The double backslash inside the string literal means only one backslash, because you can't write a single backslash in a string literal. Why? Because the backslash is a special character that is used to "escape" other characters, e.g. tab, newline, backslash, double quote. As you see, the backslash is itself one of the characters that needs to be escaped. How do you escape it? With a backslash. So escaping a backslash is done by putting another backslash in front of it, which results in two backslashes. These are compiled into a single backslash.
Why do you have to escape characters? Look at this string: this "is" a string. If you want to write this as a string literal in Java, you might intuitively think that it would look like this:
String str = "this "is" a string";
As you can see, this won't compile. So escape them like this:
String str = "this \"is\" a string";
Right now, the compiler knows that the " doesn't close the string but really means character ", because you escaped it with a backslash.
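The situation in the question can be reproduced without a console by pointing a Scanner at a String instead of System.in. This is a sketch only; the class and method names are made up, and the four backslashes in the literal stand for the two backslashes the user typed.

```java
import java.util.Scanner;

class ScannerVsLiteral {

    // Reads one token, the same way the question's code does with System.in.
    static String firstToken(String typed) {
        try (Scanner sc = new Scanner(typed)) {
            return sc.next();
        }
    }

    public static void main(String[] args) {
        String fromInput = firstToken("text\\\\");  // two real backslashes, as typed
        String fromLiteral = "text\\";              // one real backslash
        System.out.println(fromInput.length());              // 6
        System.out.println(fromLiteral.length());            // 5
        System.out.println(fromInput.equals(fromLiteral));   // false
    }
}
```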

In String literals \ is a special character; for example, you can use it as \n to create a newline character. To turn off its special meaning you need to put another \ in front of it, like \\. So in your second case \\ is interpreted as one \ character.
When you read Strings from outside sources (like streams), Java assumes that they contain ordinary characters, because any special characters are already present as actual tabs, newline characters, and so on.

Java uses the \ as an escape character in the second string.
In the first case, the input takes all the typed characters and wraps them in a String, so all characters are printed (no evaluation: they are printed as they are read).
In the second case, the compiler evaluates the String literal between the quotes character by character, and the first \ is read as a meta character protecting the second one, so it is not printed.

The sequence of chars a String holds internally must not be confused with the sequence of characters written between double quotes in the source, especially because the backslash has a special meaning there:
"\n\r\t\\\0" => { (char)10,(char)13,(char)9,'\\',(char)0 }

Ignoring octal escape characters in a string

Here is a sample String "one/two/three\123today" that I get from an unknown source, i.e. I cannot change the format of the input string that I get.
I need to get the substring after the backslash, i.e. 123today.
Here the \123 is being considered as an octal escape.
I tried splitting it as a character sequence, but this treats the octal escape as a single character.
I am writing the code in Java.
How do I go about it?

The answer is very simple.
If you want your Java program to contain a Java String literal containing the character sequence '\', '1', '2', '3', you MUST write it as "...\\123..." in your source code.
For example:
String testInput = "one/two/three\\123today";
int pos = testInput.indexOf("\\123");
However, backslash escaping is only relevant to Java string (or character) literals in your source code. If your program reads the String from some file (for example), or if it assembles the String in some way that doesn't involve String or character literals, no escaping is required in the source file, or whatever. For example:
char backslash = (char) 92;
String testInput = "one/two/three" + backslash + "123today";
int pos = testInput.indexOf(backslash + "123");
or
String input = ... // read a file that contains the sequence '\', '1', '2', '3'
int pos = input.indexOf("\\123"); // search for that sequence
(Aside: some programming languages provide alternative String literal syntaxes that mean that you can dispense with escaping. Java does not. End of story.)
Here the \2 is being considered as an octal escape by eclipse.
For the record, it is the Java Language Specification that defines this. Eclipse is just (correctly) implementing the Java Language Specification.

The string "one/two/three\123today" is exactly the same as "one/two/threeStoday". If you want to split on an 'S' character, you can do that, but there’s no way to tell whether a character was encoded directly or via an escape sequence.
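Both points can be checked with a short sketch. The class and helper names are hypothetical, and the second half assumes the value really does contain a backslash, as it would if it had been read from a file rather than written as a literal.

```java
class AfterBackslash {

    // Returns the text after the last backslash, or the whole string if there is none.
    static String afterLastBackslash(String s) {
        return s.substring(s.lastIndexOf('\\') + 1);
    }

    public static void main(String[] args) {
        // Octal 123 is decimal 83, which is the letter S.
        System.out.println("one/two/three\123today".equals("one/two/threeStoday"));  // true

        // A real backslash (written as an escaped backslash in this literal).
        System.out.println(afterLastBackslash("one/two/three\\123today"));  // 123today
    }
}
```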
