How to convert any kind of white space to a char?


I use String.strip() (Java 11) to remove trailing & leading white spaces from a String. There are 25 different kinds of white spaces in a String. I want to test my code with some of these 25 types of white space.
I have a code example which converts a particular type of white space (e.g. \u2002) into a char and then uses it in a String. When I try to convert another white space type like \u000A to a char, I get a compiler error. Why does this happen and how can I fix it?
public static void main(String... args) {
    char chr = '\u2002'; // No problem.
    // Compiler error:
    // IntelliJ IDEA compiler - Illegal escape character in character literal.
    // Java compiler - java: illegal line end in character literal.
    chr = '\u000a';
    String text = chr + "hello world" + chr;
    text = text.strip();
    System.out.println(text);
}

Are you sure you're not seeing this error instead?
error: illegal line end in character literal
Escape sequences like \u000a are processed very early in the compilation process. The \u000a is being replaced with an actual line feed character (code point 10).
It's as if you wrote this:
chr = '
';
which is why, when I try and compile your code using JDK 11.0.8, I get the "illegal line end" error.
This early conversion is described in the Java Language Specification:
Because Unicode escapes are processed very early, it is not correct to write '\u000a' for a character literal whose value is linefeed (LF); the Unicode escape \u000a is transformed into an actual linefeed in translation step 1 (§3.3) and the linefeed becomes a LineTerminator in step 2 (§3.4), and so the character literal is not valid in step 3. Instead, one should use the escape sequence '\n' (§3.10.6). Similarly, it is not correct to write '\u000d' for a character literal whose value is carriage return (CR). Instead, use '\r'.
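For testing strip() with several kinds of white space, something like the following might work. It is only a sketch: the class name WhitespaceChars and the helper wrapAndStrip are made up for illustration, and it assumes Java 11 or later for String.strip(). Line feed and carriage return are written with escape sequences or numeric codes instead of Unicode escapes.

```java
// Sketch: build char values for several kinds of white space without
// writing a Unicode escape for line feed or carriage return in the source.
class WhitespaceChars {
    // Hypothetical helper: wraps the text in the given character on both
    // sides, then strips leading and trailing white space.
    static String wrapAndStrip(char ws, String text) {
        return (ws + text + ws).strip(); // String.strip() needs Java 11+
    }

    public static void main(String[] args) {
        char[] samples = {
            '\u2002',    // EN SPACE: a Unicode escape is fine here
            '\n',        // line feed: use the escape sequence
            '\r',        // carriage return: use the escape sequence
            (char) 0x0A, // line feed again, built from its numeric code
            '\t'         // horizontal tab
        };
        for (char ws : samples) {
            // Each iteration prints [hello world]
            System.out.println("[" + wrapAndStrip(ws, "hello world") + "]");
        }
    }
}
```

The cast from a numeric code is handy when looping over a list of code points, since no escape processing is involved at all.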

Related

Octal representation of characters in java

I write the code System.out.println('\577'); and it produces unclosed character literal error. What's the problem here as all the digits are in the limits of the octal integers?

Change it to System.out.println("\577");. You are getting the error because single quotes (' ') can hold only one character. An octal escape can go no higher than \377, so \577 is read as \57 (the character /) followed by the digit 7, which makes two characters. Use double quotes (" ") when you have more than one character, and note that the output is then /7. Hope it works.
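A small sketch of how octal escapes are read may help here. The class name OctalEscapes and the helper lengthOfLiteral are illustrative only. An octal escape takes three digits only when the first digit is 0 to 3, so the largest one is \377.

```java
// Sketch: how the compiler splits octal escapes.
class OctalEscapes {
    // Number of characters in the string literal "\577":
    // the escape \57 followed by the plain digit 7.
    static int lengthOfLiteral() {
        return "\577".length();
    }

    public static void main(String[] args) {
        char slash = '\57';  // octal 57 = decimal 47, the character '/'
        char max = '\377';   // largest octal escape, decimal 255
        System.out.println(slash);             // prints /
        System.out.println((int) max);         // prints 255
        System.out.println("\577");            // prints /7
        System.out.println(lengthOfLiteral()); // prints 2
    }
}
```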

How to get index of Escape Character in a String?

How to get index of Escape Character in a String?
String test="1234\567890";
System.out.println("Result : "+test.lastIndexOf("\\"));
Result i get:
-1
Result i need: 4

Your original String doesn't contain \, which means you are searching for something that does not exist. In order to add a \ to your string, you have to escape it with another backslash:
String test="1234\\567890";
System.out.println("Result : "+test.lastIndexOf("\\"));
Should work.
In your case look at the last line in the table.

I don't think you can get that, because an escape character tells Java to interpret the following character in a special way. In other words, the escape character and the characters it applies to are really one entity from the point of view of the program being executed.
When you search for "\\", you are searching for the literal character '\', not the escape character.
Here you can see the difference: java fiddle
While \5 on its own is the character with code 0x5 (ENQ), 5 is the character 0x35. In your string the octal escape actually takes two digits, \56, which is the character 0x2E (.). See the table.
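The difference between the two literals can be checked with a short program. This is a sketch with made-up names (BackslashIndex, lastBackslash), not code from the question.

```java
// Sketch: a backslash exists in the runtime string only if it was
// escaped in the source literal.
class BackslashIndex {
    // Hypothetical helper: index of the last backslash, or -1 if none.
    static int lastBackslash(String s) {
        return s.lastIndexOf('\\');
    }

    public static void main(String[] args) {
        String noBackslash = "1234\567890";    // \56 is an octal escape for '.'
        String withBackslash = "1234\\567890"; // \\ is one literal backslash

        System.out.println(noBackslash);                  // prints 1234.7890
        System.out.println(lastBackslash(noBackslash));   // prints -1
        System.out.println(withBackslash);                // prints 1234\567890
        System.out.println(lastBackslash(withBackslash)); // prints 4
    }
}
```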

Java regex escaped characters

When matching certain characters (such as line feed), you can use the regex "\\n" or indeed just "\n". For example, the following splits a string into an array of lines:
String[] lines = allContent.split("\\r?\\n");
But the following works just as well:
String[] lines = allContent.split("\r?\n");
My question:
Do the above two work in exactly the same way, or is there any subtle difference? If the latter, can you give an example case where you get different results?
Or is there a difference only in [possible/theoretical] performance?

There is no difference in the current scenario. The usual string escape sequences are formed with the help of a single backslash and then a valid escape char ("\n", "\r", etc.) and regex escape sequences are formed with the help of a literal backslash (that is, a double backslash in the Java string literal) and a valid regex escape char ("\\n", "\\d", etc.).
"\n" (an escape sequence) is a literal LF (newline) and "\\n" is a regex escape sequence that matches an LF symbol.
"\r" (an escape sequence) is a literal CR (carriage return) and "\\r" is a regex escape sequence that matches a CR symbol.
"\t" (an escape sequence) is a literal tab symbol and "\\t" is a regex escape sequence that matches a tab symbol.
See the list in the Java regex docs for the supported list of regex escapes.
However, if you use a Pattern.COMMENTS flag (used to introduce comments and format a pattern nicely, making the regex engine ignore all unescaped whitespace in the pattern), you will need to either use "\\n" or "\\\n" to define a newline (LF) in the Java string literal and "\\r" or "\\\r" to define a carriage return (CR).
See a Java test:
String s = "\n";
System.out.println(s.replaceAll("\n", "LF")); // => LF
System.out.println(s.replaceAll("\\n", "LF")); // => LF
System.out.println(s.replaceAll("(?x)\\n", "LF")); // => LF
System.out.println(s.replaceAll("(?x)\\\n", "LF")); // => LF
System.out.println(s.replaceAll("(?x)\n", "<LF>"));
// => <LF>
//<LF>
Why is the last one producing <LF>+newline+<LF>? Because "(?x)\n" is equal to "", an empty pattern, and it matches an empty space before the newline and after it.
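The same behavior can be seen with matches() instead of replaceAll(). The following is a sketch under the assumption that the only flag of interest is Pattern.COMMENTS. The class NewlineRegex and its helper are illustrative names.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Sketch: "\n" and "\\n" behave the same until COMMENTS mode is on.
class NewlineRegex {
    // Hypothetical helper: does the regex match a single line feed?
    static boolean matchesLineFeed(String regex, boolean commentsMode) {
        int flags = commentsMode ? Pattern.COMMENTS : 0;
        return Pattern.compile(regex, flags).matcher("\n").matches();
    }

    public static void main(String[] args) {
        String content = "a\r\nb\nc";
        // Both print [a, b, c]
        System.out.println(Arrays.toString(content.split("\\r?\\n")));
        System.out.println(Arrays.toString(content.split("\r?\n")));

        System.out.println(matchesLineFeed("\n", false));  // prints true
        System.out.println(matchesLineFeed("\\n", false)); // prints true
        // In COMMENTS mode the raw line feed is ignored as white space,
        // leaving an empty pattern that cannot match a line feed.
        System.out.println(matchesLineFeed("\n", true));   // prints false
        System.out.println(matchesLineFeed("\\n", true));  // prints true
    }
}
```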

Yes, they are different. The Java compiler treats Unicode escapes in a special way, as described in The Java Language Specification, section 3.3:
The Java programming language specifies a standard way of transforming
a program written in Unicode into ASCII that changes a program into a
form that can be processed by ASCII-based tools. The transformation
involves converting any Unicode escapes in the source text of the
program to ASCII by adding an extra u - for example, \uxxxx becomes
\uuxxxx - while simultaneously converting non- ASCII characters in the
source text to Unicode escapes containing a single u each.
Here is how this affects \n versus \\n, according to the Javadoc for Pattern:
It is therefore necessary to double backslashes in string literals
that represent regular expressions to protect them from interpretation
by the Java bytecode compiler.
An example from the same documentation:
The string literal "\b", for example, matches a single backspace
character when interpreted as a regular expression, while "\\b"
matches a word boundary. The string literal "\(hello\)" is illegal and
leads to a compile-time error; in order to match the string (hello)
the string literal "\\(hello\\)" must be used.
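The quoted examples can be tried out directly. This is a minimal sketch. The class BackslashDoubling and the helper found are names chosen for illustration.

```java
import java.util.regex.Pattern;

// Sketch: single versus doubled backslashes in regex string literals.
class BackslashDoubling {
    // Hypothetical helper: is there a match for regex anywhere in input?
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // "\b" is a one-character string holding a backspace.
        System.out.println(found("\b", "x\by"));                 // prints true
        System.out.println(found("\b", "cat"));                  // prints false
        // "\\b" reaches the regex engine as \b, a word boundary.
        System.out.println(found("\\bcat\\b", "a cat sat"));     // prints true
        System.out.println(found("\\bcat\\b", "concat"));        // prints false
        // "\\(hello\\)" matches the literal text (hello).
        System.out.println(found("\\(hello\\)", "say (hello)")); // prints true
    }
}
```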

converting string of unicode "\u0063" into "c"

I'm doing some cryptanalysis homework and was trying to write code that does a + b = c. My idea was to use unicode: b + (b - a) = c. The problem is that my code returns the unicode escape for c, not the String "c", and I can't convert it.
Please can someone explain the difference between the string below called unicodeOfC and those called test and test2? Also, is there any way I could get the string unicodeOfC to print "c"?
//this calculates the unicode value for c
String unicodeOfC = ("\\u" + Integer.toHexString('b'+('b'-'a') | 0x10000).substring(1));
//this prints \u0063
System.out.println(unicodeOfC);
String test = "\u0063";
//this prints c
System.out.println(test);
//this is false
System.out.println(test.equals(unicodeOfC));
String test2 = "\u0063";
//this is true
System.out.println(test.equals(test2));

There is no difference between test and test2. They are both String literals referring to the same String. This String literal is made up of a unicode escape.
A compiler for the Java programming language ("Java compiler") first
recognizes Unicode escapes in its input, translating the ASCII
characters \u followed by four hexadecimal digits to the UTF-16 code
unit (§3.1) for the indicated hexadecimal value, and passing all other
characters unchanged.
So the compiler will translate this unicode escape and convert it to the corresponding UTF-16 code unit. That is, the unicode escape \u0063 translates to the character c.
In this
String unicodeOfC = ("\\u" + Integer.toHexString('b'+('b'-'a') | 0x10000).substring(1));
the String literal "\\u" (which uses a \ character to escape a \ character) has a runtime value of \u, i.e. the two characters \ and u. That String is concatenated with the result of invoking toHexString(..).substring(1), and the concatenation is assigned to unicodeOfC. So the String value is \u0063, i.e. the 6 characters \, u, 0, 0, 6, and 3.
Also is there any way I could get the string unicodeOfC to print "c"?
Similarly to how you created it, you need to get the numerical part of the unicode escape,
String numerical = unicodeOfC.replace("\\u", "");
int val = Integer.parseInt(numerical, 16);
System.out.println((char) val);
You can then print it out.
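Put together as a complete program, the conversion might look like the sketch below. The class UnicodeEscapeDecoder and the method decode are hypothetical names, and the input is assumed to be exactly a backslash, the letter u and four hex digits.

```java
// Sketch: turn the six-character text of a Unicode escape into the
// character it names, at run time.
class UnicodeEscapeDecoder {
    // Hypothetical helper: expects a backslash, 'u' and four hex digits.
    static char decode(String escapeText) {
        if (escapeText.length() != 6 || !escapeText.startsWith("\\u")) {
            throw new IllegalArgumentException(
                "expected a backslash, 'u' and four hex digits");
        }
        return (char) Integer.parseInt(escapeText.substring(2), 16);
    }

    public static void main(String[] args) {
        String unicodeOfC =
            "\\u" + Integer.toHexString(('b' + ('b' - 'a')) | 0x10000).substring(1);
        System.out.println(unicodeOfC.length()); // prints 6
        System.out.println(decode(unicodeOfC));  // prints c
        // No string round trip is needed if only the character is wanted:
        System.out.println((char) ('b' + ('b' - 'a'))); // prints c
    }
}
```

The cast in the last line does the arithmetic on the code units directly, which may be all the homework needs.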

I think you're not understanding how string escaping works.
In Java backslash is an escape character that allows you to use characters in strings like newlines \n, tabs \t, or unicode \u0063.
Suppose I am writing code and I need to print a newline. I would do this System.out.println("\n");
Now let's say I want to show a backslash: System.out.println("\"); will be a compile error, but System.out.println("\\"); will print \.
So your first string is printing the literal backslash character then the letter u then the hexadecimal number.

What can I do with a hex String literal? [duplicate]

I'm learning Java, and I'm on a book chapter about hex String literals. It tells me that I can create a hex String literal in this format: "\uxxxx". So I tried this:
char c = '\u0010';
int x = c;
System.out.println(x); // prints 16.
Firstly, why does the following hex String literal cause a compilation error? I was expecting that 'a' in hex would equal 10 in decimal.
char c = '\u000a';
Returns the following error:
..\src\pkgs\main\Main.java:360: error: illegal line end in character literal
char c = '\u000a';
Secondly, because of my novice Java status, I'm currently not able to appreciate what hex String literals are used for. Why would I want to use one? Can someone please provide me with a "real world" example of their use? Thanks a lot.

The compiler gives an error because it translates \u000a into an actual line feed (LF) before it parses the character literal, so
char A = '\u000A';
therefore becomes...
char A ='
';
which results in a compile-time error. To avoid this error, always use the special escape characters '\n' (line feed) and '\r' (carriage return).

As noted already, Unicode escapes are actually processed during compilation as a replacement:
Because Unicode escapes are processed very early, it is not correct to write '\u000a' for a character literal whose value is linefeed (LF); the Unicode escape \u000a is transformed into an actual linefeed in translation step 1 (§3.3) and the linefeed becomes a LineTerminator in step 2 (§3.4), and so the character literal is not valid in step 3. Instead, one should use the escape sequence '\n' (§3.10.6). Similarly, it is not correct to write '\u000d' for a character literal whose value is carriage return (CR). Instead, use '\r'.
This aspect of Unicode escapes is not just limited to character literals. For example, the following will print "hello world":
// \u000A System.out.println("hello world");
Another way to get special characters beyond an escape is to use an integer literal:
static final char NUL = 0x0000;
As for their usefulness: for one, without them you would have to copy and paste special characters or type them with some keyboard combination. The other reason is that certain characters have no proper visual representation. Examples are null, escape, backspace and delete, and also code point 7, the bell character, which is an instruction for the computer to emit a beep when it is printed.
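As a sketch of the integer-literal approach, control characters can be given named constants. The class ControlChars and the constant names other than NUL are assumptions made for illustration.

```java
// Sketch: named constants for characters with no visual representation,
// defined from integer literals so no escape processing is involved.
class ControlChars {
    static final char NUL = 0x0000;
    static final char BEL = 0x0007; // bell
    static final char LF = 0x000A;  // same value as '\n'
    static final char CR = 0x000D;  // same value as '\r'

    // Checks that the numeric constants equal the escape sequences.
    static boolean sameAsEscapes() {
        return LF == '\n' && CR == '\r';
    }

    public static void main(String[] args) {
        System.out.println(sameAsEscapes()); // prints true
        System.out.println((int) LF);        // prints 10
        System.out.println((int) BEL);       // prints 7
    }
}
```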

A char in Java is 2 bytes wide, so you can use it to print Unicode characters.
If you know a character's Unicode code, you can store it in a char as a hex literal and so use characters from other languages.
To understand the use of hex literals, you can visit this link:
http://voices.yahoo.com/how-print-unicode-characters-java-12507717.html
