Why does Integer.parseInt("\uD835\uDFE8") fail? - java

I was under the impression that Java supports Unicode characters. I made this test and sadly found that it fails. The question is why? Is it a bug, or is it documented somewhere?
// MATHEMATICAL SANS-SERIF "𝟨"
String unicodeNum6 = "\uD835\uDFE8";
int codePoint6 = unicodeNum6.codePointAt(0);
int val6 = Character.getNumericValue(codePoint6);
System.out.println("unicodeNum6 = "+ unicodeNum6
+ ", codePoint6 = "+ codePoint6+ ", val6 = "+val6);
int unicodeNum6Int = Integer.parseInt(unicodeNum6);
This fails with: Exception in thread "main" java.lang.NumberFormatException: For input string: "𝟨"
That is unexpected, I think, since the println works and prints the expected line:
unicodeNum6 = 𝟨, codePoint6 = 120808, val6 = 6
So Java knows the numeric value of the Unicode character perfectly well but does not use it in parseInt.
Can someone give a good reason why it should fail?

It's not a bug; the behaviour is documented. According to the documentation for parseInt(String s, int radix):
The characters in the string must all be digits of the specified radix
(as determined by whether Character.digit(char, int) returns a
nonnegative value), except that the first character may be an ASCII
minus sign '-' ('\u002D') to indicate a negative value or an ASCII
plus sign '+' ('\u002B') to indicate a positive value
If you try:
int aa = Character.digit('\uD835', 10);
int bb = Character.digit('\uDFE8', 10);
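You'll see that both return -1: parseInt examines the string one char (UTF-16 code unit) at a time, and neither half of the surrogate pair that encodes "𝟨" is a digit on its own.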
Mind you, Integer.parseInt(unicodeNum6) just calls Integer.parseInt(unicodeNum6, 10).
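To see the difference between the char and code point overloads, here is a minimal sketch. The class and the helper parseCodePointDigits are hypothetical (not JDK methods), and String.codePoints() needs Java 8+. It maps each code point to its digit with Character.digit(int, int), which does recognise supplementary digits such as U+1D7E8, and only then hands plain ASCII digits to Integer.parseInt. Signs are not handled, to keep it short.

```java
class CodePointParse {
    // Hypothetical helper (not a JDK method): decimal parse that works on code
    // points instead of UTF-16 chars. No '+'/'-' sign handling.
    static int parseCodePointDigits(String s) {
        StringBuilder ascii = new StringBuilder();
        s.codePoints().forEach(cp -> {
            // The int overload sees the whole code point, e.g. U+1D7E8 -> 6
            int d = Character.digit(cp, 10);
            if (d < 0) {
                throw new NumberFormatException("Not a decimal digit: U+" + Integer.toHexString(cp));
            }
            ascii.append((char) ('0' + d));
        });
        // Integer.parseInt still rejects empty input and detects overflow
        return Integer.parseInt(ascii.toString());
    }

    public static void main(String[] args) {
        String six = "\uD835\uDFE8"; // MATHEMATICAL SANS-SERIF DIGIT SIX
        System.out.println(Character.digit(six.charAt(0), 10));      // -1 (high surrogate)
        System.out.println(Character.digit(six.codePointAt(0), 10)); // 6
        System.out.println(parseCodePointDigits(six));               // 6
        System.out.println(parseCodePointDigits(six + six));         // 66
    }
}
```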

Related

Convert float to string and always get specific length of string?

How can I convert a float to a String and always get a resulting string of a specified length?
For example, if I have
float f = 0.023f;
and I want a 6 character string, I'd like to get 0.0230. But if I want to convert it to a 4 character string the result should be 0.02. Also, the value -13.459 limited to 5 characters should return -13.4, and to 10 characters -13.459000.
Here's what I'm using right now, but there's got to be a much prettier way of doing this...
s = String.valueOf(f);
s = s.substring(0, Math.min(strLength, s.length()));
if( s.length() < strLength )
s = String.format("%1$-" + (strLength-s.length()) + "s", s);
From the java.util.Formatter documentation: you can use the g conversion with the precision field to limit the number to a specific number of digits, and the width field to pad it to a column width.
String.format("%1$8.5g", 1000.4213);
https://docs.oracle.com/javase/7/docs/api/java/util/Formatter.html
However, the precision doesn't include the dot or the exponent: only the digits of the mantissa are counted.
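A quick way to see this is to print the lengths for a few inputs formatted with the same precision. A minimal sketch; the class name is hypothetical, and Locale.ROOT is an assumption added so the decimal separator is always a dot:

```java
import java.util.Locale;

class GPrecisionDemo {
    // %.5g: five significant digits; the dot and exponent are not counted
    static String g5(double value) {
        return String.format(Locale.ROOT, "%.5g", value);
    }

    public static void main(String[] args) {
        double[] inputs = {1000.4213, 0.023, 100300};
        for (double v : inputs) {
            String s = g5(v);
            // 1000.4 (6 chars), 0.023000 (8 chars), 1.0030e+05 (10 chars)
            System.out.println(s + " (" + s.length() + " chars)");
        }
    }
}
```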
That can be solved too: reserve an extra place for the dot and, if the string comes out significantly wider, cut the extra digits from the fractional part.
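String num = String.format("%1$ .5g", input);
if (num.indexOf('e') >= 0) // scientific form " d.dddde+XX": 'e' is always at index 7
num = num.substring(0, 2) + num.substring(7); // 100300 => ' 1.0030e+05' => ' 1e+05'
// 512.334 => ' 512.33' (decimal form, left unchanged)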
The scientific format always follows a strict set of rules, so we don't have to search for the dot to cut the fraction out of the string, as long as the sign is always included (or, as above, replaced by a space for positive numbers).
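A different technique from the Formatter approach above, sketched under the assumption that truncating (not rounding) is what the question wants, which matches its -13.459 → -13.4 example: take the short decimal form of the float, pad the fraction with zeros and cut it at the requested width. fixedWidth is a hypothetical helper name; NaN and infinities are not handled (BigDecimal rejects them).

```java
import java.math.BigDecimal;

class FixedWidthFloat {
    // Hypothetical helper: renders f in exactly `width` characters by padding the
    // fraction with zeros or truncating it (no rounding).
    static String fixedWidth(float f, int width) {
        // Float.toString gives a short decimal form ("0.023"); toPlainString
        // avoids the scientific form Float.toString uses for e.g. 1.0E10.
        String s = new BigDecimal(Float.toString(f)).toPlainString();
        int dot = s.indexOf('.');
        int intLen = dot < 0 ? s.length() : dot;
        if (intLen > width) {
            throw new IllegalArgumentException(s + " does not fit in " + width + " characters");
        }
        StringBuilder sb = new StringBuilder(s);
        if (dot < 0) {
            sb.append('.');
        }
        while (sb.length() < width) {
            sb.append('0'); // pad the fraction with trailing zeros
        }
        sb.setLength(width); // cut surplus fractional digits (truncation)
        if (sb.charAt(width - 1) == '.') {
            // a dangling dot carries no information: use the slot for a leading space
            sb.setLength(width - 1);
            sb.insert(0, ' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(fixedWidth(0.023f, 6));    // 0.0230
        System.out.println(fixedWidth(0.023f, 4));    // 0.02
        System.out.println(fixedWidth(-13.459f, 5));  // -13.4
        System.out.println(fixedWidth(-13.459f, 10)); // -13.459000
    }
}
```

Unlike the %g approach, the width here includes the sign and the dot, so the result is always exactly `width` characters.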

Converting Hex in Java, wrong value with negative values

I have seen several questions on this topic (e.g. this one), but none of them seem to cover this example. I'm using Java 7 and I want to convert a String representing a hexadecimal or a decimal number into an Integer or Long value (depending on what it represents), and I do the following:
public Number getIntegerOrLong(String num) {
try {
return Integer.decode(num);
} catch (NumberFormatException nf1) {
final long decodedLong = Long.decode(num);
if ((int) decodedLong == decodedLong) { // same check as Java 8's Math.toIntExact()
return (int) decodedLong;
}
return decodedLong;
}
}
When I use a String representing a decimal number everything is OK; the problems arise with negative hexadecimals.
Now, if I do:
String hex = "0x"+Integer.toHexString(Integer.MIN_VALUE);
Object number = getIntegerOrLong(hex);
assertTrue(number instanceof Integer);
fails, because it returns a Long. Same for other negative integer values.
Moreover, when I use Long.MIN_VALUE as follows:
String hex = "0x" + Long.toHexString(Long.MIN_VALUE);
Object number = getIntegerOrLong(hex);
assertTrue(number instanceof Long);
it fails with a NumberFormatException with the message:
java.lang.NumberFormatException: For input string: "8000000000000000"
I also tried other random Long values (within Long.MIN_VALUE and Long.MAX_VALUE), and it fails as well for negative numbers. For example,
the String with the hexadecimal 0xc0f1a47ba0c04d89 for the Long number -4,543,669,698,155,229,815 returns:
java.lang.NumberFormatException: For input string: "c0f1a47ba0c04d89"
How can I fix the code to obtain the desired behavior?
Long.decode and Integer.decode do not accept two's-complement values such as those returned by Integer.toHexString: the sign must be represented as a leading -, as described by the DecodableString grammars in the Javadoc:
The sequence of characters following an optional sign and/or radix specifier ("0x", "0X", "#", or leading zero) is parsed as by the Long.parseLong method with the indicated radix (10, 16, or 8). This sequence of characters must represent a positive value or a NumberFormatException will be thrown. The result is negated if first character of the specified String is the minus sign
If you can change the format of your input String, then produce it with Integer.toString(value, 16) rather than Integer.toHexString(value).
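The sign then has to go in front of the radix prefix: "-0x80000000" is accepted, but "0x-80000000" is rejected. A minimal sketch of producing that layout; toDecodableHex is a hypothetical helper, and Long.toString(long, int) is used so it covers both int and long values:

```java
class DecodableHex {
    // Hypothetical helper: signed hex with the sign in front of the "0x" prefix,
    // the layout Integer.decode/Long.decode accept ("-0x80000000").
    static String toDecodableHex(long value) {
        String hex = Long.toString(value, 16); // signed, e.g. "-80000000"
        return hex.startsWith("-") ? "-0x" + hex.substring(1) : "0x" + hex;
    }

    public static void main(String[] args) {
        String minInt = toDecodableHex(Integer.MIN_VALUE); // "-0x80000000"
        String minLong = toDecodableHex(Long.MIN_VALUE);   // "-0x8000000000000000"
        System.out.println(minInt + " -> " + Integer.decode(minInt));
        System.out.println(minLong + " -> " + Long.decode(minLong));
    }
}
```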
If you can switch to Java 8, use Integer.parseUnsignedInt / Long.parseUnsignedLong (they don't accept the 0x prefix, so strip it first).
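A minimal Java 8+ sketch of how getIntegerOrLong could use them. The class name is hypothetical, and choosing Integer versus Long by the number of hex digits (up to 8 means a 32-bit pattern, as Integer.toHexString produces) is an assumption of this sketch. Decimal and "#"-prefixed input still goes through the original decode logic.

```java
class UnsignedHexDecode {
    // Hypothetical Java 8+ variant of getIntegerOrLong: "0x" input is read as an
    // unsigned bit pattern; other input keeps the signed Integer/Long.decode rules.
    static Number getIntegerOrLong(String num) {
        if (num.startsWith("0x") || num.startsWith("0X")) {
            // parseUnsigned* reject the "0x" prefix, so strip it first
            String digits = num.substring(2);
            // Assumption: up to 8 hex digits is a 32-bit pattern, longer is 64-bit
            if (digits.length() <= 8) {
                return Integer.parseUnsignedInt(digits, 16); // boxed to Integer
            }
            return Long.parseUnsignedLong(digits, 16);       // boxed to Long
        }
        try {
            return Integer.decode(num);
        } catch (NumberFormatException e) {
            return Long.decode(num);
        }
    }

    public static void main(String[] args) {
        System.out.println(getIntegerOrLong("0x" + Integer.toHexString(Integer.MIN_VALUE))); // -2147483648
        System.out.println(getIntegerOrLong("0x" + Long.toHexString(Long.MIN_VALUE)));       // -9223372036854775808
        System.out.println(getIntegerOrLong("0xc0f1a47ba0c04d89") instanceof Long);        // true
    }
}
```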

Java - Construct a signed numeric String and convert it to an integer

Can I somehow prepend a minus sign to a numeric String and convert it into an int?
For example:
If I have two Strings:
String x="-";
String y="2";
how can I convert them to an int whose value is -2?
You will first have to concatenate both Strings, since - is not a valid integer on its own. It is, however, acceptable when used together with an integer value to denote a negative value.
Therefore this will print -2 the way you want it:
String x = "-";
String y = "2";
int i = Integer.parseInt(x + y);
System.out.println(i);
Note that the x + y is used to concatenate 2 Strings and not an arithmetic operation.
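Concatenating before parsing also matters at the edge of the int range: the magnitude of Integer.MIN_VALUE does not fit in an int on its own, so parsing y first and negating afterwards fails, while parsing the concatenated string works. A small sketch; parseSigned is a hypothetical helper name:

```java
class SignedConcat {
    // Concatenate first, then parse, so parseInt sees the whole signed value
    static int parseSigned(String sign, String digits) {
        return Integer.parseInt(sign + digits);
    }

    public static void main(String[] args) {
        System.out.println(parseSigned("-", "2"));          // -2
        System.out.println(parseSigned("-", "2147483648")); // -2147483648
        try {
            // Parsing the magnitude alone overflows: 2147483648 > Integer.MAX_VALUE
            System.out.println(-Integer.parseInt("2147483648"));
        } catch (NumberFormatException e) {
            System.out.println("magnitude alone: " + e.getMessage());
        }
    }
}
```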
Integer.valueOf("-") will throw a NumberFormatException because "-" by itself isn't a number. If you did "-1", however, you would receive the expected value of -1.
If you're trying to get a character code, use the following:
(int) "-".charAt(0);
charAt() returns the char value at the given index, a 16-bit UTF-16 code unit that is, for all intents and purposes, an integer.
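Because charAt returns a single UTF-16 code unit, this gives the character code only for characters in the Basic Multilingual Plane. For other characters it returns half of a surrogate pair, and codePointAt is the method to use. A short sketch contrasting the two; the class and firstCharCode are hypothetical names:

```java
class CharCodes {
    // (int) s.charAt(0) is one UTF-16 code unit, widened from char to int
    static int firstCharCode(String s) {
        return s.charAt(0);
    }

    public static void main(String[] args) {
        System.out.println(firstCharCode("-"));        // 45
        String six = "\uD835\uDFE8";                   // U+1D7E8, outside the BMP
        System.out.println(firstCharCode(six));        // 55349 (0xD835, a high surrogate)
        System.out.println(six.codePointAt(0));        // 120808 (0x1D7E8)
    }
}
```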

Hex to binary conversion java

I have the following code
String temp = "0x00";
String binAddr = Integer.toBinaryString(Integer.parseInt(temp, 16));
Why do I get the following error:
Exception in thread "AWT-EventQueue-0" java.lang.NumberFormatException: For input string: "0x00"
Since the string contains 0x, use Integer.decode(String nm):
String binAddr = Integer.toBinaryString(Integer.decode(temp));
Because the leading 0x is not part of a valid base-16 number; it's just a convention that tells a reader the number is in hex.
Get rid of the "0x". From the Javadoc:
The characters in the string must all be digits of the specified radix (as determined by whether Character.digit(char, int) returns a nonnegative value), except that the first character may be an ASCII minus sign '-' ('\u002D') to indicate a negative value or an ASCII plus sign '+' ('\u002B') to indicate a positive value. The resulting integer value is returned.
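new BigInteger(hexDigits, 16).toString(2) will also do the conversion (see the description of BigInteger.toString(int radix)), but the "0x" prefix still has to be stripped first, because the BigInteger(String, int) constructor doesn't accept it either.
Hope it helps.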
The 0x is for integer literals, e.g.:
int num = 0xCAFEBABE;
but it is not a parseable format. Try this:
temp = "ABFAB"; // without the "0x"
String binAddr = Integer.toBinaryString(Integer.parseInt(temp, 16));
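Putting the answers together, here is a minimal sketch that accepts input with or without the prefix. hexToBinary is a hypothetical helper. Left-padding to 4 bits per hex digit is a design choice of this sketch (plain Integer.toBinaryString turns "0x00" into just "0"). Integer.parseUnsignedInt (Java 8+) is used so that 8-digit values above 0x7FFFFFFF are accepted too.

```java
class HexToBinary {
    // Hypothetical helper: accepts "0x00", "0XAB" or plain "ABFAB" and returns
    // the binary digits, left-padded to 4 bits per hex digit.
    static String hexToBinary(String hex) {
        String digits = (hex.startsWith("0x") || hex.startsWith("0X")) ? hex.substring(2) : hex;
        int value = Integer.parseUnsignedInt(digits, 16); // at most 8 hex digits
        String bits = Integer.toBinaryString(value);
        StringBuilder sb = new StringBuilder();
        for (int i = bits.length(); i < digits.length() * 4; i++) {
            sb.append('0'); // restore the leading zeros toBinaryString drops
        }
        return sb.append(bits).toString();
    }

    public static void main(String[] args) {
        System.out.println(hexToBinary("0x00"));  // 00000000
        System.out.println(hexToBinary("ABFAB")); // 10101011111110101011
    }
}
```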

Creating Unicode character from its number

I want to display a Unicode character in Java. If I do this, it works just fine:
String symbol = "\u2202";
symbol is equal to "∂". That's what I want.
The problem is that I know the Unicode number and need to create the Unicode symbol from that. I tried (to me) the obvious thing:
int c = 2202;
String symbol = "\\u" + c;
However, in this case, symbol is equal to "\u2202". That's not what I want.
How can I construct the symbol if I know its Unicode number only at run time (I can't hard-code it as in the first example)?
If you want to get a UTF-16 encoded code unit as a char, you can parse the integer and cast to it as others have suggested.
If you want to support all code points, use Character.toChars(int). This will handle cases where code points cannot fit in a single char value.
Doc says:
Converts the specified character (Unicode code point) to its UTF-16 representation stored in a char array. If the specified code point is a BMP (Basic Multilingual Plane or Plane 0) value, the resulting char array has the same value as codePoint. If the specified code point is a supplementary code point, the resulting char array has the corresponding surrogate pair.
Just cast your int to a char. You can convert that to a String using Character.toString():
String s = Character.toString((char)c);
EDIT:
Just remember that the escape sequences in Java source code (the \u bits) are in HEX, so if you're trying to reproduce an escape sequence, you'll need something like int c = 0x2202.
The other answers here either only support Unicode up to U+FFFF (the answers dealing with just one char) or don't show how to get to the actual symbol (the answers stopping at Character.toChars() or using an incorrect method after that), so I'm adding my answer here too.
To support supplementary code points also, this is what needs to be done:
// this character:
// http://www.isthisthingon.org/unicode/index.php?page=1F&subpage=4&glyph=1F495
// using code points here, not U+n notation
// for equivalence with U+n, below would be 0xnnnn
int codePoint = 128149;
// converting to char[] pair
char[] charPair = Character.toChars(codePoint);
// and to String, containing the character we want
String symbol = new String(charPair);
// we now have symbol with the desired character as the first item
// confirm that we indeed have character with code point 128149
System.out.println("First code point: " + symbol.codePointAt(0));
I also did a quick test of which conversion methods work and which don't:
int codePoint = 128149;
char[] charPair = Character.toChars(codePoint);
System.out.println(new String(charPair, 0, 2).codePointAt(0)); // 128149, worked
System.out.println(charPair.toString().codePointAt(0)); // 91, didn't work
System.out.println(new String(charPair).codePointAt(0)); // 128149, worked
System.out.println(String.valueOf(codePoint).codePointAt(0)); // 49, didn't work
System.out.println(new String(new int[] {codePoint}, 0, 1).codePointAt(0)); // 128149, worked
Note: as @Axel mentioned in the comments, Java 11 has Character.toString(int codePoint), which is arguably best suited for the job.
This one worked fine for me.
String cc2 = "2202";
String text2 = String.valueOf(Character.toChars(Integer.parseInt(cc2, 16)));
Now text2 will have ∂.
Remember that char is an integral type, so it can be given an integer value as well as a char constant.
char c = 0x2202; // aka 8706 in decimal; Unicode escapes in source code are written in hex.
String s = String.valueOf(c);
String st = "2202";
int cp = Integer.parseInt(st, 16); // parses st as a hexadecimal number
char[] c = Character.toChars(cp);
System.out.println(c); // prints the character U+2202 (∂)
Although this is an old question, there is a very easy way to do this in Java 11 (released in September 2018): you can use a new overload of Character.toString():
public static String toString(int codePoint)
Returns a String object representing the specified character (Unicode code point). The result is a string of length 1 or 2, consisting solely of the specified codePoint.
Parameters:
codePoint - the codePoint to be converted
Returns:
the string representation of the specified codePoint
Throws:
IllegalArgumentException - if the specified codePoint is not a valid Unicode code point.
Since:
11
Since this method supports any Unicode code point, the length of the returned String is not necessarily 1.
The code needed for the example given in the question is simply:
int codePoint = '\u2202';
String s = Character.toString(codePoint); // <<< Requires JDK 11 !!!
System.out.println(s); // Prints ∂
This approach offers several advantages:
It works for any Unicode code point rather than just those that can be handled using a char.
It's concise, and it's easy to understand what the code is doing.
It returns the value as a string rather than a char[], which is often what you want. The answer posted by McDowell is appropriate if you want the code point returned as char[].
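A short sketch of the points above: the length of the result for a BMP versus a supplementary code point, and the IllegalArgumentException for values beyond U+10FFFF. It requires JDK 11+; the class and isAccepted are hypothetical names.

```java
class CodePointToString {
    // Hypothetical helper: true if Character.toString(int) (Java 11+) accepts the value
    static boolean isAccepted(int codePoint) {
        try {
            Character.toString(codePoint);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String partial = Character.toString(0x2202); // BMP: a single char
        String heart = Character.toString(128149);   // U+1F495: a surrogate pair
        System.out.println(partial.length());                        // 1
        System.out.println(heart.length());                          // 2
        System.out.println(heart.codePointCount(0, heart.length())); // 1
        System.out.println(isAccepted(0x110000)); // false: beyond U+10FFFF
    }
}
```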
This is how you do it:
int cc = 2202; // the hex digits of the code point, held as a decimal int
char ccc = (char) Integer.parseInt(String.valueOf(cc), 16); // "2202" parsed as hex = 0x2202
final String text = String.valueOf(ccc);
This solution is by Arne Vajhøj.
The code below writes the 4 Unicode chars (given as decimal numbers) for the verb "be" in Japanese (あります). Yes, the verb "be" in Japanese has 4 chars!
The character values are in decimal and have been read into a String[] (using split, for instance). If you have octal or hex values, parseInt takes a radix as well.
// Steps:
// 1. init the String[] containing the 4 code points in decimal :: intsInStrs
// 2. allocate room for up to two chars (a surrogate pair) per code point :: c2s
// 3. use Integer.parseInt (with a radix if needed) to get the int value
// 4. write its chars into c2s at the next free position
// 5. convert the filled part of c2s to a String
// 6. print
String[] intsInStrs = {"12354", "12426", "12414", "12377"}; // 1.
char[] c2s = new char[intsInStrs.length * 2]; // 2.
int len = 0; // number of chars written so far
for (String intString : intsInStrs) {
// 3 + 4. toChars writes 1 char for a BMP code point and 2 for a
// supplementary one, and returns how many it wrote
len += Character.toChars(Integer.parseInt(intString), c2s, len);
}
String symbols = new String(c2s, 0, len); // 5. only the chars actually written
System.out.println("\nLooooonger code point: " + symbols); // 6.
Here is a block that prints the Unicode chars from \u00c0 to \u00ff:
char[] ca = {'\u00c0'};
for (int i = 0; i < 4; i++) {
for (int j = 0; j < 16; j++) {
String sc = new String(ca);
System.out.print(sc + " ");
ca[0]++;
}
System.out.println();
}
Unfortunately, removing one backslash as suggested in the first comment (newbiedoodle) doesn't lead to a good result: most (if not all) IDEs report a syntax error. The reason is that a Java Unicode escape requires the syntax "\uXXXX", where the four hexadecimal digits XXXX are mandatory, so attempts to assemble the escape from pieces fail. Of course, "\u" is not the same as "\\u": the first starts a Unicode escape, the second is an escaped backslash followed by 'u'. Strangely, the Apache pages present a utility that does exactly this, but in reality it only mimics the escape. Apache has some other utilities of its own (I didn't test them) that do this work for you, though they may still not be what you want (Apache Escape Unicode utilities). Still, this utility takes a good approach to the solution, in combination with the one described above (MeraNaamJoker). My solution is to create this escaped mimic string and then convert it back to Unicode (to avoid the restriction of real Unicode escapes). I used it for copying text, so it is possible that in the uencodeStr method it would be better to use '\\u' instead of '\\\\u'. Try it.
/**
* Converts a character to the mimic Unicode escape format, i.e. '\\u0020'.
*
* This format is the Java source code format.
*
* CharUtils.unicodeEscaped(' ') = "\\u0020"
* CharUtils.unicodeEscaped('A') = "\\u0041"
*
* @param ch the character to convert
* @return the character as a mimic escaped Unicode string
*/
public static String unicodeEscaped(char ch) {
String returnStr;
final String charEsc = "\\u"; // a local variable cannot be static
// left-pad the hex value to four digits
if (ch < 0x10) {
returnStr = "000" + Integer.toHexString(ch);
}
else if (ch < 0x100) {
returnStr = "00" + Integer.toHexString(ch);
}
else if (ch < 0x1000) {
returnStr = "0" + Integer.toHexString(ch);
}
else {
returnStr = Integer.toHexString(ch);
}
return charEsc + returnStr;
}
/**
* Converts a string to the mimic Unicode escape format, i.e. '\\u0020'.
* Note: I cannot use the real Unicode escape format, because it is translated
* to the character immediately at compile time (and by the editor, e.g. NetBeans),
* so instead of a real escape I use the mimic format '\\u0020' as a string,
* which of course doesn't give the same results.
*
* This format is the Java source code format.
*
* @param nationalString the string to convert
* @return the string in Java mimic escaped Unicode
*/
public String encodeStr(String nationalString) {
StringBuilder convertedString = new StringBuilder();
for (int i = 0; i < nationalString.length(); i++) {
// each UTF-16 char (surrogate halves included) becomes one "\\uXXXX" group
convertedString.append(unicodeEscaped(nationalString.charAt(i)));
}
return convertedString.toString();
}
/**
* Converts a string from the mimic Unicode escape format, i.e. '\\u0020',
* back to a normal string.
*
* This format is the Java source code format.
*
* @param escapedString the string in Java mimic escaped Unicode
* @return the decoded string
*/
public String uencodeStr(String escapedString) {
StringBuilder convertedString = new StringBuilder();
// split on a literal backslash followed by 'u'
String[] arrStr = escapedString.split("\\\\u");
// arrStr[0] is whatever precedes the first escape (empty for encodeStr output)
for (int i = 1; i < arrStr.length; i++) {
String str = arrStr[i];
if (!str.isEmpty()) {
int codeUnit = Integer.parseInt(str, 16);
convertedString.append(Character.toChars(codeUnit));
}
}
return convertedString.toString();
}
char c=(char)0x2202;
String s=""+c;
(This answer is for .NET 4.5; a similar approach must exist in Java.)
I am from West Bengal in India.
As I understand it, your problem is this:
you want to produce something like 'অ' (a letter of the Bengali alphabet),
which has the Unicode hex value 0x0985.
If you know this value for your language, how do you produce that language-specific Unicode symbol?
In .NET it is as simple as this:
int c = 0X0985;
string x = Char.ConvertFromUtf32(c);
Now x is your answer.
But this converts one hex value at a time; converting whole sentences is work for researchers :P
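On the Java side, which the answer leaves open, Character.toChars (or Character.toString(int) on Java 11+) does what Char.ConvertFromUtf32 does, and the String(int[], int, int) constructor converts a whole array of code points at once. A minimal sketch; the class and helper names are hypothetical:

```java
class CodePointStrings {
    // Java counterpart of .NET's Char.ConvertFromUtf32: one code point -> String
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Several code points at once, via the String(int[], int, int) constructor
    static String fromCodePoints(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        String x = fromCodePoint(0x0985);                        // BENGALI LETTER A
        String letters = fromCodePoints(0x0985, 0x0986, 0x0987); // অ, আ, ই
        // Whether these render depends on the console's encoding and fonts
        System.out.println(x);
        System.out.println(letters);
    }
}
```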
