convert ASCII into Hex - java

I have a GUI where I want to convert ASCII into hex, but it prints ffffff84 instead of 84. This only happens with ä, ö, ü. What went wrong?
Example input:
ä
Output:
ffffff84
My Code:
asciihex.addActionListener(new ActionListener() {
    public void actionPerformed(ActionEvent e) {
        output6.setText("");
        String hexadecimal2 = input4.getText().replace("\n", "");
        byte[] chars;
        try {
            chars = hexadecimal2.getBytes("CP850");
            StringBuffer hexa = new StringBuffer();
            for (int i = 0; i < chars.length; i++) {
                hexa.append(Integer.toHexString((int) chars[i]));
            }
            output6.append(hexa.toString());
        } catch (UnsupportedEncodingException e1) {
            e1.printStackTrace();
        }
    }
});

Code page 850 is not ASCII. And ä is not an ASCII character. Neither are your other examples of characters that don't work correctly.
What's happening is that the values of those characters, as bytes, are negative, because byte is a signed type in Java. (ä is -124, for instance.) When -124 is widened to an int, its two's complement hex representation is 0xFFFFFF84. You can get the unsigned value by adding 256 to it, which gives 132 (0x84). Then your conversion to hex would work.

You have to make an unsigned conversion of byte value to int, e.g.
hexa.append(Integer.toHexString((int) chars[i] & 0xFF));
or (Java 8)
hexa.append(Integer.toHexString(Byte.toUnsignedInt(chars[i])));
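To illustrate the masking described above, here is a minimal sketch. The class name HexDemo and the helper toHex are made up for this example and do not come from the question. It also pads each byte to two digits, because Integer.toHexString drops leading zeros (0x0A would print as "a").

```java
public class HexDemo {
    // Formats every byte as two lowercase hex digits, treating it as unsigned.
    // (toHex is a hypothetical helper name, not part of the original code.)
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            int unsigned = b & 0xFF;  // -124 becomes 132, no sign extension
            sb.append(Character.forDigit(unsigned >>> 4, 16));
            sb.append(Character.forDigit(unsigned & 0x0F, 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 0x84 is the CP850 byte for a-umlaut, as in the question
        byte[] sample = { (byte) 0x84, 0x0A, 0x41 };
        System.out.println(Integer.toHexString(sample[0])); // ffffff84
        System.out.println(toHex(sample));                  // 840a41
    }
}
```

In the listener from the question, the loop body could call such a helper on the array returned by getBytes("CP850").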

First of all, note that in the GSM 7-bit default alphabet the hexadecimal value of "ä" is 0x7B, not 0x84 (0x84 is its value in code page 850).
To check all the hexadecimal values, please refer to the standard "ETSI TS 123 038 V14.0.0 (2017-04)".
Now for the coding part: I have already written a function that takes any character from that alphabet and returns its hexadecimal value as per the given standard. I do not want to post that code, as it would be spoon-feeding; instead I want to guide you to write your own.
Steps:
1. First refer to the given document and understand the character tables.
2. Create a list that contains all the characters given in the table, ordered by their index values.
3. Write a function that looks up the given character's index position and builds the actual hexadecimal number from it. Keep in mind that the extended character set needs extra handling.
Hope this will help you. :-)
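The steps above can be sketched roughly as follows. This is only an illustration: the class name Gsm7Lookup and the method codeOf are hypothetical, the table is the basic GSM 7-bit default alphabet as I understand it from the standard (check it against the document before relying on it), and the extension table from step 3 is deliberately left out. Non-ASCII characters are written as Unicode escapes so the source compiles regardless of file encoding.

```java
public class Gsm7Lookup {
    // Basic GSM 7-bit default alphabet: 8 rows of 16 characters, index = code.
    // Assumption: transcribed from the basic table only, no extension table.
    private static final String BASIC =
          "@\u00A3$\u00A5\u00E8\u00E9\u00F9\u00EC\u00F2\u00C7\n\u00D8\u00F8\r\u00C5\u00E5"
        + "\u0394_\u03A6\u0393\u039B\u03A9\u03A0\u03A8\u03A3\u0398\u039E\u001B\u00C6\u00E6\u00DF\u00C9"
        + " !\"#\u00A4%&'()*+,-./"
        + "0123456789:;<=>?"
        + "\u00A1ABCDEFGHIJKLMNO"
        + "PQRSTUVWXYZ\u00C4\u00D6\u00D1\u00DC\u00A7"
        + "\u00BFabcdefghijklmno"
        + "pqrstuvwxyz\u00E4\u00F6\u00F1\u00FC\u00E0";

    // Returns the 7-bit code of c, or -1 if c is not in the basic table.
    public static int codeOf(char c) {
        return BASIC.indexOf(c);
    }

    public static void main(String[] args) {
        // a-umlaut sits at index 0x7B in this table
        System.out.println(Integer.toHexString(codeOf('\u00E4'))); // 7b
    }
}
```

A character that lives only in the extension table, such as '[', returns -1 here and would need the escape-prefixed lookup that the standard describes.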

Related

How to read extend ascii code in java

Hello, I have a problem printing extended ASCII codes in Java. When I try to print such a character, it is not displayed. How can I print it?
You can use the String constructor that takes a byte array and a character set to convert a code page 437 ("IBM extended ASCII") character to a Java UTF-16 char:
public static char extendedAscii(int codePoint) throws UnsupportedEncodingException {
    return new String(new byte[] { (byte) codePoint }, "Cp437").charAt(0);
}
(Note: Yes, all characters in code page 437 fit in single UTF-16 chars; I checked.)

Decoding %E9 to utf8 fails

I'm having some trouble decoding an encoded character.
What I need to decode is %E9; I have strings like D%E9bardeur and degr%E9.
What I do in my Java class is the following:
try
{
    System.out.println(o);// test
    o = URLDecoder.decode((String) o, "UTF-8");
}
catch (UnsupportedEncodingException e)
{
    e.printStackTrace();
}
After this operation, what i get is
D�bardeur and degr�
The very same happens when I don't decode as UTF-8.
Any advice?
thx
%E9 is not UTF-8.
The correct way to decode this would be:
URLDecoder.decode((String) o, "ISO-8859-1")
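As a small check of this, the sketch below decodes the string from the question with both charsets. The class name PercentDecodeDemo and the wrapper method decode are invented for the example; only URLDecoder.decode(String, String) is the real API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class PercentDecodeDemo {
    // Thin wrapper (hypothetical) that turns the checked exception into an unchecked one.
    public static String decode(String s, String charsetName) {
        try {
            return URLDecoder.decode(s, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // The single byte E9 is e-acute in ISO-8859-1
        System.out.println(decode("D%E9bardeur", "ISO-8859-1"));
        // In UTF-8 the same letter is percent-encoded as two bytes, C3 A9
        System.out.println(decode("D%C3%A9bardeur", "UTF-8"));
        // A lone E9 is malformed UTF-8, so a replacement character appears
        System.out.println(decode("D%E9bardeur", "UTF-8"));
    }
}
```

Whether the console shows the accented letter correctly also depends on the terminal's own encoding.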
By %E9, could you mean there is a byte in your string that evaluates to hex E9? Because if so, that flags as "multibyte" in UTF-8, and there are 2 more "continuation bytes" (within the correct range) that follow.
Because remember, UTF-8 is a variable length encoding, so some code points (character values) are represented by 1 byte, some by 2, 3, etc.
If you have a string you're treating as UTF-8 and E9 is encountered, the next 2 bytes need to be in the correct range. For example, in this string, 00, which follows E9 is not a valid continuation byte:
http://hexutf8.com/?q=0x640x650x670x720xe90x00
Here's an example where E9 in a string is followed by the correct 2 bytes:
http://hexutf8.com/?q=0xc20xa90xe90x810xaa
And the appropriate character is represented.
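The two byte sequences from the links above can also be checked in Java. This is a sketch with made-up names (Utf8Check, decodeStrict); it relies on the fact that a freshly created CharsetDecoder reports malformed input instead of silently replacing it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Returns the decoded text, or null if the bytes are not well-formed UTF-8.
    public static String decodeStrict(byte[] bytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return null; // malformed or unmappable input
        }
    }

    public static void main(String[] args) {
        // E9 followed by two valid continuation bytes: one character, U+906A
        byte[] good = { (byte) 0xE9, (byte) 0x81, (byte) 0xAA };
        // "degr" then E9 followed by 00, which is not a continuation byte
        byte[] bad = { 0x64, 0x65, 0x67, 0x72, (byte) 0xE9, 0x00 };
        System.out.println(decodeStrict(good) != null); // true
        System.out.println(decodeStrict(bad) != null);  // false
    }
}
```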

Converting binary data to String

If I have some binary data D and I convert it to a string S, I expect that converting S back to binary will give me D again. But it doesn't.
public class A {
    public static void main(String[] args) throws IOException {
        final byte[] bytes = new byte[]{-114, 104, -35};// In hex: 8E 68 DD
        System.out.println(bytes.length); //prints 3
        System.out.println(new String(bytes, "UTF-8").getBytes("UTF-8").length); //prints 7
    }
}
Why does this happen?
Converting a byte array to a String and back again is not a one-to-one mapping operation. According to the docs, the String implementation uses a CharsetDecoder to convert the incoming byte array into Unicode. The first and last bytes in your input array evidently do not map to valid Unicode characters, so each is replaced with a replacement character.
It's likely that the bytes you're converting to a string don't actually form a valid string. If java can't figure out what you mean by each byte, it will attempt to fix them. This means that when you convert back to the byte array, it won't be the same as when you started. If you try with a valid set of bytes, then you should be more successful.
Your data can't be decoded into valid Unicode characters using the UTF-8 encoding. Look at the decoded string: it consists of 3 characters, 0xFFFD, 0x0068 and 0xFFFD. The first and last are "�", the Unicode replacement character. I think you need to choose another encoding. For example, "CP866" produces a valid string and converts back into the same array.
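If the goal is simply to carry arbitrary bytes in a String and get them back unchanged, two common options are a single-byte charset that maps all 256 byte values (ISO-8859-1 does) or Base64. Neither is mentioned in the answers above; the sketch below, with the hypothetical class name RoundTrip, just compares them with the UTF-8 round trip from the question.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class RoundTrip {
    // Lossy for arbitrary data: invalid sequences become U+FFFD.
    public static byte[] viaUtf8(byte[] data) {
        return new String(data, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    // Lossless: every byte value 0..255 maps to exactly one char and back.
    public static byte[] viaLatin1(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1).getBytes(StandardCharsets.ISO_8859_1);
    }

    // Lossless and printable: Base64 text uses only ASCII characters.
    public static byte[] viaBase64(byte[] data) {
        String text = Base64.getEncoder().encodeToString(data);
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] bytes = { -114, 104, -35 }; // 8E 68 DD, as in the question
        System.out.println(viaUtf8(bytes).length);                  // 7
        System.out.println(Arrays.equals(bytes, viaLatin1(bytes))); // true
        System.out.println(Arrays.equals(bytes, viaBase64(bytes))); // true
    }
}
```

Base64 is usually the safer choice when the string has to pass through systems that may re-encode text.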

get char value in java

How can I get the UTF8 code of a char in Java ?
I have the char 'a' and I want the value 97
I have the char 'é' and I want the value 233
here is a table for more values
I tried Character.getNumericValue(a) but for a it gives me 10 and not 97, any idea why?
This seems very basic but any help would be appreciated!
char is actually a numeric type containing the unicode value (UTF-16, to be exact - you need two chars to represent characters outside the BMP) of the character. You can do everything with it that you can do with an int.
Character.getNumericValue() tries to interpret the character as a digit.
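A short comparison makes the difference visible. The class name NumericValueDemo is made up; the calls are standard java.lang.Character methods.

```java
public class NumericValueDemo {
    public static void main(String[] args) {
        // getNumericValue reads the character as a digit; letters a-z count as 10-35
        System.out.println(Character.getNumericValue('a')); // 10
        System.out.println(Character.getNumericValue('7')); // 7

        // Widening the char to int yields its UTF-16 code unit instead
        System.out.println((int) 'a');      // 97
        System.out.println((int) '\u00e9'); // 233 (e-acute)
    }
}
```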
You can use the codePointAt(int index) method of java.lang.String for that. Here's an example:
"a".codePointAt(0) --> 97
"é".codePointAt(0) --> 233
If you want to avoid creating strings unnecessarily, the following works as well and can be used for char arrays:
Character.codePointAt(new char[] {'a'},0)
Those "UTF-8" codes are no such thing. They're actually just Unicode values, as per the Unicode code charts.
So an 'é' is actually U+00E9 - in UTF-8 it would be represented by two bytes { 0xc3, 0xa9 }.
Now to get the Unicode value - or to be more precise the UTF-16 value, as that's what Java uses internally - you just need to convert the value to an integer:
char c = '\u00e9'; // c is now e-acute
int i = c; // i is now 233
This produces good result:
int a = 'a';
System.out.println(a); // outputs 97
Likewise:
System.out.println((int)'é');
prints out 233.
Note that both examples work for any char value, that is, any UTF-16 code unit; characters outside the Basic Multilingual Plane need two chars and a method such as codePointAt. You can achieve the same result by multiplying the char by 1.
System.out.println( 1 * 'é');
Your question is unclear. Do you want the Unicode codepoint for a particular character (which is the example you gave), or do you want to translate a Unicode codepoint into a UTF-8 byte sequence?
If the former, then I recommend the code charts at http://www.unicode.org/
If the latter, then the following program will do it:
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;

public class Foo
{
    public static void main(String[] argv)
    throws Exception
    {
        char c = '\u00E9';
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        OutputStreamWriter out = new OutputStreamWriter(bos, "UTF-8");
        out.write(c);
        out.flush();
        byte[] bytes = bos.toByteArray();
        // Mask with 0xFF so each byte prints as an unsigned value (195, then 169)
        for (int ii = 0 ; ii < bytes.length ; ii++)
            System.out.println(bytes[ii] & 0xFF);
    }
}
(there's also an online Unicode to UTF8 page, but I don't have the URL on this machine)
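A shorter route to the same bytes, assuming Java 7 or later for StandardCharsets, is String.getBytes. The class Utf8Bytes and the method utf8Values are names invented for this sketch.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Bytes {
    // Returns the UTF-8 encoding of s as unsigned values in the range 0..255.
    public static int[] utf8Values(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        int[] values = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            values[i] = bytes[i] & 0xFF; // undo the sign of Java's byte type
        }
        return values;
    }

    public static void main(String[] args) {
        for (int v : utf8Values("\u00E9")) {
            System.out.println(v); // 195, then 169 (0xC3, 0xA9)
        }
    }
}
```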
My method to do it is something like this:
char c = 'c';
int i = Character.codePointAt(String.valueOf(c), 0);
// testing
System.out.println(String.format("%c -> %d", c, i)); // c -> 99
You can create a simple loop to list characters by their Unicode code point (these are code points, not UTF-8 byte values) like this:
public class UTF8Characters {
    public static void main(String[] args) {
        for (int i = 12; i <= 999; i++) {
            System.out.println(i +" - "+ (char)i);
        }
    }
}
There is an open source library, MgntUtils, that has a utility class StringUnicodeEncoderDecoder. That class provides static methods that convert any String into a Unicode sequence and vice versa. Very simple and useful. To convert a String you just do:
String codes = StringUnicodeEncoderDecoder.encodeStringToUnicodeSequence(myString);
For example a String "Hello World" will be converted into
"\u0048\u0065\u006c\u006c\u006f\u0020\u0057\u006f\u0072\u006c\u0064"
It works with any language. Here is the link to the article that explains all the details about the library: MgntUtils. Look for the subtitle "String Unicode converter". The article gives you links to Maven Central, where you can get the artifacts, and to GitHub, where you can get the project itself. The library comes with well-written javadoc and source code.

Java String.codePointAt returns unexpected value

If I use any ASCII characters from 33 to 127, the codePointAt method gives the correct decimal value, for example:
String s1 = new String("#");
int val = s1.codePointAt(0);
This returns 35 which is the correct value.
But if I try use ASCII characters from 128 to 255 (extended ASCII/ISO-8859-1), this method gives wrong value, for example:
String s1 = new String("ƒ"); // Latin small letter f with hook
int val = s1.codePointAt(0);
This should return 159 as per this reference table, but instead returns 402, why is this?
But if I try use ASCII characters from 128 to 255
ASCII doesn't have values in this range. It only uses 7 bits.
Java chars are UTF-16 (and nothing else!). If you want to represent ASCII using Java, you need to use a byte array.
The codePointAt method returns the 32-bit code point. 16-bit chars can't contain the entire Unicode range, so some code points must be split across two chars (as per the encoding scheme for UTF-16). The codePointAt method helps resolve chars to code points.
I wrote a rough guide to encoding in Java here.
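To make the point about code points split across two chars concrete, here is a small sketch using U+1F600, a character outside the BMP chosen purely as an example. The class name CodePointDemo is hypothetical.

```java
public class CodePointDemo {
    public static void main(String[] args) {
        // U+1F600 written as its UTF-16 surrogate pair
        String s = "\uD83D\uDE00";

        System.out.println(s.length());                      // 2 chars
        System.out.println(s.codePointCount(0, s.length())); // 1 code point
        System.out.println(s.codePointAt(0));                // 128512 (0x1F600)
        System.out.println((int) s.charAt(0));               // 55357 (0xD83D), high surrogate only
    }
}
```

For characters inside the BMP, such as ƒ (U+0192), charAt and codePointAt give the same number.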
Java chars are not encoded in ISO-8859-1. They use UTF-16 which has the same values for 7bit ASCII characters (only values from 0-127).
To get the correct value for ISO-8859-1 you have to convert your string into a byte[] with String.getBytes("ISO-8859-1"); and look in the byte array.
Update
ISO-8859-1 is not the "extended ASCII" encoding that your reference table uses; use String.getBytes("Cp437") to get the values from that table.
in Unicode
ƒ 0x0192 LATIN SMALL LETTER F WITH HOOK
String.codePointAt returns the Unicode-Codepoint at this specified index.
The Unicode-Codepoint of ƒ is 402, see
http://www.decodeunicode.org/de/u+0192/properties
So
System.out.println("ƒ".codePointAt(0));
printing 402 is correct.
If you are interested in the representation in other charsets, you can print out the byte representation of the character in each of them via getBytes(String charsetName):
final String s = "ƒ";
for (final String csName : Charset.availableCharsets().keySet()) {
    try {
        final Charset cs = Charset.forName(csName);
        final CharsetEncoder encode = cs.newEncoder();
        if (encode.canEncode(s))
        {
            System.out.println(csName + ": " + Arrays.toString(s.getBytes(csName)));
        }
    } catch (final UnsupportedOperationException uoe) {
    } catch (final UnsupportedEncodingException e) {
    }
}
