How to read extended ASCII code in Java

Hello, today I have a problem printing extended ASCII codes in Java. When I try to print one, it is not displayed. How can I print it?

You can use the String constructor that takes a byte array and a character set to convert a code page 437 ("IBM extended ASCII") character to a Java UTF-16 char:
public static char extendedAscii(int codePoint) throws UnsupportedEncodingException {
    // Decode the single byte as code page 437 and return the resulting UTF-16 char.
    return new String(new byte[] { (byte) codePoint }, "Cp437").charAt(0);
}
(Note: Yes, all characters in code page 437 fit in single UTF-16 chars; I checked.)
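If the characters still do not show up when printed, the console encoding may be the culprit rather than the conversion. The following is a minimal sketch of the same idea with a `Charset` object instead of a charset name. It assumes the JDK ships the IBM437 charset and that the console understands UTF-8; the class and method names are made up for illustration.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

class ExtendedAsciiDemo {
    // Assumption: this JDK provides the IBM437 charset (alias "Cp437").
    private static final Charset CP437 = Charset.forName("IBM437");

    /** Maps a code page 437 byte value (0-255) to the matching Java char. */
    static char fromCp437(int code) {
        if (code < 0 || code > 0xFF) {
            throw new IllegalArgumentException("not a single-byte code: " + code);
        }
        return new String(new byte[] { (byte) code }, CP437).charAt(0);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Assumption: the console expects UTF-8; otherwise glyphs may still appear as '?'.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        for (int code : new int[] { 0x84, 0x9F, 0xE1 }) {
            char c = fromCp437(code);
            out.printf("%d -> %c (U+%04X)%n", code, c, (int) c);
        }
    }
}
```

Wrapping `System.out` in a `PrintStream` with an explicit encoding is one way to control what bytes reach the terminal; whether the glyph is drawn still depends on the terminal's font and settings.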

Related

convert ASCII into Hex

I have a GUI where I want to convert ASCII into hex, but it prints ffffff84 instead of 84. This only happens with ä, ö, ü. What went wrong?
Example input:
ä
Output:
ffffff84
My Code:
asciihex.addActionListener(new ActionListener() {
public void actionPerformed(ActionEvent e) {
output6.setText("");
String hexadecimal2 = input4.getText().replace("\n", "");
byte[] chars;
try {
chars = hexadecimal2.getBytes("CP850");
StringBuffer hexa = new StringBuffer();
for(int i = 0;i<chars.length;i++){
hexa.append(Integer.toHexString((int) chars[i]));
}
output6.append(hexa.toString());
} catch (UnsupportedEncodingException e1) {
e1.printStackTrace();
}
}
});
Code page 850 is not ASCII. And ä is not an ASCII character. Neither are your other examples of characters that don't work correctly.
What's happening is that the values of those characters, as bytes, are negative, because byte is a signed type in Java. (ä is -124, for instance.) When -124 is widened to an int, its two's complement representation in hex is 0xFFFFFF84. You can get the unsigned version by adding 256 to it, which gives 132 (0x84). Then your conversion to hex would work.
You have to make an unsigned conversion of byte value to int, e.g.
hexa.append(Integer.toHexString((int) chars[i] & 0xFF));
or (Java 8)
hexa.append(Integer.toHexString(Byte.toUnsignedInt(chars[i])));
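As a self-contained sketch of the masking approach above: the helper name `toHex` is hypothetical, and ISO-8859-1 is used here only because it is guaranteed to exist on every JVM (in that charset ä is 0xE4 rather than the 0x84 of code page 850). Padding each byte to two digits also avoids ambiguous output for values below 0x10.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class HexDemo {
    /** Encodes text with the given charset and renders each byte as two hex digits. */
    static String toHex(String text, Charset charset) {
        StringBuilder hex = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            // b & 0xFF drops the sign extension, so -124 becomes 132 (0x84).
            hex.append(String.format("%02x", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex("abc", StandardCharsets.US_ASCII));      // 616263
        System.out.println(toHex("\u00e4", StandardCharsets.ISO_8859_1)); // e4
        System.out.println(Integer.toHexString((byte) 0x84));             // ffffff84
        System.out.println(Integer.toHexString((byte) 0x84 & 0xFF));      // 84
    }
}
```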
First of all, the hexadecimal value of "ä" is not 0x84 in every encoding; in the GSM 7-bit default alphabet it is 0x7B.
To check all the hexadecimal values, please refer to the standard "ETSI TS 123 038 V14.0.0 (2017-04)".
Now for the coding part: I have already written a function that takes a character and returns its hexadecimal value as per that standard. I do not want to post that code, as it would be spoon-feeding; instead I want to guide you to write your own.
Steps:
1. First refer to the given document and understand its character tables.
2. Create a list that contains all the characters given in the table, ordered by their index values.
3. Write a function that finds the given character's index position and builds the actual hexadecimal number from it. Keep in mind that the extended character set needs extra handling.
Hope this will help you. :-)
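The table-lookup steps above can be sketched roughly as follows. This is not the answerer's function: the class name, the method name, and the deliberately tiny table are assumptions for illustration, and a real implementation would fill in the complete tables (including the extension table) from the standard.

```java
import java.util.HashMap;
import java.util.Map;

class GsmLookupSketch {
    // Deliberately partial: only a few entries of the GSM 7-bit default alphabet.
    private static final Map<Character, Integer> TABLE = new HashMap<>();
    static {
        TABLE.put('@', 0x00);
        TABLE.put('\u00c4', 0x5B); // capital A with diaeresis
        TABLE.put('\u00d6', 0x5C); // capital O with diaeresis
        TABLE.put('\u00dc', 0x5E); // capital U with diaeresis
        TABLE.put('\u00e4', 0x7B); // small a with diaeresis
        TABLE.put('\u00f6', 0x7C); // small o with diaeresis
        TABLE.put('\u00fc', 0x7E); // small u with diaeresis
    }

    /** Returns the table index, or -1 if the character is not covered by this sketch. */
    static int gsmCode(char c) {
        Integer code = TABLE.get(c);
        if (code != null) {
            return code;
        }
        // Digits and unaccented Latin letters sit at their ASCII positions.
        if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) {
            return c;
        }
        return -1; // not in the partial table (extension-table handling is omitted)
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(gsmCode('\u00e4'))); // 7b
        System.out.println(Integer.toHexString(gsmCode('A')));      // 41
    }
}
```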

Converting a string to byte[] such that the contents remain same

I have a String, say String a = "abc";. Now I want to convert it into a byte array, say byte b[];, so that when I print b it shows "abc".
How can I do that?
The getBytes() method gives a different result.
My program looks like that so far:
String a="abc";
byte b[]=a.getBytes();
What I actually have are two methods in a class. One is
public byte[] encrypt(String a)
and another is
public String decrypt(byte[] b)
After encryption I saved the data to a database, but when I read it back, the byte methods do not return the correct output. I do get the same data using the String method, but now I have to pass it to decrypt(byte[] b).
How do I do that? This is the real scenario.
Well, your first problem is that a String in Java is not an array of bytes but of chars, each of which takes 16 bits. This is to cover Unicode characters, instead of only the ASCII you would get with bytes. That means getBytes has to encode the chars with a charset, and depending on that charset a single character may take more than one array position (more than one byte), so you cannot always print the string one array position at a time.
What you could do is use getChars and then cast each char to a byte, with the corresponding loss of precision. This is not a good idea, since it won't work outside of normal English characters! You asked, though, so here you go ;)
EDIT: as @PeterLawrey mentions, Unicode characters make it even harder, with some Unicode characters needing more than one char. There's a good discussion on Stack Overflow, and it links to a detailed article from Oracle.
byte b[]=a.getBytes();
System.out.println(new String(b));
You could use this constructor to build your string back again:
String a="abc";
byte b[]=a.getBytes("UTF-8");
System.out.println(new String(b, "UTF-8"));
Other than that, you can't do System.out.println(b) and expect to see abc.
A byte is a value between -128 and 127. When you print it, it is shown as a number by default.
If you want to print it as an ASCII char, you can cast it to char:
byte[] bytes = "abc".getBytes();
for(byte b: bytes)
System.out.println((char) b);
prints
a
b
c
It seems like you are implementing encryption and decryption code.
String constructors are for text data. You should not use them to convert a byte array
that contains encrypted data to a string value.
You should use Base64 instead, which encodes any binary data into ASCII.
This one is a good public-domain implementation:
http://iharder.sourceforge.net/current/java/base64/
Base64 in Apache Commons Codec:
http://commons.apache.org/codec/download_codec.cgi
byte[] data = "abc".getBytes("UTF-8"); // stands in for the encrypted bytes
String encoded = Base64.encodeBytes(data); // binary -> ASCII text that is safe to store
byte[] decoded = Base64.decode(encoded); // text -> the original bytes (throws IOException on bad input)
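On Java 8 and later the same round trip can be done without a third-party library, using `java.util.Base64`. The sketch below is only an illustration: the class and helper names are hypothetical, and the byte array stands in for whatever `encrypt` would return.

```java
import java.util.Arrays;
import java.util.Base64;

class Base64RoundTrip {
    /** Binary data to ASCII text that can be stored in a text column. */
    static String toText(byte[] binary) {
        return Base64.getEncoder().encodeToString(binary);
    }

    /** Stored text back to the exact original bytes. */
    static byte[] toBinary(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] cipherText = { -114, 104, -35, 0, 127 }; // stand-in for encrypt(...) output
        String stored = toText(cipherText);
        byte[] restored = toBinary(stored);
        System.out.println(stored);
        System.out.println(Arrays.equals(cipherText, restored)); // true
    }
}
```

The restored array can then be passed to `decrypt(byte[] b)` unchanged, which is the property that `new String(bytes)` cannot guarantee for arbitrary binary data.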
This converts "abc" to bytes and then prints the ASCII code of each character (i.e. 97 98 99).
byte a[]=new byte[160];
String s="abc";
a=s.getBytes();
for(int i=0;i<s.length();i++)
{
System.out.print(a[i]+" ");
}
If you add these lines, the ASCII codes are converted back to a String (i.e. abc):
String s1=new String(a);
System.out.print("\n"+s1);
Hope it helps.
Modified Code:
To send byte array as an argument:
public static void another_method_name(byte b1[])
{
String s1=new String(b1);
System.out.print("\n"+s1);
}
public static void main(String[] args)
{
byte a[]=new byte[160];
String s="abc";
a=s.getBytes();
for(int i=0;i<s.length();i++)
{
System.out.print(a[i]+" ");
}
another_method_name(a);
}
Hope it helps again.

Converting binary data to String

If I have some binary data D and I convert it to a string S, I expect that converting it back to binary will give me D. But that is not what happens.
public class A {
public static void main(String[] args) throws IOException {
final byte[] bytes = new byte[]{-114, 104, -35};// In hex: 8E 68 DD
System.out.println(bytes.length); //prints 3
System.out.println(new String(bytes, "UTF-8").getBytes("UTF-8").length); //prints 7
}
}
Why does this happen?
Converting a byte array to a String and back again is not a one-to-one mapping operation. According to the docs, the String implementation uses a CharsetDecoder to convert the incoming byte array into Unicode. The first and last bytes in your input byte array evidently do not map to valid Unicode characters, so the decoder replaces each of them with a replacement character.
It's likely that the bytes you're converting to a string don't actually form a valid string. If java can't figure out what you mean by each byte, it will attempt to fix them. This means that when you convert back to the byte array, it won't be the same as when you started. If you try with a valid set of bytes, then you should be more successful.
Your data can't be decoded into valid Unicode characters using the UTF-8 encoding. Look at the decoded string. It consists of 3 characters: 0xFFFD, 0x0068 and 0xFFFD. The first and last are "�", the Unicode replacement character, and each one is encoded as three bytes in UTF-8, which is why the result has 3 + 1 + 3 = 7 bytes. I think you need to choose another encoding. For example, "CP866" produces a valid string and converts back into the same array.
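To make the difference concrete, here is a small sketch (the class and method names are made up for illustration) that checks whether a byte array is valid UTF-8 by asking the decoder to report errors instead of silently replacing them, and that compares round trips through UTF-8 and ISO-8859-1. ISO-8859-1 is used because it maps every byte value to exactly one char and is available on every JVM.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RoundTripDemo {
    /** True if the bytes decode as UTF-8 without any malformed sequence. */
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** True if bytes -> String -> bytes gives back the same array for this charset. */
    static boolean survivesRoundTrip(byte[] data, Charset charset) {
        return Arrays.equals(data, new String(data, charset).getBytes(charset));
    }

    public static void main(String[] args) {
        byte[] bytes = { -114, 104, -35 }; // 8E 68 DD, as in the question
        System.out.println(isValidUtf8(bytes));                                  // false
        System.out.println(survivesRoundTrip(bytes, StandardCharsets.UTF_8));      // false
        System.out.println(survivesRoundTrip(bytes, StandardCharsets.ISO_8859_1)); // true
    }
}
```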

get char value in java

How can I get the UTF8 code of a char in Java ?
I have the char 'a' and I want the value 97
I have the char 'é' and I want the value 233
here is a table for more values
I tried Character.getNumericValue(a) but for a it gives me 10 and not 97, any idea why?
This seems very basic but any help would be appreciated!
char is actually a numeric type containing the unicode value (UTF-16, to be exact - you need two chars to represent characters outside the BMP) of the character. You can do everything with it that you can do with an int.
Character.getNumericValue() tries to interpret the character as a digit.
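A short sketch of the difference, with a hypothetical helper `codeOf` that relies on nothing more than the implicit widening from char to int:

```java
class NumericValueDemo {
    /** Returns the UTF-16 code unit of the char as a plain int. */
    static int codeOf(char c) {
        return c; // implicit widening conversion, no method call needed
    }

    public static void main(String[] args) {
        System.out.println(Character.getNumericValue('7')); // 7
        System.out.println(Character.getNumericValue('a')); // 10
        System.out.println(codeOf('a'));                    // 97
        System.out.println(codeOf('\u00e9'));               // 233
    }
}
```

The value 10 for 'a' comes from getNumericValue treating the letters a to z as the digits 10 to 35, as in hexadecimal and higher radixes.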
You can use the codePointAt(int index) method of java.lang.String for that. Here's an example:
"a".codePointAt(0) --> 97
"é".codePointAt(0) --> 233
If you want to avoid creating strings unnecessarily, the following works as well and can be used for char arrays:
Character.codePointAt(new char[] {'a'},0)
Those "UTF-8" codes are no such thing. They're actually just Unicode values, as per the Unicode code charts.
So an 'é' is actually U+00E9 - in UTF-8 it would be represented by two bytes { 0xc3, 0xa9 }.
Now to get the Unicode value - or to be more precise the UTF-16 value, as that's what Java uses internally - you just need to convert the value to an integer:
char c = '\u00e9'; // c is now e-acute
int i = c; // i is now 233
This produces the right result:
int a = 'a';
System.out.println(a); // outputs 97
Likewise:
System.out.println((int)'é');
prints out 233.
Note that the first example only works for characters included in the standard and extended ASCII character sets. The second works with all Unicode characters. You can achieve the same result by multiplying the char by 1.
System.out.println( 1 * 'é');
Your question is unclear. Do you want the Unicode codepoint for a particular character (which is the example you gave), or do you want to translate a Unicode codepoint into a UTF-8 byte sequence?
If the former, then I recommend the code charts at http://www.unicode.org/
If the latter, then the following program will do it:
public class Foo
{
public static void main(String[] argv)
throws Exception
{
char c = '\u00E9';
ByteArrayOutputStream bos = new ByteArrayOutputStream();
OutputStreamWriter out = new OutputStreamWriter(bos, "UTF-8");
out.write(c);
out.flush();
byte[] bytes = bos.toByteArray();
for (int ii = 0 ; ii < bytes.length ; ii++)
System.out.println(bytes[ii] & 0xFF);
}
}
(there's also an online Unicode to UTF8 page, but I don't have the URL on this machine)
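A shorter route to the same UTF-8 byte sequence, assuming Java 7 or later for `StandardCharsets`, is to let `String.getBytes` do the encoding. The class and method names below are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8BytesDemo {
    /** Returns the UTF-8 encoding of one code point as unsigned byte values. */
    static int[] utf8Bytes(int codePoint) {
        byte[] raw = new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
        int[] unsigned = new int[raw.length];
        for (int i = 0; i < raw.length; i++) {
            unsigned[i] = raw[i] & 0xFF; // mask off the sign extension
        }
        return unsigned;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(utf8Bytes('a')));  // [97]
        System.out.println(Arrays.toString(utf8Bytes(0xE9))); // [195, 169]
    }
}
```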
My method to do it is something like this:
char c = 'c';
int i = Character.codePointAt(String.valueOf(c), 0);
// testing
System.out.println(String.format("%c -> %d", c, i)); // c -> 99
You can create a simple loop that lists characters by their numeric code (here the codes 12 to 999) like this:
public class UTF8Characters {
public static void main(String[] args) {
for (int i = 12; i <= 999; i++) {
System.out.println(i +" - "+ (char)i);
}
}
}
There is an open source library, MgntUtils, that has a utility class StringUnicodeEncoderDecoder. That class provides static methods that convert any String into a Unicode sequence and vice versa. Very simple and useful. To convert a String you just do:
String codes = StringUnicodeEncoderDecoder.encodeStringToUnicodeSequence(myString);
For example a String "Hello World" will be converted into
"\u0048\u0065\u006c\u006c\u006f\u0020
\u0057\u006f\u0072\u006c\u0064"
It works with any language. Here is the link to the article that explains all the details about the library: MgntUtils. Look for the subtitle "String Unicode converter". The article gives you links to Maven Central, where you can get the artifacts, and to GitHub, where you can get the project itself. The library comes with well-written Javadoc and source code.

Java String.codePointAt returns unexpected value

If I use any ASCII characters from 33 to 127, the codePointAt method gives the correct decimal value, for example:
String s1 = new String("#");
int val = s1.codePointAt(0);
This returns 35 which is the correct value.
But if I try to use ASCII characters from 128 to 255 (extended ASCII/ISO-8859-1), this method gives the wrong value, for example:
String s1 = new String("ƒ"); // Latin small letter f with hook
int val = s1.codePointAt(0);
This should return 159 as per this reference table, but instead it returns 402. Why is this?
"But if I try to use ASCII characters from 128 to 255"
ASCII doesn't have values in this range. It only uses 7 bits.
Java chars are UTF-16 (and nothing else!). If you want to represent ASCII using Java, you need to use a byte array.
The codePointAt method returns the 32-bit code point. 16-bit chars can't contain the entire Unicode range, so some code points must be split across two chars (as per the encoding scheme for UTF-16). The codePointAt method helps resolve chars to code points.
I wrote a rough guide to encoding in Java here.
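To illustrate the point about code points that need two chars, here is a small sketch with U+1F600, a character outside the Basic Multilingual Plane, written as its surrogate pair so the source file stays pure ASCII. The class and method names are made up for illustration.

```java
class CodePointDemo {
    /** Returns the code point that starts at index 0, combining a surrogate pair if present. */
    static int firstCodePoint(String s) {
        return s.codePointAt(0);
    }

    public static void main(String[] args) {
        String s = "\uD83D\uDE00"; // one code point (U+1F600) stored as two chars
        System.out.println(s.length());                               // 2
        System.out.println(s.codePointCount(0, s.length()));          // 1
        System.out.println(Integer.toHexString(s.charAt(0)));         // d83d
        System.out.println(Integer.toHexString(firstCodePoint(s)));   // 1f600
        System.out.println(firstCodePoint("#"));                      // 35
    }
}
```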
Java chars are not encoded in ISO-8859-1. They use UTF-16 which has the same values for 7bit ASCII characters (only values from 0-127).
To get the correct value for ISO-8859-1 you have to convert your string into a byte[] with String.getBytes("ISO-8859-1"); and look in the byte array.
Update
ISO-8859-1 is not the extended ASCII encoding meant here; use String.getBytes("Cp437"); to get the values from that reference table.
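A minimal sketch of that lookup, assuming the JDK provides the IBM437 charset (the canonical name for "Cp437"); the class and method names are hypothetical. It also shows what happens with ISO-8859-1, which has no ƒ at all, so `getBytes` substitutes a question mark.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class LegacyCodeDemo {
    /** Returns the first encoded byte of the char in the given charset, as an unsigned value. */
    static int codeIn(char c, Charset charset) {
        byte[] encoded = String.valueOf(c).getBytes(charset);
        return encoded[0] & 0xFF;
    }

    public static void main(String[] args) {
        char fHook = '\u0192'; // Latin small letter f with hook
        System.out.println((int) fHook);                                 // 402
        System.out.println(codeIn(fHook, Charset.forName("IBM437")));    // 159
        System.out.println(codeIn(fHook, StandardCharsets.ISO_8859_1));  // 63, i.e. '?'
    }
}
```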
in Unicode
ƒ 0x0192 LATIN SMALL LETTER F WITH HOOK
String.codePointAt returns the Unicode-Codepoint at this specified index.
The Unicode-Codepoint of ƒ is 402, see
http://www.decodeunicode.org/de/u+0192/properties
So
System.out.println("ƒ".codePointAt(0));
printing 402 is correct.
If you are interested in the representation in other charsets, you can print the byte representation of the character in those charsets via getBytes(String charsetName):
final String s = "ƒ";
for (final String csName : Charset.availableCharsets().keySet()) {
try {
final Charset cs = Charset.forName(csName);
final CharsetEncoder encode = cs.newEncoder();
if (encode.canEncode(s))
{
System.out.println(csName + ": " + Arrays.toString(s.getBytes(csName)));
}
} catch (final UnsupportedOperationException uoe) {
} catch (final UnsupportedEncodingException e) {
}
}
