How can I convert ASCII values to hexadecimal and binary values (not their string representation in ASCII)? For example, how can I convert the decimal value 26 to 0x1A?
So far, I've tried converting using the following steps (see below for actual code):
Converting value to bytes
Converting each byte to int
Converting each int to hex, via Integer.toString(intValue, radix)
Note: I did ask a related question about writing hex values to a file.
Clojure code:
(apply str
  (for [byte (.getBytes value)]
    (.replace (format "%2s" (Integer/toString (.intValue byte) 16)) " " "0")))
Java code:
byte[] bytes = "26".getBytes();
for (byte data : bytes) {
    System.out.print(String.format("%2s", Integer.toString(data, 16)).replace(" ", "0"));
}
System.out.print("\n");
Hexadecimal, decimal, and binary integers are not different things -- there's only one underlying representation of an integer. The one thing you said you're trying to avoid -- "the ASCII string representation" -- is the only thing that's different. The variable is always the same, it's just how you print it that's different.
Now, it's not 100% clear to me what you're trying to do. But given the above, the path is clear: if you've got a String, convert it to an int by parsing (i.e., using Integer.parseInt()). Then if you want it printed in some format, it's easy to print that int as whatever you want using, for example, printf format specifiers.
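As a minimal sketch of that parse-then-format approach (the class and method names are made up for illustration, not taken from the question):

```java
public class RadixDemo {
    // Parse the decimal text once; the resulting int has no radix of its own.
    static String toHex(String decimalText) {
        int n = Integer.parseInt(decimalText);
        return String.format("0x%02X", n);
    }

    // The same int, printed in base 2 instead.
    static String toBinary(String decimalText) {
        int n = Integer.parseInt(decimalText);
        return Integer.toBinaryString(n);
    }

    public static void main(String[] args) {
        System.out.println(toHex("26"));    // 0x1A
        System.out.println(toBinary("26")); // 11010
    }
}
```

Only the formatting call differs between the two outputs; the parsed value is the same int in both cases.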
If you actually want hexadecimal strings, then (format "%02X" n) is much simpler than the hoops you jump through in your first try. If you don't, then just write the integer values to a file directly without trying to convert them to a string.
Something like (.write out (read-string string-representing-a-number)) should be sufficient.
Here are your three steps rolled up into one line of clojure:
(apply str (map #(format "0x%02X " %) (.getBytes (str 42))))
convert to bytes (.getBytes (str 42))
no actual need for step 2
convert each byte to a string of characters representing it in hex
or you can make it look more like your steps with the "thread last" macro
(->> (str 42) ; start with your value
(.getBytes) ; put it in an array of bytes
(map #(format "0x%02X " %)) ; make hex string representation
(apply str)) ; optionally wrap it up in a string
static String decimalToHex(String decimal, int minLength) {
    long n = Long.parseLong(decimal, 10);
    // Long.toHexString formats assuming n is unsigned, so pass the magnitude
    // and add the sign separately below.
    String hex = Long.toHexString(Math.abs(n));
    StringBuilder sb = new StringBuilder(minLength);
    if (n < 0) { sb.append('-'); }
    int padding = minLength - hex.length() - sb.length();
    while (--padding >= 0) { sb.append('0'); }
    return sb.append(hex).toString();
}
//get Unicode for char
char theChar = 'a';
//use this to go from int to Unicode or HEX or ASCII
int theValue = 26;
String hex = Integer.toHexString(theValue);
while (hex.length() < 4) {
hex = "0" + hex;
}
String unicode = "\\u" + (hex);
System.out.println(hex);
How can I convert an unsigned integer to EBCDIC format when sending it to a mainframe? Suppose I want to encode 4550 in EBCDIC; the snippet I'm trying is below. According to the EBCDIC chart, numbers don't have an equivalent symbol to be encoded, and I always get a blank result.
String s = "4550";
String e = new String(s.getBytes(), "Cp037");
Can someone please help me with the steps to encode it to EBCDIC?
The mainframe expects the data in an encoded format: when it consumes the request, numeric fields should be in an unreadable format. Here is an example:
C ¤,G ÚM P1234 N
Alphanumeric fields are in a readable format, while some numeric fields are encoded as symbols. I'm looking for a way to achieve the same.
I found some solutions online that convert an integer to packed decimal and then to EBCDIC, but they didn't work as expected.
You need to be very clear about what is required on the mainframe end: is it pure text, or is it a binary format (e.g. COBOL comp, comp-3)?
Do you need to convert it to EBCDIC at all? If it is just text, create it as normal text and let the mainframe transfer software do the translation to EBCDIC.
Java strings are sequences of 16-bit UTF-16 code units. EBCDIC is an 8-bit character encoding, so it is best represented as a byte array or stream. To convert a Java String to EBCDIC:
byte[] ebcdicData = "0123".getBytes("cp037");
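A small sketch of what that conversion produces for text digits. It assumes the running JDK includes the IBM037 charset (it lives in the extended charsets, which not every trimmed runtime ships); the class and method names are hypothetical:

```java
import java.nio.charset.Charset;

public class EbcdicDemo {
    // Encode text as EBCDIC code page 037 and show each byte as two hex digits.
    // Assumption: the IBM037 charset is available in this JDK.
    static String toEbcdicHex(String text) {
        byte[] ebcdic = text.getBytes(Charset.forName("IBM037"));
        StringBuilder sb = new StringBuilder(ebcdic.length * 2);
        for (byte b : ebcdic) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // In code page 037 the digits 0-9 occupy 0xF0-0xF9.
        System.out.println(toEbcdicHex("0123")); // f0f1f2f3
    }
}
```

This covers only the pure-text case; a binary field such as comp-3 needs a different layout that the text encoding does not produce.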
I found a way to do the encoding; the snippet is below (input is the numeric string):
String hex;
int val = Integer.parseInt(input);
if (val >= 0 && val <= 0xffff) {
hex = String.format("%04x", val);
} else {
hex = String.format("%08x", val);
}
String[] split = hex.split("(?<=\\G.{" + 2 + "})");
String result = "";
for (String sp : split) {
byte b1[] = {(byte) Integer.parseInt(sp, 16)};
result += new String(b1, "Cp037");
}
I need to merge two Hex strings into one.
The first one is composed like this:
String hexch = "";
for (int i = 0; i < 10; i++) {
    int ch = inStream.read();
    if (ch >= 0) {
        hexch += Integer.toHexString(ch);
    }
}
In the stream I receive from a serial port the characters ST=0
The second one like this:
String one = ";sp=16;";
String sqhex="";
byte[] data = one.getBytes();
int j;
for (j=0;j<data.length;j++)
{
sqhex+=Integer.toHexString(data[j]);
}
I need to compose a string with both strings that get me this: "ST=1;sp=16;" in HEX. To do so, I did this:
String mensagem = "";
mensagem = hexch + sqhex;
The thing is that the resulting hex string,
53543d31d3b73703d31363b
doesn't represent what I need. Instead of "ST=1;sp=16;" I get "ST=1Ó·7Óc"
Is there any way to merge the hex strings to build what I need?
Thanks
This is because Integer.toHexString(ch) returns a string of varying length (one digit for values below 16, two digits otherwise), so the result of your encoding process cannot be decoded unambiguously.
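One way around this is to pad every byte to exactly two hex digits. The sketch below is a suggestion, not the asker's code, and the class and method names are made up. The stray "d" in the question's output is consistent with a carriage return (value 13) arriving from the serial port and being written as a single digit, though that is only a guess from the output shown.

```java
import java.nio.charset.StandardCharsets;

public class FixedWidthHex {
    // Encode every byte as exactly two hex digits so the result can be decoded again.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            // b & 0xFF keeps the value in 0..255 even for negative bytes.
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String first = toHex("ST=1".getBytes(StandardCharsets.US_ASCII));
        String second = toHex(";sp=16;".getBytes(StandardCharsets.US_ASCII));
        System.out.println(first + second); // 53543d313b73703d31363b
        // A carriage return (13) becomes "0d", not the single digit "d".
        System.out.println(toHex(new byte[] {13})); // 0d
    }
}
```

With a fixed width, concatenating two hex strings is safe because every pair of digits maps back to exactly one byte.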
//PROBLEM SOLVED
I wrote a program to convert an EBCDIC string to hex.
I have a problem with some characters.
I read the string into bytes, then convert them to hex (two digits per byte).
The problem is that Java converts the character with decimal value 136 (according to https://shop.alterlinks.com/ascii-table/ascii-ebcdic-us.php) to decimal 63.
This is wrong and problematic, because it's the only character that is converted incorrectly.
EBCDIC 88 in UltraEdit
//edit -added code
int[] bytes = toByte(0, 8);
String bitmap = hexToBin(toHex(bytes));
//ebytes[] - ebcdic string
public int[] toByte(int from, int to){
int[] newBytes = new int[to - from + 1];
int k = 0;
for(int i = from; i <= to; i++){
newBytes[k] = ebytes[i] & 0xff;
k++;
}
return newBytes;
}
public String toHex(int[] hex){
StringBuilder sb = new StringBuilder();
for (int b : hex) {
if(Integer.toHexString(b).length() == 1){
sb.append("0" + Integer.toHexString(b));
}else{
sb.append(Integer.toHexString(b));
}
}
return sb.toString();
}
public String hexToBin(String hex){
String toReturn = new BigInteger(hex, 16).toString(2);
return String.format("%" + (hex.length() * 4) + "s", toReturn).replace(' ', '0');
}
//edit2
Changing the encoding in Eclipse to ISO-8859-1 helped, but I still lose some characters while reading text from a file.
//edit3
Problem solved by changing the way I read the file.
Now I read it byte by byte and convert each byte to a char.
Before, I read it line by line.
There is no ASCII value of 136, since ASCII is only 7 bits wide; everything beyond 127 belongs to some extended code page (the linked table seems to use a Windows code page, e.g. Cp1252). Since it prints a ?, it seems you are using a code page that has no character assigned to the value 136, e.g. some flavour of ISO-8859.
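To illustrate where a 63 ('?') can come from, here is a hedged sketch with hypothetical names. It uses U+02C6, the character Windows-1252 assigns to byte 136, and shows that encoding it as ISO-8859-1 falls back to '?', while decoding raw bytes as ISO-8859-1 preserves every byte value:

```java
import java.nio.charset.StandardCharsets;

public class UnmappableDemo {
    // Encode one character as ISO-8859-1 and return its single byte as an unsigned value.
    static int encodeLatin1(char c) {
        return String.valueOf(c).getBytes(StandardCharsets.ISO_8859_1)[0] & 0xFF;
    }

    // Decode one byte as ISO-8859-1 and return the resulting char value.
    static int decodeLatin1(int b) {
        return new String(new byte[] {(byte) b}, StandardCharsets.ISO_8859_1).charAt(0);
    }

    public static void main(String[] args) {
        // ISO-8859-1 cannot represent U+02C6, so getBytes substitutes '?'.
        System.out.println(encodeLatin1('\u02C6')); // 63
        // Decoding with ISO-8859-1 maps each byte to the char with the same value.
        System.out.println(decodeLatin1(136));      // 136
    }
}
```

Whether this is the exact path the data took in the question is an assumption; the point is only that a mismatch between decoding and encoding charsets is enough to turn 136 into 63.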
Solution.
Change the encoding in Eclipse to a more suitable one (in my case ISO-8859-1).
Change the way the file is read.
Now I read it byte by byte and convert each byte to a char.
Before, I read it line by line, and this is how I lost some characters.
I want to display a Unicode character in Java. If I do this, it works just fine:
String symbol = "\u2202";
symbol is equal to "∂". That's what I want.
The problem is that I know the Unicode number and need to create the Unicode symbol from that. I tried (to me) the obvious thing:
int c = 2202;
String symbol = "\\u" + c;
However, in this case, symbol is equal to "\u2202". That's not what I want.
How can I construct the symbol if I know its Unicode number (but only at run-time---I can't hard-code it in like the first example)?
If you want to get a UTF-16 encoded code unit as a char, you can parse the integer and cast to it as others have suggested.
If you want to support all code points, use Character.toChars(int). This will handle cases where code points cannot fit in a single char value.
Doc says:
Converts the specified character (Unicode code point) to its UTF-16 representation stored in a char array. If the specified code point is a BMP (Basic Multilingual Plane or Plane 0) value, the resulting char array has the same value as codePoint. If the specified code point is a supplementary code point, the resulting char array has the corresponding surrogate pair.
Just cast your int to a char. You can convert that to a String using Character.toString():
String s = Character.toString((char)c);
EDIT:
Just remember that the escape sequences in Java source code (the \u bits) are in HEX, so if you're trying to reproduce an escape sequence, you'll need something like int c = 0x2202.
The other answers here either only support unicode up to U+FFFF (the answers dealing with just one instance of char) or don't tell how to get to the actual symbol (the answers stopping at Character.toChars() or using incorrect method after that), so adding my answer here, too.
To support supplementary code points also, this is what needs to be done:
// this character:
// http://www.isthisthingon.org/unicode/index.php?page=1F&subpage=4&glyph=1F495
// using code points here, not U+n notation
// for equivalence with U+n, below would be 0xnnnn
int codePoint = 128149;
// converting to char[] pair
char[] charPair = Character.toChars(codePoint);
// and to String, containing the character we want
String symbol = new String(charPair);
// we now have symbol with the desired character as the first item
// confirm that we indeed have character with code point 128149
System.out.println("First code point: " + symbol.codePointAt(0));
I also did a quick test as to which conversion methods work and which don't
int codePoint = 128149;
char[] charPair = Character.toChars(codePoint);
System.out.println(new String(charPair, 0, 2).codePointAt(0)); // 128149, worked
System.out.println(charPair.toString().codePointAt(0)); // 91, didn't work
System.out.println(new String(charPair).codePointAt(0)); // 128149, worked
System.out.println(String.valueOf(codePoint).codePointAt(0)); // 49, didn't work
System.out.println(new String(new int[] {codePoint}, 0, 1).codePointAt(0));
// 128149, worked
--
Note: as @Axel mentioned in the comments, with Java 11 there is Character.toString(int codePoint), which would arguably be best suited for the job.
This one worked fine for me.
String cc2 = "2202";
String text2 = String.valueOf(Character.toChars(Integer.parseInt(cc2, 16)));
Now text2 will have ∂.
Remember that char is an integral type, and thus can be given an integer value, as well as a char constant.
char c = 0x2202;//aka 8706 in decimal. \u codepoints are in hex.
String s = String.valueOf(c);
String st = "2202";
int cp = Integer.parseInt(st, 16); // parses st as a hexadecimal number
char[] c = Character.toChars(cp);
System.out.println(c); // displays the character corresponding to '\u2202'
Although this is an old question, there is a very easy way to do this in Java 11 which was released today: you can use a new overload of Character.toString():
public static String toString(int codePoint)
Returns a String object representing the specified character (Unicode code point). The result is a string of length 1 or 2, consisting solely of the specified codePoint.
Parameters:
codePoint - the codePoint to be converted
Returns:
the string representation of the specified codePoint
Throws:
IllegalArgumentException - if the specified codePoint is not a valid Unicode code point.
Since:
11
Since this method supports any Unicode code point, the length of the returned String is not necessarily 1.
The code needed for the example given in the question is simply:
int codePoint = '\u2202';
String s = Character.toString(codePoint); // <<< Requires JDK 11 !!!
System.out.println(s); // Prints ∂
This approach offers several advantages:
It works for any Unicode code point rather than just those that can be handled using a char.
It's concise, and it's easy to understand what the code is doing.
It returns the value as a string rather than a char[], which is often what you want. The answer posted by McDowell is appropriate if you want the code point returned as char[].
This is how you do it:
int cc = 2202; // decimal literal whose digits are reinterpreted as hex below
char ccc = (char) Integer.parseInt(String.valueOf(cc), 16);
final String text = String.valueOf(ccc);
This solution is by Arne Vajhøj.
The code below writes the 4 Unicode characters (given as decimal values) for the word "be" in Japanese. Yes, the verb "be" in Japanese has 4 characters!
The character values are in decimal and have been read into a String[] array, using split for instance. If you have octal or hex, parseInt takes a radix as well.
// pseudo code
// 1. init the String[] containing the 4 code points in decimal :: intsInStrs
// 2. allocate room for up to two chars per code point :: c2s
// 3. using Integer.parseInt (... with radix or not) get the right int value
// 4. place it at the next free position in the char array
// 5. convert the filled part of c2s[] to String
// 6. print
String[] intsInStrs = {"12354", "12426", "12414", "12377"}; // 1.
char[] c2s = new char[intsInStrs.length * 2]; // 2. a code point needs at most two chars
int pos = 0;
for (String intString : intsInStrs) {
    // 3 + 4. toChars returns how many chars it wrote (1 for BMP, 2 for supplementary)
    pos += Character.toChars(Integer.parseInt(intString), c2s, pos);
}
String symbols = new String(c2s, 0, pos); // 5.
System.out.println("\nLooooonger code point: " + symbols); // 6.
// I tested it in Eclipse and Java 7 and it works. Enjoy
Here is a block to print out unicode chars between \u00c0 to \u00ff:
char[] ca = {'\u00c0'};
for (int i = 0; i < 4; i++) {
for (int j = 0; j < 16; j++) {
String sc = new String(ca);
System.out.print(sc + " ");
ca[0]++;
}
System.out.println();
}
Unfortunately, removing one backslash, as suggested in the first comment (newbiedoodle), does not give a good result; most (if not all) IDEs report a syntax error. The reason is that the Java Unicode escape format requires the syntax "\uXXXX", where XXXX is exactly four hexadecimal digits, all of them mandatory. Attempts to assemble such a string from pieces fail. Of course, "\u" is not the same as "\\u": the first is the start of a Unicode escape, while the second is an escaped backslash (that is, a literal backslash) followed by 'u'.
Strangely, the Apache pages present a utility that does exactly this, but in reality it only mimics the escape format. Apache has some utilities of its own (I didn't test them) that do this work for you, although they may still not be what you want: Apache Escape Unicode utilities. That utility does take a good approach to the solution, though. Combining it with the approach described above (MeraNaamJoker), my solution is to create this mimic escaped string and then convert it back to Unicode (to avoid the restriction on real Unicode escapes). I used it for copying text, so in the uencodeStr method it may be better to use '\\u' instead of '\\\\u'. Try it.
/**
* Converts character to the mimic unicode format i.e. '\\u0020'.
*
* This format is the Java source code format.
*
* CharUtils.unicodeEscaped(' ') = "\\u0020"
* CharUtils.unicodeEscaped('A') = "\\u0041"
*
* @param ch the character to convert
* @return the character in the mimic escaped Unicode format
*/
public static String unicodeEscaped(char ch) {
String returnStr;
//String uniTemplate = "\u0000";
final String charEsc = "\\u";
if (ch < 0x10) {
returnStr = "000" + Integer.toHexString(ch);
}
else if (ch < 0x100) {
returnStr = "00" + Integer.toHexString(ch);
}
else if (ch < 0x1000) {
returnStr = "0" + Integer.toHexString(ch);
}
else
returnStr = "" + Integer.toHexString(ch);
return charEsc + returnStr;
}
/**
* Converts the string to the mimic Unicode format, i.e. '\\u0020'.
* Note: I cannot use the real Unicode escape format, because it is translated
* to the character at compile time and the editor (e.g. NetBeans) checks it.
* Instead of the real format, i.e. '\u0020', I use the mimic format '\\u0020'
* as a string, which of course does not give the same result.
*
* This format is the Java source code format.
*
* CharUtils.unicodeEscaped(' ') = "\\u0020"
* CharUtils.unicodeEscaped('A') = "\\u0041"
*
* @param nationalString the string to convert
* @return the string in the Java mimic escaped Unicode format
*/
public String encodeStr(String nationalString) throws UnsupportedEncodingException {
String convertedString = "";
for (int i = 0; i < nationalString.length(); i++) {
Character chs = nationalString.charAt(i);
convertedString += unicodeEscaped(chs);
}
return convertedString;
}
/**
* Converts the string from mimic unicode format i.e. '\\u0020' back to UTF8.
*
* This format is the Java source code format.
*
* CharUtils.unicodeEscaped(' ') = "\\u0020"
* CharUtils.unicodeEscaped('A') = "\\u0041"
*
* @param escapedString the string in the Java mimic escaped Unicode format
* @return the decoded string
*/
public String uencodeStr(String escapedString) throws UnsupportedEncodingException {
String convertedString = "";
String[] arrStr = escapedString.split("\\\\u");
String str, istr;
for (int i = 1; i < arrStr.length; i++) {
str = arrStr[i];
if (!str.isEmpty()) {
Integer iI = Integer.parseInt(str, 16);
char[] chaCha = Character.toChars(iI);
convertedString += String.valueOf(chaCha);
}
}
return convertedString;
}
char c=(char)0x2202;
String s=""+c;
(This answer is for .NET 4.5; a similar approach must exist in Java.)
I am from West Bengal in India.
As I understand it, your problem is this:
you want to produce something like ' অ ' (a letter in the Bengali language),
which has the Unicode hex value 0x0985.
If you know this value for your language, how do you produce the corresponding Unicode symbol?
In .NET it is as simple as this:
int c = 0X0985;
string x = Char.ConvertFromUtf32(c);
Now x is your answer.
But this converts one hex value at a time; sentence-to-sentence conversion is a job for researchers :P
How can I get the UTF8 code of a char in Java ?
I have the char 'a' and I want the value 97
I have the char 'é' and I want the value 233
here is a table for more values
I tried Character.getNumericValue(a) but for a it gives me 10 and not 97, any idea why?
This seems very basic but any help would be appreciated!
char is actually a numeric type containing the unicode value (UTF-16, to be exact - you need two chars to represent characters outside the BMP) of the character. You can do everything with it that you can do with an int.
Character.getNumericValue() tries to interpret the character as a digit.
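A short sketch of the difference (the class and helper names are hypothetical). It shows why 'a' gives 10: letters are treated as digits, as in radix-36 numbers, whereas widening the char to int gives its code unit:

```java
public class NumericValueDemo {
    // The value of the char when read as a digit; letters a-z map to 10-35.
    static int digitValue(char c) {
        return Character.getNumericValue(c);
    }

    // The UTF-16 code unit of the char, obtained by widening it to int.
    static int codeUnit(char c) {
        return c;
    }

    public static void main(String[] args) {
        System.out.println(digitValue('a'));      // 10
        System.out.println(digitValue('7'));      // 7
        System.out.println(codeUnit('a'));        // 97
        System.out.println(codeUnit('\u00e9'));   // 233
    }
}
```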
You can use the codePointAt(int index) method of java.lang.String for that. Here's an example:
"a".codePointAt(0) --> 97
"é".codePointAt(0) --> 233
If you want to avoid creating strings unnecessarily, the following works as well and can be used for char arrays:
Character.codePointAt(new char[] {'a'},0)
Those "UTF-8" codes are no such thing. They're actually just Unicode values, as per the Unicode code charts.
So an 'é' is actually U+00E9 - in UTF-8 it would be represented by two bytes { 0xc3, 0xa9 }.
Now to get the Unicode value - or to be more precise the UTF-16 value, as that's what Java uses internally - you just need to convert the value to an integer:
char c = '\u00e9'; // c is now e-acute
int i = c; // i is now 233
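To see the code point and its UTF-8 encoding side by side, here is a small sketch; the class and helper names are made up for illustration:

```java
import java.nio.charset.StandardCharsets;

public class Utf8BytesDemo {
    // Return the UTF-8 encoding of a single char as lowercase hex pairs.
    static String utf8Hex(char c) {
        StringBuilder sb = new StringBuilder();
        for (byte b : String.valueOf(c).getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char eAcute = '\u00e9';
        System.out.println((int) eAcute);    // 233, the code point U+00E9
        System.out.println(utf8Hex(eAcute)); // c3a9, two UTF-8 bytes
        System.out.println(utf8Hex('a'));    // 61, ASCII stays a single byte
    }
}
```

The helper takes a single char, so it only applies to characters in the BMP; a supplementary character would need the String or code point form instead.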
This produces good result:
int a = 'a';
System.out.println(a); // outputs 97
Likewise:
System.out.println((int)'é');
prints out 233.
Note that both forms work for any char value, not just ASCII: a char widens implicitly to int, and the cast only makes that conversion explicit. You can achieve the same result by multiplying the char by 1.
System.out.println( 1 * 'é');
Your question is unclear. Do you want the Unicode codepoint for a particular character (which is the example you gave), or do you want to translate a Unicode codepoint into a UTF-8 byte sequence?
If the former, then I recommend the code charts at http://www.unicode.org/
If the latter, then the following program will do it:
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;

public class Foo
{
public static void main(String[] argv)
throws Exception
{
char c = '\u00E9';
ByteArrayOutputStream bos = new ByteArrayOutputStream();
OutputStreamWriter out = new OutputStreamWriter(bos, "UTF-8");
out.write(c);
out.flush();
byte[] bytes = bos.toByteArray();
for (int ii = 0 ; ii < bytes.length ; ii++)
System.out.println(bytes[ii] & 0xFF);
}
}
(there's also an online Unicode to UTF8 page, but I don't have the URL on this machine)
My method to do it is something like this:
char c = 'c';
int i = Character.codePointAt(String.valueOf(c), 0);
// testing
System.out.println(String.format("%c -> %d", c, i)); // c -> 99
You can create a simple loop to list the characters for a range of numeric (Unicode) values like this:
public class UTF8Characters {
public static void main(String[] args) {
for (int i = 12; i <= 999; i++) {
System.out.println(i +" - "+ (char)i);
}
}
}
There is an open source library, MgntUtils, that has a utility class StringUnicodeEncoderDecoder. That class provides static methods that convert any String into a Unicode sequence and vice versa. Very simple and useful. To convert a String you just do:
String codes = StringUnicodeEncoderDecoder.encodeStringToUnicodeSequence(myString);
For example a String "Hello World" will be converted into
"\u0048\u0065\u006c\u006c\u006f\u0020
\u0057\u006f\u0072\u006c\u0064"
It works with any language. Here is the link to the article that explains all the details about the library: MgntUtils. Look for the subtitle "String Unicode converter". The article gives you a link to Maven Central, where you can get the artifacts, and to GitHub, where you can get the project itself. The library comes with well-written javadoc and source code.
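For comparison, the same kind of output can be produced with plain JDK calls. This is a sketch only, not the library's implementation, and the class and method names are hypothetical:

```java
public class UnicodeEscapeDemo {
    // Write each UTF-16 code unit as a backslash, the letter u, and four hex digits.
    static String toEscapes(String text) {
        StringBuilder sb = new StringBuilder(text.length() * 6);
        for (int i = 0; i < text.length(); i++) {
            sb.append(String.format("\\u%04x", (int) text.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints the escape sequence for each character of the input.
        System.out.println(toEscapes("Hello World"));
    }
}
```

Because it works per code unit, a supplementary character comes out as two escapes (its surrogate pair), which is also how Java source code represents it.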