Convert UCS-2 HEX to UTF-16 BE in Java

I have a string containing SMS text encoded in UCS-2 format, received from a GSM modem.
I'm trying to convert it to UTF-16, but it doesn't work. Please tell me what I am doing wrong.
import java.nio.charset.StandardCharsets;
public class USC {
public static void main(String[] args) {
String hex = "0412044B0020043F043E043B044C043704430435044204350441044C002004420430044004380444043D044B043C0020043F043B0430043D043E043C0020002204110438002B002200200441002000300033002E00310032002E0032003000320031002E002004230442043E0447043D04380442044C002004430441043B043E04320438044F";
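// Problem: this encodes the hex digits themselves as UTF-16BE text instead of parsing them into bytes,
byte[] v = hex.getBytes(StandardCharsets.UTF_16BE);
// and new String(v) then decodes those bytes with the platform default charset.
String str = new String(v);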
System.out.println(str);
}
}
The online decoder at https://dencode.com/ decodes it correctly.

Try the following:
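import java.math.BigInteger;
import java.nio.charset.Charset;
// Caveat: toByteArray() drops leading 0x00 bytes and prepends a 0x00 sign byte when the first byte is >= 0x80; either breaks the 2-byte alignment.
BigInteger bi = new BigInteger(hex, 16);
byte[] a = bi.toByteArray();
System.out.println(new String(a, Charset.forName("UTF-16"))); // no BOM, so decoded as big endian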

String hex = "0412044B0020043F043E043B044C043704430435044204350441044C002004420430044004380444043D044B043C0020043F043B0430043D043E043C0020002204110438002B002200200441002000300033002E00310032002E0032003000320031002E002004230442043E0447043D04380442044C002004430441043B043E04320438044F";
int n = hex.length() / 4;
char[] chars = new char[n];
for (int i = 0; i < n; ++i) {
// each group of 4 hex digits is one 16-bit UCS-2 code unit
chars[i] = (char) Integer.parseInt(hex.substring(4*i, 4*i+4), 16);
}
String str = new String(chars);
System.out.println(str);
Each group of 4 hex chars forms one big-endian UCS-2 char, the same size as a Java char (2 bytes).
UTF-16 is a superset of UCS-2, which is its fixed-width subset. So converting from UCS-2 to UTF-16 needs no special treatment; the only question is whether the 2 bytes of a char are big endian or little endian.

With JDK17 or above you could also make use of HexFormat class:
String str = new String(HexFormat.of().parseHex(hex), StandardCharsets.UTF_16BE);
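On JDKs older than 17, where HexFormat is unavailable, you can parse the hex yourself and still hand the bytes to the UTF_16BE decoder. Unlike the BigInteger variant, this sketch keeps leading "00" bytes, so text starting with a Latin character such as "0041" stays aligned. The class and method names (Ucs2HexDecoder, hexToBytes, decodeUcs2Hex) are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class Ucs2HexDecoder {

    // Parses two hex digits per byte; leading "00" pairs are kept, unlike with BigInteger.
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits: " + hex.length());
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("invalid hex digit near index " + 2 * i);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    // UCS-2 big endian is a subset of UTF-16 big endian, so the UTF_16BE decoder handles it as-is.
    static String decodeUcs2Hex(String hex) {
        return new String(hexToBytes(hex), StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        // "0412044B" are the first two characters of the SMS from the question.
        System.out.println(decodeUcs2Hex("0412044B"));
        // A leading 00 byte survives here, where BigInteger.toByteArray() would drop it.
        System.out.println(decodeUcs2Hex("00410042")); // AB
    }
}
```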

Related

Convert hex to INT 32 and STRING Little Endian

First question:
Hex: F1620000
After converting the hex to a 32-bit int I expect 25329, but I get -245235712.
I've tried these methods:
Integer.parseUnsignedInt(value, 16);
(int) Long.parseLong(value, 16);
new BigInteger(value, 16).intValue();
How can I get the correct value?
Second question:
Hex: 9785908D9B9885828020912E208D2E
After converting the hex to a String I get this value:
\u0097\u0085\u0090\u008d\u009b\u0098\u0085\u0082\u0080 \u0091. \u008d.
How can I display this value correctly in JSON (using JSONObject)?
StringBuilder result = new StringBuilder();
for (int i = 0; i < value.length(); i += 2) {
String str = value.substring(i, i + 2);
result.append((char)Integer.parseInt(str, 16));
}
All your attempts are sufficient for parsing a hexadecimal string in an unsigned interpretation, but they did not account for the little-endian representation. You have to reverse the byte order manually:
String value = "F1620000";
int i = Integer.reverseBytes(Integer.parseUnsignedInt(value, 16));
System.out.println(i);
25329
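As an alternative to Integer.reverseBytes, java.nio can do the byte swapping for you: wrap the parsed bytes in a ByteBuffer and set its order to little endian. This sketch assumes the hex string holds at least 4 bytes. The class and method names are hypothetical. The same buffer also offers getShort and getLong if the record contains other field sizes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class LittleEndianHex {

    // Parses hex digit pairs into bytes in the order they appear in the string.
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Reads the first 4 bytes as a little-endian 32-bit int.
    static int readIntLE(String hex) {
        return ByteBuffer.wrap(hexToBytes(hex)).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    public static void main(String[] args) {
        System.out.println(readIntLE("F1620000")); // 25329
    }
}
```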
For your second task, the missing information was how to interpret the bytes to get to the character content. After searching a bit, the Codepage 866 seems to be the most plausible encoding:
String value = "9785908D9B9885828020912E208D2E";
byte[] bytes = new BigInteger(value, 16).toByteArray();
String result = new String(bytes, bytes[0]==0? 1: 0, value.length()/2, "CP866");
ЧЕРНЫШЕВА С. Н.
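The bytes[0]==0 check above works around the sign byte that BigInteger.toByteArray() may add. Parsing the hex pair by pair avoids that problem, and it also avoids the opposite case where BigInteger drops leading 00 bytes. This sketch assumes, as the answer does, that the data is code page 866. IBM866 is the JDK's canonical name for the charset that the answer calls CP866. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;

public class Cp866Hex {

    // One byte per pair of hex digits: no sign byte added, no leading zero bytes lost.
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Assumes the source data is DOS Cyrillic (code page 866).
    static String decodeCp866(String hex) {
        return new String(hexToBytes(hex), Charset.forName("IBM866"));
    }

    public static void main(String[] args) {
        System.out.println(decodeCp866("9785908D9B9885828020912E208D2E"));
    }
}
```

Once the text is a proper Java String, JSONObject can store it like any other string value.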

Encoding byte array to Hex string

I have come across a legacy piece of code, in production, that encodes a byte array to a hex string and has never caused an issue.
The code is used as follows:
We encrypt a user password. The encryptor returns a byte[].
We convert the byte[] to a hex String using this encoder and then use that String in our properties file and so on.
However, yesterday we hit a password whose encrypted byte[] is encoded incorrectly.
import java.math.BigInteger;
import java.util.HashMap;
import org.apache.commons.codec.DecoderException;
import org.apache.commons.codec.binary.Hex;
public class ByteArrayToHexEncoder {
public static void main(String[] args) throws DecoderException {
String hexString = "a0d21588c0a2c2fc68dc859197fc78cd"; // correct hex representation
// equivalent byte array: this is the byte array returned by the encryptor
byte[] byteArray = Hex.decodeHex(hexString.toCharArray());
// legacy encoder
System.out.println("Legacy code encodes as: " + encodeHexBytesWithPadding(byteArray));
// commons-codec encoder
System.out.println("Commons codec encode as: " + new String(Hex.encodeHex(byteArray)));
}
private static final String PADDING_ZEROS =
"0000000000000000000000000000000000000000000000000000000000000";
private static final HashMap<Integer, Character> MAP_OF_HEX = new HashMap<>();
static {
MAP_OF_HEX.put(0, '0');
MAP_OF_HEX.put(1, '1');
MAP_OF_HEX.put(2, '2');
MAP_OF_HEX.put(3, '3');
MAP_OF_HEX.put(4, '4');
MAP_OF_HEX.put(5, '5');
MAP_OF_HEX.put(6, '6');
MAP_OF_HEX.put(7, '7');
MAP_OF_HEX.put(8, '8');
MAP_OF_HEX.put(9, '9');
MAP_OF_HEX.put(10, 'a');
MAP_OF_HEX.put(11, 'b');
MAP_OF_HEX.put(12, 'c');
MAP_OF_HEX.put(13, 'd');
MAP_OF_HEX.put(14, 'e');
MAP_OF_HEX.put(15, 'f');
}
public static String encodeHexBytesWithPadding(byte[] inputByteArray) {
String encodedValue = encodeHexBytes(inputByteArray);
int expectedSize = inputByteArray.length * 2;
if (encodedValue.length() < expectedSize) {
int zerosToPad = expectedSize - encodedValue.length();
encodedValue = PADDING_ZEROS.substring(0, zerosToPad) + encodedValue;
}
return encodedValue;
}
public static String encodeHexBytes(byte[] inputByteArray) {
String encodedValue;
if (inputByteArray[0] < 0) {
// Something is wrong here! Don't know what!
byte oldValue = inputByteArray[0];
inputByteArray[0] = (byte) (oldValue & 0x0F);
int nibble = (oldValue >> 4) & 0x0F;
encodedValue = new BigInteger(inputByteArray).toString(16);
inputByteArray[0] = oldValue;
encodedValue = MAP_OF_HEX.get(nibble) + encodedValue;
} else {
encodedValue = new BigInteger(inputByteArray).toString(16);
}
return encodedValue;
}
}
The legacy code outputs the encoded value as: 0ad21588c0a2c2fc68dc859197fc78cd while the correct expected value should be: a0d21588c0a2c2fc68dc859197fc78cd.
I am trying to understand what's wrong with the encoder and need some help understanding.
The BigInteger(byte[]) constructor interprets the bytes as a two's complement number, where the most significant bit also denotes the sign. The commons-codec Hex class simply translates each byte into its hex representation; the most significant bit has no special meaning.
Your legacy code's if (inputByteArray[0] < 0) branch modifies the first byte of the input, probably to work around the two's complement representation of negative numbers (e.g. -1 being represented as ff). Unfortunately, this is implemented incorrectly. When the low nibble of the first byte is 0, the cleared first byte becomes 0x00, and BigInteger drops it together with any following 0x00 bytes. Those zero digits are lost, and the padding is then added at the front of the string instead of where the digits were removed:
String input = "a000000001";
byte[] bytes = Hex.decodeHex(input.toCharArray());
System.out.println(encodeHexBytesWithPadding(bytes));
System.out.println(Hex.encodeHexString(bytes));
will print
00000000a1
a000000001
showing that the legacy code values are completely wrong.
There is not much to salvage here IMO; use Hex.encodeHexString() instead, or check the other options discussed in this question.
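To see both BigInteger pitfalls in isolation, and to see what a plain per-byte encoder does instead, here is a small sketch. HexEncoding and toHex are hypothetical names. The encoder writes exactly two lowercase digits per byte, which is the same output format as Hex.encodeHexString.

```java
import java.math.BigInteger;

public class HexEncoding {

    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    // Encodes every byte as exactly two lowercase hex digits; the sign bit has no special meaning.
    static String toHex(byte[] bytes) {
        char[] out = new char[bytes.length * 2];
        for (int i = 0; i < bytes.length; i++) {
            int v = bytes[i] & 0xFF;            // treat the byte as unsigned 0..255
            out[2 * i] = DIGITS[v >>> 4];       // high nibble
            out[2 * i + 1] = DIGITS[v & 0x0F];  // low nibble
        }
        return new String(out);
    }

    public static void main(String[] args) {
        byte[] negative = {(byte) 0xa0, 0x01};
        byte[] leadingZero = {0x00, 0x0f};
        // BigInteger(byte[]) reads two's complement, so a0 01 is a negative number...
        System.out.println(new BigInteger(negative).toString(16));    // -5fff
        System.out.println(toHex(negative));                          // a001
        // ...and toString(16) never prints leading zero bytes.
        System.out.println(new BigInteger(leadingZero).toString(16)); // f
        System.out.println(toHex(leadingZero));                       // 000f
    }
}
```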

Lowercase hexits. .NET and Java equivalent

A trading partner has asked me to send an HMAC SHA1 hash as lowercase hexits. The only reference I can find to them is in relation to PHP. I can do the hashing in .NET and Java, but how do I output "lowercase hexits" with them? Lowercase hexits don't appear to be equivalent to Base64.
Ah! I love simplicity. Here's the solution.
Public Shared Function Encrypt(ByVal plainText As String, ByVal preSharedKey As String) As String
Dim preSharedKeyBytes() As Byte = Encoding.UTF8.GetBytes(preSharedKey)
Dim plainTextBytes As Byte() = Encoding.UTF8.GetBytes(plainText)
Dim hmac = New HMACSHA1(preSharedKeyBytes)
Dim cipherTextBytes As Byte() = hmac.ComputeHash(plainTextBytes)
Dim strTemp As New StringBuilder(cipherTextBytes.Length * 2)
For Each b As Byte In cipherTextBytes
strTemp.Append(Conversion.Hex(b).PadLeft(2, "0"c).ToLower)
Next
Dim cipherText As String = strTemp.ToString
Return cipherText
End Function
This is compatible with the PHP hash_hmac function with FALSE in the raw_output parameter.
For lowercase hex digits (hexits) use:
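import java.math.BigInteger;
public static String toHex(byte[] bytes) {
// signum 1 treats the bytes as an unsigned magnitude; %0Nx zero-pads to 2 hex digits per byte
BigInteger bi = new BigInteger(1, bytes);
return String.format("%0" + (bytes.length << 1) + "x", bi);
}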
From related question:
In Java, how do I convert a byte array to a string of hex digits while keeping leading zeros?
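For the Java side of the question, here is a sketch of the whole round trip: compute HMAC-SHA1 with the standard javax.crypto.Mac API and render the digest as lowercase hex. It assumes, as the VB.NET answer does, that both key and message are UTF-8 encoded. HmacHex and hmacSha1Hex are hypothetical names. The usage line uses the HMAC-SHA1 test vector from RFC 2202 (test case 2).

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class HmacHex {

    // HMAC-SHA1 over the UTF-8 bytes of message, returned as lowercase hex (2 digits per byte).
    static String hmacSha1Hex(String key, String message) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(key.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
            byte[] digest = mac.doFinal(message.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xFF)); // lowercase, zero-padded
            }
            return sb.toString();
        } catch (GeneralSecurityException e) {
            // HmacSHA1 is a required algorithm on every Java SE platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // RFC 2202 test case 2
        System.out.println(hmacSha1Hex("Jefe", "what do ya want for nothing?"));
        // effcdf6ae5eb2fa2d27416d5f184df9c259a7c79
    }
}
```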
Here's a C# translation of sedge's solution:
private static String toHex(byte[] cipherTextBytes)
{
var strTemp = new StringBuilder(cipherTextBytes.Length * 2);
foreach(Byte b in cipherTextBytes)
{
strTemp.Append(Microsoft.VisualBasic.Conversion.Hex(b).PadLeft(2, '0').ToLower());
}
String cipherText = strTemp.ToString();
return cipherText;
}

Java String HEX to String ASCII with accentuation [duplicate]

This question already has answers here:
Converting UTF-8 to ISO-8859-1 in Java - how to keep it as single byte
I have the string String hex = "6174656ec3a7c3a36f"; and I want to get String output = "atenção", but in my test I only get String output = "aten????o";
What am I doing wrong?
String hex = "6174656ec3a7c3a36f";
StringBuilder output = new StringBuilder();
for (int i = 0; i < hex.length(); i+=2) {
String str = hex.substring(i, i+2);
output.append((char)Integer.parseInt(str, 16));
}
System.out.println(output); //here is the output "aten????o"
Consider
String hex = "6174656ec3a7c3a36f"; // AAA
ByteBuffer buff = ByteBuffer.allocate(hex.length()/2);
for (int i = 0; i < hex.length(); i+=2) {
buff.put((byte)Integer.parseInt(hex.substring(i, i+2), 16));
}
buff.rewind();
Charset cs = Charset.forName("UTF-8"); // BBB
CharBuffer cb = cs.decode(buff); // BBB
System.out.println(cb.toString()); // CCC
Which prints: atenção
Basically, your hex string is the hexadecimal encoding of the bytes that represent the characters of the string atenção when encoded in UTF-8.
To decode:
You first have to go from your hex string to bytes (AAA)
Then go from bytes to chars (BBB) -- this is dependent on the encoding, in your case UTF-8.
Then go from chars to a string (CCC)
Your hex string appears to denote a UTF-8 string, rather than ISO-8859-1.
The reason I can say this is that if it were ISO-8859-1, you'd have two hex digits per character. Your hex string has 18 hex digits (9 bytes), but your expected output is only 7 characters, which in ISO-8859-1 would take just 14 hex digits. Hence, the hex string must use a variable-width encoding, not a single byte per character like ISO-8859-1.
The following program produces the output: atenção
String hex = "6174656ec3a7c3a36f";
ByteArrayOutputStream baos = new ByteArrayOutputStream();
for (int i = 0; i < hex.length(); i += 2) {
String str = hex.substring(i, i + 2);
int byteVal = Integer.parseInt(str, 16);
baos.write(byteVal);
}
String s = new String(baos.toByteArray(), Charset.forName("UTF-8"));
System.out.println(s);
If you change UTF-8 to ISO-8859-1, you'll see: atenÃ§Ã£o.
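The byte-count argument above can be checked directly. This sketch decodes the same 9 bytes once as UTF-8 and once as ISO-8859-1 and compares the resulting lengths. Utf8VsLatin1, hexToBytes and decode are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8VsLatin1 {

    // Parses hex digit pairs into bytes.
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    static String decode(String hex, Charset charset) {
        return new String(hexToBytes(hex), charset);
    }

    public static void main(String[] args) {
        String hex = "6174656ec3a7c3a36f";
        String utf8 = decode(hex, StandardCharsets.UTF_8);
        String latin1 = decode(hex, StandardCharsets.ISO_8859_1);
        System.out.println(hexToBytes(hex).length + " bytes");             // 9 bytes
        // UTF-8: ç and ã take two bytes each, so 9 bytes become 7 chars.
        System.out.println(utf8 + " (" + utf8.length() + " chars)");
        // ISO-8859-1: every byte becomes its own char, giving 9 chars of mojibake.
        System.out.println(latin1 + " (" + latin1.length() + " chars)");
    }
}
```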
Java Strings are Unicode: each char is a 16-bit UTF-16 code unit. Your string is, I suppose, a "C" string (a sequence of bytes). You have to know the name of its character encoding and use a CharsetDecoder.
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
public class Char8859_1Decoder {
public static void main( String[] args ) throws CharacterCodingException {
String hex = "6174656ec3a7c3a36f";
int len = hex.length();
byte[] cStr = new byte[len/2];
for( int i = 0; i < len; i+=2 ) {
cStr[i/2] = (byte)Integer.parseInt( hex.substring( i, i+2 ), 16 );
}
CharsetDecoder decoder = Charset.forName( "UTF-8" ).newDecoder();
CharBuffer cb = decoder.decode( ByteBuffer.wrap( cStr ));
System.out.println( cb.toString());
}
}
In UTF-8, ç and ã each take two bytes (c3 a7 and c3 a3), so they are not represented by a single byte as your decode routine assumes.
Instead of converting each byte to a char, I would collect the parsed values into a byte[] and then decode the whole array into a String, leaving Java the dull task of decoding.
Of course, if you do not pass a charset, Java uses the platform default, which may be wrong, so you might have to know ahead of time what the encoding is, as per the answers given by @Aubin or @Martin Ellis.
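If you are not sure the data really is UTF-8, you can make a wrong assumption fail loudly instead of silently producing replacement characters. This sketch configures a UTF-8 CharsetDecoder with CodingErrorAction.REPORT and returns an empty Optional when the bytes are not valid UTF-8. StrictUtf8 and decodeUtf8Strict are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

public class StrictUtf8 {

    // Returns the decoded text, or empty if the bytes are not valid UTF-8.
    // REPORT makes the decoder throw instead of inserting U+FFFD.
    static Optional<String> decodeUtf8Strict(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return Optional.of(decoder.decode(ByteBuffer.wrap(bytes)).toString());
        } catch (CharacterCodingException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = {0x61, (byte) 0xc3, (byte) 0xa7};  // "aç" encoded as UTF-8
        byte[] latin1 = {0x61, (byte) 0xe7};             // "aç" encoded as ISO-8859-1
        System.out.println(decodeUtf8Strict(utf8).isPresent());   // true
        System.out.println(decodeUtf8Strict(latin1).isPresent()); // false
    }
}
```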

Convert "php unicode" to character

How can I convert so-called "php unicode" to a normal character in Java? Example: \xEF\xBC\xA1 -> Ａ. Are there any built-in methods in the JDK, or should I use a regex for this conversion?
You first need to get the bytes out of the string into a byte-array without changing them and then decode the byte-array as a UTF-8 string.
The simplest way to get the string into a byte array is to encode it using ISO-8859-1, which maps every character with a Unicode value less than 256 to a byte with the same value (or the equivalent negative value, since Java bytes are signed).
String phpUnicode = "\u00EF\u00BC\u00A1";
byte[] bytes = phpUnicode.getBytes("ISO-8859-1"); // maps to bytes with the same ordinal value
String javaString = new String(bytes, "UTF-8");
System.out.println(javaString);
Edit
The above converts the UTF-8 to the Unicode character. If you then want to convert it to a reasonable ASCII equivalent, there's no standard way of doing that: but see this question
Edit
I assumed that you had a string containing characters that had the same ordinal value as the UTF-8 sequence but you indicate that your string literally contains the escape sequence, as in:
String phpUnicode = "\\xEF\\xBC\\xA1";
The JDK doesn't have any built-in methods to convert Strings like this so you'll need to use your own regex. Since we ultimately want to convert a utf-8 byte-sequence into a String, we need to set up a byte-array, using maybe:
Pattern oneChar = Pattern.compile("\\\\x([0-9A-F]{2})|(.)", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher matcher = oneChar.matcher(phpUnicode);
ByteArrayOutputStream bytes = new ByteArrayOutputStream();
while (matcher.find()) {
int ch;
if (matcher.group(1) == null) {
ch = matcher.group(2).charAt(0);
}
else {
ch = Integer.parseInt(matcher.group(1), 16);
}
bytes.write((int) ch);
}
String javaString = new String(bytes.toByteArray(), "UTF-8");
System.out.println(javaString);
This will generate a UTF-8 stream by converting \xAB sequences. This UTF-8 stream is then converted to a Java string. It's important to note that any character that is not part of an escape sequence will be converted to a byte equal to the low-order 8 bits of the Unicode character. This works fine for ASCII but can cause transcoding problems for non-ASCII characters.
@McDowell:
The sequence:
String phpUnicode = "\u00EF\u00BC\u00A1";
byte[] bytes = phpUnicode.getBytes("ISO-8859-1");
creates a byte array containing as many bytes as the original string has characters and for each character with a unicode value below 256, the same numeric value is stored in the byte-array.
The character FULLWIDTH LATIN CAPITAL LETTER A (U+FF21) is not present in the original String, so the fact that it is not in ISO-8859-1 is irrelevant.
I know that transcoding bugs can occur when you convert characters to bytes that's why I said that ISO-8859-1 would only "map every character with a unicode value less than 256 to a byte with the same value"
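The behaviour discussed in this exchange can be checked in a few lines. Characters below 256 survive the ISO-8859-1 encoding unchanged. A character above 255, such as U+FF21, has no ISO-8859-1 byte, and String.getBytes substitutes the charset's replacement byte '?' (0x3F). Latin1Bytes and latin1 are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1Bytes {

    // ISO-8859-1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF.
    static byte[] latin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] bytes = latin1("\u00EF\u00BC\u00A1");
        // Same ordinal values, shown as signed Java bytes: EF BC A1.
        System.out.println(Arrays.toString(bytes)); // [-17, -68, -95]
        // Read as UTF-8, those bytes are FULLWIDTH LATIN CAPITAL LETTER A.
        System.out.println(new String(bytes, StandardCharsets.UTF_8).equals("\uFF21")); // true
        // U+FF21 itself has no ISO-8859-1 byte and is replaced by '?'.
        System.out.println(Arrays.toString(latin1("\uFF21"))); // [63]
    }
}
```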
The character in question is U+FF21 (FULLWIDTH LATIN CAPITAL LETTER A). The PHP form (\xEF\xBC\xA1) is a UTF-8 encoded octet sequence.
In order to decode this sequence to a Java String (which is always UTF-16), you would use the following code:
// \xEF\xBC\xA1
byte[] utf8 = { (byte) 0xEF, (byte) 0xBC, (byte) 0xA1 };
String utf16 = new String(utf8, Charset.forName("UTF-8"));
// print the char as hex
for(char ch : utf16.toCharArray()) {
System.out.format("%02x%n", (int) ch);
}
If you want to decode the data from a string literal you could use code of this form:
public static void main(String[] args) {
String utf16 = transformString("This is \\xEF\\xBC\\xA1 string");
for (char ch : utf16.toCharArray()) {
System.out.format("%s %02x%n", ch, (int) ch);
}
}
// \p{XDigit} matches only 0-9, a-f and A-F, so Integer.parseInt in toByteArray cannot fail
private static final Pattern SEQ
= Pattern.compile("(\\\\x\\p{XDigit}\\p{XDigit})+");
private static String transformString(String encoded) {
StringBuilder decoded = new StringBuilder();
Matcher matcher = SEQ.matcher(encoded);
int last = 0;
while (matcher.find()) {
decoded.append(encoded.substring(last, matcher.start()));
byte[] utf8 = toByteArray(encoded.substring(matcher.start(), matcher.end()));
decoded.append(new String(utf8, Charset.forName("UTF-8")));
last = matcher.end();
}
return decoded.append(encoded.substring(last, encoded.length())).toString();
}
private static byte[] toByteArray(String hexSequence) {
byte[] utf8 = new byte[hexSequence.length() / 4];
for (int i = 0; i < utf8.length; i++) {
int offset = i * 4;
String hex = hexSequence.substring(offset + 2, offset + 4);
utf8[i] = (byte) Integer.parseInt(hex, 16);
}
return utf8;
}
