How to convert a binary in surrogate pairs to unicode?

Can anyone help me figure out a surrogate-pair problem?
The source binary is #{EDA0BDEDB883} (encoded by Hessian/Java). How do I decode it to 😃, i.e. "^(1F603)"?
I have checked the UTF-16 Wikipedia article, but it only told me half the story.
My problem is how to convert #{EDA0BDEDB883} to \ud83d and \ude03.
My aim is to rewrite the Python 3 program below in Rebol or Red, just parsing the binary data without any libraries.
This is how Python 3 does it:
def _decode_surrogate_pair(c1, c2):
    """
    Python 3 no longer decodes surrogate pairs for us; we have to do it
    ourselves.
    """
    # print(c1.encode('utf-8')) # \ud83d
    # print(c2.encode('utf-8')) # \ude03
    if not('\uD800' <= c1 <= '\uDBFF') or not ('\uDC00' <= c2 <= '\uDFFF'):
        raise Exception("Invalid UTF-16 surrogate pair")
    code = 0x10000
    code += (ord(c1) & 0x03FF) << 10
    code += (ord(c2) & 0x03FF)
    return chr(code)

def _decode_byte_array(bytes):
    s = ''
    while(len(bytes)):
        b, bytes = bytes[0], bytes[1:]
        c = b.decode('utf-8', 'surrogatepass')
        if '\uD800' <= c <= '\uDBFF':
            b, bytes = bytes[0], bytes[1:]
            c2 = b.decode('utf-8', 'surrogatepass')
            c = _decode_surrogate_pair(c, c2)
        s += c
    return s

bytes = [b'\xed\xa0\xbd', b'\xed\xb8\x83']
print(_decode_byte_array(bytes))
public static void main(String[] args) throws Exception {
    // "😃" is "\uD83D\uDE03" in UTF-16
    final byte[] bytes1 = "😃".getBytes(StandardCharsets.UTF_16);
    // Prints [-2, -1, -40, 61, -34, 3]: the byte-order mark FE FF, then D8 3D DE 03, shown as signed bytes
    // #{FFFFFFFE} #{FFFFFFFF} #{FFFFFFD8} #{0000003D} #{FFFFFFDE} #{00000003}
    System.out.println(Arrays.toString(bytes1));
}
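The bytes #{EDA0BDEDB883} are not standard UTF-8: each surrogate has been encoded on its own as a three-byte sequence (the scheme usually called CESU-8). A minimal sketch of the manual decoding follows, assuming the input consists only of well-formed three-byte sequences. The class and method names are made up for illustration, and the same bit operations should carry over to Rebol or Red.

```java
final class SurrogateDecoder {
    // Decode one 3-byte sequence 1110xxxx 10xxxxxx 10xxxxxx into a 16-bit code unit.
    static int decodeThreeBytes(byte[] data, int offset) {
        int b0 = data[offset] & 0xFF;
        int b1 = data[offset + 1] & 0xFF;
        int b2 = data[offset + 2] & 0xFF;
        if ((b0 & 0xF0) != 0xE0 || (b1 & 0xC0) != 0x80 || (b2 & 0xC0) != 0x80) {
            throw new IllegalArgumentException("Not a 3-byte sequence at offset " + offset);
        }
        return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F);
    }

    // Combine a high and a low surrogate into one code point.
    static int combineSurrogates(int high, int low) {
        if (high < 0xD800 || high > 0xDBFF || low < 0xDC00 || low > 0xDFFF) {
            throw new IllegalArgumentException("Invalid UTF-16 surrogate pair");
        }
        return 0x10000 + ((high & 0x03FF) << 10) + (low & 0x03FF);
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xED, (byte) 0xA0, (byte) 0xBD, (byte) 0xED, (byte) 0xB8, (byte) 0x83};
        int high = decodeThreeBytes(data, 0); // 0xD83D
        int low = decodeThreeBytes(data, 3);  // 0xDE03
        int codePoint = combineSurrogates(high, low);
        System.out.println(Integer.toHexString(codePoint).toUpperCase()); // 1F603
    }
}
```

The first step strips the UTF-8 framing bits (4 + 6 + 6 payload bits per sequence); the second is the same arithmetic as _decode_surrogate_pair in the Python code.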

Related

Java equivalent of the HEXTORAW function in Oracle

Hi, I want the Java equivalent of Oracle's HEXTORAW function. In Oracle you can use:
SELECT hextoraw(to_char(ascii('z'))) from DUAL;
which generates 0122, or
SELECT hextoraw(to_char(ascii('C'))) from DUAL;
which generates 67.
None of the methods I found in Java add a leading 0 to ASCII codes above 99. How can I solve this?
These are two of the methods I tried:
String key="zs01C";
IntStream intStream = key.codePoints();
String rawKey = intStream.mapToObj(c->Integer.toString(c)).collect(Collectors.joining());
System.out.println("Method1: "+rawKey);
intStream = key.codePoints();
String rawKey2 = intStream.mapToObj(c->String.format("%2s", c).replace(' ', '0')).collect(Collectors.joining());
System.out.println("Method2: "+rawKey2);

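A hex string is usually twice the length of the byte array it encodes, so you can try this:
String hex = "BC0800AF";
byte[] raw = new byte[hex.length() / 2];
for (int i = 0; i < raw.length; i++) {
    // Each byte comes from two hex digits: high nibble, then low nibble
    int hi = Character.digit(hex.charAt(2 * i), 16);
    int lo = Character.digit(hex.charAt(2 * i + 1), 16);
    raw[i] = (byte) ((hi << 4) | lo);
}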
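The outputs quoted in the question (0122 for 'z', 67 for 'C') are consistent with each decimal code being left-padded to an even number of digits. A hedged sketch of that rule follows, assuming this padding is all that is wanted; RawKey and toRawKey are hypothetical names.

```java
import java.util.stream.Collectors;

final class RawKey {
    // Left-pad each character's decimal code with one '0' when its length is odd.
    static String toRawKey(String key) {
        return key.codePoints()
                .mapToObj(c -> Integer.toString(c))
                .map(s -> s.length() % 2 == 0 ? s : "0" + s)
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        System.out.println(toRawKey("z"));     // 0122
        System.out.println(toRawKey("C"));     // 67
        System.out.println(toRawKey("zs01C")); // 01220115484967
    }
}
```

Method2 in the question pads to a fixed width of 2, which leaves three-digit codes such as 122 untouched; padding to the next even length covers both cases.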

iOS and Android AES Encryption (No UINT in Java)

All,
I am new to encryption, so I'm not sure what information I need to share to get help, but I'll edit this question as I learn more about how to ask it well :)
I am performing AES encryption in both an iOS and an Android app that communicate over Bluetooth with a device. I am using AES in CTR mode, and it is fully implemented and functional on iOS. The problem I'm running into is that when I convert items such as my IV to a byte array, Java bytes are signed and Swift bytes are unsigned, so while I can encrypt and decrypt my string in Java, the result differs from what I see on iOS.
How are other people dealing with this unsigned int issue? I feel like there's got to be some straightforward thing I'm doing wrong. I'm really not sure what code to post. For Android I'm using hex-string-to-byte conversion functions I found here on Stack Overflow, and they are working correctly... they're just signed instead of unsigned, so the values are different from the unsigned byte arrays in iOS.
iOS Implementation:
let aesPrivateKey = "********************************"
print("MacAddress:-> \(macAddress)")
var index = 0
let aesPrivateKeyStartIndex = aesPrivateKey.startIndex
let macAddressStartIndex = macAddress.startIndex
//Perform an XOR to get the device key
var deviceKeyArray: Array<Character> = Array(repeating: "?", count: 32)
for _ in macAddress {
    let nextPrivateKeyIndex = aesPrivateKey.index(aesPrivateKeyStartIndex, offsetBy: index)
    let nextMacAddressIndex = macAddress.index(macAddressStartIndex, offsetBy: index)
    let nextPrivateKeyString = String(aesPrivateKey[nextPrivateKeyIndex])
    let nextMacAddressString = String(macAddress[nextMacAddressIndex])
    let nextPrivateKeyByte = Int(nextPrivateKeyString, radix: 16)
    let nextMacAddressByte = Int(nextMacAddressString, radix: 16)
    let nextCombinedByte = nextPrivateKeyByte! ^ nextMacAddressByte!
    let nextCombinedString = nextCombinedByte.hexString
    deviceKeyArray[index] = nextCombinedString[nextCombinedString.index(nextCombinedString.startIndex, offsetBy: 1)]
    index += 1
}
while index < 32 {
    let nextPrivateKeyIndex = aesPrivateKey.index(aesPrivateKeyStartIndex, offsetBy: index)
    deviceKeyArray[index] = aesPrivateKey[nextPrivateKeyIndex]
    index += 1
}
//Convert the device key to a byte array
let deviceKey = "0x" + String(deviceKeyArray)
let deviceKeyByte = Array<UInt8>(hex: deviceKey)
//Convert the password to a byte array
let passwordByte : Array<UInt8> = password.bytes
//Convert the initialization vector to a byte array
let aesIVHex = "0x" + AESIV
let aesIVByte = Array<UInt8>(hex: aesIVHex)
//Encrypt the password
var encrypted = [Unicode.UTF8.CodeUnit]()
do {
    encrypted = try AES(key: deviceKeyByte, blockMode: CTR(iv: aesIVByte)).encrypt(passwordByte)
} catch {
    print(error)
}
print("The Encrypted Password Data: \(encrypted)")
let encryptedData = encrypted.toHexString()
//Write password to bluetooth and check result
UserDefaultUtils.setObject(encryptedData as AnyObject, key: userDefaults.password)
DeviceLockManager.shared().isEncrypted = false
DeviceLockManager.share().setVerifyPasswordForDevice(isGunboxDevice:true)
Android implementation:
System.out.println("ble_ Password:"+str_password+"\nble_ AesKey:"+aesDeviceKey+"\nble_ AesIV:"+aesIV);
byte[] encryptedData = encrypt(
        str_password.getBytes(),
        Utility.getInstance().hexStringToByteArray(aesDeviceKey),
        Utility.getInstance().hexStringToByteArray(aesIV));
String encryptedPassword = Utility.getInstance().bytesToHexString(encryptedData);
System.out.println("ble_ AES Encrypted password " + encryptedPassword);
byte[] decryptedData = decrypt(encryptedData, aesDeviceKey.getBytes(), aesIV.getBytes());
System.out.println("ble_ Cipher Decrypt:"+new String(decryptedData));
//Write password to bluetooth and check result
deviceManager.writePassword(encryptedPassword);
Utility.getInstance().sleep(100);
deviceManager.readPasswordResult();
All input values match exactly until I call the function hexStringToByteArray. At that point, the iOS byte arrays are unsigned and the Android byte arrays are signed.
Here is that function for reference:
public static byte[] hexStringToByteArray(String s) {
    byte[] b = new byte[s.length() / 2];
    for (int i = 0; i < b.length; i++) {
        int index = i * 2;
        int v = Integer.parseInt(s.substring(index, index + 2), 16);
        b[i] = (byte) v;
    }
    return b;
}
Sample IV byte array:
iOS:     43, 34, 95, 101, 57, 150, 75, 100, 250, 178, 194, 70, 253, 236, 92, 70
Android: 43, 34, 95, 101, 57, -106, 75, 100, -6, -78, -62, 70, -3, -20, 92, 70

You might notice a difference between the two printed arrays because Java by default displays a byte as a signed value. In reality the two arrays are equal. To make that clearer, here is a small table with the last 5 values of the example IV array you provided.
|----------------------------------------|
| hex | 46 | FD | EC | 5C | 46 |
|----------------------------------------|
| unsigned | 70 | 253 | 236 | 92 | 70 |
|----------------------------------------|
| signed | 70 | -3 | -20 | 92 | 70 |
|----------------------------------------|
So they are actually the same (bitwise), only printed differently because they are interpreted as different values. If you want to make sure things are correct, I would suggest looking at a few numbers with a calculator in programmer mode. Usually there is a way to set the byte/word length so you can play around with the signed vs. unsigned interpretation of the same hexadecimal value (there should also be a bit representation of the value).
As an alternative, I found a small website containing a signed vs. unsigned type-bit/hex converter, which will do the trick as well (make sure you select one of the char types, otherwise the signed values will be incorrect).
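The same check can be done in code. The sketch below assumes nothing beyond the table values above; ByteView and unsigned are illustrative names. It prints the last five IV bytes first as Java shows them and then reinterpreted as unsigned.

```java
final class ByteView {
    // Interpret a Java (signed) byte as an unsigned value in 0..255.
    static int unsigned(byte b) {
        return b & 0xFF; // same result as Byte.toUnsignedInt(b)
    }

    public static void main(String[] args) {
        // Last five IV bytes from the table: 46 FD EC 5C 46
        byte[] iv = {0x46, (byte) 0xFD, (byte) 0xEC, 0x5C, 0x46};
        StringBuilder signedView = new StringBuilder();
        StringBuilder unsignedView = new StringBuilder();
        for (byte b : iv) {
            signedView.append(b).append(' ');
            unsignedView.append(unsigned(b)).append(' ');
        }
        System.out.println(signedView.toString().trim());   // 70 -3 -20 92 70
        System.out.println(unsignedView.toString().trim()); // 70 253 236 92 70
    }
}
```

The bits passed to the cipher are identical in both views; only the printed interpretation differs.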
So in the IV-bytes part of the code there shouldn't be any problem. There might be one, however, where you create your String using only a byte array as the parameter, i.e.:
byte[] decryptedData = decrypt(encryptedData, aesDeviceKey.getBytes(), aesIV.getBytes());
System.out.println("ble_ Cipher Decrypt:" + new String(decryptedData));
This is because the default Charset is most likely not UTF-8 (you can check by calling Charset#defaultCharset and inspecting its value). The alternative would be:
new String(decryptedData, StandardCharsets.UTF_8)
or:
new String(decryptedData, "UTF-8");

Unable to call a .dll function correctly

I am calling a function in a shared .dll from Java using JNA. According to the documentation, the function is declared for Visual C++ as follows:
PMSifEncodeKcdLcl(PCHAR ff, PCHAR Dta, BOOL Dbg, PCHAR szOpId, PCHAR szOpFirst, PCHAR szOpLast);
From the doc:
ff - A single ASCII character.
Dta - Points to a null-terminated string.
Dbg - a boolean flag
szOpId - points to a null-terminated string
szOpFirst - points to a null-terminated string
szOpLast - points to a null-terminated string
The string is built from a number of Data Fields. The format for each Data Field within the string is as follows:
RS FI data
RS = Record Separator.
Indicates the start of the Data Field. A single ASCII Record Separator [RS] character (hex 1E)
FI = Field Identifier - Indicates the type of data in the field. A single ASCII character.
data = the actual data. A number of ASCII characters, dependent on the Field Identifier. Sometimes the data is variable in length. The Record Separator of the following field indicates the end of a Data Field (or for the last field, the NULL character at the end of the string).
An Answer Code is returned in field ff. Answer Data (if any) is returned in field Dta
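To make the Data Field layout concrete, here is a small sketch that assembles such a string and turns it into null-terminated ASCII bytes. It assumes each field is passed as its Field Identifier immediately followed by its data (for example "R101"). DataFieldBuilder, build and toCString are hypothetical names, not part of the vendor API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class DataFieldBuilder {
    static final char RS = '\u001E'; // ASCII Record Separator (hex 1E)

    // Prefix every field (Field Identifier + data) with the Record Separator.
    static String build(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (String field : fields) {
            sb.append(RS).append(field);
        }
        return sb.toString();
    }

    // Null-terminated ASCII bytes, the layout a C function taking PCHAR expects.
    static byte[] toCString(String s) {
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        return Arrays.copyOf(ascii, ascii.length + 1); // the extra trailing byte is 0
    }

    public static void main(String[] args) {
        String dta = build("R101", "L101", "TSingle Room");
        byte[] raw = toCString(dta);
        System.out.println(raw.length);          // 24
        System.out.println(raw[0]);              // 30 (hex 1E)
        System.out.println(raw[raw.length - 1]); // 0
    }
}
```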
I have cross-checked the JNA documentation to confirm the field mappings, but still no success. After trying for days, I came up with the code below.
My Java code:
/* JNA interface class
 */
public class JNALocksInterface {
    public interface LockLibrary extends StdCallLibrary {
        LockLibrary INSTANCE = (LockLibrary) Native.loadLibrary("path_to_dll", LockLibrary.class);
        public void PMSifEncodeKcdLcl(byte[] ff, byte[] dta, boolean debug, String szOpid, String szOpFirst, String szOpLast);
    }
}
/*My Calling Class Code*/
JNALocksInterface.LockLibrary INSTANCE = JNALocksInterface.LockLibrary.INSTANCE;
String dta = "*R101*L101*TSingle Room*NMatu*FZachary*URegular Guest*D201805021347*O201805030111";
String ff = "A";
byte[] dataBytes = new byte[dta.length() + 1];
System.arraycopy(dta.getBytes("UTF-8"), 0, dataBytes, 0, dta.length());
dataBytes[dta.length()] = 0;
byte[] dtaByteArray = new byte[dta.length() + 1];
byte[] ffByteArray = ff.getBytes("UTF-8");
for (int i = 0; i < dataBytes.length; i++) {
    String s1 = String.format("%8s", Integer.toBinaryString(dataBytes[i] & 0xFF)).replace(' ', '0');
    // System.out.println(s1);
    if ((char) dataBytes[i] == '*') {
        dtaByteArray[i] = 30;
    } else {
        int val = Integer.parseInt(s1, 2);
        byte b = (byte) val;
        dtaByteArray[i] = b;
    }
}
byte[] commandCodeFinal = new byte[1];
for (int i = 0; i < ffByteArray.length; i++) {
    String s2 = String.format("%8s", Integer.toBinaryString(ffByteArray[i] & 0xFF)).replace(' ', '0');
    System.out.println(s2);
    int val = Integer.parseInt(s2, 2);
    byte b = (byte) val;
    commandCodeFinal[i] = b;
}
String userNameBytes = "test";
String userFirstNameBytes = "test";
String userLastNameBytes = "test";
INSTANCE.PMSifEncodeKcdLcl(commandCodeFinal, dtaByteArray, false, userNameBytes, userFirstNameBytes, userLastNameBytes);
I am getting a wrong response on field ff and dta as shown below.
FF Response >> :
DTA Response >> 0101IR101L101TSingle RoomNMatuFZacharyURegular GuestD201805021347O2018050
I am replacing "*" with the ASCII record separator.
Can someone show me how to correctly call the function using JNA? I've searched all over but still no success.

Solved it. I used the Unicode escape for the record separator (\u001e) together with JNA Memory objects, and it worked.
I was also running Windows 10 64-bit; after switching to Windows 7 32-bit, it worked.
I replaced the code with the snippet below:
String fieldSeparator = "\u001e";
String dataTest = fieldSeparator+"R101"+fieldSeparator+"TSingle Room"+fieldSeparator+"FShujaa"+fieldSeparator+"NMatoke"
        + fieldSeparator+"URegular Guest"+fieldSeparator+"D201805040842"+fieldSeparator+"O201805051245";
String dataTestPadded = org.apache.commons.lang.StringUtils.rightPad(dataTest,30,'0');
System.out.println("Padded string >> " + dataTestPadded);
String data = dataTest;
//getPayloadToSend(payLoadSample) + (char)00;
String commandCode = "A";
Memory commandCodeMemory = new Memory(commandCode.length()+1);
commandCodeMemory.setString(0, commandCode);
Memory dataMemory = new Memory(data.length()+1);
dataMemory.setString(0, data);
//dataMemory.setString(1, "0");
System.out.println("Registerring >> " + INSTANCE.PMSifRegister("42860149", "BatchClient")) ;
INSTANCE.PMSifEncodeKcdLcl(commandCodeMemory, dataMemory, false, "ZKMATU", "zACHARY", "tESTING");
System.out.println("FF Response >> " + commandCodeMemory.getString(0));
System.out.println("DTA Response >> " + dataMemory.getString(0));
INSTANCE.PMSifUnregister();

Xor a string that is uint16 or uint32

I am trying to port the following logic, which I wrote in Java, to Swift:
public String xorMessage(String message, String key) {
    try {
        if (message == null || key == null) return null;
        char[] keys = key.toCharArray();
        char[] mesg = message.toCharArray();
        int ml = mesg.length;
        int kl = keys.length;
        char[] newmsg = new char[ml];
        for (int i = 0; i < ml; i++) {
            newmsg[i] = (char)(mesg[i] ^ keys[i % kl]);
        }//for i
        return new String(newmsg);
    } catch (Exception e) {
        return null;
    }
}
This is how far I have got in Swift 3:
import UIKit
import Foundation
let t = "22-Jun-2017 12:30 pm"
let m = "message"
print(UInt8(t))
let a :[UInt8] = Array(t.utf8)
let v = m.characters.map{String ($0) }
print(v)
func encodeWithXorByte(key: UInt8 , Input : String) -> String {
    return String(bytes: Input.utf8.map{$0 ^ key}, encoding: String.Encoding.utf8) ?? ""
}
var ml :Int = Int( m.characters.count )
var kl :Int = Int (t.characters.count)
var f = [String]()
for i in 0..<ml{
    let key = a[i%kl]
    let input = v[i]
    f.append(String(bytes: input.utf8.map{$0 ^ key} , encoding : String.Encoding.utf8)!)
    // f.append(<#T##newElement: Character##Character#>)
    //m[i] = input.utf8.map{$0 ^ key}
}
I am trying to obtain a string (message) that has been XOR'ed with a key passed into the above function. But my Swift code is not working: it returns a character array and I want a string, and if I convert the character array to a string, the result does not show Unicode escapes such as \u{0001}.
Suppose I get following output :
["_", "W", "^", "9", "\u{14}", "\t", "H"]
and then I try to convert to string, I get this:
_W^9 H
I want :
_W^9\u{14}\tH
Please help.

There are different problems. First, if your intention is to print
"unprintable" characters in a string \u{} escaped then you can use
the .debugDescription method. Example:
let s = "a\u{04}\u{08}b"
print(s) // ab
print(s.debugDescription) // "a\u{04}\u{08}b"
Next, your Swift code converts the string to UTF-8, xor's the bytes
and then converts the result back to a String. That can easily fail
if the xor'ed byte sequence is not valid UTF-8.
The Java code operates on UTF-16 code units, so the equivalent Swift
code would be
func xorMessage(message: String, key: String) -> String {
    let keyChars = Array(key.utf16)
    let keyLen = keyChars.count
    let newMsg = message.utf16.enumerated().map { $1 ^ keyChars[$0 % keyLen] }
    return String(utf16CodeUnits: newMsg, count: newMsg.count)
}
Example:
let t = "22-Jun-2017 12:30 pm"
let m = "message"
let encrypted = xorMessage(message: m, key: t)
print(encrypted.debugDescription) // "_W^9\u{14}\tH"
Finally, even that can produce unexpected results unless you restrict
the input (key and message) to ASCII characters. Example:
let m = "😀"
print(Array(m.utf16).map { String($0, radix: 16)} ) // ["d83d", "de00"]
let t = "a€"
print(Array(t.utf16).map { String($0, radix: 16)} ) // ["61", "20ac"]
let e = xorMessage(message: m, key: t)
print(Array(e.utf16).map { String($0, radix: 16)} ) // ["fffd", "feac"]
let d = xorMessage(message: e, key: t)
print(Array(d.utf16).map { String($0, radix: 16)} ) // ["ff9c", "fffd"]
print(d) // ﾜ�
print(d == m) // false
The problem is that the xor'ing produces an invalid UTF-16 sequence
(an unbalanced surrogate pair), which is then replaced by the
"replacement character" U+FFFD.
I don't know how Java handles this, but Swift strings cannot contain
invalid Unicode scalar values, so the only solution would be to represent
the result as an [UInt16] array instead of a String.
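On the Java side, a String is a sequence of UTF-16 code units and may hold unpaired surrogates, so the XOR result round-trips as long as it stays a String or char[] and is not encoded to bytes in between. A minimal sketch using the same inputs as the example above, with the question's method rewritten as a static helper in a hypothetical XorDemo class:

```java
final class XorDemo {
    // XOR each UTF-16 code unit of the message with the matching key unit (the key repeats).
    static String xorMessage(String message, String key) {
        char[] out = new char[message.length()];
        for (int i = 0; i < out.length; i++) {
            out[i] = (char) (message.charAt(i) ^ key.charAt(i % key.length()));
        }
        return new String(out);
    }

    public static void main(String[] args) {
        String m = "\uD83D\uDE00"; // the emoji from the example, as two code units
        String t = "a\u20AC";      // 'a' followed by the euro sign
        String e = xorMessage(m, t);
        // The first unit, d85c, is a lone high surrogate; Java keeps it unchanged.
        System.out.println(Integer.toHexString(e.charAt(0)) + " " + Integer.toHexString(e.charAt(1))); // d85c feac
        System.out.println(xorMessage(e, t).equals(m)); // true
    }
}
```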

JIS X 0208 conversion: how to handle unified (merged) codepoints

I'm trying to convert Java characters to JIS X 0208 "x-JIS0208" encoding (or any compatible, like EUC-JP, but not Shift-JIS), but I want unified (merged) codepoints to be handled correctly.
For example, 高 is assigned to row 25 column 66 in this JISX0208 chart, and a look-alike character 髙, while classified as an unassigned codepoint, is merged with the former. I quote from wikipedia: "both the form [ ] (高) and the less common form with a ladder-like construction (髙) are subsumed into the same code point".
I tried this with the code below, and whatever encoding I try, I always get either an exception or the unassigned-character placeholder ? (either ASCII or full-width).
Is there a way, perhaps a different encoding or an entirely different way of converting, to make both of these characters return the same codepoint? Alternatively, is there an API to find such characters so I can merge them before converting?
static Charset charset1 = Charset.forName("x-JIS0208");
static Charset charset2 = Charset.forName("EUC-JP");
static Charset[] charsets = {charset1, charset2};
static CharBuffer in = CharBuffer.allocate(1);

public static void main(String[] args) throws Exception
{
    CharsetEncoder[] encoders = new CharsetEncoder[charsets.length];
    for (int i = 0; i < charsets.length; i++)
        encoders[i] = charsets[i].newEncoder();
    char[] testChars = {'　', 'Ａ', '？', '亜', '唖', '蔭', '高', '髙'};
    for (char ch : testChars)
    {
        System.out.print("'" + ch + "'\t(" + Integer.toHexString(ch) + ")\t=");
        for (int i = 0; i < charsets.length; i++)
        {
            System.out.print("\t" + interpret(encode1(encoders[i], ch)));
            System.out.print("\t" + interpret(encode2(charsets[i], ch)));
        }
        System.out.println();
    }
}

private static String interpret(int i)
{
    if (i == -1)
        return "excepti";
    if (i < 0x80)
        return "'" + (char)i + "'";
    return Integer.toHexString(i);
}

private static int encode1(CharsetEncoder encoder, char ch)
{
    in.rewind();
    in.put(ch);
    in.rewind();
    try
    {
        ByteBuffer out = encoder.encode(in);
        if (out.limit() == 1)
            return out.get(0) & 0xFF;
        return out.get(1) & 0xFF | (out.get(0) & 0xFF) << 8;
    }
    catch (CharacterCodingException e)
    {
        return -1;
    }
}

private static int encode2(Charset charset, char ch)
{
    in.rewind();
    in.put(ch);
    in.rewind();
    ByteBuffer out = charset.encode(in);
    if (out.limit() == 1)
        return out.get(0) & 0xFF;
    return out.get(1) & 0xFF | (out.get(0) & 0xFF) << 8;
}
The output:
'　' (3000) = 2121 2121 a1a1 a1a1
'Ａ' (ff21) = 2341 2341 a3c1 a3c1
'？' (ff1f) = 2129 2129 a1a9 a1a9
'亜' (4e9c) = 3021 3021 b0a1 b0a1
'唖' (5516) = 3022 3022 b0a2 b0a2
'蔭' (852d) = 307e 307e b0fe b0fe
'高' (9ad8) = 3962 3962 b9e2 b9e2
'髙' (9ad9) = excepti 2129 excepti '?'
Note: I'm only interested in converting single characters, lots of them, not strings or streams, so I would actually prefer a different method (if one exists) that doesn't allocate a ByteBuffer for every conversion.

髙 is not contained in JIS X 0208, but it is contained in Microsoft Windows code page 932 (MS932), a variant of the Shift JIS encoding that is a superset of the JIS X 0208 character set.
You should use the name "Windows-31j" for MS932, like:
Charset.forName("Windows-31j");
rather than Charset.forName("x-JIS0208");.
EDIT
Mapping tables for some characters, such as 𨦇 and 鋏 (scissors), are distributed by Japanese government bodies such as the National Tax Agency (see JIS縮退マップ (Ver.1.0.0)).
But these mapping tables don't contain the character 髙. I think this is because 髙 is contained in neither JIS X 0208 nor JIS X 0213.
So I think you will have to replace 髙 with 高 manually (with String#replaceAll()), or write your own custom Charset with a CharsetProvider.
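A rough sketch of the manual replacement, applied per character before encoding. The table is hand-maintained and lists only the pair from the question. VariantFolding and fold are hypothetical names, Map.of needs Java 9 or later, and the availability of EUC-JP depends on the JDK, so the sketch checks for it first.

```java
import java.nio.charset.Charset;
import java.util.Map;

final class VariantFolding {
    // Variant character -> unified character; extend this table by hand as needed.
    static final Map<Character, Character> VARIANTS = Map.of('\u9AD9', '\u9AD8');

    // Return the unified form of ch, or ch itself if it is not a known variant.
    static char fold(char ch) {
        return VARIANTS.getOrDefault(ch, ch);
    }

    public static void main(String[] args) {
        char folded = fold('\u9AD9');
        System.out.println(Integer.toHexString(folded)); // 9ad8
        if (Charset.isSupported("EUC-JP")) {
            byte[] out = String.valueOf(folded).getBytes(Charset.forName("EUC-JP"));
            // The question's output lists b9e2 for U+9AD8 in EUC-JP.
            System.out.printf("%02x%02x%n", out[0] & 0xFF, out[1] & 0xFF);
        }
    }
}
```

Folding a single char this way avoids building intermediate strings, which suits the per-character conversion described in the question; the encoding step itself is unchanged.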

I only know that in the "ARIB STD-B24" spec (for ISDB-T 1seg in Japan), this character is coded with DRCS pattern data, from DRCS-1 to DRCS-15, where each set consists of 94 characters.
