Hi, I need to convert this C# code to Java. Could you give me a hand?
private static String ConvertStringToHexStringByteArray(String input) {
    Encoding ebcdic = Encoding.GetEncoding("IBM037");
    Encoding utf8 = Encoding.UTF8;
    byte[] utfBytes = utf8.GetBytes(input);
    byte[] isoBytes = Encoding.Convert(utf8, ebcdic, utfBytes);
    StringBuilder hex = new StringBuilder(isoBytes.Length * 2);
    foreach (byte b in isoBytes)
        hex.AppendFormat("{0:x2}", b);
    return hex.ToString();
}
I tried to convert it to Java like this, but the result is different:
private static String ConvertStringToHexStringByteArray(String input) throws UnsupportedEncodingException {
    byte[] isoBytes = input.getBytes("IBM037");
    StringBuilder hex = new StringBuilder(isoBytes.length * 2);
    for (byte b : isoBytes) {
        hex.append(String.format("%02x", b));
    }
    return hex.toString();
}
input = "X1GRUPPO 00000000726272772"
expected = "e7f1c7d9e4d7d7d64040404040f0f0f0f0f0f0f0f0f1f6f7f3f5f3f5f5f2"
result = "e7f1c7d9e4d7d7d640f0f0f0f0f0f0f0f0f7f2f6f2f7f2f7f7f2"
What am I doing wrong?
Your code works but you are comparing the output for two different input strings.
When you write expected and result side by side:
e7f1c7d9e4d7d7d64040404040f0f0f0f0f0f0f0f0f1f6f7f3f5f3f5f5f2
e7f1c7d9e4d7d7d640f0f0f0f0f0f0f0f0f7f2f6f2f7f2f7f7f2
you will notice that both start with the same sequence (e7f1c7d9e4d7d7d6), which comes from the common prefix "X1GRUPPO".
But then the two outputs differ:
4040404040f0f0f0f0f0f0f0f0f1f6f7f3f5f3f5f5f2
40f0f0f0f0f0f0f0f0f7f2f6f2f7f2f7f7f2
Decoding the remainder of the expected output, the rest of the first input string consists of 5 spaces followed by "00000000167353552".
This means the complete input for the C# code was "X1GRUPPO     00000000167353552" (with five spaces after "X1GRUPPO"), which is not the same input that you passed to the Java code, so the outputs obviously cannot match.
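If you want to double-check that deduction, here is a minimal, self-contained sketch (the reconstructed input with five spaces is taken from the expected hex above; the class name is just for illustration, and it assumes the IBM037 charset is available on your JVM):
import java.io.UnsupportedEncodingException;

public class Ebcdic037Check {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // Input reconstructed from the "expected" hex: "X1GRUPPO", five spaces,
        // then "00000000167353552".
        String reconstructed = "X1GRUPPO     00000000167353552";
        StringBuilder hex = new StringBuilder();
        for (byte b : reconstructed.getBytes("IBM037")) {
            hex.append(String.format("%02x", b));
        }
        // Should print the "expected" value:
        // e7f1c7d9e4d7d7d64040404040f0f0f0f0f0f0f0f0f1f6f7f3f5f3f5f5f2
        System.out.println(hex);
    }
}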
I'm using this code to convert a UTF-8 String to binary:
public String toBinary(String str) {
    byte[] buf = str.getBytes(StandardCharsets.UTF_8);
    StringBuilder result = new StringBuilder();
    for (int i = 0; i < buf.length; i++) {
        int ch = (int) buf[i];
        String binary = Integer.toBinaryString(ch);
        result.append(("00000000" + binary).substring(binary.length()));
        result.append(' ');
    }
    return result.toString().trim();
}
Before I was using this code:
private String toBinary2(String str) {
    StringBuilder result = new StringBuilder();
    for (int i = 0; i < str.length(); i++) {
        int ch = (int) str.charAt(i);
        String binary = Integer.toBinaryString(ch);
        if (ch < 256)
            result.append(("00000000" + binary).substring(binary.length()));
        else {
            binary = ("0000000000000000" + binary).substring(binary.length());
            result.append(binary.substring(0, 8));
            result.append(' ');
            result.append(binary.substring(8));
        }
        result.append(' ');
    }
    return result.toString().trim();
}
These two methods can return different results; for example:
toBinary("è") = "11000011 10101000"
toBinary2("è") = "11101000"
I think that is because the bytes of è are negative while the corresponding char is not (char is a 2-byte unsigned integer).
What I want to know is: which of the two approaches is the correct one and why?
Thanks in advance.
Whenever you want to convert text into binary data (or into text representing binary data, as you do here) you have to use some encoding.
Your toBinary uses UTF-8 for that encoding.
Your toBinary2 uses something that's not a standard encoding: it encodes every UTF-16 codepoint* below 256 in a single byte and all others in 2 bytes. Unfortunately that is not a useful encoding, since for decoding you would have to know whether a single byte stands alone or is part of a 2-byte sequence (UTF-8/UTF-16 solve this by indicating, via the high-order bits, which one it is).
tl;dr toBinary seems correct, toBinary2 will produce output that can't uniquely be decoded back to the original string.
* You might be wondering where the mention of UTF-16 comes from: That's because all String objects in Java are implicitly encoded in UTF-16. So if you use charAt you get UTF-16 codepoints (which just so happen to be equal to the Unicode code number for all characters that fit into the Basic Multilingual Plane).
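To make the difference concrete, here is a small round-trip sketch for "è" (the class and variable names are just for illustration):
import java.nio.charset.StandardCharsets;

public class BinaryRoundTrip {
    public static void main(String[] args) {
        String original = "è";

        // toBinary's approach: the UTF-8 bytes (0xC3 0xA8 -> "11000011 10101000")
        // can always be decoded back to the original string.
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(original)); // true

        // toBinary2's approach: the single byte 0xE8 ("11101000") carries no hint
        // of which encoding it belongs to; as a UTF-8 stream it is an incomplete
        // sequence, so it decodes to the replacement character, not to "è".
        byte[] homemade = { (byte) 0xE8 };
        System.out.println(new String(homemade, StandardCharsets.UTF_8)); // prints '\uFFFD'
    }
}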
This code snippet might help.
String s = "Some String";
byte[] bytes = s.getBytes();
StringBuilder binary = new StringBuilder();
for(byte b:bytes){
int val =b;
for(int i=;i<=s.length;i++){
binary.append((val & 128) == 0 ? 0 : 1);
val<<=1;
}
}
System.out.println(" "+s+ "to binary" +binary);
I am trying to print encrypted text as a string, and perhaps I am going wrong somewhere. I am doing a simple XOR on plain text. The resulting encrypted text/string is then fed into a C program, which does the same XOR again to recover the plain text.
But in between, I am not able to get a proper string of the encrypted text to pass to the C program.
String xorencrypt(byte[] passwd, int pass_len) {
    char[] st = new char[pass_len];
    byte[] crypted = new byte[pass_len];
    for (int i = 0; i < pass_len; i++) {
        crypted[i] = (byte) (passwd[i] ^ (i + 1));
        st[i] = (char) crypted[i];
        System.out.println((char) passwd[i] + " " + passwd[i] + "= " + (char) crypted[i] + " " + crypted[i]); /* the characters print fine, but the problem is when I convert them into a String */
    }
    return st.toString();
}
I don't know whether some kind of encoding is also needed, because if I used one, how would I decode and decrypt it from the C program?
For example, suppose passwd = bond007;
then the Java program should return akkb78>,
and the C program will then decrypt akkb78> back to bond007.
Use
return new String(crypted);
In that case you don't need the st[] array at all.
By the way, the encoded value for bond007 is cmm`560, not what you posted.
EDIT
While the solution above would most likely work in most Java environments, to be safe about encoding,
as suggested by Alex, provide an encoding parameter to the String constructor.
For example, if you want your string to carry 8-bit bytes:
return new String(crypted, "ISO-8859-1");
You would need the same parameter when getting the bytes back from your string:
byte[] bytes = myString.getBytes("ISO-8859-1");
Alternatively, use the solution provided by Alex:
return new String(st);
But convert bytes to chars properly:
st[i] = (char) (crypted[i] & 0xff);
Otherwise, any negative byte (crypted[i] < 0) will not be converted to a char properly and you will get surprising results.
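Putting the pieces together, here is a minimal sketch of the full round trip, assuming ISO-8859-1 on both sides (the method and class names are just for illustration):
import java.io.UnsupportedEncodingException;

public class XorRoundTrip {
    static String xorencrypt(byte[] passwd) throws UnsupportedEncodingException {
        byte[] crypted = new byte[passwd.length];
        for (int i = 0; i < passwd.length; i++) {
            crypted[i] = (byte) (passwd[i] ^ (i + 1));
        }
        // Build the String with a fixed 8-bit encoding so every byte value survives.
        return new String(crypted, "ISO-8859-1");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String encrypted = xorencrypt("bond007".getBytes("ISO-8859-1"));
        System.out.println(encrypted); // cmm`560

        // The reverse step (what the C program would do): the same XOR on the
        // bytes recovered with the same encoding.
        byte[] bytes = encrypted.getBytes("ISO-8859-1");
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) (bytes[i] ^ (i + 1));
        }
        System.out.println(new String(bytes, "ISO-8859-1")); // bond007
    }
}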
Change this line:
return st.toString();
to this:
return new String(st);
I have a hex string (sA) converted from a UTF-8 string.
When I convert the hex string sA back to a UTF-8 string, I can't display it in the form UI when running the built .jar file, but when I run in run mode or debug mode the UTF-8 string displays fine in the form UI.
I am using NetBeans IDE 7.3.1.
My code is below:
public String hexToString(String txtInHex) {
    byte[] txtInByte = new byte[txtInHex.length() / 2];
    int j = 0;
    for (int i = 0; i < txtInHex.length(); i += 2) {
        txtInByte[j++] = Byte.parseByte(txtInHex.substring(i, i + 2), 16);
    }
    return new String(txtInByte);
}
private String asHex(byte[] buf) {
    char[] chars = new char[2 * buf.length];
    for (int i = 0; i < buf.length; ++i) {
        chars[2 * i] = HEX_CHARS[(buf[i] & 0xF0) >>> 4];
        chars[2 * i + 1] = HEX_CHARS[buf[i] & 0x0F];
    }
    return new String(chars);
}
There are multiple problems with this code.
The valid range for byte values is -128 to 127, or -80 to 7F in hex, and Byte.parseByte enforces this. If your asHex method has to process a byte whose unsigned value is greater than 127, it will produce a hex pair (80 through FF) that can't be decoded by hexToString.
The asHex method processes only the second byte of the input characters, so it will work correctly only for the first 256 Unicode characters and produce bogus output for the rest of them.
The hexToString method decodes a string from a byte array using the platform-specific default encoding, which will give incorrect results if the data was encoded as UTF-8 and the default encoding is something else.
Why are you trying to create your own methods for encoding and decoding hex strings instead of using a well known and tested library?
new String(txtInByte, "UTF-8");
Without the encoding, the platform default encoding is used, for instance Windows-1252. The same holds for its inverse, String.getBytes:
String s = "....";
byte[] b = s.getBytes("UTF-8");
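On the "use a well known and tested library" point above, here is a minimal sketch using Apache Commons Codec (an assumption: the commons-codec jar is on the classpath; java.util.HexFormat on Java 17+ or javax.xml.bind.DatatypeConverter on Java 8 would do the same job):
import java.nio.charset.StandardCharsets;
import org.apache.commons.codec.DecoderException;
import org.apache.commons.codec.binary.Hex;

public class HexCodecSketch {
    public static void main(String[] args) throws DecoderException {
        String original = "some text with é and è";

        // String -> UTF-8 bytes -> hex string
        String hex = Hex.encodeHexString(original.getBytes(StandardCharsets.UTF_8));

        // hex string -> bytes -> String, decoding with the same explicit charset
        byte[] bytes = Hex.decodeHex(hex.toCharArray());
        System.out.println(new String(bytes, StandardCharsets.UTF_8)); // original text
    }
}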
I have a byte[] array named byteval in Java, and when I do System.out.println(byteval) I can read: d3e1547c254ff7cec8dbcef2262b5cf10ec079c7[B@40d150e0
Now I need what I read there as a String, but if I try to convert it with Byte.toString or a new String constructor, the value is not the same; mostly I just get some numbers instead.
So how can I get the byte[] array as a String called strval, also cutting off the [B@40d150e0?
Now: System.out.println(byteval) >> d3e1547c254ff7cec8dbcef2262b5cf10ec079c7[B@40d150e0
Goal: System.out.println(strval) >> d3e1547c254ff7cec8dbcef2262b5cf10ec079c7
Lots of thanks!
Danny
EDIT: Working solution for me:
byte[] byteval = getValue();
// Here System.out.println(byteval) prints
// d3e1547c254ff7cec8dbcef2262b5cf10ec079c7[B@40d150e0
BigInteger bi = new BigInteger(1, byteval);
String strval = bi.toString(16);
if ((strval.length() % 2) != 0) {
    strval = "0" + strval;
}
System.out.println(strval);
// Here the String output is
// d3e1547c254ff7cec8dbcef2262b5cf10ec079c7
Thanks to all who answered.
just do
System.out.println(byteval.toString())
instead of
System.out.println(byteval)
this will remove the ending part (actually, that is just the default string representation of the array object)
Try this:
String str= new String(byteval, "ISO-8859-1");
System.out.println(str);
I'm trying to decode a char and get back the same char.
Following is my simple test.
I'm confused about whether I have to encode or decode. I tried both; both print the same result.
Any suggestions would be greatly appreciated.
char inpData = '†';
String str = Character.toString((char) inpData);
byte b[] = str.getBytes(Charset.forName("MacRoman"));
System.out.println(b[0]); // prints -96
String decData = Integer.toString(b[0]);
CharsetDecoder decoder = Charset.forName("MacRoman").newDecoder();
ByteBuffer inBuffer = ByteBuffer.wrap(decData.getBytes());
CharBuffer result = decoder.decode(inBuffer);
System.out.println(result.toString()); // prints -96, expecting to print †
CharsetEncoder encoder = Charset.forName("MacRoman").newEncoder();
ByteBuffer bbuf = encoder.encode(CharBuffer.wrap(decData));
result = decoder.decode(bbuf);
System.out.println(result.toString());// prints -96, expecting to print †
Thank you.
When you do String decData = Integer.toString(b[0]);, you create the string "-96", and that is the string you're encoding/decoding, not the original char.
You have to change your String back to a byte first.
To get your character back as a char from the -96, you have to do this:
String string = new String(b, "MacRoman");
char specialChar = string.charAt(0);
With this you're reversing your first transformation (char -> String -> byte[0]) by doing byte[0] -> String -> char[0].
If you have the String "-96", you must first turn your string back into a byte with:
byte b = Byte.parseByte("-96");
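A compact sketch of that fix, going from the bytes (not from the string "-96") back to the char (the class name is just for illustration; the final comparison avoids any console-encoding issues when printing '†'):
import java.nio.charset.Charset;

public class MacRomanRoundTrip {
    public static void main(String[] args) {
        char inpData = '†';
        byte[] b = Character.toString(inpData).getBytes(Charset.forName("MacRoman"));
        System.out.println(b[0]); // -96 (0xA0 is '†' in MacRoman)

        // Decode the bytes themselves, not a string containing "-96".
        String decoded = new String(b, Charset.forName("MacRoman"));
        System.out.println(decoded.charAt(0) == inpData); // true
    }
}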
String decData = Integer.toString(b[0]);
This probably gets you the "-96" output in the last two examples. Try
String decData = new String(b, "MacRoman");
Apart from that, keep in mind that System.out.println uses your system charset to print strings anyway. For a better test, consider writing your Strings to a file using your specific charset, with something like
FileOutputStream fos = new FileOutputStream("test.txt");
OutputStreamWriter writer = new OutputStreamWriter(fos, "MacRoman");
writer.write(result.toString());
writer.close();