Error when show UTF8 String with build mode in java

I have a hex string (sA) that was converted from a UTF-8 string.
When I convert sA back to a UTF-8 string, the text does not display correctly in my form UI when I run the built .jar file, but it displays correctly when I run the project in run mode or debug mode from the IDE.
I use NetBeans IDE 7.3.1.
My code is below:
public String hexToString(String txtInHex) {
byte[] txtInByte = new byte[txtInHex.length() / 2];
int j = 0;
for (int i = 0; i < txtInHex.length(); i += 2) {
txtInByte[j++] = Byte.parseByte(txtInHex.substring(i, i + 2), 16);
}
return new String(txtInByte);
}
private String asHex(byte[] buf) {
char[] chars = new char[2 * buf.length];
for (int i = 0; i < buf.length; ++i) {
chars[2 * i] = HEX_CHARS[(buf[i] & 0xF0) >>> 4];
chars[2 * i + 1] = HEX_CHARS[buf[i] & 0x0F];
}
return new String(chars);
}

There are multiple problems with this code.
The valid range for byte values is -128 to 127, or -80 to 7F in hex, and Byte.parseByte enforces this. If your asHex method has to process a byte whose unsigned value is greater than 127 (80 to FF in hex), it will produce a string that can't be decoded by hexToString.
The asHex method processes only the second byte of the input characters, so it will work correctly only for the first 256 Unicode characters and produce bogus output for the rest of them.
The hexToString method decodes a string from a byte array assuming some platform-specific default encoding, which will give incorrect results if the data was encoded in UTF-8 and the default encoding is something else.
Why are you trying to create your own methods for encoding and decoding hex strings instead of using a well known and tested library?

Use:
new String(txtInByte, "UTF-8");
Without an explicit encoding, the platform default encoding is used, for instance Windows-1252. The same holds for the inverse operation, String.getBytes:
String s = "....";
byte[] b = s.getBytes("UTF-8");
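As a minimal sketch of how the two answers above fit together, the following round trip names UTF-8 explicitly in both directions and parses each hex pair with Integer.parseInt, which accepts the full 00 to ff range. The class name HexUtf8Demo and the sample text are assumptions made for illustration; HEX_CHARS is assumed to hold the lowercase hex digits, since the question does not show its definition.

```java
import java.nio.charset.StandardCharsets;

class HexUtf8Demo {
    // Assumed definition; the question does not show HEX_CHARS.
    private static final char[] HEX_CHARS = "0123456789abcdef".toCharArray();

    static String asHex(byte[] buf) {
        char[] chars = new char[2 * buf.length];
        for (int i = 0; i < buf.length; i++) {
            chars[2 * i] = HEX_CHARS[(buf[i] & 0xF0) >>> 4];
            chars[2 * i + 1] = HEX_CHARS[buf[i] & 0x0F];
        }
        return new String(chars);
    }

    static String hexToString(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            // Integer.parseInt accepts 00..ff; Byte.parseByte throws above 7f.
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        // Name the charset so the result does not depend on the platform default.
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Ti\u1EBFng Vi\u1EC7t"; // sample text with non-ASCII characters
        String hex = asHex(original.getBytes(StandardCharsets.UTF_8));
        System.out.println(hex);
        System.out.println(hexToString(hex).equals(original));
    }
}
```

Because neither direction relies on the default charset, the result should be the same whether the code runs inside the IDE or from a .jar started with a different file.encoding.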

Related

Java - What is the proper way to convert a UTF-8 String to binary?

I'm using this code to convert a UTF-8 String to binary:
public String toBinary(String str) {
byte[] buf = str.getBytes(StandardCharsets.UTF_8);
StringBuilder result = new StringBuilder();
for (int i = 0; i < buf.length; i++) {
int ch = (int) buf[i];
String binary = Integer.toBinaryString(ch);
result.append(("00000000" + binary).substring(binary.length()));
result.append(' ');
}
return result.toString().trim();
}
Before I was using this code:
private String toBinary2(String str) {
StringBuilder result = new StringBuilder();
for (int i = 0; i < str.length(); i++) {
int ch = (int) str.charAt(i);
String binary = Integer.toBinaryString(ch);
if (ch<256)
result.append(("00000000" + binary).substring(binary.length()));
else {
binary = ("0000000000000000" + binary).substring(binary.length());
result.append(binary.substring(0, 8));
result.append(' ');
result.append(binary.substring(8));
}
result.append(' ');
}
return result.toString().trim();
}
These two methods can return different results; for example:
toBinary("è") = "11000011 10101000"
toBinary2("è") = "11101000"
I think that is because the bytes of è are negative while the corresponding char is not (because char is a 2-byte unsigned integer).
What I want to know is: which of the two approaches is the correct one, and why?
Thanks in advance.

Whenever you want to convert text into binary data (or into text representing binary data, as you do here) you have to use some encoding.
Your toBinary uses UTF-8 for that encoding.
Your toBinary2 uses something that's not a standard encoding: it encodes every UTF-16 code unit* below 256 in a single byte and all others in 2 bytes. Unfortunately that is not a useful encoding, since for decoding you'd have to know whether a single byte is stand-alone or part of a 2-byte sequence (UTF-8 and UTF-16 solve that by using the high-order bits to indicate which one it is).
tl;dr toBinary seems correct; toBinary2 will produce output that can't be uniquely decoded back to the original string.
* You might be wondering where the mention of UTF-16 comes from: that's because all String objects in Java are implicitly encoded in UTF-16. So if you use charAt you get UTF-16 code units (which happen to be equal to the Unicode code point for all characters that fit into the Basic Multilingual Plane).
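To make the "can it be decoded back" criterion concrete, here is a small sketch of a UTF-8 based conversion together with its inverse. The class name BinaryDemo and the helper fromBinary are hypothetical names introduced for this example. Each byte is masked with 0xFF before formatting, so a negative byte cannot expand into a 32-bit pattern.

```java
import java.nio.charset.StandardCharsets;

class BinaryDemo {
    static String toBinary(String str) {
        byte[] buf = str.getBytes(StandardCharsets.UTF_8);
        StringBuilder result = new StringBuilder();
        for (byte b : buf) {
            // Mask to 0..255 so negative bytes do not sign-extend to 32 bits.
            String bits = Integer.toBinaryString(b & 0xFF);
            result.append("00000000".substring(bits.length())).append(bits).append(' ');
        }
        return result.toString().trim();
    }

    // Hypothetical inverse: parses space-separated 8-bit groups back into text.
    static String fromBinary(String binary) {
        if (binary.isEmpty()) {
            return "";
        }
        String[] groups = binary.split(" ");
        byte[] buf = new byte[groups.length];
        for (int i = 0; i < groups.length; i++) {
            buf[i] = (byte) Integer.parseInt(groups[i], 2);
        }
        return new String(buf, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String bits = toBinary("\u00E8"); // the character è
        System.out.println(bits);
        System.out.println(fromBinary(bits).equals("\u00E8"));
    }
}
```

The decoder never has to guess where a character starts, because UTF-8 marks lead and continuation bytes in the high-order bits, which is exactly the property the toBinary2 scheme lacks.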

This code snippet might help.
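String s = "Some String";
byte[] bytes = s.getBytes(StandardCharsets.UTF_8); // needs java.nio.charset.StandardCharsets
StringBuilder binary = new StringBuilder();
for (byte b : bytes) {
int val = b;
// Emit the 8 bits of each byte, most significant bit first
for (int i = 0; i < 8; i++) {
binary.append((val & 128) == 0 ? 0 : 1);
val <<= 1;
}
}
System.out.println(s + " to binary: " + binary);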

Long values in objects not serialized and deserialized properly when using proto3 in Java

I am trying to serialize and deserialize an object in java using proto3. Here is what my object in proto looks like
option java_multiple_files = true;
option java_package = "com.project.dataModel";
option java_outer_classname = "FlowProto";
// The request message containing the user's name.
message Flow {
string subscriberIMSEI = 1;
string destinationIP = 2;
uint64 txBytes = 3;
uint64 rxBytes = 4;
uint64 txPkts = 5;
uint64 rxPkts = 6;
uint64 startTimeInMillis = 7;
uint64 endTimeInMillis = 8;
string asnNumber = 9;
string asnName = 10;
string asnCountryCode = 11;
}
Here is what my serialization and deserialization in Java look like
public class Test {
public static void main(String[] args) throws Exception {
Flow flow =
Flow.newBuilder().setAsnName("abc")
.setEndTimeInMillis(123456789L)
.setStartTimeInMillis(123456789L)
.setDestinationIP("1.1.1.1")
.setTxBytes(1L)
.setRxBytes(1L)
.setTxPkts(1L)
.setRxPkts(1L)
.setAsnName("blah")
.setAsnCountryCode("blah")
.build();
byte[] flowByteArray = flow.toByteArray();
String flowString = flow.toByteString().toStringUtf8();
System.out.println("Parsed from ByteArray:" + Flow.parseFrom(flowByteArray).getEndTimeInMillis());
System.out.println("Parsed from ByteString:" + Flow.parseFrom(ByteString.copyFromUtf8(flowString))
.getEndTimeInMillis());
}
}
My output is as follows
Parsed from ByteArray:123456789
Parsed from ByteString:-4791902657223630865
Where am I going wrong when I take the ByteString and UTF-8 route for serialization and deserialization?
Thanks!

You are seeing this issue because your serialized byte array is being corrupted. Serialized protobuf data is arbitrary binary, not UTF-8 text, and converting it to a UTF-8 string changes the bytes of your original array. When you call flow.toByteString().toStringUtf8(), one byte in the original ByteString may be transformed into three new bytes with different values. When you then call ByteString.copyFromUtf8(flowString), the change is not undone, since that line of code simply retrieves the transformed UTF-8 bytes, not the original bytes you put in.
Here is a small test that illustrates the issue you are seeing
@Test
public void byteConsistency() {
byte[] vals = new byte[] {0, 110, -1};
ByteString original = ByteString.copyFrom(vals);
ByteString newString = ByteString.copyFromUtf8(original.toStringUtf8());
for (int index = 0; index < newString.size(); index++) {
System.out.println(newString.byteAt(index));
}
}
You would expect this code to output
0
110
-1
But it actually outputs
0
110
-17
-65
-67
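That's because a -1 (0xFF) byte is never valid in UTF-8, so the decoder replaces it with the replacement character U+FFFD, and encoding U+FFFD back to UTF-8 yields the three bytes [-17, -65, -67] (0xEF 0xBF 0xBD).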
In summary, when dealing with protobuf don't convert serialized objects into UTF-8 strings. Only use the raw bytes for serialization and deserialization. If you try converting to UTF-8 strings the serialized bytes will become corrupted and you will not be able to deserialize them.
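If the serialized bytes really do have to travel through a String (for example in a text-only field), one common option is Base64 rather than UTF-8. This is a suggestion that goes beyond the answer above, offered as a sketch: it uses only java.util.Base64 from the JDK, the class name BinaryAsTextDemo is hypothetical, and the small byte array stands in for the output of flow.toByteArray().

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class BinaryAsTextDemo {
    // Lossy: bytes that are not valid UTF-8 are replaced while decoding.
    static byte[] viaUtf8(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Lossless: Base64 maps arbitrary bytes to ASCII text and back.
    static byte[] viaBase64(byte[] raw) {
        String text = Base64.getEncoder().encodeToString(raw);
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] raw = {0, 110, -1}; // stand-in for serialized message bytes
        System.out.println(Arrays.toString(viaUtf8(raw)));
        System.out.println(Arrays.toString(viaBase64(raw)));
    }
}
```

The Base64 text is about one third larger than the raw bytes, so passing the byte array directly remains the simpler choice whenever the transport allows it.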

JAVA: failing to get encrypted data in string using xor

I am trying to return encrypted text as a String, but I am going wrong somewhere. I do a simple XOR on a plain text. I then pass the resulting encrypted string to a C program, which does the same XOR again to recover the plain text.
In between, however, I am not able to get a proper String of the encrypted text to pass to the C program.
String xorencrypt(byte[] passwd,int pass_len){
char[] st = new char[pass_len];
byte[] crypted = new byte[pass_len];
for(int i = 0; i<pass_len;i++){
crypted[i] = (byte) (passwd[i]^(i+1));
st[i] = (char)crypted[i];
System.out.println((char)passwd[i]+" "+passwd[i] +"= " + (char)crypted[i]+" "+crypted[i]);/* characters are printed fine but problem is when i am convering it in to string */
}
return st.toString();
}
I don't know whether some kind of encoding is also needed, because if I add one, how will I decode and decrypt in the C program?
For example, suppose passwd = bond007;
then the Java program should return akkb78>
and the C program should decrypt akkb78> back to bond007.

Use
return new String(crypted);
In that case you don't need the st[] array at all.
By the way, the encoded value for bond007 is cmm`560 and not what you posted.
EDIT
While the solution above would most likely work in most Java environments, to be safe about encoding,
as suggested by Alex, pass an encoding parameter to the String constructor.
For example, if you want your string to carry 8-bit bytes:
return new String(crypted, "ISO-8859-1");
You would need the same parameter when getting the bytes back from your string:
byte[] bytes = myString.getBytes("ISO-8859-1");
Alternatively, use the solution provided by Alex:
return new String(st);
But convert the bytes to chars properly:
st[i] = (char) (crypted[i] & 0xff);
Otherwise, negative bytes (crypted[i] < 0) are sign-extended when cast to char, and you get surprising results.
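The ISO-8859-1 variant can be sketched end to end as follows. The class and method names (XorDemo, xor, xorCrypt) are hypothetical, and the sketch assumes the input contains only characters in the ISO-8859-1 range, since getBytes substitutes '?' for anything outside it.

```java
import java.nio.charset.StandardCharsets;

class XorDemo {
    // XOR each byte with its 1-based position, as in the question.
    static byte[] xor(byte[] input) {
        byte[] out = new byte[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = (byte) (input[i] ^ (i + 1));
        }
        return out;
    }

    // ISO-8859-1 maps every byte value 0..255 to exactly one char and back,
    // so the String carries the XORed bytes without loss.
    static String xorCrypt(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(xor(bytes), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String encrypted = xorCrypt("bond007");
        System.out.println(encrypted);
        System.out.println(xorCrypt(encrypted)); // XOR is its own inverse
    }
}
```

On the C side, the same bytes can be recovered only if the string is written out with the same single-byte encoding, so the file or socket writer should also be given ISO-8859-1 explicitly.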

Replace this line:
return st.toString();
with this:
return new String(st);

Extract hexadecimal values from a percent encoded URL

Let's say, for example, I have a URL containing the following percent-encoded character: %80
It is obviously not an ASCII character.
How can I convert this value to the corresponding hex string in Java?
I tried the following with no luck. The result should be 80.
public static void main(String[] args) {
System.out.print(byteArrayToHexString(URLDecoder.decode("%80","UTF-8").getBytes()));
}
public static String byteArrayToHexString(byte[] bytes)
{
StringBuffer buffer = new StringBuffer();
for(int i=0; i<bytes.length; i++)
{
if(((int)bytes[i] & 0xff) < 0x10)
buffer.append("0");
buffer.append(Long.toString((int) bytes[i] & 0xff, 16));
}
return buffer.toString();
}

The best way to deal with this is to parse the url using either java.net.URL or java.net.URI, and then use the relevant getters to extract the components that you require. These will take care of decoding any %-encoded portions in the appropriate fashion.
The problem with your current idea is that %80 does not represent "80", or 80. Rather, it represents a byte (0x80) that needs to be interpreted in the context of the character encoding of the URL. If the encoding is UTF-8, 0x80 is a continuation byte, so it is only valid after a %-encoded lead byte; a %80 on its own is a malformed UTF-8 sequence.
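If the goal is only to see the raw byte values behind the escapes, one option is to skip character decoding entirely and turn each %XX into a byte. The sketch below does that by hand; PercentBytesDemo, percentDecodeToBytes and toHex are hypothetical names, and characters outside escapes are assumed to be plain ASCII.

```java
import java.io.ByteArrayOutputStream;

class PercentBytesDemo {
    // Converts %XX escapes to raw bytes without interpreting them as characters.
    static byte[] percentDecodeToBytes(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '%') {
                out.write(c); // assumption: unescaped characters are ASCII
                continue;
            }
            if (i + 2 >= s.length()) {
                throw new IllegalArgumentException("Truncated escape at index " + i);
            }
            int hi = Character.digit(s.charAt(i + 1), 16);
            int lo = Character.digit(s.charAt(i + 2), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Invalid escape at index " + i);
            }
            out.write((hi << 4) | lo);
            i += 2; // skip the two hex digits just consumed
        }
        return out.toByteArray();
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex(percentDecodeToBytes("%80")));
    }
}
```

URLDecoder.decode("%80", "UTF-8") cannot be used for this, because a lone 0x80 is not valid UTF-8 and is decoded to the replacement character before getBytes is ever called.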

I don't really see what you are trying to do. However, I'll give it a try.
If you have the String "%80" and you want to get the string "80", you can use this:
String str = "%80";
String hex = str.substring(1); // Cut off the '%'
If you are trying to extract the value 0x80 (which is 128 in decimal) out of it:
String str = "%80";
String hex = str.substring(1); // Cut off the '%'
int value = Integer.parseInt(hex, 16);
If you are trying to convert an int to its hexadecimal representation, use this:
String hexRepresentation = Integer.toString(value, 16);

Converting char array into byte array and back again

I'm looking to convert a Java char array to a byte array without creating an intermediate String, as the char array contains a password. I've looked up a couple of methods, but they all seem to fail:
char[] password = "password".toCharArray();
byte[] passwordBytes1 = new byte[password.length*2];
ByteBuffer.wrap(passwordBytes1).asCharBuffer().put(password);
byte[] passwordBytes2 = new byte[password.length*2];
for(int i=0; i<password.length; i++) {
passwordBytes2[2*i] = (byte) ((password[i]&0xFF00)>>8);
passwordBytes2[2*i+1] = (byte) (password[i]&0x00FF);
}
String passwordAsString = new String(password);
String passwordBytes1AsString = new String(passwordBytes1);
String passwordBytes2AsString = new String(passwordBytes2);
System.out.println(passwordAsString);
System.out.println(passwordBytes1AsString);
System.out.println(passwordBytes2AsString);
assertTrue(passwordAsString.equals(passwordBytes1) || passwordAsString.equals(passwordBytes2));
The assertion always fails (and, critically, when the code is used in production, the password is rejected), yet the print statements print out password three times. Why are passwordBytes1AsString and passwordBytes2AsString different from passwordAsString, yet appear identical? Am I missing out a null terminator or something? What can I do to make the conversion and unconversion work?

Conversion between char and byte is character set encoding and decoding. I prefer to make that as clear as possible in code. It doesn't really mean extra code volume:
Charset latin1Charset = Charset.forName("ISO-8859-1");
charBuffer = latin1Charset.decode(ByteBuffer.wrap(byteArray)); // decode also works for producing a String
byteBuffer = latin1Charset.encode(charBuffer); // encode also accepts a String
Aside:
The java.nio classes and the java.io Reader/Writer classes use ByteBuffer and CharBuffer (which use byte[] and char[] as backing arrays), so it is often preferable to use these classes directly. However, you can always do:
byteArray = byteBuffer.array(); byteBuffer = ByteBuffer.wrap(byteArray);
byteBuffer.get(byteArray); byteBuffer.put(byteArray);
charArray = charBuffer.array(); charBuffer = CharBuffer.wrap(charArray);
charBuffer.get(charArray); charBuffer.put(charArray);

Original Answer
public byte[] charsToBytes(char[] chars){
Charset charset = Charset.forName("UTF-8");
ByteBuffer byteBuffer = charset.encode(CharBuffer.wrap(chars));
return Arrays.copyOf(byteBuffer.array(), byteBuffer.limit());
}
public char[] bytesToChars(byte[] bytes){
Charset charset = Charset.forName("UTF-8");
CharBuffer charBuffer = charset.decode(ByteBuffer.wrap(bytes));
return Arrays.copyOf(charBuffer.array(), charBuffer.limit());
}
Edited to use StandardCharsets
public byte[] charsToBytes(char[] chars)
{
final ByteBuffer byteBuffer = StandardCharsets.UTF_8.encode(CharBuffer.wrap(chars));
return Arrays.copyOf(byteBuffer.array(), byteBuffer.limit());
}
public char[] bytesToChars(byte[] bytes)
{
final CharBuffer charBuffer = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes));
return Arrays.copyOf(charBuffer.array(), charBuffer.limit());
}
Here is a JavaDoc page for StandardCharsets.
Note this on the JavaDoc page:
These charsets are guaranteed to be available on every implementation of the Java platform.

The problem is your use of the String(byte[]) constructor, which uses the platform default encoding. That's almost never what you should be doing. If you pass in "UTF-16" as the character encoding, your tests will probably pass. Currently I suspect that passwordBytes1AsString and passwordBytes2AsString are each 16 characters long, with every other character being U+0000.
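A short sketch of that diagnosis: the bytes produced by asCharBuffer() are big-endian UTF-16, so they decode back to the password only when that charset is named. The class name PasswordBytesDemo and the helper toUtf16Be are hypothetical, and ISO-8859-1 is used here as a deterministic stand-in for a single-byte platform default. The Strings are created purely for demonstration; real password handling would avoid them, as the question notes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class PasswordBytesDemo {
    // Two bytes per char, high byte first (ByteBuffer is big-endian by default).
    static byte[] toUtf16Be(char[] chars) {
        byte[] bytes = new byte[chars.length * 2];
        ByteBuffer.wrap(bytes).asCharBuffer().put(chars);
        return bytes;
    }

    public static void main(String[] args) {
        byte[] bytes = toUtf16Be("password".toCharArray());

        // Decoding with the charset that matches the byte layout restores the text.
        String decoded = new String(bytes, StandardCharsets.UTF_16BE);
        System.out.println(decoded.equals("password"));

        // A single-byte charset turns every byte into a char, including the zero bytes.
        String misdecoded = new String(bytes, StandardCharsets.ISO_8859_1);
        System.out.println(misdecoded.length());
    }
}
```

The U+0000 characters are usually invisible when printed, which would explain why all three strings look identical on the console while comparing unequal.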

What I would do is use one loop to convert to bytes and another to convert back to chars.
char[] chars = "password".toCharArray();
byte[] bytes = new byte[chars.length*2];
for(int i=0;i<chars.length;i++) {
bytes[i*2] = (byte) (chars[i] >> 8);
bytes[i*2+1] = (byte) chars[i];
}
char[] chars2 = new char[bytes.length/2];
for(int i=0;i<chars2.length;i++)
chars2[i] = (char) ((bytes[i*2] << 8) + (bytes[i*2+1] & 0xFF));
String password = new String(chars2);

If you want to use a ByteBuffer and CharBuffer, don't use the simple .asCharBuffer(), which just does a UTF-16 conversion (big-endian by default; you can change the byte order with the order method), since Java Strings, and thus your char[], use this encoding internally.
Use Charset.forName(charsetName), and then its encode or decode method, or newEncoder/newDecoder.
When converting your byte[] to a String, you should also specify the encoding (and it should be the same one).

This is an extension to Peter Lawrey's answer. For the backward (bytes-to-chars) conversion to work correctly for the whole range of chars, the code should be as follows:
char[] chars = new char[bytes.length/2];
for (int i = 0; i < chars.length; i++) {
chars[i] = (char) (((bytes[i*2] & 0xff) << 8) + (bytes[i*2+1] & 0xff));
}
We need to "unsign" the bytes (& 0xff) before using them. Otherwise half of all possible char values will not be restored correctly. For instance, chars within the [0x80..0xff] range will be affected.
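The effect of the mask can be seen on a single char. The sketch below compares a combination that leaves the low byte unmasked with the masked version; UnsignDemo, combineUnmasked and combineMasked are hypothetical names used only for this illustration.

```java
class UnsignDemo {
    // Low byte used as-is: a negative byte is sign-extended before the addition.
    static char combineUnmasked(byte hi, byte lo) {
        return (char) ((hi << 8) + lo);
    }

    // Both bytes masked to 0..255 before being combined.
    static char combineMasked(byte hi, byte lo) {
        return (char) (((hi & 0xff) << 8) + (lo & 0xff));
    }

    public static void main(String[] args) {
        byte hi = (byte) 0x00;
        byte lo = (byte) 0xE8; // low byte of U+00E8, stored as -24
        System.out.println(Integer.toHexString(combineUnmasked(hi, lo)));
        System.out.println(Integer.toHexString(combineMasked(hi, lo)));
    }
}
```

Without the mask, the low byte 0xE8 is treated as -24, so the sum is negative and the cast to char keeps its low 16 bits, giving U+FFE8 instead of U+00E8.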

You should make use of getBytes() instead of toCharArray()
Replace the line
char[] password = "password".toCharArray();
with
byte[] password = "password".getBytes();

When you call getBytes() on a String in Java, the result depends on the default encoding of your platform (for example StandardCharsets.UTF_8 or StandardCharsets.ISO_8859_1).
So whenever you call getBytes on a String object, make sure to specify an encoding, like:
String sample = "abc";
byte[] a_byte = sample.getBytes(StandardCharsets.UTF_8);
Let's check what happens with this code.
In Java, the String named sample is stored as Unicode; every char in a String takes 2 bytes.
sample : value: "abc" in Memory(Hex): 00 61 00 62 00 63
a -> 00 61
b -> 00 62
c -> 00 63
But when we call getBytes on the String, we get:
byte[] utf8Bytes = sample.getBytes(StandardCharsets.UTF_8);
//result is : 61 62 63
//length: 3 bytes
byte[] utf16Bytes = sample.getBytes(StandardCharsets.UTF_16BE);
//result is : 00 61 00 62 00 63
//length: 6 bytes
To get the original bytes of the String, we can walk over its chars and take the two bytes of each one. Below is the sample code:
public static byte[] charArray2ByteArray(char[] chars){
// Two bytes per char; extra trailing bytes would decode as a U+0000 char.
byte[] result = new byte[chars.length * 2];
int i = 0;
for (int j = 0; j < chars.length; j++) {
result[i++] = (byte) ((chars[j] & 0xFF00) >> 8); // high byte first
result[i++] = (byte) (chars[j] & 0x00FF); // then low byte
}
return result;
}
Usage:
String sample = "abc";
// First get the chars of the String; each char has two bytes (Java).
char[] sample_chars = sample.toCharArray();
// Get the bytes
byte[] result = charArray2ByteArray(sample_chars);
// Back to String.
// Make sure we use UTF_16BE, because we wrote the high byte of each char
// first, then the low byte. That is the same byte
// order as UTF-16BE.
String sample_back = new String(result, StandardCharsets.UTF_16BE);
