Java String from byte array - java

I am currently reading in a UDP byte array that I know is a string and I know the MAXIMUM possible length of said string. So I print out a string (which is usually shorter than the max length). I am able to print it out but it prints out the text then junk characters. Is there a way to trim the junk binary data without knowing the actual length of the valid text?
String result = new String(input, Charset.forName("US-ASCII"));
I'll add more detail for those asking. Here is how the UDP message is read:
sock.receive(incoming);
byte[] data = incoming.getData();
String s = new String(data, 0, incoming.getLength());
The UDP message itself contains a fixed-size header followed by a block of data (maximum 1024 bytes). The data may be int, string, byte, etc., as indicated by the header. Depending on the type, I chop the data into chunks of the appropriate size. The problem I am focusing on is the string type. I know that each string occupies at most 128 bytes, so I read the data in chunks of that size (readSize), where dataArray is the byte array:
for (int i = 0; i < msg.length; i = i + readSize)
{
dataArray = Arrays.copyOfRange(msg, i, i + readSize);
}
Then I use the code from the first snippet in this post to turn the data into a String object. The thing is, the text that is sent is usually shorter than the 128 bytes allocated for the maximum size. So when I print the string, I get the valid text followed by whitespace and non-printable characters (junk data). Hope this addition helps.
An example of the output is here. Everything up to the .mof is valid:
https://1drv.ms/i/s!Ai0t7Oj1PUFBpRP9K_2RlocAK4B7

Is there a way to trim the junk binary data without knowing the actual
length of the valid text?
Yes, if the unused part of the buffer is filled with null bytes, you can simply call trim(). It removes every leading and trailing character less than or equal to \u0020 (space), which includes \u0000 (the null character). It will not remove trailing bytes that decode to characters above \u0020.
byte[] bytes = "foo bar".getBytes();
// Simulate message with a size bigger than the actual encoded String
byte[] msg = new byte[32];
System.arraycopy(bytes, 0, msg, 0, bytes.length);
// Decode the message
String result = new String(msg, Charset.forName("US-ASCII"));
// Trim the result
System.out.printf("Result: '%s'%n", result.trim());
Output:
Result: 'foo bar'

Ok here is how I was able to get it to work. It's a rather manual method but before using
String result = new String(input, Charset.forName("US-ASCII"));
to combine the byte array into a string, I looked at each byte and made sure it was within the printable range of 0x20 - 0x7e. If not, I replaced the value with a space (0x20). Then finished off with a .trim on the string.
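The manual approach described above can be sketched as follows. This is a minimal illustration, not the poster's actual code: the class name PrintableAscii, the method name decode, and the sample text "report.mof" are made up for the example, and it assumes the field is plain ASCII text followed by padding or junk.

```java
import java.nio.charset.StandardCharsets;

final class PrintableAscii {
    // Replace every byte outside printable ASCII (0x20..0x7e) with a space,
    // then decode as US-ASCII and trim the leading/trailing spaces.
    static String decode(byte[] input) {
        byte[] cleaned = input.clone(); // leave the caller's buffer untouched
        for (int i = 0; i < cleaned.length; i++) {
            // bytes are signed, so values 0x80..0xff are negative and fail the first test
            if (cleaned[i] < 0x20 || cleaned[i] > 0x7e) {
                cleaned[i] = 0x20;
            }
        }
        return new String(cleaned, StandardCharsets.US_ASCII).trim();
    }

    public static void main(String[] args) {
        byte[] field = new byte[128]; // fixed-size string field, zero-filled
        byte[] text = "report.mof".getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(text, 0, field, 0, text.length);
        field[20] = (byte) 0xC3; // simulated junk after the valid text
        field[21] = 0x07;
        System.out.println("'" + decode(field) + "'"); // prints 'report.mof'
    }
}
```

Note that junk bytes which happen to fall inside the printable range would survive this filter. If the sender terminates the text with a null byte, cutting the array at the first 0x00 before decoding may be more robust.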

Related

Java - checking encoding of string for unit test?

I have a unit test I was trying to write for a generateKey(int length) method. The method:
1. Creates a byte array with size of input parameter length
2. Uses SecureRandom().nextBytes(randomKey) method to populate the byte array with random values
3. Encodes the byte array filled with random values to a UTF-8 String object
4. Re-writes the original byte array (called randomKey) to 0's for security
5. Returns the UTF-8 encoded String
I already have a unit test checking for the user inputting a negative value (i.e. -1) such that the byte array would throw a Negative array size exception.
Would a good positive test case be to check that a UTF-8 encoded String is successfully created? Is there a method I can call on the generated String to check that it equals "UTF-8" encoding?
I can't check that the String equals the same String, since the byte array is filled with random values each time it is called....
source code is here:
public static String generateKey(int length) {
byte[] randomKey = new byte[length];
new SecureRandom().nextBytes(randomKey);
String key = new String(randomKey, Charset.forName("UTF-8"));//Base64.getEncoder().encodeToString(randomKey);
Arrays.fill(randomKey,(byte) 0);
return key;
}
You can round-trip a string through a UTF-8 byte array as below. Use the same charset in both directions: a bare getBytes() uses the platform default charset, which may not be UTF-8.
String str = "私の"; // replace this with your generateKey result
// StandardCharsets is in java.nio.charset; it avoids the checked UnsupportedEncodingException
byte[] b = str.getBytes(StandardCharsets.UTF_8);
String newString = new String(b, StandardCharsets.UTF_8);
System.out.println(newString);
System.out.println("size is equal ? " + (str.length() == newString.length()));
First, the code you posted is simply wrong: you can't take a random array of bytes and treat it as a UTF-8 string, because UTF-8 expects certain bit patterns to indicate multi-byte characters.
Unfortunately, this failure happens silently, because you're using a string constructor that "always replaces malformed-input and unmappable-character sequences with this charset's default replacement string". So you'll get something, but you wouldn't be able to translate it back to the same binary value.
The comment in the code, however, gives you the answer: you need to use Base64 to encode the binary value.
However, that still won't let you verify that the encoded string is equivalent to the original byte array, because the function takes great care to zero out the array immediately after use (which doesn't really do what the author thinks it does, but I'm not going to get into that argument).
If you really want to test a method like this, you need to be able to mock out core parts of it. You could, for example, separate out the generation of random bytes from encoding, and then pass in a byte generator that keeps track of the bytes that it generated.
But the real question is why? What are you (or more correctly, the person writing this code) actually trying to accomplish? And why won't a UUID accomplish it?
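As a rough sketch of the refactoring suggested in this answer: the random-byte source is passed in, and the bytes are Base64-encoded instead of being decoded as UTF-8. The class name KeyGenerator and the Consumer<byte[]> parameter are assumptions made for illustration; they are not part of the original code.

```java
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;
import java.util.function.Consumer;

final class KeyGenerator {
    // Production entry point: fills the buffer from SecureRandom.
    static String generateKey(int length) {
        SecureRandom random = new SecureRandom();
        return generateKey(length, random::nextBytes);
    }

    // Test seam (hypothetical): the caller supplies whatever fills the buffer.
    static String generateKey(int length, Consumer<byte[]> filler) {
        byte[] randomKey = new byte[length];
        filler.accept(randomKey);
        // Base64 maps arbitrary bytes to text without losing information.
        String key = Base64.getEncoder().encodeToString(randomKey);
        Arrays.fill(randomKey, (byte) 0);
        return key;
    }

    public static void main(String[] args) {
        byte[][] captured = new byte[1][];
        SecureRandom random = new SecureRandom();
        String key = generateKey(16, buf -> {
            random.nextBytes(buf);
            captured[0] = buf.clone(); // copy before generateKey zeroes the buffer
        });
        byte[] decoded = Base64.getDecoder().decode(key);
        System.out.println(Arrays.equals(captured[0], decoded)); // prints true
    }
}
```

A unit test can then pass a filler that records (or fixes) the bytes and assert that decoding the returned key yields exactly those bytes.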

How to read a byte back in Java?

I need to read in bytes from a file, turn them into a string, do something with the string, then get the bytes back from the string, so I have the following code :
byte[] bFile=readFileBytes(filePath);
StringBuilder massageBuilder=new StringBuilder();
for (int i=0;i<bFile.length;i++) massageBuilder.append(bFile[i]);
String x=massageBuilder.charAt(n)+"";
...
byte b=x.getBytes();
But the last step doesn't give me back the byte. What's wrong? I want to get back the byte behind "massageBuilder.charAt(n)".
You can't get back to the original bytes given how you're adding them to your string builder.
Take this example:
byte[] bFile = "This is the input string".getBytes();
StringBuilder massageBuilder = new StringBuilder();
for (int i = 0; i < bFile.length; i++)
massageBuilder.append(bFile[i]);
When you print massageBuilder, you get
8410410511532105115321161041013210511011211711632115116114105110103
Each byte is appended as its decimal value, which takes between one and three digits (plus a minus sign for negative bytes), and the digits run together. Nothing in the resulting string marks where one byte ends and the next begins, so a run like 8410 could be 84,10 or 8,41,0. Even if you knew the character set of the original text, you'd still have trouble because of these ambiguous sequences.
It might be possible if you used a delimiter of some sort...
massageBuilder.append(bFile[i]).append("~"); // "~" rather than "-", which would clash with the minus sign of negative bytes
//84~104~105~115~32~105~115~32~116~104~101~32~105~110~112~117~116~...
In which case you can split by it and rebuild your byte array.
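A minimal sketch of that round trip, assuming "~" as the delimiter (the class and method names here are invented for the example):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class DelimitedBytes {
    // Write each byte as its decimal value, separated by '~'.
    static String join(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append('~');
            }
            sb.append(bytes[i]); // appends the decimal value, e.g. 84 or -1
        }
        return sb.toString();
    }

    // Split on the delimiter and parse each piece back into a byte.
    static byte[] split(String joined) {
        if (joined.isEmpty()) {
            return new byte[0];
        }
        String[] parts = joined.split("~");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = Byte.parseByte(parts[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] original = "This is the input string".getBytes(StandardCharsets.US_ASCII);
        byte[] restored = split(join(original));
        System.out.println(Arrays.equals(original, restored)); // prints true
    }
}
```

If the goal is only to carry bytes through a String and back, a standard encoding such as Base64 or hex is usually simpler than a hand-rolled delimiter format.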

Convert a byte to an ASCII char in Java

I have a byte array, and each byte in the array corresponds to an ASCII character (8-bit ASCII character). I am trying to get the whole list of ASCII chars from the array.
byte[] data;
ArrayList<Character> qualAr = new ArrayList<>();
for (int i = 0; i < data.length; i++) {
qualAr.add((char)data[i]);
}
The above method did not print all the ASCII chars properly: many of the printed chars showed up as square boxes or empty space. If the issue is that the encoding is not set, how do I set the encoding to ASCII in the above method? Most of the examples I saw were for UTF-8.
Update: Thank you all. The problem was not with the encoding. I found new documentation stating that the values need to be converted using ASCII+33. Without that offset, the values mapped to the initial (control) ASCII chars, which wouldn't make any sense when printed.
Try using the following code:
String dataConverted = new String(data, StandardCharsets.UTF_8); // java.nio.charset.StandardCharsets; no checked exception to handle
ArrayList<Character> qualAr = new ArrayList<>();
for (char c : dataConverted.toCharArray()) {
qualAr.add(c);
}
I convert your byte array to a String, and then generate the list of characters. ASCII characters are encoded as single bytes in UTF-8.
Keep in mind that the first 32 ASCII characters (0x00 to 0x1F) are control characters and may render as boxes or blank spaces.
Here is a link to the basic ASCII table.
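The update in the question says the values need an offset of 33 before they are treated as ASCII. A minimal sketch of that conversion, assuming the raw bytes are small non-negative values (the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

final class OffsetChars {
    // Add a fixed offset to each raw value and interpret the result as a character.
    static List<Character> toChars(byte[] data, int offset) {
        List<Character> out = new ArrayList<>(data.length);
        for (byte b : data) {
            out.add((char) ((b & 0xFF) + offset)); // & 0xFF treats the byte as unsigned
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = {0, 7, 40}; // example raw values
        System.out.println(toChars(data, 33)); // prints [!, (, I]
    }
}
```

With an offset of 33, the value 0 maps to '!' (the first printable ASCII character after space), which is why the unshifted values landed on control characters.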

Java String to byteArray conversion issue

I am trying to encode/decode a ByteArray to String, but input/output are not matching. Am I doing something wrong?
System.out.println(org.apache.commons.codec.binary.Hex.encodeHexString(by));
String s = new String(by, Charsets.UTF_8);
System.out.println(org.apache.commons.codec.binary.Hex.encodeHexString(s.getBytes(Charsets.UTF_8)));
The output is:
130021000061f8f0001a
130021000061efbfbd
Complete code:
String[] arr = {"13", "00", "21", "00", "00", "61", "F8", "F0", "00", "1A"};
byte[] by = new byte[arr.length];
for (int i = 0; i < arr.length; i++) {
by[i] = (byte)(Integer.parseInt(arr[i],16) & 0xff);
}
System.out.println(org.apache.commons.codec.binary.Hex.encodeHexString(by));
String s = new String(by, Charsets.UTF_8);
System.out.println(org.apache.commons.codec.binary.Hex.encodeHexString(s.getBytes(Charsets.UTF_8)));
The problem here is that f8f0001a isn't a valid UTF-8 byte sequence.
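First of all, the f8 opening byte denotes a 5-byte sequence (in the original UTF-8 design) and you've only got four bytes. Secondly, f8 can only be followed by continuation bytes of the form 8x, 9x, ax or bx, and f0 is not one of them. Note also that current UTF-8 (RFC 3629) stops at 4-byte sequences, so f8 is never a valid byte at all.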
Therefore it gets replaced with a unicode replacement character (U+FFFD), whose byte sequence in UTF-8 is efbfbd.
And there (rightly) is no guarantee that the conversion of an invalid byte sequence to and from a string will result in the same byte sequence. (Note that even with two, seemingly identical strings, you might get different bytes representing them in Unicode, see Unicode equivalence. )
The moral of the story is: if you want to represent bytes, don't convert them to characters, and if you want to represent text, don't use byte arrays.
My UTF-8 is a bit rusty :-), but the sequence F8 F0 is imho not a valid utf-8 encoding.
Look at http://en.wikipedia.org/wiki/Utf-8#Description.
When you build the String from the array of bytes, the bytes are decoded.
Since the bytes in your code do not represent valid characters, the bytes that you finally get back from the String are not the same as the ones you passed in.
public String(byte[] bytes)
Constructs a new String by decoding the specified array of bytes using the platform's default charset. The length of the new String is a function of the charset, and hence may not be equal to the length of the byte array.
The behavior of this constructor when the given bytes are not valid in the default charset is unspecified. The CharsetDecoder class should be used when more control over the decoding process is required.
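To illustrate the CharsetDecoder route mentioned in the documentation above, here is a small sketch that makes decoding fail loudly instead of silently substituting U+FFFD. The class name StrictUtf8 and the method decodeStrict are invented for the example; the byte values are the ones from the question.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictUtf8 {
    // Decode as UTF-8, throwing instead of replacing malformed input.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) {
        byte[] by = {0x13, 0x00, 0x21, 0x00, 0x00, 0x61,
                     (byte) 0xF8, (byte) 0xF0, 0x00, 0x1A};
        try {
            System.out.println(decodeStrict(by));
        } catch (CharacterCodingException e) {
            // Reached for this input, because F8 F0 is not valid UTF-8.
            System.out.println("Not valid UTF-8: " + e);
        }
    }
}
```

A check like this is useful when the bytes are supposed to be text. When they are arbitrary binary data, keeping them as a byte array (or hex/Base64 text) avoids the problem entirely.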

Prepend a byte to a byte array

I need to prepend the string "00" or the byte 0x00 to the beginning of a byte array. I tried to do it with a for loop, but when I convert the result to hex, it doesn't show up at the front.
The string "00" is different from the number 0x00 when converted to bytes. What is the data type you are trying to prepend to your byte array? Assuming it's a byte representation of the string "00", try the following:
byte[] orig = {0x12, 0x34}; // example values; substitute your byte array
String prepend = "00";
// The characters '0','0' become the bytes 0x30 0x30 (java.nio.charset.StandardCharsets)
byte[] prependBytes = prepend.getBytes(StandardCharsets.US_ASCII);
byte[] output = new byte[prependBytes.length + orig.length];
// Copy the prefix first
for (int i = 0; i < prependBytes.length; i++) {
output[i] = prependBytes[i];
}
// Then copy the original bytes, shifted past the prefix
for (int i = 0; i < orig.length; i++) {
output[prependBytes.length + i] = orig[i];
}
Or you can use System.arraycopy(...) instead of the two for loops to do the prepending. See How to combine two byte arrays
Alternatively, if you are trying to literally prepend zero bytes to your byte array, declare prependBytes in the following way and use the same algorithm (use new byte[]{0} for a single 0x00 byte):
byte[] prependBytes = new byte[]{0,0};
Also, you say that you're converting your byte array to hex, and that conversion may truncate leading zeros. To test this, try prepending the following, converting to hex, and seeing whether the output differs:
byte[] prependBytes = new byte[]{1,1};
If it is removing the leading zeros that you want, you may wish to convert your hex number to a string and format it.
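Putting the pieces together, a minimal sketch of prepending a single 0x00 byte with System.arraycopy and printing hex without losing the leading zero might look like this (the class and method names are made up for the example):

```java
import java.math.BigInteger;

final class PrependByte {
    // Build a new array with 'first' in front of the original contents.
    static byte[] prepend(byte first, byte[] orig) {
        byte[] out = new byte[orig.length + 1];
        out[0] = first;
        System.arraycopy(orig, 0, out, 1, orig.length);
        return out;
    }

    // Format every byte as exactly two hex digits, so leading zeros survive.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] orig = {0x1A, (byte) 0xFF}; // example values
        byte[] withZero = prepend((byte) 0x00, orig);
        System.out.println(toHex(withZero)); // prints 001aff
        // A numeric conversion drops the leading zero byte:
        System.out.println(new BigInteger(1, withZero).toString(16)); // prints 1aff
    }
}
```

The second print shows one way a leading 0x00 can seem to vanish: converting through a number discards leading zeros, while per-byte formatting keeps them.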
