Java - checking encoding of string for unit test?


I have a unit test I was trying to write for a generateKey(int length) method. The method:
1. Creates a byte array with size of input parameter length
2. Uses SecureRandom().nextBytes(randomKey) method to populate the byte array with random values
3. Encodes the byte array filled with random values to a UTF-8 String object
4. Re-writes the original byte array (called randomKey) to 0's for security
5. Returns the UTF-8 encoded String
I already have a unit test that checks the case where the user passes a negative value (e.g. -1), so that allocating the byte array throws a NegativeArraySizeException.
Would a good positive test case be to check that a UTF-8 encoded String is successfully created? Is there a method I can call on the generated String to check that it equals "UTF-8" encoding?
I can't check that the String equals the same String, since the byte array is filled with random values each time it is called....
source code is here:
public static String generateKey(int length) {
byte[] randomKey = new byte[length];
new SecureRandom().nextBytes(randomKey);
String key = new String(randomKey, Charset.forName("UTF-8"));//Base64.getEncoder().encodeToString(randomKey);
Arrays.fill(randomKey,(byte) 0);
return key;
}

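You can round-trip a UTF-8 string through a byte array as below:
// requires: import java.nio.charset.StandardCharsets;
String str = "私の"; // replace this with your generateKey result
// Encode explicitly as UTF-8 instead of relying on the platform default charset
byte[] b = str.getBytes(StandardCharsets.UTF_8);
// The Charset overloads do not throw UnsupportedEncodingException, so no try/catch is needed
String newString = new String(b, StandardCharsets.UTF_8);
System.out.println(newString);
System.out.println("size is equal ? " + (str.length() == newString.length()));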

First, the code you posted is simply wrong: you can't take a random array of bytes and treat it as a UTF-8 string, because UTF-8 expects certain bit patterns to indicate multi-byte characters.
Unfortunately, this failure happens silently, because you're using a string constructor that "always replaces malformed-input and unmappable-character sequences with this charset's default replacement string". So you'll get something, but you wouldn't be able to translate it back to the same binary value.
The comment in the code, however, gives you the answer: you need to use Base64 to encode the binary value.
However, that still won't let you verify that the encoded string is equivalent to the original byte array, because the function takes great care to zero out the array immediately after use (which doesn't really do what the author thinks it does, but I'm not going to get into that argument).
If you really want to test a method like this, you need to be able to mock out core parts of it. You could, for example, separate out the generation of random bytes from encoding, and then pass in a byte generator that keeps track of the bytes that it generated.
But the real question is why? What are you (or more correctly, the person writing this code) actually trying to accomplish? And why won't a UUID accomplish it?
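The idea of separating byte generation from encoding could be sketched as follows. This is only an illustration under stated assumptions: the class name KeySketch, the two-argument generateKey overload, and the Consumer<byte[]> byte source are hypothetical and not part of the original code, and Base64 is swapped in for the UTF-8 decoding, as the commented-out line suggests.

```java
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;
import java.util.function.Consumer;

class KeySketch {
    // Production entry point: random bytes from SecureRandom, Base64-encoded.
    static String generateKey(int length) {
        return generateKey(length, new SecureRandom()::nextBytes);
    }

    // Hypothetical overload: the byte source is injected so a test can observe it.
    static String generateKey(int length, Consumer<byte[]> byteSource) {
        byte[] randomKey = new byte[length];
        byteSource.accept(randomKey);
        String key = Base64.getEncoder().encodeToString(randomKey);
        Arrays.fill(randomKey, (byte) 0);
        return key;
    }

    public static void main(String[] args) {
        SecureRandom random = new SecureRandom();
        byte[][] seen = new byte[1][];
        // Recording source: fills the array, then keeps a copy before it is zeroed.
        String key = generateKey(16, bytes -> {
            random.nextBytes(bytes);
            seen[0] = bytes.clone();
        });
        byte[] decoded = Base64.getDecoder().decode(key);
        System.out.println("round trip ok: " + Arrays.equals(seen[0], decoded));
    }
}
```

A test can then assert that decoding the returned key yields exactly the bytes the source produced, which is not possible when random bytes are decoded as UTF-8.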

Related

How to read a byte back in Java?

I need to read in bytes from a file, turn them into a string, do something with the string, then get the bytes back from the string, so I have the following code :
byte[] bFile=readFileBytes(filePath);
StringBuilder massageBuilder=new StringBuilder();
for (int i=0;i<bFile.length;i++) massageBuilder.append(bFile[i]);
String x=massageBuilder.charAt(n)+"";
...
byte b=x.getBytes();
But the last step doesn't give me the byte back. What's wrong? I want to get back the byte behind "massageBuilder.charAt(n)".

You can't get back to the original bytes given how you're adding them to your string builder.
Take this example:
byte[] bFile = "This is the input string".getBytes();
StringBuilder massageBuilder = new StringBuilder();
for (int i = 0; i < bFile.length; i++)
massageBuilder.append(bFile[i]);
When you print massageBuilder, you get
8410410511532105115321161041013210511011211711632115116114105110103
This is just a run of digits with no way of telling where one original byte ends and the next begins. Each input byte becomes one or more characters in the resulting string. Even if you knew the character set of the original text, you'd still have trouble because of ambiguous sequences.
It might be possible if you used a delimiter of some sort...
massageBuilder.append(bFile[i]).append("~");
//84~104~105~115~32~105~115~32~116~104~101~32~105~110~112~117~116~...
In which case you can split by it and rebuild your byte array.
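A minimal sketch of that split-and-rebuild step, assuming "~" as the delimiter (a "-" would collide with the minus sign of negative byte values). The class and method names here are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class DelimitedBytes {
    // Write each byte as a decimal number followed by the delimiter.
    static String join(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(b).append('~');
        }
        return sb.toString();
    }

    // Split on the delimiter and parse each number back into a byte.
    static byte[] split(String text) {
        if (text.isEmpty()) {
            return new byte[0];
        }
        String[] parts = text.split("~"); // trailing empty strings are dropped
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = Byte.parseByte(parts[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] original = "This".getBytes(StandardCharsets.US_ASCII);
        String joined = join(original);
        System.out.println(joined); // prints 84~104~105~115~
        System.out.println(Arrays.equals(original, split(joined))); // prints true
    }
}
```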

Java String from byte array

I am currently reading in a UDP byte array that I know is a string and I know the MAXIMUM possible length of said string. So I print out a string (which is usually shorter than the max length). I am able to print it out but it prints out the text then junk characters. Is there a way to trim the junk binary data without knowing the actual length of the valid text?
String result = new String(input, Charset.forName("US-ASCII"));
I'll try, for those asking for more data. Here is how the UDP message is read:
sock.receive(incoming);
byte[] data = incoming.getData();
String s = new String(data, 0, incoming.getLength());
The UDP message itself contains a header of fixed size and then a set of data (max size of 1024 bytes). This data may be int, string, byte etc., as determined by the header data. So depending on the type, I chop the data into chunks of the appropriate size. The problem I am focusing on is the String type of data. I know that the max size of a string will be 128 bytes, so I read that amount in chunks as follows, where dataArray is the byte array:
for (int i = 0; i < msg.length; i = i + readSize)
{
dataArray = Arrays.copyOfRange(msg, i, i + readSize);
}
Then I use the original code in the first code set in this post to place the data into a string object. Thing is, the text that is usually sent is less than the 128 bytes allocated for max size. So when I print the string, I get the valid text and then whitespace and non-normal ascii characters (junk data). Hope this addition helps.
An example of the output is here. Everything up to the .mof is valid:
https://1drv.ms/i/s!Ai0t7Oj1PUFBpRP9K_2RlocAK4B7

Is there a way to trim the junk binary data without knowing the actual
length of the valid text?
Yes, you can simply call trim(); it will remove the trailing null characters. trim() removes all leading and trailing characters less than or equal to \u0020 (the space character), which includes \u0000 (the null character).
byte[] bytes = "foo bar".getBytes();
// Simulate message with a size bigger than the actual encoded String
byte[] msg = new byte[32];
System.arraycopy(bytes, 0, msg, 0, bytes.length);
// Decode the message
String result = new String(msg, Charset.forName("US-ASCII"));
// Trim the result
System.out.printf("Result: '%s'%n", result.trim());
Output:
Result: 'foo bar'

Ok here is how I was able to get it to work. It's a rather manual method but before using
String result = new String(input, Charset.forName("US-ASCII"));
to combine the byte array into a string, I looked at each byte and made sure it was within the printable range of 0x20 - 0x7e. If not, I replaced the value with a space (0x20). Then finished off with a .trim on the string.
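The manual approach described above might look roughly like this. The class and method names are hypothetical, and the sketch assumes the payload is plain US-ASCII text followed by padding or junk.

```java
import java.nio.charset.StandardCharsets;

class PrintableAscii {
    // Replace every byte outside the printable range 0x20-0x7e with a space,
    // decode as US-ASCII, then trim the padding that is left at the ends.
    static String decodePrintable(byte[] input) {
        byte[] cleaned = input.clone(); // leave the caller's buffer untouched
        for (int i = 0; i < cleaned.length; i++) {
            // Java bytes are signed, so values 0x80-0xFF are negative and also fail this check
            if (cleaned[i] < 0x20 || cleaned[i] > 0x7e) {
                cleaned[i] = 0x20;
            }
        }
        return new String(cleaned, StandardCharsets.US_ASCII).trim();
    }

    public static void main(String[] args) {
        byte[] msg = new byte[16];
        byte[] text = "foo.mof".getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(text, 0, msg, 0, text.length);
        msg[10] = (byte) 0xC3; // simulated junk after the valid text
        System.out.printf("Result: '%s'%n", decodePrintable(msg)); // prints Result: 'foo.mof'
    }
}
```

Note that junk bytes which happen to fall inside the printable range would survive this filter, so it only helps when the trailing data is non-printable.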

Apache common codec in java from string to hex and viceversa

I am trying to encode a string in hex and then convert it again to string. For this purpose I'm using the apache common codec. In particular I have defined the following methods:
import org.apache.commons.codec.DecoderException;
import org.apache.commons.codec.binary.Hex;
public String toHex(byte[] byteArray){
return Hex.encodeHexString(byteArray);
}
public byte[] fromHex(String hexString){
byte[] array = null;
try {
array = Hex.decodeHex(hexString.toCharArray());
} catch (DecoderException ex) {
Logger.getLogger(SecureHash.class.getName()).log(Level.SEVERE, null, ex);
}
return array;
}
The strange thing is that I do not get the same initial string when converting back. Stranger still, the byte array I get is different from the initial byte array of the string.
The small test program that I wrote is the following:
String uno = "uno";
byte[] uno_bytes = uno.getBytes();
System.out.println(uno);
System.out.println(uno_bytes);
String hexed = toHex(uno_bytes);
System.out.println(hexed);
byte [] arr = fromHex(hexed);
System.out.println(arr.toString());
An example of output is the following:
uno #initial string
[B@1afe17b #byte array of the initial string
756e6f #string representation of the hex
[B@34d46a #byte array of the recovered string
There is also another strange behaviour. The byte array ([B@1afe17b) is not fixed, but is different from run to run of the code, but I cannot understand why.

When you print a byte array, the toString() representation does not contain the contents of the array. Instead, it contains a type indicator ([B means byte array) and the hashcode. The hashcode will be different for two distinct byte arrays, even if they contain the same contents. See Object.toString() and Object.hashCode() for further information.
Instead, you probably want to compare the arrays for equality, using:
System.out.println("Arrays equal: " + Arrays.equals(uno_bytes, arr));
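To make the difference concrete, here is a small JDK-only sketch (the class name ArrayPrinting is made up) that prints the contents with Arrays.toString and compares contents with Arrays.equals, in contrast to the identity-based methods inherited from Object.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ArrayPrinting {
    // Human-readable contents, unlike byte[].toString().
    static String describe(byte[] bytes) {
        return Arrays.toString(bytes);
    }

    // Content comparison, unlike byte[].equals(), which compares identity.
    static boolean sameContents(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        byte[] unoBytes = "uno".getBytes(StandardCharsets.US_ASCII);
        byte[] copy = unoBytes.clone();
        System.out.println(describe(unoBytes));           // prints [117, 110, 111]
        System.out.println(sameContents(unoBytes, copy)); // prints true
        System.out.println(unoBytes.equals(copy));        // prints false: different objects
    }
}
```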

Converting a string to byte[] such that the contents remain same

I have a String say String a = "abc";. Now I want to convert it into a byte array say byte b[];, so that when I print b it should show "abc".
How can I do that?
The getBytes() method gives a different result.
My program looks like that so far:
String a="abc";
byte b[]=a.getBytes();
What I want is this: I have two methods in a class. One is
public byte[] encrypt(String a)
and the other is
public String decrypt(byte[] b)
After encrypting, I saved the data to a database. When I read it back as bytes, I don't get the correct output, but I do get the same data when I read it as a String, and now I have to pass it to decrypt(byte[] b).
How do I do that? This is the real scenario.

Well, your first problem is that a String in Java is not an array of bytes, but of chars, each of which takes 16 bits. This is to cover Unicode characters, instead of only the ASCII you'd get with bytes. That means getBytes has to encode the chars using a charset, and depending on the charset a single character can take one, two, or more bytes, so you can't assume one array position per character.
What you could do is use getChars and then cast each char to a byte, with the corresponding loss of precision. This is not a good idea since it won't work outside of normal English characters! You asked, though, so here you go ;)
EDIT: as @PeterLawerey mentions, Unicode characters make it even harder, with some Unicode characters needing more than one char. There's a good discussion on Stack Overflow, and it links to a detailed article from Oracle.

byte b[]=a.getBytes();
System.out.println(new String(b));

You could use this constructor to build your string back again:
String a="abc";
byte b[]=a.getBytes("UTF-8");
System.out.println(new String(b, "UTF-8"));
Other than that, you can't do System.out.println(b) and expect to see abc.

A byte is a value between -128 and 127. When you print it, it will be a number by default.
If you want to print it as an ASCII char, you can cast it to a (char)
byte[] bytes = "abc".getBytes();
for(byte b: bytes)
System.out.println((char) b);
prints
a
b
c

It seems like you are implementing encryption and decryption code.
String constructors are for text data. You should not use them to convert a byte array that contains encrypted data into a String.
You should use Base64 instead, which encodes any binary data as ASCII.
This is a good public-domain implementation:
http://iharder.sourceforge.net/current/java/base64/
Apache Commons Codec also provides Base64:
http://commons.apache.org/codec/download_codec.cgi
byte[] data = "abc".getBytes("UTF-8"); // stand-in for the encrypted bytes
String encoded = Base64.encodeBytes(data); // binary -> ASCII text that is safe to store
byte[] decoded = Base64.decode(encoded); // ASCII text -> the original bytes
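On Java 8 and later the same round trip can be done with the JDK's own java.util.Base64, without either of the libraries linked above. The sketch below uses a made-up class name and made-up sample bytes standing in for real ciphertext.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class Base64RoundTrip {
    // Binary data -> ASCII text that survives storage in a text column.
    static String toText(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // ASCII text -> exactly the bytes that were encoded.
    static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Sample bytes that are not valid text in most charsets.
        byte[] cipher = {(byte) 0x00, (byte) 0xFF, (byte) 0x80, 0x41};
        String stored = toText(cipher);
        System.out.println(Arrays.equals(cipher, fromText(stored))); // prints true
        System.out.println(toText("abc".getBytes(StandardCharsets.UTF_8))); // prints YWJj
    }
}
```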

This will convert "abc" to byte and then the code will print "abc" in respective ASCII code (ie. 97 98 99).
byte a[]=new byte[160];
String s="abc";
a=s.getBytes();
for(int i=0;i<s.length();i++)
{
System.out.print(a[i]+" ");
}
If you add these lines it will again change the ASCII code to String (ie. abc)
String s1=new String(a);
System.out.print("\n"+s1);
Hope it helps.
Modified Code:
To send byte array as an argument:
public static void another_method_name(byte b1[])
{
String s1=new String(b1);
System.out.print("\n"+s1);
}
public static void main(String[] args)
{
byte a[]=new byte[160];
String s="abc";
a=s.getBytes();
for(int i=0;i<s.length();i++)
{
System.out.print(a[i]+" ");
}
another_method_name(a);
}
Hope it helps again.

Java Encode SHA-1 Byte Array

I have a SHA-1 byte array that I would like to use in a GET request. I need to encode this. URLEncoder expects a string, and if I create a string from it and then encode it, it gets corrupted.
To clarify, this is kind of a follow-up to another question of mine.
(BitTorrent Tracker Request) I can get the value as a hex string, but that is not recognized by the tracker. On the other hand, the encoded value provided in the marked answer returns 200 OK.
So I need to convert the hex representation that I got:
9a81333c1b16e4a83c10f3052c1590aadf5e2e20
into encoded form
%9A%813%3C%1B%16%E4%A8%3C%10%F3%05%2C%15%90%AA%DF%5E.%20

Question was edited while I was responding, following is ADDITIONAL code and should work (with my hex conversion code):
//Inefficient, but functional, does not test if input is in hex charset, so somewhat unsafe
//NOT tested, but should be functional
public static String encodeURL(String hexString) throws Exception {
if(hexString==null || hexString.isEmpty()){
return "";
}
if(hexString.length()%2 != 0){
throw new Exception("String is not hex, length NOT divisible by 2: "+hexString);
}
int len = hexString.length();
char[] output = new char[len+len/2];
int i=0;
int j=0;
while(i<len){
output[j++]='%';
output[j++]=hexString.charAt(i++);
output[j++]=hexString.charAt(i++);
}
return new String(output);
}
You'll need to convert the raw bytes to hexadecimal characters or whatever URL-friendly encoding they are using. Base32 or Base64 encodings are possible, but straight hexadecimal characters is the most common. URLEncoder is not needed for this string, because it shouldn't contain any characters that would require URL Encoding to %NN format.
The below will convert bytes for a hash (SHA-1, MD5SUM, etc) to a hexadecimal string:
/** Lookup table: character for a half-byte */
static final char[] CHAR_FOR_BYTE = {'0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F'};
/** Encode byte data as a hex string... hex chars are UPPERCASE*/
public static String encode(byte[] data){
if(data == null || data.length==0){
return "";
}
char[] store = new char[data.length*2];
for(int i=0; i<data.length; i++){
final int val = (data[i]&0xFF);
final int charLoc=i<<1;
store[charLoc]=CHAR_FOR_BYTE[val>>>4];
store[charLoc+1]=CHAR_FOR_BYTE[val&0x0F];
}
return new String(store);
}
This code is fairly optimized and fast, and I am using it for my own SHA-1 byte encoding. Note that you may need to convert uppercase to lowercase with the String.toLowerCase() method, depending on which form the server accepts.
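As an alternative sketch, the raw hash bytes can be percent-encoded directly, leaving unreserved characters (letters, digits, '-', '_', '.', '~') as literals. This assumes the tracker expects that form, which is what the target string in the question suggests; the class and method names are hypothetical.

```java
class InfoHashEncoder {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Parse a hex string such as "9a81..." into raw bytes.
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string has odd length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Percent-encode every byte except the URL-unreserved characters.
    static String percentEncode(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 3);
        for (byte b : data) {
            int v = b & 0xFF; // treat the byte as unsigned
            if (isUnreserved(v)) {
                sb.append((char) v);
            } else {
                sb.append('%').append(HEX[v >>> 4]).append(HEX[v & 0x0F]);
            }
        }
        return sb.toString();
    }

    private static boolean isUnreserved(int v) {
        return (v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z') || (v >= '0' && v <= '9')
                || v == '-' || v == '_' || v == '.' || v == '~';
    }

    public static void main(String[] args) {
        byte[] hash = hexToBytes("9a81333c1b16e4a83c10f3052c1590aadf5e2e20");
        // prints %9A%813%3C%1B%16%E4%A8%3C%10%F3%05%2C%15%90%AA%DF%5E.%20
        System.out.println(percentEncode(hash));
    }
}
```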

This depends on what the recipient of your request expects.
I would imagine it could be a hexadecimal representation of the bytes in your hash. A string would probably not be the best idea, because the hash array will most likely contain non-printable character values.
I'd iterate over the array and use Integer.toHexString(b & 0xFF) to convert each byte to hex, padding single-digit results with a leading zero.

SHA1 is in hex format [0-9a-f], there should be no need to URLEncode it.

Use Apache Commons-Codec for all your encoding/decoding needs (except ASN.1, which is a pain in the ass)
