Store byte[] in MongoDB using Java - java

I insert a document into a collection with a field of byte[]. When I query the inserted document to get that field, it returns a different byte[]. How do I fix that?
byte[] inputBytes = ...
MongoCollection<Document> collection = _db.getCollection("collectionx");
Document doc = new Document("test", 1).append("val", inputBytes);
collection.insertOne(doc);
MongoCursor<Document> cursor = collection.find(eq("test", 1)).iterator();
Document retrieved_doc = cursor.next();
cursor.close();
byte[] outputBytes = ((Binary) retrieved_doc.get("val")).getData();
// inputBytes = [B@719f369d
// outputBytes = [B@7b70cec2

The problem is not your code but how you check whether the two arrays, input and output, are equal. It seems you are just comparing the results of calling toString() on both arrays. But toString() is not overridden for array types, so it is actually Object.toString(), which just returns the type and hash code of the object:
The toString method for class Object returns a string consisting of
the name of the class of which the object is an instance, the at-sign
character `@', and the unsigned hexadecimal representation of the hash
code of the object. In other words, this method returns a string equal
to the value of:
getClass().getName() + '@' + Integer.toHexString(hashCode())
So [B@719f369d means: 'Array of bytes' ([B) with hash code 0x719f369d. It has nothing to do with the array content.
In your example, the input and output arrays are two different objects, hence they have different memory addresses and hash codes (because hashCode() is also not overridden for array types).
Solution
If you want to compare the contents of two byte arrays, call Arrays.equals(byte[], byte[]).
If you want to print the content of a byte array, call Arrays.toString(byte[]) to convert the content into a human readable String.
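As a minimal sketch of the difference (the array values are made up for illustration, and clone() merely stands in for the copy that comes back from the database):

```java
import java.util.Arrays;

class ByteArrayCompare {
    public static void main(String[] args) {
        byte[] inputBytes = {1, 2, 3};
        byte[] outputBytes = inputBytes.clone(); // same content, different object

        System.out.println(inputBytes == outputBytes);              // false: two distinct objects
        System.out.println(inputBytes.equals(outputBytes));         // false: Object.equals compares references
        System.out.println(Arrays.equals(inputBytes, outputBytes)); // true: compares the contents
        System.out.println(Arrays.toString(outputBytes));           // [1, 2, 3]
    }
}
```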

MongoDB supports the org.bson.types.Binary type.
You can use:
new BasicDBObject("_id", new Binary(session.getIp().getAddress()))
and binary comparisons will work.

You can also Base64-encode the byte array, store the resulting string in the document, and decode it after retrieval to get the original bytes back.
static String encodeBase64String(byte[] binaryData)
Encodes binary data using the base64 algorithm but does not chunk the output.
static byte[] decodeBase64(String base64String)
Decodes a Base64 String into octets.
Please refer to this link: https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/binary/Base64.html
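A rough sketch of that round trip follows. Note the swap: it uses the JDK's java.util.Base64 (Java 8+) rather than the commons-codec class linked above, so that it runs without extra dependencies. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.Base64;

class Base64RoundTrip {
    // Encode before storing the value as a String field.
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Decode after reading the String field back.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] inputBytes = {1, 2, 3};
        String stored = encode(inputBytes);
        byte[] outputBytes = decode(stored);
        System.out.println(stored);                                  // AQID
        System.out.println(Arrays.equals(inputBytes, outputBytes)); // true
    }
}
```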

Related

Java - checking encoding of string for unit test?

I have a unit test I was trying to write for a generateKey(int length) method. The method:
1. Creates a byte array with size of input parameter length
2. Uses SecureRandom().nextBytes(randomKey) method to populate the byte array with random values
3. Encodes the byte array filled with random values to a UTF-8 String object
4. Re-writes the original byte array (called randomKey) to 0's for security
5. Returns the UTF-8 encoded String
I already have a unit test checking for the user inputting a negative value (i.e. -1) such that the byte array would throw a Negative array size exception.
Would a good positive test case be to check that a UTF-8 encoded String is successfully created? Is there a method I can call on the generated String to check that it equals "UTF-8" encoding?
I can't check that the String equals the same String, since the byte array is filled with random values each time it is called....
source code is here:
public static String generateKey(int length) {
byte[] randomKey = new byte[length];
new SecureRandom().nextBytes(randomKey);
String key = new String(randomKey, Charset.forName("UTF-8"));//Base64.getEncoder().encodeToString(randomKey);
Arrays.fill(randomKey,(byte) 0);
return key;
}
You can convert a UTF-8 string to a byte array and back as below:
String str = "私の"; // replace this with your generateKey result
byte[] b = str.getBytes(StandardCharsets.UTF_8); // java.nio.charset.StandardCharsets; explicit charset, no checked exception
String newString = new String(b, StandardCharsets.UTF_8);
System.out.println(newString);
System.out.println("size is equal ? " + (str.length() == newString.length()));
First, the code you posted is simply wrong: you can't take a random array of bytes and treat it as a UTF-8 string, because UTF-8 expects certain bit patterns to indicate multi-byte characters.
Unfortunately, this failure happens silently, because you're using a string constructor that "always replaces malformed-input and unmappable-character sequences with this charset's default replacement string". So you'll get something, but you wouldn't be able to translate it back to the same binary value.
The comment in the code, however, gives you the answer: you need to use Base64 to encode the binary value.
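To illustrate the point, here is a small sketch with a fixed byte sequence that is not valid UTF-8 (the bytes and the helper names are chosen purely for illustration): decoding it as UTF-8 and re-encoding does not give the original bytes back, while Base64 does.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class RandomBytesRoundTrip {
    // Decode as UTF-8, then re-encode. Malformed sequences are silently replaced.
    static boolean survivesUtf8(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8);
        return Arrays.equals(data, text.getBytes(StandardCharsets.UTF_8));
    }

    // Base64 maps every possible byte sequence to text and back without loss.
    static boolean survivesBase64(byte[] data) {
        String text = Base64.getEncoder().encodeToString(data);
        return Arrays.equals(data, Base64.getDecoder().decode(text));
    }

    public static void main(String[] args) {
        // 0xC3 starts a two-byte sequence but 0x28 is not a continuation byte; 0xFF never occurs in UTF-8.
        byte[] notUtf8 = {(byte) 0xC3, (byte) 0x28, (byte) 0xFF};
        System.out.println(survivesUtf8(notUtf8));   // false
        System.out.println(survivesBase64(notUtf8)); // true
    }
}
```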
However, that still won't let you verify that the encoded string is equivalent to the original byte array, because the function takes great care to zero out the array immediately after use (which doesn't really do what the author thinks it does, but I'm not going to get into that argument).
If you really want to test a method like this, you need to be able to mock out core parts of it. You could, for example, separate out the generation of random bytes from encoding, and then pass in a byte generator that keeps track of the bytes that it generated.
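One possible shape for that separation is sketched below. It is only an illustration under assumptions: the two-argument overload, the Consumer<byte[]> parameter and the class name are invented here, and Base64 is used for the encoding as the commented-out code in the question suggests.

```java
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;
import java.util.function.Consumer;

class KeyGenerator {
    // Production entry point: fills the buffer from SecureRandom.
    static String generateKey(int length) {
        SecureRandom random = new SecureRandom();
        return generateKey(length, random::nextBytes);
    }

    // Testable variant: the caller decides how the buffer gets filled (hypothetical overload).
    static String generateKey(int length, Consumer<byte[]> filler) {
        byte[] randomKey = new byte[length];
        filler.accept(randomKey);
        String key = Base64.getEncoder().encodeToString(randomKey);
        Arrays.fill(randomKey, (byte) 0);
        return key;
    }

    public static void main(String[] args) {
        byte[][] recorded = new byte[1][];
        String key = generateKey(4, buffer -> {
            Arrays.fill(buffer, (byte) 7);  // deterministic bytes for the test
            recorded[0] = buffer.clone();   // remember them before they are zeroed
        });
        System.out.println(Arrays.equals(recorded[0], Base64.getDecoder().decode(key))); // true
    }
}
```

A test can then pass in a filler that records what it produced and compare that against the decoded key.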
But the real question is why? What are you (or more correctly, the person writing this code) actually trying to accomplish? And why won't a UUID accomplish it?

Transform String value to bytearray java

I have a string that contains a byte array's String value, for example [B@42031498
I want to retrieve the String content as byte[]. How can I do that?
PS: converting the string with the String.getBytes() method doesn't work. It converts the string but doesn't give me the value as a byte array. It works like this.
If it's not possible, is there a way to get byte[] from an Object in Java (again without converting)?
converting the string with the String.getBytes() method doesn't work. It converts the string but doesn't give me the value as a byte array.
Yes it does.
You have two problems:
you try and print the array directly; you should use Arrays.toString(), otherwise the .toString() method of the array itself is called;
you don't specify the encoding; you should really use .getBytes(StandardCharsets.UTF_8) to have the same result on all environments.
In the same manner, building a string from a byte array should be done using the correct encoding: new String(array, StandardCharsets.UTF_8).
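A short sketch of both points (the sample string is arbitrary):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class StringBytesDemo {
    public static void main(String[] args) {
        String original = "uno";
        byte[] bytes = original.getBytes(StandardCharsets.UTF_8);

        System.out.println(bytes);                  // type and hash code only; the hex part varies per run
        System.out.println(Arrays.toString(bytes)); // [117, 110, 111]

        String restored = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(restored.equals(original)); // true
    }
}
```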
If [B@42031498 has already been saved into a String, there is no way you can get this back to the originating byte array. Look at the following example:
String str = "[B@42031498";
byte[] ba = str.getBytes();
String s = new String(ba);
System.out.println(s);
This will output [B@42031498
What you have done:
byte[] array = ....
String result = array.toString();
What you (probably) want:
String result = new String(array, "UTF-8");
Iterate over the byte array as below and you will get each byte value:
for (byte b : bytes) {
System.out.println(b);
}
You get the output [B@42031498 because of the Object class's toString() method:
public String toString()
{
return getClass().getName() + "@" + Integer.toHexString(hashCode());
}

Apache common codec in java from string to hex and viceversa

I am trying to encode a string in hex and then convert it again to string. For this purpose I'm using the apache common codec. In particular I have defined the following methods:
import org.apache.commons.codec.DecoderException;
import org.apache.commons.codec.binary.Hex;
public String toHex(byte[] byteArray){
return Hex.encodeHexString(byteArray);
}
public byte[] fromHex(String hexString){
byte[] array = null;
try {
array = Hex.decodeHex(hexString.toCharArray());
} catch (DecoderException ex) {
Logger.getLogger(SecureHash.class.getName()).log(Level.SEVERE, null, ex);
}
return array;
}
The strange thing is that I do not get the same initial string when converting back. Even more strangely, the byte array I get is different from the initial byte array of the string.
The small test program that I wrote is the following:
String uno = "uno";
byte[] uno_bytes = uno.getBytes();
System.out.println(uno);
System.out.println(uno_bytes);
String hexed = toHex(uno_bytes);
System.out.println(hexed);
byte [] arr = fromHex(hexed);
System.out.println(arr.toString());
An example of output is the following:
uno #initial string
[B@1afe17b #byte array of the initial string
756e6f #string representation of the hex
[B@34d46a #byte array of the recovered string
There is also another strange behaviour. The byte array ([B@1afe17b) is not fixed, but differs from run to run of the code, and I cannot understand why.
When you print a byte array, the toString() representation does not contain the contents of the array. Instead, it contains a type indicator ([B means byte array) and the hashcode. The hashcode will be different for two distinct byte arrays, even if they contain the same contents. See Object.toString() and Object.hashCode() for further information.
Instead, you probably want to compare the arrays for equality, using:
System.out.println("Arrays equal: " + Arrays.equals(uno_bytes, arr));
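For a self-contained check of the whole round trip, here is a sketch that swaps in hand-written hex helpers for the commons-codec Hex calls from the question, so it runs on the JDK alone. The helper bodies are illustrative, not the library's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class HexRoundTrip {
    // Stand-in for Hex.encodeHexString: two lowercase hex digits per byte.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    // Stand-in for Hex.decodeHex: parses two hex digits per byte.
    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have an even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] unoBytes = "uno".getBytes(StandardCharsets.UTF_8);
        String hexed = toHex(unoBytes);
        byte[] arr = fromHex(hexed);
        System.out.println(hexed);                                        // 756e6f
        System.out.println("Arrays equal: " + Arrays.equals(unoBytes, arr)); // Arrays equal: true
        System.out.println(new String(arr, StandardCharsets.UTF_8));      // uno
    }
}
```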

Get same unique id for same string

Is there a way to get the same integer/long (or any number) for a given string using Java?
E.g. if I give the string "Java_programming", it should always give me something like "7287272".
The generated number should be unique, i.e. it should always generate "123" for "xyz", and never "123" for "abc".
Call the hashCode method of your String object.
I.e :
String t = "Java_programming";
String t2 = "Java_programming";
System.out.println(t.hashCode());
System.out.println(t2.hashCode());
Gives :
748582308
748582308
Using hashCode in this case will meet your requirements (as you formulated them), but be careful: two different Strings can produce the SAME hashCode value! (see @Dirk's example)
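A well-known pair that collides, as a quick illustration:

```java
class HashCollisionDemo {
    public static void main(String[] args) {
        // String.hashCode() computes s[0]*31^(n-1) + ... + s[n-1]
        System.out.println("Aa".hashCode());   // 2112  (65*31 + 97)
        System.out.println("BB".hashCode());   // 2112  (66*31 + 66)
        System.out.println("Aa".equals("BB")); // false
    }
}
```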
What are your requirements? You could just use new BigInteger("somestring".getBytes(StandardCharsets.UTF_8)).toString() to convert the string to a number, but will it do what you want?
Why don't you create a SHA-1 out of the string and use that as a key?
static HashFunction hashFunction = Hashing.sha1();
public static byte[] getHash(final String string) {
HashCode hashCode = hashFunction.newHasher().putBytes(string.getBytes(StandardCharsets.UTF_8)).hash();
return hashCode.asBytes();
}
You can then do Bytes.toInt(hash) or Bytes.toLong(hash)
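If you would rather not pull in a library for this, roughly the same idea can be sketched with the JDK's MessageDigest. The class and method names are made up, and folding the hash into a long by taking its first 8 bytes is just one possible choice; it discards most of the 20-byte digest, so collisions become more likely than with the full hash.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Sha1Id {
    // Full 20-byte SHA-1 digest of the UTF-8 bytes of the string.
    static byte[] sha1(String text) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    // Interprets the first 8 bytes of the digest as a big-endian long.
    static long toLong(byte[] hash) {
        return ByteBuffer.wrap(hash).getLong();
    }

    public static void main(String[] args) {
        long first = toLong(sha1("Java_programming"));
        long second = toLong(sha1("Java_programming"));
        System.out.println(first == second); // true: same string, same number
    }
}
```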
Yes, it is possible. You can use String.hashCode() method on string.
Here you find more details how integer returned by this method is created
It depends on whether two different Strings must always give two different identifiers.
String.hashCode() can do what you want: the same string will always give the same ID, but two different strings can also give the same ID.
If you want a unique identifier you can, for example, concatenate the byte values of the characters in your string.
First take your string and convert it to a sequence of bytes:
String a = "hello";
byte[] b = a.getBytes();
Now convert the bytes to a numeric representation, using BigInteger
BigInteger c = new BigInteger(b);
Finally, convert the BigInteger back to a string using its toString() method
String d = c.toString();
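You are guaranteed to get the same output for the same input (as long as the same charset is used), and in general different outputs for different inputs. The exception is leading bytes that BigInteger treats as pure sign extension: a string starting with a NUL character, for example, maps to the same number as the string without it. You can combine all of these into one step by doing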
String d = new BigInteger(a.getBytes()).toString();
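Put together as a runnable sketch (the class and method names are invented for illustration; an explicit UTF-8 charset is used so the result does not depend on the platform default, and an empty string is not supported because BigInteger rejects a zero-length array):

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

class StringToNumber {
    // Interprets the UTF-8 bytes of the string as one big two's-complement number.
    static String toNumber(String text) {
        return new BigInteger(text.getBytes(StandardCharsets.UTF_8)).toString();
    }

    public static void main(String[] args) {
        System.out.println(toNumber("hello").equals(toNumber("hello"))); // true
        System.out.println(toNumber("hello").equals(toNumber("world"))); // false

        // Caveat: a leading zero byte does not change the number.
        System.out.println(toNumber("\u0000a").equals(toNumber("a")));   // true
    }
}
```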

Java Encode SHA-1 Byte Array

I have a SHA-1 byte array that I would like to use in a GET request. I need to encode this. URLEncoder expects a string, and if I create a string from it and then encode it, it gets corrupted?
To clarify, this is kind of a follow-up to another question of mine.
(BitTorrent Tracker Request) I can get the value as a hex string, but that is not recognized by the tracker. On the other hand, the encoded answer mark provided returns 200 OK.
So I need to convert the hex representation that I got:
9a81333c1b16e4a83c10f3052c1590aadf5e2e20
into encoded form
%9A%813%3C%1B%16%E4%A8%3C%10%F3%05%2C%15%90%AA%DF%5E.%20
Question was edited while I was responding, following is ADDITIONAL code and should work (with my hex conversion code):
//Inefficient, but functional, does not test if input is in hex charset, so somewhat unsafe
//NOT tested, but should be functional
public static String encodeURL(String hexString) throws Exception {
if(hexString==null || hexString.isEmpty()){
return "";
}
if(hexString.length()%2 != 0){
throw new Exception("String is not hex, length NOT divisible by 2: "+hexString);
}
int len = hexString.length();
char[] output = new char[len+len/2];
int i=0;
int j=0;
while(i<len){
output[j++]='%';
output[j++]=hexString.charAt(i++);
output[j++]=hexString.charAt(i++);
}
return new String(output);
}
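The method above escapes every byte, which is valid percent-encoding but does not reproduce the exact target string from the question, where '3' and '.' are left unescaped. The following sketch reproduces that form under the assumption that only the unreserved characters (letters, digits, '-', '_', '.', '~') stay literal; that rule is inferred from the example in the question, not taken from any tracker documentation, and the class and method names are hypothetical.

```java
class InfoHashEncoder {
    // Characters assumed to be left unescaped.
    private static final String UNRESERVED =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~";

    // Parses a hex string (two digits per byte) into raw bytes.
    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have an even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Percent-encodes raw bytes, keeping unreserved ASCII characters as they are.
    static String percentEncode(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 3);
        for (byte b : data) {
            int value = b & 0xFF;
            if (UNRESERVED.indexOf(value) >= 0) {
                sb.append((char) value);
            } else {
                sb.append('%').append(String.format("%02X", value));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] hash = fromHex("9a81333c1b16e4a83c10f3052c1590aadf5e2e20");
        System.out.println(percentEncode(hash));
        // %9A%813%3C%1B%16%E4%A8%3C%10%F3%05%2C%15%90%AA%DF%5E.%20
    }
}
```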
You'll need to convert the raw bytes to hexadecimal characters or whatever URL-friendly encoding they are using. Base32 or Base64 encodings are possible, but straight hexadecimal characters is the most common. URLEncoder is not needed for this string, because it shouldn't contain any characters that would require URL Encoding to %NN format.
The below will convert bytes for a hash (SHA-1, MD5SUM, etc) to a hexadecimal string:
/** Lookup table: character for a half-byte */
static final char[] CHAR_FOR_BYTE = {'0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F'};
/** Encode byte data as a hex string... hex chars are UPPERCASE*/
public static String encode(byte[] data){
if(data == null || data.length==0){
return "";
}
char[] store = new char[data.length*2];
for(int i=0; i<data.length; i++){
final int val = (data[i]&0xFF);
final int charLoc=i<<1;
store[charLoc]=CHAR_FOR_BYTE[val>>>4];
store[charLoc+1]=CHAR_FOR_BYTE[val&0x0F];
}
return new String(store);
}
This code is fairly optimized and fast, and I am using it for my own SHA-1 byte encoding. Note that you may need to convert uppercase to lowercase with the String.toLowerCase() method, depending on which form the server accepts.
This depends on what the recipient of your request expects.
I would imagine it could be a hexadecimal representation of the bytes in your hash. A string would probably not be the best idea, because the hash array will most likely contain non-printable character values.
I'd iterate over the array and use Integer.toHexString() to convert the bytes to hex (masking each byte with 0xFF first, since bytes are signed).
SHA1 is in hex format [0-9a-f], there should be no need to URLEncode it.
Use Apache Commons-Codec for all your encoding/decoding needs (except ASN.1, which is a pain in the ass)
