Converting a string to byte[] such that the contents remain the same - java

I have a String, say String a = "abc";. Now I want to convert it into a byte array, say byte b[];, so that when I print b it shows "abc".
How can I do that?
The getBytes() method is giving a different result.
My program looks like this so far:
String a="abc";
byte b[]=a.getBytes();
What I want: I have two methods in a class. One is
public byte[] encrypt(String a)
and another is
public String decrypt(byte[] b)
After encrypting, I saved the data into the database, but when I read it back, the byte-based methods do not give the correct output. I do get the same data back using the String-based method, but now I have to pass it into decrypt(byte[] b).
How do I do that? This is the real scenario.

Well, your first problem is that a String in Java is not an array of bytes but of chars, each of which takes 16 bits. This covers Unicode characters, rather than only the ASCII you would get with bytes. It also means that if you use the getBytes method, you cannot assume one array position per character, since a character may need more than one byte.
What you could do is use getChars and then cast each char to a byte, with the corresponding precision loss. This is not a good idea, since it won't work outside of plain English characters! You asked, though, so here you go ;)
EDIT: as @Peter Lawrey mentions, Unicode makes it even harder, with some Unicode characters needing more than one char. There's a good discussion on Stack Overflow, and it links to a detailed article from Oracle.
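For completeness, a minimal sketch of that getChars-then-cast approach, using the String a from the question; it is a lossy hack that only survives the round trip for plain ASCII text:
char[] chars = new char[a.length()];
a.getChars(0, a.length(), chars, 0);   // copy the String's chars
byte[] narrowed = new byte[chars.length];
for (int i = 0; i < chars.length; i++) {
    narrowed[i] = (byte) chars[i];     // drops the high 8 bits of each char
}
for (byte x : narrowed) {
    System.out.print((char) x);        // prints: abc
}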

byte b[]=a.getBytes();
System.out.println(new String(b));

You could use this constructor to build your string back again:
String a="abc";
byte b[]=a.getBytes("UTF-8");
System.out.println(new String(b, "UTF-8"));
Other than that, you can't do System.out.println(b) and expect to see abc.

A byte is a value between -128 and 127. When you print it, it is printed as a number by default.
If you want to print it as an ASCII char, you can cast it to a char:
byte[] bytes = "abc".getBytes();
for(byte b: bytes)
System.out.println((char) b);
prints
a
b
c

It seems like you are implementing encryption and decryption code.
String constructors are for text data; you should not use them to convert a byte array that contains encrypted data into a String value.
You should use base64 instead, which encodes any binary data into ASCII.
This one is a good public-domain implementation:
http://iharder.sourceforge.net/current/java/base64/
Base64 from Apache Commons Codec:
http://commons.apache.org/codec/download_codec.cgi
String msg ="abc";
byte[] data = Base64.decode(msg);
String convert = Base64.encodeBytes(data);
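If you are on Java 8 or later you do not even need a third-party library; here is a minimal sketch of the whole store-and-retrieve round trip with java.util.Base64, where encrypt and decrypt stand in for your own methods:
import java.util.Base64;

byte[] encrypted = encrypt("abc");                              // your encryption result
String stored = Base64.getEncoder().encodeToString(encrypted);  // text-safe, save this in the database
// ... later, read the column back into `stored` ...
byte[] restored = Base64.getDecoder().decode(stored);           // identical to `encrypted`
String plain = decrypt(restored);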

This will convert "abc" to bytes, and then the code will print "abc" as the corresponding ASCII codes (i.e. 97 98 99).
String s = "abc";
byte[] a = s.getBytes();   // no need to pre-allocate; getBytes() returns a new array
for (int i = 0; i < a.length; i++) {
    System.out.print(a[i] + " ");
}
If you add these lines, it will change the ASCII codes back into a String (i.e. abc):
String s1=new String(a);
System.out.print("\n"+s1);
Hope it helps.
Modified Code:
To send byte array as an argument:
public static void another_method_name(byte b1[]) {
    String s1 = new String(b1);
    System.out.print("\n" + s1);
}

public static void main(String[] args) {
    String s = "abc";
    byte[] a = s.getBytes();   // again, no need for the unused new byte[160]
    for (int i = 0; i < a.length; i++) {
        System.out.print(a[i] + " ");
    }
    another_method_name(a);
}
Hope it helps again.

Related

Java - checking encoding of string for unit test?

I have a unit test I was trying to write for a generateKey(int length) method. The method:
1. Creates a byte array with size of input parameter length
2. Uses SecureRandom().nextBytes(randomKey) method to populate the byte array with random values
3. Encodes the byte array filled with random values to a UTF-8 String object
4. Re-writes the original byte array (called randomKey) to 0's for security
5. Returns the UTF-8 encoded String
I already have a unit test checking for the user inputting a negative value (i.e. -1) such that the byte array would throw a Negative array size exception.
Would a good positive test case be to check that a UTF-8 encoded String is successfully created? Is there a method I can call on the generated String to check that it equals "UTF-8" encoding?
I can't check that the String equals the same String, since the byte array is filled with random values each time it is called....
The source code is here:
public static String generateKey(int length) {
byte[] randomKey = new byte[length];
new SecureRandom().nextBytes(randomKey);
String key = new String(randomKey, Charset.forName("UTF-8"));//Base64.getEncoder().encodeToString(randomKey);
Arrays.fill(randomKey,(byte) 0);
return key;
}
You can convert a UTF-8 string to a byte array and back as below:
String str = "私の"; // replace this with your generateKey result
byte [] b = str.getBytes();
String newString;
try {
newString = new String (b, "UTF-8");
System.out.println(newString);
System.out.println("size is equal ? " + (str.length() == newString.length()));
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
First, the code you posted is simply wrong: you can't take a random array of bytes and treat it as a UTF-8 string, because UTF-8 expects certain bit patterns to indicate multi-byte characters.
Unfortunately, this failure happens silently, because you're using a string constructor that "always replaces malformed-input and unmappable-character sequences with this charset's default replacement string". So you'll get something, but you wouldn't be able to translate it back to the same binary value.
The comment in the code, however, gives you the answer: you need to use Base64 to encode the binary value.
However, that still won't let you verify that the encoded string is equivalent to the original byte array, because the function takes great care to zero out the array immediately after use (which doesn't really do what the author thinks it does, but I'm not going to get into that argument).
If you really want to test a method like this, you need to be able to mock out core parts of it. You could, for example, separate out the generation of random bytes from encoding, and then pass in a byte generator that keeps track of the bytes that it generated.
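A sketch of that separation, assuming you are free to change the method boundaries; the encodeKey name and the Base64 step are suggestions, not the original API:
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;

// A test can call encodeKey with a known byte array and assert on the exact Base64 output,
// while generateKey stays a thin wrapper around the random source.
public static String encodeKey(byte[] randomKey) {
    String key = Base64.getEncoder().encodeToString(randomKey);
    Arrays.fill(randomKey, (byte) 0); // same zeroing the original code does
    return key;
}

public static String generateKey(int length) {
    byte[] randomKey = new byte[length];
    new SecureRandom().nextBytes(randomKey);
    return encodeKey(randomKey);
}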
But the real question is why? What are you (or more correctly, the person writing this code) actually trying to accomplish? And why won't a UUID accomplish it?

String Hex Encoding and Decoding

I am converting a String from UTF-8 to CP1047 and then hex-encoding it, which works great. Next I am converting back: decoding the hex String and displaying it on the console in UTF-8. The problem is that I am not getting back the String I passed to the encoding method. Below is the code:
public class HexEncodeDecode {
public static void main(String[] args) throws UnsupportedEncodingException,
DecoderException {
String reqMsg = "ISO0150000150800C220000080000000040000050000000215102190000000014041615141800001427690161 0B0 000123450041234";
char[] hexed = getHex(reqMsg, "UTF-8", "Cp1047");
System.out.println(hexed);
System.out.println(getString(hexed));
}
public static char[] getHex(String source, String inputCharacterCoding,
String outputCharacterCoding) throws UnsupportedEncodingException {
return Hex.encodeHex(new String(source.getBytes(inputCharacterCoding),
outputCharacterCoding).getBytes(), false);
}
public static String getString(char[] source) throws DecoderException,
UnsupportedEncodingException {
return new String(Hex.decodeHex(source), Charset.forName("UTF-8"));
}
}
The output I am getting is:
C3B1C3AB7CC290C291C295C290C290C290C290C291C295C290C298C290C290C3A41616C290C290C290C290C290C298C290C290C290C290C290C290C290C290C294C290C290C290C290C290C295C290C290C290C290C290C290C29016C291C295C291C29016C291C299C290C290C290C290C290C290C290C290C291C294C290C294C291C296C291C295C291C294C291C298C290C290C290C290C291C2941604C296C299C290C291C296C291C280C290C3A2C290C280C280C280C280C290C290C290C29116C293C294C295C290C290C294C29116C293C294
ñë|äâ
So I need help printing the input String back.
Expected output would be:
C3B1C3AB7CC290C291C295C290C290C290C290C291C295C290C298C290C290C3A41616C290C290C290C290C290C298C290C290C290C290C290C290C290C290C294C290C290C290C290C290C295C290C290C290C290C290C290C29016C291C295C291C29016C291C299C290C290C290C290C290C290C290C290C291C294C290C294C291C296C291C295C291C294C291C298C290C290C290C290C291C2941604C296C299C290C291C296C291C280C290C3A2C290C280C280C280C280C290C290C290C29116C293C294C295C290C290C294C29116C293C294
ISO0150000150800C220000080000000040000050000000215102190000000014041615141800001427690161 0B0 000123450041234
new String(source.getBytes(inputCharacterCoding), outputCharacterCoding)
.getBytes()
This probably does not do what you think it does.
First things first: a String has no encoding. Repeat after me: a String has no encoding.
A String is simply a sequence of tokens which aim to represent characters. It just happens that for this purpose Java uses a sequence of chars. They could just as well be carrier pigeons.
UTF8, CP1047 and others are just character codings; two operations can be performed:
encoding: turn a stream of carrier pigeons (chars) into a stream of bytes;
decoding: turn a stream of bytes into a stream of carrier pigeons (chars).
Basically, your base assumption is wrong; you cannot associate an encoding with a String. Your real input should be a byte stream (more often than not a byte array) which you know is the result of a particular encoding (in your case, UTF-8), which you want to re-encode using another charset (in your case, CP1047).
The "secret" behing a real answer here would be the code of your Hex.encodeHex() method but you don't show it, so this is as good an answer that I can muster.
reqMsg no longer has an encoding, so it is pointless (and damaging) to try to convert it from UTF-8 to "Cp1047".
If reqMsg is going to come from an external source in the future, such as disk or the network, then you will have to decode it - perhaps this is where the confusion comes from. Perhaps you'll be doing: UTF-8 -> Unicode (String) -> CP1047 -> HEX. When you write it to stdout, the HEX will most likely be ASCII encoded.
The following example creates an ASCII hex string from your original string after conversion to CP1047 (Unicode -> CP1047 -> HEX):
String reqMsg = "ISO0150000150800C220000080000000040000050000000215102190000000014041615141800001427690161 0B0 000123450041234";
// encode to cp1047 represented as Hex
byte[] reqMsqBytes = reqMsg.getBytes("Cp1047");
char[] hex = Hex.encodeHex(reqMsqBytes);
System.out.println(hex);
// decode
String respMsqBytes = new String(Hex.decodeHex(hex), "Cp1047");
System.out.println(respMsqBytes);
A quick fix (though a little ugly) would be to change getString() to:
public static String getString(char[] source) throws DecoderException, UnsupportedEncodingException {
return new String(new String(Hex.decodeHex(source), Charset.forName("UTF-8")).getBytes("Cp1047"),"UTF-8");
}
As fge already mentioned, you are mixing up transformations between chars and bytes, which are two different pairs of shoes. So in this quick fix you first hex-decode and build a String assuming UTF-8, then encode it to a Cp1047 byte array, and finally decode that back into a String using the UTF-8 charset.
As I said, this is just a quick one-liner workaround and not the cleanest solution, as the damage is already done during the hex encoding.

Apache Commons Codec in Java: from String to hex and vice versa

I am trying to encode a string as hex and then convert it back to a string. For this purpose I'm using Apache Commons Codec. In particular, I have defined the following methods:
import org.apache.commons.codec.DecoderException;
import org.apache.commons.codec.binary.Hex;
public String toHex(byte[] byteArray){
return Hex.encodeHexString(byteArray);
}
public byte[] fromHex(String hexString){
byte[] array = null;
try {
array = Hex.decodeHex(hexString.toCharArray());
} catch (DecoderException ex) {
Logger.getLogger(SecureHash.class.getName()).log(Level.SEVERE, null, ex);
}
return array;
}
The strange thing is that I do not get the same initial string when converting back. Even stranger, the byte array I get is different from the initial byte array of the string.
The small test program that I wrote is the following:
String uno = "uno";
byte[] uno_bytes = uno.getBytes();
System.out.println(uno);
System.out.println(uno_bytes);
String hexed = toHex(uno_bytes);
System.out.println(hexed);
byte [] arr = fromHex(hexed);
System.out.println(arr.toString());
An example of output is the following:
uno #initial string
[B@1afe17b #byte array of the initial string
756e6f #string representation of the hex
[B@34d46a #byte array of the recovered string
There is also another strange behaviour: the byte array representation ([B@1afe17b) is not fixed, but differs from run to run of the code, and I cannot understand why.
When you print a byte array, the toString() representation does not contain the contents of the array. Instead, it contains a type indicator ([B means byte array) and the hashcode. The hashcode will be different for two distinct byte arrays, even if they contain the same contents. See Object.toString() and Object.hashCode() for further information.
Instead, you may want to compare the arrays for equality, using:
System.out.println("Arrays equal: " + Arrays.equals(uno_bytes, arr));

Converting binary data to String

If I have some binary data D and I convert it to a String S, I expect that on converting it back to binary I will get D. But that's wrong.
public class A {
public static void main(String[] args) throws IOException {
final byte[] bytes = new byte[]{-114, 104, -35};// In hex: 8E 68 DD
System.out.println(bytes.length); //prints 3
System.out.println(new String(bytes, "UTF-8").getBytes("UTF-8").length); //prints 7
}
}
Why does this happen?
Converting between a byte array and a String and back again is not a one-to-one mapping. Reading the docs, the String implementation uses a CharsetDecoder to convert the incoming byte array into Unicode. The first and last bytes in your input array do not map to valid UTF-8 sequences, so they are replaced with the replacement character.
It's likely that the bytes you're converting to a string don't actually form a valid string. If Java can't figure out what you mean by each byte, it will attempt to fix them. This means that when you convert back to the byte array, it won't be the same as when you started. If you try with a valid set of bytes, you should be more successful.
Your data can't be decoded into valid Unicode characters using the UTF-8 encoding. Look at the decoded string: it consists of 3 characters, 0xFFFD, 0x0068 and 0xFFFD. The first and last are "�", the Unicode replacement character. You need to choose another encoding; e.g. "CP866" produces a valid string and converts back into the same array.
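A small check of that behaviour; ISO-8859-1 is another charset that maps every byte value to exactly one char, so it round-trips arbitrary binary data (though Base64 is still the cleaner choice for transporting binary as text):
import java.nio.charset.StandardCharsets;

byte[] bytes = new byte[]{-114, 104, -35};

// UTF-8 cannot decode 0x8E or 0xDD in isolation, so each becomes U+FFFD,
// which re-encodes as 3 bytes - hence the length of 7 in the question
String utf8 = new String(bytes, StandardCharsets.UTF_8);
System.out.println(utf8.getBytes(StandardCharsets.UTF_8).length);        // 7

// ISO-8859-1 maps every byte to a single char, so the round trip is lossless
String latin1 = new String(bytes, StandardCharsets.ISO_8859_1);
System.out.println(latin1.getBytes(StandardCharsets.ISO_8859_1).length); // 3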

Java Encode SHA-1 Byte Array

I have an SHA-1 byte array that I would like to use in a GET request. I need to encode it. URLEncoder expects a string, and if I create a string from the bytes and then encode that, it gets corrupted.
To clarify, this is kinda a follow up to another question of mine.
(BitTorrent tracker request.) I can get the value as a hex string, but that is not recognized by the tracker. On the other hand, the encoded answer Mark provided returns 200 OK.
So I need to convert the hex representation that I got:
9a81333c1b16e4a83c10f3052c1590aadf5e2e20
into encoded form
%9A%813%3C%1B%16%E4%A8%3C%10%F3%05%2C%15%90%AA%DF%5E.%20
The question was edited while I was responding; the following is ADDITIONAL code and should work (with my hex conversion code):
//Inefficient, but functional, does not test if input is in hex charset, so somewhat unsafe
//NOT tested, but should be functional
public static String encodeURL(String hexString) throws Exception {
if(hexString==null || hexString.isEmpty()){
return "";
}
if(hexString.length()%2 != 0){
throw new Exception("String is not hex, length NOT divisible by 2: "+hexString);
}
int len = hexString.length();
char[] output = new char[len+len/2];
int i=0;
int j=0;
while(i<len){
output[j++]='%';
output[j++]=hexString.charAt(i++);
output[j++]=hexString.charAt(i++);
}
return new String(output);
}
You'll need to convert the raw bytes to hexadecimal characters or whatever URL-friendly encoding they are using. Base32 or Base64 encodings are possible, but straight hexadecimal characters is the most common. URLEncoder is not needed for this string, because it shouldn't contain any characters that would require URL Encoding to %NN format.
The below will convert bytes for a hash (SHA-1, MD5SUM, etc) to a hexadecimal string:
/** Lookup table: character for a half-byte */
static final char[] CHAR_FOR_BYTE = {'0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F'};
/** Encode byte data as a hex string... hex chars are UPPERCASE*/
public static String encode(byte[] data){
if(data == null || data.length==0){
return "";
}
char[] store = new char[data.length*2];
for(int i=0; i<data.length; i++){
final int val = (data[i]&0xFF);
final int charLoc=i<<1;
store[charLoc]=CHAR_FOR_BYTE[val>>>4];
store[charLoc+1]=CHAR_FOR_BYTE[val&0x0F];
}
return new String(store);
}
This code is fairly optimized and fast, and I am using it for my own SHA-1 byte encoding. Note that you may need to convert uppercase to lowercase with the String.toLowerCase() method, depending on which form the server accepts.
This depends on what the recipient of your request expects.
I would imagine it could be a hexadecimal representation of the bytes in your hash. A string would probably not be the best idea, because the hash array will most likely contain non-printable character values.
I'd iterate over the array and use Integer.toHexString() to convert the bytes to hex.
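A minimal sketch of that loop; String.format("%02x", ...) is used instead of Integer.toHexString so the leading zero of values below 0x10 is not dropped (sha1Bytes is a placeholder for your hash array):
StringBuilder sb = new StringBuilder();
for (byte b : sha1Bytes) {
    sb.append(String.format("%02x", b & 0xFF)); // two lowercase hex digits per byte
}
String hex = sb.toString();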
An SHA-1 hash in hex format is just [0-9a-f]; there should be no need to URL-encode it.
Use Apache Commons-Codec for all your encoding/decoding needs (except ASN.1, which is a pain in the ass)
