How to convert Strings to and from UTF8 byte arrays in Java - java

In Java, I have a String and I want to encode it as a byte array (in UTF8, or some other encoding). Alternately, I have a byte array (in some known encoding) and I want to convert it into a Java String. How do I do these conversions?

Convert from String to byte[]:
String s = "some text here";
byte[] b = s.getBytes(StandardCharsets.UTF_8);
Convert from byte[] to String:
byte[] b = {(byte) 99, (byte)97, (byte)116};
String s = new String(b, StandardCharsets.US_ASCII);
You should, of course, use the correct encoding name. My examples used US-ASCII and UTF-8, two commonly-used encodings.

Here's a solution that avoids performing the Charset lookup for every conversion:
import java.nio.charset.Charset;
private final Charset UTF8_CHARSET = Charset.forName("UTF-8");
String decodeUTF8(byte[] bytes) {
return new String(bytes, UTF8_CHARSET);
}
byte[] encodeUTF8(String string) {
return string.getBytes(UTF8_CHARSET);
}

String original = "hello world";
byte[] utf8Bytes = original.getBytes("UTF-8");

You can convert directly via the String(byte[], String) constructor and getBytes(String) method. Java exposes available character sets via the Charset class. The JDK documentation lists supported encodings.
90% of the time, such conversions are performed on streams, so you'd use the Reader/Writer classes. You would not incrementally decode using the String methods on arbitrary byte streams - you would leave yourself open to bugs involving multibyte characters.

My tomcat7 implementation is accepting strings as ISO-8859-1; despite the content-type of the HTTP request. The following solution worked for me when trying to correctly interpret characters like 'é' .
byte[] b1 = szP1.getBytes("ISO-8859-1");
System.out.println(b1.toString());
String szUT8 = new String(b1, "UTF-8");
System.out.println(szUT8);
When trying to interpret the string as US-ASCII, the byte info wasn't correctly interpreted.
b1 = szP1.getBytes("US-ASCII");
System.out.println(b1.toString());

As an alternative, StringUtils from Apache Commons can be used.
byte[] bytes = {(byte) 1};
String convertedString = StringUtils.newStringUtf8(bytes);
or
String myString = "example";
byte[] convertedBytes = StringUtils.getBytesUtf8(myString);
If you have non-standard charset, you can use getBytesUnchecked() or newString() accordingly.

I can't comment but don't want to start a new thread. But this isn't working. A simple round trip:
byte[] b = new byte[]{ 0, 0, 0, -127 }; // 0x00000081
String s = new String(b,StandardCharsets.UTF_8); // UTF8 = 0x0000, 0x0000, 0x0000, 0xfffd
b = s.getBytes(StandardCharsets.UTF_8); // [0, 0, 0, -17, -65, -67] 0x000000efbfbd != 0x00000081
I'd need b[] the same array before and after encoding which it isn't (this referrers to the first answer).

For decoding a series of bytes to a normal string message I finally got it working with UTF-8 encoding with this code:
/* Convert a list of UTF-8 numbers to a normal String
* Usefull for decoding a jms message that is delivered as a sequence of bytes instead of plain text
*/
public String convertUtf8NumbersToString(String[] numbers){
int length = numbers.length;
byte[] data = new byte[length];
for(int i = 0; i< length; i++){
data[i] = Byte.parseByte(numbers[i]);
}
return new String(data, Charset.forName("UTF-8"));
}

If you are using 7-bit ASCII or ISO-8859-1 (an amazingly common format) then you don't have to create a new java.lang.String at all. It's much much more performant to simply cast the byte into char:
Full working example:
for (byte b : new byte[] { 43, 45, (byte) 215, (byte) 247 }) {
char c = (char) b;
System.out.print(c);
}
If you are not using extended-characters like Ä, Æ, Å, Ç, Ï, Ê and can be sure that the only transmitted values are of the first 128 Unicode characters, then this code will also work for UTF-8 and extended ASCII (like cp-1252).

Charset UTF8_CHARSET = Charset.forName("UTF-8");
String strISO = "{\"name\":\"א\"}";
System.out.println(strISO);
byte[] b = strISO.getBytes();
for (byte c: b) {
System.out.print("[" + c + "]");
}
String str = new String(b, UTF8_CHARSET);
System.out.println(str);

Reader reader = new BufferedReader(
new InputStreamReader(
new ByteArrayInputStream(
string.getBytes(StandardCharsets.UTF_8)), StandardCharsets.UTF_8));

//query is your json
DefaultHttpClient httpClient = new DefaultHttpClient();
HttpPost postRequest = new HttpPost("http://my.site/test/v1/product/search?qy=");
StringEntity input = new StringEntity(query, "UTF-8");
input.setContentType("application/json");
postRequest.setEntity(input);
HttpResponse response=response = httpClient.execute(postRequest);

terribly late but i just encountered this issue and this is my fix:
private static String removeNonUtf8CompliantCharacters( final String inString ) {
if (null == inString ) return null;
byte[] byteArr = inString.getBytes();
for ( int i=0; i < byteArr.length; i++ ) {
byte ch= byteArr[i];
// remove any characters outside the valid UTF-8 range as well as all control characters
// except tabs and new lines
if ( !( (ch > 31 && ch < 253 ) || ch == '\t' || ch == '\n' || ch == '\r') ) {
byteArr[i]=' ';
}
}
return new String( byteArr );
}

Related

Java - What is the proper way to convert a UTF-8 String to binary?

I'm using this code to convert a UTF-8 String to binary:
public String toBinary(String str) {
byte[] buf = str.getBytes(StandardCharsets.UTF_8);
StringBuilder result = new StringBuilder();
for (int i = 0; i < buf.length; i++) {
int ch = (int) buf[i];
String binary = Integer.toBinaryString(ch);
result.append(("00000000" + binary).substring(binary.length()));
result.append(' ');
}
return result.toString().trim();
}
Before I was using this code:
private String toBinary2(String str) {
StringBuilder result = new StringBuilder();
for (int i = 0; i < str.length(); i++) {
int ch = (int) str.charAt(i);
String binary = Integer.toBinaryString(ch);
if (ch<256)
result.append(("00000000" + binary).substring(binary.length()));
else {
binary = ("0000000000000000" + binary).substring(binary.length());
result.append(binary.substring(0, 8));
result.append(' ');
result.append(binary.substring(8));
}
result.append(' ');
}
return result.toString().trim();
}
These two method can return different results; for example:
toBinary("è") = "11000011 10101000"
toBinary2("è") = "11101000"
I think that because the bytes of è are negative while the corresponding char is not (because char is a 2 byte unsigned integer).
What I want to know is: which of the two approaches is the correct one and why?
Thanks in advance.
Whenever you want to convert text into binary data (or into text representing binary data, as you do here) you have to use some encoding.
Your toBinary uses UTF-8 for that encoding.
Your toBinary2 uses something that's not a standard encoding: it encodes every UTF-16 codepoint * <= 256 in a single byte and all others in 2 bytes. Unfortunately that one is not a useful encoding, since for decoding you'll have to know if a single byte is stand-alone or part of a 2-byte sequence (UTF-8/UTF-16 do that by indicating with the highest-level bits which one it is).
tl;dr toBinary seems correct, toBinary2 will produce output that can't uniquely be decoded back to the original string.
* You might be wondering where the mention of UTF-16 comes from: That's because all String objects in Java are implicitly encoded in UTF-16. So if you use charAt you get UTF-16 codepoints (which just so happen to be equal to the Unicode code number for all characters that fit into the Basic Multilingual Plane).
This code snippet might help.
String s = "Some String";
byte[] bytes = s.getBytes();
StringBuilder binary = new StringBuilder();
for(byte b:bytes){
int val =b;
for(int i=;i<=s.length;i++){
binary.append((val & 128) == 0 ? 0 : 1);
val<<=1;
}
}
System.out.println(" "+s+ "to binary" +binary);

Convert byte to string in Java

I use below code to convert byte to string:
System.out.println("string " + Byte.toString((byte)0x63));
Why it print "string 99".
How to modify to let it print "string c"?
System.out.println(new String(new byte[]{ (byte)0x63 }, "US-ASCII"));
Note especially that converting bytes to Strings always involves an encoding. If you do not specify it, you'll be using the platform default encoding, which means the code can break when running in different environments.
The string ctor is suitable for this conversion:
System.out.println("string " + new String(new byte[] {0x63}));
Use char instead of byte:
System.out.println("string " + (char)0x63);
Or if you want to be a Unicode puritan, you use codepoints:
System.out.println("string " + new String(new int[]{ 0x63 }, 0, 1));
And if you like the old skool US-ASCII "every byte is a character" idea:
System.out.println("string " + new String(new byte[]{ (byte)0x63 },
StandardCharsets.US_ASCII));
Avoid using the String(byte[]) constructor recommended in other answers; it relies on the default charset. Circumstances could arise where 0x63 actually isn't the character c.
You can use printf:
System.out.printf("string %c\n", 0x63);
You can as well create a String with such formatting, using String#format:
String s = String.format("string %c", 0x63);
you can use
the character equivalent to 0x63 is 'c' but byte equivalent to it is 99
System.out.println("byte "+(char)0x63);
You have to construct a new string out of a byte array. The first element in your byteArray should be 0x63. If you want to add any more letters, make the byteArray longer and add them to the next indices.
byte[] byteArray = new byte[1];
byteArray[0] = 0x63;
try {
System.out.println("string " + new String(byteArray, "US-ASCII"));
} catch (UnsupportedEncodingException e) {
// TODO: Handle exception.
e.printStackTrace();
}
Note that specifying the encoding will eventually throw an UnsupportedEncodingException and you must handle that accordingly.
If it's a single byte, just cast the byte to a char and it should work out to be fine i.e. give a char entity corresponding to the codepoint value of the given byte. If not, use the String constructor as mentioned elsewhere.
char ch = (char)0x63;
System.out.println(ch);
String str = "0x63";
int temp = Integer.parseInt(str.substring(2, 4), 16);
char c = (char)temp;
System.out.print(c);
This is my version:
public String convertBytestoString(InputStream inputStream)
{
int bytes;
byte[] buffer = new byte[1024];
bytes = inputStream.read(buffer);
String stringData = new String(buffer,0,bytes);
return stringData;
}
Using StringBuilder class in Java:
StringBuilder str = new StringBuilder();
for (byte aByte : bytesArray) {
if (aByte != 0) {
str.append((char) aByte);
} else {
break;
}

Converting char array into byte array and back again

I'm looking to convert a Java char array to a byte array without creating an intermediate String, as the char array contains a password. I've looked up a couple of methods, but they all seem to fail:
char[] password = "password".toCharArray();
byte[] passwordBytes1 = new byte[password.length*2];
ByteBuffer.wrap(passwordBytes1).asCharBuffer().put(password);
byte[] passwordBytes2 = new byte[password.length*2];
for(int i=0; i<password.length; i++) {
passwordBytes2[2*i] = (byte) ((password[i]&0xFF00)>>8);
passwordBytes2[2*i+1] = (byte) (password[i]&0x00FF);
}
String passwordAsString = new String(password);
String passwordBytes1AsString = new String(passwordBytes1);
String passwordBytes2AsString = new String(passwordBytes2);
System.out.println(passwordAsString);
System.out.println(passwordBytes1AsString);
System.out.println(passwordBytes2AsString);
assertTrue(passwordAsString.equals(passwordBytes1) || passwordAsString.equals(passwordBytes2));
The assertion always fails (and, critically, when the code is used in production, the password is rejected), yet the print statements print out password three times. Why are passwordBytes1AsString and passwordBytes2AsString different from passwordAsString, yet appear identical? Am I missing out a null terminator or something? What can I do to make the conversion and unconversion work?
Conversion between char and byte is character set encoding and decoding.I prefer to make it as clear as possible in code. It doesn't really mean extra code volume:
Charset latin1Charset = Charset.forName("ISO-8859-1");
charBuffer = latin1Charset.decode(ByteBuffer.wrap(byteArray)); // also decode to String
byteBuffer = latin1Charset.encode(charBuffer); // also decode from String
Aside:
java.nio classes and java.io Reader/Writer classes use ByteBuffer & CharBuffer (which use byte[] and char[] as backing arrays). So often preferable if you use these classes directly. However, you can always do:
byteArray = ByteBuffer.array(); byteBuffer = ByteBuffer.wrap(byteArray);
byteBuffer.get(byteArray); charBuffer.put(charArray);
charArray = CharBuffer.array(); charBuffer = ByteBuffer.wrap(charArray);
charBuffer.get(charArray); charBuffer.put(charArray);
Original Answer
public byte[] charsToBytes(char[] chars){
Charset charset = Charset.forName("UTF-8");
ByteBuffer byteBuffer = charset.encode(CharBuffer.wrap(chars));
return Arrays.copyOf(byteBuffer.array(), byteBuffer.limit());
}
public char[] bytesToChars(byte[] bytes){
Charset charset = Charset.forName("UTF-8");
CharBuffer charBuffer = charset.decode(ByteBuffer.wrap(bytes));
return Arrays.copyOf(charBuffer.array(), charBuffer.limit());
}
Edited to use StandardCharsets
public byte[] charsToBytes(char[] chars)
{
final ByteBuffer byteBuffer = StandardCharsets.UTF_8.encode(CharBuffer.wrap(chars));
return Arrays.copyOf(byteBuffer.array(), byteBuffer.limit());
}
public char[] bytesToChars(byte[] bytes)
{
final CharBuffer charBuffer = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes));
return Arrays.copyOf(charBuffer.array(), charBuffer.limit());
}
Here is a JavaDoc page for StandardCharsets.
Note this on the JavaDoc page:
These charsets are guaranteed to be available on every implementation of the Java platform.
The problem is your use of the String(byte[]) constructor, which uses the platform default encoding. That's almost never what you should be doing - if you pass in "UTF-16" as the character encoding to work, your tests will probably pass. Currently I suspect that passwordBytes1AsString and passwordBytes2AsString are each 16 characters long, with every other character being U+0000.
I would do is use a loop to convert to bytes and another to conver back to char.
char[] chars = "password".toCharArray();
byte[] bytes = new byte[chars.length*2];
for(int i=0;i<chars.length;i++) {
bytes[i*2] = (byte) (chars[i] >> 8);
bytes[i*2+1] = (byte) chars[i];
}
char[] chars2 = new char[bytes.length/2];
for(int i=0;i<chars2.length;i++)
chars2[i] = (char) ((bytes[i*2] << 8) + (bytes[i*2+1] & 0xFF));
String password = new String(chars2);
If you want to use a ByteBuffer and CharBuffer, don't do the simple .asCharBuffer(), which simply does an UTF-16 (LE or BE, depending on your system - you can set the byte-order with the order method) conversion (since the Java Strings and thus your char[] internally uses this encoding).
Use Charset.forName(charsetName), and then its encode or decode method, or the newEncoder /newDecoder.
When converting your byte[] to String, you also should indicate the encoding (and it should be the same one).
This is an extension to Peter Lawrey's answer. In order to backward (bytes-to-chars) conversion work correctly for the whole range of chars, the code should be as follows:
char[] chars = new char[bytes.length/2];
for (int i = 0; i < chars.length; i++) {
chars[i] = (char) (((bytes[i*2] & 0xff) << 8) + (bytes[i*2+1] & 0xff));
}
We need to "unsign" bytes before using (& 0xff). Otherwise half of the all possible char values will not get back correctly. For instance, chars within [0x80..0xff] range will be affected.
You should make use of getBytes() instead of toCharArray()
Replace the line
char[] password = "password".toCharArray();
with
byte[] password = "password".getBytes();
When you use GetBytes From a String in Java, The return result will depend on the default encode of your computer setting.(eg: StandardCharsetsUTF-8 or StandardCharsets.ISO_8859_1etc...).
So, whenever you want to getBytes from a String Object. Make sure to give a encode . like :
String sample = "abc";
Byte[] a_byte = sample .getBytes(StandardCharsets.UTF_8);
Let check what has happened with the code.
In java, the String named sample , is stored by Unicode. every char in String stored by 2 byte.
sample : value: "abc" in Memory(Hex): 00 61 00 62 00 63
a -> 00 61
b -> 00 62
c -> 00 63
But, When we getBytes From a String, we have
Byte[] a_byte = sample .getBytes(StandardCharsets.UTF_8)
//result is : 61 62 63
//length: 3 bytes
Byte[] a_byte = sample .getBytes(StandardCharsets.UTF_16BE)
//result is : 00 61 00 62 00 63
//length: 6 bytes
In order to get the oringle byte of the String. We can just read the Memory of the string and get Each byte of the String.Below is the sample Code:
public static byte[] charArray2ByteArray(char[] chars){
int length = chars.length;
byte[] result = new byte[length*2+2];
int i = 0;
for(int j = 0 ;j<chars.length;j++){
result[i++] = (byte)( (chars[j] & 0xFF00) >> 8 );
result[i++] = (byte)((chars[j] & 0x00FF)) ;
}
return result;
}
Usages:
String sample = "abc";
//First get the chars of the String,each char has two bytes(Java).
Char[] sample_chars = sample.toCharArray();
//Get the bytes
byte[] result = charArray2ByteArray(sample_chars).
//Back to String.
//Make sure we use UTF_16BE. Because we read the memory of Unicode of
//the String from Left to right. That's the same reading
//sequece of UTF-16BE.
String sample_back= new String(result , StandardCharsets.UTF_16BE);

Issue Decoding for a specific charset

I'm trying to decode a char and get back the same char.
Following is my simple test.
I'm confused, If i have to encode or decode. Tried both. Both print the same result.
Any suggestions are greatly helpful.
char inpData = '†';
String str = Character.toString((char) inpData);
byte b[] = str.getBytes(Charset.forName("MacRoman"));
System.out.println(b[0]); // prints -96
String decData = Integer.toString(b[0]);
CharsetDecoder decoder = Charset.forName("MacRoman").newDecoder();
ByteBuffer inBuffer = ByteBuffer.wrap(decData.getBytes());
CharBuffer result = decoder.decode(inBuffer);
System.out.println(result.toString()); // prints -96, expecting to print †
CharsetEncoder encoder = Charset.forName("MacRoman").newEncoder();
ByteBuffer bbuf = encoder.encode(CharBuffer.wrap(decData));
result = decoder.decode(bbuf);
System.out.println(result.toString());// prints -96, expecting to print †
Thank you.
When you do String decData = Integer.toString(b[0]);, you create the string "-96" and that is the string you're encoding/decoding. Not the original char.
You have to change your String back to a byte before.
To get your character back as a char from the -96 you have to do this :
String string = new String(b, "MacRoman");
char specialChar = string.charAt(0);
With this your reversing your first transformation from char -> String -> byte[0] by doing byte[0] -> String -> char[0]
If you have the String "-96", you must change first your string into a byte with :
byte b = Byte.parseByte("-96");
String decData = Integer.toString(b[0]);
This probably gets you the "-96" output in the last two examples. try
String decData = new String(b, "MacRoman");
Apart from that, keep in mind that System.out.println uses your system-charset to print out strings anyway. For a better test, consider writing your Strings to a file using your specific charset with something like
FileOutputStream fos = new FileOutputStream("test.txt");
OutputStreamWriter writer = new OutputStreamWriter(fos, "MacRoman");
writer.write(result.toString());
writer.close();

How to convert byte array to string and vice versa?

I have to convert a byte array to string in Android, but my byte array contains negative values.
If I convert that string again to byte array, values I am getting are different from original byte array values.
What can I do to get proper conversion? Code I am using to do the conversion is as follows:
// Code to convert byte arr to str:
byte[] by_original = {0,1,-2,3,-4,-5,6};
String str1 = new String(by_original);
System.out.println("str1 >> "+str1);
// Code to convert str to byte arr:
byte[] by_new = str1.getBytes();
for(int i=0;i<by_new.length;i++)
System.out.println("by1["+i+"] >> "+str1);
I am stuck in this problem.
Your byte array must have some encoding. The encoding cannot be ASCII if you've got negative values. Once you figure that out, you can convert a set of bytes to a String using:
byte[] bytes = {...}
String str = new String(bytes, StandardCharsets.UTF_8); // for UTF-8 encoding
There are a bunch of encodings you can use, look at the supported encodings in the Oracle javadocs.
The "proper conversion" between byte[] and String is to explicitly state the encoding you want to use. If you start with a byte[] and it does not in fact contain text data, there is no "proper conversion". Strings are for text, byte[] is for binary data, and the only really sensible thing to do is to avoid converting between them unless you absolutely have to.
If you really must use a String to hold binary data then the safest way is to use Base64 encoding.
The root problem is (I think) that you are unwittingly using a character set for which:
bytes != encode(decode(bytes))
in some cases. UTF-8 is an example of such a character set. Specifically, certain sequences of bytes are not valid encodings in UTF-8. If the UTF-8 decoder encounters one of these sequences, it is liable to discard the offending bytes or decode them as the Unicode codepoint for "no such character". Naturally, when you then try to encode the characters as bytes the result will be different.
The solution is:
Be explicit about the character encoding you are using; i.e. use a String constructor and String.toByteArray method with an explicit charset.
Use the right character set for your byte data ... or alternatively one (such as "Latin-1" where all byte sequences map to valid Unicode characters.
If your bytes are (really) binary data and you want to be able to transmit / receive them over a "text based" channel, use something like Base64 encoding ... which is designed for this purpose.
For Java, the most common character sets are in java.nio.charset.StandardCharsets. If you are encoding a string that can contain any Unicode character value then UTF-8 encoding (UTF_8) is recommended.
If you want a 1:1 mapping in Java then you can use ISO Latin Alphabet No. 1 - more commonly just called "Latin 1" or simply "Latin" (ISO_8859_1). Note that Latin-1 in Java is the IANA version of Latin-1 which assigns characters to all possible 256 values including control blocks C0 and C1. These are not printable: you won't see them in any output.
From Java 8 onwards Java contains java.util.Base64 for Base64 encoding / decoding. For URL-safe encoding you may want to to use Base64.getUrlEncoder instead of the standard encoder. This class is also present in Android since Android Oreo (8), API level 26.
We just need to construct a new String with the array: http://www.mkyong.com/java/how-do-convert-byte-array-to-string-in-java/
String s = new String(bytes);
The bytes of the resulting string differs depending on what charset you use. new String(bytes) and new String(bytes, Charset.forName("utf-8")) and new String(bytes, Charset.forName("utf-16")) will all have different byte arrays when you call String#getBytes() (depending on the default charset)
Using new String(byOriginal) and converting back to byte[] using getBytes() doesn't guarantee two byte[] with equal values. This is due to a call to StringCoding.encode(..) which will encode the String to Charset.defaultCharset(). During this encoding, the encoder might choose to replace unknown characters and do other changes. Hence, using String.getBytes() might not return an equal array as you've originally passed to the constructor.
Why was the problem: As someone already specified:
If you start with a byte[] and it does not in fact contain text data, there is no "proper conversion". Strings are for text, byte[] is for binary data, and the only really sensible thing to do is to avoid converting between them unless you absolutely have to.
I was observing this problem when I was trying to create byte[] from a pdf file and then converting it to String and then taking the String as input and converting back to file.
So make sure your encoding and decoding logic is same as I did. I explicitly encoded the byte[] to Base64 and decoded it to create the file again.
Use-case:
Due to some limitation I was trying to sent byte[] in request(POST) and the process was as follows:
PDF File >> Base64.encodeBase64(byte[]) >> String >> Send in request(POST) >> receive String >> Base64.decodeBase64(byte[]) >> create binary
Try this and this worked for me..
File file = new File("filePath");
byte[] byteArray = new byte[(int) file.length()];
try {
FileInputStream fileInputStream = new FileInputStream(file);
fileInputStream.read(byteArray);
String byteArrayStr= new String(Base64.encodeBase64(byteArray));
FileOutputStream fos = new FileOutputStream("newFilePath");
fos.write(Base64.decodeBase64(byteArrayStr.getBytes()));
fos.close();
}
catch (FileNotFoundException e) {
System.out.println("File Not Found.");
e.printStackTrace();
}
catch (IOException e1) {
System.out.println("Error Reading The File.");
e1.printStackTrace();
}
Even though
new String(bytes, "UTF-8")
is correct it throws a UnsupportedEncodingException which forces you to deal with a checked exception. You can use as an alternative another constructor since Java 1.6 to convert a byte array into a String:
new String(bytes, StandardCharsets.UTF_8)
This one does not throw any exception.
Converting back should be also done with StandardCharsets.UTF_8:
"test".getBytes(StandardCharsets.UTF_8)
Again you avoid having to deal with checked exceptions.
private static String toHexadecimal(byte[] digest){
String hash = "";
for(byte aux : digest) {
int b = aux & 0xff;
if (Integer.toHexString(b).length() == 1) hash += "0";
hash += Integer.toHexString(b);
}
return hash;
}
I did notice something that is not in any of the answers. You can cast each of the bytes in the byte array to characters, and put them in a char array. Then the string is new String(cbuf) where cbuf is the char array. To convert back, loop through the string casting each of the chars to bytes to put into a byte array, and this byte array will be the same as the first.
public class StringByteArrTest {
public static void main(String[] args) {
// put whatever byte array here
byte[] arr = new byte[] {-12, -100, -49, 100, -63, 0, -90};
for (byte b: arr) System.out.println(b);
// put data into this char array
char[] cbuf = new char[arr.length];
for (int i = 0; i < arr.length; i++) {
cbuf[i] = (char) arr[i];
}
// this is the string
String s = new String(cbuf);
System.out.println(s);
// converting back
byte[] out = new byte[s.length()];
for (int i = 0; i < s.length(); i++) {
out[i] = (byte) s.charAt(i);
}
for (byte b: out) System.out.println(b);
}
}
This works fine for me:
String cd = "Holding some value";
Converting from string to byte[]:
byte[] cookie = new sun.misc.BASE64Decoder().decodeBuffer(cd);
Converting from byte[] to string:
cd = new sun.misc.BASE64Encoder().encode(cookie);
Following is the sample code safely converts byte array to String and String to byte array back.
byte bytesArray[] = { 1, -2, 4, -5, 10};
String encoded = java.util.Base64.getEncoder().encodeToString(bytesArray);
byte[] decoded = java.util.Base64.getDecoder().decode(encoded);
System.out.println("input: "+Arrays.toString(bytesArray));
System.out.println("encoded: "+encoded);
System.out.println("decoded: "+Arrays.toString(decoded));
Output:
input: [1, -2, 4, -5, 10]
encoded: Af4E+wo=
decoded: [1, -2, 4, -5, 10]
javax.xml.bind.DatatypeConverter should do it:
byte [] b = javax.xml.bind.DatatypeConverter.parseHexBinary("E62DB");
String s = javax.xml.bind.DatatypeConverter.printHexBinary(b);
byte[] bytes = "Techie Delight".getBytes();
// System.out.println(Arrays.toString(bytes));
// Create a string from the byte array without specifying
// character encoding
String string = new String(bytes);
System.out.println(string);
Heres a few methods that convert an array of bytes to a string. I've tested them they work well.
public String getStringFromByteArray(byte[] settingsData) {
ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(settingsData);
Reader reader = new BufferedReader(new InputStreamReader(byteArrayInputStream));
StringBuilder sb = new StringBuilder();
int byteChar;
try {
while((byteChar = reader.read()) != -1) {
sb.append((char) byteChar);
}
}
catch(IOException e) {
e.printStackTrace();
}
return sb.toString();
}
public String getStringFromByteArray(byte[] settingsData) {
StringBuilder sb = new StringBuilder();
for(byte willBeChar: settingsData) {
sb.append((char) willBeChar);
}
return sb.toString();
}
While base64 encoding is safe and one could argue "the right answer", I arrived here looking for a way to convert a Java byte array to/from a Java String as-is. That is, where each member of the byte array remains intact in its String counterpart, with no extra space required for encoding/transport.
This answer describing 8bit transparent encodings was very helpful for me. I used ISO-8859-1 on terabytes of binary data to convert back and forth successfully (binary <-> String) without the inflated space requirements needed for a base64 encoding, so is safe for my use-case - YMMV.
This was also helpful in explaining when/if you should experiment.
Using Kotlin on Android I found out it is handy to create some simple extension functions for that purpose.
Solution based on Base64 encoding/decoding to be able to pass via JSON, XML, etc:
import android.util.Base64
fun ByteArray.encodeToString() = String(Base64.encode(this, Base64.NO_WRAP), Charsets.UTF_8)
fun String.decodeToBytes(): ByteArray = Base64.decode(toByteArray(Charsets.UTF_8), Base64.NO_WRAP)
So you can use it
val byteArray = byteArrayOf(0, 1, 2, -1, -2, -3)
val string = byteArray.encodeToString()
val restoredArray = string.decodeToBytes()
import sun.misc.BASE64Decoder;
import sun.misc.BASE64Encoder;
private static String base64Encode(byte[] bytes)
{
return new BASE64Encoder().encode(bytes);
}
private static byte[] base64Decode(String s) throws IOException
{
return new BASE64Decoder().decodeBuffer(s);
}
I succeeded converting byte array to a string with this method:
public static String byteArrayToString(byte[] data){
String response = Arrays.toString(data);
String[] byteValues = response.substring(1, response.length() - 1).split(",");
byte[] bytes = new byte[byteValues.length];
for (int i=0, len=bytes.length; i<len; i++) {
bytes[i] = Byte.parseByte(byteValues[i].trim());
}
String str = new String(bytes);
return str.toLowerCase();
}
This one works for me up to android Q:
You can use the following method to convert o hex string to string
public static String hexToString(String hex) {
StringBuilder sb = new StringBuilder();
char[] hexData = hex.toCharArray();
for (int count = 0; count < hexData.length - 1; count += 2) {
int firstDigit = Character.digit(hexData[count], 16);
int lastDigit = Character.digit(hexData[count + 1], 16);
int decimal = firstDigit * 16 + lastDigit;
sb.append((char)decimal);
}
return sb.toString();
}
with the following to convert a byte array to a hex string
public static String bytesToHex(byte[] bytes) {
char[] hexChars = new char[bytes.length * 2];
for (int j = 0; j < bytes.length; j++) {
int v = bytes[j] & 0xFF;
hexChars[j * 2] = hexArray[v >>> 4];
hexChars[j * 2 + 1] = hexArray[v & 0x0F];
}
return new String(hexChars);
}
Here the working code.
// Encode byte array into string . TemplateBuffer1 is my bytearry variable.
String finger_buffer = Base64.encodeToString(templateBuffer1, Base64.DEFAULT);
Log.d(TAG, "Captured biometric device->" + finger_buffer);
// Decode String into Byte Array. decodedString is my bytearray[]
decodedString = Base64.decode(finger_buffer, Base64.DEFAULT);
You can use simple for loop for conversion:
public void byteArrToString(){
byte[] b = {'a','b','$'};
String str = "";
for(int i=0; i<b.length; i++){
char c = (char) b[i];
str+=c;
}
System.out.println(str);
}
byte[] image = {...};
String imageString = Base64.encodeToString(image, Base64.NO_WRAP);
Try to specify an 8-bit charset in both conversions. ISO-8859-1 for instance.
Read the bytes from String using ByteArrayInputStream and wrap it with BufferedReader which is Char Stream instead of Byte Stream which converts the byte data to String.
package com.cs.sajal;
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
public class TestCls {
public static void main(String[] args) {
String s=new String("Sajal is a good boy");
try
{
ByteArrayInputStream bis;
bis=new ByteArrayInputStream(s.getBytes("UTF-8"));
BufferedReader br=new BufferedReader(new InputStreamReader(bis));
System.out.println(br.readLine());
}
catch(Exception e)
{
e.printStackTrace();
}
}
}
Output is:
Sajal is a good boy
You can do the following to convert byte array to string and then convert that string to byte array:
// 1. convert byte array to string and then string to byte array
// convert byte array to string
byte[] by_original = {0, 1, -2, 3, -4, -5, 6};
String str1 = Arrays.toString(by_original);
System.out.println(str1); // output: [0, 1, -2, 3, -4, -5, 6]
// convert string to byte array
String newString = str1.substring(1, str1.length()-1);
String[] stringArray = newString.split(", ");
byte[] by_new = new byte[stringArray.length];
for(int i=0; i<stringArray.length; i++) {
by_new[i] = (byte) Integer.parseInt(stringArray[i]);
}
System.out.println(Arrays.toString(by_new)); // output: [0, 1, -2, 3, -4, -5, 6]
But to convert the string to byte array and then convert that byte array to string, below approach can be used:
// 2. convert string to byte array and then byte array to string
// convert string to byte array
String str2 = "[0, 1, -2, 3, -4, -5, 6]";
byte[] byteStr2 = str2.getBytes(StandardCharsets.UTF_8);
// Now byteStr2 is [91, 48, 44, 32, 49, 44, 32, 45, 50, 44, 32, 51, 44, 32, 45, 52, 44, 32, 45, 53, 44, 32, 54, 93]
// convert byte array to string
System.out.println(new String(byteStr2, StandardCharsets.UTF_8)); // output: [0, 1, -2, 3, -4, -5, 6]
A string is a collection of char's (16bit unsigned). So if you are going to convert negative numbers into a string, they'll be lost in translation.
public class byteString {
/**
* #param args
*/
public static void main(String[] args) throws Exception {
// TODO Auto-generated method stub
String msg = "Hello";
byte[] buff = new byte[1024];
buff = msg.getBytes("UTF-8");
System.out.println(buff);
String m = new String(buff);
System.out.println(m);
}
}

Categories

Resources