How to convert byte array to string and vice versa? - java

I have to convert a byte array to a string in Android, but my byte array contains negative values.
If I convert that string back to a byte array, the values I get are different from the original byte array values.
What can I do to get a proper conversion? The code I am using is as follows:
// Code to convert byte arr to str:
byte[] by_original = {0,1,-2,3,-4,-5,6};
String str1 = new String(by_original);
System.out.println("str1 >> "+str1);
// Code to convert str to byte arr:
byte[] by_new = str1.getBytes();
for(int i=0;i<by_new.length;i++)
System.out.println("by1["+i+"] >> "+by_new[i]);
I am stuck on this problem.

Your byte array must have some encoding. The encoding cannot be ASCII if you've got negative values. Once you figure that out, you can convert a set of bytes to a String using:
byte[] bytes = {...}
String str = new String(bytes, StandardCharsets.UTF_8); // for UTF-8 encoding
There are many encodings you can use; see the list of supported encodings in the Oracle Javadocs.

The "proper conversion" between byte[] and String is to explicitly state the encoding you want to use. If you start with a byte[] and it does not in fact contain text data, there is no "proper conversion". Strings are for text, byte[] is for binary data, and the only really sensible thing to do is to avoid converting between them unless you absolutely have to.
If you really must use a String to hold binary data then the safest way is to use Base64 encoding.

The root problem is (I think) that you are unwittingly using a character set for which:
bytes != encode(decode(bytes))
in some cases. UTF-8 is an example of such a character set. Specifically, certain sequences of bytes are not valid encodings in UTF-8. If the UTF-8 decoder encounters one of these sequences, it is liable to discard the offending bytes or decode them as the Unicode codepoint for "no such character". Naturally, when you then try to encode the characters as bytes the result will be different.
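To see this concretely, here is a minimal sketch using the byte array from the question and an explicit UTF-8 charset (the class and method names are made up for illustration). The bytes 0xFE, 0xFC and 0xFB are never valid in UTF-8, so the decoder replaces each of them with U+FFFD, and re-encoding U+FFFD produces three bytes (EF BF BD) instead of one:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8RoundTrip {
    // Decodes the bytes as UTF-8, then re-encodes the resulting String as UTF-8.
    static byte[] roundTrip(byte[] input) {
        String decoded = new String(input, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] original = {0, 1, -2, 3, -4, -5, 6}; // byte array from the question
        byte[] back = roundTrip(original);
        System.out.println(Arrays.toString(original));
        System.out.println(Arrays.toString(back));
        System.out.println("equal? " + Arrays.equals(original, back)); // equal? false
    }
}
```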
The solution is:
Be explicit about the character encoding you are using; i.e. use a String constructor and the String.getBytes method with an explicit charset.
Use the right character set for your byte data ... or alternatively one (such as "Latin-1") where all byte sequences map to valid Unicode characters.
If your bytes are (really) binary data and you want to be able to transmit / receive them over a "text based" channel, use something like Base64 encoding ... which is designed for this purpose.
For Java, the most common character sets are in java.nio.charset.StandardCharsets. If you are encoding a string that can contain any Unicode character value then UTF-8 encoding (UTF_8) is recommended.
If you want a 1:1 mapping in Java then you can use ISO Latin Alphabet No. 1 - more commonly just called "Latin 1" or simply "Latin" (ISO_8859_1). Note that Latin-1 in Java is the IANA version of Latin-1 which assigns characters to all possible 256 values including control blocks C0 and C1. These are not printable: you won't see them in any output.
From Java 8 onwards Java contains java.util.Base64 for Base64 encoding / decoding. For URL-safe encoding you may want to use Base64.getUrlEncoder instead of the standard encoder. This class is also present in Android since Android Oreo (8), API level 26.
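As a sketch of the URL-safe variant (assuming Java 8+; the wrapper class name is hypothetical): the bytes {1, -2, 4, -5, 10} encode to "Af4E+wo=" with the standard encoder, while the URL-safe encoder uses '-' and '_' in place of '+' and '/':

```java
import java.util.Arrays;
import java.util.Base64;

class UrlSafeBase64 {
    // URL-safe alphabet: '-' and '_' replace '+' and '/'; '=' padding is kept by default.
    static String encode(byte[] data) {
        return Base64.getUrlEncoder().encodeToString(data);
    }

    static byte[] decode(String text) {
        return Base64.getUrlDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] original = {1, -2, 4, -5, 10};
        System.out.println(Base64.getEncoder().encodeToString(original)); // Af4E+wo=
        String urlSafe = encode(original);
        System.out.println(urlSafe); // Af4E-wo=
        System.out.println(Arrays.equals(original, decode(urlSafe))); // true
    }
}
```

If the trailing '=' is inconvenient in a URL, Base64.getUrlEncoder().withoutPadding() drops it; the decoder accepts input with or without padding.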

We just need to construct a new String with the array: http://www.mkyong.com/java/how-do-convert-byte-array-to-string-in-java/
String s = new String(bytes);
The bytes of the resulting string differ depending on which charset you use: new String(bytes), new String(bytes, Charset.forName("utf-8")) and new String(bytes, Charset.forName("utf-16")) can all produce different byte arrays when you call String#getBytes() (depending on the default charset).

Using new String(byOriginal) and converting back to byte[] using getBytes() doesn't guarantee two byte[] with equal values. This is due to a call to StringCoding.encode(..), which encodes the String using Charset.defaultCharset(). During this encoding (and during the earlier decoding in the constructor), malformed or unmappable input may be replaced with a substitute character, among other changes. Hence, using String.getBytes() might not return an array equal to the one you originally passed to the constructor.
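If you would rather detect bad input than have it silently replaced, a CharsetDecoder can report it instead. The String constructors always substitute the replacement character for malformed input, whereas a decoder obtained from newDecoder() reports errors by default; the sketch below (class name hypothetical, UTF-8 assumed) sets CodingErrorAction.REPORT explicitly for clarity:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictDecode {
    // Decodes UTF-8 but throws instead of substituting U+FFFD for bad bytes.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT) // REPORT is already the default
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        try {
            System.out.println(decodeStrict(new byte[] {104, 105})); // hi
            decodeStrict(new byte[] {0, 1, -2, 3, -4, -5, 6}); // throws
        } catch (CharacterCodingException e) {
            System.out.println("not valid UTF-8: " + e);
        }
    }
}
```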

Why the problem occurs: as someone already pointed out,
If you start with a byte[] and it does not in fact contain text data, there is no "proper conversion". Strings are for text, byte[] is for binary data, and the only really sensible thing to do is to avoid converting between them unless you absolutely have to.
I ran into this problem when I created a byte[] from a PDF file, converted it to a String, and then took that String as input and converted it back to a file.
So make sure your encoding and decoding logic match. I explicitly encoded the byte[] to Base64 and decoded it to create the file again.
Use case:
Due to some limitation I had to send the byte[] in a POST request, and the process was as follows:
PDF File >> Base64.encodeBase64(byte[]) >> String >> Send in request(POST) >> receive String >> Base64.decodeBase64(byte[]) >> create binary
Try this; it worked for me:
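File file = new File("filePath");
byte[] byteArray = new byte[(int) file.length()];
// Base64 here is org.apache.commons.codec.binary.Base64 (Apache Commons Codec).
// try-with-resources closes both streams; readFully() keeps reading until
// byteArray is full, whereas a single read() may return fewer bytes.
try (DataInputStream in = new DataInputStream(new FileInputStream(file));
FileOutputStream fos = new FileOutputStream("newFilePath")) {
in.readFully(byteArray);
// Base64 output is pure ASCII, so decoding it as US-ASCII is lossless
String byteArrayStr = new String(Base64.encodeBase64(byteArray), StandardCharsets.US_ASCII);
fos.write(Base64.decodeBase64(byteArrayStr));
}
catch (FileNotFoundException e) {
System.out.println("File Not Found.");
e.printStackTrace();
}
catch (IOException e1) {
System.out.println("Error Reading The File.");
e1.printStackTrace();
}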

Even though
new String(bytes, "UTF-8")
is correct, it declares the checked UnsupportedEncodingException, which forces you to handle it. As an alternative, you can use another constructor, available since Java 1.6, together with the StandardCharsets constants added in Java 7:
new String(bytes, StandardCharsets.UTF_8)
This one does not throw any checked exception.
Converting back should also be done with StandardCharsets.UTF_8:
"test".getBytes(StandardCharsets.UTF_8)
Again you avoid having to deal with checked exceptions.

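private static String toHexadecimal(byte[] digest){
// two lowercase hex digits per byte; "& 0xff" treats the byte as unsigned (0..255)
StringBuilder hash = new StringBuilder(digest.length * 2);
for(byte aux : digest) {
int b = aux & 0xff;
if (b < 0x10) hash.append('0'); // pad single-digit values
hash.append(Integer.toHexString(b));
}
return hash.toString();
}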

I noticed something that is not in any of the other answers. You can cast each of the bytes in the byte array to a char and put them in a char array. Then the string is new String(cbuf), where cbuf is the char array. To convert back, loop through the string, casting each char to a byte, and the resulting byte array will be the same as the original (negative bytes become chars in the range U+FF80..U+FFFF, and the cast back to byte truncates them to the same value).
public class StringByteArrTest {
public static void main(String[] args) {
// put whatever byte array here
byte[] arr = new byte[] {-12, -100, -49, 100, -63, 0, -90};
for (byte b: arr) System.out.println(b);
// put data into this char array
char[] cbuf = new char[arr.length];
for (int i = 0; i < arr.length; i++) {
cbuf[i] = (char) arr[i];
}
// this is the string
String s = new String(cbuf);
System.out.println(s);
// converting back
byte[] out = new byte[s.length()];
for (int i = 0; i < s.length(); i++) {
out[i] = (byte) s.charAt(i);
}
for (byte b: out) System.out.println(b);
}
}

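This works fine for me (cd must contain Base64 text; note that sun.misc.BASE64Encoder and BASE64Decoder are internal JDK classes that were removed in JDK 9, and java.util.Base64 is the supported replacement since Java 8):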
String cd = "Holding some value";
Converting from string to byte[]:
byte[] cookie = new sun.misc.BASE64Decoder().decodeBuffer(cd);
Converting from byte[] to string:
cd = new sun.misc.BASE64Encoder().encode(cookie);

The following sample code safely converts a byte array to a String and the String back to a byte array:
byte bytesArray[] = { 1, -2, 4, -5, 10};
String encoded = java.util.Base64.getEncoder().encodeToString(bytesArray);
byte[] decoded = java.util.Base64.getDecoder().decode(encoded);
System.out.println("input: "+Arrays.toString(bytesArray));
System.out.println("encoded: "+encoded);
System.out.println("decoded: "+Arrays.toString(decoded));
Output:
input: [1, -2, 4, -5, 10]
encoded: Af4E+wo=
decoded: [1, -2, 4, -5, 10]

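javax.xml.bind.DatatypeConverter should do it (the hex string must have an even number of digits, and javax.xml.bind was removed from the JDK in Java 11, so newer projects need the JAXB API as a separate dependency):
byte [] b = javax.xml.bind.DatatypeConverter.parseHexBinary("0E62DB");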
String s = javax.xml.bind.DatatypeConverter.printHexBinary(b);
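On Java 17 and later, java.util.HexFormat covers the same hex conversions without the JAXB dependency. A minimal sketch, assuming Java 17+ (the wrapper class and method names are hypothetical):

```java
import java.util.HexFormat;

class HexConversion {
    // HexFormat.of(): lowercase digits, no delimiters (requires Java 17+)
    private static final HexFormat HEX = HexFormat.of();

    // Accepts upper- or lowercase digits; throws IllegalArgumentException
    // for an odd number of digits or a non-hex character.
    static byte[] parse(String hex) {
        return HEX.parseHex(hex);
    }

    static String format(byte[] bytes) {
        return HEX.formatHex(bytes);
    }

    public static void main(String[] args) {
        byte[] b = parse("0E62DB");
        System.out.println(b.length); // 3
        System.out.println(format(b)); // 0e62db
        System.out.println(HEX.withUpperCase().formatHex(b)); // 0E62DB
    }
}
```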

byte[] bytes = "Techie Delight".getBytes();
// System.out.println(Arrays.toString(bytes));
// Create a string from the byte array without specifying a
// character encoding (the platform default charset is used)
String string = new String(bytes);
System.out.println(string);

Here are a few methods that convert an array of bytes to a string. I've tested them and they work well.
public String getStringFromByteArray(byte[] settingsData) {
ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(settingsData);
// InputStreamReader without a charset argument uses the platform default charset
Reader reader = new BufferedReader(new InputStreamReader(byteArrayInputStream));
StringBuilder sb = new StringBuilder();
int byteChar;
try {
while((byteChar = reader.read()) != -1) {
sb.append((char) byteChar);
}
}
catch(IOException e) {
e.printStackTrace();
}
return sb.toString();
}
// Renamed so both methods can live in one class. Casting each byte to a char
// only gives the expected text for ASCII bytes (negative bytes are sign-extended).
public String getStringFromByteArrayByCast(byte[] settingsData) {
StringBuilder sb = new StringBuilder();
for(byte willBeChar: settingsData) {
sb.append((char) willBeChar);
}
return sb.toString();
}

While Base64 encoding is safe and arguably "the right answer", I arrived here looking for a way to convert a Java byte array to/from a Java String as-is. That is, where each member of the byte array remains intact in its String counterpart, with no extra space required for encoding/transport.
This answer describing 8-bit transparent encodings was very helpful for me. I used ISO-8859-1 on terabytes of binary data to convert back and forth successfully (binary <-> String) without the roughly 33% size increase of Base64 encoding, so it is safe for my use case; YMMV.
This was also helpful in explaining when/if you should experiment.

Using Kotlin on Android I found out it is handy to create some simple extension functions for that purpose.
Solution based on Base64 encoding/decoding to be able to pass via JSON, XML, etc:
import android.util.Base64
fun ByteArray.encodeToString() = String(Base64.encode(this, Base64.NO_WRAP), Charsets.UTF_8)
fun String.decodeToBytes(): ByteArray = Base64.decode(toByteArray(Charsets.UTF_8), Base64.NO_WRAP)
You can then use it like this:
val byteArray = byteArrayOf(0, 1, 2, -1, -2, -3)
val string = byteArray.encodeToString()
val restoredArray = string.decodeToBytes()

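// sun.misc.BASE64Encoder/BASE64Decoder are internal JDK classes, removed in JDK 9;
// java.util.Base64 (Java 8+) is the supported alternative.
import sun.misc.BASE64Decoder;
import sun.misc.BASE64Encoder;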
private static String base64Encode(byte[] bytes)
{
return new BASE64Encoder().encode(bytes);
}
private static byte[] base64Decode(String s) throws IOException
{
return new BASE64Decoder().decodeBuffer(s);
}

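I succeeded in converting a byte array to a string with this method:
public static String byteArrayToString(byte[] data){
// Formatting with Arrays.toString and parsing back rebuilds the same array, so the
// result equals new String(data).toLowerCase() (platform default charset)
String response = Arrays.toString(data);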
String[] byteValues = response.substring(1, response.length() - 1).split(",");
byte[] bytes = new byte[byteValues.length];
for (int i=0, len=bytes.length; i<len; i++) {
bytes[i] = Byte.parseByte(byteValues[i].trim());
}
String str = new String(bytes);
return str.toLowerCase();
}

This one works for me up to Android Q.
You can use the following method to convert a hex string to a string:
public static String hexToString(String hex) {
StringBuilder sb = new StringBuilder();
char[] hexData = hex.toCharArray();
for (int count = 0; count < hexData.length - 1; count += 2) {
int firstDigit = Character.digit(hexData[count], 16);
int lastDigit = Character.digit(hexData[count + 1], 16);
int decimal = firstDigit * 16 + lastDigit;
sb.append((char)decimal);
}
return sb.toString();
}
together with the following to convert a byte array to a hex string:
private static final char[] hexArray = "0123456789ABCDEF".toCharArray();
public static String bytesToHex(byte[] bytes) {
char[] hexChars = new char[bytes.length * 2];
for (int j = 0; j < bytes.length; j++) {
int v = bytes[j] & 0xFF;
hexChars[j * 2] = hexArray[v >>> 4];
hexChars[j * 2 + 1] = hexArray[v & 0x0F];
}
return new String(hexChars);
}

Here is the working code (using android.util.Base64):
// Encode byte array into string. templateBuffer1 is my byte array variable.
String finger_buffer = Base64.encodeToString(templateBuffer1, Base64.DEFAULT);
Log.d(TAG, "Captured biometric device->" + finger_buffer);
// Decode String into byte array. decodedString is my byte[] variable.
decodedString = Base64.decode(finger_buffer, Base64.DEFAULT);

You can use a simple for loop for the conversion:
public void byteArrToString(){
byte[] b = {'a','b','$'};
String str = "";
for(int i=0; i<b.length; i++){
char c = (char) b[i];
str+=c;
}
System.out.println(str);
}

byte[] image = {...};
String imageString = Base64.encodeToString(image, Base64.NO_WRAP);

Try specifying an 8-bit charset in both conversions, ISO-8859-1 for instance.
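A minimal sketch of that suggestion using the bytes from the question (class and method names are hypothetical). ISO-8859-1 maps every byte value 0 to 255 to the character with the same code point, so decoding and re-encoding with it returns the original array:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1RoundTrip {
    // ISO-8859-1 decodes each byte to the char with the same unsigned value (U+0000..U+00FF).
    static String toLatin1String(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // ...and encodes each of those chars back to the same single byte.
    static byte[] fromLatin1String(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] original = {0, 1, -2, 3, -4, -5, 6}; // byte array from the question
        String s = toLatin1String(original);
        byte[] back = fromLatin1String(s);
        System.out.println(Arrays.equals(original, back)); // true
        System.out.println((int) s.charAt(2)); // 254, the char for byte -2
    }
}
```

This only holds in the byte[] to String to byte[] direction: a String containing characters above U+00FF cannot be represented in ISO-8859-1, and getBytes replaces those characters with '?'.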

Read the bytes of the String through a ByteArrayInputStream and wrap it in an InputStreamReader and a BufferedReader; the reader is a character stream rather than a byte stream, so it converts the byte data back to a String.
package com.cs.sajal;
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
public class TestCls {
public static void main(String[] args) {
String s = "Sajal is a good boy";
try
{
ByteArrayInputStream bis;
bis=new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
// decode with the same charset that was used to encode
BufferedReader br=new BufferedReader(new InputStreamReader(bis, StandardCharsets.UTF_8));
System.out.println(br.readLine());
}
catch(Exception e)
{
e.printStackTrace();
}
}
}
Output is:
Sajal is a good boy

You can do the following to convert byte array to string and then convert that string to byte array:
// 1. convert byte array to string and then string to byte array
// convert byte array to string
byte[] by_original = {0, 1, -2, 3, -4, -5, 6};
String str1 = Arrays.toString(by_original);
System.out.println(str1); // output: [0, 1, -2, 3, -4, -5, 6]
// convert string to byte array
String newString = str1.substring(1, str1.length()-1);
String[] stringArray = newString.split(", ");
byte[] by_new = new byte[stringArray.length];
for(int i=0; i<stringArray.length; i++) {
by_new[i] = (byte) Integer.parseInt(stringArray[i]);
}
System.out.println(Arrays.toString(by_new)); // output: [0, 1, -2, 3, -4, -5, 6]
To go the other way, converting a string to a byte array and then that byte array back to a string, the following approach can be used:
// 2. convert string to byte array and then byte array to string
// convert string to byte array
String str2 = "[0, 1, -2, 3, -4, -5, 6]";
byte[] byteStr2 = str2.getBytes(StandardCharsets.UTF_8);
// Now byteStr2 is [91, 48, 44, 32, 49, 44, 32, 45, 50, 44, 32, 51, 44, 32, 45, 52, 44, 32, 45, 53, 44, 32, 54, 93]
// convert byte array to string
System.out.println(new String(byteStr2, StandardCharsets.UTF_8)); // output: [0, 1, -2, 3, -4, -5, 6]

A String is a sequence of chars (16-bit unsigned values), so negative byte values have no direct counterpart. Unless the charset maps every byte value to a character and back (as ISO-8859-1 does), they can be lost in translation.

public class byteString {
/**
* @param args
*/
public static void main(String[] args) throws Exception {
String msg = "Hello";
byte[] buff = msg.getBytes("UTF-8");
// println(byte[]) would print the array's type and hash code, not its contents
System.out.println(java.util.Arrays.toString(buff));
String m = new String(buff, "UTF-8");
System.out.println(m);
}
}

Related

Efficient way to convert a String (representing HEX) to a byte array? Android

I'm working on a project that involves receiving a byte array over wireless, the Android app reads this as a String over a TCP connection:
input = new BufferedReader(new InputStreamReader(this.clientSocket.getInputStream()));
...
...
//Loop
String read = input.readLine();
//Do something meaningful with String read...
The String will always be of a fixed format, i.e. the first 3 characters will be an ID and the following 20 characters will be the message data. The number of characters will not change (3 + 20 characters = 23, plus a starting '[' and an ending ']' character, so that's 25 characters in total).
An example of a String received by the application would be [01A01020304050A0B0C0D]
ID - 0x01A
Byte0 0x01
Byte1 0x02
Byte2 0x03
Byte3 0x04
Byte4 0x05
Byte5 0x0A
Byte6 0x0B
Byte7 0x0C
Byte8 0x0D
I would guess that I would have to use the substring operation, but I'm having trouble converting the substring to a byte value (note: the app is expecting byte[] and not Byte[]) and I feel I'm not doing it efficiently. I came across this piece of code that I've been using:
public static byte[] hexStringToByteArray(String s) {
int len = s.length();
byte[] data = new byte[len / 2];
for (int i = 0; i < len; i += 2) {
data[i / 2] = (byte) ((Character.digit(s.charAt(i), 16) << 4)
+ Character.digit(s.charAt(i+1), 16));
}
return data;
}
This is returning a byte array of size 1 and will have to be run 9 times (9 bytes) per message. I'm a bit concerned that this may be a bit too strenuous on processing, especially when the application is receiving messages very frequently (roughly about 10-15 messages per second)
I appreciate any thoughts and many thanks in advance!
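One way to avoid calling the decoder once per byte is to slice out the whole payload and decode it in a single call. A minimal sketch, assuming the frame layout of the example above ('[', three hex ID characters, the payload hex, ']'). The example carries 18 payload hex characters (9 bytes) rather than the 20 mentioned in the text, so the sketch slices relative to the brackets instead of hard-coding a length. MessageParser, parseId and parsePayload are hypothetical names:

```java
class MessageParser {
    // Hex decoder from the question (with its closing brace restored).
    static byte[] hexStringToByteArray(String s) {
        int len = s.length();
        byte[] data = new byte[len / 2];
        for (int i = 0; i < len; i += 2) {
            data[i / 2] = (byte) ((Character.digit(s.charAt(i), 16) << 4)
                    + Character.digit(s.charAt(i + 1), 16));
        }
        return data;
    }

    // Characters 1..3 (after '[') hold the ID as three hex digits.
    static int parseId(String frame) {
        return Integer.parseInt(frame.substring(1, 4), 16);
    }

    // Everything between the ID and the closing ']' is decoded in one call.
    static byte[] parsePayload(String frame) {
        return hexStringToByteArray(frame.substring(4, frame.length() - 1));
    }

    public static void main(String[] args) {
        String frame = "[01A01020304050A0B0C0D]";
        System.out.println(parseId(frame)); // 26 (0x01A)
        System.out.println(parsePayload(frame).length); // 9
    }
}
```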
Just use this:
byte[] decodedString = Base64.decode(your_string, Base64.DEFAULT);
byte[] b = string.getBytes();
byte[] b = string.getBytes(Charset.forName("UTF-8"));
byte[] b = string.getBytes("UTF-8");
There is no way to be more efficient than using these methods.
Best and simple way:
String myString = "This is my string";
byte[] myByteArray = myString.getBytes("UTF-8");
Now you can easily access the ID, the message or anything else from myByteArray.
Just write your data like
byte[] data = yourData.getBytes();
os.write(data, 0, data.length); // data is 23 bytes
os.flush();
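What about reading through an InputStream? Since you mentioned in your question that the String is 23 characters long, just do something like:
public byte[] readData(InputStream is) throws IOException {
byte[] data = new byte[23];
// a single read() may return fewer bytes than requested, so keep reading until full
int total = 0;
while (total < data.length) {
int n = is.read(data, total, data.length - total);
if (n < 0) break; // end of stream
total += n;
}
System.out.println("Read: " + total);
return data;
}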
When you have the data, you can split it like this:
byte[] tempId = new byte[3];
System.arraycopy(data, 0, tempId, 0, tempId.length);
byte[] tempMessage = new byte[20];
System.arraycopy(data, 3, tempMessage, 0, tempMessage.length);
String id = new String(tempId);
String message = new String(tempMessage);
Now you have the ID and the message separated and converted into Strings.
byte[] array = yourString.getBytes("UTF-8");

Convert byte to string in Java

I use below code to convert byte to string:
System.out.println("string " + Byte.toString((byte)0x63));
Why does it print "string 99"?
How do I modify it to print "string c"?
System.out.println(new String(new byte[]{ (byte)0x63 }, "US-ASCII"));
Note especially that converting bytes to Strings always involves an encoding. If you do not specify it, you'll be using the platform default encoding, which means the code can break when running in different environments.
The string ctor is suitable for this conversion:
System.out.println("string " + new String(new byte[] {0x63}));
Use char instead of byte:
System.out.println("string " + (char)0x63);
Or if you want to be a Unicode puritan, you use codepoints:
System.out.println("string " + new String(new int[]{ 0x63 }, 0, 1));
And if you like the old skool US-ASCII "every byte is a character" idea:
System.out.println("string " + new String(new byte[]{ (byte)0x63 },
StandardCharsets.US_ASCII));
Avoid using the String(byte[]) constructor recommended in other answers; it relies on the default charset. Circumstances could arise where 0x63 actually isn't the character c.
You can use printf:
System.out.printf("string %c\n", 0x63);
You can as well create a String with such formatting, using String#format:
String s = String.format("string %c", 0x63);
You can cast it to char instead: the character equivalent to 0x63 is 'c', but its numeric byte value is 99.
System.out.println("byte "+(char)0x63);
You have to construct a new string out of a byte array. The first element in your byteArray should be 0x63. If you want to add any more letters, make the byteArray longer and add them to the next indices.
byte[] byteArray = new byte[1];
byteArray[0] = 0x63;
try {
System.out.println("string " + new String(byteArray, "US-ASCII"));
} catch (UnsupportedEncodingException e) {
// TODO: Handle exception.
e.printStackTrace();
}
Note that this constructor declares the checked UnsupportedEncodingException (thrown only if the charset name is not supported), so you must handle it accordingly.
If it's a single byte, just cast the byte to a char: for values 0 to 127 this gives the char whose code point equals the byte's value (negative bytes are sign-extended by the cast, so use (char) (b & 0xFF) to get the ISO-8859-1 character). If not, use the String constructor as mentioned elsewhere.
char ch = (char)0x63;
System.out.println(ch);
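The plain cast works for 0x63 because it is below 0x80. For a byte above 0x7F (a negative Java byte) the value is sign-extended before the cast, so masking with 0xFF is needed to get the ISO-8859-1 character. A small sketch (class and method names are hypothetical):

```java
class ByteToChar {
    // Plain cast: the byte is sign-extended to int first, so negative bytes
    // end up as chars in the range U+FF80..U+FFFF.
    static char castDirect(byte b) {
        return (char) b;
    }

    // Masking with 0xFF keeps the unsigned value 0..255, which is the
    // ISO-8859-1 character for that byte.
    static char castUnsigned(byte b) {
        return (char) (b & 0xFF);
    }

    public static void main(String[] args) {
        byte high = (byte) 0xE9; // -23 as a Java byte
        System.out.println(castDirect((byte) 0x63)); // c
        System.out.println((int) castDirect(high)); // 65513, i.e. 0xFFE9
        System.out.println((int) castUnsigned(high)); // 233, i.e. 0xE9
    }
}
```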
String str = "0x63";
int temp = Integer.parseInt(str.substring(2, 4), 16);
char c = (char)temp;
System.out.print(c);
This is my version:
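public String convertBytestoString(InputStream inputStream) throws IOException
{
// reads at most 1024 bytes in a single read() call; decoded with the platform default charset
int bytes;
byte[] buffer = new byte[1024];
bytes = inputStream.read(buffer);
if (bytes == -1) return ""; // end of stream
String stringData = new String(buffer,0,bytes);
return stringData;
}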
Using the StringBuilder class in Java (this stops at the first zero byte, like a C string):
StringBuilder str = new StringBuilder();
for (byte aByte : bytesArray) {
if (aByte != 0) {
str.append((char) aByte);
} else {
break;
}
}

Convert a String to a byte array and then back to the original String

Is it possible to convert a string to a byte array and then convert it back to the original string in Java or Android?
My objective is to send some strings to a microcontroller (Arduino) and store them in its EEPROM (which is only 1 KB). I tried to use an MD5 hash, but it seems to be only one-way. What can I do to deal with this issue?
I would suggest using the methods of String, but with an explicit encoding:
byte[] bytes = text.getBytes("UTF-8");
String text = new String(bytes, "UTF-8");
By using an explicit encoding (and one which supports all of Unicode) you avoid the problems of just calling text.getBytes() etc:
You're explicitly using a specific encoding, so you know which encoding to use later, rather than relying on the platform default.
You know it will support all of Unicode (as opposed to, say, ISO-Latin-1).
EDIT: Even though UTF-8 is the default encoding on Android, I'd definitely be explicit about this. For example, this question only says "in Java or Android" - so it's entirely possible that the code will end up being used on other platforms.
Basically given that the normal Java platform can have different default encodings, I think it's best to be absolutely explicit. I've seen way too many people using the default encoding and losing data to take that risk.
EDIT: In my haste I forgot to mention that you don't have to use the encoding's name - you can use a Charset instead. Using Guava I'd really use:
byte[] bytes = text.getBytes(Charsets.UTF_8);
String text = new String(bytes, Charsets.UTF_8);
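Applied to the Arduino case, a sketch might encode with UTF-8 and check the result against the 1 KB EEPROM before sending it. EepromText and the size check are hypothetical; the point is that the byte count, not the character count, is what has to fit, since non-ASCII characters take more than one byte in UTF-8:

```java
import java.nio.charset.StandardCharsets;

class EepromText {
    // 1 KB, as stated in the question; the limit check itself is hypothetical
    static final int EEPROM_SIZE = 1024;

    // Encodes text as UTF-8 and rejects it if the bytes would not fit in the EEPROM.
    static byte[] encode(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > EEPROM_SIZE) {
            throw new IllegalArgumentException(bytes.length + " bytes exceed " + EEPROM_SIZE);
        }
        return bytes;
    }

    // Decodes bytes read back from the device with the same charset.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "temp=21\u00B0C"; // the degree sign takes two bytes in UTF-8
        byte[] bytes = encode(text);
        System.out.println(text.length() + " chars, " + bytes.length + " bytes"); // 9 chars, 10 bytes
        System.out.println(decode(bytes).equals(text)); // true
    }
}
```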
You can do it like this.
String to byte array
String stringToConvert = "This String is 76 characters long and will be converted to an array of bytes";
byte[] theByteArray = stringToConvert.getBytes();
http://www.javadb.com/convert-string-to-byte-array
Byte array to String
byte[] byteArray = new byte[] {87, 79, 87, 46, 46, 46};
String value = new String(byteArray);
http://www.javadb.com/convert-byte-array-to-string
Use String.getBytes() to convert to bytes and the String(byte[] data) constructor to convert back to a string.
byte[] pdfBytes = Base64.decode(myPdfBase64String, Base64.DEFAULT);
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import org.apache.commons.codec.binary.Hex; // Apache Commons Codec
public class FileHashStream
{
// a method that returns a new byte array, computed by reading from an input stream;
// `path` is the absolute path for the 'commons-codec-1.10-bin.zip'
public static byte[] read(String path) throws IOException, NoSuchAlgorithmException
{
// the original never declared `md`; SHA-256 here is an assumption
MessageDigest md = MessageDigest.getInstance("SHA-256");
// a byte buffer of 16 kilobytes
byte[] buf = new byte[1024 * 16];
int len;
// try-with-resources closes the input stream
try (InputStream is = new FileInputStream(path))
{
// use the buffer to update our "MessageDigest" instance
while ((len = is.read(buf)) != -1)
{
md.update(buf, 0, len);
}
}
// call the "digest" method for obtaining the final hash result
byte[] ret = md.digest();
System.out.println("Length of Hash: " + ret.length);
for(byte b : ret)
{
System.out.println(b + ", ");
}
String compare = "49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d";
String verification = Hex.encodeHexString(ret);
System.out.println();
System.out.println("===");
System.out.println(verification);
System.out.println("Equals? " + verification.equals(compare));
return ret;
}
}

Issue Decoding for a specific charset

I'm trying to decode a char and get back the same char.
Following is my simple test.
I'm confused about whether I have to encode or decode. I tried both, and both print the same result.
Any suggestions would be greatly appreciated.
char inpData = '†';
String str = Character.toString((char) inpData);
byte b[] = str.getBytes(Charset.forName("MacRoman"));
System.out.println(b[0]); // prints -96
String decData = Integer.toString(b[0]);
CharsetDecoder decoder = Charset.forName("MacRoman").newDecoder();
ByteBuffer inBuffer = ByteBuffer.wrap(decData.getBytes());
CharBuffer result = decoder.decode(inBuffer);
System.out.println(result.toString()); // prints -96, expecting to print †
CharsetEncoder encoder = Charset.forName("MacRoman").newEncoder();
ByteBuffer bbuf = encoder.encode(CharBuffer.wrap(decData));
result = decoder.decode(bbuf);
System.out.println(result.toString());// prints -96, expecting to print †
Thank you.
When you do String decData = Integer.toString(b[0]);, you create the string "-96" and that is the string you're encoding/decoding. Not the original char.
You have to convert your String back to a byte first.
To get your character back as a char from the -96 you have to do this :
String string = new String(b, "MacRoman");
char specialChar = string.charAt(0);
With this you're reversing your first transformation from char -> String -> byte[0] by doing byte[0] -> String -> char[0].
If you have the String "-96", you must change first your string into a byte with :
byte b = Byte.parseByte("-96");
String decData = Integer.toString(b[0]);
This is probably what gives you the "-96" output in the last two examples. Try
String decData = new String(b, "MacRoman");
Apart from that, keep in mind that System.out.println uses your system charset to print strings anyway. For a better test, consider writing your Strings to a file using your specific charset, with something like:
FileOutputStream fos = new FileOutputStream("test.txt");
OutputStreamWriter writer = new OutputStreamWriter(fos, "MacRoman");
writer.write(result.toString());
writer.close();

How to convert Strings to and from UTF8 byte arrays in Java

In Java, I have a String and I want to encode it as a byte array (in UTF8, or some other encoding). Alternately, I have a byte array (in some known encoding) and I want to convert it into a Java String. How do I do these conversions?
Convert from String to byte[]:
String s = "some text here";
byte[] b = s.getBytes(StandardCharsets.UTF_8);
Convert from byte[] to String:
byte[] b = {(byte) 99, (byte)97, (byte)116};
String s = new String(b, StandardCharsets.US_ASCII);
You should, of course, use the correct encoding name. My examples used US-ASCII and UTF-8, two commonly-used encodings.
Here's a solution that avoids performing the Charset lookup for every conversion:
import java.nio.charset.Charset;
private final Charset UTF8_CHARSET = Charset.forName("UTF-8");
String decodeUTF8(byte[] bytes) {
return new String(bytes, UTF8_CHARSET);
}
byte[] encodeUTF8(String string) {
return string.getBytes(UTF8_CHARSET);
}
String original = "hello world";
byte[] utf8Bytes = original.getBytes("UTF-8");
You can convert directly via the String(byte[], String) constructor and getBytes(String) method. Java exposes available character sets via the Charset class. The JDK documentation lists supported encodings.
90% of the time, such conversions are performed on streams, so you'd use the Reader/Writer classes. You would not incrementally decode using the String methods on arbitrary byte streams - you would leave yourself open to bugs involving multibyte characters.
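To illustrate the multibyte pitfall, here is a sketch (class and method names are hypothetical) that splits the UTF-8 bytes of "café" between the two bytes of its last character, as an arbitrary network read might. Decoding each chunk with new String yields two U+FFFD characters, while an InputStreamReader over the same chunks carries the incomplete sequence over to the next read:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ChunkedDecoding {
    // Naive approach: decode every chunk on its own with new String(...).
    static String decodeChunksNaively(byte[][] chunks) {
        StringBuilder sb = new StringBuilder();
        for (byte[] chunk : chunks) {
            sb.append(new String(chunk, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Reader approach: the chunks are chained into one stream, and the
    // InputStreamReader keeps an incomplete multibyte sequence until more bytes arrive.
    static String decodeWithReader(byte[][] chunks) throws IOException {
        List<InputStream> parts = new ArrayList<>();
        for (byte[] chunk : chunks) {
            parts.add(new ByteArrayInputStream(chunk));
        }
        InputStream in = new SequenceInputStream(Collections.enumeration(parts));
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[64];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "caf\u00E9"; // the last character is two bytes in UTF-8: 0xC3 0xA9
        byte[] all = text.getBytes(StandardCharsets.UTF_8);
        // split between 0xC3 and 0xA9
        byte[][] chunks = {Arrays.copyOfRange(all, 0, 4), Arrays.copyOfRange(all, 4, all.length)};
        System.out.println(decodeChunksNaively(chunks).equals(text)); // false
        System.out.println(decodeWithReader(chunks).equals(text)); // true
    }
}
```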
My Tomcat 7 implementation is accepting strings as ISO-8859-1, despite the content type of the HTTP request. The following solution worked for me when trying to correctly interpret characters like 'é':
byte[] b1 = szP1.getBytes("ISO-8859-1");
System.out.println(java.util.Arrays.toString(b1)); // b1.toString() would only print the array's identity
String szUT8 = new String(b1, "UTF-8");
System.out.println(szUT8);
When trying to interpret the string as US-ASCII, the byte info wasn't correctly interpreted.
b1 = szP1.getBytes("US-ASCII");
System.out.println(java.util.Arrays.toString(b1));
As an alternative, StringUtils from Apache Commons Codec (org.apache.commons.codec.binary.StringUtils) can be used.
byte[] bytes = {(byte) 1};
String convertedString = StringUtils.newStringUtf8(bytes);
or
String myString = "example";
byte[] convertedBytes = StringUtils.getBytesUtf8(myString);
If you have a non-standard charset, you can use getBytesUnchecked() or newString() accordingly.
I can't comment, but I don't want to start a new thread either. This isn't working. A simple round trip:
byte[] b = new byte[]{ 0, 0, 0, -127 }; // 0x00000081
String s = new String(b,StandardCharsets.UTF_8); // UTF8 = 0x0000, 0x0000, 0x0000, 0xfffd
b = s.getBytes(StandardCharsets.UTF_8); // [0, 0, 0, -17, -65, -67] 0x000000efbfbd != 0x00000081
I need b[] to be the same array before and after encoding, which it isn't (this refers to the first answer).
For decoding a series of bytes to a normal string message I finally got it working with UTF-8 encoding with this code:
/* Convert a list of UTF-8 numbers to a normal String
* Useful for decoding a JMS message that is delivered as a sequence of bytes instead of plain text
*/
public String convertUtf8NumbersToString(String[] numbers){
int length = numbers.length;
byte[] data = new byte[length];
for(int i = 0; i< length; i++){
data[i] = Byte.parseByte(numbers[i]);
}
return new String(data, Charset.forName("UTF-8"));
}
If you are using 7-bit ASCII or ISO-8859-1 (an amazingly common format) then you don't have to create a new java.lang.String at all. It can be more efficient to simply cast each byte to a char, masking it with 0xFF so that bytes above 127 are not sign-extended:
Full working example:
for (byte b : new byte[] { 43, 45, (byte) 215, (byte) 247 }) {
char c = (char) (b & 0xFF);
System.out.print(c);
}
If you are not using extended-characters like Ä, Æ, Å, Ç, Ï, Ê and can be sure that the only transmitted values are of the first 128 Unicode characters, then this code will also work for UTF-8 and extended ASCII (like cp-1252).
Charset UTF8_CHARSET = Charset.forName("UTF-8");
String strISO = "{\"name\":\"א\"}";
System.out.println(strISO);
byte[] b = strISO.getBytes(UTF8_CHARSET); // encode with the same charset that is used to decode below
for (byte c: b) {
System.out.print("[" + c + "]");
}
String str = new String(b, UTF8_CHARSET);
System.out.println(str);
Reader reader = new BufferedReader(
new InputStreamReader(
new ByteArrayInputStream(
string.getBytes(StandardCharsets.UTF_8)), StandardCharsets.UTF_8));
//query is your json
DefaultHttpClient httpClient = new DefaultHttpClient();
HttpPost postRequest = new HttpPost("http://my.site/test/v1/product/search?qy=");
StringEntity input = new StringEntity(query, "UTF-8");
input.setContentType("application/json");
postRequest.setEntity(input);
HttpResponse response = httpClient.execute(postRequest);
Terribly late, but I just encountered this issue and this is my fix:
private static String removeNonUtf8CompliantCharacters( final String inString ) {
if (null == inString ) return null;
byte[] byteArr = inString.getBytes();
for ( int i=0; i < byteArr.length; i++ ) {
byte ch= byteArr[i];
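// Java bytes are signed (-128..127), so ch < 253 is always true and ch > 31 rejects
// every byte >= 0x80: this keeps bytes 32..127 plus tab, LF and CR, and replaces
// every other byte (including all non-ASCII bytes) with a space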
if ( !( (ch > 31 && ch < 253 ) || ch == '\t' || ch == '\n' || ch == '\r') ) {
byteArr[i]=' ';
}
}
return new String( byteArr );
}
