Convert byte to string in Java - java

I use below code to convert byte to string:
System.out.println("string " + Byte.toString((byte)0x63));
Why it print "string 99".
How to modify to let it print "string c"?

System.out.println(new String(new byte[]{ (byte)0x63 }, "US-ASCII"));
Note especially that converting bytes to Strings always involves an encoding. If you do not specify it, you'll be using the platform default encoding, which means the code can break when running in different environments.

The string ctor is suitable for this conversion:
System.out.println("string " + new String(new byte[] {0x63}));

Use char instead of byte:
System.out.println("string " + (char)0x63);
Or if you want to be a Unicode puritan, you use codepoints:
System.out.println("string " + new String(new int[]{ 0x63 }, 0, 1));
And if you like the old skool US-ASCII "every byte is a character" idea:
System.out.println("string " + new String(new byte[]{ (byte)0x63 },
StandardCharsets.US_ASCII));
Avoid using the String(byte[]) constructor recommended in other answers; it relies on the default charset. Circumstances could arise where 0x63 actually isn't the character c.

You can use printf:
System.out.printf("string %c\n", 0x63);
You can as well create a String with such formatting, using String#format:
String s = String.format("string %c", 0x63);

you can use
the character equivalent to 0x63 is 'c' but byte equivalent to it is 99
System.out.println("byte "+(char)0x63);

You have to construct a new string out of a byte array. The first element in your byteArray should be 0x63. If you want to add any more letters, make the byteArray longer and add them to the next indices.
byte[] byteArray = new byte[1];
byteArray[0] = 0x63;
try {
System.out.println("string " + new String(byteArray, "US-ASCII"));
} catch (UnsupportedEncodingException e) {
// TODO: Handle exception.
e.printStackTrace();
}
Note that specifying the encoding will eventually throw an UnsupportedEncodingException and you must handle that accordingly.

If it's a single byte, just cast the byte to a char and it should work out to be fine i.e. give a char entity corresponding to the codepoint value of the given byte. If not, use the String constructor as mentioned elsewhere.
char ch = (char)0x63;
System.out.println(ch);

String str = "0x63";
int temp = Integer.parseInt(str.substring(2, 4), 16);
char c = (char)temp;
System.out.print(c);

This is my version:
public String convertBytestoString(InputStream inputStream)
{
int bytes;
byte[] buffer = new byte[1024];
bytes = inputStream.read(buffer);
String stringData = new String(buffer,0,bytes);
return stringData;
}

Using StringBuilder class in Java:
StringBuilder str = new StringBuilder();
for (byte aByte : bytesArray) {
if (aByte != 0) {
str.append((char) aByte);
} else {
break;
}

Related

Converting bytearray output to string without value changing in java

I have a byte[] array named byteval in java and if I do System.out.println(byteval), I can read: d3e1547c254ff7cec8dbcef2262b5cf10ec079c7[B#40d150e0
Now I need this what I read there as a string, but if I try to convert it with Byte.toString or a new string constuctor, the value is not the same, most there are some numbers instead.
So how can I get the byte[] array as a String called strval, also cutting off the [B#40d150e0?
Now: System.out.println(byteval)>> d3e1547c254ff7cec8dbcef2262b5cf10ec079c7[B#40d150e0
Goal: System.out.println(strval)>> d3e1547c254ff7cec8dbcef2262b5cf10ec079c7
Lot of thanks!
Danny
EDIT: Working solution for me:
byte[] byteval = getValue();
// Here System.out.println(byteval) is
// d3e1547c254ff7cec8dbcef2262b5cf10ec079c7[B#40d150e0
BigInteger bi = new BigInteger(1, byteval);
String strval = bi.toString(16);
if ((strval.length() % 2) != 0) {
strval = "0" + strval;
}
System.out.println(strval);
// Here the String output is
// d3e1547c254ff7cec8dbcef2262b5cf10ec079c7
Thank all answerer.
just do
System.out.println(byteval.toString())
instead of
System.out.println(byteval)
this will remove the ending part(Actually that is just the address of the object referenced)
Try this:
String str= new String(byteval, "ISO-8859-1");
System.out.println(str);

Check if a String is valid UTF-8 encoded in Java

How can I check if a string is in valid UTF-8 format?
Only byte data can be checked. If you constructed a String then its already in UTF-16 internally.
Also only byte arrays can be UTF-8 encoded.
Here is a common case of UTF-8 conversions.
String myString = "\u0048\u0065\u006C\u006C\u006F World";
System.out.println(myString);
byte[] myBytes = null;
try
{
myBytes = myString.getBytes("UTF-8");
}
catch (UnsupportedEncodingException e)
{
e.printStackTrace();
System.exit(-1);
}
for (int i=0; i < myBytes.length; i++) {
System.out.println(myBytes[i]);
}
If you don't know the encoding of your byte array, juniversalchardet is a library to help you detect it.
The following post is taken from the official Java tutorials available at: https://docs.oracle.com/javase/tutorial/i18n/text/string.html.
The StringConverter program starts by creating a String containing
Unicode characters:
String original = new String("A" + "\u00ea" + "\u00f1" + "\u00fc" + "C");
When printed, the String named original appears as:
AêñüC
To convert the String object to UTF-8, invoke the getBytes method and
specify the appropriate encoding identifier as a parameter. The
getBytes method returns an array of bytes in UTF-8 format. To create a
String object from an array of non-Unicode bytes, invoke the String
constructor with the encoding parameter. The code that makes these
calls is enclosed in a try block, in case the specified encoding is
unsupported:
try {
byte[] utf8Bytes = original.getBytes("UTF8");
byte[] defaultBytes = original.getBytes();
String roundTrip = new String(utf8Bytes, "UTF8");
System.out.println("roundTrip = " + roundTrip);
System.out.println();
printBytes(utf8Bytes, "utf8Bytes");
System.out.println();
printBytes(defaultBytes, "defaultBytes");
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
The StringConverter program prints out the values in the utf8Bytes and
defaultBytes arrays to demonstrate an important point: The length of
the converted text might not be the same as the length of the source
text. Some Unicode characters translate into single bytes, others into
pairs or triplets of bytes.
The printBytes method displays the byte arrays by invoking the byteToHex method, which is defined in the source file,
UnicodeFormatter.java. Here is the printBytes method:
public static void printBytes(byte[] array, String name) {
for (int k = 0; k < array.length; k++) {
System.out.println(name + "[" + k + "] = " + "0x" +
UnicodeFormatter.byteToHex(array[k]));
}
}
The output of the printBytes method follows. Note that only the first
and last bytes, the A and C characters, are the same in both arrays:
utf8Bytes[0] = 0x41
utf8Bytes[1] = 0xc3
utf8Bytes[2] = 0xaa
utf8Bytes[3] = 0xc3
utf8Bytes[4] = 0xb1
utf8Bytes[5] = 0xc3
utf8Bytes[6] = 0xbc
utf8Bytes[7] = 0x43
defaultBytes[0] = 0x41
defaultBytes[1] = 0xea
defaultBytes[2] = 0xf1
defaultBytes[3] = 0xfc
defaultBytes[4] = 0x43

Issue Decoding for a specific charset

I'm trying to decode a char and get back the same char.
Following is my simple test.
I'm confused, If i have to encode or decode. Tried both. Both print the same result.
Any suggestions are greatly helpful.
char inpData = '†';
String str = Character.toString((char) inpData);
byte b[] = str.getBytes(Charset.forName("MacRoman"));
System.out.println(b[0]); // prints -96
String decData = Integer.toString(b[0]);
CharsetDecoder decoder = Charset.forName("MacRoman").newDecoder();
ByteBuffer inBuffer = ByteBuffer.wrap(decData.getBytes());
CharBuffer result = decoder.decode(inBuffer);
System.out.println(result.toString()); // prints -96, expecting to print †
CharsetEncoder encoder = Charset.forName("MacRoman").newEncoder();
ByteBuffer bbuf = encoder.encode(CharBuffer.wrap(decData));
result = decoder.decode(bbuf);
System.out.println(result.toString());// prints -96, expecting to print †
Thank you.
When you do String decData = Integer.toString(b[0]);, you create the string "-96" and that is the string you're encoding/decoding. Not the original char.
You have to change your String back to a byte before.
To get your character back as a char from the -96 you have to do this :
String string = new String(b, "MacRoman");
char specialChar = string.charAt(0);
With this your reversing your first transformation from char -> String -> byte[0] by doing byte[0] -> String -> char[0]
If you have the String "-96", you must change first your string into a byte with :
byte b = Byte.parseByte("-96");
String decData = Integer.toString(b[0]);
This probably gets you the "-96" output in the last two examples. try
String decData = new String(b, "MacRoman");
Apart from that, keep in mind that System.out.println uses your system-charset to print out strings anyway. For a better test, consider writing your Strings to a file using your specific charset with something like
FileOutputStream fos = new FileOutputStream("test.txt");
OutputStreamWriter writer = new OutputStreamWriter(fos, "MacRoman");
writer.write(result.toString());
writer.close();

How to convert byte array to string and vice versa?

I have to convert a byte array to string in Android, but my byte array contains negative values.
If I convert that string again to byte array, values I am getting are different from original byte array values.
What can I do to get proper conversion? Code I am using to do the conversion is as follows:
// Code to convert byte arr to str:
byte[] by_original = {0,1,-2,3,-4,-5,6};
String str1 = new String(by_original);
System.out.println("str1 >> "+str1);
// Code to convert str to byte arr:
byte[] by_new = str1.getBytes();
for(int i=0;i<by_new.length;i++)
System.out.println("by1["+i+"] >> "+str1);
I am stuck in this problem.
Your byte array must have some encoding. The encoding cannot be ASCII if you've got negative values. Once you figure that out, you can convert a set of bytes to a String using:
byte[] bytes = {...}
String str = new String(bytes, StandardCharsets.UTF_8); // for UTF-8 encoding
There are a bunch of encodings you can use, look at the supported encodings in the Oracle javadocs.
The "proper conversion" between byte[] and String is to explicitly state the encoding you want to use. If you start with a byte[] and it does not in fact contain text data, there is no "proper conversion". Strings are for text, byte[] is for binary data, and the only really sensible thing to do is to avoid converting between them unless you absolutely have to.
If you really must use a String to hold binary data then the safest way is to use Base64 encoding.
The root problem is (I think) that you are unwittingly using a character set for which:
bytes != encode(decode(bytes))
in some cases. UTF-8 is an example of such a character set. Specifically, certain sequences of bytes are not valid encodings in UTF-8. If the UTF-8 decoder encounters one of these sequences, it is liable to discard the offending bytes or decode them as the Unicode codepoint for "no such character". Naturally, when you then try to encode the characters as bytes the result will be different.
The solution is:
Be explicit about the character encoding you are using; i.e. use a String constructor and String.toByteArray method with an explicit charset.
Use the right character set for your byte data ... or alternatively one (such as "Latin-1" where all byte sequences map to valid Unicode characters.
If your bytes are (really) binary data and you want to be able to transmit / receive them over a "text based" channel, use something like Base64 encoding ... which is designed for this purpose.
For Java, the most common character sets are in java.nio.charset.StandardCharsets. If you are encoding a string that can contain any Unicode character value then UTF-8 encoding (UTF_8) is recommended.
If you want a 1:1 mapping in Java then you can use ISO Latin Alphabet No. 1 - more commonly just called "Latin 1" or simply "Latin" (ISO_8859_1). Note that Latin-1 in Java is the IANA version of Latin-1 which assigns characters to all possible 256 values including control blocks C0 and C1. These are not printable: you won't see them in any output.
From Java 8 onwards Java contains java.util.Base64 for Base64 encoding / decoding. For URL-safe encoding you may want to to use Base64.getUrlEncoder instead of the standard encoder. This class is also present in Android since Android Oreo (8), API level 26.
We just need to construct a new String with the array: http://www.mkyong.com/java/how-do-convert-byte-array-to-string-in-java/
String s = new String(bytes);
The bytes of the resulting string differs depending on what charset you use. new String(bytes) and new String(bytes, Charset.forName("utf-8")) and new String(bytes, Charset.forName("utf-16")) will all have different byte arrays when you call String#getBytes() (depending on the default charset)
Using new String(byOriginal) and converting back to byte[] using getBytes() doesn't guarantee two byte[] with equal values. This is due to a call to StringCoding.encode(..) which will encode the String to Charset.defaultCharset(). During this encoding, the encoder might choose to replace unknown characters and do other changes. Hence, using String.getBytes() might not return an equal array as you've originally passed to the constructor.
Why was the problem: As someone already specified:
If you start with a byte[] and it does not in fact contain text data, there is no "proper conversion". Strings are for text, byte[] is for binary data, and the only really sensible thing to do is to avoid converting between them unless you absolutely have to.
I was observing this problem when I was trying to create byte[] from a pdf file and then converting it to String and then taking the String as input and converting back to file.
So make sure your encoding and decoding logic is same as I did. I explicitly encoded the byte[] to Base64 and decoded it to create the file again.
Use-case:
Due to some limitation I was trying to sent byte[] in request(POST) and the process was as follows:
PDF File >> Base64.encodeBase64(byte[]) >> String >> Send in request(POST) >> receive String >> Base64.decodeBase64(byte[]) >> create binary
Try this and this worked for me..
File file = new File("filePath");
byte[] byteArray = new byte[(int) file.length()];
try {
FileInputStream fileInputStream = new FileInputStream(file);
fileInputStream.read(byteArray);
String byteArrayStr= new String(Base64.encodeBase64(byteArray));
FileOutputStream fos = new FileOutputStream("newFilePath");
fos.write(Base64.decodeBase64(byteArrayStr.getBytes()));
fos.close();
}
catch (FileNotFoundException e) {
System.out.println("File Not Found.");
e.printStackTrace();
}
catch (IOException e1) {
System.out.println("Error Reading The File.");
e1.printStackTrace();
}
Even though
new String(bytes, "UTF-8")
is correct it throws a UnsupportedEncodingException which forces you to deal with a checked exception. You can use as an alternative another constructor since Java 1.6 to convert a byte array into a String:
new String(bytes, StandardCharsets.UTF_8)
This one does not throw any exception.
Converting back should be also done with StandardCharsets.UTF_8:
"test".getBytes(StandardCharsets.UTF_8)
Again you avoid having to deal with checked exceptions.
private static String toHexadecimal(byte[] digest){
String hash = "";
for(byte aux : digest) {
int b = aux & 0xff;
if (Integer.toHexString(b).length() == 1) hash += "0";
hash += Integer.toHexString(b);
}
return hash;
}
I did notice something that is not in any of the answers. You can cast each of the bytes in the byte array to characters, and put them in a char array. Then the string is new String(cbuf) where cbuf is the char array. To convert back, loop through the string casting each of the chars to bytes to put into a byte array, and this byte array will be the same as the first.
public class StringByteArrTest {
public static void main(String[] args) {
// put whatever byte array here
byte[] arr = new byte[] {-12, -100, -49, 100, -63, 0, -90};
for (byte b: arr) System.out.println(b);
// put data into this char array
char[] cbuf = new char[arr.length];
for (int i = 0; i < arr.length; i++) {
cbuf[i] = (char) arr[i];
}
// this is the string
String s = new String(cbuf);
System.out.println(s);
// converting back
byte[] out = new byte[s.length()];
for (int i = 0; i < s.length(); i++) {
out[i] = (byte) s.charAt(i);
}
for (byte b: out) System.out.println(b);
}
}
This works fine for me:
String cd = "Holding some value";
Converting from string to byte[]:
byte[] cookie = new sun.misc.BASE64Decoder().decodeBuffer(cd);
Converting from byte[] to string:
cd = new sun.misc.BASE64Encoder().encode(cookie);
Following is the sample code safely converts byte array to String and String to byte array back.
byte bytesArray[] = { 1, -2, 4, -5, 10};
String encoded = java.util.Base64.getEncoder().encodeToString(bytesArray);
byte[] decoded = java.util.Base64.getDecoder().decode(encoded);
System.out.println("input: "+Arrays.toString(bytesArray));
System.out.println("encoded: "+encoded);
System.out.println("decoded: "+Arrays.toString(decoded));
Output:
input: [1, -2, 4, -5, 10]
encoded: Af4E+wo=
decoded: [1, -2, 4, -5, 10]
javax.xml.bind.DatatypeConverter should do it:
byte [] b = javax.xml.bind.DatatypeConverter.parseHexBinary("E62DB");
String s = javax.xml.bind.DatatypeConverter.printHexBinary(b);
byte[] bytes = "Techie Delight".getBytes();
// System.out.println(Arrays.toString(bytes));
// Create a string from the byte array without specifying
// character encoding
String string = new String(bytes);
System.out.println(string);
Heres a few methods that convert an array of bytes to a string. I've tested them they work well.
public String getStringFromByteArray(byte[] settingsData) {
ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(settingsData);
Reader reader = new BufferedReader(new InputStreamReader(byteArrayInputStream));
StringBuilder sb = new StringBuilder();
int byteChar;
try {
while((byteChar = reader.read()) != -1) {
sb.append((char) byteChar);
}
}
catch(IOException e) {
e.printStackTrace();
}
return sb.toString();
}
public String getStringFromByteArray(byte[] settingsData) {
StringBuilder sb = new StringBuilder();
for(byte willBeChar: settingsData) {
sb.append((char) willBeChar);
}
return sb.toString();
}
While base64 encoding is safe and one could argue "the right answer", I arrived here looking for a way to convert a Java byte array to/from a Java String as-is. That is, where each member of the byte array remains intact in its String counterpart, with no extra space required for encoding/transport.
This answer describing 8bit transparent encodings was very helpful for me. I used ISO-8859-1 on terabytes of binary data to convert back and forth successfully (binary <-> String) without the inflated space requirements needed for a base64 encoding, so is safe for my use-case - YMMV.
This was also helpful in explaining when/if you should experiment.
Using Kotlin on Android I found out it is handy to create some simple extension functions for that purpose.
Solution based on Base64 encoding/decoding to be able to pass via JSON, XML, etc:
import android.util.Base64
fun ByteArray.encodeToString() = String(Base64.encode(this, Base64.NO_WRAP), Charsets.UTF_8)
fun String.decodeToBytes(): ByteArray = Base64.decode(toByteArray(Charsets.UTF_8), Base64.NO_WRAP)
So you can use it
val byteArray = byteArrayOf(0, 1, 2, -1, -2, -3)
val string = byteArray.encodeToString()
val restoredArray = string.decodeToBytes()
import sun.misc.BASE64Decoder;
import sun.misc.BASE64Encoder;
private static String base64Encode(byte[] bytes)
{
return new BASE64Encoder().encode(bytes);
}
private static byte[] base64Decode(String s) throws IOException
{
return new BASE64Decoder().decodeBuffer(s);
}
I succeeded converting byte array to a string with this method:
public static String byteArrayToString(byte[] data){
String response = Arrays.toString(data);
String[] byteValues = response.substring(1, response.length() - 1).split(",");
byte[] bytes = new byte[byteValues.length];
for (int i=0, len=bytes.length; i<len; i++) {
bytes[i] = Byte.parseByte(byteValues[i].trim());
}
String str = new String(bytes);
return str.toLowerCase();
}
This one works for me up to android Q:
You can use the following method to convert o hex string to string
public static String hexToString(String hex) {
StringBuilder sb = new StringBuilder();
char[] hexData = hex.toCharArray();
for (int count = 0; count < hexData.length - 1; count += 2) {
int firstDigit = Character.digit(hexData[count], 16);
int lastDigit = Character.digit(hexData[count + 1], 16);
int decimal = firstDigit * 16 + lastDigit;
sb.append((char)decimal);
}
return sb.toString();
}
with the following to convert a byte array to a hex string
public static String bytesToHex(byte[] bytes) {
char[] hexChars = new char[bytes.length * 2];
for (int j = 0; j < bytes.length; j++) {
int v = bytes[j] & 0xFF;
hexChars[j * 2] = hexArray[v >>> 4];
hexChars[j * 2 + 1] = hexArray[v & 0x0F];
}
return new String(hexChars);
}
Here the working code.
// Encode byte array into string . TemplateBuffer1 is my bytearry variable.
String finger_buffer = Base64.encodeToString(templateBuffer1, Base64.DEFAULT);
Log.d(TAG, "Captured biometric device->" + finger_buffer);
// Decode String into Byte Array. decodedString is my bytearray[]
decodedString = Base64.decode(finger_buffer, Base64.DEFAULT);
You can use simple for loop for conversion:
public void byteArrToString(){
byte[] b = {'a','b','$'};
String str = "";
for(int i=0; i<b.length; i++){
char c = (char) b[i];
str+=c;
}
System.out.println(str);
}
byte[] image = {...};
String imageString = Base64.encodeToString(image, Base64.NO_WRAP);
Try to specify an 8-bit charset in both conversions. ISO-8859-1 for instance.
Read the bytes from String using ByteArrayInputStream and wrap it with BufferedReader which is Char Stream instead of Byte Stream which converts the byte data to String.
package com.cs.sajal;
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
public class TestCls {
public static void main(String[] args) {
String s=new String("Sajal is a good boy");
try
{
ByteArrayInputStream bis;
bis=new ByteArrayInputStream(s.getBytes("UTF-8"));
BufferedReader br=new BufferedReader(new InputStreamReader(bis));
System.out.println(br.readLine());
}
catch(Exception e)
{
e.printStackTrace();
}
}
}
Output is:
Sajal is a good boy
You can do the following to convert byte array to string and then convert that string to byte array:
// 1. convert byte array to string and then string to byte array
// convert byte array to string
byte[] by_original = {0, 1, -2, 3, -4, -5, 6};
String str1 = Arrays.toString(by_original);
System.out.println(str1); // output: [0, 1, -2, 3, -4, -5, 6]
// convert string to byte array
String newString = str1.substring(1, str1.length()-1);
String[] stringArray = newString.split(", ");
byte[] by_new = new byte[stringArray.length];
for(int i=0; i<stringArray.length; i++) {
by_new[i] = (byte) Integer.parseInt(stringArray[i]);
}
System.out.println(Arrays.toString(by_new)); // output: [0, 1, -2, 3, -4, -5, 6]
But to convert the string to byte array and then convert that byte array to string, below approach can be used:
// 2. convert string to byte array and then byte array to string
// convert string to byte array
String str2 = "[0, 1, -2, 3, -4, -5, 6]";
byte[] byteStr2 = str2.getBytes(StandardCharsets.UTF_8);
// Now byteStr2 is [91, 48, 44, 32, 49, 44, 32, 45, 50, 44, 32, 51, 44, 32, 45, 52, 44, 32, 45, 53, 44, 32, 54, 93]
// convert byte array to string
System.out.println(new String(byteStr2, StandardCharsets.UTF_8)); // output: [0, 1, -2, 3, -4, -5, 6]
A string is a collection of char's (16bit unsigned). So if you are going to convert negative numbers into a string, they'll be lost in translation.
public class byteString {
/**
* #param args
*/
public static void main(String[] args) throws Exception {
// TODO Auto-generated method stub
String msg = "Hello";
byte[] buff = new byte[1024];
buff = msg.getBytes("UTF-8");
System.out.println(buff);
String m = new String(buff);
System.out.println(m);
}
}

How to convert Strings to and from UTF8 byte arrays in Java

In Java, I have a String and I want to encode it as a byte array (in UTF8, or some other encoding). Alternately, I have a byte array (in some known encoding) and I want to convert it into a Java String. How do I do these conversions?
Convert from String to byte[]:
String s = "some text here";
byte[] b = s.getBytes(StandardCharsets.UTF_8);
Convert from byte[] to String:
byte[] b = {(byte) 99, (byte)97, (byte)116};
String s = new String(b, StandardCharsets.US_ASCII);
You should, of course, use the correct encoding name. My examples used US-ASCII and UTF-8, two commonly-used encodings.
Here's a solution that avoids performing the Charset lookup for every conversion:
import java.nio.charset.Charset;
private final Charset UTF8_CHARSET = Charset.forName("UTF-8");
String decodeUTF8(byte[] bytes) {
return new String(bytes, UTF8_CHARSET);
}
byte[] encodeUTF8(String string) {
return string.getBytes(UTF8_CHARSET);
}
String original = "hello world";
byte[] utf8Bytes = original.getBytes("UTF-8");
You can convert directly via the String(byte[], String) constructor and getBytes(String) method. Java exposes available character sets via the Charset class. The JDK documentation lists supported encodings.
90% of the time, such conversions are performed on streams, so you'd use the Reader/Writer classes. You would not incrementally decode using the String methods on arbitrary byte streams - you would leave yourself open to bugs involving multibyte characters.
My tomcat7 implementation is accepting strings as ISO-8859-1; despite the content-type of the HTTP request. The following solution worked for me when trying to correctly interpret characters like 'é' .
byte[] b1 = szP1.getBytes("ISO-8859-1");
System.out.println(b1.toString());
String szUT8 = new String(b1, "UTF-8");
System.out.println(szUT8);
When trying to interpret the string as US-ASCII, the byte info wasn't correctly interpreted.
b1 = szP1.getBytes("US-ASCII");
System.out.println(b1.toString());
As an alternative, StringUtils from Apache Commons can be used.
byte[] bytes = {(byte) 1};
String convertedString = StringUtils.newStringUtf8(bytes);
or
String myString = "example";
byte[] convertedBytes = StringUtils.getBytesUtf8(myString);
If you have non-standard charset, you can use getBytesUnchecked() or newString() accordingly.
I can't comment but don't want to start a new thread. But this isn't working. A simple round trip:
byte[] b = new byte[]{ 0, 0, 0, -127 }; // 0x00000081
String s = new String(b,StandardCharsets.UTF_8); // UTF8 = 0x0000, 0x0000, 0x0000, 0xfffd
b = s.getBytes(StandardCharsets.UTF_8); // [0, 0, 0, -17, -65, -67] 0x000000efbfbd != 0x00000081
I'd need b[] the same array before and after encoding which it isn't (this referrers to the first answer).
For decoding a series of bytes to a normal string message I finally got it working with UTF-8 encoding with this code:
/* Convert a list of UTF-8 numbers to a normal String
* Usefull for decoding a jms message that is delivered as a sequence of bytes instead of plain text
*/
public String convertUtf8NumbersToString(String[] numbers){
int length = numbers.length;
byte[] data = new byte[length];
for(int i = 0; i< length; i++){
data[i] = Byte.parseByte(numbers[i]);
}
return new String(data, Charset.forName("UTF-8"));
}
If you are using 7-bit ASCII or ISO-8859-1 (an amazingly common format) then you don't have to create a new java.lang.String at all. It's much much more performant to simply cast the byte into char:
Full working example:
for (byte b : new byte[] { 43, 45, (byte) 215, (byte) 247 }) {
char c = (char) b;
System.out.print(c);
}
If you are not using extended-characters like Ä, Æ, Å, Ç, Ï, Ê and can be sure that the only transmitted values are of the first 128 Unicode characters, then this code will also work for UTF-8 and extended ASCII (like cp-1252).
Charset UTF8_CHARSET = Charset.forName("UTF-8");
String strISO = "{\"name\":\"א\"}";
System.out.println(strISO);
byte[] b = strISO.getBytes();
for (byte c: b) {
System.out.print("[" + c + "]");
}
String str = new String(b, UTF8_CHARSET);
System.out.println(str);
Reader reader = new BufferedReader(
new InputStreamReader(
new ByteArrayInputStream(
string.getBytes(StandardCharsets.UTF_8)), StandardCharsets.UTF_8));
//query is your json
DefaultHttpClient httpClient = new DefaultHttpClient();
HttpPost postRequest = new HttpPost("http://my.site/test/v1/product/search?qy=");
StringEntity input = new StringEntity(query, "UTF-8");
input.setContentType("application/json");
postRequest.setEntity(input);
HttpResponse response=response = httpClient.execute(postRequest);
terribly late but i just encountered this issue and this is my fix:
private static String removeNonUtf8CompliantCharacters( final String inString ) {
if (null == inString ) return null;
byte[] byteArr = inString.getBytes();
for ( int i=0; i < byteArr.length; i++ ) {
byte ch= byteArr[i];
// remove any characters outside the valid UTF-8 range as well as all control characters
// except tabs and new lines
if ( !( (ch > 31 && ch < 253 ) || ch == '\t' || ch == '\n' || ch == '\r') ) {
byteArr[i]=' ';
}
}
return new String( byteArr );
}

Categories

Resources