Java convert string bytes to readable string [duplicate]


This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Java Byte Array to String to Byte Array
I have a method called READ() that accepts a String parameter. This string has already been converted into bytes. All I want is to convert it back into a readable string.
public static String READ(final String data) throws UnsupportedEncodingException{
char[] temp = data.toCharArray();
byte[] bytes = new byte[temp.length];
int i = 0;
for(char c : temp){
bytes[i++] = (byte)c;
}
return new String(bytes, "UTF-8");
}
public static String SEND(String data) throws UnsupportedEncodingException{
return data.getBytes()+"";
}
Testing:
String msg = "testing !";
String msgBytes = null;
try {
msgBytes = SEND(msg);
} catch (UnsupportedEncodingException e2) {
e2.printStackTrace();
}
System.out.println( "SEND: " + msgBytes);
try {
System.out.println("RECEIVE: " + READ(msgBytes));
} catch (UnsupportedEncodingException e2) {
e2.printStackTrace();
}
And the OUTPUT IS:
SEND: [B@452467ec
RECEIVE: [B@452467ec

String has a constructor that takes byte[] as an argument.
String(byte[] bytes) --
Constructs a new String by decoding the specified array of bytes using the platform's default charset.
String(byte[] bytes, Charset charset) --
Constructs a new String by decoding the specified array of bytes using the specified charset.
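So decode the byte[] itself (not the String you get by concatenating the array with "") and print it like this:
System.out.println(new String(msg.getBytes("UTF-8"), "UTF-8"));
What you currently see is the default Object.toString(), which for a byte array prints the type descriptor [B, an @ sign, and the array's hash code in hexadecimal (all arrays extend Object). It says nothing about the array's contents.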

You print a byte[], which is an Object, so you just get its type and hash code (something like [B@452467ec), because that is what the default implementation of toString() does.
You should print the String directly. Don't print its byte[] representation.
What you do in your code is to transform a String into byte[] and then back to String in the wrong way.
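A minimal sketch of a correct round trip may make this concrete. The class name RoundTripDemo and its send/read helpers are illustrative and not part of the question; the sketch assumes both sides agree on UTF-8 and that the data travels as a byte[] rather than as a String.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class RoundTripDemo {

    // Encode the text to bytes with an explicit charset (the "send" side).
    static byte[] send(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode the same bytes back into a String (the "read" side).
    static String read(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = send("testing !");
        // Arrays.toString prints the byte values instead of the array's type and hash code.
        System.out.println("SEND: " + Arrays.toString(bytes));
        System.out.println("RECEIVE: " + read(bytes));
    }
}
```

If the bytes really must travel as text, an encoding such as Base64 or hex is the usual choice; concatenating the array with "" loses the content.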

Related

How to correctly encode and decode a string in Base64?

I want to encode a string in Base64 for later decoding it. I encode it doing this:
public static String encryptString(String string) {
byte[] bytesEncoded = Base64.getEncoder().encode(string.getBytes());
return (new String(bytesEncoded));
}
Then, the encoded string is stored on disk using UTF-8. After restarting the application, the encoded string is read from disk and I'm trying to decode the string using this:
public static String decryptString(String string) {
byte[] valueDecoded = Base64.getDecoder().decode(string);
return (new String(valueDecoded));
}
Something is wrong because it is giving me this exception:
java.lang.IllegalArgumentException: Illegal base64 character d
at java.base/java.util.Base64$Decoder.decode0(Base64.java:743)
at java.base/java.util.Base64$Decoder.decode(Base64.java:535)
at java.base/java.util.Base64$Decoder.decode(Base64.java:558)
This is a step-by-step trace:
1º I encode this: {"configuration":{"shop":{"name":"","addressLine1":"","addressLine2":"","postalCode":"","city":"","country":"","phoneNumber":""}},"jointBets":[],"groups":[{"name":"Test","members":[]}]}
into this: eyJjb25maWd1cmF0aW9uIjp7InNob3AiOnsibmFtZSI6IiIsImFkZHJlc3NMaW5lMSI6IiIsImFkZHJlc3NMaW5lMiI6IiIsInBvc3RhbENvZGUiOiIiLCJjaXR5IjoiIiwiY291bnRyeSI6IiIsInBob25lTnVtYmVyIjoiIn19LCJqb2ludEJldHMiOltdLCJncm91cHMiOlt7Im5hbWUiOiJUZXN0IiwibWVtYmVycyI6W119XX0=
2º I store it on disk in UTF-8
3º I retrieve it from disk and it's this string:
eyJjb25maWd1cmF0aW9uIjp7InNob3AiOnsibmFtZSI6IiIsImFkZHJlc3NMaW5lMSI6IiIsImFkZHJlc3NMaW5lMiI6IiIsInBvc3RhbENvZGUiOiIiLCJjaXR5IjoiIiwiY291bnRyeSI6IiIsInBob25lTnVtYmVyIjoiIn19LCJqb2ludEJldHMiOltdLCJncm91cHMiOlt7Im5hbWUiOiJUZXN0IiwibWVtYmVycyI6W119XX0=
4º I decode it and get the exception.

The old Base64 utility in Java 8 adds a line break every 76 characters (java.util.Base64.getMimeEncoder() does the same).
The result looks like this:
/9j/4AAQSkZJRgABAgAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0a
HBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIy
MjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCABkAGQDASIA
AhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQA
AAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3
ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWm
...
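It seems that this behaviour changed in some version: at least with Java 11, the decoder returned by Base64.getDecoder() does not accept line breaks. The d in "Illegal base64 character d" is the hexadecimal value of the offending character, 0x0D, which is a carriage return.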
To avoid the problem you could change your method so that it strips line breaks before decoding:
public static String decryptString(String string) {
byte[] valueDecoded = Base64.getDecoder().decode(string.replace("\n", "").replace("\r", ""));
return new String(valueDecoded);
}
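As an alternative to stripping line breaks by hand, the standard library's MIME decoder ignores line separators. The sketch below is only an illustration: the class name LenientBase64Demo and the helper decodeLenient are made up here, and it assumes the original text was encoded from UTF-8 bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class LenientBase64Demo {

    // The MIME decoder skips line separators and other characters outside the
    // Base64 alphabet, so input wrapped at 76 columns or ending in "\r\n" still decodes.
    static String decodeLenient(String encoded) {
        byte[] decoded = Base64.getMimeDecoder().decode(encoded);
        return new String(decoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = Base64.getEncoder()
                .encodeToString("{\"groups\":[]}".getBytes(StandardCharsets.UTF_8));
        // Simulate a trailing Windows line ending picked up while reading the file.
        System.out.println(decodeLenient(encoded + "\r\n"));
    }
}
```

The trade-off is that the MIME decoder silently skips any character outside the Base64 alphabet, so it will not flag input that was corrupted in other ways.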

"Then, the encoded string is stored on disk using UTF-8. After
restarting the application, the encoded string is read from disk and
I'm trying to decode the string using this:"
This seems to be the point of failure. Most likely your problem is OS/JDK dependent. The following code works well for me (Win 7, latest JDK 1.8):
public static void main(String[] args) throws IOException {
String source = "{\"configuration\":{\"shop\":{\"name\":\"España\",\"addressLine1\":\"\",\"addressLine2\":\"\"," +
"\"postalCode\":\"\",\"city\":\"\",\"country\":\"\",\"phoneNumber\":\"\"}},\"jointBets\":[]," +
"\"groups\":[{\"name\":\"Test\",\"members\":[]}]}";
// Encode string
String encoded = encryptString(source);
System.out.println("Base64 encoded: " + encoded);
// Temp file; Paths.get adds the separator that java.io.tmpdir may lack on some systems
Path path = Paths.get(System.getProperty("java.io.tmpdir"), "data.txt");
// Write to File (explicit UTF-8 instead of the platform default)
try (BufferedWriter writer = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
writer.write(encoded);
}
// Read from File; Files.lines strips the line terminators and decodes as UTF-8
String dataFromFile;
try (Stream<String> lines = Files.lines(path)) {
dataFromFile = lines.collect(Collectors.joining("\n"));
}
// Compare content
assert encoded.equals(dataFromFile);
// Decode string
String decoded = decryptString(dataFromFile);
System.out.println("Base64 decoded: " + decoded);
}
public static String encryptString(String string) {
byte[] bytesEncoded = Base64.getEncoder().encode(string.getBytes(StandardCharsets.UTF_8));
return new String(bytesEncoded);
}
public static String decryptString(String string) {
byte[] valueDecoded = Base64.getDecoder().decode(string);
return new String(valueDecoded);
}
Base64 encoded:
eyJjb25maWd1cmF0aW9uIjp7InNob3AiOnsibmFtZSI6IkVzcGHDsWEiLCJhZGRyZXNzTGluZTEiOiIiLCJhZGRyZXNzTGluZTIiOiIiLCJwb3N0YWxDb2RlIjoiIiwiY2l0eSI6IiIsImNvdW50cnkiOiIiLCJwaG9uZU51bWJlciI6IiJ9fSwiam9pbnRCZXRzIjpbXSwiZ3JvdXBzIjpbeyJuYW1lIjoiVGVzdCIsIm1lbWJlcnMiOltdfV19
Base64 decoded:
{"configuration":{"shop":{"name":"España","addressLine1":"","addressLine2":"","postalCode":"","city":"","country":"","phoneNumber":""}},"jointBets":[],"groups":[{"name":"Test","members":[]}]}

My guess is that you are not specifying a charset. Try running the test below with and without the charset passed to the String constructor to verify.
@Test
public void base64Test() throws Exception{
String string = "ABCDF";
byte[] bytesEncoded = Base64.getEncoder().encode(string.getBytes());
String encodedStr = (new String(bytesEncoded,Charset.forName("ISO-8859-1")));
System.out.println(encodedStr);
byte[] valueDecoded = Base64.getDecoder().decode(encodedStr);
String decodedStr = (new String(valueDecoded,Charset.forName("ISO-8859-1")));
System.out.println(decodedStr);
}
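To take the platform default out of the picture entirely, both directions can name the charset explicitly. This is a sketch under the assumption that UTF-8 is the intended encoding; the class Base64CharsetDemo and its method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Base64CharsetDemo {

    // Encode the UTF-8 bytes of the text; encodeToString returns the Base64 text directly.
    static String encode(String plain) {
        return Base64.getEncoder().encodeToString(plain.getBytes(StandardCharsets.UTF_8));
    }

    // Decode the Base64 text and interpret the resulting bytes as UTF-8.
    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String source = "Espa\u00f1a"; // non-ASCII character to exercise the charset
        String encoded = encode(source);
        System.out.println(encoded);
        System.out.println(decode(encoded).equals(source));
    }
}
```

Base64 output itself is plain ASCII, so the charset matters for the text before encoding and after decoding, not for the encoded string.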

Encoding without a String in Java

In the following method, how is it possible to return the Cp037 byte[] without creating a String object, maybe by using some encode() methods and Charset?
public byte[] encodeCp037(byte[] bytes)
{
String str = null;
try
{
str = new String(bytes, "Cp037");
}
catch (UnsupportedEncodingException e)
{
throw new UnsupportedOperationException("Invalid encoding. Charset=Cp037.");
}
return str.getBytes();
}

This method can change the encoding without creating a String. (Of course it still has overhead, but no String is created.)
public static byte[] transform(byte[] bytes, String fromCharset, String toCharset) {
ByteBuffer byteBuffer = ByteBuffer.wrap(bytes);
CharBuffer charBuffer = Charset.forName(fromCharset).decode(byteBuffer);
ByteBuffer targetBuffer = Charset.forName(toCharset).encode(charBuffer);
return Arrays.copyOfRange(targetBuffer.array(), targetBuffer.position(), targetBuffer.limit());
}
You can use it to convert from cp037 to utf-8, for example:
byte[] cp037 = "123".getBytes("cp037");
byte[] utf8 = transform(cp037, "cp037", "utf8");
System.out.println(new String(utf8, "utf-8"));
This will print 123 and it was converted successfully.

Encode String to UCS2

I need to encode a Java string to UCS-2, e.g. 'h' to '0068'.
For example, I have this test setup:
@Test
public void testEncodeAsUCS2_string1() throws Exception {
String encoded = sendRequestTransformer.encodeAsUCS2("hello");
Assert.assertEquals("00680065006C006C006F", encoded);
}
This test tells me whether the conversion was successful, e.g. hello should return 00680065006C006C006F.
In my code I have the following conversion, which I know is wrong because it prints [B@742cb491. (The toString method of class Object returns a string consisting of the name of the class of which the object is an instance, the at-sign character `@', and the unsigned hexadecimal representation of the hash code of the object.)
I need help with the conversion
String encodeAsUcx2(String messageContent){
try {
messageContent.getBytes("UTF-16LE");
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
return String.valueOf(messageContent.getBytes());
}
How would I go about this?

After you convert your message to a byte[], you just need to loop through and convert each byte to a hex representation, and build a new String as you go.
For instance, you could do something like this:
static String encodeAsUcs2(String messageContent) throws UnsupportedEncodingException {
byte[] bytes = messageContent.getBytes("UTF-16LE");
StringBuilder sb = new StringBuilder();
for (byte b : bytes) {
sb.append(String.format("%02X", b));
}
return sb.toString();
}
I should note, however, that this will return "680065006C006C006F00", not "00680065006C006C006F" when encoding "hello".
To get the output you're expecting, you would need to use big-endian encoding ("UTF-16BE").
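A big-endian variant of the method above could look like the following sketch. The class name Ucs2HexDemo is illustrative; it uses StandardCharsets.UTF_16BE, which avoids the checked UnsupportedEncodingException. Note the assumption that the input stays within the Basic Multilingual Plane: UTF-16 encodes other characters as surrogate pairs, which strict UCS-2 does not allow.

```java
import java.nio.charset.StandardCharsets;

final class Ucs2HexDemo {

    // Hex-encode the big-endian UTF-16 bytes of the message, two hex digits per byte.
    static String encodeAsUcs2(String messageContent) {
        byte[] bytes = messageContent.getBytes(StandardCharsets.UTF_16BE);
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeAsUcs2("hello")); // prints 00680065006C006C006F
    }
}
```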

Java: Encode String in quoted-printable

I am looking for a way to quoted-printable encode a string in Java just like php's native quoted_printable_encode() function.
I have tried to use JavaMail's MimeUtility library. But I cannot get the encode(java.io.OutputStream os, java.lang.String encoding) method to work, since it takes an OutputStream as input instead of a String (I used getBytes() to convert the String) and outputs something that I cannot get back to a String (I'm a Java noob :)
Can anyone give me tips on how to write a wrapper that converts a String into an OutputStream and outputs the result as a String after encoding it?

To use this MimeUtility method you have to create a ByteArrayOutputStream which will accumulate the bytes written to it, which you can then recover. For example, to encode the string original:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
OutputStream encodedOut = MimeUtility.encode(baos, "quoted-printable");
encodedOut.write(original.getBytes());
encodedOut.close(); // flushes any output the encoder is still holding back
String encoded = baos.toString();
The encodeText function from the same class will work on strings, but it produces Q-encoding, which is similar to quoted-printable but not quite the same:
String encoded = MimeUtility.encodeText(original, null, "Q");

This is what helped me:
@Test
public void koi8r() {
String input = "=?koi8-r?Q?11=5F=F4=ED=5F21=2E05=2Erar?=";
String decode = EncodingUtils.decodeKoi8r(input);
Assertions.assertEquals("11_ТМ_21.05.rar", decode);
}
@Test
public void koi8rWithoutStartTag() {
String input = "=CF=D4=C4=C5=CC=D8=CE=D9=CD =D4=D2=C1=CE=DB=C5=CD =D2=C5=DA=C0=CD=.eml";
String decode = EncodingUtils.decodeKoi8r(input);
Assertions.assertEquals("отдельным траншем резюм=.eml", decode);
}
public static String decodeKoi8r(String text) {
String decode;
try {
decode = MimeUtility.decodeText(text);
} catch (UnsupportedEncodingException e) {
decode = text;
}
if (isQuotedKoi8r(decode)) {
decode = decode(text, "KOI8-R", "quoted-printable", "KOI8-R");
}
return decode;
}
public static boolean isQuotedKoi8r(String text) {
return text.contains("=") || text.toLowerCase().contains("koi8-r");
}
public static String decode(String text, String textEncoding, String encoding, String resultCharset) {
if (text.length() == 0) {
return text;
}
try {
byte[] bytes = text.getBytes(textEncoding);
InputStream decodedStream = MimeUtility.decode(new ByteArrayInputStream(bytes), encoding);
byte[] tmp = new byte[bytes.length];
int n = decodedStream.read(tmp);
byte[] res = new byte[n];
System.arraycopy(tmp, 0, res, 0, n);
return new String(res, resultCharset);
} catch (IOException | MessagingException e) {
return text;
}
}

Check if a String is valid UTF-8 encoded in Java

How can I check if a string is in valid UTF-8 format?

Only byte data can be checked. Once you have constructed a String, it is already UTF-16 internally.
"UTF-8 encoded" is therefore a property of a byte array, not of a String.
Here is a common case of UTF-8 conversion:
String myString = "\u0048\u0065\u006C\u006C\u006F World";
System.out.println(myString);
byte[] myBytes = null;
try
{
myBytes = myString.getBytes("UTF-8");
}
catch (UnsupportedEncodingException e)
{
e.printStackTrace();
System.exit(-1);
}
for (int i=0; i < myBytes.length; i++) {
System.out.println(myBytes[i]);
}
If you don't know the encoding of your byte array, juniversalchardet is a library to help you detect it.
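When you do have the raw bytes, one way to check them is a strict decode that reports errors instead of silently replacing bad sequences. The sketch below is one possible approach, not the only one; the class Utf8Validator and the method isValidUtf8 are names invented for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8Validator {

    // Returns true if the bytes decode as UTF-8 without any malformed sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // REPORT turns malformed input into an exception instead of U+FFFD.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidUtf8("Hello World".getBytes(StandardCharsets.UTF_8)));
        // 0xC3 starts a two-byte sequence, but 0x28 is not a continuation byte.
        System.out.println(isValidUtf8(new byte[] {(byte) 0xC3, (byte) 0x28}));
    }
}
```

By contrast, new String(bytes, "UTF-8") never fails: it replaces malformed input with the replacement character, so it cannot be used for validation on its own.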

The following post is taken from the official Java tutorials available at: https://docs.oracle.com/javase/tutorial/i18n/text/string.html.
The StringConverter program starts by creating a String containing
Unicode characters:
String original = new String("A" + "\u00ea" + "\u00f1" + "\u00fc" + "C");
When printed, the String named original appears as:
AêñüC
To convert the String object to UTF-8, invoke the getBytes method and
specify the appropriate encoding identifier as a parameter. The
getBytes method returns an array of bytes in UTF-8 format. To create a
String object from an array of non-Unicode bytes, invoke the String
constructor with the encoding parameter. The code that makes these
calls is enclosed in a try block, in case the specified encoding is
unsupported:
try {
byte[] utf8Bytes = original.getBytes("UTF8");
byte[] defaultBytes = original.getBytes();
String roundTrip = new String(utf8Bytes, "UTF8");
System.out.println("roundTrip = " + roundTrip);
System.out.println();
printBytes(utf8Bytes, "utf8Bytes");
System.out.println();
printBytes(defaultBytes, "defaultBytes");
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
The StringConverter program prints out the values in the utf8Bytes and
defaultBytes arrays to demonstrate an important point: The length of
the converted text might not be the same as the length of the source
text. Some Unicode characters translate into single bytes, others into
pairs or triplets of bytes.
The printBytes method displays the byte arrays by invoking the byteToHex method, which is defined in the source file,
UnicodeFormatter.java. Here is the printBytes method:
public static void printBytes(byte[] array, String name) {
for (int k = 0; k < array.length; k++) {
System.out.println(name + "[" + k + "] = " + "0x" +
UnicodeFormatter.byteToHex(array[k]));
}
}
The output of the printBytes method follows. Note that only the first
and last bytes, the A and C characters, are the same in both arrays:
utf8Bytes[0] = 0x41
utf8Bytes[1] = 0xc3
utf8Bytes[2] = 0xaa
utf8Bytes[3] = 0xc3
utf8Bytes[4] = 0xb1
utf8Bytes[5] = 0xc3
utf8Bytes[6] = 0xbc
utf8Bytes[7] = 0x43
defaultBytes[0] = 0x41
defaultBytes[1] = 0xea
defaultBytes[2] = 0xf1
defaultBytes[3] = 0xfc
defaultBytes[4] = 0x43
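The tutorial's UnicodeFormatter class is not shown here, so the following self-contained sketch reproduces the same comparison with a stand-in hex helper. The class ByteLengthDemo and the method toHex are hypothetical, and ISO-8859-1 is assumed in place of the platform default, which is what the defaultBytes output above suggests was in effect.

```java
import java.nio.charset.StandardCharsets;

final class ByteLengthDemo {

    // Space-separated two-digit lowercase hex; a stand-in for UnicodeFormatter.byteToHex.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "A\u00ea\u00f1\u00fcC";
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        byte[] latin1Bytes = original.getBytes(StandardCharsets.ISO_8859_1);
        // Same five characters, different byte counts per encoding.
        System.out.println(utf8Bytes.length + " bytes: " + toHex(utf8Bytes));
        System.out.println(latin1Bytes.length + " bytes: " + toHex(latin1Bytes));
    }
}
```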
