Java: Convert encoded characters to regular string - java

I have a string like this in Java:
"\xd0\xb5\xd0\xbd\xd0\xb4\xd0\xbf\xd0\xbe\xd0\xb9\xd0\xbd\xd1\x82"
How can I convert it to a human readable equivalent?
Note:
Actually, this is GWT, and the string comes from Python as part of JSON data.
The JSONParser transforms it into something totally unrelated, so I want to be able to convert the string prior to parsing.
The expected result, which I call "human readable", is "ендпойнт" (https://mothereff.in/utf-8#%D0%B5%D0%BD%D0%B4%D0%BF%D0%BE%D0%B9%D0%BD%D1%82)

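Assuming that the string is a repetition of escapes of the form "\x00", where each 0 is a hexadecimal digit ([0-9a-fA-F]), you can convert it with something like this. Each escape is one byte of UTF-8, so the bytes are collected first and decoded together:
String values = "\\xd0\\xb5\\xd0\\xbd\\xd0\\xb4\\xd0\\xbf\\xd0\\xbe\\xd0\\xb9\\xd0\\xbd\\xd1\\x82";
java.io.ByteArrayOutputStream bytes = new java.io.ByteArrayOutputStream();
for (String val : values.split("\\\\x")) {
    // skip the empty string that split() produces before the first delimiter
    if (val.length() > 0) bytes.write(Integer.parseInt(val, 16));
}
System.err.print(new String(bytes.toByteArray(), java.nio.charset.StandardCharsets.UTF_8));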
Note that the if condition is due to the first delimiter: see How to prevent java.lang.String.split() from creating a leading empty string?
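When the escapes are embedded in other text, as they would be inside the JSON data mentioned in the question, splitting on \x alone is not enough. Here is a minimal sketch that scans the input, collects each \xNN as one byte, and decodes the result as UTF-8. The class name, the method names, and the "name" key are hypothetical, and the code assumes that everything outside an escape is plain ASCII. It is plain JVM Java; whether GWT's emulated JRE supports these classes has not been checked.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class HexEscapeDecoder {

    // Hypothetical helper: replaces literal \xNN sequences with the bytes they
    // name and decodes the whole byte sequence as UTF-8.
    // Assumption: every character outside an escape is plain ASCII.
    static String decode(String input) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < input.length()) {
            if (isHexEscape(input, i)) {
                // the two hex digits after \x form one byte
                bytes.write(Integer.parseInt(input.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                // copy an ordinary (assumed ASCII) character as a single byte
                bytes.write(input.charAt(i));
                i++;
            }
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    // True if a backslash, an 'x', and two hex digits start at position pos.
    static boolean isHexEscape(String input, int pos) {
        return pos + 4 <= input.length()
                && input.charAt(pos) == '\\'
                && input.charAt(pos + 1) == 'x'
                && Character.digit(input.charAt(pos + 2), 16) >= 0
                && Character.digit(input.charAt(pos + 3), 16) >= 0;
    }

    public static void main(String[] args) {
        String values = "{\"name\": \"\\xd0\\xb5\\xd0\\xbd\\xd0\\xb4\\xd0\\xbf\\xd0\\xbe\\xd0\\xb9\\xd0\\xbd\\xd1\\x82\"}";
        System.out.println(decode(values));
    }
}
```

For the sample input, decode returns {"name": "ендпойнт"}. Whether the console displays it correctly depends on the console's own encoding.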

I don't know whether it's my console or the code that isn't working, but you may try this:
import java.io.UnsupportedEncodingException;
import javax.xml.bind.DatatypeConverter; // note: no longer bundled with the JDK since Java 11
public class Utf8Decoder {
    public static void main(String[] args) {
        String url = "\\xd0\\xb5\\xd0\\xbd\\xd0\\xb4\\xd0\\xbf\\xd0\\xbe\\xd0\\xb9\\xd0\\xbd\\xd1\\x82";
        url = url.replaceAll("\\\\x", ""); // remove every \x, leaving a plain hex string
        // parse the hex digits into bytes, then decode the bytes as UTF-8
        String result = "";
        try {
            byte[] bytes = DatatypeConverter.parseHexBinary(url);
            result = new String(bytes, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        System.out.print("decoded value:" + result);
    }
}

Related

How to display backslash in json object value in java - Illegal escape JSONException

How can I display a backslash in a JSON value in Java? I get an org.json.JSONException: Illegal escape. at 9 with the sample code below.
I'm using the json 1.0.0 jar (org.json).
String s1 = "{'Hi':'\\ksdfdsfsdfdfg'}";
int i = (int) '/';
System.out.println(s1);
try
{
JSONObject json = new JSONObject(s1);
}
catch (JSONException e)
{
// TODO Auto-generated catch block
e.printStackTrace();
}
You need two backslashes to produce one backslash in a Java string literal "\\", and you need to double the backslash to get a backslash in the JSON string (since JavaScript has similar rules about backslash escapes and string literals as Java), thus, you need four backslashes:
String s1 = "{'Hi':'\\\\ksdfdsfsdfdfg'}";
If you do this:
String s1 = "{'Hi':'\\\\ksdfdsfsdfdfg'}";
try {
JSONObject json = new JSONObject(s1);
System.out.println(json.get("Hi"));
} catch (JSONException e) {
e.printStackTrace();
}
It prints:
\ksdfdsfsdfdfg
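I think you are using the wrong quotation marks. Strict JSON uses double quotation marks (the backslash still has to be doubled for the JSON parser, so four backslashes remain in the Java literal):
String s1 = "{\"Hi\":\"\\\\ksdfdsfsdfdfg\"}";
That should work.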
You have to add 4 backslashes for this to work. If you just print the parsed json object value you will see 2 backslashes. But if you get the value from the JSONObject you will see only one.
String s1 = "{'Hi':'\\\\ksdfdsfsdfdfg'}";
int i = (int) '/';
System.out.println(s1);
try {
JSONObject json = new JSONObject(s1);
System.out.println(json);//this will print two backslashes
String s = (String) json.get("Hi");
System.out.println(s);//this will print only one
} catch (JSONException e) {
e.printStackTrace();
}
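The two layers of escaping described above can be checked without any JSON library by counting the backslashes that actually end up in the runtime string. This is only an illustrative sketch; the class and method names are made up for the example.

```java
public class BackslashLayers {

    // Counts the backslash characters actually present in the runtime string.
    static int countBackslashes(String s) {
        int n = 0;
        for (char c : s.toCharArray()) {
            if (c == '\\') {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Two backslashes in the source give one at runtime: the JSON parser sees \k.
        String two = "{'Hi':'\\ksdfdsfsdfdfg'}";
        // Four backslashes in the source give two at runtime: the JSON parser sees \\,
        // which it reads as a single literal backslash.
        String four = "{'Hi':'\\\\ksdfdsfsdfdfg'}";
        System.out.println(two + " contains " + countBackslashes(two) + " backslash(es)");
        System.out.println(four + " contains " + countBackslashes(four) + " backslash(es)");
    }
}
```

The Java compiler removes the first layer, and the JSON parser removes the second, which is why the value read back from the JSONObject shows only one backslash.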

Java - '\' not being used to escape double quotes " in string

I have some JSON content in a ByteBuffer: "{\"ID\":101}"
This content is returned from a service invocation. I get the result from that microservice in a ByteBuffer, which means I cannot change the content.
Now, I need to get the json object from this buffer. I'm using this code :
ByteBuffer payloadByteBuffer = invokeResult.getPayload();
byte[] payloadByteArray = payloadByteBuffer.array();
rawJson = new String(payloadByteArray, Charset.forName("UTF-8"));
System.out.println("Raw JSon result = "+rawJson);
The string that gets printed is : "{\"ID\":101}"
Please note that '\' is printed within the string, although it was originally used to escape the double quotes. So when I try to convert this string to a JSON object, I get an error:
"Missing value at 1 [character 2 line 1]"
which is probably because '\' is not being treated as an escape for the double quote character (I think).
So my question is: how do I modify my string so that the '\' character is treated according to its intended purpose?
I have tried replacing "\", but it didn't work. I don't know why.
I have also tried different charset encodings, US-ASCII and ASCII, but got the same result.
You can try json-simple. You can use the dependency from here
P.S: Your JSON response is wrong.
ByteBuffer payloadByteBuffer = invokeResult.getPayload();
byte[] payloadByteArray = payloadByteBuffer.array();
rawJson = new String(payloadByteArray, Charset.forName("UTF-8"));
JSONParser parser = new JSONParser();
try {
JSONObject json = (JSONObject) parser.parse(rawJson);
} catch (ParseException e) {
e.printStackTrace();
}
It parses the JSON with \ in it.
How about using the Apache Commons Lang library?
I think it's a simple way to solve your problem.
Here is my full test code.
package just.test;
import java.nio.charset.Charset;
import org.apache.commons.lang3.StringEscapeUtils;
import org.json.JSONObject;
import java.nio.ByteBuffer;
public class UnescapeCharTest {
private static void testJSONString(final String rawJson)
{
JSONObject json = null;
try
{
json = new JSONObject(rawJson);
System.out.println("ID = "+json.get("ID"));
}
catch(org.json.JSONException je)
{
je.printStackTrace();
}
String convJson = rawJson.replace("\\", "");
try
{
json = new JSONObject(convJson);
System.out.println("ID = "+json.get("ID"));
}
catch(org.json.JSONException je)
{
je.printStackTrace();
}
convJson = StringEscapeUtils.unescapeJson(rawJson);
try
{
json = new JSONObject(convJson);
System.out.println("ID = "+json.get("ID"));
}
catch(org.json.JSONException je)
{
je.printStackTrace();
}
}
public static void main(String[] args)
{
String rawJson = "{\\\"ID\\\":101}";
testJSONString(rawJson);
String rawJson2 = null;
ByteBuffer payloadByteBuffer = ByteBuffer.wrap(rawJson.getBytes());
byte[] payloadByteArray = payloadByteBuffer.array();
rawJson2 = new String(payloadByteArray, Charset.forName("UTF-8"));
testJSONString(rawJson2);
}
}
I hope the code is right for you.
Have a good coding...
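The printed payload in the question, "{\"ID\":101}", begins and ends with a double quote. That suggests (this is an assumption, not something the question confirms) that the service returned a JSON string whose content is itself JSON. In that case the text needs to be decoded twice: first as a JSON string, then as a JSON object. Below is a library-free sketch of the first step. The class and method names are hypothetical, it handles only the standard JSON escapes, and malformed input such as a truncated \u escape will throw.

```java
public class JsonStringUnwrapper {

    // Hypothetical helper: if the text is a quoted JSON string, return its
    // unescaped content; otherwise return the trimmed text unchanged.
    static String unwrap(String raw) {
        String s = raw.trim();
        int last = s.length() - 1;
        if (s.length() < 2 || s.charAt(0) != '"' || s.charAt(last) != '"') {
            return s; // not a quoted string
        }
        StringBuilder out = new StringBuilder();
        for (int i = 1; i < last; i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 >= last) {
                out.append(c);
                continue;
            }
            char next = s.charAt(++i); // character following the backslash
            switch (next) {
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                case 'r': out.append('\r'); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                case 'u':
                    // \uXXXX: four hex digits naming one UTF-16 code unit
                    out.append((char) Integer.parseInt(s.substring(i + 1, i + 5), 16));
                    i += 4;
                    break;
                default:
                    out.append(next); // covers \" \\ and \/
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String payload = "\"{\\\"ID\\\":101}\""; // what the service appears to return
        System.out.println(unwrap(payload));      // inner JSON, ready for a JSON parser
    }
}
```

The unwrapped text can then be passed to whichever JSON parser is already in use. Unlike removing every backslash with replace, this keeps backslashes that are part of the actual data.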

Check if a string contain characters with bad encoding

I receive an XML file with a tag whose value is "97ò00430 ò", although this tag should contain only digits. The encoding used is "ISO-8859-1".
How can I detect the bad characters (ò...) in Java?
LNA
I guess you could use a Regex to check the format of your tag (here, "\d+" if you want numbers only).
public static String encode(String chr) {
    try {
        // recover the raw bytes, assuming the text was decoded as ISO-8859-1
        byte[] bytes = chr.getBytes("ISO-8859-1");
        // validUTF8 is a helper that is not shown in this answer
        if (!validUTF8(bytes))
            return chr;
        return new String(bytes, "UTF-8");
    } catch (UnsupportedEncodingException e) {
        throw new IllegalStateException("No char" + e.getMessage());
    }
}
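The snippet above calls validUTF8, which the answer does not define. One possible way to write such a check, assuming that strict decoding is what was intended, is to run the bytes through a CharsetDecoder configured to report errors instead of silently replacing them. The class name here is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {

    // Returns true only if the bytes form a well-formed UTF-8 sequence.
    static boolean validUTF8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "Ã²" read as ISO-8859-1 gives bytes C3 B2, which is valid UTF-8 for "ò"
        byte[] doubleDecoded = "\u00C3\u00B2".getBytes(StandardCharsets.ISO_8859_1);
        // "ò" read as ISO-8859-1 gives the single byte F2, which is not valid UTF-8
        byte[] latin1Only = "\u00F2".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(validUTF8(doubleDecoded));
        System.out.println(validUTF8(latin1Only));
    }
}
```

Pure ASCII input also passes this check, so a true result shows only that the bytes could be UTF-8, not that they were produced as UTF-8.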

Java Problems encoding UTF8

I think the easiest way to explain my problem is with a little example:
My string at the beginning is: PÃ¢tes, and the result should be: Pâtes. What I get as the result is still PÃ¢tes. How can I fix this?
Here the code:
private String encode(String string) {
    try {
        byte ptext[] = string.getBytes("UTF8");
        string = new String(ptext, "UTF8");
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
    return string;
}
There are two things to note about your code. The first is that you're using UTF8; the canonical charset name is UTF-8, although Java also accepts UTF8 as an alias, so this alone does not change the result.
The second is that you're essentially performing a no-op. Calling byte ptext[] = string.getBytes("UTF-8"); encodes the string as UTF-8 bytes, and new String(ptext, "UTF-8") decodes those same bytes back, so you end up with the string you started with.
What I think you mean is that the input was decoded as ISO-8859-1 and you want to reinterpret it as UTF-8. (This fits with the example input and output you've given.)
Try:
private String encode(String string) {
    try {
        byte ptext[] = string.getBytes("ISO-8859-1");
        string = new String(ptext, "UTF-8");
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
    return string;
}
This assumes that your initial string was originally read from somewhere and only contains ISO-8859-1 characters. As mentioned in a comment you should try to ensure the data is loaded in correctly from the source (i.e. when it is still just an array of bytes).
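The same repair can be written with the StandardCharsets constants, which avoids the checked UnsupportedEncodingException. This is a minimal sketch with hypothetical class and method names, and it relies on the same assumption that the text was mis-decoded as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeRepair {

    // Re-encodes the garbled text as ISO-8859-1 to recover the original bytes,
    // then decodes those bytes as UTF-8.
    static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = "P\u00C3\u00A2tes"; // "PÃ¢tes" written with Unicode escapes
        System.out.println(repair(garbled)); // the repaired string is "Pâtes"
    }
}
```

If the text was actually mis-decoded as windows-1252 rather than ISO-8859-1, some bytes will not round-trip this way, which is one more reason to fix the decoding where the bytes are first read.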

java concatenate two strings error

I have a function that returns a String:
public String getString(String password){
......
try {
.......
encodedPassword = Base64.encodeToString(msgDigest,1 );
} catch (UnsupportedEncodingException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (NoSuchAlgorithmException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return encodedPassword;
}
I want to append (concatenate) the string "=" to the string returned by the function.
I tried this:
encrptdPassword = getString("1234");
encrptdPassword = encrptdPassword+"=";
Or:
encrptdPassword = encrptdPassword .concat("=");
but the result looks like two separate strings (with a space or line break in between).
I think the problem is in Base64.encodeToString, but I must use a Base64-encoded string.
The function getString returns:
A6xnQhbz4Vx2HuGl4lXwZ5U2I8iziLRFnhP5eNfIRvQ
I want to add = to the returning string as:
A6xnQhbz4Vx2HuGl4lXwZ5U2I8iziLRFnhP5eNfIRvQ=
but I receive this on output
A6xnQhbz4Vx2HuGl4lXwZ5U2I8iziLRFnhP5eNfIRvQ =
Or:
A6xnQhbz4Vx2HuGl4lXwZ5U2I8iziLRFnhP5eNfIRvQ
=
...like two different strings.
Where am I going wrong?
I assume you're using Base64 from Apache Commons Codec.
The default constructor for this class uses "\r\n" as a line separator, which it adds to the end of every encoded line. If you don't want this, construct the object as:
new Base64(76, new byte[0]);
If this isn't the class you're calling (it looks like from your code sample you're calling a static method), check the API and see if you can set a line separator for the conversion.
Isn't the 1 in Base64.encodeToString(msgDigest,1 ) padding?
If it's not, then you could just trim() the string to remove the whitespace.
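To make the trimming and padding idea concrete, here is a small sketch using java.util.Base64 from Java 8 and later, which is not necessarily the class the question uses. The SHA-256 algorithm, the class name, and the method names are all assumptions for illustration, since the question elides the digest code. If the class in the question is android.util.Base64 (also an assumption), the flag value 1 is NO_PADDING, which would explain the missing "=", and NO_WRAP is the flag that suppresses the trailing line break.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class DigestBase64 {

    // Hashes the password (SHA-256 is an assumed algorithm) and encodes the
    // digest with the basic encoder, which pads with '=' and adds no line breaks.
    static String digestBase64(String password) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(password.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // For output from another encoder: removes whitespace and existing padding,
    // then pads with '=' up to a multiple of four characters.
    static String cleanAndPad(String encoded) {
        String stripped = encoded.replaceAll("\\s+", "").replaceAll("=+$", "");
        StringBuilder sb = new StringBuilder(stripped);
        while (sb.length() % 4 != 0) {
            sb.append('=');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(digestBase64("1234"));
        System.out.println(cleanAndPad("A6xnQhbz4Vx2HuGl4lXwZ5U2I8iziLRFnhP5eNfIRvQ\n"));
    }
}
```

Removing the whitespace before appending "=" is what keeps the padding on the same line as the rest of the encoded value.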
