ISO-8859-1 encoded strings out of /into JSON in Java

My application has a Java servlet that reads a JSONObject out of the request and constructs some Java objects that are used elsewhere. I'm running into a problem because there are strings in the JSON that are encoded in ISO-8859-1. When I extract them into Java strings, the encoding appears to get interpreted as UTF-16. I need to be able to get the correctly encoded string back at some point to put into another JSON object.
I've tried mucking around with ByteBuffers and CharBuffers, but then I don't get any characters at all. I can't change the encoding, as I have to play nicely with other applications that use ISO-8859-1.
Any tips would be greatly appreciated.
It's a legacy application using Struts 1.3.8. I'm using net.sf.json 2.2.4 for JSONObject and JSONArray.
A snippet of the parsing code is:
final JSONObject a = (JSONObject) i;
final JSONObject attr = a.getJSONObject("attribute");
final String category = attr.getString("category");
final String value = attr.getString("value");
I then create POJOs using that information, that are retrieved by another action class to create JSON to pass to the client for display, or to pass to other applications.
So to clarify, if the JSON contains the string "Juan Guzmán", the Java String contains something like Juan Guzm?_An (I don't have the exact one in front of me). I'm not sure how to get the correct diacritical back. I believe that if I can get a Java String that contains the correct representation, that Mezzie's solution, below, will allow me to create the string with the correct encoding to put back into the JSON to serve back.

I had the same issue, and I am using the same technology as you are. In our case it was UTF-8, so just change that to UTF-16.
public static String UTF8toISO( String str )
{
    try
    {
        // Recover the raw bytes via ISO-8859-1, then decode them as UTF-8.
        return new String( str.getBytes( "ISO-8859-1" ), "UTF-8" );
    }
    catch ( UnsupportedEncodingException e )
    {
        e.printStackTrace();
    }
    // Fall back to the unchanged input if the charset name is not supported.
    return str;
}
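The conversion above only helps when the string was decoded with the wrong charset in the first place. Here is a minimal sketch of that round trip, assuming the incoming bytes were UTF-8 but were decoded as ISO-8859-1. The class and method names are hypothetical, and the asker's actual charsets may be different.

```java
import java.nio.charset.StandardCharsets;

class CharsetRepair {
    // Undo a wrong ISO-8859-1 decode of bytes that were really UTF-8.
    // ISO-8859-1 maps every byte to exactly one char, so getBytes() gives
    // back the original bytes without loss.
    static String latin1ToUtf8(String misdecoded) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String correct = "Juan Guzm\u00E1n"; // "Juan Guzmán"
        byte[] utf8 = correct.getBytes(StandardCharsets.UTF_8);

        // Simulate the bug: UTF-8 bytes decoded with the wrong charset.
        // U+00E1 becomes the two chars U+00C3 U+00A1.
        String garbled = new String(utf8, StandardCharsets.ISO_8859_1);

        System.out.println(garbled.length() - correct.length()); // one extra char
        System.out.println(latin1ToUtf8(garbled).equals(correct));
    }
}
```

Using StandardCharsets constants instead of charset names also avoids the checked UnsupportedEncodingException.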

Related

Rest API call encoding in Flutter and decoding in Java

My Flutter app calls a REST API method /user/search/<search string> and I am forming the URL endpoint using encodeQueryComponent like this:
String endpoint = "/user/search/"+Uri.encodeQueryComponent(searchString);
The back-end implemented in Java tries to retrieve the search string like this:
String value = URLDecoder.decode(value, StandardCharsets.UTF_8.toString());
However, when the search string contains a + sign, the raw encoded string in the back-end contains %2B and the decoded string contains a space. As a temporary hack, I am currently doing value = value.replace("%2B", "+"); instead of decoding. This is obviously not the right approach, because the search string may contain characters from any language as well as special characters.
Can someone tell me what is the right way to get the original string sent by the user in Java?
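For reference, here is a small sketch of how URLDecoder treats these characters. It does not diagnose the asker's setup; it only shows that one decode pass turns %2B into + while a second pass turns that + into a space, so a value that has already been decoded by a framework should not be decoded again. The class name is hypothetical, and the Charset overload of URLDecoder.decode requires Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class PlusDecoding {
    // One pass of application/x-www-form-urlencoded decoding.
    static String decodeOnce(String raw) {
        return URLDecoder.decode(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Form-style encoding of the user input "a+b c".
        String encoded = "a%2Bb+c";

        String once = decodeOnce(encoded);
        System.out.println(once);               // a+b c

        // Decoding the already-decoded value loses the plus sign.
        System.out.println(decodeOnce(once));   // a b c
    }
}
```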

How to url encode data in Java

So I'm trying to translate working Python code into Java. One of the required steps is to URL-encode the data, but when I encode the data in Java it looks different from the data encoded in Python.
One block of the Python code contains this:
data = {'request-json': json}
print('Sending form data:', data)
data = urlencode(data)
data = data.encode('utf-8')
print('Sending data:', data)
The Output
Sending form data: {'request-json': '{"apikey": "xewpjipcpovwiiql"}'}
The output after being encoded
Sending data: b'request-json=%7B%22apikey%22%3A+%22xewpjipcpovwiiql%22%7D'
So this is what I'm trying to do in Java. As you can imagine, Java is more involved. I used Gson to convert to JSON:
Gson gson = new Gson();
API_Key key = new API_Key("xewpjipcpovwiiql");
String jsonInputString = gson.toJson(key);
Data data = new Data(key);
String request_form = gson.toJson(data);
System.out.println(request_form);
String urlencoded = URLEncoder.encode(request_form,StandardCharsets.UTF_8);
System.out.println(urlencoded);
The output:
Sending form data: {"request-json":{"apikey":"xewpjipcpovwiiql"}}
The output of the encoded string:
%7B%22requestjson%22%3A%7B%22apikey%22%3A%22xewpjipcpovwiiql%22%7D%7D
They don't look the same, so why are they coming out differently? How do I get the same Python-encoded string in Java? I noticed that Python used a combination of single and double quotes while Java uses only double quotes, so I don't know if that makes a difference.
Thank You!

On the Python side: the data.encode('utf-8') call is not necessary, or at least the documentation describes it with a different intention than this use: https://docs.python.org/3/library/stdtypes.html#str.encode (and that's why there's a b' at the beginning).
The outer brackets are missing because it is interpreting request-json as the URL parameter name (it may be easier to understand if you add a second property at the json's top/first property level, you'll see you end with request-json=%7B%22apikey%22%3A+%22xewpjipcpovwiiql%22%7D&second-property=<second-property-value>).
On the Java side: the request_form is being completely interpreted as a single value to encode so you can put the encoded value as part of some parameter in a URL, as in: https://host:port?some-parameter-name=%7B%22requestjson%22%3A%7B%22apikey%22%3A%22xewpjipcpovwiiql%22%7D%7D
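To get the same shape as Python's urlencode, the name and the value can be encoded separately and joined with =. The sketch below assumes the JSON text is passed in as a plain string with the same spacing as the Python output. The class and method names are hypothetical, and the Charset overload of URLEncoder.encode requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class FormEncoding {
    // Encode one form field as name=value, each side percent-encoded.
    static String formField(String name, String value) {
        return URLEncoder.encode(name, StandardCharsets.UTF_8)
                + "="
                + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Same JSON text as the Python example, including the space after the colon.
        String json = "{\"apikey\": \"xewpjipcpovwiiql\"}";
        System.out.println(formField("request-json", json));
        // request-json=%7B%22apikey%22%3A+%22xewpjipcpovwiiql%22%7D
    }
}
```

Gson's default output has no space after the colon, so with Gson-generated JSON the + in the encoded value would be absent. That changes the encoded text but not the JSON content.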

trying to figure out what kind of unicode should i have

I'm working on a Spring Boot project that fetches data from the database and then sends it in an HTTP POST request. Everything is okay with Latin text, but the data I have in the database is encoded with ISO 8859-6. I have encoded it to UTF-8 and UTF-16, but it still returns unreadable text: question marks and special characters.
Test example in Arabic:
مرحبا
It should look like this to be valid after the POST request:
06450631062d06280627
I can't figure out what kind of encoding happened here. I'm doing an integration from .NET to Java:
this what they used in .NET :
public static String UnicodeStr2HexStr(String strMessage)
{
byte[] ba = Encoding.BigEndianUnicode.GetBytes(strMessage);
String strHex = BitConverter.ToString(ba);
strHex = strHex.Replace("-", "");
return strHex;
}
I just need to know what kind of encoding happened here so I can apply it in Java, and it would be helpful if someone could show me a way to do it.
I have tried this, but it returns a different value:
String encodedWithISO88591 = "مرحبا";
String decodedToUTF8 = new String(encodedWithISO88591.getBytes("ISO-8859-1"), "UTF-8");

What you're looking to get is the hex representation of the Arabic String in UTF-16BE
String yourVal = "مرحبا";
System.out.println(DatatypeConverter.printHexBinary(yourVal.getBytes(StandardCharsets.UTF_16BE)));
output will be :
06450631062D06280627
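DatatypeConverter lives in javax.xml.bind, which is no longer bundled with the JDK from Java 11 onward. Below is a sketch of the same conversion using only java.lang and java.nio; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class Utf16Hex {
    // Hex dump of the UTF-16BE bytes of a string, two uppercase digits per byte.
    static String toUtf16BeHex(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_16BE);
        StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            // Mask to 0..255 so negative bytes still print as two digits.
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // The Arabic test word from the question, written with Unicode escapes.
        String value = "\u0645\u0631\u062D\u0628\u0627";
        System.out.println(toUtf16BeHex(value)); // 06450631062D06280627
    }
}
```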

How to ensure that the JSON string is UTF-8 encoded in Java

I am working on a legacy web service client code where the JSON data is being sent to the web service. Recently it was found that for some requests in the JSON body, the service is giving HTTP 400 response due to invalid characters (non-UTF8) in the JSON Body.
Below is one example of the data which is causing the issue.
String value = "zu3z5eq tô‰U\f‹Á‹€z";
I am using the org.json.JSONObject.toString() method to generate the JSON string. How can I ensure that the JSON string is UTF-8 encoded?
I have already tried a few solutions available online, such as converting to a byte array and back or using the Java charset methods, but they did not work. Either they also convert valid values such as Chinese/Japanese characters, or they don't work at all.
Can you please provide some input on this?

You need to set the character encoding for OutputStreamWriter when you create it:
httpConn.connect();
wr = new OutputStreamWriter(httpConn.getOutputStream(), StandardCharsets.UTF_8);
wr.write(jsonObject.toString());
wr.flush();
Otherwise it defaults to the "platform default encoding," which is some encoding that has been used historically for text files on whatever system you are running.
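The effect of the charset argument can be checked without a network connection. The sketch below writes the same text through OutputStreamWriter into an in-memory stream, which stands in for httpConn.getOutputStream(). The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class WriterCharset {
    // Encode text through an OutputStreamWriter and return the bytes produced.
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String text = "t\u00F4"; // "tô"
        // The same character becomes a different byte sequence per charset.
        System.out.println(write(text, StandardCharsets.UTF_8).length);      // 3
        System.out.println(write(text, StandardCharsets.ISO_8859_1).length); // 2
    }
}
```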

Use Base64 encoding for converting the value to Byte[].
String value = "zu3z5eq tô‰U\f‹Á‹€z";
// WHILE SENDING ENCODE THE VALUE
byte[] encodedBytes = Base64.getEncoder().encode(value.getBytes("UTF-8"));
String encodedValue = new String(encodedBytes, "UTF-8");
// TRANSPORT....
// ON RECEIVING END DECODE THE VALUE
byte[] decodedBytes = Base64.getDecoder().decode(encodedValue.getBytes("UTF-8"));
System.out.println( new String(decodedBytes, "UTF-8"));

Cloudant java non-latin characters

I am having difficulty using the Cloudant Java client with Greek characters. Saving objects that include Strings with Greek characters seems to work, as they appear correctly in the Cloudant console. Below is a minimal test case for this. The DummyClass has a String name, an _id and a _rev.
String password = "xxxx";
CloudantClient client = new CloudantClient("xx", "xxx", password);
Database database = client.database("mydatabase", false);
DummyClass dummyobject = new DummyClass();
dummyobject.setName("ά έ ό ύ αβγδεζηθικλμνξ");
Response saveResponse = database.save(dummyobject);
String id = saveResponse.getId();
String result=new String();
DummyClass loaded = database.find(DummyClass.class,id);
result = result+"Object:"+loaded.getName()+"\n"; //Prints out garbage
result = result+"UTF-8:"+new String(loaded.getName().getBytes(),Charset.forName("utf-8"))+"\n"; //Prints most characters correct, except for some accented ones
InputStream inputStream = database.find(id);
DummyClass loadedFromStream = Json.fromJson(Json.parse(inputStream), DummyClass.class);
result = result+"From Stream:"+loadedFromStream.getName(); //prints out fine
return ok(result);
By retrieving the stream and using Jackson to deserialize, the output is correct, but then I'd have to implement many of the provided methods for views, bulk document manipulation etc.
Perhaps the problem is in the LightCouch library, specifically here: CouchDbClientBase.java, since that is the point that I have found differs between the two implementations (get() as object and as stream). However, I do not know how to confirm, fix or work around it.

We fixed this in release 1.1.0, I think:
https://github.com/cloudant/java-cloudant/releases/tag/1.1.0
[FIX] Fixed handling of non-ASCII characters when the platform's default charset is not UTF-8.

The problem was indeed in the LightCouch library. Making the following change, along with the corresponding changes in the code for views, fixed it.
return getGson().fromJson(new InputStreamReader(in), classType);
to
return getGson().fromJson(new InputStreamReader(in, Charset.forName("UTF-8")), classType);
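The difference between the two lines is which charset is used to decode the response bytes. Here is a sketch of the reading side, with an in-memory stream standing in for the HTTP response and ISO-8859-1 standing in for a non-UTF-8 platform default. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReaderCharset {
    // Decode a byte array through an InputStreamReader with the given charset.
    static String read(byte[] bytes, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String greek = "\u03AC"; // Greek small alpha with tonos
        byte[] utf8 = greek.getBytes(StandardCharsets.UTF_8);

        System.out.println(read(utf8, StandardCharsets.UTF_8).equals(greek)); // true
        // The wrong charset yields two unrelated chars instead of one.
        System.out.println(read(utf8, StandardCharsets.ISO_8859_1).length()); // 2
    }
}
```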
