Trying to figure out what kind of Unicode encoding I should use - Java

I'm working on a Spring Boot project that fetches data from the database and then sends it in an HTTP POST request. Everything works for Latin text, but not for the Arabic data: the data in the database is encoded with ISO 8859-6. I have encoded it to UTF-8 and UTF-16, but it still returns unreadable text (question marks and special characters).
Test example in Arabic:
مرحبا
To be valid and reliable after the POST request, it should look like this:
06450631062d06280627
I can't figure out what kind of encoding happened here. I'm doing an integration from .NET to Java.
This is what they used in .NET:
public static String UnicodeStr2HexStr(String strMessage)
{
byte[] ba = Encoding.BigEndianUnicode.GetBytes(strMessage);
String strHex = BitConverter.ToString(ba);
strHex = strHex.Replace("-", "");
return strHex;
}
I just need to know which encoding is used here so I can apply it in Java. It would be helpful if someone could show me how.
I have tried this, but it returns a different value:
String encodedWithISO88591 = "مرحبا";
String decodedToUTF8 = new String(encodedWithISO88591.getBytes("ISO-8859-1"), "UTF-8");

In .NET, Encoding.BigEndianUnicode is UTF-16 big-endian, so what you're looking for is the hex representation of the Arabic string in UTF-16BE:
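import java.nio.charset.StandardCharsets;
import javax.xml.bind.DatatypeConverter; // JAXB: removed from the JDK in Java 11, add the JAXB API dependency there
String yourVal = "مرحبا";
System.out.println(DatatypeConverter.printHexBinary(yourVal.getBytes(StandardCharsets.UTF_16BE)));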
Output:
06450631062D06280627
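For reference, a minimal sketch of the whole round trip without DatatypeConverter. It assumes the raw column value reaches Java as ISO-8859-6 bytes (the bytes below are a hypothetical stand-in for what the database returns), decodes them once with that charset, and then produces the same upper-case hex string as the .NET UnicodeStr2HexStr. The class and method names are illustrative, and the Arabic text is written with \u escapes so the result does not depend on the source-file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf16BeHex {

    // Java counterpart of Encoding.BigEndianUnicode.GetBytes + BitConverter.ToString + Replace("-", "")
    static String toUtf16BeHex(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_16BE); // big-endian, no byte-order mark
        StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            hex.append(String.format("%02X", b & 0xFF)); // two upper-case hex digits per byte
        }
        return hex.toString();
    }

    // Reverse direction: parse the hex pairs back into bytes and decode them as UTF-16BE.
    static String fromUtf16BeHex(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        // Hypothetical column value: the five Arabic letters of the example as ISO-8859-6 bytes.
        byte[] fromDb = {(byte) 0xE5, (byte) 0xD1, (byte) 0xCD, (byte) 0xC8, (byte) 0xC7};
        String text = new String(fromDb, Charset.forName("ISO-8859-6")); // decode once, with the real charset

        String hex = toUtf16BeHex(text);
        System.out.println(hex);                              // 06450631062D06280627
        System.out.println(fromUtf16BeHex(hex).equals(text)); // true
    }
}
```

On Java 17 and later, java.util.HexFormat.of().withUpperCase().formatHex(bytes) can replace the formatting loop. The attempt in the question cannot work because getBytes("ISO-8859-1") replaces every Arabic character with '?', so the information is lost before the UTF-8 step.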

Related

Rest API call encoding in Flutter and decoding in Java

My Flutter app calls a REST API method /user/search/<search string> and I am forming the URL endpoint using encodeQueryComponent like this:
String endpoint = "/user/search/"+Uri.encodeQueryComponent(searchString);
The back-end implemented in Java tries to retrieve the search string like this:
value = URLDecoder.decode(value, StandardCharsets.UTF_8.toString());
However, when the search string contains the + sign, the raw encode string in the back-end contains %2B and the decoded String contains space. As a temporary hack, I am currently doing value = value.replace("%2B", "+"); instead of decode. But this is obviously not the right approach because the search string may contain characters from any language or special characters.
Can someone tell me what is the right way to get the original string sent by the user in Java?
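One plausible cause, which the question does not confirm: the value is decoded twice. If the server framework has already percent-decoded the path segment (turning %2B into +) and the code then calls URLDecoder.decode again, that second, form-style decode treats the + as an encoded space. The sketch below reproduces the effect with java.net.URLDecoder alone; the class name and the sample value are illustrative. If this is the cause, the fix is to decode exactly once, i.e. drop the extra URLDecoder.decode call when the framework already hands you a decoded value. Note also that encodeQueryComponent applies query-string rules (a space becomes +); on the Flutter side, Uri.encodeComponent, which encodes a space as %20, is the closer match for a path segment.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class DoubleDecodeDemo {

    // Form-style decoding: "%2B" becomes '+', while a bare '+' becomes a space.
    static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8); // Charset overload needs Java 10+
    }

    public static void main(String[] args) {
        String sent = "1%2B1"; // what encodeQueryComponent produces for "1+1"

        String once = decode(sent);  // correct: a single decode
        String twice = decode(once); // what happens if the value was already decoded

        System.out.println(once);  // 1+1
        System.out.println(twice); // 1 1
    }
}
```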

How to url encode data in Java

So I'm trying to translate working Python code into Java. One of the required steps is to URL-encode the data, but when I encode the data in Java, it looks different from the data encoded in Python.
In one block of the Python code there's this:
data = {'request-json': json}
print('Sending form data:', data)
data = urlencode(data)
data = data.encode('utf-8')
print('Sending data:', data)
The output:
Sending form data: {'request-json': '{"apikey": "xewpjipcpovwiiql"}'}
The output after encoding:
Sending data: b'request-json=%7B%22apikey%22%3A+%22xewpjipcpovwiiql%22%7D'
So this is what I'm trying to do in Java. As you can imagine, Java is more involved. I used Gson to convert to JSON:
Gson gson = new Gson();
API_Key key = new API_Key("xewpjipcpovwiiql");
String jsonInputString = gson.toJson(key);
Data data = new Data(key);
String request_form = gson.toJson(data);
System.out.println(request_form);
String urlencoded = URLEncoder.encode(request_form,StandardCharsets.UTF_8);
System.out.println(urlencoded);
The output:
Sending form data: {"request-json":{"apikey":"xewpjipcpovwiiql"}}
The output of the encoded string:
%7B%22requestjson%22%3A%7B%22apikey%22%3A%22xewpjipcpovwiiql%22%7D%7D
They don't look the same, so why do they come out differently? How do I get the same encoded string as Python in Java? I noticed that Python uses a combination of single and double quotes while Java uses only double quotes, so I don't know whether that makes a difference.
Thank you!
On the Python side: the data.encode('utf-8') call is not needed to build the string. str.encode converts the str into bytes (https://docs.python.org/3/library/stdtypes.html#str.encode), which is why the printed value starts with b'. The single quotes in the printed dict are just Python's representation of a str and are not part of the data.
The outer braces are missing because urlencode interprets request-json as the parameter name and only the inner JSON as its value. This is easier to see if you add a second property at the top level of the dict: you end up with request-json=%7B%22apikey%22%3A+%22xewpjipcpovwiiql%22%7D&second-property=<second-property-value>.
On the Java side: request_form is encoded in its entirety as a single value, which is what you would do to put the encoded value into one parameter of a URL, as in: https://host:port?some-parameter-name=%7B%22requestjson%22%3A%7B%22apikey%22%3A%22xewpjipcpovwiiql%22%7D%7D. To get what Python produces, encode only the inner JSON as the value and prefix it with the encoded name request-json and an =.
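A sketch of the Java side that should produce the same string as the Python code. It encodes the name and the value separately, as urlencode does, and uses a JSON string with a space after the colon: the Python output shows that space (it is the + in %3A+%22), while Gson's compact output has none. formField is a hypothetical helper name, and the JSON literal is hard-coded for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class FormEncodeLikePython {

    // Mirrors urlencode({name: value}) for one field: encode name and value, join with '='.
    static String formField(String name, String value) {
        return URLEncoder.encode(name, StandardCharsets.UTF_8) + "="
                + URLEncoder.encode(value, StandardCharsets.UTF_8); // Charset overload: Java 10+
    }

    public static void main(String[] args) {
        // Same spacing as the Python JSON string: a space after the colon.
        String json = "{\"apikey\": \"xewpjipcpovwiiql\"}";
        System.out.println(formField("request-json", json));
        // request-json=%7B%22apikey%22%3A+%22xewpjipcpovwiiql%22%7D
    }
}
```

If you keep Gson, pass the serialized inner object (what gson.toJson(key) returns, assuming API_Key has a single apikey field) as the value; the missing space after the colon is then the only remaining difference.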

Hive UDF's treatment of URLs

I've created a Hive UDF that parses a URL. The URL contains query parameters. When I parse the input in my UDF, however, characters like '=' and '&' are converted to gibberish.
Initially, I relied on Text's toString() method to convert the Hive Text to a Java String. With this approach, the characters above are converted to gibberish. I then tried using new String(str, StandardCharsets.UTF_8) to convert the Hive Text to a Java String. This worked at first, but then it started producing gibberish as well.
My method is shown below. Any ideas on what I might not be doing right?
public Text evaluate(final Text requestInput, final Text referrerInput) {
if (requestInput == null || referrerInput == null)
return null;
final String request = new String(requestInput.getBytes(), StandardCharsets.UTF_8); // converts '=' and '&' in URL strings to gibberish
final String referrer = new String(referrerInput.getBytes(), StandardCharsets.UTF_8); // converts '=' and '&' in URL strings to gibberish
}
When I run HQL in Hive:
SELECT get_json_object(json, '$.base.request_url') FROM events
I get this:
GET /api/get_info?id=1465473313746 HTTP/1.1
In my UDF, the toString() method (no additional processing) produces the following output:
GET /api/get_info?id\u003d1465473313746 HTTP/1.1
I learned that = and & were being converted to Unicode escape sequences (for example, \u003d is =). Why this happens is still unclear to me. Using the Apache Commons StringEscapeUtils utility made the problem easier to solve:
StringEscapeUtils.unescapeJava(requestInput.toString())
solved the issue.
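Two separate effects may be involved, and the sketch below illustrates both without any Hadoop dependency. First, Hadoop's Text.getBytes() returns its backing array, of which only the first getLength() bytes are valid, so new String(text.getBytes(), UTF_8) can pick up leftover bytes from an earlier, longer value when the object is reused; that would fit "worked at first, then produced gibberish". In a real UDF the safe forms are new String(t.getBytes(), 0, t.getLength(), StandardCharsets.UTF_8) or simply t.toString(). Second, sequences like \u003d are escapes present in the data itself, which is why unescapeJava helps. The byte array only simulates buffer reuse, and unescapeUnicode is a hypothetical minimal stand-in for StringEscapeUtils.unescapeJava that handles only \uXXXX escapes.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TextDecodingSketch {

    // Matches a literal backslash, 'u' and four hex digits.
    private static final Pattern UNICODE_ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Decode only the valid prefix of a (possibly reused) buffer, like Text.getLength() reports.
    static String decode(byte[] buffer, int length) {
        return new String(buffer, 0, length, StandardCharsets.UTF_8);
    }

    // Replace each backslash-u escape with the character it names.
    static String unescapeUnicode(String s) {
        Matcher m = UNICODE_ESCAPE.matcher(s);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Simulate reuse: a longer value is written first, then a shorter one over it.
        byte[] buffer = new byte[64];
        byte[] first = "GET /api/get_info?id=1465473313746 HTTP/1.1".getBytes(StandardCharsets.UTF_8);
        byte[] second = "GET /a?x=1&y=2".getBytes(StandardCharsets.UTF_8);
        System.arraycopy(first, 0, buffer, 0, first.length);
        System.arraycopy(second, 0, buffer, 0, second.length);

        String wrong = new String(buffer, StandardCharsets.UTF_8); // includes leftovers and zero padding
        String right = decode(buffer, second.length);               // only the valid bytes
        System.out.println(right.equals("GET /a?x=1&y=2")); // true
        System.out.println(wrong.equals(right));             // false

        // Escapes that are part of the data, as in the question's output.
        String escaped = "GET /api/get_info?id\\u003d1465473313746 HTTP/1.1";
        System.out.println(unescapeUnicode(escaped)); // GET /api/get_info?id=1465473313746 HTTP/1.1
    }
}
```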

ISO-8859-1 encoded strings out of /into JSON in Java

My application has a Java servlet that reads a JSONObject out of the request and constructs some Java objects that are used elsewhere. I'm running into a problem because there are strings in the JSON that are encoded in ISO-8859-1. When I extract them into Java strings, the encoding appears to get interpreted as UTF-16. I need to be able to get the correctly encoded string back at some point to put into another JSON object.
I've tried mucking around with ByteBuffers and CharBuffers, but then I don't get any characters at all. I can't change the encoding, as I have to play nicely with other applications that use ISO-8859-1.
Any tips would be greatly appreciated.
It's a legacy application using Struts 1.3.8. I'm using net.sf.json 2.2.4 for JSONObject and JSONArray.
A snippet of the parsing code is:
final JSONObject a = (JSONObject) i;
final JSONObject attr = a.getJSONObject("attribute");
final String category = attr.getString("category");
final String value = attr.getString("value");
I then create POJOs using that information, that are retrieved by another action class to create JSON to pass to the client for display, or to pass to other applications.
So to clarify, if the JSON contains the string "Juan Guzmán", the Java String contains something like Juan Guzm?_An (I don't have the exact one in front of me). I'm not sure how to get the correct diacritical back. I believe that if I can get a Java String that contains the correct representation, that Mezzie's solution, below, will allow me to create the string with the correct encoding to put back into the JSON to serve back.
I had the same issue, and I am using the same technology as you. In our case it was UTF-8, so just change that to UTF-16.
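import java.nio.charset.StandardCharsets;

public static String UTF8toISO( String str )
{
    // Despite the name, this repairs a string whose UTF-8 bytes were wrongly decoded
    // as ISO-8859-1: recover those bytes, then decode them as UTF-8.
    // StandardCharsets avoids the checked UnsupportedEncodingException.
    return new String( str.getBytes( StandardCharsets.ISO_8859_1 ), StandardCharsets.UTF_8 );
}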
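A self-contained demonstration of what such a helper undoes, assuming (the question does not confirm it) that the JSON was sent as UTF-8 but decoded somewhere as ISO-8859-1. In that case "Juan Guzmán" arrives as "Juan GuzmÃ¡n", and reversing the wrong decoding restores it. The class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeRepair {

    // Same technique as UTF8toISO: get back the bytes that were wrongly decoded
    // as ISO-8859-1, then decode them as UTF-8.
    static String repair(String misDecoded) {
        return new String(misDecoded.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Juan Guzm\u00E1n"; // a-acute written as a Unicode escape

        // Simulate the bug: UTF-8 bytes decoded with the wrong charset.
        // The a-acute (bytes C3 A1) turns into two characters, A-tilde and an inverted '!'.
        String garbled = new String(original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length());                 // 12
        System.out.println(repair(garbled).equals(original)); // true

        // If another system really needs ISO-8859-1, encode only at that boundary.
        byte[] latin1 = original.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(latin1.length); // 11: a-acute is the single byte 0xE1
    }
}
```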

How to encode cyrillic symbols in HTTP-requests in Java?

Hello!
My Android app sends an HTTP request to one of Google's API services. It works when the request parameter is in English, but when I test my function with Cyrillic, I get a 400 error. It seems the problem is encoding the Windows-1251 string to UTF-8. How can this be done in Java?
Try:
URLEncoder.encode(yourString, HTTP.UTF_8); // org.apache.http.protocol.HTTP.UTF_8, i.e. "UTF-8"
You should use URLEncoder#encode() to encode request parameters.
String query = "name1=" + URLEncoder.encode(value1, "UTF-8")
+ "&name2=" + URLEncoder.encode(value2, "UTF-8")
+ "&name3=" + URLEncoder.encode(value3, "UTF-8");
String url = "http://example.com?" + query;
// ...
Note that parameter names should actually also be URL-encoded, however in this particular example, they are all valid already. Also note that when you're using Android's builtin HttpClient API, you don't need to do this.
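A small sketch of the same approach that also encodes the parameter names, using a Cyrillic value to show what actually goes over the wire. Because a Java String is already Unicode, no explicit Windows-1251 to UTF-8 step is needed: URLEncoder produces the UTF-8 percent-escapes directly. The class, helper and parameter names are illustrative; the value is the Russian word "Привет", written with \u escapes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class QueryBuilder {

    // URL-encodes one name or value as UTF-8. The String overload also works on
    // Java and Android versions that lack encode(String, Charset).
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Builds "name1=value1&name2=value2..." with every name and value encoded.
    static String buildQuery(Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(enc(e.getKey()) + "=" + enc(e.getValue()));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", "\u041F\u0440\u0438\u0432\u0435\u0442"); // six Cyrillic letters
        params.put("lang", "ru");
        System.out.println("http://example.com?" + buildQuery(params));
        // http://example.com?q=%D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82&lang=ru
    }
}
```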
All String objects in Java are encoded as Unicode (UTF-16), and Unicode includes the characters from the Windows-1251 character set. For example, "Česká" is "\u010Cesk\u00E1".
If you want to send such a string to other software that uses a different character set, you need to convert the string to bytes in that character set, using the CharsetEncoder class or an OutputStreamWriter that is passed the Charset.
And if you receive a string from other software in a different character set, use CharsetDecoder or InputStreamReader with the Charset to convert it back to Unicode.
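A minimal sketch of the conversion described above, using OutputStreamWriter and InputStreamReader with an explicit Charset. "windows-1251" is the standard Java name for that code page; the class and method names are illustrative, and byte arrays stand in for the real network or file streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

public class Cp1251RoundTrip {

    static final Charset CP1251 = Charset.forName("windows-1251");

    // Java String (UTF-16 internally) -> Windows-1251 bytes, via OutputStreamWriter.
    static byte[] toCp1251(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, CP1251)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Windows-1251 bytes -> Java String, via InputStreamReader.
    static String fromCp1251(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), CP1251)) {
            char[] buf = new char[256];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u041F\u0440\u0438\u0432\u0435\u0442"; // six Cyrillic letters
        byte[] bytes = toCp1251(text);
        System.out.println(bytes.length);                   // 6: one byte per letter
        System.out.println(fromCp1251(bytes).equals(text)); // true
    }
}
```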
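Update on the deprecated parameter: use the JDK's StandardCharsets constant instead:
import static java.nio.charset.StandardCharsets.UTF_8;
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

String pathEncoded = "";
try {
    // The String-charset overload declares UnsupportedEncodingException,
    // even though UTF-8 is always supported.
    pathEncoded = URLEncoder.encode(path, UTF_8.name());
} catch (UnsupportedEncodingException e) {
    e.printStackTrace();
}
// On Java 10+, URLEncoder.encode(path, UTF_8) takes the Charset directly and throws nothing.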
