How to encode Cyrillic symbols in HTTP requests in Java? - java

Hello!
My Android app executes an HTTP request to one of Google's API services. It works when the request parameter is in English, but when I test my function with Cyrillic text I get a 400 error. It seems the problem is encoding the Windows-1251 string to UTF-8? How can that be done in Java?

Try:
URLEncoder.encode(yourString, "UTF-8");

You should use URLEncoder#encode() to encode request parameters.
String query = "name1=" + URLEncoder.encode(value1, "UTF-8")
+ "&name2=" + URLEncoder.encode(value2, "UTF-8")
+ "&name3=" + URLEncoder.encode(value3, "UTF-8");
String url = "http://example.com?" + query;
// ...
Note that parameter names should also be URL-encoded; in this particular example, however, they are all already valid. Also note that when you're using Android's built-in HttpClient API, you don't need to do this.
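For completeness, a minimal sketch (the host and parameter name are just placeholders) of sending such an encoded query over a plain HttpURLConnection:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodedQueryExample {
    public static void main(String[] args) throws Exception {
        // The Cyrillic value is percent-encoded as UTF-8 before it is put into the URL
        String query = "q=" + URLEncoder.encode("Москва", "UTF-8");
        URL url = new URL("http://example.com/api?" + query);

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");

        // Read the response as UTF-8 as well
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            System.out.println(in.readLine());
        }
    }
}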

All String objects in Java are encoded as Unicode (UTF-16)
and Unicode includes characters from the Windows-1251 character
set.
For example, "Česká" is "\u010Cesk\u00E1".
If you want to send this string to other software using a different
character set then you need to convert the string to bytes in
that character set using class CharsetEncoder, or using
class OutputStreamWriter and passing the Charset.
And if you receive a string from other software in a different character
set then use class CharsetDecoder or InputStreamReader
with the Charset to convert it back to Unicode.
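A rough sketch of that round trip (the file name is just an example), writing a string out as Windows-1251 bytes with OutputStreamWriter and reading it back with InputStreamReader:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;

public class CharsetRoundTrip {
    public static void main(String[] args) throws Exception {
        Charset cp1251 = Charset.forName("windows-1251");

        // Convert the UTF-16 String to Windows-1251 bytes on the way out
        try (Writer out = new OutputStreamWriter(new FileOutputStream("data.txt"), cp1251)) {
            out.write("Привет");
        }

        // Convert the Windows-1251 bytes back to a UTF-16 String on the way in
        try (Reader in = new InputStreamReader(new FileInputStream("data.txt"), cp1251)) {
            char[] buf = new char[64];
            int n = in.read(buf);
            System.out.println(new String(buf, 0, n));
        }
    }
}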

Update on the deprecated parameter:
import static java.nio.charset.StandardCharsets.UTF_8;

String pathEncoded = "";
try {
    pathEncoded = URLEncoder.encode(path, UTF_8.toString());
} catch (UnsupportedEncodingException e) {
    e.printStackTrace();
}
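On Java 10 and later there is also an overload of URLEncoder.encode that takes a Charset directly, which avoids the checked UnsupportedEncodingException entirely:
import java.net.URLEncoder;
import static java.nio.charset.StandardCharsets.UTF_8;

// 'path' is the same variable as above; no try/catch is needed with the Charset overload
String pathEncoded = URLEncoder.encode(path, UTF_8);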

Related

Trying to figure out what kind of Unicode I should have

I'm working on a Spring Boot project that fetches data from the database and then sends it via an HTTP POST request. Everything is okay with Latin text, but the data I have in the database is encoded with ISO 8859-6. I have encoded it to UTF-8 and UTF-16, but it still returns unreadable text, question marks, and special characters.
A test example in Arabic:
مرحبا
should look like this to be valid and reliable after the POST:
06450631062d06280627
I can't figure out what kind of encoding happened here. I'm now doing an integration from .NET to Java; this is what they used in .NET:
public static String UnicodeStr2HexStr(String strMessage)
{
    byte[] ba = Encoding.BigEndianUnicode.GetBytes(strMessage);
    String strHex = BitConverter.ToString(ba);
    strHex = strHex.Replace("-", "");
    return strHex;
}
I just need to know what kind of encoding happened here so I can apply it in Java; it would be helpful if someone could show me the way. I have tried this, but it returns a different value:
String encodedWithISO88591 = "مرحبا";
String decodedToUTF8 = new String(encodedWithISO88591.getBytes("ISO-8859-1"), "UTF-8");
What you're looking to get is the hex representation of the Arabic string in UTF-16BE (which is what .NET's Encoding.BigEndianUnicode produces):
String yourVal = "مرحبا";
System.out.println(DatatypeConverter.printHexBinary(yourVal.getBytes(StandardCharsets.UTF_16BE)));
The output will be:
06450631062D06280627
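Note that javax.xml.bind.DatatypeConverter was removed from the JDK in Java 11. On Java 17+ the same output can be produced with java.util.HexFormat, for example:
import java.nio.charset.StandardCharsets;
import java.util.HexFormat;

String yourVal = "مرحبا";
// UTF-16BE matches .NET's Encoding.BigEndianUnicode; upper case matches BitConverter's output
String hex = HexFormat.of().withUpperCase()
        .formatHex(yourVal.getBytes(StandardCharsets.UTF_16BE));
System.out.println(hex); // 06450631062D06280627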

Base64 String to Windows-1251 (Cyrillic symbols)

I have trouble converting an email attachment (a simple text file in Windows-1251 encoding with Latin and Cyrillic symbols) to a String, i.e. I have a problem converting the Cyrillic part.
I get the attachment as a Base64-encoded String like this:
Base64Encoded email Attachment
Original file
So when I try to decode it, I get "?" instead of the Cyrillic symbols.
How can I get the right Cyrillic (Russian) symbols instead of "?"?
I've already tried this code with all available encodings, but nothing helps to get correct Russian symbols.
BASE64Decoder dec = new BASE64Decoder();
for (String key : Charset.availableCharsets().keySet()) {
    System.out.println("K=" + key + " Value:" + Charset.availableCharsets().get(key));
    try {
        System.out.println(new String(dec.decodeBuffer(encoded), key));
    } catch (Exception e) {
        continue;
    }
}
Thank You beforehand.
I am not very familiar with BPEL and the protocols it uses. If you communicate between nodes using some binary protocol, then you must 1) ensure the client and receiver use the same charset and 2) convert the Java string into the proper bytes in that encoding. Java stores strings internally in UTF-16 format. So when you execute String correct = new String(commonName.getBytes("ISO-8859-1"), "ISO-8859-5") you will get a correct string in UTF-16. Then you need to export it to bytes in the requested encoding, e.g. byte[] buff = correct.getBytes("UTF-8"), assuming the encoding used between nodes is UTF-8. If the encoding happens to be different, then you must make sure it actually supports Cyrillic characters (e.g. ISO-8859-1 does not).
If you use XML for data exchange, make sure it declares a suitable encoding in <?xml encoding="UTF-8"?>. You then don't need to play with bytes; you just need to correctly "import" the string (see the correct variable above). Writing to XML converts characters automatically, but the declared encoding must support the characters you want to write. So if you set encoding="ISO-8859-1", you will get those question marks again.
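For the original attachment problem specifically, a minimal sketch of decoding the Base64 content with java.util.Base64 (instead of the internal sun.misc class) and interpreting the resulting bytes as Windows-1251:
import java.nio.charset.Charset;
import java.util.Base64;

// 'encoded' is the Base64 string of the attachment, as in the question
byte[] raw = Base64.getMimeDecoder().decode(encoded);

// The attachment bytes are Windows-1251, so decode them with that charset explicitly
String text = new String(raw, Charset.forName("windows-1251"));
System.out.println(text);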

ISO-8859-1 encoded strings out of /into JSON in Java

My application has a Java servlet that reads a JSONObject out of the request and constructs some Java objects that are used elsewhere. I'm running into a problem because there are strings in the JSON that are encoded in ISO-8859-1. When I extract them into Java strings, the encoding appears to get interpreted as UTF-16. I need to be able to get the correctly encoded string back at some point to put into another JSON object.
I've tried mucking around with ByteBuffers and CharBuffers, but then I don't get any characters at all. I can't change the encoding, as I have to play nicely with other applications that use ISO-8859-1.
Any tips would be greatly appreciated.
It's a legacy application using Struts 1.3.8. I'm using net.sf.json 2.2.4 for JSONObject and JSONArray.
A snippet of the parsing code is:
final JSONObject a = (JSONObject) i;
final JSONObject attr = a.getJSONObject("attribute");
final String category = attr.getString("category");
final String value = attr.getString("value");
I then create POJOs using that information, that are retrieved by another action class to create JSON to pass to the client for display, or to pass to other applications.
So to clarify, if the JSON contains the string "Juan Guzmán", the Java String contains something like Juan Guzm?_An (I don't have the exact one in front of me). I'm not sure how to get the correct diacritical back. I believe that if I can get a Java String that contains the correct representation, that Mezzie's solution, below, will allow me to create the string with the correct encoding to put back into the JSON to serve back.
I had the same issue, and I am using the same technology as you are. In our case it was UTF-8, so just change that to UTF-16 for your case.
public static String UTF8toISO(String str)
{
    try
    {
        return new String(str.getBytes("ISO-8859-1"), "UTF-8");
    }
    catch (UnsupportedEncodingException e)
    {
        e.printStackTrace();
    }
    return str;
}
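As a rough illustration of what that repair does, assuming the underlying problem is UTF-8 bytes that were decoded as ISO-8859-1 (a common cause of this kind of mojibake):
// "Juan Guzmán" encoded as UTF-8 but decoded as ISO-8859-1 shows up as "Juan GuzmÃ¡n"
String garbled = "Juan GuzmÃ¡n";
String repaired = UTF8toISO(garbled); // re-reads the same bytes as UTF-8
System.out.println(repaired);         // Juan Guzmán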

How to encode URL to avoid special characters in Java? [duplicate]

This question already has answers here:
HTTP URL Address Encoding in Java
(24 answers)
Closed 5 years ago.
I need Java code to encode a URL to avoid special characters such as spaces, %, &, etc.
URL construction is tricky because different parts of the URL have different rules for what characters are allowed: for example, the plus sign is reserved in the query component of a URL because it represents a space, but in the path component of the URL, a plus sign has no special meaning and spaces are encoded as "%20".
RFC 2396 explains (in section 2.4.2) that a complete URL is always in its encoded form: you take the strings for the individual components (scheme, authority, path, etc.), encode each according to its own rules, and then combine them into the complete URL string. Trying to build a complete unencoded URL string and then encode it separately leads to subtle bugs, like spaces in the path being incorrectly changed to plus signs (which an RFC-compliant server will interpret as real plus signs, not encoded spaces).
In Java, the correct way to build a URL is with the URI class. Use one of the multi-argument constructors that takes the URL components as separate strings, and it'll escape each component correctly according to that component's rules. The toASCIIString() method gives you a properly-escaped and encoded string that you can send to a server. To decode a URL, construct a URI object using the single-string constructor and then use the accessor methods (such as getPath()) to retrieve the decoded components.
Don't use the URLEncoder class! Despite the name, that class actually does HTML form encoding, not URL encoding. It's not correct to concatenate unencoded strings to make an "unencoded" URL and then pass it through a URLEncoder. Doing so will result in problems (particularly the aforementioned one regarding spaces and plus signs in the path).
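A minimal sketch of that approach (the host, path, and query values here are placeholders), using a multi-argument URI constructor so that each component is escaped according to its own rules:
import java.net.URI;

URI uri = new URI("http", "example.com", "/docs/résumé files", "q=São Paulo", null);
System.out.println(uri.toASCIIString());
// http://example.com/docs/r%C3%A9sum%C3%A9%20files?q=S%C3%A3o%20Paulo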
I also spent quite some time on this issue, so here's my solution:
String urlString2Decode = "http://www.test.com/äüö/path with blanks/";
String decodedURL = URLDecoder.decode(urlString2Decode, "UTF-8");
URL url = new URL(decodedURL);
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
String decodedURLAsString = uri.toASCIIString();
If you don't want to do it manually, use the Apache Commons Codec library. The class you are looking for is org.apache.commons.codec.net.URLCodec:
final String url = "http://www.google.com?....";
// encode() is an instance method and throws the checked org.apache.commons.codec.EncoderException
final String urlSafe = new org.apache.commons.codec.net.URLCodec().encode(url);
Here is my solution, which is pretty easy:
Instead of encoding the URL itself, I encoded the parameters I was passing, because the parameter was user input and the user could enter any unexpected string of special characters. This worked fine for me :)
String review = "User input"; /* user input as a String that will be passed as a parameter to the URL */
try {
    review = URLEncoder.encode(review, "utf-8");
    // Note: URLEncoder.encode already turns spaces into '+', so this replace is effectively a no-op
    review = review.replace(" ", "+");
} catch (UnsupportedEncodingException e) {
    e.printStackTrace();
}
String URL = "www.test.com/test.php" + "?user_review=" + review;
I would echo what Wyzard wrote but add that:
for query parameters, HTML encoding is often exactly what the server is expecting; outside these, it is correct that URLEncoder should not be used
the most recent URI spec is RFC 3986, so you should refer to that as a primary source
I wrote a blog post a while back about this subject: Java: safe character handling and URL building

What is the most efficient way to format UTF-8 strings in java?

I am doing the following:
String url = String.format(WEBSERVICE_WITH_CITYSTATE, cityName, stateName);
String urlUtf8 = new String(url.getBytes(), "UTF8");
Log.d(TAG, "URL: [" + urlUtf8 + "]");
Reader reader = WebService.queryApi(url);
The output that I am looking for is essentially to get the city name with blanks (e.g., "Overland Park") to be formatted as Overland%20Park.
Is this the best way?
Assuming you actually want to encode your string for use in a URL (i.e., "Overland Park" can also be formatted as "Overland+Park"), you want URLEncoder.encode(cityName, "UTF-8") on each parameter value. Other unsafe characters will be converted to the %xx format you are asking for.
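Encoding just the parameter values before formatting them into the URL would look roughly like this (the format string here is a made-up placeholder, since WEBSERVICE_WITH_CITYSTATE is not shown in the question):
import java.net.URLEncoder;

// Hypothetical format string with %s placeholders for city and state
String WEBSERVICE_WITH_CITYSTATE = "http://api.example.com/lookup?city=%s&state=%s";

// URLEncoder.encode(String, String) declares the checked UnsupportedEncodingException,
// which never actually occurs for "UTF-8"
String city = URLEncoder.encode("Overland Park", "UTF-8"); // "Overland+Park"
String state = URLEncoder.encode("KS", "UTF-8");
String url = String.format(WEBSERVICE_WITH_CITYSTATE, city, state);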
The simple answer is to use URLEncoder.encode(...) as stated by #Recurse. However, if part or all of the URL has already been encoded, then this can lead to double encoding. For example:
http://foo.com/pages/Hello%20There
or
http://foo.com/query?keyword=what%3f
Another concern with URLEncoder.encode(...) is that it doesn't understand that certain characters should be escaped in some contexts and not others. So for example, a '?' in a query parameter should be escaped, but the '?' that marks the start of the "query part" should not be escaped.
I think that a safer way to add missing escapes would be the following:
String safeURI = new URI(url).toASCIIString();
However, I haven't tested this ...
