URI Decoding doesn't work in MultipartConfig files - java

I have used this code to decode a URI string:
java.net.URLDecoder.decode(request.getParameter("comment"), "UTF-8")
It works, e.g.:
Input: cl%C4%81mor
Output: clāmor
But when I use @MultipartConfig in my Java servlet file, this happens:
Input: cl%C4%81mor
Output: cl%C4%81mor
I am not sure why this didn't work. Can you tell me why this happened and/or how to fix it? Thanks in advance.

Could it be that @MultipartConfig changes the default request encoding in your setup? Can you check whether request.getCharacterEncoding() returns UTF-8? Is the value returned from request.getParameter("comment") different after you add @MultipartConfig?
It would be easier to answer if you provided more information about your setup. If you are using Spring with JEE annotations, you may want to look at this answer.
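To narrow this down outside the servlet, it may help to run the decoder on the literal values. The sketch below is only an illustration: the class name DecodeCheck and the double-encoded input are assumptions, not something taken from the question. It shows that a single decode pass restores clāmor from cl%C4%81mor, while a value that was encoded twice still contains percent escapes after one pass.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class DecodeCheck {
    // Percent-decodes a value as UTF-8; wraps the checked exception for brevity.
    static String decodeUtf8(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // One pass over the value from the question (U+0101 is "a with macron").
        String once = decodeUtf8("cl%C4%81mor");
        System.out.println(once.equals("cl\u0101mor"));

        // Hypothetical double-encoded value: one pass only restores the escapes.
        String stillEscaped = decodeUtf8("cl%25C4%2581mor");
        System.out.println(stillEscaped.equals("cl%C4%81mor"));
    }
}
```

If the second case matches what the servlet sees, comparing the raw request.getParameter("comment") value with and without the annotation, as suggested above, should show where the extra encoding step comes from.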

Related

french accents giving "<?>" in http responses with correct charset (Java)

When I call an API that returns French sentences, all the accented characters are displayed as <?> in my Java code, even though the charset is well defined (application/json;charset=iso-8859-1).
Using Postman or my web browser, I don't face any problem.
I also tried calling the API with a Content-Type header set to application/json;charset=UTF-8 or application/json;charset=iso-8859-1, but the problem remains the same.
Any ideas?
response.getBody() gives:
{"sentences":[{"fr_value":"il �tait loin","dz_value":"kaan b3id","additional_information":{"personal_prounoun":"HE","verb":"�tre","adjective":"loin","tense":"pass�"}}],"count":1}
new String(response.getBody().getBytes(StandardCharsets.UTF_8)) gives exactly the same.
I'm using scribejava.
Edit: even after saving the response to a file and opening it with Notepad++, the result is similar.
You need to read it as ISO-8859-1. Not sure what comes next, as I don't know what you're doing. My transcoding tool at https://technojeeves.com/index.php/aliasjava1/51-transcode-in-java may be helpful. With wget:
wget -O - us-central1-dz-dialect-api.cl… | xcode -ie Latin1 (I made 'xcode' to invoke that Java app)
Problem solved using the following code:
httpResponse.setContentType("application/json;charset=UTF-8");
mapper.getFactory().configure(JsonGenerator.Feature.ESCAPE_NON_ASCII, true);
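The advice to read the body as ISO-8859-1 can be illustrated with a small sketch. It assumes you have access to the raw response bytes; the class name Latin1Demo and the sample bytes are made up for the example and do not come from scribejava. It also hints at why re-encoding an already decoded String did not help in the question: once a byte has been replaced by U+FFFD, the original value is gone.

```java
import java.nio.charset.StandardCharsets;

final class Latin1Demo {
    // Decodes raw bytes with the charset the server declared.
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Decodes the same bytes as UTF-8; invalid sequences become U+FFFD.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample payload: "il etait loin" with e-acute sent as the single byte 0xE9.
        String expected = "il \u00e9tait loin";
        byte[] raw = expected.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(decodeLatin1(raw).equals(expected));

        String wrong = decodeUtf8(raw);
        // The accent is now the replacement character and cannot be recovered.
        System.out.println(wrong.indexOf('\uFFFD') >= 0);
        String reencoded = new String(wrong.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
        System.out.println(reencoded.equals(wrong));
    }
}
```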

Cyrillic in request

When I send a request like:
<serverUrl>/objects.svc/objects(<some-cyrillic-str>)
I get an error like "The URI is malformed".
And when I add single quotes, so that it becomes:
<serverUrl>/objects.svc/objects('<some-cyrillic-str>')
I get an error like "The key property 'Id' is invalid".
I think the problem is URL encoding.
In the servlet, the Cyrillic part of the request URL looks like %D7%... and Olingo can't use it.
Q: What is the proper way to use Cyrillic in such situations?
UPD:
The Cyrillic part of the URL is encoded in JS (encodeURIComponent()) and then sent. The servlet (and then Olingo) receives this part as %D7%... When I try to decode the URL in a filter (before the servlet), I get the proper Cyrillic part in the filter, but the servlet can no longer be called with such a URL.
I solved it myself. It was a silly mistake in the Olingo server.
UPD:
The mistake was that the key property 'Id' was of type INT (not STRING). After the correction, Olingo handled the Cyrillic string correctly.
Thanks, all.

Strange behaviour with Jersey multipart request for UTF-8 encoding

I have seen strange behavior with Jersey and Tomcat multipart requests.
I have files with names in different languages, for example:
минуты назад.txt or 您好.txt
With the help of another post, I figured out that we need to convert the name to UTF-8.
Something like:
String fileName=new String(bodyPart.getContentDisposition().getFileName().getBytes(),"UTF-8");
With this, the names are converted back, but some characters are garbled into question marks. The file names mentioned above are converted to something like:
мин�?�?�? назад.txt and �?�好.txt
I am not sure why only a few characters are lost. In the code above, bodyPart is just a FormDataBodyPart from Jersey.
Is there any additional configuration needed in Tomcat? I tried adding URIEncoding="UTF-8", but that did not help.
I need some help understanding this.
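One thing worth checking is the charset used in getBytes(): with no argument it uses the platform default, which may not map every character back to its original byte. The sketch below is a guess at the cause, not a confirmed diagnosis. It assumes the UTF-8 header bytes were decoded as ISO-8859-1 somewhere along the way, and the class name FileNameRecode is invented for the example. ISO-8859-1 maps every byte to exactly one character, so the round trip is lossless.

```java
import java.nio.charset.StandardCharsets;

final class FileNameRecode {
    // Assumption: the name's UTF-8 bytes were decoded as ISO-8859-1 upstream.
    // Re-encoding with the same charset restores the bytes exactly.
    static String recodeLatin1ToUtf8(String misdecoded) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The Cyrillic file name from the question, written with Unicode escapes.
        String name = "\u043c\u0438\u043d\u0443\u0442\u044b \u043d\u0430\u0437\u0430\u0434.txt";

        // Simulate the assumed upstream mis-decoding.
        String misdecoded = new String(name.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);

        System.out.println(recodeLatin1ToUtf8(misdecoded).equals(name));
    }
}
```

If the container used a different charset for the header, the same idea applies with that charset in place of ISO-8859-1, provided it can represent every byte value.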

URLDecoder giving unexpected value on UTF-8 url parameter

I'm using java.net.URLDecoder to decode a URL parameter that is supposed to be encoded in UTF-8. A quick test reveals I'm getting ? instead of ∩ in the output. Here's the code:
System.out.println(java.net.URLDecoder.decode("A%E2%88%A9B%0AYour+answer+is%3A+3", "UTF-8"));
And as output I'm getting:
A?B
Your answer is: 3
When I plug the string A%E2%88%A9B%0AYour+answer+is%3A+3 into online decoders, they get it right:
A∩B
Your answer is: 3
Does anyone know what I'm doing wrong? Is this not actually UTF-8? The string is coming from com.google.gwt.http.client.URL.encodeQueryString(), which claims UTF-8 encoding.
As Siguza and VGe0rge pointed out, the Java code was running correctly, but Eclipse's console will not display in UTF-8 by default. A solution to that issue can be found here.
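A way to confirm that the decoding itself is fine, independent of how the console renders text, is to inspect the code point rather than the printed glyph. This is a minimal sketch; the class name ConsoleCheck is an assumption, and whether the UTF-8 PrintStream displays correctly still depends on the console being set to UTF-8.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class ConsoleCheck {
    // Percent-decodes a value as UTF-8; wraps the checked exception for brevity.
    static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String decoded = decode("A%E2%88%A9B%0AYour+answer+is%3A+3");

        // The hex code point is plain ASCII, so any console shows it correctly.
        // U+2229 is the intersection sign.
        System.out.println(Integer.toHexString(decoded.codePointAt(1)));

        // Write the text itself as UTF-8 bytes instead of the platform default.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(decoded);
    }
}
```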

encoding problem in servlet

I have a servlet that receives some parameters from the client and then does some work.
The parameters from the client are in Chinese, so I often get invalid characters in the servlet.
For example:
If I enter
http://localhost:8080/Servlet?q=中文&type=test
Then in the servlet, the 'type' parameter is correct (test), but the 'q' parameter is not decoded correctly; it turns into invalid characters that cannot be parsed.
However, if I press Enter in the address bar again, the URL changes to:
http://localhost:8080/Servlet?q=%D6%D0%CE%C4&type=test
Now my servlet gets the right value for 'q'.
What is the problem?
UPDATE
BTW, it works well when I send the form with POST.
When I send them via Ajax, for example:
url="http://..q='中文',
xmlhttp.open("POST",url,true);
Then the server side also gets invalid characters.
It seems that the server side gets the right result only when the Chinese characters are encoded like %xx.
That is to say, http://.../q=中文 does not work,
but http://.../q=%D6%D0%CE%C4 works.
But why does "http://www.google.com.hk/search?hl=zh-CN&newwindow=1&safe=strict&q=%E4%B8%AD%E6%96%87&btnG=Google+%E6%90%9C%E7%B4%A2&aq=f&aqi=&aql=&oq=&gs_rfai=" work?
Ensure that the encoding of the page with the form itself is also UTF-8 and ensure that the browser is instructed to read the page as UTF-8. Assuming that it's JSP, just put this at the very top of the page to achieve that:
<%@ page pageEncoding="UTF-8" %>
Then, to process GET query string as UTF-8, ensure that the servletcontainer in question is configured to do so. It's unclear which one you're using, so here's a Tomcat example: set the URIEncoding attribute of the <Connector> element in /conf/server.xml to UTF-8.
<Connector URIEncoding="UTF-8">
If you'd like to use POST, you need to ensure that the HttpServletRequest is instructed to parse the POST request body using UTF-8.
request.setCharacterEncoding("UTF-8");
Call this before you access the first parameter. A Filter is the best place for this.
See also:
Unicode - How to get the characters right?
Using non-ASCII characters as GET parameters (i.e. in URLs) is generally problematic. RFC 3986 recommends using UTF-8 and then percent encoding, but that's AFAIK not an official standard. And what you are using in the case where it works isn't UTF-8!
It would probably be safest to switch to POST requests.
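The difference between the two percent-encoded forms in the question can be reproduced with java.net.URLEncoder. This is a sketch under assumptions: the class name QueryEncodingDemo is made up, and the GBK charset is only present on JDKs that ship the extended charsets, so the example checks for it first. It suggests that the URL which worked locally carried GBK bytes, while the Google URL carries UTF-8 bytes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.Charset;

final class QueryEncodingDemo {
    // Percent-encodes a query value using the named charset.
    static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The two Chinese characters from the question (U+4E2D, U+6587).
        String q = "\u4e2d\u6587";

        // UTF-8 form, as in the Google URL: %E4%B8%AD%E6%96%87
        System.out.println(encode(q, "UTF-8"));

        // GBK form, as in the URL that worked locally: %D6%D0%CE%C4
        if (Charset.isSupported("GBK")) {
            System.out.println(encode(q, "GBK"));
        }
    }
}
```

Whichever charset the sender uses, the server has to decode the query string with the same one, which is what the URIEncoding setting controls on Tomcat.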
I believe the problem is on the sending side. As I understand from your description, when you type the URL into the browser you get a "correctly" encoded request. The browser does this job: it knows how to convert Unicode characters to a sequence of codes like %xx.
So check how you send the request. It should be encoded when sending.
Another possibility is to use the POST method instead of GET.
Do read this article on the URL encoding format: "www.blooberry.com/indexdot/html/topics/urlencoding.htm".
If you want, you could convert the characters to hex or Base64 and put them in the URL parameters.
I think it's better to put them in the body (POST) than in the URL (GET).
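For the Base64 suggestion, a minimal sketch using java.util.Base64 could look like this. The class name Base64Param is an assumption; the URL-safe alphabet is used so that the result needs no further percent-encoding, and both sides must agree on UTF-8 for the underlying bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Base64Param {
    // Encodes text as URL-safe Base64 (no '+', '/' or '=' in the output).
    static String toParam(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(utf8);
    }

    // Reverses toParam on the receiving side.
    static String fromParam(String param) {
        byte[] utf8 = Base64.getUrlDecoder().decode(param);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The two Chinese characters from the question (U+4E2D, U+6587).
        String q = "\u4e2d\u6587";
        String param = toParam(q);
        System.out.println(param);
        System.out.println(fromParam(param).equals(q));
    }
}
```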
