Character encoding responses with RESTEasy / JAX-RS - Java

I'm using RESTEasy for JAX-RS on my server. My client sends a string containing the character '✓', and the server can store that character (I can confirm it is stored correctly on the server). However, the server can't seem to return the '✓' in a response; instead, a '?' gets sent.
I'm assuming I need to specify a return encoding or something, but I don't know where to do this, or how to check to see what the current encoding is!
How do I specify the encoding on my server so that I can return a '✓' in a response?
Edit: adding code
My server code:
@Path("compiled/{rootReportGroupId}")
@GET
@Produces("text/html; charset=UTF-8")
@NoCache
public String getCompiledReports(@PathParam("rootReportGroupId") Long rootReportGroupId){
return "✓";
}
A sample request:
GET http://192.168.0.12:8888/rest/reports/compiled/190
Host 192.168.0.12:8888
User-Agent Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:7.0.1) Gecko/20100101 Firefox/7.0.1
Accept text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language en-us,en;q=0.5
Accept-Encoding gzip, deflate
Accept-Charset ISO-8859-1,utf-8;q=0.7,*;q=0.7
Connection keep-alive
Content-Type application/json
The response headers:
Cache-Control public, no-transform, no-cache
Content-Type text/html;charset="UTF-8"
Content-Length 1
Server Jetty(6.1.x)
The response body:
?

This got a bit rambling and long, so I put it into an answer, but it is mostly a comment.
Out of curiosity, which versions of Java and RESTEasy, and which compiler settings, are you using?
I ran the code you posted here on Mac OS X 10.6, RESTEasy 2.2.3.GA, Java 1.6.0_29, and Tomcat 7.0.22, and it worked correctly (I removed the path parameter, but that doesn't seem relevant).
What is the code used to read and write on the server side? Are there encoding issues when reading?
I'm also suspicious of your response headers, particularly:
Content-Type text/html;charset="UTF-8"
which I think should be:
Content-Type text/html;charset=UTF-8
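If you want to rule out a String-to-byte conversion problem on the server side, one option is to hand RESTEasy the UTF-8 bytes yourself. A minimal sketch, assuming JAX-RS 1.x as shipped with RESTEasy 2.x (the path and class name here are made up for illustration):
import java.io.UnsupportedEncodingException;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.Response;

@Path("compiled/check")
public class CheckMarkResource {

    @GET
    @Produces("text/html; charset=UTF-8")
    public Response getCheckMark() throws UnsupportedEncodingException {
        // Encode the string to UTF-8 bytes explicitly, so no message body writer
        // can silently fall back to the platform default encoding.
        byte[] utf8 = "\u2713".getBytes("UTF-8");
        return Response.ok(utf8, "text/html; charset=UTF-8").build();
    }
}
If the '✓' comes through this way but not with the String return type, the culprit is the String-to-byte step rather than the headers.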

How do I specify the encoding on my server so that I can return a '✓'
in a response?
There are three layers to configure:
Browser Display and Form Submission
JSP
<%@page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8"%>
HTML
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
Web Server Processing
JSP
<%
request.setCharacterEncoding("UTF-8");
String name = request.getParameter("NAME");
%>
The same type of thing applies in a servlet. See the JBoss-specific solution as well as the complete server-independent solution in this answer.
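A rough sketch of the servlet equivalent (the class and parameter names are made up for illustration):
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class UtfFormServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        // Decode the submitted parameters as UTF-8; must happen before getParameter().
        request.setCharacterEncoding("UTF-8");
        // Encode the response writer's output as UTF-8.
        response.setContentType("text/html; charset=UTF-8");
        String name = request.getParameter("NAME");
        response.getWriter().write(name);
    }
}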
Database Settings
You may be losing the character information at the database level. Check to make sure your database encoding is UTF-8 also, and not ASCII.
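If the database happens to be MySQL (an assumption; the question doesn't say which database is in use), the Connector/J URL parameters below keep the JDBC driver itself from transcoding. Treat this as a sketch with placeholder host, schema, and credentials:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class ReportsDb {
    // useUnicode/characterEncoding tell Connector/J to exchange UTF-8 with the server.
    private static final String URL =
            "jdbc:mysql://localhost:3306/reports?useUnicode=true&characterEncoding=UTF-8";

    public static Connection open() throws SQLException {
        return DriverManager.getConnection(URL, "user", "password");
    }
}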
For a complete discussion of this topic, refer to the Java article Character Conversions from Browser to Database.

I think the problem is that your IDE/text editor is saving the file in another encoding, so you are making the container declare UTF-8 but the text itself isn't in that encoding, which is what causes the problem.
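If that is the cause, one way to take the source file encoding out of the equation entirely is to use a Unicode escape instead of the literal character (or to compile with javac -encoding UTF-8); a small sketch:
// \u2713 is the code point of '✓'. The escape is plain ASCII in the .java file,
// so it survives whatever encoding the editor or the compiler happens to use.
public String getCompiledReports(Long rootReportGroupId) {
    return "\u2713";
}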
Regards
Luan

Related

Rendered JSP markup has garbled UTF-8 characters

I modified the embedded-jetty project to create a stand-alone jsp-viewer (one file with full source code). The result works fine, but it has a problem displaying JSPs containing special glyphs. The problem is not that the Content-Type is unset when transmitting the markup, but that the rendered markup itself is garbled (in view-source or via curl). The JSP files must be read using the wrong character encoding, but starting the JVM with -Dfile.encoding=UTF-8 does nothing.
These strings
Butikknavn – et smartere valg
få ekstra fordeler når
become
Butikknavn â<80><93> et smartere valg
fÃ¥ ekstra fordeler nÃ¥r
Edit: Just to state the obvious, the content header is already set, as can be seen from the raw HTTP response
Content-Type:text/html;charset=utf-8
You have to add
<%@ page pageEncoding="UTF-8" %>
to your JSP file(s).
The -Dfile.encoding=UTF-8 flag should do the pageEncoding="UTF-8" part for the whole Jetty instance; regrettably, as you've mentioned, it doesn't. You might also try adding <page-encoding>UTF-8</page-encoding> to your web.xml (as described here), but I've never tried it.
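If you do try the web.xml route, the entry would presumably look something like this (a Servlet 2.4+ descriptor is assumed):
<jsp-config>
    <jsp-property-group>
        <url-pattern>*.jsp</url-pattern>
        <page-encoding>UTF-8</page-encoding>
    </jsp-property-group>
</jsp-config>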
Your HTTP response is probably missing the Content-Type header. Try adding one as follows:
Content-Type: text/html; charset=utf-8
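If the viewer writes the markup out itself rather than forwarding to a JSP, the servlet-API way to set that header would be along these lines (a fragment, not the viewer's actual code):
// Must be called before getWriter() is obtained or any output is flushed,
// otherwise the charset has no effect on how the writer encodes characters.
response.setContentType("text/html; charset=UTF-8");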

request.getCharacterEncoding() returns NULL... why?

A coworker of mine created a basic contact-us type form, which is mangling accented characters (è, é, à, etc.). We're using KonaKart, a Java e-commerce platform built on Struts 1.
I've narrowed the issue down to the data coming in through the HttpServletRequest object. Comparing a similar (properly functioning) form, I noticed that on the old form the request object's Character Encoding (request.getCharacterEncoding()) is returned as "UTF-8", but on the new form it is coming back as NULL, and the text coming out of request.getParameter() is already mangled.
Aside from that, I haven't found any significant differences between the known-good form, and the new-and-broken form.
Things I've ruled out:
Both HTML pages have the tag: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
Both form tags in the HTML use POST, and do not set encodings
Checking from Firebug, both the Request and Response headers have the same properties
Both JSP pages use the same attributes in the <%@page contentType="text/html;charset=UTF-8" language="java" %> tag
There's nothing remotely interesting going on in the *Form.java files, both inherit from BaseValidatorForm
I've checked the source file encodings, they're all set to Default - inherited from Container: UTF-8
If I convert them from ISO-8859-1 to UTF-8, it works great, but I would much rather figure out the core issue.
eg: new String(request.getParameter("firstName").getBytes("ISO-8859-1"),"UTF8")
Any suggestions are welcome, I'm all out of ideas.
Modern browsers usually don't supply the character encoding in the HTTP request's Content-Type header. In the case of HTML form based applications, however, it is the same character encoding as the one specified in the Content-Type header of the initial HTTP response that served the page with the form. You need to explicitly set the request character encoding to that same encoding yourself, which in your case is thus UTF-8.
request.setCharacterEncoding("UTF-8");
Do this before any request parameter is retrieved from the request (otherwise it's too late: the server platform's default encoding, which is indeed often ISO-8859-1, would be used to parse the parameters). A servlet filter which is mapped on /* is a perfect place for this.
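A sketch of such a filter (the class name is illustrative; map it on /* in web.xml or, on Servlet 3.0+, with @WebFilter):
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class CharacterEncodingFilter implements Filter {

    public void init(FilterConfig config) throws ServletException {
        // No configuration needed for this sketch.
    }

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        // Set the encoding before any getParameter() call parses the request body.
        if (request.getCharacterEncoding() == null) {
            request.setCharacterEncoding("UTF-8");
        }
        chain.doFilter(request, response);
    }

    public void destroy() {
        // Nothing to clean up.
    }
}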
See also:
Unicode - How to get the characters right?
request.getCharacterEncoding() relies on the charset attribute of the Content-Type request header, not on Accept-Charset.
So application/x-www-form-urlencoded;charset=ISO-8859-1 should work for the POST action. The <%@page tag doesn't affect the POST data.

Java utf8 email not working with IE after deploy

I have a mailer class that works fine with IE when I run the application locally, but after deploying it on a server it keeps sending gobbledygook and unreadable characters. I don't see where the problem is; everything is UTF-8. Here is my code:
public static void sendHTMLEmail(String to, String subject, String body)
        throws EmailException {
    HtmlEmail email = new HtmlEmail();
    email.setSmtpPort(25);
    email.setAuthenticator(new DefaultAuthenticator("myMail", "myPass"));
    email.setDebug(false);
    email.setHostName("smtp.gmail.com");
    email.setFrom("webmail@mysite.com", "Webmail@mysite");
    email.setCharset("UTF-8");
    email.setSubject(subject);
    // --set Body--
    String HTMLBody = "<html xmlns='http://www.w3.org/1999/xhtml'>";
    HTMLBody += "<head><meta http-equiv='Content-Type' content='text/html; charset=utf-8' /></head>";
    HTMLBody += "<body><div dir='rtl'>";
    HTMLBody += body;
    HTMLBody += "</div></body></html>";
    // -----------
    email.setHtmlMsg(HTMLBody);
    email.setTextMsg("Your email client does not support HTML messages");
    email.addTo(to);
    email.setTLS(true);
    email.send();
}
and here are my libraries :
import org.apache.commons.mail.DefaultAuthenticator;
import org.apache.commons.mail.Email;
import org.apache.commons.mail.EmailException;
import org.apache.commons.mail.HtmlEmail;
import org.apache.commons.mail.SimpleEmail;
Thanks for your time.
I'll assume that the String body method argument is actually the user-supplied data which has been entered in some <input> or <textarea> and submitted by a <form method="post"> in a JSP page.
The data will be submitted using the charset specified in the Content-Type header of the page containing the form. If the charset is absent from the Content-Type header, the browser will simply make a best guess; MSIE is generally not as smart as the others and will just grab the client platform's default encoding.
You need to ensure the following three things to get it all straight:
Ensure that the HTTP response containing the <form> is sent with charset=UTF-8 in the Content-Type header. You can achieve this by adding the following line to the top of the JSP responsible for generating the response:
<%@page pageEncoding="UTF-8" %>
This not only sets the response encoding to UTF-8, but also implicitly sets the Content-Type header to text/html;charset=UTF-8.
Ensure that the servlet which processes the form submit processes the input data in the obtained HTTP request with the same character encoding. You can achieve this by adding the following line before you get any information from the request, such as getParameter().
request.setCharacterEncoding("UTF-8");
A more convenient way would be to drop that line in a Filter which is mapped on a URL pattern of interest, so that you don't need to copy-paste the line across all servlets.
Ensure that you do not use the accept-charset attribute of the <form>. MSIE has serious bugs with this.

Does a servlet know the encoding of a submitted form when it is specified using http-equiv?

Does a servlet know the encoding of a submitted form when it is specified using http-equiv?
When I specify the encoding of a POSTed form using http-equiv like this:
<HTML>
<head>
<meta http-equiv='Content-Type' content='text/html; charset=gb2312'/>
</head>
<BODY >
<form name="form" method="post" >
<input type="text" name="v_rcvname" value="相宜本草">
</form>
</BODY>
</HTML>
And then in the servlet, when I use the method request.getCharacterEncoding(), I get null!
So, is there a way I can tell the server which character encoding I am using for the data?
This will indeed return null from most web browsers. But usually you can safely assume that the web browser has actually used the encoding specified in the original response header, which is in this case gb2312. A common approach is to create a Filter which checks the request encoding and then uses ServletRequest#setCharacterEncoding() to force the desired value (which you should of course use consistently throughout your web application).
public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws ServletException, IOException {
    if (request.getCharacterEncoding() == null) {
        request.setCharacterEncoding("gb2312");
    }
    chain.doFilter(request, response);
}
Map this Filter on a url-pattern covering all servlet requests, e.g. /*.
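For reference, the web.xml declaration for such a mapping might look like this (the filter name and class are placeholders for whatever you call the filter above):
<filter>
    <filter-name>encodingFilter</filter-name>
    <filter-class>com.example.EncodingFilter</filter-class>
</filter>
<filter-mapping>
    <filter-name>encodingFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>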
If you don't do this and just let it go, the servlet container will use its default encoding to parse the parameters, which is usually ISO-8859-1 and which in this case is wrong. Your input of 相宜本草 would end up like ÏàÒ˱¾²Ý.
It's impossible to send POST data back in GB2312. I think UTF-8 is the W3C recommendation and all new browsers only send data back in either Latin-1 or UTF-8.
We were able to get GB2312 encoded data back in old IE on Win 95 but it's generally not possible on the new Unicode based browsers.
See this test with Firefox:
POST / HTTP/1.1
Host: localhost:1234
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Content-Type: application/x-www-form-urlencoded
Content-Length: 46
My page is in GB2312 and I specified GB2312 everywhere, but Firefox simply ignores it.
Some broken browsers even encode Chinese in Latin-1. We recently added a hidden field with a known value. By checking the value, we can figure out the encoding.
request.getCharacterEncoding() returns the encoding from Content-Type. As you can see from my trace, it's always null.
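A sketch of that hidden-field trick; the field name and the marker character are made up, and it assumes the container decoded the parameters as ISO-8859-1 as described above:
// The form would carry something like <input type="hidden" name="probe" value="中">.
// GB2312 encodes '中' (U+4E2D) as 0xD6 0xD0, while UTF-8 uses 0xE4 0xB8 0xAD, so
// re-decoding the raw bytes tells us which encoding the browser actually used.
private static String detectFormEncoding(HttpServletRequest request)
        throws UnsupportedEncodingException {
    String mangled = request.getParameter("probe");   // Latin-1 view of the raw bytes
    byte[] raw = mangled.getBytes("ISO-8859-1");       // recover the bytes the browser sent
    return "\u4e2d".equals(new String(raw, "GB2312")) ? "GB2312" : "UTF-8";
}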

Java Servlet Mimetype and Internet Explorer mimetype handling bug

I have a servlet that may return text/html or application/pdf content. Apparently, Internet Explorer (IE7) does not handle application/pdf correctly.
For example, Servlet Output A may return HTML content:
[html content here]
And then Servlet Output B may return PDF content:
[pdf content here]
The URL associated with these outputs is the same servlet URL: http://web/Servlet
Reading online, it looks like IE may have a buggy content-sniffing mechanism and does not trust the mimetype/content type set by the server. Mainly, I am having an issue under Internet Explorer where I output the PDF, but for some reason IE reverts the content type to text/html and I get a blank HTML page.
Here is a quote on the issue:
"Now there is still another bug lurking even where the PDF servlet is fixed to set the MIME type of the response as application/pdf. If no results were found, then response sent this information back to the client using HTML! Now because of IE's MIME type shenanigans, the response would get displayed using a text/html MIME type. However most other browsers will trust the application/pdf MIME type sent from the server"
In Firefox with the same Servlet, I don't get this issue.
In the java code, I am essentially setting these response header values:
Expires=0
Cache-Control=max-age=1, must-revalidate, no-cache, post-check=0, pre-check=0
Pragma=public
Content-Disposition=inline; filename=filename_1257804404940.pdf
Content-Length=457834
Connection=Keep-Alive
Content-Type=application/pdf
Content-Language=en-US
Above is the output from Firefox. Under IE, I may get:
Content-Length=0
Connection=Keep-Alive
Content-Type=text/html
Content-Language=en-US
Even though the code is the same.
Here is my question: how can I avoid this problem?
We had similar problems at work. You can twist the browser's arm by including a file name with the proper extension in the URL you go to.
If you have an HTML-serving servlet and one that does PDF, this is obviously no problem; just map different URLs to them.
If the type is determined at runtime, you can kludge-solve the problem by defaulting to HTML and returning a document with a meta redirect to the PDF URL if that's needed.
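A sketch of that kludge; the /Servlet/report.pdf URL and a servlet mapping for it are assumptions:
// When the servlet decides at runtime that the result is a PDF, it can send a tiny
// HTML page that bounces the browser to a URL ending in ".pdf", the extension hint
// that IE refuses to take from the Content-Type header alone.
response.setContentType("text/html");
response.getWriter().write(
        "<html><head><meta http-equiv='refresh' content='0; url=/Servlet/report.pdf'/></head>"
      + "<body>Redirecting to the PDF...</body></html>");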
