I modified the embedded-jetty project to create a stand-alone jsp-viewer (one file with full source code). The result works fine, but it has a problem in displaying JSPs containing special glyphs. The problem is not that the Content-Type is not set when transmitting the markup, but that the rendered markup is garbled (in view-source or via curl). The JSP files must be read using the wrong character encoding, but starting the jvm with -Dfile.encoding=UTF-8 does nothing.
These strings
Butikknavn – et smartere valg
få ekstra fordeler når
become
Butikknavn â<80><93> et smartere valg
få ekstra fordeler når
Edit: Just to state the obvious, the content header is already set, as can be seen from the raw HTTP response
Content-Type:text/html;charset=utf-8
You have to add
<%@ page pageEncoding="UTF-8" %>
to your JSP file(s).
In theory, -Dfile.encoding=UTF-8 should have the same effect as pageEncoding="UTF-8" for the whole Jetty instance, but as you've mentioned, it doesn't. You might also try adding <page-encoding>UTF-8</page-encoding> inside a <jsp-property-group> in your web.xml, but I've never tried it.
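To see why the en dash comes out as â<80><93>, here is a minimal sketch that encodes the character as UTF-8 and then decodes the same bytes as ISO-8859-1, which is roughly what happens when a UTF-8 JSP file is read with the wrong charset. The class and method names are made up for illustration, and ISO-8859-1 is an assumption about which charset the container falls back to.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encode the text as UTF-8, then decode those bytes as ISO-8859-1.
    // This imitates reading a UTF-8 file with the wrong charset.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+2013 EN DASH, written as an escape so the example does not
        // depend on the encoding of this source file.
        String garbled = misread("\u2013");
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

The three code points printed are U+00E2 (â) followed by the control characters U+0080 and U+0093, matching the â<80><93> in the question.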
Your HTTP response is probably missing the content-type header. Try adding one as follows:
Content-Type: text/html; charset=utf-8
What is the point of this part of an autogenerated jsp-page in intellij?
<%@ page contentType="text/html;charset=UTF-8" language="java" %>
I ask because I don't see any loss of functionality if I remove this line. What's its purpose?
A JSP page should have its contentType set to text/html, but the web server may already default files to text/html. The same goes for the charset.
The UTF-8 part sets the character encoding. Without it, non-ASCII text such as Chinese may show up garbled when the browser opens the page.
A coworker of mine created a basic contact-us type form, which is mangling accented characters (è, é, à, etc). We're using KonaKart, a Java e-commerce platform, on Struts 1.
I've narrowed the issue down to the data coming in through the HttpServletRequest object. Comparing a similar (properly functioning) form, I noticed that on the old form the request object's Character Encoding (request.getCharacterEncoding()) is returned as "UTF-8", but on the new form it is coming back as NULL, and the text coming out of request.getParameter() is already mangled.
Aside from that, I haven't found any significant differences between the known-good form, and the new-and-broken form.
Things I've ruled out:
Both HTML pages have the tag: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
Both form tags in the HTML use POST, and do not set encodings
Checking from Firebug, both the Request and Response headers have the same properties
Both JSP pages use the same attributes in the <%@page contentType="text/html;charset=UTF-8" language="java" %> tag
There's nothing remotely interesting going on in the *Form.java files, both inherit from BaseValidatorForm
I've checked the source file encodings, they're all set to Default - inherited from Container: UTF-8
If I convert them from ISO-8859-1 to UTF-8, it works great, but I would much rather figure out the core issue.
eg: new String(request.getParameter("firstName").getBytes("ISO-8859-1"),"UTF8")
Any suggestions are welcome, I'm all out of ideas.
Modern browsers usually don't supply the character encoding in the Content-Type header of the HTTP request. For HTML form submissions, however, the browser uses the same character encoding as specified in the Content-Type header of the HTTP response that served the page with the form. You need to explicitly set the request character encoding to that same encoding yourself, which in your case is UTF-8.
request.setCharacterEncoding("UTF-8");
Do this before any request parameter is retrieved from the request (otherwise it's too late; the server platform default encoding will then be used to parse the parameters, which is indeed often ISO-8859-1). A servlet filter mapped on /* is a perfect place for this.
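The effect of the parsing charset can be imitated outside a container. The sketch below uses only java.net.URLDecoder, so it is an approximation of what the server does with the form body, not the container's actual code; the class and method names are hypothetical. It decodes the same percent-encoded value once as ISO-8859-1 and once as UTF-8, and shows that the workaround from the question simply reverses the wrong decode.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class FormDecodingDemo {
    // Percent-decode a raw form value using the given charset name.
    static String decodeAs(String rawValue, String charsetName) {
        try {
            return URLDecoder.decode(rawValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    // The workaround from the question: undo an ISO-8859-1 decode of UTF-8 bytes.
    static String repair(String mangled) {
        return new String(mangled.getBytes(StandardCharsets.ISO_8859_1),
                          StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // What a browser sends for U+00E9 (e with acute) from a UTF-8 page.
        String raw = "%C3%A9";
        String wrong = decodeAs(raw, "ISO-8859-1"); // two chars: U+00C3 U+00A9
        String right = decodeAs(raw, "UTF-8");      // one char: U+00E9
        System.out.println(wrong.length() + " vs " + right.length());
        System.out.println(repair(wrong).equals(right));
    }
}
```

Calling request.setCharacterEncoding("UTF-8") early enough corresponds to the second decodeAs call, which makes the repair step unnecessary.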
See also:
Unicode - How to get the characters right?
The request.getCharacterEncoding() relies on the Content-Type request header, not Accept-Charset.
So application/x-www-form-urlencoded;charset=ISO8859_1 should work for the POST action. The <%@page tag doesn't affect the POST data.
I have a web page where I do a search based on the text given in a text box. This text can be in any language, like Japanese or Chinese (or any MBCS character).
Now when I enter text in Japanese (or any other MBCS character), the result populates the screen (form) with some weird characters.
For Example: testテスト will turn into testãã¹ã.
When I look at the POST parameters in Firebug (debugging tool), I can see that the search string is sent as testテスト. However, when I put debug statements in my code, I can see that request.getParameter("searchString") is not able to identify the Japanese characters and turns them into some weird chars.
My JSP header already has <%@ page contentType="text/html; charset=UTF-8"
I have also tried putting pageEncoding="UTF-8" in this but it didn't help.
I tried setting character encoding like request.setCharacterEncoding("UTF-8") also just before doing request.getParameter but that too didn't work for me.
After going through a few forums and blogs, I also tried setting useBodyEncodingForURI=true in the <Connector> of the Tomcat config, but that did not help either.
Can anybody suggest something to resolve this issue?
Set the following in every servlet/action:
response.setContentType("text/html; charset=UTF-8");
response.setCharacterEncoding("UTF-8");
request.setCharacterEncoding("UTF-8");
Also set the following in the first servlet/action.
For Japanese:
response.setCharacterEncoding("UTF-8");
request.setCharacterEncoding("UTF-8");
session.setAttribute(Globals.LOCALE_KEY, new Locale("ja", "JP"));
I'm set up using RESTeasy for jax-rs on my server. My client sends a string containing the character '✓', and the server can store that character (I can confirm that it is being stored correctly on the server). However, the server can't seem to return the '✓' in a response - instead, a '?' gets sent.
I'm assuming I need to specify a return encoding or something, but I don't know where to do this, or how to check to see what the current encoding is!
How do I specify the encoding on my server so that I can return a '✓' in a response?
edit to add code
My server code:
@Path("compiled/{rootReportGroupId}")
@GET
@Produces("text/html; charset=UTF-8")
@NoCache
public String getCompiledReports(@PathParam("rootReportGroupId") Long rootReportGroupId) {
    return "✓";
}
A sample request:
GET http://192.168.0.12:8888/rest/reports/compiled/190
Host 192.168.0.12:8888
User-Agent Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:7.0.1) Gecko/20100101 Firefox/7.0.1
Accept text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language en-us,en;q=0.5
Accept-Encoding gzip, deflate
Accept-Charset ISO-8859-1,utf-8;q=0.7,*;q=0.7
Connection keep-alive
Content-Type application/json
The response headers:
Cache-Control public, no-transform, no-cache
Content-Type text/html;charset="UTF-8"
Content-Length 1
Server Jetty(6.1.x)
The response body:
?
A bit rambling and long so I put it into an answer, but it is mostly a comment.
Out of curiosity, what versions of Java, Rest Easy, compiler settings are you using?
I used your code you posted here on MacOS 10.6, RestEasy 2.2.3.GA, Java 1.6.0_29, Tomcat 7.0.22, and it worked correctly (I removed the param piece, but it doesn't seem relevant).
What is the code used to read and write on the server side? Are there encoding issues reading?
I'm also suspicious of your response headers, particularly:
Content-Type text/html;charset="UTF-8"
I think it should be:
Content-Type text/html;charset=UTF-8
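Another hint is the Content-Length of 1 in your response. As a rough check (a sketch with made-up class and method names, and ISO-8859-1 picked only as an example of a charset that lacks the character), '✓' takes three bytes in UTF-8, while encoding it with a charset that cannot represent it produces a single '?' byte:

```java
import java.nio.charset.StandardCharsets;

class CheckMarkBytes {
    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Encode with ISO-8859-1; unmappable characters are replaced with '?'.
    static byte[] latin1Bytes(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String check = "\u2713"; // U+2713 CHECK MARK
        System.out.println("UTF-8 length: " + utf8Length(check));
        byte[] latin1 = latin1Bytes(check);
        System.out.println("ISO-8859-1 length: " + latin1.length
                + ", first byte as char: " + (char) latin1[0]);
    }
}
```

So a one-byte body suggests the string was not written as UTF-8, whatever the header claims.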
How do I specify the encoding on my server so that I can return a '✓' in a response?
There are three layers to configure:
Browser Display and Form Submission
JSP
<%@page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8"%>
HTML
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
Web Server Processing
JSP
<%
request.setCharacterEncoding("UTF-8");
String name = request.getParameter("NAME");
%>
Same type of thing in a Servlet. See JBoss specific solution as well as complete server independent solution in this answer.
Database Settings
You may be losing the character information at the database level. Check to make sure your database encoding is UTF-8 also, and not ASCII.
For a complete discussion of this topic, refer to Java article Character Conversions from Browser to Database.
I think the problem is that your IDE or text editor is saving the file in a different encoding. You are making the container declare UTF-8, but the text isn't actually in that encoding, and that causes the problem.
Regards
Luan
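A small sketch of that mismatch, assuming the editor saved the file as ISO-8859-1 (the class and method names are only for illustration): the saved bytes are not valid UTF-8, so decoding them as UTF-8 yields the replacement character U+FFFD instead of the accented letter.

```java
import java.nio.charset.StandardCharsets;

class SavedEncodingDemo {
    // Decode file bytes the way a reader that assumes UTF-8 would.
    static String decodeAsUtf8(byte[] fileBytes) {
        return new String(fileBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "caf" + U+00E9, saved by an editor that uses ISO-8859-1.
        byte[] savedAsLatin1 = "caf\u00E9".getBytes(StandardCharsets.ISO_8859_1);
        String decoded = decodeAsUtf8(savedAsLatin1);
        // The lone 0xE9 byte is malformed UTF-8 and gets replaced.
        System.out.println(decoded.contains("\uFFFD"));
    }
}
```

Saving the file as UTF-8 in the editor, or declaring the encoding the file really uses, removes the mismatch.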
I'm having trouble dealing with charsets while parsing and rendering a page using the JSoup library. Here is an example of the page it renders:
http://dl.dropbox.com/u/13093/charset-problem.html
As you can see, where there should be ' characters, ? is being rendered instead (even when you view the source).
This page is being generated by downloading a web page, parsing with JSoup, and then re-rendering it again having made some structural changes.
I'm downloading the page as follows:
final Document inputDoc = Jsoup.connect(sourceURL.toString()).get();
When I create the output document I do so as follows:
outputDoc.outputSettings().charset(Charset.forName("UTF-8"));
outputDoc.head().appendElement("meta").attr("charset", "UTF-8");
outputDoc.head().appendElement("meta").attr("http-equiv", "Content-Type")
.attr("content", "text/html; charset=UTF-8");
Can anyone offer suggestions as to what I'm doing wrong?
edit: Note that the source page is http://blog.locut.us/ and as you'll see, it appears to render correctly
The question marks are typical whenever you write characters to the outputstream of the response which are not covered by the response's character encoding. You seem to be relying on the platform default character encoding when serving the response. The response Content-Type header of your site also confirms this by a missing charset attribute.
Assuming that you're using a servlet to serve the modified HTML, then you should be using HttpServletResponse#setCharacterEncoding() to set the character encoding before writing the modified HTML out.
response.setCharacterEncoding("UTF-8");
response.getWriter().write(html);
The problem is most likely in reading the input page; you need to use the correct encoding for the source too.