HTML : Form does not send UTF-8 format inputs - java

I've visited each one of the questions about UTF-8 encoding in HTML and nothing seems to be making it work like expected.
I added the meta tag : nothing changed.
I added the accept-charset attribute in form : nothing changed.
JSP File
<%@ page pageEncoding="UTF-8" %>
<%@ taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c" %>
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8" />
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
<title>Editer les sous-titres</title>
</head>
<body>
<form method="post" action="/Subtitlor/edit" accept-charset="UTF-8">
<h3 name="nameOfFile"><c:out value="${ nameOfFile }"/></h3>
<input type="hidden" name="nameOfFile" id="nameOfFile" value="${ nameOfFile }"/>
<c:if test="${ !saved }">
<input value ="Enregistrer le travail" type="submit" style="position:fixed; top: 10px; right: 10px;" />
</c:if>
Retour à la page d'accueil
<c:if test="${ saved }">
<div style="position:fixed; top: 90px; right: 10px;">
<c:out value="Travail enregistré dans la base de donnée"/>
</div>
</c:if>
<table border="1">
<c:if test="${ !saved }">
<thead>
<th style="weight:bold">Original Line</th>
<th style="weight:bold">Translation</th>
<th style="weight:bold">Already translated</th>
</thead>
</c:if>
<c:forEach items="${ subtitles }" var="line" varStatus="status">
<tr>
<td style="text-align:right;"><c:out value="${ line }" /></td>
<td><input type="text" name="line${ status.index }" id="line${ status.index }" size="35" /></td>
<td style="text-align:right"><c:out value="${ lines[status.index].content }"/></td>
</tr>
</c:forEach>
</table>
</form>
</body>
</html>
Servlet
for (int i = 0 ; i < 2; i++){
System.out.println(request.getParameter("line"+i));
}
Output
Et ton pÃ¨re et sa soeur
Il ne sera jamais parti.

I added the meta tag : nothing changed.
It indeed doesn't have any effect when the page is served over HTTP instead of e.g. from local disk file system (i.e. the page's URL is http://... instead of e.g. file://...). In HTTP, the charset in HTTP response header will be used. You've already set it as below:
<%@page pageEncoding="UTF-8"%>
This will not only write out the HTTP response using UTF-8, but also set the charset attribute in the Content-Type response header.
This one will be used by the webbrowser to interpret the response and encode any HTML form params.
I added the accept-charset attribute in form : nothing changed.
It has only effect in Microsoft Internet Explorer browser. Even then it is doing it wrongly. Never use it. All real webbrowsers will instead use the charset attribute specified in the Content-Type header of the response. Even MSIE will do it the right way as long as you do not specify the accept-charset attribute. As said before, you have already properly set it via pageEncoding.
Get rid of both the meta tag and the accept-charset attribute. They have no useful effect, they will only confuse you in the long term, and they can even make things worse when the end user uses MSIE. Just stick to pageEncoding. Instead of repeating the pageEncoding in every JSP page, you could also set it globally in web.xml as below:
<jsp-config>
<jsp-property-group>
<url-pattern>*.jsp</url-pattern>
<page-encoding>UTF-8</page-encoding>
</jsp-property-group>
</jsp-config>
As said, this will tell the JSP engine to write HTTP response output using UTF-8 and set it in the HTTP response header too. The webbrowser will use the same charset to encode the HTTP request parameters before sending back to server.
Your only missing step is to tell the server that it must use UTF-8 to decode the HTTP request parameters before returning in getParameterXxx() calls. How to do that globally depends on the HTTP request method. Given that you're using POST method, this is relatively easy to achieve with the below servlet filter class which automatically hooks on all requests:
@WebFilter("/*")
public class CharacterEncodingFilter implements Filter {
@Override
public void init(FilterConfig config) throws ServletException {
// NOOP.
}
@Override
public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
request.setCharacterEncoding("UTF-8");
chain.doFilter(request, response);
}
@Override
public void destroy() {
// NOOP.
}
}
That's all. In Servlet 3.0+ (Tomcat 7 and newer) you don't need additional web.xml configuration.
You only need to keep in mind that it's very important that setCharacterEncoding() method is called before the POST request parameters are obtained for the first time using any of getParameterXxx() methods. This is because they are parsed only once on first access and then cached in server memory.
So e.g. below sequence is wrong:
String foo = request.getParameter("foo"); // Wrong encoding.
// ...
request.setCharacterEncoding("UTF-8"); // Attempt to set it.
String bar = request.getParameter("bar"); // STILL wrong encoding!
Doing the setCharacterEncoding() job in a servlet filter will guarantee that it runs timely (at least, before any servlet).
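The "parsed once, then cached" behaviour described above can be made concrete with a toy model. This is only a sketch: ToyRequest is a made-up stand-in written for illustration, not the servlet API and not any container's real implementation, and the ISO-8859-1 default is an assumption that matches the common container default mentioned in this thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a request holding one POSTed parameter value.
class ToyRequest {
    private final byte[] rawValue;                           // bytes exactly as the browser sent them
    private Charset encoding = StandardCharsets.ISO_8859_1;  // assumed container default
    private String cached;                                   // decoded value, filled on first access

    ToyRequest(byte[] rawValue) {
        this.rawValue = rawValue;
    }

    void setCharacterEncoding(Charset charset) {
        this.encoding = charset;
    }

    String getParameter() {
        if (cached == null) {
            // Decoding happens only once; later calls reuse the cached result.
            cached = new String(rawValue, encoding);
        }
        return cached;
    }

    public static void main(String[] args) {
        byte[] sent = "\u00E8".getBytes(StandardCharsets.UTF_8); // e-grave as UTF-8 bytes

        ToyRequest tooLate = new ToyRequest(sent);
        tooLate.getParameter();                                  // first access decodes with the default
        tooLate.setCharacterEncoding(StandardCharsets.UTF_8);    // has no effect any more
        System.out.println(tooLate.getParameter().length());     // 2 chars: garbled

        ToyRequest inTime = new ToyRequest(sent);
        inTime.setCharacterEncoding(StandardCharsets.UTF_8);     // set before first access
        System.out.println(inTime.getParameter().length());      // 1 char: correct
    }
}
```

In the toy model, as in the wrong sequence shown earlier, the second getParameter() call still returns the garbled value because the cache was filled before the encoding was set.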
In case you'd like to instruct the server to decode GET (not POST) request parameters using UTF-8 too (those parameters you see after ? character in URL, you know), then you'd basically need to configure it in the server end. It's not possible to configure it via servlet API. In case you're using for example Tomcat as server, then it's a matter of adding URIEncoding="UTF-8" attribute in <Connector> element of Tomcat's own /conf/server.xml.
In case you're still seeing Mojibake in the console output of System.out.println() calls, then chances are big that the stdout itself is not configured to use UTF-8. How to do that depends on who's responsible for interpreting and presenting the stdout. In case you're using for example Eclipse as IDE, then it's a matter of setting Window > Preferences > General > Workspace > Text File Encoding to UTF-8.
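One way to tell a garbled parameter from a garbled console is to print the value in a form that does not depend on the console's encoding at all. The helper below is a minimal sketch (EscapeDump and escape are names invented here): it renders every non-ASCII character as a Java-style Unicode escape, so the output is plain ASCII.

```java
// Hypothetical debugging helper; not part of any library.
class EscapeDump {

    // Returns the string with each non-ASCII char written as a backslash-u escape.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A correctly decoded e-grave is one char; a misdecoded one shows up as two.
        System.out.println(escape("p\u00E8re"));
        System.out.println(escape("p\u00C3\u00A8re"));
    }
}
```

If the dump of request.getParameter(...) shows a single escape for the accented letter, the parameter was decoded correctly and only the console is misconfigured. If it shows two escapes such as 00C3 followed by 00A8, the request itself was decoded with the wrong charset.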
See also:
Unicode - How to get the characters right?

Warm up
Let me start by saying the universal fact which we all know that computer doesn't understand anything but bits - 0's and 1's.
Now, when you are submitting a HTML form over HTTP and values travel over the wire to reach destination server then essentially a whole lot of bits - 0's and 1's are being passed over.
Before sending the data to the server, HTTP client (browser or curl etc.) will encode it using some encoding scheme and expects server to decode it using same scheme so that server knows exactly what client has sent.
Before sending response back to the client, server will encode it using some encoding scheme and expects client to decode it using same scheme so that client knows exactly what server has sent.
An analogy for this can be - I am sending a letter to you and telling you whether it is written in English or French or Dutch, so that you will get exact message as I intended to send you. And while replying to me you will also mention in which language I should read.
The important takeaway is that data is encoded when it leaves the client and decoded on the server side, and vice versa. If you do not specify anything, the content is encoded as application/x-www-form-urlencoded before it travels from the client to the server.
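The encode-then-decode round trip described in this warm-up can be sketched with the JDK's own URLEncoder and URLDecoder. This is only an approximation of what a browser and a server do; the class and method names are invented for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Illustrative sketch only; FormEncodingDemo is not a real library class.
class FormEncodingDemo {

    // Roughly what a browser does to a field value before a urlencoded submit.
    static String clientEncode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    // Roughly what a server does when it turns the submitted value back into text.
    static String serverDecode(String wire, String charset) {
        try {
            return URLDecoder.decode(wire, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String original = "p\u00E8re";                   // the French word with e-grave
        String wire = clientEncode(original, "UTF-8");
        System.out.println(wire);                        // p%C3%A8re
        System.out.println(serverDecode(wire, "UTF-8").equals(original));      // true
        System.out.println(serverDecode(wire, "ISO-8859-1").equals(original)); // false
    }
}
```

The value survives only when both sides agree on the charset; decoding the UTF-8 bytes as ISO-8859-1 produces the two-character garbage seen in the question.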
Core concept
Reading the warm-up is important. There are a couple of things you need to make sure of to get the expected results.
Having correct encoding set before sending data from client to server.
Having correct decoding and encoding set at server side to read request and write response back to client (this was the reason why you were not getting expected results)
Ensure that everywhere same encoding scheme is used, it should not happen that at client you are encoding using ISO-8859-1 and at server you are decoding using UTF-8, else there will be goof-up (from my analogy, I am writing you in English and you are reading in French)
Having correct encoding set for your logs viewer, if trying to verify using log using Windows command-line or Eclipse log viewer etc. (this was a contributing reason for your issue but it was not primary reason because in the first place your data read from request object was not correctly decoded. windows cmd or Eclipse log viewer encoding also matters, read here)
Having correct encoding set before sending data from client to server
To ensure this, there are several ways talked about but I will say use HTTP Accept-Charset request-header field. As per your provided code snippet you are already using and using it correctly so you are good from that front.
There are people who will say that do not use this or it is not implemented but I would very humbly disagree with them. Accept-Charset is part of HTTP 1.1 specification (I have provided link) and browser implementing HTTP 1.1 will implement the same. They may also argue that use Accept request-header field's "charset" attribute but
Really it is not present, check the Accept request-header field link I provided.
Check this
I am providing you all data and facts, not just words, but still if you are not satisfied then do following tests using different browsers.
Set accept-charset="ISO-8859-1" in your HTML form and POST/GET form having Chinese or advanced French characters to server.
At server decode the data using UTF-8 scheme.
Now repeat same test by swapping client and server encoding.
You will see that in neither case do the expected characters arrive at the server. But if you use the same encoding scheme on both sides, you will see the expected characters. So browsers do implement accept-charset, and its effect kicks in.
Having correct decoding and encoding set at server side to read request and write response back to client
There are a whole lot of ways talked about to achieve this (sometimes some configuration may be required for a specific scenario, but the options below solve 95% of cases and hold good for your case as well). For example:
Use character encoding filter for setting encoding on request and response.
Use setCharacterEncoding on request and response
Configure web or application server for correct character encoding using -Dfile.encoding=utf8 etc. Read more here
Etc.
My favorite is first one and will solve your problem as well - "Character Encoding Filter", because of below reasons:
All your encoding handling logic is in one place.
You have all the power through configuration: change it in one place and everyone is happy.
You need not worry that some other code may be reading the request stream or flushing out the response stream before you could set the character encoding.
1. Character encoding filter
You can do the following to implement your own character encoding filter. If you are using a framework such as Spring, you need not write your own class; just do the configuration in web.xml.
The core logic below is very similar to what Spring does, apart from the dependency and bean-awareness handling it adds.
web.xml (configuration)
<filter>
<filter-name>EncodingFilter</filter-name>
<filter-class>
com.sks.hagrawal.EncodingFilter
</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
<init-param>
<param-name>forceEncoding</param-name>
<param-value>true</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>EncodingFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
EncodingFilter (character encoding implementation class)
public class EncodingFilter implements Filter {
private String encoding = "UTF-8";
private boolean forceEncoding = false;
public void doFilter(ServletRequest request, ServletResponse response, FilterChain filterChain) throws IOException, ServletException {
request.setCharacterEncoding(encoding);
if(forceEncoding){ //If force encoding is set then it means that set response stream encoding as well ...
response.setCharacterEncoding(encoding);
}
filterChain.doFilter(request, response);
}
public void init(FilterConfig filterConfig) throws ServletException {
String encodingParam = filterConfig.getInitParameter("encoding");
String forceEncoding = filterConfig.getInitParameter("forceEncoding");
if (encodingParam != null) {
encoding = encodingParam;
}
if (forceEncoding != null) {
this.forceEncoding = Boolean.valueOf(forceEncoding);
}
}
@Override
public void destroy() {
// TODO Auto-generated method stub
}
}
2. ServletRequest.setCharacterEncoding()
This is essentially same code done in character encoding filter but instead of doing in filter, you are doing it in your servlet or controller class.
Idea is again to use request.setCharacterEncoding("UTF-8"); to set the encoding of http request stream before you start reading the http request stream.
Try below code, and you will see that if you are not using some sort of filter to set the encoding on request object then first log will be NULL while second log will be "UTF-8".
System.out.println("CharacterEncoding = " + request.getCharacterEncoding());
request.setCharacterEncoding("UTF-8");
System.out.println("CharacterEncoding = " + request.getCharacterEncoding());
Below is important excerpt from setCharacterEncoding Java docs. Another thing to note is you should provide a valid encoding scheme else you will get UnsupportedEncodingException
Overrides the name of the character encoding used in the body of this
request. This method must be called prior to reading request
parameters or reading input using getReader(). Otherwise, it has no
effect.
Wherever needed I have tried best to provide you official links or StackOverflow accepted bounty answers, so that you can build trust.

Based on your posted output it seems that the parameter is sent as UTF8 and later the unicode bytes of the string are interpreted as ISO-8859-1.
Following snippet demonstrates your observed behavior
String eGrave = "\u00E8"; // the letter è
System.out.printf("letter UTF8 : %s%n", eGrave);
byte[] bytes = eGrave.getBytes(StandardCharsets.UTF_8);
System.out.printf("UTF-8 hex : %X %X%n",
bytes[0], bytes[1]
);
System.out.printf("letter ISO-8859-1: %s%n",
new String(bytes, StandardCharsets.ISO_8859_1)
);
output
letter UTF8 : è
UTF-8 hex : C3 A8
letter ISO-8859-1: Ã¨
For me the form send the correct UTF8 encoded data, but later this data is not treated as UTF8.
edit Some other points to try:
output the character encoding your request has
System.out.println(request.getCharacterEncoding());
force the usage of UTF-8 to retrieve the parameter (untested, only an idea)
request.setCharacterEncoding("UTF-8");
... request.getParameter(...);

You can try to write that on .jsp:
<%@ page language="java" contentType="text/html; charset=ISO-8859-1"
pageEncoding="UTF-8"%>
problem resolved for me with that.

You can use ISO strings in the charset and pageEncoding definitions in your JSP code.
Like charset="ISO-8859-1" and pageEncoding="ISO-8859-1".

There is a bug in Tomcat that may have trapped you. The first filter defines the encoding the request is based on.
Every other filter or servlet behind the first-filter can not change the encoding of the request anymore.
I do not think this bug will be fixed in the future because the current applications may rely on the encoding.

resp.setContentType("text/html;charset=UTF-8");

Related

Spring MVC and UTF-8: How to work with Swedish special characters?

I try to find the word with special Swedish characters "bäck" in my database,
I have a jsp-page:
<%@ page pageEncoding="utf-8" contentType="text/html; charset=utf-8" %>
...
<form name="mainform" action="/web/admin/users/">
<input id="keywords" type="text" name="keywords" size="30"
value="${status.value}" tabindex="1" />
<button class="link" type="submit">Search</button>
</form>
a filter:
public class RequestResponseCharacterEncodingFilter extends OncePerRequestFilter {
private String encoding;
private boolean forceEncoding;
protected void doFilterInternal(
HttpServletRequest request, HttpServletResponse response, FilterChain filterChain)
throws ServletException, IOException {
request.setCharacterEncoding(this.encoding);
response.setCharacterEncoding(this.encoding);
filterChain.doFilter(request, response);
}
}
web.xml
<web-app ...>
...
<filter>
<filter-name>encodingFilter</filter-name>
<filter-class>test.testdomain.spring.RequestResponseCharacterEncodingFilter</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
<init-param>
<param-name>forceEncoding</param-name>
<param-value>true</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>encodingFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
...
</web-app>
When I search for the word "bäck", it appears like this: bÃ¤ck. The request is encoded into UTF-8:
but right before I exit my doFilterInternal method in my filter in debugger I see:
What I am doing wrong? Why is the text not encoded into UTF-8?
EDIT: It is very strange, I've just tried to query in Chrome and Mozilla Firefox and there it works well, so it appears to me that I have this problem only in Internet Explorer
EDIT: Internet Explorer gives me this string: b%C3%A4ck but Mozilla Firefox and Chrome give me the string: b%E4ck. They are obviously different why is that?
Your screenshots indicate that your search keyword, bäck, is sent as part of the URL, as a URL parameter. They also indicate that this word is correctly URL-encoded as UTF-8. The String you get back in your debugger is typical of ISO-Latin decoding of UTF-8 encoded bytes, i.e. the HttpServletRequest parser used ISO-Latin parsing for a UTF-8 encoded string.
So, your ServletFilter is of no help in interpreting it :
request.setCharacterEncoding(this.encoding);
response.setCharacterEncoding(this.encoding);
Because as the javadoc says : these methods work on the body of HTTP request, not on its URLs.
/**
* Overrides the name of the character encoding used in the body of this
* request. This method must be called prior to reading request parameters
* or reading input using getReader(). Otherwise, it has no effect.
*
Since URL parameter parsing is a responsibility of your servlet container, the setting you should look at is probably a container-level one.
For example, on Tomcat, as stated in the documentation at : http://tomcat.apache.org/tomcat-7.0-doc/config/http.html :
URIEncoding : This specifies the character encoding used to decode the URI bytes, after %xx decoding the URL. If not specified, ISO-8859-1 will be used.
By default, it uses ISO-8859-1. You should change that to UTF-8, and then, your request parameters will be correctly parsed from your servlet container, and passed to the HTTPServletRequest object.
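The effect of the URIEncoding setting can be imitated outside a container. The sketch below is a simplified query-string parser written for illustration (QueryStringDemo and parseQuery are invented names, and this is not Tomcat's actual parsing code); its second argument plays the role of URIEncoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of container-side query parsing; assumptions noted above.
class QueryStringDemo {

    // Splits "a=1&b=2" and percent-decodes names and values using the given charset.
    static Map<String, String> parseQuery(String rawQuery, String uriEncoding) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            try {
                params.put(URLDecoder.decode(name, uriEncoding),
                           URLDecoder.decode(value, uriEncoding));
            } catch (UnsupportedEncodingException e) {
                throw new IllegalArgumentException("Unknown charset: " + uriEncoding, e);
            }
        }
        return params;
    }

    public static void main(String[] args) {
        String expected = "b\u00E4ck"; // the Swedish word from the question
        // UTF-8 encoded query, decoded with the ISO-8859-1 default versus UTF-8.
        System.out.println(parseQuery("keywords=b%C3%A4ck", "ISO-8859-1").get("keywords").equals(expected)); // false
        System.out.println(parseQuery("keywords=b%C3%A4ck", "UTF-8").get("keywords").equals(expected));      // true
    }
}
```

With the ISO-8859-1 default, the UTF-8 encoded query b%C3%A4ck decodes to the garbled form, while the ISO-8859-1 encoded query b%E4ck happens to decode correctly, which is consistent with the differing results between browsers reported in the question.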
EDIT : As you are seeing inconsistent browser behaviour, you may look into the consistency of your HTML form. Please make sure that
Your HTTP Content-Type header AND your HTML "meta" tag defining the charset are both present and coherent in declaring a charset. (Given your servlet filter, they both should be UTF-8)
You actually respect that charset declaration in the body of your response (you actually write UTF-8 strings from your JSP - or whatever else)

Encoding and rewriting of URL

We use URL rewriting or cookies to maintain session across a web server and a browser because they communicate using HTTP protocol which is stateless in nature. Because of its stateless nature, server never recognizes any client(browser) whether it has made any request previously or not. We therefore need to maintain a unique identifier in between them.
When a browser (client) doesn't support cookies or cookies are disabled on the browser, the technique called URL rewriting is used and the sessionID needs to be encoded in the URL such as,
try
{
response.sendRedirect(response.encodeRedirectURL("index.jsp?param=value"));
}
catch(Exception e)
{
}
Regarding a normal link in JSP, I use the JSTL's <c:url> tag like,
<c:url value="Category.htm" var="url">
<c:param name="id" value="${row.category.catId}"/>
</c:url>
<a href="${url}"
name="catId${row.category.catId}"
title="Click to view the details.">${row.category.catName}
</a>
It is embedded within a <c:forEach> loop.
But if the browser supports cookies or session tracking is turned off, URL encoding is unnecessary and it doesn't take place.
So, in that case, what if a URI or a query contains special characters like +, &, #?
They need to be encoded and if a URI and a query string were encoded separately, would URL rewriting automatically done, in case cookies are disabled or not supported by a browser like?
URI uri = new URI(
"http",
null,
request.getServerName(),
request.getServerPort(), "/WebApp/index.jsp",
"param="+URLEncoder.encode("some value#+", "UTF-8"), null);
String uriString = uri.toASCIIString();
and in this case the parameter param needs to be decoded while retrieving,
out.println(URLDecoder.decode(request.getParameter("param"), "UTF-8"));
What about the URL rewriting in this case, I'm unsure whether it is done by the Servlet Container or to be handled separately on our own.
One additional thing, while using RequestDispatcher, <jsp:forward page="index.jsp"/> and <jsp:include page="template.jsp"/>, is it necessary to take care of URL encode like?
try
{
RequestDispatcher requestDispatcher=
request.getRequestDispatcher(response.encodeURL("index.jsp?param=value"));
requestDispatcher.forward(request, response);
}
catch(Exception e)
{
}
and I always use <c:url> with the form's action attribute like (regarding Spring),
<c:url value="${param.url}" var="url">
<c:param name="id" value="${param.id}"/>
</c:url>
<form:form action="${url}" id="dataForm" name="dataForm" method="post" commandName="someBean">
.
.
.
</form:form>
It refers to the current URL. Is it really required (even though I'm not supplying any parameter(s))?
About the last question, experience with Spring 3.0.5 tells that you can omit the action attribute in form:form; it will use the current URL and append the session ID parameter in case cookies are not used.
Related to this, is it possible to use a shorter syntax, something like <form:form action="<c:url value="foo"/>" ... ?
This exact syntax does not work, as tags can not be nested this way. The usage of an extra variable is a bit too long to my taste.

request.getCharacterEncoding() returns NULL... why?

A coworker of mine created a basic contact-us type form, which is mangling accented characters (è, é, à, etc). We're using KonaKart a Java e-commerce platform on Struts 1.
I've narrowed the issue down to the data coming in through the HttpServletRequest object. Comparing a similar (properly functioning) form, I noticed that on the old form the request object's Character Encoding (request.getCharacterEncoding()) is returned as "UTF-8", but on the new form it is coming back as NULL, and the text coming out of request.getParameter() is already mangled.
Aside from that, I haven't found any significant differences between the known-good form, and the new-and-broken form.
Things I've ruled out:
Both HTML pages have the tag: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
Both form tags in the HTML use POST, and do not set encodings
Checking from Firebug, both the Request and Response headers have the same properties
Both JSP pages use the same attributes in the <%@page contentType="text/html;charset=UTF-8" language="java" %> tag
There's nothing remotely interesting going on in the *Form.java files, both inherit from BaseValidatorForm
I've checked the source file encodings, they're all set to Default - inherited from Container: UTF-8
If I convert them from ISO-8859-1 to UTF-8, it works great, but I would much rather figure out the core issue.
eg: new String(request.getParameter("firstName").getBytes("ISO-8859-1"),"UTF8")
Any suggestions are welcome, I'm all out of ideas.
Modern browsers usually don't supply the character encoding in the HTTP request Content-Type header. For HTML form based applications, however, the encoding used is the same one specified in the Content-Type header of the initial HTTP response that served the page with the form. You need to explicitly set the request character encoding to that same encoding yourself, which in your case is UTF-8.
request.setCharacterEncoding("UTF-8");
Do this before any request parameter has been retrieved from the request (otherwise it's too late; the server platform default encoding would then be used to parse the parameters, which is indeed often ISO-8859-1). A servlet filter which is mapped on /* is a perfect place for this.
See also:
Unicode - How to get the characters right?
The request.getCharacterEncoding() relies on the Content-Type request attribute, not Accept-Charset
So application/x-www-form-urlencoded;charset=ISO8859_1 should work for the POST action. The <%@page tag doesn't affect the POST data.
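Why getCharacterEncoding() comes back as null can be illustrated by looking for a charset parameter in a Content-Type value. The helper below is a simplified sketch with invented names (ContentTypeCharset, charsetOf); real containers use their own, more thorough header parsing.

```java
import java.util.Locale;

// Hypothetical helper; a rough model of extracting the charset from Content-Type.
class ContentTypeCharset {

    // Returns the charset parameter of a Content-Type value, or null when there is none.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {       // parts[0] is the media type itself
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1); // strip optional quotes
                }
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // What browsers typically send for a form POST: no charset parameter.
        System.out.println(charsetOf("application/x-www-form-urlencoded"));
        // A request that does declare its encoding.
        System.out.println(charsetOf("application/x-www-form-urlencoded; charset=UTF-8"));
    }
}
```

The first call returns null, mirroring the NULL seen on the broken form; the server then has nothing to go on unless request.setCharacterEncoding() is called before the parameters are read.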

Java utf8 email not working with IE after deploy

I have a mailer class that works fine with IE when I run the application locally, but after deploying it on a server it keeps sending gobbledygook and unreadable characters. I don't see where the problem is; everything is UTF-8. Here is my code:
public static void sendHTMLEmail(String to, String subject, String body)
throws EmailException {
HtmlEmail email = new HtmlEmail();
email.setSmtpPort(25);
email.setAuthenticator(new DefaultAuthenticator("myMail","myPass"));
email.setDebug(false);
email.setHostName("smtp.gmail.com");
email.setFrom("webmail@mysite.com","Webmail@mysite");
email.setCharset("UTF-8");
email.setSubject(subject);
// --set Body--
String HTMLBody ="<html xmlns='http://www.w3.org/1999/xhtml'>";
HTMLBody += "<head><meta http-equiv='Content-Type' content='text/html; charset=utf-8' /></head>";
HTMLBody += "<body><div dir='rtl'>";
HTMLBody += body;
HTMLBody += "</div></body></html>";
// -----------
email.setHtmlMsg(HTMLBody);
email.setTextMsg("Your email client does not support HTML messages");
email.addTo(to);
email.setTLS(true);
email.send();
}
and here are my libraries :
import org.apache.commons.mail.DefaultAuthenticator;
import org.apache.commons.mail.Email;
import org.apache.commons.mail.EmailException;
import org.apache.commons.mail.HtmlEmail;
import org.apache.commons.mail.SimpleEmail;
thnx for your time
I'll assume that the String body method argument is actually the user-supplied data which has been entered in some <input> or <textarea> and submitted by a <form method="post"> in a JSP page.
The data will be submitted using the charset specified in the Content-Type header of the page containing the form. If the charset is absent from the Content-Type header, the browser will simply make a best guess, and MSIE is generally not as smart as others: it'll just grab the client platform default encoding.
You need to ensure the following three things to get it all straight:
Ensure that the HTTP response containing the <form> is sent with charset=UTF-8 in the Content-Type header. You can achieve this by adding the following line to the top of the JSP responsible for generating the response:
<%@page pageEncoding="UTF-8" %>
This not only sets the response encoding to UTF-8, but also implicitly sets the Content-Type header to text/html;charset=UTF-8.
Ensure that the servlet which processes the form submit processes the input data in the obtained HTTP request with the same character encoding. You can achieve this by adding the following line before you get any information from the request, such as getParameter().
request.setCharacterEncoding("UTF-8");
A more convenient way would be to drop that line in a Filter which is mapped on a URL pattern of interest, so that you don't need to copy-paste the line into every servlet.
Ensure that you do not use the accept-charset attribute of the <form>. MSIE has serious bugs with this.

UTF-8 text is garbled when form is posted as multipart/form-data

I'm uploading a file to the server. The file upload HTML form has 2 fields:
File name - A HTML text box where the user can give a name in any language.
File upload - A HTML 'file' input where the user can specify a file from disk to upload.
When the form is submitted, the file contents are received properly. However, when the file name (point 1 above) is read, it is garbled. ASCII characters are displayed properly. When the name is given in some other language (German, French etc.), there are problems.
In the servlet method, the request's character encoding is set to UTF-8. I even tried doing a filter as mentioned - How can I make this code to submit a UTF-8 form textarea with jQuery/Ajax work? - but it doesn't seem to work. Only the filename seems to be garbled.
The MySQL table where the file name goes supports UTF-8. I gave random non-English characters & they are stored/displayed properly.
Using Fiddler, I monitored the request & all the POST data is passed correctly. I'm trying to identify how/where the data could get garbled. Any help will be greatly appreciated.
I had the same problem using Apache commons-fileupload.
I did not find out what causes the problems especially because I have the UTF-8 encoding in the following places:
1. HTML meta tag
2. Form accept-charset attribute
3. Tomcat filter on every request that sets the "UTF-8" encoding
-> My solution was to especially convert Strings from ISO-8859-1 (or whatever is the default encoding of your platform) to UTF-8:
new String (s.getBytes ("iso-8859-1"), "UTF-8");
hope that helps
Edit: starting with Java 7 you can also use the following:
new String (s.getBytes (StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
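To see why this conversion works, and when it does harm, here is a small self-contained sketch. MojibakeRepair, misdecode and repair are names made up for the example; the repair method is the same expression as the workaround above.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch of the re-decoding workaround; not a library class.
class MojibakeRepair {

    // Simulates the fault: UTF-8 bytes wrongly decoded as ISO-8859-1.
    static String misdecode(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // The workaround: recover the original bytes, then decode them as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "b\u00E4ck";
        String garbled = misdecode(original);
        System.out.println(repair(garbled).equals(original));  // true: the round trip is lossless
        System.out.println(repair(original).equals(original)); // false: a correct value gets damaged
    }
}
```

The repair is lossless because ISO-8859-1 maps every byte value to exactly one character, so the original bytes can always be recovered. The second line shows the risk: once the real cause is fixed and the value arrives correctly decoded, leaving the workaround in place corrupts it.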
Just use Apache commons upload library.
Add URIEncoding="UTF-8" to Tomcat's connector, and use FileItem.getString("UTF-8") instead of FileItem.getString() without charset specified.
Hope this help.
I got stuck with this problem and found that it was the order of the call to
request.setCharacterEncoding("UTF-8");
that was causing the problem. It has to be called before any call to request.getParameter(), so I made a special filter to use at the top of my filter chain.
https://rogerkeays.com/servletrequest-setcharactercoding-ignored
I had the same problem and it turned out that in addition to specifying the encoding in the Filter
request.setCharacterEncoding("UTF-8");
response.setCharacterEncoding("UTF-8");
it is necessary to add "accept-charset" to the form
<form method="post" enctype="multipart/form-data" accept-charset="UTF-8" >
and run the JVM with
-Dfile.encoding=UTF-8
The HTML meta tag is not necessary if you send it in the HTTP header using response.setCharacterEncoding().
In case someone stumbled upon this problem when working on Grails (or pure Spring) web application, here is the post that helped me:
http://forum.spring.io/forum/spring-projects/web/2491-solved-character-encoding-and-multipart-forms
To set default encoding to UTF-8 (instead of the ISO-8859-1) for multipart requests, I added the following code in resources.groovy (Spring DSL):
multipartResolver(ContentLengthAwareCommonsMultipartResolver) {
defaultEncoding = 'UTF-8'
}
I'm using org.apache.commons.fileupload.servlet.ServletFileUpload.ServletFileUpload(FileItemFactory)
and defining the encoding when reading out parameter value:
List<FileItem> items = new ServletFileUpload(new DiskFileItemFactory()).parseRequest(request);
for (FileItem item : items) {
String fieldName = item.getFieldName();
if (item.isFormField()) {
String fieldValue = item.getString("UTF-8"); // <-- HERE
The filter is key for IE. A few other things to check;
What is the page encoding and character set? Both should be UTF-8
<%# page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
What is the character set in the meta tag?
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
Does your MySQL connection string specify UTF-8?
e.g.
jdbc:mysql://127.0.0.1/dbname?requireSSL=false&useUnicode=true&characterEncoding=UTF-8
I am using Primefaces with glassfish and SQL Server.
in my case i created the Webfilter, in back-end, to get every request and convert to UTF-8, like this:
package br.com.teste.filter;
import java.io.IOException;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.annotation.WebFilter;
@WebFilter(servletNames={"Faces Servlet"})
public class Filter implements javax.servlet.Filter {
@Override
public void destroy() {
// TODO Auto-generated method stub
}
@Override
public void doFilter(ServletRequest request, ServletResponse response,
FilterChain chain) throws IOException, ServletException {
request.setCharacterEncoding("UTF-8");
chain.doFilter(request, response);
}
@Override
public void init(FilterConfig filterConfig) throws ServletException {
// TODO Auto-generated method stub
}
}
In the view (.xhtml) I need to add the charset to the form's enctype parameter, like @Kevin Rahe:
<h:form id="frmt" enctype="multipart/form-data;charset=UTF-8" >
<!-- your code here -->
</h:form>
The filter and setting up Tomcat to support UTF-8 URIs are only important if you're passing the data via the URL's query string, as you would with an HTTP GET. If you're using a POST, with the query string in the HTTP message's body, what matters is the content-type of the request, and it is up to the browser to set the content-type to UTF-8 and send the content with that encoding.
The only way to really do this is by telling the browser that you can only accept UTF-8 by setting the Accept-Charset header on every response to "UTF-8;q=1,ISO-8859-1;q=0.6". This will put UTF-8 as the best quality and the default charset, ISO-8859-1, as acceptable, but a lower quality.
When you say the file name is garbled, is it garbled in the HttpServletRequest.getParameter's return value?
I had the same problem. The only solution that worked for me was adding <property name="defaultEncoding" value="UTF-8"/> to the multipartResolver in the Spring configuration file.
You also have to make sure that your encoding filter (org.springframework.web.filter.CharacterEncodingFilter) in your web.xml is mapped before the multipart filter (org.springframework.web.multipart.support.MultipartFilter).
I think I'm late to the party, but when you use WildFly, you can add a default-encoding to the standalone.xml.
Just search in the standalone.xml for
<servlet-container name="default">
and add encoding like this:
<servlet-container name="default" default-encoding="UTF-8">
To avoid converting all request parameters manually to UTF-8, you can define a method annotated with @InitBinder in your controller:
@InitBinder
protected void initBinder(WebDataBinder binder) {
binder.registerCustomEditor(String.class, new CharacterEditor(true) {
@Override
public void setAsText(String text) throws IllegalArgumentException {
String properText = new String(text.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
setValue(properText);
}
});
}
The above will automatically convert all request parameters to UTF-8 in the controller where it is defined.
You do not use UTF-8 to encode text data for HTML forms. The HTML standard defines two encodings, and the relevant part of that standard is here. The "old" encoding, which handles ASCII, is application/x-www-form-urlencoded. The new one, which works properly, is multipart/form-data.
Specifically, the form declaration looks like this:
<FORM action="http://server.com/cgi/handle"
enctype="multipart/form-data"
method="post">
<P>
What is your name? <INPUT type="text" name="submit-name"><BR>
What files are you sending? <INPUT type="file" name="files"><BR>
<INPUT type="submit" value="Send"> <INPUT type="reset">
</FORM>
And I think that's all you have to worry about - the webserver should handle it. If you are writing something that directly reads the InputStream from the web client, then you will need to read RFC 2045 and RFC 2046.
