Spring/Tomcat not honoring UTF-8 encoding? - java

My web application (Java/Tomcat/Spring/Maven) is having trouble dealing with special characters like ’ (hex 92, decimal 146). This comes into my app as another weird character.
I have looked at this question and verified that I have the following line in all my JSP files:
<%@ page contentType="text/html; charset=UTF-8" %>
I also looked at this question and verified that I have the following line in my Maven pom.xml:
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
So as far as I can tell everything should be built and handled in UTF-8. But when I submit the string Martin’s Auto Repair what shows up at the server during the Spring binding process is Martinâ\u0080\u0099s Auto Repair. This is the string that gets handed back by Tomcat to my application.
Worse, this is echoed back to the browser so submitting the altered string again expands the weird characters over and over.
Any suggestions? At this point I'm not sure if this is a browser problem or a server problem.

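Hex 92 is not a printable character in Unicode; U+0092 is a C1 control code (http://en.wikibooks.org/wiki/Unicode/Character_reference/0000-0FFF). The curly apostrophe ’ is byte 0x92 in Windows codepage 1252, but its Unicode code point is U+2019.
Windows codepage 1252 is not 100% identical to Unicode.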
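The string reported in the question is consistent with UTF-8 bytes being decoded as ISO-8859-1 somewhere between the browser and the Spring binder. The sketch below reproduces that effect with plain JDK calls. The class and method names are made up for illustration, and the choice of ISO-8859-1 as the wrong decoder is an assumption inferred from the observed output, not something the asker confirmed.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encode as UTF-8, then decode those bytes as ISO-8859-1 (the suspected mismatch).
    static String misread(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Undo exactly one round of the mismatch.
    static String repair(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+2019 is the curly apostrophe from the question.
        String original = "Martin\u2019s Auto Repair";
        String once = misread(original);
        String twice = misread(once);
        // Every resubmission makes the string longer.
        System.out.println(original.length() + " -> " + once.length() + " -> " + twice.length());
        System.out.println("repaired equals original: " + repair(once).equals(original));
    }
}
```

Applying the mismatch again on every resubmission is what makes the weird characters multiply. The repair method is only a diagnostic aid; the real fix is to make every layer decode the request as UTF-8.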

Related

JSP contentType and pageEncoding not working, I've tried it all

I'm completely puzzled. I've set my Apache and Tomcat config files, Java Servlet project and JSP pages, ALL OF IT, to "UTF-8" to support Spanish characters (á, í, ó, etc). I've systematically followed all the guidelines found in the documentation and forums. I know I could use Latin1, but UTF-8 seemed easier to use. After 4 days of trial and error, though, I've decided (since my servlet will support only Spanish characters) to switch to "ISO-8859-1", which is actually mostly working.
The only problem is that ONLY my JSP pages still say "UTF-8" when I right click --> Properties. The page directive and meta tag are correct (ISO-8859-1), but when I open the page in the browser it says "Windows-1252".
I have no idea why this is happening. If I switch to "UTF-8" (all of it, including server config, Java project, etc), the characters appear garbled in the browser, e.g.: "puntuaciÃ³n" instead of "puntuación".
So, to iron this question out...
Does anyone know how to implement UTF-8 correctly and make Spanish characters work everywhere?
or
Does anyone know how to change JSP pages to be "ISO-8859-1" everywhere? Right now, it's UTF-8 in the Properties window, ISO-8859-1 in the @page directive (contentType charset and pageEncoding), and Windows-1252 in the browser.
As always, I'm more than grateful in advance for your patience and support.
After adding the "CharsetFilter" class to my project, everything worked just fine.
Follow these guidelines: How to get UTF-8 working in Java webapps?
PS: I've completely removed all lines mentioning:
response.setCharacterEncoding("UTF-8");
response.setContentType("text/html;charset=UTF-8");
But I kept on the JSPs:
Happy coding!

Strange character encoding issue with Eclipse / Spring / Tomcat 6

I have been trying things all day but can't find a proper solution. My problem is: I am developing a Spring MVC based app on my local Tomcat. My MySQL database has UTF-8 encoding set, and all content in there displays properly when using phpMyAdmin. The output in the log files written by log4j to catalina.out also looks fine.
My JSP pages are configured by
<!-- encoding -->
<%@ page contentType="text/html; charset=UTF-8" %>
<%@ page pageEncoding="UTF-8" %>
Showing data on my JSP also works fine. I can also send data containing special chars from my Controller without any DB interference, e.g.
String str = "UTF-8 Test: Ä Ö Ü ß è é â";
logger.debug(str);
mav.addObject("utftest", str);
That displays correctly in log and on jsp page in browser.
BUT: When having special chars directly in my JSP file, e.g. for text in headers, this does not work. FF and Google Chrome display strange chars but report the page to be UTF-8. When switching to Latin, the chars just get more and more strange.
Same problem when showing text tokens from my messages.properties file, although Eclipse says when right-clicking that UTF-8 will be used.
I am a little lost and don't know where to check now.
Summary:
DB storage is fine
DB output on JSP is fine
Output on JSP directly from controller is fine
even reading in from forms is fine
.properties files and JSP text is not fine !!!
Any ideas? I really appreciate any tips.
The quest
I got exactly the same problem as yours with a very similar configuration (Tomcat, Spring, Spring Web Flow, JSF2).
Little facts about my own investigations:
WAR under Tomcat on Windows: encoding problem,
same WAR under Tomcat Linux: no problem → suspect OS default encoding as Linux is in UTF-8,
same WAR under Tomcat run by Eclipse WTP on Windows: no problem → WTF?!
passing properties files in UTF-8 with natural Latin characters instead of Unicode placeholders: solves the problem for externalized labels,
same in Facelets (JSF2 pages): always get the problem, only thing working is <f:verbatim>&eacute;</f:verbatim>.
Still getting the problem, after having checked all my code for the classic prerequisites and recommendations found on forums:
<?xml version="1.0" encoding="UTF-8" ?> at top of XML files,
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> inside HTML header of same files,
encoding="UTF-8" in <f:view>.
The configuration of Tomcat in the following ways did nothing:
URIEncoding="UTF-8" on connector in server.xml (normal because it concerns URI encoding not page encoding)
org.springframework.web.filter.CharacterEncodingFilter on and off,
also that (I presumably miss the point here):
<locale-encoding-mapping-list>
<locale-encoding-mapping>
<locale>fr</locale>
<encoding>UTF-8</encoding>
</locale-encoding-mapping>
</locale-encoding-mapping-list>
The key
I found the solution comparing the Tomcat command line between WTP and classic command-line MS-DOS Tomcat launch. The only difference is the parameter -Dfile.encoding=UTF-8. It was the key for me to solve the problem.
Set JAVA_OPTS=-Dfile.encoding="UTF-8" and it works fine.
The (attempted) explanation
The only explanation I found: Tomcat uses the JVM encoding, which by default is the system encoding (UTF-8 on Linux, CP1252 on Windows). Eclipse WTP forces the JVM encoding according to its workspace encoding settings. Running the JVM in UTF-8 solves the problem.
I suspect it's not really the right one and that there is a configuration problem either on my stack or on resources filtering made either by maven-resources-plugin or maven-war-plugin, but I haven't found it yet.
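To see which encoding a given JVM actually picked up, a small check like the one below can be run with and without -Dfile.encoding=UTF-8. This is only a sketch with a hypothetical class name. It also shows the more robust alternative of naming the charset explicitly so the result no longer depends on the platform default. Note that on JDK 18 and later the default charset is UTF-8 regardless of the operating system, so the difference may not show up there.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultCharsetDemo {
    // Depends on the JVM default charset (file.encoding on older JDKs).
    static byte[] encodeWithDefault(String s) {
        return s.getBytes();
    }

    // Always UTF-8, whatever the platform default is.
    static byte[] encodeAsUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());
        String sample = "\u00e9"; // e with acute accent
        System.out.println("bytes with default charset: " + encodeWithDefault(sample).length);
        System.out.println("bytes with explicit UTF-8:  " + encodeAsUtf8(sample).length);
    }
}
```

If the two byte counts differ, some code path that relies on the default charset is a likely suspect.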
As BalusC said, you must save the files in format utf-8.
To address your additional problem of included files, simply include the header
<%@ page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
at the top of each included file. This tells the servlet to treat the file as UTF-8 encoded, instead of using the default ISO-8859-1.
You need to configure Eclipse to save the files as UTF-8.
Go to Window > Preferences, enter filter text encoding in top, explore all sections to set everything to UTF-8. Specifically for JSP files this is in Web > JSP Files > Encoding. Choose the topmost UTF-8 option (called "ISO 10646/Unicode(UTF-8)").
Properties files are a separate story. As per the specification, they will by default be read as ISO-8859-1. You need either the native2ascii tool for this or a custom properties file loader which uses UTF-8. For more detail, see this article.
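A minimal sketch of such a UTF-8 loader, using only the JDK: Properties.load(Reader) (available since Java 6) decodes with whatever charset the Reader was built with, while Properties.load(InputStream) always assumes ISO-8859-1. The class and method names are hypothetical, and the byte array stands in for a real .properties file saved as UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class Utf8PropertiesDemo {
    // Reads the content as UTF-8 through a Reader.
    static Properties loadUtf8(byte[] content) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(content), StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Reads the content the default way, which assumes ISO-8859-1.
    static Properties loadDefault(byte[] content) {
        Properties props = new Properties();
        try (InputStream in = new ByteArrayInputStream(content)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        byte[] file = "label=caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8 reader: " + loadUtf8(file).getProperty("label"));
        System.out.println("default load: " + loadDefault(file).getProperty("label"));
    }
}
```

In a Spring application the message source usually does the loading, so the equivalent fix there is to configure its encoding instead of writing a loader by hand.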
I'm using Tomcat 7 with Spring, and using <jsp:include page="anyFile.html"/> in a JSP fails with a java.lang.IllegalStateException. <jsp:include> works fine when I include another JSP file, but when I try to include a static HTML file it keeps throwing this exception, which is related to the character encoding.
Using <jsp:directive.include file="anyFile.html" /> or <%@include file="anyFile.html"%> works, but all the special characters ("é", "è", "ç" etc.) appear encoded as ISO-8859-1 instead of UTF-8, even if the JSP file has <%@page contentType="text/html" pageEncoding="UTF-8"%> and <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> in it.
I found the solution by using the JSTL tag library with the import tag:
put this into the JSP:
<%@taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c"%>
Then get the HTML file I want to include using this:
<c:import url="anyFile.html" charEncoding="UTF-8"/>
As you can see, the import tag from the JSTL library has a charEncoding attribute that sets the appropriate character encoding for the HTML file so that its content is displayed correctly.
For JSP, see @BalusC.
For properties files see: http://download.oracle.com/javase/1.4.2/docs/api/java/util/Properties.html
When saving properties to a stream or loading them from a stream, the
ISO 8859-1 character encoding is used. For characters that cannot be
directly represented in this encoding, Unicode escapes are used;
however, only a single 'u' character is allowed in an escape sequence.
The native2ascii tool can be used to convert property files to and
from other character encodings.
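For illustration, the kind of conversion native2ascii performs can be approximated in a few lines. This is a simplified sketch, not the tool's actual implementation: it replaces every char above 7-bit ASCII with a backslash-u escape of four hex digits, and the class name is made up.

```java
class AsciiEscaper {
    // Replace every non-ASCII char with a backslash-u escape of four hex digits.
    static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A value like this can be stored in an ISO-8859-1 properties file safely.
        System.out.println("title=" + escapeNonAscii("Gr\u00fc\u00dfe"));
    }
}
```

The escaped form survives any ISO-8859-1 read, and Properties turns the escapes back into the original characters when the file is loaded.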

Encountering encoding issues on linux box, not Windows

I'm running into an encoding issue that has stumped me for a few weeks and nothing seems to work. I have a website that works fine on my local machine, but when I push the jsp files to a Linux box for review, characters that previously rendered fine are now displaying as funky characters.
For some reason, some characters display just fine, but other characters will not encode properly. All text on the page is being read from java .properties files and output to the page using beans.
I've added a meta tag to the page to set encoding, which did nothing. I also added <%@ page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%> but this did nothing on the linux box and actually made the encoding errors appear on my local windows machine.
Any help would be greatly appreciated.
Check that the method loading the properties is using the character encoding that the property files are actually written in.
Without explicitly setting this, the default encoding for the file system is used, and it is ISO-Latin-1 on Windows, and UTF-8 on some Linux distributions.
The following need to play together for character encoding to work properly in Nixes and Nuxes:
file system encoding
database encoding (does not seem to apply)
database connector encoding
Java-internal string encoding (UTF-16, if I remember correctly)
Java output encoding
HTML page encoding
With your page directive, you only addressed the last bullet. In other words, you are instructing the browser to decode the page as UTF-8, but that's not what you are sending.
Take a look at this (admittedly a few years old) paper, chapter 11 in particular.
Also, check the physical files on both machines. I've seen several FTP clients muck up files during transfer. A quick check is to push your file as html instead of jsp. You'll get garbage for all the <% %> sequences, but the other text should show up unchanged. You've also taken the app server out of the equation. If the text is still funky, it's your FTP or WebDAV client trying to "help".
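One way to check the physical files without involving a browser is to test whether their bytes form valid UTF-8. The sketch below uses the JDK's CharsetDecoder in strict mode; the class name is hypothetical, and passing the file path as the first command-line argument is just a convenience for this example. A file that passes is not guaranteed to have been intended as UTF-8, but a file that fails was almost certainly saved or transferred in another encoding.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class Utf8Check {
    // Returns true only if the bytes decode as UTF-8 without any error.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java Utf8Check <file>");
            return;
        }
        byte[] content = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(args[0] + " valid UTF-8: " + isValidUtf8(content));
    }
}
```

Running it against the same file on both machines shows quickly whether the transfer changed the bytes.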
Look at the http headers sent by the server. That is the first place the browser looks for encoding before anything else.

Unicode/Japanese characters in a Java applet

I'm writing an applet that's supposed to show both English and Japanese (unicode) characters on a JLabel. The Japanese characters show up fine when I run the applet on my system, but all I get is mojibake when I run it from the web page. The page can display Japanese characters if they're hard-coded into the HTML, but not in the applet. I'm pretty sure I've seen this sort of thing working before. Is there anything I can do in the Java code to fix this?
My first guess would be that the servlet container is not sending back the right character set for your webapp resources. Have a look at the response in an HTTP sniffer to see what character set is included - if the response says that the charset is e.g. CP-1252, then Japanese characters would not be decoded correctly.
You may be able to fix this in code by explicitly setting a Content-Type header with the right charset; but I'd argue it's more appropriate to fix the servlet container's config to return the correct character set for the relevant resources.
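When inspecting captured responses, the charset is the part of the Content-Type header value after "charset=". A deliberately simplified helper for pulling it out is sketched below. It is not a full parser for the header grammar, and the class and method names are invented for this example.

```java
import java.util.Locale;

class ContentTypeCharset {
    // Returns the charset parameter of a Content-Type value, or null if absent.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=UTF-8"));
        System.out.println(charsetOf("text/html"));
    }
}
```

A null result means the server declared no charset at all, which leaves the choice to the client's defaults.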
Well I'm not sure what was causing the problem, but I set EVERYTHING to read in and display out in UTF-8 and it seems to work now.

Dealing with special characters in a URL using Java

I've written a Java program to generate an m3u file based on a CD ripped from k3b which pretty much preserves special character encodings in artist, album and track names. I then place these m3u files on a server and generate a GWT web application where the m3u file name is the target of an HTML anchor tag. For 99+% of cases, this all works perfectly. For a few cases, special characters cause the link to fail.
One failing example is the Movits! album Äppelknyckarjazz (note the first character which gets encoded by a URI constructor as %C3%84). Since the client is GWT, view source does not show the link, :-( But when hovering over the link Firefox shows the correctly decoded URL. When clicking on the link, Firefox fails with: "...Äppelknyckarjazz.m3u was not found on this server" It is as though different character encoding schemes are at play but frankly my brain is hurting in trying to unravel the puzzle at this level.
So there are really two questions:
1) Is my problem an encoding scheme issue?
2) Assuming it is, how can I maintain consistency given the various pieces of the application (Java m3u generator, GWT client, Firefox browser, Apache web server)?
String result = java.net.URLEncoder.encode("Äppelknyckarjazz", "UTF-8");
I think this is a solution for you.
Ä can be encoded as %C3%84 (UTF8) or %C4 (Latin1). Sounds like you are using a mixture of Latin1 and UTF8. You need to make sure the same encoding is used across all your systems.
In rare case that you can't control the encoding, see my answer to this question,
How to determine if a String contains invalid encoded characters
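The two encodings of Ä mentioned above can be reproduced with java.net.URLEncoder by switching the charset name. The wrapper class below is hypothetical. Keep in mind that URLEncoder implements HTML form encoding (it turns spaces into "+"), so it is used here only to show the percent-encoded bytes, not as a recommendation for building URL paths.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class UrlEncodingDemo {
    // Percent-encodes s using the named charset.
    static String encode(String s, String charsetName) {
        try {
            return URLEncoder.encode(s, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String album = "\u00c4ppelknyckarjazz"; // starts with A-umlaut
        System.out.println("UTF-8:      " + encode(album, "UTF-8"));
        System.out.println("ISO-8859-1: " + encode(album, "ISO-8859-1"));
    }
}
```

If the link is generated with one charset and the server resolves file names with the other, the request will not match the file on disk.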
First you have to declare a charset on your HTML page. The best choice is UTF-8.
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
Then you should configure your webserver to interpret requests from clients as UTF-8. When using tomcat, set the URIEncoding-parameter on your Connector-tag:
<Connector port="8080" protocol="HTTP/1.1" URIEncoding="UTF-8" />
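What the URIEncoding setting changes can be imitated with java.net.URLDecoder: the same percent-encoded request path yields different strings depending on the charset used to decode it. This is a sketch with a hypothetical class name. It assumes the connector would otherwise decode with ISO-8859-1, which depends on the Tomcat version and configuration in use.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class UrlDecodingDemo {
    // Decodes a percent-encoded string using the named charset.
    static String decode(String s, String charsetName) {
        try {
            return URLDecoder.decode(s, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String requested = "%C3%84ppelknyckarjazz.m3u";
        String asUtf8 = decode(requested, "UTF-8");
        String asLatin1 = decode(requested, "ISO-8859-1");
        System.out.println("decoded as UTF-8:      " + asUtf8);
        System.out.println("decoded as ISO-8859-1: " + asLatin1);
        System.out.println("same name: " + asUtf8.equals(asLatin1));
    }
}
```

Only the UTF-8 decoding gives back the original album name, so a server decoding with another charset looks for a file that does not exist.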
