I'm facing a really annoying problem: I created a form with Spring's form tags, and when I insert text with non-Latin characters I get a sequence of question marks. I've used the CharacterEncodingFilter in my web.xml but I'm still facing the same problem. I've set characterEncoding to UTF-8 in the formBackingObject method of my controller, and I've set the page encoding charset and the form enctype to UTF-8, with no result. I know there are similar posts here and I've tried the suggested solutions, but nothing changed! Any suggestions? Thank you in advance.
A sequence of question marks is typical when either the DB encoding or the HTTP response encoding cannot represent the received characters in the encoding it was instructed to use.
Since you've set the page encoding to UTF-8, the HTTP response encoding part is fine (assuming that all you did was put <%@page pageEncoding="UTF-8" %> at the top of the JSP).
So the DB encoding is suspect. You need to ensure that the DB has been instructed to use the proper encoding to store the characters. You're supposed to do this in the CREATE DATABASE and CREATE TABLE statements. With some JDBC drivers you also need to pass an extra argument in the JDBC connection string to specify the encoding in which the bytes are transferred. The details depend on the DB and JDBC driver used, so it's up to you to consult the appropriate manuals. If you get stuck, update your question to include the DB make/version used.
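For example, here is a rough sketch of what that looks like with MySQL and Connector/J; the database name, table, and credentials are placeholders, and other databases and drivers use different DDL and URL parameters:

// Assuming MySQL with Connector/J; "mydb", "user" and "password" are placeholders.
// The DDL declares the storage encoding, e.g.:
//   CREATE DATABASE mydb CHARACTER SET utf8 COLLATE utf8_general_ci;
import java.sql.Connection;
import java.sql.DriverManager;

public class Db {
    public static Connection open() throws Exception {
        // The URL parameters tell the driver to transfer the bytes as UTF-8.
        return DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/mydb?useUnicode=true&characterEncoding=UTF-8",
                "user", "password");
    }
}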
See also:
Unicode - How to get the characters right? - Section about Databases
Good afternoon,
I'm trying to resolve the classic encoding error in Java, but I don't know what to do...
I tried:
adding to the JSP: <%@page contentType="text/html" pageEncoding="UTF-8"%>
using "SQL_Latin1_General_CP1_CI_AS" in the SELECT (SQL Server)
adding "CharacterSet=UTF-8" to the JDBC connection string
adding response.setContentType("application/json"); and response.setCharacterEncoding("utf-8"); to the servlet
but nothing works!!!!
DBMS: SQL Server
Server: GlassFish
Example record in the database: "Está"
what can I do?
It seems that you have the jTDS parameter sendStringParametersAsUnicode=false.
One solution is to change it to true. If not, then:
SQL_Latin1_General_CP1_CI_AS is the CP-1252 (Windows-1252) encoding, so to search in the database you need to encode your Unicode string to Windows-1252:
new String(value.getBytes("UTF-8"), "Windows-1252")
And vice versa after reading from the database:
new String(value.getBytes("Windows-1252"), "UTF-8")
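A minimal sketch wrapping the two conversions above in helper methods (the class and method names are arbitrary):

import java.io.UnsupportedEncodingException;

public class Cp1252Bridge {
    // Re-encode a Unicode string as described above before searching/storing.
    public static String toDatabase(String value) throws UnsupportedEncodingException {
        return new String(value.getBytes("UTF-8"), "Windows-1252");
    }

    // The reverse conversion after reading a value from the database.
    public static String fromDatabase(String value) throws UnsupportedEncodingException {
        return new String(value.getBytes("Windows-1252"), "UTF-8");
    }
}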
I'm trying to send info to the server. The charset encoding is set to UTF-8, and the JSP page encoding is also set to UTF-8. I use Spring MVC.
I build JSON and try to send it to the server, but when I get the body I see strange symbols between the words: attributeCategory%5B0%5D%5Batt.
I searched, and all the suggestions were to use UTF-8 encoding to resolve such a problem.
UPDATE
When I add this line on the server side, URLDecoder.decode(body, "ISO-8859-1");, everything comes out in normal form. So my question is: what do I need to change in my JSON, or elsewhere, to make my program work with UTF-8 encoding?
%5B = [ (hexadecimal code 5B)
%5D = ] (hexadecimal code 5D)
This might stem from HTML INPUT fields with the same name, so in fact attributeCategory[0][att was meant (probably miscopied here).
It could also be JavaScript.
It is URL encoding for HTTP transmission of non-basic characters like [, ], and so on. It has nothing to do with a character-encoding problem.
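For illustration, a standalone snippet showing that decoding such a fragment gives back the brackets (this is not part of the poster's code):

import java.net.URLDecoder;

public class DecodeDemo {
    public static void main(String[] args) throws Exception {
        String raw = "attributeCategory%5B0%5D%5Batt";
        // Prints: attributeCategory[0][att
        System.out.println(URLDecoder.decode(raw, "UTF-8"));
    }
}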
I hope this points to some cause of this error.
I have a search form in JSF that is implemented using a RichFaces 4 autocomplete component and the following JSF 2 page and Java bean. I use Tomcat 6 & 7 to run the application.
...
<h:commandButton value="#{msg.search}" styleClass="search-btn" action="#{autoCompletBean.doSearch}" />
...
In the AutoCompleteBean
public String doSearch() {
    // some logic here
    return "/path/to/page/with/multiple_results?query=" + searchQuery + "&faces-redirect=true";
}
This works well as long as everything within the "searchQuery" String is in Latin-1; it does not work if it is outside of Latin-1.
For instance a search for "bodø" will be automatically encoded as "bod%F8". However a search for "Kra Ðong" will not work since it is unable to encode "Ð".
I have now tried several different approaches to solve this, but none of them works.
I have tried encoding the searchQuery myself using URLEncoder, but this only leads to double encoding, since % is encoded as %25.
I have tried using java.net.URI to get the encoding, but it gives the same result as URLEncoder.
I have tried turning on UTF-8 in Tomcat using URIEncoding="UTF-8" in the Connector, but this only worsens the problem, since then non-ASCII characters do not work at all.
So to my questions:
Can I change the way JSF 2 encodes the GET parameters?
If I cannot change the way JSF 2 encodes the GET parameters, can I turn off the encoding and do it manually?
Am I doing something strange here? This seems like something that should be supported out of the box, but I cannot find any others with the same problem.
I think you've hit a corner-case bug in JSF. The query string is URL-encoded by ExternalContext#encodeRedirectURL(), which uses the response character encoding as obtained by ExternalContext#getResponseCharacterEncoding(). However, while JSF by default uses UTF-8 as the response character encoding, this is only set if the view is actually to be rendered, not when the response is to be redirected, so the response character encoding still returns the platform default of ISO-8859-1, which causes your characters to be URL-encoded using this wrong encoding.
I've reported this as issue 2440. In the meanwhile your best bet is to explicitly set the response character encoding yourself beforehand.
FacesContext.getCurrentInstance().getExternalContext().setResponseCharacterEncoding("UTF-8");
Note that this still requires that the container itself uses the same character encoding to decode the request URL, so you certainly need to set URIEncoding="UTF-8" in Tomcat's configuration. This won't mess up the characters anymore as they will be really UTF-8 now.
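A minimal sketch of how that could look inside the action method from the question (the bean and field names are taken from the question; everything else is an assumption):

import javax.faces.context.FacesContext;

public class AutoCompleteBean {
    private String searchQuery; // bound to the autocomplete input

    public String doSearch() {
        // Set the response encoding before JSF builds the redirect URL,
        // so the query string is percent-encoded as UTF-8 instead of ISO-8859-1.
        FacesContext.getCurrentInstance().getExternalContext()
                .setResponseCharacterEncoding("UTF-8");
        return "/path/to/page/with/multiple_results?query=" + searchQuery
                + "&faces-redirect=true";
    }
}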
The only character encoding accepted for HTTP URLs and headers is US-ASCII, so you need to URL-encode these characters to send them back to the application. The simplest way to do this in Java would be:
public String doSearch() throws java.io.UnsupportedEncodingException {
    // some logic here
    String encodedSearchQuery = java.net.URLEncoder.encode(searchQuery, "UTF-8");
    return "/path/to/page/with/multiple_results?query=" + encodedSearchQuery + "&faces-redirect=true";
}
And then it should work for any character that you use.
I have a Java servlet which gets RSS feeds and converts them to JSON. It works great on Windows, but it fails on CentOS.
The RSS feed contains Arabic, and it shows unintelligible characters on CentOS. I am using these lines to encode the RSS feed:
byte[] utf8Bytes = Xml.getBytes("Cp1256");
// byte[] defaultBytes = Xml.getBytes();
String roundTrip = new String(utf8Bytes, "UTF-8");
I tried it on GlassFish and Tomcat. Both have the same problem: it works on Windows but fails on CentOS. How is this caused and how can I solve it?
byte[] utf8Bytes = Xml.getBytes("Cp1256");
String roundTrip = new String(utf8Bytes, "UTF-8");
This is an attempt to correct a badly-decoded string. At some point prior to this operation you have read in Xml using the default encoding, which on your Windows box is code page 1256 (Windows Arabic). Here you are encoding that string back to code page 1256 to retrieve its original bytes, then decoding it properly as the encoding you actually wanted, UTF-8.
On your Linux server, it fails, because the default encoding is something other than Cp1256; it would also fail on any Windows server not installed in an Arabic locale.
The commented-out line that uses the default encoding instead of explicitly Cp1256 is more likely to work on a Linux server. However, the real fix is to find where Xml is being read, and fix that operation to use the correct encoding(*) instead of the default. Allowing the default encoding to be used is almost always a mistake, as it makes applications dependent on configuration that varies between servers.
(*: for this feed, that's UTF-8, which is the most common encoding, but it may differ for others. Finding out the right encoding for a feed depends on the Content-Type header returned for the resource and the <?xml encoding declaration. By far the best way to cope with this is to fetch and parse the resource using a proper XML library that knows about this, for example with DocumentBuilder.parse(uri).)
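A minimal sketch of that approach (the class and method names are placeholders):

import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class FeedFetcher {
    // The parser picks up the encoding from the HTTP headers and the
    // <?xml ... encoding="..."?> declaration, so no manual byte juggling is needed.
    public static Document fetch(String feedUrl) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(feedUrl);
    }
}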
There are many places where the wrong encoding can be used. Here is the complete list: http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q8
I have a servlet which receives some parameters from the client, then does some work.
The parameter from the client is Chinese, so I often get invalid characters in the servlet.
For example:
If I enter
http://localhost:8080/Servlet?q=中文&type=test
Then in the servlet, the parameter 'type' is correct (test); however, the parameter 'q' is not correctly encoded, it becomes invalid characters that cannot be parsed.
However, if I enter it in the address bar again, the URL changes to:
http://localhost:8080/Servlet?q=%D6%D0%CE%C4&type=test
Now my servlet will get the right parameter of 'q'.
What is the problem?
UPDATE
BTW, it works well when I send the form with POST.
When I send them via Ajax, for example:
url="http://..q='中文',
xmlhttp.open("POST",url,true);
Then the server side also gets the invalid characters.
It seems that only when the Chinese characters are encoded like %xx can the server side get the right result.
That's to say http://.../q=中文 does not work,
http://.../q=%D6%D0%CE%C4 work.
But why does "http://www.google.com.hk/search?hl=zh-CN&newwindow=1&safe=strict&q=%E4%B8%AD%E6%96%87&btnG=Google+%E6%90%9C%E7%B4%A2&aq=f&aqi=&aql=&oq=&gs_rfai=" work?
Ensure that the encoding of the page with the form itself is also UTF-8, and ensure that the browser is instructed to read the page as UTF-8. Assuming that it's JSP, just put this at the very top of the page to achieve that:
<%@ page pageEncoding="UTF-8" %>
Then, to process the GET query string as UTF-8, ensure that the servlet container in question is configured to do so. It's unclear which one you're using, so here's a Tomcat example: set the URIEncoding attribute of the <Connector> element in /conf/server.xml to UTF-8.
<Connector URIEncoding="UTF-8">
In case you'd like to use POST, you need to ensure that the HttpServletRequest is instructed to parse the POST request body using UTF-8.
request.setCharacterEncoding("UTF-8");
Call this before you access the first parameter. A Filter is the best place for this.
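A minimal sketch of such a filter (the class name is arbitrary; map it to /* in web.xml so it runs before any servlet reads parameters):

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class Utf8Filter implements Filter {
    @Override
    public void init(FilterConfig config) {
        // nothing to configure
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        // Must run before the first getParameter() call; affects POST body decoding.
        request.setCharacterEncoding("UTF-8");
        chain.doFilter(request, response);
    }

    @Override
    public void destroy() {
        // nothing to clean up
    }
}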
See also:
Unicode - How to get the characters right?
Using non-ASCII characters as GET parameters (i.e. in URLs) is generally problematic. RFC 3986 recommends using UTF-8 and then percent encoding, but that's AFAIK not an official standard. And what you are using in the case where it works isn't UTF-8!
It would probably be safest to switch to POST requests.
I believe that the problem is on the sending side. As I understand from your description, if you type the URL in the browser you get a "correctly" encoded request. This job is done by the browser: it knows how to convert Unicode characters to sequences of codes like %xx.
So, check how you send the request; it should be encoded when it is sent.
Another possibility is to use the POST method instead of GET.
Do read this article on the URL encoding format: "www.blooberry.com/indexdot/html/topics/urlencoding.htm".
If you want, you could convert characters to hex or Base64 and put them in the parameters of the URL.
I think it's better to put them in the body (POST) than in the URL (GET).
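If you try the Base64 route, here is a small sketch (the class and method names are placeholders; java.util.Base64 requires Java 8+):

import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class ParamCodec {
    // Turn an arbitrary Unicode value into URL-safe ASCII for a query parameter.
    public static String encode(String value) {
        return Base64.getUrlEncoder()
                .encodeToString(value.getBytes(StandardCharsets.UTF_8));
    }

    // Reverse the conversion on the server side.
    public static String decode(String encoded) {
        return new String(Base64.getUrlDecoder().decode(encoded), StandardCharsets.UTF_8);
    }
}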