Characters are corrupted when using request.getParameter() in Java

I have a web page where I run a search based on text entered in a text box. The text can be in any language, such as Japanese or Chinese (or contain any MBCS characters).
When I enter text in Japanese (or any other MBCS characters), the result populates the screen (form) with weird characters.
For example: testテスト turns into testãã¹ã.
When I inspect the POST parameters in Firebug (a debugging tool), I can see that the search string is sent as testテスト. However, when I add debug statements to my code, I can see that request.getParameter("searchString") does not recognize the Japanese characters and turns them into weird characters.
My JSP header already has <%@ page contentType="text/html; charset=UTF-8" %>
I have also tried adding pageEncoding="UTF-8" to it, but that didn't help.
I tried calling request.setCharacterEncoding("UTF-8") just before request.getParameter(), but that didn't work either.
After going through a few forums and blogs, I also tried setting useBodyEncodingForURI=true in the <Connector> of the Tomcat config, but that did not help either.
Can anybody suggest how to resolve this issue?

Set the following encoding in every servlet/action:
response.setContentType("text/html; charset=UTF-8");
response.setCharacterEncoding("UTF-8");
request.setCharacterEncoding("UTF-8");
Also set the following in the first servlet/action, for Japanese:
response.setCharacterEncoding("UTF-8");
request.setCharacterEncoding("UTF-8");
session.setAttribute(Globals.LOCALE_KEY, new Locale("ja", "JP"));

Related

How do I get a UTF-8 value from the query string using JSP

I am using JSP pages and I am trying to get a UTF-8 value from the query string using the statement request.getParameter("q");
It works fine and gives me the appropriate results, but in IE9 it gives me ????? instead of the Unicode value.
How do I get the proper Unicode value from the query string in JSP so that it is correct in all browsers, including IE, and what statements do I need to add to the JSP page?
Please help me in this regard.
Thank you.

In the JSP page directive you need to set the content type to UTF-8 (for each JSP page):
<%@ page contentType="text/html; charset=UTF-8" pageEncoding="ISO-8859-1" %>
If the problem still persists, see the related SO question on handling encoding for the DB and Tomcat.
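To illustrate why the decoding charset matters for query strings, here is a minimal sketch using only the JDK. The class and method names are made up for this example. In a real container the server decodes the query string itself (in Tomcat this is governed by the Connector's URIEncoding attribute), so treat this as a demonstration of the effect and not as something to call from a JSP.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class, not part of any servlet API.
final class QueryDecodeDemo {

    // Decodes one percent-encoded query value with the given charset name.
    static String decode(String rawValue, String charsetName) {
        try {
            return URLDecoder.decode(rawValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // Percent-encoded UTF-8 bytes of the three katakana characters "テスト".
        String raw = "%E3%83%86%E3%82%B9%E3%83%88";
        String expected = "\u30c6\u30b9\u30c8";

        System.out.println(decode(raw, "UTF-8").equals(expected));      // true
        System.out.println(decode(raw, "ISO-8859-1").equals(expected)); // false
    }
}
```

The same bytes give the right text only when they are decoded with the charset the browser used to encode them.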

request.getCharacterEncoding() returns NULL... why?

A coworker of mine created a basic contact-us type form, which is mangling accented characters (è, é, à, etc.). We're using KonaKart, a Java e-commerce platform, on Struts 1.
I've narrowed the issue down to the data coming in through the HttpServletRequest object. Comparing a similar (properly functioning) form, I noticed that on the old form the request object's Character Encoding (request.getCharacterEncoding()) is returned as "UTF-8", but on the new form it is coming back as NULL, and the text coming out of request.getParameter() is already mangled.
Aside from that, I haven't found any significant differences between the known-good form, and the new-and-broken form.
Things I've ruled out:
Both HTML pages have the tag: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
Both form tags in the HTML use POST, and do not set encodings
Checking from Firebug, both the Request and Response headers have the same properties
Both JSP pages use the same attributes in the <%@page contentType="text/html;charset=UTF-8" language="java" %> tag
There's nothing remotely interesting going on in the *Form.java files, both inherit from BaseValidatorForm
I've checked the source file encodings, they're all set to Default - inherited from Container: UTF-8
If I convert them from ISO-8859-1 to UTF-8, it works great, but I would much rather figure out the core issue.
eg: new String(request.getParameter("firstName").getBytes("ISO-8859-1"),"UTF8")
Any suggestions are welcome, I'm all out of ideas.

Modern browsers usually don't supply the character encoding in the HTTP request Content-Type header. For HTML form based applications, however, the browser encodes the submitted data with the same character encoding as specified in the Content-Type header of the initial HTTP response that served the page with the form. You need to explicitly set the request character encoding to that same encoding yourself, which in your case is UTF-8.
request.setCharacterEncoding("UTF-8");
Do this before any request parameter is retrieved from the request (otherwise it's too late; the server platform default encoding would then be used to parse the parameters, which is indeed often ISO-8859-1). A servlet filter mapped on /* is a perfect place for this.
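To make the "too late" case concrete, here is a small JDK-only sketch of what happens when UTF-8 bytes are parsed as ISO-8859-1, together with the byte round trip used as a workaround in the question. The class and method names are invented for the illustration; the sketch only simulates the effect and does not involve the servlet API.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only simulates the wrong decoding step.
final class MojibakeDemo {

    // Decodes UTF-8 bytes as ISO-8859-1, which is what effectively happens
    // when a UTF-8 form body is parsed with an ISO-8859-1 default.
    static String misdecode(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Reverses the damage: recover the raw bytes and decode them as UTF-8.
    // This is a workaround, not a fix for the missing request encoding.
    static String repair(String mangled) {
        byte[] rawBytes = mangled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String typed = "test\u30c6\u30b9\u30c8"; // "test" followed by three katakana
        String mangled = misdecode(typed);

        System.out.println(mangled.length());              // 13: each kana became 3 chars
        System.out.println(repair(mangled).equals(typed)); // true
    }
}
```

Several of the 13 characters are invisible control characters, which is why the mangled value looks shorter on screen than it really is. The round trip works because ISO-8859-1 maps every byte to exactly one char, but setting the request encoding up front avoids the need for it.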
See also:
Unicode - How to get the characters right?

request.getCharacterEncoding() relies on the Content-Type request header, not Accept-Charset.
So application/x-www-form-urlencoded;charset=ISO8859_1 should work for the POST action. The <%@page tag doesn't affect the POST data.
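As a rough illustration of why getCharacterEncoding() comes back as null, here is a simplified sketch of pulling the charset parameter out of a Content-Type header value. It is an approximation written for this example (the class and method are hypothetical), not the code any particular container uses.

```java
import java.util.Locale;

// Hypothetical helper; a simplified stand-in for what a container does.
final class ContentTypeCharset {

    // Returns the charset parameter of a Content-Type header value,
    // or null when the header is absent or carries no charset parameter.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // What browsers typically send for a form POST: no charset parameter.
        System.out.println(charsetOf("application/x-www-form-urlencoded")); // null
        System.out.println(charsetOf("application/x-www-form-urlencoded; charset=UTF-8")); // UTF-8
    }
}
```

With no charset parameter in the request header there is nothing to return, so the encoding has to be set explicitly on the server side.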

jQuery html() encoding error in Chrome

I'm developing a Java web application. I use some jQuery and a REST web service that outputs a JSON object with a list of JavaScript objects over AJAX. All is OK, but when I try to fill a table created with JavaScript using jQuery.html() into a valid div, all hell breaks loose in Chrome, including this
Error: INVALID_STATE_ERR: DOM Exception 11
in the Javascript console.
The problem is like this, try this in chrome Javascript console:
$('#ValidadorWrapper').html("<div>AMOXICILIN 100 C&psulapsula</div>");
But if we delete the ampersand, it works
$('#ValidadorWrapper').html("<div>AMOXICILIN 100 Camp;psulapsula</div>");
This problem happens only in Chrome. I suspect it has something to do with the character encoding, but I can't find any way of fixing it. Obviously I need to be able to put an & (I mean the &amp; entity) in this document.
Some steps I have tried and didn't work:
I'm using the gson library to output a String to a JSP page. My JSP page has this header: <%@page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8" %>. This was my first attempt, and it didn't work when an ampersand (well, any special char) appeared in my JSON object.
My second attempt was using the HTMLEntities Java library to encode all special chars. This is the current version and it still doesn't work.
Using Unicode escapes like \u0026 doesn't work either.
There is something stranger still. Apparently, if I use $('#ValidadorWrapper').html("AMOXICILIN 100 \u0026"); it works! But this is just an example. I'm trying to fill an HTML table with my object, so I really need to put that data inside HTML (table) tags.

Try this:
$('<div></div>').appendTo('#ValidadorWrapper').text('AMOXICILIN 100 Cápsulapsula');

JSP pages displaying junk characters in non-English languages

I have a Main JSP page say jsp1 which includes two JSP pages (jsp2, jsp3). All the strings in these pages come from property files.
The non-english property files are converted using native2ascii
native2ascii –encoding="8859-1" lang.properties lang1.properties
All the JSP pages have
<%@ page contentType="text/html;charset=UTF-8" language="java" %>
Now when the main JSP page (jsp1) is displayed, we see garbled characters in a few strings of jsp2 and jsp3. So far I have seen this happen with Russian, Korean, and Japanese strings, and it happens on a random string.
Does anyone have an idea what could be wrong?
Updating with more details:
The string in rus_utf8.properties is
Щелкните <strong>УСТАНОВИТЬ СЕЙЧАС</strong> и сохраните файл в некотором расположении
After conversion using native2ascii, the string in rus.properties is
\u0429\u0435\u043b\u043a\u043d\u0438\u0442\u0435 <strong>\u0423\u0421\u0422\u0410\u041d\u041e\u0412\u0418\u0422\u042c \u0421\u0415\u0419\u0427\u0410\u0421</strong> \u0438 \u0441\u043e\u0445\u0440\u0430\u043d\u0438\u0442\u0435 \u0444\u0430\u0439\u043b \u0432 \u043d\u0435\u043a\u043e\u0442\u043e\u0440\u043e\u043c \u0440\u0430\u0441\u043f\u043e\u043b\u043e\u0436\u0435\u043d\u0438\u0438.
In the JSP we use the Struts <s:text> tag to load the string from the property file.
In Firefox the string is displayed as
��елкните УСТАНОВИТЬ СЕЙЧАС и сохраните файл в некотором расположении.
The char Щ got garbled. The same string in another place on the page is displayed properly.

The non-english property files are converted using native2ascii
native2ascii –encoding="8859-1" lang.properties lang1.properties
This is invalid. It should have been
native2ascii -encoding ISO-8859-1 lang.properties lang1.properties
Apart from the syntax error which you have there (which should immediately have aborted native2ascii), the ISO-8859-1 encoding cannot possibly be correct for Russian, Korean and Japanese strings. ISO-8859-1 does not cover those characters at all. Assuming that you saved the file as UTF-8, you should be using
native2ascii -encoding UTF-8 lang.properties lang1.properties
This way native2ascii converts from a UTF-8 lang.properties to an ISO-8859-1 compatible lang1.properties. native2ascii always converts to ASCII. The -encoding option concerns the encoding of the source file, not the target file.
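For anyone wondering what that conversion amounts to, here is a rough JDK-only sketch of the same idea: every character outside 7-bit ASCII is replaced by a backslash-u escape with four hex digits, and the properties loader turns the escapes back into characters. The class and its methods are hypothetical and only approximate the tool; this is not native2ascii's actual implementation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch that approximates the escaping native2ascii performs.
final class AsciiEscaper {

    // Replaces every char outside 7-bit ASCII with a backslash-u escape
    // followed by four lowercase hex digits; ASCII chars pass through.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    // Reads the escaped text back the way a .properties loader would.
    // The key name "key" is arbitrary.
    static String readBack(String escaped) {
        Properties props = new Properties();
        try {
            props.load(new StringReader("key=" + escaped));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("key");
    }

    public static void main(String[] args) {
        String russian = "\u0429\u0435\u043b\u043a\u043d\u0438\u0442\u0435"; // first word of the example
        String escaped = escape(russian);

        System.out.println(escaped.chars().allMatch(ch -> ch < 0x80)); // true: pure ASCII
        System.out.println(readBack(escaped).equals(russian));         // true
    }
}
```

The escaped output is plain ASCII, which is why the result is safe to read as ISO-8859-1 regardless of the language of the original text.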
As to the JSP pages, just a
<%@page contentType="text/html; charset=UTF-8" %>
ought to be sufficient, per http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q8.
See also:
Unicode - How to get the characters right?
Update, as per your update with the examples: everything is actually working right. It just looks very much like the UTF-8 BOM (Byte Order Mark) is the culprit. Notepad adds it by default. Try creating the properties file in another editor instead, such as Eclipse.
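If you want to verify the BOM suspicion without switching editors, a check along these lines could help. It is a minimal sketch with hypothetical names that works on the raw bytes of the file: the UTF-8 BOM is the three bytes EF BB BF at the very start.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for spotting and skipping a UTF-8 byte order mark.
final class BomCheck {

    // True when the content starts with the UTF-8 BOM bytes EF BB BF.
    static boolean startsWithUtf8Bom(byte[] content) {
        return content.length >= 3
                && (content[0] & 0xFF) == 0xEF
                && (content[1] & 0xFF) == 0xBB
                && (content[2] & 0xFF) == 0xBF;
    }

    // Decodes the content as UTF-8, skipping the BOM when present.
    static String decodeWithoutBom(byte[] content) {
        int offset = startsWithUtf8Bom(content) ? 3 : 0;
        return new String(content, offset, content.length - offset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'k', '=', 'v'};
        System.out.println(startsWithUtf8Bom(withBom)); // true
        System.out.println(decodeWithoutBom(withBom));  // k=v
    }
}
```

To run it against a real file, the bytes could be read with java.nio.file.Files.readAllBytes before calling the check.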

I had the same problem as you, and I also tried similar solutions, but they didn't work. Hence I suspected that it might not be an issue with the JSP config but rather a config issue with my Tomcat.
I found this on a Chinese site, https://openhome.cc/Gossip/Encoding/Servlet.html: request.setCharacterEncoding("UTF-8");. It worked for me. I added this before my request.getParameter() call.

Handling Spanish characters in Java/JSP

I have a small webapp which handles a lot of Spanish text.
At one point in the code, a JSP page responds with a JSON string containing some of this text. If I print the string to the console, it looks like gibberish. But if I examine the header/content of the response in Chrome Developer Tools, it looks correct. It is transferred in the correct encoding. This part of the webapp functions as expected.
At another point in the code, a different JSP page responds with HTML. Some of this HTML contains more of the Spanish text. This time, the text is transferred (and displayed) as gibberish.
What are potential reasons that this could be happening? Both times, I'm just printing the text using out.print. Why does it work at one point, but not at the other?
Examples:
// In a file whose only output is the JSON string
String jsonString = ...
System.err.println(jsonString); // prints gibberish
out.println(jsonString); // looks correct when the response is viewed in Chrome Developer Tools, and looks correct in a browser
...
// In a file whose output is a complete HTML page
String spanishText = ...
out.println("<label>" + spanishText + "</label>"); // looks like gibberish when the response is viewed in Chrome Developer Tools, and shows up as gibberish in a browser

You need to set the encoding which the JSP/Servlet response should use to print the characters, and instruct the web browser to use the same encoding.
This can be done by putting this at the top of your JSP:
<%@ page pageEncoding="UTF-8" %>
Or, if you're actually doing this in a Servlet:
response.setCharacterEncoding("UTF-8");
The "gibberish" when using System.err is a different problem. You need to set the encoding of the console/logfile to which this information is printed. If it's Eclipse, for example, you can set it via Window > Preferences > General > Workspace > Text File Encoding.
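On the console side, one option besides changing the IDE setting is to wrap the stream in a PrintStream with an explicit encoding. The sketch below is only an illustration with hypothetical names, and it assumes that whatever displays the output (console, log viewer) actually interprets the bytes as UTF-8; if it does not, the text will still look wrong.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper that forces UTF-8 on an output stream.
final class Utf8Console {

    // Wraps the target stream in an auto-flushing PrintStream that
    // encodes characters as UTF-8 instead of the platform default.
    static PrintStream utf8(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 is not available", e);
        }
    }

    public static void main(String[] args) {
        PrintStream err = utf8(System.err);
        err.println("Espa\u00f1ol"); // the word "Español", written as UTF-8 bytes
    }
}
```

This only controls the bytes that Java writes; the response encoding for the browser is still a separate setting, as described above.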
See also:
Unicode - How to get the characters right? - Fixing JSP/Servlet response
Unicode - How to get the characters right? - Fixing development environment
