Say I am displaying escaped value in HTML with below code under text area:
<c:out value="${person.name}" />
My question do I need to decode this value at server side manually or browser will do it automatically ?
No, you need not to decode this value manually .. All you need is:
Specify your HTTP response content type encoding as UTF-8. To be precise use HttpServletResponse.setContentType ("text/html;charset=utf-8");.
Your JSP should have content type encoding set as UTF-8 in your JSP .. To be precise add this meta tag in your JSP and you should be good to go <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
When you have this tag in your JSP then browser will understand that content of this page should be render as per UTF-8 encoding rules.
If don't specify page encoding explicitly using these kind of meta tags or some other mechanism then browser use default encoding associated with it while page rendering and you may not see expected result especially for characters from Unicode's advanced blocks of BMP and Supplementary Multilingual Plane. Check this on how to see the default encoding of browser.
Concept
Server should specify desired encoding scheme in "response stream" and same encoding scheme should be used in JSP/ASP/HTML page.
Server side encoding options
PHP
header('Content-type: text/html; charset=utf-8');
Perl
print "Content-Type: text/html; charset=utf-8\n\n";
Python
Use the same solution as for Perl (except that you don't need a semicolon at the end).
Java Servlets
resource.setContentType ("text/html;charset=utf-8");
JSP
<%# page contentType="text/html; charset=UTF-8" %>
ASP and ASP.Net
<%Response.charset="utf-8"%>
Client side encoding options
Use following meta tag in your HTML page <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
Further reading:
HTTP-charset
This answer
when I get the request.parameter for the escaped input (done thru) <c:out value="${person.name}" />, I get the escaped value and store it in db as it is. For example :- <script>test</script> is stored as <script>test</script> Now when value is fetched from DB and displayed on browser, it renders it correctly i.e <script>test</script> is displayed as <script>test</script>
Related
I am trying to setup the right encoding for my JSP/servlet pages in Tomcat 7. Though, I have to be successful yet. I made some tries from the suggestions given by this stackexchange thread: Character encoding JSP -displayed wrong in JSP but not in URL: "á » á é » é", but they didn't work.
The curious fact lies on the fact that if I let the pages "as is" the browser recognise them as having the encoding Windows-CP 1252 and when I change for UTF-8 the text is displayed correctly. But applying filters and other mechanisms the browser put the encoding as UTF-8 and is not possibile to display it correctly. In fact for the latter if I change the encoding the results are horrible at minimum.
I got it right now. In pages JSP I am putting as first instruction:
<%# page pageEncoding="utf-8" %>
This fixes all problems. Other possibilities like to put response.setCharacterEncoding( "UTF-8" ) as first instruction don't work.
In relation to servlets I need to setup the character encoding before to get the PrintWriter object:
response.setContentType("text/html");
response.setCharacterEncoding("UTF-8");
PrintWriter out = response.getWriter();
These things have solved my problem of strange characters. To sum up: The problem was that the response coming out from JSP/servlet didn't have pointed that itself was encoded in UTF-8
Maybe is not a JSP problem. Have you tried doing that in the page, directly?
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
...
</head>
Also, try to save the page in UTF-8 format
A coworker of mine created a basic contact-us type form, which is mangling accented characters (è, é, à, etc). We're using KonaKart a Java e-commerce platform on Struts 1.
I've narrowed the issue down to the data coming in through the HttpServletRequest object. Comparing a similar (properly functioning) form, I noticed that on the old form the request object's Character Encoding (request.getCharacterEncoding()) is returned as "UTF-8", but on the new form it is coming back as NULL, and the text coming out of request.getParameter() is already mangled.
Aside from that, I haven't found any significant differences between the known-good form, and the new-and-broken form.
Things I've ruled out:
Both HTML pages have the tag: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
Both form tags in the HTML use POST, and do not set encodings
Checking from Firebug, both the Request and Response headers have the same properties
Both JSP pages use the same attributes in the <%#page contentType="text/html;charset=UTF-8" language="java" %> tag
There's nothing remotely interesting going on in the *Form.java files, both inherit from BaseValidatorForm
I've checked the source file encodings, they're all set to Default - inherited from Container: UTF-8
If I convert them from ISO-8859-1 to UTF-8, it works great, but I would much rather figure out the core issue.
eg: new String(request.getParameter("firstName").getBytes("ISO-8859-1"),"UTF8")
Any suggestions are welcome, I'm all out of ideas.
Modern browsers usually don't supply the character encoding in the HTTP request Content-Type header. It's in case of HTML form based applications however the same character encoding as specified in the Content-Type header of the initial HTTP response serving the page with the form. You need to explicitly set the request character encoding to the same encoding yourself, which is in your case thus UTF-8.
request.setCharacterEncoding("UTF-8");
Do this before any request parameter is been retrieved from the request (otherwise it's too late; the server platform default encoding would then be used to parse the parameters, which is indeed often ISO-8859-1). A servlet filter which is mapped on /* is a perfect place for this.
See also:
Unicode - How to get the characters right?
The request.getCharacterEncoding() relies on the Content-Type request attribute, not Accept-Charset
So application/x-www-form-urlencoded;charset=IS08859_1 should work for the POST action. The <%#page tag doesn't affect the POST data.
I have a form in which the user enters his name in Chinese, but when I do
String strName = request.getParameter("name");
I get strName as some meaningless characters. As a solution I tried
request.setCharacterEncoding("UTF-8");
before reading any parameters from the request object. This worked. What I want to know is how do I achieve this in HTML/javascript . I have tried the
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
but this doesn't work . Any help?
It should work if you define the charset in the Content-Type request header. I believe it is usually not defined by default. For example if you use jQuery to make the request you have to add "charset=utf-8" to the contentType option: http://api.jquery.com/jQuery.ajax/
There's no way to force a web browser to send Content-Type for every request, so it's better call setCharacterEncoding always.
What you do client-side affects how the data is sent. Apparently the data is sent as UTF-8 encoded, since the problem was in reading it server-side. So adding the meta (though OK) has no effect on this.
In pure HTML you should need <form enctype="multipart/form-data" accept-charset="UTF-8"> set to submit utf-8 in the browser.
I java code, I am having a string name = "örebro"; // its a swedish character.
But when I use this name in web application. I print some special character at 'Ö' character.
Is there anyway I can use the same character as it is in "örebro".
I did some thing like this but does not worked.
String name = "örebro";
byte[] utf8s = name .getBytes("UTF-8");
name = new String(utf8s, "UTF-8");
But the name at the end prints the same, something like this. �rebo
Please guide me
The Java code you've provided is pointless, it will do nothing. Java Strings are already perfectly capable of encoding any character (though you have to be careful with literals in the source code, as they depend on the encoding the compiler uses, which is platform-dependant).
Most likely your problem is that your webpage does not declare the encoding correctly in the HTTP header or the HTML meta tags.
You need to set the encoding of your output to UTF8.
It is likely the browser that reads the page does not know the encoding.
send the header (before any other output) something in Java like ServletResponse resource; (...)resource.setContentType ("text/html;charset=utf-8");
in your html page, mention the encoding by sending (printing)<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
If the page used to generate the output is jsp it's useful to precise
<%# page contentType="text/html; charset=utf-8" %>
I have some HTML code that I store in a Java.lang.String variable. I write that variable to a file and set the encoding to UTF-8 when writing the contents of the string variable to the file on the filesystem. I open up that file and everything looks great e.g. → shows up as a right arrow.
However, if the same String (containing the same content) is used by a jsp page to render content in a browser, characters such as → show up as a question mark (?)
When storing content in the String variable, I make sure that I use:
String myStr = new String(bytes[], charset)
instead of just:
String myStr = "<html><head/><body>→</body></html>";
Can someone please tell me why the String content gets written to the filesystem perfectly but does not render in the jsp/browser?
Thanks.
but does not render in the jsp/browser?
You need to set the response encoding as well. In a JSP you can do this using
<%# page pageEncoding="UTF-8" %>
This has actually the same effect as setting the following meta tag in HTML <head>:
<meta http-equiv="content-type" content="text/html; charset=utf-8">
Possibilities:
The browser does not support UTF-8
You don't have Content-Type: text/html; charset=utf-8 in your HTTP Headers.
The lazy developer (=me) uses Apache Common Lang StringEscapeUtils.escapeHtml http://commons.apache.org/lang/api-release/org/apache/commons/lang/StringEscapeUtils.html#escapeHtml(java.lang.String) which will help you handle all 'odd' characters. Let the browser do the final translation of the html entities.