Where to add the UTF-8 extension in the HTML page? - java

I need to add charset="utf-8" to the script tags to get the translation to another language working.
I don't know which tags need it. Are there any rules to follow? Please let me know where to add the charset. Do I need to add it to "ApplicationLoader.js" as well, or only to the jQuery plugins? Any suggestions, please.
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>My Web App</title>
<link href="css/jquery/jquery.ui.all.css" rel="stylesheet" type="text/css" />
<script type="text/javascript" src="js/jquery-1.4.2.min.js" charset="utf-8"></script>
<script type="text/javascript" src="js/jquery-ui-1.8.custom.min.js" charset="utf-8"></script>
<script type="text/javascript" src="js/jquery.depends.js" charset="utf-8"></script>
<script type="text/javascript" src="myemployeelist.js" ></script>
Update:
To make my situation easier to understand, here are the details. I am a newbie still in training, so I will explain my problem as far as I understand it.
I created a webapplication project in Eclipse, in which I created a class
for JDBC connection to MySQL.
I have a class in serverside which gets the users profile value from my webapp
textbox and saves in DB.
I am using jQuery plugin for doing translation from English to Arabic.
I have an HTML page in which as stated in the above part of question I have the
tags in which I added the charset="utf-8" to specify unicode.
I am using dwr to get the values in js and send it to server side.
I changed my computer input language to Arabic and my Mozilla firefox locale to
Arabic.
I am able to enter English values in MySQL and retrieve them, but when I
enter an Arabic value it is not saved. The JDBC error is
java.sql.SQLException: Incorrect string value: '\xD8\xB3\xD9\x84\xD8\xA8...'
I don't know how to configure the LANG variable of my server, Jetty 6, to UTF-8. Any suggestions, please. Thanks.

If a Content-Type header with a charset is present in the HTTP response, it overrides the meta tag. Very often the webserver supplies this header by default, and the charset in it is not necessarily the one you declared in the page (if no charset can be determined at all, the client falls back to its default, which is often ISO-8859-1). In other words, the meta tag is only reliably honored when the page is opened locally (not over HTTP). Chances are that this is why your meta tag apparently didn't work when the page was served over HTTP.
You can use Firebug or Fiddler2 to inspect the HTTP response headers.
You can configure the general setting for HTTP response headers at webserver level. You can also configure it on a request basis at programming language level. Since it's unclear what webserver / programming language you're using, I can't go in detail about how to configure it accordingly.
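Once you have the header value from Firebug or Fiddler2, checking whether it carries a charset is simple string work. Below is a minimal sketch in plain Java; the class and method names (ContentTypeCharset, charsetParam) are made up for illustration and are not part of any servlet or HTTP library.

```java
import java.util.Locale;

class ContentTypeCharset {
    // Returns the charset parameter of a Content-Type value,
    // or null when the value is null or has no charset parameter.
    static String charsetParam(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                // Strip optional quotes around the value.
                return p.substring("charset=".length()).trim().replace("\"", "");
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(charsetParam("text/html; charset=utf-8"));
        System.out.println(charsetParam("text/html"));
    }
}
```

If the second case is what your server sends, the response header does not declare an encoding and that is the part to configure.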
Update: as per the problem symptoms, which is the following typical MySQL exception:
java.sql.SQLException: Incorrect string value: '\xD8\xB3\xD9\x84\xD8\xA8...'
The byte sequence D8 B3 D9 84 D8 A8 is a valid UTF-8 sequence which represents the characters سلب (U+0633, U+0644 and U+0628). So the HTTP part is fine. You mentioned that you are using Jetty 6 as servlet container. The later builds of Jetty 6 already support UTF-8 out of the box.
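If you want to verify this yourself, you can decode the bytes from the exception message in plain Java. This is only a small sketch; the class name Utf8BytesDemo is made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class Utf8BytesDemo {
    // Decodes the byte sequence quoted in the exception message as UTF-8.
    static String decodeSample() {
        byte[] bytes = {
            (byte) 0xD8, (byte) 0xB3, // U+0633
            (byte) 0xD9, (byte) 0x84, // U+0644
            (byte) 0xD8, (byte) 0xA8  // U+0628
        };
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = decodeSample();
        // Print each decoded character as a code point.
        for (int i = 0; i < text.length(); i++) {
            System.out.printf("U+%04X%n", (int) text.charAt(i));
        }
    }
}
```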
However, the problem is in the DB part. This exception indicates that the charset the DB/table has been instructed to use doesn't support this byte sequence. This can happen when the DB/table hasn't been instructed to use UTF-8.
To fix the DB part, issue these MySQL commands:
ALTER DATABASE db_name DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
And for future DBs/tables, use CHARACTER SET utf8 COLLATE utf8_general_ci in the CREATE statement as well.

It is usually enough to declare the main document's encoding once, either through a content-type header (see @BalusC's answer for an in-depth explanation) or, if that is not available, the meta tag.
If you use the Meta tag, make sure it is in the first line of the head section.
There should be no need to specify the character set explicitly for the script files.
Of course, all the content you deal with needs to be UTF-8 encoded as well for this to work. It's not enough to just slap the content-type meta tag in front. (But you are probably aware of that.)

In this way:
<script type="text/javascript" src="[path]/myscript.js" charset="utf-8"></script>

Related

Encoding of URL query params in HTML mail

My application sends an HTML email with a hyperlink. The user should be able to click on the link in a mail client and have the browser open the page. Easy... I thought.
But now the link contains special characters in a query param. E.g. the value for the second query param is id=1234/pid=1000 and the resulting java.net.URLEncoder.encode()d URL becomes
http://example.com/path/?key1=value1&key2=id%3D1234%2Fpid%3D1000
And this is what gets embedded into an HTML mail:
<html ...
<head>
<meta http-equiv=Content-Type content="text/html; charset=unicode">
...
<body lang=EN-US ...
link text
...
(at least this can be seen in the raw mail content by the receiving client).
Problem is that when clicking on the link, the browser opens up but the query params are double encoded:
http://example.com/path/?key1=value1&key2=id%253D1234%252Fpid%253D1000
and the link does not work anymore. Funny thing, if the link is copy/pasted into the browser, the additional encoding is omitted (as expected).
When hovering over the link with the mouse, the URL is displayed decoded:
http://example.com/path/?key1=value1&key2=id=1234/pid=1000 (tested with Outlook and Thunderbird, both show the same behaviour).
So my question is, what is wrong here?
Is it my first encoding and should I skip it?
Is it my first encoding and am I doing it wrong?
Are the mail clients to blame?
I can find tons of information about how URL encoding works and that it should be done but not which encoding in which use cases.
Thanks.
Finally solved it. It has nothing to do with the encoding in the mail or the mail client but the server does an instant redirect and changes the query parameter -> duplicate of mod_rewrite urlencoding an already urlencoded query string parameter - any way to disable this?.
Should have really tried example.com to figure this out but I couldn't even imagine it's a problem with the web server.
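For reference, the double-encoded URL above is exactly what you get when an already encoded value is encoded a second time, which is what the redirect effectively did. A minimal sketch in plain Java (the class name DoubleEncodingDemo and the helper encode are made up for illustration):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class DoubleEncodingDemo {
    // URL-encodes a query parameter value as UTF-8.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String once = encode("id=1234/pid=1000");
        String twice = encode(once); // the '%' signs themselves become %25
        System.out.println(once);    // id%3D1234%2Fpid%3D1000
        System.out.println(twice);   // id%253D1234%252Fpid%253D1000
    }
}
```

So the first encoding is right; the value just has to be encoded exactly once on its way to the application.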

Is Explicit decoding required here?

Say I am displaying escaped value in HTML with below code under text area:
<c:out value="${person.name}" />
My question: do I need to decode this value manually on the server side, or will the browser do it automatically?
No, you do not need to decode this value manually. All you need is:
Specify the HTTP response content type encoding as UTF-8. To be precise, use HttpServletResponse.setContentType("text/html;charset=utf-8");.
Your JSP should declare UTF-8 as its content type encoding. To be precise, add this meta tag to your JSP and you should be good to go: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
With this tag in your JSP, the browser will understand that the content of this page should be rendered according to UTF-8 encoding rules.
If you don't specify the page encoding explicitly using this kind of meta tag or some other mechanism, the browser uses its default encoding while rendering the page, and you may not see the expected result, especially for characters from Unicode's advanced blocks of the BMP and the Supplementary Multilingual Plane. Check this on how to see the default encoding of the browser.
Concept
Server should specify desired encoding scheme in "response stream" and same encoding scheme should be used in JSP/ASP/HTML page.
Server side encoding options
PHP
header('Content-type: text/html; charset=utf-8');
Perl
print "Content-Type: text/html; charset=utf-8\n\n";
Python
Use the same solution as for Perl (except that you don't need a semicolon at the end).
Java Servlets
response.setContentType("text/html;charset=utf-8");
JSP
<%@ page contentType="text/html; charset=UTF-8" %>
ASP and ASP.Net
<%Response.charset="utf-8"%>
Client side encoding options
Use following meta tag in your HTML page <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
Further reading:
HTTP-charset
This answer
When I get the request parameter for the escaped input (done through <c:out value="${person.name}" />), I get the escaped value and store it in the DB as it is. For example, <script>test</script> is stored as &lt;script&gt;test&lt;/script&gt;. When the value is fetched from the DB and displayed in the browser, it renders correctly, i.e. &lt;script&gt;test&lt;/script&gt; is displayed as <script>test</script>.
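To illustrate what the escaping step does: <c:out> replaces the five XML special characters with entities when writing the output, and the browser turns those entities back into visible characters when rendering, so no manual decoding is needed. The sketch below imitates that replacement in plain Java; HtmlEscapeDemo and escapeXml are hypothetical names, this is not the JSTL implementation itself.

```java
class HtmlEscapeDemo {
    // Replaces the five XML special characters with entities,
    // similar to what <c:out> does with its default escapeXml="true".
    static String escapeXml(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&#034;"); break;
                case '\'': out.append("&#039;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeXml("<script>test</script>"));
    }
}
```

A common approach is to store the raw value and escape only when writing it into HTML, so the text is never escaped twice.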

JSP mysql and utf8

I am trying to store Greek text through my JSP page in a MySQL database using an insert statement. I have set the collation of the MySQL tables to utf8. I have already used the
request.setCharacterEncoding("UTF-8");
response.setCharacterEncoding("UTF-8");
statements and I have made the proper modification to the server.xml of the tomcat server ...
Any other ideas???
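Set the collation to greek_general_ci. Note that this collation belongs to the greek character set, so the character set has to be converted along with it:
ALTER TABLE <table name> CONVERT TO CHARACTER SET greek COLLATE greek_general_ci;
EDIT:
In MySQL Workbench, right click on the table -> Alter Table -> change Collation to greek_general_ci or greek_bin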
It seems to me that you are trying to read UTF-8 characters (the data is entered as UTF-8, as you stated) using the Greek charset or another encoding that does not have equivalents for all Unicode characters.
Question marks or similar placeholders are shown when the byte (or bytes) have no meaning in the encoding you are using. You are parsing bytes that your client does not recognize as valid, so there is no representation for them.
I recommend you to read this very helpful article before you continue:
For example, you could encode the Unicode string for Hello (U+0048
U+0065 U+006C U+006C U+006F) in ASCII, or the old OEM Greek Encoding,
or the Hebrew ANSI Encoding, or any of several hundred encodings that
have been invented so far, with one catch: some of the letters might
not show up! If there's no equivalent for the Unicode code point
you're trying to represent in the encoding you're trying to represent
it in, you usually get a little question mark: ? or, if you're really
good, a box. Which did you get? -> �
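You can reproduce the question-mark effect described in the quote with a few lines of plain Java. This is a minimal sketch; the class name UnmappableDemo and the helper roundTrip are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class UnmappableDemo {
    // Encodes the text with the given charset and decodes it again.
    // Characters the charset cannot represent are replaced on encoding.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String greek = "\u03B1\u03B2\u03B3"; // Greek letters alpha, beta, gamma
        // ISO-8859-1 has no Greek letters, so each one turns into '?'.
        System.out.println(roundTrip(greek, StandardCharsets.ISO_8859_1));
        // UTF-8 can represent every Unicode character, so nothing is lost.
        System.out.println(roundTrip(greek, StandardCharsets.UTF_8).equals(greek));
    }
}
```

Once a character has been replaced like this, the original text cannot be recovered from the stored bytes.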
Also, the problem is not in your database or table/column encoding. That would only apply if you were using stored procedures, for example.
Make sure your browser is operating in UTF-8 when entering or showing information. In Chrome you can go to Tools > Encoding; Unicode (UTF-8) should be selected after you open your JSP in the browser. You can also debug the request/response in the Network tab. If you are entering data as UTF-8, read it as UTF-8.
You don't need to change the Tomcat config or use an HttpFilter to set the encoding if you are doing it in the JSP. I was able to reproduce your problem using only the following config.
ISO-8859-7 is a Greek encoding that I used to read Unicode and simulate your problem:
<%@ page language="java" contentType="text/html; charset=ISO-8859-7" pageEncoding="ISO-8859-7"%>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-7">
</head>
</html>
If this does not help I ask you to post complete details and source code of your application.

request.getCharacterEncoding() returns NULL... why?

A coworker of mine created a basic contact-us type form, which is mangling accented characters (è, é, à, etc). We're using KonaKart a Java e-commerce platform on Struts 1.
I've narrowed the issue down to the data coming in through the HttpServletRequest object. Comparing a similar (properly functioning) form, I noticed that on the old form the request object's Character Encoding (request.getCharacterEncoding()) is returned as "UTF-8", but on the new form it is coming back as NULL, and the text coming out of request.getParameter() is already mangled.
Aside from that, I haven't found any significant differences between the known-good form, and the new-and-broken form.
Things I've ruled out:
Both HTML pages have the tag: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
Both form tags in the HTML use POST, and do not set encodings
Checking from Firebug, both the Request and Response headers have the same properties
Both JSP pages use the same attributes in the <%@page contentType="text/html;charset=UTF-8" language="java" %> tag
There's nothing remotely interesting going on in the *Form.java files, both inherit from BaseValidatorForm
I've checked the source file encodings, they're all set to Default - inherited from Container: UTF-8
If I convert them from ISO-8859-1 to UTF-8, it works great, but I would much rather figure out the core issue.
eg: new String(request.getParameter("firstName").getBytes("ISO-8859-1"),"UTF8")
Any suggestions are welcome, I'm all out of ideas.
Modern browsers usually don't supply the character encoding in the Content-Type header of the HTTP request. For HTML form based applications, however, the browser encodes the submitted data with the same character encoding as specified in the Content-Type header of the HTTP response that served the page with the form. You need to explicitly set the request character encoding to that same encoding yourself, which in your case is UTF-8.
request.setCharacterEncoding("UTF-8");
Do this before any request parameter is retrieved from the request (otherwise it's too late; the server platform default encoding would then be used to parse the parameters, which is indeed often ISO-8859-1). A servlet filter mapped on /* is a perfect place for this.
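To see why the parameters arrive mangled and why the workaround from the question repairs them, here is a small simulation in plain Java without any servlet classes. The class and method names (RequestDecodingDemo, decodeWrongly, repair) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class RequestDecodingDemo {
    // Simulates a container that parses UTF-8 form bytes as ISO-8859-1.
    static String decodeWrongly(String submitted) {
        byte[] wire = submitted.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // The workaround from the question: recover the original bytes
    // and decode them as UTF-8 this time.
    static String repair(String mangled) {
        byte[] wire = mangled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(wire, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String input = "\u00E8\u00E9\u00E0"; // the accented letters from the question
        String mangled = decodeWrongly(input);
        System.out.println(mangled.length());              // two chars per letter
        System.out.println(repair(mangled).equals(input)); // repaired text matches
    }
}
```

The repair only works because ISO-8859-1 maps every byte to a character, so no information is lost. Setting the request encoding up front avoids the need for it.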
See also:
Unicode - How to get the characters right?
The request.getCharacterEncoding() relies on the Content-Type request attribute, not Accept-Charset
So application/x-www-form-urlencoded;charset=ISO8859_1 should work for the POST action. The <%@page tag doesn't affect the POST data.

retrieve and display Tamil characters from mysql database to Browser

I am using Java. I can store Tamil characters in the database correctly, but when I retrieve them and display them in the browser using JSP, they show up as boxes. I use the following code to save Tamil characters in the MySQL database.
Properties pr = new Properties();
pr.put("user", "root");
pr.put("password", "root");
pr.put("characterEncoding", "UTF-8");
pr.put("useUnicode", "true");
Class.forName("com.mysql.jdbc.Driver");
connection = DriverManager.getConnection(connectionURL,pr);
I can see the Tamil characters in database. But I can't retrieve and display in the same format. Please Help me. Thanks in advance.
Add the following to the top of your JSPs:
<%@ page pageEncoding="UTF-8" %>
It instructs the server to use UTF-8 to write the characters to the response. It also adds an HTTP response Content-Type header with a value of text/html;charset=UTF-8. This is quite different from a simple <meta> tag, which browsers ignore when the HTTP response header already specifies a charset. For debugging purposes, you can see the real HTTP headers using, for example, Fiddler2 or Firebug.
That should be sufficient.
What is the character set of your JSP pages? Make sure that it is UTF-8.
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
adarshr gave you the probable solution to your problem. You could also read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets.
