Retrieve and display Tamil characters from a MySQL database in the browser - Java

I am using Java. I can store Tamil characters in the database correctly, but when I retrieve them and display them in the browser using JSP, they are displayed as boxes. I use the following code to save Tamil characters in the MySQL database.
Properties pr = new Properties();
pr.put("user", "root");
pr.put("password", "root");
pr.put("characterEncoding", "UTF-8");
pr.put("useUnicode", "true");
Class.forName("com.mysql.jdbc.Driver");
connection = DriverManager.getConnection(connectionURL,pr);
I can see the Tamil characters in the database, but I can't retrieve and display them in the same form. Please help me. Thanks in advance.

Add the following to the top of your JSPs:
<%@ page pageEncoding="UTF-8" %>
It instructs the server to use UTF-8 to write the characters to the response. It also adds an HTTP response Content-Type header with a value of text/html;charset=UTF-8. This is quite different from a simple <meta> tag, which web browsers ignore when the content is served over HTTP with a Content-Type header. For debugging purposes, you can see the real HTTP headers using, for example, Fiddler2 or Firebug.
That should be sufficient.
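To illustrate why a missing charset declaration garbles the text, here is a minimal sketch using only the Java standard library. The class and method names are hypothetical, and ISO-8859-1 is only an assumed client default; the actual fallback depends on the browser.

```java
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {

    // Decode UTF-8 bytes with the wrong charset, roughly what a client does
    // when the response does not declare charset=UTF-8
    // (ISO-8859-1 is an assumed default here).
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Decode the same bytes with the charset they were written in.
    static String readCorrectly(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String tamil = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD"; // "தமிழ்"
        System.out.println(tamil.length());                      // 5
        System.out.println(misread(tamil).length());             // 15: 3 UTF-8 bytes per char
        System.out.println(readCorrectly(tamil).equals(tamil));  // true
    }
}
```

The bytes are identical in both cases; only the charset used to interpret them differs, which is what the page directive communicates to the browser.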

What is the character set of your JSP pages? Make sure that it is UTF-8.
<meta http-equiv="content-type" content="text/html; charset=UTF-8">

adarshr gave you the probable solution to your problem. You could also read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets.

Related

Is Explicit decoding required here?

Say I am displaying escaped value in HTML with below code under text area:
<c:out value="${person.name}" />
My question: do I need to decode this value manually on the server side, or will the browser do it automatically?
No, you do not need to decode this value manually. All you need is:
Specify UTF-8 as the encoding in your HTTP response content type. To be precise, use HttpServletResponse.setContentType("text/html;charset=utf-8");.
Your JSP should declare UTF-8 as its content type encoding. To be precise, add this meta tag to your JSP and you should be good to go: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
When this tag is present in your JSP, the browser understands that the content of the page should be rendered according to UTF-8 encoding rules.
If you don't specify the page encoding explicitly using this kind of meta tag or some other mechanism, the browser uses its default encoding when rendering the page, and you may not see the expected result, especially for characters from the less common blocks of the BMP and from the Supplementary Multilingual Plane. Check this on how to see the browser's default encoding.
Concept
The server should specify the desired encoding scheme in the response stream, and the same encoding scheme should be used in the JSP/ASP/HTML page.
Server side encoding options
PHP
header('Content-type: text/html; charset=utf-8');
Perl
print "Content-Type: text/html; charset=utf-8\n\n";
Python
Use the same solution as for Perl (except that you don't need a semicolon at the end).
Java Servlets
response.setContentType("text/html;charset=utf-8");
JSP
<%@ page contentType="text/html; charset=UTF-8" %>
ASP and ASP.Net
<%Response.charset="utf-8"%>
Client side encoding options
Use the following meta tag in your HTML page: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
Further reading:
HTTP-charset
This answer
When I get the request parameter for the escaped input (done through <c:out value="${person.name}" />), I get the escaped value and store it in the DB as is. For example, <script>test</script> is stored as &lt;script&gt;test&lt;/script&gt;. Now when the value is fetched from the DB and displayed in the browser, it renders correctly, i.e. &lt;script&gt;test&lt;/script&gt; is displayed as <script>test</script>.
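As a rough illustration of the escaping being discussed, the sketch below replaces the HTML-significant characters with entities. The helper is hypothetical and is not the actual JSTL implementation behind <c:out>; the entity choices for quotes are an assumption.

```java
class HtmlEscapeDemo {

    // Replace characters that are significant in HTML with entities so the
    // browser displays them as text instead of interpreting them as markup.
    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints &lt;script&gt;test&lt;/script&gt;
        System.out.println(escapeHtml("<script>test</script>"));
    }
}
```

The browser turns the entities back into visible characters when rendering, so no manual decoding step is needed on the server.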

JSP mysql and utf8

I am trying to store Greek text in a MySQL database from my JSP page using an INSERT statement. I have set the collation of the MySQL tables to UTF-8. I have already used the
request.setCharacterEncoding("UTF-8");
response.setCharacterEncoding("UTF-8");
statements, and I have made the appropriate changes to the server.xml of the Tomcat server ...
Any other ideas?
Set the collation to greek_general_ci:
ALTER TABLE <table name> CONVERT TO CHARACTER SET greek COLLATE greek_general_ci;
EDIT:
In MySQL Workbench, right-click the table -> Alter Table -> change the collation to greek_general_ci or greek_bin.
It seems to me that you are reading UTF-8 data (you stated that the data is entered as UTF-8) using a Greek charset or some other encoding that does not cover the same characters.
Question marks or similar placeholders are shown when a byte (or byte sequence) has no mapping in the encoding you are using. Your client is parsing bytes that it does not recognise as valid, so there is nothing to display.
I recommend you to read this very helpful article before you continue:
For example, you could encode the Unicode string for Hello (U+0048
U+0065 U+006C U+006C U+006F) in ASCII, or the old OEM Greek Encoding,
or the Hebrew ANSI Encoding, or any of several hundred encodings that
have been invented so far, with one catch: some of the letters might
not show up! If there's no equivalent for the Unicode code point
you're trying to represent in the encoding you're trying to represent
it in, you usually get a little question mark: ? or, if you're really
good, a box. Which did you get? -> �
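The effect described in the quoted passage can be reproduced with the standard library. This is a minimal sketch with hypothetical class and method names: "Hello" survives a trip through ASCII, while Greek text does not.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class UnmappableCharDemo {

    // Encode text in the given charset and decode it again with the same one.
    // String.getBytes substitutes the charset's replacement byte for any
    // character the charset cannot represent.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    // True if every character of the text exists in the charset.
    static boolean canRepresent(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String hello = "Hello";
        String greek = "\u0393\u03B5\u03B9\u03AC"; // "Γειά"

        System.out.println(roundTrip(hello, StandardCharsets.US_ASCII)); // Hello
        System.out.println(roundTrip(greek, StandardCharsets.US_ASCII)); // ????
        System.out.println(roundTrip(greek, StandardCharsets.UTF_8).equals(greek)); // true
    }
}
```

Once the characters have been replaced by question marks the original text is gone, so the fix has to happen before the lossy conversion, not after it.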
Also, the problem is not in your database or table/column encoding. That would only apply if you were using stored procedures, for example.
Make sure your browser is operating in UTF-8 when entering or showing information. In Chrome you can go to Tools > Encoding; Unicode (UTF-8) should be selected after you open your JSP in the browser. You can also inspect the request/response in the Network tab. If you are entering data as UTF-8, read it as UTF-8.
You don't need to change the Tomcat config or use an HttpFilter to set the encoding if you are doing it in the JSP. I was able to reproduce your problem using only the following config:
ISO-8859-7 is a Greek encoding that I used to read Unicode data and reproduce your problem:
<%@ page language="java" contentType="text/html; charset=ISO-8859-7" pageEncoding="ISO-8859-7"%>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-7">
</head>
</html>
If this does not help, please post the complete details and source code of your application.

Wrong encoding in java servlet (tomcat)

I am trying to set up the right encoding for my JSP/servlet pages in Tomcat 7, but I have yet to be successful. I tried the suggestions given in this stackexchange thread: Character encoding JSP -displayed wrong in JSP but not in URL: "á » á é » é", but they didn't work.
The curious thing is that if I leave the pages as is, the browser detects their encoding as Windows-1252 (CP1252), and when I switch it to UTF-8 the text is displayed correctly. But when I apply filters and other mechanisms, the browser sets the encoding to UTF-8 and the text is not displayed correctly. In the latter case, if I change the encoding, the results are even worse.
I got it right now. In JSP pages I put this as the first instruction:
<%@ page pageEncoding="utf-8" %>
This fixes all problems. Other approaches, such as calling response.setCharacterEncoding("UTF-8") as the first instruction, don't work.
For servlets, I need to set the character encoding before getting the PrintWriter object:
response.setContentType("text/html");
response.setCharacterEncoding("UTF-8");
PrintWriter out = response.getWriter();
These changes solved my problem with strange characters. To sum up: the problem was that the response coming out of the JSP/servlet did not declare that it was encoded in UTF-8.
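The ordering matters because a writer's charset is fixed when the writer is created; in the Servlet API, setCharacterEncoding has no effect once getWriter has been called. The following is a standard-library analogy only, not servlet code, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class WriterEncodingDemo {

    // Write text through a PrintWriter whose charset is chosen at construction
    // time, then return the raw bytes that would go over the wire.
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintWriter out = new PrintWriter(new OutputStreamWriter(sink, charset))) {
            out.print(text);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u00E1\u00E9"; // "áé"
        byte[] latin1 = write(text, StandardCharsets.ISO_8859_1);
        byte[] utf8 = write(text, StandardCharsets.UTF_8);

        System.out.println(latin1.length); // 2
        System.out.println(utf8.length);   // 4

        // A client told "charset=UTF-8" can only decode the UTF-8 bytes correctly.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(text));   // true
        System.out.println(new String(latin1, StandardCharsets.UTF_8).equals(text)); // false
    }
}
```

The bytes written and the charset declared in the header have to agree; setting the encoding before obtaining the writer ensures both come from the same setting.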
Maybe it is not a JSP problem. Have you tried doing that in the page directly?
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
...
</head>
Also, try saving the page in UTF-8 format.

Where to add the UTF-8 extension in the HTML page?

I need to add charset="utf-8" at the end of the script tags to get the translation to another language working.
I don't know where I should add it or what rules apply. Please let me know where to add the charset. Do I need to add it at the end of "ApplicationLoader.js", or only after the jQuery plugins? Any suggestions, please.
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>My Web App</title>
<link href="css/jquery/jquery.ui.all.css" rel="stylesheet" type="text/css" />
<script type="text/javascript" src="js/jquery-1.4.2.min.js" charset="utf-8"></script>
<script type="text/javascript" src="js/jquery-ui-1.8.custom.min.js" charset="utf-8"></script>
<script type="text/javascript" src="js/jquery.depends.js" charset="utf-8"></script>
<script type="text/javascript" src="myemployeelist.js" ></script>
Update:
I will explain my situation in more detail. I am a newbie in training, so I will describe my problem as far as I understand it.
I created a webapplication project in Eclipse, in which I created a class
for JDBC connection to MySQL.
I have a class in serverside which gets the users profile value from my webapp
textbox and saves in DB.
I am using jQuery plugin for doing translation from English to Arabic.
I have an HTML page in which as stated in the above part of question I have the
tags in which I added the charset="utf-8" to specify unicode.
I am using dwr to get the values in js and send it to server side.
I changed my computer input language to Arabic and my Mozilla firefox locale to
Arabic.
I am able to enter English values in MySQL and retrieve them, but when I
enter an Arabic value it is not saved. The JDBC error is
java.sql.SQLException: Incorrect string value: '\xD8\xB3\xD9\x84\xD8\xA8...'
I don't know how to configure the LANG variable of my Jetty 6 server to UTF-8. Any suggestions, please? Thanks.
If a Content-Type header is present in the HTTP response headers, it overrides the meta headers. Very often this header is already supplied by default by the webserver, and more often than not the charset is absent (in which case the client's default charset, often ISO-8859-1, is assumed). In other words, the meta headers are generally only interpreted when the resources are opened locally (not over HTTP). Chances are this is why your meta headers apparently didn't work when served over HTTP.
You can use Firebug or Fiddler2 to inspect the HTTP response headers.
You can configure the general setting for HTTP response headers at webserver level. You can also configure it on a request basis at programming language level. Since it's unclear what webserver / programming language you're using, I can't go in detail about how to configure it accordingly.
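The precedence described above can be modelled in a few lines. This is a simplified sketch of the rule as stated in this answer, not the actual algorithm browsers use, and the class and method names are hypothetical: take the charset from the Content-Type header if it has one, otherwise fall back to a default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class ContentTypeCharset {

    // Return the charset named in a Content-Type header value, or the given
    // fallback when the header is missing, has no charset parameter, or names
    // a charset this JVM does not know.
    static Charset effectiveCharset(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset fallback = StandardCharsets.ISO_8859_1; // assumed client default
        System.out.println(effectiveCharset("text/html;charset=utf-8", fallback)); // UTF-8
        System.out.println(effectiveCharset("text/html", fallback));               // ISO-8859-1
    }
}
```

The second call shows the problematic case: a header without a charset leaves the decision to the client's default.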
Update: as per the problem symptoms, which is the following typical MySQL exception:
java.sql.SQLException: Incorrect string value: '\xD8\xB3\xD9\x84\xD8\xA8...'
The byte sequence D8 B3 D9 84 D8 A8 is a valid UTF-8 sequence which represents the characters سلب (U+0633, U+0644 and U+0628). So the HTTP part is fine. You mentioned that you are using Jetty 6 as the servlet container. The later builds of Jetty 6 already support UTF-8 out of the box.
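The byte sequence from the exception can be checked with a strict UTF-8 decode. This is a minimal sketch with hypothetical class and method names; it reports malformed input instead of silently substituting replacement characters.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8SequenceCheck {

    // Strictly decode bytes as UTF-8; throws IllegalArgumentException if the
    // bytes are not valid UTF-8.
    static String decodeStrict(byte[] bytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("Not valid UTF-8", e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xD8, (byte) 0xB3, (byte) 0xD9, (byte) 0x84,
                        (byte) 0xD8, (byte) 0xA8};
        String text = decodeStrict(bytes);
        // prints U+0633 U+0644 U+0628
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            sb.append(String.format("U+%04X ", (int) c));
        }
        System.out.println(sb.toString().trim());
    }
}
```

Since the bytes decode cleanly, the data arriving at the JDBC layer is intact.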
However, the problem is in the DB part. This exception indicates that the charset the DB/table has been instructed to use doesn't support this byte sequence. This can happen when the DB/table hasn't been instructed to use UTF-8.
To fix the DB part, issue these MySQL commands:
ALTER DATABASE db_name DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
And for future DBs/tables, use CHARACTER SET utf8 COLLATE utf8_general_ci in the CREATE statement as well.
It is usually enough to declare the main document's encoding once, either through a content-type header (see @BalusC's answer for an in-depth explanation) or, if that is not available, the meta tag.
If you use the Meta tag, make sure it is in the first line of the head section.
There should be no need to specify the character set explicitly for the script files.
Of course, all the content you deal with needs to be UTF-8 encoded as well for this to work. It's not enough to just slap the content-type meta tag in front. (But you are probably aware of that.)
In this way:
<script type="text/javascript" src="[path]/myscript.js" charset="utf-8"></script>

Java String Encoding to UTF-8

I have some HTML code that I store in a java.lang.String variable. I write that variable to a file and set the encoding to UTF-8 when writing the contents of the string variable to the file on the filesystem. I open up that file and everything looks great, e.g. → shows up as a right arrow.
However, if the same String (containing the same content) is used by a JSP page to render content in a browser, characters such as → show up as a question mark (?).
When storing content in the String variable, I make sure that I use:
String myStr = new String(bytes, charset);
instead of just:
String myStr = "<html><head/><body>→</body></html>";
Can someone please tell me why the String content gets written to the filesystem perfectly but does not render in the JSP/browser?
Thanks.
but does not render in the jsp/browser?
You need to set the response encoding as well. In a JSP you can do this using
<%@ page pageEncoding="UTF-8" %>
From the browser's point of view this has the same effect as the following meta tag in the HTML <head>, except that the charset is sent in the HTTP Content-Type response header:
<meta http-equiv="content-type" content="text/html; charset=utf-8">
Possibilities:
The browser does not support UTF-8
You don't have Content-Type: text/html; charset=utf-8 in your HTTP Headers.
The lazy developer (=me) uses Apache Commons Lang StringEscapeUtils.escapeHtml http://commons.apache.org/lang/api-release/org/apache/commons/lang/StringEscapeUtils.html#escapeHtml(java.lang.String) which will help you handle all 'odd' characters. Let the browser do the final translation of the HTML entities.
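The idea behind the entity approach can be sketched without the library. The helper below is hypothetical and is not the Commons Lang implementation: it emits numeric character references where escapeHtml would typically use named entities, and it leaves markup characters such as < and > untouched.

```java
class NumericEntityEscaper {

    // Replace every non-ASCII code point with a decimal numeric character
    // reference, so the output is pure ASCII and independent of the page
    // encoding.
    static String escapeNonAscii(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints().forEach(cp -> {
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // prints <body>&#8594;</body>
        System.out.println(escapeNonAscii("<body>\u2192</body>"));
    }
}
```

Because the result contains only ASCII, the arrow survives even when the response charset is wrong, at the cost of larger output for text that is mostly non-ASCII.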
