JSP mysql and utf8 - java

I am trying to store Greek text entered through my JSP page in a MySQL database using an INSERT statement. I have set the collation of the MySQL tables to UTF-8. I have already used the
request.setCharacterEncoding("UTF-8");
response.setCharacterEncoding("UTF-8");
statements and I have made the proper modification to the server.xml of the tomcat server ...
Any other ideas???

set collation greek_general_ci.
ALTER TABLE <table name> CONVERT TO CHARACTER SET greek COLLATE greek_general_ci;
EDIT:
In MySQL Workbench, right-click the table -> Alter Table -> change Collation to greek_general_ci or greek_bin

It seems to me that you are trying to read UTF-8 characters (the data is entered as UTF-8, as you stated) using the Greek charset or some other encoding that has no equivalent for those Unicode characters.
Question marks or similar placeholders are shown when the byte (or bytes) have no mapping in the encoding you are using. You are parsing bytes that your client does not recognize as valid, so it has no representation for them.
I recommend reading this very helpful article before you continue:
For example, you could encode the Unicode string for Hello (U+0048
U+0065 U+006C U+006C U+006F) in ASCII, or the old OEM Greek Encoding,
or the Hebrew ANSI Encoding, or any of several hundred encodings that
have been invented so far, with one catch: some of the letters might
not show up! If there's no equivalent for the Unicode code point
you're trying to represent in the encoding you're trying to represent
it in, you usually get a little question mark: ? or, if you're really
good, a box. Which did you get? -> �
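The effect described in the quote can be reproduced in a few lines of Java. This is only a minimal sketch: the class and method names (QuestionMarkDemo, encodeAndReadBack, decodeAsUtf8) are made up for illustration, and the Greek sample text is written with Unicode escapes so the result does not depend on the source file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QuestionMarkDemo {
    // Encode text into the target charset and read it back with the same charset.
    // String.getBytes(Charset) substitutes the charset's replacement byte
    // (a '?' for US-ASCII and ISO-8859-1) for every character it cannot map.
    static String encodeAndReadBack(String text, Charset target) {
        return new String(text.getBytes(target), target);
    }

    // Decode raw bytes as UTF-8; malformed input becomes U+FFFD, the replacement character.
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String greek = "\u03b1\u03b2\u03b3"; // alpha, beta, gamma written as escapes
        System.out.println(encodeAndReadBack("Hello", StandardCharsets.US_ASCII));   // Hello
        System.out.println(encodeAndReadBack(greek, StandardCharsets.ISO_8859_1));   // ???
        System.out.println(encodeAndReadBack(greek, StandardCharsets.UTF_8).equals(greek)); // true
        System.out.println((int) decodeAsUtf8(new byte[] {(byte) 0xFF}).charAt(0));  // 65533
    }
}
```

A question mark therefore usually means that characters were lost while encoding into a charset that cannot represent them, whereas the replacement character (code point 65533) usually means that bytes were decoded with the wrong charset.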
Also the problem is not in your database or table/column encoding. It would only apply if you were using stored procedures for example.
Make sure your browser is operating in UTF-8 when entering or displaying information. In Chrome you can go to Tools > Encoding; Unicode (UTF-8) should be selected after you open your JSP in the browser. You can also debug the request/response in the Network tab. If you are entering data as UTF-8, read it as UTF-8.
You don't need to change the Tomcat config or use an HttpFilter to set the encoding if you are doing it in the JSP. I was able to simulate your problem using only the following config:
ISO-8859-7 is a Greek Encoding that I used to read unicode and simulate your problem
<%@ page language="java" contentType="text/html; charset=ISO-8859-7" pageEncoding="ISO-8859-7"%>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-7">
</head>
</html>
If this does not help I ask you to post complete details and source code of your application.

Related

Internationalization using resource bundle properties in JSP, non-Latin text becomes Mojibake

I have the following index.jsp:
<%@ taglib prefix="fmt" uri="http://java.sun.com/jsp/jstl/fmt" %>
<%@ page contentType="text/html;charset=UTF-8" language="java" %>
<fmt:setLocale value="ru_RU"/>
<fmt:setBundle basename="messages"/>
<html>
<head>
<title></title>
</head>
<body>
<h1><fmt:message key="login"/></h1>
</body>
</html>
And property file messages_ru_RU.properties:
login = Логин
The problem is that I get the junk unicode characters in the output:
Ëîãèí
Update
Changed the .properties file encoding to UTF-8.
The latest output: Ðогин
Help me, please, to change this to the normal cyrillic letters.
Property file:
messages_ru_RU.properties
Properties files are as per specification read using ISO-8859-1.
... the input/output stream is encoded in ISO 8859-1 character encoding. Characters that cannot be directly represented in this encoding can be written using Unicode escapes as defined in section 3.3 of The Java™ Language Specification; only a single 'u' character is allowed in an escape sequence. The native2ascii tool can be used to convert property files to and from other character encodings.
So, any character which is not covered by the ISO-8859-1 range needs to be escaped in the Unicode escape sequences \uXXXX. You can use the JDK-supplied native2ascii tool to convert them. You can find it in JDK's /bin folder.
Here's an example assuming that foo_utf8.properties is the one which you saved using UTF-8 and that foo.properties is the one which you'd like to use in your application:
native2ascii -encoding UTF-8 foo_utf8.properties foo.properties
In your particular case, the property in question would then be converted to:
login = \u041B\u043E\u0433\u0438\u043D
This can then be successfully read and displayed in a JSP page with the below minimum @page configuration:
<%@ page pageEncoding="UTF-8" %>
(the remainder you had is irrelevant as those are the defaults already when above is set)
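To see why the escaped form survives the ISO-8859-1 loading step, here is a rough Java sketch. The helper escapeNonAscii is a hypothetical stand-in that only approximates what native2ascii does, and the class name is likewise made up; Properties.load(InputStream) is the standard JDK call that reads the stream as ISO-8859-1 and resolves the escapes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class PropertiesEscapeDemo {
    // Hypothetical helper: write every non-ASCII UTF-16 code unit as a
    // backslash-u escape with four hex digits, roughly like native2ascii.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c > 0x7F) {
                out.append(String.format("\\u%04X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Load one properties line through the InputStream overload, which
    // always decodes the bytes as ISO-8859-1 and then resolves the escapes.
    static String loadValue(String line, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(line.getBytes(StandardCharsets.ISO_8859_1)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String escaped = escapeNonAscii("login = \u041b\u043e\u0433\u0438\u043d");
        System.out.println(escaped);                     // the ASCII-only line
        System.out.println(loadValue(escaped, "login")); // the original Cyrillic word
    }
}
```

Because the escaped line contains only ASCII characters, it reads the same whether the file is treated as ISO-8859-1 or UTF-8.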
If you're using a Java-aware IDE such as Eclipse, then you can just use its builtin properties file editor which should automatically be associated with .properties files in a Java-faceted project. If you use this editor instead of the plain text editor / source editor, then it'll automatically escape the characters which are not covered by the ISO-8859-1 range.
See also:
Unicode - How to get the characters right?
How to internationalize a Java web application?
I had the same problem with Hindi, so I changed my pageEncoding to UTF-8 and saved the file with Unicode encoding, since I use Unicode text in the .properties file. This worked for me.
Since Java SE 9, properties files for resource bundles are loaded in UTF-8 encoding by default.
See https://docs.oracle.com/javase/9/intl/internationalization-enhancements-jdk-9.htm and https://stackoverflow.com/a/46926020/548473
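On older JDKs, or when loading a bundle by hand, the charset can be passed explicitly through a Reader. The following is a small sketch under that assumption; the class and method names are invented for illustration, while PropertyResourceBundle(Reader) is a standard constructor available since Java 6.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class Utf8BundleDemo {
    // Build a bundle from raw UTF-8 bytes by decoding them explicitly,
    // so the result does not depend on the JDK's default for properties files.
    static ResourceBundle readUtf8Bundle(byte[] utf8Content) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(utf8Content), StandardCharsets.UTF_8)) {
            return new PropertyResourceBundle(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Simulates a messages file saved as UTF-8 without any escapes.
        byte[] file = "login = \u041b\u043e\u0433\u0438\u043d".getBytes(StandardCharsets.UTF_8);
        System.out.println(readUtf8Bundle(file).getString("login"));
    }
}
```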

retrieve and display Tamil characters from mysql database to Browser

I am using java language. I can store Tamil characters to database in the same format. But when I retrieve and display in the browser using jsp it displayed like boxes. I use the following code to save Tamil character in mysql database.
Properties pr = new Properties();
pr.put("user", "root");
pr.put("password", "root");
pr.put("characterEncoding", "UTF-8");
pr.put("useUnicode", "true");
Class.forName("com.mysql.jdbc.Driver");
connection = DriverManager.getConnection(connectionURL,pr);
I can see the Tamil characters in database. But I can't retrieve and display in the same format. Please Help me. Thanks in advance.
Add the following to the top of your JSPs:
<%@ page pageEncoding="UTF-8" %>
It instructs the server to use UTF-8 to write the characters to the response. It also adds a HTTP response Content-Type header with a value of text/html;charset=UTF-8. This is quite different from a simple <meta> tag which is ignored by webbrowsers when the content is served over HTTP. For debugging purposes, you can see the real HTTP headers using for example Fiddler2 or Firebug.
That should be sufficient.
What is the character set of your JSP pages? Make sure that it is UTF-8.
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
adarshr gave you the probable solution to your problem; you could also read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets.

jsp pages displaying junk characters in non english languages

I have a Main JSP page say jsp1 which includes two JSP pages (jsp2, jsp3). All the strings in these pages come from property files.
The non-english property files are converted using native2ascii
native2ascii -encoding="8859-1" lang.properties lang1.properties
All the JSP pages have
<%@ page contentType="text/html;charset=UTF-8" language="java" %>
Now when main jsp page(jsp1) gets displayed, we see garbled characters in a few strings of jsp2 and jsp3. Till now I have seen this happening to Russian, Korean, Japanese language strings. And it happens on a random string.
Does any one have an idea what could be wrong
Updating with more details
The string in rus_utf8.properties is
Щелкните <strong>УСТАНОВИТЬ СЕЙЧАС</strong> и сохраните файл в некотором расположении
After conversion using native2ascii, the string in rus.properties is
\u0429\u0435\u043b\u043a\u043d\u0438\u0442\u0435 <strong>\u0423\u0421\u0422\u0410\u041d\u041e\u0412\u0418\u0422\u042c \u0421\u0415\u0419\u0427\u0410\u0421</strong> \u0438 \u0441\u043e\u0445\u0440\u0430\u043d\u0438\u0442\u0435 \u0444\u0430\u0439\u043b \u0432 \u043d\u0435\u043a\u043e\u0442\u043e\u0440\u043e\u043c \u0440\u0430\u0441\u043f\u043e\u043b\u043e\u0436\u0435\u043d\u0438\u0438.
In JSP we use struts <s:text> to load the string from property file
In firefox the string got displayed as
��елкните УСТАНОВИТЬ СЕЙЧАС и сохраните файл в некотором расположении.
The char Щ got garbled. Same String in some other place in the page got displayed properly.
The non-english property files are converted using native2ascii
native2ascii -encoding="8859-1" lang.properties lang1.properties
This is invalid. It should have been
native2ascii -encoding ISO-8859-1 lang.properties lang1.properties
Apart from the syntax error which you have there (which should immediately have aborted native2ascii), the ISO-8859-1 encoding cannot possibly be correct for Russian, Korean and Japanese strings. The ISO-8859-1 encoding does not cover those characters at all. Assuming that you saved the file as UTF-8, you should be using
native2ascii -encoding UTF-8 lang.properties lang1.properties
This way native2ascii will convert from a UTF-8 lang.properties to an ISO-8859-1 compatible lang1.properties. native2ascii always converts to ASCII. The -encoding option concerns the encoding of the source file, not the target file.
As to the JSP pages, just a
<%@page contentType="text/html; charset=UTF-8" %>
ought to be sufficient, per http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q8.
See also:
Unicode - How to get the characters right?
Update as per your update with the examples: everything is actually working right. It just looks very much like the UTF-8 BOM (Byte Order Mark) is the culprit. Notepad adds it by default. Try creating the properties file in another editor instead, such as Eclipse.
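If a BOM is suspected, it can be checked for directly: a UTF-8 BOM is the three bytes EF BB BF at the very start of the file. The sketch below uses made-up helper names (hasUtf8Bom, stripUtf8Bom) and is only one possible way to detect and drop it before the content is processed further.

```java
import java.util.Arrays;

class BomDemo {
    // True when the data starts with the UTF-8 byte order mark EF BB BF.
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    // Return the data without a leading BOM; data without a BOM is returned as is.
    static byte[] stripUtf8Bom(byte[] data) {
        return hasUtf8Bom(data) ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'k', '=', 'v'};
        System.out.println(hasUtf8Bom(withBom));          // true
        System.out.println(stripUtf8Bom(withBom).length); // 3
    }
}
```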
I have the same problem as you and I also tried the similar solutions but they didn't work. Hence I suspected that it may not be an issue with JSP config but rather it is a config issue with my tomcat.
I found this on a Chinese site, https://openhome.cc/Gossip/Encoding/Servlet.html: request.setCharacterEncoding("UTF-8");. It worked for me. I added this before my request.getParameter();.
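The reason this call matters can be simulated without a servlet container. Browsers submit form data as percent-encoded bytes, and a container that has not been told the request encoding typically falls back to ISO-8859-1 when decoding them. The sketch below uses the JDK's URLDecoder as a stand-in for that decoding step; the class name and helper are invented for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class ParameterDecodingDemo {
    // Decode a percent-encoded parameter value with the given charset name.
    static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // The Greek letter alpha as a browser would submit it from a UTF-8 page.
        String submitted = "%CE%B1";
        System.out.println(decode(submitted, "UTF-8"));      // one character: alpha
        System.out.println(decode(submitted, "ISO-8859-1")); // two characters of mojibake
    }
}
```

Decoding the same two bytes as ISO-8859-1 yields two unrelated Latin characters, which is the kind of garbage that ends up in the database when the request encoding is set too late or not at all.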

Where to add the UTF-8 extension in the HTML page?

I need to add charset="utf-8" at the end of the script tags to get the translation to another language working.
I don't know where exactly I should add the attribute. Are there any rules to follow? Please let me know where to add the charset. Do I need to add it at the end of "ApplicationLoader.js" or only after the jQuery plugins? Any suggestions, please.
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>My Web App</title>
<link href="css/jquery/jquery.ui.all.css" rel="stylesheet" type="text/css" />
<script type="text/javascript" src="js/jquery-1.4.2.min.js" charset="utf-8"></script>
<script type="text/javascript" src="js/jquery-ui-1.8.custom.min.js" charset="utf-8"></script>
<script type="text/javascript" src="js/jquery.depends.js" charset="utf-8"></script>
<script type="text/javascript" src="myemployeelist.js" ></script>
Update:
To make my situation easier to understand, here are more details. I am a newbie still in training, so I will explain my problem as far as I understand it.
I created a webapplication project in Eclipse, in which I created a class
for JDBC connection to MySQL.
I have a class in serverside which gets the users profile value from my webapp
textbox and saves in DB.
I am using jQuery plugin for doing translation from English to Arabic.
I have an HTML page in which as stated in the above part of question I have the
tags in which I added the charset="utf-8" to specify unicode.
I am using dwr to get the values in js and send it to server side.
I changed my computer input language to Arabic and my Mozilla firefox locale to
Arabic.
I am able to enter English values in MySQL and retrieve them, but when I
enter an Arabic value it is not saved. The JDBC error is
java.sql.SQLException: Incorrect string value: '\xD8\xB3\xD9\x84\xD8\xA8...'
I don't know how to configure the LANG variable of my Jetty 6 server to UTF-8. Any suggestions, please? Thanks.
If a Content-Type header is present in the HTTP response headers, then this will override the meta headers. Very often, this header is already by default supplied by the webserver and more than often the charset is absent (which would assume client's default charset which is often ISO-8859-1). In other words, the meta headers are generally only interpreted whenever the resources are opened locally (not by HTTP). Big chance that this is the reason that your meta headers apparently didn't work when served over HTTP.
You can use Firebug or Fiddler2 to determine the HTTP response headers.
You can configure the general setting for HTTP response headers at webserver level. You can also configure it on a request basis at programming language level. Since it's unclear what webserver / programming language you're using, I can't go in detail about how to configure it accordingly.
Update: as per the problem symptoms, which is the following typical MySQL exception:
java.sql.SQLException: Incorrect string value: '\xD8\xB3\xD9\x84\xD8\xA8...'
The byte sequence D8 B3 D9 84 D8 A8 is a valid UTF-8 sequence which represents those characters سلب (U+0633, U+0644 and U+0628). So the HTTP part is fine. You mentioned that you were using Jetty 6 as servletcontainer. The later builds of Jetty 6 already support UTF-8 out of the box.
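The claim about the byte sequence can be checked with a strict UTF-8 decoder. This is a minimal sketch with invented names (Utf8CheckDemo, decodeStrictUtf8); it uses the standard CharsetDecoder API configured to report errors instead of silently replacing them.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8CheckDemo {
    // Return the decoded text, or null when the bytes are not well-formed UTF-8.
    static String decodeStrictUtf8(byte[] bytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // The bytes quoted in the MySQL error message.
        byte[] fromError = {(byte) 0xD8, (byte) 0xB3, (byte) 0xD9, (byte) 0x84, (byte) 0xD8, (byte) 0xA8};
        String text = decodeStrictUtf8(fromError);
        // Print each code point in hex: 633, 644, 628
        text.codePoints().forEach(cp -> System.out.println(Integer.toHexString(cp)));
    }
}
```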
However, the problem is in the DB part. This exception indicates that the charset the DB/table has been instructed to use doesn't support this byte sequence. This can happen when the DB/table hasn't been instructed to use UTF-8.
To fix the DB part, issue these MySQL commands:
ALTER DATABASE db_name DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
And for future DB/tables, use CHARACTER SET utf8 COLLATE utf8_general_ci in CREATE statement as well.
It is usually enough to declare the main document's encoding once, either through a content-type header (see @BalusC's answer for an in-depth explanation) or, if that is not available, the meta tag.
If you use the Meta tag, make sure it is in the first line of the head section.
There should be no need to specify the character set explicitly for the script files.
Of course, all the content you deal with needs to be UTF-8 encoded as well for this to work. It's not enough to just slap the content-type meta tag in front. (But you are probably aware of that.)
In this way:
<script type="text/javascript" src="[path]/myscript.js" charset="utf-8"></script>

converting a String to UTF8 format

In Java code, I have a string name = "örebro"; // it's a Swedish character.
But when I use this name in a web application, some special character is printed in place of the 'ö' character.
Is there any way I can use the same character as it is in "örebro"?
I did something like this, but it did not work.
String name = "örebro";
byte[] utf8s = name .getBytes("UTF-8");
name = new String(utf8s, "UTF-8");
But the name at the end prints the same, something like this. �rebo
Please guide me
The Java code you've provided is pointless; it will do nothing. Java Strings are already perfectly capable of encoding any character (though you have to be careful with literals in the source code, as they depend on the encoding the compiler uses, which is platform-dependent).
Most likely your problem is that your webpage does not declare the encoding correctly in the HTTP header or the HTML meta tags.
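Both points can be demonstrated in plain Java. The sketch below is only an illustration with invented names: it shows that encoding to UTF-8 and decoding again with the same charset returns the identical string, and that garbage only appears when the reader of the bytes assumes a different charset, which is what a browser does when the encoding is not declared.

```java
import java.nio.charset.StandardCharsets;

class RoundTripDemo {
    // Encode to UTF-8 and decode with the same charset: this changes nothing.
    static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    // Encode to UTF-8 but decode as ISO-8859-1, as a client that was not
    // told the real encoding might do.
    static String misread(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String name = "\u00f6rebro"; // the first letter is written as an escape
        System.out.println(roundTrip(name).equals(name)); // true
        System.out.println(misread(name).length());       // 7, one character more than the original
    }
}
```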
You need to set the encoding of your output to UTF-8.
It is likely that the browser reading the page does not know the encoding.
Send the header (before any other output), in Java with something like ServletResponse resource; (...) resource.setContentType("text/html;charset=utf-8");
In your HTML page, declare the encoding by sending (printing) <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
If the page used to generate the output is a JSP, it's useful to specify
<%@ page contentType="text/html; charset=utf-8" %>
