GWT Characters with accent not recognised when added programmatically

I am using the UIBinder in GWT but I have problems displaying letters with an accent.
My xml looks like this
<!DOCTYPE ui:UiBinder SYSTEM "http://dl.google.com/gwt/DTD/xhtml.ent">
<ui:UiBinder xmlns:ui="urn:ui:com.google.gwt.uibinder"
xmlns:g="urn:import:com.google.gwt.user.client.ui">
...
<g:Label ui:field="lbl"></g:Label>
If I type my text directly in the XML, <g:Label>éç</g:Label>, the accents come out fine. But if I use the setText method in the associated class, lbl.setText("éç"), they are replaced by a diamond with a question mark in it.
Edit: If I type them as HTML entities, the ampersand and the rest of the entity are displayed literally.
SOLUTION:
In fact, when I tested the app after changing the file format to UTF-8, I hadn't gone back through the code to retype all the accented characters, which were broken during the change. So they still appeared the same in the browser.
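The diamond with a question mark is U+FFFD, the Unicode replacement character. Here is a minimal sketch of how it can arise, assuming the source file was saved in a single-byte encoding and later read as UTF-8. ISO-8859-1 stands in for Windows-1252 here (the two agree on é and ç), and the class and method names are only illustrative:

```java
import java.nio.charset.StandardCharsets;

class ReplacementCharDemo {
    // Encode the text in a single-byte Latin encoding, then decode the bytes as UTF-8.
    // This mimics a file whose real encoding differs from the one the reader assumes.
    static String saveAsLatin1ReadAsUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Same text, written and read with one and the same encoding.
    static String saveAndReadAsUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // e-acute and c-cedilla, written as escapes so this source file stays pure ASCII
        String accents = "\u00E9\u00E7";
        String broken = saveAsLatin1ReadAsUtf8(accents);
        System.out.println("contains U+FFFD: " + (broken.indexOf('\uFFFD') >= 0));
        System.out.println("round trip intact: " + saveAndReadAsUtf8(accents).equals(accents));
    }
}
```

Once the bytes have been decoded wrongly the original characters are gone, which is consistent with having to retype the accents after switching the file to UTF-8.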

You need to set the response encoding and client encoding to UTF-8 as well.
Add this to the top of your page to instruct the XML parser to use UTF-8:
<?xml version="1.0" encoding="UTF-8"?>
Add this to the HTML <head> to instruct the client to use UTF-8:
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />

After following BalusC's steps, check your file save options: go to 'File\AdvancedSaveOptions...' and check that your page is saved as Unicode (UTF-8 with signature), codepage 65001.
Your issue may be caused by the file being saved in Windows codepage 1252.
Just note that you have to retype your Unicode strings afterwards.

Related

Internationalization using resource bundle properties in JSP, non-Latin text becomes Mojibake

I have the following index.jsp:
<%@ taglib prefix="fmt" uri="http://java.sun.com/jsp/jstl/fmt" %>
<%@ page contentType="text/html;charset=UTF-8" language="java" %>
<fmt:setLocale value="ru_RU"/>
<fmt:setBundle basename="messages"/>
<html>
<head>
<title></title>
</head>
<body>
<h1><fmt:message key="login"/></h1>
</body>
</html>
And property file messages_ru_RU.properties:
login = Логин
The problem is that I get the junk unicode characters in the output:
Ëîãèí
Update
Changed the .properties file encoding to UTF-8.
The latest output: ÐÐ¾Ð³Ð¸Ð½
Please help me change this to normal Cyrillic letters.
Property file:
messages_ru_RU.properties

Properties files are as per specification read using ISO-8859-1.
... the input/output stream is encoded in ISO 8859-1 character encoding. Characters that cannot be directly represented in this encoding can be written using Unicode escapes as defined in section 3.3 of The Java™ Language Specification; only a single 'u' character is allowed in an escape sequence. The native2ascii tool can be used to convert property files to and from other character encodings.
So, any character which is not covered by the ISO-8859-1 range needs to be escaped in the Unicode escape sequences \uXXXX. You can use the JDK-supplied native2ascii tool to convert them. You can find it in JDK's /bin folder.
Here's an example assuming that foo_utf8.properties is the one which you saved using UTF-8 and that foo.properties is the one which you'd like to use in your application:
native2ascii -encoding UTF-8 foo_utf8.properties foo.properties
In your particular case, the property in question would then be converted to:
login = \u041B\u043E\u0433\u0438\u043D
This can then be successfully read and displayed in a JSP page with the below minimum @page configuration:
<%@ page pageEncoding="UTF-8" %>
(the remainder you had is irrelevant as those are the defaults already when above is set)
If you're using a Java-aware IDE such as Eclipse, then you can just use its builtin properties file editor which should automatically be associated with .properties files in a Java-faceted project. If you use this editor instead of the plain text editor / source editor, then it'll automatically escape the characters which are not covered by the ISO-8859-1 range.
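To illustrate the point about ISO-8859-1, here is a minimal sketch using plain java.util.Properties on in-memory bytes. It assumes the key login from the question; the class and method names are made up for the example. It compares loading through an InputStream (read as ISO-8859-1, with Unicode escapes resolved) against loading through a Reader that decodes UTF-8 explicitly:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class PropertiesEncodingDemo {
    // Properties.load(InputStream) decodes the bytes as ISO-8859-1
    // and resolves Unicode escape sequences found in the text.
    static String loadFromStream(byte[] content) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("login");
    }

    // Properties.load(Reader) uses whatever decoding the Reader applies, here UTF-8.
    static String loadFromUtf8Reader(byte[] content) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(content), StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("login");
    }

    public static void main(String[] args) {
        // The Russian word from the question, escaped so this source stays ASCII.
        String login = "\u041B\u043E\u0433\u0438\u043D";
        // File content as native2ascii would produce it: literal backslash escapes.
        byte[] escaped = "login = \\u041B\\u043E\\u0433\\u0438\\u043D"
                .getBytes(StandardCharsets.ISO_8859_1);
        // File content saved directly as UTF-8 without escaping.
        byte[] rawUtf8 = ("login = " + login).getBytes(StandardCharsets.UTF_8);

        System.out.println(loadFromStream(escaped).equals(login));     // true
        System.out.println(loadFromStream(rawUtf8).equals(login));     // false (mojibake)
        System.out.println(loadFromUtf8Reader(rawUtf8).equals(login)); // true
    }
}
```

How JSTL's fmt:setBundle loads the bundle is up to the container and JDK in use, so treat this only as a demonstration of the Properties behaviour quoted above.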
See also:
Unicode - How to get the characters right?
How to internationalize a Java web application?

[Image: changing the file encoding to Unicode]
I had the same problem with Hindi, so I changed my pageEncoding to UTF-8 and saved the file with Unicode encoding, since I have Unicode characters in the .properties file. This worked for me.

Since Java SE 9, properties files for resource bundles are loaded in UTF-8 encoding.
See https://docs.oracle.com/javase/9/intl/internationalization-enhancements-jdk-9.htm and https://stackoverflow.com/a/46926020/548473

JSP mysql and utf8

I am trying to store Greek text through my JSP page in a MySQL database using an insert statement. I have set the collation of the MySQL tables to UTF-8. I have already used the
request.setCharacterEncoding("UTF-8");
response.setCharacterEncoding("UTF-8");
statements and I have made the proper modification to the server.xml of the tomcat server ...
Any other ideas???

set collation greek_general_ci.
ALTER TABLE <table name> CONVERT TO CHARACTER SET greek COLLATE greek_general_ci;
EDIT:
In MySQL Workbench, right-click on the table -> Alter Table -> change Collation to greek_general_ci or greek_bin

It seems to me that you are trying to read UTF-8 characters (the data is input as UTF-8, as you stated) using the Greek charset or some other encoding that does not have equivalents for those Unicode characters.
Question marks or similar symbols are shown when the byte (or bytes) have no mapping in the encoding you are using. You are parsing bytes that your client does not recognise as valid, so there is no representation for them.
I recommend that you read this very helpful article before you continue:
For example, you could encode the Unicode string for Hello (U+0048
U+0065 U+006C U+006C U+006F) in ASCII, or the old OEM Greek Encoding,
or the Hebrew ANSI Encoding, or any of several hundred encodings that
have been invented so far, with one catch: some of the letters might
not show up! If there's no equivalent for the Unicode code point
you're trying to represent in the encoding you're trying to represent
it in, you usually get a little question mark: ? or, if you're really
good, a box. Which did you get? -> �
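The "little question mark" from the quote can be reproduced with a few lines of standard Java. This is only a sketch: the Greek sample word and the class name are assumptions for illustration, and US-ASCII is used as the encoding that lacks the characters:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class UnmappableDemo {
    // Encode the text with the given charset and decode it again with the same charset.
    // String.getBytes substitutes the charset's replacement byte for unmappable characters.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String hello = "Hello";
        // Four Greek letters, written as escapes so this source file stays ASCII.
        String greek = "\u0393\u03B5\u03B9\u03B1";

        System.out.println(roundTrip(hello, StandardCharsets.US_ASCII));           // Hello
        System.out.println(roundTrip(greek, StandardCharsets.US_ASCII));           // ????
        System.out.println(roundTrip(greek, StandardCharsets.UTF_8).equals(greek)); // true
    }
}
```

If question marks like these end up stored in the database, the data was already lost at encoding time and changing the collation afterwards will not bring it back.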
Also, the problem is not in your database or table/column encoding. That would only apply if you were using stored procedures, for example.
Make sure your browser is operating in UTF-8 when entering or displaying information. In Chrome you can go to Tools > Encoding; Unicode (UTF-8) should be selected after you open your JSP in the browser. You can also debug the request/response in the Network tab. If you are entering data as UTF-8, read it as UTF-8.
You don't need to change the Tomcat config or use an HttpFilter to set the encoding if you are doing it in the JSP. I was able to reproduce your problem using only the following config:
ISO-8859-7 is a Greek encoding that I used to read Unicode text and reproduce your problem
<%@ page language="java" contentType="text/html; charset=ISO-8859-7" pageEncoding="ISO-8859-7"%>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-7">
</head>
</html>
If this does not help, please post the complete details and source code of your application.

Unicode chars are converted to broken symbols when I use wkhtmltopdf

I have HTML that contains some Unicode characters, and saved in "UTF-8" to disk. I can use less to display it, all characters displayed well:
<h1>什么是Action?</h1>
<p>Play程序接收到的大部分请求，都是由<code>Action</code>来处理的。
But when I use "wkhtmltopdf" to convert it to PDF, it shows broken characters:
My command is:
wkhtmltopdf --encoding utf-8 book.html book.pdf
How to fix this?

I finally found the reason: I didn't have Unicode fonts on my Ubuntu server.
I uploaded some TrueType fonts from my local Ubuntu machine to the server, and everything works fine.
freewind@freewind:/usr/share/fonts$ cd truetype/
freewind@freewind:/usr/share/fonts/truetype$ ls
arphic ttf-dejavu ttf-lao
freefont ttf-devanagari-fonts ttf-liberation
kochi ttf-gujarati-fonts ttf-malayalam-fonts
msttcorefonts ttf-indic-fonts-core ttf-oriya-fonts
openoffice ttf-japanese-gothic.ttf ttf-punjabi-fonts
sazanami ttf-japanese-mincho.ttf ttf-tamil-fonts
takao ttf-kacst-one ttf-telugu-fonts
thai ttf-kannada-fonts unfonts
ttf-bengali-fonts ttf-khmeros-core wqy
I simply uploaded them all and that fixed the problem, although I don't know which font was the key one.

I was having this problem too. It turned out that the HTML file had a meta tag that was setting the wrong charset.
E.g. the HTML file had
<head>
<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<style>
and the issue was resolved when I switched the charset to utf-8 instead, like so:
<head>
<meta http-equiv=Content-Type content="text/html; charset=utf-8">
<style>

Try
wkhtmltopdf-i386 book.html book.pdf

If you are on a MS Windows machine (the above answer is for X Windows font server), the following worked for me:
You can use YaHei or SimSun with wkhtmltoimage.
Explicitly set content using Chinese letters to the new font-family in your style:
.smsnotification_chinese {
font-size: 30px;
font-family: "Microsoft Yahei", SimSun;
}
This will work on stock US Windows machines. There is a more robust description of font fallbacks described here for others: Chinese Standard Web Fonts: A Guide to CSS Font Family Declarations for Web Design in Simplified Chinese.
Note: The wkhtmltoimage binary does not work on Azure worker machines due to GDI+ sandbox restrictions. You can get around this by writing your own web service wrapper or using this free wrapper: Convert HTML to PDF in .Net on Azure

How do I take off the XML version tag in the XOM library for Java?

I'm writing a small application in Java that uses XOM to output XHTML.
The problem is that XOM places the following tag before all the html:
<?xml version="1.0" encoding="UTF-8"?>
I've read their documentation, but I can't seem to find how to remove this tag. Thanks guys.
Edit: I'm outputting to a file using XOM's Serializer class
Follow up: If it is good practice to use the XML tag before the DOCTYPE, why don't any websites use it? Also, why does the W3C validator give me an error when it sees the XML tag? Here is the error:
Illegal processing instruction target (found xml)
Finally, if I were to put the XML tag before my DOCTYPE, does this mean I don't have to specify <meta charset="UTF-8" /> in my html header?

The tag is valid as XML and XHTML, and good practice. There should be no reason to remove it.
Just leave it there ... or fix whatever it is that is expecting it not to be there.
If you don't believe me, take a look at this excerpt from the XHTML 1.1 spec.
"Example of an XHTML 1.1 document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html version="-//W3C//DTD XHTML 1.1//EN"
xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/1999/xhtml
http://www.w3.org/MarkUp/SCHEMA/xhtml11.xsd"
>
<head>
<title>Virtual Library</title>
</head>
<body>
<p>Moved to example.org.</p>
</body>
</html>
Note that in this example, the XML declaration is included. An XML declaration like the one above is not required in all XML documents. XHTML document authors SHOULD use XML declarations in all their documents. XHTML document authors MUST use an XML declaration when the character encoding of the document is other than the default UTF-8 or UTF-16 and no encoding is specified by a higher-level protocol."
By the way, the W3C validation service says that is OK ... but if there is any whitespace before the <?xml ...?> tag it complains.

Does this work? This is listed in the Javadoc
protected void writeXMLDeclaration()
throws IOException
You could override it, and do nothing.....
Agreed you should normally output the prologue

Assuming you wish to serve your XHTML as text/html content type, you are right to want to remove the XML declaration, because if you don't, it will throw IE6 into quirks mode.
Overriding writeXMLDeclaration() as suggested by MJB looks like a good way to do it.
But you should be aware that you may well hit other problems using an XML serializer and serving the output as text/html.
Most likely, the output will contain a tag like this: <script src="myscript.js" />. Browsers (except Safari) won't treat that as a self-closing script tag, but as a script start tag, and everything that follows will be treated as part of the script and not rendered by the browser.
You will probably need to override your serializer to make it HTML aware to resolve this. I suggest overriding the writeEmptyElementTag() function, and for all elements with names not in the list "area", "base", "basefont", "bgsound", "br", "col", "command", "embed", "frame", "hr", "isindex", "image", "img", "input", "keygen", "link", "meta", "param", "source", "spacer" and "wbr", call writeStartTag() and then writeEndTag() instead of the default behaviour.
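The decision described above can be sketched as a small standalone helper. This is not XOM code: the class and method names are hypothetical, attributes are left out for brevity, and an overridden writeEmptyElementTag() would only consult something like mayBeSelfClosed() before choosing how to write the tag:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class HtmlEmptyElements {
    // Element names from the list in the answer: these may be written as <name />.
    private static final Set<String> SELF_CLOSING = new HashSet<>(Arrays.asList(
            "area", "base", "basefont", "bgsound", "br", "col", "command", "embed",
            "frame", "hr", "isindex", "image", "img", "input", "keygen", "link",
            "meta", "param", "source", "spacer", "wbr"));

    // True if an empty element with this name can safely be self-closed in text/html.
    static boolean mayBeSelfClosed(String elementName) {
        return SELF_CLOSING.contains(elementName.toLowerCase(Locale.ROOT));
    }

    // Renders an empty element without attributes the way an HTML-aware serializer would:
    // self-closed for the listed names, explicit start and end tags for everything else.
    static String renderEmpty(String elementName) {
        if (mayBeSelfClosed(elementName)) {
            return "<" + elementName + " />";
        }
        return "<" + elementName + "></" + elementName + ">";
    }

    public static void main(String[] args) {
        System.out.println(renderEmpty("br"));     // <br />
        System.out.println(renderEmpty("script")); // <script></script>
    }
}
```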
Finally, if I were to put the XML tag
before my DOCTYPE, does this mean I
don't have to specify <meta
charset="UTF-8" /> in my html header?
No it doesn't. When served as text/html, the XML declaration is simply ignored by browsers, so you will still need to provide the character encoding by some other means, either the meta tag, or in the HTTP headers.

jsp pages displaying junk characters in non english languages

I have a Main JSP page say jsp1 which includes two JSP pages (jsp2, jsp3). All the strings in these pages come from property files.
The non-english property files are converted using native2ascii
native2ascii -encoding="8859-1" lang.properties lang1.properties
All the JSP pages have
<%@ page contentType="text/html;charset=UTF-8" language="java" %>
Now when the main JSP page (jsp1) is displayed, we see garbled characters in a few strings of jsp2 and jsp3. So far I have seen this happen with Russian, Korean and Japanese strings, and it happens on seemingly random strings.
Does anyone have an idea what could be wrong?
Updating with more details
The string in rus_utf8.properties is
Щелкните <strong>УСТАНОВИТЬ СЕЙЧАС</strong> и сохраните файл в некотором расположении
After conversion using native2ascii, the string in rus.properties is
\u0429\u0435\u043b\u043a\u043d\u0438\u0442\u0435 <strong>\u0423\u0421\u0422\u0410\u041d\u041e\u0412\u0418\u0422\u042c \u0421\u0415\u0419\u0427\u0410\u0421</strong> \u0438 \u0441\u043e\u0445\u0440\u0430\u043d\u0438\u0442\u0435 \u0444\u0430\u0439\u043b \u0432 \u043d\u0435\u043a\u043e\u0442\u043e\u0440\u043e\u043c \u0440\u0430\u0441\u043f\u043e\u043b\u043e\u0436\u0435\u043d\u0438\u0438.
In JSP we use struts <s:text> to load the string from property file
In firefox the string got displayed as
��елкните УСТАНОВИТЬ СЕЙЧАС и сохраните файл в некотором расположении.
The char Щ got garbled. Same String in some other place in the page got displayed properly.

The non-english property files are converted using native2ascii
native2ascii -encoding="8859-1" lang.properties lang1.properties
This is invalid. It should have been
native2ascii -encoding ISO-8859-1 lang.properties lang1.properties
Apart from the syntax error which you have there (which should immediately have aborted native2ascii), the ISO-8859-1 encoding cannot possibly be correct for Russian, Korean and Japanese strings. The ISO-8859-1 encoding does not cover those characters at all. Assuming that you saved the file as UTF-8, you should be using
native2ascii -encoding UTF-8 lang.properties lang1.properties
This way native2ascii converts from a UTF-8 lang.properties to an ISO-8859-1 compatible lang1.properties. native2ascii always converts to ASCII. The -encoding option concerns the encoding of the source file, not the target file.
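What the conversion boils down to can be sketched in a few lines of Java. This is a rough approximation and not the tool itself: the class and method names are invented, and it only handles text that has already been decoded into a String (for example from a UTF-8 file). It may be handy on recent JDKs, which no longer ship native2ascii (the tool was removed in JDK 9):

```java
class AsciiEscaper {
    // Leaves ASCII characters untouched and replaces every other UTF-16 code unit
    // with a backslash, the letter u and four lowercase hex digits.
    static String toAsciiEscapes(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // First two letters of the Russian string from the question, plus an ASCII tag.
        String sample = "\u0429\u0435 <strong>";
        System.out.println(toAsciiEscapes(sample));
    }
}
```

Because the loop works on UTF-16 code units, characters outside the Basic Multilingual Plane come out as two escapes (a surrogate pair), which is the form a properties file expects.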
As to the JSP pages, just a
<%@page contentType="text/html; charset=UTF-8" %>
ought to be sufficient, per http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q8.
See also:
Unicode - How to get the characters right?
Update as per your update with the examples: everything is actually working right. It looks very much like the UTF-8 BOM (Byte Order Mark) is the culprit. Notepad adds it by default. Try creating the properties file in another editor instead, such as Eclipse.
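One way a BOM can disturb a properties file is by attaching itself to the first key. The sketch below assumes the BOM (U+FEFF) sits at the very start of the decoded text; the key name and the stripBom helper are hypothetical, and whether this is what happened in the question is not certain:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class BomDemo {
    static final char BOM = '\uFEFF';

    // Removes a single leading byte order mark, if present.
    static String stripBom(String text) {
        if (!text.isEmpty() && text.charAt(0) == BOM) {
            return text.substring(1);
        }
        return text;
    }

    // Parses properties from already decoded text.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String withBom = BOM + "login = ok";
        // The BOM is not whitespace for Properties, so it becomes part of the first key.
        System.out.println(parse(withBom).getProperty("login"));           // null
        System.out.println(parse(stripBom(withBom)).getProperty("login")); // ok
    }
}
```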

I have the same problem as you and I also tried the similar solutions but they didn't work. Hence I suspected that it may not be an issue with JSP config but rather it is a config issue with my tomcat.
I found this on a Chinese site, https://openhome.cc/Gossip/Encoding/Servlet.html: request.setCharacterEncoding("UTF-8");. It worked for me. I added this before my request.getParameter();.
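Why the call has to come before request.getParameter() can be illustrated without a servlet container. The sketch below uses java.net.URLDecoder as a stand-in for the container's parameter decoding, and it assumes a container that falls back to ISO-8859-1 when no request encoding has been set; the class name and sample value are made up:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class ParameterDecodingDemo {
    // Decodes a percent-encoded form value with the given charset name.
    static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Greek alpha and beta as a browser would submit them from a UTF-8 page.
        String raw = "%CE%B1%CE%B2";

        String asUtf8 = decode(raw, "UTF-8");
        String asLatin1 = decode(raw, "ISO-8859-1");

        System.out.println(asUtf8.equals("\u03B1\u03B2")); // true: two Greek letters
        System.out.println(asLatin1.length());             // 4: each byte became its own character
    }
}
```

The bytes are the same in both cases; only the charset chosen at decoding time differs, and that choice is made when the parameters are first read.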
