Convert String from AS400 to Java

To communicate with the AS400 I use a Java web service built with the jt400 library; the web service runs under Linux.
The text returned after calling the AS400 program contains accented characters such as é, à, è, but in my XHTML page the text isn't displayed correctly; for example, é is replaced by {.
The AS400 is configured with CCSID 65535 and encoding 297.
When the same web service runs under Windows, the accented characters display correctly.
Thanks for your help.

You seem to have run into mojibake, caused by interpreting the bytes of the text in the wrong encoding. You mention é being replaced by {; the code point for é in CCSID 297 is 0xC0, which in CCSID 37 is {, so this makes sense.
I'm not sure where the data is coming from, but if you're using AS400Text to convert the data into a Java String object, you'll need to specify the correct CCSID or it will pick a CCSID based on the current locale. You can either specify the CCSID from AS400.getCcsid or the associated encoding string value from AS400.getJobCCSIDEncoding.
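The byte/CCSID relationship described above can be demonstrated with the JDK alone, without jt400, assuming your JDK ships the extended EBCDIC charsets (IBM297, IBM037; they live in the jdk.charsets module, which is included in standard JDK builds). This is an illustration of the mismatch, not the jt400 API:

```java
import java.nio.charset.Charset;

public class CcsidDemo {
    public static void main(String[] args) {
        // The single EBCDIC byte 0xC0 means different characters
        // depending on which code page is used to decode it.
        byte[] data = { (byte) 0xC0 };

        // CCSID 297 (EBCDIC France): 0xC0 is 'é'
        String france = new String(data, Charset.forName("IBM297"));
        // CCSID 37 (EBCDIC US/Canada): 0xC0 is '{'
        String us = new String(data, Charset.forName("IBM037"));

        System.out.println(france + " / " + us); // é / {
    }
}
```

With jt400 itself, the equivalent fix is to pass the CCSID explicitly when constructing the converter, e.g. new AS400Text(length, 297, system), rather than letting it default from the locale.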

Related

Trouble viewing an XML file encoded in UTF-8 with non-ASCII characters

I have an XML file which gets its data from an Oracle table of CLOB type. I write into the CLOB value using a Unicode character stream:
Writer value = clob.setCharacterStream(0L);
value.write(strValue);
When I write non-ASCII characters such as Chinese and then access the CLOB attribute using PL/SQL Developer, I see the characters showing up as they are. However, when I put the value in an XML file encoded in UTF-8 and try to open the XML file through IE, I get the error message
"an invalid character was found in text content. Error processing
resource ...".
The other interesting thing is that when I write into the CLOB using an ASCII stream, like:
OutputStream value = clob.getAsciiOutputStream();
value.write(strValue.getBytes("UTF-8"));
then the characters appear correctly in the XML in the browser, but are garbled in the DB when accessed using PL/SQL Developer.
Is there any problem in converting Unicode characters to UTF-8? Any suggestions, please?

Converting XML to JSON results in unknown characters when running on CentOS instead of Windows

I have a Java servlet which fetches RSS feeds and converts them to JSON. It works great on Windows, but it fails on CentOS.
The RSS feed contains Arabic, which shows up as unintelligible characters on CentOS. I am using these lines to encode the RSS feed:
byte[] utf8Bytes = Xml.getBytes("Cp1256");
// byte[] defaultBytes = Xml.getBytes();
String roundTrip = new String(utf8Bytes, "UTF-8");
I tried it on Glassfish and Tomcat. Both have the same problem: it works on Windows but fails on CentOS. What causes this, and how can I solve it?
byte[] utf8Bytes = Xml.getBytes("Cp1256");
String roundTrip = new String(utf8Bytes, "UTF-8");
This is an attempt to correct a badly-decoded string. At some point prior to this operation you have read in Xml using the default encoding, which on your Windows box is code page 1256 (Windows Arabic). Here you are encoding that string back to code page 1256 to retrieve its original bytes, then decoding it properly as the encoding you actually wanted, UTF-8.
On your Linux server, it fails, because the default encoding is something other than Cp1256; it would also fail on any Windows server not installed in an Arabic locale.
The commented-out line that uses the default encoding instead of explicitly Cp1256 is more likely to work on a Linux server. However, the real fix is to find where Xml is being read, and fix that operation to use the correct encoding(*) instead of the default. Allowing the default encoding to be used is almost always a mistake, as it makes applications dependent on configuration that varies between servers.
(*: for this feed, that's UTF-8, which is the most common encoding, but it may differ for others. Finding out the right encoding for a feed depends on the Content-Type header returned for the resource and the <?xml encoding declaration. By far the best way to cope with this is to fetch and parse the resource using a proper XML library that knows about this, for example with DocumentBuilder.parse(uri).)
There are many places where the wrong encoding can be used. Here is a complete list: http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q8
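The repair-after-the-fact round trip the answer describes can be reproduced with the JDK's two guaranteed charsets; here ISO-8859-1 stands in for whatever wrong default decoded the bytes (Cp1256 in the question):

```java
import java.nio.charset.StandardCharsets;

public class MojibakeRoundTrip {
    public static void main(String[] args) {
        String original = "é"; // UTF-8 bytes: 0xC3 0xA9

        // Step 1: somewhere upstream, UTF-8 bytes were decoded with
        // the wrong (default) charset, producing mojibake.
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        String badlyDecoded = new String(utf8Bytes, StandardCharsets.ISO_8859_1);

        // Step 2: the repair — re-encode with the same wrong charset
        // to recover the original bytes, then decode them as UTF-8.
        byte[] recovered = badlyDecoded.getBytes(StandardCharsets.ISO_8859_1);
        String roundTrip = new String(recovered, StandardCharsets.UTF_8);

        System.out.println(badlyDecoded + " -> " + roundTrip); // Ã© -> é
    }
}
```

This trick only works when every original byte survives the wrong decoding intact, which is not guaranteed for all charsets; the real fix remains reading the stream with the right encoding in the first place, e.g. letting an XML parser such as DocumentBuilder.parse honor the declared encoding.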

UTF-8 characters not displayed correctly

We are working on a project for school. The project is required to be tri-lingual (Dutch, English and French), so the answer "change it to English" will not do.
All our classes and resource files are encoded in UTF-8, and all non-standard English characters are displayed correctly in the classes themselves.
The problem is that once we try to display our text, all non-standard English characters are distorted.
We hear a lot that this is due to an encoding issue, but I sincerely doubt that, since our whole project is encoded in UTF-8.
here is extract from the french resource bundle:
VIDEOSETTINGS = Réglages du Vidéo
SOUNDSETTINGS = Réglages du son
KEYBINDSETTINGS = Keybind Paramètres
LANGUAGESETTINGS = Paramètres de langue
DIFFICULTYSETTINGS = Paramètres de Difficulté
EXITSETTINGS = Sortie les paramètres
and this results in the following displayed strings:
(screenshot: display result for the provided resource bundle extract)
I would be most grateful for a solution to this problem.
EDIT
For extra info: we are building a desktop app using Swing.
This is due to an encoding issue.
You are using the wrong decoder (probably ISO-8859-1) on UTF-8 encoded bytes.
Are these strings stored in a file? How are you loading the file? Via the Properties class? The Properties class always applies ISO-8859-1 decoding when loading the plain-text format from an InputStream. If you are using Properties, use the load(Reader) overload, switch to the XML format, or rewrite the file in the matching encoding. Likewise, if you are using ResourceBundle.getBundle() to load a properties file, you must write that file in ISO-8859-1, escaping any non-Latin characters.
Since this is an encoding issue, it would be most helpful if you posted the code you have used to select the character encoding.
You didn't show the code where you read the resource files. But if you use PropertyResourceBundle with an InputStream in the constructor, the stream must be encoded in ISO-8859-1. In that case, characters that cannot be represented in ISO-8859-1 must be written as Unicode escapes.
You can use native2ascii or AnyEdit as tools to convert properties files to Unicode escapes;
see Use cyrillic .properties file in eclipse project
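A minimal sketch of the Reader-based route, assuming the .properties file on disk really is UTF-8 (the key and file name here mirror the bundle extract above, for illustration):

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.PropertyResourceBundle;

public class Utf8BundleDemo {
    public static void main(String[] args) throws IOException {
        // Write a UTF-8 properties file (stand-in for the French bundle).
        File f = File.createTempFile("messages_fr", ".properties");
        try (Writer w = new OutputStreamWriter(
                new FileOutputStream(f), StandardCharsets.UTF_8)) {
            w.write("SOUNDSETTINGS = Réglages du son\n");
        }

        // Load it through a Reader with an explicit charset, so the
        // ISO-8859-1 default of the InputStream path never applies.
        PropertyResourceBundle bundle;
        try (Reader r = new InputStreamReader(
                new FileInputStream(f), StandardCharsets.UTF_8)) {
            bundle = new PropertyResourceBundle(r);
        }

        System.out.println(bundle.getString("SOUNDSETTINGS")); // Réglages du son
        f.delete();
    }
}
```

Note that from Java 9 onwards, ResourceBundle.getBundle loads properties-format bundles as UTF-8 by default (JEP 226), so the ISO-8859-1 restriction only applies to older runtimes and to Properties.load(InputStream).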

Swedish text (vår, än, tall) in a sent mail shows question marks (?) in place of accented characters in a Linux environment

I read a template which contains Swedish text (vår, än, tall), then I try to send a mail.
But in the received mail, a question mark (?) appears in place of each special or accented character.
In Windows it works fine, but in Linux it does not.
I have used the following content types:
text/html; charset=ISO-8859-1 (works in Windows but not in Linux)
text/html;charset=utf-8 (works in Windows but not in Linux)
text/x-vcard; charset=utf-8
text/plain; charset=ISO-8859-1; format=flowed
I set the content type as CONTENT_TYPE = "text/html;charset=utf-8";
Is there any solution to get a proper mail?
private final static String CONTENT_TYPE = "text/html; charset=ISO-8859-1";
Message msg = new MimeMessage(session);
msg.setContent(message, CONTENT_TYPE);
System.setProperty("file.encoding", "UTF-8"); // note: has no effect once the JVM is running
I am getting ? in place of accented characters in the Linux environment.
If you want your String to be encoded correctly, you should use one of the methods:
MimeMessage.setText(String text, String charset)
MimeMessage.setText(String text, String charset, String subtype)
which allow you to specify the charset of your text (e.g. "utf-8") and additionally the subtype (e.g. "plain", as in "text/plain").
Actually the problem was in reading the text: the InputStreamReader was reading the file with the UTF-8 character set, which Linux provides as the default. That's why question marks were observed in place of accented characters. On Windows, the InputStreamReader used windows-1252 as the character set, which also supports Swedish text; that's why the mail arrived properly on Windows. So I learned to always look for the root cause: the problem was not in sending the mail but in reading the file.
For future reference, Analysis of the issue is as follows:
Root Cause:
Java's InputStreamReader falls back to the operating system's default character set if one is not specified explicitly.
When trying to read the file using Java's InputStreamReader:
• On UNIX systems: the default character set is UTF-8, which cannot decode the Latin-1 encoded Swedish characters in the file.
• On Windows systems: the default character set is windows-1252 (Cp1252), which supports Latin characters.
Solution:
Set the correct character set explicitly when constructing the InputStreamReader used to read the file from the file system. This avoids falling back to the operating system's default character set.
Use constructor (suggested):
InputStreamReader(InputStream in, String charsetName)
Generally used constructor (avoid for supporting different charsets):
InputStreamReader(InputStream in)
Sample code:
new InputStreamReader(bfInputStream, CHARACTER_SET);
where CHARACTER_SET = "ISO-8859-1"
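Put together, the fix sketched above looks like this; ISO-8859-1 is assumed to be the template file's actual encoding, as in the analysis, and the file name is illustrative:

```java
import java.io.*;
import java.nio.charset.Charset;

public class TemplateReaderDemo {
    private static final Charset CHARACTER_SET = Charset.forName("ISO-8859-1");

    public static void main(String[] args) throws IOException {
        // Write a Latin-1 encoded template containing Swedish text.
        File template = File.createTempFile("mail", ".html");
        try (OutputStream out = new FileOutputStream(template)) {
            out.write("vår än tall".getBytes(CHARACTER_SET));
        }

        // Read it back with the charset stated explicitly, so the platform
        // default (UTF-8 on Linux, Cp1252 on Windows) is never consulted.
        StringBuilder sb = new StringBuilder();
        try (Reader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(template), CHARACTER_SET))) {
            int c;
            while ((c = in.read()) != -1) sb.append((char) c);
        }

        System.out.println(sb); // vår än tall
        template.delete();
    }
}
```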

Accents not shown on j2me

I'm building a J2ME app in French, but it doesn't show certain strings correctly. For instance: "Cette page donne un aperçu des dernières nouvelles" becomes "Cette page donne un eperÃϨ"res nouvelles".
Does anyone know why this is?
That funny ÃϨ string is what UTF-8 looks like when displayed in a program that's expecting ASCII or Windows-1252. Are you sure the software on the phone is set to UTF-8, and/or the encoding header on the data stream matches the actual encoding?
For example, if this is XML and the header said Windows-1252 but the actual encoding was UTF-8, and the phone software respected the header, this would be the result.
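The symptom can be reproduced with the JDK's guaranteed charsets; ISO-8859-1 plays the part of the phone's ASCII/Windows-1252 assumption (the two agree for the bytes involved here):

```java
import java.nio.charset.StandardCharsets;

public class PhoneMojibakeDemo {
    public static void main(String[] args) {
        String original = "aperçu";

        // The server sends UTF-8 bytes...
        byte[] sent = original.getBytes(StandardCharsets.UTF_8);

        // ...but the receiving side decodes them as Latin-1 / Windows-1252,
        // turning each multi-byte UTF-8 sequence into two garbage characters.
        String displayed = new String(sent, StandardCharsets.ISO_8859_1);

        System.out.println(displayed); // aperÃ§u
    }
}
```

Fixing either side of the mismatch (declaring UTF-8 in the header, or decoding as UTF-8 on the phone) makes the accents come through intact.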
