I use properties files for i18n. If I specify a non-English locale:
ResourceBundle messages = ResourceBundle.getBundle("i18n/Messages", new Locale("tr", "TR"));
I see the message with the wrong encoding:
ÐÑивеÑ!
How do I set the correct encoding?
Problem: By default, the ISO 8859-1 character encoding is used for reading the content of a properties file, so if the file contains any character beyond ISO 8859-1, it will not be processed properly.
First solution:
Process the properties file so that its content uses only the ISO 8859-1 character set, by using native2ascii - the Native-to-ASCII Converter.
What native2ascii does: it converts every non-ISO 8859-1 character into its \uXXXX equivalent. This is a good tool because you do not need to look up the \uXXXX equivalent of each special character yourself.
Usage for UTF-8: native2ascii -encoding utf8 e:\a.txt e:\b.txt
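For example, a greeting such as Привет! comes out as plain ASCII escapes after conversion (the key name greeting is assumed for illustration):
greeting=\u041F\u0440\u0438\u0432\u0435\u0442!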
If you use Eclipse, you will notice that it implicitly converts special characters into their \uXXXX equivalents. Try copying
会意字 / 會意字
into a properties file opened in Eclipse.
Second solution: Create a custom ResourceBundle.Control class which can be used with ResourceBundle to read properties in any given encoding scheme. You may want to use an already created and shared CustomResourceBundleControl.
Use it as below:
ResourceBundle messages = ResourceBundle.getBundle("i18n/Messages", new CustomResourceBundleControl("UTF-8"));
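Such a control class is not part of the JDK; a minimal sketch of what it could look like is shown below (the class name and constructor parameter mirror the usage above; adapt it to the shared implementation you actually use):

import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class CustomResourceBundleControl extends ResourceBundle.Control {

    private final String encoding;

    public CustomResourceBundleControl(String encoding) {
        this.encoding = encoding;
    }

    @Override
    public ResourceBundle newBundle(String baseName, Locale locale, String format,
                                    ClassLoader loader, boolean reload) throws IOException {
        // Resolve the .properties resource name for the requested locale.
        String bundleName = toBundleName(baseName, locale);
        String resourceName = toResourceName(bundleName, "properties");
        InputStream stream = loader.getResourceAsStream(resourceName);
        if (stream == null) {
            return null;
        }
        try (InputStreamReader reader = new InputStreamReader(stream, encoding)) {
            // PropertyResourceBundle(Reader) uses the reader's decoding
            // instead of assuming ISO-8859-1.
            return new PropertyResourceBundle(reader);
        }
    }
}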
Related
I want to read Hindi text from a lang.properties (java.util.Properties) file.
I am using the Eclipse IDE.
First of all, how can I save (or write) Hindi letters in a .properties file?
Secondly, how do I read the string from my Java class?
lang.properties
hindiText=साहिलसाहिल
Java Class
Properties prop = new Properties();
prop.load(MyClass.class.getClassLoader().getResourceAsStream("lang.properties"));
String hindi=prop.getProperty("hindiText");
It's not working.
As documented, Properties.load(InputStream) will always use the ISO-8859-1 encoding, and that encoding doesn't handle the characters you're interested in.
Options:
Wrap your stream in an InputStreamReader and specify the encoding explicitly (see the sketch after this list)
Use Unicode escaping (e.g. \u1234) in the file for any characters not in ISO-8859-1 (and make sure the file is saved as ISO-8859-1)
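A minimal sketch of the first option, assuming lang.properties is saved as UTF-8 and sits on the classpath (MyClass stands in for whichever class you load the resource from):

import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class MyClass {
    public static void main(String[] args) throws IOException {
        Properties prop = new Properties();
        try (InputStream in = MyClass.class.getClassLoader().getResourceAsStream("lang.properties");
             InputStreamReader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            // Properties.load(Reader) uses the reader's charset instead of the
            // ISO-8859-1 default that load(InputStream) applies.
            prop.load(reader);
        }
        System.out.println(prop.getProperty("hindiText"));
    }
}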
In Eclipse, I changed the default encoding to ISO-8859-1. Then I wrote this:
String str = "Русский язык ";
PrintStream ps = new PrintStream(System.out, true, "UTF-8");
ps.print(str);
It should print the String correctly, as I am specifying the UTF-8 encoding. However, it does not print correctly.
The ISO-8859-1 character encoding only supports characters between 0 and 255; anything else is likely to be turned into '?'.
If you save the source file (the .java file) as ISO-8859-1, then str will be encoded by javac using ISO-8859-1. Your problem does not lie in the creation of the PrintStream: the str you are printing is wrong from the beginning.
Yes, it looks like the terminal that you are sending this output to does not support this encoding.
If you are running Eclipse, you could set the encoding as follows:
In Run Configurations...->Common ->Encoding->Other
Select UTF-8
You are basically telling the PrintStream writer to expect the input characters to be UTF-8 encoded and to output them as UTF-8. There is no conversion. If you set your IDE to use ISO-8859-1 as the character encoding for your file, which in turn contains the input string, then you pipe ISO-8859-1 encoded characters into a writer that expects UTF-8. The writer then treats the bytes it receives as UTF-8 encoded characters, which results in junk data.
Either set your IDE to encode your source files in UTF-8 and check that your characters are displayed and stored correctly, or tell your writer to treat them as ISO-8859-1; either way should do.
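Another way to make the output independent of the source-file encoding is to write the literal as Unicode escapes; a minimal sketch (the class name EncodingDemo is just for illustration, and compiling with javac -encoding UTF-8 would be the equivalent fix for a UTF-8 source file):

import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class EncodingDemo {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // "Русский язык" written as Unicode escapes; javac reads these the same
        // way regardless of the file's on-disk encoding.
        String str = "\u0420\u0443\u0441\u0441\u043A\u0438\u0439 \u044F\u0437\u044B\u043A";
        PrintStream ps = new PrintStream(System.out, true, "UTF-8");
        ps.println(str);
    }
}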
I have a problem with Barcode4J and generating a DataMatrix barcode with ISO-8859-2 characters in the message.
Below is an example of using Barcode4J (version 2.1.0) from the command line. As you can see, when I use the message "żaba" I get the error "Message contains characters outside ISO-8859-1 encoding." Does the DataMatrix specification support ISO-8859-1 only, or is something missing in Barcode4J?
java -cp build/barcode4j.jar:lib/avalon-framework-4.2.0.jar:lib/commons-cli-1.0.jar org.krysalis.barcode4j.cli.Main -s datamatrix "żaba"
Exception in thread "main" java.lang.IllegalArgumentException: Message contains characters outside ISO-8859-1 encoding.
at org.krysalis.barcode4j.impl.datamatrix.DataMatrixHighLevelEncoder$EncoderContext.<init>(DataMatrixHighLevelEncoder.java:199)
at org.krysalis.barcode4j.impl.datamatrix.DataMatrixHighLevelEncoder.createEncoderContext(DataMatrixHighLevelEncoder.java:171)
at org.krysalis.barcode4j.impl.datamatrix.DataMatrixHighLevelEncoder.encodeHighLevel(DataMatrixHighLevelEncoder.java:119)
at org.krysalis.barcode4j.impl.datamatrix.DataMatrixLogicImpl.generateBarcodeLogic(DataMatrixLogicImpl.java:50)
at org.krysalis.barcode4j.impl.datamatrix.DataMatrixBean.generateBarcode(DataMatrixBean.java:128)
at org.krysalis.barcode4j.impl.ConfigurableBarcodeGenerator.generateBarcode(ConfigurableBarcodeGenerator.java:174)
at org.krysalis.barcode4j.cli.Main.handleCommandLine(Main.java:164)
at org.krysalis.barcode4j.cli.Main.main(Main.java:86)
As is described here, Barcode4J currently only supports the default character set defined by the DataMatrix specification (ISO-8859-1). Support for ECI hasn't been implemented for DataMatrix yet. You can, however, encode binary messages by encoding a byte stream as an RFC 2397 data URL. That byte stream could be a string encoded using UTF-8. The drawback: the reader might not be able to interpret the data correctly.
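A minimal sketch of the data-URL step only, assuming RFC 2397 syntax with a base64 payload; how exactly Barcode4J expects such a URL to be passed for binary messages should be checked against the documentation for your version:

import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DataUrlSketch {
    public static void main(String[] args) {
        String message = "\u017Caba"; // "żaba", written with an escape to stay encoding-safe
        byte[] utf8Bytes = message.getBytes(StandardCharsets.UTF_8);
        // RFC 2397 data URL wrapping the UTF-8 bytes as base64.
        String dataUrl = "data:;base64," + Base64.getEncoder().encodeToString(utf8Bytes);
        System.out.println(dataUrl); // prints data:;base64,xbxhYmE=
    }
}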
When I process a properties file with the Spanish characters ó and é, the characters are displayed as ?. I have tried different ways to fix this, but they still fail:
I tried to use \uxxxx
I tried to use InputStreamReader with encoding UTF-8
I tried to convert the string to bytes and then create a new String from those bytes:
new String( val.getBytes("UTF-8"), "UTF-8")
Nothing worked. What should I do next to fix this issue? Japanese and Russian are still OK.
The properties file needs to be in the proper encoding. By default, some IDEs like Eclipse save the content using CP1252, but you are reading the file as if it were UTF-8. The same applies to your Java source code.
If you try to use \uxxxx escapes but your application is working with CP1252 by default, the conversion of the escape code results in a bad character.
If you use an InputStreamReader to force reading as UTF-8, but your code and/or your file are not actually UTF-8, the result is again a bad character.
If you use a UTF-8 conversion of a string but your source code is CP1252, you will have the same problem.
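A small sketch illustrating that mismatch: the same UTF-8 bytes for ó read back differently depending on the charset used to decode them:

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MismatchDemo {
    public static void main(String[] args) {
        String accented = "\u00F3"; // ó, written as an escape so the demo itself
                                    // does not depend on the source-file encoding
        byte[] utf8Bytes = accented.getBytes(StandardCharsets.UTF_8); // 0xC3 0xB3
        System.out.println(new String(utf8Bytes, StandardCharsets.UTF_8));          // ó
        System.out.println(new String(utf8Bytes, Charset.forName("windows-1252"))); // Ã³
    }
}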
Related previous answer about source code: Should source code be saved in UTF-8 format
Notepad++ has a menu to view the file's encoding and change it: in the "Format" menu you can view the file as if it were opened with other encodings, or convert the file to another encoding such as UTF-8.
We are working on a project for school. The project must be tri-lingual (Dutch, English and French), so the answer "change to English" will not do.
All our classes and resource files are encoded in UTF-8 format, and all non-standard English characters are displayed correctly in the classes themselves.
The problem is that once we try to display our text, all non-standard English characters are distorted.
We hear a lot that this is due to an encoding issue, but I sincerely doubt that, since our whole project is encoded in UTF-8.
Here is an extract from the French resource bundle:
VIDEOSETTINGS = Réglages du Vidéo
SOUNDSETTINGS = Réglages du son
KEYBINDSETTINGS = Keybind Paramètres
LANGUAGESETTINGS = Paramètres de langue
DIFFICULTYSETTINGS = Paramètres de Difficulté
EXITSETTINGS = Sortie les paramètres
This results in the following displayed strings:
[screenshot: display result for the provided resource bundle extract]
I would be most grateful for a solution to this problem.
EDIT
For extra info: we are building a desktop app using Swing.
This is due to an encoding issue.
You are using the wrong decoder (probably ISO-8859-1) on UTF-8 encoded bytes.
Are these strings stored in a file? How are you loading the file? Via the Properties class? The Properties class always applies ISO-8859-1 decoding when loading the plain-text format from an InputStream. If you are using Properties, use the load(Reader) overload, switch to the XML format, or rewrite the file with the matching encoding. Also, if you are using ResourceBundle.getBundle() to load a properties file, you must use ISO-8859-1 encoding to write that file, escaping any non-Latin characters.
Since this is an encoding issue, it would be most helpful if you posted the code you have used to select the character encoding.
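In the meantime, here is a minimal sketch of the XML-format alternative mentioned above (the file name settings.xml is an assumption; the key VIDEOSETTINGS is taken from the bundle extract):

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class XmlPropertiesDemo {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        try (InputStream in = new FileInputStream("settings.xml")) {
            // loadFromXML honours the encoding declared in the XML prolog
            // (e.g. <?xml version="1.0" encoding="UTF-8"?>), so UTF-8 text
            // such as "Réglages du son" survives intact.
            props.loadFromXML(in);
        }
        System.out.println(props.getProperty("VIDEOSETTINGS"));
    }
}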
You didn't show the code where you read the resource files, but if you use PropertyResourceBundle with an InputStream in the constructor, the InputStream must be encoded in ISO-8859-1. In that case, characters that cannot be represented in ISO-8859-1 must be represented by Unicode escapes.
You can use native2ascii or AnyEdit as tools to convert properties files to Unicode escapes;
see Use cyrillic .properties file in eclipse project