Read Japanese Characters from a properties file using Resource Bundle

Read Japanese Characters from a properties file using Resource Bundle - java

I am working on developing a project and when I attempt to read in a Japanese character string from a properties file using ResourceBundle in Java, all I get are a bunch of question marks (??????????) displayed in my application.
I know that I need to encode the string somehow, but I've tried several different ways with no luck. I've also tried to encode the properties file in a text editor but this doesn't seem to work how I'd like either. My code for retrieving the string is below. I know you can do this with stream readers and other methods but I would like to try to stay with the current code structure using ResourceBundle if possible. If anyone has any questions or needs clarification please feel free to let me know.
private final ResourceBundle bundle = ResourceBundle.getBundle("propertiesFile");
titleText = bundle.getString("title.text");
Just to give you an idea, title.text=会議参加者事前登録用ページ in my property file.
Any suggestions are appreciated. This has turned into a real pain.
Thanks again,
-Dave F.

ResourceBundle can only read ISO 8859-1 file. See this thread. How to use UTF-8 in resource properties with ResourceBundle
I had the same problem here, and I've converted my ResourceBundle to a Properties object like in the answer of this thread: read resourcebundle as UTF-8. getString() Method seems to change encoding to ISO-8859
Like this:
InputStream utf8in = getClass().getClassLoader().getResourceAsStream("/path/to/utf8.properties");
Reader reader = new InputStreamReader(utf8in, "UTF-8");
Properties props = new Properties();
props.load(reader);

After some research I was actually still able to make use of the ResourceBundle method.
In my properties file that ResourceBundle pulls from, I listed the following in unicode:
title.text=\u4f1a\u8b70\u53c2\u52a0\u8005\u4e8b\u524d\u767b\u9332\u7528\u30da\u30fc\u30b8
When pulled in by ResourceBundle, it translates as:
会議参加者事前登録用ページ
I'm sure this probably isn't the best practice, but it works without having to change how the entire project pulls in its resource properties, so I just thought I would share with you all for helping me out.

when you run your app you can try this:
java -Dfile.encoding=UTF-8 -jar YourApp.jar
this basically sets the default charset

Related

Creating a text file with java without using absolute path

following the question I asked before How to have my java project to use some files without using their absolute path? I found the solution but another problem popped up in creating text files that I want to write into.here's my code:
private String pathProvider() throws Exception {
//finding the location where the jar file has been located
String jarPath=URLDecoder.decode(getClass().getProtectionDomain().getCodeSource().getLocation().getPath(), "UTF-8");
//creating the full and final path
String completePath=jarPath.substring(0,jarPath.lastIndexOf("/"))+File.separator+"Records.txt";
return completePath;
}
public void writeRecord() {
try(Formatter writer=new Formatter(new FileWriter(new File(pathProvider()),true))) {
writer.format("%s %s %s %s %s %s %s %s %n", whichIsChecked(),nameInput.getText(),lastNameInput.getText()
,idInput.getText(),fieldOfStudyInput.getText(),date.getSelectedItem().toString()
,month.getSelectedItem().toString(),year.getSelectedItem().toString());
successful();
} catch (Exception e) {
failure();
}
}
this works and creates the text file wherever the jar file is running from but my problem is that when the information is been written to the file, the numbers,symbols, and English characters are remained but other characters which are in Persian are turned into question marks. like: ????? 111 ????? ????.although running the app in eclipse doesn't make this problem,running the jar does.
Note:I found the code ,inside pathProvider method, in some person's question.

Your pasted code and the linked question are complete red herrings - they have nothing whatsoever to do with the error you ran into. Also, that protection domain stuff is a hack and you've been told before not to write data files next to your jar files, it's not how OSes (are supposed to) work. Use user.home for this.
There is nothing in this method that explains the question marks - the string, as returned, has plenty of issues (see above), but NOT that it will result in question marks in the output.
Files are fundamentally bytes. Strings are fundamentally characters. Therefore, when you write code that writes a string to a file, some code somewhere is converting chars to bytes.
Make sure the place where that happens includes a charset encoding.
Use the new API (I think you've also been told to do this, by me, in an earlier question of yours) which defaults to UTF-8. Alternatively, specify UTF-8 when you write. Note that the usage of UTF-8 here is about the file name, not the contents of it (as in, if you put persian symbols in the file name, it's not about persian symbols in the contents of the file / in the contents you want to write).
Because you didn't paste the code, I can't give you specific details as there are hundreds of ways to do this, and I do not know which one you used.
To write to a file given a String representing its path:
Path p = Paths.get(completePath);
Files.write("Hello, World!", p);
is all you need. This will write as UTF_8, which can handle persian symbols (because the Files API defaults to UTF-8 if you specify no encoding, unlike e.g. new File, FileOutputStream, FileWriter, etc).
If you're using outdated APIs: new BufferedWriter(new OutputStreamWriter(new FileOutputStream(thePath), StandardCharsets.UTF-8) - but note that this is a resource leak bug unless you add the appropriate try-with-resources.
If you're using FileWriter: FileWriter is broken, never use this class. Use something else.
If you're converting the string on its own, it's str.getBytes(StandardCharsets.UTF_8), not str.getBytes().

Java saving a file with special characters in file name

I'm having a problem on Java file encoding.
I have a Java program will save a input stream as a file with a given file name, the code snippet is like:
File out = new File(strFileName);
Files.copy(inStream, out.toPath());
It works fine on Windows unless the file name contains some special characters like Ö, with these characters in the file name, the saved file will display a garbled file name on Windows.
I understand that by applying JVM option -Dfile.encoding=UTF-8 this issue can be fixed, but I would have a solution in my code rather than ask all my users to change their JVM options.
While debugging the program I can see the file name string always shows the correct character, so I guess the problem is not about internal encoding.
Could someone please explain what went wrong behind the scene? and is there a way to avoid this problem programmatically? I tried get the bytes from the string and change the encoding but it doesn't work.
Thanks.

Using the URLEncoder class would work:
String name = URLEncoder.encode("fileName#", "UTF-8");
File output = new File(name);

java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key

I don't know whats going on.
Here is my Titles_en_US.properties file:
WEBSITE.TITLE = Hello World
FOOTER.DISCLAIMER = Disclaimer
FOOTER.TERMS_OF_USE = Terms of Use
FOOTER.PRIVACY_POLICY = Privacy Policy
Here is my method:
private String getTitle() throws Exception {
System.out.println("\n\n==>"+getProperty("FOOTER.DISCLAIMER",LabelsFile()));
return getProperty("WEBSITE.TITLE", LabelsFile());
}
Both FOOTER.DISCLAIMER and WEBSITE.TITLE in same properties file but one is working and other is throwing following error:
==>Disclaimer
Resources.ResourceBundle.java:getProperty()
java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key WEBSITE.TITLE
Please advise where I am making mistake?
UPDATE
I noticed that when I give line break then its working fine. Not able to understand why resource bundle is not reading from 1st line of property file?

I found that when I give a line break at top of my property file then it started work fine. Then I searched why the java ResourceBundle is not reading first line of the property file and found this POST. In this post said that:
The load(Reader) / store(Writer, String) methods load and store
properties from and to a character based stream in a simple
line-oriented format specified below. The load(InputStream) /
store(OutputStream, String) methods work the same way as the
load(Reader)/store(Writer, String) pair, except the input/output
stream is encoded in ISO 8859-1 character encoding.
and also in the aforementioned post advised to change the encoding of the properties file to ISO-8859-1.

First key from Language properties file not accessible

I have a language properties file with around 3000+ keys. When I try to read the value of a key using
ResourceBundle messages = ResourceBundle.getBundle("com.mt.asm.language.MessagesBundle", locale);, I see that the first key alone is missing in the messages bundle.
I try to retrieve the value using:
String value = new String(messages.getString(key).getBytes("ISO-8859-1") , "UTF-8");
I tried a lot to identify the root cause, but my tries were of no use.
What could be the possible reason for this strange behaviour.

I was able to find the cause of the problem.
The properties file was in UTF-8 BOM format which was expecting Byte Order Mark in the file. This has been solved by using a properties file in UTF-8 format.

getBytes() With UTF-8 Doesn't Work for Upper-Case German Umlauts

For development I'm using ResourceBundle to read a UTF-8 encoded properties-file (I set that in Eclipse' file properties on that file) directly from my resources-directory in the IDE (native2ascii is used on the way to production), e.g.:
menu.file.open.label=&Öffnen...
label.btn.add.name=&Hinzufügen
label.btn.remove.name=&Löschen
Since that causes issues with the character encoding when using non-ASCII characters I thought I'd be happy with:
ResourceBundle resourceBundle = ResourceBundle.getBundle("messages", Locale.getDefault());
String value = resourceBundle.getString(key);
value = new String(value.getBytes(), "UTF-8");
Well, it does work nicely for lower-case German umlauts, but not for the upper-case ones, the ß also doesn't work. Here's the value read with getString(key) and the value after the conversion with new String(value.getBytes(), "UTF-8"):
&LÃ¶schen => &Löschen
&HinzufÃ¼gen => &Hinzufügen
&Ã?ber => &??ber
&SchlieÃ?en => &Schlie??en
&Ã?ffnen... => &??ffnen...
The last three should be:
&Ã?ber => &Über
&SchlieÃ?en => &Schließen
&Ã?ffnen... => &Öffnen...
I guess that I'm not too far away from the truth, but what am I missing here?
Google found something similar, but that remained unanswered.
EDIT: a little more code

The problem is you're calling String.getBytes() without specifying an encoding - which will use the default platform encoding. You're then using the binary result of that operation as if it were in UTF-8.
If you use UTF-8 in both directions, it'll be fine:
// Should be a round-trip
value = new String(value.getBytes("UTF-8"), "UTF-8");
... but if you were trying to use this to read a UTF-8-encoded property file without telling the code which is performing the initial read, that won't work.
The code you've presented is basically always the wrong approach. Your "Since that causes issues with the character encoding" suggests that you'd already run across an earlier problem - so I'd go back to that, instead of trying to apply a broken fix. If you've already lost data when constructing the ResourceBundle, it's too late to go back later... you need to make sure the ResourceBundle itself is loaded correctly.
Please tell us exactly what problems you had with the ResourceBundle, and we can see if we can fix the root cause.
EDIT: It's not clear how you're running native2ascii. The fix may be as simple as changing to use:
native2ascii -encoding UTF-8 input.properties output.properties

Some notes:
If it is a String it is UTF-16 and if it isn't it is a corrupt string (and too late to fix.)
new String(value.getBytes(), "UTF-8"); - this code will (at best) do nothing on a system that uses UTF-8 as the default encoding; otherwise it will corrupt the string.
.properties files must be ISO 8859-1 (the Properties type supports other formats and encodings, but I don't know how you would tell ResourceBundle that.)
System.out can introduce its own transcoding bugs (the PrintStream encodes UTF-16 strings to the default encoding; the receiving device must decode the bytes using the same encoding.)
I suspect you are trying to fix your problems in the wrong place.

You are encoding the text with a different encoding to the one you are decoding with.
Try instead using the same character set for encoding and decoding.
value = new String(value.getBytes("UTF-8"), "UTF-8");
String s = "ßßßßß";
s += s.toUpperCase();
s = new String(s.getBytes("UTF-8"), "UTF-8");
System.out.println(s);
prints
ßßßßßSSSSSSSSSS

Today I was talking to one of my colleagues and he was pretty much on the same path as the other answers have mentioned. So I tried to achieve what Jon Skeet had mentioned, meaning creating the same file as in production. Since rebuilding the project after each change of a resource is out of question and I hadn't done any of what solved this (and I guess this will be new to some) let me line it out (even if it may be just for personal reference ;) ). In short this uses Eclipse' project builders.
Create an Ant-style build.xml
<?xml version="1.0" encoding="UTF-8"?>
<project>
<property name="dir.resources" value="src/main/resources" />
<property name="dir.target" value="bin/main" />
<target name="native-to-ascii">
<delete dir="${dir.target}" includes="**/*.properties" />
<native2ascii src="${dir.resources}" dest="${dir.target}" includes="**/*.properties" />
</target>
</project>
Its intention is to delete the properties-files in the target directory and use native2ascii to recreate them. The delete is necessary as native2ascii won't overwrite existing files.
In Eclipse go to the project properties and select "Builders", click "New...", pick "Ant Builder" (that's the slightly enhanced editor for run configurations)
In "Main" let "Buildfile" point to the Ant-script, set "Base Directory" to ${project_loc}
In "Refresh" tick "Refresh resources upon completion" and pick "The project containing the selected resource"
In "Targets" click "Set Targets" next to the "Auto Build" and pick native-to-ascii there (note that for some reason I had to do this later again)
This might not be necessary for everybody, but in "JRE" pick a proper execution environment
In "Build Options" tick off "Allocate Console" (however, you may want to keep this ticked on until you see that it's all working)
"Apply", "OK"
I was told that the newly created builder should be somewhere underneath the Java Builder (use Up/Down-button)
In the "Java Build Path" select the source folder with the resources (src/main/resources for me) and add an exclusion for **/*.properties
That should have been it. If you edit a properties-file and save it, it should automatically be converted to ASCII in the output folder. You can try with entering ü, which should end up as \u00fc.
Note that if you have a lot of properties-files, this may take some time. Just don't save after every keypress. :)

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.