Trouble getting accents to show up in my Java app

We recently got a localization file that contains Portuguese translations of all the strings in our Java app. The file they gave me was a .csv file, and I used FileMaker to create the .tab file that we need for our purposes. Unfortunately none of the accents seem to work. For example, the string vocÍ in our localization file shows up as vocΩ inside the application. I tried switching the language settings to Portuguese before compiling, but I still get this problem. Anyone have any ideas of what else I might try?

I think that your problem is related to the file encoding being used.
Java has full Unicode support, so there shouldn't be any problems, unless the file you are reading (the one made with FileMaker) is encoded in something different from what Java expects when reading it (note that before Java 18 the default charset is platform-specific, not necessarily UTF-8).
You can try saving the file in a different encoding, or specifying which encoding to use when opening it from Java (see here). Many API classes accept extra parameters to specify which charset to use when opening a file. Just take a look at the documentation.
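As a minimal sketch of the second option, assuming the FileMaker export turned out to be Latin-1 (the file name and charset here are placeholders to adjust):

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;

// Read the .tab file with an explicit charset instead of the platform default.
Charset sourceEncoding = Charset.forName("ISO-8859-1"); // assumption: verify the real encoding
try (BufferedReader reader = Files.newBufferedReader(Paths.get("strings.tab"), sourceEncoding)) {
    String line;
    while ((line = reader.readLine()) != null) {
        // process each localized string here
    }
} catch (IOException e) {
    e.printStackTrace();
}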

Related

Java PC application - exported JAR does not behave as in development

I have a classic Java application for PC. The result of the build is a JAR file which runs on a Windows machine.
The application reads some XML files and creates an HTML document as the end result. The XML files contain language-specific characters that are not native to English.
While in development, in the IDE (Apache NetBeans 13), Build -> Run produces an HTML file that contains the language-specific characters.
When I run the JAR file from the Project -> dist directory, the HTML does not contain them.
For example, characters like č, ć, đ, š are exported as Ä�, while when running from NetBeans they are exported as such, not as that strange symbol.
The letters in question are from Serbian, Croatian and Bosnian.
When I export the project from NetBeans, I made sure to have this option enabled:
Project -> Project properties -> Build -> Packaging where the "Copy Dependent Libraries" option is selected.
I am puzzled at this point. If anybody has any idea why something works one way in the IDE and another way when exported, please let me know.
The likely problem is that your HTML file needs to identify its character encoding. Nowadays, generally best to use UTF-8 as the encoding for most purposes.
Determine the file’s encoding
If you have access to the source code of your Java app, examine that to see what character encoding is being used when producing the HTML file. But I assume you have no such access.
Open the HTML file in a text-editor to examine its raw source code. See if it specifies a character encoding. If it does, and that character encoding indicator is incorrect, you will need to alter your HTML file.
If no character encoding is indicated within the HTML, you will need to experiment to discover the encoding. Open the HTML file in a web browser, then use the "view" or developer tools available in most browsers (Firefox, Safari, Edge, etc.) to explicitly switch between encodings.
If switching to a particular encoding causes the text to appear as expected, then you know the likely encoding.
Specify the file’s encoding
In the modern version of HTML, HTML5, UTF-8 is the default encoding assumed by the web browser. But if the web browser switches into Quirks Mode, it may assume another encoding. To help avoid Quirks Mode, an HTML5 document should start with <!DOCTYPE html>.
So, best to be explicit about the encoding. Once you determine the encoding being used by your Java app creating the HTML file, either alter that app (if you have source code) to write an indicator of the encoding, or else write another Java app to edit the produced HTML file to include the indicator. If you are not a Java developer, you could use any programming language or even a shell script to edit the produced HTML file.
To indicate the encoding of an HTML5 file, add a meta element.
For UTF-8:
<meta charset="UTF-8">
For Latin-1:
<meta charset="ISO-8859-1">
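If you end up editing the produced file yourself, a rough sketch in Java might look like the following; the file name and the windows-1252 charset are assumptions, so substitute whatever encoding you actually determined:

import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Re-read the HTML in the encoding you discovered, inject the meta tag,
// and write it back in that same encoding so the declaration matches the bytes.
// (Throws IOException; handle or declare as needed.)
Charset actual = Charset.forName("windows-1252"); // assumption: adjust to your finding
Path file = Paths.get("report.html"); // placeholder path
String html = new String(Files.readAllBytes(file), actual);
html = html.replaceFirst("(?i)<head>", "<head><meta charset=\"windows-1252\">");
Files.write(file, html.getBytes(actual));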
If your Java app was developed exclusively on Microsoft Windows, the developer may have knowingly or unwittingly used one of the Microsoft-defined character encodings. Older versions of Java defaulted to a character encoding specific to the host platform, but be aware that in Java 18+ the default changes to UTF-8 across platforms.
For more info
You can read about these issues in many places. Like here and in Wikipedia.
If you are not savvy with character sets and character encoding, I highly recommend reading the surprisingly entertaining article, The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!), by Joel Spolsky.

Special characters with MySQL, Java, NetBeans and JXL

I'm having trouble with special characters like í, ó, ñ, and so on in my MySQL database.
I've read a lot about character sets, collations, UTF and so on, but I might not be applying it properly to this problem. I'm not sure whether the problem is in my tables or in reading the source file and its encoding. I'm working in NetBeans, developing an interface that generates specific reports.
Here I briefly explain the process where the issue occurs:
There is a table that stores the file paths of the requested Excel files.
My code takes that path and opens the file using the JXL library.
Then it reads every cell indicated in the code and exports the data to the indicated tables (this is where every special character appears as � in the tables).
The data is correctly exported to the several tables, and there is no problem with that; it is only the special characters that get replaced with �.
So, after researching, I've tried the following:
As I'm using MySQL Workbench, I've altered every table's collation from utf8 - default to utf8 - utf8_spanish_ci.
I've also tried changing the collation to utf16 - default collation, utf16_spanish_ci, bin, etc.
And I've also tried using the utf32 collation.
The NetBeans encoding is correctly set to UTF-8.
I've explored the Java code that connects to MySQL. I'm using this connection, but I haven't found anything about encoding:
private Connection conecta = null;
String Driver = "com.mysql.jdbc.Driver";
String Url = "jdbc:mysql://XXX.XXX.XXX.XX/BLAABLA";
Class.forName(Driver);
conecta = DriverManager.getConnection(Url, User, Password);
When I manually insert data containing special characters into the database, it displays correctly whenever I need it. But when I use the automatic insert option described above, which reads the file with the JXL library and so on, the � thing happens.
I've found some useful answers here on Stack Overflow, but they all relate to specific cases of PHP, HTML, XML and so on. All the answers about Java concern the utf8 collation.
So I hope you can provide me some help.
Am I not setting the collation correctly? Should I try something directly in the console? Is MySQL Workbench forbidding something?
I'm open to all kinds of suggestions, but if the answer is "You must use another library because JXL does not work with that", please consider that my project is almost done, and redoing it with a different library would take much more time than I have planned for. If JXL is the problem, there must surely be something else I can do. Nothing is impossible, right?
Thanks for your time.
Legacy Excel (.xls) files are not UTF-8; by default they use a Windows code page (for Western European text that is typically windows-1252, a.k.a. Cp1252). So when you read those bytes from the Excel file you need to decode them with that charset and then store them as UTF-8.
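As a rough sketch of both halves of that, assuming Cp1252 is right for your files; the path, table and column names are placeholders, and User/Password are your existing variables:

import java.io.File;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import jxl.Sheet;
import jxl.Workbook;
import jxl.WorkbookSettings;

// Tell JXL which code page the .xls file uses (assumption: Cp1252).
// (Throws BiffException, IOException, SQLException; handle or declare as needed.)
WorkbookSettings settings = new WorkbookSettings();
settings.setEncoding("Cp1252");
Workbook workbook = Workbook.getWorkbook(new File("/path/to/report.xls"), settings);
Sheet sheet = workbook.getSheet(0);
String value = sheet.getCell(0, 0).getContents();

// Ask Connector/J to exchange text with MySQL as UTF-8.
Connection conecta = DriverManager.getConnection(
        "jdbc:mysql://XXX.XXX.XXX.XX/BLAABLA?useUnicode=true&characterEncoding=UTF-8",
        User, Password);
PreparedStatement ps = conecta.prepareStatement("INSERT INTO mytable (col) VALUES (?)");
ps.setString(1, value);
ps.executeUpdate();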

DataEditorSupport and/or CloneableEditor using UTF-8

I have a NetBeans Platform program that uses a custom DataEditorSupport and a custom CloneableEditor. The files it reads are UTF-8 encoded, but the editor that is created does not seem to be using UTF-8.
For example my file has
"TêSt"
and the editor displays this as
"TêStÃ"
How can I get the DataEditorSupport or CloneableEditor to correctly read UTF-8?
This FAQ entry in the NetBeans wiki might be of help. See also the General Queries API and, in particular, the FileEncodingQuery.
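A rough, untested sketch of the FileEncodingQuery route, assuming the standard NetBeans Platform SPI; the file-extension check is a placeholder for however your DataObject recognizes its files:

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import org.netbeans.spi.queries.FileEncodingQueryImplementation;
import org.openide.filesystems.FileObject;
import org.openide.util.lookup.ServiceProvider;

// Register an encoding query so the platform (and DataEditorSupport)
// resolves these files as UTF-8 instead of the default encoding.
@ServiceProvider(service = FileEncodingQueryImplementation.class)
public class Utf8EncodingQuery extends FileEncodingQueryImplementation {
    @Override
    public Charset getEncoding(FileObject file) {
        if ("myext".equals(file.getExt())) { // placeholder extension
            return StandardCharsets.UTF_8;
        }
        return null; // defer to other queries
    }
}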

File names with Japanese characters turn to garbage when written to a zip file using java.util.zip.*

I have a directory with a name that contains Japanese characters, and I need to use the zip utils in java.util.zip to write it to a zip file. Writing the zip file succeeds, but when I open the resulting zip file with either Windows' built-in compressed file utility or 7-Zip, the directory with Japanese characters in the name appears as a bunch of garbage characters. I do have the Japanese/East Asian language pack installed on my system -- I can create directories with Japanese names, so that isn't the issue.
Interestingly, if I write a separate script to read the resulting zip file using java.util.zip, the directory name is correct, and I can extract the contents of the zip into appropriately named directories, with Japanese characters. But I can't do this using the commercial zip tools that I've tried, which is undoubtedly what our customers will want to do.
Any ideas about what is causing this problem, and how I can work around it?
I know about this bug, but I still need a workaround for this case.
TrueZIP claims to do this better:
The J2SE API always uses UTF-8 (eight bit Unicode character set) for entry names and comments instead of CP437 (a.k.a. IBM437, the genuine IBM-PC character set), which is used by the de-facto standard PKZIP from PKWARE. As a result, you cannot read or write ZIP files with international entry file names such as e.g. "täscht.txt" in a ZIP file created by a (southern) German.
[description of other problems omitted]
The TrueZIP Library has been developed to overcome these limitations/disadvantages.
Miracles indeed happen, and Sun/Oracle really did fix the long-standing bug/RFE:
It is now possible to set the filename encoding when creating the zip file/stream (requires Java 7): http://download.java.net/jdk7/docs/api/java/util/zip/ZipOutputStream.html#ZipOutputStream(java.io.OutputStream, java.nio.charset.Charset)
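A minimal sketch of that constructor in use; the file names are placeholders. Java writes UTF-8 entry names by default on Java 7+, but if the legacy Windows tools you target assume the system code page, a charset such as MS932 (Shift_JIS) may be what they expect:

import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Create the archive with an explicit entry-name charset (Java 7+).
Charset entryNameCharset = Charset.forName("MS932"); // assumption: match the target tool
try (OutputStream os = Files.newOutputStream(Paths.get("out.zip"));
     ZipOutputStream zos = new ZipOutputStream(os, entryNameCharset)) {
    zos.putNextEntry(new ZipEntry("日本語ディレクトリ/ファイル.txt"));
    zos.write("content".getBytes("UTF-8"));
    zos.closeEntry();
} catch (IOException e) {
    e.printStackTrace();
}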
If java.util.zip still behaves as this post describes, I'm not sure if it is possible (with the built-in classes). I have seen Chilkat's Java Zip library mentioned before as a way to get this to work, but have never used it.

How do I properly store and retrieve internationalized Strings in properties files?

I'm experimenting with internationalization by making a Hello World program that uses properties files + ResourceBundle to get different strings.
Specifically, I have a file "messages_en_US.properties" that stores "hello.world=Hello World!", which works fine of course.
I then have a file "messages_ja_JP.properties" which I've tried all sorts of things with, but the value always appears as some sort of garbled string when printed to the console or shown in Swing. The problem is obviously with the reading of the content into a Java string, since a Japanese string typed directly into the source prints fine.
Things I've tried:
The .properties file in UTF-8 encoding with the Japanese string as-is for the value. Something I read indicates that Java expects a properties file to be in the native encoding of the system...? It didn't work either way.
The file in default encoding (ISO-8859-1) and the value stored as escaped Unicode created by the native2ascii program included with Java. Tried with a source file in various Japanese encodings... SHIFT-JIS, EUC-JP, ISO-2022-JP.
Edit:
I actually figured this out while I was typing this, but I figured I'd post it anyway and answer it in case it helps anyone.
I realized that native2ascii was assuming (surprise) that it was converting from my operating system's default encoding each time, and as such was not producing the correct escaped Unicode string.
Running native2ascii with the "-encoding encoding_name" option where encoding_name was the name of the source file's encoding (SHIFT-JIS in this case) produced the correct result and everything works fine.
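For reference, the invocation was along these lines (the file names are placeholders, and Shift_JIS is the canonical Java name for the encoding):
native2ascii -encoding Shift_JIS messages_ja_JP.sjis messages_ja_JP.properties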
Ant also has a native2ascii task that runs native2ascii on a set of input files and sends output files wherever you want, so I was able to add a builder that does that in Eclipse so that my source folder has the strings in their original encoding for easy editing and building automatically puts converted files of the same name in the output folder.
As of JDK 1.6, Properties has a load() method that accepts a Reader. That means you can save all the property files as UTF-8 and read them all directly by passing an InputStreamReader to load(). I think that's the most elegant solution, but it requires your app to run on a Java 6 runtime.
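A minimal sketch of that approach; the file name is a placeholder, and the try-with-resources plus StandardCharsets shown here need Java 7 (on Java 6, use a finally block and the string "UTF-8" instead):

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Load a UTF-8 properties file through a Reader, so no \u escaping is needed.
Properties props = new Properties();
try (Reader reader = new InputStreamReader(
        new FileInputStream("messages_ja_JP.properties"), StandardCharsets.UTF_8)) {
    props.load(reader);
} catch (IOException e) {
    e.printStackTrace();
}
String hello = props.getProperty("hello.world");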
Historically, load() only accepted an InputStream, and the stream was decoded as ISO-8859-1. Not the system default encoding, always ISO-8859-1. That's important, because it makes a certain hack possible. Say your property file is stored as UTF-8. After you retrieve a property, you can re-encode it as ISO-8859-1 and decode it again as UTF-8, like this:
String realProp = new String(prop.getBytes("ISO-8859-1"), "UTF-8");
It's ugly and fragile, but it does work. But I think the best solution, at least for the next few years, is the one you found: bulk-convert the files with native2ascii using a build tool like Ant.
An alternative way to handle the properties files is:
http://www.unipad.org/main/
This is an editor which can read and write files in the \u unicode escape format, which is the format native2ascii creates.
I don't know how well it works with Japanese; I've used it for Hungarian.
