Java 11 FreeMarker with UTF-8 resources

We have a Java application (OpenJDK 1.8), a service generating payloads using FreeMarker templates (FreeMarker version 2.3.31, pulled in via Maven). Content translations are handled with resource bundles (.properties files with translations, e.g. template.properties, template_fi.properties, template_bg.properties, ...). The properties files are UTF-8 encoded and everything works fine.
When migrating to Java 11 (Zulu OpenJDK 11), we started to have an issue with translations that were not "latin", i.e. that contained characters outside the ISO-8859-1 charset. All characters outside that charset were changed to ?. (Yet the resource files were still UTF-8 encoded; converting the content with native2ascii did not help.)
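The ?-substitution described above is standard Java behavior when characters are encoded into a charset that cannot represent them; a minimal standalone sketch (not FreeMarker-specific, sample strings are arbitrary) reproduces both failure modes:

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        // UTF-8 bytes of a non-ASCII string decoded as ISO-8859-1 -> mojibake
        byte[] utf8 = "é".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.ISO_8859_1)); // prints Ã©

        // Encoding characters that ISO-8859-1 cannot represent replaces them with '?'
        byte[] latin1 = "Български".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1)); // prints ?????????
    }
}
```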
After some time and experiments we solved the encoding issue using the system property:
-Djava.util.PropertyResourceBundle.encoding=ISO-8859-1
I'm looking for an explanation - WHY? I find the property value counterintuitive and I'd like to understand the process.
According to the documentation, I understand that ResourceBundle is supposed to read the properties file using ISO-8859-1 and throw an exception when it encounters an invalid character. The system property mentioned above should enable having the properties file encoded in UTF-8. Yet the working solution was explicitly setting the ISO-8859-1 encoding.
And indeed, testing a pure Java implementation, the proper output is achieved using the UTF-8 encoding:
System.setProperty("java.util.PropertyResourceBundle.encoding", "UTF-8");
// "ISO-8859-1" <- not working:
// System.setProperty("java.util.PropertyResourceBundle.encoding", "ISO-8859-1");
Locale locale = Locale.forLanguageTag("bg-BG");
ResourceBundle testBundle = ResourceBundle.getBundle("test", locale);
System.out.println(testBundle.getString("name"));
// print Base64-encoded, so the terminal doesn't mangle the non-Latin characters
System.out.println(
    Base64.getEncoder()
          .encodeToString(testBundle.getString("name").getBytes("UTF-8")));
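For reference, a minimal bundle matching the snippet above might look like this (the file name and key are assumptions based on the code, the value is arbitrary):

```properties
# test_bg.properties - saved as UTF-8
name=Български текст
```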
I assume that the FreeMarker library somehow makes some encoding changes internally, yet I'm not sure what or why; FreeMarker's internal localized string handling is a simple bundle lookup.

Related

Character encoding for French locale while creating PDF - Java

I have a Spring Boot application which renders an XML document into a PDF. The document contains French characters like é and à. While running the application through STS I have no issues; the PDF is generated as expected. But when running the application from the command line using java -jar target\application.jar, the French characters in the generated PDF come out garbled (e.g. é becomes Ã©). I am converting the XML into a byte[] and creating the PDF. I couldn't figure out a way out. Any help is much appreciated.
Two options:
Force the encoding with the file.encoding argument, such as -Dfile.encoding=utf-8:
java -Dfile.encoding=utf-8 -jar target\application.jar
(better) When you convert the XML file into a byte array, specify the encoding:
Reader reader = new InputStreamReader(new FileInputStream("/path/to/xml/file"), StandardCharsets.UTF_8);
// do your file reading ...
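A self-contained sketch of the second option (the path is a placeholder, and the class name is invented for illustration): read the XML bytes and decode them with an explicit charset, so -Dfile.encoding never matters:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadXmlExplicitCharset {
    public static void main(String[] args) throws IOException {
        // Placeholder path; substitute the real XML location
        byte[] raw = Files.readAllBytes(Paths.get("/path/to/xml/file"));
        // Decode explicitly as UTF-8 instead of relying on the platform default
        String xml = new String(raw, StandardCharsets.UTF_8);
        // Re-encode explicitly when handing bytes to the PDF renderer
        byte[] payload = xml.getBytes(StandardCharsets.UTF_8);
        System.out.println(payload.length + " bytes");
    }
}
```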

Issue using Coldfusion FileExists when checking files with UTF-8 and ASCII

When trying to detect the existence of files whose names were encoded in UTF-8 using the FileExists function, the files could not be found.
I found that on the ColdFusion server the Java File Encoding was originally set to "UTF-8". For some unknown reason it was back to the default "ASCII". I suspect that this is the issue.
For example, a user uploaded a photo named 云拼花.jpg while the server's Java file encoding was set to UTF-8. Now, with the server's Java file encoding set to ASCII, if I use
<cfif FileExists("#currentpath##pic#")>
the result will be "not found", i.e. the file does not exist. However, if I simply display it using:
<IMG SRC="/images/#pic#">
the image will display. This causes issues when I try to test for the existence of the images: the images are there but can't be found by FileExists.
Now the directory has a mix of files with names encoded in either UTF-8 or ASCII. Is there any way to:
force any uploaded file name to UTF-8 encoding
check for the existence of the file
regardless of the CF Admin Java File Encoding setting?
Add this to your page.
<cfprocessingdirective pageencoding="utf-8">
This should fix the issue.

Umlaut problems with Spark job writing to an NFSv3 mounted volume

I am trying to copy files to an NFSv3-mounted volume during a Spark job. Some of the file names contain umlauts. For example:
Malformed input or input contains unmappable characters:
/import/nfsmountpoint/Währungszählmaske.pdf
The error occurs in the following line of Scala code:
//targetPath is String and looks ok
val target = Paths.get(targetPath)
The file name encoding is shown as ANSI X3.4-1968 (i.e. US-ASCII), although the Linux locale on the Spark machines is set to en_US.UTF-8.
I already tried to change the locale for the spark job itself using the following arguments:
--conf 'spark.executor.extraJavaOptions=-Dsun.jnu.encoding=UTF8 -Dfile.encoding=UTF8'
--conf 'spark.driver.extraJavaOptions=-Dsun.jnu.encoding=UTF8 -Dfile.encoding=UTF8'
This resolves the error, but the file name on the target volume then looks like this:
/import/nfsmountpoint/W?hrungsz?hlmaske.pdf
The volume mountpoint is:
hnnetapp666.mydomain:/vol/nfsmountpoint on /import/nfsmountpoint type nfs (rw,nosuid,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,noacl,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=4.14.1.36,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=4.14.1.36)
Is there a possible way to fix this?
Solved this by setting the encoding options as mentioned above and manually converting from and to UTF-8:
Solution for encoding conversion
Just using NFSv4 with UTF-8 support would have been an easier solution.
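The linked conversion isn't shown here, but the usual manual fix for this class of problem (an assumption on my part, not necessarily what the linked solution does) is to reverse a wrong decoding: take the mis-decoded name back to bytes in the wrong charset, then decode those bytes in the right one:

```java
import java.nio.charset.StandardCharsets;

public class FixFileName {
    public static void main(String[] args) {
        // A UTF-8 file name that was wrongly decoded as ISO-8859-1
        String garbled = "WÃ¤hrungszÃ¤hlmaske.pdf";
        // Undo the wrong decoding, then decode correctly as UTF-8
        String fixed = new String(garbled.getBytes(StandardCharsets.ISO_8859_1),
                                  StandardCharsets.UTF_8);
        System.out.println(fixed); // Währungszählmaske.pdf
    }
}
```

This only works while the mis-decoding is lossless (ISO-8859-1 maps every byte to a character, so it is); once the characters have been replaced by ?, the original bytes are gone.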

Android Studio "bug": unable to view a foreign UTF-8 string correctly in debug or an Android layout (Windows)

I'm unable to view my UTF-8 string correctly in the Android Studio debugger or in an Android layout.
Below is my code:
String test1 = "hélǐ";
The result is test1 = hél�
test1 looks similar to reading my UTF-8 string with ANSI encoding in Notepad++. However, if I hardcode it into the XML layout directly (instead of using the Button.setHint() method) I can see hélǐ correctly.
UPDATE 1: thanks to Jon Skeet for pointing out that test1.length() == 6 and not 4, so this is not a display issue.
UPDATE 2: thanks to Joop Eggen for pointing out that "h\u00e9l\u012d" returns the correct result.
UPDATE 3: I copy-pasted my code into an equivalent Android project in Eclipse and it works fine, so it must be an Android Studio-related issue.
UPDATE 4: added the environment variable JAVA_TOOL_OPTIONS = -Dfile.encoding=UTF-8 to force javac to use that option, but with no effect on the result.
UPDATE 5: I installed Android Studio on Ubuntu, copy-pasted my code, and it runs fine there as well. But how do I fix it in Android Studio on Windows? (Unfortunately I need to use Windows.)
Has anyone ever faced this issue before? How can I fix it without using \u escapes?
Thanks
note:
I'm using Android Studio
in my javac and Android DX compiler options I added -encoding utf8
in Android Studio all file encodings are set to UTF-8 (and I can see utf-8 in the bottom right)
Charset.defaultCharset() returns utf8
InputStreamReader.getEncoding() returns utf8
all my XML layouts have a UTF-8 declaration at the top
Notepad++ correctly reads my copied/pasted "hélǐ" with UTF-8 encoding
Try h\u00e9l\u012d too. This removes the Java source encoding as a factor.
Try writing the text to file:
new OutputStreamWriter(new FileOutputStream(file), "UTF-8")
Then the cause should become clearer.
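A self-contained version of that diagnostic (the output file name is a placeholder, the class name is invented): write the literal with an explicit charset and check the byte count; a correctly compiled "hélǐ" is 4 chars and 6 UTF-8 bytes (1+2+1+2):

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class EncodingProbe {
    public static void main(String[] args) throws IOException {
        String test1 = "hélǐ";
        // Write with an explicit charset, so only the compile-time encoding
        // of the literal can be at fault
        try (Writer w = new OutputStreamWriter(
                new FileOutputStream("probe.txt"), StandardCharsets.UTF_8)) {
            w.write(test1);
        }
        long size = Files.size(Paths.get("probe.txt"));
        // Expect "4 chars, 6 bytes" if the source was compiled correctly;
        // a longer string means the literal was already corrupted
        System.out.println(test1.length() + " chars, " + size + " bytes");
    }
}
```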
As the editor and compiler must use the same encoding, you seem to have done everything possible for the rest, especially checking with Notepad++ (JEdit is possible too). One small point is IDE background compiling versus the final compile.
Also, when running in a console: the console could erroneously be using the operating system's encoding.
All those new String(...) calls are superfluous and error-prone. Do not use them here, as one error could cancel out another caused by a mismatch between the editor's and the compiler's encodings.
(In ISO-8859-1 ĭ (i-breve) is not available - hence test3.)
My case was exactly the same as Simon's (I am using Android Studio for Mac 0.2.2).
I solved the problem by editing the build.gradle file under the /src folder, adding these lines:
tasks.withType(Compile) {
    options.encoding = 'UTF-8'
}
It worked for me. Hope it helps somebody with the same problem.
Fixed in latest version of AndroidStudio

Comparing unicode characters in Junit

I had problems with Unicode characters in some of my flows, so I fixed the flow and added a test.
assertEquals("Björk", buyingOption.getArtist());
buyingOption.getArtist() will return the same name as shown (the snippet referenced here is omitted),
but JUnit will fail with the message:
junit.framework.ComparisonFailure: null
Expected :Bj?rk
Actual :Bj?rk
at com.delver.update.system.AECSystemTest.basicOperationtsTest1(AECSystemTest.java:40)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
This is probably due to the default encoding used for your Java source files. The ö in the string literal in the JUnit source code is probably being converted to something else when the test is compiled.
To avoid this, use Unicode escapes (\uxxxx) in the string literals in your JUnit source code:
assertEquals("Bj\u00F6rk", buyingOption.getArtist());
I agree with Grodriguez, but would like to suggest that you change your default encoding to UTF-8 and forget about this kind of problem.
How to do this depends on your IDE. For example, in Eclipse go to Window/Preferences, type "encoding", choose Workspace, and change the encoding to UTF-8.
I found the solution was to change the default encoding before running mvn test.
My fix for this issue was to set the environment variable JAVA_TOOL_OPTIONS before running:
export JAVA_TOOL_OPTIONS="$JAVA_TOOL_OPTIONS -Dfile.encoding=UTF8"
mvn test
