I have a Spring Boot application which renders an XML document into a PDF. The document contains French characters like é and à. When running the application through STS I have no issues: the PDF is generated as expected. But when running the application from the command line using java -jar target\application.jar, the generated PDF shows the French characters as Ã© and Ã . I am converting the XML into a byte[] and creating the PDF from that. I couldn't figure out a way to fix this. Any help is much appreciated.
Two options:
Force the encoding with the file.encoding argument, such as -Dfile.encoding=utf-8.
java -Dfile.encoding=utf-8 -jar target\application.jar
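A quick way to confirm which default charset the JVM actually picked up (my own one-liner, not part of the original answer):
System.out.println(java.nio.charset.Charset.defaultCharset()); // e.g. UTF-8 or windows-1252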
(better) When you convert the XML file into a byte array, specify the encoding explicitly:
Reader reader = new InputStreamReader(new FileInputStream("/path/to/xml/file"), StandardCharsets.UTF_8);
// do your file reading ...
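If you really need the byte[] first and decode it later, the same rule applies: always name the charset when turning bytes into a String. A minimal sketch reusing the placeholder path above:
byte[] raw = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get("/path/to/xml/file"));
String xml = new String(raw, StandardCharsets.UTF_8); // never rely on the platform default charset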
Related
We have a Java application (OpenJDK 1.8), a service generating payloads from FreeMarker templates (Maven artifact version 2.3.31). The content translations are handled using resource bundles (.properties files with translations, e.g. template.properties, template_fi.properties, template_bg.properties, ...). The properties files' content is UTF-8 encoded and everything works fine.
When migrating to Java 11 (Zulu OpenJDK 11), we started to have an issue with translations that were not "Latin", i.e. that contained characters outside the ISO-8859-1 charset. All characters outside that charset were changed to ?. (Yet the resource files were UTF-8 encoded, and changing the content using native2ascii did not help.)
After some time / experiments we solved the encoding issue using the system property:
-Djava.util.PropertyResourceBundle.encoding=ISO-8859-1
I'm looking for an explanation - WHY? I find the property value counterintuitive and I'd like to understand the process.
According to the documentation, I understood that ResourceBundle is supposed to read the properties file using ISO-8859-1 and throw an exception when encountering an invalid character. The system property mentioned above is supposed to make it possible to have the properties file encoded in UTF-8. Yet the working solution was explicitly setting the ISO-8859-1 encoding.
And indeed, testing a pure Java implementation, the proper output is achieved using the UTF-8 encoding:
System.setProperty("java.util.PropertyResourceBundle.encoding","UTF-8");
// "ISO-8859-1" <- not working
// System.setProperty("java.util.PropertyResourceBundle.encoding","ISO-8859-1");
Locale locale = Locale.forLanguageTag("bg-BG");
ResourceBundle testBundle = ResourceBundle.getBundle("test", locale);
System.out.println(testBundle.getString("name"));
// return encoded, so the terminal doeesn't break the non-latin characters
System.out.println(
Base64.getEncoder()
.encodeToString(testBundle.getString("name").getBytes("UTF-8")));
I assume that the FreeMarker library somehow makes some encoding changes internally, yet I'm not sure what or why; FreeMarker's internal localized string handling is a simple ResourceBundle.
When trying to detect, with the FileExists function, the existence of files whose names are encoded in UTF-8, the files could not be found.
I found that on the ColdFusion server the Java File Encoding was originally set to "UTF-8". For some unknown reason it had reverted to the default, "ASCII". I suspect that this is the issue.
For example, a user uploaded a photo named 云拼花.jpg while the server's Java file encoding was set to UTF-8. Now, with the server's Java file encoding set to ASCII, if I use
<cfif FileExists("#currentpath##pic#")>
the result will be "not found", i.e. the file does not exist. However, if I simply display it using:
<IMG SRC="/images/#pic#">
the image will display. This causes issues when I try to test for the existence of the images: the images are there but can't be found by FileExists.
Now the directory has a mix of file names encoded in either UTF-8 or ASCII. Is there any way to:
force any uploaded file name to UTF-8 encoding, and
check for the existence of the file,
regardless of the CF Admin Java File Encoding setting?
Add this to your page.
<cfprocessingdirective pageencoding="utf-8">
This should fix the issue.
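For what happens under the hood: ColdFusion runs on the JVM, and the JVM translates file names to bytes using the charset fixed at startup (file.encoding / sun.jnu.encoding), so whether a non-ASCII name resolves does not depend on the calling code. A minimal plain-Java illustration of the same symptom, using the file name from the question (the directory is a placeholder):
import java.io.File;

File pic = new File("/images/云拼花.jpg"); // placeholder path
// Typically false when the JVM started with an ASCII default charset
// (the name's bytes can't be mapped), true under a UTF-8 default:
System.out.println(pic.exists());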
I am trying to copy files to an NFSv3-mounted volume during a Spark job. Some of the file names contain umlauts. For example:
Malformed input or input contains unmappable characters:
/import/nfsmountpoint/Währungszählmaske.pdf
The error occurs in the following line of scala code:
//targetPath is String and looks ok
val target = Paths.get(targetPath)
The file encoding is shown as ANSI X3.4-1968 (plain ASCII), although the Linux locale on the Spark machines is set to en_US.UTF-8.
I already tried to change the locale for the Spark job itself using the following arguments:
--conf 'spark.executor.extraJavaOptions=-Dsun.jnu.encoding=UTF8 -Dfile.encoding=UTF8'
--conf 'spark.driver.extraJavaOptions=-Dsun.jnu.encoding=UTF8 -Dfile.encoding=UTF8'
This solves the error, but the filename on the target volume looks like this:
/import/nfsmountpoint/W?hrungsz?hlmaske.pdf
The volume mountpoint is:
hnnetapp666.mydomain:/vol/nfsmountpoint on /import/nfsmountpoint type nfs (rw,nosuid,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,noacl,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=4.14.1.36,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=4.14.1.36)
Is there a possible way to fix this?
Solved this by setting the encoding settings as mentioned above and manually converting the file names from and to UTF-8; a sketch of such a conversion follows below.
Just using NFSv4 with UTF-8 support would have been an easier solution.
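The original solution isn't reproduced here, but a common shape for such a manual conversion is the classic mojibake repair below. This is my own sketch, assuming the name was decoded as ISO-8859-1 (where the byte round trip is lossless) when it should have been UTF-8:
// Re-encode a misdecoded file name with the charset the JVM actually used,
// then decode those bytes as the UTF-8 they really are.
static String fixName(String misdecoded) {
    byte[] raw = misdecoded.getBytes(java.nio.charset.StandardCharsets.ISO_8859_1);
    return new String(raw, java.nio.charset.StandardCharsets.UTF_8);
}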
I'm unable to view my UTF-8 string correctly in the Android Studio debugger or in the Android layout.
Below is my code:
String test1 = "hélǐ";
The result is test1 = hél�
test1 looks similar to reading my UTF-8 string with ANSI encoding in Notepad++. However, if I hardcode it into the XML layout directly (instead of using the Button.setHint() method), I can see hélǐ correctly.
UPDATE 1: thanks to Jon Skeet for pointing out that test1.length() is 6 and not 4, so this is not a display issue.
UPDATE 2: thanks to Joop Eggen for pointing out that "h\u00e9l\u012d" returns the correct answer.
UPDATE 3: I copy-pasted my code into an equivalent Android project in Eclipse and it works fine, so it must be an Android Studio-related issue.
UPDATE 4: I added an environment variable JAVA_TOOL_OPTIONS = -Dfile.encoding=UTF-8 to force javac to use that option, but with no effect on the result.
UPDATE 5: I installed Android Studio on Ubuntu, copy-pasted my code, and it runs fine there as well. But how can I fix it in Android Studio on Windows? (Unfortunately I need to use Windows.)
Has anyone ever faced this issue before? How can I fix it without using \u escapes?
Thanks
Notes:
I'm using Android Studio.
In my javac and Android DX compiler settings I added -encoding utf8.
In Android Studio all file encodings are set to UTF-8 (and I can see utf-8 in the bottom right).
Charset.defaultCharset() returns utf-8.
InputStreamReader.getEncoding() returns utf-8.
All my XML layouts have a UTF-8 declaration at the top.
Notepad++ reads my copy/pasted "hélǐ" correctly with UTF-8 encoding.
Try h\u00e9l\u012d too. This removes the factor of the Java source encoding.
Try writing the text to a file:
Writer writer = new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8);
Then the cause should become clearer.
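A slightly fuller version of that diagnostic (my own sketch, writing to a hypothetical test1.txt): dumping the code points makes an editor/compiler mismatch show up as wrong numbers instead of broken glyphs:
String test1 = "hélǐ";
try (Writer w = new OutputStreamWriter(new FileOutputStream("test1.txt"), StandardCharsets.UTF_8)) {
    w.write(test1);
}
// With matching encodings this prints U+0068 U+00E9 U+006C plus the accented
// i's code point (U+012D for ĭ, U+01D0 for ǐ); anything else (e.g. U+FFFD)
// means the source file was compiled with the wrong encoding.
test1.chars().forEach(c -> System.out.printf("U+%04X ", c));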
As the editor and compiler must use the same encoding, you seem to have done everything possible for the rest, especially checking with Notepad++ (JEdit is possible too). One small point is the difference between the IDE's background compiling and the final compiling.
Also check running in a console: the console could erroneously use the operating system encoding.
All those new String(...) calls are superfluous and error-prone. Do not use them here, as one error could cancel out another caused by mismatched encodings between editor and compiler.
(In ISO-8859-1 ĭ (i-breve) is not available - hence test3.)
My case was exactly the same as Simon's (I am using Android Studio for Mac 0.2.2).
I solved the problem by editing the build.gradle file under the /src folder, adding these lines:
tasks.withType(Compile) { // in newer Gradle versions this task type is JavaCompile
    options.encoding = 'UTF-8'
}
It worked for me. Hope it helps somebody with the same problem.
Fixed in the latest version of Android Studio.
I have a Java class that uploads a text file from a Windows client to a Linux server.
The file I am trying to upload is encoded using Cp1252 or ISO-8859-1.
When the file is uploaded, it ends up encoded in UTF-8, and strings containing accents like éèà can't be read.
The command
file -i *
on the Linux server tells me that it's encoded in UTF-8.
I think the encoding gets changed during the upload, so I added this code to my servlet:
String currentEncoding=System.getProperty("file.encoding");
System.setProperty("file.encoding", "Cp1252");
item.write(file);
System.setProperty("file.encoding", currentEncoding);
In the jsp file, I have this code:
<form name="formUpload"
action="..." method="post"
enctype="multipart/form-data" accept-charset="ISO-8859-1">
The library I use for the upload is Apache Commons (FileUpload).
Does anyone have a clue? Because I'm really running out of ideas!
Thanks,
Otmane MALIH
Setting the system property file.encoding only has an effect when the JVM starts. Instead, you will have to open the file with code like this:
public static BufferedWriter createWriter( File file, Charset charset ) throws IOException {
    FileOutputStream stream = new FileOutputStream( file );
    return new BufferedWriter( new OutputStreamWriter( stream, charset ) );
}
Use Charset.forName("iso8859-1") as charset parameter.
[EDIT] Your problem is most likely the file command. macOS is the only OS in the world which can tell you the encoding of a file with confidence. Windows and Linux have to make a guess, and this guess can be wrong.
So what you need to do is open the file in an editor where you can specify the encoding. You need to do that on Windows (to make sure that the file really was saved with Cp1252; some applications ignore the platform default and always save their data in UTF-8).
And you need to do the same on Linux. If you just open the file, the editor will use the platform encoding (which is UTF-8 on modern Linux systems) and try to read the file with that, so ISO-8859-1 umlauts will be garbled. But if you open the file as ISO-8859-1, then UTF-8 content will be garbled. That's the only way to be sure what the encoding of a text file really is.