Java encoding with Eclipse and Maven

I have often seen encoding problems, so I have written down the steps needed to make encoding work correctly.
These steps target Eclipse, but they also cover the Maven settings.
Encoding issues are most problematic when Java source files contain Scandinavian letters (å, ä, ö) that matter at runtime.
An example case is a constant in a Java file that contains a Scandinavian letter and is used to identify a value from an incoming stream (which is in UTF-8).
The underlying OS may also be Windows, which uses cp1252 by default.
For example, the following test:
import org.junit.Test; // assuming JUnit 4
public class ScandicTest {
    @Test
    public void scandicTest() {
        System.out.println("scandics: åäö");
    }
}
When everything is configured correctly (e.g. in Eclipse), running this test produces:
scandics: åäö
But if you run it via Maven (from the command line, or with mvn test from within Eclipse), you get:
scandics: ���
First, the encoding must be changed both in Eclipse and in the Maven pom.xml, so that files are read and stored correctly and Eclipse uses the correct encoding when saving files and running tests.
However, even with that in place, the constant value in the Java file itself remains corrupted when Maven compiles and runs the tests, although the files that are read in are correct (they contain the Scandinavian letters).
The JVM still uses the OS-specific default encoding even when everything else is set correctly. For this reason you cannot configure everything within the project; you must also configure it for the JVM at the OS level.
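To see why the constant ends up corrupted, the sketch below simulates what happens when a UTF-8 source file is compiled with cp1252 as the source encoding: the UTF-8 bytes of the literal are decoded as windows-1252, so the constant no longer matches a value decoded correctly from a UTF-8 stream. MojibakeDemo and misreadAs are hypothetical names; the literal is written with \u escapes so the example itself compiles the same under any source encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // The three Scandinavian letters, written as Unicode escapes
    static final String SCANDICS = "\u00e5\u00e4\u00f6";

    // Simulates javac reading a UTF-8 encoded literal with the wrong source encoding
    static String misreadAs(String text, String wrongCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, Charset.forName(wrongCharset));
    }

    public static void main(String[] args) {
        String compiledConstant = misreadAs(SCANDICS, "windows-1252");
        // A value decoded correctly from an incoming UTF-8 stream
        String fromStream = new String(SCANDICS.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
        // The comparison used to identify the incoming value now fails
        System.out.println(compiledConstant.equals(fromStream)); // false
    }
}
```

Under these assumptions the misread constant becomes "Ã¥Ã¤Ã¶": each letter is two bytes in UTF-8, and windows-1252 turns each byte into its own character.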

I will explain all the encoding steps needed for this, even though there are already multiple answers for this "common" part (at least for step 2). My particular case is resolving step 3.
1. Configure Eclipse:
Open Window > Preferences.
Type 'encoding' in the search field.
There will be many entries; first select 'General > Workspace'.
Find 'Text file encoding' and select Other > UTF-8.
You also need to set the encoding for all 'General > Content Types'.
Select the 'Text' item in the right-hand panel (this opens a list of file types) and browse through all the types, setting each one's 'Default encoding' to 'UTF-8'.
Click the 'Update' button to persist the change.
You may also need to do this for the other entries and items found by the search,
e.g. 'Web > CSS Files > Encoding' | ISO 10646/Unicode(UTF-8).
Once everything is set, Eclipse should handle encoding properly.
2. Set the encoding in the Maven pom.xml:
<project>
...
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
</properties>
...
</project>
You may also need to set the encoding for each plugin:
<plugin>
...
<configuration>
<encoding>UTF-8</encoding>
...
</configuration>
</plugin>
or
<plugin>
<executions>
<execution>
<configuration>
<encoding>UTF-8</encoding>
...
</configuration>
...
</execution>
</executions>
</plugin>
I am not sure whether the latter is mandatory, or whether the plugins fall back to the default anyway.
3. Configure the OS:
You need to set the environment variable JAVA_TOOL_OPTIONS to -Dfile.encoding=UTF8.
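To check whether the setting is picked up, a small class like the hypothetical EncodingCheck below can be run both from Eclipse and through Maven (for example, called from a test). When JAVA_TOOL_OPTIONS is set, the JVM also prints a "Picked up JAVA_TOOL_OPTIONS: ..." line to stderr at startup. Note that on JDK 18 and later the default charset is UTF-8 regardless of the OS (JEP 400), so this step mainly matters on older JDKs.

```java
import java.nio.charset.Charset;

public class EncodingCheck {
    // The charset the JVM uses whenever no charset is passed explicitly
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Charset.defaultCharset(): " + defaultCharsetName());
        System.out.println("file.encoding:            " + System.getProperty("file.encoding"));
        // Prints "null" if the variable is not set for this process
        System.out.println("JAVA_TOOL_OPTIONS:        " + System.getenv("JAVA_TOOL_OPTIONS"));
    }
}
```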
As suggested in a comment, here is some more information on converting files:
Note that all files must be UTF-8 encoded for this to work. If you edit everything in Eclipse with the configuration above, they will be saved as UTF-8.
If you receive a file that your code should process, you may need to convert it. You can do that simply by opening it in Eclipse and saving it again (you may need to add and remove a character to enable saving).
If you can use Notepad++, its 'Encoding' menu can convert the file.
When converting a file, the Scandinavian letters may sometimes get corrupted, so check them manually after the conversion.
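If many files need converting, the same re-encoding can be scripted. A minimal sketch, assuming you know the file's current encoding (windows-1252 in the example usage); ConvertToUtf8 is a hypothetical name. As with the manual route, a wrong guess for the source charset corrupts the special characters, so check them afterwards.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ConvertToUtf8 {
    // Re-encodes a file from sourceCharset to UTF-8, overwriting it in place
    static void convert(Path file, Charset sourceCharset) throws IOException {
        byte[] original = Files.readAllBytes(file);
        String text = new String(original, sourceCharset);
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ConvertToUtf8 <file> <source-charset>, e.g. windows-1252
        convert(Paths.get(args[0]), Charset.forName(args[1]));
    }
}
```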
One more thing: files saved in other tools may contain a BOM (Byte Order Mark). This 'character' is invisible, and some parsers cannot read, for example, an XML file that contains it.
You can remove the BOM by opening the file in Eclipse, placing the cursor before the first character in the file, and pressing Backspace once. Nothing appears to change, but the character is actually removed and the file then works.
Notepad may insert the BOM, so do not use it for editing XML files!
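The BOM can also be stripped programmatically. A minimal sketch that handles only the UTF-8 BOM (bytes EF BB BF) and rewrites the file in place; StripBom is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class StripBom {
    // The UTF-8 byte order mark: EF BB BF
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns the content without a leading UTF-8 BOM, or the same array if there is none
    static byte[] stripBom(byte[] content) {
        if (content.length >= 3
                && content[0] == UTF8_BOM[0]
                && content[1] == UTF8_BOM[1]
                && content[2] == UTF8_BOM[2]) {
            return Arrays.copyOfRange(content, 3, content.length);
        }
        return content;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java StripBom <file>
        Path file = Paths.get(args[0]);
        Files.write(file, stripBom(Files.readAllBytes(file)));
    }
}
```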

javafx-maven-plugin mac pkg installer does not show app icon

I use the following Maven plugin configuration on my Mac to generate the native installers:
<plugin>
<groupId>com.zenjava</groupId>
<artifactId>javafx-maven-plugin</artifactId>
<version>8.5.0</version>
<configuration>
<appName>${project.name}</appName>
<title>${project.name}</title>
<description>${project.description}</description>
<vendor>example</vendor>
<certCountry>com</certCountry>
<mainClass>${mainClass}</mainClass>
<needMenu>true</needMenu>
<additionalAppResources>src/main/deploy/package/all</additionalAppResources>
<bundleArguments>
<icon.ico>src/main/resources/icons/Icon.ico</icon.ico>
<icon.png>src/main/resources/icons/Icon_32.png</icon.png>
<icon.icns>src/main/resources/icons/Icon.icns</icon.icns>
</bundleArguments>
<jfxMainAppJarName>${project.build.finalName}.jar</jfxMainAppJarName>
</configuration>
</plugin>
The Example.app folder and the Example.dmg installer both show the correct app icon, but the Example.pkg installer shows the plain Java JAR image (1).
How can I change that image (1)?
Is it possible to change the small icon in the title bar (2) as well?
The folder structure of my project:
src
  main
    java
      *.java
    resources
      icons
        Icon.ico
        Icon.icns
        Icon_*.png
    deploy
      package
        all
          LICENSE
pom.xml
I tried with Oracle JDK 8 Update 40 and Update 101 (64-bit).
See also: javafx-maven-plugin#224

To make it redundant, I'm pasting the answer here too:
Hi there,
this is not a bug, it is "just" undocumented (it seems that either many people already know this, or very few people use this feature).
Please see the getConfig_BackgroundImage method:
https://github.com/teamfx/openjfx-8u-dev-rt/blob/fd634925571310284b02d89ff512552e795ba5e8/modules/fxpackager/src/main/java/com/oracle/tools/packager/mac/MacPkgBundler.java#L192
private File getConfig_BackgroundImage(Map<String, ? super Object> params) {
return new File(CONFIG_ROOT.fetchFrom(params), APP_NAME.fetchFrom(params) + "-background.png");
}
Please create an image and place it in src/main/deploy/package/macosx; it has to be a PNG file. It must be named ${project.name}-background.png to be picked up by the bundler.
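For a project named, say, Example, the quoted method resolves to a path like the one computed by this sketch. It assumes the config root is src/main/deploy/package/macosx and that the app name equals ${project.name}, as in the plugin configuration in the question; PkgBackground is a hypothetical helper that only mirrors the naming rule, not part of the packager.

```java
import java.io.File;

public class PkgBackground {
    // Same naming rule as getConfig_BackgroundImage: <config root>/<app name>-background.png
    static File backgroundImage(File configRoot, String appName) {
        return new File(configRoot, appName + "-background.png");
    }

    public static void main(String[] args) {
        File expected = backgroundImage(new File("src/main/deploy/package/macosx"), "Example");
        // On macOS this prints src/main/deploy/package/macosx/Example-background.png
        System.out.println(expected.getPath());
    }
}
```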
This would have been printed out if you had <verbose> set to true.
General advice: turn the VERBOSE switch on ;) The packager then behaves differently (for example, it does not remove the temporary working folder, so you can adjust your stuff further) and prints out more important debugging information.
The verbose-hint is even mentioned in the official documentation:
https://docs.oracle.com/javase/8/docs/technotes/guides/deploy/self-contained-packaging.html#BCGHHDGC

Late reply, but if you still wonder how to change the small icon in the title bar (2): right-click the .pkg file and choose Get Info, then drag and drop your .icns file onto the current icon next to the .pkg name. This changes the .pkg icon and what is shown in the title bar during installation. It does not appear to break the product signature either, which is good.
To verify that the signature still holds after the icon change:
spctl -a -v --type install MyAppName-1.0.pkg
MyAppName-1.0.pkg: accepted
source=Developer ID

Encoding for project set to UTF-8, default charset returns windows-1252

I've run into an issue with encoding. I'm not sure if it's IDE related, but I'm using NetBeans 7.4. I have this piece of code in my J2EE project:
import java.io.UnsupportedEncodingException;
public class EncodingTest {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String test = "kukuřičné";
        System.out.println(new String(test.getBytes("UTF-8"))); // should display ok
        System.out.println(new String(test.getBytes("ISO-8859-1")));
        System.out.println(new String(test.getBytes("UTF-16")));
        System.out.println(new String(test.getBytes("US-ASCII")));
        System.out.println(new String(test.getBytes("windows-1250")));
        System.out.println(test); // should display ok
    }
}
When I run it, it never displays properly. UTF-8 should print it correctly, but it doesn't. I also tried:
System.out.println(Charset.defaultCharset());
It returned windows-1252, even though the project is set to UTF-8 encoding. I've even tried re-saving this specific Java file in UTF-8, but it still doesn't display properly.
On the other hand, when I create a J2SE project and run the same code, it displays properly, and the default charset is UTF-8.
Both projects are set to UTF-8 encoding.
I want my J2EE project to behave the same as the J2SE one. I didn't notice this issue until I updated Java to version 1.7.0_51-b13, but again I'm not sure if that is related.
I'm experiencing the same issue as this guy: http://forums.netbeans.org/ptopic37752.html
I've also tried setting the default encoding for the whole IDE: -J-Dfile.encoding=UTF-8 but it didn't help.
I've noticed an important fact. When I create a new web application it displays ok. When I create new Maven web application it displays incorrectly.
Found the same issue here: https://netbeans.org/bugzilla/show_bug.cgi?id=224526
I still haven't fixed it yet. There's still no solution working.
In my pom.xml the encoding is set properly, but it still shows windows-1252 in the end.
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>

I've spent a few hours trying to find the best solution.
First of all, this is a Maven issue: Maven picks up the platform encoding and uses it even though you've specified a different encoding. Maven doesn't seem to care (it even prints to the console that it's using UTF-8, but when you run a file with the code above, it won't display properly).
I've managed to tackle this issue by setting a system environment variable:
JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF8
There should be another option besides setting environment variables: passing the encoding as an additional parameter,
e.g. javac -encoding UTF-8 for the source files, or java -Dfile.encoding=UTF8 at runtime (javac itself does not accept -D options).

You are mixing a few concepts here:
the project encoding is the encoding used to save the Java source files (xxxx.java) - it has nothing to do with how your code executes
test.getBytes("UTF-8") returns a series of bytes representing your String in UTF-8 encoding
to recreate the original string, you need to explicitly give the encoding, unless it is the default of your machine: new String(test.getBytes("UTF-8"), StandardCharsets.UTF_8)
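The last point can be checked directly: encoding and decoding with the same explicit charset restores the string, while decoding UTF-8 bytes with another charset (which is what new String(bytes) does when the default charset is windows-1252) does not. A minimal sketch; RoundTrip is a hypothetical name, and the literal uses \u escapes so it does not depend on the source file encoding. Keep in mind that System.out also encodes its output with a platform charset, so the console may still show wrong glyphs even when the String itself is correct.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTrip {
    // "kukuřičné" written with Unicode escapes
    static final String TEST = "kuku\u0159i\u010dn\u00e9";

    // Encode and decode with the same charset
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(TEST, StandardCharsets.UTF_8).equals(TEST)); // true
        // UTF-8 bytes decoded as windows-1252: the original is not restored
        String wrong = new String(TEST.getBytes(StandardCharsets.UTF_8), Charset.forName("windows-1252"));
        System.out.println(wrong.equals(TEST)); // false
    }
}
```

Note that a same-charset round trip only works if the charset can represent every character: ISO-8859-1 has no ř or č, so getBytes replaces them with '?'.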

Why do I need to escape unicode in java source files?

Please note that I'm not asking how but why. And I don't know if it's a RCP specific problem or if it's something inherent to java.
My java source files are encoded in UTF-8.
If I define my literal strings like this:
new Language("fr", "Français"),
new Language("zh", "中文")
It works as I expect when I use the strings in the application by launching it from Eclipse as an Eclipse application.
But it fails when I launch the .exe built by the "Eclipse Product Export Wizard".
The solution I use is to escape the characters like this:
new Language("fr", "Fran\u00e7ais"), // Français
new Language("zh", "\u4e2d\u6587") // 中文
There is no problem in doing this (all my other strings are in properties files, only the languages names are hardcoded) but I'd like to understand.
I thought the compiler had to convert the Java literal strings when building the bytecode. So why is the Unicode escaping necessary? Is it wrong to use high-range Unicode characters in Java source files? What exactly happens to those characters at compilation, and how does that differ from the handling of escaped characters? Is the problem just related to the RCP cache?

It appears that the Eclipse Product Export Wizard is not interpreting your files as UTF-8. Perhaps you need to run Eclipse's JVM with the encoding set to UTF-8 (-Dfile.encoding=UTF8 in eclipse.ini)?
(Copypasta'd at OP's request)

When exporting a plug-in, it gets compiled through a process separate from the normal build process within the IDE. There is a known bug that the build process (PDE.Build) disregards the text encoding used by the IDE.
The export can be made to work properly by specifying the text encoding in the build.properties file of your plugin:
javacDefaultEncoding.. =UTF-8
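As for why the escapes help: javac translates \uXXXX escapes before it tokenizes the source, so after compilation both spellings produce the same string constant in the class file. The only difference is on disk. An escaped literal is plain ASCII, which decodes identically under UTF-8 and windows-1252, so it survives a build that reads the source with the wrong encoding. The sketch below simulates that decoding step; EscapeDemo and readSource are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EscapeDemo {
    // Decodes source-file bytes the way a compiler configured with `charset` would
    static String readSource(byte[] sourceBytes, Charset charset) {
        return new String(sourceBytes, charset);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // Source text containing a literal c-cedilla, saved as UTF-8 (non-ASCII bytes on disk)
        byte[] literal = "\"Fran\u00e7ais\"".getBytes(StandardCharsets.UTF_8);
        // Source text using a backslash-u escape: pure ASCII bytes on disk
        byte[] escaped = "\"Fran\\u00e7ais\"".getBytes(StandardCharsets.UTF_8);

        // The wrong encoding changes the literal version...
        System.out.println(readSource(literal, cp1252).equals(readSource(literal, StandardCharsets.UTF_8))); // false
        // ...but the escaped version reads the same either way
        System.out.println(readSource(escaped, cp1252).equals(readSource(escaped, StandardCharsets.UTF_8))); // true
    }
}
```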

Saving JSP as UTF-8 in NetBeans

I've got some JSP files from other developers and now need to work with them. When I add any UTF-8 character to a document and want to save it, NetBeans automatically offers to save it in ISO-8859-1.
Actually, I'm getting this message from NetBeans:
"The index.jsp contains characters which will probably be damaged during conversion to the ISO-8859-1 character set. Do you want to save the file using this character set? (Yes/No)"
NetBeans didn't offer me any other option, such as saving the file as UTF-8 (which it should already be written in).
I don't know how to save those JSP files in the character set they are already written in.
And don't tell me that changing the content of the file itself (which is ineffective due to including headers etc. from other files) is the only way...
http://forums.netbeans.org/topic8750.html

First, don't forget to put this line at the top:
<%@page contentType="text/html" pageEncoding="UTF-8"%>
Second:
In the NetBeans installation folder there is a config file (etc/netbeans.conf) containing a line like this:
netbeans_default_options="-J-Xms32m -J-Xmx128m -J-XX:PermSize=32m -J-XX:MaxPermSize=160m -J-Xverify:none -J-Dapple.laf.useScreenMenuBar=true"
Add this to the end of the options, inside the closing quote:
-J-Dfile.encoding=UTF-8
Third:
NetBeans implements a project encoding setting.
To change the language encoding for a project:
Right-click a project node in the Projects windows and choose Properties.
Under Sources, select an encoding value from the Encoding drop-down field.
The encoding affects at least:
* how non-ASCII characters are displayed in the editor window when you open files
* Java file compilation of sources containing non-ASCII identifiers, string literals, or comments
* textual search for international characters over the project
Starting from NetBeans IDE 6.8, you can also specify the encoding that will be used at runtime. For example, this can be useful when the encoding for the operating system on which the application will run is different from your project's encoding.
To specify the encoding to be used at runtime:
In the Files window for your project, open nbproject > private > private.properties
Add the following line to the private.properties file and save changes:
runtime.encoding = < encoding >
This encoding will override the encoding setting for your project and will be used when running your application.
In general,
*.properties files always use ISO-8859-1 encoding plus \uXXXX escapes. (International characters will be displayed natively in the editor but stored as an escape on disk.)
*.xml files and some *.html files can specify their own encodings, regardless of the project encoding. For such files, the IDE's editor ignores the project encoding.
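The difference between the escaped on-disk form and UTF-8 can be seen with java.util.Properties itself: load(InputStream) reads ISO-8859-1 plus \uXXXX escapes, while load(Reader) decodes with whatever charset the Reader was created with. A sketch with hypothetical helper names:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PropertiesEncoding {
    // load(InputStream) always decodes ISO-8859-1 and expands backslash-u escapes
    static String loadFromStream(byte[] fileBytes, String key) throws IOException {
        Properties p = new Properties();
        p.load(new ByteArrayInputStream(fileBytes));
        return p.getProperty(key);
    }

    // load(Reader) uses the Reader's charset, here UTF-8
    static String loadUtf8(byte[] fileBytes, String key) throws IOException {
        Properties p = new Properties();
        p.load(new InputStreamReader(new ByteArrayInputStream(fileBytes), StandardCharsets.UTF_8));
        return p.getProperty(key);
    }

    public static void main(String[] args) throws IOException {
        byte[] escaped = "lang=Fran\\u00e7ais".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = "lang=Fran\u00e7ais".getBytes(StandardCharsets.UTF_8);
        System.out.println(loadFromStream(escaped, "lang").equals("Fran\u00e7ais")); // true
        System.out.println(loadUtf8(utf8, "lang").equals("Fran\u00e7ais"));          // true
        System.out.println(loadFromStream(utf8, "lang").equals("Fran\u00e7ais"));    // false
    }
}
```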
I hope these help.
Sources I used for my answer:
Link1: http://forums.netbeans.org/topic33.html
Link2: http://wiki.netbeans.org/FaqI18nProjectEncoding

Maven filter garbling special characters

I have a resource file with the following string in it, note the special characters:
Questa funzionalità non è sostenuta: {0} {1}
After Maven does its process-resources (which I need for something else) I get:
Questa funzionalitï¿½ non ï¿½ sostenuta: {0} {1}
Please tell me there is an easy fix to this?

The text files that held the strings were Java properties files. By default, most files in an Eclipse project inherit the default encoding from the container (Eclipse); in my case that is UTF-8. If you just manually add a text file to the project, it does not set it to UTF-8!!
So my properties files were actually encoded as ISO-8859-1. I changed their encoding in Eclipse by right-clicking the file and selecting Properties. I then had to re-enter ALL the special characters.
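Assuming the files were ISO-8859-1 and the resources step decoded them as UTF-8, the exact garbling from the question can be reproduced. Each accented letter is a single invalid byte in UTF-8, so it becomes U+FFFD; that is written out as the UTF-8 bytes EF BF BD, which show up as "ï¿½" when viewed as ISO-8859-1 or windows-1252 (the viewing step is an assumption). FilterGarbling is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

public class FilterGarbling {
    static String garble(String original) {
        // 1. The properties file is stored as ISO-8859-1 (one byte per accented letter)
        byte[] storedAsLatin1 = original.getBytes(StandardCharsets.ISO_8859_1);
        // 2. The resources step decodes it as UTF-8; invalid bytes become U+FFFD
        String readAsUtf8 = new String(storedAsLatin1, StandardCharsets.UTF_8);
        // 3. The result is written out as UTF-8 (U+FFFD = EF BF BD)
        byte[] writtenAsUtf8 = readAsUtf8.getBytes(StandardCharsets.UTF_8);
        // 4. Those bytes are viewed with a single-byte charset
        return new String(writtenAsUtf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(garble("Questa funzionalit\u00e0 non \u00e8 sostenuta: {0} {1}"));
    }
}
```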
The other part of the fix was to tell the Maven process resource plug-in to use UTF-8 encoding while processing resources. Instructions for that are here:
http://maven.apache.org/plugins/maven-resources-plugin/examples/encoding.html
And of course I had to implement a UTF-8 ResourceBundle.Control because (for backwards compatibility) the default ResourceBundle still reads properties files as ISO-8859-1. Details on that class can be found here:
http://www.mail-archive.com/stripes-users@lists.sourceforge.net/msg03972.html
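A minimal sketch of such a UTF-8 ResourceBundle.Control, assuming the bundles are plain .properties files on the classpath. Utf8Control is a hypothetical name; the sketch ignores the reload flag and never loads class-based bundles. On Java 9 and later it is usually unnecessary, since PropertyResourceBundle reads properties files as UTF-8 by default (JEP 226).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class Utf8Control extends ResourceBundle.Control {
    // Loads <baseName>_<locale>.properties via the class loader, decoding it as UTF-8
    @Override
    public ResourceBundle newBundle(String baseName, Locale locale, String format,
                                    ClassLoader loader, boolean reload) throws IOException {
        String resourceName = toResourceName(toBundleName(baseName, locale), "properties");
        InputStream stream = loader.getResourceAsStream(resourceName);
        if (stream == null) {
            return null; // tells ResourceBundle no bundle exists for this locale/format
        }
        try (Reader reader = new InputStreamReader(stream, StandardCharsets.UTF_8)) {
            return new PropertyResourceBundle(reader);
        }
    }
}
```

Usage would be along the lines of ResourceBundle.getBundle("messages", locale, new Utf8Control()), where "messages" stands for your bundle's base name.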
Hope this helps somebody someday.
