Running UTF-8 Java file on Windows 10 same as on Linux


I have a Gradle project that I created on Linux. It contains Java files saved in standard UTF-8 encoding, and one class contains a string with Central European characters.
Sometimes I use my second computer, which runs Windows 10. To run this Gradle project correctly on Windows 10 from CMD, I have to re-save the Java files in ISO-8859-2 encoding and repair the strings. This is very annoying, and I haven't found a way to run the same files on both systems.
I tried:
- JAVA_TOOL_OPTIONS set to -Dfile.encoding=UTF-8 in the environment variables
- CHCP 65001 (before starting in CMD)
- UTF-8 in the Windows locale settings (beta Windows 10 feature)
All gave the same bad results.
Is there any way?

One alternative would be to write any non-ASCII characters as Unicode escape sequences, e.g. \u01D1. This would make the file work anywhere with no local environment changes necessary. You also wouldn't have to worry about editing the file in the wrong environment and saving it with the wrong character set. But, of course, those characters wouldn't be as legible in the file.
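As a rough illustration of the escape approach, here is a small sketch. The class name and the sample word are made up for this example; the point is only that the source file stays pure ASCII, so the compiler reads it the same way under any source encoding.

```java
public class EscapeDemo {
    // A Czech word ("prilis" with its diacritics) written with escapes only,
    // so this source file contains nothing but ASCII bytes.
    static final String ESCAPED = "P\u0159\u00EDli\u0161";

    public static void main(String[] args) {
        // Print the code point of each character to confirm what the compiler saw.
        for (int i = 0; i < ESCAPED.length(); i++) {
            System.out.printf("U+%04X%n", (int) ESCAPED.charAt(i));
        }
    }
}
```

Whether the characters then display correctly in the Windows console is a separate matter that depends on the console's code page and font.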
If your Gradle file contains foreign characters, set the environment variable GRADLE_OPTS:
GRADLE_OPTS="-Dfile.encoding=UTF-8"
If you are trying to compile the Java source files with Gradle, you may need to configure the encoding for the compileJava task, within the build.gradle file:
apply plugin: 'java'
tasks.withType(JavaCompile) {
    options.encoding = 'UTF-8'
}
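To see which defaults the JVM actually picks up on each machine, a small check like the one below may help. It is only a diagnostic sketch with a made-up class name. It reports the runtime default, which is not necessarily the encoding the compiler used to read the source files.

```java
import java.nio.charset.Charset;

public class DefaultEncodingCheck {
    // Collects the encoding-related settings the running JVM actually sees.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once on Linux and once on Windows (from the same CMD window you use for Gradle) shows whether settings such as JAVA_TOOL_OPTIONS are reaching the JVM at all.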

Related

Java difference between running from netbeans and cmd

I have a program that writes text data to files. When I run it from NetBeans, the files are in the correct encoding and can be read with Notepad. When I run it from cmd using java -cp ....jar, the encoding is different.
What might be the issue?
P.S. I've checked that the same JRE version (v 1.8.0_31) is used in both cases.

The NetBeans startup scripts may specify a different encoding than your system default. You can check this in your netbeans.conf.
You can set the file.encoding property when invoking java. For example, java -Dfile.encoding=UTF8 -cp... jar.
If you do not want to be surprised when running your code in different environments, an even better solution is to specify the encoding in your source code.
Further reading:
file encoding: Character and Byte Streams
netbeans.conf encoding options: How To Display UTF8 In Netbeans 7?
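Specifying the encoding in the source code could look roughly like this. It is a minimal sketch, assuming Java 7 or later; the class name, method name and temporary file are placeholders, not part of the original program.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitEncodingWriter {
    // Writes text as UTF-8 no matter what the platform default encoding is.
    static void writeUtf8(Path target, String text) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            writer.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("encoding-demo", ".txt"); // hypothetical output file
        writeUtf8(target, "caf\u00E9");
        // The accented letter takes two bytes in UTF-8, so the file is 5 bytes long.
        System.out.println(Files.size(target));
    }
}
```

With the charset passed explicitly, the output should be the same whether the program is started from NetBeans or from cmd.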

ant: warning: unmappable character for encoding UTF8

I have seen numerous questions like mine, but they don't answer my question because I'm using ant and I'm not using Eclipse. I run ant clean dist and it tells me numerous times: warning: unmappable character for encoding UTF8.
I see that the Java compiler has an -encoding option, but that doesn't help me because I'm using ant.
I'm on Linux and I'm trying to run the developer version of Sentrick. I haven't made any modifications to anything; I just downloaded it and followed all their instructions, and it makes no difference. I emailed the developer and they told me it was this problem, but I suspect that it actually has something to do with this error at the end:
BUILD FAILED
/home/daniel/sentricksrc/sentrick/build.xml:22: The following error occurred while executing this line:
/home/daniel/sentricksrc/sentrick/ant/common-targets.xml:83: Test de.denkselbst.sentrick.tokeniser.components.DetectedAbbreviationAnnotatorTest failed
I'm not sure what to do now, because I really need this to work.

Try changing the file encoding of your source files, and also set the default Java file encoding to UTF-8.
For Ant:
Add -Dfile.encoding=UTF8 to your ANT_OPTS environment variable.
Setting the default Java file encoding to UTF-8:
export JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF8
Or you can start java with the argument -Dfile.encoding=UTF8

The problem is not Eclipse or ant. The problem is that you have a build file with special characters in it, like smart quotes or em dashes from MS Word. In other words, your XML file contains bytes that are not valid UTF-8. So you should fix your XML by replacing those characters with similar-looking plain equivalents. Look for special characters such as ©, ® or an em dash and replace them with (c) or whatever is useful to you.
BTW, the bad character is in common-targets.xml at line 83

Changing the encoding to Cp1252 worked for my project, which had the same error. I tried changing the Eclipse properties several times, but it did not help. I added an encoding property to my pom.xml file and the error was gone. http://ctrlaltsolve.blogspot.in/2015/11/encoding-properties-in-maven.html

Setting the CLASSPATH from Cygwin

I use Eclipse to develop Java so I have a folder full of Eclipse Java Project folders. The /bin folder resides in each folder, so to run the project from Cygwin, the classpath must be set (on my system) to: "E:/programming/java/workspace/SomeProject/bin". Since there are ~40 projects in my folder, I'd rather make a script to add the paths to the CLASSPATH. My script seems to add the paths to CLASSPATH, but when I try to run Java I get a class not found error. In my .bashrc here is my script:
JAVAWORKSPACE="/cygdrive/e/programming/java/workspace/*"
BIN="/bin;"
for f in $JAVAWORKSPACE
do
    if [ -d $f ] ; then
        export CLASSPATH="$f$BIN$CLASSPATH"
    fi
done
When I start Cygwin and echo $CLASSPATH, all of the directories show up, but Java can't find the classes. I have also tried JAVAWORKSPACE="E:\programming\java\workspace\*", but this resulted in nothing being added to CLASSPATH. If I go through the Windows settings and manually add "E:/programming/java/workspace/MyProject/bin" to the CLASSPATH, command-line Java has no trouble finding the classes. What's up with this? I'm not sure if it's a problem with the script or if CLASSPATH doesn't like Unix-style paths. If I need to add Windows paths, please help me change my script to do this. Thanks!

I don't have Cygwin set up right now, but I ran into this problem a number of years ago. Java knows nothing about Cygwin pathnames, and bash treats a single backslash as an escape character, stripping it before it can be transmitted to java(c). If you do
echo E:\programming\java\workspace\*
You'll see it outputs E:programmingjavaworkspace*, not what you're expecting. The key is to either escape the escape chars, like
E:\\programming\\java\\workspace\\*
or, even better, use cygpath to convert the path.
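To check what the JVM itself makes of the classpath it receives, a small diagnostic along these lines may help. This is only a sketch with made-up class and method names; it lists entries that do not exist from Java's point of view, which is what you would expect to see for /cygdrive/... paths under a Windows JVM. Wildcard entries would also be reported as missing, since the check is a plain existence test.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ClasspathCheck {
    // Splits a classpath string on the given separator and returns the entries
    // that do not exist on disk from the JVM's point of view.
    static List<String> missingEntries(String classpath, String separator) {
        List<String> missing = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(separator))) {
            if (!entry.isEmpty() && !new File(entry).exists()) {
                missing.add(entry);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String classpath = System.getProperty("java.class.path");
        System.out.println("Separator on this platform: " + File.pathSeparator);
        System.out.println("Entries the JVM cannot find: "
                + missingEntries(classpath, File.pathSeparator));
    }
}
```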

Java source file encoding with Chinese character

I imported a Java project from Windows to Ubuntu.
My Ubuntu is 10.10 with the Gnome environment. My LANGUAGE is set to en_US:en
My terminal's character encoding is: Unicode (UTF-8)
My IDE is Eclipse and its text file encoding is: GBK.
The source files contain some Chinese character constants.
The project builds successfully on Windows with ant,
but on Ubuntu I get a compile error:
illegal character: \65533
I don't want to use the \uxxxx format, as the files already exist,
and I've tried the -encoding option for javac, but it still doesn't compile.

I think the problem lies not with Ubuntu, Ubuntu's console, javac or Eclipse, but with the way you transferred the file from Windows to Ubuntu. You have to store it as UTF-8 before you copy it to Ubuntu; otherwise the encoding information that comes from your Windows locale is already lost.

Did you specify the encoding option of the <javac> task in your build.xml?
It should look like this:
<javac encoding="GBK" ...>
If you haven't specified it, then on Windows it will use the platform default encoding (which is GBK in your setup) and on Linux it will use the platform default encoding (which is UTF-8 in your setup).
Since you want the build to work on both platforms (preferably without changing the configuration of either platform), you need to specify the encoding when you compile.
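The number in the error message is a hint here: 65533 is the decimal value of U+FFFD, the Unicode replacement character that a decoder substitutes for bytes it cannot interpret. The sketch below (class and method names are made up) reads the GBK bytes of one Chinese character as if they were UTF-8, which is roughly what happens when GBK sources are compiled with a UTF-8 default.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementCharDemo {
    // Decodes bytes as UTF-8; malformed sequences are replaced with U+FFFD.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xD6 0xD0 is the GBK encoding of the Chinese character U+4E2D.
        byte[] gbkBytes = {(byte) 0xD6, (byte) 0xD0};
        String misread = decodeAsUtf8(gbkBytes);
        // Print the numeric value of the first character the decoder produced.
        System.out.println((int) misread.charAt(0));
    }
}
```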

You need to convert your source code from your Windows code page to UTF-8. Use iconv for this.
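If iconv is not at hand, the same conversion can be sketched in Java: decode the bytes with the original charset, then encode them again as UTF-8. The class and method names below are hypothetical, and the demo uses ISO-8859-1 only because every JVM is guaranteed to support it; for the GBK files you would pass Charset.forName("GBK"), assuming your JVM ships that charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Transcoder {
    // Re-encodes bytes from one charset to another by decoding to a String first.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    public static void main(String[] args) {
        // 0xE9 is an accented e in ISO-8859-1; it becomes 0xC3 0xA9 in UTF-8.
        byte[] latin1 = {(byte) 0xE9};
        byte[] utf8 = transcode(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(utf8.length);
    }
}
```

For real files, the input bytes could come from Files.readAllBytes and the result could be written back with Files.write.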

How to force a jar to use(or the jvm that jar runs in) utf-8 instead of the system's default encoding

My Windows default encoding is GBK, and my Eclipse is set entirely to UTF-8.
So an application that runs well in Eclipse crashes when exported as a jar file, because the text becomes unreadable.
I have to put the following line in a .bat file to run the application:
start java -Dfile.encoding=utf-8 -jar xxx.jar
My question is: can I write something in the source code to make the application (or the JVM it runs in) use UTF-8 instead of the system's default encoding?

When you open a file for reading, you need to explicitly specify the encoding you want to use for reading the file:
Reader r = new InputStreamReader(new FileInputStream("myfile"), StandardCharsets.UTF_8);
Then the value of the default platform encoding (which you can change using -Dfile.encoding) no longer matters.
Note:
I would normally recommend to always specify the encoding explicitly for any operation that depends on the standard locale, such as character I/O. Many Java API methods default to the platform encoding, which I consider a bad design, because often the platform encoding is not the right one, plus it may suddenly change (if the user e.g. switches OS locale), breaking your app.
So just always say which encoding you want.
There are some cases where the platform encoding is the right one (such as when opening a file the user just created for you), but they are fairly rare.
Note 2:
java.nio.charset.StandardCharsets was introduced in Java 1.7. For older Java versions, you need to specify the input encoding as a String (ugh). The list of possible encodings depends on the JVM, but every JVM is guaranteed to at least have:
US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE, UTF-16.
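For those older Java versions, the String form might look like the sketch below. The class and method names are made up, and the code deliberately avoids try-with-resources and StandardCharsets so that it also fits pre-1.7 style.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

public class NamedCharsetReader {
    // Reads all characters from the bytes using a charset given by name.
    static String read(byte[] data, String charsetName) throws IOException {
        if (!Charset.isSupported(charsetName)) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName);
        }
        Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charsetName);
        try {
            StringBuilder text = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
            return text.toString();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA9}; // UTF-8 bytes for U+00E9
        System.out.println(read(utf8, "UTF-8").length());
    }
}
```

The Charset.isSupported check is optional; without it, an unknown name surfaces as an UnsupportedEncodingException from the InputStreamReader constructor.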

There's another way.
If you are sure how you want to encode the input and output, you can save the setting before you build your jar file.
Here is an example for NetBeans.
Go to Project >> Properties >> Run >> VM Options and type -Dfile.encoding=UTF-8
After that, the default encoding is UTF-8 every time the Java VM is started with these options.
(I think Eclipse offers the same possibility. If not, just search for VM Options.)
