Netbeans Java Console Encoding UTF-8 and Umlauts - java

My problem is with a little Java program written using NetBeans 7.4. There is obviously an encoding issue, since I need to handle German input containing special characters (äüöÄÜÖß).
Reading in text from files works like a charm; special characters are saved and displayed as expected:
String fileText = new Scanner(file, "UTF-8" ).useDelimiter("\\A").next();
However, I also need to read user input from the console. In this case I only care about the console inside NetBeans itself, since this code will not be used outside the IDE. Entering special characters here leads to the usual symbols (box, question mark) instead of the umlauts.
Scanner scanner = new Scanner(System.in, "UTF-8");
userQuery = scanner.nextLine();
Input: könig
Output: k�nig
I have been stuck on this for quite a while now, having tried every option Google brought my way, but so far no luck. Most people seem to have fixed this by changing the standard encoding (Project Properties -> Sources -> Encoding), which is already set to UTF-8 though.
There is no issue using those characters in any other way, such as saving them in strings or printing them to the console. So the issue seems to be with the NetBeans console encoding setting.
I tried manually changing that without any luck. I'm not sure this setting even affects the NetBeans console, since trying to access the console object just returns null.
System.setProperty("console.encoding", "UTF-8");
Anybody have an idea where to look next? I have already exhausted all Google searches (not much useful on pages > 5, as always).
Thanks!

I have also been confused by I/O encoding in the Netbeans console window for years, and have finally found out why.
At least on my system (Netbeans 8.1 on Windows 10), the Netbeans console confusingly uses UTF-8 for output (that's why your output works for UTF-8 input files), but uses Windows-1252 for input. (So much for POLA :)
So if you change your scanner to use that encoding
Scanner scanner = new Scanner(System.in, "Windows-1252");
everything should work fine. Or you can tell Netbeans to use UTF-8 as console input encoding by adding
-J-Dfile.encoding=UTF-8
to the variable netbeans_default_options in etc/netbeans.conf (in Netbeans installation directory).
For maximum consistency with running the app from the system command line, I would have preferred to use Windows-1252 (or rather IBM850) as the NetBeans console encoding on Windows. But NetBeans seems to ignore the given switch for console output; it always uses UTF-8, so that is the best we can do.
I really like NetBeans, but I wish they would clean up this mess...
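For completeness, a minimal sketch (not part of the answer above) that prints the JVM's default charset and then reads a line with an explicitly named encoding, which makes it easy to see what the IDE console actually hands the program:
import java.nio.charset.Charset;
import java.util.Scanner;

public class ConsoleEncodingCheck {
    public static void main(String[] args) {
        // Reflects whatever -Dfile.encoding the IDE (or netbeans.conf) passed
        // to the JVM, or the platform default otherwise.
        Charset cs = Charset.defaultCharset();
        System.out.println("Default charset: " + cs);

        // Name the input charset explicitly; swap in "UTF-8" or "Windows-1252"
        // depending on what the console turns out to use (see above).
        Scanner scanner = new Scanner(System.in, "Windows-1252");
        System.out.print("Enter text: ");
        System.out.println("Read back: " + scanner.nextLine());
    }
}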

Related

Java not printing Chinese characters in Visual Studio Code, but it works in Python

When I do
System.out.println('说');
It just prints "?"
In the bottom right corner it says UTF-8 (so that is good).
I have no idea what I am doing wrong, any help much appreciated.
PS: When I make a Python file and print it, it prints properly. But not in Java :(
I tried doing System.setProperty("file.encoding", "UTF-8"); but got the same result, sadly. I tried running the code on repl.it, and it works. But not in Visual Studio Code.
Note that the Windows locale is set to support UTF-8. And I am using the Consolas font, which should support UTF-8.
I also tried uninstalling VS Code and installing it again - it didn't fix anything.
I am also using the terminal for all output.
What is your system's display language? Change the system language to Chinese and modify the system locale to Chinese (you may need to restart the computer), then restart VS Code and print the Chinese characters again.
Another simple and effective way is to use the Code Runner extension. Install the extension and run the file with Run Code; the OUTPUT panel will display the result.
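As an aside, setting file.encoding with System.setProperty() at runtime (as tried in the question) has no effect on System.out, because the standard streams are created before main() runs. A commonly suggested workaround, not part of the answer above and assuming the terminal itself is set to a UTF-8 code page, is to wrap stdout in a UTF-8 PrintStream:
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class ChinesePrint {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // Encode output as UTF-8 regardless of the file.encoding the JVM started with.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println('说');
    }
}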

Printing Japanese does not work, despite Java now using UTF-8 by default

I have Eclipse 2022, which now has UTF-8 as the default, and I'm trying to print Japanese characters, such as:
System.out.println("わ");
The problem is that in the console, it just prints out ? marks. The closest thing I found to someone having the same problem as me would be:
UTF-8 text (Hindi) not getting displayed on Browser window or Eclipse console
This has nothing to do with Java's UTF-8 support. String and println() have always supported Unicode correctly out of the box.
The result of this solely depends on whether your console supports Unicode or not. I suspect you used Windows CMD, which does not support Unicode by default.
Also see: How to use unicode characters in Windows command line?
The mentioned "UTF-8 by default" feature is for stuff like new FileReader(...) or new String(...), APIs that defaulted to the platform's default encoding in the past.

Eclipse UTF-8-weird characters

I am writing a program in Java with the Eclipse IDE, and I want to write my comments in Greek. So I changed the encoding under Window -> Preferences -> General -> Content Types -> Text -> Java Source File to UTF-8. The comments in my code are OK, but when I run my program some words contain weird characters, e.g. San Germ�n (San Germán). If I change the encoding to ISO-8859-1, everything is fine when I run the program, but then the comments in my code are not (weird characters!). So, what is going wrong?
Edit: My program uses Java Swing, and the weird characters with UTF-8 are strings in cells of a JTable.
EDIT(2): OK, I solved my problem. I keep the UTF-8 encoding for the Java file but change the encoding of the strings: String k = new String(myStringInByteArray, "ISO-8859-1");
This is most likely due to the compiler not using the correct character encoding when reading your source. This is a very common source of error when moving between systems.
The typical way to solve it is to use plain ASCII (which is identical in both Windows-1252 and UTF-8) and the "\u1234" escape scheme (Unicode character 0x1234), but it is a bit cumbersome to handle, as Eclipse (last time I looked) did not transparently support this.
The property file editor does, though, so a reasonable suggestion is to put all your strings in a property file and load them as resources when you need to display them. This is also an excellent introduction to Locales, which are needed when you want your application to be able to speak more than one language.
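A minimal sketch of that suggestion, with the bundle name and key assumed for illustration:
import java.util.Locale;
import java.util.ResourceBundle;

public class Messages {
    public static void main(String[] args) {
        // messages_el.properties (hypothetical file on the classpath) could contain:
        //   greeting=\u0393\u03b5\u03b9\u03b1 \u03c3\u03bf\u03c5
        // The \uXXXX escapes are plain ASCII, so the file survives any
        // source-encoding mismatch between systems.
        ResourceBundle bundle = ResourceBundle.getBundle("messages", new Locale("el"));
        System.out.println(bundle.getString("greeting"));
    }
}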

JTable won't display Unicode correctly when the application is executed from the command line or a jar file. It works fine in Eclipse, though

I'm writing an application that reads a text file containing a list of vocabulary words in both English and Chinese. These are then displayed in a JTable. When I run or debug the app in Eclipse, everything displays fine. I can see and read the characters and the English. However, when I execute the app from the command line or from an executable jar, it's all wrong. The characters show up as either squares or as gibberish.
I also have a text box that when I type Chinese into it, it displays correctly.
My first thought was that it was a font problem. I was using a font installed on my system. Since I can't guarantee that the person using this app will have that font, I moved it to a resource folder and load the font from a file. The font appears as though it's been loaded so I'm convinced it's not a font issue.
I found another question that suggested using -Dfile.encoding=utf-8. I've tried this and it did not work.
Would the brilliant folks at Stack Overflow have any advice on how to make this work?
I'm writing this on a non-chinese version of Windows.
Well then you won't ever be able to get a Java program to produce Chinese command-line output.
Java, like almost all languages, uses the C standard library, which has byte-based I/O. The Windows command prompt interprets byte-based I/O using the locale-specific default code page. That's never a UTF, so Unicode characters outside the current locale's default code page just won't work.
(In theory you should be able to get it to work by changing your console fonts and using chcp 65001 (UTF-8) together with -Dfile.encoding=UTF-8, but in practice it doesn't work reliably due to bugs in the C runtime. Unicode on the command prompt is a long-standing sore point.)
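Aside from console output, the difference between "works in Eclipse" and "garbage from the command line or a jar" is often simply that Eclipse launches the JVM with the project's encoding while a plain java invocation uses the platform default. A sketch (file name assumed) of reading the vocabulary file with an explicit charset, which removes that dependency for the JTable contents:
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class VocabLoader {
    static List<String> load() throws IOException {
        // Name the charset explicitly so the result is identical whether the
        // app is started from Eclipse, the command line, or an executable jar.
        return Files.readAllLines(Paths.get("vocab.txt"), StandardCharsets.UTF_8);
    }
}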

Java application failing on special characters

An application I am working on reads information from files to populate a database. Some of the characters in the files are non-English, for example accented French characters.
The application is working fine in Windows, but on our Solaris machine it fails to recognise the special characters and throws an exception. For example, when it encounters the accented e in "Gérer" it says:
Encountered: "\u0161" (353), after : "\'G\u00c3\u00a9rer les mod\u00c3"
(an exception which is thrown from our application)
I suspect that in order to stop this from happening I need to change the file.encoding property of the JVM. I tried to do this via System.setProperty() but it has not stopped the error from occurring.
Are there any suggestions for what I could do? I was thinking about setting the basic locale of the Solaris platform in /etc/default/init to UTF-8. Does anyone think this might help?
Any thoughts are much appreciated.
That looks like a file that was converted by native2ascii using the wrong parameters. To demonstrate, create a file with the contents
Gérer les modÚ
and save it as "a.txt" with the encoding UTF-8. Then run this command:
native2ascii -encoding windows-1252 a.txt b.txt
Open the new file and you should see this:
G\u00c3\u00a9rer les mod\u00c3\u0161
Now reverse the process, but specify ISO-8859-1 this time:
native2ascii -reverse -encoding ISO-8859-1 b.txt c.txt
Read the new file as UTF-8 and you should see this:
Gérer les modÀ\u0161
It recovers the "é" okay, but chokes on the "Ú", like your app did.
I don't know everything that is going wrong in your app, but I'm pretty sure incorrect use of native2ascii is part of it, and that was probably the result of letting the app use the system default encoding. You should always specify the encoding when you save text, whether it's to a file, a database, or anything else; never let it default. And if you don't have a good reason to choose something else, use UTF-8.
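The same mangling can be reproduced in a few lines of Java, without native2ascii (a sketch using the word from the exception message):
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) throws Exception {
        byte[] utf8 = "Gérer".getBytes(StandardCharsets.UTF_8);

        // Decoding UTF-8 bytes with the wrong charset produces the doubled
        // characters from the exception message ("G\u00c3\u00a9rer...").
        System.out.println(new String(utf8, "windows-1252"));

        // Decoding with the charset the bytes were written in gets the text back.
        System.out.println(new String(utf8, StandardCharsets.UTF_8));
    }
}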
Try to use
java -Dfile.encoding=UTF-8 ...
when starting the application in both systems.
Another way to solve the problem is to change the encoding of both systems to UTF-8, but I prefer the first option (less intrusive on the system).
EDIT:
Check this answer on Stack Overflow; it might help too:
Changing the default encoding for String(byte[])
Instead of setting the system-wide character encoding, it might be easier and more robust to specify the character encoding when reading and writing specific text data. How is your application reading the files? All the Java I/O package readers and writers support passing in a character encoding name to be used when reading/writing text to/from bytes. If you don't specify one, they use the platform default encoding, as you are likely experiencing.
Some databases are surprisingly limited in the text encodings they can accept. If your Java application reads the files as text, in the proper encoding, then it can output the text to the database however the database needs it. If your database doesn't support any encoding whose character repertoire includes the non-ASCII characters you have, then you may need to encode your non-English text first, for example into UTF-8 bytes, and then Base64 encode those bytes as ASCII text.
PS: Never use String.getBytes() with no character encoding argument for exactly the reasons you are seeing.
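A sketch of that last-resort approach (Base64 over UTF-8 bytes), with the method names made up for illustration:
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class AsciiSafeText {
    // Encode arbitrary text into plain ASCII so it survives an ASCII-only column.
    static String toAscii(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // Reverse the process when reading the value back out of the database.
    static String fromAscii(String stored) {
        return new String(Base64.getDecoder().decode(stored), StandardCharsets.UTF_8);
    }
}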
I managed to get past this error by running the command
export LC_ALL='en_GB.UTF-8'
This command set the locale for the shell that I was in, which set all of the LC_* environment variables to a UTF-8 locale.
Many thanks for all of your suggestions.
You can also set the encoding at the command line, like so: java -Dfile.encoding=utf-8.
I think we'll need more information to be able to help you with your problem:
What exception are you getting exactly, and which method are you calling when it occurs.
What is the encoding of the input file? UTF8? UTF16/Unicode? ISO8859-1?
It'll also be helpful if you could provide us with relevant code snippets.
Also, a few things I want to point out:
The problem isn't occurring at the 'é' but later on.
It sounds like the character encoding may be hard coded in your application somewhere.
Also, you may want to verify that operating system packages to support UTF-8 (SUNWeulux, SUNWeuluf etc) are installed.
Java uses the operating system's default encoding when reading and writing files. One should never rely on that; it is always good practice to specify the encoding explicitly.
In Java you can use following for reading and writing:
Reading:
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(inputPath),"UTF-8"));
Writing:
PrintWriter pw = new PrintWriter(new BufferedWriter(new OutputStreamWriter(new FileOutputStream(outputPath), "UTF-8")));
