Java | Duplicate characters on Windows 8 CMD UTF-8 encoding - java

I was trying to make cmd display UTF-8 encoded texts and I was finally able to do that.
I wrote a class containing the following code in order to make java write characters encoded in UTF-8:
String text = "çşğüöıÇŞĞÜÖİ UTF-8 (65001)";
System.setOut(new PrintStream(System.out, true, "utf8"));
System.out.println(text);
In command line, I entered the command chcp 65001 before running the class, in order to change the encoding setting of commandline.
Anyway, after doing all these stuff I was finally able to print UTF-8 encoded characters. But I had a problem:
The output was supposed to look like this
çşğüöıÇŞĞÜÖİ UTF-8 (65001)
Instead the output was as follows:
çşğüöıÇŞĞÜÖİ UTF-8 (65001)TF-8 (65001)
It duplicates some of the characters and I couldn't figure out why.

It duplicates some of the characters and I couldn't figure out why.
Because the Windows console does not handle utf-8 properly.

Related

encode œ correctly with UTF-8 - java

I am having problems to write out the following string into a file correctly. Especially with the character "œ". The Problem appears on my local machine (Windows 7) and on the server (Linux)
String: "Cœurs d’artichauts grillées"
Does Work (œ gets displays correctly, while the apostrophe get translated into a question mark):
Files.write(path, content.getBytes(StandardCharsets.ISO_8859_1));
Does not work (result in file):
Files.write(path, content.getBytes(StandardCharsets.UTF_8));
According to the first answer of this question, UTF-8 should be able to encode the œ correctly as well. Has anyone have an idea what i am doing wrong?
Your second approach works
String content = "Cœurs d’artichauts grillées";
Path path = Paths.get("out.txt");
Files.write(path, content.getBytes(Charset.forName("UTF-8")));
Is producing an out.txt file with:
Cœurs d’artichauts grillées
Most likely the editor you are using is not displaying the content correctly. You might have to force your editor to use the UTF-8 encoding and a font that displays œ and other UTF-8 characters. Notepad++ or IntelliJ IDEA work out of the box.

Java Character Encoding Writing to Text File

My Issue is as follows:
Having issue with character encoding when writing to text file. The issue is characters are not showing the intended value. for example I am writing ' '(which is probably a Tab character) and 'Â' is what is displayed in the text file.
Background information
This data is being stored on a MSQL Database. The Database Collation is SQL_Latin1_General_CP1_CI_AS and the fields are varchar. I've come to learn the collation and type determine what character encoding is used on the database side. Values are stored correctly so no issues here.
My Java application runs queries to pull the data from the DB and this too also looks OK. I have debugged the code and seen all the Strings have the correct representation before writing to the file.
Next I write the text to the .TXT file using a OutputStreamWriter as follows:
public OfferFileBuilder(String clientAppName, boolean isAppend) throws IOException, URISyntaxException {
String exportFileLocation = getExportedFileLocation();
File offerFile = new File(getDatedFileName(exportFileLocation+"/"+clientAppName+"_OFFERRECORDS"));
bufferedWriter = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(offerFile, isAppend), "UTF-8"));
}
Now once I open up the file on the Linux server by running cat command on file or open up the file using notepad++ some of the characters are incorrectly displaying.
I've ran the following commands on the server to see its encoding locale charmap which prints UTF-8, echo $LANG which prints en_US.UTF-8, and echo $LC_CTYPE` prints nothing.
Here is what I've attempted so far.
I've attempted to change the Character encoding used by the OutputStreamWriter I've tried UTF-8, and CP1252. When switching encoding some characters are fixed when others are then improperly displayed.
My Question is this:
Which encoding should my OutputStreamWriter be using?
(Bonus Questions) how are we supposed to avoid issues like this from happening. The rule of thumb i was provided was use UTF-8 and you will never run into problems, but this isn't the case for me right now.
running file -bi command on the server revealed that the file was encoded with ascii instead of utf8. Removing the file completely and rerunning the process fixed this for me.

Chinese Characters Displayed as Questions Marks in Mac Terminal

I am trying to retrieve some UTF-8 uni coded Chinese characters from a database using a Java file. When I do this the characters are returned as question marks.
However, when I display the characters from the database (using select * from ...) the characters are displayed normally. When I print a String in a Java file consisting of Chinese characters, they are also printed normally.
I had this problem in Eclipse: when I ran the program, the characters were being printed as question marks. However this problem was solved when I saved the Java file in UTF-8 format.
Running "locale" in the terminal currently returns this:
LANG="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_CTYPE="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_ALL=
I have also tried to compile my java file using this:
javac -encoding UTF-8 [java file]
But still, the output is question marks.
It's quite strange how it will only sometimes display the characters. Does anyone have an explanation for this? Or even better, how to fix this so that the characters are correctly displayed?
The System.out printstream isn't created as a UTF-8 print stream. You can convert it to be one like this:
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
public class JavaTest {
public static void main(String[] args) {
try{
PrintStream out = new PrintStream(System.out, true, "UTF-8");
out.println("Hello");
out.println("施华洛世奇");
out.println("World");
}
catch(UnsupportedEncodingException UEE){
//Yada yada yada
}
}
}
You can also set the default encoding as per here by:
java -Dfile.encoding=UTF-8 -jar JavaTest.jar

reading unicode data from a file

Default encoding is ISO-8859-1
BufferedReader bis = new BufferedReader(new InputStreamReader(new FileInputStream("file having unicode characters"),"UTF-8"));
String strTemp = bis.readLine();// on debugging strTemp is having actual unicode data
System.out.println(strTemp);// uses default encoding which is ISO-8859-1,So not printing ///actual data
PrintStream psTemp = new PrintStream(System.out, true, "UTF-8");
psTemp.println(strTemp);// here i am giving encoding as UTF-8,still not printing unicode data.
Even if i am giving encoding as UTF-8 in PrintStream constructor i am not able to print unicode data, if i change default encoding from ISO-8859-1 to UTF-8, it works. Why this is so?
if i change default encoding from ISO-8859-1 to UTF-8, it works. Why this is so?
I expect that this works because it is telling your console / shell / whatever is displaying the characters to expect UTF-8 characters. If the default behaviour is to expect ISO-8859-1, then sending it UTF-8 is not going to work.
Are you printing on eclipse console ? or in the shell ? Try to print to a file and check the result.
For example, windows shell is limited to "cp850" charset. The problem might be caused by the OS shell, not the JVM.

Character Encoding in Java

In eclipse, I changed the default encoding to ISO-8859-1. Then I wrote this:
String str = "Русский язык ";
PrintStream ps = new PrintStream(System.out, true, "UTF-8");
ps.print(str);
It should print the String correctly, as I am specifying UTF-8 encoding. However, it is not printing.
The ISO-8859-1 character encoding only supports characters between 0 and 255, and anything else is likely to be turned into '?'
If you save the source file (the .java file) as ISO-8859-1 than str will be encoded by javac using ISO-8859-1. Your problem does not lie in the creation of PrintStream: the str you are printing is wrong from the beginning.
Yes, it looks like the terminal that your are sending this output does not support this encoding.
If you are running Eclipse, you could set the encoding as follows:
In Run Configurations...->Common ->Encoding->Other
Select UTF-8
You are basically telling the PrintStream writer to expect the input characters to be UTF-8 encoded and to output it as UTF-8. There is no conversion. If you set your IDE to use ISO-8859-1 as character encoding for your file, which in turns contains the input string than you pipe ISO-8859-1 encoded characters into an UTF-8 expecting writer. So the writer treats the bytes receiving as UTF encoded characters which will result in data junk.
Either set your IDE to encode your source files in UTF-8 and check that your characters are correctly displayed and stored. Or tell your writer to treat them as ISO-8859-1, either way should do.

Categories

Resources