I am trying to retrieve some UTF-8 encoded Chinese characters from a database using a Java program. When I do this, the characters are returned as question marks.
However, when I display the characters from the database directly (using select * from ...), they are displayed normally. When I print a String of Chinese characters from a Java file, they are also printed normally.
I had this problem in Eclipse: when I ran the program, the characters were being printed as question marks. However, this problem was solved when I saved the Java file in UTF-8 format.
Running "locale" in the terminal currently returns this:
LANG="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_CTYPE="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_ALL=
I have also tried to compile my Java file using this:
javac -encoding UTF-8 [java file]
But still, the output is question marks.
It's quite strange how it will only sometimes display the characters. Does anyone have an explanation for this? Or even better, how to fix this so that the characters are correctly displayed?
The System.out PrintStream isn't created as a UTF-8 print stream. You can wrap it in one like this:
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class JavaTest {
    public static void main(String[] args) {
        try {
            PrintStream out = new PrintStream(System.out, true, "UTF-8");
            out.println("Hello");
            out.println("施华洛世奇");
            out.println("World");
        } catch (UnsupportedEncodingException UEE) {
            //Yada yada yada
        }
    }
}
You can also set the default encoding, as described here, with:
java -Dfile.encoding=UTF-8 -jar JavaTest.jar
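If you are not sure which encoding the JVM actually picked up, a quick check is to print the relevant system property and the default charset; a minimal sketch (the class name is made up for illustration):

import java.nio.charset.Charset;

public class EncodingCheck {
    public static void main(String[] args) {
        // Both of these should report UTF-8 if the flag above took effect
        System.out.println(System.getProperty("file.encoding"));
        System.out.println(Charset.defaultCharset());
    }
}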
I am experiencing some issues with Java Unicode output. I've tried multiple things and I now see the Unicode characters, but they are preceded by a diamond with a question mark inside.
Here is my test file created with notepad:
Here is the file working in notepad++:
Here is my cmd.exe output:
cmd font settings:
Run cmd /U, still no characters (I found that a different font was used here [Consolas], which is why it shows question marks in boxes):
Windows version:
I tried PowerShell as well, which seems to think it's a different encoding:
I wrote a small Java program, and that is able to print the Unicode, but in some cases with an extra character preceding it.
// Logger imports assumed to be Log4j 2; adjust if a different logging library is used
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

import java.nio.charset.Charset;

public class App {
    private static final Logger logger = LogManager.getLogger(App.class);

    public static void main(String[] args) {
        Charset utf8Charset = Charset.forName("UTF-8");
        Charset defaultCharset = Charset.defaultCharset();
        System.out.println(defaultCharset);
        System.out.println("A 😃");
        System.out.println("B ✔ ");
        System.out.println("C ❌");
    }
}
I then run this Java app with the following flag:
java -Dfile.encoding=UTF-8
Here is the output:
Why can my Java app print Unicode but not the cmd.exe directly?
How is java adding in characters cmd.exe doesn't know about?
What else can I test/try/change to get Unicode to behave better?
Notes:
I read that Lucida Console should work; I tried all the fonts in the list, and NSimSun showed the x and the checkmark, but not the emoji face. This font is a bit hard on my eyes, though.
When I opened the txt file in Word, if I selected all the text and set it to Consolas or Lucida Console, it seemed to change the emojis to Segoe UI, so maybe the fonts just don't support it - though I've seen other posts which suggest Lucida Console should work?
My code (Qwe.java):
public class Qwe {
    public static void main(String[] args) {
        System.out.println("тест привет");
    }
}
where
тест привет
are Russian words (roughly "test hello").
Qwe.java is saved in UTF-8.
On my machine (Ubuntu 14.04) the result is:
тест привет
On the server (Ubuntu 12.04) I get:
???? ??????
$ java Qwe > test.txt
In test.txt I see:
???? ??????
I fixed it just by using export JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF8
The Java source text must be in the same encoding that the javac compiler expects. That seems to have been the case here, and UTF-8 is of course ideal.
The compiled file Qwe.class is fine; internally, String uses Unicode. The output to the console uses the server's platform encoding. That is, Java converts the Unicode text to bytes using (probably) the default platform encoding, and that encoding cannot handle Cyrillic.
So if you write to a file, never use FileWriter (a convenience class that always uses the platform encoding, intended for local files only), but use:
... new OutputStreamWriter(new FileOutputStream(file), "UTF-8")
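As a slightly fuller sketch of that idea (the file name and text are placeholders, not from the original post):

import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class WriteUtf8 {
    public static void main(String[] args) throws Exception {
        // The encoding is stated explicitly instead of relying on the platform default
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream("out.txt"), StandardCharsets.UTF_8)) {
            out.write("тест привет");
        }
    }
}

The file will contain valid UTF-8 regardless of the server's locale; only the console display depends on the platform encoding.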
You can also set the user locale on the server, but that is beyond the scope of this answer.
In general I would switch to a file logger.
I am not sure, but it might only accept ASCII characters from the English language unless you have some extension or something. But like I said, my best guess is that it is not finding the characters and is just outputting garbage in their place.
"Java, any unknown character which is passed through the write() methods of an OutputStream get printed as a plain question mark “?”"
as taken from here
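That behaviour is easy to demonstrate: when text is encoded with a charset that cannot represent a character, the encoder substitutes its replacement byte, which for US-ASCII (and ISO-8859-1) is '?'. A minimal sketch, assuming nothing beyond the standard library:

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ReplacementDemo {
    public static void main(String[] args) {
        // Every Cyrillic letter is unmappable in US-ASCII, so each becomes '?' (0x3F)
        byte[] ascii = "тест".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Arrays.toString(ascii)); // [63, 63, 63, 63]

        // In UTF-8 each Cyrillic letter survives as a two-byte sequence
        byte[] utf8 = "тест".getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length); // 8
    }
}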
I was trying to make cmd display UTF-8 encoded text, and I was finally able to do that.
I wrote a class containing the following code in order to make java write characters encoded in UTF-8:
String text = "çşğüöıÇŞĞÜÖİ UTF-8 (65001)";
System.setOut(new PrintStream(System.out, true, "utf8"));
System.out.println(text);
In the command line, I entered the command chcp 65001 before running the class, in order to change the encoding setting of the command line.
Anyway, after doing all this I was finally able to print UTF-8 encoded characters. But I had a problem:
The output was supposed to look like this
çşğüöıÇŞĞÜÖİ UTF-8 (65001)
Instead the output was as follows:
çşğüöıÇŞĞÜÖİ UTF-8 (65001)TF-8 (65001)
It duplicates some of the characters and I couldn't figure out why.
It duplicates some of the characters and I couldn't figure out why.
Because the Windows console does not handle UTF-8 (code page 65001) properly.
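One way to confirm that the program itself produces correct UTF-8, and that the duplication happens in the console, is to send the same text to a file instead; a minimal sketch with a made-up file name:

import java.io.PrintStream;

public class FileCheck {
    public static void main(String[] args) throws Exception {
        String text = "çşğüöıÇŞĞÜÖİ UTF-8 (65001)";
        // Write the text through an explicit UTF-8 PrintStream to a file;
        // opened in Notepad++ it should appear once, without the duplicated tail
        try (PrintStream out = new PrintStream("check.txt", "UTF-8")) {
            out.println(text);
        }
    }
}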
I'm having a strange issue trying to write strings containing characters like "ñ", "á" and so on to text files. Let me first show you my little piece of code:
import java.io.*;

public class test {
    public static void main(String[] args) throws Exception {
        String content = "whatever";
        int c;
        c = System.in.read();
        content = content + (char) c;
        FileWriter fw = new FileWriter("filename.txt");
        BufferedWriter bw = new BufferedWriter(fw);
        bw.write(content);
        bw.close();
    }
}
In this example, I'm just reading a char from the keyboard input and appending it to a given string, then writing the final string into a txt file. The problem is that if I type an "ñ", for example (I have a Spanish-layout keyboard), when I check the txt file it shows a strange char "¤" where there should be an "ñ"; that is, the content of the file is "whatever¤". The same happens with "ç", "ú", etc. However, it writes it fine ("whateverñ") if I just forget about the keyboard input and write:
...
String content = "whateverñ";
...
or
...
content = content + "ñ";
...
It makes me think that there might be something wrong with the read() method? Or maybe I'm using it wrongly? Or should I use a different method to get the keyboard input? Or...? I'm a bit lost here.
(I'm using JDK 7u45 on Windows 7 Pro x64.)
So ...
It works (i.e. you can read the accented characters on the output file) if you write them as literal strings.
It doesn't work when you read them from System.in and then write them.
This suggests that the problem is on the input side. Specifically, I think your console / keyboard must be using a character encoding for the input stream that does not match the encoding that Java thinks should be used.
You should be able to confirm this tentative diagnosis by outputting the characters you are reading in hexadecimal, and then checking the codes against the Unicode tables (which you can find at unicode.org, for example).
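A minimal sketch of that diagnostic, reading raw bytes from System.in and printing them in hex so you can see which encoding the console is actually sending:

public class InputHexDump {
    public static void main(String[] args) throws Exception {
        // Type "ñ" and press Enter: you would typically see 0xA4 under CP850,
        // 0xF1 under ISO-8859-1, or the pair 0xC3 0xB1 under UTF-8
        int b;
        while ((b = System.in.read()) != -1) {
            System.out.printf("0x%02X ", b);
        }
        System.out.println();
    }
}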
It strikes me as "odd" that the "platform default encoding" appears to be working on the output side, but not the input side. Maybe someone else can explain ... and offer a concrete suggestion for fixing it. My gut feeling is that the problem is in the way your keyboard is configured, not in Java or your application.
Files do not remember their encoding; when you look at a .txt file, the text editor makes a "best guess" at the encoding used.
If you try to read the file back into your program, the text should be back to normal.
Also, try printing the "strange" character directly.
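For example, a quick sketch that prints the suspicious character together with its code point, so you can see what was actually stored:

public class CharProbe {
    public static void main(String[] args) {
        char c = '¤';
        // '¤' is U+00A4 (currency sign); 'ñ' would be U+00F1
        System.out.println(c + " = U+" + String.format("%04X", (int) c));
    }
}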
I'm running my Java program from Unix. To simplify matters, I describe only the relevant part.
public static void main(String[] args) {
    System.out.println("féminin");
}
My output is garbage. It is obviously a character-encoding problem; the French character é is not showing up correctly. I've tried the following:
public static void main(String[] args) throws UnsupportedEncodingException {
    PrintStream ps = new PrintStream(System.out, true, "ISO-8859-1");
    ps.println("féminin");
}
But my output still shows ? in place of the French character.
I ran the same file in the command prompt with java -Dfile.encoding=IBM437 DSIClient, and féminin showed up fine. But how can I resolve this character-encoding issue on Unix? Thanks
The problem is most likely that your code editor and your terminal emulator use different encodings, and Java's notion of the default encoding may in addition be different.
To see if your terminal and your editor agree, you could simply cat your Java source file. Does the é show up correctly? If so, you use the same encoding in your source code editor and your terminal, but it is not Java's default encoding. If, OTOH, you can't see the é, you need to find out what encoding is used by your terminal and your editor and bring them into agreement.
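To see what Java itself assumes, a small sketch like this (the class name is just illustrative) prints the default charset and then sends the same string through a couple of explicit encodings, so you can tell which one your terminal actually renders correctly:

import java.io.PrintStream;
import java.nio.charset.Charset;

public class TerminalEncodingProbe {
    public static void main(String[] args) throws Exception {
        // What Java believes the platform encoding is
        System.out.println("default charset: " + Charset.defaultCharset());

        // Print the same text in several encodings; the line that shows
        // "féminin" intact matches the encoding your terminal expects
        for (String enc : new String[] {"UTF-8", "ISO-8859-1"}) {
            PrintStream ps = new PrintStream(System.out, true, enc);
            ps.println(enc + ": féminin");
        }
    }
}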