Special character encoding (PC8) in Java file writing

I need to write a file with Java in PC8 character encoding. How can a 'custom' character set be applied to a (text) file?
This is what I'm trying to do:
try {
    Writer writer = new BufferedWriter(new OutputStreamWriter(
            new FileOutputStream("test.txt"), "utf-8")); // obviously need to change this
    String info = "#TEST \"test åäö\"";
    writer.write(info);
    writer.close();
} catch (Exception e) {
    System.out.println(e);
}
So I'd like to know whether it is even possible to write with such a character encoding, and if so, what I need to do. Specifying "PC-8" or "PC8" as the encoding name did not work.

I found the answer while writing the question itself. Here is the list of supported character encodings for Java, and how to specify them in the code block I provided in the question: http://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html
So this is what works for me:
writer = new BufferedWriter(new OutputStreamWriter(
        new FileOutputStream("test.txt"), "ibm-437")); // IBM437 / Cp437 is the code page usually called "PC-8"
What caused problems in my case was that Google results were littered with questions about UTF-8 encoding. Furthermore, PC8 is not the official name for the character encoding, so I couldn't find the needed information under that name. Hope this helps generally with encoding problems.
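For reference, here is a minimal sketch that checks the charset is available and writes with it. "IBM437"/"Cp437" are the JDK's registered names for code page 437; availability depends on your JRE shipping the extended charsets:
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

public class Pc8WriteDemo {
    public static void main(String[] args) throws Exception {
        // Charset.forName throws UnsupportedCharsetException for unknown
        // names, which is why "PC-8" fails: it is not a registered alias.
        Charset cp437 = Charset.forName("IBM437");
        try (Writer writer = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream("test.txt"), cp437))) {
            writer.write("#TEST \"test åäö\""); // å, ä, ö all exist in CP437
        }
    }
}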

Related

FreeMarker special character output as question mark

I am trying to submit a form with fields containing special characters, such as €ŠšŽžŒœŸ. As far as I can see from the ISO-8859-15 Wikipedia page, these characters are included in the standard. Even though the encoding for both request and response is set to ISO-8859-15, when I try to display the values (using FreeMarker 2.3.18 in a Java EE environment), they come out as ???????. I have set the form's accepted charset to ISO-8859-15 and checked (using Firebug) that the form is submitted with content type text/html;charset=ISO-8859-15, but I can't figure out how to display the correct characters. If I run the following code, the correct hex value is displayed (e.g. Ÿ = be).
What am I missing? Thank you in advance!
System.out.println(Integer.toHexString(myString.charAt(i)));
EDIT:
This is the code I have as I process the request:
PrintStream ps = new PrintStream(System.out, true, "ISO-8859-15");
String firstName = request.getParameter("firstName");
// check for null before
for (int i = 0; i < firstName.length(); i++) {
    ps.println(firstName.charAt(i)); // prints "?"
}
BufferedWriter file = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(path), "ISO-8859-15"));
file.write(firstName); // writes "?" to file (checked with Notepad++, correct encoding set)
file.close();
According to the hex value, the form data is submitted correctly.
The problem seems to be related to the output. Java replaces a character with ? if it cannot be represented with the charset in use.
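You can see this replacement behavior in isolation (a tiny demo, independent of FreeMarker; 'Ž' exists in ISO-8859-15 but not in ISO-8859-1):
import java.nio.charset.StandardCharsets;

public class ReplacementDemo {
    public static void main(String[] args) {
        // Encoding a character the charset cannot represent yields the
        // replacement byte '?' instead of throwing an exception.
        byte[] bytes = "Ž".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1)); // prints "?"
    }
}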
You have to use the correct charset when constructing the output writer. What calls do you use for that? I do not know FreeMarker, but there is probably something like
Writer out = new OutputStreamWriter(System.out);
This should be replaced with something like:
Writer out = new OutputStreamWriter(System.out, "ISO-8859-15");
By the way, UTF-8 is usually a much better choice of encoding.
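Not FreeMarker-specific, but in a servlet container the same rule applies at two points: before reading parameters and before obtaining the response writer. A minimal sketch (a plain HttpServlet; class and parameter names are made up):
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class FormServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws IOException {
        // Must run before the first getParameter() call, or the container
        // decodes the POST body with its own default encoding.
        request.setCharacterEncoding("ISO-8859-15");
        String firstName = request.getParameter("firstName");

        // Must run before getWriter(); the writer is created with whatever
        // encoding is in effect at that moment.
        response.setContentType("text/html;charset=ISO-8859-15");
        response.getWriter().println(firstName);
    }
}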

Error when reading non-English language character from file

I am building an app where users have to guess a secret word, and I have *.txt files in the assets folder. The problem is that the words are in Albanian. Our language uses letters like "ë" and "ç", so whenever I read a word containing any of those characters from the file I get some wicked symbol, and I cannot use string.compare() for these characters. I have tried many options with UTF-8 and changed Eclipse settings, but I still get the same error.
I would really appreciate it if someone has any advice.
The code I use to read the files is:
AssetManager am = getAssets();
String strOpenFile = "fjalet.txt";
InputStream fins = am.open(strOpenFile);
BufferedReader reader = new BufferedReader(new InputStreamReader(fins));
ArrayList<String> stringList = new ArrayList<String>();
String aDataRow;
String aBuffer = "";
while ((aDataRow = reader.readLine()) != null) {
    aBuffer += aDataRow + "\n";
    stringList.add(aDataRow);
}
Otherwise the code works fine, except for the characters mentioned.
It seems pretty clear that the default encoding that is in force when you create the InputStreamReader does not match the file.
If the file you are trying to read is UTF-8, then this should work:
reader = new BufferedReader(new InputStreamReader(fins, "UTF-8"));
If the file is not UTF-8, then that won't work. Instead you should use the name of the file's true encoding. (My guess is that it is in ISO/IEC 8859-1 or ISO/IEC 8859-16.)
Once you have figured out what the file's encoding really is, you need to try to understand why it does not correspond to your Java platform's default encoding ... and then make a pragmatic decision on what to do about it. (Should you hard-wire the encoding into your application ... as above? Should you make it a configuration property or command parameter? Should you change the default encoding? Should you change the file?)
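To see which default your Java platform is actually using (a quick check, not a fix):
import java.nio.charset.Charset;

public class DefaultCharsetCheck {
    public static void main(String[] args) {
        // This is the encoding FileReader and a bare InputStreamReader
        // will silently use when none is specified.
        System.out.println(Charset.defaultCharset());
    }
}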
You need to determine the character encoding that was used when creating the file, and specify this encoding when reading it. If it's UTF-8, for example, use
reader = new BufferedReader(new InputStreamReader(fins, "UTF-8"));
or
reader = new BufferedReader(new InputStreamReader(fins, StandardCharsets.UTF_8));
if you're on Java 7 or newer.
Text editors like Notepad++ have good heuristics to guess what the encoding of a file is. Try opening it with such an editor and see which encoding it has guessed (if the characters appear correctly).
You should know the encoding of the file.
The InputStream class reads the file as binary. Although you can interpret the input as characters, that is implicit guessing, which may be wrong.
The InputStreamReader class converts binary to chars, but it needs to know the character set.
You should use the constructor variant shown above that takes a character set explicitly.
UPDATE
Don't assume you have a UTF-8-encoded file; that may be wrong. Here in Russia we have encodings such as CP866, WIN1251 and KOI8, which all differ from UTF-8. Probably your file is in some popular Albanian text-file encoding. Check your OS settings to guess.
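Putting this together for the Android assets case in the question, a minimal sketch (it assumes the file really is UTF-8; substitute whatever encoding you identify):
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import android.content.res.AssetManager;

// inside an Activity or other Context:
List<String> readWords(AssetManager am) throws IOException {
    List<String> words = new ArrayList<String>();
    // The explicit charset removes the dependency on the platform default.
    BufferedReader reader = new BufferedReader(
            new InputStreamReader(am.open("fjalet.txt"), "UTF-8"));
    try {
        String line;
        while ((line = reader.readLine()) != null) {
            words.add(line);
        }
    } finally {
        reader.close();
    }
    return words;
}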

UTF-8 write xml successful

Today I faced a very interesting problem when trying to rewrite an XML file.
I have three ways to do this, and I want to know the best way and the reason for the problem.
I.
File file = new File(REAL_XML_PATH);
try {
    FileWriter fileWriter = new FileWriter(file);
    XMLOutputter xmlOutput = new XMLOutputter();
    xmlOutput.output(document, System.out);
    xmlOutput.output(document, fileWriter);
    fileWriter.close();
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
In this case I have a big problem with my app. After writing the file in my own language I can't read anything; the file's encoding was changed to ANSI, and I get: javax.servlet.ServletException: javax.servlet.jsp.JspException: Invalid argument looking up property: "document.rootElement.children[0].children"
II.
File file = new File(REAL_XML_PATH);
XMLOutputter output = new XMLOutputter();
try {
    output.output(document, new FileOutputStream(file));
} catch (FileNotFoundException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
In this case I have no problems. The encoding wasn't changed, and there are no problems with reading or writing.
III. And there is the approach from this article: http://tripoverit.blogspot.com/2007/04/javas-utf-8-and-unicode-writing-is.html
Well, this looks like the problem:
FileWriter fileWriter = new FileWriter(file);
That will always use the platform default encoding, which is rarely what you want. Suppose your default encoding is ISO-8859-1. If your document declares itself to be encoded in UTF-8, but you actually write everything in ISO-8859-1, then your file will be invalid if you have any non-ASCII characters - you'll end up writing them out with the ISO-8859-1 single byte representation, which isn't valid UTF-8.
I would actually provide a stream to XMLOutputter rather than a Writer. That way there's no room for conflict between the encoding declared by the file and the encoding used by the writer. So just change your code to:
FileOutputStream fileOutput = new FileOutputStream(file);
...
xmlOutput.output(document, fileOutput);
... as I now see you've done in your second bit of code. So yes, this is the preferred approach. Here, the stream makes no assumptions about the encoding to use, because it's just going to handle binary data. The XML writing code gets to decide what that binary data will be, and it can make sure that the character encoding it really uses matches the declaration at the start of the file.
You should also clean up your exception handling - don't just print a stack trace and continue on failure, and call close in a finally block instead of at the end of the try block. If you can't genuinely handle an exception, either let it propagate up the stack directly (potentially adding throws clauses to your method) or catch it, log it and then rethrow either the exception or a more appropriate one wrapping the cause.
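A sketch of that cleanup combined with the stream-based output (assuming JDOM 2 package names; for JDOM 1, drop the "2" from the packages):
import java.io.FileOutputStream;
import java.io.IOException;
import org.jdom2.Document;
import org.jdom2.output.XMLOutputter;

void writeXml(Document document, String path) throws IOException {
    XMLOutputter xmlOutput = new XMLOutputter();
    // try-with-resources closes the stream even when output() throws,
    // and the IOException propagates instead of being swallowed.
    try (FileOutputStream fileOutput = new FileOutputStream(path)) {
        xmlOutput.output(document, fileOutput);
    }
}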
If I remember correctly, you can force your XMLOutputter to use a "pretty" format with new XMLOutputter(Format.getPrettyFormat()), so it should work with approach I too.
getPrettyFormat() is documented as:
Returns a new Format object that performs whitespace beautification with 2-space indents, uses the UTF-8 encoding, doesn't expand empty elements, includes the declaration and encoding, and uses the default entity escape strategy. Tweaks can be made to the returned Format instance without affecting other instances.

Special characters when run in netbeans are showing correctly, but when running "jar" file strange characters appear

Seems like a simple problem, but even after searching the forum and the web I could not find an answer.
When I run my program in NetBeans, all the special characters like ä, ö, ü show correctly. But when I run the "jar" file of the same project (I did clean and rebuild), strange characters such as #A &$ and so on appear instead of the correct characters.
Any help would be appreciated.
//edited 22. 08. 2012 00:46
I thought the solution would be simpler, so I didn't post any code or details at first. OK then:
//input file is in UTF-8
try {
    BufferedReader in = new BufferedReader(new FileReader("fin.dir"));
    String line;
    while ((line = in.readLine()) != null) {
        processLine(line, 0);
    }
    in.close();
} catch (FileNotFoundException ex) {
    System.out.println(ex.getMessage());
} catch (IOException ex) {
    System.out.println(ex.getMessage());
}
I am displaying characters in this way:
JOptionPane.showMessageDialog(rootPane, "Correct!\n\n"
        + testingFin.getWord(), "Congrats", 1);
From the description of FileReader:
Convenience class for reading character files. The constructors of
this class assume that the default character encoding and the default
byte-buffer size are appropriate. To specify these values yourself,
construct an InputStreamReader on a FileInputStream.
If you're on Windows, the default encoding is typically windows-1252 (not UTF-8), so as Jon commented, the encoding problem is occurring on input. Try this:
in = new BufferedReader(
        new InputStreamReader(new FileInputStream("fin.dir"), "UTF-8"));
Add this setting to your NetBeans configuration in YOURNETBEANS/etc/netbeans.conf, like this:
-J-Dfile.encoding=UTF-8
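When the jar is run outside the IDE, the netbeans.conf setting does not apply; the same default can be set on the java command line instead (a workaround; specifying the encoding explicitly in code, as shown above, is more robust):
java -Dfile.encoding=UTF-8 -jar YourApp.jar
Here YourApp.jar stands for whatever your build produces.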

How do I append a UTF-8 string to a properties file

How do I append a UTF-8 string to a properties file? I have given the code below.
public static void addNewAppIdToRootFiles() {
    Properties properties = new Properties();
    try {
        FileInputStream fin = new FileInputStream("C:\\Users\\sarika.sukumaran\\Desktop\\root\\root.properties");
        properties.load(new InputStreamReader(fin, Charset.forName("UTF-8")));
        String propertyStr = new String(("قسيمات").getBytes("iso-8859-1"), "UTF-8");
        BufferedWriter bw = new BufferedWriter(new FileWriter(directoryPath + rootFiles, true));
        bw.write(propertyStr);
        bw.newLine();
        bw.flush();
        bw.close();
        fin.close();
    } catch (Exception e) {
        System.out.println("Exception : " + e);
    }
}
But when I open the file, the string "قسيمات" that I wrote shows as "??????". Please help me.
OK, your first mistake is getBytes("iso-8859-1"). You should not do these manipulations at all. If you want to write Unicode text to a file, you should open the file and write the text. The internal representation of strings in Java is Unicode, so everything will be written correctly.
You do have to care about the charset when you are reading a file; BTW, you do that correctly.
But you do not have to use file-manipulation tools to append something to a properties file. You can just call prop.setProperty("yourkey", "yourvalue") and then call prop.store(new FileOutputStream(yourfilename)).
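A sketch of that approach; note that store(OutputStream, ...) writes ISO-8859-1 and automatically escapes everything else as \uXXXX, so non-Latin text round-trips even though the file itself stays Latin-1 (the file name is a placeholder):
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.Properties;

public class AppendProperty {
    public static void main(String[] args) throws Exception {
        Properties prop = new Properties();
        try (FileInputStream in = new FileInputStream("root.properties")) {
            prop.load(in); // load() likewise assumes ISO-8859-1 plus \uXXXX escapes
        }
        prop.setProperty("yourkey", "قسيمات"); // written out as escape sequences
        try (FileOutputStream out = new FileOutputStream("root.properties")) {
            prop.store(out, null);
        }
    }
}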
OK, I have checked the specification for the Properties class. If you use the load() method with an input stream or the store() method with an output stream, the stream is assumed to be in ISO-8859-1 encoding by default. Therefore, you have to be cautious about a few things:
Some characters in French, German and Portuguese are ISO-8859-1 (Latin-1) compatible and normally work fine in ISO-8859-1, so you don't have to worry that much. But others, like Arabic and Hebrew characters, are not Latin-1 compatible, so you need to be careful with the choice of encoding for them. If you have a mix of French and Arabic characters, you have no choice but to use Unicode.
What is the encoding of your current input file, if it already exists and is to be used with the load() method? If it is not the default ISO-8859-1, you need to figure out what it is before opening the file. If the input file is UTF-8, use properties.load(new InputStreamReader(new FileInputStream("infile"), "UTF-8")); Then stick to this encoding till the end, and match the file encoding with the character encoding as well.
If it is a new input file to be used with the load() method, choose a file encoding that works with your characters' encoding, and stick to it till the end.
Your expected output file's encoding should match what was used with the load() method before you call store(). If it is not the default ISO-8859-1, you need to figure out what it is before saving the file, stick to this encoding till the end, and match the file encoding with the character encoding as well. If the output file is UTF-8, then specifically use UTF-8 when saving it. But if store() still ends up producing an ISO-8859-1 output file, you need to do what is suggested next...
If you stick to the default ISO-8859-1, it works fine for characters like the French ones. But if the characters are not ISO-8859-1 (Latin-1) compatible, you need to use Unicode escape sequences instead, for example \uFE94 for the Arabic ﺔ character. For me this escaping is too tedious, and normally we use the native2ascii utility shipped with the JRE or JDK to convert a properties file from one encoding to another. Of course, there are other ways... just check the references below. For me, it is better to use a properties file in XML format, since that is UTF-8 by default...
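As a concrete illustration of that last point, Properties also has loadFromXML()/storeToXML(), and storeToXML() writes UTF-8 unless told otherwise (a minimal sketch; the file name is made up):
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.Properties;

public class XmlPropertiesDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("greeting", "قسيمات"); // Arabic survives as-is in UTF-8 XML

        try (FileOutputStream out = new FileOutputStream("root.xml")) {
            props.storeToXML(out, "app ids"); // UTF-8 by default
        }

        Properties loaded = new Properties();
        try (FileInputStream in = new FileInputStream("root.xml")) {
            loaded.loadFromXML(in);
        }
        System.out.println(loaded.getProperty("greeting"));
    }
}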
References:
Java properties UTF-8 encoding in Eclipse
Setting the default Java character encoding?
