Why is UnsupportedEncodingException not a subclass of RuntimeException? - java

I've come across the UnsupportedEncodingException while using the URLEncoder with the UTF-8 encoding and it forces me to write my code like this:
String foo = ... // contains unsafe characters
try {
foo = URLEncoder.encode(foo, "UTF-8");
} catch (UnsupportedEncodingException e) {
// do something with the exception that should never occur
}
Instead of just this:
String foo = ... // contains unsafe characters
foo = URLEncoder.encode(foo, "UTF-8");
The documentation of URLEncoder discourages the use of any encoding other than UTF-8:
Note: The World Wide Web Consortium Recommendation states that UTF-8 should be used. Not doing so may introduce incompatibilities.
And the UTF-8 encoding should always be available, at least according to the Supported Encodings page from the documentation.
The accepted answer to the question of how to handle the UnsupportedEncodingException, and whether it can even occur, is: "It cannot happen, unless there is something fundamentally broken in your JVM".
So I'm wondering, why does the UnsupportedEncodingException class not extend the RuntimeException class, which would allow me to use the second code snippet? Is it just because it exists as it is right now and it would be hard to change that?

If this were changed, some existing code could break. For example:
try {
... do something that could throw UnsupportedEncodingException
} catch (IOException e) {
... handle the exception
}
If UnsupportedEncodingException were no longer an IOException, it would no longer be handled.
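It is also worth noting that since Java 10, URLEncoder has an overload that takes a Charset and declares no checked exception, which removes the boilerplate entirely. A minimal sketch:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodeDemo {
    public static void main(String[] args) {
        // The Charset overload (Java 10+) declares no checked exception,
        // so no try/catch is needed for an encoding that is always present.
        String foo = URLEncoder.encode("a b&c", StandardCharsets.UTF_8);
        System.out.println(foo); // a+b%26c
    }
}
```

On older JVMs, wrapping the impossible exception in a RuntimeException (or using StandardCharsets where an API accepts a Charset) achieves the same effect.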

Related

PrintWriter the output file is not there although no exception thrown

I have just started learning Java and I have run into the following problem, which I have been struggling with for hours. I want to use PrintWriter in order to produce a simple text file.
I do not get any runtime exception, still the file is not appearing in the specified directory.
public class Main {
public static void main(String[] args) {
try (final PrintWriter writer = new PrintWriter(
new File("c:\test\new\notes.txt"))) {
writer.write("Test note");
} catch (IOException e) {
throw new RuntimeException(e);
}
}
}
What am I doing wrong?
\ represents an escape character, so it needs to be escaped itself to produce a literal backslash. You can also use / and Java will resolve the correct separator character for the platform:
try (final PrintWriter writer = new PrintWriter("c:\\test\\new\\notes.txt")) {
Add writer.flush() after writer.write("Test note"), and use double backslashes for Windows paths (as other answers are suggesting).
As Reimeus already said, \ is an escape character in java.
That means that a string containing "\n" or "\t" does not represent the literal text \n or \t!
'\n' represents the newline character and '\t' represents the TAB character!
For the better understanding, the following code:
System.out.println("c:\test\new\notes.txt");
would not print c:\test\new\notes.txt to the console, it would print
c: est
ew
otes.txt
to the console!
To be able to write the backslash in a string you'll need to use '\\'!
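To illustrate, a small sketch showing that doubled backslashes survive into the string value as single literal backslashes:

```java
public class EscapeDemo {
    public static void main(String[] args) {
        // Each "\\" in the source becomes one literal backslash in the string.
        String path = "c:\\test\\new\\notes.txt";
        System.out.println(path); // c:\test\new\notes.txt
        System.out.println(path.length()); // 21
    }
}
```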
I see your question as having 2 parts:
Why doesn't the code work?
Why was no exception thrown?
The first question has already been answered, but I think the answer to the second question is at least as important because your current code will still fail silently if there is any problem writing to the file.
From the documentation of PrintWriter (http://docs.oracle.com/javase/7/docs/api/java/io/PrintWriter.html):
Methods in this class never throw I/O exceptions, although some of its
constructors may. The client may inquire as to whether any errors have
occurred by invoking checkError().
Therefore it is essential that you call checkError() after every call to a PrintWriter method, or your code will not be reliable.
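A minimal sketch of that pattern, using a StringWriter as a stand-in sink (the file case works the same way):

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class CheckErrorDemo {
    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        try (PrintWriter writer = new PrintWriter(out)) {
            writer.write("Test note");
            // checkError() flushes the stream and reports whether any
            // I/O error has occurred; PrintWriter itself never throws one.
            if (writer.checkError()) {
                throw new IllegalStateException("write failed");
            }
        }
        System.out.println(out); // Test note
    }
}
```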

How can I tell whether the text I receive from a network/read from a file uses a given encoding?

I have a file, or I read from a socket; the data I read is supposed to be text encoded with a given character coding.
But even if I specify a coding and it turns out to be wrong in the end, the operation succeeds; instead of an exception of any sort, I get a lot of � in my text :/
Is there a way I can trigger a failure instead?
Yes there is.
First, some information: what is that pesky � character, really? Well, it is Unicode's "replacement character", code point U+FFFD.
Now, why do you get this? In order to explain this, we need to delve a little deeper into what happens...
First, a "formal" definition: a character coding is a process which defines a bijection between a stream of bytes and a stream of characters; as it is a bijection, it means that two operations are defined: encoding (turning a stream of characters into a stream of bytes) and decoding (turning a stream of bytes into a stream of characters).
In Java, a character coding is encompassed in a Charset; you can obtain an encoder using Charset.newEncoder(), and a decoder using Charset.newDecoder().
And of course, it can happen that in the decoding process, which is what is of interest here, a certain sequence of bytes turns out to be malformed, in which case the CharsetDecoder must decide what to do... And this behavior depends on CodingErrorAction, which has three values:
REPLACE (the default!!): replace any unmappable sequence with Unicode's replacement character!
IGNORE: scrap all unmappable sequences, don't output anything;
REPORT: throw an exception on an unmappable sequence...
Now, what we want in order to detect malformed inputs and throw an error is to REPORT them!
So, how do we do this given an InputStream? The solution is to use an InputStreamReader; it has a constructor allowing you to specify a CharsetDecoder as an argument. All you have to do is to create your decoder!
For instance, if you want to ensure correct UTF-8, you would do this:
final CharsetDecoder decoder = StandardCharsets.UTF_8
.newDecoder().onMalformedInput(CodingErrorAction.REPORT);
try (
final InputStreamReader reader = new InputStreamReader(in, decoder);
) {
// read from the reader here
}
The exception you want to catch here is a CharacterCodingException. Note that it inherits IOException, so you want to:
try (
...
) {
...
} catch (CharacterCodingException e) {
...
} catch (IOException e) {
...
}

Generating fatal error in Java

Suppose we are writing a Java library which provides some I/O utility functions, for example a convenient method to read text files as Strings:
public class StringReader {
private static final Log log = LogFactory.getLog(StringReader.class);
/**
* Returns the contents of file <b>fileName</b> as String.
* @param fileName file name to read
* @return null on IO error
*/
public static String readString(String fileName) {
FileInputStream fis = null;
try {
fis = new FileInputStream(fileName);
byte[] data = new byte[fis.available()];
fis.read(data);
return new String(data, "ISO-8859-1"); // may throw UnsupportedEncodingException!
} catch (UnsupportedEncodingException e) {
log.fatal("JRE does not support ISO-8859-1!", e);
// ???
} catch (IOException e) {
log.error("unable to read file", e);
} finally {
closeQuiet(fis);
}
return null;
}
}
This code reads a text file into a String using ISO-8859-1 encoding and returns the String to user.
The String(byte[], String) constructor throws an UnsupportedEncodingException when the specified encoding is not supported. But, as we know, ISO-8859-1 must be supported by the JRE, as stated here (see the Standard charsets section).
Hence, we expect the block
catch (UnsupportedEncodingException e) {
log.fatal("encoding is unsupported", e);
// ???
}
is never reached if the JRE distribution conforms to the standard.
But what if it doesn't? How to handle this exception in the most correct way?
The question is: how to report such an error properly?
The suggestions are:
Throw some kind of RuntimeException.
Do not disable the logger in production code; write the exception details to the log and ignore it.
Put assert false here, so it produces an AssertionError if the user launched the VM with -ea.
Throw an AssertionError manually.
Declare UnsupportedEncodingException on the method and let the caller choose. Not very convenient, I think.
Call System.exit(1).
Thanks.
But what if it doesn't?
Then you're in a really bad situation, and you should probably get out of it as quickly as possible. When a JRE is violating its own promises, what would you want to depend on?
I'd feel happy using AssertionError in this case.
It's important to note that not all unchecked exceptions are treated equally - it's not unusual for code to catch Exception at the top level of the stack, log an error and then keep going... if you just throw RuntimeException, that will be caught by such a scheme. AssertionError would only be caught if the catch block specified Throwable (or specifically Error or AssertionError, but that's much rarer to see). Given how impossible this should be, I think it's reasonable to abort really hard.
Also note that in Java 7, you can use StandardCharsets.ISO_8859_1 instead of the string name, which is cleaner and removes the problem.
There are other things I'd change about your code, by the way:
I would avoid using available() as far as possible. That tells you how many bytes are available right now - it doesn't tell you how long the file is, necessarily.
I would definitely not assume that read() will read the whole file in one go. Call read() in a loop, ideally until it says there's no more data.
I would personally accept a Charset as a parameter, rather than hard-coding ISO-8859-1.
I would let IOException bubble up from the method rather than just returning null. After all, unless you're really going to check the return value of every call for nullity, you're just going to end up with a NullPointerException instead, which is harder to diagnose than the original IOException.
Alternatively, just use Guava's Files.toString(File, Charset) to start with :) (If you're not already using Guava, now is a good time to start...)
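If Guava isn't an option, those same suggestions (take a Charset parameter, avoid available(), let IOException propagate) can be sketched with the JDK's own java.nio.file API, available since Java 7:

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadStringDemo {
    // Reads the whole file in one call and lets IOException
    // bubble up to the caller instead of returning null.
    public static String readString(Path file, Charset charset) throws IOException {
        return new String(Files.readAllBytes(file), charset);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".txt");
        Files.write(tmp, "hello".getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(readString(tmp, StandardCharsets.ISO_8859_1)); // hello
        Files.delete(tmp);
    }
}
```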
This is a rather common occurrence in code.
Unchecked exceptions are made for this. They shouldn't happen (which is why they are unchecked), but if they do, there is still an exception.
So, throw a RuntimeException that has the original Exception as the cause.
catch (UnsupportedEncodingException e) {
throw new RuntimeException(e); //should not happen
}
assert false; also throws an unchecked throwable (AssertionError), but assertions can be turned off, so I would recommend RuntimeException.

UTF-8 write xml successful

Today I faced a very interesting problem when trying to rewrite an XML file.
I have 3 ways to do this, and I want to know the best way and the reason for the problem.
I.
File file = new File(REAL_XML_PATH);
try {
FileWriter fileWriter = new FileWriter(file);
XMLOutputter xmlOutput = new XMLOutputter();
xmlOutput.output(document, System.out);
xmlOutput.output(document, fileWriter);
fileWriter.close();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
In this case I have a big problem with my app. After writing to the file in my own language I can't read anything. The file's encoding was changed to ANSI, and I get: javax.servlet.ServletException: javax.servlet.jsp.JspException: Invalid argument looking up property: "document.rootElement.children[0].children"
II.
File file = new File(REAL_XML_PATH);
XMLOutputter output=new XMLOutputter();
try {
output.output(document, new FileOutputStream(file));
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
In this case I have no problems. The encoding wasn't changed. No problems with reading or writing.
And this article http://tripoverit.blogspot.com/2007/04/javas-utf-8-and-unicode-writing-is.html
Again: I want to know the best way and the reason for the problem.
Well, this looks like the problem:
FileWriter fileWriter = new FileWriter(file);
That will always use the platform default encoding, which is rarely what you want. Suppose your default encoding is ISO-8859-1. If your document declares itself to be encoded in UTF-8, but you actually write everything in ISO-8859-1, then your file will be invalid if you have any non-ASCII characters - you'll end up writing them out with the ISO-8859-1 single byte representation, which isn't valid UTF-8.
I would actually provide a stream to XMLOutputter rather than a Writer. That way there's no room for conflict between the encoding declared by the file and the encoding used by the writer. So just change your code to:
FileOutputStream fileOutput = new FileOutputStream(file);
...
xmlOutput.output(document, fileOutput);
... as I now see you've done in your second bit of code. So yes, this is the preferred approach. Here, the stream makes no assumptions about the encoding to use, because it's just going to handle binary data. The XML writing code gets to decide what that binary data will be, and it can make sure that the character encoding it really uses matches the declaration at the start of the file.
You should also clean up your exception handling - don't just print a stack trace and continue on failure, and call close in a finally block instead of at the end of the try block. If you can't genuinely handle an exception, either let it propagate up the stack directly (potentially adding throws clauses to your method) or catch it, log it and then rethrow either the exception or a more appropriate one wrapping the cause.
If I remember correctly, you can force your xmlOutputter to use a "pretty" format with:
new XMLOutputter(Format.getPrettyFormat()), so it should work with option I too.
getPrettyFormat() is documented as:
Returns a new Format object that performs whitespace beautification
with 2-space indents, uses the UTF-8 encoding, doesn't expand empty
elements, includes the declaration and encoding, and uses the default
entity escape strategy. Tweaks can be made to the returned Format
instance without affecting other instances.
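For reference, the stream-over-writer principle can also be shown with the JDK's built-in Transformer API rather than JDOM; this is a hedged sketch of the same idea, not the JDOM code from the question:

```java
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class XmlStreamDemo {
    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("root"));

        // Writing to an OutputStream lets the XML machinery control the
        // byte encoding, so the declaration and the bytes cannot disagree.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new DOMSource(doc), new StreamResult(out));
        System.out.println(out.toString("UTF-8"));
    }
}
```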

StringBufferInputStream Question in Java

I want to read an input string and return it as a UTF-8 encoded string. So I found an example on the Oracle/Sun website that used FileInputStream. I didn't want to read a file, but a string, so I changed it to StringBufferInputStream and used the code below. The method parameter jtext is some Japanese text. Actually this method works great. The question is about the deprecated code. I had to put @SuppressWarnings because StringBufferInputStream is deprecated. I want to know: is there a better way to get a string input stream? Is it OK just to leave it as is? I've spent so long trying to fix this problem that I don't want to change anything now I seem to have cracked it.
@SuppressWarnings("deprecation")
private String readInput(String jtext) {
StringBuffer buffer = new StringBuffer();
try {
StringBufferInputStream sbis = new StringBufferInputStream (jtext);
InputStreamReader isr = new InputStreamReader(sbis,
"UTF8");
Reader in = new BufferedReader(isr);
int ch;
while ((ch = in.read()) > -1) {
buffer.append((char)ch);
}
in.close();
return buffer.toString();
} catch (IOException e) {
e.printStackTrace();
return null;
}
}
I think I found a solution - of sorts:
private String readInput(String jtext) {
String n;
try {
n = new String(jtext.getBytes("8859_1"));
return n;
} catch (UnsupportedEncodingException e) {
return null;
}
}
Before, I was desperately using getBytes("UTF8"). But by chance I used Latin-1 ("8859_1") and it worked. Why it worked, I can't fathom. This is what I did step-by-step:
OpenOffice CSV(utf8)------>SQLite(utf8, apparently)------->java encoded as Latin-1, somehow readable.
The reason that StringBufferInputStream is deprecated is because it is fundamentally broken ... for anything other than Strings consisting entirely of Latin-1 characters. According to the javadoc it "encodes" characters by simply chopping off the top 8 bits! You don't want to use it if your application needs to handle Unicode, etc correctly.
If you want to create an InputStream from a String, then the correct way to do it is to use String.getBytes(...) to turn the String into a byte array, and then wrap that in a ByteArrayInputStream. (Make sure that you choose an appropriate encoding!).
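A sketch of that correct replacement; the explicit charset choice (here StandardCharsets.UTF_8) is the key difference from the deprecated class:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StringToStreamDemo {
    public static void main(String[] args) throws IOException {
        String text = "héllo";
        // Encode explicitly, then wrap the bytes; this round-trips
        // correctly for any Unicode text, unlike StringBufferInputStream,
        // which simply chops each char down to its low 8 bits.
        InputStream in = new ByteArrayInputStream(
                text.getBytes(StandardCharsets.UTF_8));
        byte[] buf = in.readAllBytes(); // Java 9+
        System.out.println(new String(buf, StandardCharsets.UTF_8)); // héllo
    }
}
```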
But your sample application immediately takes the InputStream, converts it to a Reader and then wraps it in a BufferedReader. If this is your real aim, then a simpler and more efficient approach is simply this:
Reader in = new StringReader(text);
This avoids the unnecessary encoding and decoding of the String, and also the "buffer" layer which serves no useful purpose in this case.
(A buffered stream is much more efficient than an unbuffered stream if you are doing small I/O operations on a file, network or console stream. But for a stream that is served from an in-memory data structure the benefits are much smaller, and possibly even negative.)
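A minimal sketch of the StringReader approach, running the question's read loop against an in-memory string:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class StringReaderDemo {
    public static void main(String[] args) throws IOException {
        String text = "日本語テキスト";
        StringBuilder buffer = new StringBuilder();
        try (Reader in = new StringReader(text)) {
            int ch;
            while ((ch = in.read()) != -1) {
                buffer.append((char) ch);
            }
        }
        // No encode/decode round trip: a String is already a char sequence.
        System.out.println(buffer.toString().equals(text)); // true
    }
}
```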
FOLLOWUP
I realized what you are trying to do now ... work around a character encoding / decoding issue.
My advice would be to try to figure out definitively the actual encoding of the character data that is being delivered by the database, then make sure that the JDBC drivers are configured to use the same encoding. Trying to undo the mis-translation by encoding with one encoding and decoding with another is dodgy, and can give you only a partial correction of the problems.
You also need to consider the possibility that the characters got mangled on the way into the database. If this is the case, then you may be unable to de-mangle them.
Is this what you are trying to do? Here is a previous answer on a similar question. I am not sure why you want to convert a String into exactly the same String.
A Java String holds a sequence of chars in which each char represents a Unicode number. So it is possible to construct the same string from two different byte sequences: say, one encoded with UTF-8 and the other encoded with US-ASCII.
If you want to write it to a file, you can always convert it with String.getBytes("encoding").
private static String readInput(String jtext) {
byte[] bytes = jtext.getBytes();
try {
String string = new String(bytes, "UTF-8");
return string;
} catch (UnsupportedEncodingException ex) {
// do something
return null;
}
}
Update
Here is my assumption.
According to your comment, your SQLite DB stores text values using one encoding, say, UTF-16. For some reason, your SQLite API cannot determine which encoding it used to turn the Unicode values into sequences of bytes.
So when you use the getString method from your SQLite API, it reads a set of bytes from your DB and converts them into a Java String using the wrong encoding. If this is the case, you should use the getBytes method and reconstruct the String yourself, i.e. new String(bytes, "encoding used in your DB"). If your DB is stored in UTF-16, then new String(bytes, "UTF-16") should be readable.
Update
I wasn't talking about the getBytes method on the String class. I was talking about the getBytes method on your SQL result object, e.g. result.getBytes(String columnLabel).
ResultSet result = .... // from SQL query
String readableString = readInput(result.getBytes("my_table_column"));
You will need to change the signature of your readInput method to
private static String readInput(byte[] bytes) {
try {
// change encoding to your DB encoding.
// this can be UTF-8, UTF-16, 8859_1, etc.
String string = new String(bytes, "UTF-8");
return string;
} catch (UnsupportedEncodingException ex) {
// fall back: at least return the (possibly garbled) text
return new String(bytes, "UTF-8");
}
}
Whatever encoding you set here that makes your String readable is definitely the encoding of your column in the DB. This involves no unexplainable phenomenon, and you know exactly what your column encoding is.
But it will be good to config your JDBC driver to use the correct encoding so that you will not need to use this readInput method to convert.
If no encoding makes your string readable, you will need to consider the possibility that the characters got mangled when they were written to the DB, as @Stephen C said. If this is the case, using a workaround like this may cause you to lose some of the characters during conversion. You will also need to solve the encoding problem during writing as well.
The StringReader class is the new alternative to the deprecated StringBufferInputStream class.
However, you state that what you actually want to do is take an existing String and return it encoded as UTF-8. You should be able to do that much more simply I expect. Something like:
s8 = new String(jtext.getBytes("UTF8"));
