PrintWriter using a StandardCharset - java

If I were to create an InputStreamReader with the following code,
new InputStreamReader(anInputStream, "UTF-8")
I would have to catch UnsupportedEncodingException, which is reasonable. I can avoid this by using
new InputStreamReader(anInputStream, StandardCharsets.UTF_8)
which doesn't throw UnsupportedEncodingException as the charset is already known to be valid. All good so far.
Now enter its counterpart, the PrintWriter:
new PrintWriter("filename", StandardCharsets.UTF_8)
doesn't compile because the PrintWriter constructor doesn't take a Charset argument. I can do this:
new PrintWriter("filename", StandardCharsets.UTF_8.name())
but then I can't avoid having to catch UnsupportedEncodingException, even though the charset name has just come from a valid charset.
The StandardCharsets utility class was added later on in Java's lifetime, and when Sun added it, they also added an overload to the InputStreamReader constructor. Why did they add an overload to InputStreamReader but not PrintWriter?
Is there another class I can use instead, which takes a charset instead of a charset name?

The counterpart to InputStreamReader is not PrintWriter.
Use OutputStreamWriter instead.
If you want to use PrintWriter, you can wrap the writer: new PrintWriter(new OutputStreamWriter(anOutputStream, StandardCharsets.UTF_8));

The counterpart of java.io.InputStreamReader is java.io.OutputStreamWriter, not java.io.PrintWriter.
That said, you can create the PrintWriter safely like this:
Reader reader = new InputStreamReader(anyInputStream, StandardCharsets.UTF_8);
Writer writer = new OutputStreamWriter(anyOutputStream, StandardCharsets.UTF_8);
PrintWriter printWriter = new PrintWriter(writer);

"but then I can't avoid having to catch UnsupportedEncodingException, even though the charset name has just come from a valid charset."
Which makes sense, right? The argument is still just a String, so the constructor has to declare the exception.
As suggested by Stewart, using the java.io.OutputStreamWriter would be the way to go.
new PrintWriter(new OutputStreamWriter(anOutputStream, StandardCharsets.UTF_8), isAutoFlush)
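A minimal sketch of the wrapper approach from the answers above. The class name Utf8PrintWriterDemo and the helper encode are hypothetical, and a ByteArrayOutputStream stands in for the real destination stream. Newer JDKs (Java 10 onward) also provide PrintWriter constructors that take a Charset directly; the sketch avoids them so that it also compiles on older versions.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original answers.
final class Utf8PrintWriterDemo {

    // Writes the text through a UTF-8 PrintWriter and returns the bytes produced.
    static byte[] encode(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // The Charset overload of OutputStreamWriter declares no checked exception,
        // so no UnsupportedEncodingException has to be caught here.
        try (PrintWriter writer = new PrintWriter(
                new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
            writer.print(text);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) takes two bytes in UTF-8.
        byte[] bytes = encode("\u00e9");
        System.out.println(bytes.length + " bytes"); // prints "2 bytes"
    }
}
```

Closing the PrintWriter also flushes and closes the wrapped OutputStreamWriter and the underlying stream, which is why the bytes are complete once the try block ends.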

Related

Wrong encoding of non-English String

I am trying to send a message that contains English and Russian text, but the Russian text is displayed as "?? ???????"
PrintWriter writer = new PrintWriter(clientSocket.getOutputStream());
writer.println("English" + "На русском");
writer.flush();
Is there a way to fix this?
So, your problem has nothing to do with IntelliJ IDEA.
The PrintWriter constructor that only accepts an OutputStream creates a PrintWriter that makes use of the default character encoding of the JVM. You can check what the default character encoding of the JVM is by invoking Charset.defaultCharset(). However, you should not rely on it having any particular value, either at the sending end, or at the receiving end. It is best to either set the default character encoding of the JVM, or to supply a specific character encoding when creating your PrintWriter. The following should do it:
Charset charset = StandardCharsets.UTF_8;
OutputStreamWriter osw = new OutputStreamWriter( outputStream, charset );
PrintWriter printWriter = new PrintWriter( new BufferedWriter( osw ) );
If you are in control of both the sending side and the receiving side, then you will probably also have to make the corresponding change on the other side, specifying the same Charset when you wrap the InputStream in a reader.
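As a rough illustration of matching the charset on both sides, here is a small round trip. The class CharsetRoundTripDemo and its send and receive methods are hypothetical names, and in-memory byte arrays stand in for the socket streams. The sample string is a Unicode-escaped version of mixed English and Russian text.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; byte arrays replace the real socket streams.
final class CharsetRoundTripDemo {

    // Sending side: encode one line explicitly as UTF-8.
    static byte[] send(String message) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (PrintWriter writer = new PrintWriter(new BufferedWriter(
                new OutputStreamWriter(out, StandardCharsets.UTF_8)))) {
            writer.println(message);
        }
        return out.toByteArray();
    }

    // Receiving side: decode with the same charset the sender used.
    static String receive(byte[] data) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Escaped Cyrillic keeps the source file independent of its own encoding.
        String original = "English \u041d\u0430 \u0440\u0443\u0441\u0441\u043a\u043e\u043c";
        System.out.println(original.equals(receive(send(original)))); // prints "true"
    }
}
```

If the receiver decoded with a different charset than the sender, the Cyrillic characters would not survive the trip, which is the symptom described in the question.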

writing to file: the difference between stream and writer

Hi, I am a bit confused about which stream to use to write to a text file.
I have seen some examples:
one uses PrintWriter
PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter(fname)));
out.println(/*something to write*/);
out.close();
while another uses PrintStream:
PrintStream out = new PrintStream(new FileOutputStream(fname));
out.println(/*something to write*/);
What is the difference? Do both write to the file with the same result?
PrintWriter is new as of Java 1.1; it is more capable than the PrintStream class. You should use PrintWriter instead of PrintStream because it uses the default encoding scheme to convert characters to bytes for an underlying OutputStream. The constructors for PrintStream were deprecated in Java 1.1 (they are not deprecated in current Java versions). In fact, the whole class probably would have been deprecated, except that it would have generated a lot of compilation warnings for code that uses System.out and System.err.
PrintWriter is for writing text, whereas PrintStream is for writing data - raw bytes. PrintWriter may change the encoding of the bytes to make handling text easier, so it might corrupt your data.
PrintWriter extends the class Writer, which is designed for writing characters, while PrintStream extends OutputStream (via FilterOutputStream), an abstract class for more general byte output.
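One way to see the byte versus character distinction is to pass the same int to write(int) on each class. This is only a sketch: the class BytesVersusCharsDemo and its methods are hypothetical, and the PrintWriter is given an explicit UTF-8 encoder so the result does not depend on the JVM default.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintStream;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class contrasting raw bytes with encoded characters.
final class BytesVersusCharsDemo {

    // PrintStream.write(int) emits the value as a single raw byte.
    static byte[] viaPrintStream(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (PrintStream stream = new PrintStream(out)) {
            stream.write(value);
        }
        return out.toByteArray();
    }

    // PrintWriter.write(int) emits the value as a character, which is then encoded.
    static byte[] viaPrintWriter(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (PrintWriter writer = new PrintWriter(
                new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
            writer.write(value);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(viaPrintStream(0xE9).length); // prints "1"
        System.out.println(viaPrintWriter(0xE9).length); // prints "2"
    }
}
```

The PrintStream output is the byte 0xE9 unchanged, while the PrintWriter output is the two-byte UTF-8 encoding of the character U+00E9. This is why a writer is the wrong tool for binary data.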

Write string to output stream

I have several output listeners that are implementing OutputStream.
It can be either a PrintStream writing to stdout or to a File, or it can be writing to memory or any other output destination; therefore, I specified OutputStream as (an) argument in the method.
Now, I have received the String. What is the best way to write to streams here?
Should I just use Writer.write(message.getBytes())? I can give it bytes, but if the destination stream is a character stream then will it convert automatically?
Do I need to use some bridge streams here instead?
Streams (InputStream and OutputStream) transfer binary data. If you want to write a string to a stream, you must first convert it to bytes, or in other words encode it. You can do that manually (as you suggest) using the String.getBytes(Charset) method, but you should avoid the String.getBytes() method, because that uses the default encoding of the JVM, which can't be reliably predicted in a portable way.
The usual way to write character data to a stream, though, is to wrap the stream in a Writer, (often a PrintWriter), that does the conversion for you when you call its write(String) (or print(String)) method. The corresponding wrapper for InputStreams is a Reader.
PrintStream is a special OutputStream implementation in the sense that it also contains methods that automatically encode strings (it uses a writer internally). But it is still a stream. You can safely wrap your stream with a writer no matter if it is a PrintStream or some other stream implementation. There is no danger of double encoding.
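The claim about double encoding can be checked with a short sketch. The names WrapAnyStreamDemo, writeText and throughPrintStream are hypothetical. The method takes any OutputStream, as in the question, and the demo passes it a PrintStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not taken from the original answers.
final class WrapAnyStreamDemo {

    // Encodes the text as UTF-8 and writes it to whatever stream the caller supplies.
    static void writeText(OutputStream destination, String text) {
        // The writer is flushed but deliberately not closed: the caller owns the stream.
        Writer writer = new OutputStreamWriter(destination, StandardCharsets.UTF_8);
        try {
            writer.write(text);
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Routes the text through a PrintStream to show that the bytes are not re-encoded.
    static byte[] throughPrintStream(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream printStream = new PrintStream(sink);
        writeText(printStream, text);
        printStream.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] expected = "\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] actual = throughPrintStream("\u00e9");
        System.out.println(java.util.Arrays.equals(expected, actual)); // prints "true"
    }
}
```

The writer hands already encoded bytes to the PrintStream through its byte-oriented write methods, so the PrintStream passes them along untouched.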
Example of PrintWriter with OutputStream:
try (PrintWriter p = new PrintWriter(new FileOutputStream("output-text.txt", true))) {
p.println("Hello");
} catch (FileNotFoundException e1) {
e1.printStackTrace();
}
OutputStream writes bytes, String provides chars. You need to define Charset to encode string to byte[]:
outputStream.write(string.getBytes(Charset.forName("UTF-8")));
Change UTF-8 to a charset of your choice.
You can create a PrintStream wrapping around your OutputStream and then just call its print(String):
final OutputStream os = new FileOutputStream("/tmp/out");
final PrintStream printStream = new PrintStream(os);
printStream.print("String");
printStream.close();
By design it is to be done this way:
OutputStream out = ...;
try (Writer w = new OutputStreamWriter(out, "UTF-8")) {
w.write("Hello, World!");
} // or w.close(); //close will auto-flush
Wrap your OutputStream with a PrintWriter and use the print methods on that class. They take in a String and do the work for you.
You may use Apache Commons IO:
try (OutputStream outputStream = ...) {
IOUtils.write("data", outputStream, "UTF-8");
}
IOUtils.write(String data, OutputStream output, String encoding)

Why an InputStream obj cannot be nested into a BufferedReader obj directly?

An OutputStream obj can be connected into a PrintWriter obj directly, e.g.,
//either is OK
new PrintWriter(socket.getOutputStream());
new PrintWriter(new OutputStreamWriter(socket.getOutputStream()));
But in the case of an InputStream obj, it must be connected to a BufferedReader obj through an InputStreamReader obj, that is,
new BufferedReader(new InputStreamReader(socket.getInputStream())); //OK
new BufferedReader(socket.getInputStream()); // does not compile
Is there any reason for this inconsistency of API design?
There isn't any inconsistency... you should be comparing BufferedReader and BufferedWriter. They exist to wrap other Readers and Writers respectively.
The basic reason for that is that different types of Readers and Writers may have different ways of being initialized and different ways of functioning, not necessarily wrapping an InputStream or OutputStream at all. In your example of a BufferedReader wrapping an InputStreamReader, InputStreamReader can (and generally should) be initialized with both an InputStream and a Charset. Should BufferedReader have an overload for that, when its only job is to provide buffering?
Java introduced the Reader and Writer hierarchy (in Java 1.1, I think) when the input and output stream classes were already in use. Bridge classes therefore allow you to pass stream classes into reader and writer classes.
Further, on the writer side, PrintWriter can act as the bridge itself: it has constructors that accept an OutputStream directly and create the OutputStreamWriter internally. The strict equivalent of InputStreamReader is OutputStreamWriter, and BufferedWriter, like BufferedReader, only accepts a Writer.
For more info read up http://www.codeguru.com/java/tij/tij0114.shtml
The BufferedReader is probably just decorating the Reader being passed in. It makes no sense for a BufferedReader to accept a class it can't decorate, like an InputStream.
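To illustrate the decorator point, the sketch below buffers two unrelated kinds of Reader with the same code. The class BufferedReaderSourcesDemo and the method firstLine are hypothetical names. Only the second source involves an InputStream at all.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not taken from the original answers.
final class BufferedReaderSourcesDemo {

    // BufferedReader only adds buffering; it does not care where the characters come from.
    static String firstLine(Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A Reader with no stream behind it.
        System.out.println(firstLine(new StringReader("from memory\nsecond line")));

        // A Reader that bridges from bytes, where the charset decision belongs.
        byte[] bytes = "from bytes\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstLine(
                new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)));
    }
}
```

The charset choice stays with InputStreamReader, the one class that needs it, and BufferedReader stays focused on buffering.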

Open InputStream as Reader

Can I easily convert InputStream to BufferedReader using Guava?
I'm looking for something like:
InputStream inputStream = ...;
BufferedReader br = Streams.newBufferedReader(inputStream);
I can open files using the Files.newReader(File file, Charset charset). That's cool and I want to do the same using the InputStream.
UPDATE:
Using CharStreams.newReaderSupplier seems too verbose to me. Correct me if I'm wrong, but in order to easily convert an InputStream to a BufferedReader using Guava I have to do something like this:
final InputStream inputStream = new FileInputStream("/etc/fstab");
Reader bufferedReader = new BufferedReader(CharStreams.newReaderSupplier(new InputSupplier<InputStream>(){
public InputStream getInput() throws IOException {
return inputStream;
}
}, Charset.defaultCharset()).getInput());
Of course I can create a helper that does something like:
return new BufferedReader(new InputStreamReader(inputStream));
However, I think that such a helper should be offered by Guava IO. I can do this trick for a File instance. Why can't I do it for an InputStream?
// Guava can do this
Reader r = Files.newReader(new File("foo"), charset);
// but cannot do this
Reader r = SomeGuavaUtil.newReader(inputStream, charset);
Correct me if I'm wrong, but it seems to me like a gap in the API.
No, there isn't anything quite like that in Guava. CharStreams is the general class for working with Readers and Writers and it has a method
InputSupplier<InputStreamReader> newReaderSupplier(
InputSupplier<? extends InputStream> in, Charset charset)
which could be useful with any kind of supplier of InputStreams.
Obviously, you can just write new BufferedReader(new InputStreamReader(in, charset)) or wrap that in your own factory method as well.
Edit:
Yes, you wouldn't want to use the InputSupplier version when you already have an InputStream. It's sort of like how it's a bad idea to make an Iterable that can actually only work once, such as one that wraps an existing Iterator or Enumeration or some such. In general, using InputSupplier requires thinking about how you do I/O a little differently, such as thinking of a File as something that can act as a supplier of FileInputStreams. I've used InputSuppliers that wrap whole requests to a server and return the response content as an InputStream, enabling me to use Guava utilities to copy that to a file, etc.
In any case, I'm not entirely sure why CharStreams doesn't have a method to create a Reader from an InputStream other than perhaps they didn't feel it was needed. You may want to file an issue requesting this.
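A home-grown factory method of the kind mentioned above might look like the following sketch. It uses only JDK classes. The class StreamReaders and its methods are hypothetical names and are not part of Guava.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical utility class; not a Guava or JDK API.
final class StreamReaders {

    private StreamReaders() {
    }

    // Wraps an InputStream in a buffered, charset-aware reader.
    static BufferedReader newBufferedReader(InputStream in, Charset charset) {
        return new BufferedReader(new InputStreamReader(in, charset));
    }

    // Convenience built on the factory: reads every line, then closes the stream.
    static List<String> readLines(InputStream in, Charset charset) {
        try (BufferedReader reader = newBufferedReader(in, charset)) {
            return reader.lines().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(
                "first\nsecond\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readLines(in, StandardCharsets.UTF_8)); // prints "[first, second]"
    }
}
```

Requiring the Charset as a parameter, rather than falling back to Charset.defaultCharset(), keeps the decoding predictable across machines.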
