I need to design an API method which takes an OutputStream as a parameter.
Is it a good practice to close the stream inside the API method or let the caller close it?
void test(OutputStream os) {
    os.close(); // ???
}
I think it should be symmetric.
If you do not open that stream (which is likely to be your case), you should not close it, either, in general.
Unless the purpose of the API is to "finish up the stream", you should let the caller close. He had it first, he was responsible for it, and he may decide that he wants to write some stuff to the stream that your API didn't originally envision. Keep your functionality separated; it's more composable.
Let the user close it. Since you are taking an OutputStream as an argument, we can assume the caller has already created and opened it, so closing it in your method would not be good. And if you are taking a brand-new OutputStream as an argument and opening it inside your method, there is no need to take it as an argument at all, and then you can also close it in your method.
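A minimal sketch of that symmetric, caller-owns-the-stream pattern (ReportWriter and writeReport are made-up names for illustration): the API method only writes, and the caller opens and closes the stream with try-with-resources.
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class ReportWriter {
    // The API writes to the stream but never closes it;
    // flushing before returning is a reasonable courtesy.
    void writeReport(OutputStream os) throws IOException {
        os.write("report body\n".getBytes(StandardCharsets.UTF_8));
        os.flush();
    }
}

class Caller {
    void run() throws IOException {
        // Whoever opens the stream closes it.
        try (OutputStream os = Files.newOutputStream(Paths.get("report.txt"))) {
            new ReportWriter().writeReport(os);
            // The caller can still append things the API didn't envision.
            os.write("trailer\n".getBytes(StandardCharsets.UTF_8));
        }
    }
}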
Different use-cases require different patterns, for example, depending on whether the caller needs to read from or write to the stream after the call has completed.
The key API design rule is that the API should specify whether it is the caller or called method's responsibility to close the stream.
Having said that, it is generally simpler and safer if the code that opens a stream is also responsible for closing it.
Consider the case where methodA is supposed to open a stream and pass it to methodB, but an exception is thrown between the stream being opened and methodB entering the try / finally statement that is ultimately responsible for closing it. You need to code it something like the following to ensure that streams don't leak:
public void methodA() throws IOException {
InputStream myStream = new FileInputStream(...);
try {
// do stuff with stream
methodB(myStream);
} finally {
myStream.close();
}
}
/**
* @param myStream this method is responsible for closing myStream.
*/
public void methodB(InputStream myStream) throws IOException {
try {
// do more stuff with myStream
} finally {
myStream.close();
}
}
This won't leak an open stream as a result of exceptions (or errors!) thrown in either methodA or methodB. (It works for the standard stream types because the Closeable API specifies that close has no effect when called on a stream that is already closed.)
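On Java 7 and later the same structure is usually written with try-with-resources; a sketch of methodA in that style (the file name is a placeholder):
public void methodA() throws IOException {
    try (InputStream myStream = new FileInputStream("data.bin")) { // placeholder path
        // do stuff with stream
        methodB(myStream);
    }
    // The stream is closed here even if methodB already closed it;
    // that is safe because Closeable.close is specified to be idempotent.
}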
What is the reason for the JVM handling SIGPIPE the way it does?
I would've expected for
java foo | head -10
with
import java.util.stream.Stream;

public class Foo {
    public static void main(String[] args) {
        Stream.iterate(0, n -> n + 1).forEach(System.out::println);
    }
}
to cause the process to be killed when writing the 11th line, however that is not the case. Instead, it seems that only a trouble flag is being set at the PrintStream, which can be checked through System.out.checkError().
What happens is that the SIGPIPE (more precisely, the failed write to the broken pipe) results in an IOException.
For most OutputStream and Writer classes, this exception propagates through the "write" method, and has to be handled by the caller.
However, when you are writing to System.out, you are using a PrintStream, and that class by design takes care of the IOException for you. As the javadoc says:
A PrintStream adds functionality to another output stream, namely the ability to print representations of various data values conveniently. Two other features are provided as well. Unlike other output streams, a PrintStream never throws an IOException; instead, exceptional situations merely set an internal flag that can be tested via the checkError method.
What is the reason for the JVM handling SIGPIPE the way it does?
The above explains what is happening. The "why" is ... I guess ... that the designers wanted to make PrintStream easy to use for typical use cases of System.out where the caller doesn't want to deal with a possible IOException on every call.
Unfortunately, there is no elegant solution to this:
You could just call checkError ...
You should be able to get hold of the FileDescriptor.out object and wrap it in a new FileOutputStream object ... and use that instead of System.out; see the sketch below.
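A rough sketch of that second option, assuming the goal is to let the broken pipe surface as an IOException (the loop and exit code are illustrative):
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class Foo {
    public static void main(String[] args) {
        // A plain FileOutputStream, unlike PrintStream, propagates IOExceptions,
        // so a broken pipe stops the loop instead of silently setting a flag.
        OutputStream out = new FileOutputStream(FileDescriptor.out);
        int n = 0;
        try {
            while (true) {
                out.write((n++ + System.lineSeparator()).getBytes(StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            // Typically "Broken pipe" once head has closed its end.
            System.exit(1);
        }
    }
}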
Note that there are no strong guarantees that the Java app will only write 10 lines of output in java foo | head -10. It is quite possible for the app to write ahead many lines, and to only "see" the pipe closed after head has gotten around to reading the first 10 of them. This applies with System.out (and checkError) or if you wrap FileDescriptor.out.
With Java 11, I could initialize an InputStream as:
InputStream inputStream = InputStream.nullInputStream();
But I am unable to understand a potential use case of InputStream.nullInputStream or a similar API for OutputStream
i.e. OutputStream.nullOutputStream.
From the API Javadocs, I could figure out that it
Returns a new InputStream that reads no bytes. The returned stream is
initially open. The stream is closed by calling the close() method.
Subsequent calls to close() have no effect. While the stream is open,
the available(), read(), read(byte[]), ...
skip(long), and transferTo() methods all behave as if end of stream
has been reached.
I went through the detailed release notes further, which state:
There are various times where I would like to use methods that require
as a parameter a target OutputStream/Writer for sending output, but
would like to execute those methods silently for their other effects.
This corresponds to the ability in Unix to redirect command output to
/dev/null, or in DOS to append command output to NUL.
Yet I fail to understand which methods the statement .... execute those methods silently for their other effects is referring to. (blame my lack of hands-on with the APIs)
Can someone help me understand what is the usefulness of having such an input or output stream with a help of an example if possible?
Edit: One of a similar implementation I could find on browsing further is apache-commons' NullInputStream, which does justify the testing use case much better.
Sometimes you want to have a parameter of InputStream type, but also to be able to choose not to feed your code with any data. In tests it's probably easier to mock it but in production you may choose to bind null input instead of scattering your code with ifs and flags.
compare:
class ComposableReprinter {
void reprint(InputStream is) throws IOException {
System.out.println(is.read());
}
void bla() {
reprint(InputStream.nullInputStream());
}
}
with this:
class ControllableReprinter {
void reprint(InputStream is, boolean forReal) throws IOException {
    if (forReal) {
        System.out.println(is.read());
    }
}
void bla() {
    reprint(new ByteArrayInputStream(new byte[0]), false); // must still construct some stream
}
}
or this:
class NullableReprinter {
void reprint(InputStream is) throws IOException {
if (is != null) {
System.out.println(is.read());
}
}
void bla() {
reprint(null);
}
}
It makes more sense with output IMHO. Input is probably more for consistency.
This approach is called Null Object: https://en.wikipedia.org/wiki/Null_object_pattern
I see it as a safer (1) and more expressive (2) alternative to initialising a stream variable with null.
No worries about NPEs.
[Output|Input]Stream is an abstraction. In order to return a null/empty/mock stream, you had to deviate from the core concept down to a specific implementation.
I think nullOutputStream is very easy and clear: just to discard output (similar to > /dev/null) and/or for testing (no need to invent an OutputStream).
An (obviously basic) example:
OutputStream out = ...; // an easy way to either print to System.out or discard all prints, setting it basically to the nullOutputStream
PrintStream printer = new PrintStream(out); // OutputStream itself has no println
printer.println("yeah... or not");
exporter.exportTo(out); // discard or real export?
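To make that concrete, one complete variant with a hypothetical quiet flag deciding between the two:
boolean quiet = true; // hypothetical switch
OutputStream out = quiet ? OutputStream.nullOutputStream() : System.out;
PrintStream printer = new PrintStream(out);
printer.println("yeah... or not"); // silently discarded when quiet is true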
Regarding nullInputStream: it's probably more for testing (I don't like mocks) and for APIs requiring an input stream, or (this now being more probable) for delivering an input stream which does not contain any data, where null is not a viable option:
importer.importDocument("name", /* input stream... */);
InputStream inputStream = content.getInputStream(); // better to have no data to read than to get a null
When you test that importer, you can just use a nullInputStream there, again instead of inventing your own InputStream or instead of using a mock. Other use cases here rather look like a workaround or misuse of the API ;-)
Regarding the return of an InputStream: that rather makes sense. If you don't have any data, you may want to return that nullInputStream instead of null so that callers do not have to deal with null and can just read as they would if there was data, as sketched below.
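A tiny sketch of that return-instead-of-null idea (hasData and openRealStream are hypothetical):
InputStream getInputStream() {
    // Callers can read the result unconditionally; an empty
    // stream simply reports end-of-stream immediately.
    return hasData ? openRealStream() : InputStream.nullInputStream();
}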
Finally, these are just convenience methods to make our lives easier without adding another dependency ;-) and as others already stated (comments/answers), it's basically an implementation of the null object pattern.
Using the null*Stream might also have the benefit that tests are executed faster... if you stream real data (of course... depending on size, etc.) you may just slow down your tests unnecessarily and we all want tests to complete fast, right? (some will put in mocks here... well...)
The javadoc for Stream states:
Streams have a BaseStream.close() method and implement AutoCloseable, but nearly all stream instances do not actually need to be closed after use. Generally, only streams whose source is an IO channel (such as those returned by Files.lines(Path, Charset)) will require closing. Most streams are backed by collections, arrays, or generating functions, which require no special resource management. (If a stream does require closing, it can be declared as a resource in a try-with-resources statement.)
Therefore, the vast majority of the time one can use Streams in a one-liner, like collection.stream().forEach(System.out::println); but for Files.lines and other resource-backed streams, one must use a try-with-resources statement or else leak resources.
This strikes me as error-prone and unnecessary. As Streams can only be iterated once, it seems to me that there is no situation where the output of Files.lines should not be closed as soon as it has been iterated, and therefore the implementation should simply call close implicitly at the end of any terminal operation. Am I mistaken?
Yes, this was a deliberate decision. We considered both alternatives.
The operating design principle here is "whoever acquires the resource should release the resource". Files don't auto-close when you read to EOF; we expect files to be closed explicitly by whoever opened them. Streams that are backed by IO resources are the same.
Fortunately, the language provides a mechanism for automating this for you: try-with-resources. Because Stream implements AutoCloseable, you can do:
try (Stream<String> s = Files.lines(...)) {
s.forEach(...);
}
The argument that "it would be really convenient to auto-close so I could write it as a one-liner" is nice, but would mostly be the tail wagging the dog. If you opened a file or other resource, you should also be prepared to close it. Effective and consistent resource management trumps "I want to write this in one line", and we chose not to distort the design just to preserve the one-line-ness.
I have a more specific example in addition to @BrianGoetz's answer. Don't forget that Stream has escape-hatch methods like iterator(). Suppose you are doing this:
Iterator<String> iterator = Files.lines(path).iterator();
After that you may call hasNext() and next() several times, then just abandon this iterator: the Iterator interface perfectly supports such use. There's no way to explicitly close the Iterator; the only object you can close here is the Stream. So this way it would work perfectly fine:
try(Stream<String> stream = Files.lines(path)) {
Iterator<String> iterator = stream.iterator();
// use iterator in any way you want and abandon it at any moment
} // file is correctly closed here.
In addition, if you want a one-line version, you can just do this:
Files.readAllLines(source).stream().forEach(...);
You can use it if you are sure that you need the entire file and the file is small, because it isn't a lazy read.
If you're lazy like me and don't mind the "if an exception is raised, it will leave the file handle open" you could wrap the stream in an autoclosing stream, something like this (there may be other ways):
static Stream<String> allLinesCloseAtEnd(String filename) throws IOException {
    Stream<String> lines = Files.lines(Paths.get(filename));
    Iterator<String> linesIter = lines.iterator();
    Iterator<String> it = new Iterator<String>() {
        @Override
        public boolean hasNext() {
            if (!linesIter.hasNext()) {
                lines.close(); // auto-close when we reach the end
                return false;
            }
            return true;
        }
        @Override
        public String next() {
            return linesIter.next();
        }
    };
    return StreamSupport.stream(
        Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);
}
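Usage would then be something like the following (the file name is a placeholder). Note that the file is only closed once the iterator is exhausted, so abandoning the pipeline early, or an exception mid-read, still leaks the handle, exactly as the caveat above says.
allLinesCloseAtEnd("input.txt").forEach(System.out::println);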
When I execute this code which opens a lot of files during a stream pipeline:
public static void main(String[] args) throws IOException {
Files.find(Paths.get("JAVA_DOCS_DIR/docs/api/"),
100, (path, attr) -> path.toString().endsWith(".html"))
.map(file -> runtimizeException(() -> Files.lines(file, StandardCharsets.ISO_8859_1)))
.map(Stream::count)
.forEachOrdered(System.out::println);
}
I get an exception:
java.nio.file.FileSystemException: /long/file/name: Too many open files
The problem is that Stream.count does not close the stream when it is done traversing it. But I don't see why it shouldn't, given that it is a terminal operation. The same holds for other terminal operations such as reduce and forEach. flatMap on the other hand closes the streams it consists of.
The documentation tells me to use a try-with-resources statement to close streams if necessary. In my case I could replace the count line with something like this:
.map(s -> { long c = s.count(); s.close(); return c; } )
But that is noisy and ugly and could be a real inconvenience in some cases with big, complex pipelines.
So my questions are the following:
Why were the streams not designed so that terminal operations close the streams they are working on? That would make them work better with IO streams.
What is the best solution for closing IO streams in pipelines?
runtimizeException is a method that wraps checked exception in RuntimeExceptions.
There are two issues here: handling of checked exceptions such as IOException, and timely closing of resources.
None of the predefined functional interfaces declare any checked exceptions, which means that they have to be handled within the lambda, or wrapped in an unchecked exception and rethrown. It looks like your runtimizeException function does that. You probably also had to declare your own functional interface for it. As you've probably discovered, this is a pain.
On the closing of resources like files, there was some investigation of having streams be closed automatically when the end of the stream was reached. This would be convenient, but it doesn't deal with closing when an exception is thrown. There's no magic do-the-right-thing mechanism for this in streams.
We're left with the standard Java techniques of dealing with resource closure, namely the try-with-resources construct introduced in Java 7. TWR really wants to have resources be closed at the same level in the call stack as they were opened. The principle of "whoever opens it has to close it" applies. TWR also deals with exception handling, which usually makes it convenient to deal with exception handling and resource closing in the same place.
In this example, the stream is somewhat unusual in that it maps a Stream<Path> to a Stream<Stream<String>>. These nested streams are the ones that aren't closed, resulting in the eventual exception when the system runs out of open file descriptors. What makes this difficult is that files are opened by one stream operation and then passed downstream; this makes it impossible to use TWR.
An alternative approach to structuring this pipeline is as follows.
The Files.lines call is the one that opens the file, so this has to be the resource in the TWR statement. The processing of this file is where (some) IOExceptions get thrown, so we can do the exception wrapping in the same TWR statement. This suggests having a simple function that maps the path to a line count, while handling resource closing and exception wrapping:
long lineCount(Path path) {
try (Stream<String> s = Files.lines(path, StandardCharsets.ISO_8859_1)) {
return s.count();
} catch (IOException ioe) {
throw new UncheckedIOException(ioe);
}
}
Once you have this helper function, the main pipeline looks like this:
Files.find(Paths.get("JAVA_DOCS_DIR/docs/api/"),
100, (path, attr) -> path.toString().endsWith(".html"))
.mapToLong(this::lineCount)
.forEachOrdered(System.out::println);
It is possible to create a utility method that reliably closes streams in the middle of a pipeline.
This makes sure that each resource is closed with a try-with-resources statement, avoids the need for a one-off helper method per operation (like lineCount above), and is much less verbose than writing the try-statement directly in the lambda.
With this method the pipeline from the question looks like this:
Files.find(Paths.get("Java_8_API_docs/docs/api"), 100,
(path, attr) -> path.toString().endsWith(".html"))
.map(file -> applyAndClose(
() -> Files.lines(file, StandardCharsets.ISO_8859_1),
Stream::count))
.forEachOrdered(System.out::println);
The implementation looks like this:
/**
* Applies a function to a resource and closes it afterwards.
* @param sup Supplier of the resource that should be closed
* @param op operation that should be performed on the resource before it is closed
* @return The result of calling op.apply on the resource
*/
private static <A extends AutoCloseable, B> B applyAndClose(Callable<A> sup, Function<A, B> op) {
try (A res = sup.call()) {
return op.apply(res);
} catch (RuntimeException exc) {
throw exc;
} catch (Exception exc) {
throw new RuntimeException("Wrapped in applyAndClose", exc);
}
}
(Since resources that need to be closed often also throw exceptions when they are allocated, non-runtime exceptions are wrapped in runtime exceptions here, avoiding the need for a separate method that does that.)
You will need to call close() in this stream operation, which will cause all underlying close handlers to be called.
Better yet, would be to wrap your whole statement in a try-with-resources block, as then it will automagically call the close handler.
This may not be a possibility in your situation; it means that you will need to handle it yourself in some operation. Your current methods may not be suited for streams at all.
It seems like you indeed need to do it in your second map() operation, for example as sketched below.
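A possible shape of that, reusing the question's pipeline and its runtimizeException helper, with try-with-resources inside the second map():
Files.find(Paths.get("JAVA_DOCS_DIR/docs/api/"),
        100, (path, attr) -> path.toString().endsWith(".html"))
    .map(file -> runtimizeException(() -> Files.lines(file, StandardCharsets.ISO_8859_1)))
    .map(lines -> { try (Stream<String> s = lines) { return s.count(); } })
    .forEachOrdered(System.out::println);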
The close() method of the interface AutoCloseable is not guaranteed to be idempotent, so it should only be called once. See the documentation of AutoCloseable for more information.
If final operations would close the stream automatically, close might be invoked twice. Take a look at the following example:
try (Stream<String> lines = Files.lines(path)) {
lines.count();
}
As it is defined right now, the close method on lines will be invoked exactly once, regardless of whether the final operation completes normally or is aborted with an IOException. If the stream were instead closed implicitly by the final operation, the close method would be called once if an IOException occurs, and twice if the operation completes successfully.
Here is an alternative which uses another method from Files and will avoid leaking file descriptors:
Files.find(Paths.get("JAVA_DOCS_DIR/docs/api/"),
100, (path, attr) -> path.toString().endsWith(".html"))
.map(file -> runtimizeException(() -> Files.readAllLines(file, StandardCharsets.ISO_8859_1).size()))
.forEachOrdered(System.out::println);
Unlike your version, it will return an int instead of a long for the line count; but you don't have files with that many lines, do you?
I have a general socket implementation consisting of an OutputStream and an InputStream.
After I do some work, I am closing the OutputStream.
When this is done, my InputStream's read() method returns -1 for an infinite amount of time, instead of throwing an exception like I had anticipated.
I am now unsure of the safest route to take, so I have a few questions:
Am I safe to assume that -1 is only returned when the stream is closed?
Is there no way to recreate the IO exception that occurs when the connection is forcefully broken?
Should I send a packet that will tell my InputStream that it should close instead of the previous two methods?
Thanks!
The -1 is the expected behavior at the end of a stream. See InputStream.read():
Reads the next byte of data from the input stream. The value byte is returned as an int in the range 0 to 255. If no byte is available because the end of the stream has been reached, the value -1 is returned. This method blocks until input data is available, the end of the stream is detected, or an exception is thrown.
You should still catch IOException for unexpected events of course.
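For reference, the usual read loop under those rules might look like this (processByte stands in for whatever handling you do):
int b;
while ((b = in.read()) != -1) { // -1 marks end of stream
    processByte(b);             // real failures arrive as IOException
}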
Am I safe to assume that -1 is only returned when the stream is closed?
Yes.
You should not just assume things like this, though. You should read the javadoc and implement according to how the API is specified to behave, especially if you want your code to be robust (or "safe" as you put it).
Having said that, this is more or less what the javadoc says in this case. (One could quibble that EOF and "stream has been closed" don't necessarily mean the same thing ... and that closing the stream by calling InputStream.close() or Socket.close() locally will have a different effect. However, neither of these are directly relevant to your use-case.)
Is there no way to recreate the IO exception that occurs when the connection is forcefully broken?
No. For a start, no exception is normally thrown in the first place, so there is typically nothing to "recreate". Second, the information in the original exception (if there ever was one) is gone.
Should I send a packet that will tell my InputStream that it should close instead of the previous two methods?
No. The best method is to test the result of the read call. You need to test it anyway, since you cannot assume that the read(byte[]) method (or whatever) will have returned the number of bytes you actually asked for.
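As an illustration of that point, a typical fill-the-buffer loop has to handle both short reads and the -1 result (a sketch only):
byte[] buf = new byte[8192];
int off = 0;
while (off < buf.length) {
    int n = in.read(buf, off, buf.length - off);
    if (n == -1) break; // end of stream before the buffer was filled
    off += n;           // short read: keep going
}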
I suppose that throwing an application specific exception would be OK under some circumstances.
But remember the general principle that exceptions should not be used for normal flow control.
One of the other answers suggests creating a proxy InputStream that throws some exception instead of returning -1.
IMO, that is a bad idea. You end up with a proxy class that claims to be an InputStream, but violates the contract of the read methods. That could lead to trouble if the proxy was passed to something that expected a properly implemented InputStream.
Second, InputStream is an abstract class not an interface, so Java's dynamic proxy mechanism won't work. (For example, the newProxyInstance method requires a list of interfaces, not classes.)
According to the InputStream javadoc, read() returns:
the next byte of data, or -1 if the end of the stream is reached.
So you are safe to assume that, and it's better to use what's specified in the API than to try to recreate an exception, because any exceptions thrown could be implementation-dependent.
Also, closing the OutputStream of a socket closes the socket itself.
This is what the JavaDoc for Socket says:
public OutputStream getOutputStream() throws IOException

Returns an output stream for this socket.

If this socket has an associated channel then the resulting output stream delegates all of its operations to the channel. If the channel is in non-blocking mode then the output stream's write operations will throw an IllegalBlockingModeException.

Closing the returned OutputStream will close the associated socket.

Returns: an output stream for writing bytes to this socket.
Throws: IOException - if an I/O error occurs when creating the output stream or if the socket is not connected.
Not sure that this is what you actually want to do.
Is there no way to recreate the IO exception that occurs when the connection is forcefully broken?
I'll answer this one. If you really want the stream to throw an exception on EOF, provide your own small wrapper: override the read() methods and throw an exception on a -1 result.
The easiest (least coding) way would be to use a Dynamic Proxy:
InputStream pxy = (InputStream) java.lang.reflect.Proxy.newProxyInstance(
obj.getClass().getClassLoader(),
new Class[]{ InputStream.class },
new ThrowOnEOFProxy(obj));
where ThrowOnEOFProxy would check the method name, call it and if result is -1, throw IOException("EOF").
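Given the earlier objection that InputStream is an abstract class (so a JDK dynamic proxy won't work on it), a subclass-based wrapper is the safer route. A sketch, with the class name and the choice of EOFException being mine:
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Turns the -1 end-of-stream result into an exception.
class ThrowOnEOFInputStream extends FilterInputStream {
    ThrowOnEOFInputStream(InputStream in) { super(in); }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b == -1) throw new EOFException();
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n == -1) throw new EOFException();
        return n;
    }
}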