Why is Files.lines (and similar Streams) not automatically closed?

The javadoc for Stream states:
Streams have a BaseStream.close() method and implement AutoCloseable, but nearly all stream instances do not actually need to be closed after use. Generally, only streams whose source is an IO channel (such as those returned by Files.lines(Path, Charset)) will require closing. Most streams are backed by collections, arrays, or generating functions, which require no special resource management. (If a stream does require closing, it can be declared as a resource in a try-with-resources statement.)
Therefore, the vast majority of the time one can use Streams in a one-liner, like collection.stream().forEach(System.out::println); but for Files.lines and other resource-backed streams, one must use a try-with-resources statement or else leak resources.
This strikes me as error-prone and unnecessary. As Streams can only be iterated once, it seems to me that there is no situation in which the output of Files.lines should not be closed as soon as it has been iterated, and therefore the implementation should simply call close implicitly at the end of any terminal operation. Am I mistaken?

Yes, this was a deliberate decision. We considered both alternatives.
The operating design principle here is "whoever acquires the resource should release the resource". Files don't auto-close when you read to EOF; we expect files to be closed explicitly by whoever opened them. Streams that are backed by IO resources are the same.
Fortunately, the language provides a mechanism for automating this for you: try-with-resources. Because Stream implements AutoCloseable, you can do:
try (Stream<String> s = Files.lines(...)) {
    s.forEach(...);
}
The argument that "it would be really convenient to auto-close so I could write it as a one-liner" is nice, but would mostly be the tail wagging the dog. If you opened a file or other resource, you should also be prepared to close it. Effective and consistent resource management trumps "I want to write this in one line", and we chose not to distort the design just to preserve the one-line-ness.
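The same principle works in your favor when you hand out resource-backed streams yourself: the code that acquires the resource registers the cleanup as a close handler, and the consumer releases it with try-with-resources. A minimal sketch of such a helper (readLines is a hypothetical name, not a JDK method):
static Stream<String> readLines(Path path) throws IOException {
    BufferedReader reader = Files.newBufferedReader(path); // acquire the resource here...
    return reader.lines().onClose(() -> {                  // ...and register its release
        try {
            reader.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    });
}
The caller then writes try (Stream<String> s = readLines(path)) { ... } exactly as with Files.lines.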

I have a more specific example in addition to @BrianGoetz's answer. Don't forget that Stream has escape-hatch methods like iterator(). Suppose you are doing this:
Iterator<String> iterator = Files.lines(path).iterator();
After that you may call hasNext() and next() several times, then just abandon the iterator: the Iterator interface perfectly supports such use. There's no way to close an Iterator explicitly; the only object you can close here is the Stream. So this works perfectly fine:
try (Stream<String> stream = Files.lines(path)) {
    Iterator<String> iterator = stream.iterator();
    // use the iterator in any way you want and abandon it at any moment
} // the file is correctly closed here

In addition, if you want a one-liner, you can just do this:
Files.readAllLines(source).stream().forEach(...);
You can use this if you are sure that you need the entire file and the file is small, because it is not a lazy read.
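If the file may be large, the lazy equivalent still needs the try-with-resources block; a minimal sketch, assuming source is the same Path as above:
try (Stream<String> lines = Files.lines(source)) {
    lines.forEach(...); // reads lazily and closes the file when the block exits
}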

If you're lazy like me and don't mind that "if an exception is raised, it will leave the file handle open", you could wrap the stream in an auto-closing stream, something like this (there may be other ways):
static Stream<String> allLinesCloseAtEnd(String filename) throws IOException {
    Stream<String> lines = Files.lines(Paths.get(filename));
    Iterator<String> linesIter = lines.iterator();
    Iterator<String> it = new Iterator<String>() {
        @Override
        public boolean hasNext() {
            if (!linesIter.hasNext()) {
                lines.close(); // auto-close when we reach the end
                return false;
            }
            return true;
        }

        @Override
        public String next() {
            return linesIter.next();
        }
    };
    // ORDERED rather than DISTINCT: lines come back in file order and are not necessarily unique
    return StreamSupport.stream(Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);
}
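Another way to get the same end-of-data auto-closing without hand-rolling an iterator is to lean on flatMap, which is documented to close each mapped stream after its contents have been placed into the outer stream. A sketch under that assumption:
static Stream<String> allLinesCloseAtEnd(String filename) {
    return Stream.of(Paths.get(filename))
            .flatMap(path -> {
                try {
                    // flatMap closes this inner stream once it has been drained,
                    // releasing the file handle without any explicit close() call
                    return Files.lines(path);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
}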

Related

Storing a reference to a Stream

I have a class which manages a Stream:
class MyStreamManager {
private Stream<Object> currentStream = null;
boolean hasMoreData() {
//code here to assert currentStream is null
final Optional<Stream<Object>> maybeAStream = somethingWhichMightProvideAStream.getNextStream();
currentStream = maybeAStream.orElse(null);
return currentStream != null;
}
#MustBeClosed
Stream<Object> getCurrentStream() { return currentStream; }
void finish() {
currentStream.close();
currentStream = null;
}
}
Which is used in the following style:
while (myStreamManager.hasMoreData()) {
    try {
        myStreamManager.getCurrentStream().map(...).filter(...); //etc
    } finally {
        myStreamManager.finish();
    }
}
Is storing a reference to a Stream like this bad practice? While this works, it definitely doesn't feel right, and ErrorProne is flagging it (hence the @MustBeClosed annotation).
MyStreamManager is a Spring @Bean but is only used by one thread (this is running in a batch).
I can think of two different approaches which are probably better:
instantiate MyStreamManager and wrap it in a try-with-resources, delegating the close() call to the Stream
use the Spliterators class to create a Spliterator that delegates to many Streams?
I don't think it's so much the fact that you're storing a Stream per se that makes this feel awkward, but rather that you've got sequential coupling.
You have to call hasMoreData(); then getCurrentStream(); then finish(). If you're only using the class in a limited number of places, you will probably be able to get it right in all of those; but every place you use it is a new opportunity to use it incorrectly.
I would say that your manager class is actually just making things harder for yourself.
for (Optional<Stream<Object>> opt = somethingWhichMightProvideAStream.getNextStream();
     opt.isPresent();
     opt = somethingWhichMightProvideAStream.getNextStream()) {
    try (Stream<Object> stream = opt.get()) { // try-with-resources auto-closes the stream
        stream.map(...).filter(...); //etc
    }
}
or:
Optional<Stream<Object>> opt;
while ((opt = somethingWhichMightProvideAStream.getNextStream()).isPresent()) {
    try (Stream<Object> stream = opt.get()) {
        stream.map(...).filter(...); //etc
    }
}
The loop declarations in either case are not especially pretty; but this is way shorter (roughly as long as the while/try/finally loop you already have), and harder to use wrong, I think.
(Admittedly, you've still got sequential coupling here: you have to remember to close the stream returned in the optional. Sigh.)
Mixing imperative (while loop, try-finally) and declarative (streams) code together doesn't seem right.
If all of these operations are synchronous, I guess it could be done in one pipeline (without MyStreamManager at all).
I think you could focus on moving some logic into the object that contains the somethingWhichMightProvideAStream method, because mixing the imperative iterator pattern with the Stream API doesn't look idiomatic. For example, it could return a List (or even better, a Stream!) of Streams instead of an Optional.
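A rough sketch of that idea, assuming the provider can be changed to hand out a Stream<Stream<Object>> (the streams() accessor and the map/filter bodies below are stand-ins for your real ones):
List<String> results = somethingWhichMightProvideAStream.streams() // hypothetical Stream<Stream<Object>> accessor
        .flatMap(s -> s.map(Object::toString)      // your real map(...)
                       .filter(t -> !t.isEmpty())) // your real filter(...)
        .collect(Collectors.toList());
// flatMap closes each inner stream after consuming it, so no manager class or finish() call is needed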
Think twice if you really need to close this stream. From documentation:
Streams have a BaseStream.close() method and implement AutoCloseable, but nearly all stream instances do not actually need to be closed after use. Generally, only streams whose source is an IO channel (such as those returned by Files.lines(Path, Charset)) will require closing.

What is the use case for null(Input/Output)Stream API in Java?

With Java 11, I could initialize an InputStream as:
InputStream inputStream = InputStream.nullInputStream();
But I am unable to understand a potential use case of InputStream.nullInputStream or a similar API for OutputStream
i.e. OutputStream.nullOutputStream.
From the API Javadocs, I could figure out that it
Returns a new InputStream that reads no bytes. The returned stream is
initially open. The stream is closed by calling the close() method.
Subsequent calls to close() have no effect. While the stream is open,
the available(), read(), read(byte[]), ...
skip(long), and transferTo() methods all behave as if end of stream
has been reached.
I went further through the detailed release notes, which state:
There are various times where I would like to use methods that require
as a parameter a target OutputStream/Writer for sending output, but
would like to execute those methods silently for their other effects.
This corresponds to the ability in Unix to redirect command output to
/dev/null, or in DOS to append command output to NUL.
Yet I fail to understand which methods the statement refers to when it says "... execute those methods silently for their other effects" (blame my lack of hands-on experience with the APIs).
Can someone help me understand what is the usefulness of having such an input or output stream with a help of an example if possible?
Edit: A similar implementation I could find while browsing further is apache-commons' NullInputStream, which justifies the testing use case much better.
Sometimes you want to have a parameter of InputStream type, but also to be able to choose not to feed your code with any data. In tests it's probably easier to mock it but in production you may choose to bind null input instead of scattering your code with ifs and flags.
compare:
class ComposableReprinter {
    void reprint(InputStream is) throws IOException {
        System.out.println(is.read());
    }

    void bla() {
        reprint(InputStream.nullInputStream());
    }
}
with this:
class ControllableReprinter {
    void reprint(InputStream is, boolean forReal) throws IOException {
        if (forReal) {
            System.out.println(is.read());
        }
    }

    void bla() {
        reprint(new ByteArrayInputStream(new byte[0]), false); // dummy stream that is never read
    }
}
or this:
class NullableReprinter {
    void reprint(InputStream is) throws IOException {
        if (is != null) {
            System.out.println(is.read());
        }
    }

    void bla() {
        reprint(null);
    }
}
It makes more sense with output IMHO. Input is probably more for consistency.
This approach is called Null Object: https://en.wikipedia.org/wiki/Null_object_pattern
I see it as a safer (1) and more expressive (2) alternative to initialising a stream variable with null.
1. No worries about NPEs.
2. [Output|Input]Stream is an abstraction. In order to return a null/empty/mock stream, you previously had to step down from the abstraction to a specific implementation.
I think nullOutputStream is very easy and clear: just to discard output (similar to > /dev/null) and/or for testing (no need to invent an OutputStream).
An (obviously basic) example:
OutputStream target = ... // either System.out or OutputStream.nullOutputStream(), which discards all output
PrintStream out = new PrintStream(target); // OutputStream itself has no println, so wrap it
out.println("yeah... or not");
exporter.exportTo(out); // discard or real export?
Regarding nullInputStream, it's probably there more for testing (I don't like mocks) and for APIs that require an input stream, or (more probably) that deliver an input stream which happens to contain no data or that you cannot otherwise provide, where null is not a viable option:
importer.importDocument("name", /* input stream... */);
InputStream inputStream = content.getInputStream(); // better to have no data to read than to get null
When you test that importer, you can just use a nullInputStream there, again instead of inventing your own InputStream or instead of using a mock. Other use cases here rather look like a workaround or misuse of the API ;-)
Regarding the return of an InputStream: that rather makes sense. If you don't have any data, you may want to return that nullInputStream instead of null so that callers do not have to deal with null and can just read as they would if there were data.
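A minimal sketch of that, with a hypothetical content holder:
class Content {
    private final byte[] data; // null when there is nothing to read

    Content(byte[] data) {
        this.data = data;
    }

    InputStream getInputStream() {
        // callers always get a readable stream; with no data they simply see end-of-stream
        return data == null ? InputStream.nullInputStream() : new ByteArrayInputStream(data);
    }
}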
Finally, these are just convenience methods to make our lives easier without adding another dependency ;-) and, as others already stated (comments/answers), it's basically an implementation of the null object pattern.
Using the null*Stream might also have the benefit that tests are executed faster... if you stream real data (of course... depending on size, etc.) you may just slow down your tests unnecessarily and we all want tests to complete fast, right? (some will put in mocks here... well...)

How to make a Stream from a DirectoryStream

When reading the API for DirectoryStream I miss a lot of functions. First of all, it suggests using a for loop to go from the stream to a List. And it bothers me that a DirectoryStream is not a Stream.
How can I make a Stream<Path> from a DirectoryStream in Java 8?
While it is possible to convert a DirectoryStream into a Stream using its spliterator method, there is no reason to do so. Just create a Stream<Path> in the first place.
E.g., instead of calling Files.newDirectoryStream(Path) just call Files.list(Path).
The overload of newDirectoryStream which accepts an additional Filter may be replaced by Files.list(Path).filter(Predicate), and there are additional operations like Files.find and Files.walk returning a Stream<Path>; however, I did not find a replacement for the case where you want to use a "glob pattern". That seems to be the only case where translating a DirectoryStream into a Stream might be useful (I prefer using regular expressions anyway)…
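If the glob overload is the only thing keeping you on newDirectoryStream, one possible workaround is to combine Files.list with a PathMatcher; a sketch (dir being the directory Path, and the "*.html" pattern just an example):
PathMatcher glob = FileSystems.getDefault().getPathMatcher("glob:*.html");
try (Stream<Path> entries = Files.list(dir)) {                // Files.list must be closed
    entries.filter(p -> glob.matches(p.getFileName()))        // match the file name, as the glob overload does
           .forEach(System.out::println);
}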
DirectoryStream is not a Stream (it has been around since Java 7, before the Stream API was introduced in Java 8), but it implements the Iterable<Path> interface, so you could write:
try (DirectoryStream<Path> ds = ...) {
    Stream<Path> s = StreamSupport.stream(ds.spliterator(), false);
}
DirectoryStream has a method that returns a spliterator. So just do:
Stream<Path> stream = StreamSupport.stream(myDirectoryStream.spliterator(), false);
You might want to see this question, which is basically what your problem reduces to: How to create a Stream from an Iterable.
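One caveat with the spliterator approach: closing the resulting Stream does not close the underlying DirectoryStream, so either keep the try-with-resources on the DirectoryStream as shown above, or attach a close handler so the Stream carries the responsibility; a sketch:
static Stream<Path> entries(Path dir) throws IOException {
    DirectoryStream<Path> ds = Files.newDirectoryStream(dir);
    return StreamSupport.stream(ds.spliterator(), false)
            .onClose(() -> {
                try {
                    ds.close(); // closing the Stream now closes the DirectoryStream too
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
}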

Java 8 Streams and try with resources

I thought that the stream API was here to make the code easier to read.
I found something quite annoying. The Stream interface extends the java.lang.AutoCloseable interface.
So if you want to correctly close your streams, you have to use try-with-resources.
Listing 1. Not very nice, streams are not closed.
public void noTryWithResource() {
    Set<Integer> photos = new HashSet<Integer>(Arrays.asList(1, 2, 3));
    @SuppressWarnings("resource")
    List<ImageView> collect = photos.stream()
            .map(photo -> new ImageView(new Image(String.valueOf(photo))))
            .collect(Collectors.<ImageView>toList());
}
Listing 2. With 2 nested try
public void tryWithResource() {
    Set<Integer> photos = new HashSet<Integer>(Arrays.asList(1, 2, 3));
    try (Stream<Integer> stream = photos.stream()) {
        try (Stream<ImageView> map = stream
                .map(photo -> new ImageView(new Image(String.valueOf(photo))))) {
            List<ImageView> collect = map.collect(Collectors.<ImageView>toList());
        }
    }
}
Listing 3. As map returns a stream, both the stream returned by stream() and the one returned by map() have to be closed.
public void tryWithResource2() {
    Set<Integer> photos = new HashSet<Integer>(Arrays.asList(1, 2, 3));
    try (Stream<Integer> stream = photos.stream();
         Stream<ImageView> map = stream.map(photo -> new ImageView(new Image(String.valueOf(photo))))) {
        List<ImageView> collect = map.collect(Collectors.<ImageView>toList());
    }
}
The example I give does not make any sense. I replaced Paths to jpg images with Integers, for the sake of the example. But don't let these details distract you.
What is the best way to deal with these auto-closeable streams?
I have to say I'm not satisfied with any of the 3 options I showed.
What do you think? Are there yet other more elegant solutions?
You're using #SuppressWarnings("resource") which presumably suppresses a warning about an unclosed resource. This isn't one of the warnings emitted by javac. Web searches seem to indicate that Eclipse issues warnings if an AutoCloseable is left unclosed.
This is a reasonable warning according to the Java 7 specification that introduced AutoCloseable:
A resource that must be closed when it is no longer needed.
However, the Java 8 specification for AutoCloseable was relaxed to remove the "must be closed" clause. It now says, in part,
An object that may hold resources ... until it is closed.
It is possible, and in fact common, for a base class to implement AutoCloseable even though not all of its subclasses or instances will hold releasable resources. For code that must operate in complete generality, or when it is known that the AutoCloseable instance requires resource release, it is recommended to use try-with-resources constructions. However, when using facilities such as Stream that support both I/O-based and non-I/O-based forms, try-with-resources blocks are in general unnecessary when using non-I/O-based forms.
This issue was discussed extensively within the Lambda expert group; this message summarizes the decision. Among other things it mentions changes to the AutoCloseable specification (cited above) and the BaseStream specification (cited by other answers). It also mentions the possible need to adjust the Eclipse code inspector for the changed semantics, presumably not to emit warnings unconditionally for AutoCloseable objects. Apparently this message didn't get to the Eclipse folks or they haven't changed it yet.
In summary, if Eclipse warnings are leading you into thinking that you need to close all AutoCloseable objects, that's incorrect. Only certain specific AutoCloseable objects need to be closed. Eclipse needs to be fixed (if it hasn't already) not to emit warnings for all AutoCloseable objects.
You only need to close Streams if the stream needs to do any cleanup of itself, usually I/O. Your example uses a HashSet, so it doesn't need to be closed.
from the Stream javadoc:
Generally, only streams whose source is an IO channel (such as those returned by Files.lines(Path, Charset)) will require closing. Most streams are backed by collections, arrays, or generating functions, which require no special resource management.
So in your example this should work without issue
List<ImageView> collect = photos.stream()
        .map(photo -> ...)
        .collect(toList());
EDIT
Even if you need to clean up resources, you should be able to use just one try-with-resource. Let's pretend you are reading a file where each line in the file is a path to an image:
try (Stream<String> lines = Files.lines(file)) {
    List<ImageView> collect = lines
            .map(line -> new ImageView(new Image("file:" + line))) // each line is a path to an image
            .collect(toList());
}
“Closeable” means “can be closed”, not “must be closed”.
That was true in the past, e.g. see ByteArrayOutputStream:
Closing a ByteArrayOutputStream has no effect.
And that is true now for Streams where the documentation makes clear:
Streams have a BaseStream.close() method and implement AutoCloseable, but nearly all stream instances do not actually need to be closed after use. Generally, only streams whose source is an IO channel (such as those returned by Files.lines(Path, Charset)) will require closing.
So if an audit tool generates false warnings, it’s a problem of the audit tool, not of the API.
Note that even if you want to add resource management, there is no need to nest try statements. While the following is sufficient:
final Path p = Paths.get(System.getProperty("java.home"), "COPYRIGHT");
try (Stream<String> stream = Files.lines(p, StandardCharsets.ISO_8859_1)) {
    System.out.println(stream.filter(s -> s.contains("Oracle")).count());
}
you may also add the secondary Stream to the resource management without an additional try:
final Path p = Paths.get(System.getProperty("java.home"), "COPYRIGHT");
try (Stream<String> stream = Files.lines(p, StandardCharsets.ISO_8859_1);
     Stream<String> filtered = stream.filter(s -> s.contains("Oracle"))) {
    System.out.println(filtered.count());
}
It is possible to create a utility method that reliably closes streams with a try-with-resources statement.
It is a bit like a try-finally that is an expression (as is the case in e.g. Scala).
/**
 * Applies a function to a resource and closes it afterwards.
 * @param sup supplier of the resource that should be closed
 * @param op operation that should be performed on the resource before it is closed
 * @return the result of calling op.apply on the resource
 */
private static <A extends AutoCloseable, B> B applyAndClose(Callable<A> sup, Function<A, B> op) {
    try (A res = sup.call()) {
        return op.apply(res);
    } catch (RuntimeException exc) {
        throw exc;
    } catch (Exception exc) {
        throw new RuntimeException("Wrapped in applyAndClose", exc);
    }
}
(Since resources that need to be closed often also throw exceptions when they are allocated, non-runtime exceptions are wrapped in runtime exceptions, avoiding the need for a separate method that does that.)
With this method the example from the question looks like this:
Set<Integer> photos = new HashSet<Integer>(Arrays.asList(1, 2, 3));
List<ImageView> collect = applyAndClose(photos::stream, s -> s
        .map(photo -> new ImageView(new Image(String.valueOf(photo))))
        .collect(Collectors.toList()));
This is useful in situations when closing the stream is required, such as when using Files.lines. It also helps when you have to do a "double close", as in your example in Listing 3.
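For instance, the Files.lines case from the other questions could look roughly like this with the same helper (p being a Path to a text file):
long count = applyAndClose(() -> Files.lines(p),
        s -> s.filter(line -> line.contains("Oracle")).count());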
This answer is an adaptation of an old answer to a similar question.

Closing streams in the middle of pipelines

When I execute this code which opens a lot of files during a stream pipeline:
public static void main(String[] args) throws IOException {
    Files.find(Paths.get("JAVA_DOCS_DIR/docs/api/"),
            100, (path, attr) -> path.toString().endsWith(".html"))
        .map(file -> runtimizeException(() -> Files.lines(file, StandardCharsets.ISO_8859_1)))
        .map(Stream::count)
        .forEachOrdered(System.out::println);
}
I get an exception:
java.nio.file.FileSystemException: /long/file/name: Too many open files
The problem is that Stream.count does not close the stream when it is done traversing it. But I don't see why it shouldn't, given that it is a terminal operation. The same holds for other terminal operations such as reduce and forEach. flatMap on the other hand closes the streams it consists of.
The documentation tells me to use a try-with-resources statement to close streams if necessary. In my case I could replace the count line with something like this:
.map(s -> { long c = s.count(); s.close(); return c; } )
But that is noisy and ugly and could be a real inconvenience in some cases with big, complex pipelines.
So my questions are the following:
Why were the streams not designed so that terminal operations close the streams they are working on? That would make them work better with IO streams.
What is the best solution for closing IO streams in pipelines?
runtimizeException is a method that wraps checked exceptions in RuntimeExceptions.
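(For reference, a minimal sketch of what such a wrapper might look like; the ThrowingSupplier interface is a custom one, since none of the standard functional interfaces declare checked exceptions:)
@FunctionalInterface
interface ThrowingSupplier<T> {
    T get() throws Exception;
}

static <T> T runtimizeException(ThrowingSupplier<T> supplier) {
    try {
        return supplier.get();
    } catch (RuntimeException e) {
        throw e;                       // let unchecked exceptions pass through unchanged
    } catch (Exception e) {
        throw new RuntimeException(e); // wrap checked exceptions
    }
}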
There are two issues here: handling of checked exceptions such as IOException, and timely closing of resources.
None of the predefined functional interfaces declare any checked exceptions, which means that they have to be handled within the lambda, or wrapped in an unchecked exception and rethrown. It looks like your runtimizeException function does that. You probably also had to declare your own functional interface for it. As you've probably discovered, this is a pain.
On the closing of resources like files, there was some investigation of having streams be closed automatically when the end of the stream was reached. This would be convenient, but it doesn't deal with closing when an exception is thrown. There's no magic do-the-right-thing mechanism for this in streams.
We're left with the standard Java techniques of dealing with resource closure, namely the try-with-resources construct introduced in Java 7. TWR really wants to have resources be closed at the same level in the call stack as they were opened. The principle of "whoever opens it has to close it" applies. TWR also deals with exception handling, which usually makes it convenient to deal with exception handling and resource closing in the same place.
In this example, the stream is somewhat unusual in that it maps a Stream<Path> to a Stream<Stream<String>>. These nested streams are the ones that aren't closed, resulting in the eventual exception when the system runs out of open file descriptors. What makes this difficult is that files are opened by one stream operation and then passed downstream; this makes it impossible to use TWR.
An alternative approach to structuring this pipeline is as follows.
The Files.lines call is the one that opens the file, so this has to be the resource in the TWR statement. The processing of this file is where (some) IOExceptions get thrown, so we can do the exception wrapping in the same TWR statement. This suggests having a simple function that maps the path to a line count, while handling resource closing and exception wrapping:
long lineCount(Path path) {
    try (Stream<String> s = Files.lines(path, StandardCharsets.ISO_8859_1)) {
        return s.count();
    } catch (IOException ioe) {
        throw new UncheckedIOException(ioe);
    }
}
Once you have this helper function, the main pipeline looks like this:
Files.find(Paths.get("JAVA_DOCS_DIR/docs/api/"),
100, (path, attr) -> path.toString().endsWith(".html"))
.mapToLong(this::lineCount)
.forEachOrdered(System.out::println);
It is possible to create a utility method that reliably closes streams in the middle of a pipeline.
This makes sure that each resource is closed with a try-with-resources statement, avoids the need for a purpose-built helper such as the lineCount method above, and is much less verbose than writing the try statement directly in the lambda.
With this method the pipeline from the question looks like this:
Files.find(Paths.get("Java_8_API_docs/docs/api"), 100,
(path, attr) -> path.toString().endsWith(".html"))
.map(file -> applyAndClose(
() -> Files.lines(file, StandardCharsets.ISO_8859_1),
Stream::count))
.forEachOrdered(System.out::println);
The implementation looks like this:
/**
 * Applies a function to a resource and closes it afterwards.
 * @param sup supplier of the resource that should be closed
 * @param op operation that should be performed on the resource before it is closed
 * @return the result of calling op.apply on the resource
 */
private static <A extends AutoCloseable, B> B applyAndClose(Callable<A> sup, Function<A, B> op) {
    try (A res = sup.call()) {
        return op.apply(res);
    } catch (RuntimeException exc) {
        throw exc;
    } catch (Exception exc) {
        throw new RuntimeException("Wrapped in applyAndClose", exc);
    }
}
(Since resources that need to be closed often also throw exceptions when they are allocated, non-runtime exceptions are wrapped in runtime exceptions, avoiding the need for a separate method that does that.)
You will need to call close() in this stream operation, which will cause all underlying close handlers to be called.
Better yet would be to wrap your whole statement in a try-with-resources block, as then it will automagically call the close handler.
This may not be a possibility in your situation; that means you will need to handle it yourself in some operation. Your current methods may not be suited for streams at all.
It seems like you indeed need to do it in your second map() operation.
The close method of the AutoCloseable interface should only be called once. See the documentation of AutoCloseable for more information.
If terminal operations closed the stream automatically, close might be invoked twice. Take a look at the following example:
try (Stream<String> lines = Files.lines(path)) {
    lines.count();
}
As it is defined right now, the close method on lines will be invoked exactly once, regardless of whether the terminal operation completes normally or is aborted with an IOException. If the stream were instead closed implicitly by the terminal operation, the close method would be called once if an IOException occurs, and twice if the operation completes successfully.
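You can observe this with a close handler; a small sketch:
try (Stream<String> lines = Files.lines(path)
        .onClose(() -> System.out.println("close handler ran"))) {
    lines.count();
}
// "close handler ran" is printed exactly once: try-with-resources closes the stream
// once, whether count() completed normally or threw an UncheckedIOException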
Here is an alternative which uses another method from Files and will avoid leaking file descriptors:
Files.find(Paths.get("JAVA_DOCS_DIR/docs/api/"),
100, (path, attr) -> path.toString().endsWith(".html"))
.map(file -> runtimizeException(() -> Files.readAllLines(file, StandardCharsets.ISO_8859_1).size())
.forEachOrdered(System.out::println);
Unlike your version, it will return an int instead of a long for the line count; but you don't have files with that many lines, do you?
