Closing streams in the middle of pipelines - java

When I execute this code which opens a lot of files during a stream pipeline:
public static void main(String[] args) throws IOException {
    Files.find(Paths.get("JAVA_DOCS_DIR/docs/api/"),
               100, (path, attr) -> path.toString().endsWith(".html"))
         .map(file -> runtimizeException(() -> Files.lines(file, StandardCharsets.ISO_8859_1)))
         .map(Stream::count)
         .forEachOrdered(System.out::println);
}
I get an exception:
java.nio.file.FileSystemException: /long/file/name: Too many open files
The problem is that Stream.count does not close the stream when it is done traversing it. But I don't see why it shouldn't, given that it is a terminal operation. The same holds for other terminal operations such as reduce and forEach. flatMap on the other hand closes the streams it consists of.
The documentation tells me to use a try-with-resources statement to close streams if necessary. In my case I could replace the count line with something like this:
.map(s -> { long c = s.count(); s.close(); return c; } )
But that is noisy and ugly and could be a real inconvenience in some cases with big, complex pipelines.
So my questions are the following:
Why were the streams not designed so that terminal operations close the streams they are working on? That would make them work better with IO streams.
What is the best solution for closing IO streams in pipelines?
runtimizeException is a method that wraps checked exceptions in RuntimeExceptions.

There are two issues here: handling of checked exceptions such as IOException, and timely closing of resources.
None of the predefined functional interfaces declare any checked exceptions, which means that they have to be handled within the lambda, or wrapped in an unchecked exception and rethrown. It looks like your runtimizeException function does that. You probably also had to declare your own functional interface for it. As you've probably discovered, this is a pain.
On the closing of resources like files, there was some investigation of having streams be closed automatically when the end of the stream was reached. This would be convenient, but it doesn't deal with closing when an exception is thrown. There's no magic do-the-right-thing mechanism for this in streams.
We're left with the standard Java techniques of dealing with resource closure, namely the try-with-resources construct introduced in Java 7. TWR really wants to have resources be closed at the same level in the call stack as they were opened. The principle of "whoever opens it has to close it" applies. TWR also deals with exception handling, which usually makes it convenient to deal with exception handling and resource closing in the same place.
In this example, the stream is somewhat unusual in that it maps a Stream<Path> to a Stream<Stream<String>>. These nested streams are the ones that aren't closed, resulting in the eventual exception when the system runs out of open file descriptors. What makes this difficult is that files are opened by one stream operation and then passed downstream; this makes it impossible to use TWR.
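The difference between a terminal operation on a nested stream and flatMap can be observed directly. The following is a minimal sketch, not code from the question: the class name CloseBehaviorDemo and its methods are made up for illustration, and onClose handlers stand in for open files. It counts how many inner streams end up closed in each case.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

// Hypothetical demo class; close handlers stand in for open file handles.
class CloseBehaviorDemo {

    /** Consumes each inner stream with count() and reports how many were closed. */
    static int closedAfterCount() {
        AtomicInteger closed = new AtomicInteger();
        Stream.of("a", "b", "c")
              .map(s -> Stream.of(s).onClose(closed::incrementAndGet))
              .mapToLong(Stream::count)   // terminal operation on the inner stream, no close
              .sum();
        return closed.get();
    }

    /** Flattens the inner streams with flatMap and reports how many were closed. */
    static int closedAfterFlatMap() {
        AtomicInteger closed = new AtomicInteger();
        Stream.of("a", "b", "c")
              .flatMap(s -> Stream.of(s).onClose(closed::incrementAndGet))
              .forEach(x -> { });         // flatMap closes each inner stream after draining it
        return closed.get();
    }

    public static void main(String[] args) {
        System.out.println(closedAfterCount());   // 0
        System.out.println(closedAfterFlatMap()); // 3
    }
}
```

With real files, every handler that never runs corresponds to a file descriptor that stays open.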
An alternative approach to structuring this pipeline is as follows.
The Files.lines call is the one that opens the file, so this has to be the resource in the TWR statement. The processing of this file is where (some) IOExceptions get thrown, so we can do the exception wrapping in the same TWR statement. This suggests having a simple function that maps the path to a line count, while handling resource closing and exception wrapping:
long lineCount(Path path) {
    try (Stream<String> s = Files.lines(path, StandardCharsets.ISO_8859_1)) {
        return s.count();
    } catch (IOException ioe) {
        throw new UncheckedIOException(ioe);
    }
}
Once you have this helper function, the main pipeline looks like this:
Files.find(Paths.get("JAVA_DOCS_DIR/docs/api/"),
           100, (path, attr) -> path.toString().endsWith(".html"))
     .mapToLong(this::lineCount)
     .forEachOrdered(System.out::println);
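For readers who want to try the helper-function approach end to end, here is a self-contained sketch. The class name LineCounts, the method totalHtmlLines, and the temporary-directory demo are assumptions added for illustration. It also puts the Files.find stream itself in a try-with-resources statement, since that stream holds open directory handles as well.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Stream;

// Hypothetical wrapper class around the lineCount helper shown above.
class LineCounts {

    /** Opens the file, counts its lines, and closes it at the same call level. */
    static long lineCount(Path path) {
        try (Stream<String> s = Files.lines(path, StandardCharsets.ISO_8859_1)) {
            return s.count();
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
    }

    /** Sums the line counts of all .html files under root. */
    static long totalHtmlLines(Path root) {
        try (Stream<Path> paths = Files.find(root, 100,
                (path, attr) -> path.toString().endsWith(".html"))) {
            return paths.mapToLong(LineCounts::lineCount).sum();
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
    }

    /** Builds a small temporary directory (illustrative data only) and counts its lines. */
    static long demo() {
        try {
            Path dir = Files.createTempDirectory("linecount-demo");
            Files.write(dir.resolve("a.html"), Arrays.asList("one", "two"), StandardCharsets.ISO_8859_1);
            Files.write(dir.resolve("b.html"), Arrays.asList("x", "y", "z"), StandardCharsets.ISO_8859_1);
            Files.write(dir.resolve("c.txt"), Arrays.asList("ignored"), StandardCharsets.ISO_8859_1);
            return totalHtmlLines(dir);
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 5
    }
}
```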

It is possible to create a utility method that reliably closes streams in the middle of a pipeline.
This ensures that each resource is closed by a try-with-resources statement, avoids the need for a purpose-built helper method for each pipeline, and is much less verbose than writing the try statement directly in the lambda.
With this method the pipeline from the question looks like this:
Files.find(Paths.get("Java_8_API_docs/docs/api"), 100,
(path, attr) -> path.toString().endsWith(".html"))
.map(file -> applyAndClose(
() -> Files.lines(file, StandardCharsets.ISO_8859_1),
Stream::count))
.forEachOrdered(System.out::println);
The implementation looks like this:
/**
 * Applies a function to a resource and closes it afterwards.
 * @param sup Supplier of the resource that should be closed
 * @param op operation that should be performed on the resource before it is closed
 * @return The result of calling op.apply on the resource
 */
private static <A extends AutoCloseable, B> B applyAndClose(Callable<A> sup, Function<A, B> op) {
    try (A res = sup.call()) {
        return op.apply(res);
    } catch (RuntimeException exc) {
        throw exc;
    } catch (Exception exc) {
        throw new RuntimeException("Wrapped in applyAndClose", exc);
    }
}
(Resources that need to be closed often also throw exceptions when they are allocated, so non-runtime exceptions are wrapped in runtime exceptions, avoiding the need for a separate method that does that.)
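To check that a helper of this shape really closes the resource on both the normal and the exceptional path, a small sketch like the following can be used. The class ApplyAndCloseDemo and its two check methods are hypothetical. The helper body is copied from the answer, and a close handler on an in-memory stream stands in for a file.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.stream.Stream;

// Hypothetical demo class; applyAndClose is the helper from the answer above.
class ApplyAndCloseDemo {

    static <A extends AutoCloseable, B> B applyAndClose(Callable<A> sup, Function<A, B> op) {
        try (A res = sup.call()) {
            return op.apply(res);
        } catch (RuntimeException exc) {
            throw exc;
        } catch (Exception exc) {
            throw new RuntimeException("Wrapped in applyAndClose", exc);
        }
    }

    /** Returns how often the stream was closed when the operation succeeds. */
    static int closesOnSuccess() {
        AtomicInteger closed = new AtomicInteger();
        applyAndClose(
                () -> Stream.of("a", "b").onClose(closed::incrementAndGet),
                Stream::count);
        return closed.get();
    }

    /** Returns how often the stream was closed when the operation throws. */
    static int closesOnFailure() {
        AtomicInteger closed = new AtomicInteger();
        try {
            applyAndClose(
                    () -> Stream.of("a", "b").onClose(closed::incrementAndGet),
                    s -> { throw new IllegalStateException("boom"); });
        } catch (IllegalStateException expected) {
            // the runtime exception is rethrown unchanged after the resource is closed
        }
        return closed.get();
    }

    public static void main(String[] args) {
        System.out.println(closesOnSuccess()); // 1
        System.out.println(closesOnFailure()); // 1
    }
}
```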

You need to call close() on the stream inside this stream operation; that causes all underlying close handlers to be called.
Better yet, wrap the whole statement in a try-with-resources block, which calls the close handler automatically.
That may not be possible in your situation, in which case you have to handle it yourself in some operation. Your current methods may not be suited to streams at all.
It seems you do indeed need to do it in your second map() operation.

The close method of AutoCloseable should be called only once. See the documentation of AutoCloseable for more information.
If terminal operations closed the stream automatically, close might be invoked twice. Take a look at the following example:
try (Stream<String> lines = Files.lines(path)) {
lines.count();
}
As it is defined right now, the close method on lines will be invoked exactly once, regardless of whether the terminal operation completes normally or is aborted with an I/O exception. If the stream were instead closed implicitly by the terminal operation, close would be called once if an exception occurs and twice if the operation completes successfully.

Here is an alternative which uses another method from Files and will avoid leaking file descriptors:
Files.find(Paths.get("JAVA_DOCS_DIR/docs/api/"),
           100, (path, attr) -> path.toString().endsWith(".html"))
     .map(file -> runtimizeException(() -> Files.readAllLines(file, StandardCharsets.ISO_8859_1).size()))
     .forEachOrdered(System.out::println);
Unlike your version, it will return an int instead of a long for the line count; but you don't have files with that many lines, do you?

Related

What could be a better way to achieve Checked Exception handling in mapping functions passed to Java 8 Streams

I have a simple scenario which I am trying to code without being clumsy and without writing unreadable multiline lambdas.
public class StreamTest {
    public static void main(String[] args) {
        List<String> list = Arrays.asList("hellow", "world");
        Stream<String> stream = list.stream().map(StreamTest::exceptionThrowingMappingFunction);
    }
    public static String exceptionThrowingMappingFunction(String s) throws Exception {
        if (s.equals("world")) {
            throw new Exception("world is doomed");
        }
        return s + " exists";
    }
}
What I would like to have are the following options:
Fail the whole stream if the exception is thrown
Skip the value and continue with the rest of the stream if exception occurs
I know about popular ways of dealing with this, like throwing a RuntimeException in a custom FunctionalInterface or just handling the exception inline.
But is there some way to extend Stream and write something like StreamWithExceptionHandling extends Stream, which also accepts an ExceptionHandler and implements the behaviour above?
Thanks for taking your time to read this one.
Try writing a sample solution and posting it to Code Review. Your problem might be a good fit.
Lambdas are useful for one liners. For the rest: don't feel bad about just defining a class or a method.
For option 2, map the value into a result object that contains operation status and return value and then filter by status. You'll avoid introducing non-standard behaviour to the streams API.
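The result-object idea for option 2 could look roughly like this. It is a sketch under assumptions: Result, ThrowingFunction, and attempt are hypothetical names invented here, not part of the JDK, and the mapping function is the one from the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical demo class; none of these helper types exist in the JDK.
class SkipFailures {

    /** Holds either a value or the exception that prevented computing it. */
    static final class Result<T> {
        final T value;
        final Exception error;
        private Result(T value, Exception error) { this.value = value; this.error = error; }
        boolean isSuccess() { return error == null; }
    }

    /** Like Function, but the method may throw a checked exception. */
    @FunctionalInterface
    interface ThrowingFunction<T, R> {
        R apply(T t) throws Exception;
    }

    /** Adapts a throwing function so that failures become Result objects. */
    static <T, R> Function<T, Result<R>> attempt(ThrowingFunction<T, R> f) {
        return t -> {
            try {
                return new Result<>(f.apply(t), null);
            } catch (Exception e) {
                return new Result<>(null, e);
            }
        };
    }

    static String exceptionThrowingMappingFunction(String s) throws Exception {
        if (s.equals("world")) {
            throw new Exception("world is doomed");
        }
        return s + " exists";
    }

    /** Maps every element, then keeps only the successful results. */
    static List<String> demo() {
        return Arrays.asList("hellow", "world").stream()
                .map(attempt(SkipFailures::exceptionThrowingMappingFunction))
                .filter(Result::isSuccess)
                .map(r -> r.value)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [hellow exists]
    }
}
```

For option 1 (fail the whole stream), the same adapter could rethrow the stored exception instead of filtering it out.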
You can use CompletionStages to help out in this scenario. They have a good interface for handling exceptional flows.
So, convert your streamed value to an already completed CompletableFuture, as a map step in the stream, then map it again to CompletionStage.thenApply, which returns a new CompletionStage that holds any exceptions for you. You can then filter the unwanted exceptional completion stages out of the stream, or include other processing steps if you want (like logging the exception, for example).
And of course you can map the value back out of a CompletionStage into the actual completed value easily enough.
It’s one way to do it at least, without trying to write your own streams interface.
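A sketch of the CompletionStage approach, under the assumption that the checked exception still has to be wrapped by a small adapter, because thenApply accepts an ordinary Function. The class CompletionStageDemo and the adapter mapUnchecked are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.stream.Collectors;

// Hypothetical demo class for the CompletableFuture-based approach.
class CompletionStageDemo {

    static String exceptionThrowingMappingFunction(String s) throws Exception {
        if (s.equals("world")) {
            throw new Exception("world is doomed");
        }
        return s + " exists";
    }

    /** Adapter: rethrows the checked exception as an unchecked CompletionException. */
    static String mapUnchecked(String s) {
        try {
            return exceptionThrowingMappingFunction(s);
        } catch (Exception e) {
            throw new CompletionException(e);
        }
    }

    /** Failed mappings become exceptionally completed futures, which are filtered out. */
    static List<String> demo() {
        return Arrays.asList("hellow", "world").stream()
                .map(s -> CompletableFuture.completedFuture(s)
                                           .thenApply(CompletionStageDemo::mapUnchecked))
                .filter(f -> !f.isCompletedExceptionally())
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [hellow exists]
    }
}
```

Because the source future is already completed, thenApply runs the function synchronously on the calling thread, so no executor is involved.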

Why is Files.lines (and similar Streams) not automatically closed?

The javadoc for Stream states:
Streams have a BaseStream.close() method and implement AutoCloseable, but nearly all stream instances do not actually need to be closed after use. Generally, only streams whose source is an IO channel (such as those returned by Files.lines(Path, Charset)) will require closing. Most streams are backed by collections, arrays, or generating functions, which require no special resource management. (If a stream does require closing, it can be declared as a resource in a try-with-resources statement.)
Therefore, the vast majority of the time one can use Streams in a one-liner, like collection.stream().forEach(System.out::println); but for Files.lines and other resource-backed streams, one must use a try-with-resources statement or else leak resources.
This strikes me as error-prone and unnecessary. As Streams can only be iterated once, it seems to me that there is no situation where the output of Files.lines should not be closed as soon as it has been iterated, and therefore the implementation should simply call close implicitly at the end of any terminal operation. Am I mistaken?
Yes, this was a deliberate decision. We considered both alternatives.
The operating design principle here is "whoever acquires the resource should release the resource". Files don't auto-close when you read to EOF; we expect files to be closed explicitly by whoever opened them. Streams that are backed by IO resources are the same.
Fortunately, the language provides a mechanism for automating this for you: try-with-resources. Because Stream implements AutoCloseable, you can do:
try (Stream<String> s = Files.lines(...)) {
s.forEach(...);
}
The argument that "it would be really convenient to auto-close so I could write it as a one-liner" is nice, but would mostly be the tail wagging the dog. If you opened a file or other resource, you should also be prepared to close it. Effective and consistent resource management trumps "I want to write this in one line", and we chose not to distort the design just to preserve the one-line-ness.
I have a more specific example in addition to @BrianGoetz's answer. Don't forget that the Stream has escape-hatch methods like iterator(). Suppose you are doing this:
Iterator<String> iterator = Files.lines(path).iterator();
After that you may call hasNext() and next() several times, then just abandon this iterator: Iterator interface perfectly supports such use. There's no way to explicitly close the Iterator, the only object you can close here is the Stream. So this way it would work perfectly fine:
try(Stream<String> stream = Files.lines(path)) {
Iterator<String> iterator = stream.iterator();
// use iterator in any way you want and abandon it at any moment
} // file is correctly closed here.
In addition, if you want to write it in one line, you can do this:
Files.readAllLines(source).stream().forEach(...);
Use this only if you are sure that you need the entire file and the file is small, because the read is not lazy.
If you're lazy like me and don't mind the "if an exception is raised, it will leave the file handle open" you could wrap the stream in an autoclosing stream, something like this (there may be other ways):
static Stream<String> allLinesCloseAtEnd(String filename) throws IOException {
    Stream<String> lines = Files.lines(Paths.get(filename));
    Iterator<String> linesIter = lines.iterator();
    Iterator<String> it = new Iterator<String>() {
        @Override
        public boolean hasNext() {
            if (!linesIter.hasNext()) {
                lines.close(); // auto-close when reach end
                return false;
            }
            return true;
        }
        @Override
        public String next() {
            return linesIter.next();
        }
    };
    // Lines are ordered and non-null, but not guaranteed to be distinct.
    return StreamSupport.stream(
        Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED | Spliterator.NONNULL), false);
}

Java 8 Streams and try with resources

I thought that the stream API was here to make the code easier to read.
I found something quite annoying. The Stream interface extends the java.lang.AutoCloseable interface.
So if you want to correctly close your streams, you have to use try with resources.
Listing 1. Not very nice, streams are not closed.
public void noTryWithResource() {
    Set<Integer> photos = new HashSet<Integer>(Arrays.asList(1, 2, 3));
    @SuppressWarnings("resource") List<ImageView> collect = photos.stream()
        .map(photo -> new ImageView(new Image(String.valueOf(photo))))
        .collect(Collectors.<ImageView>toList());
}
Listing 2. With 2 nested try
public void tryWithResource() {
    Set<Integer> photos = new HashSet<Integer>(Arrays.asList(1, 2, 3));
    try (Stream<Integer> stream = photos.stream()) {
        try (Stream<ImageView> map = stream
                .map(photo -> new ImageView(new Image(String.valueOf(photo)))))
        {
            List<ImageView> collect = map.collect(Collectors.<ImageView>toList());
        }
    }
}
Listing 3. As map returns a stream, both the stream() and the map() functions have to be closed.
public void tryWithResource2() {
    Set<Integer> photos = new HashSet<Integer>(Arrays.asList(1, 2, 3));
    try (Stream<Integer> stream = photos.stream(); Stream<ImageView> map = stream.map(photo -> new ImageView(new Image(String.valueOf(photo)))))
    {
        List<ImageView> collect = map.collect(Collectors.<ImageView>toList());
    }
}
The example I give does not make much sense: I replaced the Paths to JPEG images with Integers for the sake of the example. But don't let these details distract you.
What is the best way to deal with these auto-closeable streams?
I have to say I'm not satisfied with any of the 3 options I showed.
What do you think? Are there yet other more elegant solutions?
You're using @SuppressWarnings("resource"), which presumably suppresses a warning about an unclosed resource. This isn't one of the warnings emitted by javac. Web searches seem to indicate that Eclipse issues warnings if an AutoCloseable is left unclosed.
This is a reasonable warning according to the Java 7 specification that introduced AutoCloseable:
A resource that must be closed when it is no longer needed.
However, the Java 8 specification for AutoCloseable was relaxed to remove the "must be closed" clause. It now says, in part,
An object that may hold resources ... until it is closed.
It is possible, and in fact common, for a base class to implement AutoCloseable even though not all of its subclasses or instances will hold releasable resources. For code that must operate in complete generality, or when it is known that the AutoCloseable instance requires resource release, it is recommended to use try-with-resources constructions. However, when using facilities such as Stream that support both I/O-based and non-I/O-based forms, try-with-resources blocks are in general unnecessary when using non-I/O-based forms.
This issue was discussed extensively within the Lambda expert group; this message summarizes the decision. Among other things it mentions changes to the AutoCloseable specification (cited above) and the BaseStream specification (cited by other answers). It also mentions the possible need to adjust the Eclipse code inspector for the changed semantics, presumably not to emit warnings unconditionally for AutoCloseable objects. Apparently this message didn't get to the Eclipse folks or they haven't changed it yet.
In summary, if Eclipse warnings are leading you into thinking that you need to close all AutoCloseable objects, that's incorrect. Only certain specific AutoCloseable objects need to be closed. Eclipse needs to be fixed (if it hasn't already) not to emit warnings for all AutoCloseable objects.
You only need to close Streams if the stream needs to do any cleanup of itself, usually I/O. Your example uses a HashSet, so it doesn't need to be closed.
from the Stream javadoc:
Generally, only streams whose source is an IO channel (such as those returned by Files.lines(Path, Charset)) will require closing. Most streams are backed by collections, arrays, or generating functions, which require no special resource management.
So in your example this should work without issue
List<ImageView> collect = photos.stream()
.map(photo -> ...)
.collect(toList());
EDIT
Even if you need to clean up resources, you should be able to use just one try-with-resource. Let's pretend you are reading a file where each line in the file is a path to an image:
try (Stream<String> lines = Files.lines(file)) {
    List<ImageView> collect = lines
        // Image takes a URL string, so the path on each line is converted to a file URI
        .map(line -> new ImageView(new Image(new File(line).toURI().toString())))
        .collect(toList());
}
“Closeable” means “can be closed”, not “must be closed”.
That was true in the past, e.g. see ByteArrayOutputStream:
Closing a ByteArrayOutputStream has no effect.
And that is true now for Streams where the documentation makes clear:
Streams have a BaseStream.close() method and implement AutoCloseable, but nearly all stream instances do not actually need to be closed after use. Generally, only streams whose source is an IO channel (such as those returned by Files.lines(Path, Charset)) will require closing.
So if an audit tool generates false warnings, it’s a problem of the audit tool, not of the API.
Note that even if you want to add resource management, there is no need to nest try statements. While the following is sufficient:
final Path p = Paths.get(System.getProperty("java.home"), "COPYRIGHT");
try(Stream<String> stream=Files.lines(p, StandardCharsets.ISO_8859_1)) {
System.out.println(stream.filter(s->s.contains("Oracle")).count());
}
you may also add the secondary Stream to the resource management without an additional try:
final Path p = Paths.get(System.getProperty("java.home"), "COPYRIGHT");
try(Stream<String> stream=Files.lines(p, StandardCharsets.ISO_8859_1);
Stream<String> filtered=stream.filter(s->s.contains("Oracle"))) {
System.out.println(filtered.count());
}
It is possible to create a utility method that reliably closes streams with a try-with-resource-statement.
It is a bit like a try-finally that is an expression (something that is the case in e.g. Scala).
/**
 * Applies a function to a resource and closes it afterwards.
 * @param sup Supplier of the resource that should be closed
 * @param op operation that should be performed on the resource before it is closed
 * @return The result of calling op.apply on the resource
 */
private static <A extends AutoCloseable, B> B applyAndClose(Callable<A> sup, Function<A, B> op) {
    try (A res = sup.call()) {
        return op.apply(res);
    } catch (RuntimeException exc) {
        throw exc;
    } catch (Exception exc) {
        throw new RuntimeException("Wrapped in applyAndClose", exc);
    }
}
(Resources that need to be closed often also throw exceptions when they are allocated, so non-runtime exceptions are wrapped in runtime exceptions, avoiding the need for a separate method that does that.)
With this method the example from the question looks like this:
Set<Integer> photos = new HashSet<Integer>(Arrays.asList(1, 2, 3));
List<ImageView> collect = applyAndClose(photos::stream, s -> s
.map(photo -> new ImageView(new Image(String.valueOf(photo))))
.collect(Collectors.toList()));
This is useful in situations when closing the stream is required, such as when using Files.lines. It also helps when you have to do a "double close", as in your example in Listing 3.
This answer is an adaptation of an old answer to a similar question.

Suggestion for API design about stream

I need to design an API method which takes an OutputStream as a parameter.
Is it a good practice to close the stream inside the API method or let the caller close it?
test(OutputStream os) {
os.close() //???
}
I think it should be symmetric.
If you do not open that stream (which is likely to be your case), you should not close it, either, in general.
Unless the purpose of the API is to "finish up the stream", you should let the caller close it. The caller had it first and is responsible for it, and may want to write more to the stream than your API originally envisioned. Keep your functionality separated; it's more composable.
Let the user close it. Since you take an OutputStream as an argument, the user has presumably already created and opened it, so closing it in your method would be a bad idea. If instead your method creates and opens the OutputStream itself, there is no need to take it as an argument, and you can close it in your method as well.
Different use-cases require different patterns, for example, depending on whether the caller needs to read from or write to the stream after the call has completed.
The key API design rule is that the API should specify whether it is the caller or called method's responsibility to close the stream.
Having said that, it is generally simpler and safer if the code that opens a stream is also responsible for closing it.
Consider the case where methodA is supposed to open a stream and pass it to methodB, but an exception is thrown between the stream being opened and methodB entering the try / finally statement that is ultimately responsible for closing it. You need to code it something like the following to ensure that streams don't leak:
public void methodA() throws IOException {
    InputStream myStream = new FileInputStream(...);
    try {
        // do stuff with stream
        methodB(myStream);
    } finally {
        myStream.close();
    }
}
/**
 * @param myStream this method is responsible for closing myStream.
 */
public void methodB(InputStream myStream) throws IOException {
    try {
        // do more stuff with myStream
    } finally {
        myStream.close();
    }
}
This won't leak an open stream as a result of exceptions (or errors!) thrown in either methodA or methodB. (It works for the standard stream types because the Closeable API specifies that close has no effect when called on a stream that is already closed.)
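The simpler "whoever opens it closes it" arrangement recommended in the answers can be sketched as follows, using try-with-resources on the caller's side. The class CallerClosesDemo and the method writeGreeting are hypothetical; a ByteArrayOutputStream stands in for a real file or socket stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: the API method writes, the caller opens and closes.
class CallerClosesDemo {

    /** API method: writes to the stream but leaves it open, because the caller owns it. */
    static void writeGreeting(OutputStream os) throws IOException {
        os.write("hello".getBytes(StandardCharsets.US_ASCII));
        os.flush();
    }

    /** Caller: opens the stream, uses the API, keeps writing, and closes it exactly once. */
    static String demo() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (OutputStream os = buffer) {
            writeGreeting(os);
            // The stream is still open here, so the caller can append more data.
            os.write(", world".getBytes(StandardCharsets.US_ASCII));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // hello, world
    }
}
```

Had writeGreeting closed the stream, the second write would fail on most real stream types, which is the composability problem the answers describe.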

In Java how would you write the equivalent of Iterable which could throw exceptions?

In Java a class can implement Iterable, which lets you use the enhanced for statement and its iteration syntactic sugar:
for(T t:ts) ...
However, this does not allow you to throw exceptions on the construction of an Iterator. If you were iterating over a network, file, database, etc., it would be nice to be able to throw exceptions. Obvious candidates are java.io.InputStream, Reader and the java.nio.Channel code, but none of these can use generics the way the Iterable interface can.
Is there a common idiom or Java API for this situation?
Clarification: This is asking if there is a pattern or alternative interface for iterating for objects off a non-memory source. As responders have said, just throwing RuntimeExceptions to get around the problem is not recommended or what I was looking for.
Edit 2: Thanks to answers so far. The consensus seems to be "you can't". So can I extend the question to "What do you do in this situation, when this situation would be useful?" Just write your own interface?
Unfortunately you can't. There are two problems:
The Iterator API doesn't declare any exceptions to be thrown, so you'd have to throw RuntimeExceptions (or non-Exception throwables)
The enhanced for loop doesn't do anything to try to release resources at the end of the loop
This is very annoying. In C#, for instance, you can really easily write code to iterate through the lines of a text file:
public static IEnumerable<string> ReadLines(string filename)
{
using (TextReader reader = File.OpenText(filename))
{
string line;
while ( (line=reader.ReadLine()) != null)
{
yield return line;
}
}
}
Use as:
foreach (string line in ReadLines("foo.txt"))
The foreach loop calls Dispose on the IEnumerator in a finally block, which translates to "check if we need to do anything in the iterator block's finally (from the using statement)". Obviously there are no checked exceptions in C#, so that side of things isn't a problem either.
A whole (useful!) idiom is pretty much unworkable in Java due to this.
Streams like a network aren't really iterable in the traditional sense. Data can come through at any time, so it doesn't make sense to have a for each loop.
For a file read, or a DB snapshot (like a select query) there's no reason you can't take that data, segment it into logical chunks and implement an iterable interface.
You can also call an initialize method first that will catch any exceptions, if that's an issue.
try{
ts.initializeIOIterator();
}catch(...)
for(T t:ts)
...
The best you can do is to create a RuntimeIOException that you throw from your hasNext/next implementation in case of errors.
try {
    for (...) {
        // do my stuff here
    }
} catch (RuntimeIOException e) {
    throw e.getCause(); // rethrow IOException
}
RuntimeIOException is a runtime exception that wraps your IOException:
class RuntimeIOException extends RuntimeException {
    RuntimeIOException(IOException e) {
        super(e);
    }
    // Must be public: an override cannot reduce the visibility of Throwable.getCause().
    @Override
    public synchronized IOException getCause() {
        return (IOException) super.getCause();
    }
}
Sometimes there is no other way.
I'd say you can't, and even if you could, you probably shouldn't. You get bytes from these things; if they were used in a for loop, every byte would likely end up boxed.
What you can do is wrap checked exceptions in unchecked exceptions and conform to the Iterable interface, though again this isn't advisable.
Generally in this situation, I would throw an appropriate subclass of RuntimeException in the Iterable's implementation.
In terms of cleaning up resources, a try - finally block works just as well wrapping a foreach block as it does around any other bit of code, so from the client's perspective it can easily use this to clean up any resources. If you want to manage resources within the Iterable it can be trickier, since there's no obvious start and finish lifecycle points.
In this case the best you could probably do is to create the resources on demand (i.e. on the first call to next()), and then destroy them either when a call to hasNext() is about to return false, or when an exception is thrown in the body of next(). Doing this would of course mean that when your next() method exits with an exception, the iterator can no longer be used. This is not an unreasonable constraint to place (consider the exception a more error-y version of returning false), but it is something you should document, as it isn't strictly covered by the interface.
That said, the above assumes that you're creating something solely as an Iterable. I find that in practice, when I implement Iterable on a class, it's more like a "super-getter" (i.e. a way for clients to conveniently access the information stored within it), than it is the point of the class itself. Most of the time these objects will be set up independently and accessed via other methods, so their lifecycle can be managed completely separately from their existence as an Iterable.
This might seem tangential to the question, but the immediate answer to the question is straightforward ("use runtime exceptions") - the tricky part is maintaining an appropriate state in the presence of these exceptions.
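One way to combine the suggestions above (unchecked wrapping inside the iterator, cleanup managed by the client) is an Iterable that also implements AutoCloseable, so that the for-each loop can sit inside a try-with-resources statement. This is only a sketch: LineIterable is a hypothetical class, and it uses UncheckedIOException, which exists only since Java 8.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical class: iterates over the lines of a Reader and can be closed by the client.
class LineIterable implements Iterable<String>, AutoCloseable {
    private final BufferedReader reader;

    LineIterable(Reader source) {
        this.reader = new BufferedReader(source);
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private String next;   // look-ahead line, null when nothing is buffered
            private boolean done;  // true once the reader is exhausted

            @Override
            public boolean hasNext() {
                if (next == null && !done) {
                    try {
                        next = reader.readLine();
                    } catch (IOException e) {
                        // Iterator methods cannot declare IOException, so wrap it.
                        throw new UncheckedIOException(e);
                    }
                    done = (next == null);
                }
                return next != null;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                String line = next;
                next = null;
                return line;
            }
        };
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }

    /** The resource is closed when the try block exits, even if the loop throws. */
    static List<String> demo() {
        List<String> lines = new ArrayList<>();
        try (LineIterable ts = new LineIterable(new StringReader("a\nb\nc"))) {
            for (String t : ts) {
                lines.add(t);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [a, b, c]
    }
}
```

A client that needs the original checked exception can catch UncheckedIOException around the loop and rethrow its getCause(), as in the RuntimeIOException answer.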
