I thought that the stream API was here to make the code easier to read.
I found something quite annoying. The Stream interface extends the java.lang.AutoCloseable interface.
So if you want to correctly close your streams, you have to use try with resources.
Listing 1. Not very nice, streams are not closed.
public void noTryWithResource() {
Set<Integer> photos = new HashSet<Integer>(Arrays.asList(1, 2, 3));
#SuppressWarnings("resource") List<ImageView> collect = photos.stream()
.map(photo -> new ImageView(new Image(String.valueOf(photo))))
.collect(Collectors.<ImageView>toList());
}
Listing 2. With 2 nested try
public void tryWithResource() {
Set<Integer> photos = new HashSet<Integer>(Arrays.asList(1, 2, 3));
try (Stream<Integer> stream = photos.stream()) {
try (Stream<ImageView> map = stream
.map(photo -> new ImageView(new Image(String.valueOf(photo)))))
{
List<ImageView> collect = map.collect(Collectors.<ImageView>toList());
}
}
}
Listing 3. As map returns a stream, both the stream() and the map() functions have to be closed.
public void tryWithResource2() {
Set<Integer> photos = new HashSet<Integer>(Arrays.asList(1, 2, 3));
try (Stream<Integer> stream = photos.stream(); Stream<ImageView> map = stream.map(photo -> new ImageView(new Image(String.valueOf(photo)))))
{
List<ImageView> collect = map.collect(Collectors.<ImageView>toList());
}
}
The example I give does not make any sense. I replaced Path to jpg images with Integer, for the sake of the example. But don't let you distract by these details.
What is the best way to go around with those auto closable streams.
I have to say I'm not satisfied with any of the 3 options I showed.
What do you think? Are there yet other more elegant solutions?
You're using #SuppressWarnings("resource") which presumably suppresses a warning about an unclosed resource. This isn't one of the warnings emitted by javac. Web searches seem to indicate that Eclipse issues warnings if an AutoCloseable is left unclosed.
This is a reasonable warning according to the Java 7 specification that introduced AutoCloseable:
A resource that must be closed when it is no longer needed.
However, the Java 8 specification for AutoCloseable was relaxed to remove the "must be closed" clause. It now says, in part,
An object that may hold resources ... until it is closed.
It is possible, and in fact common, for a base class to implement AutoCloseable even though not all of its subclasses or instances will hold releasable resources. For code that must operate in complete generality, or when it is known that the AutoCloseable instance requires resource release, it is recommended to use try-with-resources constructions. However, when using facilities such as Stream that support both I/O-based and non-I/O-based forms, try-with-resources blocks are in general unnecessary when using non-I/O-based forms.
This issue was discussed extensively within the Lambda expert group; this message summarizes the decision. Among other things it mentions changes to the AutoCloseable specification (cited above) and the BaseStream specification (cited by other answers). It also mentions the possible need to adjust the Eclipse code inspector for the changed semantics, presumably not to emit warnings unconditionally for AutoCloseable objects. Apparently this message didn't get to the Eclipse folks or they haven't changed it yet.
In summary, if Eclipse warnings are leading you into thinking that you need to close all AutoCloseable objects, that's incorrect. Only certain specific AutoCloseable objects need to be closed. Eclipse needs to be fixed (if it hasn't already) not to emit warnings for all AutoCloseable objects.
You only need to close Streams if the stream needs to do any cleanup of itself, usually I/O. Your example uses an HashSet so it doesn't need to be closed.
from the Stream javadoc:
Generally, only streams whose source is an IO channel (such as those returned by Files.lines(Path, Charset)) will require closing. Most streams are backed by collections, arrays, or generating functions, which require no special resource management.
So in your example this should work without issue
List<ImageView> collect = photos.stream()
.map(photo -> ...)
.collect(toList());
EDIT
Even if you need to clean up resources, you should be able to use just one try-with-resource. Let's pretend you are reading a file where each line in the file is a path to an image:
try(Stream<String> lines = Files.lines(file)){
List<ImageView> collect = lines
.map(line -> new ImageView( ImageIO.read(new File(line)))
.collect(toList());
}
“Closeable” means “can be closed”, not “must be closed”.
That was true in the past, e.g. see ByteArrayOutputStream:
Closing a ByteArrayOutputStream has no effect.
And that is true now for Streams where the documentation makes clear:
Streams have a BaseStream.close() method and implement AutoCloseable, but nearly all stream instances do not actually need to be closed after use. Generally, only streams whose source is an IO channel (such as those returned by Files.lines(Path, Charset)) will require closing.
So if an audit tool generates false warnings, it’s a problem of the audit tool, not of the API.
Note that even if you want to add resource management, there is no need to nest try statements. While the following is sufficient:
final Path p = Paths.get(System.getProperty("java.home"), "COPYRIGHT");
try(Stream<String> stream=Files.lines(p, StandardCharsets.ISO_8859_1)) {
System.out.println(stream.filter(s->s.contains("Oracle")).count());
}
you may also add the secondary Stream to the resource management without an additional try:
final Path p = Paths.get(System.getProperty("java.home"), "COPYRIGHT");
try(Stream<String> stream=Files.lines(p, StandardCharsets.ISO_8859_1);
Stream<String> filtered=stream.filter(s->s.contains("Oracle"))) {
System.out.println(filtered.count());
}
It is possible to create a utility method that reliably closes streams with a try-with-resource-statement.
It is a bit like a try-finally that is an expression (something that is the case in e.g. Scala).
/**
* Applies a function to a resource and closes it afterwards.
* #param sup Supplier of the resource that should be closed
* #param op operation that should be performed on the resource before it is closed
* #return The result of calling op.apply on the resource
*/
private static <A extends AutoCloseable, B> B applyAndClose(Callable<A> sup, Function<A, B> op) {
try (A res = sup.call()) {
return op.apply(res);
} catch (RuntimeException exc) {
throw exc;
} catch (Exception exc) {
throw new RuntimeException("Wrapped in applyAndClose", exc);
}
}
(Since resources that need to be closed often also throw exceptions when they are allocated non-runtime exceptions are wrapped in runtime exceptions, avoiding the need for a separate method that does that.)
With this method the example from the question looks like this:
Set<Integer> photos = new HashSet<Integer>(Arrays.asList(1, 2, 3));
List<ImageView> collect = applyAndClose(photos::stream, s -> s
.map(photo -> new ImageView(new Image(String.valueOf(photo))))
.collect(Collectors.toList()));
This is useful in situations when closing the stream is required, such as when using Files.lines. It also helps when you have to do a "double close", as in your example in Listing 3.
This answer is an adaptation of an old answer to a similar question.
Related
I have a simple scenario which I am trying to code without being clumsy and without writing unreadable multiline lambdas.
public class StreamTest {
public static void main(String[] args) {
List<String> list = Arrays.asList("hellow", "world");
Stream<String> stream = list.stream().map(StreamTest::exceptionThrowingMappingFunction);
}
public static String exceptionThrowingMappingFunction(String s) throws Exception {
if (s.equals("world")) {
throw new Exception("world is doomed");
}
return s + " exists";
}
}
What I would like to have are the following options:
Fail the whole stream if the exception is thrown
Skip the value and continue with the rest of the stream if exception occurs
I know about popular ways of dealing with this, like throwing a RuntimeException in a custom FunctionalInterface or just handling the exception inline.
But is there some way, where I can extend Streams and just write a stream like StreamWithExceptionHandling extends Stream. Which also accepts an ExceptionHandler and just implements the above behaviour?
Thanks for taking your time to read this one.
Try writing a sample solution and posting it to Code Review. Your problem might be a good fit.
Lambdas are useful for one liners. For the rest: don't feel bad about just defining a class or a method.
For option 2, map the value into a result object that contains operation status and return value and then filter by status. You'll avoid introducing non-standard behaviour to the streams API.
You can use CompletionStages to help out in this scenario. They have a good interface for handling exceptional flows.
So, convert your streamed value to an already completed CompletableFuture, as a map step in the stream, then map it again to CompletionStage.thenApply, which returns a new CompletionStage that holds any exceptions for you. You can then filter the unwanted exceptional completion stages out of the stream, or include other other processing steps if you want (like logging the exception, for example).
And of course you can map the value back out of a CompletionStage into the actual completed value easily enough.
It’s one way to do it at least, without trying to write your own streams interface.
I have a class which manages a Stream:
class MyStreamManager {
private Stream<Object> currentStream = null;
boolean hasMoreData() {
//code here to assert currentStream is null
final Optional<Stream<Object>> maybeAStream = somethingWhichMightProvideAStream.getNextStream();
currentStream = maybeAStream.orElse(null);
return currentStream != null;
}
#MustBeClosed
Stream<Object> getCurrentStream() { return currentStream; }
void finish() {
currentStream.close();
currentStream = null;
}
}
Which is used in the following style:
while (myStreamManager.hasMoreData()) {
try {
myStreamManager.getCurrentStream().map(...).filter(...); //etc
} finally {
myStreamManager.finish();
}
}
Is storing a reference to a Stream like this bad practice? While this works, it definitely doesn't feel right, and ErrorProne is flagging it (hence the #MustBeClosed annotation).
MyStreamManager is a Spring #Bean but is only used by one thread (this is running in a batch).
I can think of two different approaches which are probably better:
instantiate MyStreamManager and wrap it in a try-with-resources, delegating the close() call to the Stream
use the Spliterators class to create a Spliterator that delegates to many Streams?
I don't think that it's as much the fact you're storing a Stream per se that makes this feel awkward, but rather that you've got sequential coupling.
You have to call hasMoreData; then getCurrentStream(); then finish(). If you're only using the class in a limited number of places, you will probably be able to get it right in all of those; but every place you use it is a new opportunity to use it incorrectly.
I would say that your manager class is actually just making things harder for yourself.
for (Optional<Stream<Object>> opt = somethingWhichMightProvideAStream.getNextStream();
opt.isPresent();
opt = somethingWhichMightProvideAStream.getNextStream()) {
try (Stream<Object> stream = opt.get()) { // try-with-resources auto-closes the stream
stream.map(...).filter(...); //etc
}
}
or:
Optional<Stream<Object>> opt;
while ((opt = somethingWhichMightProvideAStream.getNextStream()).isPresent()) {
try (Stream<Object> stream = opt.get()) {
stream.map(...).filter(...); //etc
}
}
The loop declarations in either case are not especially pretty; but this is way shorter (roughly as long as the while/try/finally loop you already have), and harder to use wrong, I think.
(Admittedly, you've still got sequential coupling here: you have to remember to close the stream returned in the optional. Sigh.)
Mixing imperative (while loop, try-finally) and declarative (streams) code together doesn't seem right.
If all of these opeartions are synchronous I guess it could be done in one pipeline (without MyStreamManager at all).
I think that you could think of focusing on moving some logic to object containing method somethingWhichMightProvideAStream because mixing imperative iterator pattern with stream API doesn't look like idiomatic. For example it can return List (or even better a Stream!) of Streams instead of Optional
Think twice if you really need to close this stream. From documentation:
Streams have a BaseStream.close() method and implement AutoCloseable, but nearly all stream instances do not actually need to be closed after use. Generally, only streams whose source is an IO channel (such as those returned by Files.lines(Path, Charset)) will require closing.
The javadoc for Stream states:
Streams have a BaseStream.close() method and implement AutoCloseable, but nearly all stream instances do not actually need to be closed after use. Generally, only streams whose source is an IO channel (such as those returned by Files.lines(Path, Charset)) will require closing. Most streams are backed by collections, arrays, or generating functions, which require no special resource management. (If a stream does require closing, it can be declared as a resource in a try-with-resources statement.)
Therefore, the vast majority of the time one can use Streams in a one-liner, like collection.stream().forEach(System.out::println); but for Files.lines and other resource-backed streams, one must use a try-with-resources statement or else leak resources.
This strikes me as error-prone and unnecessary. As Streams can only be iterated once, it seems to me that there is no a situation where the output of Files.lines should not be closed as soon as it has been iterated, and therefore the implementation should simply call close implicitly at the end of any terminal operation. Am I mistaken?
Yes, this was a deliberate decision. We considered both alternatives.
The operating design principle here is "whoever acquires the resource should release the resource". Files don't auto-close when you read to EOF; we expect files to be closed explicitly by whoever opened them. Streams that are backed by IO resources are the same.
Fortunately, the language provides a mechanism for automating this for you: try-with-resources. Because Stream implements AutoCloseable, you can do:
try (Stream<String> s = Files.lines(...)) {
s.forEach(...);
}
The argument that "it would be really convenient to auto-close so I could write it as a one-liner" is nice, but would mostly be the tail wagging the dog. If you opened a file or other resource, you should also be prepared to close it. Effective and consistent resource management trumps "I want to write this in one line", and we chose not to distort the design just to preserve the one-line-ness.
I have more specific example in addition to #BrianGoetz answer. Don't forget that the Stream has escape-hatch methods like iterator(). Suppose you are doing this:
Iterator<String> iterator = Files.lines(path).iterator();
After that you may call hasNext() and next() several times, then just abandon this iterator: Iterator interface perfectly supports such use. There's no way to explicitly close the Iterator, the only object you can close here is the Stream. So this way it would work perfectly fine:
try(Stream<String> stream = Files.lines(path)) {
Iterator<String> iterator = stream.iterator();
// use iterator in any way you want and abandon it at any moment
} // file is correctly closed here.
In addition if you want "one line write". You can just do this:
Files.readAllLines(source).stream().forEach(...);
You can use it if you are sure that you need entire file and the file is small. Because it isn't a lazy read.
If you're lazy like me and don't mind the "if an exception is raised, it will leave the file handle open" you could wrap the stream in an autoclosing stream, something like this (there may be other ways):
static Stream<String> allLinesCloseAtEnd(String filename) throws IOException {
Stream<String> lines = Files.lines(Paths.get(filename));
Iterator<String> linesIter = lines.iterator();
Iterator it = new Iterator() {
#Override
public boolean hasNext() {
if (!linesIter.hasNext()) {
lines.close(); // auto-close when reach end
return false;
}
return true;
}
#Override
public Object next() {
return linesIter.next();
}
};
return StreamSupport.stream(Spliterators.spliteratorUnknownSize(it, Spliterator.DISTINCT), false);
}
I'd like to duplicate a Java 8 stream so that I can deal with it twice. I can collect as a list and get new streams from that;
// doSomething() returns a stream
List<A> thing = doSomething().collect(toList());
thing.stream()... // do stuff
thing.stream()... // do other stuff
But I kind of think there should be a more efficient/elegant way.
Is there a way to copy the stream without turning it into a collection?
I'm actually working with a stream of Eithers, so want to process the left projection one way before moving onto the right projection and dealing with that another way. Kind of like this (which, so far, I'm forced to use the toList trick with).
List<Either<Pair<A, Throwable>, A>> results = doSomething().collect(toList());
Stream<Pair<A, Throwable>> failures = results.stream().flatMap(either -> either.left());
failures.forEach(failure -> ... );
Stream<A> successes = results.stream().flatMap(either -> either.right());
successes.forEach(success -> ... );
I think your assumption about efficiency is kind of backwards. You get this huge efficiency payback if you're only going to use the data once, because you don't have to store it, and streams give you powerful "loop fusion" optimizations that let you flow the whole data efficiently through the pipeline.
If you want to re-use the same data, then by definition you either have to generate it twice (deterministically) or store it. If it already happens to be in a collection, great; then iterating it twice is cheap.
We did experiment in the design with "forked streams". What we found was that supporting this had real costs; it burdened the common case (use once) at the expense of the uncommon case. The big problem was dealing with "what happens when the two pipelines don't consume data at the same rate." Now you're back to buffering anyway. This was a feature that clearly didn't carry its weight.
If you want to operate on the same data repeatedly, either store it, or structure your operations as Consumers and do the following:
stream()...stuff....forEach(e -> { consumerA(e); consumerB(e); });
You might also look into the RxJava library, as its processing model lends itself better to this kind of "stream forking".
You can use a local variable with a Supplier to set up common parts of the stream pipeline.
From http://winterbe.com/posts/2014/07/31/java8-stream-tutorial-examples/:
Reusing Streams
Java 8 streams cannot be reused. As soon as you call any terminal operation the stream is closed:
Stream<String> stream = Stream.of("d2", "a2", "b1", "b3", "c")
.filter(s -> s.startsWith("a"));
stream.anyMatch(s -> true); // ok
stream.noneMatch(s -> true); // exception
Calling `noneMatch` after `anyMatch` on the same stream results in the following exception:
java.lang.IllegalStateException: stream has already been operated upon or closed
at
java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:229)
at
java.util.stream.ReferencePipeline.noneMatch(ReferencePipeline.java:459)
at com.winterbe.java8.Streams5.test7(Streams5.java:38)
at com.winterbe.java8.Streams5.main(Streams5.java:28)
To overcome this limitation we have to to create a new stream chain for every terminal operation we want to execute, e.g. we could create a stream supplier to construct a new stream with all intermediate operations already set up:
Supplier<Stream<String>> streamSupplier =
() -> Stream.of("d2", "a2", "b1", "b3", "c")
.filter(s -> s.startsWith("a"));
streamSupplier.get().anyMatch(s -> true); // ok
streamSupplier.get().noneMatch(s -> true); // ok
Each call to get() constructs a new stream on which we are save to call the desired terminal operation.
Use a Supplier to produce the stream for each termination operation.
Supplier<Stream<Integer>> streamSupplier = () -> list.stream();
Whenever you need a stream of that collection,
use streamSupplier.get() to get a new stream.
Examples:
streamSupplier.get().anyMatch(predicate);
streamSupplier.get().allMatch(predicate2);
We've implemented a duplicate() method for streams in jOOλ, an Open Source library that we created to improve integration testing for jOOQ. Essentially, you can just write:
Tuple2<Seq<A>, Seq<A>> duplicates = Seq.seq(doSomething()).duplicate();
Internally, there is a buffer storing all values that have been consumed from one stream but not from the other. That's probably as efficient as it gets if your two streams are consumed about at the same rate, and if you can live with the lack of thread-safety.
Here's how the algorithm works:
static <T> Tuple2<Seq<T>, Seq<T>> duplicate(Stream<T> stream) {
final List<T> gap = new LinkedList<>();
final Iterator<T> it = stream.iterator();
#SuppressWarnings("unchecked")
final Iterator<T>[] ahead = new Iterator[] { null };
class Duplicate implements Iterator<T> {
#Override
public boolean hasNext() {
if (ahead[0] == null || ahead[0] == this)
return it.hasNext();
return !gap.isEmpty();
}
#Override
public T next() {
if (ahead[0] == null)
ahead[0] = this;
if (ahead[0] == this) {
T value = it.next();
gap.offer(value);
return value;
}
return gap.poll();
}
}
return tuple(seq(new Duplicate()), seq(new Duplicate()));
}
More source code here
Tuple2 is probably like your Pair type, whereas Seq is Stream with some enhancements.
You could create a stream of runnables (for example):
results.stream()
.flatMap(either -> Stream.<Runnable> of(
() -> failure(either.left()),
() -> success(either.right())))
.forEach(Runnable::run);
Where failure and success are the operations to apply. This will however create quite a few temporary objects and may not be more efficient than starting from a collection and streaming/iterating it twice.
Another way to handle the elements multiple times is to use Stream.peek(Consumer):
doSomething().stream()
.peek(either -> handleFailure(either.left()))
.foreach(either -> handleSuccess(either.right()));
peek(Consumer) can be chained as many times as needed.
doSomething().stream()
.peek(element -> handleFoo(element.foo()))
.peek(element -> handleBar(element.bar()))
.peek(element -> handleBaz(element.baz()))
.foreach(element-> handleQux(element.qux()));
cyclops-react, a library I contribute to, has a static method that will allow you duplicate a Stream (and returns a jOOλ Tuple of Streams).
Stream<Integer> stream = Stream.of(1,2,3);
Tuple2<Stream<Integer>,Stream<Integer>> streams = StreamUtils.duplicate(stream);
See comments, there is performance penalty that will be incurred when using duplicate on an existing Stream. A more performant alternative would be to use Streamable :-
There is also a (lazy) Streamable class that can be constructed from a Stream, Iterable or Array and replayed multiple times.
Streamable<Integer> streamable = Streamable.of(1,2,3);
streamable.stream().forEach(System.out::println);
streamable.stream().forEach(System.out::println);
AsStreamable.synchronizedFromStream(stream) - can be used to create a Streamable that will lazily populate it's backing collection, in a way such that can be shared across threads. Streamable.fromStream(stream) will not incur any synchronization overhead.
For this particular problem you can use also partitioning. Something like
// Partition Eighters into left and right
List<Either<Pair<A, Throwable>, A>> results = doSomething();
Map<Boolean, Object> passingFailing = results.collect(Collectors.partitioningBy(s -> s.isLeft()));
passingFailing.get(true) <- here will be all passing (left values)
passingFailing.get(false) <- here will be all failing (right values)
We can make use of Stream Builder at the time of reading or iterating a stream.
Here's the document of Stream Builder.
https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.Builder.html
Use case
Let's say we have employee stream and we need to use this stream to write employee data in excel file and then update the employee collection/table
[This is just use case to show the use of Stream Builder]:
Stream.Builder<Employee> builder = Stream.builder();
employee.forEach( emp -> {
//store employee data to excel file
// and use the same object to build the stream.
builder.add(emp);
});
//Now this stream can be used to update the employee collection
Stream<Employee> newStream = builder.build();
I had a similar problem, and could think of three different intermediate structures from which to create a copy of the stream: a List, an array and a Stream.Builder. I wrote a little benchmark program, which suggested that from a performance point of view the List was about 30% slower than the other two which were fairly similar.
The only drawback of converting to an array is that it is tricky if your element type is a generic type (which in my case it was); therefore I prefer to use a Stream.Builder.
I ended up writing a little function that creates a Collector:
private static <T> Collector<T, Stream.Builder<T>, Stream<T>> copyCollector()
{
return Collector.of(Stream::builder, Stream.Builder::add, (b1, b2) -> {
b2.build().forEach(b1);
return b1;
}, Stream.Builder::build);
}
I can then make a copy of any stream str by doing str.collect(copyCollector()) which feels quite in keeping with the idiomatic usage of streams.
When I execute this code which opens a lot of files during a stream pipeline:
public static void main(String[] args) throws IOException {
Files.find(Paths.get("JAVA_DOCS_DIR/docs/api/"),
100, (path, attr) -> path.toString().endsWith(".html"))
.map(file -> runtimizeException(() -> Files.lines(file, StandardCharsets.ISO_8859_1)))
.map(Stream::count)
.forEachOrdered(System.out::println);
}
I get an exception:
java.nio.file.FileSystemException: /long/file/name: Too many open files
The problem is that Stream.count does not close the stream when it is done traversing it. But I don't see why it shouldn't, given that it is a terminal operation. The same holds for other terminal operations such as reduce and forEach. flatMap on the other hand closes the streams it consists of.
The documentation tells me to use a try-with-resouces-statement to close streams if necessary. In my case I could replace the count line with something like this:
.map(s -> { long c = s.count(); s.close(); return c; } )
But that is noisy and ugly and could be a real inconvenience in some cases with big, complex pipelines.
So my questions are the following:
Why were the streams not designed so that terminal operations close the streams they are working on? That would make them work better with IO streams.
What is the best solution for closing IO streams in pipelines?
runtimizeException is a method that wraps checked exception in RuntimeExceptions.
There are two issues here: handling of checked exceptions such as IOException, and timely closing of resources.
None of the predefined functional interfaces declare any checked exceptions, which means that they have to be handled within the lambda, or wrapped in an unchecked exception and rethrown. It looks like your runtimizeException function does that. You probably also had to declare your own functional interface for it. As you've probably discovered, this is a pain.
On the closing of resources like files, there was some investigation of having streams be closed automatically when the end of the stream was reached. This would be convenient, but it doesn't deal with closing when an exception is thrown. There's no magic do-the-right-thing mechanism for this in streams.
We're left with the standard Java techniques of dealing with resource closure, namely the try-with-resources construct introduced in Java 7. TWR really wants to have resources be closed at the same level in the call stack as they were opened. The principle of "whoever opens it has to close it" applies. TWR also deals with exception handling, which usually makes it convenient to deal with exception handling and resource closing in the same place.
In this example, the stream is somewhat unusual in that it maps a Stream<Path> to a Stream<Stream<String>>. These nested streams are the ones that aren't closed, resulting in the eventual exception when the system runs out of open file descriptors. What makes this difficult is that files are opened by one stream operation and then passed downstream; this makes it impossible to use TWR.
An alternative approach to structuring this pipeline is as follows.
The Files.lines call is the one that opens the file, so this has to be the resource in the TWR statement. The processing of this file is where (some) IOExceptions get thrown, so we can do the exception wrapping in the same TWR statement. This suggests having a simple function that maps the path to a line count, while handling resource closing and exception wrapping:
long lineCount(Path path) {
try (Stream<String> s = Files.lines(path, StandardCharsets.ISO_8859_1)) {
return s.count();
} catch (IOException ioe) {
throw new UncheckedIOException(ioe);
}
}
Once you have this helper function, the main pipeline looks like this:
Files.find(Paths.get("JAVA_DOCS_DIR/docs/api/"),
100, (path, attr) -> path.toString().endsWith(".html"))
.mapToLong(this::lineCount)
.forEachOrdered(System.out::println);
It is possible to create a utility method that reliably closes streams in the middle of a pipeline.
This makes sure that each resource is closed with a try-with-resource-statement but avoids the need for a custom utility method, and is much less verbose than writing the try-statement directly in the lambda.
With this method the pipeline from the question looks like this:
Files.find(Paths.get("Java_8_API_docs/docs/api"), 100,
(path, attr) -> path.toString().endsWith(".html"))
.map(file -> applyAndClose(
() -> Files.lines(file, StandardCharsets.ISO_8859_1),
Stream::count))
.forEachOrdered(System.out::println);
The implementation looks like this:
/**
* Applies a function to a resource and closes it afterwards.
* #param sup Supplier of the resource that should be closed
* #param op operation that should be performed on the resource before it is closed
* #return The result of calling op.apply on the resource
*/
private static <A extends AutoCloseable, B> B applyAndClose(Callable<A> sup, Function<A, B> op) {
try (A res = sup.call()) {
return op.apply(res);
} catch (RuntimeException exc) {
throw exc;
} catch (Exception exc) {
throw new RuntimeException("Wrapped in applyAndClose", exc);
}
}
(Since resources that need to be closed often also throw exceptions when they are allocated non-runtime exceptions are wrapped in runtime exceptions, avoiding the need for a separate method that does that.)
You will need to call close() in this stream operation, which will cause all underlying close handlers to be called.
Better yet, would be to wrap your whole statement in a try-with-resources block, as then it will automagically call the close handler.
This may not be possibility in your situation, this means that you will need to handle it yourself in some operation. Your current methods may not be suited for streams at all.
It seems like you indeed need to do it in your second map() operation.
The close of the interface AutoCloseable should only be called once. See the documentation of AutoCloseable for more information.
If final operations would close the stream automatically, close might be invoked twice. Take a look at the following example:
try (Stream<String> lines = Files.lines(path)) {
lines.count();
}
As it is defined right now, the close method on lines will be invoked exactly once. Regardless whether the final operation completes normally, or the operation is aborted with in IOException. If the stream would instead be closed implicitly in the final operation, the close method would be called once, if an IOException occurs, and twice if the operation completes successfully.
Here is an alternative which uses another method from Files and will avoid leaking file descriptors:
Files.find(Paths.get("JAVA_DOCS_DIR/docs/api/"),
100, (path, attr) -> path.toString().endsWith(".html"))
.map(file -> runtimizeException(() -> Files.readAllLines(file, StandardCharsets.ISO_8859_1).size())
.forEachOrdered(System.out::println);
Unlike your version, it will return an int instead of a long for the line count; but you don't have files with that many lines, do you?