Stream spliterator implementation detail - java

While looking into the source code of WrappingSpliterator::trySplit, I was misled by its implementation:
@Override
public Spliterator<P_OUT> trySplit() {
if (isParallel && buffer == null && !finished) {
init();
Spliterator<P_IN> split = spliterator.trySplit();
return (split == null) ? null : wrap(split);
}
else
return null;
}
If you are wondering why this matters, it is because, for example, this:
Arrays.asList(1,2,3,4,5)
.stream()
.filter(x -> x != 1)
.spliterator();
is using it. In my understanding the addition of any intermediate operation to a stream, will cause that code to be triggered.
Basically this method says that unless the stream is parallel, treat this Spliterator as one that can not be split, at all. And this matters to me. In one of my methods (this is how I got to that code), I get a Stream as input and "parse" it in smaller pieces, manually, with trySplit. You can think for example that I am trying to do a findLast from a Stream.
And this is where my desire to split in smaller chunks is nuked, because as soon as I do:
Spliterator<T> sp = stream.spliterator();
Spliterator<T> prefixSplit = sp.trySplit();
I find out that prefixSplit is null, meaning that I basically can't do anything other than consume the entire sp with forEachRemaining.
This is a bit weird. Maybe it makes some sense when filter is present, because in that case the only way (in my understanding) a Spliterator could be returned is by using some kind of buffer, maybe even with a predefined size (much like Files::lines). But why this:
Arrays.asList(1,2,3,4)
.stream()
.sorted()
.spliterator()
.trySplit();
returns null is something I don't understand. sorted is a stateful operation that buffers the elements anyway, without actually reducing or increasing their initial number, so at least theoretically this can return something other than null...

When you invoke spliterator() on a Stream, there are only two possible outcomes with the current implementation.
If the stream has no intermediate operations you’ll get the source spliterator that has been used to construct the stream and whose splitting capability is entirely independent from the stream’s parallel state, as in fact, the spliterator doesn’t know anything about the stream.
Otherwise, you’ll get a WrappingSpliterator, which will encapsulate a source Spliterator and a pipeline state, expressed as PipelineHelper. This combination of Spliterator and PipelineHelper does not need to work in parallel and, in fact, would not work in case of distinct(), as the WrappingSpliterator will get an entirely different combination, depending on whether the Stream is parallel or not.
For stateless intermediate operations, it would not make a difference though. But, as discussed in “Why the tryAdvance of stream.spliterator() may accumulate items into a buffer?”, the WrappingSpliterator is a “one-fits-all implementation” that doesn’t consider the actual nature of the pipeline, so its limitations are the superset of all possible limitations of all supported pipeline stages. So the existence of one scenario that wouldn’t work when ignoring the parallel flag is enough to forbid splitting for all pipelines when not being parallel.
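A small experiment can make the two outcomes concrete. This is only a sketch, assuming the OpenJDK implementation quoted in the question; the class and method names are made up for illustration, and the observed behavior is an implementation detail rather than a documented guarantee.

```java
import java.util.Arrays;
import java.util.List;

public class SplitDemo {
    static final List<Integer> SOURCE = Arrays.asList(1, 2, 3, 4, 5);

    // No intermediate operation: stream.spliterator() hands back the source
    // spliterator, which splits regardless of the stream's parallel flag.
    static boolean sourceSplits() {
        return SOURCE.stream().spliterator().trySplit() != null;
    }

    // Sequential pipeline with filter: the wrapping spliterator refuses to split.
    static boolean sequentialFilteredSplits() {
        return SOURCE.stream().filter(x -> x != 1).spliterator().trySplit() != null;
    }

    // Same pipeline, but parallel: the wrapper delegates to the source's trySplit.
    static boolean parallelFilteredSplits() {
        return SOURCE.stream().parallel().filter(x -> x != 1).spliterator().trySplit() != null;
    }

    public static void main(String[] args) {
        System.out.println("source splits:              " + sourceSplits());
        System.out.println("sequential filtered splits: " + sequentialFilteredSplits());
        System.out.println("parallel filtered splits:   " + parallelFilteredSplits());
    }
}
```

With the trySplit implementation shown in the question, only the sequential filtered case should report false, so marking the stream as parallel before asking for its spliterator is one way to get a splittable wrapper.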

Related

Are there any direct or indirect performance benefits of java 8 sequential streams?

While going through articles on sequential streams, I began to wonder: are there any performance benefits of using sequential streams over traditional for loops, or are streams just sequential syntactic sugar with additional performance overhead?
Consider the example below, where I cannot see any performance benefit of using sequential streams:
Stream.of("d2", "a2", "b1", "b3", "c")
.filter(s -> {
System.out.println("filter: " + s);
return s.startsWith("a");
})
.forEach(s -> System.out.println("forEach: " + s));
Using classic java:
String[] strings = {"d2", "a2", "b1", "b3", "c"};
for (String s : strings)
{
System.out.println("Before filtering: " + s);
if (s.startsWith("a"))
{
System.out.println("After Filtering: " + s);
}
}
The point here is that, with streams, processing of a2 starts only after all the operations on d2 are complete. (Earlier I thought that while d2 was being processed by forEach, filter would already have started operating on a2, but that is not the case as per this article: https://winterbe.com/posts/2014/07/31/java8-stream-tutorial-examples/) The same holds for classic Java, so what should be the motivation for using streams beyond an "expressive" and "elegant" coding style? I know there are performance overheads for the compiler while handling streams; does anyone know of, or have experience with, any performance benefits of using sequential streams?
First of all, letting special cases, like omitting a redundant sorted operation or returning the known size on count(), aside, the time complexity of an operation usually doesn’t change, so all differences in execution timing are usually about a constant offset or a (rather small) factor, not fundamental changes.
You can always write a manual loop doing basically the same as the Stream implementation does internally. So, internal optimizations, as mentioned by this answer could always get dismissed with “but I could do the same in my loop”.
But… when we compare “the Stream” with “a loop”, is it really reasonable to assume that all manual loops are written in the most efficient manner for the particular use case? A particular Stream implementation will apply its optimizations to all use cases where applicable, regardless of the experience level of the calling code’s author. I’ve already seen loops missing the opportunity to short-circuit or performing redundant operations not needed for a particular use case.
Another aspect is the information needed to perform certain optimizations. The Stream API is built around the Spliterator interface which can provide characteristics of the source data, e.g. it allows to find out whether the data has a meaningful order needed to be retained for certain operations or whether it is already pre-sorted, to the natural order or with a particular comparator. It may also provide the expected number of elements, as an estimate or exact, when predictable.
A method receiving an arbitrary Collection, to implement an algorithm with an ordinary loop, would have a hard time to find out, whether there are such characteristics. A List implies a meaningful order, whereas a Set usually does not, unless it’s a SortedSet or a LinkedHashSet, whereas the latter is a particular implementation class, rather than an interface. So testing against all known constellations may still miss 3rd party implementations with special contracts not expressible by a predefined interface.
Of course, since Java 8, you could acquire a Spliterator yourself, to examine these characteristics, but that would change your loop solution to a non-trivial thing and also imply repeating the work already done with the Stream API.
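To illustrate what "acquiring a Spliterator yourself" looks like, here is a minimal sketch that queries a few characteristics for standard JDK collections. The class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Spliterator;
import java.util.TreeSet;

public class CharacteristicsDemo {
    // Summarize a few characteristics reported by the collection's spliterator.
    static String describe(Collection<?> c) {
        Spliterator<?> sp = c.spliterator();
        return "ordered=" + sp.hasCharacteristics(Spliterator.ORDERED)
                + " sorted=" + sp.hasCharacteristics(Spliterator.SORTED)
                + " sized=" + sp.hasCharacteristics(Spliterator.SIZED);
    }

    public static void main(String[] args) {
        Collection<Integer> data = Arrays.asList(3, 1, 2);
        System.out.println("ArrayList: " + describe(new ArrayList<>(data)));
        System.out.println("TreeSet:   " + describe(new TreeSet<>(data)));
        System.out.println("HashSet:   " + describe(new HashSet<>(data)));
    }
}
```

A loop-based algorithm would have to branch on this information itself, which is the extra work the Stream implementation already does.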
There’s also another interesting difference between Spliterator based Stream solutions and conventional loops, using an Iterator when iterating over something other than an array. The pattern is to invoke hasNext on the iterator, followed by next, unless hasNext returned false. But the contract of Iterator does not mandate this pattern. A caller may invoke next without hasNext, even multiple times, when it is known to succeed (e.g. you do already know the collection’s size). Also, a caller may invoke hasNext multiple times without next in case the caller did not remember the result of the previous call.
As a consequence, Iterator implementations have to perform redundant operations, e.g. the loop condition is effectively checked twice, once in hasNext, to return a boolean, and once in next, to throw a NoSuchElementException when not fulfilled. Often, the hasNext has to perform the actual traversal operation and store the result into the Iterator instance, to ensure that the result stays valid until the subsequent next call. The next operation in turn, has to check whether such a traversal did already happen or whether it has to perform the operation itself. In practice, the hot spot optimizer may or may not eliminate the overhead imposed by the Iterator design.
In contrast, the Spliterator has a single traversal method, boolean tryAdvance(Consumer<? super T> action), which performs the actual operation and returns whether there was an element. This simplifies the loop logic significantly. There’s even the void forEachRemaining(Consumer<? super T> action) for non-short-circuiting operations, which allows the actual implementation to provide the entire looping logic. E.g., in case of ArrayList the operation will end up at a simple counting loop over the indices, performing a plain array access.
You may compare such design with, e.g. readLine() of BufferedReader, which performs the operation and returns null after the last element, or find() of a regex Matcher, which performs the search, updates the matcher’s state and returns the success state.
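The three traversal styles discussed above can be put side by side. This is a sketch with made-up method names; it shows the shape of each loop and makes no claim about their relative speed.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;

public class TraversalDemo {
    // Iterator protocol: two calls per element (hasNext, then next).
    static int sumWithIterator(List<Integer> list) {
        int sum = 0;
        Iterator<Integer> it = list.iterator();
        while (it.hasNext()) {
            sum += it.next();
        }
        return sum;
    }

    // Spliterator protocol: one call per element, which both advances and reports success.
    static int sumWithTryAdvance(List<Integer> list) {
        int[] sum = {0};
        Spliterator<Integer> sp = list.spliterator();
        while (sp.tryAdvance(x -> sum[0] += x)) {
            // nothing to do here, the action already ran
        }
        return sum[0];
    }

    // Bulk traversal: the spliterator implementation owns the whole loop.
    static int sumWithForEachRemaining(List<Integer> list) {
        int[] sum = {0};
        list.spliterator().forEachRemaining(x -> sum[0] += x);
        return sum[0];
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4);
        System.out.println(sumWithIterator(numbers));
        System.out.println(sumWithTryAdvance(numbers));
        System.out.println(sumWithForEachRemaining(numbers));
    }
}
```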
But the impact of such design differences is hard to predict in an environment with an optimizer designed specifically to identify and eliminate redundant operations. The takeaway is that there is some potential for Stream based solutions to turn out to be even faster, while it depends on a lot of factors whether it will ever materialize in a particular scenario. As said at the beginning, it’s usually not changing the overall time complexity, which would be more important to worry about.
Streams might have (and already do have) some tricks under the hood that a traditional for-loop does not. For example:
Arrays.asList(1,2,3)
.stream()
.map(x -> x + 1)
.count();
Since Java 9, map will be skipped here, because the count can be computed directly from the source and the mapped values are never needed.
Or internal implementation might check if a certain data structure is already sorted, for example:
someSource.stream()
.sorted()
....
If someSource is already sorted (like a TreeSet), sorted would be a no-op in such a case. There are many such optimizations done internally, and there is room for even more that may be added in the future.
If you were to use streams still, you could have created a stream out of your array using Arrays.stream and used a forEach as:
Arrays.stream(strings).forEach(s -> {
System.out.println("Before filtering: " + s);
if (s.startsWith("a")) {
System.out.println("After Filtering: " + s);
}
});
On the performance side, since you want to traverse the entire array anyway, there is no specific benefit from using streams over loops. More about this has been discussed in "In Java, what are the advantages of streams over loops?" and other linked questions.
If using a stream, we can use parallel(), as below:
Stream<String> stringStream = Stream.of("d2", "a2", "b1", "b3", "c")
.parallel()
.filter(s -> s.startsWith("d"));
It can be faster because your computer will normally be able to run more than one thread at a time, though the gain depends on the workload and the number of cores.
A test:
@Test
public void forEachVsStreamVsParallelStream_Test() {
IntStream range = IntStream.range(Integer.MIN_VALUE, Integer.MAX_VALUE);
StopWatch stopWatch = new StopWatch();
stopWatch.start("for each");
int forEachResult = 0;
for (int i = Integer.MIN_VALUE; i < Integer.MAX_VALUE; i++) {
if (i % 15 == 0)
forEachResult++;
}
stopWatch.stop();
stopWatch.start("stream");
long streamResult = range
.filter(v -> (v % 15 == 0))
.count();
stopWatch.stop();
range = IntStream.range(Integer.MIN_VALUE, Integer.MAX_VALUE);
stopWatch.start("parallel stream");
long parallelStreamResult = range
.parallel()
.filter(v -> (v % 15 == 0))
.count();
stopWatch.stop();
System.out.println(String.format("forEachResult: %s%s" +
"parallelStreamResult: %s%s" +
"streamResult: %s%s",
forEachResult, System.lineSeparator(),
parallelStreamResult, System.lineSeparator(),
streamResult, System.lineSeparator()));
System.out.println("prettyPrint: " + stopWatch.prettyPrint());
System.out.println("Time Elapsed: " + stopWatch.getTotalTimeSeconds());
}

Java Stream stateful findFirst

The below method is part of a weighted random selection algorithm for picking songs.
I would like to convert the below method to use streams, to decide if it would be clearer / preferable. I am not certain it is possible at all, since calculation is a stateful operation, dependent on position in the list.
public Song songForTicketNumber(long ticket)
{
if(ticket<0) return null;
long remaining = ticket;
for(Song s : allSongs) // allSongs is ordered list
{
remaining -= s.numTickets; // numTickets is a long and never negative
if(remaining<0)
return s;
}
return null;
}
More formally: If n is the sum of all Song::numTickets for each Song object in allSongs, then for any integer 0 through n-1, the above method should return a song in the list. The number of integers which will return a specific Song object x, would be determined by x.numTickets. The selection criteria for a specific song is a range of consecutive integers determined by both its numTickets property and the numTickets property for each item in the list to its left. As currently written, anything outside the range would return null.
Note: Out of range behavior can be modified to accommodate Streams (other than returning null)
The efficiency of a Stream compared to a basic for or for-each loop is a matter of circumstance. In yours, it's highly likely that a Stream would be less efficient than your current code for, among others, these major reasons:
1. Your function is stateful as you mentioned. Maintaining a state with this method probably means finagling some kind of anonymous implementation of a BinaryOperator to use with Stream.reduce, and it's going to turn out bulkier and more confusing to read than your current code.
2. You're short circuiting in your current loop, and no Stream operation will reflect that kind of efficiency, especially considering this in combination with #1.
3. Your collection is ordered, which means the stream will iterate over elements in a manner very similar to your existing loop anyway. Depending on the size of your collection, you might get some efficiency out of parallelStream, but having to maintain the order in this case will mean a less efficient stream.
The only real benefit you could get from switching to a Stream is the difference in memory consumption (You could keep allSongs out of memory and let Stream handle it in a more memory-efficient way), which doesn't seem applicable here.
In conclusion, since the Stream operations would be even more complex to write and would probably be harmful, if anything, to your efficiency, I would recommend that you do not pursue this change.
That being said, I personally can't come up with a Stream based solution to actually answer your question of how to convert this work to a Stream. Again, it would be something complex and strange involving a reducer or similar... (I'll delete this answer if this is insufficient.)
Java streams do have the facility to short circuit evaluation, see for example the documentation for findFirst(). Having said that, decrementing and checking remaining, requires state mutation which is not great. Not great, but doable:
public Optional<Song> songForTicketNumber(long ticket, Stream<Song> songs) {
if (ticket < 0) return Optional.empty();
AtomicLong remaining = new AtomicLong(ticket);
return songs.filter(song -> decrementAndCheck(song, remaining)).findFirst();
}
private boolean decrementAndCheck(Song song, AtomicLong total) {
total.addAndGet(-song.numTickets);
return total.get() < 0;
}
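As far as I can tell, the only advantage of this approach is that you could switch to parallel streams if you wanted to, although the stateful predicate would then need rethinking, because a parallel stream does not guarantee the order in which elements reach the filter.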
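If the mutable counter is a concern, one alternative (not the approach taken in the answers above) is to precompute cumulative ticket totals and then search for the first total that exceeds the ticket. The sketch below assumes a minimal Song class with a numTickets field, as in the question; the class name TicketLookup and the name field are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.IntStream;

public class TicketLookup {
    // Minimal stand-in for the question's Song type (the name field is an assumption).
    static final class Song {
        final String name;
        final long numTickets;

        Song(String name, long numTickets) {
            this.name = name;
            this.numTickets = numTickets;
        }
    }

    static Optional<Song> songForTicketNumber(long ticket, List<Song> songs) {
        if (ticket < 0) {
            return Optional.empty();
        }
        // cumulative[i] = total tickets of songs 0..i
        long[] cumulative = songs.stream().mapToLong(s -> s.numTickets).toArray();
        Arrays.parallelPrefix(cumulative, Long::sum);
        // The predicate only reads the array, so it is stateless.
        return IntStream.range(0, cumulative.length)
                .filter(i -> cumulative[i] > ticket)
                .mapToObj(songs::get)
                .findFirst();
    }

    public static void main(String[] args) {
        List<Song> songs = Arrays.asList(new Song("A", 2), new Song("B", 3), new Song("C", 1));
        for (long ticket = -1; ticket <= 6; ticket++) {
            String result = songForTicketNumber(ticket, songs).map(s -> s.name).orElse("none");
            System.out.println(ticket + " -> " + result);
        }
    }
}
```

The trade-off is an extra array and a full pass to build the totals, so the original loop remains the cheaper option for a single lookup. If many tickets are drawn against the same list, the totals could be computed once and reused.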

Invoking .map() on an infinite stream?

According to Javadocs for SE 8 Stream.map() does the following
Returns a stream consisting of the results of applying the given function to the elements of this stream.
However, a book I'm reading (Learning Network Programming with Java, Richard M. Reese) on networking implements roughly the following code snippet in an echo server.
Supplier<String> inputLine = () -> {
try {
return br.readLine();
} catch(IOException e) {
e.printStackTrace();
return null;
}
};
Stream.generate(inputLine).map((msg) -> {
System.out.println("Recieved: " + (msg == null ? "end of stream" : msg));
out.println("echo: " + msg);
return msg;
}).allMatch((msg) -> msg != null);
This is supposed to be a functional way to accomplish getting user input to print to the socket input stream. It works as intended, but I don't quite understand how. Is it because map knows the stream is infinite so it lazily executes as new stream tokens become available? It seems like adding something to a collection currently being iterated over by map is a little black magick. Could someone please help me understand what is going on behind the scenes?
Here is how I restated this in order to avoid the confusing map usage. I believe the author was trying to avoid an infinite loop since you can't break out of a forEach.
Stream.generate(inputLine).allMatch((msg) -> {
boolean alive = msg != null;
System.out.println("Recieved: " + (alive ? msg : "end of stream"));
out.println("echo: " + msg);
return alive;
});
Streams are lazy. Think of them as workers in a chain that pass buckets to each other. The laziness is in the fact that they will only ask the worker behind them for the next bucket if the worker in front of them asks them for it.
So it's best to think about this as allMatch - being a final action, thus eager - asking the map stream for the next item, and the map stream asking the generate stream for the next item, and the generate stream going to its supplier, and providing that item as soon as it arrives.
It stops when allMatch stops asking for items. And it does so when it knows the answer. Are all items in this stream not null? As soon as the allMatch receives an item that is null, it knows the answer is false, and will finish and not ask for any more items. Because the stream is infinite, it will not stop otherwise.
So you have two factors causing this to work the way it works - one is allMatch asking eagerly for the next item (as long as the previous ones weren't null), and the generate stream that - in order to supply that next item - may need to wait for the supplier that waits for the user to send more input.
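The pull-driven behavior described above can be observed by counting how often the supplier is asked for an element. In this sketch a fixed list stands in for the socket input (an assumption made so the example is self-contained), and the class name is made up.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import java.util.stream.Stream;

public class LazyPullDemo {
    // Returns how many elements were requested from the supplier before allMatch stopped.
    static int elementsPulled() {
        // The null plays the role of readLine() signalling end of input.
        Iterator<String> lines = Arrays.asList("a", "b", null, "never read").iterator();
        AtomicInteger pulled = new AtomicInteger();
        Supplier<String> supplier = () -> {
            pulled.incrementAndGet();
            return lines.next();
        };
        Stream.generate(supplier)
                .map(msg -> msg == null ? null : msg.toUpperCase())
                .allMatch(msg -> msg != null);
        return pulled.get();
    }

    public static void main(String[] args) {
        System.out.println("elements pulled: " + elementsPulled());
    }
}
```

The fourth element is never requested, because allMatch already knows the answer once it has seen the null.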
But it should be said that map shouldn't have been used here. There should not be side effects in map - it should be used for mapping an item of one type to an item of another type. I think this example was used only as a study aid. The much simpler and straightforward way would be to use BufferedReader's method lines() which gives you a finite Stream of the lines coming from the buffered reader.
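As a rough sketch of the lines() alternative mentioned above: the side effect moves into forEach, and the stream ends by itself when the reader is exhausted. The class and method names are hypothetical, and a StringReader replaces the socket so the example can run on its own.

```java
import java.io.BufferedReader;
import java.io.PrintWriter;
import java.io.StringReader;
import java.io.StringWriter;

public class EchoLines {
    // Echo every line from the reader to the writer; lines() is finite, so no null check is needed.
    static void echo(BufferedReader in, PrintWriter out) {
        in.lines().forEach(line -> out.println("echo: " + line));
    }

    // Helper that runs echo against in-memory input and returns what was written.
    static String echoToString(String input) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        echo(new BufferedReader(new StringReader(input)), out);
        out.flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(echoToString("hello\nworld\n"));
    }
}
```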
Yes - Streams are set up lazily until and unless you perform a terminal operation (final action) on the Stream. Or simpler:
For as long as the operations on your stream return another Stream, you do not have a terminal operation, and you keep on chaining until you have something returning anything other than a Stream, including void.
This makes sense, as to be able to return anything other than a Stream, the operations earlier in your stream will need to be evaluated to actually be able to provide the data.
In this case, and as per documentation, allMatch returns a boolean, and thus final execution of your stream is required to calculate that boolean. This is the point also where you provide a Predicate limiting your resulting Stream.
Also note that in the documentation it states:
This is a short-circuiting terminal operation.
Follow that link for more information on those terminal operations, but a terminal operation basically means that it will actually execute the operation. Additionally, the limiting of your infinite stream is the 'short-circuiting' aspect of that method.
Here are the two most relevant sentences of the java-stream documentation. The snippet you provided is a perfect example of these working together:
Stream::generate(Supplier<T> s) says that it returns:
Returns an infinite sequential unordered stream where each element is generated by the provided Supplier.
The third bullet of the Stream package documentation:
Laziness-seeking. Many stream operations, such as filtering, mapping, or duplicate removal, can be implemented lazily, exposing opportunities for optimization. For example, "find the first String with three consecutive vowels" need not examine all the input strings. Stream operations are divided into intermediate (Stream-producing) operations and terminal (value- or side-effect-producing) operations. Intermediate operations are always lazy.
In short, the generated stream waits for further elements until the terminal operation completes. As long as the supplied Supplier<T> keeps delivering elements and the terminal operation does not short-circuit, the stream pipeline continues.
As an example, if you provide the following Supplier, the execution has no chance to stop and will continue infinitely:
Supplier<String> inputLine = () -> {
return "Hello world";
};

Java 8 Stream: Iterating, Processing and Count

Is it ok to process and count processed data in such way?
long count = userDao.findApprovedWithoutData().parallelStream().filter(u -> {
Data d = dataDao.findInfoByEmail(u.getEmail());
boolean ret = false;
if (d != null) {
String result = "";
result += getFieldValue(d::getName, ". \n");
result += getFieldValue(d::getOrganization, ". \n");
result += getFieldValue(d::getAddress, ". \n");
if(!result.isEmpty()) {
u.setData(d.getInfo());
userDao.update(u);
ret = true;
}
}
return ret;
}).count();
So, in short: iterate over not complete records, update if data is present and count this number of records?
IMHO this is bad code, because:
The filter predicate has (quite significant) side effects
Predicates should not have side effects (just like getters shouldn't). It's unexpected, and that makes it bad.
The filter predicate is very inefficient
Each execution of the predicate causes a large chain of queries to fire, which makes this code not scalable.
At first glance, the main purpose seems to be getting a count, but really that's a minor (dispensable) bit of info
Good code makes it obvious what is going on (unlike this code)
You should change the code to use a (fairly simple) single update query (that employs a join) and get the count from the "number of rows updated" info in the result from the persistence API.
It depends on your definition of "process". I cannot give you a clear yes or no, because it is hard to conclude without understanding your code and how it is implemented.
You are using a parallel stream; what happens there is that the Java runtime splits the stream into sub-streams based on the number of available threads in ForkJoinPool's common pool.
When using parallelism you need to be careful for possible side effects:
Interference (Lambda expression in a stream should not interfere)
Lambda expressions in stream operations should not interfere.
Interference occurs when the source of a stream is modified while a
pipeline processes the stream.
Stateful lambda expressions
Avoid using stateful lambda expressions as parameters in stream
operations. A stateful lambda expression is one whose result depends
on any state that might change during the execution of a pipeline.
Looking at your question and applying the above points to it.
Non-interference > strongly states that Lambda expressions should not interfere with the source of stream (unless stream source is concurrent) during pipeline operation because it can cause:
Exception (i.e. ConcurrentModificationException)
Incorrect Answer
Nonconformant behaviour
The exception is well-behaved streams, where the modification takes place during an intermediate operation (e.g. filter); see the linked documentation for more.
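For a concrete picture of the first consequence in the list, the sketch below modifies an ArrayList while a stream over it is running. It is an illustration of interference in general, with invented names, and is not the code from the question, where the update goes to a DAO rather than to the source list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

public class InterferenceDemo {
    // Returns true if modifying the source during the terminal operation was detected.
    static boolean modifyingSourceFails() {
        List<Integer> source = new ArrayList<>(Arrays.asList(1, 2, 3));
        try {
            // The lambda adds to the very list the stream is traversing.
            source.stream().forEach(x -> source.add(x * 10));
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("interference detected: " + modifyingSourceFails());
    }
}
```

ArrayList detects the change on a best-effort basis. Other sources may not detect it at all, which is where the incorrect answers and nonconformant behavior come from.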
Your lambda expression does interfere with the source of the stream, which is not advised, but the interference is within an intermediate operation, so everything comes down to whether the stream is well-behaved or not. You might therefore consider rethinking your lambda expression when it comes to interference. It might also come down to how you update the source of the stream via userDao.update, which is not clear from your question.
Stateful Lambda Expression > Your Lambda expression does not seem to be stateful and that is because the result of Lambda depends on value/s that do not change during the execution of the pipeline. So this does not apply to your case.
I advise you go through the documentation of Java 8 Stream as well as this blog which explains Java 8 Stream really well with examples.

Should I return a Collection or a Stream?

Suppose I have a method that returns a read-only view into a member list:
class Team {
private List<Player> players = new ArrayList<>();
// ...
public List<Player> getPlayers() {
return Collections.unmodifiableList(players);
}
}
Further suppose that all the client does is iterate over the list once, immediately. Maybe to put the players into a JList or something. The client does not store a reference to the list for later inspection!
Given this common scenario, should I return a stream instead?
public Stream<Player> getPlayers() {
return players.stream();
}
Or is returning a stream non-idiomatic in Java? Were streams designed to always be "terminated" inside the same expression they were created in?
The answer is, as always, "it depends". It depends on how big the returned collection will be. It depends on whether the result changes over time, and how important consistency of the returned result is. And it depends very much on how the user is likely to use the answer.
First, note that you can always get a Collection from a Stream, and vice versa:
// If API returns Collection, convert with stream()
getFoo().stream()...
// If API returns Stream, use collect()
Collection<T> c = getFooStream().collect(toList());
So the question is, which is more useful to your callers.
If your result might be infinite, there's only one choice: Stream.
If your result might be very large, you probably prefer Stream, since there may not be any value in materializing it all at once, and doing so could create significant heap pressure.
If all the caller is going to do is iterate through it (search, filter, aggregate), you should prefer Stream, since Stream has these built-in already and there's no need to materialize a collection (especially if the user might not process the whole result.) This is a very common case.
Even if you know that the user will iterate it multiple times or otherwise keep it around, you still may want to return a Stream instead, for the simple fact that whatever Collection you choose to put it in (e.g., ArrayList) may not be the form they want, and then the caller has to copy it anyway. If you return a Stream, they can do collect(toCollection(factory)) and get it in exactly the form they want.
The above "prefer Stream" cases mostly derive from the fact that Stream is more flexible; you can late-bind to how you use it without incurring the costs and constraints of materializing it to a Collection.
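As a small illustration of that late binding, the same stream-returning method can feed whichever collection each caller prefers. The names() method and the class name are placeholders for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LateBindingDemo {
    // Hypothetical stream-returning API.
    static Stream<String> names() {
        return Stream.of("carol", "alice", "bob", "alice");
    }

    // One caller wants a sorted, duplicate-free view.
    static TreeSet<String> asSortedSet() {
        return names().collect(Collectors.toCollection(TreeSet::new));
    }

    // Another caller wants a mutable list that keeps duplicates and encounter order.
    static List<String> asList() {
        return names().collect(Collectors.toCollection(ArrayList::new));
    }

    public static void main(String[] args) {
        System.out.println(asSortedSet());
        System.out.println(asList());
    }
}
```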
The one case where you must return a Collection is when there are strong consistency requirements, and you have to produce a consistent snapshot of a moving target. Then, you will want to put the elements into a collection that will not change.
So I would say that most of the time, Stream is the right answer — it is more flexible, it doesn't impose usually-unnecessary materialization costs, and can be easily turned into the Collection of your choice if needed. But sometimes, you may have to return a Collection (say, due to strong consistency requirements), or you may want to return Collection because you know how the user will be using it and know this is the most convenient thing for them.
If you already have a suitable Collection "lying around", and it seems likely that your users would rather interact with it as a Collection, then it is a reasonable choice (though not the only one, and more brittle) to just return what you have.
I have a few points to add to Brian Goetz' excellent answer.
It's quite common to return a Stream from a "getter" style method call. See the Stream usage page in the Java 8 javadoc and look for "methods... that return Stream" for the packages other than java.util.stream. These methods are usually on classes that represent or can contain multiple values or aggregations of something. In such cases, APIs typically have returned collections or arrays of them. For all the reasons that Brian noted in his answer, it's very flexible to add Stream-returning methods here. Many of these classes have collections- or array-returning methods already, because the classes predate the Streams API. If you're designing a new API, and it makes sense to provide Stream-returning methods, it might not be necessary to add collection-returning methods as well.
Brian mentioned the cost of "materializing" the values into a collection. To amplify this point, there are actually two costs here: the cost of storing values in the collection (memory allocation and copying) and also the cost of creating the values in the first place. The latter cost can often be reduced or avoided by taking advantage of a Stream's laziness-seeking behavior. A good example of this are the APIs in java.nio.file.Files:
static Stream<String> lines(path)
static List<String> readAllLines(path)
Not only does readAllLines have to hold the entire file contents in memory in order to store it into the result list, it also has to read the file to the very end before it returns the list. The lines method can return almost immediately after it has performed some setup, leaving file reading and line breaking until later when it's necessary -- or not at all. This is a huge benefit, if for example, the caller is interested only in the first ten lines:
try (Stream<String> lines = Files.lines(path)) {
List<String> firstTen = lines.limit(10).collect(toList());
}
Of course considerable memory space can be saved if the caller filters the stream to return only lines matching a pattern, etc.
An idiom that seems to be emerging is to name stream-returning methods after the plural of the name of the things that it represents or contains, without a get prefix. Also, while stream() is a reasonable name for a stream-returning method when there is only one possible set of values to be returned, sometimes there are classes that have aggregations of multiple types of values. For example, suppose you have some object that contains both attributes and elements. You might provide two stream-returning APIs:
Stream<Attribute> attributes();
Stream<Element> elements();
Were streams designed to always be "terminated" inside the same expression they were created in?
That is how they are used in most examples.
Note: returning a Stream is not that different from returning an Iterator (admittedly with much more expressive power)
IMHO the best solution is to encapsulate why you are doing this, and not return the collection.
e.g.
public int playerCount();
public Player player(int n);
or if you intend to count them
public int countPlayersWho(Predicate<? super Player> test);
If the stream is finite, and there is an expected/normal operation on the returned objects which will throw a checked exception, I always return a Collection. Because if you are going to be doing something on each of the objects that can throw a checked exception, you will hate the stream. One real shortcoming of streams is their inability to deal with checked exceptions elegantly.
Now, perhaps that is a sign that you don't need the checked exceptions, which is fair, but sometimes they are unavoidable.
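To show the friction, here is a sketch that compares a stream pipeline and a plain loop calling the same method that declares IOException. The firstLine helper and the class name are hypothetical, and wrapping in UncheckedIOException is just one common workaround.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CheckedInStream {
    // Hypothetical per-element operation that declares a checked exception.
    static String firstLine(String text) throws IOException {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line = reader.readLine();
            return line == null ? "" : line;
        }
    }

    // Stream version: the lambda cannot throw IOException, so it has to be wrapped.
    static List<String> firstLines(Stream<String> texts) {
        return texts.map(text -> {
            try {
                return firstLine(text);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }).collect(Collectors.toList());
    }

    // Loop version: the checked exception simply propagates to the caller.
    static List<String> firstLinesLoop(List<String> texts) throws IOException {
        List<String> result = new ArrayList<>();
        for (String text : texts) {
            result.add(firstLine(text));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(firstLines(Stream.of("a\nb", "c")));
    }
}
```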
While some of the more high-profile respondents gave great general advice, I'm surprised no one has quite stated:
If you already have a "materialized" Collection in-hand (i.e. it was already created before the call - as is the case in the given example, where it is a member field), there is no point converting it to a Stream. The caller can easily do that themselves. Whereas, if the caller wants to consume the data in its original form, you converting it to a Stream forces them to do redundant work to re-materialize a copy of the original structure.
In contrast to collections, streams have additional characteristics. A stream returned by any method might be:
finite or infinite
parallel or sequential (with a default globally shared threadpool that can impact any other part of an application)
ordered or non-ordered
holding references to be closed or not
These differences also exist in collections, but there they are part of the obvious contract:
All Collections have size, Iterator/Iterable can be infinite.
Collections are explicitly ordered or non-ordered
Parallelity is thankfully not something collections care about beyond thread-safety
Collections also are not closable typically, so also no need to worry about using try-with-resources as a guard.
As a consumer of a stream (either from a method return or as a method parameter) this is a dangerous and confusing situation. To make sure their algorithm behaves correctly, consumers of streams need to make sure the algorithm makes no wrong assumption about the stream characteristics. And that is a very hard thing to do. In unit testing, that would mean that you have to multiply all your tests to be repeated with the same stream contents, but with streams that are
(finite, ordered, sequential, requiring-close)
(finite, ordered, parallel, requiring-close)
(finite, non-ordered, sequential, requiring-close)...
Writing method guards for streams that throw an IllegalArgumentException if the input stream has a characteristic that breaks your algorithm is difficult, because the properties are hidden.
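As a rough sketch of how limited such a guard is: as far as I can tell, isParallel() is the only one of these properties that the Stream interface lets you query without consuming the stream. The class and method names below are hypothetical.

```java
import java.util.stream.Stream;

class StreamGuards {
    // Rejects parallel streams. The Stream interface has no comparable
    // isOrdered() or isFinite() query, so those cannot be guarded this way.
    static <T> Stream<T> requireSequential(Stream<T> stream) {
        if (stream.isParallel()) {
            throw new IllegalArgumentException("parallel streams are not supported");
        }
        return stream;
    }
}
```

A caller would wrap the incoming stream, e.g. StreamGuards.requireSequential(input).forEach(...). Orderedness, finiteness and the need to close the stream remain unchecked.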
Documentation mitigates the problem, but it is flawed and often overlooked, and does not help when a stream provider is modified. As an example, see these javadocs of the Java 8 Files class:
/**
 * [...] The returned stream encapsulates a Reader. If timely disposal of
 * file system resources is required, the try-with-resources
 * construct should be used to ensure that the stream's close
 * method is invoked after the stream operations are completed.
 */
public static Stream<String> lines(Path path, Charset cs)
/**
 * [...] no mention of closing even if this wraps the previous method
 */
public static Stream<String> lines(Path path)
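For reference, the guard that the first javadoc asks for looks roughly like this. It is a minimal sketch; the class and method names are hypothetical, and the IOException is wrapped only to keep the signature simple.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

class LineCounter {
    // Counts non-empty lines and closes the underlying file handle
    // when the try block exits, whether normally or with an exception.
    static long countNonEmptyLines(Path path) {
        try (Stream<String> lines = Files.lines(path)) {
            return lines.filter(line -> !line.isEmpty()).count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Nothing in the type Stream<String> tells the caller that this try block is needed, which is exactly the problem.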
That leaves Stream only as a valid choice in a method signature when none of the problems above matter, typically when the stream producer and consumer are in the same codebase, and all consumers are known (e.g. not part of the public interface of a class reusable in many places).
It is much safer to use other datatypes in method signatures with an explicit contract (and without implicit thread-pool processing involved) that makes it impossible to accidentally process data with wrong assumptions about orderedness, sizedness or parallelity (and threadpool usage).
I think it depends on your scenario. Maybe, if you make your Team implement Iterable<Player>, it is sufficient.
for (Player player : team) {
    System.out.println(player);
}
or in a functional style:
team.forEach(System.out::println);
But if you want a more complete and fluent api, a stream could be a good solution.
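The Iterable approach could be sketched like this. The Player class, the add method and the read-only iterator are my assumptions for illustration; they are not taken from the question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical minimal Player, only here to make the sketch compile.
class Player {
    private final String name;

    Player(String name) {
        this.name = name;
    }

    @Override
    public String toString() {
        return name;
    }
}

class Team implements Iterable<Player> {
    private final List<Player> players = new ArrayList<>();

    void add(Player player) {
        players.add(player);
    }

    @Override
    public Iterator<Player> iterator() {
        // Iterate over a read-only view so that callers cannot
        // remove players through Iterator.remove().
        return Collections.unmodifiableList(players).iterator();
    }
}
```

Both the for-each loop and team.forEach(...) work with this, since forEach is a default method of Iterable.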
Perhaps a Stream factory would be a better choice. The big win of only
exposing collections via Stream is that it better encapsulates your
domain model’s data structure. It’s impossible for any use of your domain classes to affect the inner workings of your List or Set simply
by exposing a Stream.
It also encourages users of your domain class to
write code in a more modern Java 8 style. It’s possible to
incrementally refactor to this style by keeping your existing getters
and adding new Stream-returning getters. Over time, you can rewrite
your legacy code until you’ve finally deleted all getters that return
a List or Set. This kind of refactoring feels really good once you’ve
cleared out all the legacy code!
I would probably have 2 methods, one to return a Collection and one to return the collection as a Stream.
class Team
{
    private List<Player> players = new ArrayList<>();
    // ...
    public List<Player> getPlayers()
    {
        return Collections.unmodifiableList(players);
    }
    public Stream<Player> getPlayerStream()
    {
        return players.stream();
    }
}
This is the best of both worlds. The client can choose if they want the List or the Stream and they don't have to do the extra object creation of making an immutable copy of the list just to get a Stream.
This also adds only one more method to your API, so you don't end up with too many methods.
