Java Stream stateful findFirst - java

The below method is part of a weighted random selection algorithm for picking songs.
I would like to convert the method below to use streams, to decide whether that would be clearer or preferable. I am not certain it is possible at all, since the calculation is stateful: it depends on the position in the list.
public Song songForTicketNumber(long ticket)
{
if(ticket<0) return null;
long remaining = ticket;
for(Song s : allSongs) // allSongs is ordered list
{
remaining -= s.numTickets; // numTickets is a long and never negative
if(remaining<0)
return s;
}
return null;
}
More formally: if n is the sum of Song::numTickets over every Song object in allSongs, then for any integer from 0 through n-1, the above method should return a song in the list. The number of integers that return a specific Song object x is determined by x.numTickets. The selection range for a specific song is a range of consecutive integers determined by both its own numTickets and the numTickets of every item to its left in the list. As currently written, anything outside the overall range returns null.
Note: Out of range behavior can be modified to accommodate Streams (other than returning null)
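To make the ticket ranges concrete, here is a small runnable sketch of the loop above, with the remaining/rem mix-up corrected. The Song class (including its title field) and the TicketPicker wrapper are hypothetical stand-ins. The question only says that Song has a non-negative long numTickets and that allSongs is ordered. With 2, 3 and 1 tickets, song A owns tickets 0 and 1, B owns 2 to 4, C owns 5, and every other ticket maps to null.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the question's Song: only numTickets is given there.
class Song {
    final String title;
    final long numTickets; // never negative, as stated in the question

    Song(String title, long numTickets) {
        this.title = title;
        this.numTickets = numTickets;
    }

    @Override
    public String toString() {
        return title;
    }
}

class TicketPicker {
    private final List<Song> allSongs; // ordered list

    TicketPicker(List<Song> allSongs) {
        this.allSongs = allSongs;
    }

    Song songForTicketNumber(long ticket) {
        if (ticket < 0) return null;
        long remaining = ticket;
        for (Song s : allSongs) {
            remaining -= s.numTickets;
            if (remaining < 0) return s; // ticket lies inside this song's range
        }
        return null; // ticket >= sum of all numTickets
    }

    public static void main(String[] args) {
        TicketPicker picker = new TicketPicker(Arrays.asList(
                new Song("A", 2), new Song("B", 3), new Song("C", 1)));
        for (long t = -1; t <= 6; t++) {
            System.out.println(t + " -> " + picker.songForTicketNumber(t));
        }
    }
}
```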

The efficiency of a Stream compared to a basic for or for-each loop is a matter of circumstance. In your case, it's highly likely that a Stream would be less efficient than your current code, for these major reasons (among others):
1. Your function is stateful, as you mentioned. Maintaining state this way probably means finagling some kind of anonymous BinaryOperator implementation to use with Stream.reduce, and it's going to turn out bulkier and more confusing to read than your current code.
2. You're short-circuiting in your current loop, and no Stream operation will reflect that kind of efficiency, especially in combination with #1.
3. Your collection is ordered, which means the stream will iterate over the elements in a manner very similar to your existing loop anyway. Depending on the size of your collection, you might get some efficiency out of parallelStream, but having to maintain the order in this case will make the stream less efficient.
The only real benefit you could get from switching to a Stream is the difference in memory consumption (you could keep allSongs out of memory and let the Stream handle it in a more memory-efficient way), which doesn't seem applicable here.
In conclusion, since the Stream operations would be even more complex to write and would probably be harmful, if anything, to your efficiency, I would recommend that you do not pursue this change.
That being said, I personally can't come up with a Stream based solution to actually answer your question of how to convert this work to a Stream. Again, it would be something complex and strange involving a reducer or similar... (I'll delete this answer if this is insufficient.)
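For what it's worth, one way to get a Stream-based version without any mutable state is to precompute the running totals once, so that each song can be tested independently of the others. This is a sketch under assumptions. The PrefixSumPicker class and its nested Song are hypothetical, and the song list is assumed not to change after construction. Song i owns the tickets from the total of the songs before it (inclusive) up to upperBounds[i] (exclusive). The answer is therefore the first index whose upper bound exceeds the ticket.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.IntStream;

class PrefixSumPicker {
    // Hypothetical stand-in for the question's Song: only numTickets matters here.
    static final class Song {
        final String title;
        final long numTickets;

        Song(String title, long numTickets) {
            this.title = title;
            this.numTickets = numTickets;
        }
    }

    private final List<Song> songs;
    // upperBounds[i] = numTickets of songs 0..i, i.e. the exclusive end of song i's range
    private final long[] upperBounds;

    PrefixSumPicker(List<Song> songs) {
        this.songs = songs;
        this.upperBounds = songs.stream().mapToLong(s -> s.numTickets).toArray();
        Arrays.parallelPrefix(upperBounds, Long::sum); // running totals, computed once
    }

    // Stateless predicate: each index is tested on its own, and findFirst
    // still stops at the first matching index.
    Optional<Song> songForTicketNumber(long ticket) {
        if (ticket < 0) return Optional.empty();
        return IntStream.range(0, upperBounds.length)
                .filter(i -> ticket < upperBounds[i])
                .mapToObj(songs::get)
                .findFirst();
    }
}
```

Because upperBounds is sorted, a binary search over it could replace the linear scan for long lists. The stream form is kept here only because the question is about streams.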

Java streams do have the facility to short-circuit evaluation; see, for example, the documentation for findFirst(). Having said that, decrementing and checking remaining requires state mutation, which is not great. Not great, but doable:
public Optional<Song> songForTicketNumber(long ticket, Stream<Song> songs) {
if (ticket < 0) return Optional.empty();
AtomicLong remaining = new AtomicLong(ticket);
return songs.filter(song -> decrementAndCheck(song, remaining)).findFirst();
}
private boolean decrementAndCheck(Song song, AtomicLong total) {
total.addAndGet(-song.numTickets);
return total.get() < 0;
}
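As far as I can tell, the only advantage of this approach is that you could switch to parallel streams if you wanted to. Note, though, that a parallel stream would no longer apply the filter in encounter order, so the shared remaining counter would be decremented in an unpredictable order and could select the wrong song.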
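Below is a self-contained version of the AtomicLong approach. A hypothetical examined counter has been added only to make the short-circuiting visible. For ticket 3 and songs with 2, 3 and 1 tickets, the filter should run for A and B, after which findFirst stops and C is never examined. The Song class is a stand-in. Unlike the answer's helper, decrementAndCheck here folds the decrement and the check into a single addAndGet call, which behaves the same on a sequential stream.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Stream;

class StatefulFilterDemo {
    // Hypothetical stand-in for the question's Song.
    static final class Song {
        final String title;
        final long numTickets;

        Song(String title, long numTickets) {
            this.title = title;
            this.numTickets = numTickets;
        }
    }

    // Demo-only counter of how many songs the filter has looked at.
    static final AtomicInteger examined = new AtomicInteger();

    static Optional<Song> songForTicketNumber(long ticket, Stream<Song> songs) {
        if (ticket < 0) return Optional.empty();
        AtomicLong remaining = new AtomicLong(ticket);
        return songs.filter(song -> decrementAndCheck(song, remaining)).findFirst();
    }

    private static boolean decrementAndCheck(Song song, AtomicLong total) {
        examined.incrementAndGet();
        return total.addAndGet(-song.numTickets) < 0;
    }

    public static void main(String[] args) {
        List<Song> songs = Arrays.asList(new Song("A", 2), new Song("B", 3), new Song("C", 1));
        examined.set(0);
        Optional<Song> hit = songForTicketNumber(3, songs.stream());
        System.out.println(hit.map(s -> s.title).orElse("none")
                + " after examining " + examined.get() + " songs");
    }
}
```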

Related

Can I use identityHashCode to produce a compareTo between Objects respecting same-ness?

I want to implement a simple comparator between two Objects, whose only requirements are that
it is a valid comparator (i.e. defines a linear order on all objects) and
.compare will return 0 if and only if the objects are the same.
Will Comparator.comparing(System::identityHashCode) work? Is there another way?
Motivation:
I want to build a collection that will allow me to store time-stamped messages in a thread-safe collection, which will support queries like "get me all the messages whose timestamp lies in [a,b)".
It seems that Guava's TreeMultimap uses a global lock (edit: if wrapped with the synchronizedSortedSetMultimap wrapper), and ConcurrentSkipListMap seems to support only one entry per time (it is a map, not a multi map). So I thought of using just a set of pairs:
ConcurrentSkipListSet<ImmutablePair<Float,Message>> db,
where the pairs are lexically ordered, first by the times (using Float.compareTo) and then by something like Comparator.nullsFirst(Comparator.comparing(System::identityHashCode)).
The nullsFirst is there just so db.subSet(ImmutablePair.of(a,null), ImmutablePair.of(b,null)) queries the half-open time interval [a,b).
You see why I care about the comparator preserving sameness: if the message comparator returns zero for non-same messages, messages may be deleted.
You also see why I don't need much else from the comparator: it's just there so I can use the storage mechanism of ConcurrentSkipListSet. I certainly don't want to impose on the user (well, just me :-) to implement a comparator for Message.
Another possible solution is to use a ConcurrentSkipListMap<Float, Set<Message>> (with thread-safe Set<> instances) but it seems a bit wasteful in terms of memory, and I will need to remove emptySet's myself to save memory once messages are deleted.
EDIT: As several people noted, identityHashCode may produce collisions, and in fact I've now confirmed that such collisions exist in my setup (which is roughly equivalent to having 4K collections as above, each populated with 4K messages per time bin). This is most likely the reason I see some messages dropped. So I'm now even more interested than ever in finding some way to have an "agnostic" comparison operator, that truly respects sameness. Actually, a 64 bit hash value (instead of the 32bit value provided by identityHashCode) would probably suffice.
While it's not guaranteed, I suspect the chances of this causing a problem are vanishingly small.
System.identityHashCode returns the value that Object.hashCode would return if not overridden, including this in the documentation:
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects.
So is "as much as is reasonably practical" sufficient? While it's not guaranteed, I would be very surprised if you ever ran into a situation where it causes a problem. You'd have to have two messages with exactly the same timestamp and where the JVM's Object.hashCode implementation returns the same value for the two messages.
If the result of that coincidence were to be "nuclear power plant explodes" then I wouldn't risk it. If the result of that coincidence were to be "we fail to bill a customer" - or even "we bill a customer twice, and might get sued" I'd probably accept that chance, if no better alternatives are suggested.
As @StuartMarks noted in his comment, Guava supports Ordering.arbitrary(), which provides thread-safe collision handling. The implementation makes efficient use of identityHashCode:
@Override
public int compare(Object left, Object right) {
if (left == right) {
return 0;
} else if (left == null) {
return -1;
} else if (right == null) {
return 1;
}
int leftCode = identityHashCode(left);
int rightCode = identityHashCode(right);
if (leftCode != rightCode) {
return leftCode < rightCode ? -1 : 1;
}
// identityHashCode collision (rare, but not as rare as you'd think)
int result = getUid(left).compareTo(getUid(right));
if (result == 0) {
throw new AssertionError(); // extremely, extremely unlikely.
}
return result;
}
so getUid (which uses a memoized AtomicInteger counter to allocate UIDs) is invoked only if there is a hash collision.
It's also quite easy to write (perhaps less easy to read?) the desired timestamped message container in "one" line:
db = new ConcurrentSkipListSet<>(
(Ordering.<Float>natural().<ImmutablePair<Float,Message>>onResultOf(x -> x.left))
.compound(Ordering.arbitrary().nullsFirst().<ImmutablePair<Float,Message>>onResultOf(x -> x.right)));
Will Comparator.comparing(System::identityHashCode) work? Is there another way?
As mentioned, identityHashCode is not unique.
Actually, a 64 bit hash value (instead of the 32bit value provided by identityHashCode) would probably suffice
I think this would just reduce the chance of collisions, not remove it. Hash algorithms are designed to limit collisions but typically make no guarantee that there are none. For example, MD5 is 128-bit and still has known collisions.
How about just assigning a unique number to each message with an AtomicLong? Then your comparison function would:
Compare by time. I would use long if possible instead of float.
If same time then compare by the unique value.
If you have multiple systems doing the ingesting of these messages then you are going to need to record unique system-id and message number to ensure uniqueness.
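Here is a minimal sketch of that suggestion. It assumes long timestamps, as recommended, and a single JVM, with one AtomicLong per store. The TimestampedStore class and its add and between methods are hypothetical names. Entries are ordered by timestamp and then by a unique sequence number, so two distinct messages never compare as equal. Query bounds use Long.MIN_VALUE as the sequence number, which sorts before every real entry with the same timestamp and so plays the role that nullsFirst played in the question.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicLong;

final class TimestampedStore<M> {
    private static final class Entry<T> {
        final long timestamp;
        final long seq; // unique per store; real entries use 0, 1, 2, ...
        final T message;

        Entry(long timestamp, long seq, T message) {
            this.timestamp = timestamp;
            this.seq = seq;
            this.message = message;
        }
    }

    private final AtomicLong nextSeq = new AtomicLong();
    private final ConcurrentSkipListSet<Entry<M>> db = new ConcurrentSkipListSet<>(
            Comparator.<Entry<M>>comparingLong(e -> e.timestamp)
                    .thenComparingLong(e -> e.seq));

    /** Adds a message; messages with equal timestamps are all kept because their seq differs. */
    void add(long timestamp, M message) {
        db.add(new Entry<>(timestamp, nextSeq.getAndIncrement(), message));
    }

    /** Messages whose timestamp lies in [from, to), ordered by timestamp, then sequence number. */
    List<M> between(long from, long to) {
        List<M> result = new ArrayList<>();
        for (Entry<M> e : db.subSet(bound(from), bound(to))) {
            result.add(e.message);
        }
        return result;
    }

    // Bound marker: sorts before every real entry that has this timestamp.
    private Entry<M> bound(long timestamp) {
        return new Entry<>(timestamp, Long.MIN_VALUE, null);
    }
}
```

If several processes ingest messages, the point above still applies: the sequence number would have to be combined with a system id.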

Are there any direct or indirect performance benefits of java 8 sequential streams?

While going through articles on sequential streams, a question came to mind: are there any performance benefits of using sequential streams over traditional for loops, or are streams just syntactic sugar with additional performance overhead?
Consider the example below, where I cannot see any performance benefit from using a sequential stream:
Stream.of("d2", "a2", "b1", "b3", "c")
.filter(s -> {
System.out.println("filter: " + s);
return s.startsWith("a");
})
.forEach(s -> System.out.println("forEach: " + s));
Using classic java:
String[] strings = {"d2", "a2", "b1", "b3", "c"};
for (String s : strings)
{
System.out.println("Before filtering: " + s);
if (s.startsWith("a"))
{
System.out.println("After Filtering: " + s);
}
}
The point here is that in a stream, processing of a2 starts only after all the operations on d2 are complete. (Earlier I thought that while d2 was being processed by forEach, filter would already have started operating on a2, but according to this article that is not the case: https://winterbe.com/posts/2014/07/31/java8-stream-tutorial-examples/.) The same is true of classic Java, so what should be the motivation for using streams beyond an "expressive" and "elegant" coding style? I know there is some overhead in handling streams; does anyone know of, or have experience with, any performance benefits of sequential streams?
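The execution order described in the question can be checked directly. This sketch (the class name is made up) runs the same pipeline but appends each step to a list instead of printing it. For a sequential stream, the expected order is filter: d2, filter: a2, forEach: a2, filter: b1, filter: b3, filter: c. In other words, each element travels through the whole pipeline before the next one is pulled, just as in the loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class StreamOrderDemo {
    // Same pipeline as in the question, recording each step instead of printing it.
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        Stream.of("d2", "a2", "b1", "b3", "c")
                .filter(s -> {
                    log.add("filter: " + s);
                    return s.startsWith("a");
                })
                .forEach(s -> log.add("forEach: " + s));
        return log;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```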
First of all, leaving aside special cases like omitting a redundant sorted operation or returning the known size from count(), the time complexity of an operation usually doesn’t change, so all differences in execution timing are usually about a constant offset or a (rather small) factor, not fundamental changes.
You can always write a manual loop doing basically the same as the Stream implementation does internally. So, internal optimizations, as mentioned by this answer could always get dismissed with “but I could do the same in my loop”.
But… when we compare “the Stream” with “a loop”, is it really reasonable to assume that all manual loops are written in the most efficient manner for the particular use case? A particular Stream implementation will apply its optimizations to all use cases where applicable, regardless of the experience level of the calling code’s author. I’ve already seen loops missing the opportunity to short-circuit or performing redundant operations not needed for a particular use case.
Another aspect is the information needed to perform certain optimizations. The Stream API is built around the Spliterator interface, which can provide characteristics of the source data. For example, it allows finding out whether the data has a meaningful order that needs to be retained for certain operations, or whether it is already pre-sorted, by natural order or with a particular comparator. It may also provide the expected number of elements, as an estimate or exact, when predictable.
A method receiving an arbitrary Collection, to implement an algorithm with an ordinary loop, would have a hard time finding out whether there are such characteristics. A List implies a meaningful order, whereas a Set usually does not, unless it’s a SortedSet or a LinkedHashSet, and the latter is a particular implementation class rather than an interface. So testing against all known constellations may still miss 3rd party implementations with special contracts not expressible by a predefined interface.
Of course, since Java 8, you could acquire a Spliterator yourself, to examine these characteristics, but that would change your loop solution to a non-trivial thing and also imply repeating the work already done with the Stream API.
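Acquiring a Spliterator and reading its characteristics looks roughly like the sketch below. The class and method names are illustrative. According to their Javadoc, the ArrayList spliterator reports ORDERED and SIZED, the HashSet spliterator reports SIZED and DISTINCT but not ORDERED, and the TreeSet spliterator additionally reports SORTED. A getComparator() result of null means the elements are sorted by natural order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Spliterator;
import java.util.TreeSet;

class CharacteristicsDemo {
    // Lists the characteristics a Spliterator reports about its source.
    static List<String> flags(Spliterator<?> sp) {
        List<String> result = new ArrayList<>();
        if (sp.hasCharacteristics(Spliterator.ORDERED)) result.add("ORDERED");
        if (sp.hasCharacteristics(Spliterator.SORTED)) result.add("SORTED");
        if (sp.hasCharacteristics(Spliterator.DISTINCT)) result.add("DISTINCT");
        if (sp.hasCharacteristics(Spliterator.SIZED)) result.add("SIZED");
        return result;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(3, 1, 2);
        System.out.println("ArrayList " + flags(new ArrayList<>(data).spliterator()));
        System.out.println("HashSet   " + flags(new HashSet<>(data).spliterator()));
        TreeSet<Integer> tree = new TreeSet<>(data);
        System.out.println("TreeSet   " + flags(tree.spliterator())
                + ", natural order: " + (tree.spliterator().getComparator() == null));
    }
}
```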
There’s also another interesting difference between Spliterator based Stream solutions and conventional loops, using an Iterator when iterating over something other than an array. The pattern is to invoke hasNext on the iterator, followed by next, unless hasNext returned false. But the contract of Iterator does not mandate this pattern. A caller may invoke next without hasNext, even multiple times, when it is known to succeed (e.g. you do already know the collection’s size). Also, a caller may invoke hasNext multiple times without next in case the caller did not remember the result of the previous call.
As a consequence, Iterator implementations have to perform redundant operations, e.g. the loop condition is effectively checked twice: once in hasNext, to return a boolean, and once in next, to throw a NoSuchElementException when not fulfilled. Often, hasNext has to perform the actual traversal operation and store the result in the Iterator instance, to ensure that the result stays valid until the subsequent next call. The next operation, in turn, has to check whether such a traversal has already happened or whether it has to perform the operation itself. In practice, the HotSpot optimizer may or may not eliminate the overhead imposed by the Iterator design.
In contrast, the Spliterator has a single traversal method, boolean tryAdvance(Consumer<? super T> action), which performs the actual operation and returns whether there was an element. This simplifies the loop logic significantly. There’s even the void forEachRemaining(Consumer<? super T> action) for non-short-circuiting operations, which allows the actual implementation to provide the entire looping logic. E.g., in case of ArrayList the operation will end up at a simple counting loop over the indices, performing a plain array access.
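Here are the two traversal protocols side by side, as a rough sketch with illustrative class and method names. The Iterator loop makes a hasNext() call and a next() call per element. tryAdvance performs the step and reports success in a single call. forEachRemaining hands the whole loop over to the source.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;

class TraversalDemo {
    // Iterator protocol: hasNext() and next() for every element.
    static int sumWithIterator(List<Integer> list) {
        int sum = 0;
        Iterator<Integer> it = list.iterator();
        while (it.hasNext()) {
            sum += it.next();
        }
        return sum;
    }

    // Spliterator protocol: one call that both processes an element and reports success.
    static int sumWithTryAdvance(List<Integer> list) {
        int[] sum = {0}; // array so the lambda can update it
        Spliterator<Integer> sp = list.spliterator();
        while (sp.tryAdvance(x -> sum[0] += x)) {
            // the per-element work happens inside the lambda
        }
        return sum[0];
    }

    // Bulk traversal: the source implements the whole loop itself.
    static int sumWithForEachRemaining(List<Integer> list) {
        int[] sum = {0};
        list.spliterator().forEachRemaining(x -> sum[0] += x);
        return sum[0];
    }
}
```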
You may compare such design with, e.g. readLine() of BufferedReader, which performs the operation and returns null after the last element, or find() of a regex Matcher, which performs the search, updates the matcher’s state and returns the success state.
But the impact of such design differences is hard to predict in an environment with an optimizer designed specifically to identify and eliminate redundant operations. The takeaway is that there is some potential for Stream based solutions to turn out to be even faster, while it depends on a lot of factors whether it will ever materialize in a particular scenario. As said at the beginning, it’s usually not changing the overall time complexity, which would be more important to worry about.
Streams may apply tricks under the hood (and already have some) that a traditional for loop does not. For example:
Arrays.asList(1,2,3).stream()
.map(x -> x + 1)
.count();
Since Java 9, the map step will be skipped here, because count() can take the size directly from the source and you never use the mapped values.
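This can be observed with a side-effecting map function. Side effects in stream lambdas are normally bad practice and are used here only as a probe. The class name and counter in this sketch are illustrative. On JDK 9 and later, count() on a stream whose size is known up front is expected to return that size without running the map at all. Inserting a filter makes the size unknown, so every element then has to flow through the map. The Stream.count() Javadoc explicitly allows implementations to skip the pipeline in this way.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class CountShortcutDemo {
    // Returns {count, number of times the map function ran}.
    static long[] run(boolean addFilter) {
        AtomicInteger mapCalls = new AtomicInteger();
        List<Integer> source = Arrays.asList(1, 2, 3);
        long count = (addFilter ? source.stream().filter(x -> true) : source.stream())
                .map(x -> {
                    mapCalls.incrementAndGet(); // side effect used only as a probe
                    return x + 1;
                })
                .count();
        return new long[] {count, mapCalls.get()};
    }

    public static void main(String[] args) {
        System.out.println("sized source: " + Arrays.toString(run(false)));
        System.out.println("after filter: " + Arrays.toString(run(true)));
    }
}
```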
Or the internal implementation might check whether a data structure is already sorted, for example:
someSource.stream()
.sorted()
....
If someSource is already sorted by natural order (like a TreeSet created without a comparator), sorted would be a no-op in such a case. Many optimizations like these are done internally, and there is room for more that may be added in the future.
If you still want to use streams, you could create a stream from your array using Arrays.stream and use forEach as:
Arrays.stream(strings).forEach(s -> {
System.out.println("Before filtering: " + s);
if (s.startsWith("a")) {
System.out.println("After Filtering: " + s);
}
});
On the performance note, since you would be traversing the entire array anyway, there is no specific benefit from using streams over loops. More about it has been discussed in "In Java, what are the advantages of streams over loops?" and other linked questions.
If you use a stream, you can combine it with parallel(), as below:
Stream<String> stringStream = Stream.of("d2", "a2", "b1", "b3", "c")
.parallel()
.filter(s -> s.startsWith("d"));
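It can be faster because your computer will normally be able to run more than one thread at the same time, although for a stream of only five elements the overhead of splitting the work is likely to outweigh any gain.
Test it:
@Test
public void forEachVsStreamVsParallelStream_Test() {
IntStream range = IntStream.range(Integer.MIN_VALUE, Integer.MAX_VALUE);
StopWatch stopWatch = new StopWatch(); // org.springframework.util.StopWatch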
stopWatch.start("for each");
int forEachResult = 0;
for (int i = Integer.MIN_VALUE; i < Integer.MAX_VALUE; i++) {
if (i % 15 == 0)
forEachResult++;
}
stopWatch.stop();
stopWatch.start("stream");
long streamResult = range
.filter(v -> (v % 15 == 0))
.count();
stopWatch.stop();
range = IntStream.range(Integer.MIN_VALUE, Integer.MAX_VALUE);
stopWatch.start("parallel stream");
long parallelStreamResult = range
.parallel()
.filter(v -> (v % 15 == 0))
.count();
stopWatch.stop();
System.out.println(String.format("forEachResult: %s%s" +
"parallelStreamResult: %s%s" +
"streamResult: %s%s",
forEachResult, System.lineSeparator(),
parallelStreamResult, System.lineSeparator(),
streamResult, System.lineSeparator()));
System.out.println("prettyPrint: " + stopWatch.prettyPrint());
System.out.println("Time Elapsed: " + stopWatch.getTotalTimeSeconds());
}

Can the java compiler optimize loops to return early?

I'm working with an external library that decided to handle collections on its own. Replacing it or updating it is outside my control. To work with the elements of this third-party "collection", it only returns iterators.
A question came up during a code review about having multiple returns in the code to gain performance. We all agree (within the team) the code is more readable with a single return, but some are worried about optimizations.
I'm aware premature optimization is bad. That is a topic for another day.
I believe the JIT compiler can handle this and skip the unneeded iterations, but could not find any info to back this up. Is JIT capable of such a thing?
A code sample of the issue at hand:
public boolean contains(MyThings things, String valueToFind) {
Iterator<Thing> thingIterator = things.iterator();
boolean valueFound = false;
while(thingIterator.hasNext()) {
Thing thing = thingIterator.next();
if (valueToFind.equals(thing.getValue())) {
valueFound = true;
}
}
return valueFound;
}
VS
public boolean contains(MyThings things, String valueToFind) {
Iterator<Thing> thingIterator = things.iterator();
while(thingIterator.hasNext()) {
Thing thing = thingIterator.next();
if (valueToFind.equals(thing.getValue())) {
return true;
}
}
return false;
}
We all agree the code is more readable with a single return.
Not really. This is just old school structured programming when functions were typically not kept small and the paradigms of keeping values immutable weren't popular yet.
Although subject to debate, there is nothing wrong with having very small methods (a handful of lines of code), which return at different points. For example, in recursive methods, you typically have at least one base case which returns immediately, and another one which returns the value returned by the recursive call.
Often you will find that creating an extra result variable, just to hold the return value, and then making sure no other part of the function overwrites the result, when you already know you can just return, just creates noise which makes it less readable not more. The reader has to deal with cognitive overload to see the result is not modified further down. During debugging this increases the pain even more.
I don't think your example is premature optimisation. It is a logical and critical part of your search algorithm. That is why you can break from loops, or in your case, just return the value. I don't think the JIT could easily realise that it should break out of the loop. It doesn't know whether you want to change the variable back to false if you find something else in the collection. (I don't think it is smart enough to realise that valueFound never changes back to false.)
In my opinion, your second example is not only more readable (the valueFound variable is just extra noise) but also faster, because it just returns when it does its job. The first example would be as fast if you put a break after setting valueFound = true. If you don't do this, and you have a million items to check, and the item you need is the first, you will be comparing all the others just for nothing.
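The single-return style with the break described here might look like the sketch below. The Thing class is a hypothetical stand-in for the library's element type. The method takes the library's Iterator directly, since the question says iterators are all the third-party collection exposes.

```java
import java.util.Arrays;
import java.util.Iterator;

class ContainsWithBreak {
    // Hypothetical stand-in for the third-party element type.
    static final class Thing {
        private final String value;

        Thing(String value) {
            this.value = value;
        }

        String getValue() {
            return value;
        }
    }

    // One return statement, but the loop still stops at the first match.
    static boolean contains(Iterator<Thing> thingIterator, String valueToFind) {
        boolean valueFound = false;
        while (thingIterator.hasNext()) {
            Thing thing = thingIterator.next();
            if (valueToFind.equals(thing.getValue())) {
                valueFound = true;
                break; // nothing later in the collection can change the answer
            }
        }
        return valueFound;
    }

    public static void main(String[] args) {
        Iterator<Thing> it = Arrays.asList(new Thing("a"), new Thing("b")).iterator();
        System.out.println(contains(it, "a")); // true
    }
}
```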
Java compiler cannot do an optimization like that, because doing so in a general case would change the logic of the program.
Specifically, adding an early return would change the number of invocations of thingIterator.hasNext(), because your first code block continues iterating the collection to the end.
Java could potentially replace a break with an early return, but that would not have any noticeable effect on the timing of the program.
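The point about hasNext() can be made visible with a counting wrapper. In this sketch, all names are illustrative and plain Strings stand in for Thing. When the match is the first element, the flag version without a break calls hasNext() once per element plus one final time (5 calls for 4 elements), while the early-return version stops after a single call. Because iterator methods may have side effects like this counter, neither javac nor the JIT can in general turn the first version into the second.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class HasNextCountDemo {
    // Wraps an Iterator and counts hasNext() calls: an observable side effect.
    static final class CountingIterator<T> implements Iterator<T> {
        private final Iterator<T> delegate;
        int hasNextCalls;

        CountingIterator(Iterator<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean hasNext() {
            hasNextCalls++;
            return delegate.hasNext();
        }

        @Override
        public T next() {
            return delegate.next();
        }
    }

    // First version from the question: no early exit, always walks to the end.
    static boolean flagVersion(Iterator<String> it, String valueToFind) {
        boolean valueFound = false;
        while (it.hasNext()) {
            if (valueToFind.equals(it.next())) {
                valueFound = true;
            }
        }
        return valueFound;
    }

    // Second version from the question: returns at the first match.
    static boolean earlyReturnVersion(Iterator<String> it, String valueToFind) {
        while (it.hasNext()) {
            if (valueToFind.equals(it.next())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "b", "c", "d");
        CountingIterator<String> first = new CountingIterator<>(data.iterator());
        CountingIterator<String> second = new CountingIterator<>(data.iterator());
        flagVersion(first, "a");
        earlyReturnVersion(second, "a");
        System.out.println("flag version: " + first.hasNextCalls
                + " hasNext() calls, early return: " + second.hasNextCalls);
    }
}
```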

Performance Difference For-Loop Foreach

I have always asked myself what to use: a for loop or a foreach?
In my opinion they're both the "same". I know a foreach is better for iterating through a list etc., but what if we have the following case:
for (String zipCode : zipCodes) {
if (zipCode.equals(zip)) {
return true;
}
}
or
for (int i = 0; i < zipCodes.length; i++) {
if (zipCodes[i].equals(zip)) {
return true;
}
}
What would be better? Or is in this case really no difference?
First things first - for-each over a collection is nothing but syntactic sugar for an Iterator loop (over an array, as in the zipCodes example, the JLS specifies an indexed loop instead). Read this section of the JLS. So, I will address this question as a simple FOR loop vs an Iterator over a collection.
Now, when you use an Iterator to traverse a collection, at bare minimum you will be using two methods, next() and hasNext(); below are their ArrayList implementations:
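JLS section 14.14.2 specifies two different translations of the enhanced for statement, depending on whether the expression is an array or an Iterable. Here they are roughly sketched for the zipCodes example. The method names are illustrative, and the real translation uses compiler-generated variable names.

```java
import java.util.Arrays;
import java.util.Iterator;

class ForEachDesugarDemo {
    // for (String zipCode : zipCodes) where zipCodes is an ARRAY: JLS 14.14.2 defines it
    // as equivalent to an indexed loop over a local copy of the array reference.
    static boolean containsArray(String[] zipCodes, String zip) {
        String[] a = zipCodes; // array expression evaluated once
        for (int i = 0; i < a.length; i++) {
            String zipCode = a[i];
            if (zipCode.equals(zip)) {
                return true;
            }
        }
        return false;
    }

    // for (String zipCode : zipCodes) where zipCodes is an ITERABLE: defined as an Iterator loop.
    static boolean containsIterable(Iterable<String> zipCodes, String zip) {
        for (Iterator<String> it = zipCodes.iterator(); it.hasNext(); ) {
            String zipCode = it.next();
            if (zipCode.equals(zip)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] zipCodes = {"10115", "20095", "80331"};
        System.out.println(containsArray(zipCodes, "20095"));
        System.out.println(containsIterable(Arrays.asList(zipCodes), "99999"));
    }
}
```

Since zipCodes in the question is an array (the code uses zipCodes.length and zipCodes[i]), both of its loops end up as essentially the same indexed loop.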
public boolean hasNext() {
return cursor != size;
}
@SuppressWarnings("unchecked")
public E next() {
checkForComodification();
int i = cursor;
if (i >= size)
throw new NoSuchElementException();
Object[] elementData = ArrayList.this.elementData;
if (i >= elementData.length)
throw new ConcurrentModificationException();
cursor = i + 1;
return (E) elementData[lastRet = i]; // hagrawal: this is what simple FOR loop does
}
Now, basic computing tells us there will be a performance difference if the processor only has to execute myArray[i], compared with the complete implementation of the next() method. So, there has to be a difference in performance.
Some folks might push back strongly on this, citing performance benchmarks and excerpts from Effective Java, but the only other way I can try to explain it is that this is even written in Oracle's official documentation: the RandomAccess interface docs state that, as a rule of thumb, a List implementation should implement RandomAccess if, for typical instances, a loop calling list.get(i) for each index runs faster than a loop using an Iterator.
It is very clearly mentioned that there will be differences. So, if you can convince me that what is written in official docs is wrong and will be changed, I will be ready to accept the argument that there is no performance difference between simple FOR loop and Iterator or for-each.
So IMHO, correct way to put this whole argument is this:
If the collection implements RandomAccess interface then simple FOR loop will perform (at least theoretically) better than Iterator or for-each. (this is what is also written in RandomAccess docs)
If the collection doesn't implement RandomAccess interface then Iterator or for-each will perform (for sure) better than simple FOR loop.
However, for all practical purposes, for-each is the best choice in general.
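Code that has to handle arbitrary List implementations can act on the RandomAccess marker itself and choose an indexed loop only when the list advertises fast get(i). The sketch below uses an illustrative method name. The JDK's Collections utility makes a similar instanceof RandomAccess check in some of its methods, binarySearch for example.

```java
import java.util.List;
import java.util.Objects;
import java.util.RandomAccess;

class RandomAccessDemo {
    // Indexed get(i) for lists that advertise fast random access (e.g. ArrayList),
    // an Iterator-based for-each otherwise (e.g. LinkedList, where get(i) walks the list).
    static <T> boolean containsValue(List<T> list, T target) {
        if (list instanceof RandomAccess) {
            for (int i = 0, n = list.size(); i < n; i++) {
                if (Objects.equals(list.get(i), target)) {
                    return true;
                }
            }
        } else {
            for (T item : list) {
                if (Objects.equals(item, target)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```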
If zipCodes[i] is not O(1), then the performance of your second case will be much worse. (That said, I don't think there yet exists a container in Java where [] is not O(1)). Put another way, the short form for loop cannot be slower.
Plus the short form for loop is clearer, which really ought to be the primary consideration unless speed matters.
It is less about optimisation nowadays, as any difference will be unnoticeable, unless you need to process a very large amount of data. Also, if you used a Collection, the performance would depend on the chosen implementation.
What you should really think about is the quality of the code. The rule is that you should use as few elements as possible to present the logic as clearly as possible. The second solution introduces a new element, the i index, which is not actually needed and only makes the code this little bit more complicated. Only use the fori loop if you actually need to know the index in each iteration.
So, from code quality perspective, you should use the first solution :-)
Note that there is no performance penalty for using the for-each loop,
even for arrays. In fact, it may offer a slight performance advantage
over an ordinary for loop in some circumstances, as it computes the
limit of the array index only once.
Item 46 in Effective Java (2nd edition) by Joshua Bloch

Should I return a Collection or a Stream?

Suppose I have a method that returns a read-only view into a member list:
class Team {
private List<Player> players = new ArrayList<>();
// ...
public List<Player> getPlayers() {
return Collections.unmodifiableList(players);
}
}
Further suppose that all the client does is iterate over the list once, immediately. Maybe to put the players into a JList or something. The client does not store a reference to the list for later inspection!
Given this common scenario, should I return a stream instead?
public Stream<Player> getPlayers() {
return players.stream();
}
Or is returning a stream non-idiomatic in Java? Were streams designed to always be "terminated" inside the same expression they were created in?
The answer is, as always, "it depends". It depends on how big the returned collection will be. It depends on whether the result changes over time, and how important consistency of the returned result is. And it depends very much on how the user is likely to use the answer.
First, note that you can always get a Collection from a Stream, and vice versa:
// If API returns Collection, convert with stream()
getFoo().stream()...
// If API returns Stream, use collect()
Collection<T> c = getFooStream().collect(toList());
So the question is, which is more useful to your callers.
If your result might be infinite, there's only one choice: Stream.
If your result might be very large, you probably prefer Stream, since there may not be any value in materializing it all at once, and doing so could create significant heap pressure.
If all the caller is going to do is iterate through it (search, filter, aggregate), you should prefer Stream, since Stream has these built-in already and there's no need to materialize a collection (especially if the user might not process the whole result.) This is a very common case.
Even if you know that the user will iterate it multiple times or otherwise keep it around, you still may want to return a Stream instead, for the simple fact that whatever Collection you choose to put it in (e.g., ArrayList) may not be the form they want, and then the caller has to copy it anyway. If you return a Stream, they can do collect(toCollection(factory)) and get it in exactly the form they want.
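Here is a small sketch of that caller-side flexibility. The hypothetical playerNames() method returns a Stream, and each caller collects it into whatever collection it needs with Collectors.toList() or Collectors.toCollection(...). Note that the method builds a fresh Stream on every call. A Stream can be consumed only once, so handing out the same instance twice would fail with an IllegalStateException.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamReturnDemo {
    // Hypothetical API method: returns a new Stream on each call.
    static Stream<String> playerNames() {
        return Stream.of("carol", "alice", "bob", "alice");
    }

    public static void main(String[] args) {
        // Each caller picks the collection type it actually needs.
        List<String> inOrder = playerNames().collect(Collectors.toList());
        TreeSet<String> sortedUnique = playerNames().collect(Collectors.toCollection(TreeSet::new));
        ArrayDeque<String> queue = playerNames().collect(Collectors.toCollection(ArrayDeque::new));
        System.out.println(inOrder + " " + sortedUnique + " " + queue);
    }
}
```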
The above "prefer Stream" cases mostly derive from the fact that Stream is more flexible; you can late-bind to how you use it without incurring the costs and constraints of materializing it to a Collection.
The one case where you must return a Collection is when there are strong consistency requirements, and you have to produce a consistent snapshot of a moving target. Then, you will want to put the elements into a collection that will not change.
So I would say that most of the time, Stream is the right answer — it is more flexible, it doesn't impose usually-unnecessary materialization costs, and can be easily turned into the Collection of your choice if needed. But sometimes, you may have to return a Collection (say, due to strong consistency requirements), or you may want to return Collection because you know how the user will be using it and know this is the most convenient thing for them.
If you already have a suitable Collection "lying around", and it seems likely that your users would rather interact with it as a Collection, then it is a reasonable choice (though not the only one, and more brittle) to just return what you have.
I have a few points to add to Brian Goetz' excellent answer.
It's quite common to return a Stream from a "getter" style method call. See the Stream usage page in the Java 8 javadoc and look for "methods... that return Stream" for the packages other than java.util.stream. These methods are usually on classes that represent or can contain multiple values or aggregations of something. In such cases, APIs typically have returned collections or arrays of them. For all the reasons that Brian noted in his answer, it's very flexible to add Stream-returning methods here. Many of these classes have collections- or array-returning methods already, because the classes predate the Streams API. If you're designing a new API, and it makes sense to provide Stream-returning methods, it might not be necessary to add collection-returning methods as well.
Brian mentioned the cost of "materializing" the values into a collection. To amplify this point, there are actually two costs here: the cost of storing values in the collection (memory allocation and copying) and also the cost of creating the values in the first place. The latter cost can often be reduced or avoided by taking advantage of a Stream's laziness-seeking behavior. A good example of this are the APIs in java.nio.file.Files:
static Stream<String> lines(Path path)
static List<String> readAllLines(Path path)
Not only does readAllLines have to hold the entire file contents in memory in order to store it into the result list, it also has to read the file to the very end before it returns the list. The lines method can return almost immediately after it has performed some setup, leaving file reading and line breaking until later when it's necessary -- or not at all. This is a huge benefit, if for example, the caller is interested only in the first ten lines:
try (Stream<String> lines = Files.lines(path)) {
List<String> firstTen = lines.limit(10).collect(toList());
}
Of course considerable memory space can be saved if the caller filters the stream to return only lines matching a pattern, etc.
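The filtering case could look like this sketch. The method name and the pattern are illustrative. Only matching lines end up in the result list, and try-with-resources closes the underlying file when the method is done. Files.lines reads the file as UTF-8 and, like readAllLines, can throw IOException.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class GrepDemo {
    // Streams the file line by line and keeps only matching lines; non-matching
    // lines are discarded as they are read instead of all being held in a List first.
    static List<String> matchingLines(Path path, Pattern pattern) throws IOException {
        try (Stream<String> lines = Files.lines(path)) { // closes the file when done
            return lines.filter(pattern.asPredicate())
                    .collect(Collectors.toList());
        }
    }
}
```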
An idiom that seems to be emerging is to name stream-returning methods after the plural of the name of the things that it represents or contains, without a get prefix. Also, while stream() is a reasonable name for a stream-returning method when there is only one possible set of values to be returned, sometimes there are classes that have aggregations of multiple types of values. For example, suppose you have some object that contains both attributes and elements. You might provide two stream-returning APIs:
Stream<Attribute> attributes();
Stream<Element> elements();
Were streams designed to always be "terminated" inside the same expression they were created in?
That is how they are used in most examples.
Note: returning a Stream is not that different from returning an Iterator (admittedly with much more expressive power)
IMHO the best solution is to encapsulate why you are doing this, and not return the collection.
e.g.
public int playerCount();
public Player player(int n);
or if you intend to count them
public int countPlayersWho(Predicate<? super Player> test);
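Here is a sketch of such an intention-revealing API for the Team example from the question. The Player fields and the add method are hypothetical additions that make the example runnable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class Team {
    // Hypothetical player type for the sketch.
    static final class Player {
        final String name;
        final int goals;

        Player(String name, int goals) {
            this.name = name;
            this.goals = goals;
        }
    }

    private final List<Player> players = new ArrayList<>();

    void add(Player player) {
        players.add(player);
    }

    public int playerCount() {
        return players.size();
    }

    public Player player(int n) {
        return players.get(n); // throws IndexOutOfBoundsException for a bad n, like List.get
    }

    // The list never escapes; callers pass in the question they want answered.
    public int countPlayersWho(Predicate<? super Player> test) {
        return (int) players.stream().filter(test).count();
    }
}
```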
If the stream is finite, and there is an expected/normal operation on the returned objects which will throw a checked exception, I always return a Collection. If you are going to be doing something on each of the objects that can throw a checked exception, you will hate the stream. One real shortcoming of streams is their inability to deal with checked exceptions elegantly.
Now, perhaps that is a sign that you don't need the checked exceptions, which is fair, but sometimes they are unavoidable.
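A hypothetical illustration of the friction: writing each name to a java.io.Writer, whose write methods throw IOException. The loop lets the exception propagate naturally, while the stream version has to wrap it in UncheckedIOException inside the lambda (Consumer.accept cannot throw checked exceptions) and unwrap it afterwards.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.List;

final class CheckedExceptionDemo {
    // Plain loop: the checked IOException simply propagates to the caller.
    static void writeAllLoop(List<String> names, Writer out) throws IOException {
        for (String name : names) {
            out.write(name);
            out.write('\n');
        }
    }

    // Stream version: the lambda must wrap the IOException in an unchecked
    // exception, and the method unwraps it to restore the checked signature.
    static void writeAllStream(List<String> names, Writer out) throws IOException {
        try {
            names.stream().forEach(name -> {
                try {
                    out.write(name);
                    out.write('\n');
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (UncheckedIOException e) {
            throw e.getCause(); // getCause() is declared to return IOException
        }
    }
}
```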
While some of the more high-profile respondents gave great general advice, I'm surprised no one has quite stated:
If you already have a "materialized" Collection in hand (i.e. it was already created before the call, as is the case in the given example, where it is a member field), there is no point converting it to a Stream. The caller can easily do that themselves. Whereas if the caller wants to consume the data in its original form, converting it to a Stream forces them to do redundant work to re-materialize a copy of the original structure.
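A sketch of that trade-off with a hypothetical Roster class that already holds a List. Returning a read-only view costs no copy and still lets callers call .stream(); returning only a Stream forces a caller who needs size() or get() to collect it into a second list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical class holding an already-materialized list.
final class Roster {
    private final List<String> names = new ArrayList<>();

    void add(String name) { names.add(name); }

    // Read-only view of the existing list: no copy, and callers can still stream it.
    List<String> names() { return Collections.unmodifiableList(names); }

    // Stream-only alternative: callers needing List operations must re-collect.
    Stream<String> nameStream() { return names.stream(); }
}
```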
In contrast to collections, streams have additional characteristics. A stream returned by any method might be:
finite or infinite
parallel or sequential (with a default globally shared threadpool that can impact any other part of an application)
ordered or non-ordered
holding resources that must be closed, or not
These differences also exist in collections, but there they are part of the obvious contract:
All Collections have a size, whereas an Iterator/Iterable can be infinite.
Collections are explicitly ordered or non-ordered.
Parallelism is thankfully not something a collection cares about beyond thread safety.
Collections are typically not closeable either, so there is no need to worry about using try-with-resources as a guard.
As a consumer of a stream (either from a method return or as a method parameter), this is a dangerous and confusing situation. To make sure their algorithm behaves correctly, consumers of streams need to make sure the algorithm makes no wrong assumptions about the stream's characteristics, and that is very hard to do. In unit testing, it would mean repeating every test with the same stream contents, but with streams that are
(finite, ordered, sequential, requiring-close)
(finite, ordered, parallel, requiring-close)
(finite, non-ordered, sequential, requiring-close)...
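One way to reduce that repetition is a test helper that produces differently configured streams over the same contents. StreamVariants and its methods are hypothetical; the onClose variant only stands in for a stream that must be closed, and infinite streams are not covered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;
import java.util.stream.Stream;

// Hypothetical test helper: same contents, different stream characteristics.
final class StreamVariants {
    static <T> List<Supplier<Stream<T>>> variants(List<T> source) {
        List<Supplier<Stream<T>>> result = new ArrayList<>();
        result.add(source::stream);                              // ordered, sequential
        result.add(source::parallelStream);                      // ordered, parallel
        result.add(() -> source.stream().unordered());           // no encounter-order promise, sequential
        result.add(() -> source.parallelStream().unordered());   // no encounter-order promise, parallel
        result.add(() -> source.stream().onClose(() -> { }));    // stands in for a stream requiring close()
        return result;
    }

    // Runs the same check against every variant, closing each stream afterwards.
    static <T> void checkAll(List<T> source, Consumer<Stream<T>> check) {
        for (Supplier<Stream<T>> variant : variants(source)) {
            try (Stream<T> stream = variant.get()) {
                check.accept(stream);
            }
        }
    }
}
```

An order-independent check such as a sum passes on every variant; an order-dependent one would have to be written against the weakest variant the real callers might pass in.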
Writing method guards for streams that throw an IllegalArgumentException if the input stream has characteristics that break your algorithm is difficult, because the properties are hidden.
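Some characteristics can be inspected at runtime, but only awkwardly. This hypothetical guard uses isParallel() and the ORDERED flag of the stream's Spliterator; because spliterator() is a terminal operation, the guard must return a new stream built with StreamSupport.stream. Finiteness and whether close() is required cannot be detected this way.

```java
import java.util.Spliterator;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class StreamGuards {
    // Hypothetical guard: rejects parallel or unordered input and returns a
    // replacement stream, since the original is consumed by spliterator().
    static <T> Stream<T> requireOrderedSequential(Stream<T> input) {
        if (input.isParallel()) {
            throw new IllegalArgumentException("parallel stream not supported");
        }
        Spliterator<T> split = input.spliterator(); // terminal: input cannot be reused
        if (!split.hasCharacteristics(Spliterator.ORDERED)) {
            throw new IllegalArgumentException("unordered stream not supported");
        }
        // Sequential stream over the same elements; closing it also closes the input.
        return StreamSupport.stream(split, false).onClose(input::close);
    }
}
```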
Documentation mitigates the problem, but it is flawed and often overlooked, and it does not help when a stream provider is modified. As an example, see these Javadoc excerpts from the Java 8 Files class:
/**
* [...] The returned stream encapsulates a Reader. If timely disposal of
* file system resources is required, the try-with-resources
* construct should be used to ensure that the stream's close
* method is invoked after the stream operations are completed.
*/
public static Stream<String> lines(Path path, Charset cs)
/**
* [...] no mention of closing even if this wraps the previous method
*/
public static Stream<String> lines(Path path)
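The practical consequence of that closing contract can be shown with Stream.onClose. In this hypothetical sketch, openLines stands in for a resource-holding provider such as Files.lines: a terminal operation alone does not run the close handler; only try-with-resources (or an explicit close()) does.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Stream;

final class CloseDemo {
    // Hypothetical provider: sets `released` when the stream is closed,
    // standing in for releasing a file handle.
    static Stream<String> openLines(AtomicBoolean released) {
        return Stream.of("first", "second", "third")
                     .onClose(() -> released.set(true));
    }

    static long countLeaky(AtomicBoolean released) {
        return openLines(released).count(); // terminal op only: the close handler never runs
    }

    static long countSafely(AtomicBoolean released) {
        try (Stream<String> lines = openLines(released)) {
            return lines.count(); // the close handler runs when the try block exits
        }
    }
}
```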
That leaves Stream as a valid choice in a method signature only when none of the problems above matter, typically when the stream producer and consumer are in the same codebase and all consumers are known (e.g. not part of the public interface of a class reused in many places).
It is much safer to use other data types in method signatures, with an explicit contract (and without implicit thread-pool processing involved) that makes it impossible to accidentally process data with wrong assumptions about orderedness, sizedness or parallelism (and thread pool usage).
I think it depends on your scenario. Maybe making your Team implement Iterable<Player> is sufficient.
for (Player player : team) {
System.out.println(player);
}
or in a functional style:
team.forEach(System.out::println);
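A minimal sketch of the Iterable approach with a hypothetical Player class. Once iterator() is implemented, both the enhanced for loop and the default Iterable.forEach method work; returning the iterator of an unmodifiable view keeps callers from removing players through it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical Player type, kept minimal.
final class Player {
    final String name;
    Player(String name) { this.name = name; }
    @Override public String toString() { return name; }
}

// Team exposes iteration only; the backing list itself is not handed out.
final class Team implements Iterable<Player> {
    private final List<Player> players = new ArrayList<>();

    void add(Player p) { players.add(p); }

    @Override
    public Iterator<Player> iterator() {
        // Iterator over an unmodifiable view, so Iterator.remove() is rejected.
        return Collections.unmodifiableList(players).iterator();
    }
}
```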
But if you want a more complete and fluent api, a stream could be a good solution.
Perhaps a Stream factory would be a better choice. The big win of only exposing collections via Stream is that it better encapsulates your domain model’s data structure. It’s impossible for any use of your domain classes to affect the inner workings of your List or Set simply by exposing a Stream.
It also encourages users of your domain class to write code in a more modern Java 8 style. It’s possible to incrementally refactor to this style by keeping your existing getters and adding new Stream-returning getters. Over time, you can rewrite your legacy code until you’ve finally deleted all getters that return a List or Set. This kind of refactoring feels really good once you’ve cleared out all the legacy code!
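A hypothetical mid-refactoring class in the spirit of that advice: the legacy List getter is kept but deprecated, and a Stream-returning accessor sits next to it. Album and its methods are made up for illustration; the usage shows why the legacy getter leaks internal state while the stream accessor does not give callers a handle to mutate the list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical domain class partway through the refactoring.
final class Album {
    private final List<String> tracks = new ArrayList<>();

    Album(List<String> initial) { tracks.addAll(initial); }

    /** @deprecated legacy getter that exposes the mutable backing list; prefer {@link #tracks()}. */
    @Deprecated
    List<String> getTracks() { return tracks; }

    // New accessor: callers can read the tracks but cannot add or remove through it.
    Stream<String> tracks() { return tracks.stream(); }
}
```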
I would probably have 2 methods, one to return a Collection and one to return the collection as a Stream.
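import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

class Team
{
    private final List<Player> players = new ArrayList<>(); // Player is defined elsewhere
    // ...
    // Read-only view of the backing list (a wrapper, not a copy)
    public List<Player> getPlayers()
    {
        return Collections.unmodifiableList(players);
    }
    // New sequential stream over the same backing list
    public Stream<Player> getPlayerStream()
    {
        return players.stream();
    }
}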
This is the best of both worlds. The client can choose whether they want the List or the Stream, and they don't have to create an unmodifiable wrapper around the list just to get a Stream.
This also adds only one more method to your API, so you don't end up with too many methods.
