I am confused by the statement that a stream can have only one terminal operation and that terminal operations cannot be chained. We can write something like this, right?
Stream1.map().collect().forEach()
Isn't this chaining collect and forEach, which are both terminal operations? I don't get that part, because the above works fine.
Assuming you meant collect(Collectors.toList()), forEach is a List operation, not a Stream operation. Perhaps the source of your confusion: there is a forEach method on Stream as well, but that's not what you're using.
Even if it weren't, nothing stops you from creating another stream from something that can be streamed; you just can't reuse the same stream you created in the first place.
Stream has forEach, and List has forEach (by extending Iterable). Different methods, but with the same name and purpose. Only the former is a terminal operation.
One practical difference is that the Stream version can be called on a parallel stream, and in that case, the order is not guaranteed. It might appear "random". The version from Iterable always happens on the same, calling thread. The order is guaranteed to match that of an Iterator.
Your example is terminally collecting the stream's items into a List, then calling forEach on that List.
That example is bad style because the intermediate List is useless. It's creating a List for something you could have done directly on the Stream.
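A minimal sketch of the two variants, assuming a hypothetical list of names (the class and method names are made up for illustration). The first goes through an intermediate List and ends with the List's forEach; the second uses a single terminal operation on the Stream. forEachOrdered is used in the second variant only so that the result order is guaranteed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class CollectThenForEach {
    // collect(...) is the terminal operation; the forEach that follows
    // is Iterable.forEach on the resulting List, not Stream.forEach.
    static List<String> viaIntermediateList(List<String> names) {
        List<String> out = new ArrayList<>();
        names.stream()
             .map(String::toUpperCase)
             .collect(Collectors.toList())
             .forEach(out::add);
        return out;
    }

    // Same result with one terminal operation and no throwaway List.
    static List<String> direct(List<String> names) {
        List<String> out = new ArrayList<>();
        names.stream()
             .map(String::toUpperCase)
             .forEachOrdered(out::add);
        return out;
    }
}
```

Both methods return the same elements; the second simply skips the extra allocation.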
Related
I am learning Java 8 and came across a situation in which I have to iterate over a list of strings and convert them to upper case. One possible solution is to stream the list. Among the many suggestions from IntelliJ, the two below seem useful.
list.stream()
.map(String::toUpperCase)
or
list.stream().
forEach(p -> p.toUpperCase())
I am confused about which one to use and about the use cases for all the suggestions. Can I get help on which method to use and on how to understand all those suggestions?
Stream.map() will never run unless you end the pipeline in a terminal operation, like forEach(). But calling toUpperCase() in a forEach() won't do anything either, because strings are immutable. String.toUpperCase() doesn't change the string; it returns a new one.
If you just want to update the list in-place, you can use
list.replaceAll(String::toUpperCase);
which actually replaces each element with the result of the passed function.
If you want the results in a new list, use the map() snippet with a collector:
List<String> list2 = list.stream()
.map(String::toUpperCase)
.collect(Collectors.toList());
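To make the difference concrete, here is a small sketch of the three options side by side. The class and method names are hypothetical; each method works on a copy of the input so the effects can be compared.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class UpperCaseOptions {
    // forEach throws away the String returned by toUpperCase(),
    // so the list contents stay exactly as they were.
    static List<String> forEachDiscardsResult(List<String> input) {
        List<String> list = new ArrayList<>(input);
        list.stream().forEach(p -> p.toUpperCase());
        return list;
    }

    // replaceAll stores each returned value back into the list (in-place update).
    static List<String> replaceInPlace(List<String> input) {
        List<String> list = new ArrayList<>(input);
        list.replaceAll(String::toUpperCase);
        return list;
    }

    // map + collect leaves the source untouched and produces a new list.
    static List<String> mapToNewList(List<String> input) {
        return input.stream()
                    .map(String::toUpperCase)
                    .collect(Collectors.toList());
    }
}
```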
forEach is a terminal operation that makes a difference through side effects. map is a non-terminal operation that makes a direct mapping from one element to another. For example, here is a canonical usage of forEach:
stream.forEach(System.out::println);
This will invoke, on each element of the stream, the equivalent of System.out.println(element);. However, the stream is consumed after this, and no further operations may be executed on stream afterwards. map, on the other hand, may be used like this:
streamOfObjects.map(Object::toString).collect(Collectors.toList());
In this case, each Object within streamOfObjects is mapped to a String, created by invocation of toString. Then, the stream of Strings produced by map is collected into a List using a Collector.
In any case, I'd suggest using replaceAll for this use case, as suggested by @shmosel.
As for how to understand suggestions provided by autocomplete, I would strongly suggest reading JavaDocs on the related classes.
I read some questions about how to create a finite Stream (Finite generated Stream in Java - how to create one?, How do streams stop?).
The answers suggested implementing a Spliterator. The Spliterator would implement the logic for how to provide the next element and which one (tryAdvance). But there are other non-default methods, such as trySplit and estimateSize(), which I would have to implement.
The JavaDoc of Spliterator says:
An object for traversing and partitioning elements of a source. The source of elements covered by a Spliterator could be, for example, an array, a Collection, an IO channel, or a generator function. ... The Spliterator API was designed to support efficient parallel traversal in addition to sequential traversal, by supporting decomposition as well as single-element iteration. ...
On the other hand, I could implement the logic for advancing to the next element around a Stream.Builder and bypass a Spliterator. On every advance I would call accept or add, and at the end build. So it looks quite simple.
What does the JavaDoc say?
A mutable builder for a Stream. This allows the creation of a Stream by generating elements individually and adding them to the Builder (without the copying overhead that comes from using an ArrayList as a temporary buffer.)
Using StreamSupport.stream I can use a Spliterator to obtain a Stream. And also a Builder will provide a Stream.
When should / could I use a Stream.Builder?
Only if a Spliterator wouldn't be more efficient (for instance because the source cannot be partitioned and its size cannot be estimated)?
Note that you can extend Spliterators.AbstractSpliterator. Then, there is only tryAdvance to implement.
So the complexity of implementing a Spliterator is not higher.
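As a rough illustration of that point, the sketch below extends Spliterators.AbstractSpliterator and implements only tryAdvance to produce a finite stream. The CountdownSpliterator class and its countdown helper are hypothetical names, and the size estimate is simply reported as unknown.

```java
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class CountdownSpliterator extends Spliterators.AbstractSpliterator<Integer> {
    private int next;

    CountdownSpliterator(int start) {
        // Long.MAX_VALUE means "size unknown"; trySplit is inherited.
        super(Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL);
        this.next = start;
    }

    @Override
    public boolean tryAdvance(Consumer<? super Integer> action) {
        if (next <= 0) {
            return false;              // no more elements: the stream ends here
        }
        Integer value = next--;        // produced only when the stream asks for it
        action.accept(value);
        return true;
    }

    // Wraps the spliterator in a sequential stream.
    static Stream<Integer> countdown(int start) {
        return StreamSupport.stream(new CountdownSpliterator(start), false);
    }
}
```

Calling countdown(3) yields the elements 3, 2, 1 and then stops, because tryAdvance returns false once the counter reaches zero.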
The fundamental difference is that a Spliterator’s tryAdvance method is only invoked when a new element is needed. In contrast, the Stream.Builder has a storage which will be filled with all stream elements, before you can acquire a Stream.
So a Spliterator is the first choice for all kinds of lazy evaluations, as well as when you have an existing storage you want to traverse, to avoid copying the data.
The builder is the first choice when the creation of the elements is non-uniform, so you can’t express the creation of an element on demand. Think of situations where you would otherwise use Stream.of(…), but it turns out to be too inflexible.
E.g. you have Stream.of(a, b, c, d, e), but now it turns out, c and d are optional. So the solution is
Stream.Builder<MyType> builder = Stream.builder();
builder.add(a).add(b);
if(someCondition) builder.add(c).add(d);
builder.add(e).build()
/* stream operations */
Other use cases are this answer, where a Consumer was needed to query an existing spliterator and push the value back to a Stream afterwards, or this answer, where a structure without random access (a class hierarchy) should be streamed in the opposite order.
On the other hand, I could implement the logic for advancing to the next element around a Stream.Builder and bypass a Spliterator. On every advance I would call accept or add, and at the end build. So it looks quite simple.
Yes and no. It is simple, but I don't think you understand the usage model:
A stream builder has a lifecycle, which starts in a building phase, during which elements can be added, and then transitions to a built phase, after which elements may not be added. The built phase begins when the build() method is called, which creates an ordered Stream whose elements are the elements that were added to the stream builder, in the order they were added.
(Javadocs)
In particular no, you would not invoke a Stream.Builder's accept or add method on any stream advance. You need to provide all the objects for the stream in advance. Then you build() to get a stream that will provide all the objects you previously added. This is analogous to adding all the objects to a List, and then invoking that List's stream() method.
If that serves your purposes and you can in fact do it efficiently then great! But if you need to generate elements on an as-needed basis, whether with or without limit, then Stream.Builder cannot help you. Spliterator can.
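The lifecycle described above can be sketched as follows. The class and method names are hypothetical; the IllegalStateException on a late add is the behavior documented for Stream.Builder once it has reached the built state.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class BuilderLifecycle {
    // Building phase: every element is supplied up front.
    // build() then yields them in the order they were added.
    static List<String> buildAll(boolean includeOptional) {
        Stream.Builder<String> builder = Stream.builder();
        builder.add("a").add("b");
        if (includeOptional) {
            builder.add("c");
        }
        return builder.build().collect(Collectors.toList());
    }

    // Built phase: adding another element is rejected.
    static boolean addAfterBuildFails() {
        Stream.Builder<String> builder = Stream.builder();
        builder.add("a");
        builder.build();
        try {
            builder.add("late");
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }
}
```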
The Stream.Builder is a misnomer as streams can't really be built. Things that can be built are value objects - dto, array, collection.
So if Stream.Builder is instead thought of as a buffer, it might be easier to understand, e.g.:
buffer.add(a)
buffer.add(b)
buffer.stream()
This shows how similar it is to an ArrayList:
list.add(a)
list.add(b)
list.stream()
On the other hand, Spliterator is the basis of a stream and allows for efficient navigation over data sets (improved version of the Iterator).
So the answer is they should not be compared. Comparing Stream.Builder to Spliterator is the same as comparing ArrayList to Spliterator.
I am unable to iterate a second time over a stream created from an Iterable's spliterator. I could not find any documentation about this.
Here is what I am doing:
I get an Iterable as a function argument, and I iterate over it via a stream with the following code:
StreamSupport.stream(values.spliterator(), false)
After that I do the same thing again, but the second stream does not iterate at all. I spent a lot of time debugging this and finally converted the Iterable to a list at the very beginning.
Does anyone know the reason?
Edit: Sorry if I was not clear.
I was not using the same stream multiple times; I was creating a new stream in the above way from the same Iterable each time.
The Iterable is the one passed to reduce in a MapReduce job.
Thanks,
Hareendra
A Stream is a one-shot object. You can consume it only once, not multiple times. If you want to use the contents multiple times, you have to do what you did: convert the stream to a list, an array, or anything else that is not a stream, and then create two new streams from it for the two things you want to do.
Quote from JavaDoc of Stream class:
A stream should be operated on (invoking an intermediate or terminal stream operation) only once. This rules out, for example, "forked" streams, where the same source feeds two or more pipelines, or multiple traversals of the same stream.
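A short sketch of what this means in practice, with hypothetical class and method names: reusing the same Stream object fails with an IllegalStateException, while creating a fresh stream from a re-traversable source such as a List works for each pipeline.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class OneShotStream {
    // The second terminal operation on the same Stream object is rejected.
    static boolean secondTraversalFails(List<String> values) {
        Stream<String> stream = values.stream();
        stream.collect(Collectors.toList());       // first traversal consumes the stream
        try {
            stream.collect(Collectors.toList());   // same stream again
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    // Two independent pipelines, each on its own stream from the same List.
    static long[] twoPipelines(List<String> values) {
        long total = values.stream().count();
        long nonEmpty = values.stream().filter(s -> !s.isEmpty()).count();
        return new long[] { total, nonEmpty };
    }
}
```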
Be sure that the Iterable instance you are using to create your Spliterator truly honors the Iterable contract. Some people make the mistake of thinking that anything that implements iterator() will serve as an Iterable, and that is not the case. To comply with the Iterable contract, one must be able to call iterator() multiple times and be able to iterate with it each time.
Because it is very easy to create an Iterable from anything that has an iterator() function, I have seen several cases of a manufactured Iterable exhibiting the behavior you mention. For example, one can do this:
Stream<String> stream = ...
Iterable<String> falseIterable = stream::iterator;
falseIterable does not follow required Iterable semantics because falseIterable.iterator(), being a wrapper around stream.iterator(), will not return a usable Iterator a second time, once it has been iterated over.
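The sketch below reproduces that behavior and shows one possible compliant alternative. The names are hypothetical, and the alternative assumes the data lives in a source (here a List) that can produce a fresh stream on every iterator() call.

```java
import java.util.List;
import java.util.stream.Stream;

class FalseIterableDemo {
    static int countElements(Iterable<String> iterable) {
        int n = 0;
        for (String ignored : iterable) {
            n++;
        }
        return n;
    }

    // Wraps a single Stream: only the first iterator() call can succeed.
    static boolean secondIterationFails() {
        Stream<String> stream = Stream.of("x", "y");
        Iterable<String> falseIterable = stream::iterator;
        countElements(falseIterable);         // first pass works
        try {
            countElements(falseIterable);     // calls stream.iterator() again
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    // Honors the Iterable contract: each iterator() call builds a new stream.
    static int[] reiterable(List<String> source) {
        Iterable<String> iterable = () -> source.stream().iterator();
        return new int[] { countElements(iterable), countElements(iterable) };
    }
}
```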
Let's say you have a collection with some strings and you want to return the first two characters of each string (or some other manipulation...).
In Java 8 for this case you can use either the map or the forEach methods on the stream() which you get from the collection (maybe something else but that is not important right now).
Personally I would use the map primarily because I associate forEach with mutating the collection and I want to avoid this. I also created a really small test regarding the performance but could not see any improvements when using forEach (I perfectly understand that small tests cannot give reliable results but still).
So what are the use-cases where one should choose forEach?
map is the better choice for this, because you're not trying to do anything with the strings yet, just map them to different strings.
forEach is designed to be the "final operation." As such, it doesn't return anything, and is all about mutating some state -- though not necessarily that of the original collection. For instance, you might use it to write elements to a file, having used other constructs (including map) to get those elements.
forEach terminates the stream and is executed for the side effect of the called Consumer. It does not necessarily mutate the stream members.
map maps each stream element to a different value/object using a provided Function. A Stream<R> is returned on which more steps can act.
The forEach terminal operation might be useful in several cases: when you want to collect into some older class for which you don't have a proper collector, or when you don't want to collect at all but send your data somewhere outside (write to a database, print to an OutputStream, etc.). There are many cases when the best way is to use both map (as the intermediate operation) and forEach (as the terminal operation).
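A minimal sketch of combining the two, using the "first two characters" example from the question. The names are hypothetical, and a StringBuilder stands in for an external sink such as a file or database. forEachOrdered is used so the output order is guaranteed; plain forEach would do where order does not matter.

```java
import java.util.List;

class MapThenForEach {
    // map does the transformation; the terminal operation performs the side effect.
    static String firstTwoChars(List<String> words) {
        StringBuilder sink = new StringBuilder();   // stand-in for a file or database
        words.stream()
             .map(w -> w.length() > 2 ? w.substring(0, 2) : w)
             .forEachOrdered(p -> sink.append(p).append('\n'));
        return sink.toString();
    }
}
```

The source list is never modified; only the sink changes.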
This is an example:
Code A:
files.forEach(f -> {
//TODO
});
and code B does it this way:
files.stream().forEach(f -> { });
What is the difference between both, with stream() and no stream()?
Practically speaking, they are mostly the same, but there is a small semantic difference.
Code A is defined by Iterable.forEach, whereas code B is defined by Stream.forEach. The definition of Stream.forEach allows for the elements to be processed in any order -- even for sequential streams. (For parallel streams, Stream.forEach will very likely process elements out-of-order.)
Iterable.forEach gets an Iterator from the source and calls forEachRemaining() on it. As far as I can see, all current (JDK 8) implementations of Stream.forEach on the collections classes will create a Spliterator built from one of the source's Iterators, and will then call forEachRemaining on that Iterator -- just like Iterable.forEach does. So they do the same thing, though the streams version has some extra setup overhead.
However, in the future, it's possible that the streams implementation could change so that this is no longer the case.
(If you want to guarantee the order in which stream elements are processed, use forEachOrdered() instead.)
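As an illustration, the sketch below runs forEachOrdered on a parallel stream and records the order in which elements are processed. The class and helper names are hypothetical. A plain ArrayList is sufficient here because forEachOrdered processes one element at a time in encounter order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class OrderedForEach {
    // Records the processing order of a parallel stream under forEachOrdered.
    static List<Integer> inEncounterOrder(List<Integer> source) {
        List<Integer> seen = new ArrayList<>();
        source.parallelStream().forEachOrdered(seen::add);
        return seen;
    }

    // Helper: the list 0, 1, ..., n-1.
    static List<Integer> range(int n) {
        return IntStream.range(0, n).boxed().collect(Collectors.toList());
    }
}
```

With plain forEach on the same parallel stream, the recorded order could differ from run to run, and the unsynchronized ArrayList would no longer be safe.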
There is no difference in terms of semantics, though the direct implementation without stream is probably slightly more efficient.
A stream is a sequence of elements drawn from a source (it is not itself a data structure) on which operations are performed. Any Collection can be exposed as a stream. The operations you perform on a stream can be either:
Intermediate operations (map, skip, distinct, filter, sorted, limit, peek, ...), which produce another java.util.stream.Stream. They are lazy and are executed only when a terminal operation runs.
Terminal operations (forEach, max, count, anyMatch, findFirst, reduce, collect, findAny, and sum on the primitive streams), which produce a result that is not a stream.
Basically it is similar to a pipeline in Unix.
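The laziness of intermediate operations can be observed with a small sketch like the one below (hypothetical names). It counts how often the function passed to map has run before and after the terminal operation.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyPipeline {
    // Returns {map calls before the terminal operation, map calls after it}.
    static int[] mapCallCounts(List<String> words) {
        AtomicInteger calls = new AtomicInteger();
        Stream<String> pipeline = words.stream()
            .map(w -> {
                calls.incrementAndGet();
                return w.toUpperCase();
            });
        int before = calls.get();                 // pipeline is only described so far
        pipeline.collect(Collectors.toList());    // terminal operation pulls the data
        int after = calls.get();
        return new int[] { before, after };
    }
}
```

For a three-element list the first count is 0 and the second is 3.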
Both approaches visit every element, but the version with .stream() also creates an unnecessary Stream object for the List. While the result is the same, it is suboptimal.