Let's say you have a collection with some strings and you want to return the first two characters of each string (or some other manipulation...).
In Java 8 for this case you can use either the map or the forEach methods on the stream() which you get from the collection (maybe something else but that is not important right now).
Personally I would use the map primarily because I associate forEach with mutating the collection and I want to avoid this. I also created a really small test regarding the performance but could not see any improvements when using forEach (I perfectly understand that small tests cannot give reliable results but still).
So what are the use-cases where one should choose forEach?
map is the better choice for this, because you're not trying to do anything with the strings yet, just map them to different strings.
forEach is designed to be the "final operation." As such, it doesn't return anything, and is all about mutating some state -- though not necessarily that of the original collection. For instance, you might use it to write elements to a file, having used other constructs (including map) to get those elements.
forEach terminates the stream and is exectued because of the side effect of the called Cosumer. It does not necessarily mutate the stream members.
map maps each stream element to a different value/object using a provided Function. A Stream <R> is returned on which more steps can act.
The forEach terminal operation might be useful in several cases: when you want to collect into some older class for which you don't have a proper collector or when you don't want to collect at all, but send you data somewhere outside (write into the database, print into OutputStream, etc.). There are many cases when the best way is to use both map (as intermediate operation) and forEach (as terminal operation).
Related
I have this concern when it is said there can be one terminal operation and terminal operations cannot be chained. we can write something like this right?
Stream1.map().collect().forEach()
Isn’t this chaining collect and forEach which are both terminal operations. I don’t get that part
The above works fine
Because
Assuming you meant collect(Collectors.toList()), forEach is a List operation, not a Stream operation. Perhaps the source of your confusion: there is a forEach method on Stream as well, but that's not what you're using.
Even if it weren't, nothing stops you from creating another stream from something that can be streamed, you just can't use the same stream you created in the first place.
Stream has forEach, and List has forEach (by extending Iterable). Different methods, but with the same name and purpose. Only the former is a terminal operation.
One practical difference is that the Stream version can be called on a parallel stream, and in that case, the order is not guaranteed. It might appear "random". The version from Iterable always happens on the same, calling thread. The order is guaranteed to match that of an Iterator.
Your example is terminally collecting the stream's items into a List, then calling forEach on that List.
That example is bad style because the intermediate List is useless. It's creating a List for something you could have done directly on the Stream.
I am learning Java 8 and came across a situation. Where in I have to iterate over a list of strings and then convert them to upperCase. The possible solutions would be to stream the list. Among many suggestions from Intellij the below two seems to be useful.
list.stream()
.map(String::toUpperCase)
or
list.stream().
forEach(p -> p.toUpperCase())
I am confused on which one to use and the use cases for all the Suggestions. Can I get help regarding which method to use and how to understand using all those suggestions?
Stream.map() will never run unless you end the pipeline in a terminal operation, like forEach(). But calling toUpperCase() in a forEach() won't do anything either, because strings are immutable. String.toUpperCase() doesn't change the string; it returns a new one.
If you just want to update the list in-place, you can use
list.replaceAll(String::toUpperCase);
which actually replaces each element with the result of the passed function.
If you want the results in a new list, use the map() snippet with a collector:
List<String> list2 = list.stream()
.map(String::toUpperCase)
.collect(Collectors.toList());
forEach is an terminal operation that makes a difference through side effects. map is a non-terminal operation that makes a direct mapping from one element to another. For example, here is a canonical usage of forEach:
stream.forEach(System.out::println);
This will invoke, on each element of the stream, the equivalent of System.out.println(element);. However, the stream will be closed after this, and no operations may be executed on stream afterwards. map, on the other hand, may be used like this:
streamOfObjects.map(Object::toString).collect(Collectors.toList());
In this case, each Object within streamOfObjects is mapped to a String, created by invocation of toString. Then, the stream of Strings produced by map is collected into a List using a Collector.
In any case, I'd suggest using replaceAll for this use case, as suggested by #shmosel.
As for how to understand suggestions provided by autocomplete, I would strongly suggest reading JavaDocs on the related classes.
I am unable to unable to iterate (second time) over the stream created with Steam.spliterator . I could not find documentation about the same.
Here is what i am doing:
I got a Iterable as funciton argument and I am iterating this via stream like following code :
StreamSupport.stream(values.spliterator(), false)
and following that i am doing it again but the second one do not iterate at all. I spent lot of time debugging it and finally converted the iterable to a list in the beginning itself.
Do any of you guys know the reason ?
Edit: Sorry if i am not clear ,
I was not using the stream multiple times , I was generating the stream in the above way with the same Iterable.
Iterable is the one coming from reduce in MapReduce job.
Thanks,
Hareendra
A Stream is a one-shot object. You can consume it only once, not multiple times. If you want to use the contents multiple times you have to do like you did, converting the stream to a list or array or anything non-streamy and then create two new streams out of it for the two things you want to do.
Quote from JavaDoc of Stream class:
A stream should be operated on (invoking an intermediate or terminal stream operation) only once. This rules out, for example, "forked" streams, where the same source feeds two or more pipelines, or multiple traversals of the same stream.
Be sure that the Iterable instance you are using to create your Spliterator truly honors the Iterable contract. Some people make the mistake of thinking that anything that implements iterator() will serve as an Iterable, and that is not the case. To comply with the Iterable contract, one must be able to call iterator() multiple times and be able to iterate with it each time.
Because it is very easy to create an Iterable from anything that has an iterator() function, I have seen several cases of a manufactured Iterable exhibiting the behavior you mention. For example, one can do this:
Stream<String> stream = ...
Iterable<String> falseIterable = stream::iterator;
falseIterable does not follow required Iterable semantics because falseIterable.iterator(), being a wrapper around stream.iterator(), will not return a usable Iterator a second time, once it has been iterated over.
Consider the following example that prints the maximum element in a List :
List<Integer> list = Arrays.asList(1,4,3,9,7,4,8);
list.stream().max(Comparator.naturalOrder()).ifPresent(System.out::println);
The same objective can also be achieved using the Collections.max method :
System.out.println(Collections.max(list));
The above code is not only shorter but also cleaner to read (in my opinion). There are similar examples that come to mind such as the use of binarySearch vs filter used in conjunction with findAny.
I understand that Stream can be an infinite pipeline as opposed to a Collection that is limited by the memory available to the JVM. This would be my criteria for deciding whether to use a Stream or the Collections API. Are there any other reasons for choosing Stream over the Collections API (such as performance). More generally, is this the only reason to chose Stream over older API that can do the job in a cleaner and shorter way?
Stream API is like a Swiss Army knife: it allows you to do quite complex operations by combining the tools effectively. On the other hand if you just need a screwdriver, probably the standalone screwdriver would be more convenient. Stream API includes many things (like distinct, sorted, primitive operations etc.) which otherwise would require you to write several lines and introduce intermediate variables/data structures and boring loops drawing the programmer attention from the actual algorithm. Sometimes using the Stream API can improve the performance even for sequential code. For example, consider some old API:
class Group {
private Map<String, User> users;
public List<User> getUsers() {
return new ArrayList<>(users.values());
}
}
Here we want to return all the users of the group. The API designer decided to return a List. But it can be used outside in a various ways:
List<User> users = group.getUsers();
Collections.sort(users);
someOtherMethod(users.toArray(new User[users.size]));
Here it's sorted and converted to array to pass to some other method which happened to accept an array. In the other place getUsers() may be used like this:
List<User> users = group.getUsers();
for(User user : users) {
if(user.getAge() < 18) {
throw new IllegalStateException("Underage user in selected group!");
}
}
Here we just want to find the user matched some criteria. In both cases copying to intermediate ArrayList was actually unnecessary. When we move to Java 8, we can replace getUsers() method with users():
public Stream<User> users() {
return users.values().stream();
}
And modify the caller code. The first one:
someOtherMethod(group.users().sorted().toArray(User[]::new));
The second one:
if(group.users().anyMatch(user -> user.getAge() < 18)) {
throw new IllegalStateException("Underage user in selected group!");
}
This way it's not only shorter, but may work faster as well, because we skip the intermediate copying.
The other conceptual point in Stream API is that any stream code written according to the guidelines can be parallelized simply by adding the parallel() step. Of course this will not always boost the performance, but it helps more often than I expected. Usually if the operation executed sequentially for 0.1ms or longer, it can benefit from the parallelization. Anyways we haven't seen such simple way to do the parallel programming in Java before.
Of course, it always depends on the circumstances. Take you initial example:
List<Integer> list = Arrays.asList(1,4,3,9,7,4,8);
list.stream().max(Comparator.naturalOrder()).ifPresent(System.out::println);
If you want to do the same thing efficiently, you would use
IntStream.of(1,4,3,9,7,4,8).max().ifPresent(System.out::println);
which doesn’t involve any auto-boxing. But if your assumption is to have a List<Integer> beforehand, that might not be an option, so if you are just interested in the max value, Collections.max might be the simpler choice.
But this would lead to the question why you have a List<Integer> beforehand. Maybe, it’s the result of old code (or new code written using old thinking), which had no other choice than using boxing and Collections as there was no alternative in the past?
So maybe you should think about the source producing the collection, before bother with how to consume it (or well, think about both at the same time).
If all you have is a Collection and all you need is a single terminal operation for which a simple Collection based implementation exists, you may use it directly without bother with the Stream API. The API designers acknowledged this idea as they added methods like forEach(…) to the Collection API instead of insisting of everyone using stream().forEach(…). And Collection.forEach(…) is not a simple short-hand for Collection.stream().forEach(…), in fact, it’s already defined on the more abstract Iterable interface which even hasn’t a stream() method.
Btw., you should understand the difference between Collections.binarySearch and Stream.filter/findAny. The former requires the collection to be sorted and if that prerequisite is met, might be the better choice. But if the collection isn’t sorted, a simple linear search is more efficient than sorting just for a single use of binary search, not to speak of the fact, that binary search works with Lists only while filter/findAny works with any stream supporting every kind of source collection.
This question already has an answer here:
Why doesn't java.util.Collection implement the new Stream interface?
(1 answer)
Closed 8 years ago.
This is a question about API desing. When extension methods were added in C#, IEnumerable got all the methods that enabled using lambda expression directly on all Collections.
With the advent of lambdas and default methods in Java, I would expect that Collection would implement Stream and provide default implementations for all its methods. This way, we would not need to call stream() in order to leverage the power it provides.
What is the reason the library architects opted for the less convenient approach?
From Maurice Naftalin's Lambda FAQ:
Why are Stream operations not defined directly on Collection?
Early drafts of the API exposed methods like filter, map, and reduce on Collection or Iterable. However, user experience with this design led to a more formal separation of the “stream” methods into their own abstraction. Reasons included:
Methods on Collection such as removeAll make in-place modifications, in contrast to the new methods which are more functional in nature. Mixing two different kinds of methods on the same abstraction forces the user to keep track of which are which. For example, given the declaration
Collection strings;
the two very similar-looking method calls
strings.removeAll(s -> s.length() == 0);
strings.filter(s -> s.length() == 0); // not supported in the current API
would have surprisingly different results; the first would remove all empty String objects from the collection, whereas the second would return a stream containing all the non-empty Strings, while having no effect on the collection.
Instead, the current design ensures that only an explicitly-obtained stream can be filtered:
strings.stream().filter(s.length() == 0)...;
where the ellipsis represents further stream operations, ending with a terminating operation. This gives the reader a much clearer intuition about the action of filter;
With lazy methods added to Collection, users were confused by a perceived—but erroneous—need to reason about whether the collection was in “lazy mode” or “eager mode”. Rather than burdening Collection with new and different functionality, it is cleaner to provide a Stream view with the new functionality;
The more methods added to Collection, the greater the chance of name collisions with existing third-party implementations. By only adding a few methods (stream, parallel) the chance for conflict is greatly reduced;
A view transformation is still needed to access a parallel view; the asymmetry between the sequential and the parallel stream views was unnatural. Compare, for example
coll.filter(...).map(...).reduce(...);
with
coll.parallel().filter(...).map(...).reduce(...);
This asymmetry would be particularly obvious in the API documentation, where Collection would have many new methods to produce sequential streams, but only one to produce parallel streams, which would then have all the same methods as Collection. Factoring these into a separate interface, StreamOps say, would not help; that would still, counterintuitively, need to be implemented by both Stream and Collection;
A uniform treatment of views also leaves room for other additional views in the future.
A Collection is an object model
A Stream is a subject model
Collection definition in doc :
A collection represents a group of objects, known as its elements.
Stream definition in doc :
A sequence of elements supporting sequential and parallel aggregate operations
Seen this way, a stream is a specific collection. Not the way around. Thus Collection should not Implement Stream, regardless of backward compatibility.
So why doesnt Stream<T> implement Collection<T> ? Because It is another way of looking at a bunch of objects. Not as a group of elements, but by the operations you can perform on it. Thus this is why I say a Collection is an object model while a Stream is a subject model
First, from the documentation of Stream:
Collections and streams, while bearing some superficial similarities, have different goals. Collections are primarily concerned with the efficient management of, and access to, their elements. By contrast, streams do not provide a means to directly access or manipulate their elements, and are instead concerned with declaratively describing their source and the computational operations which will be performed in aggregate on that source.
So you want to keep the concepts of stream and collection appart. If Collection would implement Stream every collection would be a stream, which it is conceptually not. The way it is done now, every collection can give you a stream which works on that collection, which is something different if you think about it.
Another factor that comes to mind is cohesion/coupling as well as encapsulation. If every class that implements Collection had to implement the operations of Stream as well, it would have two (kind of) different purposes and might become too long.
My guess would be that it was made that way to avoid breakage with existing code that implements Collection. It would be hard to provide a default implementation that worked correctly with all existing implementations.