Java unordered() function

In java 8, when I do,
Type 1
list.stream().parallel().map(/**/).unordered().filter(/**/).collect(/**/);
Type 2
list.stream().parallel().unordered().map(/**/).filter(/**/).collect(/**/);
As both streams are parallel, I understand that the objects flowing through each operation like filter, map, etc. will be processed in parallel, but the operations themselves will be applied in the order defined in the pipeline.
Questions
1. In Type 1, I call unordered() after the map() operation. So does the map() operation try to maintain ordering, because it comes before unordered()?
2. In Type 2, ordering is not maintained across the map and filter operations, right? Is my understanding correct?

There are 3 Stream state-modifying methods:
sequential()
Returns an equivalent stream that is sequential. May return itself, either because the stream was already sequential, or because the underlying stream state was modified to be sequential.
parallel()
Returns an equivalent stream that is parallel. May return itself, either because the stream was already parallel, or because the underlying stream state was modified to be parallel.
unordered()
Returns an equivalent stream that is unordered. May return itself, either because the stream was already unordered, or because the underlying stream state was modified to be unordered.
As you can see, all three may modify the underlying stream state, which means that the position of these methods in the stream chain doesn't matter.
Your two examples are therefore equivalent, and so are these:
list.stream().parallel().map(/**/).filter(/**/).unordered().collect(/**/);
list.stream().map(/**/).filter(/**/).unordered().parallel().collect(/**/);
list.stream().unordered().map(/**/).parallel().filter(/**/).collect(/**/);
list.stream().unordered().parallel().map(/**/).filter(/**/).collect(/**/);
You should click on the unordered link and read the javadoc to learn more about ordering of streams.
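To see concretely that the position of unordered() makes no difference to the pipeline's ordering, you can inspect the ORDERED characteristic of the stream's spliterator. This is a minimal sketch (the class and helper names are made up for illustration; note that Stream.spliterator() is itself a terminal operation):

```java
import java.util.List;
import java.util.Spliterator;
import java.util.function.Function;
import java.util.stream.Stream;

public class UnorderedPosition {
    // True if the terminal spliterator of the pipeline still reports ORDERED.
    static boolean isOrdered(Stream<String> pipeline) {
        return pipeline.spliterator().hasCharacteristics(Spliterator.ORDERED);
    }

    public static void main(String[] args) {
        List<String> list = List.of("a", "b", "c");

        // Type 1: unordered() after map() -- the ORDERED flag is cleared for the whole pipeline
        boolean type1 = isOrdered(list.stream().parallel().map(Function.identity()).unordered());

        // Type 2: unordered() before map() -- same effect
        boolean type2 = isOrdered(list.stream().parallel().unordered().map(Function.identity()));

        // Control: without unordered(), a List-backed stream stays ordered
        boolean control = isOrdered(list.stream().parallel().map(Function.identity()));

        System.out.println(type1 + " " + type2 + " " + control);
    }
}
```

Both pipelines report the same characteristics, regardless of where unordered() sits in the chain.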

The effect of .unordered() is only to remove the constraint that the stream must remain ordered, so the intermediate operations in the pipeline are no longer bound by an ordering constraint. In the example provided, assuming the operations involved are not stateful, .unordered() has no effect.
Here are some helpful quotes from the docs:
Stream Operations:
Traversal of the pipeline source does not begin until the terminal
operation of the pipeline is executed.
So all of the effects of intermediate operations are consolidated and operate on an optimized representation of the input data. This means that, regardless of where an intermediate operation appears, it affects the operation of the whole pipeline in the same way. This is true for parallel and sequential streams alike.
Ordering:
However, if the source has no defined encounter order, then any
permutation of the values [2, 4, 6] would be a valid result.
This is related to your question about Type 1 (maintaining ordering). For the pipelines you have, this quote means there is nothing that will maintain order.
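A sketch of that point (the class name is illustrative; requires Java 10+ for Set.copyOf): an ordered parallel pipeline must still deliver its result in encounter order, while an unordered one only guarantees the same set of elements.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class UnorderedEffect {
    // Ordered parallel pipeline: collect() must respect the encounter order,
    // so the resulting List matches the source order despite parallelism.
    static List<Integer> doubledOrdered() {
        return IntStream.range(0, 1_000)
                .parallel()
                .map(x -> x * 2)
                .boxed()
                .collect(Collectors.toList());
    }

    // Unordered parallel pipeline: same elements, but no encounter-order guarantee.
    static Set<Integer> doubledUnordered() {
        return IntStream.range(0, 1_000)
                .parallel()
                .unordered()
                .map(x -> x * 2)
                .boxed()
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<Integer> ordered = doubledOrdered();
        System.out.println(ordered.get(0) + ", " + ordered.get(999)); // 0, 1998
        System.out.println(doubledUnordered().equals(Set.copyOf(ordered))); // true
    }
}
```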

Related

Terminal operations on streams cannot be chained?

I have a concern about the statement that there can be only one terminal operation and that terminal operations cannot be chained. We can write something like this, right?
stream1.map(...).collect(...).forEach(...)
Isn't this chaining collect and forEach, which are both terminal operations? I don't get that part.
The above works fine because, assuming you meant collect(Collectors.toList()), forEach here is a List operation, not a Stream operation. Perhaps the source of your confusion: there is a forEach method on Stream as well, but that's not what you're using.
Even if it weren't, nothing stops you from creating another stream from something that can be streamed, you just can't use the same stream you created in the first place.
Stream has forEach, and List has forEach (by extending Iterable). Different methods, but with the same name and purpose. Only the former is a terminal operation.
One practical difference is that the Stream version can be called on a parallel stream, in which case the order is not guaranteed and may appear "random". The version from Iterable always runs on the calling thread, and the order is guaranteed to match that of an Iterator.
Your example is terminally collecting the stream's items into a List, then calling forEach on that List.
That example is bad style, though, because the intermediate List is useless: it creates a List for something you could have done directly on the Stream.
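A sketch contrasting the two forms (class and method names are made up for illustration); forEachOrdered is used on the stream side so the comparison is deterministic:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TerminalChaining {
    // collect() terminates the stream; the forEach that follows is
    // List.forEach (inherited from Iterable), not Stream.forEach.
    static List<String> viaIntermediateList() {
        List<String> out = new ArrayList<>();
        Stream.of("a", "b", "c")
              .map(String::toUpperCase)
              .collect(Collectors.toList())
              .forEach(out::add);
        return out;
    }

    // Same result without the useless intermediate List:
    // forEachOrdered is the single terminal operation.
    static List<String> directOnStream() {
        List<String> out = new ArrayList<>();
        Stream.of("a", "b", "c")
              .map(String::toUpperCase)
              .forEachOrdered(out::add);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(viaIntermediateList().equals(directOnStream())); // true
    }
}
```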

What is Time Complexity for Arrays.stream() O(n) or O(n log n)?

Stream API documentation says:
certain stream sources (such as List or arrays) are intrinsically
ordered, whereas others (such as HashSet) are not.
What would be the time complexity of Arrays.stream() method?
O(n log n), because it returns a sorted array, or O(n), as we would expect from stream() methods?
Neither of these.
Firstly, it seems like you're confusing a stream whose elements are sorted with an ordered stream, i.e. a stream that has a defined encounter order of elements.
Whether a stream is ordered or not depends on the stream source and intermediate operations in it.
A stream created over an array, or over an ordered collection like a List or a Queue, is ordered with respect to the order of its elements, but that does not imply that such a stream is sorted.
We can make a stream unordered by applying the unordered() operation to it. This operation alone does not change the stream's data, but it affects the execution of stateful intermediate operations like takeWhile() that may require buffering, and of terminal operations like reduce() and collect() that otherwise guarantee to respect the initial encounter order. As a result, a parallel unordered stream might have better performance because this constraint is relaxed.
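A small sketch of the "data is unchanged" point (class name is illustrative): with or without unordered(), a parallel distinct() pipeline yields the same elements.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class UnorderedData {
    static Set<Integer> distinctOrdered(List<Integer> source) {
        return source.parallelStream().distinct().collect(Collectors.toSet());
    }

    static Set<Integer> distinctUnordered(List<Integer> source) {
        // unordered() lets distinct() skip the bookkeeping needed to keep the
        // *first* occurrence of each duplicate; the surviving elements are the same.
        return source.parallelStream().unordered().distinct().collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<Integer> source = List.of(3, 1, 2, 3, 1);
        System.out.println(distinctOrdered(source).equals(distinctUnordered(source))); // true
    }
}
```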
Here is a quote from the API documentation:
Ordering
Streams may or may not have a defined encounter order. Whether or
not a stream has an encounter order depends on the source and the
intermediate operations. Certain stream sources (such as List or arrays) are
intrinsically ordered, whereas others (such as HashSet)
are not. Some intermediate operations, such as sorted(), may impose an
encounter order on an otherwise unordered stream, and others may
render an ordered stream unordered, such as BaseStream.unordered().
Further, some terminal operations may ignore encounter order, such as
forEach().
If a stream is ordered, most operations are constrained to operate on
the elements in their encounter order; if the source of a stream is a
List containing [1, 2, 3], then the result of executing map(x -> x*2)
must be [2, 4, 6]. However, if the source has no defined encounter
order, then any permutation of the values [2, 4, 6] would be a valid
result.
For sequential streams, the presence or absence of an encounter order
does not affect performance, only determinism. If a stream is ordered,
repeated execution of identical stream pipelines on an identical
source will produce an identical result; if it is not ordered,
repeated execution might produce different results.
For parallel streams, relaxing the ordering constraint can sometimes
enable more efficient execution. Certain aggregate operations, such as
filtering duplicates (distinct()) or grouped reductions
(Collectors.groupingBy()) can be implemented more efficiently if
ordering of elements is not relevant. Similarly, operations that are
intrinsically tied to encounter order, such as limit(), may require
buffering to ensure proper ordering, undermining the benefit of
parallelism. In cases where the stream has an encounter order, but the
user does not particularly care about that encounter order, explicitly
de-ordering the stream with unordered() may improve parallel
performance for some stateful or terminal operations. However, most
stream pipelines, such as the "sum of weight of blocks" example above,
still parallelize efficiently even under ordering constraints.
Secondly, because you're assuming that creating a stream over an array costs at least O(n), you might have a misconception about the nature of streams.
In essence, a stream is a means of iteration; it is not a container of data like a Collection.
Creating a stream doesn't require copying all the data from the source into memory; we're only creating an internal iterator over the data source, and this action has a time complexity of O(1).
Streams are lazy: every action in the stream pipeline occurs only when it's needed, and elements from the source are processed one by one.
For instance, let's assume we have an integer array containing 1,000,000 elements, and we want to get the first 10 elements from it as hexadecimal strings:
List<String> result = Arrays.stream(sourceArray)
        .mapToObj(Integer::toHexString)
        .limit(10)
        .toList();
On execution, only the first 10 elements would be retrieved from the source array, and then the stream would immediately terminate, producing the result.
The overall time complexity of such a stream would be O(1) because we care only about a fixed number of elements at the very beginning, and don't need all the data that the source contains.
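This laziness can be made visible with a counter in peek() (used here purely for demonstration; the class and method names are made up, and toList() requires Java 16+):

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class LazyLimit {
    // Returns how many elements were actually pulled from a 1,000,000-element source.
    static int elementsVisited() {
        int[] sourceArray = IntStream.range(0, 1_000_000).toArray();
        AtomicInteger visited = new AtomicInteger();

        List<String> result = Arrays.stream(sourceArray)
                .peek(x -> visited.incrementAndGet()) // count elements as they flow past
                .mapToObj(Integer::toHexString)
                .limit(10)
                .toList();

        return visited.get();
    }

    public static void main(String[] args) {
        System.out.println(elementsVisited()); // 10, not 1,000,000
    }
}
```

Only ten elements ever travel through the pipeline; the other 999,990 are never touched.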

What is the order in which stream operations are applied to list elements? [duplicate]

This question already has answers here:
How to ensure order of processing in java8 streams?
(2 answers)
Closed 6 years ago.
Suppose we have a standard method chain of stream operations:
Arrays.asList("a", "bc", "def").stream()
        .filter(e -> e.length() != 2)
        .map(e -> e.length())
        .forEach(e -> System.out.println(e));
Are there any guarantees in the JLS regarding the order in which stream operations are applied to the list elements?
For example, is it guaranteed that:
Applying the filter predicate to "bc" is not going to happen before applying the filter predicate to "a"?
Applying the mapping function to "def" is not going to happen before applying the mapping function to "a"?
1 will be printed before 3?
Note: I am talking here specifically about stream(), not parallelStream() where it is expected that operations like mapping and filtering are done in parallel.
Everything you want to know can be found within the java.util.stream JavaDoc.
Ordering
Streams may or may not have a defined encounter order. Whether or not
a stream has an encounter order depends on the source and the
intermediate operations. Certain stream sources (such as List or
arrays) are intrinsically ordered, whereas others (such as HashSet)
are not. Some intermediate operations, such as sorted(), may impose an
encounter order on an otherwise unordered stream, and others may
render an ordered stream unordered, such as BaseStream.unordered().
Further, some terminal operations may ignore encounter order, such as
forEach().
If a stream is ordered, most operations are constrained to operate on
the elements in their encounter order; if the source of a stream is a
List containing [1, 2, 3], then the result of executing map(x -> x*2)
must be [2, 4, 6]. However, if the source has no defined encounter
order, then any permutation of the values [2, 4, 6] would be a valid
result.
For sequential streams, the presence or absence of an encounter order
does not affect performance, only determinism. If a stream is ordered,
repeated execution of identical stream pipelines on an identical
source will produce an identical result; if it is not ordered,
repeated execution might produce different results.
For parallel streams, relaxing the ordering constraint can sometimes
enable more efficient execution. Certain aggregate operations, such as
filtering duplicates (distinct()) or grouped reductions
(Collectors.groupingBy()) can be implemented more efficiently if
ordering of elements is not relevant. Similarly, operations that are
intrinsically tied to encounter order, such as limit(), may require
buffering to ensure proper ordering, undermining the benefit of
parallelism. In cases where the stream has an encounter order, but the
user does not particularly care about that encounter order, explicitly
de-ordering the stream with unordered() may improve parallel
performance for some stateful or terminal operations. However, most
stream pipelines, such as the "sum of weight of blocks" example above,
still parallelize efficiently even under ordering constraints.
Are there any guarantees in the JLS regarding the order in which stream operations are applied to the list elements?
The Streams library is not covered by the JLS. You would need to read the Javadoc for the library.
Streams also support parallel execution, and the order in which elements are processed then depends on the implementation.
Applying the filter predicate to "bc" is not going to happen before applying the filter predicate to "a"?
It would be reasonable to assume so, but you can't guarantee it, nor should you write code that requires this guarantee; otherwise you wouldn't be able to parallelise it later.
Applying the mapping function to "def" is not going to happen before applying the mapping function to "a"?
It is safe to assume this is the case in practice, but you shouldn't write code that requires it.
There is no guarantee of the order in which list items are passed to predicate lambdas. Stream documentation makes guarantees regarding the output of streams, including the order of encounter; it does not make guarantees about implementation details, such as the order in which filter predicates are applied.
Therefore, the documentation does not prevent filter from, say, reading several elements, running the predicate on them in reverse order, and then sending the elements passing the predicate to the output of the stream in the order in which they came in. I don't know why filter() would do something like that, but doing so wouldn't break any guarantee made in the documentation.
You can draw a pretty strong inference from the documentation that filter() will call the predicate on the elements in the order in which the collection supplies them, because you are passing the result of calling stream() on a list, which calls Collection.stream(), and the Java documentation guarantees that a Stream<T> produced this way is sequential:
Returns a sequential Stream with this collection as its source.
Further, filter() is stateless:
Stateless operations, such as filter and map, retain no state from previously seen element when processing a new element - each element can be processed independently of operations on other elements.
Therefore it is rather likely that filter would call the predicate on elements in the order they are supplied by the collection.
I am talking here specifically about stream(), not parallelStream()
Note that a Stream<T> may be unordered without being parallel. For example, after calling unordered() on a stream(), the result becomes unordered but remains sequential.
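A quick sketch of that distinction (the class name is illustrative): unordered() drops the encounter order without turning the stream parallel.

```java
import java.util.List;
import java.util.Spliterator;

public class UnorderedNotParallel {
    static boolean parallelAfterUnordered() {
        return List.of("a", "b", "c").stream().unordered().isParallel();
    }

    static boolean orderedAfterUnordered() {
        return List.of("a", "b", "c").stream().unordered()
                .spliterator().hasCharacteristics(Spliterator.ORDERED);
    }

    public static void main(String[] args) {
        System.out.println(parallelAfterUnordered()); // false: still sequential
        System.out.println(orderedAfterUnordered());  // false: encounter order dropped
    }
}
```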
Are there any guarantees in the JLS regarding the order in which
stream operations are applied to the list elements?
Quoting from Ordering section in Stream javadocs
Streams may or may not have a defined encounter order. Whether or not
a stream has an encounter order depends on the source and the
intermediate operations.
Applying the filter predicate to "bc" is not going to happen before
applying the filter predicate to "a"?
As quoted above, streams may or may not have a defined order. But in your example since it is a List, the same Ordering section in Stream javadocs goes on saying that
If a stream is ordered, most operations are constrained to operate on
the elements in their encounter order; if the source of a stream is a
List containing [1, 2, 3], then the result of executing map(x -> x*2)
must be [2, 4, 6].
Applying the above statement to your example: I believe the filter predicate will receive the elements in the order defined by the List.
Or, applying the mapping function to "def" is not going to happen before applying the mapping function to "a"?
For this I would refer to the Stream operations section Stream operations in Streams, that says,
Stateless operations, such as filter and map, retain no state from
previously seen element when processing a new element
Since map() doesn't retain state, I believe it is safe to assume "def" is not going to be processed before "a" in your example.
1 will be printed before 3?
Although it may be unlikely with sequential streams over an ordered source like a List, it is not guaranteed, as the Ordering section in the Stream javadocs does indicate that
some terminal operations may ignore encounter order, such as
forEach().
If the stream is created from a list, it is guaranteed that the collected result will be ordered the same way the original list was, as the documentation states:
Ordering
If a stream is ordered, most operations are constrained to operate on the elements in their encounter order; if the source of a stream is a List containing [1, 2, 3], then the result of executing map(x -> x*2) must be [2, 4, 6]. However, if the source has no defined encounter order, then any permutation of the values [2, 4, 6] would be a valid result.
To go further, though, there is no guarantee regarding the order in which the map function itself is executed.
From the same documentation page (in the Side-Effects paragraph):
Side-effects
If the behavioral parameters do have side-effects, unless explicitly stated, there are no guarantees as to the visibility of those side-effects to other threads, nor are there any guarantees that different operations on the "same" element within the same stream pipeline are executed in the same thread. Further, the ordering of those effects may be surprising. Even when a pipeline is constrained to produce a result that is consistent with the encounter order of the stream source (for example, IntStream.range(0,5).parallel().map(x -> x*2).toArray() must produce [0, 2, 4, 6, 8]), no guarantees are made as to the order in which the mapper function is applied to individual elements, or in what thread any behavioral parameter is executed for a given element.
In practice, for an ordered sequential stream, chances are that the stream operations will be executed in order, but there is no guarantee.
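The distinction between the guaranteed result order and the unguaranteed execution order can be sketched like this (the class name is illustrative; the queue merely records the order in which the mapper happened to run):

```java
import java.util.Arrays;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.IntStream;

public class ResultVsSideEffectOrder {
    static int[] doubled() {
        Queue<Integer> applicationOrder = new ConcurrentLinkedQueue<>();

        int[] result = IntStream.range(0, 5)
                .parallel()
                .map(x -> { applicationOrder.add(x); return x * 2; }) // record application order
                .toArray();

        // applicationOrder may be any permutation of 0..4; the result may not.
        System.out.println("applied in: " + applicationOrder);
        return result;
    }

    public static void main(String[] args) {
        // The *result* must respect encounter order, per the javadoc:
        System.out.println(Arrays.toString(doubled())); // always [0, 2, 4, 6, 8]
    }
}
```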

What is the difference between .foreach and .stream().foreach? [duplicate]

This question already has answers here:
What is difference between Collection.stream().forEach() and Collection.forEach()?
(5 answers)
Closed 7 years ago.
This is an example:
code A:
files.forEach(f -> {
    //TODO
});
and code B may be written this way:
files.stream().forEach(f -> { });
What is the difference between the two, with stream() and without it?
Practically speaking, they are mostly the same, but there is a small semantic difference.
Code A is defined by Iterable.forEach, whereas code B is defined by Stream.forEach. The definition of Stream.forEach allows for the elements to be processed in any order -- even for sequential streams. (For parallel streams, Stream.forEach will very likely process elements out-of-order.)
Iterable.forEach gets an Iterator from the source and calls forEachRemaining() on it. As far as I can see, all current (JDK 8) implementations of Stream.forEach on the collections classes will create a Spliterator built from one of the source's Iterators, and will then call forEachRemaining on that Iterator -- just like Iterable.forEach does. So they do the same thing, though the streams version has some extra setup overhead.
However, in the future, it's possible that the streams implementation could change so that this is no longer the case.
(If you want to guarantee ordering of processing streams elements, use forEachOrdered() instead.)
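A sketch of forEachOrdered on a parallel stream (the class name is illustrative): the javadoc guarantees a happens-before edge between the actions for successive elements, so collecting into a plain ArrayList is safe here.

```java
import java.util.ArrayList;
import java.util.List;

public class ForEachOrderedDemo {
    // forEachOrdered performs the action one element at a time, in encounter
    // order, even on a parallel stream (at some cost to parallel efficiency).
    static List<Integer> collectOrdered() {
        List<Integer> out = new ArrayList<>(); // safe: forEachOrdered serializes the calls
        List.of(1, 2, 3, 4, 5).parallelStream().forEachOrdered(out::add);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collectOrdered()); // [1, 2, 3, 4, 5]
    }
}
```

With plain forEach on the same parallel stream, the elements could arrive in any order.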
There is no difference in terms of semantics, though the direct implementation without stream is probably slightly more efficient.
A stream is a sequence of elements on which operations can be performed; it is not a data structure that stores them. Any Collection can be exposed as a stream. The operations you perform on a stream are either:
Intermediate operations (map, skip, concat, distinct, filter, sorted, limit, peek, ...), which produce another java.util.stream.Stream. Intermediate operations are lazy and execute only once a terminal operation is invoked.
Terminal operations (forEach, max, count, anyMatch, findFirst, reduce, collect, sum, findAny), which produce an object that is not a stream.
Basically, it is similar to a pipeline, as in Unix.
Both approaches ultimately iterate over all the elements, but the version with .stream() also unnecessarily creates a Stream object on top of the List. The result is the same, but it is slightly suboptimal.

When a Collection is converted to a Stream, does the resulting Collection maintain any links to the original?

When working with a Collection in Java, I regularly convert it to a Stream to begin with, process and collect it, and then return the resulting Collection. For example:
static Set<String> getTopUsers(Set<String> users) {
    Set<String> topUsers = users.stream()
            .filter((String s) -> isTop(s))
            .collect(Collectors.toSet());
    return topUsers;
}

static boolean isTop(String user) {
    // some logic
}
Does the topUsers return value have any link to the original? For instance, could adding and removing elements from users result in any changes in topUsers, and vice-versa? I'm asking because I haven't been copying my parameters (e.g. users in this case) as I pass them in, and I'm wondering whether I should.
(I've looked at the documentation for Stream, and it mentions that "an operation on a stream produces a result, but does not modify its source" - but I just wanted to be sure there's nothing I'm missing.)
No, topUsers is a completely new Set with no relation to users. The Stream operations are executed, applying some transformation on the values and collecting the results in a new Set.
The values in the two sets might be the same (ex. your filter might not cause the removal of any values), but the sets themselves are independent.
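This independence is easy to check with a sketch of the original example (isTop here is a stand-in predicate, purely hypothetical):

```java
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

public class IndependentResult {
    // Hypothetical predicate, just for this sketch.
    static boolean isTop(String user) {
        return user.startsWith("top");
    }

    static Set<String> getTopUsers(Set<String> users) {
        return users.stream().filter(IndependentResult::isTop).collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Set<String> users = new HashSet<>(Set.of("topAlice", "bob"));
        Set<String> topUsers = getTopUsers(users);

        users.add("topCarol");                 // mutate the source after collect()...
        System.out.println(topUsers);          // [topAlice]: the result is unaffected

        topUsers.remove("topAlice");           // ...and mutating the result...
        System.out.println(users.contains("topAlice")); // true: leaves the source intact
    }
}
```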
From the documentation you quoted, a few nuggets (emphasis mine) :
Stream operations are divided into intermediate and terminal operations, and are combined to form stream pipelines. A stream pipeline consists of a source (such as a Collection, an array, a generator function, or an I/O channel); followed by zero or more intermediate operations such as Stream.filter or Stream.map; and a terminal operation such as Stream.forEach or Stream.reduce.
Intermediate operations return a new stream. They are always lazy; executing an intermediate operation such as filter() does not actually perform any filtering, but instead creates a new stream that, when traversed, contains the elements of the initial stream that match the given predicate. Traversal of the pipeline source does not begin until the terminal operation of the pipeline is executed.
For that to be true, we have to assume that, as long as you have not called a terminal operation, your original stream has not been processed yet (or at least not fully processed).
Terminal operations, such as Stream.forEach or IntStream.sum, may traverse the stream to produce a result or a side-effect. After the terminal operation is performed, the stream pipeline is considered consumed, and can no longer be used; if you need to traverse the same data source again, you must return to the data source to get a new stream. In almost all cases, terminal operations are eager, completing their traversal of the data source and processing of the pipeline before returning. Only the terminal operations iterator() and spliterator() are not; these are provided as an "escape hatch" to enable arbitrary client-controlled pipeline traversals in the event that the existing operations are not sufficient to the task.
Any form of reduction is a terminal operation.
In your case, the users Set is traversed and consumed as soon as (but no sooner than) the runtime hits the collect method. At that point, the data coming out of your users set is read and processed, and further updates to the original Set are ignored. topUsers and users are "disconnected" from this point on.
If you wanted your method to return a kind of live, filtered view of your original set using the Stream API, you could consider returning a Stream instead of a Set, expressing only intermediate operations and leaving the actual collection up to the caller.
Also note that a collections library such as Google Guava allows you to build a live, filtered view of another collection using a predicate function, in much the same way you're doing here. It might be more appropriate for what you are seeking to achieve (be warned of concurrency effects, though!).
