I have a Stack<Object> and following piece of code:
while (!stack.isEmpty()) {
    Object object = stack.pop();
    // do some operation on object
}
How can this iteration be implemented using a Java 8 Stream, so that it loops until the stack is empty and every iteration reduces the stack by popping one element from the top?
Java 9 added a three-argument version of Stream.iterate (like a for loop: initial value, lambda for determining end-of-input, lambda for determining next input) that could do this, though it would be a little strained:
if (!stack.isEmpty()) {
    Stream.iterate(stack.pop(),
                   e -> !stack.isEmpty(),
                   e -> stack.pop())
          ...
}
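One detail worth checking in the snippet above: as far as I can tell, the end-of-input lambda is evaluated after the candidate element has already been popped, so the last element never reaches the pipeline. The sketch below compares that form with a variant in which the stack itself is the iterated value and the pop happens in a map step. The class and method names are made up for illustration, and the variant assumes a sequential stream.

```java
import java.util.List;
import java.util.Stack;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class IterateDrainDemo {
    // The form shown above: the popped element is the iterated value.
    // The emptiness test runs after each pop, so the bottom element is dropped.
    static <T> List<T> popAsElement(Stack<T> stack) {
        if (stack.isEmpty()) {
            return List.of();
        }
        return Stream.iterate(stack.pop(), e -> !stack.isEmpty(), e -> stack.pop())
                .collect(Collectors.toList());
    }

    // Variant: the stack is the iterated value, so emptiness is tested before each pop.
    static <T> List<T> stackAsElement(Stack<T> stack) {
        return Stream.iterate(stack, s -> !s.isEmpty(), s -> s)
                .map(Stack::pop)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Stack<Integer> first = new Stack<>();
        first.push(1); first.push(2); first.push(3);
        System.out.println(popAsElement(first));    // [3, 2]

        Stack<Integer> second = new Stack<>();
        second.push(1); second.push(2); second.push(3);
        System.out.println(stackAsElement(second)); // [3, 2, 1]
    }
}
```

Both forms mutate the stream source while the stream is running, so the general warning about such sources applies to them as well.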
In case you are still on Java 8 and cannot use the Java 9 solution, here's a stream factory which works under Java 8.
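import java.util.Spliterator;
import java.util.Spliterators;
import java.util.Stack;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public static <T> Stream<T> pop(Stack<T> stack) {
    return StreamSupport.stream(new Spliterators.AbstractSpliterator<T>(
            stack.size(), Spliterator.ORDERED | Spliterator.SIZED) {
        @Override
        public boolean tryAdvance(Consumer<? super T> action) {
            if (stack.isEmpty()) return false;
            action.accept(stack.pop());
            return true;
        }
    }, false);
}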
Note that this reports the initial size of the stack, taking it for granted, which implies that you must not change the stack in-between (modifying a stream source in-between is a bad idea anyway). On the other hand, this will make certain Stream operations more efficient than the iterate variant.
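A minimal usage sketch of the factory above, repeated inside a hypothetical PopStreamDemo class so that the example is self-contained. The drainUpperCase helper is only an illustration and is not part of the original answer.

```java
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.Stack;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class PopStreamDemo {
    // Same factory as above: each tryAdvance call pops one element.
    static <T> Stream<T> pop(Stack<T> stack) {
        return StreamSupport.stream(new Spliterators.AbstractSpliterator<T>(
                stack.size(), Spliterator.ORDERED | Spliterator.SIZED) {
            @Override
            public boolean tryAdvance(Consumer<? super T> action) {
                if (stack.isEmpty()) return false;
                action.accept(stack.pop());
                return true;
            }
        }, false);
    }

    // Illustrative consumer: drains the stack top-first and upper-cases each element.
    static List<String> drainUpperCase(Stack<String> stack) {
        return pop(stack).map(String::toUpperCase).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Stack<String> stack = new Stack<>();
        stack.push("a");
        stack.push("b");
        stack.push("c");
        System.out.println(drainUpperCase(stack)); // [C, B, A]
        System.out.println(stack.isEmpty());       // true
    }
}
```

Because collect is not a short-circuiting operation and the stream is sequential, every element is popped here and the stack ends up empty.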
Now, a general warning that applies to both variants. Stream sources that are modified due to an ongoing Stream operation, like popping the elements which the Stream consumes, can leave the source in an unpredictable state. Short circuiting operations may not consume all elements and in combination with parallel Streams, they still may consume more elements than needed for the terminal operation.
So, analogous to BufferedReader.lines(), whose documentation says:
"After execution of the terminal stream operation there are no guarantees that the reader will be at a specific position from which to read the next character or line."
you should not make any assumptions about the Stack contents after consuming elements this way.
This is not possible using the stream of the stack (stack.stream()): first, it would iterate in first-in-first-out order, and second, since it is based on an iterator, popping during traversal would throw a ConcurrentModificationException. The following is still possible, but of course not recommended compared to the simple while loop:
IntStream.range(0, stack.size()).forEach(i -> stack.pop());
There are lots of questions regarding peek in the Java Streams API. I'm looking for a way to implement the following common pattern using Java Streams. I can make it work with Streams, but the result is non-obvious, which makes it slightly dangerous without a comment, and that is not ideal.
boolean anyPricingComponentsChanged = false;
for (var pc : plan.getPricingComponents()) {
    if (pc.getValidTill() == null || pc.getValidTill().compareTo(dateNow) <= 0) {
        anyPricingComponentsChanged = true;
        pc.setValidTill(dateNow);
    }
}
My option:
long numberChanged = plan.getPricingComponents()
    .stream()
    .filter(pc -> pc.getValidTill() == null || pc.getValidTill().compareTo(dateNow) <= 0)
    .peek(pc -> pc.setValidTill(dateNow))
    .count(); // `count` rather than `findAny` to ensure that `peek` processes all components.
boolean anyPricingComponentsChanged = numberChanged != 0;
As an aside, whilst compareTo is not an expensive operation here and consistently returns the same result, in other cases this might not be true, and I'd rather avoid running it multiple times for this pattern.
// to ensure that peek processes all components
You can't really ensure that peek() will process all the stream elements that should be modified. In some cases, this operation can be elided from the pipeline, so you should not perform any important actions via peek().
Here's a quote from the documentation of peek():
API Note:
This method exists mainly to support debugging, where you want to see the elements as they flow past a certain point in a pipeline ...
In cases where the stream implementation is able to optimize away the production of some or all the elements (such as with short-circuiting operations like findFirst, or in the example described in count()), the action will not be invoked for those elements.
Also, here's what Stream API documentation says regarding Side-effects:
If the behavioral parameters do have side-effects, unless explicitly stated, there are no guarantees as to:
the visibility of those side-effects to other threads;
that different operations on the "same" element within the same stream pipeline are executed in the same thread; and
that behavioral parameters are always invoked, since a stream implementation is free to elide operations (or entire stages) from a stream pipeline if it can prove that it would not affect the result of the computation.
...
The eliding of side-effects may also be surprising. With the exception of terminal operations forEach and forEachOrdered, side-effects of behavioral parameters may not always be executed when the stream implementation can optimize away the execution of behavioral parameters without affecting the result of the computation. (For a specific example see the API note documented on the count operation.)
Amphesys added
Since peek is not meant to contribute to the result of the stream execution Stream implementations are free to throw it away.
Instead of relying on peek() you can do the following:
List<PricingComponent> componentsToChange = plan.getPricingComponents()
    .stream()
    .filter(pc -> pc.getValidTill() == null || pc.getValidTill().compareTo(dateNow) <= 0)
    .toList();
componentsToChange.forEach(pc -> pc.setValidTill(dateNow));
boolean anyPricingComponentsChanged = !componentsToChange.isEmpty();
If you don't want to materialize the objects that need to be modified as a List, then stick with a for-loop.
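Since the quoted documentation exempts the terminal operations forEach and forEachOrdered from elision, another option is to perform the update in forEach and record whether anything matched. This is a minimal sketch, not the approach recommended above: the PricingComponent class here is a hypothetical stand-in for the question's type, and the AtomicBoolean flag is my own addition.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

class ExpireComponentsDemo {
    // Hypothetical stand-in for the question's PricingComponent type.
    static final class PricingComponent {
        private LocalDate validTill;
        PricingComponent(LocalDate validTill) { this.validTill = validTill; }
        LocalDate getValidTill() { return validTill; }
        void setValidTill(LocalDate validTill) { this.validTill = validTill; }
    }

    // Updates every matching component and reports whether any was changed.
    static boolean expire(List<PricingComponent> components, LocalDate dateNow) {
        AtomicBoolean changed = new AtomicBoolean(false);
        components.stream()
                .filter(pc -> pc.getValidTill() == null
                        || pc.getValidTill().compareTo(dateNow) <= 0)
                // forEach is a terminal operation, so it runs for each element that passes the filter.
                .forEach(pc -> {
                    pc.setValidTill(dateNow);
                    changed.set(true);
                });
        return changed.get();
    }

    public static void main(String[] args) {
        LocalDate now = LocalDate.of(2024, 1, 10);
        List<PricingComponent> components = Arrays.asList(
                new PricingComponent(null),
                new PricingComponent(LocalDate.of(2024, 1, 1)),
                new PricingComponent(LocalDate.of(2024, 12, 31)));
        System.out.println(expire(components, now));          // true
        System.out.println(components.get(2).getValidTill()); // 2024-12-31
    }
}
```

The filter predicate is evaluated once per element, which also addresses the concern about running an expensive comparison several times. Whether this reads better than the plain for-loop is a matter of taste.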
Note
The quotes above from the API documentation, such as the statement that a stream implementation is free to elide operations (or entire stages) from a stream pipeline if it can prove that it would not affect the result of the computation, apply to any intermediate operation with an embedded side-effect. Either the side-effect can be elided, or the whole pipeline stage (stream operation) can be optimized away if it has no impact on the result. To be on the same page regarding terminology: in short, a side-effect is anything that a function does apart from producing the required result (e.g. i -> { side-effect; return i * 2; }).
Although it's not advisable to give peek() an action that must be executed under all circumstances, at least this choice doesn't contradict the semantics of peek. By contrast, performing side-effects via filter, map, or other operations that are not designed to operate through side-effects not only fails to resolve the problem, but is also surprising, since it goes against the semantics of these operations and violates the principle of least astonishment.
I can create a Stream from an array using Arrays.stream(array) or Stream.of(values). Similarly, is it possible to create a ParallelStream directly from an array, without creating an intermediate collection as in Arrays.asList(array).parallelStream()?
Stream.of(array).parallel()
or
Arrays.stream(array).parallel()
TLDR;
Any sequential Stream can be converted into a parallel one by calling .parallel() on it. So all you need is:
Create a stream
Invoke method parallel() on it.
Long answer
The question is pretty old, but I believe some additional explanation will make things much clearer.
All Java stream implementations implement the BaseStream interface, which, as per the Javadoc, is:
Base interface for streams, which are sequences of elements supporting sequential and parallel aggregate operations.
From the API's point of view there is no difference between sequential and parallel streams. They share the same aggregate operations.
In order to distinguish between sequential and parallel streams, the aggregate methods call the BaseStream::isParallel method.
Let's explore the implementation of isParallel method in AbstractPipeline:
@Override
public final boolean isParallel() {
    return sourceStage.parallel;
}
As you can see, the only thing isParallel does is check the boolean flag in the source stage:
/**
* True if pipeline is parallel, otherwise the pipeline is sequential; only
* valid for the source stage.
*/
private boolean parallel;
So what does the parallel() method do then? How does it turn a sequential stream into a parallel one?
@Override
@SuppressWarnings("unchecked")
public final S parallel() {
    sourceStage.parallel = true;
    return (S) this;
}
Well, it only sets the parallel flag to true. That's all it does.
As you can see, in the current implementation of the Java Stream API it doesn't matter how you create a stream (or receive it as a method parameter). You can always turn a stream into a parallel one at essentially zero cost.
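A minimal sketch of what this means for the array case from the question. The class name and the sumOfLengths helper are made up for illustration.

```java
import java.util.Arrays;
import java.util.stream.Stream;

class ParallelFromArrayDemo {
    // Creates the stream straight from the array and flips it to parallel;
    // no intermediate collection is involved.
    static int sumOfLengths(String[] words) {
        return Arrays.stream(words)
                .parallel()
                .mapToInt(String::length)
                .sum();
    }

    public static void main(String[] args) {
        String[] words = {"a", "bb", "ccc"};

        Stream<String> sequential = Arrays.stream(words);
        System.out.println(sequential.isParallel());            // false
        System.out.println(sequential.parallel().isParallel()); // true

        System.out.println(sumOfLengths(words));                 // 6
    }
}
```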
I've come across a rule in Sonar which says:
A key difference with other intermediate Stream operations is that the Stream implementation is free to skip calls to peek() for optimization purpose. This can lead to peek() being unexpectedly called only for some or none of the elements in the Stream.
Also, it's mentioned in the Javadoc which says:
This method exists mainly to support debugging, where you want to see the elements as they flow past a certain point in a pipeline
In which cases can java.util.stream.Stream.peek() be skipped? Is it related to debugging?
Not only peek but also map can be skipped, for the sake of optimization.
For example, when the terminal operation count() is called, it makes no sense to peek or map the individual items, as such operations do not change the number of items present.
Here are two examples:
1. Map and peek are not skipped because the filter can change the number of items beforehand.
long count = Stream.of("a", "aa")
    .peek(s -> System.out.println("#1"))
    .filter(s -> s.length() < 2)
    .peek(s -> System.out.println("#2"))
    .map(s -> {
        System.out.println("#3");
        return s.length();
    })
    .count();
#1
#2
#3
#1
1
2. Map and peek are skipped because the number of items is unchanged.
long count = Stream.of("a", "aa")
    .peek(s -> System.out.println("#1"))
    //.filter(s -> s.length() < 2)
    .peek(s -> System.out.println("#2"))
    .map(s -> {
        System.out.println("#3");
        return s.length();
    })
    .count();
2
Important: The methods should have no side-effects (they do above, but only for the sake of example).
Side-effects in behavioral parameters to stream operations are, in general, discouraged, as they can often lead to unwitting violations of the statelessness requirement, as well as other thread-safety hazards.
The following implementation is dangerous. Assuming the callRestApi method performs a REST call, the call may never be made, because the pipeline relies on a side-effect that count() is free to skip.
long count = Stream.of("url1", "url2")
.map(string -> callRestApi(HttpMethod.POST, string))
.count();
/**
* Performs a REST call
*/
public String callRestApi(HttpMethod httpMethod, String url);
peek() is an intermediate operation, and it expects a consumer that performs an action (side-effect) on the elements of the stream.
When a stream pipeline contains no intermediate operations that can change the number of elements in the stream (like takeWhile, filter, limit, etc.), ends with the terminal operation count(), and the stream source allows the number of elements to be evaluated, then count() simply interrogates the source and returns the result. All intermediate operations get optimized away.
Note: this optimization of the count() operation, which has existed since Java 9 (see the API Note), is not directly related to peek(); it affects every intermediate operation that doesn't change the number of elements in the stream (for now these are map(), sorted(), peek()).
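The effect can be observed by counting how often the peek action actually runs. This is a small sketch with made-up names. The result for the pipeline without a filter depends on the Java version and the stream implementation, so no fixed output is claimed for it.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class CountElisionDemo {
    // Returns how many times the peek action ran before count() produced its result.
    static int peekCalls(boolean withFilter) {
        AtomicInteger calls = new AtomicInteger();
        Stream<String> stream = Stream.of("a", "aa")
                .peek(s -> calls.incrementAndGet());
        if (withFilter) {
            // filter may change the number of elements, so the pipeline has to be traversed
            stream = stream.filter(s -> s.length() < 2);
        }
        stream.count();
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(peekCalls(true));  // 2: both elements are inspected
        System.out.println(peekCalls(false)); // may be 0 when count() is taken from the source size
    }
}
```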
There's More to it
peek() has a very special niche among other intermediate operations.
By its nature, peek() differs both from other intermediate operations like map() and from the terminal operations forEach() and forEachOrdered(), which also cause side-effects (as peek() does) by performing a final action for each element that reaches them.
The key point is that peek() doesn't contribute to the result of stream execution. It never affects the result produced by the terminal operation, whether it's a value or a final action.
In other words, if we throw away peek() from the pipeline, it would not affect the terminal operation.
The documentation of peek(), as well as the Stream API documentation, warns that its action could be elided and that you shouldn't rely on it.
A quote from the documentation of peek():
In cases where the stream implementation is able to optimize away the
production of some or all the elements (such as with short-circuiting
operations like findFirst, or in the example described in count()),
the action will not be invoked for those elements.
A quote from the API documentation, paragraph Side-effects:
The eliding of side-effects may also be surprising. With the exception of terminal operations forEach and forEachOrdered, side-effects of behavioral parameters may not always be executed when the stream implementation can optimize away the execution of behavioral parameters without affecting the result of the computation.
Here's an example of the stream (link to the source) where none of the intermediate operations gets elided apart from peek():
Stream.of(1, 2, 3)
    .parallel()
    .peek(System.out::println)
    .skip(1)
    .map(n -> n * 10)
    .forEach(System.out::println);
In this pipeline peek() precedes skip(), therefore you might expect it to display every element from the source on the console. However, that doesn't happen (element 1 will not be printed). Due to the nature of peek(), it might be optimized away without breaking the code, i.e. without affecting the terminal operation.
That's why the documentation explicitly states that this operation is provided mainly for debugging purposes, and it should not be given an action that needs to be executed under all circumstances.
The optimization referenced in this thread follows from the well-known architecture of Java streams, which is based on lazy computation.
Streams are lazy; computation on the source data is only performed
when the terminal operation is initiated, and source elements are
consumed only as needed. (java doc)
Also
Intermediate operations return a new stream. They are always lazy;
executing an intermediate operation such as filter() does not actually
perform any filtering, but instead creates a new stream that, when
traversed, contains the elements of the initial stream that match the
given predicate. Traversal of the pipeline source does not begin until
the terminal operation of the pipeline is executed. (java doc)
This lazy computation affects several other operators, not just .peek. All other intermediate operations (filter, map, mapToInt, mapToDouble, mapToLong, flatMap, flatMapToInt, flatMapToDouble, flatMapToLong) are affected in the same way that peek is. But someone who does not understand the concept of lazy computation can easily be caught in the trap with .peek that Sonar reports here.
So the example that Sonar correctly reports
Stream.of("one", "two", "three", "four")
.filter(e -> e.length() > 3)
.peek(e -> System.out.println("Filtered value: " + e));
should not be used as is, because it contains no terminal operation. So the stream will not invoke the intermediate .peek operator at all, even though 2 elements ("three", "four") are eligible to pass through the stream pipeline.
Example 1. Add a terminal operator like the following:
Stream.of("one", "two", "three", "four")
.filter(e -> e.length() > 3)
.peek(e -> System.out.println("Filtered value: " + e))
.collect(Collectors.toList()); // <----
and every element that passes through the pipeline will also pass through the intermediate .peek operator. No element is ever skipped in this example.
Example 2. Now here is the interesting part: because the Stream API is based on lazy computation, things change if you use some other terminal operator, for example .findFirst:
Stream.of("one", "two", "three", "four")
.filter(e -> e.length() > 3)
.peek(e -> System.out.println("Filtered value: " + e))
.findFirst(); // <----
Only 1 element will pass through the operator .peek and not 2.
But as long as you know what you are doing (example 1) and you understand lazy computation, you can expect that in certain cases .peek will be invoked for every element passing down the stream pipeline and no element will be skipped, and in other cases you will know which elements will be skipped by .peek.
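The three cases above (no terminal operation, collect, findFirst) can be checked by counting the .peek invocations on a sequential stream. The class and method names in this sketch are made up for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyPeekDemo {
    // Builds the pipeline from the examples; the peek action only counts its invocations.
    static Stream<String> pipeline(AtomicInteger calls) {
        return Stream.of("one", "two", "three", "four")
                .filter(e -> e.length() > 3)
                .peek(e -> calls.incrementAndGet());
    }

    static int callsWithoutTerminal() {
        AtomicInteger calls = new AtomicInteger();
        pipeline(calls); // no terminal operation: nothing is traversed
        return calls.get();
    }

    static int callsWithCollect() {
        AtomicInteger calls = new AtomicInteger();
        pipeline(calls).collect(Collectors.toList());
        return calls.get();
    }

    static int callsWithFindFirst() {
        AtomicInteger calls = new AtomicInteger();
        pipeline(calls).findFirst(); // short-circuits after the first match
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(callsWithoutTerminal()); // 0
        System.out.println(callsWithCollect());     // 2
        System.out.println(callsWithFindFirst());   // 1
    }
}
```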
But be extremely cautious if you use .peek with parallel streams, since another set of traps can arise there. As the Java API documentation for .peek mentions:
For parallel stream pipelines, the action may be called at whatever time and in whatever thread the element is made available by the upstream operation. If the action modifies shared state, it is responsible for providing the required synchronization.
I have the following function:
public Stream getStream(boolean isParallel) {
...
return someSteamFromHere;
}
This function will return a parallel stream if "isParallel" is true, otherwise a sequential stream. Now I want to collect this parallel/sequential stream. Does the caller function need to implement this logic:
boolean isParallel = isParallel();
Stream stream = getStream(isParallel);
List list;
if (isParallel) {
    list = stream.parallel().collect(Collectors.toList());
} else {
    list = stream.collect(Collectors.toList());
}
Or can I simply collect the stream regardless, so that if it is parallel, it will be collected in parallel, and if it is sequential, it will be collected in a single thread?
Parallelism is a property of the stream. So, if you have a parallel stream, calling .parallel() on it is a no-op. It does absolutely nothing whatsoever.
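Note that when a parallel stream is collected, the order in which elements are processed is undefined. Collectors.toList() still returns the elements in encounter order if the stream is ordered, but an unordered stream gives no ordering guarantee at all.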
Your code can just be List list = stream.collect(Collectors.toList());.
Note that as a general rule, if parallelism matters at all, collecting it into a list seems... bizarre. Whatever performance benefits you think you're getting from treating it parallel are pretty much obliterated when you do this.
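A minimal sketch of the caller side, assuming a getStream that behaves as described in the question (the body used here is a made-up example). The caller collects without branching on the flag. For an ordered source such as the range used here, Collectors.toList() should produce the same list in both modes.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class CollectEitherWayDemo {
    // Hypothetical implementation of the question's getStream(boolean).
    static Stream<Integer> getStream(boolean isParallel) {
        Stream<Integer> stream = IntStream.range(0, 1000).boxed();
        return isParallel ? stream.parallel() : stream;
    }

    // The caller does not need to know how the stream was configured.
    static List<Integer> collect(Stream<Integer> stream) {
        return stream.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(getStream(true).isParallel());  // true
        System.out.println(getStream(false).isParallel()); // false
        System.out.println(collect(getStream(true)).equals(collect(getStream(false)))); // true
    }
}
```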
Why do you pass in the boolean to the function if you use it after the function's return? Either the function receives the boolean and uses it or it doesn't get it and the test sits outside as you wrote.
Btw, functions with boolean parameters are considered code smell as they clearly do more than one thing. Have a look here.
There is the question on whether java methods should return Collections or Streams, in which Brian Goetz answers that even for finite sequences, Streams should usually be preferred.
But it seems to me that currently many operations on Streams that come from other places cannot be safely performed, and defensive code guards are not possible because Streams do not reveal if they are infinite or unordered.
If parallel was a problem for the operations I want to perform on a Stream, I can call isParallel() to check, or sequential() to make sure the computation is sequential (if I remember to).
But if orderedness or finiteness (sizedness) was relevant to the safety of my program, I cannot write safeguards.
Assuming I consume a library implementing this fictitious interface:
public interface CoordinateServer {
    public Stream<Integer> coordinates();
    // example implementations:
    // finite, ordered, sequential
    // IntStream.range(0, 100).boxed()
    // infinite, unordered, sequential
    // final AtomicInteger atomic = new AtomicInteger();
    // Stream.generate(() -> atomic.incrementAndGet())
    // infinite, unordered, parallel
    // Stream.generate(() -> atomic.incrementAndGet()).parallel()
    // finite, ordered, sequential, should-be-closed
    // Files.lines(Paths.get("coordinates.txt")).map(Integer::parseInt)
}
Then what operations can I safely call on this stream to write a correct algorithm?
It seems that if I want to write the elements to a file as a side-effect, I need to be concerned about the stream being parallel:
// if stream is parallel, which order will be written to file?
coordinates().peek(i -> { writeToFile(i); }).count();
// how should I remember to always add sequential() in such cases?
And also if it is parallel, based on what Threadpool is it parallel?
If I want to sort the stream (or other non-short-circuit operations), I somehow need to be cautious about it being infinite:
coordinates().sorted().limit(1000).collect(toList()); // will this terminate?
coordinates().allMatch(x -> x > 0); // will this terminate?
I can impose a limit before sorting, but which magic number should that be, if I expect a finite stream of unknown size?
Finally maybe I want to compute in parallel to save time and then collect the result:
// will result list maintain the same order as sequential?
coordinates().map(i -> complexLookup(i)).parallel().collect(toList());
But if the stream is not ordered (in that version of the library), then the result might become mangled due to the parallel processing. But how can I guard against this, other than not using parallel (which defeats the performance purpose)?
Collections are explicit about being finite or infinite, about having an order or not, and they do not carry the processing mode or threadpools with them. Those seem like valuable properties for APIs.
Additionally, Streams may sometimes need to be closed, but most commonly not. If I consume a stream from a method (or from a method parameter), should I generally call close?
Also, streams might already have been consumed, and it would be good to be able to handle that case gracefully, so it would be good to check whether the stream has already been consumed.
I would wish for some code snippet that can be used to validate assumptions about a stream before processing it, like:
Stream<X> stream = fooLibrary.getStream();
Stream<X> safeStream = StreamPreconditions(
stream,
/*maxThreshold or elements before IllegalArgumentException*/
10_000,
/* fail with IllegalArgumentException if not ordered */
true
)
After looking at things a bit (some experimentation and here), as far as I can see, there is no way to know definitively whether a stream is finite or not.
More than that, sometimes it is not even determined until runtime (such as in Java 11: IntStream.generate(() -> 1).takeWhile(x -> externalCondition(x))).
What you can do is:
You can find out with certainty if it is finite, in a few ways (notice that receiving a negative answer from these does not mean it is infinite, only that it may be so):
stream.spliterator().getExactSizeIfKnown() - if the stream has a known exact size, it is finite; otherwise this returns -1.
stream.spliterator().hasCharacteristics(Spliterator.SIZED) - returns true if the stream is SIZED.
You can safeguard yourself by assuming the worst (depends on your case).
stream.sequential()/stream.parallel() - explicitly set your preferred consumption type.
With potentially infinite stream, assume your worst case on each scenario.
For example, assume you want to listen to a stream of tweets until you find one by Venkat. It is a potentially infinite operation, but you'd like to wait until such a tweet is found. So in this case, simply go for stream.filter(tweet -> isByVenkat(tweet)).findAny(); it will iterate until such a tweet comes along (or forever).
A different scenario, and probably the more common one, is wanting to do something on all the elements, or to try only a certain number of times (similar to a timeout). For this, I'd recommend always calling stream.limit(x) before calling your operation (collect or allMatch or similar), where x is the number of tries you're willing to tolerate.
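Putting the checks above together, here is a rough sketch of the kind of guard the question asks for. The StreamPreconditions class and its require method are hypothetical and not part of any library. Note that stream.spliterator() is itself a terminal operation, so the original stream cannot be used afterwards and a new stream has to be rebuilt from the spliterator. For streams of unknown size this sketch follows the limit(x) advice and truncates rather than failing.

```java
import java.util.Spliterator;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class StreamPreconditions {
    private StreamPreconditions() {}

    // Hypothetical helper: validates what the spliterator can tell us and returns a new stream.
    static <T> Stream<T> require(Stream<T> stream, long maxElements, boolean requireOrdered) {
        boolean parallel = stream.isParallel();
        // Terminal operation: 'stream' is consumed from here on.
        Spliterator<T> spliterator = stream.spliterator();

        if (requireOrdered && !spliterator.hasCharacteristics(Spliterator.ORDERED)) {
            throw new IllegalArgumentException("stream is not ordered");
        }
        long size = spliterator.getExactSizeIfKnown(); // -1 if unknown (possibly infinite)
        if (size > maxElements) {
            throw new IllegalArgumentException("stream has " + size + " elements, limit is " + maxElements);
        }

        // Rebuild a stream over the same elements and forward close() to the original.
        Stream<T> rebuilt = StreamSupport.stream(spliterator, parallel).onClose(stream::close);
        // Unknown size: assume the worst and cap the number of elements.
        return size >= 0 ? rebuilt : rebuilt.limit(maxElements);
    }
}
```

A call such as StreamPreconditions.require(fooLibrary.getStream(), 10_000, true) would then fail fast for unordered or oversized streams. It cannot detect a stream that has already been consumed, other than by the IllegalStateException that spliterator() throws in that case.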
After all this, I'll just mention that I think returning a stream is generally not a good idea, and I'd try to avoid it unless there are large benefits.