takeWhile() working differently with flatmap - java

I am creating snippets with takeWhile to explore its possibilities. When used in conjunction with flatMap, the behaviour is not in line with the expectation. Please find the code snippet below.
String[][] strArray = {{"Sample1", "Sample2"}, {"Sample3", "Sample4", "Sample5"}};
Arrays.stream(strArray)
.flatMap(indStream -> Arrays.stream(indStream))
.takeWhile(ele -> !ele.equalsIgnoreCase("Sample4"))
.forEach(ele -> System.out.println(ele));
Actual Output:
Sample1
Sample2
Sample3
Sample5
ExpectedOutput:
Sample1
Sample2
Sample3
The reason for this expectation is that takeWhile should stop at the first element for which the predicate returns false. I also added print statements inside flatMap for debugging; the inner streams are returned just twice, which is in line with the expectation.
However, this works just fine without flatMap in the chain.
String[] strArraySingle = {"Sample3", "Sample4", "Sample5"};
Arrays.stream(strArraySingle)
.takeWhile(ele -> !ele.equalsIgnoreCase("Sample4"))
.forEach(ele -> System.out.println(ele));
Actual Output:
Sample3
Here the actual output matches with the expected output.
Disclaimer: These snippets are just for practice and do not serve any real use case.
Update:
This is bug JDK-8193856; the fix will be available as part of JDK 10.
The change corrects Sink::accept in WhileOps. Current implementation:
@Override
public void accept(T t) {
    if (take = predicate.test(t)) {
        downstream.accept(t);
    }
}
Changed implementation:
@Override
public void accept(T t) {
    if (take && (take = predicate.test(t))) {
        downstream.accept(t);
    }
}
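To see why that one-token change matters, here is a minimal simulation of the two accept variants. It is only a sketch, not the JDK code: the class and method names are made up, and a plain for loop stands in for an upstream that pushes every element without ever asking whether the pipeline was cancelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical demo class; it only imitates the flag handling of WhileOps.
final class TakeFlagDemo {

    // Imitates the JDK 9 logic: the flag is overwritten by every test result,
    // so a later matching element switches it back on.
    static List<String> jdk9Style(List<String> input, Predicate<String> predicate) {
        List<String> out = new ArrayList<>();
        boolean take = true;
        for (String s : input) {              // eager upstream, never checks for cancellation
            if (take = predicate.test(s)) {
                out.add(s);
            }
        }
        return out;
    }

    // Imitates the corrected logic: once take is false, it stays false.
    static List<String> fixedStyle(List<String> input, Predicate<String> predicate) {
        List<String> out = new ArrayList<>();
        boolean take = true;
        for (String s : input) {
            if (take && (take = predicate.test(s))) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of("Sample1", "Sample2", "Sample3", "Sample4", "Sample5");
        Predicate<String> notSample4 = s -> !s.equalsIgnoreCase("Sample4");
        System.out.println(jdk9Style(input, notSample4));  // [Sample1, Sample2, Sample3, Sample5]
        System.out.println(fixedStyle(input, notSample4)); // [Sample1, Sample2, Sample3]
    }
}
```

The first line reproduces the surprising output from the question, the second one the expected output.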

This is a bug in JDK 9, tracked as JDK-8193856:
takeWhile is incorrectly assuming that an upstream operation supports and honors cancellation, which unfortunately is not the case for flatMap.
Explanation
If the stream is ordered, takeWhile should show the expected behavior. This is not entirely the case in your code because you use forEach, which waives order. If you care about it, which you do in this example, you should use forEachOrdered instead. Funny thing: That doesn't change anything. 🤔
So maybe the stream isn't ordered in the first place? (In that case the behavior is ok.) If you create a temporary variable for the stream created from strArray and check whether it is ordered by executing the expression ((StatefulOp) stream).isOrdered(); at the breakpoint, you will find that it is indeed ordered:
String[][] strArray = {{"Sample1", "Sample2"}, {"Sample3", "Sample4", "Sample5"}};
Stream<String> stream = Arrays.stream(strArray)
.flatMap(indStream -> Arrays.stream(indStream))
.takeWhile(ele -> !ele.equalsIgnoreCase("Sample4"));
// breakpoint here
System.out.println(stream);
That means that this is very likely an implementation error.
Into The Code
As others have suspected, I now also think that this might be connected to flatMap being eager. More precisely, both problems might have the same root cause.
Looking into the source of WhileOps, we can see these methods:
@Override
public void accept(T t) {
    if (take = predicate.test(t)) {
        downstream.accept(t);
    }
}
@Override
public boolean cancellationRequested() {
    return !take || downstream.cancellationRequested();
}
This code is used by takeWhile to check for a given stream element t whether the predicate is fulfilled:
If so, it passes the element on to the downstream operation, in this case System.out::println.
If not, it sets take to false, so when it is asked next time whether the pipeline should be canceled (i.e. it is done), it returns true.
This covers the takeWhile operation. The other thing you need to know is that forEachOrdered leads to the terminal operation executing the method ReferencePipeline::forEachWithCancel:
@Override
final boolean forEachWithCancel(Spliterator<P_OUT> spliterator, Sink<P_OUT> sink) {
    boolean cancelled;
    do { } while (
            !(cancelled = sink.cancellationRequested())
            && spliterator.tryAdvance(sink));
    return cancelled;
}
All this does is:
check whether the pipeline was canceled
if not, advance the spliterator by one element, passing that element to the sink
stop if there was no further element
Looks promising, right?
Without flatMap
In the "good case" (without flatMap; your second example) forEachWithCancel directly operates on the WhileOp as sink and you can see how this plays out:
ReferencePipeline::forEachWithCancel does its loop:
WhileOps::accept is given each stream element
WhileOps::cancellationRequested is queried after each element
at some point "Sample4" fails the predicate and the stream is canceled
Yay!
With flatMap
In the "bad case" (with flatMap; your first example), forEachWithCancel operates on the flatMap operation instead, which simply calls forEachRemaining on the ArraySpliterator for {"Sample3", "Sample4", "Sample5"}, which does this:
if ((a = array).length >= (hi = fence) &&
(i = index) >= 0 && i < (index = hi)) {
do { action.accept((T)a[i]); } while (++i < hi);
}
Ignoring all that hi and fence stuff, which is only used if the array processing is split for a parallel stream, this is a simple loop that passes each element to the takeWhile operation but never checks whether it is cancelled. It will hence eagerly plow through all elements in that "substream" before stopping, likely even through the rest of the stream.

This is a bug no matter how I look at it - and thank you Holger for your comments. I did not want to put this answer in here (seriously!), but none of the answers clearly states that this is a bug.
People are saying that this has to do with ordered/unordered, and this is not true, as this will report true 3 times:
Stream<String[]> s1 = Arrays.stream(strArray);
System.out.println(s1.spliterator().hasCharacteristics(Spliterator.ORDERED));
Stream<String> s2 = Arrays.stream(strArray)
.flatMap(indStream -> Arrays.stream(indStream));
System.out.println(s2.spliterator().hasCharacteristics(Spliterator.ORDERED));
Stream<String> s3 = Arrays.stream(strArray)
.flatMap(indStream -> Arrays.stream(indStream))
.takeWhile(ele -> !ele.equalsIgnoreCase("Sample4"));
System.out.println(s3.spliterator().hasCharacteristics(Spliterator.ORDERED));
It's very interesting also that if you change it to:
String[][] strArray = {
{ "Sample1", "Sample2" },
{ "Sample3", "Sample5", "Sample4" }, // Sample4 is the last one here
{ "Sample7", "Sample8" }
};
then Sample7 and Sample8 will not be part of the output; otherwise they will. It seems that flatMap ignores the cancel flag that takeWhile sets.
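The difference between the two layouts can be reproduced with a small simulation. This is a sketch under assumptions, not the JDK implementation: the class and method names are hypothetical, the outer loop imitates forEachWithCancel asking for cancellation once per inner array, and the inner loop imitates forEachRemaining pushing every element into the JDK 9 style accept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical demo class that imitates the JDK 9 interplay of flatMap and takeWhile.
final class FlatMapCancelDemo {

    static List<String> simulate(String[][] nested, Predicate<String> predicate) {
        List<String> out = new ArrayList<>();
        boolean take = true;
        for (String[] inner : nested) {
            if (!take) {                      // cancellation is only checked between inner arrays
                break;
            }
            for (String s : inner) {          // the inner array is pushed eagerly
                if (take = predicate.test(s)) {   // JDK 9 style: flag overwritten each time
                    out.add(s);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Predicate<String> notSample4 = s -> !s.equalsIgnoreCase("Sample4");
        String[][] sample4Last = {
            {"Sample1", "Sample2"}, {"Sample3", "Sample5", "Sample4"}, {"Sample7", "Sample8"}};
        String[][] sample4Middle = {
            {"Sample1", "Sample2"}, {"Sample3", "Sample4", "Sample5"}, {"Sample7", "Sample8"}};
        System.out.println(simulate(sample4Last, notSample4));
        // [Sample1, Sample2, Sample3, Sample5]
        System.out.println(simulate(sample4Middle, notSample4));
        // [Sample1, Sample2, Sample3, Sample5, Sample7, Sample8]
    }
}
```

When "Sample4" is the last element of its array, the flag is still false when the outer loop asks, so the pipeline stops. When another element follows it, the flag has been switched back on by then and the remaining arrays are processed too.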

If you look at the documentation for takeWhile:
if this stream is ordered, [returns] a stream consisting of the
longest prefix of elements taken from this stream that match the given
predicate.
if this stream is unordered, [returns] a stream consisting of a subset
of elements taken from this stream that match the given predicate.
Your stream is coincidentally ordered, but takeWhile doesn't know that it is. As such, it falls under the second case and returns a subset. Your takeWhile is just acting like a filter.
If you add a call to sorted before takeWhile, you'll see the result you expect:
Arrays.stream(strArray)
.flatMap(indStream -> Arrays.stream(indStream))
.sorted()
.takeWhile(ele -> !ele.equalsIgnoreCase("Sample4"))
.forEach(ele -> System.out.println(ele));

The reason for that is the combination of flatMap, itself an intermediate operation, with the stateful short-circuiting intermediate operation takeWhile.
The behavior of flatMap, as pointed out by Holger in this answer, is worth reading up on to understand the unexpected output of such short-circuiting operations.
You can achieve the expected result by separating the two intermediate operations: insert a terminal operation that collects into an ordered list, then run takeWhile on a new stream over that list, for example:
List<String> sampleList = Arrays.stream(strArray).flatMap(Arrays::stream).collect(Collectors.toList());
sampleList.stream().takeWhile(ele -> !ele.equalsIgnoreCase("Sample4"))
.forEach(System.out::println);
Also, a related bug, JDK-8075939, already appears to be registered for this behavior.
Edit: This can be tracked further at JDK-8193856, which has been accepted as a bug.

How to set a value to variable based on multiple conditions using Java Streams API?

I couldn't wrap my head around writing the below condition using Java Streams. Let's assume that I have a list of elements from the periodic table. I've to write a method that returns a String by checking whether the list has Silicon or Radium or Both. If it has only Silicon, method has to return Silicon. If it has only Radium, method has to return Radium. If it has both, method has to return Both. If none of them are available, method returns "" (default value).
Currently, the code that I've written is below.
String resolve(List<Element> elements) {
    AtomicReference<String> value = new AtomicReference<>("");
    elements.stream()
        .map(Element::getName)
        .forEach(name -> {
            if (name.equalsIgnoreCase("RADIUM")) {
                if (value.get().equals("")) {
                    value.set("RADIUM");
                } else {
                    value.set("BOTH");
                }
            } else if (name.equalsIgnoreCase("SILICON")) {
                if (value.get().equals("")) {
                    value.set("SILICON");
                } else {
                    value.set("BOTH");
                }
            }
        });
    return value.get();
}
I understand the code looks messy and more imperative than functional, but I don't know how to write it better using streams. I've also considered going through the list a couple of times to filter for Silicon and Radium and deciding based on that, but going through a list twice doesn't seem efficient.
NOTE : I also understand that this could be written in an imperative manner rather than complicating with streams and atomic variables. I just want to know how to write the same logic using streams.
Please share your suggestions on better ways to achieve the same goal using Java Streams.
It could be done with the Stream API in a single statement, without multiline lambdas, nested conditions, or an impure function that changes state outside the lambda.
My approach is to introduce an enum whose constants correspond to all possible outcomes: EMPTY, SILICON, RADIUM, BOTH.
All the return values apart from the empty string can be obtained by invoking the method name() inherited from java.lang.Enum. Only to cover the case of the empty string, I've added a getName() method.
Note that since Java 16 enums can be declared locally inside a method.
The logic of the stream pipeline is the following:
the stream of elements turns into a stream of strings;
it gets filtered and transformed into a stream of enum constants;
reduction is done on the enum members;
the optional of enum turns into an optional of string.
Implementation can look like this:
public static String resolve(List<Element> elements) {
return elements.stream()
.map(Element::getName)
.map(String::toUpperCase)
.filter(str -> str.equals("SILICON") || str.equals("RADIUM"))
.map(Elements::valueOf)
.reduce((result, next) -> result == Elements.BOTH || result != next ? Elements.BOTH : next)
.map(Elements::getName)
.orElse("");
}
enum
enum Elements {EMPTY, SILICON, RADIUM, BOTH;
String getName() {
return this == EMPTY ? "" : name(); // note name() declared in the java.lang.Enum as final and can't be overridden
}
}
main
public static void main(String[] args) {
System.out.println(resolve(List.of(new Element("Silicon"), new Element("Lithium"))));
System.out.println(resolve(List.of(new Element("Silicon"), new Element("Radium"))));
System.out.println(resolve(List.of(new Element("Ferrum"), new Element("Oxygen"), new Element("Aurum")))
.isEmpty() + " - no target elements"); // output is an empty string
}
output
SILICON
BOTH
true - no target elements
Note:
Although streams can produce the result in O(n) time, an iterative approach might be better for this task. Think about it this way: if you have a list of 10,000 elements and it starts with "SILICON" and "RADIUM", a loop could break right there and return "BOTH".
Stateful operations in streams have to be avoided according to the documentation; to understand why the javadoc warns against stateful streams, you might take a look at this question. If you want to play around with AtomicReference, that's totally fine, just keep in mind that this approach is not considered good practice.
If I had to implement such a method with streams, the overall logic would be the same as above, but without an enum. Since only a single object is needed, it's a reduction, so I would apply reduce() to a stream of strings and extract the reduction logic with all its conditions into a separate method. Normally, lambdas should be well-readable one-liners.
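The two ideas from the notes above, a reduce() over strings with the logic extracted into a method and a loop that returns early, could look roughly like this. It is a sketch with assumptions: the class and method names are made up, and the methods take a list of names directly instead of the question's Element objects so that the example is self-contained.

```java
import java.util.List;

// Hypothetical demo class; names are passed as plain strings instead of Element objects.
final class ResolveDemo {

    // Reduction logic extracted into a named method. The empty string is the identity,
    // and two different non-empty values always combine to "BOTH".
    static String merge(String result, String next) {
        if (result.isEmpty()) {
            return next;
        }
        if (next.isEmpty()) {
            return result;
        }
        return result.equals(next) ? result : "BOTH";
    }

    static String resolveWithReduce(List<String> names) {
        return names.stream()
                .map(String::toUpperCase)
                .filter(n -> n.equals("SILICON") || n.equals("RADIUM"))
                .reduce("", ResolveDemo::merge);
    }

    // Iterative variant that stops as soon as both elements have been seen.
    static String resolveWithLoop(List<String> names) {
        boolean silicon = false;
        boolean radium = false;
        for (String name : names) {
            if (name.equalsIgnoreCase("SILICON")) {
                silicon = true;
            } else if (name.equalsIgnoreCase("RADIUM")) {
                radium = true;
            }
            if (silicon && radium) {
                return "BOTH";                // early exit, the rest of the list is skipped
            }
        }
        return silicon ? "SILICON" : radium ? "RADIUM" : "";
    }

    public static void main(String[] args) {
        System.out.println(resolveWithReduce(List.of("Silicon", "Lithium"))); // SILICON
        System.out.println(resolveWithLoop(List.of("Silicon", "Radium")));    // BOTH
    }
}
```

The reduce variant always visits the whole list, while the loop can return after the second relevant element.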
Collect the strings to a unique set. Then check containment in constant time.
Set<String> names = elements.stream().map(Element::getName).map(String::toLowerCase).collect(toSet());
boolean hasSilicon = names.contains("silicon");
boolean hasRadium = names.contains("radium");
String result = "";
if (hasSilicon && hasRadium) {
result = "BOTH";
} else if (hasSilicon) {
result = "SILICON";
} else if (hasRadium) {
result = "RADIUM";
}
return result;
I used a predicate in filter to keep only Radium and Silicon, and I print the result based on the resulting set.
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
public class Test {
public static void main(String[] args) {
List<Element> elementss = new ArrayList<>();
Set<String> stringSet = elementss.stream().map(e -> e.getName())
.filter(string -> (string.equals("Radium") || string.equals("Silicon")))
.collect(Collectors.toSet());
if(stringSet.size()==2){
System.out.println("both");
}else if(stringSet.size()==1){
System.out.println(stringSet);
}else{
System.out.println(" ");
}
}
}
You could save a few lines if you use regex, but I doubt if it is better than the other answers:
String resolve(List<Element> elements) {
    String result = elements.stream()
        .map(Element::getName)
        .map(String::toUpperCase)
        .filter(str -> str.matches("RADIUM|SILICON"))
        .distinct() // without this, duplicates would yield e.g. "RADIUMRADIUM"
        .sorted()
        .collect(Collectors.joining());
    return result.equals("RADIUMSILICON") ? "BOTH" : result;
}

How to return void in stream?

I have a List of sending orders. An order's counter should be incremented when its method name matches the parameter.
But it is not working, because the stream has no terminal operation.
List<SendingOrdres> sendingOrders = new ArrayList<SendingOrdres>();
private void countUpOrResetSendingOrders(String method) {
sendingOrders.stream()
.filter((e) -> {
System.out.println("filter:"+e);
return e.getMethod().equals(method);
})
.peek((e) -> System.out.println("peek:"+e)) //For check
.map((e)->{
int nextNowSendingOrder = e.getNowSendingOrder()+1;
if(nextNowSendingOrder > e.getMaxSendingOrder()) {
e.setNowSendingOrder(0);
}else {
e.setNowSendingOrder(nextNowSendingOrder);
}
return e;
});
// no terminal operation
}
When I added a terminal operation to the code above, it worked:
.collect(Collectors.toList());
Here is my question: I don't need a return value, so I want to return void.
But without a terminal operation, the stream does nothing.
How do I return void from a stream?
A stream pipeline consists of two mandatory parts (source, terminal operation) and one optional part (intermediate operations).
A stream:
is generated by a sourcing operation (something that creates the Stream<T> instance);
is then optionally continued with one or more chained intermediate operations;
is finally terminated with a terminal operation.
void can only be the return type of the terminal operation (and hence of its lambda or method reference), because every intermediate operation has to return a stream on which the subsequent intermediate or terminal operation operates.
For example:
List.of(1, 2, 3, 4)
.stream() //sourcing the stream
.forEach(System.out::println); //terminating the stream
is OK, because println just consumes the stream and doesn't have to return another stream.
List.of(1, 2, 3, 4)
.stream() //sourcing the stream
.filter(System.out::println); //ouch..
however, does not compile, because filter expects a Predicate that returns boolean, whereas println returns void.
Additionally, beware that the Stream API in Java is lazy: intermediate operations are not evaluated until the terminal operation is executed.
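Applied to the question, a void terminal operation such as forEach can do the update directly, without collecting into a list that is thrown away. The following is only a sketch: Order is a hypothetical stand-in for the question's order class, reduced to the three fields the logic needs.

```java
import java.util.List;

// Hypothetical demo class; Order stands in for the order type from the question.
final class VoidTerminalDemo {

    static final class Order {
        final String method;
        final int max;
        int now;

        Order(String method, int now, int max) {
            this.method = method;
            this.now = now;
            this.max = max;
        }
    }

    // forEach is a terminal operation that returns void, so the pipeline actually runs.
    static void countUpOrReset(List<Order> orders, String method) {
        orders.stream()
                .filter(o -> o.method.equals(method))
                .forEach(o -> o.now = (o.now + 1 > o.max) ? 0 : o.now + 1);
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("GET", 0, 2), new Order("GET", 2, 2), new Order("POST", 1, 2));
        countUpOrReset(orders, "GET");
        for (Order o : orders) {
            System.out.println(o.method + " " + o.now);
        }
        // GET 1
        // GET 0
        // POST 1
    }
}
```

Since the elements are only mutated and nothing is transformed, a plain for loop over the list would do the same job without a stream.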

Guava's Streams::findLast implementation

I am looking into the implementation of Streams::findLast from Guava, and while trying to understand it, there were a couple of things that I simply could not grasp. Here is its implementation:
public static <T> java.util.Optional<T> findLast(Stream<T> stream) {
class OptionalState {
boolean set = false;
T value = null;
void set(@Nullable T value) {
set = true;
this.value = value;
}
T get() {
checkState(set);
return value;
}
}
OptionalState state = new OptionalState();
Deque<Spliterator<T>> splits = new ArrayDeque<>();
splits.addLast(stream.spliterator());
while (!splits.isEmpty()) {
Spliterator<T> spliterator = splits.removeLast();
if (spliterator.getExactSizeIfKnown() == 0) {
continue; // drop this split
}
// Many spliterators will have trySplits that are SUBSIZED even if they are not themselves
// SUBSIZED.
if (spliterator.hasCharacteristics(Spliterator.SUBSIZED)) {
// we can drill down to exactly the smallest nonempty spliterator
while (true) {
Spliterator<T> prefix = spliterator.trySplit();
if (prefix == null || prefix.getExactSizeIfKnown() == 0) {
break;
} else if (spliterator.getExactSizeIfKnown() == 0) {
spliterator = prefix;
break;
}
}
// spliterator is known to be nonempty now
spliterator.forEachRemaining(state::set);
return java.util.Optional.of(state.get());
}
Spliterator<T> prefix = spliterator.trySplit();
if (prefix == null || prefix.getExactSizeIfKnown() == 0) {
// we can't split this any further
spliterator.forEachRemaining(state::set);
if (state.set) {
return java.util.Optional.of(state.get());
}
// fall back to the last split
continue;
}
splits.addLast(prefix);
splits.addLast(spliterator);
}
return java.util.Optional.empty();
}
In essence the implementation is not that complicated to be honest, but here are the things that I find a bit weird (and I'll take the blame here if this question gets closed as "opinion-based", I understand it might happen).
First of all is the creation of OptionalState class, this could have been replaced with an array of a single element:
T[] state = (T[]) new Object[1];
and used as simply as:
spliterator.forEachRemaining(x -> state[0] = x);
Then the entire method could be split into 3 pieces:
when a certain Spliterator is known to be empty:
if (spliterator.getExactSizeIfKnown() == 0)
In this case it's easy - just drop it.
then if the Spliterator is known to be SUBSIZED. This is the "happy-path" scenario; as in this case we can split this until we get to the last element. Basically the implementation says: split until the prefix is either null or it's empty (in which case consume the "right" spliterator) or if after a split the "right" spliterator is known to be empty, consume the prefix one. This is done via:
// spliterator is known to be nonempty now
spliterator.forEachRemaining(state::set);
return java.util.Optional.of(state.get());
Second question I have is actually about this comment:
// Many spliterators will have trySplits that are SUBSIZED
// even if they are not themselves SUBSIZED.
This is very interesting, but I could not find such an example; I would appreciate it if someone could show me one. As a matter of fact, because of what this comment says, the next (third) part of the method cannot use a while(true) loop like the second part: it assumes that after a trySplit we could obtain a Spliterator that is SUBSIZED even if our initial one was not, so it has to go back to the very beginning of findLast.
The third part of the method handles a Spliterator that is known not to be SUBSIZED and thus has no known size; it relies on how the Spliterator from the source is implemented, and in this case findLast actually makes little sense... for example, a Spliterator from a HashSet will return whatever the last entry in the last bucket is...
When you iterate a Spliterator of an unknown size, you have to track whether an element has been encountered. This can be done by calling tryAdvance and using the return value or by using forEachRemaining with a Consumer which records whether an element has been encountered. When you go the latter route, a dedicated class is simpler than an array. And once you have a dedicated class, why not use it for the SIZED spliterator as well.
What’s strange to me, is that this local class, which only exists to be used as a Consumer, doesn’t implement Consumer but requires the binding via state::set.
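As a rough illustration of that point, the state holder can implement Consumer itself, so it can be handed to forEachRemaining without a method reference. This is a sketch, not Guava's code: the class and method names are hypothetical, and it only covers the plain traversal, not the splitting logic.

```java
import java.util.List;
import java.util.Optional;
import java.util.Spliterator;
import java.util.function.Consumer;

// Hypothetical demo class; it covers only the "traverse and remember the last element" part.
final class LastElementDemo {

    static <T> Optional<T> lastByTraversal(Spliterator<T> spliterator) {
        // Local class that records both the value and whether any element was seen.
        class State implements Consumer<T> {
            boolean set;
            T value;

            @Override
            public void accept(T t) {
                set = true;
                value = t;
            }
        }
        State state = new State();
        spliterator.forEachRemaining(state);  // no state::set binding needed
        // Like the original, Optional.of rejects a null last element with an exception.
        return state.set ? Optional.of(state.value) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(lastByTraversal(List.of("a", "b", "c").spliterator())); // Optional[c]
        System.out.println(lastByTraversal(List.<String>of().spliterator()));      // Optional.empty
    }
}
```

The separate set flag is what a single-element array cannot express on its own: it distinguishes "no element seen" from "nothing stored yet".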
Consider
Stream.concat(
Stream.of("foo").filter(s -> !s.isEmpty()),
Stream.of("bar", "baz"))
The Spliterator representing the entire stream can’t have the SIZED characteristic. But when splitting off the first substream with the unknown size, the remaining stream has a known size.
Test code:
Spliterator<String> sp = Stream.concat(
Stream.of("foo").filter(s -> !s.isEmpty()),
Stream.of("bar", "baz"))
.spliterator();
do {
System.out.println(
"SIZED: "+sp.hasCharacteristics(Spliterator.SIZED)
+ ", SUBSIZED: "+sp.hasCharacteristics(Spliterator.SUBSIZED)
+ ", exact size if known: "+sp.getExactSizeIfKnown());
} while(sp.trySplit() != null);
Result:
SIZED: false, SUBSIZED: false, exact size if known: -1
SIZED: true, SUBSIZED: true, exact size if known: 2
SIZED: true, SUBSIZED: true, exact size if known: 1
But to me, it looks weird to state in a comment that splitting can change the characteristics and then do a pre-test with SUBSIZED, instead of just doing the split and checking whether the result has a known size. After all, the code is doing the split anyway, in the alternative branch, when the characteristic is not present. In my old answer, I did the pre-test to avoid allocating data structures, but here, the ArrayDeque is always created and used. But I think even my old answer could be simplified.
I’m not sure what you are aiming at. When a Spliterator has the ORDERED characteristic, the order of traversal and splitting is well-defined. Since HashSet is not ordered, the term “last” is meaningless. If you are radical, you could optimize the operation to just return the first element for unordered streams; that’s valid and much faster.
What is strange, is this condition:
if (prefix == null || prefix.getExactSizeIfKnown() == 0) {
// we can't split this any further
(and a similar loop termination in the SUBSIZED path)
Just because one prefix happened to have a known zero size, it does not imply that the suffix can’t split further. Nothing in the specification says that.
As a consequence of this condition, Stream.concat(Stream.of("foo"), Stream.of("bar","baz")) can be handled optimally, whereas for Stream.concat(Stream.of(), Stream.of("bar", "baz")), it will fall back to a traversal, because the first prefix has a known size of zero.

Fail(or rather, Return) fast using functional java

Apologies for the badly chosen title, couldn't think of a better title. Please feel free to suggest a new title.
I want to write the following piece of code using Java 8 lambda expressions
List<Function<MyClass, Optional<String>>> functions //assume initialized
for (final Function<MyClass, Optional<String>> function : functions) {
final Optional<String> result = function.apply(objectOfMyClass);
if (result.isPresent()) { //as soon as one of the function returns a non-null string, return
return result; //
}
}
Any suggestions on how to go about it?
functions.stream()
.map(f -> f.apply(objectOfMyClass))
.filter(Optional::isPresent)
.map(Optional::get)
.findAny()
findAny is a short-circuiting terminal operation, which means that the pipeline stops as soon as it finds a present result. If you need the first match in list order, as in the original loop, use findFirst instead.
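A small check makes the short-circuiting visible by counting how many functions are actually applied. This is a sketch with made-up names (FirstPresentDemo, firstPresent), and it uses findFirst so that the result matches the order of the original loop.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical demo class for the "return the first present result" pipeline.
final class FirstPresentDemo {

    static <T> Optional<String> firstPresent(List<Function<T, Optional<String>>> functions, T input) {
        return functions.stream()
                .map(f -> f.apply(input))
                .filter(Optional::isPresent)
                .map(Optional::get)
                .findFirst();
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();   // counts applied functions
        List<Function<String, Optional<String>>> functions = List.of(
                s -> { calls.incrementAndGet(); return Optional.empty(); },
                s -> { calls.incrementAndGet(); return Optional.of(s + "!"); },
                s -> { calls.incrementAndGet(); return Optional.of("never reached"); });

        System.out.println(firstPresent(functions, "hit")); // Optional[hit!]
        System.out.println(calls.get());                    // 2
    }
}
```

On a sequential stream the third function is never applied, because the elements are pulled through the pipeline one at a time and findFirst stops after the first match.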

Terminate Iterable.forEach early [duplicate]

I have a set and a method:
private static Set<String> set = ...;
public static String method(){
final String returnVal[] = new String[1];
set.forEach((String str) -> {
returnVal[0] += str;
//if something: goto mark
});
//mark
return returnVal[0];
}
Can I terminate the forEach inside the lambda (with or without using exceptions)?
Should I use an anonymous class?
I could do this:
set.forEach((String str) -> {
if(someConditions()){
returnVal[0] += str;
}
});
but it wastes time.
implementation using stream.reduce
return set.parallelStream().reduce((output, next) -> {
return someConditions() ? next : output;
}).get(); //should avoid empty set before
I am looking for the fastest solution so exception and a 'real' for each loop are acceptable if they are fast enough.
I'm reluctant to answer this because I'm not entirely sure what you're attempting to accomplish, but the simple answer is no: you can't terminate a forEach when it's halfway through processing elements.
The official Javadoc states that it is a terminal operation that applies against all elements in the stream.
Performs an action for each element of this stream.
This is a terminal operation.
If you want to gather the results into a single result, you want to use reduction instead.
Be sure to consider what it is a stream is doing. It is acting on all elements contained in it - and if it's filtered along the way, each step in the chain can be said to act on all elements in its stream, even if it's a subset of the original.
In case you were curious as to why simply putting a return wouldn't have any effect, here's the implementation of forEach.
default void forEach(Consumer<? super T> action) {
Objects.requireNonNull(action);
for (T t : this) {
action.accept(t);
}
}
The consumer is explicitly passed in, and this is done independently of the actual iteration going on. I imagine you could throw an exception, but that would be tacky when more elegant solutions likely exist.
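Two of those alternatives can be sketched as follows: a plain loop with break, and takeWhile (Java 9 or later) on an ordered source. The names are hypothetical, and the stop condition is assumed to be a predicate on the current element that is checked before appending, which may differ from what the question intends.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical demo class; "stop" is an assumed per-element stop condition.
final class EarlyExitDemo {

    // Plain loop: break leaves the iteration as soon as the condition holds.
    static String joinWithLoop(Iterable<String> items, Predicate<String> stop) {
        StringBuilder sb = new StringBuilder();
        for (String s : items) {
            if (stop.test(s)) {
                break;
            }
            sb.append(s);
        }
        return sb.toString();
    }

    // Stream variant (Java 9+): takeWhile keeps the prefix before the first stop element.
    static String joinWithTakeWhile(List<String> items, Predicate<String> stop) {
        return items.stream()
                .takeWhile(stop.negate())
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        List<String> items = List.of("a", "b", "STOP", "c");
        System.out.println(joinWithLoop(items, "STOP"::equals));      // ab
        System.out.println(joinWithTakeWhile(items, "STOP"::equals)); // ab
    }
}
```

For an unordered source such as a HashSet, "everything before the stop element" has no fixed meaning, so the result of either variant depends on the iteration order.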
