RxJava groupBy and subsequent blocking operations (onComplete missing?) - java

I have boiled my problem down into the following snippet:
Observable<Integer> numbers = Observable.just(1, 2, 3);
Observable<GroupedObservable<Integer,Integer>> outer = numbers.groupBy(i->i%3);
System.out.println(outer.count().toBlocking().single());
which blocks indefinitely. I've been reading several posts and believe I understand the problem: a GroupedObservable will not call onComplete until its inner Observables have also completed. Unfortunately, though, I still can't get the above snippet to print!
For example, the following:
Observable<Integer> just = Observable.just(1, 2, 3);
Observable<GroupedObservable<Integer,Integer>> groupBy = just.groupBy(i->i%3);
groupBy.subscribe(inner -> inner.ignoreElements());
System.out.println(groupBy.count().toBlocking().single());
still does nothing. Have I misunderstood the problem? Is there another problem? In short, how can I get the above snippets to work?
Many thanks in advance,
Dan.

Yes, you have to consume the groups in some fashion. Your second example doesn't work because you have two independent subscriptions to the grouping operation.
Usually the solution is flatMap, but not with ignoreElements, because that will just complete and count won't receive any elements. Instead, you can use takeLast(1):
Observable.just(1, 2, 3)
    .groupBy(k -> k % 3)
    .flatMap(g -> g.takeLast(1))
    .count()
    .toBlocking()
    .forEach(System.out::println);
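For completeness, here is a minimal sketch of a variant (RxJava 1.x, same data) that reports the size of each group instead; every group is consumed by its own count(), so each inner Observable completes and the whole sequence can terminate. The "key ... ->" label format is just for illustration:
Observable.just(1, 2, 3)
    .groupBy(k -> k % 3)
    // count() fully consumes each group, which lets the group complete
    .flatMap(g -> g.count().map(c -> "key " + g.getKey() + " -> " + c + " element(s)"))
    .toBlocking()
    .forEach(System.out::println);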

Related

Flux takeUntil only takes one element

Can someone help me understand what is going on here with Flux's takeUntil operator?
Flux.just(1, 2, 3, 4, 5)
    .takeUntil { it < 4 }
    .map { println("Flux:$it") }
    .subscribe()
In the console, the only thing that is printed is:
Flux:1
I expected to see
Flux:1
Flux:2
Flux:3
Why do I only see one element?
Note that you are using the takeUntil() operator:
Relay values from this Flux until the given Predicate matches. This includes the matching data (unlike takeWhile(java.util.function.Predicate<? super T>)).
— Flux (reactor-core 3.4.22).
Note the word «until»: the predicate it < 4 already matches for the very first element (1 < 4 is true), so the sequence stops right there, and only that matching element is emitted.
To achieve the desired behavior, consider using the takeWhile() operator instead:
Relay values from this Flux while a predicate returns TRUE for the values (checked before each value is delivered). This only includes the matching data (unlike takeUntil(java.util.function.Predicate<? super T>)).
— Flux (reactor-core 3.4.22).
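The snippet in the question is Kotlin; a minimal Java sketch of the fix (same values, assuming a plain doOnNext for the printing side effect) would be:
Flux.just(1, 2, 3, 4, 5)
    .takeWhile(i -> i < 4)   // stops before the first value that fails the predicate
    .doOnNext(i -> System.out.println("Flux:" + i))
    .subscribe();
// prints Flux:1, Flux:2, Flux:3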

Bug in parallelStream in java

Can someone tell me why this is happening, and whether it's expected behaviour or a bug?
List<Integer> a = Arrays.asList(1, 1, 3, 3);
a.parallelStream()
    .filter(Objects::nonNull)
    .filter(value -> value > 2)
    .reduce(1, Integer::sum)
Answer: 10
But if we use stream() instead of parallelStream(), I get the right and expected answer: 7.
The first argument to reduce is called "identity", not "initialValue".
1 is not an identity element for addition; 1 is the identity for multiplication. If you want to sum the elements, you need to provide 0.
Java uses an "identity" rather than an "initialValue" because this little trick allows reduce to be parallelized easily.
In a parallel execution, each thread runs the reduce on a part of the stream, and when the threads are done, their results are combined using the very same reduce function.
Schematically, the parallel execution looks something like this:
mainThread:
    start thread1;
    start thread2;
    wait till both are finished;
thread1:
    return sum(1, 3); // your reduce function applied to a part of the stream
thread2:
    return sum(1, 3);
// when thread1 and thread2 are finished:
mainThread:
    return sum(sum(1, resultOfThread1), sum(1, resultOfThread2));
    = sum(sum(1, 4), sum(1, 4))
    = sum(5, 5)
    = 10
(The exact splitting varies, but the point is the same: the non-identity seed 1 is folded in once per chunk, so it gets counted more than once.)
I hope you can see what happens and why the result is not what you expected.
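A minimal runnable sketch of the fix, assuming summation is the goal: with 0 as the identity element, sequential and parallel execution agree.
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class ReduceIdentityDemo {
    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 1, 3, 3);

        // 0 is the identity for addition: sum(0, x) == x for every x,
        // so it is safe as the seed of every parallel chunk.
        int sum = a.parallelStream()
                .filter(Objects::nonNull)
                .filter(value -> value > 2)
                .reduce(0, Integer::sum);

        System.out.println(sum); // 6, with or without parallelStream()
    }
}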

Java Reactor StepVerifier.withVirtualTime loop: repeatedly check with "expectNoEvent()", "expectNext()" and "thenAwait()"

I am learning Reactor on Java 11. I have seen this example:
StepVerifier.withVirtualTime(() -> Flux.interval(Duration.ofSeconds(1)).take(3600))
    .expectSubscription()
    .expectNextCount(3600);
This example just checks that a Flux<Long> that emits an incrementing value every second for an hour produces 3600 elements in total.
But, is there any way to check the counter repeatedly after every second?
I know this:
.expectNoEvent(Duration.ofSeconds(1))
.expectNext(0L)
.thenAwait(Duration.ofSeconds(1))
But I have seen no way to repeatedly check this after every second, like:
.expectNoEvent(Duration.ofSeconds(1))
.expectNext(i)
.thenAwait(Duration.ofSeconds(1))
where i increments up to 3600. Is there?
PS:
I tried adding verifyComplete() at the end, but in a long-running test it never finishes. Do I have to call it, or can I just leave it out?
You can achieve what you want by using expectNextSequence. You have to pass an Iterable containing every element you expect to arrive. See my example below:
var longRange = LongStream.range(0, 3600)
        .boxed()
        .collect(Collectors.toList());

StepVerifier
        .withVirtualTime(() -> Flux.interval(Duration.ofSeconds(1)).take(3600))
        .expectSubscription()
        .thenAwait(Duration.ofHours(1))
        .expectNextSequence(longRange)
        .expectComplete()
        .verify();
If you don't add verifyComplete() or expectComplete().verify(), the JUnit test won't wait for elements to arrive from the Flux and will simply terminate.
For further reference see the JavaDoc of verify():
this method will block until the stream has been terminated
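If you really do want an assertion after every virtual second, one possible sketch (untested; it just builds the expectNoEvent/expectNext pairs in an ordinary loop on StepVerifier.Step) is:
StepVerifier.Step<Long> step = StepVerifier
        .withVirtualTime(() -> Flux.interval(Duration.ofSeconds(1)).take(3600))
        .expectSubscription();
for (long i = 0; i < 3600; i++) {
    // each tick arrives exactly one virtual second after the previous one
    step = step.expectNoEvent(Duration.ofSeconds(1))
            .expectNext(i);
}
step.verifyComplete(); // virtual time, so this returns immediately rather than waiting an hour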

Infinite loop JavaRDD<String> spark 1.6

I'm trying to iterate over a JavaRDD and find an element by applying a method that uses this RDD, and then I should delete it.
Here is my code:
items = input.map(x -> {
        min = getMin(input);
        return min;
    })
    .filter(x -> !Domine(x, min));
But there is no result; it seems to be an infinite loop. How can I fix it? Thanks.
Implementations like this one (same as Java 8 streams or Kotlin sequences) are lazy, so you need to perform a terminal operation; only then will the work be done.
So if you do a filter and end there, nothing will happen, since you didn't perform any terminal operation. Use, for example, first(), take(1), foreach(...) or any other action from Spark's documented list of actions.
From the very vague description, I believe what you require is something like the following, assuming that input is of type JavaRDD<Row>:
final Row min = input.min((row1, row2) -> {
    // TODO: replace by some real comparator implementation
    Integer row1value = row1.getInt(row1.fieldIndex("fieldName"));
    Integer row2value = row2.getInt(row2.fieldIndex("fieldName"));
    return row1value.compareTo(row2value);
});
items = input.filter(row -> !Domine(row, min));
Since Apache Spark transformations like filter are inherently lazy, to actually retrieve the values you would then have to write List<Row> collectedValues = items.collect();. I would, however, strongly recommend that .collect() never go into production, since it can be very dangerous indeed on large datasets.
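For illustration, here is a minimal self-contained sketch (untested) of the same pattern with plain integers; the data is hypothetical and the Domine check is replaced by a simple equality test:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class MinFilterDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("min-filter").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<Integer> input = sc.parallelize(Arrays.asList(5, 3, 8, 1, 9));

        // min(...) is an action: it triggers the computation immediately
        Integer min = input.min(Comparator.naturalOrder());

        // filter(...) is a lazy transformation: nothing runs until an action is called
        JavaRDD<Integer> items = input.filter(x -> !x.equals(min));

        // collect() is the action that finally materializes the result
        List<Integer> result = items.collect();
        System.out.println(result); // [5, 3, 8, 9]

        sc.stop();
    }
}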

Why can't a stream of streams be reduced in parallel? / stream has already been operated upon or closed

Context
I've stumbled upon a rather annoying problem: I have a program with many data sources that can stream the same type of elements, and I want to "map" each available element in the program (element order doesn't matter).
Therefore I've tried to reduce my Stream<Stream<T>> streamOfStreamOfT; into a simple Stream<T> streamOfT; using streamOfT = streamOfStreamOfT.reduce(Stream.empty(), Stream::concat);
Since element order is not important to me, I've tried to parallelize the reduce operation with a .parallel(): streamOfT = streamOfStreamOfT.parallel().reduce(Stream.empty(), Stream::concat); But this triggers a java.lang.IllegalStateException: stream has already been operated upon or closed
Example
To experience it yourself, just play with the following main (Java 1.8u20) by commenting / uncommenting the .parallel():
public static void main(String[] args) {
    // GIVEN
    List<Stream<Integer>> listOfStreamOfInts = new ArrayList<>();
    for (int j = 0; j < 10; j++) {
        IntStream intStreamOf10Ints = IntStream.iterate(0, i -> i + 1)
                .limit(10);
        Stream<Integer> genericStreamOf10Ints = StreamSupport.stream(
                intStreamOf10Ints.spliterator(), true);
        listOfStreamOfInts.add(genericStreamOf10Ints);
    }
    Stream<Stream<Integer>> streamOfStreamOfInts = listOfStreamOfInts.stream();
    // WHEN
    Stream<Integer> streamOfInts = streamOfStreamOfInts
            // ////////////////
            // PROBLEM
            // |
            // V
            .parallel()
            .reduce(Stream.empty(), Stream::concat);
    // THEN
    System.out.println(streamOfInts.map(String::valueOf).collect(joining(", ")));
}
Question
Can someone explain this limitation? / find a better way of handling parallel reduction of a stream of streams?
Edit 1
Following @Smutje's and @LouisWasserman's comments, it seems that .flatMap(Function.identity()) is a better option that tolerates .parallel() streams.
The form of reduce you are using takes an identity value and an associative combining function. But Stream.empty() is not a value; it has state. Streams are not data structures like arrays or collections; they are carriers for pushing data through possibly-parallel aggregate operations, and they have some state (like whether the stream has been consumed or not.) Think about how this works; you're going to build a tree where the same "empty" stream appears in more than one leaf. When you try to use this stateful not-an-identity twice (which won't happen sequentially, but will happen in parallel), the second time you try and traverse through that empty stream, it will quite correctly be seen to be already used.
So the problem is, you're simply using this reduce method incorrectly. The problem is not with the parallelism; it is simply that the parallelism exposed the underlying problem.
Secondly, even if this "worked" the way you think it should, you would only get parallelism building the tree that represents the flattened stream-of-streams; when you go to do the joining, that's a sequential stream pipeline there. Oops.
Thirdly, even if this "worked" the way you think it should, you're going to add a lot of element-access overhead by building up concatenated streams, and you're not going to get the benefit of parallelism that you are seeking.
The simple answer is to flatten the streams:
String joined = streamOfStreams.parallel()
        .flatMap(s -> s)
        .collect(joining(", "));
