parallel() does not work with the 3-arg iterate() - Java

Here is my code:
int count = 20;
Stream.iterate(4, i -> i < count, i -> i + 1).parallel().forEach(i -> {
    try {
        Thread.sleep(new Random().nextInt(2000));
    } catch (InterruptedException ignored) {
    }
    System.out.println(i);
});
I expected this to run in parallel, but it doesn't!
When I run it, parallel() seems "dead": the numbers from 4 to 19 come out in a perfect sequence, which is not what I want.
So I modified the iterate part like this:
.iterate(4, i -> i + 1).limit(count - 4)
With that change, parallel() works again.
So, why? The iterate-limit-parallel combo seems silly.
...Or is there in fact a reason?
Please help me.
Thanks!
PS: I printed the thread ID; in the first case it always prints "main", in the second case at least 5 threads show up.
PPS: 2000 was the number I first tried when I hit this weird sequential behavior a few minutes ago; I've since tried several larger numbers (even Integer.MAX_VALUE) and it still runs sequentially.

It's very hard to construct a meaningful spliterator for a stream generated by iterate() with 3 arguments. For example, if you have a simple stream of integers from 1 to 1000, you can easily split it into two streams, one from 1 to 500 and another from 501 to 1000, and process them in parallel. But consider this example:
Stream.iterate(4, i -> i < Integer.MAX_VALUE, i -> i + ThreadLocalRandom.current().nextInt())
How would you make this stream parallel? Which number should the second thread start from? There is no way to split this stream into parts effectively without calculating all of its elements first.
When you add limit() to the stream, the results of the upstream are buffered, so limit() can provide a non-trivial spliterator over its internal buffer. However, no stream is guaranteed to do anything special in parallel; parallel() is just a recommendation.
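A quick way to observe the difference yourself (a sketch, assuming Java 9+ for the 3-arg iterate; the printed thread names are what reveal the behavior):
import java.util.stream.Stream;

public class IterateParallelDemo {
    public static void main(String[] args) {
        int count = 20;
        // 3-arg iterate: its spliterator cannot split usefully,
        // so this typically stays on the calling thread
        Stream.iterate(4, i -> i < count, i -> i + 1)
              .parallel()
              .forEach(i -> System.out.println(i + " on " + Thread.currentThread().getName()));
        // 2-arg iterate + limit: elements get buffered into chunks that can be split,
        // so ForkJoinPool worker threads show up
        Stream.iterate(4, i -> i + 1)
              .limit(count - 4)
              .parallel()
              .forEach(i -> System.out.println(i + " on " + Thread.currentThread().getName()));
    }
}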

Related

Reactor Flux: how to parse file with header

Recently I started using Project Reactor 3.3 and I don't know what the best way is to handle a flux of lines where the first line contains the column names, and those column names are then used to process/convert all the other lines. Right now I'm doing it this way:
Flux<String> lines = ....;
Mono<String[]> columns = Mono.from(lines.take(1).map(header -> header.split(";"))); // getting the first line
Flux<SomeDto> objectFlux = lines.skip(1)          // skip the first line
    .flatMapIterable(row ->                       // iterating over the lines
        columns.map(cols -> convert(cols, row))); // convert each line into a SomeDto object
So is it the right way?
There's always more than one way to cook an egg, but the code you have there seems odd/suboptimal for two main reasons:
I'd assume it's one line per record/DTO you want to extract, so it's a bit odd that you're using flatMapIterable() rather than flatMap().
You're going to resubscribe to lines once for each line when you re-evaluate that Mono. That's almost certainly not what you want. (Caching the Mono helps, but you'd still subscribe at least twice.)
Instead, you may want to look at switchOnFirst(), which lets you dynamically transform the Flux based on its first element (the header, in your case). That means you can do something like this:
lines
    .switchOnFirst((signal, flux) -> flux.zipWith(Flux.<String[]>just(signal.get().split(";")).repeat()))
    .map(row -> convert(row.getT2(), row.getT1()))
Note this is a bare-bones example; in real-world use you'll need to check whether the signal actually has a value, as per the docs:
Note that the source might complete or error immediately instead of emitting, in which case the Signal would be onComplete or onError. It is NOT necessarily an onNext Signal, and must be checked accordingly.
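Putting that together with the value check (a sketch; SomeDto and convert(cols, row) are the placeholders from the question, and skipping the replayed header row is an assumption about the intended behavior):
Flux<SomeDto> objectFlux = lines.switchOnFirst((signal, flux) -> {
    if (!signal.hasValue()) {
        // source completed or errored before emitting a header;
        // a real implementation might propagate the error instead
        return Flux.empty();
    }
    String[] cols = signal.get().split(";");
    return flux.skip(1)                       // drop the replayed header row itself
               .map(row -> convert(cols, row));
});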

Java performance issue: Need to iterate more than 8 million records with a target-branch check

We have a system that processes a flat file and, with only a couple of validations, inserts the records into a database.
This code:
// there can be 8 million lines
for (String line : lines) {
    // obj is presumably parsed from line (parsing omitted in the question)
    if (!Class.isBranchNoValid(validBranchNoArr, obj.branchNo)) {
        continue;
    }
    list.add(line);
}
Definition of isBranchNoValid:
// the array length ranges from 2 to 5 only
public static boolean isBranchNoValid(String[] validBranchNoArr, String branchNo) {
    for (int i = 0; i < validBranchNoArr.length; i++) {
        if (validBranchNoArr[i].equals(branchNo)) {
            return true;
        }
    }
    return false;
}
The validation is at line level (we have to filter out, i.e. skip, any line whose branchNo is not in the array). Earlier there was no such filter.
Now this severe performance degradation is troubling us.
I understand (maybe I'm wrong) that this repeated function call causes a lot of stack-frame creation, resulting in very frequent GC invocations.
I can't figure out a way (if it's even possible) to perform this filtering without the high performance cost (a small difference is fine).
This is certainly not a stack problem: your function is not recursive, so nothing is kept on the stack between calls; after each call the local variables are discarded since they are no longer needed.
You can put the valid numbers in a Set and use it for some optimization, but in your case I'm not sure it will bring any benefit at all, since you have at most 5 elements.
So there are several possible bottlenecks in your scenario:
1. reading the lines of the file
2. parsing each line to construct the object to insert into the database
3. checking the applicability of the object (i.e. the branch-no filter)
4. inserting into the DB
Generally you'd say I/O is the slowest, so 1. and 4. You're saying nothing except 3. changed, right? That is weird.
Anyway, if you want to optimize that, I wouldn't pass the array around 8 million times, and I wouldn't iterate it every time either. Since your valid branches are known, create a HashSet from them; it has O(1) lookup.
Set<String> validBranches = Arrays.stream(branches)
    .collect(Collectors.toCollection(HashSet::new));
Then, iterate the lines:
for (String line : lines) {
    YourObject obj = parse(line);
    if (validBranches.contains(obj.branchNo)) {
        writeToDb(obj);
    }
}
or, in the stream version:
Files.lines(yourPath)
    .map(this::parse)
    .filter(o -> validBranches.contains(o.branchNo))
    .forEach(this::writeToDb);
I'd also check whether it's more efficient to first collect a batch of objects and then write them to the DB. It's also possible that handling the lines in parallel gains some speed, in case the parsing is time-intensive.
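A minimal sketch of the batching idea (batchSize, parse, writeBatch, and YourObject are assumed placeholders; a real writeBatch might issue a single multi-row INSERT or a JDBC batch):
int batchSize = 1_000;
List<YourObject> batch = new ArrayList<>(batchSize);
try (Stream<String> lines = Files.lines(yourPath)) {
    for (String line : (Iterable<String>) lines::iterator) {
        YourObject obj = parse(line);
        if (!validBranches.contains(obj.branchNo)) {
            continue;                  // the branch-no filter
        }
        batch.add(obj);
        if (batch.size() == batchSize) {
            writeBatch(batch);         // one DB round-trip per batch
            batch.clear();
        }
    }
}
if (!batch.isEmpty()) {
    writeBatch(batch);                 // flush the final partial batch
}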

RxJava 2 operator combination that delivers only the first Value from a bunch

The idea is: when I call publishSubject.onNext(someValue) multiple times, I need to get only one value, like the debounce operator does, except that debounce delivers the last value and I need to skip all values except the first in a burst, until I stop calling onNext() for 1 second.
I've tried something like throttleFirst(1000, TimeUnit.MILLISECONDS), but it doesn't work like debounce: it just opens a window after every delivery and, after 1 second, immediately delivers the next value.
Try this:
// Observable<T> stream = ...;
stream.window(stream.debounce(1, TimeUnit.SECONDS))
      .flatMap(w -> w.take(1));
Explanation: if I understand you correctly, you want to emit items only if none have been emitted for the 1 second prior. This is equivalent to taking the first element of each window, where the windows are bounded by the source debounced by 1 second.
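A small sketch of that combination (RxJava 2; the timing comments describe the intended behavior):
PublishSubject<Integer> subject = PublishSubject.create();
subject.window(subject.debounce(1, TimeUnit.SECONDS))
       .flatMap(w -> w.take(1))
       .subscribe(v -> System.out.println("first of burst: " + v));

subject.onNext(1); // emitted immediately: the first item of the current window
subject.onNext(2); // swallowed by take(1)
subject.onNext(3); // swallowed by take(1)
// after 1 second of silence, debounce fires and closes the window;
// the next onNext() becomes the first item of a new window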
You can use the first operator, e.g.:
observable.firstElement()
It will take only the first value. Note, though, that this is the first value the observable ever emits, not the first of each burst.

Difference between traditional imperative style of programming and functional style of programming

I have a problem statement here:
What I need to do is iterate over a list, find the first integer which is greater than 3 and even, then double it and return it.
Here are some methods that log how many operations are performed:
public static boolean isGreaterThan3(int number) {
    System.out.println("WhyFunctional.isGreaterThan3 " + number);
    return number > 3;
}
public static boolean isEven(int number) {
    System.out.println("WhyFunctional.isEven " + number);
    return number % 2 == 0;
}
public static int doubleIt(int number) {
    System.out.println("WhyFunctional.doubleIt " + number);
    return number << 1;
}
With Java 8 streams I could do it like this:
List<Integer> integerList = Arrays.asList(1, 2, 3, 5, 4, 6, 7, 8, 9, 10);
integerList.stream()
    .filter(WhyFunctional::isGreaterThan3)
    .filter(WhyFunctional::isEven)
    .map(WhyFunctional::doubleIt)
    .findFirst();
and the output is
WhyFunctional.isGreaterThan3 1
WhyFunctional.isGreaterThan3 2
WhyFunctional.isGreaterThan3 3
WhyFunctional.isGreaterThan3 5
WhyFunctional.isEven 5
WhyFunctional.isGreaterThan3 4
WhyFunctional.isEven 4
WhyFunctional.doubleIt 4
Optional[8]
so 8 operations in total.
And in the imperative style, before Java 8, I could code it like this:
for (Integer integer : integerList) {
    if (isGreaterThan3(integer)) {
        if (isEven(integer)) {
            System.out.println(doubleIt(integer));
            break;
        }
    }
}
and the output is
WhyFunctional.isGreaterThan3 1
WhyFunctional.isGreaterThan3 2
WhyFunctional.isGreaterThan3 3
WhyFunctional.isGreaterThan3 5
WhyFunctional.isEven 5
WhyFunctional.isGreaterThan3 4
WhyFunctional.isEven 4
WhyFunctional.doubleIt 4
8
and the operations are the same. So my question is: what difference does it make if I use streams rather than a traditional for loop?
The Stream API introduces the new idea of streams, which allows you to decouple a task in a new way. For example, based on your task it's possible that you want to do different things with the doubled even numbers greater than three. In one place you want to find the first one, in another place you need 10 such numbers, in a third place you want to apply more filtering. You can encapsulate the algorithm of finding such numbers like this:
static IntStream numbers() {
    return IntStream.range(1, Integer.MAX_VALUE)
        .filter(WhyFunctional::isGreaterThan3)
        .filter(WhyFunctional::isEven)
        .map(WhyFunctional::doubleIt);
}
Here it is. You've just created an algorithm to generate such numbers (without actually generating them), and you don't care how they will be used. One user might call:
int num = numbers().findFirst().get();
Another user might need to get 10 such numbers:
int[] tenNumbers = numbers().limit(10).toArray();
A third user might want to find the first matching number which is also divisible by 7:
int result = numbers().filter(n -> n % 7 == 0).findFirst().get();
It would be more difficult to encapsulate the algorithm in traditional imperative style.
In general the Stream API is not about the performance (though parallel streams may work faster than traditional solution). It's about the expressive power of your code.
The imperative style complects the computational logic with the mechanism used to achieve it (iteration). The functional style, on the other hand, decomplects the two. You code against an API to which you supply your logic and the API has the freedom to choose how and when to apply it.
In particular, the Streams API has two ways how to apply the logic: either sequentially or in parallel. The latter is actually the driving force behind the introduction of both lambdas and the Streams API itself into Java.
The freedom to choose when to perform computation gives rise to laziness: whereas in the imperative style you have a concrete collection of data, in the functional style you have a collection paired with logic to transform it. The logic can be applied "just in time", when you actually consume the data. This further allows you to spread out the building of the computation: each method can receive a stream and apply a further step of computation to it, or consume it in different ways (by collecting into a list, by finding just the first item and never applying the computation to the rest, by calculating an aggregate value, etc.).
As a particular example of the new opportunities offered by laziness, I was able to write a Spring MVC controller which returned a Stream whose data source was a database—and at the time I return the stream, the data is still in the database. Only the View layer will pull the data, implicitly applying the transformation logic it has no knowledge of, never having to retain more than a single stream element in memory. This converted a solution which classically had O(n) space complexity into O(1), thus becoming insensitive to the size of the result set.
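As a tiny illustration of that laziness (an assumed example, not from the original answer):
// building the pipeline prints nothing; the logic is merely paired with the data
Stream<Integer> pipeline = Stream.of(1, 2, 3, 4)
    .peek(i -> System.out.println("visited " + i))
    .filter(i -> i % 2 == 0);
// only the terminal operation pulls data, and findFirst() stops as soon as it can:
pipeline.findFirst(); // prints "visited 1" and "visited 2"; 3 and 4 are never touched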
Using the Stream API you are describing an operation instead of implementing it. One commonly known advantage of letting the Stream API implement the operation is the option of using different execution strategies like parallel execution (as already said by others).
Another feature which seems to be a bit underestimated is the possibility to alter the operation itself, in a way that would be impossible in an imperative programming style without modifying the code:
IntStream is = IntStream.rangeClosed(1, 10).filter(i -> i > 4);
if (evenOnly) is = is.filter(i -> (i & 1) == 0);
if (doubleIt) is = is.map(i -> i << 1);
is.findFirst().ifPresent(System.out::println);
Here, the decision whether to filter out odd numbers or double the result is made before the terminal operation is commenced. In imperative programming, you would either have to recheck the flags within the loop or code multiple alternative loops. It should be mentioned that checking such conditions within a loop isn't that bad on today's JVMs, as the optimizer is capable of moving them out of the loop at runtime, so coding multiple loops is usually unnecessary.
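For comparison, a sketch (assumed, for illustration) of what the flag-checked imperative equivalent of the stream above would look like:
Integer result = null;
for (int i = 1; i <= 10; i++) {
    if (i <= 4) continue;                     // the i > 4 filter
    if (evenOnly && (i & 1) != 0) continue;   // flag re-checked on every iteration
    result = doubleIt ? i << 1 : i;           // and again here
    break;                                    // findFirst()
}
if (result != null) System.out.println(result);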
But consider the following example:
Stream<String> s = Stream.of("java8 streams", "are cool");
if (singleWords) s = s.flatMap(Pattern.compile("\\s")::splitAsStream);
s.collect(Collectors.groupingBy(str -> str.charAt(0)))
 .forEach((k, v) -> System.out.println(k + " => " + v));
Since flatMap is the equivalent of a nested loop, coding the same in an imperative style isn't that simple anymore, as we have either a simple loop or a nested loop depending on a runtime value. Usually you have to resort to splitting the code into multiple methods if you want to share it between both kinds of loop.
I have already encountered a real-life example where the composition of a complex operation had multiple conditional flatMap steps. The equivalent imperative code is insane…
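To make the contrast concrete, here is a sketch (assumed, for illustration) of the imperative equivalent of the grouping example above; note how the grouping step has to be duplicated in both branches, which is exactly the sharing problem described:
Map<Character, List<String>> groups = new HashMap<>();
for (String s : Arrays.asList("java8 streams", "are cool")) {
    if (singleWords) {
        for (String word : s.split("\\s")) {   // the conditional nested loop
            groups.computeIfAbsent(word.charAt(0), k -> new ArrayList<>()).add(word);
        }
    } else {
        groups.computeIfAbsent(s.charAt(0), k -> new ArrayList<>()).add(s);
    }
}
groups.forEach((k, v) -> System.out.println(k + " => " + v));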
1) The functional approach allows a more declarative way of programming: you just provide a list of functions to apply and don't need to write the iteration manually, so your code is sometimes more concise.
2) If you switch to a parallel stream (https://docs.oracle.com/javase/tutorial/collections/streams/parallelism.html), your program can automatically be parallelized and execute faster. This is possible because you don't explicitly code the iteration, you just list the functions to apply, so the library/runtime can parallelize it.
In this simple example, there is little difference, and the JVM will try to do the same amount of work in each case.
Where you start to see a difference is in more complicated examples like
integerList.parallelStream()
Making a for loop concurrent is much harder. Note: you wouldn't actually do this here, as the overhead would be too high and you only want the first element.
BTW, the first example returns the result and the second prints it.

For loop with no parameters in Java

I am looking at somebody else's code and I found this piece of code:
for (;;) {
I'm not a Java expert; what is this line of code doing?
At first I thought it would create an infinite loop, but in the very same class this programmer uses
while(true)
which (correct me if I'm wrong) IS an infinite loop. Are these two identical? Why would somebody switch between two ways of writing the same thing?
Any insight would help. Thanks!
Remember that the three clauses of for() are [1] initialization, [2] the termination condition, and [3] the increment. Since the termination clause is empty, it is treated as always true and the loop never terminates. This syntax is taken directly from C.
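A minimal sketch of the equivalence (done() is a hypothetical exit condition; both loops can only be left via break, return, or an exception):
for (;;) {                // empty condition is treated as true
    if (done()) break;
}
while (true) {            // the explicit spelling of the same thing
    if (done()) break;
}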
Those two lines have the same effect. I can't think of a good reason to use the first one unless you like to confuse people. I guess it's fewer characters.
They are entirely the same. The only real difference would be preference (the for construct can be typed marginally faster), or that the for indicates an iteration that is broken out of by a break or return, while a while loop indicates repeating the same thing until a meaningful result appears.
