Aggregate functions in Java streams [duplicate] - java

This question already has answers here:
Java Stream API: why the distinction between sequential and parallel execution mode?
(4 answers)
Closed 6 years ago.
I don't know why I should use aggregate functions.
I mean, an aggregate operation is supposed to parallelize the execution when that improves performance.
https://docs.oracle.com/javase/tutorial/collections/streams/parallelism.html
But that is not the case: according to the documentation, the code won't run in parallel unless you use parallelStream() instead of stream(). So why should I use stream() if nothing gets faster?
Shouldn't these two snippets behave the same?
// this is not parallel
listOfIntegers.stream()
    .forEach(e -> System.out.print(e + " "));
And
// this is parallel
listOfIntegers.parallelStream()
    .forEach(e -> System.out.print(e + " "));

If you use stream(), all data in your list will be processed in order, while if you use parallelStream() your data might not be processed in order.
Consider this method:
static void test(Integer i) {
    try {
        Thread.sleep((long) (1000 * Math.random())); // random delay of up to one second
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    System.out.println(i);
}
and compare the output of this method when called via parallelStream() versus stream(), as in the sketch below.
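A minimal, self-contained sketch of that comparison (the class name OrderDemo and the sample list are illustrative, not from the question):
import java.util.Arrays;
import java.util.List;

public class OrderDemo {
    static void test(Integer i) {
        try {
            Thread.sleep((long) (1000 * Math.random())); // random delay per element
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        System.out.println(i);
    }

    public static void main(String[] args) {
        List<Integer> listOfIntegers = Arrays.asList(1, 2, 3, 4, 5);
        listOfIntegers.stream().forEach(OrderDemo::test);         // always prints 1 2 3 4 5
        listOfIntegers.parallelStream().forEach(OrderDemo::test); // order varies between runs
    }
}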


Why doesn't a stream throw RuntimeException? [duplicate]

This question already has answers here:
Why lambda inside map is not running?
(1 answer)
Why does Java8 Stream generate nothing?
(3 answers)
Closed last year.
The test method below always throws a RuntimeException, and I can catch it:
void test() {
    throw new RuntimeException();
}

System.out.println("start");
try {
    test();
} catch (RuntimeException e) {
    System.out.println(e);
}
System.out.println("end");
This code prints:
start
java.lang.RuntimeException
end
But in a stream, I can't catch the RuntimeException:
System.out.println("start");
try {
nums.stream().map((num) -> {
test();
return null;
});
}
catch(RuntimeException e) {
System.out.println(e);
}
System.out.println("end");
This code prints:
start
end
Why can't I catch the RuntimeException in the stream?
As it says in the Javadoc of the java.util.stream package (emphasis added):
Intermediate operations return a new stream. They are always lazy; executing an intermediate operation such as filter() does not actually perform any filtering, but instead creates a new stream that, when traversed, contains the elements of the initial stream that match the given predicate. Traversal of the pipeline source does not begin until the terminal operation of the pipeline is executed.
map is an intermediate operation, and you don't have a terminal operation. As such, the pipeline source is not traversed.
Change map to forEach (and remove the return):
nums.stream().forEach((num) -> {
    test();
});
(I assume also that nums is non-empty).
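For completeness, a sketch showing that once a terminal operation triggers traversal, the exception propagates and becomes catchable (assuming nums is a non-empty List<Integer> and java.util.stream.Collectors is imported):
System.out.println("start");
try {
    nums.stream().map((num) -> {
        test();       // throws RuntimeException during traversal
        return num;
    }).collect(Collectors.toList()); // terminal operation forces traversal
} catch (RuntimeException e) {
    System.out.println(e); // now prints java.lang.RuntimeException
}
System.out.println("end");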

List of Strings is not running in parallel - Java 8 parallel streams

I got a requirement to process a collection using a parallel stream, but it always runs sequentially. In the example below, the pipeline over the List<String> always runs in sequence, whereas the pipeline over the IntStream runs in parallel. Could someone please help me understand the difference between a parallel stream over an IntStream and a parallel stream over a List<String>?
Also, can you help with a code snippet showing how to run the List<String> in parallel, similar to how the IntStream runs in parallel?
import java.util.List;
import java.util.stream.IntStream;

public class ParallelCollectionTest {
    public static void main(String[] args) {
        System.out.println("Parallel int stream testing.. ");
        IntStream range2 = IntStream.rangeClosed(1, 5);
        range2.parallel().peek(t -> {
            System.out.println("before");
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }).forEachOrdered(System.out::println);

        System.out.println("Parallel String collection testing.. ");
        List<String> list = List.of("a", "b", "c", "d");
        list.stream().parallel().forEachOrdered(o -> {
            System.out.println("before");
            try {
                Thread.sleep(10000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            System.out.println(o);
        });
    }
}
The output of the above code is:
Parallel int stream testing..
before
before
before
before
before
1
2
3
4
5
Parallel String collection testing..
before
a
before
b
before
c
before
d
The different behavior is not caused by the different streams (IntStream vs. Stream<String>).
The logic of your two stream pipelines is not the same.
In the IntStream snippet you perform the sleep in the peek() call, which allows it to run in parallel for different elements; that's why the pipeline finishes quickly.
In the Stream<String> snippet you perform the sleep in forEachOrdered, which means the sleep() for each element must run after the sleep() of the previous element ends. That's the documented behavior of forEachOrdered: it processes the elements one at a time, in encounter order if one exists.
You can make the second snippet behave similarly to the first by adding a peek() call to it:
System.out.println("Parallel String collection testing.. ");
List<String> list = List.of("a","b","c","d","e","f","g","h");
list.stream().parallel().peek(t -> {
System.out.println("before");
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
})
.forEachOrdered(System.out::println);
Now it will produce:
Parallel String collection testing..
before
before
before
before
a
b
c
d
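If encounter order doesn't matter, a forEach terminal operation lets even the final printing run concurrently; a small variation on the snippet above (output order is nondeterministic):
list.stream().parallel().peek(t -> {
    System.out.println("before");
    try {
        Thread.sleep(1000);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}).forEach(System.out::println); // may print e.g. c, a, d, b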

How to run a method simultaneously for each element in a List

I want to call the calculation() method below simultaneously from calculateKalmanValues() for each element in the movingBeacons list, to reduce the processing time. I think Java's stream.parallel() is the ideal solution.
public void calculation() {
    // do something
}

// This is the method from which I call calculation().
// calculation() should be called simultaneously for each element of movingBeacons.
public void calculateKalmanValues() {
    List<String> movingBeacons = incomingBtRssiRepository.movingBeacons();
    movingBeacons.forEach.parallel() // this does not compile; what should go here?
}
How can I do this using Java's stream.parallel() or multithreading?
You could try using a parallel stream:
List<String> movingBeacons = incomingBtRssiRepository.movingBeacons();
movingBeacons.parallelStream()
    .forEach(beacon -> calculation()); // calculation() belongs to the enclosing class, not to String
But note that parallel streams might not always speed up serial operations. See the documentation for caveats.
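One such caveat: parallel streams run on the shared common ForkJoinPool, so long-running or blocking work is often better submitted to a dedicated executor. A hedged sketch (the pool size of 8 is an arbitrary assumption):
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

ExecutorService pool = Executors.newFixedThreadPool(8); // pool size is an assumption
for (String beacon : movingBeacons) {
    pool.submit(() -> calculation()); // one task per element
}
pool.shutdown(); // stop accepting new tasks
try {
    pool.awaitTermination(1, TimeUnit.MINUTES); // wait for all calculations to finish
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}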

Should I use try-with-resource in flatMap for an I/O-based stream?

A Stream is an AutoCloseable and, if I/O-based, should be used in a try-with-resources block. What about intermediate I/O-based streams that are inserted via flatMap()? Example:
try (var foos = foos()) {
    return foos.flatMap(Foo::bars).toArray(Bar[]::new);
}
vs.
try (var foos = foos()) {
    return foos.flatMap(foo -> {
        try (var bars = foo.bars()) {
            return bars;
        }
    }).toArray(Bar[]::new);
}
The flatMap() documentation says:
Each mapped stream is closed after its contents have been placed into this stream.
Well, that's the happy path. What if an exception happens in between? Would that stream then stay unclosed and potentially leak resources? Should I then always use a try-with-resources for intermediate streams as well?
There is no sense in a construct like
return foos.flatMap(foo -> {
    try (var bars = foo.bars()) {
        return bars;
    }
}).toArray(Bar[]::new);
as that would close the stream before it is returned to the caller, making the sub-stream entirely unusable.
In fact, it is impossible for the function's code to ensure that the closing happens at the appropriate place, which is outside the function. That's surely the reason why the API designers decided that you don't have to, and that the Stream implementation will take care of it.
This also applies to the exceptional case. The Stream implementation still ensures that the sub-stream gets closed once the function has returned it:
try {
    IntStream.range(1, 3)
        .flatMap(i -> {
            System.out.println("creating " + i);
            return IntStream.range('a', 'a' + i)
                .peek(j -> {
                    System.out.println("processing sub " + i + " - " + (char) j);
                    if (j == 'b') throw new IllegalStateException();
                })
                .onClose(() -> System.out.println("closing " + i));
        })
        .forEach(i -> System.out.println("consuming " + (char) i));
} catch (IllegalStateException ex) {
    System.out.println("caught " + ex);
}
creating 1
processing sub 1 - a
consuming a
closing 1
creating 2
processing sub 2 - a
consuming a
processing sub 2 - b
closing 2
caught java.lang.IllegalStateException
You may play with the conditions to see that a constructed Stream is always closed. For elements of the outer Stream that do not get processed, no sub-stream is created at all.
For a Stream operation like .flatMap(Foo::bars) or .flatMap(foo -> foo.bars()), you can assume that once bars() successfully created and returned a Stream, it will be passed to the caller for sure and properly closed.
A different scenario would be mapping functions which perform operations after the Stream creation which could fail, e.g.
.flatMap(foo -> {
    Stream<Type> s = foo.bar();
    anotherOperation(); // the stream is not closed if this throws
    return s;
})
In this case, it would be necessary to ensure the closing in the exceptional case, but only in the exceptional case:
.flatMap(foo -> {
    Stream<Type> s = foo.bar();
    try {
        anotherOperation();
    } catch (Throwable t) {
        try (s) { throw t; } // close s, with addSuppressed on a follow-up error
    }
    return s;
})
but obviously, you should follow the general rule to keep lambdas simple, in which case you don’t need such protection.
Stream or not, you have to close I/O resources at the relevant place.
The flatMap() method is a general-purpose stream method, so it is not aware of I/O resources you opened inside it.
But why would flatMap() behave differently from any other method that manipulates I/O resources?
For example, if you manipulate I/O in map(), you could get the same issue (resources not released) if an exception occurs.
Closing a stream (as flatMap() does) will not release every resource opened during the stream operation.
Some streams do that, Files.lines(Path) for example. But if you open resources yourself inside flatMap(), closing the stream will not close these resources automatically.
For example, here the flatMap() processing doesn't close the FileInputStream that was opened:
...
.stream()
.flatMap(foo -> {
    try {
        FileInputStream fileInputStream = new FileInputStream("...");
        // ... the stream is never closed, even when an exception occurs
    } catch (IOException e) {
        // handle
    }
    // return
})
You have to close it explicitly:
...
.stream()
.flatMap(foo -> {
    try (FileInputStream fileInputStream = new FileInputStream("...")) {
        // ... the stream is closed automatically
    } catch (IOException e) {
        // handle
    }
    // return
})
So yes: if the statements inside flatMap() (or any other method) manipulate I/O resources, you want to close them in every case by surrounding them with a try-with-resources statement.
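By contrast, a sketch of the case where you can rely on flatMap()'s auto-closing: return the resource-backed stream itself, as Files.lines(Path) produces (paths is an assumed List<Path>):
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.stream.Collectors;

List<String> allLines = paths.stream()
    .flatMap(p -> {
        try {
            return Files.lines(p); // flatMap closes this stream, releasing the file handle
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    })
    .collect(Collectors.toList());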

Java 8 parallel streams don't appear to actually be working in parallel

I'm trying to use Java 8's parallelStream() to execute several long-running requests (eg web requests) in parallel. Simplified example:
List<Supplier<Result>> myFunctions = Arrays.asList(() -> doWebRequest(), ...)
List<Result> results = myFunctions.parallelStream().map(function -> function.get()).collect(...
So if there are two functions that block for 2 and 3 seconds respectively, I'd expect to get the result after 3 seconds. However, it actually takes 5 seconds, i.e. the functions appear to execute in sequence rather than in parallel. Am I doing something wrong?
Edit: this is an example. The time taken is ~4000 milliseconds when I want it to be ~2000.
long start = System.currentTimeMillis();
Map<String, Supplier<String>> input = new HashMap<String, Supplier<String>>();
input.put("1", () -> {
    try {
        Thread.sleep(2000);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    return "a";
});
input.put("2", () -> {
    try {
        Thread.sleep(2000);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    return "b";
});
Map<String, String> results = input.keySet().parallelStream().collect(Collectors.toConcurrentMap(
    key -> key,
    key -> input.get(key).get()));
System.out.println("Time: " + (System.currentTimeMillis() - start));
It doesn't make any difference if I iterate over the entrySet() instead of the keySet().
Edit: changing the parallel part to the following also does not help:
Map<String, String> results = input.entrySet().parallelStream()
    .map(entry -> new ImmutablePair<String, String>(entry.getKey(), entry.getValue().get()))
    .collect(Collectors.toConcurrentMap(Pair::getLeft, Pair::getRight));
When executing in parallel, there is overhead of decomposing the input set, creating tasks to represent the different portions of the calculation, distributing the actions across threads, waiting for results, combining results, etc. This is over and above the work of actually solving the problem. If a parallel framework were to always decompose problems down to a granularity of one element, for most problems, these overheads would overwhelm the actual computation and parallelism would result in a slower execution. So parallel frameworks have some latitude to decide how finely to decompose the input, and that's what's happening here.
In your case, your input set is simply too small to be decomposed. So the library chooses to execute sequentially.
Try this on your four-core system: compare
IntStream.range(0, 100_000).sum()
vs
IntStream.range(0, 100_000).parallel().sum()
Here, you're giving it enough input that it will be confident it can win through parallel execution. If you measure with a responsible measurement methodology (say, the JMH microbenchmark harness), you'll probably see an almost-linear speedup between these two examples.
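A quick-and-dirty illustration of that comparison (not a substitute for JMH: no warm-up, single run, timings indicative only; the larger range is an assumption to make the work measurable):
import java.util.stream.LongStream;

public class SumBench {
    public static void main(String[] args) {
        long t0 = System.nanoTime();
        long seq = LongStream.range(0, 100_000_000).sum(); // sequential sum
        long t1 = System.nanoTime();
        long par = LongStream.range(0, 100_000_000).parallel().sum(); // parallel sum
        long t2 = System.nanoTime();
        System.out.println("sequential: " + seq + " in " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("parallel:   " + par + " in " + (t2 - t1) / 1_000_000 + " ms");
    }
}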
