Java 8 Nested ParallelStream not working properly

package com.spse.pricing.client.main;

import java.util.stream.IntStream;

public class NestedParalleStream {
    int total = 0;

    public static void main(String[] args) {
        NestedParalleStream nestedParalleStream = new NestedParalleStream();
        nestedParalleStream.test();
    }

    void test() {
        try {
            IntStream stream1 = IntStream.range(0, 2);
            stream1.parallel().forEach(a -> {
                IntStream stream2 = IntStream.range(0, 2);
                stream2.parallel().forEach(b -> {
                    IntStream stream3 = IntStream.range(0, 2);
                    stream3.parallel().forEach(c -> {
                        // 2 * 2 * 2 = 8
                        total++;
                    });
                });
            });
            // It should display 8
            System.out.println(total);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Please help: how can I customize the parallel streams to make sure I get consistent results?

Since multiple threads are incrementing total, you must declare it volatile to avoid race conditions.
Edit: volatile makes individual read/write operations atomic, but total++ requires more than one operation. For that reason, you should use an AtomicInteger:
AtomicInteger total = new AtomicInteger();
...
total.incrementAndGet();
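For completeness, a minimal sketch of the question's class rewritten this way (same nesting, only the counter changed):

import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class NestedParalleStream {
    final AtomicInteger total = new AtomicInteger();

    public static void main(String[] args) {
        NestedParalleStream demo = new NestedParalleStream();
        IntStream.range(0, 2).parallel().forEach(a ->
                IntStream.range(0, 2).parallel().forEach(b ->
                        IntStream.range(0, 2).parallel().forEach(c ->
                                demo.total.incrementAndGet())));
        // Always prints 8, regardless of how the threads interleave.
        System.out.println(demo.total.get());
    }
}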

The problem is the statement total++; it is invoked from multiple threads simultaneously.
You should protect it with synchronized or use an AtomicInteger.

LongAdder or LongAccumulator are preferable to AtomicLong or AtomicInteger where multiple threads are mutating the value and it's intended to be read relatively few times, such as once at the end of the computation. The adder/accumulator objects avoid contention problems that can occur with the atomic objects. (There are corresponding adder/accumulator objects for double values.)
There is usually a way to rewrite accumulations using reduce() or collect(). These are often preferable, especially if the value being accumulated (or collected) isn't a long or a double.
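For illustration, a sketch of the same 2 * 2 * 2 count done with a LongAdder, which is written from many threads but read only once at the end:

LongAdder total = new LongAdder(); // java.util.concurrent.atomic.LongAdder
IntStream.range(0, 2).parallel().forEach(a ->
        IntStream.range(0, 2).parallel().forEach(b ->
                IntStream.range(0, 2).parallel().forEach(c -> total.increment())));
System.out.println(total.sum()); // single read at the end: 8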

There is a major problem with the mutability in your approach. A better way to get the result you want is the following:
int total = IntStream.range(0, 2)
        .parallel()
        .map(i -> IntStream.range(0, 2)
                .map(j -> IntStream.range(0, 2)
                        .map(k -> i * j * k)
                        // the mapped values are ignored; the reduce simply
                        // counts one per element
                        .reduce(0, (acc, val) -> acc + 1))
                .sum())
        .sum();
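Since every element here just contributes 1 to the total, the mapping step can be dropped entirely; a sketch that is equivalent for these 2 * 2 * 2 ranges:

long total = IntStream.range(0, 2).parallel()
        .flatMap(i -> IntStream.range(0, 2)
                .flatMap(j -> IntStream.range(0, 2)))
        .count(); // 2 * 2 * 2 = 8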


Memory Visibility in Heap

I'm working on an application that uses a HashMap to share state. I need to prove via unit tests that it will have problems in a multi-threaded environment.
I tried to check the state of the application in a single-threaded environment and in a multi-threaded environment by checking the size and elements of the HashMap in both. But it seems this doesn't help: the state is always the same.
Are there any other ways to prove it, or to prove that an application that performs operations on the map works well with concurrent requests?
This is quite easy to prove.
In short
A HashMap is based on an array, where each item represents a bucket. As more keys are added, the buckets fill up, and at a certain threshold the array is recreated with a bigger size so that its entries are spread more evenly (a performance consideration). While the array is being recreated, it does not yet contain all of the old entries, so a caller can get an empty result for a key that is actually present, until the recreation completes.
Details and Proof
It means that sometimes HashMap#put() will internally call HashMap#resize() to make the underlying array bigger.
HashMap#resize() assigns the table field a new empty array with a bigger capacity and populates it with the old items. While this population happens, the underlying array doesn't contain all of the old items and calling HashMap#get() with an existing key may return null.
The following code demonstrates this. You are very likely to get an exception, which means the HashMap is not thread-safe. I chose the target key as 65 535; this way it will be the last element in the array, thus being the last element during re-population, which increases the possibility of getting null from HashMap#get() (to see why, look at the HashMap#put() implementation).
final Map<Integer, String> map = new HashMap<>();
final Integer targetKey = 0b1111_1111_1111_1111; // 65 535
final String targetValue = "v";
map.put(targetKey, targetValue);

new Thread(() -> {
    IntStream.range(0, targetKey).forEach(key -> map.put(key, "someValue"));
}).start();

while (true) {
    if (!targetValue.equals(map.get(targetKey))) {
        throw new RuntimeException("HashMap is not thread safe.");
    }
}
One thread adds new keys to the map. The other thread constantly checks that the targetKey is present.
If I count those exceptions, I get around 200,000.
It is hard to simulate a race, but looking at the OpenJDK source for the put() method of HashMap:
public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    // Operation 1
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    // Operation 2
    modCount++;
    // Operation 3
    addEntry(hash, key, value, i);
    return null;
}
As you can see, put() involves 3 operations which are not synchronized, and compound operations without synchronization are not thread-safe. So it is theoretically proven that HashMap is not thread-safe.
It's an old thread, but here is my sample code, which demonstrates the problems with HashMap.
Take a look at the code below: we try to insert 30,000 items into the HashMap using 10 threads (3,000 items per thread).
After all the threads have completed, you should ideally see that the size of the map is 30,000. But the actual output is either an exception while rebuilding the tree, or a final count that is less than 30,000.
class TempValue {
    int value = 3;

    @Override
    public int hashCode() {
        return 1; // All objects of this class will have the same hashcode.
    }
}

public class TestClass {
    public static void main(String args[]) throws InterruptedException {
        Map<TempValue, TempValue> myMap = new HashMap<>();
        List<Thread> listOfThreads = new ArrayList<>();
        // Create 10 threads
        for (int i = 0; i < 10; i++) {
            Thread thread = new Thread(() -> {
                // Let each thread insert 3000 items
                for (int j = 0; j < 3000; j++) {
                    TempValue key = new TempValue();
                    myMap.put(key, key);
                }
            });
            thread.start();
            listOfThreads.add(thread);
        }
        for (Thread thread : listOfThreads) {
            thread.join();
        }
        System.out.println("Count should be 30000, actual is : " + myMap.size());
    }
}
Output 1:
Count should be 30000, actual is : 29486
Output 2 (exception):
java.lang.ClassCastException: java.util.HashMap$Node cannot be cast to java.util.HashMap$TreeNode
at java.util.HashMap$TreeNode.moveRootToFront(HashMap.java:1819)
at java.util.HashMap$TreeNode.treeify(HashMap.java:1936)
at java.util.HashMap.treeifyBin(HashMap.java:771)
at java.util.HashMap.putVal(HashMap.java:643)
at java.util.HashMap.put(HashMap.java:611)
at TestClass.lambda$0(TestClass.java:340)
at java.lang.Thread.run(Thread.java:745)
However, if you modify the line Map<TempValue, TempValue> myMap = new HashMap<>(); to use a ConcurrentHashMap, the output is always 30000.
Another observation:
In the above example, the hashcode for all objects of the TempValue class was the same (i.e., 1). So you might be wondering whether this issue with HashMap occurs only when there is a collision (due to the hashcode).
I tried another example.
Modify the TempValue class to
class TempValue {
    int value = 3;
}
Now re-execute the same code. Out of every 5 runs, I see 2-3 runs that still give an output different from 30,000.
So even if you usually don't have many collisions, you might still end up with an issue (maybe due to the rebuilding of the HashMap, etc.).
Overall, these examples show the issue with HashMap that ConcurrentHashMap handles.
I need to prove via unit tests that it will have problems in a multi-threaded environment.
This is going to be tremendously hard to do. Race conditions are very hard to demonstrate. You could certainly write a program which does puts and gets into a HashMap in a large number of threads but logging, volatile fields, other locks, and other timing details of your application may make it extremely hard to force your particular code to fail.
Here's a stupid little HashMap failure test case. It fails because it times out when the threads go into an infinite loop because of memory corruption of HashMap. However, it may not fail for you depending on number of cores and other architecture details.
@Test(timeout = 10000)
public void runTest() throws Exception {
    final Map<Integer, String> map = new HashMap<Integer, String>();
    ExecutorService pool = Executors.newFixedThreadPool(10);
    for (int i = 0; i < 10; i++) {
        pool.submit(new Runnable() {
            @Override
            public void run() {
                for (int i = 0; i < 10000; i++) {
                    map.put(i, "wow");
                }
            }
        });
    }
    pool.shutdown();
    pool.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
}
Is reading the API docs enough? There is a statement in there:
Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:
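Following that advice, the wrapping looks like this:

Map<Integer, String> map = Collections.synchronizedMap(new HashMap<>());
// Per the same javadoc, iterating over the wrapped map still requires
// synchronizing manually on the map object.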
The problem with thread safety is that it's hard to prove through a test. It could be fine most of the time. Your best bet is to just run a bunch of threads getting and putting, and you'll probably get some concurrency errors.
I suggest using a ConcurrentHashMap and trusting that the Java team saying HashMap is not synchronized is enough.
Are there any other ways to prove it?
How about reading the documentation (and paying attention to the emphasized "must"):
If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally
If you are going to attempt to write a unit test that demonstrates incorrect behavior, I recommend the following:
Create a bunch of keys that all have the same hashcode (say 30 or 40).
Add values to the map for each key.
Spawn a separate thread for each key, with an infinite loop that (1) asserts that the key is present in the map, (2) removes the mapping for that key, and (3) adds the mapping back.
If you're lucky, the assertion will fail at some point, because the linked list behind the hash bucket will be corrupted. If you're unlucky, it will appear that HashMap is indeed thread-safe despite the documentation.
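A rough sketch of that recipe (the ColliderKey class and the counts are illustrative assumptions, not a definitive test):

import java.util.HashMap;
import java.util.Map;

public class CollidingKeysSketch {
    // Every instance hashes to the same bucket, forcing one long chain.
    static final class ColliderKey {
        final int id;
        ColliderKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof ColliderKey && ((ColliderKey) o).id == id;
        }
    }

    public static void main(String[] args) {
        Map<ColliderKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 40; i++) {
            ColliderKey key = new ColliderKey(i);
            map.put(key, i);
            int id = i;
            new Thread(() -> {
                while (true) {
                    if (map.get(key) == null)      // (1) assert the key is present
                        throw new AssertionError("lost key " + id);
                    map.remove(key);               // (2) remove the mapping
                    map.put(key, id);              // (3) add it back
                }
            }).start();
        }
    }
}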
It may be possible, but will never be a perfect test. Race conditions are just too unpredictable. That being said, I wrote a similar type of test to help fix a threading issue with a proprietary data structure, and in my case, it was much easier to prove that something was wrong (before the fix) than to prove that nothing would go wrong (after the fix). You could probably construct a multi-threaded test that will eventually fail with sufficient time and the right parameters.
This post may be helpful in identifying areas to focus on in your test and has some other suggestions for optional replacements.
You can create multiple threads, each adding an element to a HashMap and iterating over it.
That is, in the run method we have to use put and then iterate using an iterator.
For HashMap we get a ConcurrentModificationException, while for ConcurrentHashMap we don't. A sketch is below.
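A minimal sketch of that experiment (sizes are arbitrary); the HashMap iterator is fail-fast, so it is likely, though not guaranteed, to throw:

Map<Integer, Integer> map = new HashMap<>(); // swap in ConcurrentHashMap and the exception goes away
Runnable task = () -> {
    for (int i = 0; i < 100_000; i++) {
        map.put(i, i);                      // structural modification
        for (Integer key : map.keySet()) {  // iteration over the shared map
            // another thread's put() during this loop triggers a
            // ConcurrentModificationException from the fail-fast iterator
        }
    }
};
new Thread(task).start();
new Thread(task).start();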
Most probable race condition in the java.util.HashMap implementation
Most HashMap failures happen when values are read while a resizing or rehashing step is executing. Resizing and rehashing are performed under certain conditions, most commonly when the bucket threshold is exceeded. This code proves that if I call resizing externally, or if I put more elements than the threshold (triggering the resizing operation internally), it causes some null reads, which shows that HashMap is not thread-safe. There are probably more race conditions, but this is enough to prove it is not thread-safe.
Practical proof of the race condition
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.IntStream;

public class HashMapThreadSafetyTest {
    public static void main(String[] args) {
        try {
            (new HashMapThreadSafetyTest()).testIt();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private void threadOperation(int number, Map<Integer, String> map) {
        map.put(number, "hashMapTest");
        while (map.get(number) != null);
        // If the code reaches this line, we did a null read that should not happen.
        System.out.println("Null Value Number: " + number);
    }

    private void callHashMapResizeExternally(Map<Integer, String> map)
            throws NoSuchMethodException, InvocationTargetException, IllegalAccessException {
        Method method = map.getClass().getDeclaredMethod("resize");
        method.setAccessible(true);
        System.out.println("calling resize");
        method.invoke(map);
    }

    private void testIt()
            throws InterruptedException, NoSuchMethodException, IllegalAccessException, InvocationTargetException {
        final Map<Integer, String> map = new HashMap<>();
        IntStream.range(0, 12).forEach(i -> new Thread(() -> threadOperation(i, map)).start());
        Thread.sleep(60000);
        // The first loop should not show any null value number until the resize
        // method of the HashMap is called externally.
        callHashMapResizeExternally(map);
        // The first loop should fail from now on and print some Null Value Numbers.
        System.out.println("Loop count is 12 since the HashMap is initially created with 2^4 buckets and a resizing"
                + " threshold of 0.75*2^4 = 12. In the first loop it should not fail since we are not resizing the HashMap."
                + "\n\nAfter 60 seconds: calling the resizing operation externally with reflection should forcefully break"
                + " thread safety");
        Thread.sleep(2000);
        final Map<Integer, String> map2 = new HashMap<>();
        IntStream.range(100, 113).forEach(i -> new Thread(() -> threadOperation(i, map2)).start());
        // The second loop should fail from now on and print some Null Value Numbers,
        // because it inserts more than 12 entries, which causes the HashMap to resize and rehash.
        System.out.println("It should fail directly since it exceeds the HashMap's initial threshold and it will resize"
                + " when the loop iterates for the 13th time");
    }
}
Example output
No null value should be printed until the thread sleep line is passed
calling resize
Loop count is 12 since the HashMap is initially created with 2^4 buckets and a resizing threshold of 0.75*2^4 = 12. In the first loop it should not fail since we are not resizing the HashMap.

After 60 seconds: calling the resizing operation externally with reflection should forcefully break thread safety
Null Value Number: 11
Null Value Number: 5
Null Value Number: 6
Null Value Number: 8
Null Value Number: 0
Null Value Number: 7
Null Value Number: 2
It should fail directly since it exceeds the HashMap's initial threshold and it will resize when the loop iterates for the 13th time
Null Value Number: 111
Null Value Number: 100
Null Value Number: 107
Null Value Number: 110
Null Value Number: 104
Null Value Number: 106
Null Value Number: 109
Null Value Number: 105
Very simple solution to prove this
Here is code which proves that the HashMap implementation is not thread safe.
In this example, we only add elements to the map; we never remove them in any method.
We can see that it prints keys which are not in the map, even though we put the same key into the map before doing the get operation.
package threads;

import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class HashMapWorkingDemoInConcurrentEnvironment {

    private Map<Long, String> cache = new HashMap<>();

    public String put(Long key, String value) {
        return cache.put(key, value);
    }

    public String get(Long key) {
        return cache.get(key);
    }

    public static void main(String[] args) {
        HashMapWorkingDemoInConcurrentEnvironment cache = new HashMapWorkingDemoInConcurrentEnvironment();

        class Producer implements Callable<String> {
            private Random rand = new Random();

            public String call() throws Exception {
                while (true) {
                    long key = rand.nextInt(1000);
                    cache.put(key, Long.toString(key));
                    if (cache.get(key) == null) {
                        System.out.println("Key " + key + " has not been put in the map");
                    }
                }
            }
        }

        ExecutorService executorService = Executors.newFixedThreadPool(4);
        System.out.println("Adding value...");
        try {
            for (int i = 0; i < 4; i++) {
                executorService.submit(new Producer());
            }
        } finally {
            executorService.shutdown();
        }
    }
}
Sample output for an execution run
Adding value...
Key 611 has not been put in the map
Key 978 has not been put in the map
Key 35 has not been put in the map
Key 202 has not been put in the map
Key 714 has not been put in the map
Key 328 has not been put in the map
Key 606 has not been put in the map
Key 149 has not been put in the map
Key 763 has not been put in the map
It's strange to see these values printed; that's because HashMap is not a thread-safe implementation when used in a concurrent environment.
There is a great tool open-sourced by the OpenJDK team called JCStress, which is used in the JDK for concurrency testing.
https://github.com/openjdk/jcstress
Here is one of its samples: https://github.com/openjdk/jcstress/blob/master/tests-custom/src/main/java/org/openjdk/jcstress/tests/collections/HashMapFailureTest.java
@JCStressTest
@Outcome(id = "0, 0, 1, 2", expect = Expect.ACCEPTABLE, desc = "No exceptions, entire map is okay.")
@Outcome(expect = Expect.ACCEPTABLE_INTERESTING, desc = "Something went wrong")
@State
public class HashMapFailureTest {

    private final Map<Integer, Integer> map = new HashMap<>();

    @Actor
    public void actor1(IIII_Result r) {
        try {
            map.put(1, 1);
            r.r1 = 0;
        } catch (Exception e) {
            r.r1 = 1;
        }
    }

    @Actor
    public void actor2(IIII_Result r) {
        try {
            map.put(2, 2);
            r.r2 = 0;
        } catch (Exception e) {
            r.r2 = 1;
        }
    }

    @Arbiter
    public void arbiter(IIII_Result r) {
        Integer v1 = map.get(1);
        Integer v2 = map.get(2);
        r.r3 = (v1 != null) ? v1 : -1;
        r.r4 = (v2 != null) ? v2 : -1;
    }
}
The methods marked with @Actor are run concurrently on different threads.
The result for this on my machine is:
Results across all configurations:
     RESULT     SAMPLES    FREQ       EXPECT  DESCRIPTION
0, 0, -1, 2   3,854,896   5.25%  Interesting  Something went wrong
0, 0, 1, -1   4,251,564   5.79%  Interesting  Something went wrong
 0, 0, 1, 2  65,363,492  88.97%   Acceptable  No exceptions, entire map is okay.
This shows that 88% of the time the expected values were observed, but in around 12% of the runs incorrect results were seen.
You can try out this tool and the samples and write your own tests to verify that concurrency of some code is broken.
As yet another reply to this topic, I would recommend the example from https://www.baeldung.com/java-concurrent-map, which looks as below. The theory is very straightforward: N times, we run 10 threads, each of which increments the value in a common map 10 times. If the map were thread-safe, the value should be 100 every time. The example proves it's not.
@Test
public void givenHashMap_whenSumParallel_thenError() throws Exception {
    Map<String, Integer> map = new HashMap<>();
    List<Integer> sumList = parallelSum100(map, 100);

    assertNotEquals(1, sumList
            .stream()
            .distinct()
            .count());
    long wrongResultCount = sumList
            .stream()
            .filter(num -> num != 100)
            .count();

    assertTrue(wrongResultCount > 0);
}

private List<Integer> parallelSum100(Map<String, Integer> map,
                                     int executionTimes) throws InterruptedException {
    List<Integer> sumList = new ArrayList<>(1000);
    for (int i = 0; i < executionTimes; i++) {
        map.put("test", 0);
        ExecutorService executorService =
                Executors.newFixedThreadPool(4);
        for (int j = 0; j < 10; j++) {
            executorService.execute(() -> {
                for (int k = 0; k < 10; k++)
                    map.computeIfPresent(
                            "test",
                            (key, value) -> value + 1
                    );
            });
        }
        executorService.shutdown();
        executorService.awaitTermination(5, TimeUnit.SECONDS);
        sumList.add(map.get("test"));
    }
    return sumList;
}

How can I use Java Stream to reduce with this class structure?

This is an example of the class I'm working on:
public class TestReduce {

    private static Set<Integer> seed = ImmutableSet.of(1, 2);

    public static void main(String args[]) {
        List<Accumulator> accumulators = ImmutableList.of(
                new Accumulator(ImmutableSet.of(5, 6)),
                new Accumulator(ImmutableSet.of(7, 8)));
        accumulators.stream()
                .forEach(a -> {
                    seed = a.combineResult(seed);
                });
        System.out.println(seed);
    }
}

class Accumulator {

    public Accumulator(Set<Integer> integers) {
        accumulatedNumbers = integers;
    }

    public Set<Integer> combineResult(Set<Integer> numbers) {
        // Do some manipulation for the numbers
        return (the new numbers);
    }

    private Set<Integer> accumulatedNumbers;
}
I would like to reduce all of the Accumulators to just a set of numbers, starting from the initial value. However, I cannot change the signature of the method combineResult. In the example I did this with forEach, but I'm not sure if there's a cleaner way, or a Java Stream way, to achieve this. I tried using reduce, but I couldn't quite get its arguments right.
(Answer for the original question)
This doesn't seem like a good approach. You're just unioning some sets.
If you can't change the signature of combineResult, you can do:
ImmutableSet<Integer> seed =
        Stream.concat(
                initialSet.stream(),
                accumulators.stream()
                        // Essentially just extracting the set from each accumulator.
                        // Adding a getter for the set to the Accumulator class would be clearer.
                        .map(a -> a.combineResult(Collections.emptySet()))
                        .flatMap(Set::stream))
                .collect(ImmutableSet.toImmutableSet());
For a generalized combineResult, you shouldn't use reduce, because that operation may be non-associative.
It's easy just to use a plain old loop in that case.
Set<Integer> seed = ImmutableSet.of(1, 2);
for (Accumulator a : accumulators) {
    seed = a.combineResult(seed);
}
This avoids the principal issue with your current approach, namely that the calculation state is not thread-local (with the plain loop, other threads and previous invocations of the loop cannot affect the current invocation).

Java 8 lambda sum, count and group by

SELECT SUM(paidAmount), COUNT(paidAmount), classificationName
FROM tableA
GROUP BY classificationName;
How can I do this in Java 8 using streams and collectors?
Java 8:

lineItemList.stream()
        .collect(Collectors.groupingBy(Bucket::getBucketName,
                Collectors.reducing(BigDecimal.ZERO,
                        Bucket::getPaidAmount,
                        BigDecimal::add)))
This gives me the sum grouped by name. But how can I also get the count for each group?
The expectation is:
100, 2, classname1
50, 1, classname2
150, 3, classname3
Using an extended version of the Statistics class of this answer,
class Statistics {
    int count;
    BigDecimal sum;

    Statistics(Bucket bucket) {
        count = 1;
        sum = bucket.getPaidAmount();
    }

    Statistics() {
        count = 0;
        sum = BigDecimal.ZERO;
    }

    void add(Bucket b) {
        count++;
        sum = sum.add(b.getPaidAmount());
    }

    Statistics merge(Statistics another) {
        count += another.count;
        sum = sum.add(another.sum);
        return this;
    }
}
you can use it in a Stream operation like
Map<String, Statistics> map = lineItemList.stream()
        .collect(Collectors.groupingBy(Bucket::getBucketName,
                Collector.of(Statistics::new, Statistics::add, Statistics::merge)));
this may have a small performance advantage, as it only creates one Statistics instance per group for a sequential evaluation. It even supports parallel evaluation, but you’d need a very large list with sufficiently large groups to get a benefit from parallel evaluation.
For a sequential evaluation, the operation is equivalent to
lineItemList.forEach(b ->
        map.computeIfAbsent(b.getBucketName(), x -> new Statistics()).add(b));
whereas merging partial results after a parallel evaluation works closer to the example already given in the linked answer, i.e.
secondMap.forEach((key, value) -> firstMap.merge(key, value, Statistics::merge));
As you're using BigDecimal for the amounts (which is the correct approach, IMO), you can't make use of Collectors.summarizingDouble, which summarizes count, sum, average, min and max in one pass.
Alexis C. has already shown in his answer one way to do it with streams. Another way would be to write your own collector, as shown in Holger's answer.
Here I'll show another way. First let's create a container class with a helper method. Then, instead of using streams, I'll use common Map operations.
class Statistics {
    int count;
    BigDecimal sum;

    Statistics(Bucket bucket) {
        count = 1;
        sum = bucket.getPaidAmount();
    }

    Statistics merge(Statistics another) {
        count += another.count;
        sum = sum.add(another.sum);
        return this;
    }
}
Now, you can make the grouping as follows:
Map<String, Statistics> result = new HashMap<>();
lineItemList.forEach(b ->
        result.merge(b.getBucketName(), new Statistics(b), Statistics::merge));
This works by using the Map.merge method, whose docs say:
If the specified key is not already associated with a value or is associated with null, associates it with the given non-null value. Otherwise, replaces the associated value with the results of the given remapping function
You could reduce pairs where the keys would hold the sum and the values would hold the count:
Map<String, SimpleEntry<BigDecimal, Long>> map =
        lineItemList.stream()
                .collect(groupingBy(Bucket::getBucketName,
                        reducing(new SimpleEntry<>(BigDecimal.ZERO, 0L),
                                b -> new SimpleEntry<>(b.getPaidAmount(), 1L),
                                (v1, v2) -> new SimpleEntry<>(v1.getKey().add(v2.getKey()),
                                                              v1.getValue() + v2.getValue()))));
although Collectors.toMap looks cleaner:
Map<String, SimpleEntry<BigDecimal, Long>> map =
        lineItemList.stream()
                .collect(toMap(Bucket::getBucketName,
                        b -> new SimpleEntry<>(b.getPaidAmount(), 1L),
                        (v1, v2) -> new SimpleEntry<>(v1.getKey().add(v2.getKey()),
                                                      v1.getValue() + v2.getValue())));
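To print the expected "sum, count, classname" rows from either map, something like:

map.forEach((name, entry) ->
        System.out.println(entry.getKey() + ", " + entry.getValue() + ", " + name));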

Issue with Java 8 Lambda for effective final while incrementing counts

I want to use a Java 8 lambda expression in the following scenario, but I get Local variable fooCount defined in an enclosing scope must be final or effectively final. I understand what the error message says, but I need to calculate a percentage here, so I need to increment fooCount and barCount and then calculate the percentage. What's the way to achieve that?
// key is a String with values like "FOO;SomethingElse" and value is Long
final Map<String, Long> map = null;
....
private int calculateFooPercentage() {
    long fooCount = 0L;
    long barCount = 0L;
    map.forEach((k, v) -> {
        if (k.contains("FOO")) {
            fooCount++;
        } else {
            barCount++;
        }
    });
    final int fooPercentage = 0;
    // Rest of the logic to calculate percentage
    ....
    return fooPercentage;
}
One option I have is to use an AtomicLong here instead of long, but I would like to avoid it so that later, if possible, I can use a parallel stream here.
There is a count method on streams that does the counting for you:
long fooCount = map.keySet().stream().filter(k -> k.contains("FOO")).count();
long barCount = map.size() - fooCount;
If you want parallelisation, change .stream() to .parallelStream().
Alternatively, if you were trying to increment a variable manually, and use stream parallelisation, then you would want to use something like AtomicLong for thread safety. A simple variable, even if the compiler allowed it, would not be thread-safe.
To get both numbers, matching and non-matching elements, you can use
Map<Boolean, Long> result = map.keySet().stream()
        .collect(Collectors.partitioningBy(k -> k.contains("FOO"), Collectors.counting()));
long fooCount = result.get(true);
long barCount = result.get(false);
But since your source is a Map, which knows its total size, and want to calculate a percentage, for which barCount is not needed, this specific task can be solved as
private int calculateFooPercentage() {
    return (int) (map.keySet().stream().filter(k -> k.contains("FOO")).count()
            * 100 / map.size());
}
Both variants are thread safe, i.e. changing stream() to parallelStream() will perform the operation in parallel, however, it’s unlikely that this operation will benefit from parallel processing. You would need humongous key strings or maps to get a benefit…
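For instance, the counting variant in parallel would be:

long fooCount = map.keySet().parallelStream().filter(k -> k.contains("FOO")).count();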
I agree with the other answers indicating you should use count or partitioningBy.
Just to explain the atomicity problem with an example, consider the following code:
private static AtomicInteger i1 = new AtomicInteger(0);
private static int i2 = 0;

public static void main(String[] args) {
    IntStream.range(0, 100000).parallel().forEach(n -> i1.incrementAndGet());
    System.out.println(i1);
    IntStream.range(0, 100000).parallel().forEach(n -> i2++);
    System.out.println(i2);
}
This returns the expected result of 100000 for i1, but an indeterminate number less than that (between 50000 and 80000 in my test runs) for i2. The reason should be pretty obvious: i2++ is a non-atomic read-modify-write on a shared field, so concurrent increments can be lost.

Java Streams — How to perform an intermediate function every nth item

I am looking for an operation on a Stream that enables me to perform a non-terminal (and/or terminal) operation every nth item. Although I use a stream of primes for example, the stream could just as easily be web-requests, user actions, or some other cold data or live feed being produced.
From this:
Duration start = Duration.ofNanos(System.nanoTime());
IntStream.iterate(2, n -> n + 1)
        .filter(Findprimes::isPrime)
        .limit(1_000_000 * 10)
        .forEach(System.out::println);
System.out.println("Duration: " + Duration.ofNanos(System.nanoTime()).minus(start));
To a stream function like this:
IntStream.iterate(2, n -> n + 1)
        .filter(Findprimes::isPrime)
        .limit(1_000_000 * 10)
        .peekEvery(10, System.out::println)
        .forEach(it -> {});
Create a helper method to wrap the peek() consumer:
public static IntConsumer every(int count, IntConsumer consumer) {
if (count <= 0)
throw new IllegalArgumentException("Count must be >1: Got " + count);
return new IntConsumer() {
private int i;
#Override
public void accept(int value) {
if (++this.i == count) {
consumer.accept(value);
this.i = 0;
}
}
};
}
You can now use it almost exactly like you wanted:
IntStream.rangeClosed(1, 20)
        .peek(every(5, System.out::println))
        .count();
Output
5
10
15
20
The helper method can be put in a utility class and statically imported, similar to how the Collectors class is nothing but static helper methods.
As noted by @user140547 in a comment, this code is not thread-safe, so it cannot be used with parallel streams. Besides, the output order would be messed up, so it doesn't really make sense to use it with parallel streams anyway.
It is not a good idea to rely on peek() and count() as it is possible that the operation is not invoked at all if count() can be calculated without going over the whole stream. Even if it works now, it does not mean that it is also going to work in future. See the javadoc of Stream.count() in Java 9.
Better use forEach().
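For example, since the every() helper from the accepted answer above returns an IntConsumer, it can be passed straight to forEach(), which is guaranteed to traverse the whole stream:

IntStream.rangeClosed(1, 20)
        .forEach(every(5, System.out::println)); // prints 5, 10, 15, 20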
For the problem itself: in special cases like a simple iteration, you could just filter your objects:

Stream.iterate(2, n -> n + 1)
        .limit(20)
        .filter(n -> (n - 2) % 5 == 0 && n != 2)
        .forEach(System.out::println);
This of course won't work for other cases, where you might use a stateful IntConsumer. If iterate() is used, it is probably not that useful to use parallel streams anyway.
If you want a generic solution, you could also try to use a "normal" Stream, which may not be as efficient as an IntStream, but should still suffice in many cases:
class Tuple { // ctor, getter/setter omitted
    int index;
    int value;
}
Then you could do:
Stream.iterate(new Tuple(1, 2), t -> new Tuple(t.index + 1, t.value * 2))
        .limit(30)
        .filter(t -> t.index % 5 == 0)
        .forEach(System.out::println);
If you have to use peek(), you can also do
.peek(t -> { if (t.index % 5 == 0) System.out.println(t); })
Or if you add methods
static Tuple initialTuple(int value) {
    return new Tuple(1, value);
}

static UnaryOperator<Tuple> createNextTuple(IntUnaryOperator f) {
    return current -> new Tuple(current.index + 1, f.applyAsInt(current.value));
}

static Consumer<Tuple> every(int n, IntConsumer consumer) {
    return tuple -> { if (tuple.index % n == 0) consumer.accept(tuple.value); };
}
you can also do (with static imports):
Stream.iterate(initialTuple(2), createNextTuple(x -> x * 2))
        .limit(30)
        .peek(every(5, System.out::println))
        .forEach(System.out::println);
Try this.
int[] counter = {0};
long result = IntStream.iterate(2, n -> n + 1)
        .filter(Findprimes::isPrime)
        .limit(100)
        .peek(x -> { if (counter[0]++ % 10 == 0) System.out.print(x + " "); })
        .count();
result:
2 31 73 127 179 233 283 353 419 467
