Processing an array in parallel in Java

I am trying to get faster output by using threads; this is just a small proof of concept.
Suppose I have a problem statement: find all the numbers in an array that occur an odd number of times.
Following is my attempt at both a sequential and a parallel solution.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
public class Test1 {
final static Map<Integer, Integer> mymap = new HashMap<>();
static Map<Integer, AtomicInteger> mymap1 = new ConcurrentHashMap<>();
public static void generateData(final int[] arr) {
final Random aRandom = new Random();
for (int i = 0; i < arr.length; i++) {
arr[i] = aRandom.nextInt(10);
}
}
public static void calculateAllOddOccurrence(final int[] arr) {
for (int i = 0; i < arr.length; i++) {
if (mymap.containsKey(arr[i])) {
mymap.put(arr[i], mymap.get(arr[i]) + 1);
} else {
mymap.put(arr[i], 1);
}
}
for (final Map.Entry<Integer, Integer> entry : mymap.entrySet()) {
if (entry.getValue() % 2 != 0) {
System.out.println(entry.getKey() + "=" + entry.getValue());
}
}
}
public static void calculateAllOddOccurrenceThread(final int[] arr) {
final ExecutorService executor = Executors.newFixedThreadPool(10);
final List<Future<?>> results = new ArrayList<>();
final int range = arr.length / 10;
for (int count = 0; count < 10; ++count) {
final int startAt = count * range;
final int endAt = startAt + range;
executor.submit(() -> {
for (int i = startAt; i < endAt; i++) {
if (mymap1.containsKey(arr[i])) {
final AtomicInteger accumulator = mymap1.get(arr[i]);
accumulator.incrementAndGet();
mymap1.put(arr[i], accumulator);
} else {
mymap1.put(arr[i], new AtomicInteger(1));
}
}
});
}
awaitTerminationAfterShutdown(executor);
for (final Entry<Integer, AtomicInteger> entry : mymap1.entrySet()) {
if (entry.getValue().get() % 2 != 0) {
System.out.println(entry.getKey() + "=" + entry.getValue());
}
}
}
public static void calculateAllOddOccurrenceStream(final int[] arr) {
final ConcurrentMap<Integer, List<Integer>> map2 = Arrays.stream(arr).parallel().boxed().collect(Collectors.groupingByConcurrent(i -> i));
map2.entrySet().stream().parallel().filter(e -> e.getValue().size() % 2 != 0).forEach(entry -> System.out.println(entry.getKey() + "=" + entry.getValue().size()));
}
public static void awaitTerminationAfterShutdown(final ExecutorService threadPool) {
threadPool.shutdown();
try {
if (!threadPool.awaitTermination(60, TimeUnit.SECONDS)) {
threadPool.shutdownNow();
}
} catch (final InterruptedException ex) {
threadPool.shutdownNow();
Thread.currentThread().interrupt();
}
}
public static void main(final String... doYourBest) {
final int[] arr = new int[200000000];
generateData(arr);
long starttime = System.currentTimeMillis();
calculateAllOddOccurrence(arr);
System.out.println("Total time=" + (System.currentTimeMillis() - starttime));
starttime = System.currentTimeMillis();
calculateAllOddOccurrenceStream(arr);
System.out.println("Total time Thread=" + (System.currentTimeMillis() - starttime));
}
}
Output:
1=20003685
2=20000961
3=19991311
5=20006433
7=19995737
8=19999463
Total time=3418
5=20006433
7=19995737
1=20003685
8=19999463
2=20000961
3=19991311
Total time stream=19640
Parallel execution (calculateAllOddOccurrenceStream) is taking more time. What is the best way to process an array in parallel and then merge the results?
My goal is not to find the fastest algorithm, but to take any algorithm and run it in different threads so that they process different parts of the array simultaneously.

It seems that those threads are working on the same parts of the array simultaneously, hence the answer does not come out correctly.
Rather, divide the array into parts with proper start and end indexes, allocate a separate thread to each part, and count the occurrences of each number in each of those parts.
At the end, you will have multiple maps holding counts calculated from those separate parts. Merge those maps to get the final answer.
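For illustration, here is a minimal sketch of that split/merge idea (the pool size, chunking, and the method and class names are illustrative, not taken from the question):
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
public class SplitMergeSketch {
    // Count occurrences of each value: one private map per chunk, merged at the end.
    public static Map<Integer, Integer> countOccurrences(final int[] arr, final int parts)
            throws InterruptedException, ExecutionException {
        final ExecutorService pool = Executors.newFixedThreadPool(parts);
        final List<Future<Map<Integer, Integer>>> futures = new ArrayList<>();
        final int chunk = (arr.length + parts - 1) / parts; // ceiling division, so no element is dropped
        for (int p = 0; p < parts; p++) {
            final int start = p * chunk;
            final int end = Math.min(start + chunk, arr.length);
            futures.add(pool.submit(() -> {
                final Map<Integer, Integer> local = new HashMap<>(); // no sharing, no locks
                for (int i = start; i < end; i++) {
                    local.merge(arr[i], 1, Integer::sum);
                }
                return local;
            }));
        }
        final Map<Integer, Integer> total = new HashMap<>();
        for (final Future<Map<Integer, Integer>> f : futures) {
            f.get().forEach((k, v) -> total.merge(k, v, Integer::sum)); // merge step
        }
        pool.shutdown();
        return total;
    }
}
The odd-occurrence filter then runs over the merged map exactly as in the sequential version.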
OR you could have a single ConcurrentHashMap for storing the counts coming from all those threads, but a bug could creep in there, as there would still be concurrent write conflicts: a plain read-modify-write sequence on a ConcurrentHashMap is not atomic. For guaranteed behaviour, the correct way is to use the atomicity of the ConcurrentHashMap.putIfAbsent(K key, V value) method and pay attention to its return value, which tells you whether the put succeeded. A simple put might not be correct. See https://stackoverflow.com/a/14947844/945214
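To make the putIfAbsent advice concrete, a small sketch of the pattern (variable names are illustrative):
// putIfAbsent returns null if our value was stored, or the existing value if
// another thread won the race; branching on the return value makes this safe.
ConcurrentMap<Integer, AtomicInteger> counts = new ConcurrentHashMap<>();
AtomicInteger existing = counts.putIfAbsent(value, new AtomicInteger(1));
if (existing != null) {
    existing.incrementAndGet();
}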
You could use the Java 8 Streams API (https://www.journaldev.com/2774/java-8-stream) to write the code, or simple threading code using Java 5 constructs would also do.
Added Java 8 stream code; notice the timing differences. An ArrayList (instead of an array) makes a difference:
package com.test;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.Function;
import java.util.stream.Collectors;
public class Test {
public static void generateData(final int[] arr) {
final Random aRandom = new Random();
for (int i = 0; i < arr.length; i++) {
arr[i] = aRandom.nextInt(10);
}
}
public static void calculateAllOddOccurrence(final int[] arr) {
final Map<Integer, Integer> mymap = new HashMap<>();
for (int i = 0; i < arr.length; i++) {
if (mymap.containsKey(arr[i])) {
mymap.put(arr[i], mymap.get(arr[i]) + 1);
} else {
mymap.put(arr[i], 1);
}
}
for (final Map.Entry<Integer, Integer> entry : mymap.entrySet()) {
if (entry.getValue() % 2 != 0) {
System.out.println(entry.getKey() + "=" + entry.getValue());
}
}
}
public static void calculateAllOddOccurrenceStream(int[] arr) {
Arrays.stream(arr).boxed().collect(Collectors.groupingBy(Function.identity(), Collectors.counting())).entrySet().parallelStream().filter(e -> e.getValue() % 2 != 0).forEach(entry -> System.out.println(entry.getKey()+"="+ entry.getValue()));
}
public static void calculateAllOddOccurrenceStream(List<Integer> list) {
list.parallelStream().collect(Collectors.groupingBy(Function.identity(), Collectors.counting())).entrySet().parallelStream().filter(e -> e.getValue() % 2 != 0).forEach(entry -> System.out.println(entry.getKey()+"="+ entry.getValue()));
}
public static void main(final String... doYourBest) {
final int[] arr = new int[200000000];
generateData(arr);
long starttime = System.currentTimeMillis();
calculateAllOddOccurrence(arr);
System.out.println("Total time with simple map=" + (System.currentTimeMillis() - starttime));
List<Integer> list = Arrays.stream(arr).boxed().collect(Collectors.toList());
starttime = System.currentTimeMillis();
calculateAllOddOccurrenceStream(list);
System.out.println("Total time stream - with a readymade list, which might be the case for most apps as arraylist is more easier to work with =" + (System.currentTimeMillis() - starttime));
starttime = System.currentTimeMillis();
calculateAllOddOccurrenceStream(arr);
System.out.println("Total time Stream with array=" + (System.currentTimeMillis() - starttime));
}
}
OUTPUT
0=19999427
2=20001707
4=20002331
5=20001585
7=20001859
8=19993989
Total time with simple map=2813
4=20002331
0=19999427
2=20001707
7=20001859
8=19993989
5=20001585
Total time stream - with a ready-made list, which might be the case for most apps as an ArrayList is easier to work with = 3328
8=19993989
7=20001859
0=19999427
4=20002331
2=20001707
5=20001585
Total time Stream with array=6115

You are looking for the Streams API introduced in Java 8:
http://www.baeldung.com/java-8-streams
Example:
// sequential processing
myArray.stream().filter( ... ).map( ... ).collect(Collectors.toList());
// parallel processing
myArray.parallelStream().filter( ... ).map( ... ).collect(Collectors.toList());
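A concrete, runnable version of that skeleton (the data and the filter/map steps are made up for illustration):
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
public class StreamDemo {
    public static void main(String[] args) {
        List<Integer> myArray = Arrays.asList(3, 1, 4, 1, 5, 9, 2, 6);
        // Parallel pipeline: keep the even numbers, square them, collect to a list.
        List<Integer> squares = myArray.parallelStream()
                .filter(n -> n % 2 == 0)
                .map(n -> n * n)
                .collect(Collectors.toList());
        System.out.println(squares); // [16, 4, 36]; collect preserves encounter order
    }
}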

Looking at your code, you're going wrong with this line:
mymap1.put(arr[i], mymap1.get(arr[i]) + 1);
You are overwriting the values in parallel, for example:
Thread 1 'get' = 0
Thread 2 'get' = 0
Thread 1 'put 1'
Thread 2 'put 1'
Change your map to:
static Map<Integer, AtomicInteger> mymap1 = new ConcurrentHashMap<>();
static {
//initialize to avoid null values and non-synchronized puts from different Threads
for(int i=0;i<10;i++) {
mymap1.put(i, new AtomicInteger());
}
}
....
//in your loop
for (int i = 0; i < arr.length; i++) {
AtomicInteger accumulator = mymap1.get(arr[i]);
accumulator.incrementAndGet();
}
Edit: The problem with the above approach is of course the initialization of mymap1. To avoid falling into the same trap (creating AtomicInteger within the loop and overwriting each other yet again), it needs to be prefilled with values.
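An alternative that avoids prefilling entirely is ConcurrentHashMap.computeIfAbsent, which creates the missing counter atomically on first access:
// computeIfAbsent is atomic on ConcurrentHashMap, so the AtomicInteger is
// created exactly once per key and can never be overwritten by another thread.
for (int i = startAt; i < endAt; i++) {
    mymap1.computeIfAbsent(arr[i], k -> new AtomicInteger()).incrementAndGet();
}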
Since I'm feeling generous, here's what might work with the Streams API:
long totalEvenCount = Arrays.stream(arr).parallel().filter(i -> i % 2 == 0).count();
long totalOddCount = Arrays.stream(arr).parallel().filter(i -> i % 2 != 0).count();
//or this to count by individual numbers:
ConcurrentMap<Integer,List<Integer>> map1 = Arrays.stream(arr).parallel().boxed().collect(Collectors.groupingByConcurrent(i->i));
map1.entrySet().stream().filter(e -> e.getValue().size() % 2 != 0).forEach(entry -> System.out.println(entry.getKey() + "=" + entry.getValue().size()));
As an exercise to the reader, perhaps you can look into how the various Collectors work, in order to write your own countingBy(i->i%2!=0) to output a map only containing the counts instead of a list of values.
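As a head start on that exercise, the built-in Collectors.counting() downstream collector already yields a map of counts rather than lists (a sketch):
// Group in parallel, keeping only a count per key instead of a List of values.
ConcurrentMap<Integer, Long> counts = Arrays.stream(arr).parallel().boxed()
        .collect(Collectors.groupingByConcurrent(i -> i, Collectors.counting()));
counts.entrySet().stream()
        .filter(e -> e.getValue() % 2 != 0)
        .forEach(e -> System.out.println(e.getKey() + "=" + e.getValue()));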

Related

Why are Koloboke HashObjObjMaps maps so slow at putIfAbsent when using Longs as keys?

The following code shows that Koloboke HashObjObjMaps maps are very slow at putIfAbsent. Is there a design flaw here?
import com.koloboke.collect.map.hash.HashObjObjMaps;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
public class Koloboke {
public static void main(String [] args) {
Map<Long, String> normalMap = new HashMap<>();
Map<Long, String> kolobokeMap = HashObjObjMaps.newMutableMap();
long iterations = 100_000;
for(long i = 0;i<iterations;i++) {
normalMap.put(i, Long.toString(i));
kolobokeMap.put(i, Long.toString(i));
}
long nanoStart= System.nanoTime();
for(long i = 0;i<iterations;i++) {
normalMap.putIfAbsent(i, Long.toString(i));
}
System.out.println("hashmap putIfAbsent took " + TimeUnit.NANOSECONDS.toMillis(System.nanoTime()-nanoStart) + " millis");
nanoStart= System.nanoTime();
for(long i = 0;i<iterations;i++) {
kolobokeMap.putIfAbsent(i, Long.toString(i));
}
System.out.println("koloboke putIfAbsent took " + TimeUnit.NANOSECONDS.toMillis(System.nanoTime()-nanoStart) + " millis");
}
}
Output:
hashmap putIfAbsent took 27 millis
koloboke putIfAbsent took 19733 millis

How do I provoke race conditions on this non-thread-safe ArrayList class?

I'm playing around with trying to build an ArrayList-like class that is thread-safe.
import java.util.stream.*;
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class LongArrayListUnsafe {
public static void main(String[] args) {
LongArrayList dal1 = LongArrayList.withElements();
ExecutorService executorService = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
for (int i=0; i<1000; i++) {
executorService.execute(new Runnable() {
public void run() {
for (int i=0; i<10; i++)
dal1.add(i);
}
});
}
System.out.println("Using toString(): " + dal1);
for (int i = 0; i < dal1.size(); i++)
System.out.println(dal1.get(i));
System.out.println(dal1.size());
}
}
class LongArrayList {
private long[] items;
private int size;
public LongArrayList() {
reset();
}
public static LongArrayList withElements(long... initialValues){
LongArrayList list = new LongArrayList();
for (long l : initialValues) list.add( l );
return list;
}
// reset me to initial
public void reset(){
items = new long[2];
size = 0;
}
// Number of items in the double list
public int size() {
return size;
}
// Return item number i
public long get(int i) {
if (0 <= i && i < size)
return items[i];
else
throw new IndexOutOfBoundsException(String.valueOf(i));
}
// Replace item number i, if any, with x
public long set(int i, long x) {
if (0 <= i && i < size) {
long old = items[i];
items[i] = x;
return old;
} else
throw new IndexOutOfBoundsException(String.valueOf(i));
}
// Add item x to end of list
public LongArrayList add(long x) {
if (size == items.length) {
long[] newItems = new long[items.length * 2];
for (int i=0; i<items.length; i++)
newItems[i] = items[i];
items = newItems;
}
items[size] = x;
size++;
return this;
}
public String toString() {
return Arrays.stream(items, 0,size)
.mapToObj( Long::toString )
.collect(Collectors.joining(", ", "[", "]"));
}
}
The "Longaraylist" class is simply a class for a list that is not threadsafe, this is the class that I later want to manipulate to become threadsafe.
What is bugging me right now is the driver code in main, I am creating a executorservice, and I am submitting a bunch of tasks to it. In my mind, using this non-threadsafe class, this should introduce some race conditions, and the "size" variable should at the very least be broken. But every time I run this code, this runs perfectly. The size that I print at the end perfectly corresponds to the amount of times that add() is called.
What could I do differently? Maybe I am using the executorservice wrong?
I would also be grateful of any other tips to test the threadsafety of such a class. thank you
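One possible explanation, offered as an assumption rather than a verified diagnosis: main never waits for the pool, so the final reads race with the tasks unpredictably. A sketch that shuts the pool down and waits before inspecting the list, which typically makes lost updates visible (requires java.util.concurrent.TimeUnit and a throws InterruptedException on main):
// Wait for all tasks to finish before reading size(), then compare
// against the expected total of 1000 tasks * 10 adds each.
executorService.shutdown();
if (!executorService.awaitTermination(1, TimeUnit.MINUTES)) {
    throw new IllegalStateException("tasks did not finish in time");
}
System.out.println("size = " + dal1.size() + ", expected = " + (1000 * 10));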

Incrementing and removing elements of ConcurrentHashMap

There is a class Counter, which contains a set of keys and allows incrementing the value of each key and getting all values. The task I'm trying to solve is the same as in Atomically incrementing counters stored in ConcurrentHashMap. The difference is that the set of keys is unbounded, so new keys are added frequently.
In order to reduce memory consumption, I clear values after they are read; this happens in Counter.getAndClear(). Keys are also removed, and this seems to break things.
One thread increments random keys and another thread gets snapshots of all values and clears them.
The code is below:
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.Map;
import java.util.HashMap;
import java.lang.Thread;
class HashMapTest {
private final static int hashMapInitSize = 170;
private final static int maxKeys = 100;
private final static int nIterations = 10_000_000;
private final static int sleepMs = 100;
private static class Counter {
private ConcurrentMap<String, Long> map;
public Counter() {
map = new ConcurrentHashMap<String, Long>(hashMapInitSize);
}
public void increment(String key) {
Long value;
do {
value = map.computeIfAbsent(key, k -> 0L);
} while (!map.replace(key, value, value + 1L));
}
public Map<String, Long> getAndClear() {
Map<String, Long> mapCopy = new HashMap<String, Long>();
for (String key : map.keySet()) {
Long removedValue = map.remove(key);
if (removedValue != null)
mapCopy.put(key, removedValue);
}
return mapCopy;
}
}
// The code below is used for testing
public static void main(String[] args) throws InterruptedException {
Counter counter = new Counter();
Thread thread = new Thread(new Runnable() {
public void run() {
for (int j = 0; j < nIterations; j++) {
int index = ThreadLocalRandom.current().nextInt(maxKeys);
counter.increment(Integer.toString(index));
}
}
}, "incrementThread");
Thread readerThread = new Thread(new Runnable() {
public void run() {
long sum = 0;
boolean isDone = false;
while (!isDone) {
try {
Thread.sleep(sleepMs);
}
catch (InterruptedException e) {
isDone = true;
}
Map<String, Long> map = counter.getAndClear();
for (Map.Entry<String, Long> entry : map.entrySet()) {
Long value = entry.getValue();
sum += value;
}
System.out.println("mapSize: " + map.size());
}
System.out.println("sum: " + sum);
System.out.println("expected: " + nIterations);
}
}, "readerThread");
thread.start();
readerThread.start();
thread.join();
readerThread.interrupt();
readerThread.join();
// Ensure that counter is empty
System.out.println("elements left in map: " + counter.getAndClear().size());
}
}
While testing I have noticed that some increments are lost. I get the following results:
sum: 9993354
expected: 10000000
elements left in map: 0
If you can't reproduce this error (that sum is less than expected), you can try to increase maxKeys a few orders of magnitude or decrease hashMapInitSize or increase nIterations (the latter also increases run time). I have also included testing code (main method) in the case it has any errors.
I suspect that the error is happening when capacity of ConcurrentHashMap is increased during runtime. On my computer the code appears to work correctly when hashMapInitSize is 170, but fails when hashMapInitSize is 171. I believe that size of 171 triggers increasing of capacity (128 / 0.75 == 170.66, where 0.75 is the default load factor of hash map).
So, the question is: am I using remove, replace and computeIfAbsent operations correctly? I assume that they are atomic operations on ConcurrentHashMap based on answers to Use of ConcurrentHashMap eliminates data-visibility troubles?. If so, why are some increments lost?
EDIT:
I think I missed an important detail: increment() is supposed to be called much more frequently than getAndClear(), which is why I try to avoid any explicit locking in increment(). However, I'm going to test the performance of different versions later to see if this is really an issue.
I guess the problem is the use of remove while iterating over the keySet. This is what the JavaDoc says for Map#keySet() (my emphasis):
Returns a Set view of the keys contained in this map. The set is backed by the map, so changes to the map are reflected in the set, and vice-versa. If the map is modified while an iteration over the set is in progress (except through the iterator's own remove operation), the results of the iteration are undefined.
The JavaDoc for ConcurrentHashMap gives further clues:
Similarly, Iterators, Spliterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration.
The conclusion is that mutating the map while iterating over the keys is not predictable.
One solution is to create a new map for the getAndClear() operation and just return the old map. The switch has to be protected, and in the example below I used a ReentrantReadWriteLock:
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock.ReadLock;
import java.util.concurrent.locks.ReentrantReadWriteLock.WriteLock;
class HashMapTest {
private final static int hashMapInitSize = 170;
private final static int maxKeys = 100;
private final static int nIterations = 10_000_000;
private final static int sleepMs = 100;
private static class Counter {
private ConcurrentMap<String, Long> map;
ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
ReadLock readLock = lock.readLock();
WriteLock writeLock = lock.writeLock();
public Counter() {
map = new ConcurrentHashMap<>(hashMapInitSize);
}
public void increment(String key) {
readLock.lock();
try {
map.merge(key, 1L, Long::sum);
} finally {
readLock.unlock();
}
}
public Map<String, Long> getAndClear() {
ConcurrentMap<String, Long> oldMap;
writeLock.lock();
try {
oldMap = map;
map = new ConcurrentHashMap<>(hashMapInitSize);
} finally {
writeLock.unlock();
}
return oldMap;
}
}
// The code below is used for testing
public static void main(String[] args) throws InterruptedException {
final AtomicBoolean ready = new AtomicBoolean(false);
Counter counter = new Counter();
Thread thread = new Thread(new Runnable() {
public void run() {
for (int j = 0; j < nIterations; j++) {
int index = ThreadLocalRandom.current().nextInt(maxKeys);
counter.increment(Integer.toString(index));
}
}
}, "incrementThread");
Thread readerThread = new Thread(new Runnable() {
public void run() {
long sum = 0;
while (!ready.get()) {
try {
Thread.sleep(sleepMs);
} catch (InterruptedException e) {
//
}
Map<String, Long> map = counter.getAndClear();
for (Map.Entry<String, Long> entry : map.entrySet()) {
Long value = entry.getValue();
sum += value;
}
System.out.println("mapSize: " + map.size());
}
System.out.println("sum: " + sum);
System.out.println("expected: " + nIterations);
}
}, "readerThread");
thread.start();
readerThread.start();
thread.join();
ready.set(true);
readerThread.join();
// Ensure that counter is empty
System.out.println("elements left in map: " + counter.getAndClear().size());
}
}

Partition a Set into smaller Subsets and process as batch

I have a continuously running thread in my application which uses a HashSet to store all the symbols in the application. Per the design at the time it was written, inside the thread's while(true) loop it iterates the HashSet continuously and updates the database for all the symbols it contains.
The maximum number of symbols that might be present in the HashSet is around 6000. I don't want to update the DB with all 6000 symbols at once, but to divide the HashSet into subsets of 500 each (12 sets), process each subset individually, and have the thread sleep for 15 minutes after each subset, so that I can reduce the pressure on the database.
This is my code (a sample snippet).
How can I partition a set into smaller subsets and process them as batches? (I have seen examples partitioning an ArrayList and a TreeSet, but didn't find any example related to HashSet.)
package com.ubsc.rewji.threads;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.concurrent.PriorityBlockingQueue;
public class TaskerThread extends Thread {
private PriorityBlockingQueue<String> priorityBlocking = new PriorityBlockingQueue<String>();
String symbols[] = new String[] { "One", "Two", "Three", "Four" };
Set<String> allSymbolsSet = Collections
.synchronizedSet(new HashSet<String>(Arrays.asList(symbols)));
public void addsymbols(String commaDelimSymbolsList) {
if (commaDelimSymbolsList != null) {
String[] symAr = commaDelimSymbolsList.split(",");
for (int i = 0; i < symAr.length; i++) {
priorityBlocking.add(symAr[i]);
}
}
}
public void run() {
while (true) {
try {
while (priorityBlocking.peek() != null) {
String symbol = priorityBlocking.poll();
allSymbolsSet.add(symbol);
}
Iterator<String> ite = allSymbolsSet.iterator();
System.out.println("=======================");
while (ite.hasNext()) {
String symbol = ite.next();
if (symbol != null && symbol.trim().length() > 0) {
try {
updateDB(symbol);
} catch (Exception e) {
e.printStackTrace();
}
}
}
Thread.sleep(2000);
} catch (Exception e) {
e.printStackTrace();
}
}
}
public void updateDB(String symbol) {
System.out.println("THE SYMBOL BEING UPDATED IS" + " " + symbol);
}
public static void main(String args[]) {
TaskerThread taskThread = new TaskerThread();
taskThread.start();
String commaDelimSymbolsList = "ONVO,HJI,HYU,SD,F,SDF,ASA,TRET,TRE,JHG,RWE,XCX,WQE,KLJK,XCZ";
taskThread.addsymbols(commaDelimSymbolsList);
}
}
With Guava:
for (List<String> partition : Iterables.partition(yourSet, 500)) {
// ... handle partition ...
}
Or Apache Commons:
for (List<String> partition : ListUtils.partition(yourList, 500)) {
// ... handle partition ...
}
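Wired into the question's use case, a sketch (assuming Guava is on the classpath and updateDB as in the question):
// Inside run(): process the symbols in batches of 500, pausing between batches.
// Iterables.partition views the set as consecutive chunks of at most 500 elements.
for (List<String> batch : Iterables.partition(allSymbolsSet, 500)) {
    for (String symbol : batch) {
        updateDB(symbol);
    }
    TimeUnit.MINUTES.sleep(15); // declare or catch InterruptedException
}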
Do something like
private static final int PARTITIONS_COUNT = 12;
List<Set<Type>> theSets = new ArrayList<Set<Type>>(PARTITIONS_COUNT);
for (int i = 0; i < PARTITIONS_COUNT; i++) {
theSets.add(new HashSet<Type>());
}
int index = 0;
for (Type object : originalSet) {
theSets.get(index++ % PARTITIONS_COUNT).add(object);
}
Now you have partitioned the originalSet into 12 other HashSets.
We can use the following approach to divide a Set. For the five-element set built in main below, with a partition size of 2, we get the output:
[a, b]
[c, d]
[e]
private static List<Set<String>> partitionSet(Set<String> set, int partitionSize)
{
List<Set<String>> list = new ArrayList<>();
Iterator<String> iterator = set.iterator();
while (iterator.hasNext())
{
Set<String> newSet = new HashSet<>();
for (int j = 0; j < partitionSize && iterator.hasNext(); j++)
{
String s = iterator.next();
newSet.add(s);
}
list.add(newSet);
}
return list;
}
public static void main(String[] args)
{
Set<String> set = new HashSet<>();
set.add("a");
set.add("b");
set.add("c");
set.add("d");
set.add("e");
int partitionSize = 2;
List<Set<String>> list = partitionSet(set, partitionSize);
for(int i = 0; i < list.size(); i++)
{
Set<String> s = list.get(i);
System.out.println(s);
}
}
If you are not worried much about space complexity, you can do it in a clean way like this:
List<List<T>> partitionList = Lists.partition(new ArrayList<>(inputSet), PARTITION_SIZE);
List<Set<T>> partitionSet = partitionList.stream().<Set<T>>map(HashSet::new).collect(Collectors.toList());
The Guava solution from @Andrey_chaschev seems the best, but in case it is not possible to use it, I believe the following would help:
public static List<Set<String>> partition(Set<String> set, int chunk) {
if(set == null || set.isEmpty() || chunk < 1)
return new ArrayList<>();
List<Set<String>> partitionedList = new ArrayList<>();
double loopsize = Math.ceil((double) set.size() / (double) chunk);
for(int i =0; i < loopsize; i++) {
partitionedList.add(set.stream().skip((long)i * chunk).limit(chunk).collect(Collectors.toSet()));
}
return partitionedList;
}
A very simple way for your actual problem would be to change your code as follows:
Iterator<String> ite = allSymbolsSet.iterator();
System.out.println("=======================");
int i = 500;
while ((--i > 0) && ite.hasNext()) {
A general method would be to use the iterator to take the elements out one by one in a simple loop:
int i = 500;
while ((--i > 0) && ite.hasNext()) {
sublist.add(ite.next());
ite.remove();
}

java map concurrent update

I'm trying to create a Map with int values and increase them from multiple threads; two or more threads might increment the same key.
The ConcurrentHashMap documentation was very unclear to me, since it says:
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove)
I wonder if the following code using ConcurrentHashMap will work correctly:
myMap.put(X, myMap.get(X) + 1);
If not, how can I manage such a thing?
A concurrent map will not help the thread safety of your code; you can still get a race condition:
Thread-1: x = 1, get(x)
Thread-2: x = 1, get(x)
Thread-1: put(x + 1) => 2
Thread-2: put(x + 1) => 2
Two increments happened, but you still get only +1. You need a concurrent map only if you aim to modify the map itself, not its content. Even the simplest HashMap is thread-safe for concurrent reads, provided the map is no longer mutated.
So instead of a thread-safe map of a primitive type, you need a thread-safe wrapper for the value type: either something from java.util.concurrent.atomic, or roll your own locked container if you need an arbitrary type.
One idea would be combining ConcurrentMap with AtomicInteger, which has an increment method.
AtomicInteger current = map.putIfAbsent(key, new AtomicInteger(1));
int newValue = current == null ? 1 :current.incrementAndGet();
or (more efficiently, thanks @Keppil) with an extra guard to avoid unnecessary object creation:
AtomicInteger current = map.get(key);
if (current == null){
current = map.putIfAbsent(key, new AtomicInteger(1));
}
int newValue = current == null ? 1 : current.incrementAndGet();
Best practice: you can use HashMap with AtomicInteger values. This works here because the map is fully populated before the threads start and is never structurally modified afterwards, so the threads only read the map and mutate its thread-safe values.
Test code:
public class HashMapAtomicIntegerTest {
public static final int KEY = 10;
public static void main(String[] args) {
HashMap<Integer, AtomicInteger> concurrentHashMap = new HashMap<Integer, AtomicInteger>();
concurrentHashMap.put(HashMapAtomicIntegerTest.KEY, new AtomicInteger());
List<HashMapAtomicCountThread> threadList = new ArrayList<HashMapAtomicCountThread>();
for (int i = 0; i < 500; i++) {
HashMapAtomicCountThread testThread = new HashMapAtomicCountThread(
concurrentHashMap);
testThread.start();
threadList.add(testThread);
}
int index = 0;
while (true) {
for (int i = index; i < 500; i++) {
HashMapAtomicCountThread testThread = threadList.get(i);
if (testThread.isAlive()) {
break;
} else {
index++;
}
}
if (index == 500) {
break;
}
}
System.out.println("The result value should be " + 5000000
+ ",actually is"
+ concurrentHashMap.get(HashMapAtomicIntegerTest.KEY));
}
}
class HashMapAtomicCountThread extends Thread {
HashMap<Integer, AtomicInteger> concurrentHashMap = null;
public HashMapAtomicCountThread(
HashMap<Integer, AtomicInteger> concurrentHashMap) {
this.concurrentHashMap = concurrentHashMap;
}
@Override
public void run() {
for (int i = 0; i < 10000; i++) {
concurrentHashMap.get(HashMapAtomicIntegerTest.KEY)
.getAndIncrement();
}
}
}
Results:
The result value should be 5000000,actually is5000000
Or HashMap and synchronized, but much slower than the former
public class HashMapSynchronizeTest {
public static final int KEY = 10;
public static void main(String[] args) {
HashMap<Integer, Integer> hashMap = new HashMap<Integer, Integer>();
hashMap.put(KEY, 0);
List<HashMapSynchronizeThread> threadList = new ArrayList<HashMapSynchronizeThread>();
for (int i = 0; i < 500; i++) {
HashMapSynchronizeThread testThread = new HashMapSynchronizeThread(
hashMap);
testThread.start();
threadList.add(testThread);
}
int index = 0;
while (true) {
for (int i = index; i < 500; i++) {
HashMapSynchronizeThread testThread = threadList.get(i);
if (testThread.isAlive()) {
break;
} else {
index++;
}
}
if (index == 500) {
break;
}
}
System.out.println("The result value should be " + 5000000
+ ",actually is" + hashMap.get(KEY));
}
}
class HashMapSynchronizeThread extends Thread {
HashMap<Integer, Integer> hashMap = null;
public HashMapSynchronizeThread(
HashMap<Integer, Integer> hashMap) {
this.hashMap = hashMap;
}
@Override
public void run() {
for (int i = 0; i < 10000; i++) {
synchronized (hashMap) {
hashMap.put(HashMapSynchronizeTest.KEY,
hashMap
.get(HashMapSynchronizeTest.KEY) + 1);
}
}
}
}
Results:
The result value should be 5000000,actually is5000000
Using ConcurrentHashMap this way will give wrong results.
public class ConcurrentHashMapTest {
public static final int KEY = 10;
public static void main(String[] args) {
ConcurrentHashMap<Integer, Integer> concurrentHashMap = new ConcurrentHashMap<Integer, Integer>();
concurrentHashMap.put(KEY, 0);
List<CountThread> threadList = new ArrayList<CountThread>();
for (int i = 0; i < 500; i++) {
CountThread testThread = new CountThread(concurrentHashMap);
testThread.start();
threadList.add(testThread);
}
int index = 0;
while (true) {
for (int i = index; i < 500; i++) {
CountThread testThread = threadList.get(i);
if (testThread.isAlive()) {
break;
} else {
index++;
}
}
if (index == 500) {
break;
}
}
System.out.println("The result value should be " + 5000000
+ ",actually is" + concurrentHashMap.get(KEY));
}
}
class CountThread extends Thread {
ConcurrentHashMap<Integer, Integer> concurrentHashMap = null;
public CountThread(ConcurrentHashMap<Integer, Integer> concurrentHashMap) {
this.concurrentHashMap = concurrentHashMap;
}
@Override
public void run() {
for (int i = 0; i < 10000; i++) {
concurrentHashMap.put(ConcurrentHashMapTest.KEY,
concurrentHashMap.get(ConcurrentHashMapTest.KEY) + 1);
}
}
}
Results:
The result value should be 5000000,actually is11759
You could just put the operation in a synchronized (myMap) {...} block.
Your current code changes the values of your map concurrently so this will not work.
If multiple threads can put values into your map, you have to use a concurrent map like ConcurrentHashMap with non-thread-safe values like Integer. ConcurrentMap.replace will then do what you want (or use AtomicInteger to simplify your code).
If your threads will only change the values (and not add/change the keys) of your map, then you can use a standard map storing thread-safe values like AtomicInteger. Your thread would then call map.get(key).incrementAndGet(), for instance.
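For the ConcurrentMap.replace route, a sketch of the retry loop (assuming key is already defined; on Java 8+, merge does the same in one atomic call):
ConcurrentMap<String, Integer> map = new ConcurrentHashMap<>();
map.putIfAbsent(key, 0);
// The three-argument replace succeeds only if the current value still equals
// old, so looping until it succeeds makes the increment effectively atomic.
Integer old;
do {
    old = map.get(key);
} while (!map.replace(key, old, old + 1));
// Java 8+: a single atomic call.
map.merge(key, 1, Integer::sum);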
