Incrementing and removing elements of ConcurrentHashMap - java

There is a class Counter that holds a set of keys, allows incrementing the value of each key, and returns all values. So the task I'm trying to solve is the same as in Atomically incrementing counters stored in ConcurrentHashMap. The difference is that the set of keys is unbounded, so new keys are added frequently.
To reduce memory consumption, I clear values after they are read; this happens in Counter.getAndClear(). Keys are removed as well, and this seems to break things.
One thread increments random keys and another thread gets snapshots of all values and clears them.
The code is below:
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.Map;
import java.util.HashMap;
import java.lang.Thread;
class HashMapTest {
private final static int hashMapInitSize = 170;
private final static int maxKeys = 100;
private final static int nIterations = 10_000_000;
private final static int sleepMs = 100;
private static class Counter {
private ConcurrentMap<String, Long> map;
public Counter() {
map = new ConcurrentHashMap<String, Long>(hashMapInitSize);
}
public void increment(String key) {
Long value;
do {
value = map.computeIfAbsent(key, k -> 0L);
} while (!map.replace(key, value, value + 1L));
}
public Map<String, Long> getAndClear() {
Map<String, Long> mapCopy = new HashMap<String, Long>();
for (String key : map.keySet()) {
Long removedValue = map.remove(key);
if (removedValue != null)
mapCopy.put(key, removedValue);
}
return mapCopy;
}
}
// The code below is used for testing
public static void main(String[] args) throws InterruptedException {
Counter counter = new Counter();
Thread thread = new Thread(new Runnable() {
public void run() {
for (int j = 0; j < nIterations; j++) {
int index = ThreadLocalRandom.current().nextInt(maxKeys);
counter.increment(Integer.toString(index));
}
}
}, "incrementThread");
Thread readerThread = new Thread(new Runnable() {
public void run() {
long sum = 0;
boolean isDone = false;
while (!isDone) {
try {
Thread.sleep(sleepMs);
}
catch (InterruptedException e) {
isDone = true;
}
Map<String, Long> map = counter.getAndClear();
for (Map.Entry<String, Long> entry : map.entrySet()) {
Long value = entry.getValue();
sum += value;
}
System.out.println("mapSize: " + map.size());
}
System.out.println("sum: " + sum);
System.out.println("expected: " + nIterations);
}
}, "readerThread");
thread.start();
readerThread.start();
thread.join();
readerThread.interrupt();
readerThread.join();
// Ensure that counter is empty
System.out.println("elements left in map: " + counter.getAndClear().size());
}
}
While testing I have noticed that some increments are lost. I get the following results:
sum: 9993354
expected: 10000000
elements left in map: 0
If you can't reproduce this error (the sum being less than expected), try increasing maxKeys by a few orders of magnitude, decreasing hashMapInitSize, or increasing nIterations (the latter also increases run time). I have also included the testing code (main method) in case it has any errors.
I suspect that the error is happening when capacity of ConcurrentHashMap is increased during runtime. On my computer the code appears to work correctly when hashMapInitSize is 170, but fails when hashMapInitSize is 171. I believe that size of 171 triggers increasing of capacity (128 / 0.75 == 170.66, where 0.75 is the default load factor of hash map).
So, the question is: am I using remove, replace and computeIfAbsent operations correctly? I assume that they are atomic operations on ConcurrentHashMap based on answers to Use of ConcurrentHashMap eliminates data-visibility troubles?. If so, why are some increments lost?
EDIT:
I think that I missed an important detail here that increment() is supposed to be called much more frequently than getAndClear(), so that I try to avoid any explicit locking in increment(). However, I'm going to test performance of different versions later to see if it is really an issue.

I guess the problem is the use of remove while iterating over the keySet. This is what the JavaDoc says for Map#keySet() (my emphasis):
Returns a Set view of the keys contained in this map. The set is backed by the map, so changes to the map are reflected in the set, and vice-versa. If the map is modified while an iteration over the set is in progress (except through the iterator's own remove operation), the results of the iteration are undefined.
The JavaDoc for ConcurrentHashMap gives further clues:
Similarly, Iterators, Spliterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration.
The conclusion is that mutating the map while iterating over the keys is not predictable.
One solution is to create a new map for the getAndClear() operation and just return the old map. The switch has to be protected, and in the example below I used a ReentrantReadWriteLock:
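import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock.ReadLock;
import java.util.concurrent.locks.ReentrantReadWriteLock.WriteLock;
class HashMapTest {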
private final static int hashMapInitSize = 170;
private final static int maxKeys = 100;
private final static int nIterations = 10_000_000;
private final static int sleepMs = 100;
private static class Counter {
private ConcurrentMap<String, Long> map;
ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
ReadLock readLock = lock.readLock();
WriteLock writeLock = lock.writeLock();
public Counter() {
map = new ConcurrentHashMap<>(hashMapInitSize);
}
public void increment(String key) {
readLock.lock();
try {
map.merge(key, 1L, Long::sum);
} finally {
readLock.unlock();
}
}
public Map<String, Long> getAndClear() {
ConcurrentMap<String, Long> oldMap;
writeLock.lock();
try {
oldMap = map;
map = new ConcurrentHashMap<>(hashMapInitSize);
} finally {
writeLock.unlock();
}
return oldMap;
}
}
// The code below is used for testing
public static void main(String[] args) throws InterruptedException {
final AtomicBoolean ready = new AtomicBoolean(false);
Counter counter = new Counter();
Thread thread = new Thread(new Runnable() {
public void run() {
for (int j = 0; j < nIterations; j++) {
int index = ThreadLocalRandom.current().nextInt(maxKeys);
counter.increment(Integer.toString(index));
}
}
}, "incrementThread");
Thread readerThread = new Thread(new Runnable() {
public void run() {
long sum = 0;
while (!ready.get()) {
try {
Thread.sleep(sleepMs);
} catch (InterruptedException e) {
//
}
Map<String, Long> map = counter.getAndClear();
for (Map.Entry<String, Long> entry : map.entrySet()) {
Long value = entry.getValue();
sum += value;
}
System.out.println("mapSize: " + map.size());
}
System.out.println("sum: " + sum);
System.out.println("expected: " + nIterations);
}
}, "readerThread");
thread.start();
readerThread.start();
thread.join();
ready.set(true);
readerThread.join();
// Ensure that counter is empty
System.out.println("elements left in map: " + counter.getAndClear().size());
}
}
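If the read lock in increment() turns out to matter, a lock-free variation may be worth testing. The sketch below assumes Java 8 or later and relies on two documented properties of ConcurrentHashMap: merge() performs the read-modify-write atomically for a key, and remove(key) atomically returns the value that was present. Each increment therefore ends up either in a removed value or in a fresh entry; a key that the iterator happens to miss is simply collected by a later call. The names MergeCounter and demo are made up for illustration, and this has not been benchmarked against the lock-based version.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class MergeCounter {
    private final ConcurrentMap<String, Long> map = new ConcurrentHashMap<>();

    // merge() does the whole read-modify-write atomically for this key.
    void increment(String key) {
        map.merge(key, 1L, Long::sum);
    }

    // remove(key) atomically hands back the count that was stored.
    Map<String, Long> getAndClear() {
        Map<String, Long> copy = new HashMap<>();
        for (String key : map.keySet()) {
            Long removed = map.remove(key);
            if (removed != null) {
                // Sum instead of put, in case a key is ever seen twice.
                copy.merge(key, removed, Long::sum);
            }
        }
        return copy;
    }

    // Hypothetical self-check: writers increment while the caller drains.
    static long demo(int writers, int perWriter) {
        MergeCounter counter = new MergeCounter();
        Thread[] threads = new Thread[writers];
        for (int t = 0; t < writers; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < perWriter; i++) {
                    counter.increment(Integer.toString(i % 100));
                }
            });
            threads[t].start();
        }
        long sum = 0;
        try {
            for (Thread thread : threads) {
                while (thread.isAlive()) {
                    for (long v : counter.getAndClear().values()) {
                        sum += v;
                    }
                }
                thread.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // Final drain after all writers have finished.
        for (long v : counter.getAndClear().values()) {
            sum += v;
        }
        return sum;
    }
}
```

Calling MergeCounter.demo(4, 100_000) should return the total number of increments performed, which is the same check the test in the question makes with sum and expected.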

Related

Immutable 100%, but still not thread-safe

I've read a lot about thread-safety. In certain part of my multi-threaded program, I preferred to try the immutability. After getting incorrect results, I noticed my immutable object is not thread-safe although it is 100% immutable. Please correct me if I'm wrong.
public final class ImmutableGaugeV4 {
private final long max, current;
public ImmutableGaugeV4(final long max) {
this(max, 0);
}
private ImmutableGaugeV4(final long max, final long current) {
this.max = max;
this.current = current;
}
public final ImmutableGaugeV4 increase(final long increment) {
final long c = current;
return new ImmutableGaugeV4(max, c + increment);
}
public final long getCurrent() {
return current;
}
public final long getPerc() {
return current * 100 / max;
}
@Override
public final String toString() {
return "ImmutableGaugeV4 [max=" + max + ", current=" + current + "](" + getPerc() + "%)";
}
}
public class T4 {
public static void main(String[] args) {
new T4().x();
}
ImmutableGaugeV4 g3 = new ImmutableGaugeV4(10000);
private void x() {
for (int i = 0; i < 10; i++) {
new Thread() {
public void run() {
for (int j = 0; j < 1000; j++) {
g3 = g3.increase(1);
System.out.println(g3);
}
}
}.start();
}
}
}
Sometimes I get correct results, but most of the time I don't:
ImmutableGaugeV4 [max=10000, current=9994](99%)
ImmutableGaugeV4 [max=10000, current=9995](99%)
ImmutableGaugeV4 [max=10000, current=9996](99%)
ImmutableGaugeV4 [max=10000, current=9997](99%)
What is wrong with this immutable object? What is missing to make it thread-safe without using intrinsic locks?
Neither
final long c = current;
return new ImmutableGaugeV4(max, c + increment);
nor
g3 = g3.increase(1);
is thread-safe. These compound actions aren't atomic.
I recommend reading "Java Concurrency in Practice" by Brian Goetz: the chapters devoted to compound actions and to "publication and escape" problems.
Your problem is that you are not using thread-safe operations for your numeric variables max and current. Because of that, many threads can get the same value from them even though it has already been changed.
You could add synchronized blocks to handle reading / writing to them, but the best approach is to use thread-safe classes to handle that for you.
If you need long values, that would be AtomicLong. Take a look at its documentation; it has methods to do the operations you want.
https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/atomic/AtomicLong.html
Whenever you’re multithreading you should go for threadsafe objects, such as the Atomic family, ConcurrentHashMap for maps, and so on.
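As a rough illustration of the AtomicLong suggestion, here is a minimal sketch. Note that it gives up immutability: the gauge becomes a single shared mutable object whose counter is updated atomically. The class name AtomicGauge and the run helper are invented for this example and are not part of the question's code.

```java
import java.util.concurrent.atomic.AtomicLong;

class AtomicGauge {
    private final long max;
    private final AtomicLong current = new AtomicLong();

    AtomicGauge(long max) {
        this.max = max;
    }

    // addAndGet is one atomic read-modify-write, so no update is lost.
    long increase(long increment) {
        return current.addAndGet(increment);
    }

    long getCurrent() {
        return current.get();
    }

    long getPerc() {
        return current.get() * 100 / max;
    }

    // Hypothetical check: several threads bump the same gauge.
    static long run(int threads, int perThread) {
        AtomicGauge gauge = new AtomicGauge(10000);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    gauge.increase(1);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return gauge.getCurrent();
    }
}
```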
Hope it helps!
The only problem here is the following line:
g3 = g3.increase(1);
This is equivalent to the following lines:
var tmp = g3;
tmp = tmp.increase(1);
g3 = tmp;
To fix this, you could use a Compare And Swap:
private static final VarHandle G3;
static {
try {
G3 = MethodHandles.lookup().findVarHandle(T4.class, "g3", ImmutableGaugeV4.class);
} catch (ReflectiveOperationException roe) {
throw new Error(roe);
}
}
And then replace g3 = g3.increase(1); with:
ImmutableGaugeV4 oldVal, newVal;
do {
oldVal = g3;
newVal = oldVal.increase(1);
} while (!G3.compareAndSet(T4.this, oldVal, newVal));
System.out.println(newVal);
In the end, your T4 becomes:
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
public class T4 {
public static void main(String[] args) {
new T4().x();
}
ImmutableGaugeV4 g3 = new ImmutableGaugeV4(10000);
private static final VarHandle G3;
static {
try {
G3 = MethodHandles.lookup().findVarHandle(T4.class, "g3", ImmutableGaugeV4.class);
} catch (ReflectiveOperationException roe) {
throw new Error(roe);
}
}
private void x() {
for (int i = 0; i < 10; i++) {
new Thread() {
public void run() {
for (int j = 0; j < 1000; j++) {
ImmutableGaugeV4 oldVal, newVal;
do {
oldVal = g3;
newVal = oldVal.increase(1);
} while (!G3.compareAndSet(T4.this, oldVal, newVal));
System.out.println(newVal);
}
}
}.start();
}
}
}
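The same compare-and-swap retry loop can also be written without a VarHandle, assuming Java 8 or later: AtomicReference.updateAndGet reapplies the update function until its internal CAS succeeds. In the sketch below, Gauge is a cut-down stand-in for ImmutableGaugeV4, and GaugeDemo and run are names invented for the example.

```java
import java.util.concurrent.atomic.AtomicReference;

class GaugeDemo {
    // Minimal immutable stand-in for ImmutableGaugeV4.
    static final class Gauge {
        final long max;
        final long current;

        Gauge(long max, long current) {
            this.max = max;
            this.current = current;
        }

        Gauge increase(long increment) {
            return new Gauge(max, current + increment);
        }
    }

    static long run(int threads, int perThread) {
        AtomicReference<Gauge> ref = new AtomicReference<>(new Gauge(10000, 0));
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    // Retries internally if another thread swapped the
                    // reference between the read and the CAS.
                    ref.updateAndGet(g -> g.increase(1));
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return ref.get().current;
    }
}
```

The immutable object stays immutable here; only the shared reference to it is updated atomically, which is the part the original g3 = g3.increase(1) was missing.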

Parallely processing an array in java

I am trying to get faster output by using threads. This is just a small proof of concept.
Suppose the problem is to find all the numbers in an array that occur an odd number of times.
Below is my attempt, both sequential and parallel.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
public class Test1 {
final static Map<Integer, Integer> mymap = new HashMap<>();
static Map<Integer, AtomicInteger> mymap1 = new ConcurrentHashMap<>();
public static void generateData(final int[] arr) {
final Random aRandom = new Random();
for (int i = 0; i < arr.length; i++) {
arr[i] = aRandom.nextInt(10);
}
}
public static void calculateAllOddOccurrence(final int[] arr) {
for (int i = 0; i < arr.length; i++) {
if (mymap.containsKey(arr[i])) {
mymap.put(arr[i], mymap.get(arr[i]) + 1);
} else {
mymap.put(arr[i], 1);
}
}
for (final Map.Entry<Integer, Integer> entry : mymap.entrySet()) {
if (entry.getValue() % 2 != 0) {
System.out.println(entry.getKey() + "=" + entry.getValue());
}
}
}
public static void calculateAllOddOccurrenceThread(final int[] arr) {
final ExecutorService executor = Executors.newFixedThreadPool(10);
final List<Future<?>> results = new ArrayList<>();
final int range = arr.length / 10;
for (int count = 0; count < 10; ++count) {
final int startAt = count * range;
final int endAt = startAt + range;
executor.submit(() -> {
for (int i = startAt; i < endAt; i++) {
if (mymap1.containsKey(arr[i])) {
final AtomicInteger accumulator = mymap1.get(arr[i]);
accumulator.incrementAndGet();
mymap1.put(arr[i], accumulator);
} else {
mymap1.put(arr[i], new AtomicInteger(1));
}
}
});
}
awaitTerminationAfterShutdown(executor);
for (final Entry<Integer, AtomicInteger> entry : mymap1.entrySet()) {
if (entry.getValue().get() % 2 != 0) {
System.out.println(entry.getKey() + "=" + entry.getValue());
}
}
}
public static void calculateAllOddOccurrenceStream(final int[] arr) {
final ConcurrentMap<Integer, List<Integer>> map2 = Arrays.stream(arr).parallel().boxed().collect(Collectors.groupingByConcurrent(i -> i));
map2.entrySet().stream().parallel().filter(e -> e.getValue().size() % 2 != 0).forEach(entry -> System.out.println(entry.getKey() + "=" + entry.getValue().size()));
}
public static void awaitTerminationAfterShutdown(final ExecutorService threadPool) {
threadPool.shutdown();
try {
if (!threadPool.awaitTermination(60, TimeUnit.SECONDS)) {
threadPool.shutdownNow();
}
} catch (final InterruptedException ex) {
threadPool.shutdownNow();
Thread.currentThread().interrupt();
}
}
public static void main(final String... doYourBest) {
final int[] arr = new int[200000000];
generateData(arr);
long starttime = System.currentTimeMillis();
calculateAllOddOccurrence(arr);
System.out.println("Total time=" + (System.currentTimeMillis() - starttime));
starttime = System.currentTimeMillis();
calculateAllOddOccurrenceStream(arr);
System.out.println("Total time Thread=" + (System.currentTimeMillis() - starttime));
}
}
Output:
1=20003685
2=20000961
3=19991311
5=20006433
7=19995737
8=19999463
Total time=3418
5=20006433
7=19995737
1=20003685
8=19999463
2=20000961
3=19991311
Total time Thread=19640
The parallel version (calculateAllOddOccurrenceStream) takes more time. What is the best way to process an array in parallel and then merge the results?
My goal is not to find the fastest algorithm, but to take any algorithm and run it on different threads so that they process different parts of the array simultaneously.
It seems that those threads are working on the same parts of the array simultaneously, hence the answer does not come out correctly.
Instead, divide the array into parts with proper start and end indexes. Allocate separate threads to process these parts and count the occurrences of each number in each part.
At the end, you would have multiple maps holding the counts calculated from those separate parts. Merge those maps to get the final answer.
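The divide-and-merge idea could be sketched roughly as follows. Each task counts into its own private HashMap, so no synchronization is needed while counting; the partial maps are merged on the calling thread afterwards. The class name PartitionedCount, the method count and the parts parameter are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PartitionedCount {
    static Map<Integer, Integer> count(int[] arr, int parts) {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<Future<Map<Integer, Integer>>> futures = new ArrayList<>();
            // Ceiling division so the tail of the array is not dropped.
            int chunk = (arr.length + parts - 1) / parts;
            for (int p = 0; p < parts; p++) {
                final int from = Math.min(arr.length, p * chunk);
                final int to = Math.min(arr.length, from + chunk);
                Callable<Map<Integer, Integer>> task = () -> {
                    // Thread-private map: no sharing while counting.
                    Map<Integer, Integer> local = new HashMap<>();
                    for (int i = from; i < to; i++) {
                        local.merge(arr[i], 1, Integer::sum);
                    }
                    return local;
                };
                futures.add(pool.submit(task));
            }
            // Merge the partial results on the calling thread.
            Map<Integer, Integer> total = new HashMap<>();
            for (Future<Map<Integer, Integer>> future : futures) {
                future.get().forEach((k, v) -> total.merge(k, v, Integer::sum));
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The odd occurrences are then the entries of the merged map whose value is odd. Whether this beats the sequential loop depends on the data size and on boxing overhead, so it is worth measuring.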
Or you could have a single ConcurrentHashMap storing the counts from all those threads, but a bug could creep in there, since concurrent writes can still conflict. Individual operations on a ConcurrentHashMap are thread-safe, but a get followed by a put is not atomic as a pair. For guaranteed write behaviour, use the atomicity of the ConcurrentHashMap.putIfAbsent(K key, V value) method and pay attention to the return value, which tells you whether the put succeeded. A simple put might not be correct. See https://stackoverflow.com/a/14947844/945214
You could use java 8 streams API (https://www.journaldev.com/2774/java-8-stream) to write the code OR simple threading code using Java 5 constructs would also do.
Added Java 8 stream code. Notice the timing differences: using an ArrayList instead of an array makes a difference:
package com.test;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.Function;
import java.util.stream.Collectors;
public class Test {
public static void generateData(final int[] arr) {
final Random aRandom = new Random();
for (int i = 0; i < arr.length; i++) {
arr[i] = aRandom.nextInt(10);
}
}
public static void calculateAllOddOccurrence(final int[] arr) {
final Map<Integer, Integer> mymap = new HashMap<>();
for (int i = 0; i < arr.length; i++) {
if (mymap.containsKey(arr[i])) {
mymap.put(arr[i], mymap.get(arr[i]) + 1);
} else {
mymap.put(arr[i], 1);
}
}
for (final Map.Entry<Integer, Integer> entry : mymap.entrySet()) {
if (entry.getValue() % 2 != 0) {
System.out.println(entry.getKey() + "=" + entry.getValue());
}
}
}
public static void calculateAllOddOccurrenceStream( int[] arr) {
Arrays.stream(arr).boxed().collect(Collectors.groupingBy(Function.identity(), Collectors.counting())).entrySet().parallelStream().filter(e -> e.getValue() % 2 != 0).forEach(entry -> System.out.println(entry.getKey()+"="+ entry.getValue()));
}
public static void calculateAllOddOccurrenceStream(List<Integer> list) {
list.parallelStream().collect(Collectors.groupingBy(Function.identity(), Collectors.counting())).entrySet().parallelStream().filter(e -> e.getValue() % 2 != 0).forEach(entry -> System.out.println(entry.getKey()+"="+ entry.getValue()));
}
public static void main(final String... doYourBest) {
final int[] arr = new int[200000000];
generateData(arr);
long starttime = System.currentTimeMillis();
calculateAllOddOccurrence(arr);
System.out.println("Total time with simple map=" + (System.currentTimeMillis() - starttime));
List<Integer> list = Arrays.stream(arr).boxed().collect(Collectors.toList());
starttime = System.currentTimeMillis();
calculateAllOddOccurrenceStream(list);
System.out.println("Total time stream - with a readymade list, which might be the case for most apps as arraylist is more easier to work with =" + (System.currentTimeMillis() - starttime));
starttime = System.currentTimeMillis();
calculateAllOddOccurrenceStream(arr);
System.out.println("Total time Stream with array=" + (System.currentTimeMillis() - starttime));
}}
OUTPUT
0=19999427
2=20001707
4=20002331
5=20001585
7=20001859
8=19993989
Total time with simple map=2813
4=20002331
0=19999427
2=20001707
7=20001859
8=19993989
5=20001585
Total time stream - with a readymade list, which might be the case for most apps as arraylist is more easier to work with = 3328
8=19993989
7=20001859
0=19999427
4=20002331
2=20001707
5=20001585
Total time Stream with array=6115
You are looking at the STREAMS API introduced in Java 8:
http://www.baeldung.com/java-8-streams
Example:
// sequential processes
myArray.stream().filter( ... ).map( ... ).collect(Collectors.toList());
// parallel processes
myArray.parallelStream().filter( ... ).map( ... ).collect(Collectors.toList());
Looking at your code, you're going wrong with this line:
mymap1.put(arr[i], mymap1.get(arr[i]) + 1);
You are overwriting the values in parallel, for example:
Thread 1 'get' = 0
Thread 2 'get' = 0
Thread 1 'put 1'
Thread 2 'put 1'
Change your map to:
static Map<Integer, AtomicInteger> mymap1 = new ConcurrentHashMap<>();
static {
//initialize to avoid null values and non-synchronized puts from different Threads
for(int i=0;i<10;i++) {
mymap1.put(i, new AtomicInteger());
}
}
....
//in your loop
for (int i = 0; i < arr.length; i++) {
AtomicInteger accumulator = mymap1.get(arr[i]);
accumulator.incrementAndGet();
}
Edit: The problem with the above approach is of course the initialization of mymap1. To avoid falling into the same trap (creating AtomicInteger within the loop and overwriting each other yet again), it needs to be prefilled with values.
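If the set of keys is not known up front, prefilling is not possible. One alternative, assuming Java 8 or later, is computeIfAbsent, which ConcurrentHashMap performs atomically: only one counter object is ever installed per key, and every thread increments that same object. The sketch uses LongAdder instead of AtomicInteger, which is a substitution made here and not part of the answer; AdderCount and count are made-up names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.IntStream;

class AdderCount {
    static ConcurrentMap<Integer, LongAdder> count(int[] arr) {
        ConcurrentMap<Integer, LongAdder> counts = new ConcurrentHashMap<>();
        IntStream.range(0, arr.length).parallel().forEach(i ->
            // Atomically creates the adder on first use, then increments
            // the single shared instance for that key.
            counts.computeIfAbsent(arr[i], k -> new LongAdder()).increment());
        return counts;
    }
}
```

Reading a count is then counts.get(key).sum().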
Since I'm feeling generous, here's what might work with the Streams API:
long totalEvenCount = Arrays.stream(arr).parallel().filter(i->i%2==0).count();
long totalOddCount = Arrays.stream(arr).parallel().filter(i->i%2!=0).count();
//or this to count by individual numbers:
ConcurrentMap<Integer,List<Integer>> map1 = Arrays.stream(arr).parallel().boxed().collect(Collectors.groupingByConcurrent(i->i));
map1.entrySet().stream().filter(e -> e.getKey()%2!=0).forEach(entry -> System.out.println(entry.getKey() + "=" + entry.getValue().size()));
As an exercise to the reader, perhaps you can look into how the various Collectors work, in order to write your own countingBy(i->i%2!=0) to output a map only containing the counts instead of a list of values.
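One possible take on that exercise, as a sketch: groupingByConcurrent accepts a downstream collector, and Collectors.counting() yields a count per key instead of a list of values. The filter below keeps keys whose number of occurrences is odd, which is how the question states the problem. OddOccurrences and find are names invented for the example.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class OddOccurrences {
    static Map<Integer, Long> find(int[] arr) {
        // Count occurrences per value; the downstream collector produces
        // a Long per key rather than a List of the grouped elements.
        Map<Integer, Long> counts = Arrays.stream(arr).parallel().boxed()
            .collect(Collectors.groupingByConcurrent(Function.identity(), Collectors.counting()));
        // Keep only values that occur an odd number of times, sorted by key.
        Map<Integer, Long> odd = new TreeMap<>();
        counts.forEach((value, n) -> {
            if (n % 2 != 0) {
                odd.put(value, n);
            }
        });
        return odd;
    }
}
```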

java map concurrent update

I'm trying to create a Map with int values that are incremented by multiple threads. Two or more threads might increment the same key.
The ConcurrentHashMap documentation was unclear to me, since it says that:
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove)
I wonder if the following code using ConcurrentHashMap will work correctly:
myMap.put(X, myMap.get(X) + 1);
If not, how can I manage this?
Concurrent map will not help thread safety of your code. You still can get race condition:
Thread-1: x = 1, get(x)
Thread-2: x = 1, get(x)
Thread-1: put(x + 1) => 2
Thread-2: put(x + 1) => 2
Two increments happened, but you still get only +1. You need a concurrent map only if you aim for modifying the map itself, not its content. Even the simplest HashMap is threadsafe for concurrent reads, given the map is not mutated anymore.
So instead of a threadsafe map for primitive type, you need a threadsafe wrapper for the type. Either something from java.util.concurrent.atomic or roll your own locked container if needing an arbitrary type.
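For completeness, on Java 8 or later a ConcurrentHashMap can also do the whole read-modify-write itself: merge() is documented to be performed atomically, so the lost update shown above cannot occur. A minimal sketch follows; the names MergeIncrement and run are invented for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class MergeIncrement {
    static int run(int threads, int perThread) {
        ConcurrentMap<String, Integer> map = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    // Inserts 1 if the key is absent, otherwise stores
                    // old + 1, as one atomic step.
                    map.merge("X", 1, Integer::sum);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map.get("X");
    }
}
```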
One idea would be combining ConcurrentMap with AtomicInteger, which has an increment method.
AtomicInteger current = map.putIfAbsent(key, new AtomicInteger(1));
int newValue = current == null ? 1 :current.incrementAndGet();
or (more efficiently, thanks @Keppil) with an extra code guard to avoid unnecessary object creation:
AtomicInteger current = map.get(key);
if (current == null){
current = map.putIfAbsent(key, new AtomicInteger(1));
}
int newValue = current == null ? 1 : current.incrementAndGet();
Best practice: you can use a HashMap with AtomicInteger values, as long as the map itself is not modified once the threads are running.
Test code:
public class HashMapAtomicIntegerTest {
public static final int KEY = 10;
public static void main(String[] args) {
HashMap<Integer, AtomicInteger> concurrentHashMap = new HashMap<Integer, AtomicInteger>();
concurrentHashMap.put(HashMapAtomicIntegerTest.KEY, new AtomicInteger());
List<HashMapAtomicCountThread> threadList = new ArrayList<HashMapAtomicCountThread>();
for (int i = 0; i < 500; i++) {
HashMapAtomicCountThread testThread = new HashMapAtomicCountThread(
concurrentHashMap);
testThread.start();
threadList.add(testThread);
}
int index = 0;
while (true) {
for (int i = index; i < 500; i++) {
HashMapAtomicCountThread testThread = threadList.get(i);
if (testThread.isAlive()) {
break;
} else {
index++;
}
}
if (index == 500) {
break;
}
}
System.out.println("The result value should be " + 5000000
+ ",actually is"
+ concurrentHashMap.get(HashMapAtomicIntegerTest.KEY));
}
}
class HashMapAtomicCountThread extends Thread {
HashMap<Integer, AtomicInteger> concurrentHashMap = null;
public HashMapAtomicCountThread(
HashMap<Integer, AtomicInteger> concurrentHashMap) {
this.concurrentHashMap = concurrentHashMap;
}
@Override
public void run() {
for (int i = 0; i < 10000; i++) {
concurrentHashMap.get(HashMapAtomicIntegerTest.KEY)
.getAndIncrement();
}
}
}
Results:
The result value should be 5000000,actually is5000000
Or use HashMap and synchronized, which is much slower than the former:
public class HashMapSynchronizeTest {
public static final int KEY = 10;
public static void main(String[] args) {
HashMap<Integer, Integer> hashMap = new HashMap<Integer, Integer>();
hashMap.put(KEY, 0);
List<HashMapSynchronizeThread> threadList = new ArrayList<HashMapSynchronizeThread>();
for (int i = 0; i < 500; i++) {
HashMapSynchronizeThread testThread = new HashMapSynchronizeThread(
hashMap);
testThread.start();
threadList.add(testThread);
}
int index = 0;
while (true) {
for (int i = index; i < 500; i++) {
HashMapSynchronizeThread testThread = threadList.get(i);
if (testThread.isAlive()) {
break;
} else {
index++;
}
}
if (index == 500) {
break;
}
}
System.out.println("The result value should be " + 5000000
+ ",actually is" + hashMap.get(KEY));
}
}
class HashMapSynchronizeThread extends Thread {
HashMap<Integer, Integer> hashMap = null;
public HashMapSynchronizeThread(
HashMap<Integer, Integer> hashMap) {
this.hashMap = hashMap;
}
@Override
public void run() {
for (int i = 0; i < 10000; i++) {
synchronized (hashMap) {
hashMap.put(HashMapSynchronizeTest.KEY,
hashMap
.get(HashMapSynchronizeTest.KEY) + 1);
}
}
}
}
Results:
The result value should be 5000000,actually is5000000
Using ConcurrentHashMap with a plain get followed by put gives wrong results:
public class ConcurrentHashMapTest {
public static final int KEY = 10;
public static void main(String[] args) {
ConcurrentHashMap<Integer, Integer> concurrentHashMap = new ConcurrentHashMap<Integer, Integer>();
concurrentHashMap.put(KEY, 0);
List<CountThread> threadList = new ArrayList<CountThread>();
for (int i = 0; i < 500; i++) {
CountThread testThread = new CountThread(concurrentHashMap);
testThread.start();
threadList.add(testThread);
}
int index = 0;
while (true) {
for (int i = index; i < 500; i++) {
CountThread testThread = threadList.get(i);
if (testThread.isAlive()) {
break;
} else {
index++;
}
}
if (index == 500) {
break;
}
}
System.out.println("The result value should be " + 5000000
+ ",actually is" + concurrentHashMap.get(KEY));
}
}
class CountThread extends Thread {
ConcurrentHashMap<Integer, Integer> concurrentHashMap = null;
public CountThread(ConcurrentHashMap<Integer, Integer> concurrentHashMap) {
this.concurrentHashMap = concurrentHashMap;
}
@Override
public void run() {
for (int i = 0; i < 10000; i++) {
concurrentHashMap.put(ConcurrentHashMapTest.KEY,
concurrentHashMap.get(ConcurrentHashMapTest.KEY) + 1);
}
}
}
Results:
The result value should be 5000000,actually is11759
You could just put the operation in a synchronized (myMap) {...} block.
Your current code changes the values of your map concurrently so this will not work.
If multiple threads can put values into your map, you have to use a concurrent map like ConcurrentHashMap with non thread safe values like Integer. ConcurrentMap.replace will then do what you want (or use AtomicInteger to ease your code).
If your threads will only change the values (and not add/change the keys) of your map, then you can use a standard map storing thread-safe values like AtomicInteger. Then your thread will call map.get(key).incrementAndGet(), for instance.
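The ConcurrentMap.replace variant mentioned above could look roughly like this. It is an optimistic retry loop: replace(key, old, new) only succeeds if the map still holds the value that was read, so a thread that loses a race re-reads and tries again. ReplaceLoop, increment and run are names made up for the sketch.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ReplaceLoop {
    static void increment(ConcurrentMap<String, Integer> map, String key) {
        while (true) {
            Integer old = map.get(key);
            if (old == null) {
                // No entry yet: try to create it with the first count.
                if (map.putIfAbsent(key, 1) == null) {
                    return;
                }
            } else if (map.replace(key, old, old + 1)) {
                // The value was unchanged since we read it.
                return;
            }
            // Another thread got in between: read again and retry.
        }
    }

    static int run(int threads, int perThread) {
        ConcurrentMap<String, Integer> map = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    increment(map, "X");
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map.get("X");
    }
}
```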

Is it a thread-safe mechanism?

Is this class thread-safe?
class Counter {
private ConcurrentMap<String, AtomicLong> map =
new ConcurrentHashMap<String, AtomicLong>();
public long add(String name) {
if (this.map.get(name) == null) {
this.map.putIfAbsent(name, new AtomicLong());
}
return this.map.get(name).incrementAndGet();
}
}
What do you think?
Yes, provided you make the map final. The if is not necessary but you can keep it for performance reasons if you want, although it will most likely not make a noticeable difference:
public long add(String name) {
this.map.putIfAbsent(name, new AtomicLong());
return this.map.get(name).incrementAndGet();
}
EDIT
For the sake of it, I have quickly tested both implementations (with and without the check). 10 million calls on the same string take:
250 ms with the check
480 ms without the check
Which confirms what I said: unless you call this method millions of times or it is in a performance-critical part of your code, it does not make a difference.
EDIT 2
Full test result - see the BetterCounter which yields even better results. Now the test is very specific (no contention + the get always works) and does not necessarily correspond to your usage.
Counter: 482 ms
LazyCounter: 207 ms
MPCounter: 303 ms
BetterCounter: 135 ms
public class Test {
public static void main(String args[]) throws IOException {
Counter count = new Counter();
LazyCounter lazyCount = new LazyCounter();
MPCounter mpCount = new MPCounter();
BetterCounter betterCount = new BetterCounter();
//WARM UP
for (int i = 0; i < 10_000_000; i++) {
count.add("abc");
lazyCount.add("abc");
mpCount.add("abc");
betterCount.add("abc");
}
//TEST
long start = System.nanoTime();
for (int i = 0; i < 10_000_000; i++) {
count.add("abc");
}
long end = System.nanoTime();
System.out.println((end - start) / 1000000);
start = System.nanoTime();
for (int i = 0; i < 10_000_000; i++) {
lazyCount.add("abc");
}
end = System.nanoTime();
System.out.println((end - start) / 1000000);
start = System.nanoTime();
for (int i = 0; i < 10_000_000; i++) {
mpCount.add("abc");
}
end = System.nanoTime();
System.out.println((end - start) / 1000000);
start = System.nanoTime();
for (int i = 0; i < 10_000_000; i++) {
betterCount.add("abc");
}
end = System.nanoTime();
System.out.println((end - start) / 1000000);
}
static class Counter {
private final ConcurrentMap<String, AtomicLong> map =
new ConcurrentHashMap<String, AtomicLong>();
public long add(String name) {
this.map.putIfAbsent(name, new AtomicLong());
return this.map.get(name).incrementAndGet();
}
}
static class LazyCounter {
private final ConcurrentMap<String, AtomicLong> map =
new ConcurrentHashMap<String, AtomicLong>();
public long add(String name) {
if (this.map.get(name) == null) {
this.map.putIfAbsent(name, new AtomicLong());
}
return this.map.get(name).incrementAndGet();
}
}
static class BetterCounter {
private final ConcurrentMap<String, AtomicLong> map =
new ConcurrentHashMap<String, AtomicLong>();
public long add(String name) {
AtomicLong counter = this.map.get(name);
if (counter != null)
return counter.incrementAndGet();
AtomicLong newCounter = new AtomicLong();
counter = this.map.putIfAbsent(name, newCounter);
return (counter == null ? newCounter.incrementAndGet() : counter.incrementAndGet());
}
}
static class MPCounter {
private final ConcurrentMap<String, AtomicLong> map =
new ConcurrentHashMap<String, AtomicLong>();
public long add(String name) {
final AtomicLong newVal = new AtomicLong(),
prevVal = map.putIfAbsent(name, newVal);
return (prevVal != null ? prevVal : newVal).incrementAndGet();
}
}
}
EDIT
Yes if you make the map final. Otherwise, it's not guaranteed that all threads see the most recent version of the map data structure when they call add() for the first time.
Several threads can reach the body of the if(). The putIfAbsent() will make sure that only a single AtomicLong is put into the map.
There should be no way that putIfAbsent() can return without the new value being in the map.
So when the second get() is executed, it will never get a null value and since only a single AtomicLong can have been added to the map, all threads will get the same instance.
[EDIT2] The next question: How efficient is this?
This code is faster since it avoids unnecessary searches:
public long add(String name) {
AtomicLong counter = map.get( name );
if( null == counter ) {
map.putIfAbsent( name, new AtomicLong() );
counter = map.get( name ); // Have to get again!!!
}
return counter.incrementAndGet();
}
This is why I prefer Google's CacheBuilder which has a method that is called when a key can't be found. That way, the map is searched only once and I don't have to create extra instances.
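On Java 8 or later, ConcurrentHashMap offers something similar out of the box: computeIfAbsent takes a function that is only called when the key is missing, and the call is atomic, so at most one AtomicLong is installed per name. The sketch below is an illustration under that assumption, not a measured comparison with the versions above; ComputeCounter and run are invented names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

class ComputeCounter {
    private final ConcurrentMap<String, AtomicLong> map = new ConcurrentHashMap<>();

    public long add(String name) {
        // The mapping function runs only if the key is absent.
        return map.computeIfAbsent(name, k -> new AtomicLong()).incrementAndGet();
    }

    // Hypothetical check: all threads count on the same name.
    static long run(int threads, int perThread) {
        ComputeCounter counter = new ComputeCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.add("abc");
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // One more add, so the result is the previous total plus one.
        return counter.add("abc");
    }
}
```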
No one seems to have the complete solution, which is:
public long add(String name) {
AtomicLong counter = this.map.get(name);
if (counter == null) {
AtomicLong newCounter = new AtomicLong();
counter = this.map.putIfAbsent(name, newCounter);
if(counter == null) {
counter = newCounter;
}
}
return counter.incrementAndGet();
}
What about this:
class Counter {
private final ConcurrentMap<String, AtomicLong> map =
new ConcurrentHashMap<String, AtomicLong>();
public long add(String name) {
this.map.putIfAbsent(name, new AtomicLong());
return this.map.get(name).incrementAndGet();
}
}
The map should be final to guarantee it is fully visible to all threads before the first method is invoked. (see 17.5 final Field Semantics (Java Language Specification) for details)
I think the if is redundant; I hope I'm not overlooking anything.
Edit: Added a quote from the Java Language Specification:
This solution (note that I am showing only the body of the add method -- the rest stays the same!) spares you any calls to get:
final AtomicLong newVal = new AtomicLong(),
prevVal = map.putIfAbsent(name, newVal);
return (prevVal != null? prevVal : newVal).incrementAndGet();
In all probability an extra get is much costlier than an extra new AtomicLong().
I think you would be better off with something like this:
class Counter {
private ConcurrentMap<String, AtomicLong> map = new ConcurrentHashMap<String, AtomicLong>();
public long add(String name) {
AtomicLong counter = this.map.get(name);
if (counter == null) {
AtomicLong newCounter = new AtomicLong();
counter = this.map.putIfAbsent(name, newCounter);
if (counter == null) {
// The new counter was added - use it
counter = newCounter;
}
}
return counter.incrementAndGet();
}
}
Otherwise multiple threads may add simultaneously and you wouldn't notice (since you ignore the value returned by putIfAbsent).
I assume that you never recreate the map.

Multithreading - Counting total amount of words from several files

I made a program to count words from individual files,
but how can I modify my program so that it gives the total number of words across all files (as ONE value)?
My code looks like this:
public class WordCount implements Runnable
{
public WordCount(String filename)
{
this.filename = filename;
}
public void run()
{
int count = 0;
try
{
Scanner in = new Scanner(new File(filename));
while (in.hasNext())
{
in.next();
count++;
}
System.out.println(filename + ": " + count);
}
catch (FileNotFoundException e)
{
System.out.println(filename + " was not found.");
}
}
private String filename;
}
With a Main-Class:
public class Main
{
public static void main(String args[])
{
for (String filename : args)
{
Runnable tester = new WordCount(filename);
Thread t = new Thread(tester);
t.start();
}
}
}
And how to avoid race conditions?
Thank you for your help.
A worker thread:
class WordCount extends Thread
{
int count;
@Override
public void run()
{
count = 0;
/* Count the words... */
...
++count;
...
}
}
And a class to use them:
class Main
{
public static void main(String args[]) throws InterruptedException
{
WordCount[] counters = new WordCount[args.length];
for (int idx = 0; idx < args.length; ++idx) {
counters[idx] = new WordCount(args[idx]);
counters[idx].start();
}
int total = 0;
for (WordCount counter : counters) {
counter.join();
total += counter.count;
}
System.out.println("Total: " + total);
}
}
Many hard drives don't do a great job of reading multiple files concurrently. Locality of reference has a big impact on performance.
You can either use a Future to get each file's count and add up all the counts at the end, or use a static variable and increment it in a synchronized manner, i.e. with an explicit synchronized block or an atomic increment.
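The Future approach can be sketched roughly as follows. To keep the example self-contained it counts words in in-memory strings rather than files, which is an assumption of this sketch; the names FutureWordCount, countWords and totalWords are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical name, used only for this sketch.
class FutureWordCount {
    // Counts whitespace-separated words in one text (stands in for one file's contents).
    static int countWords(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Submits one task per text and adds up the results in the calling thread.
    static int totalWords(List<String> texts) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String text : texts) {
                Callable<Integer> task = () -> countWords(text);
                futures.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> future : futures) {
                total += future.get(); // blocks until that task has finished
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("Total: " + totalWords(Arrays.asList("one two three", "four five")));
    }
}
```

Because only the calling thread adds to total, no shared mutable state is involved and no extra synchronization is needed.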
What if your Runnable took two arguments:
a BlockingQueue<String> or BlockingQueue<File> of input files
an AtomicLong
In a loop, you would get the next String/File from the queue, count its words, and increment the AtomicLong by that amount. Whether the loop is while(!queue.isEmpty()) or while(!done) depends on how you feed files into the queue: if you know all the files from the start, you can use the isEmpty version, but if you're streaming them in from somewhere, you want to use the !done version (and have done be a volatile boolean or AtomicBoolean for memory visibility).
Then you feed these Runnables to an executor, and you should be good to go.
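A minimal sketch of the queue-plus-AtomicLong idea, for the case where all inputs are known up front. It uses in-memory strings instead of files and poll() instead of an isEmpty() check (so a worker cannot block on a queue another worker just drained); both choices, and the names QueueWordCount, countWords and totalWords, are assumptions of this sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical name, used only for this sketch.
class QueueWordCount {
    // Counts whitespace-separated words in one text (stands in for one file's contents).
    static long countWords(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    static long totalWords(List<String> texts, int workers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(texts);
        AtomicLong total = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                String text;
                // poll() returns null once the queue has been drained.
                while ((text = queue.poll()) != null) {
                    total.addAndGet(countWords(text));
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return total.get();
    }

    public static void main(String[] args) {
        System.out.println("Total: " + totalWords(Arrays.asList("one two three", "four five"), 2));
    }
}
```

Each worker adds a whole file's count in one addAndGet call, so contention on the AtomicLong stays low.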
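You can create a listener to get feedback from the thread.
public interface ResultListener {
// interface methods cannot be declared synchronized; the implementation synchronizes instead
void result(int words);
}
public class WordCount implements Runnable
{
private String filename;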
private ResultListener listener;
public void run()
{
int count = 0;
try
{
Scanner in = new Scanner(new File(filename));
while (in.hasNext())
{
in.next();
count++;
}
listener.result(count);
}
catch (FileNotFoundException e)
{
System.out.println(filename + " was not found.");
}
}
}
You can add a constructor parameter for the listener just like for your filename.
public class Main
{
private static int totalCount = 0;
private static ResultListener listener = new ResultListener(){
public synchronized void result(int words){
totalCount += words;
}
};
public static void main(String args[])
{
for (String filename : args)
{
Runnable tester = new WordCount(filename, listener);
Thread t = new Thread(tester);
t.start();
}
}
}
You can make the count static so that all the threads increment the same value. volatile alone is not enough, because count++ is not atomic, so use an AtomicInteger:
public class WordCount implements Runnable
{
private static AtomicInteger count = new AtomicInteger(0); // <-- now all threads increment the same count
private String filename;
public WordCount(String filename)
{
this.filename = filename;
}
public static int getCount()
{
return count.get();
}
public void run()
{
try
{
Scanner in = new Scanner(new File(filename));
while (in.hasNext())
{
in.next();
count.incrementAndGet();
}
System.out.println(filename + ": " + count);
}
catch (FileNotFoundException e)
{
System.out.println(filename + " was not found.");
}
}
}
Update: haven't done java in a while, but the point about making it a private static field still stands... just make it an AtomicInteger.
You could create a Thread pool with a synchronized task queue that would hold all of the files you wish to count the words for.
When your thread pool workers come online they could ask the task queue for a file to count.
After the worker completes their job then they could notify the main thread of their final number.
The main thread would have a synchronized notify method that would add up all of the worker threads' results.
Hope this helps.
Or you can have all the threads update a single word count variable. count++ is atomic if count is word-sized (an int should suffice).
EDIT: Turns out the Java specs are just silly enough that count++ is not atomic. I have no idea why. Anyway, look at AtomicInteger and its incrementAndGet method. Hopefully this is atomic (I don't know what to expect now...), and you don't need any other synchronization mechanisms - just store your count in an AtomicInteger.
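The reason count++ is not atomic is that it is a read, an add and a write, and another thread can interleave between those steps. AtomicInteger.incrementAndGet performs the whole update atomically. A small sketch that checks this with several threads; the class and method names are hypothetical:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical name, used only for this sketch.
class AtomicIncrementDemo {
    // Starts nThreads threads that each increment a shared counter perThread times.
    static int incrementFromThreads(int nThreads, int perThread) {
        AtomicInteger count = new AtomicInteger();
        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    count.incrementAndGet(); // atomic read-modify-write
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // join also makes the final value visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return count.get();
    }

    public static void main(String[] args) {
        System.out.println(incrementFromThreads(4, 10_000));
    }
}
```

With a plain int and count++ in place of the AtomicInteger, the same test can lose updates and return less than nThreads * perThread.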
This solution uses the Java 8 concurrency utilities (Executors and Future) for multithreading.
First, create a Callable class that processes an individual file:
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.stream.Collectors;
import java.util.stream.Stream;
public class WordCounter implements Callable<Map<String, Long>> {
Path bookPath;
public WordCounter(Path bookPath) {
this.bookPath = bookPath;
}
@Override
public Map<String, Long> call() throws Exception {
// try-with-resources closes the file once the stream has been consumed
try (Stream<String> lines = Files.lines(bookPath)) {
return lines.flatMap(line -> Arrays.stream(line.trim().split(" ")))
// strip non-letters and normalize case before counting
.map(word -> word.replaceAll("[^a-zA-Z]", "").toLowerCase().trim())
.filter(word -> word.length() > 0)
.collect(Collectors.groupingBy(word -> word, Collectors.counting()));
}
}
}
Next, create one future task per file and submit each to the executor:
ExecutorService exes = Executors.newCachedThreadPool();
Path[] books = new Path[2];
books[0] = Paths.get("C:\\Users\\Documents\\book1.txt");
books[1] = Paths.get("C:\\Users\\Documents\\book2.txt");
// one task per file
FutureTask[] tasks = new FutureTask[books.length];
Map<String, Long> result = new HashMap<>();
for(int i=0; i<books.length; i++) {
tasks[i] = new FutureTask(new WordCounter(books[i]));
exes.submit(tasks[i]);
}
for(int i=0; i<books.length; i++) {
try {
Map<String, Long> wordCount = (Map<String, Long>) tasks[i].get();
// add this file's count for each word to the running total for that word
wordCount.forEach((k,v) -> result.merge(k, v, Long::sum));
} catch (InterruptedException e) {
e.printStackTrace();
} catch (ExecutionException e) {
e.printStackTrace();
}
}
exes.shutdown();
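Alternatively, the result map could be a ConcurrentHashMap shared among the WordCounter tasks so that they update the word counts concurrently; marking a plain HashMap volatile would not make it thread-safe.
End result: result.size() gives the number of distinct words found across all files.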
