I've a program where I am trying to understand thread parallelism. This program deals with coin-flips and counts the number of heads and tails (and the total number of coin flips).
Please see the following code:
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
public class CoinFlip{
// main
public static void main (String[] args) {
if (args.length != 2){
System.out.println("CoinFlip #threads #iterations");
return;
}
// check if arguments are integers
int numberOfThreads = 0;
long iterations = 0;
try{
numberOfThreads = Integer.parseInt(args[0]);
iterations = Long.parseLong(args[1]);
}catch(NumberFormatException e){
System.out.println("error: I asked for numbers mate.");
System.out.println("error: " + e);
System.exit(1);
}
// ------------------------------
// set time field
// ------------------------------
// create a hashmap
ConcurrentHashMap <String, Long> universalMap = new ConcurrentHashMap <String, Long> ();
// store count for heads, tails and iterations
universalMap.put("HEADS", new Long(0));
universalMap.put("TAILS", new Long(0));
universalMap.put("ITERATIONS", new Long(0));
long startTime = System.currentTimeMillis();
Thread[] doFlip = new Thread[numberOfThreads];
for (int i = 0; i < numberOfThreads; i ++){
doFlip[i] = new Thread( new DoFlip(iterations/numberOfThreads, universalMap));
doFlip[i].start();
}
for (int i = 0; i < numberOfThreads; i++){
try{
doFlip[i].join();
}catch(InterruptedException e){
System.out.println(e);
}
}
// log time taken to accomplish task
long elapsedTime = System.currentTimeMillis() - startTime;
System.out.println("Runtime:" + elapsedTime);
// print the output to check if the values are legal
// iterations = heads + tails = args[1]
System.out.println(
universalMap.get("HEADS") + " " +
universalMap.get("TAILS") + " " +
universalMap.get("ITERATIONS") + "."
);
return;
}
private static class DoFlip implements Runnable{
// local counters for heads/tails/count
long heads = 0, tails = 0, iterations = 0;
Random randomHT = new Random();
// constructor values -----------------------
long times = 0; // number of iterations
ConcurrentHashMap <String, Long> map; // pointer to hash map
DoFlip(long times, ConcurrentHashMap <String, Long> map){
this.times = times;
this.map = map;
}
public void run(){
while(this.times > 0){
int r = randomHT.nextInt(2); // 0 and 1
if (r == 1){
this.heads ++;
}else{
this.tails ++;
}
// System.out.println("Happening...");
this.iterations ++;
this.times --;
}
updateStats();
}
public void updateStats(){
// read from hashmap and get the existing values
Long nHeads = (Long)this.map.get("HEADS");
Long nTails = (Long)this.map.get("TAILS");
Long nIterations = (Long)this.map.get("ITERATIONS");
// update values
nHeads = nHeads + this.heads;
nTails = nTails + this.tails;
nIterations = nIterations + this.iterations;
// push updated values to hashmap
this.map.put("HEADS", nHeads);
this.map.put("TAILS", nTails);
this.map.put("ITERATIONS", nIterations);
}
}
}
I am using a ConcurrentHashMap to store the different counts. Apparently, the map returns wrong values.
I wrote a Perl script to check the (sum of) values of heads and tails (individually for each thread), and those seem to be correct. I cannot understand why I get different values from the hashmap.
A concurrent hash map provides you with guarantees with respect to visibility of changes with respect to the map itself, not to its values. In this case you retrieve some values from the map, hold them for some arbitrary amount of time, then try and store them into the map again. In between the read and consequent write though, any number of operations might have happened on the map.
The concurrent in concurrent hash map just guarantees, for example, that if I put a value into a map, that I will actually be able to read that value in another thread (aka it will be visible).
What you need to do is ensure that all threads accessing the map wait their turn, so to speak, when updating the shared counters. In order to do this, you either have to use an atomic operation like addAndGet on an AtomicLong (which means storing AtomicLong values in the map instead of Long):
this.map.get("HEADS").addAndGet(this.heads);
or you need to synchronize both the read and write manually (most easily accomplished by synchronizing on the map itself):
synchronized(this.map) {
Long currentHeads = this.map.get("HEADS");
this.map.put("HEADS", Long.valueOf(currentHeads.longValue() + this.heads));
}
Personally, I prefer to leverage the SDK whenever I can, so I would go with the use of an Atomic data type.
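If changing the value type is not an option, another possibility (assuming Java 8 or later) is ConcurrentHashMap.merge, which performs the read, add and write for one key as a single atomic step. The sketch below is only an illustration; the class name MergeCounterSketch and the run helper are made up for this example and are not part of the original program.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class, not taken from the question's code.
class MergeCounterSketch {

    // Each of `threads` workers adds `perThread` to the shared "HEADS" entry.
    static long run(int threads, long perThread) {
        ConcurrentHashMap<String, Long> map = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            // merge() inserts the value if the key is absent, otherwise
            // combines old and new values atomically with Long::sum.
            workers[i] = new Thread(() -> map.merge("HEADS", perThread, Long::sum));
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return map.get("HEADS");
    }

    public static void main(String[] args) {
        System.out.println(run(8, 1000L)); // prints 8000
    }
}
```

In the original program this would amount to calling map.merge("HEADS", heads, Long::sum) inside updateStats() instead of the separate get and put.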
You should use AtomicLongs as values and you should create them only once and increment them instead of get/put.
ConcurrentHashMap <String, AtomicLong> universalMap = new ConcurrentHashMap <String, AtomicLong> ();
...
universalMap.put("HEADS", new AtomicLong(0));
universalMap.put("TAILS", new AtomicLong(0));
universalMap.put("ITERATIONS", new AtomicLong(0));
...
public void updateStats(){
// read from hashmap and get the existing values
this.map.get("HEADS").getAndAdd(heads);
this.map.get("TAILS").getAndAdd(tails);
this.map.get("ITERATIONS").getAndAdd(iterations);
}
Long is immutable.
An example:
Thread 1: get 0
Thread 2: get 0
Thread 2: put 10
Thread 3: get 10
Thread 3: put 15
Thread 1: put 5
Now your map contains 5 instead of 20
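The interleaving above can be replayed on a single thread, which makes the lost update deterministic. This is only a sketch: the class name LostUpdateReplay is made up, and the local variables stand in for the values each thread would be holding between its get and its put.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class that replays the schedule listed above.
class LostUpdateReplay {

    static long replay() {
        Map<String, Long> map = new HashMap<>();
        map.put("HEADS", 0L);

        long t1Read = map.get("HEADS");   // Thread 1: get 0
        long t2Read = map.get("HEADS");   // Thread 2: get 0
        map.put("HEADS", t2Read + 10);    // Thread 2: put 10
        long t3Read = map.get("HEADS");   // Thread 3: get 10
        map.put("HEADS", t3Read + 5);     // Thread 3: put 15
        map.put("HEADS", t1Read + 5);     // Thread 1: put 5, overwriting 15
        return map.get("HEADS");
    }

    public static void main(String[] args) {
        // The three threads contributed 5 + 10 + 5 = 20, but the map holds 5.
        System.out.println(replay()); // prints 5
    }
}
```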
Basically your problem is not the Map. You can use a regular HashMap since you do not modify it. Of course you have to make the map field final.
A couple of things. First, you really don't need to use a ConcurrentHashMap. A ConcurrentHashMap is only useful when you are dealing with concurrent puts/removes. In this case the set of keys is static, so you can simply wrap the map with Collections.unmodifiableMap to prove this.
Finally, if you are dealing with concurrent adds, you really should consider using a LongAdder. It scales far better when many parallel adds occur and you don't need to worry about the count until the end.
public class HeadsTails{
private final Map<String, LongAdder> map;
public HeadsTails(){
Map<String,LongAdder> local = new HashMap<String,LongAdder>();
local.put("HEADS", new LongAdder());
local.put("TAILS", new LongAdder());
local.put("ITERATIONS", new LongAdder());
map = Collections.unmodifiableMap(local);
}
public void count(){
map.get("HEADS").increment();
map.get("TAILS").increment();
}
public void print(){
System.out.println(map.get("HEADS").sum());
/// etc...
}
}
I mean, in reality I wouldn't even use a map...
public class HeadsTails{
private final LongAdder heads = new LongAdder();
private final LongAdder tails = new LongAdder();
private final LongAdder iterations = new LongAdder();
public void count(){
heads.increment();
tails.increment();
}
public void print(){
System.out.println(iterations.sum());
}
}
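To connect the LongAdder idea back to the coin-flip program, here is a minimal sketch in which worker threads add to shared LongAdder counters and the totals are read only after all workers have been joined. The class name LongAdderCoinFlip and the flip method are assumptions made for this example, and ThreadLocalRandom is used in place of the per-task Random from the question.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical demo class, not the original poster's code.
class LongAdderCoinFlip {
    final LongAdder heads = new LongAdder();
    final LongAdder tails = new LongAdder();

    // Runs `threads` workers, each flipping `flipsPerThread` coins.
    void flip(int threads, int flipsPerThread) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < flipsPerThread; n++) {
                    if (ThreadLocalRandom.current().nextBoolean()) {
                        heads.increment();
                    } else {
                        tails.increment();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        LongAdderCoinFlip flips = new LongAdderCoinFlip();
        flips.flip(4, 100_000);
        // sum() is read after join(), so no updates are still in flight.
        System.out.println(flips.heads.sum() + " " + flips.tails.sum());
    }
}
```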
Related
I'm working on an application that uses a HashMap to share state. I need to prove via unit tests that it will have problems in a multi-threaded environment.
I tried to check the state of the application in a single thread environment and in a multi-threaded environment via checking the size and elements of the HashMap in both of them. But seems this doesn't help, the state is always the same.
Are there any other ways to prove it or prove that an application that performs operations on the map works well with concurrent requests?
This is quite easy to prove.
In short
A hash map is based on an array, where each item represents a bucket. As more keys are added, the buckets grow, and at a certain threshold the array is recreated with a bigger size so that its entries are spread more evenly across buckets (a performance consideration). While the new array is being populated, it does not yet contain all of the existing entries, so a concurrent caller may get an empty result until the recreation completes.
Details and Proof
It means that sometimes HashMap#put() will internally call HashMap#resize() to make the underlying array bigger.
HashMap#resize() assigns the table field a new empty array with a bigger capacity and populates it with the old items. While this population happens, the underlying array doesn't contain all of the old items and calling HashMap#get() with an existing key may return null.
The following code demonstrates that. You are very likely to get the exception, which means the HashMap is not thread safe. I chose 65 535 as the target key: this way it will be the last element in the array, and thus the last element copied during re-population, which increases the chance of getting null from HashMap#get() (to see why, see the HashMap#put() implementation).
final Map<Integer, String> map = new HashMap<>();
final Integer targetKey = 0b1111_1111_1111_1111; // 65 535
final String targetValue = "v";
map.put(targetKey, targetValue);
new Thread(() -> {
IntStream.range(0, targetKey).forEach(key -> map.put(key, "someValue"));
}).start();
while (true) {
if (!targetValue.equals(map.get(targetKey))) {
throw new RuntimeException("HashMap is not thread safe.");
}
}
One thread adds new keys to the map. The other thread constantly checks the targetKey is present.
If I count those exceptions, I get around 200 000.
It is hard to simulate a race, but look at the OpenJDK source for the put() method of HashMap:
public V put(K key, V value) {
if (key == null)
return putForNullKey(value);
//Operation 1
int hash = hash(key.hashCode());
int i = indexFor(hash, table.length);
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
V oldValue = e.value;
e.value = value;
e.recordAccess(this);
return oldValue;
}
}
//Operation 2
modCount++;
//Operation 3
addEntry(hash, key, value, i);
return null;
}
As you can see, put() involves 3 operations which are not synchronized, and unsynchronized compound operations are not thread safe. So, in theory, HashMap is not thread safe.
It's an old thread, but I'm pasting my sample code, which demonstrates the problems with HashMap.
Take a look at the code below: we try to insert 30000 items into the HashMap using 10 threads (3000 items per thread).
After all the threads have completed, the size of the HashMap should ideally be 30000. But the actual result is either an exception while rebuilding the tree, or a final count of less than 30000.
class TempValue {
int value = 3;
@Override
public int hashCode() {
return 1; // All objects of this class will have same hashcode.
}
}
public class TestClass {
public static void main(String args[]) throws InterruptedException {
Map<TempValue, TempValue> myMap = new HashMap<>();
List<Thread> listOfThreads = new ArrayList<>();
// Create 10 Threads
for (int i = 0; i < 10; i++) {
Thread thread = new Thread(() -> {
// Let Each thread insert 3000 Items
for (int j = 0; j < 3000; j++) {
TempValue key = new TempValue();
myMap.put(key, key);
}
});
thread.start();
listOfThreads.add(thread);
}
for (Thread thread : listOfThreads) {
thread.join();
}
System.out.println("Count should be 30000, actual is : " + myMap.size());
}
}
Output 1 :
Count should be 30000, actual is : 29486
Output 2 : (Exception)
java.util.HashMap$Node cannot be cast to java.util.HashMap$TreeNode
java.lang.ClassCastException: java.util.HashMap$Node cannot be cast to java.util.HashMap$TreeNode
at java.util.HashMap$TreeNode.moveRootToFront(HashMap.java:1819)
at java.util.HashMap$TreeNode.treeify(HashMap.java:1936)
at java.util.HashMap.treeifyBin(HashMap.java:771)
at java.util.HashMap.putVal(HashMap.java:643)
at java.util.HashMap.put(HashMap.java:611)
at TestClass.lambda$0(TestClass.java:340)
at java.lang.Thread.run(Thread.java:745)
However if you modify the line Map<TempValue, TempValue> myMap = new HashMap<>(); to a ConcurrentHashMap the output is always 30000.
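For comparison, here is a minimal sketch of the ConcurrentHashMap variant. The class name ConcurrentInsertSketch, the insert helper and the CollidingKey class are made up for this example; CollidingKey mirrors TempValue by returning a constant hash code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class showing the same workload on a ConcurrentHashMap.
class ConcurrentInsertSketch {

    // Every instance lands in the same bucket; equality is identity-based.
    static final class CollidingKey {
        @Override
        public int hashCode() {
            return 1;
        }
    }

    // Inserts `perThread` distinct keys from each of `threads` workers.
    static int insert(int threads, int perThread) {
        Map<CollidingKey, CollidingKey> map = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    CollidingKey key = new CollidingKey();
                    map.put(key, key);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        // Same workload as above: 10 threads, 3000 items each.
        System.out.println("Count should be 30000, actual is : " + insert(10, 3000));
    }
}
```

Because all keys collide, inserts become slow as the bucket grows, so smaller numbers are more convenient for a quick check.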
Another Observation :
In the above example the hashcode for all objects of the TempValue class was the same (i.e., 1). So you might be wondering whether this issue with HashMap occurs only when there is a collision (due to the hashcode).
I tried another example.
Modify the TempValue class to
class TempValue {
int value = 3;
}
Now re-execute the same code. Out of every 5 runs, I see 2-3 runs still give a different output than 30000.
So even if you usually don't have many collisions, you might still end up with an issue. (Maybe due to rebuilding of the HashMap, etc.)
Overall these examples show the issue with HashMap which ConcurrentHashMap handles.
I need to prove via unit tests that it will have problems in multithread environment.
This is going to be tremendously hard to do. Race conditions are very hard to demonstrate. You could certainly write a program which does puts and gets into a HashMap in a large number of threads but logging, volatile fields, other locks, and other timing details of your application may make it extremely hard to force your particular code to fail.
Here's a stupid little HashMap failure test case. It fails because it times out when the threads go into an infinite loop because of memory corruption of HashMap. However, it may not fail for you depending on number of cores and other architecture details.
@Test(timeout = 10000)
public void runTest() throws Exception {
final Map<Integer, String> map = new HashMap<Integer, String>();
ExecutorService pool = Executors.newFixedThreadPool(10);
for (int i = 0; i < 10; i++) {
pool.submit(new Runnable() {
@Override
public void run() {
for (int i = 0; i < 10000; i++) {
map.put(i, "wow");
}
}
});
}
pool.shutdown();
pool.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
}
Is reading the API docs enough? There is a statement in there:
Note that this implementation is not synchronized. If multiple threads
access a hash map concurrently, and at least one of the threads
modifies the map structurally, it must be synchronized externally. (A
structural modification is any operation that adds or deletes one or
more mappings; merely changing the value associated with a key that an
instance already contains is not a structural modification.) This is
typically accomplished by synchronizing on some object that naturally
encapsulates the map. If no such object exists, the map should be
"wrapped" using the Collections.synchronizedMap method. This is best
done at creation time, to prevent accidental unsynchronized access to
the map:
The problem with thread safety is that it's hard to prove through a test. It could be fine most of the times. Your best bet would be to just run a bunch of threads that are getting/putting and you'll probably get some concurrency errors.
I suggest using a ConcurrentHashMap and trust that the Java team saying that HashMap is not synchronized is enough.
Are there any other ways to prove it?
How about reading the documentation (and paying attention to the emphasized "must"):
If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally
If you are going to attempt to write a unit test that demonstrates incorrect behavior, I recommend the following:
Create a bunch of keys that all have the same hashcode (say 30 or 40)
Add values to the map for each key
Spawn a separate thread for each key, which has an infinite loop that (1) asserts that the key is present in the map, (2) removes the mapping for that key, and (3) adds the mapping back.
If you're lucky, the assertion will fail at some point, because the linked list behind the hash bucket will be corrupted. If you're unlucky, it will appear that HashMap is indeed threadsafe despite the documentation.
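The recipe above can be sketched roughly as follows. Everything named here (CollidingKeyStress, Key, stress) is hypothetical, and the infinite loop is replaced by a bounded number of rounds so the program terminates. The demo in main uses a ConcurrentHashMap, which should report zero failures; passing a plain HashMap instead is the experiment the recipe describes, and it may report failures, throw, or even hang.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stress harness; a sketch, not a reliable unit test.
class CollidingKeyStress {

    // Distinct keys that all share one hash code.
    static final class Key {
        final int id;
        Key(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id == id;
        }
    }

    // Returns how many checks failed (key missing, or an exception was thrown).
    static int stress(Map<Key, String> map, int keys, int rounds) {
        AtomicInteger failures = new AtomicInteger();
        for (int i = 0; i < keys; i++) {
            map.put(new Key(i), "v");               // add a value for each key
        }
        Thread[] workers = new Thread[keys];
        for (int i = 0; i < keys; i++) {
            Key key = new Key(i);                   // one thread per key
            workers[i] = new Thread(() -> {
                for (int r = 0; r < rounds; r++) {  // bounded, not infinite
                    try {
                        if (!map.containsKey(key)) failures.incrementAndGet();
                        map.remove(key);
                        map.put(key, "v");
                    } catch (RuntimeException e) {
                        failures.incrementAndGet();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return failures.get();
    }

    public static void main(String[] args) {
        System.out.println(stress(new ConcurrentHashMap<>(), 30, 10_000));
    }
}
```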
It may be possible, but will never be a perfect test. Race conditions are just too unpredictable. That being said, I wrote a similar type of test to help fix a threading issue with a proprietary data structure, and in my case, it was much easier to prove that something was wrong (before the fix) than to prove that nothing would go wrong (after the fix). You could probably construct a multi-threaded test that will eventually fail with sufficient time and the right parameters.
This post may be helpful in identifying areas to focus on in your test and has some other suggestions for optional replacements.
You can create multiple threads each adding an element to a hashmap and iterating over it.
i.e. In the run method we have to use "put" and then iterate using iterator.
With HashMap we get a ConcurrentModificationException, while with ConcurrentHashMap we don't.
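The difference in iterator behaviour can be shown deterministically on a single thread, by adding a key while iterating. This is a simplified sketch (the class name IteratorBehaviourSketch and the failsFast helper are made up); across real threads the HashMap exception is only thrown on a best-effort basis, so a multi-threaded version may or may not fail on a given run.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class comparing iterator behaviour of the two maps.
class IteratorBehaviourSketch {

    // Returns true if adding a key during iteration raised
    // ConcurrentModificationException.
    static boolean failsFast(Map<Integer, String> map) {
        for (int i = 0; i < 10; i++) {
            map.put(i, "v");
        }
        try {
            for (Integer key : map.keySet()) {
                // The first pass adds a new key (a structural change);
                // later passes only overwrite its value.
                map.put(1000, "added while iterating");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsFast(new HashMap<>()));           // prints true
        System.out.println(failsFast(new ConcurrentHashMap<>())); // prints false
    }
}
```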
The most likely race condition in the java.util.HashMap implementation
Most HashMap failures occur when we try to read values while a resize or rehash step is executing. Resizing and rehashing are triggered under certain conditions, most commonly when the number of entries exceeds the resize threshold. This code shows that if I trigger a resize externally, or put more elements than the threshold so that a resize happens internally, some reads return null, which shows that HashMap is not thread safe. There are probably more race conditions, but this is enough to prove it is not thread safe.
Practical proof of the race condition
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.IntStream;
public class HashMapThreadSafetyTest {
public static void main(String[] args) {
try {
(new HashMapThreadSafetyTest()).testIt();
} catch (Exception e) {
e.printStackTrace();
}
}
private void threadOperation(int number, Map<Integer, String> map) {
map.put(number, "hashMapTest");
while (map.get(number) != null);
//If code passes to this line that means we did some null read operation which should not be
System.out.println("Null Value Number: " + number);
}
private void callHashMapResizeExternally(Map<Integer, String> map)
throws NoSuchMethodException, InvocationTargetException, IllegalAccessException {
Method method = map.getClass().getDeclaredMethod("resize");
method.setAccessible(true);
System.out.println("calling resize");
method.invoke(map);
}
private void testIt()
throws InterruptedException, NoSuchMethodException, IllegalAccessException, InvocationTargetException {
final Map<Integer, String> map = new HashMap<>();
IntStream.range(0, 12).forEach(i -> new Thread(() -> threadOperation(i, map)).start());
Thread.sleep(60000);
// First loop should not show any null value number untill calling resize method of hashmap externally.
callHashMapResizeExternally(map);
// First loop should fail from now on and should print some Null Value Numbers to the out.
System.out.println("Loop count is 12 since hashmap initially created for 2^4 bucket and threshold of resizing"
+ "0.75*2^4 = 12 In first loop it should not fail since we do not resizing hashmap. "
+ "\n\nAfter 60 second: after calling external resizing operation with reflection should forcefully fail"
+ "thread safety");
Thread.sleep(2000);
final Map<Integer, String> map2 = new HashMap<>();
IntStream.range(100, 113).forEach(i -> new Thread(() -> threadOperation(i, map2)).start());
// Second loop should fail from now on and should print some Null Value Numbers to the out. Because it is
// iterating more than 12 that causes hash map resizing and rehashing
System.out.println("It should fail directly since it is exceeding hashmap initial threshold and it will resize"
+ "when loop iterate 13rd time");
}
}
Example output
No null value should be printed untill thread sleep line passed
calling resize
Loop count is 12 since hashmap initially created for 2^4 bucket and threshold of resizing0.75*2^4 = 12 In first loop it should not fail since we do not resizing hashmap.
After 60 second: after calling external resizing operation with reflection should forcefully failthread safety
Null Value Number: 11
Null Value Number: 5
Null Value Number: 6
Null Value Number: 8
Null Value Number: 0
Null Value Number: 7
Null Value Number: 2
It should fail directly since it is exceeding hashmap initial threshold and it will resizewhen loop iterate 13th time
Null Value Number: 111
Null Value Number: 100
Null Value Number: 107
Null Value Number: 110
Null Value Number: 104
Null Value Number: 106
Null Value Number: 109
Null Value Number: 105
Very Simple Solution to prove this
Here is the code, which proves the Hashmap implementation is not thread safe.
In this example, we are only adding the elements to map, not removing it from any method.
We can see that it prints the keys which are not in map, even though we have put the same key in map before doing get operation.
package threads;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class HashMapWorkingDemoInConcurrentEnvironment {
private Map<Long, String> cache = new HashMap<>();
public String put(Long key, String value) {
return cache.put(key, value);
}
public String get(Long key) {
return cache.get(key);
}
public static void main(String[] args) {
HashMapWorkingDemoInConcurrentEnvironment cache = new HashMapWorkingDemoInConcurrentEnvironment();
class Producer implements Callable<String> {
private Random rand = new Random();
public String call() throws Exception {
while (true) {
long key = rand.nextInt(1000);
cache.put(key, Long.toString(key));
if (cache.get(key) == null) {
System.out.println("Key " + key + " has not been put in the map");
}
}
}
}
ExecutorService executorService = Executors.newFixedThreadPool(4);
System.out.println("Adding value...");
try {
for (int i = 0; i < 4; i++) {
executorService.submit(new Producer());
}
} finally {
executorService.shutdown();
}
}
}
Sample output for an execution run
Adding value...
Key 611 has not been put in the map
Key 978 has not been put in the map
Key 35 has not been put in the map
Key 202 has not been put in the map
Key 714 has not been put in the map
Key 328 has not been put in the map
Key 606 has not been put in the map
Key 149 has not been put in the map
Key 763 has not been put in the map
It's strange to see these values printed, since every key was put into the map just before the get. That is why HashMap is not a thread-safe implementation for a concurrent environment.
There is a great tool open sourced by the OpenJDK team called JCStress which is used in the JDK for concurrency testing.
https://github.com/openjdk/jcstress
One of its samples: https://github.com/openjdk/jcstress/blob/master/tests-custom/src/main/java/org/openjdk/jcstress/tests/collections/HashMapFailureTest.java
@JCStressTest
@Outcome(id = "0, 0, 1, 2", expect = Expect.ACCEPTABLE, desc = "No exceptions, entire map is okay.")
@Outcome(expect = Expect.ACCEPTABLE_INTERESTING, desc = "Something went wrong")
@State
public class HashMapFailureTest {
private final Map<Integer, Integer> map = new HashMap<>();
@Actor
public void actor1(IIII_Result r) {
try {
map.put(1, 1);
r.r1 = 0;
} catch (Exception e) {
r.r1 = 1;
}
}
@Actor
public void actor2(IIII_Result r) {
try {
map.put(2, 2);
r.r2 = 0;
} catch (Exception e) {
r.r2 = 1;
}
}
@Arbiter
public void arbiter(IIII_Result r) {
Integer v1 = map.get(1);
Integer v2 = map.get(2);
r.r3 = (v1 != null) ? v1 : -1;
r.r4 = (v2 != null) ? v2 : -1;
}
}
The methods marked with actor are run concurrently on different threads.
The result for this on my machine is:
Results across all configurations:
RESULT SAMPLES FREQ EXPECT DESCRIPTION
0, 0, -1, 2 3,854,896 5.25% Interesting Something went wrong
0, 0, 1, -1 4,251,564 5.79% Interesting Something went wrong
0, 0, 1, 2 65,363,492 88.97% Acceptable No exceptions, entire map is okay.
This shows that about 89% of the time the expected values were observed, but around 11% of the time incorrect results were seen.
You can try out this tool and the samples and write your own tests to verify that concurrency of some code is broken.
As yet another reply to this topic, I would recommend the example from https://www.baeldung.com/java-concurrent-map, shown below. The theory is very straightforward: N times, we submit 10 tasks to a thread pool, and each of them increments the value in a shared map 10 times. If the map were thread safe, the value would be 100 every time. The example proves it is not.
@Test
public void givenHashMap_whenSumParallel_thenError() throws Exception {
Map<String, Integer> map = new HashMap<>();
List<Integer> sumList = parallelSum100(map, 100);
assertNotEquals(1, sumList
.stream()
.distinct()
.count());
long wrongResultCount = sumList
.stream()
.filter(num -> num != 100)
.count();
assertTrue(wrongResultCount > 0);
}
private List<Integer> parallelSum100(Map<String, Integer> map,
int executionTimes) throws InterruptedException {
List<Integer> sumList = new ArrayList<>(1000);
for (int i = 0; i < executionTimes; i++) {
map.put("test", 0);
ExecutorService executorService =
Executors.newFixedThreadPool(4);
for (int j = 0; j < 10; j++) {
executorService.execute(() -> {
for (int k = 0; k < 10; k++)
map.computeIfPresent(
"test",
(key, value) -> value + 1
);
});
}
executorService.shutdown();
executorService.awaitTermination(5, TimeUnit.SECONDS);
sumList.add(map.get("test"));
}
return sumList;
}
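For contrast, the same workload on a ConcurrentHashMap should give 100, because ConcurrentHashMap.computeIfPresent performs each update atomically. The sketch below is an adaptation rather than the article's code; the class name ConcurrentSumSketch and the parallelSum helper are assumptions made for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical counterpart of parallelSum100 using a ConcurrentHashMap.
class ConcurrentSumSketch {

    // Submits `tasks` tasks; each increments the shared value `incrementsPerTask` times.
    static int parallelSum(int tasks, int incrementsPerTask) {
        Map<String, Integer> map = new ConcurrentHashMap<>();
        map.put("test", 0);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int j = 0; j < tasks; j++) {
            pool.execute(() -> {
                for (int k = 0; k < incrementsPerTask; k++) {
                    // Atomic read-modify-write for this key.
                    map.computeIfPresent("test", (key, value) -> value + 1);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map.get("test");
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(10, 10)); // prints 100
    }
}
```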
I am aggregating multiple values for keys in a multi-threaded environment. The keys are not known in advance. I thought I would do something like this:
class Aggregator {
protected ConcurrentHashMap<String, List<String>> entries =
new ConcurrentHashMap<String, List<String>>();
public Aggregator() {}
public void record(String key, String value) {
List<String> newList =
Collections.synchronizedList(new ArrayList<String>());
List<String> existingList = entries.putIfAbsent(key, newList);
List<String> values = existingList == null ? newList : existingList;
values.add(value);
}
}
The problem I see is that every time this method runs, I need to create a new instance of an ArrayList, which I then throw away (in most cases). This seems like unjustified abuse of the garbage collector. Is there a better, thread-safe way of initializing this kind of a structure without having to synchronize the record method? I am somewhat surprised by the decision to have the putIfAbsent method not return the newly-created element, and by the lack of a way to defer instantiation unless it is called for (so to speak).
Java 8 introduced an API to cater for this exact problem, making a 1-line solution:
public void record(String key, String value) {
entries.computeIfAbsent(key, k -> Collections.synchronizedList(new ArrayList<String>())).add(value);
}
For Java 7:
public void record(String key, String value) {
List<String> values = entries.get(key);
if (values == null) {
entries.putIfAbsent(key, Collections.synchronizedList(new ArrayList<String>()));
// At this point, there will definitely be a list for the key.
// We don't know or care which thread's new object is in there, so:
values = entries.get(key);
}
values.add(value);
}
This is the standard code pattern when populating a ConcurrentHashMap.
The special method putIfAbsent(K, V) will either put your value object in, or, if another thread got there before you, ignore your value object. Either way, after the call to putIfAbsent(K, V), get(key) is guaranteed to be consistent between threads, and therefore the above code is threadsafe.
The only wasted overhead is if some other thread adds a new entry at the same time for the same key: You may end up throwing away the newly created value, but that only happens if there is not already an entry and there's a race that your thread loses, which would typically be rare.
As of Java-8 you can create Multi Maps using the following pattern:
public void record(String key, String value) {
entries.computeIfAbsent(key,
k -> Collections.synchronizedList(new ArrayList<String>()))
.add(value);
}
The ConcurrentHashMap documentation (not the general contract) specifies that the ArrayList will only be created once for each key, at the slight initial cost of delaying updates while the ArrayList is being created for a new key:
http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html#computeIfAbsent-K-java.util.function.Function-
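One way to observe this guarantee is to count how often the mapping function runs when many threads race on the same new key. This is a small sketch under assumptions: the class name ComputeOnceSketch, the factoryCalls helper and the counter are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class counting invocations of the mapping function.
class ComputeOnceSketch {

    // Races `threads` workers on one key; returns how many lists were created.
    static int factoryCalls(int threads) {
        ConcurrentHashMap<String, List<String>> entries = new ConcurrentHashMap<>();
        AtomicInteger calls = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> entries.computeIfAbsent("key", k -> {
                calls.incrementAndGet();   // counts list creations
                return Collections.synchronizedList(new ArrayList<String>());
            }).add("value"));
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(factoryCalls(16)); // prints 1
    }
}
```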
In the end, I implemented a slight modification of @Bohemian's answer. His proposed solution overwrites the values variable with the putIfAbsent call, which creates the same problem I had before. The code that seems to work looks like this:
public void record(String key, String value) {
List<String> values = entries.get(key);
if (values == null) {
values = Collections.synchronizedList(new ArrayList<String>());
List<String> values2 = entries.putIfAbsent(key, values);
if (values2 != null)
values = values2;
}
values.add(value);
}
It's not as elegant as I'd like, but it's better than the original that creates a new ArrayList instance at every call.
Created two versions based on Gene's answer
public static <K,V> void putIfAbsetMultiValue(ConcurrentHashMap<K,List<V>> entries, K key, V value) {
List<V> values = entries.get(key);
if (values == null) {
values = Collections.synchronizedList(new ArrayList<V>());
List<V> values2 = entries.putIfAbsent(key, values);
if (values2 != null)
values = values2;
}
values.add(value);
}
public static <K,V> void putIfAbsetMultiValueSet(ConcurrentMap<K,Set<V>> entries, K key, V value) {
Set<V> values = entries.get(key);
if (values == null) {
values = Collections.synchronizedSet(new HashSet<V>());
Set<V> values2 = entries.putIfAbsent(key, values);
if (values2 != null)
values = values2;
}
values.add(value);
}
It works well
This is a problem I also looked for an answer to. The putIfAbsent method does not actually solve the extra object creation problem; it just makes sure that one of those objects doesn't replace another. Race conditions among threads can still cause multiple object instantiations. I found 3 solutions for this problem (and I would follow this order of preference):
1- If you are on Java 8, the best way to achieve this is probably the new computeIfAbsent method of ConcurrentMap. You just need to give it a computation function which will be executed synchronously (at least for the ConcurrentHashMap implementation). Example:
private final ConcurrentMap<String, List<String>> entries =
new ConcurrentHashMap<String, List<String>>();
public void method1(String key, String value) {
entries.computeIfAbsent(key, s -> new ArrayList<String>())
.add(value);
}
This is from the javadoc of ConcurrentHashMap.computeIfAbsent:
If the specified key is not already associated with a value, attempts
to compute its value using the given mapping function and enters it
into this map unless null. The entire method invocation is performed
atomically, so the function is applied at most once per key. Some
attempted update operations on this map by other threads may be
blocked while computation is in progress, so the computation should be
short and simple, and must not attempt to update any other mappings of
this map.
2- If you cannot use Java 8, you can use Guava's LoadingCache, which is thread-safe. You define a load function to it (just like the compute function above), and you can be sure that it'll be called synchronously. Example:
private final LoadingCache<String, List<String>> entries = CacheBuilder.newBuilder()
.build(new CacheLoader<String, List<String>>() {
@Override
public List<String> load(String s) throws Exception {
return new ArrayList<String>();
}
});
public void method2(String key, String value) {
entries.getUnchecked(key).add(value);
}
3- If you cannot use Guava either, you can always synchronise manually and do a double-checked locking. Example:
private final ConcurrentMap<String, List<String>> entries =
new ConcurrentHashMap<String, List<String>>();
public void method3(String key, String value) {
List<String> existing = entries.get(key);
if (existing != null) {
existing.add(value);
} else {
synchronized (entries) {
List<String> existingSynchronized = entries.get(key);
if (existingSynchronized != null) {
existingSynchronized.add(value);
} else {
List<String> newList = new ArrayList<>();
newList.add(value);
entries.put(key, newList);
}
}
}
}
I made an example implementation of all those 3 methods and additionally, the non-synchronized method, which causes extra object creation: http://pastebin.com/qZ4DUjTr
The waste of memory (and GC work) from creating empty ArrayLists was addressed in Java 7 update 40 (1.7.0_40). Don't worry about creating an empty ArrayList.
Reference : http://javarevisited.blogspot.com.tr/2014/07/java-optimization-empty-arraylist-and-Hashmap-cost-less-memory-jdk-17040-update.html
The approach with putIfAbsent has the fastest execution time: it is from 2 to 50 times faster than the "lambda" approach in environments with high contention. The lambda isn't the reason behind this slowdown; the issue is the compulsory synchronisation inside computeIfAbsent prior to the Java 9 optimisations.
The benchmark:
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
public class ConcurrentHashMapTest {
private final static int numberOfRuns = 1000000;
private final static int numberOfThreads = Runtime.getRuntime().availableProcessors();
private final static int keysSize = 10;
private final static String[] strings = new String[keysSize];
static {
for (int n = 0; n < keysSize; n++) {
strings[n] = "" + (char) ('A' + n);
}
}
public static void main(String[] args) throws InterruptedException {
for (int n = 0; n < 20; n++) {
testPutIfAbsent();
testComputeIfAbsentLamda();
}
}
private static void testPutIfAbsent() throws InterruptedException {
final AtomicLong totalTime = new AtomicLong();
final ConcurrentHashMap<String, AtomicInteger> map = new ConcurrentHashMap<String, AtomicInteger>();
final Random random = new Random();
ExecutorService executorService = Executors.newFixedThreadPool(numberOfThreads);
for (int i = 0; i < numberOfThreads; i++) {
executorService.execute(new Runnable() {
@Override
public void run() {
long start, end;
for (int n = 0; n < numberOfRuns; n++) {
String s = strings[random.nextInt(strings.length)];
start = System.nanoTime();
AtomicInteger count = map.get(s);
if (count == null) {
count = new AtomicInteger(0);
AtomicInteger prevCount = map.putIfAbsent(s, count);
if (prevCount != null) {
count = prevCount;
}
}
count.incrementAndGet();
end = System.nanoTime();
totalTime.addAndGet(end - start);
}
}
});
}
executorService.shutdown();
executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
System.out.println("Test " + Thread.currentThread().getStackTrace()[1].getMethodName()
+ " average time per run: " + (double) totalTime.get() / numberOfThreads / numberOfRuns + " ns");
}
private static void testComputeIfAbsentLamda() throws InterruptedException {
final AtomicLong totalTime = new AtomicLong();
final ConcurrentHashMap<String, AtomicInteger> map = new ConcurrentHashMap<String, AtomicInteger>();
final Random random = new Random();
ExecutorService executorService = Executors.newFixedThreadPool(numberOfThreads);
for (int i = 0; i < numberOfThreads; i++) {
executorService.execute(new Runnable() {
@Override
public void run() {
long start, end;
for (int n = 0; n < numberOfRuns; n++) {
String s = strings[random.nextInt(strings.length)];
start = System.nanoTime();
AtomicInteger count = map.computeIfAbsent(s, (k) -> new AtomicInteger(0));
count.incrementAndGet();
end = System.nanoTime();
totalTime.addAndGet(end - start);
}
}
});
}
executorService.shutdown();
executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
System.out.println("Test " + Thread.currentThread().getStackTrace()[1].getMethodName()
+ " average time per run: " + (double) totalTime.get() / numberOfThreads / numberOfRuns + " ns");
}
}
The results:
Test testPutIfAbsent average time per run: 115.756501 ns
Test testComputeIfAbsentLamda average time per run: 276.9667055 ns
Test testPutIfAbsent average time per run: 134.2332435 ns
Test testComputeIfAbsentLamda average time per run: 223.222063625 ns
Test testPutIfAbsent average time per run: 119.968893625 ns
Test testComputeIfAbsentLamda average time per run: 216.707419875 ns
Test testPutIfAbsent average time per run: 116.173902375 ns
Test testComputeIfAbsentLamda average time per run: 215.632467375 ns
Test testPutIfAbsent average time per run: 112.21422775 ns
Test testComputeIfAbsentLamda average time per run: 210.29563725 ns
Test testPutIfAbsent average time per run: 120.50643475 ns
Test testComputeIfAbsentLamda average time per run: 200.79536475 ns
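If you prefer the readability of computeIfAbsent but are stuck on a runtime where it synchronises even for keys that are already present, one possible middle ground is to try a plain get first and only fall back to computeIfAbsent on a miss. The following is a minimal sketch of that idea, not a measured result; the class name FastPathCounter and its methods are made up for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class FastPathCounter {
    private final ConcurrentHashMap<String, AtomicInteger> map = new ConcurrentHashMap<>();

    // Hypothetical helper: non-blocking get() first, computeIfAbsent only on a miss.
    int increment(String key) {
        AtomicInteger count = map.get(key);
        if (count == null) {
            // Atomic per key, so only one AtomicInteger is ever stored for a key.
            count = map.computeIfAbsent(key, k -> new AtomicInteger(0));
        }
        return count.incrementAndGet();
    }

    int get(String key) {
        AtomicInteger count = map.get(key);
        return count == null ? 0 : count.get();
    }

    public static void main(String[] args) throws InterruptedException {
        FastPathCounter counter = new FastPathCounter();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    counter.increment("A");
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(counter.get("A")); // 4 threads x 1000 increments = 4000
    }
}
```

Whether this closes the gap to the putIfAbsent variant depends on the JVM version and the contention pattern, so it is worth running through the benchmark above before relying on it.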
"A train has wagonCount wagons, indexed 0, 1, ..., wagonCount-1. Each wagon must be filled in the constructor of Train using the fillWagon function, which accepts a wagon's index and returns the wagon's cargo. The code below works, but the server has enough memory only for a small train. Refactor the code so that the server has enough memory even for a large train."
I am thinking we could convert the Hashtable collection to arrays, but I have no idea how to start. Please help; any idea would be a great help.
import java.util.Hashtable;
import java.util.function.Function;
public class Train {
private Hashtable<Integer, Integer> wagons;
public Train(int wagonCount, Function<Integer, Integer> fillWagon) {
this.wagons = new Hashtable<Integer, Integer>();
for (int i = 0; i < wagonCount; i++) {
this.wagons.put(i, fillWagon.apply(i));
}
}
public int peekWagon(int wagonIndex) {
return this.wagons.get(wagonIndex);
}
public static void main(String[] args) {
Train train = new Train(10, wagonIndex -> wagonIndex);
for (int i = 0; i < 10; i++) {
System.out.println("Wagon: " + i + ", cargo: " + train.peekWagon(i));
}
}
}
You could use int[]; it consumes less memory.
It is the most compact structure for storing integers. Hashtable<Integer, Integer> has a complex structure and a huge per-entry overhead, and even Integer[] consumes a lot more memory than int[]. So the best structure is an array of primitives. For a good explanation, have a look at "Memory usage of Java objects".
We use the array index to access the element at the required position; compared with Hashtable.get, this requires fewer CPU resources:
import java.util.function.Function;
public class Train {
private int[] wagons;
public Train(int wagonCount, Function<Integer, Integer> fillWagon) {
this.wagons = new int[wagonCount];
for (int i = 0; i < wagonCount; i++) {
this.wagons[i] = fillWagon.apply(i);
}
}
public int peekWagon(int wagonIndex) {
return this.wagons[wagonIndex];
}
public static void main(String[] args) {
Train train = new Train(10, wagonIndex -> wagonIndex);
for (int i = 0; i < 10; i++) {
System.out.println("Wagon: " + i + ", cargo: " + train.peekWagon(i));
}
}
}
If it is a requirement to fill all wagons during the execution of the constructor, then there is simply no way to store an arbitrary number of wagon contents in memory when that memory is limited to some small constant size. Sure, an int array will take less memory than a map, but it still grows linearly with the input size.
If however, it is allowed to defer the actual storing of the wagon contents, then you could use the constructor to keep a reference to the callback function, and only call it when peekWagon is called. You could still use the little memory that is available for storing some of the wagon contents, but only for the last queried k wagons. That way, you will have in memory what is queried regularly, but will need to retrieve (again) the contents when that particular wagon is not (or no longer) in memory. You would then call the callback function again.
This assumes of course that the callback function will not have undesirable side-effects, and that it will always return the same value when passed the same argument.
If these assumptions are OK, your code could look like this:
import java.util.*;
import java.util.function.Function;
public class Main {
static int maxSize = 4;
private LinkedHashMap<Integer, Integer> wagons;
private Function<Integer, Integer> fillWagon;
public Main(int wagonCount, Function<Integer, Integer> fillWagon) {
this.wagons = new LinkedHashMap<Integer, Integer>();
this.fillWagon = fillWagon;
}
public int peekWagon(int wagonIndex) {
int content;
if (!this.wagons.containsKey(wagonIndex)) {
if (this.wagons.size() >= maxSize) {
// Make room by removing an entry
int key = this.wagons.entrySet().iterator().next().getKey();
this.wagons.remove(key);
}
content = this.fillWagon.apply(wagonIndex);
} else {
// Remove entry so to put it at end of LinkedHashMap
content = this.wagons.get(wagonIndex);
this.wagons.remove(wagonIndex);
}
this.wagons.put(wagonIndex, content);
return content;
}
/* ... */
}
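The same least-recently-used behaviour can also be obtained from LinkedHashMap itself, by constructing it in access order and overriding removeEldestEntry. The following is a sketch under the same assumptions as above (fillWagon has no side effects, always returns the same value for the same index, and never returns null); the class name LazyTrain and the constant MAX_CACHED are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class LazyTrain {
    private static final int MAX_CACHED = 4; // assumed cache size, like maxSize above

    private final Function<Integer, Integer> fillWagon;

    // accessOrder = true: iteration order runs from least to most recently accessed.
    private final Map<Integer, Integer> cache =
        new LinkedHashMap<Integer, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
                // Called after each put; evicts the least recently used wagon.
                return size() > MAX_CACHED;
            }
        };

    LazyTrain(int wagonCount, Function<Integer, Integer> fillWagon) {
        // wagonCount is unused here, as in the version above: nothing is filled eagerly.
        this.fillWagon = fillWagon;
    }

    int peekWagon(int wagonIndex) {
        Integer content = cache.get(wagonIndex); // a hit also marks the entry as recently used
        if (content == null) {
            content = fillWagon.apply(wagonIndex); // recompute on a miss
            cache.put(wagonIndex, content);
        }
        return content;
    }

    public static void main(String[] args) {
        LazyTrain train = new LazyTrain(10, wagonIndex -> wagonIndex);
        for (int i = 0; i < 10; i++) {
            System.out.println("Wagon: " + i + ", cargo: " + train.peekWagon(i));
        }
    }
}
```

This keeps the eviction bookkeeping inside the map, so peekWagon no longer has to remove and re-insert entries by hand.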
This question is a bit more complex than the title states.
What I am trying to do is store a map of {Object:Item} for a game where the Object represents a cupboard and the Item represents the content of the cupboard (i.e. the item inside).
Essentially, what I need to do is update the values of the items in a clockwise (positive) rotation. I do NOT want to modify the map in any other way after it is created, only shift the positions of the values by +1.
I am currently doing almost all that I need. However, there are more Objects than Items, so I use null to represent empty cupboards. When I run my code, the map is modified while I am still reading from it (likely because it happens in the for loop), so elements are overwritten incorrectly, which after a while may leave me with a map full of nulls (and empty cupboards).
What I have so far:
private static Map<Integer, Integer> cupboardItems = new HashMap<Integer, Integer>();
private static Map<Integer, Integer> rewardPrices = new HashMap<Integer, Integer>();
private static final int[] objects = { 10783, 10785, 10787, 10789, 10791, 10793, 10795, 10797 };
private static final int[] rewards = { 6893, 6894, 6895, 6896, 6897 };
static {
int reward = rewards[0];
for (int i = 0; i < objects.length; i++) {
if (reward > rewards[rewards.length - 1])
cupboardItems.put(objects[i], null);
else
cupboardItems.put(objects[i], reward);
reward++;
}
}
// updates the items in the cupboards in clockwise rotation.
for (int i = 0; i < cupboardItems.size(); i++) {
if (objects[i] == objects[objects.length - 2])
cupboardItems.put(objects[i], cupboardItems.get(objects[0]));
else if (objects[i] == objects[objects.length - 1])
cupboardItems.put(objects[i], cupboardItems.get(objects[1]));
else
cupboardItems.put(objects[i], cupboardItems.get(objects[i + 2]));
}
So how can I modify my code so that an update gives the following result?
======
k1:v1
k2:v2
k3:v3
k4:none
=======
k1:none
k2:v1
k3:v2
k4:v3
HashMap doesn't guarantee ordering; therefore, if you need ordering, use an ArrayList or LinkedList.
If you want to stick with HashMap, you need to sort its keys before each rotation. You can sort easily since the keys are Integer objects, but this will affect performance.
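As a rough sketch of the list-based suggestion, assuming the items are kept in a list parallel to the objects array (with null marking an empty cupboard, as in the question), Collections.rotate shifts every value by one position without any entry being overwritten halfway through. The class name CupboardRotation and the helper rotateOnce are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class CupboardRotation {
    static final int[] OBJECTS = { 10783, 10785, 10787, 10789, 10791, 10793, 10795, 10797 };

    // Hypothetical helper: moves every item one cupboard forward, wrapping at the end.
    static void rotateOnce(List<Integer> items) {
        Collections.rotate(items, 1);
    }

    public static void main(String[] args) {
        // items.get(i) is the content of cupboard OBJECTS[i]; null means empty.
        List<Integer> items = new ArrayList<>(
            Arrays.asList(6893, 6894, 6895, 6896, 6897, null, null, null));
        rotateOnce(items);
        for (int i = 0; i < OBJECTS.length; i++) {
            System.out.println(OBJECTS[i] + ":" + items.get(i));
        }
    }
}
```

After one call, the first cupboard holds what the last one held before (null here) and every other cupboard holds its predecessor's item, which matches the k1..k4 example in the question.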
Ragavan has a good answer if you want to stick to your approach. However, you are doing a lot of work to just rotate the items. It would be much more efficient to just rotate the index (using modulus) and keep the arrays the same:
final static List<Integer> objects = new ArrayList<Integer>(
Arrays.asList(10783, 10785, 10787, 10789, 10791, 10793, 10795, 10797));
final static List<Integer> rewards = new ArrayList<Integer>(
Arrays.asList(6893, 6894, 6895, 6896, 6897, -1, -1, -1));
public static int getReward(int obj, int rot){
int rotIndex = (objects.indexOf(obj) - rot)%objects.size();
//modulus in java can be negative
rotIndex = rotIndex < 0 ? rotIndex+objects.size():rotIndex;
return rewards.get(rotIndex);
}
public static void main(String... args){
//This should give 6897, which is the reward for obj 10783 after 4 rotations
System.out.println(getReward(10783,4));
}
I am writing a dictionary that makes heavy use of String keys in a Map<String, Index>. My concern is which of HashMap and TreeMap will give better (faster) performance when searching for a key in the map.
Provided there are not many collisions, a HashMap will give you O(1) performance (with a lot of collisions this can degrade to potentially O(n), where n is the number of colliding entries in any single bucket). A TreeMap, on the other hand, is backed by a balanced tree, which yields O(log n) retrieval. So it really depends on your particular use case. But if you just want to access elements, irrespective of their order, use HashMap.
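import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
public class MapsInvestigation {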
public static HashMap<String, String> hashMap = new HashMap<String, String>();
public static TreeMap<String, String> treeMap = new TreeMap<String, String>();
public static ArrayList<String> list = new ArrayList<String>();
static {
for (int i = 0; i < 10000; i++) {
list.add(Integer.toString(i, 16));
}
}
public static void main(String[] args) {
System.out.println("Warmup populate");
for (int i = 0; i < 1000; i++) {
populateSet(hashMap);
populateSet(treeMap);
}
measureTimeToPopulate(hashMap, "HashMap", 1000);
measureTimeToPopulate(treeMap, "TreeMap", 1000);
System.out.println("Warmup get");
for (int i = 0; i < 1000; i++) {
get(hashMap);
get(treeMap);
}
measureTimeToContains(hashMap, "HashMap", 1000);
measureTimeToContains(treeMap, "TreeMap", 1000);
}
private static void get(Map<String, String> map) {
for (String s : list) {
map.get(s);
}
}
private static void populateSet(Map<String, String> map) {
map.clear();
for (String s : list) {
map.put(s, s);
}
}
private static void measureTimeToPopulate(Map<String, String> map, String setName, int reps) {
long start = System.currentTimeMillis();
for (int i = 0; i < reps; i++) {
populateSet(map);
}
long finish = System.currentTimeMillis();
System.out.println("Time to populate " + (reps * map.size()) + " entries in a " + setName + ": " + (finish - start));
}
private static void measureTimeToContains(Map<String, String> map, String setName, int reps) {
long start = System.currentTimeMillis();
for (int i = 0; i < reps; i++) {
get(map);
}
long finish = System.currentTimeMillis();
System.out.println("Time to get() " + (reps * map.size()) + " entries in a " + setName + ": " + (finish - start));
}
}
Gives these results:
Warmup populate
Time to populate 10000000 entries in a HashMap: 230
Time to populate 10000000 entries in a TreeMap: 1995
Warmup get
Time to get() 10000000 entries in a HashMap: 140
Time to get() 10000000 entries in a TreeMap: 1164
HashMap is O(1) (usually) for access; TreeMap is O(log n) (guaranteed).
This assumes that your key objects are immutable and have properly written equals and hashCode methods. See Joshua Bloch's "Effective Java" chapter 3 for how to override equals and hashCode correctly.
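String keys already satisfy these requirements. If the dictionary ever needs a composite key, a minimal sketch of an immutable key class might look like the following; WordKey and its two fields are hypothetical and only illustrate the equals/hashCode contract.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class WordKey {
    // Final fields: the key cannot change after it has been put into a map.
    private final String word;
    private final String language;

    WordKey(String word, String language) {
        this.word = Objects.requireNonNull(word);
        this.language = Objects.requireNonNull(language);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof WordKey)) return false;
        WordKey other = (WordKey) o;
        return word.equals(other.word) && language.equals(other.language);
    }

    @Override
    public int hashCode() {
        // Uses exactly the fields compared in equals, so equal keys hash equally.
        return Objects.hash(word, language);
    }

    public static void main(String[] args) {
        Map<WordKey, Integer> index = new HashMap<>();
        index.put(new WordKey("map", "en"), 1);
        // A distinct but equal key finds the same entry.
        System.out.println(index.get(new WordKey("map", "en"))); // prints 1
    }
}
```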
A HashMap is O(1) on average, so it is supposed to be faster, and for large maps it will probably have better throughput.
However, a HashMap requires rehashing when the load factor becomes too high. Rehashing is O(n), so at any time in the program's life you may suffer an unexpected performance loss due to a rehash, which might be critical in some applications (those sensitive to latency). So think twice before using HashMap if latency is an issue!
A HashMap is also vulnerable to poor hash functions, which can cause O(n) lookups if many of the keys in use hash to the same bucket.
HashMap is faster. However if you would often need to process your dictionary in alphabetical order, you would be better off with the TreeMap since you would otherwise need to sort all your words every time you need to process them in alphabetical order.
For your application HashMap is the better choice since I doubt you will need the alphabetically sorted list often, if ever.
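If alphabetical processing is only needed occasionally, one option is to keep the HashMap for lookups and build a sorted view on demand. The sketch below assumes Integer values purely for illustration (the question's Index type is not shown); DictionaryOrder and sortedView are made-up names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class DictionaryOrder {
    // Hypothetical helper: copies the entries into a TreeMap, which iterates in key order.
    static Map<String, Integer> sortedView(Map<String, Integer> dictionary) {
        return new TreeMap<>(dictionary);
    }

    public static void main(String[] args) {
        Map<String, Integer> dictionary = new HashMap<>();
        dictionary.put("pear", 3);
        dictionary.put("apple", 1);
        dictionary.put("mango", 2);

        // Fast unordered lookup from the HashMap.
        System.out.println(dictionary.get("mango")); // prints 2

        // Sorted traversal only when it is actually needed.
        System.out.println(sortedView(dictionary).keySet()); // prints [apple, mango, pear]
    }
}
```

Each call to sortedView costs O(n log n), so if sorted traversal turns out to be frequent, storing the data in a TreeMap from the start is the simpler choice.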