I'm working on an application that uses a HashMap to share state. I need to prove via unit tests that it will have problems in a multi-threaded environment.
I tried to compare the state of the application in a single-threaded environment and in a multi-threaded environment by checking the size and elements of the HashMap in both. But this doesn't seem to help; the state is always the same.
Are there any other ways to prove it or prove that an application that performs operations on the map works well with concurrent requests?
This is quite easy to prove.
In short
A hash map is based on an array, where each item represents a bucket. As more keys are added, the buckets grow, and at a certain threshold the array is recreated with a bigger size so that the entries are spread more evenly across its buckets (a performance consideration). While the array is being recreated, the new array is not yet fully populated, so a caller may get an empty result for a key that is actually present, until the recreation completes.
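To make the threshold concrete, here is a small model of when the array grows. It is only a sketch: it assumes the documented defaults of java.util.HashMap (initial capacity 16, load factor 0.75) and the documented rule that the table roughly doubles once the number of entries exceeds capacity × load factor. The class and method names are made up for illustration, and the initial allocation of the table is not counted as a growth step.

```java
public class ResizeModel {
    // Documented defaults of java.util.HashMap
    static final int DEFAULT_CAPACITY = 16;
    static final float LOAD_FACTOR = 0.75f;

    /** Modelled table capacity after inserting the given number of distinct keys. */
    static int capacityAfter(int entries) {
        int capacity = DEFAULT_CAPACITY;
        // Grow while the entry count exceeds the threshold (capacity * load factor)
        while (entries > (int) (capacity * LOAD_FACTOR)) {
            capacity *= 2;
        }
        return capacity;
    }

    /** Number of doublings needed to reach that capacity from the default one. */
    static int growthSteps(int entries) {
        int steps = 0;
        for (int c = DEFAULT_CAPACITY; c < capacityAfter(entries); c *= 2) {
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        for (int n : new int[] {12, 13, 65_536}) {
            System.out.println(n + " entries -> capacity " + capacityAfter(n)
                    + ", growth steps " + growthSteps(n));
        }
    }
}
```

Every growth step is a window in which a concurrent reader can look at a table that has not been fully repopulated, so a workload that inserts many keys gives a reader many chances to observe a missing entry.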
Details and Proof
It means that sometimes HashMap#put() will internally call HashMap#resize() to make the underlying array bigger.
HashMap#resize() assigns the table field a new empty array with a bigger capacity and populates it with the old items. While this population happens, the underlying array doesn't contain all of the old items and calling HashMap#get() with an existing key may return null.
The following code demonstrates that. You are very likely to get the exception, which means the HashMap is not thread safe. I chose 65 535 as the target key: this way it will be the last element in the array, and thus the last element to be moved during re-population, which increases the chance of getting null from HashMap#get() (to see why, see the HashMap#put() implementation).
final Map<Integer, String> map = new HashMap<>();
final Integer targetKey = 0b1111_1111_1111_1111; // 65 535
final String targetValue = "v";
map.put(targetKey, targetValue);
new Thread(() -> {
IntStream.range(0, targetKey).forEach(key -> map.put(key, "someValue"));
}).start();
while (true) {
if (!targetValue.equals(map.get(targetKey))) {
throw new RuntimeException("HashMap is not thread safe.");
}
}
One thread adds new keys to the map. The other thread constantly checks the targetKey is present.
If I count those exceptions instead of throwing, I get around 200 000.
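A counting variant of the snippet above might look like the following sketch. The class name MissCounter and the method countMisses are made up for illustration. The reader loop stops when the writer thread finishes, so the program terminates. The number of misses seen with a HashMap depends on timing and hardware, so no particular figure is promised here, whereas a ConcurrentHashMap should never report a miss for a key that is already present.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MissCounter {
    /** Counts how often the pre-inserted target key is not found while another thread adds keys. */
    static long countMisses(Map<Integer, String> map) {
        final int targetKey = 0b1111_1111_1111_1111; // 65 535
        final String targetValue = "v";
        map.put(targetKey, targetValue);

        // Writer: adds 65 535 more keys, forcing the map to grow several times
        Thread writer = new Thread(() -> {
            for (int key = 0; key < targetKey; key++) {
                map.put(key, "someValue");
            }
        });
        writer.start();

        // Reader: keeps checking the target key until the writer is done
        long misses = 0;
        while (writer.isAlive()) {
            if (!targetValue.equals(map.get(targetKey))) {
                misses++;
            }
        }
        return misses;
    }

    public static void main(String[] args) {
        System.out.println("ConcurrentHashMap misses: " + countMisses(new ConcurrentHashMap<>()));
        System.out.println("HashMap misses: " + countMisses(new HashMap<>()));
    }
}
```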
It is hard to simulate a race, but look at the OpenJDK source for the put() method of HashMap:
public V put(K key, V value) {
if (key == null)
return putForNullKey(value);
//Operation 1
int hash = hash(key.hashCode());
int i = indexFor(hash, table.length);
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
V oldValue = e.value;
e.value = value;
e.recordAccess(this);
return oldValue;
}
}
//Operation 2
modCount++;
//Operation 3
addEntry(hash, key, value, i);
return null;
}
As you can see, put() involves three operations that are not synchronized with each other, and compound operations like this are not thread safe. So in theory this shows that HashMap is not thread safe.
It's an old thread, but I'm pasting my sample code, which demonstrates the problems with HashMap.
Take a look at the code below: we try to insert 30000 items into the HashMap using 10 threads (3000 items per thread).
After all the threads have completed, the size of the HashMap should ideally be 30000. But the actual result is either an exception while rebuilding the tree or a final count of less than 30000.
class TempValue {
int value = 3;
@Override
public int hashCode() {
return 1; // All objects of this class will have same hashcode.
}
}
public class TestClass {
public static void main(String[] args) throws InterruptedException { // thread.join() below can throw InterruptedException
Map<TempValue, TempValue> myMap = new HashMap<>();
List<Thread> listOfThreads = new ArrayList<>();
// Create 10 Threads
for (int i = 0; i < 10; i++) {
Thread thread = new Thread(() -> {
// Let Each thread insert 3000 Items
for (int j = 0; j < 3000; j++) {
TempValue key = new TempValue();
myMap.put(key, key);
}
});
thread.start();
listOfThreads.add(thread);
}
for (Thread thread : listOfThreads) {
thread.join();
}
System.out.println("Count should be 30000, actual is : " + myMap.size());
}
}
Output 1 :
Count should be 30000, actual is : 29486
Output 2 : (Exception)
java.lang.ClassCastException: java.util.HashMap$Node cannot be cast to java.util.HashMap$TreeNode
at java.util.HashMap$TreeNode.moveRootToFront(HashMap.java:1819)
at java.util.HashMap$TreeNode.treeify(HashMap.java:1936)
at java.util.HashMap.treeifyBin(HashMap.java:771)
at java.util.HashMap.putVal(HashMap.java:643)
at java.util.HashMap.put(HashMap.java:611)
at TestClass.lambda$0(TestClass.java:340)
at java.lang.Thread.run(Thread.java:745)
However, if you change the line Map<TempValue, TempValue> myMap = new HashMap<>(); to use a ConcurrentHashMap, the output is always 30000.
Another observation:
In the above example the hash code for all objects of the TempValue class was the same (i.e., 1). So you might be wondering whether this issue with HashMap occurs only when there is a collision (due to the hash code).
I tried another example.
Modify the TempValue class to
class TempValue {
int value = 3;
}
Now re-execute the same code. Out of every 5 runs, I see 2-3 runs still give an output different from 30000.
So even if you usually don't have many collisions, you might still end up with an issue (maybe due to resizing of the HashMap, etc.).
Overall these examples show the issue with HashMap which ConcurrentHashMap handles.
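For comparison, here is a sketch of the same kind of insert test run against the two thread-safe alternatives, ConcurrentHashMap and Collections.synchronizedMap. The class and method names are made up for illustration, and it uses distinct Integer keys rather than the TempValue class above, which is an assumption made to keep the run fast. With either of these maps the final size should equal threads × perThread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SafeInsertDemo {
    /** Inserts threads * perThread distinct keys from several threads and returns the final size. */
    static int insertConcurrently(Map<Integer, Integer> map, int threads, int perThread) {
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int offset = t * perThread; // each thread gets its own key range
            Thread worker = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    map.put(offset + j, j);
                }
            });
            worker.start();
            workers.add(worker);
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println("ConcurrentHashMap: "
                + insertConcurrently(new ConcurrentHashMap<>(), 10, 3000));
        System.out.println("synchronizedMap:   "
                + insertConcurrently(Collections.synchronizedMap(new HashMap<>()), 10, 3000));
    }
}
```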
I need to prove via unit tests that it will have problems in multithread environment.
This is going to be tremendously hard to do. Race conditions are very hard to demonstrate. You could certainly write a program which does puts and gets into a HashMap in a large number of threads but logging, volatile fields, other locks, and other timing details of your application may make it extremely hard to force your particular code to fail.
Here's a stupid little HashMap failure test case. It fails because it times out when the threads go into an infinite loop because of memory corruption of HashMap. However, it may not fail for you depending on number of cores and other architecture details.
@Test(timeout = 10000)
public void runTest() throws Exception {
final Map<Integer, String> map = new HashMap<Integer, String>();
ExecutorService pool = Executors.newFixedThreadPool(10);
for (int i = 0; i < 10; i++) {
pool.submit(new Runnable() {
@Override
public void run() {
for (int i = 0; i < 10000; i++) {
map.put(i, "wow");
}
}
});
}
pool.shutdown();
pool.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
}
Is reading the API docs enough? There is a statement in there:
Note that this implementation is not synchronized. If multiple threads
access a hash map concurrently, and at least one of the threads
modifies the map structurally, it must be synchronized externally. (A
structural modification is any operation that adds or deletes one or
more mappings; merely changing the value associated with a key that an
instance already contains is not a structural modification.) This is
typically accomplished by synchronizing on some object that naturally
encapsulates the map. If no such object exists, the map should be
"wrapped" using the Collections.synchronizedMap method. This is best
done at creation time, to prevent accidental unsynchronized access to
the map:
The problem with thread safety is that it's hard to prove through a test. It could be fine most of the time. Your best bet would be to just run a bunch of threads that are getting/putting, and you'll probably get some concurrency errors.
I suggest using a ConcurrentHashMap and trusting that the Java team's statement that HashMap is not synchronized is enough.
Are there any other ways to prove it?
How about reading the documentation (and paying attention to the emphasized "must"):
If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally
If you are going to attempt to write a unit test that demonstrates incorrect behavior, I recommend the following:
Create a bunch of keys that all have the same hashcode (say 30 or 40)
Add values to the map for each key
Spawn a separate thread for each key, which has an infinite loop that (1) asserts that the key is present in the map, (2) removes the mapping for that key, and (3) adds the mapping back.
If you're lucky, the assertion will fail at some point, because the linked list behind the hash bucket will be corrupted. If you're unlucky, it will appear that HashMap is indeed threadsafe despite the documentation.
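The recipe above could be sketched roughly as follows. Everything named here (ChurnTest, CollidingKey, runChurn) is hypothetical, and two liberties are taken: the loop is bounded rather than infinite, and the worker threads are daemon threads joined with a timeout, so that a corrupted HashMap cannot hang the run. A thread that is still running at the deadline, or that died with an exception, is counted as a failure.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class ChurnTest {
    /** Key type whose instances all share one hash code, so they land in the same bucket. */
    static final class CollidingKey {
        final int id;
        CollidingKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof CollidingKey && ((CollidingKey) o).id == id;
        }
    }

    /** Returns the number of failed presence checks, crashed threads and timed-out threads. */
    static int runChurn(Map<CollidingKey, String> map, int keyCount, int iterations, long timeoutMillis) {
        AtomicInteger failures = new AtomicInteger();
        List<CollidingKey> keys = new ArrayList<>();
        for (int i = 0; i < keyCount; i++) {        // steps 1 and 2: create keys, add values
            CollidingKey key = new CollidingKey(i);
            keys.add(key);
            map.put(key, "value");
        }
        List<Thread> workers = new ArrayList<>();
        for (CollidingKey key : keys) {             // step 3: one thread per key
            Thread worker = new Thread(() -> {
                try {
                    for (int n = 0; n < iterations; n++) {
                        if (!map.containsKey(key)) {
                            failures.incrementAndGet();
                        }
                        map.remove(key);
                        map.put(key, "value");
                    }
                } catch (RuntimeException e) {      // e.g. ClassCastException from a corrupted bucket
                    failures.incrementAndGet();
                }
            });
            worker.setDaemon(true);
            worker.start();
            workers.add(worker);
        }
        long deadline = System.currentTimeMillis() + timeoutMillis;
        for (Thread worker : workers) {
            try {
                worker.join(Math.max(1, deadline - System.currentTimeMillis()));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            if (worker.isAlive()) {
                failures.incrementAndGet();         // still spinning: treat as a failure
            }
        }
        return failures.get();
    }

    public static void main(String[] args) {
        // Results with HashMap vary from run to run; swap in a ConcurrentHashMap to compare.
        System.out.println("HashMap failures: " + runChurn(new HashMap<>(), 32, 20_000, 10_000));
    }
}
```

Because each thread only ever touches its own key, any failed presence check can only come from the map itself, not from the test logic.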
It may be possible, but will never be a perfect test. Race conditions are just too unpredictable. That being said, I wrote a similar type of test to help fix a threading issue with a proprietary data structure, and in my case, it was much easier to prove that something was wrong (before the fix) than to prove that nothing would go wrong (after the fix). You could probably construct a multi-threaded test that will eventually fail with sufficient time and the right parameters.
This post may be helpful in identifying areas to focus on in your test and has some other suggestions for optional replacements.
You can create multiple threads, each adding an element to a HashMap and iterating over it.
That is, in the run method we call "put" and then iterate over the map using an iterator.
With a HashMap we get a ConcurrentModificationException, while with a ConcurrentHashMap we don't.
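The fail-fast check behind that exception can be shown deterministically even in a single thread, which is what the sketch below does. This is a deliberate simplification: it is not a multi-threaded test, it only illustrates that HashMap iterators throw when the map is structurally modified during iteration, while ConcurrentHashMap iterators are weakly consistent and do not. The class and method names are made up for illustration.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class IterateWhilePutting {
    /** Returns true if adding a new key during iteration triggers ConcurrentModificationException. */
    static boolean failsFast(Map<Integer, String> map) {
        for (int i = 0; i < 10; i++) {
            map.put(i, "v");
        }
        try {
            boolean added = false;
            for (Integer key : map.keySet()) {
                if (!added) {
                    // Structural modification while an iterator is active
                    map.put(100, "added during iteration");
                    added = true;
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap fails fast: " + failsFast(new HashMap<>()));
        System.out.println("ConcurrentHashMap fails fast: " + failsFast(new ConcurrentHashMap<>()));
    }
}
```

Note that the HashMap documentation describes this check as best effort, so in a real multi-threaded run the exception is likely but not guaranteed.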
Most probable race condition in the java.util.HashMap implementation
Most HashMap failures happen when we try to read values while a resize or rehash step is executing. Resizing and rehashing are triggered under certain conditions, most commonly when the number of entries exceeds the threshold. The code below shows that if I call resize externally, or if I put more elements than the threshold so that resize is called internally, some reads return null, which shows that HashMap is not thread safe. There are likely more race conditions, but this is enough to prove that it is not thread safe.
Practical proof of the race condition
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.IntStream;
public class HashMapThreadSafetyTest {
public static void main(String[] args) {
try {
(new HashMapThreadSafetyTest()).testIt();
} catch (Exception e) {
e.printStackTrace();
}
}
private void threadOperation(int number, Map<Integer, String> map) {
map.put(number, "hashMapTest");
while (map.get(number) != null);
//If code passes to this line that means we did some null read operation which should not be
System.out.println("Null Value Number: " + number);
}
private void callHashMapResizeExternally(Map<Integer, String> map)
throws NoSuchMethodException, InvocationTargetException, IllegalAccessException {
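// Note: on Java 16 and later this reflective access only works when the JVM is
// started with --add-opens java.base/java.util=ALL-UNNAMED
Method method = map.getClass().getDeclaredMethod("resize");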
method.setAccessible(true);
System.out.println("calling resize");
method.invoke(map);
}
private void testIt()
throws InterruptedException, NoSuchMethodException, IllegalAccessException, InvocationTargetException {
final Map<Integer, String> map = new HashMap<>();
IntStream.range(0, 12).forEach(i -> new Thread(() -> threadOperation(i, map)).start());
Thread.sleep(60000);
// First loop should not show any null value number until the resize method of the HashMap is called externally.
callHashMapResizeExternally(map);
// First loop should fail from now on and should print some Null Value Numbers to the out.
System.out.println("Loop count is 12 since hashmap initially created for 2^4 bucket and threshold of resizing"
+ "0.75*2^4 = 12 In first loop it should not fail since we do not resizing hashmap. "
+ "\n\nAfter 60 second: after calling external resizing operation with reflection should forcefully fail"
+ "thread safety");
Thread.sleep(2000);
final Map<Integer, String> map2 = new HashMap<>();
IntStream.range(100, 113).forEach(i -> new Thread(() -> threadOperation(i, map2)).start());
// Second loop should fail from now on and should print some Null Value Numbers to the out. Because it is
// iterating more than 12 that causes hash map resizing and rehashing
System.out.println("It should fail directly since it is exceeding hashmap initial threshold and it will resize"
+ "when loop iterate 13rd time");
}
}
Example output
No null value should be printed untill thread sleep line passed
calling resize
Loop count is 12 since hashmap initially created for 2^4 bucket and threshold of resizing0.75*2^4 = 12 In first loop it should not fail since we do not resizing hashmap.
After 60 second: after calling external resizing operation with reflection should forcefully failthread safety
Null Value Number: 11
Null Value Number: 5
Null Value Number: 6
Null Value Number: 8
Null Value Number: 0
Null Value Number: 7
Null Value Number: 2
It should fail directly since it is exceeding hashmap initial threshold and it will resizewhen loop iterate 13th time
Null Value Number: 111
Null Value Number: 100
Null Value Number: 107
Null Value Number: 110
Null Value Number: 104
Null Value Number: 106
Null Value Number: 109
Null Value Number: 105
A very simple way to prove this
Here is code that proves the HashMap implementation is not thread safe.
In this example, we only add elements to the map; no method ever removes them.
We can see that it prints keys that are not found in the map, even though the same key was put in the map just before the get operation.
package threads;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class HashMapWorkingDemoInConcurrentEnvironment {
private Map<Long, String> cache = new HashMap<>();
public String put(Long key, String value) {
return cache.put(key, value);
}
public String get(Long key) {
return cache.get(key);
}
public static void main(String[] args) {
HashMapWorkingDemoInConcurrentEnvironment cache = new HashMapWorkingDemoInConcurrentEnvironment();
class Producer implements Callable<String> {
private Random rand = new Random();
public String call() throws Exception {
while (true) {
long key = rand.nextInt(1000);
cache.put(key, Long.toString(key));
if (cache.get(key) == null) {
System.out.println("Key " + key + " has not been put in the map");
}
}
}
}
ExecutorService executorService = Executors.newFixedThreadPool(4);
System.out.println("Adding value...");
try {
for (int i = 0; i < 4; i++) {
executorService.submit(new Producer());
}
} finally {
executorService.shutdown();
}
}
}
Sample output for an execution run
Adding value...
Key 611 has not been put in the map
Key 978 has not been put in the map
Key 35 has not been put in the map
Key 202 has not been put in the map
Key 714 has not been put in the map
Key 328 has not been put in the map
Key 606 has not been put in the map
Key 149 has not been put in the map
Key 763 has not been put in the map
It is strange to see these messages printed: each key was put into the map just before the get. That is why HashMap is not a thread-safe implementation for a concurrent environment.
There is a great tool open sourced by the OpenJDK team called JCStress which is used in the JDK for concurrency testing.
https://github.com/openjdk/jcstress
One of its samples: https://github.com/openjdk/jcstress/blob/master/tests-custom/src/main/java/org/openjdk/jcstress/tests/collections/HashMapFailureTest.java
@JCStressTest
@Outcome(id = "0, 0, 1, 2", expect = Expect.ACCEPTABLE, desc = "No exceptions, entire map is okay.")
@Outcome(expect = Expect.ACCEPTABLE_INTERESTING, desc = "Something went wrong")
@State
public class HashMapFailureTest {
private final Map<Integer, Integer> map = new HashMap<>();
@Actor
public void actor1(IIII_Result r) {
try {
map.put(1, 1);
r.r1 = 0;
} catch (Exception e) {
r.r1 = 1;
}
}
@Actor
public void actor2(IIII_Result r) {
try {
map.put(2, 2);
r.r2 = 0;
} catch (Exception e) {
r.r2 = 1;
}
}
@Arbiter
public void arbiter(IIII_Result r) {
Integer v1 = map.get(1);
Integer v2 = map.get(2);
r.r3 = (v1 != null) ? v1 : -1;
r.r4 = (v2 != null) ? v2 : -1;
}
}
The methods marked with @Actor are run concurrently on different threads.
The result for this on my machine is:
Results across all configurations:
RESULT SAMPLES FREQ EXPECT DESCRIPTION
0, 0, -1, 2 3,854,896 5.25% Interesting Something went wrong
0, 0, 1, -1 4,251,564 5.79% Interesting Something went wrong
0, 0, 1, 2 65,363,492 88.97% Acceptable No exceptions, entire map is okay.
This shows that the expected values were observed about 89% of the time, but incorrect results were seen around 11% of the time.
You can try out this tool and the samples and write your own tests to verify that concurrency of some code is broken.
As yet another reply to this topic, I would recommend the example from https://www.baeldung.com/java-concurrent-map, shown below. The theory is very straightforward: N times, we submit 10 tasks to a thread pool, each of which increments the value in a common map 10 times. If the map were thread safe, the value would be 100 every time. The example proves it's not.
@Test
public void givenHashMap_whenSumParallel_thenError() throws Exception {
Map<String, Integer> map = new HashMap<>();
List<Integer> sumList = parallelSum100(map, 100);
assertNotEquals(1, sumList
.stream()
.distinct()
.count());
long wrongResultCount = sumList
.stream()
.filter(num -> num != 100)
.count();
assertTrue(wrongResultCount > 0);
}
private List<Integer> parallelSum100(Map<String, Integer> map,
int executionTimes) throws InterruptedException {
List<Integer> sumList = new ArrayList<>(1000);
for (int i = 0; i < executionTimes; i++) {
map.put("test", 0);
ExecutorService executorService =
Executors.newFixedThreadPool(4);
for (int j = 0; j < 10; j++) {
executorService.execute(() -> {
for (int k = 0; k < 10; k++)
map.computeIfPresent(
"test",
(key, value) -> value + 1
);
});
}
executorService.shutdown();
executorService.awaitTermination(5, TimeUnit.SECONDS);
sumList.add(map.get("test"));
}
return sumList;
}
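As a counterpart, the same workload run against a ConcurrentHashMap should give 100 on every run, because ConcurrentHashMap performs computeIfPresent atomically. The following is only a sketch of that check with made-up class and method names, not part of the cited article's code as reproduced here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelSumCheck {
    /** Runs 10 tasks that each increment the "test" entry 10 times and returns the final value. */
    static int parallelSum100(Map<String, Integer> map) {
        map.put("test", 0);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int j = 0; j < 10; j++) {
            pool.execute(() -> {
                for (int k = 0; k < 10; k++) {
                    map.computeIfPresent("test", (key, value) -> value + 1);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        }
        return map.get("test");
    }

    public static void main(String[] args) {
        for (int run = 0; run < 5; run++) {
            System.out.println("ConcurrentHashMap sum: " + parallelSum100(new ConcurrentHashMap<>()));
        }
    }
}
```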
Edit: Already solved using RDD.collectAsMap()
I am trying to replicate the solution to the problem from pages 28-30 of http://on-demand.gputechconf.com/gtc/2016/presentation/S6424-michela-taufer-apache-spark.pdf
I have a HashMap that I instantiate outside of the map function. The HashMap contains the following data:
{1:2, 2:3, 3:2, 4:2, 5:3}
A previously defined RDD, previousRDD, has the type:
JavaPairRDD<Integer, Iterable<Tuple2<Integer, Integer>>>
and has the data:
1: [(1,2), (1,5)]
2: [(2,1), (2,3), (2,5)]
3: [(3,2), (3,4)]
4: [(4,3), (4,5)]
5: [(5,1), (5,2), (5,4)]
I try to create a new RDD with a flatMapToPair:
JavaPairRDD<Integer, Integer> newRDD = previousRDD.flatMapToPair(new PairFlatMapFunction<Tuple2<Integer, Iterable<Tuple2<Integer, Integer>>>, Integer, Integer>() {
@Override
public Iterator<Tuple2<Integer, Integer>> call(Tuple2<Integer, Iterable<Tuple2<Integer, Integer>>> integerIterableTuple2) throws Exception {
Integer count;
ArrayList<Tuple2<Integer, Integer>> list = new ArrayList<>();
count = hashMap.get(integerIterableTuple2._1);
for (Tuple2<Integer, Integer> t : integerIterableTuple2._2) {
Integer tcount = hashMap.get(t._2);
if (count < tcount || (count.equals(tcount) && integerIterableTuple2._1 < t._2)) {
list.add(t);
}
}
return list.iterator();
}
});
But in this, the hashMap.get(t._2) inside the for loop gets NULLs most of the time. I have checked that the proper values are inside the HashMap.
Is there a way to properly get the values of a HashMap inside a Spark function?
It should work. Spark should capture your variable, serialize it and send to each worker with each task. You might try broadcasting this map
sc.broadcast(hashMap)
and use the result instead of hashMap. It is more efficient memory-wise too (shared storage per executor).
I had a similar problem with class variables. You can try making your variable local or declaring one more, like this:
final Map<Integer, Integer> localMap = hashMap; // typed, so that get() returns Integer
JavaPairRDD<Integer, Integer> newRDD = previousRDD.flatMapToPair(
...
Integer tcount = localMap.get(t._2);
...
);
I think this is due to spark serialization mechanism. You can read more about it here.
I query the database many times; even though I cache some results, it still takes a long time.
List<Map<Long, Node>> aNodeMapList = new ArrayList<>();
Map<String, List<Map<String, Object>>> cacheRingMap = new ConcurrentHashMap<>();
for (Ring startRing : startRings) {
for (Ring endRing : endRings) {
Map<String, Object> nodeMapResult = getNodeMapResult(startRing, endRing, cacheRingMap);
Map<Long, Node> nodeMap = (Map<Long, Node>) nodeMapResult.get("nodeMap");
if (nodeMap.size() > 0) {
aNodeMapList.add(nodeMap);
}
}
}
getNodeMapResult is a function that searches the database according to startRing and endRing and caches the result in cacheRingMap, so that next time it may not need to search the database if the result already exists in cacheRingMap.
My team lead told me that multithreading could be used, so I changed it to use an ExecutorCompletionService. But now I have a question: is this thread safe when I use a ConcurrentHashMap to cache results in an ExecutorCompletionService?
Will it run faster after the change?
int totalThreadCount = startRings.size() * endRings.size();
ExecutorService threadPool2 = Executors.newFixedThreadPool(totalThreadCount > 4 ? 4 : 2);
CompletionService<Map<String, Object>> completionService = new ExecutorCompletionService<Map<String, Object>>(threadPool2);
for (Ring startRing : startRings) {
for (Ring endRing : endRings) {
completionService.submit(new Callable<Map<String, Object>>() {
@Override
public Map<String, Object> call() throws Exception {
return getNodeMapResult(startRing, endRing, cacheRingMap);
}
});
}
}
for (int i = 0; i < totalThreadCount; i++) {
Map<String, Object> nodeMapResult = completionService.take().get();
Map<Long, Node> nodeMap = (Map<Long, Node>) nodeMapResult.get("nodeMap");
if (nodeMap.size() > 0) {
aNodeMapList.add(nodeMap);
}
}
Is this thread safe when I use concurrentHashMap to cache result in executorCompletionService?
The ConcurrentHashMap itself is thread safe, as its name suggests ("Concurrent"). However, that doesn't mean that the code that uses it is thread safe.
For instance, if your code does the following:
SomeObject object = cacheRingMap.get(someKey); //get from cache
if (object == null){ //oh-oh, cache miss
object = getObjectFromDb(someKey); //get from the db
cacheRingMap.put(someKey, object); //put in cache for next time
}
Since the get and put aren't performed atomically in this example, two threads executing this code could end up both looking for the same key first in the cache, and then in the db. It's still thread-safe, but we performed two db lookups instead of just one. But this is just a simple example, more complex caching logic (say one that includes cache invalidation and removals from the cache map) can end up being not just wasteful, but actually incorrect. It all depends on how the map is used and what guarantees you need from it. I suggest you read the ConcurrentHashMap javadoc. See what it can guarantee, and what it cannot.
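One way to avoid the duplicate lookup in a simple case like this is ConcurrentHashMap.computeIfAbsent, which is documented to apply the mapping function at most once per key. The sketch below is illustrative only: LoadOnceCache is a made-up class and getObjectFromDb is a hypothetical stand-in for the real query that just counts how often it is called.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class LoadOnceCache {
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    private final AtomicInteger dbLookups = new AtomicInteger();

    // Hypothetical stand-in for the database query
    private String getObjectFromDb(String key) {
        dbLookups.incrementAndGet();
        return "value-for-" + key;
    }

    /** Atomic check-then-load: the loader runs at most once per key. */
    String get(String key) {
        return cache.computeIfAbsent(key, this::getObjectFromDb);
    }

    int lookups() {
        return dbLookups.get();
    }

    /** Lets several threads ask for the same key at once and returns the number of db lookups. */
    static int lookupsForSameKey(int threads) {
        LoadOnceCache cache = new LoadOnceCache();
        CountDownLatch start = new CountDownLatch(1);
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread worker = new Thread(() -> {
                try {
                    start.await(); // line all threads up before the lookup
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                cache.get("someKey");
            });
            worker.start();
            workers.add(worker);
        }
        start.countDown();
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return cache.lookups();
    }

    public static void main(String[] args) {
        System.out.println("db lookups for one key from 8 threads: " + lookupsForSameKey(8));
    }
}
```

The trade-off is that other threads updating the same part of the map may block while the loader runs, and the loader must not modify the map itself, so a slow database call inside computeIfAbsent is not always the right choice.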
Will it run fast after I change?
That depends on too many parameters to know in advance. How would the database handle the concurrent queries? How many queries are there? How fast is a single query? Etc. The best way of knowing is to actually try it out.
As a side note, if you're looking for ways to improve performance, you might want to try using a batch query. The flow would then be to search the cache for all the keys you need, gather the keys that are missing, and then send them all together in a single query to the database. In many cases, a single large query runs faster than a bunch of smaller ones.
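The batch flow described above might be sketched like this. BatchLookup, getAll and the batchLoader parameter are hypothetical names; batchLoader stands in for whatever single query fetches many keys at once, and a null from the cache is treated as a miss.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class BatchLookup {
    /**
     * Looks every key up in the cache first, then loads all missing keys with
     * one call to batchLoader and stores what it returns in the cache.
     */
    static <K, V> Map<K, V> getAll(Collection<K> keys, Map<K, V> cache,
                                   Function<Set<K>, Map<K, V>> batchLoader) {
        Map<K, V> result = new HashMap<>();
        Set<K> missing = new LinkedHashSet<>();
        for (K key : keys) {
            V cached = cache.get(key);
            if (cached != null) {
                result.put(key, cached);
            } else {
                missing.add(key);
            }
        }
        if (!missing.isEmpty()) {
            Map<K, V> loaded = batchLoader.apply(missing); // one query for all misses
            cache.putAll(loaded);
            result.putAll(loaded);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> cache = new ConcurrentHashMap<>();
        cache.put("a", 1);
        // Hypothetical batch loader: pretends the database maps each key to its length
        Map<String, Integer> found = getAll(Arrays.asList("a", "bb", "ccc"), cache, missing -> {
            Map<String, Integer> rows = new HashMap<>();
            for (String key : missing) {
                rows.put(key, key.length());
            }
            return rows;
        });
        System.out.println(found);
    }
}
```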
Also, you should check whether concurrent lookups in the map are faster than single threaded ones in your case. Perhaps parallelizing only the query itself, and not the cache lookup could yield better results in your case.
I have a class in which I am populating a map liveSocketsByDatacenter from a single background thread every 30 seconds and then I have a method getNextSocket which will be called by multiple reader threads to get a live socket available which uses the same map to get this info.
public class SocketManager {
private static final Random random = new Random();
private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
private final Map<Datacenters, List<SocketHolder>> liveSocketsByDatacenter = new HashMap<>();
private final ZContext ctx = new ZContext();
// Lazy Loaded Singleton Pattern
private static class Holder {
private static final SocketManager instance = new SocketManager();
}
public static SocketManager getInstance() {
return Holder.instance;
}
private SocketManager() {
connectToZMQSockets();
scheduler.scheduleAtFixedRate(new Runnable() {
public void run() {
updateLiveSockets();
}
}, 30, 30, TimeUnit.SECONDS);
}
private void connectToZMQSockets() {
Map<Datacenters, ImmutableList<String>> socketsByDatacenter = Utils.SERVERS;
for (Map.Entry<Datacenters, ImmutableList<String>> entry : socketsByDatacenter.entrySet()) {
List<SocketHolder> addedColoSockets = connect(entry.getKey(), entry.getValue(), ZMQ.PUSH);
liveSocketsByDatacenter.put(entry.getKey(), addedColoSockets);
}
}
private List<SocketHolder> connect(Datacenters colo, List<String> addresses, int socketType) {
List<SocketHolder> socketList = new ArrayList<>();
for (String address : addresses) {
try {
Socket client = ctx.createSocket(socketType);
// Set random identity to make tracing easier
String identity = String.format("%04X-%04X", random.nextInt(), random.nextInt());
client.setIdentity(identity.getBytes(ZMQ.CHARSET));
client.setTCPKeepAlive(1);
client.setSendTimeOut(7);
client.setLinger(0);
client.connect(address);
SocketHolder zmq = new SocketHolder(client, ctx, address, true);
socketList.add(zmq);
} catch (Exception ex) {
// log error
}
}
return socketList;
}
// this method will be called by multiple threads to get the next live socket
public Optional<SocketHolder> getNextSocket() {
Optional<SocketHolder> liveSocket = Optional.absent();
List<Datacenters> dcs = Datacenters.getOrderedDatacenters();
for (Datacenters dc : dcs) {
liveSocket = getLiveSocket(liveSocketsByDatacenter.get(dc));
if (liveSocket.isPresent()) {
break;
}
}
return liveSocket;
}
private Optional<SocketHolder> getLiveSocket(final List<SocketHolder> listOfEndPoints) {
if (!CollectionUtils.isEmpty(listOfEndPoints)) {
Collections.shuffle(listOfEndPoints);
for (SocketHolder obj : listOfEndPoints) {
if (obj.isLive()) {
return Optional.of(obj);
}
}
}
return Optional.absent();
}
private void updateLiveSockets() {
Map<Datacenters, ImmutableList<String>> socketsByDatacenter = Utils.SERVERS;
for (Entry<Datacenters, ImmutableList<String>> entry : socketsByDatacenter.entrySet()) {
List<SocketHolder> liveSockets = liveSocketsByDatacenter.get(entry.getKey());
List<SocketHolder> liveUpdatedSockets = new ArrayList<>();
for (SocketHolder liveSocket : liveSockets) {
Socket socket = liveSocket.getSocket();
String endpoint = liveSocket.getEndpoint();
Map<byte[], byte[]> holder = populateMap();
boolean status = SendToSocket.getInstance().execute(3, holder, socket);
boolean isLive = (status) ? true : false;
SocketHolder zmq = new SocketHolder(socket, liveSocket.getContext(), endpoint, isLive);
liveUpdatedSockets.add(zmq);
}
liveSocketsByDatacenter.put(entry.getKey(), liveUpdatedSockets);
}
}
}
As you can see in my above class:
From a single background thread which runs every 30 seconds, I populate liveSocketsByDatacenter map with all the live sockets.
And then from multiple threads, I call getNextSocket method to give me live socket available which uses liveSocketsByDatacenter map to get the required information.
Is my above code thread safe and all the reader threads will see liveSocketsByDatacenter accurately? Since I am modifying liveSocketsByDatacenter map every 30 seconds from a single background thread and then from a lot of reader threads, I am calling getNextSocket method so I am not sure if I did anything wrong here.
It looks like there might be a thread safety issue in my "getLiveSocket" method as every read gets a shared ArrayList out of the map and shuffles it? And there might be few more places as well which I might have missed. What is the best way to fix these thread safety issues in my code?
If there is any better way to rewrite this, then I am open for that as well.
To be thread-safe, your code must synchronize any access to all shared mutable state.
Here you share liveSocketsByDatacenter, an instance of HashMap (a non-thread-safe implementation of Map), which can potentially be read (by updateLiveSockets and getNextSocket) and modified (by connectToZMQSockets and updateLiveSockets) concurrently without any synchronization. That alone is enough to make your code not thread safe. Moreover, the values of this Map are instances of ArrayList (a non-thread-safe implementation of List), which can also potentially be read (by getNextSocket and updateLiveSockets) and modified (by getLiveSocket, more precisely by Collections.shuffle) concurrently.
The simple way to fix your 2 thread safety issues could be to:
1. Use a ConcurrentHashMap instead of a HashMap for your variable liveSocketsByDatacenter, as it is a natively thread-safe implementation of Map.
2. Put an unmodifiable view of your ArrayList instances as the values of your map, using Collections.unmodifiableList(List<? extends T> list). Your lists can then no longer be modified through the map, which makes them safe to share as long as nothing else modifies the underlying ArrayList.
For example:
liveSocketsByDatacenter.put(
entry.getKey(), Collections.unmodifiableList(liveUpdatedSockets)
);
3. Rewrite your method getLiveSocket to avoid calling Collections.shuffle directly on your list. You could, for example, shuffle only the list of live sockets instead of all sockets, or use a copy of your list (for example new ArrayList<>(listOfEndPoints)) instead of the list itself.
For example:
private Optional<SocketHolder> getLiveSocket(final List<SocketHolder> listOfEndPoints) {
if (!CollectionUtils.isEmpty(listOfEndPoints)) {
// The list of live sockets
List<SocketHolder> liveOnly = new ArrayList<>(listOfEndPoints.size());
for (SocketHolder obj : listOfEndPoints) {
if (obj.isLive()) {
liveOnly.add(obj);
}
}
if (!liveOnly.isEmpty()) {
// The list is not empty so we shuffle it and return the first element
Collections.shuffle(liveOnly);
return Optional.of(liveOnly.get(0));
}
}
return Optional.absent();
}
For #1, as you seem to read your map frequently and modify it rarely (only once every 30 seconds), you could consider rebuilding your map and then sharing an immutable version of it (using Collections.unmodifiableMap(Map<? extends K,? extends V> m)) every 30 seconds. This approach is very efficient in mostly-read scenarios, as you no longer pay the price of any synchronization mechanism to access the content of your map.
Your code would then be:
// Your variable is no more final, it is now volatile to ensure that all
// threads will see the same thing at all time by getting it from
// the main memory instead of the CPU cache
private volatile Map<Datacenters, List<SocketHolder>> liveSocketsByDatacenter
= Collections.unmodifiableMap(new HashMap<>());
private void connectToZMQSockets() {
    Map<Datacenters, ImmutableList<String>> socketsByDatacenter = Utils.SERVERS;
    // The map in which I put all the live sockets
    Map<Datacenters, List<SocketHolder>> liveSockets = new HashMap<>();
    for (Map.Entry<Datacenters, ImmutableList<String>> entry :
            socketsByDatacenter.entrySet()) {
        List<SocketHolder> addedColoSockets = connect(
            entry.getKey(), entry.getValue(), ZMQ.PUSH
        );
        liveSockets.put(entry.getKey(), Collections.unmodifiableList(addedColoSockets));
    }
    // Set the new content of my map as an unmodifiable map
    this.liveSocketsByDatacenter = Collections.unmodifiableMap(liveSockets);
}
public Optional<SocketHolder> getNextSocket() {
    // For the sake of consistency make sure to use the same map instance
    // in the whole implementation of my method by getting my entries
    // from the local variable instead of the member variable
    Map<Datacenters, List<SocketHolder>> liveSocketsByDatacenter =
        this.liveSocketsByDatacenter;
    ...
}
...
// Added the synchronized modifier to prevent concurrent updates.
// It is needed because building the new map starts from the old one,
// so both steps must be done atomically to prevent consistency issues
private synchronized void updateLiveSockets() {
    // Initialize my new map with the current map content
    Map<Datacenters, List<SocketHolder>> liveSocketsByDatacenter =
        new HashMap<>(this.liveSocketsByDatacenter);
    Map<Datacenters, ImmutableList<String>> socketsByDatacenter = Utils.SERVERS;
    for (Entry<Datacenters, ImmutableList<String>> entry : socketsByDatacenter.entrySet()) {
        ...
        // Store the refreshed list in the copy, which is the map published below
        liveSocketsByDatacenter.put(entry.getKey(), Collections.unmodifiableList(liveUpdatedSockets));
    }
    // Set the new content of my map as an unmodifiable map
    this.liveSocketsByDatacenter = Collections.unmodifiableMap(liveSocketsByDatacenter);
}
Your field liveSocketsByDatacenter could also be of type AtomicReference<Map<Datacenters, List<SocketHolder>>>. It would then be final, and your map would still be stored in a volatile variable, but inside the AtomicReference class.
The previous code would then be:
private final AtomicReference<Map<Datacenters, List<SocketHolder>>> liveSocketsByDatacenter
    = new AtomicReference<>(Collections.unmodifiableMap(new HashMap<>()));
...
private void connectToZMQSockets() {
    ...
    // Update the map content
    this.liveSocketsByDatacenter.set(Collections.unmodifiableMap(liveSockets));
}
public Optional<SocketHolder> getNextSocket() {
    // For the sake of consistency make sure to use the same map instance
    // in the whole implementation of my method by getting my entries
    // from the local variable instead of the member variable
    Map<Datacenters, List<SocketHolder>> liveSocketsByDatacenter =
        this.liveSocketsByDatacenter.get();
    ...
}
// Added the synchronized modifier to prevent concurrent updates.
// It is needed because building the new map starts from the old one,
// so both steps must be done atomically to prevent consistency issues
private synchronized void updateLiveSockets() {
    // Initialize my new map with the current map content
    Map<Datacenters, List<SocketHolder>> liveSocketsByDatacenter =
        new HashMap<>(this.liveSocketsByDatacenter.get());
    ...
    // Update the map content
    this.liveSocketsByDatacenter.set(Collections.unmodifiableMap(liveSocketsByDatacenter));
}
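When the rebuild step has no side effects, the copy, modify and publish sequence can also be written with AtomicReference.updateAndGet, which retries under contention instead of taking a lock. The following is only a minimal sketch under stated assumptions: String keys and values stand in for Datacenters and SocketHolder, and SnapshotRegistry, current and replace are hypothetical names. It would not suit an update that opens or pings sockets inside the function, because the function may run more than once.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the class that owns liveSocketsByDatacenter.
class SnapshotRegistry {
    private final AtomicReference<Map<String, List<String>>> snapshot =
        new AtomicReference<>(Collections.unmodifiableMap(new HashMap<String, List<String>>()));

    // Readers take the reference once and work only on that immutable snapshot.
    Map<String, List<String>> current() {
        return snapshot.get();
    }

    // Writers copy the current map, change the copy and publish it atomically.
    // updateAndGet may apply the function several times under contention,
    // so the function must be free of side effects.
    void replace(String key, List<String> values) {
        List<String> frozen = Collections.unmodifiableList(new ArrayList<>(values));
        snapshot.updateAndGet(old -> {
            Map<String, List<String>> copy = new HashMap<>(old);
            copy.put(key, frozen);
            return Collections.unmodifiableMap(copy);
        });
    }
}
```

A reader that already holds a snapshot keeps seeing the old content; only later calls to current() observe the replacement.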
As explained in detail elsewhere (e.g. here), if multiple threads access a hash map concurrently and at least one of them modifies the map structurally, the map must be synchronized externally to avoid an inconsistent view of its contents.
So, to be thread-safe, you should use either the Collections.synchronizedMap() method or a ConcurrentHashMap.
//synchronizedMap
private final Map<Datacenters, List<SocketHolder>> liveSocketsByDatacenter = Collections.synchronizedMap(new HashMap<Datacenters, List<SocketHolder>>());
or
//ConcurrentHashMap
private final Map<Datacenters, List<SocketHolder>> liveSocketsByDatacenter = new ConcurrentHashMap<Datacenters, List<SocketHolder>>();
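To check either option in practice, several threads can write distinct keys into the map at once and the final size can be compared with the number of keys written. This is only a minimal sketch; ConcurrentPutDemo and fill are hypothetical names, and Integer/String entries stand in for the real key and value types.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentPutDemo {
    // Starts `threads` writers, each inserting `perThread` distinct keys,
    // waits for all of them and returns the final size of the map.
    static int fill(Map<Integer, String> map, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int offset = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.put(offset + i, "v");
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for writers", e);
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        // Both maps are thread-safe, so each call reports 8 * 10_000 = 80000.
        System.out.println(fill(new ConcurrentHashMap<Integer, String>(), 8, 10_000));
        System.out.println(fill(Collections.synchronizedMap(new HashMap<Integer, String>()), 8, 10_000));
    }
}
```

Passing a plain HashMap to fill gives no such guarantee: entries may be lost or the run may misbehave in other ways, and the outcome differs from run to run.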
Since you have a highly concurrent application that modifies and reads key-value pairs in different threads, you should also have a look at the producer-consumer pattern, e.g. here.
It seems that you can safely use ConcurrentHashMap here instead of a regular HashMap, and it should work.
With your current approach, using a regular HashMap, you need to synchronize the methods getNextSocket, connectToZMQSockets and updateLiveSockets (everywhere you update or read the HashMap), either with the synchronized keyword on those methods or with another lock on a monitor common to all of them. This is not because of ConcurrentModificationException, but because without synchronization reading threads may not see the updated values.
There is also a problem with concurrent modification in getLiveSocket. One of the simplest ways to avoid it is to copy the list of endpoints to a new list before shuffling, like this:
private Optional<SocketHolder> getLiveSocket(final List<SocketHolder> endPoints) {
    // Check the argument before copying: new ArrayList<>(null) would throw
    if (!CollectionUtils.isEmpty(endPoints)) {
        // Shuffle a private copy so the shared list is never mutated
        List<SocketHolder> listOfEndPoints = new ArrayList<SocketHolder>(endPoints);
        Collections.shuffle(listOfEndPoints);
        for (SocketHolder obj : listOfEndPoints) {
            if (obj.isLive()) {
                return Optional.of(obj);
            }
        }
    }
    return Optional.absent();
}
Using ConcurrentHashMap should make your code thread-safe. Alternatively, use synchronized methods to access the existing HashMap.
I am having some difficulty when using Map.putAll(). Instead of updating / adding particular records to my main map, it is overwriting the entries:
ConcurrentMap<String, ConcurrentHashMap<CardType, Card>> cache = new ConcurrentHashMap<String, ConcurrentHashMap<CardType, Card>>();
The three separate maps are generated as below:
ConcurrentMap<String, ConcurrentHashMap<CardType, Card>> businessCardCache = buildBusinesscardCacheValues(connection, getBusinessCards);
ConcurrentMap<String, ConcurrentHashMap<CardType, Card>> personalCardCache = buildPersonalcardCacheValues(connection, getPersonalCards);
ConcurrentMap<String, ConcurrentHashMap<CardType, Card>> socialCardCache = buildSocialcardCacheValues(connection, getSocialCard);
cache.putAll(businessCardCache);
cache.putAll(personalCardCache);
cache.putAll(socialCardCache);
What should happen is that user ben, for example, is the key and has a business, a personal and a social card. What in fact happens is that he only ends up with a social card; I assume that map is added last and therefore overwrites the previous ones.
How should I approach modifying this?
Thanks
With your current initialization of cache, cache.putAll(personalCardCache); replaces the values added by cache.putAll(businessCardCache); for keys that appear in both maps, because putAll replaces the whole inner map stored under a key instead of merging into it.
If you want cache to contain all the cards of each user (taken from all 3 input maps), you should initialize it in a different way:
for (String key : businessCardCache.keySet()) {
    ConcurrentHashMap<CardType, Card> cards = null;
    if (cache.containsKey(key)) {
        cards = cache.get(key);
    } else {
        cards = new ConcurrentHashMap<CardType, Card>();
        // Register the new inner map, otherwise the cards added below are lost
        cache.put(key, cards);
    }
    cards.putAll(businessCardCache.get(key));
}
Then you do the same for the other 2 maps.
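To avoid repeating the loop three times, the merge can be factored into one helper. The following is a minimal sketch, assuming Java 8 or later for computeIfAbsent; CardCacheMerger and mergeInto are hypothetical names, and plain String values stand in for CardType and Card so that the example is self-contained.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class CardCacheMerger {
    // Copies every inner entry of source into the inner map that target
    // holds under the same key, creating that inner map when it is missing.
    static <K, T, C> void mergeInto(ConcurrentMap<K, ConcurrentHashMap<T, C>> target,
                                    Map<K, ? extends Map<T, C>> source) {
        for (Map.Entry<K, ? extends Map<T, C>> entry : source.entrySet()) {
            target.computeIfAbsent(entry.getKey(), k -> new ConcurrentHashMap<>())
                  .putAll(entry.getValue());
        }
    }

    public static void main(String[] args) {
        // Made-up sample data: one user with a card in each of two source maps.
        ConcurrentMap<String, ConcurrentHashMap<String, String>> business = new ConcurrentHashMap<>();
        business.computeIfAbsent("ben", k -> new ConcurrentHashMap<>()).put("BUSINESS", "card-1");
        ConcurrentMap<String, ConcurrentHashMap<String, String>> personal = new ConcurrentHashMap<>();
        personal.computeIfAbsent("ben", k -> new ConcurrentHashMap<>()).put("PERSONAL", "card-2");

        ConcurrentMap<String, ConcurrentHashMap<String, String>> cache = new ConcurrentHashMap<>();
        mergeInto(cache, business);
        mergeInto(cache, personal);
        System.out.println(cache.get("ben").size()); // prints 2
    }
}
```

Because the helper copies entries into inner maps owned by cache, later changes to the source maps do not leak into the merged result.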