I have a class in which a map, liveSocketsByDatacenter, is populated from a single background thread every 30 seconds. I also have a method, getNextSocket, which is called by multiple reader threads to get an available live socket and which uses the same map to get that information.
public class SocketManager {
private static final Random random = new Random();
private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
private final Map<Datacenters, List<SocketHolder>> liveSocketsByDatacenter = new HashMap<>();
private final ZContext ctx = new ZContext();
// Lazy Loaded Singleton Pattern
private static class Holder {
private static final SocketManager instance = new SocketManager();
}
public static SocketManager getInstance() {
return Holder.instance;
}
private SocketManager() {
connectToZMQSockets();
scheduler.scheduleAtFixedRate(new Runnable() {
public void run() {
updateLiveSockets();
}
}, 30, 30, TimeUnit.SECONDS);
}
private void connectToZMQSockets() {
Map<Datacenters, ImmutableList<String>> socketsByDatacenter = Utils.SERVERS;
for (Map.Entry<Datacenters, ImmutableList<String>> entry : socketsByDatacenter.entrySet()) {
List<SocketHolder> addedColoSockets = connect(entry.getKey(), entry.getValue(), ZMQ.PUSH);
liveSocketsByDatacenter.put(entry.getKey(), addedColoSockets);
}
}
private List<SocketHolder> connect(Datacenters colo, List<String> addresses, int socketType) {
List<SocketHolder> socketList = new ArrayList<>();
for (String address : addresses) {
try {
Socket client = ctx.createSocket(socketType);
// Set random identity to make tracing easier
String identity = String.format("%04X-%04X", random.nextInt(), random.nextInt());
client.setIdentity(identity.getBytes(ZMQ.CHARSET));
client.setTCPKeepAlive(1);
client.setSendTimeOut(7);
client.setLinger(0);
client.connect(address);
SocketHolder zmq = new SocketHolder(client, ctx, address, true);
socketList.add(zmq);
} catch (Exception ex) {
// log error
}
}
return socketList;
}
// this method will be called by multiple threads to get the next live socket
public Optional<SocketHolder> getNextSocket() {
Optional<SocketHolder> liveSocket = Optional.absent();
List<Datacenters> dcs = Datacenters.getOrderedDatacenters();
for (Datacenters dc : dcs) {
liveSocket = getLiveSocket(liveSocketsByDatacenter.get(dc));
if (liveSocket.isPresent()) {
break;
}
}
return liveSocket;
}
private Optional<SocketHolder> getLiveSocket(final List<SocketHolder> listOfEndPoints) {
if (!CollectionUtils.isEmpty(listOfEndPoints)) {
Collections.shuffle(listOfEndPoints);
for (SocketHolder obj : listOfEndPoints) {
if (obj.isLive()) {
return Optional.of(obj);
}
}
}
return Optional.absent();
}
private void updateLiveSockets() {
Map<Datacenters, ImmutableList<String>> socketsByDatacenter = Utils.SERVERS;
for (Entry<Datacenters, ImmutableList<String>> entry : socketsByDatacenter.entrySet()) {
List<SocketHolder> liveSockets = liveSocketsByDatacenter.get(entry.getKey());
List<SocketHolder> liveUpdatedSockets = new ArrayList<>();
for (SocketHolder liveSocket : liveSockets) {
Socket socket = liveSocket.getSocket();
String endpoint = liveSocket.getEndpoint();
Map<byte[], byte[]> holder = populateMap();
boolean status = SendToSocket.getInstance().execute(3, holder, socket);
boolean isLive = (status) ? true : false;
SocketHolder zmq = new SocketHolder(socket, liveSocket.getContext(), endpoint, isLive);
liveUpdatedSockets.add(zmq);
}
liveSocketsByDatacenter.put(entry.getKey(), liveUpdatedSockets);
}
}
}
As you can see in the class above:
From a single background thread that runs every 30 seconds, I populate the liveSocketsByDatacenter map with all the live sockets.
Then, from multiple threads, I call the getNextSocket method to get an available live socket; it uses the liveSocketsByDatacenter map to get the required information.
Is my code above thread safe, and will all the reader threads see liveSocketsByDatacenter accurately? I am modifying liveSocketsByDatacenter every 30 seconds from a single background thread and calling getNextSocket from many reader threads, so I am not sure whether I have done anything wrong here.
It looks like there might be a thread-safety issue in my getLiveSocket method, since every read gets a shared ArrayList out of the map and shuffles it. There may be a few more places I have missed as well. What is the best way to fix these thread-safety issues in my code?
If there is a better way to rewrite this, I am open to that as well.
To be thread-safe, your code must synchronize any access to all shared mutable state.
Here you share liveSocketsByDatacenter, an instance of HashMap, a non-thread-safe implementation of Map, which can potentially be read concurrently (by updateLiveSockets and getNextSocket) and modified (by connectToZMQSockets and updateLiveSockets) without any synchronization of those accesses; that alone is enough to make your code non-thread-safe. Moreover, the values of this map are instances of ArrayList, a non-thread-safe implementation of List, which can likewise be read concurrently (by getNextSocket and updateLiveSockets) and modified (by getLiveSocket, more precisely by Collections.shuffle).
A simple way to fix these two thread-safety issues would be to:
1. Use a ConcurrentHashMap instead of a HashMap for liveSocketsByDatacenter, as it is a natively thread-safe implementation of Map.
2. Put unmodifiable versions of your ArrayList instances into the map, using Collections.unmodifiableList(List<? extends T> list); your lists would then be immutable and therefore thread-safe.
For example:
liveSocketsByDatacenter.put(
entry.getKey(), Collections.unmodifiableList(liveUpdatedSockets)
);
3. Rewrite your method getLiveSocket to avoid calling Collections.shuffle directly on the stored list; you could, for example, shuffle only the list of live sockets instead of all sockets, or shuffle a copy of the list (for example new ArrayList<>(listOfEndPoints)) instead of the list itself.
For example:
private Optional<SocketHolder> getLiveSocket(final List<SocketHolder> listOfEndPoints) {
if (!CollectionUtils.isEmpty(listOfEndPoints)) {
// The list of live sockets
List<SocketHolder> liveOnly = new ArrayList<>(listOfEndPoints.size());
for (SocketHolder obj : listOfEndPoints) {
if (obj.isLive()) {
liveOnly.add(obj);
}
}
if (!liveOnly.isEmpty()) {
// The list is not empty, so we shuffle it and return the first element
Collections.shuffle(liveOnly);
return Optional.of(liveOnly.get(0));
}
}
return Optional.absent();
}
For #1, as you seem to frequently read and rarely (only once every 30 seconds) modify your map, you could consider rebuilding your map and then publishing its immutable version (using Collections.unmodifiableMap(Map<? extends K,? extends V> m)) every 30 seconds. This approach is very efficient in a mostly-read scenario, as you no longer pay the price of any synchronization mechanism to access the content of your map.
Your code would then be:
// The field is no longer final; it is now volatile to ensure that all
// threads always see the latest published map, by reading it from
// main memory instead of a stale CPU-cached copy
private volatile Map<Datacenters, List<SocketHolder>> liveSocketsByDatacenter
= Collections.unmodifiableMap(new HashMap<>());
private void connectToZMQSockets() {
Map<Datacenters, ImmutableList<String>> socketsByDatacenter = Utils.SERVERS;
// The map in which I put all the live sockets
Map<Datacenters, List<SocketHolder>> liveSockets = new HashMap<>();
for (Map.Entry<Datacenters, ImmutableList<String>> entry :
socketsByDatacenter.entrySet()) {
List<SocketHolder> addedColoSockets = connect(
entry.getKey(), entry.getValue(), ZMQ.PUSH
);
liveSockets.put(entry.getKey(), Collections.unmodifiableList(addedColoSockets));
}
// Set the new content of my map as an unmodifiable map
this.liveSocketsByDatacenter = Collections.unmodifiableMap(liveSockets);
}
public Optional<SocketHolder> getNextSocket() {
// For the sake of consistency make sure to use the same map instance
// in the whole implementation of my method by getting my entries
// from the local variable instead of the member variable
Map<Datacenters, List<SocketHolder>> liveSocketsByDatacenter =
this.liveSocketsByDatacenter;
...
}
...
// Added the synchronized modifier to prevent concurrent executions:
// building the new map starts from the old one, so reading the old map
// and publishing the new one must happen atomically to avoid consistency issues
private synchronized void updateLiveSockets() {
// Initialize the new map with the current map content
Map<Datacenters, List<SocketHolder>> liveSocketsByDatacenter =
new HashMap<>(this.liveSocketsByDatacenter);
Map<Datacenters, ImmutableList<String>> socketsByDatacenter = Utils.SERVERS;
for (Entry<Datacenters, ImmutableList<String>> entry : socketsByDatacenter.entrySet()) {
...
// Replace this datacenter's entry with the refreshed, unmodifiable list
liveSocketsByDatacenter.put(entry.getKey(), Collections.unmodifiableList(liveUpdatedSockets));
}
// Publish the new content as an unmodifiable map
this.liveSocketsByDatacenter = Collections.unmodifiableMap(liveSocketsByDatacenter);
}
Your field liveSocketsByDatacenter could also be of type AtomicReference<Map<Datacenters, List<SocketHolder>>>. It could then be final; your map would still be stored in a volatile variable, but inside the AtomicReference.
The previous code would then be:
private final AtomicReference<Map<Datacenters, List<SocketHolder>>> liveSocketsByDatacenter
= new AtomicReference<>(Collections.unmodifiableMap(new HashMap<>()));
...
private void connectToZMQSockets() {
...
// Update the map content
this.liveSocketsByDatacenter.set(Collections.unmodifiableMap(liveSockets));
}
public Optional<SocketHolder> getNextSocket() {
// For the sake of consistency make sure to use the same map instance
// in the whole implementation of my method by getting my entries
// from the local variable instead of the member variable
Map<Datacenters, List<SocketHolder>> liveSocketsByDatacenter =
this.liveSocketsByDatacenter.get();
...
}
// Added the synchronized modifier to prevent concurrent executions:
// building the new map starts from the old one, so reading the old map
// and publishing the new one must happen atomically to avoid consistency issues
private synchronized void updateLiveSockets() {
// Initialize my new map with the current map content
Map<Datacenters, List<SocketHolder>> liveSocketsByDatacenter =
new HashMap<>(this.liveSocketsByDatacenter.get());
...
// Update the map content
this.liveSocketsByDatacenter.set(Collections.unmodifiableMap(liveSocketsByDatacenter));
}
As you can read in detail e.g. here, if multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally to avoid an inconsistent view of the contents.
So to be thread-safe you should use either the Collections.synchronizedMap() method or a ConcurrentHashMap.
//synchronizedMap
private final Map<Datacenters, List<SocketHolder>> liveSocketsByDatacenter = Collections.synchronizedMap(new HashMap<Datacenters, List<SocketHolder>>());
or
//ConcurrentHashMap
private final Map<Datacenters, List<SocketHolder>> liveSocketsByDatacenter = new ConcurrentHashMap<Datacenters, List<SocketHolder>>();
As you have a highly concurrent application modifying and reading key values in different threads, you should also have a look at the producer-consumer principle, e.g. here.
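For illustration, here is a minimal producer-consumer sketch using a BlockingQueue; the class and item names are generic placeholders, not taken from the code above:
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumerSketch {
    // Bounded queue shared between the producer and the consumer
    private static final BlockingQueue<String> queue = new ArrayBlockingQueue<>(100);

    public static void main(String[] args) {
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 10; i++) {
                    queue.put("item-" + i); // blocks while the queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < 10; i++) {
                    String item = queue.take(); // blocks while the queue is empty
                    System.out.println("consumed " + item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        consumer.start();
    }
}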
It seems that you can safely use a ConcurrentHashMap here instead of a regular HashMap, and it should work.
With your current approach, using a regular HashMap, you need to synchronize the methods getNextSocket, connectToZMQSockets and updateLiveSockets (everywhere you update or read the HashMap), for example by putting the synchronized keyword on those methods or by using another lock on a monitor common to all of them. This is needed not because of ConcurrentModificationException, but because without synchronization the reading threads can see stale values.
There is also a problem with concurrent modification in getLiveSocket; one of the simplest ways to avoid it is to copy listOfEndPoints to a new list before shuffling, like this:
private Optional<SocketHolder> getLiveSocket(final List<SocketHolder> endPoints) {
List<SocketHolder> listOfEndPoints = new ArrayList<SocketHolder>(endPoints);
if (!CollectionUtils.isEmpty(listOfEndPoints)) {
Collections.shuffle(listOfEndPoints);
for (SocketHolder obj : listOfEndPoints) {
if (obj.isLive()) {
return Optional.of(obj);
}
}
}
return Optional.absent();
}
Using a ConcurrentHashMap should make your code thread-safe. Alternatively, use synchronized methods for every access to the existing HashMap.
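A minimal sketch of that synchronized-methods alternative, with generic names rather than the classes from the question; every access to the non-thread-safe HashMap goes through synchronized methods on the same object, so an update is visible to all subsequent readers:
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GuardedRegistry<K, V> {
    // Plain HashMap, safe here only because every access below holds the same lock (this)
    private final Map<K, List<V>> registry = new HashMap<>();

    public synchronized void replaceAll(Map<K, List<V>> fresh) {
        registry.clear();
        registry.putAll(fresh);
    }

    public synchronized List<V> lookup(K key) {
        return registry.get(key); // may be null if the key is absent
    }
}
Note that this serializes every reader behind the same lock as the periodic refresh, which is why the copy-and-publish (volatile/AtomicReference) approach in the first answer is usually preferable for a read-mostly workload.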
Related
I have a map (HashMap<String, Map<String, String>> mapTest) over which I loop and perform several operations.
HashMap<String, Map<String, String>> mapTest = new HashMap<>();
ArrayList<Object> testFinals = new ArrayList<>();
for (Map.Entry<String, Map<String, String>> entry : mapTest.entrySet()) {
// in here I do a lot of things, like nested for loops, ifs, etc.
// the key result is that each time the loop body runs I get this:
List<Object> resultExp = methodXYZ(String, String, String);
testFinals.addAll(resultExp);
}
At this point I have to wait before proceeding, since I need testFinals fully populated before I can advance.
Now, what I need to do is this:
mapTest can have around 400 entries to iterate over. I want to schedule 4 threads and assign roughly 100 entries of that for loop to thread 1, the next 100 entries of mapTest to thread 2, and so on.
I've already tried a few solutions, like this one:
ExecutorService taskExecutor = Executors.newFixedThreadPool(4);
while(...) {
taskExecutor.execute(new MyTask());
}
taskExecutor.shutdown();
try {
taskExecutor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
} catch (InterruptedException e) {
...
}
but I can't correctly adapt this, or a similar working solution, to the map iteration I have now.
HashMap is not a thread-safe data structure.
When working with concurrency, consider that threads must obtain, hold and release locks on a shared variable.
This is done at the field level, not on the content.
In short: the HashMap is locked for access by a specific thread, not just some particular entry.
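To address the actual splitting, here is a rough sketch (not the asker's code; it assumes methodXYZ is thread-safe, and arg1..arg3 are placeholder arguments) that partitions the entries into four slices, runs each slice on its own thread, collects results into a thread-safe list, and waits for everything with invokeAll:
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

List<Object> testFinals = Collections.synchronizedList(new ArrayList<>());
List<Map.Entry<String, Map<String, String>>> entries = new ArrayList<>(mapTest.entrySet());
int chunk = (entries.size() + 3) / 4; // roughly 100 entries per slice for 400 rows

ExecutorService taskExecutor = Executors.newFixedThreadPool(4);
List<Callable<Void>> tasks = new ArrayList<>();
for (int i = 0; i < entries.size(); i += chunk) {
    List<Map.Entry<String, Map<String, String>>> slice =
            entries.subList(i, Math.min(i + chunk, entries.size()));
    tasks.add(() -> {
        for (Map.Entry<String, Map<String, String>> entry : slice) {
            // same work as the original loop body; arg1..arg3 stand for its three String arguments
            List<Object> resultExp = methodXYZ(arg1, arg2, arg3);
            testFinals.addAll(resultExp); // the synchronized list makes this safe
        }
        return null;
    });
}
try {
    taskExecutor.invokeAll(tasks); // returns only after every slice has finished
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
} finally {
    taskExecutor.shutdown();
}
// testFinals is now fully populated and safe to use from this thread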
Is there a way to simulate ConcurrentHashMap.computeIfAbsent but without assigning the entry to the map? I just need the instance that the mapping function would have generated when the entry is not yet in the map. This runs across multiple threads, so I need it to be thread-safe.
I have tried a synchronized block, but that locks the whole map; I want something that behaves like computeIfAbsent.
synchronized(hashMap) {
if (hashMap.containsKey(key)) {
rs = hashMap.get(key);
} else {
rs = createNewInstance();
}
}
/* this would be perfect, but I don't want the new instance to end up in the hashMap */
rs = hashMap.computeIfAbsent(key, k -> createNewInstance());
I guess you could do something like:
AtomicReference<Foo> holder = new AtomicReference<>();
map.computeIfAbsent(key, k -> {
holder.set(/* compute the value */);
return null; // returning null means no value is stored.
});
/* Use holder.get() to access value */
but this is a pretty unusual requirement.
You can use map.getOrDefault(key, defaultValue) instead.
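For example (a one-line sketch reusing rs, hashMap, key and createNewInstance from the question; note that the default is evaluated eagerly, even when the key is already present):
rs = hashMap.getOrDefault(key, createNewInstance());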
There is no such method. You could use the fact that get returns null if the value is not present:
public static void main(String[] args) {
ConcurrentMap<String, String> map = new ConcurrentHashMap<>();
Object rs = Optional.ofNullable(map.get("key"))
.orElseGet(() -> "created instance");
}
I've worked with ConcurrentHashMaps, but I'm not quite sure if that will cover all of the bases here.
I have a Spring Component. This Component will contain a Map. This will simply be a quick reference for objects in an external service. If the map does not contain a matching String, it will make a call to the external service, retrieve the object, and store it in the mapping. Then other classes can use the mapping for quick retrieval and usage. As such, there are only put() and get() operations performed on the map. Entries are never removed.
That being said, I'm a little concerned that ConcurrentHashMap may not provide the atomic control I'd like. Fetching SomeObject from the external service is potentially expensive. I'd rather not have two separate threads calling in at nearly the same time, resulting in multiple calls for the same value to the external service.
The idea is this:
Map<String, SomeObject> map = new ConcurrentHashMap<String, SomeObject>();
public SomeObject getSomeObject(String key){
if (!map.containsKey(key)){
map.put(key, retrieveSomeObjectFromService(key));
}
return map.get(key);
}
Or this:
Map<String, SomeObject> map = new HashMap<String, SomeObject>();
public SomeObject getSomeObject(String key){
synchronized(map){
if (!map.containsKey(key)){
map.put(key, retrieveSomeObjectFromService(key));
}
}
return map.get(key);
}
The former is certainly simpler, but the latter would ensure that two or more threads won't try to trigger a fetch of the same SomeObject at the same time. Alternatively, I suppose I could lock only those gets that try to retrieve a SomeObject which is already in the process of being fetched, without blocking retrieval of SomeObjects that already exist, but that would require a wait mechanism keyed on the various string values and I'm not sure how best to implement that.
I would suggest you do a little bit of both!
Fast path, just 1 get out of the concurrent hashmap.
Slow path, full sync and lock
private final ConcurrentHashMap<String, Object> map = new ConcurrentHashMap<String, Object>();
private final ReentrantLock lock = new ReentrantLock();
public Object getSomeObject(String key) {
Object value = map.get(key);
if (value == null) {
lock.lock();
try {
value = map.get(key);
if (value == null) {
value = retrieveSomeObjectFromService(key);
map.put(key, value);
}
} finally {
lock.unlock();
}
}
return value;
}
Do you understand why we need the second get inside the lock? Leaving it out leaves a window in which two threads could each create the object, so different copies of it would end up floating around.
Also, do you see why assigning the result to value and null-checking it is better than using the containsKey method? A containsKey followed by a get performs two hash lookups; a single get cuts that lookup cost in half.
Another version Peter suggested: fewer lines of code, but not my personal preference:
private final ConcurrentHashMap<String, Object> map = new ConcurrentHashMap<String, Object>();
public Object getSomeObject(String key) {
Object value = map.get(key);
if (value == null) {
synchronized (map) {
value = map.get(key);
if (value == null) {
value = retrieveSomeObjectFromService(key);
map.put(key, value);
}
}
}
return value;
}
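For completeness (not part of the answer above): on Java 8+, ConcurrentHashMap.computeIfAbsent gives the same compute-at-most-once guarantee in a single call. A sketch reusing the names from the answer; keep in mind the mapping function runs while part of the map is locked, so a slow remote call will briefly block other operations that land on the same bin:
private final ConcurrentHashMap<String, Object> map = new ConcurrentHashMap<>();

public Object getSomeObject(String key) {
    // The mapping function is invoked at most once per absent key;
    // concurrent callers for the same key wait for the computed value.
    return map.computeIfAbsent(key, k -> retrieveSomeObjectFromService(k));
}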
I am writing a program in which I need to insert enum values into a HashMap as keys. Can we really do that? I have tried it in many ways but failed.
Can anyone please help me? In the program I need a HashMap containing 4 thread pools (whose names act as keys), each mapped to a ThreadPoolExecutor object.
Below given is my code :
public class MyThreadpoolExcecutorPgm {
enum ThreadpoolName
{
DR,
PQ,
EVENT,
MISCELLENEOUS;
}
private static String threadName;
private static HashMap<String, ThreadPoolExecutor> threadpoolExecutorHash;
public MyThreadpoolExcecutorPgm(String p_threadName) {
threadName = p_threadName;
}
public static void fillthreadpoolExecutorHash() {
int poolsize = 3;
int maxpoolsize = 3;
long keepAliveTime = 10;
ThreadPoolExecutor tp = null;
threadpoolExecutorHash = new HashMap<String, ThreadPoolExecutor>();
ThreadpoolName poolName ;
tp = new ThreadPoolExecutor(poolsize, maxpoolsize, keepAliveTime,
TimeUnit.SECONDS, new ArrayBlockingQueue<Runnable>(5));
threadpoolExecutorHash.put(poolName,tp); //Here i am failing to implement currect put()
}
You may want to consider using an EnumMap instead of a HashMap here. EnumMap is much faster and more space-efficient than a HashMap when using enumerated values, which seems to be precisely what you're doing here.
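For example, assuming the map key type is switched to the enum as the other answers suggest:
// Backed by an array indexed by the enum's ordinal, so no hashing is needed
private static Map<ThreadpoolName, ThreadPoolExecutor> threadpoolExecutorHash =
        new EnumMap<ThreadpoolName, ThreadPoolExecutor>(ThreadpoolName.class);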
Sure, it's possible to have enums as keys in a Map.
You get an error because threadpoolExecutorHash maps from String to ThreadPoolExecutor, and the put fails because you're trying to insert poolName, a ThreadpoolName enum constant, as the key.
Just change from
threadpoolExecutorHash = new HashMap<String, ThreadPoolExecutor>();
to
threadpoolExecutorHash = new HashMap<ThreadpoolName, ThreadPoolExecutor>();
As mentioned by @templatetypedef, there is even a special Map implementation, EnumMap, tailored for using enums as keys.
You're using String as the key type of your HashMap; you should be using the enum type instead. Your code should look like this:
public class MyThreadpoolExcecutorPgm {
enum ThreadpoolName
{
DR,
PQ,
EVENT,
MISCELLENEOUS;
}
private static String threadName;
private static HashMap<ThreadpoolName, ThreadPoolExecutor> threadpoolExecutorHash;
public MyThreadpoolExcecutorPgm(String p_threadName) {
threadName = p_threadName;
}
public static void fillthreadpoolExecutorHash() {
int poolsize = 3;
int maxpoolsize = 3;
long keepAliveTime = 10;
threadpoolExecutorHash = new HashMap<ThreadpoolName, ThreadPoolExecutor>();
// Create one executor per enum constant and use the constant itself as the key
for (ThreadpoolName poolName : ThreadpoolName.values()) {
ThreadPoolExecutor tp = new ThreadPoolExecutor(poolsize, maxpoolsize, keepAliveTime,
TimeUnit.SECONDS, new ArrayBlockingQueue<Runnable>(5));
threadpoolExecutorHash.put(poolName, tp);
}
}
Many people refer to a ConcurrentMap as a cache.
Is it a good idea to do this:
public List<Task> listTasks(final ProcessDefinition def, final boolean filterEnumerated) {
final String CACHENAME = def.getName() + "-v" + def.getVersion() + "-Tasks";
ConcurrentMap<String, List<Task>> cache = (ConcurrentMap<String, List<Task>>) Contexts.getApplicationContext().get(CACHENAME);
if (Contexts.getApplicationContext().isSet(CACHENAME) && cache != null) {
return cache.get(CACHENAME);
} else {
ConcurrentMap<String, List<Task>> myTasks = new MapMaker()
.softValues()
.expiration(2L, TimeUnit.HOURS)
.makeComputingMap(
new Function<String, List<Task>>() {
@Override
public List<Task> apply(String from) {
return getTasksFromDefinition(def, filterEnumerated);
}
});
myTasks.put(CACHENAME, getTasksFromDefinition(def, filterEnumerated));
Contexts.getApplicationContext().set(CACHENAME,myTasks);
Collection<List<Task>> tz = myTasks.values();
//First element in the collection
return new ArrayList<Task>(tz.iterator().next());
}
}
Or is it not necessary to use the ApplicationContext (which is the Java EE application context) to also cache the map, and should I just retrieve the value from the map?
Similar to the answer on this post
I would also like to know about .expiration(2L, TimeUnit.HOURS): is this really 2 hours, or does the long take its value in milliseconds?
I personally think it's fine to store such a cache in your ApplicationContext as it is threadsafe. But do be aware of the size of the cache, especially if you are clustering etc.
On the expiration question, the expiration is 2 hours as you'd expect.