How to make a nested iterative operation on ConcurrentHashMaps atomic? - java

I have a ConcurrentHashMap subscriptions that contains other objects (sessionCollection), and I need to do the following iterative operation:
subscriptions.values().forEach(sessionCollection ->
        sessionCollection.removeAllSubscriptionsOfSession(sessionId));
where sessionCollection.removeAllSubscriptionsOfSession does another iterative operation over a collection (also a ConcurrentHashMap) inside sessionCollection:
// inside SessionCollection:
private final ConcurrentHashMap<String, CopyOnWriteArrayList<String>> topicsToSessions =
        new ConcurrentHashMap<>();

public void removeAllSubscriptionsOfSession(String sessionId) {
    // Remove the session from all topics on record
    topicsToSessions.keySet().forEach(topicSessionKey ->
            removeTopicFromSession(sessionId, topicSessionKey));
}
What would be the steps to make this overall operation atomic?

ConcurrentHashMap has bulk operations (forEach*()), but they are not atomic with respect to the whole map. The only way to make atomic batch changes to a map is to implement all the necessary synchronization yourself, for instance by using synchronized blocks explicitly or by creating a wrapper (or an extension) for your map that takes care of synchronization where needed. In that case a plain HashMap will suffice, since you have to do the synchronization anyway:
public class SubscriptionsRegistry {
    private final Map<Integer, SessionCollection> map = new HashMap<>();

    public synchronized void removeSubscriptions(Integer sessionId) {
        map.values().forEach(...);
    }

    public synchronized void addSubscription(...) {
        ...
    }
    ...
}
You'll also want to protect the topics-to-sessions maps (at least their modifiable versions) from leaking outside your SubscriptionsRegistry, so nobody is able to modify them without proper synchronization.
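For example, a minimal sketch of such a read-only accessor inside SubscriptionsRegistry (the getTopicsToSessions() method on SessionCollection is a hypothetical accessor, not something from the original code):
// Sketch only: hand out a defensive, unmodifiable copy so callers can inspect
// the subscriptions but can never modify them outside the synchronized methods.
public synchronized Map<String, List<String>> topicsToSessionsSnapshot(Integer sessionId) {
    SessionCollection sessions = map.get(sessionId);
    if (sessions == null) {
        return Collections.emptyMap();
    }
    Map<String, List<String>> copy = new HashMap<>();
    sessions.getTopicsToSessions().forEach((topic, sessionIds) ->
            copy.put(topic, new ArrayList<>(sessionIds)));
    return Collections.unmodifiableMap(copy);
}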

Related

Implementing a cache within a Repository using HashMap question

I got this question on an interview and I'm trying to learn from this.
Assume that this repository is used in a concurrent context, with billions of messages in the database.
public class MessageRepository {
    public static final Map<String, Message> cache = new HashMap<>();

    public Message findMessageById(String id) {
        if (cache.containsKey(id)) {
            return cache.get(id);
        }
        Message p = loadMessageFromDb(id);
        cache.put(id, p);
        return p;
    }

    Message loadMessageFromDb(String id) {
        /* query DB and map row to a Message object */
    }
}
What are possible problems with this approach?
One I can think of is HashMap not being a thread safe implementation of Map. Perhaps ConcurrentHashMap would be better for that.
I wasn't sure about any other of the possible answers which were:
1) Class MessageRepository is final meaning it's immutable, so it can't have a modifiable cache.
(AFAIK HashMap is mutable and it's composed into MessageRepository so this wouldn't be an issue).
2) Field cache is final meaning that it's immutable, so it can't be modified by put.
(final doesn't mean immutable so this wouldn't be an issue either)
3) Field cache is static meaning that it will be reset every time an instance of MessageRepository is built.
(cache will be shared by all instances of MessageRepository so it shouldn't be a problem)
4) HashMap is synchronized, performance may be better without synchronization.
(I think SynchronizedHashMap is the synced implementation)
5) HashMap does not implement evict mechanism out of the box, it may cause memory problems.
(What kind of problems?)
I see two problems with this cache. If loadMessageFromDb() is an expensive operation, then two threads can initiate duplicate computations. This isn't alleviated even if you use ConcurrentHashMap. A proper implementation of a cache that avoids this would be:
public class MessageRepository {
    private static final Map<String, Future<Message>> CACHE = new ConcurrentHashMap<>();

    public Message findMessageById(String id) throws ExecutionException, InterruptedException {
        Future<Message> messageFuture = CACHE.get(id);
        if (null == messageFuture) {
            FutureTask<Message> ft = new FutureTask<>(() -> loadMessageFromDb(id));
            messageFuture = CACHE.putIfAbsent(id, ft);
            if (null == messageFuture) {
                messageFuture = ft;
                ft.run();
            }
        }
        return messageFuture.get();
    }
}
(Taken directly from Java Concurrency in Practice by Brian Goetz et al.)
In the cache above, when a thread starts a computation, it puts the Future into the cache and then patiently waits till the computation finishes. Any thread that comes in with the same id sees that a computation is already ongoing and will again wait on the same future. If two threads call exactly at the same time, putIfAbsent ensures that only one thread is able to initiate the computation.
Java does not have any SynchronizedHashMap class. You should use ConcurrentHashMap. You can do Collections.synchronizedMap(new HashMap<>()), but that wrapper locks the whole map on every access, so it performs poorly under contention.
A problem with the above cache is that it does not evict entries. Java provides LinkedHashMap, which can help you create an LRU cache, but it is not synchronized. If you want both functionalities, you should try a Guava cache.
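For example, a minimal sketch of such a Guava cache inside MessageRepository (the size and expiry values are purely illustrative, and loadMessageFromDb is the repository's existing method):
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;

public class MessageRepository {
    // Sketch only: the cache computes each value at most once per key under
    // contention and evicts entries by size and idle time, unlike a bare HashMap.
    private final LoadingCache<String, Message> cache = CacheBuilder.newBuilder()
            .maximumSize(10_000)                     // illustrative size limit
            .expireAfterAccess(10, TimeUnit.MINUTES) // illustrative idle expiry
            .build(new CacheLoader<String, Message>() {
                @Override
                public Message load(String id) {
                    return loadMessageFromDb(id);
                }
            });

    public Message findMessageById(String id) throws ExecutionException {
        return cache.get(id); // other callers asking for the same id wait for this load
    }

    Message loadMessageFromDb(String id) {
        /* query DB and map row to a Message object */
        return null; // placeholder
    }
}
Compared to the FutureTask version above, the per-key at-most-once loading behaviour is the same, but eviction comes for free.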

Adding or deleting elements concurrently from a Hashmap and achieving synchronization

I am new to Java and concurrency stuff.
The purpose of the assignment was to learn concurrency.
- So when answering this question, please keep in mind that I am supposed to use only HashMap (which is not synchronized by nature) and synchronize it myself. If you provide more knowledge it's appreciated, but not required.
I declared a HashMap like this:
private HashMap<String, Flight> flights = new HashMap<>();
recordID is the key of the flight to be deleted.
Flight flightObj = flights.get(recordID);
synchronized (flightObj) {
    Flight deletedFlight = flights.remove(recordID);
    editResponse = "Flight with flight ID " + deletedFlight.getFlightID() + " deleted successfully";
    return editResponse;
}
Now my doubt: is it fine to synchronize on flightObj?
Doubt 2:
Flight newFlight = new Flight(FlightServerImpl.createFlightID());
flights.put(newFlight.getFlightID(),newFlight);
If I create flights using the above code and more than one thread tries to execute it, will there be any data consistency issues? Why or why not?
Thanks in advance.
To quickly answer your questions:
Neither is okay - with a plain HashMap you can't safely remove two different objects in parallel, and you can't safely add two different objects in parallel.
From java documentation:
If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:
So many threads can call get concurrently, and a put that merely replaces the value of an existing key is not a structural modification (although without any synchronization you still get no visibility guarantees).
But if you remove or add a new object - you need to synchronize before calling any hashmap function.
In that case you can do what's suggested in the documentation and use a global lock. But since some limited concurrency is still allowed, you could get that concurrency by using a read/write lock.
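For instance, a minimal sketch of that read/write-lock approach for the flights map (the addFlight/deleteFlight/getFlight method names are just illustrative, and the locks come from java.util.concurrent.locks):
private final HashMap<String, Flight> flights = new HashMap<>();
private final ReadWriteLock rwLock = new ReentrantReadWriteLock();

// Structural changes (put/remove) take the exclusive write lock...
public void addFlight(Flight flight) {
    rwLock.writeLock().lock();
    try {
        flights.put(flight.getFlightID(), flight);
    } finally {
        rwLock.writeLock().unlock();
    }
}

public Flight deleteFlight(String recordID) {
    rwLock.writeLock().lock();
    try {
        return flights.remove(recordID);
    } finally {
        rwLock.writeLock().unlock();
    }
}

// ...while plain lookups can run in parallel under the shared read lock.
public Flight getFlight(String recordID) {
    rwLock.readLock().lock();
    try {
        return flights.get(recordID);
    } finally {
        rwLock.readLock().unlock();
    }
}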
You can do the following:
class MySynchronizedHashMap<K, V> implements Serializable {
    private static final long serialVersionUID = 3053995032091335093L;

    private final Map<K, V> m;  // backing map
    private final Object mutex; // object on which to synchronize

    MySynchronizedHashMap(Map<K, V> m) {
        this.m = Objects.requireNonNull(m);
        mutex = this;
    }

    public V put(K key, V value) {
        synchronized (mutex) { return m.put(key, value); }
    }

    public V remove(Object key) {
        synchronized (mutex) { return m.remove(key); }
    }

    public V get(Object key) {
        synchronized (mutex) { return m.get(key); }
    }
}

MySynchronizedHashMap<String, Flight> mshm =
        new MySynchronizedHashMap<>(new HashMap<String, Flight>());
Flight newFlight = new Flight(FlightServerImpl.createFlightID());
mshm.put(newFlight.getFlightID(), newFlight);

How to lock a hashmap during refresh?

I have a static HashMap that is populated on application startup, and refreshed daily.
How can I ensure that during refresh no other thread can access the map?
@ThreadSafe
public class MyService {
    private static final Map<String, Object> map = new HashMap<>();
    private MyDao dao;

    public void refresh(List<Object> objects) {
        map.clear();
        map.putAll(dao.findAll()); // maybe long running routine
    }

    public Object get(String key) {
        return map.get(key); // ensure this waits during a refresh??
    }
}
Should I introduce a simple boolean lock that is set and cleared during refresh()? Or are there better choices? Or is the synchronized mechanism a way to go?
You could use a volatile map and reassign it after population:
public class MyService {
    private static volatile Map<String, Object> map = new HashMap<>();
    private MyDao dao;

    public void refresh(List<Object> objects) {
        Map<String, Object> newMap = new HashMap<>();
        newMap.putAll(dao.findAll()); // maybe long running routine
        map = newMap;
    }

    public Object get(String key) {
        return map.get(key); // always reads from a fully populated map
    }
}
It is non-blocking; the assignment from newMap to map is atomic, and the volatile keyword ensures visibility: any subsequent call to get will be based on the refreshed map.
Performance-wise this should work well because volatile reads are almost as fast as normal reads. Volatile writes are a tiny bit slower, but considering the refresh frequency it should not be an issue. If performance matters, you should run appropriate tests.
Note: you must make sure that no external code can get access to the map reference, otherwise that code could access stale data.
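For example, instead of exposing the map field itself, you could hand out an unmodifiable view of whatever map is current (the snapshot() accessor below is hypothetical):
// Sketch only: callers get a read-only view of the current map.
// They should not cache this view across refreshes, or they will keep reading stale data.
public Map<String, Object> snapshot() {
    return Collections.unmodifiableMap(map);
}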
Please don't make the map attribute static; all accessor methods are non-static.
If get should wait, or if refresh mutates the map in place instead of completely exchanging it, then a ReadWriteLock is the way to go; use a ConcurrentMap if the collection is mutated but get should not wait.
But if refresh completely replaces the map, I may suggest different non-waiting implementations:
1) do the long running operation outside the synchronized block
public void refresh() {
    Map<String, Object> objs = dao.findAll();
    synchronized (this) {
        map.clear();
        map.putAll(objs);
    }
}

public Object get(String key) {
    synchronized (this) {
        return map.get(key);
    }
}
The readers do not run in parallel, but otherwise this is perfectly valid.
2) use a volatile non-final reference to an unchanging collection:
// Guava's ImmutableMap instead of HashMap would be even better
private volatile Map<String, Object> map = new HashMap<>();

public void refresh() {
    Map<String, Object> map = dao.findAll();
    this.map = map;
}
3) AtomicReference to an unchanging collection
Instead of a volatile reference, an AtomicReference may also be used. It is probably better because it is more explicit than the easily missed volatile.
// Guava's ImmutableMap instead of HashMap would be even better
private final AtomicReference<Map<String, Object>> mapRef =
        new AtomicReference<>(new HashMap<String, Object>());

public void refresh() {
    mapRef.set(dao.findAll());
}

public Object get(String key) {
    return mapRef.get().get(key);
}
Using a synchronized block or a ReadWriteLock would be a better choice here. This way, you wouldn't have to change anything in the calling code.
You could also use a ConcurrentHashMap, but in that case, for aggregate operations such as putAll and clear, concurrent retrievals may reflect the insertion or removal of only some entries.
It's odd that you need to clear() and then putAll() for such a global map. It smells like your problem should really be solved with ReadWriteLock-protected double buffering.
Anyway, from a pure performance point of view, on normal server boxes with fewer than 32 CPU cores and far more reads than writes, ConcurrentHashMap is probably your best choice. Otherwise it needs to be studied case by case.

Hashmaps used in multithreaded environment

public class Test {
    private final Map<URI, Set<TestObject>> uriToTestObject = new HashMap<URI, Set<TestObject>>();
    private final Map<Channel, TestObject> connToTestObject = new HashMap<Channel, TestObject>();

    private static class TestObject {
        private URI server;
        private Channel channel;
        private final long startNano = System.nanoTime();
        private AtomicInteger count = new AtomicInteger(0);
    }
}
This is a class which I am planning to use as a connection manager. There are two maps: one maps a server URI to the connection details (the TestObject), and the other maps a channel to the TestObject, i.e. the connection entry details. When a connection is created, I put the channel, the TestObject and the server URI into both maps as required. When another request comes in, I first check the map for that server URI and obtain a channel; similarly, when a channel is closed, I remove its corresponding entries (the channel object and the TestObject) from both maps. Should I use ConcurrentHashMap, or should I use HashMap and synchronize the add/remove methods myself? I shall also be using the count AtomicInteger variable for statistics purposes; it will be incremented and decremented.
My question here is: in a multithreaded environment, do I need to make my methods synchronized even if I use ConcurrentHashMap, given that I would be doing operations on both maps in one method?
Yes, you need synchronization in a multi-threaded environment.
It's better if you go with block-level synchronization instead of method-level synchronization.
Code snippet:
Object lock = new Object();

void method1() {
    synchronized (lock) {
        // do your operation on the hash map
    }
}

void method2() {
    synchronized (lock) {
        // do your operation on the hash map
    }
}
And about ConcurrentHashMap
Retrieval operations (including get) generally do not block, so may
overlap with update operations (including put and remove).
So yes, you may still need synchronization even if you use ConcurrentHashMap.
Since you need to operate on two maps at the same time, making the method synchronized is the better choice.
And a HashMap is enough if the method is synchronized.
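For example, a minimal sketch of guarding both maps with one shared lock so a connection is always added to, or removed from, both maps together (the register/unregister helpers are hypothetical methods inside the Test class above):
private final Object lock = new Object(); // one lock guards both maps so they stay consistent

// Hypothetical helper: record a new connection in both maps atomically.
void register(URI server, Channel channel, TestObject conn) {
    synchronized (lock) {
        uriToTestObject.computeIfAbsent(server, k -> new HashSet<>()).add(conn);
        connToTestObject.put(channel, conn);
    }
}

// Hypothetical helper: remove a closed channel from both maps atomically.
void unregister(Channel channel) {
    synchronized (lock) {
        TestObject conn = connToTestObject.remove(channel);
        if (conn != null) {
            Set<TestObject> conns = uriToTestObject.get(conn.server);
            if (conns != null) {
                conns.remove(conn);
            }
        }
    }
}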

Reloadable cache

I need to store a lookup map in memory on a servlet. The map should be loaded from a file, and whenever the file is updated (which is not very often), the map should be reloaded in the same thread that is doing the lookup.
But I'm not sure how to implement this functionality in a thread safe manner. I want to make sure that the reload does not happen more than once.
public class LookupTable
{
    private File file;
    private long mapLastUpdate;
    private Map<String, String> map;

    public String getValue(String key)
    {
        long fileLastUpdate = file.lastModified();
        if (fileLastUpdate > mapLastUpdate)
        {
            // Only the first thread should run the code in the synchronized block.
            // The other threads will wait until it is finished. Then skip it.
            synchronized (this)
            {
                Map newMap = loadMap();
                this.map = newMap;
                this.mapLastUpdate = fileLastUpdate;
            }
        }
        return map.get(key);
    }

    private Map<String, String> loadMap()
    {
        // Load map from file.
        return null;
    }
}
If anyone has any suggestions on external libraries that have solved this already, that would work too. I took a quick look at some caching libraries, but I couldn't find what I needed.
Thanks
I would suggest using imcache. Please build a concurrent cache with a cache loader as follows:
Cache<String, LookupTable> lookupTableCache = CacheBuilder
        .concurrentHeapCache()
        .cacheLoader(new CacheLoader<String, LookupTable>() {
            public LookupTable load(String key) {
                // code to load item from file.
            }
        }).build();
As suggested by z5h, you need to protect your condition (fileLastUpdate > mapLastUpdate) with the same lock that is used to keep the file reloading atomic.
The way I think about this stuff is to look at all of the member variables in the class and figure out what thread-safety guarantees they need. In your case, none of the members (File, long, HashMap -- ok, I'm assuming HashMap) are thread safe, and thus they must all be protected by a lock. They're also all involved in an invariant (they all change together) together, so they must be protected by the SAME lock.
Your code, updated, and using the annotations (these are just info, they don't enforce anything!) suggested by Java Concurrency In Practice (an excellent book all Java devs should read :))
/**
 * Lookup table that automatically reloads itself from a file
 * when the file changes.
 */
@ThreadSafe
public class LookupTable
{
    @GuardedBy("this")
    private long mapLastUpdate;
    @GuardedBy("this")
    private final File file;
    @GuardedBy("this")
    private Map<String, String> map;

    public LookupTable(File file)
    {
        this.file = file;
        this.map = loadMap();
    }

    public synchronized String getValue(String key)
    {
        long fileLastUpdate = file.lastModified();
        if (fileLastUpdate > this.mapLastUpdate)
        {
            // Only the first thread reloads the map; the other threads wait
            // on the lock until it is finished, then skip the reload.
            Map<String, String> newMap = loadMap();
            this.map = newMap;
            this.mapLastUpdate = fileLastUpdate;
        }
        return map.get(key);
    }

    private synchronized Map<String, String> loadMap()
    {
        // Load map from file.
        return null;
    }
}
This will be safe, but it is fully synchronized: only one thread doing a lookup in the map at once. If you need concurrency on the lookups, you'll need a more sophisticated scheme. Implementation would depend on whether threads are allowed to see the old version of the lookup table while the new one is loading, among other things.
If you made the map member final, and protected it with a ReadWriteLock, you might get some bang. It's hard to predict how much contention you might have on this lock from the limited info here.
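For the variant where readers never block and may keep seeing the old table while a reload is in progress, a minimal sketch (same fields as above, reusing the loadMap() helper) could look like this:
// Sketch only: reads never block; at most one thread reloads at a time, and the
// timestamp is re-checked inside the lock so the reload happens only once per change.
private volatile Map<String, String> map = new HashMap<>();
private volatile long mapLastUpdate;

public String getValue(String key)
{
    long fileLastUpdate = file.lastModified();
    if (fileLastUpdate > mapLastUpdate)
    {
        synchronized (this)
        {
            if (fileLastUpdate > mapLastUpdate) // re-check: another thread may have reloaded already
            {
                map = loadMap();
                mapLastUpdate = fileLastUpdate;
            }
        }
    }
    return map.get(key); // may still return values from the previous map during a reload
}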
Your check needs to be in the synchronized block.
Otherwise several threads could read (fileLastUpdate > mapLastUpdate) as true, then all block on the update code. Worst of both worlds.
