Hashmaps used in multithreaded environment

public class Test {
private final Map<URI, Set<TestObject>> uriToTestObject = new HashMap<URI, Set<TestObject>>();
private final Map<Channel, TestObject> connToTestObject = new HashMap<Channel, TestObject>();
private static class TestObject {
private URI server;
private Channel channel;
private final long startNano = System.nanoTime();
private AtomicInteger count = new AtomicInteger(0);
}
}
This is a class which I am planning to use as a connection manager. There are two maps: one maps a server URI to its connection details (TestObject instances), and the other maps a channel to its TestObject, i.e. the connection entry details. When a connection is created, the channel, TestObject and server URI are put into both maps as required. When another request arrives, I first check the map for that server URI and obtain a channel. Similarly, when a channel is closed, its corresponding entries (the channel and the TestObject) are removed from both maps. Should I use ConcurrentHashMap, or should I use HashMap and synchronize the add and remove methods? I will also be using the AtomicInteger count field for statistics; it will be incremented and decremented.
My question here is: in a multithreaded environment, do I need to make my methods synchronized even if I use ConcurrentHashMap, as I would be doing some operations on both maps in one method?

Yes, you need synchronization in a multithreaded environment.
It's better to go with block-level synchronization instead of method-level synchronization.
Code snippet:
Object lock = new Object();
void method1(){
synchronized(lock){
//do your operation on hash map
}
}
void method2(){
synchronized(lock){
//do your operation on hash map
}
}
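And about ConcurrentHashMap, from its documentation:
Retrieval operations (including get) generally do not block, so may
overlap with update operations (including put and remove).
So yes, you may still need synchronization even if you use ConcurrentHashMap: each individual call is thread-safe, but a sequence of calls that spans two maps is not atomic.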

Since you need to operate on two maps at the same time,
making the method synchronized is the better choice.
A plain HashMap is enough if every method that touches the maps is synchronized.
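As a rough sketch of what this could look like for the connection manager in the question: the class name ConnectionRegistry, its method names, and the stand-in Channel class below are assumptions made for illustration, not part of the original code. Both maps are only ever touched while holding the same lock, so they cannot drift out of step with each other.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the question's Channel type.
final class Channel {
    final String name;
    Channel(String name) { this.name = name; }
}

// Sketch: two plain HashMaps guarded by one private lock.
class ConnectionRegistry {
    private final Object lock = new Object();
    private final Map<URI, Set<Channel>> uriToChannels = new HashMap<>();
    private final Map<Channel, URI> channelToUri = new HashMap<>();

    // Adds the channel to both maps as one atomic step.
    void register(URI server, Channel channel) {
        synchronized (lock) {
            uriToChannels.computeIfAbsent(server, k -> new HashSet<>()).add(channel);
            channelToUri.put(channel, server);
        }
    }

    // Removes the channel from both maps as one atomic step.
    void unregister(Channel channel) {
        synchronized (lock) {
            URI server = channelToUri.remove(channel);
            if (server == null) {
                return; // unknown channel, nothing to clean up
            }
            Set<Channel> channels = uriToChannels.get(server);
            channels.remove(channel);
            if (channels.isEmpty()) {
                uriToChannels.remove(server);
            }
        }
    }

    // Returns some channel registered for the server, or null if there is none.
    Channel findChannel(URI server) {
        synchronized (lock) {
            Set<Channel> channels = uriToChannels.get(server);
            return (channels == null || channels.isEmpty()) ? null : channels.iterator().next();
        }
    }

    // Number of servers that currently have at least one channel.
    int serverCount() {
        synchronized (lock) {
            return uriToChannels.size();
        }
    }
}
```

A private lock object is used here rather than synchronized methods so that code outside the class cannot lock on the registry itself; either variant gives the same atomicity across the two maps.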

Related

Implementing a cache within a Repository using HashMap question

I got this question on an interview and I'm trying to learn from this.
Assuming that this repository is used in a concurrent context with billions of messages in the database.
public class MessageRepository {
public static final Map<String, Message> cache = new HashMap<>();
public Message findMessageById(String id) {
if(cache.containsKey(id)) {
return cache.get(id);
}
Message p = loadMessageFromDb(id);
cache.put(id, p);
return p;
}
Message loadMessageFromDb(String id) {
/* query DB and map row to a Message object */
}
}
What are possible problems with this approach?
One I can think of is HashMap not being a thread safe implementation of Map. Perhaps ConcurrentHashMap would be better for that.
I wasn't sure about any other of the possible answers which were:
1) Class MessageRepository is final meaning it's immutable, so it can't have a modifiable cache.
(AFAIK HashMap is mutable and it's composed into MessageRepository so this wouldn't be an issue).
2) Field cache is final meaning that it's immutable, so it can't be modified by put.
(final doesn't mean immutable so this wouldn't be an issue either)
3) Field cache is static meaning that it will be reset every time an instance of MessageRepository will be built.
(cache will be shared by all instances of MessageRepository so it shouldn't be a problem)
4) HashMap is synchronized, performances may be better without synchronization.
(I think SynchronizedHashMap is the synced implementation)
5) HashMap does not implement evict mechanism out of the box, it may cause memory problems.
(What kind of problems?)

I see two problems with this cache. If loadMessageFromDb() is an expensive operation, then two threads can initiate duplicate calculations for the same id. This isn't alleviated even if you use ConcurrentHashMap. A proper implementation of a cache that avoids this would be:
public class MessageRepository {
private static final Map<String, Future<Message>> CACHE = new ConcurrentHashMap<>();
public Message findMessageById(String id) throws ExecutionException, InterruptedException {
Future<Message> messageFuture = CACHE.get(id);
if (null == messageFuture) {
FutureTask<Message> ft = new FutureTask<>(() -> loadMessageFromDb(id));
messageFuture = CACHE.putIfAbsent(id, ft);
if (null == messageFuture) {
messageFuture = ft;
ft.run();
}
}
return messageFuture.get();
}
}
(Taken directly from JCIP by Brian Goetz et al.)
In the cache above, when a thread starts a computation, it puts the Future into the cache and then patiently waits till the computation finishes. Any thread that comes in with the same id sees that a computation is already ongoing and will again wait on the same future. If two threads call exactly at the same time, putIfAbsent ensures that only one thread is able to initiate the computation.
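On Java 8 and later, a similar at-most-once load per key can also be sketched with ConcurrentHashMap.computeIfAbsent, which applies the mapping function atomically for a given key. This is an alternative to the Future-based version above, not the JCIP code; the class name ComputeIfAbsentCache and the loader parameter are assumptions for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch: a cache that loads each key at most once via computeIfAbsent.
class ComputeIfAbsentCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // hypothetical stand-in for loadMessageFromDb

    ComputeIfAbsentCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value, loading it first if the key is absent.
    // If the loader returns null, nothing is cached and null is returned.
    V get(K key) {
        return cache.computeIfAbsent(key, loader);
    }
}
```

One trade-off to keep in mind: while the loader runs, other updates to the map may be blocked, so a slow database call inside computeIfAbsent can hold up unrelated keys. The Future-based version does its slow work outside the map operation.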
Java does not have any SynchronizedHashMap class. You should use ConcurrentHashMap. You can do Collections.synchronizedMap(new HashMap<>()), but every operation then contends for a single lock, so it scales poorly under concurrent load.
A problem with the above cache is that it does not evict entries, so with billions of messages in the database it can keep growing until the JVM runs out of heap. Java provides LinkedHashMap, which can help you create an LRU cache, but it is not synchronized. If you want both functionalities, you should try Guava's cache.
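For a small cache where a single lock is acceptable, the LinkedHashMap route can be sketched as follows. The helper name LruCaches.newLruCache and the maxEntries parameter are made up for this example; the access-order constructor and removeEldestEntry hook are standard LinkedHashMap features.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: a bounded LRU map made thread-safe with a synchronized wrapper.
class LruCaches {
    static <K, V> Map<K, V> newLruCache(final int maxEntries) {
        // accessOrder = true: iteration order goes from least to most recently used.
        LinkedHashMap<K, V> lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Called after each insertion; returning true evicts the eldest entry.
                return size() > maxEntries;
            }
        };
        // In access-order mode even get() reorders entries, so reads need the lock too.
        return Collections.synchronizedMap(lru);
    }
}
```

Usage would look like `Map<String, Message> cache = LruCaches.newLruCache(10_000);`. Under heavy contention a dedicated caching library is likely the better fit, as the answer suggests.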

Adding or deleting elements concurrently from a Hashmap and achieving synchronization

I am new to Java and concurrency stuff.
The purpose of the assignment was to learn concurrency.
- So when answering this question please keep in mind that I am supposed to use only Hashmap (which is not synchronized by nature) and synchronize it myself. If you provide more knowledge its appreciated but not required.
I declared a hashmap like this:
private HashMap<String, Flight> flights = new HashMap<>();
recordID is the key of the flight to be deleted.
Flight flightObj = flights.get(recordID);
synchronized(flightObj){
Flight deletedFlight = flights.remove(recordID);
editResponse = "Flight with flight ID " + deletedFlight.getFlightID() +" deleted successfully";
return editResponse;
}
Now my doubt: Is it fine to synch on the basis of flightObj?
Doubt 2:
Flight newFlight = new Flight(FlightServerImpl.createFlightID());
flights.put(newFlight.getFlightID(),newFlight);
If I create flights using the above code and more than one thread executes it, will there be any data consistency issues? Why or why not?
Thanks in advance.

To quickly answer your questions:
Both are not okay - you can't remove two different objects in parallel, and you can't add two different objects in parallel.
From java documentation:
If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:
So, per this documentation, many threads may call get concurrently, and even a put that only replaces the value of an existing key is not a structural modification.
But if you remove or add a mapping, you need to synchronize before calling any HashMap method.
In that case you can do what's suggested in the documentation and use a global lock. But since some limited concurrency is still allowed, you could get that concurrency by using a read/write lock.
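A minimal sketch of the read/write lock idea, assuming a wrapper class (here called FlightTable, a made-up name) owns the HashMap so that no caller can reach the map without going through the lock:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch: a HashMap guarded by a read/write lock.
// Many readers may hold the read lock at once; a writer gets exclusive access.
class FlightTable<V> {
    private final Map<String, V> flights = new HashMap<>();
    private final ReadWriteLock rw = new ReentrantReadWriteLock();

    V get(String id) {
        rw.readLock().lock();
        try {
            return flights.get(id);
        } finally {
            rw.readLock().unlock();
        }
    }

    // Returns the previous value for the id, or null if there was none.
    V put(String id, V flight) {
        rw.writeLock().lock();
        try {
            return flights.put(id, flight);
        } finally {
            rw.writeLock().unlock();
        }
    }

    // Returns the removed value, or null if the id was not present.
    V remove(String id) {
        rw.writeLock().lock();
        try {
            return flights.remove(id);
        } finally {
            rw.writeLock().unlock();
        }
    }
}
```

This sketch takes the write lock for every put, including replacements, which is the conservative choice. Whether a read/write lock actually beats a plain synchronized block depends on how read-heavy the workload is.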

You can do the following
class MySynchronizedHashMap<K, V> {
private final Map<K, V> m; // Backing map
private final Object mutex; // Object on which to synchronize
MySynchronizedHashMap(Map<K, V> m) {
this.m = Objects.requireNonNull(m);
mutex = this;
}
public V get(Object key) {
synchronized (mutex) {return m.get(key);}
}
public V put(K key, V value) {
synchronized (mutex) {return m.put(key, value);}
}
public V remove(Object key) {
synchronized (mutex) {return m.remove(key);}
}
}
// Usage: every access to the HashMap goes through the synchronized wrapper
MySynchronizedHashMap<String, Flight> mshm = new MySynchronizedHashMap<>(new HashMap<String, Flight>());
Flight flight = new Flight(FlightServerImpl.createFlightID());
mshm.put(flight.getFlightID(), flight);

How to make atomic a nested iterative operation on ConcurrentHashMaps?

I have a ConcurrentHashMap subscriptions that contain another object (sessionCollection) and I need to do the following iterative operation:
subscriptions.values().forEach(sessionCollection ->
sessionCollection.removeAllSubscriptionsOfSession(sessionId));
where sessionCollection.removeAllSubscriptionsOfSession does another iterative operation over a collection (also ConcurrentHashMap) inside sessionCollection:
// inside SessionCollection:
private final ConcurrentHashMap<String, CopyOnWriteArrayList<String>> topicsToSessions =
new ConcurrentHashMap<>();
public void removeAllSubscriptionsOfSession(String sessionId) {
// Remove sessions from all topics on record
topicsToSessions.keySet().forEach(topicSessionKey ->
removeTopicFromSession(sessionId, topicSessionKey));
}
What would be the steps to make this overall atomic operation?

ConcurrentHashMap has batch operations (forEach*()), but they are not atomic with respect to the whole map. The only way to make atomic batch changes on a map is to implement all the necessary synchronization yourself. For instance, by using synchronized blocks explicitly or by creating a wrapper (or an extension) for your map that will take care of synchronization where needed. In this case a simple HashMap will suffice since you have to do synchronization anyway:
public class SubscriptionsRegistry {
private final Map<Integer, SessionCollection> map = new HashMap<>();
public synchronized void removeSubscriptions(Integer sessionId) {
map.values().forEach(...);
}
public synchronized void addSubscription(...) {
...
}
...
}
You'll also want to protect the topics-to-sessions maps (at least their modifiable versions) from leaking outside your SubscriptionsRegistry, so nobody is able to modify them without proper synchronization.
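To make the shape of such a wrapper more concrete, here is a simplified sketch that flattens the nested structure into one topic-to-sessions map. The class name TopicRegistry and its methods are assumptions for illustration and do not reproduce the asker's SessionCollection. Every method is synchronized on the same object, and reads return copies so the internal sets never leak.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Sketch: all state lives behind synchronized methods of one object.
class TopicRegistry {
    private final Map<String, Set<String>> topicToSessions = new HashMap<>();

    synchronized void subscribe(String topic, String sessionId) {
        topicToSessions.computeIfAbsent(topic, k -> new HashSet<>()).add(sessionId);
    }

    // Removes the session from every topic in one atomic step.
    synchronized void removeAllSubscriptionsOfSession(String sessionId) {
        Iterator<Set<String>> it = topicToSessions.values().iterator();
        while (it.hasNext()) {
            Set<String> sessions = it.next();
            sessions.remove(sessionId);
            if (sessions.isEmpty()) {
                it.remove(); // drop topics that no longer have subscribers
            }
        }
    }

    // Returns a snapshot copy, so callers cannot modify internal state.
    synchronized Set<String> sessionsOf(String topic) {
        Set<String> sessions = topicToSessions.get(topic);
        return sessions == null ? new HashSet<>() : new HashSet<>(sessions);
    }
}
```

Because no other thread can enter any method while removeAllSubscriptionsOfSession runs, observers see the registry either before or after the whole removal, never halfway through.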

How to design the class, which will be accessed by multiple threads.

I want to create a class which will be accessed by multiple threads for getting a data. So I have a method like this:
public static String getData(){
//some logic before returning the value
// return data once initialized
}
Now there will be method which will set that data:
private static void setData(){
}
I want this setData to be called only once, so thought of including it in static block:
static {
setData();
}
Now I have these questions:
I'm planning to create a static class for this, so that other threads can call like this:
ThreadSafe.getData();
In my getData, I want to check first the validity of that data before returning, if the data is invalid, I need to again call setData and then return it. Given this fact, I need to make getData synchronized right?
will my above approach work in a multi threading environment?

You can use synchronization:
public class ThreadSafe {
private static final Object monitor = new Object();
private static String data; // not final: setData() reassigns it when the data is invalid
static {
setData();
}
private static void setData() {
data = String.valueOf(System.currentTimeMillis());
}
public static String getData() {
synchronized(monitor) {
if (data.indexOf("7") < 0) {
// recalculate invalid data
setData();
}
return data;
}
}
}

You can achieve it in multiple ways:
You can use a volatile variable (if one thread writes/modifies the data and all other threads only read it).
You can use AtomicReference variables (if multiple threads can read and modify the shared data).
If you have some atomic transaction on top of these variables instead of just writing and reading the data, you have to protect your business transaction code block with synchronized or other alternatives like the Lock API.
Refer to this jenkov tutorial for more details.
Related SE posts with code examples:
Difference between volatile and synchronized in Java
Avoid synchronized(this) in Java?
When to use AtomicReference in Java?
What is the difference between atomic / volatile / synchronized?
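As a small illustration of the AtomicReference option from the list above (the class name DataHolderRef and its methods are made up for this sketch), several threads can read the value and attempt conditional updates without a lock:

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch: shared data held in an AtomicReference.
class DataHolderRef {
    private final AtomicReference<String> data = new AtomicReference<>("initial");

    // Always returns the most recently published value.
    String get() {
        return data.get();
    }

    // Swaps in the new value only if the current value is still `expected`.
    // Note: compareAndSet compares references (==), not equals().
    boolean replace(String expected, String updated) {
        return data.compareAndSet(expected, updated);
    }
}
```

If two threads read the same value and both call replace with it, only one of the calls returns true; the loser can re-read and retry. Anything that must update more than one variable together still needs synchronized or a Lock, as noted in the answer.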

I would say you'd better synchronize (with double-checked locking) the block that calls setData() in getData(). Assuming invalid data is a rare case, this should work faster under high load.
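A sketch of double-checked locking for this case might look like the following. The isValid and load methods are hypothetical placeholders for the asker's validity check and setData logic, and the LOADS counter exists only to make the behavior observable. The field must be volatile for the unlocked first check to be safe.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: double-checked locking around a rarely needed reload.
class ThreadSafeData {
    private static final Object LOCK = new Object();
    private static volatile String data; // volatile: the first check reads it without the lock
    static final AtomicInteger LOADS = new AtomicInteger(); // counts reloads, for illustration

    // Hypothetical validity rule; replace with the real check.
    private static boolean isValid(String d) {
        return d != null && !d.isEmpty();
    }

    // Hypothetical loader standing in for setData().
    private static String load() {
        LOADS.incrementAndGet();
        return "loaded";
    }

    static String getData() {
        String local = data;            // first check, no lock taken
        if (!isValid(local)) {
            synchronized (LOCK) {
                local = data;           // second check, under the lock
                if (!isValid(local)) {
                    local = load();
                    data = local;       // publish the fresh value
                }
            }
        }
        return local;
    }
}
```

In the common case where the data is valid, callers pay only for one volatile read; the lock is taken only when a reload might be needed.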

Is this Synchronized Code causing unnecessary thread waiting?

We have an application that has a class which holds members that are populated via the database. Here is a representative example of the situation.
private AtomicBoolean data1Initialized = new AtomicBoolean(false);
protected SomeSynchronizedDataStructure<Object> data1 = <initializeit>;
protected final synchronized void initData1() {
if (data1Initialized.compareAndSet(false, true)){
// Populate data1 data structure from database
}
}
public SomeSynchronizedDataStructure<Object> getData1(){
initData1();
return data1;
}
And we have this same pattern for data1, data2, data3... dataN. Each dataStructure is not related to the other, they just are in the same class. This data is accessed across multiple threads. A few questions about this pattern:
Synchronizing the methods will make it so that threads have to wait for a boolean check across all different dataNs, correct? Which is unnecessary?
Does the data structure need to be synchronized? The data will not change throughout the life of the application. I would think that only initializing it would need to be synchronized and access could happen unsynchronized.
Most important to me personally
Could this ever cause a deadlock? I think no, but I'm not experienced with threads.

As you are initialising at class creation time all you need is:
public class DataHolder {
// Note: 'final' is needed to ensure the value is committed to memory before the DataHolder reference is visible to other threads
private final SomeSynchronizedDataStructure<Object> data1 = <initializeit>;
public SomeSynchronizedDataStructure<Object> getData1(){
return data1;
}
}
Because the "initializeit" code will be run in the class' constructor you know it will be ready by the time you have a class handle available to pass around anyway. E.g.:
DataHolder dataHolder = new DataHolder();
// dataHolder has already created the data structure by the time I do...
dataHolder.getData1();
If you did want to go with lazy loading you could simply use synchronized:
public class DataHolder {
private SomeSynchronizedDataStructure<Object> data1;
public synchronized SomeSynchronizedDataStructure<Object> getData1() {
// synchronized guarantees each thread will see "data1" just as the
// last thread left it.
if(data1 == null) {
data1 = initializeit();
}
return data1;
}
}

Synchronizing the methods will make it so that threads have to wait for a boolean check across all different dataNs, correct?
Correct.
Which is unnecessary?
Yes, it is unnecessary because the dataNs are unrelated.
Does the data structure need to be synchronized? The data will not change throughout the life of the application. I would think that only initializing it would need to be synchronized and access could happen unsynchronized.
Again, correct. BarrySW19's answer gives you the pattern for safely initializing it without synchronizing.
Most important to me personally
Could this ever cause a deadlock? I think no, but I'm not experienced with threads.
It can't cause a deadlock in and of itself. However, if one of the data init methods invokes something else that is synchronized on another monitor (call it m), and meanwhile some other thread owns m and now tries to init one of the dataNs, that's deadlock.
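If lazy loading is kept, one way to stop unrelated dataNs from waiting on each other is to give each field its own lock. The sketch below assumes a made-up class name (PerFieldLockHolder) and uses placeholder loaders in place of the database queries.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch: one lock per lazily loaded field, so data1 and data2 never contend.
class PerFieldLockHolder {
    private final Object lock1 = new Object();
    private final Object lock2 = new Object();
    private List<String> data1; // guarded by lock1
    private List<String> data2; // guarded by lock2

    List<String> getData1() {
        synchronized (lock1) {
            if (data1 == null) {
                // Read-only after initialization, so callers need no further locking.
                data1 = Collections.unmodifiableList(loadData1());
            }
            return data1;
        }
    }

    List<String> getData2() {
        synchronized (lock2) {
            if (data2 == null) {
                data2 = Collections.unmodifiableList(loadData2());
            }
            return data2;
        }
    }

    // Placeholder loaders standing in for the database queries.
    private List<String> loadData1() {
        return Arrays.asList("a", "b");
    }

    private List<String> loadData2() {
        return Arrays.asList("x");
    }
}
```

The deadlock caveat above still applies: if a loader calls into code that takes another monitor, lock ordering needs to be consistent across threads.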
