Concurrent read access on map structure without lock - java

I have a hashmap:
LinkedHashMap<Long, List<IOperation>> operations.
It is written to by multiple threads, so I use a lock around it,
synchronized (lock) { ... }
to make sure that only one thread writes to it at a given moment.
Nevertheless, on some occasions I need to run long read operations on it.
For this purpose I copy the map:
temp.putAll(operations);
or
temp = new LinkedHashMap<>(operations);
Is there a way to make such a copy with the following premises:
No need to lock the map to copy it.
No deep copy of the values: the map and its copy should share the same List<IOperation> instances.
Thanks already
Some additional details: I run these series of long reads quite often, and performance is crucial.

It would most likely be easier, when initializing your LinkedHashMap<Long, List<IOperation>>, to wrap it in Collections#synchronizedMap so that each individual operation on the map is synchronized for you, rather than having to do it yourself.
If you insist on copying the Map without having to lock it, I would just create another LinkedHashMap with the synchronized Map when beginning to read it.
Map<Long, List<IOperation>> copy = new LinkedHashMap<>(synchronizedMap);
You can also eliminate the need for the copy of the Map by simply wrapping the "long read requests" in a synchronized block using the Map as the lock.
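A minimal sketch of both options, assuming the synchronizedMap wrapper; IOperation is the interface from the question and the other names are illustrative:
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

interface IOperation { }   // stub for the interface from the question

class OperationsHolder {

    private final Map<Long, List<IOperation>> operations =
            Collections.synchronizedMap(new LinkedHashMap<>());

    void write(long id, List<IOperation> ops) {
        operations.put(id, ops);   // each individual call is synchronized by the wrapper
    }

    void longRead() {
        // Option 1: take a shallow snapshot, then read without holding any lock.
        // Iterating the wrapper (which the copy constructor does) must be guarded manually.
        Map<Long, List<IOperation>> snapshot;
        synchronized (operations) {
            snapshot = new LinkedHashMap<>(operations);
        }
        // ... run the long read against 'snapshot'; the List values are shared, not copied ...

        // Option 2: skip the copy and hold the map's own lock for the whole read.
        synchronized (operations) {
            for (List<IOperation> ops : operations.values()) {
                // ... long read ...
            }
        }
    }
}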

Related

Is HashMap's containsKey method thread-safe if the map is initialized once and never modified again?

Can we use HashMap's containsKey() method without synchronizing in a multi-threaded environment?
Note: Threads are only going to read the HashMap. The map is initialized once, and is never modified again.
It really depends on how/when your map is accessed.
Assuming the map is initialized once, and never modified again, then methods that don't modify the internal state like containsKey() should be safe.
In this case though, you should make sure your map really is immutable, and is published safely.
Now if in your particular case the state does change during the course of your program, then no, it is not safe.
From the documentation:
Note that this implementation is not synchronized.
If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.
In this case, you should use ConcurrentHashMap, or synchronize externally.
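If the map does keep changing, a minimal sketch of the ConcurrentHashMap option (class and method names are illustrative):
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SharedLookup {

    private final Map<String, Integer> map = new ConcurrentHashMap<>();

    void register(String key, int value) {    // called from writer threads
        map.put(key, value);
    }

    boolean seen(String key) {                // called from reader threads
        return map.containsKey(key);          // safe: ConcurrentHashMap is thread-safe
    }
}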
You shouldn't look at a single method this way. A HashMap is not meant to be used in a multi-threaded setup.
Having said that, the one exception would be: a map that gets created once (single threaded), and afterwards is "read" only. In other words: if a map doesn't get changed anymore, then you can have as many threads reading it as you want.
From that point of view, containsKey() calls alone shouldn't cause a problem. The problem arises when the information that this method relies on changes over time.
No, it is not thread-safe for any operations. You need to synchronise all access or use something like ConcurrentHashMap.
My favourite production system troubleshooting horror story is when we found that HashMap.get went into an infinite loop locking up 100% CPU forever because of missing synchronisation. This happened because the linked lists that were used within each bucket got into an inconsistent state. The same could happen with containsKey.
You should be safe if no one modifies the HashMap after it has been initially published, but better use an implementation that guarantees this explicitly (such as ImmutableMap or, again, a ConcurrentMap).
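A minimal sketch of that initialize-once pattern, assuming Guava's ImmutableMap is on the classpath (a plain unmodifiable wrapper over a fully built HashMap works too; names are illustrative):
import java.util.HashMap;
import java.util.Map;
import com.google.common.collect.ImmutableMap;

class Config {

    private final Map<String, String> settings;   // final field => safe publication

    Config() {
        Map<String, String> tmp = new HashMap<>();
        tmp.put("mode", "fast");                   // build single-threaded
        this.settings = ImmutableMap.copyOf(tmp);  // never modified again
    }

    boolean hasSetting(String key) {
        return settings.containsKey(key);          // safe to call from any thread
    }
}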
No. (No it is not. Not at all. 30 characters?)
It's complicated, but, mostly, no.
The spec of HashMap makes no guarantees whatsoever. It therefore reserves the right to blast yankee doodle dandy from your speakers if you try: You're just not supposed to use it that way.
... however, in practice, whilst the API of HashMap makes no guarantees, it generally works out. But mind the horror story in @Thilo's answer.
... buuut, the Java Memory Model works like this: you should assume that each thread gets its own copy of each and every field across the entire heap of the VM, and that these individual copies are synced up at indeterminate times. That means all sorts of code simply isn't going to work right: you add an entry to the map from one thread, and when you then access the map from another thread you may not see it, even though a lot of time has passed. That is theoretically possible. Also, internally, the map uses multiple fields, and these fields must be consistent with each other or you'll get weird behaviour (exceptions and wrong results). The JMM makes no guarantees about that consistency either.
The way out of this dilemma is the JMM's happens-before relationships, which give you guarantees that changes have been synced up. Using the synchronized keyword is one easy way to establish such a relationship.
Why not use a ConcurrentHashMap, which has all the bells and whistles built in and does in fact guarantee that adding an entry from thread A and then querying it via containsKey from thread B gives you a consistent answer? (That answer might still be 'no, that key is not in the map', because thread B may have gotten there slightly before or slightly after thread A and there's no way for you to know. But it won't throw exceptions or do something really bizarre such as suddenly returning false for things you added ages ago.)
So, whilst it's complicated, the answer is basically: Don't do that; either use a synchronized guard, or, probably the better choice: ConcurrentHashMap.
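For completeness, a minimal sketch of the synchronized-guard option; every access goes through the same monitor, which establishes the happens-before relationship described above (names are illustrative):
import java.util.HashMap;
import java.util.Map;

class GuardedMap {

    private final Object lock = new Object();
    private final Map<String, String> map = new HashMap<>();

    void put(String key, String value) {
        synchronized (lock) {              // writes happen-before later reads under the same lock
            map.put(key, value);
        }
    }

    boolean containsKey(String key) {
        synchronized (lock) {
            return map.containsKey(key);
        }
    }
}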
No. Read the bold part of the HashMap documentation:
Note that this implementation is not synchronized.
So you should handle it:
If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.
And suggested solutions:
This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method
@user7294900 is right.
If your application does not modify the HashMap structurally after it has been built in a thread-safe way, and only invokes the containsKey method, it is thread-safe.
For instance, I've used HashMap like this:
import java.util.HashMap;
import java.util.Map;

import javax.annotation.PostConstruct;
import org.springframework.stereotype.Component;

@Component
public class SpringSingletonBean {

    private final Map<String, String> map = new HashMap<>();

    public void doSomething() {
        if (map.containsKey("aaaa")) {
            // do something
        }
    }

    @PostConstruct
    public void init() {
        // do something to initialize the map, before any other thread calls doSomething()
    }
}
It works well.

Bulk operation on java synchronized cache

I would like to implement a simple cache which is updated periodically; every update triggers a full cache clear and data insertion.
Pseudocode:
//context calls periodically this method
cache.clear();
cache.putAll(newValues)
Since other threads might read the cache during the refresh operation, I need some kind of synchronization.
The simplest solution might be similar to the following:
computeNewCacheValues()
computeStaleKeys() // e.g. keys that are in the cache but not in the new values (sketched below)
removeStaleKeysOneByOneFromCache()
updateKeysFromNewCacheValueOneByOne()
The implementation is backed by a ConcurrentHashMap instance, so during cache updates:
no concurrency issues occur (?)
the cache is not locked for the entire process (thus accessible during refresh)
This might be (?) a good solution, but I was wondering: are there other, more efficient/safe ways to implement this? Are there any libraries which are capable of this operation?
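A minimal sketch of that incremental refresh, assuming a ConcurrentHashMap-backed cache (class and method names are illustrative):
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class BulkCache<K, V> {

    private final Map<K, V> cache = new ConcurrentHashMap<>();

    // called periodically with the freshly computed values
    void refresh(Map<K, V> newValues) {
        cache.keySet().retainAll(newValues.keySet());   // remove stale keys one by one
        cache.putAll(newValues);                        // update/insert the new entries one by one
    }

    V get(K key) {
        return cache.get(key);   // readers never block on the refresh
    }
}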
If you are always replacing the entire cache you can just replace it.
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// start with an empty map so get() never dereferences null before the first update
final AtomicReference<Map<K, V>> mapRef =
        new AtomicReference<>(Collections.<K, V>emptyMap());

// assuming you don't modify the map after calling this.
public void update(Map<K, V> map) {
    mapRef.set(map);
}

public V get(K key) {
    // this will always see the latest complete map.
    return mapRef.get().get(key);
}
Note: there is no locking required as the Map is not altered once it is added to the cache.
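A hedged usage sketch of the periodic full replacement, assuming a scheduled executor drives the refresh; CachedLookup is an illustrative holder for the fields above, and computeNewCacheValues stands in for the computation from the question's pseudocode:
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class CachedLookup {

    private final AtomicReference<Map<String, String>> mapRef =
            new AtomicReference<>(Collections.<String, String>emptyMap());

    public void start() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // swap in a complete new map every 5 seconds; readers are never blocked
        scheduler.scheduleAtFixedRate(
                () -> mapRef.set(computeNewCacheValues()), 0, 5, TimeUnit.SECONDS);
    }

    public String get(String key) {
        return mapRef.get().get(key);   // always a fully built map
    }

    // placeholder for the computation from the question's pseudocode
    private Map<String, String> computeNewCacheValues() {
        return Collections.singletonMap("key", "value");
    }
}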
For future reference:
When creating a "bulk cache", the simplest solution might be the one that @Peter mentioned: on a separate thread, create the new cache and switch the reference from the old one to the new one.
Things to consider:
the solution works because the assignment of a reference is atomic
the cache reference must be volatile; this way its latest value is visible to all threads (without it, one thread might change the reference, but the other threads would not see the change)
AtomicReference is another option; it is basically a wrapper around a volatile reference, but it adds some useful methods (e.g. compareAndSet for atomic updates). A minimal sketch of the volatile variant follows.
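A minimal sketch of the volatile variant described above (class and field names are illustrative):
import java.util.Collections;
import java.util.Map;

public class VolatileCache<K, V> {

    // volatile, so every reader sees the most recently published map
    private volatile Map<K, V> cache = Collections.emptyMap();

    // writer thread: build the new map completely, then publish it with one assignment
    public void refresh(Map<K, V> newValues) {
        cache = Collections.unmodifiableMap(newValues);
    }

    // reader threads: one volatile read, then work on a consistent snapshot
    public V get(K key) {
        return cache.get(key);
    }
}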

Thread safe way to copy a map

I am using JDK 7, SQLite, and Guava in my project.
I have a TreeMap with less than 100 entries that is being updated by a single "worker" thread hundreds of times a second. I am now writing a component (another thread - the "DB thread") that will write the map to my database every 5 or 10 seconds.
I know that I need to make a deep copy of the map so the DB thread uses a snapshot while the worker thread continues its job. I am looking at the Guava Maps class, which has many methods that make copies, but I am not sure whether any of them meets my need to synchronize on the map whenever a copy is taken. Is there a method there that meets my needs, or should I write a synchronized block and make my own deep copy?
It depends on what you want:
If you want a fully concurrent map (no reads while adding, and so on), you should use what JSlain said before me.
If all you want is the CURRENT snapshot of the map, and you do not care that the map is modified as long as the iterator you are using is not affected, then use ConcurrentSkipListMap.
It provides each iteration with its own weakly consistent iterator, so even if the real map is changed you won't notice it.
You will see the change in the next update (5 seconds in your case).
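A minimal sketch of that approach, assuming the worker writes into a ConcurrentSkipListMap and the DB thread iterates it every few seconds (names are illustrative):
import java.util.Map;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

class SnapshotWriter {

    // sorted like a TreeMap, but safe for concurrent access
    private final ConcurrentNavigableMap<String, Integer> map = new ConcurrentSkipListMap<>();

    void workerUpdate(String key, int value) {   // hundreds of times a second
        map.put(key, value);
    }

    void flushToDatabase() {                     // every 5-10 seconds, on the DB thread
        // the iterator is weakly consistent: no locking, no ConcurrentModificationException
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            // write e.getKey() / e.getValue() to the database
        }
    }
}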
From TreeMap javadoc:
Note that this implementation is not synchronized. If multiple threads access a map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with an existing key is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedSortedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:
SortedMap m = Collections.synchronizedSortedMap(new TreeMap(...));
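Applied to the question, a hedged sketch of taking the snapshot under the wrapper's lock; this is a shallow copy (the copy constructor does not clone the values), so copy the values too if the worker mutates them. Names are illustrative:
import java.util.Collections;
import java.util.SortedMap;
import java.util.TreeMap;

class DbSnapshot {

    private final SortedMap<String, Integer> map =
            Collections.synchronizedSortedMap(new TreeMap<String, Integer>());

    // DB thread, every 5-10 seconds
    SortedMap<String, Integer> snapshot() {
        synchronized (map) {               // lock the wrapper while its contents are iterated
            return new TreeMap<>(map);     // shallow copy of fewer than 100 entries
        }
    }
}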

Why should we use HashMap in multi-threaded environments?

Today I was reading about how HashMap works in Java. I came across a blog post and am quoting directly from it. I have gone through this article on Stack Overflow. Still,
I want to know the details.
So the answer is yes, there is a potential race condition while
resizing a HashMap in Java: if two threads at the same time find that
the HashMap needs resizing, they both try to resize it. During the
resizing of a HashMap in Java, the elements in a bucket, which are
stored in a linked list, get reversed in order during their migration
to the new bucket, because Java's HashMap doesn't append new elements
at the tail; instead it appends new elements at the head to avoid tail traversal.
If a race condition happens, you end up with an infinite loop.
It states that, as HashMap is not thread-safe, a potential race condition can occur during resizing of the HashMap. I have seen in our office projects that people extensively use HashMaps even though they know they are not thread-safe. If it is not thread-safe, why should we use HashMap at all? Is it just a lack of knowledge among developers, who might not be aware of structures like ConcurrentHashMap, or is there some other reason? Can anyone shed light on this puzzle?
I can confidently say that ConcurrentHashMap is a pretty ignored class. Not many people know about it and not many people care to use it. The class offers a very robust and fast method of synchronizing a Map collection. I have read a few comparisons of HashMap and ConcurrentHashMap on the web. Let me just say that they're totally wrong. There is no way you can compare the two: one offers synchronized access to a map, while the other offers no synchronization whatsoever.
What most of us fail to notice is that while our applications, web applications especially, work fine during the development and testing phases, they usually fall over under heavy (or even moderately heavy) load. This is because we expect our HashMaps to behave a certain way, but under load they usually misbehave. Hashtables offer concurrent access to their entries, with a small caveat: the entire map is locked to perform any sort of operation.
While this overhead is ignorable in a web application under normal load, under heavy load it can lead to delayed response times and overtaxing of your server for no good reason. This is where ConcurrentHashMaps step in. They offer all the features of Hashtable with performance almost as good as that of a HashMap. ConcurrentHashMaps accomplish this with a very simple mechanism.
Instead of a map-wide lock, the collection maintains a list of 16 locks by default, each of which guards a single bucket of the map. This effectively means that 16 threads can modify the collection at a time (as long as they're all working on different buckets). In fact, there is no operation performed by this collection that locks the entire map.
There are several aspects to this. First of all, most of the collections are not thread-safe. If you want a thread-safe collection, you can use Collections.synchronizedCollection or Collections.synchronizedMap.
But the main point is this: you want your threads to run in parallel, with no synchronization at all, if possible. This is something you should strive for, but of course it cannot be achieved every time you deal with multithreading.
But there is no point in making the default collection/map thread-safe, because sharing a map should be an edge case. Synchronization means more work for the JVM.
In a multi-threaded environment, you have to ensure that the map is not modified concurrently, or you can run into serious memory-consistency problems, because it is not synchronized in any way.
Just check the API; I used to think the same way.
I thought that the solution was to use the static Collections.synchronizedMap method. I was expecting it to return a better implementation, but if you look at the source code you will realize that all it does is wrap the map and synchronize every call on a single mutex, not allowing reads to occur concurrently.
In the Jakarta Commons project, there is an implementation called FastHashMap. This implementation has a property called fast. If fast is true, then reads are non-synchronized and writes perform the following steps (sketched after the list):
Clone the current structure
Perform the modification on the clone
Replace the existing structure with the modified clone
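A minimal sketch of that clone-and-swap write, assuming a volatile map field (this illustrates the idea, not the actual Commons FastHashMap source):
import java.util.HashMap;
import java.util.Map;

class CloneOnWriteMap<K, V> {

    // reads go straight to this reference, with no synchronization
    private volatile Map<K, V> map = new HashMap<>();

    public V get(K key) {
        return map.get(key);
    }

    public synchronized void put(K key, V value) {
        Map<K, V> clone = new HashMap<>(map);   // 1. clone the current structure
        clone.put(key, value);                  // 2. perform the modification on the clone
        map = clone;                            // 3. replace the existing structure with the clone
    }
}
The answer then continues with a different approach, guarding a plain map with a ReentrantReadWriteLock: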
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FastSynchronizedMap<K, V> implements Map<K, V>, Serializable {

    private final Map<K, V> m;
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // ...

    public V get(Object key) {
        lock.readLock().lock();
        V value = null;
        try {
            value = m.get(key);
        } finally {
            lock.readLock().unlock();
        }
        return value;
    }

    public V put(K key, V value) {
        lock.writeLock().lock();
        V v = null;
        try {
            v = m.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
        return v;
    }

    // ...
}
Note that we use a try/finally block: we want to guarantee that the lock is released no matter what problem is encountered in the block.
This implementation works well when you have almost no write operations, and mostly read operations.
A HashMap can be used when only a single thread has access to it. However, when multiple threads start accessing the HashMap, there are two main problems:
1. Resizing of the HashMap is not guaranteed to work as expected.
2. A ConcurrentModificationException can be thrown. This can also happen when a single thread modifies the HashMap while iterating over it.
A workaround for using a HashMap in a multi-threaded environment is to initialize it with the expected number of entries, thereby avoiding the need for resizing.

Slow iteration over a ReadWriteLock-protected map: read lock, concurrent map, or copying?

I have a map that is frequently read but rarely written to. Some operations (reads or writes) involve multiple objects that need to be handled atomically, so I used a ReadWriteLock to improve performance.
Now I have the option to downgrade the concurrent map to a normal hash map, but I am concerned about some of the slow iteration code.
If I downgrade the map, the long iterations must hold the read lock to avoid ConcurrentModificationExceptions. I think this will block the writing threads for too long.
Since some of the iterators are not sensitive to inconsistent data, I can keep the concurrent map so that iterators can be used with concurrent writes. However, this adds unnecessary overhead (from the concurrent map) to operations that are properly using the locks.
Alternatively, I can implement something like a copy-on-write map, where the entire (non-concurrent) map is cloned for a write operation so that existing iterators continue to work outside a read lock.
Obviously, all these methods are valid and the performance depends on the actual code and setup. However, I am wondering is there any study on this (so I don't have to do the experiments myself)?
I have a map that is frequently read but rarely write to.
In that case I would consider implementing a copy-on-write map, e.g.
private final Map<Key, Value> map = new ConcurrentHashMap<>(); // thread safe map
private volatile Map<Key, Value> mapCopy = Collections.emptyMap();

// when you write
public void put(Key key, Value value) {
    synchronized (this) {              // get lock
        map.put(key, value);           // modify map
        mapCopy = new HashMap<>(map);  // take a copy and store it in mapCopy
    }                                  // release lock
}

// when you read
public Value get(Key key) {
    Map<Key, Value> snapshot = this.mapCopy;
    return snapshot.get(key);          // use the latest complete copy, no lock needed
}
As you can see, you never need to obtain a lock on a read, only on a write.
If I downgrade the map, the long iterators to must hold the read lock to avoid concurrent access exceptions. I think this will block the writing threads for too long.
Instead of guessing, I suggest you measure it.
I am wondering is there any study on this
I wouldn't take such a study too seriously if one existed. As you suggest, the results vary based on your situation.
