What is the difference between map.put and creating a new map? - java

I'm reading the source code of Sentinel, and I find that when the map needs a new entry added, it creates a new HashMap to replace the old one rather than using map.put directly, like this:
public class NodeSelectorSlot extends AbstractLinkedProcessorSlot&lt;Object&gt; {

    private volatile Map&lt;String, DefaultNode&gt; map = new HashMap&lt;String, DefaultNode&gt;(10);

    @Override
    public void entry(Context context, ResourceWrapper resourceWrapper, Object obj, int count, boolean prioritized, Object... args)
        throws Throwable {
        DefaultNode node = map.get(context.getName());
        if (node == null) {
            synchronized (this) {
                node = map.get(context.getName());
                if (node == null) {
                    node = new DefaultNode(resourceWrapper, null);
                    // create a new hashmap
                    HashMap&lt;String, DefaultNode&gt; cacheMap = new HashMap&lt;String, DefaultNode&gt;(map.size());
                    cacheMap.putAll(map);
                    cacheMap.put(context.getName(), node);
                    map = cacheMap;
                    ((DefaultNode) context.getLastNode()).addChild(node);
                }
            }
        }
        context.setCurNode(node);
        fireEntry(context, resourceWrapper, node, count, prioritized, args);
    }
    ...
}
What's the difference between them?

The code you are looking at is fetching a Node from the map, creating and adding a new Node if one is not present.
Clearly, this operation needs to be thread-safe. The simple ways to implement this would be:
Lock the map and perform get and put operations while holding the lock.
Use a ConcurrentHashMap which has operations for doing this kind of thing atomically; e.g. computeIfAbsent.
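For illustration, that second option might look roughly like this (a sketch only, reusing DefaultNode, context and resourceWrapper from the question; it is not what Sentinel actually does):

private final ConcurrentMap&lt;String, DefaultNode&gt; map = new ConcurrentHashMap&lt;String, DefaultNode&gt;();

// computeIfAbsent runs the mapping function at most once per missing key,
// so no explicit lock and no map copying is needed for the lookup-or-create step.
DefaultNode node = map.computeIfAbsent(
        context.getName(),
        name -&gt; new DefaultNode(resourceWrapper, null));

One caveat: the original code also calls addChild only when a node is newly created; with computeIfAbsent that side effect would have to happen inside the mapping function, which is generally discouraged.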
The authors of this code have chosen a different approach. They are using so-called Double Checked Locking (DCL) to avoid doing the initial get while holding a lock. That is what this code does:
DefaultNode node = map.get(context.getName());
if (node == null) {
    synchronized (this) {
        node = map.get(context.getName());
        ...
The authors have decided that when they need to add a new entry to the map, they do it by replacing the entire map with a new one. On the face of it, that seems unnecessary. The map updates are being performed while holding the lock, and the volatile adds a happens-before that seems to ensure that the initial map.get call sees any recent writes to the HashMap.
But that reasoning is INCORRECT. The problem is that there is a small time window between fetching the map reference and the get call completing. During that time window, a simultaneous put operation may be updating the HashMap data structures. This is harmful because those changes could cause the get to read stale data (because there is no happens-before relationship from the put writes to the get reads). Even worse, the put could trigger reconstruction of a hash chain or even an expansion of the hash array. The resulting behavior is (at least) outside of the HashMap spec, since HashMap is not defined to be thread-safe.
The authors' solution is to create a new HashMap with the existing entries and the new one, then update map with a single assignment. I haven't done a formal analysis, but I think that this approach is thread-safe.
In short, the reason that the code creates a new HashMap is to make the DCL approach thread-safe.
And if you ignore the thread-safety aspect, this approach is functionally equivalent to a simple put.
Finally, we need to consider whether the authors' approach is going to give optimal performance. The answer will depend on whether the number of cache entries stabilizes, and whether it is relatively small. One observation is that the cost of adding N entries to the cache is O(N^2), since each insertion copies every existing entry into a fresh map, giving 1 + 2 + ... + N copy operations. (Assuming that entries are never removed, as appears to be the case.)

This is so-called copy-on-write, which is intended to ensure thread safety. When read operations greatly outnumber write operations, it is more efficient than mechanisms like ConcurrentHashMap.
Ref: https://github.com/alibaba/Sentinel/issues/1733

Related

Efficiently removing an element added to a ConcurrentQueue

In principle it is easy to remove an element from ConcurrentLinkedQueue or similar implementation. For example, the Iterator for that class supports efficient O(1) removal of the current element:
public void remove() {
    Node&lt;E&gt; l = lastRet;
    if (l == null) throw new IllegalStateException();
    // rely on a future traversal to relink.
    l.item = null;
    lastRet = null;
}
I want to add an element to the queue with add(), and then later delete that exact element from the queue. The only options I can see are:
Save a reference to the object and call ConcurrentLinkedQueue.remove(Object o) with the object - but this forces a traversal of the whole queue in the worst case (and half on average with a random add and removal pattern).
This has the further issue that it doesn't necessarily remove the same object I inserted. It removes an equal object, which may very well be a different one if multiple objects in my queue are equal.
Use ConcurrentLinkedDeque instead, then addLast() my element, then immediately grab a descendingIterator() and iterate until I find my element (which will often be the first one, but may be later since I'm effectively swimming against the tide of concurrent additions).
In addition to being awkward and potentially quite slow, this forces me to use the Deque class, which in this case is much more complex and slower for many operations (check out Iterator.remove() for that class!).
Furthermore, this solution still has a subtle failure mode if identical (i.e., ==) objects can be inserted, because I might find the object inserted by someone else, but that can be ignored in the usual case where duplicates are not possible.
Both solutions seem really awkward, but deleting an arbitrary element in these kinds of structures seems like a core operation. What am I missing?
It occurs to me this is a general issue with other concurrent lists and deques, and even with non-concurrent structures like LinkedList.
C++ offers this capability in the form of iterator-based methods like insert and erase.
Nope, there's not really any way of doing this in the Java APIs; it's not really considered a core operation.
For what it's worth, there are some significant technical difficulties in how you would do it in the first place. For example, consider ArrayList. If adding an element to an ArrayList gave you another reference object that told you where that element was...
you'd be adding an allocation to every add operation
each object would have to keep track of one (or more!) references to its "pointer" object, which would at least double the memory consumption of the data structure
every time you inserted an element into the ArrayList, you'd have to update the pointer objects for every other element whose position shifted
In short, while this might be a reasonable operation for linked-list-based data structures, there's not really any good way of fitting it into a more general API, so it's pretty much omitted.
If you specifically need this capability (e.g. you are going to have massive collections) then you will likely need to implement your own collection that returns a reference to the entry on add and has the ability to remove in O(1).
class MyLinkedList&lt;V&gt; {
    public class Entry {
        private Entry next;
        private Entry prev;
        private V value;
    }

    public Entry add(V value) {
        ...
    }

    public void remove(Entry entry) {
        ...
    }
}
In other words you are not removing by value but by reference to the collection entry:
MyLinkedList&lt;Integer&gt; intList = new MyLinkedList&lt;&gt;();
MyLinkedList&lt;Integer&gt;.Entry entry = intList.add(15);
intList.remove(entry);
That's obviously a fair amount of work to implement.
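For illustration, one hypothetical way to fill in those bodies (a sketch only, single-threaded; a concurrent version would need the same kind of care as ConcurrentLinkedQueue):

// A tiny, non-thread-safe doubly linked list whose add() hands back the
// internal Entry so it can later be unlinked in O(1).
class MyLinkedList&lt;V&gt; {
    public class Entry {
        private Entry next;
        private Entry prev;
        private V value;
    }

    private Entry head;

    public Entry add(V value) {
        Entry e = new Entry();
        e.value = value;
        e.next = head;
        if (head != null) {
            head.prev = e;
        }
        head = e;
        return e;
    }

    public void remove(Entry entry) {
        if (entry.prev != null) {
            entry.prev.next = entry.next;
        } else {
            head = entry.next;          // removing the current head
        }
        if (entry.next != null) {
            entry.next.prev = entry.prev;
        }
        entry.prev = null;
        entry.next = null;
    }
}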

Implementing per-key or striped locking in a Map - best approach?

I came across this dilemma at work and wanted to see if there is a better solution... it feels like there should be an easier, cleaner answer.
Goal: Concurrently access a map with locks at the key level, not at the entire map level, to ensure atomicity while impacting performance as little as possible.
I have a Map which needs to be concurrent. (Added) The map will be filled with an unknown number of entries over time. I have multiple readers and a single writer. The writer does a "check-then-put" and the reader does a simple get(). I need these to be atomic... but only at the key level. So for example, if the reader is checking for Key X, and the writer is writing to Key Y, I don't care if I miss the write to Key Y. If the reader/writer is working on the same key, however, I need that to be atomic.
The easiest solution is to lock the whole map. But this seems like it would impact performance, since there are about 10,000 keys that will end up in the map. (If that doesn't seem like it would hurt performance because the size of the Map is relatively small, let's pretend the Map has many more keys, for argument's sake.)
As far as I know, ConcurrentHashMap will not guarantee the "per-key" atomic behavior I need.
The next solution that came to mind was to have an array of lock objects. You would index into that array of lock Objects based on a hash of the original key. This would still have some contention, since you have fewer locks than keys into the original map. I'm aware that ConcurrentHashMap does a similar thing under the hood (striping) to provide concurrency (but not atomicity).
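To make that idea concrete, here is a rough sketch of the striping scheme (the class name, stripe count and method names are invented for illustration; a ConcurrentHashMap keeps the map itself structurally safe while the stripes make check-then-put atomic per key):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class StripedKeyLockMap {
    private static final int STRIPES = 64;                       // arbitrary stripe count
    private final Object[] locks = new Object[STRIPES];
    private final ConcurrentMap&lt;String, String&gt; map = new ConcurrentHashMap&lt;String, String&gt;();

    StripedKeyLockMap() {
        for (int i = 0; i &lt; STRIPES; i++) {
            locks[i] = new Object();
        }
    }

    private Object lockFor(String key) {
        // Map the key's hash onto one of the stripes; unrelated keys rarely share a lock.
        return locks[(key.hashCode() &amp; 0x7fffffff) % STRIPES];
    }

    // Writer: "check-then-put", atomic with respect to anyone else using the same key.
    String putIfMissing(String key, String value) {
        synchronized (lockFor(key)) {
            String existing = map.get(key);
            if (existing == null) {
                map.put(key, value);
                return value;
            }
            return existing;
        }
    }

    // Reader: a bare get on the ConcurrentHashMap is already safe for visibility;
    // take the stripe lock only if the read must wait out an in-progress check-then-put.
    String get(String key) {
        synchronized (lockFor(key)) {
            return map.get(key);
        }
    }
}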
Is there an easier way to perform this type of per-key or striped locking?
Thanks.
This concern can come up when value generation is a time-consuming process. You don't want to lock the whole map and find a missing value, and keep the map locked while you generate the value. You could release the map during generation, but then you could have two simultaneous misses and generations.
Instead of directly storing the value with the key, store it inside a reference object:
public class Ref&lt;T&gt;
{
    private T value;

    public T getValue()
    {
        return value;
    }

    public void setValue(T value)
    {
        this.value = value;
    }
}
So if you originally had a map of Map<String, MyThing>, you instead use Map<String, Ref<MyThing>>. Don't bother with a concurrent implementation, just use HashMap or LinkedHashMap or whatever.
Now you can lock the map to find or create a reference holder, and then release the map. Following that, you can lock the reference to find or create the value object:
String key;                    // key you're looking up
Map&lt;String, Ref&lt;MyThing&gt;&gt; map; // the map

// Find the reference container, create it if necessary
Ref&lt;MyThing&gt; ref;
synchronized (map)
{
    ref = map.get(key);
    if (ref == null)
    {
        ref = new Ref&lt;MyThing&gt;();
        map.put(key, ref);
    }
}
// Map is released at this point

// Now get the value, creating if necessary
MyThing result;
synchronized (ref)
{
    result = ref.getValue();
    if (result == null)
    {
        result = generateMyThing();
        ref.setValue(result);
    }
}
// result == your existing or new object

Java concurrent access to field, trick to not use volatile

Preface: I know that in most cases using a volatile field won't yield any measurable performance penalty, but this question is more theoretical and targeted towards a design with extremely high concurrency support.
I've got a field that is a List&lt;Something&gt; which is filled after construction. To save some performance I would like to convert the List into a read-only Map. Doing so at any point requires at least a volatile Map field to make the change visible to all threads.
I was thinking of doing the following:
Map map;

public Object get(Object key) {
    if (map == null) {
        Map temp = new HashMap();
        for (Object value : super.getList()) {
            temp.put(value.getKey(), value);
        }
        map = temp;
    }
    return map.get(key);
}
This could cause multiple threads to generate the map even if they enter the get block in a serialized way. This would be no big issue, if threads work on different identical instances of the map. What worries me more is:
Is it possible that one thread assigns the new temp map to the map field, and then a second thread sees that map != null and therefore accesses the map field without generating a new one, but to my surprise finds that the map is empty, because the put operations were not yet pushed to some shared memory area?
Answers to comments:
The threads only modify the temporary map; once it is assigned to the field, it is read-only.
I must convert a List to a Map because of some special JAXB setup which doesn't make it feasible to have a Map to begin with.
Is it possible that one thread assigns the new temp map to the map field, and then a second thread sees that map != null and therefore accesses the map field without generating a new one, but to my surprise finds that the map is empty, because the put operations were not yet pushed to some shared memory area?
Yes, this is absolutely possible; for example, an optimizing compiler could actually completely get rid of the local temp variable, and just use the map field the whole time, provided it restored map to null in the case of an exception.
Similarly, a thread could also see a non-null, non-empty map that is nonetheless not fully populated. And unless your Map class is carefully designed to allow simultaneous reads and writes (or uses synchronized to avoid the issue), you could also get bizarre behavior if one thread is calling its get method while another is calling its put.
Can you create your Map in the ctor and declare it final? Provided you don't leak the map so others can modify it, that should suffice to make your get() safely sharable by multiple threads.
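As a rough sketch of that suggestion (the class name, Item type and getKey() are invented; they stand in for whatever the list elements provide), a final field assigned in the constructor is safely published, so any thread that later obtains the object sees the fully populated map without volatile or locking, provided this does not escape during construction and the map is never modified afterwards:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Lookup {
    interface Item {
        String getKey();
    }

    private final Map&lt;String, Item&gt; map;    // final: safely published once the constructor finishes

    Lookup(List&lt;Item&gt; items) {
        Map&lt;String, Item&gt; temp = new HashMap&lt;String, Item&gt;();
        for (Item item : items) {
            temp.put(item.getKey(), item);
        }
        this.map = temp;                    // never modified after construction
    }

    public Item get(String key) {
        return map.get(key);
    }
}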
When you are really in doubt whether another thread could read a "half completed" map
(I don't think so, but never say never ;-), you may try this.
The invariant is that map is either null or complete:
static class MyMap extends HashMap {
    MyMap(List pList) {
        for (Object value : pList) {
            put(value.getKey(), value);
        }
    }
}

MyMap map;

public Object get(Object key) {
    if (map == null) {
        map = new MyMap(super.getList());
    }
    return map.get(key);
}
Or does someone see a new introduced problem ?
In addition to the visibility concerns previously mentioned, there is another problem with the original code, viz. it can throw a NullPointerException here:
return this.map.get(key)
Which is counter-intuitive, but that is what you can expect from incorrectly synchronized code.
Sample code to prevent this:
Map temp;
if ((temp = this.map) == null)
{
    temp = new ImmutableMap(getList());
    this.map = temp;
}
return temp.get(key);

modifying a ConcurrentHashMap and Synchronized ArrayList in same method

I have a collection of objects that is modified by one thread and read by another (more specifically the EDT). I needed a solution that gives me fast look-up and also fast indexing (by order inserted), so I'm using a ConcurrentHashMap with an accompanying ArrayList of the keys: if I want to index an entry, I look up the key in the List by index and then use that key to get the value from the hash map. I have a wrapper class that makes sure that when an entry is added, the mapping is added to the hash map and the key is added to the list at the same time, and similarly for removal.
I'm posting an example of the code in question:
private List&lt;K&gt; keys = Collections.synchronizedList(new ArrayList&lt;K&gt;(INITIAL_CAPACITY));
private ConcurrentMap&lt;K, T&gt; entries = new ConcurrentHashMap&lt;K, T&gt;(INITIAL_CAPACITY, .75f);

public synchronized T getEntryAt(int index) {
    return entries.get(keys.get(index));
}

public synchronized void addOrReplaceEntry(K key, T value) {
    T result = entries.get(key);
    if (result == null) {
        entries.putIfAbsent(key, value);
        keys.add(key);
    } else {
        entries.replace(key, value);
    }
}

public synchronized void removeEntry(K key, T value) {
    keys.remove(key);
    entries.remove(key, value);
}

public synchronized int getSize() {
    return keys.size();
}
My question is: am I losing all the benefits of using the ConcurrentHashMap (over a synchronized HashMap) by operating on it in synchronized methods? I have to synchronize the methods to safely modify/read from the ArrayList of keys (CopyOnWriteArrayList is not an option because a lot of modification happens...). Also, if you know of a better way to do this, that would be appreciated...
Yes, using a Concurrent collection and a Synchronized collection in only synchronized blocks is a waste. You won't get the benefits of ConcurrentHashMap because only one thread will be accessing it at a time.
You could have a look at this implementation of a concurrent linked hashmap; I haven't used it, so I can't attest to its features.
One thing to consider would be switching from synchronized blocks to a ReadWriteLock to improve concurrent read-only performance.
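A rough sketch of that idea (the class name is invented; the field names follow the question, and with an explicit lock guarding every access, plain collections are sufficient):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class IndexedStore&lt;K, T&gt; {
    private final List&lt;K&gt; keys = new ArrayList&lt;K&gt;();
    private final Map&lt;K, T&gt; entries = new HashMap&lt;K, T&gt;();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public T getEntryAt(int index) {
        lock.readLock().lock();            // many readers may hold the read lock at once
        try {
            return entries.get(keys.get(index));
        } finally {
            lock.readLock().unlock();
        }
    }

    public void addOrReplaceEntry(K key, T value) {
        lock.writeLock().lock();           // writers are exclusive
        try {
            if (entries.put(key, value) == null) {
                keys.add(key);             // record the key only the first time it appears
            }
        } finally {
            lock.writeLock().unlock();
        }
    }
}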
I'm not really sure of the utility of providing a remove-at-index method; perhaps you could give some more details about the problem you are trying to solve?
It seems that you only care about finding values by index. If so, dump the Map and just use a List. Why do you need the Map?
Mixing synchronized and concurrent collections the way you have done it is not recommended. Any reason why you are maintaining two copies of the stuff you are interested in? You can easily get a list of all the keys from the map anytime rather than maintaining a separate list.
Why not store the values in the list and, in the map, the key -&gt; index mapping?
Then for getEntry you only need one lookup (in the list, which should be faster than a map anyway), and for remove you do not have to traverse the whole list. Synchronization works the same way.
You can get all access to the List keys onto the event queue using EventQueue.invokeLater. This will get rid of the synchronization. With all the synching you were not running much in parallel anyway. Also it means the getSize method will give the same answer for the duration of an event.
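A sketch of that approach (the class and method names are invented): the key list is touched only on the EDT, so Swing code reading it needs no locks, while the ConcurrentHashMap remains safe to update from the worker thread.

import java.awt.EventQueue;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class EdtConfinedIndex&lt;K, T&gt; {
    private final List&lt;K&gt; keys = new ArrayList&lt;K&gt;();                    // touched only on the EDT
    private final ConcurrentMap&lt;K, T&gt; entries = new ConcurrentHashMap&lt;K, T&gt;();

    // Called from the worker thread.
    void add(final K key, T value) {
        entries.put(key, value);                       // safe from any thread
        EventQueue.invokeLater(new Runnable() {
            public void run() {
                keys.add(key);                         // the list update hops onto the EDT
            }
        });
    }

    // Called on the EDT (e.g. from a TableModel).
    T getEntryAt(int index) {
        return entries.get(keys.get(index));
    }
}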
If you stick with synchronization instead of using invokeLater, at least get the entries hash table out of the synch block. Either way, you get more parallel processing. Of course, entries can now become out-of-synch with keys. The only down side is sometimes a key will come up with a null entry. With such a dynamic table this is unlikely to matter much.
Using the suggestion made by chrisichris to put the values in the list will solve this problem if it is one. In fact, this puts a nice wall between keys and entries; they are now used in completely separate ways. (If your only need for entries is to provide values to the JTable, you can get rid of it.) But entries (if still needed) should reference the entries, not contain an index; maintaining indexes there would be a hopeless task. And always remember that keys and entries are snapshots of "reality" (for lack of a better word) taken at different times.

Java concurrency with a Map of Lists

I have a java class that is accessed by a lot of threads at once and want to make sure it is thread safe. The class has one private field, which is a Map of Strings to Lists of Strings. I've implemented the Map as a ConcurrentHashMap to ensure gets and puts are thread safe:
public class ListStore {

    private Map&lt;String, List&lt;String&gt;&gt; innerListStore;

    public ListStore() {
        innerListStore = new ConcurrentHashMap&lt;String, List&lt;String&gt;&gt;();
    }
    ...
}
So given that gets and puts to the Map are thread safe, my concern is with the lists that are stored in the Map. For instance, consider the following method that checks if a given entry exists in a given list in the store (I've omitted error checking for brevity):
public boolean listEntryExists(String listName, String listEntry) {
    List&lt;String&gt; listToSearch = innerListStore.get(listName);
    for (String entryName : listToSearch) {
        if (entryName.equals(listEntry)) {
            return true;
        }
    }
    return false;
}
It would seem that I need to synchronize the entire contents of this method because if another method changed the contents of the list at innerListStore.get(listName) while this method is iterating over it, a ConcurrentModificationException would be thrown.
Is that correct and if so, do I synchronize on innerListStore or would synchronizing on the local listToSearch variable work?
UPDATE: Thanks for the responses. It sounds like I can synchronize on the list itself. For more information, here is the add() method, which can be running at the same time the listEntryExists() method is running in another thread:
public void add(String listName, String entryName) {
    List&lt;String&gt; addTo = innerListStore.get(listName);
    if (addTo == null) {
        addTo = Collections.synchronizedList(new ArrayList&lt;String&gt;());
        List&lt;String&gt; added = innerListStore.putIfAbsent(listName, addTo);
        if (added != null) {
            addTo = added;
        }
    }
    addTo.add(entryName);
}
If this is the only method that modifies the underlying lists stored in the map and no public methods return references to the map or entries in the map, can I synchronize iteration on the lists themselves and is this implementation of add() sufficient?
You can synchronize on listToSearch ("synchronized(listToSearch) {...}"). Make sure that there is no race condition creating the lists (use innerListStore.putIfAbsent to create them).
You could synchronize on just listToSearch, there's no reason to lock the entire map any time anyone is using just one entry.
Just remember though, that you need to synchronize on the list everywhere it is modified! Synchronizing the iterator doesn't automagically block other people from doing an add() or whatnot if you passed out to them references to the unsynchronized list.
It would be safest to just store synchronized lists in the Map and then lock on them when you iterate, and also document, when you return a reference to the list, that the user must synchronize on it if they iterate. Synchronization is pretty cheap in modern JVMs when no actual contention is happening. Of course, if you never let a reference to one of the lists escape your class, you can handle it internally with a finer comb.
Alternately you can use a threadsafe list such as CopyOnWriteArrayList that uses snapshot iterators. What kind of point in time consistency you need is a design decision we can't make for you. The javadoc also includes a helpful discussion of performance characteristics.
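For example, a variant of the question's ListStore along those lines might look like this (a sketch only, assuming Java 8's computeIfAbsent is available; it trades a copy on every add for lock-free reads):

import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class CowListStore {
    private final ConcurrentMap&lt;String, List&lt;String&gt;&gt; innerListStore =
            new ConcurrentHashMap&lt;String, List&lt;String&gt;&gt;();

    public void add(String listName, String entryName) {
        // computeIfAbsent atomically creates the list on first use;
        // each add() then copies the backing array of that list.
        innerListStore.computeIfAbsent(listName, k -&gt; new CopyOnWriteArrayList&lt;String&gt;())
                      .add(entryName);
    }

    public boolean listEntryExists(String listName, String listEntry) {
        List&lt;String&gt; list = innerListStore.get(listName);
        // contains() iterates over an immutable snapshot, so no locking is needed here.
        return list != null &amp;&amp; list.contains(listEntry);
    }
}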
It would seem that I need to synchronize the entire contents of this method because if another method changed the contents of the list at innerListStore.get(listName) while this method is iterating over it, a ConcurrentModificationException would be thrown.
Are other threads accessing the List itself, or only though operations exposed by ListStore?
Will operations invoked by other threads result in the contents of the a List stored in the Map being changed? Or will entries only be added/removed from the Map?
You would only need to synchronize access to the List stored within the Map if different threads can result in changes to the same List instances. If the threads are only allowed to add/remove List instances from the Map (i.e. change the structure of the Map), then synchronization is not necessary.
If the lists stored in the map are of a type that doesn't throw ConcurrentModificationException (CopyOnWriteArrayList, for example), you can iterate at will.
This can introduce some races, though, if you're not careful.
If the Map is already thread-safe, then I think synchronizing on listToSearch should work. I'm not 100% sure, but I think it should work:
synchronized(listToSearch)
{
}
You could use another abstraction from Guava
Note that this will synchronize on the whole map, so it might not be that useful for you.
As you haven't provided any client for the map of lists apart from the boolean listEntryExists(String listName, String listEntry) method, I wonder why you are storing lists at all? This structure seems more naturally a Map&lt;String, Set&lt;String&gt;&gt;, and listEntryExists should use the contains method (available on List as well, but O(n) in the size of the list):
public boolean listEntryExists(String name, String entry) {
    Set&lt;String&gt; set = map.get(name);
    return (set == null) ? false : set.contains(entry);
}
Now, the contains call can encapsulate whatever internal concurrency protocol you want it to.
For the add you can either use a synchronized wrapper (simple, but maybe slow) or if writes are infrequent compared to reads, utilise ConcurrentMap.replace to implement your own copy-on-write strategy. For instance, using Guava ImmutableSet:
public boolean add(String name, String entry) {
    while (true) {
        Set&lt;String&gt; set = map.get(name);
        if (set == null) {
            if (map.putIfAbsent(name, ImmutableSet.of(entry)) == null)
                return true;
            continue;
        }
        if (set.contains(entry))
            return false; // no need to change, already exists
        Set&lt;String&gt; newSet = ImmutableSet.copyOf(Iterables.concat(set, ImmutableSet.of(entry)));
        if (map.replace(name, set, newSet))
            return true;
    }
}
This is now an entirely thread-safe, lock-free structure, where concurrent readers and writers will not block each other (modulo the lock-freeness of the underlying ConcurrentMap implementation). This implementation does have an O(n) cost in its write, where your original implementation was O(n) in the read. Again, if you are read-mostly rather than write-mostly, this could be a big win.
