Java: How to take static snapshot of ConcurrentHashMap? - java

The Javadoc says that the collections returned by values() and entrySet() are backed by the map, so changes to the map are reflected in them and vice versa. I don't want that to happen to my static copy. Essentially, I want lots of concurrent operations to be done on my data structure, but in some cases I want to iterate over a static snapshot of it. I am assuming that iterating over a static snapshot will be faster than iterating over a version which is being updated concurrently.

Just make a copy, and it won't be changed.
Set<K> keySetCopy = new HashSet<>(map.keySet());
List<V> valuesCopy = new ArrayList<>(map.values());
The general-purpose collection implementations provide a copy constructor which copies all elements of the supplied collection into the newly created one, without being backed by the original.
Note: this won't work as expected with entrySet(), as the actual Map entries will still "belong" to the original Map, and changes made through those entries can carry over between the original map and your copy. In case you need the entrySet(), you should copy the entire Map first, with the same technique.
Set<Entry<K,V>> entrySetCopy = new HashMap<>(map).entrySet();
Note that all of these will require a full iteration ONCE (in the constructor) and will only then be static snapshots. There is no way around this limitation, to my knowledge.
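As a minimal sketch of the copy approach, assuming a hypothetical helper named snapshot (not part of any library): the copy is taken once, and later writes to the live map do not show up in it. Keep in mind that if other threads write while the copy constructor is iterating, the result is not guaranteed to be an exact point-in-time image, because ConcurrentHashMap iteration is only weakly consistent.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class SnapshotDemo {
    // Hypothetical helper: copies the live map once into an independent HashMap.
    static <K, V> Map<K, V> snapshot(ConcurrentHashMap<K, V> live) {
        return new HashMap<>(live);
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> live = new ConcurrentHashMap<>();
        live.put("a", 1);
        live.put("b", 2);

        Map<String, Integer> frozen = snapshot(live);

        // These writes happen after the snapshot was taken.
        live.put("c", 3);
        live.remove("a");

        System.out.println(frozen.containsKey("a")); // true
        System.out.println(frozen.containsKey("c")); // false
    }
}
```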

Simply make a copy; the new collection will be independent of the original one.
Set<K> keySetCopy = new HashSet<>(map.keySet());
List<V> valuesCopy = new ArrayList<>(map.values());
However, mind that this requires one full iteration over the concurrent structure, and only after that do you have static snapshots. So you will need time equivalent to one full iteration.

Related

Java unmodifiableMap can be replaced with 'Map.copyOf' call

I'm new to Java and I recently learnt that sometimes it's important to deep-copy a Collection and make an unmodifiable view of it so that the data inside remains safe and unchanged.
When I try to practice this (unmodifiableMap2), I get a warning from IDEA that
unmodifiableMap Can be replaced with 'Map.copyOf' call
That's weird to me because I think unmodifiableMap is not only a copy of the underlying map. Besides, when I try to create the same unmodifiableMap in another way (unmodifiableMap1), the warning doesn't pop up!
How should I understand this behavior of IDEA?
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class test {
    public static void main(String[] args) {
        Map<Integer, Integer> map = new HashMap<>();
        map.put(1, 1);
        map.put(2, 2);
        // Copy held in a named variable, then wrapped: no warning
        Map<Integer, Integer> map1 = new HashMap<>(map);
        Map<Integer, Integer> unmodifiableMap1 = Collections.unmodifiableMap(map1);
        // Copy created inline and wrapped: IDEA suggests Map.copyOf
        Map<Integer, Integer> unmodifiableMap2 = Collections.unmodifiableMap(new HashMap<>(map));
    }
}
Map.copyOf() makes a copy of the given Map instance, but it requires that no key or value in the map is null. Usually, this is the case, but it is not a strict requirement for a Map in general.
java.util.Collections.unmodifiableMap() just wraps a reference to the given Map instance. This means that the receiver is unable to modify the map, but modifications to the original map (that one that was the argument to unmodifiableMap()) are visible to the receiver.
Assuming we have two threads, one iterates over the unmodifiable map, while the other modifies the original one. As a result, you may get a ConcurrentModificationException for an operation on the unmodifiable map … not funny to debug that thing!
This cannot happen with the copy created by Map.copyOf(). But this has a price: with a copy, you need two times the amount of memory for the map (roughly). For really large maps, this may cause memory shortages up to an OutOfMemoryError. Also not fun to debug!
In addition, just wrapping the existing map is presumably much faster than copying it.
So there is no best solution in general, but for most scenarios, I have a preference for using Map.copyOf() when I need an unmodifiable map.
The sample in the question did not wrap the original Map instance, but it makes a copy before wrapping it (either in a line of its own, or on the fly). This eliminates the potential problem with the 'under-the-hood' modification, but may bring the memory issue.
From my experience so far, Map.copyOf( map ) looks to be more efficient than Collections.unmodifiableMap( new HashMap( map ) ).
By the way: Map.copyOf() returns a map that resembles a HashMap; when you copy a TreeMap with it, the sort order gets lost, while the wrapping with unmodifiableMap() keeps the underlying Map implementation and therefore also the sort order. So when this is important, you can use Collections.unmodifiableMap( new TreeMap( map ) ), while Map.copyOf() does not work here.
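The difference between wrapping and copying can be sketched as follows. The class and method names (ViewVersusCopy, readOnlyView, sortedSnapshot) are made up for illustration; only Collections.unmodifiableMap, Map.copyOf and TreeMap come from the JDK.

```java
import java.util.Collections;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

final class ViewVersusCopy {
    // Wraps the source: later changes to the source stay visible through the view.
    static <K, V> Map<K, V> readOnlyView(Map<K, V> source) {
        return Collections.unmodifiableMap(source);
    }

    // Copies into a TreeMap first, so the snapshot keeps the sort order.
    static <K, V> Map<K, V> sortedSnapshot(SortedMap<K, V> source) {
        return Collections.unmodifiableMap(new TreeMap<>(source));
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> source = new TreeMap<>();
        source.put("b", 2);
        source.put("a", 1);

        Map<String, Integer> view = readOnlyView(source);
        Map<String, Integer> copy = Map.copyOf(source); // independent, iteration order unspecified
        Map<String, Integer> sorted = sortedSnapshot(source);

        source.put("c", 3); // modify the original afterwards

        System.out.println(view.size());     // 3
        System.out.println(copy.size());     // 2
        System.out.println(sorted.keySet()); // [a, b]
    }
}
```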
An unmodifiable map using an existing reference to a map is perfectly fine, and there are many reasons you might want to do this.
Consider this class:
class Foo {
    private final Map<String, String> fooMap = new HashMap<>();

    // some methods which mutate the map

    public Map<String, String> getMap() {
        return Collections.unmodifiableMap(fooMap);
    }
}
What this class does is provide a read-only view of the map it encapsulates. The class can be sure that clients who consume the map cannot alter it, they can just see its contents. They will also be able to see any updates to the entries if they keep hold of the reference for some time.
If we had tried to expose a read-only view by copying the map, it would take time and memory to perform the copy and the client would not see any changes because both maps are then distinct instances - the source and the copy.
However in the case of this:
Collections.unmodifiableMap(new HashMap<>(map));
You are first copying the map into a new hash map and then passing that copy into Collections.unmodifiableMap. The result is effectively constant. You do not have a reference to the copy you created with new HashMap<>(map), and nor can you get one*.
If what you want is a constant map, then Map.copyOf is a more concise way of achieving that, so IntelliJ suggests you should use that instead.
In the first case, since the reference to the map already exists, IntelliJ cannot make the same inference about your intent so it gives no such suggestion.
You can see the IntelliJ ticket for this feature if you like, though it doesn't explain why the two are essentially equivalent, just that they are.
* well, you probably could via reflection, but IntelliJ is assuming that you won't
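Map.copyOf(map) is, for most purposes, equivalent to Collections.unmodifiableMap(new HashMap<>(map)); the main differences are that Map.copyOf rejects null keys and values and does not promise any particular iteration order.
Neither does any kind of deep copying. But it's strictly shorter to do Map.copyOf(map).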
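To illustrate the point about deep copying, here is a small sketch (the class and method names are hypothetical). Both approaches copy only the map structure, so a mutable value object is still shared with the source. The sketch also shows one behavioral difference between the two: Map.copyOf throws a NullPointerException for null values, while a HashMap copy accepts them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ShallowCopyDemo {
    // Returns true if Map.copyOf refuses a map that contains a null value.
    static boolean copyOfRejectsNullValue() {
        Map<String, String> withNull = new HashMap<>();
        withNull.put("key", null);
        try {
            Map.copyOf(withNull);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> source = new HashMap<>();
        source.put("pets", new ArrayList<>(List.of("cat")));

        Map<String, List<String>> viaCopyOf = Map.copyOf(source);
        Map<String, List<String>> viaWrapper =
                Collections.unmodifiableMap(new HashMap<>(source));

        // Mutate the value object itself, not the map.
        source.get("pets").add("dog");

        System.out.println(viaCopyOf.get("pets"));    // [cat, dog]
        System.out.println(viaWrapper.get("pets"));   // [cat, dog]
        System.out.println(copyOfRejectsNullValue()); // true
    }
}
```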

Is there any data structure that has no duplicates but can have elements added to it while being iterated over?

I know a set has no duplicates but the issue is that I can't add elements to it while iterating over it using an iterator or for each loop. Is there any other way? Thank you.
The ConcurrentHashMap class can be used for this. For example:
Set<T> set = Collections.newSetFromMap(new ConcurrentHashMap<T, Boolean>());
(You can replace <T, Boolean> with <> and let the compiler infer the types. I wrote it as above for illustrative purposes.)
The Collections::newSetFromMap javadoc says:
Returns a set backed by the specified map. The resulting set displays the same ordering, concurrency, and performance characteristics as the backing map. In essence, this factory method provides a Set implementation corresponding to any Map implementation.
Since ConcurrentHashMap allows simultaneous iteration and updates, so does the Set produced as above. The catch is that an iteration may not see the effect of additions or removals made while iterating.
The concurrency properties of iteration can be inferred from the javadoc for ConcurrentHashMap.
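A small sketch of this, assuming a hypothetical expand method: elements are added to the set while a for-each loop is running over it, and no ConcurrentModificationException is thrown. Whether the loop also visits the newly added elements is not guaranteed, so the example is written so that the final result does not depend on it.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class AddWhileIterating {
    // For every element below 10, adds (element + 10) to the same set during iteration.
    static Set<Integer> expand(List<Integer> seed) {
        Set<Integer> set = Collections.newSetFromMap(new ConcurrentHashMap<>());
        set.addAll(seed);
        for (Integer n : set) {
            if (n < 10) {
                set.add(n + 10); // allowed: the iterator is weakly consistent
            }
        }
        return set;
    }

    public static void main(String[] args) {
        Set<Integer> result = expand(List.of(1, 2, 3));
        System.out.println(result.size());       // 6
        System.out.println(result.contains(13)); // true
    }
}
```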
Is there any other way.
It depends on your requirements, but there are potentially ways to avoid the problem. For example, you could:
copy the set before iterating it, OR
add the new elements to a separate new set, and add the existing elements to that new set after ... or while ... iterating.
However, these are unlikely to work without a concurrency bottleneck (e.g. 1.) or a difference in behavior (e.g. 2.).
Not sure whether below approach fixes your problem but you can try it:
HashSet<Integer> original = new HashSet<>();
HashSet<Integer> elementsToAdd = new HashSet<>();
elementsToAdd.add(element); //while iterating original
original.addAll(elementsToAdd); //Once done with iterating, just add all.
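Spelled out as a complete method, the idea might look like the sketch below. The rule used to produce new elements (doubling each value) is purely a placeholder for whatever the real loop would add.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DeferredAdd {
    // Collects additions in a second set while iterating, then merges them afterwards.
    static Set<Integer> withDoubles(Set<Integer> original) {
        Set<Integer> elementsToAdd = new HashSet<>();
        for (Integer element : original) {
            elementsToAdd.add(element * 2); // placeholder rule; original is not touched here
        }
        original.addAll(elementsToAdd);     // safe: iteration has finished
        return original;
    }

    public static void main(String[] args) {
        Set<Integer> result = withDoubles(new HashSet<>(List.of(1, 2, 3)));
        System.out.println(result.size()); // 5, i.e. the elements 1, 2, 3, 4, 6
    }
}
```

Note that, unlike the ConcurrentHashMap-backed set, the loop never sees the elements added this way.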

how to make access to a value in a Java Hashmap synchronized?

Let's say I have a Java Hashmap where the keys are strings or whatever, and the values are lists of other values, for example
Map<String,List<String>> myMap=new HashMap<String,List<String>>();
//adding value to it would look like this
myMap.put("catKey", new ArrayList<String>(){{add("catValue1");}} );
If we have many threads adding and removing values from the lists (not changing the keys just the values of the Hashmap) is there a way to make the access to the lists only threadsafe? so that many threads can edit many values in the same time?
Use a synchronized or concurrent list implementation instead of ArrayList, e.g.
new Vector() (synchronized)
Collections.synchronizedList(new ArrayList<>()) (synchronized wrapper)
new CopyOnWriteArrayList<>() (concurrent)
new ConcurrentLinkedDeque<>() (concurrent, not a List)
The last is not a list, but is useful if you don't actually need access-by-index (i.e. random access), because it performs better than the others.
Note, since you likely need concurrent insertion of the initial empty list into the map for a new key, you should use a ConcurrentHashMap for the Map itself, instead of a plain HashMap.
Recommendation
Map<String, Deque<String>> myMap = new ConcurrentHashMap<>();
// Add new key/value pair
String key = "catKey";
String value = "catValue1";
myMap.computeIfAbsent(key, k -> new ConcurrentLinkedDeque<>()).add(value);
The above code is fully thread-safe when adding a new key to the map, and fully thread-safe when adding a new value to a list. The code doesn't spend time obtaining synchronization locks, and doesn't suffer the degradation that CopyOnWriteArrayList has when the list grows large.
The only problem is that it uses a Deque, not a List. In practice, though, most code that uses a List could just as easily use a Deque and only specifies a List out of habit, so this is likely an acceptable change.
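To see the recommendation under load, here is a hedged sketch in which several threads append to the same key at once. The class name, the fill method, the thread count and the 30-second timeout are all assumptions made for the example.

```java
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class MultiValueMapDemo {
    // Starts `threads` workers that each add `perThread` values under one key;
    // returns how many values ended up stored for that key.
    static int fill(int threads, int perThread) {
        Map<String, Deque<String>> myMap = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    myMap.computeIfAbsent("catKey", k -> new ConcurrentLinkedDeque<>())
                         .add("value-" + id + "-" + i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return myMap.get("catKey").size();
    }

    public static void main(String[] args) {
        System.out.println(fill(4, 1000)); // 4000: no additions are lost
    }
}
```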
There is a ConcurrentHashMap class which implements ConcurrentMap which can be used for thread-safe Map handling. compute, putIfAbsent, merge, all thread-safely handle multiple things trying to affect the same value at once.
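As one illustration of these atomic methods, the sketch below uses merge to maintain a per-key counter from several threads. The counter use case and all names in it are assumptions chosen for the example, not something taken from the question.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class MergeCounter {
    // Each worker increments the same key; merge() applies every update atomically.
    static int countConcurrently(int threads, int incrementsPerThread) {
        ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counts.merge("catKey", 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counts.get("catKey");
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(4, 1000)); // 4000
    }
}
```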
Firstly, use a ConcurrentHashMap, which locks only the affected bucket during an update rather than the whole map.
Secondly, atomic methods must be used; otherwise, between one thread's call to get and its call to put, another thread can call put for the same key. Like below:
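// wrong: another thread can insert a value between get() and put()
if (myMap.get("catKey") == null) {
    myMap.put("catKey", new ArrayList<String>(){{add("catValue1");}});
}
// correct: compute() performs the check and the update atomically
myMap.compute("catKey", (key, value) -> {
    if (value == null) {
        return new ArrayList<String>(){{add("catValue1");}};
    }
    return value;
});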

Copying sets Java

Is there a way to copy a TreeSet? That is, is it possible to go
Set <Item> itemList;
Set <Item> tempList;
tempList = itemList;
or do you have to physically iterate through the sets and copy them one by one?
Another way to do this is to use the copy constructor:
Collection<E> oldSet = ...
TreeSet<E> newSet = new TreeSet<E>(oldSet);
Or create an empty set and add the elements:
Collection<E> oldSet = ...
TreeSet<E> newSet = new TreeSet<E>();
newSet.addAll(oldSet);
Unlike clone these allow you to use a different set class, a different comparator, or even populate from some other (non-set) collection type.
Note that the result of copying a Set is a new Set containing references to the objects that are elements of the original Set. The element objects themselves are not copied or cloned. This conforms with the way that the Java Collection APIs are designed to work: they don't copy the element objects.
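For instance, the flexibility mentioned above could be used as in this sketch, which copies from a List (not a Set) into a TreeSet with a different comparator. The class and method names are hypothetical.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

final class CopySetDemo {
    // Copies any collection into a TreeSet sorted in reverse natural order.
    static TreeSet<String> reversedCopy(Collection<String> source) {
        TreeSet<String> copy = new TreeSet<>(Comparator.reverseOrder());
        copy.addAll(source);
        return copy;
    }

    public static void main(String[] args) {
        // The source is a List with a duplicate; the copy is a sorted set without it.
        TreeSet<String> copy = reversedCopy(List.of("b", "a", "c", "a"));
        System.out.println(copy); // [c, b, a]
    }
}
```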
Starting from Java 10:
Set<E> oldSet = Set.of();
Set<E> newSet = Set.copyOf(oldSet);
Set.copyOf() returns an unmodifiable Set containing the elements of the given Collection.
The given Collection must not be null, and it must not contain any null elements.
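Both properties can be checked with a short sketch like the following (the helper names are made up for illustration):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SetCopyOfDemo {
    // Returns true if adding to the given set is refused.
    static boolean isUnmodifiable(Set<String> set) {
        try {
            set.add("x");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    // Returns true if Set.copyOf refuses a collection that contains null.
    static boolean rejectsNullElement() {
        try {
            Set.copyOf(Arrays.asList("a", null));
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        Set<String> oldSet = new HashSet<>(List.of("a", "b"));
        Set<String> newSet = Set.copyOf(oldSet);
        oldSet.add("c"); // does not affect the copy

        System.out.println(newSet.size());           // 2
        System.out.println(isUnmodifiable(newSet));  // true
        System.out.println(rejectsNullElement());    // true
    }
}
```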
With Java 8 you can use stream and collect to copy the items:
Set<Item> newSet = oldSet.stream().collect(Collectors.toSet());
Or you can collect to an ImmutableSet (if you know that the set should not change):
Set<Item> newSet = oldSet.stream().collect(ImmutableSet.toImmutableSet());
Java 8+:
Set<String> copy = new HashSet<>(mySet);
The copy constructor given by @Stephen C is the way to go when you have a Set you created (or when you know where it comes from).
When it comes from a Map.entrySet(), it will depend on the Map implementation you're using:
findbugs says
The entrySet() method is allowed to return a view of the underlying
Map in which a single Entry object is reused and returned during the
iteration. As of Java 1.6, both IdentityHashMap and EnumMap did so.
When iterating through such a Map, the Entry value is only valid until
you advance to the next iteration. If, for example, you try to pass
such an entrySet to an addAll method, things will go badly wrong.
As addAll() is called by the copy constructor, you might find yourself with a Set of only one Entry: the last one.
Not all Map implementations do that though, so if you know your implementation is safe in that regard, the copy constructor definitely is the way to go. Otherwise, you'd have to create new Entry objects yourself:
Set<Entry<K,V>> copy = new HashSet<>(map.size());
for (Entry<K,V> e : map.entrySet())
    copy.add(new java.util.AbstractMap.SimpleEntry<K,V>(e));
Edit: Based on tests I performed on Java 7 and Java 6u45 (thanks to Stephen C), the findbugs comment no longer seems to apply. It might have been the case on earlier versions of Java 6 (before u45), but I don't have any to test.
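A self-contained version of the entry-copying loop might look like this sketch; detachedEntries is a hypothetical helper name. Because each SimpleEntry holds its own key and value, later updates to the map do not change the copied entries.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class EntryCopyDemo {
    // Copies every entry into a standalone SimpleEntry that is not tied to the map.
    static <K, V> Set<Map.Entry<K, V>> detachedEntries(Map<K, V> map) {
        Set<Map.Entry<K, V>> copy = new HashSet<>(map.size());
        for (Map.Entry<K, V> e : map.entrySet()) {
            copy.add(new AbstractMap.SimpleEntry<K, V>(e));
        }
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);

        Set<Map.Entry<String, Integer>> detached = detachedEntries(map);
        map.put("a", 2); // update the map after copying

        System.out.println(detached.iterator().next().getValue()); // 1
        System.out.println(map.get("a"));                          // 2
    }
}
```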

modifying a ConcurrentHashMap and Synchronized ArrayList in same method

I have a collection of objects that is modified by one thread and read by another (more specifically the EDT). I needed a solution that gave me fast lookup and also fast indexing (by insertion order), so I'm using a ConcurrentHashMap with an accompanying ArrayList of the keys. If I want to index an entry, I can index the List for the key and then use the returned key to get the value from the hash map. So I have a wrapper class that makes sure that when an entry is added, the mapping is added to the hash map and the key is added to the list at the same time, and similarly for removal.
I'm posting an example of the code in question:
private List<K> keys = Collections.synchronizedList(new ArrayList<K>(INITIAL_CAPACITY));
private ConcurrentMap<K, T> entries = new ConcurrentHashMap<K, T>(INITIAL_CAPACITY, .75f);

public synchronized T getEntryAt(int index){
    return entries.get(keys.get(index));
}

public synchronized void addOrReplaceEntry(K key, T value){
    T result = entries.get(key);
    if(result == null){
        entries.putIfAbsent(key, value);
        keys.add(key);
    }
    else{
        entries.replace(key, value);
    }
}

public synchronized void removeEntry(K key, T value){
    keys.remove(key);
    entries.remove(key, value);
}

public synchronized int getSize(){
    return keys.size();
}
My question is: am I losing all the benefits of using the ConcurrentHashMap (over a synchronized hashmap) by operating on it in synchronized methods? I have to synchronize the methods to safely modify/read from the ArrayList of keys (CopyOnWriteArrayList is not an option because a lot of modification happens...). Also, if you know of a better way to do this, that would be appreciated...
Yes, using a concurrent collection and a synchronized collection only inside synchronized blocks is a waste. You won't get the benefits of ConcurrentHashMap because only one thread will be accessing it at a time.
You could have a look at this implementation of a concurrent linked hashmap; I haven't used it, so I can't attest to its features.
One thing to consider would be to switching from synchronized blocks to a ReadWriteLock to improve concurrent read only performance.
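A rough sketch of what the ReadWriteLock variant could look like is given below. It is an assumption-laden illustration rather than a drop-in replacement: the class name IndexedMap is invented, plain ArrayList and HashMap are used because every access is guarded by the lock, and null values are assumed not to occur.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class IndexedMap<K, T> {
    private final List<K> keys = new ArrayList<>();
    private final Map<K, T> entries = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Readers share the read lock, so several threads can look up concurrently.
    T getEntryAt(int index) {
        lock.readLock().lock();
        try {
            return entries.get(keys.get(index));
        } finally {
            lock.readLock().unlock();
        }
    }

    // Writers take the exclusive write lock; list and map change together.
    void addOrReplaceEntry(K key, T value) {
        lock.writeLock().lock();
        try {
            if (entries.put(key, value) == null) {
                keys.add(key); // new key: remember its insertion position
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    boolean removeEntry(K key) {
        lock.writeLock().lock();
        try {
            if (entries.remove(key) == null) {
                return false;
            }
            keys.remove(key);
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    int getSize() {
        lock.readLock().lock();
        try {
            return keys.size();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Removal from the key list is still linear in the list size; the lock only changes how many readers can proceed at once.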
I'm not really sure of the utility of providing a remove-at-index method; perhaps you could give some more details about the problem you are trying to solve?
It seems that you only care about finding values by index. If so, dump the Map and just use a List. Why do you need the Map?
Mixing synchronized and concurrent collections the way you have done it is not recommended. Any reason why you are maintaining two copies of the stuff you are interested in? You can easily get a list of all the keys from the map anytime rather than maintaining a separate list.
Why not store the values in the list and in the map the key -> index mapping?
So for getEntry you only need one lookup (in the list, which should be faster than a map anyway), and for remove you do not have to traverse the whole list. Synchronization works the same way as before.
You can get all access to the List keys onto the event queue using EventQueue.invokeLater. This will get rid of the synchronization. With all the synching you were not running much in parallel anyway. Also it means the getSize method will give the same answer for the duration of an event.
If you stick with synchronization instead of using invokeLater, at least get the entries hash table out of the synch block. Either way, you get more parallel processing. Of course, entries can now become out-of-synch with keys. The only down side is sometimes a key will come up with a null entry. With such a dynamic table this is unlikely to matter much.
Using the suggestion made by chrisichris to put the values in the list will solve this problem if it is one. In fact, this puts a nice wall between keys and entries; they are now used in completely separate ways. (If your only need for entries is to provide values to the JTable, you can get rid of it.) But entries (if still needed) should reference the entries, not contain an index; maintaining indexes there would be a hopeless task. And always remember that keys and entries are snapshots of "reality" (for lack of a better word) taken at different times.
