I'm new to Java, and I recently learnt that it is sometimes important to deep-copy a Collection and make an unmodifiable view of the copy so that the data inside remains safe and unchanged.
When I try to practice this (unmodifiableMap2), I get a warning from IDEA:
unmodifiableMap Can be replaced with 'Map.copyOf' call
That seems odd to me, because I think unmodifiableMap is more than just a copy of the underlying map. Besides, when I create the same unmodifiable map in another way (unmodifiableMap1), the warning doesn't pop up!
How should I understand this behavior of IDEA?
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
public class test {
public static void main(String[] args) {
Map<Integer, Integer> map = new HashMap<>();
map.put(1,1);
map.put(2,2);
Map<Integer, Integer> map1 = new HashMap<>(map);
Map<Integer, Integer> unmodifiableMap1 = Collections.unmodifiableMap(map1);
Map<Integer, Integer> unmodifiableMap2 = Collections.unmodifiableMap(new HashMap<>(map));
}
}
Map.copyOf() makes a copy of the given Map instance, but it requires that no key or value in the map is null; otherwise it throws a NullPointerException. Usually, this is the case, but it is not a strict requirement for a Map in general.
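To illustrate the null restriction, here is a minimal sketch. The class and method names are made up for this example; only Map.copyOf and Collections.unmodifiableMap come from the JDK.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class NullHandlingDemo {
    // Returns true if Map.copyOf rejects the given map with a NullPointerException.
    static boolean copyOfRejects(Map<String, String> source) {
        try {
            Map.copyOf(source);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, String> withNullValue = new HashMap<>();
        withNullValue.put("key", null);

        // The wrapper accepts whatever the backing HashMap contains, nulls included.
        Map<String, String> wrapped = Collections.unmodifiableMap(new HashMap<>(withNullValue));
        System.out.println(wrapped.containsKey("key"));

        // Map.copyOf refuses the same content.
        System.out.println(copyOfRejects(withNullValue));
    }
}
```

So if a map may legitimately contain nulls, the suggested replacement is not a drop-in change.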
java.util.Collections.unmodifiableMap() just wraps a reference to the given Map instance. This means that the receiver is unable to modify the map, but modifications to the original map (that one that was the argument to unmodifiableMap()) are visible to the receiver.
Assuming we have two threads, one iterates over the unmodifiable map, while the other modifies the original one. As a result, you may get a ConcurrentModificationException for an operation on the unmodifiable map … not funny to debug that thing!
This cannot happen with the copy created by Map.copyOf(). But this has a price: with a copy, you need two times the amount of memory for the map (roughly). For really large maps, this may cause memory shortages up to an OutOfMemoryError. Also not fun to debug!
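The scenario above involves two threads, but the same fail-fast behavior can be reproduced in a single thread, which makes it easier to see. The following is only a sketch with invented names; it relies on HashMap iterators detecting a structural change made during iteration, which the JDK documents as best-effort behavior.

```java
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

class FailFastViewDemo {
    // Builds a small map with three entries.
    static Map<Integer, Integer> sample() {
        Map<Integer, Integer> map = new HashMap<>();
        map.put(1, 1);
        map.put(2, 2);
        map.put(3, 3);
        return map;
    }

    // Iterates over `iterated` while adding an entry to `original`.
    // Returns true if the iteration failed with a ConcurrentModificationException.
    static boolean iterationFails(Map<Integer, Integer> iterated, Map<Integer, Integer> original) {
        try {
            for (Integer key : iterated.keySet()) {
                original.put(-1, -1); // structural change to the original on the first pass
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<Integer, Integer> first = sample();
        // The view shares the original's iterator state, so the change breaks the loop.
        System.out.println(iterationFails(Collections.unmodifiableMap(first), first));

        Map<Integer, Integer> second = sample();
        // The copy is independent of the original, so the loop completes.
        System.out.println(iterationFails(Map.copyOf(second), second));
    }
}
```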
In addition, just wrapping the existing map is presumably much faster than copying it.
So there is no best solution in general, but for most scenarios, I have a preference for using Map.copyOf() when I need an unmodifiable map.
The sample in the question did not wrap the original Map instance, but it makes a copy before wrapping it (either in a line of its own, or on the fly). This eliminates the potential problem with the 'under-the-hood' modification, but may bring the memory issue.
From my experience so far, Map.copyOf( map ) looks to be more efficient than Collections.unmodifiableMap( new HashMap( map ) ).
By the way: Map.copyOf() returns a map that resembles a HashMap; when you copy a TreeMap with it, the sort order gets lost, while the wrapping with unmodifiableMap() keeps the underlying Map implementation and therefore also the sort order. So when this is important, you can use Collections.unmodifiableMap( new TreeMap( map ) ), while Map.copyOf() does not work here.
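A small sketch of the ordering point, with hypothetical class and method names. The iteration order of the map returned by Map.copyOf is unspecified, so the example prints it without claiming any particular result.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SortedCopyDemo {
    // Copies the source into a TreeMap (natural key order) and wraps the copy.
    static List<String> keysOfSortedCopy(Map<String, Integer> source) {
        Map<String, Integer> sortedCopy = Collections.unmodifiableMap(new TreeMap<>(source));
        return new ArrayList<>(sortedCopy.keySet());
    }

    public static void main(String[] args) {
        Map<String, Integer> source = new TreeMap<>();
        source.put("cherry", 3);
        source.put("apple", 1);
        source.put("banana", 2);

        // Keys come back in sorted order because the wrapped copy is still a TreeMap.
        System.out.println(keysOfSortedCopy(source));

        // Iteration order of this copy is unspecified and may differ between runs.
        System.out.println(Map.copyOf(source).keySet());
    }
}
```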
An unmodifiable map using an existing reference to a map is perfectly fine, and there are many reasons you might want to do this.
Consider this class:
class Foo {
private final Map<String, String> fooMap = new HashMap<>();
// some methods which mutate the map
public Map<String, String> getMap() {
return Collections.unmodifiableMap(fooMap);
}
}
What this class does is provide a read-only view of the map it encapsulates. The class can be sure that clients who consume the map cannot alter it, they can just see its contents. They will also be able to see any updates to the entries if they keep hold of the reference for some time.
If we had tried to expose a read-only view by copying the map, it would take time and memory to perform the copy and the client would not see any changes because both maps are then distinct instances - the source and the copy.
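The difference between the two ways of exposing the map can be sketched as follows. Registry is a hypothetical stand-in for Foo with one mutator added, and snapshot() exists only to contrast with the view.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the Foo class, with one mutator added for the demo.
class Registry {
    private final Map<String, String> entries = new HashMap<>();

    void register(String key, String value) {
        entries.put(key, value);
    }

    // Live read-only view: no copying, later updates stay visible to the client.
    Map<String, String> view() {
        return Collections.unmodifiableMap(entries);
    }

    // Independent snapshot: costs a copy, later updates are not visible.
    Map<String, String> snapshot() {
        return Map.copyOf(entries);
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        Map<String, String> view = registry.view();
        Map<String, String> snapshot = registry.snapshot();

        registry.register("a", "1");

        System.out.println(view.size());     // the view reflects the new entry
        System.out.println(snapshot.size()); // the snapshot was taken while the map was empty
    }
}
```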
However in the case of this:
Collections.unmodifiableMap(new HashMap<>(map));
You are first copying the map into a new hash map and then passing that copy into Collections.unmodifiableMap. The result is effectively constant. You do not have a reference to the copy you created with new HashMap<>(map), and nor can you get one*.
If what you want is a constant map, then Map.copyOf is a more concise way of achieving that, so IntelliJ suggests you should use that instead.
In the first case, since the reference to the map already exists, IntelliJ cannot make the same inference about your intent so it gives no such suggestion.
You can see the IntelliJ ticket for this feature if you like, though it doesn't explain why the two are essentially equivalent, just that they are.
* well, you probably could via reflection, but IntelliJ is assuming that you won't
For most purposes, Map.copyOf(map) is equivalent to Collections.unmodifiableMap(new HashMap<>(map)); the main difference is that Map.copyOf rejects null keys and values.
Neither does any kind of deep copying. But it's strictly shorter to write Map.copyOf(map).
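To make the "no deep copying" point concrete, here is a minimal sketch (names invented for the example). The copied map holds the same value objects as the source, so mutating a value is visible through the copy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ShallowCopyDemo {
    // Copies a map whose value is a mutable list, then mutates the list.
    // Returns what the copy reports for the first list element afterwards.
    static int firstElementSeenThroughCopy() {
        List<Integer> values = new ArrayList<>();
        values.add(1);

        Map<String, List<Integer>> source = new HashMap<>();
        source.put("k", values);

        Map<String, List<Integer>> copy = Map.copyOf(source);

        values.set(0, 42); // mutate the shared value object, not the map

        return copy.get("k").get(0);
    }

    public static void main(String[] args) {
        System.out.println(firstElementSeenThroughCopy()); // 42: the list is shared
    }
}
```

If the values themselves must be protected, they need to be immutable or copied individually.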
I have the following code. I am trying to understand if it would make any changes to memory.
Approach 1: Using collectors I can directly return map like so:
List<Customer> customerList = new ArrayList<>();
customerList.add(new Customer("1", "pavan"));
customerList.add(new Customer("2", "kumar"));
return customerList.stream().collect(Collectors.toMap(t->t.getId(), t->t));
Approach 2: Using an explicit map to collect results, like so:
Map<String,Customer> map = new HashMap<String, Customer>();
map = customerList.stream().collect(Collectors.toMap(t->t.getId(), t->t));
return map;
Compared to the first, does the second approach make any difference to memory/ GC, if I iterate over a million times?
Aside from instantiating a Map instance that you don't need in the second example, both pieces of code are identical. You immediately replace the Map reference you created with the one returned by the stream. Most likely the compiler would eliminate that as redundant code.
The collect method of the streams API will instantiate a Map for you; the code has been well optimised, which is one of the advantages of using the Stream API over doing it yourself.
To answer your specific question, you can iterate over both sections of code as many times as you like and it won't make any difference to the GC impact.
The code is by far not identical; specifically, the documentation of Collectors.toMap says this about the Map it returns:
There are no guarantees on the type, mutability, serializability, or thread-safety of the Map returned.
There are absolutely no guarantees whatsoever that the returned Map is actually a HashMap. It could be any other Map implementation, so treating it as a HashMap is just wrong.
Then there is the way you build the Collector. It could be simplified to:
customerList.stream().collect(Collectors.toMap(Customer::getId, Function.identity()));
The method reference Customer::getId, as opposed to the lambda expression, will create one less method (since lambda expressions are de-sugared to methods and method references are not).
Also, Function.identity() instead of t -> t will create fewer objects if used in multiple places.
Then there is the way a HashMap works internally. If you don't specify an initial capacity, it might have to resize, which is an expensive operation. By default, Collectors.toMap starts with a HashMap that has the default capacity of 16 and a load factor of 0.75, which means you can put 12 entries into it before the first resize.
You can't avoid that with the two-argument Collectors.toMap used here, since its supplier always starts from HashMap::new, with the default capacity of 16 and load factor of 0.75.
Knowing this, you can drop the stream entirely:
Map<String, Customer> map = new HashMap<String, Customer>((int)Math.ceil(customerList.size() / 0.75));
customerList.forEach(x -> map.put(x.getId(), x));
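If staying with streams is preferred, one option is the four-argument Collectors.toMap overload, which accepts a map supplier and so allows a pre-sized HashMap. The sketch below assumes a minimal Customer class standing in for the one in the question; the merge function simply rejects duplicate ids.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Minimal stand-in for the question's Customer class (assumed shape).
class Customer {
    private final String id;
    private final String name;

    Customer(String id, String name) {
        this.id = id;
        this.name = name;
    }

    String getId() { return id; }
    String getName() { return name; }
}

class PresizedToMap {
    static Map<String, Customer> byId(List<Customer> customers) {
        // Same sizing rule as above: enough capacity to avoid a resize.
        int capacity = (int) Math.ceil(customers.size() / 0.75);
        return customers.stream().collect(Collectors.toMap(
                Customer::getId,
                Function.identity(),
                (first, second) -> { throw new IllegalStateException("duplicate id"); },
                () -> new HashMap<String, Customer>(capacity)));
    }

    public static void main(String[] args) {
        List<Customer> customers = List.of(new Customer("1", "pavan"), new Customer("2", "kumar"));
        System.out.println(byId(customers).keySet());
    }
}
```

Because the supplier is explicit, the returned map is known to be a HashMap, which also sidesteps the "no guarantees on the type" caveat.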
In the second approach, you are instantiating a Map, and reassigning the reference to the one returned by the call to stream.collect().
Obviously, the first Map object referenced by "map" is lost.
The first approach does not have this problem.
In short, yes, this makes a minor difference in terms of memory usage, but it is likely negligible considering you have a million entries to iterate over.
The Javadoc says that the return values of values() and entrySet() are backed by the map, so changes to the map are reflected in the set and vice versa. I don't want this to happen to my static copy. Essentially, I want lots of concurrent operations to be done on my data structure, but in some cases I want to iterate over a static snapshot of it, as I am assuming that iterating over a static snapshot will be faster than iterating over a version that is being updated concurrently.
Just make a copy, and it won't be changed.
Set<K> keySetCopy = new HashSet<>(map.keySet());
List<V> valuesCopy = new ArrayList<>(map.values());
By convention, the general-purpose collection implementations have a copy constructor that copies the entire contents of the supplied collection into the newly created one, without being backed by the original.
Note: this won't work with entrySet(), as the actual Map Entries will still "belong" to the original Map and changes to the original entries will be reflected in your copies. In case you need the entrySet(), you should copy the entire Map first, with the same technique.
Set<Entry<K,V>> entrySetCopy = new HashMap<>(map).entrySet();
Note that all of these will require a full iteration ONCE (in the constructor) and will only then be static snapshots. There is no way around this limitation, to my knowledge.
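As a rough illustration of the snapshot idea, assuming the live structure is a ConcurrentHashMap (the question does not say which type is used) and using invented names:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SnapshotDemo {
    // Copies the live map once; the result is not backed by the original.
    static Map<String, Integer> snapshotOf(Map<String, Integer> live) {
        return new HashMap<>(live);
    }

    public static void main(String[] args) {
        Map<String, Integer> live = new ConcurrentHashMap<>();
        live.put("a", 1);

        Map<String, Integer> snapshot = snapshotOf(live);

        // Later writes to the live map do not reach the snapshot.
        live.put("a", 10);
        live.put("b", 2);

        System.out.println(snapshot); // still only the entry copied earlier
        System.out.println(live.size());
    }
}
```

Note that if other threads write to a ConcurrentHashMap while the copy is being made, its iterators are only weakly consistent, so the snapshot may or may not include those in-flight updates.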
Simply make a copy; the new collection will be independent of the original one.
Set<K> keySetCopy = new HashSet<>(map.keySet());
List<V> valuesCopy = new ArrayList<>(map.values());
However, keep in mind that this takes one full iteration over the concurrent structure, and only after that do you have static snapshots. So you will need time equivalent to one full iteration.
Context
I need to return a reference to a map that I'm using for a data cache, and I'd like to make sure nobody can modify their reference.
Question
I've seen lots of references to UnmodifiableMap and ImmutableMap online, but I don't see anything comparing/contrasting them. I figure there is a good reason that Google/Guava created their own version - can someone tell me what it is?
An unmodifiable map may still change. It is only a view on a modifiable map, and changes in the backing map will be visible through the unmodifiable map. The unmodifiable map only prevents modifications for those who only have the reference to the unmodifiable view:
Map<String, String> realMap = new HashMap<String, String>();
realMap.put("A", "B");
Map<String, String> unmodifiableMap = Collections.unmodifiableMap(realMap);
// This is not possible: It would throw an
// UnsupportedOperationException
//unmodifiableMap.put("C", "D");
// This is still possible:
realMap.put("E", "F");
// The change in the "realMap" is now also visible
// in the "unmodifiableMap". So the unmodifiableMap
// has changed after it has been created.
unmodifiableMap.get("E"); // Will return "F".
In contrast to that, the ImmutableMap of Guava is really immutable: It is a true copy of a given map, and nobody may modify this ImmutableMap in any way.
Update:
As pointed out in a comment, an immutable map can also be created with the standard API using
Map<String, String> immutableMap =
Collections.unmodifiableMap(new LinkedHashMap<String, String>(realMap));
This will create an unmodifiable view on a true copy of the given map, and thus nicely emulates the characteristics of the ImmutableMap without having to add the dependency to Guava.
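A short sketch of that idiom, using a hypothetical helper name. Using LinkedHashMap for the copy also keeps the iteration order of the source.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

class OrderedConstantMap {
    // Copies the source into a private LinkedHashMap and exposes only a read-only wrapper.
    static <K, V> Map<K, V> frozenCopy(Map<K, V> source) {
        return Collections.unmodifiableMap(new LinkedHashMap<>(source));
    }

    public static void main(String[] args) {
        Map<String, String> realMap = new LinkedHashMap<>();
        realMap.put("z", "first");
        realMap.put("a", "second");

        Map<String, String> frozen = frozenCopy(realMap);

        realMap.put("m", "third"); // not visible through the frozen copy

        System.out.println(frozen.keySet()); // same order as the source had when copied
        System.out.println(frozen.containsKey("m"));
    }
}
```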
Have a look at the ImmutableMap JavaDoc.
It says the following about this:
Unlike Collections.unmodifiableMap(java.util.Map), which is a view of a separate map which can still change, an instance of ImmutableMap contains its own data and will never change. ImmutableMap is convenient for public static final maps ("constant maps") and also lets you easily make a "defensive copy" of a map provided to your class by a caller.
Guava Documentation
The JDK provides Collections.unmodifiableXXX methods, but in our opinion, these can be
unwieldy and verbose; unpleasant to use everywhere you want to make defensive copies
unsafe: the returned collections are only truly immutable if nobody holds a reference to the original collection
inefficient: the data structures still have all the overhead of mutable collections, including concurrent modification checks, extra space in hash tables, etc.
ImmutableMap does not accept null values whereas Collections.unmodifiableMap() does. In addition it will never change after construction, while UnmodifiableMap may. From the JavaDoc:
An immutable, hash-based Map with reliable user-specified iteration order. Does not permit null keys or values.
Unlike Collections.unmodifiableMap(java.util.Map), which is a view of a separate map which can still change, an instance of ImmutableMap contains its own data and will never change. ImmutableMap is convenient for public static final maps ("constant maps") and also lets you easily make a "defensive copy" of a map provided to your class by a caller.
I would like to add copies of a propertyMap to my propertyMap:
public void addProperties(Map<String, Object> propertyMap) {
for (Map.Entry<String, Object> propertyEntry : propertyMap.entrySet()) {
this.propertyMap.put(propertyEntry.getKey(), propertyEntry.getValue());
}
}
The code above does not do that but hopefully conveys the intent?
What's the best way to do this? I have done some reading on "cloning", "defensive copying", "immutable objects", Collections.unmodifiable... and the like but I am more confused than before.
All I need, in typical SO style, is a better way to write what I mean in the code snippet, please.
It looks like you can just use putAll:
public void addProperties(Map<String, Object> propertyMap) {
this.propertyMap.putAll(propertyMap);
}
This is called "defensive copying". What happens here is that the entries of the given propertyMap are copied into the instance's propertyMap. A weakness here is that later changes to the given propertyMap aren't going to be reflected in the instance's propertyMap. This essentially creates a snapshot of the given map and copies that snapshot into the instance field map.
There are other ways of creating defensive copies as well, including clone() and the HashMap(Map) constructor.
For immutable collections, the unmodifiable methods in Collections will return collections that throw exceptions when you try to add to them. For example,
Set<String> strs = Collections.unmodifiableSet(new HashSet<String>());
strs.add("Error"); // This line throws an exception
Immutable collections protect their values by disallowing modification (removing and adding) while defensive copies protect their values by not referencing the copied collection (in other words, changes in the original collection aren't shown in the copy).
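The two ideas from this answer can be combined in one small class. This is only a sketch: PropertyHolder and getProperties are names invented here, and the copy is shallow, so mutable values are still shared with the caller.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class PropertyHolder {
    private final Map<String, Object> propertyMap = new HashMap<>();

    // Defensive copy on the way in: later changes to `incoming` do not affect this object.
    public void addProperties(Map<String, Object> incoming) {
        this.propertyMap.putAll(incoming);
    }

    // Read-only view on the way out: callers cannot add or remove entries.
    public Map<String, Object> getProperties() {
        return Collections.unmodifiableMap(propertyMap);
    }

    public static void main(String[] args) {
        Map<String, Object> callerMap = new HashMap<>();
        callerMap.put("a", 1);

        PropertyHolder holder = new PropertyHolder();
        holder.addProperties(callerMap);

        callerMap.put("b", 2);  // not seen by the holder
        callerMap.remove("a");  // not seen by the holder either

        System.out.println(holder.getProperties().keySet());
    }
}
```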
I think you don't have to worry about making copies of the keys, because they are immutable. But for the values, it depends on what type of objects they are. If they are mutable objects, then you have to make copies of all of them.
public void addProperties(Map<String, Object> propertyMap) {
Cloner cloner = new Cloner();
for (Map.Entry<String, Object> propertyEntry : propertyMap.entrySet()) {
this.propertyMap.put(propertyEntry.getKey(), cloner.deepClone(propertyEntry.getValue()));
}
}
For deep cloning, you can check this question: Deep clone utility recommendation.
From the homepage http://code.google.com/p/cloning/
IMPORTANT : deep cloning of Java classes might mean thousands of objects are cloned! Also cloning of files and streams might make the JVM crash. Enable dumping of cloned classes to stdout during development is highly recommended in order to view what is cloned.
So, it's good to know what you are trying to clone.
Preface: I know that in most cases using a volatile field won't yield any measurable performance penalty, but this question is more theoretical and targeted towards a design with extremely high concurrency support.
I've got a field that is a List<Something> which is filled after construction. To save some performance, I would like to convert the List into a read-only Map. Doing so at any point requires at least a volatile Map field to make the changes visible to all threads.
I was thinking of doing the following:
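Map<Object, Something> map;

public Something get(Object key) {
    if (map == null) {
        // Build the map locally first, then publish it through the field.
        Map<Object, Something> temp = new HashMap<>();
        for (Something value : super.getList()) {
            temp.put(value.getKey(), value);
        }
        map = temp; // plain (non-volatile) write: this is the publication in question
    }
    return map.get(key);
}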
This could cause multiple threads to generate the map, even if they enter the get block in a serialized way. That would be no big issue if the threads simply work on different but identical instances of the map. What worries me more is:
Is it possible that one thread assigns the new temp map to the map field, and then a second thread sees that map!=null and therefore accesses the map field without generating a new one, but to my surprise finds that the map is empty, because the put operations were not yet pushed to some shared memory area?
Answers to comments:
The threads only modify the temporary map; after that, it is read-only.
I must convert a List to a Map because of some special JAXB setup which doesn't make it feasible to have a Map to begin with.
Is it possible that one thread assigns the new temp map to the map field, and then a second thread sees that map!=null and therefore accesses the map field without generating a new one, but to my surprise finds that the map is empty, because the put operations were not yet pushed to some shared memory area?
Yes, this is absolutely possible; for example, an optimizing compiler could actually completely get rid of the local temp variable, and just use the map field the whole time, provided it restored map to null in the case of an exception.
Similarly, a thread could also see a non-null, non-empty map that is nonetheless not fully populated. And unless your Map class is carefully designed to allow simultaneous reads and writes (or uses synchronized to avoid the issue), you could also get bizarre behavior if one thread is calling its get method while another is calling its put.
Can you create your Map in the ctor and declare it final? Provided you don't leak the map so others can modify it, that should suffice to make your get() safely sharable by multiple threads.
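A minimal sketch of that suggestion, assuming the list is already available when the object is constructed (which the question's JAXB setup may not allow). KeyedLookup and the keyOf function are hypothetical names. The guarantee relied on here is the Java Memory Model's treatment of final fields: once the constructor has finished without leaking `this`, other threads see the fully populated map through the final field.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class KeyedLookup<K, V> {
    // final: populated entirely inside the constructor and never reassigned.
    private final Map<K, V> map;

    KeyedLookup(List<V> values, Function<V, K> keyOf) {
        Map<K, V> temp = new HashMap<>();
        for (V value : values) {
            temp.put(keyOf.apply(value), value);
        }
        // The wrapper keeps anyone from modifying the map after construction.
        this.map = Collections.unmodifiableMap(temp);
    }

    // Safe to call from many threads: the map is never modified after construction.
    V get(K key) {
        return map.get(key);
    }

    public static void main(String[] args) {
        KeyedLookup<Integer, String> byLength =
                new KeyedLookup<>(List.of("apple", "kiwi"), String::length);
        System.out.println(byLength.get(4));
    }
}
```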
If you are really in doubt whether another thread could read a "half-completed" map
(I don't think so, but never say never ;-), you may try this.
Here, map is either null or complete:
static class MyMap extends HashMap {
MyMap (List pList) {
for(Object value : pList){
put(value.getKey(), value);
}
}
}
MyMap map;
public Object get(Object key){
if(map==null){
map = new MyMap (super.getList());
}
return map.get(key);
}
Or does someone see a newly introduced problem?
In addition to the visibility concerns previously mentioned, there is another problem with the original code, viz. it can throw a NullPointerException here:
return this.map.get(key)
Which is counter-intuitive, but that is what you can expect from incorrectly synchronized code.
Sample code to prevent this:
Map temp;
if ((temp = this.map) == null)
{
temp = new ImmutableMap(getList());
this.map = temp;
}
return temp.get(key);