HashMap put method clarification - java

I have a curious situation - there is a HashMap that is initialized as follows:
HashMap<String, HashSet<String>> downloadMap = new HashMap<String, HashSet<String>>();
and then the following code, which is executed indefinitely via a Quartz scheduler:
myHashSet = retrieve(animal);
downloadMap.put(myKey, myHashSet);
// do stuff
downloadMap.get(myKey).clear();
What happens then is that the same value gets associated with different keys. So, for instance, I will have things like:
Kittens [cute kitten, sad kitten]
Puppies [cute kitten, sad kitten]
which should never happen.
Particularly, after I retrieve the HashSet of the kittens:
myHashSet = retrieve(animal);
myHashSet = [cute kitten, sad kitten]
downloadMap = Kittens [], Puppies[]
then put() is executed and I get:
downloadMap = Kittens [cute kitten, sad kitten], Puppies [cute kitten, sad kitten]
Does anyone know why this is the case?
Thank you in advance!

Looks like you use the same HashSet<String> reference in all your values of the HashMap<String, HashSet<String>>. Knowing this, the problem is how you insert the HashSet<String>s in your HashMap. Note that you must use a new HashSet<String> reference for every key-value pair.
Update your question accordingly to receive a more specific answer.
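A minimal sketch of the aliasing this answer describes (map contents and names taken from the question, the rest hypothetical): reusing one HashSet instance for every key makes all keys show the same contents, while a fresh HashSet per key behaves as expected:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SharedSetDemo {
    public static void main(String[] args) {
        // Buggy pattern: one HashSet instance shared by every key
        Set<String> shared = new HashSet<>();
        Map<String, Set<String>> buggy = new HashMap<>();
        buggy.put("Kittens", shared);
        buggy.put("Puppies", shared);
        shared.add("cute kitten");
        // Both keys now "see" the same contents, because both map to the same object
        System.out.println(buggy.get("Puppies").contains("cute kitten")); // true

        // Fix: give every key its own HashSet instance
        Map<String, Set<String>> fixed = new HashMap<>();
        fixed.put("Kittens", new HashSet<>(Set.of("cute kitten", "sad kitten")));
        fixed.put("Puppies", new HashSet<>());
        System.out.println(fixed.get("Puppies").isEmpty()); // true
    }
}
```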
Not directly related to the real problem: it is better to program to interfaces rather than to concrete class implementations. By this I mean that you should declare the downloadMap variable as
Map<String, Set<String>> downloadMap = new HashMap<String, Set<String>>();
Similar for the Sets that will be put in this map.
More info:
What does it mean to "program to an interface"?

The solution is to re-program retrieve() so it returns a different HashSet every time it is called. In fact, my preferred solution is to let the caller specify, as a parameter, the set into which the objects are retrieved:
myHashSet = retrieve(new HashSet<String>());
So, if a different program ever wanted to accumulate objects in a single set, it could simply do so by calling retrieve with the same set. The client has the last word!
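A hedged sketch of that caller-chooses-the-set idea, with a two-argument retrieve(animal, target) invented for illustration (the question's retrieve takes only the animal):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RetrieveDemo {
    // Hypothetical retrieve(): fills the set supplied by the caller and returns it,
    // instead of handing out one internal HashSet to every caller.
    static Set<String> retrieve(String animal, Set<String> target) {
        // stand-in for the real lookup
        if (animal.equals("kitten")) {
            target.addAll(List.of("cute kitten", "sad kitten"));
        }
        return target;
    }

    public static void main(String[] args) {
        // A caller that wants a fresh set passes a new one in ...
        Set<String> fresh = retrieve("kitten", new HashSet<>());
        // ... while a caller that wants to accumulate reuses the same set.
        Set<String> accumulator = new HashSet<>();
        retrieve("kitten", accumulator);
        retrieve("kitten", accumulator);
        System.out.println(fresh.size());       // 2
        System.out.println(accumulator.size()); // 2 (sets de-duplicate)
    }
}
```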

Related

Java unmodifiableMap can be replaced with 'Map.copyOf' call

I'm new to Java and I recently learnt that sometimes it's important to deep-copy a Collection and make an unmodifiable view of it so that the data inside remains safe and unchanged.
When I try to practice this (unmodifiableMap2), I get a warning from IDEA that
unmodifiableMap Can be replaced with 'Map.copyOf' call
That's weird to me, because I think an unmodifiable map is not just a copy of the underlying map. Besides, when I try to create the same unmodifiable map in another way (unmodifiableMap1), the warning doesn't pop up!
How should I understand this behavior of IDEA?
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class test {
    public static void main(String[] args) {
        Map<Integer, Integer> map = new HashMap<>();
        map.put(1, 1);
        map.put(2, 2);
        Map<Integer, Integer> map1 = new HashMap<>(map);
        Map<Integer, Integer> unmodifiableMap1 = Collections.unmodifiableMap(map1);
        Map<Integer, Integer> unmodifiableMap2 = Collections.unmodifiableMap(new HashMap<>(map));
    }
}
Map.copyOf() makes a copy of the given Map instance, but it requires that no key or value in the map is null. Usually, this is the case, but it is not a strict requirement for a Map in general.
java.util.Collections.unmodifiableMap() just wraps a reference to the given Map instance. This means that the receiver is unable to modify the map, but modifications to the original map (that one that was the argument to unmodifiableMap()) are visible to the receiver.
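The difference is easy to demonstrate; a short sketch contrasting the wrapper view with a Map.copyOf snapshot:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class ViewVsCopyDemo {
    public static void main(String[] args) {
        Map<Integer, Integer> original = new HashMap<>();
        original.put(1, 1);

        Map<Integer, Integer> view = Collections.unmodifiableMap(original);
        Map<Integer, Integer> copy = Map.copyOf(original);

        original.put(2, 2); // mutate the original afterwards

        System.out.println(view.size()); // 2 - the wrapper sees the change
        System.out.println(copy.size()); // 1 - the copy is detached
    }
}
```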
Assume we have two threads: one iterates over the unmodifiable map while the other modifies the original one. As a result, you may get a ConcurrentModificationException for an operation on the unmodifiable map … no fun to debug!
This cannot happen with the copy created by Map.copyOf(). But this has a price: with a copy, you need two times the amount of memory for the map (roughly). For really large maps, this may cause memory shortages up to an OutOfMemoryError. Also not fun to debug!
In addition, just wrapping the existing map is presumably much faster than copying it.
So there is no best solution in general, but for most scenarios, I have a preference for using Map.copyOf() when I need an unmodifiable map.
The sample in the question does not wrap the original Map instance, but makes a copy before wrapping it (either in a line of its own, or on the fly). This eliminates the potential problem with the under-the-hood modification, but may bring back the memory issue.
From my experience so far, Map.copyOf( map ) looks to be more efficient than Collections.unmodifiableMap( new HashMap( map ) ).
By the way: Map.copyOf() returns a map that resembles a HashMap; when you copy a TreeMap with it, the sort order gets lost, while the wrapping with unmodifiableMap() keeps the underlying Map implementation and therefore also the sort order. So when this is important, you can use Collections.unmodifiableMap( new TreeMap( map ) ), while Map.copyOf() does not work here.
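For example, a sketch of keeping sort order by wrapping a TreeMap (something Map.copyOf cannot guarantee, since its iteration order is unspecified):

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

public class SortedUnmodifiableDemo {
    public static void main(String[] args) {
        Map<String, Integer> sorted = new TreeMap<>(Map.of("b", 2, "a", 1, "c", 3));
        // Wrapping keeps the TreeMap underneath, so iteration stays sorted.
        Map<String, Integer> view = Collections.unmodifiableMap(new TreeMap<>(sorted));
        System.out.println(view.keySet()); // [a, b, c]
        // Map.copyOf(sorted) would give an immutable map with unspecified order.
    }
}
```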
An unmodifiable map using an existing reference to a map is perfectly fine, and there are many reasons you might want to do this.
Consider this class:
class Foo {
    private final Map<String, String> fooMap = new HashMap<>();

    // some methods which mutate the map

    public Map<String, String> getMap() {
        return Collections.unmodifiableMap(fooMap);
    }
}
What this class does is provide a read-only view of the map it encapsulates. The class can be sure that clients who consume the map cannot alter it, they can just see its contents. They will also be able to see any updates to the entries if they keep hold of the reference for some time.
If we had tried to expose a read-only view by copying the map, it would take time and memory to perform the copy and the client would not see any changes because both maps are then distinct instances - the source and the copy.
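A small sketch of both properties of the wrapper view: updates to the backing map shine through, while mutation attempts by clients fail:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class ReadOnlyViewDemo {
    public static void main(String[] args) {
        Map<String, String> backing = new HashMap<>();
        Map<String, String> exposed = Collections.unmodifiableMap(backing);

        backing.put("k", "v");
        System.out.println(exposed.get("k")); // "v" - the update shines through

        try {
            exposed.put("x", "y"); // clients cannot mutate the view
        } catch (UnsupportedOperationException e) {
            System.out.println("read-only");
        }
    }
}
```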
However in the case of this:
Collections.unmodifiableMap(new HashMap<>(map));
You are first copying the map into a new hash map and then passing that copy into Collections.unmodifiableMap. The result is effectively constant. You do not have a reference to the copy you created with new HashMap<>(map), nor can you get one*.
If what you want is a constant map, then Map.copyOf is a more concise way of achieving that, so IntelliJ suggests you should use that instead.
In the first case, since the reference to the map already exists, IntelliJ cannot make the same inference about your intent so it gives no such suggestion.
You can see the IntelliJ ticket for this feature if you like, though it doesn't explain why the two are essentially equivalent, just that they are.
* well, you probably could via reflection, but IntelliJ is assuming that you won't
Map.copyOf(map) is, for maps without null keys or values, equivalent to Collections.unmodifiableMap(new HashMap<>(map)).
Neither does any kind of deep copying. But it's strictly shorter to write Map.copyOf(map).

Using temp variable using collectors

I have the following code. I am trying to understand if it would make any changes to memory.
Approach 1: Using collectors I can directly return map like so:
List<Customer> customerList = new ArrayList<>();
customerList.add(new Customer("1", "pavan"));
customerList.add(new Customer("2", "kumar"));
return customerList.stream().collect(Collectors.toMap(t->t.getId(), t->t));
Approach 2: Using an explicit map to collect results, like so:
Map<String,Customer> map = new HashMap<String, Customer>();
map = customerList.stream().collect(Collectors.toMap(t->t.getId(), t->t));
return map;
Compared to the first, does the second approach make any difference to memory/ GC, if I iterate over a million times?
Aside from instantiating a Map instance that you don't need in the second example, both pieces of code are identical. You immediately replace the Map reference you created with the one returned by the stream. Most likely the compiler would eliminate that as redundant code.
The collect method of the streams API will instantiate a Map for you; the code has been well optimised, which is one of the advantages of using the Stream API over doing it yourself.
To answer your specific question, you can iterate over both sections of code as many times as you like and it won't make any difference to the GC impact.
The code is by far not identical; specifically, the documentation of Collectors.toMap says that it will return a Map:
There are no guarantees on the type, mutability, serializability, or thread-safety of the Map returned.
There are absolutely no guarantees whatsoever that the returned Map is actually a HashMap. It could be any other Map implementation; so assigning it to a HashMap reference is just wrong.
Then there is the way you build the Collector. It could be simplified to:
customerList.stream().collect(Collectors.toMap(Customer::getId, Function.identity()));
The method reference Customer::getId, as opposed to the lambda expression, will create one less method (since lambda expressions are de-sugared to methods and method references are not).
Also, Function.identity() instead of t -> t will create fewer objects if used in multiple places. Read this.
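Put together, a runnable version of the simplified collector (the Customer record here is a stand-in invented for illustration):

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ToMapDemo {
    // Minimal stand-in for the Customer class in the question.
    record Customer(String id, String name) {
        String getId() { return id; }
    }

    public static void main(String[] args) {
        List<Customer> customerList = List.of(
                new Customer("1", "pavan"),
                new Customer("2", "kumar"));
        // Method reference + Function.identity() instead of t -> t.getId() and t -> t
        Map<String, Customer> byId = customerList.stream()
                .collect(Collectors.toMap(Customer::getId, Function.identity()));
        System.out.println(byId.get("1").name()); // pavan
    }
}
```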
Then there is the fact of how a HashMap works internally. If you don't specify an initial capacity, it might have to resize - which is an expensive operation. By default, Collectors.toMap will start with a HashMap with a capacity of 16 and a load factor of 0.75 - which means you can put 12 entries into it before the next resize.
You can't avoid that with Collectors.toMap, since its supplier will always be HashMap::new - using the default capacity of 16 and load factor of 0.75.
Knowing this, you can drop the stream entirely:
Map<String, Customer> map = new HashMap<String, Customer>((int)Math.ceil(customerList.size() / 0.75));
customerList.forEach(x -> map.put(x.getId(), x));
In the second approach, you are instantiating a Map, and reassigning the reference to the one returned by the call to stream.collect().
Obviously, the first Map object referenced by "map" is lost.
The first approach does not have this problem.
In short, yes, this makes a minor difference in terms of memory usage, but it is likely negligible considering you have a million entries to iterate over.

When do I want to declare interface over the actual class?

For example, I often see this:
Set<Integer> s = new TreeSet<Integer>();
Set<Integer> s = new HashSet<Integer>();
Map<Integer, String> m = new HashMap<Integer, String>();
over
TreeSet<Integer> ts = new TreeSet<Integer>();
HashSet<Integer> hs = new HashSet<Integer>();
HashMap<Integer, String> hm = new HashMap<Integer, String>();
What are the advantages/disadvantages of the former vs the latter?
For me it comes down to a number of points.
Do you care about the implementation? Does your code need to know that the Map is a HashMap or a TreeMap? Or does it just care that it's got a key/value structure of some kind?
It also means that when I'm building my implementation code, if I expose a method that returns a Map, I can change the implementation over time without affecting any code that relies on it (hence the reason why it's a bad idea to try and cast these types of values).
The other is that it becomes easier to move these structures around the code, such that any method that can accept a Map is going to be easier to deal with than one that relies on a HashMap, for instance.
The convention (that I follow) is basically to use the lowest functional interface that meets the need of the API. No point using an interface that does not provide the functionality your API needs (for example, if you need a SortedMap, no point using a Map then)
IMHO
Generally, you want to declare the most general type that has the behavior you're actually using. That way you don't have to change as much code if you decide to take a different concrete class. And you allow users of the function more freedom.
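For instance, a method declared against Map accepts any implementation, so callers can switch concrete classes freely:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class GeneralTypeDemo {
    // Accepts any Map implementation, so callers are free to choose one.
    static int countEntries(Map<Integer, String> m) {
        return m.size();
    }

    public static void main(String[] args) {
        Map<Integer, String> hash = new HashMap<>();
        Map<Integer, String> tree = new TreeMap<>();
        hash.put(1, "a");
        tree.put(1, "a");
        // The same method works with both; swapping implementations
        // requires no change to the calling code.
        System.out.println(countEntries(hash) + countEntries(tree)); // 2
    }
}
```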
You should read On Understanding Data Abstraction, Revisited by William R. Cook and also his Proposal for Simplified, Modern Definitions of "Object" and "Object Oriented".
Basically: if you use Java classes as anything other than factories, i.e. if you have a class name anywhere except after a new operator, then you are not doing object-oriented programming. Following this rule does not guarantee that you are doing OO, but violating this rule means that you aren't.
Note: there's nothing wrong with not doing OO.
The maintenance of an application can cost three times as much as the development. This means you want the code to be as simple and as clear as possible.
If you use a List instead of an ArrayList, you make it clear you are not using any method specific to ArrayList, and that the implementation can be swapped for another List. The problem with using an ArrayList when it doesn't have to be one is that it takes a long time to determine safely that it really could have been a List, i.e. it is very hard to prove you never needed something. (It is relatively easy to add a capability, much harder to remove one.)
A similar example is using Vector when a List will do. If you see a Vector, you assume the developer chose it for a good reason: it is thread-safe. But now, to change the code, you have to check that thread safety is actually needed. You can't see how it is used in a multi-threaded way, so you either have to check every way it could possibly be used, or defensively add synchronized when you iterate over it - when in fact it never needed to be thread-safe at all. Using a thread-safe collection when it doesn't need to be is not just a waste of CPU time but, more importantly, a waste of the developer's time.
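If thread safety does turn out to be required, a common alternative to Vector is wrapping a plain List, which keeps the declared type general; a sketch:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SyncListDemo {
    public static void main(String[] args) {
        // Declared as List: callers neither know nor care which implementation is underneath.
        List<String> plain = new ArrayList<>();
        // When thread safety is actually needed, wrap it - the declared type stays List.
        List<String> shared = Collections.synchronizedList(new ArrayList<>());
        plain.add("a");
        shared.add("a");
        // Iteration over a synchronized wrapper still needs manual locking:
        synchronized (shared) {
            for (String s : shared) {
                System.out.println(s);
            }
        }
    }
}
```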

Java: Using a hashmap, retrieving all values and calling methods

I have a need to store a list of dynamically created objects in a way where they can all be retrieved and their methods called on demand.
As far as I can see, a HashMap fits my needs for the list and creation, but I'm a bit puzzled about recalling the objects and calling their methods through the HashMap.
Just as a reference, let me give you a little code:
Here is the HashMap:
Map<String, Object> unitMap = new HashMap<String, Object>();
// here is how I put an object in the Map notice i'm passing coordinates to the constructor:
unitMap.put("1", new Worker(240, 240));
unitMap.put("2", new Worker(240, 240));
Now I need to create a method that retrieves every object in the HashMap and calls a method on each object. Is this possible, or can the created objects only be referenced directly? If so, is there another way to call a method on all existing instances of a class dynamically (in other words, on user input)?
Sure. You can do this:
for (Object thing : unitMap.values()) {
// use "thing" here
}
If you need the keys too, you can either get just the keys:
for (String key : unitMap.keySet()) {
// use "key" here
}
or both the keys and values together:
for (Map.Entry<String, Object> entry : unitMap.entrySet()) {
// use "entry.getKey()" and "entry.getValue()"
}
In all the above cases, each entry in the map is traversed one by one. So at the end of the loop, you'll have processed all the entries in the map.
If all of the values in the Map are Worker objects, you should declare your map to be of type Map<String, Worker>. This way, when you pull a value out of the map, it will be typed as a Worker. This way you can call any method declared on Worker as opposed to having to check the type at runtime using instanceof.
If the map holds different values, and you need to keep the value type as Object, it may be advantageous to use an interface to define the method that you want to call for each different object type.
If you do not know what method you want to run on the values until runtime, and the map can hold different values, you will just have to do what you are currently doing, and use Map<String, Object>.
Finally, to get the values of the map, you do just as Chris Jester-Young mentioned before me. The biggest advantage, as I said previously, is that your objects will be typed, and you will have no need for casting/instanceof checking.
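As a sketch of that typed-map advantage (the Worker class here is a hypothetical stand-in for the asker's):

```java
import java.util.HashMap;
import java.util.Map;

public class TypedMapDemo {
    // Hypothetical Worker, standing in for the class in the question.
    static class Worker {
        final int x, y;
        Worker(int x, int y) { this.x = x; this.y = y; }
        String report() { return "worker at " + x + "," + y; }
    }

    public static void main(String[] args) {
        // Typed as Map<String, Worker>: no casts, no instanceof needed.
        Map<String, Worker> unitMap = new HashMap<>();
        unitMap.put("1", new Worker(240, 240));
        unitMap.put("2", new Worker(240, 240));
        for (Worker w : unitMap.values()) {
            System.out.println(w.report()); // call Worker methods directly
        }
    }
}
```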
I use this to put all values from hashMap on a List, hope it helps.
private List<String> getValuesFromHashMap(HashMap<String, String> hashMap) {
    List<String> values = new ArrayList<String>();
    for (String item : hashMap.values()) {
        values.add(item);
    }
    return values;
}

cast LinkedHashMap to HashMap in groovy

How do I convert LinkedHashMap to java.util.HashMap in groovy?
When I create something like this in Groovy, it automatically creates a LinkedHashMap, even when I declare it like HashMap h = .... or def HashMap h = ...
I tried doing:
HashMap h = ["key1":["val1", "val2"], "key2":["val3"]]
and
def HashMap h = ["key1":["val1", "val2"], "key2":["val3"]]
h.getClass().getName() still comes back with LinkedHashMap.
LinkedHashMap is a subclass of HashMap so you can use it as a HashMap.
Resources:
javadoc - LinkedHashMap
Simple answer -- maps have something that looks a lot like a copy constructor:
Map m = ['foo' : 'bar', 'baz' : 'quux'];
HashMap h = new HashMap(m);
So, if you're wedded to the literal notation but you absolutely have to have a different implementation, this will do the job.
But the real question is, why do you care what the underlying implementation is? You shouldn't even care that it's a HashMap. The fact that it implements the Map interface should be sufficient for almost any purpose.
He probably got caught with the dreaded Groovy-Map-Gotcha, and was stumbling around in the wilderness of possibilities as I did for the entire afternoon.
Here's the deal:
When using variable string keys, you cannot access a map in property notation format (e.g. map.a.b.c), something unexpected in Groovy where everything is generally concise and wonderful ;--)
The workaround is to wrap variable keys in parens instead of quotes.
def(a,b,c) = ['foo','bar','baz']
Map m = [(a):[(b):[(c):1]]]
println m."$a"."$b"."$c" // 1
println m.foo.bar.baz // also 1
Creating the map like so will bring great enjoyment to sadists worldwide:
Map m = ["$a":["$b":["$c":1]]]
Hope this saves another Groovy-ist from temporary insanity...
HashMap h = new HashMap()
h.getClass().getName();
works. Using the [:] notation seems to tie it to LinkedHashMap.
