Is it possible/required to speed up HashMap operations on same entry? - java

Suppose I wish to check HashMap entry and then replace it:
if (check(hashMap.get(key))) {
    hashMap.put(key, newValue);
}
This causes the search procedure inside the HashMap to run twice: once for the get and again for the put. That looks inefficient. Is it possible to modify the value of an already-found entry of a Map?
UPDATE
I know I can make a wrapper, and I know mutating the entry is problematic. But the question is WHY? Maybe HashMap remembers the last search to speed up a repeated one? Why are there no methods for such an operation?

EDIT: I've just discovered that you can modify the entry, via Map.Entry.setValue (and the HashMap implementation is mutable). It's a pain to get the entry for a particular key though, and I can't remember ever seeing anyone do this. You can get a set of the entries, but you can't get the entry for a single key, as far as I can tell.
There's one evil way of doing it - declare your own subclass of HashMap within the java.util package, and create a public method which just delegates to the package-private existing method:
package java.util;

// Please don't actually do this...
public class BadMap<K, V> extends HashMap<K, V> {
    public Map.Entry<K, V> getEntryPublic(K key) {
        return getEntry(key);
    }
}
That's pretty nasty though.
You wouldn't normally modify the entry - but of course you can change data within the value, if that's a mutable type.
I very much doubt that this is actually a performance bottleneck though, unless you're doing this a heck of a lot. You should profile your application to prove to yourself that this is a real problem before you start trying to fine-tune something which is probably not an issue.
If it does turn out to be an issue, you could change (say) a Map<Integer, String> into a Map<Integer, AtomicReference<String>> and use the AtomicReference<T> as a simple mutable wrapper type.
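For example, a minimal sketch of that wrapper approach; check(...) stands in for the test from the question, everything else is standard JDK:
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

Map<Integer, AtomicReference<String>> map = new HashMap<>();
map.put(42, new AtomicReference<>("oldValue"));

// One lookup; the wrapper is then mutated in place.
AtomicReference<String> ref = map.get(42);
if (ref != null && check(ref.get())) { // check(...) is the caller's test
    ref.set("newValue");
}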

Too much information for a comment on your question. Check the documentation for HashMap.
This implementation provides constant-time performance for the basic
operations (get and put), assuming the hash function disperses the
elements properly among the buckets. Iteration over collection views
requires time proportional to the "capacity" of the HashMap instance
(the number of buckets) plus its size (the number of key-value
mappings). Thus, it's very important not to set the initial capacity
too high (or the load factor too low) if iteration performance is
important.
Constant time means that a single get or put always takes about the same amount of time, O(1), regardless of the map's size. The total time required will then be linear in how many such operations you need to perform in a loop [O(n)].
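As a small illustration of the capacity advice in that quote (the sizing formula is a common rule of thumb, not a requirement):
// Presize so that expectedSize entries fit without any rehashing
// at the default 0.75 load factor.
int expectedSize = 10_000;
Map<String, Integer> map = new HashMap<>((int) (expectedSize / 0.75f) + 1, 0.75f);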

You can change the entry if it is mutable. One example of where you might do this is
private final Map<String, List<String>> map = new LinkedHashMap<>();

public void put(String key, String value) {
    List<String> list = map.get(key);
    if (list == null)
        map.put(key, list = new ArrayList<>());
    list.add(value);
}
This allows you to update a value, but you can't find and replace a value in one operation.

Take a look at Trove (http://trove4j.sourceforge.net/); their maps have several methods that might be what you want:
adjustOrPut
putIfAbsent
I don't know how these are implemented internally, but I would guess that, since Trove is built to be highly performant, there is only one lookup.
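For comparison, the JDK's own ConcurrentMap has had an atomic putIfAbsent since Java 5; this sketch shows the standard API, not Trove's:
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();
// Inserts the mapping only if none exists; returns the previous value or null.
Integer previous = counts.putIfAbsent("key", 1);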

Related

In Java, what precautions should be taken when using a Set as a key in a Map?

I'm not sure what the prevailing opinions are about using dynamic objects such as Sets as keys in Maps.
I know that typical Map implementations (for example, HashMap) use a hash code to decide which bucket to put an entry in, and that if that hash code somehow changes (perhaps because the contents of the Set change), the HashMap could be corrupted, because the bucket would be computed differently from the one the Set was originally inserted into.
However, if I ensure that the Set contents do not change at all, does that make this a viable option? Even so, is this approach generally considered error-prone because of the inherently volatile nature of Sets (even if precautions are taken to ensure that they are not modified)?
It looks like Java allows one to designate function arguments as final; this is perhaps one minor precaution that could be taken?
Do people even do stuff like this in commercial/open-source practice? (put List, Set, Map, or the like as keys in Maps?)
I guess I should describe sort of what I'm trying to accomplish with this, so that the motivation will become more clear and perhaps alternate implementations could be suggested.
What I am trying to accomplish is to have something of this sort:
class TaggedMap<T, V> {
    Map<Set<T>, V> _map;
    Map<T, Set<Set<T>>> _keys;
}
...essentially, to be able to "tag" certain data (V) with certain keys (T) and write other auxiliary functions to access/modify the data and do other fancy stuff with it (ie. return a list of all entries satisfying some criteria of keys). The function of the _keys is to serve as a sort of index, to facilitate looking up the values without having to cycle through all of _map's entries.
In my case, I intend to specifically use T = String, V = Integer. Someone I talked to about this had suggested substituting a String for the Set, viz, something like:
class TaggedMap<V> {
    Map<String, V> _map;
    Map<String, Set<String>> _keys;
}
where the key in _map is of the sort "key1;key2;key3" with keys separated by delimiter. But I was wondering if I could accomplish a more generalised version of this rather than having to enforce a String with delimiters between the keys.
Another thing I was wondering was whether there was some way to make this as a Map extension. I was envisioning something like:
class TaggedMap<Set<T>, V> implements Map<Set<T>, V> {
    Map<Set<T>, V> _map;
    Map<T, Set<Set<T>>> _keys;
}
However, I was not able to get this to compile, probably due to my inferior understanding of generics. With this as a goal, can anyone fix the above declaration so that it works according to the spirit of what I have described, or suggest some slight structural modifications? In particular, I am wondering about the "implements Map<Set<T>, V>" clause, and whether it is possible to declare such a complex interface implementation.
You are correct that if you ensure that
1. the Set contents are not modified, and
2. the Sets themselves are not modified,
then it is perfectly safe to use them as keys in a Map.
It's difficult to ensure that (1) is not violated accidentally. One option might be to specifically design the class being stored inside the Set so that all instances of that class are immutable. This would prevent anyone from accidentally changing one of the Set keys, so (1) would not be possible. For example, if you use a Set<String> as a key, you don't need to worry about the Strings inside the Set changing due to external modification.
You can ensure (2) quite easily by using the Collections.unmodifiableSet method, which returns a wrapped view of a Set that cannot be modified. This can be done to any Set, which means that it's probably a very good idea to use it for your keys.
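A minimal sketch of that (variable names are made up):
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

Set<String> tags = new HashSet<>();
tags.add("a");
tags.add("b");

Map<Set<String>, Integer> map = new HashMap<>();
map.put(Collections.unmodifiableSet(tags), 42);
// Mutating the key through the view now fails fast:
// map.keySet().iterator().next().add("c"); // throws UnsupportedOperationException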
Hope this helps! And if your user name means what I think it does, good luck learning every language! :-)
As you mention, sets can change, and even if you prevent the set from changing (i.e., the elements it contains), the elements themselves may change. Those factor into the hashcode.
Can you describe what you are trying to do in higher-level terms?
@templatetypedef's answer is basically correct. You can only safely use a Set as a key in some data structure if the set's state cannot change while it is a key. If the set's state changes, the invariants of the data structure are violated and operations on it will give incorrect results.
The wrappers created using Collections.unmodifiableSet can help, but there is a hidden gotcha. If the original set is still directly reachable, the application could modify it; e.g.
public void addToMap(Set key, Object value) {
    someMap.put(Collections.unmodifiableSet(key), value);
}
// but ...
Set someKey = ...
addToMap(someKey, "Hi mum");
...
someKey.add("something"); // Ooops ...
To guarantee that this can't happen, you need to make a deep copy of the set before you wrap it. That could be expensive.
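A sketch of that fix; here a shallow copy is enough because Strings are immutable, while for mutable elements a genuinely deep copy would be needed:
public void addToMap(Set<String> key, Object value) {
    // Copy first, then wrap: later changes to the caller's set no longer
    // affect the stored key.
    Set<String> frozenKey = Collections.unmodifiableSet(new HashSet<>(key));
    someMap.put(frozenKey, value);
}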
Another problem with using a Set as a key is that it can be expensive. There are two general approaches to implementing key/value mappings: using a hashCode method, or using a compareTo method that implements an ordering. Both of these are expensive for sets.

modifying a ConcurrentHashMap and Synchronized ArrayList in same method

I have a collection of objects that is modified by one thread and read by another (more specifically, the EDT). I needed a solution that gives me fast lookup and also fast indexing (by insertion order), so I'm using a ConcurrentHashMap with an accompanying ArrayList of the keys: to index an entry, I look up the key in the List and then use it to get the value from the hash map. I have a wrapper class that makes sure that when an entry is added, the mapping is added to the hash map and the key is added to the list at the same time, and similarly for removal.
I'm posting an example of the code in question:
private List<K> keys = Collections.synchronizedList(new ArrayList<K>(INITIAL_CAPACITY));
private ConcurrentMap<K, T> entries = new ConcurrentHashMap<K, T>(INITIAL_CAPACITY, .75f);

public synchronized T getEntryAt(int index) {
    return entries.get(keys.get(index));
}

public synchronized void addOrReplaceEntry(K key, T value) {
    T result = entries.get(key);
    if (result == null) {
        entries.putIfAbsent(key, value);
        keys.add(key);
    } else {
        entries.replace(key, value);
    }
}

public synchronized void removeEntry(K key, T value) {
    keys.remove(key);
    entries.remove(key, value);
}

public synchronized int getSize() {
    return keys.size();
}
My question is: am I losing all the benefits of the ConcurrentHashMap (over a synchronized HashMap) by operating on it in synchronized methods? I have to synchronize the methods to safely modify/read the ArrayList of keys (CopyOnWriteArrayList is not an option because a lot of modification happens...). Also, if you know of a better way to do this, that would be appreciated...
Yes, using a concurrent collection and a synchronized collection only inside synchronized blocks is a waste. You won't get the benefits of ConcurrentHashMap because only one thread will be accessing it at a time.
You could have a look at this implementation of a concurrent linked hashmap; I haven't used it, so I can't attest to its features.
One thing to consider would be switching from synchronized blocks to a ReadWriteLock to improve concurrent read-only performance.
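A rough sketch of the ReadWriteLock variant, reusing the field names from the question (illustrative, not a drop-in replacement):
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

private final ReadWriteLock lock = new ReentrantReadWriteLock();

public T getEntryAt(int index) {
    lock.readLock().lock(); // many readers may hold this lock at once
    try {
        return entries.get(keys.get(index));
    } finally {
        lock.readLock().unlock();
    }
}

public void addOrReplaceEntry(K key, T value) {
    lock.writeLock().lock(); // writers are exclusive
    try {
        if (entries.putIfAbsent(key, value) == null) {
            keys.add(key);
        } else {
            entries.replace(key, value);
        }
    } finally {
        lock.writeLock().unlock();
    }
}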
I'm not really sure of the utility of providing a remove-at-index method; perhaps you could give some more details about the problem you are trying to solve?
It seems that you only care about finding values by index. If so, dump the Map and just use a List. Why do you need the Map?
Mixing synchronized and concurrent collections the way you have done it is not recommended. Any reason why you are maintaining two copies of the stuff you are interested in? You can easily get a list of all the keys from the map anytime rather than maintaining a separate list.
Why not store the values in the list, and in the map store the key -> index mapping?
Then getEntry needs only one lookup (in the list, which should be faster than a map anyway), and remove does not have to traverse the whole list. Synchronization works out the same way.
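A minimal sketch of that layout (names are made up; note that removal would still need to fix up the stored indices):
private final List<T> values = new ArrayList<>();
private final Map<K, Integer> indexByKey = new HashMap<>();

public synchronized void addOrReplace(K key, T value) {
    Integer i = indexByKey.get(key);
    if (i == null) {
        indexByKey.put(key, values.size());
        values.add(value);
    } else {
        values.set(i, value);
    }
}

public synchronized T getEntryAt(int index) {
    return values.get(index); // one lookup instead of two
}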
You can get all access to the List keys onto the event queue using EventQueue.invokeLater. This will get rid of the synchronization. With all the synching you were not running much in parallel anyway. Also it means the getSize method will give the same answer for the duration of an event.
If you stick with synchronization instead of using invokeLater, at least get the entries hash table out of the synch block. Either way, you get more parallel processing. Of course, entries can now become out-of-synch with keys. The only down side is sometimes a key will come up with a null entry. With such a dynamic table this is unlikely to matter much.
Using the suggestion made by chrisichris to put the values in the list will solve this problem if it is one. In fact, this puts a nice wall between keys and entries; they are now used in completely separate ways. (If your only need for entries is to provide values to the JTable, you can get rid of it.) But entries (if still needed) should reference the entries, not contain an index; maintaining indexes there would be a hopeless task. And always remember that keys and entries are snapshots of "reality" (for lack of a better word) taken at different times.

Iterating through the union of several Java Map key sets efficiently

In one of my Java 6 projects I have an array of LinkedHashMap instances as input to a method which has to iterate through all keys (i.e. through the union of the key sets of all maps) and work with the associated values. Not all keys exist in all maps and the method should not go through each key more than once or alter the input maps.
My current implementation looks like this:
Set<Object> keyset = new HashSet<Object>();
for (Map<Object, Object> map : input) {
    for (Object key : map.keySet()) {
        if (keyset.add(key)) {
            ...
        }
    }
}
The HashSet instance ensures that no key will be acted upon more than once.
Unfortunately this part of the code is rather critical performance-wise, as it is called very frequently. In fact, according to the profiler over 10% of the CPU time is spent in the HashSet.add() method.
I am trying to optimise this code as much as possible. The use of LinkedHashMap with its more efficient iterators (in comparison to the plain HashMap) was a significant boost, but I was hoping to reduce what is essentially bookkeeping time to a minimum.
Putting all the keys in the HashSet beforehand using addAll() proved to be less efficient, due to the cost of calling HashSet.contains() afterwards.
At the moment I am looking at whether I can use a bitmap (well, a boolean[] to be exact) to avoid the HashSet completely, but it may not be possible at all, depending on my key range.
Is there a more efficient way to do this? Preferably something that does not impose restrictions on the keys?
EDIT:
A few clarifications and comments:
I do need all the values from the maps - I cannot drop any of them.
I also need to know which map each value came from. The missing part (...) in my code would be something like this:
for (Map<Object, Object> m : input) {
    Object v = m.get(key);
    // Do something with v
}
A simple example to get an idea of what I need to do with the maps would be to print all maps in parallel like this:
Key  Map0  Map1  Map2
F    1     null  2
B    2     3     null
C    null  null  5
...
That's not what I am actually doing, but you should get the idea.
The input maps are extremely variable. In fact, each call of this method uses a different set of them. Therefore I would not gain anything by caching the union of their keys.
My keys are all String instances. They are sort-of-interned on the heap using a separate HashMap, since they are pretty repetitive, therefore their hash code is already cached and most hash validations (when the HashMap implementation is checking whether two keys are actually equal, after their hash codes match) boil down to an identity comparison (==). The profiler confirms that only 0.5% of the CPU time is spent on String.equals() and String.hashCode().
EDIT 2:
Based on the suggestions in the answers, I made a few tests, profiling and benchmarking along the way. I ended up with roughly a 7% increase in performance. What I did:
I set the initial capacity of the HashSet to double the collective size of all input maps. This gained me something in the region of 1-2%, by eliminating most (all?) resize() calls in the HashSet.
I used Map.entrySet() for the map I am currently iterating. I had originally avoided this approach due to the additional code and the fear that the extra checks and Map.Entry getter method calls would outweigh any advantages. It turned out that the overall code was slightly faster.
I am sure that some people will start screaming at me, but here it is: Raw types. More specifically I used the raw form of HashSet in the code above. Since I was already using Object as its content type, I do not lose any type safety. The cost of that useless checkcast operation when calling HashSet.add() was apparently important enough to produce a 4% increase in performance when removed. Why the JVM insists on checking casts to Object is beyond me...
I can't provide a replacement for your approach, but here are a few suggestions to (slightly) optimize the existing code; a sketch combining the first two follows the list.
Consider initializing the hash set with a capacity based on the sum of the sizes of all maps. This avoids/reduces resizing of the set during add operations.
Consider using entrySet() rather than keySet(): iterating the entries gives you each value along with its key and avoids a separate lookup per key.
Have a look at the implementations of equals() and hashCode(): if they are "expensive", they have a negative impact on the add method.
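A sketch combining the first two suggestions with the question's loop:
int total = 0;
for (Map<Object, Object> map : input) {
    total += map.size();
}
// Presize so the set never needs to resize while adding.
Set<Object> keyset = new HashSet<>((int) (total / 0.75f) + 1);
for (Map<Object, Object> map : input) {
    for (Map.Entry<Object, Object> entry : map.entrySet()) {
        if (keyset.add(entry.getKey())) {
            // ... work with entry.getValue() and the other maps' values
        }
    }
}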
How you avoid using a HashSet depends on what you are doing.
I would only calculate the union once each time the input is changed. This should be relatively rare compared with the number of lookups.
// on an update:
Map<Key, Value> union = new LinkedHashMap<Key, Value>();
for (Map<Key, Value> map : input)
    union.putAll(map);

// on a lookup:
Value value = union.get(key);

// process each key once:
for (Map.Entry<Key, Value> entry : union.entrySet()) {
    // do something.
}
Option A is to use the .values() method and iterate through it. But I suppose you already had thought of it.
If the code is called that often, then it might be worth creating additional structures (depending on how often the data is changed). Create a new HashMap: every key in any of your hash maps becomes a key in this one, and its value is the list of the HashMaps in which that key appears.
This will help if the data is somewhat static (relative to the frequency of queries), so the overhead of managing the structure is relatively small, and if the key space is not very dense (keys do not repeat a lot across different HashMaps), since it will save a lot of unneeded contains() calls.
Of course, if you are mixing data structures it is better if you encapsulate all in your own data structure.
You could take a look at Guava's Sets.union() http://guava-libraries.googlecode.com/svn/tags/release04/javadoc/com/google/common/collect/Sets.html#union(java.util.Set,%20java.util.Set)

Finding the highest-n values in a Map

I have a large map of String->Integer and I want to find the highest 5 values in the map. My current approach involves translating the map into an array list of pair(key, value) object and then sorting using Collections.sort() before taking the first 5. It is possible for a key to have its value updated during the course of operation.
I think this approach is acceptable single threaded, but if I had multiple threads all triggering the transpose and sort frequently it doesn't seem very efficient. The alternative seems to be to maintain a separate list of the highest 5 entries and keep it updated when relevant operations on the map take place.
Could I have some suggestions/alternatives on optimizing this please? Am happy to consider different data structures if there is benefit.
Thanks!
Well, to find the highest 5 values in a Map, you can do it in O(n) time, whereas any sort is slower than that.
The easiest way is to simply do a for loop through the entry set of the Map.
for (Map.Entry<String, Integer> entry : map.entrySet()) {
    if (entry.getValue() > smallestMaxSoFar)
        updateListOfMaximums();
}
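One concrete way to maintain the running maximums is a small min-heap; this is a sketch of the single-pass idea (not the answer's exact code), keeping the five largest entries seen so far:
import java.util.Comparator;
import java.util.Map;
import java.util.PriorityQueue;

PriorityQueue<Map.Entry<String, Integer>> top5 =
        new PriorityQueue<>(5, new Comparator<Map.Entry<String, Integer>>() {
            public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                return a.getValue().compareTo(b.getValue()); // smallest on top
            }
        });

for (Map.Entry<String, Integer> entry : map.entrySet()) {
    top5.offer(entry);
    if (top5.size() > 5) {
        top5.poll(); // evict the smallest of the six
    }
}
// top5 now holds the five highest entries (in heap order, not sorted).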
You could use two Maps:
// Maps name to value
Map<String, Integer> byName;
// Maps value to names
NavigableMap<Integer, Collection<String>> byValue;
and make sure to always keep them in sync (possibly wrap both in another class which is responsible for put, get, etc). For the highest values use byValue.navigableKeySet().descendingIterator().
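A sketch of keeping the pair in sync on put (class and method shapes are made up):
import java.util.*;

private final Map<String, Integer> byName = new HashMap<>();
private final NavigableMap<Integer, Collection<String>> byValue = new TreeMap<>();

public synchronized void put(String name, int value) {
    Integer old = byName.put(name, value);
    if (old != null) {
        // Unlink the name from its previous value bucket.
        Collection<String> oldNames = byValue.get(old);
        oldNames.remove(name);
        if (oldNames.isEmpty())
            byValue.remove(old); // drop empty value buckets
    }
    Collection<String> names = byValue.get(value);
    if (names == null)
        byValue.put(value, names = new HashSet<>());
    names.add(name);
}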
I think this approach is acceptable single threaded, but if I had multiple threads all triggering the transpose and sort frequently it doesn't seem very efficient. The alternative seems to be to maintain a separate list of the highest 5 entries and keep it updated when relevant operations on the map take place.
There is an approach in between that you can take as well. When a thread requests a "sorted view" of the map, create a copy of the map and then handle the sorting on that.
public List<Integer> getMaxFive() {
    Map<String, Integer> copy = null;
    synchronized (lockObject) {
        copy = new HashMap<String, Integer>(originalMap);
    }
    // sort the copy as usual (see the sketch below)
    return list;
}
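The elided part could look like this, assuming the five highest values are wanted (as elsewhere in this question):
List<Integer> list = new ArrayList<>(copy.values());
Collections.sort(list, Collections.reverseOrder()); // descending
return list.subList(0, Math.min(5, list.size()));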
Ideally if you have some state (such as this map) accessed by multiple threads, you are encapsulating the state behind some other class so that each thread is not updating the map directly.
I would create a method like:
private static int[] getMaxFromMap(Map<String, Integer> map, int qty) {
    int[] max = new int[qty];
    for (int a = 0; a < qty; a++) {
        max[a] = Collections.max(map.values());
        // Removes every entry having this value; note this mutates the map.
        map.values().removeAll(Collections.singleton(max[a]));
        if (map.size() == 0)
            break;
    }
    return max;
}
Taking advantage of Collections.max() and Collections.singleton()
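Note that the method drains the maximums out of the map it is given, so pass it a throwaway copy if the original must survive:
// getMaxFromMap mutates its argument; hand it a copy.
int[] top5 = getMaxFromMap(new HashMap<>(originalMap), 5);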
There are two ways of doing that easily:
Put the map into a heap structure and retrieve the n elements you want from it.
Iterate through the map and update a list of n highest values using each entry.
If you want to retrieve an unknown or a large number of highest values, the first method is the way to go. If you have a fixed, small number of values to retrieve, the second might be easier for some programmers to understand.
Personally, I prefer the first method.
Please try another data structure. Suppose there's a class named MyClass whose attributes are key (String) and value (int). MyClass, of course, needs to implement the Comparable interface. Another approach is to create a class named MyClassComparator which implements Comparator.
The compare method (wherever it lives) should be defined like this:
compare(parameters) {
    return value2 - value1; // descending
}
The rest is easy: using a List and invoking Collections.sort(parameters) will do the sorting part.
I don't know which sorting algorithm Collections.sort(parameters) uses. But if you expect more data to arrive over time, you will want an insertion sort, since it is good for nearly-sorted data and works online.
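A sketch of the comparator variant (MyClass is hypothetical, as in the answer):
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class MyClass {
    final String key;
    final int value;

    MyClass(String key, int value) {
        this.key = key;
        this.value = value;
    }
}

class MyClassComparator implements Comparator<MyClass> {
    public int compare(MyClass a, MyClass b) {
        return b.value - a.value; // descending by value, as in the pseudocode above
    }
}

// Usage: sort descending, then take the first five.
// Collections.sort(list, new MyClassComparator());
// List<MyClass> top5 = list.subList(0, Math.min(5, list.size()));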
If modifications are rare, I'd implement some SortedByValHashMap<K,V> extends HashMap<K,V> (similar to LinkedHashMap) that keeps the entries ordered by value.

What Java Data Structure/Solution would best fit these requirements?

I need a java data structure/solution that meets these requirements. What best fits these?
1) Object's insertion order must be kept
2) Object's must be unique (These are database objects that are uniquely identified by a UUID).
3) If a newer object with the same ID is added, the older version of the object should be over-written/removed
4) The Solution should be accessible by many threads.
5) When the first object added to the Structure is read/used, it should be removed from the data structure
There are a couple of possibilities here. The simplest might be to start with a LinkedHashSet. That will provide you with the uniqueness and predictable ordering that you require. Then, you could wrap the resulting set to make it thread-safe:
Set<T> s = Collections.synchronizedSet(new LinkedHashSet<T>(...));
Note: Since a Set doesn't really define a method for retrieving items from it, your code would have to manually invoke Set.remove(Object).
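For requirement 5, the "read the oldest and remove it" step might look like this (iteration over a synchronized wrapper must itself be synchronized, per the Collections.synchronizedSet docs):
import java.util.Iterator;

// Take the oldest element (insertion order) and remove it in one step.
T first = null;
synchronized (s) {
    Iterator<T> it = s.iterator();
    if (it.hasNext()) {
        first = it.next();
        it.remove();
    }
}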
Alternatively, you could wrap a LinkedHashMap, which does provide a hook for the delete-on-read semantics you require:
class DeleteOnReadMap<K, V> implements Map<K, V> {
    private Map<K, V> m = new LinkedHashMap<K, V>();

    // implement Map "read" methods with delete-on-read semantics
    public V get(Object key) {
        // ...
    }
    // (other read methods here)

    // implement remaining Map methods by forwarding to the inner Map
    public V put(K key, V value) {
        return m.put(key, value);
    }
    // (remaining Map methods here)
}
Finally, wrap an instance of your custom Map to make it thread-safe:
Map<K, V> m = Collections.synchronizedMap(new DeleteOnReadMap<K, V>(...));
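In the simplest reading of requirement 5, the delete-on-read get just delegates to remove (a sketch of one possible interpretation):
// Reading an entry removes it from the underlying map.
public V get(Object key) {
    return m.remove(key);
}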
My thought is something like the following:
Collections.synchronizedMap(new LinkedHashMap<K, V>());
I think that takes care of everything except requirement 5, but you can do that by using the remove() method instead of get().
This won't be quite as efficient as a ConcurrentMap would be: synchronization locks the entire map on every access, while ConcurrentMap implementations can use read-write locks and selective locking on only part of the map to allow multiple non-conflicting accesses to proceed simultaneously. If you wanted, you could probably get better performance by writing your own subclass of some existing Map implementation.
1) Object's insertion order must be kept
This is any "normal" data structure: array, ArrayList, tree. So avoid self-balancing or self-sorting data structures: heaps, hash tables, or move-to-front trees (splay trees, for example). Then again, you could use one of those structures, but then you would have to keep track of the insertion order in each node.
2) Object's must be unique (These are database objects that are uniquely identified by a UUID).
Keep a unique identifier associated with each object. If this were a C program, the pointer to the node would be unique (and I guess this applies in Java as well). If the node's pointer is not sufficient to maintain uniqueness, then you need to add a field to each node which you guarantee to have a unique value.
3) If a newer object with the same ID is added, the older version of the object should be over-written/removed
Where do you want to place the new node? Do you want to replace the existing node? Or do you want to delete the old node and then add the new one to the end? This is important because it relates to requirement #1, where the order of insertion must be preserved.
4) The Solution should be accessible by many threads.
The only way I can think of to do this is to implement some sort of locking. Java lets you wrap structures and code within a synchronized block.
5) When the first object added to the Structure is read/used, it should be removed from the data structure
Kinda like a "dequeue" operation.
Seems like an ArrayList is a pretty good option for this: simply because of #5. The only problem is that searches are linear. But if you have a relatively small amount of data, then it isn't really that much of a problem.
Otherwise, like others have said: a HashMap or even a Tree of some sort would work - but that will depend on the frequency of accesses. (For example, if the "most recent" element is most likely to be accessed, I'd use a linear structure. But if accesses will be of "random" elements, I'd go with a HashMap or Tree.)
The solutions talking about LinkedHashSet would be a good starting point.
However, you would have to override the equals and hashCode methods on the objects you are going to be putting in the set in order to satisfy requirement 3.
Sounds like you have to create your own data structure, but it sounds like a pretty easy class assignment.
Basically you start with anything like an Array or Stack but then you have to extend it for the rest of the functionality.
You can look at the contains method, as you will need that.
