fast static key-value mapping - java

I have a set of unique key-value pairs; both key and value are strings. The number of pairs is very large, and finding the value for a given key is extremely time-critical.
The pairs are computed beforehand and are fixed for a given program. So I could just write a method containing:
public String getValue(String key)
{
    // repeat for every pair
    if (key.equals("abc"))
    {
        return "def";
    }
    return null; // no pair matched
}
But I am talking about more than 250,000 pairs, and perhaps sorting them could be faster...
I have a class that contains the getValue() method and can use its constructor, but it has no connection to any database, file system, etc. So every pair has to be defined inside the class.
Do you have any ideas that could be faster than a huge series of if-statements? Perhaps a sorted map that is given the pairs presorted? Or could constructor time be improved by deserializing an already created map?
I would like your answers to contain a basic code example of your approach; I will comment on answers with the time each took on a set of pairs!
Time frame: 1000 milliseconds for one constructor call and 20 calls of getValue().
Keys have a size of 256 and values have a size < 16

This is exactly what a hash table is made for. It provides O(1) lookup if implemented correctly, which means that as long as you have enough memory and your hash function and collision strategy are smart, it can get values for keys in constant time. Java has multiple hash-backed data structures; from the sound of things, a HashMap<String, String> would do the trick for you.
You can construct it like this:
Map<String, String> myHashTable = new HashMap<String, String>();
add values like this:
myHashTable.put("abcd","value corresponding to abcd");
and get the value for a key like this:
myHashTable.get("abcd");
You were on the right track when you had the intuition that running through all of the keys and checking each one was not efficient: that would be an O(n) approach, since you would have to run through all n elements in the worst case.
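A minimal sketch of how this could look under the constraints in the question (all pairs defined inside the class and loaded in the constructor). The class name StaticLookup, the EXPECTED_PAIRS constant and the two sample pairs are made up for illustration; pre-sizing the map is an optional tweak that avoids rehashing while the constructor runs.

```java
import java.util.HashMap;
import java.util.Map;

class StaticLookup {
    // Hypothetical size; the question mentions more than 250,000 pairs.
    private static final int EXPECTED_PAIRS = 250000;

    private final Map<String, String> pairs;

    StaticLookup() {
        // Capacity chosen so the default load factor (0.75) never triggers a resize.
        pairs = new HashMap<String, String>((int) (EXPECTED_PAIRS / 0.75f) + 1);
        // Illustrative pairs only; the real ones would be generated beforehand.
        pairs.put("abc", "def");
        pairs.put("ghi", "jkl");
    }

    /** Returns the value for the key, or null if the key is unknown. */
    String getValue(String key) {
        return pairs.get(key);
    }
}
```

With hundreds of thousands of literal put calls, the pairs would probably have to be spread over several helper methods or classes, because a single Java method is limited to 64 KB of bytecode and a class file has a bounded constant pool.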

Related

ArrayMap put method pushes elements in strange order

I am using ArrayMap for the first time in my project and I thought it works just like an array. I expected the .put method to insert each element at the next index.
But in my case this is not true: after I added all elements one by one, the first element I added ended up at index 4, which is kind of strange.
Here are the first three steps in which I add elements:
1 - Salads:
2 - Soups:
3 - Appetizers:
So somehow on second step "Soup" element was inserted in index 0 instead of 1 as I was expecting, but strangely on third step "Appetizers" was inserted as expected after "Soup".
This is the code I am using to push key and value pair:
ArrayMap<String, DMType> addElement(String typeKey, DMType type) {
ArrayMap<String, DMType> types = new ArrayMap<>();
types.put(typeKey, type);
return types;
}
Am I missing something about the behavior of ArrayMap?
Yes, the name is misleading, but unlike an array, ArrayMap does not guarantee order.
ArrayMap is a generic key->value mapping data structure that is
designed to be more memory efficient than a traditional HashMap.
ArrayMap is actually a Map:
public class ArrayMap extends SimpleArrayMap implements Map
If you want Map functionality with guaranteed order, use LinkedHashMap instead.
LinkedHashMap defines the iteration ordering, which is normally the
order in which keys were inserted into the map (insertion-order).
documentation
I thought it works just like an array
No, it works like a map, because it is a map. It is similar to a HashMap, but more memory efficient for smaller data sets.
Its order shouldn't and doesn't matter. Under the hood, it is implemented using an array, which has an order, since arrays do. This inherently gives the ArrayMap an order, but that is not part of its API. Just like which memory slot your Java objects are in, you shouldn't care about the order here either.
It doesn't work like an array. The name says it is a Map, and the documentation clearly states that it behaves as a generic key->value mapping that is more memory-efficient than a traditional HashMap implementation.
Actually I don't see why you care about the order compared to the insertion one. Data is private inside the class and you have no way to obtain the element by the index, so you are basically wondering about a private implementation which is irrelevant for its usage.
If you really want to understand how it stores its data you should take a look at the source code.
ArrayMap does NOT work like an Array, instead, it works like a HashMap with performance optimizations.
The internal sequence of the key-value pair is not guaranteed as it is NOT part of the contract.
In your case, what you really want to use is probably an ArrayList<Element>, where the Element class is defined like this:
public class Element{
private final String typeKey;
private final DMType type;
public Element(String typeKey, DMType type){
this.typeKey = typeKey;
this.type = type;
}
}
If you don't want a new class just to store the result, and you want to keep the sequence, you can use a LinkedHashMap<String, DMType>. As the documentation specifies:
Class LinkedHashMap
Hash table and linked list implementation of the Map interface, with predictable iteration order. This implementation differs from HashMap in that it maintains a doubly-linked list running through all of its entries. This linked list defines the iteration ordering, which is normally the order in which keys were inserted into the map (insertion-order). Note that insertion order is not affected if a key is re-inserted into the map. (A key k is reinserted into a map m if m.put(k, v) is invoked when m.containsKey(k) would return true immediately prior to the invocation.)
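As a small illustration of the quoted behavior, here is a sketch that inserts the three menu sections from the question into a LinkedHashMap. The class name InsertionOrderDemo is hypothetical, and Integer stands in for the DMType class, whose definition is not shown in the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class InsertionOrderDemo {
    /** Returns the keys in iteration order after inserting the menu sections. */
    static List<String> orderedKeys() {
        // Integer is only a placeholder for the question's DMType.
        Map<String, Integer> sections = new LinkedHashMap<String, Integer>();
        sections.put("Salads", 1);
        sections.put("Soups", 2);
        sections.put("Appetizers", 3);
        sections.put("Salads", 10); // re-inserting a key keeps its original position
        return new ArrayList<String>(sections.keySet());
    }
}
```

Iterating over keySet(), values() or entrySet() of such a map visits Salads, Soups and Appetizers in that order, regardless of their hash codes.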

Is there an efficient way of checking if HashMap contains keys that map to the same value?

I basically need to know if my HashMap has different keys that map to the same value. I was wondering if there is a way other than checking each keys value against all other values in the map.
Update:
Just some more information that will hopefully clarify what I'm trying to accomplish. Consider a String "azza". Say that I'm iterating over this String and storing each character as a key, and its corresponding value is some other String. Let's say I eventually get to the last occurrence of 'a' and the value is already in the map. This would be fine if the key corresponding to the value that is already in the map is also 'a'. My issue occurs when 'a' and 'z' both map to the same value, that is, only when different keys map to the same value.
Sure, the fastest to both code and execute is:
boolean hasDupeValues = new HashSet<>(map.values()).size() != map.size();
which executes in O(n) time.
Sets don't allow duplicates, so the set will be smaller than the values list if there are dupes.
Very similar to EJP's and Bohemian's answer above but with streams:
boolean hasDupeValues = map.values().stream().distinct().count() != map.size();
You could create a HashMap that maps values to lists of keys. This would take more space and require (slightly) more complex code, but with the benefit of greatly higher efficiency (amortized O(1) vs. O(n) for the method of just looping all values).
For example, say you currently have HashMap<Key, Value> map1, and you want to know which keys have the same value. You create another map, HashMap<Value, List<Key>> map2.
Then you just modify map1 and map2 together.
map1.put(key, value);
if(!map2.containsKey(value)) {
map2.put(value, new ArrayList<Key>());
}
map2.get(value).add(key);
Then to get all keys that map to value, you just do map2.get(value).
If you need to put/remove in many different places, to make sure that you don't forget to use map2 you could create your own data structure (i.e. a separate class) that contains 2 maps and implement put/remove/get/etc. for that.
Edit: I may have misunderstood the question. If you don't need an actual list of keys, just a simple "yes/no" answer to "does the map already contain this value?", and you want something better than O(n), you could keep a separate HashMap<Value, Integer> that simply counts up how many times the value occurs in the map. This would take considerably less space than a map of lists.
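The counting idea could be wrapped in a small class along these lines. This is only a sketch: the class name ValueCountingMap and its method names are invented for illustration, and it covers just put, remove and the yes/no query.

```java
import java.util.HashMap;
import java.util.Map;

class ValueCountingMap<K, V> {
    private final Map<K, V> map = new HashMap<K, V>();
    // For every value: how many keys currently map to it.
    private final Map<V, Integer> valueCounts = new HashMap<V, Integer>();

    /** Associates key with value, keeping the per-value counters in sync. */
    void put(K key, V value) {
        if (map.containsKey(key)) {
            decrement(map.get(key)); // the old value loses one key
        }
        map.put(key, value);
        Integer count = valueCounts.get(value);
        valueCounts.put(value, count == null ? 1 : count + 1);
    }

    /** Removes the key, if present, and updates the counter of its value. */
    void remove(K key) {
        if (map.containsKey(key)) {
            decrement(map.remove(key));
        }
    }

    /** True if at least two different keys currently map to this value. */
    boolean isShared(V value) {
        Integer count = valueCounts.get(value);
        return count != null && count > 1;
    }

    private void decrement(V value) {
        int count = valueCounts.get(value);
        if (count == 1) {
            valueCounts.remove(value);
        } else {
            valueCounts.put(value, count - 1);
        }
    }
}
```

Re-putting the same key with the same value leaves the counter unchanged, so in the "azza" example only a different key with an equal value makes isShared() return true.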
You can check whether a map contains a value already by calling map.values().contains(value). This is not as efficient as looking up a key in the map, but still, it's O(n), and you don't need to create a new set just in order to count its elements.
However, what you seem to need is a BiMap. There is no such thing in the Java standard library, but you can build one relatively easily by using two HashMaps: one which maps keys to values and one which maps values to keys. Every time you map a key to a value, you can then check in amortized O(1) whether the value already is mapped to, and if it isn't, map the key to the value in the one map and the value to the key in the other.
If it is an option to create a new dependency for your project, some third-party libraries contain ready-made bimaps, such as Guava (BiMap) and Apache Commons (BidiMap).
You could iterate over the keys and save the current value in the Set.
But, before inserting that value in a Set, check if the Set already contains that value.
If this is true, it means that a previous key already contains the same value.
Map<Integer, String> map = new HashMap<>();
Set<String> values = new HashSet<>();
Set<Integer> keysWithSameValue = new HashSet<>();
for(Integer key : map.keySet()) {
if(values.contains(map.get(key))) {
keysWithSameValue.add(key);
}
values.add(map.get(key));
}

How to improve search in an array of HashMap

A search in any HashMap (or any hash-based data structure) requires a single hash operation, i.e. O(1). However, when we have to search through an array of HashMaps, a single search requires O(n) hash operations, where n is the size of the array. Since an array is a collection of consecutive memory locations, I was wondering whether there is an efficient method to reduce the search in an array of HashMaps from O(n) to O(1), or whether we can design some object that has the advantages of an array but requires only a single hash operation per search. Any suggestions?
Consider this scenario: you are processing elements from different users and you want to keep them separate from each other, based on user profiles. The most memory-efficient way to keep them separate is to use an array of HashMaps.
I've edited your question to include the hypothetical scenario of various user accounts, each with various properties. As mentioned in a different answer, the solution here is a map of maps. There are two separate searches occurring:
Find the right user
Find the right property of this user
Each of these searches can be done with a separate map.
The user map:
class Application {
...
Map<String, User> userMap = new HashMap<>();
public User getUser(String userName) {
return userMap.get(userName);
}
...
}
The property map:
class User {
...
Map<String, Property> propertyMap = new HashMap<>();
public Property getProperty(String propertyName) {
return propertyMap.get(propertyName);
}
...
}
Now to find the property named favoriteTowel of the user named Arthur Dent:
myApplication.getUser("Arthur Dent").getProperty("favoriteTowel");
The answer for this is in your question. When you're using an ordinary Array to store these elements, there's no way to get O(1) complexity when searching it.
You stated that the complexity of searching a Hash Map is O(1), so why not store those Hash Maps... in a Hash Map? I'll leave the implementation of that up to you, haha.
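For what it's worth, a map of maps might be sketched like this. The class name UserStore, the method names and the use of String for both keys and values are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

class UserStore {
    // Outer map: user name -> that user's own map of elements.
    private final Map<String, Map<String, String>> byUser =
            new HashMap<String, Map<String, String>>();

    /** Stores a value under the given user, creating the inner map on first use. */
    void put(String user, String key, String value) {
        Map<String, String> elements = byUser.get(user);
        if (elements == null) {
            elements = new HashMap<String, String>();
            byUser.put(user, elements);
        }
        elements.put(key, value);
    }

    /** Two hash lookups in total, independent of the number of users. */
    String get(String user, String key) {
        Map<String, String> elements = byUser.get(user);
        return elements == null ? null : elements.get(key);
    }
}
```

Each user's data still lives in its own HashMap, as with the array, but finding the right HashMap no longer needs a scan.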
Go for the java.util package and the Java Collections framework.
Look at the set-based (HashSet, EnumSet) and hash-based (HashMap, LinkedHash..., IdentityHash...) implementations. They have O(1) contains().
Below are stats for reference.

Iterating through the union of several Java Map key sets efficiently

In one of my Java 6 projects I have an array of LinkedHashMap instances as input to a method which has to iterate through all keys (i.e. through the union of the key sets of all maps) and work with the associated values. Not all keys exist in all maps and the method should not go through each key more than once or alter the input maps.
My current implementation looks like this:
Set<Object> keyset = new HashSet<Object>();
for (Map<Object, Object> map : input) {
for (Object key : map.keySet()) {
if (keyset.add(key)) {
...
}
}
}
The HashSet instance ensures that no key will be acted upon more than once.
Unfortunately this part of the code is rather critical performance-wise, as it is called very frequently. In fact, according to the profiler over 10% of the CPU time is spent in the HashSet.add() method.
I am trying to optimise this code as much as possible. The use of LinkedHashMap with its more efficient iterators (in comparison to the plain HashMap) was a significant boost, but I was hoping to reduce what is essentially book-keeping time to the minimum.
Putting all the keys in the HashSet beforehand by using addAll() proved to be less efficient, due to the cost of calling HashSet.contains() afterwards.
At the moment I am looking at whether I can use a bitmap (well, a boolean[] to be exact) to avoid the HashSet completely, but it may not be possible at all, depending on my key range.
Is there a more efficient way to do this? Preferably something that will not pose restrictions on the keys?
EDIT:
A few clarifications and comments:
I do need all the values from the maps - I cannot drop any of them.
I also need to know which map each value came from. The missing part (...) in my code would be something like this:
for (Map<Object, Object> m : input) {
Object v = m.get(key);
// Do something with v
}
A simple example to get an idea of what I need to do with the maps would be to print all maps in parallel like this:
Key  Map0  Map1  Map2
F    1     null  2
B    2     3     null
C    null  null  5
...
That's not what I am actually doing, but you should get the idea.
The input maps are extremely variable. In fact, each call of this method uses a different set of them. Therefore I would not gain anything by caching the union of their keys.
My keys are all String instances. They are sort-of-interned on the heap using a separate HashMap, since they are pretty repetitive, therefore their hash code is already cached and most hash validations (when the HashMap implementation is checking whether two keys are actually equal, after their hash codes match) boil down to an identity comparison (==). The profiler confirms that only 0.5% of the CPU time is spent on String.equals() and String.hashCode().
EDIT 2:
Based on the suggestions in the answers, I made a few tests, profiling and benchmarking along the way. I ended up with roughly a 7% increase in performance. What I did:
I set the initial capacity of the HashSet to double the collective size of all input maps. This gained me something in the region of 1-2%, by eliminating most (all?) resize() calls in the HashSet.
I used Map.entrySet() for the map I am currently iterating. I had originally avoided this approach due to the additional code and the fear that the extra checks and Map.Entry getter method calls would outweigh any advantages. It turned out that the overall code was slightly faster.
I am sure that some people will start screaming at me, but here it is: Raw types. More specifically I used the raw form of HashSet in the code above. Since I was already using Object as its content type, I do not lose any type safety. The cost of that useless checkcast operation when calling HashSet.add() was apparently important enough to produce a 4% increase in performance when removed. Why the JVM insists on checking casts to Object is beyond me...
Can't provide a replacement for your approach but a few suggestions to (slightly) optimize the existing code.
Consider initializing the hash set with a capacity (the sum of the sizes of all maps). This avoids or reduces resizing of the set during add operations.
Consider not using the keySet() as it will always create a new set in the background. Use the entrySet() instead; that should be much faster.
Have a look at the implementations of equals() and hashCode(). If they are "expensive", they have a negative impact on the add method.
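The first two suggestions could be combined roughly as follows. This is a sketch only: the class name UnionKeys and the method distinctKeys are invented, and the keys are collected into a list here merely so that the result can be inspected, whereas the original loop processes each key in place.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class UnionKeys {
    /** Returns each key of the input maps exactly once, in first-seen order. */
    static List<Object> distinctKeys(List<Map<Object, Object>> input) {
        int total = 0;
        for (Map<Object, Object> map : input) {
            total += map.size();
        }
        // Sized so that 'total' entries fit under the default 0.75 load factor.
        Set<Object> seen = new HashSet<Object>((int) (total / 0.75f) + 1);
        List<Object> keys = new ArrayList<Object>();
        for (Map<Object, Object> map : input) {
            for (Map.Entry<Object, Object> entry : map.entrySet()) {
                if (seen.add(entry.getKey())) {
                    keys.add(entry.getKey()); // first time this key is seen
                }
            }
        }
        return keys;
    }
}
```

Whether this is measurably faster depends on the data, so it is worth profiling before and after.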
How you avoid using a HashSet depends on what you are doing.
I would only calculate the union once each time the input is changed. This should be relatively rare compared with the number of lookups.
// on an update.
Map<Key, Value> union = new LinkedHashMap<Key, Value>();
for (Map<Key, Value> map : input)
union.putAll(map);
// on a lookup.
Value value = union.get(key);
// process each key once
for(Entry<Key, Value> entry: union.entrySet()) {
// do something.
}
Option A is to use the .values() method and iterate through it. But I suppose you already had thought of it.
If the code is called so often, then it might be worth creating additional structures (depending on how often the data changes). Create a new HashMap in which every key of any of your HashMaps is a key, and whose value is a list of the HashMaps in which that key appears.
This will help if the data is somewhat static (relative to the frequency of queries), so that the overhead of managing the structure is relatively small, and if the key space is not very dense (keys do not repeat a lot across the different HashMaps), as it will save a lot of unneeded contains() calls.
Of course, if you are mixing data structures it is better if you encapsulate all in your own data structure.
You could take a look at Guava's Sets.union() http://guava-libraries.googlecode.com/svn/tags/release04/javadoc/com/google/common/collect/Sets.html#union(java.util.Set,%20java.util.Set)

Finding the highest-n values in a Map

I have a large map of String->Integer and I want to find the highest 5 values in the map. My current approach involves translating the map into an array list of pair(key, value) objects and then sorting it using Collections.sort() before taking the first 5. It is possible for a key to have its value updated during the course of operation.
I think this approach is acceptable single threaded, but if I had multiple threads all triggering the transpose and sort frequently it doesn't seem very efficient. The alternative seems to be to maintain a separate list of the highest 5 entries and keep it updated when relevant operations on the map take place.
Could I have some suggestions/alternatives on optimizing this please? I am happy to consider different data structures if there is a benefit.
Thanks!
Well, to find the highest 5 values in a Map, you can do that in O(n) time, whereas any comparison-based sort takes O(n log n).
The easiest way is to simply do a for loop through the entry set of the Map.
for (Entry<String, Integer> entry: map.entrySet()) {
if (entry.getValue() > smallestMaxSoFar)
updateListOfMaximums();
}
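One way to fill in the two placeholders above (smallestMaxSoFar and updateListOfMaximums()) is a bounded min-heap. The sketch below uses java.util.PriorityQueue; the class name TopValues and the method name highest are made up, and it assumes the map contains no null values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class TopValues {
    /** Returns the n highest values of the map, largest first. */
    static List<Integer> highest(Map<String, Integer> map, int n) {
        List<Integer> result = new ArrayList<Integer>();
        if (n <= 0) {
            return result;
        }
        // Min-heap: its head is the smallest of the maximums kept so far.
        PriorityQueue<Integer> heap = new PriorityQueue<Integer>(n);
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            if (heap.size() < n) {
                heap.add(entry.getValue());
            } else if (entry.getValue() > heap.peek()) {
                heap.poll(); // drop the current smallest maximum
                heap.add(entry.getValue());
            }
        }
        result.addAll(heap);
        Collections.sort(result, Collections.<Integer>reverseOrder());
        return result;
    }
}
```

The heap never holds more than n elements, so for a fixed n such as 5 the whole pass stays linear in the size of the map.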
You could use two Maps:
// Maps name to value
Map<String, Integer> byName;
// Maps value to names
NavigableMap<Integer, Collection<String>> byValue;
and make sure to always keep them in sync (possibly wrap both in another class which is responsible for put, get, etc). For the highest values use byValue.navigableKeySet().descendingIterator().
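A possible wrapper class is sketched below. The name ScoreBoard and its methods are hypothetical, a Set is used for the names sharing a value, the methods are simply synchronized to keep the two maps consistent across threads, and descendingMap() is used instead of the descending key iterator, which walks the values in the same order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

class ScoreBoard {
    private final Map<String, Integer> byName = new HashMap<String, Integer>();
    private final NavigableMap<Integer, Set<String>> byValue =
            new TreeMap<Integer, Set<String>>();

    /** Sets or updates the value for a name, keeping both maps in sync. */
    synchronized void put(String name, int value) {
        Integer old = byName.put(name, value);
        if (old != null) {
            Set<String> oldNames = byValue.get(old);
            oldNames.remove(name);
            if (oldNames.isEmpty()) {
                byValue.remove(old); // no name holds this value any more
            }
        }
        Set<String> names = byValue.get(value);
        if (names == null) {
            names = new HashSet<String>();
            byValue.put(value, names);
        }
        names.add(name);
    }

    /** Returns up to n names, highest value first. */
    synchronized List<String> top(int n) {
        List<String> result = new ArrayList<String>();
        for (Set<String> names : byValue.descendingMap().values()) {
            for (String name : names) {
                if (result.size() == n) {
                    return result;
                }
                result.add(name);
            }
        }
        return result;
    }
}
```

Updates cost O(log n) because of the TreeMap, but reading the top five never requires sorting the whole map.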
I think this approach is acceptable single threaded, but if I had multiple threads all triggering the transpose and sort frequently it doesn't seem very efficient. The alternative seems to be to maintain a separate list of the highest 5 entries and keep it updated when relevant operations on the map take place.
There is an approach in between that you can take as well. When a thread requests a "sorted view" of the map, create a copy of the map and then handle the sorting on that.
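public List<Integer> getMaxFive() {
    Map<String, Integer> copy;
    synchronized (lockObject) {
        copy = new HashMap<String, Integer>(originalMap);
    }
    // sort the copied values outside the lock, highest first
    List<Integer> list = new ArrayList<Integer>(copy.values());
    Collections.sort(list, Collections.reverseOrder());
    return list.subList(0, Math.min(5, list.size()));
}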
Ideally if you have some state (such as this map) accessed by multiple threads, you are encapsulating the state behind some other class so that each thread is not updating the map directly.
I would create a method like:
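// Note: this removes entries from the map that is passed in, so pass a copy
// if the original must stay intact. Equal values are returned only once, and
// unused slots stay 0 if the map runs out of values.
private static int[] getMaxFromMap(Map<String, Integer> map, int qty) {
    int[] max = new int[qty];
    for (int a = 0; a < qty && !map.isEmpty(); a++) {
        max[a] = Collections.max(map.values());
        map.values().removeAll(Collections.singleton(max[a]));
    }
    return max;
}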
Taking advantage of Collections.max() and Collections.singleton()
There are two ways of doing that easily:
Put the map into a heap structure and retrieve the n elements you want from it.
Iterate through the map and update a list of n highest values using each entry.
If you want to retrieve an unknown or a large number of highest values, the first method is the way to go. If you have a fixed small number of values to retrieve, the second might be easier to understand for some programmers.
Personally, I prefer the first method.
Please try another data structure. Suppose there's a class named MyClass whose attributes are key (String) and value (int). MyClass, of course, needs to implement the Comparable interface. Another approach is to create a class named MyClassComparator which implements Comparator.
The compareTo (no matter where it is) method should be defined like this:
compareTo(parameters){
return Integer.compare(value2, value1); // descending, without the overflow risk of value2 - value1
}
The rest is easy. Using a List and invoking the Collections.sort(parameters) method will do the sorting part.
I don't know what sorting algorithm Collections.sort(parameters) uses. But if you expect data to keep arriving over time, you will want an insertion sort, since it works well on nearly sorted data and is an online algorithm.
If modifications are rare, I'd implement some SortedByValHashMap<K,V> extends HashMap<K,V> (similar to LinkedHashMap) that keeps the entries ordered by value.
