How to improve search in an array of HashMap

A search in a HashMap (or any hash-based data structure) requires a single hash operation, i.e. O(1). However, searching through an array of HashMaps requires O(n) hash operations, where n is the size of the array. Since an array is a collection of consecutive memory locations, I was wondering whether there is an efficient method to reduce the search from O(n) to O(1) in an array of HashMaps, or whether we can design some object that has the advantages of an array but requires only a single hash operation per search. Any suggestions?
Consider this scenario: you are processing elements from different users and you want to keep them separate from each other, based on user profiles. The most memory-efficient way to keep them separate is an array of HashMaps.

I've edited your question to include the hypothetical scenario of various user accounts, each with various properties. As mentioned in a different answer, the solution here is a map of maps. There are two separate searches occurring:
Find the right user
Find the right property of this user
Each of these searches can be done with a separate map.
The user map:
class Application {
    ...
    Map<String, User> userMap = new HashMap<>();
    public User getUser(String userName) {
        return userMap.get(userName);
    }
    ...
}
The property map:
class User {
    ...
    Map<String, Property> propertyMap = new HashMap<>();
    public Property getProperty(String propertyName) {
        return propertyMap.get(propertyName);
    }
    ...
}
Now to find the property named favoriteTowel of the user named Arthur Dent:
myApplication.getUser("Arthur Dent").getProperty("favoriteTowel");

The answer for this is in your question. When you're using an ordinary Array to store these elements, there's no way to get O(1) complexity when searching it.
You stated that the complexity of searching a Hash Map is O(1), so why not store those Hash Maps... in a Hash Map? I'll leave the implementation of that up to you, haha.
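Since the implementation is left open above, here is a minimal sketch of the "HashMap of HashMaps" idea. The class name UserStore and its put/get methods are hypothetical, chosen only for illustration; the point is that the outer map replaces the array, so finding the right user costs one hash lookup instead of a scan.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a map of maps replacing an array of HashMaps.
final class UserStore {
    // Outer key: user name. Inner map: that user's elements.
    private final Map<String, Map<String, String>> byUser = new HashMap<>();

    // Stores an element under the given user, creating the inner map on first use.
    void put(String user, String key, String value) {
        byUser.computeIfAbsent(user, u -> new HashMap<>()).put(key, value);
    }

    // Two hash lookups, however many users exist; returns null if absent.
    String get(String user, String key) {
        Map<String, String> inner = byUser.get(user);
        return inner == null ? null : inner.get(key);
    }

    public static void main(String[] args) {
        UserStore store = new UserStore();
        store.put("Arthur Dent", "favoriteTowel", "blue");
        System.out.println(store.get("Arthur Dent", "favoriteTowel"));
        System.out.println(store.get("Ford Prefect", "favoriteTowel"));
    }
}
```

Each user's data still lives in its own inner map, so the separation between users that the array provided is kept.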

Go for the java.util package and the Java Collections framework.
Look at the set implementations (HashSet, EnumSet) and the hash-based map implementations (HashMap, LinkedHashMap, IdentityHashMap). They offer O(1) lookups (contains() on the sets, containsKey() on the maps).

Related

Is it a good idea to store data as keys in HashMap with empty/null values?

I had originally written an ArrayList and stored unique values (usernames, i.e. Strings) in it. I later needed to use the ArrayList to search if a user existed in it. That's O(n) for the search.
My tech lead wanted me to change that to a HashMap and store the usernames as keys in the map, with empty Strings as values.
So, in Java -
hashmap.put("johndoe","");
I can see if this user exists later by running -
hashmap.containsKey("johndoe");
This is O(1) right?
My lead said this was a more efficient way to do this and it made sense to me, but it just seemed a bit off to put null/empty as values in the hashmap and store elements in it as keys.
My question is, is this a good approach? The efficiency beats ArrayList#contains or an array search in general. It works.
My worry is, I haven't seen anyone else do this after a search. I may be missing an obvious issue somewhere but I can't see it.

Since you have a set of unique values, a Set is the appropriate data structure. You can put your values inside HashSet, an implementation of the Set interface.
"My lead said this was a more efficient way to do this and it made sense to me, but it just seemed a bit off to put null/empty as values in the hashmap and store elements in it as keys."
The advice of the lead is flawed. Map is not the right abstraction for this, Set is. A Map is appropriate for key-value pairs. But you don't have values, only keys.
Example usage:
Set<String> users = new HashSet<>(Arrays.asList("Alice", "Bob"));
System.out.println(users.contains("Alice"));
// -> prints true
System.out.println(users.contains("Jack"));
// -> prints false
Using a Map would be awkward, because what should be the type of the values? That question makes no sense in your use case,
as you have just keys, not key-value pairs.
With a Set, you don't need to ask that, the usage is perfectly natural.
"This is O(1) right?"
Yes, searching in a HashMap or a HashSet is O(1) on average (assuming the hash function disperses the keys well), while searching in a List or an array is O(n) in the worst case.
Some comments point out that a HashSet is implemented in terms of HashMap.
That's fine, at that level of abstraction.
At the level of abstraction of the task at hand ---
to store a collection of unique usernames,
using a set is a natural choice, more natural than a map.

This is basically how HashSet is implemented, so I guess you can say it's a good approach. You might as well use HashSet instead of your HashMap with empty values.
For example :
HashSet's implementation of add is
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
where map is the backing HashMap and PRESENT is a dummy value.
"My worry is, I haven't seen anyone else do this after a search. I may be missing an obvious issue somewhere but I can't see it."
As I mentioned, the developers of the JDK are using this same approach.

Java HashMap to store in different buckets internally

I have a couple of scenarios related to how a HashMap stores its entries, and I don't know how to accomplish them.
Case 1: Objects are saved in buckets, and the hashcode is taken into consideration when saving them. Now say there are 5 buckets and I want to control which bucket an object is saved in. Is there a way to achieve that? Say the internal mechanism was going to save a particular object into bucket 4, but I want to save it into bucket 1.
Case 2: Similarly, if I see that out of 5 buckets one is getting much more load than the others, I want to do a kind of load balancing by moving objects to different buckets. How can that be accomplished?

There is fundamentally no way to achieve load balancing in a hashtable. The quintessential property of this structure is direct access to exactly the bucket which must hold the requested key. Any balancing scheme would involve reshuffling the objects among buckets and destroy this property. This is the reason why good-quality hashcodes are vital to the proper operation of a hashtable.
Additionally note that you can't even control bucket selection by manipulating the hashCode() method of your objects, because hashcodes of any two equal objects must match, and because any self-respecting hashtable implementation will additionally shuffle the bits of the value retrieved from hashCode() to ensure better dispersion.
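To illustrate the bit shuffling mentioned above, here is a small sketch of how a bucket index can be derived from hashCode(). It mirrors, to my knowledge, the approach taken by OpenJDK's HashMap since Java 8 (XOR the upper 16 bits into the lower 16, then mask by the table length), but this is an internal detail that may differ between versions and vendors. The class and method names are hypothetical.

```java
import java.util.Objects;

// Hypothetical sketch of hash spreading and bucket selection.
final class BucketIndexSketch {
    // Fold the high 16 bits into the low 16 so that they influence the bucket index.
    static int spread(Object key) {
        int h = Objects.hashCode(key); // 0 for a null key
        return h ^ (h >>> 16);
    }

    // Bucket index for a table whose length is a power of two.
    static int bucketIndex(Object key, int tableLength) {
        return (tableLength - 1) & spread(key);
    }

    public static void main(String[] args) {
        for (String key : new String[] {"alice", "bob", "carol"}) {
            System.out.println(key + " -> bucket " + bucketIndex(key, 16));
        }
    }
}
```

Because the index depends on both the spread hash and the current table length, an object's bucket also changes whenever the table is resized, which is another reason it cannot be pinned from the outside.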

The implementations are designed so that you shouldn't have to worry about these details.
If you wanted to control these more carefully, then you can create your own class implementing Map.

With HashMap, and with all collections whose names start with Hash, the most important part is the hashCode generated by the domain object that you are trying to store. That's why every object has a hashCode implementation (implicitly through Object.hashCode(), or explicitly by overriding it).
First of all, HashMap already tries to accomplish what you stated in case 2 (sort of). If your hashCode implementation is good, meaning it produces evenly dispersed values for a variety of objects, then the load on the buckets of the HashMap is more or less evenly distributed, and you don't have to do anything other than write a good hashCode function. You can also influence the balance to some degree through your hashCode implementation, by producing the same hash code for objects that you want to end up in the same bucket.
If you want complete control over the internals of the HashMap, then you should write your own by implementing the Map interface.

The underlying mechanism for bucket creation and placement are abstracted away.
For case 1, you can simply use objects as your keys for the bucket placement. For case 2, you cannot see the actual placement of objects directly.
Although, what you can do is use a Multimap which you can treat the keys as if they were buckets. It's basically a map from keys to collections. Here you can check any given key(bucket) and see how many items you have placed in there. Here you can satisfy requirements from both cases. This is probably as close as you're going to get without actually tampering with the internal bucketing mechanism.
From the link, here is a snippet:
import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.Multimap;
import java.util.Collection;

// Multimap and ArrayListMultimap come from Google Guava
public class MultimapTest {
    public static void main(String... args) {
        Multimap<String, String> myMultimap = ArrayListMultimap.create();
        // Adding some key/value pairs
        myMultimap.put("Fruits", "Banana");
        myMultimap.put("Fruits", "Apple");
        myMultimap.put("Fruits", "Pear");
        myMultimap.put("Vegetables", "Carrot");
        // Getting the size (counts key/value pairs, not keys)
        int size = myMultimap.size();
        System.out.println(size); // 4
        // Getting values
        Collection<String> fruits = myMultimap.get("Fruits");
        System.out.println(fruits); // [Banana, Apple, Pear]
        Collection<String> vegetables = myMultimap.get("Vegetables");
        System.out.println(vegetables); // [Carrot]
        // Iterating over the entire Multimap
        for (String value : myMultimap.values()) {
            System.out.println(value);
        }
        // Removing a single value
        myMultimap.remove("Fruits", "Pear");
        System.out.println(myMultimap.get("Fruits")); // [Banana, Apple]
        // Removing all values for a key
        myMultimap.removeAll("Fruits");
        System.out.println(myMultimap.get("Fruits")); // [] (empty collection)
    }
}

Creating Dictionary in java?

Everywhere on the net, this is the suggested way:
Map<String, String> map = new HashMap<String, String>();
map.put("dog", "type of animal");
System.out.println(map.get("dog"));
My point is: should it not be a TreeMap, considering that a dictionary has to be sorted? Agreed, lookup won't be as fast with a TreeMap, but considering the sorting it seems to be the best data structure.
UPDATE: one more requirement is to return the lexicographically nearest word if the searched word is not present. I am not sure how to achieve that.

If you need the map sorted by its keys, then use TreeMap, which "...provides guaranteed log(n) time cost for the containsKey, get, put and remove operations." If not, use the more general HashMap (which "...provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets..."), or one of the other Map implementations, depending on your need.

If you want to get the value for a given key and the probability of an exact match in the HashMap is low, then using a HashMap won't give you the benefit of direct lookup.
With a TreeMap, the keys are already ordered, so you can perform a binary search on them, comparing keys lexicographically. Continue the binary search until the lexicographic distance between the two keys is minimal or 0.
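The binary search described above does not have to be written by hand: TreeMap exposes floorKey and ceilingKey, which return the sorted neighbours of a missing key in O(log n). The sketch below is only one possible reading of "lexicographically nearest". It assumes that the neighbour sharing the longer common prefix with the searched word is the nearer one, with ties going to the lower neighbour. The class and method names are hypothetical.

```java
import java.util.TreeMap;

// Hypothetical sketch: nearest-word lookup on a sorted dictionary.
final class NearestWord {
    // Length of the common prefix of two strings.
    static int commonPrefix(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    // Returns the word itself if present, otherwise the sorted neighbour with the
    // longer common prefix (ties go to the lower neighbour); null for an empty map.
    static String nearest(TreeMap<String, String> dict, String word) {
        String lower = dict.floorKey(word);    // greatest key <= word, or null
        String higher = dict.ceilingKey(word); // smallest key >= word, or null
        if (lower == null) return higher;
        if (higher == null) return lower;
        return commonPrefix(word, higher) > commonPrefix(word, lower) ? higher : lower;
    }

    public static void main(String[] args) {
        TreeMap<String, String> dict = new TreeMap<>();
        dict.put("cat", "type of animal");
        dict.put("dog", "type of animal");
        dict.put("door", "part of a house");
        System.out.println(nearest(dict, "doe")); // dog
        System.out.println(nearest(dict, "dog")); // dog
    }
}
```

A different distance measure (for example edit distance) would need a different comparison in the last line of nearest, but the two candidate keys would still come from floorKey and ceilingKey.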

Dictionary is no longer a term used in the language. You'll get multiple answers.
I know that Objective-C has a dictionary class (NSDictionary) that is a key/value data structure. The fact that it's named "dictionary" leads me to believe that the name refers to the ordering of the objects; I imagine the key has to be a string or char.
So, it depends on the entire question.
When someone says they want to create a Key/Value data structure that is ordered alphabetically, or a "Dictionary", the answer is:
TreeMap<String, Object> map = new TreeMap<>()
If someone is asking how to create a Key/Value object similar to a Dictionary in whatever language, they will likely get any of the java.util classes that implement the Map<K, V> interface, for example HashMap, TreeMap. A good answer would be a TreeMap.
In this case telling someone to use a HashMap is not debatable, because the answer is as vague as the question.

fast static key-value mapping

I have a set of unique key-value pairs, both key and value are strings. The number of pairs is very huge and finding the value of a certain string is extremely time-critical.
The pairs are computed beforehand and are given for a certain program. So i could just write a method containing:
public String getValue(String key)
{
    // repeat for every pair
    if (key.equals("abc"))
    {
        return "def";
    }
    // no pair matched
    return null;
}
but I am talking about more than 250,000 pairs, and perhaps sorting them could be faster...
I have a class that contains the getValue() method and can use its constructor, but it has no connection to any database, file system, etc. So every pair has to be defined inside the class.
Do you have any ideas that could be faster than a huge series of if-statements? Perhaps using a sorted map that gets the pairs presorted? Perhaps improving constructor time by deserializing an already created map?
I would like your answers to contain a basic code example of your approach; I will comment on answers with the time they took on a set of pairs!
Time-frame: 1000 milliseconds for one constructor call and 20 calls of getValue().
Keys have a size of 256 and values have a size < 16

This is exactly what a hash table is made for. It provides O(1) lookup if implemented correctly, which means that as long as you have enough memory and your hash function and collision strategy are smart, it can get values for keys in constant time. Java has multiple hash-backed data structures, from the sounds of things a HashMap<String, String> would do the trick for you.
You can construct it like this:
Map<String, String> myHashTable = new HashMap<String, String>();
add values like this:
myHashTable.put("abcd","value corresponding to abcd");
and get the value for a key like this:
myHashTable.get("abcd");
You were on the right track when you had the intuition that running through all of the keys and checking was not efficient, that would be an O(n) runtime approach, since you'd have to run through all n elements.
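Because the number of pairs is known in advance, the map can be sized up front so that it never has to rehash while it is being filled in the constructor. This is a minimal sketch under that assumption. The class name StaticLookup, the expectedPairs parameter and the add method are hypothetical, and how the 250,000 pairs get into the class is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a presized HashMap for a fixed set of pairs.
final class StaticLookup {
    private final Map<String, String> table;

    // expectedPairs: the number of pairs known beforehand (an assumption of this sketch).
    StaticLookup(int expectedPairs) {
        // With the default load factor of 0.75, this capacity holds
        // expectedPairs entries without triggering a resize.
        int capacity = (int) (expectedPairs / 0.75f) + 1;
        table = new HashMap<>(capacity);
    }

    void add(String key, String value) {
        table.put(key, value);
    }

    // One hash lookup on average; returns null for an unknown key.
    String getValue(String key) {
        return table.get(key);
    }

    public static void main(String[] args) {
        StaticLookup lookup = new StaticLookup(250_000);
        lookup.add("abc", "def");
        System.out.println(lookup.getValue("abc"));
    }
}
```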

Finding the highest-n values in a Map

I have a large map of String->Integer and I want to find the highest 5 values in the map. My current approach involves translating the map into an array list of pair(key, value) object and then sorting using Collections.sort() before taking the first 5. It is possible for a key to have its value updated during the course of operation.
I think this approach is acceptable single threaded, but if I had multiple threads all triggering the transpose and sort frequently it doesn't seem very efficient. The alternative seems to be to maintain a separate list of the highest 5 entries and keep it updated when relevant operations on the map take place.
Could I have some suggestions/alternatives on optimizing this please? Am happy to consider different data structures if there is benefit.
Thanks!

Well, you can find the highest 5 values in a Map in O(n) time, which is faster than any comparison sort (O(n log n)).
The easiest way is to simply do a for loop through the entry set of the Map.
for (Entry<String, Integer> entry: map.entrySet()) {
if (entry.getValue() > smallestMaxSoFar)
updateListOfMaximums();
}
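The loop above leaves smallestMaxSoFar and updateListOfMaximums() undefined. One way to fill them in, offered here as a sketch and not as the answerer's own implementation, is a min-heap capped at n elements: its head plays the role of smallestMaxSoFar. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch: single pass over the map with a bounded min-heap.
final class TopValues {
    // Returns up to n of the largest values, largest first.
    // One pass over the map; each heap operation costs O(log n).
    static List<Integer> highest(Map<String, Integer> map, int n) {
        List<Integer> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Min-heap: the smallest of the values kept so far sits at the head.
        PriorityQueue<Integer> heap = new PriorityQueue<>(n);
        for (int value : map.values()) {
            if (heap.size() < n) {
                heap.add(value);
            } else if (value > heap.peek()) {
                heap.poll();      // drop the current smallest maximum
                heap.add(value);  // and keep the larger value instead
            }
        }
        result.addAll(heap);
        Collections.sort(result, Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 3);
        map.put("b", 9);
        map.put("c", 1);
        map.put("d", 7);
        System.out.println(highest(map, 2)); // [9, 7]
    }
}
```

For a fixed n such as 5, the heap operations are effectively constant, so the whole scan stays linear in the size of the map.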

You could use two Maps:
// Maps name to value
Map<String, Integer> byName = new HashMap<>();
// Maps value to the names that currently have it
NavigableMap<Integer, Collection<String>> byValue = new TreeMap<>();
and make sure to always keep them in sync (possibly wrap both in another class which is responsible for put, get, etc). For the highest values use byValue.navigableKeySet().descendingIterator().
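A possible shape for the wrapping class is sketched below. The name RankedMap and its methods are hypothetical, and the sketch simply synchronizes every method to keep the two maps consistent under concurrent access, which may or may not be the right locking strategy for a given workload.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: two maps kept in sync behind one class.
final class RankedMap {
    private final Map<String, Integer> byName = new HashMap<>();
    private final NavigableMap<Integer, Set<String>> byValue = new TreeMap<>();

    // Inserts or updates a name and moves it between value groups as needed.
    synchronized void put(String name, int value) {
        Integer old = byName.put(name, value);
        if (old != null) {
            Set<String> names = byValue.get(old);
            names.remove(name);
            if (names.isEmpty()) {
                byValue.remove(old); // no name has the old value any more
            }
        }
        byValue.computeIfAbsent(value, v -> new HashSet<>()).add(name);
    }

    synchronized Integer get(String name) {
        return byName.get(name);
    }

    // Up to n highest distinct values, largest first.
    synchronized List<Integer> highestValues(int n) {
        List<Integer> result = new ArrayList<>();
        for (Integer value : byValue.descendingKeySet()) {
            if (result.size() >= n) {
                break;
            }
            result.add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        RankedMap ranked = new RankedMap();
        ranked.put("a", 1);
        ranked.put("b", 5);
        ranked.put("c", 3);
        ranked.put("a", 10); // update: "a" moves from value 1 to value 10
        System.out.println(ranked.highestValues(2)); // [10, 5]
    }
}
```

Updates cost O(log n) because of the TreeMap, but reading the highest values no longer requires copying or sorting the whole map.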

"I think this approach is acceptable single threaded, but if I had multiple threads all triggering the transpose and sort frequently it doesn't seem very efficient. The alternative seems to be to maintain a separate list of the highest 5 entries and keep it updated when relevant operations on the map take place."
There is an approach in between that you can take as well. When a thread requests a "sorted view" of the map, create a copy of the map and then handle the sorting on that.
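public List<Integer> getMaxFive() {
    Map<String, Integer> copy;
    synchronized (lockObject) {
        // copy under the lock so that sorting works on a stable snapshot
        copy = new HashMap<String, Integer>(originalMap);
    }
    // sort the copied values in descending order, outside the lock
    List<Integer> list = new ArrayList<Integer>(copy.values());
    Collections.sort(list, Collections.reverseOrder());
    return list.subList(0, Math.min(5, list.size()));
}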
Ideally if you have some state (such as this map) accessed by multiple threads, you are encapsulating the state behind some other class so that each thread is not updating the map directly.

I would create a method like:
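private static int[] getMaxFromMap(Map<String, Integer> map, int qty) {
    // work on a copy so that the caller's map is not emptied as a side effect
    List<Integer> values = new ArrayList<Integer>(map.values());
    int[] max = new int[qty];
    // if fewer than qty distinct values exist, the remaining slots stay 0
    for (int a = 0; a < qty && !values.isEmpty(); a++) {
        max[a] = Collections.max(values);
        // remove every occurrence of the current maximum
        values.removeAll(Collections.singleton(max[a]));
    }
    return max;
}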
Taking advantage of Collections.max() and Collections.singleton()

There are two easy ways of doing that:
Put the map into a heap structure and retrieve the n elements you want from it.
Iterate through the map and update a list of the n highest values using each entry.
If you want to retrieve an unknown or a large number of highest values, the first method is the way to go. If you have a fixed, small number of values to retrieve, the second might be easier to understand for some programmers.
Personally, I prefer the first method.

Please try another data structure. Suppose there's a class named MyClass whose attributes are key (String) and value (int). MyClass, of course, needs to implement the Comparable interface. Another approach is to create a class named MyClassComparator which implements Comparator.
The comparison method (compareTo in MyClass, or compare in the comparator) should be defined like this:
public int compareTo(MyClass other) {
    // descending by value; Integer.compare avoids the overflow that subtraction can cause
    return Integer.compare(other.value, this.value);
}
The rest is easy. Using a List and invoking the Collections.sort(parameters) method will do the sorting part.
Collections.sort(parameters) uses a stable merge sort (TimSort since Java 7), which performs well on nearly sorted input. If data keeps arriving over time, inserting each new element at its sorted position, as an insertion sort does, keeps the list ordered online.

If modifications are rare, I'd implement some SortedByValHashMap<K,V> extends HashMap<K,V> (similar to LinkedHashMap) that keeps the entries ordered by value.
