I have a large map of String->Integer and I want to find the highest 5 values in the map. My current approach involves translating the map into an array list of pair(key, value) object and then sorting using Collections.sort() before taking the first 5. It is possible for a key to have its value updated during the course of operation.
I think this approach is acceptable single threaded, but if I had multiple threads all triggering the transpose and sort frequently it doesn't seem very efficient. The alternative seems to be to maintain a separate list of the highest 5 entries and keep it updated when relevant operations on the map take place.
Could I have some suggestions/alternatives on optimizing this please? Am happy to consider different data structures if there is benefit.
Thanks!
Well, you can find the highest 5 values in a Map in O(n) time, whereas a general comparison sort takes O(n log n).
The easiest way is to simply do a for loop through the entry set of the Map.
for (Entry<String, Integer> entry: map.entrySet()) {
if (entry.getValue() > smallestMaxSoFar)
updateListOfMaximums();
}
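One way to fill in updateListOfMaximums() is a small min-heap that never holds more than n values. This is only a sketch: the class and method names (TopValues, largest) are made up for illustration, and it assumes the map contains no null values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class TopValues {
    // Returns the n largest values in descending order after a single pass over the map.
    static List<Integer> largest(Map<String, Integer> map, int n) {
        List<Integer> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Min-heap: its head is the smallest of the maximums kept so far.
        PriorityQueue<Integer> heap = new PriorityQueue<>(n);
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            if (heap.size() < n) {
                heap.add(entry.getValue());
            } else if (entry.getValue() > heap.peek()) {
                // The new value beats the smallest kept maximum, so swap it in.
                heap.poll();
                heap.add(entry.getValue());
            }
        }
        result.addAll(heap);
        Collections.sort(result, Collections.reverseOrder());
        return result;
    }
}
```

With n fixed at 5 the heap operations are effectively constant time, so the whole scan stays linear in the size of the map.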
You could use two Maps:
// Maps name to value
Map<String, Integer> byName;
// Maps value to names
NavigableMap<Integer, Collection<String>> byValue;
and make sure to always keep them in sync (possibly wrap both in another class which is responsible for put, get, etc). For the highest values use byValue.navigableKeySet().descendingIterator().
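A minimal sketch of such a wrapper follows. The class name RankedMap and its methods are hypothetical, and the coarse synchronized methods are just one simple way to keep the two maps consistent under concurrent access.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

class RankedMap {
    private final Map<String, Integer> byName = new HashMap<>();
    private final NavigableMap<Integer, Set<String>> byValue = new TreeMap<>();

    synchronized void put(String name, int value) {
        Integer old = byName.put(name, value);
        if (old != null) {
            // Remove the name from the bucket of its previous value.
            Set<String> names = byValue.get(old);
            names.remove(name);
            if (names.isEmpty()) {
                byValue.remove(old);
            }
        }
        byValue.computeIfAbsent(value, v -> new HashSet<>()).add(name);
    }

    synchronized Integer get(String name) {
        return byName.get(name);
    }

    // Returns up to n values, highest first; a value shared by several names is repeated.
    synchronized List<Integer> highestValues(int n) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Set<String>> e : byValue.descendingMap().entrySet()) {
            for (String ignored : e.getValue()) {
                if (result.size() >= n) {
                    return result;
                }
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

Updates cost O(log n) because of the TreeMap, but reading the top 5 no longer requires copying or sorting the whole map.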
I think this approach is acceptable single threaded, but if I had multiple threads all triggering the transpose and sort frequently it doesn't seem very efficient. The alternative seems to be to maintain a separate list of the highest 5 entries and keep it updated when relevant operations on the map take place.
There is an approach in between that you can take as well. When a thread requests a "sorted view" of the map, create a copy of the map and then handle the sorting on that.
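public List<Integer> getMaxFive() {
    Map<String, Integer> copy;
    synchronized (lockObject) {
        // hold the lock only long enough to take the snapshot
        copy = new HashMap<String, Integer>(originalMap);
    }
    // sort the copy as usual, outside the lock
    List<Integer> list = new ArrayList<Integer>(copy.values());
    Collections.sort(list, Collections.reverseOrder());
    return list.subList(0, Math.min(5, list.size()));
}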
Ideally if you have some state (such as this map) accessed by multiple threads, you are encapsulating the state behind some other class so that each thread is not updating the map directly.
I would create a method like:
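private static int[] getMaxFromMap(Map<String, Integer> map, int qty) {
    // Work on a copy of the values so the caller's map is left untouched
    List<Integer> values = new ArrayList<Integer>(map.values());
    int[] max = new int[qty];
    for (int a = 0; a < qty && !values.isEmpty(); a++) {
        max[a] = Collections.max(values);
        // Drop every occurrence of this maximum, so the result holds distinct values
        values.removeAll(Collections.singleton(max[a]));
    }
    return max;
}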
Taking advantage of Collections.max() and Collections.singleton()
There are two ways of doing that easily:
Put the map into a heap structure and retrieve the n elements you want from it.
Iterate through the map and update a list of the n highest values using each entry.
If you want to retrieve an unknown or a large number of highest values, the first method is the way to go. If you have a fixed small number of values to retrieve, the second might be easier to understand for some programmers.
Personally, I prefer the first method.
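As a rough illustration of the first method, java.util.PriorityQueue can serve as the heap. The class and method names below (HeapSelect, highest) are invented for this sketch, and it assumes the map holds no null values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class HeapSelect {
    // Loads every value into a max-heap, then polls the n largest in descending order.
    static List<Integer> highest(Map<String, Integer> map, int n) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(Collections.reverseOrder());
        heap.addAll(map.values());
        List<Integer> result = new ArrayList<>();
        while (result.size() < n && !heap.isEmpty()) {
            result.add(heap.poll());
        }
        return result;
    }
}
```

Because the heap already contains everything, asking for a different n later only changes how many times poll() is called.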
Please try another data structure. Suppose there's a class named MyClass whose attributes are key (String) and value (int). MyClass, of course, needs to implement the Comparable interface. Another approach is to create a class named MyClassComparator which implements Comparator.
The compareTo (no matter where it is) method should be defined like this:
compareTo(parameters){
    return Integer.compare(value2, value1); // descending; unlike value2 - value1 this cannot overflow
}
The rest is easy. Using List and invoking Collections.sort(parameters) method will do the sorting part.
I don't know which sorting algorithm Collections.sort(parameters) uses. But if you expect data to keep arriving over time, you will want an insertion sort, since it works well on nearly sorted data and is an online algorithm.
If modifications are rare, I'd implement some SortedByValHashMap<K,V> extends HashMap<K,V> (similar to LinkedHashMap) that keeps the entries ordered by value.
I am trying to find ways to reduce code repetition. As in the code below, I have multiple Map elements in my class. Please suggest something if you can.
public class MyObject {
    private Map<?, ?> map1;
    private Map<?, ?> map2;
    private Map<?, ?> map3;
    // setters/getters omitted
}
void importDatabase(MyObject myObject) {
    // here I have data in the map1, map2, map3 elements
    importMap(myObject.getMap1());
    importMap(myObject.getMap2());
    importMap(myObject.getMap3());
}
void importMap(Map<?, ?> map) {
    // want to insert map objects into the database in one go, instead of creating 3 methods, one for each type
    Iterator<?> iterator = map.values().iterator();
    while (iterator.hasNext()) {
        Object value = iterator.next(); // insert value into the database here
    }
}
If you have many similar objects of the same type (maps in your case) and perform the same operations on all of them (put them in a database), a simple and usually viable way to avoid repetition is to store them in an array or in an ArrayList (if you don't know in advance how many there are) and perform the operations in a for or foreach loop.
With three elements it isn't always worth the effort and can actually make the code less readable than before. This will only make your code shorter and not improve performance (I assumed this is what you meant by "at one go").
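A small sketch of that idea, under stated assumptions: MapImporter, importAll and the imported list are hypothetical names, and the list merely stands in for the real database insert, which the question does not show.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class MapImporter {
    // Stand-in for the database: collects every value that would be inserted.
    final List<Object> imported = new ArrayList<>();

    // One method handles any of the maps.
    void importMap(Map<String, Object> map) {
        for (Object value : map.values()) {
            imported.add(value);
        }
    }

    // The same operation is applied to every map in a loop instead of one call per field.
    void importAll(List<Map<String, Object>> maps) {
        for (Map<String, Object> map : maps) {
            importMap(map);
        }
    }
}
```

A caller could pass Arrays.asList(map1, map2, map3) to importAll, so adding a fourth map later means changing only that one list.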
I am using ArrayMap for the first time in my project and I thought it worked just like an array. I expected that when I use the .put method, it inserts the element at the next index.
But in my case this is not true - after I added all elements one by one the first element I added ended up at index 4 which is kind of strange.
Here are the first three steps which I add elements:
1 - Salads:
2 - Soups:
3 - Appetizers:
So somehow on second step "Soup" element was inserted in index 0 instead of 1 as I was expecting, but strangely on third step "Appetizers" was inserted as expected after "Soup".
This is the code I am using to push key and value pair:
public ArrayMap<String, DMType> addElement(String typeKey, DMType type) {
ArrayMap<String, DMType> types = new ArrayMap<>();
types.put(typeKey, type);
return types;
}
Am I missing something about the behavior of ArrayMap?
Yeah, the name is misleading, but unlike arrays, ArrayMap does not guarantee order.
ArrayMap is a generic key->value mapping data structure that is
designed to be more memory efficient than a traditional HashMap.
ArrayMap is actually a Map:
public class ArrayMap extends SimpleArrayMap implements Map
If you want the Map functionality with order guaranteed, use LinkedHashMap instead.
LinkedHashMap defines the iteration ordering, which is normally the
order in which keys were inserted into the map (insertion-order).
documentation
I thought it works just like an array
No, it works like a map, because it is a map. It is similar to a HashMap, but more memory efficient for smaller data sets.
Its order shouldn't and doesn't matter. Under the hood, it is implemented using an array, which has an order since arrays do. This inherently gives the ArrayMap an order, but that is not part of its API anyway. Just like which memory slot your Java objects are in, you shouldn't care about the order here either.
It doesn't work as an array. The name ends in Map, and the documentation clearly states that it behaves as a generic key->value mapping that is more memory efficient than the traditional HashMap implementation.
Actually I don't see why you care how the order compares to the insertion order. The data is private inside the class, and although ArrayMap exposes keyAt(index) and valueAt(index), those indices reflect the internal layout rather than the insertion order, so you are basically wondering about a private implementation detail which is irrelevant for its usage.
If you really want to understand how it stores its data you should take a look at the source code.
ArrayMap does NOT work like an Array, instead, it works like a HashMap with performance optimizations.
The internal sequence of the key-value pair is not guaranteed as it is NOT part of the contract.
In your case, what you really want to use is probably an ArrayList<Element>, where the Element class is defined like this:
public class Element{
private final String typeKey;
private final DMType type;
public Element(String typeKey, DMType type){
this.typeKey = typeKey;
this.type = type;
}
}
If you don't want a new Class just to store the result, and you want to keep the sequence, you can use a LinkedHashMap<String, DMType>. As the document specifies:
Class LinkedHashMap
Hash table and linked list implementation of the Map interface, with predictable iteration order. This implementation differs from HashMap in that it maintains a doubly-linked list running through all of its entries. This linked list defines the iteration ordering, which is normally the order in which keys were inserted into the map (insertion-order). Note that insertion order is not affected if a key is re-inserted into the map. (A key k is reinserted into a map m if m.put(k, v) is invoked when m.containsKey(k) would return true immediately prior to the invocation.)
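To make the quoted guarantee concrete, here is a tiny sketch. The String values are placeholders because DMType is not shown in the question, and the class name MenuOrder is made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MenuOrder {
    // Returns the keys in iteration order after three inserts and one re-insert.
    static List<String> keysInOrder() {
        Map<String, String> types = new LinkedHashMap<>();
        types.put("Salads", "type 1");
        types.put("Soups", "type 2");
        types.put("Appetizers", "type 3");
        // Re-inserting an existing key replaces the value but keeps its position.
        types.put("Salads", "type 1 (updated)");
        return new ArrayList<>(types.keySet());
    }
}
```

Iterating the key set yields Salads, Soups, Appetizers, which is the order the question expected from ArrayMap.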
Here is my problem (simplified):
Suppose we have a class:
public class MyClass{
String name;
Double amount;
String otherAttribute;
}
And a List<MyClass> myList
Suppose we have 2 elements from myList. Let's say object1 and object2
What I would like to do is:
if (object1.name.equals(object2.name)) {
//add amount of object2 to object1
//remove object 2 from the list
}
Consider that I have a large list (maybe 100 elements); I would like to find the best and least costly way to do this.
What would you suggest ?
EDIT:
Yes, 100 items is not large, but I would call this method (of merging similar objects) many times for many lists of different sizes. That's why I would like to find the best practice for this.
I can't override equals or hashCode methods of MyClass, unfortunately (client requirement)
I'd add the objects to a HashMap where the name is the key and MyClass is the value being stored. Loop through each object in your list to add them to the map. If the name isn't in the map, just add the name, object pair. If it is already in the map, add the amount to the object already stored. When the loop completes, extract the objects from the map.
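A self-contained sketch of this approach follows. It assumes direct field access (the question's MyClass shows no accessors), adds a two-argument constructor purely for the demo, and uses a LinkedHashMap so the first occurrence of each name keeps its position; Merger and mergeByName are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MyClass {
    String name;
    Double amount;
    String otherAttribute;

    // Constructor added only to keep the example short.
    MyClass(String name, Double amount) {
        this.name = name;
        this.amount = amount;
    }
}

class Merger {
    // Merges elements that share a name by summing their amounts into the first one seen.
    static List<MyClass> mergeByName(List<MyClass> items) {
        Map<String, MyClass> byName = new LinkedHashMap<>();
        for (MyClass item : items) {
            MyClass first = byName.get(item.name);
            if (first == null) {
                byName.put(item.name, item);
            } else {
                first.amount += item.amount;
            }
        }
        return new ArrayList<>(byName.values());
    }
}
```

Neither equals nor hashCode of MyClass is involved, since the map is keyed on the String name; note that the first element of each group is mutated in place.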
100 elements is a tiny size for a list, considering you're not going to repeat the operation some hundreds of thousands times. If it's the case, I'd consider creating a data structure indexing the list items by the search property (Map for instance), or ordering it if suitable and using an efficient search algorithm.
One approach (as suggested by Bill) would be to traverse the List, adding every element to a Map with the name property as key. You can take advantage of put's return value to know whether a name has previously been put into the map, and add the previously accumulated amount to the current element. Finally, you could use values() to get the List without duplicates.
For instance:
List<MyClass> l;
Map<String, MyClass> m = new HashMap<String, MyClass>();
for (MyClass elem : l) {
    // put returns the element previously stored under this name, or null
    MyClass oldElem = m.put(elem.getName(), elem);
    if (oldElem != null) {
        elem.setAmount(elem.getAmount() + oldElem.getAmount());
    }
}
l = new ArrayList<MyClass>(m.values());
If you need to preserve order in the list, consider using a LinkedHashMap.
Done by pairwise comparison, this is an O(n^2) problem: you need to compare each of the n elements to the n-1 others, which is a brute-force search.
If you used a HashMap however, you could check the map for an element before adding it to the Map which is an O(1) operation. It would look something like this:
HashMap<String, MyClass> map = new HashMap<String, MyClass>();
when you add an element:
if (map.get(obj1.name) != null) {
    MyClass obj2 = map.get(obj1.name);
    obj2.amount = obj2.amount + obj1.amount;
} else {
    map.put(obj1.name, obj1);
}
'Large' is relative, 100 items is definitely not large, imagine if you had to process a stream of 1.000.000 items/second. Then you would redefine large :D
In your example, I think it would be good to create a Set of your items' names. Looking up a name in a Java HashSet takes O(1) on average, so if an object's name already exists in the hash set, update the matching item in the list. An even better solution would be to create a HashMap, on which you could say e.g.
if (mymap.containsKey(thename)) {
    mymap.put(thename, newSum);
}
this being an example of how you could use it. Here's a link to get you started: http://java67.blogspot.gr/2013/02/10-examples-of-hashmap-in-java-programming-tutorial.html
I suggest to optimize (if possible) by not even doing the .add() to the list if an element with the same name exists. Using one of the hash based collections in combination with a proper equals() & hashCode() implementation based on MyClass.name should also give you somewhat good performance.
First, since you cannot override equals or hashCode, then you need to have the function that will do this functionality in the same package as your MyClass class, since no accessor methods are defined in MyClass
Second, try to have your items in a LinkedList, so that you can remove repeating elements from that list really quick without having to move around the other items.
Use a map to keep track of the amount that corresponds to a given name, while iterating the list, and removing repeating elements at the same time. In this way you don't have to create a new list.
List<MyClass> myClass_l;
Map<String, MyClass> nameMyClass_m = new HashMap<String, MyClass>();
for (Iterator<MyClass> iterator = myClass_l.iterator(); iterator.hasNext();) {
    MyClass m = iterator.next();
    if (nameMyClass_m.containsKey(m.name)) {
        // fold this duplicate into the first element seen with the same name
        MyClass firstClass = nameMyClass_m.get(m.name);
        firstClass.amount += m.amount;
        iterator.remove();
    }
    else {
        nameMyClass_m.put(m.name, m);
    }
}
By the time you have finished the loop, you will have the items you want in your original list.
I want to store some words and their occurrence times in a website, and I don't know which structure I should use.
Every time I add a word in the structure, it first checks if the word already exists, if yes, the occurrence times plus one, if not, add the word into the structure. Thus I can find an element very fast by using this structure. I guess I should use a hashtable or hashmap, right?
And I also want to get a sorted list, thus the structure can be ranked in a short time.
Forgot to mention, I am using Java to write it.
Thanks guys! :)
A HashMap seems like it would suit you well. If you need a thread-safe option, then go with ConcurrentHashMap.
For example:
Map<String, Integer> wordOccurenceMap = new HashMap<>();
"TreeMap provides guaranteed O(log n) lookup time (and insertion etc), whereas HashMap provides O(1) lookup time if the hash code disperses keys appropriately. Unless you need the entries to be sorted, I'd stick with HashMap." -part of Jon Skeet's answer in TreeMap or HashMap.
TreeMap is the better solution if you want both sorting and word counting; note that it keeps its keys (the words) in sorted order.
A custom trie can be more efficient, but it's not required unless you are modifying the words.
Define a HashMap with the word as the key and the counter as the value
Map<String,Integer> wordsCountMap = new HashMap<String,Integer>();
Then add the logic like this:
When you get a word, check for it in the map using the containsKey method
If the key (word) is found, fetch the value using get and increment it
If the key (word) is not found, use put to add the word as the key with a count of 1 as the value
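The steps above can be sketched as follows. This version assumes Java 8 or later, where Map.merge folds the containsKey/get/put sequence into one call; WordCounter and its method names are hypothetical, and the ranked list is produced on demand rather than kept sorted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WordCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    // Inserts the word with count 1, or adds 1 to its existing count.
    void add(String word) {
        counts.merge(word, 1, Integer::sum);
    }

    int count(String word) {
        return counts.getOrDefault(word, 0);
    }

    // Words ordered by descending count; ties are broken alphabetically.
    List<String> wordsByCount() {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> words = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            words.add(e.getKey());
        }
        return words;
    }
}
```

Counting stays O(1) per word on average, while each ranking request pays for one sort of the distinct words.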
So, you could use HashMap, but don't forget about multithreading: could this data structure be accessed from several threads? Also, you could use a tree map if the data has some hierarchy (e.g. if you rank it and sort it by time). You could also look through the Google Guava collections; they may be more suitable for you.
Any Map implementation will do. If changes are confined to a single thread, prefer HashMap; otherwise use ConcurrentHashMap for multithreading.
Remember to use a stemming library.
stemming library in java
For example, "working" and "work" are logically the same word.
Remember that Integer is immutable; see the example below.
Example:
Map<String, Integer> occurrence = new ConcurrentHashMap<String, Integer>();
synchronized void addWord(String word) { // may need to synchronize this method
    String stemmedWord = stem(word);
    Integer count = occurrence.get(stemmedWord);
    if (count == null) {
        count = 0;
    }
    count++;
    occurrence.put(stemmedWord, count);
    // the put above is necessary as Integer is immutable
}
Suppose I wish to check HashMap entry and then replace it:
if( check( hashMap.get(key) ) ) {
hashMap.put(key, newValue);
}
This causes the lookup inside HashMap to run twice: once for get and once for put. This looks inefficient. Is it possible to modify the value of an entry that has already been found?
UPDATE
I know I can write a wrapper, and I know there are obstacles to mutating an entry. But the question is WHY? Maybe HashMap remembers the last search to speed up a repeated one? Why are there no methods for such an operation?
EDIT: I've just discovered that you can modify the entry, via Map.Entry.setValue (and the HashMap implementation is mutable). It's a pain to get the entry for a particular key though, and I can't remember ever seeing anyone do this. You can get a set of the entries, but you can't get the entry for a single key, as far as I can tell.
There's one evil way of doing it - declare your own subclass of HashMap within the java.util package, and create a public method which just delegates to the package-private existing method:
package java.util;
// Please don't actually do this...
public class BadMap<K, V> extends HashMap<K, V> {
public Map.Entry<K, V> getEntryPublic(K key) {
return getEntry(key);
}
}
That's pretty nasty though.
You wouldn't normally modify the entry - but of course you can change data within the value, if that's a mutable type.
I very much doubt that this is actually a performance bottleneck though, unless you're doing this a heck of a lot. You should profile your application to prove to yourself that this is a real problem before you start trying to fine-tune something which is probably not an issue.
If it does turn out to be an issue, you could change (say) a Map<Integer, String> into a Map<Integer, AtomicReference<String>> and use the AtomicReference<T> as a simple mutable wrapper type.
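A minimal sketch of that wrapper idea: the map is searched once and the replacement happens inside the AtomicReference. The names MutableValues and replaceIf are hypothetical, and the Predicate stands in for the question's check(...).

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Predicate;

class MutableValues {
    // Replaces the wrapped value when the check passes; returns true if it was replaced.
    static boolean replaceIf(Map<Integer, AtomicReference<String>> map, Integer key,
                             Predicate<String> check, String newValue) {
        AtomicReference<String> ref = map.get(key); // the only map lookup
        if (ref != null && check.test(ref.get())) {
            ref.set(newValue); // mutates the wrapper in place, no put needed
            return true;
        }
        return false;
    }
}
```

The price is one extra object per entry and an extra dereference on every read, which is why profiling first is worthwhile.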
Too much information for a comment on your question. Check the documentation for HashMap.
This implementation provides constant-time performance for the basic
operations (get and put), assuming the hash function disperses the
elements properly among the buckets. Iteration over collection views
requires time proportional to the "capacity" of the HashMap instance
(the number of buckets) plus its size (the number of key-value
mappings). Thus, it's very important not to set the initial capacity
too high (or the load factor too low) if iteration performance is
important.
Constant time means that it always requires the same amount of time to do the get and put operations [O(1)]. The amount of time that is going to be required is going to be linear based on how many times you need to loop [O(n)].
You can change the entry if it is mutable. One example of where you might do this is
private final Map<String, List<String>> map = new LinkedHashMap<>();
public void put(String key, String value) {
List<String> list = map.get(key);
if (list == null)
map.put(key, list = new ArrayList<>());
list.add(value);
}
This allows you to update a value, but you can't find and replace a value in one operation.
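On Java 8 or later, Map.computeIfPresent comes close to a check-and-replace in a single call. The sketch below uses a made-up condition (negative values) in place of check(...); whether only one internal lookup happens depends on the Map implementation, so treat that part as an assumption to verify for your map class.

```java
import java.util.Map;

class SingleCallReplace {
    // Replaces the value for key with newValue only if the current value is negative.
    // newValue must be non-null: returning null from the function would remove the mapping.
    static void replaceIfNegative(Map<String, Integer> map, String key, Integer newValue) {
        map.computeIfPresent(key, (k, old) -> old < 0 ? newValue : old);
    }
}
```

If the key is absent the function is never invoked and the map is left unchanged.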
Take a look at trove ( http://trove4j.sourceforge.net/ ), their maps do have several methods that might be what you want:
adjustOrPut
putIfAbsent
I don't know how this is implemented internally, but I would guess that since trove is made to be highly performant, there will be only one lookup.