Search a Map for multiple keys in parallel - java

Given a Map<String, Collection<String>> with up to 1M items, I want to query that Map for 5K keys, and I'm unsure whether they are in the map or not.
Currently, I'm using a TreeMap and searching for each item, one by one, which seems sub-optimal. Is there an already implemented way to query a Map for X keys?
The result of the search should be the subset of items found in the Map, for further querying; ordering is irrelevant.
I was hoping to use a stream but, apparently, that's only for Collections.
Note: the numbers are impressions from what I've seen in the map, probably not the upper limit...

There is no better way than querying your map for each element:
List<V> vs = keysToSearch.stream()
    .map(k -> map.get(k))
    .filter(Objects::nonNull)
    .collect(Collectors.toList());
You can also try using a parallelStream if your data structures work in a concurrent environment.
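A minimal sketch of the parallelStream variant, assuming the map is not modified while the lookup runs (concurrent reads of an unmodified HashMap or TreeMap are safe) and that it holds no null values. The names ParallelLookup and lookupAll are made up for illustration.

```java
import java.util.Collection;
import java.util.Map;
import java.util.stream.Collectors;

final class ParallelLookup {
    /**
     * Returns the entries of {@code map} whose keys appear in {@code keys}.
     * Assumption: nobody writes to the map during the call, and no value is null
     * (Collectors.toMap rejects null values).
     */
    static <K, V> Map<K, V> lookupAll(Map<K, V> map, Collection<K> keys) {
        return keys.parallelStream()
                .distinct()                  // toMap would throw on duplicate keys
                .filter(map::containsKey)    // keep only keys present in the map
                .collect(Collectors.toMap(k -> k, map::get));
    }
}
```

For only 5K lookups the parallel version may well be slower than the sequential one because of the thread hand-off, so it is worth measuring before adopting it.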

Assuming memory is not a problem for you, here is one way of doing it, using retainAll:
Set<String> mapKeys = new HashSet<String>(myMap.keySet()); // copy, so the map itself is not modified
mapKeys.retainAll(my5kKeys); // mapKeys now holds only the keys that are also in my5kKeys

If you have M items in your map, and K keys you are searching for, then your best-case efficiency is O(min(M, K)). If M is very large, the best you can do is to check each of the K keys (perhaps in parallel, but you must check each one).
If it turned out that M was much smaller than K, then you could do better by only going through all M map keys to see whether they exist among the K keys. In any event, you want to check the smaller set's values against the larger.
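As a rough sketch of "check the smaller side against the larger", assuming the query keys are already in a Set with fast contains (the names KeyIntersection and presentKeys are hypothetical):

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class KeyIntersection {
    /** Returns the keys found in both the map and the query set, looping over the smaller one. */
    static <K, V> Set<K> presentKeys(Map<K, V> map, Set<K> queryKeys) {
        Set<K> found = new HashSet<>();
        if (queryKeys.size() <= map.size()) {
            // K <= M: probe the map once per query key
            for (K key : queryKeys) {
                if (map.containsKey(key)) {
                    found.add(key);
                }
            }
        } else {
            // M < K: probe the query set once per map key
            for (K key : map.keySet()) {
                if (queryKeys.contains(key)) {
                    found.add(key);
                }
            }
        }
        return found;
    }
}
```

With a HashMap and a HashSet each probe is O(1) on average, which gives the O(min(M, K)) behaviour described above.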

There is no better way than to create a loop and search for all the keys individually.
A method like retainAll is just a wrapper around such a loop written by somebody else.
However, the important thing is to use a HashMap instead of a TreeMap. A HashMap lookup (get or containsKey) is O(1) on average, while a TreeMap lookup takes O(log(n)).
If you need the sorted collection for something else, you could put the data in both a TreeMap and a HashMap.

Related

What thread safe java Data structure or custom implementation can let me get the position of a String in constant time

I need to get the position of a string in a List in constant time,
but I cannot find any such Java implementation.
Maybe there is no need for that, given current computers' memory sizes ;-)
LinkedHashMaps are interesting but do not maintain a position. (ref)
Any clue?
A custom implementation could keep a list and a map in parallel, with the map containing the strings as keys and their list indices as values, assuming each string occurs only once.
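A minimal sketch of that list-plus-map idea, assuming strings are only appended (never removed) and each string is stored once. The class name PositionIndex is made up, and plain synchronized methods are used as the simplest way to keep the two structures consistent across threads.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PositionIndex {
    private final List<String> items = new ArrayList<>();
    private final Map<String, Integer> positions = new HashMap<>();

    /** Appends the string if it is new; returns false if it was already present. */
    synchronized boolean add(String s) {
        if (positions.containsKey(s)) {
            return false;
        }
        positions.put(s, items.size()); // index the string will get in the list
        items.add(s);
        return true;
    }

    /** Position of the string in O(1) on average, or -1 if absent. */
    synchronized int indexOf(String s) {
        Integer pos = positions.get(s);
        return pos == null ? -1 : pos;
    }

    /** String stored at the given position. */
    synchronized String get(int index) {
        return items.get(index);
    }
}
```

Supporting removal would require renumbering every later position, which is why this sketch leaves it out.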
As @Reto Höhener reminded me, and as said in the ref of my question:
to keep original insertion order index positions, one can mirror the keys in the KeySet in something like an ArrayList, keep it in sync with updates to the HashMap and use it for finding the position.
For a List:
Filling an ordered ConcurrentMap like ConcurrentSkipListMap with a list of positions as values should do the trick.
Caffeine could also do the job but I still need to look at it.
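// Counter kept in an array so the lambda can mutate it; only safe on a sequential stream.
final int[] index = {0};
Stream<ComplexObject> t = someListOfComplexObject.stream();
ConcurrentMap<String, List<Integer>> m =
    t.collect(Collectors.groupingBy(
        e -> e.getComplexStringElem(),
        ConcurrentSkipListMap::new, // groupingBy takes the map factory before the downstream collector
        Collectors.mapping(
            e -> index[0]++,
            Collectors.toList())));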

A Collection that associates keys to values, and where both a key and a value can be obtained by index

By indexed I mean that keys and values can be accessed via an index representing the order in which they were inserted into the collection.
I need a collection that behaves like a Map<K, V>, but also like a read-only List<K> and a read-only List<V>. My naive implementation is to wrap a HashMap<K, V> and two ArrayLists, but that leads to massive data redundancy and poor performance. Then I thought about LinkedHashMap<K, V>, which would work a lot better in this case, but the getByIndex operations would not perform well, because they would require navigating the internal linked nodes. For small quantities of data that is perfectly acceptable, but I'm not sure how the list will be used by client code.
In short, is there something that suits my requirements better than these alternatives?
EDIT: If I had something like pointer arithmetic and low-level functions like memcpy and a runtime sizeof operator resolving the sizes of K and V, then maybe I could come up with a very efficient implementation. Are there any equivalents to any of that in Java?
I can suggest a few indirect ways.
You can create a HashMap<Integer, HashMap<K, V>>: insert into this map using the insertion order as the key and a HashMap holding the key-value pair as the value.
You can simply have a single ArrayList<K> and a HashMap<K, V>. For each entry put into the map, also add the key to the array list.
You can use (as you said in the question itself) a LinkedHashMap and iterate with its iterator or an enhanced for loop. Iterating this way is efficient, since each step does not traverse the entire list, but you can only iterate; you cannot fetch an entry at an arbitrary index.
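The second suggestion (one ArrayList<K> plus one HashMap<K, V>) could look roughly like the sketch below. The class name IndexedMap and its method names are hypothetical, removal is left out on purpose (it would be O(n) on the list), and the class is not thread-safe.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class IndexedMap<K, V> {
    private final List<K> keys = new ArrayList<>(); // keys in insertion order
    private final Map<K, V> map = new HashMap<>();

    /** Adds or replaces a mapping; a replaced key keeps its original index. */
    V put(K key, V value) {
        if (!map.containsKey(key)) {
            keys.add(key);
        }
        return map.put(key, value);
    }

    V get(K key) {
        return map.get(key);
    }

    K keyAt(int index) {
        return keys.get(index);
    }

    V valueAt(int index) {
        return map.get(keys.get(index));
    }

    /** Read-only view of the keys in insertion order. */
    List<K> keyList() {
        return Collections.unmodifiableList(keys);
    }

    int size() {
        return keys.size();
    }
}
```

Only the key references are stored twice; the values live in the map alone, so the redundancy is smaller than with two full lists.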
If third-party libraries are fair game, Guava's ImmutableMap does this nicely if you don't need mutation. Once it's created, you can use map.entrySet().asList(), map.keySet().asList(), and map.values().asList() to get, in O(1), random-access lists of the entries, keys, and values that support get(index) in O(1).

Which data structure should I use?

I want to store some words and their occurrence counts on a website, and I don't know which structure I should use.
Every time I add a word to the structure, it first checks whether the word already exists: if yes, the occurrence count is incremented by one; if not, the word is added to the structure. So I need to be able to find an element very fast in this structure. I guess I should use a hashtable or hashmap, right?
I also want to get a sorted list, so the structure should be rankable in a short time.
Forgot to mention, I am using Java to write it.
Thanks guys! :)
A HashMap seems like it would suit you well. If you need a thread-safe option, then go with ConcurrentHashMap.
For example:
Map<String, Integer> wordOccurenceMap = new HashMap<>();
"TreeMap provides guaranteed O(log n) lookup time (and insertion etc), whereas HashMap provides O(1) lookup time if the hash code disperses keys appropriately. Unless you need the entries to be sorted, I'd stick with HashMap." -part of Jon Skeet's answer in TreeMap or HashMap.
TreeMap is the better solution if you want both sorting functionality and word counting.
A custom trie can be more efficient, but it's not required unless you are modifying the words.
Define a HashMap with the word as the key and the counter as the value:
Map<String,Integer> wordsCountMap = new HashMap<String,Integer>();
Then add the logic like this:
When you get a word, check for it in the map using the containsKey method.
If the key (word) is found, fetch the value using get, increment it, and put it back.
If the key (word) is not found, use put to add the word as the key with a count of 1 as the value.
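The steps above can be sketched as follows. Map.merge does the check, increment and insert in one call, and the ranking is produced on demand by sorting the entries. The names WordCounter and wordsByFrequency are made up for illustration, and the class is single-threaded as written.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class WordCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    /** Inserts the word with count 1, or adds 1 to its existing count. */
    void add(String word) {
        counts.merge(word, 1, Integer::sum);
    }

    /** Current count for the word, 0 if it was never added. */
    int count(String word) {
        return counts.getOrDefault(word, 0);
    }

    /** Words ordered by descending count; ties are broken alphabetically. */
    List<String> wordsByFrequency() {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Sorting costs O(n log n) each time it is requested, which is usually fine if the ranking is needed far less often than words are added.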
So, you could use a HashMap, but don't forget about multithreading: could this data structure be accessed by several threads? Also, you could use a tree map in case the data has some hierarchy (e.g. for ranking, or sorting it by time). You could also look through the Google Guava collections; they may be more suitable for you.
Any Map implementation will do. If changes are local to one thread, prefer HashMap; otherwise use
ConcurrentHashMap for multithreading.
Remember to use a stemming library:
stemming library in java
For example, "working" and "work" are logically the same word.
Remember that Integer is immutable; see the example below.
Example:
Map<String, Integer> occurrence = new ConcurrentHashMap<String, Integer>();
synchronized void addWord(String word) { // may need to synchronize this method
    String stemmedWord = stem(word);
    Integer count = occurrence.get(stemmedWord);
    if (count == null) {
        count = 0;
    }
    count++;
    // Integer is immutable, so the incremented value must be put back into the map
    occurrence.put(stemmedWord, count);
}

java concurrent map sorted by value

I'm looking for a way to have a concurrent map or similar key->value storage that can be sorted by value and not by key.
So far I have looked at ConcurrentSkipListMap, but I couldn't find a way to sort it by value (using a Comparator), since the compare method receives only the keys as parameters.
The map has String keys and Integer values. What I'm looking for is a way to retrieve the key with the smallest value (integer).
I was also thinking about using two maps: a separate map with Integer keys and String values would give me a map sorted by integer as I wanted. However, more than one key can have the same integer value, which could lead to more problems.
Example
"user1"=>3
"user2"=>1
"user3"=>3
sorted list:
"user2"=>1
"user1"=>3
"user3"=>3
Is there a way to do this, or are there any 3rd party libraries that can do this?
Thanks
To sort by value where multiple keys can map to the same value, you need a MultiMap. This needs to be synchronized, as there is no concurrent version.
This doesn't mean the performance will be poor, as that depends on how often you call this data structure; e.g. it could add up to 1 microsecond.
I recently had to do this and ended up using a ConcurrentSkipListMap where the keys contain a string and an integer, following the answer to the question linked below. The core insight is that you can structure your code to allow for a duplicate of a key with a different value before removing the previous one.
Atomic way to reorder keys in a ConcurrentSkipListMap / ConcurrentSkipListSet?
The problem was to keep a dynamic set of strings associated with integers that could change concurrently from different threads, as described in the question below. It sounds very similar to what you wanted to do.
Is there an embeddable Java alternative to Redis?
Here's the code for my implementation:
https://github.com/HarvardEconCS/TurkServer/blob/master/turkserver/src/main/java/edu/harvard/econcs/turkserver/util/UserItemMatcher.java
The principle of a ConcurrentMap is that it can be accessed concurrently - if you want it sorted at any time, performance will suffer significantly as that map would need to be fully synchronized (like a hashtable), resulting in poor throughput.
So I think your best bet is to return a sorted view of your map by putting all elements in an unmodifiable TreeMap for example (although sorting a TreeMap by values needs a bit of tweaking).
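A minimal sketch of the "sorted view" idea, using a sorted list of copied entries rather than a TreeMap so that duplicate values are not a problem. The names ValueSortedSnapshot and sortedByValue are hypothetical, and the result is a point-in-time snapshot that does not follow later updates to the source map.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ValueSortedSnapshot {
    /** Copies the entries and sorts them by ascending value, breaking ties by key. */
    static List<Map.Entry<String, Integer>> sortedByValue(Map<String, Integer> source) {
        List<Map.Entry<String, Integer>> snapshot = new ArrayList<>();
        for (Map.Entry<String, Integer> e : source.entrySet()) {
            // copy each entry so later changes to the map do not affect the snapshot
            snapshot.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        snapshot.sort(Map.Entry.<String, Integer>comparingByValue()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return snapshot;
    }
}
```

Called on a ConcurrentHashMap holding the question's data (user1=>3, user2=>1, user3=>3), the list comes back in the order user2, user1, user3, so the key with the smallest value is the first element.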

Java HashMap adds value to the head of the list

I was working with Java HashMaps and found that it adds values to the head of the list. For example,
hm.put("mike", 2);
hm.put("andrew", 3);
Now, if I print the HashMap using an iterator, I get
andrew 3
mike 2
I want the items to be added in FIFO fashion rather than LIFO fashion... Is there a way to do it?
The Map abstraction in Java does not play well with notions of LIFO or FIFO. These concepts primarily apply to ordered sequences, while Maps, in order to maximize efficiency, store their entries in an ordering that is entirely independent of the order in which the values were inserted. For example, the HashMap uses hashing to store its values, and the more randomly the hash function distributes its values the better the performance. Similarly, the TreeMap uses a balanced binary search tree, which stores its values in sorted order to guarantee fast lookups.
However, Java does have a really cool class called the LinkedHashMap that I believe is exactly what you're looking for. It gives the speed of a HashMap while guaranteeing a predictable traversal order which is defined by the order in which you insert the elements.
Hope this helps!
Try using a LinkedHashMap instead. I don't think HashMaps guarantee order.
LinkedHashMap<String, String> lHashMap = new LinkedHashMap<String, String>();
lHashMap.put("1", "One");
lHashMap.put("2", "Two");
lHashMap.put("3", "Three");
// values() iterates in insertion order for a LinkedHashMap
for (String value : lHashMap.values()) {
    System.out.println(value);
}
Output:
One
Two
Three
Do you want to use a Queue?
http://download.oracle.com/javase/6/docs/api/java/util/Queue.html
HashMaps are not ordered; the fact that you are getting the entries back from the iterator in the 'wrong' order is just a function of how the keys are hashed.
How specifically do you want to use this data structure?
