Which data structure for this HashMap scenario - Java

I have a scenario where I store values in a HashMap.
Keys are strings like
fruits
fruits_citrus_orange
fruits_citrus_lemon
fruits_fleshly_apple
fruits_fleshly
fruits_dry
and so on.
Values are some objects. Now, for a given input, say fruits_fleshly, I need to retrieve all entries whose key starts with "fruits_fleshly".
In the above case I need to fetch
fruits_fleshly_apple
fruits_fleshly
One way to do this is to call String.indexOf on every key. Is there a more efficient way that avoids iterating over all the keys in the map?

Although these are strings, to me they look like categories and sub-categories, such as fruit, fruit-fleshly, fruit-citrus and so on.
If that is the case, you can implement a tree data structure instead. This would be the most effective for search operations.
Since a tree has a parent-child structure, there is a root node and there are child nodes. You can have a structure like this:
(0)   (1)         (2)
fruit
|_____citrus
|     |_____lemon
|     |_____orange
|
|_____fleshly
      |_____apple
      |_____
In this structure, if you want to search for citrus fruits, you can go to the citrus node and list all its children. Finally, you can construct each full name by concatenating the node names along the path from the root to the leaf.
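A minimal sketch of such a tree, assuming keys are split on "_" and only the keys themselves are stored (a value field could be added to each node). The names CategoryNode, insert and keysUnder are made up for illustration. Note that it matches whole segments only, so "fruits_fl" would not match anything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical category tree: one node per underscore-separated segment. */
final class CategoryNode {
    // Children sorted by segment name so results come back in a stable order.
    private final Map<String, CategoryNode> children = new TreeMap<>();
    private boolean isKey; // true if a complete key ends at this node

    /** Inserts a key such as "fruits_citrus_lemon", one level per segment. */
    void insert(String key) {
        CategoryNode node = this;
        for (String segment : key.split("_")) {
            node = node.children.computeIfAbsent(segment, s -> new CategoryNode());
        }
        node.isKey = true;
    }

    /** Returns every stored key that equals the given path or lies below it. */
    List<String> keysUnder(String path) {
        CategoryNode node = this;
        for (String segment : path.split("_")) {
            node = node.children.get(segment);
            if (node == null) {
                return new ArrayList<>(); // no such category
            }
        }
        List<String> result = new ArrayList<>();
        node.collect(path, result);
        return result;
    }

    // Depth-first walk that rebuilds full names from the path so far.
    private void collect(String pathSoFar, List<String> out) {
        if (isKey) {
            out.add(pathSoFar);
        }
        for (Map.Entry<String, CategoryNode> e : children.entrySet()) {
            e.getValue().collect(pathSoFar + "_" + e.getKey(), out);
        }
    }
}

final class CategoryTreeDemo {
    public static void main(String[] args) {
        CategoryNode root = new CategoryNode();
        String[] keys = {"fruits", "fruits_citrus_orange", "fruits_citrus_lemon",
                         "fruits_fleshly_apple", "fruits_fleshly", "fruits_dry"};
        for (String key : keys) {
            root.insert(key);
        }
        System.out.println(root.keysUnder("fruits_fleshly"));
        // prints "[fruits_fleshly, fruits_fleshly_apple]"
    }
}
```

The lookup walks one node per segment of the input and then visits only the matching subtree, so unrelated keys are never touched.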

Iterating over the map seems a quite simple and straightforward way of doing this. However, since you don't want to iterate over the keys yourself, you can use Guava's Maps#filterKeys, if you are OK with using a third-party library.
Here's how it would work:
Map<String, Object> filtered = Maps.filterKeys(
    yourMap,
    Predicates.containsPattern("^fruits_fleshly"));
But that also iterates over the map behind the scenes. So the iteration is still there, if you are concerned about efficiency.

Since HashMap doesn't maintain any order for its keys, it's not a very good choice for this problem. A better choice is TreeMap: it has methods for retrieving a sub-map for a range of keys. Locating the range takes O(log n) time (n being the number of entries), so it's better than iterating over all the keys. For example, with myMap declared as a TreeMap<String, Object>:
NavigableMap<String, Object> subMap = myMap.subMap("fruits_fleshly", true, "fruits_fleshly\uffff", true);
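A small self-contained sketch of the same range trick, wrapped in a hypothetical helper called withPrefix. It assumes no key contains the character \uffff after the prefix; keys that do would fall outside the range.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

final class PrefixQuery {
    /**
     * Returns a live view of the entries whose key starts with the prefix.
     * Assumption: keys never contain '\uffff' right after the prefix.
     */
    static <V> NavigableMap<String, V> withPrefix(NavigableMap<String, V> map, String prefix) {
        // Lower bound: the prefix itself. Upper bound: prefix + largest char value.
        return map.subMap(prefix, true, prefix + Character.MAX_VALUE, true);
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> map = new TreeMap<>();
        map.put("fruits", 1);
        map.put("fruits_citrus_orange", 2);
        map.put("fruits_citrus_lemon", 3);
        map.put("fruits_fleshly_apple", 4);
        map.put("fruits_fleshly", 5);
        map.put("fruits_dry", 6);

        System.out.println(withPrefix(map, "fruits_fleshly").keySet());
        // prints "[fruits_fleshly, fruits_fleshly_apple]"
    }
}
```

The returned map is a view backed by the original TreeMap, so nothing is copied; iterating it costs time proportional to the number of matches.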

The nature of a hashmap means that there's no way to do a "like" comparison on keys - you have to iterate over them all to find where key.startsWith(input).
I suppose you could nest hashmaps and split up your keys. E.g.,
{
  "fruits": {
    "citrus": {
      "orange": (value),
      "lemon": (value)
    },
    "fleshly": {
      "apple": (value),
      "": (value)
    }
  }
}
...etc.
The performance overhead is probably horrific on a small scale, though that may not matter in a homework context, and it may not be so bad if you're dealing with a lot of data and only a couple of layers of nesting.
Alternatively, create a Category object with a List of Categories (sub-categories) and a List of entries.

I believe a radix trie is what you are looking for. It is a similar idea to @ay89's solution.
You can just use this open-source library (Radix Trie example). It performs better than O(log(N)): with a decent radix trie implementation, finding the hashmap assigned to a key takes time that depends on the length of the search key (roughly, the number of underscores in it) rather than on the number of entries. Given the keys
fruits
fruits_citrus_orange
fruits_citrus_lemon
fruits_fleshly_apple
fruits_fleshly
fruits_dry
the usage looks like this:
Trie<String, Map> trie = new PatriciaTrie<>();
trie.put("fruits", hashmap1);
trie.put("fruits_citrus_orange", hashmap2);
trie.put("fruits_citrus_lemon", hashmap3);
trie.put("fruits_fleshly_apple", hashmap4);
trie.put("fruits_fleshly", hashmap5);
Map.Entry<String, Map> entry = trie.select("fruits_fleshly");
If you just want one hashmap to be returned by select, you might get slightly better performance by implementing your own radix trie.
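For illustration only, here is a plain character trie written from scratch. It is not a radix/Patricia trie (no path compression) and it is not the API of the library mentioned above; SimpleTrie and prefixMap are hypothetical names. It does show the core idea: walk down the prefix one character at a time, then collect everything below that node.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical uncompressed trie; one node per character. */
final class SimpleTrie<V> {
    private static final class Node<V> {
        final Map<Character, Node<V>> next = new HashMap<>();
        V value;
        boolean terminal; // true if a stored key ends here
    }

    private final Node<V> root = new Node<V>();

    void put(String key, V value) {
        Node<V> node = root;
        for (char c : key.toCharArray()) {
            node = node.next.computeIfAbsent(c, k -> new Node<V>());
        }
        node.value = value;
        node.terminal = true;
    }

    /** Returns all entries whose key starts with the prefix, sorted by key. */
    Map<String, V> prefixMap(String prefix) {
        Map<String, V> out = new TreeMap<>();
        Node<V> node = root;
        for (char c : prefix.toCharArray()) {
            node = node.next.get(c);
            if (node == null) {
                return out; // nothing stored under this prefix
            }
        }
        collect(node, new StringBuilder(prefix), out);
        return out;
    }

    // Depth-first walk below the prefix node, rebuilding keys as it goes.
    private void collect(Node<V> node, StringBuilder path, Map<String, V> out) {
        if (node.terminal) {
            out.put(path.toString(), node.value);
        }
        for (Map.Entry<Character, Node<V>> e : node.next.entrySet()) {
            path.append(e.getKey().charValue());
            collect(e.getValue(), path, out);
            path.setLength(path.length() - 1); // backtrack
        }
    }
}

final class SimpleTrieDemo {
    public static void main(String[] args) {
        SimpleTrie<Integer> trie = new SimpleTrie<>();
        trie.put("fruits", 1);
        trie.put("fruits_citrus_orange", 2);
        trie.put("fruits_citrus_lemon", 3);
        trie.put("fruits_fleshly_apple", 4);
        trie.put("fruits_fleshly", 5);
        System.out.println(trie.prefixMap("fruits_fl").keySet());
        // prints "[fruits_fleshly, fruits_fleshly_apple]"
    }
}
```

Unlike a segment-based tree, this also handles prefixes that end in the middle of a segment, at the cost of one node per character.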

Related

Building a Java TreeMap from a sorted list of elements

I have a text file containing a sorted list of words being my dictionary.
I would like to use a TreeMap in order to have log(n) as the average cost when I have to check whether a word belongs to the dictionary or not (that is, containsKey).
I have read that a red-black tree is behind the scenes of the TreeMap, so it is self-balancing.
My question is: what is the best way to feed the TreeMap with the list of words?
I mean: feeding it a sorted list should be the worst-case scenario for a binary tree, because it has to rebalance on almost every other word, doesn't it?
The list of words can vary from 7K to 150K in number.
TreeMap hides its implementation details, as good OO design prescribes, so to really optimize for your use case will probably be hard.
However, if it is an option to read all items into an array/list before adding them to your TreeMap, you can add them "inside out": the middle element of the list will become the root, so add it first, and then recursively add the first half and second half in the same manner. In fact, this is the strategy that the TreeMap(SortedMap) constructor follows.
If it is not an option to read all items, I think you have no other option than to simply put your entries to the map consecutively, or write your own tree implementation so that you have more control over how to generate it. If you at least know the number of items beforehand, you should be able to generate a balanced tree without ever having to rebalance.
If you do not need the extra features of a TreeMap, you might also consider using a HashMap, which (given a good hash function for your keys) even has O(1) access.
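The "inside out" insertion order described above could be sketched like this. BalancedFill and addMiddleFirst are hypothetical names, and the word list is just sample data. Keep in mind that TreeMap is a red-black tree and rebalances itself whatever the insertion order, so plain sequential insertion is also O(n log n) overall; whether this ordering is measurably faster would need profiling.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

final class BalancedFill {
    /**
     * Adds sorted[lo, hi) to the map: the middle element first,
     * then the left half and the right half in the same manner.
     */
    static void addMiddleFirst(List<String> sorted, int lo, int hi,
                               TreeMap<String, Boolean> map) {
        if (lo >= hi) {
            return; // empty range
        }
        int mid = (lo + hi) >>> 1;
        map.put(sorted.get(mid), Boolean.TRUE);
        addMiddleFirst(sorted, lo, mid, map);
        addMiddleFirst(sorted, mid + 1, hi, map);
    }

    public static void main(String[] args) {
        // Stand-in for the sorted dictionary file.
        List<String> words = Arrays.asList("ant", "bee", "cat", "dog", "eel", "fox", "gnu");
        TreeMap<String, Boolean> dictionary = new TreeMap<>();
        addMiddleFirst(words, 0, words.size(), dictionary);
        System.out.println(dictionary.containsKey("eel")); // prints "true"
        System.out.println(dictionary.size());             // prints "7"
    }
}
```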

Data structure in Java that supports quick search and remove in array with duplicates

More specifically, suppose I have an array with duplicates:
{3,2,3,4,2,2,1,4}
I want to have a data structure that supports search and remove the first occurrence of some value faster than O(n), say if the value is 4, then it becomes:
{3,2,3,2,2,1,4}
I also need to iterate the list from head according to the same order. Other operations like get(index) or insert are not needed.
You can use O(n) time to record the original data(say it's an int[]) in your data structure, I just need the later search and remove faster than O(n).
"Search and remove" is considered as ONE operation as shown above.
If I have to make it myself, I would use a LinkedList to store the data, and HashMap to map every key to a list of all occurrence of nodes together with their previous and next ones.
Is it a right approach? Are there any better choices already there in Java?
The data structure you describe, essentially a hybrid linked list and map, I think is the most efficient way of handling your stated problem. You'll have to keep track of the nodes yourself, since Java's LinkedList doesn't provide access to the actual nodes. The AbstractSequentialList may be helpful here.
The index structure you'll need is a map from an element value to the appearances of that element in the list. I recommend a hash table from hashCode % modulus to a linked list of (value, list of main-list nodes).
Note that this approach is still O(n) in the worst case, when you have universal hash collisions; this applies whether you use open or closed hashing. In the average case it should be something closer to O(ln(n)), but I'm not prepared to prove that.
Consider also whether the overhead of keeping track of all of this is really worth the gains. Unless you've actually profiled running code and determined that a LinkedList is causing problems because remove is O(n), stick with that until you do.
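A rough sketch of the hybrid structure, assuming int values and that elements are only appended while building and only removed afterwards (which matches the stated requirements). IndexedList and its method names are invented for this example. It uses java.util.HashMap for the index instead of a hand-rolled hash table.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical doubly linked list with a value -> nodes index. */
final class IndexedList {
    private static final class Node {
        final int value;
        Node prev, next;
        Node(int value) { this.value = value; }
    }

    private Node head, tail;
    // For each value, its nodes in list order (earliest occurrence first).
    private final Map<Integer, Deque<Node>> index = new HashMap<>();

    /** Builds the list and the index in O(n). */
    IndexedList(int[] data) {
        for (int v : data) {
            Node n = new Node(v);
            if (tail == null) {
                head = n;
            } else {
                tail.next = n;
                n.prev = tail;
            }
            tail = n;
            index.computeIfAbsent(v, k -> new ArrayDeque<>()).addLast(n);
        }
    }

    /** Removes the first occurrence of value; expected O(1). */
    boolean removeFirst(int value) {
        Deque<Node> nodes = index.get(value);
        if (nodes == null) {
            return false; // value not present
        }
        Node n = nodes.pollFirst();
        if (nodes.isEmpty()) {
            index.remove(value);
        }
        // Unlink the node from the list.
        if (n.prev == null) head = n.next; else n.prev.next = n.next;
        if (n.next == null) tail = n.prev; else n.next.prev = n.prev;
        return true;
    }

    /** Iterates from the head in the original order. */
    List<Integer> toList() {
        List<Integer> out = new ArrayList<>();
        for (Node n = head; n != null; n = n.next) {
            out.add(n.value);
        }
        return out;
    }

    public static void main(String[] args) {
        IndexedList list = new IndexedList(new int[] {3, 2, 3, 4, 2, 2, 1, 4});
        list.removeFirst(4);
        System.out.println(list.toList()); // prints "[3, 2, 3, 2, 2, 1, 4]"
    }
}
```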
Since your requirement is that the first occurrence of the element is removed and the remaining occurrences are retained, a plain array or list on its own cannot do this faster than O(n), because you have to scan from the head until you reach the first occurrence. Going faster requires an auxiliary index from values to positions. There is no standard class in the java.* packages that does this.

most efficient Java data structure for searching triples of strings

Suppose I have a large list (around 10,000 entries) of string triples as such:
car noun yes
dog noun no
effect noun yes
effect verb no
Suppose I am presented with a string double - for example, (effect, verb) - and I need to quickly look in the list to see if the pair appears and, if it does, whether its value is yes or no. (For this example the double does appear and the value is "no".)
What is the best data structure in Java to store the list and the most efficient way to perform the search? I am running hundreds of thousands of these searches so speed is of the essence.
Thanks!
You might consider using a HashMap<YourDouble, String>. Searches will be O(1).
You could either create an object, YourDouble which holds the first two values, or else append one to the other -- if values will still be unique -- and use HashMap<String, String>.
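A sketch of the key-object variant, assuming a small immutable class (here called WordTag, a made-up name) with equals and hashCode defined over both fields. Unlike plain concatenation, it cannot confuse ("ab", "c") with ("a", "bc").

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical composite key holding the first two columns. */
final class WordTag {
    final String word;
    final String tag;

    WordTag(String word, String tag) {
        this.word = word;
        this.tag = tag;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof WordTag)) return false;
        WordTag other = (WordTag) o;
        return word.equals(other.word) && tag.equals(other.tag);
    }

    @Override
    public int hashCode() {
        return Objects.hash(word, tag);
    }
}

final class TripleLookupDemo {
    public static void main(String[] args) {
        Map<WordTag, String> table = new HashMap<>();
        table.put(new WordTag("car", "noun"), "yes");
        table.put(new WordTag("dog", "noun"), "no");
        table.put(new WordTag("effect", "noun"), "yes");
        table.put(new WordTag("effect", "verb"), "no");

        System.out.println(table.get(new WordTag("effect", "verb"))); // prints "no"
        System.out.println(table.get(new WordTag("cat", "noun")));    // prints "null"
    }
}
```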
I would create a HashMultimap for each type of search you want, e.g. "all three", "each pair", and "each single field". When you build the list, populate all the different maps, then you can fetch from whichever map is appropriate for your query.
(The downside is that you'll need a type for at least each arity, e.g. use just String for the "single field" maps, but a Pair for the two-field maps, and a Triple for the three-field map.)
You could use a HashMap where the key is the concatenation of the first two strings, the ones which you'll use for lookups, and the value is a Boolean, representing the yes and no strings.
Alternatively, it seems the words in the second column would be fewer, since they represent categories. You could have a HashMap<String, HashMap<String, Boolean>> where you first index by e.g. "noun", "verb" etc. and then you index by e.g. "car", "dog", "effect", to get to your boolean. This would probably be more space-efficient.
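The two-level variant might look roughly like this; NestedLookup, add and lookup are names invented for the sketch. A lookup returns null when the pair is not in the list, which keeps "absent" distinct from "no".

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical two-level index: category first, then word. */
final class NestedLookup {
    private final Map<String, Map<String, Boolean>> byTag = new HashMap<>();

    void add(String word, String tag, boolean value) {
        // Create the inner map for this category on first use.
        byTag.computeIfAbsent(tag, t -> new HashMap<>()).put(word, value);
    }

    /** Returns TRUE/FALSE for a known pair, or null if the pair is absent. */
    Boolean lookup(String word, String tag) {
        Map<String, Boolean> words = byTag.get(tag);
        return words == null ? null : words.get(word);
    }

    public static void main(String[] args) {
        NestedLookup table = new NestedLookup();
        table.add("car", "noun", true);
        table.add("dog", "noun", false);
        table.add("effect", "noun", true);
        table.add("effect", "verb", false);

        System.out.println(table.lookup("effect", "verb")); // prints "false"
        System.out.println(table.lookup("cat", "noun"));    // prints "null"
    }
}
```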
10k doesn't seem that large to me. Have you tried a DB?
The place to look for information like this is the Semantic Web. A number of projects work on Triple Stores of just this type. There's a list at the bottom of the Triple Store page of implementations.
As far as Java is concerned, your algorithms are almost certainly going to be language-independent, and if you find a good algorithm implemented in C, its Java port will also be fast.
Also, what does your data set look like? Are there a lot of two-field matches, such that subject and verb are often the same? How many matches are you expecting to get? MapReduce will work well for finding one match in 10k but won't work as well for a query that returns 8k of 10k where the query can't be easily partitioned.
There's a query language made just for this problem too: SPARQL. The bigdata blog has some good insights, though again 10k doesn't seem that large.

Efficient way to subtract values in two HashMaps by key

I am wondering how to efficiently subtract the values of two maps when their keys match. Currently I have 2 HashMap<String,Integer> and do it like this:
for (String key: map1.keySet()){
if (map2.keySet().contains(key)){
//subtract
}
}
Is there a better way to do it?
Theoretically speaking, this is about as fast as it can be done unless you can somehow do a faster than O(n) way of finding the matching keys between the two HashMaps.
Iterate over keys in first map's keySet() - O(n)
See if key is in other map - O(1)
Do your operation - O(1)
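The three steps above can be sketched as follows, assuming neither map contains null values and that the result should go into a new map holding only the common keys. MapSubtract and subtract are hypothetical names. Iterating entrySet() and calling get once per key avoids a separate contains check followed by a second lookup.

```java
import java.util.HashMap;
import java.util.Map;

final class MapSubtract {
    /** Returns a - b for every key present in both maps (null values not supported). */
    static Map<String, Integer> subtract(Map<String, Integer> a, Map<String, Integer> b) {
        Map<String, Integer> result = new HashMap<>();
        for (Map.Entry<String, Integer> e : a.entrySet()) { // O(n) over the first map
            Integer other = b.get(e.getKey());              // expected O(1) lookup
            if (other != null) {
                result.put(e.getKey(), e.getValue() - other);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> map1 = new HashMap<>();
        map1.put("a", 10);
        map1.put("b", 7);
        Map<String, Integer> map2 = new HashMap<>();
        map2.put("b", 2);
        map2.put("c", 5);
        System.out.println(subtract(map1, map2)); // prints "{b=5}"
    }
}
```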
I realise this is an old thread, but do check out Guava from Google:
https://code.google.com/p/guava-libraries/wiki/CollectionUtilitiesExplained#Maps
You can use Maps.difference and then get the entries in common, the entries only on the left or right, and so on.
I think there isn't a better method unless you use a different approach, and/or different data structures. You can for example create a class named ValuePair that can contain (up to) two values, which represent the values you are currently storing in two different maps, but you instead store all the pairs in a single map, and when it comes to "subtract" you can iterate in a single set of keys. Please note that a pair can be incomplete, so that no subtraction is done.
But that's probably overkill.
Have you considered using Apache Commons Collections?
CollectionUtils.subtract( collection1, collection2 );

nth item of hashmap

HashMap selections = new HashMap<Integer, Float>();
How can I get the Integer key of the 3rd smallest Float value in the whole HashMap?
Edit
I'm using the HashMap for this:
for (InflatedRunner runner : prices.getRunners()) {
for (InflatedMarketPrices.InflatedPrice price : runner.getLayPrices()) {
if (price.getDepth() == 1) {
selections.put(new Integer(runner.getSelectionId()), new Float(price.getPrice()));
}
}
}
I need the runner with the 3rd smallest price at depth 1.
Maybe I should implement this in another way?
Michael Mrozek nails it with his question if you're using HashMap right: this is highly atypical scenario for HashMap. That said, you can do something like this:
get the Set<Map.Entry<K,V>> from the HashMap<K,V>.entrySet().
addAll to List<Map.Entry<K,V>>
Collections.sort the list with a custom Comparator<Map.Entry<K,V>> that sorts based on V.
If you only need the 3rd Map.Entry<K,V>, then an O(N) selection algorithm may suffice.
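The sort-based steps above could be sketched like this. NthSmallest and keyOfNthSmallestValue are made-up names, n is 1-based, and ties are broken arbitrarily. The sample ids and prices are invented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class NthSmallest {
    /**
     * Returns the key whose value is the n-th smallest (n starts at 1).
     * Throws IndexOutOfBoundsException if the map has fewer than n entries.
     */
    static <K, V extends Comparable<? super V>> K keyOfNthSmallestValue(Map<K, V> map, int n) {
        // Copy the entries into a list so they can be sorted by value.
        List<Map.Entry<K, V>> entries = new ArrayList<>(map.entrySet());
        entries.sort(Map.Entry.comparingByValue());
        return entries.get(n - 1).getKey();
    }

    public static void main(String[] args) {
        Map<Integer, Float> selections = new HashMap<>();
        selections.put(101, 5.0f);
        selections.put(102, 1.5f);
        selections.put(103, 3.0f);
        selections.put(104, 2.0f);
        System.out.println(keyOfNthSmallestValue(selections, 3)); // prints "103"
    }
}
```

This is O(N log N) because of the sort; it is fine for an occasional query, but repeated queries would be better served by a sorted structure.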
//after edit
It looks like selection should really be a SortedMap<Float, InflatedRunner>. You should look at java.util.TreeMap.
Here's an example of how TreeMap can be used to get the 3rd lowest key:
TreeMap<Integer,String> map = new TreeMap<Integer,String>();
map.put(33, "Three");
map.put(44, "Four");
map.put(11, "One");
map.put(22, "Two");
int thirdKey = map.higherKey(map.higherKey(map.firstKey()));
System.out.println(thirdKey); // prints "33"
Also note how I take advantage of Java's auto-boxing/unboxing feature between int and Integer. I noticed that you used new Integer and new Float in your original code; this is unnecessary.
//another edit
It should be noted that if you have multiple InflatedRunner with the same price, only one will be kept. If this is a problem, and you want to keep all runners, then you can do one of a few things:
If you really need a multi-map (one key can map to multiple values), then you can:
have TreeMap<Float,Set<InflatedRunner>>
Use Multimap from Google Collections
If you don't need the map functionality, then just have a List<RunnerPricePair> (sorry, I'm not familiar with the domain to name it appropriately), where RunnerPricePair implements Comparable<RunnerPricePair> that compares on prices. You can just add all the pairs to the list, then either:
Collections.sort the list and get the 3rd pair
Use O(N) selection algorithm
Are you sure you're using hashmaps right? They're used to quickly look up a value given a key; it's highly unusual to sort the values and then try to find a corresponding key. If anything, you should be mapping the float to the int, so you could at least sort the float keys and get the integer value of the third smallest that way.
You have to do it in steps:
Get the Collection<V> of values from the Map
Sort the values
Choose the index of the nth smallest
Think about how you want to handle ties.
You could do it with the google collections BiMap, assuming that the Floats are unique.
If you regularly need to get the key of the nth item, consider:
using a TreeMap, which efficiently keeps keys in sorted order
then using a double map (i.e. one TreeMap mapping integer > float, the other mapping float > integer)
You have to weigh up the inelegance and potential risk of bugs from needing to maintain two maps with the scalability benefit of having a structure that efficiently keeps the keys in order.
You may need to think about two keys mapping to the same float...
P.S. Forgot to mention: if this is an occasional function, and you just need to find the nth largest item of a large number of items, you could consider implementing a selection algorithm (effectively, you do a sort, but don't actually bother sorting subparts of the list that you realise you don't need to sort because their order makes no difference to the position of the item you're looking for).
