As described in the answer to Double in HashMap, Doubles shouldn't be used in HashMaps because they are difficult to compare for equality. I believe my case is different, but I thought I'd ask to make sure since I didn't see anything about this.
I'm going to have a series of double values associated with objects, and I want them to be sorted by the double values. Is TreeMap an appropriate solution? Would there be a better one? The double values are generated by a bunch of math, so the likelihood of a duplicate value is extremely low.
EDIT: I should clarify: all I need is to have this list of objects sorted by the doubles they're associated with. The values of the doubles will be discarded and I'll never call map.get(key)
Doubles shouldn't be used in HashMaps because they are difficult to compare for equality.
Will you ever try to get the values based on certain keys?
If yes, then the reasoning about "difficult to compare" applies and you should probably avoid such a data structure (or always rely on tailMap / headMap / subMap and fetch ranges of the map).
If no (i.e. you'll typically just do for (Double key : map.keySet()) ... or iterate over the entrySet) then I would say you're fine using Double as keys.
The double values are generated by a bunch of math, so the likelihood of a duplicate value is extremely low.
Is it a bug if you actually do get a duplicate?
If yes then it's not the right data structure to use. You could for instance use a Multimap from Guava instead.
If no, (i.e. it doesn't matter which of the two values it maps to, because they can only differ by a small epsilon anyway) then you should be fine.
The problem with doubles in tree maps is exactly the same as it is with doubles in hash maps: comparing for equality. If you avoid calls of treeMap.get(myDouble) and stay with range queries instead (e.g. by using subMap) you should be fine.
TreeMap<Double,String> tm = new TreeMap<Double,String>();
tm.put(1.203, "quick");
tm.put(1.231, "brown");
tm.put(1.233, "fox");
tm.put(1.213, "jumps");
tm.put(1.243, "over");
tm.put(1.2301, "the");
tm.put(1.2203, "lazy");
tm.put(1.2003, "dog");
for (Map.Entry<Double,String> e : tm.subMap(1.230, 1.232).entrySet()) {
    System.out.println(e);
}
This prints
1.2301=the
1.231=brown
See this snippet on ideone.
If you only want to sort them, the best thing would be to create a wrapper object around the double and the object, implement the Comparable interface on this wrapper, and use a simple collection to sort them.
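A minimal sketch of that wrapper idea, assuming nothing beyond java.util. The names Scored and SortByScoreDemo are hypothetical and only serve the illustration. Unlike a TreeMap keyed by Double, a sorted list keeps entries whose scores happen to be equal.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical wrapper: pairs a computed score with the object it belongs to.
final class Scored<T> implements Comparable<Scored<T>> {
    final double score;
    final T item;

    Scored(double score, T item) {
        this.score = score;
        this.item = item;
    }

    @Override
    public int compareTo(Scored<T> other) {
        // Double.compare defines a total order, including NaN and -0.0.
        return Double.compare(score, other.score);
    }
}

class SortByScoreDemo {
    // Returns the items ordered by ascending score; duplicate scores are kept.
    static <T> List<T> sortedItems(List<Scored<T>> scored) {
        List<Scored<T>> copy = new ArrayList<Scored<T>>(scored);
        Collections.sort(copy); // stable: equal scores keep their input order
        List<T> result = new ArrayList<T>(copy.size());
        for (Scored<T> s : copy) {
            result.add(s.item);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Scored<String>> scored = new ArrayList<Scored<String>>();
        scored.add(new Scored<String>(1.231, "brown"));
        scored.add(new Scored<String>(1.203, "quick"));
        scored.add(new Scored<String>(1.233, "fox"));
        System.out.println(sortedItems(scored)); // [quick, brown, fox]
    }
}
```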
If you just want them sorted, there are better collections (for instance a SortedSet). You can also use any List and sort it with the utilities in java.util.Collections.
Only use Maps and Tables when you want to directly access an item by its key.
Related
I had originally written an ArrayList and stored unique values (usernames, i.e. Strings) in it. I later needed to use the ArrayList to search if a user existed in it. That's O(n) for the search.
My tech lead wanted me to change that to a HashMap and store the usernames as keys in the map, with empty Strings as values.
So, in Java -
hashmap.put("johndoe","");
I can see if this user exists later by running -
hashmap.containsKey("johndoe");
This is O(1) right?
My lead said this was a more efficient way to do this and it made sense to me, but it just seemed a bit off to put null/empty as values in the hashmap and store elements in it as keys.
My question is, is this a good approach? The efficiency beats ArrayList#contains or an array search in general. It works.
My worry is, I haven't seen anyone else do this after a search. I may be missing an obvious issue somewhere but I can't see it.
Since you have a set of unique values, a Set is the appropriate data structure. You can put your values inside HashSet, an implementation of the Set interface.
My lead said this was a more efficient way to do this and it made sense to me, but it just seemed a bit off to put null/empty as values in the hashmap and store elements in it as keys.
The advice of the lead is flawed. Map is not the right abstraction for this, Set is. A Map is appropriate for key-value pairs. But you don't have values, only keys.
Example usage:
Set<String> users = new HashSet<>(Arrays.asList("Alice", "Bob"));
System.out.println(users.contains("Alice"));
// -> prints true
System.out.println(users.contains("Jack"));
// -> prints false
Using a Map would be awkward, because what should be the type of the values? That question makes no sense in your use case, as you have just keys, not key-value pairs. With a Set, you don't need to ask that, the usage is perfectly natural.
This is O(1) right?
Yes, searching in a HashMap or a HashSet is O(1) on average (assuming a reasonably well-distributed hash function), while searching in a List or an array is O(n).
Some comments point out that a HashSet is implemented in terms of HashMap.
That's fine, at that level of abstraction.
At the level of abstraction of the task at hand, which is to store a collection of unique usernames, using a set is a natural choice, more natural than a map.
This is basically how HashSet is implemented, so I guess you can say it's a good approach. You might as well use HashSet instead of your HashMap with empty values.
For example :
HashSet's implementation of add is
public boolean add(E e) {
    return map.put(e, PRESENT)==null;
}
where map is the backing HashMap and PRESENT is a dummy value.
My worry is, I haven't seen anyone else do this after a search. I may be missing an obvious issue somewhere but I can't see it.
As I mentioned, the developers of the JDK are using this same approach.
If I have a HashMap<KEY, VALUE> and I need fast look up of the key by the value is there any other approach besides creating a second HashMap<VALUE, KEY> that store the same data but using the value as the key?
Is there any approach/trick for this? If it makes a difference, I am interested in String as both key and value.
Note: I am on Java 7
Update:
I am not sure why the other question is a duplicate as I am asking a specific way on implementing this.
Unless the only/best way is a 2 way map I can't see why this is a duplicate
Short answer: no, there isn't.
You need two maps. If you want to use O(1) time for both, that means two hashmaps.
If you're worried about space, don't worry so much: you're just storing duplicate pointers, and not two strings.
I.e., you're just storing
HashMap<String* k, String* v> normal;
HashMap<String* v, String* k> inverse;
rather than entire strings. (Although pointers kind of don't exist in Java.)
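A rough sketch of the two-map approach, written without Java 8 features since the question mentions Java 7. The class name TwoWayMap and its methods are hypothetical, and the sketch assumes values are unique (otherwise the inverse lookup would be ambiguous). Both maps hold references to the same String objects, so the strings themselves are not duplicated.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: keeps a forward and an inverse HashMap in sync.
// Assumes each value belongs to at most one key.
class TwoWayMap {
    private final Map<String, String> forward = new HashMap<String, String>();
    private final Map<String, String> inverse = new HashMap<String, String>();

    void put(String key, String value) {
        // Drop stale pairs first so the two maps never disagree.
        String oldValue = forward.remove(key);
        if (oldValue != null) {
            inverse.remove(oldValue);
        }
        String oldKey = inverse.remove(value);
        if (oldKey != null) {
            forward.remove(oldKey);
        }
        forward.put(key, value);
        inverse.put(value, key);
    }

    // O(1) on average in both directions.
    String getValue(String key) {
        return forward.get(key);
    }

    String getKey(String value) {
        return inverse.get(value);
    }
}
```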
Is there a datatype or class which will allow me to accomplish this, or what would be an efficient way to produce a similar effect?
1) Add items to an array with an associated float key (will have a few duplicate float keys)
2) Sort the array from least to greatest based on the float key OR grab the lowest float key and return those objects.
I need this to be relatively efficient because I will be repeating this many times per second.
What you need for your purpose is a Multimap as there will be duplicate keys. While C++ provides the multimap interface, Java SE does not have one built in. But you can use the TreeMultimap from the Google Guava library (it used to be available under Google Collections but as Louis Wasserman noted in the comments, it's been dead for a long time and you should avoid using it). The documentation for the class is at http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect/TreeMultimap.html
Please keep in mind that the TreeMultimap sorts both the keys and values based on the comparator provided (or natural ordering if no comparators are provided). The natural ordering is to sort the map from the smallest to the biggest entry. If you do not want your values sorted as well, you will want to play around with the supplied comparator a bit.
Here is code that has some unit tests for the TreeMultimap itself. You can very easily use this as an example for what you want http://google-collections.googlecode.com/svn-history/r76/trunk/test/com/google/common/collect/TreeMultimapNaturalTest.java.
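If pulling in Guava is not an option, a similar effect can be approximated with the standard library alone. The sketch below is not TreeMultimap. It swaps in a TreeMap whose values are lists, and the class name FloatKeyedBag and its methods are hypothetical. Unlike TreeMultimap, it leaves the values under one key in insertion order rather than sorting them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a sorted multimap, using only java.util.
class FloatKeyedBag<T> {
    private final TreeMap<Float, List<T>> buckets = new TreeMap<Float, List<T>>();

    // O(log n): finds or creates the bucket for this key.
    void add(float key, T item) {
        List<T> bucket = buckets.get(key);
        if (bucket == null) {
            bucket = new ArrayList<T>();
            buckets.put(key, bucket);
        }
        bucket.add(item);
    }

    // All items stored under the smallest key, or an empty list if nothing is stored.
    List<T> lowest() {
        Map.Entry<Float, List<T>> first = buckets.firstEntry();
        if (first == null) {
            return Collections.<T>emptyList();
        }
        return Collections.unmodifiableList(first.getValue());
    }

    // All items ordered by ascending key; items with equal keys keep insertion order.
    List<T> sorted() {
        List<T> all = new ArrayList<T>();
        for (List<T> bucket : buckets.values()) {
            all.addAll(bucket);
        }
        return all;
    }
}
```

Because the TreeMap stays sorted as items are added, grabbing the lowest key does not require re-sorting on every call.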
I'm looking for a way to have a concurrent map or similar key->value storage that can be sorted by value and not by key.
So far I was looking at ConcurrentSkipListMap but I couldn't find a way to sort it by value (using Comparator), since compare method receives only the keys as parameters.
The map has keys as String and values as Integer. What I'm looking for is a way to retrieve the key with the smallest value (integer).
I was also thinking about using 2 maps, creating a separate map with Integer keys and String values so that I would have a map sorted by integer as I wanted. However, several keys can share the same integer value, which could lead me into more problems.
Example
"user1"=>3
"user2"=>1
"user3"=>3
sorted list:
"user2"=>1
"user1"=>3
"user3"=>3
Is there a way to do this, or are there any third-party libraries that can do this?
Thanks
To sort by value where several keys can map to the same value, you need a Multimap. This needs to be synchronized as there is no concurrent version.
This doesn't mean the performance will be poor, as that depends on how often you call this data structure. E.g. it could add up to 1 microsecond.
I recently had to do this and ended up using a ConcurrentSkipListMap where the keys contain a string and an integer. I ended up using the answer proposed below. The core insight is that you can structure your code to allow for a duplicate of a key with a different value before removing the previous one.
Atomic way to reorder keys in a ConcurrentSkipListMap / ConcurrentSkipListSet?
The problem was to keep a dynamic set of strings which were associated with integers that could change concurrently from different threads, described below. It sounds very similar to what you wanted to do.
Is there an embeddable Java alternative to Redis?
Here's the code for my implementation:
https://github.com/HarvardEconCS/TurkServer/blob/master/turkserver/src/main/java/edu/harvard/econcs/turkserver/util/UserItemMatcher.java
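The following is only a sketch of the idea described in this answer, not the code from the linked file. The names CountedUser and UserRanking are hypothetical. It assumes that at most one thread changes the count of a given user at a time and that the caller knows the previous count.

```java
import java.util.Iterator;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical composite key: ordered by count first, then by name.
final class CountedUser implements Comparable<CountedUser> {
    final String name;
    final int count;

    CountedUser(String name, int count) {
        this.name = name;
        this.count = count;
    }

    @Override
    public int compareTo(CountedUser other) {
        if (count != other.count) {
            return count < other.count ? -1 : 1;
        }
        return name.compareTo(other.name);
    }
}

class UserRanking {
    private final ConcurrentSkipListSet<CountedUser> byCount =
            new ConcurrentSkipListSet<CountedUser>();

    void add(String name, int count) {
        byCount.add(new CountedUser(name, count));
    }

    // Insert the new pair before removing the old one, so the user is never
    // missing from the set; readers may briefly see the user twice.
    void changeCount(String name, int oldCount, int newCount) {
        if (oldCount == newCount) {
            return; // nothing to reorder; add-then-remove would delete the entry
        }
        byCount.add(new CountedUser(name, newCount));
        byCount.remove(new CountedUser(name, oldCount));
    }

    // Name of a user with the smallest count, or null if the set is empty.
    String smallest() {
        Iterator<CountedUser> it = byCount.iterator();
        return it.hasNext() ? it.next().name : null;
    }
}
```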
The principle of a ConcurrentMap is that it can be accessed concurrently. If you want it sorted by value at all times, performance will suffer significantly, as that map would need to be fully synchronized (like a Hashtable), resulting in poor throughput.
So I think your best bet is to return a sorted view of your map by putting all elements in an unmodifiable TreeMap for example (although sorting a TreeMap by values needs a bit of tweaking).
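One way to sketch such a sorted snapshot is shown below. It uses a sorted list of copied entries instead of the TreeMap mentioned above, because a list handles equal values without any tweaking. The class name ValueSortedSnapshot is hypothetical, and the result reflects the map only at roughly the moment of the call.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ValueSortedSnapshot {
    // Copies the entries and sorts the copy by value, then by key.
    static List<Map.Entry<String, Integer>> sortedByValue(Map<String, Integer> source) {
        List<Map.Entry<String, Integer>> snapshot =
                new ArrayList<Map.Entry<String, Integer>>();
        for (Map.Entry<String, Integer> e : source.entrySet()) {
            // An immutable copy detaches the snapshot from later updates to the map.
            snapshot.add(new AbstractMap.SimpleImmutableEntry<String, Integer>(
                    e.getKey(), e.getValue()));
        }
        Collections.sort(snapshot, new Comparator<Map.Entry<String, Integer>>() {
            @Override
            public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                int byValue = a.getValue().compareTo(b.getValue());
                return byValue != 0 ? byValue : a.getKey().compareTo(b.getKey());
            }
        });
        return snapshot;
    }

    public static void main(String[] args) {
        Map<String, Integer> scores = new ConcurrentHashMap<String, Integer>();
        scores.put("user1", 3);
        scores.put("user2", 1);
        scores.put("user3", 3);
        System.out.println(sortedByValue(scores)); // [user2=1, user1=3, user3=3]
    }
}
```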
In one of my Java 6 projects I have an array of LinkedHashMap instances as input to a method which has to iterate through all keys (i.e. through the union of the key sets of all maps) and work with the associated values. Not all keys exist in all maps and the method should not go through each key more than once or alter the input maps.
My current implementation looks like this:
Set<Object> keyset = new HashSet<Object>();
for (Map<Object, Object> map : input) {
    for (Object key : map.keySet()) {
        if (keyset.add(key)) {
            ...
        }
    }
}
The HashSet instance ensures that no key will be acted upon more than once.
Unfortunately this part of the code is rather critical performance-wise, as it is called very frequently. In fact, according to the profiler over 10% of the CPU time is spent in the HashSet.add() method.
I am trying to optimise this code as much as possible. The use of LinkedHashMap with its more efficient iterators (in comparison to the plain HashMap) was a significant boost, but I was hoping to reduce what is essentially book-keeping time to the minimum.
Putting all the keys in the HashSet before-hand, by using addAll() proved to be less efficient, due to the cost of calling HashSet.contains() afterwards.
At the moment I am looking at whether I can use a bitmap (well, a boolean[] to be exact) to avoid the HashSet completely, but it may not be possible at all, depending on my key range.
Is there a more efficient way to do this? Preferably something that will not pose restrictions on the keys?
EDIT:
A few clarifications and comments:
I do need all the values from the maps - I cannot drop any of them.
I also need to know which map each value came from. The missing part (...) in my code would be something like this:
for (Map<Object, Object> m : input) {
    Object v = m.get(key);
    // Do something with v
}
A simple example to get an idea of what I need to do with the maps would be to print all maps in parallel like this:
Key  Map0  Map1  Map2
F    1     null  2
B    2     3     null
C    null  null  5
...
That's not what I am actually doing, but you should get the idea.
The input maps are extremely variable. In fact, each call of this method uses a different set of them. Therefore I would not gain anything by caching the union of their keys.
My keys are all String instances. They are sort-of-interned on the heap using a separate HashMap, since they are pretty repetitive, therefore their hash code is already cached and most hash validations (when the HashMap implementation is checking whether two keys are actually equal, after their hash codes match) boil down to an identity comparison (==). The profiler confirms that only 0.5% of the CPU time is spent on String.equals() and String.hashCode().
EDIT 2:
Based on the suggestions in the answers, I made a few tests, profiling and benchmarking along the way. I ended up with roughly a 7% increase in performance. What I did:
I set the initial capacity of the HashSet to double the collective size of all input maps. This gained me something in the region of 1-2%, by eliminating most (all?) resize() calls in the HashSet.
I used Map.entrySet() for the map I am currently iterating. I had originally avoided this approach due to the additional code and the fear that the extra checks and Map.Entry getter method calls would outweigh any advantages. It turned out that the overall code was slightly faster.
I am sure that some people will start screaming at me, but here it is: Raw types. More specifically I used the raw form of HashSet in the code above. Since I was already using Object as its content type, I do not lose any type safety. The cost of that useless checkcast operation when calling HashSet.add() was apparently important enough to produce a 4% increase in performance when removed. Why the JVM insists on checking casts to Object is beyond me...
Can't provide a replacement for your approach but a few suggestions to (slightly) optimize the existing code.
Consider initializing the hash set with a capacity (the sum of the sizes of all maps). This avoids or reduces resizing of the set during add operations.
Consider iterating over entrySet() rather than keySet(). It hands you the key and the value together, so you avoid an extra get() call on the map you are currently iterating.
Have a look at the implementations of equals() and hashCode() - if they are "expensive", then you have a negative impact on the add method.
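The first two suggestions could look roughly like the sketch below. The class name KeyUnionDemo, the String key type, and the row layout (key followed by one value per map) are assumptions made for the illustration. They mirror the "print all maps in parallel" example from the question rather than the asker's real code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class KeyUnionDemo {
    // Builds one row per distinct key: [key, valueInMap0, valueInMap1, ...].
    static List<List<Object>> rows(List<Map<String, Object>> input) {
        int total = 0;
        for (Map<String, Object> map : input) {
            total += map.size();
        }
        // Sized so that `total` keys fit under the default 0.75 load factor
        // without triggering a resize.
        Set<String> seen = new HashSet<String>(Math.max(16, (int) (total / 0.75f) + 1));
        List<List<Object>> rows = new ArrayList<List<Object>>();
        for (int i = 0; i < input.size(); i++) {
            for (Map.Entry<String, Object> entry : input.get(i).entrySet()) {
                String key = entry.getKey();
                if (!seen.add(key)) {
                    continue; // key already handled via an earlier map
                }
                List<Object> row = new ArrayList<Object>(input.size() + 1);
                row.add(key);
                for (int j = 0; j < input.size(); j++) {
                    // The entry already carries the value for map i, so skip that lookup.
                    row.add(j == i ? entry.getValue() : input.get(j).get(key));
                }
                rows.add(row);
            }
        }
        return rows;
    }
}
```

Whether this is actually faster depends on the data and the JVM, so it is worth profiling before and after.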
How you avoid using a HashSet depends on what you are doing.
I would only calculate the union once each time the input is changed. This should be relatively rare compared with the number of lookups.
// on an update.
Map<Key, Value> union = new LinkedHashMap<Key, Value>();
for (Map<Key, Value> map : input)
    union.putAll(map);
// on a lookup.
Value value = union.get(key);
// process each key once
for (Entry<Key, Value> entry : union.entrySet()) {
    // do something.
}
Option A is to use the .values() method and iterate through it. But I suppose you already had thought of it.
If the code is called so often, then it might be worth creating additional structures (depending on how often the data is changed). Create a new HashMap in which every key from any of your hashmaps is a key, and the value is a list of the HashMaps where that key appears.
This will help if the data is somewhat static (relative to the frequency of queries), so the overhead of managing the structure is relatively small, and if the key space is not very dense (keys do not repeat themselves a lot in different HashMaps), as it will save a lot of unneeded contains() calls.
Of course, if you are mixing data structures it is better if you encapsulate all in your own data structure.
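A sketch of what such an index could look like is given below. The class name KeyIndex and the String key type are assumptions for the illustration. Keeping the index up to date when the underlying maps change is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class KeyIndex {
    // Maps each key to the input maps that contain it, in input order.
    static Map<String, List<Map<String, Object>>> build(List<Map<String, Object>> input) {
        Map<String, List<Map<String, Object>>> index =
                new HashMap<String, List<Map<String, Object>>>();
        for (Map<String, Object> map : input) {
            for (String key : map.keySet()) {
                List<Map<String, Object>> owners = index.get(key);
                if (owners == null) {
                    owners = new ArrayList<Map<String, Object>>();
                    index.put(key, owners);
                }
                owners.add(map);
            }
        }
        return index;
    }
}
```

Iterating over the index then visits each key exactly once and only touches the maps that actually hold it.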
You could take a look at Guava's Sets.union() http://guava-libraries.googlecode.com/svn/tags/release04/javadoc/com/google/common/collect/Sets.html#union(java.util.Set,%20java.util.Set)