java concurrent map sorted by value

I'm looking for a way to have a concurrent map or similar key->value storage that can be sorted by value rather than by key.
So far I've been looking at ConcurrentSkipListMap, but I couldn't find a way to sort it by value (using a Comparator), since the compare method receives only the keys as parameters.
The map has String keys and Integer values. What I'm looking for is a way to retrieve the key with the smallest value (integer).
I was also thinking about using two maps, creating a separate map with Integer keys and String values so that I'd have a map sorted by integer as I wanted; however, more than one key can have the same integer value, which could lead to more problems.
Example
"user1"=>3
"user2"=>1
"user3"=>3
sorted list:
"user2"=>1
"user1"=>3
"user3"=>3
Is there a way to do this, or are there any 3rd-party libraries that can do it?
Thanks

To sort by value when multiple keys can share the same value, you need a MultiMap keyed by the value. This needs to be synchronized, as there is no concurrent version.
That doesn't mean performance will be poor, as that depends on how often you access this data structure; e.g. the synchronization might add up to a microsecond per access.
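Here is a minimal sketch of that idea (a synchronized value-to-keys index kept next to the main map); the class and method names are illustrative, not from any library:

import java.util.*;

// Keeps the main key->value map next to a TreeMap index from value to the set
// of keys currently holding that value, so the key with the smallest value can
// be found in O(log n). All access goes through one lock.
public class ValueSortedIndex {
    private final Map<String, Integer> byKey = new HashMap<>();
    private final TreeMap<Integer, Set<String>> byValue = new TreeMap<>();

    public synchronized void put(String key, int value) {
        Integer old = byKey.put(key, value);
        if (old != null) {
            Set<String> keys = byValue.get(old);
            keys.remove(key);
            if (keys.isEmpty()) {
                byValue.remove(old);
            }
        }
        byValue.computeIfAbsent(value, v -> new HashSet<>()).add(key);
    }

    public synchronized String keyWithSmallestValue() {
        Map.Entry<Integer, Set<String>> first = byValue.firstEntry();
        return first == null ? null : first.getValue().iterator().next();
    }
}

With the example data above ("user1"=>3, "user2"=>1, "user3"=>3), keyWithSmallestValue() returns "user2".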

I recently had to do this and ended up using a ConcurrentSkipListMap in which the keys contain both a string and an integer, following the answer proposed in the question linked below. The core insight is that you can structure your code to tolerate a short-lived duplicate: add the key with the new value before removing the one with the previous value.
Atomic way to reorder keys in a ConcurrentSkipListMap / ConcurrentSkipListSet?
The problem was to keep a dynamic set of strings associated with integers that could change concurrently from different threads, as described below. It sounds very similar to what you want to do.
Is there an embeddable Java alternative to Redis?
Here's the code for my implementation:
https://github.com/HarvardEconCS/TurkServer/blob/master/turkserver/src/main/java/edu/harvard/econcs/turkserver/util/UserItemMatcher.java
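For illustration only (this is not the linked implementation, and it assumes Java 16+ record syntax), the composite-key idea looks roughly like this:

import java.util.Comparator;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Keys combine the user name with its current score, and the skip list orders
// them by score first, so the smallest score is always at the head.
class ScoreBoard {
    record Score(String user, int value) {}

    private final ConcurrentSkipListMap<Score, Boolean> sorted =
            new ConcurrentSkipListMap<>(
                    Comparator.comparingInt(Score::value).thenComparing(Score::user));

    // Insert the key with the new score before removing the key with the old
    // one, so concurrent readers always find the user under at least one key.
    void update(Score oldScore, Score newScore) {
        sorted.put(newScore, Boolean.TRUE);
        if (oldScore != null) {
            sorted.remove(oldScore);
        }
    }

    String userWithSmallestScore() {
        Map.Entry<Score, Boolean> first = sorted.firstEntry();
        return first == null ? null : first.getKey().user();
    }
}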

The principle of a ConcurrentMap is that it can be accessed concurrently - if you want it kept sorted at all times, performance will suffer significantly, as the map would need to be fully synchronized (like a Hashtable), resulting in poor throughput.
So I think your best bet is to return a sorted view of your map, for example by putting all elements into an unmodifiable TreeMap (although sorting a TreeMap by values needs a bit of tweaking).
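One way to build such a view, sketched here with illustrative names, is to snapshot the entries and sort them by value, returning an unmodifiable list of entries rather than a TreeMap (which sidesteps the duplicate-value problem):

import java.util.*;
import java.util.concurrent.ConcurrentMap;

// Copies the entries of the concurrent map and returns them ordered by value,
// ties broken by key so the order is deterministic.
class SortedViews {
    static List<Map.Entry<String, Integer>> sortedByValue(ConcurrentMap<String, Integer> map) {
        List<Map.Entry<String, Integer>> snapshot = new ArrayList<>(map.entrySet());
        snapshot.sort(Map.Entry.<String, Integer>comparingByValue()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return Collections.unmodifiableList(snapshot);
    }
}

For the example in the question, the first entry of the returned list would be "user2"=>1.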


Is it a good idea to store data as keys in HashMap with empty/null values?

I originally used an ArrayList and stored unique values (usernames, i.e. Strings) in it. I later needed to use the ArrayList to check whether a user existed in it. That's O(n) for the search.
My tech lead wanted me to change that to a HashMap and store the usernames as keys in the map, with empty Strings as values.
So, in Java -
hashmap.put("johndoe","");
I can see if this user exists later by running -
hashmap.containsKey("johndoe");
This is O(1) right?
My lead said this was a more efficient way to do this and it made sense to me, but it just seemed a bit off to put null/empty as values in the hashmap and store elements in it as keys.
My question is, is this a good approach? The efficiency beats ArrayList#contains or an array search in general. It works.
My worry is, I haven't seen anyone else do this after a search. I may be missing an obvious issue somewhere but I can't see it.
Since you have a set of unique values, a Set is the appropriate data structure. You can put your values inside HashSet, an implementation of the Set interface.
My lead said this was a more efficient way to do this and it made sense to me, but it just seemed a bit off to put null/empty as values in the hashmap and store elements in it as keys.
The advice of the lead is flawed. Map is not the right abstraction for this, Set is. A Map is appropriate for key-value pairs. But you don't have values, only keys.
Example usage:
Set<String> users = new HashSet<>(Arrays.asList("Alice", "Bob"));
System.out.println(users.contains("Alice"));
// -> prints true
System.out.println(users.contains("Jack"));
// -> prints false
Using a Map would be awkward, because what should be the type of the values? That question makes no sense in your use case, as you have just keys, not key-value pairs. With a Set, you don't need to ask that; the usage is perfectly natural.
This is O(1) right?
Yes, searching in a HashMap or a HashSet is O(1) on average (expected time), while searching in a List or an array is O(n) in the worst case.
Some comments point out that a HashSet is implemented in terms of HashMap. That's fine, at that level of abstraction. At the level of abstraction of the task at hand, storing a collection of unique usernames, using a set is a natural choice, more natural than a map.
This is basically how HashSet is implemented, so I guess you can say it's a good approach. You might as well use HashSet instead of your HashMap with empty values.
For example:
HashSet's implementation of add is
public boolean add(E e) {
    return map.put(e, PRESENT)==null;
}
where map is the backing HashMap and PRESENT is a dummy value.
My worry is, I haven't seen anyone else do this after a search. I may be missing an obvious issue somewhere but I can't see it.
As I mentioned, the developers of the JDK are using this same approach.

Maximising efficiency of a data structure ordered by a person name using Java Collections

I need a data structure mapped by the name of a person, which implies that duplicate keys will have to be stored. I would like to have log(n) time (at most) for insertions, deletions and searches.
Ideally, I would have a hash table mapped by a unique identifier generated upon insertion, giving me insertions in constant time. With an auxiliary balanced tree ordered by name, each entry referencing an entry in the hash table, I would be able to search/delete by name in logarithmic time and print all the entries in linear time, ordered by name.
Is there a way to do this in Java by reusing the available Collections? Or at least a solution with similar complexity...
In this question, it is suggested that a Map will solve this problem. But from my understanding, no Map can deal with repeated keys properly.
If I understand the question right, you want a MultiMap. There are a few implementations in Guava and Commons Collections, or you could roll your own using e.g. a HashMap (or a TreeMap if you suspect your .hashCode() implementation is too slow or too poor, but I'd benchmark first) with a List or Set implementation as the value type.
Note that if you want to iterate over the values in name order, you're best off using a tree-based Multimap: if the name is already the key, your order will be what you want without needing to do anything else.
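As a rough usage sketch of the tree-based option (this assumes Guava is on the classpath; the names and IDs are made up):

import com.google.common.collect.TreeMultimap;

// A tree-based multimap keyed by person name: duplicate names are kept as
// multiple values, and iteration follows name order automatically.
public class NameIndexDemo {
    public static void main(String[] args) {
        TreeMultimap<String, String> byName = TreeMultimap.create();
        byName.put("Smith", "id-42");
        byName.put("Smith", "id-99");   // duplicate name, kept as a second value
        byName.put("Adams", "id-07");

        // Prints Adams first, then Smith with both ids.
        byName.asMap().forEach((name, ids) -> System.out.println(name + " -> " + ids));

        // All values stored under one name.
        System.out.println(byName.get("Smith"));
    }
}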
As you said yourself, a Map will help you. For repeated keys, you should store a list of objects as the value instead of the objects themselves. You can leverage generics to store either a single object or a list of objects as the value.
Edit: This degenerates to O(n) complexity in the worst case for search (when all entries have the same name), but that case should be very rare, so you can probably ignore it.
If you agree to generating a unique ID per entity, then you should use a composite key consisting of the name and the unique ID. Make the key comparable by name, then by ID to break ties.
Then you just use a plain TreeMap for storage and retrieval. Especially take note of the NavigableMap API, which will allow you to find the best match for a key disregarding the differences in the ID part.
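A sketch of that composite-key approach (illustrative names; Java 16+ record syntax for brevity):

import java.util.*;

// Keys compare by name first, then by a unique id, so duplicate names coexist
// while the TreeMap stays ordered by name.
class PersonDirectory {
    record Key(String name, long id) implements Comparable<Key> {
        public int compareTo(Key other) {
            int byName = name.compareTo(other.name);
            return byName != 0 ? byName : Long.compare(id, other.id);
        }
    }

    private final NavigableMap<Key, String> people = new TreeMap<>();
    private long nextId = 0;

    // O(log n) insertion; the generated id makes the key unique.
    void add(String name, String details) {
        people.put(new Key(name, nextId++), details);
    }

    // All entries with exactly this name, via the NavigableMap range view.
    SortedMap<Key, String> byName(String name) {
        return people.subMap(new Key(name, Long.MIN_VALUE), true,
                             new Key(name, Long.MAX_VALUE), true);
    }
}

Iterating over people.values() then prints all entries in name order in linear time.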

java constantly sorted list with quick retrieval

I'm looking for a constantly sorted list in Java which can also be used to retrieve an object very quickly. PriorityQueue works great for the "constantly sorted" requirement, and HashMap works great for the fast retrieval by key, but I need both in the same collection. At one point I had written my own, but it does not implement the collection interfaces (so it can't be used as a drop-in replacement for a java.util.List etc.), and I'd rather stick to standard Java classes if possible.
Is there such a list out there? Right now I'm using two structures, a PriorityQueue and a HashMap, both containing the same objects. I use the priority queue to traverse the first part of the list in sorted order and the hashmap for fast retrieval by key (I need to do both operations interchangeably), but I'm hoping for a more elegant solution...
Edit: I should add that I need the list sorted by a different comparator than the one used for retrieval by key; the list is sorted by a long value, while key retrieval is by String.
Since you're already using HashMap, that implies that you have unique keys. Assuming that you want to order by those keys, TreeMap is your answer.
It sounds like what you're talking about is a collection with an automatically-maintained index.
Try looking at GlazedLists which use "list pipelines" to efficiently propagate changes -- their SortedList class should do the job.
edit: missed your retrieval-by-key requirement. That can be accomplished with GlazedLists.syncEventListToMap and GlazedLists.syncEventListToMultimap -- syncEventListToMap works if there are no duplicate keys, and syncEventListToMultimap works if there are duplicate keys. The nice part about this approach is that you can create multiple maps based on different indices.
If you want to use TreeMaps for indices -- which may give you better performance -- you need to keep your TreeMaps privately encapsulated within a custom class of your choosing, that exposes the interfaces/methods you want, and create accessors/mutators for that class to keep the indices in sync with the collection. Be sure to deal with concurrency issues (via synchronized methods or locks or whatever) if you access the collection from multiple threads.
edit: finally, if fast traversal of the items in sorted order is important, consider using ConcurrentSkipListMap instead of TreeMap -- not for its concurrency, but for its fast traversal. Skip lists are linked lists with multiple levels of linkage: one that traverses all items, the next that traverses every K items on average (for a given constant K), the next that traverses every K² items on average, etc.
TreeMap
http://download.oracle.com/javase/6/docs/api/java/util/TreeMap.html
Go with a TreeSet.
A NavigableSet implementation based on a TreeMap. The elements are ordered using their natural ordering, or by a Comparator provided at set creation time, depending on which constructor is used.
This implementation provides guaranteed log(n) time cost for the basic operations (add, remove and contains).
I haven't tested this so I might be wrong; consider it just an attempt.
Use a TreeMap and wrap the map's key in an object with two attributes (the String you use as the key in the HashMap and the long you use to maintain the sort order in the PriorityQueue). For this object, override equals and hashCode using the String, and implement the Comparable interface using the long.
Why don't you encapsulate your solution in a class that implements Collection or Map?
That way you could simply delegate each retrieval method to the collection that handles it best. Just make sure that calls to write methods (add/remove/put) are forwarded to both collections, and remember indirect accesses like iterator.remove(). Most of these methods are optional to implement, but you have to deactivate the ones you don't support (Collections.unmodifiableXXX will help here in most cases).
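A minimal sketch of that encapsulation idea (class and method names are made up; it simply synchronizes every method rather than implementing the full Collection/Map interfaces):

import java.util.*;

// A HashMap gives O(1) lookup by string key, a TreeMap keyed by the long sort
// value gives ordered traversal, and every write goes through this class so
// the two structures cannot drift apart. Assumes non-null values.
class SortedStore<V> {
    private final Map<String, V> byKey = new HashMap<>();
    private final Map<String, Long> sortValueOf = new HashMap<>();
    private final NavigableMap<Long, Map<String, V>> bySortValue = new TreeMap<>();

    synchronized void put(String key, long sortValue, V value) {
        remove(key);                      // drop any previous entry for this key
        byKey.put(key, value);
        sortValueOf.put(key, sortValue);
        bySortValue.computeIfAbsent(sortValue, s -> new LinkedHashMap<>()).put(key, value);
    }

    synchronized void remove(String key) {
        if (byKey.remove(key) == null) {
            return;                       // nothing stored under this key
        }
        Long sortValue = sortValueOf.remove(key);
        Map<String, V> bucket = bySortValue.get(sortValue);
        bucket.remove(key);
        if (bucket.isEmpty()) {
            bySortValue.remove(sortValue);
        }
    }

    synchronized V get(String key) {
        return byKey.get(key);
    }

    // Values in ascending order of their sort value.
    synchronized List<V> valuesInOrder() {
        List<V> result = new ArrayList<>();
        for (Map<String, V> bucket : bySortValue.values()) {
            result.addAll(bucket.values());
        }
        return result;
    }
}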

Should I use TreeMap or HashMap for wrapping named parameters?

In most cases, there will be only 0-5 parameters in the map. I guess TreeMap might have a smaller footprint, because it's less sparse than HashMap. But I'm not sure.
Or maybe it's even better to write my own Map in such a case?
The main difference is that TreeMap is a SortedMap, and HashMap is not. If you need your map to be sorted, use a TreeMap, if not then use a HashMap. The performance characteristics and memory usage can vary, but if you only have 0-5 entries then there will be no noticeable difference.
I would not recommend you write your own map unless you need functionality which is not available from the standard Maps, which it sounds like you don't.
I guess TreeMap might have a smaller footprint, because it's less sparse than HashMap.
That may actually be wrong, because empty HashMap slots are null and thus take up little space, and TreeMap entries have a higher overhead than HashMap entries because of the child pointers and color flag.
In any case, it's only a concern if you have hundreds of thousands of such maps.
I guess you don't need the entries of the map to be ordered, so a HashMap is fine for you.
And 5 entries are not a performance concern.
Writing your own Map means implementing dozens of methods; I don't think that is what you need.
If your roughly 5 keys are always the same (or come from a small, fixed set of keys), and you usually query them with string literals anyway and only seldom have to parse the keys from user input or similar, then you may think about using an enum type as the key type of an EnumMap, as sketched below. This should be even more efficient than a HashMap. The difference will only matter if you have many of these maps, though.
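For example (a sketch; the Param enum and its constants are made-up placeholders for your fixed key set):

import java.util.EnumMap;
import java.util.Map;

public class NamedParams {
    // The fixed set of parameter names, as an enum instead of strings.
    enum Param { HOST, PORT, TIMEOUT, RETRIES, DEBUG }

    public static void main(String[] args) {
        Map<Param, String> params = new EnumMap<>(Param.class);
        params.put(Param.HOST, "example.org");
        params.put(Param.PORT, "8080");

        // EnumMap lookups are backed by a plain array indexed by the enum's
        // ordinal, so there is no hashing at all.
        System.out.println(params.get(Param.HOST));
    }
}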

Is there a sorted java collection which handles duplicates?

I need a collection that behaves something like C++ multimap, but I also need to be able to get elements by a range of keys.
You can look into Google Collections (now part of Guava). It has multiple implementations of Multimap.
There is no built-in multimap collection in Java. To solve this you can map each key to a list of values, e.g. Map<String, List<String>>. Otherwise there are third-party libraries with multimap implementations - here is one of them.
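A sketch of that map-of-lists approach, using a TreeMap so that a range of keys can also be fetched as the question asks (names and data are illustrative):

import java.util.*;

public class RangeMultimapDemo {
    public static void main(String[] args) {
        NavigableMap<String, List<String>> multimap = new TreeMap<>();
        put(multimap, "adams", "record-1");
        put(multimap, "adams", "record-2");   // duplicate key
        put(multimap, "baker", "record-3");
        put(multimap, "clark", "record-4");

        // All values whose key lies in the range ["a", "c"):
        multimap.subMap("a", true, "c", false)
                .forEach((key, values) -> System.out.println(key + " -> " + values));
    }

    // Appends a value to the list stored under the key, creating it if needed.
    static void put(Map<String, List<String>> map, String key, String value) {
        map.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }
}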
There is a simple hack for creating sortable multimap-like collections in Java: use a TreeMap for the dataset and, for the keys, insert key*10^4 + counter. By adding the counter, the "duplicate" keys are no longer actually duplicates, so they can be stored in the TreeMap, but the dataset is still sorted by the original key values (you just know not to use the last four digits of the integer keys). Note that depending on how large your dataset is, you may want to adjust 10^n to make sure it is larger than the number of entries in your data.
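And a small sketch of the hack itself, using the 10^4 multiplier from above (so it assumes fewer than 10,000 entries; names and data are illustrative):

import java.util.TreeMap;

public class CounterHackDemo {
    public static void main(String[] args) {
        TreeMap<Long, String> map = new TreeMap<>();
        long counter = 0;
        long[] realKeys = {42, 7, 42};
        String[] values = {"first 42", "only 7", "second 42"};

        // key * 10_000 + counter keeps duplicates distinct but still ordered
        // by the real key; works only while counter stays below 10_000.
        for (int i = 0; i < realKeys.length; i++) {
            map.put(realKeys[i] * 10_000L + counter++, values[i]);
        }

        // Entries come out ordered by the real key; divide to recover it.
        map.forEach((k, v) -> System.out.println((k / 10_000L) + " -> " + v));
    }
}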
