Should use TreeMap or HashMap for wrapping named parameters? - java

In most cases, there will be only 0-5 parameters in the map. I guess TreeMap might have a smaller footprint, because it's less sparse then HashMap. But I'm not sure.
Or, maybe it's even better to write my own Map in such case?

The main difference is that TreeMap is a SortedMap, and HashMap is not. If you need your map to be sorted, use a TreeMap, if not then use a HashMap. The performance characteristics and memory usage can vary, but if you only have 0-5 entries then there will be no noticeable difference.
I would not recommend you write your own map unless you need functionality which is not available from the standard Maps, which it sounds like you don't.

I guess TreeMap might have a smaller
footprint, because it's less sparse
then HashMap.
That may actually be wrong, because empty HashMap slots are null and thus take up little space, and TreeMap entries have a higher overhead than HashMap entries because of the child pointers and color flag.
In any case, it's only a concern if you have hundreds of thousands of such maps.

I guess you don't need order of entries in Map, so HashMap is OK for you.
And 5 entries are not performance concern.
You need to write Map which has dozen of methods to implemented, I don't think that is what you need.

If your about 5 keys are always the same (or part of a small set of keys), and you are usually querying them by string literals anyway, and you only seldom have to really parse the keys from user input or similar, then you may think about using an enum type as key type of a EnumMap. This should be even more efficient than a HashMap. The difference will only matter if you have many of these maps, though.

Related

Map.of vs EnumMap for immutable map with enum keys

An EnumMap uses the restriction that all keys of a map will be from the same enum to gain performance benefits:
Enum maps are represented internally as arrays. This representation is extremely compact and efficient.
In this case, the keys and values are stored in separate arrays and the values are ordinal-ordered. Iteration is done with the internal EnumMapIterator class.
An immutable map created by the various Map.of methods use the restriction that the map will not change structurally to gain performance benefits. If the map is not of size 0 or 1, it uses the MapN internal class that also stores its entries in an array. In this case, the value is stored 1 index after its key. Iteration is done with the internal MapNIterator.
For an immutable map of enum keys of size 2 or more, which answers both of the above's requirements, which map performs better? (Criteria could be space efficiency, time efficiency for containsKey, containsValue, get, and iteration efficiency of entrySet, keySet and values.)
which map gives better space efficiency, and time efficiency for its operations and iteration: containsKey, containsValue, get, entrySet, keySet and values?
You're raising 1 + 6 (or 2 * 6, depending on how it's understood) questions, that's a bit too much. If you want a definite answer, you have to concentrate on a single thing and profile it (nobody's gonna do it for you unless you find a very interesting problem).
The space efficiency for an EnumMap simply must be better. There's no need to store the keys as a shared enum array can be used. There's no need for a holes-containing hash lookup array.
There may be exceptions like a small map based on a huge enum.
The most important operation is get. With an EnumMap, it involves no lookup, just a trivial class comparison and an array access. With Map.of(...), there's a loop, which for enums, usually terminates after the first iteration as Enum.hashCode is IMHO stupid, but usually well distributed.
As containsKey is based on the same lookup, it's clear.
I doubt, I've ever used containsValue, but it doesn't do anything smarter than linear search. I'd expect a tiny win for EnumMap because of the holes (needing trivial null test, but causing branch mispredictions).
The remaining three operations are not worth looking up as they return a collection containing no data and simply pointing to the map, i.e., a constant time operation. For example, map.keySet().contains(x) simply delegates to map.containsKey().
The efficiency of the iteration would be a more interesting question, but you didn't ask it.

HashMap or TreeMap for Map specifically sized 1

I need a Map sized 1 and began to wonder what would be best, a TreeMap or a HashMap?
My thought is that TreeMap would be better, as initialising and adding a value to a HashMap will cause it to create a table of 15 entries, whereas I believe TreeMapis a red-black tree implementation, which at size one will simply have a root node.
With that said, I suppose it depends on the hashCode / compareTo for the key of the HashMap / TreeMap respectively.
Ultimately, I suppose it really doesn't matter in terms of performance, I'm thinking in terms of best practice. I guess the best performance would come from a custom one entry Map implementation but that is just a bit ridiculous.
The canonical way of doing this is to use Collections.singletonMap()
Nice and simple, provided that you also require (or at least, don't mind) immutability.
And yes, internally it is implemented as a custom single-node Map.
As a complete aside, you can create a HashMap with a single bucket if in the constructor you specify a capacity of 1 and a loadFactor that is greater than the number of elements you want to put in. But memory-wise that would still be a bit of a waste as you'd have the overhead of the Entry array, the Entry object and all the other fields HashMap has (like load factor, size, resize treshold).
If you are only looking for a key to value structure you can just use SimpleEntry<K,V> class, this is basically an implementation of Map.Entry<K,V>

Using EnumSet or EnumMap on arbitrary keys

We know that EnumSet and EnumMap are faster than HashSet/HashMap due to the power of bit manipulation. But are we actually harnessing the true power of EnumSet/EnumMap when it really matters? If we have a set of millions of record and we want to find out if some object is present in that set or not, can we take advantage of EnumSet's speed?
I checked around but haven't found anything discussing this. Everywhere the usual stuff is found i.e. because EnumSet and EnumMap uses a predefined set of keys lookups on small collections are very fast. I know enums are compile-time constants but can we have best of both worlds - an EnumSet-like data structure without needing enums as keys?
Interesting insight; the short answer is no, but your question is exploring some good data-structure design concepts which I'll try to discuss.
First, lets talk about HashMap (HashSet uses a HashMap internally, so they share most behavior); a hash-based data structure is powerful because it is fast and general. It's fast (i.e. approximately O(1)) because we can find the key we're looking for with a very small number of computations. Roughly, we have an array of lists of keys, convert the key to an integer index into that array, then look through the associated list for the key. As the mapping gets bigger, the backing array is repeatedly resized to hold more lists. Assuming the lists are evenly distributed, this lookup is very fast. And because this works for any generic object (that has a proper .hashcode() and .equals()) it's useful for just about any application.
Enums have several interesting properties, but for the purpose of efficient lookup we only care about two of them - they're generally small, and they have a fixed number of values. Because of this, we can do better than HashMap; specifically, we can map every possible value to a unique integer, meaning we don't need to compute a hash, and we don't need to worry about hashes colliding. So EnumMap simply stores an array of the same size as the enum and looks up directly into it:
// From Java 7's EnumMap
public V get(Object key) {
return (isValidKey(key) ?
unmaskNull(vals[((Enum)key).ordinal()]) : null);
}
Stripping away some of the required Map sanity checks, it's simply:
return vals[key.ordinal()];
Note that this is conceptually no different than a standard HashMap, it's simply avoiding a few computations. EnumSet is a little more clever, using the bits in one or more longs to represent array indices, but functionally it's no different than the EnumMap case - we allocate enough slots to cover all possible enum values and can use their integer .ordinal() rather than compute a hash.
So how much faster than HashMap is EnumMap? It's clearly faster, but in truth it's not that much faster. HashMap is already a very efficient data structure, so any optimization on it will only yield marginally better results. In particular, both HashMap and EnumMap are asymptotically the same speed (O(1)), meaning as they get bigger, they behave equally well. This is the primary reason why there isn't a more general data-structure like EnumMap - because it wouldn't be worth the effort relative to HashMap.
The second reason we don't want a more general "FiniteKeysMap" is it would make our lives as users more complicated, which would be worthwhile if it were a marked speed increase, but since it's not would just be a hassle. We would have to define an interface (and probably also a factory pattern) for any type that could be a key in this map. The interface would need to provide a guarantee that every unique instance returns a unique hashcode in the range [0-n), and also provide a way for the map to get n and potentially all n elements. Those last two operations would be better as static methods, but since we can't define static methods in an interface, they'd have to either be passed in directly to every map we create, or a separate factory object with this information would have to exist and be passed to the map/set at construction. Because enums are part of the language, they get all of those benefits for free, meaning there's no such cost for end-user programmers to need to take advantage of.
Furthermore, it would be very easy to make mistakes with this interface; say you have a type that has exactly 100,000 unique values. Should it implement our interface? It could. But you'd likely actually be shooting yourself in the foot. This would eat up a lot of unnecessary memory, since our FiniteKeysMap would allocate a new 100,000 length array to represent even an empty map. Generally speaking, that sort of wasted space is not worth the marginal improvement such a data structure would provide.
In short, while your idea is possible, it is not practical. HashMap is so efficient that attempting to create a separate data structure for a very limited number of cases would add far more complexity than value.
For the specific case of faster .contains() checks, you might like Bloom Filters. It's a set-like data structure that very efficiently stores very large sets, with the condition that it may sometimes incorrectly say an element is in the set when it is not (but not the other way around - if it says an element isn't in the set, it's definitely not). Guava provides a nice BloomFilter implementation.

Assuming I have a good key when is NOT appropriate to use a map in java [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
When to use HashMap over LinkedList or ArrayList and vice-versa
Since coming across Maps in Java i have been using them extensively. In particular HashMap is an excellent option for many scenarios. It seems that it trumps an ArrayList in every category - some say that iteration isn't predictable but for that we have the LinkedHashMap.
So, my question is: why not use a HashMap all the time provided we have a solid immutable key?
Additionally, is it appropriate to use a something like a HashMap for a very small (<10) number of items or is there some additional overhead I am not considering?
Use an ArrayList when your keys are sequential integers. (If they aren't based at 0, just use an offset.) It's much more efficient for access (particularly random access) and update. Otherwise, sure, HashMap (or, as you say, LinkedHashMap) are extremely useful data structures.
I believe that the default initial size for a HashMap is 16 buckets, so there's a bit of overhead for very small lists. But unless you're creating a lot of maps, it shouldn't be a factor in your coding.
HashMap has a significant amount of overhead compared to an array (or ArrayList):
You need to hash the key to get an index into the backing array, then store the value and the key. This is slower and uses more memory than an array. This is more important when your key is something large or complicated, since it will take longer to hash or take more space.
You also need to hash the key every time you look up a value.
When you resize an ArrayList, you just create a new array and copy everything. When you resize a HashMap, you create a new array, then calculate the hashes all over again (so they'll be spread out through the new array).
HashMap's perform badly when they're full, so they generally leave something like 25% of their space empty.
These are all pretty minor, so you could get away with just using HashMaps all the time (in fact, this is what PHP seems to do), but it's wasteful to use a HashMap when you don't really need it.
A comparison might be helpful: Anything you can do with an integer, you can also do with a string, so why do we have integers? Because they're smaller and faster to work with (and provide some nice guarantees, like that they always contain a number).
I use Maps all the time - it is one of the most powerful and versatile data structures. I mostly use LinkedHashMap but, when working with Strings as key I use TreeMap because of the additional benefit of having the keys sorted.
However:
if your key is an int and you plan to use all the keys 0..n,use
an array (remember - int is more efficient than Integer). But a map is better if you have "sparse values"
if you need a list of unindexed items, use a linkedlist
if you need to store unique elements, use a set (why waste space to keep the values if you just need the keys)!
Remember - Java gives you very powerful collections (Set, Map,List) and, for each one, multiple implementations with different features - they are there for a reason.
Every data structure has its use, even if many can be implemented using a map as a backend, the most appropriate data structure is... more appropriate (and, usually, more efficient, with less overhead and providing more functionalities)
The size does not matter - 5 or 500 elements, if it looks like a map, use a map (there maybe few exceptions and corner cases where you need maximum efficiency and hard coded values are better). But if it looks like a set - use a set!

java concurrent map sorted by value

I'm looking for a way to have a concurrent map or similar key->value storage that can be sorted by value and not by key.
So far I was looking at ConcurrentSkipListMap but I couldn't find a way to sort it by value (using Comparator), since compare method receives only the keys as parameters.
The map has keys as String and values as Integer. What I'm looking is a way to retrieve the key with the smallest value(integer).
I was also thinking about using 2 maps, and create a separate map with Integer keys and String values and in this way I will have a sorted map by integer as I wanted, however there can be more than one integers with the same value, which could lead me into more problems.
Example
"user1"=>3
"user2"=>1
"user3"=>3
sorted list:
"user2"=>1
"user1"=>3
"user3"=>3
Is there a way to do this or are any 3rd party libraries that can do this?
Thanks
To sort by value where you can have multiple "value" to "key" mapping, you need a MultiMap. This needs to be synchronized as there is no concurrent version.
This doesn't meant the performance will be poor as that depends on how often you call this data structure. e.g. it could add up to 1 micro-second.
I recently had to do this and ended up using a ConcurrentSkipListMap where the keys contain a string and an integer. I ended up using the answer proposed below. The core insight is that you can structure your code to allow for a duplicate of a key with a different value before removing the previous one.
Atomic way to reorder keys in a ConcurrentSkipListMap / ConcurrentSkipListSet?
The problem was to keep a dynamic set of strings which were associated with integers that could change concurrently from different threads, described below. It sounds very similar to what you wanted to do.
Is there an embeddable Java alternative to Redis?
Here's the code for my implementation:
https://github.com/HarvardEconCS/TurkServer/blob/master/turkserver/src/main/java/edu/harvard/econcs/turkserver/util/UserItemMatcher.java
The principle of a ConcurrentMap is that it can be accessed concurrently - if you want it sorted at any time, performance will suffer significantly as that map would need to be fully synchronized (like a hashtable), resulting in poor throughput.
So I think your best bet is to return a sorted view of your map by putting all elements in an unmodifiable TreeMap for example (although sorting a TreeMap by values needs a bit of tweaking).

Categories

Resources