Java hashmap default size - java

Hi I have a situation where I have to store a single key value pair to my Hashmap . The map size is always constant. i.e., 1 . But default size of hash map is 16 bits . Here am almost wasting nearly 15 bits. Is there any way to limit the size of the hashmap.
Thanks in Advance for your valuable suggestions .

You can provide an initial capacity in the HashMap constructor:
Map<String> map = new HashMap<>(1);
It looks like that is genuinely obeyed in the implementation I'm looking at, but I can easily imagine some implementations having a "minimum viable capacity" such as 16. You'd have to be creating a large number of maps for this to really be an issue.
On the other hand, if you really only need a single-entry map and you're in a performance-sensitive situation, I would suggest you might not want to use HashMap at all. It wouldn't be hard to write a Map implementation which knew that it always had exactly one entry (which would presumably be provided on construction). If your code depends on having a HashMap rather than a Map, you should check whether you really want that top be the case.

If you only need a pair of immutable values, you can use Pair class.
Pair Class
You always can keep the structure with this class, just use getLeft() for getting the key, and getRight() to return the value of your object.

Related

Fast look up of key by value without creating an inverse hashmap?

If I have a HashMap<KEY, VALUE> and I need fast look up of the key by the value is there any other approach besides creating a second HashMap<VALUE, KEY> that store the same data but using the value as the key?
Is there any approach/tick about this? If it makes a difference my interest is about String both as key and value
Note: I am on Java 7
Update:
I am not sure why the other question is a duplicate as I am asking a specific way on implementing this.
Unless the only/best way is a 2 way map I can't see why this is a duplicate
Short answer: no, there isn't.
You need two maps. If you want to use O(1) time for both, that means two hashmaps.
If you're worried about space, don't worry so much: you're just storing duplicate pointers, and not two strings.
I.e., you're just storing
HashMap<String* k, String* v> normal;
HashMap<String* k, String* v> inverse;
rather than entire strings. (Although pointers kind of don't exist in Java.)

HashMap or TreeMap for Map specifically sized 1

I need a Map sized 1 and began to wonder what would be best, a TreeMap or a HashMap?
My thought is that TreeMap would be better, as initialising and adding a value to a HashMap will cause it to create a table of 15 entries, whereas I believe TreeMapis a red-black tree implementation, which at size one will simply have a root node.
With that said, I suppose it depends on the hashCode / compareTo for the key of the HashMap / TreeMap respectively.
Ultimately, I suppose it really doesn't matter in terms of performance, I'm thinking in terms of best practice. I guess the best performance would come from a custom one entry Map implementation but that is just a bit ridiculous.
The canonical way of doing this is to use Collections.singletonMap()
Nice and simple, provided that you also require (or at least, don't mind) immutability.
And yes, internally it is implemented as a custom single-node Map.
As a complete aside, you can create a HashMap with a single bucket if in the constructor you specify a capacity of 1 and a loadFactor that is greater than the number of elements you want to put in. But memory-wise that would still be a bit of a waste as you'd have the overhead of the Entry array, the Entry object and all the other fields HashMap has (like load factor, size, resize treshold).
If you are only looking for a key to value structure you can just use SimpleEntry<K,V> class, this is basically an implementation of Map.Entry<K,V>

Assuming I have a good key when is NOT appropriate to use a map in java [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
When to use HashMap over LinkedList or ArrayList and vice-versa
Since coming across Maps in Java i have been using them extensively. In particular HashMap is an excellent option for many scenarios. It seems that it trumps an ArrayList in every category - some say that iteration isn't predictable but for that we have the LinkedHashMap.
So, my question is: why not use a HashMap all the time provided we have a solid immutable key?
Additionally, is it appropriate to use a something like a HashMap for a very small (<10) number of items or is there some additional overhead I am not considering?
Use an ArrayList when your keys are sequential integers. (If they aren't based at 0, just use an offset.) It's much more efficient for access (particularly random access) and update. Otherwise, sure, HashMap (or, as you say, LinkedHashMap) are extremely useful data structures.
I believe that the default initial size for a HashMap is 16 buckets, so there's a bit of overhead for very small lists. But unless you're creating a lot of maps, it shouldn't be a factor in your coding.
HashMap has a significant amount of overhead compared to an array (or ArrayList):
You need to hash the key to get an index into the backing array, then store the value and the key. This is slower and uses more memory than an array. This is more important when your key is something large or complicated, since it will take longer to hash or take more space.
You also need to hash the key every time you look up a value.
When you resize an ArrayList, you just create a new array and copy everything. When you resize a HashMap, you create a new array, then calculate the hashes all over again (so they'll be spread out through the new array).
HashMap's perform badly when they're full, so they generally leave something like 25% of their space empty.
These are all pretty minor, so you could get away with just using HashMaps all the time (in fact, this is what PHP seems to do), but it's wasteful to use a HashMap when you don't really need it.
A comparison might be helpful: Anything you can do with an integer, you can also do with a string, so why do we have integers? Because they're smaller and faster to work with (and provide some nice guarantees, like that they always contain a number).
I use Maps all the time - it is one of the most powerful and versatile data structures. I mostly use LinkedHashMap but, when working with Strings as key I use TreeMap because of the additional benefit of having the keys sorted.
However:
if your key is an int and you plan to use all the keys 0..n,use
an array (remember - int is more efficient than Integer). But a map is better if you have "sparse values"
if you need a list of unindexed items, use a linkedlist
if you need to store unique elements, use a set (why waste space to keep the values if you just need the keys)!
Remember - Java gives you very powerful collections (Set, Map,List) and, for each one, multiple implementations with different features - they are there for a reason.
Every data structure has its use, even if many can be implemented using a map as a backend, the most appropriate data structure is... more appropriate (and, usually, more efficient, with less overhead and providing more functionalities)
The size does not matter - 5 or 500 elements, if it looks like a map, use a map (there maybe few exceptions and corner cases where you need maximum efficiency and hard coded values are better). But if it looks like a set - use a set!

Iterating through the union of several Java Map key sets efficiently

In one of my Java 6 projects I have an array of LinkedHashMap instances as input to a method which has to iterate through all keys (i.e. through the union of the key sets of all maps) and work with the associated values. Not all keys exist in all maps and the method should not go through each key more than once or alter the input maps.
My current implementation looks like this:
Set<Object> keyset = new HashSet<Object>();
for (Map<Object, Object> map : input) {
for (Object key : map.keySet()) {
if (keyset.add(key)) {
...
}
}
}
The HashSet instance ensures that no key will be acted upon more than once.
Unfortunately this part of the code is rather critical performance-wise, as it is called very frequently. In fact, according to the profiler over 10% of the CPU time is spent in the HashSet.add() method.
I am trying to optimise this code us much as possible. The use of LinkedHashMap with its more efficient iterators (in comparison to the plain HashMap) was a significant boost, but I was hoping to reduce what is essentially book-keeping time to the minimum.
Putting all the keys in the HashSet before-hand, by using addAll() proved to be less efficient, due to the cost of calling HashSet.contains() afterwards.
At the moment I am looking at whether I can use a bitmap (well, a boolean[] to be exact) to avoid the HashSet completely, but it may not be possible at all, depending on my key range.
Is there a more efficient way to do this? Preferrably something that will not pose restrictions on the keys?
EDIT:
A few clarifications and comments:
I do need all the values from the maps - I cannot drop any of them.
I also need to know which map each value came from. The missing part (...) in my code would be something like this:
for (Map<Object, Object> m : input) {
Object v = m.get(key);
// Do something with v
}
A simple example to get an idea of what I need to do with the maps would be to print all maps in parallel like this:
Key Map0 Map1 Map2
F 1 null 2
B 2 3 null
C null null 5
...
That's not what I am actually doing, but you should get the idea.
The input maps are extremely variable. In fact, each call of this method uses a different set of them. Therefore I would not gain anything by caching the union of their keys.
My keys are all String instances. They are sort-of-interned on the heap using a separate HashMap, since they are pretty repetitive, therefore their hash code is already cached and most hash validations (when the HashMap implementation is checking whether two keys are actually equal, after their hash codes match) boil down to an identity comparison (==). The profiler confirms that only 0.5% of the CPU time is spent on String.equals() and String.hashCode().
EDIT 2:
Based on the suggestions in the answers, I made a few tests, profiling and benchmarking along the way. I ended up with roughly a 7% increase in performance. What I did:
I set the initial capacity of the HashSet to double the collective size of all input maps. This gained me something in the region of 1-2%, by eliminating most (all?) resize() calls in the HashSet.
I used Map.entrySet() for the map I am currently iterating. I had originally avoided this approach due to the additional code and the fear that the extra checks and Map.Entry getter method calls would outweigh any advantages. It turned out that the overall code was slightly faster.
I am sure that some people will start screaming at me, but here it is: Raw types. More specifically I used the raw form of HashSet in the code above. Since I was already using Object as its content type, I do not lose any type safety. The cost of that useless checkcast operation when calling HashSet.add() was apparently important enough to produce a 4% increase in performance when removed. Why the JVM insists on checking casts to Object is beyond me...
Can't provide a replacement for your approach but a few suggestions to (slightly) optimize the existing code.
Consider initializing the hash set with a capacity (the sum of the sizes of all maps). This avoids/reduces resizing of the set during an add operation
Consider not using the keySet() as it will always create a new set in the background. Use the entrySet(), that should be much faster
Have a look at the implementations of equals() and hashCode() - if they are "expensive", then you have a negative impact on the add method.
How you avoid using a HashSet depends on what you are doing.
I would only calculate the union once each time the input is changed. This should be relatively rare conmpared with the number of lookups.
// on an update.
Map<Key, Value> union = new LinkedHashMap<Key, Value>();
for (Map<Key, Value> map : input)
union.putAll(map);
// on a lookup.
Value value = union.get(key);
// process each key once
for(Entry<Key, Value> entry: union) {
// do something.
}
Option A is to use the .values() method and iterate through it. But I suppose you already had thought of it.
If the code is called so often, then it might be worth creating additional structures (depending of how often the data is changed). Create a new HashMap; every key in any of your hashmaps is a key in this one and the list keeps the HashMaps where that key appears.
This will help if the data is somewhat static (related to the frequency of queries), so the overload from managing the structure is relatively small, and if the key space is not very dense (keys do not repeat themselves a lot in different HashMaps), as it will save a lot of unneeded contains().
Of course, if you are mixing data structures it is better if you encapsulate all in your own data structure.
You could take a look at Guava's Sets.union() http://guava-libraries.googlecode.com/svn/tags/release04/javadoc/com/google/common/collect/Sets.html#union(java.util.Set,%20java.util.Set)

Should use TreeMap or HashMap for wrapping named parameters?

In most cases, there will be only 0-5 parameters in the map. I guess TreeMap might have a smaller footprint, because it's less sparse then HashMap. But I'm not sure.
Or, maybe it's even better to write my own Map in such case?
The main difference is that TreeMap is a SortedMap, and HashMap is not. If you need your map to be sorted, use a TreeMap, if not then use a HashMap. The performance characteristics and memory usage can vary, but if you only have 0-5 entries then there will be no noticeable difference.
I would not recommend you write your own map unless you need functionality which is not available from the standard Maps, which it sounds like you don't.
I guess TreeMap might have a smaller
footprint, because it's less sparse
then HashMap.
That may actually be wrong, because empty HashMap slots are null and thus take up little space, and TreeMap entries have a higher overhead than HashMap entries because of the child pointers and color flag.
In any case, it's only a concern if you have hundreds of thousands of such maps.
I guess you don't need order of entries in Map, so HashMap is OK for you.
And 5 entries are not performance concern.
You need to write Map which has dozen of methods to implemented, I don't think that is what you need.
If your about 5 keys are always the same (or part of a small set of keys), and you are usually querying them by string literals anyway, and you only seldom have to really parse the keys from user input or similar, then you may think about using an enum type as key type of a EnumMap. This should be even more efficient than a HashMap. The difference will only matter if you have many of these maps, though.

Categories

Resources