Please tell me what happens to the old items in a HashMap if I put in more items than the capacity I specified?
for example:
HashMap<String, Bitmap> hashmap = new HashMap<String, Bitmap>(5);
I set the capacity to 5.
But what happens to the first 5 items and their bitmaps if I put 10 items into this HashMap?
You're only specifying an initial capacity - the HashMap will grow as it needs to anyway, copying the contents internally. The parameter exists only as an optimization: if you know you'll need a large capacity, you can start off with that capacity so that no copying is required.
From the documentation:
An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.
...
If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table.
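A quick demo of this behavior (using String values instead of Bitmap so it runs on plain Java; no entry is ever dropped):

import java.util.HashMap;

public class CapacityDemo {
    public static void main(String[] args) {
        // Initial capacity 5 is only a sizing hint, not a limit.
        HashMap<String, String> map = new HashMap<>(5);
        for (int i = 0; i < 10; i++) {
            map.put("key" + i, "value" + i);
        }
        System.out.println(map.size());      // 10 - nothing was evicted
        System.out.println(map.get("key0")); // value0 - the first entry is intact
    }
}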
Related
I have a HashMap with around 12 elements in it, all at the same index. Now if one more element is inserted at that index, the threshold has been reached, so it will add the element and double the HashMap's size. Similarly, if 12 more elements are added at the same index, it will resize again and the size will be doubled. This leads to wasted space (the buckets at the other indices will be empty).
Any lead/help will be appreciated.
Like separate chaining, open addressing is a method for handling collisions. In open addressing, all elements are stored in the hash table itself, so at any point the size of the table must be greater than or equal to the total number of keys.
Open addressing has several advantages over chaining:
Open addressing provides better cache performance as everything is stored in the same table.
A slot can be used even if the input doesn’t map to it.
You can refer to Hashing overview for more details.
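As a minimal sketch of the idea (linear probing over plain arrays; ProbingTable is a made-up class for illustration, with no resizing or removal, so the caller must keep the number of entries below the capacity):

// Minimal open-addressing (linear probing) table, for illustration only.
class ProbingTable {
    private final String[] keys;
    private final String[] values;

    ProbingTable(int capacity) {
        keys = new String[capacity];
        values = new String[capacity];
    }

    void put(String key, String value) {
        int i = (key.hashCode() & 0x7fffffff) % keys.length;
        // On a collision, step forward until a free or matching slot is found;
        // the entry may end up in a slot its hash didn't originally map to.
        while (keys[i] != null && !keys[i].equals(key)) {
            i = (i + 1) % keys.length;
        }
        keys[i] = key;
        values[i] = value;
    }

    String get(String key) {
        int i = (key.hashCode() & 0x7fffffff) % keys.length;
        while (keys[i] != null) {
            if (keys[i].equals(key)) return values[i];
            i = (i + 1) % keys.length;
        }
        return null;
    }
}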
When a HashMap reaches its allowed size (capacity * loadFactor), it is automatically grown, and after that all elements are relocated to new indices. Why does this relocation need to be performed?
Because it keeps the hash table sparse, allowing elements to sit in their own buckets instead of piling up in a small number of buckets.
When several elements hit the same bucket, the HashMap has to create a list (and sometimes even a tree), which is bad for both memory footprint and retrieval performance. So, to keep the number of such collisions down, the HashMap grows its internal hash table and rehashes.
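A contrived demo of the pile-up (BadKey is made up for the example; its constant hashCode() forces every entry into one bucket):

import java.util.HashMap;

public class PileUpDemo {
    // Contrived key whose hashCode() is constant, so every
    // instance lands in the same bucket.
    static class BadKey {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
    }

    public static void main(String[] args) {
        HashMap<BadKey, String> map = new HashMap<>();
        // All 1000 entries chain (or treeify, in Java 8+) in one bucket,
        // so lookups degrade no matter how large the table grows.
        for (int i = 0; i < 1000; i++) {
            map.put(new BadKey(i), "v" + i);
        }
        System.out.println(map.get(new BadKey(500))); // v500, but via a long walk
    }
}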
The rehashing is required because the calculation used when mapping a key value to a bucket is dependent on the total number of buckets. When the number of buckets changes (to increase capacity), the new mapping calculation may map a given key to a different bucket.
In other words, lookups for some or all of the previous entries may fail to behave properly because the entries are in the wrong buckets after growing the backing store.
While this may seem unfortunate, you actually want the mapping function to take into account the total number of buckets that are available. In this way, all buckets can be utilized and no entries get mapped to buckets that do not exist.
There are other data structures that do not have this property, but this is the standard way that hash maps work.
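You can see the dependence directly: OpenJDK's HashMap effectively derives the bucket index by masking the (bit-spread) hash with the table length, so a sketch of the effect looks like this:

public class BucketIndexDemo {
    public static void main(String[] args) {
        // For a power-of-two table of n buckets, the bucket index is
        // effectively hash & (n - 1), so it changes when n changes.
        int hash = "someKey".hashCode();
        int indexIn16 = hash & (16 - 1); // bucket before the resize
        int indexIn32 = hash & (32 - 1); // bucket after doubling - may differ
        System.out.println(indexIn16 + " vs " + indexIn32);
    }
}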
I have been studying the Java Collections recently. I noticed that ArrayList, ArrayDeque, and HashMap contain helper functions which expand the capacity of the containers if necessary, but none of them has a function to shrink the capacity when the container empties out.
If I am correct, is the memory cost of references (4 bytes each) really so irrelevant?
You're correct, most of the collections have an internal capacity that is expanded automatically and that never shrinks. The exception is ArrayList, which has methods ensureCapacity() and trimToSize() that let the application manage the list's internal capacity explicitly. In practice, I believe these methods are rarely used.
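For example (both methods are on the ArrayList class itself, not on the List interface):

import java.util.ArrayList;

public class TrimDemo {
    public static void main(String[] args) {
        ArrayList<String> list = new ArrayList<>();
        list.ensureCapacity(1_000_000); // grow the backing array once, up front
        for (int i = 0; i < 1_000_000; i++) list.add("item" + i);

        list.subList(100, list.size()).clear(); // keep only the first 100 elements
        list.trimToSize(); // shrink the backing array to the current size
        System.out.println(list.size()); // 100
    }
}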
The policy of growing but not shrinking automatically is based on some assumptions about the usage model of collections:
applications often don't know how many elements they want to store, so the collections will expand themselves automatically as elements are added;
once a collection is fully populated, the number of elements will generally remain around that number, neither growing nor shrinking significantly;
the per-element overhead of a collection is generally small compared to the size of the elements themselves.
For applications that fit these assumptions, the policy seems to work out reasonably well. For example, suppose you insert a million key-value pairs into a HashMap. The default load factor is 0.75, so the internal table size would be 1.33 million. Table sizes are rounded up to the next power of two, which would be 2^21 (2,097,152). In a sense, that's a million or so "extra" slots in the map's internal table. Since each slot is typically a 4-byte object reference, that's 4MB of wasted space!
But consider, you're using this map to store a million key-value pairs. Suppose each key and value is 50 bytes (which seems like a pretty small object). That's 100MB to store the data. Compared to that, 4MB of extra map overhead isn't that big of a deal.
Suppose, though, that you've stored a million mappings, and you want to run through them all and delete all but a hundred mappings of interest. Now you're storing 10KB of data, but your map's table of 2^21 elements is occupying 8MB of space. That's a lot of waste.
But it also seems that performing 999,900 deletions from a map is kind of an unlikely thing to do. If you want to keep 100 mappings, you'd probably create a new map, insert just the 100 mappings you want to keep, and throw away the original map. That would eliminate the space wastage, and it would probably be a lot faster as well. Given this, the lack of an automatic shrinking policy for the collections is usually not a problem in practice.
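A sketch of that idiom (isInteresting() is a hypothetical predicate standing in for whatever selects your hundred mappings):

import java.util.HashMap;
import java.util.Map;

public class ShrinkByCopying {
    // Hypothetical predicate standing in for "the mappings of interest".
    static boolean isInteresting(String key) {
        return key.startsWith("keep-");
    }

    public static void main(String[] args) {
        Map<String, String> bigMap = new HashMap<>();
        for (int i = 0; i < 1_000_000; i++) bigMap.put("k" + i, "v" + i);
        bigMap.put("keep-1", "important");

        // Copy the survivors into a fresh, right-sized map...
        Map<String, String> kept = new HashMap<>();
        for (Map.Entry<String, String> e : bigMap.entrySet()) {
            if (isInteresting(e.getKey())) kept.put(e.getKey(), e.getValue());
        }
        // ...and drop the old map so its oversized table becomes garbage.
        bigMap = kept;
        System.out.println(bigMap.size()); // 1
    }
}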
I am wondering whether those dictionary-like data structures (Hashtable, HashMap, LinkedHashMap, TreeMap, ConcurrentHashMap, SortedMap, and so on) need to perform a rehashing operation when their size reaches the threshold. Since resizing the table is really expensive, is there anything that doesn't require rehashing when the table is resized, or any way to improve the performance of that operation?
SortedMap (TreeMap) doesn't need to rehash; it's implemented as a red-black tree and is thus self-balancing.
Hash-based structures do need to be rehashed to keep their performance; it's a trade-off.
Rehashing can be postponed, though, and that is what the load factor parameter was introduced for: a higher load factor means fewer resizes, at the cost of more collisions per bucket.
To improve performance, estimate an approximate initial capacity based on your application and choose the right load factor when allocating the data structure.
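For example, assuming you know roughly how many entries you will store (the division by the load factor accounts for the resize threshold):

import java.util.HashMap;

public class PreSizedMap {
    public static void main(String[] args) {
        int expectedEntries = 1_000_000;
        float loadFactor = 0.75f;
        // Make capacity * loadFactor exceed the expected entry count,
        // so the map never has to resize and rehash while filling up.
        int initialCapacity = (int) (expectedEntries / loadFactor) + 1;
        HashMap<String, String> map = new HashMap<>(initialCapacity, loadFactor);
        for (int i = 0; i < expectedEntries; i++) map.put("k" + i, "v" + i);
        System.out.println(map.size());
    }
}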
There is no escape from rehashing unless you implement your own HashMap.
A few other facts: rehashing happens only in hash-based data structures, i.e. HashMap, Hashtable, and so on. Non-hash data structures such as TreeMap are not rehashed on reaching a limit (SortedMap is an interface, so we can take it out of the picture).
An ArrayList also resizes whenever it outgrows its current capacity, so use a LinkedList if you do a lot of insertions and deletions in the middle of the data structure.
When there is a collision during a put in a HashMap, is the map resized, or is the entry added to a list in that particular bucket?
When you say 'collision', do you mean the same hash code? The hash code is used to determine which bucket in a HashMap is to be used, and the bucket is made up of a linked list of all the entries that hash to that bucket. The entries are then compared for equality (using .equals()) before being returned (on get) or replaced (on put).
Note that this is the HashMap specifically (since that's the one you asked about), and with other implementations, YMMV.
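A quick illustration of that get/put behavior, using "Aa" and "BB" (two strings that happen to share a hash code in Java):

import java.util.HashMap;

public class EqualsInBucketDemo {
    public static void main(String[] args) {
        // "Aa" and "BB" are a well-known String hashCode collision in Java.
        System.out.println("Aa".hashCode() + " == " + "BB".hashCode()); // both 2112

        HashMap<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2); // same hash, so it chains in the same bucket

        // equals() tells the two keys apart inside the bucket's list.
        System.out.println(map.get("Aa") + " " + map.get("BB")); // 1 2

        map.put("Aa", 3); // same hash AND equals(): the old value is replaced
        System.out.println(map.get("Aa")); // 3
    }
}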
Either could happen - it depends on the fill ratio of the HashMap.
Usually however, it will be added to the list for that bucket - the HashMap class is designed so that resizes are comparatively rare (because they are more expensive).
The documentation of java.util.HashMap explains exactly when the map is resized:
An instance of HashMap has two parameters that affect its performance: initial capacity and load factor.
The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created.
The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased.
When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.
The default initial capacity is 16, the default load factor is 0.75. You can supply other values in the map's constructor.
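For example (the 64 and 0.5f here are arbitrary values, just to show the two-argument constructor):

import java.util.HashMap;

public class ConstructorDemo {
    public static void main(String[] args) {
        // 64 initial buckets; resize once more than 64 * 0.5 = 32 entries exist.
        HashMap<String, Integer> map = new HashMap<>(64, 0.5f);
        System.out.println(map.isEmpty()); // true - capacity is invisible from outside
    }
}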
Resizing is done when the load factor threshold (capacity * load factor) is reached.
When there is a collision during a put in a HashMap, the entry is added to a list in that particular "bucket". If the load factor threshold is reached, the HashMap is resized.