Iteration through a HashMap (Complexity) - java

From what I understand, the time complexity to iterate through a Hash table with capacity "m" and number of entries "n" is O(n+m). I was wondering, intuitively, why this is the case? For instance, why isn't it n*m?
Thanks in advance!

You are absolutely correct. Iterating a HashMap is an O(n + m) operation, with n being the number of elements contained in the HashMap and m being its capacity. Actually, this is clearly stated in the docs:
Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings).
Intuitively (and conceptually), this is because a HashMap consists of an array of buckets, with each element of the array pointing to either nothing (i.e. to null), or to a list of entries.
So if the array of buckets has size m, and if there are n entries in the map in total (I mean, n entries scattered throughout all the lists hanging from some bucket), then, iterating the HashMap is done by visiting each bucket, and, for buckets that have a list with entries, visiting each entry in the list. As there are m buckets and n elements in total, iteration is O(m + n).
Note that this is not the case for all hash table implementations. For example, LinkedHashMap is like a HashMap, except that it also has all its entries connected in a doubly-linked list fashion (to preserve either insertion or access order). If you are to iterate a LinkedHashMap, there's no need to visit each bucket. It would be enough to just visit the very first entry and then follow its link to the next entry, and then proceed to the next one, etc, and so on until the last entry. Thus, iterating a LinkedHashMap is just O(n), with n being the total number of entries.

Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings)
n = the number of buckets
m = the number of key-value mappings
The complexity of a Hashmap is O(n+m) because the worst-case scenario is one array element contains the whole linked list, which could occur due to flawed implementation of hashcode function being used as a key.
Visialise the worst-case scenario
To iterate this scenario, first java needs to iterate the complete array O(n) and then iterate the linked list O(m), combining these gives us O(n+m).

Related

How hashmap identifies when it needs rehashing

How Hashmap identifies that this bucket is full and it needs rehashing as it stores the value in linked list if two hashcodes are same, then as per my understanding this linkedlist does not have any fixed size and it can store as many elements it can so this bucket will never be full then how it will identify that it needs rehashing?
In a ConcurrentHashMap actually a red-black tree (for large number of elements) or a linked list (for small number of elements) is used when there is a collision (i.e. two different keys have the same hash code). But you are right when you say, the linked list (or red-black tree) can grow infinitely (assuming you have infinite memory and heap size).
The basic idea of a HashMap or ConcurrentHashMap is, you want to retrieve a value based on its key in O(1) time complexity. But in reality, collisions do happen and when that happens we put the nodes in a tree linked to the bucket (or array cell). So, Java could create a HashMap where the array size would remain fixed and rehashing would never happen, but if they did that all your key values need to be accommodated within the fixed-size array (along with their linked trees).
Let's say you have that kind of HashMap where your array size is fixed to 16 and you push 1000 key-value pairs in it. In that case, you can have at most 16 distinct hash codes. This in turn means that you would have collisions in all (1000-16) puts and those new nodes will end up in the tree and they can no longer be fetched in O(1) time complexity. In tree you'd need O(log n) time to search for keys.
To make sure that this doesn't happen, the HashMap uses a load factor calculation to determine how much of the array is filled with key-value pairs. If it is 75% (default settings) full, any new put would create a new larger array, copy the existing content into it and thus have more buckets or hash code space. This ensures that in most of the cases collisions won't happen and the tree won't be required and you would fetch most keys in O(1) time.
Hashmap maintain complexity of O(1) while inserting data in and
getting data from hashmap, but for 13th key-value pair, put request
will no longer be O(1), because as soon as map will realize that 13th
element came in, that is 75% of map is filled.
It will first double the bucket(array) capacity and then it will go
for Rehash. Rehashing requires re-computing hashcode of already placed
12 key-value pairs again and putting them at new index which requires
time.
Kindly refer this link, this will be helpful for you.

How does single instance in each bucket yield best performance in java Hashmap?

I have read in a book that if a hash function returns unique hash value for each distinct object, its most efficient. If hashcode() method in a class gives a unique hash value for each distinct object and I want to store n distinct instance of that class in a Hashmap, then there will be n buckets for storing n instances.The time complexity will be O(n). Then how does single entry(instance) for each hash value yield better performance?Is it related to the data structure of bucket?
You seem to think that having n buckets for n elements, time complexity will be O(n), which is wrong.
How about a different example, suppose you have an ArrayList with n elements, how much time it will take to perform a get(index) for example? O(1) right?
Now think about the HashMap, this index in the ArrayList example is actually the hashCode for the map. When we insert into a HashMap to find the location of where that element goes (the bucket), we use the hashcode (index). If there is an entry per bucket - the search time for a value from the map is O(1).
But even if there are multiple values in a single bucket, the general search complexity for a HashMap is still O(1).
The data structure of the bucket is important too. For the worse case scenarios for example. Under the current implementation of a HashMap it uses two types: LinkedNode and TreeNode; depending on a few things like how many there are in a bucket at this point in time. Linked is easy:
next.next.next...
TreeNode is
- left
node
- right
It is a red-black Tree. In such a data structure search complexity is O(logn) which is much better then O(n).
A Java HashMap associates a Key k with a Value v. Every Java Object has a method hashCode() that produces an integer which is not necessarily unique.
I have read in a book that if a hash function returns unique hash value for each distinct object, its most efficient.
Another definition would be that the best Hash Function is one that produces the least collisions.
If hashcode() method in a class gives a unique hash value for each distinct object and I want to store n distinct instance of that class in a Hashmap, then there will be n buckets for storing n instances.The time complexity will be O(n).
In our case HashMap contains a table of buckets of a certain size, let's say >= n for our purposes. It uses the hashCode of the object as a key and through a Hash Function returns an index to the table. If we have n objects and the Hash Function returns n different indexes we have zero collisions. This is the optimal case and the complexity to find and get any object is O(1).
Now, if the Hash Function returns the same index for 2 different keys (objects) then we have a collision and the table bucket on that index already contains a value. In that case the table bucket will point to another newly allocated bucket. In that order, a list is created on the index that the collision happened. So the worst case complexity will be O(m) where m is the size of the biggest list.
In conclusion the performance of the HashMap depends on the number of collisions. The fewer the better.
I believe this video will help you.

(Java) data structure for fast insertion, deletion, and RANDOM SELECTION

I need a data structure that supports the following operations in O(1):
myList.add(Item)
myList.remove(Item.ID) ==> It actually requires random access
myList.getRandomElement() (with equal probability)
--(Please note that getRandomElement() does not mean random access, it just means: "Give me one of the items at random, with equal probability")
Note that my items are unique, so I don't care if a List or Set is used.
I checked some java data structures, but it seems that none of them is the solution:
HashSet supports 1,2 in O(1), but it cannot give me a random element in O(1). I need to call mySet.iterator().next() to select a random element, which takes O(n).
ArrayList does 1,3 in O(1), but it needs to do a linear search to find the element I want to delete, though it takes O(n)
Any suggestions? Please tell me which functions should I call?
If java does not have such data structure, which algorithm should I use for such purpose?
You can use combination of HashMap and ArrayList if memory permits as follows:-
Store numbers in ArrayList arr as they come.
Use HashMap to give mapping arr[i] => i
While generating random select random form arrayList
Deleting :-
check in HashMap for num => i
swap(i,arr.size()-1)
HashMap.remove(num)
HashMap(arr[i])=> i
arr.remove(arr.size()-1)
All operation are O(1) but extra O(N) space
You can use a HashMap (of ID to array index) in conjunction with an array (or ArrayList).
add could be done in O(1) by simply adding to the array and adding the ID and index to the HashMap.
remove could be done in O(1) by doing a lookup (and removal) from the HashMap to find the index, then move the last index in the array to that index, update that element's index in the HashMap and decreasing the array size by one.
getRandomElement could be done in O(1) by returning a random element from the array.
Example:
Array: [5,3,2,4]
HashMap: [5->0, 3->1, 2->2, 4->3]
To remove 3:
Look up (and remove) key 3 in the HashMap (giving 3->1)
Swap 3 and, the last element, 4 in the array
Update 4's index in the HashMap to 1
Decrease the size of the array by 1
Array: [5,4,2]
HashMap: [5->0, 2->2, 4->1]
To add 6:
Simply add it to the array and HashMap
Array: [5,4,2,6]
HashMap: [5->0, 2->2, 4->1, 6->3]

Order of finding correct bucket in java hashmap

what is order of finding correct bucket in java hashmap??
In hashmap first bucket is located using hashcode method and then we iterate over it using equals method, so my question is on first part, what is complexity of finding the bucket in which desired key is present.
Looking up the bucket is O(1). Hashmap just computes the hashcode and uses it to index into the bucket slots.
This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.

What is the time complexity of java.util.HashMap class' keySet() method?

I am trying to implement a plane sweep algorithm and for this I need to know the time complexity of java.util.HashMap class' keySet() method. I suspect that it is O(n log n). Am I correct?
Point of clarification: I am talking about the time complexity of the keySet() method; iterating through the returned Set will take obviously O(n) time.
Getting the keyset is O(1) and cheap. This is because HashMap.keyset() returns the actual KeySet object associated with the HashMap.
The returned Set is not a copy of the keys, but a wrapper for the actual HashMap's state. Indeed, if you update the set you can actually change the HashMap's state; e.g. calling clear() on the set will clear the HashMap!
... iterating through the returned Set will take obviously O(n) time.
Actually that is not always true:
It is true for a HashMap is created using new HashMap<>(). The worst case is to have all N keys land in the same hash chain. However if the map has grown naturally, there will still be N entries and O(N) slots in the hash array. Thus iterating the entry set will involve O(N) operations.
It is false if the HashMap is created with new HashMap<>(capacity) and a singularly bad (too large) capacity estimate. Then it will take O(Cap) + O(N) operations to iterate the entry set. If we treat Cap as a variable, that is O(max(Cap, N)), which could be worse than O(N).
There is an escape clause though. Since capacity is an int in the current HashMap API, the upper bound for Cap is 231. So for really large values of Cap and N, the complexity is O(N).
On the other hand, N is limited by the amount of memory available and in practice you need a heap in the order of 238 bytes (256GBytes) for N to exceed the largest possible Cap value. And for a map that size, you would be better off using a hashtable implementation tuned for huge maps. Or not using an excessively large capacity estimate!
Surely it would be O(1). All that it is doing is returning a wrapper object on the HashMap.
If you are talking about walking over the keyset, then this is O(n), since each next() call is O(1), and this needs to be performed n times.
This should be doable in O(n) time... A hash map is usually implemented as a large bucket array, the bucket's size is (usually) directly proportional to the size of the hash map. In order to retrieve the key set, the bucket must be iterated through, and for each set item, the key must be retrieved (either through an intermediate collection or an iterator with direct access to the bucket)...
**EDIT: As others have pointed out, the actual keyset() method will run in O(1) time, however, iterating over the keyset or transferring it to a dedicated collection will be an O(n) operation. Not quite sure which one you are looking for **
Java collections have a lot of space and thus don't take much time. That method is, I believe, O(1). The collection is just sitting there.
To address the "iterating through the returned Set will take obviously O(n) time" comment, this is not actually correct per the doc comments of HashMap:
Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
So in other words, iterating over the returned Set will take O(n + c) where n is the size of the map and c is its capacity, not O(n). If an inappropriately sized initial capacity or load factor were chosen, the value of c could outweigh the actual size of the map in terms of iteration time.

Categories

Resources