If LinkedHashMap's time complexity is the same as HashMap's, why do we need HashMap? What extra overhead does LinkedHashMap have compared to HashMap in Java?
LinkedHashMap will take more memory. Each entry in a normal HashMap holds the key, the value, the cached hash and a link to the next entry in its bucket. Each LinkedHashMap entry has those fields plus references to the next and previous entries in iteration order. There's also a little bit more housekeeping to do, although that's usually irrelevant.
If LinkedHashMap's time complexity is same as HashMap's complexity why do we need HashMap?
You should not confuse complexity with performance. Two algorithms can have the same complexity, yet one can consistently perform better than the other.
Remember that "f(N) is O(N)" means that there exist constants C and N0 such that:
f(N) <= C*N for all N >= N0
where C is a constant. The complexity says nothing about how small or large the C values are. For two different algorithms, the constant C will most likely be different.
(And remember that big-O complexity is about the behavior / performance as N gets very large. It tells you nothing about the behavior / performance for smaller N values.)
Having said that:
The difference in performance between HashMap and LinkedHashMap operations in equivalent use-cases is relatively small.
A LinkedHashMap uses more memory. For example, the Java 11 implementation has two additional reference fields in each map entry to represent the before/after list. On a 64 bit platform without compressed OOPs the extra overhead is 16 bytes per entry.
Relatively small differences in performance and/or memory usage can actually matter a lot to people with performance or memory critical applications [1].
1 - ... and also to people who obsess about these things unnecessarily.
LinkedHashMap additionally maintains a doubly-linked list running through all of its entries, which provides a reproducible order. This linked list defines the iteration ordering, which is normally the order in which keys were inserted into the map (insertion-order).
HashMap doesn't have these extra costs (runtime, space) and should be preferred over LinkedHashMap when you don't care about insertion order.
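To make the ordering difference concrete, here is a minimal sketch. The class and method names (IterationOrderDemo, keysAfterInsert) are made up for illustration; only the java.util classes are real.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class IterationOrderDemo {
    // Inserts the same three keys into the given map and returns
    // the keys in the order the map iterates them.
    static List<String> keysAfterInsert(Map<String, Integer> map) {
        map.put("zebra", 1);
        map.put("apple", 2);
        map.put("mango", 3);
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // LinkedHashMap: keys come back in insertion order.
        System.out.println(keysAfterInsert(new LinkedHashMap<>()));
        // HashMap: the order depends on hashing and is not guaranteed.
        System.out.println(keysAfterInsert(new HashMap<>()));
    }
}
```

The LinkedHashMap call returns zebra, apple, mango. The HashMap call returns the same three keys, but in an order you should not rely on.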
LinkedHashMap is a useful data structure when you need to know the insertion order of keys to the Map. One suitable use case is for the implementation of an LRU cache. Due to order maintenance of the LinkedHashMap, the data structure needs additional memory compared to HashMap. In case insertion order is not a requirement, you should always go for the HashMap.
There is another major difference between HashMap and LinkedHashMap: iteration is more efficient with a LinkedHashMap.
Because the entries in a LinkedHashMap are linked to each other, iteration takes time proportional to the size of the map, regardless of its capacity.
With a HashMap there is no such list, so iteration has to walk the whole bucket array and takes time proportional to its capacity plus its size.
HashMap does not maintain insertion order, and hence does not maintain any doubly linked list.
The most salient feature of LinkedHashMap is that it maintains the insertion order of key-value pairs. LinkedHashMap uses a doubly linked list to do so.
Entry of LinkedHashMap looks like this:
static class Entry<K, V> {
    K key;
    V value;
    Entry<K,V> next;          // next entry in the same bucket
    Entry<K,V> before, after; // for maintaining insertion order

    public Entry(K key, V value, Entry<K,V> next) {
        this.key = key;
        this.value = value;
        this.next = next;
    }
}
By using before and after, we keep track of each newly added entry in the LinkedHashMap, which helps us maintain insertion order.
before refers to the previous entry and
after refers to the next entry in the LinkedHashMap.
For diagrams and step by step explanation please refer http://www.javamadesoeasy.com/2015/02/linkedhashmap-custom-implementation.html
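As a rough sketch of how the before and after references could be maintained (this is not the JDK source; LinkedEntriesSketch, append, unlink and keysInOrder are hypothetical names, and the hash buckets are left out entirely):

```java
import java.util.ArrayList;
import java.util.List;

class LinkedEntriesSketch {
    static class Entry {
        final String key;
        Entry before, after; // neighbours in insertion order

        Entry(String key) {
            this.key = key;
        }
    }

    private Entry head, tail;

    // Appends a new entry at the tail of the insertion-order list in O(1).
    Entry append(String key) {
        Entry e = new Entry(key);
        if (tail == null) {
            head = e;
        } else {
            tail.after = e;
            e.before = tail;
        }
        tail = e;
        return e;
    }

    // Unlinks an entry in O(1) using its before/after references.
    void unlink(Entry e) {
        if (e.before == null) head = e.after; else e.before.after = e.after;
        if (e.after == null) tail = e.before; else e.after.before = e.before;
        e.before = null;
        e.after = null;
    }

    // Walks the list from head to tail, i.e. in insertion order.
    List<String> keysInOrder() {
        List<String> keys = new ArrayList<>();
        for (Entry e = head; e != null; e = e.after) {
            keys.add(e.key);
        }
        return keys;
    }
}
```

Both append and unlink touch only a constant number of references, which is why the ordering adds only a small fixed cost per put or remove.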
LinkedHashMap extends HashMap, which means it uses the existing implementation of HashMap to store keys and values in a Node (Entry object). In addition, it maintains a separate doubly linked list to record the order in which keys were inserted.
It looks like this :
header node <---> node 1 <---> node 2 <---> node 3 <----> node 4 <---> header node.
So the extra overhead is maintaining this doubly linked list on insertion and deletion.
The benefit is: iteration order is guaranteed to be insertion order, which is not the case for HashMap.
Re-sizing is supposed to be faster (at least in older JDK versions) as it iterates through its
double-linked list to transfer the contents into a new table array.
containsValue() is Overridden to take advantage of the faster
iterator.
LinkedHashMap can also be used to create a LRU cache. A special
LinkedHashMap(capacity, loadFactor, accessOrderBoolean) constructor
is provided to create a linked hash map whose order of iteration is
the order in which its entries were last accessed, from
least-recently accessed to most-recently. In this case, merely
querying the map with get() is a structural modification.
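A minimal LRU cache along these lines might look like the sketch below. The class name LruCache and the field maxEntries are illustrative choices; the three-argument constructor and removeEldestEntry are real LinkedHashMap members.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries; // illustrative capacity limit

    LruCache(int maxEntries) {
        // initialCapacity = 16, loadFactor = 0.75, accessOrder = true
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by put() after an insertion; returning true evicts the
    // least-recently accessed entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

For example, with a limit of 2, putting "a" and "b", reading "a", and then putting "c" evicts "b", because "a" was accessed more recently. Note that this class is not thread-safe on its own.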
LinkedHashSet - This implementation spares its clients from the unspecified, generally chaotic ordering provided by HashSet, without incurring the increased cost associated with TreeSet.
The same is said about LinkedHashMap vs TreeMap.
What is this increased cost (LinkedHashMap vs TreeMap) exactly?
Does that mean that TreeSet needs more memory per element? LinkedHashSet needs more memory for two additional links, but TreeSet needs additional memory to store Map.Entry pair of elements (because implicitly based on TreeMap), besides LinkedHashSet is based on HashMap which also has Map.Entry pair of elements overhead...
So the difference is how fast a new element is added (in case of TreeSet it takes longer due to some "sorting").
What are other significant increased costs?
TreeSet/TreeMap have a higher time complexity for operations such as add(), contains() (for TreeSet), put(), containsKey() (for TreeMap), etc., since they require logarithmic time to locate elements in the tree (or add elements to the tree), while LinkedHashSet/LinkedHashMap require expected constant time for those operations.
In terms of memory requirements, there's a very small difference:
TreeMap entries hold key, value, 3 Entry references (left, right, parent) and a boolean.
LinkedHashMap entries hold key, value, 3 Entry references (next, before, after) and an int.
When iterating a HashSet, the iteration order is generally the order of the hash of the object, which is generally not too useful if you want a predictable order.
If sane ordering is important you would generally need to use a TreeSet which iterates in sorted order but at a cost because maintaining the sorted order adds to the complexity of the process.
A LinkedHashSet can be used as a middle-ground solution to the seemingly insane ordering of a HashSet by ensuring that the iteration order is at least consistent by using the insertion order.
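The three orderings can be compared side by side with a small sketch (SetOrderDemo and iterate are hypothetical names used only here):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SetOrderDemo {
    // Adds 30, 10, 20 (in that order) and returns the iteration order.
    static List<Integer> iterate(Set<Integer> set) {
        Collections.addAll(set, 30, 10, 20);
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        System.out.println(iterate(new LinkedHashSet<>())); // insertion order
        System.out.println(iterate(new TreeSet<>()));       // sorted order
        System.out.println(iterate(new HashSet<>()));       // unspecified order
    }
}
```

The LinkedHashSet yields 30, 10, 20 and the TreeSet yields 10, 20, 30. The HashSet result is whatever the hashing happens to produce.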
Lately, I've been going through implementations of the Map interface in Java. I understood HashMap and everything made sense.
But when it comes to LinkedHashMap, as far as I know, the Entry has key, value, before and after, where before and after keep track of the order of insertion.
However, using hashcode and bucket concept doesn't make sense to me in LinkedHashMaps.
I went through this article for understanding implementation of linkedHashMaps
Could someone please explain it? I mean, why does it matter in which bucket we put the entry node? In fact, why have the bucket concept in the first place? Why not a plain doubly linked list?
LinkedHashMap is still a type of a HashMap. It uses the same logic as HashMap in order to find the bucket where a key belongs (used in methods such as get(),put(),containsKey(),etc...). The hashCode() is used to locate that bucket. This logic is essential for the expected O(1) performance of these operations.
The added functionality of LinkedHashMap (which uses the before and after references) is only used to iterate the entries according to insertion order, so it affects the iterators of the Collections returned by the keySet(),entrySet() & values() methods. It doesn't affect where the entries are stored.
Without hashcodes and buckets, LinkedHashMap wouldn't be able to look up keys in the Map in O(1) expected time.
In my Android project I have a method to partially invalidate a cache
(HashMap<Integer, Boolean>).
I am currently using a HashMap for compatibility with third party code.
I found a great answer here but it requires switching to a TreeMap. The given solution is:
treeMap.tailMap(key).clear();
The TreeMap solution is much better than my effort on HashMap:
//where hashMap is a copied instance for the method
for (Integer key : hashMap.keySet()) {
if (key > minPosition) {
hashMap.remove(key);
}
}
Is there a better time/complexity solution for doing this in a HashMap, similar to the TreeMap solution?
If you are required to use HashMap, there is no better (more efficient) solution than iterating the entry set, and removing entries one at a time. This is going to be an O(N) operation. You need to visit / test all entries in the map.
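As you correctly observed, you can bulk remove entries more neatly and more efficiently from a TreeMap. Locating the start of the tail is an O(logN) operation, although clearing the tail view still costs time that grows with the number of entries removed. But the downside is that insertions and deletions are O(logN) rather than O(1).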
A LinkedHashMap can help in certain use-cases, but not in this one. (It orders the entries based on the sequence of insertion, not on the values of the keys.)
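Both approaches can be sketched as follows. The class and method names (CacheInvalidation, removeAboveByIteration, removeAboveWithTailMap) are invented for this example. Note that calling map.remove(key) inside a for-each loop over keySet() will typically fail with ConcurrentModificationException, so the HashMap version removes through the iterator instead.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

class CacheInvalidation {
    // Works for any Map, including HashMap: visits every key, O(N).
    static void removeAboveByIteration(Map<Integer, Boolean> cache, int minPosition) {
        Iterator<Integer> it = cache.keySet().iterator();
        while (it.hasNext()) {
            if (it.next() > minPosition) {
                it.remove(); // removal through the iterator is safe
            }
        }
    }

    // TreeMap only: tailMap(key, false) is a live view of the keys
    // strictly greater than key, so clearing it removes them from the map.
    static void removeAboveWithTailMap(TreeMap<Integer, Boolean> cache, int minPosition) {
        cache.tailMap(minPosition, false).clear();
    }
}
```

The one-argument tailMap(key) is inclusive of key, so the two-argument form with false is used here to match the key > minPosition test. On Java 8 and later, cache.keySet().removeIf(k -> k > minPosition) is a shorter way to write the first method.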
Why does Hashmap internally use a LinkedList instead of an Arraylist when two objects are placed in the same bucket in the hash table?
Why does HashMap internally use a LinkedList instead of an ArrayList, when two objects are placed into the same bucket in the hash table?
Actually, it doesn't use either (!).
It actually uses a singly linked list implemented by chaining the hash table entries. (By contrast, a LinkedList is doubly linked, and it requires a separate Node object for each element in the list.)
So why am I nitpicking here? Because it is actually important ... because it means that the normal trade-off between LinkedList and ArrayList does not apply.
The normal trade-off is:
ArrayList uses less space, but insertion and removal of a selected element is O(N) in the worst case.
LinkedList uses more space, but insertion and removal of a selected element [1] is O(1).
However, in the case of the private singly linked list formed by chaining together HashMap entry nodes, the space overhead is one reference (same as ArrayList), the cost of inserting a node is O(1) (same as LinkedList), and the cost of removing a selected node is also O(1) (same as LinkedList).
Relying solely on "big O" for this analysis is dubious, but when you look at the actual code, it is clear that what HashMap does beats ArrayList on performance for deletion and insertion, and is comparable for lookup. (This ignores memory locality effects.) It also uses less memory for the chaining than it would if either ArrayList or LinkedList were used ... considering that there are already internal entry objects to hold the key / value pairs.
But it gets even more complicated. In Java 8, they overhauled the HashMap internal data structures. In the current implementation, once a hash chain exceeds a certain length threshold (and the table is large enough), the implementation switches that bucket to a balanced binary tree representation, which works best if the key type implements Comparable.
1 - That is, the insertion / deletion is O(1) once you have found the insertion / removal point. For example, if you are using the add and remove methods on a LinkedList object's ListIterator.
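To illustrate the chaining described above, here is a deliberately simplified sketch of a hash table whose entries double as singly linked list nodes. It is not the JDK implementation: ChainedBucketSketch, its fixed table size of 16 and the head-insertion policy are all assumptions made to keep the example short.

```java
class ChainedBucketSketch {
    static class Node {
        final String key;
        int value;
        Node next; // single link to the next node in the same bucket

        Node(String key, int value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node[] table = new Node[16]; // fixed size, no resizing

    private int indexFor(String key) {
        return (key.hashCode() & 0x7fffffff) % table.length;
    }

    void put(String key, int value) {
        int i = indexFor(key);
        for (Node n = table[i]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                n.value = value; // key already present: overwrite
                return;
            }
        }
        table[i] = new Node(key, value, table[i]); // O(1) insert at chain head
    }

    Integer get(String key) {
        for (Node n = table[indexFor(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value;
        }
        return null;
    }

    boolean remove(String key) {
        int i = indexFor(key);
        Node prev = null;
        for (Node n = table[i]; n != null; prev = n, n = n.next) {
            if (n.key.equals(key)) {
                // O(1) unlink once the node and its predecessor are known
                if (prev == null) table[i] = n.next; else prev.next = n.next;
                return true;
            }
        }
        return false;
    }
}
```

The only per-entry cost of chaining here is the single next reference, since the Node objects are needed anyway to hold the key and value.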
This basically boils down to complexities of ArrayList and LinkedList.
Insertion in LinkedList (when order is not important) is O(1): just add at the start.
Insertion in ArrayList (when order is not important) is amortized O(1) when appending at the end, but there is resizing overhead, and a single insert that triggers a resize is O(N).
Removal is O(n) in LinkedList: traverse and adjust pointers.
Removal in ArrayList is also O(n): traverse to the element and shift the elements after it.
Contains will be O(n) in either case.
When using a HashMap we expect O(1) operations for add, remove and contains. Using an ArrayList we would incur a higher cost for the add and remove operations in buckets, because of the resizing and shifting.
Short Answer : Java uses either a linked list or a balanced tree (whichever it finds appropriate for the data).
Long Answer
Although sorted ArrayList looks like the obvious way to go, there are some practical benefits of using LinkedList instead.
We need to remember that LinkedList chain is used only when there is collision of keys.
But by the definition of a good hash function, collisions should be rare.
In the rare cases of collisions we have to choose between a sorted ArrayList and a LinkedList.
If we compare sorted ArrayList and LinkedList there are some clear trade-offs:
Insertion and Deletion : Sorted ArrayList takes O(n), but LinkedList takes constant O(1)
Retrieval : Sorted ArrayList takes O(log n) and LinkedList takes O(n).
Now, it's clear that LinkedLists are better than sorted ArrayLists for insertion and deletion, but worse for retrieval.
If there are fewer collisions, a sorted ArrayList brings less value (but more overhead).
But when collisions are more frequent and the list of collided elements becomes large (over a certain threshold), Java (since version 8) changes the collision data structure from a linked list to a balanced tree, which gives O(log n) retrieval in that bucket.
i need to insert into a very large LinkedList, whose elements i hold in a fast-access HashMap.
it's important to keep the list in order (which is not the natural order of the keys).
i thought it would be possible to hash the linked list nodes, then insert directly on the node (getting the node from the map + insert in a linked list == constant time).
however, i couldn't find any Java collection that would do that or similar...
i'm currently using LinkedHashMap, which doesn't meet the requirements above.
thanks, asaf :-)
If the LinkedList should be sorted after each insertion, I doubt you will be able to find such a data structure, as constant-time sorted insertion would give you a comparison sort with time complexity O(n), which has been proven impossible. (The lower bound for comparison-based sorting is n log n.) The best you could get on insertion is O(log n).
Then you can use the TreeMap data structure.
Use a TreeSet or TreeMap. Inserts are O(log(n)), but remember that means LOG. So if you have 4 billion entries, an insert takes on the order of 32 steps. If you have 2^64 entries, then an insert takes on the order of 64 steps, so it's not really a big deal.
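Since the required order is not the natural order of the keys, a TreeSet or TreeMap would need a custom Comparator. Here is a small sketch that assumes, purely for illustration, an ordering by string length and then alphabetically; SortedInsertDemo and sortedByLength are made-up names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class SortedInsertDemo {
    // Inserts the words one by one; each add() is O(log n) and the
    // set stays ordered by the comparator after every insertion.
    static List<String> sortedByLength(String... words) {
        Comparator<String> order = (a, b) -> {
            int byLength = Integer.compare(a.length(), b.length());
            return byLength != 0 ? byLength : a.compareTo(b);
        };
        TreeSet<String> set = new TreeSet<>(order);
        for (String w : words) {
            set.add(w);
        }
        return new ArrayList<>(set);
    }
}
```

Calling sortedByLength("pear", "fig", "banana", "kiwi") returns fig, kiwi, pear, banana. Keep in mind that a TreeSet treats elements that compare as equal as duplicates, so the comparator has to distinguish every element you want to keep.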
As it's "important to keep the list in order", you're effectively doing an insertion sort, which has a best-case performance of O(n). Additionally, you must not confuse hashing with sorting: the order of a hash isn't defined, as it depends on the size of the underlying hash table. (Using a hash to insert would only be useful if you know the predecessor or successor of the node you want to insert.)
When you say that you must keep your list in order - but that it is not the natural order of the keys, I hear you saying that you must preserve insertion order.
But then I don't know why LinkedHashMap doesn't meet your requirements.
Can you explain what LinkedHashMap fails to do?