Are there any disadvantages of using a LinkedHashMap instead of a HashMap? Most posts seem to discuss the advantages of LinkedHashMaps (such as this one or the API), but I can't find any reason HashMaps are better.
As the docs say, "This implementation differs from HashMap in that it maintains a doubly-linked list running through all of its entries." This has the benefit of allowing a predictable iteration order, but the disadvantages are increased memory usage and probably higher insertion cost. Nothing comes for free: the additional structure (the linked list) uses some memory and requires extra CPU cost to maintain.
Yes, there is. LinkedHashMap differs from HashMap in that the order of elements is maintained.
In order to maintain that order, LinkedHashMap bears the expense of maintaining a linked list, whereas HashMap has no such overhead, giving it better performance than LinkedHashMap.
Note that LinkedHashMap implements a normal hashtable, with the added benefit that its entries are also strung together as a doubly-linked list.
From Javadoc:
Hash table and linked list implementation of the Map interface, with predictable iteration order. This implementation differs from HashMap in that it maintains a doubly-linked list running through all of its entries.
If that is so, then why doesn't it provide index-based access the way a List in Java does, e.g.
list.get(index);
UPDATE
I had implemented an LRU cache using LinkedHashMap. My algorithm required me to access the LRU object in the cache. That's why I required random access, but I figured that would cost me bad performance, so I changed the logic and now access the LRU object only when the cache is full, using removeEldestEntry().
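A minimal sketch of that approach (assuming the access-order constructor; the capacity of 100 and the String/Object types are placeholders):
Map<String, Object> cache = new LinkedHashMap<String, Object>(16, 0.75f, true) { // accessOrder = true
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
        return size() > 100; // placeholder capacity; consulted after each put(), true evicts the LRU entry
    }
};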
Thank you all...
a) Because the entries are linked, not randomly accessible. The performance would be miserable, O(N) if I'm not in error.
b) Because there is no interface to back up this functionality. So the choice would be either to introduce a dedicated interface just for this (badly performing) implementation, or to require clients to program against implementation classes instead of interfaces.
Btw with Guava there's a simple solution for you:
Iterables.get(map.values(), offset);
And for caching, look at Guava's MapMaker and its expiration features.
Since values() provides a backing collection of the values, you can solve it like this:
map.values().remove(map.values().toArray()[index]);
Perhaps not very efficient (especially memory-wise), but it should be O(N) just as you would expect it to be.
Btw, I think the question is legitimate for all List operations. (It shouldn't be slower than LinkedList anyway, right?)
I set out to write a LinkedHashMapList which extended LinkedHashMap and implemented the List interface. Surprisingly, it seems impossible to do, due to the clash over remove: the existing Map remove method returns the previously mapped value, while List.remove(Object) must return a boolean.
That's just a reflection, and honestly, I also find it annoying that the LinkedHashMap can't be treated more like a LinkedList.
It provides an Iterator interface; each node in the list is linked to the one before it and after it. Having a get(i) method would be no different from iterating over all the elements in the list, since there is no backing array (same as LinkedList).
If you require this (not very performant) ability, I suggest extending the map yourself.
If you want random access you can do
Map<K,V> map = new LinkedHashMap<K,V>();
// Map has no toArray(); copy the entry set instead (unchecked cast):
Map.Entry<K,V>[] entries = (Map.Entry<K,V>[]) map.entrySet().toArray(new Map.Entry[map.size()]);
Map.Entry<K,V> entryN = entries[n];
As you can see the performance is likely to be very poor unless you cache the entries array.
I would question the need for it however.
There is no real problem making a Map with O(log N) access by index. If you use a red-black tree and store, for each node, the number of elements in the subtree rooted at that node, it is possible to write a get(int index) method that runs in O(log N).
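A minimal sketch of just the index lookup, assuming a hypothetical size-augmented node type (nothing like this exists in the JDK's TreeMap):
static final class Node<K, V> {
    K key;
    V value;
    Node<K, V> left, right;
    int size = 1; // number of entries in the subtree rooted at this node

    static int sizeOf(Node<?, ?> n) {
        return n == null ? 0 : n.size;
    }
}

// Returns the index-th smallest entry; O(log N) when the tree is balanced.
static <K, V> Node<K, V> getByIndex(Node<K, V> root, int index) {
    Node<K, V> n = root;
    while (n != null) {
        int leftSize = Node.sizeOf(n.left);
        if (index < leftSize) {
            n = n.left;               // target lies in the left subtree
        } else if (index == leftSize) {
            return n;                 // exactly leftSize entries precede this node
        } else {
            index -= leftSize + 1;    // skip the left subtree and this node
            n = n.right;
        }
    }
    throw new IndexOutOfBoundsException();
}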
LinkedHashSet - This implementation spares its clients from the unspecified, generally chaotic ordering provided by HashSet, without incurring the increased cost associated with TreeSet.
Same is said about LinkedHashMap vs TreeMap
What is this increased cost (LinkedHashMap vs TreeMap) exactly?
Does that mean that TreeSet needs more memory per element? LinkedHashSet needs extra memory for two additional links, but TreeSet needs additional memory to store Map.Entry pairs (because it is implicitly based on TreeMap); then again, LinkedHashSet is based on HashMap, which also has the Map.Entry pair overhead...
So the difference is how fast a new element is added (in case of TreeSet it takes longer due to some "sorting").
What are other significant increased costs?
TreeSet/TreeMap have a higher time complexity for operations such as add() and contains() (for TreeSet), or put() and containsKey() (for TreeMap), etc., since they require logarithmic time to locate elements in the tree (or add elements to it), while LinkedHashSet/LinkedHashMap require expected constant time for those operations.
In terms of memory requirements, there's a very small difference:
TreeMap entries hold key, value, 3 Entry references (left, right, parent) and a boolean.
LinkedHashMap entries hold key, value, 3 Entry references (next, before, after) and an int.
When iterating a HashSet, the iteration order is generally the order of the hash of the object, which is generally not too useful if you want a predictable order.
If sane ordering is important, you would generally need to use a TreeSet, which iterates in sorted order, but at a cost: maintaining the sorted order adds to the complexity of the process.
A LinkedHashSet can be used as a middle-ground solution to the seemingly insane ordering of a HashSet by ensuring that the iteration order is at least consistent by using the insertion order.
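A small illustration of the three iteration orders (the exact HashSet order is hash-dependent and may differ on your JVM):
import java.util.*;

public class OrderDemo {
    public static void main(String[] args) {
        List<String> input = Arrays.asList("banana", "apple", "cherry");
        System.out.println(new HashSet<>(input));       // hash-dependent, e.g. [banana, cherry, apple]
        System.out.println(new LinkedHashSet<>(input)); // insertion order: [banana, apple, cherry]
        System.out.println(new TreeSet<>(input));       // sorted order:    [apple, banana, cherry]
    }
}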
I'm running some benchmarks. One of my tests depends on order, so I'm using a TreeSet for that. My second test doesn't, so I'm using a HashSet for it.
I know that insertion is slower for the TreeSet. But what about iterating through all elements?
TreeSet internally uses a TreeMap, which is a red-black tree (a special type of BST).
In-order traversal of a BST is O(n).
HashSet internally uses a HashMap, which holds its Entry objects in an array.
Here, too, traversal should be O(n).
Unless you write a benchmark it is going to be difficult to prove which is faster.
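A naive timing sketch along those lines (for trustworthy numbers you would want a harness like JMH; the element count here is arbitrary):
import java.util.*;

public class IterationBench {
    public static void main(String[] args) {
        Set<Integer> hashSet = new HashSet<>();
        Set<Integer> treeSet = new TreeSet<>();
        for (int i = 0; i < 1_000_000; i++) {
            hashSet.add(i);
            treeSet.add(i);
        }
        for (int run = 0; run < 5; run++) { // repeat so the JIT warms up
            System.out.println("HashSet: " + millis(hashSet) + " ms, TreeSet: " + millis(treeSet) + " ms");
        }
    }

    static long millis(Set<Integer> set) {
        long start = System.nanoTime();
        long sum = 0;
        for (int x : set) {
            sum += x; // consume each element so the loop isn't optimized away
        }
        long elapsed = System.nanoTime() - start;
        if (sum == -1) throw new AssertionError(); // keep 'sum' observably live
        return elapsed / 1_000_000;
    }
}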
From a similar post (Hashset vs Treeset):
HashSet is much faster than TreeSet (constant-time versus log-time for most operations like add, remove and contains) but offers no ordering guarantees like TreeSet.
HashSet:
offers constant-time performance for the basic operations (add, remove, contains and size)
it does not guarantee that the order of elements will remain constant over time
iteration performance depends on the initial capacity and the load factor of the HashSet.
It's quite safe to accept default load factor but you may want to specify an initial capacity that's about twice the size to which you expect the set to grow.
TreeSet:
guarantees log(n) time cost for the basic operations (add, remove and contains)
guarantees that elements of the set will be sorted (in ascending, natural, or the order specified by you via its constructor)
doesn't offer any tuning parameters for iteration performance
offers a few handy methods for dealing with an ordered set, like first(), last(), headSet() and tailSet()
Important points:
Both guarantee a duplicate-free collection of elements
It is generally faster to add elements to the HashSet and then convert the collection to a TreeSet for a duplicate-free sorted traversal.
Neither of these implementations is synchronized. That is, if multiple threads access a set concurrently and at least one of the threads modifies it, the set must be synchronized externally.
LinkedHashSet is in some sense intermediate between HashSet and TreeSet. It is implemented as a hash table with a linked list running through it; however, it provides insertion-ordered iteration, which is not the same as the sorted traversal guaranteed by TreeSet.
So the choice depends entirely on your needs, but I feel that even if you need an ordered collection, you should still prefer HashSet to create the Set and then convert it into a TreeSet,
e.g. Set<String> s = new TreeSet<String>(hashSet);
If you want stable ordering with (nearly) the performance of a HashSet, then use a LinkedHashSet. You will still get constant-time operations, whereas I would assume a TreeSet will get you logarithmic time.
I'm thinking about filling a collection with a large amount of unique objects.
How does the cost of an insert into a Set (say, HashSet) compare to a List (say, ArrayList)?
My feeling is that duplicate elimination in sets might cause a slight overhead.
There is no "duplicate elimination" such as comparing to all existing elements. If you insert into hash set, it's really a dictionary of items by hash code. There's no duplicate checking unless there already are items with the same hash code. Given a reasonable (well-distributed) hash function, it's not that bad.
As Will has noted, because of the dictionary structure HashSet is probably a bit slower than an ArrayList (unless you want to insert "between" existing elements). It also is a bit larger. I'm not sure that's a significant difference though.
You're right: set structures are inherently more complex in order to recognize and eliminate duplicates. Whether this overhead is significant for your case should be tested with a benchmark.
Another factor is memory usage. If your objects are very small, the memory overhead introduced by the set structure can be significant. In the most extreme case (TreeSet<Integer> vs. ArrayList<Integer>) the set structure can require more than 10 times as much memory.
If you're certain your data will be unique, use a List. You can use a Set to enforce this rule.
Sets are faster than Lists at membership checks if you have a large data set, while the inverse can be true for smaller data sets. I haven't personally tested this claim.
Which type of List?
Also, consider which List to use. LinkedList is faster at adding and removing elements at the ends or through an iterator.
ArrayList is faster at random access (for loops, etc.), but this can be worked around by using the Iterator of a LinkedList. ArrayList is also much faster at list.toArray().
You have to compare concrete implementations (for example HashSet with ArrayList), because the abstract interfaces Set/List don't really tell you anything about performance.
Inserting into a HashSet is a pretty cheap operation, as long as the hashCode() of the object to be inserted is sane. It will still be slightly slower than ArrayList, because ArrayList's insertion is a simple append into an array (assuming you insert at the end and there's still free space; I don't factor in resizing the internal array, because the same cost applies to HashSet as well).
If the goal is uniqueness of the elements, you should use an implementation of the java.util.Set interface. The classes java.util.HashSet and java.util.LinkedHashSet have expected O(1) complexity for insert, delete and contains checks.
ArrayList has O(n) for a contains check by object (not by index), since you have to scan through the whole list, and for insertion anywhere but the tail, since you have to shift the whole underlying array.
You can use LinkedHashSet, which preserves the order of insertion and has the same performance characteristics as HashSet (it just takes up a bit more memory).
I don't think you can make this judgement simply on the cost of building the collection. Other things that you need to take into account are:
Is the input dataset ordered? Is there a requirement that the output data structure preserves insertion order?
Is there a requirement that the output data structure is ordered (or reordered) based on element values?
Will the output data structure be subsequently modified? How?
Is there a requirement that the output data structure is duplicate free if other elements are added subsequently?
Do you know how many elements are likely to be in the input dataset?
Can you measure the size of the input dataset? (Or is it provided via an iterator?)
Does space utilization matter?
These can all affect your choice of data structure.
Java List:
If you have no requirement to reject duplicates, you can use a List instead of a Set.
List is an interface in the Collections framework which extends the Collection interface; ArrayList and LinkedList are implementations of the List interface.
When to use ArrayList or LinkedList
ArrayList: If in your application most of the work is accessing data, you should go for ArrayList. ArrayList is backed by an array, so it can access data by index in O(1) time, and it implements the RandomAccess marker interface to signal this. You can also use ArrayList over LinkedList when you want to retrieve data in insertion order.
LinkedList: If most of your work is insertion or deletion, you should use LinkedList over ArrayList, because in LinkedList insertion and deletion at a known position (at the ends, or via an iterator) happen in O(1) time, whereas in ArrayList they take O(n) time since elements of the underlying array must be shifted. See the sketch below.
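A small sketch contrasting the two (the values are made up for illustration):
import java.util.*;

public class ListChoiceDemo {
    public static void main(String[] args) {
        List<String> arrayList = new ArrayList<>(Arrays.asList("a", "b", "d"));
        System.out.println(arrayList.get(2)); // O(1) index access: prints "d"

        List<String> linkedList = new LinkedList<>(arrayList);
        ListIterator<String> it = linkedList.listIterator();
        while (it.hasNext()) {
            if (it.next().equals("b")) {
                it.add("c"); // O(1) insertion at the iterator's current position
            }
        }
        System.out.println(linkedList); // [a, b, c, d]
    }
}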
Java Set:
If your application requires that there be no duplicates, you should go for a Set instead of a List, because a Set doesn't store duplicates. A hash-based Set works on the principle of hashing: when you add an object, it first uses the object's hashCode to locate a bucket, and if an equal object (per equals()) is already present in that bucket, the new object is not added.
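For example:
Set<String> set = new HashSet<>();
System.out.println(set.add("x")); // true  - "x" was added
System.out.println(set.add("x")); // false - rejected: same hashCode() and equals()
System.out.println(set);          // [x]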
If LinkedHashMap's time complexity is the same as HashMap's, why do we need HashMap? What extra overhead does LinkedHashMap have compared to HashMap in Java?
LinkedHashMap will take more memory. Each entry in a normal HashMap just has the key and the value. Each LinkedHashMap entry has those references and references to the next and previous entries. There's also a little bit more housekeeping to do, although that's usually irrelevant.
If LinkedHashMap's time complexity is same as HashMap's complexity why do we need HashMap?
You should not confuse complexity with performance. Two algorithms can have the same complexity, yet one can consistently perform better than the other.
Remember that "f(N) is O(N)" means that:
f(N) <= C * N, for all N greater than some N0
where C is a constant. The complexity says nothing about how small or large C is, and for two different algorithms the constant C will most likely be different.
(And remember that big-O complexity is about the behavior / performance as N gets very large. It tells you nothing about the behavior / performance for smaller N values.)
Having said that:
The difference in performance between HashMap and LinkedHashMap operations in equivalent use-cases is relatively small.
A LinkedHashMap uses more memory. For example, the Java 11 implementation has two additional reference fields in each map entry to represent the before/after list. On a 64 bit platform without compressed OOPs the extra overhead is 16 bytes per entry.
Relatively small differences in performance and/or memory usage can actually matter a lot to people with performance or memory critical applications1.
1 - ... and also to people who obsess about these things unnecessarily.
LinkedHashMap additionally maintains a doubly-linked list running through all of its entries, which provides a reproducible order. This linked list defines the iteration ordering, which is normally the order in which keys were inserted into the map (insertion order).
HashMap doesn't have these extra costs (runtime, space) and should be preferred over LinkedHashMap when you don't care about insertion order.
LinkedHashMap is a useful data structure when you need to know the insertion order of keys to the Map. One suitable use case is for the implementation of an LRU cache. Due to order maintenance of the LinkedHashMap, the data structure needs additional memory compared to HashMap. In case insertion order is not a requirement, you should always go for the HashMap.
There is another major difference between HashMap and LinkedHashMap: iteration is more efficient in the case of LinkedHashMap.
Since the elements in a LinkedHashMap are connected to each other, iteration requires time proportional to the size of the map, regardless of its capacity.
But in the case of HashMap, there is no fixed order, so iteration requires time proportional to its capacity (the number of buckets).
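A quick way to observe this (a rough sketch; the capacity is arbitrary and timings will vary by JVM):
import java.util.*;

public class IterationCapacityDemo {
    public static void main(String[] args) {
        Map<Integer, Integer> hashMap = new HashMap<>(1 << 20);   // ~1M buckets, only 10 entries
        Map<Integer, Integer> linkedMap = new LinkedHashMap<>(1 << 20);
        for (int i = 0; i < 10; i++) {
            hashMap.put(i, i);
            linkedMap.put(i, i);
        }
        long t0 = System.nanoTime();
        long sum = 0;
        for (int v : hashMap.values()) sum += v;   // must scan ~1M mostly-empty buckets
        long t1 = System.nanoTime();
        for (int v : linkedMap.values()) sum += v; // just follows 10 links
        long t2 = System.nanoTime();
        System.out.println("HashMap: " + (t1 - t0) + " ns, LinkedHashMap: " + (t2 - t1) + " ns (sum " + sum + ")");
    }
}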
HashMap does not maintain insertion order, and hence does not maintain any doubly-linked list.
The most salient feature of LinkedHashMap is that it does maintain the insertion order of key-value pairs, and it uses a doubly-linked list for doing so.
Entry of LinkedHashMap looks like this:
static class Entry<K, V> {
    K key;
    V value;
    Entry<K, V> next;          // hash-bucket chain, as in HashMap
    Entry<K, V> before, after; // for maintaining insertion order

    public Entry(K key, V value, Entry<K, V> next) {
        this.key = key;
        this.value = value;
        this.next = next;
    }
}
By using before and after we keep track of each newly added entry in the LinkedHashMap, which is what maintains the insertion order:
before refers to the previous entry, and
after refers to the next entry in the LinkedHashMap.
For diagrams and step by step explanation please refer http://www.javamadesoeasy.com/2015/02/linkedhashmap-custom-implementation.html
LinkedHashMap extends HashMap, which means it uses the existing HashMap implementation to store keys and values in a Node (Entry object). Beyond this, it maintains a separate doubly-linked list implementation to preserve the order in which keys were inserted.
It looks like this :
header node <---> node 1 <---> node 2 <---> node 3 <----> node 4 <---> header node.
So the extra overhead is maintaining this doubly-linked list on every insertion and deletion.
The benefit is: iteration order is guaranteed to be the insertion order, which is not the case with HashMap.
Resizing is supposed to be faster, as the map can iterate through its doubly-linked list to transfer the contents into the new table array.
containsValue() is overridden to take advantage of the faster iterator.
LinkedHashMap can also be used to create an LRU cache. A special LinkedHashMap(capacity, loadFactor, accessOrder) constructor is provided to create a linked hash map whose order of iteration is the order in which its entries were last accessed, from least-recently to most-recently accessed. In this case, merely querying the map with get() is a structural modification.
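A minimal sketch of that access-order behavior (the keys and values are made up for illustration):
Map<String, Integer> map = new LinkedHashMap<>(16, 0.75f, true); // accessOrder = true
map.put("a", 1);
map.put("b", 2);
map.put("c", 3);
map.get("a");                     // an access: moves "a" to the most-recently-used end
System.out.println(map.keySet()); // [b, c, a] - least-recently accessed first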