What is increased cost of TreeSet vs LinkedHashSet and TreeMap over LinkedHashMap?

What is increased cost of TreeSet vs LinkedHashSet and TreeMap over LinkedHashMap? - java

LinkedHashSet - This implementation spares its clients from the unspecified, generally chaotic ordering provided by HashSet, without incurring the increased cost associated with TreeSet.
Same is said about LinkedHashMap vs TreeMap
What is this increased cost (LinkedHashMap vs TreeMap) exactly?
Does that mean that TreeSet needs more memory per element? LinkedHashSet needs more memory for two additional links, but TreeSet needs additional memory to store Map.Entry pair of elements (because implicitly based on TreeMap), besides LinkedHashSet is based on HashMap which also has Map.Entry pair of elements overhead...
So the difference is how fast a new element is added (in case of TreeSet it takes longer due to some "sorting").
What are other significant increased costs?

TreeSet/TreeMap have a higher time complexity for operations such ass add(), contains() (for TreeSet), put(), containsKey() (for TreeMap), etc... since they require logarithmic time to locate elements in the tree (or add elements to the tree), while LinkedHashSet/LinkedHashMap require expected constant time for those operations.
In terms of memory requirements, there's a very small difference:
TreeMap entries hold key, value, 3 Entry references (left, right, parent) and a boolean.
LinkedHashMap entries hold key, value, 3 Entry references (next, before, after) and an int.

When iterating a HashSet, the iteration order is generally the order of the hash of the object, which is generally not too useful if you want a predictable order.
If sane ordering is important you would generally need to use a TreeSet which iterates in sorted order but at a cost because maintaining the sorted order adds to the complexity of the process.
A LinkedHashSet can be used as a middle-ground solution to the seemingly insane ordering of a HashSet by ensuring that the iteration order is at least consistent by using the insertion order.

Related

Why does Hashmap Internally use LinkedList instead of Arraylist

Why does Hashmap internally use a LinkedList instead of an Arraylist when two objects are placed in the same bucket in the hash table?

Why does HashMap internally use s LinkedList instead of an Arraylist, when two objects are placed into the same bucket in the hash table?
Actually, it doesn't use either (!).
It actually uses a singly linked list implemented by chaining the hash table entries. (By contrast, a LinkedList is doubly linked, and it requires a separate Node object for each element in the list.)
So why am I nitpicking here? Because it is actually important ... because it means that the normal trade-off between LinkedList and ArrayList does not apply.
The normal trade-off is:
ArrayList uses less space, but insertion and removal of a selected element is O(N) in the worst case.
LinkedList uses more space, but insertion and removal of a selected element1 is O(1).
However, in the case of the private singly linked list formed by chaining together HashMap entry nodes, the space overhead is one reference (same as ArrayList), the cost of inserting a node is O(1) (same as LinkedList), and the cost of removing a selected node is also O(1) (same as LinkedList).
Relying solely on "big O" for this analysis is dubious, but when you look at the actual code, it is clear that what HashMap does beat ArrayList on performance for deletion and insertion, and is comparable for lookup. (This ignores memory locality effects.) And it also uses less memory for the chaining than either ArrayList or LinkedList was used ... considering that there are already internal entry objects to hold the key / value pairs.
But it gets even more complicated. In Java 8, they overhauled the HashMap internal data structures. In the current implementation, once a hash chain exceeds a certain length threshold, the implementation switches to using a binary tree representation if the key type implements Comparable.
1 - That is the insertion / deletion is O(1) if you have found the insertion / removal point. For example, if you are using the insert and remove methods on a LinkedList object's ListIterator.

This basically boils down to complexities of ArrayList and LinkedList.
Insertion in LinkedList (when order is not important) is O(1), just append to start.
Insertion in ArrayList (when order is not important) is O(N) ,traverse to end and there is also resizing overhead.
Removal is O(n) in LinkedList, traverse and adjust pointers.
Removal in arraylist could be O(n^2) , traverse to element and shift elements or resize the Arraylist.
Contains will be O(n) in either cases.
When using a HashMap we will expect O(1) operations for add, remove and contains. Using ArrayList we will incur higher cost for the add, remove operations in buckets

Short Answer : Java uses either LinkedList or ArrayList (whichever it finds appropriate for the data).
Long Answer
Although sorted ArrayList looks like the obvious way to go, there are some practical benefits of using LinkedList instead.
We need to remember that LinkedList chain is used only when there is collision of keys.
But as a definition of Hash function : Collisions should be rare
In rare cases of collisions we have to choose between Sorted ArrayList or LinkedList.
If we compare sorted ArrayList and LinkedList there are some clear trade-offs
Insertion and Deletion : Sorted ArrayList takes O(n), but LinkedList takes constant O(1)
Retrieval : Sorted ArrayList takes O(logn) and LinkedList takes 0(n).
Now, its clear that LinkedList are better than sorted ArrayList during insertion and deletion, but they are bad while retrieval.
In there are fewer collisions, sorted ArrayList brings less value (but more over head).
But when the collisions are more frequent and the collided elements list become large(over certain threshold) Java changes the collision data structure from LinkedList to ArrayList.

Disadvantages of a LinkedHashMap?

Are there any disadvantages of using a LinkedHashMap instead of a HashMap? Most posts seem to discuss the advantages of LinkedHashMaps (such as this one or the API), but I can't find any reason HashMaps are better.

As the docs say, This implementation differs from HashMap in that it maintains a doubly-linked list running through all of its entries.. This has the benefit of allowing predictable iteration order, but the disadvantages are increased memory usage and probably higher insertion cost - nothing comes for free, the additional structure (linked list) uses some memory and requires extra CPU cost in order to be maintained.

Yes there is. LinkedHashMap differs from HashMap in that the order of elements is maintained.
So in order to maintain order, LinkedHashMap needs the expense of maintaining a linked list. Whereas a HashMap has no such overhead leading to a better performance than a LinkedHashMap.
Note that LinkedHashMap implements a normal hashtable, but with the added benefit of the keys of the hashtable being stored as a doubly-linked list.

Is iterating through a TreeSet slower than iterating through a HashSet in Java?

I'm running some benchmarks. One of my tests depends on order, so I'm using a TreeSet for that. My second test doesn't, so I'm using a HashSet for it.
I know that insertion is slower for the TreeSet. But what about iterating through all elements?

TreeSets internally uses TreeMaps which are Red Black Trees (special type of BST) .
BST Inorder Traversal is O(n)
HashSets internally uses HashMaps which use an array for holding Entry objects.
Here also traversal should be O(n) .
Unless you write a benchmark it is going to be difficult to prove which is faster.

From a similar post (Hashset vs Treeset):
HashSet is much faster than TreeSet (constant-time versus log-time for most operations like add, remove and contains) but offers no ordering guarantees like TreeSet.
HashSet:
class offers constant time performance for the basic operations (add, remove, contains and size).
it does not guarantee that the order of elements will remain constant over time
iteration performance depends on the initial capacity and the load factor of the HashSet.
It's quite safe to accept default load factor but you may want to specify an initial capacity that's about twice the size to which you expect the set to grow.
TreeSet:
guarantees log(n) time cost for the basic operations (add, remove and contains)
guarantees that elements of set will be sorted (ascending, natural, or the one specified by you via it's constructor)
doesn't offer any tuning parameters for iteration performance
offers a few handy methods to deal with the ordered set like first(), last(), headSet(), and tailSet() etc
Important points:
Both guarantee duplicate-free collection of elements
It is generally faster to add elements to the HashSet and then convert the collection to a TreeSet for a duplicate-free sorted traversal.
None of these implementation are synchronized. That is if multiple threads access a set concurrently, and at least one of the threads modifies the set, it must be synchronized externally.
LinkedHashSet is in some sense intermediate between HashSet and TreeSet. Implemented as a hash table with a linked list running through it, however it provides insertion-ordered iteration which is not same as sorted traversal guaranteed by TreeSet.
So choice of usage depends entirely on your needs but I feel that even if you need an ordered collection then you should still prefer HashSet to create the Set and then convert it into TreeSet.
e.g. Set<String> s = new TreeSet<String>(hashSet);

If you want stable ordering with (nearly) the performance of a HashSet, then use a LinkedHashSet. You will still get constant-time operations, whereas I would assume a TreeSet will get you logarithmic time.

Contains on TreeSet versus another Set

Is the contains method on TreeSet (Since it is already sorted per default) faster than say HashSet?
The reason I ask is that Collections.binarySearch is quite fast if the List is sorted, so I am thinking that maybe the contains method for TreeSet might be the same.

From the javadoc of TreeSet:
This implementation provides guaranteed log(n) time cost for the basic operations (add, remove and contains).
From the javadoc of HashSet:
This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets.
So the answer is no.
Looking at the implementation (JDK 1.7 oracle), treeset.contains (resp. hashtree) relies on treemap.containsKey (resp. hashmap) method. containsKey loops over one hash bucket in hashmap (which possibly contains only one item), whereas it loops over the whole map, moving from node to node in treemap, using the compareTo method. If your item is the largest or the smallest, this can take significantly more time.
Finally, I just ran a quick test (yes I know, not very reliable) with a tree containing 1m integers and looking for one of the 2 largest, which forces the treeset to browse the whole set. HashSet is quicker by a factor of 50.

How is the implementation of LinkedHashMap different from HashMap?

If LinkedHashMap's time complexity is same as HashMap's complexity why do we need HashMap? What are all the extra overhead LinkedHashMap has when compared to HashMap in Java?

LinkedHashMap will take more memory. Each entry in a normal HashMap just has the key and the value. Each LinkedHashMap entry has those references and references to the next and previous entries. There's also a little bit more housekeeping to do, although that's usually irrelevant.

If LinkedHashMap's time complexity is same as HashMap's complexity why do we need HashMap?
You should not confuse complexity with performance. Two algorithms can have the same complexity, yet one can consistently perform better than the other.
Remember that f(N) is O(N) means that:
limit(f(N), N -> infinity) <= C*N
where C is a constant. The complexity says nothing about how small or large the C values are. For two different algorithms, the constant C will most likely be different.
(And remember that big-O complexity is about the behavior / performance as N gets very large. It tells you nothing about the behavior / performance for smaller N values.)
Having said that:
The difference in performance between HashMap and LinkedHashMap operations in equivalent use-cases is relatively small.
A LinkedHashMap uses more memory. For example, the Java 11 implementation has two additional reference fields in each map entry to represent the before/after list. On a 64 bit platform without compressed OOPs the extra overhead is 16 bytes per entry.
Relatively small differences in performance and/or memory usage can actually matter a lot to people with performance or memory critical applications1.
1 - ... and also to people who obsess about these things unnecessarily.

LinkedHashMap additionally maintains a doubly-linked list running through all of its entries, that will provide a reproducable order. This linked list defines the iteration ordering, which is normally the order in which keys were inserted into the map (insertion-order).
HashMap doesn't have these extra costs (runtime,space) and should prefered over LinkedHashMap when you don't care about insertion order.

LinkedHashMap is a useful data structure when you need to know the insertion order of keys to the Map. One suitable use case is for the implementation of an LRU cache. Due to order maintenance of the LinkedHashMap, the data structure needs additional memory compared to HashMap. In case insertion order is not a requirement, you should always go for the HashMap.

There is another major difference between HashMap and LinkedHashMap :Iteration is more efficient in case of LinkedHashMap.
As Elements in LinkedHashMap are connected with each other so iteration requires time proportional to the size of the map, regardless of its capacity.
But in case of HashMap; as there is no fixed order, so iteration over it requires time proportional to its capacity.
I have put more details on my blog.

HashMap does not maintains insertion order, hence does not maintains any doubly linked list.
Most salient feature of LinkedHashMap is that it maintains insertion order of key-value pairs. LinkedHashMap uses doubly Linked List for doing so.
Entry of LinkedHashMap looks like this:
static class Entry<K, V> {
K key;
V value;
Entry<K,V> next;
Entry<K,V> before, after; //For maintaining insertion order
public Entry(K key, V value, Entry<K,V> next){
this.key = key;
this.value = value;
this.next = next;
}
}
By using before and after - we keep track of newly added entry in LinkedHashMap, which helps us in maintaining insertion order.
Before refer to previous entry and
after refers to next entry in LinkedHashMap.
For diagrams and step by step explanation please refer http://www.javamadesoeasy.com/2015/02/linkedhashmap-custom-implementation.html

LinkedHashMap inherits HashMap, that means it uses existing implementation of HashMap to store key and values in a Node (Entry Object). Other than this it stores a separate doubly linked list implementation to maintain the insertion order in which keys have been entered.
It looks like this :
header node <---> node 1 <---> node 2 <---> node 3 <----> node 4 <---> header node.
So extra overload is maintaining insertion and deletion in this doubly linked list.
Benefit is : Iteration order is guaranteed to be insertion order, which is not in HashMap.

Re-sizing is supposed to be faster as it iterates through its
double-linked list to transfer the contents into a new table array.
containsValue() is Overridden to take advantage of the faster
iterator.
LinkedHashMap can also be used to create a LRU cache. A special
LinkedHashMap(capacity, loadFactor, accessOrderBoolean) constructor
is provided to create a linked hash map whose order of iteration is
the order in which its entries were last accessed, from
least-recently accessed to most-recently. In this case, merely
querying the map with get() is a structural modification.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.