Is the contains operation on a HashMap dependent on the size of the HashMap? - java

As per my understanding of HashMap:
Question 1:
For a HashMap that returns a unique hash code for each key,
the time to determine whether an object is contained in the HashMap is constant
and does not depend on the size of the HashMap.
Question 2:
For a HashMap that returns the same hash code for every key but returns false from the equals method,
the time to determine whether an object is contained in the HashMap depends on the size of the HashMap.
Is that true?

HashMap lookups are generally considered to take O(1) time, but that is the average case. In the worst case a lookup can be O(n) as well; for example, this can happen when all entries collide and the implementation stores them in a linked list. It can be mitigated by using self-balancing trees for collision bins, which reduces the worst case to O(log n) time.

If we have an appropriately written hash function, then yes, retrieval is O(1).
Think of it this way: if your hash function distributes elements evenly across buckets, then the time to search for an element is proportional to the size of its bucket. If bucket sizes stay bounded by a constant, and the number of buckets (i.e. memory) is not a constraint, then you can retrieve an element in constant time.
Regarding your second question: yes, if your hash function returns the same hash code for every key, then retrieval time is proportional to the size of the HashMap, i.e. O(n).
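As a rough illustration of both cases, the sketch below (the key classes and sizes are made up for the example, and the timing is only indicative) compares containsKey for keys with distinct hash codes against keys that all return the same constant:

import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class ContainsDemo {

    // Key with a reasonable hashCode: lookups stay O(1) on average.
    static final class GoodKey {
        final int id;
        GoodKey(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof GoodKey && ((GoodKey) o).id == id;
        }
        @Override public int hashCode() { return Objects.hash(id); }
    }

    // Key with a constant hashCode: every entry lands in the same bucket,
    // so containsKey degrades towards O(n) (or O(log n) with Java 8 tree bins).
    static final class BadKey {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        @Override public int hashCode() { return 42; }
    }

    public static void main(String[] args) {
        Map<GoodKey, String> good = new HashMap<>();
        Map<BadKey, String> bad = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {      // kept small so the demo runs quickly
            good.put(new GoodKey(i), "v");
            bad.put(new BadKey(i), "v");
        }

        long t1 = System.nanoTime();
        for (int i = 0; i < 1_000; i++) good.containsKey(new GoodKey(9_999));
        long t2 = System.nanoTime();
        for (int i = 0; i < 1_000; i++) bad.containsKey(new BadKey(9_999));
        long t3 = System.nanoTime();

        System.out.println("distinct hash codes: " + (t2 - t1) / 1_000 + " ns/lookup");
        System.out.println("constant hash code:  " + (t3 - t2) / 1_000 + " ns/lookup");
    }
}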


What type of Map should I use if the data does not change?

I have data that I want to lookup by key.
My particular use case is that the data (key/value and number of elements) does not change once the map is initialised. All key/value values are known at once.
I have generally used a HashMap for this with the default constructor (default initial capacity and load factor).
What is the best way to build this Map? If I were to use a HashMap, what should the initial capacity and load factor be set to? Is Map.copyOf() a better solution? Does the size of the map matter (20 elements vs 140,000)?
This article https://docs.oracle.com/en/java/javase/15/core/creating-immutable-lists-sets-and-maps.html#GUID-6A9BAE41-A1AD-4AA1-AF1A-A8FC99A14199 seems to imply that the immutable Map returned by Map.copyOf() is more space efficient.
HashMap is fairly close to optimal in most cases already. The array of buckets doubles in capacity each time it fills up, so it's most wasteful when you have (2^N) + 1 items, since the capacity will necessarily be 2^(N+1) (i.e. with a load factor of 1, 2049 items require a capacity of 4096, but 2048 items fit perfectly).
In your case, specifying an initial size will only prevent a few reallocations when the map is created, which if it only happens once probably isn't relevant. Load factor is not relevant because the map's capacity will never change. In any case, if you did want to pre-size, this would be correct:
new HashMap<>(numItems, 1);
Does the size of the map matter (20 elements vs 140,000)?
It will have an impact, but not a massive one. Items are grouped into buckets, and buckets are structured as lists or trees. So the performance is mostly dependent on how many items are in a given bucket, rather than the total number of items across all buckets.
What's important is how evenly distributed across your buckets the items are. A bad hash code implementation will result in clustering. Clustering will start to move O(1) operations towards O(log n), I believe.
// The worst possible hashCode implementation.
@Override
public int hashCode() { return 0; } // or any other constant
If you have the same items in the map across multiple invocations of your application (not clear from the question if that's the case), and if the class of the key is under your control, then you have the luxury of being able to tweak the hashCode implementation to positively affect the distribution, e.g. by using different prime numbers as a modulus. This would be trial and error, though, and is really only a micro-optimization.
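For contrast with the constant hashCode above, a typical field-based implementation (just a sketch with a made-up class, not the prime-modulus tuning the answer alludes to) might look like this:

import java.util.Objects;

// Hypothetical key class; a field-based hashCode like this usually
// distributes keys well enough that clustering is not a concern.
final class CustomerId {
    private final String region;
    private final long number;

    CustomerId(String region, long number) {
        this.region = region;
        this.number = number;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CustomerId)) return false;
        CustomerId other = (CustomerId) o;
        return number == other.number && Objects.equals(region, other.region);
    }

    @Override
    public int hashCode() {
        // Objects.hash combines the fields using 31 as the multiplier.
        return Objects.hash(region, number);
    }
}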
As for the comments/answers addressing how to confer immutability, I'd argue that that's a separate concern. First work out what map is actually optimal, then worry about how to confer immutability upon it, if it isn't already. You can always wrap a mutable map in Collections.unmodifiableMap. Supposedly Guava's ImmutableMap is slower than HashMap, and I suspect other immutable variants will struggle to exceed the performance of HashMap too.
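A minimal sketch of the options discussed here (a pre-sized HashMap, wrapping it with Collections.unmodifiableMap, and copying it with Map.copyOf), assuming all key/value pairs are known up front:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class StaticLookupTable {
    public static void main(String[] args) {
        // Option 1: pre-sized mutable HashMap (load factor 1 since it never grows).
        Map<String, Integer> sized = new HashMap<>(3, 1f);
        sized.put("a", 1);
        sized.put("b", 2);
        sized.put("c", 3);

        // Option 2: wrap the HashMap so callers cannot modify it.
        Map<String, Integer> unmodifiable = Collections.unmodifiableMap(sized);

        // Option 3: Map.copyOf (Java 10+) produces a compact immutable copy.
        Map<String, Integer> copy = Map.copyOf(sized);

        System.out.println(unmodifiable.get("b")); // 2
        System.out.println(copy.get("c"));         // 3
    }
}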

Java HashSet worst case lookup time complexity

If hashtables/maps with closed hashing are worst-case O(n), are HashSets also going to require O(n) time for lookup, or is it constant time?
When looking up an element in a HashMap, it performs an O(1) calculation to find the right bucket, and then iterates over the items there serially until it finds the one that is equal to the requested key, or all the items have been checked.
In the worst case scenario, all the items in the map have the same hash code and are therefore stored in the same bucket. In this case, you'll need to iterate over all of them serially, which would be an O(n) operation.
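As a rough sketch of that two-step lookup (hash to a bucket, then an equals() scan along the chain); the class and its fields below are illustrative only, not the JDK's actual internals:

import java.util.Objects;

// Simplified, illustrative chained hash table; not the real java.util.HashMap code.
class SimpleChainedMap<K, V> {

    static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;   // chain of entries that share a bucket
        Node(K key, V value, Node<K, V> next) {
            this.key = key; this.value = value; this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private Node<K, V>[] table = (Node<K, V>[]) new Node[16];

    public boolean containsKey(K key) {
        // Step 1: O(1) bucket selection (table length is a power of two).
        int index = (table.length - 1) & Objects.hashCode(key);
        // Step 2: linear scan of the chain, comparing with equals().
        for (Node<K, V> n = table[index]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                return true;
            }
        }
        return false;
    }

    public void put(K key, V value) {
        int index = (table.length - 1) & Objects.hashCode(key);
        for (Node<K, V> n = table[index]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) { n.value = value; return; }
        }
        table[index] = new Node<>(key, value, table[index]); // prepend; no resizing here
    }
}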
A HashSet is just a HashMap where you don't care about the values, only the keys - under the hood, it's a HashMap where all the values are a dummy Object.
If you look at the implementation of a HashSet (e.g. from OpenJDK 8: https://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/HashSet.java), you can see that it's actually just built on top of a HashMap. Relevant code snippet here:
public class HashSet<E>
    extends AbstractSet<E>
    implements Set<E>, Cloneable, java.io.Serializable
{
    private transient HashMap<E,Object> map;

    // Dummy value to associate with an Object in the backing Map
    private static final Object PRESENT = new Object();

    /**
     * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
     * default initial capacity (16) and load factor (0.75).
     */
    public HashSet() {
        map = new HashMap<>();
    }

    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }
The HashSet slightly optimizes memory usage by creating a single static Object instance named PRESENT and using it as the value for every key/value entry in the backing HashMap.
So whatever the performance implications are of using a HashMap, a HashSet will have more or less the same ones since it's literally using a HashMap under the covers.
To directly answer your question: in the worst case, yes; just as the worst-case complexity of a HashMap is O(n), so too the worst-case complexity of a HashSet is O(n).
It is worth noting that, unless you have a really bad hash function or are using a hashtable of a ridiculously small size, you're very unlikely to see the worst case performance in practice. You'd have to have every element hash to the exact same bucket in the hashtable so the performance would essentially degrade to a linked list traversal (assuming a hashtable using chaining for collision handling, which the Java ones do).
Worst case is O(n), as mentioned; average and amortized run time is constant.
From GeeksForGeeks:
The underlying data structure for HashSet is hashtable. So amortize (average or usual case) time complexity for add, remove and look-up (contains method) operation of HashSet takes O(1) time.
I see a lot of people saying the worst case is O(n). That is because the older HashSet implementation used a LinkedList to handle collisions within a bucket. However, that is not the whole story.
In Java 8, such a LinkedList is replaced by a balanced binary tree when the number of collisions in a bucket grows. This improves the worst-case performance from O(n) to O(log n) for lookups.
You can check additional details here.
http://openjdk.java.net/jeps/180
https://www.nagarro.com/en/blog/post/24/performance-improvement-for-hashmap-in-java-8
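As a sketch of that Java 8+ behaviour: when the keys are Comparable, a bucket that accumulates enough colliding entries (the JDK's threshold is 8 entries, once the table capacity reaches 64) is converted to a balanced tree, so even a deliberately terrible hashCode keeps contains around O(log n). The class below is made up for the illustration:

import java.util.HashSet;
import java.util.Set;

public class TreeBinDemo {

    // Constant hashCode forces every element into one bucket; because the key
    // is Comparable, Java 8+ can treeify that bucket instead of keeping a list.
    static final class CollidingKey implements Comparable<CollidingKey> {
        final int id;
        CollidingKey(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof CollidingKey && ((CollidingKey) o).id == id;
        }
        @Override public int hashCode() { return 1; }
        @Override public int compareTo(CollidingKey other) {
            return Integer.compare(id, other.id);
        }
    }

    public static void main(String[] args) {
        Set<CollidingKey> set = new HashSet<>();
        for (int i = 0; i < 100_000; i++) {
            set.add(new CollidingKey(i));
        }
        // All 100,000 elements share one bucket, yet contains remains fast
        // because that bucket is a balanced tree rather than a linked list.
        System.out.println(set.contains(new CollidingKey(99_999))); // true
    }
}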

What is increased cost of TreeSet vs LinkedHashSet and TreeMap over LinkedHashMap?

LinkedHashSet - This implementation spares its clients from the unspecified, generally chaotic ordering provided by HashSet, without incurring the increased cost associated with TreeSet.
Same is said about LinkedHashMap vs TreeMap
What is this increased cost (LinkedHashMap vs TreeMap) exactly?
Does that mean that TreeSet needs more memory per element? LinkedHashSet needs extra memory for two additional links, but TreeSet needs extra memory to store its Map.Entry pairs (because it is implicitly based on TreeMap); then again, LinkedHashSet is based on HashMap, which also has the Map.Entry overhead...
So is the difference just how fast a new element is added (in the case of TreeSet it takes longer due to the sorting)?
What are other significant increased costs?
TreeSet/TreeMap have a higher time complexity for operations such as add() and contains() (for TreeSet), or put() and containsKey() (for TreeMap), since they require logarithmic time to locate an element in the tree (or to add an element to it), while LinkedHashSet/LinkedHashMap require expected constant time for those operations.
In terms of memory requirements, there's a very small difference:
TreeMap entries hold key, value, 3 Entry references (left, right, parent) and a boolean.
LinkedHashMap entries hold key, value, 3 Entry references (next, before, after) and an int.
When iterating a HashSet, the iteration order is generally the order of the hash of the object, which is generally not too useful if you want a predictable order.
If a sane ordering is important, you would generally need to use a TreeSet, which iterates in sorted order but at a cost, because maintaining the sorted order adds to the complexity of the process.
A LinkedHashSet can be used as a middle-ground solution to the seemingly insane ordering of a HashSet by ensuring that the iteration order is at least consistent by using the insertion order.
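A quick illustration of the three iteration orders (hash order, insertion order, sorted order):

import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

public class IterationOrderDemo {
    public static void main(String[] args) {
        String[] words = {"pear", "apple", "banana", "cherry"};

        Set<String> hash = new HashSet<>();          // order depends on hash codes
        Set<String> linked = new LinkedHashSet<>();  // insertion order
        Set<String> tree = new TreeSet<>();          // sorted (natural) order

        for (String w : words) {
            hash.add(w);
            linked.add(w);
            tree.add(w);
        }

        System.out.println("HashSet:       " + hash);   // unspecified order
        System.out.println("LinkedHashSet: " + linked); // [pear, apple, banana, cherry]
        System.out.println("TreeSet:       " + tree);   // [apple, banana, cherry, pear]
    }
}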

Contains on TreeSet versus another Set

Is the contains method on TreeSet (since it is already sorted by default) faster than, say, HashSet's?
The reason I ask is that Collections.binarySearch is quite fast if the List is sorted, so I am thinking that maybe the contains method for TreeSet might be the same.
From the javadoc of TreeSet:
This implementation provides guaranteed log(n) time cost for the basic operations (add, remove and contains).
From the javadoc of HashSet:
This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets.
So the answer is no.
Looking at the implementation (JDK 1.7, Oracle), treeset.contains (resp. hashset.contains) relies on treemap.containsKey (resp. hashmap.containsKey). In a HashMap, containsKey inspects a single hash bucket (which typically holds only one item), whereas in a TreeMap it walks a root-to-leaf path, comparing with the compareTo method at each node, i.e. roughly log2(n) comparisons. If your item sits at the bottom of the tree (for example the largest or the smallest), the lookup follows one of the longest paths.
Finally, I just ran a quick test (yes, I know, not very reliable) with a tree containing 1m integers, looking up one of the 2 largest values, which sends the TreeSet down one of its deepest paths. HashSet was quicker by a factor of 50.
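A crude version of that experiment (not a rigorous benchmark; there is no warm-up or JMH harness, so treat the numbers only as a rough indication) might look like this:

import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class ContainsComparison {
    public static void main(String[] args) {
        Set<Integer> tree = new TreeSet<>();
        Set<Integer> hash = new HashSet<>();
        for (int i = 0; i < 1_000_000; i++) {
            tree.add(i);
            hash.add(i);
        }
        int target = 999_999; // near the "edge" of the tree

        long t1 = System.nanoTime();
        for (int i = 0; i < 1_000; i++) tree.contains(target);
        long t2 = System.nanoTime();
        for (int i = 0; i < 1_000; i++) hash.contains(target);
        long t3 = System.nanoTime();

        System.out.println("TreeSet.contains: " + (t2 - t1) / 1_000 + " ns/op");
        System.out.println("HashSet.contains: " + (t3 - t2) / 1_000 + " ns/op");
    }
}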

How is the implementation of LinkedHashMap different from HashMap?

If LinkedHashMap's time complexity is the same as HashMap's complexity, why do we need HashMap? What extra overhead does LinkedHashMap have compared to HashMap in Java?
LinkedHashMap will take more memory. Each entry in a normal HashMap just has the key and the value. Each LinkedHashMap entry has those references and references to the next and previous entries. There's also a little bit more housekeeping to do, although that's usually irrelevant.
If LinkedHashMap's time complexity is same as HashMap's complexity why do we need HashMap?
You should not confuse complexity with performance. Two algorithms can have the same complexity, yet one can consistently perform better than the other.
Remember that saying f(N) is O(N) means that:
f(N) <= C*N for all sufficiently large N
where C is a constant. The complexity says nothing about how small or large C is, and for two different algorithms the constant C will most likely be different; for instance, f(N) = 3N + 100 and g(N) = 100N are both O(N).
(And remember that big-O complexity is about the behavior / performance as N gets very large. It tells you nothing about the behavior / performance for smaller N values.)
Having said that:
The difference in performance between HashMap and LinkedHashMap operations in equivalent use-cases is relatively small.
A LinkedHashMap uses more memory. For example, the Java 11 implementation has two additional reference fields in each map entry to represent the before/after list. On a 64 bit platform without compressed OOPs the extra overhead is 16 bytes per entry.
Relatively small differences in performance and/or memory usage can actually matter a lot to people with performance or memory critical applications1.
1 - ... and also to people who obsess about these things unnecessarily.
LinkedHashMap additionally maintains a doubly-linked list running through all of its entries, which provides a reproducible order. This linked list defines the iteration ordering, which is normally the order in which keys were inserted into the map (insertion order).
HashMap doesn't have these extra costs (runtime and space) and should be preferred over LinkedHashMap when you don't care about insertion order.
LinkedHashMap is a useful data structure when you need to know the insertion order of keys to the Map. One suitable use case is for the implementation of an LRU cache. Due to order maintenance of the LinkedHashMap, the data structure needs additional memory compared to HashMap. In case insertion order is not a requirement, you should always go for the HashMap.
There is another major difference between HashMap and LinkedHashMap: iteration is more efficient in the case of LinkedHashMap.
Because the elements in a LinkedHashMap are linked to each other, iteration requires time proportional to the size of the map, regardless of its capacity.
In the case of HashMap there is no such ordering structure, so iteration requires time proportional to its capacity (the number of buckets, occupied or not) plus its size.
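A small demonstration of the ordering difference (the iteration-cost difference is hard to show without timing, but the ordering behaviour is easy to see):

import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class MapOrderDemo {
    public static void main(String[] args) {
        Map<String, Integer> hash = new HashMap<>();
        Map<String, Integer> linked = new LinkedHashMap<>();

        String[] keys = {"delta", "alpha", "charlie", "bravo"};
        for (int i = 0; i < keys.length; i++) {
            hash.put(keys[i], i);
            linked.put(keys[i], i);
        }

        // HashMap: order depends on the hash codes and the table capacity.
        System.out.println(hash);
        // LinkedHashMap: always insertion order, regardless of capacity.
        System.out.println(linked); // {delta=0, alpha=1, charlie=2, bravo=3}
    }
}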
HashMap does not maintain insertion order, and hence does not maintain a doubly linked list.
The most salient feature of LinkedHashMap is that it maintains the insertion order of key-value pairs. LinkedHashMap uses a doubly linked list to do so.
Entry of LinkedHashMap looks like this:
static class Entry<K, V> {
    K key;
    V value;
    Entry<K,V> next;
    Entry<K,V> before, after; // for maintaining insertion order

    public Entry(K key, V value, Entry<K,V> next) {
        this.key = key;
        this.value = value;
        this.next = next;
    }
}
By using before and after we keep track of newly added entries in the LinkedHashMap, which is what maintains the insertion order:
before refers to the previous entry and
after refers to the next entry in the LinkedHashMap.
For diagrams and step by step explanation please refer http://www.javamadesoeasy.com/2015/02/linkedhashmap-custom-implementation.html
LinkedHashMap extends HashMap, which means it uses the existing HashMap implementation to store keys and values in a Node (Entry object). In addition, it maintains a separate doubly linked list to record the order in which keys have been inserted.
It looks like this :
header node <---> node 1 <---> node 2 <---> node 3 <----> node 4 <---> header node.
So the extra overhead is maintaining insertions and deletions in this doubly linked list.
The benefit is: iteration order is guaranteed to be the insertion order, which is not the case with HashMap.
Re-sizing is supposed to be faster, as it iterates through its doubly-linked list to transfer the contents into a new table array.
containsValue() is overridden to take advantage of the faster iterator.
LinkedHashMap can also be used to create an LRU cache. A special LinkedHashMap(capacity, loadFactor, accessOrderBoolean) constructor is provided to create a linked hash map whose order of iteration is the order in which its entries were last accessed, from least-recently accessed to most-recently accessed. In this case, merely querying the map with get() is a structural modification.
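A minimal LRU cache sketch built on that constructor (the capacity and class name are arbitrary for the example):

import java.util.LinkedHashMap;
import java.util.Map;

// A simple LRU cache: accessOrder=true makes get() move an entry to the end,
// and removeEldestEntry evicts the least-recently-used entry past maxEntries.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // true = access order instead of insertion order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(3);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.put("c", 3);
        cache.get("a");      // touch "a" so "b" becomes the eldest
        cache.put("d", 4);   // evicts "b"
        System.out.println(cache.keySet()); // [c, a, d]
    }
}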
