I have a TreeMap, and want to get a set of K smallest keys (entries) larger than a given value.
I know we can use higherKey(givenValue) to get that one key, but then how should I iterate from there?
One possible way is to get a tailMap from the smallest key larger than the given value, but that seems like overkill if K is small compared with the map size.
Is there a better way with O(logn + K) time complexity?
The error here is thinking that tailMap makes a new map. It doesn't. It gives you a lightweight view object that essentially just holds a reference back to the original map and the pivot point; that's all it has. Any changes you make to the tailMap will therefore also affect the underlying map, and any changes made to the underlying map will also affect your tailMap.
for (KeyType key : treeMap.tailMap(pivot).keySet()) {
}
The above IS just O(log n) for finding the first key past the pivot, and given that you loop, that adds K to the lot, for a grand total of O(log n + K) time complexity and O(1) extra space, where N is the size of the map and K is the number of keys that end up on the selected side of the pivot point (so, in the worst case where K is close to N, that works out to O(N)).
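To answer the original question directly, a minimal sketch might look like this (the names map, givenValue and k are placeholders); breaking out of the loop after K keys keeps the whole thing at O(log n + K):

import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical helper: the K smallest keys strictly greater than givenValue.
static List<Integer> kSmallestKeysAbove(TreeMap<Integer, String> map, int givenValue, int k) {
    List<Integer> result = new ArrayList<>(k);
    // tailMap(givenValue, false) is a view starting at the first key > givenValue: O(log n)
    for (Integer key : map.tailMap(givenValue, false).keySet()) {
        if (result.size() == k) {
            break; // stop after k keys, so the loop adds only O(K)
        }
        result.add(key);
    }
    return result;
}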
If you want an actual copied-over map, you'd do something like:
TreeMap<KeyType, ValueType> tm = new TreeMap<>(original.tailMap(pivot));
Note that this also copies over the comparator in use, in case you specified a custom one. That one really is O(log n + K) in time complexity (and O(K) in space complexity). Of course, if you then loop over the copy, it's still O(log n + K), because 2K just boils down to K when talking about big-O notation.
Related
I know that insertion into a HashMap takes O(1) time, so for inserting n elements the complexity should be O(n). I have a little doubt about the method below.
Code:
private static Map<Character, Character> mapCharacters(String message) {
    Map<Character, Character> map = new HashMap<>();
    char idx = 'a';
    for (char ch : message.toCharArray()) { // a loop - O(n)
        if (!map.containsKey(ch)) { // containsKey - take O(n) to check the whole map
            map.put(ch, idx);
            ++idx;
        }
    }
    return map;
}
What I am doing is mapping the characters of the decrypted message to the sequential letters "abcd...".
So, my confusion is whether the complexity of the above method is O(n^2) or O(n)? Please help me clear my doubt.
It's amortised O(n). containsKey and put are both amortised O(1); see What is the time complexity of HashMap.containsKey() in java? in particular.
The time complexity of both containsKey() and put() will depend on the implementation of the equals/hashCode contract of the object that is used as a key.
Under the hood, HashMap maintains an array of buckets. And each bucket corresponds to a range of hashes. Each non-empty bucket contains nodes (each node contains information related to a particular map-entry) that form either a linked list or a tree (since Java 8).
The process of determining the right bucket based on the hash of the provided key is almost instant (unless computing the hash itself is heavy), and is considered to have constant time complexity, O(1). Accessing the bucket (i.e. accessing an array element) is also O(1), but then things can become tricky. Assuming that the bucket contains only a few elements, checking them will not significantly increase the total cost (while performing both containsKey() and put() we need to check whether the provided key already exists in the map), and the overall time complexity will be O(1).
But in the case when all the map entries for some reason end up in the same bucket, iterating over a linked list will cost O(n), and if the nodes are stored as a red-black tree the worst-case cost of both containsKey() and put() will be O(log n).
In the case of Character as a key, we have a proper implementation of equals/hashCode. A proper hash function is important to ensure that objects are spread evenly among the buckets. But if you were instead using a custom object whose hashCode() is broken, for instance one that always returns the same number, then all entries would end up in the same bucket and performance would be poor.
Collisions, situations in which different objects produce hashes that map to the same bucket, are inevitable, and we need a good hash function to keep the number of collisions as low as possible, so that buckets are "evenly" occupied; the map is expanded when the number of entries exceeds the load factor times the capacity.
The time complexity of this particular method is linear, O(n). But that might not be the case if you changed the type of the key.
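To illustrate that last point, here is a hedged sketch (BadKey is a made-up class, not from the question): a key whose hashCode() always returns the same value forces every entry into one bucket, so the map still works correctly but containsKey()/put() are no longer O(1):

import java.util.HashMap;
import java.util.Map;

// Hypothetical key with a deliberately broken hashCode(): every instance hashes to 42,
// so all entries land in the same bucket.
class BadKey {
    private final int id;

    BadKey(int id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).id == id;
    }

    @Override
    public int hashCode() { return 42; } // broken on purpose
}

public class BadKeyDemo {
    public static void main(String[] args) {
        Map<BadKey, String> map = new HashMap<>();
        for (int i = 0; i < 100_000; i++) {
            map.put(new BadKey(i), "v" + i); // every put has to search the single overloaded bucket
        }
        // Still correct, just slow: the lookup searches that one bucket instead of being O(1).
        System.out.println(map.containsKey(new BadKey(99_999)));
    }
}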
Given your original code below:
private static Map<Character, Character> mapCharacters(String message) {
    Map<Character, Character> map = new HashMap<>();
    char idx = 'a';
    for (char ch : message.toCharArray()) {
        if (!map.containsKey(ch)) {
            map.put(ch, idx);
            ++idx;
        }
    }
    return map;
}
Here are some observations about the time complexity:
Iterating each character in message will be O(n) – pretty clearly if there are 10 characters it takes 10 steps, 10,000 characters will take 10,000 steps. The time complexity is directly proportional to n.
Inside the loop, you're checking if the map contains that character already, using containsKey(), which is a constant-time, O(1), operation. The idea is that the actual time to evaluate containsKey() should be the same whether your map has 10 keys or 10,000 keys.
Inside the if, you have map.put(), which is another constant-time, O(1), operation.
Combined together: for each of the n characters, you spend time iterating it in the loop, checking if the map has it already, and adding it; all of this is 3n (three times n), which is just O(n).
Separately, you could make a small improvement to your code by replacing this block:
if (!map.containsKey(ch)) {
    map.put(ch, idx);
    ++idx;
}
with something like the below, which retains the same behavior of running ++idx only if you actually put something, specifically if the key was previously not present (or had been mapped to null, though the null-value case doesn't arise in your code sample):
Character previousValue = map.putIfAbsent(ch, idx);
if (previousValue == null) {
    ++idx;
}
Though the time complexity wouldn't change, using putIfAbsent() makes the code clearer, and also takes advantage of well-tested, performant code that's available for free.
Under the covers, Map.putIfAbsent() has the default implementation shown below (as of OpenJDK 17): an initial call to get() followed by put() if the key isn't present.
default V putIfAbsent(K key, V value) {
    V v = get(key);
    if (v == null) {
        v = put(key, value);
    }
    return v;
}
I am making a custom implementation of a HashMap without using the HashMap data structure as part of an assignment. Currently I have the choice of working with two 1D arrays or using a 2D array to store my keys and values. I want to be able to check if a key exists and return the corresponding value in O(1) time complexity (assignment requirement), but I am assuming it is without the use of containsKey().
Also, when inserting key and value pairs into my arrays, I am confused because it should not be O(1) logically, since there would occasionally be cases where there is a collision and I have to recalculate the index, so why is the assignment requirement for insertion O(1)?
A lot of questions in there, let me give it a try.
I want to be able to check if a key exists and return the corresponding value in O(1) time complexity (assignment requirement), but I am assuming it is without the use of containsKey().
That actually doesn't make a difference. O(1) means the execution time is independent of the input, it does not mean a single operation is used. If your containsKey() and put() implementations are both O(1), then so is your solution that uses both of them exactly once.
Also, when inserting key and value pairs into my arrays, I am confused because it should not be O(1) logically, since there would occasionally be cases where there is a collision and I have to recalculate the index, so why is the assignment requirement for insertion O(1)?
O(1) is the expected case, which assumes that hash collisions are rare. The worst case is O(n), when every key generates the same hash code. So when a hash map's lookup or insertion performance is quoted as O(1), that assumes a hashCode implementation that spreads keys evenly across the buckets.
Finally, when it comes to data structures, the usual approach is to use a single array whose items are linked-list nodes. The array offset corresponds to hashCode() % array size (there are much more advanced formulas than this, but it is a good starting point). In case of a hash collision, you navigate the linked-list nodes until you find the correct entry.
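As a rough sketch of that idea (all names here are made up, and there is no resizing, so it is not a complete solution): a bucket array of linked-list nodes indexed by hashCode() % capacity gives average-case O(1) containsKey()/put() as long as the keys spread across the buckets:

// A rough sketch: separate-chaining hash map with a fixed capacity and no resizing.
public class SimpleHashMap<K, V> {

    private static class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private Node<K, V>[] buckets;

    @SuppressWarnings("unchecked")
    public SimpleHashMap(int capacity) {
        buckets = (Node<K, V>[]) new Node[capacity];
    }

    private int indexFor(K key) {
        // Non-negative bucket index derived from the key's hash code
        return (key.hashCode() & 0x7fffffff) % buckets.length;
    }

    public boolean containsKey(K key) {
        for (Node<K, V> n = buckets[indexFor(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                return true; // found in this bucket's chain
            }
        }
        return false;
    }

    public void put(K key, V value) {
        int i = indexFor(key);
        for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                n.value = value; // key already present: overwrite the value
                return;
            }
        }
        buckets[i] = new Node<>(key, value, buckets[i]); // prepend a new node to the chain
    }

    public V get(K key) {
        for (Node<K, V> n = buckets[indexFor(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                return n.value;
            }
        }
        return null;
    }
}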
You're correct in that hash table insertion is not guaranteed to be O(1), due to collisions. If you use the open addressing strategy to deal with collisions, the process of inserting an item takes time proportional to 1/(1-a), where a is the fraction of the table capacity that is already in use. As the table fills up, a approaches 1, and the time to insert grows without bound.
The secret to keeping the time complexity at O(1) is making sure that there's always room in the table. That way a never grows too big. That's why you have to resize the table when it starts to run out of capacity.
Problem: resizing a table with N item takes O(N) time.
Solution: increase the capacity exponentially, for example double it every time you need to resize. This way the table has to be resized very rarely. The cost of the occasional resize operations is "amortized" over a large number of insertions, and that's why people say that hash table insertion has "amortized O(1)" time complexity.
TLDR: Make sure you increase the table capacity when it's getting full, maybe 70-80% utilization. When you increase the capacity, make sure that it's by a constant factor, for example doubling it.
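A hedged sketch of that policy, building on the SimpleHashMap sketch above and assuming a size field is added to it (the 0.75 threshold and the doubling factor are just the conventional choices, not required values):

// Amortized O(1) insertion: double the bucket array when it gets about 75% full.
public void putWithResize(K key, V value) {
    if (size + 1 > 0.75 * buckets.length) {
        resize(buckets.length * 2); // O(n) now, but rare: the cost amortizes over many puts
    }
    put(key, value); // normal bucket insertion as in the sketch above
    size++;          // assumes this was a new insertion; a real version would check for that
}

@SuppressWarnings("unchecked")
private void resize(int newCapacity) {
    Node<K, V>[] old = buckets;
    buckets = (Node<K, V>[]) new Node[newCapacity];
    for (Node<K, V> head : old) {
        for (Node<K, V> n = head; n != null; n = n.next) {
            int i = (n.key.hashCode() & 0x7fffffff) % newCapacity;
            buckets[i] = new Node<>(n.key, n.value, buckets[i]); // rehash into the larger array
        }
    }
}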
Is there any way to update the key of the least entry (firstEntry()) in a TreeMap in one pass?
So, for example, if I pollFirstEntry(), it takes O(log n) time to retrieve and remove the entry. Afterwards, I create a new entry with the desired key and put it back into the TreeMap, which also takes O(log n) time. Consequently, I spend O(2 log n) time, in a case where logically it could be just O(1 + log n) = O(log n) time.
I would be very happy to avoid removing the entry and instead update it in place while it is captured by the firstEntry() method.
If that is not possible with TreeMap, could anybody suggest an alternative, PriorityQueue-like data structure where it is possible to update the key of the least entry?
O(2 log N) is rightly considered O(log N). But no, it cannot be done, because changing the key moves the entry to another position in the tree. About the only data structure where this does not hold is a plain list of key-value pairs, and that is a terrible O(N) (or, if you like, O(N/2), which is still O(N)).
If the key can be used as an index into a huge array, then O(1) would hold, but it is still two operations.
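For what it's worth, the two-step poll-and-reinsert on a TreeMap is still O(log n) overall; a small sketch with made-up keys and values:

import java.util.Map;
import java.util.TreeMap;

TreeMap<Integer, String> map = new TreeMap<>();
map.put(3, "a");
map.put(7, "b");

// "Update" the least key by removing and re-inserting: two O(log n) steps, still O(log n).
Map.Entry<Integer, String> least = map.pollFirstEntry(); // retrieves and removes the first entry
map.put(least.getKey() + 10, least.getValue());          // re-insert under the new key

System.out.println(map); // {7=b, 13=a}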
Can someone tell me the time complexity of the below code?
a is an array of int.
Set<Integer> set = new HashSet<Integer>();
for (int i = 0; i < a.length; i++) {
    if (set.contains(a[i])) {
        System.out.println("Hello");
    }
    set.add(a[i]);
}
I think that it is O(n), but I'm not sure, since it uses a Set and its contains and add methods as well.
Can anyone confirm and explain what the time complexity of the entire code above is? Also, how much space would it take?
I believe it's O(n), because you loop over the array, and contains and add should be constant time because it's a hash-based set. If it were not hash based and required iteration over the entire set to do lookups, the upper bound would be n^2.
Integers are immutable, so the space complexity would be 2n, which I believe simplifies to just n, since constants don't matter.
If you had objects in the array and set, then you would have 2n references and n objects, so you are at 3n, which is still linear (times a constant) space.
EDIT-- yep "This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets."
see here.
Understanding HashSet is the key to this question.
According to the HashSet Javadoc,
This class implements the Set interface, backed by a hash table (actually a HashMap instance)... This class offers constant time performance for the basic operations (add, remove, contains and size)
A more complete explanation of HashSet: https://www.geeksforgeeks.org/hashset-contains-method-in-java/?ref=rp
So the HashSet insert and contains operations are O(1). (As HashSet is based on HashMap, its memory complexity is O(n).)
The rest is simple: the main array you are looping over is O(n), so the total complexity of the function is O(n).
I am trying to implement a plane sweep algorithm and for this I need to know the time complexity of java.util.HashMap class' keySet() method. I suspect that it is O(n log n). Am I correct?
Point of clarification: I am talking about the time complexity of the keySet() method; iterating through the returned Set will take obviously O(n) time.
Getting the keyset is O(1) and cheap. This is because HashMap.keyset() returns the actual KeySet object associated with the HashMap.
The returned Set is not a copy of the keys, but a wrapper for the actual HashMap's state. Indeed, if you update the set you can actually change the HashMap's state; e.g. calling clear() on the set will clear the HashMap!
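A quick sketch of that view behavior (the map contents here are just illustrative):

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

Map<String, Integer> map = new HashMap<>();
map.put("a", 1);
map.put("b", 2);

Set<String> keys = map.keySet(); // O(1): returns the map's own key-set view, no copying
keys.remove("a");                // mutates the underlying map as well
System.out.println(map);         // {b=2}

keys.clear();                    // clears the HashMap too
System.out.println(map.isEmpty()); // true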
... iterating through the returned Set will take obviously O(n) time.
Actually that is not always true:
It is true for a HashMap created using new HashMap<>(). The worst case is to have all N keys land in the same hash chain. However, if the map has grown naturally, there will still be N entries and O(N) slots in the hash array. Thus iterating the entry set will involve O(N) operations.
It is false if the HashMap is created with new HashMap<>(capacity) and a singularly bad (too large) capacity estimate. Then it will take O(Cap) + O(N) operations to iterate the entry set. If we treat Cap as a variable, that is O(max(Cap, N)), which could be worse than O(N).
There is an escape clause though. Since capacity is an int in the current HashMap API, the upper bound for Cap is 2^31. So for really large values of Cap and N, the complexity is O(N).
On the other hand, N is limited by the amount of memory available and in practice you need a heap on the order of 2^38 bytes (256 GBytes) for N to exceed the largest possible Cap value. And for a map that size, you would be better off using a hashtable implementation tuned for huge maps. Or not using an excessively large capacity estimate!
Surely it would be O(1). All that it is doing is returning a wrapper object on the HashMap.
If you are talking about walking over the keyset, then this is O(n), since each next() call is O(1), and this needs to be performed n times.
This should be doable in O(n) time... A hash map is usually implemented as a large bucket array, whose size is (usually) directly proportional to the size of the hash map. In order to retrieve the key set, the bucket array must be iterated through, and for each occupied slot, the key must be retrieved (either through an intermediate collection or an iterator with direct access to the buckets)...
**EDIT: As others have pointed out, the actual keyset() method will run in O(1) time, however, iterating over the keyset or transferring it to a dedicated collection will be an O(n) operation. Not quite sure which one you are looking for **
That method is, I believe, O(1): the key-set view already exists inside the map, so nothing has to be built or copied. The collection is just sitting there.
To address the "iterating through the returned Set will take obviously O(n) time" comment, this is not actually correct per the doc comments of HashMap:
Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
So in other words, iterating over the returned Set will take O(n + c) where n is the size of the map and c is its capacity, not O(n). If an inappropriately sized initial capacity or load factor were chosen, the value of c could outweigh the actual size of the map in terms of iteration time.
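As a rough illustration (the capacity below is a made-up value and the timing is machine-dependent, so treat this as a sketch rather than a benchmark):

import java.util.HashMap;
import java.util.Map;

// Two maps with the same 10 entries: one with the default capacity, one with a huge capacity.
Map<Integer, Integer> small = new HashMap<>();              // default capacity (16)
Map<Integer, Integer> oversized = new HashMap<>(4_000_000); // millions of mostly-empty buckets
for (int i = 0; i < 10; i++) {
    small.put(i, i);
    oversized.put(i, i);
}

// Iterating 'oversized' must walk all of its buckets, so it is far slower than 'small',
// even though both contain only 10 entries: O(n + c), with c dominating here.
long t0 = System.nanoTime();
for (Integer k : oversized.keySet()) { /* no-op */ }
System.out.println("oversized: " + (System.nanoTime() - t0) + " ns");

t0 = System.nanoTime();
for (Integer k : small.keySet()) { /* no-op */ }
System.out.println("small:     " + (System.nanoTime() - t0) + " ns");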