Time complexity of Set in Java

Can someone tell me the time complexity of the code below?
a is an array of ints.
Set<Integer> set = new HashSet<Integer>();
for (int i = 0; i < a.length; i++) {
    if (set.contains(a[i])) {
        System.out.println("Hello");
    }
    set.add(a[i]);
}
I think it is O(n), but I'm not sure, since it uses a Set and its contains method, and it also calls the set's add method.
Can anyone confirm and explain what the time complexity of the entire code above is? Also, how much space would it take?

I believe it's O(n) because you loop over the array, and contains and add should be constant time because it's a hash-based set. If it were not hash-based and required iterating over the entire set to do lookups, the upper bound would be n^2.
Integers are immutable, so the space complexity would be 2n, which simplifies to just n, since constant factors don't matter.
If you had objects in the array and the set, then you would have 2n references and n objects, so you are at 3n, which is still linear space (times a constant).
EDIT: yes, "This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets."
see here.

Understanding HashSet is the key to this question.
According to the HashSet Javadoc,
This class implements the Set interface, backed by a hash table
(actually a HashMap instance)...This class offers constant time
performance for the basic operations (add, remove, contains and size)
A more complete explanation of HashSet: https://www.geeksforgeeks.org/hashset-contains-method-in-java/?ref=rp
So HashSet insert and contains are O(1). (HashSet is based on HashMap, and its memory complexity is O(n).)
The rest is simple: the loop over the main array is O(n), so the total complexity of the code is O(n).
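For reference, an equivalent O(n) version of the snippet from the question (a small sketch; add() returns false when the element is already present, so the separate contains() call isn't even needed):
Set<Integer> set = new HashSet<>();
for (int value : a) {
    // add() returns false if value was already in the set - the "duplicate seen" case
    if (!set.add(value)) {
        System.out.println("Hello");
    }
}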

Related

Algorithm Complexity: Is it the same to iterate an array from the start than from the end?

In an interview I was asked about the following:
public class Main {
    public static void main(String[] args) {
        // TODO Auto-generated method stub
        int[] array = new int[10000];
        for (int i = 0; i < array.length; i++) {
            // do calculations
        }
        for (int x = array.length - 1; x >= 0; x--) {
            // do calculations
        }
    }
}
Is it the same to iterate an array from the end as from the start? My understanding is that it would be the same, since the complexity is constant, i.e. O(1). Am I correct?
I was also asked about ArrayList complexity compared to other collections in Java, for example LinkedList.
Thank you.
There can be a difference due to CPU prefetch characteristics.
There is no difference between looping in either direction as per computational theory. However, depending on the kind of prefetcher that is used by the CPU on which the code runs, there will be some differences in practice.
For example, the Sandy Bridge Intel processor has a prefetcher that goes forward only for data (while instructions could be prefetched in both directions). This will help iteration from the start (as future memory locations are prefetched into the L1 cache), while iterating from the end will cause very little to no prefetching, and hence more accesses to RAM which is much slower than accessing any of the CPU caches.
There is a more detailed discussion about forward and backward prefetching at this link.
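A rough way to observe this in practice is to time both directions over a large array (a sketch only; results vary with JIT warm-up, hardware and OS noise, so don't treat it as a proper benchmark):
int[] data = new int[50_000_000];
long sum = 0;

long t0 = System.nanoTime();
for (int i = 0; i < data.length; i++) {
    sum += data[i];                         // forward: friendly to forward-only prefetchers
}
long forwardNanos = System.nanoTime() - t0;

t0 = System.nanoTime();
for (int i = data.length - 1; i >= 0; i--) {
    sum += data[i];                         // backward: may defeat such prefetchers
}
long backwardNanos = System.nanoTime() - t0;

System.out.println("forward: " + forwardNanos + " ns, backward: " + backwardNanos + " ns, sum = " + sum);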
It's O(n) in both cases for an array, as there are n iterations and each step takes O(1) (assuming the calculations in the loop take O(1)). In particular, obtaining the length or size is typically an O(1) operation for arrays or ArrayList.
A typical use case for iterating from the end is removing elements in the loop (which may otherwise require more complex accounting to avoid skipping elements or iterating beyond the end).
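For example, a small sketch of that pattern, removing elements from an ArrayList while iterating from the end (assuming java.util.* is imported):
List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6));

// Remove all even numbers. Iterating from the end means that removing element i
// never shifts the positions of the elements that are still to be visited.
for (int i = list.size() - 1; i >= 0; i--) {
    if (list.get(i) % 2 == 0) {
        list.remove(i);
    }
}
System.out.println(list);   // [1, 3, 5]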
For a linked list, the first loop would typically be O(n²), as determining the length of a linked list is typically an O(n) operation without additional caching, and it's used every time the exit condition is checked. However, java.util.LinkedList explicitly keeps track of its length, so the total is O(n) for linked lists in Java.
If an element in a linked list is accessed using the index in the calculations, this will be an O(n) operation, yielding a total of O(n²).
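A small sketch of that pitfall (assuming java.util.* is imported):
List<Integer> linked = new LinkedList<>();
for (int i = 0; i < 100_000; i++) {
    linked.add(i);
}

// O(n^2): every get(i) walks the list from one end to reach index i.
long slowSum = 0;
for (int i = 0; i < linked.size(); i++) {
    slowSum += linked.get(i);
}

// O(n): the iterator keeps its position, so each step is O(1).
long fastSum = 0;
for (int value : linked) {
    fastSum += value;
}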
Is it the same to iterate an array from the end as from the start? My understanding is that it would be the same, since the complexity is constant, i.e. O(1). Am I correct?
In theory, yes, it's the same to iterate an array from the end as from the start.
The time complexity is O(10,000), which is constant, so O(1), assuming the loop body has constant time complexity. But it's worth mentioning that the constant 10,000 can be promoted to a variable, call it N, and then you can say the time complexity is O(N).
I was also asked about ArrayList complexity compared to other collections in Java, for example LinkedList.
Here you can find a comparison of ArrayList and LinkedList time complexity. The interesting methods are add, remove and get.
http://www.programcreek.com/2013/03/arraylist-vs-linkedlist-vs-vector/
Also, data in a LinkedList is not stored contiguously, whereas data in an ArrayList is stored contiguously; an ArrayList also uses less space than a LinkedList.
Good luck!

ArrayList and HashSet insert performance test result confuse me

I wrote a class to test the insert performance of ArrayList versus HashSet. As I expected, the HashSet insert performance should be much better than ArrayList's (maybe the book deceived me), but the test result confuses me:
HashSet<String> hashSet = new HashSet<String>();
long start = System.currentTimeMillis();
for (int i = 0; i < 900000; i++) {
    hashSet.add(String.valueOf(i));
}
System.out.println("Insert HashSet Time: " + (System.currentTimeMillis() - start));

ArrayList<String> arrayList = new ArrayList<String>();
start = System.currentTimeMillis();
for (int i = 0; i < 900000; i++) {
    arrayList.add(String.valueOf(i));
}
System.out.println("Insert ArrayList Time: " + (System.currentTimeMillis() - start));
result:
Insert HashSet Time: 978
Insert ArrayList Time: 287
I ran this main method many times and the results did not differ much: the ArrayList insert time is much shorter than the HashSet insert time.
Can anybody explain this weird result?
HashSet and List are different types of data structures, so you should think about what you want to do with them before choosing one.
HashSet
Longer insert time
Fast access time on elements
List
Fast append time
Long access time on elements
The list is faster because it can just add the element at the end of the list; the hash set has to find where to insert it and then make the element accessible, which is more work (time) than adding it to the end of a list.
Exact performance characteristics of datastructures and algorithms are highly machine- and implementation-specific. However, it doesn't seem surprising to me that ArrayList inserts would be faster than HashSet inserts by a constant factor. To insert into an ArrayList, you just need to set a value at a particular index in an array. To insert into a hash set, you need to compute a hashcode for the inserted item and map that to an array index, check that index and possibly perform some action based on what you find, and finally insert into the array. Furthermore the HashSet will have worse memory locality so you'll get cache misses more often.
There's also the question of array resizing, which both data structures will need to do, but both data structures will need to resize at about the same rate (and hash table resizing is probably more expensive by a constant factor, too, due to rehashing).
Both algorithms are constant (expected) time, but there's a lot more stuff to do with a hash table than an array list. So it's not surprising that it would be slower by a constant factor. (Again, the exact difference is highly dependent on machine and implementation.)
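If you want to take resizing out of the picture in the question's benchmark, a rough variant is to pre-size both collections (still not a rigorous benchmark; JIT warm-up and GC pauses are not controlled for):
// 900000 / 0.75 = 1200000, so this initial capacity keeps the HashSet below its default load factor.
HashSet<String> hashSet = new HashSet<String>(1200000);
ArrayList<String> arrayList = new ArrayList<String>(900000);

long start = System.currentTimeMillis();
for (int i = 0; i < 900000; i++) {
    hashSet.add(String.valueOf(i));
}
System.out.println("Pre-sized HashSet insert: " + (System.currentTimeMillis() - start));

start = System.currentTimeMillis();
for (int i = 0; i < 900000; i++) {
    arrayList.add(String.valueOf(i));
}
System.out.println("Pre-sized ArrayList insert: " + (System.currentTimeMillis() - start));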
the HashSet insert performance should be much better than ArrayList's
Where did you get that idea?
HashSet will outperform ArrayList on search i.e: get().
But on insert they have comparable performance. Actually, ArrayList is even faster if you stay within the array's capacity (no resize needed) and the hash function is not good.
Actually, you are getting the right results. Also, as pointed out in the answer above, these are different types of data structures; comparing them is like comparing the speed of a bike with a car. I think the time for inserting into a HashSet must be more than that for inserting into an ArrayList because a HashSet doesn't allow duplicates, so before insertion there must be some check for duplicates, and some handling of them, which makes it somewhat slower than ArrayList.
HashSet is backed by a hash table. If you know about hash tables, you know there is a hash function, and also collision handling when you add a new element: colliding entries are chained within a bucket, and if you add an element equal to an existing one the set simply keeps the old one. Moreover, when the capacity is reached it needs to resize, and possibly rehash, which can be slow.
ArrayList just appends the object to the end of the list; when the capacity is reached, it resizes.

How to reduce the complexity of the search-in-two-lists algorithm?

I have to find some common items in two lists. I cannot sort them; order is important. I have to find how many elements from secondList occur in firstList. Right now it looks like this:
int[] firstList;
int[] secondList;
int iterator = 0;
for (int i : firstList) {
    while (i <= secondList[iterator] /* two conditions more */) {
        iterator++;
        // some actions
    }
}
The complexity of this algorithm is n × n. I'm trying to reduce the complexity of this operation, but I don't know how to compare the elements in a different way. Any advice?
EDIT:
Example: A = 5,4,3,2,3 and B = 1,2,3.
We look for pairs B[i], A[j].
Condition:
when B[i] < A[j], then j++;
when B[i] >= A[j], return the pair B[i], A[j-1], then start the next iteration through list A up to element j-1 (meaning for(int z = 0; z < j-1; z++)).
I'm not sure, did I make myself clear?
Duplicates are allowed.
My approach would be: put all the elements from the first array in a HashSet and then iterate over the second array. This reduces the complexity to the sum of the lengths of the two arrays. It has the downside of taking additional memory, but unless you use more memory I don't think you can improve on your brute-force solution.
EDIT: to avoid further dispute on the matter: if you are allowed to have duplicates in the first array and you actually care how many times an element in the second array matches an element in the first one, use a HashMultiSet.
Put all the items of the first list in a set.
For each item of the second list, test whether it's in the set.
Solved in less than n × n!
Edit to please fge :)
Instead of a set, you can use a map with the item as key and the number of occurrences as value.
Then, for each item of the second list, if it exists in the map, execute your action once per occurrence in the first list (the map entry's value).
import java.util.*;

int[] firstList;
int[] secondList;

// Arrays.asList does not box an int[], so add the elements explicitly
Set<Integer> hs = new HashSet<>();
for (int value : firstList) {
    hs.add(value);
}

Set<Integer> result = new HashSet<>();
for (int i = 0; i < secondList.length; i++) {
    if (hs.contains(secondList[i])) {
        result.add(secondList[i]);
    }
}
result will contain the required common elements.
Algorithm complexity: O(n).
Just because the order is important doesn't mean that you cannot sort either list (or both). It only means you will have to copy first before you can sort anything. Of course, copying requires additional memory and sorting requires additional processing time... yet I guess all solutions that are better than O(n^2) will require additional memory and processing time (also true for the suggested HashSet solutions - adding all values to a HashSet costs additional memory and processing time).
Sorting both lists is possible in O(n * log n) time, and finding common elements once the lists are sorted is possible in O(n) time. Whether it will be faster than your naive O(n^2) approach depends on the size of the lists. In the end, only testing different approaches can tell you which approach is fastest (and those tests should use realistic list sizes, as expected in your final code).
The Big-O notation is no notation that tells you anything about absolute speed, it only tells you something about relative speed. E.g. if you have two algorithms to calculate a value from an input set of elements, one is O(1) and the other one is O(n), this doesn't mean that the O(1) solution is always faster. This is a big misconception of the Big-O notation! It only means that if the number of input elements doubles, the O(1) solution will still take approx. the same amount of time while the O(n) solution will take approx. twice as much time as before. So there is no doubt that by constantly increasing the number of input elements, there must be a point where the O(1) solution will become faster than the O(n) solution, yet for a very small set of elements, the O(1) solution may in fact be slower than the O(n) solution.
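A sketch of that sort-then-scan idea (an illustration only; it counts matching pairs between sorted copies, and the exact duplicate semantics from the question's edit are not reproduced):
import java.util.Arrays;

static int countCommon(int[] firstList, int[] secondList) {
    // Sort copies so the original order stays untouched: O(n log n).
    int[] a = firstList.clone();
    int[] b = secondList.clone();
    Arrays.sort(a);
    Arrays.sort(b);

    // Single merge-like pass over both sorted copies: O(n).
    int i = 0, j = 0, count = 0;
    while (i < a.length && j < b.length) {
        if (a[i] < b[j]) {
            i++;
        } else if (a[i] > b[j]) {
            j++;
        } else {            // a[i] == b[j]: one matching pair
            count++;
            i++;
            j++;
        }
    }
    return count;
}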
OK, so this solution will work if there are no duplicates in either the first or second array. As the question does not say, we cannot be sure.
First, build a LinkedHashSet<Integer> out of the first array, and a HashSet<Integer> out of the second array.
Second, retain in the first set only elements that are in the second set.
Third, iterate over the first set and proceed:
// A LinkedHashSet retains insertion order
// (firstArray and secondArray are assumed to be Integer[]; Arrays.asList would not box an int[])
Set<Integer> first = new LinkedHashSet<Integer>(Arrays.asList(firstArray));
// A HashSet does not, but we don't care
Set<Integer> second = new HashSet<Integer>(Arrays.asList(secondArray));
// Retain in first only what is also in second
first.retainAll(second);
// Iterate
for (int i : first)
    doSomething();

picking without replacement in java

I often* find myself in need of a data structure which has the following properties:
can be initialized with an array of n objects in O(n).
one can obtain a random element in O(1); after this operation the picked element is removed from the structure (without replacement).
one can undo p 'picking without replacement' operations in O(p).
one can remove a specific object (eg by id) from the structure in O(log(n)).
one can obtain an array of the objects currently in the structure in O(n).
the complexity (or even possibility) of other actions (eg insert) does not matter. Besides the complexity, it should also be efficient for small values of n.
Can anyone give me guidelines on implementing such a structure? I have currently implemented a structure having all of the above properties, except that picking an element takes O(d), with d the number of past picks (since I explicitly check whether it is 'not yet picked'). I can think of structures allowing picking in O(1), but those have higher complexities for at least one of the other operations.
BTW:
note that O(1) above implies that the complexity is independent of the number of earlier picked elements and independent of the total number of elements.
*in Monte Carlo algorithms (iterative picks of p random elements from a 'set' of n elements).
HashMap has complexity O(1) both for insertion and removal.
You specify a lot of operations, but all of them are nothing more than insertion, removal and traversal:
can be initialized with an array of n objects in O(n).
n * O(1) insertions. HashMap is fine.
one can obtain a random element in O(1); after this operation the picked element is removed from the structure (without replacement).
This is the only operation that requires O(n).
one can undo p 'picking without replacement' operations in O(p)
It's an insertion operation: O(1) each.
one can remove a specific object (eg by id) from the structure in O(log(n))
O(1).
one can obtain an array of the objects currently in the structure in O(n).
You can traverse a HashMap in O(n).
EDIT:
example of picking a random element in O(n):
HashMap<Integer, Object> map = ...
int randomIntFromZeroToYourHashMapSize = ...
Collection<Object> collection = map.values();
Object[] values = collection.toArray();
Object picked = values[randomIntFromZeroToYourHashMapSize];
OK, same answer as 0verbose with a simple fix to get the O(1) random lookup. Create an array which stores the same n objects. Now, in the HashMap, store the pairs (object, index). For example, say your objects (strings for simplicity) are:
{"abc" , "def", "ghi"}
Create a list:
List<String> array = new ArrayList<String>(Arrays.asList("abc", "def", "ghi"));
Create a HashMap map with the following values:
Map<String, Integer> map = new HashMap<String, Integer>();
for (int i = 0; i < array.size(); i++) {
    map.put(array.get(i), i);
}
O(1) random lookup is easily achieved by picking any index in the array. The only complication that arises is when you delete an object. For that, do:
Find the object in the map and get its array index; let's call this index i (map.get(object)) - O(1)
Swap array[i] with array[size of array - 1] (the last element in the array), and reduce the size of the array by 1 (since there is one less element now) - O(1)
Update the index of the object now at position i of the array in the map (map.put(array.get(i), i)) - O(1)
I apologize for the mix of java and cpp notation, hope this helps
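For completeness, a minimal sketch of the whole array-plus-index-map structure (class and method names are mine, not from the answer; it assumes distinct elements):
import java.util.*;

class RandomPicker<T> {
    private final List<T> items = new ArrayList<>();          // dense array of current elements
    private final Map<T, Integer> indexOf = new HashMap<>();  // element -> its index in items
    private final Random random = new Random();

    RandomPicker(Collection<T> initial) {                     // O(n) initialization
        for (T item : initial) {
            add(item);
        }
    }

    void add(T item) {                                        // O(1); also used to undo a pick
        indexOf.put(item, items.size());
        items.add(item);
    }

    T pick() {                                                // O(1) pick without replacement
        int i = random.nextInt(items.size());
        T picked = items.get(i);
        removeAt(i);
        return picked;
    }

    boolean remove(T item) {                                  // O(1) expected removal by value
        Integer i = indexOf.get(item);
        if (i == null) {
            return false;
        }
        removeAt(i);
        return true;
    }

    List<T> snapshot() {                                      // O(n) copy of the remaining elements
        return new ArrayList<>(items);
    }

    private void removeAt(int i) {
        int last = items.size() - 1;
        T removed = items.get(i);
        T moved = items.get(last);
        items.set(i, moved);          // move the last element into the freed slot (no-op if i == last)
        items.remove(last);           // removing the last slot of an ArrayList is O(1)
        indexOf.remove(removed);
        if (i < last) {
            indexOf.put(moved, i);    // record the moved element's new index
        }
    }
}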
Here's my analysis of using Collections.shuffle() on an ArrayList:
✔ can be initialized with an array of n objects in O(n).
Yes, although the cost is amortized unless n is known in advance.
✔ one can obtain a random element in O(1), after this operation the picked element is removed from the structure, without replacement.
Yes, choose the last element in the shuffled array; replace the array with a subList() of the remaining elements.
✔ one can undo p 'picking without replacement' operations in O(p).
Yes, append the element to the end of this list via add().
❍ one can remove a specific object (eg by id) from the structure in O(log(n)).
No, it looks like O(n).
✔ one can obtain an array of the objects currently in the structure in O(n).
Yes, using toArray() looks reasonable.
How about an array (or ArrayList) that's divided into "picked" and "unpicked"? You keep track of where the boundary is, and to pick, you generate a random index below the boundary, then (since you don't care about order), swap the item at that index with the last unpicked item, and decrement the boundary. To unpick, you just increment the boundary.
Update: Forgot about O(log(n)) removal. Not that hard, though, just a little memory-expensive, if you keep a HashMap of IDs to indices.
If you poke around on line you'll find various IndexedHashSet implementations that all work on more or less this principle -- an array or ArrayList plus a HashMap.
(I'd love to see a more elegant solution, though, if one exists.)
Update 2: Hmm... or does the actual removal become O(n) again, if you have to either recopy the arrays or shift them around?

What is the time complexity of java.util.HashMap class' keySet() method?

I am trying to implement a plane sweep algorithm and for this I need to know the time complexity of java.util.HashMap class' keySet() method. I suspect that it is O(n log n). Am I correct?
Point of clarification: I am talking about the time complexity of the keySet() method; iterating through the returned Set will obviously take O(n) time.
Getting the keyset is O(1) and cheap. This is because HashMap.keyset() returns the actual KeySet object associated with the HashMap.
The returned Set is not a copy of the keys, but a wrapper for the actual HashMap's state. Indeed, if you update the set you can actually change the HashMap's state; e.g. calling clear() on the set will clear the HashMap!
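A quick illustration of that view behaviour (a small sketch, assuming java.util.* is imported):
Map<String, Integer> map = new HashMap<>();
map.put("a", 1);
map.put("b", 2);

Set<String> keys = map.keySet();    // O(1): returns a view, not a copy
keys.remove("a");                   // removing from the view removes the entry from the map
System.out.println(map);            // {b=2}
keys.clear();                       // clears the map as well
System.out.println(map.isEmpty());  // true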
... iterating through the returned Set will take obviously O(n) time.
Actually that is not always true:
It is true for a HashMap created using new HashMap<>(). The worst case is to have all N keys land in the same hash chain. However, if the map has grown naturally, there will still be N entries and O(N) slots in the hash array. Thus iterating the entry set will involve O(N) operations.
It is false if the HashMap is created with new HashMap<>(capacity) and a singularly bad (too large) capacity estimate. Then it will take O(Cap) + O(N) operations to iterate the entry set. If we treat Cap as a variable, that is O(max(Cap, N)), which could be worse than O(N).
There is an escape clause though. Since capacity is an int in the current HashMap API, the upper bound for Cap is 2^31. So for really large values of Cap and N, the complexity is O(N).
On the other hand, N is limited by the amount of memory available, and in practice you need a heap on the order of 2^38 bytes (256 GBytes) for N to exceed the largest possible Cap value. And for a map that size, you would be better off using a hashtable implementation tuned for huge maps. Or not using an excessively large capacity estimate!
Surely it would be O(1). All that it is doing is returning a wrapper object on the HashMap.
If you are talking about walking over the keyset, then this is O(n), since each next() call is O(1), and this needs to be performed n times.
This should be doable in O(n) time... A hash map is usually implemented as a large bucket array, and the bucket array's size is (usually) directly proportional to the size of the hash map. In order to retrieve the key set, the bucket array must be iterated through, and for each stored item the key must be retrieved (either through an intermediate collection or via an iterator with direct access to the buckets)...
EDIT: As others have pointed out, the actual keySet() method will run in O(1) time; however, iterating over the key set or transferring it to a dedicated collection will be an O(n) operation. Not quite sure which one you are looking for.
Java collections trade space for time, so this doesn't take much time: that method is, I believe, O(1). The collection is just sitting there, waiting to be returned.
To address the "iterating through the returned Set will take obviously O(n) time" comment, this is not actually correct per the doc comments of HashMap:
Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
So in other words, iterating over the returned Set will take O(n + c) where n is the size of the map and c is its capacity, not O(n). If an inappropriately sized initial capacity or load factor were chosen, the value of c could outweigh the actual size of the map in terms of iteration time.
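A small illustration of that effect (the capacity value is arbitrary):
// An oversized initial capacity makes iteration slow even for a tiny map.
Map<Integer, Integer> map = new HashMap<>(1_000_000);   // ~2^20 buckets once the table is allocated
map.put(1, 1);
map.put(2, 2);

// This loop walks O(capacity + size) buckets, not O(size).
for (Integer key : map.keySet()) {
    System.out.println(key);
}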
