Why is my algorithm O(1) additional space complexity? - java

I solved this problem from codefights:
Note: Write a solution with O(n) time complexity and O(1) additional space complexity, since this is what you would be asked to do during a real interview.
Given an array a that contains only numbers in the range from 1 to a.length, find the first duplicate number for which the second occurrence has the minimal index. In other words, if there are more than 1 duplicated numbers, return the number for which the second occurrence has a smaller index than the second occurrence of the other number does. If there are no such elements, return -1.
int firstDuplicate(int[] a) {
HashSet z = new HashSet();
for (int i: a) {
if (z.contains(i)){
return i;
}
z.add(i);
}
return -1;
}
My solution passed all of the tests. However I don't understand how my solution met the O(1) additional space complexity requirement. The size of the hashtable is directly proportional to the input so I would think it is O(n) space complexity. Did codefights incorrectly test my algorithm or am I misunderstanding something?

Your code doesn’t have O(1) auxiliary space complexity, since that hash set can grow up to size n if given an array of all different elements.
My guess is that the online testing infrastructure didn’t check memory usage or otherwise checked memory usage incorrectly. If you want to meet the space constraints, you’ll need to go back and try solving the problem a different way.
As a hint, think about reordering the array elements.

If you are allowed to modify the incoming array, you can solve the problem in O(n) time without using any extra memory.
public static int getFirstDuplicate(int... arr) {
for (int i = 0; i < arr.length; i++) {
int val = Math.abs(arr[i]);
if (arr[val - 1] < 0)
return val;
arr[val - 1] = -arr[val - 1];
}
return -1;
}
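A minimal sketch of how the sign-marking approach above behaves on a sample input, assuming every value is in the range 1 to a.length as the problem states. The class name FirstDuplicateDemo and the final restore loop are illustrative additions, not part of the answer; the restore loop is only there for callers who need the array back in its original state.

```java
import java.util.Arrays;

class FirstDuplicateDemo {
    // Marks value v as "seen" by negating the element stored at index v - 1.
    // Relies on every value being in the range 1..a.length.
    static int firstDuplicate(int[] a) {
        int result = -1;
        for (int i = 0; i < a.length; i++) {
            int val = Math.abs(a[i]);
            if (a[val - 1] < 0) {
                result = val; // val was already marked, so this is its second occurrence
                break;
            }
            a[val - 1] = -a[val - 1];
        }
        // Hypothetical extra step: undo the sign flips so the caller's array is unchanged.
        for (int i = 0; i < a.length; i++) {
            a[i] = Math.abs(a[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] input = {2, 1, 3, 5, 3, 2};
        System.out.println(firstDuplicate(input));  // 3
        System.out.println(Arrays.toString(input)); // [2, 1, 3, 5, 3, 2]
    }
}
```

The only extra storage is a handful of local variables, which is what makes the additional space O(1); the trade-off is that the input is temporarily mutated.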

Your solution technically does not meet the O(1) requirement, for two reasons.
Firstly, depending on the values in the array, there may be overhead when the ints are boxed into Integers and added to the HashSet.
Secondly, while the additional memory is largely the overhead associated with a HashSet, that overhead is linearly proportional to the size of the set. (Note that I am not counting the elements in this, as they are already present in the array.)
Usually, these memory constraints are tested by limiting the amount of memory a solution may use. I would expect a solution like this to fall below that threshold.

Related

Algorithm Complexity: Is it the same to iterate an array from the start than from the end?

In an interview I was asked for the following:
public class Main {
public static void main(String[] args) {
// TODO Auto-generated method stub
int [] array = new int [10000];
for (int i = 0; i < array.length; i++) {
// do calculations
}
for (int x = array.length-1; x >= 0; x--) {
// do calculations
}
}
}
Is it the same to iterate an array either from the end or from the start? As my understanding it would be the same since complexity is constant i.e O(1) ? Am I correct?
Also I was asked regarding ArrayList Complexity compared to other collections in java, for example, LinkedList.
Thank you.
There can be a difference due to CPU prefetch characteristics.
There is no difference between looping in either direction as per computational theory. However, depending on the kind of prefetcher that is used by the CPU on which the code runs, there will be some differences in practice.
For example, the Sandy Bridge Intel processor has a prefetcher that goes forward only for data (while instructions could be prefetched in both directions). This will help iteration from the start (as future memory locations are prefetched into the L1 cache), while iterating from the end will cause very little to no prefetching, and hence more accesses to RAM which is much slower than accessing any of the CPU caches.
There is a more detailed discussion about forward and backward prefetching at this link.
It's O(n) in both cases for an array, as there are n iterations and each step takes O(1) (assuming the calculations in the loop take O(1)). In particular, obtaining the length or size is typically an O(1) operation for arrays or ArrayList.
A typical use case for iterating from the end is removing elements in the loop (which may otherwise require more complex accounting to avoid skipping elements or iterating beyond the end).
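As a small sketch of that use case (the class and method names here are made up for illustration), removing elements while walking an ArrayList from the last index to the first needs no index adjustment, because only elements after the current index shift:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReverseRemovalDemo {
    // Removes every even number in place, walking from the last index to the first.
    static void removeEvens(List<Integer> values) {
        for (int i = values.size() - 1; i >= 0; i--) {
            if (values.get(i) % 2 == 0) {
                values.remove(i); // remove(int index): only elements after i shift left
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>(Arrays.asList(1, 2, 4, 5, 6, 7));
        removeEvens(values);
        System.out.println(values); // [1, 5, 7]
    }
}
```

A forward loop doing the same thing would have to avoid incrementing i after each removal, or it would skip the element that slid into the freed slot.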
For a linked list, the first loop would typically be O(n²), as determining the length of a linked list is typically an O(n) operation without additional caching, and it's used every time the exit condition is checked. However, java.util.LinkedList explicitly keeps track of the length, so the total is O(n) for linked lists in Java.
If an element in a linked list is accessed using the index in the calculations, this will be an O(n) operation, yielding a total of O(n²).
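The difference between indexed access and iterator-based access can be sketched as follows. The class and method names are hypothetical; both methods compute the same sum, but on a java.util.LinkedList the first one does O(n) work per get call.

```java
import java.util.LinkedList;
import java.util.List;

class LinkedListTraversalDemo {
    // O(n^2) on a LinkedList: each get(i) has to walk the nodes to reach index i.
    static long sumByIndex(List<Integer> list) {
        long sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);
        }
        return sum;
    }

    // O(n) on any List: the iterator behind the for-each loop advances one element per step.
    static long sumByIterator(List<Integer> list) {
        long sum = 0;
        for (int value : list) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>();
        for (int i = 1; i <= 1000; i++) {
            list.add(i);
        }
        System.out.println(sumByIndex(list));    // 500500
        System.out.println(sumByIterator(list)); // 500500
    }
}
```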
Is it the same to iterate an array either from the end or from the start? As my understanding it would be the same since complexity is constant i.e O(1) ? Am I correct?
In theory, yes it's the same to iterate an array from the end and from the start.
The time complexity is O(10,000) which is constant so O(1) assuming that the loop body has a constant time complexity. But it's nice to mention that the constant 10,000 can be promoted to a variable, call it N, and then you can say that the time complexity is O(N).
Also I was asked regarding ArrayList Complexity compared to other collections in java, for example, LinkedList.
Here you can find comparison between ArrayList and LinkedList time complexity. The interesting methods are add, remove and get.
http://www.programcreek.com/2013/03/arraylist-vs-linkedlist-vs-vector/
Also, the elements of a LinkedList are not stored consecutively in memory, whereas an ArrayList keeps its elements in a contiguous backing array and also uses less space than a LinkedList.
Good luck!

How to work out worst-case run time complexity of this simple algorithm

I am trying to learn how to work out worst case run time complexity and I was wondering if someone could explain in detail how to work it out for the code below by highlighting all the operations etc.... Sorry for the awful code example, I'm very aware it is the most pointless method ever!
public int two(int[] n){
int twosFound=0;
int other=0;
for (int i = 0; i < n.length; i++) {
if (n[i]==2) {
System.out.println("2 is found");
twosFound++;
}
else other++;
}
return twosFound;
}
Start with the basic building blocks, and then work your way upwards:
initializing an int variable does not depend on the array length, therefore it is O(1).
(By the way, it is a terrible idea to call the array n when you want to talk about O(n). This can easily lead to confusion.)
Accessing n.length takes constant time. On the other hand, if n were a linked list or some other data structure, you would have to further analyze it.
Accessing n[i] takes constant time.
System.out.println takes constant time, since it neither depends on the array length nor on the array contents.
Incrementing an int variable takes constant time.
The if statement takes the worst of whatever its components take; in this case it is constant time.
The for loop iterates linearly over the array, therefore it takes O(n.length), multiplied by O(whatever happens inside the body of the for statement).
In this case, this is O(n) * O(1) = O(n).
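One way to check this reasoning empirically is to count how often the loop body runs. The sketch below is an instrumented variant of the method from the question; the class name and the executions counter are additions for illustration only.

```java
class LoopCostDemo {
    // Same loop as in the question, but returning how many times the body executed.
    static long countBodyExecutions(int[] values) {
        long executions = 0;
        int twosFound = 0;
        int other = 0;
        for (int i = 0; i < values.length; i++) {
            executions++; // one constant-time body execution per element
            if (values[i] == 2) {
                twosFound++;
            } else {
                other++;
            }
        }
        return executions;
    }

    public static void main(String[] args) {
        System.out.println(countBodyExecutions(new int[1000])); // 1000
        System.out.println(countBodyExecutions(new int[2000])); // 2000
    }
}
```

Doubling the array length doubles the count regardless of the array contents, which is what O(n) with identical best and worst cases looks like.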
Your worst-case complexity does not differ from the other cases. That is because of your algorithm: its running time depends solely on the number of elements in the array. You test each element against a condition, so the more elements you have, the longer the algorithm takes to run.
In fact, it is clear that the time complexity is O(number_of_elements), meaning that the running time depends linearly on the number of elements. If you take an array twice as large, the running time also roughly doubles.
The time complexity is O(n) (actually, exactly n iterations) because control visits all n elements of the array, and there are no extra conditions that terminate the loop early.

How to reduce the complexity of an algorithm that searches two lists?

I have to find common items in two lists. I cannot sort them, because order is important. I have to find how many elements from secondList occur in firstList. Currently the code looks like this:
int[] firstList;
int[] secondList;
int iterator=0;
for(int i:firstList){
while(i <= secondList[iterator]/* two conditions more */){
iterator++;
//some actions
}
}
The complexity of this algorithm is n x n. I am trying to reduce it, but I don't know how to compare the elements in a different way. Any advice?
EDIT:
Example: A=5,4,3,2,3 B=1,2,3
We look for pairs B[i],A[j]
Condition:
when
B[i] < A[j]
j++
when
B[i] >= A[j]
return B[i],A[j-1]
The next iteration then goes through list A only up to element j-1 (meaning for(int z=0;z<j-1;z++)).
I'm not sure if I made myself clear.
Duplicates are allowed.
My approach would be - put all the elements from the first array in a HashSet and then do an iteration over the second array. This reduces the complexity to the sum of the lengths of the two arrays. It has the downside of taking additional memory, but unless you use more memory I don't think you can improve your brute force solution.
EDIT: to avoid further dispute on the matter: if you are allowed to have duplicates in the first array and you actually care how many times an element in the second array matches an element in the first one, use HashMultiSet.
Put all the items of the first list in a set
For each item of the second list, test if it's in the set.
Solved in less than n x n!
Edit to please fge :)
Instead of a set, you can use a map with the item as key and the number of occurrence as value.
Then for each item of the second list, if it exists in the map, execute your action once per occurence in the first list (dictionary entries' value).
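A minimal sketch of the map-based variant, using the arrays from the question's example. The class name CommonItemsDemo and the method countMatches are hypothetical, and "executing the action" is reduced here to adding to a counter.

```java
import java.util.HashMap;
import java.util.Map;

class CommonItemsDemo {
    // For each item of secondList, counts one match per occurrence of that item in firstList.
    static long countMatches(int[] firstList, int[] secondList) {
        Map<Integer, Integer> occurrences = new HashMap<>();
        for (int value : firstList) {
            occurrences.merge(value, 1, Integer::sum); // item -> number of occurrences
        }
        long matches = 0;
        for (int value : secondList) {
            matches += occurrences.getOrDefault(value, 0);
        }
        return matches;
    }

    public static void main(String[] args) {
        int[] a = {5, 4, 3, 2, 3};
        int[] b = {1, 2, 3};
        System.out.println(countMatches(a, b)); // 3 (2 matches once, 3 matches twice)
    }
}
```

Both loops are single passes, so the expected running time is proportional to the sum of the two lengths, at the cost of the extra map.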
import java.util.*;
int[] firstList;
int[] secondList;
// Arrays.asList(int[]) would produce a List<int[]>, so box the elements one by one.
Set<Integer> hs = new HashSet<>();
for (int value : firstList) {
    hs.add(value);
}
Set<Integer> result = new HashSet<>();
// Iterate over every element of secondList without running past the end.
for (int value : secondList) {
    if (hs.contains(value)) {
        result.add(value);
    }
}
result will contain the required common elements.
The algorithm's complexity is O(n).
Just because the order is important doesn't mean that you cannot sort either list (or both). It only means you will have to copy first before you can sort anything. Of course, copying requires additional memory and sorting requires additional processing time... yet I guess all solutions that are better than O(n^2) will require additional memory and processing time (also true for the suggested HashSet solutions - adding all values to a HashSet costs additional memory and processing time).
Sorting both lists is possible in O(n * log n) time, and finding common elements once the lists are sorted is possible in O(n) time. Whether it will be faster than your naive O(n^2) approach depends on the size of the lists. In the end, only testing different approaches can tell you which one is fastest (and those tests should use realistic list sizes, as expected in your final code).
The Big-O notation is no notation that tells you anything about absolute speed, it only tells you something about relative speed. E.g. if you have two algorithms to calculate a value from an input set of elements, one is O(1) and the other one is O(n), this doesn't mean that the O(1) solution is always faster. This is a big misconception of the Big-O notation! It only means that if the number of input elements doubles, the O(1) solution will still take approx. the same amount of time while the O(n) solution will take approx. twice as much time as before. So there is no doubt that by constantly increasing the number of input elements, there must be a point where the O(1) solution will become faster than the O(n) solution, yet for a very small set of elements, the O(1) solution may in fact be slower than the O(n) solution.
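The copy-sort-scan idea from this answer could look roughly like the sketch below. The class and method names are invented for illustration, and the sketch reports each common value once, which is an assumption about what the asker needs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SortedIntersectionDemo {
    // Sorts copies of both arrays (the originals keep their order), then scans them in lockstep.
    static List<Integer> commonValues(int[] firstList, int[] secondList) {
        int[] a = firstList.clone();
        int[] b = secondList.clone();
        Arrays.sort(a);
        Arrays.sort(b);
        List<Integer> common = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                int value = a[i];
                common.add(value);
                // Skip further copies so each common value is reported once.
                while (i < a.length && a[i] == value) i++;
                while (j < b.length && b[j] == value) j++;
            }
        }
        return common;
    }

    public static void main(String[] args) {
        System.out.println(commonValues(new int[]{5, 4, 3, 2, 3}, new int[]{1, 2, 3})); // [2, 3]
    }
}
```

The scan after sorting is linear because each step advances at least one of the two indices.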
OK, so this solution will work if there are no duplicates in either the first or second array. As the question does not tell, we cannot be sure.
First, build a LinkedHashSet<Integer> out of the first array, and a HashSet<Integer> out of the second array.
Second, retain in the first set only elements that are in the second set.
Third, iterate over the first set and proceed:
// A LinkedHashSet retains insertion order
Set<Integer> first = new LinkedHashSet<Integer>(Arrays.asList(firstArray));
// A HashSet does not but we don't care
Set<Integer> second = new HashSet<Integer>(Arrays.asList(secondArray));
// Retain in first only what is in second
first.retainAll(second);
// Iterate
for (int i: first)
doSomething();

Calculating run time analysis on a few methods

I am preparing for my exam and having a little trouble on run time analysis. I have 2 methods below that I am confused on the run time analysis for:
public boolean findDuplicates(String [] arr) {
Hashtable<String,String> h = new Hashtable<String,String>();
for (int i = 0; i < arr.length; i++) {
if (h.get(arr[i]) == null)
h.put(arr[i], arr[i]);
else
return true;
}
return false;
}
Assuming that the hash function takes only O(1) on any key, would the run time simply be O(n), since in the worst case the loop runs through the entire array? Am I thinking along the right lines if each hash function call takes constant time to evaluate?
The other problem I have seems much more complicated, and I don't know exactly how to approach it. Assume these are ArrayLists.
public boolean makeTranslation(List<Integer> lst1, List<Integer> lst2) {
//both lst1 and lst2 are same size and size is positive
int shift = lst1.get(0) - lst2.get(0);
for (int i = 1; i < lst1.size(); i++)
if ( (lst1.get(i) - lst2.get(i)) != shift)
return false;
return true;
}
In this case, the get operations are supposed to be constant since we are simply retrieving a particular index values. But in the for loop, we are both comparing it to shift and also iterating over all elements. How exactly would this translate to run time?
A helpful explanation would be much appreciated, since I have a harder time understanding run time analysis than anything else in this course and my final is next week.
The short answer: both methods have time complexity of O(n).
For hash, it is clear that both get and put operations take constant time.
For the list, if you use the ArrayList implementation (which is likely), the get method takes constant time as well. This is because an ArrayList in Java is a List that is backed by an array.
Code for ArrayList.get(index) in the standard Java library:
public E get(int index) {
RangeCheck(index);
return (E) elementData[index];
}
RangeCheck probably did two comparisons, which is constant time. Returning a value from an array is obviously constant time. Thus, the get method for ArrayList takes constant time.
As for your specific concern mentioned in the OP:
But in the for loop, we are both comparing it to shift and also
iterating over all elements. How exactly would this translate to run
time?
lst1.get(i) takes constant time. lst2.get(i) takes constant time. Thus, lst1.get(i) - lst2.get(i) takes constant time. The same holds for (lst1.get(i) - lst2.get(i)) != shift. The idea is the sum of a constant number of constant time operations is still constant time. Since the loop iterates up to n times, the total time is O(Cn), i.e., O(n) where C is a constant.
And...it never hurts to have a brief review of the big O notation just before final :)
In general, O() expresses the complexity of an algorithm, which is generally the number of operations, assuming the cost of each operation is constant. For example, O(1000n) is the same as O(n): each operation costs 1000 and there are n operations, but constant factors are dropped.
So assuming get and put are constant (depends on library implementation) for every value, the time for both would be O(n).
For more information see:
http://en.wikipedia.org/wiki/Big_O_notation
Big-O notation is not very precise, as you omit constant factors and lower-order terms. So even if you perform 2 constant-time operations n times, it is still O(n). In reality the cost is (1+1)n = 2n, but in Big-O notation the constant factor is dropped (even if it's 10000n). So, for both these cases the run time is O(n).
In practice, I suggest typing out the costs for each loop and each operation in the worst case. Start from the innermost nested level and multiply outwards (with only the highest cost of each level).
For example:
for (int i = 0; i<n; i++) { //n times
//log n operation
for (int j = 0; j<n; j++) { //n times
//constant operation
}
}
Here, we have n*(log(n)+n*1)=O(n*n) as n>log(n)
It's worth echoing, but both of these operations are O(n) (for #2, that's the worst case). The key thing to note is the number of critical operations done each iteration through.
For your first snippet, the Hashtable is a bit of a red herring, since access time isn't going to be your largest operation in the loop. It's also the case that, since that Hashtable was just new'd, in the worst case (no duplicates in the array) you'll be inserting all n elements into it.
For your second snippet, you have a chance to end early. If the next elements' difference isn't shift, then you return false right there and then, which was only one operation. In the worst case, you'll be going through all n and returning.

Best way to write this program

I have a general programming question, that I have happened to use Java to answer. This is the question:
Given an array of ints write a program to find out how many numbers that are not unique are in the array. (e.g. in {2,3,2,5,6,1,3} 2 numbers (2 and 3) are not unique). How many operations does your program perform (in O notation)?
This is my solution.
int counter = 0;
for(int i=0;i<theArray.length-1;i++){
for(int j=i+1;j<theArray.length;j++){
if(theArray[i]==theArray[j]){
counter++;
break; //go to next i since we know it isn't unique we dont need to keep comparing it.
}
}
}
return counter;
Now, In my code every element is being compared with every other element so there are about n(n-1)/2 operations. Giving O(n^2). Please tell me if you think my code is incorrect/inefficient or my O expression is wrong.
Why not use a Map as in the following example:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
// NOTE: Map keys and values must be objects, so the int elements of theArray
// are autoboxed to Integer when they are put into the map.
Map<Integer, Integer> elementCount = new HashMap<Integer, Integer>();
for (int i = 0; i < theArray.length; i++) {
    if (elementCount.containsKey(theArray[i])) {
        elementCount.put(theArray[i], elementCount.get(theArray[i]) + 1);
    } else {
        elementCount.put(theArray[i], 1);
    }
}
List<Integer> moreThanOne = new ArrayList<Integer>();
for (Integer key : elementCount.keySet()) {
    if (elementCount.get(key) > 1) {
        moreThanOne.add(key);
    }
}
// do whatever you want with the moreThanOne list
Notice that this method requires iterating through the list twice (I'm sure there's a way to do it iterating once). It iterates once through theArray, and then implicitly again as it iterates through the key set of elementCount, which if no two elements are the same, will be exactly as large. However, iterating through the same list twice serially is still O(n) instead of O(n^2), and thus has much better asymptotic running time.
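One possible way to get the answer in a single pass over the array, sketched under the assumption that only the number of non-unique values is needed: keep one set of values seen so far and a second set of values seen more than once. The class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

class NonUniqueCountDemo {
    // Counts how many distinct values occur more than once in theArray.
    static int countNonUnique(int[] theArray) {
        Set<Integer> seen = new HashSet<>();
        Set<Integer> repeated = new HashSet<>();
        for (int value : theArray) {
            // Set.add returns false when the value was already present.
            if (!seen.add(value)) {
                repeated.add(value);
            }
        }
        return repeated.size();
    }

    public static void main(String[] args) {
        System.out.println(countNonUnique(new int[]{2, 3, 2, 5, 6, 1, 3})); // 2
        System.out.println(countNonUnique(new int[]{2, 2, 2, 2}));          // 1
    }
}
```

Because repeated is a set, a value that occurs many times is still counted once.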
Your code doesn't do what you want. If you run it using the array {2, 2, 2, 2}, you'll find that it returns 3 instead of 1, because each of the first three elements finds a later match. You'll have to find a way to make sure that the counting is never repeated.
However, your Big O expression is correct as a worst-case analysis, since every element might be compared with every other element.
Your analysis is correct, but you could easily get it down to O(n) time. Try using a HashMap<Integer,Integer> to store previously seen values as you iterate through the array (the key is the number that you've seen, the value is the number of times you've seen it). Each time you try to add an integer to the hashmap, check to see if it's already there. If it is, just increment that integer's counter. Then, at the end, loop through the map and count the number of times you see a key with a corresponding value higher than 1.
First, your approach is what I would call "brute force", and it is indeed O(n^2) in the worst case. It's also incorrectly implemented, since numbers that repeat n times are counted n-1 times.
Setting that aside, there are a number of ways to approach the problem. The first (which a number of answers have suggested) is to iterate over the array and use a map to keep track of how many times each element has been seen. Assuming the map uses a hash table for the underlying storage, the average-case complexity should be O(n), since gets and inserts from the map should be O(1) on average, and you only need to iterate the list and map once each. Note that this is still O(n^2) in the worst case, since there's no guarantee that the hashing will produce constant-time results.
Another approach would be to simply sort the array first, and then iterate the sorted array looking for duplicates. This approach is entirely dependent on the sort method chosen, and can be anywhere from O(n^2) (for a naive bubble sort) to O(n log n) worst case (for a merge sort) to O(n log n) average-though-likely case (for a quicksort.)
That's the best you can do with the sorting approach assuming arbitrary objects in the array. Since your example involves integers, though, you can do much better by using radix sort, which will have worst-case complexity of O(dn), where d is essentially constant (since it maxes out at 10 decimal digits for 32-bit integers).
Finally, if you know that the elements are integers, and that their magnitude isn't too large, you can improve the map-based solution by using an array of size ElementMax, which would guarantee O(n) worst-case complexity, with the trade-off of requiring 4*ElementMax additional bytes of memory.
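The sort-then-scan approach described in this answer can be sketched as follows. This version uses Arrays.sort from the standard library rather than any particular sort named above, and the class and method names are illustrative only.

```java
import java.util.Arrays;

class SortedNonUniqueCountDemo {
    // Sorts a copy, then counts runs of equal neighbours that are longer than one element.
    static int countNonUnique(int[] theArray) {
        int[] sorted = theArray.clone();
        Arrays.sort(sorted);
        int count = 0;
        int i = 0;
        while (i < sorted.length) {
            int runStart = i;
            while (i < sorted.length && sorted[i] == sorted[runStart]) {
                i++;
            }
            if (i - runStart > 1) {
                count++; // this value occurs at least twice
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countNonUnique(new int[]{2, 3, 2, 5, 6, 1, 3})); // 2
        System.out.println(countNonUnique(new int[]{2, 2, 2, 2}));          // 1
    }
}
```

After sorting, equal values are adjacent, so the scan itself is a single linear pass and the sort dominates the running time.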
I think your time complexity of O(n^2) is correct.
If space complexity is not an issue and the values are known to be small (say, 0 to 255, as with ASCII characters), you can use an array of 256 counters and fill it with the number of occurrences of each value. For example:
// Java initializes int arrays to 0, so no explicit initialization is needed.
// This runs in O(n+m), where n is the length of theArray and m is the length of array.
int[] array = new int[256];
for (int i = 0; i < theArray.length; i++)
    array[theArray[i]] = array[theArray[i]] + 1;
for (int i = 0; i < array.length; i++)
    if (array[i] > 1)
        System.out.print(i);
As others have said, an O(n) solution is quite possible using a hash. In Perl:
my @data = (2,3,2,5,6,1,3);
my %count;
$count{$_}++ for @data;
my $n = grep $_ > 1, values %count;
print "$n numbers are not unique\n";
OUTPUT
2 numbers are not unique
