I often see you guys talking about N methods and N^2 methods, which, correct me if I'm wrong, indicate how fast a method is. My question is: how do you guys know which methods are N and which are N^2? And also: are there other speed indications of methods than just N and N^2?
This is about the complexity of an algorithm (which is an indicator of how fast it will be, yes).
In short, it tells you how many "operations" (a deliberately vague and abstract term) are needed for an input of size "N".
E.g. if your input is a List-type object and you must iterate over all items in the list, the complexity is "N" (usually written O(N)).
If your input is a List-type object and you only need to look at the first (or last) item, and the list guarantees that such a lookup is O(1), your method will be O(1), independent of the input size.
If your input is a list and you need to compare every item to every other item, the complexity will be O(N²); problems of this kind can sometimes be reduced to O(N*log(N)), for example by sorting the list first.
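The three cases above can be sketched in Java as follows. This is only an illustration: the class and method names (ListCosts, first, sum, countEqualPairs) are made up for this example, and it assumes a list with constant-time get(), such as ArrayList.

```java
import java.util.List;

final class ListCosts {
    // O(1): looks at one element, regardless of the list size
    // (assumes get(0) is constant time, as it is for ArrayList).
    static int first(List<Integer> items) {
        return items.get(0);
    }

    // O(N): visits every element exactly once.
    static long sum(List<Integer> items) {
        long total = 0;
        for (int value : items) {
            total += value;
        }
        return total;
    }

    // O(N^2): compares every element with every later element,
    // which is N * (N - 1) / 2 comparisons.
    static int countEqualPairs(List<Integer> items) {
        int pairs = 0;
        for (int i = 0; i < items.size(); i++) {
            for (int j = i + 1; j < items.size(); j++) {
                if (items.get(i).equals(items.get(j))) {
                    pairs++;
                }
            }
        }
        return pairs;
    }
}
```

Doubling the list size leaves first() unchanged, roughly doubles the work in sum(), and roughly quadruples the work in countEqualPairs().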
correct me if I'm wrong, indicate how fast a method is.
It says how an algorithm will scale on an ideal machine. It deliberately ignores constant factors, which means an O(1) algorithm could be slower than an O(N) one, which in turn could be slower than an O(N^2) one for your use case. E.g. Arrays.sort() uses insertion sort, O(N^2), for small arrays (length < 47 in Java 7) in preference to quicksort, O(N ln N).
In general, a lower-order algorithm is the safer choice because it is less likely to break in extreme cases which you may not get a chance to test thoroughly.
The way to guesstimate the big-O complexity of a program is based on experience with dry-running code (running it in your mind). Some cases are dead obvious, but the most interesting ones aren't: for example, calling library methods known to be O(n) in an O(n) loop results in O(n^2) total complexity; writing to a TreeMap in an O(n) loop results in O(n log n) total, and so on.
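As a rough illustration of those hidden costs, here is a small Java sketch. The class and method names are invented for this example; the stated costs are the documented ones for ArrayList.contains (linear scan), HashSet.add (expected constant time) and TreeMap updates (logarithmic).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class HiddenCosts {
    // O(n^2) total: ArrayList.contains is a linear scan, and it runs once per word.
    static boolean hasDuplicateSlow(List<String> words) {
        List<String> seen = new ArrayList<>();
        for (String word : words) {
            if (seen.contains(word)) {
                return true;
            }
            seen.add(word);
        }
        return false;
    }

    // Expected O(n) total: HashSet.add is expected O(1) per call
    // and returns false when the element was already present.
    static boolean hasDuplicateFast(List<String> words) {
        Set<String> seen = new HashSet<>();
        for (String word : words) {
            if (!seen.add(word)) {
                return true;
            }
        }
        return false;
    }

    // O(n log n) total: each TreeMap update costs O(log n).
    static Map<String, Integer> countWords(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }
}
```

Both duplicate checks contain a single visible loop; the difference in total cost comes entirely from the library call inside it.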
I am confused with the concept of constant time/space complexity.
For example:
public void recurse(int x) {
    if (x == 0) return;
    else recurse(x / 10);
}
where, 1<x<=2147483647
If we want to express the space complexity for this function in terms of big O notation and count the stack space for recursion, what will be the space complexity?
I am confused between:
O(1) - The maximum value of int in java is 2147483647, so at max it will recurse 10 times.
O(log x) - The number of recursions really depends on the number of digits in x, so at most we will have ~log10(x) recursions.
If we say it is O(1), then couldn't any algorithm with finite input have its time/space complexity bounded by some number? Take the case of insertion sort on an array of numbers in Java. The largest array you can have in Java is of size 2147483647, so does that mean T(n) = O(2147483647^2) = O(1)?
Or should I just look it as, O(1) is a loose bound, while O(log x) is a tighter bound?
Here is the definition I found on wikipedia:
An algorithm is said to be constant time (also written as O(1) time) if the value of T(n) is bounded by a value that does not depend on the size of the input. For example, accessing any single element in an array takes constant time as only one operation has to be performed to locate it. In a similar manner, finding the minimal value in an array sorted in ascending order; it is the first element. However, finding the minimal value in an unordered array is not a constant time operation as scanning over each element in the array is needed in order to determine the minimal value. Hence it is a linear time operation, taking O(n) time. If the number of elements is known in advance and does not change, however, such an algorithm can still be said to run in constant time.
When analysing the time and space complexity of algorithms, we have to ignore some limitations of physical computers; the complexity is a function of the "input size" n, which in big O notation is an asymptotic upper bound as n tends to infinity, but of course a physical computer cannot run the algorithm for arbitrarily large n because it has a finite amount of memory and other storage.
So to do the analysis in a meaningful way, we analyse the algorithm on an imaginary kind of computer where there is no limit on an array's length, where integers can be "sufficiently large" for the algorithm to work, and so on. Your Java code is a concrete implementation of the algorithm, but the algorithm exists as an abstract idea beyond the boundary of what is possible in Java on a real computer. So running this abstract algorithm on an imaginary computer with no such limits, the space complexity is O(log n).
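To make the O(log n) claim concrete, here is a small sketch that mirrors recurse(x) but returns the recursion depth. The class name RecursionDepth and the switch from int to long are my own choices for illustration; long is only a slightly larger stand-in for the "imaginary computer" with unbounded integers.

```java
final class RecursionDepth {
    // Mirrors recurse(x) from the question, but returns how many stack
    // frames do real work: one per decimal digit of x.
    static int depth(long x) {
        if (x == 0) {
            return 0;
        }
        return 1 + depth(x / 10);
    }
}
```

For example, depth(2147483647L) is 10 and depth(Long.MAX_VALUE) is 19. The depth grows by one each time x gains a decimal digit, which is the logarithmic growth described above; the cap of 10 is a property of the int type, not of the algorithm.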
This kind of "imaginary computer" might sound a bit vague, but it is something that can be mathematically formalised in order to do the analysis rigorously; it is called a model of computation. In practice, unless you are doing academic research then you don't need to analyse an algorithm that rigorously, so it's more useful to get comfortable with the vaguer notion that you should ignore any limits which would prevent the algorithm running on an arbitrarily large input.
It really depends on why you are using the big-O notation.
You are correct in saying that, technically, any algorithm is O(1) if it only works for a finite number of possible inputs. For example, this would be an O(1) sorting algorithm: "Read the first 10^6 bits of input. If there are more bits left in the input, output "error". Otherwise, bubblesort."
But the benefit of the notation lies in the fact that it usually approximates the actual running time of a program well. While an O(n) algorithm might as well do 10^100 * n operations, this is usually not the case, and this is why we use the big-O notation at all. Exceptions from this rule are known as galactic algorithms, the most famous one being the Coppersmith–Winograd matrix multiplication algorithm.
To sum up, if you want to be technical and win an argument with a friend, you could say that your algorithm is O(1). If you want to actually use the bound to approximate how fast it is, what you should do is imagine it works for arbitrarily large numbers and just call it O(log(n)).
Side note: Calling this algorithm O(log(n)) is a bit informal, as, technically, the complexity would need to be expressed in terms of size of input, not its magnitude, thus making it O(n). The rule of thumb is: if you're working with small numbers, express the complexity in terms of the magnitude - everyone will understand. If you're working with numbers with potentially millions of digits, then express the complexity in terms of the length. In this case, the cost of "simple" operations such as multiplication (which, for small numbers, is often considered to be O(1)) also needs to be factored in.
Constant time or space means that the time and space used by the algorithm don't depend on the size of the input.
A constant time (hence O(1)) algorithm would be
public int square(int x) {
    return x * x;
}
because for any input, it does the same multiplication and it's over.
On the other hand, to sum all elements of an array
public int sum(int[] array) {
    int sum = 0;
    for (int i : array) sum += i;
    return sum;
}
takes O(n) time, where n is the size of the array. It depends directly on the size of the input.
Space complexity behaves the same way.
Anything that doesn't depend on the size of any input is considered constant.
Applying asymptotic complexity to the real world is tricky as you have discovered.
Asymptotic complexity deals with the abstract situation where input size N has no upper limit, and you're only interested in what will happen with arbitrarily large input size.
In the real world, in the practical applications you're interested in, the input size often has an upper limit. The upper limit may come from the fact that you don't have infinite resources (time/money) to collect data. Or it may be imposed by technical limitations, like the fixed size of the int datatype in Java.
Since asymptotic complexity analysis does not account for real-world limitations, the asymptotic complexity of recurse(x) is O(log x), even though we know that x can only grow up to 2^31 - 1.
When your algorithm doesn't depend on the size of the input, it is said to have constant time complexity. For example:
void print(int input) {
    // 10 lines of printing here
}
Here, no matter what you pass in as 'input', the function body always executes the same 10 statements. If you pass 'input' as 10, 10 statements are run. If you pass 'input' as 20, still 10 statements are run.
Now on other hand, consider this:
void print(int input) {
    // This loop will run 'input' times
    for (int i = 0; i < input; i++) {
        System.out.println(i);
    }
}
This algorithm's running time depends on the size of the input. If you pass 'input' as 10, the for loop runs 10 times; if you pass 'input' as 20, it runs 20 times. The work grows at the same pace as 'input' grows, so in this case the time complexity is said to be O(n).
I have implemented a splay tree (insert, search and delete operations) in Java. Now I want to check whether the complexity of the algorithm is O(log n) or not. Is there any way to check this by varying the input size (number of nodes) and measuring the run time in seconds, say with input sizes like 1000 and 100000? Or is there any other way?
Strictly speaking, you cannot find the time complexity of the algorithm by running it for some values of n. Let's assume that you've run it for values n_1, n_2, ..., n_k. If the algorithm makes n^2 operations for any n <= max(n_1, ..., n_k) and exactly 10^100 operations for any larger value of n, it has a constant time complexity, even though it would look like a quadratic one from the points you have.
However, you can assess the number of operations it takes to complete on an input of size n (I wouldn't even call it time complexity here, as the latter has a strict formal definition) by running it on some values of n and comparing the ratios T(n1) / T(n2) and n1 / n2. But even for a "real" algorithm (in the sense that it is not the pathological case described in the first paragraph), you should be careful with the "structure" of the input. For example, a quicksort that takes the first element as the pivot runs in O(n log n) on average for random input, so it would look like an O(n log n) algorithm if you generate random arrays of different sizes. However, it runs in O(n^2) time on an already sorted or reverse-sorted array.
To sum it up, if you need to figure out whether it's fast enough from a practical point of view and you have an idea of what a typical input to your algorithm looks like, you can try generating inputs of different sizes and see how the execution time grows.
However, if you need a bound on the runtime in a mathematical sense, you need to prove some properties and bounds of your algorithm mathematically.
In your case, I would say that testing on random inputs can be a reasonable idea (because there is a mathematical proof that the amortized time complexity of one operation is O(log n) for a splay tree), so you just need to check that the tree you have implemented is indeed a correct splay tree. One note: I'd recommend trying different patterns of queries (like inserting elements in sorted/reverse order and so on), as even unbalanced trees can work pretty fast as long as the input is "random".
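The ratio idea can be sketched without a stopwatch by counting operations instead of seconds, which avoids timer noise. The example below is not a splay tree; it uses the first-element-pivot quicksort mentioned above, because its behaviour on different input structures is well known. All names (GrowthProbe, countComparisons, doublingRatio, and the input generators) are hypothetical.

```java
import java.util.Random;

final class GrowthProbe {
    private static long comparisons;

    // Sorts a copy of the input with a first-element-pivot quicksort and
    // returns how many element comparisons it made.
    static long countComparisons(int[] input) {
        int[] a = input.clone();
        comparisons = 0;
        quicksort(a, 0, a.length - 1);
        return comparisons;
    }

    private static void quicksort(int[] a, int lo, int hi) {
        if (lo >= hi) {
            return;
        }
        int pivot = a[lo];
        int boundary = lo; // a[lo+1..boundary] holds elements smaller than the pivot
        for (int i = lo + 1; i <= hi; i++) {
            comparisons++;
            if (a[i] < pivot) {
                boundary++;
                swap(a, boundary, i);
            }
        }
        swap(a, lo, boundary); // move the pivot to its final position
        quicksort(a, lo, boundary - 1);
        quicksort(a, boundary + 1, hi);
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    static int[] sortedInput(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
        }
        return a;
    }

    static int[] randomInput(int n, long seed) {
        Random random = new Random(seed);
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = random.nextInt();
        }
        return a;
    }

    // T(2n) / T(n): a value near 2 suggests roughly linear or n log n growth,
    // a value near 4 suggests quadratic growth.
    static double doublingRatio(int[] small, int[] large) {
        return (double) countComparisons(large) / countComparisons(small);
    }
}
```

Calling doublingRatio(sortedInput(1000), sortedInput(2000)) gives a ratio close to 4, while the same call with randomInput arrays gives a ratio a little above 2. The same algorithm looks quadratic or near-linear depending purely on the input pattern, which is why it is worth probing a splay tree with several insertion orders as well.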
What is the time complexity of the Math.sqrt implementation in Java?
Java implements it with some technique whose time complexity I am trying to determine.
In most cases, Java attempts to use the "smart-power" algorithm, which results in a time-complexity of O(log n).
Smart power Algorithm
Also, it appears that in different cases, you could end up with different complexities; Why is multiplied many times faster than taking the square root?
It looks like it is implemented by delegating to the sqrt method of StrictMath, which is a native method.
Thus it seems the answer would be implementation specific.
Strictly speaking it is O(1). In theory (but obviously not practice), we could iterate over all doubles and find the maximum time.
In addition, the time complexity of Math.sqrt(n) does not directly depend on n but instead on the amount of space needed to represent n which for doubles should be constant.
I've a uni practical to determine the complexity of a small section of code using the O() notation.
The code is:
for (int i = 0; i < list.size(); i++)
    System.out.println(list.get(i));
The list in question is a linked list. For our practical, we were given a ready made LinkedList class, although we had to write our own size() and get() methods.
What is confusing me about this question is what to count in the final calculation. The question asks:
How many lookups would it make if there were 100 elements in the list? Based on this, calculate the complexity of the program using O() notation.
If I am just counting the get() method, it will make an average of n/2 lookups, resulting in a big O notation of O(n). However, each iteration of the for loop requires recalculating the size(), which involves a lookup (to determine how many nodes are in the linked list).
When calculating the complexity of this code, should this be taken into consideration? Or does calculating the size not count as a lookup?
I might be a bit late to answer, but I think this for loop would actually be O(n^2).
Explanation
Each loop iteration you would be accessing the ith index of the list. Your call sequence would therefore be: get(0), get(1), ..., get(n - 1).
This is because i is incremented each iteration, and you are looping n times.
Therefore, the total number of node lookups can be evaluated using the following sum: 1 + 2 + ... + n = n(n + 1)/2, which is O(n^2).
In Java LinkedList, get(int) operation is O(N), and size() operation is O(1) complexity.
Since it is a linked list, determining the size will be an O(N) operation (unless the list keeps a running count), since you must traverse the whole list.
Also, you miscalculated the time complexity for .get(). For big-O, what matters is the worst case computation. For a linked list, the worst case of retrieval is that the element is at the end of the list, so that is also O(N).
All told, your algorithm will take O(2N) = O(N) time per iteration. I hope you can go from there to figure out what the time complexity of the whole loop will be.
By the way, in the real world you would want to compute the size just once, before the loop, precisely because it can be inefficient like this. Obviously, that's not an option if the size of the list can change during the loop, but it doesn't look like that's the case for this non-mutating algorithm.
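To see the cost of the indexed loop in numbers, here is a minimal sketch of a singly linked list that counts how many nodes get() touches. It is not the LinkedList class from the practical; the class name, the nodesVisited counter and the helper visitsForIndexedLoop are all invented for this illustration, and the cost of size() is deliberately left out.

```java
final class CountingLinkedList {
    private static final class Node {
        final int value;
        Node next;

        Node(int value) {
            this.value = value;
        }
    }

    private Node head;
    private Node tail;
    long nodesVisited; // total nodes touched by get() calls so far

    void add(int value) {
        Node node = new Node(value);
        if (head == null) {
            head = node;
        } else {
            tail.next = node;
        }
        tail = node;
    }

    // Walks from the head, so get(i) touches i + 1 nodes.
    // No bounds checking in this sketch.
    int get(int index) {
        Node current = head;
        nodesVisited++;
        for (int i = 0; i < index; i++) {
            current = current.next;
            nodesVisited++;
        }
        return current.value;
    }

    // Builds a list of n elements, reads every index once with get(i),
    // and reports the total number of nodes visited.
    static long visitsForIndexedLoop(int n) {
        CountingLinkedList list = new CountingLinkedList();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        for (int i = 0; i < n; i++) {
            list.get(i);
        }
        return list.nodesVisited;
    }
}
```

With this counting convention, visitsForIndexedLoop(100) returns 5050 and visitsForIndexedLoop(200) returns 20100: doubling the list roughly quadruples the work, which is the signature of O(N^2).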
Short answer: It depends on the interpretation of the question.
If the question is asking how many times I will have to jump through the list to find the 100th position (like calling .get(100)), the complexity would be O(N), since I need to go through the entire list once.
If the question is asking for the complexity of reading every element by index (like .get(1), .get(2), ..., .get(100)), the complexity would be O(N²), as explained by michael.
Long answer:
The complexity of calculating the size depends on your implementation. If you traverse the entire list to find the size, the size calculation is O(N), giving O(2N) in the first case and O(N² + N) in the second. This last part also depends on your implementation, as I'm assuming you calculate the size outside the for loop.
If you keep the size in an instance variable that is incremented every time an element is added, size is O(1) and the complexity of the first and second case is unchanged.
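The two ways of computing the size can be sketched side by side. This is a bare-bones illustration with invented names (SizedList, sizeByTraversal, sizeCached); the nodes carry no values because only the counting matters here.

```java
final class SizedList {
    private static final class Node {
        Node next;
    }

    private Node head;
    private int cachedSize; // updated on every insertion

    void addFirst() {
        Node node = new Node();
        node.next = head;
        head = node;
        cachedSize++;
    }

    // O(N): walks every node each time it is called.
    int sizeByTraversal() {
        int count = 0;
        for (Node current = head; current != null; current = current.next) {
            count++;
        }
        return count;
    }

    // O(1): just returns the stored counter.
    int sizeCached() {
        return cachedSize;
    }
}
```

Used as a loop condition, sizeByTraversal() adds N steps to every iteration, while sizeCached() adds a constant amount.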
We round O(2N) (or any O(aN + b)) to O(N) because we only care about how the time grows with the amount of data. If N is small, the code runs fast anyway. If N is big, the code might take much longer depending on the complexity, but the constants a and b have little effect compared with switching to an implementation of worse complexity.
Suppose some code runs in 2 seconds for a small input N with O(N) complexity.
As the input gets bigger: N, 2N, 3N, 4N, ..., kN
If the code has complexity O(N), the time would be: 2, 4, 6, 8, ..., 2k
If the code has complexity O(2N), the time would be: 4, 8, 12, 16, ..., 2k * 2
If the code has complexity O(N²), the time would be: 4, 16, 36, 64, ..., (2k)²
As you can see, the last implementation gets out of hand really fast, while the second is only two times slower than the simple linear one. So O(2N) is slower, but that is almost nothing compared to an O(N²) solution.
What is the fundamental difference between quicksort and tuned quicksort? What is the improvement given to quicksort? How does Java decide to use this instead of merge sort?
As Bill the Lizard said, a tuned quicksort still has the same complexity as the basic quicksort - O(N log N) average complexity - but a tuned quicksort uses various means to try to avoid the O(N^2) worst case complexity, and also uses some optimizations to reduce the constant that goes in front of the N log N for average running time.
Worst Case Time Complexity
Worst case time complexity occurs for quicksort when one side of the partition at each step always has zero elements. Near worst case time complexity occurs when the ratio of the elements in one partition to the other partition is very far from 1:1 (10000:1 for instance). Common causes of this worst case complexity include, but are not limited to:
A quicksort algorithm that always chooses the element with the same relative index of a subarray as the pivot. For instance, with an array that is already sorted, a quicksort algorithm that always chooses the leftmost or rightmost element of the subarray as the pivot will be O(N^2). A quicksort algorithm that always chooses the middle element gives O(N^2) for the organ pipe array ([1,2,3,4,5,4,3,2,1] is an example of this).
A quicksort algorithm that doesn't handle repeated/duplicate elements in the array can be O(N^2). The obvious example is sorting an array that contains all the same elements. Explicitly, if the quicksort sorts the array into partitions like [ < p | >= p ], then the left partition will always have zero elements.
How are these remedied? The first is generally remedied by choosing the pivot randomly. Using a median of a few elements as the pivot can also help, but the probability of the sort being O(N^2) is higher than with a random pivot. Of course, the median of a few randomly chosen elements might be a wise choice too. The median of three randomly chosen elements as the pivot is a common choice here.
The second case, repeated elements, is usually solved with something like Bentley-McIlroy partitioning or the solution to the Dutch National Flag problem. The Bentley-McIlroy partitioning is more commonly used, however, because it is usually faster. I've come up with a method that is faster than it, but that's not the point of this post.
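As a rough sketch of both remedies together, the following quicksort picks a random pivot and uses a three-way partition in the style of the Dutch National Flag solution (not Bentley-McIlroy partitioning, which is more involved). The class name ThreeWayQuicksort is made up, and this is an illustration rather than what any particular library does.

```java
import java.util.Random;

final class ThreeWayQuicksort {
    private static final Random RANDOM = new Random();

    static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    private static void sort(int[] a, int lo, int hi) {
        if (lo >= hi) {
            return;
        }
        // Random pivot: no fixed input pattern triggers the worst case every time.
        int pivot = a[lo + RANDOM.nextInt(hi - lo + 1)];
        int lt = lo; // a[lo..lt-1] is smaller than the pivot
        int gt = hi; // a[gt+1..hi] is larger than the pivot
        int i = lo;  // a[lt..i-1] is equal to the pivot
        while (i <= gt) {
            if (a[i] < pivot) {
                swap(a, lt++, i++);
            } else if (a[i] > pivot) {
                swap(a, i, gt--);
            } else {
                i++;
            }
        }
        // Elements equal to the pivot are already in place and are never revisited.
        sort(a, lo, lt - 1);
        sort(a, gt + 1, hi);
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

On an array whose elements are all equal, the first partition pass puts everything in the "equal" region and both recursive calls receive empty ranges, so the sort finishes in a single linear pass instead of degrading to O(N^2).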
Optimizations
Here are some common optimizations outside of the methods listed above to help with worst case scenarios:
Using the converging pointers quicksort as opposed to the basic quicksort. Let me know if you want more elaboration on this.
Insertion sort subarrays when they get below a certain size. Insertion sort is asymptotically O(N^2), but for small enough N, it beats quicksort.
Using an iterative quicksort with an explicit stack as opposed to a recursive quicksort.
Unrolling parts of loops to reduce the number of comparisons.
Copying the pivot to a register and using that space in the array to reduce the time cost of swapping elements.
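The insertion sort cutoff from the list above could look roughly like this. The threshold of 16 is a placeholder I picked for the sketch (real libraries tune it by measurement), the class name HybridQuicksort is hypothetical, and the middle-element pivot is only there to keep the example short.

```java
final class HybridQuicksort {
    // Hypothetical threshold; real implementations choose it empirically.
    private static final int CUTOFF = 16;

    static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    private static void sort(int[] a, int lo, int hi) {
        // Small (or empty) ranges are handed to insertion sort instead of recursing.
        if (hi - lo + 1 <= CUTOFF) {
            insertionSort(a, lo, hi);
            return;
        }
        int p = partition(a, lo, hi);
        sort(a, lo, p - 1);
        sort(a, p + 1, hi);
    }

    private static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++) {
            int value = a[i];
            int j = i - 1;
            while (j >= lo && a[j] > value) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = value;
        }
    }

    // Partitions a[lo..hi] around the middle element and returns the pivot's final index.
    private static int partition(int[] a, int lo, int hi) {
        int mid = lo + (hi - lo) / 2;
        swap(a, mid, hi);
        int pivot = a[hi];
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivot) {
                swap(a, store++, i);
            }
        }
        swap(a, store, hi);
        return store;
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

The asymptotic complexity is unchanged; the cutoff only trims the constant factor by avoiding recursion and partitioning overhead on tiny subarrays.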
Other Notes
Java uses mergesort when sorting objects because it is a stable sort (the order of elements that have the same key is preserved). Quicksort can be stable or unstable, but the stable version is slower than the unstable version.
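Stability is easy to observe with a small example. The sketch below sorts "name:score" strings by score only; the class name, method name and string format are invented for this illustration. It relies on the documented guarantee that Arrays.sort on object arrays is stable.

```java
import java.util.Arrays;
import java.util.Comparator;

final class StableSortDemo {
    // Sorts "name:score" strings by score only. Entries with the same score
    // keep their original relative order because the object sort is stable.
    static String[] sortByScore(String[] entries) {
        String[] copy = entries.clone();
        Arrays.sort(copy, Comparator.comparingInt(
                (String entry) -> Integer.parseInt(entry.substring(entry.indexOf(':') + 1))));
        return copy;
    }
}
```

Sorting {"bob:2", "amy:1", "cat:2", "dan:1"} this way yields {"amy:1", "dan:1", "bob:2", "cat:2"}: amy stays ahead of dan and bob stays ahead of cat, exactly as in the input. An unstable sort would be free to swap either pair.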
"Tuned" quicksort just means that some improvements are applied to the basic algorithm. Usually the improvements are to try and avoid worst case time complexity. Some examples of improvements might be to choose the pivot (or multiple pivots) so that there's never only 1 key in a partition, or only make the recursive call when a partition is above a certain minimum size.
It looks like Java only uses merge sort when sorting Objects (the Arrays doc tells you which sorting algorithm is used for which sort method signature), so I don't think it ever really "decides" on its own, but the decision was made in advance. (Also, implementers are free to use another sort, as long as it's stable.)
In Java (version 6 and earlier), Arrays.sort(Object[]) uses merge sort, but all other overloaded sort functions use insertion sort if the array length is less than 7 and a tuned quicksort if the length is 7 or more.