Complexity measurement in algorithms using auto-initialized arrays - java

When evaluating the time complexity of an algorithm that uses an array which must be initialized, the initialization cost is usually expressed as O(k), where k is the size of the array.
For instance, counting sort has a time complexity of O(n + k).
But what happens when the array is automatically initialized, as in Java or PHP? Would it be fair to say that counting sort (or any other algorithm that needs an initialized array) in Java (or PHP...) has a time complexity of O(n)?

Are you talking about this http://en.wikipedia.org/wiki/Counting_sort which has a time complexity of O(n + k)?
You have to remember that time complexity is determined for an idealised machine which has no caches or resource constraints, and is independent of how a particular language or machine might actually perform.
The time complexity is still O(n + k)
However, on a real machine the initialisation is likely to be much more efficient than the incrementing, so n and k are not directly comparable. The access pattern for initialisation is likely to be sequential and very efficient (the k term). If the counts are of type int, for example, the CPU could be using long or 128-bit registers to perform the initialisation.
The access pattern for counting is likely to be relatively random, and for large values of k likely to be much slower (up to 10x slower).

Actually it would be O(n + k).
Thus, if n is of a higher order than k (many duplicates in counting sort), the k term can be discarded, making the complexity O(n).

Automatic initialization isn't free, you must account for it anyway, so it's still O(n + k).
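To make the hidden cost concrete, here is a minimal counting sort sketch in Java (class and method names are my own, for illustration). Note that `new int[k]` gives you a zero-filled array, but the JVM still does O(k) work to produce those zeros:

```java
import java.util.Arrays;

public class CountingSortDemo {
    // Sorts values in the range [0, k). The counts array is
    // zero-initialized by the JVM -- that implicit step is still O(k) work.
    static int[] countingSort(int[] input, int k) {
        int[] counts = new int[k];          // O(k) initialization, even if "automatic"
        for (int v : input) counts[v]++;    // O(n) counting
        int[] out = new int[input.length];
        int pos = 0;
        for (int v = 0; v < k; v++)         // O(k) walk over the counts
            for (int c = 0; c < counts[v]; c++)
                out[pos++] = v;
        return out;                         // total: O(n + k)
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(countingSort(new int[]{3, 1, 4, 1, 5}, 6)));
        // prints [1, 1, 3, 4, 5]
    }
}
```

The language hides the `counts[i] = 0` loop from you, but it cannot hide it from the machine.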

Related

Confusion in the concept of constant time/space complexity

I am confused with the concept of constant time/space complexity.
For example:
public void recurse(int x) {
    if (x == 0) return;
    else recurse(x / 10);
}
where, 1<x<=2147483647
If we want to express the space complexity for this function in terms of big O notation and count the stack space for recursion, what will be the space complexity?
I am confused between:
O(1) - The maximum value of int in Java is 2147483647, so at most it will recurse 10 times.
O(log x) - The number of recursions really depends on the number of digits in x, so at most we will have ~log10(x) recursions.
If we say it is O(1), then wouldn't any algorithm with a finite input have its time/space complexity bounded by some number? Take the case of insertion sort on an array of numbers in Java: the largest array you can have in Java is of size 2147483647, so does that mean T(n) = O(2147483647²) = O(1)?
Or should I just look at it as: O(1) is a loose bound, while O(log x) is a tighter bound?
Here is the definition I found on wikipedia:
An algorithm is said to be constant time (also written as O(1) time) if the value of T(n) is bounded by a value that does not depend on the size of the input. For example, accessing any single element in an array takes constant time as only one operation has to be performed to locate it. In a similar manner, finding the minimal value in an array sorted in ascending order; it is the first element. However, finding the minimal value in an unordered array is not a constant time operation as scanning over each element in the array is needed in order to determine the minimal value. Hence it is a linear time operation, taking O(n) time. If the number of elements is known in advance and does not change, however, such an algorithm can still be said to run in constant time.
When analysing the time and space complexity of algorithms, we have to ignore some limitations of physical computers; the complexity is a function of the "input size" n, which in big O notation is an asymptotic upper bound as n tends to infinity, but of course a physical computer cannot run the algorithm for arbitrarily large n because it has a finite amount of memory and other storage.
So to do the analysis in a meaningful way, we analyse the algorithm on an imaginary kind of computer where there is no limit on an array's length, where integers can be "sufficiently large" for the algorithm to work, and so on. Your Java code is a concrete implementation of the algorithm, but the algorithm exists as an abstract idea beyond the boundary of what is possible in Java on a real computer. So running this abstract algorithm on an imaginary computer with no such limits, the space complexity is O(log n).
This kind of "imaginary computer" might sound a bit vague, but it is something that can be mathematically formalised in order to do the analysis rigorously; it is called a model of computation. In practice, unless you are doing academic research then you don't need to analyse an algorithm that rigorously, so it's more useful to get comfortable with the vaguer notion that you should ignore any limits which would prevent the algorithm running on an arbitrarily large input.
It really depends on why you are using the big-O notation.
You are correct in saying that, technically, any algorithm is O(1) if it only works for a finite number of possible inputs. For example, this would be an O(1) sorting algorithm: "Read the first 10^6 bits of input. If there are more bits left in the input, output "error". Otherwise, bubblesort."
But the benefit of the notation lies in the fact that it usually approximates the actual running time of a program well. While an O(n) algorithm could in principle perform 10^100 * n operations, this is usually not the case, and this is why we use the big-O notation at all. Exceptions to this rule are known as galactic algorithms, the most famous one being the Coppersmith–Winograd matrix multiplication algorithm.
To sum up, if you want to be technical and win an argument with a friend, you could say that your algorithm is O(1). If you want to actually use the bound to approximate how fast it is, what you should do is imagine it works for arbitrarily large numbers and just call it O(log(n)).
Side note: Calling this algorithm O(log(n)) is a bit informal, as, technically, the complexity would need to be expressed in terms of size of input, not its magnitude, thus making it O(n). The rule of thumb is: if you're working with small numbers, express the complexity in terms of the magnitude - everyone will understand. If you're working with numbers with potentially millions of digits, then express the complexity in terms of the length. In this case, the cost of "simple" operations such as multiplication (which, for small numbers, is often considered to be O(1)) also needs to be factored in.
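The O(log x) intuition from the question is easy to check empirically. A small sketch (the `depth` helper is hypothetical, mirroring the question's `recurse` but returning the number of stack frames used):

```java
public class DepthDemo {
    // Same shape as recurse(x / 10), but returns the recursion depth
    // instead of just returning.
    static int depth(int x) {
        if (x == 0) return 0;
        return 1 + depth(x / 10);
    }

    public static void main(String[] args) {
        System.out.println(depth(7));           // 1 frame
        System.out.println(depth(1000));        // 4 frames
        System.out.println(depth(2147483647));  // 10 frames: ~log10(x) + 1
    }
}
```

The depth equals the number of decimal digits of x, which is why the bound is written O(log x): the base of the logarithm only changes the constant factor.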
Constant time or space means that the time and space used by the algorithm don't depend on the size of the input.
A constant time (hence O(1)) algorithm would be
public int square(int x) {
    return x * x;
}
because for any input, it does the same multiplication and it's over.
On the other hand, to sum all elements of an array
public int sum(int[] array) {
    int sum = 0;
    for (int i : array) sum += i;
    return sum;
}
takes O(n) time, where n is the size of the array. It depends directly on the size of the input.
Space complexity behaves the same way.
Anything that doesn't depend on the size of the input is considered constant.
Applying asymptotic complexity to the real world is tricky as you have discovered.
Asymptotic complexity deals with the abstract situation where input size N has no upper limit, and you're only interested in what will happen with arbitrarily large input size.
In the real world, in the practical applications you're interested in, the input size often has an upper limit. The upper limit may come from the fact that you don't have infinite resources (time/money) to collect data. Or it may be imposed by technical limitations, like the fixed size of the int datatype in Java.
Since asymptotic complexity analysis does not account for real-world limitations, the asymptotic complexity of recurse(x) is O(log x), even though we know that x can only grow up to 2^31.
When your algorithm doesn't depend on the size of the input, it is said to have constant time complexity. For example:
void print(int input) {
    // 10 lines of printing here
}
Here, no matter what you pass in as 'input', the function body statements always run 10 times. If you pass 'input' as 10, 10 statements run. If you pass 'input' as 20, still 10 statements run.
Now, on the other hand, consider this:
void print(int input) {
    // This loop will run 'input' times
    for (int i = 0; i < input; i++) {
        System.out.println(i);
    }
}
This algorithm's running time depends on the size of the input. If you pass 'input' as 10, the for loop runs 10 times; if you pass 'input' as 20, it runs 20 times. The work grows at the same pace as 'input', so in this case the time complexity is said to be O(n).

Runtime of Arrays.copyOfRange()

What is the big-O runtime of Java's Arrays.copyOfRange(array, startIndex, endIndex) function?
For example, would it be equivalent or less efficient in terms of both space and time complexity to write a simple binary search on arrays function using copyOfRange rather than passing in the start and end indices?
Arrays.copyOfRange() uses System.arraycopy(), which uses native code under the hood (it could use memcpy, for example, depending on the JIT implementation).
The "magic" behind copying with System.arraycopy() is making one call to copy a block of memory instead of making n distinct calls.
That means that using Arrays.copyOfRange() will definitely be more efficient compared to any copy loop you implement yourself.
Further, I don't see how a binary search could help here: an array has direct access, and we know exactly what the src and dst are and how many items should be copied.
From a big-O perspective, the complexity will be O(n*k), where n is the number of items to copy and k is the size (in bits) of each item. The space complexity is the same.
Arrays.copyOfRange takes linear time -- with a low constant factor, but still linear time. Manipulating the start and end indices will inevitably be asymptotically faster, O(log n) instead of O(n).
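To illustrate the difference, here is a sketch of both styles of binary search (method names `searchByIndex` and `searchByCopy` are my own, for illustration). The index-passing version does O(1) extra work per step; the copying version calls `Arrays.copyOfRange` on each half, and the copies of size n/2, n/4, ... sum to O(n) extra time and space:

```java
import java.util.Arrays;

public class SearchDemo {
    // O(log n) time, O(1) extra space: only the indices move.
    static int searchByIndex(int[] a, int lo, int hi, int key) {
        if (lo > hi) return -1;
        int mid = (lo + hi) >>> 1;
        if (a[mid] == key) return mid;
        return a[mid] < key ? searchByIndex(a, mid + 1, hi, key)
                            : searchByIndex(a, lo, mid - 1, key);
    }

    // O(n) time and extra space: each level copies about half the
    // remaining array, and n/2 + n/4 + ... sums to O(n).
    static boolean searchByCopy(int[] a, int key) {
        if (a.length == 0) return false;
        int mid = a.length / 2;
        if (a[mid] == key) return true;
        return a[mid] < key
                ? searchByCopy(Arrays.copyOfRange(a, mid + 1, a.length), key)
                : searchByCopy(Arrays.copyOfRange(a, 0, mid), key);
    }

    public static void main(String[] args) {
        int[] sorted = {1, 3, 5, 7, 9, 11};
        System.out.println(searchByIndex(sorted, 0, sorted.length - 1, 7)); // 3
        System.out.println(searchByCopy(sorted, 7));                        // true
    }
}
```

The copying version also loses the original index of the found element, which is usually the whole point of a search.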

Complexity of calling get() on a LinkedList in a for loop using O notation

I've a uni practical to determine the complexity of a small section of code using the O() notation.
The code is:
for (int i = 0; i < list.size(); i++)
    System.out.println(list.get(i));
The list in question is a linked list. For our practical, we were given a ready made LinkedList class, although we had to write our own size() and get() methods.
What is confusing me about this question is what to count in the final calculation. The question asks:
How many lookups would it make if there 100 elements in the list? Based on this, calculate the complexity of the program using O() notation.
If I am just counting the get() method, it will make an average of n/2 lookups, resulting in a big O notation of O(n). However, each iteration of the for loop requires recalculating the size(), which involves a lookup (to determine how many nodes are in the linked list).
When calculating the complexity of this code, should this be taken into consideration? Or does calculating the size not count as a lookup?
I might be a bit late to answer, but I think this for loop would actually be O(n²).
Explanation
On each loop iteration you access the ith index of the list, so your call sequence is get(0), get(1), ..., get(n-1). This is because i is incremented each iteration, and you loop n times.
Since get(i) has to traverse i nodes from the head, the total number of node visits is the sum 0 + 1 + ... + (n-1) = n(n-1)/2, which is O(n²).
In Java's LinkedList, the get(int) operation is O(N) and the size() operation is O(1).
Since it is a linked list, to determine the size will be an O(N) operation, since you must traverse the whole list.
Also, you miscalculated the time complexity for .get(). For big-O, what matters is the worst case computation. For a linked list, the worst case of retrieval is that the element is at the end of the list, so that is also O(N).
All told, your algorithm will take O(2N) = O(N) time per iteration. I hope you can go from there to figure out what the time complexity of the whole loop will be.
By the way, in the real world you would want to compute the size just once, before the loop, precisely because it can be inefficient like this. Obviously, that's not an option if the size of the list can change during the loop, but it doesn't look like that's the case for this non-mutating algorithm.
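This is also why, in real code, you would traverse a LinkedList with an iterator (which the enhanced for loop uses under the hood) rather than with get(i): the iterator remembers its position, so each step is O(1) and the whole traversal is O(n), versus O(n²) for the indexed loop. A quick sketch of the contrast:

```java
import java.util.LinkedList;
import java.util.List;

public class TraversalDemo {
    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>();
        for (int i = 0; i < 5; i++) list.add(i);

        // O(n^2) total: each get(i) walks from the head of the list again.
        for (int i = 0; i < list.size(); i++)
            System.out.println(list.get(i));

        // O(n) total: the iterator keeps its place between steps.
        for (int x : list)
            System.out.println(x);
    }
}
```

Both loops print the same values; only the hidden traversal cost differs.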
Short answer: It depends on the interpretation of the question.
If the question is asking how many times I will have to jump the list if I want to find 100th position (like calling .get(100)), the complexity would be O(N) since I need to go through the entire list once.
If the question is asking for the complexity of finding an ith variable by checking each index ( like .get(1), .get(2), ..., .get(100)), the complexity would be O(N²) as explained by michael.
Long answer:
The complexity of calculating the size depends on your implementation. If you traverse the entire list to find the size, the size calculation is O(N); that makes the first case O(2N) and the second O(N² + N), assuming you calculate the size outside the for loop.
If instead you keep the size in an instance variable that is incremented every time an element is added, size() is O(1), and the first and second cases stay O(N) and O(N²) respectively.
The reason we round O(2N) (or any O(aN + b)) to O(N) is that we only care about how the processing time grows with the data. If N is small, the code runs fast anyway. If N is big, the running time is dominated by the complexity class, and the constants a and b have little effect compared with a worse complexity class.
Suppose some code runs in 2 seconds for a small input N with O(N) complexity.
As the input gets bigger: N, 2N, 3N, 4N, ..., kN
if the code has complexity O(N) the time would be: 2, 4, 6, 8, ..., 2k
if the code has complexity O(2N) the time would be: 4, 8, 12, 16, ..., 2k * 2
if the code has complexity O(N²) the time would be: 4, 16, 36, 64, ..., (2k)²
As you can see, the last implementation gets out of hand really fast, while the second is only twice as slow as the simple linear one. So O(2N) is slower, but it's almost nothing compared to an O(N²) solution.
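The scaling above is easy to tabulate with a quick sketch (using the 2-second baseline for input N, so input size k·N costs 2k, 2·2k, and (2k)² seconds for the three complexity classes):

```java
public class GrowthDemo {
    public static void main(String[] args) {
        System.out.println("k\tO(N)\tO(2N)\tO(N^2)");
        for (int k = 1; k <= 5; k++) {
            int linear = 2 * k;             // O(N):  2, 4, 6, ...
            int doubled = 2 * linear;       // O(2N): 4, 8, 12, ...
            int quadratic = linear * linear; // O(N^2): 4, 16, 36, ...
            System.out.println(k + "\t" + linear + "\t" + doubled + "\t" + quadratic);
        }
    }
}
```

The O(2N) column stays a fixed factor of 2 above O(N), while the O(N²) column pulls away without bound; that fixed factor is exactly what big-O discards.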

How to know if method is N or N^2

I often see you guys talking about N methods and N^2 methods, which, correct me if I'm wrong, indicate how fast a method is. My question is: how do you guys know which methods are N and which are N^2? And also: are there other speed indications of methods then just N and N^2?
This talks about the complexity of an algorithm (which is an indicator of how fast it will be, yes).
In short, it tells how many "operations" (with "operation" being a very vague and abstract term) will be needed for an input of size N to the method.
E.g. if your input is a List-type object and you must iterate over all items in the list, the complexity is N (often expressed as O(N)).
If your input is a list-type object and you only need to look at the first (or last) item, and the list guarantees that such a lookup is O(1), your method will be O(1), independent of the input size.
If your input is a list and you need to compare every item to every other item, the complexity will be O(N²); divide-and-conquer approaches (such as sorting) typically bring similar problems down to O(N log N).
correct me if I'm wrong, indicate how fast a method is.
It says how an algorithm will scale on an ideal machine. It deliberately ignores the constant factors involved, which can mean that an O(1) implementation could be slower than an O(N) one, which could be slower than an O(N^2) one for your use case. E.g. Arrays.sort() will use insertion sort, O(N^2), for small collections (length < 47 in Java 7) in preference to quicksort, O(N ln N).
In general, using lower-order algorithms is a safer choice because they are less likely to break in extreme cases which you may not get a chance to test thoroughly.
The way to guesstimate the big-O complexity of a program is based on experience with dry-running code (running it in your mind). Some cases are dead obvious, but the most interesting ones aren't: for example, calling library methods known to be O(n) in an O(n) loop results in O(n²) total complexity; writing to a TreeMap in an O(n) loop results in O(n log n) total, and so on.
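A concrete sketch of the first case (method names are my own, for illustration): calling `List.contains`, which is O(n), inside an O(n) loop gives O(n²) overall, while doing the same membership check against a `HashSet` keeps the loop at O(n):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NestedCostDemo {
    // O(n^2): contains() scans the list for every element of b.
    static int overlapSlow(List<Integer> a, List<Integer> b) {
        int count = 0;
        for (int x : b)
            if (a.contains(x)) count++;   // O(n) call inside an O(n) loop
        return count;
    }

    // O(n): HashSet.contains() is expected O(1).
    static int overlapFast(List<Integer> a, List<Integer> b) {
        Set<Integer> seen = new HashSet<>(a);
        int count = 0;
        for (int x : b)
            if (seen.contains(x)) count++;
        return count;
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 2, 3, 4);
        List<Integer> b = Arrays.asList(3, 4, 5);
        System.out.println(overlapSlow(a, b)); // 2
        System.out.println(overlapFast(a, b)); // 2
    }
}
```

Same result, very different growth: the hidden cost lives inside the library call, which is exactly what dry-running in your mind has to account for.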

Time Complexity of an algorithm : How to decide which algorithm after calculated the time

Today I came across a blog on MSDN about how to calculate the time complexity of an algorithm. I understand perfectly well how to calculate the time complexity of an algorithm, but at the end the author wrote the lines below:
Adding everything up I get
(N+4)+(5N+2)+(4N+2) = 10N+8
So the asymptotic time complexity for the above code is O(N), which
means that the above algorithm is a linear time complexity algorithm.
So how did the author conclude that it is a linear time complexity algorithm? The link to the blog:
http://blogs.msdn.com/b/nmallick/archive/2010/03/30/how-to-calculate-time-complexity-for-a-given-algorithm.aspx
He said that because 10N + 8 is a linear equation. If you plot that equation you get a straight line. Try typing 10 * x + 8 on this website (function graphs) and see for yourself.
Ascending order of time complexities (the common ones):
O(1) - Constant
O(log n) - logarithmic
O(n) - linear
O(n log n) - loglinear
O(n^2) - quadratic
Note: N increases without bounds
For complexity theory you should definitely read some background theory. It's usually about asymptotic complexity, which is why you can drop the smaller terms and keep only the complexity class.
The key idea is that the difference between N and N+5 becomes negligible once N is really big.
For more details, start reading here:
http://en.wikipedia.org/wiki/Big_O_notation
The author chose, based on experience, the most appropriate complexity class. You should know that determining an algorithm's complexity almost always means finding a Big-Oh function, which, in turn, is just an upper bound on the given function (10N+8 in your case).
There are only a few well-known complexity types: linear complexity, quadratic complexity, etc. So the final step of working out a time complexity consists of choosing the least complex type (I mean, linear is less complex than quadratic, and quadratic is less complex than exponential, and so on) that correctly describes the given function.
In your case, O(n), O(n^2) and even O(2^n) are all technically correct upper bounds. But the least complex function that fits the definition of Big-Oh notation is O(n), which is the answer here.
Here is a really good article that fully explains Big-Oh notation.
A very pragmatic rule is:
when the complexity of an algorithm is represented by a polynomial like A*n^2 + B*n + C, then the order of complexity (that is to say, the O(something)) is equal to the highest order of the variable n.
In the A*n^2 + B*n + C polynomial, the order is O(n^2).
Like josnidhin explained, if the polynomial has
order 1 (i.e. n)- it is called linear
order 2 (i.e. n^2) - it is called quadratic
... and so on.
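A quick numeric check of that rule (the coefficients 3, 50 and 7 are arbitrary, chosen only for illustration): for a polynomial like 3n² + 50n + 7, the ratio to the leading term 3n² tends to 1 as n grows, which is why only the highest-order term survives in the O():

```java
public class DominantTermDemo {
    public static void main(String[] args) {
        for (long n : new long[]{10, 1_000, 1_000_000}) {
            double full = 3.0 * n * n + 50.0 * n + 7;  // the whole polynomial
            double lead = 3.0 * n * n;                 // just the leading term
            // the ratio approaches 1.0 as n grows
            System.out.println("n=" + n + "  ratio=" + (full / lead));
        }
    }
}
```

At n = 10 the lower-order terms still dominate the picture, but by n = 1,000,000 the ratio is within a few parts per million of 1: asymptotically, only n² matters (and the constant 3 is discarded too, since big-O ignores constant factors).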
