Checking the complexity of splay tree operations in Java

I have implemented a splay tree (insert, search, delete operations) in Java. Now I want to check whether the complexity of the algorithm is O(log n) or not. Is there any way to check this by varying the input size (number of nodes) and measuring the run time in seconds? Say, by using input sizes like 1,000 and 100,000 and checking the run time, or is there another way?

Strictly speaking, you cannot find the time complexity of the algorithm by running it for some values of n. Let's assume that you've run it for values n_1, n_2, ..., n_k. If the algorithm makes n^2 operations for any n <= max(n_1, ..., n_k) and exactly 10^100 operations for any larger value of n, it has a constant time complexity, even though it would look like a quadratic one from the points you have.
However, you can assess the number of operations it takes to complete on an input of size n (I wouldn't even call it time complexity here, as the latter has a strict formal definition) by running it on several values of n and comparing the ratios T(n1) / T(n2) and n1 / n2. But even in the case of a "real" algorithm (in the sense that it is not the pathological case described in the first paragraph), you should be careful with the "structure" of the input. For example, a quicksort that takes the first element as the pivot runs in O(n log n) on average for random input, so it would look like an O(n log n) algorithm if you only generate random arrays of different sizes. However, it runs in O(n^2) time for a reverse-sorted array.
To sum it up, if you need to figure out whether it's fast enough from a practical point of view and you have an idea of what a typical input to your algorithm looks like, you can try generating inputs of different sizes and see how the execution time grows.
However, if you need a bound on the runtime in a mathematical sense, you need to prove some properties and bounds of your algorithm mathematically.
In your case, I would say that testing on random inputs can be a reasonable idea (because there is a mathematical proof that the amortized time complexity of one operation is O(log n) for a splay tree), so you just need to check that the tree you have implemented is indeed a correct splay tree. One note: I'd recommend trying different patterns of queries (like inserting elements in sorted/reverse order and so on), as even unbalanced trees can work pretty fast as long as the input is "random".
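For example, a rough timing harness along these lines might be enough (this is only a sketch, not a rigorous benchmark: the SplayTree class name and its insert/search methods are assumptions standing in for your own implementation, and JIT warm-up and GC pauses will add noise):

import java.util.Random;

public class SplayTreeTiming {
    public static void main(String[] args) {
        int[] sizes = {100_000, 200_000, 400_000, 800_000, 1_600_000};
        Random rnd = new Random(42);
        for (int n : sizes) {
            SplayTree tree = new SplayTree();                        // assumed class name
            long start = System.nanoTime();
            for (int i = 0; i < n; i++) tree.insert(rnd.nextInt());  // random insertion pattern
            for (int i = 0; i < n; i++) tree.search(rnd.nextInt());
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            // If each operation is O(log n), doubling n should a bit more than
            // double the total time, not quadruple it.
            System.out.println("n = " + n + "  time = " + elapsedMs + " ms");
        }
    }
}

Repeating the same loop with sorted insertions (tree.insert(i) for i = 0..n-1) and with searches for just-inserted keys is a cheap way to exercise the "non-random" patterns mentioned above.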

Related

How to save CPU cycles searching a sorted array for an exact match? Linear search is too slow?

The aim of this exercise is to check the presence of a number in an array.
Specifications: The items are integers arranged in ascending order. The array can contain up to 1 million items. The array is never null. Implement the method boolean Answer.Exists(int[] ints, int k) so that it returns true if k belongs to ints, otherwise the method should return false.
Important note: Try to save CPU cycles if possible.
This is my code:
import java.util.*;

class Main {
    static boolean exists(int[] ints, int k) {
        boolean flag = false;
        for (int i = 0; i < ints.length; i++) {
            if (ints[i] == k) {
                flag = true;
                return flag;
            } else {
                continue;
            }
        }
        return flag;
    }

    public static void main(String[] args) {
        int[] ints = {-9, 14, 37, 102};
        System.out.println(Main.exists(ints, 9));   // false (9 is not in the array)
        System.out.println(Main.exists(ints, 102)); // true
    }
}
But after submitting the code, this is the result:
The solution works with a 'small' array
The solution works with an empty array
The solution works if k is the first element in the array
The solution doesn't work in a reasonable time with one million items
The solution doesn't use the J2SE API to perform the binary search
So why is it not working in a reasonable time? Can anyone clarify?
Your code uses an inefficient algorithm.
Modern CPUs and OSes are incredibly complicated and do vast amounts of bizarre optimizations. Thus, 'save some CPU cycles' is not something you can meaningfully reason about anymore. So let's objectify that statement into something that is useful:
The exercise wants you to find the algorithmically least complex algorithm for the task.
"Algorithmic complexity" is best thought of as follows: Define some variables. Let's say: The size of the input array. Let's call it N.
Now pick a bunch of numbers for N, run your algorithm a few times averaging the runtime. Then chart this. On the x-axis is N. On the y-axis is how long it took.
In other words, make some lists of size 10 and run the algorithm a bunch of times. Then go for size 20, size 30, size 40, and so on. A chart rolls out. At the beginning, for low N, it'll be all over the place, wildly different numbers. Your CPU is busy doing other things and who knows what - all sorts of esoteric factors (literally what song is playing on your music player, that kind of irrelevant stuff) control how long things take. But eventually, for a large enough N, you'll see a pattern - the line starts coalescing, the algorithmic complexity takes over.
At that point, the line (from there on out - so, looking to the 'right', to larger N) looks like a known graph. It may look like y = x - i.e. a straight line at an angle. Or like y = x^2. Or something complicated like y = x^3 + x!.
That is called the big-O number of your algorithm.
Your algorithm has O(n) performance. In other words, an angled line: if 10,000 items take 5 milliseconds to process, then 20,000 items will take 10, and 40,000 items will take 20.
There is an O(log n) algorithm available instead. In other words, a line that becomes nearly entirely horizontal over time: each tenfold increase in N adds only roughly the same small, constant amount of time, so if 10,000 items take 5 milliseconds, a million items take only slightly longer. Make N large enough and the algorithmically simpler one will trounce the other one, regardless of how many optimizations the algorithmically more complex one has. Because math says it has to be that way.
Hence, trivially, no amount of OS, JVM, and hardware optimizations could ever make an O(n^2) algorithm faster than an O(n) one for a large enough N.
So let's figure out this O(log n) algorithm.
Imagine a phone book. I ask you to look up Mr. Smith.
You could open to page 1 and start reading. That's what your algorithm does. On average you have to read through half of the entire phonebook.
Here's another algorithm: instead, turn to the middle of the book. Check the name. Is that name 'lower' or 'higher' than Smith? If it's lower, tear off the top half of the phone book and toss it in the garbage. If it's higher, tear off the bottom half and get rid of that.
Then repeat: Pick the middle of the new (half-sized) phonebook. Keep tearing out half of that book until only one name remains. That's your match.
This is algorithmically less complicated: with a single lookup you eliminate half of the phonebook. Got a million entries in that phonebook? One lookup eliminates HALF of all the names it could be.
In 20 lookups, you can get an answer even if the phonebook has 2^20 (roughly a million) items. Got 21 lookups? Then you can deal with 2 million items.
The crucial piece of information is that your input is sorted. It's like the phone book! You can apply the same algorithm! Pick a start and end, then look at the middle. Is it your answer? Great, return true. Is the middle element lower than k? Then start becomes the middle, and run the algorithm again. Is it higher? Then end becomes the middle. Are start and end identical without a match? Then return false.
That's the algorithm they want you to write. It's called "binary search". Wikipedia has a page on it, lots of web tutorials cover the notion. But it's good to know why binary search is faster.
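For reference, a minimal Java version of that idea might look like the sketch below (my own illustration, not the grader's reference solution); the second method simply leans on the J2SE API that the result message hints at:

import java.util.Arrays;

class Main {
    // Iterative binary search: O(log n) comparisons on a sorted array.
    static boolean exists(int[] ints, int k) {
        int lo = 0, hi = ints.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;         // avoids overflow of (lo + hi) / 2
            if (ints[mid] == k) return true;
            if (ints[mid] < k) lo = mid + 1;      // k can only be in the upper half
            else hi = mid - 1;                    // k can only be in the lower half
        }
        return false;
    }

    // Or simply use the built-in binary search from the J2SE API.
    static boolean existsViaApi(int[] ints, int k) {
        return Arrays.binarySearch(ints, k) >= 0;
    }

    public static void main(String[] args) {
        int[] ints = {-9, 14, 37, 102};
        System.out.println(exists(ints, 9));    // false
        System.out.println(exists(ints, 102));  // true
    }
}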
NB: Moore's Law, an observation that computers get faster over time, more or less limits technical improvement at O(n^2). In other words, any algorithm whose algorithmic complexity is more complicated than O(n^2) is utterly safe - no amount of technological advancement will ever mean that an algorithm that cannot reasonably be run today can run in a flash on the hardware of tomorrow. Anything that is less complicated can just wait for technology to become faster. In that sense, anything more complex than O(n^2) will take literally quadrillions of years for large enough N, and always will. That's why we say 'this crypto algorithm is safe' - because it's algorithmically significantly more complicated than O(n^2) so the Apple M99 chip released in 2086 still can't crack anything unless you give it quadrillions of years. Perhaps one day quantum tech leapfrogs this entire notion, but that's a bit too out there for an SO answer :)

Confusion in the concept of constant time/space complexity

I am confused with the concept of constant time/space complexity.
For example:
public void recurse(int x) {
    if (x == 0) return;
    else recurse(x / 10);
}
where 1 < x <= 2147483647
If we want to express the space complexity for this function in terms of big O notation and count the stack space for recursion, what will be the space complexity?
I am confused between:
O(1) - The maximum value of int in Java is 2147483647, so at most it will recurse 10 times.
O(log x) - The number of recursive calls depends on the number of digits in x, so at most we will have ~log10(x) recursions.
If we say it is O(1), then wouldn't any algorithm with a finite input domain have its time/space complexity bounded by some number? So let's take the case of insertion sort on an array of numbers in Java. The largest array you can have in Java is of size 2147483647, so does that mean T(n) = O(2147483647^2) = O(1)?
Or should I just look at it as: O(1) is a loose bound, while O(log x) is a tighter bound?
Here is the definition I found on wikipedia:
An algorithm is said to be constant time (also written as O(1) time) if the value of T(n) is bounded by a value that does not depend on the size of the input. For example, accessing any single element in an array takes constant time as only one operation has to be performed to locate it. In a similar manner, finding the minimal value in an array sorted in ascending order; it is the first element. However, finding the minimal value in an unordered array is not a constant time operation as scanning over each element in the array is needed in order to determine the minimal value. Hence it is a linear time operation, taking O(n) time. If the number of elements is known in advance and does not change, however, such an algorithm can still be said to run in constant time.
When analysing the time and space complexity of algorithms, we have to ignore some limitations of physical computers; the complexity is a function of the "input size" n, which in big O notation is an asymptotic upper bound as n tends to infinity, but of course a physical computer cannot run the algorithm for arbitrarily large n because it has a finite amount of memory and other storage.
So to do the analysis in a meaningful way, we analyse the algorithm on an imaginary kind of computer where there is no limit on an array's length, where integers can be "sufficiently large" for the algorithm to work, and so on. Your Java code is a concrete implementation of the algorithm, but the algorithm exists as an abstract idea beyond the boundary of what is possible in Java on a real computer. So running this abstract algorithm on an imaginary computer with no such limits, the space complexity is O(log n).
This kind of "imaginary computer" might sound a bit vague, but it is something that can be mathematically formalised in order to do the analysis rigorously; it is called a model of computation. In practice, unless you are doing academic research then you don't need to analyse an algorithm that rigorously, so it's more useful to get comfortable with the vaguer notion that you should ignore any limits which would prevent the algorithm running on an arbitrarily large input.
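To make the O(log n) claim concrete, here is a small sketch of my own (not part of the original answer) that counts how deep the question's recursion actually goes; the depth tracks the number of decimal digits of x, i.e. roughly log10(x):

public class RecursionDepthDemo {
    // Same shape as the question's recurse(x), but returns the depth reached.
    static int depth(int x) {
        if (x == 0) return 0;
        return 1 + depth(x / 10);
    }

    public static void main(String[] args) {
        int[] samples = {5, 42, 1_000, 999_999, 2_147_483_647};
        for (int x : samples) {
            // Depth grows with the number of decimal digits of x, i.e. ~log10(x).
            System.out.println("x = " + x + " -> depth " + depth(x));
        }
    }
}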
It really depends on why you are using the big-O notation.
You are correct in saying that, technically, any algorithm is O(1) if it only works for a finite number of possible inputs. For example, this would be an O(1) sorting algorithm: "Read the first 10^6 bits of input. If there are more bits left in the input, output "error". Otherwise, bubblesort."
But the benefit of the notation lies in the fact that it usually approximates the actual running time of a program well. While an O(n) algorithm could, in principle, do 10^100 * n operations, this is usually not the case, and this is why we use the big-O notation at all. Exceptions to this rule are known as galactic algorithms, the most famous one being the Coppersmith–Winograd matrix multiplication algorithm.
To sum up, if you want to be technical and win an argument with a friend, you could say that your algorithm is O(1). If you want to actually use the bound to approximate how fast it is, what you should do is imagine it works for arbitrarily large numbers and just call it O(log(n)).
Side note: Calling this algorithm O(log(n)) is a bit informal, as, technically, the complexity would need to be expressed in terms of the size of the input, not its magnitude, thus making it O(n), where n is the number of digits. The rule of thumb is: if you're working with small numbers, express the complexity in terms of the magnitude - everyone will understand. If you're working with numbers with potentially millions of digits, then express the complexity in terms of the length. In that case, the cost of "simple" operations such as multiplication (which, for small numbers, is often considered to be O(1)) also needs to be factored in.
Constant time or space means that the time and space used by the algorithm don't depend on the size of the input.
A constant time (hence O(1)) algorithm would be
public int square(int x) {
    return x * x;
}
because for any input, it does the same multiplication and it's over.
On the other hand, to sum all elements of an array
public int sum(int[] array) {
    int sum = 0;
    for (int i : array) sum += i;
    return sum;
}
takes O(n) time, where n is the size of the array. It depends directly on the size of the input.
The space complexity behaves equally.
Anything that doesn't depend on the size of the input is considered constant.
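To make the space side concrete as well, here is a small illustrative sketch of my own (not from the original answer): the first method uses a fixed number of local variables no matter how large n is, so it needs O(1) extra space, while the second allocates an array whose size grows with the input, so it needs O(n) extra space.

// O(1) extra space: only a fixed number of local variables, regardless of n.
public long sumUpTo(int n) {
    long sum = 0;
    for (int i = 1; i <= n; i++) sum += i;
    return sum;
}

// O(n) extra space: the result array grows with the input size.
public int[] squaresUpTo(int n) {
    int[] squares = new int[n];
    for (int i = 0; i < n; i++) squares[i] = (i + 1) * (i + 1);
    return squares;
}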
Applying asymptotic complexity to the real world is tricky as you have discovered.
Asymptotic complexity deals with the abstract situation where input size N has no upper limit, and you're only interested in what will happen with arbitrarily large input size.
In the real world, in the practical applications you're interested in, the input size often has an upper limit. The upper limit may come from the fact that you don't have infinite resources (time/money) to collect data. Or it may be imposed by technical limitations, like the fixed size of the int datatype in Java.
Since asymptotic complexity analysis does not account for real-world limitations, the asymptotic complexity of recurse(x) is O(log x), even though we know that x can only grow up to 2^31 - 1.
When your algorithm doesn't depend on the size of the input, it is said to have constant time complexity. For example:
static void print(int input) {
    // 10 println statements here; the body is always the same 10 statements
}
Here, no matter what you pass in as 'input', the method body always runs the same 10 statements. If you pass 'input' as 10, 10 statements are run. If you pass 'input' as 20, still 10 statements are run.
Now, on the other hand, consider this:
static void print(int input) {
    // This loop runs 'input' times
    for (int i = 0; i < input; i++) {
        System.out.println(i);
    }
}
This algorithm's running time depends on the size of the input. If you pass 'input' as 10, the for loop runs 10 times; if you pass 'input' as 20, the for loop runs 20 times. The work grows at the same pace as 'input' grows, so in this case the time complexity is said to be O(n).

How to know if a method is N or N^2

I often see you guys talking about N methods and N^2 methods, which, correct me if I'm wrong, indicate how fast a method is. My question is: how do you guys know which methods are N and which are N^2? And also: are there other speed indications of methods than just N and N^2?
This is about the complexity of an algorithm (which is an indicator of how fast it will be, yes).
In short, it tells how many "operations" (with "operations" being a very vague and abstract term) will be needed for an input to the method of size "N".
E.g. if your input is a List-type object and you must iterate over all items in the list, the complexity is "N" (often expressed as O(N)).
If your input is a List-type object and you only need to look at the first (or last) item, and the list guarantees that such a lookup is O(1), your method will be O(1) - independent of the input size.
If your input is a list and you need to compare every item to every other item, the complexity will be O(N²) (or O(N log N) if a smarter approach, such as sorting first, applies).
"correct me if I'm wrong, indicate how fast a method is."
It says how an algorithm will scale on an ideal machine. It deliberately ignores the constant factors involved, which can mean that an O(1) could be slower than an O(N), which could be slower than an O(N^2), for your use case. E.g. Arrays.sort() will use insertion sort, O(N^2), for small collections (length < 47 in Java 7) in preference to quicksort, O(N ln N).
In general, using lower-order algorithms is a safer choice because they are less likely to break in extreme cases which you may not get a chance to test thoroughly.
The way to guesstimate the big-O complexity of a program is based on experience with dry-running code (running it in your mind). Some cases are dead obvious, but the most interesting ones aren't: for example, calling library methods known to be O(n) in an O(n) loop results in O(n^2) total complexity; writing to a TreeMap in an O(n) loop results in O(n log n) total, and so on.
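As a hypothetical illustration of that kind of dry-run reasoning (the method names below are my own, not from the answer): both methods count duplicate occurrences, but the first calls the O(n) List.indexOf inside an O(n) loop, giving O(n^2) in total, while the second performs n TreeMap updates at O(log n) each, giving O(n log n) in total.

import java.util.List;
import java.util.TreeMap;

class ComplexityComposition {
    // O(n^2): list.indexOf is itself a linear scan, and it runs once per element.
    static int countDuplicatesQuadratic(List<Integer> list) {
        int duplicates = 0;
        for (int i = 0; i < list.size(); i++) {
            if (list.indexOf(list.get(i)) != i) duplicates++;
        }
        return duplicates;
    }

    // O(n log n): each TreeMap update is O(log n), performed n times.
    static int countDuplicatesLogLinear(List<Integer> list) {
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        for (int value : list) counts.merge(value, 1, Integer::sum);
        int duplicates = 0;
        for (int c : counts.values()) duplicates += c - 1;
        return duplicates;
    }
}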

Time complexity of an algorithm: how to decide which algorithm after calculating the time

Today I came across a blog on MSDN and noticed how to calculate the time complexity of an algorithm. I understand perfectly how to calculate the time complexity of an algorithm, but at the end the author mentioned the lines below:
Adding everything up I get
(N+4)+(5N+2)+(4N+2) = 10N+8
So the asymptotic time complexity for the above code is O(N), which
means that the above algorithm is a linear time complexity algorithm.
So how come the author said it's a linear time complexity algorithm? The link to the blog:
http://blogs.msdn.com/b/nmallick/archive/2010/03/30/how-to-calculate-time-complexity-for-a-given-algorithm.aspx.
He said that because 10N + 8 is a linear equation. If you plot that equation you get a straight line. Try typing 10 * x + 8 on this website (function graphs) and see for yourself.
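To see why the lower-order term drops out (my own arithmetic, not from the blog): for every N >= 1, 10N + 8 <= 10N + 8N = 18N, so 10N + 8 is bounded by a constant (18) times N, which is exactly what 10N + 8 = O(N) means. The +8 and even the factor 10 only shift or tilt the straight line; they never change its straight-line shape.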
Ascending order of time complexities(the common one)
O(1) - Constant
O(log n) - logarithmic
O(n) - linear
O(n log n) - loglinear
O(n^2) - quadratic
Note: N increases without bound
For complexity theory you definitely should read some background theory. It's usually about asymptotic complexity, which is why you can drop the smaller parts, and only keep the complexity class.
The key idea is that the difference between N and N+5 becomes negligible once N is really big.
For more details, start reading here:
http://en.wikipedia.org/wiki/Big_O_notation
The author just relied on his experience in choosing the most appropriate one. You should know that determining an algorithm's complexity almost always means finding a Big-Oh function, which, in turn, is just an upper bound on the given function (10N+8 in your case).
There are only a few well-known complexity types: linear complexity, quadratic complexity, etc. So, the final step of determining a time complexity consists of choosing the least complex type (I mean, linear is less complex than quadratic, and quadratic is less complex than exponential, and so on) that can be used for the given function and correctly describes its complexity.
In your case, O(n), O(n^2), and even O(2^n) are all valid upper bounds. But the least complex function that fits the Big-Oh notation definition is O(n), which is the answer here.
Here is a really good article that fully explains Big-Oh notation.
A very pragmatic rule is:
when the complexity of an algorithm is represented by a polynomial like A*n^2 + B*n + C, then the order of complexity (that is to say, the O(something)) is determined by the highest-order term in the variable n.
For the polynomial A*n^2 + B*n + C, the order is O(n^2).
As josnidhin explained, if the polynomial has
order 1 (i.e. n)- it is called linear
order 2 (i.e. n^2) - it is called quadratic
... and so on.

What is the difference between quicksort and tuned quicksort?

What is the fundamental difference between quicksort and tuned quicksort? What is the improvement given to quicksort? How does Java decide to use this instead of merge sort?
As Bill the Lizard said, a tuned quicksort still has the same complexity as the basic quicksort - O(N log N) average complexity - but a tuned quicksort uses various means to try to avoid the O(N^2) worst-case complexity, as well as some optimizations to reduce the constant that goes in front of the N log N for the average running time.
Worst Case Time Complexity
Worst case time complexity occurs for quicksort when one side of the partition at each step always has zero elements. Near worst case time complexity occurs when the ratio of the elements in one partition to the other partition is very far from 1:1 (10000:1 for instance). Common causes of this worst case complexity include, but are not limited to:
A quicksort algorithm that always chooses the element with the same relative index of a subarray as the pivot. For instance, with an array that is already sorted, a quicksort algorithm that always chooses the leftmost or rightmost element of the subarray as the pivot will be O(N^2). A quicksort algorithm that always chooses the middle element gives O(N^2) for the organ pipe array ([1,2,3,4,5,4,3,2,1] is an example of this).
A quicksort algorithm that doesn't handle repeated/duplicate elements in the array can be O(N^2). The obvious example is sorting an array that contains all the same elements. Explicitly, if the quicksort sorts the array into partitions like [ < p | >= p ], then the left partition will always have zero elements.
How are these remedied? The first is generally remedied by choosing the pivot randomly. Using the median of a few elements as the pivot can also help, but the probability of the sort being O(N^2) is higher than with a random pivot. Of course, the median of a few randomly chosen elements might be a wise choice too. The median of three randomly chosen elements as the pivot is a common choice here.
The second case, repeated elements, is usually solved with something like Bentley-McIlroy partitioning or the solution to the Dutch National Flag problem. The Bentley-McIlroy partitioning is more commonly used, however, because it is usually faster. I've come up with a method that is faster than it, but that's not the point of this post.
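As an illustration of the Dutch National Flag idea (a sketch of mine in that spirit, not Bentley-McIlroy's exact scheme): partition into [ < p | == p | > p ] so that keys equal to the pivot never end up in a recursive call.

// Three-way partition around a pivot value: after the call, the segment looks
// like [ < pivot | == pivot | > pivot ]. Returns the bounds of the "== pivot"
// block, so the recursion can skip every duplicate of the pivot.
static int[] threeWayPartition(int[] a, int lo, int hi, int pivot) {
    int lt = lo, i = lo, gt = hi;
    while (i <= gt) {
        if (a[i] < pivot)      swap(a, lt++, i++);
        else if (a[i] > pivot) swap(a, i, gt--);
        else                   i++;              // equal to the pivot: leave it in the middle
    }
    return new int[] { lt, gt };                 // recurse on [lo, lt-1] and [gt+1, hi]
}

static void swap(int[] a, int i, int j) {
    int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
}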
Optimizations
Here are some common optimizations outside of the methods listed above to help with worst case scenarios:
Using the converging pointers quicksort as opposed to the basic quicksort. Let me know if you want more elaboration on this.
Insertion sort subarrays when they get below a certain size. Insertion sort is asymptotically O(N^2), but for small enough N, it beats quicksort (a sketch of this cutoff combined with median-of-three pivoting appears after this list).
Using an iterative quicksort with an explicit stack as opposed to a recursive quicksort.
Unrolling parts of loops to reduce the number of comparisons.
Copying the pivot to a register and using that space in the array to reduce the time cost of swapping elements.
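Putting a few of those ideas together, here is a minimal sketch (the cutoff value and the structure are illustrative choices of mine, not the JDK's actual tuned quicksort): median-of-three pivot selection, a converging-pointers partition, and an insertion-sort cutoff for small subarrays. Call it as tunedQuickSort(array, 0, array.length - 1).

static final int CUTOFF = 16;                     // illustrative threshold, not the JDK's value

static void tunedQuickSort(int[] a, int lo, int hi) {
    if (hi - lo + 1 <= CUTOFF) {                  // small subarray: insertion sort wins
        insertionSort(a, lo, hi);
        return;
    }
    // Median-of-three: order a[lo], a[mid], a[hi], then use the middle value as the pivot.
    int mid = lo + (hi - lo) / 2;
    if (a[mid] < a[lo]) swap(a, lo, mid);
    if (a[hi] < a[lo]) swap(a, lo, hi);
    if (a[hi] < a[mid]) swap(a, mid, hi);
    int pivot = a[mid];

    // Converging-pointers partition around the pivot value.
    int i = lo, j = hi;
    while (i <= j) {
        while (a[i] < pivot) i++;
        while (a[j] > pivot) j--;
        if (i <= j) swap(a, i++, j--);
    }
    tunedQuickSort(a, lo, j);
    tunedQuickSort(a, i, hi);
}

static void insertionSort(int[] a, int lo, int hi) {
    for (int k = lo + 1; k <= hi; k++) {
        int v = a[k], m = k - 1;
        while (m >= lo && a[m] > v) { a[m + 1] = a[m]; m--; }
        a[m + 1] = v;
    }
}

static void swap(int[] a, int i, int j) {
    int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
}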
Other Notes
Java uses mergesort when sorting objects because it is a stable sort (the order of elements that have the same key is preserved). Quicksort can be stable or unstable, but the stable version is slower than the unstable version.
"Tuned" quicksort just means that some improvements are applied to the basic algorithm. Usually the improvements are to try and avoid worst case time complexity. Some examples of improvements might be to choose the pivot (or multiple pivots) so that there's never only 1 key in a partition, or only make the recursive call when a partition is above a certain minimum size.
It looks like Java only uses merge sort when sorting Objects (the Arrays doc tells you which sorting algorithm is used for which sort method signature), so I don't think it ever really "decides" on its own, but the decision was made in advance. (Also, implementers are free to use another sort, as long as it's stable.)
In Java, Arrays.sort(Object[]) uses merge sort, but all the other overloaded sort functions use insertion sort if the length is less than 7, and tuned quicksort otherwise.
