What is the big-O runtime of Java's Arrays.copyOfRange(array, startIndex, endIndex) function?
For example, would it be equivalent or less efficient in terms of both space and time complexity to write a simple binary search on arrays function using copyOfRange rather than passing in the start and end indices?
Arrays.copyOfRange() uses System.arraycopy(), which uses native code under the hood (it could use memcpy, for example, depending on the JIT implementation).
The "magic" behind copying with System.arraycopy() is making one call to copy a block of memory instead of making n distinct calls.
That means using Arrays.copyOfRange() will be more efficient than any element-by-element copy you implement yourself.
Further, I don't see how a binary search could help here: an array has direct access, and here we know exactly what the src and dst are and how many items we should copy.
From a big-O perspective, the complexity is O(n*k), where n is the number of items to copy and k is the size (in bits) of each item; since k is a constant for a given element type, this is linear in n. Space complexity is the same.
Arrays.copyOfRange takes linear time -- with a low constant factor, but still linear time. Manipulating the start and end indices will inevitably be asymptotically faster, O(log n) instead of O(n).
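To make the trade-off concrete, here is a minimal sketch (the class and method names are made up for illustration) of a binary search written both ways: one recursing on index bounds, the other recursing on copies made with Arrays.copyOfRange.

```java
import java.util.Arrays;

public class BinarySearchDemo {

    // O(log n): only the bounds move; no array data is ever copied.
    static int searchWithIndices(int[] a, int key, int lo, int hi) {
        if (lo > hi) return -1;
        int mid = lo + (hi - lo) / 2;
        if (a[mid] == key) return mid;
        return a[mid] < key
                ? searchWithIndices(a, key, mid + 1, hi)
                : searchWithIndices(a, key, lo, mid - 1);
    }

    // O(n) overall: each level copies roughly half of the remaining range,
    // so the copies sum to about 2n elements and dominate the log n comparisons.
    static boolean searchWithCopy(int[] a, int key) {
        if (a.length == 0) return false;
        int mid = a.length / 2;
        if (a[mid] == key) return true;
        return a[mid] < key
                ? searchWithCopy(Arrays.copyOfRange(a, mid + 1, a.length), key)
                : searchWithCopy(Arrays.copyOfRange(a, 0, mid), key);
    }

    public static void main(String[] args) {
        int[] sorted = {1, 3, 5, 7, 9, 11};
        System.out.println(searchWithIndices(sorted, 7, 0, sorted.length - 1)); // 3
        System.out.println(searchWithCopy(sorted, 7));                          // true
    }
}
```

The copying version also allocates O(n) extra space across the recursion, whereas the index version uses only the recursion stack.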
With reference to the question, I found that the accepted answer used the Java Collections API to get the index. My question is: there are so many other methods to solve the given problem, so which would be the optimal solution?
Use two loops
Use sorting and binary search
Use sorting and merging
Use hashing
Use the Collections API
Using two loops will take O(n^2) time.
Using sorting and binary search will take O(n log n) time.
Using sorting and merging will take O(n log n) time.
Using hashing will take O(k * n) time, with some additional overhead and extra space.
Using the Collections API will take O(n^2) time, as it does a brute-force scan under the hood.
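Assuming the underlying problem is finding the starting index of one array inside another (which is what the KMP suggestion below implies), the Collections API route looks roughly like this; Collections.indexOfSubList is a real method, but the surrounding setup is only an illustrative sketch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SubListSearchDemo {
    public static void main(String[] args) {
        // Boxed arrays, so they can be viewed as Lists without copying data.
        Integer[] haystack = {2, 4, 6, 8, 10, 12};
        Integer[] needle   = {8, 10};

        // indexOfSubList does a brute-force scan, so it is O(n * m)
        // in the lengths of the two lists.
        int start = Collections.indexOfSubList(Arrays.asList(haystack), Arrays.asList(needle));
        System.out.println(start); // 3
    }
}
```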
Beyond the ways mentioned above, you can do it optimally by using the Knuth–Morris–Pratt algorithm, with linear O(n + m) time complexity, where n and m are the lengths of the two arrays.
The KMP algorithm is basically a pattern-matching algorithm (finding the starting position of a needle in a haystack) that works on character strings, but you can easily use it for integer arrays, as sketched below.
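A rough sketch of KMP adapted to int arrays (the class and method names are made up for illustration, not a library API):

```java
public class IntKmp {

    // Failure function: lps[i] is the length of the longest proper prefix of
    // needle[0..i] that is also a suffix of it.
    static int[] buildLps(int[] needle) {
        int[] lps = new int[needle.length];
        int len = 0;
        for (int i = 1; i < needle.length; ) {
            if (needle[i] == needle[len]) {
                lps[i++] = ++len;
            } else if (len > 0) {
                len = lps[len - 1];
            } else {
                lps[i++] = 0;
            }
        }
        return lps;
    }

    // Returns the start index of needle in haystack, or -1. O(n + m) time.
    static int indexOf(int[] haystack, int[] needle) {
        if (needle.length == 0) return 0;
        int[] lps = buildLps(needle);
        int j = 0;
        for (int i = 0; i < haystack.length; i++) {
            while (j > 0 && haystack[i] != needle[j]) {
                j = lps[j - 1];      // fall back instead of rescanning the haystack
            }
            if (haystack[i] == needle[j]) {
                j++;
            }
            if (j == needle.length) {
                return i - needle.length + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] haystack = {2, 4, 6, 8, 10, 12};
        int[] needle = {8, 10};
        System.out.println(indexOf(haystack, needle)); // 3
    }
}
```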
You can run benchmark tests for all of those implementations and choose whichever is efficient enough to suit your requirements.
I know that Java's Arrays.sort method uses MergeSort for sorting arrays of objects (or collections of objects) since it is stable, and Java uses QuickSort for arrays of primitives because we don't need stability since two equal ints are indistinguishable, i.e. their identity doesn't matter.
My question is, in the case of primitives, why doesn't Java use MergeSort's guaranteed O(n log n) time and instead goes for the average O(n log n) time of QuickSort? In the last paragraph of one of the related answers here, it is explained that:
For reference types, where the referred objects usually take up far more memory than the array of references, this generally does not matter. But for primitive types, cloning the array outright doubles the memory usage.
What does this mean? Cloning a reference is still at least as costly as cloning a primitive. Are there any other reasons for using QuickSort (average O(n log n)) instead of MergeSort (guaranteed O(n log n) time) on arrays of primitives?
Not all O(n log n) algorithms have the same constant factors. Quicksort, in the 99.9% of cases where it takes n log n time, runs in a much faster n log n than mergesort. I don't know the exact multiplier -- and it'll vary system to system -- but, say, quicksort could run twice as fast as merge sort on average and still have theoretical worst case n^2 performance.
Additionally, quicksort doesn't require cloning the array in the first place, while merge sort inevitably does. For reference types you don't have a choice if you want a stable sort, so you have to accept the copy; for primitives you don't need to accept that cost.
Cloning a reference is still at least as costly as cloning a primitive.
Most (or all?) implementations of Java implement an array of objects as an array of pointers (references) to objects. So cloning an array of pointers (references) would consume less space than cloning the objects themselves if the objects are larger in size than a pointer (reference).
I don't know why the term "cloning" was used. Merge sort allocates a second temp array, but that array is not a "clone" of the original. Instead, a proper merge sort alternates the direction of merging, from original to temp or from temp to original, depending on the iteration (bottom-up) or the level of recursion (top-down).
dual pivot quick sort
Based on what I can find doing web searches, Java's dual pivot quicksort keeps track of "recursions", and switches to heap sort if the recursion depth is excessive, to maintain O(n log(n)) time complexity, but at a higher cost factor.
quick sort versus merge sort
In addition to stability, merge sort can be faster for sorting an array of pointers (references) to objects. Merge sort does more moves (of the pointers) but fewer compares (of the objects accessed by dereferencing pointers) than quick sort.
On a system with 16 registers (most of them used as pointers), such as X86 in 64 bit mode, a 4-way merge sort is about as fast as regular quick sort, but I don't recall seeing a 4-way merge sort in a common library, at least not for a PC.
Arrays#sort(primitive array) doesn't use traditional Quick Sort; it uses Dual-Pivot Quicksort, which is faster than quicksort, which in turn is faster than merge sort, in part because it doesn't have to be stable.
From the javadoc:
Implementation note: The sorting algorithm is a Dual-Pivot Quicksort by Vladimir Yaroslavskiy, Jon Bentley, and Joshua Bloch. This algorithm offers O(n log(n)) performance on many data sets that cause other quicksorts to degrade to quadratic performance, and is typically faster than traditional (one-pivot) Quicksort implementations.
QuickSort is approximately 40% faster than MergeSort on random data because of fewer data movements
QuickSort sorts in place, needing only O(log n) stack space for recursion, while MergeSort requires O(n) extra space
P.S. Neither classic QuickSort nor MergeSort is used in the Java standard library.
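As a minimal illustration of which overload does what in current JDKs (Arrays.sort(int[]) dispatches to the dual-pivot quicksort, Arrays.sort(Object[]) to the stable merge-based TimSort):

```java
import java.util.Arrays;

public class SortOverloadsDemo {
    public static void main(String[] args) {
        // Primitive overload: Arrays.sort(int[]) uses the dual-pivot quicksort.
        // Stability is irrelevant because equal ints are indistinguishable.
        int[] primitives = {5, 1, 4, 1, 5, 9, 2, 6};
        Arrays.sort(primitives);
        System.out.println(Arrays.toString(primitives));

        // Object overload: Arrays.sort(Object[]) uses a stable merge-based sort
        // (TimSort), which needs O(n) auxiliary space for the merge buffer,
        // but that buffer holds references rather than whole objects.
        Integer[] boxed = {5, 1, 4, 1, 5, 9, 2, 6};
        Arrays.sort(boxed);
        System.out.println(Arrays.toString(boxed));
    }
}
```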
What is the time complexity of the Math.sqrt implementation in Java?
Java has it implemented with some technique whose time complexity I am trying to determine.
In most cases, Java attempts to use the "smart-power" algorithm, which results in a time complexity of O(log n).
Smart power Algorithm
Also, it appears that in different cases you could end up with different complexities; see Why is multiplying many times faster than taking the square root?
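The "smart power" idea referred to above is exponentiation by squaring. The following is only a sketch of that general technique, not the JDK's actual native implementation; it shows why the number of multiplications is O(log n) in the exponent:

```java
public class SmartPower {

    // Computes base^exp with O(log exp) multiplications by halving the
    // exponent at each step (exponentiation by squaring).
    static double power(double base, int exp) {
        if (exp < 0) return 1.0 / power(base, -exp);
        double result = 1.0;
        while (exp > 0) {
            if ((exp & 1) == 1) {   // odd exponent: peel off one factor
                result *= base;
            }
            base *= base;           // square the base
            exp >>= 1;              // halve the exponent
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(power(2.0, 10));  // 1024.0
        System.out.println(power(3.0, 5));   // 243.0
    }
}
```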
It looks like it is implemented by delegating to the sqrt method of StrictMath, which is a native method.
Thus it seems the answer would be implementation specific.
Strictly speaking it is O(1). In theory (but obviously not in practice), we could iterate over all doubles and find the maximum time.
In addition, the time complexity of Math.sqrt(n) does not directly depend on n but instead on the amount of space needed to represent n, which for doubles is constant.
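To illustrate why a fixed-width double leads to a bounded cost, here is a Newton's-method sketch. This is not how StrictMath.sqrt is actually implemented (that is native code); it only demonstrates that a bounded number of iterations suffices for 53 bits of precision:

```java
public class NewtonSqrt {

    // Newton's iteration for sqrt(x). The initial guess is seeded from the
    // exponent, so it starts within a small factor of the true root; each
    // iteration then roughly doubles the number of correct bits. A double has
    // only 53 mantissa bits, so a bounded number of iterations is always
    // enough, which is why the cost can be treated as constant per call.
    static double sqrt(double x) {
        if (x < 0) return Double.NaN;
        if (x == 0 || Double.isNaN(x) || x == Double.POSITIVE_INFINITY) return x;

        double guess = Math.scalb(1.0, Math.getExponent(x) / 2); // rough sqrt(x)
        for (int i = 0; i < 60; i++) {
            double next = 0.5 * (guess + x / guess);
            if (next == guess) break;   // converged to full double precision
            guess = next;
        }
        return guess;
    }

    public static void main(String[] args) {
        System.out.println(sqrt(2.0));       // ~1.4142135623730951
        System.out.println(Math.sqrt(2.0));  // library value for comparison
    }
}
```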
I often see you guys talking about N methods and N^2 methods, which, correct me if I'm wrong, indicate how fast a method is. My question is: how do you guys know which methods are N and which are N^2? And also: are there other speed indications of methods than just N and N^2?
This talks about the complexity of an algorithm (which is an indicator of how fast it will be, yes).
In short, it tells how many "operations" (with operations being a very vague and abstract term) will be needed for an input to the method of size N.
E.g. if your input is a List-type object and you must iterate over all items in the list, the complexity is N (often expressed as O(N)).
If your input is a list-type object, and you only need to look at the first (or last) item, and the list guarantees that such a lookup is O(1), your method will be O(1), independent of the input size.
If your input is a list and you need to compare every item to every other item, the complexity will be O(N²), or O(N*log(N)) if you can avoid the full pairwise comparison, e.g. by sorting first.
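Some tiny illustrative methods for each class (assuming a random-access list such as ArrayList; all names are made up for the example):

```java
import java.util.List;

public class ComplexityExamples {

    // O(1): looks at a single element, regardless of how long the list is.
    static int first(List<Integer> items) {
        return items.get(0);
    }

    // O(N): touches each element exactly once.
    static long sum(List<Integer> items) {
        long total = 0;
        for (int value : items) {
            total += value;
        }
        return total;
    }

    // O(N^2): compares every pair of elements.
    static boolean hasDuplicate(List<Integer> items) {
        for (int i = 0; i < items.size(); i++) {
            for (int j = i + 1; j < items.size(); j++) {
                if (items.get(i).equals(items.get(j))) {
                    return true;
                }
            }
        }
        return false;
    }
}
```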
correct me if I'm wrong, indicate how fast a method is.
It says how an algorithm will scale on an ideal machine. It deliberately ignores the constant factors involved, which means that an O(1) method could be slower than an O(N) method, which could be slower than an O(N^2) method, for your use case. E.g. Arrays.sort() will use insertion sort, O(N^2), for small ranges (length < 47 in Java 7) in preference to quicksort, O(N ln N).
In general, using lower-order algorithms is a safer choice because they are less likely to break in extreme cases that you may not get a chance to test thoroughly.
The way to guesstimate the big-O complexity of a program is based on experience with dry-running code (running it in your mind). Some cases are dead obvious, but the most interesting ones aren't: for example, calling library methods known to be O(n) in an O(n) loop results in O(n^2) total complexity; writing to a TreeMap in an O(n) loop results in O(n log n) total, and so on.
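Two concrete examples of that kind of estimate, using standard collections:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class NestedCostDemo {
    public static void main(String[] args) {
        List<Integer> source = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            source.add(i % 100);
        }

        // O(n^2) in the worst case: List.contains is a linear scan over the
        // list built so far, and it runs once per iteration of the outer O(n) loop.
        List<Integer> unique = new ArrayList<>();
        for (int value : source) {
            if (!unique.contains(value)) {
                unique.add(value);
            }
        }

        // O(n log n): each TreeMap write costs O(log n), repeated n times.
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        for (int value : source) {
            counts.merge(value, 1, Integer::sum);
        }

        System.out.println(unique.size() + " distinct values, " + counts.size() + " keys");
    }
}
```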
When it comes to evaluating the time complexity of an algorithm which uses an array that must be initialized, the initialization is usually expressed as O(k), where k is the size of the array.
For instance, the counting sort has a time complexity of O(n + k).
But what happens when the array is automatically initialized, as in Java or PHP? Would it be fair to say that counting sort (or any other algorithm that needs an initialized array) in Java (or PHP...) has a time complexity of O(n)?
Are you talking about this: http://en.wikipedia.org/wiki/Counting_sort, which has a time complexity of O(n + k)?
You have to remember that time complexity is determined for an idealised machine which doesn't have caches, resource constraints and is independent of how a particular language or machine might actually perform.
The time complexity is still O(n + k)
However, on a real machine the initialisation is likely to be much more efficient than the incrementing, so n and k are not directly comparable. The access pattern for initialisation is likely to be sequential and very efficient (the k term). If the counts are of type int, for example, the CPU could be using long or 128-bit registers to perform the initialisation.
The access pattern for counting is likely to be relatively random, and for large values of k it is likely to be much slower (up to 10x slower).
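A counting sort sketch that makes the two terms visible; in Java, the new int[k] allocation is where the automatic zero-initialization happens:

```java
import java.util.Arrays;

public class CountingSortDemo {

    // Sorts values known to lie in [0, k). Time is O(n + k):
    //   - "new int[k]" zero-initializes k counters (the k term),
    //   - the loop over the input and the writes to the output are the n term,
    //   - the outer loop over the count array is another k term.
    static int[] countingSort(int[] input, int k) {
        int[] counts = new int[k];          // automatically zeroed by the JVM: O(k)
        for (int value : input) {           // O(n)
            counts[value]++;
        }
        int[] sorted = new int[input.length];
        int pos = 0;
        for (int value = 0; value < k; value++) {   // O(k) iterations...
            for (int c = 0; c < counts[value]; c++) {
                sorted[pos++] = value;               // ...writing n values in total
            }
        }
        return sorted;
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5};
        System.out.println(Arrays.toString(countingSort(data, 10)));
    }
}
```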
Actually, it would be O(n + k).
Thus, if n is of a higher order than k (many duplicates in counting sort), k can be discarded from the time complexity, making it O(n).
Automatic initialization isn't free; you must account for it anyway, so it's still O(n + k).