I have to write a small Java program to find out how long it takes to perform a search algorithm.
The algorithm reads as:
Assume you have a search algorithm which, at each level of recursion, excludes half of the data from consideration when searching for a specific data item. Search stops only when one data item is left.
I would like to know your opinion about which search algorithm it is.
Sounds like binary search (also called half-interval search), provided that your collection or data is sorted; its worst and average case are O(log n).
If you have to sort first, the best comparison-based sort gives you O(n log n); adding the O(log n) of the binary search, the overall cost is still O(n log n).
If your data is sorted, then yes, this is the binary search algorithm. It falls under the "divide and conquer" strategy: at each step the algorithm divides the data and eliminates half of it. The basic assumption is that the data is sorted before the algorithm is applied.
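A minimal sketch of such a timing program, assuming a sorted int[] and the halving search described above; the array size and the single System.nanoTime measurement are illustrative, not a rigorous benchmark.

    import java.util.Arrays;

    public class SearchTiming {
        // Recursive search that discards half of the remaining range each call
        // and stops when a single element is left, as in the question.
        static int search(int[] data, int key, int lo, int hi) {
            if (lo >= hi) {
                return data[lo] == key ? lo : -1;
            }
            int mid = (lo + hi) >>> 1;
            if (data[mid] < key) {
                return search(data, key, mid + 1, hi);   // keep the upper half
            } else {
                return search(data, key, lo, mid);       // keep the lower half
            }
        }

        public static void main(String[] args) {
            int n = 1_000_000;
            int[] data = new int[n];
            for (int i = 0; i < n; i++) {
                data[i] = 2 * i;                         // already sorted
            }
            long start = System.nanoTime();
            int index = search(data, 123_456, 0, n - 1);
            long elapsed = System.nanoTime() - start;
            System.out.println("found at " + index + " in " + elapsed + " ns");
        }
    }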
At what array size is it better to use sequential search over binary search (which requires the data to be sorted first) in these specific situations? In the first case, the values of the array are random numbers and not sorted. In the second case, the values are sorted numerically, from least to greatest or greatest to least. For the searches, assume you are only trying to find one number in the array.
Case 1: Random numbers
Case 2: Already sorted
The Sequential Search algorithm has a worst-case running time of O(n) and does not depend on whether the data is sorted.
The Binary Search algorithm has a worst-case running time of O(log n); however, in order to use the algorithm the data must be sorted. If the data is not sorted, sorting it first takes O(n log n) time.
Therefore:
Case 1: When the data is not sorted, a Sequential Search is more time efficient, as it takes O(n) time. A Binary Search would require the data to be sorted in O(n log n) and then searched in O(log n), for a total of O(n log n) + O(log n) = O(n log n).
Case 2: When the data is already sorted, a Binary Search is more time efficient, as it takes only O(log n) time, while the Sequential Search still takes O(n) time.
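As a rough illustration of the two cases, here is a sketch comparing a linear scan, sorting followed by binary search, and binary search on already-sorted data; the array size is arbitrary and JVM warm-up effects are ignored.

    import java.util.Arrays;
    import java.util.Random;

    public class SearchComparison {
        static int linearSearch(int[] data, int key) {
            for (int i = 0; i < data.length; i++) {
                if (data[i] == key) {
                    return i;
                }
            }
            return -1;
        }

        public static void main(String[] args) {
            int n = 2_000_000;
            int[] unsorted = new Random(1).ints(n).toArray();
            int key = unsorted[n / 2];

            // Case 1: unsorted data - a single linear scan is O(n).
            long t0 = System.nanoTime();
            linearSearch(unsorted, key);
            long linearTime = System.nanoTime() - t0;

            // Sorting first just to binary search once costs O(n log n).
            int[] sorted = unsorted.clone();
            long t1 = System.nanoTime();
            Arrays.sort(sorted);
            Arrays.binarySearch(sorted, key);
            long sortPlusBinary = System.nanoTime() - t1;

            // Case 2: data already sorted - binary search alone is O(log n).
            long t2 = System.nanoTime();
            Arrays.binarySearch(sorted, key);
            long binaryOnly = System.nanoTime() - t2;

            System.out.printf("linear: %,d ns  sort+binary: %,d ns  binary only: %,d ns%n",
                    linearTime, sortPlusBinary, binaryOnly);
        }
    }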
Binary search is always better, but it always requires the array to be sorted.
With reference to the question, I have found that the accepted answer uses the Java Collections API to get the index. My question is: since there are so many other methods to solve the given problem, which would be the optimal solution?
Use two loops
Use sorting and binary search
Use sorting and merging
Use hashing
Use Collections api
Using two loops will take O(n^2) time.
Using sorting and binary search will take O(n log n) time.
Using sorting and merging will take O(n log n) time.
Using hashing will take O(k * n) time, with some extra overhead and additional space.
Using the Collections API will take O(n^2) time, as it uses a naive algorithm under the hood.
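Assuming the underlying problem is locating one array inside another (the needle/haystack framing of the KMP suggestion below), the Collections API route boils down to a single call to Collections.indexOfSubList; a minimal sketch:

    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;

    public class SubListIndexDemo {
        public static void main(String[] args) {
            List<Integer> source = Arrays.asList(4, 8, 15, 16, 23, 42);
            List<Integer> target = Arrays.asList(15, 16, 23);
            // Brute-force scan under the hood, roughly O(n * m).
            int index = Collections.indexOfSubList(source, target);
            System.out.println(index); // prints 2
        }
    }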
In addition to the ways mentioned above, you can do it optimally by using the Knuth–Morris–Pratt algorithm, which runs in linear O(n + m) time, where n and m are the lengths of the two arrays.
The KMP algorithm is basically a pattern-matching algorithm (finding the starting position of a needle in a haystack) that works on character strings, but you can easily use it for an integer array.
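Here is a minimal KMP sketch adapted to int arrays; the class and helper names (IntKmp, buildFailure, indexOf) are made up for illustration.

    public class IntKmp {
        // Build the failure table: fail[i] is the length of the longest proper
        // prefix of pattern[0..i] that is also a suffix of it.
        static int[] buildFailure(int[] pattern) {
            int[] fail = new int[pattern.length];
            int k = 0;
            for (int i = 1; i < pattern.length; i++) {
                while (k > 0 && pattern[k] != pattern[i]) {
                    k = fail[k - 1];
                }
                if (pattern[k] == pattern[i]) {
                    k++;
                }
                fail[i] = k;
            }
            return fail;
        }

        // Return the starting index of the first occurrence of needle in haystack, or -1.
        static int indexOf(int[] haystack, int[] needle) {
            if (needle.length == 0) return 0;
            int[] fail = buildFailure(needle);
            int k = 0; // number of needle elements matched so far
            for (int i = 0; i < haystack.length; i++) {
                while (k > 0 && needle[k] != haystack[i]) {
                    k = fail[k - 1];
                }
                if (needle[k] == haystack[i]) {
                    k++;
                }
                if (k == needle.length) {
                    return i - needle.length + 1;
                }
            }
            return -1;
        }

        public static void main(String[] args) {
            int[] haystack = {4, 8, 15, 16, 23, 42, 8, 15};
            int[] needle = {16, 23, 42};
            System.out.println(indexOf(haystack, needle)); // prints 3
        }
    }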
You can run benchmark tests for all of those implementations and choose the one that is efficient enough to suit your requirements.
I have implemented a splay tree (insert, search, delete operations) in Java. Now I want to check whether the complexity of the algorithm is O(log n) or not. Is there any way to check this by varying the input size (number of nodes) and measuring the run time in seconds? Say, by using input sizes like 1000 and 100000 and checking the run time, or is there some other way?
Strictly speaking, you cannot find the time complexity of the algorithm by running it for some values of n. Let's assume that you've run it for values n_1, n_2, ..., n_k. If the algorithm makes n^2 operations for any n <= max(n_1, ..., n_k) and exactly 10^100 operations for any larger value of n, it has a constant time complexity, even though it would look like a quadratic one from the points you have.
However, you can assess the number of operations it takes to complete on an input of size n (I wouldn't even call it time complexity here, as the latter has a strict formal definition) by running it on several values of n and looking at the ratios T(n1) / T(n2) versus n1 / n2. But even for a "real" algorithm (in the sense that it is not the pathological case described in the first paragraph), you should be careful with the "structure" of the input. For example, a quicksort that takes the first element as pivot runs in O(n log n) on average for random input, so it would look like an O(n log n) algorithm if you generate random arrays of different sizes; however, it runs in O(n^2) time for a reverse-sorted array.
To sum it up, if you need to figure out whether it is fast enough from a practical point of view and you have an idea of what a typical input to your algorithm looks like, you can try generating inputs of different sizes and see how the execution time grows.
However, if you need a bound on the runtime in a mathematical sense, you need to prove some properties and bounds of your algorithm mathematically.
In your case, I would say that testing on random inputs is a reasonable idea (because there is a mathematical proof that the amortized time complexity of one operation on a splay tree is O(log n)), so you just need to check that the tree you have implemented is indeed a correct splay tree. One note: I'd recommend trying different query patterns (like inserting elements in sorted or reverse order, and so on), as even unbalanced trees can look fast as long as the input is "random".
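As a rough sketch of that kind of growth-rate check, the snippet below uses java.util.TreeMap as a stand-in so it compiles on its own; you would replace the put/containsKey calls with your own splay tree's insert and search methods. The sizes and the printed ratio are illustrative only, not a rigorous benchmark.

    import java.util.Random;
    import java.util.TreeMap;

    public class GrowthRateCheck {
        public static void main(String[] args) {
            Random rnd = new Random(42);
            for (int n = 1_000; n <= 1_000_000; n *= 10) {
                // Stand-in structure; swap in your own splay tree here.
                TreeMap<Integer, Integer> tree = new TreeMap<>();
                long start = System.nanoTime();
                for (int i = 0; i < n; i++) {
                    int key = rnd.nextInt();
                    tree.put(key, key);                  // insert
                    tree.containsKey(rnd.nextInt());     // search
                }
                long elapsed = System.nanoTime() - start;
                // If one operation costs O(log n), this ratio should stay
                // roughly constant as n grows by factors of 10.
                double ratio = elapsed / (n * (Math.log(n) / Math.log(2)));
                System.out.printf("n=%,d  total=%.1f ms  elapsed / (n * log2 n)=%.2f%n",
                        n, elapsed / 1e6, ratio);
            }
        }
    }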
I am confused about the performance analysis of binarySearch from the Collections class.
It says:
If the specified list does not implement the RandomAccess interface and is large, this method will do an iterator-based binary search that performs O(n) link traversals and O(log n) element comparisons.
I am not sure how to interpret this O(n) + O(log n).
I mean, isn't it worse than simply traversing the linked list and comparing? We still get only O(n).
So what does this statement mean about performance? As phrased, I cannot see the difference from a plain linear search in the linked list.
What am I misunderstanding here?
First of all, you must understand that without the RandomAccess interface, binarySearch cannot simply access a random element of the list; instead it has to use an iterator, and that introduces an O(n) cost. When the collection implements RandomAccess, the cost of each element access is O(1) and can be ignored as far as asymptotic complexity is concerned.
Because O(n) is greater than O(log n), it always takes precedence and dominates the complexity, so in this case binarySearch has the same asymptotic complexity as a simple linear search. So what is the advantage?
A linear search performs O(n) comparisons, as opposed to the O(log n) comparisons of binarySearch without random access. This is especially important when the constant behind O(log n) is high, in plain English: when a single comparison is much more expensive than advancing the iterator. That is quite a common scenario, so limiting the number of comparisons is beneficial. Profit!
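To make the comparison count concrete, here is a small sketch that counts comparator invocations for Collections.binarySearch on a LinkedList versus a plain linear scan; the list contents and the key are arbitrary.

    import java.util.Collections;
    import java.util.Comparator;
    import java.util.LinkedList;
    import java.util.concurrent.atomic.AtomicInteger;

    public class ComparisonCountDemo {
        public static void main(String[] args) {
            LinkedList<Integer> list = new LinkedList<>();
            for (int i = 0; i < 100_000; i++) {
                list.add(i);
            }

            AtomicInteger comparisons = new AtomicInteger();
            Comparator<Integer> counting = (a, b) -> {
                comparisons.incrementAndGet();
                return Integer.compare(a, b);
            };

            // Binary search on a linked list: O(n) traversal, ~log2(n) comparisons.
            Collections.binarySearch(list, 73_421, counting);
            System.out.println("binarySearch comparisons: " + comparisons.get());

            // A plain linear scan compares against every element it passes.
            comparisons.set(0);
            for (Integer x : list) {
                if (counting.compare(x, 73_421) >= 0) {
                    break;
                }
            }
            System.out.println("linear scan comparisons:  " + comparisons.get());
        }
    }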
Binary search is not suited for linked lists. The algorithm is supposed to benefit from a sorted collection with random access (like a plain array), where it can quickly jump from one element to another, splitting the remaining search space in two on each iteration (hence the O(log N) time complexity).
For a linked list, there is a modified version which iterates through all elements (and needs to go through 2n elements in the worst case), but instead of comparing every element, it "probes" the list at specified positions only (hence doing a lower number of comparisons compared to a linear search).
Since comparisons are usually somewhat more costly than plain pointer traversal, the total time should be lower. That is why the O(log n) part is emphasized separately.
Why do we use hashing for search? What are the advantages of using hashing over a binary search tree?
Hashing is generally a constant-time operation, whereas a binary tree lookup has logarithmic time complexity.
Because a hash is calculated from the item being searched for rather than from the number of items in the collection, the size of the collection has no bearing on the time it takes to find an item. However, most hashing algorithms have collisions, which increases the time complexity, so a perfectly constant-time lookup is unlikely in practice.
With a binary tree, you have to do up to log2(N) comparisons before the item can be found.
Wikipedia explains it well:
http://en.wikipedia.org/wiki/Hash_table#Features
Summary: inserts are generally slower, but reads are faster than with trees.
As for Java: any time you have key/value pairs that you read a lot and write infrequently, and everything easily fits into RAM, use a HashTable for quick read access and incredible ease of code maintenance.
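A tiny sketch of that read-heavy key/value scenario, comparing a hash-based map with a tree-based one; the data set and the single timing pass are illustrative only.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.TreeMap;

    public class MapLookupDemo {
        public static void main(String[] args) {
            Map<Integer, String> hashMap = new HashMap<>();
            Map<Integer, String> treeMap = new TreeMap<>();
            for (int i = 0; i < 1_000_000; i++) {
                hashMap.put(i, "value-" + i);
                treeMap.put(i, "value-" + i);
            }

            // Hash-based reads: expected O(1) per lookup, independent of map size.
            long t0 = System.nanoTime();
            for (int i = 0; i < 1_000_000; i++) {
                hashMap.get(i);
            }
            long hashTime = System.nanoTime() - t0;

            // Tree-based reads: O(log n) comparisons per lookup.
            long t1 = System.nanoTime();
            for (int i = 0; i < 1_000_000; i++) {
                treeMap.get(i);
            }
            long treeTime = System.nanoTime() - t1;

            System.out.printf("1M HashMap gets: %,d ns, 1M TreeMap gets: %,d ns%n",
                    hashTime, treeTime);
        }
    }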
Hashing means using some function or algorithm to map object data to some representative integer value. This so-called hash code (or simply hash) can then be used as a way to narrow down our search when looking for the item in the map.
If you need an algorithm that is fast for looking up the information you need, then the HashTable is the most suitable one to use, as it simply generates a hash of your key object and uses that to access the target data: it is O(1). The others are O(N) (linked lists of size N, where you have to iterate through the list one element at a time, an average of N/2 times) and O(log N) (binary tree, where you halve the search space with each iteration, but only if the tree is balanced, so this depends on your implementation; an unbalanced tree can have significantly worse performance).
Hash tables are best for equality searches (=) if you have relatively few inserts and a uniform slot distribution. The time complexity is O(n + k), i.e. linear.
They are not a good idea if you want to do comparison operations (<, >).
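For comparison-style queries, an ordered structure such as java.util.TreeMap is the natural fit; a minimal sketch of a range lookup that a plain HashMap cannot answer directly:

    import java.util.TreeMap;

    public class RangeQueryDemo {
        public static void main(String[] args) {
            TreeMap<Integer, String> prices = new TreeMap<>();
            prices.put(10, "cheap");
            prices.put(50, "medium");
            prices.put(90, "expensive");

            // All entries with key < 60: a "less than" query the tree answers
            // in O(log n) plus output size; a HashMap would need a full scan.
            System.out.println(prices.headMap(60));    // {10=cheap, 50=medium}
            // Smallest key >= 40.
            System.out.println(prices.ceilingKey(40)); // 50
        }
    }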