I'm doing a project and I need to calculate the complexity of a recursive method. The method calls itself recursively and uses the methods "incomingEdges" and "opposite". Can someone help me find the complexity of the "FUNCTION" method?
public HashMap<String, Integer[]> FUNCTION() {
    HashMap<String, Integer[]> times = new HashMap<>();
    Integer[] timesAct = new Integer[5];
    boolean[] visited = new boolean[graphPertCpm.numVertices()];
    Vertex<Activity, String> current = graphPertCpm.getVertex(0);
    timesAct[0] = 0;
    timesAct[1] = 0;
    times.put(current.getElement().getKeyId(), timesAct);
    FUNCTION(current, times, visited);
    return times;
}

private void FUNCTION(Vertex<Activity, String> current, HashMap<String, Integer[]> times, boolean[] visited) {
    if (times.get(current.getElement().getKeyId()) == null) {
        for (Edge<Activity, String> inc : graphPertCpm.incomingEdges(current)) {
            Vertex<Activity, String> vAdj = graphPertCpm.opposite(current, inc);
            FUNCTION(vAdj, times, visited);
        }
    }
    visited[current.getKey()] = true;
    for (Entry<Vertex<Activity, String>, Edge<Activity, String>> outs : current.getOutgoing().entrySet()) {
        if (!visited[outs.getKey().getKey()]) {
            int maxEF = 0;
            Vertex<Activity, String> vAdj = graphPertCpm.opposite(current, outs.getValue());
            for (Edge<Activity, String> inc : graphPertCpm.incomingEdges(outs.getKey())) {
                Integer[] timesAct = times.get(graphPertCpm.opposite(outs.getKey(), inc).getElement().getKeyId());
                if (timesAct == null) {
                    vAdj = graphPertCpm.opposite(vAdj, inc);
                    FUNCTION(vAdj, times, visited);
                } else {
                    if (timesAct[1] > maxEF) {
                        maxEF = timesAct[1];
                    }
                }
            }
            Integer[] timesAct = new Integer[5];
            timesAct[0] = maxEF;
            timesAct[1] = timesAct[0] + outs.getKey().getElement().getDuration();
            times.put(outs.getKey().getElement().getKeyId(), timesAct);
            if (visited[vAdj.getKey()] != true) {
                FUNCTION(vAdj, times, visited);
            }
        }
    }
    visited[current.getKey()] = false;
}
Opposite Method
public Vertex<V, E> opposite(Vertex<V, E> vert, Edge<V, E> e) {
    if (e.getVDest() == vert) {
        return e.getVOrig();
    } else if (e.getVOrig() == vert) {
        return e.getVDest();
    }
    return null;
}
IncomingEdges Method
public Iterable<Edge<V, E>> incomingEdges(Vertex<V, E> v) {
    Edge e;
    ArrayList<Edge<V, E>> edges = new ArrayList<>();
    for (int i = 0; i < numVert; i++) {
        for (int j = 0; j < numVert; j++) {
            e = getEdge(getVertex(i), getVertex(j));
            if (e != null && e.getVDest() == v) {
                edges.add(e);
            }
        }
    }
    return edges;
}
So, first are you familiar with the concepts of Big-O analysis?
The most common metric for calculating time complexity is Big O notation. This removes all constant factors so that the running time can be estimated in relation to N as N approaches infinity. In general you can think of it like this:
Constant O(1)
statement;
The running time of the statement will not change in relation to N.
Linear O(n)
for ( i = 0; i < N; i++ )
statement;
The running time of the loop is directly proportional to N. When N doubles, so does the running time.
Quadratic O(n^2)
for ( i = 0; i < N; i++ ) {
for ( j = 0; j < N; j++ )
statement;
}
The running time of the two loops is proportional to the square of N. When N doubles, the running time increases by a factor of four.
Logarithmic O(log n)
while ( low <= high ) {
mid = ( low + high ) / 2;
if ( target < list[mid] )
high = mid - 1;
else if ( target > list[mid] )
low = mid + 1;
else break;
}
The running time of the algorithm is proportional to the number of times N can be divided by 2. This is because the algorithm divides the working area in half with each iteration.
Linearithmic O(n log n)
void quicksort ( int list[], int left, int right ){
int pivot = partition ( list, left, right );
quicksort ( list, left, pivot - 1 );
quicksort ( list, pivot + 1, right );
}
The running time is N * log(N): it consists of N steps (iterative or recursive) that are each logarithmic, so the algorithm is a combination of linear and logarithmic (also termed linearithmic).
Note that none of this has taken into account best, average, and worst case measures. Each would have its own Big O notation. Also note that this is a VERY simplistic explanation. Big O is the most common, but it's also more complex than I've shown. There are also other notations such as big omega, little o, and big theta. You probably won't encounter them outside of an algorithm analysis course.
Your function can be stripped down to two for-loops with recursive calls and one additional for-loop:
for (Edge<Activity, String> inc : graphPertCpm.incomingEdges(current)) {
Vertex<Activity, String> vAdj = graphPertCpm.opposite(current, inc);
FUNCTION(vAdj, times, visited);
for (Entry<Vertex<Activity, String>, Edge<Activity, String>> outs : current.getOutgoing().entrySet()) {
for (Edge<Activity, String> inc : graphPertCpm.incomingEdges(outs.getKey())) {
FUNCTION(vAdj, times, visited);
Then, as suggested, consult the Master Theorem.
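For reference, the Master Theorem applies to recurrences of the form T(n) = a*T(n/b) + f(n) with a >= 1 and b > 1: roughly, if f(n) grows more slowly than n^(log_b a), the recursion dominates and T(n) = Θ(n^(log_b a)); if f(n) grows at the same rate, T(n) = Θ(n^(log_b a) * log n); and if f(n) grows faster (subject to a regularity condition), T(n) = Θ(f(n)).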
If you need the complexity of the underlying graph operations, look at the Big-O Cheat Sheet.
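For reference, the usual figures for the two common representations are (V = vertices, E = edges): an adjacency list takes O(V + E) storage, O(1) to add a vertex or an edge, O(V + E) to remove a vertex and O(E) to remove an edge; an adjacency matrix takes O(V^2) storage, O(V^2) to add or remove a vertex, and O(1) to add, remove, or query an edge. Note that the incomingEdges method shown above loops over all vertex pairs, so it alone performs on the order of V^2 getEdge calls per invocation.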
My task is to take an ArrayList<SomeType> and check for duplicate elements. If a duplicate element is found && it has the same someClassProperty as the first element, use the duplicate element as a parameter in a function call. In the end, the duplicates are removed from the original list and the function returns the number of duplicates. (Sorry if my explanation is poor; please look at my code, then it's easy to understand.)
The problem here is that the code I've come up with is very inefficient and slow, and I can't figure out how to make it faster.
public int removeDuplicateElements(){
    List<SomeType> duplicates = new ArrayList<SomeType>();
    for (int i = 0; i < ListWithDuplicates.size(); i++) {
        SomeType firstElement = ListWithDuplicates.get(i);
        for (int j = i + 1; j < ListWithDuplicates.size(); j++) {
            SomeType otherElement = ListWithDuplicates.get(j);
            assert firstElement != otherElement;
            if (firstElement.sameProperty(otherElement.getProperty())) {
                duplicates.add(otherElement);
                firstElement.someFunction(otherElement);
            }
        }
    }
    for (SomeType duplicate : duplicates) {
        ListWithDuplicates.remove(duplicate);
    }
    return duplicates.size();
}
Assuming your property implements hashCode() and equals(), you can use it as a key in a HashMap to efficiently exclude duplicates and construct a new list from the remaining values:
public int removeDuplicateElements() {
Map<Property, SomeType> uniques = new HashMap<>();
for (SomeType element : ListWithDuplicates) {
uniques.putIfAbsent(element.getProperty(), element);
}
int duplicates = ListWithDuplicates.size() - uniques.size();
ListWithDuplicates = new ArrayList<>(uniques.values());
return duplicates;
}
A more streamy variant of the above map:
Map<Property, SomeType> uniques = ListWithDuplicates.stream()
.collect(Collectors.toMap(SomeType::getProperty, e -> e, (a, b) -> a));
Your code runs in O(n^2) time; we can reduce the time complexity in two ways:
Sorting the Array
public static <T extends Comparable<T>> int removeDuplicateElements(List<T> listWithDuplicates) {
Collections.sort(listWithDuplicates);
for(int i = 0 ; i < listWithDuplicates.size() - 1 ; i++) {
if(listWithDuplicates.get(i).equals(listWithDuplicates.get(i+1))) {
listWithDuplicates.remove(i--);
}
}
return listWithDuplicates.size();
}
By doing it this way, we reduce the time complexity to O(n log n), but we lose the order of the original array.
Hash Table
public static <T extends Comparable<T>> int removeDuplicateElements(List<T> listWithDuplicates) {
HashMap<T,Boolean> hashMap = new HashMap<>();
for(int i = 0 ; i < listWithDuplicates.size() ; i++) {
if(hashMap.containsKey(listWithDuplicates.get(i)))
listWithDuplicates.remove(i--);
else
hashMap.put(listWithDuplicates.get(i),true);
}
return listWithDuplicates.size();
}
This way we reduce the time complexity to O(n), but we exchange it for O(n) memory complexity.
Note: the existing solution may count the same duplicate several times, so the number of actual duplicates is miscalculated when every detected duplicate is simply added to the duplicates list.
An example: the input contains 3 elements, all having the same property X. For element1, two duplicates (element2 and element3) are detected, and for element2 the duplicate element3 is detected again, so the duplicates list ends up with 3 entries, one of which appears twice.
The duplicates may be detected and collected in O(N) time using a Map<SomeProperty, List<SomeType>>; however, additional memory is required for this.
If someFunction needs to be invoked for all predecessors as in the example above:
element1.someFunction(element2);
element1.someFunction(element3);
element2.someFunction(element3);
the following solution may be offered; however, in the worst case it has the same O(N^2) complexity.
public int removeDuplicateElements(List<SomeType> input){
Map<SomeProperty, List<SomeType>> map = input.stream()
.collect(Collectors.groupingBy(SomeType::getProperty));
int size = input.size();
map.values().stream() // Stream<List<SomeType>>
.filter(lst -> lst.size() > 1) // address only duplicated elements
.forEach(lst -> {
for (int i = 0; i < lst.size() - 1; i++) {
for (int j = i + 1; j < lst.size(); j++) {
lst.get(i).someFunction(lst.get(j));
input.remove(lst.get(j));
}
}
});
return size - input.size();
}
If someFunction needs to be invoked only for the first element a simpler and faster solution using Collectors.toMap can be created:
public int removeDuplicateElements(List<SomeType> input){
Map<SomeProperty, SomeType> map = input.stream()
.collect(Collectors.toMap(
SomeType::getProperty,
x -> x,
(a, b) -> {a.someFunction(b); return a;}
));
int size = input.size();
input.retainAll(map.values());
return size - input.size();
}
Good day SO community,
I am a CS student currently performing an experiment combining MergeSort and InsertionSort. It is understood that for a certain threshold, S, InsertionSort will have a quicker execution time than MergeSort. Hence, by merging both sorting algorithms, the total runtime will be optimized.
However, after running the experiment many times, using a sample size of 1000 and varying sizes of S, the results of the experiment do not give a definitive answer each time. Here is a picture of the better results obtained (note that half of the time the result is not as definitive):
Now, trying the same algorithm code with a sample size of 3500:
Finally, trying the same algorithm code with a sample size of 500,000 (note that the y-axis is in milliseconds):
Logically, the hybrid MergeSort should be faster when S <= 10, as InsertionSort does not have the recursive overhead. However, the results of my mini experiment say otherwise.
Currently, these are the Time Complexities taught to me:
MergeSort: O(n log n)
InsertionSort:
Best Case: θ(n)
Worst Case: θ(n^2)
Finally, I have found an online source: https://cs.stackexchange.com/questions/68179/combining-merge-sort-and-insertion-sort that states that:
Hybrid MergeInsertionSort:
Best Case: θ(n + n log (n/x))
Worst Case: θ(nx + n log (n/x))
I would like to ask if there are results in the CS community that show definitive proof that a hybrid MergeSort algorithm will work better than a normal MergeSort algorithm below a certain threshold, S, and if so, why?
Thank you so much SO community, it might be a trivial question, but it really will clarify many questions that I currently have regarding Time Complexities and stuff :)
Note: I am using Java for the coding of the algorithm, and the runtime could be affected by the way Java stores data in memory.
Code in Java:
public static int mergeSort2(int n, int m, int s, int[] arr){
int mid = (n+m)/2, right=0, left=0;
if(m-n<=s)
return insertSort(arr,n,m);
else
{
right = mergeSort2(n, mid,s, arr);
left = mergeSort2(mid+1,m,s, arr);
return right+left+merge(n,m,s,arr);
}
}
public static int insertSort(int[] arr, int n, int m){
int temp, comp=0;
for(int i=n+1; i<= m; i++){
for(int j=i; j>n; j--){
comp++;
comparison2++;
if(arr[j]<arr[j-1]){
temp = arr[j];
arr[j] = arr[j-1];
arr[j-1] = temp;
}
else
break;
}
}
return comp;
}
public static void shiftArr(int start, int m, int[] arr){
for(int i=m; i>start; i--)
arr[i] = arr[i-1];
}
public static int merge(int n, int m, int s, int[] arr){
int comp=0;
if(m-n<=s)
return 0;
int mid = (n+m)/2;
int temp, i=n, j=mid+1;
while(i<=mid && j<=m)
{
comp++;
comparison2++;
if(arr[i] >= arr[j])
{
if(i==mid++&&j==m && (arr[i]==arr[j]))
break;
temp = arr[j];
shiftArr(i,j++,arr);
arr[i] = temp;
if(arr[i+1]==arr[i]){
i++;
}
}
i++;
}
return comp;
}
The example code isn't a conventional merge sort. The merge function is shifting an array instead of merging runs between the original array and a temporary working array and back.
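For contrast, here is a minimal sketch of a conventional top-down merge sort that merges runs through a temporary working array and copies them back (just an illustration, not the code used for the timings below):

// sketch: conventional top-down merge sort using a temporary buffer
static void mergeSort(int[] a) {
    mergeSortRange(a, new int[a.length], 0, a.length);
}

// sorts a[lo..hi) using tmp as scratch space
static void mergeSortRange(int[] a, int[] tmp, int lo, int hi) {
    if (hi - lo <= 1)
        return;
    int mid = lo + (hi - lo) / 2;
    mergeSortRange(a, tmp, lo, mid);
    mergeSortRange(a, tmp, mid, hi);
    // merge the sorted runs a[lo..mid) and a[mid..hi) into tmp, then copy back
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < mid)
        tmp[k++] = a[i++];
    while (j < hi)
        tmp[k++] = a[j++];
    System.arraycopy(tmp, lo, a, lo, hi - lo);
}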
I tested top down and bottom up merge sorts and both take about 42 ms == 0.042 seconds to sort 500,000 32 bit integers, versus the apparent results in the graph which are 1000 times slower at about 42 seconds instead of 42 ms. I also tested with 10,000,000 integers and it takes a bit over 1 second to sort.
In the past, using C++, I compared a bottom up merge sort with a hybrid bottom up merge / insertion sort, and for 16 million (2^24 == 16,777,216) 32 bit integers, the hybrid sort was about 8% faster with S == 16. S == 64 was slightly slower than S == 16. Visual Studio std::stable_sort is a variation of bottom up merge sort (the temp array is 1/2 the size of the original array) and insertion sort, and uses S == 32.
For small arrays, insertion sort is quicker than merge sort, a combination of cache locality and fewer instructions needed to sort a small array with insertion sort. For pseudo random data and S == 16 to 64, insertion sort was about twice as fast as merge sort.
The relative gain diminishes as the array size increases. Considering the effect on bottom up merge sort, with S == 16, only 4 merge passes are optimized. In my test case with 2^24 == 16,777,216 elements, that's 4/24 = 1/6 ~= 16.7% of the number of passes, resulting in about an 8% improvement (so the insertion sort is about twice as fast as merge sort for those 4 passes). The total times were about 1.52 seconds for the merge only sort, and about 1.40 seconds for the hybrid sort, a 0.12 second gain on a process that only takes 1.52 seconds. For a top down merge sort, with S == 16, the 4 deepest levels of recursion would be optimized.
Update - example Java code for a hybrid in-place merge sort / insertion sort with O(n log(n)) time complexity. (Note - auxiliary storage is still consumed on the stack due to recursion.) The in-place part is accomplished during merge steps by swapping the data in the area merged into with the data in the area merged from. This is not a stable sort (the order of equal elements is not preserved, due to the swapping during merge steps). Sorting 500,000 integers takes about 1/8th of a second, so I increased this to 16 million (2^24 == 16777216) integers, which takes a bit over 4 seconds. Without the insertion sort, the sort takes about 4.524 seconds, and with the insertion sort with S == 64, the sort takes about 4.150 seconds, about an 8.8% gain. With essentially the same code in C, the improvement was less: from 2.88 seconds to 2.75 seconds, about a 4.5% gain.
package msortih;
import java.util.Random;
public class msortih {
static final int S = 64; // use insertion sort if size <= S
static void swap(int[] a, int i, int j) {
int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
}
// a[w:] = merged a[i:m]+a[j:n]
// a[i:] = reordered a[w:]
static void wmerge(int[] a, int i, int m, int j, int n, int w) {
while (i < m && j < n)
swap(a, w++, a[i] < a[j] ? i++ : j++);
while (i < m)
swap(a, w++, i++);
while (j < n)
swap(a, w++, j++);
}
// a[w:] = sorted a[b:e]
// a[b:e] = reordered a[w:]
static void wsort(int[] a, int b, int e, int w) {
int m;
if (e - b > 1) {
m = b + (e - b) / 2;
imsort(a, b, m);
imsort(a, m, e);
wmerge(a, b, m, m, e, w);
}
else
while (b < e)
swap(a, b++, w++);
}
// inplace merge sort a[b:e]
static void imsort(int[] a, int b, int e) {
int m, n, w, x;
int t;
// if <= S elements, use insertion sort
if (e - b <= S){
for(n = b+1; n < e; n++){
t = a[n];
m = n-1;
while(m >= b && a[m] > t){
a[m+1] = a[m];
m--;}
a[m+1] = t;}
return;
}
if (e - b > 1) {
// split a[b:e]
m = b + (e - b) / 2;
w = b + e - m;
// wsort -> a[w:e] = sorted a[b:m]
// a[b:m] = reordered a[w:e]
wsort(a, b, m, w);
while (w - b > 2) {
// split a[b:w], w = new mid point
n = w;
w = b + (n - b + 1) / 2;
x = b + n - w;
// wsort -> a[b:x] = sorted a[w:n]
// a[w:n] = reordered a[b:x]
wsort(a, w, n, b);
// wmerge -> a[w:e] = merged a[b:x]+a[n:e]
// a[b:x] = reordered a[w:n]
wmerge(a, b, x, n, e, w);
}
// insert a[b:w] into a[b:e] using left shift
for (n = w; n > b; --n) {
t = a[n-1];
for (m = n; m < e && a[m] < t; ++m)
a[m-1] = a[m];
a[m-1] = t;
}
}
}
public static void main(String[] args) {
int[] a = new int[16*1024*1024];
Random r = new Random(0);
for(int i = 0; i < a.length; i++)
a[i] = r.nextInt();
long bgn, end;
bgn = System.currentTimeMillis();
imsort(a, 0, a.length);
end = System.currentTimeMillis();
for(int i = 1; i < a.length; i++){
if(a[i-1] > a[i]){
System.out.println("failed");
break;
}
}
System.out.println("milliseconds " + (end-bgn));
}
}
Wherever I see the recursive Fibonacci series, everyone says that
a[i] = fib(i - 1) + fib( i - 2)
But it can also be solved with
a[i] = fib(i - 1) + a[i-2] // If array 'a' is a global variable.
If array 'a' is a global variable, then a[i-2] will already have been calculated by the time it is needed, because the recursive call for i-1 fills it in.
It can be solved with the program below in Java:
public class Fibonacci {
public static int maxNumbers = 10;
public static double[] arr = new double[maxNumbers];
public static void main(String args[])
{
arr[0] = 0;
arr[1] = 1;
recur(maxNumbers - 1);
}
public static double recur(int i)
{
if( i > 1)
{
arr[i] = recur(i - 1) + arr[i - 2];
}
return arr[i];
}
}
Furthermore, the complexity is also lower compared with the original procedure. Is there any disadvantage to doing it this way?
You have done the first step toward a dynamic programming calculation of Fibonacci: the idea of DP is to avoid redundant calculations, and your algorithm achieves that goal.
A "classic" Bottom-Up DP Fibonacci implementation is filling the elements from lower to higher:
arr[0] = 0
arr[1] = 1
for (int i = 2; i <= n; i++)
arr[i] = arr[i-1] + arr[i-2]
(An optimization would be to store only curr and last, modifying them at each iteration.)
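For illustration, a minimal Java sketch of that two-variable version:

// bottom-up Fibonacci keeping only the last two values: O(n) time, O(1) extra space
static long fib(int n) {
    if (n == 0) return 0;
    long last = 0, curr = 1;   // F(0) and F(1)
    for (int i = 2; i <= n; i++) {
        long next = last + curr;
        last = curr;
        curr = next;
    }
    return curr;
}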
Your approach is basically the same in principle.
As a side note, the DP approach to calculating Fibonacci takes O(n) time, whereas there is an even more efficient solution using exponentiation of the matrix:
[1 1]
[1 0]
The above holds because you use the fact that
[1 1]   [F_{n+1}]   [1*F_{n+1} + 1*F_{n}]   [F_{n+2}]
[1 0] * [F_{n}  ] = [1*F_{n+1} + 0*F_{n}] = [F_{n+1}]
Using exponentiation by squaring on the above matrix, this can be solved in O(log n).
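A rough sketch of that idea in Java (long overflows past fib(92), so this is only illustrative):

// Fibonacci via exponentiation by squaring of [[1,1],[1,0]]: O(log n) matrix multiplications
static long fibMatrix(int n) {
    long[][] result = {{1, 0}, {0, 1}};   // 2x2 identity
    long[][] base = {{1, 1}, {1, 0}};
    while (n > 0) {
        if ((n & 1) == 1)
            result = multiply(result, base);
        base = multiply(base, base);
        n >>= 1;
    }
    return result[0][1];   // [[1,1],[1,0]]^n = [[F(n+1), F(n)], [F(n), F(n-1)]]
}

static long[][] multiply(long[][] a, long[][] b) {
    return new long[][] {
        { a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1] },
        { a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1] }
    };
}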
If you just want the nth fibonacci number you could do this:
static double fib(double prev, double curr, int n) {
if(n == 0)
return curr;
return fib(curr, prev+curr, n-1);
}
Initial conditions would be prev = 0, curr = 1, n = maxNumbers. This function is tail recursive because you don't need to store the return value of the recursive call for any additional calculations; once you hit your base case, the value that's returned is the same value that would be returned from every other recursive call. (In a language that performs tail-call optimization the stack frame could be reused, saving memory; note that the JVM does not do this, so the call depth is still O(n).)
By using an array like you do, you only recurse down one of the two branches (the longer one at each step), ending up with O(n) complexity.
If you keep track of how large a Fibonacci number you have calculated earlier, you can use that and get O(max(n - prevn, 1)). Here is an altered version of your code that fills the array from the bottom up to i if needed:
public class Fibonacci {
public static final int maxNumbers = 93; // fib(93) > Long.MAX_VALUE
public static long[] arr = new long[maxNumbers];
public static int calculatedN = 0;
public static long fib(int i) throws Exception
{
if( i >= maxNumbers )
throw new Exception("value out of bounds");
if( calculatedN == 0 ) {
arr[0] = 0L;
arr[1] = 1L;
calculatedN = 1;
}
if( i > calculatedN ) {
for( int x=calculatedN+1; x<=i; x++ ){
arr[x] = arr[x-2] + arr[x-1];
}
calculatedN = i;
}
return arr[i];
}
public static void main (String args[]) {
try {
System.out.println(fib(50)); // O(50-2)
System.out.println(fib(30)); // O(1)
System.out.println(fib(92)); // O(92-50)
System.out.println(fib(92)); // O(1)
} catch ( Exception e ) { e.printStackTrace(); }
}
}
I changed double to long. If you need Fibonacci numbers larger than fib(92), I would change from long to BigInteger.
You can also code it with two recursive calls, but since the same values are calculated over and over again, you can take a dynamic programming approach where you store each value and return it when needed, like this one in C++:
#include <bits/stdc++.h>
using namespace std;
int dp[100];
int fib(int n){
if(n <= 1)
return n;
if(dp[n]!= -1)
return dp[n];
dp[n] = fib(n-1) + fib(n-2);
return dp[n];
}
int main(){
memset(dp,-1,sizeof(dp));
for(int i=1 ;i<10 ;i++)
cout<<fib(i)<<endl;
}
This is only one step away from the non-recursive version:
https://gist.github.com/vividvilla/4641152
In general, this partially recursive approach looks incredibly messy.
As in the title, I want to use the Knuth-Fisher-Yates shuffle algorithm to select N random elements from a List, but without using List.toArray or changing the list. Here is my current code:
public List<E> getNElements(List<E> list, Integer n) {
List<E> rtn = null;
if (list != null && n != null && n > 0) {
int lSize = list.size();
if (lSize > n) {
rtn = new ArrayList<E>(n);
E[] es = (E[]) list.toArray();
//Knuth-Fisher-Yates shuffle algorithm
for (int i = es.length - 1; i > es.length - n - 1; i--) {
int iRand = rand.nextInt(i + 1);
E eRand = es[iRand];
es[iRand] = es[i];
//This is not necessary here as we do not really need the final shuffle result.
//es[i] = eRand;
rtn.add(eRand);
}
} else if (lSize == n) {
rtn = new ArrayList<E>(n);
rtn.addAll(list);
} else {
log("list.size < nSub! ", lSize, n);
}
}
return rtn;
}
It uses list.toArray() to make a new array so the original list is not modified. However, my problem is that my list could be very big, with up to 1 million elements, and then list.toArray() is too slow. My n could range from 1 to 1 million. When n is small (say 2), the function is very inefficient, as it still needs to do list.toArray() for a list of 1 million elements.
Can someone help improve the above code to make it more efficient when dealing with large lists. Thanks.
Here I assume the Knuth-Fisher-Yates shuffle is the best algorithm for the job of selecting n random elements from a list. Am I right? I would be very glad if there are other algorithms better than the Knuth-Fisher-Yates shuffle for this job in terms of speed and the quality of the results (guaranteed real randomness).
Update:
Here are some of my test results when selecting n from 1,000,000 elements.
When n < 1000000/4, the fastest way is to use Daniel Lemire's bitmap function to select n random ids first and then get the elements with those ids:
public List<E> getNElementsBitSet(List<E> list, int n) {
List<E> rtn = new ArrayList<E>(n);
int[] ids = genNBitSet(n, 0, list.size());
for (int i = 0; i < ids.length; i++) {
rtn.add(list.get(ids[i]));
}
return rtn;
}
genNBitSet uses the generateUniformBitmap code from https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/blob/master/2013/08/14/java/UniformDistinct.java
When n > 1000000/4, the reservoir sampling method is faster.
So I have built a function to combine these two methods.
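A minimal sketch of that combining function (the threshold comes from my measurements above; getNElementsReservoir stands in for a helper that wraps the reservoir-sampling approach from the answers below):

public List<E> getNElementsCombined(List<E> list, int n) {
    // threshold taken from the measurements above; worth re-profiling on your own data
    if (n < list.size() / 4) {
        return getNElementsBitSet(list, n);      // bitmap-based id selection (above)
    }
    return getNElementsReservoir(list, n);       // hypothetical reservoir-sampling helper
}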
You are probably looking for something like reservoir sampling.
Start with an initial array with first k elements, and modify it with new elements with decreasing probabilities:
Java-like pseudocode:
E[] r = new E[k]; // not really - you cannot create an array of a generic type - but this is just pseudocode
int i = 0;
for (E e : list) {
    // assign the first k elements:
    if (i < k) { r[i++] = e; continue; }
    // replace an existing element with decreasing probability:
    i++;                   // i = number of elements seen so far, including this one
    int j = random(i);     // a number from 0 to i-1 inclusive
    if (j < k) r[j] = e;   // keep the current element with probability k/i
}
return r;
This requires a single pass on the data, with very cheap ops every iteration, and the space consumption is linear with the required output size.
If n is very small compared to the length of the list, take an empty set of ints and keep adding a random index until the set has the right size.
If n is comparable to the length of the list, do the same, but then return items in the list that don't have indexes in the set.
In the middle ground, you can iterate through the list, and randomly select items based on how many items you've seen, and how many items you've already returned. In pseudo-code, if you want k items from N:
for i = 0 to N-1
if random(N-i) < k
add item[i] to the result
k -= 1
end
end
Here random(x) returns a random number between 0 (inclusive) and x (exclusive).
This produces a uniformly random sample of k elements. You could also consider making an iterator to avoid building the results list to save memory, assuming the list is unchanged as you're iterating over it.
By profiling, you can determine the transition point where it makes sense to switch from the naive set-building method to the iteration method.
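A minimal Java sketch of the middle-ground loop above (names are mine; it assumes java.util.List, ArrayList and Random):

// selection sampling: one pass over the list, keeps each item with probability
// (items still needed) / (items not yet examined)
static <T> List<T> sample(List<T> list, int k, Random rnd) {
    List<T> result = new ArrayList<>(k);
    int remaining = list.size();          // items not yet examined
    for (T item : list) {
        if (rnd.nextInt(remaining) < k) { // keep with probability k / remaining
            result.add(item);
            k--;
        }
        remaining--;
    }
    return result;
}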
Let's assume that you can generate n random indices out of m that are pairwise distinct and then look them up efficiently in the collection. If you don't need the order of the elements to be random, then you can use an algorithm due to Robert Floyd.
Random r = new Random();
Set<Integer> s = new HashSet<Integer>();
for (int j = m - n; j < m; j++) {
int t = r.nextInt(j + 1); // uniform on 0..j inclusive
s.add(s.contains(t) ? j : t);
}
If you do need the order to be random, then you can run Fisher--Yates where, instead of using an array, you use a HashMap that stores only those mappings where the key and the value are distinct. Assuming that hashing is constant time, both of these algorithms are asymptotically optimal (though clearly, if you want to randomly sample most of the array, then there are data structures with better constants).
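A sketch of that map-backed Fisher-Yates, with my own names (it draws n distinct indices out of m in random order without allocating an m-sized array; assumes java.util.Map, HashMap and Random):

// partial Fisher-Yates on a "virtual" array 0..m-1; only displaced slots are stored in the map
static int[] sampleIndices(int m, int n, Random rnd) {
    Map<Integer, Integer> displaced = new HashMap<>();
    int[] result = new int[n];
    for (int i = 0; i < n; i++) {
        int j = i + rnd.nextInt(m - i);             // swap partner in [i, m)
        int vi = displaced.getOrDefault(i, i);      // value currently sitting in slot i
        int vj = displaced.getOrDefault(j, j);      // value currently sitting in slot j
        result[i] = vj;                             // take vj into the sample
        displaced.put(j, vi);                       // and park vi in slot j
    }
    return result;
}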
Just for convenience: an MCVE with an implementation of the reservoir sampling proposed by amit (possible upvotes should go to him; I'm just hacking some code).
It seems like this is indeed an algorithm that nicely covers both the cases where the number of elements to select is low compared to the list size and the cases where it is high compared to the list size (assuming that the properties about the randomness of the result that are stated on the Wikipedia page are correct).
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Random;
import java.util.TreeMap;
public class ReservoirSampling
{
public static void main(String[] args)
{
example();
//test();
}
private static void test()
{
List<String> list = new ArrayList<String>();
list.add("A");
list.add("B");
list.add("C");
list.add("D");
list.add("E");
int size = 2;
int runs = 100000;
Map<String, Integer> counts = new TreeMap<String, Integer>();
for (int i=0; i<runs; i++)
{
List<String> sample = sample(list, size);
String s = createString(sample);
Integer count = counts.get(s);
if (count == null)
{
count = 0;
}
counts.put(s, count+1);
}
for (Entry<String, Integer> entry : counts.entrySet())
{
System.out.println(entry.getKey()+" : "+entry.getValue());
}
}
private static String createString(List<String> list)
{
Collections.sort(list);
StringBuilder sb = new StringBuilder();
for (String s : list)
{
sb.append(s);
}
return sb.toString();
}
private static void example()
{
List<String> list = new ArrayList<String>();
for (int i=0; i<26; i++)
{
list.add(String.valueOf((char)('A'+i)));
}
for (int i=1; i<=26; i++)
{
printExample(list, i);
}
}
private static <T> void printExample(List<T> list, int size)
{
System.out.printf("%3d elements: "+sample(list, size)+"\n", size);
}
private static final Random random = new Random(0);
private static <T> List<T> sample(List<T> list, int size)
{
List<T> result = new ArrayList<T>(Collections.nCopies(size, (T) null));
int i = 0;
for (T element : list)
{
if (i < size)
{
result.set(i, element);
i++;
continue;
}
i++;
int j = random.nextInt(i);
if (j < size)
{
result.set(j, element);
}
}
return result;
}
}
If n is much smaller than the size, you could use this algorithm, which is unfortunately quadratic in n but doesn't depend on the size of the array at all.
Example with size = 100 and n = 4.
Choose a random number from 0 to 99, let's say 42, and add it to the result.
Choose a random number from 0 to 98, let's say 39, and add it to the result.
Choose a random number from 0 to 97, let's say 41; since 41 is greater than or equal to 39, increment it by 1 to get 42, but that is greater than or equal to 42 (already chosen), so increment again to get 43.
...
In short, you choose from the remaining numbers and then compute which number you have actually chosen. I would use a linked list for this, but maybe there are better data structures.
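A rough Java sketch of that idea, using a sorted ArrayList instead of a linked list (quadratic in n, independent of the array size):

// draws n distinct indices from [0, m) by picking a rank among the remaining
// numbers and shifting it past the already-chosen ones
static List<Integer> pickDistinct(int m, int n, Random rnd) {
    List<Integer> chosen = new ArrayList<>();   // kept in ascending order
    for (int i = 0; i < n; i++) {
        int r = rnd.nextInt(m - i);             // rank among the not-yet-chosen numbers
        int pos = 0;
        while (pos < chosen.size() && chosen.get(pos) <= r) {
            r++;                                // skip over an already-chosen value
            pos++;
        }
        chosen.add(pos, r);                     // insert, keeping the list sorted
    }
    return chosen;
}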
Summarizing Changwang's update: if you want more than 250,000 items, use amit's answer. Otherwise, use the Knuth-Fisher-Yates shuffle as shown in its entirety here:
NOTE: The result is always in the original order as well
public static <T> List<T> getNRandomElements(int n, List<T> list) {
List<T> subList = new ArrayList<>(n);
int[] ids = generateUniformBitmap(n, list.size());
for (int id : ids) {
subList.add(list.get(id));
}
return subList;
}
// https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/blob/master/2013/08/14/java/UniformDistinct.java
private static int[] generateUniformBitmap(int num, int max) {
if (num > max) {
DebugUtil.e("Can't generate n ints");
}
int[] ans = new int[num];
if (num == max) {
for (int k = 0; k < num; ++k) {
ans[k] = k;
}
return ans;
}
BitSet bs = new BitSet(max);
int cardinality = 0;
Random random = new Random();
while (cardinality < num) {
int v = random.nextInt(max);
if (!bs.get(v)) {
bs.set(v);
cardinality += 1;
}
}
int pos = 0;
for (int i = bs.nextSetBit(0); i >= 0; i = bs.nextSetBit(i + 1)) {
ans[pos] = i;
pos += 1;
}
return ans;
}
If you want them randomized, I use:
public static <T> List<T> getNRandomShuffledElements(int n, List<T> list) {
List<T> randomElements = getNRandomElements(n, list);
Collections.shuffle(randomElements);
return randomElements;
}
I needed something for this in C#, here's my solution which works on a generic List.
It selects N random elements of the list and places them at the front of the list.
So upon returning, the first N elements of the list are randomly selected. It is fast and efficient even when you're dealing with a very large number of elements.
static void SelectRandom<T>(List<T> list, int n)
{
if (n >= list.Count)
{
// n should be less than list.Count
return;
}
int max = list.Count;
var random = new Random();
for (int i = 0; i < n; i++)
{
int r = random.Next(max);
max = max - 1;
int irand = i + r;
if (i != irand)
{
T rand = list[irand];
list[irand] = list[i];
list[i] = rand;
}
}
}
I am trying to implement a solution to find the k-th largest element in a given integer list with duplicates, with O(N*log(N)) average time complexity in Big-O notation, where N is the number of elements in the list.
As per my understanding, merge sort has an average time complexity of O(N*log(N)); however, in my code below I am actually using an extra for loop along with the merge sort algorithm to delete duplicates, which is definitely violating my rule of finding the k-th largest element in O(N*log(N)). How do I go about achieving O(N*log(N)) average time complexity for my task?
public class FindLargest {
public static void nthLargeNumber(int[] arr, String nthElement) {
mergeSort_srt(arr, 0, arr.length - 1);
// remove duplicate elements logic
int b = 0;
for (int i = 1; i < arr.length; i++) {
if (arr[b] != arr[i]) {
b++;
arr[b] = arr[i];
}
}
int bbb = Integer.parseInt(nthElement) - 1;
// printing second highest number among given list
System.out.println("Second highest number is::" + arr[b - bbb]);
}
public static void mergeSort_srt(int array[], int lo, int n) {
int low = lo;
int high = n;
if (low >= high) {
return;
}
int middle = (low + high) / 2;
mergeSort_srt(array, low, middle);
mergeSort_srt(array, middle + 1, high);
int end_low = middle;
int start_high = middle + 1;
while ((lo <= end_low) && (start_high <= high)) {
if (array[low] < array[start_high]) {
low++;
} else {
int Temp = array[start_high];
for (int k = start_high - 1; k >= low; k--) {
array[k + 1] = array[k];
}
array[low] = Temp;
low++;
end_low++;
start_high++;
}
}
}
public static void main(String... str) {
String nthElement = "2";
int[] intArray = { 1, 9, 5, 7, 2, 5 };
FindLargest.nthLargeNumber(intArray, nthElement);
}
}
Your only problem here is that you don't understand how to do the time analysis. If you have one routine which takes O(n) and one which takes O(n*log(n)), running both takes a total of O(n*log(n)). Thus your code runs in O(n*log(n)) like you want.
To do things formally, we would note that the definition of O() is as follows:
f(x) ∈ O(g(x)) if and only if there exist values c > 0 and y such that f(x) < c*g(x) whenever x > y.
Your merge sort is in O(n*log(n)) which tells us that its running time is bounded above by c1*n*log(n) when n > y1 for some c1,y1. Your duplication elimination is in O(n) which tells us that its running time is bounded above by c2*n when n > y2 for some c2 and y2. Using this, we can know that the total running time of the two is bounded above by c1*n*log(n)+c2*n when n > max(y1,y2). We know that c1*n*log(n)+c2*n < c1*n*log(n)+c2*n*log(n) because log(n) > 1, and this, of course simplifies to (c1+c2)*n*log(n). Thus, we can know that the running time of the two together is bounded above by (c1+c2)*n*log(n) when n > max(y1,y2) and thus, using c1+c2 as our c and max(y1,y2) as our y, we know that the running time of the two together is in O(n*log(n)).
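In compact form: c1*n*log(n) + c2*n <= c1*n*log(n) + c2*n*log(n) = (c1+c2)*n*log(n) whenever n > max(y1,y2) and log(n) > 1, which matches the definition of O(n*log(n)) with c = c1+c2 and y = max(y1,y2).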
Informally, you can just know that faster growing functions always dominate, so if one piece of code is O(n) and the second is O(n^2), the combination is O(n^2). If one is O(log(n)) and the second is O(n), the combination is O(n). If one is O(n^20) and the second is O(n^19.99), the combination is O(n^20). If one is O(n^2000) and the second is O(2^n), the combination is O(2^n).
The problem here is your merge routine, where you have used another loop (I don't understand why); hence I would say your merge is O(n^2), which changes your merge sort's time to O(n^2).
Here is pseudocode for a typical O(N) merge routine:
void merge(int low,int high,int arr[]) {
int buff[high-low+1];
int i = low;
int mid = (low+high)/2;
int j = mid +1;
int k = 0;
while(i<=mid && j<=high) {
if(arr[i]<arr[j]) {
buff[k++] = arr[i];
i++;
}
else {
buff[k++] = arr[j];
j++;
}
}
while(i<=mid) {
buff[k++] = arr[i];
i++;
}
while(j<=high) {
buff[k++] = arr[j];
j++;
}
for(int x=0;x<k;x++) {
arr[low+x] = buff[x];
}
}