Do running times match with O(nlogn)? - java

I have written a class (a greedy strategy). First it calls a sort method, which is O(n log n):
Collections.sort(array, new SortingObjectsWithProbabilityField());
Then it uses the insert method of a binary search tree, which takes O(h), where h is the tree height.
For different values of n, the running times are:
n,running time
17,515428
33,783340
65,540572
129,1285080
257,2052216
513,4299709
which I think is not correct, because the running time should increase as n increases.
This is the method that measures the running time:
int exponent = -1;
for (int n = 2; n < 1000; n += Math.pow(2, exponent)) {
for (int j = 1; j <= 3; j++) {
Random rand = new Random();
for (int i = 0; i < n; i++) {
Element e = new Element(rand.nextInt(100) + 1, rand.nextInt(100) + 1, 0);
for (int k = 0; k < i; k++) {
if (e.getDigit() == randList.get(k).getDigit()) {
e.setDigit(e.getDigit() + 1);
}
}
randList.add(e);
}
double sum = 0.0;
for (int i = 0; i < randList.size(); i++) {
sum += randList.get(i).getProbability();
}
for (Element i : randList) {
i.setProbability(i.getProbability() / sum);
}
//Get time.
long t2 = System.nanoTime();
GreedyVersion greedy = new GreedyVersion((ArrayList<Element>) randList);
long t3 = System.nanoTime();
timeForGreedy = timeForGreedy + t3 - t2;
}
System.out.println(n + "," + timeForGreedy / 3);
exponent++;
}
thanks

Your data appears to roughly fit O(n log n), as we can see below. Notice that the curve is almost linear: log n grows so slowly that it stays small even for large n. For example, for your largest value of n=513, log2(n) is only 9.003.
There are ways to achieve more accurate timings, which would likely make the curve fit the data points better. Take a larger sample of random inputs (I'd advise at least 10, 100 if possible) and run multiple iterations per dataset (5 is an acceptable number) to smooth out the inaccuracies of the timer. You can use a single start/stop timer to time all iterations for the same n, and then divide by the number of runs, to get more accurate data points. Just be sure to first generate all data sets, store them all, and then run them all.
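A minimal sketch of that measurement procedure follows. It assumes a hypothetical `workload` method standing in for `GreedyVersion` (here it just sorts a copy of the list), and the class and method names are made up for illustration. The discarded warm-up pass is an addition of this sketch, not part of the advice above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class TimingSketch {
    // Hypothetical stand-in for the algorithm under test (the question uses GreedyVersion).
    static void workload(List<Integer> data) {
        Collections.sort(new ArrayList<>(data)); // sort a copy so every run sees unsorted input
    }

    // Generates `samples` random datasets of size n up front, so generation is never timed.
    static List<List<Integer>> makeDatasets(int n, int samples, long seed) {
        Random rand = new Random(seed);
        List<List<Integer>> datasets = new ArrayList<>();
        for (int s = 0; s < samples; s++) {
            List<Integer> data = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                data.add(rand.nextInt(100) + 1);
            }
            datasets.add(data);
        }
        return datasets;
    }

    // Average nanoseconds per run, using one start/stop timer around all runs.
    static double averageNanos(List<List<Integer>> datasets, int iterations) {
        long start = System.nanoTime();
        for (List<Integer> data : datasets) {
            for (int it = 0; it < iterations; it++) {
                workload(data);
            }
        }
        long elapsed = System.nanoTime() - start;
        return (double) elapsed / ((long) datasets.size() * iterations);
    }

    public static void main(String[] args) {
        for (int n = 16; n <= 512; n *= 2) {
            List<List<Integer>> datasets = makeDatasets(n, 10, 42L);
            averageNanos(datasets, 5); // warm-up pass; its result is discarded
            System.out.println(n + "," + averageNanos(datasets, 5));
        }
    }
}
```

Replacing the body of `workload` with the real construction call would time the actual algorithm under the same conditions.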
Good choice to sample n at powers of 2. You might just want to subtract 1 to make them exact powers of 2, not that it makes any real difference.
For reference, here's the gnuplot script used to generate the plot:
set terminal png
set output 'graph.png'
set xrange [0:5000000]
set yrange [0:600]
f1(x) = a1*x*log(x)/log(2)
a1 = 1000
plot 'time.dat' title 'Actual runtimes', \
a1*x*log(x)/log(2) title 'Fitted curve: O(nlogn)'
fit f1(x) 'time.dat' via a1

It's not that easy to relate asymptotic complexity to running times. When the sample is so small there are lots of things that will affect your timing.
To get more accurate timings you should run your algorithm K times per instance (e.g. K times with 17, K times with 33 and so forth, with K=100 for example) and take the average time as the sample point.
That said, it looks about right. You can plot n log(n) against your timings and you'll see that, despite the different scales, they grow similarly. Still, there are too few sample points to be sure.
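One way to make that comparison without a plot is to divide each timing by n log2(n): if the running time grows like n log n, the quotient should stay within a roughly constant band. The sketch below uses the data points from the question; the class and method names are made up for illustration, and with so few noisy samples the quotient can still jump around.

```java
class GrowthRatio {
    // Measured time divided by n*log2(n); roughly constant if the timings grow like n log n.
    static double ratio(int n, long nanos) {
        double log2n = Math.log(n) / Math.log(2);
        return nanos / (n * log2n);
    }

    public static void main(String[] args) {
        // Data points copied from the question.
        int[] sizes = {17, 33, 65, 129, 257, 513};
        long[] times = {515428, 783340, 540572, 1285080, 2052216, 4299709};
        for (int i = 0; i < sizes.length; i++) {
            System.out.printf("n=%d  t/(n log2 n)=%.1f%n", sizes[i], ratio(sizes[i], times[i]));
        }
    }
}
```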

Related

What is the specific runtime complexity of insertion sort?

I'm just going over some basic sorting algorithms. I implemented the insertion sort below.
public static int[] insertionSort(int[] arr){
int I = 0;
for(int i = 0; i < arr.length; i++){
for(int j = 0; j < i; j++){
if(arr[i] < arr[j]){
int temp = arr[i];
arr[i] = arr[j];
arr[j] = temp;
}
I++;
}
}
System.out.println(I);
return arr;
}
I prints 4950 for an array of 100 randomly generated integers.
I know the algorithm is considered O(n^2), but what would be the more arithmetically correct runtime? If it were actually O(n^2), I'm assuming it would print 10,000 and not 4950.
Big-Oh notation gives us how much work an algorithm must do as the input size grows bigger. A single input test doesn't give enough information to verify the theoretical Big-Oh. You should run the algorithm on arrays of different sizes from 100 to a million and graph the output with the size of the array as the x-variable and the number of steps that your code outputs as the y-variable. When you do this, you will see that the graph is a parabola.
You can use algebra to get a function of the form y = a*x^2 + b*x + c that fits this data as closely as possible. But with Big-Oh notation, we don't care about the smaller terms because they become insignificant compared to the x^2 part. For example, when x = 10^3, then x^2 = 10^6, which is much larger than b*x + c. If x = 10^6 then x^2 = 10^12, which again is so much larger than b*x + c that we can ignore these smaller terms.
You can make the following observations: On the ith iteration of the outer loop, the inner loop runs i times, for i from 0 to n-1 where n is the length of the array.
In total over the entire algorithm the inner loop runs T(n) times where
T(n) = 0 + 1 + 2 + ... + (n-1)
This is an arithmetic series and it's easy to prove the sum is equal to a second degree polynomial on n:
T(n) = n*(n-1)/2 = .5*n^2 - .5*n
For n = 100, the formula predicts the inner loop will run T(100) = 100*99/2 = 4950 times which matches what you calculated.
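The formula can also be checked empirically. This is a small sketch, with made-up class and method names, that counts the inner-loop iterations of the same loop structure and compares the count with n*(n-1)/2.

```java
class InnerLoopCount {
    // Counts how many times the inner loop body runs for an array of length n.
    static long countIterations(int n) {
        long count = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < i; j++) {
                count++;
            }
        }
        return count;
    }

    // Closed form of 0 + 1 + ... + (n-1).
    static long formula(int n) {
        return (long) n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            System.out.println(n + ": counted=" + countIterations(n) + ", formula=" + formula(n));
        }
    }
}
```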

How to handle kmeans when a cluster has zero elements in it

I'm trying to implement KMeans in Java and have encountered a case that throws all of my results out. This happens when, given some randomly chosen initialized centroids, the data gets into a state where one of the centroids doesn't actually define a cluster. For example, if K=3, it could be that 2 of the centroids are closer to all of the data points, in which case during that iteration, I will only have 2 clusters instead of 3.
As I understand KMeans though, when we reset the centroids we need to sum up all of the data points per cluster and divide by the size of the cluster (to get the average). So, this means that we would have a cluster of size 0 and would get our new centroid to be
[0/0, 0/0, ... 0/0]
I have 2 questions about handling this case:
(1) How would we possibly recover from this if we've lost one of our clusters?
(2) Is there some way to account for the division by 0?
The code I have for this logic is as follows:
// do the sums
for (int i = 0; i < numDocuments; i++) {
int value = label[i]; // get the document's label (i.e. 0, 1, 2)
for (int j = 0; j < numWords; j++) {
tempCentroids[value][j] += data[i][j];
}
tally[value]++;
}
// get the average
for (int i = 0; i < k; i++) {
for (int j = 0; j < numWords; j++) {
tempCentroids[i][j] /= (double) tally[i]; // could have division by zero
System.out.println("tally[i] for centroid " + i + " is " + tally[i]);
}
}
Thanks in advance,
“For example, if K=3, it could be that 2 of the centroids are closer to all of the data points, in which case during that iteration, I will only have 2 clusters instead of 3”
I think you can always keep the centroid you chose for the third cluster to be in the third cluster and not in some other cluster. That way, you maintain the number of clusters and you don’t run into the weird case you mentioned. (I am assuming you chose the random centroids to be actual K data points from your dataset)
You might also want to look at the k-means++ algorithm, which is the same as k-means except for the step that initializes the cluster centers. This will (probably) lead to better clusterings.
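For question (2), one common guard, which is an assumption of this sketch and not something stated in the answer above, is to leave a centroid where it was whenever its cluster is empty. The variable names mirror the question's code; `updateCentroids` and `oldCentroids` are hypothetical names.

```java
class CentroidUpdate {
    // Recomputes centroids as cluster means. If a cluster is empty (tally == 0),
    // its previous centroid is kept instead of dividing by zero.
    static double[][] updateCentroids(double[][] data, int[] label, double[][] oldCentroids) {
        int k = oldCentroids.length;
        int numWords = oldCentroids[0].length;
        double[][] sums = new double[k][numWords];
        int[] tally = new int[k];

        // Sum the data points per cluster and count the members.
        for (int i = 0; i < data.length; i++) {
            int value = label[i];
            for (int j = 0; j < numWords; j++) {
                sums[value][j] += data[i][j];
            }
            tally[value]++;
        }

        // Average, falling back to the old centroid for empty clusters.
        double[][] result = new double[k][numWords];
        for (int c = 0; c < k; c++) {
            for (int j = 0; j < numWords; j++) {
                result[c][j] = (tally[c] == 0) ? oldCentroids[c][j] : sums[c][j] / tally[c];
            }
        }
        return result;
    }
}
```

Another option would be to re-seed the empty cluster from a data point; which choice works better depends on the data.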

Time complexity on iterative and recursive solution

I'm trying to solve the following problem:
I feel like I've given it a lot of thought and tried a lot of things. I managed to solve it and produce correct values, but the problem is that it isn't time-efficient enough. It passes 2 of the Kattis tests and fails on the third because the 1-second time limit was exceeded. I'm afraid there is no way for me to see what input they tested with.
I started out with a recursive solution and finished that. But then I realised that it wasn't time efficient enough so I instead tried to switch to an iterative solution.
I start with reading input and add those to an ArrayList. And then I call the following method with target as 1000.
public static int getCorrectWeight(List<Integer> platesArr, int target) {
/* Creates two lists, one for storing completed values after each iteration,
one for storing new values during iteration. */
List<Integer> vals = new ArrayList<>();
List<Integer> newVals = new ArrayList<>();
// Inserts 0 as a first value so that we can start the first iteration.
int best = 0;
vals.add(best);
for(int i=0; i < platesArr.size(); i++) {
for(int j=0; j < vals.size(); j++) {
int newVal = vals.get(j) + platesArr.get(i);
if (newVal <= target) {
newVals.add(newVal);
if (newVal > best) {
best = newVal;
}
} else if ((Math.abs(target-newVal) < Math.abs(target-best)) || (Math.abs(target-newVal) == Math.abs(target-best) && newVal > best)) {
best = newVal;
}
}
vals.addAll(newVals);
}
return best;
}
My question is, is there some way that I can reduce the time complexity on this one for large number of data?
The main problem is that the size of vals and newVals can grow very quickly, as each iteration can double their size. You only need to store 1000 or so values which should be manageable. You're limiting the values but because they're stored in an ArrayList, it ends up with a lot of duplicate values.
If instead, you used a HashSet, then it should help the efficiency a lot.
You only need to store a DP table of size 2001 (0 to 2000)
Let dp[i] indicate whether it is possible to form a total of i kg from the weights. If a sum goes past the array bounds, ignore it.
For example:
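int[] dp = new int[2001]; // every entry starts at 0
dp[0] = 1;
for (int i = 0; i < values.size(); i++) {
    int w = values.get(i); // values is assumed to be a List<Integer>
    for (int j = 2000; j >= w; j--) {
        dp[j] = Math.max(dp[j], dp[j - w]);
    }
}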
Here, values is where all the original weights are stored. All values of dp are to be set to 0 except for dp[0].
Then, check 1000 if it is possible to make it. If not, check 999 and 1001 and so on.
This takes O(1000n + 2000) time. Since n is at most 1000, it should finish within the time limit.
By the way, this is a modified knapsack algorithm, you might want to look up some other variants.
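Putting the table and the final search together, a self-contained sketch might look like this. It uses a `boolean[]` instead of the 0/1 `int` table, assumes positive integer weights, and breaks ties in favour of the larger total as the question's code does. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

class ClosestWeight {
    // Returns the achievable total closest to target; ties go to the larger total.
    // Assumes every weight is a positive integer and target >= 0.
    static int closest(List<Integer> values, int target) {
        int limit = 2 * target;                 // larger totals are farther away than the empty set
        boolean[] dp = new boolean[limit + 1];  // dp[s] is true if some subset sums to s
        dp[0] = true;
        for (int w : values) {
            for (int j = limit; j >= w; j--) {  // iterate downwards so each weight is used at most once
                if (dp[j - w]) {
                    dp[j] = true;
                }
            }
        }
        for (int d = 0; d <= target; d++) {
            if (dp[target + d]) return target + d; // check the larger candidate first
            if (dp[target - d]) return target - d;
        }
        return 0; // not reached: dp[0] is always true
    }

    public static void main(String[] args) {
        System.out.println(closest(Arrays.asList(900, 500, 498, 4), 1000));
    }
}
```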
If you think too generally about this type of problem, you may think you have to check all possible combinations of input (each weight can be included or excluded), giving you 2^n combinations to test if you have n inputs. This is, however, rather beside the point. Rather, the key here is that all weights are integers, and that the goal is 1000.
Let's examine corner cases first, because that limits the search space.
If all weights are >= 1000, pick the smallest.
If there is at least one weight < 1000, that is always better than any weight >= 2000, so you can ignore any weight >= 1000 for combination purposes.
Then, apply dynamic programming. Keep a set (you got HashSet as suggestion from other poster, but BitSet is even better since the maximum value in it is so small) of all combinations of the first k inputs, and increase k by combining all previous solutions with the k+1'th input.
When you have considered all possibilities, just search the bit vector for the best response.
static int count() {
int[] weights = new int[]{900, 500, 498, 4};
// Check for corner case to limit search later
int min = Integer.MAX_VALUE;
for (int weight : weights) min = Math.min(min, weight);
if (min >= 1000) {
return min;
}
// Get all interesting combinations
BitSet combos = new BitSet();
for (int weight : weights) {
if (weight < 1000) {
for (int t = combos.previousSetBit(2000 - weight) ; t >= 0; t = combos.previousSetBit(t-1)) {
combos.set(weight + t);
}
combos.set(weight);
}
}
// Pick best combo
for (int distance = 0; distance <= 1000; distance++) {
if (combos.get(1000 + distance)) {
return 1000 + distance;
}
if (combos.get(1000 - distance)) {
return 1000 - distance;
}
}
return 0;
}

Sum of all prime numbers below 2 million

Problem 10 from Project Euler:
The program runs for smaller numbers and slows to a crawl in the hundred thousands.
At 2 million, an answer fails to show up even though the program seems like it is still running.
I'm trying to implement the Sieve of Eratosthenes. It is supposed to be very fast. What's wrong with my approach?
import java.util.ArrayList;
public class p010
{
/**
* The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17
* Find the sum of all the primes below two million.
* @param args
*/
public static void main(String[] args)
{
ArrayList<Integer> primes = new ArrayList<Integer>();
int upper = 2000000;
for (int i = 2; i < upper; i++)
{
primes.add(i);
}
int sum = 0;
for (int i = 0; i < primes.size(); i++)
{
if (isPrime(primes.get(i)))
{
for (int k = 2; k*primes.get(i) < upper; k++)
{
if (primes.contains(k*primes.get(i)))
{
primes.remove(primes.indexOf(k*primes.get(i)));
}
}
}
}
for (int i = 0; i < primes.size(); i++)
{
sum += primes.get(i);
}
System.out.println(sum);
}
public static boolean isPrime(int number)
{
boolean returnVal = true;
for (int i = 2; i <= Math.sqrt(number); i ++)
{
if (number % i == 0)
{
returnVal = false;
}
}
return returnVal;
}
}
You appear to be trying to implement the Sieve of Eratosthenes, which should perform better than O(N^2) (in fact, Wikipedia says it is O(N log(log N))).
The fundamental problem is your choice of data structure. You've chosen to represent the set of remaining prime candidates as an ArrayList of primes. This means that your test to see if a number is still in the set takes O(N) comparisons ... where N is the number of remaining primes. Then you are using ArrayList.remove(int) to remove the non-primes ... which is O(N) also.
That all adds up to making your Sieve implementation worse than O(N^2).
The solution is to replace the ArrayList<Integer> with a boolean[] where the positions (indexes) in the boolean array represent the numbers, and the value of the boolean says whether the number is prime / possibly prime, or not prime.
(There were other problems too that I didn't notice ... see the other answers.)
There are a few issues here. First, let's talk about the algorithm. Your isPrime method is actually the very thing that the sieve is designed to avoid. When you get to a number in the sieve, you already know it's prime, so you don't need to test it. If it weren't prime, it would already have been eliminated as a multiple of a smaller number.
So, point 1:
You can eliminate the isPrime method altogether. It should never return false.
Then, there are implementation issues. primes.contains and primes.remove are problems. They run in linear time on an ArrayList, because they require checking each element or rewriting a large portion of the backing array.
Point 2:
Either mark values in place (use boolean[]), or use some other more appropriate data structure.
I typically use something like boolean[] primes = new boolean[upper+1], and define n to be included if !(primes[n]). (I just ignore elements 0 and 1 so I don't have to subtract indices.) To "remove" an element, I set it to true. You could also use something like TreeSet<Integer>, I suppose. Using boolean[], the method is near-instantaneous.
Point 3:
sum needs to be a long. The answer (roughly 1.429e11) is larger than the maximum value of an integer (2^31-1)
I can post working code if you like, but here's a test output, without spoilers:
public static void main(String[] args) {
long value;
long start;
long finish;
start = System.nanoTime();
value = arrayMethod(2000000);
finish = System.nanoTime();
System.out.printf("Value: %.3e, time: %4d ms\n", (double)value, (finish-start)/1000000);
start = System.nanoTime();
value = treeMethod(2000000);
finish = System.nanoTime();
System.out.printf("Value: %.3e, time: %4d ms\n", (double)value, (finish-start)/1000000);
}
output:
Using boolean[]
Value: 1.429e+11, time: 17 ms
Using TreeSet<Integer>
Value: 1.429e+11, time: 4869 ms
Edit:
Since spoilers are posted, here's my code:
public static long arrayMethod(int upper) {
boolean[] primes = new boolean[upper+1];
long sum = 0;
for (int i = 2; i <=upper; i++) {
if (!primes[i]) {
sum += i;
for (int k = 2*i; k <= upper; k+=i) {
primes[k] = true;
}
}
}
return sum;
}
public static long treeMethod(int upper) {
TreeSet<Integer> primes = new TreeSet<Integer>();
for (int i = 2; i <= upper; i++) {
primes.add(i);
}
long sum = 0;
for (Integer i = 2; i != null; i=primes.higher(i)) {
sum += i;
for (int k = 2*i; k <= upper; k+=i) {
primes.remove(k);
}
}
return sum;
}
Two things:
Your code is hard to follow. You have a list called "primes", that contains non prime numbers!
Also, you should strongly consider whether or not an array list is appropriate. In this case, a LinkedList would be much more efficient.
Why is this? An array list must constantly resize an array by: asking for new memory to create an array, then copying the old memory over in the newly created array. A Linked list would just resize the memory by changing a pointer. This is a lot quicker! However, I do not think that by making this change you can salvage your algorithm.
You should use an array list if you need to access the items non-sequentially, here, (with a suitable algorithm) you need to access the items sequentially.
Also, your algorithm is slow. Take the advice of SJuan76 (or gyrogearless). Thanks, SJuan76.
The key to the efficiency of classic implementation of the sieve of Eratosthenes on modern CPUs is the direct (i.e. non-sequential) memory access. Fortunately, ArrayList<E> does implement RandomAccess.
Another key to the sieve's efficiency is its conflation of index and value, just like in integer sorting. Actually removing any number from the sequence destroys this ability to directly address without any computations. We must mark, not remove, any composite as we find them, so any numbers greater than it will remain in their places in the sequence.
ArrayList<Integer> can be used for that (except taking more memory than is strictly necessary, but for 2 million this is inconsequential).
So your code with a minimal edit fix (also changing sum to be long as others point out too), becomes
import java.util.ArrayList;
public class Main
{
/**
* The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17
* Find the sum of all the primes below two million.
* @param args
*/
public static void main(String[] args)
{
ArrayList<Integer> primes = new ArrayList<Integer>();
int upper = 5000;
primes.ensureCapacity(upper);
for (int i = 0; i < upper; i++) {
primes.add(i);
}
long sum = 0;
for (int i = 2; i <= upper / i; i++) {
if ( primes.get(i) > 0 ) {
for (int k = i*i; k < upper ; k+=i) {
primes.set(k, 0);
}
}
}
for (int i = 2; i < upper; i++) {
sum += primes.get(i);
}
System.out.println(sum);
}
}
Finds the result for 2000000 in half a second on Ideone. The projected run time for your original code there: between 10 and 400 hours (!).
To find rough estimates for the run time when faced with a slow code, you should always try to find out its empirical orders of growth: run it for some small size n1, then a bigger size n2, record the run times t1 and t2. If t ~ n^a, then a = log(t2/t1) / log(n2/n1).
For your original code the empirical orders of growth measured on 10k .. 20k .. 40k range of upper limit value N, are ~ N^1.7 .. N^1.9 .. N^2.1. For the fixed code it's faster than ~ N (in fact, it's ~ N^0.9 in the tested range 0.5 mln .. 1 mln .. 2 mln). The theoretical complexity is O(N log (log N)).
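The estimate a = log(t2/t1) / log(n2/n1) is a one-line computation. A small sketch follows, with made-up class and method names and hypothetical measurements in `main`.

```java
class GrowthOrder {
    // Estimates a in t ~ n^a from two measurements (n1, t1) and (n2, t2).
    static double exponent(double n1, double t1, double n2, double t2) {
        return Math.log(t2 / t1) / Math.log(n2 / n1);
    }

    public static void main(String[] args) {
        // Hypothetical measurements: doubling n quadruples the time, so a comes out near 2.
        System.out.println(exponent(10_000, 1.5, 20_000, 6.0));
    }
}
```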
Your program is not the Sieve of Eratosthenes; the modulo operator gives it away. Your program will be O(n^2), where a proper Sieve of Eratosthenes is O(n log log n), which is essentially n. Here's my version; I'll leave it to you to translate to Java with appropriate numeric datatypes:
function sumPrimes(n)
sum := 0
sieve := makeArray(2..n, True)
for p from 2 to n step 1
if sieve[p]
sum := sum + p
for i from p * p to n step p
sieve[i] := False
return sum
If you're interested in programming with prime numbers, I modestly recommend this essay at my blog.

How to get a 50/50 chance in random generator

I am trying to get a 50/50 chance of get either 1 or 2 in a random generator.
For example:
Random random = new Random();
int num = random.nextInt(2)+1;
This code will output either a 1 or 2.
Let's say I run it in a loop:
for ( int i = 0; i < 100; i++ ) {
int num = random.nextInt(2)+1 ;
}
How can I make the generator make an equal number for 1 and 2 in this case?
So I want this loop to generate 50 times of number 1 and 50 times of number 2.
One way: fill an ArrayList<Integer> with fifty 1's and fifty 2's and then call Collections.shuffle(...) on it.
50/50 is quite easy with Random.nextBoolean()
private final Random random = new Random();
private int next() {
if (random.nextBoolean()) {
return 1;
} else {
return 2;
}
}
Test Run:
final ListMultimap<Integer, Integer> histogram = LinkedListMultimap.create(2);
for (int i = 0; i < 10000; i++) {
final Integer result = Integer.valueOf(next());
histogram.put(result, result);
}
for (final Integer key : histogram.keySet()) {
System.out.println(key + ": " + histogram.get(key).size());
}
Result:
1: 5056
2: 4944
You can't achieve this with random. If you need exactly 50 1s and 50 2s, you should try something like this:
int[] array = new int[100];
for (int i = 0; i < 50; ++i)
array[i] = 1;
for (int i = 50; i < 100; ++i)
array[i] = 2;
shuffle(array); // implement shuffling algorithm or use an already existing one
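A minimal runnable version of this idea, using a list and the standard `Collections.shuffle` in place of the unspecified `shuffle(array)` helper, might look like the following. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ExactHalves {
    // Builds a list with exactly total/2 ones and total/2 twos in random order.
    // Assumes total is even.
    static List<Integer> generate(int total) {
        List<Integer> values = new ArrayList<>(total);
        for (int i = 0; i < total / 2; i++) {
            values.add(1);
            values.add(2);
        }
        Collections.shuffle(values);
        return values;
    }

    public static void main(String[] args) {
        List<Integer> values = generate(100);
        System.out.println(Collections.frequency(values, 1) + " ones, "
                + Collections.frequency(values, 2) + " twos");
    }
}
```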
EDIT:
I understand that if you are looking for exactly 50-50 results, then my answer was not accurate. You should use a pre-filled collection, since it is impossible to guarantee that using any kind of randomness. That said, my answer is still valid for the title of the question, so here it is:
Well, you do not need the Random generator to do this.
Coming from JavaScript, I would go with a one-liner:
return Math.random() > 0.5 ? 1: 2;
Explanation: Math.random() returns a number between 0 (inclusive) and 1 (exclusive), so we just check whether it is larger than 0.5 (the middle value). In theory there is a 50% chance that it is.
For more generic use, you can just replace 1 : 2 with true : false.
You can adjust the probability along the way so that the probability of getting a one decreases as you get more ones. This way you don't always have a 50% chance of getting a one, but you can get the result you expected (exactly 50 ones):
int onesLeft = 50;
for(int i=0;i<100;i++) {
int totalLeft = 100 - i;
// we need a probability of onesLeft out of (totalLeft)
int r = random.nextInt(totalLeft);
int num;
if(r < onesLeft) {
num = 1;
onesLeft --;
} else {
num = 2;
}
}
This has an advantage over shuffling because it generates numbers incrementally, so it doesn't need memory to store the numbers.
You have already successfully created a random generator that returns 1 or 2 with equal probability.
As (many) others have mentioned, your next request, to force an exact 50/50 distribution in 100 trials, does not fall in line with random number generation. As shown in https://math.stackexchange.com/questions/12348/probability-of-getting-50-heads-from-tossing-a-coin-100-times, the probability of that occurring is only around 8%. So even though you might expect 50 of each, that exact outcome is actually rather rare.
The Law of Large Numbers states that you should close in on expected value as your number of trials increases.
So for your actual question: How can I make the generator make an equal number for 1 and 2 in this case?
The best (humorous) answer I can come up with is: "Run it in an infinite loop."
