Sample without replacement in Java with probabilities

I have a list of 10 probabilities (assume these are sorted in descending order): <p1, p2, ..., p10>. I want to sample (without replacement) 10 elements such that the probability of selecting i-th index is p_i.
Is there a ready-to-use Java method in a common library (e.g. Random) that I could use to do that?
Example: 5-element list: <0.4,0.3,0.2,0.1,0.0>
Select 5 indexes (no duplicates) such that their probability of selection is given by the probability at that index in the list above. So index 0 would be selected with probability 0.4, index 1 selected with prob 0.3 and so on.
I have written my own method to do that but feel that an existing method would be better to use. If you are aware of such a method, please let me know.

This is how this is typically done:
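static int sample(double[] pdf) {
// Transform your probabilities into a cumulative distribution
double[] cdf = new double[pdf.length];
cdf[0] = pdf[0];
for(int i = 1; i < pdf.length; i++)
cdf[i] = pdf[i] + cdf[i-1];
// Let r be a uniform random value in [0, 1)
// (random is a shared java.util.Random, as declared in the test class below)
double r = random.nextDouble();
// Search the bin corresponding to that quantile;
// Arrays.binarySearch returns -(insertion point) - 1 when r is not found exactly
int k = Arrays.binarySearch(cdf, r);
k = k >= 0 ? k : (-k-1);
// Rounding can leave the last cdf entry slightly below 1.0, so clamp k
return Math.min(k, pdf.length - 1);
}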
If you want to return a probability do:
return pdf[k];
EDIT: I just noticed that the title says sampling without replacement. That is not so trivial to do fast (I can give you some code I have for that). In that case, though, the question as stated does not make sense: you cannot sample without replacement from a probability distribution alone. You need absolute frequencies.
For example, if I tell you that I have a box filled with orange and blue balls in proportions of 20% and 80%, but not how many balls of each color there are in absolute terms, I cannot tell you how many balls you will have after a few turns.
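One common reading of "without replacement" for weights like these is sequential: draw one index with probability proportional to its weight, remove it, and draw again from what is left. The sketch below does exactly that; the class and method names are illustrative, not from any library, and it assumes non-negative weights. It also illustrates the point above: only the first draw follows p_i exactly, because later draws are renormalized over the remaining indices, so the overall selection probabilities are not p_i.

```java
import java.util.Arrays;
import java.util.Random;

public class WeightedSampling {

    // Sequential weighted sampling without replacement (illustrative sketch).
    // Each draw picks a not-yet-chosen index i with probability
    // weights[i] / (sum of the remaining positive weights).
    // Indices with weight 0 are never picked, so the result can be shorter than k.
    static int[] sampleWithoutReplacement(double[] weights, int k, Random random) {
        double[] remaining = weights.clone(); // chosen entries are set to 0
        int[] result = new int[k];
        int count = 0;
        while (count < k) {
            double total = 0;
            for (double w : remaining) {
                if (w > 0) total += w;
            }
            if (total <= 0) break;                // nothing with positive weight left
            double r = random.nextDouble() * total;
            int pick = -1;
            for (int i = 0; i < remaining.length; i++) {
                if (remaining[i] <= 0) continue;
                pick = i;                         // last positive index doubles as rounding fallback
                if (r < remaining[i]) break;
                r -= remaining[i];
            }
            result[count++] = pick;
            remaining[pick] = 0;                  // remove it from later draws
        }
        return Arrays.copyOf(result, count);
    }

    public static void main(String[] args) {
        double[] p = {0.4, 0.3, 0.2, 0.1, 0.0};
        int[] order = sampleWithoutReplacement(p, 5, new Random());
        // prints some ordering of 0, 1, 2, 3 (index 4 has weight 0 and is never drawn)
        System.out.println(Arrays.toString(order));
    }
}
```

Recomputing the remaining total on every draw costs O(n) per draw, which is fine for 10 elements and avoids accumulating floating-point error from repeated subtraction.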
EDIT2: A faster version. This is not how it is typically done, but I found this suggestion on the web and have used it in projects of mine as well.
static int sample(double[] pdf) {
double r = random.nextDouble();
for(int i = 0; i < pdf.length; i++) {
if(r < pdf[i])
return i;
r -= pdf[i];
}
return pdf.length-1; // should not happen
}
To test this:
// javac Test.java && java Test
import java.util.Arrays;
import java.util.Random;
class Test
{
static Random random = new Random();
public static int sample(double[] pdf) {
... // body of either version of sample() above
}
public static void main(String[] args) {
double[] pdf = new double[] { 0.3, 0.4, 0.2, 0.1 };
int[] counts = new int[pdf.length];
final int tests = 1000000;
for(int i = 0; i < tests; i++)
counts[sample(pdf)]++;
for(int i = 0; i < counts.length; i++)
System.out.println(counts[i] / (double)tests);
}
}
You can see we get output very similar to the PDF that was used:
0.3001356
0.399643
0.2001143
0.1001071
These are the times I get when running each version:
1st version: 0m0.680s
2nd version: 0m0.296s

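The method below returns m distinct indices chosen uniformly at random from 0..n-1 (a partial Fisher-Yates shuffle). Use sample[i] as an index into your values array.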
public static int[] WithoutReplacement(int m, int n) {
int[] perm = new int[n];
for (int i = 0; i < n; i++) {
perm[i] = i;
}
//take sample
for (int i = 0; i < m; i++) {
int r = i + (int) (Math.random() * (n - i)); // pick from the not-yet-chosen part perm[i..n-1]
int tmp = perm[i];
perm[i] = perm[r];
perm[r] = tmp;
}
int[] sample = new int[m];
for (int i = 0; i < m; i++) {
sample[i] = perm[i];
}
return sample;
}

Related

How to reduce the number of iterations in a program where number of elements in the test case is very large?

Please refer to this problem from HackerRank:
HackerLand National Bank has a simple policy for warning clients about possible fraudulent account activity. If the amount spent by a client on a particular day is greater than or equal to the client's median spending for a trailing number of days, they send the client a notification about potential fraud. The bank doesn't send the client any notifications until they have at least that trailing number of prior days' transaction data.
I have written the following code. However, it passes some of the test cases and fails others with 'terminated due to timeout'. Can anyone tell me how I can improve the code?
import java.io.*;
import java.math.*;
import java.security.*;
import java.text.*;
import java.util.*;
import java.util.concurrent.*;
import java.util.regex.*;
public class Solution {
// Complete the activityNotifications function below.
static int activityNotifications(int[] expenditure, int d) {
//Declaring variables
int iterations,itr,length,median,midDummy,midL,midR, midDummy2,i,i1,temp,count;
float mid,p,q;
length = expenditure.length;
iterations = length-d;
i=0;
i1=0;
itr=0;
count = 0;
int[] exSub = new int[d];
while(iterations>0)
{
// Enter the elements in the subarray
while(i1<d)
{
exSub[i1]=expenditure[i+i1];
//System.out.println(exSub[i1]);
i1++;
}
//Sort the exSub array
for(int k=0; k<(d-1); k++)
{
for(int j=k+1; j<d; j++)
{
if(exSub[j]<exSub[k])
{
temp = exSub[j];
exSub[j] = exSub[k];
exSub[k] = temp;
}
}
}
//Printing the exSub array in each iteration
for(int l = 0 ; l<d ; l++)
{
System.out.println(exSub[l]);
}
i1=0;
//For each iteration calculate the median
if(d%2 == 0) // even
{
midDummy = d/2;
p= (float)exSub[midDummy];
q= (float)exSub[midDummy-1];
mid = (p+q)/2;
//mid = (exSub[midDummy]+exSub [midDummy-1])/2;
//System.out.println(midDummy);
}
else // odd
{
midDummy2 =d/2;
mid=exSub[midDummy2];
//System.out.println(midDummy2);
}
if(expenditure[itr+d]>=2*mid)
{
count++;
}
itr++;
i++;
iterations--;
System.out.println("Mid:"+mid);
System.out.println("---------");
}
System.out.println("Count:"+count);
return count;
}
private static final Scanner scanner = new Scanner(System.in);
public static void main(String[] args) throws IOException {
BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter(System.getenv("OUTPUT_PATH")));
String[] nd = scanner.nextLine().split(" ");
int n = Integer.parseInt(nd[0]);
int d = Integer.parseInt(nd[1]);
int[] expenditure = new int[n];
String[] expenditureItems = scanner.nextLine().split(" ");
scanner.skip("(\r\n|[\n\r\u2028\u2029\u0085])?");
for (int i = 0; i < n; i++) {
int expenditureItem = Integer.parseInt(expenditureItems[i]);
expenditure[i] = expenditureItem;
}
int result = activityNotifications(expenditure, d);
bufferedWriter.write(String.valueOf(result));
bufferedWriter.newLine();
bufferedWriter.close();
scanner.close();
}
}

The first rule on performance improvement is: Don't improve the performance if it's not needed.
Performance improvements usually lead to code that is less readable and therefore it should only be done when it's really needed.
The second rule is: Improve algorithms and data-structures before low-level improvements.
If you need to improve the performance of your code, always try to use more efficient algorithms and data structures before going to low-level improvements. In your code example that would be: don't use bubble sort, but a more efficient algorithm like Quicksort or Mergesort, because they run in O(n*log(n)) time while bubble sort runs in O(n^2), which is much slower when you have to sort big arrays. You can use Arrays.sort(int[]) to do this.
Your data-structures are only arrays so this can't be improved in your code.
This will give your code quite a speedup and will not lead to code that can't be read anymore. Improvements like replacing simple calculations with slightly faster ones using bit shifts and other tricks (which are pretty hard to understand if used too often) will almost always lead to code that is only slightly faster, but that no one will be able to understand easily anymore.
Some improvements that could be applied to your code (that will also only slightly improve the performance) are:
Replace while loops with for loops where possible (they are easier to read, and the compiler may be able to optimize them slightly better)
Don't use System.out.println for lots of output unless it's really needed (it's quite slow for large amounts of text)
Copy arrays using System.arraycopy, which is usually faster than copying element by element in a while loop
So an improved code of yours could look like this (I marked the changed parts with comments):
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Arrays;
import java.util.Scanner;
public class Solution {
// Complete the activityNotifications function below.
static int activityNotifications(int[] expenditure, int d) {
//Declaring variables
int iterations, itr, length, median, midDummy, midL, midR, midDummy2, i, i1, temp, count;
float mid, p, q;
length = expenditure.length;
iterations = length - d;
i = 0;
i1 = 0;
itr = 0;
count = 0;
int[] exSub = new int[d];
//EDIT: replace while loops with for loops if possible
//while (iterations > 0) {
for (int iter = 0; iter < iterations; iter++) {
//EDIT: here you can again use a for loop or just use System.arraycopy, which should be (slightly) faster
// Enter the elements in the subarray
/*while (i1 < d) {
exSub[i1] = expenditure[i + i1];
//System.out.println(exSub[i1]);
i1++;
}*/
System.arraycopy(expenditure, i, exSub, 0, d);
//EDIT: Don't use bubble sort!!! It's one of the worst sorting algorithms, because it's really slow
//Bubble sort uses time complexity O(n^2); others (like merge-sort or quick-sort) only use O(n*log(n))
//The easiest and fastest solution is: don't implement sorting by yourself, but use Arrays.sort(int[]) from the java API
//Sort the exSub array
/*for (int k = 0; k < (d - 1); k++) {
for (int j = k + 1; j < d; j++) {
if (exSub[j] < exSub[k]) {
temp = exSub[j];
exSub[j] = exSub[k];
exSub[k] = temp;
}
}
}*/
Arrays.sort(exSub);
//Printing the exSub array in each iteration
//EDIT: printing many results also takes much time, so only print the results if it's really needed
/*for (int l = 0; l < d; l++) {
System.out.println(exSub[l]);
}*/
i1 = 0;
//For each iteration calculate the median
if (d % 2 == 0) // even
{
midDummy = d / 2;
p = (float) exSub[midDummy];
q = (float) exSub[midDummy - 1];
mid = (p + q) / 2;
//mid = (exSub[midDummy]+exSub [midDummy-1])/2;
//System.out.println(midDummy);
}
else // odd
{
midDummy2 = d / 2;
mid = exSub[midDummy2];
//System.out.println(midDummy2);
}
if (expenditure[itr + d] >= 2 * mid) {
count++;
}
itr++;
i++;
//iterations--;//EDIT: don't change iterations anymore because of the for loop
System.out.println("Mid:" + mid);
System.out.println("---------");
}
System.out.println("Count:" + count);
return count;
}
private static final Scanner scanner = new Scanner(System.in);
public static void main(String[] args) throws IOException {
BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter(System.getenv("OUTPUT_PATH")));
String[] nd = scanner.nextLine().split(" ");
int n = Integer.parseInt(nd[0]);
int d = Integer.parseInt(nd[1]);
int[] expenditure = new int[n];
String[] expenditureItems = scanner.nextLine().split(" ");
scanner.skip("(\r\n|[\n\r\u2028\u2029\u0085])?");
for (int i = 0; i < n; i++) {
int expenditureItem = Integer.parseInt(expenditureItems[i]);
expenditure[i] = expenditureItem;
}
int result = activityNotifications(expenditure, d);
bufferedWriter.write(String.valueOf(result));
bufferedWriter.newLine();
bufferedWriter.close();
scanner.close();
}
}
Edit:
You can make the solution even faster if you don't sort the complete (sub-)array in every iteration, but instead only remove one value (the day that is no longer used) and insert the new value (the day that is now used) at the correct position (as Vojtěch Kaiser mentioned in his answer).
This makes it even faster, because sorting the array takes O(d*log(d)) time, while inserting a new value into an already sorted structure takes only O(log(d)) if you are using a search tree. With an array (as in the example below) it takes O(d), because the values after the insertion point have to be shifted, which takes linear time (as dyukha mentioned in the comments). So the improvement is (again) made by using a better algorithm, and this solution could be improved further by using a search tree instead of an array.
So the new solution could look like this:
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Arrays;
import java.util.Scanner;
public class Solution {
// Complete the activityNotifications function below.
static int activityNotifications(int[] expenditure, int d) {
//Declaring variables
int iterations, length, midDummy, midDummy2, count;//EDIT: removed some unused variables here
float mid, p, q;
length = expenditure.length;
iterations = length - d;
count = 0;
//EDIT: add the first d values to the sub-array and sort it (only once)
int[] exSub = new int[d];
System.arraycopy(expenditure, 0, exSub, 0, d);
Arrays.sort(exSub);
for (int iter = 0; iter < iterations; iter++) {
//EDIT: don't sort the complete array in every iteration
//instead remove the one value (the first day that is not used anymore) and add the new value (of the new day) into the sorted array
//sorting is done in O(d * log(d)); removing and inserting a value in a sorted array is O(d) because of the shifting (a search tree would make it O(log(d)))
if (iter > 0) {//not for the first iteration
int remove = expenditure[iter - 1];
int indexToRemove = find(exSub, remove);
//remove the index and move the following values one index to the left
exSub[indexToRemove] = 0;//not needed; just to make it more clear what's happening
System.arraycopy(exSub, indexToRemove + 1, exSub, indexToRemove, exSub.length - indexToRemove - 1);
exSub[d - 1] = 0;//not needed again; just to make it more clear what's happening
int newValue = expenditure[iter + d - 1];
//insert the new value to the correct position
insertIntoSortedArray(exSub, newValue);
}
//For each iteration calculate the median
if (d % 2 == 0) // even
{
midDummy = d / 2;
p = exSub[midDummy];
q = exSub[midDummy - 1];
mid = (p + q) / 2;
//mid = (exSub[midDummy]+exSub [midDummy-1])/2;
//System.out.println(midDummy);
}
else // odd
{
midDummy2 = d / 2;
mid = exSub[midDummy2];
//System.out.println(midDummy2);
}
if (expenditure[iter + d] >= 2 * mid) {
count++;
}
}
System.out.println("Count:" + count);
return count;
}
/**
* Find the position of value in array (linear scan; returns the last matching index, or -1 if absent)
*/
private static int find(int[] array, int value) {
int index = -1;
for (int i = 0; i < array.length; i++) {
if (array[i] == value) {
index = i;
}
}
return index;
}
/**
* Find the correct position to insert value into the array by bisection search
*/
private static void insertIntoSortedArray(int[] array, int value) {
int[] indexRange = new int[] {0, array.length - 1};
while (indexRange[1] - indexRange[0] > 0) {
int mid = indexRange[0] + (indexRange[1] - indexRange[0]) / 2;
if (value > array[mid]) {
if (mid == indexRange[0]) {
indexRange[0] = mid + 1;
}
else {
indexRange[0] = mid;
}
}
else {
if (mid == indexRange[1]) {
indexRange[1] = mid - 1;
}
else {
indexRange[1] = mid;
}
}
}
System.arraycopy(array, indexRange[0], array, indexRange[0] + 1, array.length - indexRange[0] - 1);
array[indexRange[0]] = value;
}
private static final Scanner scanner = new Scanner(System.in);
public static void main(String[] args) throws IOException {
BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter(System.getenv("OUTPUT_PATH")));
String[] nd = scanner.nextLine().split(" ");
int n = Integer.parseInt(nd[0]);
int d = Integer.parseInt(nd[1]);
int[] expenditure = new int[n];
String[] expenditureItems = scanner.nextLine().split(" ");
scanner.skip("(\r\n|[\n\r\u2028\u2029\u0085])?");
for (int i = 0; i < n; i++) {
int expenditureItem = Integer.parseInt(expenditureItems[i]);
expenditure[i] = expenditureItem;
}
int result = activityNotifications(expenditure, d);
bufferedWriter.write(String.valueOf(result));
bufferedWriter.newLine();
bufferedWriter.close();
scanner.close();
//Just for testing; can be deleted if you don't need it
/*int[] exp = new int[] {2, 3, 4, 2, 3, 6, 8, 4, 5};
int d = 5;
activityNotifications(exp, d);
int[] exp2 = new int[] {1, 2, 3, 4, 4};
d = 4;
activityNotifications(exp2, d);*/
}
}
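The hand-written find (a linear scan) and insertIntoSortedArray above can also be expressed with Arrays.binarySearch from the standard library. The sketch below keeps the same array-based approach, so each update is still O(d) because of the element shift; only the searching becomes shorter. SortedWindow, removeValue and insertValue are illustrative names, and the sketch assumes the value being removed is present in the window (it always is for the day that leaves it).

```java
import java.util.Arrays;

public class SortedWindow {

    // Removes one occurrence of value from the sorted prefix array[0..size-1]
    // and returns the new size. Assumes value is present.
    static int removeValue(int[] array, int size, int value) {
        int pos = Arrays.binarySearch(array, 0, size, value);          // index of some equal element
        System.arraycopy(array, pos + 1, array, pos, size - pos - 1);  // shift left: O(size)
        return size - 1;
    }

    // Inserts value into the sorted prefix array[0..size-1] and returns the new size.
    // The array must have room for one more element.
    static int insertValue(int[] array, int size, int value) {
        int pos = Arrays.binarySearch(array, 0, size, value);
        if (pos < 0) pos = -pos - 1;                                   // insertion point when absent
        System.arraycopy(array, pos, array, pos + 1, size - pos);      // shift right: O(size)
        array[pos] = value;
        return size + 1;
    }

    public static void main(String[] args) {
        int[] window = {1, 2, 2, 3, 5};
        int size = removeValue(window, window.length, 2); // day leaving the window
        size = insertValue(window, size, 4);              // day entering the window
        System.out.println(Arrays.toString(window));      // prints [1, 2, 3, 4, 5]
    }
}
```

With duplicates, binarySearch returns the index of any equal element; that is fine here, because removing any copy of an equal value, or inserting next to one, leaves the array sorted.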

Your main concern is that you are sorting the partial array in every iteration, costing you total complexity of the problem O(n d log(d)), which can get pretty hairy for large d values.
What you want is to keep the array sorted between iterations and move only the changed values in and out. For that you would implement a binary search tree (BST) or a balanced variant (AVL, ...), perform an O(log(d)) removal of the oldest value, then an O(log(d)) insertion of the new value, and simply look in the middle for the median. The total asymptotic complexity would be O(n log(d)), which is, as far as I know, the best you can get; the rest of the optimization is low-level dirty work.
Take a look at Java's TreeSet (https://docs.oracle.com/javase/10/docs/api/java/util/TreeSet.html), which should take care of most of the work, but keep in mind that the underlying structure is made of objects, which will be slower than arrays. Also note that a TreeSet does not store duplicates, so equal expenditures on different days have to be kept apart somehow (for example by storing day indices instead of amounts).
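A minimal sketch of the TreeSet idea, assuming that storing day indices (ordered by expenditure, ties broken by index) is an acceptable way around TreeSet's lack of duplicates. The window is split into a lower and an upper half, so the median is always low.last() (odd d) or the average of low.last() and high.first() (even d). Each day costs O(log d), so the whole run is O(n log d). The class name is illustrative; the method signature matches the HackerRank stub in the question.

```java
import java.util.Comparator;
import java.util.TreeSet;

public class SlidingMedianNotifications {

    // Counts days i (i >= d) where expenditure[i] >= 2 * median of the previous d days.
    // The window holds day indices, ordered by amount and then by index, so equal
    // amounts on different days are distinct TreeSet elements.
    static int activityNotifications(int[] expenditure, int d) {
        Comparator<Integer> byAmount = Comparator
                .comparingInt((Integer day) -> expenditure[day])
                .thenComparingInt(day -> day);
        TreeSet<Integer> low = new TreeSet<>(byAmount);  // smaller half, ceil(window / 2) days
        TreeSet<Integer> high = new TreeSet<>(byAmount); // larger half, floor(window / 2) days
        int count = 0;
        for (int i = 0; i < expenditure.length; i++) {
            if (i >= d) {
                // twice the median, kept in integer arithmetic
                long twiceMedian = (d % 2 == 1)
                        ? 2L * expenditure[low.last()]
                        : (long) expenditure[low.last()] + expenditure[high.first()];
                if (expenditure[i] >= twiceMedian) count++;
                int leaving = i - d;                     // day that drops out of the window
                if (!low.remove(leaving)) high.remove(leaving);
            }
            // add day i: put it in low, push low's largest day up, then rebalance
            low.add(i);
            high.add(low.pollLast());
            if (high.size() > low.size()) low.add(high.pollFirst());
        }
        return count;
    }

    public static void main(String[] args) {
        int[] sample = {2, 3, 4, 2, 3, 6, 8, 4, 5};
        System.out.println(activityNotifications(sample, 5)); // prints 2
    }
}
```

Comparing against 2 * median in integers avoids the float rounding of the (p + q) / 2 computation in the question.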

How to create a number generator that will only pick a number 1 time?

I am creating a concentration game.
I have a BufferedImage array into which I load a 25-image sprite sheet.
public static BufferedImage[] card = new BufferedImage[25];
Index 0 is the card back, and indices 1-24 are the card faces, which I compare to check whether two cards match.
What I am trying to do is this: I will have 4 difficulties, Easy, Normal, Hard, and Extreme. Each difficulty draws a certain number of cards and then doubles the ones it has chosen. For example, the default level will be NORMAL, which is 12 matches, so it needs to randomly choose 12 unique cards from the BufferedImage array, double each one so that there are exactly 2 of each card, and then shuffle the result.
This is what I have so far, but it produces duplicates about 99% of the time.
//generate cards
Random r = new Random();
int j = 0;
int[] rowOne = new int[12];
int[] rowTwo = new int[12];
boolean[] rowOneBool = new boolean[12];
for(int i = 0; i < rowOneBool.length; i++)
rowOneBool[i] = false;
for(int i = 0; i < rowOne.length; i++){
int typeId = r.nextInt(12)+1;
while(rowOneBool[typeId]){
typeId = r.nextInt(12)+1;
if(rowOneBool[typeId] == false);
}
rowOne[i] = typeId;
j=0;
}
The 3 amounts I will need to generate are Easy 6, Normal 12, and Hard 18; Extreme will use all of the images except index 0, which is the back of the cards.

This is more or less in the nature of random numbers. Sometimes they are duplicates. You can easily factor that in though if you want them to be more unique. Just discard the number and generate again if it's not unique.
Here's a simple method to generate unique random numbers with a specified allowance of duplicates:
public static void main(String[] args) {
int[] randoms = uniqueRandoms(new int[16], 1, 25, 3);
for (int r : randoms) System.out.println(r);
}
public static int[] uniqueRandoms(int[] randoms, int lo, int hi, int allowance) {
// should do some error checking up here
int range = hi - lo, duplicates = 0; // results fall in [lo, hi): hi itself is never returned
Random gen = new Random();
for (int i = 0, k; i < randoms.length; i++) {
randoms[i] = gen.nextInt(range) + lo;
for (k = 0; k < i; k++) {
if (randoms[i] == randoms[k]) {
if (duplicates < allowance) {
duplicates++;
} else {
i--;
}
break;
}
}
}
return randoms;
}
Edit: Tested and corrected. Now it works. : )

From what I understand from your question, the answer should look something like this:
Have 2 classes, one called Randp and the other called Main. Run Main, and edit the code to suit your needs.
package randp;
public class Main {
public static void main(String[] args) {
Randp randp = new Randp(10);
for (int i = 0; i < 10; i++) {
System.out.print(randp.nextInt());
}
}
}
package randp;
public class Randp {
private int numsLeft;
private int MAX_VALUE;
int[] chooser;
public Randp(int startCounter) {
MAX_VALUE = startCounter; //set the amount we go up to
numsLeft = startCounter;
chooser = new int[MAX_VALUE];
for (int i = 1; i <= chooser.length; i++) {
chooser[i-1] = i; //fill the array up
}
}
public int nextInt() {
if(numsLeft == 0){
return 0; //nothing left in the array
}
int a = chooser[(int)(Math.random() * MAX_VALUE)]; //picking a random index
if(a == 0) {
return this.nextInt(); //we hit an index that's been used already, pick another one!
}
chooser[a-1] = 0; //don't want to use it again
numsLeft--; //keep track of the numbers
return a;
}
}

This is how I would handle it. You would move your BufferedImage objects to a List, although I would consider creating an object for the 'cards' you're using...
int removalAmount = 3; //Remove 3 cards at random... Use a switch to change this based upon difficulty or whatever...
List<BufferedImage> list = new ArrayList<BufferedImage>();
// Add the card faces from your array, skipping index 0 (the card back).
list.addAll(Arrays.asList(card).subList(1, card.length));
Collections.shuffle(list);
for (int i = 0; i < removalAmount; i++) {
list.remove(list.size() - 1);
}
// Duplicate each remaining card. Copy first: Collection.addAll is undefined
// when a non-empty collection is added to itself.
list.addAll(new ArrayList<BufferedImage>(list));
Collections.shuffle(list);
for (BufferedImage specificCard : list) {
//Do something
}

Ok, I said I'd give you something better, and I will. First, let's improve Jeeter's solution.
It has a bug. Because it relies on 0 to be the "used" indicator, it won't actually produce index 0 until the end, which is not random.
It fills an array with indices, then uses 0 as effectively a boolean value, which is redundant. If a value at an index is not 0 we already know what it is, it's the same as the index we used to get to it. It just hides the true nature of algorithm and makes it unnecessarily complex.
It uses recursion when it doesn't need to. Sure, you can argue that this improves code clarity, but then you risk a StackOverflowError from too many recursive calls.
Thus, I present an improved version of the algorithm:
class Randp {
private int MAX_VALUE;
private int numsLeft;
private boolean[] used;
public Randp(int startCounter) {
MAX_VALUE = startCounter;
numsLeft = startCounter;
// All false by default.
used = new boolean[MAX_VALUE];
}
public int nextInt() {
if (numsLeft <= 0)
return 0;
numsLeft--;
int index;
do
{
index = (int)(Math.random() * MAX_VALUE);
} while (used[index]);
used[index] = true; // mark it so it is never returned again
return index;
}
}
I believe this is much easier to understand, but now it becomes clear the algorithm is not great. It might take a long time to find an unused index, especially when we wanted a lot of values and there's only a few left. We need to fundamentally change the way we approach this. It'd be better to generate the values randomly from the beginning:
import java.util.ArrayList;
import java.util.Collections;
class Randp {
private ArrayList<Integer> chooser = new ArrayList<Integer>();
private int count = 0;
public Randp(int startCounter) {
for (int i = 0; i < startCounter; i++)
chooser.add(i);
Collections.shuffle(chooser);
}
public int nextInt() {
if (count >= chooser.size())
return 0;
return chooser.get(count++);
}
}
This is efficient (one shuffle up front, then constant time per call) and extremely simple, since we made use of existing classes and methods.
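Putting the shuffle idea together with the difficulty levels from the question, here is a sketch that works on card indices instead of images. The Difficulty enum and deal() are hypothetical names; the pair counts 6, 12, 18 and 24 come from the question. Each chosen face appears exactly twice, and index 0, the card back, is never dealt. You can then look up card[index] for each entry of the layout.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class DealCards {

    // Hypothetical enum; pair counts from the question (Extreme = all 24 faces).
    enum Difficulty {
        EASY(6), NORMAL(12), HARD(18), EXTREME(24);

        final int pairs;

        Difficulty(int pairs) {
            this.pairs = pairs;
        }
    }

    // Returns a shuffled layout of face indices (1..24), each chosen face exactly twice.
    static List<Integer> deal(Difficulty difficulty, Random random) {
        List<Integer> faces = new ArrayList<>();
        for (int i = 1; i <= 24; i++) faces.add(i);    // skip 0, the card back
        Collections.shuffle(faces, random);            // random order of all faces
        List<Integer> chosen = faces.subList(0, difficulty.pairs); // k distinct faces
        List<Integer> layout = new ArrayList<>(chosen);
        layout.addAll(chosen);                         // second copy of each face
        Collections.shuffle(layout, random);           // mix the pairs
        return layout;
    }

    public static void main(String[] args) {
        System.out.println(deal(Difficulty.NORMAL, new Random()));
    }
}
```

Because the faces are shuffled once and then taken from the front, no retry loop is needed and duplicates cannot occur, however many pairs are requested.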

Time difference for random number generation implementation in Java vs. C++

I'm writing a Monte Carlo simulation in Java that involves generating a lot of random integers. My thinking was that native code would be faster for random number generation, so I should write the code in C++ and return the output via JNI. But when I wrote the same method in C++, it actually takes longer to execute than the Java version. Here are the code samples:
Random rand = new Random();
int threshold = 5;
int[] composition = {10, 10, 10, 10, 10};
for (int j = 0; j < 100000000; j++) {
rand.setSeed(System.nanoTime());
double sum = 0;
for (int i = 0; i < composition[0]; i++) sum += carbon(rand);
for (int i = 0; i < composition[1]; i++) sum += hydrogen(rand);
for (int i = 0; i < composition[2]; i++) sum += nitrogen(rand);
for (int i = 0; i < composition[3]; i++) sum += oxygen(rand);
for (int i = 0; i < composition[4]; i++) sum += sulfur(rand);
if (sum < threshold) {}//execute some code
else {}//execute some other code
}
And the equivalent code in C++:
int threshold = 5;
int composition [5] = {10, 10, 10, 10, 10};
for (int i = 0; i < 100000000; i++)
{
srand(time(0));
double sum = 0;
for (int i = 0; i < composition[0]; i++) sum += carbon();
for (int i = 0; i < composition[1]; i++) sum += hydrogen();
for (int i = 0; i < composition[2]; i++) sum += nitrogen();
for (int i = 0; i < composition[3]; i++) sum += oxygen();
for (int i = 0; i < composition[4]; i++) sum += sulfur();
if (sum > threshold) {}
else {}
}
All of the element methods (carbon, hydrogen, etc) just generate a random number and return a double.
Runtimes are 77.471 sec for the Java code, and 121.777 sec for C++.
Admittedly I'm not very experienced in C++ so it's possible that the cause is just badly written code.

I suspect that the performance issue is in the bodies of your carbon(), hydrogen(), nitrogen(), oxygen(), and sulfur() functions. You should show how they produce the random data.
Or it could be in the if (sum < threshold) {} else {} code.
"I wanted to keep setting the seed so the results would not be deterministic (closer to being truly random)"
Since you're using the result of time(0) as a seed you're not getting particularly random results either way.
Instead of using srand() and rand() you should take a look at the <random> library and choose an engine with the performance/quality characteristics that meet your needs. If your implementation supports it, you can even get non-deterministic random data from std::random_device (either to generate seeds or as an engine).
Additionally <random> provides pre-made distributions such as std::uniform_real_distribution<double> which is likely to be better than the average programmer's method of manually computing the distribution you want from the results of rand().
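The Java version in the question has the same reseeding pattern (rand.setSeed(System.nanoTime()) on every outer iteration). A sketch of the alternative: create one java.util.Random and keep drawing from it. The no-argument constructor already picks a seed that is very likely to differ between runs, so the results are not deterministic. The carbon() body here is a stand-in modeled on the C++ carbon() quoted later in this answer; Random.nextInt(bound) returns a uniformly distributed value, so it avoids the bias of rand() % 10000.

```java
import java.util.Random;

public class SeedOnce {

    // One generator for the whole run. new Random() chooses a seed that is very
    // likely to differ between invocations, so there is no need to reseed.
    private static final Random RAND = new Random();

    // Stand-in modeled on the C++ carbon() in this answer: returns the heavier
    // mass with probability 107/10000. nextInt(10000) is uniform over 0..9999.
    static double carbon() {
        return RAND.nextInt(10000) < 107 ? 13.0033548378 : 12.0;
    }

    static double sumCarbon(int atoms) {
        double sum = 0;
        for (int i = 0; i < atoms; i++) sum += carbon();
        return sum;
    }

    public static void main(String[] args) {
        // no setSeed(...) inside the loop: keep drawing from the same stream
        for (int j = 0; j < 5; j++) {
            System.out.println(sumCarbon(10));
        }
    }
}
```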
Okay, here's how you can eliminate the inner loops from your code and drastically speed it up (in Java or C++).
Your code:
double carbon() {
if (rand() % 10000 < 107)
return 13.0033548378;
else
return 12.0;
}
picks one of two values with a particular probability. Presumably you intended the first value to be picked about 107 times out of 10000 (although using % with rand() doesn't quite give you that). When you run this in a loop and sum the results as in:
for (int i = 0; i < composition[0]; i++) sum += carbon();
you'll essentially get sum += X*13.0033548378 + Y*12.0; where X is the number of times the random number stays under the threshold and Y is (trials-X). It just so happens that you can simulate running a bunch of trials and calculating the number of successes using a binomial distribution, and <random> happens to provide a binomial distribution.
Given a function sum_trials()
std::minstd_rand0 eng; // global random engine
double sum_trials(int trials, double probability, double A, double B) {
std::binomial_distribution<> dist(trials, probability);
int successes = dist(eng);
return successes*A + (trials-successes)*B;
}
You can replace your carbon() loop:
sum += sum_trials(composition[0], 107.0/10000.0, 13.0033548378, 12.0); // carbon trials
I don't have the actual values you're using, but your whole loop will look something like:
for (int i = 0; i < 100000000; i++) {
double sum = 0;
sum += sum_trials(composition[0], 107.0/10000.0, 13.0033548378, 12.0); // carbon trials
sum += sum_trials(composition[1], 107.0/10000.0, 13.0033548378, 12.0); // hydrogen trials
sum += sum_trials(composition[2], 107.0/10000.0, 13.0033548378, 12.0); // nitrogen trials
sum += sum_trials(composition[3], 107.0/10000.0, 13.0033548378, 12.0); // oxygen trials
sum += sum_trials(composition[4], 107.0/10000.0, 13.0033548378, 12.0); // sulfur trials
if (sum > threshold) {
} else {
}
}
Now one thing to note is that inside the function we're constructing distributions over and over with the same data. We can extract that by replacing the function sum_trials() with a function object, which we construct with the appropriate data once before the loop, and then just use the functor repeatedly:
struct sum_trials {
std::binomial_distribution<> dist;
double A; double B; int trials;
sum_trials(int t, double p, double a, double b) : dist{t, p}, A{a}, B{b}, trials{t} {}
double operator() () {
int successes = dist(eng);
return successes * A + (trials - successes) * B;
}
};
int main() {
int threshold = 5;
int composition[5] = { 10, 10, 10, 10, 10 };
sum_trials carbon = { composition[0], 107.0/10000.0, 13.0033548378, 12.0};
sum_trials hydrogen = { composition[1], 107.0/10000.0, 13.0033548378, 12.0};
sum_trials nitrogen = { composition[2], 107.0/10000.0, 13.0033548378, 12.0};
sum_trials oxygen = { composition[3], 107.0/10000.0, 13.0033548378, 12.0};
sum_trials sulfur = { composition[4], 107.0/10000.0, 13.0033548378, 12.0};
for (int i = 0; i < 100000000; i++) {
double sum = 0;
sum += carbon();
sum += hydrogen();
sum += nitrogen();
sum += oxygen();
sum += sulfur();
if (sum > threshold) {
} else {
}
}
}
The original version of the code took my system about one minute 30 seconds. The last version here takes 11 seconds.
Here's a functor that generates the oxygen sums (three possible masses) using two binomial_distributions. Maybe one of the other distributions can do this in one shot, but I don't know of one.
struct sum_trials2 {
std::binomial_distribution<> d1;
std::binomial_distribution<> d2;
double A; double B; double C;
int trials;
double probability2;
sum_trials2(int t, double p1, double p2, double a, double b, double c)
: d1{t, p1}, A{a}, B{b}, C{c}, trials{t}, probability2{p2} {}
double operator() () {
int X = d1(eng);
d2.param(std::binomial_distribution<>{trials-X, probability2}.param()); // split the remaining trials
int Y = d2(eng);
return X*A + Y*B + (trials-X-Y)*C;
}
};
sum_trials2 oxygen{composition[3], 17.0/1000.0, (47.0-17.0)/(1000.0-17.0), 17.9999, 16.999, 15.999};
You can speed this up further if you can directly calculate the probability that the sum is over your threshold:
int main() {
std::minstd_rand0 eng;
std::bernoulli_distribution dist(probability_sum_is_over_threshold);
for (int i=0; i< 100000000; ++i) {
if (dist(eng)) {
} else {
}
}
}
Unless the values for the other elements can be negative, the probability that the sum is greater than five is 100%. In that case you don't even need to generate random data; just execute the 'if' branch of your code 100,000,000 times.
int main() {
for (int i=0; i< 100000000; ++i) {
//execute some code
}
}
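The Java code in the question can use the same trick, but java.util.Random has no binomial distribution. A sketch of a small helper, assuming the trial counts stay small (10 per element here) and 0 <= p < 1: it draws the number of successes by inverting the binomial CDF, using the recurrence P(X = k + 1) = P(X = k) * (n - k) / (k + 1) * p / (1 - p). This replaces ten uniform draws per element with one. binomial() and sumTrials() are illustrative names, not library methods; for large n a different sampling algorithm would be more appropriate.

```java
import java.util.Random;

public class BinomialSum {

    // Number of successes in n independent trials with success probability p,
    // sampled by inversion of the binomial CDF (assumes 0 <= p < 1 and small n).
    static int binomial(int n, double p, Random random) {
        double q = 1.0 - p;
        double prob = Math.pow(q, n);   // P(X = 0)
        double cdf = prob;
        double u = random.nextDouble();
        int k = 0;
        while (u > cdf && k < n) {
            prob *= (double) (n - k) / (k + 1) * (p / q); // P(X = k + 1) from P(X = k)
            k++;
            cdf += prob;
        }
        return k;
    }

    // Java counterpart of sum_trials(): successes contribute a, the other trials b.
    static double sumTrials(int trials, double p, double a, double b, Random random) {
        int successes = binomial(trials, p, random);
        return successes * a + (trials - successes) * b;
    }

    public static void main(String[] args) {
        Random random = new Random();
        double carbon = sumTrials(10, 107.0 / 10000.0, 13.0033548378, 12.0, random);
        System.out.println(carbon);
    }
}
```

The k < n guard returns n if rounding leaves the accumulated CDF slightly below 1.0.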

Java (actually the JIT) is generally very good at detecting code which doesn't do anything useful. This is because the JIT can obtain information at runtime a static compiler cannot determine. For code which can be optimised away, Java can actually be faster than C++. In general however, a well tuned C++ program is faster than one in Java.
In short, given enough time, C++ will be faster for a well-understood, well-tuned program. However, given limited resources, changing requirements, and teams of mixed ability, Java can often outperform C++ by a significant margin.
All that said, it could be that the random number generator in C++ is better, but more expensive.
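Since the loops in the question end in empty branches (if (sum < threshold) {} else {}), it is safer when timing them to make the work observable, so that neither the JIT nor a C++ optimizer has any opportunity to simplify away computation whose result is unused. A sketch of such a harness, assuming that counting the below-threshold iterations is an acceptable stand-in for "execute some code"; countBelow(), the iteration count, and the threshold of 125.0 are illustrative choices, not values from the question.

```java
import java.util.Random;

public class ConsumeResult {

    // Counts iterations whose summed mass falls below the threshold. Returning
    // (and printing) the count gives the loop an observable result.
    static long countBelow(int iterations, double threshold, Random rand) {
        long below = 0;
        for (int j = 0; j < iterations; j++) {
            double sum = 0;
            for (int i = 0; i < 10; i++) {   // 10 atoms, as in composition[0]
                sum += rand.nextInt(10000) < 107 ? 13.0033548378 : 12.0;
            }
            if (sum < threshold) below++;
        }
        return below;
    }

    public static void main(String[] args) {
        Random rand = new Random();
        int iterations = 10_000_000;         // illustrative; the question uses 100,000,000
        double threshold = 125.0;            // hypothetical threshold
        long start = System.nanoTime();
        long below = countBelow(iterations, threshold, rand);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(below + " of " + iterations + " below threshold in " + elapsedMs + " ms");
    }
}
```

Doing the same in the C++ version (accumulate a count and print it) keeps the two timings comparable.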

Converting Complex to ArrayList<Float> in Java

I have an input signal that I store in an ArrayList and then convert to Complex values. It looks something like this:
-0.03480425839330703
0.07910192950176387
0.7233322451735928
0.1659819820667019
and its FFT output looks like this:
0.9336118983487516
-0.7581365035668999 + 0.08688005256493803i
0.44344407521182005
-0.7581365035668999 - 0.08688005256493803i
This is an array of Complex values; I want to convert it into an ArrayList while dropping the imaginary parts (such as + 0.08688005256493803i).
So all I need are these values:
0.9336118983487516
-0.7581365035668999
0.44344407521182005
-0.7581365035668999
What is the best way of going about this?
And this is the code that I am using
public static Complex[] fft(Complex[] x) {
int N = x.length;
// base case
if (N == 1) return new Complex[] { x[0] };
// radix 2 Cooley-Tukey FFT
if (N % 2 != 0) { throw new RuntimeException("N is not a power of 2"); }
// fft of even terms
Complex[] even = new Complex[N/2];
for (int k = 0; k < N/2; k++) {
even[k] = x[2*k];
}
Complex[] q = fft(even);
// fft of odd terms
Complex[] odd = even; // reuse the array
for (int k = 0; k < N/2; k++) {
odd[k] = x[2*k + 1];
}
Complex[] r = fft(odd);
// combine
Complex[] y = new Complex[N];
for (int k = 0; k < N/2; k++) {
double kth = -2 * k * Math.PI / N;
Complex wk = new Complex(Math.cos(kth), Math.sin(kth));
y[k] = q[k].plus(wk.times(r[k]));
y[k + N/2] = q[k].minus(wk.times(r[k]));
}
return y;
}

All you want to do is drop the imaginary part of your Complex data structure.
As you haven't shown us your Complex class, I'll assume it has a member for the real part (e.g. double real;).
To drop the imaginary part, just call something like complex.getRealPart(), or access complex.real (substitute your own member name).
To build an ArrayList<Double>, use the following snippet:
ArrayList<Double> list = new ArrayList<Double>();
for (Complex c : complexes) { // complexes: the Complex[] returned by fft()
list.add(c.getRealPart()); // substitute your class's real-part accessor
}
Note: I may be wrong, but I suspect that instead of the real part you need the absolute value (magnitude) of each complex number. To calculate it, use:
Math.sqrt(c.getRealPart() * c.getRealPart() + c.getImPart() * c.getImPart());
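Both suggestions above can be combined into a self-contained sketch. The Complex record here is a hypothetical stand-in (Java 16+) with re() and im() accessors; the question does not show the real class, so adapt the accessor names. Math.hypot computes the magnitude without intermediate overflow.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal stand-in for the asker's Complex class.
record Complex(double re, double im) {}

class ComplexToList {

    // Keeps only the real part of each FFT output value.
    static List<Double> realParts(Complex[] values) {
        List<Double> list = new ArrayList<>(values.length);
        for (Complex c : values) {
            list.add(c.re());
        }
        return list;
    }

    // Keeps the magnitude |c| = sqrt(re^2 + im^2) instead, in case that is what is needed.
    static List<Double> magnitudes(Complex[] values) {
        List<Double> list = new ArrayList<>(values.length);
        for (Complex c : values) {
            list.add(Math.hypot(c.re(), c.im()));
        }
        return list;
    }
}
```

With the fft method from the question, you would call something like ComplexToList.realParts(fft(x)) once the accessors match your Complex class.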

From what I understand, you just want the real part of the complex value. If that's the case, presumably your Complex class also has getReal() and getImaginary() (or similar) methods, so just use getReal().

Finding prime numbers with the Sieve of Eratosthenes (Originally: Is there a better way to prepare this array?)

Note: Version 3, below, uses the Sieve of Eratosthenes. Several answers helped with what I originally asked. I chose the Sieve of Eratosthenes method, implemented it, and changed the question title and tags accordingly. Thanks to everyone who helped!
Introduction
I wrote this fancy little method that generates an array of int containing the prime numbers up to and including the specified upper bound. It works very well, but I have a concern.
The Method
private static int [] generatePrimes(int max) {
if (max < 2) return new int [0]; // no primes below 2; also avoids a wrong or failing temp[0] = 2
int [] temp = new int [max];
temp [0] = 2;
int index = 1;
int prime = 1;
boolean isPrime = false;
while((prime += 2) <= max) {
isPrime = true;
for(int i = 0; i < index; i++) {
if(prime % temp [i] == 0) {
isPrime = false;
break;
}
}
if(isPrime) {
temp [index++] = prime;
}
}
int [] primes = new int [index];
while(--index >= 0) {
primes [index] = temp [index];
}
return primes;
}
My Concern
My concern is that I am creating an array that is far too large for the final number of elements the method will return. The trouble is that I don't know of a good way to correctly guess the number of prime numbers less than a specified number.
Focus
This is how the program uses the arrays. This is what I want to improve upon.
1. I create a temporary array that is large enough to hold every number less than the limit.
2. I generate the prime numbers, while keeping count of how many I have generated.
3. I make a new array that is the right dimension to hold just the prime numbers.
4. I copy each prime number from the huge array to the array of the correct dimension.
5. I return the array of the correct dimension that holds just the prime numbers I generated.
Questions
1. Can I copy the whole chunk of temp[] that has nonzero elements to primes[] at once, without having to iterate through both arrays and copy the elements one by one?
2. Are there any data structures that behave like an array of primitives that can grow as elements are added, rather than requiring a dimension upon instantiation? What is the performance penalty compared to using an array of primitives?
Version 2 (thanks to Jon Skeet):
private static int [] generatePrimes(int max) {
if (max < 2) return new int [0]; // no primes below 2; also avoids a wrong or failing temp[0] = 2
int [] temp = new int [max];
temp [0] = 2;
int index = 1;
int prime = 1;
boolean isPrime = false;
while((prime += 2) <= max) {
isPrime = true;
for(int i = 0; i < index; i++) {
if(prime % temp [i] == 0) {
isPrime = false;
break;
}
}
if(isPrime) {
temp [index++] = prime;
}
}
return Arrays.copyOfRange(temp, 0, index);
}
Version 3 (thanks to Paul Tomblin), which uses the Sieve of Eratosthenes:
private static int [] generatePrimes(int max) {
boolean[] isComposite = new boolean[max + 1];
for (int i = 2; i * i <= max; i++) {
if (!isComposite [i]) {
for (int j = i; i * j <= max; j++) {
isComposite [i*j] = true;
}
}
}
int numPrimes = 0;
for (int i = 2; i <= max; i++) {
if (!isComposite [i]) numPrimes++;
}
int [] primes = new int [numPrimes];
int index = 0;
for (int i = 2; i <= max; i++) {
if (!isComposite [i]) primes [index++] = i;
}
return primes;
}

Your method of finding primes, which compares every candidate against every possible factor, is hideously inefficient. You can improve it immensely by running a Sieve of Eratosthenes over the entire range at once. Besides doing far fewer comparisons, it uses addition rather than division, and division is much slower.

ArrayList<> Sieve of Eratosthenes
// Return primes less than limit
static ArrayList<Integer> generatePrimes(int limit) {
final int numPrimes = countPrimesUpperBound(limit);
ArrayList<Integer> primes = new ArrayList<Integer>(numPrimes);
boolean [] isComposite = new boolean [limit]; // all false
final int sqrtLimit = (int)Math.sqrt(limit); // floor
for (int i = 2; i <= sqrtLimit; i++) {
if (!isComposite [i]) {
primes.add(i);
for (long j = (long) i * i; j < limit; j += i) // long, because `j += i` can overflow an int near Integer.MAX_VALUE
isComposite [(int) j] = true;
}
}
for (int i = sqrtLimit + 1; i < limit; i++)
if (!isComposite [i])
primes.add(i);
return primes;
}
Formula for an upper bound on the number of primes less than or equal to max (see wolfram.com):
static int countPrimesUpperBound(int max) {
return max > 1 ? (int)(1.25506 * max / Math.log((double)max)) : 0;
}
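A quick worked example of the bound used by countPrimesUpperBound (arithmetic rounded to a few digits):

```latex
\pi(x) < \frac{1.25506\,x}{\ln x} \quad (x > 1), \qquad
x = 100:\quad \frac{1.25506 \cdot 100}{\ln 100} \approx \frac{125.506}{4.6052} \approx 27.25
```

So countPrimesUpperBound(100) returns 27 after the int cast, while there are actually 25 primes up to 100. Assuming the bound holds as stated, the initial ArrayList capacity is slightly generous but should never need to grow.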

Create an ArrayList<Integer> and then convert it to an int[] at the end.
There are various third-party IntList (etc.) classes around, but unless you're really worried about the hit of boxing a few integers, I wouldn't worry about it.
You could use Arrays.copyOf to create the new array, though. You might also want to resize by doubling the size each time you need to, and then trim at the end. That would basically mimic the ArrayList behaviour.
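A minimal sketch of the doubling approach, assuming you want to avoid boxing entirely. GrowableIntArray is a hypothetical name, not a JDK class; Arrays.copyOf is used both to grow the backing array and to trim it at the end.

```java
import java.util.Arrays;

// Hypothetical growable int[]: doubles its capacity when full and trims on export,
// mimicking ArrayList without boxing each value into an Integer.
class GrowableIntArray {
    private int[] data = new int[16];
    private int size = 0;

    void add(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, data.length * 2); // double the capacity
        }
        data[size++] = value;
    }

    int size() {
        return size;
    }

    // Returns a new array containing exactly the elements added so far.
    int[] toArray() {
        return Arrays.copyOf(data, size);
    }
}
```

Doubling keeps the amortized cost of add() constant; OpenJDK's ArrayList relies on the same idea, although it grows by about 50% rather than doubling.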

Algo using Sieve of Eratosthenes
public static List<Integer> findPrimes(int limit) {
List<Integer> list = new ArrayList<>();
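boolean [] isComposite = new boolean [limit + 1]; // limit + 1 so that index 'limit' itself is valid; index 0 is unused
if (limit >= 1) isComposite[1] = true; // 1 is not prime; the guard avoids an out-of-bounds write when limit is 0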
// Mark all composite numbers
for (int i = 2; i <= limit; i++) {
if (!isComposite[i]) {
// 'i' is a prime number
list.add(i);
int multiple = 2;
while (i * multiple <= limit) {
isComposite [i * multiple] = true;
multiple++;
}
}
}
return list;
}
Image depicting the above algorithm (grey cells represent prime numbers; since all numbers are initially considered prime, the whole grid starts out grey).
Image source: Wikimedia

The easiest solution would be to return some member of the Collections Framework instead of an array.

Are you using Java 1.5? Why not return a List<Integer> and use an ArrayList<Integer>? If you do need to return an int[], you can convert the List to an int[] at the end of processing.
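The conversion at the end can be done with a plain loop, which works on Java 5 and later, or with a stream on Java 8 and later. A sketch, with IntLists as a hypothetical class name:

```java
import java.util.List;

// Two ways to turn a List<Integer> into an int[].
class IntLists {

    // Plain loop: auto-unboxes each element (Java 5 and later).
    // Iterating with for-each keeps this O(n) even for a LinkedList.
    static int[] toIntArray(List<Integer> list) {
        int[] result = new int[list.size()];
        int i = 0;
        for (Integer value : list) {
            result[i++] = value; // throws NullPointerException if the list contains null
        }
        return result;
    }

    // Stream version (Java 8 and later).
    static int[] toIntArrayStream(List<Integer> list) {
        return list.stream().mapToInt(Integer::intValue).toArray();
    }
}
```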

As Paul Tomblin points out, there are better algorithms.
But keeping with what you have, and assuming an object per result is too big:
You are only ever appending to the array. So, use a relatively small int[] array. When it's full, append it to a List and create a replacement. At the end, copy everything into a correctly sized array.
Alternatively, guess the size of the int[] array. If it is too small, replace it with an int[] whose size is a fixed fraction larger than the current array size. The performance overhead of this remains proportional to the size. (This was discussed briefly in a recent Stack Overflow podcast.)
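A sketch of the first approach, assuming a fixed chunk size of 1024 (an arbitrary choice). Only one int[] object is allocated per chunk rather than one Integer per result, and System.arraycopy copies each chunk in a single call, which also covers the question about copying a block at once. ChunkedIntAppender is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical appender: fill a small int[] chunk, park full chunks in a List,
// and copy everything into one exactly sized array at the end.
class ChunkedIntAppender {
    private static final int CHUNK_SIZE = 1024;
    private final List<int[]> fullChunks = new ArrayList<>();
    private int[] current = new int[CHUNK_SIZE];
    private int used = 0; // number of slots used in the current chunk

    void add(int value) {
        if (used == current.length) {
            fullChunks.add(current);      // keep the full chunk
            current = new int[CHUNK_SIZE]; // start a replacement
            used = 0;
        }
        current[used++] = value;
    }

    int[] toArray() {
        int total = fullChunks.size() * CHUNK_SIZE + used;
        int[] result = new int[total];
        int pos = 0;
        for (int[] chunk : fullChunks) {
            System.arraycopy(chunk, 0, result, pos, CHUNK_SIZE);
            pos += CHUNK_SIZE;
        }
        System.arraycopy(current, 0, result, pos, used); // partially filled last chunk
        return result;
    }
}
```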

Now that you've got a basic sieve in place, note that the inner loop need only continue until temp[i]*temp[i] > prime.
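Applied to the trial-division code from the question, the cut-off might look like this. This is a sketch: TrialDivisionPrimes is a hypothetical class name, and the max < 2 guard is an addition so that small inputs return an empty array. The square is computed as a long because, for large max, the first stored prime whose square exceeds the candidate can have a square beyond Integer.MAX_VALUE.

```java
import java.util.Arrays;

// Trial division that stops once temp[i] * temp[i] exceeds the candidate:
// if no prime up to sqrt(candidate) divides it, the candidate is prime.
class TrialDivisionPrimes {

    static int[] generatePrimes(int max) {
        if (max < 2) {
            return new int[0];
        }
        int[] temp = new int[max];
        temp[0] = 2;
        int index = 1;
        for (int prime = 3; prime <= max; prime += 2) { // odd candidates only
            boolean isPrime = true;
            for (int i = 0; i < index && (long) temp[i] * temp[i] <= prime; i++) {
                if (prime % temp[i] == 0) {
                    isPrime = false;
                    break;
                }
            }
            if (isPrime) {
                temp[index++] = prime;
            }
        }
        return Arrays.copyOf(temp, index); // trim to the number of primes found
    }
}
```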

I have a really efficient implementation:
we don't keep the even numbers, therefore halving the memory usage.
we use BitSet, requiring only one bit per number.
we estimate an upper bound for the number of primes in the interval, so we can set the initialCapacity of the ArrayList appropriately.
we don't perform any kind of division in the loops.
Here's the code:
public ArrayList<Integer> sieve(int n) {
int upperBound = n > 1 ? (int) (1.25506 * n / Math.log(n)) : 0; // for n == 1, log(1) == 0 would give Integer.MAX_VALUE
ArrayList<Integer> result = new ArrayList<Integer>(upperBound);
if (n >= 2)
result.add(2);
int size = (n - 1) / 2;
BitSet bs = new BitSet(size);
int i = 0;
while (i < size) {
int p = 3 + 2 * i;
result.add(p);
for (int j = i + p; j < size; j += p)
bs.set(j);
i = bs.nextClearBit(i + 1);
}
return result;
}

Restructure your code. Throw out the temporary array, and instead write a function that just prime-tests an integer. It will be reasonably fast, since you're only using primitive types. Then you can, for instance, loop and build a list of integers that are prime, before finally converting that list to an array to return.
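A sketch of that restructuring, using trial division by 2 and by odd numbers up to the square root (PrimeTestList is a hypothetical class name). Each number is tested independently, so this is simpler than a sieve but slower for large limits.

```java
import java.util.ArrayList;
import java.util.List;

class PrimeTestList {

    // Trial division: handle 2 separately, then try odd divisors d with d * d <= n.
    static boolean isPrime(int n) {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        for (int d = 3; (long) d * d <= n; d += 2) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Primes <= max: collected in a list, converted to int[] at the end.
    static int[] generatePrimes(int max) {
        List<Integer> primes = new ArrayList<>();
        for (int n = 2; n <= max; n++) {
            if (isPrime(n)) primes.add(n);
        }
        int[] result = new int[primes.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = primes.get(i); // auto-unboxing
        }
        return result;
    }
}
```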

Not sure if this will suit your situation, but you can take a look at my approach. I implemented mine using the Sieve of Eratosthenes.
public static List<Integer> sieves(int n) {
Map<Integer,Boolean> numbers = new LinkedHashMap<>();
List<Integer> primes = new ArrayList<>();
// First generate a list of integers from 2 to n - 1 (Wikipedia's example uses 30)
for(int i=2; i<n;i++){
numbers.put(i,true);
}
for(int i : numbers.keySet()){
/**
* The first number in the list is 2; cross out every 2nd number in the list after 2 by
* counting up from 2 in increments of 2 (these will be all the multiples of 2 in the list):
*
* The next number in the list after 2 is 3; cross out every 3rd number in the list after 3 by
* counting up from 3 in increments of 3 (these will be all the multiples of 3 in the list):
* The next number not yet crossed out in the list after 5 is 7; the next step would be to cross out every
* 7th number in the list after 7, but they are all already crossed out at this point,
* as these numbers (14, 21, 28) are also multiples of smaller primes because 7 × 7 is greater than 30.
* The numbers not crossed out at this point in the list are all the prime numbers below 30:
*/
if(numbers.get(i)){
for(int j = i+i; j<n; j+=i) {
numbers.put(j,false);
}
}
}
for(int i : numbers.keySet()){
for(int j = i+i; j<n && numbers.get(i); j+=i) {
numbers.put(j,false);
}
}
for(int i : numbers.keySet()){
if(numbers.get(i)) {
primes.add(i);
}
}
return primes;
}
I added a comment for each step that is illustrated on Wikipedia.

I did it using a HashMap and found it very simple:
import java.util.HashMap;
import java.util.Map;
/* Using algorithms such as the Sieve of Eratosthenes */
public class PrimeNumber {
public static void main(String[] args) {
int prime = 15;
HashMap<Integer, Integer> hashMap = new HashMap<Integer, Integer>();
hashMap.put(0, 0);
hashMap.put(1, 0);
for (int i = 2; i <= prime; i++) {
hashMap.put(i, 1);// Assuming all numbers are prime
}
printPrimeNumberEratoshanas(hashMap, prime);
}
private static void printPrimeNumberEratoshanas(HashMap<Integer, Integer> hashMap, int prime) {
System.out.println("Printing prime numbers up to " + prime + "...");
for (Map.Entry<Integer, Integer> entry : hashMap.entrySet()) {
if (entry.getValue().equals(1)) {
System.out.println(entry.getKey());
for (int j = entry.getKey(); j < prime; j++) {
for (int k = j; k * j <= prime; k++) {
hashMap.put(j * k, 0);
}
}
}
}
}
}
I think this is effective.

public static void primes(int n) {
boolean[] lista = new boolean[n+1];
for (int i=2;i<lista.length;i++) {
if (lista[i]==false) {
System.out.print(i + " ");
}
for (int j=i+i;j<lista.length;j+=i) {
lista[j]=true;
}
}
}
