I've been looking around, but I'm not sure how to do it.
I've found this page which, in the last paragraph, says:
A simple generator for random numbers taken from a Poisson distribution is obtained using this simple recipe: if x1, x2, ... is a sequence of random numbers with uniform distribution between zero and one, k is the first integer for which the product x1 · x2 · ... · x(k+1) < e^(−λ)
I've found another page describing how to generate binomial numbers, but I think it uses an approximation of Poisson generation, which doesn't help me.
For example, consider binomial random numbers. A binomial random number is the number of heads in N tosses of a coin with probability p of a heads on any single toss. If you generate N uniform random numbers on the interval (0,1) and count the number less than p, then the count is a binomial random number with parameters N and p.
I know there are libraries to do it, but I can't use them; I can only use the standard uniform generators provided by the language (Java, in this case).
Poisson distribution
Here's how Wikipedia says Knuth says to do it:
init:
Let L ← e^(−λ), k ← 0 and p ← 1.
do:
k ← k + 1.
Generate uniform random number u in [0,1] and let p ← p × u.
while p > L.
return k − 1.
In Java, that would be:
public static int getPoisson(double lambda) {
    double L = Math.exp(-lambda);
    double p = 1.0;
    int k = 0;
    do {
        k++;
        p *= Math.random();
    } while (p > L);
    return k - 1;
}
Binomial distribution
Going by chapter 10 of Non-Uniform Random Variate Generation (PDF) by Luc Devroye (which I found linked from the Wikipedia article) gives this:
public static int getBinomial(int n, double p) {
    int x = 0;
    for (int i = 0; i < n; i++) {
        if (Math.random() < p)
            x++;
    }
    return x;
}
Please note
Neither of these algorithms is optimal. The first is O(λ), the second is O(n). Depending on how large these values typically are, and how frequently you need to call the generators, you might need a better algorithm. The paper I link to above has more complicated algorithms that run in constant time, but I'll leave those implementations as an exercise for the reader. :)
For this and other numerical problems, the bible is the Numerical Recipes book.
There's a free version for C here: http://www.nrbook.com/a/bookcpdf.php (plugin required)
Or you can see it on google books: http://books.google.co.uk/books?id=4t-sybVuoqoC&lpg=PP1&ots=5IhMINLhHo&dq=numerical%20recipes%20in%20c&pg=PP1#v=onepage&q=&f=false
The C code should be very easy to transfer to Java.
This book is worth its weight in gold for lots of numerical problems. On the above site you can also buy the latest version of the book.
Although the answer posted by Kip is perfectly valid for generating Poisson RVs with a small arrival rate (lambda), the second algorithm listed in Wikipedia's "Generating Poisson-distributed random variables" is better for larger arrival rates because of numerical stability.
I ran into this while implementing a project that required generating Poisson RVs with a very high lambda, so I suggest the other way.
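One simple workaround for large lambda (a sketch of my own, not the exact Wikipedia algorithm) is to run Knuth's loop in log space: instead of multiplying uniforms until the product drops below e^(−λ), which underflows to zero for large lambda, sum −ln(u) terms until the total exceeds λ. The two loops are mathematically equivalent, but the running sum never underflows:

```java
import java.util.Random;

public class PoissonLog {
    // Equivalent to Knuth's method, but accumulates -ln(u) instead of
    // multiplying uniforms, so it stays stable for large lambda.
    public static int getPoisson(double lambda, Random rng) {
        double sum = 0.0;
        int k = 0;
        while (true) {
            sum += -Math.log(rng.nextDouble());
            if (sum > lambda) {
                return k;
            }
            k++;
        }
    }
}
```

Note this is still O(λ) per call; it only fixes the underflow, not the running time.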
There are several implementations from CERN in the following library (Java code):
http://acs.lbl.gov/~hoschek/colt/
Concerning binomial random numbers, the implementation there is based on the 1988 paper "Binomial Random Variate Generation", which I recommend, since it uses an optimized algorithm.
Regards
You can add this to build.gradle:
implementation 'org.kie.modules:org-apache-commons-math:6.5.0.Final'
and use the class PoissonDistribution.
More detail on the PoissonDistribution class is available in the Commons Math documentation.
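As an illustrative sketch (the lambda value is mine): with Commons Math 3 the class lives in org.apache.commons.math3.distribution, takes the mean as its constructor argument, and exposes sample() for drawing values.

```java
import org.apache.commons.math3.distribution.PoissonDistribution;

public class PoissonSampleDemo {
    public static void main(String[] args) {
        PoissonDistribution dist = new PoissonDistribution(4.0); // mean lambda = 4
        int draw = dist.sample();      // one Poisson-distributed integer
        int[] draws = dist.sample(10); // ten samples at once
        System.out.println(draw + " " + draws.length);
    }
}
```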
I wrote some pseudocode to start with. I am having trouble writing an expandPolynomial() method. Also, I don't know how to represent the arbitrary int x in isPrime().
Here is a link that explains AKS: https://www.youtube.com/watch?v=HvMSRWTE2mI
import java.util.Scanner;

public class Solution {
    public static void main(String[] args)
    {
        Scanner scan = new Scanner(System.in);
        int n = scan.nextInt();
        System.out.println(isPrime(n));
    }

    private static boolean isPrime(int p)
    {
        /*
        AKS primality test says that if
        ((x-1)^p - (x^p-1)) mod p == 0,
        then p is a prime
        */
        int x; // pseudocode: x is meant to be a symbol, not a value -- this is the part I'm stuck on
        boolean prime = true;
        int[] arr = expandPolynomial(x - 1, p); // doesn't compile; signature is also unresolved
        for (int i = 1; i < arr.length - 1; i++)
        {
            // arr[0] and arr[arr.length-1] omitted
            // due to the subtraction in the test
            if (arr[i] % p != 0)
            {
                prime = false;
            }
        }
        return prime;
    }

    private static int[] expandPolynomial()
    {
        /*
        return the coefficient of each term
        from the leading term to x^1;
        */
    }
}
Unfortunately that video is very misleading in a number of ways.
What Dr. Grime spends most of the video describing is equation (1) from the Wikipedia page. It's Lemma 2.1, also equation (1) in the paper. It was known in the 1600s and is exponential time. It's in fact slower than simple trial division. There is really little point in implementing this other than just for pure fun.
Dr. Grime knows this isn't really AKS, and mentions "some other fiddly bits" near the end, which are in fact the whole point of AKS. Thousands of people over hundreds of years had looked at that equation: maybe gone about different ways of proving it, maybe related it to things like Pascal's triangle (or Tartaglia's triangle, Yang Hui's triangle, the Khayyam triangle, Pingala's, etc., if you'd prefer). One of the most amazing things about AKS is that these three people looked at the same equation as thousands before them, and managed to show how to reduce the equation in size and number of tests to bounds that grow slowly enough that the time taken is polynomial (which is certainly not true of the original).
It is possible to implement the real AKS test in Java -- it's been done at least once. It takes more work since you need to do some modular polynomial multiplication (modular in both the coefficients and the exponents). More importantly you have to be extremely careful with the various limits and comparisons. Otherwise you get, like many have done, a test that doesn't work correctly. For instance, if you fail to notice, buried in the second paragraph of page 3, "We use log for base 2 logarithm" then there's a good chance you'll do it wrong and never know you've written a really slow "probable prime" test. Unlike ECPP, the result of AKS is either "...compute for hours/days/years... COMPOSITE" or "...compute for hours/days/years... PRIME". No explanation for why. No way to double check the results other than run a different test. ECPP gives you a certificate explaining exactly why the number is prime (albeit ECPP is rather more complicated).
Let's say you do manage to get this written correctly. It's still not useful, as the fastest implementations in C are still slower than other known and in-daily-use primality proof methods. I'm not talking about probable prime algorithms (which are orders of magnitude faster yet), I mean APR-CL and ECPP. Proofs. To use the video's terms, they are "fool-proof tests for primes" which were known and in use before AKS and continue to be used now. The video once again misleads the viewers into thinking this was new and unique.
If you're writing a paper that needs some limit on the computational complexity of primality testing, then AKS is great. "By [AKS2002] primality is in P. Moving on to my interesting point, ..." If you're writing something because it interests you, then have at it. I've done it as have others. It's fun. But it's not all that useful in a practical sense.
Add: If you look at RosettaCode, you can find example code in over 50 different languages for this lemma. Java is one of them.
x is not a Java variable that has an actual value. It's a symbol in a polynomial. Here you want to represent an actual polynomial in Java, not a single value; so, for instance, for x^2 + 3x + 5, you might want to represent this as an array [1, 3, 5], the coefficients of the polynomial (although it's probably better to do it in reverse, [5, 3, 1], with the coefficient of x^0 first--I think this makes the algorithms smoother). So to represent x-1, you'll need an int[] whose values are [1, -1] or [-1, 1]. You could write a multiplication method that multiplies two polynomials; to multiply a0 + a1 x + a2 x^2 + ... by b0 + b1 x + b2 x^2, the coefficient of x^0 in the result is a0 b0; the coefficient of x^1 will be a0 b1 + a1 b0, of x^2 will be a0 b2 + a1 b1 + a2 b0, and so on. Using this approach, you could represent x-1 as a polynomial and then multiply it by itself p times (using your polynomial multiplication method) to get the polynomial (x-1)^p; then you subtract the x^p and x^0 coefficients (to implement the -(x^p-1) part), and see if all the resulting coefficients are divisible by p.
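A sketch of that representation and multiplication routine (class and method names are mine; long coefficients overflow for large p, so this is only illustrative):

```java
public class Poly {
    // Multiply two polynomials given as coefficient arrays, constant term first.
    // result[i + j] accumulates a[i] * b[j], matching the convolution described above.
    public static long[] multiply(long[] a, long[] b) {
        long[] result = new long[a.length + b.length - 1];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < b.length; j++) {
                result[i + j] += a[i] * b[j];
            }
        }
        return result;
    }

    // (x - 1)^p by repeated multiplication, starting from the constant polynomial 1.
    public static long[] xMinusOneToThe(int p) {
        long[] acc = {1};
        long[] xMinusOne = {-1, 1}; // -1 + x, constant term first
        for (int i = 0; i < p; i++) {
            acc = multiply(acc, xMinusOne);
        }
        return acc;
    }
}
```

For example, xMinusOneToThe(2) yields [1, -2, 1], i.e. x^2 - 2x + 1.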
Or, instead of doing all the polynomial math, you could use the fact that the coefficients of (x-1)^p will be the binomial coefficients (also found in Pascal's triangle), with every other sign switched. But the sign doesn't matter when testing for divisibility by p anyway. So you could simply perform the test by computing all the binomial coefficients and seeing if they're divisible by p. See https://en.wikipedia.org/wiki/AKS_primality_test#Concepts which says the same thing. To get all the binomial coefficients, start with 1, then multiply by p and divide by 1, then multiply by p-1 and divide by 2, then multiply by p-2 and divide by 3, etc.
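That running-product test might look like this (a sketch with names of my choosing; BigInteger keeps the coefficients exact, and by symmetry C(p,k) = C(p,p-k), only the first half needs checking):

```java
import java.math.BigInteger;

public class AksLemma {
    // Tests whether p divides every binomial coefficient C(p, k) for 0 < k < p,
    // which by the lemma holds exactly when p is prime.
    public static boolean isPrime(int p) {
        if (p < 2) return false;
        BigInteger bp = BigInteger.valueOf(p);
        BigInteger c = BigInteger.ONE; // running value of C(p, k)
        for (int k = 1; k <= p / 2; k++) {
            // C(p, k) = C(p, k-1) * (p - k + 1) / k  -- the division is always exact
            c = c.multiply(BigInteger.valueOf(p - k + 1))
                 .divide(BigInteger.valueOf(k));
            if (!c.mod(bp).equals(BigInteger.ZERO)) {
                return false;
            }
        }
        return true;
    }
}
```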
Of course, doing this is actually slower than just running through odd numbers and seeing if p is divisible by any of them--the old boring way to check for primality. The Wikipedia article on the AKS test gives an algorithm that actually does appear to be faster, but it involves things like multiplicative orders and Euler's totient function, so it's too complex to explain here.
I have written the following function to implement a type of mutation(creep) in my genetic algorithm project. Since I've used java's inbuilt random generation library, the probability of getting every index is uniform. I've been asked to modify the function such way that it uses binomial distribution instead of uniform. As far as I googled, I couldn't find any example/tutorial that demonstrates conversion of uniform to binomial. How do I achieve it?
double mutationRate = 0.001; // must be double, not int, to hold 0.001

public void mutate_creep() {
    if (random.nextDouble() <= mutationRate) {
        // uniform random generation
        int index = random.nextInt(chromoLen);
        if (index % 2 == 0) { // even index
            chromo[index] += 1;
        } else { // odd index
            chromo[index] -= 1;
        }
    }
}
NOTE: I have already seen the solution at A efficient binomial random number generator code in Java. Since my problem here is specific to creep mutation algorithm, I'm not sure how it can be applied directly.
According to Wikipedia, you do this:
One way to generate random samples from a binomial distribution is to use an inversion algorithm. To do so, one must calculate the probability that P(X=k) for all values k from 0 through n. (These probabilities should sum to a value close to one, in order to encompass the entire sample space.) Then by using a pseudorandom number generator to generate samples uniformly between 0 and 1, one can transform the calculated samples U[0,1] into discrete numbers by using the probabilities calculated in step one.
I will leave it to you to "calculate the probability [...] for all values k from 0 through n". After that, it's a weighted distribution.
You can do that using a TreeMap, similar to how I show it in this answer.
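As an illustrative sketch (class and method names are mine; assumes 0 < p < 1 and n small enough that precomputing all probabilities is cheap): compute each P(X=k) with the recurrence P(k+1) = P(k) · (n−k)/(k+1) · p/(1−p), store the running cumulative sums in a TreeMap, and use ceilingEntry to invert a uniform draw:

```java
import java.util.Random;
import java.util.TreeMap;
import java.util.Map;

public class BinomialInversion {
    private final TreeMap<Double, Integer> cdf = new TreeMap<>();
    private final Random rng = new Random();

    public BinomialInversion(int n, double p) {
        double prob = Math.pow(1 - p, n); // P(X = 0)
        double cumulative = 0.0;
        for (int k = 0; k <= n; k++) {
            cumulative += prob;
            cdf.put(cumulative, k); // key: cumulative probability up to k
            // advance to P(X = k + 1)
            prob *= (double) (n - k) / (k + 1) * p / (1 - p);
        }
    }

    public int sample() {
        // first cumulative probability >= u maps to the sampled count
        double u = rng.nextDouble();
        Map.Entry<Double, Integer> e = cdf.ceilingEntry(u);
        return e != null ? e.getValue() : cdf.lastEntry().getValue();
    }
}
```

You could then draw a binomially distributed index for the creep mutation instead of random.nextInt(chromoLen).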
This is a copy of my post on math.stackexchange.com.
Let E(n) be the set of all possible ending arrangements of a race of n competitors.
Obviously, because it's a race, each one of the n competitors wants to win.
Hence, the order of the arrangements does matter.
Let us also say that if two competitors end with the same result of time, they win the same spot.
For example, E(3) contains the following sets of arrangements:
{(1,1,1), (1,1,2), (1,2,1), (1,2,2), (1,2,3), (1,3,2), (2,1,1), (2,1,2),(2,1,3), (2,2,1), (2,3,1), (3,1,2), (3,2,1)}.
Needless to say, an arrangement like (1,3,3) is invalid, because the two competitors who supposedly ended in third place actually ended in second place. So the above arrangement "transfers" to (1,2,2).
Define k to be the number of distinct positions of the competitors in a subset of E(n).
We have for example:
(1,1,1) -------> k = 1
(1,2,1) -------> k = 2
(1,2,3,2) -------> k = 3
(1,2,1,5,4,4,3) -------> k = 5
Finally, let M(n,k) be the number of subsets of E(n) in which the competitors ended in exactly k distinct positions.
We get, for example, M(3,3) = M(3,2) = 6 and M(3,1) = 1.
-------------------------------------------------------------------------------------------
Thus far is the question
It's a problem I came up with on my own. After some thought, I arrived at the following recursive formula for |E(n)|:
(Don't continue reading if you want to derive a formula yourself!)
|E(n)| = sum from l=1 to n of C(n,l)*|E(n-l)| where |E(0)| = 1
And the code in Java for this function, using the BigInteger class:
public static BigInteger E(int n)
{
    if (!Ens[n].equals(BigInteger.ZERO))
        return Ens[n];
    else
    {
        BigInteger ends = BigInteger.ZERO;
        for (int l = 1; l <= n; l++)
            ends = ends.add(factorials[n].divide(factorials[l].multiply(factorials[n - l])).multiply(E(n - l)));
        Ens[n] = ends;
        return ends;
    }
}
The factorials array holds precalculated factorials for faster binomial coefficient calculations.
The Ens array memoizes/caches the E(n) values, which greatly speeds up the calculation, since certain E(n) values are needed repeatedly.
The logic behind this recurrence relation is that l symbolizes how many "first" spots we have. For each l, the binomial coefficient C(n,l) counts the ways to pick the l first-placers out of the n competitors. Once we have chosen them, we need to figure out in how many ways we can arrange the n-l competitors we have left, which is just |E(n-l)|.
I get the following:
|E(3)| = 13
|E(5)| = 541
|E(10)| = 102247563
|E(100)| mod 1 000 000 007 = 619182829 -------> 20 ms.
And |E(1000)| mod 1 000 000 007 = 581423957 -------> 39 sec.
I figured out that |E(n)| can also be visualized as the number of sets to which the following applies:
For every i = 1, 2, 3 ... n, every i-tuple subset of the original set has GCD (greatest common divisor) of all of its elements equal to 1.
But I'm not 100% sure about this because I was not able to compute this approach for large n.
However, even with precalculating factorials and memoizing the E(n)'s, the calculating times for higher n's grow very fast.
Is anyone capable of verifying the above formula and values?
Can anyone derive a better, faster formula? Perhaps with generating functions?
As for M(n,k)... I'm totally clueless. I absolutely have no idea how to calculate it, and therefore I couldn't post any meaningful data points.
Perhaps it's P(n,k) = n!/(n-k)!.
Can anyone figure out a formula for M(n,k)?
I have no idea which function is harder to compute, E(n) or M(n,k), but help with either of them would be very much appreciated.
I want the solutions to be generic as well as work efficiently even for large n's. Exhaustive search is not what I'm looking for, unfortunately.
What I am looking for is solutions based purely on combinatorial approach and efficient formulas.
I hope I was clear enough with the wording and what I ask for throughout my post. By the way, I can program using Java. I also know Mathematica pretty decently :) .
Thanks a lot in advance,
Matan.
E(n) are the Fubini numbers. M(n, k) = S(n, k) * k!, where S(n, k) is a Stirling number of the second kind, because S(n, k) is the number of different placing partitions, and k! is the number of ways to rank them.
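A sketch that checks these identities against the numbers in the question (class and method names are mine): the triangle recurrence S(n,k) = k·S(n−1,k) + S(n−1,k−1), multiplied through by k!, becomes M(n,k) = k·(M(n−1,k) + M(n−1,k−1)), and summing over k recovers E(n):

```java
import java.math.BigInteger;
import java.util.Arrays;

public class Fubini {
    // m[n][k] = M(n, k) = S(n, k) * k!, the number of ways n competitors
    // can finish in exactly k distinct positions.
    public static BigInteger[][] orderedPartitions(int maxN) {
        BigInteger[][] m = new BigInteger[maxN + 1][maxN + 1];
        for (BigInteger[] row : m) Arrays.fill(row, BigInteger.ZERO);
        m[0][0] = BigInteger.ONE;
        for (int n = 1; n <= maxN; n++) {
            for (int k = 1; k <= n; k++) {
                // M(n,k) = k * (M(n-1,k) + M(n-1,k-1))
                m[n][k] = BigInteger.valueOf(k)
                        .multiply(m[n - 1][k].add(m[n - 1][k - 1]));
            }
        }
        return m;
    }

    // E(n) = sum over k of M(n, k), the Fubini (ordered Bell) numbers.
    public static BigInteger fubini(int n) {
        BigInteger[][] m = orderedPartitions(n);
        BigInteger sum = BigInteger.ZERO;
        for (int k = 0; k <= n; k++) sum = sum.add(m[n][k]);
        return sum;
    }
}
```

This reproduces M(3,1) = 1, M(3,2) = M(3,3) = 6, and |E(3)| = 13, |E(5)| = 541, |E(10)| = 102247563 from the question, in O(n^2) big-integer operations.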
Whilst searching on Google about Genetic Algorithms, I came across OneMax Problem, my search showed that this is one of the very first problem that the Genetic Algorithm was applied to. However, I am not exactly sure what is OneMax problem. Can anyone explain.
Any help is appreciated
The goal of the One-Max problem is to create a binary string of length n where every single gene contains a 1. The fitness function is very simple: you just iterate through the binary string counting the ones. This is what the sum represents in the formula you provided with your post; it is just the number of ones in the string. You could also represent the fitness as a percentage, by dividing the number of ones by n and multiplying by 100. A higher fitness means a higher percentage. Eventually, at some generation, you will get a string of n ones with a fitness of 100%.
double fitness(List<Integer> chromosome) {
    long ones = chromosome.stream().filter(g -> g == 1).count();
    return ones / (chromosome.size() * 0.01); // percentage of ones
}
I need to simulate Poisson wait times. I've found many examples of simulating the number of arrivals, but I need to simulate the wait time for one arrival, given an average wait time.
I keep finding code like this:
public int getPoisson(double lambda)
{
    double L = Math.exp(-lambda);
    double p = 1.0;
    int k = 0;
    do
    {
        k++;
        p *= rand.nextDouble();
    } while (p > L);
    return k - 1;
}
but that is for number of arrivals, not arrival times.
Efficiency is preferred over accuracy, more because of power consumption than time. The language I am working in is Java, and it would be best if the algorithm only used methods available in the Random class, but this is not required.
Time between arrivals follows an exponential distribution, and you can generate a random variable X ~ Exp(lambda) with the formula:
-ln(U)/lambda (where U ~ Uniform[0,1]).
More info on generating exponential variables.
Note that the time between arrivals also matches the time until the first arrival, because the exponential distribution is memoryless.
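As a minimal sketch using only java.util.Random (class and method names are mine; lambda here is the arrival rate, so the average wait is 1/lambda):

```java
import java.util.Random;

public class ExponentialWait {
    // Returns a random wait time drawn from Exp(lambda) via inverse-CDF sampling.
    // nextDouble() is in [0, 1), so 1 - u is in (0, 1] and the log is always defined.
    public static double nextWait(double lambda, Random rng) {
        return -Math.log(1.0 - rng.nextDouble()) / lambda;
    }
}
```

If you are given the average wait time w instead of the rate, pass lambda = 1 / w.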
If you want to simulate earthquakes, or lightning or critters appearing on a screen, the usual method is to assume a Poisson Distribution with an average arrival rate λ.
The easier thing to do is to simulate inter-arrivals:
With a Poisson process, an arrival becomes more and more likely as time passes; this is captured by the cumulative distribution function of the inter-arrival time. The expected value of a Poisson-distributed random variable is equal to λ, and so is its variance.
The simplest way is to 'sample' that cumulative distribution. The inter-arrival time is exponential: the probability that no event has occurred by time t is e^(−λt), and inverting the CDF gives t = −ln(U)/λ. You choose a uniform random number U and plug it into the formula to get the time that should pass before the next event.
Unfortunately, because U usually belongs to [0,1), U can be exactly 0, which causes issues with the log, so it's easier to avoid this by using t = −ln(1−U)/λ.
Sample code can be found at the link below.
https://stackoverflow.com/a/5615564/1650437