Simulating Poisson Waiting Times - java

I need to simulate Poisson wait times. I've found many examples of simulating the number of arrivals, but I need to simulate the wait time for one arrival, given an average wait time.
I keep finding code like this:
public int getPoisson(double lambda)
{
double L = Math.exp(-lambda);
double p = 1.0;
int k = 0;
do
{
k++;
p *= rand.nextDouble();
} while (p > L);
return k - 1;
}
but that is for number of arrivals, not arrival times.
Efficiency is preferred to accuracy, more because of power consumption than time. The language I am working in is Java, and it would be best if the algorithm only used methods available in the Random class, but this is not required.

The time between arrivals follows an exponential distribution, and you can generate a random variable X~exp(lambda) with the formula:
-ln(U)/lambda (where U~Uniform[0,1]).
Here lambda is the arrival rate, so for an average wait time m, lambda = 1/m.
More info on generating exponential variable.
Note that the time between arrivals has the same distribution as the time until the first arrival, because the exponential distribution is memoryless.
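As a rough Java sketch of the formula above: the class and method names are made up for illustration, and it assumes you are given the average wait rather than the rate, so lambda = 1/meanWait. It uses 1 - U instead of U so the argument of the log can never be exactly 0.

```java
import java.util.Random;

/** Sketch: sample one waiting time from an exponential distribution. */
final class WaitTimeSketch {

    /**
     * Returns a waiting time whose expected value is meanWait.
     * The rate is lambda = 1 / meanWait, so -ln(U) / lambda = -meanWait * ln(U).
     */
    static double nextWaitTime(Random rng, double meanWait) {
        // nextDouble() is in [0, 1), so 1 - U is in (0, 1] and the log is always finite.
        double u = 1.0 - rng.nextDouble();
        return -meanWait * Math.log(u);
    }

    public static void main(String[] args) {
        Random rng = new Random();
        double meanWait = 5.0; // hypothetical average wait, in whatever time unit you use
        System.out.println("Next arrival in " + nextWaitTime(rng, meanWait));
    }
}
```

This costs one nextDouble() call and one Math.log() call per sample, which should suit the power-consumption concern in the question.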

If you want to simulate earthquakes, or lightning or critters appearing on a screen, the usual method is to assume a Poisson Distribution with an average arrival rate λ.
The easier thing to do is to simulate inter-arrivals:
In a Poisson process, the probability that the next arrival has already happened grows as time passes: it is the cumulative distribution function of the inter-arrival time, F(t) = 1 - e^(-λt). The number of arrivals per unit of time is Poisson-distributed, and its expected value is equal to λ and so is its variance.
The simplest way is to 'sample' the cumulative distribution. The probability that no arrival has occurred by time t has the exponential form e^(-λt); setting that equal to U and solving for t gives t = -ln(U)/λ. You choose a uniform random number U and plug it into the formula to get the time that should pass before the next event.
Unfortunately, U usually belongs to [0,1), so it can be exactly 0, which would cause issues with the log. It's easier to avoid that by using t = -ln(1-U)/λ, since 1-U lies in (0,1].
Sample code can be found at the link below.
https://stackoverflow.com/a/5615564/1650437
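For the earthquake/critter case, here is a small sketch of how the inter-arrival formula can be chained into absolute arrival times. This is not the code from the linked answer; ArrivalTimesSketch and arrivalTimes are names invented here, and lambda is assumed to be the average number of arrivals per unit of time.

```java
import java.util.Random;

/** Sketch: arrival times of a Poisson process built from exponential gaps. */
final class ArrivalTimesSketch {

    /** Returns the first n arrival times for arrival rate lambda (arrivals per unit time). */
    static double[] arrivalTimes(Random rng, double lambda, int n) {
        double[] times = new double[n];
        double t = 0.0;
        for (int i = 0; i < n; i++) {
            // 1 - U lies in (0, 1], so the log never sees 0.
            t += -Math.log(1.0 - rng.nextDouble()) / lambda;
            times[i] = t; // each arrival is the previous one plus a fresh exponential gap
        }
        return times;
    }

    public static void main(String[] args) {
        double[] times = arrivalTimes(new Random(), 2.0, 5);
        for (double time : times) {
            System.out.println(time);
        }
    }
}
```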

Related

Statistical analysis of distributed data values in Java

I am writing a program in Java that outputs a List<Double> of distances that roughly follow a bell curve distribution. From this data, I need to generate two values A and B that lie a particular number of standard deviations from the mean X, one above the mean and one below it. The distribution may not be symmetrical, but I am content to assume that it is for my purposes. These values A and B would be better than my current method of taking the min and max of the dataset, which is very vulnerable to being skewed by random outliers, and so is not always representative of a specific probability from the distribution. How would I generate these values, A and B? Should I be asking this in the Stats stack exchange? Any help is greatly appreciated!
Should I be asking this in the Stats stack exchange?
Nah, we can do it here!
The Statistics
First off, we need to establish what we want to do. A and B are the values on opposite sides of the mean, a particular number of standard deviations away from it.
Recall that the standard deviation is simply the square root of the variance.
The variance is calculated by sum((x[i] - mean)^2) / x.length
Thus, we also need the mean, which is sum(x[i]) / x.length
With the standard deviation calculated, multiply it by the number of standard deviations you want (1, for example) to get the distance from the mean to B, so B is that value plus the mean. Use a negative multiplier for A (if that's the one below the mean).
The code
So, we have established that the data type for the statistical data is a List, so I will adapt it to the use of Lists.
First we need to loop over the list of data, let's call that List x. And I'm assuming it is already populated with data.
We also need some variables, let's define the mean: double mean, the standard deviation: double stdev and two helper variables to keep the sums: double sqr_sum and double data_sum.
Now, we will compute the mean first:
for (int i = 0; i < x.size(); i++){
data_sum += x.get(i); // x is a List, so use get(i) rather than array indexing
}
mean = data_sum / x.size();
Finally, we should have everything to begin calculating the sum of squares, and eventually the variance! I will also define another variable "variance" (data_var) here to make it easier.
for (int i = 0; i < x.size(); i++){
sqr_sum += Math.pow(x.get(i) - mean, 2);
}
data_var = sqr_sum / x.size(); // Note, in statistics, depending on what data this is, you should use x.size() for populations, but x.size()-1 for sample data.
stdev = Math.sqrt(data_var);
... and there you have it! The standard deviation of the x data.
If you want to get B (or A), you could simply use:
double dev_A = -1; // How far from the mean we want A to be.
double dev_B = 1; // How far from the mean we want B to be.
double a = dev_A * stdev + mean;
double b = dev_B * stdev + mean;
Hope this helps!
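Putting the pieces above together, a self-contained sketch might look like the following. The class name SpreadSketch, the helper names and the sample data are all made up for illustration, and the list is assumed to be non-empty.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch: values a given number of standard deviations either side of the mean. */
final class SpreadSketch {

    /** Arithmetic mean of a non-empty list. */
    static double mean(List<Double> x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.size();
    }

    /** Standard deviation; pass sample = true to divide by n - 1 instead of n. */
    static double stdev(List<Double> x, boolean sample) {
        double m = mean(x);
        double sqrSum = 0.0;
        for (double v : x) {
            sqrSum += (v - m) * (v - m);
        }
        int divisor = sample ? x.size() - 1 : x.size();
        return Math.sqrt(sqrSum / divisor);
    }

    /** Returns {A, B}: the values k population standard deviations below and above the mean. */
    static double[] bounds(List<Double> x, double k) {
        double m = mean(x);
        double sd = stdev(x, false);
        return new double[] { m - k * sd, m + k * sd };
    }

    public static void main(String[] args) {
        // Hypothetical data: mean 5, population standard deviation 2.
        List<Double> distances = Arrays.asList(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0);
        double[] ab = bounds(distances, 1.0);
        System.out.println("A = " + ab[0] + ", B = " + ab[1]); // prints A = 3.0, B = 7.0
    }
}
```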

Java: Poisson-based point process constrained in a small region

While simulating a random point process based on a Poisson distribution with 1000 dots, I found that they all appear to occupy a small region in the center of the window.
I used Donald Knuth's inverse sampling algorithm to implement the Poisson-based pseudo-random number generator.
https://en.wikipedia.org/wiki/Poisson_distribution#Generating_Poisson-distributed_random_variables
The lambda value (aka the success rate) was set to window_dimension/2, and I obtained this result (screenshot)
Code:
public double getPoisson(double lambda) {//250
double L = Math.exp(-lambda);
double p = 1d;
int k = 0;
do {
k++;
p *= Math.random();
} while (p > L);
return k-1;
}
It looks to me like the problem is with what you think the output should be, because the program seems to be generating pretty much what you asked. A Poisson with a rate of 500 will have both its expected value and its variance equal to 500, and for large values of λ it's pretty symmetric and bell-shaped. Taken together that all means the standard deviation is sqrt(500), which is slightly less than 22.4, so you should expect about 95% of your outcomes to be 500±45, which looks like what you're getting.
With your subsequent edit saying (in a comment) that λ=250, the results behave similarly. The likely range of outcomes in each dimension is 250±31, still clustering to the center.
It's easy to confirm my explanation by creating Poisson random variates with a standard deviation such that ±3σ span your plot area.
You need a larger variance/standard deviation to increase the spread of outcomes across your window. To demo this, I went with a Poisson(6400)—which has a standard deviation of 80—and subtracted 6150 to give the result a mean of 250. The overwhelming majority of values will therefore fall between 0 and 500. I generated 1000 independent pairs of values and plotted them using the JMP statistics package, and here are the results:
and just for jollies, here's a plot of independent pairs of Normal(250, 80)'s:
They look pretty darn similar, don't they?
To reiterate, there's nothing wrong with the Poisson algorithm you used. It's doing exactly what you told it to do, even if that's not what you expected the results to look like.
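If you'd rather check the spread numerically than by eye, here is a rough sketch. It reuses the same multiplication-based generator as the question; PoissonSpreadSketch and meanAndStdev are names invented for this example. For λ=250 the sample mean should come out near 250 and the sample standard deviation near sqrt(250), about 15.8.

```java
import java.util.Random;

/** Sketch: measure the mean and spread of Poisson variates. */
final class PoissonSpreadSketch {

    /** Multiplication-based Poisson generator, as in the question. */
    static int poisson(Random rng, double lambda) {
        // Caveat: Math.exp(-lambda) underflows to 0 once lambda exceeds roughly 745,
        // so this generator is only usable for moderate lambda.
        double limit = Math.exp(-lambda);
        double p = 1.0;
        int k = 0;
        do {
            k++;
            p *= rng.nextDouble();
        } while (p > limit);
        return k - 1;
    }

    /** Returns {sample mean, sample standard deviation} of n draws. */
    static double[] meanAndStdev(Random rng, double lambda, int n) {
        double sum = 0.0;
        double sumSq = 0.0;
        for (int i = 0; i < n; i++) {
            int v = poisson(rng, lambda);
            sum += v;
            sumSq += (double) v * v;
        }
        double mean = sum / n;
        double variance = (sumSq - n * mean * mean) / (n - 1);
        return new double[] { mean, Math.sqrt(variance) };
    }

    public static void main(String[] args) {
        double[] stats = meanAndStdev(new Random(), 250.0, 20000);
        System.out.println("mean = " + stats[0] + ", stdev = " + stats[1]);
    }
}
```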
Addendum
Since you don't believe that Poisson converges to Gaussian as lambda grows, here's some direct evidence for your specific case, again generated with JMP:
On the left is a histogram of 1000 randomly generated Poisson(250) values. Note the well-formed bell shape. I had JMP select the best continuous distribution fit based on AIC (Akaike Information Criterion). It selected normality as the best possible fit, with the diagnostics on the right and the resulting density plot in red superimposed on the histogram. The results pretty much speak for themselves.

First random number after setSeed in Java always similar

To give some context, I have been writing a basic Perlin noise implementation in Java, and when it came to implementing seeding, I had encountered a bug that I couldn't explain.
In order to generate the same random weight vectors each time for the same seed no matter which set of coordinates' noise level is queried and in what order, I generated a new seed (newSeed), based on a combination of the original seed and the coordinates of the weight vector, and used this as the seed for the randomization of the weight vector by running:
rnd.setSeed(newSeed);
weight = new NVector(2);
weight.setElement(0, rnd.nextDouble() * 2 - 1);
weight.setElement(1, rnd.nextDouble() * 2 - 1);
weight.normalize();
Where NVector is a self-made class for vector mathematics.
However, when run, the program generated very bad noise:
After some digging, I found that the first element of each vector was very similar (and so the first nextDouble() call after each setSeed() call) resulting in the first element of every vector in the vector grid being similar.
This can be proved by running:
long seed = Long.valueOf(args[0]);
int loops = Integer.valueOf(args[1]);
double avgFirst = 0.0, avgSecond = 0.0, avgThird = 0.0;
double lastfirst = 0.0, lastSecond = 0.0, lastThird = 0.0;
for(int i = 0; i<loops; i++)
{
ran.setSeed(seed + i);
double first = ran.nextDouble();
double second = ran.nextDouble();
double third = ran.nextDouble();
avgFirst += Math.abs(first - lastfirst);
avgSecond += Math.abs(second - lastSecond);
avgThird += Math.abs(third - lastThird);
lastfirst = first;
lastSecond = second;
lastThird = third;
}
System.out.println("Average first difference.: " + avgFirst/loops);
System.out.println("Average second Difference: " + avgSecond/loops);
System.out.println("Average third Difference.: " + avgSecond/loops);
Which finds the average difference between the first, second and third random numbers generated after a setSeed() method has been called over a range of seeds as specified by the program's arguments; which for me returned these results:
C:\java Test 462454356345 10000
Average first difference.: 7.44638117976783E-4
Average second Difference: 0.34131692827329957
Average third Difference.: 0.34131692827329957
C:\java Test 46245445 10000
Average first difference.: 0.0017196011123287126
Average second Difference: 0.3416750057190849
Average third Difference.: 0.3416750057190849
C:\java Test 1 10000
Average first difference.: 0.0021601598225344998
Average second Difference: 0.3409914232342002
Average third Difference.: 0.3409914232342002
Here you can see that the first average difference is significantly smaller than the rest, and seemingly decreasing with higher seeds.
As such, by adding a simple dummy call to nextDouble() before setting the weight vector, I was able to fix my perlin noise implementation:
rnd.setSeed(newSeed);
rnd.nextDouble();
weight.setElement(0, rnd.nextDouble() * 2 - 1);
weight.setElement(1, rnd.nextDouble() * 2 - 1);
Resulting in:
I would like to know why this bad variation in the first call to nextDouble() (I have not checked other types of randomness) occurs and/or to alert people to this issue.
Of course, it could just be an implementation error on my behalf, which I would be grateful to have pointed out to me.
The Random class is designed to be a low overhead source of pseudo-random numbers. But the consequence of the "low overhead" implementation is that the number stream has properties that are a long way off perfect ... from a statistical perspective. You have encountered one of the imperfections. Random is documented as being a Linear Congruential generator, and the properties of such generators are well known.
There are a variety of ways of dealing with this. For example, if you are careful you can hide some of the most obvious "poor" characteristics. (But you would be advised to run some statistical tests. You can't see non-randomness in the noise added to your second image, but it could still be there.)
Alternatively, if you want pseudo-random numbers that have guaranteed good statistical properties, then you should be using SecureRandom instead of Random. It has significantly higher overheads, but you can be assured that many "smart people" will have spent a lot of time on the design, testing and analysis of the algorithms.
Finally, it is relatively simple to create a subclass of Random that uses an alternative algorithm for generating the numbers; see link. The problem is that you have to select (or design) and implement an appropriate algorithm.
Calling this an "issue" is debatable. It is a well known and understood property of LCGs, and use of LCGs was a conscious engineering choice. People want low overhead PRNGs, but low overhead PRNGs have poor properties. TANSTAAFL.
Certainly, this is not something that Oracle would contemplate changing in Random. Indeed, the reasons for not changing are stated clearly in the javadoc for the Random class.
"In order to guarantee this property, particular algorithms are specified for the class Random. Java implementations must use all the algorithms shown here for the class Random, for the sake of absolute portability of Java code."
This is a known issue. Similar seeds will generate similar first few values. Random wasn't really designed to be used this way. You are supposed to create an instance with a good seed and then generate a moderately sized sequence of "random" numbers.
Your current solution is ok - as long as it looks good and is fast enough. You can also consider using hashing/mixing functions which were designed to solve your problem (and then, optionally, using the output as seed). For example see: Parametric Random Function For 2D Noise Generation
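To give an idea of what such a mixing step could look like, here is a hedged sketch. It scrambles each seed with the bit-mixing finalizer commonly associated with SplitMix64 before handing it to setSeed(). That choice of mixer is an assumption of this sketch, not something the linked post prescribes, and SeedMixSketch, mix and avgFirstDifference are invented names. The second method repeats the measurement from the question so you can compare raw and mixed seeds.

```java
import java.util.Random;

/** Sketch: scramble nearby seeds before passing them to Random.setSeed(). */
final class SeedMixSketch {

    /** SplitMix64-style finalizer: spreads small input differences across all 64 bits. */
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /** Average |difference| between the first nextDouble() of consecutive seeds. */
    static double avgFirstDifference(long seed, int loops, boolean mixSeed) {
        Random rnd = new Random();
        double total = 0.0;
        double last = 0.0;
        for (int i = 0; i < loops; i++) {
            long s = seed + i;
            rnd.setSeed(mixSeed ? mix(s) : s);
            double first = rnd.nextDouble();
            if (i > 0) {
                total += Math.abs(first - last); // skip the first pass: nothing to compare with yet
            }
            last = first;
        }
        return total / (loops - 1);
    }

    public static void main(String[] args) {
        System.out.println("raw seeds:   " + avgFirstDifference(1L, 10000, false));
        System.out.println("mixed seeds: " + avgFirstDifference(1L, 10000, true));
    }
}
```

For two independent uniform values the expected absolute difference is 1/3, so the mixed figure should land close to the second and third differences reported in the question.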
Move your setSeed out of the loop. Java's PRNG is a linear congruential generator, so seeding it with sequential values is guaranteed to give results that are correlated across iterations of the loop.
ADDENDUM
I dashed that off before running out the door to a meeting, and now have time to illustrate what I was saying above.
I've written a little Ruby script which implements Schrage's portable prime modulus multiplicative linear congruential generator. I instantiate two copies of the LCG, both seeded with a value of 1. However, in each iteration of the output loop I reseed the second one based on the loop index. Here's the code:
# Implementation of a Linear Congruential Generator (LCG)
class LCG
attr_reader :state
M = (1 << 31) - 1 # Modulus = 2**31 - 1, which is prime
# constructor requires setting a seed value to use as initial state
def initialize(seed)
reseed(seed)
end
# users can explicitly reset the seed.
def reseed(seed)
@state = seed.to_i
end
# Schrage's portable prime modulus multiplicative LCG
def value
@state = 16807 * @state % M
# return the generated integer value AND its U(0,1) mapping as an array
[@state, @state.to_f / M]
end
end
if __FILE__ == $0
# create two instances of LCG, both initially seeded with 1
mylcg1 = LCG.new(1)
mylcg2 = LCG.new(1)
puts " default progression manual reseeding"
10.times do |n|
mylcg2.reseed(1 + n) # explicitly reseed 2nd LCG based on loop index
printf "%d %11d %f %11d %f\n", n, *mylcg1.value, *mylcg2.value
end
end
and here's the output it produces:
default progression manual reseeding
0 16807 0.000008 16807 0.000008
1 282475249 0.131538 33614 0.000016
2 1622650073 0.755605 50421 0.000023
3 984943658 0.458650 67228 0.000031
4 1144108930 0.532767 84035 0.000039
5 470211272 0.218959 100842 0.000047
6 101027544 0.047045 117649 0.000055
7 1457850878 0.678865 134456 0.000063
8 1458777923 0.679296 151263 0.000070
9 2007237709 0.934693 168070 0.000078
The columns are iteration number followed by the underlying integer generated by the LCG and the result when scaled to the range (0,1). The left set of columns show the natural progression of the LCG when allowed to proceed on its own, while the right set show what happens when you reseed on each iteration.

What could be the error in this Java program to compute sine?

I have written this code to compute the sine of an angle. This works fine for smaller angles, say up to +-360. But with larger angles it starts giving faulty results. (When I say larger, I mean something like within the range +-720 or +-1080)
In order to get more accurate results I increased the number of times my loop runs. That gave me better results but still that too had its limitations.
So I was wondering if there is any fault in my logic or do I need to fiddle with the conditional part of my loop? How can I overcome this shortcoming of my code? The inbuilt java sine function gives correct results for all the angles I have tested..so where am I going wrong?
Also can anyone give me an idea as to how do I modify the condition of my loop so that it runs until I get a desired decimal precision?
import java.util.Scanner;
class SineFunctionManual
{
public static void main(String a[])
{
System.out.print("Enter the angle for which you want to compute sine : ");
Scanner input = new Scanner(System.in);
int degreeAngle = input.nextInt(); //Angle in degree.
input.close();
double radianAngle = Math.toRadians(degreeAngle); //Sine computation is done in terms of radian angle
System.out.println(radianAngle);
double sineOfAngle = radianAngle,prevVal = radianAngle; //SineofAngle contains actual result, prevVal contains the next term to be added
//double fractionalPart = 0.1; // This variable is used to check the answer to a certain number of decimal places, as seen in the for loop
for(int i=3;i<=20;i+=2)
{
prevVal = (-prevVal)*((radianAngle*radianAngle)/(i*(i-1))); //x^3/3! can be written as ((x^2)/(3*2))*((x^1)/1!), similarly x^5/5! can be written as ((x^2)/(5*4))*((x^3)/3!) and so on. The negative sign is added because each successive term has alternate sign.
sineOfAngle+=prevVal;
//int iPart = (int)sineOfAngle;
//fractionalPart = sineOfAngle - iPart; //Extracting the fractional part to check the number of decimal places.
}
System.out.println("The value of sin of "+degreeAngle+" is : "+sineOfAngle);
}
}
The polynomial approximation for sine diverges widely for large positive and large negative values. Remember, sine varies from -1 to 1 over all real numbers. Polynomials, on the other hand, particularly ones with higher orders, can't do that.
I would recommend using the periodicity of sine to your advantage.
int degreeAngle = input.nextInt() % 360;
This will give accurate answers, even for very, very large angles, without requiring an absurd number of terms.
The further you get from x=0, the more terms you need, of the Taylor expansion for sin x, to get within a particular accuracy of the correct answer. Your loop stops at the x^19 term (ten terms in all), which is fine for small angles. If you want better accuracy for large angles, you'll just need to add more terms.
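Both suggestions can be combined, which also covers the question about running the loop until a desired precision is reached. The following is only a sketch: SineSketch, sinDegrees and the tolerance parameter are names chosen here. It first reduces the angle using periodicity, then keeps adding terms until the last one added is no larger than the tolerance.

```java
/** Sketch: Taylor-series sine with range reduction and a precision-based stop. */
final class SineSketch {

    static double sinDegrees(double degrees, double tolerance) {
        // Sine has a period of 360 degrees, so reduce to (-360, 360) first.
        double reduced = degrees % 360.0;
        // Fold into [-180, 180] so the series needs fewer terms.
        if (reduced > 180.0) {
            reduced -= 360.0;
        } else if (reduced < -180.0) {
            reduced += 360.0;
        }

        double x = Math.toRadians(reduced);
        double term = x; // current term, starting with x^1 / 1!
        double sum = x;
        // Each new term is the previous one times -x^2 / (i * (i - 1)).
        for (int i = 3; Math.abs(term) > tolerance; i += 2) {
            term = -term * x * x / (i * (i - 1));
            sum += term;
        }
        return sum;
    }

    public static void main(String[] args) {
        double angle = 1110.0; // hypothetical input: 3 full turns plus 30 degrees
        System.out.println(sinDegrees(angle, 1e-12));
        System.out.println(Math.sin(Math.toRadians(angle))); // built-in value for comparison
    }
}
```

Because the reduced angle never exceeds pi in magnitude, the terms start shrinking after the first few, so the loop ends after a modest number of iterations even for a tight tolerance.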

probability and programming simulation

I'm having some trouble understanding the following result.
I want to know if the following code is actually correct. It stumps me - but that could be due to me misunderstanding the probability involved.
The code should speak for itself, but to clarify the 'real world' simulation represents 2 people flipping a coin. When you lose you pay 1 dollar, when you win you win a dollar. An even sum game!
private static Random rnd = new Random();
public static void main(String[] args) {
int i=0;
for (int x = 0; x<1000000; x++) {
if (rnd.nextBoolean()) i+=1;
else i-=1;
}
System.out.println(i);
}
When I run this however I get huge swings! Whilst I would expect a large sample like this to converge to 0, I'm seeing +-4000
Not only that but increasing the sample size seems to only make the swings higher.
Am I misusing the random function ? :P
I think you're good. The thing to look at is the ratio of the swing to your sample.
4000 out of 1000000 for example is 0.4%
If you increase the sample size, you should expect that ratio to go down.
The results of your experiment should follow a binomial distribution. If the
number of trials is N, and the probability of success p=1/2, then the
number of successes N_success (for large enough N) should have a mean of approximately Np,
and standard deviation sqrt(N*p*(1-p)).
You're actually tracking K = (N_success - N_fail). So N_success = N/2 + K/2.
With 1,000,000 trials and K=4000, we get N_success = 502000. The expected
value is 500000, with standard deviation sqrt(250000) = 500. The difference
between the observed and expected values of N_success is 2000, or about 4 sigma.
That's significant enough to question whether the random number generator is
biased. On the other hand, if you're running this test thousands of times,
you'd expect a few outliers of this magnitude, and you seem to be seeing both
positive and negative values, so in the long run maybe things are OK after all.
You are simulating a one-dimensional random walk. Basically, imagine yourself standing on a line of integers. You begin at point i=0. With equal probability you take a step to the right or the left.
The random walk has a few cool properties and you've touched on my favourite:
Starting at point i=0, as N gets larger, the probability that you will return to that point approaches 1. As you point out - a zero sum game.
However, the expected time it will take you to return there tends to infinity. As you notice, you get some very large swings.
Since the average value should be 0 and the variance of N moves is N, then you would expect 95% of your simulations to end in the region: (- 1.96, 1.96)*N^(0.5).
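You can check that last claim empirically with a small sketch like the one below. RandomWalkSketch and its method names are made up here, and the run counts are arbitrary. It repeats the coin-flip game many times and reports what fraction of final totals falls inside ±1.96*sqrt(N).

```java
import java.util.Random;

/** Sketch: how often does a coin-flip walk end within 1.96 * sqrt(N) of zero? */
final class RandomWalkSketch {

    /** Final position after the given number of +1/-1 steps. */
    static int walk(Random rnd, int steps) {
        int position = 0;
        for (int i = 0; i < steps; i++) {
            position += rnd.nextBoolean() ? 1 : -1;
        }
        return position;
    }

    /** Fraction of runs whose final position lies within +-1.96 * sqrt(steps). */
    static double fractionWithinBand(Random rnd, int steps, int runs) {
        double band = 1.96 * Math.sqrt(steps);
        int inside = 0;
        for (int r = 0; r < runs; r++) {
            if (Math.abs(walk(rnd, steps)) <= band) {
                inside++;
            }
        }
        return (double) inside / runs;
    }

    public static void main(String[] args) {
        // With 1,000,000 steps the band is +-1960, so swings of a few thousand are not unusual.
        System.out.println(fractionWithinBand(new Random(), 1000000, 200));
    }
}
```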
