I'm having some trouble understanding the following result.
I want to know if the following code is actually correct. It stumps me - but that could be due to me misunderstanding the probability involved.
The code should speak for itself, but to clarify the 'real world' simulation represents 2 people flipping a coin. When you lose you pay 1 dollar, when you win you win a dollar. An even sum game!
private static Random rnd = new Random();

public static void main(String[] args) {
    int i = 0;
    for (int x = 0; x < 1000000; x++) {
        if (rnd.nextBoolean()) i += 1;
        else i -= 1;
    }
    System.out.println(i);
}
When I run this, however, I get huge swings! While I would expect a large sample like this to converge to 0, I'm seeing swings of about ±4000.
Not only that, but increasing the sample size seems to only make the swings bigger.
Am I misusing the random function? :P
I think you're good. The thing to look at is the ratio of the swing to your sample.
4000 out of 1000000 for example is 0.4%
If you increase the sample size, you should expect that ratio to go down: the typical swing only grows like the square root of the number of flips, so the swing divided by the sample size shrinks like 1/sqrt(N).
The results of your experiment should follow a binomial distribution. If the
number of trials is N, and the probability of success p=1/2, then the
number of successes N_success (for large enough N) should have a mean of approximately Np,
and standard deviation sqrt(N*p*(1-p)).
You're actually tracking K = (N_success - N_fail). So N_success = N/2 + K/2.
With 1,000,000 trials and K=4000, we get N_success = 502000. The expected
value is 500000, with standard deviation sqrt(250000) = 500. The difference
between the observed and expected values of N_success is 2000, or about 4 sigma.
That's significant enough to question whether the random number generator is
biased. On the other hand, if you're running this test thousands of times,
you'd expect a few outliers of this magnitude, and you seem to be seeing both
positive and negative values, so in the long run maybe things are OK after all.
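To put rough numbers on that reasoning, here is a small sketch (mine, not from the original post) that plugs the observed swing into the formulas above:

// Sketch: how many standard deviations away is an observed swing of K = 4000?
int trials = 1_000_000;
double p = 0.5;
double mean = trials * p;                         // expected successes: 500000
double sigma = Math.sqrt(trials * p * (1 - p));   // standard deviation: 500
int k = 4000;                                     // observed (wins - losses)
double successes = mean + k / 2.0;                // 502000
System.out.println((successes - mean) / sigma);   // prints 4.0, i.e. about 4 sigma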
You are simulating a one-dimensional random walk. Basically, imagine yourself standing on a line of integers. You begin at point i=0. With equal probability you take a step to the right or the left.
The random walk has a few cool properties and you've touched on my favourite:
Starting at point i=0, as N gets larger, the probability that you will return to that point approaches 1. As you point out - a zero sum game.
However, the expected time it will take you to return there tends to infinity. As you notice, you get some very large swings.
Since the average value is 0 and the variance of N moves is N, you would expect 95% of your simulations to end in the region (-1.96, 1.96) * sqrt(N); for N = 1,000,000 that is roughly ±1960.
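As a quick empirical check of that 95% claim, here is a small sketch (my own, built on the asker's loop) that repeats the walk and counts how many runs end inside that band:

import java.util.Random;

// Sketch: repeat the 1,000,000-step walk and count how many final positions
// fall within +-1.96*sqrt(N); roughly 95% of them should.
public class RandomWalkCheck {
    public static void main(String[] args) {
        Random rnd = new Random();
        int n = 1_000_000, runs = 200, inside = 0;
        double bound = 1.96 * Math.sqrt(n);   // about 1960
        for (int r = 0; r < runs; r++) {
            int i = 0;
            for (int x = 0; x < n; x++) i += rnd.nextBoolean() ? 1 : -1;
            if (Math.abs(i) <= bound) inside++;
        }
        System.out.println(inside + " of " + runs + " runs ended within +-" + (int) bound);
    }
}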
Related
I wanted to make a random number picker in the range 1-50000.
But I want to do it so that the larger the number, the smaller the probability.
With probability something like 1/(2*number), or some other decreasing function.
Can anybody help?
You need a mapping function of some sort. What you get from Random is a few 'primitive' constructs that you can trust do exactly what their javadoc spec says they do:
.nextInt(X), which returns a uniformly random number between 0 and X-1 inclusive (uniform meaning the probability chart is an exact horizontal line).
.nextBoolean() which gives you 1 bit of randomness.
.nextDouble(), giving you a mostly uniform random number between 0.0 and 1.0
.nextGaussian(), which gives you a random number whose probability chart is the standard normal (bell) curve, with standard deviation = 1.0 and midpoint (average) of 0.0.
For the double-returning methods, you run into some trouble if you want exact precision. Computers aren't magical. As a consequence, if you e.g. write this mapping function to turn nextDouble() into a standard uniformly distributed 6-sided die roll, you'd think: int dieRoll = 1 + (int) (rnd.nextDouble() * 6); would do it. Had double been perfect, you'd be right. But they aren't, so, instead, best case scenario, 4 of 6 die faces are going to come up 750599937895083 times, and the other 2 die faces are going to come up 750599937895082 times. It'll be hard to really notice that, but it is provably imperfect. I assume this kind of tiny deviation doesn't matter to you, but, it's good to be aware that anytime you so much as mention double, inherent tiny errors creep into everything and you can't really stop that from happening.
What you need is some sort of mapping function that takes any amount of such randomly provided data (from those primitives, and really only from nextInt/nextBoolean if you want to avoid the errors that double inherently brings) to produce what you want.
For example, imagine instead the 'primitive' I gave you is a uniform random value between 1 and 6, inclusive, i.e.: A standard 6-sided die roll. And I ask you to come up with a uniform algorithm (as in, each value is equally likely) to produce a number between 2 and 12, inclusive.
Perhaps you might think: Easy, just roll 2 dice and add em up. But that would be incorrect: 7 is far more likely than 12.
Instead, you'd roll 1 die and just note whether it was even or odd. Then you roll the second die and that's your result, unless the first die was odd, in which case you add 6 to it. If the first die was even and the second die comes up 1, you start the process over again; eventually you're bound to hit a usable combination.
That'd be uniform random.
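Here is what that two-dice mapping looks like in code (a sketch; rollDie() stands in for the hypothetical 1-6 primitive):

// Sketch: build a uniform 2-12 value out of uniform 1-6 die rolls.
// rollDie() is assumed to return a uniformly random value from 1 to 6.
static int uniform2to12(java.util.Random rnd) {
    while (true) {
        int first = rollDie(rnd);
        int second = rollDie(rnd);
        int result = (first % 2 == 1) ? second + 6 : second;   // uniform over 1..12
        if (result != 1) return result;                        // reject 1, leaving 2..12 uniform
    }
}

static int rollDie(java.util.Random rnd) {
    return 1 + rnd.nextInt(6);
}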
You can apply the same principle to your question. You need a mathematical function that maps the 'horizontal line' of .nextInt() to whatever curve you want. For example, squaring a uniform draw before scaling it up skews results toward the low end. Either way, you're going to have to draw out or write a formula that precisely describes the probability density you want.
Here's an example:
while (true) {
    int v = (int) (50000.0 * Math.abs(r.nextGaussian()));
    if (v >= 1 && v <= 50000) return v;
}
That returns a value from (one half of) a roughly normal distribution, 1 being the most likely and 50000 the least likely.
One simple formula that will give you a very close approximation to what you want is
Random random = new Random();
int result = (int) Math.pow( 50001, random.nextDouble());
That will give a result in the range 1 - 50000, where the probability of each result is approximately proportional to 1 / result, which is what you asked for.
The reason why it works is that the probability of result being any value n within the range is P( n <= 50001^x < n+1) where x is randomly distributed in [0,1). That's the probability that x falls between log(n) and log(n+1), where the logs are base 50001. But that probability is proportional to log (1 + 1/n), which is very close to 1/n.
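If you want to convince yourself of that, here is a small sketch (mine, not part of the answer) that draws a large sample and compares the observed frequency of a few small results against the predicted log(1 + 1/n)/log(50001):

import java.util.Random;

// Sketch: empirically check that P(result == n) is roughly proportional to 1/n.
public class PowDistributionCheck {
    public static void main(String[] args) {
        Random random = new Random();
        int samples = 10_000_000;
        int[] counts = new int[11];   // track results 1..10
        for (int i = 0; i < samples; i++) {
            int result = (int) Math.pow(50001, random.nextDouble());
            if (result <= 10) counts[result]++;
        }
        for (int n = 1; n <= 10; n++) {
            double expected = Math.log(1.0 + 1.0 / n) / Math.log(50001);
            System.out.printf("n=%d observed=%.5f expected=%.5f%n",
                    n, counts[n] / (double) samples, expected);
        }
    }
}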
I am simulating a random point process, based on the Poisson distribution, that contains 1000 dots; they all appear to occupy a small region in the center of the window.
I used Donald Knuth's inverse-sampling algorithm to implement the Poisson-based pseudo-random number generator.
https://en.wikipedia.org/wiki/Poisson_distribution#Generating_Poisson-distributed_random_variables
The lambda value (i.e. the rate) was set to window_dimension/2, and I obtained this result (screenshot):
Code:
public double getPoisson(double lambda) { // lambda = 250
    double L = Math.exp(-lambda);
    double p = 1d;
    int k = 0;
    do {
        k++;
        p *= Math.random();
    } while (p > L);
    return k - 1;
}
It looks to me like the problem is with what you think the output should be, because the program seems to be generating pretty much what you asked for. A Poisson with a rate of 500 will have both its expected value and its variance equal to 500, and for large values of λ it's pretty symmetric and bell-shaped. Taken together, that means the standard deviation is sqrt(500), which is slightly less than 22.4, so you should expect about 95% of your outcomes to be 500±45, which looks like what you're getting.
With your subsequent edit saying (in a comment) that λ=250, the results behave similarly. The likely range of outcomes in each dimension is 250±31, still clustering to the center.
It's easy to confirm my explanation by creating Poisson random variates with a standard deviation such that ±3σ span your plot area.
You need a larger variance/standard deviation to increase the spread of outcomes across your window. To demo this, I went with a Poisson(6400)—which has a standard deviation of 80—and subtracted 6150 to give the result a mean of 250. The overwhelming majority of values will therefore fall between 0 and 500. I generated 1000 independent pairs of values and plotted them using the JMP statistics package, and here are the results:
and just for jollies, here's a plot of independent pairs of Normal(250, 80)'s:
They look pretty darn similar, don't they?
To reiterate, there's nothing wrong with the Poisson algorithm you used. It's doing exactly what you told it to do, even if that's not what you expected the results to look like.
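If you want to try the same demo in code, here is a rough sketch (mine, not the answerer's JMP workflow). Note that the Knuth loop above cannot be used directly with lambda = 6400, because Math.exp(-6400) underflows to 0; the sketch instead builds the large-lambda variate as a sum of 64 independent Poisson(100) draws, which has the same distribution:

// Sketch: spread points across a roughly 0-500 range using Poisson(6400) - 6150 per coordinate.
// getPoisson() is the asker's method above.
public int poissonLargeLambda(double lambda, int pieces) {
    int sum = 0;
    for (int i = 0; i < pieces; i++) {
        sum += (int) getPoisson(lambda / pieces);   // a sum of independent Poissons is Poisson
    }
    return sum;
}

public int[] randomPoint() {
    int x = poissonLargeLambda(6400, 64) - 6150;   // mean 250, standard deviation 80
    int y = poissonLargeLambda(6400, 64) - 6150;
    return new int[] { x, y };
}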
Addendum
Since you don't believe that Poisson converges to Gaussian as lambda grows, here's some direct evidence for your specific case, again generated with JMP:
On the left is a histogram of 1000 randomly generated Poisson(250) values. Note the well-formed bell shape. I had JMP select the best continuous distribution fit based on AIC (Akaike Information Criterion). It selected normality as the best possible fit, with the diagnostics on the right and the resulting density plot in red superimposed on the histogram. The results pretty much speak for themselves.
I am finishing up an algorithms class and the professor wants us to do an exercise in "machine learning". I pretty much have free rein in the project, but it has to fall under the general umbrella of supervised learning. So I decided to do a coin flip simulation with bets being made by a computer player. My idea is to make the coin biased and see if the computer can discover that bias when placing bets by using training data. I'm really not sure how to approach this.
An idea I had was to increment a win counter and pass it as a parameter into a method that determines the bets after a training round of, say, 100 bets. That way the computer player could decide to bet heads more often if, for example, heads came up with 90% probability. After sufficient training I think the computer would discover the bias.
I have code here to increment a win counter depending on the outcome, but I'm not really sure where to go from there, i.e., how to store the data about whether it was a win and whether heads or tails came up, and then use that information to place the bets in the next round. Any advice would be greatly appreciated. I am also open to using a different method if anyone thinks I am going about this the wrong way.
Please keep in mind that I am a second year undergrad and have very little knowledge of and exposure to actual machine learning techniques, hence the quotation marks in my question.
public class CoinFlip {
    public static void main(String[] args) {
        int bank = 500;
        int headsCount = 0;
        int tailsCount = 0;
        int winCounter = 0;
        int lossCounter = 0;
        String[] flipResults = new String[100];
        ComputerPlayer randomPlayer = new ComputerPlayer();
        Coin coin = new Coin();
        for (int i = 0; i < 100; i++) {
            String bet = randomPlayer.placeRandomBet(); // store the bet so the same guess is compared below
            flipResults[i] = coin.flip();
            if (flipResults[i].equals("heads")) {
                headsCount += 1;
            } else {
                tailsCount += 1;
            }
            if (flipResults[i].equals(bet)) {
                bank += 50;
                winCounter += 1;
            } else {
                bank -= 50;
                lossCounter += 1;
            }
        }
    }

    public static class ComputerPlayer {
        double bet;
        String heads = "heads";
        String tails = "tails";

        public String placeRandomBet() {
            bet = Math.random();
            if (bet < .5) {
                return heads;
            }
            return tails;
        }

        public static String placeLearnedBet(int wins, int losses) {
            // not sure where to start
            return null;
        }
    }

    public static class Coin {
        double coin;
        String heads = "heads";
        String tails = "tails";

        public String flip() {
            if (Math.random() < .9) { // biased coin: 90% heads
                return heads;
            } else {
                return tails;
            }
        }
    }
}
As far as finding the bias, that's not really all that difficult. Basically, you would just guess the bias based on the number of heads out of the total number of plays and this ratio would give you the average bias. So for instance if you got 51 heads and 49 tails, then you would guess the bias towards heads is 51%.
Ultimately, I think Benjy is correct that you should bet (heads or tails) based on which is more probable according to your estimated bias. As you get more and more samples, your estimated bias will converge to the actual bias. As far as the betting amount goes, from an expectation standpoint you should bet as much as possible; but from a realistic standpoint this is not a good strategy, since you will eventually lose everything if you always bet everything, unless the coin is 100% biased one way or the other.
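As a concrete starting point for placeLearnedBet, here is a minimal sketch of the "bet the majority side" idea (parameter names are mine; pass in the heads and tails tallies gathered during the training rounds):

// Sketch: estimate the bias from the training tallies and always bet the more likely side.
public static String placeLearnedBet(int headsSeen, int tailsSeen) {
    double estimatedHeadsBias = (double) headsSeen / (headsSeen + tailsSeen);
    return (estimatedHeadsBias >= 0.5) ? "heads" : "tails";
}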
A possible betting strategy could be "forward looking": assume you are going to make the same bet for the next n rounds and calculate the probability that you win versus lose. You aren't directly trying to find the amount to bet; rather, you are trying to find the number n (the number of rounds to look ahead) that gives a certain probability of success. The riskier the betting scheme, the smaller the accepted probability of success; the more conservative the scheme, the larger the accepted probability. Once you find the number of rounds needed to guarantee success (within your selected probability), you can divide your pot by that number of rounds, that is, bet the maximum amount such that even if you lose every one of the next n rounds, you can still make a bet.
You can see that with the above, a conservative scheme requires more and more forward rounds, and thus the bet gets smaller and smaller, whereas a risky scheme accepts a smaller probability of success, requires fewer rounds, and allows a larger bet.
The last interesting thing is to decide how risky a scheme you should choose. Based on what I did, the less biased the coin (i.e. the closer to a 50% chance of heads or tails), the riskier you should be, and the more biased the coin (say a 90% chance of tails or heads), the less risky your betting should be. This makes some sense: if the coin is completely fair, then there is no betting scheme that usually wins, so you're best off just making large bets and taking the 50% chance that you come out on top. On the other hand, if the coin is extremely biased then the game itself isn't very risky, and thus your betting scheme doesn't need to be very risky either. In the latter case, making large wagers increases the chance that you lose everything, versus smaller wagers that all but guarantee a large payoff. Keep in mind that as your pot grows, you can make larger and larger wagers without increasing your risk of losing money, so if the coin is very biased it's quite easy to get essentially exponential growth in your pot.
If you know the coin has a fixed bias, then you should always bet on the majority side. If you do not know the distribution of the coin, then you can use an algorithm called randomized weighted majority (regret minimization): choose heads with probability #heads/#flips. As far as the wagers are concerned, if you know the coin has a fixed bias you can estimate that bias and calculate the variance of your estimate; as the number of flips goes up, the error of the estimate goes down like one over the square root of the number of flips, so your wagers can go up accordingly. IMO the wagers make the problem a little less clear. What is your goal? To maximize your total expected gain? If so, it is not obvious whether you should wager at all until the last coin flip, at which point your variance is lowest, and then wager all your money. That strategy, however, does not minimize your variance, because all your money rides on one wager. Also, if the coin is unbiased, is it better not to wager at all or to wager all your money? In both cases you have zero expected gain, but with very different variances.
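A minimal sketch of the "choose heads with probability #heads/#flips" rule described above (names are mine; the counts are assumed to be running tallies):

// Sketch: randomized bet where P(bet heads) equals the observed fraction of heads so far.
public static String placeRandomizedBet(int headsSeen, int flips) {
    double pHeads = (flips == 0) ? 0.5 : (double) headsSeen / flips;
    return (Math.random() < pHeads) ? "heads" : "tails";
}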
I have written this code to compute the sine of an angle. This works fine for smaller angles, say up to ±360. But with larger angles it starts giving faulty results. (When I say larger, I mean something like within the range ±720 or ±1080.)
In order to get more accurate results I increased the number of times my loop runs. That gave me better results but still that too had its limitations.
So I was wondering: is there a fault in my logic, or do I need to fiddle with the conditional part of my loop? How can I overcome this shortcoming of my code? The built-in Java sine function gives correct results for all the angles I have tested, so where am I going wrong?
Also, can anyone give me an idea of how to modify the condition of my loop so that it runs until I get a desired decimal precision?
import java.util.Scanner;

class SineFunctionManual
{
    public static void main(String a[])
    {
        System.out.print("Enter the angle for which you want to compute sine : ");
        Scanner input = new Scanner(System.in);
        int degreeAngle = input.nextInt(); // Angle in degrees.
        input.close();
        double radianAngle = Math.toRadians(degreeAngle); // Sine computation is done in terms of the radian angle
        System.out.println(radianAngle);

        double sineOfAngle = radianAngle, prevVal = radianAngle; // sineOfAngle contains the running result, prevVal the next term to be added
        //double fractionalPart = 0.1; // This variable is used to check the answer to a certain number of decimal places, as seen in the for loop

        for (int i = 3; i <= 20; i += 2)
        {
            // x^3/3! can be written as ((x^2)/(3*2))*((x^1)/1!), similarly x^5/5! as ((x^2)/(5*4))*((x^3)/3!), and so on.
            // The negative sign is added because successive terms alternate in sign.
            prevVal = (-prevVal) * ((radianAngle * radianAngle) / (i * (i - 1)));
            sineOfAngle += prevVal;
            //int iPart = (int) sineOfAngle;
            //fractionalPart = sineOfAngle - iPart; // Extracting the fractional part to check the number of decimal places.
        }
        System.out.println("The value of sin of " + degreeAngle + " is : " + sineOfAngle);
    }
}
The polynomial approximation for sine diverges widely for large positive and large negative values. Remember, sine varies from -1 to 1 over all real numbers. Polynomials, on the other hand, particularly ones with higher orders, can't do that.
I would recommend using the periodicity of sine to your advantage.
int degreeAngle = input.nextInt() % 360;
This will give accurate answers, even for very, very large angles, without requiring an absurd number of terms.
The further you get from x = 0, the more terms of the Taylor expansion for sin x you need to get within a particular accuracy of the correct answer. You're stopping at the x^19/19! term, which is fine for small angles. If you want better accuracy for large angles, you'll just need to add more terms.
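To address the asker's follow-up about stopping at a desired precision, here is one sketch (an assumption about the intended behaviour, not the answerer's code): keep adding terms until the magnitude of the newest term drops below a tolerance.

// Sketch: sum Taylor terms until the newest term is smaller than the chosen tolerance.
// For large |x| the terms grow before they shrink, so the loop still terminates.
static double taylorSin(double x, double epsilon) {
    double term = x;   // current term x^(2k+1)/(2k+1)!
    double sum = x;
    int i = 3;
    while (Math.abs(term) > epsilon) {
        term = -term * (x * x) / (i * (i - 1));
        sum += term;
        i += 2;
    }
    return sum;
}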
int getnum50()
{
    Random rand = new Random();
    return (1 + rand.nextInt(50));
}
You are given a predefined function named getnum50() which returns an
integer which is one random number from 1-50.
You can call this function as many times as you want but beware
that this function is quite resource intensive.
You cannot use any other random generator. You can NOT change the
definition of getnum50().
Print numbers 1-100 in random order. (Not 100 random numbers)
Note:
i. Every number should be printed exactly once.
ii. There should be no pattern in the numbers listing. The list should be
completely random, i.e., all numbers should have equal probability of
appearing at any place.
iii. You may call getnum50() any number of times to get random numbers
from 1 to 50, but try to make the code optimised.
iv. You cannot use any other random generator function except
getnum50().
I wrote some code which was showing correct output.
import java.util.Random;

public class RandomInteger {
    int number[] = new int[100]; // stores the numbers in random order

    public RandomInteger() {
        int n[] = new int[100];   // marks which random numbers have already been generated
        int off[] = {-1, 0};      // offset to add
        System.out.println("Length of array number is: " + number.length);
        System.out.println("Generating random numbers in the range 1-100:");
        for (int n1 = 0; n1 < number.length; n1++) {
            // first call picks the offset (-1 or 0), second call picks an even number 2-100;
            // together they give a uniform value in 1-100
            int rnd = off[(getnum50() - 1) / 25] + (getnum50() * 2);
            if (n[rnd - 1] == 0) {
                n[rnd - 1] = 1;        // to indicate which random number is generated
                number[n1] = rnd;
                System.out.println(number[n1] + " ");
            } else {
                n1--;                  // already generated: retry this slot
            }
        }
    } // end of constructor

    int getnum50() {
        Random rand = new Random();
        return (1 + rand.nextInt(50));
    }

    public static void main(String args[]) {
        RandomInteger m = new RandomInteger();
    } // end of main()
} // end of class
While it was accepted in that round, in the next round the interviewer told me that getnum50() is a costly method, and that even in the best case I have to call it twice for every number generated, i.e. 200 times for 1-100. In the worst case it would be unbounded, with tens of thousands of calls in the average case. He asked me to optimize the code so as to significantly improve the average case.
I could not answer, so please give me a proper answer to the question: how can I optimize my code?
One stupid optimization would be to just realize that since your randomized source is limited to 1-50, you might as well set TWO array positions per draw, e.g.

rand = getnum50();
n[rand - 1] = 1;
n[rand + 49] = 1;   // the mirror slot 50 positions later

Now the array will be slightly "less" random, because every slot is perfectly correlated with the slot 50 positions after it, but at least you've cut the array-construction time roughly in half.
I think they want you to produce a shuffle algorithm.
In this, you start with an array of exactly 100 numbers (1 through 100, in order), and then you shuffle it by walking through the array and swapping each position with a randomly chosen one of the positions not yet fixed (a Fisher-Yates shuffle).
One such pass leaves the array in completely random order.
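Here is a sketch of what that could look like when the only source of randomness is getnum50() (the helper names are mine): two calls are combined into a uniform 0-99 value, and values outside the needed range are rejected.

// Sketch: one Fisher-Yates pass over 1..100, driven only by getnum50().
static int uniformIndex(int bound) {   // returns a uniform value in 0..bound-1 (bound <= 100)
    while (true) {
        int v;
        if (bound <= 50) {
            v = getnum50() - 1;                               // one call gives uniform 0..49
        } else {
            v = (getnum50() - 1) / 25 + getnum50() * 2 - 2;   // two calls give uniform 0..99
        }
        if (v < bound) return v;                              // reject values past the bound
    }
}

static void printShuffled() {
    int[] a = new int[100];
    for (int i = 0; i < 100; i++) a[i] = i + 1;    // 1..100 in order
    for (int i = 99; i > 0; i--) {                 // one shuffling pass
        int j = uniformIndex(i + 1);               // uniform position in 0..i
        int tmp = a[i]; a[i] = a[j]; a[j] = tmp;   // swap
    }
    for (int v : a) System.out.println(v);
}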
The 50 is a red herring. Use two calls to random50, mod 10. Now you have two digits: tens and ones place. This gives you a random100() function.
The real killer is the generate-and-check approach. Instead, put the numbers 1-100 into an ArrayList, and use your random100 to REMOVE a random index. Now your worst-case scenario has 2n calls to random50. There are a few problems left to solve - overruns - but that's the approach I'd look at.
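A sketch of that approach (helper names are mine): since 1-50 covers every residue mod 10 exactly five times, getnum50() % 10 is uniform over 0-9, and two such digits give a uniform 0-99 index. Overruns past the shrinking list are handled here by rejection, which does cost extra calls as the list gets small.

import java.util.ArrayList;
import java.util.List;

// Sketch: random100() from two base-10 digits, then remove random indices from a list.
static int random100() {
    int tens = getnum50() % 10;   // uniform 0..9
    int ones = getnum50() % 10;   // uniform 0..9
    return tens * 10 + ones;      // uniform 0..99
}

static void printAllInRandomOrder() {
    List<Integer> remaining = new ArrayList<>();
    for (int i = 1; i <= 100; i++) remaining.add(i);
    while (!remaining.isEmpty()) {
        int idx = random100();
        if (idx < remaining.size()) {                 // reject indices past the shrinking list
            System.out.println(remaining.remove(idx));
        }
    }
}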
Your problem is that toward the end of the list you will have to generate lots of random numbers to hit one of the few remaining spots. You could reduce this in a couple of ways; one that fits into your current answer fairly well is as follows:

while (n[rnd - 1] == 1)
{
    rnd++;                    // probe the next slot
    if (rnd > 100) rnd = 1;   // wrap around to the start
}
n[rnd - 1] = 1;   // to indicate which random number is generated
number[n1] = rnd;
System.out.println(number[n1] + " ");
However, if you assume that getnum50 is more expensive than anything you can write, you could reduce the number of getnum50 calls made while filling in the second half of the list. Each time you find a number you can reduce your search space by one, so (using non-primitives):

while (myArrayList.size() > 1)
{
    int rnd;
    if (myArrayList.size() > 50) {
        // two calls give a uniform 0-99 value, folded into the current list size
        rnd = ((getnum50() - 1) / 25 + getnum50() * 2 - 2) % myArrayList.size();
    } else {
        rnd = (getnum50() - 1) % myArrayList.size();   // one call is enough once 50 or fewer remain
    }
    System.out.println(myArrayList.remove(rnd));       // remove by index and print the value
}
System.out.println(myArrayList.remove(0));             // the last remaining value needs no call
In this example your best, average, and worst cases are all 149 getnum50 calls.
The reason you are calling the method getnum50() twice is because of this line:
int rnd = off[(getnum50()-1)/25] + (getnum50()*2);
which seems self-explanatory. And the reason your worst case scenario is so bad is because of this block:
if(n[rnd - 1] == 0){
n[rnd - 1] = 1; //to indicate which random number is generated
number[n1] = rnd;
System.out.println(number[n1] + " ");
}
Depending on how bad your luck is, it could take a very long time to get each value. So, best case, you make your two getnum50() calls, which WILL happen the first time, but as you fill up your array, it becomes increasingly less likely. For 100 numbers, the last number will have a 1% chance of success on the first time, and every time it fails, you make another two calls to getnum50().
Sorry, this doesn't answer HOW to improve your efficiency, but it does explain where the efficiency concerns come from. Hope it helps.
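For a rough sense of scale (my own back-of-the-envelope estimate, not something from the answer above), the retry loop behaves like a coupon-collector process, so the expected total number of getnum50() calls can be computed directly:

// Estimate: when 'remaining' numbers are still missing, each 2-call attempt succeeds with
// probability remaining/100, so it takes about 100/remaining attempts; summing over
// remaining = 100..1 gives roughly 200 * H_100 calls in expectation.
double expectedCalls = 0;
for (int remaining = 1; remaining <= 100; remaining++) {
    expectedCalls += 2.0 * 100 / remaining;
}
System.out.println(expectedCalls);   // about 1037; the worst case remains unbounded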