Wondering why would one calculate the median this way? - java

I was wondering what the reason might be to use this median function, instead of just calculating min + (max - min) / 2:
// used by the random number generator
private static final double M_E12 = 162754.79141900392083592475;

/**
 * Return an estimate of the median of n values distributed in [min,max)
 * @param min the minimum value
 * @param max the maximum value
 * @param n   the number of values
 * @return an estimate of the median of n values distributed in [min,max)
 **/
private static double median(double min, double max, int n)
{
    // get random value in [0.0, 1.0)
    double t = (new Random()).nextDouble();
    double retval;
    if (t > 0.5) {
        retval = java.lang.Math.log(1.0 - (2.0 * (M_E12 - 1) * (t - 0.5) / M_E12)) / 12.0;
    } else {
        retval = -java.lang.Math.log(1.0 - (2.0 * (M_E12 - 1) * t / M_E12)) / 12.0;
    }
    // We now have something distributed on (-1.0, 1.0)
    retval = (retval + 1.0) * (max - min) / 2.0;
    retval = retval + min;
    return retval;
}
The only downside of my approach would maybe be its deterministic nature, I'd say?
The whole code can be found here, http://www.koders.com/java/fid42BB059926626852A0D146D54F7D66D7D2D5A28D.aspx?s=cdef%3atree#L8, btw.
Thanks

[trying to cover a range here because it's not clear to me what you're not understanding]
first, the median is the middle value. the median of [0,0,1,99,99] is 1.
and so we can see that the code given is not calculating the median (it's not finding a middle value). instead, it's estimating it from some theoretical distribution. as the comment says.
the formula you give is for the mid-point. if many values are uniformly distributed between min and max then yes, that is a good estimate of the median. in this case (presumably) the values are not distributed in that way and so some other method is necessary.
you can see why this could be necessary by calculating the mid point of the numbers above - your formula would give 49.5.
the reason for using an estimate is probably that it is much faster than finding the median. the reason for making that estimate random is likely to avoid a bad worst case on multiple calls.
and finally, sorry but i don't know what the distribution is in this case. you probably need to search for the data structure and/or author name to see if you can find a paper or book reference (i thought it might be assuming a power law, but see edit below - it seems to be adding a very small correction) (i'm not sure if that is what you are asking, or if you are more generally confused).
[edit] looking some more, i think the log(...) is giving a central bias to the uniformly random t. so it's basically doing what you suggest, but with some spread around the 0.5. here's a plot of one case which shows that retval is actually a pretty small adjustment.
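since the plot isn't reproduced here, a small self-contained sketch (mine, not from the original post) that samples the transform makes the same point numerically:

import java.util.Random;

public class MedianBiasDemo {
    private static final double M_E12 = 162754.79141900392083592475; // = e^12

    public static void main(String[] args) {
        Random rnd = new Random();
        for (int i = 0; i < 10; i++) {
            double t = rnd.nextDouble();
            double r = (t > 0.5)
                    ? Math.log(1.0 - (2.0 * (M_E12 - 1) * (t - 0.5) / M_E12)) / 12.0
                    : -Math.log(1.0 - (2.0 * (M_E12 - 1) * t / M_E12)) / 12.0;
            // r lies in (-1, 1) but is concentrated near 0: its median
            // magnitude is ln(2)/12, about 0.058, so the final estimate
            // rarely strays far from the midpoint of [min, max)
            System.out.printf("t = %.4f  ->  r = %+.4f%n", t, r);
        }
    }
}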

I can't tell you what this code is attempting to achieve; for a start it doesn't even use n!
But from the looks of it, it's simply generating some sort of exponentially-distributed random value in the range [min,max]. See http://en.wikipedia.org/wiki/Exponential_distribution#Generating_exponential_variates.
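For reference, the inversion recipe described on that page is essentially a one-liner; here is a sketch (rnd and lambda are assumed to be defined):

double u = rnd.nextDouble();            // uniform on [0.0, 1.0)
double x = -Math.log(1.0 - u) / lambda; // exponential with rate lambda

Incidentally, the magic constant is e^12, which matches both the field name M_E12 and the divisions by 12.0 in the posted code.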
Interestingly, Googling for that magic number brings up lots of relevant hits, none of which are illuminating: http://www.google.co.uk/search?q=162754.79141900392083592475.

Related

Convert uniform random generation to binomial

I have written the following function to implement a type of mutation (creep) in my genetic algorithm project. Since I've used Java's built-in random generation library, the probability of picking each index is uniform. I've been asked to modify the function so that it uses a binomial distribution instead of a uniform one. As far as I've googled, I couldn't find any example/tutorial that demonstrates converting uniform to binomial. How do I achieve it?
double mutationRate = 0.001; // must be a double; an int cannot hold 0.001

public void mutate_creep() {
    if (random.nextDouble() <= mutationRate) {
        // uniform random generation
        int index = random.nextInt(chromoLen);
        if (index % 2 == 0) { // even index
            chromo[index] += 1;
        } else { // odd index
            chromo[index] -= 1;
        }
    }
}
NOTE: I have already seen the solution at A efficient binomial random number generator code in Java. Since my problem here is specific to creep mutation algorithm, I'm not sure how it can be applied directly.
According to Wikipedia, you do this:
One way to generate random samples from a binomial distribution is to use an inversion algorithm. To do so, one must calculate the probability that P(X=k) for all values k from 0 through n. (These probabilities should sum to a value close to one, in order to encompass the entire sample space.) Then by using a pseudorandom number generator to generate samples uniformly between 0 and 1, one can transform the calculated samples U[0,1] into discrete numbers by using the probabilities calculated in step one.
I will leave it to you to "calculate the probability [...] for all values k from 0 through n". After that, it's a weighted distribution.
You can do that using a TreeMap, similar to how I show it in this answer.
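Putting the two steps together, a minimal sketch might look like this (the class name, n, and p are mine, not from the question; it assumes 0 < p < 1):

import java.util.Map;
import java.util.NavigableMap;
import java.util.Random;
import java.util.TreeMap;

class BinomialSampler {
    private final NavigableMap<Double, Integer> cdf = new TreeMap<>();
    private final Random rnd = new Random();

    BinomialSampler(int n, double p) {
        // step one: P(X = k) for k = 0..n, accumulated into a running CDF
        double cumulative = 0.0;
        double pk = Math.pow(1 - p, n); // P(X = 0)
        for (int k = 0; k <= n; k++) {
            cumulative += pk;
            cdf.put(cumulative, k);
            // recurrence: P(X = k+1) = P(X = k) * (n - k) / (k + 1) * p / (1 - p)
            pk *= (double) (n - k) / (k + 1) * p / (1 - p);
        }
    }

    int next() {
        // step two: map a uniform draw through the CDF
        Map.Entry<Double, Integer> e = cdf.ceilingEntry(rnd.nextDouble());
        return e != null ? e.getValue() : cdf.lastEntry().getValue();
    }
}

A BinomialSampler(chromoLen - 1, 0.5) built once could then replace random.nextInt(chromoLen) in mutate_creep, biasing the chosen index towards the middle of the chromosome rather than picking uniformly.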

Fitness function

Whilst searching on Google about Genetic Algorithms, I came across the OneMax Problem; my search showed that this is one of the very first problems the Genetic Algorithm was applied to. However, I am not exactly sure what the OneMax problem is. Can anyone explain?
Any help is appreciated
The goal of the One-Max problem is to create a binary string of length n where every single gene contains a 1. The fitness function is very simple: you just iterate through the binary string counting the ones. This is what the sum represents in the formula you provided with your post. It is just the number of ones in the binary string. You could also express the fitness as a percentage, by dividing the number of ones by n and multiplying by 100. A higher fitness has a higher percentage. Eventually, at some generation, you will get a string of n ones with a fitness of 100%.
double fitness(List<Integer> chromosome) {
    // count() returns a long; cast to double before dividing to avoid
    // integer division, then scale to a percentage
    long ones = chromosome.stream().filter(g -> g == 1).count();
    return (double) ones / chromosome.size() * 100;
}
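For example (a usage sketch; List.of needs Java 9+):

double f = fitness(java.util.List.of(1, 0, 1, 1)); // 3 ones out of 4 genes -> 75.0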

First random number after setSeed in Java always similar

To give some context, I have been writing a basic Perlin noise implementation in Java, and when it came to implementing seeding, I had encountered a bug that I couldn't explain.
In order to generate the same random weight vectors each time for the same seed no matter which set of coordinates' noise level is queried and in what order, I generated a new seed (newSeed), based on a combination of the original seed and the coordinates of the weight vector, and used this as the seed for the randomization of the weight vector by running:
rnd.setSeed(newSeed);
weight = new NVector(2);
weight.setElement(0, rnd.nextDouble() * 2 - 1);
weight.setElement(1, rnd.nextDouble() * 2 - 1);
weight.normalize();
Where NVector is a self-made class for vector mathematics.
However, when run, the program generated very bad noise.
After some digging, I found that the first element of each vector was very similar (and so was the first nextDouble() call after each setSeed() call), resulting in the first element of every vector in the vector grid being similar.
This can be proved by running:
Random ran = new Random(); // added: the original snippet assumes this exists
long seed = Long.valueOf(args[0]);
int loops = Integer.valueOf(args[1]);
double avgFirst = 0.0, avgSecond = 0.0, avgThird = 0.0;
double lastFirst = 0.0, lastSecond = 0.0, lastThird = 0.0;
for (int i = 0; i < loops; i++)
{
    ran.setSeed(seed + i);
    double first = ran.nextDouble();
    double second = ran.nextDouble();
    double third = ran.nextDouble();
    avgFirst += Math.abs(first - lastFirst);
    avgSecond += Math.abs(second - lastSecond);
    avgThird += Math.abs(third - lastThird);
    lastFirst = first;
    lastSecond = second;
    lastThird = third;
}
System.out.println("Average first difference.: " + avgFirst/loops);
System.out.println("Average second Difference: " + avgSecond/loops);
// the original post printed avgSecond here by mistake, which is why the
// sample output below shows identical second and third averages
System.out.println("Average third Difference.: " + avgThird/loops);
This finds the average difference between the first, second and third random numbers generated after each setSeed() call, over a range of seeds specified by the program's arguments; for me it returned these results:
C:\java Test 462454356345 10000
Average first difference.: 7.44638117976783E-4
Average second Difference: 0.34131692827329957
Average third Difference.: 0.34131692827329957
C:\java Test 46245445 10000
Average first difference.: 0.0017196011123287126
Average second Difference: 0.3416750057190849
Average third Difference.: 0.3416750057190849
C:\java Test 1 10000
Average first difference.: 0.0021601598225344998
Average second Difference: 0.3409914232342002
Average third Difference.: 0.3409914232342002
Here you can see that the first average difference is significantly smaller than the rest, and seemingly decreasing with higher seeds.
As such, by adding a simple dummy call to nextDouble() before setting the weight vector, I was able to fix my perlin noise implementation:
rnd.setSeed(newSeed);
rnd.nextDouble();
weight.setElement(0, rnd.nextDouble() * 2 - 1);
weight.setElement(1, rnd.nextDouble() * 2 - 1);
Resulting in much better-looking noise.
I would like to know why this poor variation in the first call to nextDouble() occurs (I have not checked other types of randomness), and to alert people to this issue.
Of course, it could just be an implementation error on my part, which I would be grateful to have pointed out to me.
The Random class is designed to be a low overhead source of pseudo-random numbers. But the consequence of the "low overhead" implementation is that the number stream has properties that are a long way off perfect ... from a statistical perspective. You have encountered one of the imperfections. Random is documented as being a Linear Congruential generator, and the properties of such generators are well known.
There are a variety of ways of dealing with this. For example, if you are careful you can hide some of the most obvious "poor" characteristics. (But you would be advised to run some statistical tests. You can't see non-randomness in the noise added to your second image, but it could still be there.)
Alternatively, if you want pseudo-random numbers that have guaranteed good statistical properties, then you should be using SecureRandom instead of Random. It has significantly higher overheads, but you can be assured that many "smart people" will have spent a lot of time on the design, testing and analysis of the algorithms.
Finally, it is relatively simple to create a subclass of Random that uses an alternative algorithm for generating the numbers; see link. The problem is that you have to select (or design) and implement an appropriate algorithm.
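For illustration, here is a minimal sketch of that subclassing idea (the xorshift algorithm is my example choice, not necessarily what the linked article uses):

import java.util.Random;

// subclass Random and override next(int), the primitive that nextInt,
// nextDouble and friends are all built on
class XorShiftRandom extends Random {
    private long state;

    XorShiftRandom(long seed) {
        state = (seed == 0) ? 0x9E3779B97F4A7C15L : seed; // state must be non-zero
    }

    @Override
    protected int next(int bits) {
        // Marsaglia's xorshift64 step
        state ^= state << 13;
        state ^= state >>> 7;
        state ^= state << 17;
        return (int) (state >>> (64 - bits));
    }
}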
Calling this an "issue" is debatable. It is a well known and understood property of LCGs, and the use of LCGs was a conscious engineering choice. People want low overhead PRNGs, but low overhead PRNGs have poor properties. TANSTAAFL.
Certainly, this is not something that Oracle would contemplate changing in Random. Indeed, the reasons for not changing are stated clearly in the javadoc for the Random class.
"In order to guarantee this property, particular algorithms are specified for the class Random. Java implementations must use all the algorithms shown here for the class Random, for the sake of absolute portability of Java code."
This is a known issue. Similar seeds will generate similar first values. Random wasn't really designed to be used this way. You are supposed to create an instance with a good seed and then generate a moderately sized sequence of "random" numbers.
Your current solution is ok - as long as it looks good and is fast enough. You can also consider using hashing/mixing functions which were designed to solve your problem (and then, optionally, using the output as seed). For example see: Parametric Random Function For 2D Noise Generation
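For example, a mixing step such as MurmurHash3's 64-bit finalizer (my choice of mixer, not necessarily the one from the linked post) decorrelates sequential seeds before they reach setSeed:

// scrambles the bits of z so that consecutive inputs give unrelated outputs;
// the shift/multiply constants are MurmurHash3's 64-bit finalizer
static long mix64(long z) {
    z = (z ^ (z >>> 33)) * 0xff51afd7ed558ccdL;
    z = (z ^ (z >>> 33)) * 0xc4ceb9fe1a85ec53L;
    return z ^ (z >>> 33);
}

// usage: rnd.setSeed(mix64(newSeed));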
Move your setSeed out of the loop. Java's PRNG is a linear congruential generator, so seeding it with sequential values is guaranteed to give results that are correlated across iterations of the loop.
ADDENDUM
I dashed that off before running out the door to a meeting, and now have time to illustrate what I was saying above.
I've written a little Ruby script which implements Schrage's portable prime modulus multiplicative linear congruential generator. I instantiate two copies of the LCG, both seeded with a value of 1. However, in each iteration of the output loop I reseed the second one based on the loop index. Here's the code:
# Implementation of a Linear Congruential Generator (LCG)
class LCG
  attr_reader :state
  M = (1 << 31) - 1 # Modulus = 2**31 - 1, which is prime

  # constructor requires setting a seed value to use as initial state
  def initialize(seed)
    reseed(seed)
  end

  # users can explicitly reset the seed.
  def reseed(seed)
    @state = seed.to_i
  end

  # Schrage's portable prime modulus multiplicative LCG
  def value
    @state = 16807 * @state % M
    # return the generated integer value AND its U(0,1) mapping as an array
    [@state, @state.to_f / M]
  end
end

if __FILE__ == $0
  # create two instances of LCG, both initially seeded with 1
  mylcg1 = LCG.new(1)
  mylcg2 = LCG.new(1)
  puts "    default progression       manual reseeding"
  10.times do |n|
    mylcg2.reseed(1 + n) # explicitly reseed 2nd LCG based on loop index
    printf "%d %11d %f %11d %f\n", n, *mylcg1.value, *mylcg2.value
  end
end
and here's the output it produces:
default progression manual reseeding
0 16807 0.000008 16807 0.000008
1 282475249 0.131538 33614 0.000016
2 1622650073 0.755605 50421 0.000023
3 984943658 0.458650 67228 0.000031
4 1144108930 0.532767 84035 0.000039
5 470211272 0.218959 100842 0.000047
6 101027544 0.047045 117649 0.000055
7 1457850878 0.678865 134456 0.000063
8 1458777923 0.679296 151263 0.000070
9 2007237709 0.934693 168070 0.000078
The columns are iteration number followed by the underlying integer generated by the LCG and the result when scaled to the range (0,1). The left set of columns show the natural progression of the LCG when allowed to proceed on its own, while the right set show what happens when you reseed on each iteration.

What could be the error in this Java program to compute sine?

I have written this code to compute the sine of an angle. This works fine for smaller angles, say up to +-360. But with larger angles it starts giving faulty results. (When I say larger, I mean something like within the range +-720 or +-1080.)
In order to get more accurate results I increased the number of times my loop runs. That gave me better results, but it too had its limitations.
So I was wondering if there is any fault in my logic, or do I need to fiddle with the conditional part of my loop? How can I overcome this shortcoming of my code? The built-in Java sine function gives correct results for all the angles I have tested... so where am I going wrong?
Also, can anyone give me an idea as to how to modify the condition of my loop so that it runs until I get a desired decimal precision?
import java.util.Scanner;

class SineFunctionManual
{
    public static void main(String[] a)
    {
        System.out.print("Enter the angle for which you want to compute sine : ");
        Scanner input = new Scanner(System.in);
        int degreeAngle = input.nextInt(); // angle in degrees
        input.close();
        double radianAngle = Math.toRadians(degreeAngle); // sine computation is done in radians
        System.out.println(radianAngle);

        // sineOfAngle accumulates the result; prevVal holds the next term to be added
        double sineOfAngle = radianAngle, prevVal = radianAngle;
        // double fractionalPart = 0.1; // used to check the answer to a certain number of decimal places, as seen in the for loop
        for (int i = 3; i <= 20; i += 2)
        {
            // x^3/3! can be written as ((x^2)/(3*2))*(x/1!); similarly x^5/5! can be
            // written as ((x^2)/(5*4))*(x^3/3!), and so on. Each successive term has
            // alternating sign, hence the leading minus.
            prevVal = (-prevVal) * ((radianAngle * radianAngle) / (i * (i - 1)));
            sineOfAngle += prevVal;
            // int iPart = (int) sineOfAngle;
            // fractionalPart = sineOfAngle - iPart; // extracting the fractional part to check the number of decimal places
        }
        System.out.println("The value of sin of " + degreeAngle + " is : " + sineOfAngle);
    }
}
The polynomial approximation for sine diverges widely for large positive and large negative values. Remember, sine varies from -1 to 1 over all real numbers. Polynomials, on the other hand, particularly ones with higher orders, can't do that.
I would recommend using the periodicity of sine to your advantage.
int degreeAngle = input.nextInt() % 360;
This will give accurate answers, even for very, very large angles, without requiring an absurd number of terms.
The further you get from x = 0, the more terms of the Taylor expansion for sin x you need to get within a particular accuracy of the correct answer. You're stopping around the 20th term, which is fine for small angles. If you want better accuracy for large angles, you'll just need to add more terms.
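To answer the precision part of the question: because the terms shrink in magnitude once i exceeds |x|, you can loop until the last added term drops below a chosen tolerance instead of using a fixed bound. A sketch reusing the variables from the posted code (epsilon is my addition):

double sineOfAngle = radianAngle, prevVal = radianAngle;
double epsilon = 1e-10; // desired precision, chosen arbitrarily here
for (int i = 3; Math.abs(prevVal) > epsilon; i += 2)
{
    prevVal = (-prevVal) * ((radianAngle * radianAngle) / (i * (i - 1)));
    sineOfAngle += prevVal;
}

Combined with the % 360 reduction above, this keeps the number of terms small for any input.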

Selecting a value proportionally based on its double key

I have a list of values, keyed with doubles between 0 and 1 that represent how likely I think it is for a thing to be useful to me. For example, for getting an answer to a question:
0.5 call your mom
0.25 go to the library
0.6 StackOverflow
0.9 just Google it
So, we think that Googling it is (about) twice as likely to be helpful as asking your mom. When I attempt to figure out the next thing to do, I'd like "just Google it" to be returned about twice as often as "call your mom".
I've been searching for solutions with little success. Most of the things that I've found rely on having integer keys (like How to randomly select a key based on its Integer value in a Map with respect to the other values in O(n) time?), which I don't have and which I can't easily generate.
I feel like there should be some Java datatype that can do this for me. Any suggestions?
You can build a solution on the Java interface NavigableMap; if you use the TreeMap implementation you will always get O(log n) complexity.
You can use one of the following:
lowerEntry
ceilingEntry
floorEntry
higherEntry
Now you just need to extract random numbers with the right probability. For that I would refer to this post:
How to generate a random number from specified discrete distribution?
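Combining the running-total keys with one of those lookups, a sketch might look like this (class and method names are mine):

import java.util.NavigableMap;
import java.util.Random;
import java.util.TreeMap;

class WeightedPicker {
    private final NavigableMap<Double, String> map = new TreeMap<>();
    private final Random rnd = new Random();
    private double total = 0.0;

    void add(double weight, String item) {
        total += weight;
        map.put(total, item); // key is the running total of the weights
    }

    String pick() {
        // higherEntry finds the first running total strictly above the draw
        return map.higherEntry(rnd.nextDouble() * total).getValue();
    }
}

With the weights from the question, pick() returns "just Google it" with probability 0.9 / 2.25 = 0.4 and "call your mom" with probability 0.5 / 2.25 ≈ 0.22, i.e. roughly twice as often.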
If I understood correctly, what you're looking for is a weighted random.
You should sum all your weights, and perhaps normalize the sum to an integer value, so that you can use rand.nextInt as suggested in the comments.
Normalization can be done by multiplying by 100, for example, so your normalized weights become:
50, 25, 60, 90 - the sum is 225.
You then define ranges:
0 - 49 for "call your mom"
50 - 74 for "go to the library"
and so on for the remaining weights.
Finally, you call rand.nextInt(225) and map the resulting value to one of the defined ranges.
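A sketch of that integer-range approach (the array names are mine):

int[] weights = {50, 25, 60, 90}; // normalized weights, sum = 225
String[] actions = {"call your mom", "go to the library",
                    "StackOverflow", "just Google it"};
int r = new Random().nextInt(225); // 0 .. 224
int acc = 0;
for (int i = 0; i < weights.length; i++) {
    acc += weights[i];
    if (r < acc) {
        System.out.println(actions[i]); // this range is the pick
        break;
    }
}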
If you keep track of what the total value of the probabilities is, you can do something like this:
double interval = 100;
double counter = 0;
double totalProbabilities = 2.25; // 0.5 + 0.25 + 0.6 + 0.9
int randInt = new Random().nextInt((int) interval);
for (Element e : list) {
    counter += (interval * e.probability() / totalProbabilities);
    if (randInt < counter) {
        return e.activity();
    }
}
