To give some context, I have been writing a basic Perlin noise implementation in Java, and when it came to implementing seeding, I encountered a bug that I couldn't explain.
In order to generate the same random weight vectors each time for the same seed no matter which set of coordinates' noise level is queried and in what order, I generated a new seed (newSeed), based on a combination of the original seed and the coordinates of the weight vector, and used this as the seed for the randomization of the weight vector by running:
rnd.setSeed(newSeed);
weight = new NVector(2);
weight.setElement(0, rnd.nextDouble() * 2 - 1);
weight.setElement(1, rnd.nextDouble() * 2 - 1);
weight.normalize();
Where NVector is a self-made class for vector mathematics.
However, when run, the program generated very bad noise:
After some digging, I found that the value returned by the first nextDouble() call after each setSeed() call was very similar from one seed to the next, so the first element of every vector in the vector grid was similar.
This can be proved by running:
long seed = Long.parseLong(args[0]);
int loops = Integer.parseInt(args[1]);
Random ran = new Random(); // java.util.Random; reseeded on every iteration below
double avgFirst = 0.0, avgSecond = 0.0, avgThird = 0.0;
double lastfirst = 0.0, lastSecond = 0.0, lastThird = 0.0;
for(int i = 0; i<loops; i++)
{
ran.setSeed(seed + i);
double first = ran.nextDouble();
double second = ran.nextDouble();
double third = ran.nextDouble();
avgFirst += Math.abs(first - lastfirst);
avgSecond += Math.abs(second - lastSecond);
avgThird += Math.abs(third - lastThird);
lastfirst = first;
lastSecond = second;
lastThird = third;
}
System.out.println("Average first difference.: " + avgFirst/loops);
System.out.println("Average second Difference: " + avgSecond/loops);
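System.out.println("Average third Difference.: " + avgThird/loops); // the original run printed avgSecond here, which is why the third figure in the output below repeats the second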
Which finds the average difference between the first, second and third random numbers generated after a setSeed() method has been called over a range of seeds as specified by the program's arguments; which for me returned these results:
C:\java Test 462454356345 10000
Average first difference.: 7.44638117976783E-4
Average second Difference: 0.34131692827329957
Average third Difference.: 0.34131692827329957
C:\java Test 46245445 10000
Average first difference.: 0.0017196011123287126
Average second Difference: 0.3416750057190849
Average third Difference.: 0.3416750057190849
C:\java Test 1 10000
Average first difference.: 0.0021601598225344998
Average second Difference: 0.3409914232342002
Average third Difference.: 0.3409914232342002
Here you can see that the first average difference is significantly smaller than the rest, and seemingly decreasing with higher seeds.
As such, by adding a simple dummy call to nextDouble() before setting the weight vector, I was able to fix my Perlin noise implementation:
rnd.setSeed(newSeed);
rnd.nextDouble();
weight.setElement(0, rnd.nextDouble() * 2 - 1);
weight.setElement(1, rnd.nextDouble() * 2 - 1);
Resulting in:
I would like to know why this bad variation in the first call to nextDouble() (I have not checked other types of randomness) occurs and/or to alert people to this issue.
Of course, it could just be an implementation error on my part, in which case I would be grateful if it were pointed out to me.
The Random class is designed to be a low overhead source of pseudo-random numbers. But the consequence of the "low overhead" implementation is that the number stream has properties that are a long way off perfect ... from a statistical perspective. You have encountered one of the imperfections. Random is documented as being a Linear Congruential generator, and the properties of such generators are well known.
There are a variety of ways of dealing with this. For example, if you are careful you can hide some of the most obvious "poor" characteristics. (But you would be advised to run some statistical tests. You can't see non-randomness in the noise added to your second image, but it could still be there.)
Alternatively, if you want pseudo-random numbers that have guaranteed good statistical properties, then you should be using SecureRandom instead of Random. It has significantly higher overheads, but you can be assured that many "smart people" will have spent a lot of time on the design, testing and analysis of the algorithms.
Finally, it is relatively simple to create a subclass of Random that uses an alternative algorithm for generating the numbers; see link. The problem is that you have to select (or design) and implement an appropriate algorithm.
Calling this an "issue" is debatable. It is a well known and understood property of LCGs, and use of LCGs was a conscious engineering choice. People want low overhead PRNGs, but low overhead PRNGs have poor properties. TANSTAAFL.
Certainly, this is not something that Oracle would contemplate changing in Random. Indeed, the reasons for not changing are stated clearly in the javadoc for the Random class.
"In order to guarantee this property, particular algorithms are specified for the class Random. Java implementations must use all the algorithms shown here for the class Random, for the sake of absolute portability of Java code."
This is a known issue. Similar seeds generate similar first few values. Random wasn't really designed to be used this way. You are supposed to create an instance with a good seed and then generate a moderately sized sequence of "random" numbers.
Your current solution is ok - as long as it looks good and is fast enough. You can also consider using hashing/mixing functions which were designed to solve your problem (and then, optionally, using the output as seed). For example see: Parametric Random Function For 2D Noise Generation
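A minimal sketch of the hashing/mixing idea, for illustration only. It assumes a SplitMix64-style finalizer as the mixing function (my choice, not taken from the linked answer), and the names GradientHash, mix and gradient are hypothetical. The gradient is derived directly from the hash of (seed, x, y), so no Random instance and no setSeed() call is involved:

```java
final class GradientHash {
    // SplitMix64 finalizer: spreads small input differences over all 64 bits.
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // Returns a unit-length 2D gradient that depends only on (seed, x, y).
    static double[] gradient(long seed, int x, int y) {
        // Pack both coordinates into one long so distinct cells never collide.
        long packed = ((long) x << 32) | (y & 0xFFFFFFFFL);
        long h = mix(seed ^ mix(packed));
        // Use the top 53 bits as a uniform value in [0, 1) and turn it into an angle.
        double u = (h >>> 11) * 0x1.0p-53;
        double angle = 2.0 * Math.PI * u;
        return new double[] { Math.cos(angle), Math.sin(angle) };
    }

    public static void main(String[] args) {
        double[] g = gradient(1L, 3, 4);
        System.out.println(g[0] + ", " + g[1]);
    }
}
```

Because the vector is built from an angle it is already normalized, and neighbouring coordinates give unrelated hashes rather than the near-identical first values seen with sequential seeds.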
Move your setSeed out of the loop. Java's PRNG is a linear congruential generator, so seeding it with sequential values is guaranteed to give results that are correlated across iterations of the loop.
ADDENDUM
I dashed that off before running out the door to a meeting, and now have time to illustrate what I was saying above.
I've written a little Ruby script which implements Schrage's portable prime modulus multiplicative linear congruential generator. I instantiate two copies of the LCG, both seeded with a value of 1. However, in each iteration of the output loop I reseed the second one based on the loop index. Here's the code:
# Implementation of a Linear Congruential Generator (LCG)
class LCG
  attr_reader :state
  M = (1 << 31) - 1 # Modulus = 2**31 - 1, which is prime
  # constructor requires setting a seed value to use as initial state
  def initialize(seed)
    reseed(seed)
  end
  # users can explicitly reset the seed.
  def reseed(seed)
    @state = seed.to_i
  end
  # Schrage's portable prime modulus multiplicative LCG
  def value
    @state = 16807 * @state % M
    # return the generated integer value AND its U(0,1) mapping as an array
    [@state, @state.to_f / M]
  end
end
if __FILE__ == $0
  # create two instances of LCG, both initially seeded with 1
  mylcg1 = LCG.new(1)
  mylcg2 = LCG.new(1)
  puts " default progression manual reseeding"
  10.times do |n|
    mylcg2.reseed(1 + n) # explicitly reseed 2nd LCG based on loop index
    printf "%d %11d %f %11d %f\n", n, *mylcg1.value, *mylcg2.value
  end
end
and here's the output it produces:
default progression manual reseeding
0 16807 0.000008 16807 0.000008
1 282475249 0.131538 33614 0.000016
2 1622650073 0.755605 50421 0.000023
3 984943658 0.458650 67228 0.000031
4 1144108930 0.532767 84035 0.000039
5 470211272 0.218959 100842 0.000047
6 101027544 0.047045 117649 0.000055
7 1457850878 0.678865 134456 0.000063
8 1458777923 0.679296 151263 0.000070
9 2007237709 0.934693 168070 0.000078
The columns are iteration number followed by the underlying integer generated by the LCG and the result when scaled to the range (0,1). The left set of columns show the natural progression of the LCG when allowed to proceed on its own, while the right set show what happens when you reseed on each iteration.
Related
I wanted to make a random number picker in the range 1-50000.
But I want to do it so that the larger the number, the smaller the probability.
Probability like (1/2*number) or something else.
Can anybody help?
You need a mapping function of some sort. What you get from Random is a few 'primitive' constructs that you can trust do exactly what their javadoc spec says they do:
.nextInt(X) which returns, uniform random (i.e. the probability chart is an exact horizontal line), a randomly chosen number between 0 and X-1 inclusive.
.nextBoolean() which gives you 1 bit of randomness.
.nextDouble(), giving you a mostly uniform random number between 0.0 and 1.0
.nextGaussian() which gives you a random number whose probability chart is a normal (bell) curve with standard deviation = 1.0 and midpoint (average) of 0.0.
For the double-returning methods, you run into some trouble if you want exact precision. Computers aren't magical. As a consequence, if you e.g. write this mapping function to turn nextDouble() into a standard uniformly distributed 6-sided die roll, you'd think: int dieRoll = 1 + (int) (rnd.nextDouble() * 6); would do it. Had double been perfect, you'd be right. But they aren't, so, instead, best case scenario, 4 of 6 die faces are going to come up 750599937895083 times, and the other 2 die faces are going to come up 750599937895082 times. It'll be hard to really notice that, but it is provably imperfect. I assume this kind of tiny deviation doesn't matter to you, but, it's good to be aware that anytime you so much as mention double, inherent tiny errors creep into everything and you can't really stop that from happening.
What you need is some sort of mapping function that takes any amount of such randomly provided data (from those primitives, and really only from nextInt/nextBoolean if you want to avoid the errors that double inherently brings) to produce what you want.
For example, imagine instead the 'primitive' I gave you is a uniform random value between 1 and 6, inclusive, i.e.: A standard 6-sided die roll. And I ask you to come up with a uniform algorithm (as in, each value is equally likely) to produce a number between 2 and 12, inclusive.
Perhaps you might think: Easy, just roll 2 dice and add em up. But that would be incorrect: 7 is far more likely than 12.
Instead, you'd roll 1 die and just register if it was even or odd. Then you roll the second die and that's your result, unless the first die was odd in which case you add 6 to it. That gives a value between 1 and 12; if it comes out as 1 (even on the first die and 1 on the second die), you start the process over again; eventually you're bound to roll something else.
That'd be uniform random.
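The die example can be sketched as follows. This is an illustration under stated assumptions: d6 stands in for the "primitive" fair die, the class and method names are hypothetical, and the single out-of-range result (1) is what gets rejected and rerolled.

```java
import java.util.Random;

final class UniformTwoToTwelve {
    // The only primitive we allow ourselves: a fair six-sided die.
    static int d6(Random rnd) {
        return 1 + rnd.nextInt(6);
    }

    // Uniform value in 2..12 built from d6 rolls by rejection.
    static int roll(Random rnd) {
        while (true) {
            boolean addSix = d6(rnd) % 2 == 1;      // odd first die: shift into 7..12
            int value = d6(rnd) + (addSix ? 6 : 0); // uniform over 1..12
            if (value != 1) {
                return value;                        // 1 is outside the target range, so retry
            }
        }
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int[] counts = new int[13];
        for (int i = 0; i < 1_100_000; i++) counts[roll(rnd)]++;
        for (int v = 2; v <= 12; v++) System.out.println(v + ": " + counts[v]);
    }
}
```

Each of the 11 accepted values corresponds to exactly one of the 12 equally likely (parity, face) combinations, which is why the result is uniform.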
You can apply the same principle to your question. You need a mathematical function that maps the 'horizontal line' of .nextInt() to whatever curve you want. For example, sounds like you want to perhaps generate something and then take the square root and round it down, maybe. You're going to have to draw out or write a formula that precisely describes the probability density.
Here's an example:
while (true) {
int v = (int) (50000.0 * Math.abs(r.nextGaussian()));
if (v >= 1 && v <= 50000) return v;
}
That returns you a roughly normally distributed value, 1 being the most likely, 50000 being the least likely.
One simple formula that will give you a very close approximation to what you want is
Random random = new Random();
int result = (int) Math.pow( 50001, random.nextDouble());
That will give a result in the range 1 - 50000, where the probability of each result is approximately proportional to 1 / result, which is what you asked for.
The reason why it works is that the probability of result being any value n within the range is P( n <= 50001^x < n+1) where x is randomly distributed in [0,1). That's the probability that x falls between log(n) and log(n+1), where the logs are base 50001. But that probability is proportional to log (1 + 1/n), which is very close to 1/n.
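A small sketch that puts the formula and its claimed distribution side by side. The class name, the MAX constant and the probability helper are hypothetical; the helper simply evaluates the log(1 + 1/n) expression derived above so it can be compared with 1/n.

```java
import java.util.Random;

final class InverseWeightedPick {
    static final int MAX = 50000;

    // Draws a value in 1..MAX with P(n) roughly proportional to 1/n.
    static int pick(Random random) {
        int result = (int) Math.pow(MAX + 1, random.nextDouble());
        return Math.min(result, MAX); // defensive clamp in case of rounding at the top end
    }

    // Probability of n under the formula: log(1 + 1/n) in base MAX + 1.
    static double probability(int n) {
        return Math.log1p(1.0 / n) / Math.log(MAX + 1);
    }

    public static void main(String[] args) {
        Random random = new Random();
        System.out.println("sample: " + pick(random));
        // For an exact 1/n law this ratio would be 2; here it is ln 2 / ln 1.5.
        System.out.println("P(1) / P(2) = " + probability(1) / probability(2));
    }
}
```

The ratio printed for n = 1 and n = 2 shows where the approximation is weakest; for larger n, log(1 + 1/n) and 1/n agree much more closely.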
I'm working on a big school project about random numbers, but I can't find the period for Math.random(). I have version 7.0.800.15 installed and I'm working on a Windows 10 computer. I've tried determining the period with a simple program which saves the first values of:
double num = Math.random();
in an array and then loops until it finds the same values in a row again, thus a period would have passed, but with no result, the period is too long.
So my question is: What is the period of Math.random() on my version?
Or: Is there a way to determine the period using a simple program?
Edit: took away a source pointing to a page about JavaScript, it was not relevant
Java's Math.random() uses a linear congruential generator with a modulus of 2^48. The period of such a pseudorandom generator with well-chosen parameters is equal to the modulus. Apparently the parameters in Java are sanely chosen, so in practice the period is 2^48.
Sources:
https://en.wikipedia.org/wiki/Linear_congruential_generator
http://www.javamex.com/tutorials/random_numbers/java_util_random_algorithm.shtml#.WKX-gRJ97dQ
The wiki on linear congruential generator cites Java (java.util.Random) as having a modulus of 2^48.
That is likely the period but you may need to read more about these types of random generators.
This question (How good is java.util.Random?) also cites the same period.
Just to add to the other answers and to comment a little more generally on random number generators and writing a program to determine what the period is, beware of the Birthday Paradox and the Gambler's Fallacy. If you generate some value x, the next number is still just as likely to be x as any other number, and the number of numbers you need to generate before you're likely to have a duplicate is actually surprisingly small (meaning that you could, in principle, start seeing some duplicates before the end of the period, which complicates writing a program to test this).
The number of values you need to generate to reach a given probability of a duplicate (for probabilities up to 50% or so) can be approximated by sqrt(2m * p(n)), where p(n) is the probability you're aiming for and m is the number of choices. For a 32-bit integer, sqrt(2m * p(n)) = sqrt(2 * 2^32 * 0.5) = sqrt(2^32) = 65,536. There you have it - once you generate 65,536 numbers there's approximately a 50-50 chance you've generated a duplicate.
Once you've generated 2^32 + 1 values, the Pigeonhole Principle specifies that you must have generated at least one duplicate (assuming, of course, that you're generating a 32-bit number).
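The arithmetic can be reproduced with a few lines of Java. The class and method names are hypothetical. The second method is my addition: it uses the standard birthday approximation p ≈ 1 - exp(-n^2 / 2m) solved for n, which stays accurate for larger p and suggests that the true 50% point for 32-bit values is somewhat later, at roughly 77,000 draws.

```java
final class BirthdayBound {
    // Small-probability approximation quoted above: n ~ sqrt(2 * m * p).
    static double roughDraws(double m, double p) {
        return Math.sqrt(2.0 * m * p);
    }

    // Refinement for larger p: n ~ sqrt(2 * m * ln(1 / (1 - p))).
    static double refinedDraws(double m, double p) {
        return Math.sqrt(2.0 * m * Math.log(1.0 / (1.0 - p)));
    }

    public static void main(String[] args) {
        double m = Math.pow(2, 32); // number of distinct 32-bit values
        System.out.println(roughDraws(m, 0.5));   // prints 65536.0
        System.out.println(refinedDraws(m, 0.5));
    }
}
```

Either way the point stands: duplicates become likely after tens of thousands of draws, long before the generator's period is exhausted, so seeing a repeated value says nothing about the period.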
You may also be interested in this question on whether you can count on random numbers to be unique.
As part of a Monte Carlo simulation, I have to roll a group of dice until certain values show up a certain amount of times. My code that does this calls upon a dice class which generates a random number between 1 and 6, and returns it. Originally the code looked like
public void roll() {
value = (int)(Math.random()*6) + 1;
}
and it wasn't very fast. By exchanging Math.random() for
ThreadLocalRandom.current().nextInt(1, 7);
With that change, a section that calls this roughly 250 million times ran in about 60% of the original time.
As part of the full simulation it will call upon this method billions of times at the very least, so is there any faster way to do this?
Pick a random generator that is as fast and as good as you need it to be, and that isn't slowed down to a tiny fraction of its normal speed by thread safety mechanisms. Then pick a method of generating the [1..6] integer distribution that is as fast and as precise as you need it to be.
The fastest simple generator that is of sufficiently high quality to beat standard tests for PRNGs like TestU01 (instead of failing systematically, like the Mersenne Twister) is Sebastiano Vigna's xorshift64*. I'm showing it as C++ code (the state is passed by reference) but Sebastiano has it in Java as well:
uint64_t xorshift64s (uint64_t &x)
{
x ^= x >> 12;
x ^= x << 25;
x ^= x >> 27;
return x * 2685821657736338717ull;
}
Sebastiano Vigna's site has lots of useful info, links and benchmark results. Including papers, for the mathematically inclined.
At that high resolution you can simply use 1 + xorshift64s(state) % 6 and the bias will be immeasurably small. If that is not fast enough, implement the modulo division by multiplication with the inverse. If that is not fast enough - if you cannot afford two MULs per variate - then it gets tricky and you need to come back here. xorshift1024* (Java) plus some bit trickery for the variate would be an option.
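For reference, a Java sketch of the same generator with the simple modulo mapping. This is my own transcription of the snippet above, not Vigna's published Java code; the class name, the zero-seed fallback constant and rollDie are assumptions made for illustration.

```java
final class XorShift64Star {
    private long state; // must never be zero

    XorShift64Star(long seed) {
        // A zero state would stay zero forever, so substitute an arbitrary non-zero value.
        this.state = (seed == 0) ? 0x9E3779B97F4A7C15L : seed;
    }

    long nextLong() {
        long x = state;
        x ^= x >>> 12; // unsigned shifts, matching uint64_t semantics
        x ^= x << 25;
        x ^= x >>> 27;
        state = x;
        return x * 2685821657736338717L;
    }

    // Die roll in 1..6; the modulo bias on a 64-bit value is far too small to measure.
    int rollDie() {
        return 1 + (int) Long.remainderUnsigned(nextLong(), 6);
    }

    public static void main(String[] args) {
        XorShift64Star rng = new XorShift64Star(System.nanoTime());
        int[] counts = new int[7];
        for (int i = 0; i < 6_000_000; i++) counts[rng.rollDie()]++;
        for (int face = 1; face <= 6; face++) System.out.println(face + ": " + counts[face]);
    }
}
```

Long.remainderUnsigned treats the 64-bit output as unsigned, which avoids the negative remainders that the plain % operator would produce in Java.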
Batching - generating an array full of numbers and processing that, then refilling the array and so on - can unlock some speed reserves. Needlessly wrapping things in classes achieves the opposite.
P.S.: if ThreadLocalRandom and xorshift* are not fast enough for your purposes even with batching then you might be going about things in the wrong way, or you might be doing it in the wrong language. Or both.
P.P.S.: in languages like Java (or C#, or Delphi), abstraction is not free, it has a cost. In Java you also have to reckon with things like mandatory gratuitous array bounds checking, unless you have a compiler that can eliminate those checks. Teasing high performance out of a Java program can get very involved... In C++ you get abstraction and performance for free.
Darth is correct that Xorshift* is probably the best generator to use. Use it to fill a ring buffer of bytes, then fetch the bytes one at a time to roll your dice, refilling the buffer when you've fetched enough. To get the actual die roll, avoid division and bias by using rejection sampling. The rest of the code then looks something like this (in C):
do {
if (bp >= buffer + sizeof buffer) {
// refill buffer with Xorshifts
}
v = *bp++ & 7;
} while (v > 5);
return v + 1; // map 0..5 onto die faces 1..6
This will allow you to get on average 6 die rolls per 64-bit random value.
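The same idea in Java, as a rough sketch. java.util.SplittableRandom is used only as a stand-in for whichever fast 64-bit generator you choose, and a single long serves as the byte buffer; the class and field names are hypothetical.

```java
import java.util.SplittableRandom;

final class BufferedDie {
    private final SplittableRandom source; // stand-in for any fast 64-bit generator
    private long buffer;   // current 64-bit block, consumed one byte at a time
    private int bytesLeft; // unread bytes remaining in the block

    BufferedDie(long seed) {
        source = new SplittableRandom(seed);
    }

    int roll() {
        int v;
        do {
            if (bytesLeft == 0) {   // refill the buffer
                buffer = source.nextLong();
                bytesLeft = 8;
            }
            v = (int) (buffer & 7); // low three bits of the next byte: 0..7
            buffer >>>= 8;
            bytesLeft--;
        } while (v > 5);            // rejection sampling: discard 6 and 7
        return v + 1;               // map 0..5 onto faces 1..6
    }

    public static void main(String[] args) {
        BufferedDie die = new BufferedDie(1L);
        for (int i = 0; i < 10; i++) System.out.print(die.roll() + " ");
        System.out.println();
    }
}
```

Three of every four bytes are accepted, so eight bytes yield six rolls on average, with no division and no bias.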
Why were 181783497276652981 and 8682522807148012 chosen in Random.java?
Here's the relevant source code from Java SE JDK 1.7:
/**
* Creates a new random number generator. This constructor sets
* the seed of the random number generator to a value very likely
* to be distinct from any other invocation of this constructor.
*/
public Random() {
this(seedUniquifier() ^ System.nanoTime());
}
private static long seedUniquifier() {
// L'Ecuyer, "Tables of Linear Congruential Generators of
// Different Sizes and Good Lattice Structure", 1999
for (;;) {
long current = seedUniquifier.get();
long next = current * 181783497276652981L;
if (seedUniquifier.compareAndSet(current, next))
return next;
}
}
private static final AtomicLong seedUniquifier
= new AtomicLong(8682522807148012L);
So, invoking new Random() without any seed parameter takes the current "seed uniquifier" and XORs it with System.nanoTime(). Then it uses 181783497276652981 to create another seed uniquifier to be stored for the next time new Random() is called.
The literals 181783497276652981L and 8682522807148012L are not placed in constants, but they don't appear anywhere else.
At first the comment gives me an easy lead. Searching online for that article yields the actual article. 8682522807148012 doesn't appear in the paper, but 181783497276652981 does appear -- as a substring of another number, 1181783497276652981, which is 181783497276652981 with a 1 prepended.
The paper claims that 1181783497276652981 is a number that yields good "merit" for a linear congruential generator. Was this number simply mis-copied into Java? Does 181783497276652981 have an acceptable merit?
And why was 8682522807148012 chosen?
Searching online for either number yields no explanation, only this page that also notices the dropped 1 in front of 181783497276652981.
Could other numbers have been chosen that would have worked as well as these two numbers? Why or why not?
Was this number simply mis-copied into Java?
Yes, seems to be a typo.
Does 181783497276652981 have an acceptable merit?
This could be determined using the evaluation algorithm presented in the paper. But the merit of the "original" number is probably higher.
And why was 8682522807148012 chosen?
Seems to be random. It could be the result of System.nanoTime() when the code was written.
Could other numbers have been chosen that would have worked as well as these two numbers?
Not every number would be equally "good". So, no.
Seeding Strategies
There are differences in the default-seeding scheme between different versions and implementations of the JRE.
public Random() { this(System.currentTimeMillis()); }
public Random() { this(++seedUniquifier + System.nanoTime()); }
public Random() { this(seedUniquifier() ^ System.nanoTime()); }
The first one is not acceptable if you create multiple RNGs in a row. If their creation times fall in the same millisecond range, they will give completely identical sequences. (same seed => same sequence)
The second one is not thread safe. Multiple threads can get identical RNGs when initializing at the same time. Additionally, seeds of subsequent initializations tend to be correlated. Depending on the actual timer resolution of the system, the seed sequence could be linearly increasing (n, n+1, n+2, ...). As stated in How different do random seeds need to be? and the referenced paper Common defects in initialization of pseudorandom number generators, correlated seeds can generate correlation among the actual sequences of multiple RNGs.
The third approach creates randomly distributed and thus uncorrelated seeds, even across threads and subsequent initializations.
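The weakness of the first scheme is easy to demonstrate: two generators that receive the same seed produce the same stream. A minimal sketch, with a hypothetical helper name:

```java
import java.util.Random;

final class SameSeedDemo {
    // Returns true if the first n values of both generators are identical.
    static boolean sameSequence(long seedA, long seedB, int n) {
        Random a = new Random(seedA);
        Random b = new Random(seedB);
        for (int i = 0; i < n; i++) {
            if (a.nextLong() != b.nextLong()) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        long millis = System.currentTimeMillis();
        // Two generators "created in the same millisecond" share a seed...
        System.out.println(sameSequence(millis, millis, 1000)); // prints true
        // ...while a seed one millisecond later gives a different stream.
        System.out.println(sameSequence(millis, millis + 1, 1000));
    }
}
```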
So the current java docs:
This constructor sets the seed of the random number generator to a
value very likely to be distinct from any other invocation of this
constructor.
could be extended by "across threads" and "uncorrelated"
Seed Sequence Quality
But the randomness of the seeding sequence is only as good as the underlying RNG.
The RNG used for the seed sequence in this java implementation uses a multiplicative linear congruential generator (MLCG) with c=0 and m=2^64. (The modulus 2^64 is implicitly given by the overflow of 64bit long integers)
Because of the zero c and the power-of-2-modulus, the "quality" (cycle length, bit-correlation, ...) is limited. As the paper says, besides the overall cycle length, every single bit has an own cycle length, which decreases exponentially for less significant bits. Thus, lower bits have a smaller repetition pattern. (The result of seedUniquifier() should be bit-reversed, before it is truncated to 48-bits in the actual RNG)
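The weakness of the low bits can be seen directly with the two constants from Random.java. In this sketch (class and method names are hypothetical) the lowest three bits of the uniquifier never change: the starting value is divisible by 4 but not by 8, and multiplying by an odd number modulo 2^64 preserves that.

```java
final class SeedUniquifierBits {
    static final long MULTIPLIER = 181783497276652981L; // from Random.java (JDK 7)
    static final long START = 8682522807148012L;        // initial seedUniquifier value

    // Returns true if the lowest three bits are identical in the first n outputs.
    static boolean lowBitsConstant(int n) {
        long x = START;
        long first = (x * MULTIPLIER) & 7;
        for (int i = 0; i < n; i++) {
            x *= MULTIPLIER; // long overflow gives the implicit mod 2^64
            if ((x & 7) != first) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(lowBitsConstant(1_000_000)); // prints true
    }
}
```

In Random itself this matters less than it might seem, because the uniquifier is XORed with System.nanoTime() before use, but it illustrates why the low bits of a power-of-two MLCG carry little randomness.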
But it is fast! And to avoid unnecessary compare-and-set-loops, the loop body should be fast. This probably explains the usage of this specific MLCG, without addition, without xoring, just one multiplication.
And the mentioned paper presents a list of good "multipliers" for c=0 and m=2^64, as 1181783497276652981.
All in all: A for effort @ JRE-developers ;) But there is a typo.
(But who knows, unless someone evaluates it, there is the possibility that the missing leading 1 actually improves the seeding RNG.)
But some multipliers are definitely worse:
"1" leads to a constant sequence.
"2" leads to a single-bit-moving sequence (somehow correlated)
...
The inter-sequence-correlation for RNGs is actually relevant for (Monte Carlo) Simulations, where multiple random sequences are instantiated and even parallelized. Thus a good seeding strategy is necessary to get "independent" simulation runs. Therefore the C++11 standard introduces the concept of a Seed Sequence for generating uncorrelated seeds.
If you consider that the equation used for the random number generator is:
X(n+1) = (a * X(n) + c) mod m
Where X(n+1) is the next number, a is the multiplier, X(n) is the current number, c is the increment and m is the modulus.
If you look further into Random, a, c and m are defined in the header of the class
private static final long multiplier = 0x5DEECE66DL; //= 25214903917 -- 'a'
private static final long addend = 0xBL; //= 11 -- 'c'
private static final long mask = (1L << 48) - 1; //= 2 ^ 48 - 1 -- 'm' - 1, i.e. m = 2 ^ 48
and looking at the method protected int next(int bits) this is where the equation is implemented
nextseed = (oldseed * multiplier + addend) & mask;
//X(n+1) = (X(n) * a + c ) mod m
This implies that the method seedUniquifier() is actually computing X(n+1), which on the first call at initialisation is 8682522807148012 * 181783497276652981; this value is then modified further by the value of System.nanoTime(). This algorithm is consistent with the equation above but with the following: X(0) = 8682522807148012, a = 181783497276652981, m = 2 ^ 64 and c = 0. But as the mod m is performed by the long overflow, the above equation just becomes
X(n+1) = X(n) * a
Looking at the paper, the value of a = 1181783497276652981 is for m = 2 ^ 64, c = 0. So it appears to just be a typo, and the value 8682522807148012 for X(0) appears to be a seemingly randomly chosen number from legacy code for Random. As seen here. The merit of these chosen numbers could still be valid but, as mentioned by Thomas B., probably not as "good" as the one in the paper.
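To check that the constants and the recurrence quoted above really describe java.util.Random, the generator can be replicated in a few lines and compared against the real class. This is a sketch: LcgReplica is a hypothetical name, and the initial scrambling of the seed (XOR with the multiplier) follows the algorithm documented for Random.setSeed.

```java
import java.util.Random;

final class LcgReplica {
    private static final long MULTIPLIER = 0x5DEECE66DL; // a
    private static final long ADDEND = 0xBL;             // c
    private static final long MASK = (1L << 48) - 1;     // m - 1, with m = 2^48

    private long state;

    LcgReplica(long seed) {
        state = (seed ^ MULTIPLIER) & MASK; // same initial scrambling as Random.setSeed
    }

    // X(n+1) = (a * X(n) + c) mod m, returning the top `bits` bits of the 48-bit state.
    int next(int bits) {
        state = (state * MULTIPLIER + ADDEND) & MASK;
        return (int) (state >>> (48 - bits));
    }

    public static void main(String[] args) {
        LcgReplica mine = new LcgReplica(42L);
        Random theirs = new Random(42L);
        for (int i = 0; i < 5; i++) {
            System.out.println(mine.next(32) + " == " + theirs.nextInt());
        }
    }
}
```

Note that none of the seedUniquifier constants appear here: they only influence which seed the no-argument constructor picks, not how numbers are generated afterwards.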
EDIT - Below original thoughts have since been clarified so can be disregarded but leaving it for reference
This leads me the conclusions:
The reference to the paper is not for the value itself but for the methods used to obtain the values due to the different values of a, c and m
It is mere coincidence that the value is otherwise the same other than the leading 1 and the comment is misplaced (still struggling to believe this though)
OR
There has been a serious misunderstanding of the tables in the paper and the developers have just chosen a value at random as by the time it is multiplied out what was the point in using the table value in the first place especially as you can just provide your own seed value any way in which case these values are not even taken into account
So to answer your question
Could other numbers have been chosen that would have worked as well as these two numbers? Why or why not?
Yes, any number could have been used, in fact if you specify a seed value when you Instantiate Random you are using any other value. This value does not have any effect on the performance of the generator, this is determined by the values of a,c and m which are hard coded within the class.
As per the link you provided, they have chosen (after adding the missing 1 :) ) the best yield from 2^64 because long can't have a number from 2^128
I have a list of values, keyed with doubles between 0 and 1 that represent how likely I think it is for a thing to be useful to me. For example, for getting an answer to a question:
0.5 call your mom
0.25 go to the library
0.6 StackOverflow
0.9 just Google it
So, we think that Googling it is (about) twice as likely to be helpful as asking your mom. When I attempt to figure out the next thing to do, I'd like "just Google it" to be returned about twice as often as "call your mom".
I've been searching for solutions with little success. Most of the things that I've found rely on having integer keys (like How to randomly select a key based on its Integer value in a Map with respect to the other values in O(n) time?), which I don't have and which I can't easily generate.
I feel like there should be some Java datatype that can do this for me. Any suggestions?
You can think of a solution based on the Java interface NavigableMap, and if you use the TreeMap implementation you will always get an O(log n) complexity.
You can use one of the following:
lowerEntry
ceilingEntry
floorEntry
higherEntry
Now you just need to extract random numbers with the right probability. For that I would refer to this post:
How to generate a random number from specified discrete distribution?
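One way the NavigableMap suggestion could look, as a sketch: each value is stored under the running total of the weights seen so far, and a random point in [0, total) is resolved with higherEntry. WeightedPicker and its methods are hypothetical names, and next() assumes at least one entry has been added.

```java
import java.util.NavigableMap;
import java.util.Random;
import java.util.TreeMap;

final class WeightedPicker<T> {
    private final NavigableMap<Double, T> cumulative = new TreeMap<>();
    private final Random random;
    private double total = 0.0;

    WeightedPicker(Random random) {
        this.random = random;
    }

    // Each entry is keyed by the running total, i.e. the upper end of its range.
    void add(double weight, T value) {
        if (weight <= 0) return; // zero-weight entries can never be picked
        total += weight;
        cumulative.put(total, value);
    }

    // O(log n): find the first range whose upper end lies above the random point.
    T next() {
        double point = random.nextDouble() * total; // in [0, total)
        return cumulative.higherEntry(point).getValue();
    }

    public static void main(String[] args) {
        WeightedPicker<String> picker = new WeightedPicker<>(new Random());
        picker.add(0.5, "call your mom");
        picker.add(0.25, "go to the library");
        picker.add(0.6, "StackOverflow");
        picker.add(0.9, "just Google it");
        System.out.println(picker.next());
    }
}
```

This works with the double weights as given, so no conversion to integer keys is needed.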
If I understood correctly, what you're looking for is a weighted random.
You should sum all your weights, and maybe normalize this to an integer value, so you will be able to use the rand.nextInt as suggested by comments.
Normalization can be done by multiplying by 100 for example, so your normalized weights are now:
50, 25, 60, 90 - The sum is 225.
You should define ranges:
0 - 49 is for "calling your mum"
50 - 74 is for "go to library"
75 - 134 is for "StackOverflow"
135 - 224 is for "just Google it"
Now you need to perform this.rand.nextInt(sum) - and get a value,
and this value should be mapped to one of the defined ranges.
If you keep track of what the total value of the probabilities is, you can do something like this:
double interval = 100;
double counter = 0;
double totalProbabilities = 2.25;
int randInt = new Random().nextInt((int)interval);
for (Element e: list) {
counter += (interval * e.probability() / totalProbabilities);
if (randInt < counter) {
return e.activity();
}
}