What's with 181783497276652981 and 8682522807148012 in Random (Java 7)? - java

Why were 181783497276652981 and 8682522807148012 chosen in Random.java?
Here's the relevant source code from Java SE JDK 1.7:
/**
 * Creates a new random number generator. This constructor sets
 * the seed of the random number generator to a value very likely
 * to be distinct from any other invocation of this constructor.
 */
public Random() {
    this(seedUniquifier() ^ System.nanoTime());
}

private static long seedUniquifier() {
    // L'Ecuyer, "Tables of Linear Congruential Generators of
    // Different Sizes and Good Lattice Structure", 1999
    for (;;) {
        long current = seedUniquifier.get();
        long next = current * 181783497276652981L;
        if (seedUniquifier.compareAndSet(current, next))
            return next;
    }
}

private static final AtomicLong seedUniquifier
    = new AtomicLong(8682522807148012L);
So, invoking new Random() without any seed parameter takes the current "seed uniquifier" and XORs it with System.nanoTime(). Then it uses 181783497276652981 to create another seed uniquifier to be stored for the next time new Random() is called.
The literals 181783497276652981L and 8682522807148012L are not extracted into named constants, and they don't appear anywhere else in the class.
At first the comment gives me an easy lead. Searching online for that article yields the actual article. 8682522807148012 doesn't appear in the paper, but 181783497276652981 does appear -- as a substring of another number, 1181783497276652981, which is 181783497276652981 with a 1 prepended.
The paper claims that 1181783497276652981 is a number that yields good "merit" for a linear congruential generator. Was this number simply mis-copied into Java? Does 181783497276652981 have an acceptable merit?
And why was 8682522807148012 chosen?
Searching online for either number yields no explanation, only this page that also notices the dropped 1 in front of 181783497276652981.
Could other numbers have been chosen that would have worked as well as these two numbers? Why or why not?

Was this number simply mis-copied into Java?
Yes, seems to be a typo.
Does 181783497276652981 have an acceptable merit?
This could be determined using the evaluation algorithm presented in the paper. But the merit of the "original" number is probably higher.
And why was 8682522807148012 chosen?
Seems to be random. It could be the result of System.nanoTime() when the code was written.
Could other numbers have been chosen that would have worked as well as these two numbers?
Not every number would be equally "good". So, no.
Seeding Strategies
There are differences in the default seeding scheme between different versions and implementations of the JRE.
public Random() { this(System.currentTimeMillis()); }
public Random() { this(++seedUniquifier + System.nanoTime()); }
public Random() { this(seedUniquifier() ^ System.nanoTime()); }
The first one is not acceptable if you create multiple RNGs in a row. If their creation times fall in the same millisecond range, they will give completely identical sequences. (same seed => same sequence)
The second one is not thread safe. Multiple threads can get identical RNGs when initializing at the same time. Additionally, seeds of subsequent initializations tend to be correlated. Depending on the actual timer resolution of the system, the seed sequence could be linearly increasing (n, n+1, n+2, ...). As stated in How different do random seeds need to be? and the referenced paper Common defects in initialization of pseudorandom number generators, correlated seeds can generate correlation among the actual sequences of multiple RNGs.
The third approach creates randomly distributed and thus uncorrelated seeds, even across threads and subsequent initializations.
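To make the first problem concrete, here is a minimal sketch (my own snippet, not JDK code) that seeds two generators the way the oldest scheme did; when both constructors run within the same millisecond, the two sequences come out identical:
import java.util.Random;

public class SameMillisecondSeed {
    public static void main(String[] args) {
        // Constructed back-to-back, these will usually fall into the same
        // millisecond, so they get the same seed and produce the same sequence.
        Random r1 = new Random(System.currentTimeMillis());
        Random r2 = new Random(System.currentTimeMillis());
        for (int i = 0; i < 3; i++) {
            System.out.println(r1.nextInt(100) + " vs " + r2.nextInt(100));
        }
    }
}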
So the current java docs:
This constructor sets the seed of the random number generator to a
value very likely to be distinct from any other invocation of this
constructor.
could be extended by "across threads" and "uncorrelated"
Seed Sequence Quality
But the randomness of the seeding sequence is only as good as the underlying RNG.
The RNG used for the seed sequence in this Java implementation uses a multiplicative linear congruential generator (MLCG) with c = 0 and m = 2^64. (The modulus 2^64 is implicitly given by the overflow of 64-bit long integers.)
Because of the zero c and the power-of-2 modulus, the "quality" (cycle length, bit-correlation, ...) is limited. As the paper says, besides the overall cycle length, every single bit has its own cycle length, which decreases exponentially for less significant bits. Thus, lower bits have a smaller repetition pattern. (The result of seedUniquifier() should be bit-reversed before it is truncated to 48 bits in the actual RNG.)
But it is fast! And to avoid unnecessary compare-and-set-loops, the loop body should be fast. This probably explains the usage of this specific MLCG, without addition, without xoring, just one multiplication.
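A quick sketch (my own illustration, not JDK code) of how short the low-bit cycles of this multiplier-only generator are; with these particular constants the low four bits of the state never change at all, while the high bits vary on every step:
public class SeedUniquifierLowBits {
    public static void main(String[] args) {
        long state = 8682522807148012L;
        for (int i = 0; i < 8; i++) {
            state *= 181783497276652981L;                 // mod 2^64 via long overflow
            System.out.println("low 4 bits: " + (state & 0xF)
                    + "   high 4 bits: " + (state >>> 60));
        }
    }
}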
And the mentioned paper presents a list of good "multipliers" for c = 0 and m = 2^64, including 1181783497276652981.
All in all: A for effort @ JRE developers ;) But there is a typo.
(But who knows, unless someone evaluates it, there is the possibility that the missing leading 1 actually improves the seeding RNG.)
But some multipliers are definitely worse:
"1" leads to a constant sequence.
"2" leads to a single-bit-moving sequence (somehow correlated)
...
The inter-sequence-correlation for RNGs is actually relevant for (Monte Carlo) Simulations, where multiple random sequences are instantiated and even parallelized. Thus a good seeding strategy is necessary to get "independent" simulation runs. Therefore the C++11 standard introduces the concept of a Seed Sequence for generating uncorrelated seeds.

If you consider that the equation used for the random number generator is:
X(n+1) = (a * X(n) + c) mod m
where X(n+1) is the next number, a is the multiplier, X(n) is the current number, c is the increment and m is the modulus.
If you look further into Random, a, c and m are defined in the header of the class
private static final long multiplier = 0x5DEECE66DL; //= 25214903917 -- 'a'
private static final long addend = 0xBL; //= 11 -- 'c'
private static final long mask = (1L << 48) - 1; //= 2 ^ 48 - 1 -- 'm'
and looking at the method protected int next(int bits), this is where the equation is implemented:
nextseed = (oldseed * multiplier + addend) & mask;
//X(n+1) = (X(n) * a + c ) mod m
This implies that the method seedUniquifier() is actually producing X(n); the first value produced after initialisation is 8682522807148012 * 181783497276652981 (mod 2^64), and this value is then XORed with System.nanoTime(). The algorithm is consistent with the equation above, but with X(0) = 8682522807148012, a = 181783497276652981, m = 2^64 and c = 0. And since the mod m is performed by the long overflow, the equation simply becomes X(n+1) = (a * X(n)) mod 2^64.
Looking at the paper, the value a = 1181783497276652981 is listed for m = 2^64, c = 0. So it appears to just be a typo, and the value 8682522807148012 for X(0) appears to be a seemingly randomly chosen number carried over from legacy code for Random, as seen here. But the merit of the chosen numbers could still be acceptable, although, as mentioned by Thomas B., probably not as "good" as the one in the paper.
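For illustration (my own snippet, not JDK code), the wrap-around multiplication and the XOR with the timer can be reproduced directly:
public class FirstDefaultSeed {
    public static void main(String[] args) {
        // The product silently wraps mod 2^64 because long arithmetic overflows.
        long firstUniquifier = 8682522807148012L * 181783497276652981L;
        long seed = firstUniquifier ^ System.nanoTime();
        System.out.println("uniquifier = " + firstUniquifier + ", seed = " + seed);
    }
}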
EDIT - The original thoughts below have since been clarified, so they can be disregarded, but I am leaving them for reference.
This led me to the following conclusions:
The reference to the paper is not for the value itself but for the methods used to obtain the values due to the different values of a, c and m
It is mere coincidence that the value is otherwise the same other than the leading 1 and the comment is misplaced (still struggling to believe this though)
OR
There has been a serious misunderstanding of the tables in the paper and the developers simply chose a value at random, since once it is multiplied out there is little point in using the table value in the first place, especially as you can just provide your own seed value anyway, in which case these values are not taken into account at all.
So to answer your question
Could other numbers have been chosen that would have worked as well as these two numbers? Why or why not?
Yes, any number could have been used; in fact, if you specify a seed value when you instantiate Random, you are already using some other value. The seed does not have any effect on the performance of the generator; that is determined by the values of a, c and m, which are hard-coded within the class.

As per the link you provided, they have chosen (after adding the missing 1 :) ) the multiplier with the best merit from the m = 2^64 table, because a long can't hold a number from the 2^128 table.

Related

How to generate Unique random number even application get closed in a day

The Requirement:
I need to generate 4-digit non-duplicate numbers; even if my application gets closed and restarted, the generated numbers must not be duplicated.
I don't want to store all previous numbers in any storage.
Is there any algorithm which has the highest possibility of producing mostly unique numbers in a day?
Thank you
Don't generate a random number. Instead, generate a sequential number from 0000 to 9999, and then obfuscate it using the technique described in https://stackoverflow.com/a/34420445/56778.
That way, the only thing you have to save is the next sequential number.
That example uses a multiplicative inverse to map the numbers from 0 to 100 to other numbers within the same range. Every number from 0 to 100 will be mapped to a unique number between 0 and 100. It's quick and easy, and you can change the mapping by changing the constants.
More information at http://blog.mischel.com/2017/06/20/how-to-generate-random-looking-keys/
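A minimal sketch of that idea adapted to the 0-9999 range (the multiplier below is my own arbitrary pick, not the one from the linked post; any value coprime with 10000 works):
import java.math.BigInteger;

public class ObfuscatedSequence {
    static final long MOD = 10_000;
    static final long MULTIPLIER = 6_377;                    // arbitrary, coprime with 10,000
    static final long INVERSE = BigInteger.valueOf(MULTIPLIER)
            .modInverse(BigInteger.valueOf(MOD)).longValue();

    static long obfuscate(long n)   { return (n * MULTIPLIER) % MOD; }
    static long deobfuscate(long n) { return (n * INVERSE) % MOD; }

    public static void main(String[] args) {
        // Only the sequential counter needs persisting; the obfuscated values
        // look random but are guaranteed unique within one full cycle of 10,000.
        for (long counter = 0; counter < 5; counter++) {
            long id = obfuscate(counter);
            System.out.printf("%d -> %04d (back: %d)%n", counter, id, deobfuscate(id));
        }
    }
}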
Ten thousand is too few
Generating more than a few random numbers within a range of 0 to 9,999 is not likely to go well. Depending on how many such numbers you need, you are quite likely to run into duplicates. That seems rather obvious.
Cryptographically-strong random number generator
As for generating the most randomness, you need a cryptographically-strong random number generator that produces non-deterministic output.
java.security.SecureRandom
Java provides such a beast in the java.security.SecureRandom class. Study the class documentation for options.
SecureRandom secureRandom = new SecureRandom();
Notice that a SecureRandom is a java.util.Random. On that class you will find convenient methods such as nextInt(int bound). You get a pseudorandom, uniformly distributed int value between 0 (inclusive) and the specified value (exclusive).
int r = secureRandom.nextInt( 10_000 ) ;
Let's try it.
SecureRandom secureRandom = new SecureRandom();
for ( int i = 1 ; i <= 20 ; i++ )
{
    System.out.println( secureRandom.nextInt( 10_000 ) ) ;
}
7299
3581
7106
1195
8713
9517
6954
5558
6136
1623
7006
2910
5855
6273
1691
588
5629
7347
7123
6973
If you need more than a few such numbers, or they absolutely must be distinct (no duplicates), then you must change your constraints.
For a small range like 10,000, keep a collection of generated numbers, checking each newly generated number to see if it has been used already (see the sketch after this list).
Dramatically expand your limit far beyond 10,000.
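A minimal sketch of the first option, assuming the set of already-used numbers can be persisted somewhere between runs (persistence is left out here):
import java.security.SecureRandom;
import java.util.HashSet;
import java.util.Set;

public class DistinctFourDigitIds {
    private final SecureRandom random = new SecureRandom();
    private final Set<Integer> used = new HashSet<>();     // would need persisting across restarts

    // Returns a 0-9999 value not handed out before; fails once all 10,000 are used.
    public int nextId() {
        if (used.size() == 10_000) {
            throw new IllegalStateException("all 4-digit ids exhausted");
        }
        int candidate;
        do {
            candidate = random.nextInt(10_000);
        } while (!used.add(candidate));                     // add() returns false if already present
        return candidate;
    }
}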
UUID
If you need a universally unique identifier, use, well, a Universally Unique Identifier (UUID). Java represents a UUID value with the java.util.UUID class.
UUIDs were designed for systems to be able to independently generate a virtually unique identifier without coordinating with a central authority. They are 128-bit values. That is quadruple the 32 bits of an int primitive in Java. UUIDs are often displayed to humans as a canonically-formatted 36-character hexadecimal string grouped with hyphens. Do not confuse this textual representation for a UUID value itself; a UUID is a 128-bit value, not text.
The Version 1 type of UUIDs are ideal, as they represent a point in both space and time by combining a date-time, a MAC address, and a small arbitrary number. This makes it virtually impossible to have duplicates. By “virtually impossible”, I mean literally astronomically-large numbers. Java does not bundle a Version 1 generator because of security/privacy concerns. You can add a library, make a web services call, or ask your database such as Postgres to generate one for you.
Or, for most cases where we need a relatively small number of instances, use the Version 4 UUID where 122 of the 128 bits are randomly generated. Java bundles a Version 4 generator. Simply call UUID.randomUUID().
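For example, in the same fragment style as the snippets above:
UUID uuid = UUID.randomUUID();                 // java.util.UUID, Version 4 (random)
String text = uuid.toString();                 // canonical 36-character hex-and-hyphen form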
I agree with Basil Bourque and others about whether what you propose is the "right" approach. However, if you really want to do what you are proposing, then one way to achieve your goal (generate as many numbers as possible within range in pseudorandom order without having to store all previous values generated):
find a random number generator that has a period roughly within the range that you require;
to generate the next ID, take the next number generated, discarding ones that are not within range.
So for four-digit numbers, one option would be an XORShift generator masked down to 14 bits (i.e. producing values in the range 1-16383, near enough to the target 1-9999):
private int seed = 1;

public int nextJobID() {
    do {
        seed = (seed ^ (seed << 5)) & 0x3fff;
        seed = (seed ^ (seed >>> 3)) & 0x3fff;
        seed = (seed ^ (seed << 7)) & 0x3fff;
    } while (seed >= 10000);
    return seed;
}
To generate a new sequence each "day", set 'seed' to any number between 1 and 16383. [It'll be the same sequence, just starting at a different point. You could vary the sequence a little e.g. by taking every nth job ID where n is a small number, reversing the pattern of shifts (do >>>, <<, >>> instead of <<, >>>, <<) or finding some other combination of shifts (not 5/3/7) that produce a complete period.]
This technique is guaranteed to produce all numbers in range in "random" order. It doesn't offer any other guarantee, e.g. that it will be the most efficient or produce the "highest quality" of randomness achievable. It's probably good enough -- if not as good as you can get -- given the requirements you set out.

Java Math.random period

I'm working on a big school project about random numbers, but I can't find the period for Math.random(). I have version 7.0.800.15 installed and I'm working on a Windows 10 computer. I've tried determining the period with a simple program which saves the first values of:
double num = Math.random();
in an array and then loops until it finds the same values in a row again, so that a full period would have passed, but with no result; the period is too long.
So my question is: What is the period of Math.random() on my version?
Or: Is there a way to determine the period using a simple program?
Edit: took away a source pointing to a page about JavaScript, it was not relevant
Java's Math.random() uses java.util.Random, a linear congruential generator with a modulus of 2^48. The period of such a pseudorandom generator with well-chosen parameters is equal to the modulus. Apparently the parameters in Java are sanely chosen, so in practice the period is 2^48.
Sources:
https://en.wikipedia.org/wiki/Linear_congruential_generator
http://www.javamex.com/tutorials/random_numbers/java_util_random_algorithm.shtml#.WKX-gRJ97dQ
The wiki on linear congruential generators cites Java (java.util.Random) as having a modulus of 2^48.
That is likely the period but you may need to read more about these types of random generators.
This question (How good is java.util.Random?) also cites the same period.
Just to add to the other answers and to comment a little more generally on random number generators and writing a program to determine what the period is, beware of the Birthday Paradox and the Gambler's Fallacy. If you generate some value x, the next number is still just as likely to be x as any other number, and the number of numbers you need to generate before you're likely to have a duplicate is actually surprisingly small (meaning that you could, in principle, start seeing some duplicates before the end of the period, which complicates writing a program to test this).
For probabilities up to 50% or so, the number of values you need to generate before the probability of a duplicate reaches p(n) can be approximated by sqrt(2m * p(n)), where m is the number of possible values. For a 32-bit integer, sqrt(2m * p(n)) = sqrt(2 * 2^32 * 0.5) = sqrt(2^32) = 65,536. There you have it - once you generate 65,536 numbers there's approximately a 50-50 chance you've generated a duplicate.
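A quick check of that approximation (my own snippet), for 32-bit values and for a 48-bit state like java.util.Random's:
public class BirthdayApproximation {
    // Approximate number of draws until a duplicate occurs with probability p,
    // when drawing uniformly from m possible values.
    static double drawsForDuplicate(double m, double p) {
        return Math.sqrt(2.0 * m * p);
    }

    public static void main(String[] args) {
        System.out.println(drawsForDuplicate(Math.pow(2, 32), 0.5));  // ~65,536
        System.out.println(drawsForDuplicate(Math.pow(2, 48), 0.5));  // ~16.8 million
    }
}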
Once you've generated 2^32 + 1 values, the Pigeonhole Principle specifies that you must have generated at least one duplicate (assuming, of course, that you're generating a 32-bit number).
You may also be interested in this question on whether you can count on random numbers to be unique.

First random number after setSeed in Java always similar

To give some context, I have been writing a basic Perlin noise implementation in Java, and when it came to implementing seeding, I had encountered a bug that I couldn't explain.
In order to generate the same random weight vectors each time for the same seed no matter which set of coordinates' noise level is queried and in what order, I generated a new seed (newSeed), based on a combination of the original seed and the coordinates of the weight vector, and used this as the seed for the randomization of the weight vector by running:
rnd.setSeed(newSeed);
weight = new NVector(2);
weight.setElement(0, rnd.nextDouble() * 2 - 1);
weight.setElement(1, rnd.nextDouble() * 2 - 1);
weight.normalize()
Where NVector is a self-made class for vector mathematics.
However, when run, the program generated very bad noise:
After some digging, I found that the first element of each vector was very similar (and so the first nextDouble() call after each setSeed() call) resulting in the first element of every vector in the vector grid being similar.
This can be proved by running:
Random ran = new Random();
long seed = Long.valueOf(args[0]);
int loops = Integer.valueOf(args[1]);
double avgFirst = 0.0, avgSecond = 0.0, avgThird = 0.0;
double lastFirst = 0.0, lastSecond = 0.0, lastThird = 0.0;
for (int i = 0; i < loops; i++)
{
    ran.setSeed(seed + i);
    double first = ran.nextDouble();
    double second = ran.nextDouble();
    double third = ran.nextDouble();
    avgFirst += Math.abs(first - lastFirst);
    avgSecond += Math.abs(second - lastSecond);
    avgThird += Math.abs(third - lastThird);
    lastFirst = first;
    lastSecond = second;
    lastThird = third;
}
System.out.println("Average first difference.: " + avgFirst / loops);
System.out.println("Average second Difference: " + avgSecond / loops);
System.out.println("Average third Difference.: " + avgSecond / loops);
This finds the average difference between the first, second and third random numbers generated after each setSeed() call, over a range of seeds specified by the program's arguments; for me it returned these results:
C:\java Test 462454356345 10000
Average first difference.: 7.44638117976783E-4
Average second Difference: 0.34131692827329957
Average third Difference.: 0.34131692827329957
C:\java Test 46245445 10000
Average first difference.: 0.0017196011123287126
Average second Difference: 0.3416750057190849
Average third Difference.: 0.3416750057190849
C:\java Test 1 10000
Average first difference.: 0.0021601598225344998
Average second Difference: 0.3409914232342002
Average third Difference.: 0.3409914232342002
Here you can see that the first average difference is significantly smaller than the rest, and seemingly decreasing with higher seeds.
As such, by adding a simple dummy call to nextDouble() before setting the weight vector, I was able to fix my perlin noise implementation:
rnd.setSeed(newSeed);
rnd.nextDouble();
weight.setElement(0, rnd.nextDouble() * 2 - 1);
weight.setElement(1, rnd.nextDouble() * 2 - 1);
Resulting in:
I would like to know why this bad variation in the first call to nextDouble() (I have not checked other types of randomness) occurs and/or to alert people to this issue.
Of course, it could just be an implementation error on my part, which I would be grateful to have pointed out to me.
The Random class is designed to be a low overhead source of pseudo-random numbers. But the consequence of the "low overhead" implementation is that the number stream has properties that are a long way off perfect ... from a statistical perspective. You have encountered one of the imperfections. Random is documented as being a Linear Congruential generator, and the properties of such generators are well known.
There are a variety of ways of dealing with this. For example, if you are careful you can hide some of the most obvious "poor" characteristics. (But you would be advised to run some statistical tests. You can't see non-randomness in the noise added to your second image, but it could still be there.)
Alternatively, if you want pseudo-random numbers that have guaranteed good statistical properties, then you should be using SecureRandom instead of Random. It has significantly higher overheads, but you can be assured that many "smart people" will have spent a lot of time on the design, testing and analysis of the algorithms.
Finally, it is relatively simple to create a subclass of Random that uses an alternative algorithm for generating the numbers; see link. The problem is that you have to select (or design) and implement an appropriate algorithm.
Calling this an "issue" is debatable. It is a well known and understood property of LCGs, and use of LCGs was a concious engineering choice. People want low overhead PRNGs, but low overhead PRNGs have poor properties. TANSTAAFL.
Certainly, this is not something that Oracle would contemplate changing in Random. Indeed, the reasons for not changing are stated clearly in the javadoc for the Random class.
"In order to guarantee this property, particular algorithms are specified for the class Random. Java implementations must use all the algorithms shown here for the class Random, for the sake of absolute portability of Java code."
This is a known issue. Similar seeds will generate similar first few values. Random wasn't really designed to be used this way. You are supposed to create an instance with a good seed and then generate a moderately sized sequence of "random" numbers.
Your current solution is ok - as long as it looks good and is fast enough. You can also consider using hashing/mixing functions which were designed to solve your problem (and then, optionally, using the output as seed). For example see: Parametric Random Function For 2D Noise Generation
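One way to do that, sketched below: the mix64 step is a SplitMix64-style finalizer, and the way the base seed and grid coordinates are combined is my own choice, so treat it as an illustration rather than a recipe.
import java.util.Random;

public class MixedSeedExample {
    // SplitMix64-style finalizer: spreads similar inputs to very different outputs.
    static long mix64(long z) {
        z ^= (z >>> 30);
        z *= 0xBF58476D1CE4E5B9L;
        z ^= (z >>> 27);
        z *= 0x94D049BB133111EBL;
        z ^= (z >>> 31);
        return z;
    }

    // Derive a well-distributed per-cell seed from the base seed and grid coordinates.
    static long cellSeed(long baseSeed, int x, int y) {
        return mix64(baseSeed ^ mix64(x * 0x9E3779B97F4A7C15L + y));
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        rnd.setSeed(cellSeed(42L, 3, 7));
        // With the mixed seed, the first nextDouble() no longer tracks the raw
        // (seed, x, y) values, so neighbouring cells get unrelated first values.
        System.out.println(rnd.nextDouble() * 2 - 1);
    }
}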
Move your setSeed out of the loop. Java's PRNG is a linear congruential generator, so seeding it with sequential values is guaranteed to give results that are correlated across iterations of the loop.
ADDENDUM
I dashed that off before running out the door to a meeting, and now have time to illustrate what I was saying above.
I've written a little Ruby script which implements Schrage's portable prime modulus multiplicative linear congruential generator. I instantiate two copies of the LCG, both seeded with a value of 1. However, in each iteration of the output loop I reseed the second one based on the loop index. Here's the code:
# Implementation of a Linear Congruential Generator (LCG)
class LCG
  attr_reader :state

  M = (1 << 31) - 1   # Modulus = 2**31 - 1, which is prime

  # constructor requires setting a seed value to use as initial state
  def initialize(seed)
    reseed(seed)
  end

  # users can explicitly reset the seed.
  def reseed(seed)
    @state = seed.to_i
  end

  # Schrage's portable prime modulus multiplicative LCG
  def value
    @state = 16807 * @state % M
    # return the generated integer value AND its U(0,1) mapping as an array
    [@state, @state.to_f / M]
  end
end

if __FILE__ == $0
  # create two instances of LCG, both initially seeded with 1
  mylcg1 = LCG.new(1)
  mylcg2 = LCG.new(1)
  puts " default progression        manual reseeding"
  10.times do |n|
    mylcg2.reseed(1 + n)   # explicitly reseed 2nd LCG based on loop index
    printf "%d %11d %f %11d %f\n", n, *mylcg1.value, *mylcg2.value
  end
end
and here's the output it produces:
default progression manual reseeding
0 16807 0.000008 16807 0.000008
1 282475249 0.131538 33614 0.000016
2 1622650073 0.755605 50421 0.000023
3 984943658 0.458650 67228 0.000031
4 1144108930 0.532767 84035 0.000039
5 470211272 0.218959 100842 0.000047
6 101027544 0.047045 117649 0.000055
7 1457850878 0.678865 134456 0.000063
8 1458777923 0.679296 151263 0.000070
9 2007237709 0.934693 168070 0.000078
The columns are iteration number followed by the underlying integer generated by the LCG and the result when scaled to the range (0,1). The left set of columns show the natural progression of the LCG when allowed to proceed on its own, while the right set show what happens when you reseed on each iteration.

Please explain me the role of seed in class Random in java.util

Whenever we create an object of the Random class in Java, we use either of the constructors:
Random()
Random(long seed)
What is the purpose of the seed in the 2nd constructor, and how can I use it to my benefit, i.e. manipulate its use?
The answer above sums it up clearly. As per java api docs from oracle, the first constructor
Random()
"Creates a new random number generator. This constructor sets the seed of the random number generator to a value very likely to be distinct from any other invocation of this constructor. "
The seed is probably a derivative of the current time, or the current time itself. That should be enough to be "very likely to be distinct from any other invocation". Which, in essence, is most likely what you need, most of the time.
So why have another constructor that takes a seed?
Simply put, if you want to generate the same set of random numbers over and over, you use the same seed on your Random constructor. This is useful when doing experiments on different control sets, and you don't want to bother creating your own table of random inputs, but still want the same set of random input on a different experiment/control set.
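A minimal illustration of that reproducibility (the seed value 12345 is an arbitrary choice):
import java.util.Random;

public class SameSeedDemo {
    public static void main(String[] args) {
        Random a = new Random(12345L);
        Random b = new Random(12345L);
        System.out.println(a.nextInt(100) == b.nextInt(100));   // true: same seed, same sequence
        System.out.println(a.nextInt(100) == b.nextInt(100));   // still true on every subsequent call
    }
}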
There's no such thing as truly random numbers in computing. The methods available for getting a random number across programming languages are nothing but algorithms that simulate random numbers.
In some languages (C++, I know for sure), an unseeded random number generator will return the same series of numbers on every fresh execution of the program.
What is common is to seed the random number generator with the current time (which will be random enough for most purposes) so that the algorithm starts with a random number each time.
Pseudo-random number generators maintain some set of state information, which is advanced through some recurrence relation to determine the next value of the state. The output of a PRNG is some function of the state. Java's Random class uses a Linear Congruential Generator. LCGs work using the recurrence relationship U(i+1) = (A * U(i) + C) % M for some constant integer values A, C, and M. Java's current implementation uses a 48-bit state but uses 32 bits or less of it on each iteration of the recurrence.
Based on this, you can see that if you start with the same state you will get the exact same sequence of values out of your PRNG. This can be useful if you want to be able to reproduce exactly the same sequence of "randomness", for instance for debugging or for comparing two experiments head-to-head.
If you invoke the constructor without an argument, it picks a starting value for the state with a promise that different invocations are very likely to be distinct from each other. If you supply a seed to the constructor, that seed's value is used to set the initial state.

Why does java.util.Random use the mask?

Simplified (i.e., leaving concurrency out) Random.next(int bits) looks like
protected int next(int bits) {
    seed = (seed * multiplier + addend) & mask;
    return (int) (seed >>> (48 - bits));
}
where masking gets used to reduce the seed to 48 bits. Why is it better than just
protected int next(int bits) {
    seed = seed * multiplier + addend;
    return (int) (seed >>> (64 - bits));
}
? I've read quite a lot about random numbers, but see no reason for this.
The reason is that the lower bits tend to have a lower period (at least with the algorithm Java uses)
From Wikipedia - Linear Congruential Generator:
As shown above, LCG's do not always use all of the bits in the values they produce. The Java implementation produces 48 bits with each iteration but only returns the 32 most significant bits from these values. This is because the higher-order bits have longer periods than the lower order bits (see below). LCG's that use this technique produce much better values than those that do not.
edit:
after further reading (conveniently, on Wikipedia), the values of a, c, and m must satisfy these conditions to have the full period:
c and m must be relatively prime
a-1 is divisible by all prime factors of m
a-1 is a multiple of 4 if m is a multiple of 4
The only one that I can clearly tell is still satisfied is #3. #1 and #2 need to be checked, and I have a feeling that one (or both) of these fail.
From the docs at the top of java.util.Random:
The algorithm is described in The Art of Computer Programming,
Volume 2 by Donald Knuth in Section 3.2.1. It is a 48-bit seed,
linear congruential formula.
So the entire algorithm is designed to operate on 48-bit seeds, not 64-bit ones. I guess you can take it up with Mr. Knuth ;p
From wikipedia (the quote alluded to by the quote that #helloworld922 posted):
A further problem of LCGs is that the lower-order bits of the generated sequence have a far shorter period than the sequence as a whole if m is set to a power of 2. In general, the nth least significant digit in the base b representation of the output sequence, where b^k = m for some integer k, repeats with at most period b^n.
And furthermore, it continues (my italics):
The low-order bits of LCGs when m is a power of 2 should never be relied on for any degree of randomness whatsoever. Indeed, simply substituting 2^n for the modulus term reveals that the low order bits go through very short cycles. In particular, any full-cycle LCG when m is a power of 2 will produce alternately odd and even results.
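That last property is easy to see directly with Random's own constants (my own sketch, iterating the 48-bit recurrence by hand rather than through Random itself):
public class LowBitAlternation {
    public static void main(String[] args) {
        final long multiplier = 0x5DEECE66DL;
        final long addend = 0xBL;
        final long mask = (1L << 48) - 1;
        long seed = 42;                                    // arbitrary start value
        for (int i = 0; i < 8; i++) {
            seed = (seed * multiplier + addend) & mask;
            System.out.print((seed & 1) + " ");            // prints 1 0 1 0 ... (strictly alternating)
        }
        System.out.println();
    }
}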
In the end, the reason is probably historical: the folks at Sun wanted something to work reliably, and the Knuth formula gave 32 significant bits. Note that the java.util.Random API says this (my italics):
If two instances of Random are created with the same seed, and the same sequence of method calls is made for each, they will generate and return identical sequences of numbers. In order to guarantee this property, particular algorithms are specified for the class Random. Java implementations must use all the algorithms shown here for the class Random, for the sake of absolute portability of Java code. However, subclasses of class Random are permitted to use other algorithms, so long as they adhere to the general contracts for all the methods.
So we're stuck with it as a reference implementation. However that doesn't mean you can't use another generator (and subclass Random or create a new class):
from the same Wikipedia page:
MMIX by Donald Knuth: m = 2^64, a = 6364136223846793005, c = 1442695040888963407
There's a 64-bit formula for you.
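A rough sketch of what such a subclass could look like (the class name and structure are mine, not from any library; note that it deliberately departs from Random's documented algorithm, so it gives up the "absolute portability" the javadoc talks about):
import java.util.Random;

public class Mmix64Random extends Random {
    private long state;

    public Mmix64Random(long seed) {
        this.state = seed;
    }

    @Override
    protected int next(int bits) {
        // X(n+1) = (a * X(n) + c) mod 2^64; the mod is the natural long overflow.
        state = state * 6364136223846793005L + 1442695040888963407L;
        // Return the high bits: with a power-of-two modulus the low bits have short periods.
        return (int) (state >>> (64 - bits));
    }

    // Note: setSeed(long) is not overridden in this sketch, so reseeding an
    // existing instance would not affect 'state'.
}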
Random numbers are tricky (as Knuth notes) and depending on your needs, you might be fine with just calling java.util.Random twice and concatenating the bits if you need a 64-bit number. If you really care about the statistical properties, use something like Mersenne Twister, or if you care about information leakage / unpredictability use java.security.SecureRandom.
It doesn't look like there was a good reason for doing this.
Applying the mask is a conservative approach using a proven design.
Leaving it out most probably leads to a better generator, however, without knowing the math well, it's a risky step.
Another small advantage of masking is a speed gain on 8-bit architectures, since it uses 6 bytes instead of 8.
