I'm working on a big school project about random numbers, but I can't find the period for Math.random(). I have version 7.0.800.15 installed and I'm working on a Windows 10 computer. I've tried determining the period with a simple program which saves the first values of:
double num = Math.random();
in an array and then loops until it finds the same values in a row again, at which point one full period would have passed. It produced no result; the period is too long.
So my question is: What is the period of Math.random() on my version?
Or: Is there a way to determine the period using a simple program?
Edit: took away a source pointing to a page about JavaScript, it was not relevant
Java's Math.random() uses a linear congruential generator with a modulus of 2^48. The period of such a pseudorandom generator with well-chosen parameters is equal to the modulus. Apparently the parameters in Java are sanely chosen, so in practice the period is 2^48.
Sources:
https://en.wikipedia.org/wiki/Linear_congruential_generator
http://www.javamex.com/tutorials/random_numbers/java_util_random_algorithm.shtml#.WKX-gRJ97dQ
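To see why the brute-force approach from the question works in principle but not in practice, here is a minimal sketch that measures the period of a scaled-down generator. It assumes the multiplier 0x5DEECE66D and addend 0xB quoted for java.util.Random, and shrinks the modulus to 2^16 purely so the loop finishes quickly; the class and method names are made up for illustration.

```java
// Brute-force period measurement for a scaled-down linear congruential generator.
// Assumption: multiplier and addend are the java.util.Random constants; the small
// modulus is chosen only so that the loop terminates in reasonable time.
class LcgPeriodDemo {
    static final long MULTIPLIER = 0x5DEECE66DL;
    static final long ADDEND = 0xBL;

    /** Counts steps until the state returns to its starting value, modulo 2^bits. */
    static long period(int bits) {
        long mask = (1L << bits) - 1;
        long start = 1L;
        long state = start;
        long steps = 0;
        do {
            state = (state * MULTIPLIER + ADDEND) & mask;
            steps++;
        } while (state != start);
        return steps;
    }

    public static void main(String[] args) {
        System.out.println("period mod 2^16 = " + period(16));
        System.out.println("period mod 2^20 = " + period(20));
    }
}
```

Each extra bit of modulus doubles the running time, which is why the same loop is hopeless at 48 bits on a school-project budget.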
The wiki on linear congruential generator cites Java (java.util.Random) as having a modulus of 2^48.
That is likely the period but you may need to read more about these types of random generators.
This question (How good is java.util.Random?) also cites the same period.
Just to add to the other answers and to comment a little more generally on random number generators and writing a program to determine what the period is, beware of the Birthday Paradox and the Gambler's Fallacy. If you generate some value x, the next number is still just as likely to be x as any other number, and the number of numbers you need to generate before you're likely to have a duplicate is actually surprisingly small (meaning that you could, in principle, start seeing some duplicates before the end of the period, which complicates writing a program to test this).
The number of values you need to generate to reach a given probability of a duplicate (for probabilities up to 50% or so) can be approximated by sqrt(2m * p(n)), where p(n) is the probability you're trying to calculate and m is the number of choices. For a 32-bit integer, sqrt(2m * p(n)) = sqrt(2 * 2^32 * 0.5) = sqrt(2^32) = 65,536. There you have it - once you generate 65,536 numbers there's approximately a 50-50 chance you've generated a duplicate.
Once you've generated 2^32 + 1 values, the Pigeonhole Principle specifies that you must have generated at least one duplicate (assuming, of course, that you're generating a 32-bit number).
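The arithmetic above can be checked with a short sketch. It evaluates the square-root approximation and, for comparison, the exact duplicate probability after n draws; the class and method names are hypothetical. Note that the square-root formula is a first-order approximation, so the exact probability at 65,536 draws comes out somewhat below 50% (closer to 40%).

```java
class BirthdayEstimate {
    /** Approximate number of draws for duplicate probability p among m values: sqrt(2 * m * p). */
    static double approxDraws(double m, double p) {
        return Math.sqrt(2.0 * m * p);
    }

    /** Exact probability of at least one duplicate after n uniform draws from m values. */
    static double duplicateProbability(long n, double m) {
        double logNoDuplicate = 0.0;
        for (long i = 1; i < n; i++) {
            // The i-th extra draw avoids the i values already seen with probability 1 - i/m.
            logNoDuplicate += Math.log1p(-i / m);
        }
        return 1.0 - Math.exp(logNoDuplicate);
    }

    public static void main(String[] args) {
        double m = Math.pow(2, 32);
        System.out.println("approximate draws for p = 0.5: " + approxDraws(m, 0.5));
        System.out.println("exact probability at 65,536 draws: " + duplicateProbability(65_536, m));
    }
}
```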
You may also be interested in this question on whether you can count on random numbers to be unique.
The Requirement:
I need to generate 4-digit numbers with no duplicates. Even if my application is closed and restarted, the generated numbers must not repeat.
I don't want to store all the previous numbers in any storage.
Is there an algorithm with the highest chance of producing the most unique numbers in a day?
Thank you
Don't generate a random number. Instead, generate a sequential number from 0000 to 9999, and then obfuscate it using the technique described in https://stackoverflow.com/a/34420445/56778.
That way, the only thing you have to save is the next sequential number.
That example uses a multiplicative inverse to map the numbers from 0 to 100 to other numbers within the same range. Every number from 0 to 100 will be mapped to a unique number between 0 and 100. It's quick and easy, and you can change the mapping by changing the constants.
More information at http://blog.mischel.com/2017/06/20/how-to-generate-random-looking-keys/
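A minimal sketch of that idea for the 0000 to 9999 range follows. The multiplier 3467 is an arbitrary assumption (any value sharing no factor with 10,000 works), and the class and method names are made up; this is not the code from the linked answer.

```java
import java.math.BigInteger;

class ObfuscatedCounter {
    static final int RANGE = 10_000;
    // Hypothetical multiplier: must share no factor with 10,000 (so odd and not a multiple of 5).
    static final int MULTIPLIER = 3_467;
    // Multiplicative inverse of MULTIPLIER modulo RANGE, used to undo the mapping.
    static final int INVERSE = BigInteger.valueOf(MULTIPLIER)
            .modInverse(BigInteger.valueOf(RANGE)).intValue();

    /** Maps a sequential number in [0, RANGE) to a unique scrambled number in the same range. */
    static int obfuscate(int sequential) {
        return (int) ((long) sequential * MULTIPLIER % RANGE);
    }

    /** Recovers the sequential number from its scrambled form. */
    static int recover(int obfuscated) {
        return (int) ((long) obfuscated * INVERSE % RANGE);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.printf("%04d -> %04d%n", n, obfuscate(n));
        }
    }
}
```

Only the next sequential value has to be persisted between runs. One quirk of a pure multiplication is that 0 always maps to 0, so you may prefer to start the counter at 1.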
Ten thousand is too few
Generating more than a few random numbers within a range of 0 to 9,999 is not likely to go well. Depending on how many such numbers you need, you are quite likely to run into duplicates. That seems rather obvious.
Cryptographically-strong random number generator
As for generating the most randomness, you need a cryptographically-strong random number generator that produces non-deterministic output.
java.security.SecureRandom
Java provides such a beast in the java.security.SecureRandom class. Study the class documentation for options.
SecureRandom secureRandom = new SecureRandom();
Notice that a SecureRandom is a java.util.Random. On that interface you will find convenient methods such as nextInt(int bound). You get a pseudorandom, uniformly distributed int value between 0 (inclusive) and the specified value (exclusive).
int r = secureRandom.nextInt( 10_000 ) ;
Let's try it.
SecureRandom secureRandom = new SecureRandom();
for( int i = 1 ; i <= 20 ; i ++ )
{
    System.out.println( secureRandom.nextInt( 10_000 ) ) ;
}
7299
3581
7106
1195
8713
9517
6954
5558
6136
1623
7006
2910
5855
6273
1691
588
5629
7347
7123
6973
If you need more than a few such numbers, or they absolutely must be distinct (no duplicates), then you must change your constraints.
For a small range like 10,000, keep a collection of generated numbers, checking each newly generated number to see if it has been used already.
Dramatically expand your limit far beyond 10,000.
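The first option can be sketched as follows, assuming an in-memory Set is acceptable for tracking used values (to survive a restart, the set would have to be persisted somewhere). The class and method names are hypothetical.

```java
import java.security.SecureRandom;
import java.util.LinkedHashSet;
import java.util.Set;

class DistinctNumbers {
    /** Draws until `count` distinct values in [0, bound) have been collected. */
    static Set<Integer> generate(int count, int bound) {
        if (count > bound) {
            throw new IllegalArgumentException("cannot draw more distinct values than the range holds");
        }
        SecureRandom secureRandom = new SecureRandom();
        Set<Integer> used = new LinkedHashSet<>();
        while (used.size() < count) {
            // add() leaves the set unchanged when the value was already generated
            used.add(secureRandom.nextInt(bound));
        }
        return used;
    }

    public static void main(String[] args) {
        System.out.println(generate(20, 10_000));
    }
}
```

As the set fills up, more and more draws are rejected, so this approach slows down sharply when count approaches bound.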
UUID
If you need a universally unique identifier, use, well, a Universally Unique Identifier (UUID). Java represents a UUID value with the java.util.UUID class.
UUIDs were designed for systems to be able to independently generate a virtually unique identifier without coordinating with a central authority. They are 128-bit values. That is quadruple the 32 bits of an int integer primitive in Java. UUIDs are often displayed to humans as a canonically-formatted 36-character hexadecimal string grouped with hyphens. Do not confuse this textual representation for a UUID value itself; a UUID is a 128-bit value, not text.
The Version 1 type of UUIDs are ideal, as they represent a point in both space and time by combining a date-time, a MAC address, and a small arbitrary number. This makes it virtually impossible to have duplicates. By “virtually impossible”, I mean literally astronomically-large numbers. Java does not bundle a Version 1 generator because of security/privacy concerns. You can add a library, make a web services call, or ask your database such as Postgres to generate one for you.
Or, for most cases where we need a relatively small number of instances, use the Version 4 UUID where 122 of the 128 bits are randomly generated. Java bundles a Version 4 generator. Simply call UUID.randomUUID.
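For illustration, a short sketch of the bundled Version 4 generator; the wrapper class and the newId method are made up, while UUID.randomUUID, version and toString are standard java.util.UUID methods.

```java
import java.util.UUID;

class UuidDemo {
    /** Returns a freshly generated Version 4 (random) UUID. */
    static UUID newId() {
        return UUID.randomUUID();
    }

    public static void main(String[] args) {
        UUID id = newId();
        System.out.println(id);            // canonical 36-character text form
        System.out.println(id.version());  // version field of the UUID, 4 for random UUIDs
    }
}
```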
I agree with Basil Bourque and others about whether what you propose is the "right" approach. However, if you really want to do what you are proposing, then one way to achieve your goal (generate as many numbers as possible within range in pseudorandom order without having to store all previous values generated):
find a random number generator that has a period roughly within the range that you require;
to generate the next ID, take the next number generated, discarding ones that are not within range.
So for four digit numbers, one option would be an XORShift generator configured to generate 14 bit numbers (i.e. in the range 1-16383, near enough to 1-9999):
private int seed = 1;
public int nextJobID() {
    do {
        seed = (seed ^ (seed << 5)) & 0x3fff;
        seed = (seed ^ (seed >>> 3)) & 0x3fff;
        seed = (seed ^ (seed << 7)) & 0x3fff;
    } while (seed >= 10000);
    return seed;
}
To generate a new sequence each "day", set 'seed' to any number between 1 and 16383. [It'll be the same sequence, just starting at a different point. You could vary the sequence a little e.g. by taking every nth job ID where n is a small number, reversing the pattern of shifts (do >>>, <<, >>> instead of <<, >>>, <<) or finding some other combination of shifts (not 5/3/7) that produce a complete period.]
This technique is guaranteed to produce all numbers in range in "random" order. It doesn't offer any other guarantee, e.g. that it will be the most efficient or produce the "highest quality" of randomness achievable. It's probably good enough -- if not as good as you can get -- given the requirements you set out.
I have a system which communicates with an external system via web services. We send random numbers as message IDs, and the same value is stored as the primary key of a table in our database. The problem is that, since we have approximately 80-90k calls daily, I have seen many exceptions reporting a duplicate primary key. I am generating the random numbers in Java. How can I be sure that whatever random number I generate will not be duplicated?
Below is the code for generating the random numbers:
private static int getRandomNumberInRange(int min, int max) {
    if (min >= max) {
        throw new IllegalArgumentException("max must be greater than min");
    }
    Random r = new Random();
    return r.nextInt((max - min) + 1) + min;
}
There's nothing wrong with using a random number as a primary key. You just need to make sure that numbers are chosen from a range large enough that the chance of picking a number more than once is virtually zero.
If you generate 100k identifiers per day for 30 years, that's about 1 billion identifiers. So, using a 100-bit number will make a collision virtually impossible over that time. 13 bytes, or maybe 12 if you feel lucky.
I define "virtually zero" as 2^-40. There's not much point in defining it as less than 2^-50, because things like RAM and hard drives are more likely than that to suffer undetected errors. When you have to satisfy a uniqueness constraint, estimates involving a 50% chance of collision are useless.
There is nothing magic about UUIDs. They are just 122-bit numbers with a verbose encoding. They will work, but they are overkill for this application.
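The sizing argument can be sketched numerically. The code below assumes the usual birthday approximation, where the collision probability for k random identifiers of a given bit width is about k^2 / 2^(bits + 1), and reports it as a power of two; the class and method names are hypothetical.

```java
class CollisionEstimate {
    /** log2 of the approximate collision probability k^2 / 2^(bits + 1) for k random identifiers. */
    static double log2CollisionProbability(double k, int bits) {
        double log2k = Math.log(k) / Math.log(2);
        return 2.0 * log2k - (bits + 1);
    }

    public static void main(String[] args) {
        double k = 100_000.0 * 365 * 30; // 100k identifiers per day for 30 years
        for (int bits : new int[] {32, 64, 96, 100, 104, 122}) {
            System.out.printf("%3d bits: collision probability ~ 2^%.1f%n",
                    bits, log2CollisionProbability(k, bits));
        }
    }
}
```

Under this approximation, 100 bits lands just below the 2^-40 threshold, while 96 bits (12 bytes) falls a little short of it.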
You need to use a large random number, and a good source of randomness. int is not large enough, and you're restricting your range to less than that with your min and max.
The rule of thumb is that you should expect a 50% chance of collision for every 2^(n/2) numbers, where n is the number of bits in your random number.
The Random class in java.util isn't a good source for truly random numbers (among other problems, it uses a 48 bit seed). You should use SecureRandom, and at least a long. You should also construct it outside your method to avoid the overhead of initialisation.
As others have suggested, a UUID would solve your problem.
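A minimal sketch of that advice, assuming the identifier can be stored as a long or, for a wider value, as a BigInteger. The class and method names are made up; SecureRandom.nextLong and the BigInteger(int, Random) constructor are standard library calls.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

class MessageIds {
    // Constructed once and shared, so the initialisation cost is not paid on every call.
    private static final SecureRandom RANDOM = new SecureRandom();

    /** A 64-bit random identifier. */
    static long nextLongId() {
        return RANDOM.nextLong();
    }

    /** A non-negative random identifier with up to `bits` bits, e.g. 100 or 128. */
    static BigInteger nextWideId(int bits) {
        return new BigInteger(bits, RANDOM);
    }

    public static void main(String[] args) {
        System.out.println(nextLongId());
        System.out.println(nextWideId(100).toString(16));
    }
}
```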
I have a program in which I need to generate random numbers that determine various outputs (to explain the exact reason would take too long). In theory a high number (let's say 100,000) is a valid output for my program, but it's most likely (though not entirely impossible) going to end up being useless output.
I'd like to generate random numbers that are weighted to be around a "normalized" number.
For example I'd pick a number (10), and the majority of numbers that are randomly generated will be near 10. But there's a small chance the random number could be any integer. I currently just use a range when generating the numbers, but this bothers me since numbers outside this range could potentially be valid and useful output.
Is there an easy way to do this without introducing too much overhead or having to map percentage chances to individual integers?
For positive integers geometric, negative binomial, or Poisson are all possibilities. Java implementations are readily available for all of these.
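As one illustration, here is a sketch of a Poisson sampler using Knuth's multiplication method, which clusters results around the chosen mean while leaving any non-negative integer possible. The class name is made up and this is not taken from any particular library; for large means a library implementation would be a better choice.

```java
import java.util.Random;

class PoissonSampler {
    /** Knuth's multiplication method; practical for modest means such as 10. */
    static int sample(double mean, Random random) {
        double limit = Math.exp(-mean);
        double product = 1.0;
        int count = -1;
        do {
            // Multiply uniform draws until the product drops to exp(-mean) or below.
            count++;
            product *= random.nextDouble();
        } while (product > limit);
        return count;
    }

    public static void main(String[] args) {
        Random random = new Random();
        for (int i = 0; i < 20; i++) {
            System.out.print(sample(10.0, random) + " ");
        }
        System.out.println();
    }
}
```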
I would consider this more of a statistics problem than a programming one. I think you want a logarithmic distribution. Here's an example Java implementation.
I am working on a project that needs to generate two random numbers from a given range (both of them at the same time, one after another) and check if they are equal to each other - if they are, proceed executing other code; if they aren't - generate the numbers again. Now my question is, if we have a range [0;10], and the first randomly generated number turned out to be 5, is the probability of the second number also being 5 as good as any other number? Specifically, does Math.random() have any "defense" against generating same number if it is called twice consecutively? or it "tries" to not generate the same number?
Generating the same number in the range [0,10] twice in succession is a perfectly valid occurrence for any random number generator. If it took any steps to prevent that it wouldn't be random.
On any invocation, the chances of any individual number being chosen should be 1:11, and each choice should be independent of previous choices, so the chances that in a pair the second number matches the first is 1 in 11.
As to how random Math.random() is, it's pseudo-random, meaning it uses an algorithm to generate a series of evenly distributed numbers starting with a "seed" value. It's not suitable for cryptography but quite good for simulations and other non-cryptographic uses.
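A quick simulation makes the 1-in-11 figure concrete. This sketch uses java.util.Random.nextInt(11) to draw from [0, 10] rather than scaling Math.random(), which is an assumption about how the numbers are produced; the class and method names are hypothetical.

```java
import java.util.Random;

class PairMatchDemo {
    /** Fraction of trials in which two consecutive draws from 0..10 are equal. */
    static double matchRate(int trials, Random random) {
        int matches = 0;
        for (int i = 0; i < trials; i++) {
            int first = random.nextInt(11);   // 0..10 inclusive
            int second = random.nextInt(11);
            if (first == second) {
                matches++;
            }
        }
        return (double) matches / trials;
    }

    public static void main(String[] args) {
        System.out.println("observed: " + matchRate(1_000_000, new Random()));
        System.out.println("expected: " + 1.0 / 11);
    }
}
```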
Why were 181783497276652981 and 8682522807148012 chosen in Random.java?
Here's the relevant source code from Java SE JDK 1.7:
/**
 * Creates a new random number generator. This constructor sets
 * the seed of the random number generator to a value very likely
 * to be distinct from any other invocation of this constructor.
 */
public Random() {
    this(seedUniquifier() ^ System.nanoTime());
}
private static long seedUniquifier() {
    // L'Ecuyer, "Tables of Linear Congruential Generators of
    // Different Sizes and Good Lattice Structure", 1999
    for (;;) {
        long current = seedUniquifier.get();
        long next = current * 181783497276652981L;
        if (seedUniquifier.compareAndSet(current, next))
            return next;
    }
}
private static final AtomicLong seedUniquifier
    = new AtomicLong(8682522807148012L);
So, invoking new Random() without any seed parameter takes the current "seed uniquifier" and XORs it with System.nanoTime(). Then it uses 181783497276652981 to create another seed uniquifier to be stored for the next time new Random() is called.
The literals 181783497276652981L and 8682522807148012L are not placed in constants, but they don't appear anywhere else.
At first the comment gives me an easy lead. Searching online for that article yields the actual article. 8682522807148012 doesn't appear in the paper, but 181783497276652981 does appear -- as a substring of another number, 1181783497276652981, which is 181783497276652981 with a 1 prepended.
The paper claims that 1181783497276652981 is a number that yields good "merit" for a linear congruential generator. Was this number simply mis-copied into Java? Does 181783497276652981 have an acceptable merit?
And why was 8682522807148012 chosen?
Searching online for either number yields no explanation, only this page that also notices the dropped 1 in front of 181783497276652981.
Could other numbers have been chosen that would have worked as well as these two numbers? Why or why not?
Was this number simply mis-copied into Java?
Yes, seems to be a typo.
Does 181783497276652981 have an acceptable merit?
This could be determined using the evaluation algorithm presented in the paper. But the merit of the "original" number is probably higher.
And why was 8682522807148012 chosen?
Seems to be random. It could be the result of System.nanoTime() when the code was written.
Could other numbers have been chosen that would have worked as well as these two numbers?
Not every number would be equally "good". So, no.
Seeding Strategies
There are differences in the default seeding scheme between different versions and implementations of the JRE.
public Random() { this(System.currentTimeMillis()); }
public Random() { this(++seedUniquifier + System.nanoTime()); }
public Random() { this(seedUniquifier() ^ System.nanoTime()); }
The first one is not acceptable if you create multiple RNGs in a row. If their creation times fall in the same millisecond range, they will give completely identical sequences. (same seed => same sequence)
The second one is not thread safe. Multiple threads can get identical RNGs when initializing at the same time. Additionally, seeds of subsequent initializations tend to be correlated. Depending on the actual timer resolution of the system, the seed sequence could be linearly increasing (n, n+1, n+2, ...). As stated in How different do random seeds need to be? and the referenced paper Common defects in initialization of pseudorandom number generators, correlated seeds can generate correlation among the actual sequences of multiple RNGs.
The third approach creates randomly distributed and thus uncorrelated seeds, even across threads and subsequent initializations.
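The "same seed => same sequence" point is easy to demonstrate with explicitly seeded generators. A small sketch, with a made-up helper name:

```java
import java.util.Random;

class SeedDemo {
    /** Returns true if generators seeded with seedA and seedB agree on their first `length` ints. */
    static boolean sameSequence(long seedA, long seedB, int length) {
        Random a = new Random(seedA);
        Random b = new Random(seedB);
        for (int i = 0; i < length; i++) {
            if (a.nextInt() != b.nextInt()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameSequence(123L, 123L, 1000)); // identical seeds
        System.out.println(sameSequence(123L, 124L, 1000)); // neighbouring seeds
    }
}
```

This is exactly what happens with the millisecond-based constructor when two generators are created within the same millisecond.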
So the current java docs:
This constructor sets the seed of the random number generator to a
value very likely to be distinct from any other invocation of this
constructor.
could be extended by "across threads" and "uncorrelated"
Seed Sequence Quality
But the randomness of the seeding sequence is only as good as the underlying RNG.
The RNG used for the seed sequence in this java implementation uses a multiplicative linear congruential generator (MLCG) with c=0 and m=2^64. (The modulus 2^64 is implicitly given by the overflow of 64bit long integers)
Because of the zero c and the power-of-2 modulus, the "quality" (cycle length, bit-correlation, ...) is limited. As the paper says, besides the overall cycle length, every single bit has its own cycle length, which decreases exponentially for less significant bits. Thus, lower bits have a smaller repetition pattern. (The result of seedUniquifier() should be bit-reversed, before it is truncated to 48-bits in the actual RNG)
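The short cycles of the low bits can be observed directly. This sketch iterates the same multiplication as seedUniquifier(), using the two constants from the quoted source, and counts how many steps pass before the lowest bits of the state repeat; the class and method names are made up.

```java
class LowBitsDemo {
    static final long MULTIPLIER = 181783497276652981L;
    static final long START = 8682522807148012L;

    /** Steps until the lowest `bits` bits of the state return to their starting pattern. */
    static int lowBitsPeriod(int bits) {
        long mask = (1L << bits) - 1;
        long state = START;
        int steps = 0;
        do {
            state *= MULTIPLIER; // long overflow acts as the implicit mod 2^64
            steps++;
        } while ((state & mask) != (START & mask));
        return steps;
    }

    public static void main(String[] args) {
        for (int bits : new int[] {8, 12, 16, 20}) {
            System.out.println("lowest " + bits + " bits repeat after " + lowBitsPeriod(bits) + " steps");
        }
        // The starting value is a multiple of 4 and the multiplier is odd,
        // so the two lowest bits of every state stay zero.
        System.out.println("trailing zero bits: " + Long.numberOfTrailingZeros(START * MULTIPLIER));
    }
}
```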
But it is fast! And to avoid unnecessary compare-and-set-loops, the loop body should be fast. This probably explains the usage of this specific MLCG, without addition, without xoring, just one multiplication.
And the mentioned paper presents a list of good "multipliers" for c=0 and m=2^64, as 1181783497276652981.
All in all: A for effort @ JRE-developers ;) But there is a typo.
(But who knows, unless someone evaluates it, there is the possibility that the missing leading 1 actually improves the seeding RNG.)
But some multipliers are definitely worse:
"1" leads to a constant sequence.
"2" leads to a single-bit-moving sequence (somehow correlated)
...
The inter-sequence-correlation for RNGs is actually relevant for (Monte Carlo) Simulations, where multiple random sequences are instantiated and even parallelized. Thus a good seeding strategy is necessary to get "independent" simulation runs. Therefore the C++11 standard introduces the concept of a Seed Sequence for generating uncorrelated seeds.
If you consider that the equation used for the random number generator is:
X(n+1) = (X(n) * a + c) mod m
where X(n+1) is the next number, a is the multiplier, X(n) is the current number, c is the increment and m is the modulus.
If you look further into Random, a, c and m are defined in the header of the class
private static final long multiplier = 0x5DEECE66DL; //= 25214903917 -- 'a'
private static final long addend = 0xBL; //= 11 -- 'c'
private static final long mask = (1L << 48) - 1; //= 2 ^ 48 - 1 -- 'm' - 1, i.e. m = 2 ^ 48
and looking at the method protected int next(int bits) this is where the equation is implemented
nextseed = (oldseed * multiplier + addend) & mask;
//X(n+1) = (X(n) * a + c ) mod m
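To tie the constants to observable behaviour, here is a minimal re-implementation sketch of the generator, following the algorithm described in the java.util.Random Javadoc (seed scrambling with the multiplier, then the LCG step, then taking the top bits). MiniRandom is a made-up name, and only nextInt() is covered.

```java
import java.util.Random;

class MiniRandom {
    private static final long MULTIPLIER = 0x5DEECE66DL; // 'a'
    private static final long ADDEND = 0xBL;             // 'c'
    private static final long MASK = (1L << 48) - 1;     // m - 1, with m = 2^48

    private long seed;

    MiniRandom(long seed) {
        // java.util.Random scrambles the supplied seed with the multiplier before first use
        this.seed = (seed ^ MULTIPLIER) & MASK;
    }

    /** One LCG step, returning the top `bits` bits of the 48-bit state. */
    int next(int bits) {
        seed = (seed * MULTIPLIER + ADDEND) & MASK;
        return (int) (seed >>> (48 - bits));
    }

    int nextInt() {
        return next(32);
    }

    public static void main(String[] args) {
        MiniRandom mine = new MiniRandom(42L);
        Random real = new Random(42L);
        for (int i = 0; i < 5; i++) {
            System.out.println(mine.nextInt() + " " + real.nextInt());
        }
    }
}
```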
This implies that the method seedUniquifier() is actually getting X(n+1), or in the first case at initialisation X(1), which is 8682522807148012 * 181783497276652981; this value is then modified further by the value of System.nanoTime(). This algorithm is consistent with the equation above but with the following: X(0) = 8682522807148012, a = 181783497276652981, m = 2 ^ 64 and c = 0. But as the mod m is performed by the long overflow, the above equation just becomes
X(n+1) = X(n) * a
Looking at the paper, the value of a = 1181783497276652981 is for m = 2 ^ 64, c = 0. So it appears to just be a typo, and the value 8682522807148012 for X(0) appears to be a seemingly randomly chosen number from legacy code for Random. As seen here. But the merit of these chosen numbers could still be valid but, as mentioned by Thomas B., probably not as "good" as the one in the paper.
EDIT - Below original thoughts have since been clarified so can be disregarded but leaving it for reference
This leads me the conclusions:
The reference to the paper is not for the value itself but for the methods used to obtain the values due to the different values of a, c and m
It is mere coincidence that the value is otherwise the same other than the leading 1 and the comment is misplaced (still struggling to believe this though)
OR
There has been a serious misunderstanding of the tables in the paper and the developers have just chosen a value at random. By the time it is multiplied out, what was the point in using the table value in the first place, especially as you can just provide your own seed value anyway, in which case these values are not even taken into account?
So to answer your question
Could other numbers have been chosen that would have worked as well as these two numbers? Why or why not?
Yes, any number could have been used; in fact, if you specify a seed value when you instantiate Random, you are using some other value. This value does not have any effect on the performance of the generator, which is determined by the values of a, c and m that are hard-coded within the class.
As per the link you provided, they have chosen (after adding the missing 1 :) ) the best yield from 2^64 because long can't have a number from 2^128