Which probability does method Random.next(int) work with? - java

In my Android app I want to generate an integer number between 0 and 100000 using method Random.next(int). I want to know what probability distribution this method uses. Is the probability the same for every value in the range?

Assuming you draw numbers with nextInt(100000), the results will broadly satisfy the statistical properties of a uniform distribution over the range [0, 100000), i.e. 0 to 99999 inclusive.
That is,
The probability of generating any particular number is equal, and independent of the previous numbers drawn.
The mean of a sample of such numbers will converge to roughly 100000 / 2.
The variance of a sample will converge to the variance of a uniform distribution.
But it's not perfect, or cryptographically secure. In particular you can observe autocorrelation effects. Depending on your specific requirements you may need to consider alternatives.
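If you want to convince yourself, here is a minimal check of my own (not part of the answer above): bucket a large number of nextInt(100000) draws into deciles and look at the counts, which should all come out roughly equal:
import java.util.Random;

public class UniformityCheck {
    public static void main(String[] args) {
        Random rnd = new Random();
        int[] buckets = new int[10];
        int samples = 1_000_000;
        for (int i = 0; i < samples; i++) {
            // nextInt(100_000) returns a value in [0, 100000), so /10_000 gives a bucket 0..9
            buckets[rnd.nextInt(100_000) / 10_000]++;
        }
        for (int b = 0; b < buckets.length; b++) {
            System.out.printf("[%6d, %6d): %d%n", b * 10_000, (b + 1) * 10_000, buckets[b]);
        }
    }
}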

I assume you are referring to nextInt(int) and not next(int) (which is protected, not public).
It's a uniform (discrete, pseudo-random) distribution.
From the documentation:
Returns a pseudorandom, uniformly distributed int value between 0
(inclusive) and the specified value (exclusive)

It will generate a pseudorandom number, and each number in the range has the same probability of being generated. I tried calling Random.nextInt(2) 100 times and got roughly 50 ones and 50 zeros; I think you get the idea. It's a uniform distribution.
Here's the source code if you are curious:
protected synchronized int next(int bits) {
    seed = (seed * multiplier + 0xbL) & ((1L << 48) - 1);
    return (int) (seed >>> (48 - bits));
}
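If you want to repeat that kind of experiment yourself, a small sketch (my own; it uses the public nextInt(int) rather than the protected next(int)) could look like this:
import java.util.Random;

public class CoinFlipCount {
    public static void main(String[] args) {
        Random rnd = new Random();
        int[] counts = new int[2];
        for (int i = 0; i < 1_000_000; i++) {
            counts[rnd.nextInt(2)]++;   // 0 or 1, each with probability ~1/2
        }
        System.out.println("zeros: " + counts[0] + ", ones: " + counts[1]);
    }
}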

Each call to Random.nextInt(int bound) produces a new pseudorandom result.
What about probability?
From the docs:
The general contract of nextInt is that one int value is pseudorandomly generated and returned. All 2^32 possible int values are produced with (approximately) equal probability

Related

How is it possible to get a random number with a specific probability?

I wanted to make a random number picker in the range 1-50000.
But I want to do it so that the larger the number, the smaller the probability.
A probability like 1/(2*number), or something along those lines.
Can anybody help?
You need a mapping function of some sort. What you get from Random is a few 'primitive' constructs that you can trust do exactly what their javadoc spec says they do:
.nextInt(X), which returns a uniformly random number (i.e. the probability chart is an exact horizontal line) between 0 and X-1, inclusive.
.nextBoolean() which gives you 1 bit of randomness.
.nextDouble(), giving you a mostly uniform random number between 0.0 and 1.0
.nextGaussian(), which gives you a random number whose probability chart is a standard normal (Gaussian) curve with standard deviation = 1.0 and midpoint (average) of 0.0.
For the double-returning methods, you run into some trouble if you want exact precision. Computers aren't magical. As a consequence, if you e.g. write this mapping function to turn nextDouble() into a standard uniformly distributed 6-sided die roll, you'd think: int dieRoll = 1 + (int) (rnd.nextDouble() * 6); would do it. Had double been perfect, you'd be right. But they aren't, so, instead, best case scenario, 4 of 6 die faces are going to come up 750599937895083 times, and the other 2 die faces are going to come up 750599937895082 times. It'll be hard to really notice that, but it is provably imperfect. I assume this kind of tiny deviation doesn't matter to you, but, it's good to be aware that anytime you so much as mention double, inherent tiny errors creep into everything and you can't really stop that from happening.
What you need is some sort of mapping function that takes any amount of such randomly provided data (from those primitives, and really only from nextInt/nextBoolean if you want to avoid the errors that double inherently brings) to produce what you want.
For example, imagine instead the 'primitive' I gave you is a uniform random value between 1 and 6, inclusive, i.e.: A standard 6-sided die roll. And I ask you to come up with a uniform algorithm (as in, each value is equally likely) to produce a number between 2 and 12, inclusive.
Perhaps you might think: Easy, just roll 2 dice and add em up. But that would be incorrect: 7 is far more likely than 12.
Instead, you'd roll one die and just register whether it came up even or odd. Then you roll the second die and that's your result, unless the first die was even, in which case you add 6 to it. If the first die was odd and the second die comes up 1, you start the process over again; eventually you're bound to stop hitting that combination.
That'd be uniform random.
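Here's what that procedure looks like in code -- my own sketch of the idea above, using nextInt(6) + 1 as the 'die roll' primitive:
import java.util.Random;

public class TwoToTwelveUniform {
    private static final Random rnd = new Random();

    static int d6() {
        return rnd.nextInt(6) + 1;   // uniform 1..6
    }

    static int uniform2to12() {
        while (true) {
            boolean even = d6() % 2 == 0;   // first die: only its parity matters
            int second = d6();
            if (even) return second + 6;    // 7..12
            if (second != 1) return second; // 2..6; reroll the one surplus outcome
        }
    }
}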
You can apply the same principle to your question. You need a mathematical function that maps the 'horizontal line' of .nextInt() to whatever curve you want. For example, it sounds like you might want to generate something, take the square root, and round it down. You're going to have to draw out or write a formula that precisely describes the probability density you want.
Here's an example:
while (true) {
    int v = (int) (50000.0 * Math.abs(r.nextGaussian()));
    if (v >= 1 && v <= 50000) return v;
}
That returns you a roughly normally distributed value, 1 being the most likely, 50000 being the least likely.
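And if you'd rather have the probability fall off linearly than follow a normal curve, here is another possible mapping (again just my own sketch of shaping the distribution with a formula; it is not exactly the 1/(2*number) weighting from the question):
import java.util.Random;

public class LinearlyDecreasing {
    private static final Random rnd = new Random();

    // Math.sqrt(u) for uniform u in [0,1) has density 2t, i.e. larger values are
    // more likely; subtracting from 50000 flips that, so small results are most
    // likely and the probability falls off roughly linearly towards 50000.
    static int pick() {
        return 50000 - (int) (50000 * Math.sqrt(rnd.nextDouble()));
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) System.out.println(pick());
    }
}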
One simple formula that will give you a very close approximation to what you want is
Random random = new Random();
int result = (int) Math.pow( 50001, random.nextDouble());
That will give a result in the range 1 - 50000, where the probability of each result is approximately proportional to 1 / result, which is what you asked for.
The reason why it works is that the probability of result being any value n within the range is P(n <= 50001^x < n+1), where x is uniformly distributed in [0,1). That's the probability that x falls between log(n) and log(n+1), where the logs are base 50001. But that probability is proportional to log(1 + 1/n), which is very nearly proportional to 1/n.
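A quick empirical check of that claim (my own sketch, not from the answer) is to count how often the smallest results come up and compare them with the log(1 + 1/n) prediction:
import java.util.Random;

public class InverseWeightCheck {
    public static void main(String[] args) {
        Random rnd = new Random();
        int samples = 1_000_000;
        int[] count = new int[6];   // counts for results 1..5
        for (int i = 0; i < samples; i++) {
            int result = (int) Math.pow(50001, rnd.nextDouble());
            if (result <= 5) count[result]++;
        }
        for (int n = 1; n <= 5; n++) {
            long expected = Math.round(samples * Math.log(1.0 + 1.0 / n) / Math.log(50001));
            System.out.println(n + ": observed " + count[n] + ", expected about " + expected);
        }
    }
}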

What is the distribution of math.random in Java?

I want to use Math.random() in a Java program.
What is the distribution of this way of getting doubles?
This is documented in the Java SE API docs:
public static double random()
Returns a double value with a positive sign, greater than or equal to 0.0 and less than 1.0. Returned values are chosen pseudorandomly with (approximately) uniform distribution from that range.
The doc goes on to mention that it uses java.util.Random under the hood, presumably nextDouble() (although not explicitly stated).
The remarks about nonuniformity in nextDouble are as follows:
The method nextDouble is implemented by class Random as if by:
public double nextDouble() {
    return (((long)next(26) << 27) + next(27))
        / (double)(1L << 53);
}
The hedge "approximately" is used in the foregoing description only because the next method is only approximately an unbiased source of independently chosen bits. If it were a perfect source of randomly chosen bits, then the algorithm shown would choose double values from the stated range with perfect uniformity.
It appears that the non-uniformity is only due to the underlying nonuniformity of next rather than a significant bias in the nextDouble algorithm itself:
[In early versions of Java, the result was incorrectly calculated as:
return (((long)next(27) << 27) + next(27))
    / (double)(1L << 54);
This might seem to be equivalent, if not better, but in fact it
introduced a large nonuniformity because of the bias in the rounding of floating-point numbers: it was three times as likely that the low-order bit of the significand would be 0 than that it would be 1! This nonuniformity probably doesn't matter much in practice, but we strive for perfection.]
Note that this draws uniformly from the interval [0, 1) at equally spaced steps of 2^-53, which is not the same as drawing uniformly from the set of all double values that lie between 0 and 1. This is a subtle difference: the set of all double values between 0 and 1 is itself not uniformly spaced on the number line, as this image shows:
(Image illustrating the uneven spacing of double values on the number line, reproduced from docs.oracle.com, not included here.)
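You can see that 2^-53 grid directly with a small check of my own: every value Math.random() returns, when scaled up by 2^53, is an exact integer, even though most doubles strictly between 0 and 1 do not lie on that grid:
public class GridCheck {
    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            double x = Math.random();
            double scaled = x * (1L << 53);   // undo the division by 2^53
            System.out.println(x + " = " + (long) scaled + " / 2^53, exact integer: "
                    + (scaled == Math.floor(scaled)));
        }
    }
}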

Finding Java.util.Random seed from bounded nextInt(int bound) results

Background
I've been reading and trying to wrap my head around various questions/answers that relate to finding the seed from Java.util.Random given its output from nextInt().
The implementation of nextInt(int bound) is:
public int nextInt(int bound) {
    if (bound <= 0)
        throw new IllegalArgumentException("bound must be positive");
    if ((bound & -bound) == bound)  // i.e., bound is a power of 2
        return (int)((bound * (long)next(31)) >> 31);
    int bits, val;
    do {
        bits = next(31);
        val = bits % bound;
    } while (bits - val + (bound-1) < 0);
    return val;
}
The implementation of next(int bits) is:
protected int next(int bits) {
    long oldseed, nextseed;
    AtomicLong seed = this.seed;
    do {
        oldseed = seed.get();
        nextseed = (oldseed * multiplier + addend) & mask;
    } while (!seed.compareAndSet(oldseed, nextseed));
    return (int)(nextseed >>> (48 - bits));
}
where the multiplier is 0x5DEECE66DL, the addend is 0xBL, and the mask is (1L << 48) - 1. These are hexadecimal literals; the trailing L is Java's suffix for a long value.
By calling nextInt() without a bound, the full 32 bits are returned from next(32) instead of being reduced with bits % bound.
Questions
Without completely brute-forcing the full 2^48 possibilities, how would I go about finding the current seed after x calls to nextInt(n) (assuming the bound is never a power of 2)? For example, let's assume I want to find the seed given these 10 results of nextInt(344): [251, 331, 306, 322, 333, 283, 187, 54, 170, 331].
How can I determine the amount of data I'd need to find the correct seed, not just another one that produces the same starting data?
Does this change given bounds that are either odd/even?
Without completely brute-forcing the full 2^48 possibilities, how would I go about finding the current seed after x calls to nextInt(n) (assuming the bound is never a power of 2)?
Let's first remove code that is here for multi-threading, error testing, and bound a power of two. Things boil down to
public int nextInt_equivalent(int bound) {
    int bits, val;
    do {
        seed = (seed * multiplier + addend) & mask; // 48-bit LCG
        bits = (int) (seed >> 17);                  // keep the top 31 bits
        val = bits % bound;                         // build val in [0, bound)
    } while (bits - val + (bound-1) < 0);
    return val;
}
Next we must understand what the while (bits - val + (bound-1) < 0) is about. It's there to use bits only when it falls in an interval whose width is a multiple of bound, thus ensuring a uniform distribution of val. That interval is [0, (1L<<31)/bound*bound).
The while condition is equivalent to while (bits >= (1L<<31)/bound*bound), but executes faster. This condition occurs for the (1L<<31)%bound highest values of bits out of 1L<<31. When bound is 344, this occurs for 8 values of bits out of 2^31, or about 3.7 per US billion.
This is so rare that one reasonable approach is to assume it does not occur. Another is to hope that it occurs, and test if (and when) it does by seeing if the seeds that cause that rare event lead to a sequence of val found in the givens. We have only ((1L<<31)%bound)<<17 (here slightly above a million) values of seed to test, which is quite feasible. Either way, in the rest of this answer I assume this case is ruled out, and consider the generator without the while.
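For the record, the size of that rejection region for bound = 344 can be confirmed directly (a one-off check of my own):
public class RejectionCount {
    public static void main(String[] args) {
        long bound = 344;
        long rejected = (1L << 31) % bound;   // values of bits that trigger the while
        System.out.println(rejected + " of " + (1L << 31) + " values of bits are rejected");
        System.out.println("seeds to test if it does happen: " + (rejected << 17));
    }
}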
When bound is even, or more generally a multiple of 2^S for some S>0, observe that the low-order S bits of the output val (which we can observe) are also the low-order S bits of bits, and thus the bits of rank [17, 17+S) of seed. And the low 17+S bits of seed evolve fully independently of the other 31-S bits. When bound is 344 = 8×43, we have S=3, and thus we can attack the 17+S = 20 low-order bits of seed independently. We directly get S=3 bits of seed from the first val.
We get the 17 low-order bits of seed by elimination: for each of the 2^17 candidates, and given the S=3 bits we know, do the 17+S=20 bits of seed lead to a sequence of val whose low-order S bits match the given sequence? With enough values we can make a full determination of the 17+S bits. We need ⌈17/S⌉+1 = 7 values of val to narrow down to a single value for the 17+S low-order bits of seed in this way. If we have fewer, we need to keep several candidates for the next step. In the question we have ample val to narrow down to a single value, and to be convinced we get it right.
And then when we have these 17+S=20 bits of seed, we can find the remaining 31-S=28 with a moderate amount of brute force. We could test 2^28 values for the yet unknown bits of seed and check which gives a full match with the known val. But better: we know seed % (bound<<17) exactly, and thus need only test 2^31/bound values of seed (here roughly 6 million).
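Here is a minimal sketch of that first phase (my own illustration of the attack described above, not the original code); it assumes the while never triggers and recovers the 17+S = 20 low-order bits of the internal seed from the low-order 3 bits of each nextInt(344) output:
import java.util.Random;

public class LowSeedBitsSketch {
    static final long MULT = 0x5DEECE66DL, ADD = 0xBL, MASK = (1L << 48) - 1;

    public static void main(String[] args) {
        // Pick an arbitrary internal state. Random(long) XORs its argument with the
        // multiplier, so undo that here to control the raw 48-bit state exactly.
        long internalSeed = 0x123456789ABCL & MASK;
        Random rnd = new Random(internalSeed ^ MULT);

        int bound = 344, n = 10;
        int[] vals = new int[n];
        for (int i = 0; i < n; i++) vals[i] = rnd.nextInt(bound);

        // The low 20 bits of the LCG state evolve independently of the rest, and
        // val % 8 == bits % 8 == (state >> 17) & 7 because 8 divides 344.
        for (long cand = 0; cand < (1L << 20); cand++) {
            long s = cand;
            boolean ok = true;
            for (int i = 0; i < n && ok; i++) {
                s = (s * MULT + ADD) & 0xFFFFF;          // low 20 bits of the next state
                ok = ((s >>> 17) & 7) == (vals[i] & 7);  // match the low 3 bits of the output
            }
            if (ok) System.out.printf("surviving candidate: 0x%05X%n", cand);
        }
        System.out.printf("actual low 20 bits:  0x%05X%n", internalSeed & 0xFFFFF);
    }
}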
How can I determine the amount of data I'd need to find the correct seed?
A working heuristic for all but pathological LCGs, and many other PRNGs, is that you need as much information as there are bits in the state, thus 48 bits. Each output gives you log2(bound) bits, thus you need ⌈48/log2(bound)⌉ values, here 6 (which would require keeping track of a few candidates for the low 20 bits of seed and thus require correspondingly more work in the second phase) . Extra values give confidence that the actual state was recovered, but AFAIK wrong guesses just will not happen unless the while comes into play.
Does this change given bounds that are either odd/even?
The above attack strategy does not work well for an odd bound (we can't separately guess the low-order bits and need to search 2^48/bound values of seed). However there are better attacks with less guesswork, applicable even if we raise the number of state bits considerably, and including for odd bound. They are more difficult to explain (read: I can hardly get them working with a math package, and can't explain how, yet; see this question).

How to generate Unique random number even application get closed in a day

The Requirement:
I need to generate 4-digit numbers with no duplicates; even if my application gets closed and restarted, the numbers generated must not repeat.
I don't want to store all previously generated numbers in any storage.
Is there any algorithm that has the highest chance of producing mostly unique numbers within a day?
Thank you
Don't generate a random number. Instead, generate a sequential number from 0000 to 9999, and then obfuscate it using the technique described in https://stackoverflow.com/a/34420445/56778.
That way, the only thing you have to save is the next sequential number.
That example uses a multiplicative inverse to map the numbers from 0 to 100 to other numbers within the same range. Every number from 0 to 100 will be mapped to a unique number between 0 and 100. It's quick and easy, and you can change the mapping by changing the constants.
More information at http://blog.mischel.com/2017/06/20/how-to-generate-random-looking-keys/
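The general idea looks something like this sketch (my own simplification of the linked technique; the constant 4679 is an arbitrary choice that is coprime with 10000, not a value taken from that answer):
public class ObfuscatedCounter {
    private int next = 0;   // the only thing you need to persist across restarts

    // Maps the counter 0..9999 to a unique "random looking" 4-digit value.
    // Because gcd(4679, 10000) == 1, n -> (n * 4679) % 10000 is a bijection,
    // so no two counter values collide until the counter wraps at 10000.
    public int nextId() {
        int id = (next * 4679) % 10000;
        next = (next + 1) % 10000;
        return id;
    }
}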
Ten thousand is too few
Generating more than a few random numbers within a range of 0 to 9,999 is not likely to go well. Depending on how many such numbers you need, you are quite likely to run into duplicates. That seems rather obvious.
Cryptographically-strong random number generator
As for generating the most randomness, you need a cryptographically-strong random number generator that produces non-deterministic output.
java.security.SecureRandom
Java provides such a beast in the java.security.SecureRandom class. Study the class documentation for options.
SecureRandom secureRandom = new SecureRandom();
Notice that a SecureRandom is a java.util.Random (a subclass of it). On that class you will find convenient methods such as nextInt(int bound). You get a pseudorandom, uniformly distributed int value between 0 (inclusive) and the specified value (exclusive).
int r = secureRandom.nextInt( 10_000 ) ;
Let's try it.
SecureRandom secureRandom = new SecureRandom();
for ( int i = 1 ; i <= 20 ; i ++ )
{
    System.out.println( secureRandom.nextInt( 10_000 ) ) ;
}
7299
3581
7106
1195
8713
9517
6954
5558
6136
1623
7006
2910
5855
6273
1691
588
5629
7347
7123
6973
If you need more than a few such numbers, or they absolutely must be distinct (no duplicates), then you must change your constraints.
For a small range like 10,000, keep a collection of generated numbers, checking each newly generated number to see if it has been used already (see the sketch after this list).
Dramatically expand your limit far beyond 10,000.
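A minimal sketch of the first option (my own illustration; note that it keeps the used numbers in memory, which conflicts with the no-storage wish in the question):
import java.security.SecureRandom;
import java.util.HashSet;
import java.util.Set;

public class DistinctPicker {
    private final SecureRandom random = new SecureRandom();
    private final Set<Integer> used = new HashSet<>();

    // Returns a not-yet-used number in [0, 10000); throws once all are exhausted.
    public int nextDistinct() {
        if (used.size() == 10_000) throw new IllegalStateException("all 10,000 numbers used");
        int candidate;
        do {
            candidate = random.nextInt(10_000);
        } while (!used.add(candidate));
        return candidate;
    }
}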
UUID
If you need a universally unique identifier, use, well, a Universally Unique Identifier (UUID). Java represents a UUID value with the java.util.UUID class.
UUIDs were designed for systems to be able to independently generate a virtually unique identifier without coordinating with a central authority. They are 128-bit values. That is quadruple the 32 bits of an int primitive in Java. UUIDs are often displayed to humans as a canonically-formatted 36-character hexadecimal string grouped with hyphens. Do not confuse this textual representation for a UUID value itself; a UUID is a 128-bit value, not text.
The Version 1 type of UUIDs are ideal, as they represent a point in both space and time by combining a date-time, a MAC address, and a small arbitrary number. This makes it virtually impossible to have duplicates. By “virtually impossible”, I mean literally astronomically-large numbers. Java does not bundle a Version 1 generator because of security/privacy concerns. You can add a library, make a web services call, or ask your database such as Postgres to generate one for you.
Or, for most cases where we need a relatively small number of instances, use the Version 4 UUID, where 122 of the 128 bits are randomly generated. Java bundles a Version 4 generator. Simply call UUID.randomUUID().
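For example (plain Java, no extra library needed):
java.util.UUID uuid = java.util.UUID.randomUUID();   // a Version 4 (random) UUID
System.out.println(uuid);                            // prints the canonical 36-character form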
I agree with Basil Bourque and others about whether what you propose is the "right" approach. However, if you really want to do what you are proposing, then one way to achieve your goal (generate as many numbers as possible within range in pseudorandom order without having to store all previous values generated):
find a random number generator that has a period roughly within the range that you require;
to generate the next ID, take the next number generated, discarding ones that are not within range.
So for four-digit numbers, one option would be an xorshift generator configured to generate 14-bit numbers (i.e. in the range 1-16383, near enough to 1-9999):
private int seed = 1;

public int nextJobID() {
    do {
        seed = (seed ^ (seed << 5)) & 0x3fff;
        seed = (seed ^ (seed >>> 3)) & 0x3fff;
        seed = (seed ^ (seed << 7)) & 0x3fff;
    } while (seed >= 10000);
    return seed;
}
To generate a new sequence each "day", set 'seed' to any number between 1 and 16383. [It'll be the same sequence, just starting at a different point. You could vary the sequence a little e.g. by taking every nth job ID where n is a small number, reversing the pattern of shifts (do >>>, <<, >>> instead of <<, >>>, <<) or finding some other combination of shifts (not 5/3/7) that produce a complete period.]
This technique is guaranteed to produce all numbers in range in "random" order. It doesn't offer any other guarantee, e.g. that it will be the most efficient or produce the "highest quality" of randomness achievable. It's probably good enough -- if not as good as you can get -- given the requirements you set out.
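If you want to confirm that this particular combination of shifts really does visit all 16383 non-zero values before repeating, here is a quick self-contained check of my own:
import java.util.HashSet;
import java.util.Set;

public class XorshiftPeriodCheck {
    public static void main(String[] args) {
        int seed = 1;
        Set<Integer> seen = new HashSet<>();
        do {
            seed = (seed ^ (seed << 5)) & 0x3fff;
            seed = (seed ^ (seed >>> 3)) & 0x3fff;
            seed = (seed ^ (seed << 7)) & 0x3fff;
        } while (seen.add(seed));   // stop at the first repeated state
        System.out.println("distinct states before the first repeat: " + seen.size());
        // 16383 here means the generator visits every value 1..16383, i.e. a full period.
    }
}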

Effective Java Item 47: Know and use your libraries - Flawed random integer method example

In the example Josh gives of the flawed random method that generates a positive random number with a given upper bound n, I don't understand two of the flaws he states.
The method from the book is:
private static final Random rnd = new Random();

// Common but deeply flawed
static int random(int n) {
    return Math.abs(rnd.nextInt()) % n;
}
He says that if n is a small power of 2, the sequence of random numbers that are generated will repeat itself after a short period of time. Why is this the case? The documentation for Random.nextInt() says it returns the next pseudorandom, uniformly distributed int value from this random number generator's sequence. So shouldn't the sequence repeat itself for any small n? Why does this only apply to powers of 2?
Next he says that if n is not a power of 2, some numbers will be returned on average more frequently than others. Why does this occur, if Random.nextInt() generates random integers that are uniformly distributed? (He provides a code snippet which clearly demonstrates this but I don't understand why this is the case, and how this is related to n being a power of 2).
Question 1: if n is a small power of 2, the sequence of random numbers that are generated will repeat itself after a short period of time.
This is not a corollary of anything Josh is saying; rather, it is simply a known property of linear congruential generators. Wikipedia has the following to say:
A further problem of LCGs is that the lower-order bits of the generated sequence have a far shorter period than the sequence as a whole if m is set to a power of 2. In general, the n-th least significant digit in the base b representation of the output sequence, where b^k = m for some integer k, repeats with at most period b^n.
This is also noted in the Javadoc:
Linear congruential pseudo-random number generators such as the one implemented by this class are known to have short periods in the sequence of values of their low-order bits.
The other version of the function, Random.nextInt(int), works around this by using different bits in this case (emphasis mine):
The algorithm treats the case where n is a power of two specially: it returns the correct number of high-order bits from the underlying pseudo-random number generator.
This is a good reason to prefer Random.nextInt(int) over using Random.nextInt() and doing your own range transformation.
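You can watch this happen with java.util.Random itself. A quick check of my own (relying only on the documented 48-bit LCG): nextInt() returns the top 32 bits of the 48-bit state, so its lowest bit is bit 16 of the state and should repeat with a period of at most 2^17 = 131072 calls:
import java.util.Random;

public class LowBitPeriod {
    public static void main(String[] args) {
        final int period = 1 << 17;   // 131072
        Random a = new Random(42);
        Random b = new Random(42);
        for (int i = 0; i < period; i++) {
            b.nextInt();              // advance b by one candidate period
        }
        boolean matches = true;
        for (int i = 0; i < period && matches; i++) {
            matches = (a.nextInt() & 1) == (b.nextInt() & 1);
        }
        System.out.println("low bit repeats every " + period + " calls: " + matches);
    }
}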
Question 2: Next he says that if n is not a power of 2, some numbers will be returned on average more frequently than others.
There are 2^32 distinct numbers that can be returned by nextInt(). If you try to put them into n buckets by using % n, and n isn't a power of 2, some buckets will have more numbers than others. This means that some outcomes will occur more frequently than others even though the original distribution was uniform.
Let's look at this using small numbers. Let's say nextInt() returned four equiprobable outcomes, 0, 1, 2 and 3. Let's see what happens if we applied % 3 to them:
0 maps to 0
1 maps to 1
2 maps to 2
3 maps to 0
As you can see, the algorithm would return 0 twice as frequently as it would return each of 1 and 2.
This does not happen when n is a power of two, because the total number of outcomes (itself a power of two) is evenly divisible by n. Consider n=2:
0 maps to 0
1 maps to 1
2 maps to 0
3 maps to 1
Here, 0 and 1 occur with the same frequency.
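To see the effect at a realistic scale, here is a demonstration of my own (not the snippet from the book): pick a large n that is not a power of two; with n = 3*2^29, roughly 5/8 of the results of the flawed method land below n/2 instead of the expected half:
import java.util.Random;

public class ModuloBiasDemo {
    private static final Random rnd = new Random();

    static int random(int n) {            // the flawed method from the book
        return Math.abs(rnd.nextInt()) % n;
    }

    public static void main(String[] args) {
        int n = 3 * (1 << 29);            // 1,610,612,736 -- large and not a power of two
        int samples = 1_000_000, low = 0;
        for (int i = 0; i < samples; i++) {
            if (random(n) < n / 2) low++;
        }
        // Expect roughly 0.625 rather than 0.5: the values in [n, 2^31) wrap into the low half.
        System.out.println("fraction below n/2: " + (double) low / samples);
    }
}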
Additional resources
Here are some additional -- if only tangentially relevant -- resources related to LCGs:
Spectral tests are statistical tests used to assess the quality of LCGs. Read more here and here.
A collection of classical pseudorandom number generators with linear structures has some pretty scatterplots (the generator used in Java is called DRAND48).
There is an interesting discussion on crypto.SE about predicting values from Java's generator.
1) When n is a power of 2, taking the value modulo n is equivalent to selecting a few of its lower bits. The lower bits of numbers produced by the type of generator Java uses (a linear congruential generator) are known to be "less random" than the higher bits. It's just a property of the formula used for generating the numbers.
2) Imagine that the largest possible value returned by random() is 10 and n = 7. Taking the value % 7 maps the numbers 7, 8, 9 and 10 to 0, 1, 2 and 3 respectively. Therefore, if the original number is uniformly distributed, the result will be heavily biased towards the lower numbers, because 0 through 3 will appear twice as often as 4, 5 and 6. In this case this happens regardless of whether n is a power of two or not; but if instead of 10 the largest value were, say, 15 (which is 2^4-1), then any n that is a power of two would result in a uniform distribution, because there would be no "excess" numbers left at the end of the range to cause bias: the total number of possible values would be exactly divisible by the number of possible remainders.
