I'm using this to generate values between 5 and 13:
int randomGeneratedLevelValue = ThreadLocalRandom.current().nextInt(5, 13);
How can I make repeated values occur less often?
There is no way to reduce the number of repetitions in a true random number sequence without biasing the random number sequence.
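So, for example, in your sequence 10, 5, 12, 12, 13, 13, the chance that 12 is followed by another 12 is 1 in 9 (assuming nine equally likely values, 5 through 13 inclusive); i.e. the same as the probability of any other number in the range coming next. Note that nextInt(5, 13) treats 13 as an exclusive bound, so the call as written yields only the eight values 5 through 12, and the chance would then be 1 in 8.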
Now it is possible that since you are using Random / ThreadLocalRandom you are seeing the effects of the autocorrelation that is inherent in linear congruential generators. If so, these effects can be eliminated by using SecureRandom instead. But SecureRandom calls are significantly more expensive.
The other approach would be to deliberately bias against repetitions; e.g. (pseudo-code)
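int random = rand.nextInt(...);
if (random == lastRandom) {
    // re-roll once; a repeat is still possible, just less likely
    random = rand.nextInt(...);
}
lastRandom = random; // remember the value for the next call
return random;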
But be careful. Introducing a bias could have unintended / unexpected consequences.
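If repeats must be ruled out entirely rather than just made rarer, one option is to draw from the other values only. The following is a minimal sketch of that stricter variant, not the re-roll shown above; the class name `NoImmediateRepeat` and its methods are made up for illustration, and the range 5 to 13 inclusive is an assumption taken from the question. The output is no longer a sequence of independent draws, so the same caveat about bias applies.

```java
import java.util.Random;

/** Draws values in [origin, bound) and never returns the same value twice in a row. */
final class NoImmediateRepeat {
    private final Random rnd;
    private final int origin;
    private final int size;   // number of distinct values in the range
    private int last = -1;    // offset of the previous value, -1 before the first draw

    NoImmediateRepeat(int origin, int bound, Random rnd) {
        if (bound - origin < 2) {
            throw new IllegalArgumentException("need at least two distinct values");
        }
        this.origin = origin;
        this.size = bound - origin;
        this.rnd = rnd;
    }

    int next() {
        int offset;
        if (last < 0) {
            offset = rnd.nextInt(size);          // first draw: any value
        } else {
            offset = rnd.nextInt(size - 1);      // one of the size - 1 other values
            if (offset >= last) {
                offset++;                        // skip over the previous value
            }
        }
        last = offset;
        return origin + offset;
    }

    public static void main(String[] args) {
        // bound is exclusive, so 14 gives the values 5..13
        NoImmediateRepeat gen = new NoImmediateRepeat(5, 14, new Random());
        for (int i = 0; i < 10; i++) {
            System.out.print(gen.next() + " ");
        }
        System.out.println();
    }
}
```

Because the second draw picks uniformly among the remaining values, no retry loop is needed and every call does a fixed amount of work.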
The Birthday Paradox predicts that the probability of duplicates in random numbers is much higher than you'd think. For example, it predicts that, with a mere 23 random people, there's a greater than 50% chance of two of them having the same birthday. By the Pigeonhole Principle, it takes 367 people for there to be a 100% chance of a duplicate, but the probability of a duplicate is extremely high even before that.
Here's the probability distribution (from Wikipedia):
The rule of thumb is that the number of values you have to generate before the probability of a duplicate reaches p(n) is approximately sqrt(2m * p(n)), where m is the number of possible random numbers and p(n) is the probability that you're looking for. So, for example, if you're generating random numbers in a 50-number range (e.g. if you pick a random number from 100 - 150), you'd only have to generate approximately sqrt((2 * 50) * 0.5) = 7.07 random numbers before the odds are roughly as good as not that you have a duplicate. The rule is only an approximation: computed exactly, the chance of a duplicate is about 45% with 8 numbers in a 50-number range and about 53% with 9. (Note that this only works for p(n) values of up to 1/2).
In your case, there are 8 possible values for any particular random value (5, 6, 7, 8, 9, 10, 11, 12), so the rule of thumb gives sqrt(2 * 8 * 0.5) = sqrt(8) = 2.83 numbers before there's a probability of roughly 50% that you'll have a duplicate. The approximation is coarse for such a small range: the exact probability of a duplicate is about 34% after 3 numbers and about 59% after 4. Either way, the Birthday Paradox predicts you only need to generate 3 or 4 numbers before a duplicate becomes more likely than not.
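To check the rule of thumb against exact figures, the probability can be computed directly as one minus the chance that all draws are distinct. This is a small sketch assuming independent, uniformly distributed draws; the class and method names are hypothetical.

```java
final class BirthdayOdds {
    /**
     * Probability that n independent uniform draws from m equally likely
     * values contain at least one duplicate.
     */
    static double duplicateProbability(int n, int m) {
        if (n > m) {
            return 1.0; // pigeonhole principle: a duplicate is certain
        }
        double allDistinct = 1.0;
        for (int i = 0; i < n; i++) {
            // the (i + 1)-th draw must avoid the i values already seen
            allDistinct *= (double) (m - i) / m;
        }
        return 1.0 - allDistinct;
    }

    public static void main(String[] args) {
        int m = 8; // eight possible values, 5 through 12
        for (int n = 1; n <= 6; n++) {
            System.out.printf("%d draws: %.4f%n", n, duplicateProbability(n, m));
        }
    }
}
```

Calling `duplicateProbability(23, 365)` reproduces the classic birthday figure of just over one half.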
See also this Q&A.
One more point: beware the Gambler's Fallacy, in which people assume that if you randomly generate, for example, a 10, odds are the next one won't be a 10. Actually, given that you're generating random numbers, the probability of any particular number is 1/8 regardless of what numbers came before. In other words, if you generate a 12, the probability of the next number being a 10 is 1/8. If you generate a 7, the probability of the next number being a 10 is 1/8. If you generate a 10, the probability of the next number being a 10 is still 1/8. Each number is an independent event (i.e. the numbers you've generated so far don't influence the probability distribution of future numbers in the least).
TL;DR You need to generate far fewer numbers than you think before you're likely to start getting duplicates. If you're generating random numbers within a small range (as you are), the number is particularly low.
Related
I've been searching for, and found similar topics, but cannot understand them or work out how to apply them.
Simple: all I want is to generate a number between 1 and 100:
Numbers 1 to 30 should have a 60% probability.
Numbers 31 to 60 should have a 35% probability.
Numbers 61 to 100 should have a 5% probability.
Get numbers in your ranges
First generate 3 random numbers in your 3 intervals.
1: 1-30
2: 31-60
3: 61-100
Generate the probability number
Next, generate a number 1-100. If the number is 1-60, choose the random number from the first interval; if it is 61-95, choose the one from the second interval; and if it is 96-100, choose the one from the third.
This sounds like a homework problem so I will not provide the code, but here is a description of a simple algorithm:
Generate a random number between 1 and 100. Let's call this X. X will be used to determine how to generate your final result:
If X is between 1 and 60, generate a random number between 1 and 30 to be your final result.
If X is between 61 and 95, generate a random number between 31 and 60 to be your final result.
If X is between 96 and 100, generate a random number between 61 and 100 to be your final result.
You can see that this requires two random number generations for every weighted number that you want. It can actually be simplified into a single random number generation, and that is left as an exercise for you.
FYI, how to generate a random number within a range is found here: How do I generate random integers within a specific range in Java?
Simplest way for me would be to generate four randoms. The first would be a number 1-100. The second would be a number 1-30, the third would be a number 31-60, and the fourth would be a number 61-100. Think of the first random as a percent. If it is 1-60 you then move on to run the second random, if it is 61-95 run the third random, and if it is 96-100 run the fourth random. There are other ways to make it shorter, but in my opinion this is easiest to comprehend.
Create random number 1-100 with this: (int)(Math.random()*100)+1
The rest should just be conditionals.
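The conditionals described in these answers can be sketched as follows. This is only one possible arrangement, using two draws per result; the class name `WeightedRange` and the use of `java.util.Random` are choices made for this illustration.

```java
import java.util.Random;

final class WeightedRange {
    /** Returns 1..30 with 60% probability, 31..60 with 35%, 61..100 with 5%. */
    static int next(Random rnd) {
        int x = rnd.nextInt(100) + 1;        // 1..100, decides which range to use
        if (x <= 60) {
            return rnd.nextInt(30) + 1;      // 1..30
        }
        if (x <= 95) {
            return rnd.nextInt(30) + 31;     // 31..60
        }
        return rnd.nextInt(40) + 61;         // 61..100
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        for (int i = 0; i < 10; i++) {
            System.out.print(next(rnd) + " ");
        }
        System.out.println();
    }
}
```

Note that `rnd.nextInt(k)` returns a value from 0 to k - 1, which is why each branch adds the lower end of its range.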
In the example Josh gives of the flawed random method that generates a positive random number with a given upper bound n, I don't understand two of the flaws he states.
The method from the book is:
private static final Random rnd = new Random();
//Common but deeply flawed
static int random(int n) {
return Math.abs(rnd.nextInt()) % n;
}
He says that if n is a small power of 2, the sequence of random numbers that are generated will repeat itself after a short period of time. Why is this the case? The documentation for Random.nextInt() says "Returns the next pseudorandom, uniformly distributed int value from this random number generator's sequence." So shouldn't the sequence repeat itself for any small integer n? Why does this only apply to powers of 2?
Next he says that if n is not a power of 2, some numbers will be returned on average more frequently than others. Why does this occur, if Random.nextInt() generates random integers that are uniformly distributed? (He provides a code snippet which clearly demonstrates this but I don't understand why this is the case, and how this is related to n being a power of 2).
Question 1: if n is a small power of 2, the sequence of random numbers that are generated will repeat itself after a short period of time.
This is not a corollary of anything Josh is saying; rather, it is simply a known property of linear congruential generators. Wikipedia has the following to say:
A further problem of LCGs is that the lower-order bits of the generated sequence have a far shorter period than the sequence as a whole if m is set to a power of 2. In general, the n-th least significant digit in the base b representation of the output sequence, where b^k = m for some integer k, repeats with at most period b^n.
This is also noted in the Javadoc:
Linear congruential pseudo-random number generators such as the one implemented by this class are known to have short periods in the sequence of values of their low-order bits.
The other version of the function, Random.nextInt(int), works around this by using different bits in this case (emphasis mine):
The algorithm treats the case where n is a power of two specially: it returns the correct number of high-order bits from the underlying pseudo-random number generator.
This is a good reason to prefer Random.nextInt(int) over using Random.nextInt() and doing your own range transformation.
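The short period of the low-order bits can be observed with a hand-rolled generator. The sketch below uses the multiplier, increment and 48-bit modulus that the java.util.Random Javadoc gives for its seed update; the class and method names are made up for illustration. It inspects the raw 48-bit state, whereas Random.nextInt() exposes only the upper 32 bits of that state, so it shows the underlying effect rather than the exact behaviour of the flawed method.

```java
final class LowBitPeriod {
    // Constants of the linear congruential formula documented for java.util.Random.
    private static final long A = 0x5DEECE66DL;
    private static final long C = 0xBL;
    private static final long MASK = (1L << 48) - 1;

    /** One step of the generator: state * A + C modulo 2^48. */
    static long step(long state) {
        return (state * A + C) & MASK;
    }

    /**
     * Number of steps until the lowest `bits` bits of the raw state
     * first return to the pattern they started with.
     */
    static long lowBitsPeriod(int bits, long seed) {
        long lowMask = (1L << bits) - 1;
        long start = seed & lowMask;
        long state = seed & MASK;
        long steps = 0;
        do {
            state = step(state);
            steps++;
        } while ((state & lowMask) != start);
        return steps;
    }

    public static void main(String[] args) {
        for (int bits = 1; bits <= 8; bits++) {
            System.out.println(bits + " low bit(s): period " + lowBitsPeriod(bits, 12345L));
        }
    }
}
```

The lowest bit of the state simply alternates, and each additional low bit doubles the period, which is far shorter than the period of the full 48-bit state.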
Question 2: Next he says that if n is not a power of 2, some numbers will be returned on average more frequently than others.
There are 2^32 distinct numbers that can be returned by nextInt(). If you try to put them into n buckets by using % n, and n isn't a power of 2, some buckets will have more numbers than others. This means that some outcomes will occur more frequently than others even though the original distribution was uniform.
Let's look at this using small numbers. Let's say nextInt() returned four equiprobable outcomes, 0, 1, 2 and 3. Let's see what happens if we applied % 3 to them:
0 maps to 0
1 maps to 1
2 maps to 2
3 maps to 0
As you can see, the algorithm would return 0 twice as frequently as it would return each of 1 and 2.
This does not happen when n is a power of two, since one power of two is divisible by the other. Consider n=2:
0 maps to 0
1 maps to 1
2 maps to 0
3 maps to 1
Here, 0 and 1 occur with the same frequency.
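The two small tables above can be reproduced by counting how many source values land in each bucket. This is a minimal sketch with hypothetical names; `outcomes` stands for the number of equiprobable values the generator can return.

```java
import java.util.Arrays;

final class ModuloBias {
    /** Counts how many of the values 0 .. outcomes - 1 map to each remainder modulo n. */
    static int[] bucketCounts(int outcomes, int n) {
        int[] counts = new int[n];
        for (int x = 0; x < outcomes; x++) {
            counts[x % n]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // four equiprobable outcomes reduced modulo 3, then modulo 2
        System.out.println("% 3: " + Arrays.toString(bucketCounts(4, 3)));
        System.out.println("% 2: " + Arrays.toString(bucketCounts(4, 2)));
    }
}
```

Uneven counts in the returned array mean a biased result; equal counts mean the reduction keeps the distribution uniform.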
Additional resources
Here are some additional -- if only tangentially relevant -- resources related to LCGs:
Spectral tests are statistical tests used to assess the quality of LCGs. Read more here and here.
A collection of classical pseudorandom number generators with linear structures has some pretty scatterplots (the generator used in Java is called DRAND48).
There is an interesting discussion on crypto.SE about predicting values from Java's generator.
1) When n is a power of 2, rnd % n is equivalent to selecting a few lower bits of the original. Lower bits of numbers generated by the type of generators used by Java are known to be "less random" than the higher bits. It's just a property of the formula used for generating the numbers.
2) Imagine that the largest possible value returned by random() is 10 (so the possible values are 0 to 10), and n = 7. Now taking x % 7 maps the numbers 7, 8, 9 and 10 to 0, 1, 2 and 3 respectively. Therefore, if the original number is uniformly distributed, the result will be heavily biased towards the lower numbers, because 0, 1, 2 and 3 will appear twice as often as 4, 5 and 6. In this case, the bias appears regardless of whether n is a power of two or not. But if instead of 10 we chose, say, 15 (which is 2^4 - 1), then any n that is a power of two (up to 16) would result in a uniform distribution: there would be no "excess" numbers left at the end of the range to cause bias, because the total number of possible values would be exactly divisible by the number of possible remainders.
I am working on a project that needs to generate two random numbers from a given range (both of them at the same time, one after another) and check if they are equal to each other - if they are, proceed executing other code; if they aren't - generate the numbers again. Now my question is, if we have a range [0;10], and the first randomly generated number turned out to be 5, is the probability of the second number also being 5 as good as any other number? Specifically, does Math.random() have any "defense" against generating same number if it is called twice consecutively? or it "tries" to not generate the same number?
Generating the same number in the range [0,10] twice in succession is a perfectly valid occurrence for any random number generator. If it took any steps to prevent that it wouldn't be random.
On any invocation, the chance of any individual number being chosen should be 1 in 11, and each choice should be independent of previous choices, so the chance that the second number in a pair matches the first is also 1 in 11.
As to how random Math.random() is, it's pseudo-random, meaning it uses an algorithm to generate a series of evenly distributed numbers starting with a "seed" value. It's not suitable for cryptography but quite good for simulations and other non-cryptographic uses.
first time here at Stackoverflow. I hope someone can help me with my search of an algorithm.
I need to generate N random numbers in given Ranges that sum up to a given sum!
For example: generate 3 numbers that sum to 11.
Ranges:
Value between 1 and 3.
Value between 5 and 8.
value between 3 and 7.
The generated numbers for this example could be: 2, 5, 4.
I have already searched a lot and couldn't find the solution I need.
It is possible to generate N numbers with a constant sum using modulo, like this:
generate random numbers of which the sum is constant
But I couldn't get that to work with ranges.
Or by generating N random values, summing them, dividing the constant sum by the random sum, and then multiplying each random number by that quotient, as proposed here.
The main problem, and the reason I can't adopt those solutions, is that each of my random values has a different range, and I need the values to be uniformly distributed within their ranges (no clustering at the min/max, for example, which happens if I clip values that fall below the min or above the max).
I also thought of a solution: pick one of the values at random (in the example, value 1, 2 or 3), generate it within its range (either between min and max, or between min and the rest of the sum, whichever is smaller), subtract that number from the given sum, and keep going until everything is distributed. But that would be horribly inefficient. I could really use an approach where the runtime of the algorithm is fixed.
I'm trying to get this running in Java, but that is not important unless someone already has a solution ready. All I need is a description or an idea of an algorithm.
First, note that the problem is equivalent to:
Generate k non-negative numbers x_1, ..., x_k that sum to a number y, where each x_i has an upper limit.
You get this form by subtracting each lower bound from its number (and the sum of the lower bounds from the target sum), so your example is equivalent to:
Generate 3 numbers such that 0 <= x1 <= 2; 0 <= x2 <= 3; 0 <= x3 <= 4; x1+x2+x3 = 2
Note that the 2nd problem can be solved in various ways, one of them is:
Generate a list with h_i repeats per element, where h_i is the limit for element i, shuffle the list, and pick the first y elements. The value of x_i is the number of times element i was picked.
In your example, the list is: [x1,x1,x2,x2,x2,x3,x3,x3,x3]. Shuffle it and choose the first two elements.
(*) Note that shuffling the list can be done using the Fisher-Yates algorithm (you can abort the algorithm in the middle, once you have passed the desired limit).
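The list-and-shuffle idea can be sketched in Java as follows. This is an illustration under assumptions: the class and method names are hypothetical, ranges are passed as inclusive min/max arrays, and `Collections.shuffle` shuffles the whole list instead of stopping early. Every result respects the ranges and the sum, but whether the resulting distribution is uniform enough for the asker's purposes is not something this sketch establishes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class BoundedSum {
    /** Returns values with min[i] <= result[i] <= max[i] whose total equals sum. */
    static int[] generate(int[] min, int[] max, int sum, Random rnd) {
        int excess = sum;                       // what is left after taking every minimum
        List<Integer> slots = new ArrayList<>();
        for (int i = 0; i < min.length; i++) {
            excess -= min[i];
            // index i appears once per unit it may still grow: h_i = max[i] - min[i]
            for (int j = 0; j < max[i] - min[i]; j++) {
                slots.add(i);
            }
        }
        if (excess < 0 || excess > slots.size()) {
            throw new IllegalArgumentException("sum is not reachable within the ranges");
        }
        Collections.shuffle(slots, rnd);
        int[] result = min.clone();
        for (int k = 0; k < excess; k++) {
            result[slots.get(k)]++;             // each picked slot adds one to its index
        }
        return result;
    }

    public static void main(String[] args) {
        int[] values = generate(new int[] {1, 5, 3}, new int[] {3, 8, 7}, 11, new Random());
        System.out.println(values[0] + ", " + values[1] + ", " + values[2]);
    }
}
```

The running time depends only on the total width of the ranges, not on how the random draws fall, which addresses the request for a fixed runtime.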
Add up the minimum values. In this case 1 + 5 + 3 = 9
11 - 9 = 2, so you have to distribute 2 between the three numbers (eg: +2,+0,+0 or +0,+1,+1).
I leave the rest for you, it's relatively easy to create a uniform distribution after this transformation.
This problem is equivalent to randomly distributing an excess of 2 over the minimum of 9 on 3 positions.
So you start with the minima (1/5/3) and then cycle 2 times, generating a (pseudo-)random value of [0-2] (3 positions) and increment the indexed value.
e.g.
Start 1/5/3
1st random=1 ... increment index 1 ... 1/6/3
2nd random=0 ... increment index 0 ... 2/6/3
2+6+3=11
Edit
Reading this a second time, I understand that this is exactly what @KarolyHorvath mentioned.
I would like to get clarifications on Pseudo Random Number generation.
My questions are:
Is there any chance for getting repeated numbers in Pseudo Random Number Generation?
When i googled i found true random number generation. Can i get some algorithms for true random number generation, so that i can use it with
SecureRandom.getInstance(String algorithm)
Please give guidance with priority given to security.
1) Yes, you can generally have repeated numbers in a PRNG. If you apply the pigeonhole principle, the proof is quite straightforward. For example, suppose you have a PRNG on the set of 32-bit unsigned integers: if you generate more than 2^32 pseudo-random numbers, you will certainly have at least one number generated at least twice. In practice, that happens much sooner. PRNG algorithms usually cycle through a sequence, and you can calculate or estimate the length of that cycle, after which the whole sequence repeats; the image of the algorithm is also usually far smaller than the set from which you take your numbers.
If you need non-repeated numbers (since security seems to be a concern for you, note that this is less secure than a sequence of (pseudo) random numbers in which you allow repeated numbers!!!), you can do as follows:
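import java.util.HashSet;
import java.util.Random;
import java.util.Set;

class NonRepeatedPRNG {
    private final Random rnd = new Random();
    // every value handed out so far
    private final Set<Integer> set = new HashSet<>();

    public int nextInt() {
        for (;;) {
            final int r = rnd.nextInt();
            // add() returns false for a value already in the set, so duplicates are retried
            if (set.add(r)) return r;
        }
    }
}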
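Note that the nextInt method defined above may never return! Retries become more frequent as the set fills up, the loop cannot terminate once every int value has been handed out, and the set itself keeps growing in memory. Use with caution.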
2) No, there's no such thing as an "algorithm for true random number generation", since an algorithm is something known, that you control and can predict (ie, just run it and you have the output; you know exactly its output the next time you run it with the same initial conditions), while a true RNG is completely unpredictable by definition.
For most common non-security-related applications (e.g. scientific calculations, games, etc.), a PRNG will suffice. If security is a concern (e.g. you want random numbers for crypto), then a CSPRNG (cryptographically secure PRNG) will suffice.
If you have an application that cannot work without true randomness, I'm really curious to know more about it.
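For the security-focused case, a CSPRNG is available in Java through java.security.SecureRandom. The sketch below relies on the default constructor, which selects a platform default implementation, instead of naming an algorithm for SecureRandom.getInstance(String); the class and method names are hypothetical.

```java
import java.security.SecureRandom;

final class SecureValues {
    // One shared instance; SecureRandom is safe to use from multiple threads.
    private static final SecureRandom RNG = new SecureRandom();

    /** Returns a value in [origin, bound) drawn from the CSPRNG. */
    static int nextInRange(int origin, int bound) {
        return origin + RNG.nextInt(bound - origin);
    }

    /** Returns `length` random bytes, e.g. for a token or a key. */
    static byte[] randomBytes(int length) {
        byte[] bytes = new byte[length];
        RNG.nextBytes(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        System.out.println(nextInRange(5, 14));
        System.out.println(randomBytes(16).length + " random bytes generated");
    }
}
```

Repeated values are still possible here, for the same reasons given above; the difference from Random lies in how hard the output is to predict.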
Yes, any random number generator can repeat. There are three general solutions to the non-duplicate random number problem:
If you want a few numbers from a large range then pick one and reject it if it is a duplicate. If the range is large, then this won't cause too many repeated attempts.
If you want a lot of numbers from a small range, then set out all the numbers in an array and shuffle the array. The Fisher-Yates algorithm is standard for array shuffling. Take the random numbers in sequence from the shuffled array.
If you want a lot of numbers from a large range then use an appropriately sized encryption algorithm. E.g. for 64 bit numbers use DES and encrypt 0, 1, 2, 3, ... in sequence. The output is guaranteed unique because encryption is reversible.
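The second of these options, shuffling a small range, might look like the following. This is a minimal sketch with hypothetical names that writes the Fisher-Yates loop out by hand; it assumes an inclusive range small enough to hold in an array.

```java
import java.util.Random;

final class UniqueNumbers {
    /** Returns the integers lo..hi (inclusive) in random order, each exactly once. */
    static int[] shuffledRange(int lo, int hi, Random rnd) {
        int n = hi - lo + 1;
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = lo + i;
        }
        // Fisher-Yates: walk from the end, swapping each slot with a
        // randomly chosen slot at or before it.
        for (int i = n - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            int tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
        }
        return a;
    }

    public static void main(String[] args) {
        for (int v : shuffledRange(5, 13, new Random())) {
            System.out.print(v + " ");
        }
        System.out.println();
    }
}
```

Reading the array front to back then yields non-repeating values until the range is exhausted, at which point it can be reshuffled.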
Pseudo RNGs can repeat themselves, but True RNGs can also repeat themselves - if they never repeated themselves they wouldn't be random.
A good PRNG once seeded with some (~128 bit) real entropy is practically indistinguishable from a true RNG. You certainly won't get noticeably more collisions or repetitions than with a true RNG.
Therefore you are unlikely to ever need a true random number generator, but if you do, check out the HTTP API at random.org. Their API is backed by a true random source. The randomness comes from atmospheric noise.
If a PRNG or RNG never repeated numbers, it would be... really predictable, actually! Imagine a PRNG over the numbers 1 to 8. You see it print out 2, 5, 7, 3, 8, 4, 6. If the PRNG tried its hardest not to repeat itself, now you know the next number is going to be 1 - that's not random at all anymore!
So PRNGs and RNGs produce random output with repetition by default. If you don't want repetition, you should use a shuffling algorithm like the Fisher-Yates Shuffle ( http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle ) to put an array of the numbers you want into random order.
Also, if you need a source of random number generation for cryptographic purposes, seek out a provider of cryptographic PRNGs for your language. As long as it's cryptographically strong it should be fine - A true RNG is a lot more expensive (or demands latency, such as using random.org) and not usually needed.