I've simplified a bug I'm experiencing down to the following lines of code:
import java.util.Arrays;
import java.util.Random;

int[] vals = new int[8];
for (int i = 0; i < 1500; i++)
    vals[new Random(i).nextInt(8)]++;
System.out.println(Arrays.toString(vals));
The output is: [0, 0, 0, 0, 0, 1310, 190, 0]
Is this just an artifact of choosing consecutive numbers to seed Random and then using nextInt with a power of 2? If so, are there other pitfalls like this I should be aware of, and if not, what am I doing wrong? (I'm not looking for a solution to the above problem, just some understanding of what else could go wrong.)
Dan, well-written analysis. Since the javadoc is pretty explicit about how the numbers are calculated, the mystery isn't so much why this happened as whether there are other anomalies like this to watch out for. I didn't see any documentation about consecutive seeds, and I'm hoping someone with experience with java.util.Random can point out other common pitfalls.
As for the code, the need is for several parallel agents to have repeatably random behavior, and each happens to choose from an enum 8 elements long as its first step. Since discovering this behavior, I've had the seeds all come from a master Random object created from a known seed. In the former (sequentially seeded) version of the program, all behavior quickly diverged after that first call to nextInt, so it took quite a while to narrow the program's behavior down to the RNG library, and I'd like to avoid that situation in the future.
As much as possible, the seed for an RNG should itself be random. The seeds that you are using are only going to differ in one or two bits.
There's very rarely a good reason to create two separate RNGs in one program. Your code is not one of those situations where it makes sense.
Just create one RNG and reuse it, then you won't have this problem.
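A minimal sketch of that change, applied to the snippet from the question (the seed value 42 is an arbitrary illustrative choice):

import java.util.Arrays;
import java.util.Random;

int[] vals = new int[8];
Random rnd = new Random(42);               // one RNG, seeded once
for (int i = 0; i < 1500; i++)
    vals[rnd.nextInt(8)]++;                // every draw advances the same sequence
System.out.println(Arrays.toString(vals)); // counts come out roughly uniform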
In response to a comment from mmyers:
Do you happen to know java.util.Random well enough to explain why it picks 5 and 6 in this case?
The answer is in the source code for java.util.Random, which is a linear congruential RNG. When you specify a seed in the constructor, it is manipulated as follows:
seed = (seed ^ 0x5DEECE66DL) & mask;
Where the mask simply retains the lower 48 bits and discards the others.
When generating the actual random bits, this seed is manipulated as follows:
randomBits = (seed * 0x5DEECE66DL + 0xBL) & mask;
Now consider that the seeds used by Parker were sequential (0 to 1499), and that each was used once and then discarded. The first four seeds generated the following four sets of random bits:
101110110010000010110100011000000000101001110100
101110110001101011010101011100110010010000000111
101110110010110001110010001110011101011101001110
101110110010011010010011010011001111000011100001
Note that the top 10 bits are identical in each case. This is a problem because he only wants to generate values in the range 0-7 (which requires only a few bits), and the RNG implementation obtains them by shifting the high bits to the right and discarding the low bits. It does this because, in the general case, the high bits are more random than the low bits. In this case they are not, because the seed data was poor.
Finally, to see how these bits convert into the decimal values that we get, you need to know that java.util.Random makes a special case when n is a power of 2. It requests 31 random bits (the top 31 bits from the above 48), multiplies that value by n and then shifts it 31 bits to the right.
Multiplying by 8 (the value of n in this example) is the same as shifting left 3 places. So the net effect of this procedure is to shift the 31 bits 28 places to the right. In each of the 4 examples above, this leaves the bit pattern 101 (or 5 in decimal).
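To make the arithmetic concrete, here is a minimal sketch (the class name is mine; the constants come from the java.util.Random source quoted above) that reproduces the four bit patterns and the resulting value 5:

import java.util.Random;

public class LcgDemo {
    static final long MULTIPLIER = 0x5DEECE66DL;
    static final long ADDEND = 0xBL;
    static final long MASK = (1L << 48) - 1; // retain the low 48 bits

    public static void main(String[] args) {
        for (long seed = 0; seed < 4; seed++) {
            long scrambled = (seed ^ MULTIPLIER) & MASK;                // constructor scrambling
            long randomBits = (scrambled * MULTIPLIER + ADDEND) & MASK; // one LCG step
            int top31 = (int) (randomBits >>> (48 - 31));               // what next(31) returns
            int value = (int) ((8L * top31) >>> 31);                    // nextInt(8), power-of-two path
            System.out.println(Long.toBinaryString(randomBits) + " -> " + value);
            // Cross-check: new Random(seed).nextInt(8) returns the same value.
        }
    }
}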
If we didn't discard the RNGs after just one value, we would see the sequences diverge. While the four sequences above all start with 5, the second values of each are 6, 0, 2 and 4 respectively. The small differences in the initial seeds start to have an influence.
In response to the updated question: java.util.Random is thread-safe; you can share one instance across multiple threads, so there is still no need to have multiple instances. If you really must have multiple RNG instances, make sure they are seeded completely independently of each other; otherwise you can't trust the outputs to be independent.
As to why you get these kinds of effects: java.util.Random is not the best RNG. It's simple, pretty fast and, if you don't look too closely, reasonably random. However, if you run some serious tests on its output, you'll see that it's flawed. You can see that visually here.
If you need a more random RNG, you can use java.security.SecureRandom. It's a fair bit slower, but it works properly. One thing that might be a problem for you though is that it is not repeatable. Two SecureRandom instances with the same seed won't give the same output. This is by design.
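For example (a minimal sketch; SecureRandom extends java.util.Random, so the call site looks the same):

import java.security.SecureRandom;

SecureRandom rng = new SecureRandom(); // seeds itself from the platform's entropy source
int index = rng.nextInt(8);            // same API as java.util.Random, stronger output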
So what other options are there? This is where I plug my own library. It includes 3 repeatable pseudo-RNGs that are faster than SecureRandom and more random than java.util.Random. I didn't invent them, I just ported them from the original C versions. They are all thread-safe.
I implemented these RNGs because I needed something better for my evolutionary computation code. In line with my original brief answer, this code is multi-threaded but it only uses a single RNG instance, shared between all threads.
I need to check whether X is divisible by Y or not.
I'm using the "mod" operator.
if (X.mod(Y).equals(BigInteger.ZERO)) {
    // do something
}
Now, I'm only interested in the case when X is divisible by Y; I don't need the actual remainder in other cases.
I'm looking for a faster way to check divisibility when the dividend is fixed. More precisely, I want to check a large number (a potential prime) against many smaller primes before going to the Lucas-Lehmer test.
I was just wondering whether we can make some look-ahead decision based on the last one or two digits of X and Y, and skip the mod entirely when there is no chance of getting zero.
Java BigIntegers (like most numbers in computers since about 1980) are binary, so the only moduli that can be optimized by looking at the last 'digits' (binary digits = bits) are powers of 2, and the only power of 2 that is prime is 2 itself (2^1). BigInteger.testBit(0) tests that directly. However, most software that generates large should-be primes is for cryptography (like RSA, Rabin, DH, DSA) and ensures it never tests an even candidate in the first place; see e.g. FIPS 186-4 A.1.1.2 (or earlier).
Since your actual goal is not as stated in the title, but to test whether one (large) integer is not divisible by any of several small primes, the mathematically fastest way is to form their product -- in general any common multiple, preferably the least, but for distinct primes the product is the LCM -- and compute its GCD with the candidate using Euclid's algorithm. If the GCD is 1, no prime in the product divides the candidate. This requires several BigInteger divideAndRemainder operations, but it handles all of your tests in one fell swoop.
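A minimal sketch of that idea (the class name, method name, and prime list are mine):

import java.math.BigInteger;

public class SmallPrimeScreen {
    // Product of the small primes to screen against (illustrative list; about 1.5e14, fits in a long).
    private static final BigInteger PRODUCT =
            BigInteger.valueOf(3L * 5 * 7 * 11 * 13 * 17 * 19 * 23 * 29 * 31 * 37 * 41);

    // True if the candidate shares no factor with any prime in PRODUCT.
    static boolean coprimeToSmallPrimes(BigInteger candidate) {
        return candidate.gcd(PRODUCT).equals(BigInteger.ONE);
    }
}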
A middle way is to 'bunch' several small primes whose product is less than 2^31 or 2^63, take BigInteger.mod (or .remainder) by that product as .intValue() or .longValue() respectively, and test the result (if nonzero) for divisibility by each of the small primes using int or long operations, which are much faster than the BigInteger ones. Repeat for several bunches if needed. BigInteger.probablePrime and related routines do exactly this (primes 3..41 against a long) for candidates up to 95 bits, above which they consider an Eratosthenes-style sieve more efficient. (In either case this is followed by Miller-Rabin and Lucas-Lehmer.)
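A minimal sketch of one such bunch (names are mine; the real BigInteger code differs in detail):

import java.math.BigInteger;

public class BunchedScreen {
    private static final int[] SMALL_PRIMES = {3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41};
    private static final long BUNCH_PRODUCT;
    static {
        long p = 1;
        for (int q : SMALL_PRIMES) p *= q; // about 1.5e14, fits comfortably in a long
        BUNCH_PRODUCT = p;
    }

    // True if no prime in the bunch divides the candidate
    // (candidate assumed larger than the largest prime in the bunch).
    static boolean passesBunch(BigInteger candidate) {
        long r = candidate.mod(BigInteger.valueOf(BUNCH_PRODUCT)).longValue();
        for (int q : SMALL_PRIMES) {
            if (r % q == 0) return false; // one cheap long operation per prime
        }
        return true;
    }
}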
When measuring things like this in Java, remember that if you execute some method 'a lot', where the exact definition of 'a lot' can vary and be hard to pin down, all common JVMs will JIT-compile the code, changing the performance radically. If you are doing it a lot be sure to measure the compiled performance, and if you aren't doing it a lot the performance usually doesn't matter. There are many existing questions here on SO about pitfalls in 'microbenchmark(s)' for Java.
There are algorithms to check divisibility, but it's a plural: each algorithm covers a particular class of divisors, e.g. divisible by 3, divisible by 4, etc. A list of some of them can be found e.g. on Wikipedia. There is no general, high-performance algorithm that works for any given divisor; otherwise whoever found it would be famous and every divisibility-check implementation out there would use it.
I am using a simple random calculation for a small range of 4 elements.
indexOfNext = new Random().nextInt(4); //randomize 0 to 3
When I attach the debugger I see that the result is 1 every time for the first 2-3 runs.
Is this the wrong method for such a small range, or should I implement different logic (e.g. detecting the previous random result)?
NOTE: If this is just a coincidence, then using this method is the wrong way to go, right? If yes, can you suggest an alternative method? Shuffle, maybe? Ideally, the alternative method would have a way to check that the next result is not the same as the previous result.
Don't create a new Random() each time; create one object and call it many times. This is because you need one random generator and many numbers from its random sequence, as opposed to many random generators from which you take only the first number of each sequence.
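A minimal sketch of that fix (class and field names are mine):

import java.util.Random;

public class Picker {
    private final Random rnd = new Random(); // created once, reused for every draw

    public int nextIndex() {
        return rnd.nextInt(4); // uniform over 0..3, advancing one shared sequence
    }
}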
You're abusing the random number generator by creating a new instance repeatedly. (This is because the implementation sets the starting value as a very deterministic function of your system clock time.) This ruins the statistical properties of the generator.
You should create one instance, then call nextInt as you need numbers.
One thing you can do is hold onto the Random instance. It has to seed itself each time you instantiate it, and if that seed is the same, it will generate the same random sequence each time.
Another option is switching to SecureRandom, but you definitely want to hold onto that instance too, for the best random number performance. You really only need SecureRandom if you are randomly generating things that have security implications, like implementing crypto algorithms.
"Random" doesn't mean "non-repeating", and you cannot judge randomness by short sequences. For example, imagine that you have 2 sequences of 1 and 0:
101010101010101010101010101010101
and
110100100001110100100011111010001
Which looks more random? The second one, of course. But if you take any short run of 3, 4, or 5 numbers from it, that run will look less random than one taken from the first sequence. This is a well-known and researched paradox.
A pseudo-random number generator requires a seed to work. The seed is a bit array whose size depends on the implementation. The initial seed for a Random can either be specified manually or assigned automatically.
Now, there are several common ways of assigning a random seed. One of them is ++currentSeed; another is the current timestamp.
It is possible that Java uses System.currentTimeMillis() to get the timestamp and initialize the seed with it. Since the resolution of the timestamp is at best a millisecond (it differs on some machines; WinXP AFAIK had 3 ms), all Random instances instantiated within the same millisecond-resolution window will have the same seed.
Another feature of pseudo-random number generators is that if they have the same seed, they return the same numbers in the same order.
So if you get the first pseudo-random number returned by several Randoms initialized with the same seed you are bound to get the same number for a few times. And I suspect that's what's happening in your case.
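That property is easy to see directly (a minimal sketch; the variable names are mine):

import java.util.Random;

Random a = new Random(123);
Random b = new Random(123);
for (int i = 0; i < 5; i++) {
    System.out.println(a.nextInt(4) == b.nextInt(4)); // always true: same seed, same sequence
}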
It seems that the number 1 has about a 30% chance of showing up, which is more than the other numbers. You can read this link for more information on the subject.
You can also read up on Benford's law.
The Javadoc of the nextLong() method of the Random class states that
Because class Random uses a seed with only 48 bits, this algorithm will not return all possible long values. (Random javadoc)
The implementation is:
return ((long)next(32) << 32) + next(32);
The way I see it is as follows: to create any possible long, we should generate every possible 64-bit pattern with equal likelihood. Assuming the calls to next(int) give us 32 random bits, the concatenation of those bits will be a sequence of 64 random bits, and hence we generate each 64-bit pattern with equal likelihood, and therefore all possible long values.
I suppose that the person who wrote the javadoc knows better and that my reasoning is flawed somehow. Can anyone explain where my reasoning is incorrect, and what kinds of longs will be returned?
Since Random is pseudo-random, we know that given the same seed it will return the same values. Taking the docs at their word, there are 48 bits of seed. This means there are at most 2^48 unique values that can come out; if there were more, that would mean some seed state in position < 2^48 gives us a different value this time than it did last time.
If we try to join up two consecutive results, what do we see?
|a|b|c|d|e|f|...|(2^48)-1|
Above are some values. How many pairs are there? a-b, b-c, c-d, ..., and (2^48)-1 wrapping back to a. There are likewise only 2^48 pairs. We can't fill all 2^64 values of a long with only 2^48 pairs.
Pseudo-random number generators are like giant rings of numbers: you start somewhere and then move around the ring step by step as you pull numbers out. This means that for a given seed, i.e. an initial internal state, all subsequent numbers are predetermined. Therefore, since the internal state is only 48 bits wide, only 2^48 distinct random numbers are possible, and since each number is determined by the previous one, it is now clear why that implementation of nextLong will not generate all possible long values.
Let's say a perfect pseudo-random K-bit generator is one that produces all 2^K possible values within 2^K tries. We can't do better: there are only 2^K states, every state is completely determined by the previous state, and it in turn determines the next state.
Assume we write the output of the 48-bit generator down in binary. We get 2^48 * 48 bits that way.
And now we can say exactly how many 64-bit sequences we can get by going through the list and noting the next 64 bits (wrapping to the start when needed). It is exactly the number of bits we have: 13510798882111488.
Even if we assume that all those 64-bit sequences are pairwise different (which is not at all obvious), we have a long way to go until 2^64: 18446744073709551616.
To write the numbers out again:
18446744073709551616 pairwise different 64 bit sequences we need
13510798882111488 64 bit sequences we can get with a 48 bit seed.
This proves that the javadoc writer was right: only about 1/1365th of all long values can be produced by this generator (13510798882111488 out of 18446744073709551616 is roughly 1/1365).
I would like to get clarifications on Pseudo Random Number generation.
My questions are:
1) Is there any chance of getting repeated numbers in pseudo-random number generation?
2) When I googled, I found true random number generation. Can I get some algorithms for true random number generation, so that I can use them with
SecureRandom.getInstance(String algorithm)
Please give guidance with priority given to security.
1) Yes, you can generally get repeated numbers from a PRNG. In fact, by the pigeonhole principle the proof is straightforward: suppose you have a PRNG over the set of 32-bit unsigned integers; if you generate more than 2^32 pseudo-random numbers, at least one number must have been generated at least twice. In practice it happens much sooner. Usually a PRNG cycles through a sequence, and you can calculate or estimate the length of that cycle, at the end of which every single number starts repeating; moreover, the image of the algorithm is usually far, far smaller than the set from which you take your numbers.
If you need non-repeating numbers (since security seems to be a concern for you, note that this is less secure than a sequence of (pseudo-)random numbers in which repeats are allowed!), you can do as follows:
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

class NonRepeatedPRNG {
    private final Random rnd = new Random();
    private final Set<Integer> set = new HashSet<>();

    public int nextInt() {
        for (;;) {
            final int r = rnd.nextInt();
            if (set.add(r)) return r; // only return values not seen before
        }
    }
}
Note that the nextInt method defined above may never return! Use with caution.
2) No, there's no such thing as an "algorithm for true random number generation". An algorithm is something known that you control and can predict (i.e., run it and you have the output; you know exactly what it will output the next time you run it with the same initial conditions), while a true RNG is completely unpredictable by definition.
For most common non-security-related applications (e.g., scientific calculations, games, etc.), a PRNG will suffice. If security is a concern (e.g., you want random numbers for crypto), then a CSPRNG (cryptographically secure PRNG) will suffice.
If you have an application that cannot work without true randomness, I'm really curious to know more about it.
Yes, any random number generator can repeat. There are three general solutions to the non-duplicate random number problem:
1) If you want a few numbers from a large range, then pick one and reject it if it is a duplicate. If the range is large, this won't cause too many repeated attempts.
2) If you want a lot of numbers from a small range, then set out all the numbers in an array and shuffle the array. The Fisher-Yates algorithm is the standard for array shuffling; a sketch follows after this list. Take the random numbers in sequence from the shuffled array.
3) If you want a lot of numbers from a large range, then use an appropriately sized encryption algorithm, e.g. for 64-bit numbers use DES and encrypt 0, 1, 2, 3, ... in sequence. The output is guaranteed unique because encryption is reversible.
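A minimal sketch of option 2 using Collections.shuffle, which implements a Fisher-Yates shuffle internally (class and method names are mine):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class NonRepeatingDeck {
    private final List<Integer> deck = new ArrayList<>();

    public NonRepeatingDeck(int rangeSize) {
        for (int i = 0; i < rangeSize; i++) deck.add(i);
        Collections.shuffle(deck); // Fisher-Yates under the hood
    }

    // Each call returns a distinct value from 0..rangeSize-1, in random order.
    public int next() {
        return deck.remove(deck.size() - 1); // O(1) removal from the tail
    }
}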
Pseudo-RNGs can repeat themselves, but true RNGs can also repeat themselves; if they never repeated themselves, they wouldn't be random.
A good PRNG, once seeded with some (~128 bits of) real entropy, is practically indistinguishable from a true RNG. You certainly won't get noticeably more collisions or repetitions than with a true RNG.
Therefore you are unlikely ever to need a true random number generator, but if you do, check out the HTTP API at random.org. Their API is backed by a true random source: the randomness comes from atmospheric noise.
If a PRNG or RNG never repeated numbers, it would be... really predictable, actually! Imagine a PRNG over the numbers 1 to 8. You see it print out 2, 5, 7, 3, 8, 4, 6. If the PRNG tried its hardest not to repeat itself, now you know the next number is going to be 1 - that's not random at all anymore!
So PRNGs and RNGs produce random output with repetition by default. If you don't want repetition, you should use a shuffling algorithm like the Fisher-Yates shuffle (http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle) to shuffle an array of the numbers you want into random order.
Also, if you need a source of random numbers for cryptographic purposes, seek out a provider of cryptographic PRNGs for your language. As long as it's cryptographically strong it should be fine; a true RNG is a lot more expensive (or adds latency, as with random.org) and is not usually needed.
Simplified (i.e., leaving concurrency out) Random.next(int bits) looks like
protected int next(int bits) {
seed = (seed * multiplier + addend) & mask;
return (int) (seed >>> (48 - bits));
}
where masking gets used to reduce the seed to 48 bits. Why is it better than just
protected int next(int bits) {
seed = seed * multiplier + addend;
return (int) (seed >>> (64 - bits));
}
? I've read quite a lot about random numbers, but see no reason for this.
The reason is that the lower bits tend to have a shorter period (at least with the algorithm Java uses).
From Wikipedia - Linear Congruential Generator:
As shown above, LCG's do not always use all of the bits in the values they produce. The Java implementation produces 48 bits with each iteration but only returns the 32 most significant bits from these values. This is because the higher-order bits have longer periods than the lower order bits (see below). LCG's that use this technique produce much better values than those that do not.
Edit:
After further reading (conveniently, on Wikipedia), the values of a, c, and m must satisfy these conditions for the generator to have the full period:
c and m must be relatively prime
a-1 must be divisible by all prime factors of m
a-1 must be a multiple of 4 if m is a multiple of 4
The only one I can clearly tell is satisfied is #3. #1 and #2 need to be checked; in fact, for Java's constants (a = 0x5DEECE66D, c = 0xB, m = 2^48), c is odd so #1 holds, and a-1 = 0x5DEECE66C is divisible by 2 (the only prime factor of m) as well as by 4, so #2 holds too and the generator achieves its full period of 2^48.
From the docs at the top of java.util.Random:
The algorithm is described in The Art of Computer Programming,
Volume 2 by Donald Knuth in Section 3.2.1. It is a 48-bit seed,
linear congruential formula.
So the entire algorithm is designed to operate on 48-bit seeds, not 64-bit ones. I guess you can take it up with Mr. Knuth ;p
From wikipedia (the quote alluded to by the quote that #helloworld922 posted):
A further problem of LCGs is that the lower-order bits of the generated sequence have a far shorter period than the sequence as a whole if m is set to a power of 2. In general, the n-th least significant digit in the base-b representation of the output sequence, where b^k = m for some integer k, repeats with at most period b^n.
And furthermore, it continues (my italics):
The low-order bits of LCGs when m is a power of 2 should never be relied on for any degree of randomness whatsoever. Indeed, simply substituting 2^n for the modulus term reveals that the low-order bits go through very short cycles. In particular, any full-cycle LCG when m is a power of 2 will produce alternately odd and even results.
In the end, the reason is probably historical: the folks at Sun wanted something to work reliably, and the Knuth formula gave 32 significant bits. Note that the java.util.Random API says this (my italics):
If two instances of Random are created with the same seed, and the same sequence of method calls is made for each, they will generate and return identical sequences of numbers. In order to guarantee this property, particular algorithms are specified for the class Random. Java implementations must use all the algorithms shown here for the class Random, for the sake of absolute portability of Java code. However, subclasses of class Random are permitted to use other algorithms, so long as they adhere to the general contracts for all the methods.
So we're stuck with it as a reference implementation. However, that doesn't mean you can't use another generator (by subclassing Random or creating a new class):
from the same Wikipedia page:
MMIX by Donald Knuth: m = 2^64, a = 6364136223846793005, c = 1442695040888963407
There's a 64-bit formula for you.
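A minimal sketch of such a subclass (my own toy code, not a vetted generator; m = 2^64 falls out of natural long overflow):

import java.util.Random;

// Toy 64-bit LCG using Knuth's MMIX constants. Overriding next(bits)
// is explicitly permitted for subclasses of java.util.Random.
public class Mmix64Random extends Random {
    private long state;

    public Mmix64Random(long seed) {
        this.state = seed;
    }

    @Override
    protected int next(int bits) {
        state = state * 6364136223846793005L + 1442695040888963407L;
        return (int) (state >>> (64 - bits)); // keep the high (longer-period) bits
    }
}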
Random numbers are tricky (as Knuth notes) and depending on your needs, you might be fine with just calling java.util.Random twice and concatenating the bits if you need a 64-bit number. If you really care about the statistical properties, use something like Mersenne Twister, or if you care about information leakage / unpredictability use java.security.SecureRandom.
It doesn't look like there was a good reason for doing this.
Applying the mask is a conservative approach using a proven design. Leaving it out would most probably lead to a better generator; however, without knowing the math well, it's a risky step.
Another small advantage of masking is a speed gain on 8-bit architectures, since it uses 6 bytes instead of 8.