How good is java.util.Random?

Two Questions:
Will I get different sequences of numbers for every seed I put into it?
Are there some "dead" seeds? (Ones that produce zeros or repeat very quickly.)
By the way, which, if any, other PRNGs should I use?
Solution: Since I'm going to be using the PRNG to make a game, I don't need it to be cryptographically secure. I'm going with the Mersenne Twister, both for its speed and its huge period.

To some extent, random number generators are horses for courses. The Random class implements an LCG with reasonably chosen parameters. But it still exhibits the following features:
fairly short period (2^48)
bits are not equally random (see my article on randomness of bit positions)
will only generate a small fraction of combinations of values (the famous problem of "falling in the planes")
If these things don't matter to you, then Random has the redeeming feature of being provided as part of the JDK. It's good enough for things like casual games (but not ones where money is involved). There are no weak seeds as such.
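To make the "LCG with reasonably chosen parameters" concrete, here is a minimal sketch of the 48-bit recurrence that the java.util.Random Javadoc documents. The class name MiniLcg is made up for illustration; the constants and the seed scramble are the ones given in the Javadoc for setSeed and next. It is a sketch of the documented algorithm, not the JDK source.

```java
import java.util.Random;

/** Sketch of the 48-bit LCG documented for java.util.Random (MiniLcg is a hypothetical name). */
class MiniLcg {
    private static final long MULTIPLIER = 0x5DEECE66DL;
    private static final long ADDEND = 0xBL;
    private static final long MASK = (1L << 48) - 1; // keeps only 48 bits of state

    private long state;

    MiniLcg(long seed) {
        // Same initial scramble that the Random.setSeed Javadoc describes.
        state = (seed ^ MULTIPLIER) & MASK;
    }

    /** Advances the state one step and returns its top `bits` bits. */
    int next(int bits) {
        state = (state * MULTIPLIER + ADDEND) & MASK;
        return (int) (state >>> (48 - bits));
    }

    int nextInt() {
        return next(32);
    }

    public static void main(String[] args) {
        MiniLcg mine = new MiniLcg(42L);
        Random jdk = new Random(42L);
        for (int i = 0; i < 3; i++) {
            System.out.println(mine.nextInt() + " vs " + jdk.nextInt());
        }
    }
}
```

Note that only the upper bits of the state are ever returned: the low-order bits of an LCG modulo a power of two repeat with very short periods, which is one reason the bits are "not equally random".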
Another alternative is the XORShift generator, which can be implemented in Java as follows:
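// Generator state (declaration added here); any nonzero seed works.
private long x = System.nanoTime() | 1L;

public long randomLong() {
    x ^= (x << 21);
    x ^= (x >>> 35);
    x ^= (x << 4);
    return x;
}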
For some very cheap operations, this has a period of 2^64-1 (zero is not permitted), and is simple enough to be inlined when you're generating values repeatedly. Various shift values are possible: see George Marsaglia's paper on XORShift Generators for more details. You can consider bits in the numbers generated as being equally random. One main weakness is that occasionally it will get into a "rut" where not many bits are set in the number, and then it takes a few generations to get out of this rut.
Other possibilities are:
combine different generators (e.g. feed the output from an XORShift generator into an LCG, then add the result to the output of an XORShift generator with different parameters): this generally allows the weaknesses of the different methods to be "smoothed out", and can give a longer period if the periods of the combined generators are carefully chosen
add a "lag" (to give a longer period): essentially, where a generator would normally transform the last number generated, store a "history buffer" and transform, say, the (n-1023)th.
I would say avoid generators that use a stupid amount of memory to give you a period longer than you really need (some have a period greater than the number of atoms in the universe-- you really don't usually need that). And note that "long period" doesn't necessarily mean "high quality generator" (though 2^48 is still a little bit low!).
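As a rough illustration of the "combine different generators" idea above, here is a sketch that feeds one XORShift stream through a single LCG step and adds a second XORShift stream with different shifts. The class name, the second shift triple (13, 7, 17) and the LCG constants (Knuth's MMIX values) are my choices for the example, not something from the answer.

```java
/** Sketch: two XORShift streams combined through one LCG step (names and constants are illustrative). */
class CombinedGenerator {
    // Knuth's MMIX LCG constants, chosen here as an assumption.
    private static final long LCG_MULT = 6364136223846793005L;
    private static final long LCG_ADD = 1442695040888963407L;

    private long a; // XORShift state using shifts (21, 35, 4), as in the text
    private long b; // second XORShift state using shifts (13, 7, 17)

    CombinedGenerator(long seedA, long seedB) {
        if (seedA == 0 || seedB == 0) {
            throw new IllegalArgumentException("XORShift seeds must be nonzero");
        }
        a = seedA;
        b = seedB;
    }

    long nextLong() {
        // Advance the first XORShift generator.
        a ^= a << 21;
        a ^= a >>> 35;
        a ^= a << 4;
        // Advance the second XORShift generator.
        b ^= b << 13;
        b ^= b >>> 7;
        b ^= b << 17;
        // Push the first output through an LCG step, then add the second output.
        return (a * LCG_MULT + LCG_ADD) + b;
    }
}
```

Both XORShift streams here have the same period (2^64-1), so this only demonstrates the mixing; getting a longer combined period would need component generators whose periods are chosen to be coprime.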

As zvrba said, that JavaDoc explains the normal implementation. The Wikipedia page on pseudo-random number generators has a fair amount of information and mentions the Mersenne twister, which is not deemed cryptographically secure, but is very fast and has various implementations in Java. (The last link has two implementations - there are others available, I believe.)
If you need cryptographically secure generation, read the Wikipedia page - there are various options available.

As RNGs go, Sun's implementation is definitely not state-of-the-art, but it's good enough for most purposes. If you need random numbers for cryptographic purposes, there's java.security.SecureRandom; if you just want something faster and better than java.util.Random, it's easy to find Java implementations of the Mersenne Twister on the net.

This is described in the documentation. Linear congruential generators are theoretically well understood, and a lot of material on them is available in the literature and on the internet. A linear congruential generator with the same parameters always outputs the same periodic sequence; the only thing the seed decides is where in that sequence the output begins. So the answer to your first question is "yes, if you generate enough random numbers."
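A quick way to see the "seed only decides where the sequence begins" point is to build two generators from the same seed and compare them. SeedDemo and firstInts are names made up for this sketch; it only relies on the documented Random(long) constructor and nextInt().

```java
import java.util.Arrays;
import java.util.Random;

/** Sketch: the same seed always reproduces the same output (SeedDemo is a hypothetical name). */
class SeedDemo {
    /** Returns the first n ints produced by java.util.Random for the given seed. */
    static int[] firstInts(long seed, int n) {
        Random r = new Random(seed);
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = r.nextInt();
        }
        return out;
    }

    public static void main(String[] args) {
        // Same seed, same starting point in the cycle, same numbers.
        System.out.println(Arrays.equals(firstInts(7L, 5), firstInts(7L, 5)));
    }
}
```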

See the answer in my blog post:
http://code-o-matic.blogspot.com/2010/12/how-long-is-period-of-random-numbers.html
Random has a maximal period for its state (a long, i.e. 2^64 period). This can be directly generalized to 2^k - invest as many state bits as you want, and you get the maximal period. Mersenne Twister actually has a very short period, comparatively (see the comments in said blog post).
--Oops. Random restricts itself to 48 bits, instead of using the full 64 bits of a long, so correspondingly, its period is 2^48 after all, not 2^64.

If RNG quality really matters to you, I'd recommend using your own RNG. Maybe java.util.Random is just great, in this version, on your operating system, etc. It probably is. But that could change. It's happened before that a library writer made things worse in a later version.
It's very simple to write your own, and then you know exactly what's going on. It won't change on upgrade, etc. Here's a generator you could port to Java in 10 minutes. And if you start writing in some new language a week from now, you can port it again.
If you don't implement your own, you can grab code for a well-known RNG from a reputable source and use it in your projects. Then nobody will change your generator out from under you.
(I'm not advocating that people come up with their own algorithms, only their own implementation. Most people, myself included, have no business developing their own algorithm. It's easy to write a bad generator that you think is wonderful. That's why people need to ask questions like this one, wondering how good the library generator is. The algorithm in the generator I referenced has been through the wringer of much peer review.)

Related

How to collect entropy sources in JRE(Java Runtime Environment)?

I want to find entropy sources (noise sources) in the JRE.
(By entropy source I mean the seed material for a PRNG (pseudo-random number generator).)
However, I don't know exactly what makes a good entropy source for a seed,
so I'm having difficulty finding suitable entropy sources in the JRE.
Could you tell me about this?

You can extract entropy from different sources within the JVM:
The most obvious source is the clock
System.currentTimeMillis();
System.nanoTime();
You can ask for the free memory, which changes quite often
Runtime.getRuntime().freeMemory()
You can also use the internal hashCode of objects as entropy
System.identityHashCode(someObject)
new Object[0].hashCode()
Going for heavier stuff, maybe the most comprehensive source of entropy are the MXBeans, they have many properties which change quite unpredictably over time.
ManagementFactory.getMemoryMXBean()
ManagementFactory.getThreadMXBean()
...
Finally, you can always generate Random and SecureRandom numbers or random UUIDs, for example
new Random().nextLong()
new SecureRandom().nextLong()
UUID.randomUUID()
By the way, somebody asked quite the same question here, with not much more success.
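If you want to fold several of the sources listed above into one seed value, a sketch could look like this. SeedMixer, mix and gatherSeed are hypothetical names, and the mixing constants are arbitrary odd values I picked. Treat it as a convenience for non-security uses: the clock, free memory and identity hash are guessable, so only the SecureRandom contribution should be counted on for unpredictability.

```java
import java.security.SecureRandom;

/** Sketch: fold several JVM-visible values into one 64-bit seed (all names are hypothetical). */
class SeedMixer {
    /** Folds each value into a running 64-bit hash; every step is invertible. */
    static long mix(long... values) {
        long h = 0x9E3779B97F4A7C15L;   // arbitrary odd starting constant (assumption)
        for (long v : values) {
            h ^= v;
            h *= 0xBF58476D1CE4E5B9L;   // arbitrary odd multiplier (assumption)
            h ^= h >>> 32;
        }
        return h;
    }

    /** Collects the sources named in the answer and mixes them together. */
    static long gatherSeed() {
        return mix(
                System.currentTimeMillis(),
                System.nanoTime(),
                Runtime.getRuntime().freeMemory(),
                System.identityHashCode(new Object()),
                new SecureRandom().nextLong());
    }
}
```

A typical use would be `new java.util.Random(SeedMixer.gatherSeed())`.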

Imagine the JVM as a sandbox.
When you are not allowed to leave the sandbox, your options are pretty much limited. Then you can do simple things, such as asking the user to do random typing on the keyboard, or you put up a panel and have him use the mouse to draw something.
But if your use case allows for making system calls, you can look into more advanced options based on the specifics of the operating system you are running on.

You should probably use SecureRandom if you're not interested in reaching for the underlying operating system or other sources external to the JVM.
Here's the official answer from Oracle:
A fast Random number designed for basic tasks where true randomness is
not the main goal. This is useful for things like which shade of color
to use, preventing overlap in force-directed layouts, or which picture
to show from a list after evaluating demographic information.
High-concurrency programs may also use ThreadLocalRandom if they value
speed over true randomness. This is the same as above but will likely
give better performance if simultaneous threads generate pseudo-random
numbers at the same time.
A slower but more random SecureRandom
designed for important tasks where the inability to predict numbers is
crucial to success. Examples include cases like gambling, scientific
sampling, or any cryptographic operations. Although slower than the
other two random number generators, its better randomness in many
applications.
Full post: https://blogs.oracle.com/java-platform-group/thats-so-securerandom
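To put the three options from the quote side by side, here is a small sketch. The class and method names (GeneratorChoice, shadeIndex, workerJitterMillis, sessionToken) are invented for the example; the calls themselves are standard JDK API.

```java
import java.security.SecureRandom;
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;

/** Sketch: one illustrative use for each generator mentioned in the quote (names are hypothetical). */
class GeneratorChoice {
    /** Basic task: pick one of `shades` colour shades with a plain Random. */
    static int shadeIndex(Random rng, int shades) {
        return rng.nextInt(shades);
    }

    /** Concurrent code: each thread draws from its own generator, avoiding contention. */
    static int workerJitterMillis() {
        return ThreadLocalRandom.current().nextInt(0, 50);
    }

    /** Security-sensitive value: 16 unpredictable bytes from SecureRandom. */
    static byte[] sessionToken() {
        byte[] token = new byte[16];
        new SecureRandom().nextBytes(token);
        return token;
    }
}
```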

Why would you write your own Random Number Generator? [closed]

I've read a lot of guides on how to write your own random number generator, so I'm interested in the reasons why you would write your own since most languages already provide functions for generating random numbers:
Like C++
srand(time(NULL));
rand();
C#
Random rand = new Random();
rand.Next(100);
and Java
Random rand = new Random();
rand.nextInt(100);
I'm mainly looking for advantages to using your own.

If you've done your research and found out that the default generator is horrible (as is the case in C or Excel, or with IBM's infamous randu), you might be motivated to download or implement a better generator. However, unless you have a very deep understanding of probability, statistics, and numerical methods, you should under no circumstances try to create your own. Even such luminaries as John von Neumann have screwed up royally on this.
Another reason might be to get cross-platform reproducibility of results.

Never, ever, roll-your-own cryptography or random number generation unless you are very comfortable with the higher math involved. Here's a short test: if you understand probability distributions, linear feedback shift registers, the incomplete gamma function, and the Chinese remainder theorem, you might be qualified to roll your own.
Otherwise, use a generator provided by someone who does understand these things. The one built into your language might not be. So look for add-on libraries with good reputations.

Sometimes, even though you want a sequence of random numbers, you might want the same exact sequence of random numbers (for debugging or other purposes).
In a portable program, designed to be run on different systems with different libraries, and possibly different random number generators, accomplishing the goal stated above might not be possible.
If you instead implement your own, you would have control over this, and could make it behave the same on a multitude of systems, rather than relying on the provided implementation.
Also, as mentioned in a comment, a provided implementation may be bugged somehow.

One of the possible reasons is right in your question..:
most languages already provide functions
They do but they are more often than not incompatible.
I had to write one once because the (lightweight) encryption I wrote was using a different language (Powerscript) than the decryption (VB) and their Random generators were not compatible.

Stock random number generators are usually pseudo-random number generators in most languages.
A pseudo-random number generator starts with some state, and uses it to produce an unpredictable sequence of seemingly uniform numbers.
There are many different pseudo-random number generators that have been researched. They have different advantages and disadvantages -- some are more random, some have longer periods, some are cryptographically strong and difficult to work out the seed from previous samples, some are fast, etc.
The one picked for a given language is going to be some compromise of the above features. In some cases, the one picked will be known to be a poor one, but for legacy reasons is left alone as the "stock" random number generator (rand() is an example of a poor random number generator). If you need different features than your given language random number generator picked as important, writing your own (or finding one) is about the only way to get it.
In some languages, the random number generator (or the distribution generator) is underspecified, or subject to change between revisions of the language. If you need stability of your random number generator (say, you are using it to procedurally generate a game universe from a small seed -- see the classic game Star Control 2), writing it yourself may be required, even if it is a clone of the standard one on your system.
If you need your random number generator to be stable from one language to another, each language is going to have made different choices.
In C++11, the old rand() was mostly deprecated, and a new library with 3 engines, 10 predefined generators, 3 engine adapters, 21 distributions, and 1 non-pseudo random number generator (random_device) was added. The distributions are under-specified, while the generators are not: if you need cross-compiler compatibility of results from a given seed state, you would need to write your own distributions.
Even in C++11 with that embarrassment of riches, the exact trade offs you want might not be available. So you'd have to write your own.
Note that C++11's set of generators was mostly written prior to C++11 being in existence. It was written because rand() was considered useless, and people wrote libraries with their own random number generators. Best practices were gathered and formalized in that version of C++. So another reason to learn how to write them is that your language of choice will need to be improved, and programmers are the ones who need to do it.
For an in-depth discussion of pseudo-random number generator properties, Wikipedia is an acceptable place to start. It mentions that Java's LCG is a low-quality one.

The generators you list are all PRNGs. These particular PRNGs are not suitable for gaming, scientific, or cryptographic applications.

Types of randomness

Java's stock Random libraries include Random and SecureRandom (and I see ThreadLocalRandom as well). Are there any others? When would I use each? Sometimes I use SecureRandom just to feel better about my simple numbers. It turns out that SecureRandom actually lets you pick your generator. How and when should I use this?
Finally, Java 8 provides SecureRandom.getInstanceStrong(). I am not sure what this is, but it's much slower than any of the previous. How and when should I use SecureRandom.getInstanceStrong()? Also, is it slow because the noise source is running out?

Random is predictable: you just need a small sequence of the generated numbers and you can walk both forwards and backwards through the sequence. See Inverse function of Java's Random function for an example of reversing the sequence.
SecureRandom is not.
ThreadLocalRandom addresses the contention that arises when several threads share one Random instance (Random itself is thread-safe, but concurrent use of a single instance performs poorly).
Other forms of random are possible with different features - you will have to study the maths of random numbers to be able to balance between the ones you mentioned and any other algorithm.
SecureRandom.getInstanceStrong() (note the Strong) seems to be an even stronger random sequence that is especially resilient to exposing long sequences.
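To illustrate the "walk backwards" remark about Random, here is a sketch that undoes one step of the documented 48-bit LCG. LcgReverse and its method names are made up for the example, and the inverse multiplier is computed in code rather than hard-coded. It assumes you already know the full 48-bit internal state; recovering that state from observed outputs takes an extra search over the bits that nextInt() hides, which is not shown here.

```java
/** Sketch: stepping the java.util.Random LCG state forwards and backwards (names are hypothetical). */
class LcgReverse {
    static final long MULTIPLIER = 0x5DEECE66DL;
    static final long ADDEND = 0xBL;
    static final long MASK = (1L << 48) - 1;

    /** Multiplicative inverse of an odd number modulo 2^64, by Newton iteration. */
    static long inverse(long a) {
        long inv = a; // a * a == 1 (mod 8) for odd a, so 3 bits are already correct
        for (int i = 0; i < 5; i++) {
            inv *= 2 - a * inv; // each round doubles the number of correct bits
        }
        return inv;
    }

    /** One documented LCG step. */
    static long forward(long state) {
        return (state * MULTIPLIER + ADDEND) & MASK;
    }

    /** Undoes forward(): subtract the addend, multiply by the inverse multiplier. */
    static long backward(long state) {
        return ((state - ADDEND) * inverse(MULTIPLIER)) & MASK;
    }
}
```

Because a value that is an inverse modulo 2^64 is also an inverse modulo 2^48, masking the product to 48 bits is enough to recover the previous state.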
Randomness
Randomness can be measured statistically - I won't go into detail here, there are loads of resources out there that explain how this can be done.
It is comparatively easy to think up an algorithm that generates a statistically random sequence. However, if you only aim for statistical randomness and expect it to be a good source for encrypting your data, you are mistaken. You might as well use:
private static int lastRandom = 0;

public static int nextRandom() {
    return ++lastRandom;
}
The sequence generated will probably not pass the statistical tests for randomness but it would be about as predictable.
Predictability
This is a completely different mathematical problem far beyond a simple StackOverflow answer. If you want to generate a random number sequence that is not predictable at all you may as well use a Geiger counter or similar unpredictable hardware source. Have a look here for some interesting discussion.
Security
The problem is that a good encryption sequence must find the balance between making it difficult to reproduce while not making it impossible to reproduce. An impossible to reproduce sequence of random numbers is useless for encryption because you would never be able to reproduce the same sequence to decrypt.
Achieving difficult to reproduce without becoming impossible is the dream of cryptography. Again there are many resources but Wikipedia is, as usual, an excellent start.

any function in C or Java (Android) could mess up srand()?

So I have some C code which calculates some results based on the numbers generated after seeding with srand(). If I use the same seed, the result is always the same.
Now I have an Android app that loads this C code via JNI. However, the results are different even though the same seed is being used. I have double-checked the seed to make sure it is the same. However, since both the Android program and the native code are pretty complicated, I am having a hard time figuring out what is causing this problem.
What I am sure of is that we did not use any function in the Java program to generate random numbers. So presumably srand() is not called with a different seed every time. Can other functions in Java or C change the random numbers produced after srand()?
Thanks!
Update:
I guess my question was a little confusing. To clarify, the results I am comparing are from the same platform, but different runs. The C code uses rand() to get a number and calculates a result based on that. So if the seed passed to srand() is always the same, the numbers returned by rand() should be the same and hence the results should be the same. But somehow, even when I use the same seed for srand(), rand() gives me different numbers... Any thoughts on that?

There are many different types of random number generators, and they are not all guaranteed to be the same from platform to platform. If having a cross platform 100% predictable solution is necessary for your project, you'll probably have to write your own.
It's really not as bad as it may sound...
I'd recommend looking up random number generation such as the Mersenne Twister algorithm (which is what I use in my projects), and write a small block of code that you can share amongst all your projects. This also gives you the benefit of being able to have multiple generators with varying seeds, which comes in really useful for something like a puzzle game, where you might want a predictably random set based on a specific seed to generate your puzzle, but another clock seeded generator for randomizing special FX or other game elements.
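On the Java side, the "multiple generators with varying seeds" idea could be sketched like this. GameRandoms, puzzleLayout and sparkleCount are invented names, and java.util.Random stands in here for the Mersenne Twister the answer recommends, purely to keep the example self-contained.

```java
import java.util.Random;

/** Sketch: one reproducible generator for the puzzle, one clock-seeded for effects (names are hypothetical). */
class GameRandoms {
    private final Random puzzle;                                  // fixed seed: same puzzle every time
    private final Random effects = new Random(System.nanoTime()); // clock seed: varies per run

    GameRandoms(long puzzleSeed) {
        puzzle = new Random(puzzleSeed);
    }

    /** Fills `cells` slots with piece kinds in the range 0..kinds-1. */
    int[] puzzleLayout(int cells, int kinds) {
        int[] layout = new int[cells];
        for (int i = 0; i < cells; i++) {
            layout[i] = puzzle.nextInt(kinds);
        }
        return layout;
    }

    /** Cosmetic randomness; drawing from it never disturbs the puzzle sequence. */
    int sparkleCount() {
        return effects.nextInt(10);
    }
}
```

Keeping the two streams separate means that changing how often effects are drawn cannot shift the puzzle that a given seed produces.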

The pseudo-random algorithm implemented by rand() is determined by the C library, and there is no standard algorithm for it. You are absolutely not guaranteed to get the same sequence of numbers from one implementation to the next, and it sounds like the Android implementation differs from your development environment. If you need a predictable sequence across platforms, you should implement your own random number generator.

Pollard-Rho Factorization Parallelization

I recently stumbled upon a paper on a parallelization of Pollard's Rho algorithm, and given my specific application, in addition to the fact that I haven't attained the required level of math, I'm wondering if this particular parallelization method helps my specific case.
I'm trying to find two factors—semiprimes—of a very large number. My assumption, based on what little I can understand of the paper, is that this parallelization works well on a number with lots of smaller factors, rather than on two very large factors.
Is this true? Should I use this parallelization or use something else? Should I even use Pollard's Rho, or is there a better parallelization of a different factorization algorithm?

The Wikipedia article gives two concrete examples:
Number                 Original code   Brent's modification
18446744073709551617   26 ms           5 ms
10023859281455311421   109 ms          31 ms
First of all, run these two with your program and take a look at your times. If they are similar to this ("hard" numbers calculating 4-6 times longer), ask yourself if you can live with that. Or even better, use other algorithms like simple classic "brute force" factorization and look at the times they give. I guess they might have a hard-easy factor closer to 1, but worse absolute times, so it's a simple trade-off.
Side note: Of course, parallelization is the way to go here, I guess you know that but I think it's important to emphasize. Also, it would help for the case that another approach lies between the Pollard-rho timings (e.g. Pollard-Rho 5-31 ms, different approach 15-17 ms) - in this case, consider running the 2 algorithms in separate threads to do a "factorization race".
In case you don't have an actual implementation of the algorithm yet, here are Python implementations.

The basic idea in factoring large integers is to use a variety of methods, each with its own pluses and minuses. The usual plan is to start with trial division by primes to 1000 or 10000, followed by a few million Pollard rho steps; that ought to get you factors up to about twelve digits. At that point a few tests are in order: is the number a prime power or a perfect power (there are simple tests for those properties). If you still haven't factored the number, you know that it will be hard, so you will need heavy-duty tools. A useful next step is Pollard's p-1 method, followed by its close cousin the elliptic curve method. After a while, if that doesn't work, the only methods left are quadratic sieve or number field sieve, which are inherently parallel.
The parallel rho method that you asked about isn't widely used today. As you suggested, Pollard rho is better suited to finding small factors rather than large ones. For a semi-prime, it's better to spend parallel cycles on one of the sieves than on Pollard rho.
I recommend the factoring forum at mersenneforum.org for more information.
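For reference, the basic (non-parallel) Pollard rho step that both answers talk about can be sketched in Java as below. This is the textbook Floyd cycle-finding variant with f(x) = x^2 + c, not the parallel method from the paper and not Brent's modification; the class name, the starting value 2 and the limit of 20 values of c are choices made for the example.

```java
import java.math.BigInteger;

/** Sketch: textbook Pollard rho with Floyd cycle detection (names and limits are illustrative). */
class PollardRho {
    /** Returns a nontrivial factor of n, or null if none was found (for example when n is prime). */
    static BigInteger findFactor(BigInteger n) {
        BigInteger two = BigInteger.valueOf(2);
        if (n.compareTo(BigInteger.valueOf(4)) < 0) {
            return null; // nothing to split below 4
        }
        if (!n.testBit(0)) {
            return two; // even numbers are handled directly
        }
        for (int c = 1; c <= 20; c++) { // retry with another polynomial on failure
            BigInteger step = BigInteger.valueOf(c);
            BigInteger x = two;
            BigInteger y = two;
            BigInteger d = BigInteger.ONE;
            while (d.equals(BigInteger.ONE)) {
                x = x.multiply(x).add(step).mod(n); // tortoise: one step
                y = y.multiply(y).add(step).mod(n); // hare: two steps
                y = y.multiply(y).add(step).mod(n);
                d = x.subtract(y).gcd(n);
            }
            if (!d.equals(n)) {
                return d; // 1 < d < n, so d is a proper factor
            }
        }
        return null;
    }
}
```

The expected number of steps grows roughly with the square root of the smallest prime factor, which is why this approach suits small factors and becomes impractical for a semiprime with two very large factors.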
