How to collect entropy sources in JRE(Java Runtime Environment)? - java

I want to find entropy sources (noise sources) in the JRE.
(By entropy source I mean the seed material, or seed, of a PRNG (pseudo-random number generator).)
However, I don't know exactly what makes a good entropy source for the seed,
so I am having difficulty finding proper entropy sources in the JRE.
Could you tell me about this?

You can extract entropy from different sources within the JVM:
The most obvious source is the clock
System.currentTimeMillis();
System.nanoTime();
You can ask for the free memory, which changes quite often
Runtime.getRuntime().freeMemory()
You can also use the internal hashCode of objects as entropy
System.identityHashCode(someObject)
new Object[0].hashCode()
Going for heavier stuff, maybe the most comprehensive sources of entropy are the MXBeans, which have many properties that change quite unpredictably over time.
ManagementFactory.getMemoryMXBean()
ManagementFactory.getThreadMXBean()
...
Finally, you can always generate Random and SecureRandom numbers or random UUIDs, for example
new Random().nextLong()
new SecureRandom().nextLong()
UUID.randomUUID()
By the way, somebody asked quite the same question here, with not much more success.
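As a rough sketch of how the readings listed above could be combined, the class below hashes a handful of them into a fixed-size seed candidate. The class name JvmEntropySketch, the method name collectSeed, and the choice of SHA-256 as the mixing function are assumptions made for illustration. These readings are weak and partly guessable, so this is not a substitute for SecureRandom.

```java
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: mixes several volatile JVM readings into one seed candidate.
final class JvmEntropySketch {

    /** Returns a 32-byte SHA-256 digest over seven JVM readings. */
    static byte[] collectSeed() {
        ByteBuffer buf = ByteBuffer.allocate(7 * Long.BYTES);
        buf.putLong(System.currentTimeMillis());                  // wall clock
        buf.putLong(System.nanoTime());                           // high-resolution timer
        buf.putLong(Runtime.getRuntime().freeMemory());           // changes often
        buf.putLong(System.identityHashCode(new Object()));       // internal hash code
        buf.putLong(ManagementFactory.getMemoryMXBean()
                .getHeapMemoryUsage().getUsed());                 // heap usage
        buf.putLong(ManagementFactory.getThreadMXBean()
                .getTotalStartedThreadCount());                   // thread statistics
        buf.putLong(ManagementFactory.getRuntimeMXBean().getUptime());
        try {
            return MessageDigest.getInstance("SHA-256").digest(buf.array());
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(collectSeed().length + " bytes collected");
    }
}
```

Hashing does not create entropy; it only condenses whatever unpredictability the inputs already contain into a uniform-looking value.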

Imagine the JVM as a sandbox.
When you are not allowed to leave the sandbox, your options are pretty much limited. Then you can do simple things, such as asking the user to do random typing on the keyboard, or you put up a panel and have him use the mouse to draw something.
But if your use case allows for making system calls, you can look into more advanced options based on the specifics of the operating system you are running on.

You should probably use SecureRandom if you're not interested in reaching for the underlying operating system or other sources external to the JVM.
Here's the official answer from Oracle:
A fast Random number designed for basic tasks where true randomness is
not the main goal. This is useful for things like which shade of color
to use, preventing overlap in force-directed layouts, or which picture
to show from a list after evaluating demographic information.
High-concurrency programs may also use ThreadLocalRandom if they value
speed over true randomness. This is the same as above but will likely
give better performance if simultaneous threads generate pseudo-random
numbers at the same time.
A slower but more random SecureRandom
designed for important tasks where the inability to predict numbers is
crucial to success. Examples include cases like gambling, scientific
sampling, or any cryptographic operations. Although slower than the
other two random number generators, its better randomness in many
applications.
Full post: https://blogs.oracle.com/java-platform-group/thats-so-securerandom
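To make the three choices in the quote concrete, here is a minimal sketch with one method per generator. The class name GeneratorChoice, its method names, and the specific sizes (256 shades, a 16-byte token) are hypothetical and only serve the illustration.

```java
import java.security.SecureRandom;
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical examples of matching a generator to a task.
final class GeneratorChoice {

    /** Basic task: pick one of 256 shades with a plain Random. */
    static int pickShade(Random rng) {
        return rng.nextInt(256);
    }

    /** High-concurrency task: each thread uses its own generator. */
    static int pickShadeConcurrently() {
        return ThreadLocalRandom.current().nextInt(256);
    }

    /** Unpredictability matters: draw token bytes from SecureRandom. */
    static byte[] newToken() {
        byte[] token = new byte[16];
        new SecureRandom().nextBytes(token);
        return token;
    }

    public static void main(String[] args) {
        System.out.println(pickShade(new Random()));
        System.out.println(pickShadeConcurrently());
        System.out.println(newToken().length + "-byte token generated");
    }
}
```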

Related

Will a genetic algorithm provide different output every time?

Since we expect a feasible solution from a genetic algorithm, will it provide different output every time for the same input set?
It depends.
GAs will take different routes through the solution space; and
GAs are not guaranteed to converge
GAs are typically used on complex problems with large solution spaces, convoluted payoffs, and local minima. In such problems you would not expect identical outputs at the end of a GA run.
On the other hand, GAs may be applied to large problems that have a single correct answer. In such situations, the population may converge.
A good GA implementation will support reproducible results. This applies to all metaheuristics (not just GAs). Reproducible means that the same run will find the same new best solutions in the same order. Depending on the actual CPU time given to the process, the number of iterations might differ, and therefore two runs might not end with the same best solution.
Internally, reproducible results implies that:
everything uses 1 seeded Random instance.
even parallel implementations yield reproducible results (=> no work stealing)
...
During development, reproducibility is worth its weight in gold to find, diagnose, debug and fix bugs.
In production, a few companies turn it off (to take advantage of performance gains such as work stealing), but most enterprises still like to leave it on.
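The "1 seeded Random instance" point can be sketched as follows. The class ReproducibleRun and its toy search loop are hypothetical stand-ins for a real metaheuristic; the only thing the sketch is meant to show is that a fixed seed and a fixed iteration count give the same result on every run.

```java
import java.util.Random;

// Toy stand-in for a metaheuristic: every random decision comes from one seeded Random.
final class ReproducibleRun {

    /** Returns the best (largest) candidate seen in a fixed number of iterations. */
    static long bestOf(long seed, int iterations) {
        Random rng = new Random(seed);          // the single source of randomness
        long best = Long.MIN_VALUE;
        for (int i = 0; i < iterations; i++) {
            long candidate = rng.nextLong();    // stands in for mutation/crossover
            if (candidate > best) {
                best = candidate;               // a "new best solution" event
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Same seed and iteration count: identical result on every run.
        System.out.println(bestOf(42L, 1_000) == bestOf(42L, 1_000));
    }
}
```

If the loop were bounded by wall-clock time instead of an iteration count, two runs could stop at different points, which is the caveat mentioned above.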

Why would you write your own Random Number Generator? [closed]

I've read a lot of guides on how to write your own random number generator, so I'm interested in the reasons why you would write your own since most languages already provide functions for generating random numbers:
Like C++
srand(time(NULL));
rand();
C#
Random rand = new Random();
rand.Next(100);
and Java
Random rand = new Random();
rand.nextInt(100);
I'm mainly looking for advantages to using your own.
If you've done your research and found out that the default generator is horrible (as is the case in C or Excel, or with IBM's infamous randu), you might be motivated to download or implement a better generator. However, unless you have a very deep understanding of probability, statistics, and numerical methods, you should under no circumstances try to create your own. Even such luminaries as John von Neumann have screwed up royally on this.
Another reason might be to get cross-platform reproducibility of results.
Never, ever, roll-your-own cryptography or random number generation unless you are very comfortable with the higher math involved. Here's a short test: if you understand probability distributions, linear feedback shift registers, the incomplete gamma function, and the Chinese remainder theorem, you might be qualified to roll your own.
Otherwise, use a generator provided by someone who does understand these things. The one built into your language might not be. So look for add-on libraries with good reputations.
Sometimes, even though you want a sequence of random numbers, you might want the same exact sequence of random numbers (for debugging or other purposes).
In a portable program, designed to be run on different systems with different libraries, and possibly different random number generators, accomplishing the goal stated above might not be possible.
If you instead implement your own, you would have control over this, and could make it behave the same on a multitude of systems, rather than relying on the provided implementation.
Also, as mentioned in a comment, a provided implementation may be bugged somehow.
One of the possible reasons is right in your question:
most languages already provide functions
They do but they are more often than not incompatible.
I had to write one once because the (lightweight) encryption I wrote was using a different language (Powerscript) than the decryption (VB) and their Random generators were not compatible.
Stock random number generators are usually pseudo-random number generators in most languages.
A pseudo-random number generator starts with some state, and uses it to produce an unpredictable sequence of seemingly uniform numbers.
There are many different pseudo-random number generators that have been researched. They have different advantages and disadvantages -- some are more random, some have longer periods, some are cryptographically strong and difficult to work out the seed from previous samples, some are fast, etc.
The one picked for a given language is going to be some compromise of the above features. In some cases, the one picked will be known to be a poor one, but for legacy reasons is left alone as the "stock" random number generator (rand() is an example of a poor random number generator). If you need different features from the ones your language's designers prioritized, writing your own (or finding one) is about the only way to get them.
In some languages, the random number generator (or the distribution generator) is underspecified, or subject to change between revisions of the language. If you need stability of your random number generator (say, you are using it to procedurally generate a game universe from a small seed -- see the classic game Star Control 2), writing it yourself may be required, even if it is a clone of the standard one on your system.
If you need your random number generator to be stable from one language to another, each language is going to have made different choices.
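As an illustration of "a clone of the standard one", here is a minimal sketch of the linear congruential step that the java.util.Random documentation specifies (48-bit state, multiplier 0x5DEECE66D, increment 0xB). The class name PortableLcg is hypothetical, and only nextInt() is reproduced; the same few lines could be ported to another language to get matching sequences.

```java
import java.util.Random;

// Hypothetical clone of the documented java.util.Random algorithm, kept under your own control.
final class PortableLcg {
    private static final long MULTIPLIER = 0x5DEECE66DL;
    private static final long INCREMENT = 0xBL;
    private static final long MASK = (1L << 48) - 1;   // keep 48 bits of state

    private long state;

    PortableLcg(long seed) {
        state = (seed ^ MULTIPLIER) & MASK;             // same seed scrambling as Random
    }

    /** Advances the state and returns its top {@code bits} bits. */
    private int next(int bits) {
        state = (state * MULTIPLIER + INCREMENT) & MASK;
        return (int) (state >>> (48 - bits));
    }

    int nextInt() {
        return next(32);
    }

    public static void main(String[] args) {
        PortableLcg mine = new PortableLcg(12345L);
        Random stock = new Random(12345L);
        // Both generators should print the same value.
        System.out.println(mine.nextInt() + " " + stock.nextInt());
    }
}
```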
In C++11, the old rand() was mostly deprecated, and a new library with 3 engines, 10 predefined generators, 3 engine adapters, 21 distributions, and 1 non-pseudo random number generator (random_device) was added. The distributions are under-specified, while the generators are not: if you need cross-compiler compatibility of results from a given seed state, you would need to write your own distributions.
Even in C++11 with that embarrassment of riches, the exact trade offs you want might not be available. So you'd have to write your own.
Note that C++11's set of generators was mostly written prior to C++11 being in existence. It was written because rand() was considered useless, and people wrote libraries with their own random number generators. Best practices were gathered, and formalized in that version of C++. So another reason to learn how to write them is that your language of choice will need to be improved, and programmers are the ones who need to do it.
For an in-depth discussion of pseudo-random number generator properties, Wikipedia is an acceptable place to start. It mentions that Java's LCG is a low-quality one.
The generators you list are all PRNGs. These particular PRNGs are not suitable for gaming, scientific, or cryptographic applications.

Types of randomness

Java's stock Random libraries include Random and SecureRandom (and I see ThreadLocalRandom as well). Are there any others? When would I use each? Sometimes I use SecureRandom just to feel better about my simple numbers. It turns out that SecureRandom actually lets you pick your generator. How and when should I use this?
Finally, Java 8 provides SecureRandom.getInstanceStrong(). I am not sure what this is, but it's much slower than any of the previous. How and when should I use SecureRandom.getInstanceStrong()? Also, is it slow because the noise source is running out?
Random is predictable, you just need a small sequence of the generated numbers and you can walk both forward and backwards through the sequence. See Inverse function of Java's Random function for an example of reversing the sequence.
SecureRandom is not.
ThreadLocalRandom addresses the contention that arises when several threads share one Random instance (Random itself is thread safe, but shared use is slow).
Other forms of random are possible with different features - you will have to study the maths of random numbers to be able to balance between the ones you mentioned and any other algorithm.
SecureRandom.getInstanceStrong() (note the Strong) seems to be an even stronger random sequence that is especially resilient to exposing long sequences.
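To show what "Random is predictable" means in practice, the sketch below recovers the generator's internal state from two consecutive nextInt() values and predicts the third. It relies on the algorithm documented for java.util.Random: each nextInt() exposes the top 32 bits of a 48-bit state, so only 16 bits have to be guessed. The class and method names are hypothetical.

```java
import java.util.Random;

// Hypothetical demonstration: predicting java.util.Random from two outputs.
final class RandomPredictionSketch {
    private static final long MULTIPLIER = 0x5DEECE66DL;
    private static final long INCREMENT = 0xBL;
    private static final long MASK = (1L << 48) - 1;

    /** Given two consecutive nextInt() results, returns the predicted third one. */
    static int predictNext(int first, int second) {
        long high = (first & 0xFFFFFFFFL) << 16;        // known top 32 bits of the state
        for (long low = 0; low < (1L << 16); low++) {   // brute-force the hidden 16 bits
            long state = high | low;
            long next = (state * MULTIPLIER + INCREMENT) & MASK;
            if ((int) (next >>> 16) == second) {        // candidate reproduces the 2nd output
                long third = (next * MULTIPLIER + INCREMENT) & MASK;
                return (int) (third >>> 16);
            }
        }
        throw new IllegalArgumentException("not consecutive nextInt() outputs");
    }

    public static void main(String[] args) {
        Random rng = new Random();
        int a = rng.nextInt();
        int b = rng.nextInt();
        // Compares the prediction with what the generator actually produces next.
        System.out.println(predictNext(a, b) == rng.nextInt());
    }
}
```

At most 65,536 candidates are tried, so the search finishes almost instantly. No comparable shortcut is known for SecureRandom.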
Randomness
Randomness can be measured statistically - I won't go into detail here, there are loads of resources out there that explain how this can be done.
It is comparatively easy to think up an algorithm that generates a statistically random sequence. However, if you only aim for statistical randomness and expect it to be a good source for encrypting your data, you are mistaken. You might as well use:
private static int lastRandom = 0;

public static int nextRandom() {
    return ++lastRandom;
}
The sequence generated will probably not pass the statistical tests for randomness but it would be about as predictable.
Predictability
This is a completely different mathematical problem far beyond a simple StackOverflow answer. If you want to generate a random number sequence that is not predictable at all you may as well use a Geiger counter or similar unpredictable hardware source. Have a look here for some interesting discussion.
Security
The problem is that a good encryption sequence must find the balance between making it difficult to reproduce while not making it impossible to reproduce. An impossible to reproduce sequence of random numbers is useless for encryption because you would never be able to reproduce the same sequence to decrypt.
Achieving difficult to reproduce without becoming impossible is the dream of cryptography. Again there are many resources but Wikipedia is, as usual, an excellent start.

Where to code this heuristic?

I want to ask a complex question.
I have to code a heuristic for my thesis. I need the following:
Evaluate some integral functions
Minimize functions over an interval
Do this over thousand and thousand times.
So I need a fast programming language for these jobs. Which language do you suggest? I started with Java, but evaluating integrals became a problem, and I'm not sure about its speed.
Connecting Java to other software such as MATLAB may be a good idea. Since I'm not sure, I would like your opinions.
Thanks!
C, Java, ... are all Turing-complete languages. They can calculate the same functions with the same precision.
If you want to achieve performance goals, use C, which is a compiled, high-performance language. It can decrease your computation time by avoiding the method calls and high-level features present in an interpreted language like Java.
Anyway, remember that your implementation may affect performance more than the language you choose, because as the input size grows it is the computational complexity that matters ( http://en.wikipedia.org/wiki/Computational_complexity_theory ).
It's not the programming language, it's probably your algorithm. Determine the big-O complexity of your algorithm. If you use loops within loops where you could use a hash lookup in a Map instead, your algorithm can be made about n times faster.
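The loops-versus-hash point can be sketched with a small, made-up task: counting how many elements of one array also occur in another. The class LookupSketch and both method names are hypothetical. The nested version does on the order of n*m comparisons, while the hashed version does roughly n+m operations.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical comparison of a nested-loop search with a hash-based lookup.
final class LookupSketch {

    /** O(n*m): scans all of b for every element of a. */
    static int countCommonNested(int[] a, int[] b) {
        int count = 0;
        for (int x : a) {
            for (int y : b) {
                if (x == y) {
                    count++;
                    break;              // count each element of a at most once
                }
            }
        }
        return count;
    }

    /** O(n+m): builds a hash set of b once, then does constant-time lookups. */
    static int countCommonHashed(int[] a, int[] b) {
        Set<Integer> seen = new HashSet<>();
        for (int y : b) {
            seen.add(y);
        }
        int count = 0;
        for (int x : a) {
            if (seen.contains(x)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        int[] b = {3, 4, 5};
        System.out.println(countCommonNested(a, b) + " " + countCommonHashed(a, b));
    }
}
```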
Note: Modern JVMs (JDK 1.5 or 1.6) compile just-in-time to native code (that is, not interpreted) for the specific OS, OS version, and hardware architecture. You could try the -server flag to JIT even more aggressively (at the cost of an even longer initialization time).
Do this over thousand and thousand times.
Are you sure it's not more, something like 10^1000 instead? Try accurately calculating how many times you need to run that loop, it might surprise you. The type of problems on which heuristics are used, tend to have a really big search space.
Before you start switching languages, I'd first try to do the following things:
Find the best available algorithms.
Find available implementations of those algorithms usable from your language.
There are e.g. scientific libraries for Java. Try to use these libraries.
If they are not fast enough, investigate whether there is anything to be done about it. Is your problem more specific than what the library assumes? Are you able to improve the algorithm based on that knowledge?
What is it that takes so much time/memory? Is this really related to your language? Take care not to measure JVM start-up time instead of the time spent on the actual calculation.
Then, I'd consider switching languages. But don't expect it to be easy to beat optimized third-party Java libraries in C.
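For the integrals mentioned in the question, a library routine is usually the better choice, but as a point of reference a basic numerical integrator is only a few lines of plain Java. The sketch below uses composite Simpson's rule; the class name SimpsonSketch and the choice of this particular rule are assumptions, not something the question specifies.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical minimal integrator using composite Simpson's rule.
final class SimpsonSketch {

    /** Approximates the integral of f over [a, b] using n subintervals (n must be even). */
    static double integrate(DoubleUnaryOperator f, double a, double b, int n) {
        if (n <= 0 || n % 2 != 0) {
            throw new IllegalArgumentException("n must be positive and even");
        }
        double h = (b - a) / n;
        double sum = f.applyAsDouble(a) + f.applyAsDouble(b);   // endpoints, weight 1
        for (int i = 1; i < n; i++) {
            double weight = (i % 2 == 0) ? 2.0 : 4.0;           // interior weights 4, 2, 4, ...
            sum += weight * f.applyAsDouble(a + i * h);
        }
        return sum * h / 3.0;
    }

    public static void main(String[] args) {
        // Integral of x^2 over [0, 3]; the exact value is 9.
        System.out.println(integrate(x -> x * x, 0.0, 3.0, 10));
    }
}
```

Each call costs n+1 function evaluations, so the total run time is governed by how many integrals are evaluated and how large n must be, which ties back to the algorithmic point above.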
Order of the algorithm
Typically, switching languages only reduces the time required by a constant factor. Let's say you can double the speed using C; if your algorithm is O(n^2), it will still take four times as long when you double the data, no matter the language.
And the JVM can optimize a lot of things getting good results.
Some possible optimizations in Java
If you have functions that are called a lot of times make them final. And the same for entire classes. The compiler will know that it can inline the method code, avoiding creating method-call stack frames for that call.

How good is java.util.Random?

Two Questions:
Will I get different sequences of numbers for every seed I put into it?
Are there some "dead" seeds? (Ones that produce zeros or repeat very quickly.)
By the way, which, if any, other PRNGs should I use?
Solution: Since I'm going to be using the PRNG to make a game, I don't need it to be cryptographically secure. I'm going with the Mersenne Twister, both for its speed and huge period.
To some extent, random number generators are horses for courses. The Random class implements an LCG with reasonably chosen parameters. But it still exhibits the following features:
fairly short period (2^48)
bits are not equally random (see my article on randomness of bit positions)
will only generate a small fraction of combinations of values (the famous problem of "falling in the planes")
If these things don't matter to you, then Random has the redeeming feature of being provided as part of the JDK. It's good enough for things like casual games (but not ones where money is involved). There are no weak seeds as such.
Another alternative is the XORShift generator, which can be implemented in Java as follows:
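// Generator state; must never be zero. Seeding from nanoTime is only an illustrative choice.
private long x = System.nanoTime() | 1L;

public long randomLong() {
    x ^= (x << 21);
    x ^= (x >>> 35);
    x ^= (x << 4);
    return x;
}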
For some very cheap operations, this has a period of 2^64-1 (zero is not permitted), and is simple enough to be inlined when you're generating values repeatedly. Various shift values are possible: see George Marsaglia's paper on XORShift Generators for more details. You can consider bits in the numbers generated as being equally random. One main weakness is that occasionally it will get into a "rut" where not many bits are set in the number, and then it takes a few generations to get out of this rut.
Other possibilities are:
combine different generators (e.g. feed the output from an XORShift generator into an LCG, then add the result to the output of an XORShift generator with different parameters): this generally allows the weaknesses of the different methods to be "smoothed out", and can give a longer period if the periods of the combined generators are carefully chosen
add a "lag" (to give a longer period): essentially, where a generator would normally transform the last number generated, store a "history buffer" and transform, say, the (n-1023)th.
I would say avoid generators that use a stupid amount of memory to give you a period longer than you really need (some have a period greater than the number of atoms in the universe-- you really don't usually need that). And note that "long period" doesn't necessarily mean "high quality generator" (though 2^48 is still a little bit low!).
As zvrba said, that JavaDoc explains the normal implementation. The Wikipedia page on pseudo-random number generators has a fair amount of information and mentions the Mersenne twister, which is not deemed cryptographically secure, but is very fast and has various implementations in Java. (The last link has two implementations - there are others available, I believe.)
If you need cryptographically secure generation, read the Wikipedia page - there are various options available.
As RNGs go, Sun's implementation is definitely not state-of-the-art, but it's good enough for most purposes. If you need random numbers for cryptographic purposes, there's java.security.SecureRandom; if you just want something faster and better than java.util.Random, it's easy to find Java implementations of the Mersenne Twister on the net.
This is described in the documentation. Linear congruential generators are theoretically well understood, and a lot of material on them is available in the literature and on the internet. A linear congruential generator with the same parameters always outputs the same periodic sequence; the only thing the seed decides is where the sequence begins. So the answer to your first question is "yes, if you generate enough random numbers."
See the answer in my blog post:
http://code-o-matic.blogspot.com/2010/12/how-long-is-period-of-random-numbers.html
Random has a maximal period for its state (a long, i.e. 2^64 period). This can be directly generalized to 2^k - invest as many state bits as you want, and you get the maximal period. Mersenne Twister actually has a very short period, comparatively (see the comments in said blog post).
--Oops. Random restricts itself to 48 bits, instead of using the full 64 bits of a long, so correspondingly, its period is 2^48 after all, not 2^64.
If RNG quality really matters to you, I'd recommend using your own RNG. Maybe java.util.Random is just great, in this version, on your operating system, etc. It probably is. But that could change. It's happened before that a library writer made things worse in a later version.
It's very simple to write your own, and then you know exactly what's going on. It won't change on upgrade, etc. Here's a generator you could port to Java in 10 minutes. And if you start writing in some new language a week from now, you can port it again.
If you don't implement your own, you can grab code for a well-known RNG from a reputable source and use it in your projects. Then nobody will change your generator out from under you.
(I'm not advocating that people come up with their own algorithms, only their own implementation. Most people, myself included, have no business developing their own algorithm. It's easy to write a bad generator that you think is wonderful. That's why people need to ask questions like this one, wondering how good the library generator is. The algorithm in the generator I referenced has been through the wringer of much peer review.)
