Bit-wise efficient uniform random number generation - java

I recall reading about a method for efficiently using random bits in an article on a math-oriented website, but I can't seem to get the right keywords in Google to find it anymore, and it's not in my browser history.
The gist of the problem that was being asked was to take a sequence of random numbers in the domain [domainStart, domainEnd) and efficiently use the bits of the random number sequence to project uniformly into the range [rangeStart, rangeEnd). Both the domain and the range are integers (more correctly, longs and not Z). What's an algorithm to do this?
Implementation-wise, I have a function with this signature:
long doRead(InputStream in, long rangeStart, long rangeEnd);
in is based on a CSPRNG (fed by a hardware RNG, conditioned through SecureRandom) that I am required to use; the return value must be between rangeStart and rangeEnd, but the obvious implementation of this is wasteful:
long doRead(InputStream in, long rangeStart, long rangeEnd) throws IOException {
    long retVal = 0;
    long range = rangeEnd - rangeStart;
    // Fill bytes until we cover the range
    for (int i = 0; (1L << (8 * i)) < range; i++) {
        long b;
        do {
            b = in.read();
            // ...but be sure we don't exceed range
        } while (retVal + (b << (8 * i)) >= range);
        retVal += b << (8 * i);
    }
    return retVal + rangeStart;
}
I believe this is effectively the same idea as (rand() * (max - min)) + min, except that we discard bits that would push us over max. Rather than use a modulo operator, which would incorrectly bias the results toward the lower values, we discard those bits and try again. Since hitting the CSPRNG may trigger re-seeding (which can block the InputStream), I'd like to avoid wasting random bits. Henry points out that this code biases against 0 and 256; Banthar demonstrates it in an example.
First edit: Henry reminded me that summation invokes the Central Limit Theorem. I've fixed the code above to get around that problem.
Second edit: Mechanical snail suggested that I look at the source for Random.nextInt(). After reading it for a while, I realized that this problem is similar to the base conversion problem. See answer below.

Your algorithm produces biased results. Let's assume rangeStart=0 and rangeEnd=257. If the first byte is greater than 0, that will be the result. If it's 0, the result will be either 0 or 256 with 50/50 probability. So 0 and 256 are only half as likely to be chosen as any other number.
I did a simple test to confirm this:
p(0)=0.001945
p(1)=0.003827
p(2)=0.003818
...
p(254)=0.003941
p(255)=0.003817
p(256)=0.001955
I think you need to do the same as java.util.Random.nextInt and discard the whole number, instead of just the last byte.
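For illustration, here is a minimal sketch of that whole-value rejection approach; the method name and helper logic are my own reconstruction, not Random's actual source:
static long doReadUnbiased(InputStream in, long rangeStart, long rangeEnd) throws IOException {
    long range = rangeEnd - rangeStart;
    // bits needed to represent range-1; range == 1 needs no bits at all
    int bits = 64 - Long.numberOfLeadingZeros(range - 1);
    int bytes = (bits + 7) / 8;
    long mask = (bits == 0) ? 0 : -1L >>> (64 - bits);
    long candidate;
    do {
        candidate = 0;
        for (int i = 0; i < bytes; i++) {
            int b = in.read();
            if (b < 0) throw new IOException("entropy stream exhausted");
            candidate = (candidate << 8) | b;
        }
        candidate &= mask;          // keep only the bits we actually need
    } while (candidate >= range);   // discard the whole number and retry
    return rangeStart + candidate;
}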

After reading the source to Random.nextInt(), I realized that this problem is similar to the base conversion problem.
Rather than converting a single symbol at a time, it is more effective to convert blocks of input symbols at a time through an accumulator "buffer" which is large enough to represent at least one symbol in the domain and in the range. The new code looks like this:
public int[] fromStream(InputStream input, int length, int rangeLow, int rangeHigh) throws IOException {
    int[] outputBuffer = new int[length];
    // buffer is initially 0, so there is only 1 possible state it can be in
    int numStates = 1;
    long buffer = 0;
    int alphaLength = rangeHigh - rangeLow;
    // Fill outputBuffer from 0 to length
    for (int i = 0; i < length; i++) {
        // Until buffer has sufficient data filled in from input to emit one
        // symbol in the output alphabet, fill buffer.
        fill:
        while (numStates < alphaLength) {
            // Shift buffer left by 8 bits (*256) to mix in new data (one byte)
            buffer = buffer << 8 | input.read();
            // Multiply by 256, as that's the number of states we have possibly introduced
            numStates = numStates << 8;
        }
        // Emit the least significant symbol in base alphaLength
        outputBuffer[i] = (int) (rangeLow + (buffer % alphaLength));
        // We have consumed the least significant portion of the input.
        buffer = buffer / alphaLength;
        // Track the number of states remaining in buffer
        numStates = numStates / alphaLength;
    }
    return outputBuffer;
}
There is a fundamental difference between converting numbers between bases and this problem, however: to convert between bases, one needs complete information about the number, since successive divisions by the target base produce the remainders that become the digits in the target alphabet. In this problem I don't actually need all that information, as long as I'm not biasing the data, which is why I can do what I do in the loop labeled "fill".
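As a sanity check, here is a hypothetical test harness for the method above (assuming fromStream is in scope and made static for the demo): it feeds conditioned SecureRandom bytes through the converter and tallies symbol frequencies, so the pronounced dips at p(0) and p(256) from Banthar's test should be gone.
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.security.SecureRandom;

public static void main(String[] args) throws IOException {
    byte[] entropy = new byte[1 << 20];   // ~1MB, plenty for 200,000 symbols
    new SecureRandom().nextBytes(entropy);
    int samples = 200_000;
    int[] symbols = fromStream(new ByteArrayInputStream(entropy), samples, 0, 257);
    int[] counts = new int[257];
    for (int s : symbols) counts[s]++;
    System.out.printf("p(0)=%f p(128)=%f p(256)=%f%n",
            counts[0] / (double) samples,
            counts[128] / (double) samples,
            counts[256] / (double) samples);
}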

Related

Represent long in least amount of characters

I need to represent both very large and small numbers in the shortest string possible. The numbers are unsigned. I have tried just straight Base64 encode, but for some smaller numbers, the encoded string is longer than just storing the number as a string. What would be the best way to most optimally store a very large or short number in the shortest string possible with it being URL safe?
I have tried just straight Base64 encode, but for some smaller numbers, the encoded string is longer than just storing the number as a string
Base64 encoding of binary byte data will make it longer, by about a third. It is not supposed to make it shorter, but to allow safe transport of binary data in formats that are not binary safe.
However, base 64 is more compact than decimal representation of a number (or of byte data), even if it is less compact than base 256 (the raw byte data). Encoding your numbers in base 64 directly will make them more compact than decimal. This will do it:
private static final String base64Chars =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
static String encodeNumber(long x) {
    char[] buf = new char[11];
    int p = buf.length;
    do {
        buf[--p] = base64Chars.charAt((int)(x % 64));
        x /= 64;
    } while (x != 0);
    return new String(buf, p, buf.length - p);
}
static long decodeNumber(String s) {
    long x = 0;
    for (char c : s.toCharArray()) {
        int charValue = base64Chars.indexOf(c);
        if (charValue == -1) throw new NumberFormatException(s);
        x *= 64;
        x += charValue;
    }
    return x;
}
Using this encoding scheme, Long.MAX_VALUE will be the string H__________, which is 11 characters long, compared to its decimal representation 9223372036854775807 which is 19 characters long. Numbers up to about 16 million will fit in a mere 4 characters. That's about as short as you'll get it. (Technically there are two other characters which do not need to be encoded in URLs: . and ~. You can incorporate those to get base 66, which would be a smidgin shorter for some numbers, although that seems a bit pedantic.)
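A quick hypothetical round trip with these two methods:
System.out.println(encodeNumber(9223372036854775807L)); // H__________
System.out.println(decodeNumber("H__________"));        // 9223372036854775807
System.out.println(encodeNumber(16777215L));            // ____ (64^4 - 1, 4 chars)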
To extend on Stephen C's answer, here is a piece of code that converts to base 62. You can increase the base by adding more characters to the digits string; just pick whichever characters are valid for you:
public static String toString(long n) {
    String digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    int base = digits.length();
    String s = "";
    while (n > 0) {
        int d = (int) (n % base);
        s = digits.charAt(d) + s;
        n = n / base;
    }
    return s.isEmpty() ? "0" : s;
}
This will never result in the string representation being longer than the decimal one.
Assuming that you don't do any compression, and that you restrict yourself to URL safe characters, then the following procedure will give you the most compact encoding possible.
Make a list of all URL safe characters
Count them. Suppose you have N.
Represent your number in base N, representing 0 by the first character, 1 by the 2nd and so on.
So, what about compression ...
If you assume that the numbers you are representing are uniformly distributed across their range, then there is no real opportunity for compression.
Otherwise, there is potential for compression. If you can reduce the size of the common numbers then you can typically achieve a saving by compression. This is how Huffman encoding works.
But the downside is that compression at this level is not perfect across the range of numbers. It reduces the size of some numbers, but it inevitably increases the size of others.
So what does this mean for your use-case?
I think it means that you are looking at the problem the wrong way. You should not be aiming for a minimal encoded size for every number. You should be aiming to minimize the size on average ... averaged over the actual distribution of your numbers.

Understanding Piece of Java code

While working with a code base, I am trying to understand a piece of code so that I can work with and customize it. I am able to understand almost 90% of the code flow. Here is the overall flow:
The code is used to generate 15-digit alphanumeric codes; the first 3 digits are customer-provided.
Initially, the code generates a 16-digit alphanumeric string and stores it in the cache.
The customer can generate any number of codes by specifying a quantity.
All customer codes are generated from the 16-digit string (point 2); every generated code uses only digits/letters from that 16-digit alphanumeric string.
When someone tries to use one of those codes, the system validates whether the provided code is valid.
I am stuck on the logic used to determine whether a provided code is valid. Here is that piece of code. I generated 6 codes as a sample; in this case the alphanumeric string generated and stored in the cache is
initial-alphabet : M9W6K3TENDGSFAL4
Code generated based on initial-alphabet are
myList=[123-MK93-ES6D-36F3, 123-MK93-EFTW-D3LG, 123-MK93-EALK-TGLD, 123-MK93-ELKK-DN6S, 123-MK93-E4D9-3A6T, 123-MK93-EMTW-LNME]
protected int getVoucherNumber(String voucherCode){
    int voucherNumberPos = voucherCode.length() - 12;
    String voucherNumberHex = voucherCode.substring(voucherNumberPos, voucherNumberPos + 6);
    int firstByte = getIntFromHexByte(voucherNumberHex.substring(0, 2), 0);
    int secondByte = getIntFromHexByte(voucherNumberHex.substring(2, 4), 1);
    int thirdByte = getIntFromHexByte(voucherNumberHex.substring(4, 6), 7);
    return firstByte << 16 | secondByte << 8 | thirdByte;
}
private int getIntFromHexByte(String value, int offset){
    return (getIntFromHexNibble(value.charAt(0), offset) << 4) + getIntFromHexNibble(value.charAt(1), offset + 4);
}
private int getIntFromHexNibble(char value, int offset){
    int pos = getAlphabet().indexOf(value);
    if (pos == -1) {// nothing found}
    pos -= offset;
    while (pos < 0) {
        pos += 16;
    }
    return pos % 16;
}
Here is the code which is trying to validate code
int voucherNumber = getVoucherNumber(myList.get(4));
In this case the value of voucherNumber is 4, i.e. it identifies the fourth element from the list; if I pass any value that is not part of the list, the getVoucherNumber method returns a higher value (greater than the list count).
One of the main things confusing me is these 2 lines:
int voucherNumberPos = voucherCode.length() - 12;
String voucherNumberHex = voucherCode.substring(voucherNumberPos, voucherNumberPos + 6);
As per my understanding, they first exclude the customer-supplied first 3 digits from the check, but then they don't use the rest of the string either, only a specific part of it.
Can anyone help me understand this?
It appears you've inherited responsibility for some poorly written code. We've all been there so I'll try to answer in that spirit. I'm not positive this question is on-topic for this site, but it doesn't appear to be forbidden by the help center. In an attempt to stay on-topic I'll end with some general advice not limited to the highly-localized specifics of the question.
myList.get(4)
Arrays in Java are zero-based, so that's 123-MK93-E4D9-3A6T. You probably know that, but it isn't clear from your question that you do.
initial-alphabet : M9W6K3TENDGSFAL4
I assume this is what's returned by the call to getAlphabet in getIntFromHexNibble. So the alphanumeric characters in the code are meant to be hexadecimal but using a nonstandard set of 16 characters for the digits.
protected int getVoucherNumber(String voucherCode){
Ignoring the hyphens and the customer-supplied first three digits, the code is 'MK93E4D93A6T'. Twelve hex digits encode 48 bits, but an int in Java is only 32 bits long, so the code is already broken. Whatever it does, it isn't going to return the voucher number represented by the voucher code.
int voucherNumberPos = voucherCode.length() - 12;
String voucherNumberHex = voucherCode.substring(voucherNumberPos, voucherNumberPos + 6);
This is setting voucherNumberHex to a six-character string, starting twelve characters from the end of voucherCode, in this case 93-E4D. It seems likely the author didn't expect the caller to include the hyphens when this code was first written. Even so, the intent seems to be to ignore half the voucher code.
int firstByte = getIntFromHexByte(voucherNumberHex.substring(0, 2), 0);
int secondByte = getIntFromHexByte(voucherNumberHex.substring(2, 4), 1);
int thirdByte = getIntFromHexByte(voucherNumberHex.substring(4, 6), 7);
This looks straightforward at first, but the parameters 0, 1, and 7 are not offsets at all, despite the name of the argument. It's trying to turn each pair of hex digits into a byte, which would be sensible enough if not for the hyphen character. Now for the fun part:
private int getIntFromHexNibble(char value, int offset) {
    int pos = getAlphabet().indexOf(value);
    if (pos == -1) {// nothing found}
    pos -= offset;
    while (pos < 0) {
        pos += 16;
    }
    return pos % 16;
}
The right curly brace after "found" is commented out, so the code you posted is actually incomplete. I'm going to assume there's another line or two that read
    return pos;
}
So the basic idea is that M becomes 0, 9 becomes 1, and so on via the call to indexOf. But if this method sees a character not in the provided alphabet, like a hyphen, it uses the so-called offset to calculate a default value (in this case 14, if I've done the math in my head right), and returns that as the hex nibble value.
The end result is that you get back a number in the range 0 (inclusive) to 2^24 (exclusive). But of the 2^24 possible values such a number should have, only 2^20 different values will ever be returned. So from a voucher code that looks like twelve digits of base-16, which would have an astronomical number of values, you're limited to slightly over a million different voucher numbers within each customer prefix.
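To make the hyphen case concrete, here is a minimal reconstruction of the fall-through behavior (the constant and method name are mine; the alphabet is the one from the question):
static final String ALPHABET = "M9W6K3TENDGSFAL4";

static int nibbleValue(char value, int offset) {
    int pos = ALPHABET.indexOf(value); // -1 for any character not in the alphabet
    pos -= offset;
    while (pos < 0) {
        pos += 16;                     // wrap back into nibble range
    }
    return pos % 16;
}
// nibbleValue('M', 0) == 0 and nibbleValue('9', 0) == 1: position in the alphabet
// nibbleValue('-', 1) == 14: the "default value" a hyphen produces at offset 1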
General advice:
Use peer reviews to prevent code like this from getting into production.
Use unit tests to prove the code does what the function name says it does.
Use exceptions to fail early if the input isn't what you're expecting.

Stable mapping of an integer to a random number

I need a stable and fast one way mapping function of an integer to a random number.
By "stable" I mean that the same integer should always map to the same random number.
And by "random number" I actually mean "some number which behaves like random".
e.g.
1 -> 329423
2 -> -12398791234
3 -> -984
4 -> 42342435
...
If I had enough memory (and time) I would ideally use:
for( int i=Integer.MIN_VALUE; i<Integer.MAX_VALUE; i++ ){
    map[i]=i;
}
shuffle( map );
I could use some secure hash function like MD5 or SHA, but these are too slow for my purposes and I don't need any crypto/security properties.
I only need this in one way. So I will never have to translate the random number back to its integer.
Background: (For those who want to know more)
I'm planning to use this to invalidate a complete cache over a given amount of time. The invalidation is done "randomly" on access of a cache member, with an increasing chance as time passes. I need this to be stable so that isValid( entry ) does not "flicker", and for consistent testing.
The input to this function will be the Java hash of the key of the entry, which typically is in the range of "1000"-"15000" (but can contain some other stuff, too) and comes in bulks.
The invalidation is done on the condition of:
elapsedTime / timeout * Integer.MAX_VALUE > abs( random( key.hashCode() ) )
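Spelled out as code, a minimal sketch of that check (Entry, stableRandom and the field names are hypothetical; the division is done in floating point so elapsedTime / timeout is not truncated to zero):
boolean isValid(Entry entry, long now, long timeoutMillis) {
    double elapsed = now - entry.createdAtMillis;
    // the invalidation chance grows from 0 toward 1 as time passes
    double threshold = elapsed / timeoutMillis * Integer.MAX_VALUE;
    // cast to long before abs() so Integer.MIN_VALUE cannot stay negative
    return threshold <= Math.abs((long) stableRandom(entry.key.hashCode()));
}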
EDIT: (this is too long for a comment, so I put it here)
I tried gexicide's answer and it turns out this isn't random enough. Here is what I tried:
for( int i=0; i<12000; i++ ){
    int hash = (""+i).hashCode();
    Random rng = new Random( hash );
    int random = rng.nextInt();
    System.out.printf( "%05d, %08x, %08x\n", i, hash, random );
}
The output starts with:
00000, 00000030, bac2c591
00001, 00000031, babce6a4
00002, 00000032, bace836b
00003, 00000033, bac8a47e
00004, 00000034, baab49de
00005, 00000035, baa56af1
00006, 00000036, bab707b7
00007, 00000037, bab128ca
00008, 00000038, ba93ce2a
00009, 00000039, ba8def3d
00010, 0000061f, 98048199
and it goes on in this way.
I could use SecureRandom instead:
for( int i=0; i<12000; i++ ){
    SecureRandom rng = new SecureRandom( (""+i).getBytes() );
    int random = rng.nextInt();
    System.out.printf( "%05d, %08x\n", i, random );
}
which indeed looks pretty random, but this is not stable anymore and is 10 times slower than the method above.
Although you never specified it as a requirement you'll probably want a full 1:1 mapping. This is because the number of possible input values is small. Any output that can occur for more than one input implies another output which can never happen at all. If you have output values which are impossible then you have a skewed distribution.
Of course, if your input is skewed then your output will be skewed anyway, and there's not much you can do about that.
Anyway, this makes it a unique int-to-int hash.
Simply apply a couple of trivial, independent 1:1 mapping functions until things are suitably distributed. You've already isolated one transform from the Random class, but I suggest mixing it with some other transforms like shifts and XORs to avoid individual weaknesses of different algorithms.
For example:
public static int mapInteger( int value ){
    value *= 1664525;
    value += 1013904223;
    value ^= value >>> 12;
    value ^= value << 25;
    value ^= value >>> 27;
    value *= 1103515245;
    value += 12345;
    return value;
}
If that's good enough then you can make it faster by deleting lines at random (I suggest you keep at least one multiply) until it's not good enough anymore, and then add the last deleted line back in.
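A hypothetical smoke test, reusing the consecutive keys from the question's experiment, to check that nearby inputs no longer share their high-order bits:
for (int i = 0; i < 10; i++) {
    int hash = ("" + i).hashCode();
    System.out.printf("%05d, %08x, %08x%n", i, hash, mapInteger(hash));
}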
Use a Random and seed it with your number:
Random generator = new Random(i);
return generator.nextInt();
As your testing exposes, the problem with this method is that such a seed creates a very poor random number in the first iteration. To increase the quality of the result, we need to run the random generator a few times; this will fill up the state of the random generator with pseudo-random values and will increase the quality of the following values.
To make sure that the random generator spreads the values enough, use it a few times before outputting the number. This should make the resulting number more pseudo-random:
Random generator = new Random(i);
for(int n = 0; n < 5; n++) generator.nextInt();
return generator.nextInt();
Try different values, maybe 5 is enough.
The answer of gexicide is the correct (and the simplest) one. Just one note: running this 1,000,000 times on my system takes about 70ms, which is pretty fast, but it involves at least two object creations and feeds the GC. It would be better if this could be done on the stack, without any object creation at all. Looking at the source of the Random class shows that there is some code to make it callable multiple times and to make it thread-safe, which can be removed.
So I ended up with a reimplementation in one method:
public static int mapInteger( int value ){
    // initial scramble
    long seed = (value ^ multiplier) & mask;
    // shuffle three times; this is like calling rng.nextInt() 3 times
    seed = (seed * multiplier + addend) & mask;
    seed = (seed * multiplier + addend) & mask;
    seed = (seed * multiplier + addend) & mask;
    // fit size
    return (int)(seed >>> 16);
}
(multiplier, addend and mask are the constants used internally by Random: 0x5DEECE66DL, 0xBL and (1L << 48) - 1, respectively.)
Running this 1,000,000 times gives the same result but takes only 5ms, and is therefore about 10 times faster.
BTW: this happens to be another piece of code from The Old Man again; see Donald Knuth, The Art of Computer Programming, Volume 2, Section 3.2.1.

Suggestions for compression library to get byte[] as small as possible without considering cpu expense?

Correct me if I'm approaching this wrong, but I have a queue server and a bunch of Java workers that I'm running in a cluster. My queue has work units that are very small, but there are many of them. So far my benchmarks and review of the workers have shown that I get about 200mb/second.
So I'm trying to figure out how to get more work units through my bandwidth. Currently my CPU usage is not very high (40-50%) because it can process the data faster than the network can send it. I want to get more work through the queue and am willing to pay for it via expensive compression/decompression (since half of each core is idle right now).
I have tried Java LZO and gzip, but was wondering if there is anything better (even if it's more CPU expensive)?
Update: the data is a byte[]. Basically the queue only takes it in that format, so I am using ByteArrayOutputStream to write two ints and an int[] into a byte[]. The values in the int[] are all between 0 and 100 (or 1000, but the vast majority of the numbers are zeros). The arrays are quite large, anywhere from 1000 to 10,000 items (and again, the majority are zeros; there are never more than 100 non-zero numbers in the int[]).
It sounds like using a custom compression mechanism that exploits the structure of the data could be very efficient.
Firstly, using a short[] (16 bit data type) instead of an int[] will halve (!) the amount of data sent, you can do this because the numbers are easily between -2^15 (-32768) and 2^15-1 (32767). This is ridiculously easy to implement.
Secondly, you could use a scheme similar to run-length encoding: a positive number represents that number literally, while a negative number represents that many zeros (after taking absolute values). e.g.
[10, 40, 0, 0, 0, 30, 0, 100, 0, 0, 0, 0] <=> [10, 40, -3, 30, -1, 100, -4]
This is harder to implement than just substituting short for int, but will provide ~80% compression in the very worst case (1000 numbers, 100 non-zero, none of which are consecutive).
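Here is a sketch of an encoder for that scheme (names are mine; note that a run longer than 32767 zeros would overflow a short, but the question caps arrays at 10,000 items):
import java.util.ArrayList;
import java.util.List;

static short[] encodeRuns(int[] data) {
    List<Short> out = new ArrayList<>();
    int zeros = 0;
    for (int v : data) {
        if (v == 0) {
            zeros++;                     // accumulate the current run of zeros
        } else {
            if (zeros > 0) {
                out.add((short) -zeros); // negative = a run of that many zeros
                zeros = 0;
            }
            out.add((short) v);          // positive = a literal value
        }
    }
    if (zeros > 0) out.add((short) -zeros);
    short[] result = new short[out.size()];
    for (int i = 0; i < result.length; i++) result[i] = out.get(i);
    return result;
}
// encodeRuns(new int[]{10, 40, 0, 0, 0, 30, 0, 100, 0, 0, 0, 0})
// yields [10, 40, -3, 30, -1, 100, -4], matching the example above.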
I just did some simulations to work out the compression ratios. I tested the method I described above, and the one suggested by Louis Wasserman and sbridges. Both performed very well.
Assuming the length of the array and the number of non-zero numbers are both uniformly distributed between their bounds, both methods save about 5400 ints (or shorts) on average, with a compressed size of about 2.5% of the original! The run-length encoding method seems to save about 1 additional int (an average compressed size that is 0.03% smaller), i.e. basically no difference, so you should use the one that is easiest to implement. Histograms of the compression ratios for 50000 random samples look very similar for the two methods.
Summary: using shorts instead of ints and one of the compression methods, you will be able to compress the data to about 1% of its original size!
For the simulation, I used the following R script:
SIZE <- 50000
lengths <- sample(1000:10000, SIZE, replace=T)
nonzeros <- sample(1:100, SIZE, replace=T)
f.rle <- function(len, nonzero) {
    indexes <- sort(c(0, sample(1:len, nonzero, F)))
    steps <- diff(indexes)
    sum(steps > 1) + nonzero # one short per run of zeros, and one per non-zero
}
f.index <- function(len, nonzero) {
    nonzero * 2
}
# using the [value, -1 * number of zeros,...] method
rle.comprs <- mapply(f.rle, lengths, nonzeros)
print(mean(lengths - rle.comprs)) # average number of shorts saved
rle.ratios <- rle.comprs / lengths * 100
print(mean(rle.ratios)) # average compression ratio
# using the [(index, value),...] method
index.comprs <- mapply(f.index, lengths, nonzeros)
print(mean(lengths - index.comprs)) # average number of shorts saved
index.ratios <- index.comprs / lengths * 100
print(mean(index.ratios)) # average compression ratio
par(mfrow=c(2,1))
hist(rle.ratios, breaks=100, freq=F, xlab="Compression ratio (%)", main="Run length encoding")
hist(index.ratios, breaks=100, freq=F, xlab="Compression ratio (%)", main="Store indices")
Try encoding your data as two varints: the first varint is the index of the number in the sequence, the second is the number itself. For entries which are 0, write nothing.
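For example, a sketch using standard LEB128-style varints (my own code, not tied to any particular library):
import java.io.ByteArrayOutputStream;

static void writeVarint(ByteArrayOutputStream out, int v) {
    while ((v & ~0x7F) != 0) {
        out.write((v & 0x7F) | 0x80); // low 7 bits, continuation flag set
        v >>>= 7;
    }
    out.write(v);                     // final byte, continuation flag clear
}

static byte[] encodeSparse(int[] data) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int i = 0; i < data.length; i++) {
        if (data[i] == 0) continue;   // entries which are 0 write nothing
        writeVarint(out, i);          // first varint: index in the sequence
        writeVarint(out, data[i]);    // second varint: the number itself
    }
    return out.toByteArray();
}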
I wrote an implementation of an RLE algorithm. This operates on a byte array, so could be used as an in-line filter with your existing code. It should safely handle large or negative values should your data change in the future.
It encodes a sequence of zeros as {0}{qty} where {qty} is in the range 1..255. All other bytes are stored as the byte itself. You squish your byte array before sending, and bloat it back to full size when receiving.
public static byte[] squish(byte[] bloated) {
    int size = bloated.length;
    // worst case is alternating zero/non-zero, plus 4 bytes for the length header
    ByteBuffer bb = ByteBuffer.allocate(2 * size + 4);
    bb.putInt(size);
    int zeros = 0;
    for (int i = 0; i < size; i++) {
        if (bloated[i] == 0) {
            if (++zeros == 255) {
                bb.putShort((short) zeros); // writes the pair {0}{255}
                zeros = 0;
            }
        } else {
            if (zeros > 0) {
                bb.putShort((short) zeros); // writes the pair {0}{qty}
                zeros = 0;
            }
            bb.put(bloated[i]);
        }
    }
    if (zeros > 0) {
        bb.putShort((short) zeros);
    }
    size = bb.position();
    byte[] buf = new byte[size];
    bb.rewind();
    bb.get(buf, 0, size);
    return buf;
}
public static byte[] bloat(byte[] squished) {
    ByteBuffer bb = ByteBuffer.wrap(squished);
    byte[] bloated = new byte[bb.getInt()];
    int pos = 0;
    while (bb.position() < bb.capacity()) {
        byte value = bb.get();
        if (value == 0) {
            bb.position(bb.position() - 1); // back up and re-read the pair as a short
            pos += bb.getShort();           // skip that many zeros in the output
        } else {
            bloated[pos++] = value;
        }
    }
    return bloated;
}
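A hypothetical round trip through the two methods:
byte[] original = {5, 0, 0, 0, 7, 0, 0, 9};
byte[] packed = squish(original); // {0,0,0,8, 5, 0,3, 7, 0,2, 9}: header, literal, runs
byte[] restored = bloat(packed);  // byte-for-byte equal to original again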
I've been impressed with BZIP2, compared with 7z and gzip. I haven't personally tried this Java implementation, but it looks like it would be easy to substitute your GZIP call for this one and verify the results.
http://www.kohsuke.org/bzip2
You should probably try all the major ones on your data stream and see which works best. You should also consider that some algorithms will take longer to run, adding more latency to the queue. This may or may not be a problem depending on your application.
You can sometimes get better compression if you know something about the data. (dbaupp's answer covers this approach nicely)
This comparison of compression algorithms might be useful.

Random Number generation Issues

This question was asked in my interview.
random(0,1) is a function that generates the integers 0 and 1 randomly.
Using this function, how would you design a function that takes two integers a, b as input and generates random integers between a and b, inclusive?
I have no idea how to solve this.
We can do this easily with bit logic (e.g. a=4, b=10); a sketch follows after the steps.
1. Calculate the difference b-a (for the given example, 6).
2. Calculate ceil(log2(b-a+1)), i.e. the number of bits required to represent all numbers between a and b.
3. Call random(0,1) once for each bit (for the given example, the results range over 000-111).
4. Repeat step 3 until the number (say num) is between 000 and 110 inclusive; we need only 7 states since b-a+1 is 7, namely a, a+1, a+2, ..., a+6 = b.
5. Return num + a.
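A sketch of those steps in Java (random01() stands in for the given random(0,1)):
static int randomInRange(int a, int b) {
    int range = b - a + 1; // number of values we must be able to produce
    // ceil(log2(range)): bits needed to represent 0 .. range-1
    int bits = 32 - Integer.numberOfLeadingZeros(range - 1);
    int num;
    do {
        num = 0;
        for (int i = 0; i < bits; i++) {
            num = (num << 1) | random01(); // one call of random(0,1) per bit
        }
    } while (num >= range);                // reject anything outside 0 .. range-1
    return a + num;
}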
I hate this kind of interview question, because there are some answers that fulfill it technically, but the interviewer will be pretty mad if you use them. For example:
Call random,
if you obtain 0, output a
if you obtain 1, output b
A more sophisticated answer, and probably what the interviewer wants, is:
init(a,b){
    c = Max(a,b)
    d = log2(c) // so we know how many bits we need to cover both a and b
}
Random(){
    int r = 0;
    for(int i = 0; i < d; i++)
        r = (r<<1) | Random01();
    return r;
}
You can generate random strings of 0 and 1 by successively calling the sub function.
So we have randomBit() returning 0 or 1 independently, uniformly at random and we want a function random(a, b) that returns a value in the range [a,b] uniformly at random. Let's actually make that the range [a, b) because half-open ranges are easier to work with and equivalent. In fact, it is easy to see that we can just consider the case where a == 0 (and b > 0), i.e. we just want to generate a random integer in the range [0, b).
Let's start with the simple answer suggested elsewhere. (Forgive me for using C++ syntax; the concept is the same in Java.)
int random2n(int n) {
    return n ? randomBit() + (random2n(n - 1) << 1) : 0;
}
int random(int b) {
    int n = ceil(log2(b)), v;
    while ((v = random2n(n)) >= b)
        ;
    return v;
}
That is: it is easy to generate a value in the range [0, 2^n) given randomBit(). So to get a value in [0, b), we repeatedly generate something in the range [0, 2^ceil(log2(b))) until we get something in the correct range. It is rather trivial to show that this selects from the range [0, b) uniformly at random.
As stated before, the worst case expected number of calls to randomBit() for this is (1 + 1/2 + 1/4 + ...) * ceil(log2(b)) = 2 ceil(log2(b)). Most of those calls are a waste; we really only need log2(b) bits of entropy, so we should try to get as close to that as possible. Even a clever implementation of this that calculates the high bits early and bails out as soon as it exits the wanted range has the same expected number of calls to randomBit() in the worst case.
We can devise a more efficient (in terms of calls to randomBit()) method quite easily. Let's say we want to generate a number in the range [0, b). With a single call to randomBit(), we should be able to approximately cut our target range in half. In fact, if b is even, we can do that. If b is odd, we will have a (very) small chance that we have to "re-roll". Consider the function:
int random(int b) {
    if (b < 2) return 0;
    int mid = (b + 1) / 2, ret = b;
    while (ret == b) {
        ret = (randomBit() ? mid : 0) + random(mid);
    }
    return ret;
}
This function essentially uses each random bit to select between two halves of the wanted range and then recursively generates a value in that half. While the function is fairly simple, the analysis of it is a bit more complex. By induction one can prove that this generates a value in the range [0, b) uniformly at random. Also, it can be shown that, in the worst case, this is expected to require ceil(log2(b)) + 2 calls to randomBit(). When randomBit() is slow, as may be the case for a true random generator, this is expected to waste only a constant number of calls rather than a linear amount as in the first solution.
function randomBetween(int a, int b){
    int x = b - a; // assuming a is smaller than b
    float rand = random();
    return a + Math.ceil(rand * x);
}
