Stable mapping of an integer to a random number

I need a stable and fast one way mapping function of an integer to a random number.
By "stable" I mean that the same integer should always map to the same random number.
And by "random number" I actually mean "some number which behaves like random".
e.g.
1 -> 329423
2 -> -12398791234
3 -> -984
4 -> 42342435
...
If I had enough memory (and time) I would ideally use:
for( int i=Integer.MIN_VALUE; i<Integer.MAX_VALUE; i++ ){
map[i]=i;
}
shuffle( map );
I could use some secure hash function like MD5 or SHA, but these are too slow for my purposes and I don't need any crypto/security properties.
I only need this in one direction; I will never have to translate the random number back to its integer.
Background: (For those who want to know more)
I'm planning to use this to invalidate a complete cache over a given amount of time. The invalidation is done "randomly" on access of the cache member, with an increasing chance as time passes. I need this to be stable so that isValid( entry ) does not "flicker" and for consistent testing.
The input to this function will be the java hash of the key of the entry which typically is in the range of "1000"-"15000" (but can contain some other stuff, too) and comes in bulks.
The invalidation is done on the condition of:
elapsedTime / timeout * Integer.MAX_VALUE > abs( random( key.hashCode() ) )
EDIT: (this is too long for a comment so I put it here)
I tried gexicide's answer and it turns out this isn't random enough. Here is what I tried:
for( int i=0; i<12000; i++ ){
int hash = (""+i).hashCode();
Random rng = new Random( hash );
int random = rng.nextInt();
System.out.printf( "%05d, %08x, %08x\n", i, hash, random );
}
The output starts with:
00000, 00000030, bac2c591
00001, 00000031, babce6a4
00002, 00000032, bace836b
00003, 00000033, bac8a47e
00004, 00000034, baab49de
00005, 00000035, baa56af1
00006, 00000036, bab707b7
00007, 00000037, bab128ca
00008, 00000038, ba93ce2a
00009, 00000039, ba8def3d
00010, 0000061f, 98048199
and it goes on in this way.
I could use SecureRandom instead:
for( int i=0; i<12000; i++ ){
SecureRandom rng = new SecureRandom( (""+i).getBytes() );
int random = rng.nextInt();
System.out.printf( "%05d, %08x\n", i, random );
}
which indeed looks pretty random, but it is no longer stable and is 10 times slower than the method above.

Although you never specified it as a requirement you'll probably want a full 1:1 mapping. This is because the number of possible input values is small. Any output that can occur for more than one input implies another output which can never happen at all. If you have output values which are impossible then you have a skewed distribution.
Of course, if your input is skewed then your output will be skewed anyway, and there's not much you can do about that.
Anyway; this makes it a unique int to int hash.
Simply apply a couple of trivial, independent 1:1 mapping functions until things are suitably distributed. You've already isolated one transform from the Random class, but I suggest mixing it with some other transforms like shifts and XORs to avoid individual weaknesses of different algorithms.
For example:
public static int mapInteger( int value ){
value *= 1664525;
value += 1013904223;
value ^= value >>> 12;
value ^= value << 25;
value ^= value >>> 27;
value *= 1103515245;
value += 12345;
return value;
}
If that's good enough then you can make it faster by deleting lines at random (I suggest you keep at least one multiply) until it's not good enough anymore, and then add the last deleted line back in.
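To tie this back to the invalidation condition in the question, here is a minimal sketch, not a definitive implementation. The class name StableInvalidation and the helper isExpired are made up for illustration. The sketch assumes elapsed time and timeout are non-negative millisecond counts and that the timeout stays below roughly 2^32 ms, so the multiplication cannot overflow a long.

```java
/** Sketch only: stable, time-dependent invalidation driven by an int-to-int mix. */
final class StableInvalidation {

    // The mixing function from the example above, unchanged.
    static int mapInteger(int value) {
        value *= 1664525;
        value += 1013904223;
        value ^= value >>> 12;
        value ^= value << 25;
        value ^= value >>> 27;
        value *= 1103515245;
        value += 12345;
        return value;
    }

    // Hypothetical helper: true once the entry with this key hash counts as expired.
    static boolean isExpired(int keyHash, long elapsedMillis, long timeoutMillis) {
        if (elapsedMillis >= timeoutMillis) {
            return true; // past the timeout every entry is invalid
        }
        // Clearing the sign bit avoids Math.abs(Integer.MIN_VALUE), which stays negative.
        int score = mapInteger(keyHash) & Integer.MAX_VALUE;
        // Multiply before dividing: elapsed / timeout in integer arithmetic would be 0.
        long threshold = elapsedMillis * Integer.MAX_VALUE / timeoutMillis;
        return threshold > score;
    }
}
```

Because the score depends only on the key hash and the threshold only grows with time, an entry that has expired once stays expired, which is the "no flicker" property asked for.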

Use a Random and seed it with your number:
Random generator = new Random(i);
return generator.nextInt();
As your testing exposes, the problem with this method is that such a seed creates a very poor random number in the first iteration. To increase the quality of the result, we need to run the random generator a few times; this will fill up the state of the random generator with pseudo-random values and will increase the quality of the following values.
To make sure that the random generator spreads the values enough, use it a few times before outputting the number. This should make the resulting number more pseudo-random:
Random generator = new Random(i);
for(int k = 0; k < 5; k++) generator.nextInt(); // discard the first few values; k avoids shadowing the seed i
return generator.nextInt();
Try different values, maybe 5 is enough.

The answer of gexicide is the correct (and the simplest) one. Just one note:
Running this 1,000,000 times on my system takes about 70ms. (Which is pretty fast.)
But it involves at least two object creations and feeds the GC. It would be better
if this could be done on the stack and not using object creation at all.
Looking at the source of the Random class shows that some of its code exists only to make
it callable multiple times and thread-safe, and that code can be removed.
So I ended up with a reimplementation in one method:
public static int mapInteger( int value ){
// initial scramble
long seed = (value ^ multiplier) & mask;
// shuffle three times. This is like calling rng.nextInt() 3 times
seed = (seed * multiplier + addend) & mask;
seed = (seed * multiplier + addend) & mask;
seed = (seed * multiplier + addend) & mask;
// fit size
return (int)(seed >>> 16);
}
(multiplier, addend and mask are the constants used by Random: 0x5DEECE66DL, 0xBL and (1L << 48) - 1)
Running this 1,000,000 times gives the same result but takes only 5ms and is therefore more than 10 times faster.
BTW: This happens to be another piece of code from The Old Man - again. See Donald Knuth,
The Art of Computer Programming, Volume 2, Section 3.2.1
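For reference, a self-contained sketch of the method with the constants filled in, plus a check against java.util.Random. The class name LcgMapping and the helper viaRandom are illustrative names, and the sketch assumes the linear congruential algorithm documented for java.util.Random.

```java
import java.util.Random;

/** Sketch only: allocation-free equivalent of the third nextInt() of new Random(value). */
final class LcgMapping {
    // Constants documented for java.util.Random's linear congruential generator.
    private static final long MULTIPLIER = 0x5DEECE66DL;
    private static final long ADDEND = 0xBL;
    private static final long MASK = (1L << 48) - 1;

    static int mapInteger(int value) {
        // initial scramble, same as the Random(long) constructor
        long seed = (value ^ MULTIPLIER) & MASK;
        // three generator steps, like calling nextInt() three times
        seed = (seed * MULTIPLIER + ADDEND) & MASK;
        seed = (seed * MULTIPLIER + ADDEND) & MASK;
        seed = (seed * MULTIPLIER + ADDEND) & MASK;
        // keep the top 32 of the 48 state bits
        return (int) (seed >>> 16);
    }

    // Reference implementation using the real class, for comparison only.
    static int viaRandom(int value) {
        Random rng = new Random(value);
        rng.nextInt();
        rng.nextInt();
        return rng.nextInt();
    }

    public static void main(String[] args) {
        for (int v : new int[] {0, 1, 2, 1000, 15000, -1, Integer.MIN_VALUE}) {
            System.out.printf("%d -> %08x (Random: %08x)%n", v, mapInteger(v), viaRandom(v));
        }
    }
}
```

Both columns printed by main should agree for every input, since the method performs the same arithmetic as Random without creating any objects.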

Related

Too many hashing function collisions

I'm trying to make a hashing function using the polynomial accumulation method (which is supposed to give you 5 collisions per 55k words or something) but when I run it with 1,000 words, I get ~190 collisions. Am I doing something wrong?
public int hashCode(String str) {
double hash_value = 0; // used for float
for (int i = 0; i < str.length(); i++){
hash_value = 33*hash_value + str.charAt(i);
}
return (int) (hash_value % array_size);
}

Generally, prime numbers are favoured for hash code generation. I suggest trying 109 or 251. 33 is a multiple of 3 which means you are more likely to have issues based on your inputs.
Also you should use an int for the calculations and call Math.abs on the result.

Either your data set is extremely "unlucky", or (which is more probable) the array_size is too small (hash function params are usually quoted without consideration of finite bucket array size).
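A rough way to judge whether the table size alone could explain the numbers is to compute the expected collision count for an ideal hash. The sketch below assumes a perfectly uniform hash and counts a collision whenever a key lands in a bucket that is already occupied. Both are assumptions, since the question does not say how collisions were counted, and the class name CollisionEstimate is made up.

```java
/** Sketch only: expected collisions for n keys hashed uniformly into m buckets. */
final class CollisionEstimate {

    // Expected occupied buckets are m * (1 - (1 - 1/m)^n); every other key collided.
    static double expectedCollisions(int n, int m) {
        double occupied = m * (1.0 - Math.pow(1.0 - 1.0 / m, n));
        return n - occupied;
    }

    public static void main(String[] args) {
        for (int m : new int[] {1_000, 2_500, 10_000, 100_000, 1_000_000}) {
            System.out.printf("buckets=%d expected collisions=%.1f%n",
                    m, expectedCollisions(1_000, m));
        }
    }
}
```

Under these assumptions, 1,000 words in 1,000 buckets already give about 368 expected collisions, so a count in the hundreds is not surprising unless the table is much larger than the word list.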

You are generating a large number which is different for different words in the input. But there is still a chance of collisions, as for example
"Ab" = (33x65)+98 = 2243
"BA" = (33x66)+65 = 2243
If you go for a number greater than 57, there will be less chance of collision. 109 or 251 will be a good choice.
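With the accumulation order used in the question (hash = 33*hash + c), a pair such as "Ab" and "BA" collides for multiplier 33 but not for 109. A small sketch to check this, using int arithmetic as suggested in another answer; the class name PolyHash is illustrative, and Math.floorMod is used in place of Math.abs because Math.abs(Integer.MIN_VALUE) is still negative.

```java
/** Sketch only: polynomial accumulation hash in int arithmetic. */
final class PolyHash {

    // Overflow simply wraps around, which is fine for hashing.
    static int hash(String s, int multiplier) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = multiplier * h + s.charAt(i);
        }
        return h;
    }

    // Bucket index in [0, arraySize); floorMod also handles negative hashes.
    static int bucket(String s, int multiplier, int arraySize) {
        return Math.floorMod(hash(s, multiplier), arraySize);
    }

    public static void main(String[] args) {
        System.out.println(hash("Ab", 33) + " " + hash("BA", 33));   // 2243 2243
        System.out.println(hash("Ab", 109) + " " + hash("BA", 109)); // 7183 7259
    }
}
```

The two letters differ by 33 in their character codes ('b' is 98, 'A' is 65), which is exactly what a multiplier of 33 can cancel out; a multiplier larger than the spread of the input characters cannot be cancelled this way.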

Random Class with seed

long seed = 0;
Random rand = new Random(seed);
int rand100 = 0;
for(int i = 0; i < 100; i++)
rand100 = rand.nextInt();
System.out.println(rand100);
I wrote this code to get 100th random integer value of given seed. I want to know if there is a way to get 100th random integer value of given seed without calling nextInt() 100 times.

I want to know if there is a way to get 100-th random integer value of given seed without calling nextInt() 100 times.
No, Random offers no method to directly get the 100-th random number of the sequence without first generating the other 99 values. That's because of how the generation works in Java: each value depends on the previous internal state.
If you want to go into details, take a look at the source code. The internal seed changes with every call of the next method, using the previous seed:
nextseed = (oldseed * multiplier + addend) & mask;
So in order to get the seed for the 100-th value, you need to know the seed for the 99-th value, which needs the seed for the 98-th value and so on.
However, you can easily get the 100-th value with a more compact statement like
long seed = ...
int amount = 100;
Random rnd = new Random(seed);
// Generate sequence of 100 random values, discard 99 and get the last
int val = rnd.ints(amount).skip(amount - 1).findFirst().orElse(-1);
Keep in mind that this still computes all previous values, as explained. It just discards them.
After you have computed that value for the first time, you could just hardcode it into your program. Let's suppose you have tested it and it yields 123. Then, if the seed does not change, the value will always be 123. So you could just do
int val = 123;
The sequence remains the same across multiple instances of the JVM, so the value will always be valid for this seed. It should also hold across Java versions: the documentation of Random specifies the exact algorithm so that the same seed produces the same sequence on every implementation.
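Strictly speaking, the dependence on the previous seed does not force 99 intermediate calls. The update is an affine map modulo 2^48, and composing affine maps gives another affine map, so n steps can be folded into a single multiply-add with O(log n) work. The following is a sketch of that idea, not something Random offers as an API: the class RandomJump and the method nthInt are hypothetical names, and the constants are the ones documented for java.util.Random.

```java
import java.util.Random;

/** Sketch only: n-th nextInt() of new Random(seed) by jumping ahead in the LCG. */
final class RandomJump {
    private static final long MULTIPLIER = 0x5DEECE66DL;
    private static final long ADDEND = 0xBL;
    private static final long MASK = (1L << 48) - 1;

    /** Value that the n-th call (n >= 1) of new Random(seed).nextInt() returns. */
    static int nthInt(long seed, long n) {
        long accMul = 1, accAdd = 0;                 // identity map x -> x
        long curMul = MULTIPLIER, curAdd = ADDEND;   // one step, then 2, 4, 8, ... steps
        for (long k = n; k > 0; k >>= 1) {
            if ((k & 1) == 1) {
                // apply the current power-of-two jump on top of the accumulated map
                accAdd = (accAdd * curMul + curAdd) & MASK;
                accMul = (accMul * curMul) & MASK;
            }
            // square the current map: x -> curMul^2 * x + (curMul + 1) * curAdd
            curAdd = ((curMul + 1) * curAdd) & MASK;
            curMul = (curMul * curMul) & MASK;
        }
        long state = (seed ^ MULTIPLIER) & MASK;     // same scramble as the constructor
        state = (accMul * state + accAdd) & MASK;    // all n steps at once
        return (int) (state >>> 16);
    }

    public static void main(String[] args) {
        Random rnd = new Random(0);
        int viaLoop = 0;
        for (int i = 0; i < 100; i++) viaLoop = rnd.nextInt();
        System.out.println(viaLoop + " " + nthInt(0, 100)); // both numbers should match
    }
}
```

For n = 100 the loop is perfectly adequate; the jump only pays off for very large n.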

Yes. As long as the seed is constant, then the result of executing this 100 times will yield the same result every time. As such, you can just do
int rand100 = -1331702554;

If I understood you correctly, you are looking for some seeded method like
int[] giveMeInts(int amount, long seed);
There exists something very similar, the Stream methods of Random (documentation):
long seed = ...
int amount = 100;
Random rnd = new Random(seed);
IntStream values = rnd.ints(amount);
You could collect the stream values in collections like List<Integer> or an int[] array:
List<Integer> values = rnd.ints(amount).boxed().collect(Collectors.toList());
int[] values = rnd.ints(amount).toArray();
The methods will use the seed of the Random object, so if fed with the same seed they will always produce the same sequence of values.
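Put together, the giveMeInts method could look roughly like the sketch below. SeededInts and giveMeIntList are names invented for this example; everything else is the standard Random and stream API.

```java
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

/** Sketch only: seeded bulk generation through the stream methods of Random. */
final class SeededInts {

    // Same seed and amount always give the same array.
    static int[] giveMeInts(int amount, long seed) {
        return new Random(seed).ints(amount).toArray();
    }

    // IntStream has to be boxed before it can be collected into a List<Integer>.
    static List<Integer> giveMeIntList(int amount, long seed) {
        return new Random(seed).ints(amount).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        int[] values = giveMeInts(100, 0L);
        System.out.println(values[99]); // the 100th value for seed 0
    }
}
```

The stream values are generated as if by calling nextInt(), so the last element of the array is the same number the loop in the question prints.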

How to generate quasi random numbers that don't immediately repeat (and more)?

I want to generate an endless series of quasi random numbers to the following specification:-
Source of numbers is uniformly distributed and random, ranging 0 through 255 inclusive. It's an existing hardware device.
Required output range is 1 through 8 inclusive.
Two consecutive output numbers are never the same. For example 5 will never follow 5, but you can have 5,2,5.
Exactly one output number is required for every single source number. Rejection sampling therefore cannot be used. And while() loops, shuffles etc. can't be used.
It's this last stipulation that's vexing me. The source generator can only supply random bytes at a constant 1 /s and I want output at a constant 1 /s. Typically you'd simply reject a generated number if it was equal to the previous one, and generate another. In my case you only get one shot at each output. I think that it's some sort of random selection process, but this requirement has me going around in circles as I'm a bad programmer. An algorithm, flowchart or picture will do, but I'll be implementing in Java.
Apologies for the semi generic title, but I couldn't really think of anything more accurate yet concise.

If I understand the problem correctly, the first random number will be chosen randomly from among 8 different numbers (1 to 8), while every successive random number will be chosen from 7 different possibilities (1 to 8 excluding the previous one). Thus, your range of 256 values will need to be divided into 7 possibilities. It won't come out even, but that's the best you can do. So you need something like
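public class RandomClass {
private final HardwareSource source;
private boolean first;
private int prev;
public RandomClass(HardwareSource source) {
this.source = source;
first = true;
}
public int nextRandom() {
int sourceValue = source.read();
int value;
if (first) {
// first call: all 8 outputs are allowed
value = sourceValue % 8 + 1;
} else {
// later calls: pick one of 7 values, then skip over the previous output
value = sourceValue % 7 + 1;
if (value >= prev) {
value++;
}
}
prev = value;
first = false;
return value;
}
}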
Suppose the first call generates 5. The second time you call it, value is first computed to be a number from 1 to 7; by incrementing it if the value is >= 5, the range of possible outputs becomes 1, 2, 3, 4, 6, 7, 8. The output will be almost evenly distributed between those seven values. Since 256 is not divisible by 7, the distribution isn't quite even, and there will be a slight bias toward the lower numbers. You could fix it so that the bias will shift on each call and even out over the entire sequence; I believe one way is
value = (sourceValue + countGenerated) % 7 + 1;
where you keep track of how many numbers you've generated.
I think this is better than solutions that take the input modulo 8 and add 1 if the number equals the previous one. Those solutions will generate prev + 1 with twice the probability of generating other numbers, so it's more skewed than necessary.
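The size of the bias can be checked by running all 256 source bytes through the mapping. The sketch below restates the mapping as a pure function for that purpose; the class NoRepeatMapping, the method names and the convention that prev == 0 means "no previous output" are assumptions made for this example only.

```java
/** Sketch only: the modulo-7 mapping as a pure function, plus an exhaustive histogram. */
final class NoRepeatMapping {

    // prev == 0 means there is no previous output yet.
    static int next(int prev, int sourceByte) {
        if (prev == 0) {
            return sourceByte % 8 + 1;
        }
        int value = sourceByte % 7 + 1;
        return value >= prev ? value + 1 : value;
    }

    // counts[v] = how many of the 256 source bytes map to output v for this prev.
    static int[] histogram(int prev) {
        int[] counts = new int[9];
        for (int b = 0; b < 256; b++) {
            counts[next(prev, b)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints [0, 37, 37, 37, 37, 0, 36, 36, 36]
        System.out.println(java.util.Arrays.toString(histogram(5)));
    }
}
```

Since 256 = 7 * 36 + 4, four of the seven allowed outputs are hit by 37 source bytes and three by 36, and the previous output is never produced.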

int sum=0;
int prev=-1;
int next(int input){
sum=(sum+input)%8;
if(sum==prev)sum=(sum+1)%8;
prev=sum;
return sum+1;
}
(As I read the question, even with the new bold emphasis, it is not required to always generate the same output value for the same input value; that would make the task impossible to solve.)

Bit-wise efficient uniform random number generation

I recall reading about a method for efficiently using random bits in an article on a math-oriented website, but I can't seem to get the right keywords in Google to find it anymore, and it's not in my browser history.
The gist of the problem that was being asked was to take a sequence of random numbers in the domain [domainStart, domainEnd) and efficiently use the bits of the random number sequence to project uniformly into the range [rangeStart, rangeEnd). Both the domain and the range are integers (more correctly, longs and not Z). What's an algorithm to do this?
Implementation-wise, I have a function with this signature:
long doRead(InputStream in, long rangeStart, long rangeEnd);
in is based on a CSPRNG (fed by a hardware RNG, conditioned through SecureRandom) that I am required to use; the return value must be between rangeStart and rangeEnd, but the obvious implementation of this is wasteful:
long doRead(InputStream in, long rangeStart, long rangeEnd) throws IOException {
long retVal = 0;
long range = rangeEnd - rangeStart;
// Fill until we get to range
for (int i = 0; (1L << (8 * i)) < range; i++) {
// named next so it does not shadow the InputStream parameter in
long next;
do {
next = in.read();
// but be sure we don't exceed range
} while (retVal + (next << (8 * i)) >= range);
retVal += next << (8 * i);
}
return retVal + rangeStart;
}
I believe this is effectively the same idea as (rand() * (max - min)) + min, only we're discarding bits that push us over max. Rather than use a modulo operator which may incorrectly bias the results to the lower values, we discard those bits and try again. Since hitting the CSPRNG may trigger re-seeding (which can block the InputStream), I'd like to avoid wasting random bits. Henry points out that this code biases against 0 and 256 (for rangeEnd = 257); Banthar demonstrates it in an example.
First edit: Henry reminded me that summation invokes the Central Limit Theorem. I've fixed the code above to get around that problem.
Second edit: Mechanical snail suggested that I look at the source for Random.nextInt(). After reading it for a while, I realized that this problem is similar to the base conversion problem. See answer below.

Your algorithm produces biased results. Let's assume rangeStart=0 and rangeEnd=257. If the first byte is greater than 0, that will be the result. If it's 0, the result will be either 0 or 256 with 50/50 probability. So 0 and 256 are half as likely to be chosen as any other number.
I did a simple test to confirm this:
p(0)=0.001945
p(1)=0.003827
p(2)=0.003818
...
p(254)=0.003941
p(255)=0.003817
p(256)=0.001955
I think you need to do the same as java.util.Random.nextInt and discard the whole number, instead of just the last byte.
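A minimal sketch of what discarding the whole number could look like for the doRead signature from the question. The class name UniformReader, the 2^48 limit on the range and the EOFException on an exhausted stream are choices made for this example, not part of the original code.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Sketch only: uniform value in [rangeStart, rangeEnd) by rejecting whole candidates. */
final class UniformReader {

    static long doRead(InputStream in, long rangeStart, long rangeEnd) throws IOException {
        long range = rangeEnd - rangeStart;
        if (range <= 0 || range > (1L << 48)) {
            throw new IllegalArgumentException("range must be in [1, 2^48]");
        }
        // Smallest number of bytes whose value space covers the range.
        int numBytes = 0;
        long space = 1;
        while (space < range) {
            space <<= 8;
            numBytes++;
        }
        // Largest multiple of range that fits into the space; anything above is rejected.
        long limit = space - space % range;
        while (true) {
            long candidate = 0;
            for (int i = 0; i < numBytes; i++) {
                int b = in.read();
                if (b < 0) {
                    throw new EOFException("random source exhausted");
                }
                candidate = (candidate << 8) | b;
            }
            if (candidate < limit) {
                return rangeStart + candidate % range; // each result has equally many candidates
            }
            // otherwise throw away all bytes of this candidate and try again
        }
    }
}
```

For rangeEnd = 257 this reads two bytes per attempt and rejects only the single candidate 65535, so every result from 0 to 256 is backed by exactly 255 candidates.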

After reading the source to Random.nextInt(), I realized that this problem is similar to the base conversion problem.
Rather than converting a single symbol at a time, it would be more effective to convert blocks of input symbols at a time through an accumulator "buffer" which is large enough to represent at least one symbol in the domain and in the range. The new code looks like this:
public int[] fromStream(InputStream input, int length, int rangeLow, int rangeHigh) throws IOException {
int[] outputBuffer = new int[length];
// buffer is initially 0, so there is only 1 possible state it can be in
int numStates = 1;
long buffer = 0;
int alphaLength = rangeHigh - rangeLow; // size of the output alphabet, must be positive
// Fill outputBuffer from 0 to length
for (int i = 0; i < length; i++) {
// Until buffer has sufficient data filled in from input to emit one symbol in the output alphabet, fill buffer.
fill:
while(numStates < alphaLength) {
// Shift buffer by 8 (*256) to mix in new data (of 8 bits)
buffer = buffer << 8 | input.read();
// Multiply by 256, as that's the number of states that we have possibly introduced
numStates = numStates << 8;
}
// spits out least significant symbol in alphaLength
outputBuffer[i] = (int) (rangeLow + (buffer % alphaLength));
// We have consumed the least significant portion of the input.
buffer = buffer / alphaLength;
// Track the number of states we've introduced into buffer
numStates = numStates / alphaLength;
}
return outputBuffer;
}
There is a fundamental difference between converting numbers between bases and this problem, however; in order to convert between bases, I think one needs to have enough information about the number to perform the calculation - successive divisions by the target base result in remainders which are used to construct the digits in the target alphabet. In this problem, I don't really need to know all that information, as long as I'm not biasing the data, which means I can do what I did in the loop labeled "fill."

Random Number generation Issues

This question was asked in my interview.
random(0,1) is a function that generates integers 0 and 1 randomly.
Using this function, how would you design a function that takes two integers a,b as input and generates random integers between a and b, including a and b?
I have no idea how to solve this.

We can do this easily with bit logic (e.g. a=4, b=10):
1. Calculate the difference b-a (6 for the given example).
2. Calculate ceil(log2(b-a+1)), i.e. the number of bits required to represent all numbers between a and b.
3. Call random(0,1) once for each bit (for the given example the result will be between 000 and 111).
4. Repeat step 3 until the number (say num) is between 000 and 110 (inclusive), i.e. we need only 7 values since b-a+1 is 7. So there are 7 possible states a, a+1, a+2, ..., a+6, which is b.
5. Return num + a.
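The steps above can be sketched in Java as follows. BitRange and the IntSupplier parameter standing in for random(0,1) are assumptions for this example; the sketch also assumes a <= b and that b - a + 1 does not overflow an int.

```java
import java.util.function.IntSupplier;

/** Sketch only: uniform integer in [a, b] built from single random bits. */
final class BitRange {

    // randomBit plays the role of random(0,1) and must return 0 or 1.
    static int randomBetween(int a, int b, IntSupplier randomBit) {
        int count = b - a + 1; // number of possible results
        // ceil(log2(count)): bits needed to represent 0 .. count - 1
        int bits = 32 - Integer.numberOfLeadingZeros(count - 1);
        while (true) {
            int num = 0;
            for (int i = 0; i < bits; i++) {
                num = (num << 1) | randomBit.getAsInt();
            }
            if (num < count) {
                return a + num; // accepted
            }
            // num is outside 0 .. count - 1: discard it and draw all bits again
        }
    }
}
```

With a=4 and b=10 the bits 1,1,1 form 7, which is rejected, and the following bits 0,1,1 form 3, so the call returns 7.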

I hate this kind of interview question because there are some answers that fulfil it, but the interviewer will be pretty mad if you use them. For example,
Call random,
if you obtain 0, output a
if you obtain 1, output b
A more sophisticated answer, and probably what the interviewer wants, is
init(a,b){
c = Max(a,b)
d = log2(c) //so we know how many bits we need to cover both a and b
}
Random(){
int r = 0;
for(int i = 0; i< d; i++)
r = (r<<1)| Random01();
return r;
}
You can generate random strings of 0 and 1 by successively calling the sub function.

So we have randomBit() returning 0 or 1 independently, uniformly at random and we want a function random(a, b) that returns a value in the range [a,b] uniformly at random. Let's actually make that the range [a, b) because half-open ranges are easier to work with and equivalent. In fact, it is easy to see that we can just consider the case where a == 0 (and b > 0), i.e. we just want to generate a random integer in the range [0, b).
Let's start with the simple answer suggested elsewhere. (Forgive me for using c++ syntax, the concept is the same in Java)
int random2n(int n) {
return n ? randomBit() + (random2n(n - 1) << 1) : 0;
}
int random(int b) {
int n = ceil(log2(b)), v;
while ((v = random2n(n)) >= b);
return v;
}
That is, it is easy to generate a value in the range [0, 2^n) given randomBit(). So to get a value in [0, b), we repeatedly generate something in the range [0, 2^ceil(log2(b))) until we get something in the correct range. It is rather trivial to show that this selects from the range [0, b) uniformly at random.
As stated before, the worst case expected number of calls to randomBit() for this is (1 + 1/2 + 1/4 + ...) ceil(log2(b)) = 2 ceil(log2(b)). Most of those calls are a waste; we really only need log2(b) bits of entropy and so we should try to get as close to that as possible. Even a clever implementation of this that calculates the high bits early and bails out as soon as it exits the wanted range has the same expected number of calls to randomBit() in the worst case.
We can devise a more efficient (in terms of calls to randomBit()) method quite easily. Let's say we want to generate a number in the range [0, b). With a single call to randomBit(), we should be able to approximately cut our target range in half. In fact, if b is even, we can do that. If b is odd, we will have a (very) small chance that we have to "re-roll". Consider the function:
int random(int b) {
if (b < 2) return 0;
int mid = (b + 1) / 2, ret = b;
while (ret == b) {
ret = (randomBit() ? mid : 0) + random(mid);
}
return ret;
}
This function essentially uses each random bit to select between two halves of the wanted range and then recursively generates a value in that half. While the function is fairly simple, the analysis of it is a bit more complex. By induction one can prove that this generates a value in the range [0, b) uniformly at random. Also, it can be shown that, in the worst case, this is expected to require ceil(log2(b)) + 2 calls to randomBit(). When randomBit() is slow, as may be the case for a true random generator, this is expected to waste only a constant number of calls rather than a linear amount as in the first solution.
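Since the question is about Java, here is a direct port of the halving function as a sketch. HalvingRandom and the BooleanSupplier standing in for randomBit() are assumptions for this example, and b is assumed small enough that (b + 1) / 2 cannot overflow.

```java
import java.util.function.BooleanSupplier;

/** Sketch only: Java port of the halving approach, uniform in [0, b). */
final class HalvingRandom {

    static int random(int b, BooleanSupplier randomBit) {
        if (b < 2) {
            return 0; // only one possible value
        }
        int mid = (b + 1) / 2; // size of each half, rounded up
        int ret = b;
        // For odd b the two halves cover [0, b + 1); the extra value b is re-rolled.
        while (ret == b) {
            ret = (randomBit.getAsBoolean() ? mid : 0) + random(mid, randomBit);
        }
        return ret;
    }
}
```

A call such as HalvingRandom.random(6, new java.util.Random()::nextBoolean) would then simulate a die roll minus one; any other source of independent fair bits can be plugged in the same way.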

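static int randomBetween(int a, int b) {
int x = b - a; // assuming a is smaller than b
double rand = Math.random(); // uniform in [0, 1)
// floor over x + 1 buckets gives every value in [a, b] the same chance;
// ceil(rand * x) would return a only when rand is exactly 0
return a + (int) Math.floor(rand * (x + 1));
}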
