Background
I've been reading and trying to wrap my head around various questions/answers that relate to finding the seed of java.util.Random given its output from nextInt().
The implementation of nextInt(int bound) is:
public int nextInt(int bound) {
    if (bound <= 0)
        throw new IllegalArgumentException("bound must be positive");

    if ((bound & -bound) == bound)  // i.e., bound is a power of 2
        return (int)((bound * (long)next(31)) >> 31);

    int bits, val;
    do {
        bits = next(31);
        val = bits % bound;
    } while (bits - val + (bound-1) < 0);
    return val;
}
The implementation of next(int bits) is:
protected int next(int bits) {
    long oldseed, nextseed;
    AtomicLong seed = this.seed;
    do {
        oldseed = seed.get();
        nextseed = (oldseed * multiplier + addend) & mask;
    } while (!seed.compareAndSet(oldseed, nextseed));
    return (int)(nextseed >>> (48 - bits));
}
where the multiplier is 0x5DEECE66DL, the addend is 0xBL, and the mask is (1L << 48) - 1. These are hexadecimal literals; the trailing L is Java's suffix for a long constant.
By calling nextInt() without a bound, the full 32 bits are returned from next(32) instead of dropping bits with bits % bound.
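As a quick check of the machinery above, the raw recurrence reproduces Random's output directly. This is a minimal sketch relying on an OpenJDK implementation detail: the constructor scrambles the supplied seed as (seed ^ multiplier) & mask:

long multiplier = 0x5DEECE66DL, addend = 0xBL, mask = (1L << 48) - 1;
long seed = (42L ^ multiplier) & mask;          // internal state of new Random(42)
seed = (seed * multiplier + addend) & mask;     // one call to next(32)
int predicted = (int) (seed >>> 16);            // next(32) keeps the top 32 bits
System.out.println(predicted == new java.util.Random(42).nextInt()); // true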
Questions
Without completely bruteforcing the full 2^48 possibilities, how would I go about finding the current seed after x calls to nextInt(n) (assuming the bound is never a power of 2)? For example, let's assume I want to find the seed given 10 calls to nextInt(344): [251, 331, 306, 322, 333, 283, 187, 54, 170, 331].
How can I determine how much data I'd need to find the correct seed, not just another seed that produces the same initial output?
Does this change given bounds that are either odd/even?
Without completely bruteforcing the full 2^48 possibilities, how would I go about finding the current seed after x calls to nextInt(n) (assuming the bound is never a power of 2)?
Let's first remove the code that handles multi-threading, error checking, and the power-of-two bound. Things boil down to:
public int nextInt_equivalent(int bound) {
    int bits, val;
    do {
        seed = (seed * multiplier + addend) & mask; // 48-bit LCG
        bits = (int)(seed >>> 17);                  // keep the top 31 bits
        val = bits % bound;                         // build val in [0, bound)
    } while (bits - val + (bound-1) < 0);
    return val;
}
Next we must understand what the while (bits - val + (bound-1) < 0) is about. It's there to use bits only when it falls in an interval whose width is a multiple of bound, thus ensuring a uniform distribution of val. That interval is [0, (1L<<31)/bound*bound).
The while condition is equivalent to while (bits >= (1L<<31)/bound*bound), but executes faster. The condition triggers for the (1L<<31) % bound highest values of bits out of 1L<<31. When bound is 344, that is 8 values of bits out of 2^31, or about 3.7 per billion.
This is so rare that one reasonable approach is to assume it does not occur. Another is to hope that it occurs, and test whether (and when) it does by checking if the seeds that cause that rare event lead to a sequence of val matching the givens. There are only ((1L<<31) % bound) << 17 (here slightly above a million) values of seed to test, which is quite feasible. Either way, in the rest I assume this is ruled out, and consider the generator without the while.
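As a quick sanity check of those numbers for bound = 344:

System.out.println((1L << 31) % 344);           // 8: values of bits that trigger the while
System.out.println(((1L << 31) % 344) << 17);   // 1048576: seeds to test, slightly above a million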
When bound is even, or more generally a multiple of 2^S for some S > 0, observe that the low-order S bits of the output val (which we can find) are also the low-order S bits of bits, and thus the bits of rank [17, 17+S) of seed. Also, the low 17+S bits of seed evolve fully independently of the 31-S other ones. When bound is 344 = 8×43, we have S = 3, and thus we can attack the 17+S = 20 low-order bits of seed independently. We directly get S = 3 bits of seed from the first val.
We get the 17 low-order bits of seed by elimination: for each of the 2^17 candidates, and given the S = 3 bits we know, do the 17+S = 20 bits of seed lead to a sequence of val whose low-order S bits match the given sequence? With enough values we can make a full determination of the 17+S bits. We need ⌈17/S⌉+1 = 7 values of val to narrow down to a single value for the 17+S low-order bits of seed in this way. If we have fewer, we need to keep several candidates in the next step. In the question we have ample val to narrow down to a single value, and be confident we got it right.
And then, when we have these 17+S = 20 bits of seed, we can find the remaining 31-S = 28 with a moderate amount of brute force. We could test 2^28 values of the yet-unknown bits of seed and check which gives a full match with the known val. But better: we know seed % (bound<<17) exactly, and thus need only test 2^31/bound values of seed (here roughly 6 million).
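Putting the two phases together, here is a rough, unoptimized sketch of the attack in Java. It assumes the while rejection never triggers (as argued above). recover returns the 48-bit state right after the first observed output; main is a hypothetical self-test that collects ten outputs of nextInt(344), recovers the state, and checks that a cloned Random predicts the next output (the clone undoes the constructor's seed ^ multiplier scrambling, an OpenJDK implementation detail):

import java.util.Random;

public class RecoverSeed {
    static final long MULT = 0x5DEECE66DL, ADD = 0xBL, MASK = (1L << 48) - 1;

    // Returns the 48-bit LCG state right after the first output in vals, for a
    // bound divisible by 2^S (here 344 = 8 * 43, so S = 3); -1 if nothing matches.
    static long recover(int[] vals, int bound) {
        int S = Integer.numberOfTrailingZeros(bound);
        long lowMask = (1L << (17 + S)) - 1;               // low 17+S bits evolve independently
        int sMask = (1 << S) - 1;

        // Phase 1: eliminate candidates for the low 17+S bits of the initial state.
        long low = -1;
        for (long cand = 0; cand <= lowMask; cand++) {
            long s = cand;
            boolean ok = true;
            for (int v : vals) {
                s = (s * MULT + ADD) & lowMask;            // LCG reduced mod 2^(17+S)
                if ((int) (s >>> 17) != (v & sMask)) { ok = false; break; }
            }
            if (ok) { low = cand; break; }                 // with 10 outputs, essentially unique
        }
        long low17 = ((low * MULT + ADD) & lowMask) & ((1L << 17) - 1);

        // Phase 2: the state after the first output is vals[0]*2^17 + low17 modulo
        // bound<<17; brute-force the remaining ~2^31/bound possibilities.
        long modulus = (long) bound << 17;
        for (long s0 = ((long) vals[0] << 17) + low17; s0 <= MASK; s0 += modulus) {
            long s = s0;
            boolean ok = true;
            for (int i = 1; i < vals.length; i++) {
                s = (s * MULT + ADD) & MASK;
                if ((int) ((s >>> 17) % bound) != vals[i]) { ok = false; break; }
            }
            if (ok) return s0;
        }
        return -1;                                         // the while probably did trigger
    }

    public static void main(String[] args) {
        int bound = 344;
        Random r = new Random();                           // unknown seed
        int[] vals = new int[10];
        for (int i = 0; i < vals.length; i++) vals[i] = r.nextInt(bound);

        long state = recover(vals, bound);
        Random clone = new Random(state ^ MULT);           // undo the constructor's scrambling
        for (int i = 1; i < vals.length; i++) clone.nextInt(bound); // replay vals[1..9]
        System.out.println(clone.nextInt(bound) == r.nextInt(bound)); // true: next value predicted
    }
}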
How can I determine the amount of data I'd need to find the correct seed?
A working heuristic for all but pathological LCGs, and many other PRNGs, is that you need as much information as there are bits in the state, thus 48 bits. Each output gives you log2(bound) bits, so you need ⌈48/log2(bound)⌉ values, here 6 (which would require keeping track of a few candidates for the low 20 bits of seed, and thus correspondingly more work in the second phase). Extra values give confidence that the actual state was recovered, but AFAIK wrong guesses simply will not happen unless the while comes into play.
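For the numbers in the question, that heuristic works out to (a one-line check):

// ceil(48 / log2(344)) = ceil(48 / 8.43...) = 6 outputs needed in principle
System.out.println((int) Math.ceil(48 / (Math.log(344) / Math.log(2))));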
Does this change given bounds that are either odd/even?
The above attack strategy does not work well for an odd bound (we can't separately guess the low-order bits and would need to search 2^48/bound values of seed). However, there are better attacks with less guesswork, applicable even if we raise the number of state bits considerably, including for odd bound. They are more difficult to explain (read: I can barely get them working with a math package, and can't explain how yet; see this question).
Related
My use case is this,
I wish to reduce an extremely long number like 97173329791011L to a smaller integer by shifting down, and be able to get back the long number 97173329791011L from the smaller integer by shifting up. I have implemented a function called reduceLong to do this, as shown below:
private int reduceLong(long reduceable) {
    return (int) (reduceable >> 32);
}
However, I feel the function I have is somehow wrong, as the result produced is incorrect. Here is my console output when trying to reduce 97173329791011L to a smaller integer:
Trying to reduce 97173329791011L
Result= 0
Your help would be greatly appreciated. Thanks a lot.
The int datatype can hold all integral values in the range [-2^31, +2^31-1], inclusive. That is, in decimal, [-2147483648, 2147483647]. The total range covers 2^32 different numbers, which makes sense because ints are 32 bits in memory. Just like you can't store an elephant in a matchbox, you can't store an infinite amount of data in 32 bits worth of data. You can store at most... 32 bits worth of data.
3706111600L is a long; it is (slightly) outside the range of int. In binary, it is:
11011100111001101100011001110000
How do you propose to store the 64 bits of a long in a mere 32 bits? There is no general strategy here, and it is mathematically impossible: you can store exactly 2^64 different numbers in a long, which is more unique values than 2^32 can offer, so whatever 'compression' algorithm you suggest cannot work, except for at most 2^32 unique long values, which is only a very small fraction of them.
Separate from that, running your snippet: you compute 11011100111001101100011001110000 >> 32, which gets rid of all the bits (there are exactly 32 bits there), hence why you get 0.
Perhaps you want this 'compression' algorithm: the 2^32 longs we decree as representable in this scheme are all the longs from 0 to 2^31-1, mapped to the int with the same value, plus the batch of 2^31 longs that immediately follow, mapped bitwise (given that in Java all numbers are signed, these map to negative ints). All other long values (so everything above 2^32-1 and all negative longs) cannot be mapped - or, if you try, they'd unmap to the wrong value.
If you want that, all you need to do is:
int longToInt = (int) theLong;                  // keep the low 32 bits
long backToLong = 0xFFFFFFFFL & longToInt;      // undo the sign extension
Normally, if you cast an int to a long it 'sign extends', filling the top 32 bits with 1s to reflect that your int is negative. The bitwise & clears the top 32 bits back down to 0 and you're back to your original... IF the original long had 32 zero bits at the top (which 3706111600L does).
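A minimal round-trip demo of the scheme (reusing the 3706111600L example; it only works because the top 32 bits of that long are zero):

long theLong = 3706111600L;                     // fits in 32 bits, but exceeds Integer.MAX_VALUE
int longToInt = (int) theLong;                  // -588855696: the same low 32 bits, read as signed
long backToLong = 0xFFFFFFFFL & longToInt;      // mask off the sign extension
System.out.println(backToLong == theLong);      // true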
Your test number is too small. Converted to hexadecimal, 3706111600L is 0x00000000DCE6C670.
If you shift this number 32 bits to the right, you lose the last 8 nibbles; the resulting number is 0x00000000L. Cast to int, this value is still 0.
I was recently looking into some problems with bit manipulation in Java and I came up with two questions.
1) First, I came across the problem of flipping all the bits in a number.
I found this solution:
public class Solution {
    public int flipAllBits(int num) {
        int mask = (1 << (int)Math.floor(Math.log(num)/Math.log(2))+1) - 1;
        return num ^ mask;
    }
}
But what happens when k = 32 bits? Can the 1 be shifted 33 times?
What I understand from the code (although it doesn't really make sense) is that the mask is 0111...1 (31 ones) and not 32 ones, as one might expect. Therefore, when num is a really large number this would fail.
2) Another question I had was determining when something is a bit sequence in two's complement or just a plain bit sequence. For example, I read that 1010 when flipped is 0110, which is -10 but also 6. Which one is it and how do we know?
Thanks.
1) The Math calls are not necessary. Flipping all the bits of any integral type in Java (or C) is not an arithmetic operation; it is a bitwise operation. Using the '^' operator with -1 as the other operand works regardless of the width of the integer type. The tilde '~' operator is the other option.
int i = 0xf0f0f0f0;
System.out.println(Integer.toHexString(i)); // f0f0f0f0
i ^= -1;
System.out.println(Integer.toHexString(i)); // f0f0f0f
i = ~i;
System.out.println(Integer.toHexString(i)); // f0f0f0f0
2) Since the entire range of integers maps onto the entire range of integers under the two's complement transform, it is not possible to detect whether a number is or is not two's complement unless you know the range of numbers from which the two's complement might have been calculated, and the two sets (before and after) are mutually exclusive.
That mask computation is fairly inscrutable; I'm going to guess that it attempts to (since you mention it's wrong) build a mask up to and including the highest set bit. Whether that's useful for "flipping all bits" is another possible point of discussion, since to me at least, "all bits" means all 32 of them, not some number that depends on the value. But if that's what you want, then that's what you want. Especially combined with the second question, it looks like a mistake to me, so you'd be implementing the wrong thing from the start - see near the bottom.
Anyway, the mask can be generated with some reasonably nice bitmath, which leaves no doubt about possible edge cases (e.g. Math.log(0) is bad, and k = 32 corresponds to negative numbers, which are also bad to put into a log):
int m = num | (num >> 16);
m |= m >> 8;
m |= m >> 4;
m |= m >> 2;
m |= m >> 1;
return num ^ m;
Note that this function has odd properties: it almost always returns an unsigned-lower number than went in, except at 0. It flips bits, so the name is not completely wrong, but flipAllBits(flipAllBits(x)) != x (usually), while the name suggests it should be an involution.
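A quick demonstration of the non-involution, wrapping the bitmath above in a hypothetical flipAllBits method:

static int flipAllBits(int num) {               // the mask-based version from above
    int m = num | (num >> 16);
    m |= m >> 8;
    m |= m >> 4;
    m |= m >> 2;
    m |= m >> 1;
    return num ^ m;
}

// flipAllBits(44) == 19, but flipAllBits(19) == 12, not 44:
System.out.println(flipAllBits(flipAllBits(44)) == 44); // false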
As for the second question, there is nothing to determine. Two's complement is a scheme by which you can interpret a bitvector - any bitvector. So it's really a choice you make: to interpret a given bitvector that way, or some other way. In Java the "default" interpretation is two's complement (e.g. toString will print an int according to its two's complement meaning), but you don't have to go along with it; you can (with care) treat an int as unsigned, or as an array of booleans, or as several bitfields packed together, etc.
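For example, the same 32 bits read two ways (Integer.toUnsignedLong is the JDK helper for the unsigned reading):

int bits = 0xFFFFFFF6;                            // the bit pattern 11111111...11110110
System.out.println(bits);                         // -10 (two's complement reading)
System.out.println(Integer.toUnsignedLong(bits)); // 4294967286 (unsigned reading)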
If you wanted to invert all the bits, but made the common mistake of assuming that the number of bits in an int is variable (and that you therefore needed a mask that covers "all bits"), I have some great news for you, because inverting all bits is a lot easier:
return ~num;
If you were reading "invert all bits" in the context of two's complement, it would have the above meaning, so all bits, including those left of the highest set bit.
In my Android app I want to generate an integer between 0 and 100000 using the method Random.next(int). I want to know the probability distribution of this method. Is it the same for every value in the range?
Assuming you draw numbers with nextInt(100000), the output will broadly satisfy the statistical properties of a uniform distribution on the range [0, 100000).
That is,
The probability of generating any particular number is equal, and independent of the previous numbers drawn.
The mean of a sample of random numbers will converge to 100000 / 2.
The variance of a sample will converge to the variance of a uniform distribution.
But it's not perfect, or cryptographically secure. In particular you can observe autocorrelation effects. Depending on your specific requirements you may need to consider alternatives.
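If you want to convince yourself empirically, here is a rough sketch of such a check (the sample size is an arbitrary choice):

java.util.Random rnd = new java.util.Random();
int bound = 100000, samples = 1_000_000;
long sum = 0;
for (int i = 0; i < samples; i++) sum += rnd.nextInt(bound);
// for a uniform distribution on [0, bound) the mean is (bound - 1) / 2 = 49999.5
System.out.println((double) sum / samples);     // should land close to 49999.5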
I assume you are referring to nextInt(int) and not next(int) (which is a protected, not public, method).
It's a uniform (discrete, pseudo-random) distribution.
From the documentation:
Returns a pseudorandom, uniformly distributed int value between 0 (inclusive) and the specified value (exclusive)
It will generate a pseudorandom number. Each number has the same probability of being generated. I tried calling Random.next(2) 100 times and got roughly 50 1s and 50 0s. I think you get the idea. It's a uniform distribution.
Here's the source code if you are curious:
protected synchronized int next(int bits) {
    seed = (seed * multiplier + 0xbL) & ((1L << 48) - 1);
    return (int) (seed >>> (48 - bits));
}
Each call to Random.nextInt(int value) will produce a different result.
What about probability?
From the docs:
The general contract of nextInt is that one int value is pseudorandomly generated and returned. All 2^32 possible int values are produced with (approximately) equal probability
In the example Josh gives of the flawed random method that generates a positive random number with a given upper bound n, I don't understand two of the flaws he states.
The method from the book is:
private static final Random rnd = new Random();

// Common but deeply flawed
static int random(int n) {
    return Math.abs(rnd.nextInt()) % n;
}
He says that if n is a small power of 2, the sequence of random numbers that are generated will repeat itself after a short period of time. Why is this the case? The documentation for Random.nextInt() says it "Returns the next pseudorandom, uniformly distributed int value from this random number generator's sequence." So shouldn't the sequence repeat itself whenever n is a small integer - why does this only apply to powers of 2?
Next he says that if n is not a power of 2, some numbers will be returned on average more frequently than others. Why does this occur if Random.nextInt() generates uniformly distributed integers? (He provides a code snippet which clearly demonstrates this, but I don't understand why it happens, or how it is related to n being a power of 2.)
Question 1: if n is a small power of 2, the sequence of random numbers that are generated will repeat itself after a short period of time.
This is not a corollary of anything Josh is saying; rather, it is simply a known property of linear congruential generators. Wikipedia has the following to say:
A further problem of LCGs is that the lower-order bits of the generated sequence have a far shorter period than the sequence as a whole if m is set to a power of 2. In general, the n-th least significant digit in the base b representation of the output sequence, where b^k = m for some integer k, repeats with at most period b^n.
This is also noted in the Javadoc:
Linear congruential pseudo-random number generators such as the one implemented by this class are known to have short periods in the sequence of values of their low-order bits.
The other version of the function, Random.nextInt(int), works around this by using different bits in this case (emphasis mine):
The algorithm treats the case where n is a power of two specially: it returns the correct number of high-order bits from the underlying pseudo-random number generator.
This is a good reason to prefer Random.nextInt(int) over using Random.nextInt() and doing your own range transformation.
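You can see the short low-bit period directly on the raw recurrence, using the LCG constants from java.util.Random (a small sketch; since both the multiplier and the addend are odd, the lowest state bit simply alternates):

long mult = 0x5DEECE66DL, add = 0xBL, mask = (1L << 48) - 1;
long s = 12345;                                 // any starting state
for (int i = 0; i < 8; i++) {
    s = (s * mult + add) & mask;
    System.out.print(s & 1);                    // prints 01010101: the low bit has period 2
}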
Question 2: Next he says that if n is not a power of 2, some numbers will be returned on average more frequently than others.
There are 2^32 distinct numbers that can be returned by nextInt(). If you try to put them into n buckets using % n, and n isn't a power of 2, some buckets will get more numbers than others. This means that some outcomes will occur more frequently than others even though the original distribution was uniform.
Let's look at this using small numbers. Say nextInt() returned four equiprobable outcomes: 0, 1, 2 and 3. Let's see what happens if we apply % 3 to them:
0 maps to 0
1 maps to 1
2 maps to 2
3 maps to 0
As you can see, the algorithm would return 0 twice as frequently as it would return each of 1 and 2.
This does not happen when n is a power of two, since 2^32 is evenly divisible by any smaller power of two. Consider n = 2:
0 maps to 0
1 maps to 1
2 maps to 0
3 maps to 1
Here, 0 and 1 occur with the same frequency.
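The same counting argument in code, for the four-outcome example above (a tiny sketch):

int[] buckets = new int[3];
for (int x = 0; x <= 3; x++) buckets[x % 3]++;  // four equiprobable values into 3 buckets
System.out.println(java.util.Arrays.toString(buckets)); // [2, 1, 1]: 0 is twice as likely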
Additional resources
Here are some additional -- if only tangentially relevant -- resources related to LCGs:
Spectral tests are statistical tests used to assess the quality of LCGs. Read more here and here.
A collection of classical pseudorandom number generators with linear structures has some pretty scatterplots (the generator used in Java is called DRAND48).
There is an interesting discussion on crypto.SE about predicting values from Java's generator.
1) When n is a power of 2, rnd % n is equivalent to selecting a few of the lower bits of the original. The lower bits of numbers generated by the type of generator used by Java are known to be "less random" than the higher bits. It's just a property of the formula used for generating the numbers.
2) Imagine that the largest possible value returned by random() is 10, and n = 7. Taking that value % 7 maps the numbers 7, 8, 9 and 10 to 0, 1, 2, 3 respectively. Therefore, if the original number is uniformly distributed, the result will be heavily biased towards the lower numbers, because they will appear twice as often as 4, 5 and 6. In this case, this happens regardless of whether n is a power of two or not; but if instead of 10 we chose, say, 15 (which is 2^4-1), then any n that is a power of two would result in a uniform distribution, because there would be no "excess" numbers left at the end of the range to cause bias - the total number of possible values would be exactly divisible by the number of possible remainders.
I have a scenario where I'm working with large integers (e.g. 160 bit), and am trying to create the biggest possible unsigned integer that can be represented with an n bit number at run time. The exact value of n isn't known until the program has begun executing and read the value from a configuration file. So for example, n might be 160, or 128, or 192, etcetera...
Initially what I was thinking was something like:
BigInteger.valueOf((long)Math.pow(2, n));
but then I realized that the conversion to long defeats the purpose, given that a long does not have enough bits to store the result in the first place. Any suggestions?
On the largest n-bit unsigned number
Let's first take a look at what this number is, mathematically.
In an unsigned binary representation, the largest n-bit number would have all bits set to 1. Let's take a look at some examples:
1(2) = 1 = 2^1 - 1
11(2) = 3 = 2^2 - 1
111(2) = 7 = 2^3 - 1
...
1...1(2) (n ones) = 2^n - 1
Note that this is analogous in decimal too. The largest 3-digit number is:
10^3 - 1 = 1000 - 1 = 999
Thus, a subproblem of finding the largest n-bit unsigned number is computing 2^n.
On computing powers of 2
Modern digital computers can compute powers of two efficiently, due to the following pattern:
2^0 = 1(2)
2^1 = 10(2)
2^2 = 100(2)
2^3 = 1000(2)
...
2^n = 10...0(2) (n zeros)
That is, 2^n is simply a number having bit n set to 1, and everything else set to 0 (remember that bits are numbered with zero-based indexing).
Solution
Putting the above together, we get this simple solution using BigInteger for our problem:
final int N = 5;
BigInteger twoToN = BigInteger.ZERO.setBit(N);
BigInteger maxNbits = twoToN.subtract(BigInteger.ONE);
System.out.println(maxNbits); // 31
If we were using long instead, we could write something like this:
// for 64-bit signed long version, N < 64
System.out.println(
(1L << N) - 1
); // 31
There is no "set bit n" operation defined for long, so traditionally bit shifting is used instead. In fact, a BigInteger analog of this shifting technique is also possible:
System.out.println(
BigInteger.ONE.shiftLeft(N).subtract(BigInteger.ONE)
); // 31
See also
Wikipedia/Binary numeral system
Bit Twiddling Hacks
Additional BigInteger tips
BigInteger does have a pow method to compute non-negative powers of any arbitrary number. If you're working in a modular ring, there are also modPow and modInverse.
You can individually setBit, flipBit or just testBit. You can get the overall bitCount, perform a bitwise and with another BigInteger, and shiftLeft/shiftRight, etc.
As bonus, you can also compute the gcd or check if the number isProbablePrime.
ALWAYS remember that BigInteger, like String, is immutable. You can't invoke a method on an instance and expect that instance to be modified. Instead, always assign the result returned by the method to your variable.
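For example, here is the classic pitfall in two lines:

BigInteger x = BigInteger.ONE;
x.shiftLeft(4);                                 // WRONG: the result is discarded; x is still 1
x = x.shiftLeft(4);                             // correct: assign the returned instance
System.out.println(x);                          // 16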
Just to clarify, you want the largest n-bit number (i.e., the one with all n bits set). If so, the following will do that for you:
BigInteger largestNBitInteger = BigInteger.ZERO.setBit(n).subtract(BigInteger.ONE);
Which is mathematically equivalent to 2^n - 1. Your question shows how to compute 2^n, which is actually the smallest (n+1)-bit number. You can of course do that with:
BigInteger smallestNPlusOneBitInteger = BigInteger.ZERO.setBit(n);
I think there is a pow method directly in BigInteger. You can use it for your purpose.
The quickest way I can think of doing this is by using the constructor for BigInteger that takes a byte[].
BigInteger(byte[] val) constructs the BigInteger object from an array of bytes. You are, however, dealing with bits, so creating a byte[] that might consist of {127, 255, 255, 255, 255} for a 39-bit integer representing 2^39 - 1 might be a little tedious.
You could also use the constructor BigInteger(String val, int radix), which might make what's going on in your code more readily apparent, if you don't mind a performance hit for parsing a String. You could generate a string like val = "111111111111111111111111111111111111111" and then call BigInteger myInt = new BigInteger(val, 2); - resulting in the same 39-bit integer.
The first option requires some thought about how to represent your number: that particular constructor expects a two's-complement, big-endian representation. The second will likely be marginally slower, but much clearer.
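For illustration, here are both constructions of the same 39-bit all-ones value, 2^39 - 1 (a sketch; String.repeat requires Java 11+, and Java bytes are signed, so 255 is written as -1):

byte[] bytes = {127, -1, -1, -1, -1};           // big-endian two's complement of 0x7FFFFFFFFF
BigInteger fromBytes = new BigInteger(bytes);
BigInteger fromString = new BigInteger("1".repeat(39), 2);
System.out.println(fromBytes.equals(fromString)); // true: both are 549755813887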
EDIT: Corrected numbers. I thought you meant representing 2^n, and didn't correctly read that you wanted the largest value n bits could store.