I want to use Math.random() in a Java program.
What is the distribution of this way of getting doubles?
This is documented in the Java SE API docs:
public static double random()
Returns a double value with a positive sign, greater than or equal to 0.0 and less than 1.0. Returned values are chosen pseudorandomly with (approximately) uniform distribution from that range.
The doc goes on to mention that it uses java.util.Random under the hood, presumably nextDouble() (although not explicitly stated).
The remarks about nonuniformity in nextDouble are as follows:
The method nextDouble is implemented by class Random as if by:
public double nextDouble() {
return (((long)next(26) << 27) + next(27))
/ (double)(1L << 53);
}
The hedge "approximately" is used in the foregoing description only because the next method is only approximately an unbiased source of independently chosen bits. If it were a perfect source of randomly chosen bits, then the algorithm shown would choose double values from the stated range with perfect uniformity.
It appears that the non-uniformity is only due to the underlying nonuniformity of next rather than a significant bias in the nextDouble algorithm itself:
[In early versions of Java, the result was incorrectly calculated as:
return (((long)next(27) << 27) + next(27))
/ (double)(1L << 54);
This might seem to be equivalent, if not better, but in fact it
introduced a large nonuniformity because of the bias in the rounding of floating-point numbers: it was three times as likely that the low-order bit of the significand would be 0 than that it would be 1! This nonuniformity probably doesn't matter much in practice, but we strive for perfection.]
Note that this draws uniformly from the interval [0, 1) divided into equal steps of 2^-53, which is not the same as drawing uniformly from the set of all double values between 0 and 1. This is a subtle difference: the set of all double values between 0 and 1 is itself not uniformly spaced on the number line, as this image shows:
[image of the non-uniform spacing of double values, reproduced from docs.oracle.com]
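As a quick sanity check of the 2^-53 granularity described above, here is a minimal sketch of my own (not from the docs) that scales Math.random() by 2^53 and confirms every sample is a whole number of steps:

public class RandomGranularity {
    public static void main(String[] args) {
        double steps = (double) (1L << 53);   // the number of equal steps in [0, 1)
        for (int i = 0; i < 1_000_000; i++) {
            double r = Math.random();
            double scaled = r * steps;        // exact: r is k / 2^53 for some integer k < 2^53
            if (scaled != Math.floor(scaled)) {
                throw new AssertionError("unexpected value: " + r);
            }
        }
        System.out.println("all samples were exact multiples of 2^-53");
    }
}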
I am reading the implementation details of the Java 8 HashMap. Can anyone tell me why the Java HashMap initial array size is specifically 16? What is so special about 16? And why is it always a power of two? Thanks
The reason why powers of 2 appear everywhere is that when numbers are expressed in binary (as they are in circuits), certain math operations on powers of 2 are simpler and faster to perform (just think about how easy math with powers of 10 is in the decimal system we use). For example, multiplication is not a very efficient process in computers - circuits use a method similar to the one you use when multiplying two numbers that each have multiple digits. Multiplying or dividing by a power of 2 only requires the computer to shift bits to the left (for multiplying) or to the right (for dividing).
And as for why 16 for HashMap? 10 is a commonly used default for dynamically growing structures (arbitrarily chosen), and 16 is not far off - but is a power of 2.
You can do modulus very efficiently for a power of 2. n % d = n & (d-1) when d is a power of 2, and modulus is used to determine which index an item maps to in the internal array - which means it occurs very often in a Java HashMap. Modulus requires division, which is also much less efficient than using the bitwise and operator. You can convince yourself of this by reading a book on Digital Logic.
The reason why bitwise and works this way for powers of two is because every power of 2 is expressed as a single bit set to 1. Let's say that bit is t. When you subtract 1 from a power of 2, you set every bit below t to 1, and every bit above t (as well as t) to 0. Bitwise and therefore saves the values of all bits below position t from the number n (as expressed above), and sets the rest to 0.
But how does that help us? Remember that when dividing by a power of 10, you can count the number of zeroes following the 1, and take that number of digits starting from the least significant digit of the dividend in order to find the remainder. Example: 637989 % 1000 = 989. A similar property applies to binary numbers with only one bit set to 1, and the rest set to 0. Example: 100101 % 001000 = 000101
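To see the shift and masking tricks from the answer above in action, here is a small standalone sketch of my own (an illustration only, not HashMap's actual source; the table size and sample values are arbitrary):

public class PowerOfTwoTricks {
    public static void main(String[] args) {
        int tableSize = 16;               // a power of two, like HashMap's default capacity
        int hash = 637989;

        // index by modulo vs. index by masking: identical for non-negative values
        System.out.println(hash % tableSize);        // 5
        System.out.println(hash & (tableSize - 1));  // 5

        // multiplying or dividing by a power of two is just a bit shift
        System.out.println((21 << 3) == 21 * 8);     // true
        System.out.println((168 >> 3) == 168 / 8);   // true
    }
}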
There's one more thing about choosing hash & (n - 1) versus modulo, and that is negative hashes. hashCode() returns an int, which of course can be negative. The remainder of a negative number (in Java) is also negative, while the result of the & is not.
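A quick sketch of that difference, using a made-up negative hash value:

public class NegativeHashDemo {
    public static void main(String[] args) {
        int hash = -7;            // pretend hashCode() returned a negative value
        int tableSize = 16;
        System.out.println(hash % tableSize);        // -7: unusable as an array index
        System.out.println(hash & (tableSize - 1));  // 9: always lands in [0, tableSize)
    }
}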
Another reason is that you want all of the slots in the array to be equally likely to be used. Since hash() is evenly distributed over 32 bits, if the array size didn't divide into the hash space, then there would be a remainder causing lower indexes to have a slightly higher chance of being used. Ideally, not just the hash, but (hash() % array_size) is random and evenly distributed.
But this only really matters for data with a small hash range (like a byte or character).
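That bias is easy to see with a small hash range. The following sketch (my own illustration; the table size of 10 is deliberately not a power of two and does not evenly divide the 256 possible byte values) counts how often each index is hit:

public class ModuloBiasDemo {
    public static void main(String[] args) {
        int[] counts = new int[10];
        // one "hash" per possible unsigned byte value
        for (int h = 0; h < 256; h++) {
            counts[h % 10]++;
        }
        // [26, 26, 26, 26, 26, 26, 25, 25, 25, 25]
        // the lower indexes are hit slightly more often than the higher ones
        System.out.println(java.util.Arrays.toString(counts));
    }
}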
In my Android app I want to generate an integer between 0 and 100000 using the method Random.next(int). I want to know what probability distribution this method has. Is it the same for every value in the range?
Assuming you mean nextInt(100000), it will broadly satisfy the statistical properties of a uniform distribution on the range [0, 100000).
That is,
The probability of generating any particular number is equal, and independent of the previous numbers drawn.
The mean of a sample of random numbers will converge to roughly 100000 / 2.
The variance of a sample will converge to the variance of a uniform distribution.
But it's not perfect, or cryptographically secure. In particular you can observe autocorrelation effects. Depending on your specific requirements you may need to consider alternatives.
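A rough empirical check of those properties (my own sketch; the bound and sample size are arbitrary):

import java.util.Random;

public class UniformCheck {
    public static void main(String[] args) {
        Random rng = new Random();
        int bound = 100_000;
        int samples = 1_000_000;

        double sum = 0, sumSq = 0;
        for (int i = 0; i < samples; i++) {
            int v = rng.nextInt(bound);   // uniform on [0, bound)
            sum += v;
            sumSq += (double) v * v;
        }
        double mean = sum / samples;
        double variance = sumSq / samples - mean * mean;

        System.out.println("sample mean     ~ " + mean);      // expect about bound/2, i.e. ~50000
        System.out.println("sample variance ~ " + variance);  // expect about bound^2/12, i.e. ~8.3e8
    }
}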
I assume you are referring to nextInt(int) and not next(int) (which is a protected method, not a public one).
It's a uniform (discrete, pseudo-random) distribution.
From the documentation:
Returns a pseudorandom, uniformly distributed int value between 0
(inclusive) and the specified value (exclusive)
It will generate a pseudorandom number. Each number has the same probability of being generated. I tried calling Random.next(2) 100 times and got 50 1s and 50 0s. I think you get the idea. It's a uniform distribution.
Here's the source code if you are curious:
protected synchronized int next(int bits) {
    // advance the 48-bit linear congruential state
    // (multiplier is 0x5DEECE66DL, addend is 0xb, modulus is 2^48)
    seed = (seed * multiplier + 0xbL) & ((1L << 48) - 1);
    // return the requested number of high-order bits;
    // the low-order bits of an LCG with a power-of-two modulus are less random
    return (int) (seed >>> (48 - bits));
}
Each call of Random.nextInt(int value) can produce a different result.
What about probability?
From the docs:
The general contract of nextInt is that one int value is pseudorandomly generated and returned. All 2^32 possible int values are produced with (approximately) equal probability
Here is what I tried:
public class LongToDoubleTest{
public static void main(String... args) {
System.out.println(Long.MAX_VALUE);
System.out.println(Long.MAX_VALUE/2);
System.out.println(Math.floor(Long.MAX_VALUE/2));
System.out.println(new Double(Math.floor(Long.MAX_VALUE/2)).longValue());
}
}
Here is the output:
9223372036854775807
4611686018427387903
4.6116860184273879E18
4611686018427387904
I was initially trying to figure out whether it is possible to keep half of Long.MAX_VALUE in a double without losing data, so I ran the test with all those lines except the last one. It appeared that I was right and the trailing 3 was lost. Then, just to confirm it, I added the last line. But instead of the 3, a 4 appeared. So my question is: where did that 4 come from, and why is it 4 and not 3? Because 4 is actually an incorrect value here.
P.S. My knowledge of IEEE 754 is very poor, so maybe the behaviour I found is absolutely correct, but 4 is obviously a wrong value here.
You need to understand that not every long can be exactly represented as a double - after all, there are 2^64 long values, and at most that many double values (although lots of those are reserved for "not a number" values etc). Given that there are also double values which clearly aren't long values (e.g. 0.5 - any non-integer, for a start), that means there can't possibly be a distinct double value for every long value.
That means if you start with a long value that can't be represented, convert it to a double and then back to a long, it's entirely reasonable to get back a different number.
The absolute difference between adjacent double values increases as the magnitude of the numbers gets larger. So when the numbers are very small, the difference between two numbers is really tiny (very very small indeed) - but when the numbers get bigger - e.g. above the range of int - the gap between numbers becomes greater... greater even than 1. So adjacent double values near Long.MAX_VALUE can be quite a distance apart. That means several long values will map to the same nearest double.
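You can see the size of that gap directly with Math.ulp, which returns the distance from a double to the next representable value (a quick sketch to illustrate the point above):

public class DoubleGapDemo {
    public static void main(String[] args) {
        System.out.println(Math.ulp(1.0));                      // 2.220446049250313E-16
        System.out.println(Math.ulp(1.0e16));                   // 2.0 - the gap is already bigger than 1
        System.out.println(Math.ulp((double) Long.MAX_VALUE));  // 2048.0 - adjacent doubles are 2048 apart
    }
}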
The arithmetic here is completely predictable.
The Java double format uses one bit for sign and eleven bits for exponent. That leaves 52 bits to encode the significand (the fraction portion of a floating-point number).
For normal numbers, the significand has a leading 1 bit, followed by a binary point, followed by the 52 bits of the encoding.
When Long.MAX_VALUE/2, 4611686018427387903, is converted to double, it must be rounded to fit in these bits. 4611686018427387903 is 0x3fffffffffffffff. There are 62 significant bits there (two leading zeroes that are insignificant, then 62 bits). Since not all 62 bits fit in the 53 bits available, we must round them. The last nine bits, which we must eliminate by rounding, are 111111111 in binary. We must either round them down to zero (producing 0x3ffffffffffffe00) or up to 1000000000 in binary (which carries into the next higher bit and produces 0x4000000000000000). The latter change (adding 1) is smaller than the former change (subtracting 111111111 in binary). We want a smaller error, so we choose the latter and round up. Thus, we round 0x3fffffffffffffff up to 0x4000000000000000. This is 2^62, which is 4611686018427387904.
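You can verify this rounding directly (a small sketch; 0x1p62 is the Java hexadecimal floating-point literal for 2^62):

public class RoundingDemo {
    public static void main(String[] args) {
        long half = Long.MAX_VALUE / 2;      // 4611686018427387903, i.e. 0x3fffffffffffffff
        double d = (double) half;            // the conversion must round to 53 significant bits
        System.out.println(d == 0x1p62);     // true: the result is exactly 2^62
        System.out.println((long) d);        // 4611686018427387904
        System.out.println((long) d - half); // 1: the conversion rounded up by one
    }
}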
I've been a little curious about this. Math.random() gives a value in the range [0.0,1.0). So what might the largest value it can give be? In other words, what is the closest double value to 1.0 that is less than 1.0?
Java uses the 64-bit IEEE-754 representation, so the closest number smaller than one is 3FEFFFFFFFFFFFFF in hexadecimal representation: 0 for the sign, -1 for the exponent, and a 52-bit significand of all ones, giving a significand value of 2 - 2^-52 ≈ 1.9999999999999998. The resulting value is 1 - 2^-53, which prints as 0.9999999999999999.
References: IEEE-754 Calculator.
The number that you want is returned by Math.nextAfter(1.0, -1.0).
The name of the function is somewhat of a misnomer. Math.nextAfter(a, 1.0) returns the double value adjacent to a in the direction of 1.0 (for a less than 1.0, that is the next value greater than a), and Math.nextAfter(a, -1.0) returns the value adjacent to a in the direction of -1.0 (i.e., the value just before a).
Note: Another poster said, 1.0-Double.MIN_NORMAL. That's wrong. 1.0-Double.MIN_NORMAL is exactly equal to 1.0.
The smallest positive value of a double is Double.MIN_NORMAL. So, the largest number less than 1.0 is 1.0-Double.MIN_NORMAL.
Double.MIN_NORMAL is equal to 2^-1022, so the answer is still extremely close to 1.0. You'd have to print the value of 1.0-Double.MIN_NORMAL to 308 decimal places before you could see anything but a 9.
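Putting this together, here is a small sketch (0x1p-53 is the Java hexadecimal floating-point literal for 2^-53); it also confirms the note above that 1.0 - Double.MIN_NORMAL is exactly 1.0:

public class LargestBelowOne {
    public static void main(String[] args) {
        double largest = Math.nextAfter(1.0, -1.0);
        System.out.println(largest);                         // 0.9999999999999999
        System.out.println(largest == 1.0 - 0x1p-53);        // true: it is exactly 1 - 2^-53
        System.out.println(1.0 - Double.MIN_NORMAL == 1.0);  // true: MIN_NORMAL is far smaller than one ulp of 1.0
    }
}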
I am confused about using the expm1 function in Java.
The Oracle Java doc for Math.expm1 says:
Returns e^x - 1. Note that for values of x near 0, the exact sum of
expm1(x) + 1 is much closer to the true result of e^x than exp(x).
but this page says:
However, for negative values of x, roughly -4 and lower, the algorithm
used to calculate Math.exp() is relatively ill-behaved and subject to
round-off error. It's more accurate to calculate e^x - 1 with a
different algorithm and then add 1 to the final result.
Should we use expm1(x) for negative x values or for values near 0?
The implementation of double at the bit level means that you can store doubles near 0 with much more precision than doubles near 1. That's why expm1 can give you much more accuracy for near-zero powers than exp can, because double doesn't have enough precision to store very accurate numbers very close to 1.
I don't believe the article you're citing is correct, as far as the accuracy of Math.exp goes (modulo the limitations of double). The Math.exp specification guarantees that the result is within 1 ulp of the exact value, which means -- to oversimplify a bit -- a relative error of at most 2^-52, ish.
You use expm1(x) for anything close to 0. Positive or negative.
The reason is because exp(x) of anything close to 0 will be very close to 1. Therefore exp(x) - 1 will suffer from destructive cancellation when x is close to 0.
expm1(x) is properly optimized to avoid this destructive cancellation.
From the mathematical side: if exp is implemented using its Taylor series, then expm1(x) can be computed by simply omitting the leading +1 term.
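A tiny demonstration of the cancellation (the value of x is arbitrary, and the printed digits are approximate):

public class Expm1Demo {
    public static void main(String[] args) {
        double x = 1e-10;
        // doubles near 1 carry far less absolute precision than doubles near 0
        System.out.println(Math.ulp(1.0));    // ~2.2e-16
        System.out.println(Math.ulp(1e-10));  // ~1.3e-26

        System.out.println(Math.exp(x) - 1);  // ~1.000000082740371E-10 (only ~8 digits are meaningful)
        System.out.println(Math.expm1(x));    // ~1.00000000005E-10 (accurate)
    }
}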