Why does Double.parseDouble turn 9999999999999999 into 10000000000000000?
For example:
Double d =Double.parseDouble("9999999999999999");
String b= new DecimalFormat("#.##").format(d);
System.out.println(b);
prints
10000000000000000
when it should show 9999999999999999 or 9999999999999999.00.
Any sort of help is greatly appreciated.
The number 9999999999999999 is just above the precision limit of double-precision floating-point. In other words, the 53-bit mantissa is not able to hold 9999999999999999.
So the result is that it is rounded to the nearest double-precision value - which is 10000000000000000.
9999999999999999 = 0x2386f26fc0ffff // 54 significant bits needed
10000000000000000 = 0x2386f26fc10000 // 38 significant bits needed
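The bit counts above can be checked with a small sketch. The class and method names (SignificantBits, significantBits) are made up for illustration; the helper simply counts the bits between the highest and the lowest set bit of a positive long.

```java
class SignificantBits {
    // Bits from the highest set bit down to the lowest set bit, inclusive.
    // Only meaningful for positive values.
    static int significantBits(long value) {
        return Long.SIZE
                - Long.numberOfLeadingZeros(value)
                - Long.numberOfTrailingZeros(value);
    }

    public static void main(String[] args) {
        long wanted = 9999999999999999L;
        long rounded = 10000000000000000L;
        System.out.println("0x" + Long.toHexString(wanted) + " needs "
                + significantBits(wanted) + " bits");
        System.out.println("0x" + Long.toHexString(rounded) + " needs "
                + significantBits(rounded) + " bits");
        // A double has 53 significant bits, so the first value cannot be stored as-is.
        System.out.println(Double.parseDouble("9999999999999999") == 1.0e16);
    }
}
```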
double only has 15 to 16 significant decimal digits of accuracy, and when you give it a number it can't represent (which is most of the time; even 0.1 is not exact), it takes the closest representable number.
If you want to represent 9999999999999999 exactly, you need to use BigDecimal.
BigDecimal bd = new BigDecimal("9999999999999999");
System.out.println(new DecimalFormat("#.##").format(bd));
prints
9999999999999999
Very few real-world problems need this accuracy, because you can't measure anything this accurately anyway, i.e. to an error of 1 part in 10 quadrillion (10^16).
You can find the largest representable integer with
// search all the powers of 2 until (x + 1) - x != 1
for (long l = 1; l > 0; l <<= 1) {
double d0 = l;
double d1 = l + 1;
if (d1 - d0 != 1) {
System.out.println("Cannot represent " + (l + 1) + " was " + d1);
break;
}
}
prints
Cannot represent 9007199254740993 was 9.007199254740992E15
The largest integer up to which every integer is representable is 9007199254740992 (2^53). It still fits because it is even, so it needs one bit fewer.
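Another way to see the same limit, as a rough sketch, is to ask Java for the spacing between neighbouring doubles with Math.ulp. The class name and the plusOneIsLost helper are only illustrative.

```java
class UlpAtTwoTo53 {
    static final double TWO_53 = 9007199254740992.0; // 2^53

    // True when n + 1 converts to the same double as n.
    static boolean plusOneIsLost(long n) {
        return (double) (n + 1) == (double) n;
    }

    public static void main(String[] args) {
        // Spacing between adjacent doubles just below and at 2^53.
        System.out.println(Math.ulp(TWO_53 - 1));
        System.out.println(Math.ulp(TWO_53));
        // Once the spacing exceeds 1, odd integers start to disappear.
        System.out.println(plusOneIsLost(1L << 52));
        System.out.println(plusOneIsLost(1L << 53));
    }
}
```

Below 2^53 the spacing is at most 1, so every integer has its own double; from 2^53 on the spacing is 2 or more.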
9999999999999999 requires 54 bits of mantissa in order to be represented exactly, and double only has 53 (52 stored explicitly plus one implicit leading bit). The number is therefore rounded to the nearest number that can be represented using a 53-bit mantissa. This number happens to be 10000000000000000.
The reason 10000000000000000 requires fewer bits is that its binary representation ends in a lot of zeroes, and those zeroes can get represented by increasing the (binary) exponent.
For detailed explanation of a similar problem, see Why is (long)9223372036854665200d giving me 9223372036854665216?
When I tried the following codes in Java:
System.out.println("0.1d/0.3d is " + 0.1d/0.3d);
System.out.println("0.1f/0.3d is " + 0.1f/0.3d);
I get the following output:
0.1d/0.3d is 0.33333333333333337
0.1f/0.3d is 0.3333333383003871
If float/double yields a double, then float/double should give the same result as double/double in this case.
Just execute the code below to see the binary string of the number you want to check:
System.out.println(Long.toBinaryString(Double.doubleToLongBits(0.1d)));
System.out.println(Integer.toBinaryString(Float.floatToIntBits(0.1f)));
Output:
11111110111001100110011001100110011001100110011001100110011010
111101110011001100110011001101
You can see the difference between the float bits and the double bits, so we can't get the same result for float/double and double/double at higher precision.
Yes, "milbrandt" is right: when a float is implicitly converted to a double, the digits beyond the float's precision differ from those of the double literal. That is why float/double gives a different result.
As you probably know, double uses 64 bits to store a value and float uses 32 bits.
To represent the significand, double uses 52 bits and float uses 23 bits. Thus double gives about 15 decimal digits of precision, and float about 7.
If you do float / double, the float operand is first implicitly widened to double. So you actually divide a value that was rounded to about 7 digits of precision (far from exact) by a value with about 15 digits of precision (closer to exact). The widening cannot restore the precision that was lost when 0.1 was rounded to a float. Therefore, the results are different even though both of them are doubles.
Initialize a double with a float and then divide. You will see the same result.
double f = 0.1f;
double d = 0.1d;
double n = 0.3d;
System.out.println("float /double = " + f/n);
System.out.println("double/double = " + d/n);
result is
float /double = 0.3333333383003871
double/double = 0.33333333333333337
The reason you see this difference is that you don't have these exact values.
Even if we suppose 0.1d and 0.3d were exact (which they aren't, as you can see from the result 0.33333333333333337), 0.1f is 0.10000000149011612, which makes the difference.
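To make the two different inputs visible, here is a small sketch that prints the exact decimal expansion of each operand. new BigDecimal(double) converts without rounding; the class name and the exact helper are just for illustration.

```java
import java.math.BigDecimal;

class WideningDemo {
    // Exact decimal expansion of a double; this constructor does not round.
    static String exact(double value) {
        return new BigDecimal(value).toString();
    }

    public static void main(String[] args) {
        double fromFloat = 0.1f; // implicit widening keeps the float's value unchanged
        System.out.println(exact(fromFloat));
        System.out.println(exact(0.1d));
        // Dividing the float directly is the same as dividing its widened value.
        System.out.println(0.1f / 0.3d == fromFloat / 0.3d);
    }
}
```

The widened float and the double literal are two different numbers before the division even starts, so the quotients cannot be expected to match.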
A decimal number like 0.1 is represented in binary as 0.00011001100110011..., which is an infinitely repeating fraction.
When I write code like this:
float f = 0.1f;
the program rounds it to binary 0 01111011 1001 1001 1001 1001 1001 101, which is not the original number 0.1.
But when I print this variable like this:
System.out.print(f);
I get the original number 0.1 rather than 0.100000001 or some other number. The program can't represent "0.1" exactly, yet it can display "0.1" exactly. How does it do that?
I recovered the decimal value by adding up the individual bits of the binary representation, and the result looks weird.
float f = (float) (Math.pow(2, -4) + Math.pow(2, -5) + Math.pow(2, -8) + Math.pow(2, -9) + Math.pow(2, -12) + Math.pow(2, -13) + Math.pow(2, -16) + Math.pow(2, -17) + Math.pow(2, -20) + Math.pow(2, -21) + Math.pow(2, -24) + Math.pow(2, -25));
float f2 = (float) Math.pow(2, -27);
System.out.println(f);
System.out.println(f2);
System.out.println(f + f2);
Output:
0.099999994
7.4505806E-9
0.1
Mathematically, f + f2 = 0.10000000149..., which does not equal 0.1. Why doesn't the program print something like 0.100000001? I think that would be more accurate.
Java's System.out.print prints just enough decimals that the resulting representation, if parsed as a double or float, converts to the original double or float value.
This is a good idea because it means that in a sense, no information is lost in this kind of conversion to decimal. On the other hand, it can give an impression of exactness which, as you make clear in your question, is wrong.
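A quick way to convince yourself, sketched below, is to print 0.1f next to its two neighbouring floats and check that each printed string parses back to the same float. The class name and the roundTrips helper are illustrative only.

```java
class ShortestRepr {
    // True when printing and re-parsing gives back exactly the same float.
    static boolean roundTrips(float value) {
        return Float.parseFloat(Float.toString(value)) == value;
    }

    public static void main(String[] args) {
        float f = 0.1f;
        float below = Math.nextDown(f); // closest float smaller than f
        float above = Math.nextUp(f);   // closest float larger than f
        // Each value is printed with just enough digits to tell it apart
        // from its neighbours.
        System.out.println(below + " < " + f + " < " + above);
        System.out.println(roundTrips(below) && roundTrips(f) && roundTrips(above));
    }
}
```

"0.1" is enough for the middle value because no other float is closer to the decimal 0.1.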
In other languages, you can print the exact decimal representation of the float or double being considered:
#include <stdio.h>
int main(){
printf("%.60f", 0.1);
}
result: 0.100000000000000005551115123125782702118158340454101562500000
In Java, in order to emulate the above behavior, you need to convert the float or double to BigDecimal (this conversion is exact) and then print the BigDecimal with enough digits. Java's attitude to floating-point-to-string-representing-a-decimal conversion is pervasive, so that even System.out.format is affected. The linked Java program, the important line of which is System.out.format("%.60f\n", 0.1);, shows 0.100000000000000000000000000000000000000000000000000000000000, although the value of 0.1d is not 0.10000000000000000000…, and a Java programmer could have been excused for expecting the same output as the C program.
To convert a double to a string that represents the exact value of the double, consider the hexadecimal format, that Java supports for literals and for printing.
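Both suggestions can be sketched in a few lines. The class and helper names are made up; new BigDecimal(double), Double.toHexString and hexadecimal floating-point literals are standard Java.

```java
import java.math.BigDecimal;

class ExactAndHex {
    // Exact decimal expansion of the double (no rounding in this constructor).
    static String exactDecimal(double value) {
        return new BigDecimal(value).toString();
    }

    // Hexadecimal form: shows the significand bits and the binary exponent.
    static String hex(double value) {
        return Double.toHexString(value);
    }

    public static void main(String[] args) {
        System.out.println(exactDecimal(0.1));
        System.out.println(hex(0.1));
        // A hexadecimal literal names exactly the same double.
        System.out.println(0x1.999999999999ap-4 == 0.1);
    }
}
```

The hexadecimal form is shorter than the full decimal expansion and still loses nothing, which is why it is handy for logging or comparing exact values.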
I believe this is covered by Double.toString(double) (and similarly in Float#toString(float)):
How many digits must be printed for the fractional part of m or a? There must be at least one digit to represent the fractional part, and beyond that as many, but only as many, more digits as are needed to uniquely distinguish the argument value from adjacent values of type double. That is, suppose that x is the exact mathematical value represented by the decimal representation produced by this method for a finite nonzero argument d. Then d must be the double value nearest to x; or if two double values are equally close to x, then d must be one of them and the least significant bit of the significand of d must be 0.
(my emphasis)
Why does changing the sum order return a different result?
23.53 + 5.88 + 17.64 = 47.05
23.53 + 17.64 + 5.88 = 47.050000000000004
Both Java and JavaScript return the same results.
I understand that, due to the way floating point numbers are represented in binary, some rational numbers (like 1/3 - 0.333333...) cannot be represented precisely.
Why does simply changing the order of the elements affect the result?
Maybe this question is stupid, but why does simply changing the order of the elements affects the result?
It will change the points at which the values are rounded, based on their magnitude. As an example of the kind of thing that we're seeing, let's pretend that instead of binary floating point, we were using a decimal floating point type with 4 significant digits, where each addition is performed at "infinite" precision and then rounded to the nearest representable number. Here are two sums:
1/3 + 2/3 + 2/3 = (0.3333 + 0.6667) + 0.6667
= 1.000 + 0.6667 (no rounding needed!)
= 1.667 (where 1.6667 is rounded to 1.667)
2/3 + 2/3 + 1/3 = (0.6667 + 0.6667) + 0.3333
= 1.333 + 0.3333 (where 1.3334 is rounded to 1.333)
= 1.666 (where 1.6663 is rounded to 1.666)
We don't even need non-integers for this to be a problem:
10000 + 1 - 10000 = (10000 + 1) - 10000
= 10000 - 10000 (where 10001 is rounded to 10000)
= 0
10000 - 10000 + 1 = (10000 - 10000) + 1
= 0 + 1
= 1
This demonstrates possibly more clearly that the important part is that we have a limited number of significant digits - not a limited number of decimal places. If we could always keep the same number of decimal places, then with addition and subtraction at least, we'd be fine (so long as the values didn't overflow). The problem is that when you get to bigger numbers, smaller information is lost - the 10001 being rounded to 10000 in this case. (This is an example of the problem that Eric Lippert noted in his answer.)
It's important to note that the values on the first line of the right hand side are the same in all cases - so although it's important to understand that your decimal numbers (23.53, 5.88, 17.64) won't be represented exactly as double values, that's only a problem because of the problems shown above.
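The 4-digit decimal model above can be reproduced with BigDecimal and a MathContext of precision 4. This is only a sketch of the thought experiment, not of how double works internally; the class name and the sum helper are invented, and MathContext(4) uses HALF_UP rounding, which makes no difference here because none of the steps hits a tie.

```java
import java.math.BigDecimal;
import java.math.MathContext;

class FourDigitDecimal {
    // Four significant decimal digits.
    static final MathContext MC = new MathContext(4);

    // Adds left to right, rounding to four significant digits after every step.
    static BigDecimal sum(BigDecimal... terms) {
        BigDecimal total = BigDecimal.ZERO;
        for (BigDecimal term : terms) {
            total = total.add(term, MC);
        }
        return total;
    }

    public static void main(String[] args) {
        BigDecimal oneThird = BigDecimal.ONE.divide(BigDecimal.valueOf(3), MC);         // 0.3333
        BigDecimal twoThirds = BigDecimal.valueOf(2).divide(BigDecimal.valueOf(3), MC); // 0.6667
        System.out.println(sum(oneThird, twoThirds, twoThirds));
        System.out.println(sum(twoThirds, twoThirds, oneThird));

        BigDecimal big = new BigDecimal("10000");
        System.out.println(sum(big, BigDecimal.ONE, big.negate()));
        System.out.println(sum(big, big.negate(), BigDecimal.ONE));
    }
}
```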
Here's what's going on in binary. As we know, some floating-point values cannot be represented exactly in binary, even if they can be represented exactly in decimal. These 3 numbers are just examples of that fact.
With this program I output the hexadecimal representations of each number and the results of each addition.
public class Main{
public static void main(String args[]) {
double x = 23.53; // Inexact representation
double y = 5.88; // Inexact representation
double z = 17.64; // Inexact representation
double s = 47.05; // What math tells us the sum should be; still inexact
printValueAndInHex(x);
printValueAndInHex(y);
printValueAndInHex(z);
printValueAndInHex(s);
System.out.println("--------");
double t1 = x + y;
printValueAndInHex(t1);
t1 = t1 + z;
printValueAndInHex(t1);
System.out.println("--------");
double t2 = x + z;
printValueAndInHex(t2);
t2 = t2 + y;
printValueAndInHex(t2);
}
private static void printValueAndInHex(double d)
{
System.out.println(Long.toHexString(Double.doubleToLongBits(d)) + ": " + d);
}
}
The printValueAndInHex method is just a hex-printer helper.
The output is as follows:
403787ae147ae148: 23.53
4017851eb851eb85: 5.88
4031a3d70a3d70a4: 17.64
4047866666666666: 47.05
--------
403d68f5c28f5c29: 29.41
4047866666666666: 47.05
--------
404495c28f5c28f6: 41.17
4047866666666667: 47.050000000000004
The first 4 numbers are x, y, z, and s's hexadecimal representations. In IEEE floating point representation, bits 2-12 represent the binary exponent, that is, the scale of the number. (The first bit is the sign bit, and the remaining bits for the mantissa.) The exponent represented is actually the binary number minus 1023.
The exponents for the first 4 numbers are extracted:
sign|exponent
403 => 0|100 0000 0011| => 1027 - 1023 = 4
401 => 0|100 0000 0001| => 1025 - 1023 = 2
403 => 0|100 0000 0011| => 1027 - 1023 = 4
404 => 0|100 0000 0100| => 1028 - 1023 = 5
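The same exponents can be pulled out programmatically instead of by hand. The sketch below masks the 11 exponent bits out of the raw pattern and compares the result with Math.getExponent; the class and method names are illustrative, and the manual version assumes a normal, finite value.

```java
class ExponentBits {
    // Unbiased binary exponent read from the IEEE 754 bit pattern.
    // Assumes a normal, finite value (no subnormals, infinities or NaN).
    static int exponentOf(double value) {
        long bits = Double.doubleToLongBits(value);
        int biased = (int) ((bits >>> 52) & 0x7FF); // bits 2-12 of the pattern
        return biased - 1023;
    }

    public static void main(String[] args) {
        double[] values = {23.53, 5.88, 17.64, 47.05};
        for (double v : values) {
            System.out.println(v + " -> exponent " + exponentOf(v)
                    + " (Math.getExponent: " + Math.getExponent(v) + ")");
        }
    }
}
```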
First set of additions
The second number (y) is of smaller magnitude. When adding these two numbers to get x + y, the last 2 bits of the second number (01) are shifted out of range and do not figure into the calculation.
The second addition adds x + y and z and adds two numbers of the same scale.
Second set of additions
Here, x + z occurs first. They are of the same scale, but they yield a number that is higher up in scale:
404 => 0|100 0000 0100| => 1028 - 1023 = 5
The second addition adds x + z and y, and now 3 bits are dropped from y to add the numbers (101). Here, there must be a round upwards, because the result is the next floating point number up: 4047866666666666 for the first set of additions vs. 4047866666666667 for the second set of additions. That error is significant enough to show in the printout of the total.
In conclusion, be careful when performing mathematical operations on IEEE numbers. Some representations are inexact, and they become even more inexact when the scales are different. Add and subtract numbers of similar scale if you can.
Jon's answer is of course correct. In your case the error is no larger than the error you would accumulate doing any simple floating point operation. You've got a scenario where in one case you get zero error and in another you get a tiny error; that's not actually that interesting a scenario. A good question is: are there scenarios where changing the order of calculations goes from a tiny error to a (relatively) enormous error? The answer is unambiguously yes.
Consider for example:
x1 = (a - b) + (c - d) + (e - f) + (g - h);
vs
x2 = (a + c + e + g) - (b + d + f + h);
vs
x3 = a - b + c - d + e - f + g - h;
Obviously in exact arithmetic they would be the same. It is entertaining to try to find values for a, b, c, d, e, f, g, h such that the values of x1 and x2 and x3 differ by a large quantity. See if you can do so!
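For anyone who wants a starting point, here is one possible set of values, chosen by me for illustration rather than taken from the answer. With a = b = 1e16 and the small terms equal to 1 or 0, the exact result is 3, but the second grouping loses every small term because the spacing between doubles near 1e16 is 2.

```java
class GroupingDemo {
    static double x1(double a, double b, double c, double d,
                     double e, double f, double g, double h) {
        return (a - b) + (c - d) + (e - f) + (g - h);
    }

    static double x2(double a, double b, double c, double d,
                     double e, double f, double g, double h) {
        return (a + c + e + g) - (b + d + f + h);
    }

    static double x3(double a, double b, double c, double d,
                     double e, double f, double g, double h) {
        return a - b + c - d + e - f + g - h;
    }

    public static void main(String[] args) {
        // Illustrative values: two huge terms that cancel, plus small terms.
        double a = 1e16, b = 1e16, c = 1, d = 0, e = 1, f = 0, g = 1, h = 0;
        System.out.println("x1 = " + x1(a, b, c, d, e, f, g, h));
        System.out.println("x2 = " + x2(a, b, c, d, e, f, g, h));
        System.out.println("x3 = " + x3(a, b, c, d, e, f, g, h));
    }
}
```

With these particular values x1 and x3 agree and only x2 goes wrong; finding inputs where all three differ is left open, as in the answer.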
This actually covers much more than just Java and Javascript, and would likely affect any programming language using floats or doubles.
In memory, floating points use a special format along the lines of IEEE 754 (the converter provides much better explanation than I can).
Anyways, here's the float converter.
http://www.h-schmidt.net/FloatConverter/
The thing about the order of operations is the "fineness" of the operation.
Your first line yields 29.41 from the first two values, which gives us 2^4 as the exponent.
Your second line yields 41.17 which gives us 2^5 as the exponent.
We're losing a significant figure by increasing the exponent, which is likely to change the outcome.
Try ticking the last bit on the far right on and off for 41.17 and you can see that something as "insignificant" as 1/2^23 of the exponent would be enough to cause this floating point difference.
Edit: For those of you who remember significant figures, this would fall under that category. 10^4 + 4999 with a significant figure of 1 is going to be 10^4. In this case, the significant figure is much smaller, but we can see the results with the .00000000004 attached to it.
Floating point numbers are represented using the IEEE 754 format, which provides a specific size of bits for the mantissa (significand). Unfortunately this gives you a specific number of 'fractional building blocks' to play with, and certain fractional values cannot be represented precisely.
What is happening in your case is that in the second case, the addition is probably running into some precision issue because of the order the additions are evaluated. I haven't calculated the values, but it could be for example that 23.53 + 17.64 cannot be precisely represented, while 23.53 + 5.88 can.
Unfortunately it is a known problem that you just have to deal with.
I believe it has to do with the order of evaluation. While the sum is naturally the same in a math world, in the binary world instead of A + B + C = D, it's
A + B = E
E + C = D(1)
So there's that secondary step where floating point numbers can get off.
When you change the order,
A + C = F
F + B = D(2)
To add a different angle to the other answers here, this SO answer shows that there are ways of doing floating-point math where all summation orders return exactly the same value at the bit level.
I was recently reading about storing floating point values in the memory. And I've written a small program to test what I've read. And I noticed that there is a difference in the way Java processes the floating point values.
public class Test
{
public static void main(String args[])
{
double a = 0.90;
System.out.println(a);
System.out.println(2.00-1.10);
}
}
The above program is printing
0.9
0.8999999999999999
Why are these two statements not printing the same value? I know some floating values can't be represented exactly. In that case, both should give the same value.
Why are these two statements not printing the same value?
The result is not the same.
I know some floating values can't be represented exactly.
So you should assume that the result of an operation can depend on the amount of representation error of the values you use.
for (long l = 1; l <= 1e16; l *= 10) {
double a = l + 2;
double b = l + 1.1;
System.out.println(a + " - " + b + " is " + (a - b));
}
As the value gets larger, the representation error increases and becomes larger relative to the expected result of 0.9:
3.0 - 2.1 is 0.8999999999999999
12.0 - 11.1 is 0.9000000000000004
102.0 - 101.1 is 0.9000000000000057
1002.0 - 1001.1 is 0.8999999999999773
10002.0 - 10001.1 is 0.8999999999996362
100002.0 - 100001.1 is 0.8999999999941792
1000002.0 - 1000001.1 is 0.9000000000232831
1.0000002E7 - 1.00000011E7 is 0.900000000372529
1.00000002E8 - 1.000000011E8 is 0.9000000059604645
1.000000002E9 - 1.0000000011E9 is 0.8999999761581421
1.0000000002E10 - 1.00000000011E10 is 0.8999996185302734
1.00000000002E11 - 1.000000000011E11 is 0.899993896484375
1.000000000002E12 - 1.0000000000011E12 is 0.9000244140625
1.0000000000002E13 - 1.00000000000011E13 is 0.900390625
1.00000000000002E14 - 1.000000000000011E14 is 0.90625
1.000000000000002E15 - 1.0000000000000011E15 is 0.875
1.0000000000000002E16 - 1.0000000000000002E16 is 0.0
And on the topic of when the representation error gets so large that your operation does nothing:
for (double d = 1; d < Double.MAX_VALUE; d *= 2) {
if (d == d + 1) {
System.out.println(d + " + 1 == " + (d + 1));
break;
}
}
for (double d = 1; d < Double.MAX_VALUE; d *= 2) {
if (d == d - 1) {
System.out.println(d + " - 1 == " + (d - 1));
break;
}
}
prints
9.007199254740992E15 + 1 == 9.007199254740992E15
1.8014398509481984E16 - 1 == 1.8014398509481984E16
When “0.90” is converted to double, the result is .9 plus some small error, e0. Thus a equals .9+e0.
When “1.10” is converted to double, the result is 1.1 plus some small error, e1, so the result is 1.1+e1.
These two errors, e0 and e1, are generally unrelated to each other. Simply put, different decimal numbers are different distances away from binary floating-point numbers. When you evaluate 2.00-1.10, the result is 2–(1.1+e1) = .9–e1. So one of your numbers is .9+e0, and the other is .9-e1, and there is no reason to expect them to be the same.
(As it happens in this case, e0 is .00000000000000002220446049250313080847263336181640625, and e1 is .000000000000000088817841970012523233890533447265625. Also, subtracting 1.1 from 2 introduces no new error, after the conversion of “1.1” to double, by Sterbenz’ Lemma.)
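The two errors can be computed exactly with BigDecimal. The sketch below is only an illustration (the class name and the error helper are invented): it parses the decimal text to the nearest double and subtracts the exact decimal value.

```java
import java.math.BigDecimal;

class RepresentationError {
    // Exact difference between the double nearest to a decimal literal
    // and the literal itself.
    static BigDecimal error(String decimal) {
        double nearest = Double.parseDouble(decimal);
        return new BigDecimal(nearest).subtract(new BigDecimal(decimal));
    }

    public static void main(String[] args) {
        BigDecimal e0 = error("0.90");
        BigDecimal e1 = error("1.10");
        System.out.println("e0 = " + e0.toPlainString());
        System.out.println("e1 = " + e1.toPlainString());
        // 2.00 - 1.10 is computed without further rounding, so it is exactly 0.9 - e1.
        BigDecimal difference = new BigDecimal(2.00 - 1.10);
        System.out.println(difference.compareTo(new BigDecimal("0.9").subtract(e1)) == 0);
    }
}
```

Here e1 works out to exactly four times e0, which is the combination of the two effects described in the additional details: different trailing bits and a rounding position one bit further left.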
Additional details:
In binary, .9 is .11100110011001100110011001100110011001100110011001100 11001100… The bits in bold fit into a double. The trailing bits do not fit, so the number is rounded at that point. That causes a difference between the exact value of .9 and the value of “.9” represented as a double. In binary, 1.1 is 1.00011001100110011001100110011001100110011001 10011001… Again, the number is rounded. But observe the amount rounding is different. For .9, 1100 1100… was rounded up to 1 0000 0000…, which adds 00110011… at that position. For 1.1, 1001 1001 is rounded up to 1 0000 0000…, which adds 01100110… at that position (and causes a carry in the bold bits). And the two positions are different; 1.1 starts to the left of the radix point, so it looks like this: 1.[52 bits here][place where rounding occurs]. .9 starts to the right of the radix point, so it looks like this: .[53 bits here][place where rounding occurs]. So the rounding for 1.1, besides being 01100110… instead of 00110011…, is also doubled because it occurs one bit to the left of the .9 rounding. So you have two effects making e0 different from e1: The trailing bits that were rounded are different, and the place where rounding occurs is different.
I know some floating values can't be represented exactly
Well, that is your answer (or more precisely, as pointed out by Mark Byers, some decimal values can't be represented exactly as a double)! Neither 0.9 nor 1.1 can be represented exactly as a double, so you get rounding errors.
You can check the exact value of the various doubles with BigDecimal:
public static void main(String args[]) {
double a = 0.9d;
System.out.println(a);
System.out.println(new BigDecimal(a));
double b = 2d - 1.1d;
System.out.println(b);
System.out.println(new BigDecimal(2.0d));
System.out.println(new BigDecimal(1.1d));
System.out.println(new BigDecimal(b));
}
which outputs:
0.9
0.90000000000000002220446049250313080847263336181640625
0.8999999999999999
2
1.100000000000000088817841970012523233890533447265625
0.899999999999999911182158029987476766109466552734375
Your reasoning is that, even if 0.9 can't be represented precisely by a double, that it should have exactly the same double value as 2.0 - 1.1, and so result in the same printed value. That's the error -- this subtraction does not yield the double represented by "0.9" (or the exact value 0.9).
Alternative wording: When will adding Double.MIN_VALUE to a double in Java not result in a different Double value? (See Jon Skeet's comment below)
This SO question about the minimum Double value in Java has some answers which seem to me to be equivalent. Jon Skeet's answer no doubt works but his explanation hasn't convinced me how it is different from Richard's answer.
Jon's answer uses the following:
double d = // your existing value;
long bits = Double.doubleToLongBits(d);
bits++;
d = Double.longBitsToDouble(bits);
Richard's answer mentions the Javadoc for Double.MIN_VALUE:
A constant holding the smallest
positive nonzero value of type double,
2^-1074. It is equal to the hexadecimal
floating-point literal
0x0.0000000000001P-1022 and also equal
to Double.longBitsToDouble(0x1L).
My question is, how is Double.longBitsToDouble(0x1L) different from Jon's bits++;?
Jon's comment focuses on the basic floating point issue.
There's a difference between adding
Double.MIN_VALUE to a double value,
and incrementing the bit pattern
representing a double. They're
entirely different operations, due to
the way that floating point numbers
are stored. If you try to add a very
little number to a very big number,
the difference may well be so small
that the closest result is the same as
the original. Adding 1 to the current
bit pattern, however, will always
change the corresponding floating
point value, by the smallest possible
value which is visible at that scale.
I don't see any difference to Jon's approach of incrementing a long, "bits++", with adding Double.MIN_VALUE. When will they produce different results?
I wrote the following code to test the differences. Maybe someone could provide more/better sample double numbers or use a loop to find a number where there is a difference.
double d = 3.14159269123456789; // sample double
long bits = Double.doubleToLongBits(d);
long bitsBefore = bits;
bits++;
long bitsAfter = bits;
long bitsDiff = bitsAfter - bitsBefore;
long bitsMinValue = Double.doubleToLongBits(Double.MIN_VALUE);
long bitsSmallValue = Double.doubleToLongBits(Double.longBitsToDouble(0x1L));
if (bitsMinValue == bitsSmallValue)
{
System.out.println("Double.doubleToLongBits(Double.longBitsToDouble(0x1L)) is same as Double.doubleToLongBits(Double.MIN_VALUE)");
}
if (bitsDiff == bitsMinValue)
{
System.out.println("bits++ increments the same amount as Double.MIN_VALUE");
}
if (bitsDiff == bitsMinValue)
{
d = d + Double.MIN_VALUE;
System.out.println("Using Double.MIN_VALUE");
}
else
{
d = Double.longBitsToDouble(bits);
System.out.println("Using doubleToLongBits/bits++");
}
System.out.println("bits before: " + bitsBefore);
System.out.println("bits after: " + bitsAfter);
System.out.println("bits diff: " + bitsDiff);
System.out.println("bits Min value: " + bitsMinValue);
System.out.println("bits Small value: " + bitsSmallValue);
OUTPUT:
Double.doubleToLongBits(Double.longBitsToDouble(0x1L)) is same as Double.doubleToLongBits(Double.MIN_VALUE)
bits++ increments the same amount as Double.MIN_VALUE
Using doubleToLongBits/bits++
bits before: 4614256656636814345
bits after: 4614256656636814346
bits diff: 1
bits Min value: 1
bits Small value: 1
Okay, let's imagine it this way, sticking with decimal numbers. Suppose you have a floating decimal point type which allows you to represent 5 decimal digits, and a number between 0 and 3 for the exponent, to multiply the result by 1, 10, 100 or 1000.
So the smallest non-zero value is just 1 (i.e. mantissa=00001, exponent=0). The largest value is 99999000 (mantissa=99999, exponent=3).
Now, what happens when you add 1 to 50000000? You can't represent 50000001... the next representable number after 50000000 is 50001000. So if you try to add them together, the result is just going to be the closest value to the "true" result - which is still 50000000. That's like adding Double.MIN_VALUE to a large double.
My version (converting to bits, incrementing and then converting back) is like taking that 50000000, splitting into mantissa and exponent (m=50000, e=3) then incrementing it the smallest amount, to (m=50001, e=3) and then reassembling to 50001000.
Do you see how they're different?
Now here's a concrete example:
public class Test{
public static void main(String[] args) {
double before = 100000000000000d;
double after = before + Double.MIN_VALUE;
System.out.println(before == after);
long bits = Double.doubleToLongBits(before);
bits++;
double afterBits = Double.longBitsToDouble(bits);
System.out.println(before == afterBits);
System.out.println(afterBits - before);
}
}
This tries both approaches with a large number. The output is:
true
false
0.015625
Going through the output, that means:
Adding Double.MIN_VALUE didn't have any effect
Incrementing the bit did have an effect
The difference between afterBits and before is 0.015625, which is much bigger than Double.MIN_VALUE. No wonder the simple addition had no effect!
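As a side note, the standard library has helpers for exactly these two quantities. In this sketch (class and helper names are mine), Math.nextUp gives the same result as the bit increment for a positive finite value, and Math.ulp reports the spacing at that magnitude.

```java
class NextUpDemo {
    // The bit-increment trick from the example above; for positive finite
    // doubles it yields the next larger double.
    static double bitsPlusOne(double value) {
        return Double.longBitsToDouble(Double.doubleToLongBits(value) + 1);
    }

    public static void main(String[] args) {
        double before = 100000000000000d;
        System.out.println(Math.nextUp(before) == bitsPlusOne(before));
        // Spacing between adjacent doubles at this magnitude.
        System.out.println(Math.ulp(before));
        // Double.MIN_VALUE is far smaller than that spacing, so adding it is a no-op.
        System.out.println(before + Double.MIN_VALUE == before);
    }
}
```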
It's exactly as Jon said:
"If you try to add a very little
number to a very big number, the
difference may well be so small that
the closest result is the same as the
original."
For example:
// True:
(Double.MAX_VALUE + Double.MIN_VALUE) == Double.MAX_VALUE
// False:
Double.longBitsToDouble(Double.doubleToLongBits(Double.MAX_VALUE) + 1) == Double.MAX_VALUE
MIN_VALUE is the smallest representable positive double, but that certainly does not imply that adding it to an arbitrary double results in an unequal one.
In contrast, adding 1 to the underlying bits results in a new bit pattern, and thus does result in an unequal double.
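To round this off, here is a hedged sketch of the edge cases on both sides. Adding Double.MIN_VALUE does change values very close to zero, where the spacing between doubles is MIN_VALUE itself, and the bit increment has surprises of its own at MAX_VALUE and for negative numbers. Class and helper names are invented for the example.

```java
class MinValueEdgeCases {
    static double bitsPlusOne(double value) {
        return Double.longBitsToDouble(Double.doubleToLongBits(value) + 1);
    }

    // True when adding Double.MIN_VALUE produces a different double.
    static boolean minValueChanges(double value) {
        return value + Double.MIN_VALUE != value;
    }

    public static void main(String[] args) {
        System.out.println(minValueChanges(0.0));               // spacing here is MIN_VALUE
        System.out.println(minValueChanges(Double.MIN_NORMAL)); // still MIN_VALUE spacing
        System.out.println(minValueChanges(1.0));               // spacing is far larger
        // The bit pattern after MAX_VALUE is positive infinity.
        System.out.println(bitsPlusOne(Double.MAX_VALUE));
        // For negative values the increment moves away from zero.
        System.out.println(bitsPlusOne(-1.0) < -1.0);
    }
}
```

So the two operations agree only for values near zero; everywhere else the bit increment (or Math.nextUp) is the one that reliably moves to a neighbouring value.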