Floating point addition - giving strange result..! - java

When executing the following code:
public class FPoint {
public static void main(String[] args) {
float f = 0.1f;
for(int i = 0; i<9; i++) {
f += 0.1f;
}
System.out.println(f);
}
}
The following output is displayed:
1.0000001
But output should be 1.0000000, right? Correct me if I'm wrong..!!

0.1 is not really "0.1" under the IEEE 754 standard.
0.1 is encoded as: 0 01111011 10011001100110011001101 (as a float)
0 is the sign (= positive)
01111011 the exponent (= 123 -> 123 - 127 = -4 (127 is the bias in IEEE 754))
10011001100110011001101 the mantissa
To convert the mantissa to a decimal number we have
1.10011001100110011001101*2^-4(base2) [the 1.xxx is implicit in IEEE 754]
= 0.000110011001100110011001101(base2)
= 1/2^4 + 1/2^5 + 1/2^8 + 1/2^9 + 1/2^12 + 1/2^13 + 1/2^16 + 1/2^17 + 1/2^20 + 1/2^21 + 1/2^24 + 1/2^25 + 1/2^27 (base10)
= 1/16 + 1/32 + 1/256 + 1/512 + 1/4096 + 1/8192 + 1/65536 + 1/131072 ...(base10)
= 0.10000000149011612(base10)
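The decoding above can be checked from Java itself. The following is only a minimal sketch (the class name FloatBits and its helper methods are made up for illustration): it pulls the exponent and mantissa fields out of 0.1f with Float.floatToIntBits and prints the exact stored value through BigDecimal.

```java
import java.math.BigDecimal;

class FloatBits {
    // Raw 8-bit exponent field, still biased by 127.
    static int rawExponent(float f) {
        return (Float.floatToIntBits(f) >>> 23) & 0xFF;
    }

    // 23-bit mantissa field, without the implicit leading 1.
    static int mantissa(float f) {
        return Float.floatToIntBits(f) & 0x7FFFFF;
    }

    // Exact decimal expansion of the binary value that is actually stored.
    static String exactValue(float f) {
        return new BigDecimal(f).toPlainString();
    }

    public static void main(String[] args) {
        float f = 0.1f;
        System.out.println("sign     = " + (Float.floatToIntBits(f) >>> 31));
        System.out.println("exponent = " + rawExponent(f) + " - 127 = " + (rawExponent(f) - 127));
        System.out.println("mantissa = " + Integer.toBinaryString(mantissa(f)));
        System.out.println("exact    = " + exactValue(f));
    }
}
```

The exact value has more digits than the 0.10000000149011612 shown above because that figure is itself a rounded double.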

Just as certain numbers such as 1/3 cannot be expressed exactly in our decimal system, certain numbers such as 1/10 cannot be expressed exactly in binary.
1/3=(decimal)0.33333333333 (3) recurring
1/10=(binary)0.00011001100 (1100) recurring
Because we are so familiar with decimal it seems obvious that 1/3 cannot be exactly represented, but to someone with a base 3 number system this would seem like a major limitation to decimal;
1/3=(base 3)0.1
As such 1/10 is inexactly represented within the float and by adding multiple inexact numbers together you get an inexact answer.
It is within this context that you should interpret floating-point errors. If you have a number that is exactly representable in decimal but not in binary, you may find BigDecimal useful. But you should not consider BigDecimal to be better than floating-point numbers; it just has a different set of numbers it can and can't represent exactly: the set you're used to. BigDecimal also emulates decimal arithmetic on a binary processor, so its calculations are less efficient than double/float-based calculations.
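To make the comparison concrete, here is a small sketch (the class name TenTenths is hypothetical) that repeats the loop from the question once with float and once with BigDecimal built from the string "0.1". It is meant as an illustration of the point above, not as a recommendation to always switch types.

```java
import java.math.BigDecimal;

class TenTenths {
    // Same loop as in the question: start at 0.1f and add 0.1f nine times.
    static float sumFloat() {
        float f = 0.1f;
        for (int i = 0; i < 9; i++) {
            f += 0.1f;
        }
        return f;
    }

    // Same loop in decimal arithmetic; "0.1" is exactly representable here.
    static BigDecimal sumDecimal() {
        BigDecimal tenth = new BigDecimal("0.1");
        BigDecimal total = tenth;
        for (int i = 0; i < 9; i++) {
            total = total.add(tenth);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumFloat());   // 1.0000001, as in the question
        System.out.println(sumDecimal()); // 1.0
    }
}
```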

If you need exact computation, such as when you work with very sensitive data, don't use float or double: their binary representation is approximate. Check this article for some nice explanations.

It is a common misunderstanding. These "errors" come from the way floating-point values are stored: read this. You can use a double in this case to get more precision.

Related

Math in Java when precision is lost

The below algorithm works to identify a factor of a small number but fails completely when using a large one such as 7534534523.0
double result = 7; // 7534534523.0;
double divisor = 1;
for (int i = 2; i < result; i++){
double r = result / (double)i;
if (Math.floor(r) == r){
divisor = i;
break;
}
}
System.out.println(result + "/" + divisor + "=" + (result/divisor));
The number 7534534523.0 divided by 2 on a calculator can give a decimal part or round it (losing the 0.5). How can I perform such a check on large numbers? Do I have to use BigDecimal for this? Or is there another way?
If your goal is to represent a number with exactly n significant figures to the right of the decimal, BigDecimal is the class to use.
Immutable, arbitrary-precision signed decimal numbers. A BigDecimal consists of an arbitrary precision integer unscaled value and a 32-bit integer scale. If zero or positive, the scale is the number of digits to the right of the decimal point. If negative, the unscaled value of the number is multiplied by ten to the power of the negation of the scale. The value of the number represented by the BigDecimal is therefore (unscaledValue × 10^-scale).
Additionally, you can have a better control over scale manipulation, rounding and format conversion.
I don't see what the problem is in your code. It works exactly like it should.
When I run your code I get this output:
7.534534523E9/77359.0=97397.0
That may have confused you, but it's perfectly fine. It's just using scientific notation, and there is nothing wrong with that.
7.534534523E9 = 7.534534523 * 10^9 = 7,534,534,523
If you want to see it in normal notation, you can use System.out.format to print the result:
System.out.format("%.0f/%.0f=%.0f\n", result, divisor, result / divisor);
Shows:
7534534523/77359=97397
But you don't need double or BigDecimal to check if a number is divisible by another number. You can use the modulo operator on integral types to check if one number is divisible by another. As long as your numbers fit in a long, this works, otherwise you can move on to a BigInteger:
long result = 7534534523L;
long divisor = 1;
for (long i = 2; i < result; i++) { // long counter, so it cannot overflow before reaching result
if (result % i == 0) {
divisor = i;
break;
}
}
System.out.println(result + "/" + divisor + "=" + (result / divisor));
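For numbers that no longer fit in a long, the same divisibility check can be written with BigInteger. This is only a sketch under a few assumptions: the class and method names are invented, the input is at least 2, and plain trial division is fast enough (it stops at the square root, which is fine for the number in the question but not for very large inputs).

```java
import java.math.BigInteger;

class SmallestFactor {
    // Trial division: returns the smallest divisor greater than 1,
    // or n itself when n is prime. Assumes n >= 2.
    static BigInteger smallestFactor(BigInteger n) {
        for (BigInteger d = BigInteger.valueOf(2);
             d.multiply(d).compareTo(n) <= 0;
             d = d.add(BigInteger.ONE)) {
            if (n.mod(d).signum() == 0) {
                return d;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        BigInteger result = new BigInteger("7534534523");
        BigInteger divisor = smallestFactor(result);
        System.out.println(result + "/" + divisor + "=" + result.divide(divisor));
    }
}
```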
BigDecimal is the way to move ahead for preserving high precision in numbers.
Do NOT use the constructor BigDecimal(double val): it captures the exact binary value of the double, so the result is often not the decimal number you wrote. The Javadoc says as much:
The results of this constructor can be somewhat unpredictable. One might assume that writing new BigDecimal(0.1) in Java creates a BigDecimal which is exactly equal to 0.1 (an unscaled value of 1, with a scale of 1), but it is actually equal to 0.1000000000000000055511151231257827021181583404541015625. This is because 0.1 cannot be represented exactly as a double (or, for that matter, as a binary fraction of any finite length). Thus, the value that is being passed in to the constructor is not exactly equal to 0.1, appearances notwithstanding.
ALWAYS try to use the constructor BigDecimal(String val), as it preserves precision and gives the same output each time.
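The difference between the constructors is easy to see side by side. A minimal sketch (the class and method names are made up for illustration); BigDecimal.valueOf(double) is included because it goes through Double.toString and therefore also yields 0.1.

```java
import java.math.BigDecimal;

class BigDecimalCtor {
    // Exact binary value of the double literal 0.1.
    static BigDecimal viaDouble() {
        return new BigDecimal(0.1);
    }

    // Exactly the decimal number 0.1.
    static BigDecimal viaString() {
        return new BigDecimal("0.1");
    }

    // Uses the canonical string form of the double, so also exactly 0.1.
    static BigDecimal viaValueOf() {
        return BigDecimal.valueOf(0.1);
    }

    public static void main(String[] args) {
        System.out.println(viaDouble());
        System.out.println(viaString());
        System.out.println(viaValueOf());
    }
}
```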

double inaccuracy [duplicate]

public class doublePrecision {
public static void main(String[] args) {
double total = 0;
total += 5.6;
total += 5.8;
System.out.println(total);
}
}
The above code prints:
11.399999999999
How would I get this to just print (or be able to use it as) 11.4?
As others have mentioned, you'll probably want to use the BigDecimal class, if you want to have an exact representation of 11.4.
Now, a little explanation into why this is happening:
The float and double primitive types in Java are floating point numbers, where the number is stored as a binary representation of a fraction and an exponent.
More specifically, a double-precision floating point value such as the double type is a 64-bit value, where:
1 bit denotes the sign (positive or negative).
11 bits for the exponent.
52 bits for the significant digits (the fractional part as a binary).
These parts are combined to produce a double representation of a value.
(Source: Wikipedia: Double precision)
For a detailed description of how floating point values are handled in Java, see the Section 4.2.3: Floating-Point Types, Formats, and Values of the Java Language Specification.
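The three fields listed above can be extracted with Double.doubleToLongBits. The sketch below uses invented names (DoubleFields and its helpers) and simply masks out each field.

```java
class DoubleFields {
    // 1 sign bit (0 = positive, 1 = negative).
    static long sign(double d) {
        return Double.doubleToLongBits(d) >>> 63;
    }

    // 11 exponent bits, still biased by 1023.
    static long exponent(double d) {
        return (Double.doubleToLongBits(d) >>> 52) & 0x7FF;
    }

    // 52 fraction bits (the implicit leading 1 is not stored).
    static long fraction(double d) {
        return Double.doubleToLongBits(d) & 0xFFFFFFFFFFFFFL;
    }

    public static void main(String[] args) {
        double d = 11.4;
        System.out.println("sign     = " + sign(d));
        System.out.println("exponent = " + exponent(d) + " (unbiased " + (exponent(d) - 1023) + ")");
        System.out.println("fraction = 0x" + Long.toHexString(fraction(d)));
    }
}
```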
The byte, char, int, long types are fixed-point numbers, which are exact representations of numbers. Unlike fixed-point numbers, floating-point numbers will sometimes (safe to assume "most of the time") not be able to return an exact representation of a number. This is the reason why you end up with 11.399999999999 as the result of 5.6 + 5.8.
When requiring a value that is exact, such as 1.5 or 150.1005, you'll want to use one of the fixed-point types, which will be able to represent the number exactly.
As has been mentioned several times already, Java has a BigDecimal class which will handle very large numbers and very small numbers.
From the Java API Reference for the BigDecimal class:
Immutable, arbitrary-precision signed decimal numbers. A BigDecimal consists of an arbitrary precision integer unscaled value and a 32-bit integer scale. If zero or positive, the scale is the number of digits to the right of the decimal point. If negative, the unscaled value of the number is multiplied by ten to the power of the negation of the scale. The value of the number represented by the BigDecimal is therefore (unscaledValue × 10^-scale).
There have been many questions on Stack Overflow relating to floating point numbers and their precision. Here is a list of related questions that may be of interest:
Why do I see a double variable initialized to some value like 21.4 as 21.399999618530273?
How to print really big numbers in C++
How is floating point stored? When does it matter?
Use Float or Decimal for Accounting Application Dollar Amount?
If you really want to get down to the nitty gritty details of floating point numbers, take a look at What Every Computer Scientist Should Know About Floating-Point Arithmetic.
When you input a double number, for example, 33.33333333333333, the value you get is actually the closest representable double-precision value, which is exactly:
33.3333333333333285963817615993320941925048828125
Dividing that by 100 gives:
0.333333333333333285963817615993320941925048828125
which also isn't representable as a double-precision number, so again it is rounded to the nearest representable value, which is exactly:
0.3333333333333332593184650249895639717578887939453125
When you print this value out, it gets rounded yet again to 17 decimal digits, giving:
0.33333333333333326
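The chain of roundings described above can be reproduced in Java. This is a sketch with an invented class name; it relies on the fact that new BigDecimal(double) converts exactly, so toPlainString shows every digit of the stored value.

```java
import java.math.BigDecimal;

class ExactChain {
    // Full decimal expansion of the binary value held in a double.
    static String exact(double d) {
        return new BigDecimal(d).toPlainString();
    }

    public static void main(String[] args) {
        double x = 33.33333333333333; // rounded once when the literal is parsed
        double q = x / 100;           // rounded again after the division
        System.out.println(exact(x));
        System.out.println(exact(q));
        System.out.println(q);        // shortest string that still identifies q
    }
}
```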
If you just want to process values as fractions, you can create a Fraction class which holds a numerator and denominator field.
Write methods for add, subtract, multiply and divide as well as a toDouble method. This way you can avoid floats during calculations.
EDIT: Quick implementation,
public class Fraction {
    private int numerator;
    private int denominator;

    public Fraction(int n, int d) {
        numerator = n;
        denominator = d;
    }

    public double toDouble() {
        return ((double) numerator) / ((double) denominator);
    }

    public static Fraction add(Fraction a, Fraction b) {
        if (a.denominator != b.denominator) {
            // Cross-multiply onto a common denominator; stay in int so no floats are involved.
            int aTop = b.denominator * a.numerator;
            int bTop = a.denominator * b.numerator;
            return new Fraction(aTop + bTop, a.denominator * b.denominator);
        } else {
            return new Fraction(a.numerator + b.numerator, a.denominator);
        }
    }

    public static Fraction divide(Fraction a, Fraction b) {
        return new Fraction(a.numerator * b.denominator, a.denominator * b.numerator);
    }

    public static Fraction multiply(Fraction a, Fraction b) {
        return new Fraction(a.numerator * b.numerator, a.denominator * b.denominator);
    }

    public static Fraction subtract(Fraction a, Fraction b) {
        if (a.denominator != b.denominator) {
            int aTop = b.denominator * a.numerator;
            int bTop = a.denominator * b.numerator;
            return new Fraction(aTop - bTop, a.denominator * b.denominator);
        } else {
            return new Fraction(a.numerator - b.numerator, a.denominator);
        }
    }
}
Observe that you'd have the same problem if you used limited-precision decimal arithmetic, and wanted to deal with 1/3: 0.333333333 * 3 is 0.999999999, not 1.00000000.
Unfortunately, 5.6, 5.8 and 11.4 just aren't round numbers in binary, because they involve fifths. So the float representation of them isn't exact, just as 0.3333 isn't exactly 1/3.
If all the numbers you use are non-recurring decimals, and you want exact results, use BigDecimal. Or as others have said, if your values are like money in the sense that they're all a multiple of 0.01, or 0.001, or something, then multiply everything by a fixed power of 10 and use int or long (addition and subtraction are trivial: watch out for multiplication).
However, if you are happy with binary for the calculation, but you just want to print things out in a slightly friendlier format, try java.util.Formatter or String.format. In the format string specify a precision less than the full precision of a double. To 10 significant figures, say, 11.399999999999 is 11.4, so the result will be almost as accurate and more human-readable in cases where the binary result is very close to a value requiring only a few decimal places.
The precision to specify depends a bit on how much maths you've done with your numbers - in general the more you do, the more error will accumulate, but some algorithms accumulate it much faster than others (they're called "unstable" as opposed to "stable" with respect to rounding errors). If all you're doing is adding a few values, then I'd guess that dropping just one decimal place of precision will sort things out. Experiment.
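As a rough illustration of printing with fewer digits than a double carries, the sketch below formats to 10 significant figures with the %g conversion. The class and method names are invented, and Locale.ROOT is an assumption made so that the decimal separator is always a dot.

```java
import java.util.Locale;

class FriendlyFormat {
    // 10 significant digits; fewer than the 15 to 17 a double can carry.
    static String tenDigits(double d) {
        return String.format(Locale.ROOT, "%.10g", d);
    }

    public static void main(String[] args) {
        double total = 5.6 + 5.8;
        System.out.println(total);            // the raw binary result
        System.out.println(tenDigits(total)); // 11.40000000
    }
}
```

Parsing the formatted string back with Double.parseDouble gives the double nearest to 11.4, if the cleaned-up value is needed for further work.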
You may want to look into using Java's java.math.BigDecimal class if you really need precise math. Here is a good article from Oracle/Sun on the case for BigDecimal. While you can never represent 1/3 exactly, as someone mentioned, you can decide exactly how precise you want the result to be. setScale() is your friend. :)
Ok, because I have way too much time on my hands at the moment here is a code example that relates to your question:
import java.math.BigDecimal;
/**
* Created by a wonderful programmer known as:
* Vincent Stoessel
* xaymaca#gmail.com
* on Mar 17, 2010 at 11:05:16 PM
*/
public class BigUp {
public static void main(String[] args) {
BigDecimal first, second, result ;
first = new BigDecimal("33.33333333333333") ;
second = new BigDecimal("100") ;
result = first.divide(second);
System.out.println("result is " + result);
//will print : result is 0.3333333333333333
}
}
and to plug my new favorite language, Groovy, here is a neater example of the same thing:
import java.math.BigDecimal
def first = new BigDecimal("33.33333333333333")
def second = new BigDecimal("100")
println "result is " + first/second // will print: result is 0.33333333333333
Pretty sure you could've made that into a three line example. :)
If you want exact precision, use BigDecimal. Otherwise, you can use ints multiplied by 10 ^ whatever precision you want.
As others have noted, not all decimal values can be represented as binary since decimal is based on powers of 10 and binary is based on powers of two.
If precision matters, use BigDecimal, but if you just want friendly output:
System.out.printf("%.2f\n", total);
Will give you:
11.40
You're running up against the precision limitation of type double.
The java.math package has some arbitrary-precision arithmetic facilities.
You can't, because 7.3 doesn't have a finite representation in binary. The closest you can get is 2054767329987789/2**48 = 7.3+1/1407374883553280.
Take a look at http://docs.python.org/tutorial/floatingpoint.html for a further explanation. (It's on the Python website, but Java and C++ have the same "problem".)
The solution depends on what exactly your problem is:
If it's that you just don't like seeing all those noise digits, then fix your string formatting. Don't display more than 15 significant digits (or 7 for float).
If it's that the inexactness of your numbers is breaking things like "if" statements, then you should write if (abs(x - 7.3) < TOLERANCE) instead of if (x == 7.3).
If you're working with money, then what you probably really want is decimal fixed point. Store an integer number of cents or whatever the smallest unit of your currency is.
(VERY UNLIKELY) If you need more than 53 significant bits (15-16 significant digits) of precision, then use a high-precision floating-point type, like BigDecimal.
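The tolerance comparison from the second point might look like this in Java. It is a sketch only: the class name is invented and the TOLERANCE value of 1e-9 is an arbitrary assumption that should be chosen to suit the magnitude of your data.

```java
class NearlyEqual {
    // Assumed tolerance; pick a value that fits the scale of your numbers.
    static final double TOLERANCE = 1e-9;

    static boolean nearlyEqual(double a, double b) {
        return Math.abs(a - b) < TOLERANCE;
    }

    public static void main(String[] args) {
        double x = 0.1 + 0.2;
        System.out.println(x == 0.3);            // false: the sum is off by one ulp
        System.out.println(nearlyEqual(x, 0.3)); // true
    }
}
```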
private void getRound() {
    // this is very simple and interesting
    double a = 5, b = 3, c;
    c = a / b;
    System.out.println(" round val is " + c);
    // round val is : 1.6666666666666667
    // if you want only two decimal places with a double, you
    // can use String.format, which takes two parameters:
    // a format specifier giving the number of decimal places,
    // and the double value
    String s = String.format("%.2f", c);
    double val = Double.parseDouble(s);
    System.out.println(" val is :" + val);
    // now the output will be : val is :1.67
}
Use java.math.BigDecimal
Doubles are binary fractions internally, so they sometimes cannot represent decimal fractions to the exact decimal.
/*
0.8 1.2
0.7 1.3
0.7000000000000002 2.3
0.7999999999999998 4.2
*/
double adjust = fToInt + 1.0 - orgV;
// The following two lines works for me.
String s = String.format("%.2f", adjust);
double val = Double.parseDouble(s);
System.out.println(val); // output: 0.8, 0.7, 0.7, 0.8
Doubles are approximations of the decimal numbers in your Java source. You're seeing the consequence of the mismatch between the double (which is a binary-coded value) and your source (which is decimal-coded).
Java's producing the closest binary approximation. You can use the java.text.DecimalFormat to display a better-looking decimal value.
Short answer: Always use BigDecimal and make sure you are using the constructor with String argument, not the double one.
Back to your example, the following code will print 11.4, as you wish.
import java.math.BigDecimal;

public class doublePrecision {
    public static void main(String[] args) {
        BigDecimal total = new BigDecimal("0");
        total = total.add(new BigDecimal("5.6"));
        total = total.add(new BigDecimal("5.8"));
        System.out.println(total);
    }
}
Multiply everything by 100 and store it in a long as cents.
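A minimal sketch of the cents idea, assuming non-negative amounts and two decimal places (the class and method names are made up, and Locale.ROOT is used only to keep the digits and layout predictable):

```java
import java.util.Locale;

class Cents {
    // Renders a non-negative amount of cents as units.cents, e.g. 1140 -> 11.40.
    static String format(long cents) {
        return String.format(Locale.ROOT, "%d.%02d", cents / 100, cents % 100);
    }

    public static void main(String[] args) {
        long total = 0;
        total += 560; // 5.60
        total += 580; // 5.80
        System.out.println(format(total)); // 11.40
    }
}
```

Addition and subtraction are exact in this form; multiplication and division still need an explicit rounding decision.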
Computers store numbers in binary and can't actually represent most decimal fractions, such as 33.333333333, exactly (100.0 itself is fine). This is one of the tricky things about using doubles. You will just have to round the answer before showing it to a user. Luckily, in most applications you don't need that many decimal places anyhow.
Floating point numbers differ from real numbers in that for any given floating point number there is a next higher floating point number. Same as integers. There's no integer between 1 and 2.
There's no way to represent 1/3 as a float. There's a float below it and there's a float above it, and there's a certain distance between them. And 1/3 is in that space.
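Java exposes these neighbours directly through Math.nextUp and Math.nextDown. The sketch below (class and helper names are invented) shows the two floats that bracket 1/3 by multiplying their exact values by 3 and comparing against 1.

```java
import java.math.BigDecimal;

class Neighbours {
    static final BigDecimal THREE = new BigDecimal(3);

    // Exact value of 3 * f, computed in decimal so nothing is rounded.
    static BigDecimal tripled(float f) {
        return new BigDecimal(f).multiply(THREE);
    }

    public static void main(String[] args) {
        float above = 1f / 3f;                // nearest float to 1/3, slightly too large
        float below = Math.nextDown(above);   // the next float down, slightly too small
        System.out.println(new BigDecimal(below).toPlainString());
        System.out.println(new BigDecimal(above).toPlainString());
        System.out.println("gap = " + Math.ulp(above));
    }
}
```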
Apfloat for Java claims to work with arbitrary precision floating point numbers, but I've never used it. Probably worth a look.
http://www.apfloat.org/apfloat_java/
A similar question was asked here before
Java floating point high precision library
Use a BigDecimal. It even lets you specify rounding rules (like ROUND_HALF_EVEN, which will minimize statistical error by rounding to the even neighbor if both are the same distance; i.e. both 1.5 and 2.5 round to 2).
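A short sketch of that rounding rule, using the RoundingMode.HALF_EVEN enum constant (the enum counterpart of the ROUND_HALF_EVEN int constant mentioned above). The class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class HalfEven {
    // Rounds a decimal string to a whole number, ties going to the even neighbour.
    static BigDecimal toInteger(String value) {
        return new BigDecimal(value).setScale(0, RoundingMode.HALF_EVEN);
    }

    public static void main(String[] args) {
        System.out.println(toInteger("1.5")); // 2
        System.out.println(toInteger("2.5")); // 2
        System.out.println(toInteger("3.5")); // 4
    }
}
```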
Why not use the round() method from Math class?
// The number of 0s determines how many digits you want after the floating point
// (here one digit)
total = (double)Math.round(total * 10) / 10;
System.out.println(total); // prints 11.4
Check out BigDecimal, it handles problems dealing with floating point arithmetic like that.
The new call would look like this:
term[number].coefficient.add(co);
Use setScale() to set the number of decimal place precision to be used.
If you have no choice other than using double values, you can use the code below.
public static double sumDouble(double value1, double value2) {
double sum = 0.0;
String value1Str = Double.toString(value1);
int decimalIndex = value1Str.indexOf(".");
int value1Precision = 0;
if (decimalIndex != -1) {
value1Precision = (value1Str.length() - 1) - decimalIndex;
}
String value2Str = Double.toString(value2);
decimalIndex = value2Str.indexOf(".");
int value2Precision = 0;
if (decimalIndex != -1) {
value2Precision = (value2Str.length() - 1) - decimalIndex;
}
int maxPrecision = value1Precision > value2Precision ? value1Precision : value2Precision;
sum = value1 + value2;
String s = String.format("%." + maxPrecision + "f", sum);
sum = Double.parseDouble(s);
return sum;
}
You can do the following:
System.out.println(String.format("%.12f", total));
Change the number in %.12f to control how many decimal places are printed.
As I understand it, the main goal is to get the "correct" double back from an approximate one.
Here is my solution. It rounds the last digit, working from a budget of 16 digits in total: it subtracts the digits before the decimal point and keeps as many digits after it as possible. I hope that is enough precision for most cases:
public static double roundError(double value) {
BigDecimal valueBigDecimal = new BigDecimal(Double.toString(value));
String valueString = valueBigDecimal.toPlainString();
if (!valueString.contains(".")) return value;
String[] valueArray = valueString.split("[.]");
int places = 16;
places -= valueArray[0].length();
if ("56789".contains("" + valueArray[0].charAt(valueArray[0].length() - 1))) places--;
//System.out.println("Rounding " + value + "(" + valueString + ") to " + places + " places");
return valueBigDecimal.setScale(places, RoundingMode.HALF_UP).doubleValue();
}
I know it is long code, sure not best, maybe someone can fix it to be more elegant. Anyway it is working, see examples:
roundError(5.6+5.8) = 11.399999999999999 = 11.4
roundError(0.4-0.3) = 0.10000000000000003 = 0.1
roundError(37235.137567000005) = 37235.137567
roundError(1/3) 0.3333333333333333 = 0.333333333333333
roundError(3723513756.7000005) = 3.7235137567E9 (3723513756.7)
roundError(3723513756123.7000005) = 3.7235137561237E12 (3723513756123.7)
roundError(372351375612.7000005) = 3.723513756127E11 (372351375612.7)
roundError(1.7976931348623157) = 1.797693134862316
Do not waste your effort using BigDecimal. In 99.99999% of cases you don't need it. Java's double type is of course approximate, but in almost all cases it is sufficiently precise. Mind that you have an error at the 14th significant digit. This is really negligible!
To get nice output use:
System.out.printf("%.2f\n", total);

why program can exactly display infinite repeating floating point number in java or other language

Take a decimal number like 0.1: in binary it is 0.00011001100110011..., an infinitely repeating number.
When I write code like this:
float f = 0.1f;
the program rounds it to the binary value 0 01111011 1001 1001 1001 1001 1001 101, which is not the original number 0.1.
But when I print the variable like this:
System.out.print(f);
I get the original number 0.1 rather than 0.100000001 or some other number. I think the program can't represent "0.1" exactly, yet it can display "0.1" exactly. How does it do that?
I tried to recover the decimal number by adding up each bit of the binary value, and the result looks weird.
float f = (float) (Math.pow(2, -4) + Math.pow(2, -5) + Math.pow(2, -8) + Math.pow(2, -9) + Math.pow(2, -12) + Math.pow(2, -13) + Math.pow(2, -16) + Math.pow(2, -17) + Math.pow(2, -20) + Math.pow(2, -21) + Math.pow(2, -24) + Math.pow(2, -25));
float f2 = (float) Math.pow(2, -27);
System.out.println(f);
System.out.println(f2);
System.out.println(f + f2);
Output:
0.099999994
7.4505806E-9
0.1
Mathematically, f + f2 = 0.100000001490116..., which is not equal to 0.1. Why doesn't the program print a result like 0.100000001? I think that would be more accurate.
Java's System.out.print prints just enough decimals that the resulting representation, if parsed as a double or float, converts to the original double or float value.
This is a good idea because it means that in a sense, no information is lost in this kind of conversion to decimal. On the other hand, it can give an impression of exactness which, as you make clear in your question, is wrong.
In other languages, you can print the exact decimal representation of the float or double being considered:
#include <stdio.h>
int main(){
printf("%.60f", 0.1);
}
result: 0.100000000000000005551115123125782702118158340454101562500000
In Java, in order to emulate the above behavior, you need to convert the float or double to BigDecimal (this conversion is exact) and then print the BigDecimal with enough digits. Java's attitude to floating-point-to-string-representing-a-decimal conversion is pervasive, so that even System.out.format is affected. The linked Java program, the important line of which is System.out.format("%.60f\n", 0.1);, shows 0.100000000000000000000000000000000000000000000000000000000000, although the value of 0.1d is not 0.10000000000000000000…, and a Java programmer could have been excused for expecting the same output as the C program.
To convert a double to a string that represents the exact value of the double, consider the hexadecimal format, that Java supports for literals and for printing.
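The three views discussed in this answer can be put next to each other: the short string Java prints, the exact decimal value via BigDecimal, and the hexadecimal form. A minimal sketch with invented class and method names:

```java
import java.math.BigDecimal;

class Printing {
    // What System.out.print shows: just enough digits to identify the float.
    static String shortest(float f) {
        return Float.toString(f);
    }

    // Every digit of the value that is actually stored.
    static String exact(float f) {
        return new BigDecimal(f).toPlainString();
    }

    // Exact and compact: hexadecimal significand with a binary exponent.
    static String hex(double d) {
        return Double.toHexString(d);
    }

    public static void main(String[] args) {
        float f = 0.1f;
        System.out.println(shortest(f));
        System.out.println(exact(f));
        System.out.println(Float.parseFloat(shortest(f)) == f); // the short form round-trips
        System.out.println(hex(0.1));
    }
}
```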
I believe this is covered by Double.toString(double) (and similarly in Float#toString(float)):
How many digits must be printed for the fractional part of m or a? There must be at least one digit to represent the fractional part, and beyond that as many, but only as many, more digits as are needed to uniquely distinguish the argument value from adjacent values of type double. That is, suppose that x is the exact mathematical value represented by the decimal representation produced by this method for a finite nonzero argument d. Then d must be the double value nearest to x; or if two double values are equally close to x, then d must be one of them and the least significant bit of the significand of d must be 0.
(my emphasis)

Why does changing the sum order returns a different result?

Why does changing the sum order returns a different result?
23.53 + 5.88 + 17.64 = 47.05
23.53 + 17.64 + 5.88 = 47.050000000000004
Both Java and JavaScript return the same results.
I understand that, due to the way floating point numbers are represented in binary, some rational numbers (like 1/3 - 0.333333...) cannot be represented precisely.
Why does simply changing the order of the elements affect the result?
Maybe this question is stupid, but why does simply changing the order of the elements affects the result?
It will change the points at which the values are rounded, based on their magnitude. As an example of the kind of thing that we're seeing, let's pretend that instead of binary floating point, we were using a decimal floating point type with 4 significant digits, where each addition is performed at "infinite" precision and then rounded to the nearest representable number. Here are two sums:
1/3 + 2/3 + 2/3 = (0.3333 + 0.6667) + 0.6667
= 1.000 + 0.6667 (no rounding needed!)
= 1.667 (where 1.6667 is rounded to 1.667)
2/3 + 2/3 + 1/3 = (0.6667 + 0.6667) + 0.3333
= 1.333 + 0.3333 (where 1.3334 is rounded to 1.333)
= 1.666 (where 1.6663 is rounded to 1.666)
We don't even need non-integers for this to be a problem:
10000 + 1 - 10000 = (10000 + 1) - 10000
= 10000 - 10000 (where 10001 is rounded to 10000)
= 0
10000 - 10000 + 1 = (10000 - 10000) + 1
= 0 + 1
= 1
This demonstrates possibly more clearly that the important part is that we have a limited number of significant digits - not a limited number of decimal places. If we could always keep the same number of decimal places, then with addition and subtraction at least, we'd be fine (so long as the values didn't overflow). The problem is that when you get to bigger numbers, smaller information is lost - the 10001 being rounded to 10000 in this case. (This is an example of the problem that Eric Lippert noted in his answer.)
It's important to note that the values on the first line of the right hand side are the same in all cases - so although it's important to understand that your decimal numbers (23.53, 5.88, 17.64) won't be represented exactly as double values, that's only a problem because of the problems shown above.
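The pretend 4-digit decimal type can be simulated with BigDecimal and a MathContext of precision 4, which rounds after every addition. This is a sketch for illustration only; the class and method names are made up, and the default HALF_UP rounding of MathContext is assumed.

```java
import java.math.BigDecimal;
import java.math.MathContext;

class FourDigits {
    // 4 significant digits, rounding HALF_UP after each operation.
    static final MathContext MC = new MathContext(4);

    // Computes (a + b) + c, rounding each intermediate result to 4 digits.
    static BigDecimal sum(String a, String b, String c) {
        return new BigDecimal(a).add(new BigDecimal(b), MC).add(new BigDecimal(c), MC);
    }

    public static void main(String[] args) {
        System.out.println(sum("0.3333", "0.6667", "0.6667")); // 1.667
        System.out.println(sum("0.6667", "0.6667", "0.3333")); // 1.666
        System.out.println(sum("10000", "1", "-10000").signum() == 0); // true: the 1 was lost
        System.out.println(sum("10000", "-10000", "1"));       // 1
    }
}
```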
Here's what's going on in binary. As we know, some floating-point values cannot be represented exactly in binary, even if they can be represented exactly in decimal. These 3 numbers are just examples of that fact.
With this program I output the hexadecimal representations of each number and the results of each addition.
public class Main{
public static void main(String args[]) {
double x = 23.53; // Inexact representation
double y = 5.88; // Inexact representation
double z = 17.64; // Inexact representation
double s = 47.05; // What math tells us the sum should be; still inexact
printValueAndInHex(x);
printValueAndInHex(y);
printValueAndInHex(z);
printValueAndInHex(s);
System.out.println("--------");
double t1 = x + y;
printValueAndInHex(t1);
t1 = t1 + z;
printValueAndInHex(t1);
System.out.println("--------");
double t2 = x + z;
printValueAndInHex(t2);
t2 = t2 + y;
printValueAndInHex(t2);
}
private static void printValueAndInHex(double d)
{
System.out.println(Long.toHexString(Double.doubleToLongBits(d)) + ": " + d);
}
}
The printValueAndInHex method is just a hex-printer helper.
The output is as follows:
403787ae147ae148: 23.53
4017851eb851eb85: 5.88
4031a3d70a3d70a4: 17.64
4047866666666666: 47.05
--------
403d68f5c28f5c29: 29.41
4047866666666666: 47.05
--------
404495c28f5c28f6: 41.17
4047866666666667: 47.050000000000004
The first 4 numbers are the hexadecimal representations of x, y, z, and s. In IEEE floating point representation, bits 2-12 represent the binary exponent, that is, the scale of the number. (The first bit is the sign bit, and the remaining bits are the mantissa.) The exponent represented is actually the binary number minus 1023.
The exponents for the first 4 numbers are extracted:
sign|exponent
403 => 0|100 0000 0011| => 1027 - 1023 = 4
401 => 0|100 0000 0001| => 1025 - 1023 = 2
403 => 0|100 0000 0011| => 1027 - 1023 = 4
404 => 0|100 0000 0100| => 1028 - 1023 = 5
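The same exponents can be read without decoding hex by hand, using Math.getExponent, which returns the unbiased exponent of a double. A small sketch (the class name is invented):

```java
class Exponents {
    // Unbiased binary exponent, i.e. the stored field minus 1023.
    static int exponent(double d) {
        return Math.getExponent(d);
    }

    public static void main(String[] args) {
        double x = 23.53, y = 5.88, z = 17.64, s = 47.05;
        System.out.println("x: " + exponent(x));
        System.out.println("y: " + exponent(y));
        System.out.println("z: " + exponent(z));
        System.out.println("s: " + exponent(s));
        System.out.println("x + y: " + exponent(x + y));
        System.out.println("x + z: " + exponent(x + z));
    }
}
```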
First set of additions
The second number (y) is of smaller magnitude. When adding these two numbers to get x + y, the last 2 bits of the second number (01) are shifted out of range and do not figure into the calculation.
The second addition adds x + y and z and adds two numbers of the same scale.
Second set of additions
Here, x + z occurs first. They are of the same scale, but they yield a number that is higher up in scale:
404 => 0|100 0000 0100| => 1028 - 1023 = 5
The second addition adds x + z and y, and now 3 bits are dropped from y to add the numbers (101). Here, there must be a round upwards, because the result is the next floating point number up: 4047866666666666 for the first set of additions vs. 4047866666666667 for the second set of additions. That error is significant enough to show in the printout of the total.
In conclusion, be careful when performing mathematical operations on IEEE numbers. Some representations are inexact, and they become even more inexact when the scales are different. Add and subtract numbers of similar scale if you can.
Jon's answer is of course correct. In your case the error is no larger than the error you would accumulate doing any simple floating point operation. You've got a scenario where in one case you get zero error and in another you get a tiny error; that's not actually that interesting a scenario. A good question is: are there scenarios where changing the order of calculations goes from a tiny error to a (relatively) enormous error? The answer is unambiguously yes.
Consider for example:
x1 = (a - b) + (c - d) + (e - f) + (g - h);
vs
x2 = (a + c + e + g) - (b + d + f + h);
vs
x3 = a - b + c - d + e - f + g - h;
Obviously in exact arithmetic they would be the same. It is entertaining to try to find values for a, b, c, d, e, f, g, h such that the values of x1 and x2 and x3 differ by a large quantity. See if you can do so!
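One possible set of values, offered as a sketch rather than as the intended answer: make a and b huge and equal, so that the small terms vanish whenever they are added to them first. The class name and the chosen values are assumptions for illustration.

```java
class Regrouping {
    static double x1(double a, double b, double c, double d,
                     double e, double f, double g, double h) {
        return (a - b) + (c - d) + (e - f) + (g - h);
    }

    static double x2(double a, double b, double c, double d,
                     double e, double f, double g, double h) {
        return (a + c + e + g) - (b + d + f + h);
    }

    static double x3(double a, double b, double c, double d,
                     double e, double f, double g, double h) {
        return a - b + c - d + e - f + g - h;
    }

    public static void main(String[] args) {
        // 1e20 + 1 rounds back to 1e20, so x2 loses c, e and g entirely.
        double a = 1e20, b = 1e20, c = 1, d = 0, e = 1, f = 0, g = 1, h = 0;
        System.out.println(x1(a, b, c, d, e, f, g, h)); // 3.0
        System.out.println(x2(a, b, c, d, e, f, g, h)); // 0.0
        System.out.println(x3(a, b, c, d, e, f, g, h)); // 3.0
    }
}
```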
This actually covers much more than just Java and Javascript, and would likely affect any programming language using floats or doubles.
In memory, floating points use a special format along the lines of IEEE 754 (the converter provides much better explanation than I can).
Anyways, here's the float converter.
http://www.h-schmidt.net/FloatConverter/
The thing about the order of operations is the "fineness" of the operation.
Your first line yields 29.41 from the first two values, which gives us 2^4 as the exponent.
Your second line yields 41.17 which gives us 2^5 as the exponent.
We're losing a significant figure by increasing the exponent, which is likely to change the outcome.
Try ticking the last bit on the far right on and off for 41.17 and you can see that something as "insignificant" as 1/2^23 of the exponent would be enough to cause this floating point difference.
Edit: For those of you who remember significant figures, this would fall under that category. 10^4 + 4999 with a significant figure of 1 is going to be 10^4. In this case, the significant figure is much smaller, but we can see the results with the .00000000004 attached to it.
Floating point numbers are represented using the IEEE 754 format, which provides a specific size of bits for the mantissa (significand). Unfortunately this gives you a specific number of 'fractional building blocks' to play with, and certain fractional values cannot be represented precisely.
What is happening in your case is that in the second case, the addition is probably running into some precision issue because of the order the additions are evaluated. I haven't calculated the values, but it could be for example that 23.53 + 17.64 cannot be precisely represented, while 23.53 + 5.88 can.
Unfortunately it is a known problem that you just have to deal with.
I believe it has to do with the order of evaluation. While the sum is naturally the same in a math world, in the binary world instead of A + B + C = D, it's
A + B = E
E + C = D(1)
So there's that secondary step where floating point numbers can get off.
When you change the order,
A + C = F
F + B = D(2)
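A small sketch of the two-step evaluation described above. The values 0.1, 0.2 and 0.3 are a commonly used illustration rather than the numbers from the question, and the class name SumOrder is an assumption.

```java
// Hypothetical demo: the same three values added in two different orders.
final class SumOrder {
    // Adds strictly left to right, so the intermediate (a + b) is rounded first.
    static double leftToRight(double a, double b, double c) {
        return (a + b) + c;
    }

    public static void main(String[] args) {
        System.out.println(leftToRight(0.1, 0.2, 0.3)); // 0.6000000000000001
        System.out.println(leftToRight(0.2, 0.3, 0.1)); // 0.6
    }
}
```

The intermediate results differ (0.30000000000000004 versus exactly 0.5), so the second rounding step starts from a different place.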
To add a different angle to the other answers here, this SO answer shows that there are ways of doing floating-point math where all summation orders return exactly the same value at the bit level.

Retain precision with double in Java

public class doublePrecision {
public static void main(String[] args) {
double total = 0;
total += 5.6;
total += 5.8;
System.out.println(total);
}
}
The above code prints:
11.399999999999
How would I get this to just print (or be able to use it as) 11.4?
As others have mentioned, you'll probably want to use the BigDecimal class, if you want to have an exact representation of 11.4.
Now, a little explanation into why this is happening:
The float and double primitive types in Java are floating point numbers, where the number is stored as a binary representation of a fraction and an exponent.
More specifically, a double-precision floating point value such as the double type is a 64-bit value, where:
1 bit denotes the sign (positive or negative).
11 bits for the exponent.
52 bits for the significant digits (the fractional part as a binary).
These parts are combined to produce a double representation of a value.
(Source: Wikipedia: Double precision)
For a detailed description of how floating point values are handled in Java, see the Section 4.2.3: Floating-Point Types, Formats, and Values of the Java Language Specification.
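To see the three fields of the layout described above, the raw bits can be pulled apart with Double.doubleToLongBits. This is a minimal sketch; the class name DoubleBits and its method names are assumptions made for the example.

```java
// Hypothetical helper that splits a double into sign, exponent and fraction fields.
final class DoubleBits {
    // Top bit: 0 for positive, 1 for negative.
    static int sign(double v) {
        return (int) (Double.doubleToLongBits(v) >>> 63);
    }

    // Next 11 bits: the exponent, stored with a bias of 1023.
    static int biasedExponent(double v) {
        return (int) ((Double.doubleToLongBits(v) >>> 52) & 0x7FF);
    }

    // Low 52 bits: the fractional part of the significand.
    static long fraction(double v) {
        return Double.doubleToLongBits(v) & 0xFFFFFFFFFFFFFL;
    }

    public static void main(String[] args) {
        double v = 11.4;
        System.out.println("sign     = " + sign(v));
        System.out.println("exponent = " + (biasedExponent(v) - 1023));
        System.out.println("fraction = " + Long.toBinaryString(fraction(v)));
    }
}
```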
The byte, char, int, long types are fixed-point numbers, which are exact representations of numbers. Unlike fixed point numbers, floating point numbers will sometimes (safe to assume "most of the time") not be able to return an exact representation of a number. This is the reason why you end up with 11.399999999999 as the result of 5.6 + 5.8.
When requiring a value that is exact, such as 1.5 or 150.1005, you'll want to use one of the fixed-point types, which will be able to represent the number exactly.
As has been mentioned several times already, Java has a BigDecimal class which will handle very large numbers and very small numbers.
From the Java API Reference for the BigDecimal class:
Immutable, arbitrary-precision signed decimal numbers. A BigDecimal consists of an arbitrary precision integer unscaled value and a 32-bit integer scale. If zero or positive, the scale is the number of digits to the right of the decimal point. If negative, the unscaled value of the number is multiplied by ten to the power of the negation of the scale. The value of the number represented by the BigDecimal is therefore (unscaledValue × 10^-scale).
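The unscaled value and scale from the quoted description can be inspected directly with unscaledValue() and scale(). A short sketch follows; the class name ScaleDemo and the describe helper are made up for illustration.

```java
import java.math.BigDecimal;

// Hypothetical demo of how BigDecimal stores a value as unscaledValue x 10^-scale.
final class ScaleDemo {
    static String describe(BigDecimal v) {
        return v.unscaledValue() + " x 10^-(" + v.scale() + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe(new BigDecimal("11.4")));   // 114 x 10^-(1)
        System.out.println(describe(new BigDecimal("1.5E+3"))); // 15 x 10^-(-2)
    }
}
```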
There have been many questions on Stack Overflow relating to floating point numbers and their precision. Here is a list of related questions that may be of interest:
Why do I see a double variable initialized to some value like 21.4 as 21.399999618530273?
How to print really big numbers in C++
How is floating point stored? When does it matter?
Use Float or Decimal for Accounting Application Dollar Amount?
If you really want to get down to the nitty gritty details of floating point numbers, take a look at What Every Computer Scientist Should Know About Floating-Point Arithmetic.
When you input a double number, for example, 33.33333333333333, the value you get is actually the closest representable double-precision value, which is exactly:
33.3333333333333285963817615993320941925048828125
Dividing that by 100 gives:
0.333333333333333285963817615993320941925048828125
which also isn't representable as a double-precision number, so again it is rounded to the nearest representable value, which is exactly:
0.3333333333333332593184650249895639717578887939453125
When you print this value out, it gets rounded yet again to 17 decimal digits, giving:
0.33333333333333326
If you just want to process values as fractions, you can create a Fraction class which holds a numerator and denominator field.
Write methods for add, subtract, multiply and divide as well as a toDouble method. This way you can avoid floats during calculations.
EDIT: Quick implementation,
public class Fraction {
private int numerator;
private int denominator;
public Fraction(int n, int d){
numerator = n;
denominator = d;
}
public double toDouble(){
return ((double)numerator)/((double)denominator);
}
public static Fraction add(Fraction a, Fraction b){
if(a.denominator != b.denominator){
int aTop = b.denominator * a.numerator;
int bTop = a.denominator * b.numerator;
return new Fraction(aTop + bTop, a.denominator * b.denominator);
}
else{
return new Fraction(a.numerator + b.numerator, a.denominator);
}
}
public static Fraction divide(Fraction a, Fraction b){
return new Fraction(a.numerator * b.denominator, a.denominator * b.numerator);
}
public static Fraction multiply(Fraction a, Fraction b){
return new Fraction(a.numerator * b.numerator, a.denominator * b.denominator);
}
public static Fraction subtract(Fraction a, Fraction b){
if(a.denominator != b.denominator){
int aTop = b.denominator * a.numerator;
int bTop = a.denominator * b.numerator;
return new Fraction(aTop - bTop, a.denominator * b.denominator);
}
else{
return new Fraction(a.numerator - b.numerator, a.denominator);
}
}
}
Observe that you'd have the same problem if you used limited-precision decimal arithmetic, and wanted to deal with 1/3: 0.333333333 * 3 is 0.999999999, not 1.00000000.
Unfortunately, 5.6, 5.8 and 11.4 just aren't round numbers in binary, because they involve fifths. So the float representation of them isn't exact, just as 0.3333 isn't exactly 1/3.
If all the numbers you use are non-recurring decimals, and you want exact results, use BigDecimal. Or as others have said, if your values are like money in the sense that they're all a multiple of 0.01, or 0.001, or something, then multiply everything by a fixed power of 10 and use int or long (addition and subtraction are trivial: watch out for multiplication).
However, if you are happy with binary for the calculation, but you just want to print things out in a slightly friendlier format, try java.util.Formatter or String.format. In the format string specify a precision less than the full precision of a double. To 10 significant figures, say, 11.399999999999 is 11.4, so the result will be almost as accurate and more human-readable in cases where the binary result is very close to a value requiring only a few decimal places.
The precision to specify depends a bit on how much maths you've done with your numbers - in general the more you do, the more error will accumulate, but some algorithms accumulate it much faster than others (they're called "unstable" as opposed to "stable" with respect to rounding errors). If all you're doing is adding a few values, then I'd guess that dropping just one decimal place of precision will sort things out. Experiment.
You may want to look into using Java's java.math.BigDecimal class if you really need precision math. Here is a good article from Oracle/Sun on the case for BigDecimal. While you can never represent 1/3 as someone mentioned, you can have the power to decide exactly how precise you want the result to be. setScale() is your friend.. :)
Ok, because I have way too much time on my hands at the moment here is a code example that relates to your question:
import java.math.BigDecimal;
/**
* Created by a wonderful programmer known as:
* Vincent Stoessel
* xaymaca#gmail.com
* on Mar 17, 2010 at 11:05:16 PM
*/
public class BigUp {
public static void main(String[] args) {
BigDecimal first, second, result ;
first = new BigDecimal("33.33333333333333") ;
second = new BigDecimal("100") ;
result = first.divide(second);
System.out.println("result is " + result);
//will print : result is 0.3333333333333333
}
}
and to plug my new favorite language, Groovy, here is a neater example of the same thing:
import java.math.BigDecimal
def first = new BigDecimal("33.33333333333333")
def second = new BigDecimal("100")
println "result is " + first/second // will print: result is 0.33333333333333
Pretty sure you could've made that into a three line example. :)
If you want exact precision, use BigDecimal. Otherwise, you can use ints multiplied by 10 ^ whatever precision you want.
As others have noted, not all decimal values can be represented as binary since decimal is based on powers of 10 and binary is based on powers of two.
If precision matters, use BigDecimal, but if you just want friendly output:
System.out.printf("%.2f\n", total);
Will give you:
11.40
You're running up against the precision limitation of type double.
The java.math package has some arbitrary-precision arithmetic facilities.
You can't, because 7.3 doesn't have a finite representation in binary. The closest you can get is 2054767329987789/2**48 = 7.3+1/1407374883553280.
Take a look at http://docs.python.org/tutorial/floatingpoint.html for a further explanation. (It's on the Python website, but Java and C++ have the same "problem".)
The solution depends on what exactly your problem is:
If it's that you just don't like seeing all those noise digits, then fix your string formatting. Don't display more than 15 significant digits (or 7 for float).
If it's that the inexactness of your numbers is breaking things like "if" statements, then you should write if (abs(x - 7.3) < TOLERANCE) instead of if (x == 7.3).
If you're working with money, then what you probably really want is decimal fixed point. Store an integer number of cents or whatever the smallest unit of your currency is.
(VERY UNLIKELY) If you need more than 53 significant bits (15-16 significant digits) of precision, then use a high-precision floating-point type, like BigDecimal.
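The tolerance check from the list above might look like this in Java. It is only a sketch: the class name ApproxEquals, the helper nearlyEqual and the tolerance of 1e-9 are assumptions, and a fixed absolute tolerance is not appropriate for every range of values.

```java
// Hypothetical helper comparing doubles with an absolute tolerance instead of ==.
final class ApproxEquals {
    static boolean nearlyEqual(double x, double y, double tolerance) {
        return Math.abs(x - y) < tolerance;
    }

    public static void main(String[] args) {
        double x = 0.1 + 0.2;
        System.out.println(x == 0.3);                  // false
        System.out.println(nearlyEqual(x, 0.3, 1e-9)); // true
    }
}
```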
private void getRound() {
// this is very simple and interesting
double a = 5, b = 3, c;
c = a / b;
System.out.println(" round val is " + c);
// round val is : 1.6666666666666667
// if you want only two decimal places with a double, we
// can use the format option in String,
// which takes 2 parameters: one is the format specifier, which
// sets the number of decimal places, the other is the double value
String s = String.format("%.2f", c);
double val = Double.parseDouble(s);
System.out.println(" val is :" + val);
// now the output will be : val is :1.67
}
Use java.math.BigDecimal
Doubles are binary fractions internally, so they sometimes cannot represent decimal fractions to the exact decimal.
/*
0.8 1.2
0.7 1.3
0.7000000000000002 2.3
0.7999999999999998 4.2
*/
double adjust = fToInt + 1.0 - orgV;
// The following two lines works for me.
String s = String.format("%.2f", adjust);
double val = Double.parseDouble(s);
System.out.println(val); // output: 0.8, 0.7, 0.7, 0.8
Doubles are approximations of the decimal numbers in your Java source. You're seeing the consequence of the mismatch between the double (which is a binary-coded value) and your source (which is decimal-coded).
Java's producing the closest binary approximation. You can use the java.text.DecimalFormat to display a better-looking decimal value.
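A minimal sketch of the DecimalFormat approach mentioned above. The class name PrettyDouble, the pattern "0.##" and the use of Locale.ROOT (to force a '.' decimal separator) are choices made for this example.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical helper that formats a double with at most two decimal places.
final class PrettyDouble {
    static String pretty(double v) {
        // "0.##" means: at least one integer digit, up to two fraction digits.
        DecimalFormat format =
                new DecimalFormat("0.##", DecimalFormatSymbols.getInstance(Locale.ROOT));
        return format.format(v);
    }

    public static void main(String[] args) {
        System.out.println(5.6 + 5.8);         // 11.399999999999999
        System.out.println(pretty(5.6 + 5.8)); // 11.4
    }
}
```

Only the displayed text changes; the double itself still holds the binary approximation.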
Short answer: Always use BigDecimal and make sure you are using the constructor with String argument, not the double one.
Back to your example, the following code will print 11.4, as you wish.
import java.math.BigDecimal;
public class doublePrecision {
public static void main(String[] args) {
BigDecimal total = new BigDecimal("0");
total = total.add(new BigDecimal("5.6"));
total = total.add(new BigDecimal("5.8"));
System.out.println(total);
}
}
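Why the String constructor matters can be seen by building the same literal both ways. This is a small sketch; the class name ConstructorDemo and its two helper methods are made up for the example.

```java
import java.math.BigDecimal;

// Hypothetical demo contrasting the String and double constructors of BigDecimal.
final class ConstructorDemo {
    // Exactly the decimal value 0.1.
    static BigDecimal fromString() {
        return new BigDecimal("0.1");
    }

    // The exact value of the double nearest to 0.1, which is slightly larger.
    static BigDecimal fromDouble() {
        return new BigDecimal(0.1);
    }

    public static void main(String[] args) {
        System.out.println(fromString()); // 0.1
        System.out.println(fromDouble()); // a long expansion starting with 0.1000000000000000055...
    }
}
```

The double constructor faithfully preserves the binary approximation, so the inexactness is carried into the BigDecimal.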
Multiply everything by 100 and store it in a long as cents.
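A sketch of the cents idea applied to the 5.6 + 5.8 example. The class name Cents and the format helper are assumptions, and the formatting shown only handles non-negative amounts.

```java
import java.util.Locale;

// Hypothetical helper that keeps money as a whole number of cents in a long.
final class Cents {
    // Renders a non-negative cent amount as units and two-digit cents.
    static String format(long cents) {
        return String.format(Locale.ROOT, "%d.%02d", cents / 100, cents % 100);
    }

    public static void main(String[] args) {
        long total = 0;
        total += 560; // 5.60
        total += 580; // 5.80
        System.out.println(format(total)); // 11.40
    }
}
```

Integer addition is exact, so no rounding happens until the value is formatted for display.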
Computers store numbers in binary and can't actually represent numbers such as 33.333333333 exactly (whole numbers such as 100.0 are fine). This is one of the tricky things about using doubles. You will have to just round the answer before showing it to a user. Luckily in most applications, you don't need that many decimal places anyhow.
Floating point numbers differ from real numbers in that for any given floating point number there is a next higher floating point number. Same as integers. There's no integer between 1 and 2.
There's no way to represent 1/3 as a float. There's a float below it and there's a float above it, and there's a certain distance between them. And 1/3 is in that space.
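The two floats on either side of 1/3 can be found with Math.nextDown. The following is a sketch; the class name Neighbours and the helper compareToOneThird are hypothetical, and the comparison is done exactly by converting the float to BigDecimal and checking 3 * v against 1.

```java
import java.math.BigDecimal;

// Hypothetical demo: 1/3 sits strictly between two adjacent floats.
final class Neighbours {
    // Returns -1 if v < 1/3, 0 if v == 1/3, 1 if v > 1/3, using exact arithmetic.
    static int compareToOneThird(float v) {
        return new BigDecimal(v).multiply(BigDecimal.valueOf(3)).compareTo(BigDecimal.ONE);
    }

    public static void main(String[] args) {
        float nearest = 1f / 3f;                // the float closest to 1/3
        float below = Math.nextDown(nearest);   // the adjacent float on the other side
        System.out.println(compareToOneThird(below));   // -1
        System.out.println(compareToOneThird(nearest)); // 1
    }
}
```

No float compares equal to 1/3, and there is nothing between the two neighbours printed here.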
Apfloat for Java claims to work with arbitrary precision floating point numbers, but I've never used it. Probably worth a look.
http://www.apfloat.org/apfloat_java/
A similar question was asked here before
Java floating point high precision library
Use a BigDecimal. It even lets you specify rounding rules (like ROUND_HALF_EVEN, which will minimize statistical error by rounding to the even neighbor if both are the same distance; i.e. both 1.5 and 2.5 round to 2).
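The half-even rule can be tried out with setScale and RoundingMode.HALF_EVEN (the enum counterpart of the ROUND_HALF_EVEN constant). This is a minimal sketch; the class name HalfEvenDemo and its helper are made up for the example.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical demo of banker's rounding with BigDecimal.
final class HalfEvenDemo {
    // Rounds a decimal string to a whole number, ties going to the even neighbour.
    static BigDecimal roundToInteger(String value) {
        return new BigDecimal(value).setScale(0, RoundingMode.HALF_EVEN);
    }

    public static void main(String[] args) {
        System.out.println(roundToInteger("1.5")); // 2
        System.out.println(roundToInteger("2.5")); // 2
        System.out.println(roundToInteger("3.5")); // 4
    }
}
```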
Why not use the round() method from Math class?
// The number of 0s determines how many digits you want after the decimal point
// (here one digit)
total = (double)Math.round(total * 10) / 10;
System.out.println(total); // prints 11.4
Check out BigDecimal, it handles problems dealing with floating point arithmetic like that.
The new call would look like this:
term[number].coefficient.add(co);
Use setScale() to set the number of decimal place precision to be used.
If you have no choice other than using double values, you can use the code below.
public static double sumDouble(double value1, double value2) {
double sum = 0.0;
String value1Str = Double.toString(value1);
int decimalIndex = value1Str.indexOf(".");
int value1Precision = 0;
if (decimalIndex != -1) {
value1Precision = (value1Str.length() - 1) - decimalIndex;
}
String value2Str = Double.toString(value2);
decimalIndex = value2Str.indexOf(".");
int value2Precision = 0;
if (decimalIndex != -1) {
value2Precision = (value2Str.length() - 1) - decimalIndex;
}
int maxPrecision = value1Precision > value2Precision ? value1Precision : value2Precision;
sum = value1 + value2;
String s = String.format("%." + maxPrecision + "f", sum);
sum = Double.parseDouble(s);
return sum;
}
You can do the following:
System.out.println(String.format("%.12f", total));
Change the 12 in %.12f to control how many decimal places are printed.
As far as I understand it, the main goal is to get the correct double from an inexact one.
Here is my solution for recovering the intended value from an "approximate" one: it rounds at the last digit a double can reliably hold, counting the digits before the decimal point and keeping as many digits as possible after it. I hope that is enough precision for most cases:
public static double roundError(double value) {
BigDecimal valueBigDecimal = new BigDecimal(Double.toString(value));
String valueString = valueBigDecimal.toPlainString();
if (!valueString.contains(".")) return value;
String[] valueArray = valueString.split("[.]");
int places = 16;
places -= valueArray[0].length();
if ("56789".contains("" + valueArray[0].charAt(valueArray[0].length() - 1))) places--;
//System.out.println("Rounding " + value + "(" + valueString + ") to " + places + " places");
return valueBigDecimal.setScale(places, RoundingMode.HALF_UP).doubleValue();
}
I know the code is long and surely not the best; maybe someone can make it more elegant. Anyway, it works, see these examples:
roundError(5.6+5.8) = 11.399999999999999 = 11.4
roundError(0.4-0.3) = 0.10000000000000003 = 0.1
roundError(37235.137567000005) = 37235.137567
roundError(1.0/3) = 0.3333333333333333 = 0.333333333333333
roundError(3723513756.7000005) = 3.7235137567E9 (3723513756.7)
roundError(3723513756123.7000005) = 3.7235137561237E12 (3723513756123.7)
roundError(372351375612.7000005) = 3.723513756127E11 (372351375612.7)
roundError(1.7976931348623157) = 1.797693134862316
Do not waste your effort using BigDecimal. In 99.99999% of cases you don't need it. Java's double type is of course approximate, but in almost all cases it is sufficiently precise. Mind that you have an error at the 14th significant digit. This is really negligible!
To get nice output use:
System.out.printf("%.2f\n", total);
