Why is 8099.99975f != 8100f? - java

Edit: I know floating point arithmetic is not exact. And the arithmetic isn't even my problem. The addition gives the result I expected. 8099.99975f doesn't.
So I have this little program:
public class Test {
public static void main(String[] args) {
System.out.println(8099.99975f); // 8099.9995
System.out.println(8099.9995f + 0.00025f); // 8100.0
System.out.println(8100f == 8099.99975f); // false
System.out.println(8099.9995f + 0.00025f == 8099.99975f); // false
// I know comparing floats with == can be troublesome
// but here they really should be equal in every bit.
}
}
I wrote it to check if 8099.99975 is rounded to 8100 when written as an IEEE 754 single precision float. To my surprise Java converts it to 8099.9995 when written as a float literal (8099.99975f). I checked my calculations and the IEEE standard again but couldn't find any mistakes. 8100 is just as far away from 8099.99975 as 8099.9995 but the last bit of 8100 is 0 which should make it the right representation.
So I checked the Java language spec to see if I missed something. After a quick search I found two things:
The Java programming language requires that floating-point arithmetic behave as if every floating-point operator rounded its floating-point result to the result precision. Inexact results must be rounded to the representable value nearest to the infinitely precise result; if the two nearest representable values are equally near, the one with its least significant bit zero is chosen.
The Java programming language uses round toward zero when converting a floating value to an integer [...].
I noticed here that nothing was said about float literals. So I thought that float literals maybe are just doubles which when cast to float are rounded to zero similarly to the float to int casting. That would explain why 8099.99975f was rounded to zero.
I wrote the little program you can see above to check my theory and indeed found that when adding two float literals that should result in 8100 the correct float is computed. (Note here that 8099.9995 and 0.00025 can be represented exactly as single floats so there's no rounding that could lead to a different result) This confused me since it didn't make much sense to me that float literals and computed floats behaved differently so I dug around in the language spec some more and found this:
A floating-point literal is of type float if it is suffixed with an ASCII letter F or f [...]. The elements of the types float [...] are those values that can be represented using the IEEE 754 32-bit single-precision [...] binary floating-point formats.
This ultimately states that the literal should be rounded according to the IEEE standard which in this case is to 8100. So why is it 8099.9995?

The key point to realise is that the value of a floating point number can be worked out in two different ways, that aren't in general equal.
There's the value that the bits in the floating point number give the exact binary representation of.
There's the "decimal display value" of a floating point number, which is the number with the least decimal places that is closer to that floating point number than any other number.
To understand the difference, consider the number whose exponent is 10001011 and whose significand is 1.11111010001111111111111. This is the exact binary representation of 8099.99951171875. But the decimal value 8099.9995 has fewer decimal places, and is closer to this floating point number than to any other floating point number. Therefore, 8099.9995 is the value that will be displayed when you print out that number.
Note that this particular floating point number is the next lowest one after 8100.
Now consider 8099.99975. It's slightly closer to 8099.99951171875 than it is to 8100. Therefore, to represent it in single precision floating point, Java will pick the floating point number which is the exact binary representation of 8099.99951171875. If you try to print it, you'll see 8099.9995.
Lastly, when you do 8099.9995 + 0.00025 in single precision floating point, the numbers involved are the exact binary representations of 8099.99951171875 and 0.0002499999827705323696136474609375. But because the latter is slightly more than 1/2^12, the result of addition will be closer to 8100 than to 8099.99951171875, and so it will be rounded up, not down at the end, making it 8100.

The decimal value 8099.99975 has nine significant digits. This is more than can be represented exactly in a float. If you use the floating point analysis tool at CUNY you'll see that the binary representation closest to 8099.9995 is 45FD1FFF. When you attempt to add 0.00025 you are suffering a "loss of significance". In order not to lose significant (left-hand) digits of the larger number, the significand of the smaller has to be shifted right to match the scale (exponent) of the larger. When this happens, its value becomes ZERO as it shifts off the right end of the register.
Decimal Exponent Significand
--------- -------------- -------------------------
8099.9995 10001011 (+12) 1.11111010001111111111111
0.00025 01110011 (-12) 1.00000110001001001101111
To line these up for addition, the second one has to shift right 24 bits, but there are only 23 bits in the significand of a single-precision float. The significand disappears, leaving zero, so the addition has no effect.
If you want this to work, switch to double-precision arithmetic.

Related

What is the range in which IEEE754 can correctly compare the size of floating point numbers

I know that the range of integers that can be correctly compared is -(2^53 - 1) to 2^53 - 1.
But what is the range of floating point numbers?
// Should the double type conform to this specification? I guess
double c = 9007199254740990.9;
double d = 9007199254740990.8;
System.out.println(c > d); // return false
There's no particular range in which floating point numbers are guaranteed to be compared correctly - that would require the existence of infinitely many double representations.
All double numbers are accurate to 53 significant figures of their representations in binary. If two numbers differ only in the 54th significant binary figure, they'll generally have the same double representation, unless rounding makes them different.
This question seems to be based on a fundamental misunderstanding of (mathematical) Real numbers, and their relationship to IEEE-754.
"Floating point" refers to a representation scheme for Real numbers. Historically that comprises decimal digits and a "decimal point". There are an infinite number of Real numbers, and (correspondingly) an infinite number of floating point numbers. Provided that you have enough paper to write them down.
One property of Real numbers is that there is an infinite number of Real numbers between any pair of distinct Real numbers. That includes any pair that are really close to one another.
IEEE-754 floating point numbers are different. Each one represents a single precise Real number. There is a set of (less than) 2^64 of these Real numbers for IEEE-754 double precision. Any other Real number that is not in that set cannot be precisely represented as a double.
In Java, when we write this:
double c = 9007199254740990.9;
double d = 9007199254740990.8;
System.out.println(c > d); // return false
the value of c will be the IEEE-754 representation that is closest to the real number that the floating point notation denotes. Likewise d. It is not going to be exactly equal to the Real number. (It can only be exact for a finite subset of the infinite set of Real numbers ...)
The print statement illustrates this. The closest IEEE-754 representations for those 2 Real numbers are the same.
What actually happens is that the Java compiler works out what 9007199254740990.9 and 9007199254740990.8 mean as Real numbers. Then it silently rounds those numbers to the nearest IEEE-754 double.
So to your question:
What is the range in which IEEE-754 can correctly compare the size of floating point numbers
A: There is no such range.
If you give me any Real number "x" (expressed in floating point form) that maps to a double value d, I can write you a different Real number "y" (in floating point form) that also maps to the same value d. All I need to do is to add some additional digits to the least significant end of "x".
Note: the reasoning is independent of the actual representation of IEEE-754. It only depends on the fact that IEEE-754 representations have a fixed number of bits. That means that the set of possible values is bounded ... and the rest is a logical consequence of that.
Let's think about the actual bit representation of these double numbers:
We can use System.out.println(Long.toBinaryString(Double.doubleToRawLongBits(c))); to print the bit representation.
Both 9007199254740990.9 and 9007199254740990.8 has the same bit representation following IEEE 754:
0 10000110011 1111111111111111111111111111111111111111111111111111
s(sign) = 0
exp = 10000110011 (binary value, which equals 1075 in decimal)
frac = 111...111 (52 bits)
The precise value of this representation is:
With IEEE 754, starting from 2^53, the double type is no longer able to precisely represent all integers larger than 2 ^ 53, let alone floating point numbers.
Regarding the OP question, as mentioned in other answers, there's not an accurate range. For example, the following statement will evaluate as true. We won't say that the range of floating numbers that double can compare is less than 1.1. Generally speaking, the larger the number is, the less precise the double type could represent.
double x1 = 1.1;
double x2 = 1.1000000000000001;
System.out.println(x1 == x2); // true

Is Java floating arithmetic division always within 1 ULP of the true result?

Is Java's floating arithmetic division always within 1 ULP of the true result? I read that CPUs sometimes do floating-point division a/b by doing a * 1/b. However, 1/b may be off by 1 ULP, and multiplication adds up to 1 ULP of error. Doesn't this mean that the final error could be 2 ULPs?
This doesn't sound right to me, because I know there are many methods in the Math class that are within 1 ULP of the true result (such as Math.pow), so I don't think something as simple as division would be less accurate.
https://docs.oracle.com/javase/specs/jls/se8/html/jls-4.html#jls-4.2.4 contains Java's specifications on this. The relevant clause is probably this one:
The Java programming language requires that floating-point arithmetic behave as if every floating-point operator rounded its floating-point result to the result precision. Inexact results must be rounded to the representable value nearest to the infinitely precise result; if the two nearest representable values are equally near, the one with its least significant bit zero is chosen. This is the IEEE 754 standard's default rounding mode known as round to nearest.
I read this as 1/2 ULP precision: the two nearest values are one ulp apart, and Java chooses "the representable value nearest to the infinitely precise result," i.e. the closer one, which must be within 1/2 ULP.

Determine smallest floating point type that can hold a string value

I'm working on a method that translates a string into an appropriate Number type, depending upon the format of the number. If the number appears to be a floating point value, then I need to return the smallest type I can use without sacrificing precision (Float, Double or BigDecimal).
Based on How many significant digits have floats and doubles in java? (and other resources), I've learned than Float values have 23 bits for the mantissa. Based on this, I used the following method to return the bit length for a given value:
private static int getBitLengthOfSignificand(String integerPart,
String fractionalPart) {
return new BigInteger(integerPart + fractionalPart).bitLength();
}
If the result of this test is below 24, I return a Float. If below 53 I return a Double, otherwise a BigDecimal.
However, I'm confused by the result when I consider Float.MAX_VALUE, which is 3.4028235E38. The bit length of the significand is 26 according to my method (where integerPart = 3 and fractionalPart = 4028235. This triggers my method to return a Double, when clearly Float would suffice.
Can someone highlight the flaw in my thinking or implementation? Another idea I had was to convert the string to a BigDecimal and scale down using floatValue() and doubleValue(), testing for overflow (which is represented by infinite values). But that loses precision, so isn't appropriate for me.
The significand is stored in binary, and you can think of it as a number in its decimal representation only if you don't let it confuse you.
The exponent is a binary exponent that does not represent a multiplication by a power of ten but by a power of two. For this reason, the E38 in the number you used as example is only a convenience: the real significand is in binary and should be multiplied by a power of two to obtain the actual number. Powers of two and powers of ten aren't the same, so “3.4028235” is not the real significand.
The real significand of Float.MAX_VALUE is in hexadecimal notation, 0x1.fffffe, and its associated exponent is 127, meaning that Float.MAX_VALUE is actually 0x1.fffffe * 2127.
Looking at the decimal representation to choose a binary floating-point type to put the value in, as you are trying to do, doesn't work. For one thing, the number of decimal digits that one is sure to recover from a float is different from the number of decimal digits one may need to write to distinguish a float from its neighbors (6 and 9 respectively). You chose to write “3.4028235E38” but you could have written 3.40282E38, which for your algorithm, looks easier to represent, when it isn't, really. When people write that “3.4028235E38” is the largest finite value of the float type, they mean that if you round this decimal number to float, you will arrive to the largest float. If you parse “3.4028235E38” as a double-precision number it won't even be equal to Float.MAX_VALUE.
To put it differently: another way to write Float.MAX_VALUE is 3.4028234663852885981170418348451692544E38. It is still representable as a float (it represents the exact same value as 3.4028235E38). It looks like it has many digits because these are decimal digits that appear for a decimal exponent, when in fact the number is represented internally with a binary exponent.
(By the way, your approach does not check that the exponent is in range to represent a number in the chosen type, which is another condition for a type to be able to represent the number from a string.)
I would work in terms of the difference between the actual value and the nearest float. BigDecimal can store any finite length decimal fraction exactly and do arithmetic on it:
Convert the String to the nearest float x. If x is infinite, but the value has a finite double representation use that.
Convert the String exactly to BigDecimal y.
If y is zero, use float, which can represent zero exactly.
If not, convert the float x to BigDecimal, z.
Calculate, in BigDecimal to a reasonable number of decimal places, the absolute value of (y-z)/z. That is the relative rounding error due to using float. If it is small enough for your purposes, less than some value you pick, use float. If not, use double.
If you literally want no sacrifice in precision, it is much simpler. Convert to both float and double. Compare them for equality. The comparison will be done in double. If they compare equal, go with the float. If not, go with the double.

Math.cos(Math.radians(90)) doesn't give 0 [duplicate]

alert(Math.cos(Math.PI/2));
Why the result is not exact zero? Is this inaccurancy, or some implementation error?
Math.PI/2 is an approximation of the real value of pi/2. Taking the exact cosine of this approximated value won't yield zero. The value you get is an approximation of this exact value up to the precision of the underlying floating point datatype.
Using some arbitrary precision library, you can evaluate the difference between pi/2 in double precision and the exact value to
0.0000000000000000612323399573676588613032966137500529104874722961...
Since the slope of the cosine close to its zeros is 1, you would expect the cosine of the approximation of pi/2 to be approximately equal to this difference, and indeed it is.
Floating-point numbers are normally approximations. Since floating-point numbers are represented in memory as binary numbers multiplied by an exponent only numbers that are sums of powers of 2 can usually be represented.
Fractions such as 1/3 can't be written as a binary number and as such have no accurate floating-point representation. Even some numbers that can be written accurately in decimal, such as 0.1, can't be represented accurately in binary and so will not be represented correctly in floating point.
PI is an irrational number and can't be represented as floating-point, so there will be rounding errors. Do not compare floating-point numbers for equality without including a tolerance parameter. This link has a good explanation of the basics.
Comparing calculated floating point numbers for equality is almost always a bad idea, since (as others have stated) they are approximations, and errors appear.
Instead of checking for a==b, check for equality to within a threshold that makes sense for your application, as with Math.abs(a-b) < .00001. This is good practice in any programming language that represents numbers as floating point values.
If you're storing integers in floating point variables and just adding, subtracting, and multiplying, they'll stay integers (at least until they go out of bounds). But dividing, using trig functions, etc., will introduce errors that must be allowed for.
-m#

How can a primitive float value be -0.0? What does that mean?

How come a primitive float value can be -0.0? What does that mean?
Can I cancel that feature?
When I have:
float fl;
Then fl == -0.0 returns true and so does fl == 0. But when I print it, it prints -0.0.
Because Java uses the IEEE Standard for Floating-Point Arithmetic (IEEE 754) which defines -0.0 and when it should be used.
The smallest number representable has no 1 bit in the subnormal significand and is called the positive or negative zero as determined by the sign. It actually represents a rounding to zero of numbers in the range between zero and the smallest representable non-zero number of the same sign, which is why it has a sign, and why its reciprocal +Inf or -Inf also has a sign.
You can get around your specific problem by adding 0.0
e.g.
Double.toString(value + 0.0);
See: Java Floating-Point Number Intricacies
Operations Involving Negative Zero
...
(-0.0) + 0.0 -> 0.0
-
"-0.0" is produced when a floating-point operation results in a negative floating-point number so close to 0 that it cannot be represented normally.
how come a primitive float value can be -0.0?
floating point numbers are stored in memory using the IEEE 754 standard meaning that there could be rounding errors. You could never be able to store a floating point number of infinite precision with finite resources.
You should never test if a floating point number == to some other, i.e. never write code like this:
if (a == b)
where a and b are floats. Due to rounding errors those two numbers might be stored as different values in memory.
You should define a precision you want to work with:
private final static double EPSILON = 0.00001;
and then test against the precision you need
if (Math.abs(a - b) < epsilon)
So in your case if you want to test that a floating point number equals to zero in the given precision:
if (Math.abs(a) < epsilon)
And if you want to format the numbers when outputting them in the GUI you may take a look at the following article and the NumberFormat class.
The floating point type in Java is described in the JLS: 4.2.3 Floating-Point Types, Formats, and Values.
It talks about these special values:
(...) Each of the four value sets includes not only the finite nonzero values that are ascribed to it above, but also NaN values and the four values positive zero, negative zero, positive infinity, and negative infinity. (...)
And has some important notes about them:
Positive zero and negative zero compare equal; thus the result of the expression 0.0==-0.0 is true and the result of 0.0>-0.0 is false. But other operations can distinguish positive and negative zero; for example, 1.0/0.0 has the value positive infinity, while the value of 1.0/-0.0 is negative infinity.
You can't "cancel" that feature, it's part of how the floats work.
For more about negative zero, have a look at the Signed zero Wikipedia entry.
If you want to check what "kind" of zero you have, you can use the fact that:
(new Float(0.0)).equals(new Float(-0.0))
is false (but indeed, 0.0 == -0.0).
Have a look here for more of this: Java Floating-Point Number Intricacies.
From wikipedia
The IEEE 754 standard for floating point arithmetic (presently used by
most computers and programming languages that support floating point
numbers) requires both +0 and −0. The zeroes can be considered as a
variant of the extended real number line such that 1/−0 = −∞ and 1/+0
= +∞, division by zero is only undefined for ±0/±0 and ±∞/±∞.
I don't think you can or need to cancel that feature. You must not compare floating point numbers with == because of precision errors anyway.
A good article on how float point numbers are managed in java / computers.
http://www.artima.com/underthehood/floating.html
btw: it is real pain in computers when 2.0 - 1.0 could produce 0.999999999999 that is not equal to1.0 :). That can be especially easy stumbled upon in javascript form validations.

Categories

Resources