Why is comparing floats inconsistent in Java? - java

class Test{
public static void main(String[] args){
float f1=3.2f;
float f2=6.5f;
if(f1==3.2){
System.out.println("same");
}else{
System.out.println("different");
}
if(f2==6.5){
System.out.println("same");
}else{
System.out.println("different");
}
}
}
output:
different
same
Why is the output like that? I expected same as the result in first case.

The difference is that 6.5 can be represented exactly in both float and double, whereas 3.2 can't be represented exactly in either type. and the two closest approximations are different.
An equality comparison between float and double first converts the float to a double and then compares the two. So the data loss.
You shouldn't ever compare floats or doubles for equality; because you can't really guarantee that the number you assign to the float or double is exact.
This rounding error is a characteristic feature of floating-point computation.
Squeezing infinitely many real numbers into a finite number of bits
requires an approximate representation. Although there are infinitely
many integers, in most programs the result of integer computations can
be stored in 32 bits.
In contrast, given any fixed number of bits,
most calculations with real numbers will produce quantities that
cannot be exactly represented using that many bits. Therefore the
result of a floating-point calculation must often be rounded in order
to fit back into its finite representation. This rounding error is the
characteristic feature of floating-point computation.
Check What Every Computer Scientist Should Know About Floating-Point Arithmetic for more!

They're both implementations of different parts of the IEEE floating point standard. A float is 4 bytes wide, whereas a double is 8 bytes wide.
As a rule of thumb, you should probably prefer to use double in most cases, and only use float when you have a good reason to. (An example of a good reason to use float as opposed to a double is "I know I don't need that much precision and I need to store a million of them in memory.") It's also worth mentioning that it's hard to prove you don't need double precision.
Also, when comparing floating point values for equality, you'll typically want to use something like Math.abs(a-b) < EPSILON where a and b are the floating point values being compared and EPSILON is a small floating point value like 1e-5. The reason for this is that floating point values rarely encode the exact value they "should" -- rather, they usually encode a value very close -- so you have to "squint" when you determine if two values are the same.
EDIT: Everyone should read the link #Kugathasan Abimaran posted below: What Every Computer Scientist Should Know About Floating-Point Arithmetic for more!

To see what you're dealing with, you can use Float and Double's toHexString method:
class Test {
public static void main(String[] args) {
System.out.println("3.2F is: "+Float.toHexString(3.2F));
System.out.println("3.2 is: "+Double.toHexString(3.2));
System.out.println("6.5F is: "+Float.toHexString(6.5F));
System.out.println("6.5 is: "+Double.toHexString(6.5));
}
}
$ java Test
3.2F is: 0x1.99999ap1
3.2 is: 0x1.999999999999ap1
6.5F is: 0x1.ap2
6.5 is: 0x1.ap2
Generally, a number has an exact representation if it equals A * 2^B, where A and B are integers whose allowed values are set by the language specification (and double has more allowed values).
In this case,
6.5 = 13/2 = (1+10/16)*4 = (1+a/16)*2^2 == 0x1.ap2, while
3.2 = 16/5 = ( 1 + 9/16 + 9/16^2 + 9/16^3 + . . . ) * 2^1 == 0x1.999. . . p1.
But Java can only hold a finite number of digits, so it cuts the .999. . . off at some point. (You may remember from math that 0.999. . .=1. That's in base 10. In base 16, it would be 0.fff. . .=1.)

class Test {
public static void main(String[] args) {
float f1=3.2f;
float f2=6.5f;
if(f1==3.2f)
System.out.println("same");
else
System.out.println("different");
if(f2==6.5f)
System.out.println("same");
else
System.out.println("different");
}
}
Try like this and it will work. Without 'f' you are comparing a floating with other floating type and different precision which may cause unexpected result as in your case.

It is not possible to compare values of type float and double directly. Before the values can be compared, it is necessary to either convert the double to float, or convert the float to double. If one does the former comparison, the conversion will ask "Does the the float hold the best possible float representation of the double's value?" If one does the latter conversion, the question will be "Does the float hold a perfect representation of the double's value". In many contexts, the former question is the more meaningful one, but Java assumes that all comparisons between float and double are intended to ask the latter question.
I would suggest that regardless of what a language is willing to tolerate, one's coding standards should absolutely positively forbid direct comparisons between operands of type float and double. Given code like:
float f = function1();
double d = function2();
...
if (d==f) ...
it's impossible to tell what behavior is intended in cases where d represents a value which is not precisely representable in float. If the intention is that f be converted to a double, and the result of that conversion compared with d, one should write the comparison as
if (d==(double)f) ...
Although the typecast doesn't change the code's behavior, it makes clear that the code's behavior is intentional. If the intention was that the comparison indicate whether f holds the best float representation of d, it should be:
if ((float)d==f)
Note that the behavior of this is very different from what would happen without the cast. Had your original code cast the double operand of each comparison to float, then both equality tests would have passed.

In general is not a good practice to use the == operator with floating points number, due to approximation issues.

6.5 can be represented exactly in binary, whereas 3.2 can't. That's why the difference in precision doesn't matter for 6.5, so 6.5 == 6.5f.
To quickly refresh how binary numbers work:
100 -> 4
10 -> 2
1 -> 1
0.1 -> 0.5 (or 1/2)
0.01 -> 0.25 (or 1/4)
etc.
6.5 in binary: 110.1 (exact result, the rest of the digits are just zeroes)
3.2 in binary: 11.001100110011001100110011001100110011001100110011001101... (here precision matters!)
A float only has 24 bits precision (the rest is used for sign and exponent), so:
3.2f in binary: 11.0011001100110011001100 (not equal to the double precision approximation)
Basically it's the same as when you're writing 1/5 and 1/7 in decimal numbers:
1/5 = 0,2
1,7 = 0,14285714285714285714285714285714.

Float has less precision than double, bcoz float is using 32bits inwhich 1 is used for Sign, 23 precision and 8 for Exponent . Where as double uses 64 bits in which 52 are used for precision, 11 for exponent and 1for Sign....Precision is important matter.A decimal number represented as float and double can be equal or unequal depends is need of precision( i.e range of numbers after decimal point can vary). Regards S. ZAKIR

Related

What is the range in which IEEE754 can correctly compare the size of floating point numbers

I know that the range of integers that can be correctly compared is -(2^53 - 1) to 2^53 - 1.
But what is the range of floating point numbers?
// Should the double type conform to this specification? I guess
double c = 9007199254740990.9;
double d = 9007199254740990.8;
System.out.println(c > d); // return false
There's no particular range in which floating point numbers are guaranteed to be compared correctly - that would require the existence of infinitely many double representations.
All double numbers are accurate to 53 significant figures of their representations in binary. If two numbers differ only in the 54th significant binary figure, they'll generally have the same double representation, unless rounding makes them different.
This question seems to be based on a fundamental misunderstanding of (mathematical) Real numbers, and their relationship to IEEE-754.
"Floating point" refers to a representation scheme for Real numbers. Historically that comprises decimal digits and a "decimal point". There are an infinite number of Real numbers, and (correspondingly) an infinite number of floating point numbers. Provided that you have enough paper to write them down.
One property of Real numbers is that there is an infinite number of Real numbers between any pair of distinct Real numbers. That includes any pair that are really close to one another.
IEEE-754 floating point numbers are different. Each one represents a single precise Real number. There is a set of (less than) 2^64 of these Real numbers for IEEE-754 double precision. Any other Real number that is not in that set cannot be precisely represented as a double.
In Java, when we write this:
double c = 9007199254740990.9;
double d = 9007199254740990.8;
System.out.println(c > d); // return false
the value of c will be the IEEE-754 representation that is closest to the real number that the floating point notation denotes. Likewise d. It is not going to be exactly equal to the Real number. (It can only be exact for a finite subset of the infinite set of Real numbers ...)
The print statement illustrates this. The closest IEEE-754 representations for those 2 Real numbers are the same.
What actually happens is that the Java compiler works out what 9007199254740990.9 and 9007199254740990.8 mean as Real numbers. Then it silently rounds those numbers to the nearest IEEE-754 double.
So to your question:
What is the range in which IEEE-754 can correctly compare the size of floating point numbers
A: There is no such range.
If you give me any Real number "x" (expressed in floating point form) that maps to a double value d, I can write you a different Real number "y" (in floating point form) that also maps to the same value d. All I need to do is to add some additional digits to the least significant end of "x".
Note: the reasoning is independent of the actual representation of IEEE-754. It only depends on the fact that IEEE-754 representations have a fixed number of bits. That means that the set of possible values is bounded ... and the rest is a logical consequence of that.
Let's think about the actual bit representation of these double numbers:
We can use System.out.println(Long.toBinaryString(Double.doubleToRawLongBits(c))); to print the bit representation.
Both 9007199254740990.9 and 9007199254740990.8 has the same bit representation following IEEE 754:
0 10000110011 1111111111111111111111111111111111111111111111111111
s(sign) = 0
exp = 10000110011 (binary value, which equals 1075 in decimal)
frac = 111...111 (52 bits)
The precise value of this representation is:
With IEEE 754, starting from 2^53, the double type is no longer able to precisely represent all integers larger than 2 ^ 53, let alone floating point numbers.
Regarding the OP question, as mentioned in other answers, there's not an accurate range. For example, the following statement will evaluate as true. We won't say that the range of floating numbers that double can compare is less than 1.1. Generally speaking, the larger the number is, the less precise the double type could represent.
double x1 = 1.1;
double x2 = 1.1000000000000001;
System.out.println(x1 == x2); // true

How float is converted to double in java? [duplicate]

This question already has answers here:
Why converting from float to double changes the value?
(9 answers)
Closed 7 years ago.
I write the following code in java and check the values stored in the variables. when I store 1.2 in a double variable 'y' it becomes 1.200000025443 something.
Why it is not 1.200000000000 ?
public class DataTypes
{
static public void main(String[] args)
{
float a=1;
float b=1.2f;
float c=12e-1f;
float x=1.2f;
double y=x;
System.out.println("float a=1 shows a= "+a+"\nfloat b=1.2f shows b= "+b+"\nfloat c=12e-1f shows c= "+c+"\nfloat x=1.2f shows x= "+x+"\ndouble y=x shows y= "+y);
}
}
You can see the output here:
float a=1 shows a= 1.0
float b=1.2f shows b= 1.2
float c=12e-1f shows c= 1.2
float x=1.2f shows x= 1.2
double y=x shows y= 1.2000000476837158
This is a question of formatting above anything else.
Take a look at the Float.toString documentation (Float.toString is what's called to produce the decimal representations you see for the floats, and Double.toString for the double):
How many digits must be printed for the fractional part of m or a? There must be at least one digit to represent the fractional part, and beyond that as many, but only as many, more digits as are needed to uniquely distinguish the argument value from adjacent values of type float. That is, suppose that x is the exact mathematical value represented by the decimal representation produced by this method for a finite nonzero argument f. Then f must be the float value nearest to x; or, if two float values are equally close to x, then f must be one of them and the least significant bit of the significand of f must be 0.
(emphasis mine)
The situation is the same for Double.toString. But, you need more digits to "uniquely distinguish the argument value from adjacent values of type double" than you do for float (recall that double is 64-bits while float is 32), that's why you're seeing the extra digits for double and not for float.
Note that anything that can be represented by float can also be represented by double, so you're not actually losing any precision in the conversion.
Of course, not all numbers can be exactly representable by float or double, which is why you see those seemingly random extra digits in the decimal representation in the first place. See "What Every Computer Scientist Should Know About Floating-Point Arithmetic".
The reason why there's such issue is because a computer works only in discrete mathematics, because the microprocessor can only represent internally full numbers, but no decimals. Because we cannot only work with such numbers, but also with decimals, to circumvent that, decades ago very smart engineers have invented the floating point representation, normalized as IEEE754.
The IEEE754 norm that defines how floats and doubles are interpreted in memory. Basically, unlike the int which represent an exact value, the floats and doubles are a calculation from:
sign
exponent
fraction
So the issue here is that when you're storing 1.2 as a double, you actually store a binary approximation to it:
00111111100110011001100110011010
which gives you the closest representation of 1.2 that can be stored using a binary fraction, but not exactly that fraction. In decimal fraction, 12*10^-1 gives an exact value, but as a binary fraction, it cannot give an exact value.
(cf http://www.h-schmidt.net/FloatConverter/IEEE754.html as I'm too lazy to do it myself)
when I store 1.2 in a double variable 'y' it becomes 1.200000025443 something
well actually in both the float and the double versions of y, the value actually is 1.2000000476837158, but because of the smaller mantissa of the float, the value represented is truncated before the approximation, making you believe it's an exact value, whereas in the memory it's not.

How does java round an integer when stored in a floating point

Its a classic problem: your legacy code uses a floating point when it should really be using n integer. But, its to expensive to change every instance of that variable (or several) in the code. So, you need to write your own rounding function that takes a bunch of parameters to improve accuracy and convert to an integer.
So, the basic questions is, how do floating point numbers round when they are made in java? the classic example is 0.1 what is often quoted as rounding to 0.0999999999998 (or something like that). But does a floating point number always round down to the next value it can represent when given an integer in Java? Does it round down it's internal mantissa to efficiently round down its absolute value? Or does it just pick the value with the smallest error between the integer and the new float?
Also is the behavior different when calling Float.parseFloat(String) when the String is an integer like "1234567890"? And is the behavior also the same when String is a Floating point with more precision than the Float can store.
Please note that I use floating point or reference Float, I use that interchangeably with Double. Same with integer and long.
how do floating point numbers round when they are made in java?
Java truncates (rounds towards zero) when you use the construct (int) d where d has type double or float. If you need to round to the nearest integer, you can use the line below:
int a = (int) Math.round(d);
the classic example is 0.1 what is often quoted as rounding to 0.0999999999998 (or something like that).
The issue you allude to does not exist with integers, which are exactly representable as double (for those between -253 and 253). If the number you are rounding comes from previous computations that should have produced an integer but may not have because of floating-point rounding errors, then (int) Math.round(d) is likely the solution you should use. It means that you will get the correct integer as long as the cumulative error is not above 0.5.
your legacy code uses a floating point when it should really be using n integer. But, its to expensive to change every instance of that variable (or several) in the code.
If the computations producing the double d are only computations +, -, * with other integers, producing intermediate results between -253 and 253, then d automatically contains an integer (it is exact because the floating-point computations involved are exact, and it is an integer because the exact result is an integer), and you can convert it with the simpler (int) d. On the other hand, if division or non-integer operands are involved, then you should not lightly change the type of d, because it would change the results of these computations.
Also is the behavior different when calling Float.parseFloat(String) when the String is an integer like "1234567890"?
This will produce a float whose value is the nearest representable single-precision value to the rational 1234567890. This happens to be 1234567936.0f.
And is the behavior also the same when String is a Floating point with more precision than the Float can store.
Technically, “0.1” is more precision than Float can store. Also, technically, the previous example 1234567890 is also more precision than Float can store. The behavior is the same: Float.parseFloat("0.1") produces the nearest float to the rational number 0.1.

Why is there a difference between the same value stored as a float and a double in Java?

I expected the following code to produce: "Both are equal", but I got "Both are NOT equal":
float a=1.3f;
double b=1.3;
if(a==b)
{
System.out.println("Both are equal");
}
else{
System.out.println("Both are NOT equal");
}
What is the reason for this?
It's because the closest float value to 1.3 isn't the same as the closest double value to 1.3. Neither value will be exactly 1.3 - that can't be represented exactly in a non-recurring binary representation.
To give a different understanding of why this happens, suppose we had two decimal floating point types - decimal5 and decimal10, where the number represents the number of significant digits. Now suppose we tried to assign the value of "a third" to both of them. You'd end up with
decimal5 oneThird = 0.33333
decimal10 oneThird = 0.3333333333
Clearly those values aren't equal. It's exactly the same thing here, just with different bases involved.
However if you restrict the values to the less-precise type, you'll find they are equal in this particular case:
double d = 1.3d;
float f = 1.3f;
System.out.println((float) d == f); // Prints true
That's not guaranteed to be the case, however. Sometimes the approximation from the decimal literal to the double representation, and then the approximation of that value to the float representation, ends up being less accurate than the straight decimal to float approximation. One example of this 1.0000001788139343 (thanks to stephentyrone for finding this example).
Somewaht more safely, you can do the comparison between doubles, but use a float literal in the original assignment:
double d = 1.3f;
float f = 1.3f;
System.out.println(d == f); // Prints true
In the latter case, it's a bit like saying:
decimal10 oneThird = 0.3333300000
However, as pointed out in the comments, you almost certainly shouldn't be comparing floating point values with ==. It's almost never the right thing to do, because of precisely this sort of thing. Usually if you want to compare two values you do it with some sort of "fuzzy" equality comparison, checking whether the two numbers are "close enough" for your purposes. See the Java Traps: double page for more information.
If you really need to check for absolute equality, that usually indicates that you should be using a different numeric format in the first place - for instance, for financial data you should probably be using BigDecimal.
A float is a single precision floating point number. A double is a double precision floating point number. More details here: http://www.concentric.net/~Ttwang/tech/javafloat.htm
Note: It is a bad idea to check exact equality for floating point numbers. Most of the time, you want to do a comparison based on a delta or tolerance value.
For example:
float a = 1.3f;
double b = 1.3;
float delta = 0.000001f;
if (Math.abs(a - b) < delta)
{
System.out.println("Close enough!");
}
else
{
System.out.println("Not very close!");
}
Some numbers can't be represented exactly in floating point (e.g. 0.01) so you might get unexpected results when you compare for equality.
Read this article.
The above article clearly illustrates with examples your scenario while using double and float types.
float a=1.3f;
double b=1.3;
At this point you have two variables containing binary approximations to the Real number 1.3. The first approximation is accurate to about 7 decimal digits, and the second one is accurate to about 15 decimal digits.
if(a==b) {
The expression a==b is evaluate in two stages. First the value of a is converted from a float to a double by padding the binary representation. The result is still only accurate to about 7 decimal digits as a representation of the Real 1.3. Next you compare the two different approximations. Since they are different, the result of a==b is false.
There are two lessons to learn:
Floating point (and double) literals are almost always approximations; e.g. actual number that corresponds to the literal 1.3f is not exactly equal to the Real number 1.3.
Every time you do a floating point computation, errors creep in. These errors tend to build up. So when you are comparing floating points / double numbers it is usually a mistake to use a simple "==", "<", etcetera. Instead you should use |a - b| < delta where delta is chosen appropriately. (And figuring out what is an appropriate delta is not always straight-forward either.)
You should have taken that course in Numerical Analysis :-)
Never check for equality between floating point numbers. Specifically, to answer your question, the number 1.3 is hard to represent as a binary floating point and the double and float representations are different.
The problem is that Java (and alas .NET as well) is inconsistent about whether a float value represents a single precise numeric quantity or a range of quantities. If a float is considered to represents an exact numeric quantity of the form Mant * 2^Exp, where Mant is an integer 0 to 2^25 and Exp is an integer), then an attempt to cast any number not of that form to float should throw an exception. If it's considered to represent "the locus of numbers for which some particular representation in the above form has been deemed likely to be the best", then a double-to-float cast would be correct even for double values not of the above form [casting the double that best represents a quantity to a float will almost always yield the float that best represents that quantity, though in some corner cases (e.g. numeric quantities in the range 8888888.500000000001 to 8888888.500000000932) the float which is chosen may be a few parts per trillion worse than the best possible float representation of the actual numeric quantity].
To use an analogy, suppose two people each have a ten-centimeter-long object and they measure it. Bob uses an expensive set of calibers and determines that his object is 3.937008" long. Joe uses a tape measure and determines that his object is 3 15/16" long. Are the objects the same size? If one converts Joe's measurement to millionths of an inch (3.937500") the measurements will appear different, but one instead converts Bob's measurement to the nearest 1/256" fraction, they will appear equal. Although the former comparison might seem more "precise", the latter is apt to be more meaningful. Joe's measurement if 3 15/16" doesn't really mean 3.937500"--it means "a distance which, using a tape measure, is indistinguishable from 3 15/16". And 3.937008" is, like the size of Joe's object, a distance which using a tape measure would be indistinguishable from 3 15/16.
Unfortunately, even though it would be more meaningful to compare the measurements using the lower precision, Java's floating-point-comparison rules assume that a float represents a single precise numeric quantity, and performs comparisons on that basis. While there are some cases where this is useful (e.g. knowing whether the particular double produced by casting some value to float and back to double would match the starting value), in general direct equality comparisons between float and double are not meaningful. Even though Java does not require it, one should always cast the operands of a floating-point equality comparison to be the same type. The semantics that result from casting the double to float before the comparison are different from those of casting the float to double, and the behavior Java picks by default (cast the float to double) is often semantically wrong.
Actually neither float nor double can store 1.3. I am not kidding. Watch this video carefully.
https://www.youtube.com/watch?v=RtHKwsXuRkk&index=50&list=PL6pxHmHF3F5JPdnEqKALRMgogwYc2szp1

Can every float be expressed exactly as a double?

Can every possible value of a float variable can be represented exactly in a double variable?
In other words, for all possible values X will the following be successful:
float f1 = X;
double d = f1;
float f2 = (float)d;
if(f1 == f2)
System.out.println("Success!");
else
System.out.println("Failure!");
My suspicion is that there is no exception, or if there is it is only for an edge case (like +/- infinity or NaN).
Edit: Original wording of question was confusing (stated two ways, one which would be answered "no" the other would be answered "yes" for the same answer). I've reworded it so that it matches the question title.
Yes.
Proof by enumeration of all possible cases:
public class TestDoubleFloat {
public static void main(String[] args) {
for (long i = Integer.MIN_VALUE; i <= Integer.MAX_VALUE; i++) {
float f1 = Float.intBitsToFloat((int) i);
double d = (double) f1;
float f2 = (float) d;
if (f1 != f2) {
if (Float.isNaN(f1) && Float.isNaN(f2)) {
continue; // ok, NaN
}
fail("oops: " + f1 + " != " + f2);
}
}
}
}
finishes in 12 seconds on my machine. 32 bits are small.
In theory, there is not such a value, so "yes", every float should be representable as a double.. Converting from a float to a double should involve just tacking four bytes of 00 on the end -- they are stored using the same format, just with different sized fields.
Yes, floats are a subset of doubles. Both floats and doubles have the form (sign * a * 2^b). The difference between floats and doubles is the number of bits in a & b. Since doubles have more bits available, assigning a float value to a double effectively means inserting extra 0 bits.
As everyone has already said, "no". But that's actually a "yes" to the question itself, i.e. every float can be exactly expressed as a double. Confusing. :)
If I'm reading the language specification correctly (and as everyone else is confirming), there is no such value.
That is, each claims only to hold only IEEE 754 standard values, so casts between the two should incur no change except in memory given.
(clarification: There would be no change as long as the value was small enough to be held in a float; obviously if the value was too many bits to be held in a float to begin with, casting from double to float would result in a loss of precision.)
#KenG: This code:
float a = 0.1F
println "a=${a}"
double d = a
println "d=${d}"
fails not because 0.1f can't be exactly represented. The question was "is there a float value that cannot be represented as a double", which this code doesn't prove. Although 0.1f can't be stored exactly, the value that a is given (which isn't 0.1f exactly) can be stored as a double (which also won't be 0.1f exactly). Assuming an Intel FPU, the bit pattern for a is:
0 01111011 10011001100110011001101
and the bit pattern for d is:
0 01111111011 100110011001100110011010 (followed by lots more zeros)
which has the same sign, exponent (-4 in both cases) and the same fractional part (separated by spaces above). The difference in the output is due to the position of the second non-zero digit in the number (the first is the 1 after the point) which can only be represented with a double. The code that outputs the string format stores intermediate values in memory and is specific to floats and doubles (i.e. there is a function double-to-string and another float-to-string). If the to-string function was optimised to use the FPU stack to store the intermediate results of the to-string process, the output would be the same for float and double since the FPU uses the same, larger format (80bits) for both float and double.
There are no float values that can't be stored identically in a double, i.e. the set of float values is a sub-set of the the set of double values.
Snark: NaNs will compare differently after (or indeed before) conversion.
This does not, however, invalidate the answers already given.
I took the code you listed and decided to try it in C++ since I thought it might execute a little faster and it is significantly easier to do unsafe casting. :-D
I found out that for valid numbers, the conversion works and you get the exact bitwise representation after the cast. However, for non-numbers, e.g. 1.#QNAN0, etc., the result will use a simplified representation of the non-number rather than the exact bits of the source. For example:
**** FAILURE **** 2140188725 | 1.#QNAN0 -- 0xa0000000 0x7ffa1606
I cast an unsigned int to float then to double and back to float. The number 2140188725 (0x7F90B035) results in a NAN and converting to double and back is still a NAN but not the exact same NAN.
Here is the simple C++ code:
typedef unsigned int uint;
for (uint i = 0; i < 0xFFFFFFFF; ++i)
{
float f1 = *(float *)&i;
double d = f1;
float f2 = (float)d;
if(f1 != f2)
printf("**** FAILURE **** %u | %f -- 0x%08x 0x%08x\n", i, f1, f1, f2);
if ((i % 1000000) == 0)
printf("Iteration: %d\n", i);
}
The answer to the first question is yes, the answer to the 'in other words', however is no. If you change the test in the code to be if (!(f1 != f2)) the answer to the second question becomes yes -- it will print 'Success' for all float values.
In theory every normal single can have the exponent and mantissa padded to create a double and then remove the padding and you return to the original single.
When you go from theory to reality is when you will have problems. I dont know if you were interested in theory or implementation. If it is implementation then you can rapidly get into trouble.
IEEE is a horrible format, my understanding it was intentionally designed to be so tough that nobody could meet it and allow the market to catch up to intel (this was a while back) allowing for more competition. If that is true it failed, either way we are stuck with this dreadful spec. Something like the TI format is far superior for the real world in so many ways. I have no connection to either company or any of these formats.
Thanks to this spec there are very few if any fpus that actually meet it (in hardware or even in hardware plus the operating system), and those that do often fail on the next generation. (google: TestFloat). The problems these days tend to lie in the int to float and float to int and not single to double and double to single as you have specified above. Of course what operation is the fpu going to perform to do that conversion? Add 0? Multiply by 1? Depends on the fpu and the compiler.
The problem with IEEE related to your question above is that there is more than one way a number, not every number but many numbers can be represented. If I wanted to break your code I would start with minus zero in the hope that one of the two operations would convert it to a plus zero. Then I would try denormals. And it should fail with a signaling nan, but you called that out as a known exception.
The problem is that equal sign, here is rule number one about floating point, never use an equal sign. Equals is a bit comparison not a value comparison, if you have two values represented in different ways (plus zero and minus zero for example) the bit comparison will fail even though its the same number. Greater than and less than are done in the fpu, equals is done with the integer alu.
I realize that you probably used the equal to explain the problem and not necessarily the code you wanted to succeed or fail.
If a floating-point type is viewed as representing a precise value, then as other posters have noted, every float value is representable as a double, but only a few values of double can be represented by float. On the other hand, if one recognizes that floating-point values are approximations, one will realize the real situation is reversed. If one uses a very precise instrument to measure something which is 3.437mm, one may correctly describe is size as 3.4mm. if one uses a ruler to measure the object as 3.4mm, it would be incorrect to describe its size as 3.400mm.
Even bigger problems exist at the top of the range. There is a float value that represents: "computed value exceeded 2^127 by an unknown amount", but there's no double value that indicates such a thing. Casting an "infinity" from single to double will yield a value "computed value exceeded 2^1023 by an unknown amount" which is off by a factor of over a googol.

Categories

Resources