use case for Double as HashMap Keys [duplicate] - java

I was thinking of using a Double as the key to a HashMap, but I know floating-point comparisons are unsafe, and that got me thinking: is the equals method on the Double class also unsafe? If it is, then the hashCode method is probably also incorrect, which would mean that using Double as the key to a HashMap would lead to unpredictable behavior.
Can anyone confirm any of my speculation here?

Short answer: Don't do it
Long answer: Here is how the key is going to be computed:
The actual key will be a java.lang.Double object, since keys must be objects. Here is its hashCode() method:
public int hashCode() {
long bits = doubleToLongBits(value);
return (int)(bits ^ (bits >>> 32));
}
The doubleToLongBits() method basically takes the 8 bytes of the double and represents them as a long. This means that tiny differences in how a double is computed produce completely different keys, so you will get key misses.
If you can settle for a fixed number of digits after the decimal point, multiply by 10^(number of digits after the decimal point) and convert to an integer type (for example, multiply by 100 for two digits).
It will be much safer.
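A minimal sketch of that scaling idea, assuming two digits after the decimal point are enough; using Math.round (rather than a plain cast) is my choice here so that values such as 433.29999999999995 and 433.3 land on the same key:

import java.util.HashMap;
import java.util.Map;

public class ScaledKeyExample {
    // Scale factor for two digits after the decimal point.
    private static final double SCALE = 100.0;

    static long toKey(double value) {
        return Math.round(value * SCALE); // 433.29999999999995 and 433.3 both become 43330
    }

    public static void main(String[] args) {
        Map<Long, String> map = new HashMap<>();
        map.put(toKey(371.4 + 61.9), "some value");   // key computed from a sum
        System.out.println(map.get(toKey(433.3)));    // prints "some value"
    }
}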

I think you are right. Although the hash of a double is an int, the double's bit-level representation can still mess up the hash. That is why, as Josh Bloch mentions in Effective Java, when you use a double as an input to a hash function, you should use doubleToLongBits(). Similarly, use floatToIntBits for floats.
In particular, to use a double as your hash, following Josh Bloch's recipe, you would do:
public int hashCode() {
int result = 17;
long temp = Double.doubleToLongBits(the_double_field);
result = 37 * result + ((int) (temp ^ (temp >>> 32)));
return result;
}
This is from Item 8 of Effective Java, "Always override hashCode when you override equals". It can be found in this pdf of the chapter from the book.
Hope this helps.

It depends on how you would be using it.
If you're happy with only being able to find the value based on the exact same bit pattern (or potentially an equivalent one, such as +/- 0 and various NaNs) then it might be okay.
In particular, all NaNs would end up being considered equal, but +0 and -0 would be considered different. From the docs for Double.equals:
Note that in most cases, for two instances of class Double, d1 and d2, the value of d1.equals(d2) is true if and only if d1.doubleValue() == d2.doubleValue() also has the value true. However, there are two exceptions:
If d1 and d2 both represent Double.NaN, then the equals method returns true, even though Double.NaN == Double.NaN has the value false.
If d1 represents +0.0 while d2 represents -0.0, or vice versa, the equal test has the value false, even though +0.0 == -0.0 has the value true.
This definition allows hash tables to operate properly.
Most likely you're interested in "numbers very close to the key" though, which makes it a lot less viable. In particular if you're going to do one set of calculations to get the key once, then a different set of calculations to get the key the second time, you'll have problems.

The problem is not the hash code but the precision of the doubles. This will cause some strange results. Example:
double x = 371.4;
double y = 61.9;
double key = x + y; // expected 433.3
Map<Double, String> map = new HashMap<Double, String>();
map.put(key, "Sum of " + x + " and " + y);
System.out.println(map.get(433.3)); // prints null
The calculated value (key) is "433.29999999999995", which is not equal to 433.3, so you don't find the entry in the Map (the hash code is probably also different, but that is not the main problem).
If you use
map.get(key)
it should find the entry...
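For completeness, a minimal runnable version of the example above, showing that a lookup with the very same double value (hence the same bits) does find the entry:

import java.util.HashMap;
import java.util.Map;

public class DoubleKeyLookup {
    public static void main(String[] args) {
        double x = 371.4;
        double y = 61.9;
        double key = x + y; // 433.29999999999995, not 433.3

        Map<Double, String> map = new HashMap<>();
        map.put(key, "Sum of " + x + " and " + y);

        System.out.println(map.get(433.3)); // null: a different double value
        System.out.println(map.get(key));   // found: exactly the same bits as were put in
    }
}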

Short answer: It probably won't work.
Honest answer: It all depends.
Longer answer: The hash code isn't the issue, it's the nature of equal comparisons on floating point. As Nalandial and the commenters on his post point out, ultimately any match against a hash table still ends up using equals to pick the right value.
So the question is, are your doubles generated in such a way that you know that equals really means equals? If you read or compute a value, store it in the hash table, and then later read or compute the value using exactly the same computation, then Double.equals will work. But otherwise it's unreliable: 1.2 + 2.3 does not necessarily equal 3.5, it might equal 3.4999995 or whatever. (Not a real example, I just made that up, but that's the sort of thing that happens.) You can compare floats and doubles reasonably reliably for less or greater, but not for equals.

Maybe BigDecimal gets you where you want to go?
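For instance, here is a sketch of what that could look like, using BigDecimal keys normalised to a fixed scale (note that BigDecimal.equals also compares scale, so un-normalised keys would still miss):

import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.HashMap;
import java.util.Map;

public class BigDecimalKeyExample {
    // Normalise every key to two decimal places so equals()/hashCode() agree.
    static BigDecimal key(double value) {
        return BigDecimal.valueOf(value).setScale(2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        Map<BigDecimal, String> map = new HashMap<>();
        map.put(key(371.4 + 61.9), "some value");  // 433.29999999999995 -> 433.30
        System.out.println(map.get(key(433.3)));   // prints "some value"
    }
}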

The hash of the double is used, not the double itself.
Edit: Thanks, Jon, I actually didn't know that.
I'm not sure about this (you should just look at the source code of the Double object) but I would think any issues with floating point comparisons would be taken care of for you.

It depends on how you store and access your map. Yes, similar values could end up being slightly different and therefore not hash to the same value.
private static final double key1 = 1.1+1.3-1.6;
private static final double key2 = 123321;
...
map.get(key1);
would be all good, however
map.put(1.1+2.3, value);
...
map.get(5.0 - 1.6);
would be dangerous

Related

Float-type useless?

My Java book told me not to compare two float values directly when they are stored in variables, because a variable only holds an approximation of the number. Instead it suggested checking whether the absolute value of the difference equals 0; if it does, they are the same. How is this helpful? If I store 5 in variable a and 5 in variable b, how can they not be the same? And how does comparing the absolute value help?
double a = 5, b = 5;
if (Math.abs(a - b) == 0) {
    // run code
}
if (a == b) {
    // run code
}
I don't see at all why the above method would be more accurate, since if 'a' is not equal to 'b' it won't matter whether I use Math.abs.
I appreciate replies and thank you for your time.
I tried both methods.
Inaccuracy with comparisons using the == operator is caused by the way double values are stored in a computer's memory. We need to remember that there is an infinite number of values that must fit in limited memory space, usually 64 bits. As a result, we can't have an exact representation of most double values in our computers. They must be rounded to be saved.
Because of the rounding inaccuracy, interesting errors might occur:
double d1 = 0;
for (int i = 1; i <= 8; i++) {
d1 += 0.1;
}
double d2 = 0.1 * 8;
System.out.println(d1);
System.out.println(d2);
Both variables, d1 and d2, should equal 0.8. However, when we run the code above, we'll see the following results:
0.7999999999999999
0.8
In that case, comparing both values with the == operator would produce a wrong result. For this reason, we must use a more complex comparison algorithm.
If we want to have the best precision and control over the rounding mechanism, we can use java.math.BigDecimal class.
The recommended algorithm to compare double values in plain Java is a threshold comparison method. In this case, we need to check whether the difference between both numbers is within the specified tolerance, commonly called epsilon:
double epsilon = 0.000001d;
assertThat(Math.abs(d1 - d2) < epsilon).isTrue();
The smaller the epsilon's value, the greater the comparison accuracy. However, if we specify the tolerance value too small, we'll get the same false result as in the simple == comparison.
The thing is, that statement you read in your Java book just protects you from errors that would be hard to debug later. Computers store floating-point numbers in binary, and not every value with a finite decimal expansion has a finite binary expansion, so you end up with something like 0.7 stored as 0.699999999998511. You may not see a difference in the simple comparisons you wrote, but in a real project, where you have many more variables and add to and subtract from them, the difference can appear in very surprising places.
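A quick way to see that stored-value effect for yourself: new BigDecimal(double) preserves the exact binary value of a double (unlike BigDecimal.valueOf, which goes through the rounded decimal string), so it can print what is really stored. A small sketch:

import java.math.BigDecimal;

public class ExactDoubleValue {
    public static void main(String[] args) {
        System.out.println(new BigDecimal(0.1));
        // prints 0.1000000000000000055511151231257827021181583404541015625
        System.out.println(new BigDecimal(0.7));
        // prints a value slightly below 0.7 (0.69999999999999995559...)
    }
}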
There is a classic question about floating-point numbers; you will see the same issue in pretty much any language:
Why does this code print a result of '7'?

Java hashcode() collision for objects containing different but similar Strings

While verifying output data of my program, I identified cases for which hash codes of two different objects were identical. To get these codes, I used the following function:
int getHash( long lID, String sCI, String sCO, double dSR, double dGR, String sSearchDate ) {
int result = 17;
result = 31 * result + (int) (lID ^ (lID >>> 32));
long temp;
temp = Double.doubleToLongBits(dGR);
result = 31 * result + (int) (temp ^ (temp >>> 32));
temp = Double.doubleToLongBits(dSR);
result = 31 * result + (int) (temp ^ (temp >>> 32));
result = 31 * result + (sCI != null ? sCI.hashCode() : 0);
result = 31 * result + (sCO != null ? sCO.hashCode() : 0);
result = 31 * result + (sSearchDate != null ? sSearchDate.hashCode() : 0);
return result;
}
These are two example cases:
getHash( 50122,"03/25/2015","03/26/2015",4.0,8.0,"03/24/15 06:01" )
getHash( 51114,"03/24/2015","03/25/2015",4.0,8.0,"03/24/15 06:01" )
I suppose this issue arises because I have three very similar strings present in my data, and the differences in hash code between strings A and B and between B and C are of the same size, leading to an identical returned hash code.
The hashCode() implementation proposed by IntelliJ uses 31 as the multiplier for each variable that contributes to the final hash code. I was wondering why one does not use a different value for each variable (like 33, 37, 41, which I have seen mentioned in other posts dealing with hash codes)? In my case, this would differentiate my two objects.
But I'm wondering whether this could then lead to issues in other cases?
Any ideas or hints on this? Thank you very much!
The hashCode() contract allows different objects to have the same hash code. From the documentation:
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
But, since you've got a bunch of parameters for your hash, you may consider using Objects.hash() instead of doing your own implementation:
int getHash(long lID, String sCI, String sCO, double dSR, double dGR, String sSearchDate) {
return Objects.hash(lID, sCI, sCO, dSR, dGR, sSearchDate);
}
For example:
Objects.hash(50122, "03/25/2015", "03/26/2015", 4.0, 8.0, "03/24/15 06:01")
Objects.hash(51114, "03/24/2015", "03/25/2015", 4.0, 8.0, "03/24/15 06:01")
Results in:
-733895022
-394580334
The code you have shown may add zero, for example in
result = 31 * result + (sCI != null ? sCI.hashCode() : 0);
When several zeros are added, this can degenerate into a multiplication of
31 * 31 * 31 ...
which could destroy uniqueness.
However the hashCode method is not intended to return unique values. It simply should provide a uniform distribution of values and it should be easy to compute (or cache hashCode as the String class does).
From a more theoretical point of view, a hashCode maps from a large set A into a smaller set B. Hence collisions (different elements from A mapping to the same value in B) are unavoidable. You could choose a set B that is bigger than A, but that would defeat the purpose of hashCode: performance optimization. Really, you could achieve anything that hashCode achieves with a linked list and some additional logic.
Prime numbers are chosen because they result in a better distribution. For example, with non-primes, 4*3 = 12 = 2*6 would result in the same hashCode. 31 is often chosen because it is a Mersenne prime (2^n - 1), which is said to perform better on processors (I'm not sure about that).
Since the hashCode method is not specified to unambiguously identify elements, non-unique hashCodes are perfectly fine. Assuming uniqueness of hashCodes is a bug.
However, a HashMap can be described as a set of buckets, with each bucket holding a singly linked list of elements. The buckets are indexed by the hashCode. Hence providing identical hashCodes leads to fewer buckets with longer lists.
When an object is searched for in a hash data structure, the hashCode is used to get the bucket index. For each object in this bucket the equals method is invoked, so long lists mean a large number of invocations of equals.
Conclusion: Assuming that the hashCode method is used correctly, this cannot cause a program to malfunction. However, it may result in a severe performance penalty.
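A minimal sketch of that degenerate case: with a constant hashCode the map stays correct, but every lookup has to wade through one overfull bucket (the class and field names here are made up for illustration):

import java.util.HashMap;
import java.util.Map;

public class ConstantHashKey {
    final int id;

    ConstantHashKey(int id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        return o instanceof ConstantHashKey && ((ConstantHashKey) o).id == id;
    }

    @Override
    public int hashCode() {
        return 42; // legal, but forces every key into the same bucket
    }

    public static void main(String[] args) {
        Map<ConstantHashKey, String> map = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            map.put(new ConstantHashKey(i), "value " + i);
        }
        // Still returns the right value, but each get() is a scan over many equals() calls.
        System.out.println(map.get(new ConstantHashKey(9_999)));
    }
}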
As the other answers explain well, it is allowed for hashCode to return the same value for different objects. This is not a cryptographic hash value, so it's easy to find examples of hashCode collisions.
However, I would point out a problem in your code: if you have written the hashCode method yourself, you should definitely be using a better hash algorithm. Take a look at MurmurHash: http://en.wikipedia.org/wiki/MurmurHash. You want to use the 32-bit version; there are also Java implementations.
Yes, hash collisions can lead to performance issues, so it's important to use a good hash algorithm. Additionally, for security, MurmurHash accepts a seed value that makes hash collision denial-of-service attacks harder. You should generate the seed randomly at the start of the program. Your implementation of the hashCode method is vulnerable to these hash collision DoS attacks.
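As a sketch of what a seeded MurmurHash version of the method might look like, assuming Guava is on the classpath (Hashing.murmur3_32 and the hash(...) helper below are my choices for illustration, not part of the original code):

import com.google.common.hash.HashFunction;
import com.google.common.hash.Hashing;

import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

public class SeededHash {
    // Seed generated once at program start, as suggested above.
    private static final int SEED = new SecureRandom().nextInt();
    private static final HashFunction MURMUR = Hashing.murmur3_32(SEED);

    static int hash(long lID, String sCI, String sCO, double dSR, double dGR, String sSearchDate) {
        return MURMUR.newHasher()
                .putLong(lID)
                .putString(sCI, StandardCharsets.UTF_8)
                .putString(sCO, StandardCharsets.UTF_8)
                .putDouble(dSR)
                .putDouble(dGR)
                .putString(sSearchDate, StandardCharsets.UTF_8)
                .hash()
                .asInt();
    }
}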

Why BigDecimal("5.50") not equals to BigDecimal("5.5") and how to work around this issue?

Actually, I've found possible solution
//returns true
new BigDecimal("5.50").doubleValue() == new BigDecimal("5.5").doubleValue()
Of course, it can be improved with something like Math.abs (v1 - v2) < EPS to make the comparison more robust, but the question is whether this technique acceptable or is there a better solution?
If someone knows why java designers decided to implement BigDecimal's equals in that way, it would be interesting to read.
From the javadoc of BigDecimal
equals
public boolean equals(Object x)
Compares this BigDecimal with the specified Object for equality. Unlike compareTo, this method considers two BigDecimal objects equal only if they are equal in value and scale (thus 2.0 is not equal to 2.00 when compared by this method).
Simply use compareTo() == 0
Since Java 1.5, the simplest expression to compare while ignoring trailing zeros is:
bd1.stripTrailingZeros().equals(bd2.stripTrailingZeros())
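Putting the compareTo() and stripTrailingZeros() suggestions side by side (a small sketch):

import java.math.BigDecimal;

public class BigDecimalEquality {
    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("5.50");
        BigDecimal b = new BigDecimal("5.5");

        System.out.println(a.equals(b));                                           // false: same value, different scale
        System.out.println(a.compareTo(b) == 0);                                   // true: compares the numeric value only
        System.out.println(a.stripTrailingZeros().equals(b.stripTrailingZeros())); // true: scales normalised first
    }
}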
Using == to compare doubles seems like a bad idea in general.
You could call setScale with the same scale on the numbers you're comparing:
new BigDecimal("5.50").setScale(2).equals(new BigDecimal("5.5").setScale(2))
where you would be setting the scale to the larger of the two:
BigDecimal a1 = new BigDecimal("5.051");
BigDecimal b1 = new BigDecimal("5.05");
int maxScale = Math.max(a1.scale(), b1.scale()); // take the larger of the two scales
System.out.println(
a1.setScale(maxScale).equals(b1.setScale(maxScale))
? "are equal"
: "are different" );
Using compareTo() == 0 is the best answer, though. Increasing the scale of one of the numbers in my approach above is likely the "unnecessary inflation" that the compareMagnitude method's documentation mentions when it says:
/**
* Version of compareTo that ignores sign.
*/
private int compareMagnitude(BigDecimal val) {
// Match scales, avoid unnecessary inflation
long ys = val.intCompact;
long xs = this.intCompact;
and of course compareTo is a lot easier to use since it's already implemented for you.

What's wrong with using == to compare floats in Java?

According to this java.sun page == is the equality comparison operator for floating point numbers in Java.
However, when I type this code:
if(sectionID == currentSectionID)
into my editor and run static analysis, I get: "JAVA0078 Floating point values compared with =="
What is wrong with using == to compare floating point values? What is the correct way to do it?
the correct way to test floats for 'equality' is:
if(Math.abs(sectionID - currentSectionID) < epsilon)
where epsilon is a very small number like 0.00000001, depending on the desired precision.
Floating point values can be off by a little bit, so they may not report as exactly equal. For example, setting a float to "6.1" and then printing it out again, you may get a reported value of something like "6.099999904632568359375". This is fundamental to the way floats work; therefore, you don't want to compare them using equality, but rather comparison within a range, that is, if the diff of the float to the number you want to compare it to is less than a certain absolute value.
This article on the Register gives a good overview of why this is the case; useful and interesting reading.
Just to give the reason behind what everyone else is saying.
The binary representation of a float is kind of annoying.
In binary, most programmers know the correlation between 1b=1d, 10b=2d, 100b=4d, 1000b=8d
Well it works the other way too.
.1b=.5d, .01b=.25d, .001b=.125d, ...
The problem is that there is no exact way to represent most decimal numbers like .1, .2, .3, etc. All you can do is approximate them in binary. The system does a little fudge-rounding when the numbers print so that it displays .1 instead of .10000000000001 or .099999999999 (which are probably just as close to the stored representation as .1 is).
Edit from comment: The reason this is a problem is our expectations. We fully expect 2/3 to be fudged at some point when we convert it to decimal, either .7 or .67 or .666667.. But we don't automatically expect .1 to be rounded in the same way as 2/3--and that's exactly what's happening.
By the way, if you are curious the number it stores internally is a pure binary representation using a binary "Scientific Notation". So if you told it to store the decimal number 10.75d, it would store 1010b for the 10, and .11b for the decimal. So it would store .101011 then it saves a few bits at the end to say: Move the decimal point four places right.
(Although technically it's no longer a decimal point, it's now a binary point, but that terminology wouldn't have made things more understandable for most people who would find this answer of any use.)
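For the curious, a small sketch that inspects those stored fields directly via Double.doubleToLongBits (10.75 is used because it is exactly representable):

public class DoubleBits {
    public static void main(String[] args) {
        // 10.75 decimal is 1010.11 in binary, i.e. .101011 shifted left four places.
        long bits = Double.doubleToLongBits(10.75);

        long sign     = bits >>> 63;                    // 1 bit
        long exponent = ((bits >>> 52) & 0x7FF) - 1023; // 11 bits, stored with a bias of 1023
        long fraction = bits & 0x000FFFFFFFFFFFFFL;     // 52 bits, the leading 1 is implied

        System.out.println("sign     = " + sign);       // 0
        System.out.println("exponent = " + exponent);   // 3, i.e. 1.01011 x 2^3
        System.out.println("fraction = " + Long.toBinaryString(fraction));
        // prints 1011 followed by 47 zeros (the leading 0 of 01011 is not shown)
    }
}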
What is wrong with using == to compare floating point values?
Because it's not true that 0.1 + 0.2 == 0.3
As of today, the quick & easy way to do it is:
if (Float.compare(sectionID, currentSectionID) == 0) {...}
Note, however, that Float.compare applies no tolerance of its own; it is an exact comparison that merely treats NaN and -0.0f consistently (unlike ==). If a margin of error (the epsilon from @Victor's answer) is needed, then
float epsilon = Float.MIN_NORMAL;
if(Math.abs(sectionID - currentSectionID) < epsilon){...}
is another solution option.
I think there is a lot of confusion around floats (and doubles), it is good to clear it up.
There is nothing inherently wrong in using floats as IDs in standard-compliant JVM [*]. If you simply set the float ID to x, do nothing with it (i.e. no arithmetics) and later test for y == x, you'll be fine. Also there is nothing wrong in using them as keys in a HashMap. What you cannot do is assume equalities like x == (x - y) + y, etc. This being said, people usually use integer types as IDs, and you can observe that most people here are put off by this code, so for practical reasons, it is better to adhere to conventions. Note that there are as many different double values as there are long values, so you gain nothing by using double. Also, generating "next available ID" can be tricky with doubles and requires some knowledge of the floating-point arithmetic. Not worth the trouble.
On the other hand, relying on numerical equality of the results of two mathematically equivalent computations is risky. This is because of the rounding errors and loss of precision when converting from decimal to binary representation. This has been discussed to death on SO.
[*] When I said "standard-compliant JVM" I wanted to exclude certain brain-damaged JVM implementations. See this.
Floating point values are not reliable, due to roundoff error.
As such they should probably not be used as key values, such as sectionID. Use integers instead, or long if int doesn't contain enough possible values.
This is a problem not specific to java. Using == to compare two floats/doubles/any decimal type number can potentially cause problems because of the way they are stored.
A single-precision float (as per IEEE standard 754) has 32 bits, distributed as follows:
1 bit - Sign (0 = positive, 1 = negative)
8 bits - Exponent (a special (bias-127) representation of the x in 2^x)
23 bits - Mantissa. The actual number that is stored.
The mantissa is what causes the problem. It's kind of like scientific notation, only the number in base 2 (binary) looks like 1.110011 x 2^5 or something similar.
But in binary, the first digit is always a 1 (except for the representation of 0).
Therefore, to save a bit of memory space (pun intended), IEEE decided that the leading 1 should be assumed. For example, a mantissa of 1011 really is 1.1011.
This can cause some issues with comparison, especially with 0, since with an assumed leading 1, zero cannot be represented directly and needs a special encoding.
This is the main reason why == is discouraged, in addition to the floating point math issues described by other answers.
Java runs across many different platforms, each of which could have its own floating-point quirks. That makes it even more important to avoid ==.
The proper way to compare two floats (not-language specific mind you) for equality is as follows:
if(ABS(float1 - float2) < ACCEPTABLE_ERROR)
//they are approximately equal
where ACCEPTABLE_ERROR is #defined or some other constant equal to 0.000000001 or whatever precision is required, as Victor mentioned already.
Some languages have this functionality or this constant built in, but generally this is a good habit to be in.
Here is a very long (but hopefully useful) discussion about this and many other floating point issues you may encounter: What Every Computer Scientist Should Know About Floating-Point Arithmetic
In addition to previous answers, you should be aware that there are strange behaviours associated with -0.0f and +0.0f (they are == but not equals) and Float.NaN (it is equals but not ==) (hope I've got that right - argh, don't do it!).
Edit: Let's check!
import static java.lang.Float.NaN;
public class Fl {
public static void main(String[] args) {
System.err.println( -0.0f == 0.0f); // true
System.err.println(new Float(-0.0f).equals(new Float(0.0f))); // false
System.err.println( NaN == NaN); // false
System.err.println(new Float( NaN).equals(new Float( NaN))); // true
}
}
Welcome to IEEE/754.
First of all, are they float or Float? If one of them is a Float, you should use the equals() method. Also, probably best to use the static Float.compare method.
You can use Float.floatToIntBits().
Float.floatToIntBits(sectionID) == Float.floatToIntBits(currentSectionID)
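This comparison has the same semantics as Float.equals rather than ==: all NaNs compare equal to each other, while +0.0f and -0.0f compare as different. A quick sketch:

public class FloatBitsCompare {
    public static void main(String[] args) {
        float a = Float.NaN;
        float b = Float.NaN;

        System.out.println(a == b);                                                    // false: NaN is never == anything
        System.out.println(Float.floatToIntBits(a) == Float.floatToIntBits(b));        // true: floatToIntBits collapses NaNs to one pattern
        System.out.println(0.0f == -0.0f);                                             // true
        System.out.println(Float.floatToIntBits(0.0f) == Float.floatToIntBits(-0.0f)); // false: different bit patterns
    }
}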
The following automatically uses the best precision:
/**
* Compare two floats for (almost) equality. Will check whether they are
* at most 5 ULP apart.
*/
public static boolean isFloatingEqual(float v1, float v2) {
if (v1 == v2)
return true;
float absoluteDifference = Math.abs(v1 - v2);
float maxUlp = Math.max(Math.ulp(v1), Math.ulp(v2));
return absoluteDifference < 5 * maxUlp;
}
Of course, you might choose more or less than 5 ULPs (‘unit in the last place’).
If you’re into the Apache Commons library, the Precision class has compareTo() and equals() with both epsilon and ULP.
you may want it to be ==, but 123.4444444444443 != 123.4444444444442
If you *have to* use floats, strictfp keyword may be useful.
http://en.wikipedia.org/wiki/strictfp
Two different calculations which produce equal real numbers do not necessarily produce equal floating point numbers. People who use == to compare the results of calculations usually end up being surprised by this, so the warning helps flag what might otherwise be a subtle and difficult to reproduce bug.
Are you dealing with outsourced code that would use floats for things named sectionID and currentSectionID? Just curious.
@Bill K: "The binary representation of a float is kind of annoying." How so? How would you do it better? There are certain numbers that cannot be represented in any base properly, because they never end. Pi is a good example. You can only approximate it. If you have a better solution, contact Intel.
As mentioned in other answers, doubles can have small deviations. And you could write your own method to compare them using an "acceptable" deviation. However ...
There is an apache class for comparing doubles: org.apache.commons.math3.util.Precision
It contains some interesting constants: SAFE_MIN and EPSILON, which are the maximum possible deviations of simple arithmetic operations.
It also provides the necessary methods to compare, equal or round doubles. (using ulps or absolute deviation)
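A small sketch of those methods in action, reusing the 0.1 * 8 example from earlier (the method names are from commons-math3; the epsilon and ULP values are arbitrary choices):

import org.apache.commons.math3.util.Precision;

public class PrecisionExample {
    public static void main(String[] args) {
        double d1 = 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1; // 0.7999999999999999
        double d2 = 0.1 * 8;                                       // 0.8

        System.out.println(d1 == d2);                           // false
        System.out.println(Precision.equals(d1, d2, 1e-9));     // true: within an absolute epsilon
        System.out.println(Precision.equals(d1, d2, 1));        // true: at most 1 ulp apart
        System.out.println(Precision.compareTo(d1, d2, 1e-9));  // 0: considered equal
    }
}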
As a one-line answer, I can say you should use:
Float.floatToIntBits(sectionID) == Float.floatToIntBits(currentSectionID)
To help you learn more about using the related operators correctly, I'll elaborate on some cases here:
Generally, there are three ways to test strings in Java. You can use ==, .equals(), or Objects.equals().
How are they different? == tests for reference equality, meaning it checks whether the two objects are the same instance. On the other hand, .equals() tests whether the two strings are logically equal in value. Finally, Objects.equals() checks for nulls in the two strings, then determines whether to call .equals().
Ideal operator to use
Well, this has been the subject of lots of debate, because each of the three operators has its own set of strengths and weaknesses. For example, == is often the preferred option when comparing object references, but there are cases where it may seem to compare string values as well.
However, what you get is a false impression, because Java creates an illusion that you are comparing values when in the real sense you are not. Consider the two cases below:
Case 1:
String a="Test";
String b="Test";
if(a==b) ===> true
Case 2:
String nullString1 = null;
String nullString2 = null;
//evaluates to true
nullString1 == nullString2;
//throws an exception
nullString1.equals(nullString2);
So it's better to use each operator for the specific attribute it's designed to test. But in almost all cases Objects.equals() is the more universal option, which is why experienced developers opt for it.
Here you can get more details: http://fluentthemes.com/use-compare-strings-java/
The correct way would be
java.lang.Float.compare(float1, float2)
One way to reduce rounding error is to use double rather than float. This won't make the problem go away, but it does reduce the amount of error in your program and float is almost never the best choice. IMHO.

Can every float be expressed exactly as a double?

Can every possible value of a float variable can be represented exactly in a double variable?
In other words, for all possible values X will the following be successful:
float f1 = X;
double d = f1;
float f2 = (float)d;
if(f1 == f2)
System.out.println("Success!");
else
System.out.println("Failure!");
My suspicion is that there is no exception, or if there is it is only for an edge case (like +/- infinity or NaN).
Edit: Original wording of question was confusing (stated two ways, one which would be answered "no" the other would be answered "yes" for the same answer). I've reworded it so that it matches the question title.
Yes.
Proof by enumeration of all possible cases:
public class TestDoubleFloat {
public static void main(String[] args) {
for (long i = Integer.MIN_VALUE; i <= Integer.MAX_VALUE; i++) {
float f1 = Float.intBitsToFloat((int) i);
double d = (double) f1;
float f2 = (float) d;
if (f1 != f2) {
if (Float.isNaN(f1) && Float.isNaN(f2)) {
continue; // ok, NaN
}
fail("oops: " + f1 + " != " + f2);
}
}
}
}
finishes in 12 seconds on my machine. 32 bits are small.
In theory there is no such value, so "yes", every float should be representable as a double. Converting from a float to a double just widens the exponent field and pads the fraction with zero bits; the two types use the same IEEE 754 layout, just with different field sizes.
Yes, floats are a subset of doubles. Both floats and doubles have the form (sign * a * 2^b). The difference between floats and doubles is the number of bits in a & b. Since doubles have more bits available, assigning a float value to a double effectively means inserting extra 0 bits.
As everyone has already said, "no". But that's actually a "yes" to the question itself, i.e. every float can be exactly expressed as a double. Confusing. :)
If I'm reading the language specification correctly (and as everyone else is confirming), there is no such value.
That is, each claims to hold only IEEE 754 standard values, so a cast between the two should incur no change except in the memory used.
(clarification: There would be no change as long as the value was small enough to be held in a float; obviously if the value was too many bits to be held in a float to begin with, casting from double to float would result in a loss of precision.)
@KenG: This code:
float a = 0.1F
println "a=${a}"
double d = a
println "d=${d}"
fails not because 0.1f can't be exactly represented. The question was "is there a float value that cannot be represented as a double", which this code doesn't prove. Although 0.1f can't be stored exactly, the value that a is given (which isn't 0.1f exactly) can be stored as a double (which also won't be 0.1f exactly). Assuming an Intel FPU, the bit pattern for a is:
0 01111011 10011001100110011001101
and the bit pattern for d is:
0 01111111011 100110011001100110011010 (followed by lots more zeros)
which has the same sign, exponent (-4 in both cases) and the same fractional part (separated by spaces above). The difference in the output is due to the position of the second non-zero digit in the number (the first is the 1 after the point) which can only be represented with a double. The code that outputs the string format stores intermediate values in memory and is specific to floats and doubles (i.e. there is a function double-to-string and another float-to-string). If the to-string function was optimised to use the FPU stack to store the intermediate results of the to-string process, the output would be the same for float and double since the FPU uses the same, larger format (80bits) for both float and double.
There are no float values that can't be stored identically in a double, i.e. the set of float values is a subset of the set of double values.
Snark: NaNs will compare differently after (or indeed before) conversion.
This does not, however, invalidate the answers already given.
I took the code you listed and decided to try it in C++ since I thought it might execute a little faster and it is significantly easier to do unsafe casting. :-D
I found out that for valid numbers, the conversion works and you get the exact bitwise representation after the cast. However, for non-numbers, e.g. 1.#QNAN0, etc., the result will use a simplified representation of the non-number rather than the exact bits of the source. For example:
**** FAILURE **** 2140188725 | 1.#QNAN0 -- 0xa0000000 0x7ffa1606
I cast an unsigned int to float then to double and back to float. The number 2140188725 (0x7F90B035) results in a NAN and converting to double and back is still a NAN but not the exact same NAN.
Here is the simple C++ code:
typedef unsigned int uint;
for (uint i = 0; i < 0xFFFFFFFF; ++i)
{
float f1 = *(float *)&i;
double d = f1;
float f2 = (float)d;
if(f1 != f2)
printf("**** FAILURE **** %u | %f -- 0x%08x 0x%08x\n", i, f1, f1, f2);
if ((i % 1000000) == 0)
printf("Iteration: %d\n", i);
}
The answer to the first question is yes, the answer to the 'in other words', however is no. If you change the test in the code to be if (!(f1 != f2)) the answer to the second question becomes yes -- it will print 'Success' for all float values.
In theory every normal single can have the exponent and mantissa padded to create a double and then remove the padding and you return to the original single.
When you go from theory to reality is when you will have problems. I don't know whether you are interested in theory or implementation. If it is implementation, then you can rapidly get into trouble.
IEEE is a horrible format; my understanding is that it was intentionally designed to be so tough that nobody could meet it, allowing the market to catch up to Intel (this was a while back) and allowing for more competition. If that is true it failed; either way we are stuck with this dreadful spec. Something like the TI format is far superior for the real world in so many ways. I have no connection to either company or any of these formats.
Thanks to this spec there are very few if any fpus that actually meet it (in hardware or even in hardware plus the operating system), and those that do often fail on the next generation. (google: TestFloat). The problems these days tend to lie in the int to float and float to int and not single to double and double to single as you have specified above. Of course what operation is the fpu going to perform to do that conversion? Add 0? Multiply by 1? Depends on the fpu and the compiler.
The problem with IEEE related to your question above is that there is more than one way a number, not every number but many numbers can be represented. If I wanted to break your code I would start with minus zero in the hope that one of the two operations would convert it to a plus zero. Then I would try denormals. And it should fail with a signaling nan, but you called that out as a known exception.
The problem is that equals sign. Here is rule number one about floating point: never use an equals sign. Equals is a bit comparison, not a value comparison; if you have two values represented in different ways (plus zero and minus zero, for example) the bit comparison will fail even though it's the same number. Greater than and less than are done in the fpu; equals is done with the integer alu.
I realize that you probably used the equal to explain the problem and not necessarily the code you wanted to succeed or fail.
If a floating-point type is viewed as representing a precise value, then as other posters have noted, every float value is representable as a double, but only a few values of double can be represented by float. On the other hand, if one recognizes that floating-point values are approximations, one will realize the real situation is reversed. If one uses a very precise instrument to measure something which is 3.437mm, one may correctly describe its size as 3.4mm. If one uses a ruler to measure the object as 3.4mm, it would be incorrect to describe its size as 3.400mm.
Even bigger problems exist at the top of the range. There is a float value that represents: "computed value exceeded 2^127 by an unknown amount", but there's no double value that indicates such a thing. Casting an "infinity" from single to double will yield a value "computed value exceeded 2^1023 by an unknown amount" which is off by a factor of over a googol.
