I was reading about floating-point NaN values in the Java Language Specification (I'm boring). A 32-bit float has this bit format:
seee eeee emmm mmmm mmmm mmmm mmmm mmmm
s is the sign bit, e are the exponent bits, and m are the mantissa bits. A NaN value is encoded as an exponent of all 1s, and the mantissa bits are not all 0 (which would be +/- infinity). This means that there are lots of different possible NaN values (having different s and m bit values).
On this, JLS §4.2.3 says:
IEEE 754 allows multiple distinct NaN values for each of its single and double floating-point formats. While each hardware architecture returns a particular bit pattern for NaN when a new NaN is generated, a programmer can also create NaNs with different bit patterns to encode, for example, retrospective diagnostic information.
The text in the JLS seems to imply that the result of, for example, 0.0/0.0 has a hardware-dependent bit pattern, and that, depending on whether the expression was computed as a compile-time constant, the relevant hardware might be the machine the program was compiled on or the machine it runs on. This all seems very flaky if true.
I ran the following test:
System.out.println(Integer.toHexString(Float.floatToRawIntBits(0.0f/0.0f)));
System.out.println(Integer.toHexString(Float.floatToRawIntBits(Float.NaN)));
System.out.println(Long.toHexString(Double.doubleToRawLongBits(0.0d/0.0d)));
System.out.println(Long.toHexString(Double.doubleToRawLongBits(Double.NaN)));
The output on my machine is:
7fc00000
7fc00000
7ff8000000000000
7ff8000000000000
The output shows nothing unexpected. The exponent bits are all 1. The upper bit of the mantissa is also 1, which for NaNs apparently indicates a "quiet NaN" as opposed to a "signalling NaN" (https://en.wikipedia.org/wiki/NaN#Floating_point). The sign bit and the rest of the mantissa bits are 0. The output also shows that there was no difference between the NaNs generated on my machine and the constant NaNs from the Float and Double classes.
My question is, is that output guaranteed in Java, regardless of the CPU of the compiler or VM, or is it all genuinely unpredictable? The JLS is mysterious about this.
If that output is guaranteed for 0.0/0.0, are there any arithmetic ways of producing NaNs that do have other (possibly hardware-dependent?) bit patterns? (I know intBitsToFloat/longBitsToDouble can encode other NaNs, but I'd like to know if other values can occur from normal arithmetic.)
A followup point: I've noticed that Float.NaN and Double.NaN specify their exact bit pattern, but in the source (Float, Double) they are generated by 0.0/0.0. If the result of that division is really dependent on the hardware of the compiler, it seems like there is a flaw there in either the spec or the implementation.
This is what §2.3.2 of the JVM 7 spec has to say about it:
The elements of the double value set are exactly the values that can be represented using the double floating-point format defined in the IEEE 754 standard, except that there is only one NaN value (IEEE 754 specifies 2^53 - 2 distinct NaN values).
and §2.8.1:
The Java Virtual Machine has no signaling NaN value.
So technically there is only one NaN. But §4.2.3 of the JLS also says (right after your quote):
For the most part, the Java SE platform treats NaN values of a given type as though collapsed into a single canonical value, and hence this specification normally refers to an arbitrary NaN as though to a canonical value.
However, version 1.3 of the Java SE platform introduced methods enabling the programmer to distinguish between NaN values: the Float.floatToRawIntBits and Double.doubleToRawLongBits methods. The interested reader is referred to the specifications for the Float and Double classes for more information.
Which I take to mean exactly what you and CandiedOrange propose: It is dependent on the underlying processor, but Java treats them all the same.
But it gets better: Apparently, it is entirely possible that your NaN values are silently converted to different NaNs, as described in Double.longBitsToDouble():
Note that this method may not be able to return a double NaN with exactly same bit pattern as the long argument. IEEE 754 distinguishes between two kinds of NaNs, quiet NaNs and signaling NaNs. The differences between the two kinds of NaN are generally not visible in Java. Arithmetic operations on signaling NaNs turn them into quiet NaNs with a different, but often similar, bit pattern. However, on some processors merely copying a signaling NaN also performs that conversion. In particular, copying a signaling NaN to return it to the calling method may perform this conversion. So longBitsToDouble may not be able to return a double with a signaling NaN bit pattern. Consequently, for some long values, doubleToRawLongBits(longBitsToDouble(start)) may not equal start. Moreover, which particular bit patterns represent signaling NaNs is platform dependent; although all NaN bit patterns, quiet or signaling, must be in the NaN range identified above.
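To see that collapsing in action, here is a minimal sketch (the payload value is my own arbitrary choice): doubleToRawLongBits preserves whatever NaN bits it is handed, while doubleToLongBits canonicalizes every NaN to 7ff8000000000000.
double oddNaN = Double.longBitsToDouble(0x7ff8000000001234L); // a quiet NaN carrying a payload
System.out.println(Double.isNaN(oddNaN));                                  // true
System.out.println(Long.toHexString(Double.doubleToRawLongBits(oddNaN)));  // 7ff8000000001234
System.out.println(Long.toHexString(Double.doubleToLongBits(oddNaN)));     // 7ff8000000000000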
For reference, there is a table of the hardware-dependent NaNs here. In summary:
- x86:
  - quiet: Sign=0 Exp=0x7ff Frac=0x80000
  - signalling: Sign=0 Exp=0x7ff Frac=0x40000
- PA-RISC:
  - quiet: Sign=0 Exp=0x7ff Frac=0x40000
  - signalling: Sign=0 Exp=0x7ff Frac=0x80000
- Power:
  - quiet: Sign=0 Exp=0x7ff Frac=0x80000
  - signalling: Sign=0 Exp=0x7ff Frac=0x5555555500055555
- Alpha:
  - quiet: Sign=0 Exp=0 Frac=0xfff8000000000000
  - signalling: Sign=1 Exp=0x2aa Frac=0x7ff5555500055555
So, to verify this you would really need one of these processors and try it out for yourself. Also, any insights on how to interpret the longer values for the Power and Alpha architectures are welcome.
The way I read the JLS here, the exact bit value of a NaN depends on who or what made it, and since the JVM didn't make it, don't ask the JVM. You might as well ask it what an "Error code 4" string means.
The hardware produces different bit patterns meant to represent different kinds of NaNs. Unfortunately, different kinds of hardware produce different bit patterns for the same kinds of NaNs. Fortunately, there is a standard pattern that Java can use to at least tell that a value is some kind of NaN.
It's like Java looked at the "Error code 4" string and said, "We don't know what 'code 4' means on your hardware, but there was the word 'error' in that string, so we think it's an error."
The JLS tries to give you a chance to figure it out on your own though:
"However, version 1.3 of the Java SE platform introduced methods enabling the programmer to distinguish between NaN values: the Float.floatToRawIntBits and Double.doubleToRawLongBits methods. The interested reader is referred to the specifications for the Float and Double classes for more information."
Which looks to me like a C++ reinterpret_cast. It's Java giving you a chance to analyze the NaN yourself, in case you happen to know how its signal was encoded. If you want to track down the hardware specs so you can predict which NaN bit patterns different events should produce, you are free to do so, but you are outside the uniformity the JVM was meant to give us. So expect that it might change from hardware to hardware.
When testing whether a number is NaN, we check whether it is equal to itself, since NaN is the only value that isn't equal to itself. This isn't to say that the bits are different. Before comparing the bits, the JVM tests for the many bit patterns that mark a value as a NaN. If either operand matches any of those patterns, it reports not-equal, even if the bits of the two operands really are the same (and even if they're different).
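A minimal sketch of that self-comparison trick (in practice the JDK's Double.isNaN is written this way):
double x = 0.0 / 0.0;
System.out.println(x == x);          // false
System.out.println(x != x);          // true, the classic NaN test
System.out.println(Double.isNaN(x)); // true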
Back in 1964, when pressed to give an exact definition for pornography, U.S. Supreme Court Justice Stewart famously said, "I Know It When I See It". I think of Java as doing the same thing with NaNs. Java can't tell you anything that a "signaling" NaN might be signaling, because it doesn't know how that signal was encoded. But it can look at the bits and tell that it's a NaN of some kind, since that pattern follows one standard.
If you happen to be on hardware that encodes all NaNs with uniform bits, you'll never prove that Java is doing anything to make NaNs have uniform bits. Again, the way I read the JLS, they are outright saying you are on your own here.
I can see why this feels flaky. It is flaky. It's just not Java's fault. I'll lay odds that somewhere out there some enterprising hardware manufacturers came up with wonderfully expressive signaling-NaN bit patterns, but they failed to get them adopted as a standard widely enough that Java could count on them. That's what's flaky. We have all these bits reserved for signalling what kind of NaN we have and we can't use them, because we don't agree on what they mean. Having Java beat NaNs into a uniform value after the hardware makes them would only destroy that information and harm performance, and the only payoff would be not seeming flaky. Given the situation, I'm glad they realized they could cheat their way out of the problem and define NaN as not being equal to anything.
Here is a program demonstrating different NaN bit patterns:
public class Test {
    public static void main(String[] arg) {
        double myNaN = Double.longBitsToDouble(0x7ff1234512345678L);
        System.out.println(Double.isNaN(myNaN));
        System.out.println(Long.toHexString(Double.doubleToRawLongBits(myNaN)));

        final double zeroDivNaN = 0.0 / 0.0;
        System.out.println(Double.isNaN(zeroDivNaN));
        System.out.println(Long.toHexString(Double.doubleToRawLongBits(zeroDivNaN)));
    }
}
output:
true
7ff1234512345678
true
7ff8000000000000
Regardless of what the hardware does, the program can itself create NaNs that may not be the same as e.g. 0.0/0.0, and may have some meaning in the program.
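For example, a program could (cautiously) hide a small diagnostic code in the low mantissa bits of a quiet NaN. This is only a sketch with names I made up, and the longBitsToDouble caveat quoted earlier still applies: keeping the quiet bit set makes the payload less likely to be rewritten in transit.
static double encodeDiagnostic(int code) {
    // keep the quiet-NaN prefix 7ff8... and OR in a 20-bit payload
    return Double.longBitsToDouble(0x7ff8000000000000L | (code & 0xFFFFF));
}

static int decodeDiagnostic(double nan) {
    // read the payload back out of the raw bit pattern
    return (int) (Double.doubleToRawLongBits(nan) & 0xFFFFF);
}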
The only other NaN value that I could generate with normal arithmetic operations so far is the same pattern but with the sign bit set:
public static void main(String[] args) {
    Double tentative1 = 0d / 0d;
    Double tentative2 = Math.sqrt(-1d);
    System.out.println(tentative1);
    System.out.println(tentative2);
    System.out.println(Long.toHexString(Double.doubleToRawLongBits(tentative1)));
    System.out.println(Long.toHexString(Double.doubleToRawLongBits(tentative2)));
    System.out.println(tentative1 == tentative2);
    System.out.println(tentative1.equals(tentative2));
}
Output:
NaN
NaN
7ff8000000000000
fff8000000000000
false
true
Related
double r = 11.631;
double theta = 21.4;
In the debugger, these are shown as 11.631000000000000 and 21.399999618530273.
How can I avoid this?
These accuracy problems are due to the internal representation of floating point numbers and there's not much you can do to avoid it.
By the way, printing these values at run-time often still leads to the correct results, at least using modern C++ compilers. For most operations, this isn't much of an issue.
I liked Joel's explanation, which deals with a similar binary floating point precision issue in Excel 2007:
See how there's a lot of 0110 0110 0110 there at the end? That's because 0.1 has no exact representation in binary... it's a repeating binary number. It's sort of like how 1/3 has no representation in decimal. 1/3 is 0.33333333 and you have to keep writing 3's forever. If you lose patience, you get something inexact.
So you can imagine how, in decimal, if you tried to do 3*1/3, and you didn't have time to write 3's forever, the result you would get would be 0.99999999, not 1, and people would get angry with you for being wrong.
If you have a value like:
double theta = 21.4;
And you want to do:
if (theta == 21.4)
{
}
You have to be a bit clever: check whether the value of theta is really close to 21.4, rather than exactly that value.
if (fabs(theta - 21.4) <= 1e-6)
{
}
This is partly platform-specific - and we don't know what platform you're using.
It's also partly a case of knowing what you actually want to see. The debugger is showing you - to some extent, anyway - the precise value stored in your variable. In my article on binary floating point numbers in .NET, there's a C# class which lets you see the absolutely exact number stored in a double. The online version isn't working at the moment - I'll try to put one up on another site.
Given that the debugger sees the "actual" value, it's got to make a judgement call about what to display - it could show you the value rounded to a few decimal places, or a more precise value. Some debuggers do a better job than others at reading developers' minds, but it's a fundamental problem with binary floating point numbers.
Use the fixed-point decimal type if you want stability at the limits of precision. There are overheads, and you must explicitly cast if you wish to convert to floating point. If you do convert to floating point you will reintroduce the instabilities that seem to bother you.
Alternately you can get over it and learn to work with the limited precision of floating point arithmetic. For example you can use rounding to make values converge, or you can use epsilon comparisons to describe a tolerance. "Epsilon" is a constant you set up that defines a tolerance. For example, you may choose to regard two values as being equal if they are within 0.0001 of each other.
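A minimal sketch of such an epsilon comparison, written in Java to match the rest of this page (the tolerance is just an arbitrary example, not a recommendation):
static boolean nearlyEqual(double a, double b, double epsilon) {
    // treat a and b as equal when they differ by at most the chosen tolerance
    return Math.abs(a - b) <= epsilon;
}
// usage: nearlyEqual(theta, 21.4, 0.0001)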
It occurs to me that you could use operator overloading to make epsilon comparisons transparent. That would be very cool.
For mantissa-exponent representations EPSILON must be computed to remain within the representable precision. For a number N, Epsilon = N / 10E+14
System.Double.Epsilon is the smallest representable positive value for the Double type. It is too small for our purpose. Read Microsoft's advice on equality testing
I've come across this before (on my blog) - I think the surprise tends to be that the 'irrational' numbers are different.
By 'irrational' here I'm just referring to the fact that they can't be accurately represented in this format. Real irrational numbers (like π - pi) can't be accurately represented at all.
Most people are familiar with 1/3 not working in decimal: 0.3333333333333...
The odd thing is that 1.1 doesn't work in floats. People expect decimal values to work in floating point numbers because of how they think of them:
1.1 is 11 x 10^-1
When actually they're stored in base 2:
1.1 is 154811237190861 x 2^-47
You can't avoid it, you just have to get used to the fact that some floats are 'irrational', in the same way that 1/3 is.
One way you can avoid this is to use a library that uses an alternative method of representing decimal numbers, such as BCD
If you are using Java and you need accuracy, use the BigDecimal class for floating point calculations. It is slower but safer.
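A minimal Java sketch of the difference, using the usual 0.1 + 0.2 example (the class name is my own):
import java.math.BigDecimal;

public class DecimalDemo {
    public static void main(String[] args) {
        // BigDecimal built from strings keeps the decimal values exactly as written
        System.out.println(new BigDecimal("0.1").add(new BigDecimal("0.2"))); // 0.3
        System.out.println(0.1 + 0.2);                                        // 0.30000000000000004
    }
}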
Seems to me that 21.399999618530273 is the single precision (float) representation of 21.4. Looks like the debugger is casting down from double to float somewhere.
You can't avoid this, as you're using floating-point numbers with a fixed number of bytes. There's simply no isomorphism possible between real numbers and their limited notation.
But most of the time you can simply ignore it. 21.4==21.4 would still be true, because it is the same number with the same error. But 21.4f==21.4 may not be true, because the errors for float and double are different.
If you need fixed precision, perhaps you should try fixed-point numbers, or even integers. I, for example, often use int(1000*x) for passing to a debug pager.
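For what it's worth, a quick Java check of those claims:
System.out.println(21.4 == 21.4);  // true  - both literals denote the same double
System.out.println(21.4f == 21.4); // false - the float is widened to a double that
                                   //         differs from the double literal's value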
Dangers of computer arithmetic
If it bothers you, you can customize the way some values are displayed during debug. Use it with care :-)
Enhancing Debugging with the Debugger Display Attributes
Refer to General Decimal Arithmetic
Also take note when comparing floats, see this answer for more information.
According to the javadoc
"If at least one of the operands to a numerical operator is of type double, then the
operation is carried out using 64-bit floating-point arithmetic, and the result of the
numerical operator is a value of type double. If the other operand is not a double, it is
first widened (§5.1.5) to type double by numeric promotion (§5.6)."
Here is the Source
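A tiny illustration of that promotion rule:
int i = 3;
System.out.println(i / 2);   // 1   - both operands are int, so integer division
System.out.println(i / 2.0); // 1.5 - i is widened to double by numeric promotion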
I want to know how to round a decimal number to a machine floating-point number (for example, a double).
The number "0.01111116" cannot be represented exactly in machine floating point; under some rounding mode it should be represented as "0.011111159999999995", with some precision loss.
But I don't know how to do this in Java.
So I want to know the API to set the rounding mode to get the exact representation of machine floating-point number in Java.
Thanks!
The Java specification does not provide any means to control the floating-point rounding mode. Round-to-nearest is used.
It is not generally possible to arrange for floating-point arithmetic to produce mathematically exact results, so software must be designed to tolerate and adjust for errors or, in very special cases, to get exact results by using extra care.
If you are talking about a literal 0.01111116 in the source code of your program, the Java compiler converts that into the binary floating point representation at compile time.
If you are talking about (say) a String containing the characters "0.01111116", that gets converted to a binary floating point representation if/when you call (for example) Double.parseDouble(...).
Either way, the conversion happens behind the scenes and you don't have any control over the actual rounding. But in a sense it is moot. It is inherent in the nature of the representation that some rounding happens, and the result is generally speaking "the most accurate" you can get from a mathematical perspective ... given the floating point type you have chosen.
If you really wanted the conversion to use different rounding / truncation rules you could either do this after the fact (e.g. round or truncate the converted value), or you could implement your own String to floating-point conversion method.
You won't be able to change the way that the Java compiler converts literals. It is part of the language specification.
So I want to know the API to set the rounding mode to get the exact representation of machine floating-point number in Java.
There is another way of thinking about this.
The exact representation of a machine floating point number is 32 or 64 bits of binary data. You could render the bits of a double as hexadecimal in a couple of ways:
- Double::doubleToLongBits or Double::doubleToRawLongBits followed by Long::toHexString gives a precise but unhelpful rendering, or
- Double::toHexString gives a hexadecimal floating-point representation.
All of these renderings are exact (no rounding errors) representations of the double, but most readers won't understand them. (The "raw" version deals best with edge-cases involving variant NaN values.)
There are equivalent methods for float.
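As a sketch, using the literal from the question, the renderings look like this:
double x = 0.01111116;
System.out.println(Long.toHexString(Double.doubleToRawLongBits(x))); // raw 64-bit pattern
System.out.println(Long.toHexString(Double.doubleToLongBits(x)));    // identical here, since x is not a NaN
System.out.println(Double.toHexString(x));                           // hexadecimal floating-point form, e.g. 0x1....p-7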
Try commons-math3 Precision http://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math3/util/Precision.html
double d = 0.1 + 0.2;
System.out.println(d);
d = Precision.round(d, 1); // round to 1 decimal place
System.out.println(d);
output
0.30000000000000004
0.3
Are there any fully compliant IEEE754r implementations available for Java that offer support for all the features Java chose to omit (or rather that high-level languages in general like to omit):
Traps
Sticky flags
Directed rounding modes
Extended/long double
Quad precision
DPD (densely packed decimals)
Clarification before anyone gets it wrong: I'm not looking for the JVM to offer any support for the above, just some classes that implement the types and operations in software, basically something in the style of the existing primitive wrapper classes Float/Double.
No, there is no fully compliant IEEE754r implementation, not in Java and not in any other currently available language (status: July 2012).
EDIT: The poster asked for IEEE754r support, which is identical to IEEE 754-2008. If I wanted to list all the reasons why there is no such thing, this answer would be long.
- Traps: No. Calling your own routines on OVERFLOW, UNDERFLOW, INEXACT etc. via SIGFPE is not a trap; see IEEE754 (the old one), p. 21, for what constitutes a trap.
- Signaling NaNs, NaN payload access, flag access: enumerate the languages which can do that.
- Rounding modes: The new standard defines roundTiesToAway (p. 16) as a new rounding mode. Unfortunately there are, AFAIK, no processors which support this mode, and no software emulation either.
- Quad precision: Only supported by very few compilers, and even fewer that are not broken.
- Densely packed decimals: Will probably only be supported in languages which use decimals, e.g. COBOL.

Intersection of all these sets: the empty set. None. Nothing.
There is a class, with source available, that implements the functions below:
double nextAfter(double x, double y) - returns the double adjacent to x in the direction of y
double scalb(double x, int e) - computes x*2^e quickly
boolean unordered(double c1, double c2) - returns true iff the two cannot be compared numerically (one or both is NaN)
int fpclassify(double value) - classifies a floating-point value into one of five types:
FP_NAN: "not any number", typically the result of illegal operations like 0/0
FP_INFINITY: represents one end of the real line, available by 1/0 or POSITIVE_INFINITY
FP_ZERO: positive or negative zero; they are different, but not so much that it comes up much
FP_SUBNORMAL: a class of numbers very near zero; further explanation would require a detailed examination of the floating-point binary representation
FP_NORMAL: most values you encounter are "normal"
double copySign(double value, double sign) - returns value, possibly with its sign flipped, to match "sign"
double logb754(double value) - extracts the exponent of the value, to compute log2
double logb854(double value) - like logb754(value), but with an IEEE854-compliant variant for subnormal numbers
double logbn(double value) - also computing log2(value), but with a normalizing correction for the subnormals; this is the best log routine
double raise(double x) - not actually an IEEE754 routine, this is an optimized version of nextAfter(x,POSITIVE_INFINITY)
double lower(double x) - not actually an IEEE754 routine, this is an optimized version of nextAfter(x,NEGATIVE_INFINITY)
"All of these routines also have float variants, differing only in argument and return types. The class is org.dosereality.util.IEEE754"
Sun bug reference 2003
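For what it's worth, several of these functions have standard counterparts in java.lang.Math since Java 6, so depending on your needs the third-party class may not be required; a quick sketch:
System.out.println(Math.nextAfter(1.0, Double.POSITIVE_INFINITY)); // the double adjacent to 1.0, toward +infinity
System.out.println(Math.nextUp(1.0));                              // same result, optimized direction
System.out.println(Math.scalb(3.0, 4));                            // 3 * 2^4 = 48.0
System.out.println(Math.copySign(2.5, -0.0));                      // -2.5
System.out.println(Math.getExponent(48.0));                        // 5, the unbiased binary exponent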
So I know the IEEE 754 specifies some special floating point values for values that are not real numbers. In Java, casting those values to a primitive int does not throw an exception like I would have expected. Instead we have the following:
int n;
n = (int)Double.NaN; // n == 0
n = (int)Double.POSITIVE_INFINITY; // n == Integer.MAX_VALUE
n = (int)Double.NEGATIVE_INFINITY; // n == Integer.MIN_VALUE
What is the rationale for not throwing exceptions in these cases? Is this an IEEE standard, or was it merely a choice by the designers of Java? Are there bad consequences that I am unaware of if exceptions would be possible with such casts?
What is the rationale for not throwing exceptions in these cases?
I imagine that the reasons include:
These are edge cases, and are likely to occur rarely in applications that do this kind of thing.
The behavior is not "totally unexpected".
When an application casts from a double to an int, significant loss of information is expected. The application is either going to ignore this possibility, or the cast will be preceded by checks to guard against it ... and those checks could also cover these cases (see the sketch after this list).
No other double / float operations result in exceptions, and (IMO) it would be a bit schizophrenic to do it in this case.
There could possibly be a performance hit ... on some hardware platforms (current or future).
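A hedged sketch of such a guard (the method name is mine, not from any standard API):
static int toIntChecked(double d) {
    if (Double.isNaN(d)) {
        throw new ArithmeticException("NaN cannot be converted to int");
    }
    if (d < Integer.MIN_VALUE || d > Integer.MAX_VALUE) {
        throw new ArithmeticException("double out of int range: " + d);
    }
    return (int) d; // within range, the cast simply rounds toward zero
}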
A commentator said this:
"I suspect the decision to not have the conversion throw an exception was motivated by a strong desire to avoid throwing exceptions for any reasons, for fear of forcing code to add it to a throws clause."
I don't think that is a plausible explanation:
The Java language designers1 don't have a mindset of avoiding throwing exceptions "for any reason". There are numerous examples in the Java APIs that demonstrate this.
The issue of the throws clause is addressed by making the exception unchecked. Indeed, many related exceptions like ArithmeticException or ClassCastException are declared as unchecked for this reason.
Is this an IEEE standard, or was it merely a choice by the designers of Java?
The latter, I think.
Are there bad consequences that I am unaware of if exceptions would be possible with such casts?
None apart from the obvious ones ...
(But it is not really relevant. The JLS and JVM spec say what they say, and changing them would be liable to break existing code. And it is not just Java code we are talking about now ...)
I've done a bit of digging. A lot of the x86 instructions that could be used to convert from double to integers seem to generate hardware interrupts ... unless masked. It is not clear (to me) whether the specified Java behavior is easier or harder to implement than the alternative suggested by the OP.
1 - I don't dispute that some Java programmers do think this way. But they were / are not the Java designers, and this question is asking specifically about the Java design rationale.
What is the rationale for not throwing exceptions in these cases? Is this an IEEE standard, or was it merely a choice by the designers of Java?
The IEEE 754-1985 Standard, on pages 20 and 21 under sections 2.2.1 NaNs and 2.2.2 Infinity, clearly explains why NaN and Infinity values are required by the standard. Therefore this is not a Java thing.
The Java Virtual Machine Specification in section 3.8.1 Floating Point Arithmetic and IEEE 754 states that when conversions to integral types are carried out, the JVM applies rounding toward zero, which explains the results you are seeing.
The standard does mention a feature named "trap handler" that could be used to determine when overflow or NaN occurs, but the Java Virtual Machine Specification clearly states that this is not implemented for Java. It says in section 3.8.1:
The floating-point operations of the Java virtual machine do not throw exceptions, trap, or otherwise signal the IEEE 754 exceptional conditions of invalid operation, division by zero, overflow, underflow, or inexact. The Java virtual machine has no signaling NaN value.
So, the behavior is not unspecified, regardless of its consequences.
Are there bad consequences that I am unaware of if exceptions would be possible with such casts?
Understanding the reasons stated in the standard should suffice to answer this question. The standard explains, with exhaustive examples, the consequences you are asking about here. I would post them, but that would be too much information, and the examples can be impossible to format appropriately in this editing tool.
EDIT
I was reading the latest maintenance review of the Java Virtual Machine Specification, as published recently by the JCP as part of their work on JSR 924. Section 2.11.14, named "Type Conversion Instructions", contains some more information that could help you in your quest for answers; it's not yet what you are looking for, but I believe it helps a bit. It says:
In a narrowing numeric conversion of a floating-point value to an integral type T, where T is either int or long, the floating-point value is converted as follows:
- If the floating-point value is NaN, the result of the conversion is an int or long 0.
- Otherwise, if the floating-point value is not an infinity, the floating-point value is rounded to an integer value V using IEEE 754 round-towards-zero mode. There are two cases:
  - If T is long and this integer value can be represented as a long, then the result is the long value V.
  - If T is of type int and this integer value can be represented as an int, then the result is the int value V.
- Otherwise:
  - Either the value must be too small (a negative value of large magnitude or negative infinity), and the result is the smallest representable value of type int or long.
  - Or the value must be too large (a positive value of large magnitude or positive infinity), and the result is the largest representable value of type int or long.

A narrowing numeric conversion from double to float behaves in accordance with IEEE 754. The result is correctly rounded using IEEE 754 round-to-nearest mode. A value too small to be represented as a float is converted to a positive or negative zero of type float; a value too large to be represented as a float is converted to a positive or negative infinity. A double NaN is always converted to a float NaN.

Despite the fact that overflow, underflow, or loss of precision may occur, narrowing conversions among numeric types never cause the Java virtual machine to throw a runtime exception (not to be confused with an IEEE 754 floating-point exception).
I know this simply restates what you already know, but it has a clue: it appears that the IEEE standard has a requirement of rounding to the nearest. Perhaps there you can find the reasons for this behavior.
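To make those rules concrete, here are a couple of cases that the original snippet doesn't cover:
System.out.println((long) Double.NaN);               // 0
System.out.println((long) Double.POSITIVE_INFINITY); // 9223372036854775807 (Long.MAX_VALUE)
System.out.println((int) 1.0e300);                   // 2147483647 - too large, clamped, no exception
System.out.println((int) -2.9);                      // -2 - rounded toward zero, not floored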
EDIT
The IEEE Standard in question, in section 2.3.2 Rounding Modes, states:
By default, rounding means round toward the nearest. The standard requires that three other rounding modes be provided; namely, round toward 0, round toward +Infinity, and round toward -Infinity. When used with the convert-to-integer operation, round toward -Infinity causes the convert to become the floor function, whereas round toward +Infinity is ceiling.
The rounding mode affects overflow because when round toward 0 or round toward -Infinity is in effect, an overflow of positive magnitude causes the default result to be the largest representable number, not +Infinity. Similarly, overflows of negative magnitude will produce the largest negative number when round toward +Infinity or round toward 0 is in effect.
There is an ACM presentation from 1998 that still seems surprisingly current and brings some light: https://people.eecs.berkeley.edu/~wkahan/JAVAhurt.pdf.
More concretely, regarding the surprising lack of exceptions when casting NaNs and infinities: see page 3, point 3: "Infinities and NaNs unleashed without the protection of floating-point traps and flags mandated by IEEE Standards 754/854 belie Java’s claim to robustness."
The presentation doesn't really answer the "why's", but does explain the consequences of the problematic design decisions in the Java language's implementation of floating point, and puts them in the context of the IEEE standards and even other implementations.
It is in the JLS, see:
JavaRanch post
http://www.coderanch.com/t/239753/java-programmer-SCJP/certification/cast-double-int
However, a warning would be nice.
Actually, I think some bit operations are done during certain casts (probably for performance reasons?), so you can get some unexpected behaviour. See what happens when you use the >> and << operators.
For example:
public static void main(String[] args) {
    short test1 = (short)Integer.MAX_VALUE;
    System.out.println(test1);
    short test2 = (short)Integer.MAX_VALUE-1;
    System.out.println(test2);
    short test3 = (short)Integer.MAX_VALUE-2;
    System.out.println(test3);
    short test4 = (short)Double.MAX_VALUE-3;
    System.out.println(test4);
}
will output:
-1
-2
-3
-4
I read the JVM specification for the strictfp modifier but still don't fully understand what it means.
Can anyone enlighten me?
Basically, it mandates that calculations involving the affected float and double variables have to follow the IEEE 754 spec to the letter, including for intermediate results.
This has the effect of:
Ensuring that the same input will always generate exactly the same result on all systems
The CPU may have to do some extra work, making it slightly slower
The results will be, in some cases, less accurate (much less, in extreme cases)
Edit:
More specifically, many modern CPUs use 80-bit floating-point arithmetic ("extended precision") internally. Thus, they can internally represent some numbers that would cause arithmetic overflow or underflow (yielding Infinity or zero, respectively) in 32- or 64-bit floats; in borderline cases, the 80 bits simply retain more precision. When such numbers occur as intermediate results in a calculation, but the end result is inside the range that 32/64-bit floats can represent, you get a "more correct" result on machines that use 80-bit float arithmetic than on machines that don't - or than in code that uses strictfp.
I like this answer, which hits the point:
It ensures that your calculations are equally wrong on all platforms.
strictfp is a keyword and can be used to modify a class or a method, but never a variable. Marking a class as strictfp means that any method code in the class will conform to the IEEE 754 standard rules for floating point. Without that modifier, floating points used in the methods might behave in a platform-dependent way. If you don't declare a class as strictfp, you can still get strictfp behavior on a method-by-method basis, by declaring a method as strictfp.
If you don't know the IEEE 754 standard, now's not the time to learn it. You have, as we say, bigger fish to fry.
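For completeness, a minimal sketch of both placements described in that quote (class and method names are my own invention). Note that as of Java 17 (JEP 306) all floating-point arithmetic is strict, so the modifier no longer has any effect there.
strictfp class AllStrict {
    double sum(double a, double b) {
        return a + b; // every operation in this class follows IEEE 754 strictly
    }
}

class MixedStrictness {
    strictfp double strictSum(double a, double b) {
        return a + b; // strict semantics on this method only
    }

    double defaultSum(double a, double b) {
        return a + b; // pre-Java 17, intermediates may use extended precision
    }
}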