Difference between Float.MIN_VALUE and Float.MIN_NORMAL? [duplicate] - java

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Difference among Double.MIN_NORMAL and Double.MIN_VALUE
1) Can someone please explain to me what the difference is between MIN_NORMAL and MIN_VALUE?
System.out.println(Float.MIN_NORMAL);
System.out.println(Float.MIN_VALUE);
2) Also, why does this still print 1.0?
float f = Float.MIN_NORMAL + 1.0f;
System.out.println(f);
double d = Float.MIN_NORMAL + 1.0f;
System.out.println(d);
Output:
1.17549435E-38
1.4E-45
1.0
1.0

The answer for the first question is in the duplicate.
The answer for the second question is:
Neither float nor double has infinite precision. As a rule of thumb, a float carries about 7 significant decimal digits and a double about 16; anything past that is lost to rounding and truncation.
So 1.0e0 + 1e-38 lacks the precision to end up as anything but 1.0e0: the extra digits are simply truncated away.
Really, like the rest of the answer, this requires an understanding of how IEEE-754 floating point numbers are added in binary. The basic idea is that the significand (the non-sign, non-exponent portion) of each number is shifted inside the CPU's floating-point unit (80 bits wide on x87 hardware, which means the result is always truncated back down at the end of the calculation) so that digits of the same magnitude line up. In decimal, this would look like the following:
Digit: 1 234567890123456
Value: 1.0000000000000000000000000000...0000
Value: 0.0000000000000000000000000000...0001
After the add is processed, it is practically:
Digit: 1 234567890123456
Value: 1.0000000000000000000000000000...0001
So keep in mind that the value truncates around the 16th decimal digit for a double. In binary, a 32-bit float keeps exactly 24 significant bits and a 64-bit double keeps 53, yet only 23 and 52 of them are physically stored: because the leading bit of a normalized number is always 1, it is implied by the exponent rather than stored, effectively packing 24 bits of precision into 23 stored bits for a 32-bit float and 53 into 52 for a 64-bit double. (This is a very interesting point, but one for which you should read a more detailed treatment of IEEE-754.)
Truncated:
Digit: 1 234567890123456
Value: 1.00000000000000000000
Note that the really small decimal portion is truncated, leaving 1.
Here is a good page that I use whenever I have trouble picturing the actual representation in memory: Decimal to 32-bit IEEE-754 format. The site also has links for the 64-bit format and for converting in the reverse direction.
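To see this absorption directly, here is a small sketch (the class name is mine; Math.ulp, which the answer above doesn't mention, returns the gap between a value and its next representable neighbor):

public class Absorption {
    public static void main(String[] args) {
        // Gap between 1.0f and the next representable float:
        System.out.println(Math.ulp(1.0f));           // 1.1920929E-7
        // Float.MIN_NORMAL (~1.18e-38) is far below half that gap,
        // so the sum rounds straight back to 1.0f:
        System.out.println(1.0f + Float.MIN_NORMAL);  // 1.0
        // A full ulp survives the rounding:
        System.out.println(1.0f + Math.ulp(1.0f));    // 1.0000001
    }
}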

Related

Java float to long explicit cast [duplicate]

This question already has answers here:
How many significant digits do floats and doubles have in java?
(6 answers)
Closed 7 years ago.
I'd like to understand how i in this piece of code ends up being 99999904?
long i = (long)(99999900f);
System.out.println(i);
I understand that floats are stored in binary and are not exactly the value they appear to be, but I don't see how that could lead to the first digit being a 4 and not a 9, or something that would eventually round up to 99999900.
I'd like to understand how i in this piece of code ends up being 99999904
99999904 has a simpler representation in binary than 99999900 (uses fewer significant digits), and since the latter needs more significant digits than can be stored in a float, Java has to use the nearest representable value instead.
99999900 in hex: 0x5F5E09C
What 24 significant binary digits look like in hex:
0x7FFFFF8
99999904 in hex: 0x5F5E0A0
Between the two nearest representable values 0x5F5E098 and 0x5F5E0A0, Java chose the latter according to the “nearest-even” rule. The two representable values are in fact equally close, but 0x5F5E0A0 has 0 for the last significant digit, and so is chosen according to this rule. You can check for yourself that 99999899 rounds down to 0x5F5E098 and 99999900 rounds up to 0x5F5E0A0.
With 24 significant binary digits, the float type can represent all integers between 0 and 2^24−1 exactly. 2^24 happens to be representable exactly as well (it has only one significant binary digit), but 2^24+1, or 16777217, is not representable exactly as a float, and the integers above it are not automatically exactly representable as float either.
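A quick check (the class name is mine; the literals echo the values discussed above):

public class NearestFloat {
    public static void main(String[] args) {
        // 99999900 needs more than 24 significant bits, so the cast rounds
        // to the nearest representable float, breaking the tie toward even:
        System.out.println((long) 99999900f);  // 99999904
        System.out.println((long) 99999899f);  // 99999896
        // Every integer up to 2^24 is exact; 2^24+1 is the first that isn't:
        System.out.println((long) 16777216f);  // 16777216
        System.out.println((long) 16777217f);  // 16777216
    }
}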

Converting double value of 1234567.1234 to float in java

I am trying to convert double to float in java.
Double d = 1234567.1234;
Float f = d.floatValue();
I see that the value of f is
1234567.1
I am not trying to print a string value of the float. I just wonder what the maximum number of digits is that avoids losing any precision when converting double to float. Can I show more than 8 significant digits in Java?
float: 32 bits (4 bytes), of which 23 bits are used for the mantissa (6 to 9 decimal digits, about 7 on average). 8 bits are used for the exponent, so a float can “move” the decimal point to the right or to the left using those 8 bits, which avoids storing lots of zeros in the mantissa as in 0.0000003 (3 × 10^-7) or 3000000 (3 × 10^7). The remaining 1 bit is the sign bit.
double: 64 bits (8 bytes), of which 52 bits are used for the mantissa (15 to 17 decimal digits, about 16 on average), 11 bits for the exponent, and 1 bit for the sign.
I believe you hit this limit, and that is what causes the problem.
If you change the code to
Double d = 123456789.1234;
Float f = d.floatValue();
you will see that the float value becomes 1.23456792E8.
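If you want to see those fields for yourself, Float.floatToIntBits exposes the raw layout. A small sketch (the class name and sample value are mine; the masks follow the 1/8/23 split described above, and the stored exponent is biased by 127):

public class BitLayout {
    public static void main(String[] args) {
        int bits = Float.floatToIntBits(1234567.1f);
        int sign = bits >>> 31;                       // 1 sign bit
        int exponent = ((bits >>> 23) & 0xFF) - 127;  // 8 exponent bits, bias 127
        int mantissa = bits & 0x7FFFFF;               // 23 stored mantissa bits
        System.out.println("sign=" + sign + " exponent=" + exponent
                + " mantissa=0x" + Integer.toHexString(mantissa));
    }
}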
The precision of a float is about 7 decimals, but since floats are stored in binary, that's an approximation.
To illustrate the actual precision of the float value in question, try this:
double d = 1234567.1234;
float f = (float)d;
System.out.printf("%.9f%n", d);
System.out.printf("%.9f%n", Math.nextDown(f));
System.out.printf("%.9f%n", f);
System.out.printf("%.9f%n", Math.nextUp(f));
Output
1234567.123400000
1234567.000000000
1234567.125000000
1234567.250000000
As you can see, the effective precision for this number is about 1 decimal place, or about 8 significant digits in total. But if you run the same code with the number 9876543.9876, you get:
9876543.987600000
9876543.000000000
9876544.000000000
9876545.000000000
That's only 7 digits of precision.
This is a simple example in support of the view that there is no safe number of decimal digits.
Consider 0.1.
The closest IEEE 754 64-bit binary floating point number has exact value 0.1000000000000000055511151231257827021181583404541015625. It converts to 32-bit binary floating point as 0.100000001490116119384765625, which is considerably further from 0.1.
There can be loss of precision with even a single significant digit and single decimal place.
Unless you really need the compactness of float, and have very relaxed precision requirements, it is generally better to just stick with double.
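The exact values quoted above are easy to reproduce yourself: BigDecimal's double constructor preserves the exact binary value rather than a rounded decimal string (the class name below is mine):

import java.math.BigDecimal;

public class ExactValues {
    public static void main(String[] args) {
        // The double constructor keeps the exact binary value:
        System.out.println(new BigDecimal(0.1));
        // prints 0.1000000000000000055511151231257827021181583404541015625
        System.out.println(new BigDecimal((double) 0.1f));
        // prints 0.100000001490116119384765625
    }
}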
I just wonder what is the maximum number of digits not to lose any precision when converting double to float.
Maybe you don't realize it, but the concept of N digits of precision is already ambiguous. Doubtless you meant "N digits of precision in base 10". But unlike humans, our computers work in base 2.
It's not possible to convert every number from base X to base Y (with a limited number of retained digits) without loss of precision. For example, 1/3 is perfectly representable in base 3 as "0.1", while in base 10 it takes an infinite number of digits: 0.3333333333333... Likewise, many numbers that are perfectly representable in base 10, e.g. 0.1, need an infinite number of digits in base 2. On the other hand, 0.5 (base 10) is perfectly representable as 0.1 (base 2).
So back to
I just wonder what is the maximum number of digits not to lose any precision when converting double to float.
The answer is "it depends on the value". The commonly cited rule of thumb "float has about 6 to 7 digits decimal precision" is just an approximation. It can be much more or much less depending on the value.
When dealing with floating point, the concept of relative accuracy is more useful: stop thinking about "digits" and think in terms of relative error. Any number N (in range) is representable with an error of at most N / 2^p, where p is the number of significand bits in the chosen format (23 (+1) for float, 52 (+1) for double). So a decimal number stored as a float has a maximum approximation error of N / 2^24. The error may be smaller, even zero, but it is never greater.
The 23+1 comes from the convention that floating point numbers are normalized so that the first significand bit is always a 1 (whenever possible); since it is known in advance, it doesn't need to be stored explicitly, and the 23 physically stored bits thus provide one extra bit of accuracy. (There is an exceptional case, the subnormal numbers, where "whenever possible" does not apply, but let's ignore that here.)
TL;DR: There is no fixed number of decimal digits accuracy in float or double.
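As a quick illustration of the relative-error view (the class name and sample value are mine), Math.ulp gives the spacing between adjacent floats at a value, and half of that spacing, divided by the value, stays below the 2^-24 bound:

public class RelativeError {
    public static void main(String[] args) {
        float n = 123456.789f;
        // Worst-case rounding error is half an ulp, i.e. a relative
        // error of at most 2^-24 for a float:
        System.out.println(Math.ulp(n) / 2 / n);  // about 3.2e-8
        System.out.println(Math.pow(2, -24));     // 5.96e-8 upper bound
    }
}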
EDIT:
No, you cannot get any more precision out of a float in Java, because a float only holds 32 bits (4 bytes). If you want more precision, then continue to use double. This might also be helpful.

Are there limits of precision on the memory capacity of double?

I'm reading an introductory book on Java and ran into something I don't quite understand. In covering variable types, the author states that "the word double stands for numbers between ..."
This struck me as odd, since as written, it would include all numbers on the real number line between the two aforementioned limits.
From my understanding, double is a primitive data type assigned 64 bits of memory. In turn, it's clear to me that 5.9 is a perfectly fine double, or float for that matter. However, I'm not sure how the series 5.9, 5.99, 5.999, 5.9999, ... (that is, 5 followed by k nines) would fit in memory as k approaches infinity.
Is my intuition correct that not all real numbers between the author's two limits would be appropriately held in memory as a double?
Is my intuition correct that not all real numbers between the author's two limits would be appropriately held in memory as a double?
Yes, you are right.
Even the MOST obvious "doubles" cannot be stored exactly. For instance, 0.1 is 1/10: have you ever divided by ten in a base-2 system? The result is an infinitely repeating binary number (comparable to dividing by 3 in the base-10 system).
(This fact btw was responsible for the Patriot-Bug: http://sydney.edu.au/engineering/it/~alum/patriot_bug.html)
And therefore even some obviously easy maths will go wrong in Java.
Take the compiler of your choice and try this:
System.out.println(0.8 == (0.1 + 0.7));
Whoops, it will output false.
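A common workaround (not part of the answer above) is to compare with a tolerance instead of ==; the 1e-9 threshold below is an arbitrary choice that must fit the magnitudes involved:

public class Approx {
    public static void main(String[] args) {
        double sum = 0.1 + 0.7;
        System.out.println(sum);                         // 0.7999999999999999
        System.out.println(sum == 0.8);                  // false
        System.out.println(Math.abs(sum - 0.8) < 1e-9);  // true
    }
}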
Indeed. In one sentence: ints are exact, while floats and doubles are stored using a binary form of scientific notation. As with any truncated scientific notation, this means there will be round-off error.
As per wikipedia:
Sign bit: 1 bit
Exponent width: 11 bits
Significand precision: 53 bits (52 explicitly stored)
An interesting note: the exponent does not have a separate sign bit; it is stored with a bias, which lets it represent both positive and negative exponents!
To read more:
Wikipedia - Double precision floating point format
The double data type is a double-precision 64-bit (8-byte) IEEE-754 floating point number. The format consists of 1 bit for the sign, 11 bits for the exponent, and the remaining 52 bits of the significand for the fraction part. With the 52 stored fraction bits plus the implicit leading bit, the total precision is 53 bits (approximately 16 decimal digits, since 53 log10(2) ≈ 15.955). The 11-bit exponent allows the representation of numbers with a decimal exponent between 10^-308 and 10^+308, with full 15-17 decimal digit precision. double and float are not exact real numbers: there are infinitely many real numbers in any range, but only a finite number of bits to represent them, so not every number can be represented.
For higher precision, you can use the BigDecimal class found in the java.math package.
http://docs.oracle.com/javase/7/docs/api/java/math/BigDecimal.html
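As a small illustration (the class name is mine): built from Strings, BigDecimal arithmetic stays exact where double drifts.

import java.math.BigDecimal;

public class KeepPrecision {
    public static void main(String[] args) {
        // Built from a String, BigDecimal stores the decimal value exactly:
        BigDecimal tenth = new BigDecimal("0.1");
        System.out.println(tenth.add(tenth).add(tenth));  // 0.3
        System.out.println(0.1 + 0.1 + 0.1);              // 0.30000000000000004
    }
}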

Subtracting two decimal numbers giving weird outputs [duplicate]

This question already has answers here:
Whats wrong with this simple 'double' calculation? [duplicate]
(5 answers)
Closed 9 years ago.
While I was having fun with codes from Java Puzzlers(I don't have the book) I came across this piece of code
public static void main(String args[]) {
System.out.println(2.00 - 1.10);
}
Output is
0.8999999999999999
When I tried changing the code to 2.00d - 1.10d, I still get the same output, 0.8999999999999999.
For 2.00d - 1.10f, the output is 0.8999999761581421.
For 2.00f - 1.10d, the output is 0.8999999999999999.
For 2.00f - 1.10f, the output is 0.9.
Why didn't I get the output 0.9 in the first place? I couldn't make heads or tails of this. Can somebody explain it?
Because in Java, double values are IEEE floating point numbers.
The workaround could be to use the BigDecimal class:
Immutable, arbitrary-precision signed decimal numbers. A BigDecimal
consists of an arbitrary precision integer unscaled value and a 32-bit
integer scale. If zero or positive, the scale is the number of digits
to the right of the decimal point. If negative, the unscaled value of
the number is multiplied by ten to the power of the negation of the
scale. The value of the number represented by the BigDecimal is
therefore (unscaledValue × 10^-scale).
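Applied to the question's numbers (the class name is mine), with the unscaled value and scale from the quoted javadoc made visible:

import java.math.BigDecimal;

public class BigDecimalSubtract {
    public static void main(String[] args) {
        BigDecimal result = new BigDecimal("2.00").subtract(new BigDecimal("1.10"));
        System.out.println(result);                  // 0.90
        System.out.println(result.unscaledValue());  // 90
        System.out.println(result.scale());          // 2, i.e. 0.90 = 90 x 10^-2
    }
}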
On a side note, you may also want to check the Wikipedia article on IEEE 754 for how floating point numbers are stored on most systems.
The more operations you do on a floating point number, the more significant rounding errors can become.
In binary, 0.1 is 0.00011001100110011001100110011001...
As such it cannot be represented exactly in binary. Depending on where you round off (float or double), you get different answers.
So 0.1f = 0.000110011001100110011001100
and 0.1d = 0.0001100110011001100110011001100110011001100110011001
Note that the number repeats in a 1100 cycle, but float and double precision cut it off at different points in that cycle. As a result, the error rounds up in one and down in the other, leading to the difference.
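You can see the two cut-off points by widening the float to a double; this quick check (class name mine) matches the outputs quoted in the question:

public class FloatVsDouble {
    public static void main(String[] args) {
        // Widening 1.10f to double exposes where the float's
        // repeating 1100 pattern was cut off:
        System.out.println((double) 1.10f);  // 1.100000023841858
        System.out.println(2.00d - 1.10f);   // 0.8999999761581421
        System.out.println(2.00f - 1.10f);   // 0.9
    }
}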
But most importantly;
Never assume floating point numbers are exact
Other answers are correct, just to point to a valid reference, I quote oracle doc:
double: The double data type is a double-precision 64-bit IEEE 754
floating point. Its range of values is beyond the scope of this
discussion, but is specified in the Floating-Point Types, Formats, and
Values section of the Java Language Specification. For decimal values,
this data type is generally the default choice. As mentioned above,
this data type should never be used for precise values, such as
currency

Adding and subtracting doubles are giving strange results [duplicate]

This question already has answers here:
Retain precision with double in Java
(24 answers)
Closed 9 years ago.
So when I add or subtract doubles in Java, I get strange results. Here are some:
If I add 0.0 + 5.1, it gives me 5.1. That's correct.
If I add 5.1 + 0.1, it gives me 5.199999999999 (The number of repeating 9s may be off). That's wrong.
If I subtract 4.8 - 0.4, it gives me 4.39999999999995 (Again, the repeating 9s may be off). That's wrong.
At first I thought this was only a problem when adding doubles with decimal values, but I was wrong. The following worked fine:
5.1 + 0.2 = 5.3
5.1 - 0.3 = 4.8
Now, the first number added is a double saved as a variable, while the second grabs its text from a JTextField. For example:
//doubleNum = 5.1 RIGHT HERE
//The textfield has only a "0.1" in it.
doubleNum += Double.parseDouble(textField.getText());
//doubleNum = 5.199999999999999
In Java, double values are IEEE floating point numbers. Unless a value is a power of 2 (or a sum of powers of 2 that fits in the significand, e.g. 1/8 + 1/4 = 3/8), it cannot be represented exactly, no matter how much precision the format carries. Some floating point operations then compound the round-off error present in these numbers. In the cases you've described, the floating-point errors have become significant enough to show up in the output.
It doesn't matter what the source of the number is, whether it's parsing a string from a JTextField or a double literal in source code: the problem is inherent in floating-point representation.
Workarounds:
If you know you'll only have a fixed number of decimal places, use integer arithmetic and then convert to a decimal:
(double) (51 + 1) / 10
(double) (48 - 4) / 10
Use BigDecimal
If you must use double, you can cut down on floating-point errors with the Kahan summation algorithm (see the sketch below).
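Here is a minimal Kahan summation sketch (class and method names are mine); a running compensation term feeds the low-order bits lost in each addition back into the sum:

public class KahanSum {
    static double kahanSum(double[] values) {
        double sum = 0.0;
        double c = 0.0;             // running compensation for lost low-order bits
        for (double v : values) {
            double y = v - c;       // correct the next term
            double t = sum + y;     // low bits of y may be lost here...
            c = (t - sum) - y;      // ...recover them algebraically
            sum = t;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] tenths = new double[10];
        java.util.Arrays.fill(tenths, 0.1);
        double naive = 0.0;
        for (double v : tenths) naive += v;
        System.out.println(naive);            // 0.9999999999999999
        System.out.println(kahanSum(tenths)); // 1.0
    }
}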
In Java, doubles use IEEE 754 floating point arithmetic (see this Wikipedia article), which is inherently inaccurate. Use BigDecimal for perfect decimal precision. To round in printing, accepting merely "pretty good" accuracy, use printf("%.3f", x).
