Java float to long explicit cast [duplicate] - java

This question already has answers here:
How many significant digits do floats and doubles have in java?
(6 answers)
Closed 7 years ago.
I'd like to understand how i in this piece of code ends up being 99999904:
long i = (long)(99999900f);
System.out.println(i);
I understand that floats are stored with limited precision and are not always the exact value they appear to be, but I don't see how that could lead to the last digit being a 4, rather than a 9 or something that would eventually round to 99999900.

I'd like to understand how i in this piece of code ends up being 99999904
99999904 has a simpler representation in binary than 99999900 (uses fewer significant digits), and since the latter needs more significant digits than can be stored in a float, Java has to use the nearest representable value instead.
99999900 in hex: 0x5F5E09C
What 24 significant binary digits look like in hex:
0x7FFFFF8
99999904 in hex: 0x5F5E0A0
Between the two nearest representable values 0x5F5E098 and 0x5F5E0A0, Java chose the latter according to the “nearest-even” rule. The two representable values are in fact equally close, but 0x5F5E0A0 has 0 for the last significant digit, and so is chosen according to this rule. You can check for yourself that 99999899 rounds down to 0x5F5E098 and 99999900 rounds up to 0x5F5E0A0.
With 24 significant binary digits in the float type, all integers between 0 and 2^24 - 1 can be represented exactly. 2^24 happens to be representable exactly as well (it has only one significant binary digit), but 2^24 + 1, or 16777217, is not representable exactly as a float, and integers above it are not automatically representable exactly either.
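You can check all of this from Java itself; here is a minimal sketch (class name mine) printing the rounded values and their hex forms:
public class FloatRounding {
    public static void main(String[] args) {
        // 99999900 needs more significant bits than float's 24, so the
        // literal is rounded to the nearest representable value.
        System.out.println((long) 99999900f);                      // 99999904
        System.out.println(Integer.toHexString((int) 99999900f)); // 5f5e0a0
        System.out.println(Integer.toHexString((int) 99999899f)); // 5f5e098
        // The first integer that float cannot represent exactly:
        System.out.println((long) 16777217f);                      // 16777216
    }
}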

Related

How is float range that large with 4 bytes (±3.40282347E+38) [duplicate]

This question already has answers here:
What is the difference between the float and integer data type when the size is the same?
(3 answers)
Closed 3 years ago.
Both the int and float types are 4 bytes in Java.
How, then, can int represent a range of just -2,147,483,648 to 2,147,483,647 while float reaches approximately ±3.40282347E+38, when both have the same limited number of bytes?
To my understanding both should have the same range, since they have the same number of bytes. Can someone explain to me how float can represent a range that large?
"Floating" point means that the number of digits for the fractional part of your number can change to represent your number "as best as possible" given the constraints dictated by its size.
Let's forget for the time being the 4 bytes of the float datatype and assume that your "floating point" type can store up to 10 digits plus a negative sign.
This means you can accurately represent numbers from -9 999 999 999 to +9 999 999 999.
However, if you want one decimal place, you can only accurately represent numbers from -999 999 999.9 to +999 999 999.9. As you can see, the range has effectively changed.
Now, let's formalize the explanation a bit by talking about the significand and the exponent:
the significand contains your significant digits
the exponent represents the power of the 10 multiplier or, if it's easier, how many positions you have to move the decimal point (with 0 being just before the first significant digit).
Let's say that your "floating point" data type can have up to 4 digits in its significand and up to 1 digit in its exponent, plus a minus sign on both the significand and the exponent.
You will be able to represent numbers from -0.9999 * 10^9 = -999900000 to +0.9999 * 10^9 = +999900000. As you can see, while the numbers are pretty large, you can't accurately represent most large numbers, since you only have 4 digits to work with. This loss in precision is compensated by the ability to represent very small numbers: for example, you can represent 0.9999 * 10^-9 = 0.0000000009999.
This explains why the range is so large despite the size being only 4 bytes, as stated in your question.
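To see the same trade-off in actual Java, here is a minimal sketch (class name mine) comparing the two 4-byte types:
public class RangeVsPrecision {
    public static void main(String[] args) {
        // Same 4 bytes of storage, very different ranges:
        System.out.println(Integer.MAX_VALUE); // 2147483647
        System.out.println(Float.MAX_VALUE);   // 3.4028235E38
        // The price is precision: nearby large integers collapse
        // onto the same nearest representable float.
        System.out.println(2_000_000_000f == 2_000_000_001f); // true
    }
}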
To complete your knowledge on the matter, bring the above concepts to binary: a typical float uses 8 bits for the exponent, 23 bits for the significand, and 1 bit for the sign.
Wikipedia is a good starting point. The major takeaway from programming purposes, usually, is to understand how many decimal digits you can store given your datatype (your "precision") as that will determine what specific decimal format fits your purposes best.
See the following link for more information:
https://en.wikipedia.org/wiki/Floating-point_arithmetic#IEEE_754:_floating_point_in_modern_computers
Please note that understanding the concept of floating point numbers on a binary system is extremely important in information technology, as even the simplest computations are heavily affected by it.
The binary representation of floating point numbers is, for example, the reason why code like this:
public class MyClass {
    public static void main(String args[]) {
        // The float literals are widened to double, but they carry the
        // (slightly off) float approximations of 0.1, 0.2 and 0.3.
        double x = 0.1f;
        double y = 0.2f;
        double z = 0.3f;
        if (x + y == z) {
            System.out.println("something");
        } else {
            System.out.println("something else");
        }
    }
}
will counter-intuitively print something else, but if you start playing with the numbers, or change the type to float, it will produce the intuitively correct output.
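For example, this float variant (a minimal sketch, class name mine) prints something, because at float precision 0.1f + 0.2f happens to round to exactly the same value as the literal 0.3f:
public class MyClassFloat {
    public static void main(String args[]) {
        float x = 0.1f;
        float y = 0.2f;
        float z = 0.3f;
        // The sum and the literal round to the same 32-bit value here.
        if (x + y == z) {
            System.out.println("something");
        } else {
            System.out.println("something else");
        }
    }
}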
So be aware: you will need to understand the concept fully.

Java Double Precision: Different formula leads to different result [duplicate]

This question already has answers here:
Rounding Errors?
(9 answers)
Is floating point math broken?
(31 answers)
Closed 6 years ago.
Today I found an interesting fact: the formula you use influences the precision of the result. Please see the code below.
double x = 7d
double y = 10d
println(1 - x/y)
println((y - x)/y)
I wrote this code in Groovy; you can just treat it as Java.
The result is
1-x/y: 0.30000000000000004
(y-x)/y: 0.3
It's interesting that two formulas which should be equivalent give different results.
Can anyone explain this?
And can I apply the second formula wherever applicable, as a valid workaround for double-precision issues?
To control the precision of floating point arithmetic, you should use java.math.BigDecimal.
You can do something like this:
BigDecimal xBigdecimal = BigDecimal.valueOf(7d);
BigDecimal yBigdecimal = BigDecimal.valueOf(10d);
// 7/10 terminates in decimal, so divide() needs no explicit scale here
System.out.println(BigDecimal.valueOf(1).subtract(xBigdecimal.divide(yBigdecimal)));
Can anyone explain it for me?
The float and double primitive types in Java are floating point numbers, where the number is stored as a binary representation of a fraction and an exponent.
More specifically, a double-precision floating point value such as the double type is a 64-bit value, where:
1 bit denotes the sign (positive or negative).
11 bits for the exponent.
52 bits for the significant digits (the fraction, stored in binary).
These parts are combined to produce a double representation of a value.
For a detailed description of how floating point values are handled in Java, follow Floating-Point Types, Formats, and Values of the Java Language Specification.
The byte, char, int, and long types are fixed-point numbers, which are exact representations of numbers. Unlike fixed-point numbers, floating point numbers will sometimes (it is safe to assume "most of the time") not be able to represent a number exactly. This is why you end up with 0.30000000000000004 in the result of 1 - (x / y): x / y rounds 0.7 to the nearest double, which is slightly below 0.7, and subtracting that from 1 lands slightly above 0.3. In (y - x) / y, by contrast, y - x is exactly 3.0, and 3.0 / 10.0 rounds to the double nearest 0.3, which prints as 0.3.
When you require an exact decimal value, such as 1.5 or 150.1005, you'll want a representation that can hold the number exactly, such as BigDecimal, since the binary floating point types cannot.
As I've already shown in the example above, Java has a BigDecimal class which will handle very large and very small numbers.
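To see exactly why the two formulas differ, you can print the exact decimal expansion of each double result. Here is a minimal sketch (class name mine), relying on the fact that the new BigDecimal(double) constructor, unlike BigDecimal.valueOf, preserves the exact binary value:
import java.math.BigDecimal;

public class ExactValues {
    public static void main(String[] args) {
        double x = 7d;
        double y = 10d;
        // x/y rounds 0.7 to the nearest double, slightly below 0.7;
        // 1 minus that lands slightly above 0.3.
        System.out.println(new BigDecimal(1 - x / y));
        // y - x is exactly 3.0, and 3.0/10.0 rounds to the double
        // nearest 0.3, which prints as "0.3".
        System.out.println(new BigDecimal((y - x) / y));
    }
}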

Converting double value of 1234567.1234 to float in java

I am trying to convert a double to a float in Java.
Double d = 1234567.1234;
Float f = d.floatValue();
I see that the value of f is
1234567.1
I am not trying to print a string value of the float. I just wonder what the maximum number of digits is that won't lose any precision when converting double to float. Can I show more than 8 significant digits in Java?
float: 32 bits (4 bytes), of which 23 bits are used for the mantissa (about 6 to 9 decimal digits, roughly 7 on average). 8 bits are used for the exponent, so a float can "move" the decimal point to the right or to the left using those 8 bits. Doing so avoids storing lots of zeros in the mantissa, as in 0.0000003 (3 × 10^-7) or 3000000 (3 × 10^7). There is 1 bit used as the sign bit.
double: 64 bits (8 bytes), of which 52 bits are used for the mantissa (about 15 to 17 decimal digits, roughly 16 on average). 11 bits are used for the exponent and 1 bit is the sign bit.
I believe you hit this limit, and that is what causes the problem.
If you change the code to
Double d = 123456789.1234;
Float f = d.floatValue();
you will see that the float value is 1.23456792E8.
The precision of a float is about 7 decimals, but since floats are stored in binary, that's an approximation.
To illustrate the actual precision of the float value in question, try this:
double d = 1234567.1234;
float f = (float)d;
System.out.printf("%.9f%n", d);
System.out.printf("%.9f%n", Math.nextDown(f));
System.out.printf("%.9f%n", f);
System.out.printf("%.9f%n", Math.nextUp(f));
Output
1234567.123400000
1234567.000000000
1234567.125000000
1234567.250000000
As you can see, the effective decimal precision is about 1 decimal place for this number, i.e. 8 significant digits. But if you run the code with the number 9876543.9876, you get:
9876543.987600000
9876543.000000000
9876544.000000000
9876545.000000000
That's only 7 digits of precision.
This is a simple example in support of the view that there is no safe number of decimal digits.
Consider 0.1.
The closest IEEE 754 64-bit binary floating point number has exact value 0.1000000000000000055511151231257827021181583404541015625. It converts to 32-bit binary floating point as 0.100000001490116119384765625, which is considerably further from 0.1.
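Both exact values can be reproduced in Java; a minimal sketch (class name mine), again using the exact-value new BigDecimal(double) constructor:
import java.math.BigDecimal;

public class ExactTenth {
    public static void main(String[] args) {
        // The exact value of the double nearest to 0.1:
        System.out.println(new BigDecimal(0.1));
        // The exact value after converting to float and widening back:
        System.out.println(new BigDecimal((double) 0.1f));
    }
}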
There can be loss of precision with even a single significant digit and single decimal place.
Unless you really need the compactness of float, and have very relaxed precision requirements, it is generally better to just stick with double.
I just wonder what is the maximum number of digits not to lose any precision when converting double to float.
Maybe you don't realize it, but the concept of N digits of precision is already ambiguous. Doubtless you meant "N digits of precision in base 10". But unlike humans, our computers work in base 2.
It's not possible to convert every number from base X to base Y (with a limited number of retained digits) without loss of precision. For example, the value 1/3 is perfectly representable in base 3 as "0.1", while in base 10 it has an infinite number of digits: 0.3333333333333... Likewise, numbers that are perfectly representable in base 10, e.g. 0.1, need an infinite number of digits to be represented in base 2. On the other hand, 0.5 (base 10) is perfectly representable as 0.1 (base 2).
So back to
I just wonder what is the maximum number of digits not to lose any precision when converting double to float.
The answer is: it depends on the value. The commonly cited rule of thumb "float has about 6 to 7 decimal digits of precision" is just an approximation. It can be much more or much less depending on the value.
When dealing with floating point, the concept of relative accuracy is more useful: stop thinking about "digits" and think about relative error instead. Any number N (in range) is representable with an error of at most N / 2^p, where p is the number of significand bits of the chosen format (23 + 1 for float, 52 + 1 for double). So a decimal number stored as a float has a maximum approximation error of N / 2^24. The error may be less, even zero, but it is never greater.
The 23 + 1 comes from the convention that floating point numbers are organized so that the exponent is chosen such that the first significand bit is always a 1 (whenever possible), and so it doesn't need to be stored explicitly. The 23 physically stored bits thus provide one extra bit of accuracy. (There is an exceptional case, subnormal numbers, where "whenever possible" does not apply, but let's ignore that here.)
TL;DR: There is no fixed number of decimal digits accuracy in float or double.
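A minimal sketch (class name mine) that makes the varying gap between adjacent floats visible, using Math.ulp (the size of one unit in the last place at a given value):
public class UlpDemo {
    public static void main(String[] args) {
        // The gap between adjacent floats grows with magnitude,
        // so the number of trustworthy decimal digits varies too.
        System.out.println(Math.ulp(1.0f));       // 1.1920929E-7
        System.out.println(Math.ulp(1234567.0f)); // 0.125
        System.out.println(Math.ulp(9876543.0f)); // 1.0
        System.out.println(Math.ulp(1.0e9f));     // 64.0
    }
}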
EDIT:
No, you cannot get any more precision out of a float in Java, because a float only has 32 bits (4 bytes). If you want more precision, then continue to use double.

Subtracting two decimal numbers giving weird outputs [duplicate]

This question already has answers here:
Whats wrong with this simple 'double' calculation? [duplicate]
(5 answers)
Closed 9 years ago.
While I was having fun with code from Java Puzzlers (I don't have the book), I came across this piece of code:
public static void main(String args[]) {
    System.out.println(2.00 - 1.10);
}
Output is
0.8999999999999999
When I tried changing the code to 2.00d - 1.10d, I still got the same output: 0.8999999999999999.
For 2.00d - 1.10f, the output is 0.8999999761581421.
For 2.00f - 1.10d, the output is 0.8999999999999999.
For 2.00f - 1.10f, the output is 0.9.
Why didn't I get 0.9 in the first place? I can't make heads or tails of this. Can somebody explain?
Because in Java, double values are IEEE 754 floating point numbers.
The workaround could be to use the BigDecimal class:
Immutable, arbitrary-precision signed decimal numbers. A BigDecimal consists of an arbitrary precision integer unscaled value and a 32-bit integer scale. If zero or positive, the scale is the number of digits to the right of the decimal point. If negative, the unscaled value of the number is multiplied by ten to the power of the negation of the scale. The value of the number represented by the BigDecimal is therefore (unscaledValue × 10^-scale).
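For example, a minimal sketch (class name mine) using the String constructor so that the inputs are exact decimals:
import java.math.BigDecimal;

public class ExactSubtraction {
    public static void main(String[] args) {
        // String-based BigDecimals hold the decimal values exactly,
        // so the subtraction is exact too.
        System.out.println(new BigDecimal("2.00").subtract(new BigDecimal("1.10"))); // 0.90
    }
}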
On a side note, you may also want to check the Wikipedia article on IEEE 754 for how floating point numbers are stored on most systems.
The more operations you do on a floating point number, the more significant rounding errors can become.
In binary, 0.1 is 0.0001100110011001100110011001...
As such it cannot be represented exactly in binary. Depending on where you round off (float or double), you get different values:
0.1f = 0.000110011001100110011001101 (24 significant bits, rounded up in the last place)
0.1d = 0.00011001100110011001100110011001100110011001100110011010 (53 significant bits, likewise rounded)
Note that the digits repeat in a 1100 cycle, and that float and double precision cut that cycle at different points. The rounding errors therefore differ, which leads to the different results.
But most importantly:
Never assume floating point numbers are exact
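A minimal sketch (class name mine) that shows where each type cuts the repeating cycle, using the built-in hex representations:
public class BinaryTenth {
    public static void main(String[] args) {
        // Hex float notation exposes the significand bit patterns.
        System.out.println(Float.toHexString(1.1f)); // 0x1.19999ap0
        System.out.println(Double.toHexString(1.1)); // 0x1.199999999999ap0
        // The widened float differs visibly from the double, hence
        // the different results of 2.00 - 1.10f and 2.00 - 1.10d.
        System.out.println((double) 1.1f); // 1.100000023841858
    }
}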
The other answers are correct; just to point at a valid reference, I quote the Oracle docs:
double: The double data type is a double-precision 64-bit IEEE 754 floating point. Its range of values is beyond the scope of this discussion, but is specified in the Floating-Point Types, Formats, and Values section of the Java Language Specification. For decimal values, this data type is generally the default choice. As mentioned above, this data type should never be used for precise values, such as currency.

Difference between Float.MIN_VALUE and Float.MIN_NORMAL? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Difference among Double.MIN_NORMAL and Double.MIN_VALUE
1) Can someone please explain to me what the difference is between MIN_NORMAL and MIN_VALUE?
System.out.println(Float.MIN_NORMAL);
System.out.println(Float.MIN_VALUE);
2) Also, why does this still print 1.0?
float f = Float.MIN_NORMAL + 1.0f;
System.out.println(f);
double d = Float.MIN_NORMAL + 1.0f;
System.out.println(d);
Output:
1.17549435E-38
1.4E-45
1.0
1.0
The answer for the first question is in the duplicate.
The answer for the second question is:
Neither floats nor doubles have infinite precision. You can conveniently think of a double as having around 16 decimal digits of precision (a float around 7). Anything past that and you are going to have rounding errors and truncation.
So, 1.0e0 + 1e-38 lacks the precision to end up as anything but 1.0e0, because the extra precision is truncated away.
Really, like the rest of the answer, it requires an understanding of how floating point numbers in IEEE format are actually added in binary. The basic idea is that the significand of the binary floating point number is shifted, according to the exponent difference, in the CPU's floating point unit (80 bits wide on x87-style Intel hardware, which means there is always truncation at the end of the calculation) so that the two operands line up. In decimal, this would look like this:
Digit: 1 234567890123456
Value: 1.0000000000000000000000000000...0000
Value: 0.0000000000000000000000000000...0001
After the add is processed, it is practically:
Digit: 1 234567890123456
Value: 1.0000000000000000000000000000...0001
So, keep in mind that the value truncates around the 16-digit mark in decimal. In binary, a 32-bit float stores 23 significand bits and a 64-bit double stores 52, and the leading 1 is implied by the exponent rather than stored, giving 24 and 53 effective significant bits respectively. This is a very interesting point, but for the details you should read more thorough examples, such as the converter linked below.
Truncated:
Digit: 1 234567890123456
Value: 1.00000000000000000000
Note the really small decimal portion is truncated, thus leaving 1.
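A minimal sketch (class name mine) that shows why the addition vanishes: the gap between 1.0f and the next representable float is vastly larger than Float.MIN_NORMAL:
public class MinNormalDemo {
    public static void main(String[] args) {
        // The gap between 1.0f and the next representable float...
        System.out.println(Math.ulp(1.0f));   // 1.1920929E-7
        // ...dwarfs MIN_NORMAL, so adding it cannot change 1.0f.
        System.out.println(Float.MIN_NORMAL); // 1.17549435E-38
        System.out.println(1.0f + Float.MIN_NORMAL == 1.0f); // true
    }
}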
Here is a good page that I use whenever I have trouble picturing the actual representation in memory: Decimal to 32-bit IEEE-754 format. The site also has links for 64-bit, and for converting in reverse.
