Why is (long)9223372036854665200d giving me 9223372036854665216? - java

I know about weird stuff with precision errors, but I can't fathom,
Why is (long)9223372036854665200d giving me 9223372036854665216 ?

9223372036854665200d is a constant of type double. However, 9223372036854665200 does not fit in a double without loss of precision. A double only has 52 bits of mantissa, whereas the number in question requires 63 bits to be represented exactly.
The nearest double to 9223372036854665200d is the number whose mantissa equals 1.1111111111111111111111111111111111111111111110010100 in binary and whose exponent is 62 (decimal), since the value lies just below 2^63. This number is none other than 9223372036854665216 (call it U).
If we decrease the mantissa one notch to 1.1...0011, we get 9223372036854664192 (call it L).
The original number is between L and U and is much closer to U (16 away) than it is to L (1008 away), so it rounds to U.
Finally, if you think that this truncation of the mantissa ought to result in a number that ends in a bunch of zeros, you're right. Only it happens in binary, not in decimal: U in base-16 is 0x7ffffffffffe5000 and L is 0x7ffffffffffe4c00.
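The two neighbouring values described above can be checked directly. This is a minimal sketch using only the standard library; the class name NearestDouble and its helper methods are made up for illustration.

```java
import java.math.BigDecimal;

class NearestDouble {
    // The compiler rounds this literal to the nearest representable double (U).
    static double upper() {
        return 9223372036854665200d;
    }

    // The adjacent representable double just below U (L).
    static double lower() {
        return Math.nextDown(upper());
    }

    public static void main(String[] args) {
        // Exact decimal values of the two neighbouring doubles
        System.out.println(new BigDecimal(upper()).toPlainString());
        System.out.println(new BigDecimal(lower()).toPlainString());
        // The same values in hex: the trailing zeros appear in binary, not decimal
        System.out.println(Long.toHexString((long) upper()));
        System.out.println(Long.toHexString((long) lower()));
        // Gap between adjacent doubles at this magnitude
        System.out.println(Math.ulp(upper()));
    }
}
```

Math.ulp reports the spacing between adjacent doubles near U, which is why the decimal literal cannot land on an arbitrary integer.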

Because doubles don't have that much precision. Why are you doing such a strange thing? Change the d suffix to L so the constant is a long literal and no conversion happens.

Doubles have 53 bits of precision (52 stored plus one implied), whereas a long has 64 bits of precision (for integers only). The bits a double gives up are used for the sign and the exponent, which allows a double to represent much larger and much smaller magnitudes than a long can.
Your number is 19 digits long, whereas a double can only store roughly 16 digits of (decimal) integer data. Thus the final number ends up being rounded.
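A rough way to see where the rounding starts is to push long values through a double and back. This is only a sketch; the class LongThroughDouble and the method survives are hypothetical names, and the check is not meaningful right at Long.MAX_VALUE, where the cast back to long saturates.

```java
class LongThroughDouble {
    // True if the long value is unchanged after a round trip through double.
    static boolean survives(long value) {
        return (long) (double) value == value;
    }

    public static void main(String[] args) {
        long limit = 1L << 53; // 9007199254740992, the last point where every integer fits
        System.out.println(survives(limit));                 // a power of two, stored exactly
        System.out.println(survives(limit + 1));             // needs 54 bits, gets rounded
        System.out.println(survives(9223372036854665200L));  // the value from the question
    }
}
```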
Reference: Double - Wikipedia

Because doubles have limited precision. Your constant has more significant digits than a double can keep track of, so it loses them.

You are assuming that limited precision means that it is represented in decimal so is limited to 15 or 16 digits. Actually it is represented in binary and limited to 53 bits of precision. double takes the closest representable value.
double d = 9223372036854665200d;
System.out.println(d +" is actually\n" + new BigDecimal(d)+" so when cast to (long) is\n"+(long) d);
prints
9.2233720368546652E18 is actually
9223372036854665216 so when cast to (long) is
9223372036854665216

Related

Type casting in Java. Double to long

Why the output of System.out.println((long)Math.pow(2,63)); and System.out.println((long)(Math.pow(2,63)-1)); is same in Java?
The output is the same because double does not have enough bits to represent 2^63 - 1 exactly.
The mantissa of a double has only 52 bits.
This gives you at most 17 decimal digits of precision. The value you computed, on the other hand, is 9223372036854775808, so it needs 19 digits to be written out in decimal. 2^63 itself is a power of two, so it is stored exactly (it merely prints in shortened form as 9.223372036854776E18):
Mantissa is set to 1.0 (1 in front is implied)
Exponent is set to 1086 (1023 is implicitly subtracted to yield 63)
The mantissa of the representation of 1 is the same, while the exponent is 1023 for the effective value of zero, i.e. the exponents of the two numbers differ by 63, which is more than the size of the mantissa.
The subtraction of 1 happens while your number is represented as a double. Since the magnitude of the minuend is much larger than that of the subtrahend, the exact result 2^63 - 1 cannot be represented and rounds back to 2^63, so the subtraction has no effect. Casting 2^63 to long then clamps to Long.MAX_VALUE in both cases.
You would get the same result after subtracting much larger numbers, all the way to 512, which is 2^9. Beyond that, the subtrahend is large enough relative to the spacing between doubles just below 2^63 (which is 1024) that you would start getting different results.
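The rounding behaviour described in this answer can be reproduced with a few comparisons. This is a small sketch; the class PowMinusOne and the helper absorbed are illustrative names only.

```java
class PowMinusOne {
    // True if subtracting the given amount from 2^63 (as a double) changes nothing.
    static boolean absorbed(double amount) {
        double p = Math.pow(2, 63);
        return p - amount == p;
    }

    public static void main(String[] args) {
        System.out.println(absorbed(1));    // true: the 1 is rounded away
        System.out.println(absorbed(512));  // true: still rounds back to 2^63
        System.out.println(absorbed(513));  // false: rounds to the next double down
        System.out.println((long) (Math.pow(2, 63) - 513)); // 9223372036854774784
        // The cast to long saturates, so 2^63 becomes Long.MAX_VALUE
        System.out.println((long) Math.pow(2, 63) == Long.MAX_VALUE); // true
    }
}
```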
Math.pow(double, double) returns a double value.
double in Java is a 64-bit IEEE 754 floating point (https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html).
If you look here: https://en.wikipedia.org/wiki/Double-precision_floating-point_format you will find, that this format is composed of:
1 bit sign
11 bit exponent
53 bit significand precision (52 bits explicitly stored)
The result of subtracting 1 from the number returned by pow would need a higher precision (63 bits) to be stored exactly.
Basically the 1 you subtract is below this precision threshold.
In contrast long has 64 bit precision.
To make it more clear, let's assume we are working in decimal and not in base 2:
In some imaginary small float datatype with a precision of 2 the value 1000 would be stored as 1.00e3. If you add 1 it would have to store it as 1.001e3. But since we only have a precision of 2 it can only store 1.00e3 and nothing changes. So 1.00e3 + 1 == 1.00e3
The same happens in your example, only that we are dealing with larger numbers and base 2, of course.
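The decimal analogy can be imitated with BigDecimal and a MathContext that keeps three significant digits (the 1.00e3 form from the example). This is only a sketch of the idea, not how float works internally; ThreeDigitDecimal and add3 are made-up names.

```java
import java.math.BigDecimal;
import java.math.MathContext;

class ThreeDigitDecimal {
    // Adds two decimal values, then rounds the sum to 3 significant digits.
    static BigDecimal add3(String a, String b) {
        return new BigDecimal(a).add(new BigDecimal(b), new MathContext(3));
    }

    public static void main(String[] args) {
        BigDecimal sum = add3("1000", "1");
        System.out.println(sum.toPlainString());                        // 1000
        System.out.println(sum.compareTo(new BigDecimal("1000")) == 0); // true
        // The binary counterpart with float: at 2^24 adding 1 changes nothing
        System.out.println(16777216f + 1f == 16777216f);                // true
    }
}
```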
You should use parentheses so that the cast to long is applied to the result first, and then subtract 1, like this:
System.out.println((long)Math.pow(2,63));
System.out.println(((long)(Math.pow(2,63))-1));
Output:
9223372036854775807
9223372036854775806
For the long data type in Java, the maximum value is 9,223,372,036,854,775,807 (inclusive), i.e. 2^63 - 1.
So even if you try a larger power, the cast clamps the result to that maximum:
System.out.println((long)Math.pow(2,65));
System.out.println((long)(Math.pow(2,63)-1));
output
9223372036854775807
9223372036854775807
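If the exact integer value is what matters, one option is to skip double entirely and compute the power with BigInteger. This is a hedged sketch of that alternative, not something the answers above prescribe; ExactPowers and twoTo are illustrative names.

```java
import java.math.BigInteger;

class ExactPowers {
    // Computes 2^exponent exactly, with no floating point involved.
    static BigInteger twoTo(int exponent) {
        return BigInteger.valueOf(2).pow(exponent);
    }

    public static void main(String[] args) {
        BigInteger max = twoTo(63).subtract(BigInteger.ONE);
        System.out.println(max);                                    // 9223372036854775807
        System.out.println(max.longValueExact() == Long.MAX_VALUE); // true
        System.out.println(twoTo(65));                              // 36893488147419103232
        try {
            twoTo(63).longValueExact(); // does not fit, so this throws instead of clamping
        } catch (ArithmeticException e) {
            System.out.println("2^63 does not fit in a long");
        }
    }
}
```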

Converting double value of 1234567.1234 to float in java

I am trying to convert double to float in java.
Double d = 1234567.1234;
Float f = d.floatValue();
I see that the value of f is
1234567.1
I am not trying to print a string value of float. I just wonder what is the maximum number of digits not to lose any precision when converting double to float. Can i show more than 8 significant digits in java?
float: 32 bits (4 bytes) where 23 bits are used for the mantissa (6 to 9 decimal digits, about 7 on average). 8 bits are used for the exponent, so a float can “move” the decimal point to the right or to the left using those 8 bits. Doing so avoids storing lots of zeros in the mantissa as in 0.0000003 (3 × 10^-7) or 3000000 (3 × 10^7). There is 1 bit used as the sign bit.
double: 64 bits (8 bytes) where 52 bits are used for the mantissa (15 to 17 decimal digits, about 16 on average). 11 bits are used for the exponent and 1 bit is the sign bit.
I believe you hit this limit, which is what causes the problem.
If you change the code to
Double d = 123456789.1234;
Float f = d.floatValue();
You will see that float value will be 1.23456792E8
The precision of a float is about 7 decimals, but since floats are stored in binary, that's an approximation.
To illustrate the actual precision of the float value in question, try this:
double d = 1234567.1234;
float f = (float)d;
System.out.printf("%.9f%n", d);
System.out.printf("%.9f%n", Math.nextDown(f));
System.out.printf("%.9f%n", f);
System.out.printf("%.9f%n", Math.nextUp(f));
Output
1234567.123400000
1234567.000000000
1234567.125000000
1234567.250000000
As you can see, the effective decimal precision is about 1 decimal place for this number, or 8 digits, but if you ran the code with the number 9876543.9876, you get:
9876543.987600000
9876543.000000000
9876544.000000000
9876545.000000000
That's only 7 digits of precision.
This is a simple example in support of the view that there is no safe number of decimal digits.
Consider 0.1.
The closest IEEE 754 64-bit binary floating point number has exact value 0.1000000000000000055511151231257827021181583404541015625. It converts to 32-bit binary floating point as 0.100000001490116119384765625, which is considerably further from 0.1.
There can be loss of precision with even a single significant digit and single decimal place.
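The exact values quoted for 0.1 can be printed with BigDecimal, whose double constructor preserves the binary value exactly. A minimal sketch; ExactTenth and exact are hypothetical names.

```java
import java.math.BigDecimal;

class ExactTenth {
    // Exact decimal expansion of the binary value actually stored.
    static String exact(double value) {
        return new BigDecimal(value).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(exact(0.1));   // the double closest to 0.1
        System.out.println(exact(0.1f));  // the float closest to 0.1, widened exactly
        System.out.println(exact(0.5));   // 0.5 is a power of two, so it is exact
    }
}
```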
Unless you really need the compactness of float, and have very relaxed precision requirements, it is generally better to just stick with double.
I just wonder what is the maximum number of digits not to lose any precision when converting double to float.
Maybe you don't realize it, but the concept of N digits of precision is already ambiguous. Doubtlessly you meant "N digits precision in base 10". But unlike humans, our computers work with Base 2.
It's not possible to convert every number from Base X to Base Y (with a limited number of retained digits) without loss of precision, e.g. the value of 1/3rd is perfectly accurately representable in Base 3 as "0.1". In Base 10 it has an infinite number of digits 0.3333333333333... Likewise, commonly perfectly representable numbers in Base 10, e.g. 0.1, need an infinite number of digits to be represented in Base 2. On the other hand, 0.5 (Base 10) is perfectly accurately representable as 0.1 (Base 2).
So back to
I just wonder what is the maximum number of digits not to lose any precision when converting double to float.
The answer is "it depends on the value". The commonly cited rule of thumb "float has about 6 to 7 digits decimal precision" is just an approximation. It can be much more or much less depending on the value.
When dealing with floating point, the concept of relative accuracy is more useful: stop thinking about "digits" and think in terms of relative error. Any number N (in range) is representable with an error of (at most) N / accuracy, where accuracy is 2 raised to the number of mantissa bits in the chosen format (e.g. 23 (+1) for float, 52 (+1) for double). So a decimal number represented as a float has a maximum approximation error of N / pow(2, 24). The error may be less, even zero, but it is never greater.
The 23+1 comes from the convention that floating point numbers are organized with the exponent chosen such that the first mantissa bit is always a 1 (whenever possible), so it doesn't need to be explicitly stored. The number of physically stored bits, e.g. 23, thus allows for one extra bit of accuracy. (There is an exceptional case where "whenever possible" does not apply, but let's ignore that here.)
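The N / pow(2, 24) bound can be checked for the sample values used in this thread. This is an illustrative sketch for values in float's normal range; FloatErrorBound and its methods are made-up names.

```java
class FloatErrorBound {
    // Absolute error introduced by converting the double to float.
    static double error(double d) {
        return Math.abs((double) (float) d - d);
    }

    // The bound from the text: |N| / 2^24.
    static double bound(double d) {
        return Math.abs(d) / Math.pow(2, 24);
    }

    public static void main(String[] args) {
        double[] samples = {1234567.1234, 9876543.9876, 123456789.1234, 0.1};
        for (double d : samples) {
            System.out.println(d + ": error " + error(d)
                    + ", bound " + bound(d)
                    + ", within bound: " + (error(d) <= bound(d)));
        }
    }
}
```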
TL;DR: There is no fixed number of decimal digits accuracy in float or double.
EDIT.
No, you cannot get any more precise with a float in Java, because a float only contains 32 bits (4 bytes). If you want more precision, then continue to use double.

Are there limits of precision on the memory capacity of double?

I'm reading an introductory book on Java and ran into something I don't quite understand. In covering variable types, the author states "the word double stands for numbers between
This struck me as odd, since as written, it would include all numbers on the real number line between the two aforementioned limits.
From my understanding, double is a primitive data type assigned 64 bits of memory. In turn, it's clear to me that 5.9 is a perfectly fine double, or float for that matter. However, I'm not sure how the series
i.e., 5.9, 5.99, 5.999, 5.9999, ... would fit in memory as k approaches infinity.
Is my intuition correct that not all real numbers between the author's two limits would be appropriately held in memory as a double?
Is my intuition that not all real numbers between the author's two limits would be appropriately held in memory as a double?
Yes, you are right.
Even the MOST obvious "doubles" cannot be stored correctly. For instance 0.1 is "1/10". Have you ever divided by ten in a base-2 system? The result is an infinitely repeating fraction (comparable to 1/3 in the base-10 system).
(This fact btw was responsible for the Patriot-Bug: http://sydney.edu.au/engineering/it/~alum/patriot_bug.html)
And therefore even some obvious easy maths will go wrong on java:
Take the compiler of your choice and try this:
System.out.println(0.8 == (0.1 + 0.7));
Whoops, it will output false.
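Two common ways around this comparison problem are a tolerance check and decimal arithmetic with BigDecimal. The sketch below shows both; the class CompareDoubles, the method nearlyEqual, and the tolerance 1e-9 are arbitrary choices for illustration, not a general recommendation.

```java
import java.math.BigDecimal;

class CompareDoubles {
    // Treats two doubles as equal when they differ by at most eps.
    static boolean nearlyEqual(double a, double b, double eps) {
        return Math.abs(a - b) <= eps;
    }

    public static void main(String[] args) {
        System.out.println(0.1 + 0.7);                         // 0.7999999999999999
        System.out.println(nearlyEqual(0.8, 0.1 + 0.7, 1e-9)); // true
        // String-based BigDecimal values are exact decimals
        BigDecimal sum = new BigDecimal("0.1").add(new BigDecimal("0.7"));
        System.out.println(sum.compareTo(new BigDecimal("0.8")) == 0); // true
    }
}
```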
Indeed. In one sentence, ints are exact, while floats and doubles are stored using scientific notation notated in bits. This means that there will be a roundoff error, as scientific notation goes.
As per wikipedia:
Sign bit: 1 bit
Exponent width: 11 bits
Significand precision: 53 bits (52 explicitly stored)
An interesting note: the exponent has no sign bit of its own. It is stored with a bias of 1023, so a stored value below 1023 means a negative exponent.
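The three fields can be pulled out of a double with Double.doubleToLongBits. A small sketch; DoubleFields and its method names are hypothetical.

```java
class DoubleFields {
    // Top bit: 0 for positive, 1 for negative.
    static int sign(double d) {
        return (int) (Double.doubleToLongBits(d) >>> 63);
    }

    // The 11 exponent bits exactly as stored.
    static int storedExponent(double d) {
        return (int) ((Double.doubleToLongBits(d) >>> 52) & 0x7FF);
    }

    // The 52 explicitly stored significand bits.
    static long fraction(double d) {
        return Double.doubleToLongBits(d) & 0x000FFFFFFFFFFFFFL;
    }

    public static void main(String[] args) {
        double[] samples = {1.0, -2.0, 0.1, Math.pow(2, 63)};
        for (double d : samples) {
            System.out.println(d + ": sign=" + sign(d)
                    + " exponent=" + storedExponent(d)
                    + " fraction=0x" + Long.toHexString(fraction(d)));
        }
    }
}
```

For 1.0 the stored exponent field is 1023, and for 2^63 it is 1086 with an all-zero fraction.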
To read more:
Wikipedia - Double precision floating point format
The double data type is a double-precision 64-bit (8 bytes) IEEE 754 floating point. The format consists of 1 bit for the sign, 11 bits for the exponent, and the remaining 52 bits for the fraction part of the significand. With 52 fraction bits stored in memory plus the implied leading bit, the total precision is 53 bits (approximately 16 decimal digits, 53 log10(2) ≈ 15.955). The 11-bit exponent allows the representation of numbers with magnitudes between roughly 10^−308 and 10^308 with full 15–17 decimal digits of precision. double and float are not exactly real numbers. There are infinitely many real numbers in any range, but only a finite number of bits to represent them, and hence not all numbers can be represented.
For higher and better precision, you can use BigDecimal class found in the java.math package.
http://docs.oracle.com/javase/7/docs/api/java/math/BigDecimal.html

Precision loss with java.lang.Double

Say I have 2 double values. One of them is very large and one of them is very small.
double x = 99....9; // I don't know the possible max and min values,
double y = 0,00..1; // so just assume these values are near max and min.
If I add those values together, do I lose precision?
In other words, does the max possible double value increase if I assign an int value to it? And does the min possible double value decrease if I choose a small integer part?
double z = x + y; // Real result is something like 999999999999999.00000000000001
double values are not evenly distributed over all numbers. double uses the floating point representation of the number which means you have a fixed amount of bits used for the exponent and a fixed amount of bits used to represent the actual "numbers"/mantissa.
So in your example using a large and a small value would result in dropping the smaller value since it can not be expressed using the larger exponent.
The solution to not dropping precision is using a number format that has a potentially growing precision like BigDecimal - which is not limited to a fixed number of bits.
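As a rough illustration of the difference, the sketch below adds 1 to 1e20 once as a double and once as a BigDecimal. The values and the names LargePlusSmall, absorbed and exactSum are chosen only for this example.

```java
import java.math.BigDecimal;

class LargePlusSmall {
    // True if adding small to big leaves big unchanged in double arithmetic.
    static boolean absorbed(double big, double small) {
        return big + small == big;
    }

    // The same sum computed with arbitrary-precision decimals.
    static BigDecimal exactSum(String big, String small) {
        return new BigDecimal(big).add(new BigDecimal(small));
    }

    public static void main(String[] args) {
        System.out.println(absorbed(1e20, 1.0));                   // true: the 1 is dropped
        System.out.println(exactSum("1e20", "1").toPlainString()); // 100000000000000000001
    }
}
```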
I'm using a decimal floating point arithmetic with a precision of three decimal digits and (roughly) with the same features as the typical binary floating point arithmetic. Say you have 123.0 and 4.56. These numbers are represented by a mantissa (0<=m<1) and an exponent: 0.123*10^3 and 0.456*10^1, which I'll write as <.123e3> and <.456e1>. Adding two such numbers isn't immediately possible unless the exponents are equal, and that's why the addition proceeds according to:
<.123e3>      <.123e3>
<.456e1>  ->  <.004e3>
              --------
              <.127e3>
You see that the necessary alignment of the decimal digits according to a common exponent produces a loss of precision. In the extreme case, the entire addend could be shifted into nothingness. (Think of summing an infinite series where the terms get smaller and smaller but would still contribute considerably to the sum being computed.)
Other sources of imprecision result from differences between binary and decimal fractions, where an exact fraction in one base cannot be represented without error using the other one.
So, in short, addition and subtraction between numbers from rather different orders of magnitude are bound to cause a loss of precision.
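The "shifted into nothingness" effect shows up when many small terms are added to a large running total. The sketch below uses float so that the effect appears with small numbers; SummationOrder and its methods are illustrative names, and the constants are arbitrary.

```java
class SummationOrder {
    // Starts from the large value and adds 1 n times.
    static float bigFirst(int n) {
        float sum = 1e8f;
        for (int i = 0; i < n; i++) {
            sum += 1f; // each 1 is smaller than half the gap between floats here
        }
        return sum;
    }

    // Adds the small terms together first, then adds the large value once.
    static float smallFirst(int n) {
        float small = 0f;
        for (int i = 0; i < n; i++) {
            small += 1f;
        }
        return 1e8f + small;
    }

    public static void main(String[] args) {
        System.out.println(bigFirst(1000) == 1e8f);         // true: every 1 was lost
        System.out.println(smallFirst(1000) == 100001000f); // true: the terms survived
    }
}
```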
If you try to assign a literal that is too big or too small to a double, the compiler will give an error.
Try this:
double d1 = 1e-1000;
double d2 = 1e+1000;
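The compile-time check applies to literals only. When the same text is parsed at run time, the value overflows to infinity or underflows to zero instead. A minimal sketch; OutOfRange and parse are hypothetical names.

```java
class OutOfRange {
    // Parses a decimal string at run time; no compile-time range check applies.
    static double parse(String text) {
        return Double.parseDouble(text);
    }

    public static void main(String[] args) {
        System.out.println(Double.MAX_VALUE);   // largest finite double
        System.out.println(Double.MIN_VALUE);   // smallest positive double
        System.out.println(parse("1e1000"));    // Infinity
        System.out.println(parse("1e-1000"));   // 0.0
    }
}
```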

Why when I assign an int to a float is it not the same value?

When I assign from an int to a float I thought float allows more precision, so would not lose or change the value assigned, but what I am seeing is something quite different. What is going on here?
for(int i = 63000000; i < 63005515; i++) {
int a = i;
float f = 0;
f=a;
System.out.print(java.text.NumberFormat.getInstance().format(a) + " : " );
System.out.println(java.text.NumberFormat.getInstance().format(f));
}
some of the output :
...
63,005,504 : 63,005,504
63,005,505 : 63,005,504
63,005,506 : 63,005,504
63,005,507 : 63,005,508
63,005,508 : 63,005,508
Thanks!
A float has the same number of bits as an int -- 32 bits. But it allows for a greater range of values, far greater than the range of int values. However, the precision is fixed at 24 bits (23 "mantissa" bits, plus 1 implied 1 bit). At a value of about 63,000,000, the gap between adjacent float values is greater than 1. You can verify this with the Math.ulp method, which gives the difference between 2 consecutive values.
The code
System.out.println(Math.ulp(63000000.0f));
prints
4.0
You can use double values for a far greater (yet still limited) precision:
System.out.println(Math.ulp(63000000.0));
prints
7.450580596923828E-9
However, you can just use ints here, because your values, at about 63 million, are still well below the maximum possible int value, which is about 2 billion.
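To see which int values come through a conversion unchanged, the converted value can be compared with the original as a double, which holds every int exactly. This is a sketch; IntThroughFloat and its methods are made-up names.

```java
class IntThroughFloat {
    // True if converting the int to float does not change its value.
    static boolean survivesFloat(int i) {
        return (double) (float) i == (double) i;
    }

    // True if converting the int to double does not change its value.
    static boolean survivesDouble(int i) {
        return (long) (double) i == i;
    }

    public static void main(String[] args) {
        System.out.println(survivesFloat(16777216));  // 2^24: true
        System.out.println(survivesFloat(16777217));  // 2^24 + 1: false
        System.out.println(survivesFloat(63005505));  // false, as in the question's output
        System.out.println(survivesFloat(63005508));  // a multiple of 4: true
        System.out.println(survivesDouble(Integer.MAX_VALUE)); // true
    }
}
```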
A float in Java is an IEEE 754 floating point number. Even though it can represent values from ±1.40129846432481707e-45 to ±3.40282346638528860e+38, it has only 6 or 7 significant decimal digits.
A simple solution would be to use a double, which has at least 15 significant digits and can represent every int value exactly.
However, if accuracy is what you're looking for, stay away from native floating point representations and go for classes like BigInteger and BigDecimal.
No, they are not necessarily the same value. An int and a float are each 32 bits, but in a float some of those bits are used for the exponent, so there are fewer whole numbers which can be represented in a float than in an int. Depending on what your application is doing with these numbers you may not care about these differences, or maybe you want to look at using something like BigDecimal.
Floats don't allow more precision, floats allow wider range of numbers.
We've got 2^32 possible values for integers in range (approximately) -2 * 10^9 to 2 * 10^9. Floats are also 32bit, so the number of possible values is at most the same as for integers.
Out of these 32 bits, some are reserved for the mantissa and the rest are for the sign and the exponent. The resulting number represented by the float is then calculated (for simplicity I'll use base 10) as mantissa * 10^exponent.
Obviously, the maximum precision is limited by the number of bits assigned to the mantissa. Some integers can be represented exactly as ints but won't fit in the mantissa, so the least significant bits are thrown away, as in your case.
Float has a greater range of values but lower precision.
Int has a lower range of values but higher precision.
At around 63 million, consecutive int values are 1 apart, while consecutive float values are 4 apart.
So if you are dealing with values of this size but don't care about +/- 4, then use float. But if you need the last digit to be precise, you need to use int.
