This is a question about the IEEE 754 standard. I don't completely understand the mechanics behind it.
public class Gray {
    public static void main(String[] args) {
        System.out.println((float) (2000000000) == (float) (2000000000 + 50));
    }
}
Because a float can only hold about 7 to 8 significant digits. That is, it doesn't have enough bits to represent the number 2000000050 exactly, so it gets rounded to 2000000000.
Specifically speaking, a float consists of three parts:
the sign bit (1 bit)
the exponent (8 bits)
the significand (24 bits, but only 23 bits are stored since the MSB of the significand is always 1)
You can think of floating point as the computer's way of doing scientific notation, but in binary.
The precision is equal to log10(2 ^ number of significand bits). That means a float can hold log10(2 ^ 24) ≈ 7.22 significant decimal digits.
The number 2,000,000,050 has 9 significant digits. The calculation above tells us that a 24-bit significand can't hold that many significant digits. The reason 2,000,000,000 works is that it has only 1 significant digit, so it fits in the significand.
To solve the problem, you would use a double, since its 53-bit significand (52 bits stored) is more than enough to represent every possible 32-bit integer exactly.
Plainly said - 50 is a rounding error when a float has a value of two billion.
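A quick way to see this for yourself is to run both comparisons side by side; a minimal check (double represents every 32-bit int exactly, so it keeps the two values apart):
System.out.println((float) 2000000000 == (float) (2000000000 + 50));   // true: both round to the same float
System.out.println((double) 2000000000 == (double) (2000000000 + 50)); // false: doubles are exact here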
You might find this trick for finding the next representable value interesting.
float f = 2000000000;
// Reinterpret the float's bits as an int; for positive finite floats,
// adjacent bit patterns are adjacent representable values.
int binaryValue = Float.floatToRawIntBits(f);
int nextBinaryValue = binaryValue + 1;
float nextFloat = Float.intBitsToFloat(nextBinaryValue);
System.out.printf("The next float value after %.0f is %.0f%n", f, nextFloat);

// Same trick for double, using the 64-bit long representation.
double d = 2000000000;
long binaryValue2 = Double.doubleToRawLongBits(d);
long nextBinaryValue2 = binaryValue2 + 1;
double nextDouble = Double.longBitsToDouble(nextBinaryValue2);
System.out.printf("The next double value after %.7f is %.7f%n", d, nextDouble);
prints
The next float value after 2000000000 is 2000000128
The next double value after 2000000000.0000000 is 2000000000.0000002
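For what it's worth, the standard library can compute the same thing directly: Math.nextUp has both float and double overloads, so the bit-twiddling above is equivalent to:
System.out.printf("%.0f%n", Math.nextUp(2000000000f)); // 2000000128, matching the output above
System.out.printf("%.7f%n", Math.nextUp(2000000000d)); // 2000000000.0000002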
It might help you understand the situation if you consider the C++ program below. It displays the groups of successive integers that all get rounded to the same float value:
#include <iostream>
#include <iomanip>

int main()
{
    float prev = 0;
    int count = 0;
    double from = 0;
    for (double to = 2000000000 - 150; count < 10; to += 1.0)
    {
        float now = to;  // round the exact integer to the nearest float
        if (now != prev)
        {
            if (count)
                std::cout << std::setprecision(20) << from << ".." << to - 1
                          << " ==> " << prev << '\n';
            prev = now;
            from = to;
            ++count;
        }
    }
}
Output:
1999999850..1999999935 ==> 1999999872
1999999936..2000000064 ==> 2000000000
2000000065..2000000191 ==> 2000000128
2000000192..2000000320 ==> 2000000256
2000000321..2000000447 ==> 2000000384
2000000448..2000000576 ==> 2000000512
2000000577..2000000703 ==> 2000000640
2000000704..2000000832 ==> 2000000768
2000000833..2000000959 ==> 2000000896
This shows that float is not precise enough to distinguish the integers from 1999999850 to 1999999935: they all collapse to the single value 1999999872, and likewise for the other ranges. This is the tangible consequence of the limited storage space mentioned above.
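In Java you don't need the loop to get the group width: Math.ulp returns the spacing between adjacent floats at a given magnitude. A minimal check:
// The gap between adjacent floats near two billion is 2^(30-23) = 128,
// which matches the width of the groups in the output above.
System.out.println(Math.ulp(2.0e9f)); // 128.0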
Related
This code prints -46 when we cast a float to an int:
This is because information was lost during the conversion from type int to type float:
int i = 1_234_567_890;
float f = i;
System.out.println(i - (int)f); //print -46
System.out.println(i - f); //print 0.0
How can one know about loss of precision while using float and double?
How can we know the result when we analyze the code, before testing it?
The float representation is quite different from the integer representation.
The floating-point format is specified in JLS 4.2.3 (Floating-Point Types, Formats, and Values); the integer representation is described in JLS 4.2.1 (Integral Types and Values).
Floating-point representation uses a sign, an exponent and a mantissa; a float has:
Sign (1 bit)
Exponent (8 bits)
Mantissa (23 bits)
With this, the maximum value that a float can represent is 340,282,346,638,528,860,000,000,000,000,000,000,000 (about 3.4 × 10^38).
The maximum value that a long can represent is 9,223,372,036,854,775,807.
The maximum value that an int can represent is 2,147,483,647.
Therefore, during the conversion from float to long or int, the information lost can span many orders of magnitude.
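For example, an out-of-range float-to-long conversion is defined in Java to saturate at Long.MAX_VALUE (JLS 5.1.3), so the entire upper range of float collapses to a single long value:
// Float.MAX_VALUE is about 3.4e38, far beyond any long.
System.out.println((long) Float.MAX_VALUE); // 9223372036854775807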
During the int to float conversion you can lose up to 64 units of detail at this magnitude; it depends on the value.
For example:
int i = 1_073_741_888;
float f = i;
System.out.println(i - (int)f); //print 64
System.out.println(i - f); //print 0.0
And
int i = 1_073_742_016;
float f = i;
System.out.println(i - (int)f); //print -64
System.out.println(i - f); //print 0.0
The binary integer representations of these values are:
1_073_741_888: 1000000000000000000000001000000
1_073_742_016: 1000000000000000000000011000000
The least significant bits are rounded away when the value is greater than 16,777,216 (2^24), because the float's mantissa provides only 24 bits of precision (23 stored bits plus the implicit leading bit).
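You can query this gap directly with Math.ulp; a quick check of the claim above:
// Adjacent floats near 2^30 are 2^(30-23) = 128 apart,
// so an int at this magnitude can be off by up to 64 after conversion.
System.out.println(Math.ulp((float) 1_073_741_888)); // 128.0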
I am trying to find the number of trailing zeros in a number; here is my code:
public class TrailingZeroes {
    public static void bruteForce(int num) { // 25
        double fact = num; // 25
        int numOfZeroes = 0;
        for (int i = num - 1; i > 1; i--) {
            fact = fact * (i);
        }
        System.out.printf("Fact: %.0f\n", fact); // 15511210043330984000000000
        while (fact % 10 == 0) {
            fact = fact / 10;
            double factRem = fact % 10;
            System.out.printf("Fact/10: %.0f\n", fact); // 1551121004333098400000000
            System.out.printf("FactRem: %.0f\n", factRem); // 2?
            numOfZeroes++;
        }
        System.out.println("Number of zeroes " + numOfZeroes); // 1
    }
}
As you can see from the comments, fact % 10 stops returning 0 after a single division, so only one zero is counted even though the printed value ends in many zeros.
You are using a floating-point data type where you should not.
The float and double primitive types in Java are floating-point numbers, where the number is stored as a binary representation of a fraction and an exponent.
More specifically, a double-precision floating point value such as the double type is a 64-bit value, where:
1 bit denotes the sign (positive or negative).
11 bits for the exponent.
52 bits for the significant digits (the fraction, in binary).
These parts are combined to produce a double representation of a value.
For a detailed description of how floating point values are handled in Java, see the Section 4.2.3: Floating-Point Types, Formats, and Values of the Java Language Specification.
The byte, char, int and long types are fixed-point numbers, which are exact representations of numbers. Unlike fixed-point numbers, floating-point numbers will sometimes (it is safe to assume "most of the time") not be able to return an exact representation of a number. This is the reason why you end up with 11.399999999999 as the result of 5.6 + 5.8.
When requiring a value that is exact, such as 1.5 or 150.1005, you'll want to use one of the fixed-point types, which will be able to represent the number exactly.
As has been mentioned several times already, Java has a BigDecimal class which will handle very large numbers and very small numbers.
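As a sketch of that route, BigInteger (exact integer arithmetic) computes the factorial without any precision loss, so the digit test from the question works as intended:
import java.math.BigInteger;

public class TrailingZeroesExact {
    public static void main(String[] args) {
        int num = 25;
        // Compute num! exactly; BigInteger never rounds.
        BigInteger fact = BigInteger.ONE;
        for (int i = 2; i <= num; i++) {
            fact = fact.multiply(BigInteger.valueOf(i));
        }
        int numOfZeroes = 0;
        while (fact.mod(BigInteger.TEN).signum() == 0) {
            fact = fact.divide(BigInteger.TEN);
            numOfZeroes++;
        }
        System.out.println("Number of zeroes " + numOfZeroes); // 6 for 25!
    }
}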
public static void bruteForce(int num) { // 25
    double fact = num;
    // precision was lost on high i
    for (int i = num - 1; i > 1; i--)
        fact *= i;
    String str = String.format("%.0f", fact); // 15511210043330984000000000
    System.out.println(str);
    int i = str.length() - 1;
    int numOfZeroes = 0;
    while (str.charAt(i--) == '0')
        numOfZeroes++;
    System.out.println("Number of zeroes " + numOfZeroes); // 9
}
I would like to introduce some artificial precision loss into two numbers being compared to smooth out minor rounding errors so that I don't have to use the Math.abs(x - y) < eps idiom in every comparison involving x and y.
Essentially, I want something that behaves similarly to down-casting a double to a float and then up-casting it back to a double, except I want to also preserve very large and very small exponents and I want some control over the number of significand bits preserved.
Given the following function that produces the binary representation of the significand of a 64-bit IEEE 754 number:
public static String significand(double d) {
    int SIGN_WIDTH = 1;
    int EXP_WIDTH = 11;
    int SIGNIFICAND_WIDTH = 53;
    String s = String.format("%64s", Long.toBinaryString(Double.doubleToRawLongBits(d))).replace(' ', '0');
    // Skip the sign and exponent fields; return the 52 stored significand bits.
    return s.substring(SIGN_WIDTH + EXP_WIDTH, SIGN_WIDTH + EXP_WIDTH + SIGNIFICAND_WIDTH - 1);
}
I want a function reducePrecision(double x, int bits) that reduces the precision of the significand of a double such that:
significand(reducePrecision(x, bits)).substring(bits).equals(String.format("%0" + (52 - bits) + "d", 0))
In other words, every bit after the bits-most significant bit in the significand of reducePrecision(x, bits) should be 0, while the bits-most significant bits should reasonably approximate the bits-most significant bits in the significand of x.
Suppose x is the number you wish to reduce the precision of and bits is the number of significant bits you wish to retain.
When bits is sufficiently large and the order of magnitude of x is sufficiently close to 0, then x * (1L << (bits - Math.getExponent(x))) will scale x so that the bits that need to be removed will appear in the fractional component (after the radix point) while the bits that will be retained will appear in the integer component (before the radix point). You can then round this to remove the fractional component and then divide the rounded number by (1L << (bits - Math.getExponent(x))) to restore the order of magnitude of x, i.e.:
public static double reducePrecision(double x, int bits) {
    int exponent = bits - Math.getExponent(x);
    // The cast makes the division floating-point; long / long would truncate.
    return (double) Math.round(x * (1L << exponent)) / (1L << exponent);
}
However, (1L << exponent) will break down when Math.getExponent(x) > bits || Math.getExponent(x) < bits - 62. The solution is to use Math.pow(2, exponent) (or the fast pow2(exponent) implementation from this answer) to calculate a fractional, or a very large, power of 2, i.e.:
public static double reducePrecision(double x, int bits) {
    int exponent = bits - Math.getExponent(x);
    return Math.round(x * Math.pow(2, exponent)) * Math.pow(2, -exponent);
}
However, Math.pow(2, exponent) will break down as exponent approaches -1074 or +1023. The solution is to use Math.scalb(x, exponent) so that the power of 2 doesn't have to be explicitly calculated, i.e.:
public static double reducePrecision(double x, int bits) {
    int exponent = bits - Math.getExponent(x);
    // Cast the long to double so the scalb(double, int) overload is selected
    // (otherwise overload resolution picks scalb(float, int) and loses precision).
    return Math.scalb((double) Math.round(Math.scalb(x, exponent)), -exponent);
}
However, Math.round(y) returns a long so it does not preserve Infinity, NaN, and cases where Math.abs(x) > Long.MAX_VALUE / Math.pow(2, exponent). Furthermore, Math.round(y) always rounds ties to positive infinity (e.g. Math.round(0.5) == 1 && Math.round(1.5) == 2). The solution is to use Math.rint(y) to receive a double and preserve the unbiased IEEE 754 round-to-nearest, ties-to-even rule (e.g. Math.rint(0.5) == 0.0 && Math.rint(1.5) == 2.0), i.e.:
public static double reducePrecision(double x, int bits) {
    int exponent = bits - Math.getExponent(x);
    return Math.scalb(Math.rint(Math.scalb(x, exponent)), -exponent);
}
Finally, here is a unit test confirming our expectations:
public static String decompose(double d) {
    int SIGN_WIDTH = 1;
    int EXP_WIDTH = 11;
    int SIGNIFICAND_WIDTH = 53;
    String s = String.format("%64s", Long.toBinaryString(Double.doubleToRawLongBits(d))).replace(' ', '0');
    return s.substring(0, SIGN_WIDTH) + " "
            + s.substring(SIGN_WIDTH, SIGN_WIDTH + EXP_WIDTH) + " "
            + s.substring(SIGN_WIDTH + EXP_WIDTH, SIGN_WIDTH + EXP_WIDTH + SIGNIFICAND_WIDTH - 1);
}
// Assert here is org.junit.Assert.
public static void test() {
    // Use a fixed seed so the generated numbers are reproducible.
    java.util.Random r = new java.util.Random(0);

    // Generate a floating point number that makes use of its full 52 bits of significand precision.
    double a = r.nextDouble() * 100;
    System.out.println(decompose(a) + " " + a);
    Assert.assertFalse(decompose(a).split(" ")[2].substring(23).equals(String.format("%0" + (52 - 23) + "d", 0)));

    // Cast the double to a float to produce a "ground truth" of precision loss to compare against.
    double b = (float) a;
    System.out.println(decompose(b) + " " + b);
    Assert.assertTrue(decompose(b).split(" ")[2].substring(23).equals(String.format("%0" + (52 - 23) + "d", 0)));

    // 32-bit float has a 23 bit significand, so c's bit pattern should be identical to b's bit pattern.
    double c = reducePrecision(a, 23);
    System.out.println(decompose(c) + " " + c);
    Assert.assertTrue(b == c);

    // 23rd-most significant bit in c is 1, so rounding it to the 22nd-most significant bit requires breaking a tie.
    // Since the 22nd-most significant bit in c is 0, d will be rounded down so that its 22nd-most significant bit remains 0.
    double d = reducePrecision(c, 22);
    System.out.println(decompose(d) + " " + d);
    Assert.assertTrue(decompose(d).split(" ")[2].substring(22).equals(String.format("%0" + (52 - 22) + "d", 0)));
    Assert.assertTrue(decompose(c).split(" ")[2].charAt(22) == '1' && decompose(c).split(" ")[2].charAt(21) == '0');
    Assert.assertTrue(decompose(d).split(" ")[2].charAt(21) == '0');

    // 21st-most significant bit in d is 1, so rounding it to the 20th-most significant bit requires breaking a tie.
    // Since the 20th-most significant bit in d is 1, e will be rounded up so that its 20th-most significant bit becomes 0.
    double e = reducePrecision(c, 20);
    System.out.println(decompose(e) + " " + e);
    Assert.assertTrue(decompose(e).split(" ")[2].substring(20).equals(String.format("%0" + (52 - 20) + "d", 0)));
    Assert.assertTrue(decompose(d).split(" ")[2].charAt(20) == '1' && decompose(d).split(" ")[2].charAt(19) == '1');
    Assert.assertTrue(decompose(e).split(" ")[2].charAt(19) == '0');

    // Reduce the precision of a number close to the largest normal number.
    double f = reducePrecision(a * 0x1p+1017, 23);
    System.out.println(decompose(f) + " " + f);

    // Reduce the precision of a number close to the smallest normal number.
    double g = reducePrecision(a * 0x1p-1028, 23);
    System.out.println(decompose(g) + " " + g);

    // Reduce the precision of a number close to the smallest subnormal number.
    double h = reducePrecision(a * 0x1p-1051, 23);
    System.out.println(decompose(h) + " " + h);
}
And its output:
0 10000000101 0010010001100011000110011111011100100100111000111011 73.0967787376657
0 10000000101 0010010001100011000110100000000000000000000000000000 73.0967788696289
0 10000000101 0010010001100011000110100000000000000000000000000000 73.0967788696289
0 10000000101 0010010001100011000110000000000000000000000000000000 73.09677124023438
0 10000000101 0010010001100011001000000000000000000000000000000000 73.0968017578125
0 11111111110 0010010001100011000110100000000000000000000000000000 1.0266060746443803E308
0 00000000001 0010010001100011000110100000000000000000000000000000 2.541339559435826E-308
0 00000000000 0000000000000000000000100000000000000000000000000000 2.652494739E-315
I want to write/read millions of numbers from 0-15 from/to a file. Precision is not an issue; as long as the read values are within ±0.1 of the written ones, everything is fine.
Previous Ideas
My first, premature idea was to convert each float to a String like this and write them space-separated:
String.format("%.1f", value)
This, of course, is very inefficient, as it uses 4-5 bytes for every float.
Then I came to the idea of just writing the bytes of each float; that would be faster but would not sufficiently reduce the size.
ByteBuffer.allocate(4).putFloat(value).array()
Current issue
My current idea is to reduce each float to one byte. Looking at the range and precision I need, I would allocate the first 4 bits to the integer part before the decimal point (0-15) and the last 4 bits to the fractional part.
But how can I obtain these bits fast, since it has to be done millions of times?
Since your fractional part is a single digit, it can be implicit - i.e. 14.3 is converted to 143. To convert back, it is simply 143 / 10 to get the whole part and 143 % 10 to get the fraction. Here is what an implementation could look like.
public class Test {
    public static void main(String[] args) {
        float floatValue = 14.1f;
        Test test = new Test();
        byte b = test.toByteStorage(floatValue);
        System.out.println(test.fromByteStorage(b));
    }

    byte toByteStorage(float f) {
        // Round rather than truncate, so values like 0.7f (whose float
        // representation is slightly below 0.7) don't lose a tenth.
        return (byte) Math.round(f * 10);
    }

    String fromByteStorage(byte b) {
        int intValue = Byte.toUnsignedInt(b);
        return intValue / 10 + "." + intValue % 10;
    }
}
You might use something like this:
// conversion factor to map 15.0 to 0xF0
float m = (float) (0xF0 / 15.0);
for (float x = 0.0f; x <= 15.0f; x += 0.25f) {
    // obtain byte corresponding to float
    byte b = (byte) (m * x);
    // recover float from byte to check conversion;
    // mask off sign bit to convert signed to unsigned byte
    float r = (b & 0x0FF) / m;
    // what is the difference between the original float and the recovered float?
    float error = Math.abs(x - r);
    // show all values for testing
    String s = " b=0x" + Integer.toHexString(b & 0x0FF) +
               " r=" + Float.toString(r) +
               " err=" + Float.toString(error);
    System.out.println(s);
}
I wrote this little program to calculate pi.
While playing with the code and trying to find the most exact result, I found a point where my computer couldn't calculate a result. It could do 33554430 repetitions within seconds, but if I increased the for loop to 33554431 it didn't output anything.
So is 33554430 a special number?
public class CalculatePi {
    public static void main(String[] args) {
        float pi = 0;
        int sign = 1;
        for (float i = 1; i <= 33554430; i += 2) {
            pi += (sign * (1.0 / i));
            sign *= -1;
        }
        pi *= 4;
        System.out.println(pi);
    }
}
You are getting an endless loop because, in the comparison i <= 33554431, the int value 33554431 is promoted to a float; it is not exactly representable as a float, so it actually equals 33554432.
Then, when you try to increase the value by +2, the float just isn't precise enough to increment from the value 33554432. To illustrate my point:
float f = 33554432;
System.out.println(f); // 3.3554432E7
f += 2;
System.out.println(f); // still 3.3554432E7
So the value f doesn't increase due to its precision limitation. If you'd increase it by, say 11, you'd get 33554444 (and not 33554443) as that is the closest number expressible with that precision.
So is 33554430 a special number?
Sort of: not 33554430, but rather 33554432. The first "special number" for float is 16777217, which is the first positive integer that cannot be represented as a float (it equals 16777216 as a float). So, if you incremented your i variable by 1, that is the number you'd get stuck on. Since you are incrementing by 2, the number you get stuck on is 16777216 * 2 = 33554432.
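You can verify that 16777217 is the first integer a float cannot hold with a couple of print statements:
System.out.println((float) 16_777_216); // 1.6777216E7
System.out.println((float) 16_777_217); // 1.6777216E7 - rounds to the same float
System.out.println(16_777_216f == 16_777_217f); // true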
public class CalculatePi {
    public static void main(String[] args) {
        float pi = 0;
        int sign = 1;
        for (float i = 1; i <= 33554431; i += 2) {
            pi += (sign * (1.0 / i));
            sign *= -1;
            if (i > 33554410) System.out.println(i);
        }
        pi *= 4;
        System.out.println(pi);
        System.out.println((float) 33554431);
        System.out.println((float) 33554432);
        System.out.println((float) 33554434);
    }
}
You compare a float with an int in the for loop. When you convert 33554431 (an int value) to float, you get 3.3554432E7.
It's a matter of accuracy and precision. When you run:
System.out.println((float)33554431); // -> 3.3554432E7
System.out.println((float)33554432); // -> 3.3554432E7
System.out.println((float)33554434); // -> 3.3554432E7
All three print 3.3554432E7. This means that when you increase the float value 33554432 by 2, you get exactly the same value back, so your loop runs forever.
Your loop increments by 2 each time, and 2 * 33554432 = 67108864 = 2^26, so the loop gets stuck at half that: 33554432 = 2^25. Java stores a float in 32 bits as specified by IEEE 754: 1 sign bit, 8 exponent bits and 23 stored significand bits (24 effective). From 2^25 upward, adjacent floats are at least 4 apart, so i += 2 can no longer change i.
This version works for both. It's the float loop variable causing problems:
public static void main(String[] args) {
    float pi = 0;
    int sign = 1;
    for (int i = 1; i <= 33554430; i += 2) {
        pi += (sign * (1.0 / (float) i));
        sign *= -1;
    }
    pi *= 4;
    System.out.println(pi);
}
Probably this is due to this issue (JLS 4.2.3):
The finite nonzero values of any floating-point value set can all be expressed in the form s · m · 2^(e − N + 1), where s is +1 or −1, m is a positive integer less than 2^N, and e is an integer between Emin = −(2^(K−1) − 2) and Emax = 2^(K−1) − 1, inclusive, and where N and K are parameters that depend on the value set.
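Plugging in the float parameters as a worked example (not part of the quote): for float, N = 24 and K = 8, so m < 2^24 = 16,777,216, Emin = −(2^7 − 2) = −126 and Emax = 2^7 − 1 = 127. This is exactly why 16,777,217 is the first integer a float cannot hold, and why at 33,554,432 = 2^25 (where e = 25) the spacing between adjacent floats is 2^(e − N + 1) = 2^2 = 4, making i += 2 a no-op.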