This code prints -46 when we cast the float back to int.
This is because information was lost during the conversion from type int to type float.
int i = 1_234_567_890;
float f = i;
System.out.println(i - (int)f); //print -46
System.out.println(i - f); //print 0.0
How can one know about loss of precision while using float and double?
How can we know the result when we analyze the code before testing it?
The float representation is quite different from the integer representation.
Here you can see this specification (4.2.3. Floating-Point Types)
Here you can see Integer representation
The floating-point representation uses a sign, an exponent and a mantissa. A float has:
Sign (1 bit)
Exponent (8 bits)
Mantissa (23 bits)
With this, the maximum value that a float can represent is approximately 340,282,346,638,528,860,000,000,000,000,000,000,000
The maximum value that a long can represent is 9,223,372,036,854,775,807
The maximum value that an int can represent is 2,147,483,647
Therefore, during the conversion from float to long or int, the lost information can span many orders of magnitude.
During the int to float conversion you can lose up to 64 units of detail, depending on the value.
For example:
int i = 1_073_741_888;
float f = i;
System.out.println(i - (int)f); //print 64
System.out.println(i - f); //print 0.0
And
int i = 1_073_742_016;
float f = i;
System.out.println(i - (int)f); //print -64
System.out.println(i - f); //print 0.0
The binary integer representations of these values are:
1_073_741_888: 1000000000000000000000001000000
1_073_742_016: 1000000000000000000000011000000
The least significant bits are rounded away when the value is greater than 16777216 (2^24), because a float stores only 23 mantissa bits plus one implicit leading bit.
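One way to predict the loss before running the original code is to measure the round trip directly. The sketch below is illustrative only: the class name `FloatRoundTrip` and the helper `error` are hypothetical, while `Math.ulp` is the standard library call that reports the gap between adjacent float values at a given magnitude.

```java
final class FloatRoundTrip {
    // Difference introduced by converting an int to float and back.
    // The cast goes through long so that a float equal to 2^31 is not
    // clamped to Integer.MAX_VALUE on the way back.
    static long error(int i) {
        float f = i;          // widening conversion, may round to the nearest float
        return (long) f - i;  // 0 means the int survived the round trip
    }

    public static void main(String[] args) {
        int[] samples = {16_777_216, 16_777_217, 1_073_741_888, 1_073_742_016, 1_234_567_890};
        for (int i : samples) {
            // Math.ulp reports the spacing of floats around the converted value.
            System.out.printf("%d -> error %d, ulp %.0f%n", i, error(i), Math.ulp((float) i));
        }
    }
}
```

The rounding error is never more than half of the spacing reported by `Math.ulp`, which for values between 2^30 and 2^31 means at most 64.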
public class Hello {
public static void main(String[] args){
int myMinIntValue = Integer.MIN_VALUE;
int myMaxIntValue = Integer.MAX_VALUE;
System.out.println("The min Value Integer can hold is " + myMinIntValue);
System.out.println("The Maximum Value Integer can hold is " + myMaxIntValue);
System.out.println("The BUSTED MAX INT value is "+ (myMaxIntValue+1));
System.out.println("The BUSTED Min INT value is "+ (myMinIntValue-1));
System.out.print( "\n");
byte myMinByteValue = Byte.MIN_VALUE;
byte myMaxByteValue = Byte.MAX_VALUE;
System.out.println("The min Value Byte can hold is " + myMinByteValue);
System.out.println("The Maximum Value Byte can hold is " + myMaxByteValue);
System.out.println("The BUSTED MAX Byte value is "+ (myMaxByteValue+1));
System.out.println("The BUSTED Min Byte value is "+ (myMinByteValue-1));
System.out.print( "\n");
}
}
Returns
The min Value Integer can hold is -2147483648
The Maximum Value Integer can hold is 2147483647
The BUSTED MAX INT value is -2147483648
The BUSTED Min INT value is 2147483647
The min Value Byte can hold is -128
The Maximum Value Byte can hold is 127
The BUSTED MAX Byte value is 128
The BUSTED Min Byte value is -129
The min Value Short can hold is -32768
The Maximum Value Short can hold is 32767
The BUSTED MAX Short value is 32768
The BUSTED Min Short value is -32769
In the case of int, the maximum value is 2147483647, and adding 1 overflows and wraps around to a negative value. But in the case of byte, adding 1 just keeps counting up. Can someone explain why?
When you add two numbers together, the operands undergo binary numeric promotion.
The operands are first unboxed, if necessary; then the first matching rule applies:
If one operand is a double, the other is widened to double
If one operand is a float, the other is widened to a float
If one operand is a long, the other is widened to long
Otherwise, both are widened to int.
Since 1 is an int (because it's an int literal), adding it to a byte means that the last rule applies, so the byte is widened to an int; and the result of the addition of two ints is an int.
Because 128 is within the range of int, no overflow occurs.
Note that the rules go no narrower than int, so even adding two bytes will result in an int:
System.out.println(Byte.MAX_VALUE + (byte) 1); // 128
Note also that if you used a pre/post-increment:
byte myMaxByteValue = Byte.MAX_VALUE;
++myMaxByteValue;
then the value of myMaxByteValue would be -128. This is because pre-increment is equivalent to:
myMaxByteValue = (byte) (myMaxByteValue + 1);
i.e. there is an implicit cast back to the variable type.
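The difference between the promoted addition and the increment can be checked side by side. This is a minimal sketch; the class name `PromotionDemo` and its three helper methods are made up for illustration.

```java
final class PromotionDemo {
    // b is widened to int before the addition, so the result type is int.
    static int plusOne(byte b) {
        return b + 1;
    }

    // The increment operator narrows the int result back to byte implicitly.
    static byte increment(byte b) {
        b++;
        return b;
    }

    // Compound assignment carries the same implicit narrowing cast.
    static byte compoundAdd(byte b) {
        b += 1;
        return b;
    }

    public static void main(String[] args) {
        byte max = Byte.MAX_VALUE;
        // Note: "byte c = max + 1;" would not compile, because max + 1 is an int.
        System.out.println(plusOne(max));
        System.out.println(increment(max));
        System.out.println(compoundAdd(max));
    }
}
```

Only the first call yields 128; the other two wrap around to -128 because the result is cast back to byte.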
If you add an int and a byte, the result will be an int.
When adding two values of integer types smaller than int (like byte), the calculation still results in an int.
To resolve this, you can just cast the result back to a byte:
(byte)(myMaxByteValue+1)
I am trying to find the number of trailing zeros in a number; here is my code:
public class TrailingZeroes {
public static void bruteForce(int num){ //25
double fact = num; //25
int numOfZeroes = 0;
for(int i= num - 1; i > 1; i--){
fact = fact * (i);
}
System.out.printf("Fact: %.0f\n",fact); //15511210043330984000000000
while(fact % 10 == 0){
fact = fact / 10;
double factRem = fact % 10;
System.out.printf("Fact/10: %.0f\n",fact); //1551121004333098400000000
System.out.printf("FactRem: %.0f\n",factRem); // 2?
numOfZeroes++;
}
System.out.println("Number of zeroes "+ numOfZeroes); //1
}
}
As you can see the fact%10
You are using a floating-point data type where it is not appropriate.
The float and double primitive types in Java are floating point numbers, where the number is stored as a binary representation of a fraction and an exponent.
More specifically, a double-precision floating point value such as the double type is a 64-bit value, where:
1 bit denotes the sign (positive or negative).
11 bits for the exponent.
52 bits for the significant digits (the fractional part as a binary).
These parts are combined to produce a double representation of a value.
For a detailed description of how floating point values are handled in Java, see the Section 4.2.3: Floating-Point Types, Formats, and Values of the Java Language Specification.
The byte, char, int, long types are fixed-point numbers, which are exact representations of numbers. Unlike fixed-point numbers, floating point numbers will sometimes (safe to assume "most of the time") not be able to return an exact representation of a number. This is the reason why you end up with 11.399999999999 as the result of 5.6 + 5.8.
When requiring a value that is exact, such as 1.5 or 150.1005, you'll want to use one of the fixed-point types, which will be able to represent the number exactly.
As has been mentioned several times already, Java has a BigDecimal class which will handle very large numbers and very small numbers.
public static void bruteForce(int num) { //25
double fact = num;
// precision was lost on high i
for (int i = num - 1; i > 1; i--)
fact *= i;
String str = String.format("%.0f", fact); //15511210043330984000000000
System.out.println(str);
int i = str.length() - 1;
int numOfZeroes = 0;
while (str.charAt(i--) == '0')
numOfZeroes++;
System.out.println("Number of zeroes " + numOfZeroes); //9
}
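For comparison, the same count can be done with exact integer arithmetic using `java.math.BigInteger`, so no digits are lost while the factorial grows. This is only a sketch: the class name `ExactTrailingZeroes` and both helper methods are hypothetical, and it swaps the double for `BigInteger` rather than reproducing the code above.

```java
import java.math.BigInteger;

final class ExactTrailingZeroes {
    // Exact factorial; BigInteger grows as needed, so nothing is rounded.
    static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    // Counts how many times the value can be divided by 10 without remainder.
    static int trailingZeroes(BigInteger value) {
        int count = 0;
        // The signum check stops the loop for zero, which is divisible by 10 forever.
        while (value.signum() != 0 && value.mod(BigInteger.TEN).signum() == 0) {
            value = value.divide(BigInteger.TEN);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        BigInteger fact = factorial(25);
        System.out.println(fact);
        System.out.println("Number of zeroes " + trailingZeroes(fact));
    }
}
```

With exact arithmetic, 25! is 15511210043330985984000000, which ends in 6 zeros. Counting zeros in the formatted double also counts the digits that were lost to rounding.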
On the JVM, does division between two double values always yield the same exact result as doing the integer division?
With the following prerequisites:
Division without remainder
No division by zero
Both x and y actually hold integer values.
E.g. in the following code
double x = ...;
int resultInt = ...;
double y = x * resultInt;
double resultDouble = y / x; // double division
does resultDouble always equal resultInt or could there be some loss of precision?
There are two reasons that assigning an int to a double or a float might lose precision:
There are certain numbers that just can't be represented as a double/float, so they end up approximated
Large integer numbers may contain too much precision in the least-significant digits
So it depends on how big the int is, but in Java a double uses a 52-bit mantissa, so it can represent any 32-bit integer without loss of data.
There are good examples on these two sites:
1- Java's Floating-Point (Im)Precision
2- About Primitive Data Types In Java
also check:
Loss of precision - int -> float or double
Yes, if x and y are both in the int range, unless the division is -2147483648.0 / -1.0. In that case, the double result will match the result of mathematical integer division, but not of Java int division, which overflows.
If both division inputs are in int range, they are both exactly representable as double. If their ratio is an integer and the division is not -2147483648.0 / -1.0 the ratio is in the int range, and so exactly representable as double. That double is the closest value to the result of the division, and therefore must be the result of the double division.
This reasoning does not necessarily apply if x and y are integers outside the int range.
The statement is true with the exception of Integer.MIN_VALUE, where the result of division of two integers may fall out of integer range.
int n = Integer.MIN_VALUE;
int m = -1;
System.out.println(n / m); // -2147483648
System.out.println((int) ((double)n / (double)m)); // 2147483647
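The claim for int-range operands can be spot-checked with a few pairs. This is a rough sketch rather than a proof; the class `ExactDivisionCheck` and the method `doubleMatchesInteger` are hypothetical names, and the reference quotient is computed in long so that the MIN_VALUE / -1 case does not overflow.

```java
final class ExactDivisionCheck {
    // Assumes x != 0 and y % x == 0 (division without remainder).
    static boolean doubleMatchesInteger(int y, int x) {
        double viaDouble = (double) y / (double) x;
        long viaLong = (long) y / (long) x;  // long avoids the MIN_VALUE / -1 overflow
        return viaDouble == (double) viaLong;
    }

    public static void main(String[] args) {
        int[][] pairs = {
            {6, 3},
            {2_147_483_646, 2},
            {2_147_395_600, 46_340},               // 46340 squared
            {Integer.MAX_VALUE, Integer.MAX_VALUE},
            {Integer.MIN_VALUE, 2},
            {Integer.MIN_VALUE, -1},               // the case where int division overflows
        };
        for (int[] p : pairs) {
            System.out.println(p[0] + " / " + p[1] + " matches: " + doubleMatchesInteger(p[0], p[1]));
        }
    }
}
```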
This is an IEEE 754 standard question. I don't completely understand the mechanics behind it.
public class Gray {
public static void main(String[] args){
System.out.println( (float) (2000000000) == (float) (2000000000 + 50));
}
}
Because a float can only hold about 7 to 8 significant digits. That is, it doesn't have enough bits to represent the number 2000000050 exactly, so it gets rounded to 2000000000.
Specifically speaking, a float consists of three parts:
the sign bit (1 bit)
the exponent (8 bits)
the significand (24 bits, but only 23 bits are stored since the MSB of the significand is always 1)
You can think of floating point as the computer's way of doing scientific notation, but in binary.
The precision in decimal digits is equal to log10(2 ^ number of significand bits). That means a float can hold log10(2 ^ 24) = 7.225 significant digits.
The number 2,000,000,050 has 9 significant digits. The calculation above tells us that a 24-bit significand can't hold that many significant digits. 2,000,000,000 works because it has only 1 significant digit, so it fits in the significand.
To solve the problem, you would use a double since it has a 53-bit significand (52 bits stored), which is more than enough to represent every possible 32-bit number.
Plainly said - 50 is a rounding error when a float has a value of two-billion.
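The effect of switching to double can be seen by running the same comparison with both types. A small sketch, with the class name `FloatVsDouble` and its helpers chosen only for illustration:

```java
final class FloatVsDouble {
    // True when both ints collapse onto the same float value.
    static boolean sameAsFloat(int a, int b) {
        return (float) a == (float) b;
    }

    // True when both ints map to the same double value.
    static boolean sameAsDouble(int a, int b) {
        return (double) a == (double) b;
    }

    public static void main(String[] args) {
        int a = 2_000_000_000;
        int b = a + 50;
        System.out.println("float:  " + sameAsFloat(a, b));
        System.out.println("double: " + sameAsDouble(a, b));
    }
}
```

As floats the two values compare equal; as doubles they stay distinct, since every int is exactly representable as a double.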
You might find this trick to find the next representable value interesting.
float f = 2000000000;
int binaryValue = Float.floatToRawIntBits(f);
int nextBinaryValue = binaryValue+1;
float nextFloat = Float.intBitsToFloat(nextBinaryValue);
System.out.printf("The next float value after %.0f is %.0f%n", f, nextFloat);
double d = 2000000000;
long binaryValue2 = Double.doubleToRawLongBits(d);
long nextBinaryValue2 = binaryValue2+1;
double nextDouble = Double.longBitsToDouble(nextBinaryValue2);
System.out.printf("The next double value after %.7f is %.7f%n", d, nextDouble);
prints
The next float value after 2000000000 is 2000000128
The next double value after 2000000000.0000000 is 2000000000.0000002
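The standard library also exposes the neighbouring values directly through `Math.nextUp`, `Math.nextDown` (available since Java 8) and `Math.ulp`, which avoids manipulating the raw bits and also handles negative and special values. A brief sketch, where the class `NextRepresentable` and the helper `gapAbove` are hypothetical names:

```java
final class NextRepresentable {
    // Distance from f to the next larger representable float.
    static float gapAbove(float f) {
        return Math.nextUp(f) - f;
    }

    public static void main(String[] args) {
        float f = 2_000_000_000f;
        System.out.printf("below: %.0f%n", Math.nextDown(f));
        System.out.printf("above: %.0f%n", Math.nextUp(f));
        System.out.printf("gap:   %.0f (Math.ulp gives %.0f)%n", gapAbove(f), Math.ulp(f));
    }
}
```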
It might help you understand the situation if you consider a program (C++) as below. It displays the groups of successive integers that get rounded to the same float value:
#include <iostream>
#include <iomanip>
int main()
{
float prev = 0;
int count = 0;
double from;
for (double to = 2000000000 - 150; count < 10; to += 1.0)
{
float now = to;
if (now != prev)
{
if (count)
std::cout << std::setprecision(20) << from << ".." << to - 1 << " ==> " << prev << '\n';
prev = now;
from = to;
++count;
}
}
}
Output:
1999999850..1999999935 ==> 1999999872
1999999936..2000000064 ==> 2000000000
2000000065..2000000191 ==> 2000000128
2000000192..2000000320 ==> 2000000256
2000000321..2000000447 ==> 2000000384
2000000448..2000000576 ==> 2000000512
2000000577..2000000703 ==> 2000000640
2000000704..2000000832 ==> 2000000768
2000000833..2000000959 ==> 2000000896
This shows that a float is not precise enough to distinguish the integers from 1999999850 to 1999999935: all of them are recorded as 1999999872. The same happens for the other ranges. This is the tangible consequence of the limited storage space mentioned above.
private final static int L1_MAX_SCORE = 30;
private final static int L2_MAX_SCORE = 150;
public void UpdateLevel(int score) {
double progress;
//returns 0.0
progress = score / (L2_MAX_SCORE - L1_MAX_SCORE) * 100;
//works correctly.
progress = (double) score / (L2_MAX_SCORE - L1_MAX_SCORE) * 100;
Thanks.
Dividing an integer by an integer is defined to do integer division, just like in most (all?) other C-like languages.
By casting score to a double, you are dividing a floating-point value by an integer, and you get a floating-point value back.
Arithmetic operations in Java whose operands are all ints produce an int, so you're actually assigning an integer result to a double variable. You must cast at least one operand to double so that the calculation is performed in floating point and the fractional part is kept.
L2_MAX_SCORE - L1_MAX_SCORE = 150 - 30 = 120
Let's assume score is 30
Using floating point, the answer should be 30/120 = 0.25
With integer division, this gets truncated to 0, which is why it 'doesn't work'. It is actually working, but doing the wrong kind of division for your needs.
You can either
a) Cast either numerator or denominator as double
b) Rearrange the formula to be (score * 100) / (L2_MAX_SCORE - L1_MAX_SCORE)
With this, it would become (30 * 100) / 120 = 3000 / 120 = 25
It still does integer division, so if the score is 31 your answer (3100 / 120) is not 25.8333 but 25
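The three variants discussed above can be compared in one place. This is a sketch under assumptions: the class `ProgressDemo` and the method names are invented for illustration, and only the two constants come from the question.

```java
final class ProgressDemo {
    private static final int L1_MAX_SCORE = 30;
    private static final int L2_MAX_SCORE = 150;

    // All operands are ints, so the division truncates before the multiplication.
    static double truncated(int score) {
        return score / (L2_MAX_SCORE - L1_MAX_SCORE) * 100;
    }

    // Casting one operand to double makes the whole expression floating point.
    static double exact(int score) {
        return (double) score / (L2_MAX_SCORE - L1_MAX_SCORE) * 100;
    }

    // Multiplying first keeps whole percentage points, but still truncates the fraction.
    static int rearranged(int score) {
        return (score * 100) / (L2_MAX_SCORE - L1_MAX_SCORE);
    }

    public static void main(String[] args) {
        for (int score : new int[] {30, 31}) {
            System.out.println(score + ": " + truncated(score) + ", " + exact(score) + ", " + rearranged(score));
        }
    }
}
```

For a score of 30 the three methods give 0.0, 25.0 and 25; for 31 the rearranged integer version still gives 25 while the double version keeps the fraction.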