I'm working on a Java class, and it's driving me crazy because this expression evaluates to zero. I need it to evaluate to a double and then be rounded down to the nearest int, so that days is a whole number of days. When I run it through Java it evaluates to 0, but when I run it through my calculator it gives the correct value. I would love a fix and an explanation of why what I already have isn't working.
public int getEventDays(){
    //variables
    double daysCalc;
    int days;
    //logic
    if (getStatus().equals("filling")){
        //this is indented less to fit everything on one line, its not this way in
        //the fractions are for unit conversion
        daysCalc= Math.floor(((capacity-storage)/(inflow-outflow))*(43560)*(1/3600)*(1/24));
        days = (int)daysCalc;
    }
    else if (getStatus().equals("emptying")){
        //this is indented less to fit everything
        //the fractions are for unit conversion
        daysCalc=Math.floor(((storage-0)/(outflow-inflow))*(43560)*(1/3600)*(1/24));
        days = (int)daysCalc;
    }
    else{
        days = -1;
    }
    return days;
}
Change your code to this:
daysCalc = Math.floor(((storage-0)/(outflow-inflow))*(43560)*(1.0/3600)*(1.0/24));
Explanation:
The sub-expressions 1/3600 and 1/24 are integer divisions, so each one is truncated to 0, and multiplying by 0 makes the whole right-hand side 0.
Using 1.0 instead of 1 turns each of them into a floating-point division, which gives the unrounded double value of 1/3600 (and likewise 1/24).
Your problem is connected with the order of operations within your expression. The parentheses around 1/3600 and 1/24 cause these expressions to be evaluated first - and since each of these divisions has an expression of integer type on either side of the division, it's treated as an integer division. In other words, 1/3600 and 1/24 are both evaluated as integers, to give a result of zero. This means that your arithmetic includes a couple of multiplications by zero, which is why your result is zero.
The simplest fix is to understand that multiplying by the reciprocal of some number is the same as dividing by that number. In other words, you could simplify the calculation to
daysCalc = Math.floor( storage / ( outflow - inflow ) * 43560 / 3600 / 24 );
which will give the correct result, provided storage, outflow and inflow are not all integers.
On the other hand, if storage, outflow and inflow are all integers, then you'll need to make sure that the first division is also not treated as an integer division. You could do this by writing
daysCalc = Math.floor((double) storage / ( outflow - inflow ) * 43560 / 3600 / 24 );
which forces the division to be done with floating point arithmetic; and thereafter, each one of the divisions is done in floating point.
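The difference between the two kinds of division can be checked in isolation. The following is a minimal sketch, not the asker's actual class: the helper daysToEmpty and its sample arguments are hypothetical, and it assumes storage, outflow and inflow are doubles.

```java
class IntegerDivisionDemo {
    // Hypothetical helper mirroring the "emptying" branch of the question.
    // Because storage is a double, every step below is done in floating point.
    static int daysToEmpty(double storage, double outflow, double inflow) {
        double daysCalc = Math.floor(storage / (outflow - inflow) * 43560 / 3600 / 24);
        return (int) daysCalc;
    }

    public static void main(String[] args) {
        System.out.println(1 / 3600);    // int / int: integer division
        System.out.println(1.0 / 3600);  // double / int: floating-point division
        System.out.println(daysToEmpty(1000.0, 50.0, 10.0));
    }
}
```

Dividing by 3600 and 24 instead of multiplying by (1/3600) and (1/24) removes the two integer sub-expressions that were collapsing to zero.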
I need to round a small negative number to return 0.0. However, the value I'm getting is minus zero. The following code demonstrates the problem:
double value = -0.000000001;
double roundedValue = Double.valueOf(String.format(Locale.US, "%.4f", value));
System.out.println(roundedValue); // I need the roundedValue to be equal 0.0 (not -0.0)
Is there a way to fix this?
You can explicitly handle the negative zero case:
roundedValue = (roundedValue == 0.0 && 1 / roundedValue < 0) ? 0 : roundedValue;
(1 / roundedValue < 0 to check for negative zero, from this answer)
Nit: use Double.parseDouble rather than valueOf, to avoid the unnecessary boxing and immediate unboxing.
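As a small sketch of the negative-zero check described in this answer: the helper names isNegativeZero and normalizeZero are made up for illustration, and the formatting call is the one from the question.

```java
import java.util.Locale;

class NegativeZeroDemo {
    // True only for -0.0: it compares equal to 0.0, but 1 / -0.0 is -Infinity.
    static boolean isNegativeZero(double x) {
        return x == 0.0 && 1 / x < 0;
    }

    // Replaces -0.0 with 0.0 and leaves every other value untouched.
    static double normalizeZero(double x) {
        return isNegativeZero(x) ? 0.0 : x;
    }

    public static void main(String[] args) {
        double rounded = Double.parseDouble(String.format(Locale.US, "%.4f", -0.000000001));
        System.out.println(rounded);                 // still carries the minus sign
        System.out.println(normalizeZero(rounded));  // sign removed
    }
}
```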
First, you have to realize that double values are semantically different from integers. They are imprecise: most decimal fractions cannot be stored exactly, so a double usually carries an ever-so-slight error. The error is small, but in an unfortunate case it can land on the other side of zero, which is technically correct but not what you want. This understanding is essential; if you need digit-exact arithmetic, you shouldn't be using doubles anyway, but integers/longs. The IEEE specification for double values also defines "negative zero", "NaN", "infinity" and so on, so technically the software is correct, but you are not using it the right way for what you want to achieve.
Second, like other people already mentioned, never use string formatting for rounding. If you need 4 decimal places, a much better way is to multiply the number by 10000, take floor/round of it and divide it by 10000 again. However, due to the facts mentioned above, you might again get some small decimal off (such as the 15th decimal digit).
On the other hand, if you just want to get rid of "rounding noise" which is sufficiently close to zero, you can also use this approach which is very robust:
if (Math.abs(x) < 0.000001d) x = 0d;
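The multiply, round and divide approach from this answer could look roughly like the sketch below. The method name roundTo4 is hypothetical, and Math.round is used here as one of the "floor/round" options the answer mentions.

```java
class RoundToFourPlaces {
    // Math.round returns a long, so a tiny negative input becomes the integer 0,
    // and 0 / 10000.0 is positive zero rather than -0.0.
    static double roundTo4(double value) {
        return Math.round(value * 10000) / 10000.0;
    }

    public static void main(String[] args) {
        System.out.println(roundTo4(-0.000000001));
        System.out.println(roundTo4(1.23456));
    }
}
```

Because the intermediate result is an integer type, the sign of a vanishingly small negative input is lost on the way, which is exactly what the question asks for. Inputs large enough to overflow a long after scaling would need separate handling.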
More convenient would be:
double roundedValue = value < 0 ? (Math.ceil(value) == -0 ? 0 : Math.ceil(value)) : value;
In this case negative values are preserved but rounded up to a whole number: for example, -1.5 gives -1.0, while -0.00000001 gives 0.0.
This method returns 'true'. Why?
public static boolean f() {
    double val = Double.MAX_VALUE/10;
    double save = val;
    for (int i = 1; i < 1000; i++) {
        val -= i;
    }
    return (val == save);
}
You're subtracting quite a small value (less than 1000) from a huge value. The small value is so much smaller than the large value that the closest representable value to the theoretical result is still the original value.
Basically it's a result of the way floating point numbers work.
Imagine we had some decimal floating point type (just for simplicity) which only stored 5 significant digits in the mantissa, and an exponent in the range 0 to 1000.
Your example is like writing 10^999 - 1000... think about what the result of that would be, when rounded to 5 significant digits. Yes, the exact result is 99999.....9000 (with 999 digits) but if you can only represent values with 5 significant digits, the closest result is 10^999 again.
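The imaginary five-digit decimal type can be imitated with BigDecimal and a MathContext of precision 5. This is only an illustration of the analogy, not of how double is implemented; the method name subtractWithFiveDigits is made up.

```java
import java.math.BigDecimal;
import java.math.MathContext;

class FiveDigitDecimalDemo {
    // Subtracts exactly, then rounds the result to 5 significant decimal digits.
    static BigDecimal subtractWithFiveDigits(BigDecimal a, BigDecimal b) {
        return a.subtract(b, new MathContext(5));
    }

    public static void main(String[] args) {
        BigDecimal big = new BigDecimal("1E999");
        BigDecimal small = new BigDecimal(1000);
        // With only 5 significant digits, the subtraction is absorbed by rounding.
        System.out.println(subtractWithFiveDigits(big, small).compareTo(big) == 0);
        // With unlimited precision the result really is smaller.
        System.out.println(big.subtract(small).compareTo(big) < 0);
    }
}
```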
When you set val to Double.MAX_VALUE/10, it is set to a value approximately equal to 1.7976931348623158 * 10^307. Subtracting values like 1000 from that would require more precision than the double representation has, so it basically leaves val unchanged.
Depending on your needs, you may use BigDecimal instead of double.
Double.MAX_VALUE is so big that a double cannot tell the difference between it and Double.MAX_VALUE-1000.
If you subtract a number much smaller than 1.9958403095347198E292 (the gap between Double.MAX_VALUE and the next smaller double) from Double.MAX_VALUE, the result is still Double.MAX_VALUE.
System.out.println(
new BigDecimal(Double.MAX_VALUE).equals( new BigDecimal(
Double.MAX_VALUE - 2.E291) )
);
System.out.println(
new BigDecimal(Double.MAX_VALUE).equals( new BigDecimal(
Double.MAX_VALUE - 2.E292) )
);
Output:
true
false
A double does not have enough precision to perform the calculation you are attempting. So the result is the same as the initial value.
It is nothing to do with the == operator.
val is a big number and when subtracting 1 (or even 1000) from it, the result cannot be expressed properly as a double value. The representation of this number x and x-1 is the same, because double only has a limited number of bits to represent an unlimited number of numbers.
Double.MAX_VALUE is a huge number compared to 1 or 1000. Double.MAX_VALUE-1 evaluates to the same double as Double.MAX_VALUE. So your code effectively does nothing when subtracting 1 or 1000 from Double.MAX_VALUE/10.
Always remember that:
doubles and floats are just approximations of real numbers; they are rationals that are not evenly distributed among the reals
you should be very careful with arithmetic between doubles or floats of very different magnitudes (there are many other rules like this...)
in general, never use doubles or floats if you need arbitrary precision
Because double is a floating point numeric type, which is a way of approximating numeric values. Floating point representations encode numbers so that we can store numbers much larger or smaller than we normally could. However, not all numbers can be represented in the given space, so multiple numbers get rounded to the same floating point value.
As a simplified example, we might want to be able to store values ranging from -1000 to 1000 in some small amount of space where we would normally only be able to store -10 to 10. So we could round all values to the nearest hundred and store them in the small space: -1000 gets encoded as -10, -900 gets encoded as -9, 1000 gets encoded as 10. But what if we want to store -999? The closest value we can encode is -1000, so we have to encode -999 as the same value as -1000: -10.
In reality, floating point schemes are much more complicated than the example above, but the concept is similar. Floating point representations of numbers can only represent some of all the possible numbers, so when we have a number that can't be represented as part of the scheme, we have to round it to the closest representable value.
In your code, all values within 1000 of Double.MAX_VALUE / 10 automatically get rounded to Double.MAX_VALUE / 10, which is why the computer thinks (Double.MAX_VALUE / 10) - 1000 == Double.MAX_VALUE / 10.
The result of a floating point calculation is the closest representable value to the exact answer. This program:
public class Test {
    public static void main(String[] args) throws Exception {
        double val = Double.MAX_VALUE/10;
        System.out.println(val);
        System.out.println(Math.nextAfter(val, 0));
    }
}
prints:
1.7976931348623158E307
1.7976931348623155E307
The first of these numbers is your original val. The second is the largest double that is less than it.
When you subtract 1000 from 1.7976931348623158E307, the exact answer is between those two numbers, but very, very much closer to 1.7976931348623158E307 than to 1.7976931348623155E307, so the result will be rounded to 1.7976931348623158E307, leaving val unchanged.
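The size of the gap between neighbouring doubles can also be read directly with Math.ulp. The sketch below uses a made-up helper, absorbs, to show when a subtraction is swallowed by rounding.

```java
class UlpDemo {
    // True when subtracting small from big rounds straight back to big.
    static boolean absorbs(double big, double small) {
        return big - small == big;
    }

    public static void main(String[] args) {
        double val = Double.MAX_VALUE / 10;
        double gap = Math.ulp(val);  // distance between val and its neighbouring doubles
        System.out.println(gap);
        System.out.println(absorbs(val, 1000));  // 1000 is far smaller than the gap
        System.out.println(absorbs(val, gap));   // a full gap moves to the next double down
    }
}
```

Anything well below half of that gap is lost when subtracted from val, which is why the loop in the question never changes it.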
I am trying to compute a percentage, but the result is wrong. I have this expression:
Uper=(Upcount/total)*100;
where Uper is a float while Upcount and total are integers. I am getting the result Uper=0.
An int divided by an int will result in an int. That could be 0. Multiply 0 * 100, convert to float, and the result is still 0.0. You need at least one of the operands to be floating point before the division will give a floating point result.
Try:
Uper = ((float)Upcount/(float)total)*100.0f;
The extra (float) is me being paranoid that this line might be modified in the future without fully understanding the floating-point requirement. The 100.0f is to be explicit about what you want -- a floating point result. (In Java a plain 100.0 is a double, and the product would not compile when assigned to a float.)
Perhaps changing Upcount or total to float would make more sense.
The division of two integers always results in an integer, which is 0 in your case.
To solve this, use the following code:
Uper = (float) Upcount / total * 100;
Casting at least one operand to double or float will get the result you want.
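A short sketch of both versions side by side, assuming Java and int inputs; the method name percent and the sample values are hypothetical.

```java
class PercentageDemo {
    // Casting the first operand makes the division a float division.
    static float percent(int upCount, int total) {
        return (float) upCount / total * 100;
    }

    public static void main(String[] args) {
        int upCount = 1;
        int total = 4;
        System.out.println((upCount / total) * 100);  // integer division happens first
        System.out.println(percent(upCount, total));  // float division
    }
}
```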
if I have something like:
long x = 1/2;
shouldn't this be rounded up to 1? When I print it on the screen it says 0.
It's doing integer division, which truncates everything to the right of the decimal point.
Integer division has its roots in number theory. When you do 1/2 you are asking how many times 2 goes into 1. The answer is zero times, so the equation becomes 0*2 + 1 = 1, where 0 is the quotient (what you get from 1/2) and 1 is the remainder (what you get from 1%2).
It is right to point out that % is not a true modulus in the mathematical sense but always a remainder from division. There is a difference when you are dealing with negative integers.
Hope that helps.
This statement first declares a long called x and then assigns it the value of the right-hand side expression. The right-hand side expression is 1/2, and since 1 and 2 are both integers this is interpreted as integer division. With integer division the result is always an integer, so something along the lines of 5/3 will return 1, as only one three fits in a five. So with 1/2, how many 2s can fit into 1? 0.
In some languages this can produce surprising output if you write something like double x = 1/2. You might expect 0.5 in this case, but the integer expression on the right is often evaluated first, and only then is the result converted to a double, giving the value 0.0.
It is important to note that when doing this kind of type conversion, it will never round the result. So if you do the opposite:
long x = (long)(1.0/2.0);
then while (1.0/2.0) will evaluate to 0.5, the (long) cast will force this to be truncated to 0. Even if I had long x = (long)(0.9), the result will still be 0. It simply truncates after the decimal point.
It can't round because the value is never in a state to be rounded: the expression "1/2" is never 0.5 before it is assigned to the long.
With 1.0/2.0, on the other hand, the right-hand side is 0.5 before the assignment, so there is something to round, unless floating-point error leaves you with 0.499999999999997...
This question was answered before on this site. You are doing an integer division; if you want to get 0.5, use:
double x = (double)1/2;
and you will get the value of 0.5 .
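The difference between truncating and rounding can be seen in a few lines. This is a minimal sketch; the helper names truncate and nearest are made up, and Math.round is brought in here as the standard-library way to round to the nearest long.

```java
class TruncateVersusRound {
    // A cast to long drops the fractional part (rounds toward zero).
    static long truncate(double x) {
        return (long) x;
    }

    // Math.round rounds to the nearest long, with ties going up.
    static long nearest(double x) {
        return Math.round(x);
    }

    public static void main(String[] args) {
        long x = 1 / 2;                          // integer division, no fraction ever exists
        System.out.println(x);
        System.out.println(truncate(0.9));       // cast truncates
        System.out.println(nearest(0.9));        // explicit rounding
        System.out.println(nearest(1.0 / 2.0));  // 0.5 is a tie
        System.out.println((double) 1 / 2);      // floating-point division
    }
}
```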
There are lots of different rounding conventions, the most common being rounding towards +inf, rounding towards -inf and rounding towards zero. Lots of people assume there's one right way, but they all have different ideas about what that one way should be ;-)
There is no intermediate non-integer result for integer division, but of course the division is done deterministically, and one particular rounding convention will always be followed for a particular platform and compiler.
With Visual C++ I get 5/2 = 2 and -5/2 = -2, rounding towards zero.
The rounding in C, C++ and Java is commonly called "truncation" - meaning drop off the unwanted bits. But this can be misleading. Using 4 bit 2s complement binary, doing what truncation implies gives...
5/2 = 0101/0010 = 0010.1 --> 0010 = 2
-5/2 = 1011/0010 = 1101.1 --> 1101 = -3
Which is rounding towards -infinity, which is what Python does (or at least what it did in Python 2.5).
Truncation would be the right word if we used a sign-magnitude representation, but twos complement has been the de-facto standard for decades.
In C and C++, while it's normally called truncation, older standards (C89, C++98/03) left this detail to the implementation for negative operands - an excuse for allowing the compiler to use the simplest and fastest method for the platform (what the processor's division instruction naturally does). C99 and C++11 require rounding towards zero. It's only an issue if you have negative numbers though - I've yet to see any language or implementation that would give 5/2 = 3.
The Java Language Specification says integer division rounds towards zero. The Python manual specifies "floor" division, which is a common term for rounding to -infinity.
EDIT
An extra note - by definition, if a/b = c remainder d, then a = (b*c)+d. For this to hold, you have to choose a remainder to suit your rounding convention.
People tend to assume that remainders and modulos are the same, but WRT signed values, they can be different - depending on the rounding rules. Modulo values are by definition never negative, but remainders can be negative.
I suspect the Python round-towards-negative-infinity rule is intended to ensure that the single % operator is valid both as a remainder and as a modulo. In C and C++, what % means (remainder or modulo) was (yes, you guessed it) implementation-defined in those older standards; since C99 and C++11 it is a remainder that takes the sign of the dividend.
Ada actually has two separate operators - mod and rem - with division required to round towards zero, so that mod and rem do give different results.
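Java offers both conventions side by side: the / and % operators round towards zero, while Math.floorDiv and Math.floorMod (available since Java 8) round towards negative infinity. The sketch below checks the identity a = (b*c)+d for both pairs; the helper identityHolds is made up for illustration.

```java
class DivisionRoundingDemo {
    // Checks a == b*c + d for both the truncating and the flooring pair.
    static boolean identityHolds(int a, int b) {
        boolean truncating = a == b * (a / b) + (a % b);
        boolean flooring = a == b * Math.floorDiv(a, b) + Math.floorMod(a, b);
        return truncating && flooring;
    }

    public static void main(String[] args) {
        System.out.println(-5 / 2);                 // rounds towards zero
        System.out.println(-5 % 2);                 // remainder, sign of the dividend
        System.out.println(Math.floorDiv(-5, 2));   // rounds towards -infinity
        System.out.println(Math.floorMod(-5, 2));   // modulo, sign of the divisor
        System.out.println(identityHolds(-5, 2));
    }
}
```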
Can every possible value of a float variable be represented exactly in a double variable?
In other words, for all possible values X will the following be successful:
float f1 = X;
double d = f1;
float f2 = (float)d;
if(f1 == f2)
System.out.println("Success!");
else
System.out.println("Failure!");
My suspicion is that there is no exception, or if there is it is only for an edge case (like +/- infinity or NaN).
Edit: Original wording of question was confusing (stated two ways, one which would be answered "no" the other would be answered "yes" for the same answer). I've reworded it so that it matches the question title.
Yes.
Proof by enumeration of all possible cases:
public class TestDoubleFloat {
    public static void main(String[] args) {
        // Walk through every 32-bit pattern and reinterpret it as a float.
        for (long i = Integer.MIN_VALUE; i <= Integer.MAX_VALUE; i++) {
            float f1 = Float.intBitsToFloat((int) i);
            double d = (double) f1;
            float f2 = (float) d;
            if (f1 != f2) {
                if (Float.isNaN(f1) && Float.isNaN(f2)) {
                    continue; // ok, NaN
                }
                throw new AssertionError("oops: " + f1 + " != " + f2);
            }
        }
    }
}
finishes in 12 seconds on my machine. 32 bits are small.
In theory, there is no such value, so "yes", every float should be representable as a double. Converting from a float to a double amounts to re-biasing the exponent into the wider field and padding the fraction with zero bits -- they are stored using the same format, just with different sized fields.
Yes, floats are a subset of doubles. Both floats and doubles have the form (sign * a * 2^b). The difference between floats and doubles is the number of bits in a & b. Since doubles have more bits available, assigning a float value to a double effectively means inserting extra 0 bits.
As everyone has already said, "no". But that's actually a "yes" to the question itself, i.e. every float can be exactly expressed as a double. Confusing. :)
If I'm reading the language specification correctly (and as everyone else is confirming), there is no such value.
That is, each claims to hold only IEEE 754 standard values, so casts between the two should incur no change except in the amount of memory used.
(clarification: There would be no change as long as the value was small enough to be held in a float; obviously if the value was too many bits to be held in a float to begin with, casting from double to float would result in a loss of precision.)
@KenG: This code:
float a = 0.1F
println "a=${a}"
double d = a
println "d=${d}"
fails not because 0.1f can't be exactly represented. The question was "is there a float value that cannot be represented as a double", which this code doesn't prove. Although 0.1f can't be stored exactly, the value that a is given (which isn't 0.1f exactly) can be stored as a double (which also won't be 0.1f exactly). Assuming an Intel FPU, the bit pattern for a is:
0 01111011 10011001100110011001101
and the bit pattern for d is:
0 01111111011 100110011001100110011010 (followed by lots more zeros)
which has the same sign, exponent (-4 in both cases) and the same fractional part (separated by spaces above). The difference in the output is due to the position of the second non-zero digit in the number (the first is the 1 after the point) which can only be represented with a double. The code that outputs the string format stores intermediate values in memory and is specific to floats and doubles (i.e. there is a function double-to-string and another float-to-string). If the to-string function was optimised to use the FPU stack to store the intermediate results of the to-string process, the output would be the same for float and double since the FPU uses the same, larger format (80bits) for both float and double.
There are no float values that can't be stored identically in a double, i.e. the set of float values is a subset of the set of double values.
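The 0.1f case can be reproduced in plain Java. In this sketch the helper survivesRoundTrip is hypothetical; it compares bit patterns with Float.floatToIntBits, which collapses all NaNs to one canonical pattern, so NaN counts as surviving too.

```java
class FloatWideningDemo {
    // Widens to double, narrows back, and compares the resulting bit patterns.
    static boolean survivesRoundTrip(float f) {
        double d = f;
        float back = (float) d;
        return Float.floatToIntBits(f) == Float.floatToIntBits(back);
    }

    public static void main(String[] args) {
        float a = 0.1f;
        double d = a;
        System.out.println(a);         // shortest decimal that identifies the float
        System.out.println(d);         // same value, printed with double precision
        System.out.println(d == 0.1);  // the double nearest 0.1 is a different number
        System.out.println(survivesRoundTrip(a));
    }
}
```

The two printed strings differ only because float-to-string and double-to-string emit different numbers of digits; the stored value is the same.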
Snark: NaNs will compare differently after (or indeed before) conversion.
This does not, however, invalidate the answers already given.
I took the code you listed and decided to try it in C++ since I thought it might execute a little faster and it is significantly easier to do unsafe casting. :-D
I found out that for valid numbers, the conversion works and you get the exact bitwise representation after the cast. However, for non-numbers, e.g. 1.#QNAN0, etc., the result will use a simplified representation of the non-number rather than the exact bits of the source. For example:
**** FAILURE **** 2140188725 | 1.#QNAN0 -- 0xa0000000 0x7ffa1606
I cast an unsigned int to float then to double and back to float. The number 2140188725 (0x7F90B035) results in a NAN and converting to double and back is still a NAN but not the exact same NAN.
Here is the simple C++ code:
typedef unsigned int uint;
for (uint i = 0; i < 0xFFFFFFFF; ++i)
{
float f1 = *(float *)&i;
double d = f1;
float f2 = (float)d;
if(f1 != f2)
printf("**** FAILURE **** %u | %f -- 0x%08x 0x%08x\n", i, f1, f1, f2);
if ((i % 1000000) == 0)
printf("Iteration: %d\n", i);
}
The answer to the first question is yes; the answer to the 'in other words', however, is no, because a NaN never compares equal to anything, including itself. If you change the test in the code to if (Float.compare(f1, f2) == 0), which treats NaN as equal to NaN, the answer to the second question becomes yes -- it will print 'Success' for all float values.
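How the comparison operators treat NaN and signed zeros can be checked directly. This is a sketch only: Float.compare is used here as one way to get a comparison in which NaN equals itself, and the helper name sameValue is made up.

```java
class NaNComparisonDemo {
    // Float.compare considers NaN equal to NaN, and treats 0.0f and -0.0f as different.
    static boolean sameValue(float f1, float f2) {
        return Float.compare(f1, f2) == 0;
    }

    public static void main(String[] args) {
        float nan = Float.NaN;
        float roundTripped = (float) (double) nan;
        System.out.println(nan == roundTripped);           // == is never true for NaN
        System.out.println(nan != roundTripped);           // != is always true for NaN
        System.out.println(sameValue(nan, roundTripped));  // compare() treats them as equal
        System.out.println(0.0f == -0.0f);                 // == ignores the sign of zero
        System.out.println(sameValue(0.0f, -0.0f));        // compare() does not
    }
}
```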
In theory every normal single can have the exponent and mantissa padded to create a double and then remove the padding and you return to the original single.
Going from theory to reality is when you will have problems. I don't know whether you were interested in theory or implementation. If it is implementation then you can rapidly get into trouble.
IEEE is a horrible format, my understanding it was intentionally designed to be so tough that nobody could meet it and allow the market to catch up to intel (this was a while back) allowing for more competition. If that is true it failed, either way we are stuck with this dreadful spec. Something like the TI format is far superior for the real world in so many ways. I have no connection to either company or any of these formats.
Thanks to this spec there are very few if any fpus that actually meet it (in hardware or even in hardware plus the operating system), and those that do often fail on the next generation. (google: TestFloat). The problems these days tend to lie in the int to float and float to int and not single to double and double to single as you have specified above. Of course what operation is the fpu going to perform to do that conversion? Add 0? Multiply by 1? Depends on the fpu and the compiler.
The problem with IEEE related to your question above is that there is more than one way a number, not every number but many numbers can be represented. If I wanted to break your code I would start with minus zero in the hope that one of the two operations would convert it to a plus zero. Then I would try denormals. And it should fail with a signaling nan, but you called that out as a known exception.
The problem is that equals sign. Rule number one about floating point: be very careful with ==. It compares values, not bit patterns, and the results can be surprising: plus zero and minus zero compare equal even though their bits differ, while a NaN never compares equal to anything, including itself.
I realize that you probably used the equal to explain the problem and not necessarily the code you wanted to succeed or fail.
If a floating-point type is viewed as representing a precise value, then as other posters have noted, every float value is representable as a double, but only a few values of double can be represented by float. On the other hand, if one recognizes that floating-point values are approximations, one will realize the real situation is reversed. If one uses a very precise instrument to measure something which is 3.437mm, one may correctly describe its size as 3.4mm. If one uses a ruler to measure the object as 3.4mm, it would be incorrect to describe its size as 3.400mm.
Even bigger problems exist at the top of the range. There is a float value that represents: "computed value exceeded 2^127 by an unknown amount", but there's no double value that indicates such a thing. Casting an "infinity" from single to double will yield a value "computed value exceeded 2^1023 by an unknown amount" which is off by a factor of over a googol.