Rounding double strangeness - java

It might be too late at night, but I can't understand the behavior of this code:
public class DT {
static void theTest(double d){
double e = Math.floor(d/1630)*1630;
System.out.println(e-d);
}
public static void main(String[] args) {
theTest(2*1630);
theTest(2*1631);
theTest(2*1629);
theTest(8.989779443802325E18);
}
}
In my understanding, all 4 cases should be NON-positive, i.e. "e" is always <= "d",
but I do get the following output:
0.0
-2.0
-1628.0
1024.0
Why?
As this is the same with FastMath, I suspect something double-specific, but could anyone explain this to me?

When you get up into huge numbers, the doubles are spaced more widely than integers. When you do a division in this range, the result can be rounded up or down. So in your fourth test case, the result of the division d/1630 is actually rounded up to the nearest available double. Since this is a whole number, the call to floor does not change it. Multiplying it by 1630 then gives a result that is larger than d.
Edit
This effect kicks in at 2^52. Once you get past 2^52, there are no more non-integer doubles. Between 2^52 and 2^53, the doubles are just the integers. Above 2^53, the doubles are spaced more widely than the integers.
The result of the division in the question is 5515202112762162.576... which is between 2^52 and 2^53. It is rounded to the nearest double, which is the same as the nearest integer, which is 5515202112762163. Now, the floor does not change this number, and the multiplication by 1630 gives a result that is larger than d.
In summary, I guess the first sentence of the answer was a little misleading - you don't need the doubles to be spaced more widely than the integers for this effect to occur; you only need them to be spaced at least as widely as the integers.
With a value of d between 0 and 2^52 * 1630, the program in the question will never output a positive number.
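The effect described above can be checked directly. The following is only a minimal sketch: the class and method names (FloorOvershoot, overshoot, quotientRoundedUp) are made up for illustration, and BigDecimal is used merely as an exact reference for the quotient.

```java
import java.math.BigDecimal;
import java.math.MathContext;

class FloorOvershoot {
    // Same computation as theTest in the question, returning e - d.
    static double overshoot(double d) {
        double e = Math.floor(d / 1630) * 1630;
        return e - d;
    }

    // True when the double quotient d / 1630 is larger than the quotient
    // computed to 34 significant decimal digits from the exact value of d.
    static boolean quotientRoundedUp(double d) {
        BigDecimal reference = new BigDecimal(d).divide(new BigDecimal(1630), MathContext.DECIMAL128);
        return new BigDecimal(d / 1630).compareTo(reference) > 0;
    }

    public static void main(String[] args) {
        double d = 8.989779443802325E18;
        // The quotient lies between 2^52 and 2^53, so it is already a whole number.
        System.out.println(d / 1630 == Math.floor(d / 1630));
        System.out.println(quotientRoundedUp(d));
        System.out.println(overshoot(d));
    }
}
```

For the fourth test value from the question, the quotient is already integral and lies above the exact quotient, which is why floor cannot bring the product back below d.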

NOTE: I think you are looking for the operation called fmod in other languages and % in Java. The e - d that you wish to compute could be computed so that it always has the correct sign and is always smaller than 1630 in magnitude, as -(d % 1630.0).
all 4 cases should be NON-positive
For an arbitrary double d, it is likely that Math.floor(d/1630)*1630 would be no greater than d, but this is not guaranteed.
In order:
d/1630 is the double nearest to the real d / 1630. It can be up to one half ULP above the real d / 1630, and it can be arbitrarily close to an integer.
When d is large enough, d/1630 is always an integer, because any large enough double is an integer. In other words, when d is large enough, Math.floor(d/1630) is identical to d/1630. This applies to your last example.
d / 1630 * 1630 is the double nearest to the real multiplication of d / 1630 by 1630. It is within one half ULP of the real result.
The two rounding operations in d / 1630 * 1630 can both round up, and in this case, d / 1630 * 1630 is larger than d. It wouldn't be expected to be larger than d by more than one ULP(d).
If you want to compute a number that is guaranteed to be below the real d / 1630, you should either change the rounding mode to downward (not sure if Java lets you do this), or subtract one ULP from the result of d / 1630 computed in the default round-to-nearest rounding mode. You can do the latter with the function nextAfter.
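As a rough illustration of the % suggestion from the note at the top of this answer, here is a small sketch. The class and method names are hypothetical; it assumes d is finite and non-negative, in which case Java's % on doubles gives a result in [0, 1630).

```java
class NonPositiveRemainder {
    // Offset from d down to the nearest multiple of 1630 at or below it.
    // For finite d >= 0 the result lies in (-1630, 0] (possibly -0.0).
    static double offsetBelow(double d) {
        return -(d % 1630.0);
    }

    public static void main(String[] args) {
        double[] inputs = {2 * 1630, 2 * 1631, 2 * 1629, 8.989779443802325E18};
        for (double d : inputs) {
            System.out.println(offsetBelow(d));
        }
    }
}
```

Unlike the floor-based version, no intermediate quotient is rounded here, so the sign of the result does not depend on how large d is.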

Related

Biggest amount in USD (double) that can accurately be converted to cents (long)

I'm writing a bank program with a variable long balance to store cents in an account. When a user inputs an amount I have a method to do the conversion from USD to cents:
public static long convertFromUsd (double amountUsd) {
if (amountUsd <= maxValue && amountUsd >= minValue) {
return (long) (amountUsd * 100.0);
} else {
//no conversion (throws an exception, but I'm not including that part of the code)
}
}
In my actual code I also check that amountUsd does not have more than 2 decimals, to avoid inputs that cannot be accurately converted (e.g. 20.001 dollars is not exactly 2000 cents). For this example code, assume that all inputs have 0, 1 or 2 decimals.
At first I looked at Long.MAX_VALUE (9223372036854775807 cents) and assumed that double maxValue = 92233720368547758.07 would be correct, but it gave me rounding errors for big amounts:
convertFromUsd(92233720368547758.07) gives output 9223372036854775807
convertFromUsd(92233720368547758.00) gives the same output 9223372036854775807
What should I set double maxValue and double minValue to always get accurate return values?
You could use BigDecimal as a temp holder
If you have a very large double (something between Double.MAX_VALUE / 100.0 + 1 and Double.MAX_VALUE) the calculation of usd * 100.0 would result in an overflow of your double.
But since you know that every result you are willing to accept for <any double> * 100 has to fit in a long, you could use a BigDecimal as a temporary holder for your calculation.
Also, the BigDecimal class defines two methods which come in handy for this purpose:
BigDecimal#movePointRight
BigDecimal#longValueExact
By using a BigDecimal you don't have to bother about specifying a max-value at all -> any given double representing USD can be converted to a long value representing cents (assuming you don't have to handle cent-fractions).
double usd = 123.45;
long cents = BigDecimal.valueOf(usd).movePointRight(2).setScale(0).longValueExact();
Attention: Keep in mind that a double is not able to store the exact USD information in the first place. It is not possible to restore the information that has been lost by converting the double to a BigDecimal.
The only advantage a temporary BigDecimal gives you is that the calculation of usd * 100 won't overflow.
First of all, using double for monetary amounts is risky.
TL;DR
I'd recommend staying below $17,592,186,044,416.
The floating-point representation of numbers (double type) doesn't use decimal fractions (1/10, 1/100, 1/1000, ...), but binary ones (e.g. 1/128, 1/256). So, the double number will never exactly hit something like $1.99. It will be off by some fraction most of the time.
Hopefully, the conversion from decimal digit input ("1.99") to a double number will end up with the closest binary approximation, being a tiny fraction higher or lower than the exact decimal value.
To be able to correctly represent the 100 different cent values from $xxx.00 to $xxx.99, you need a binary resolution where you can at least represent 128 different values for the fractional part, meaning that the least significant bit corresponds to 1/128 (or better), meaning that at least 7 trailing bits have to be dedicated to the fractional dollars.
The double format effectively has 53 bits for the mantissa. If you need 7 bits for the fraction, you can devote at most 46 bits to the integral part, meaning that you have to stay below 2^46 dollars ($70,368,744,177,664.00, about 70 trillion) as the absolute limit.
As a precaution, I wouldn't trust the best-rounding property of converting from decimal digits to double too much, so I'd spend two more bits for the fractional part, resulting in a limit of 2^44 dollars, $17,592,186,044,416.
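The 2^46 limit can be seen by looking at the spacing between neighbouring doubles. This is just a sketch; the class and method names are invented for illustration, and Math.ulp is used to read off the spacing.

```java
class CentResolution {
    // True when neighbouring doubles around `dollars` are less than one cent apart.
    static boolean resolvesCents(double dollars) {
        return Math.ulp(dollars) < 0.01;
    }

    public static void main(String[] args) {
        double limit = Math.pow(2, 46); // 70368744177664.0, exactly representable
        System.out.println(Math.ulp(Math.nextDown(limit))); // spacing just below 2^46: 1/128
        System.out.println(Math.ulp(limit));                // spacing from 2^46 on: 1/64
        System.out.println(resolvesCents(70368744177663.99));
        System.out.println(resolvesCents(limit));
    }
}
```

Below 2^46 the spacing is 1/128 of a dollar, which is finer than a cent; from 2^46 upwards it is 1/64, which is coarser.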
Code Warning
There's a flaw in your code:
return (long) (amountUsd * 100.0);
This will truncate down to the next-lower cent if the double value lies between two exact cents, meaning that e.g. "123456789.23" might become 123456789.229... as a double and getting truncated down to 12345678922 cents as a long.
It would be better to use
return Math.round(amountUsd * 100.0);
This will end up with the nearest cent value, most probably being the "correct" one.
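A small sketch of the difference between the cast and Math.round, using 0.29 as an example value of my own choosing (0.29 * 100.0 evaluates to 28.999999999999996 in double arithmetic). The class and method names are hypothetical.

```java
class CentsRounding {
    // The conversion from the question: drops everything after the decimal point.
    static long truncateToCents(double usd) {
        return (long) (usd * 100.0);
    }

    // The suggested replacement: rounds to the nearest whole cent.
    static long roundToCents(double usd) {
        return Math.round(usd * 100.0);
    }

    public static void main(String[] args) {
        System.out.println(truncateToCents(0.29)); // 28
        System.out.println(roundToCents(0.29));    // 29
    }
}
```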
EDIT:
Remarks on "Precision"
You often read statements that floating-point numbers aren't precise, and then in the next sentence the authors advocate BigDecimal or similar representations as being precise.
The validity of such a statement depends on the type of number you want to represent.
All the number representation systems in use in today's computing are precise for some types of numbers and imprecise for others. Let's take a few example numbers from mathematics and see how well they fit into some typical data types:
42: A small integer can be represented exactly in virtually all types.
1/3: All the typical data types (including double and BigDecimal) fail to represent 1/3 exactly. They can only do a (more or less close) approximation. The result is that multiplication with 3 does not exactly give the integer 1. A few languages offer a "ratio" type, capable of representing numbers by numerator and denominator, thus giving exact results.
1/1024: Because of the power-of-two denominator, float and double can easily do an exact representation. BigDecimal can do as well, but needs 10 fractional digits.
14.99: Because of the decimal fraction (can be rewritten as 1499/100), BigDecimal does it easily (that's what it's made for), float and double can only give an approximation.
PI: I don't know of any language with support for irrational numbers - I even have no idea how this could be possible (aside from treating popular irrationals like PI and E symbolically).
123456789123456789123456789: BigInteger and BigDecimal can do it exactly, double can do an approximation (with the last 13 digits or so being garbage), int and long fail completely.
Let's face it: Each data type has a class of numbers that it can represent exactly, where computations deliver precise results, and other classes where it can at best deliver approximations.
So the questions should be:
What's the type and range of numbers to be represented here?
Is an approximation okay, and if yes, how close should it be?
What's the data type that matches my requirements?
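Which of the example numbers above a double can hold exactly can be checked with a round trip through BigDecimal. This is a sketch only; isExactInDouble is a hypothetical helper, and it relies on new BigDecimal(double) giving the exact binary value of the double.

```java
import java.math.BigDecimal;

class ExactOrNot {
    // True when the decimal string survives conversion to double without any change in value.
    static boolean isExactInDouble(String decimal) {
        BigDecimal wanted = new BigDecimal(decimal);
        BigDecimal stored = new BigDecimal(wanted.doubleValue()); // exact value held by the double
        return stored.compareTo(wanted) == 0;
    }

    public static void main(String[] args) {
        System.out.println(isExactInDouble("42"));                          // small integer
        System.out.println(isExactInDouble("0.0009765625"));                // 1/1024
        System.out.println(isExactInDouble("14.99"));                       // decimal fraction
        System.out.println(isExactInDouble("123456789123456789123456789")); // large integer
    }
}
```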
Using a double in Java, the biggest amount would be 70368744177663.99.
What you have in a double is 64 bits (8 bytes) to represent:
Decimals and integers
+/-
The problem is to keep it from rounding off the 0.99, so you get 46 bits for the integer part and the rest need to be used for the decimals.
You can test with the following code:
double biggestPossitiveNumberInDouble = 70368744177663.99;
for(int i=0;i<130;i++){
System.out.printf("%.2f\n", biggestPossitiveNumberInDouble);
biggestPossitiveNumberInDouble=biggestPossitiveNumberInDouble-0.01;
}
If you add 1 to biggestPossitiveNumberInDouble you will see it start to round off and lose precision.
Also note the round-off error when subtracting 0.01.
First iterations
70368744177663.99
70368744177663.98
70368744177663.98
70368744177663.97
70368744177663.96
...
The best way in this case would be not to parse to a double at all:
System.out.println("Enter amount:");
String input = new Scanner(System.in).nextLine();
int indexOfDot = input.indexOf('.');
if (indexOfDot == -1) indexOfDot = input.length();
int validInputLength = indexOfDot + 3;
if (validInputLength > input.length()) validInputLength = input.length();
String validInput = input.substring(0,validInputLength);
// Assumes the input has exactly two decimals; "12.5" or "12" would need zero-padding first.
long amount = Long.parseLong(validInput.replace(".", ""));
System.out.println("Converted: " + amount);
This way you don't run into the limits of double and just have the limits of long.
But ultimately the best option would be to go with a datatype made for currency.
You looked at the largest possible long number, but a double cannot represent every integer of that size exactly. Calculating (amountUsd * 100.0) results in a double (which is afterwards cast to a long).
You should ensure that (amountUsd * 100.0) can never be bigger than the largest value up to which a double represents every integer exactly, which is 9007199254740992 (2^53).
Floating values (float, double) are stored differently than integer values (int, long) and while double can store very large values, it is not good for storing money amounts as they get less accurate the bigger or more decimal places the number has.
Check out How many significant digits do floats and doubles have in java? for more information about floating point significant digits
A double has about 15 significant decimal digits; the significant digit count is the total number of digits counted from the first non-zero digit. (For a better explanation see https://en.wikipedia.org/wiki/Significant_figures)
Therefore, to include cents and stay accurate, you would want the maximum number to have no more than 13 whole-number places and 2 decimal places.
As you are dealing with money it would be better not to use floating point values. Check out this article on using BigDecimal for storing currency: https://medium.com/#cancerian0684/which-data-type-would-you-choose-for-storing-currency-values-like-trading-price-dd7489e7a439
As you mentioned users are inputting an amount, you could read it in as a String rather than a floating point value and pass that into a BigDecimal.
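Reading the amount as a String and passing it to BigDecimal could look roughly like this. The method name parseCents is made up for illustration; movePointRight and longValueExact are the BigDecimal methods mentioned in an earlier answer, and longValueExact throws an ArithmeticException when a fraction of a cent remains or the value does not fit in a long.

```java
import java.math.BigDecimal;

class CentsParser {
    // Converts a decimal USD string to cents without going through double.
    // Throws ArithmeticException for more than two decimals or for values outside the long range.
    static long parseCents(String amountUsd) {
        return new BigDecimal(amountUsd).movePointRight(2).longValueExact();
    }

    public static void main(String[] args) {
        System.out.println(parseCents("92233720368547758.07")); // Long.MAX_VALUE
        System.out.println(parseCents("92233720368547758.00"));
        System.out.println(parseCents("1.5"));
    }
}
```

With this approach the two inputs from the question give different results, and the only limits left are those of long.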

Java: convert float to double preserving decimal point precision

I have float-based storage of numbers that are decimal by nature. The precision of float is fine for my needs. Now I want to perform some more precise calculations with these numbers using double.
An example:
float f = 0.1f;
double d = f; //d = 0.10000000149011612d
// but I want some code that will convert 0.1f to 0.1d;
Update 1:
I know very well that 0.1f != 0.1d. This question is not about precise decimal calculations. Sadly, the question was downvoted. I will try to explain it again...
Let's say I work with an API that returns float numbers for decimal MSFT stock prices. Believe it or not, this API exists:
interface Stock {
float[] getDayPrices();
int[] getDayVolumesInHundreds();
}
It is known that the price of a MSFT share is a decimal number with no more than 5 digits, e.g. 31.455, 50.12, 45.888. Obviously the API does not work with BigDecimal because it would be a big overhead for the purpose to just pass the price.
Let's also say I want to calculate a weighted average of these prices with double precision:
float[] prices = msft.getDayPrices();
int[] volumes = msft.getDayVolumesInHundreds();
double priceVolumeSum = 0.0;
long volumeSum = 0;
for (int i = 0; i < prices.length; i++) {
double doublePrice = decimalFloatToDouble(prices[i]);
priceVolumeSum += doublePrice * volumes[i];
volumeSum += volumes[i];
}
System.out.println(priceVolumeSum / volumeSum);
I need a performant implementation of decimalFloatToDouble.
For now I use the following code, but I need something more clever:
double decimalFloatToDouble(float f) {
return Double.parseDouble(Float.toString(f));
}
EDIT: this answer corresponds to the question as initially phrased.
When you convert 0.1f to double, you obtain the same number, the imprecise representation of the rational 1/10 (which cannot be represented in binary at any precision) in single-precision. The only thing that changes is the behavior of the printing function. The digits that you see, 0.10000000149011612, were already there in the float variable f. They simply were not printed because these digits aren't printed when printing a float.
Ignore these digits and compute with double as you wish. The problem is not in the conversion, it is in the printing function.
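A short sketch of this point: the widening conversion keeps the value, and only the printed text differs. The class and method names are hypothetical.

```java
class FloatWidening {
    // Text Java prints for the float itself.
    static String asFloatText(float f) {
        return Float.toString(f);
    }

    // Text Java prints for the very same value after widening to double.
    static String asDoubleText(float f) {
        return Double.toString((double) f);
    }

    public static void main(String[] args) {
        float f = 0.1f;
        double d = f;
        System.out.println((float) d == f);  // true: widening and narrowing back loses nothing
        System.out.println(asFloatText(f));  // 0.1
        System.out.println(asDoubleText(f)); // 0.10000000149011612
    }
}
```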
As I understand you, you know that the float is within one float-ulp of an integer number of hundredths, and you know that you're well inside the range where no two integer numbers of hundredths map to the same float. So the information isn't gone at all; you just need to figure out which integer you had.
To get two decimal places, you can multiply by 100, rint/Math.round the result, and multiply by 0.01 to get a close-by double as you wanted. (To get the closest, divide by 100.0 instead.) But I suspect you knew this already and are looking for something that goes a little faster. Try ((4503599627370496.0 + 100.0 * x) - 4503599627370496.0) * 0.01 and don't mess with the parentheses. (The constant is 2^52; doubles between 2^52 and 2^53 are exactly the integers, so the addition rounds 100.0 * x to a whole number.) Maybe strictfp that hack for good measure.
You said five significant figures, and apparently your question isn't limited to MSFT share prices. Up until doubles can't represent powers of 10 exactly, this isn't too bad. (And maybe this works beyond that threshold too.) The exponent field of a float narrows the needed power of ten down to two candidates, and there are 256 possibilities. (Except in the case of subnormals.) Getting the right power of ten just needs a conditional, and the rounding trick is straightforward enough.
All of this is going to be a mess, and I'd recommend you stick with the toString approach for all the weird cases.
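The add-and-subtract rounding trick for two decimal places can be sketched as follows. Assumptions of this sketch: the magic constant is 2^52 (doubles in [2^52, 2^53) are spaced exactly 1 apart), the input is non-negative with 100 * x far below 2^52, and the final step divides by 100.0 to get the closest double. The names are invented for illustration.

```java
class HundredthsSnap {
    // 2^52: adding it forces the sum to be rounded to a whole number.
    private static final double TWO_POW_52 = 4503599627370496.0;

    // Snaps a float known to hold a value with two decimals to the nearest double
    // with the same two decimals. Valid only for 0 <= 100 * x < 2^52.
    static double snapToHundredths(float x) {
        double hundredths = (TWO_POW_52 + 100.0 * x) - TWO_POW_52; // keep the parentheses
        return hundredths / 100.0;
    }

    public static void main(String[] args) {
        System.out.println(snapToHundredths(50.12f));
        System.out.println(Double.parseDouble(Float.toString(50.12f))); // the toString approach
    }
}
```

For values in this range the result should agree with the toString round trip from the question, without creating any strings.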
If your goal is to have a double whose canonical representation will match the canonical representation of a float converting the float to string and converting the result back to double would probably be the most accurate way of achieving that result, at least when it's possible (I don't know for certain whether Java's double-to-string logic would guarantee that there won't be a pair of consecutive double values which report themselves as just above and just-below a number with five significant figures).
If your goal is to round to five significant figures a value which is known to have been rounded to five significant figures while in float form, I would suggest that the simplest approach is probably to simply round to five significant figures. If the magnitude of your numbers will be roughly within the range 1E+/-12, start by finding the smallest power of ten which is smaller than your number, multiply that by 100,000, multiply your number by that, round to the nearest unit, and divide by that power of ten. Because division is often much slower than multiplication, if performance is critical, you might keep a table with powers of ten and their reciprocals. To avoid the possibility of rounding errors, your table should store for each power of ten the closest power-of-two double to its reciprocal, and then the closest double to the difference between the first double and the actual reciprocal. Thus, the reciprocal of 100 would be stored as 0.0078125 + 0.0021875; the value n/100 would be computed as n*0.0078125 + n*0.0021875. The first term would never have any round-off error (multiplying by a power of two), and the second value would have precision beyond that needed for the final result, so the final result should thus be rounded accurately.

Why is scale subtracted when dividing BigInteger and added when multiplying?

I have data that I'm trying to accurately and precisely manipulate with an ever-increasing denominator.
Please assume that the numerator will always have a decimal.
I see in the docs that divide(BigDecimal divisor) will actually reduce the scale which seems strange since as I understand the usage of "scale", number of digits past the decimal point, it should increase upon division.
I also see in the docs that multiply(BigDecimal multiplicand) increases the scale. This also doesn't make sense, according to my understanding of scale, since the likelihood of two multiplied numbers needing digits beyond the decimal point goes down.
Are these typos in the docs?
If not, is my understanding of scale incorrect?
If not, how can precision be maintained with an ever-increasing denominator that increases the number of digits past the decimal point?
This is effectively just scientific notation. As it says in the docs, the value of a BigDecimal is:
unscaledValue × 10^(-scale)
Thus multiplying two BigDecimals is equivalent to multiplying their unscaledValues and adding their scales:
a * b
== (uA * 10^-sA) * (uB * 10^-sB)
== (uA * uB) * 10^-(sA + sB)
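A brief sketch of how this shows up in practice. The values are arbitrary examples; for divide(BigDecimal), the documented preferred scale is this.scale() - divisor.scale(), and the actual scale grows as needed when the exact quotient requires more digits.

```java
import java.math.BigDecimal;

class ScaleDemo {
    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("1.25"); // unscaledValue 125, scale 2
        BigDecimal b = new BigDecimal("2.5");  // unscaledValue 25, scale 1

        BigDecimal product = a.multiply(b);    // 3125 * 10^-3
        System.out.println(product + " scale=" + product.scale());

        BigDecimal quotient = a.divide(b);     // preferred scale 2 - 1 = 1
        System.out.println(quotient + " scale=" + quotient.scale());

        // The preferred scale here is 0, but the exact quotient 0.125 needs three digits.
        BigDecimal eighth = new BigDecimal("1").divide(new BigDecimal("8"));
        System.out.println(eighth + " scale=" + eighth.scale());
    }
}
```

So the subtraction in the docs describes only the preferred scale of an exact division; no digits are lost, and a non-terminating quotient such as 1/3 makes divide(BigDecimal) throw an ArithmeticException instead of rounding silently.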

Java's '==' operator on doubles

This method returns 'true'. Why?
public static boolean f() {
double val = Double.MAX_VALUE/10;
double save = val;
for (int i = 1; i < 1000; i++) {
val -= i;
}
return (val == save);
}
You're subtracting quite a small value (less than 1000) from a huge value. The small value is so much smaller than the large value that the closest representable value to the theoretical result is still the original value.
Basically it's a result of the way floating point numbers work.
Imagine we had some decimal floating point type (just for simplicity) which only stored 5 significant digits in the mantissa, and an exponent in the range 0 to 1000.
Your example is like writing 10^999 - 1000... think about what the result of that would be, when rounded to 5 significant digits. Yes, the exact result is 99999.....9000 (with 999 digits) but if you can only represent values with 5 significant digits, the closest result is 10^999 again.
When you set val to Double.MAX_VALUE/10, it is set to a value approximately equal to 1.7976931348623158 * 10^307. Subtracting values like 1000 from that would require a precision that the double representation does not have, so it basically leaves val unchanged.
Depending on your needs, you may use BigDecimal instead of double.
Double.MAX_VALUE is so big that the JVM cannot tell the difference between it and Double.MAX_VALUE-1000.
If you subtract a number smaller than half an ULP of Double.MAX_VALUE (one ULP is 1.9958403095347198E292) from Double.MAX_VALUE, the result is still Double.MAX_VALUE.
System.out.println(
new BigDecimal(Double.MAX_VALUE).equals( new BigDecimal(
Double.MAX_VALUE - 2.E291) )
);
System.out.println(
new BigDecimal(Double.MAX_VALUE).equals( new BigDecimal(
Double.MAX_VALUE - 2.E292) )
);
Output:
true
false
A double does not have enough precision to perform the calculation you are attempting. So the result is the same as the initial value.
It has nothing to do with the == operator.
val is a big number and when subtracting 1 (or even 1000) from it, the result cannot be expressed properly as a double value. The representation of this number x and x-1 is the same, because double only has a limited number of bits to represent an unlimited number of numbers.
Double.MAX_VALUE is a huge number compared to 1 or 1000. Double.MAX_VALUE-1 is equal to Double.MAX_VALUE. So your code effectively does nothing when subtracting 1 or 1000 from Double.MAX_VALUE/10.
Always remember that:
doubles and floats are just approximations of real numbers; they are rationals that are not evenly distributed among the reals
you should be very careful with arithmetic operators between doubles or floats which are not close in magnitude (there are many other rules like this...)
in general, never use doubles or floats if you need arbitrary precision
Because double is a floating point numeric type, which is a way of approximating numeric values. Floating point representations encode numbers so that we can store numbers much larger or smaller than we normally could. However, not all numbers can be represented in the given space, so multiple numbers get rounded to the same floating point value.
As a simplified example, we might want to be able to store values ranging from -1000 to 1000 in some small amount of space where we would normally only be able to store -10 to 10. So we could round all values to the nearest hundred and store them in the small space: -1000 gets encoded as -10, -900 gets encoded as -9, 1000 gets encoded as 10. But what if we want to store -999? The closest value we can encode is -1000, so we have to encode -999 as the same value as -1000: -10.
In reality, floating point schemes are much more complicated than the example above, but the concept is similar. Floating point representations of numbers can only represent some of all the possible numbers, so when we have a number that can't be represented as part of the scheme, we have to round it to the closest representable value.
In your code, all values within 1000 of Double.MAX_VALUE / 10 automatically get rounded to Double.MAX_VALUE / 10, which is why the computer thinks (Double.MAX_VALUE / 10) - 1000 == Double.MAX_VALUE / 10.
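The rounding described here can be made visible with Math.ulp, which reports the gap between a double and its neighbour. A minimal sketch with hypothetical names:

```java
class AbsorbedSubtraction {
    // True when subtracting `small` leaves `big` unchanged after rounding.
    static boolean isAbsorbed(double big, double small) {
        return big - small == big;
    }

    public static void main(String[] args) {
        double val = Double.MAX_VALUE / 10;
        System.out.println(Math.ulp(val));          // gap between val and the next double
        System.out.println(isAbsorbed(val, 1000));  // true: 1000 is far below half that gap
        System.out.println(isAbsorbed(1.0, 0.5));   // false: 0.5 is representable next to 1.0
    }
}
```

Each of the 999 subtractions in the question is absorbed in the same way, so val never moves.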
The result of a floating point calculation is the closest representable value to the exact answer. This program:
public class Test {
public static void main(String[] args) throws Exception {
double val = Double.MAX_VALUE/10;
System.out.println(val);
System.out.println(Math.nextAfter(val, 0));
}
}
prints:
1.7976931348623158E307
1.7976931348623155E307
The first of these numbers is your original val. The second is the largest double that is less than it.
When you subtract 1000 from 1.7976931348623158E307, the exact answer is between those two numbers, but very, very much closer to 1.7976931348623158E307 than to 1.7976931348623155E307, so the result will be rounded to 1.7976931348623158E307, leaving val unchanged.

Is "long x = 1/2" equal to 1 or 0, and why? [duplicate]

This question already has answers here:
Integer division: How do you produce a double?
(11 answers)
Closed 7 years ago.
If I have something like:
long x = 1/2;
shouldn't this be rounded up to 1? When I print it on the screen it says 0.
It's doing integer division, which truncates everything to the right of the decimal point.
Integer division has its roots in number theory. When you do 1/2 you are asking how many times 2 goes into 1. The answer is zero times, so the equation becomes 0*2 + 1 = 1, where 0 is the quotient (what you get from 1/2) and 1 is the remainder (what you get from 1%2).
It is right to point out that % is not a true modulus in the mathematical sense but always a remainder from division. There is a difference when you are dealing with negative integers.
Hope that helps.
This expression first declares a long called x and then assigns it the value of the right hand side expression. The right hand side expression is 1/2, and since 1 and 2 are both integers this is interpreted as integer division. With integer division the result is always an integer, so something along the lines of 5/3 will return 1, as only one three fits in a five. So with 1/2, how many 2s can fit into 1? 0.
This can in some languages result in some interesting outputs if you write something like
double x = 1/2. You might expect 0.5 in this case, but it will often evaluate the integer value on the right first before assigning and converting the result into a double, giving the value 0.0
It is important to note that when doing this kind of type conversion, it will never round the result. So if you do the opposite:
long x = (long)(1.0/2.0);
then while (1.0/2.0) will evaluate to 0.5, the (long) cast will force this to be truncated to 0. Even if I had long x = (long)(0.9), the result will still be 0. It simply truncates after the decimal point.
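The cases discussed above can be put side by side. This is only a sketch; the class and method names are made up, and Math.round is included for contrast as the explicit way to ask for rounding.

```java
class TruncationDemo {
    // Integer division: the right hand side is 0 before it is ever assigned.
    static long intHalf() {
        long x = 1 / 2;
        return x;
    }

    // Casting a double to long drops the fractional part.
    static long castDown(double v) {
        return (long) v;
    }

    // Rounding has to be requested explicitly.
    static long roundNearest(double v) {
        return Math.round(v);
    }

    public static void main(String[] args) {
        System.out.println(intHalf());         // 0
        System.out.println(castDown(1.0 / 2)); // 0
        System.out.println(castDown(0.9));     // 0
        System.out.println(roundNearest(0.5)); // 1
        System.out.println((double) 1 / 2);    // 0.5
    }
}
```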
It can't round because there is never a non-integer value to round.
The expression "1/2" is never 0.5 before it is assigned to the long.
With 1.0/2.0, on the other hand, the expression on the right does evaluate to 0.5 before the assignment, so there is a value that could be rounded. Unless you get 0.499999999999997...
This question was answered before on this site. You are doing an integer division; if you want to get 0.5, use:
double x = (double)1/2;
and you will get the value of 0.5 .
There are lots of different rounding conventions, the most common being rounding towards +inf, rounding towards -inf and rounding towards zero. Lots of people assume there's one right way, but they all have different ideas about what that one way should be ;-)
There is no intermediate non-integer result for integer division, but of course the division is done deterministically, and one particular rounding convention will always be followed for a particular platform and compiler.
With Visual C++ I get 5/2 = 2 and -5/2 = -2, rounding towards zero.
The rounding in C, C++ and Java is commonly called "truncation" - meaning drop off the unwanted bits. But this can be misleading. Using 4 bit 2s complement binary, doing what truncation implies gives...
5/2 = 0101/0010 = 0010.1 --> 0010 = 2
-5/2 = 1011/0010 = 1101.1 --> 1101 = -3
Which is rounding towards -infinity, which is what Python does (or at least what it did in Python 2.5).
Truncation would be the right word if we used a sign-magnitude representation, but twos complement has been the de-facto standard for decades.
In C and C++, I expect while it's normally called truncation, in reality this detail was left to the implementation in older versions of the standards (C99 and C++11 have since required rounding towards zero) - an excuse for allowing the compiler to use the simplest and fastest method for the platform (what the processor's division instruction naturally does). It's only an issue if you have negative numbers though - I've yet to see any language or implementation that would give 5/2 = 3.
I don't know what the Java standard says. The Python manual specifies "floor" division, which is a common term for rounding to -infinity.
EDIT
An extra note - by definition, if a/b = c remainder d, then a = (b*c)+d. For this to hold, you have to choose a remainder to suit your rounding convention.
People tend to assume that remainders and modulos are the same, but WRT signed values, they can be different - depending on the rounding rules. Modulo values are by definition never negative, but remainders can be negative.
I suspect the Python round-towards-negative-infinity rule is intended to ensure that the single % operator is valid both as a remainder and as a modulo. In C and C++, what % means (remainder or modulo) is (yes, you guessed it) implementation defined.
Ada actually has two separate operators - mod and rem. With division required to round towards zero, so that mod and rem do give different results.
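For Java specifically, the two conventions can be compared directly: the / and % operators on integers round towards zero, while Math.floorDiv and Math.floorMod (available since Java 8) round towards negative infinity. A minimal sketch with hypothetical helper names:

```java
class DivisionConventions {
    // Quotient and remainder with Java's built-in operators (round towards zero).
    static int[] truncating(int a, int b) {
        return new int[] {a / b, a % b};
    }

    // Quotient and modulus with floor semantics (round towards negative infinity).
    static int[] flooring(int a, int b) {
        return new int[] {Math.floorDiv(a, b), Math.floorMod(a, b)};
    }

    public static void main(String[] args) {
        int[] t = truncating(-5, 2);
        int[] f = flooring(-5, 2);
        System.out.println(t[0] + " remainder " + t[1]); // -2 remainder -1
        System.out.println(f[0] + " modulus " + f[1]);   // -3 modulus 1
        // In both conventions a == b * quotient + rest holds.
        System.out.println(2 * t[0] + t[1] == -5 && 2 * f[0] + f[1] == -5);
    }
}
```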
