One of the first things we learn in floating-point arithmetic is the crucial role rounding error plays when summing doubles. Let's say we have an array of doubles, myArray, and we want to find its mean. What we could trivially do is:
double sum = 0.0;
for(int i = 0; i < myArray.length; i++) {
sum += myArray[i];
}
double mean = sum / myArray.length;
However, we would have rounding error. This error can be reduced using other summation algorithms, such as Kahan summation (see the Wikipedia article: https://en.wikipedia.org/wiki/Kahan_summation_algorithm).
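For reference, here is a minimal sketch of the Kahan algorithm as described in that article (the method name is mine):
public static double kahanSum(double[] input) {
    double sum = 0.0;
    double c = 0.0;           // running compensation for lost low-order bits
    for (double x : input) {
        double y = x - c;     // apply the compensation to the next term
        double t = sum + y;   // low-order bits of y are lost in this addition...
        c = (t - sum) - y;    // ...and recovered here into c
        sum = t;
    }
    return sum;
}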
I have recently discovered Java Streams (refer to: https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html) and in particular DoubleStream (see: https://docs.oracle.com/javase/8/docs/api/java/util/stream/DoubleStream.html).
With the code:
double sum = DoubleStream.of(myArray).parallel().sum();
double average = sum / myArray.length;
we can get the average of our array. In my opinion, two advantages stand out:
More concise code
Faster as it is parallelized
Of course we could also have done something like:
double average = DoubleStream.of(myArray).parallel().average().orElse(Double.NaN);
but I wanted to stress the summation. (Note that average() returns an OptionalDouble, hence the orElse.)
At this point I have a question (which the API documentation didn't answer): is this sum() method numerically stable? I have done some experiments and it appears to work fine. However, I am not sure whether it is at least as good as the Kahan algorithm. Any help is really welcome!
The documentation says this:
Returns the sum of elements in this stream. Summation is a special
case of a reduction. If floating-point summation were exact, this
method would be equivalent to:
return reduce(0, Double::sum);
However, since floating-point summation is not exact, the above code
is not necessarily equivalent to the summation computation done by
this method.
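For comparison, the plain reduction the documentation refers to can be written directly; unlike sum(), it carries no error compensation:
double naiveSum = DoubleStream.of(myArray).reduce(0, Double::sum);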
Have you considered using BigDecimal to get exact results?
Interesting, so I implemented Klein's variant of the Kahan algorithm, mentioned in the Wikipedia article, along with a Stream version of it.
The results are not convincing.
double[] values = new double[10_000];
Random random = new Random();
Arrays.setAll(values, (i) -> Math.atan(random.nextDouble()*Math.PI*2) * 3E17);
long t0 = System.nanoTime();
double sum1 = DoubleStream.of(values).sum();
long t1 = System.nanoTime();
double sum2 = DoubleStream.of(values).parallel().sum();
long t2 = System.nanoTime();
double sum3 = kleinSum(values);
long t3 = System.nanoTime();
double sum4 = kleinSumAsStream(values);
long t4 = System.nanoTime();
System.out.printf(
"sequential %f (%d ns)%nparallel %f (%d ns)%nklein %f (%d ns)%nkleinStream %f (%d ns)%n",
sum1, t1 - t0,
sum2, t2 - t1,
sum3, t3 - t2,
sum4, t4 - t3);
And a non-Stream version of the modified Kahan algorithm:
public static double kleinSum(double[] input) {
double sum = 0.0;
double cs = 0.0;
double ccs = 0.0;
for (int i = 0; i < input.length; ++i) {
double t = sum + input[i];
double c = Math.abs(sum) >= Math.abs(input[i])
? (sum - t) + input[i]
: (input[i] - t) + sum;
sum = t;
t = cs + c;
double cc = Math.abs(cs) >= Math.abs(c)
? (cs - t) + c
: (c - t) + cs;
cs = t;
ccs += cc;
}
return sum + cs + ccs;
}
A Stream version:
public static double kleinSumAsStream(double[] input) {
double[] scc = DoubleStream.of(input)
.boxed()
.reduce(new double[3],
(sumCsCcs, x) -> {
double t = sumCsCcs[0] + x;
double c = Math.abs(sumCsCcs[0]) >= Math.abs(x)
? (sumCsCcs[0] - t) + x
: (x - t) + sumCsCcs[0];
sumCsCcs[0] = t;
t = sumCsCcs[1] + c;
double cc = Math.abs(sumCsCcs[1]) >= Math.abs(c)
? (sumCsCcs[1] - t) + c
: (c - t) + sumCsCcs[1];
sumCsCcs[1] = t;
sumCsCcs[2] += cc;
return sumCsCcs;
},
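// combiner for parallel execution: adds component-wise (note that the
// corrections themselves are not compensated across the parallel splits)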
(scc1, scc2) -> new double[] {
scc2[0] + scc1[0],
scc2[1] + scc1[1],
scc2[2] + scc1[2]});
return scc[0] + scc[1] + scc[2];
}
Mind that the timings are only meaningful evidence when a proper microbenchmark harness (such as JMH) is used.
However one still sees the overhead of a DoubleStream:
sequential 3363280744568882000000,000000 (5083900 ns)
parallel 3363280744568882500000,000000 (4492600 ns)
klein 3363280744568882000000,000000 (1051600 ns)
kleinStream 3363280744568882000000,000000 (3277500 ns)
Unfortunately I did not manage to provoke meaningful floating-point errors with this data, and it is getting late here.
Using a Stream instead of kleinSum needs a reduction carrying at least two doubles (sum and correction), so a double[2], or in newer Java a record(double sum, double cs, double ccs) value.
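A sketch of what that record-based state could look like (Java 16+; the record name and the combiner are my own assumptions, and this simple combiner still adds the partial corrections naively across parallel splits, just like the double[] version):
record Terms(double sum, double cs, double ccs) {
    Terms add(double x) {
        double t = sum + x;
        double c = Math.abs(sum) >= Math.abs(x) ? (sum - t) + x : (x - t) + sum;
        double t2 = cs + c;
        double cc = Math.abs(cs) >= Math.abs(c) ? (cs - t2) + c : (c - t2) + cs;
        return new Terms(t, t2, ccs + cc);
    }
    double value() { return sum + cs + ccs; }
}
// usage:
// double sum = DoubleStream.of(values).boxed()
//         .reduce(new Terms(0, 0, 0), Terms::add, (p, q) -> p.add(q.value()))
//         .value();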
A far less magical auxiliary approach is to sort the input by magnitude.
float (used here for readability reasons only; double has a precision limit too, as used later) has a 24-bit mantissa (of which 23 bits are stored, and the 24th is an implicit 1 for "normal" numbers), so once you reach the number 2^24, you simply can't add 1 to it; the smallest increment it has is 2:
float f=1<<24;
System.out.println(Float.valueOf(f).intValue());
f++;
f++;
System.out.println(Float.valueOf(f).intValue());
f+=2;
System.out.println(Float.valueOf(f).intValue());
will display
16777216
16777216 <-- 16777216+1+1
16777218 <-- 16777216+2
while summing them in the other direction works
float f=0;
System.out.println(Float.valueOf(f).intValue());
f++;
f++;
System.out.println(Float.valueOf(f).intValue());
f+=2;
System.out.println(Float.valueOf(f).intValue());
f+=1<<24;
System.out.println(Float.valueOf(f).intValue());
produces
0
2
4
16777220 <-- 4+16777216
(Of course the pair of f++s is intentional; 16777219 is not representable, just like 16777217 in the previous case. These are not incomprehensibly huge numbers, yet a line as simple as System.out.println((int)(float)16777219); already prints 16777220.)
The same applies to double; there you just have 53 bits of precision.
Two things:
the documentation actually suggests this: API Note: Elements sorted by increasing absolute magnitude tend to yield more accurate results
sum() internally ends up in Collectors.sumWithCompensation(), which explicitly states that it is an implementation of Kahan summation. (The GitHub link is to the JetBrains mirror, because Java itself uses a different source control that is harder to find and link - but the file is also present in your JDK, inside src.zip, usually located in the lib folder.)
Ordering by magnitude is something like ordering by log(abs(x)), which is a bit uglier in code, but possible:
double t[]= {Math.pow(2, 53),1,-1,-Math.pow(2, 53),1};
System.out.println(DoubleStream.of(t).boxed().collect(Collectors.toList()));
t=DoubleStream.of(t).boxed()
.sorted((a,b)->(int)(Math.log(Math.abs(a))-Math.log(Math.abs(b))))
.mapToDouble(d->d)
.toArray();
System.out.println(DoubleStream.of(t).boxed().collect(Collectors.toList()));
will print an okay order
[9.007199254740992E15, 1.0, -1.0, -9.007199254740992E15, 1.0]
[1.0, -1.0, 1.0, 9.007199254740992E15, -9.007199254740992E15]
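As an aside, a plain magnitude comparator would avoid both the log() call and the truncating int cast (this variant is my own, not from the original):
t = DoubleStream.of(t).boxed()
    .sorted(Comparator.comparingDouble(Math::abs))
    .mapToDouble(Double::doubleValue)
    .toArray();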
So it's nice, but you can actually break it with little effort. (The first few lines below show that 2^53 really is the "integer limit" for double and remind us of its actual value; then a sum involving a single +1 ends up being less than 2^53.)
double d=Math.pow(2, 53);
System.out.println(Double.valueOf(d).longValue());
d++;
d++;
System.out.println(Double.valueOf(d).longValue());
d+=2;
System.out.println(Double.valueOf(d).longValue());
double array[]= {Math.pow(2, 53),1,1,1,1};
for(var i=0;i<5;i++) {
var copy=Arrays.copyOf(array, i+1);
d=DoubleStream.of(copy).sum();
System.out.println(i+": "+Double.valueOf(d).longValue());
}
produces
9007199254740992
9007199254740992 <-- 9007199254740992+1+1
9007199254740994 <-- 9007199254740992+2
0: 9007199254740992
1: 9007199254740991 <-- that would be 9007199254740992+1 with Kahan
2: 9007199254740994
3: 9007199254740996 <-- "rounding" upwards, just like with (float)16777219 earlier
4: 9007199254740996
TL;DR: you don't need your own Kahan implementation, but use computers with care in general.
Related
How do I map numbers, linearly, between a and b to go between c and d?
That is, I want numbers between 2 and 6 to map to numbers between 10 and 20... but I need the generalized case.
My brain is fried.
If your number X falls between A and B, and you would like Y to fall between C and D, you can apply the following linear transform:
Y = (X-A)/(B-A) * (D-C) + C
That should give you what you want, although your question is a little ambiguous, since you could also map the interval in the reverse direction. Just watch out for division by zero and you should be OK. For example, X = 4 in [2, 6] maps to Y = (4-2)/(6-2) * (20-10) + 10 = 15.
Divide to get the ratio between the sizes of the two ranges, then subtract the starting value of your initial range, multiply by the ratio and add the starting value of your second range. In other words,
R = (20 - 10) / (6 - 2)
y = (x - 2) * R + 10
This evenly spreads the numbers from the first range in the second range.
It would be nice to have this functionality in the java.lang.Math class, as this is such a widely required function and is available in other languages.
Here is a simple implementation:
final static double EPSILON = 1e-12;
public static double map(double valueCoord1,
double startCoord1, double endCoord1,
double startCoord2, double endCoord2) {
if (Math.abs(endCoord1 - startCoord1) < EPSILON) {
throw new ArithmeticException("/ 0");
}
double offset = startCoord2;
double ratio = (endCoord2 - startCoord2) / (endCoord1 - startCoord1);
return ratio * (valueCoord1 - startCoord1) + offset;
}
I am putting this code here as a reference for my future self, and maybe it will help someone.
As an aside, this is the same problem as the classic Celsius-to-Fahrenheit conversion, where you want to map a number range of 0 - 100 (C) to 32 - 212 (F).
https://rosettacode.org/wiki/Map_range
[a1, a2] => [b1, b2]
If s is in the range [a1, a2], then t will be in the range [b1, b2]:
t = b1 + ((s - a1) * (b2 - b1)) / (a2 - a1)
In addition to @PeterAllenWebb's answer, if you would like to reverse back the result, use the following:
reverseX = (B-A)*(Y-C)/(D-C) + A
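A quick round-trip sketch (variable names are mine):
double y = (x - a) / (b - a) * (d - c) + c;      // forward map
double xBack = (b - a) * (y - c) / (d - c) + a;  // reverse map, recovers x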
Each unit interval on the first range takes up (d-c)/(b-a) "space" on the second range.
Pseudo:
var interval = (d-c)/(b-a)
for n = 0 to (b - a)
print c + n*interval
How you handle the rounding is up to you.
If your range is [a, b] and you want to map it to [c, d], where x is the value you want to map,
use this formula (linear mapping):
double R = (d - c) / (b - a);
double y = c + (x - a) * R;
return y;
Where X is the number to map from A-B to C-D, and Y is the result:
Take the linear interpolation formula, lerp(a, b, m) = a + (m * (b - a)), and put C and D in place of a and b to get Y = C + (m * (D - C)). Then, in place of m, put (X - A)/(B - A) to get Y = C + (((X - A)/(B - A)) * (D - C)). This is already an acceptable map function, but it can be simplified. Move the (D - C) piece into the dividend to get Y = C + (((X - A) * (D - C))/(B - A)). Expanding (X - A) * (D - C) gives (X*D) - (X*C) - (A*D) + (A*C), so Y = C + (((X*D) - (X*C) - (A*D) + (A*C))/(B - A)). Next, fold in the leading C: multiply it by (B - A) to get ((B*C) - (A*C)) and move that into the dividend, giving Y = (((X*D) - (X*C) - (A*D) + (A*C) + (B*C) - (A*C))/(B - A)). The +(A*C) and -(A*C) cancel each other out, leaving the final result: Y = ((X*D) - (X*C) - (A*D) + (B*C))/(B - A)
TL;DR: The standard map function, Y = C + (((X - A)/(B - A)) * (D - C)), can be simplified down to Y = ((X*D) - (X*C) - (A*D) + (B*C))/(B - A)
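A small sketch of my own to check that the two forms agree:
static double mapStandard(double x, double a, double b, double c, double d) {
    return c + ((x - a) / (b - a)) * (d - c);
}
static double mapSimplified(double x, double a, double b, double c, double d) {
    return (x * d - x * c - a * d + b * c) / (b - a);
}
// mapStandard(4, 2, 6, 10, 20) == mapSimplified(4, 2, 6, 10, 20) == 15.0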
int srcMin = 2, srcMax = 6;
int tgtMin = 10, tgtMax = 20;
int nb = srcMax - srcMin;
int range = tgtMax - tgtMin;
float rate = (float) range / (float) nb;
System.out.println(srcMin + " > " + tgtMin);
float stepF = tgtMin;
for (int i = 1; i < nb; i++)
{
    stepF += rate;
    System.out.println((srcMin + i) + " > " + (int) (stepF + 0.5) + " (" + stepF + ")");
}
System.out.println(srcMax + " > " + tgtMax);
With checks on divide by zero, of course.
I'm supposed to calculate the integral of ln(x) from 1 to e using Simpson's rule, with 4 subintervals.
I surely do not want do it by hand so I have tried to write that algorithm in Java.
The formula for Simpson's rule, applied to a single subinterval [a, b], is
S = ((b - a) / 6) * (f(a) + 4*f((a + b)/2) + f(b))
And here is my code:
import java.util.Scanner;
import java.util.Locale;
public class Simpson {
public static void main(String[] args) {
Scanner input = new Scanner(System.in).useLocale(Locale.US);
//e= 2.718281828459045 to copy paste
System.out.println("Interval a: ");
double aInt = input.nextDouble();
System.out.println("Interval b: ");
double bInt = input.nextDouble();
System.out.println("How many sub intervals: ");
double teilInt = input.nextDouble();
double intervaldistance = (bInt-aInt)/teilInt;
System.out.println("h = "+"("+bInt+"-"+aInt+") / "+teilInt+ " = "+intervaldistance);
double total = 0;
System.out.println("");
double totalSum=0;
for(double i=0; i<teilInt; i++) {
bInt = aInt+intervaldistance;
printInterval(aInt, bInt);
total = prod1(aInt, bInt);
total = total*prod2(aInt, bInt);
aInt = bInt;
System.out.println(total);
totalSum=totalSum+total;
total=0;
}
System.out.println("");
System.out.println("Result: "+totalSum);
}
static double prod1(double a, double b) { // first product of Simpson's rule: (b-a) / 6
double res1 = (b-a)/6;
return res1;
}
static double prod2(double a, double b) { // second product of Simpson's rule
double res2 = Math.log(a)+4*Math.log((a+b)/2)+Math.log(b);
return res2;
}
static void printInterval(double a, double b) {
System.out.println("");
System.out.println("["+a+"; "+b+"]");
}
}
Output for 4 sub intervals:
[1.0; 1.4295704571147612]
0.08130646125926948
[1.4295704571147612; 1.8591409142295223]
0.21241421690076787
[1.8591409142295223; 2.2887113713442835]
0.31257532785558795
[2.2887113713442835; 2.7182818284590446]
0.39368288949073565
Result: 0.9999788955063609
Now, if I compare my solution with other online calculators (http://www.emathhelp.net/calculators/calculus-2/simpsons-rule-calculator/?f=ln+%28x%29&a=1&b=e&n=4&steps=on), it differs.
My solution is 0.9999788955063609, the online solution is 0.999707944567103.
Maybe there is a mistake I made? But I have double-checked everything and couldn't find one.
You may be accumulating the rounding error by computing b_n = a_n + interval many times.
Instead you could use an inductive approach, where a_n = a_0 + n*interval, since this introduces a rounding error only once.
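In code, that could look like the following sketch (reusing the variable names from the question):
double h = (bInt - aInt) / teilInt;
for (int k = 0; k < teilInt; k++) {
    double left  = aInt + k * h;       // a_k = a_0 + k*h: a single rounding step each
    double right = aInt + (k + 1) * h;
    // ... apply the per-interval Simpson formula to [left, right]
}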
I will test with actual numbers to confirm and flesh out the answer in a little bit, but in the meantime you can watch this explanation about accumulation of error from handmade hero
PS. As a bonus, you get to watch an excerpt from handmade hero!
UPDATE: I had a look at your link. While the problem I described above does apply, the difference in precision is small (you'll get the answer 0.9999788955063612 instead). The reason for the discrepancy in your case is that the formula used by your online calculator is a slightly different variant in terms of notation, which treats the interval [a, b] as 2h. In other words, your 4 intervals are equivalent to 8 intervals in their calculation.
If you put 8 intervals into that webpage, you'll get the same result as the (more accurate) number here:
Answer: 0.999978895506362.
See a better explanation of the notation used on that webpage here
I moved your delta calculation to the top so that you don't recompute the delta over and over again. You were also not applying the right multipliers for the odd and even terms, and not applying the right formula for deltaX, since it has to be ((b - a)/n) / 3:
double deltaX = ((bInt-aInt)/teilInt)/3;
for(int i=0; i<=teilInt; i++) { // changed to <= to include the last point
bInt = aInt+intervaldistance;
printInterval(aInt, bInt);
total = prod2(aInt, bInt, i+1, teilInt); //added the current interval and n. The interval is +1 to work well with the even and odds
totalSum += total;
aInt = bInt;
System.out.println(total);
}
System.out.println("");
System.out.println("Result: "+ (totalSum*deltaX)); //multiplication with deltaX is now here
To account for the right factor of f(x), I changed prod2 to:
static double prod2(double a, double b, int interval, double n) {
int multiplier = 1;
if (interval > 1 && interval <= n){
// applying the right multiplier to f(x): interior points get 4 or 2,
// while the two endpoints keep multiplier 1
multiplier = (interval % 2 == 0) ? 4 : 2;
}
return multiplier * Math.log(a);
}
Now it yields the correct result.
I'm writing a function that implements the expression (1/n!) * (1! + 2! + 3! + ... + n!).
The function is passed the argument n and I have to return the above expression as a double, truncated to the 6th decimal place. The issue I'm running into is that the factorial value becomes so large that it overflows to infinity (for large values of n).
Here is my code:
public static double going(int n) {
double factorial = 1.00;
double result = 0.00, sum = 0.00;
for(int i=1; i<n+1; i++){
factorial *= i;
sum += factorial;
}
//Truncate decimals to 6 places
result = (1/factorial)*(sum);
long truncate = (long)Math.pow(10,6);
result = result * truncate;
long value = (long) result;
return (double) value / truncate;
}
Now, the above code works fine for, say, n = 5 or n = 113, but anything above n = 170 and my factorial and sum expressions become infinity. Is my approach just not going to work due to the exponential growth of the numbers? And what would be a workaround for calculating very large numbers that doesn't impact performance too much? (I believe BigInteger is quite slow, from looking at similar questions.)
You can solve this without evaluating a single factorial.
Your formula simplifies to something considerably simpler, computationally speaking:
1!/n! + 2!/n! + 3!/n! + ... + 1
Aside from the first and last terms, a lot of factors actually cancel, which will help the precision of the final result; for example, for 3!/n! you only need to multiply 1/4 through to 1/n. What you must not do is evaluate the factorials and then divide them.
If 15 decimal digits of precision is acceptable (which it appears that it is from your question) then you can evaluate this in floating point, adding the small terms first. As you develop the algorithm, you'll notice the terms are related, but be very careful how you exploit that as you risk introducing material imprecision. (I'd consider that as a second step if I were you.)
Here's a prototype implementation. Note that I accumulate all the individual terms in an array first, then I sum them up starting with the smaller terms first. I think it's computationally more accurate to start from the final term (1.0) and work backwards, but that might not be necessary for a series that converges so quickly. Let's do this thoroughly and analyse the results.
private static double evaluate(int n){
double terms[] = new double[n];
double term = 1.0;
terms[n - 1] = term;
while (n > 1){
terms[n - 2] = term /= n;
--n;
}
double sum = 0.0;
for (double t : terms){
sum += t;
}
return sum;
}
You can see how very quickly the first terms become insignificant. I think you only need a few terms to compute the result to the tolerance of a floating point double. Let's devise an algorithm to stop when that point is reached:
The final version. It seems that the series converges so quickly that you don't need to worry about adding small terms first. So you end up with the absolutely beautiful
private static double evaluate_fast(int n){
double sum = 1.0;
double term = 1.0;
while (n > 1){
double old_sum = sum;
sum += term /= n--;
if (sum == old_sum){
// precision exhausted for the type
break;
}
}
return sum;
}
As you can see, there is no need for BigDecimal &c, and certainly never a need to evaluate any factorials.
You could use BigDecimal like this:
import java.math.BigDecimal;
import java.math.RoundingMode;

public static double going(int n) {
BigDecimal factorial = BigDecimal.ONE;
BigDecimal sum = BigDecimal.ZERO;
BigDecimal result;
for(int i=1; i<n+1; i++){
factorial = factorial.multiply(new BigDecimal(i));
sum = sum.add(factorial);
}
//Truncate decimals to 6 places
result = sum.divide(factorial, 6, RoundingMode.DOWN);
return result.doubleValue();
}
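For example, going(5) evaluates (1! + 2! + 3! + 4! + 5!) / 5! = 153/120 and returns 1.275.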
Polynomial: a0x^0 + a1x^1 +a2x^2 + a3x^3 + ... + anx^n
Array: array_a[] = {a0, a1, a2, a3 ... an};
I wrote a function to calculate this polynomial in Java:
public double cal(double x) {
double y = 0.0;
for (int index = array_a.length - 1; index >= 0; index--) {
y = array_a[index] + y * x;
}
return y;
}
This seems 5 times faster than the loop y += array_a[index] * Math.pow(x, index);
But I wondering if there is a better way to compute this polynomial?
For anyone who thinks it's a different calculation: I did test the function above. It does the same thing as y += array_a[index] * Math.pow(x, index); and they compute the same result.
Thanks.
This is Horner's method. If you only want to calculate it once per polynomial, this is the most efficient algorithm:
… Horner's method requires only n additions and n multiplications, and its storage requirements are only n times the number of bits of x. …
Horner's method is optimal, in the sense that any algorithm to evaluate an arbitrary polynomial must use at least as many operations. Alexander Ostrowski proved in 1954 that the number of additions required is minimal. Victor Pan proved in 1966 that the number of multiplications is minimal.
If you need to evaluate the polynomial extremely many times and the degree is very high, then there are methods to transform the representation of the polynomial (preconditioning) so that the number of multiplication is reduced to ⌊n/2⌋ + 2. This seems not very practical though, at least I've never seen this in the wild. I've found an online paper that describes some of the algorithms if you are interested.
Also mentioned in the paper: due to the CPU architecture, it might be more efficient to evaluate the even and odd terms separately, so they can be placed in parallel pipelines:
public double cal(double x) {
double x2 = x * x;
double y_odd = 0.0, y_even = 0.0;
int index = array_a.length - 1;
if (index % 2 == 0) {
y_even = array_a[index];
index -= 1;
}
for (; index >= 0; index -= 2) {
y_odd = array_a[index] + y_odd * x2;
y_even = array_a[index-1] + y_even * x2;
}
return y_even + y_odd * x;
}
The JIT/compiler might be able to do this conversion for you or even use SIMD to make it very fast automagically. Anyway, for this kind of micro-optimization, always profile before committing to a final solution.
I am new to Java, and my program is likely nowhere near as efficient as it could be, but here it is:
public class Compute {
public static void main(String[] args) {
for(double i = 10000; i <= 100000; i += 10000)
{
System.out.println("The value for the series when i = " + i + " is " + e(i));
}
}
public static double e(double input) {
double e = 0;
for(double i = 0; i <= input; i++)
{
e += 1 / factorial(input);
}
return e;
}
public static double factorial(double input) {
double factorial = 1;
for(int i = 1; i <= input; i++)
{
factorial *= i;
}
return factorial;
}
}
I believe this calculates the value of e for i = 10000, 20000, ..., and 100000,
where e = 1 + (1/1!) + (1/2!) + ... + (1/i!).
It takes about 47 seconds to do so, but I believe it works.
My issue is, for every i, the result is always 0.0
I believe this is because whenever the method factorial is called, the return value is too big to be stored which somehow causes a problem.
What can I do to store the value returned by the method factorial?
Although you can calculate arbitrary-precision results with BigDecimal, there is no need to calculate up to 100000! for the series expansion of e. Consider that the 20th term of the series (1/20!) has a magnitude of about 4×10^-19, so its contribution to the overall total is insignificant.
In other words, the contribution of any terms after the 20th would change only digits after the 19th decimal place.
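A quick check of my own: 20! is about 2.4×10^18, so the corresponding term is already far below the resolution of double near e:
System.out.println(1.0 / 2432902008176640000.0);   // 1/20! ≈ 4.1e-19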
You should probably use java.math.BigInteger to store the factorial.
Change this
e += 1 / factorial(input);
to
e += 1 / factorial(i);
Lots to do to speed up the code. Think about (i+1)! vs. i!: don't recalculate the whole factorial every time.
Also, stop calculating when the answer changes by less than the required precision, as Jim said.
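A sketch of that incremental idea (my own, combining both suggestions):
double factorial = 1.0;            // running i!, updated incrementally
double e = 1.0;                    // the 1/0! term
for (int i = 1; i <= 20; i++) {    // ~20 terms exhaust double precision
    factorial *= i;                // i! = (i-1)! * i, no recomputation from scratch
    e += 1.0 / factorial;
}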