This question already has answers here: nth fibonacci number in sublinear time (16 answers)
I was required to write a simple implementation of the Fibonacci algorithm and then to make it faster.
Here is my initial implementation:
import java.util.Scanner;

public class Fibonacci {

    public static long getFibonacciOf(long n) {
        if (n == 0) {
            return 0;
        } else if (n == 1) {
            return 1;
        } else {
            return getFibonacciOf(n - 2) + getFibonacciOf(n - 1);
        }
    }

    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);
        while (true) {
            System.out.println("Enter n :");
            long n = scanner.nextLong();
            if (n >= 0) {
                long beginTime = System.currentTimeMillis();
                long fibo = getFibonacciOf(n);
                long endTime = System.currentTimeMillis();
                long delta = endTime - beginTime;
                System.out.println("F(" + n + ") = " + fibo + " ... computed in " + delta + " milliseconds");
            } else {
                break;
            }
        }
    }
}
As you can see, I am using System.currentTimeMillis() to get a simple measure of the time elapsed while computing Fibonacci.
This implementation rapidly becomes exponentially slow, as you can see in the following picture.
So I had a simple optimisation idea: put previously computed values in a HashMap and, instead of re-computing them each time, simply take them back from the HashMap if they exist. If they don't exist, compute them and put them in the HashMap.
Here is the new version of the code:
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class FasterFibonacci {

    private static Map<Long, Long> previousValuesHolder;
    static {
        previousValuesHolder = new HashMap<Long, Long>();
        previousValuesHolder.put(Long.valueOf(0), Long.valueOf(0));
        previousValuesHolder.put(Long.valueOf(1), Long.valueOf(1));
    }

    public static long getFibonacciOf(long n) {
        if (n == 0) {
            return 0;
        } else if (n == 1) {
            return 1;
        } else {
            if (previousValuesHolder.containsKey(Long.valueOf(n))) {
                return previousValuesHolder.get(n);
            } else {
                long newValue = getFibonacciOf(n - 2) + getFibonacciOf(n - 1);
                previousValuesHolder.put(Long.valueOf(n), Long.valueOf(newValue));
                return newValue;
            }
        }
    }

    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);
        while (true) {
            System.out.println("Enter n :");
            long n = scanner.nextLong();
            if (n >= 0) {
                long beginTime = System.currentTimeMillis();
                long fibo = getFibonacciOf(n);
                long endTime = System.currentTimeMillis();
                long delta = endTime - beginTime;
                System.out.println("F(" + n + ") = " + fibo + " ... computed in " + delta + " milliseconds");
            } else {
                break;
            }
        }
    }
}
This change makes the computation extremely fast. It computes all the values from 2 to 103 in no time at all, and I get a long overflow at F(104) (it gives me F(104) = -7076989329685730859, which is wrong). I find it so fast that I wonder whether there are any mistakes in my code (thanks for checking and letting me know). Please take a look at the second picture:
Is my faster Fibonacci implementation correct? It seems so to me, because it gets the same values as the first version, but since the first version was too slow I could not compute bigger values with it, such as F(75). What other ways can I use to make it faster, or is there a better way to make it faster? Also, how can I compute Fibonacci for greater values (such as 150 or 200) without getting a long overflow? Though it seems fast, I would like to push it to the limits. I remember Mr. Abrash saying 'The best optimiser is between your two ears', so I believe it can still be improved. Thank you for helping.
[Edit note:] Though the linked question addresses one of the main points in my question, you can see from the above that I have additional issues.
Dynamic programming
Idea: instead of recomputing the same value multiple times, store the values you have calculated and use them as you go along.
f(n)=f(n-1)+f(n-2) with f(0)=0,f(1)=1.
So at the point when you have calculated f(n-1) you can easily calculate f(n) if you store the values of f(n) and f(n-1).
Let's take an array of Bignums first. A[1..200].
Initialize them to -1.
Pseudocode
fib(n)
{
    if (A[n] != -1) return A[n];
    A[0] = 0;
    A[1] = 1;
    for i = 2 to n
        A[i] = A[i-1] + A[i-2];   // bignum addition
    return A[n];
}
This runs in O(n) time. Check it out yourself.
This technique is also called memoization.
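For concreteness, a direct Java translation of the pseudocode above might look like this (a sketch of my own, assuming n >= 0 and using BigInteger as the "Bignum" type; for simplicity the table is rebuilt per call rather than kept as a persistent cache):

import java.math.BigInteger;

public class BottomUpFibonacci {
    // Fills a table from the bottom up, exactly as in the pseudocode.
    public static BigInteger fib(int n) {
        BigInteger[] a = new BigInteger[Math.max(n + 1, 2)];
        a[0] = BigInteger.ZERO;
        a[1] = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            a[i] = a[i - 1].add(a[i - 2]); // bignum addition
        }
        return a[n];
    }
}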
The IDEA
Dynamic programming (usually referred to as DP) is a very powerful technique to solve a particular class of problems. It demands a very elegant formulation of the approach and simple thinking, and the coding part is very easy. The idea is very simple: if you have solved a problem with the given input, then save the result for future reference, so as to avoid solving the same problem again. In short: 'Remember your Past'.
If the given problem can be broken up into smaller sub-problems, and these smaller sub-problems can in turn be divided into still smaller ones, and in this process you observe some overlapping subproblems, then it's a big hint for DP. Also, the optimal solutions to the subproblems contribute to the optimal solution of the given problem (referred to as the Optimal Substructure Property).
There are two ways of doing this.
1.) Top-Down : Start solving the given problem by breaking it down. If you see that the problem has been solved already, then just return the saved answer. If it has not been solved, solve it and save the answer. This is usually easy to think of and very intuitive. This is referred to as Memoization. (I have used this idea).
2.) Bottom-Up : Analyze the problem and see the order in which the sub-problems are solved and start solving from the trivial subproblem, up towards the given problem. In this process, it is guaranteed that the subproblems are solved before solving the problem. This is referred to as Dynamic Programming. (MinecraftShamrock used this idea)
There's more!
(Other ways to do this)
Look, our quest for a better solution doesn't end here; you will now see a different approach.
If you know how to solve recurrence relations, then you can solve this one:
f(n) = f(n-1) + f(n-2), given f(0) = 0, f(1) = 1
You will arrive at the following formula after solving it:
f(n) = (1/sqrt(5)) * ((1+sqrt(5))/2)^n - (1/sqrt(5)) * ((1-sqrt(5))/2)^n
which can be written in a more compact form:
f(n) = floor( ((1+sqrt(5))/2)^n / sqrt(5) + 1/2 )
Complexity
You can compute the nth power of a number in O(log n) operations.
You have to learn exponentiation by squaring.
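As an illustration of that O(log n) idea (my addition, not part of the original answer): the matrix identity [[1,1],[1,0]]^n = [[F(n+1),F(n)],[F(n),F(n-1)]] lets exponentiation by squaring produce F(n) in O(log n) matrix multiplications. A sketch, assuming n >= 0:

import java.math.BigInteger;

public class MatrixFibonacci {
    // Multiply two 2x2 matrices of BigIntegers.
    private static BigInteger[][] mul(BigInteger[][] x, BigInteger[][] y) {
        return new BigInteger[][]{
            {x[0][0].multiply(y[0][0]).add(x[0][1].multiply(y[1][0])),
             x[0][0].multiply(y[0][1]).add(x[0][1].multiply(y[1][1]))},
            {x[1][0].multiply(y[0][0]).add(x[1][1].multiply(y[1][0])),
             x[1][0].multiply(y[0][1]).add(x[1][1].multiply(y[1][1]))}
        };
    }

    // Exponentiation by squaring on [[1,1],[1,0]]; entry [0][1] of the result is F(n).
    public static BigInteger fib(int n) {
        BigInteger[][] result = {{BigInteger.ONE, BigInteger.ZERO},
                                 {BigInteger.ZERO, BigInteger.ONE}}; // identity matrix
        BigInteger[][] base = {{BigInteger.ONE, BigInteger.ONE},
                               {BigInteger.ONE, BigInteger.ZERO}};
        for (; n > 0; n >>= 1) {
            if ((n & 1) == 1) result = mul(result, base); // odd bit: multiply in
            base = mul(base, base);                       // square for the next bit
        }
        return result[0][1];
    }
}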
EDIT: It is good to point out that this doesn't necessarily mean the Fibonacci number can be found in O(log n): the number of digits we need to calculate grows linearly, so once the numbers no longer fit in a machine word, each arithmetic operation is no longer constant-time. Because of where I placed this section, it may have seemed to claim the wrong idea that the nth Fibonacci number can be computed in O(log n) time.
[Bakurui,MinecraftShamrock commented on this]
If you need to compute the nth Fibonacci number very frequently, I suggest using amalsom's answer.
But if you want to compute a very big Fibonacci number, you will run out of memory because you are storing all the smaller Fibonacci numbers. The following pseudocode only keeps the last two Fibonacci numbers in memory, i.e. it requires much less memory:
fibonacci(n) {
    if n = 0: return 0;
    if n = 1: return 1;
    a = 0;
    b = 1;
    for i from 2 to n: {
        sum = a + b;
        a = b;
        b = sum;
    }
    return b;
}
Analysis
This can compute very high Fibonacci numbers with quite low memory consumption: we have O(n) time, as the loop repeats n-1 times. The space complexity is interesting as well: the nth Fibonacci number has a length of O(n), which can easily be shown:
F(n) <= 2 * F(n-1)
This means that the nth Fibonacci number is at most twice as big as its predecessor. Doubling a number in binary is equivalent to a single left shift, which increases the number of necessary bits by one. So representing the nth Fibonacci number takes at most O(n) space. We have at most three successive Fibonacci numbers in memory, which makes O(n) + O(n-1) + O(n-2) = O(n) total space consumption. In contrast to this, the memoization algorithm always keeps the first n Fibonacci numbers in memory, which makes O(n) + O(n-1) + O(n-2) + ... + O(1) = O(n^2) space consumption.
So which way should one use?
The only reason to keep all lower fibonacci numbers in memory is if you need fibonacci numbers very frequently. It is a question of balancing time with memory consumption.
Get away from the Fibonacci recursion and use the identities
(F(2n), F(2n-1)) = (F(n)^2 + 2 F(n) F(n-1), F(n)^2+F(n-1)^2)
(F(2n+1), F(2n)) = (F(n+1)^2+F(n)^2, 2 F(n+1) F(n) - F(n)^2)
This allows you to compute (F(m+1), F(m)) in terms of (F(k+1), F(k)) for k half the size of m. Written iteratively, with some bit shifting for division by 2, this should give you the theoretical O(log n) speed of exponentiation by squaring while staying entirely within integer arithmetic. (Well, O(log n) arithmetic operations. Since you will be working with numbers with roughly n bits, it won't be O(log n) time once you are forced to switch to a large integer library: F(47) already overflows a 32-bit int, which only goes up to 2^31 - 1.)
(Apologies for not remembering Java well enough to implement this in Java; anyone who wants to is free to edit it in.)
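Taking up that invitation, here is one possible Java sketch of the identities above (my addition, not the answerer's code; it uses BigInteger so the overflow concern disappears, and fibPair is an illustrative name):

import java.math.BigInteger;

public class HalvingFibonacci {
    // Returns {F(m+1), F(m)}, recursing on k = m / 2.
    static BigInteger[] fibPair(long m) {
        if (m == 0) return new BigInteger[]{BigInteger.ONE, BigInteger.ZERO}; // {F(1), F(0)}
        BigInteger[] half = fibPair(m / 2);   // {F(k+1), F(k)} with k = m / 2
        BigInteger a = half[0], b = half[1];
        BigInteger aa = a.multiply(a), bb = b.multiply(b);
        BigInteger f2k1 = aa.add(bb);                             // F(2k+1) = F(k+1)^2 + F(k)^2
        BigInteger f2k = a.multiply(b).shiftLeft(1).subtract(bb); // F(2k)   = 2 F(k+1) F(k) - F(k)^2
        return (m % 2 == 0)
                ? new BigInteger[]{f2k1, f2k}            // m = 2k:   {F(m+1), F(m)}
                : new BigInteger[]{f2k1.add(f2k), f2k1}; // m = 2k+1: {F(m+1), F(m)}
    }

    public static BigInteger fib(long n) {
        return fibPair(n)[1];
    }
}

The recursion depth is O(log n), matching the halving argument above.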
Fibonacci(0) = 0
Fibonacci(1) = 1
Fibonacci(n) = Fibonacci(n - 1) + Fibonacci(n - 2), when n >= 2
Usually there are two ways to calculate a Fibonacci number:
Recursion:
public long getFibonacci(long n) {
    if (n <= 1) {
        return n;
    } else {
        return getFibonacci(n - 1) + getFibonacci(n - 2);
    }
}
This way is intuitive and easy to understand, but because it does not reuse calculated Fibonacci numbers, the time complexity is about O(2^n). It does not store calculated results, so the only extra space it uses is the O(n) recursion stack.
Dynamic Programming:
public long getFibonacci(long n) {
    if (n <= 1) {
        return n; // also avoids an out-of-bounds write for n == 0
    }
    long[] f = new long[(int) (n + 1)];
    f[0] = 0;
    f[1] = 1;
    for (int i = 2; i <= n; i++) {
        f[i] = f[i - 1] + f[i - 2];
    }
    return f[(int) n];
}
This memoization (tabulation) approach calculates Fibonacci numbers bottom-up and reuses them when calculating the next one. The time complexity is pretty good, O(n), while the space complexity is O(n). Let's investigate whether the space complexity can be optimized... Since f(i) only requires f(i - 1) and f(i - 2), there is no need to store all the calculated Fibonacci numbers.
The more efficient implementation is:
public long getFibonacci(long n) {
    if (n <= 1) {
        return n;
    }
    long x = 0, y = 1;
    long ans = 0; // initialized for definite assignment; the loop always runs for n >= 2
    for (int i = 2; i <= n; i++) {
        ans = x + y;
        x = y;
        y = ans;
    }
    return ans;
}
With time complexity O(n), and space complexity O(1).
Added: Since Fibonacci numbers increase amazingly fast, long can only handle the first 93 of them; F(92) is the largest that fits. In Java, we can use BigInteger to store more Fibonacci numbers.
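For example, the O(1)-space loop above carries over almost verbatim (a sketch of my own, not the answerer's code):

import java.math.BigInteger;

public static BigInteger getFibonacciBig(int n) {
    BigInteger x = BigInteger.ZERO, y = BigInteger.ONE;
    for (int i = 0; i < n; i++) {
        BigInteger next = x.add(y); // same recurrence, no overflow
        x = y;
        y = next;
    }
    return x; // after n steps, x holds F(n)
}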
Precompute a large number of fib(n) results, and store them as a lookup table inside your algorithm. Bam, free "speed"
Now if you need to compute fib(101) and you already have fibs 0 to 100 stored, this is just like trying to compute fib(1).
Chances are this isn't what this homework is looking for, but it's a completely legit strategy and basically the idea of caching extracted further away from running the algorithm. If you know you're likely to be computing the first 100 fibs often and you need to do it really really fast, there's nothing faster than O(1). So compute those values entirely out of band and store them so they can be looked up later.
Of course, cache values as you compute them too :) Duplicated computation is waste.
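A sketch of that precomputed-table idea (my illustration, not the answerer's code; 92 is the largest index whose Fibonacci number fits in a signed long):

public class FibTable {
    private static final long[] FIB = new long[93]; // F(0)..F(92) fit in a long
    static {
        FIB[1] = 1;
        for (int i = 2; i < FIB.length; i++) {
            FIB[i] = FIB[i - 1] + FIB[i - 2]; // computed once, up front
        }
    }

    // O(1) per query; throws ArrayIndexOutOfBoundsException outside 0..92.
    public static long fib(int n) {
        return FIB[n];
    }
}

After the static initializer runs once, every call is a single array read.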
Here is a snippet of code with an iterative approach instead of recursion.
Output example:
Enter n: 5
F(5) = 5 ... computed in 1 milliseconds
Enter n: 50
F(50) = 12586269025 ... computed in 0 milliseconds
Enter n: 500
F(500) = ...4125 ... computed in 2 milliseconds
Enter n: 500
F(500) = ...4125 ... computed in 0 milliseconds
Enter n: 500000
F(500000) = 2955561408 ... computed in 4,476 ms
Enter n: 500000
F(500000) = 2955561408 ... computed in 0 ms
Enter n: 1000000
F(1000000) = 1953282128 ... computed in 15,853 ms
Enter n: 1000000
F(1000000) = 1953282128 ... computed in 0 ms
Some pieces of results are omitted with ... for a better view.
Code snippet:
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class CachedFibonacci {

    private static Map<BigDecimal, BigDecimal> previousValuesHolder;
    static {
        previousValuesHolder = new HashMap<>();
        previousValuesHolder.put(BigDecimal.ZERO, BigDecimal.ZERO);
        previousValuesHolder.put(BigDecimal.ONE, BigDecimal.ONE);
    }

    public static BigDecimal getFibonacciOf(long number) {
        if (0 == number) {
            return BigDecimal.ZERO;
        } else if (1 == number) {
            return BigDecimal.ONE;
        } else {
            if (previousValuesHolder.containsKey(BigDecimal.valueOf(number))) {
                return previousValuesHolder.get(BigDecimal.valueOf(number));
            } else {
                BigDecimal olderValue = BigDecimal.ONE,
                        oldValue = BigDecimal.ONE,
                        newValue = BigDecimal.ONE;
                for (int i = 3; i <= number; i++) {
                    newValue = oldValue.add(olderValue);
                    olderValue = oldValue;
                    oldValue = newValue;
                }
                previousValuesHolder.put(BigDecimal.valueOf(number), newValue);
                return newValue;
            }
        }
    }

    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);
        while (true) {
            System.out.print("Enter n: ");
            long inputNumber = scanner.nextLong();
            if (inputNumber >= 0) {
                long beginTime = System.currentTimeMillis();
                BigDecimal fibo = getFibonacciOf(inputNumber);
                long endTime = System.currentTimeMillis();
                long delta = endTime - beginTime;
                System.out.printf("F(%d) = %.0f ... computed in %,d milliseconds\n", inputNumber, fibo, delta);
            } else {
                System.err.println("You must enter a number >= 0; exiting.");
                break;
            }
        }
    }
}
This approach runs much faster than the recursive version.
In such a situation, the iterative solution tends to be a bit faster, because each recursive method call takes a certain amount of processor time. In principle, it is possible for a smart compiler to avoid recursive method calls if they follow simple patterns, but most compilers don't do that. From that point of view, an iterative solution is preferable.
UPDATE:
After the Java 8 release, with the Stream API available, one more way of calculating Fibonacci became available.
Checked with JDK 17.0.2.
Code:
import java.math.BigInteger;
import java.util.stream.Stream;

// Works for n >= 1; each stream element is the pair {F(k), F(k+1)}.
public static BigInteger streamFibonacci(long n) {
    return Stream.iterate(new BigInteger[]{BigInteger.ONE, BigInteger.ONE},
                    p -> new BigInteger[]{p[1], p[0].add(p[1])})
            .limit(n)
            .reduce((a, b) -> b)
            .get()[0];
}
Test output:
Enter n (q for quit): 5
F(5) = 5 ... computed in 2 ms
Enter n (q for quit): 50
F(50) = 1258626902 ... computed in 0 ms
Enter n (q for quit): 500
F(500) = 1394232245 ... computed in 3 ms
Enter n (q for quit): 500000
F(500000) = 2955561408 ... computed in 4,343 ms
Enter n (q for quit): 1000000
F(1000000) = 1953282128 ... computed in 19,280 ms
The results are pretty good.
Keep in mind that the ... just marks digits cut from the full numbers for readability.
Having followed a similar approach some time ago, I've just realized there's another optimization you can make.
If you know two large consecutive answers, you can use this as a starting point. For example, if you know F(100) and F(101), then calculating F(104) is approximately as difficult (*) as calculating F(4) based on F(0) and F(1).
Calculating iteratively upwards is as efficient, calculation-wise, as doing the same with cached recursion, but it uses less memory.
Having done some sums, I have also realized that, for any given z < n:
F(n) = F(z) * F(n-z+1) + F(z-1) * F(n-z)
If n is odd, and you choose z=(n+1)/2, then this is reduced to
F(n)=F(z)^2+F(z-1)^2
It seems to me that, by a method I have yet to pin down, you should be able to use the above info to find F(n) in a number of operations equal to:
one doubling step per bit of n (as per above), plus one addition step per '1' bit of n; in the case of 104 (binary 1101000, so 7 bits and 3 '1' bits), this would be 14 multiplications (squarings) and 10 additions.
(*) assuming adding two numbers takes the same time, irrelevant of the size of the two numbers.
Here's a way of provably doing it in O(log n) (as the loop runs log n times):
/*
 * Fast doubling method
 * F(2n) = F(n) * (2*F(n+1) - F(n)).
 * F(2n+1) = F(n+1)^2 + F(n)^2.
 * Adapted from:
 * https://www.nayuki.io/page/fast-fibonacci-algorithms
 */
private static long getFibonacci(int n) {
    long a = 0;
    long b = 1;
    for (int i = 31 - Integer.numberOfLeadingZeros(n); i >= 0; i--) {
        long d = a * ((b << 1) - a);
        long e = (a * a) + (b * b);
        a = d;
        b = e;
        if (((n >>> i) & 1) != 0) {
            long c = a + b;
            a = b;
            b = c;
        }
    }
    return a;
}
I am assuming here (as is conventional) that one multiply / add / whatever operation is constant time irrespective of number of bits, i.e. that a fixed-length data type will be used.
This page explains several methods of which this is the fastest. I simply translated it away from using BigInteger for readability. Here's the BigInteger version:
/*
 * Fast doubling method.
 * F(2n) = F(n) * (2*F(n+1) - F(n)).
 * F(2n+1) = F(n+1)^2 + F(n)^2.
 * Adapted from:
 * http://www.nayuki.io/page/fast-fibonacci-algorithms
 */
private static BigInteger getFibonacci(int n) {
    BigInteger a = BigInteger.ZERO;
    BigInteger b = BigInteger.ONE;
    for (int i = 31 - Integer.numberOfLeadingZeros(n); i >= 0; i--) {
        BigInteger d = a.multiply(b.shiftLeft(1).subtract(a));
        BigInteger e = a.multiply(a).add(b.multiply(b));
        a = d;
        b = e;
        if (((n >>> i) & 1) != 0) {
            BigInteger c = a.add(b);
            a = b;
            b = c;
        }
    }
    return a;
}
EDIT: maaartinus gave the answer I was looking for and tmyklebu's data on the problem helped a lot, so thanks both! :)
I've read a bit about how HotSpot has some "intrinsics" that it injects into the code, especially for the standard Java Math library (from here).
So I decided to give it a try, to see how much difference HotSpot could make compared to doing the comparison directly (especially since I've heard min/max can compile to branchless asm).
public class OpsMath {
    public static final int max(final int a, final int b) {
        if (a > b) {
            return a;
        }
        return b;
    }
}
That's my implementation. In another SO question I read that using the ternary operator uses an extra register; I haven't found significant differences between an if block and a ternary operator (i.e., return (a > b) ? a : b).
Allocating an 8 MB int array (i.e., 2 million values) and randomizing it, I do the following test:
try (final Benchmark bench = new Benchmark("millis to max")) {
    int max = Integer.MIN_VALUE;
    for (int i = 0; i < array.length; ++i) {
        max = OpsMath.max(max, array[i]);
        // max = Math.max( max, array[i] );
    }
}
I'm using a Benchmark object in a try-with-resources block. When it finishes, it calls close() on the object and prints the time the block took to complete. The tests are done separately by commenting in/out the max calls in the code above.
'max' is added to a list outside the benchmark block and printed later, to keep the JVM from optimizing the whole block away.
The array is randomized each time the test runs.
Running the test 6 times, it gives these results:
Java standard Math:
millis to max 9.242167
millis to max 2.1566199999999998
millis to max 2.046396
millis to max 2.048616
millis to max 2.035761
millis to max 2.001044
So fairly stable after the first run, and running the tests again gives similar results.
OpsMath:
millis to max 8.65418
millis to max 1.161559
millis to max 0.955851
millis to max 0.946642
millis to max 0.994543
millis to max 0.9469069999999999
Again, very stable results after the first run.
The question is: why? That's quite a big difference, and I have no idea why. Even if I implement my max() method exactly like Math.max() (i.e., return (a >= b) ? a : b) I still get better results! It makes no sense.
Specs:
CPU: Intel i5 2500, 3.3 GHz.
Java Version: JDK 8 (public March 18 release), x64.
Debian Jessie (testing release) x64.
I have yet to try with 32 bit JVM.
EDIT: Self-contained test as requested. Added a line to force the JVM to preload the Math and OpsMath classes. That eliminates the 18 ms cost of the first iteration of the OpsMath test.
// Constant nano to millis.
final double TO_MILLIS = 1.0d / 1000000.0d;
// 8 MB alloc.
final int[] array = new int[(8 * 1024 * 1024) / 4];
// Result and time arrays.
final ArrayList<Integer> results = new ArrayList<>();
final ArrayList<Double> times = new ArrayList<>();
// Number of tests.
final int itcount = 6;
// Call both the Math and OpsMath methods so the JVM initializes the classes.
// (int arguments: OpsMath.max is only defined for ints)
System.out.println("initialize classes " +
        OpsMath.max(Math.max(20, array.length), array.length / 2));
final Random r = new Random();
for (int it = 0; it < itcount; ++it) {
    int max = Integer.MIN_VALUE;
    // Randomize the array.
    for (int i = 0; i < array.length; ++i) {
        array[i] = r.nextInt();
    }
    final long start = System.nanoTime();
    for (int i = 0; i < array.length; ++i) {
        max = Math.max(array[i], max);
        // OpsMath.max() method implemented as described.
        // max = OpsMath.max( array[i], max );
    }
    // Calc time.
    final double end = (System.nanoTime() - start);
    // Store results.
    times.add(Double.valueOf(end));
    results.add(Integer.valueOf(max));
}
// Print everything.
for (int i = 0; i < itcount; ++i) {
    System.out.println("IT" + i + " result: " + results.get(i));
    System.out.println("IT" + i + " millis: " + times.get(i) * TO_MILLIS);
}
Java Math.max result:
IT0 result: 2147477409
IT0 millis: 9.636998
IT1 result: 2147483098
IT1 millis: 1.901314
IT2 result: 2147482877
IT2 millis: 2.095551
IT3 result: 2147483286
IT3 millis: 1.9232859999999998
IT4 result: 2147482828
IT4 millis: 1.9455179999999999
IT5 result: 2147482475
IT5 millis: 1.882047
OpsMath.max result:
IT0 result: 2147482689
IT0 millis: 9.003616
IT1 result: 2147483480
IT1 millis: 0.882421
IT2 result: 2147483186
IT2 millis: 1.079143
IT3 result: 2147478560
IT3 millis: 0.8861169999999999
IT4 result: 2147477851
IT4 millis: 0.916383
IT5 result: 2147481983
IT5 millis: 0.873984
Still the same overall results. I've tried randomizing the array only once and repeating the tests over the same array; I get faster results overall, but the same 2x difference between Java Math.max and OpsMath.max.
It's hard to tell why Math.max is slower than Ops.max, but it's easy to tell why this benchmark strongly favors branches over conditional moves: on the n-th iteration, the probability of
Math.max( array[i], max );
being not equal to max is the probability that array[n-1] is bigger than all previous elements. Obviously, this probability gets lower and lower with growing n, and given
final int[] array = new int[(8*1024*1024)/4];
it's rather negligible most of the time. The conditional move instruction is insensitive to the branching probability; it always takes the same amount of time to execute. A conditional move is faster than a branch if the branch is very hard to predict. On the other hand, a branch is faster if it can be predicted well with high probability. Currently, I'm unsure about the speed of conditional move compared to the best and worst cases of branching.1
In your case, all but the first few branches are fairly predictable. From about n == 10 onward, there's no point in using conditional moves, as the branch is rather guaranteed to be predicted correctly and can execute in parallel with other instructions (I guess you need exactly one cycle per iteration).
This seems to happen for algorithms computing minimum/maximum or doing some inefficient sorting (good branch predictability means low entropy per step).
1 Both a conditional move and a predicted branch take one cycle. The problem with the former is that it needs both of its operands ready, and this may take additional instructions. In the end the critical path may get longer and/or the ALUs saturated while the branching unit is idle. Often, but not always, branches can be predicted well in practical applications; that's why branch prediction was invented in the first place.
As for the gory details of timing conditional move vs. branch prediction best and worst case, see the discussion below in comments. My own benchmark shows that conditional move is significantly faster than branch prediction when branch prediction encounters its worst case, but I can't ignore contradictory results. We need some explanation for what exactly makes the difference. Some more benchmarks and/or analysis could help.
When I run your (suitably-modified) code using Math.max on an old (1.6.0_27) JVM, the hot loop looks like this:
0x00007f4b65425c50: mov %r11d,%edi ;*getstatic array
; - foo146::bench#81 (line 40)
0x00007f4b65425c53: mov 0x10(%rax,%rdx,4),%r8d
0x00007f4b65425c58: mov 0x14(%rax,%rdx,4),%r10d
0x00007f4b65425c5d: mov 0x18(%rax,%rdx,4),%ecx
0x00007f4b65425c61: mov 0x2c(%rax,%rdx,4),%r11d
0x00007f4b65425c66: mov 0x28(%rax,%rdx,4),%r9d
0x00007f4b65425c6b: mov 0x24(%rax,%rdx,4),%ebx
0x00007f4b65425c6f: rex mov 0x20(%rax,%rdx,4),%esi
0x00007f4b65425c74: mov 0x1c(%rax,%rdx,4),%r14d ;*iaload
; - foo146::bench#86 (line 40)
0x00007f4b65425c79: cmp %edi,%r8d
0x00007f4b65425c7c: cmovl %edi,%r8d
0x00007f4b65425c80: cmp %r8d,%r10d
0x00007f4b65425c83: cmovl %r8d,%r10d
0x00007f4b65425c87: cmp %r10d,%ecx
0x00007f4b65425c8a: cmovl %r10d,%ecx
0x00007f4b65425c8e: cmp %ecx,%r14d
0x00007f4b65425c91: cmovl %ecx,%r14d
0x00007f4b65425c95: cmp %r14d,%esi
0x00007f4b65425c98: cmovl %r14d,%esi
0x00007f4b65425c9c: cmp %esi,%ebx
0x00007f4b65425c9e: cmovl %esi,%ebx
0x00007f4b65425ca1: cmp %ebx,%r9d
0x00007f4b65425ca4: cmovl %ebx,%r9d
0x00007f4b65425ca8: cmp %r9d,%r11d
0x00007f4b65425cab: cmovl %r9d,%r11d ;*invokestatic max
; - foo146::bench#88 (line 40)
0x00007f4b65425caf: add $0x8,%edx ;*iinc
; - foo146::bench#92 (line 39)
0x00007f4b65425cb2: cmp $0x1ffff9,%edx
0x00007f4b65425cb8: jl 0x00007f4b65425c50
Apart from the weirdly placed REX prefix (not sure what that's about), here you have a loop that's been unrolled 8 times that does mostly what you'd expect: loads, comparisons, and conditional moves. Interestingly, if you swap the order of the arguments to max, here it outputs the other kind of 8-deep cmovl chain. I guess it doesn't know how to generate a 3-deep tree of cmovls or 8 separate cmovl chains to be merged after the loop is done.
With the explicit OpsMath.max, it turns into a rat's nest of conditional and unconditional branches that's unrolled 8 times. I'm not going to post the loop; it's not pretty. Basically each mov/cmp/cmovl above gets broken into a load, a compare, and a conditional jump to where a mov and a jmp happen. Interestingly, if you swap the order of the arguments to max, here it outputs an 8-deep cmovle chain instead. EDIT: As @maaartinus points out, said rat's nest of branches is actually faster on some machines because the branch predictor works its magic on them and these are well-predicted branches.
I would hesitate to draw conclusions from this benchmark. You have benchmark construction issues; you have to run it a lot more times than you are and you have to factor your code differently if you want to time Hotspot's fastest code. Beyond the wrapper code, you aren't measuring how fast your max is, or how well Hotspot understands what you're trying to do, or anything else of value here. Both implementations of max will result in code that's entirely too fast for any sort of direct measurement to be meaningful within the context of a larger program.
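For what it's worth, most of these construction issues are what benchmark harnesses exist to solve. The sketch below is my addition (not part of this answer); the class and method names are illustrative, and it assumes the JMH annotations and Blackhole infrastructure are on the classpath. It shows the general shape such a measurement could take:

import java.util.Random;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Thread)
public class MaxBench {
    int[] array;

    @Setup
    public void setup() {
        array = new Random(42).ints(2_000_000).toArray(); // fixed seed, ~8 MB of ints
    }

    @Benchmark
    public void mathMax(Blackhole bh) {
        int max = Integer.MIN_VALUE;
        for (int v : array) {
            max = Math.max(max, v);
        }
        bh.consume(max); // keeps the loop from being dead-code eliminated
    }
}

JMH handles warmup, forking, and dead-code elimination concerns that hand-rolled timing loops have to fight one by one.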
Using JDK 8:
java version "1.8.0"
Java(TM) SE Runtime Environment (build 1.8.0-b132)
Java HotSpot(TM) 64-Bit Server VM (build 25.0-b70, mixed mode)
On Ubuntu 13.10
I ran the following:
import java.util.Random;
import java.util.function.BiFunction;
public class MaxPerformance {
private final BiFunction<Integer, Integer, Integer> max;
private final int[] array;
public MaxPerformance(BiFunction<Integer, Integer, Integer> max, int[] array) {
this.max = max;
this.array = array;
}
public double time() {
long start = System.nanoTime();
int m = Integer.MIN_VALUE;
for (int i = 0; i < array.length; ++i) m = max.apply(m, array[i]);
m = Integer.MIN_VALUE;
for (int i = 0; i < array.length; ++i) m = max.apply(array[i], m);
// total time over number of calls to max
return ((double) (System.nanoTime() - start)) / (double) array.length / 2.0;
}
public double averageTime(int repeats) {
double cumulativeTime = 0;
for (int i = 0; i < repeats; i++)
cumulativeTime += time();
return (double) cumulativeTime / (double) repeats;
}
public static void main(String[] args) {
int size = 1000000;
Random random = new Random(123123123L);
int[] array = new int[size];
for (int i = 0; i < size; i++) array[i] = random.nextInt();
double tMath = new MaxPerformance(Math::max, array).averageTime(100);
double tAlt1 = new MaxPerformance(MaxPerformance::max1, array).averageTime(100);
double tAlt2 = new MaxPerformance(MaxPerformance::max2, array).averageTime(100);
System.out.println("Java Math: " + tMath);
System.out.println("Alt 1: " + tAlt1);
System.out.println("Alt 2: " + tAlt2);
}
public static int max1(final int a, final int b) {
if (a >= b) return a;
return b;
}
public static int max2(final int a, final int b) {
return (a >= b) ? a : b; // same as JDK implementation
}
}
And I got the following results (average nanoseconds taken for each call to max):
Java Math: 15.443555810000003
Alt 1: 14.968298919999997
Alt 2: 16.442204045
So in the long run it looks like the second implementation is the fastest, although by a relatively small margin.
In order to have a slightly more scientific test, it makes sense to compute the max of pairs of elements where each call is independent of the previous one. This can be done by using two randomized arrays instead of one, as in this benchmark:
import java.util.Random;
import java.util.function.BiFunction;
public class MaxPerformance2 {
private final BiFunction<Integer, Integer, Integer> max;
private final int[] array1, array2;
public MaxPerformance2(BiFunction<Integer, Integer, Integer> max, int[] array1, int[] array2) {
this.max = max;
this.array1 = array1;
this.array2 = array2;
if (array1.length != array2.length) throw new IllegalArgumentException();
}
public double time() {
long start = System.nanoTime();
int m = Integer.MIN_VALUE;
for (int i = 0; i < array1.length; ++i) m = max.apply(array1[i], array2[i]);
m += m; // to avoid optimizations!
return ((double) (System.nanoTime() - start)) / (double) array1.length;
}
public double averageTime(int repeats) {
// warm up rounds:
double tmp = 0;
for (int i = 0; i < 10; i++) tmp += time();
tmp *= 2.0;
double cumulativeTime = 0;
for (int i = 0; i < repeats; i++)
cumulativeTime += time();
return cumulativeTime / (double) repeats;
}
public static void main(String[] args) {
int size = 1000000;
Random random = new Random(123123123L);
int[] array1 = new int[size];
int[] array2 = new int[size];
for (int i = 0; i < size; i++) {
array1[i] = random.nextInt();
array2[i] = random.nextInt();
}
double tMath = new MaxPerformance2(Math::max, array1, array2).averageTime(100);
double tAlt1 = new MaxPerformance2(MaxPerformance2::max1, array1, array2).averageTime(100);
double tAlt2 = new MaxPerformance2(MaxPerformance2::max2, array1, array2).averageTime(100);
System.out.println("Java Math: " + tMath);
System.out.println("Alt 1: " + tAlt1);
System.out.println("Alt 2: " + tAlt2);
}
public static int max1(final int a, final int b) {
if (a >= b) return a;
return b;
}
public static int max2(final int a, final int b) {
return (a >= b) ? a : b; // same as JDK implementation
}
}
Which gave me:
Java Math: 15.346468170000005
Alt 1: 16.378737519999998
Alt 2: 20.506475350000006
The way your test is set up makes a huge difference in the results. The JDK version seems to be the fastest in this scenario, this time by a relatively large margin compared to the previous case.
Somebody mentioned Caliper. Well, if you read the wiki, one of the first things they say about micro-benchmarking is not to do it: this is because it's hard to get accurate results in general. I think this is a clear example of that.
Here's a branchless min operation; max can be implemented by replacing diff = a - b with diff = b - a.
public static final long min(final long a, final long b) {
    final long diff = a - b;
    // All zeroes if a >= b, all ones if a < b, because the sign bit is propagated.
    final long mask = diff >> 63;
    return (a & mask) | (b & (~mask));
}
It should be as fast as streaming the memory because the CPU operations should be hidden by the sequential memory read latency.
I've got two different methods: one calculates the Fibonacci sequence to the nth element using iteration, and the other does the same thing using recursion.
The example program looks like this:
import java.util.Scanner;

public class recursionVsIteration {

    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        // nth element input
        System.out.print("Enter the last element of Fibonacci sequence: ");
        int n = sc.nextInt();

        // Print out iteration method
        System.out.println("Fibonacci iteration:");
        long start = System.currentTimeMillis();
        System.out.printf("Fibonacci sequence(element at index %d) = %d \n", n, fibIteration(n));
        System.out.printf("Time: %d ms\n", System.currentTimeMillis() - start);

        // Print out recursive method
        System.out.println("Fibonacci recursion:");
        start = System.currentTimeMillis();
        System.out.printf("Fibonacci sequence(element at index %d) = %d \n", n, fibRecursion(n));
        System.out.printf("Time: %d ms\n", System.currentTimeMillis() - start);
    }

    // Iteration method
    static int fibIteration(int n) {
        int x = 0, y = 1, z = 1;
        for (int i = 0; i < n; i++) {
            x = y;
            y = z;
            z = x + y;
        }
        return x;
    }

    // Recursive method
    static int fibRecursion(int n) {
        if ((n == 1) || (n == 0)) {
            return n;
        }
        return fibRecursion(n - 1) + fibRecursion(n - 2);
    }
}
I was trying to find out which method is faster. I came to the conclusion that recursion is faster for smaller values of n, but as n increases recursion becomes slower and iteration becomes faster. Here are three different results for three different n:
Example #1 (n = 10)
Enter the last element of Fibonacci sequence: 10
Fibonacci iteration:
Fibonacci sequence(element at index 10) = 55
Time: 5 ms
Fibonacci recursion:
Fibonacci sequence(element at index 10) = 55
Time: 0 ms
Example #2 (n = 20)
Enter the last element of Fibonacci sequence: 20
Fibonacci iteration:
Fibonacci sequence(element at index 20) = 6765
Time: 4 ms
Fibonacci recursion:
Fibonacci sequence(element at index 20) = 6765
Time: 2 ms
Example #3 (n = 30)
Enter the last element of Fibonacci sequence: 30
Fibonacci iteration:
Fibonacci sequence(element at index 30) = 832040
Time: 4 ms
Fibonacci recursion:
Fibonacci sequence(element at index 30) = 832040
Time: 15 ms
What I really want to know is why, all of a sudden, iteration became faster and recursion became slower. I'm sorry if I missed some obvious answer to this question, but I'm still new to programming and I really don't understand what's going on behind that; I would like to know. Please provide a good explanation or point me in the right direction so I can find out the answer myself. Also, if this is not a good way to test which method is faster, let me know and suggest a different method.
Thanks in advance!
For terseness, let F(x) be the recursive Fibonacci:
F(10) = F(9) + F(8)
F(10) = F(8) + F(7) + F(7) + F(6)
F(10) = F(7) + F(6) + F(6) + F(5) + 4 more calls.
....
So you are calling F(8) twice,
F(7) 3 times, F(6) 5 times, F(5) 8 times... and so on
So with larger inputs, the tree gets bigger and bigger.
This article does a comparison between recursion and iteration and covers their application to generating Fibonacci numbers.
As noted in the article,
The reason for the poor performance is heavy push-pop of the registers in the ill level of each recursive call.
which basically says there is more overhead in the recursive method.
Also, take a look at Memoization
When doing the recursive implementation of Fibonacci algorithm, you are adding redundant calls by recomputing the same values over and over again.
fib(5) = fib(4) + fib(3)
fib(4) = fib(3) + fib(2)
fib(3) = fib(2) + fib(1)
Notice that fib(2) will be redundantly calculated both for fib(4) and for fib(3).
However, this can be overcome by a technique called memoization, which improves the efficiency of recursive Fibonacci by storing the values you have calculated once. Further calls of fib(x) for known values may then be replaced by a simple lookup, eliminating the need for further recursive calls.
This is the main difference between the iterative and recursive approaches; if you are interested, there are also other, more efficient algorithms for calculating Fibonacci numbers.
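To make that concrete, a minimal sketch of such a memoized fib might look like this (my illustration, not part of the original answer; the zero-filled array doubles as the "not yet computed" marker, which is safe because F(0) = 0 is handled by the base case):

// Illustrative only: cache[k] == 0 means "not computed yet".
static long fibMemo(int n, long[] cache) {
    if (n <= 1) return n;
    if (cache[n] != 0) return cache[n]; // known value: simple lookup
    cache[n] = fibMemo(n - 1, cache) + fibMemo(n - 2, cache);
    return cache[n];
}

// usage: long f = fibMemo(50, new long[51]);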
Why is recursion slower?
When your function calls itself (recursion), the runtime allocates a new activation record (just think of it as an ordinary stack frame) for that call. The frame keeps the call's state, variables, and return address. A frame is created for each call, and this continues until the base case is reached. So when the input becomes larger, a large stack segment is needed for the whole computation; creating and managing those records is also counted in the cost.
Also, in recursion the stack grows at run time; the compiler does not know at compile time how much memory will be occupied.
That is why, if you don't handle your base case properly, you will get a StackOverflowError :).
Using recursion the way you have, the time complexity is O(fib(n)), which is very expensive. The iterative method is O(n). This doesn't show in your tests because (a) your tests are very short, so the code won't even be JIT-compiled, and (b) you used very small numbers.
Both examples will become faster the more you run them. Once a loop or method has been called 10,000 times, it should be compiled to native code.
If anyone is interested in an iterative function with an array:
public static void fibonacci(int y) {
    int[] a = new int[y + 1];
    a[0] = 0;
    a[1] = 1;
    System.out.println("Step 0: 0");
    System.out.println("Step 1: 1");
    for (int i = 2; i <= y; i++) {
        a[i] = a[i - 1] + a[i - 2];
        System.out.println("Step " + i + ": " + a[i]);
    }
    System.out.println("Array size --> " + a.length);
}
This solution crashes for input value 0.
Reason: the array a will be created with length 0 + 1 = 1, so the subsequent assignment to a[1] results in an index-out-of-bounds exception.
Either add an if statement that returns 0 for y = 0, or allocate the array with length y + 2, which wastes one int but is still constant extra space and does not change the big O.
I prefer a mathematical solution using the golden ratio. Enjoy:
// Use full double precision for phi; the rounded constant 1.618 drifts off
// the true answer after a few dozen terms. Even with full precision,
// doubles stay exact only up to roughly F(70).
private static final double GOLDEN_NUMBER = (1 + Math.sqrt(5)) / 2;

public long fibonacci(int n) {
    double sqrt = Math.sqrt(5);
    double result = Math.pow(GOLDEN_NUMBER, n);
    result = result - Math.pow(1d - GOLDEN_NUMBER, n);
    result = Math.round(result / sqrt);
    return Double.valueOf(result).longValue();
}
Whenever you are looking at the time taken by a particular algorithm, it's best to think in terms of time complexity.
Evaluate the time complexity on paper in terms of O(something).
Comparing the above two approaches, the time complexity of the iterative approach is O(n), whereas that of the recursive approach is O(2^n).
Let's try to find the time complexity of fib(4).
Iterative approach: the loop evaluates 4 times, so its time complexity is O(n).
Recursive approach:

              fib(4)
             /      \
        fib(3)      fib(2)
        /    \      /    \
   fib(2) fib(1) fib(1) fib(0)
   /    \
fib(1) fib(0)

So fib() is called 9 times, a count that is bounded by 2^n whether n is large or small (remember that big O cares about the upper bound).
As a result we can say that the iterative approach evaluates in polynomial time, whereas the recursive one evaluates in exponential time.
The recursive approach that you use is not efficient. I would suggest you use tail recursion. In contrast to your approach, tail recursion keeps only one pending recursive call at a time, so the tree of redundant calls disappears (note, though, that the JVM does not eliminate tail calls, so the stack still grows linearly with n).
public static int tailFib(int n) {
    if (n <= 1) {
        return n;
    }
    return tailFib(0, 1, n);
}

private static int tailFib(int a, int b, int count) {
    if (count <= 0) {
        return a;
    }
    return tailFib(b, a + b, count - 1);
}

public static void main(String[] args) throws Exception {
    for (int i = 0; i < 10; i++) {
        System.out.println(tailFib(i));
    }
}
I have a recursive solution where the computed values are stored to avoid further unnecessary computation. The code is provided below:
public static int fibonacci(int n) {
    if (n <= 0) return 0;
    if (n == 1) return 1;
    int[] arr = new int[n + 1];
    // this is faster than using an ArrayList:
    // List<Integer> lis = new ArrayList<>(Collections.nCopies(n + 1, 0));
    arr[0] = 0;
    arr[1] = 1;
    return fiboHelper(n, arr);
}

public static int fiboHelper(int n, int[] arr) {
    if (n <= 0) {
        return arr[0];
    } else if (n == 1) {
        return arr[1];
    } else {
        if (arr[n - 1] != 0 && (arr[n - 2] != 0 || (arr[n - 2] == 0 && n - 2 == 0))) {
            return arr[n] = arr[n - 1] + arr[n - 2];
        } else if (arr[n - 1] == 0 && arr[n - 2] != 0) {
            return arr[n] = fiboHelper(n - 1, arr) + arr[n - 2];
        } else {
            return arr[n] = fiboHelper(n - 2, arr) + fiboHelper(n - 1, arr);
        }
    }
}
Practicing recursion and divide-and-conquer, a frequent problem seems to be converting the array:
[a1,a2,a3..an,b1,b2,b3...bn] to [a1,b1,a2,b2,a3,b3...an,bn]
I solved it as follows (startA is the start of the a's and startB is the start of the b's):
private static void shuffle(int[] a, int startA, int startB) {
    if (startA == startB) return;
    int tmp = a[startB];              // first element of the b-block
    shift(a, startA + 1, startB);     // shift a[startA+1..startB-1] right by one
    a[startA + 1] = tmp;              // place the saved b right after the current a
    shuffle(a, startA + 2, startB + 1);
}

private static void shift(int[] a, int start, int end) {
    if (start >= end) return;
    for (int i = end; i > start; i--) {
        a[i] = a[i - 1];
    }
}
But I am not sure what the runtime is. Isn't it linear?
Let the time consumed by the algorithm be T(n), and let n = startB - startA.
Each recursive invocation reduces n by 1 (startB - startA shrinks by one per invocation), so the run time is T(n) = T(n-1) + f(n); we only need to figure out what f(n) is.
The bottleneck in each invocation is the shift() operation, which iterates from startA+1 to startB, meaning n-1 iterations.
Thus, the recurrence for the algorithm is T(n) = T(n-1) + (n-1), which expands to (n-1) + (n-2) + ... + 1 = n(n-1)/2.
This is a known Theta(n^2) function (the sum of an arithmetic progression), so the time complexity of the algorithm is Theta(N^2), since the initial startB - startA is linear in N (the size of the input).
I am writing a "simple" program to determine the Nth number in the Fibonacci sequence. Ex: the 7th number in the sequence is 13. I have finished writing the program and it works, but beginning at the 40th number it starts to take longer and longer. My program has to go to the 100th spot in the series.
How can I fix this so it doesn't take so long? This is a very basic program, so I don't know all the fancy syntax. My formula is:
if n == 1 || n == 0
    return n;
else
    return F(n-1) + F(n-2);
This works great until it goes past the 40th term. What other statement do I have to add to make it go quicker for higher numbers??
The problem is that because you are using simple recursion, you re-evaluate the same F(n) values multiple times, so your execution time is exponential.
There are two simple ways to fix this:
1) Cache values of F(n) when they are evaluated the first time. Check the cache first before evaluating F(n) to see if you have already calculated it for this n.
2) Use an iterative approach: Calculate F(1), F(2), F(3), etc... until you reach the number you need.
The issue is that your algorithm, while mathematically pure (and nice), isn't very good.
For every number it wants to calculate, it has to calculate two lower ones, which in turn have to calculate two lower ones, etc. Your current algorithm has a complexity of about O(1.6^n), so for large numbers (100 for example) it takes a long time.
This book, Structure and Interpretation of Computer Programs, has a nice diagram showing what happens when you generate fib 5 with your algorithm:
(source: mit.edu)
The simplest thing to do is to store F(n-1) and F(n-2), so that you don't have to calculate them from scratch every time. In other words, rather than using recursion, use a loop. That means the complexity of the algorithm goes from O(1.6^n) to O(n).
There are a number of solutions. The most straightforward is to use memoization. There's also Binet's formula, which will give you the nth Fibonacci number in (effectively) constant time.
For memoization, you store your results for F[a_i] in a map or list of some kind. In the naive recursion, you compute F[4] hundreds of thousands of times, for example. By storing all these results as you find them, the recursion ceases to proceed like a tree and looks like the straightforward iterative solution.
If this isn't homework, use Binet's formula. It's the fastest method available.
Try this example, it calculates the millionth Fibonacci number in a reasonable time frame without any loss of precision.
import java.math.BigInteger;

/*
250000th fib # is: 36356117010939561826426 .... 10243516470957309231046875
Time to compute: 3.5 seconds.
1000000th fib # is: 1953282128707757731632 .... 93411568996526838242546875
Time to compute: 58.1 seconds.
*/
public class Fib {
    public static void main(String... args) {
        int place = args.length > 0 ? Integer.parseInt(args[0]) : 1000 * 1000;
        long start = System.nanoTime();
        BigInteger fibNumber = fib(place);
        long time = System.nanoTime() - start;
        System.out.println(place + "th fib # is: " + fibNumber);
        System.out.printf("Time to compute: %5.1f seconds.%n", time / 1.0e9);
    }

    private static BigInteger fib(int place) {
        BigInteger a = new BigInteger("0");
        BigInteger b = new BigInteger("1");
        while (place-- > 1) {
            BigInteger t = b;
            b = a.add(b);
            a = t;
        }
        return b;
    }
}
Create an array with 100 values, then when you calculate a value for Fib(n), store it in the array and use that array to get the values of Fib(n-1) and Fib(n-2).
If you're calling Fib(100) without storing any of the previously calculated values, you're going to make your java runtime explode.
Pseudocode:
array[0] = 0;
array[1] = 1;
for n = 2 to 100
    array[n] = array[n-1] + array[n-2];
The problem is not Java, but the way you are implementing your Fibonacci algorithm.
You are computing the same values many times, which slows your program down.
Try something like this: Fibonacci with memoization.
                    F(n)
                  /      \
            F(n-1)        F(n-2)
           /      \      /      \
      F(n-2)  F(n-3) F(n-3)  F(n-4)
     /      \
F(n-3)  F(n-4)
Notice that many computations are repeated!
An important point to note is that this algorithm is exponential because it does not store the results of previously calculated numbers; e.g. F(n-3) is called 3 times.
Better solution is iterative code written below
function fib2(n) {
    if n = 0:
        return 0
    create an array f[0....n]
    f[0] = 0, f[1] = 1
    for i = 2...n:
        f[i] = f[i - 1] + f[i - 2]
    return f[n]
}
For more details, refer to the algorithm in Dasgupta, chapter 0.2.
My solution using Java 8 Stream:
import java.util.stream.LongStream;

public class Main {
    public static void main(String[] args) {
        int n = 10;
        Fibonacci fibonacci = new Fibonacci();
        LongStream.generate(fibonacci::next)
                .skip(n)
                .findFirst()
                .ifPresent(System.out::println);
    }
}

public class Fibonacci {
    private long next = 1;
    private long current = 1;

    public long next() {
        long result = current;
        long previous = current;
        current = next;
        next = current + previous;
        return result;
    }
}
If you use the naive approach, you'll end up with an exploding number of duplicated calculations, i.e. to calculate fib(n) you have to calculate fib(n-1) and fib(n-2); then to calculate fib(n-1) you have to calculate fib(n-2) and fib(n-3), etc. A better approach is to do the inverse: you calculate starting with fib(0), fib(1), fib(2), and store the values in a table. Then, to calculate subsequent values, you use the values stored in the table (array). This is also called memoization. Try this and you should be able to calculate large fib numbers.
This is the code in Python, which can easily be converted to C/Java. The first is recursive and the second is the iterative solution.

def fibo(n, i=1, s=1, s_1=0):
    if n <= i: return s
    else: return fibo(n, i+1, s+s_1, s)

def fibo_iter_code(n):
    s, s_1 = 1, 0
    for i in range(n-1):
        temp = s
        s, s_1 = s+s_1, temp
    print(s)
Too slow...
Better:
(JavaScript example)
function fibonacci(n) {
    var a = 0, b = 1;
    for (var i = 0; i < n; i++) {
        a += b;
        b = a - b;
    }
    return a;
}
import java.util.*;

public class FibonacciNumber {
    public static void main(String[] args) {
        int high = 1, low = 1;
        int num;
        Scanner in = new Scanner(System.in);
        try {
            System.out.print("Enter Number : ");
            num = in.nextInt();
            System.out.println(low);
            while (high < num && num < 2000000000) {
                System.out.println(high);
                high = low + high;
                low = high - low;
            }
        } catch (InputMismatchException e) {
            System.out.print("Limit Exceeded");
        }
    }
}
/* Ouput :
Enter Number : 1999999999
1
1
2
3
5
8
13
21
34
55
89
144
233
377
610
987
1597
2584
4181
6765
10946
17711
28657
46368
75025
121393
196418
317811
514229
832040
1346269
2178309
3524578
5702887
9227465
14930352
24157817
39088169
63245986
102334155
165580141
267914296
433494437
701408733
1134903170
1836311903
-1323752223
512559680
-811192543
-298632863
-1109825406
-1408458269
1776683621
368225352 */
The naive implementation is natural and elegant, but during execution the recursive calls create a binary tree. Besides the already mentioned memoization, caching of previous F(n) results, and avoiding the unnecessary tree traversal, you can go for tail call optimization, the already mentioned iterative approach, or matrix multiplication. For example, Java 8 memoization:
private static final Map<Long, Long> memo = new HashMap<>();
static {
    memo.put(0L, 0L);
    memo.put(1L, 1L);
}

public static void main(String[] args) {
    System.out.println(fibonacci(0));
    System.out.println(fibonacci(43));
    System.out.println(fibonacci(92));
}

// Caution: recursively adding entries from inside computeIfAbsent is fragile
// for HashMap; newer JDKs may throw ConcurrentModificationException here.
public static long fibonacci(long n) {
    return memo.computeIfAbsent(n, m -> fibonacci(m - 1) + fibonacci(m - 2));
}
Or maybe a tail call optimized version (bear in mind that the JVM does not actually eliminate tail calls, so the recursion depth still grows with n):
interface FewArgs<T, U, V, R> {
    public R apply(T t, U u, V v);
}

static FewArgs<Long, Long, Long, Long> tailRecursive;
static {
    tailRecursive = (a, b, n) -> {
        if (n > 0)
            return tailRecursive.apply(b, a + b, n - 1);
        return a;
    };
}
You call it with a = 0, b = 1; n is the index of the required Fibonacci number, but it must be smaller than 93 (larger values overflow long).
More efficient ways to calculate Fibonacci numbers are matrix squaring (you will find an example on my blog) and the Binet formula.
You can use the caching technique. Since f(n) = f(n-1) + f(n-2), you would otherwise calculate f(n-2) one more time when you calculate f(n-1). So simply treat them as two incrementally updated numbers, like below:
public int fib(int ithNumber) {
    if (ithNumber <= 0) return 0; // guard: fib(0) = 0
    int prev = 0;
    int current = 1;
    int newValue;
    for (int i = 1; i < ithNumber; i++) {
        newValue = current + prev;
        prev = current;
        current = newValue;
    }
    return current;
}
It looks better with the ternary operator chained across multiple cases:
static int fib(int n) {
    return n > 5 ? fib(n - 2) + fib(n - 1)
            : n < 2 || n == 5 ? n
            : n - 1;
}