Slow recursive function. How can I speed it up?

Below is a recursive function that does nothing useful by itself. The real function performs some checks that can actually prune some of the recursive calls, but this stripped-down version keeps the question simple.
public void speedtest2(char[] s) {
    int a, c;
    int l, lmax;
    int arrlength = Array.getLength(s); // java.lang.reflect.Array; s.length would avoid the reflection overhead
    for (a = 0; a < arrlength; a++) {
        if ((a == 0) || (s[a] != s[a - 1])) {
            if (s[a] == '*') {
                lmax = 26;
            } else {
                lmax = 1;
            }
            for (l = 1; l <= lmax; l++) {
                tot++; // tot is an instance field counting the number of calls
                char[] tmp = new char[arrlength - 1];
                int p = 0;
                for (c = 0; c < arrlength; c++) {
                    if (c != a) {
                        tmp[p++] = s[c];
                    }
                }
                speedtest2(tmp);
            }
        }
    }
}
Calling speedtest2 with an array containing "srrieca*" will take up to 5 seconds on my lowly HTC Flyer Android tablet. Granted, in this extreme example the function is called 1 274 928 times, but still.
I'm pretty sure something can be done to improve the speed of this example.

You can reduce the garbage-collector overhead by pooling the tmp arrays. Of course, only after you have checked that this is actually a bottleneck in your case.
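For illustration, a pooled version could look something like this; CharArrayPool is a hypothetical helper (a trivial free-list per buffer length), not an existing API:

import java.util.ArrayDeque;

// Hypothetical helper: one free-list of char[] buffers per length, so the
// recursion can reuse buffers instead of allocating a fresh tmp each call.
class CharArrayPool {
    private final ArrayDeque<char[]>[] free;

    @SuppressWarnings("unchecked")
    CharArrayPool(int maxLen) {
        free = new ArrayDeque[maxLen + 1];
        for (int i = 0; i <= maxLen; i++) {
            free[i] = new ArrayDeque<char[]>();
        }
    }

    char[] borrow(int len) {
        char[] buf = free[len].poll();
        return (buf != null) ? buf : new char[len]; // allocate only on a miss
    }

    void giveBack(char[] buf) {
        free[buf.length].push(buf);
    }
}

In speedtest2 you would borrow an (arrlength - 1)-sized buffer where the new currently is, and give it back after the recursive call returns.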

One possibility is reducing the number of recursive calls by keeping some sort of cache that stores results that can be re-used instead of recursing, although in your contrived example it's not clear what those results would be.

Memory allocation is an expensive operation, and your speed is being dragged down by that new statement. Try to allocate one large block for a working area up front and stay within it.
You're also going to trigger the garbage collector quite often, which is an even bigger speed bump.
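As a minimal sketch of that idea (assuming, as in the example, that each recursive call shrinks the array by exactly one): preallocate one buffer per recursion depth once, and reuse it on every call at that depth. The names below (SpeedTest, scratch, speedtest3) are illustrative, not part of the original code.

public class SpeedTest {
    private long tot;          // call counter, as in the question
    private char[][] scratch;  // scratch[d] is reused by every call at depth d

    public void run(char[] s) {
        scratch = new char[s.length][];
        for (int d = 0; d < s.length; d++) {
            scratch[d] = new char[s.length - 1 - d]; // one block per depth, allocated once
        }
        speedtest3(s, s.length, 0);
    }

    private void speedtest3(char[] s, int len, int depth) {
        for (int a = 0; a < len; a++) {
            if (a == 0 || s[a] != s[a - 1]) {
                int lmax = (s[a] == '*') ? 26 : 1;
                for (int l = 1; l <= lmax; l++) {
                    tot++;
                    char[] tmp = scratch[depth]; // reused; no new, no GC pressure
                    int p = 0;
                    for (int c = 0; c < len; c++) {
                        if (c != a) {
                            tmp[p++] = s[c];
                        }
                    }
                    speedtest3(tmp, len - 1, depth + 1);
                }
            }
        }
    }
}

On the "srrieca*" example this replaces over a million short-lived allocations with eight buffers allocated once.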

Related

Multithreaded Segmented Sieve of Eratosthenes in Java

I am trying to create a fast prime generator in Java. It is (more or less) accepted that the fastest way to do this is the segmented sieve of Eratosthenes: https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes. Many further optimizations can be implemented to make it faster. As of now, my implementation generates the 50847534 primes below 10^9 in about 1.6 seconds, but I am looking to make it faster and at least break the 1 second barrier. To increase the chance of getting good replies, I will include a walkthrough of the algorithm as well as the code.
Still, as a TL;DR: I am looking to add multithreading to the code.
For the purposes of this question, I want to distinguish between the 'segmented' and the 'traditional' sieve of Eratosthenes. The traditional sieve requires O(n) space and is therefore very limited in the range of the input (its limit n). The segmented sieve, however, only requires O(n^0.5) space and can operate on much larger limits. (A main speed-up is using cache-friendly segmentation, taking into account the L1 & L2 cache sizes of the specific computer.)
Finally, the main difference that concerns my question is that the traditional sieve is sequential, meaning it can only continue once the previous steps are completed. The segmented sieve, however, is not: each segment is independent and is 'processed' individually against the sieving primes (the primes not larger than n^0.5). This means that, in theory, once I have the sieving primes, I could divide the work between multiple computers, each processing a different segment; the work of each is independent of the others. Assuming (wrongly) that each segment requires the same amount of time t to complete, and there are k segments, one computer would require total time T = k * t, whereas k computers, each working on a different segment, would require total time T = t to complete the entire process. (Practically this is wrong, but it serves the simplicity of the example.)
This brought me to reading about multithreading: dividing the work among a few threads, each processing a smaller amount of work, for better usage of the CPU. To my understanding, the traditional sieve cannot be multithreaded precisely because it is sequential: each thread would depend on the previous one, rendering the entire idea unfeasible. But a segmented sieve may indeed (I think) be multithreaded.
Instead of jumping straight into my question, I think it is important to introduce my code first, so I am hereby including my current fastest implementation of the segmented sieve. I have worked quite hard on it. It took quite some time, slowly tweaking and adding optimizations to it. The code is not simple. It is rather complex, I would say. I therefore assume the reader is familiar with the concepts I am introducing, such as wheel factorization, prime numbers, segmentation and more. I have included notes to make it easier to follow.
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;

public class primeGen {

    public static long x = (long) Math.pow(10, 9); //limit
    public static int sqrtx;
    public static boolean[] sievingPrimes; //the sieving primes, <= sqrtx
    public static int[] wheels = new int[] {2, 3, 5, 7, 11, 13, 17, 19}; // base wheel primes
    public static int[] gaps; //the gaps, according to the wheel. will enable skipping multiples of the wheel primes
    public static int nextp; // the first prime > wheel primes
    public static int l; // the amount of gaps in the wheel

    public static void main(String[] args)
    {
        long startTime = System.currentTimeMillis();
        preCalc(); // creating the sieving primes and calculating the list of gaps
        int segSize = Math.max(sqrtx, 32768 * 8); //size of each segment
        long u = nextp; // 'u' is the running index of the program. will continue from one segment to the next
        int wh = 0; // this will be the gap index, indicating by how much we increment 'u' each time, skipping the multiples of the wheel primes
        long pi = pisqrtx(); // the primes count. initialize with the number of primes <= sqrtx
        for (long low = 0; low < x; low += segSize) //the heart of the code. enumerating the primes through segmentation. enumeration will begin at p > sqrtx
        {
            long high = Math.min(x, low + segSize);
            boolean[] segment = new boolean[(int) (high - low + 1)];
            int g = -1;
            for (int i = nextp; i <= sqrtx; i += gaps[g])
            {
                if (sievingPrimes[(i + 1) / 2])
                {
                    long firstMultiple = (long) (low / i * i);
                    if (firstMultiple < low)
                        firstMultiple += i;
                    if (firstMultiple % 2 == 0) //start with the first odd multiple of the current prime in the segment
                        firstMultiple += i;
                    for (long j = firstMultiple; j < high; j += i * 2)
                        segment[(int) (j - low)] = true;
                }
                g++;
                //if (g == l) //due to segment size, the full list of gaps is never used **within just one segment**, and therefore this check is redundant.
                //should be used with bigger segment sizes or smaller lists of gaps
                //    g = 0;
            }
            while (u <= high)
            {
                if (!segment[(int) (u - low)])
                    pi++;
                u += gaps[wh];
                wh++;
                if (wh == l)
                    wh = 0;
            }
        }
        System.out.println(pi);
        long endTime = System.currentTimeMillis();
        System.out.println("Solution took " + (endTime - startTime) + " ms");
    }

    public static boolean[] simpleSieve(int l) // note: the parameter 'l' shadows the static gap-count field 'l'
    {
        long sqrtl = (long) Math.sqrt(l);
        boolean[] primes = new boolean[l / 2 + 2];
        Arrays.fill(primes, true);
        int g = -1;
        for (int i = nextp; i <= sqrtl; i += gaps[g])
        {
            if (primes[(i + 1) / 2])
                for (int j = i * i; j <= l; j += i * 2)
                    primes[(j + 1) / 2] = false;
            g++;
            if (g == l)
                g = 0;
        }
        return primes;
    }

    public static long pisqrtx()
    {
        int pi = wheels.length;
        if (x < wheels[wheels.length - 1])
        {
            if (x < 2)
                return 0;
            int k = 0;
            while (wheels[k] <= x)
                k++;
            return k;
        }
        int g = -1;
        for (int i = nextp; i <= sqrtx; i += gaps[g])
        {
            if (sievingPrimes[(i + 1) / 2])
                pi++;
            g++;
            if (g == l)
                g = 0;
        }
        return pi;
    }

    public static void preCalc()
    {
        sqrtx = (int) Math.sqrt(x);
        int prod = 1;
        for (long p : wheels)
            prod *= p; // primorial
        nextp = BigInteger.valueOf(wheels[wheels.length - 1]).nextProbablePrime().intValue(); //the first prime that comes after the wheel
        int lim = prod + nextp; // circumference of the wheel
        boolean[] marks = new boolean[lim + 1];
        Arrays.fill(marks, true);
        for (int j = 2 * 2; j <= lim; j += 2)
            marks[j] = false;
        for (int i = 1; i < wheels.length; i++)
        {
            int p = wheels[i];
            for (int j = p * p; j <= lim; j += 2 * p)
                marks[j] = false; // removing all integers that are NOT coprime with the base wheel primes
        }
        ArrayList<Integer> gs = new ArrayList<Integer>(); //list of the gaps between the integers that are coprime with the base wheel primes
        int d = nextp;
        for (int p = d + 2; p < marks.length; p += 2)
        {
            if (marks[p]) //d is prime. if p is also prime, then a gap is identified, and is noted.
            {
                gs.add(p - d);
                d = p;
            }
        }
        gaps = new int[gs.size()];
        for (int i = 0; i < gs.size(); i++)
            gaps[i] = gs.get(i); // Arrays are faster than lists, so moving the list of gaps to an array
        l = gaps.length;
        sievingPrimes = simpleSieve(sqrtx); //initializing the sieving primes
    }
}
Currently, it produces 50847534 primes below 10^9 in about 1.6 seconds. This is very impressive, at least by my standards, but I am looking to make it faster, possibly break the 1 second barrier. Even then, I believe it can be made much faster still.
The whole program is based on wheel factorization: https://en.wikipedia.org/wiki/Wheel_factorization. I have noticed I am getting the fastest results using a wheel of all primes up to 19.
public static int[] wheels = new int[] {2, 3, 5, 7, 11, 13, 17, 19}; // base wheel primes
This means that the multiples of those primes are skipped, resulting in a much smaller search range. The gaps between the numbers that remain are then calculated in the preCalc method. If we make those jumps between the numbers in the search range, we skip the multiples of the base primes.
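As a small worked illustration of those jumps (using a {2,3,5} wheel instead of the 19-wheel, so the numbers stay small; WheelDemo is my own example, not part of the program):

public class WheelDemo {
    public static void main(String[] args) {
        // gap cycle for the {2,3,5} wheel: stepping by these gaps from 7
        // visits exactly the numbers that are coprime to 2 * 3 * 5 = 30
        int[] gaps = {4, 2, 4, 2, 4, 6, 2, 6};
        int g = 0;
        for (int i = 7; i <= 60; i += gaps[g], g = (g + 1) % gaps.length) {
            System.out.print(i + " "); // 7 11 13 17 19 23 29 31 37 41 43 47 49 53 59
        }
    }
}

Note that the wheel keeps composites like 49 = 7 * 7: it only removes multiples of the base primes, which is why the sieving step is still needed.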
public static void preCalc()
{
    sqrtx = (int) Math.sqrt(x);
    int prod = 1;
    for (long p : wheels)
        prod *= p; // primorial
    nextp = BigInteger.valueOf(wheels[wheels.length - 1]).nextProbablePrime().intValue(); //the first prime that comes after the wheel
    int lim = prod + nextp; // circumference of the wheel
    boolean[] marks = new boolean[lim + 1];
    Arrays.fill(marks, true);
    for (int j = 2 * 2; j <= lim; j += 2)
        marks[j] = false;
    for (int i = 1; i < wheels.length; i++)
    {
        int p = wheels[i];
        for (int j = p * p; j <= lim; j += 2 * p)
            marks[j] = false; // removing all integers that are NOT coprime with the base wheel primes
    }
    ArrayList<Integer> gs = new ArrayList<Integer>(); //list of the gaps between the integers that are coprime with the base wheel primes
    int d = nextp;
    for (int p = d + 2; p < marks.length; p += 2)
    {
        if (marks[p]) //d is prime. if p is also prime, then a gap is identified, and is noted.
        {
            gs.add(p - d);
            d = p;
        }
    }
    gaps = new int[gs.size()];
    for (int i = 0; i < gs.size(); i++)
        gaps[i] = gs.get(i); // Arrays are faster than lists, so moving the list of gaps to an array
    l = gaps.length;
    sievingPrimes = simpleSieve(sqrtx); //initializing the sieving primes
}
At the end of the preCalc method, the simpleSieve method is called, efficiently sieving all the sieving primes mentioned before, i.e., the primes <= sqrtx. This is a simple sieve of Eratosthenes rather than a segmented one, but it is still based on the wheel factorization computed previously.
public static boolean[] simpleSieve(int l) // note: the parameter 'l' shadows the static gap-count field 'l'
{
    long sqrtl = (long) Math.sqrt(l);
    boolean[] primes = new boolean[l / 2 + 2];
    Arrays.fill(primes, true);
    int g = -1;
    for (int i = nextp; i <= sqrtl; i += gaps[g])
    {
        if (primes[(i + 1) / 2])
            for (int j = i * i; j <= l; j += i * 2)
                primes[(j + 1) / 2] = false;
        g++;
        if (g == l)
            g = 0;
    }
    return primes;
}
Finally, we reach the heart of the algorithm. We start by enumerating all primes <= sqrtx, with the following call:
long pi = pisqrtx();
which uses the following method:
public static long pisqrtx()
{
    int pi = wheels.length;
    if (x < wheels[wheels.length - 1])
    {
        if (x < 2)
            return 0;
        int k = 0;
        while (wheels[k] <= x)
            k++;
        return k;
    }
    int g = -1;
    for (int i = nextp; i <= sqrtx; i += gaps[g])
    {
        if (sievingPrimes[(i + 1) / 2])
            pi++;
        g++;
        if (g == l)
            g = 0;
    }
    return pi;
}
Then, after initializing the pi variable which keeps track of the enumeration of primes, we perform the mentioned segmentation, starting the enumeration from the first prime > sqrtx:
int segSize = Math.max(sqrtx, 32768 * 8); //size of each segment
long u = nextp; // 'u' is the running index of the program. will continue from one segment to the next
int wh = 0; // this will be the gap index, indicating by how much we increment 'u' each time, skipping the multiples of the wheel primes
long pi = pisqrtx(); // the primes count. initialize with the number of primes <= sqrtx
for (long low = 0; low < x; low += segSize) //the heart of the code. enumerating the primes through segmentation. enumeration will begin at p > sqrtx
{
    long high = Math.min(x, low + segSize);
    boolean[] segment = new boolean[(int) (high - low + 1)];
    int g = -1;
    for (int i = nextp; i <= sqrtx; i += gaps[g])
    {
        if (sievingPrimes[(i + 1) / 2])
        {
            long firstMultiple = (long) (low / i * i);
            if (firstMultiple < low)
                firstMultiple += i;
            if (firstMultiple % 2 == 0) //start with the first odd multiple of the current prime in the segment
                firstMultiple += i;
            for (long j = firstMultiple; j < high; j += i * 2)
                segment[(int) (j - low)] = true;
        }
        g++;
        //if (g == l) //due to segment size, the full list of gaps is never used **within just one segment**, and therefore this check is redundant.
        //should be used with bigger segment sizes or smaller lists of gaps
        //    g = 0;
    }
    while (u <= high)
    {
        if (!segment[(int) (u - low)])
            pi++;
        u += gaps[wh];
        wh++;
        if (wh == l)
            wh = 0;
    }
}
I have also included this as a note in the code, but will explain it here as well. Because the segment size is relatively small, we will not go through the entire list of gaps within just one segment, so checking for wraparound there is redundant (assuming we use a 19-wheel). But in the broader scope of the program we do make use of the entire array of gaps, so the variable u has to follow it and wrap around rather than run past its end:
while (u <= high)
{
    if (!segment[(int) (u - low)])
        pi++;
    u += gaps[wh];
    wh++;
    if (wh == l)
        wh = 0;
}
Using higher limits will eventually require a bigger segment, which might make it necessary to check that we don't run past the gaps list even within a single segment. Tweaking the wheel's prime base can have the same effect. Switching to bit-sieving can largely raise the segment limit, though.
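For what that bit-sieving switch might look like, here is a minimal sketch (illustrative names, not from the question's code): pack the segment into a long[] with one bit per candidate, so the same number of bytes covers eight times the range.

// Hypothetical bit-packed segment: index i is relative to the segment's
// low bound. Replaces boolean[] segment at one bit per entry.
final class BitSegment {
    private final long[] bits;

    BitSegment(int size) {
        bits = new long[(size + 63) >>> 6];
    }

    void markComposite(int i) {
        bits[i >>> 6] |= 1L << (i & 63);
    }

    boolean isComposite(int i) {
        return (bits[i >>> 6] & (1L << (i & 63))) != 0;
    }
}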
As an important side-note, I am aware that efficient segmentation is one that takes the L1 & L2 cache sizes into account. I get the fastest results using a segment size of 32,768 * 8 = 262,144 = 2^18. I am not sure what the cache size of my computer is, but I do not think it can be that big, as I see most cache sizes <= 32,768. Still, this produces the fastest run time on my computer, so this is why it's the chosen segment size.
As I mentioned, I am still looking to improve this by a lot. I believe, according to my introduction, that multithreading can result in a speed-up factor of 4, using 4 threads (corresponding to 4 cores). The idea is that each thread will still use the segmented sieve, but work on a different portion: divide n into 4 equal portions, with each thread in turn performing the segmentation on the n/4 elements it is responsible for, using the above program.
My question is how do I do that? Reading about multithreading and examples of it unfortunately did not give me any insight on how to implement it efficiently in the case above. It seemed to me, contrary to the logic behind it, that the threads were running sequentially rather than simultaneously. This is why I excluded that attempt from the code, to keep it readable. I will really appreciate a code sample of how to do it in this specific code, but a good explanation and reference might do the trick too.
Additionally, I would like to hear about more ways of speeding up this program. Any ideas you have, I would love to hear! I really want to make it very fast and efficient. Thank you!
An example like this should help you get started.
An outline of a solution (a concrete code sketch follows the list):
Define a data structure ("Task") that encompasses a specific segment; you can put all the immutable shared data into it for extra neatness, too. If you're careful enough, you can pass a common mutable array to all tasks, along with the segment limits, and only update the part of the array within these limits. This is more error-prone, but can simplify the step of joining the results (AFAICT; YMMV).
Define a data structure ("Result") that stores the result of a Task computation. Even if you just update a shared resulting structure, you may need to signal which part of that structure has been updated so far.
Create a Runnable that accepts a Task, runs a computation, and puts the results into a given result queue.
Create a blocking input queue for Tasks, and a queue for Results.
Create a ThreadPoolExecutor with the number of threads close to the number of machine cores.
Submit all your Tasks to the thread pool executor. They will be scheduled to run on the threads from the pool, and will put their results into the output queue, not necessarily in order.
Wait for all the tasks in the thread pool to finish.
Drain the output queue and join the partial results into the final result.
Extra speedup may (or may not) be achieved by joining the results in a separate task that reads the output queue, or even by updating a mutable shared output structure under synchronized, depending on how much work the joining step involves.
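To make the outline concrete, here is a minimal sketch of it applied to a plain segmented sieve (no wheel, for brevity). Each submitted Callable plays the role of a Task, and its Future stands in for the Result queue; all class and method names are illustrative, and this is not the question's tuned code:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSieve {
    public static void main(String[] args) throws Exception {
        long x = 1_000_000_000L;
        int sqrtx = (int) Math.sqrt(x);

        // Sieve the sieving primes <= sqrt(x) sequentially (cheap).
        boolean[] composite = new boolean[sqrtx + 1];
        List<Integer> sievingPrimes = new ArrayList<>();
        for (int i = 2; i <= sqrtx; i++) {
            if (!composite[i]) {
                sievingPrimes.add(i);
                for (long j = (long) i * i; j <= sqrtx; j += i) composite[(int) j] = true;
            }
        }

        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int segSize = 1 << 18; // same cache-friendly size the question uses

        // Submit one independent task per segment; results arrive as Futures.
        List<Future<Long>> results = new ArrayList<>();
        for (long low = sqrtx + 1L; low <= x; low += segSize) {
            final long lo = low, hi = Math.min(x, low + segSize - 1);
            results.add(pool.submit(() -> countSegment(lo, hi, sievingPrimes)));
        }

        long pi = sievingPrimes.size();
        for (Future<Long> f : results) pi += f.get(); // join the partial results
        pool.shutdown();
        System.out.println(pi);
    }

    // Counts primes in [low, high] by crossing off multiples of the sieving
    // primes; reads only shared immutable data, writes only its own segment.
    static long countSegment(long low, long high, List<Integer> sievingPrimes) {
        boolean[] composite = new boolean[(int) (high - low + 1)];
        for (int p : sievingPrimes) {
            long first = Math.max((long) p * p, (low + p - 1) / p * p);
            for (long j = first; j <= high; j += p) composite[(int) (j - low)] = true;
        }
        long count = 0;
        for (boolean c : composite) if (!c) count++;
        return count;
    }
}

Because each task reads only the shared, fully built list of sieving primes and writes only its own local segment array, no synchronization is needed beyond the Future.get() joins.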
Hope this helps.
Are you familiar with the work of Tomas Oliveira e Silva? He has a very fast implementation of the Sieve of Eratosthenes.
How interested in speed are you? Would you consider using C++?
$ time ../c_code/segmented_bit_sieve 1000000000
50847534 primes found.
real 0m0.875s
user 0m0.813s
sys 0m0.016s
$ time ../c_code/segmented_bit_isprime 1000000000
50847534 primes found.
real 0m0.816s
user 0m0.797s
sys 0m0.000s
(on my newish laptop with an i5)
The first is from @Kim Walisch, using a bit array of odd prime candidates.
https://github.com/kimwalisch/primesieve/wiki/Segmented-sieve-of-Eratosthenes
The second is my tweak to Kim's, with IsPrime[] also implemented as a bit array, which is slightly less clear to read, although a little faster for big N due to the reduced memory footprint.
I will read your post carefully as I am interested in primes and performance no matter what language is used. I hope this isn't too far off topic or premature. But I noticed I was already beyond your performance goal.

Good fix for this algorithmic puzzle code (USACO)?

I'm a first-year computer science student and I am currently dabbling in some algorithmic competitions. The code below that I made has a flaw that I'm not sure how to fix.
Here is the problem statement:
http://www.usaco.org/index.php?page=viewproblem2&cpid=811
In the statement, I missed where it said that Farmer John can only switch boots on tiles that both pairs of boots can stand on. I tried adding constraints in different places, but none seemed to address the problem fully, and I don't really see a way to do it without butchering the code.
Basically, the problem is that John keeps switching boots on tiles where the new boots can't stand, and I can't seem to fix it.
Here is my code (sorry for the one letter variables):
import java.io.*;
import java.util.*;

public class snowboots {

    static int n, k;
    static int[] field, a, b; //a,b --> strength, distance
    static int pos;

    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader("snowboots.in"));
        PrintWriter pw = new PrintWriter(new BufferedWriter(new FileWriter("snowboots.out")));
        StringTokenizer st = new StringTokenizer(br.readLine());
        n = Integer.parseInt(st.nextToken());
        k = Integer.parseInt(st.nextToken());
        st = new StringTokenizer(br.readLine());
        field = new int[n];
        a = new int[k];
        b = new int[k];
        for (int i = 0; i < n; i++)
            field[i] = Integer.parseInt(st.nextToken());
        for (int i = 0; i < k; i++) {
            st = new StringTokenizer(br.readLine());
            a[i] = Integer.parseInt(st.nextToken());
            b[i] = Integer.parseInt(st.nextToken());
        }
        pw.println(solve());
        pw.close();
    }

    static int solve() {
        pos = 0;
        int i = 0; //which boot are we on?
        while (pos < n - 1) {
            while (move(i)); //move with boot i as far as possible
            i++; //use the next boot
        }
        i--;
        return i;
    }

    static boolean move(int c) {
        for (int i = pos + b[c]; i > pos; i--) {
            if (i < n && field[i] <= a[c]) { //snow has to be less than boot strength
                pos = i;
                return true;
            }
        }
        return false;
    }
}
I tried adding a constraint in the move method, and one when updating i, but both are too strict and trigger at unwanted times.
Is it salvageable?
Yes, it's possible to salvage your solution, by adding an extra for-loop.
What you need to do is, if you find that your previous pair of boots can get you all the way to a tile that's too deep in snow for your next pair, then you need to try "backtracking" to the latest tile that's not too deep. This ends up giving a solution in worst-case O(N·B) time and O(1) extra space.
It may not be obvious why it's OK to backtrack to that tile — after all, just because you can reach a given tile, that doesn't necessarily mean that you were able to reach all the tiles before it — so let me explain a bit why it is OK.
Let maxReachableTileNum be the number (between 1 and N) of the last tile that you were able to reach with your previous boots, and let lastTileNumThatsNotTooDeep be the number (between 1 and N) of the last tile on or before maxReachableTileNum that's not too deeply snow-covered for your next pair. (We know that there is such a tile, because tile #1 has no snow at all, so if nothing else we know that we can backtrack to the very beginning.)
Now, since we were able to get to maxReachableTileNum, some previous boot must have either stepped on lastTileNumThatsNotTooDeep (in which case, no problem, it's reachable) or skipped over it to some later tile (on or before maxReachableTileNum). But that later tile must be deeper than lastTileNumThatsNotTooDeep (because that later tile's depth is greater than s_currentBootNum, which is at least as great as the depth of lastTileNumThatsNotTooDeep), which means that the boot that skipped over lastTileNumThatsNotTooDeep certainly could have stepped on it instead: it would have meant taking a shorter step (OK) onto a less-deeply-covered tile (OK) than what it actually did. So, either way, we know that lastTileNumThatsNotTooDeep was reachable, and it's safe for us to try backtracking to it. (Note: the code below uses the name reachableTileNum instead of lastTileNumThatsNotTooDeep, because it continues to use the reachableTileNum variable for searching forward to find reachable tiles.)
However, we still have to hold onto the previous maxReachableTileNum: backtracking might turn out not to be helpful (because it may not let us make any further forward progress than we already have), in which case we'll just discard these boots, and move on to the next pair, with maxReachableTileNum at its previous value.
So, overall, we have this:
public static int solve(
    final int[] tileSnowDepths, // tileSnowDepths[0] is f_1
    final int[] bootAllowedDepths, // bootAllowedDepths[0] is s_1
    final int[] bootAllowedTilesPerStep // bootAllowedTilesPerStep[0] is d_1
) {
    final int numTiles = tileSnowDepths.length;
    final int numBoots = bootAllowedDepths.length;
    assert numBoots == bootAllowedTilesPerStep.length;

    int maxReachableTileNum = 1; // can reach tile #1 even without boots
    for (int bootNum = 1; bootNum <= numBoots; ++bootNum) {
        final int allowedDepth = bootAllowedDepths[bootNum - 1];
        final int allowedTilesPerStep = bootAllowedTilesPerStep[bootNum - 1];

        // Find the starting-point for this boot -- ideally the last tile
        // reachable so far, but may need to "backtrack" if that tile is too
        // deep; see explanation above of why it's safe to assume that we
        // can backtrack to the latest not-too-deep tile:
        int reachableTileNum = maxReachableTileNum;
        while (tileSnowDepths[reachableTileNum - 1] > allowedDepth) {
            --reachableTileNum;
        }

        // Now see how far we can go, updating both maxReachableTileNum and
        // reachableTileNum when we successfully reach new tiles:
        for (int tileNumToTry = maxReachableTileNum + 1;
             tileNumToTry <= numTiles
                 && tileNumToTry <= reachableTileNum + allowedTilesPerStep;
             ++tileNumToTry
        ) {
            if (tileSnowDepths[tileNumToTry - 1] <= allowedDepth) {
                maxReachableTileNum = reachableTileNum = tileNumToTry;
            }
        }

        // If we've made it to the last tile, then yay, we're done:
        if (maxReachableTileNum == numTiles) {
            return bootNum - 1; // had to discard this many boots to get here
        }
    }
    throw new IllegalArgumentException("Couldn't reach last tile with any boot");
}
(I tested this on USACO's example data, and it returned 2, as expected.)
This can potentially be optimized further, e.g. with logic to skip pairs of boots that clearly aren't helpful (because they're neither stronger nor more agile than the previous successful pair), or with an extra data structure to keep track of the positions of latest minima (to optimize the backtracking process), or with logic to avoid backtracking further than is conceivably useful; but given that N·B ≤ 250² = 62,500, I don't think any such optimizations are warranted.
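As a sketch of the first of those optimizations (my illustration, untested against the USACO data): a pair of boots that allows neither deeper snow nor a longer step than some single earlier pair can never extend maxReachableTileNum, so its inner loops can be skipped while it still counts as discarded.

// Hypothetical pre-check for the bootNum loop in solve() above;
// prevDepth/prevTilesPerStep must come from one specific earlier pair
// (domination by any single earlier pair makes this pair useless).
static boolean isDominated(int allowedDepth, int allowedTilesPerStep,
                           int prevDepth, int prevTilesPerStep) {
    return allowedDepth <= prevDepth && allowedTilesPerStep <= prevTilesPerStep;
}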
Edited to add (2019-02-23): I've thought about this further, and it occurs to me that it's actually possible to write a solution in worst-case O(N + B log N) time (which is asymptotically better than O(N·B)) and O(N) extra space. But it's much more complicated; it involves three extra data-structures (one to keep track of the positions of latest minima, to allow backtracking in O(log N) time; one to keep track of the positions of future minima, to allow checking in O(log N) time if the backtracking is actually helpful (and if so to move forward to the relevant minimum); and one to maintain the necessary forward-looking information in order to let the second one be maintained in amortized O(1) time). It's also complicated to explain why that solution is guaranteed to be within O(N + B log N) time (because it involves a lot of amortized analysis, and making a minor change that might seem like an optimization — e.g., replacing a linear search with a binary search — can break the analysis and actually increase the worst-case time complexity). Since N and B are both known to be at most 250, I don't think all the complication is worth it.
You can solve this problem with Dynamic Programming. You can see the concept at this link (just read the computer programming part).
It has following two steps.
First solve the problem recursively.
Memoize the states.
#include <bits/stdc++.h>
using namespace std;
#define ll long long
#define mx 100005
#define mod 1000000007

int n, b;
int f[333], s[333], d[333];
int dp[251][251];

int rec(int snowPos, int bootPos)
{
    if (snowPos == n - 1) {
        return 0;
    }
    int &ret = dp[snowPos][bootPos];
    if (ret != -1) return ret;
    ret = 1000000007;
    // Option 1: switch to a later pair of boots (cost: boots discarded).
    for (int i = bootPos + 1; i < b; i++) {
        if (s[i] >= f[snowPos]) {
            ret = min(ret, i - bootPos + rec(snowPos, i));
        }
    }
    // Option 2: jump forward with the current boots (no extra cost).
    for (int i = 1; i <= d[bootPos] && snowPos + i < n; i++) {
        if (f[snowPos + i] <= s[bootPos]) {
            ret = min(ret, rec(snowPos + i, bootPos));
        }
    }
    return ret;
}

int main()
{
    freopen("snowboots.in", "r", stdin);
    freopen("snowboots.out", "w", stdout);
    scanf("%d %d", &n, &b);
    for (int i = 0; i < n; i++)
        scanf("%d", &f[i]);
    for (int i = 0; i < b; i++) {
        scanf("%d %d", &s[i], &d[i]);
    }
    memset(dp, -1, sizeof dp);
    printf("%d\n", rec(0, 0));
    return 0;
}
This is my solution to this problem (in C++).
It is just a recursion. As the problem says,
you can change boots, or
you can jump with the current boots.
The memoization part is done by the 2-dimensional array dp[][].
One way is to solve it using BFS. You may refer to the code below for details. Hope this helps.
import java.util.*;
import java.io.*;

public class SnowBoots {

    public static int n;
    public static int[] deep;
    public static int nBoots;
    public static Boot[] boots;

    public static void main(String[] args) throws Exception {
        // Read the grid.
        Scanner stdin = new Scanner(new File("snowboots.in"));

        // Read in all of the input.
        n = stdin.nextInt();
        nBoots = stdin.nextInt();
        deep = new int[n];
        for (int i = 0; i < n; ++i) {
            deep[i] = stdin.nextInt();
        }
        boots = new Boot[nBoots];
        for (int i = 0; i < nBoots; ++i) {
            int d = stdin.nextInt();
            int s = stdin.nextInt();
            boots[i] = new Boot(d, s);
        }
        PrintWriter out = new PrintWriter(new FileWriter("snowboots.out"));
        out.println(bfs());
        out.close();
        stdin.close();
    }

    // Breadth First Search Algorithm [https://en.wikipedia.org/wiki/Breadth-first_search]
    public static int bfs() {
        // These are all valid states.
        boolean[][] used = new boolean[n][nBoots];
        Arrays.fill(used[0], true);

        // Put each of these states into the queue.
        LinkedList<Integer> q = new LinkedList<Integer>();
        for (int i = 0; i < nBoots; ++i) {
            q.offer(i);
        }

        // Usual bfs.
        while (q.size() > 0) {
            int cur = q.poll();
            int step = cur / nBoots;
            int bNum = cur % nBoots;

            // Try stepping with this boot...
            for (int i = 1; ((step + i) < n) && (i <= boots[bNum].maxStep); ++i) {
                if ((deep[step + i] <= boots[bNum].depth) && !used[step + i][bNum]) {
                    q.offer(nBoots * (step + i) + bNum);
                    used[step + i][bNum] = true;
                }
            }

            // Try switching to another boot.
            for (int i = bNum + 1; i < nBoots; ++i) {
                if ((boots[i].depth >= deep[step]) && !used[step][i]) {
                    q.offer(nBoots * step + i);
                    used[step][i] = true;
                }
            }
        }

        // Find the earliest boot that got us here.
        for (int i = 0; i < nBoots; ++i) {
            if (used[n - 1][i]) {
                return i;
            }
        }

        // Should never get here.
        return -1;
    }
}

class Boot {
    public int depth;
    public int maxStep;

    public Boot(int depth, int maxStep) {
        this.depth = depth;
        this.maxStep = maxStep;
    }
}

How to merge an array of size M into another array of size 2M

I got this question in an online code challenge. I needed to merge two sorted arrays, one of size M with M elements into another one, with M elements and capacity 2M. I provided the following solution:
class ArraysSorting {

    /**
     * Function to move the elements of the bigger array to its end.
     *
     * @param bigger array
     */
    void moveToEnd(int bigger[]) {
        int i = 0, j = bigger.length - 1;
        for (i = bigger.length - 1; i >= 0; i--)
            if (bigger[i] != 0) {
                bigger[j] = bigger[i];
                j--;
            }
    }

    /**
     * Merges the smaller array of size M into the bigger array of size 2M.
     *
     * @param bigger array
     * @param smaller array
     */
    void merge(int bigger[], int smaller[]) {
        moveToEnd(bigger);
        int i = smaller.length;
        int j = 0;
        int k = 0;
        while (k < (bigger.length)) {
            if ((i < (bigger.length) && bigger[i] <= smaller[j]) || (j == smaller.length)) {
                bigger[k] = bigger[i];
                k++;
                i++;
            } else {
                bigger[k] = smaller[j];
                k++;
                j++;
            }
        }
    }
}
Is there a more efficient way to do this?
The time complexity: O(2M).
You can't beat linear time because you have to at least scan all the 2M elements, so that's the best you can ever get.
In practice though, you can optimize it a little further. There's no need to shift the elements of bigger towards the end; just write the results right-to-left rather than left-to-right (and of course, you'll need to invert the comparison, at any step you'll want to select the largest element rather than the smallest).
Also, this is not good:
if ((i < (bigger.length) && bigger[i] <= smaller[j]) || (j == smaller.length)) {
    /* ... */
}
You should test j == smaller.length before accessing smaller[j]; the code as is will possibly access out of bounds positions in smaller. Do this instead:
if ((j == smaller.length) || (i < (bigger.length) && bigger[i] <= smaller[j])) {
    /* ... */
}
Overall, I do think you can make the code simpler, here's something that in my opinion is easier to read because the if conditions are smaller and easier to understand (it also approaches the traditional way in which you merge two arrays and avoids the extra O(M) work of shifting the elements to the back):
void merge(int bigger[], size_t bigger_len, int smaller[], size_t smaller_len) {
    ssize_t smaller_i, bigger_i, idx;

    if (smaller_len == 0)
        return;

    smaller_i = smaller_len - 1;

    if (bigger_len == 0)
        bigger_i = -1;
    else
        bigger_i = bigger_len - 1;

    idx = bigger_len + smaller_len - 1;

    while (smaller_i >= 0 && bigger_i >= 0) {
        if (bigger[bigger_i] > smaller[smaller_i]) {
            bigger[idx] = bigger[bigger_i];
            bigger_i--;
        } else {
            bigger[idx] = smaller[smaller_i];
            smaller_i--;
        }
        idx--;
    }

    while (smaller_i >= 0) {
        bigger[idx] = smaller[smaller_i];
        smaller_i--;
        idx--;
    }
}
It's easy to see that the first loop runs as long as a comparison between two elements in the different arrays is possible (rather than having the loop always run and use complicated if tests inside). Also note that since output is being written to bigger, once the first loop terminates, we only need to make sure that the rest (if any) of smaller is copied over to bigger.
The code is in C, but it's pretty much the same in Java. bigger_len and smaller_len are the number of elements in bigger and in smaller; it is assumed that bigger has enough space for bigger_len + smaller_len elements. The initial if tests that assign smaller_i and bigger_i are necessary to handle edge cases where subtracting 1 would overflow the (unsigned) range of size_t; they are unnecessary in Java, since Java doesn't have unsigned types (correct me if I'm wrong, I haven't done Java recently).
Time complexity remains O(M).
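Since the question is about Java, here is a hedged line-by-line translation of the C merge above; the method name and parameter convention (biggerLen counts the real elements at the front of bigger) are mine:

// Merges smaller into bigger, writing right-to-left so no shifting or
// temporary buffer is needed. bigger must have capacity for
// biggerLen + smaller.length elements.
static void merge(int[] bigger, int biggerLen, int[] smaller) {
    int bi = biggerLen - 1;                    // last real element of bigger
    int si = smaller.length - 1;               // last element of smaller
    int idx = biggerLen + smaller.length - 1;  // write position, right to left
    while (si >= 0 && bi >= 0) {
        bigger[idx--] = (bigger[bi] > smaller[si]) ? bigger[bi--] : smaller[si--];
    }
    while (si >= 0) {                          // leftovers of smaller, if any
        bigger[idx--] = smaller[si--];
    }
    // any leftovers of bigger are already in place
}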

Java Benchmark for recursive stairs climbing puzzle

This now very common algorithm question was asked by a proctor during a whiteboard exam session. My job was to observe, listen to, and objectively judge the answers given, but I had no control over the question asked, nor could I interact with the person answering.
Five minutes were given to analyze the problem, during which the candidate could write bullet notes or pseudo code (this was allowed during the actual code-writing as well, as long as it was clearly indicated, and people who included pseudo-code as comments or TODO tasks before figuring out the algorithm got bonus points).
"A child is climbing up a staircase with n steps, and can hop either 1 step, 2 steps, or 3 steps at a time. Implement a method to count how many possible ways the child can jump up the stairs."
The person who got this question couldn't get started on the recursion algorithm on the spot, so the proctor eventually led him, piece by piece, to HIS solution, which in my opinion was not optimal (well, different from my chosen solution, making it difficult to grade someone objectively with respect to code optimization).
Proctor:
public class Staircase {

    public static int stairs;

    public Staircase() {
        int a = counting(stairs);
        System.out.println(a);
    }

    static int counting(int n) {
        if (n < 0)
            return 0;
        else if (n == 0)
            return 1;
        else
            return counting(n - 1) + counting(n - 2) + counting(n - 3);
    }

    public static void main(String[] args) {
        Staircase child;
        long t1 = System.nanoTime();
        for (int i = 0; i < 30; i++) {
            stairs = i;
            child = new Staircase();
        }
        System.out.println("Time:" + ((System.nanoTime() - t1) / 1000000));
    }
}
Mine:
public class Steps {

    public static int stairs;
    int c2 = 0;

    public Steps() {
        int a = step2(0);
        System.out.println(a);
    }

    public static void main(String[] args) {
        Steps steps;
        long t1 = System.nanoTime();
        for (int i = 0; i < 30; i++) {
            stairs = i;
            steps = new Steps();
        }
        System.out.println("Time:" + ((System.nanoTime() - t1) / 1000000));
    }

    public int step2(int c) {
        if (c + 1 < stairs) {
            if (c + 2 <= stairs) {
                if (c + 3 <= stairs) {
                    step2(c + 3);
                }
                step2(c + 2);
            }
            step2(c + 1);
        } else {
            c2++;
        }
        return c2;
    }
}
OUTPUT:
Proctor: Time: 356
Mine: Time: 166
Could someone clarify which algorithm is better or more optimal? The execution time of my algorithm appears to be less than half as long (but I am referencing and updating an additional integer, which I thought was rather inconsequential), and it allows for setting an arbitrary starting and ending step without needing to first know their difference (although for anything higher than n = 40 you will need a beast of a CPU).
My question (feel free to ignore the above example): how do you properly benchmark a similar recursion-based problem (Tower of Hanoi, etc.)? Do you just look at the timing, or take other things into consideration (heap usage?)?
Teaser: You may perform this computation easily in less than one millisecond. Details follow...
Which one is "better"?
The question of which algorithm is "better" may refer to the execution time, but also to other things, like the implementation style.
The Staircase implementation is shorter, more concise and IMHO more readable. And more importantly: it does not involve state. The c2 variable that you introduced there destroys the advantages (and beauty) of a purely functional recursive implementation. This may easily be fixed, although the implementation then becomes more similar to the Staircase one.
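For illustration, that fix could look like this (my sketch, keeping the counting-up structure of Steps but returning the count instead of mutating c2):

// Hypothetical stateless variant of step2: same guards and base case,
// but each call returns the number of ways instead of bumping a field.
static int stepsUp(int c, int stairs) {
    if (c + 1 >= stairs) {
        return 1; // base case, mirroring the c2++ branch
    }
    int ways = stepsUp(c + 1, stairs);
    if (c + 2 <= stairs) {
        ways += stepsUp(c + 2, stairs);
    }
    if (c + 3 <= stairs) {
        ways += stepsUp(c + 3, stairs);
    }
    return ways;
}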
Measuring performance
Regarding the question about execution time: Properly measuring execution time in Java is tricky.
Related reading:
How do I write a correct micro-benchmark in Java?
Java theory and practice: Anatomy of a flawed microbenchmark
HotSpot Internals
In order to properly and reliably measure execution times, there exist several options. Apart from a profiler, like VisualVM, there are frameworks like JMH or Caliper, but admittedly, using them may be some effort (a minimal JMH sketch follows the checklist below).
For the simplest form of a very basic, manual Java Microbenchmark you have to consider the following:
Run the algorithms multiple times, to give the JIT a chance to kick in
Run the algorithms alternatingly and not only one after the other
Run the algorithms with increasing input size
Somehow save and print the results of the computation, to prevent the computation from being optimized away
Don't print anything to the console during the benchmark
Consider that timings may be distorted by the garbage collector (GC)
Again: These are only rules of thumb, and there may still be unexpected results (refer to the links above for more details). But with this strategy, you usually obtain a good indication about the performance, and at least can see whether it's likely that there really are significant differences between the algorithms.
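For completeness, a minimal JMH benchmark for the Staircase method might look like the sketch below (assuming the JMH dependencies are on the classpath; the class name and parameter values are illustrative). The framework then takes care of warmup, alternation, and keeping results alive:

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class StairsBench {

    @Param({"20", "25", "30"}) // increasing input sizes, as suggested above
    public int n;

    @Benchmark
    public int staircase() {
        // returning the result keeps the JIT from optimizing the call away
        return Staircase.counting(n);
    }
}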
The differences between the approaches
The Staircase implementation and the Steps implementation are not very different.
The main conceptual difference is that the Staircase implementation is counting down, and the Steps implementation is counting up.
The main difference that actually affects the performance is how the Base Case is handled (see Recursion on Wikipedia). In your implementation, you avoid calling the method recursively when it is not necessary, at the cost of some additional if statements. The Staircase implementation uses a very generic treatment of the base case, by just checking whether n < 0.
One could consider an "intermediate" solution that combines ideas from both approaches:
class Staircase2
{
    public static int counting(int n)
    {
        int result = 0;
        if (n >= 1)
        {
            result += counting(n - 1);
            if (n >= 2)
            {
                result += counting(n - 2);
                if (n >= 3)
                {
                    result += counting(n - 3);
                }
            }
        }
        else
        {
            result += 1;
        }
        return result;
    }
}
It's still recursive without state, and it sums up the intermediate results, avoiding many of the "useless" calls by using some if queries. It's already noticeably faster than the original Staircase implementation, but still a tad slower than the Steps implementation.
Why both solutions are slow
For both implementations, there's not really anything to be computed: each method consists of a few if statements and some additions. The most expensive thing here is actually the recursion itself, with the deeply nested call tree.
And that's the key point here: It's a call tree. Imagine what it is computing for a given number of steps, as a "pseudocode call hierarchy":
compute(5)
  compute(4)
    compute(3)
      compute(2)
        compute(1)
          compute(0)
        compute(0)
      compute(1)
        compute(0)
      compute(0)
    compute(2)
      compute(1)
        compute(0)
      compute(0)
    compute(1)
      compute(0)
  compute(3)
    compute(2)
      compute(1)
        compute(0)
      compute(0)
    compute(1)
      compute(0)
    compute(0)
  compute(2)
    compute(1)
      compute(0)
    compute(0)
One can imagine that this grows exponentially when the number becomes larger, and all the results are computed hundreds, thousands, or millions of times. This can be avoided.
The fast solution
The key idea to make the computation faster is to use Dynamic Programming. This basically means that intermediate results are stored for later retrieval, so that they don't have to be computed again and again.
It's implemented in this example, which also compares the execution time of all approaches:
import java.util.Arrays;

public class StaircaseSteps
{
    public static void main(String[] args)
    {
        for (int i = 5; i < 33; i++)
        {
            runStaircase(i);
            runSteps(i);
            runDynamic(i);
        }
    }

    private static void runStaircase(int max)
    {
        long before = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < max; i++)
        {
            sum += Staircase.counting(i);
        }
        long after = System.nanoTime();
        System.out.println("Staircase up to " + max + " gives " + sum + " time " + (after - before) / 1e6);
    }

    private static void runSteps(int max)
    {
        long before = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < max; i++)
        {
            sum += Steps.step(i);
        }
        long after = System.nanoTime();
        System.out.println("Steps up to " + max + " gives " + sum + " time " + (after - before) / 1e6);
    }

    private static void runDynamic(int max)
    {
        long before = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < max; i++)
        {
            sum += StaircaseDynamicProgramming.counting(i);
        }
        long after = System.nanoTime();
        System.out.println("Dynamic up to " + max + " gives " + sum + " time " + (after - before) / 1e6);
    }
}

class Staircase
{
    public static int counting(int n)
    {
        if (n < 0)
            return 0;
        else if (n == 0)
            return 1;
        else
            return counting(n - 1) + counting(n - 2) + counting(n - 3);
    }
}

class Steps
{
    static int c2 = 0;
    static int stairs;

    public static int step(int c)
    {
        c2 = 0;
        stairs = c;
        return step2(0);
    }

    private static int step2(int c)
    {
        if (c + 1 < stairs)
        {
            if (c + 2 <= stairs)
            {
                if (c + 3 <= stairs)
                {
                    step2(c + 3);
                }
                step2(c + 2);
            }
            step2(c + 1);
        }
        else
        {
            c2++;
        }
        return c2;
    }
}

class StaircaseDynamicProgramming
{
    public static int counting(int n)
    {
        int results[] = new int[n + 1];
        Arrays.fill(results, -1);
        return counting(n, results);
    }

    private static int counting(int n, int results[])
    {
        int result = results[n];
        if (result == -1)
        {
            result = 0;
            if (n >= 1)
            {
                result += counting(n - 1, results);
                if (n >= 2)
                {
                    result += counting(n - 2, results);
                    if (n >= 3)
                    {
                        result += counting(n - 3, results);
                    }
                }
            }
            else
            {
                result += 1;
            }
        }
        results[n] = result;
        return result;
    }
}
The results on my PC are as follows:
...
Staircase up to 29 gives 34850335 time 310.672814
Steps up to 29 gives 34850335 time 112.237711
Dynamic up to 29 gives 34850335 time 0.089785
Staircase up to 30 gives 64099760 time 578.072582
Steps up to 30 gives 64099760 time 204.264142
Dynamic up to 30 gives 64099760 time 0.091524
Staircase up to 31 gives 117897840 time 1050.152703
Steps up to 31 gives 117897840 time 381.293274
Dynamic up to 31 gives 117897840 time 0.084565
Staircase up to 32 gives 216847936 time 1929.43348
Steps up to 32 gives 216847936 time 699.066728
Dynamic up to 32 gives 216847936 time 0.089089
Small changes in the order of statements and similar "micro-optimizations" may have a small impact or make a noticeable difference. But using an entirely different approach can make the real difference.

Java mergesort, should the "merge" step be done with queues or arrays?

This is not homework; I don't have money for school, so I am teaching myself while working shifts at a tollbooth on the highway (long nights with few customers).
I was trying to implement a simple "mergesort" by thinking first, stretching my brain a little if you like for some actual learning, and then looking at the solution in the manual I am using: The Algorithm Design Manual by Steven S. Skiena (Springer, 2008, ISBN 1848000693).
I came up with a solution that implements the "merge" step using an array as a buffer; I am pasting it below. The author uses queues, so I wonder:
Should queues be used instead?
What are the advantages of one method vs. the other? (Obviously his method will be better, as he is a top algorist and I am a beginner, but I can't quite pinpoint its strengths; please help me.)
What are the tradeoffs/assumptions that governed his choice?
Here is my code (I am including my implementation of the splitting function as well for the sake of completeness, but I think we are only reviewing the merge step here; I do not believe this is a Code Review post, by the way, as my questions are specific to just one method and its performance in comparison to another):
package exercises;
public class MergeSort {
private static void merge(int[] values, int leftStart, int midPoint,
int rightEnd) {
int intervalSize = rightEnd - leftStart;
int[] mergeSpace = new int[intervalSize];
int nowMerging = 0;
int pointLeft = leftStart;
int pointRight = midPoint;
do {
if (values[pointLeft] <= values[pointRight]) {
mergeSpace[nowMerging] = values[pointLeft];
pointLeft++;
} else {
mergeSpace[nowMerging] = values[pointRight];
pointRight++;
}
nowMerging++;
} while (pointLeft < midPoint && pointRight < rightEnd);
int fillFromPoint = pointLeft < midPoint ? pointLeft : pointRight;
System.arraycopy(values, fillFromPoint, mergeSpace, nowMerging,
intervalSize - nowMerging);
System.arraycopy(mergeSpace, 0, values, leftStart, intervalSize);
}
public static void mergeSort(int[] values) {
mergeSort(values, 0, values.length);
}
private static void mergeSort(int[] values, int start, int end) {
int intervalSize = end - start;
if (intervalSize < 2) {
return;
}
boolean isIntervalSizeEven = intervalSize % 2 == 0;
int splittingAdjustment = isIntervalSizeEven ? 0 : 1;
int halfSize = intervalSize / 2;
int leftStart = start;
int rightEnd = end;
int midPoint = start + halfSize + splittingAdjustment;
mergeSort(values, leftStart, midPoint);
mergeSort(values, midPoint, rightEnd);
merge(values, leftStart, midPoint, rightEnd);
}
}
Here is the reference solution from the textbook: (it's in C so I am adding the tag)
merge(item_type s[], int low, int middle, int high)
{
    int i;                  /* counter */
    queue buffer1, buffer2; /* buffers to hold elements for merging */

    init_queue(&buffer1);
    init_queue(&buffer2);

    for (i = low; i <= middle; i++) enqueue(&buffer1, s[i]);
    for (i = middle + 1; i <= high; i++) enqueue(&buffer2, s[i]);

    i = low;
    while (!(empty_queue(&buffer1) || empty_queue(&buffer2))) {
        if (headq(&buffer1) <= headq(&buffer2))
            s[i++] = dequeue(&buffer1);
        else
            s[i++] = dequeue(&buffer2);
    }

    while (!empty_queue(&buffer1)) s[i++] = dequeue(&buffer1);
    while (!empty_queue(&buffer2)) s[i++] = dequeue(&buffer2);
}
Abstractly, a queue is just some object that supports the enqueue, dequeue, peek, and is-empty operations. It can be implemented in many different ways (using a circular buffer, using linked lists, etc.).
Logically speaking, the merge algorithm is easiest to describe in terms of queues. You begin with two queues holding the values to merge together, then repeatedly apply peek, is-empty, and dequeue operations on those queues to reconstruct a single sorted sequence.
In your implementation using arrays, you are effectively doing the same thing as if you were using queues; you have just chosen to implement those queues using arrays. Neither choice is necessarily "better" or "worse" than the other. Using queues makes the high-level operation of the merge algorithm clearer, but might introduce some inefficiency (though it's hard to say for certain without benchmarking). Using arrays might be slightly more efficient (again, you should test this!), but might obscure the high-level operation of the algorithm. From Skiena's point of view, using queues might be better because it makes the high-level details of the algorithm clear. From your point of view, arrays might be better because of the performance concerns.
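For instance, a direct Java counterpart of the textbook's queue-based merge might look like this (my sketch, using ArrayDeque for the queues; the autoboxing of int to Integer here is exactly the kind of inefficiency mentioned above):

import java.util.ArrayDeque;

class QueueMerge {
    static void merge(int[] s, int low, int middle, int high) {
        ArrayDeque<Integer> buffer1 = new ArrayDeque<>();
        ArrayDeque<Integer> buffer2 = new ArrayDeque<>();
        for (int i = low; i <= middle; i++) buffer1.add(s[i]);
        for (int i = middle + 1; i <= high; i++) buffer2.add(s[i]);
        int i = low;
        // Repeatedly take the smaller head; peek/poll mirror headq/dequeue.
        while (!buffer1.isEmpty() && !buffer2.isEmpty()) {
            s[i++] = (buffer1.peek() <= buffer2.peek()) ? buffer1.poll() : buffer2.poll();
        }
        while (!buffer1.isEmpty()) s[i++] = buffer1.poll();
        while (!buffer2.isEmpty()) s[i++] = buffer2.poll();
    }
}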
Hope this helps!
You're worrying about minor constant factors which are largely down to the quality of your compiler. Given that you seem to be worried about that, arrays are your friend. Below is my C# implementation for integer merge-sort which, I think, is close to as tight as you can get. [EDIT: fixed a buglet.]
If you want to do better in practice, you need something like natural merge-sort, where, instead of merging up in powers of two, you simply merge adjacent non-decreasing sequences of the input. This is certainly no worse than powers-of-two, but is definitely faster when the input data contains some sorted sequences (i.e., anything other than a purely descending input sequence). That's left as an exercise for the student.
int[] MSort(int[] src) {
    var n = src.Length;
    var from = (int[]) src.Clone();
    var to = new int[n];

    for (var span = 1; span < n; span += span) {
        var i = 0;
        for (var j = 0; j < n; j += span + span) {
            var l = j;
            var lend = Math.Min(l + span, n);
            var r = lend;
            var rend = Math.Min(r + span, n);
            while (l < lend && r < rend) to[i++] = (from[l] <= from[r] ? from[l++] : from[r++]);
            while (l < lend) to[i++] = from[l++];
            while (r < rend) to[i++] = from[r++];
        }
        var tmp = from; from = to; to = tmp;
    }
    return from;
}
