Explanation of Dynamic Programming solution - java

This is the problem: given a number of bricks n, between 3 and 200, return the number of different staircases that can be built. Each staircase should consist of 2 or more steps. No two steps are allowed to be at the same height: each step must be lower than the previous one. All steps must contain at least one brick. A step's height is defined as the total number of bricks that make up that step.
For example, when n = 3, you have only 1 choice of how to build the staircase, with the first step having a height of 2 and the second step having a height of 1 (# indicates a brick):
#
##
21
When n = 4, you still have only 1 staircase choice:
#
#
##
31
But when n = 5, there are two ways you can build a staircase from the given bricks. The two staircases can have heights (4, 1) or (3, 2), as shown below:
#
#
#
##
41
#
##
##
32
I found a solution online, but I don't quite intuitively understand the dynamic programming solution.
public class Answer {
    static int[][] p = new int[201][201];
    public static void fillP() {
        p[1][1] = 1;
        p[2][2] = 1;
        for (int w = 3; w < 201; w++) {
            for (int m = 1; m <= w; m++) {
                if (w - m == 0) {
                    p[w][m] = 1 + p[w][m - 1];
                } else if (w - m < m) {
                    p[w][m] = p[w - m][w - m] + p[w][m - 1];
                } else if (w - m == m) {
                    p[w][m] = p[m][m - 1] + p[w][m - 1];
                } else if (w - m > m) {
                    p[w][m] = p[w - m][m - 1] + p[w][m - 1];
                }
            }
        }
    }
    public static int answer(int n) {
        fillP();
        return p[n][n] - 1;
    }
}
In particular, how would one come up with the relationships between each successive entry in the array?

This is a very interesting question. First, let's try to understand the recurrence relation:
If we have just built a step of height h and have b bricks left to use, the number of ways to complete the staircase from here is the sum, over every height h' with 0 < h' < h, of the number of ways to complete the staircase after placing a next step of height h', which leaves b - h' bricks.
Once we have that recurrence relation, we can devise a recursive solution; however, in its current state, the solution runs in exponential time. So we just need to "cache" (memoize) our results:
import java.util.Scanner;
public class Stairs {
    static int LIMIT = 200;
    static int DIRTY = -1;
    static int[][] cache = new int[LIMIT + 2][LIMIT + 2];
    public static void clearCache() {
        for (int i = 0; i <= LIMIT + 1; i++) {
            for (int j = 0; j <= LIMIT + 1; j++) {
                // mark cache as dirty/garbage values
                cache[i][j] = DIRTY;
            }
        }
    }
    public static int numberOfStaircases(int level, int bricks, int steps) {
        // base cases
        if (bricks < 0) return 0;
        if (bricks == 0 && steps >= 2) return 1;
        // only compute answer if we haven't already
        if (cache[level][bricks] == DIRTY) {
            int ways = 0;
            for (int nextLevel = level - 1; nextLevel > 0; nextLevel--) {
                ways += numberOfStaircases(nextLevel, bricks - nextLevel, steps + 1);
            }
            cache[level][bricks] = ways;
        }
        return cache[level][bricks];
    }
    public static int answer(int n) {
        clearCache();
        return numberOfStaircases(n + 1, n, 0);
    }
    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);
        int n = scanner.nextInt();
        System.out.println(answer(n));
    }
}
From the code you provided, it seems the author went one step further and replaced the recursive solution with a purely iterative version. In other words, the author wrote a bottom-up solution rather than a top-down one.
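To connect this to the table in the question, here is a compact bottom-up sketch (the names `StaircaseTable`, `q` and `staircases` are hypothetical, not from the question). As far as I can tell, the question's `p[w][m]` counts the ways to write w as a sum of distinct parts, each at most m. Every such sum either does not use m, which gives `p[w][m-1]`, or uses m as its largest part, leaving w - m to be split into distinct parts smaller than m. The four `if` branches in `fillP` are that single rule with the cap clamped, because the table is only filled for m <= w; filling the whole table removes the case analysis:

```java
// Sketch: the four branches of fillP collapse into one rule when the whole table is filled.
// q[w][m] = number of ways to write w as a sum of distinct parts, each part <= m.
final class StaircaseTable {

    static long staircases(int n) {
        long[][] q = new long[n + 1][n + 1];
        // The empty sum: 0 can be written in exactly one way, whatever the cap m is.
        for (int m = 0; m <= n; m++) {
            q[0][m] = 1;
        }
        for (int w = 1; w <= n; w++) {
            for (int m = 1; m <= n; m++) {
                // Either m is not used, so the cap drops to m - 1 ...
                q[w][m] = q[w][m - 1];
                // ... or m is the largest part and the rest uses distinct parts < m.
                if (m <= w) {
                    q[w][m] += q[w - m][m - 1];
                }
            }
        }
        // Subtract the one-step "staircase" made of all n bricks.
        return q[n][n] - 1;
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 6; n++) {
            System.out.println(n + " -> " + staircases(n));
        }
    }
}
```

`q[n][n]` counts every set of distinct parts summing to n, including the single part n, which is why both this sketch and the question's `answer` subtract 1.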
We could also approach the problem more mathematically:
How many partitions of n into distinct parts are there, not counting the trivial partition consisting of n itself?
For n = 6, we have 5 + 1, 4 + 2 and 3 + 2 + 1, so answer(6) = 3. Interestingly enough, Euler proved that the number of partitions of n into distinct parts is always the same as the number of partitions of n into odd parts (not necessarily distinct).
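A minimal sketch of this view, assuming the usual one-dimensional counting arrays (the class and method names are hypothetical). Iterating sums downward lets each part size be used at most once, which counts partitions into distinct parts; iterating upward allows repeats, which counts partitions into odd parts. The `main` method checks Euler's identity for small n:

```java
// Sketch: count partitions into distinct parts and into odd parts with 1-D arrays,
// then check Euler's identity for small n.
final class EulerPartitions {

    // Distinct parts: each part size is used at most once, so sweep sums downward.
    static long distinctPartitions(int n) {
        long[] ways = new long[n + 1];
        ways[0] = 1;
        for (int part = 1; part <= n; part++) {
            for (int sum = n; sum >= part; sum--) {
                ways[sum] += ways[sum - part];
            }
        }
        return ways[n];
    }

    // Odd parts: each odd size may repeat, so sweep sums upward.
    static long oddPartitions(int n) {
        long[] ways = new long[n + 1];
        ways[0] = 1;
        for (int part = 1; part <= n; part += 2) {
            for (int sum = part; sum <= n; sum++) {
                ways[sum] += ways[sum - part];
            }
        }
        return ways[n];
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 30; n++) {
            if (distinctPartitions(n) != oddPartitions(n)) {
                throw new AssertionError("Euler's identity failed at n = " + n);
            }
        }
        // Staircases need at least two steps, so drop the single part n itself.
        System.out.println("answer(6) = " + (distinctPartitions(6) - 1));
    }
}
```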
(As a side note, I know where this question comes from. Good luck!)

A good explanation of this problem ("The Grandest Staircase Of Them All"), with several different solutions, is on this page:
https://jtp.io/2016/07/26/dynamic-programming-python.html

To build a staircase, we can think of it as a pyramid of the remaining bricks, built on top of each step as we ascend and complete the staircase.
With n bricks, we can put i bricks on top of the first step, which leaves n - i bricks for the first step itself. The number of ways to build a multilevel staircase of n bricks whose first step has n - i bricks is then the number of ways to build the rest of the staircase with the i remaining bricks, where the rest can be either multilevel or a single step (and must stay below n - i). Applying this rule recursively, starting from the zeroth step with n bricks, gives the total number of possible staircases.
To avoid recalculating the same results for a pyramid of i bricks, we can use an in-memory cache that stores the number of possible staircases for n bricks with k as the previous step. The key needs k because the possible staircases depend on the step the pyramid is placed on: two steps must never have the same height, and each step must be smaller than the previous one.
package com.dp;
import java.util.HashMap;
import java.util.Map;
public class Staircases {
    private static Map<String, Long> cacheNumberStaircasesForNBricks = new HashMap<String, Long>();
    public static void main(String[] args) {
        // 200 is the problem's upper limit; for much larger inputs (e.g. 1000) the count would overflow long
        int bricks = 200;
        long start = System.currentTimeMillis();
        long numberOfStaircases = getStaircases(bricks, Integer.MAX_VALUE, true);
        long end = System.currentTimeMillis();
        System.out.println(numberOfStaircases);
        System.out.println("Time taken " + (end - start) + " ms");
    }
    /*
     * For n remaining bricks, returns the number of staircases that can be formed with
     * no two steps of equal height, given that the previous (last) step has k bricks.
     * If multilevelOnly is true, a single step of n bricks is not counted.
     */
    private static long getStaircases(int n, int k, boolean multilevelOnly) {
        /*
         * if the last step had n bricks, a single step of n bricks can't be the next step,
         * hence the staircase needs to be multilevel
         */
        if (n == k) {
            multilevelOnly = true;
        }
        /*
         * for n less than 3, i.e. 1 or 2, only one staircase (a single step) is possible,
         * and only if the last step has more bricks
         */
        if (n < 3) {
            if (k <= n) {
                return 0;
            }
            return 1;
        }
        /*
         * for n = 3, if only multilevel staircases are allowed, there is only one
         * combination, i.e. 2,1.
         */
        if (n == 3) {
            if (k < n) {
                return 0;
            }
            if (multilevelOnly) {
                return 1;
            }
        }
        /*
         * look up the in-memory cache: don't recompute if we have already computed the result
         * for the last step (k) and the bricks left (n) to build the rest of the staircase
         */
        String cacheKey = n + "-" + k;
        if (cacheNumberStaircasesForNBricks.get(cacheKey) != null) {
            return cacheNumberStaircasesForNBricks.get(cacheKey);
        }
        /*
         * start with one case, which is a single step of n bricks.
         * for multilevel-only staircases, or when the last step is smaller (invalid scenario),
         * start the count at zero
         */
        long numberOfStaircases = multilevelOnly || k < n ? 0 : 1;
        for (int i = 1; n - i > 0; i++) {
            // the current step (n - i bricks) must be smaller than the last step; i bricks remain
            if (n - i < k) {
                numberOfStaircases += getStaircases(i, n - i, false);
            }
        }
        cacheNumberStaircasesForNBricks.put(cacheKey, numberOfStaircases);
        return numberOfStaircases;
    }
}

Related

How to calculate the probability of getting the sum X using N six-sided dice

The Challenge:
For example, what is the probability of getting a sum of 15 when rolling 3 six-sided dice? This can happen, for example, by rolling 5-5-5, 6-6-3, 3-6-6, or many other combinations.
A brute force solution for 2 dice - with complexity of 6^2:
Assuming we had only 2 six-sided dice, we could write very basic code like this:
public static void main(String[] args) {
    System.out.println(whatAreTheOdds(7));
}
public static double whatAreTheOdds(int wantedSum){
    if (wantedSum < 2 || wantedSum > 12){
        return 0;
    }
    int wantedFound = 0;
    int totalOptions = 36;
    for (int i = 1; i <= 6; i++) {
        for (int j = 1; j <= 6; j++) {
            int sum = i+j;
            if (sum == wantedSum){
                System.out.println("match: " + i + " " + j );
                wantedFound +=1;
            }
        }
    }
    System.out.println("combinations count:" + wantedFound);
    return (double)wantedFound / totalOptions;
}
And the output for 7 will be:
match: 1 6
match: 2 5
match: 3 4
match: 4 3
match: 5 2
match: 6 1
combinations count:6
0.16666666666666666
The question is how to generalize the algorithm to support N dice:
public static double whatAreTheOdds(int wantedSum, int numberOfDices)
Because we can't dynamically create nested for loops, we must come up with a different approach.
I thought of something like this:
public static double whatAreTheOdds(int sum, int numberOfDices){
    int sum;
    for (int i = 0; i < numberOfDices; i++) {
        for (int j = 1; j <= 6; j++) {
        }
    }
}
but failed to come up with the right algorithm.
Another challenge: is there a way to do this efficiently, rather than with a complexity of 6^N?
Here is a recursive solution with memoization to count the combinations.
import java.util.Arrays;
import java.lang.Math;
class Dices {
    public static final int DICE_FACES = 6;
    public static void main(String[] args) {
        System.out.println(whatAreTheOdds(40, 10));
    }
    public static double whatAreTheOdds(int sum, int dices) {
        if (dices < 1 || sum < dices || sum > DICE_FACES * dices) return 0;
        long[][] mem = new long[dices][sum];
        for (long[] mi : mem) {
            Arrays.fill(mi, 0L);
        }
        long n = whatAreTheOddsRec(sum, dices, mem);
        return n / Math.pow(DICE_FACES, dices);
    }
    private static long whatAreTheOddsRec(int sum, int dices, long[][] mem) {
        if (dices <= 1) {
            return 1;
        }
        long n = 0;
        int dicesRem = dices - 1;
        int minFace = Math.max(sum - DICE_FACES * dicesRem, 1);
        int maxFace = Math.min(sum - dicesRem, DICE_FACES);
        for (int i = minFace; i <= maxFace; i++) {
            int sumRem = sum - i;
            long ni = mem[dicesRem][sumRem];
            if (ni <= 0) {
                ni = whatAreTheOddsRec(sumRem, dicesRem, mem);
                mem[dicesRem][sumRem] = ni;
            }
            n += ni;
        }
        return n;
    }
}
Output:
0.048464367913724195
EDIT: For the record, this answer mainly aims to give a possible implementation for the general case that is better than the simplest one, using memoization and search-space pruning (exploring only feasible solutions). Because each (remaining dice, remaining sum) pair is computed at most once, with at most 6 iterations each, the work is O(6 * dices * sum), i.e. O(n^2) for n dice, rather than O(6^n).
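For comparison, here is a bottom-up sketch of the same counting recurrence (the names `DiceTable`, `countWays` and `probability` are hypothetical, not from the answer). `ways[d][t]` is the number of ordered rolls of d dice that sum to t, filled from fewer dice to more, so the work is proportional to dice × target × faces. It assumes the count fits in a `long`, which holds for up to 24 six-sided dice since 6^24 < 2^63:

```java
// Sketch: bottom-up table for the number of ordered rolls of `dice` dice
// with `faces` faces each that sum to `target`.
final class DiceTable {

    static long countWays(int target, int dice, int faces) {
        if (dice < 1 || target < dice || target > faces * dice) {
            return 0;
        }
        long[][] ways = new long[dice + 1][target + 1];
        ways[0][0] = 1; // zero dice sum to 0 in exactly one way
        for (int d = 1; d <= dice; d++) {
            for (int t = 1; t <= target; t++) {
                // The last die shows face f; the other d - 1 dice must sum to t - f.
                for (int f = 1; f <= faces && f <= t; f++) {
                    ways[d][t] += ways[d - 1][t - f];
                }
            }
        }
        return ways[dice][target];
    }

    static double probability(int target, int dice, int faces) {
        return countWays(target, dice, faces) / Math.pow(faces, dice);
    }

    public static void main(String[] args) {
        System.out.println(countWays(7, 2, 6)); // 6
        System.out.println(probability(40, 10, 6));
    }
}
```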
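As Alex's answer notes, there is a combinatorial formula for this:
$$P(p, n, s) = \frac{1}{s^n} \sum_{k=0}^{\lfloor (p-n)/s \rfloor} (-1)^k \binom{n}{k} \binom{p - s k - 1}{n - 1}$$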
In this formula, p is the sum of the numbers rolled (X in your question), n is the number of dice, and s is the number of sides each die has (6 in your question). Whether the binomial coefficients are evaluated using loops or precomputed using Pascal's triangle, the time complexity is O(n^2) if we take s = 6 to be a constant and X - n to be O(n).
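A minimal sketch of evaluating this formula exactly, assuming the standard inclusion-exclusion form count(p, n, s) = sum over 0 <= k <= floor((p - n) / s) of (-1)^k · C(n, k) · C(p - s·k - 1, n - 1), with the probability being count / s^n. The class and method names are hypothetical, and the `long` arithmetic is only safe for a modest number of dice:

```java
// Sketch of the inclusion-exclusion dice formula with exact long arithmetic.
final class DiceFormula {

    // C(n, k) via a running product; each intermediate value is itself a binomial
    // coefficient, so the division is always exact.
    static long binomial(int n, int k) {
        if (k < 0 || k > n) {
            return 0;
        }
        k = Math.min(k, n - k);
        long result = 1;
        for (int i = 1; i <= k; i++) {
            result = result * (n - k + i) / i;
        }
        return result;
    }

    // Number of ordered rolls of n dice with s sides that sum to p.
    static long countWays(int p, int n, int s) {
        if (n < 1 || p < n || p > n * s) {
            return 0;
        }
        long total = 0;
        for (int k = 0; k <= (p - n) / s; k++) {
            long term = binomial(n, k) * binomial(p - s * k - 1, n - 1);
            total += (k % 2 == 0) ? term : -term;
        }
        return total;
    }

    static double probability(int p, int n, int s) {
        return countWays(p, n, s) / Math.pow(s, n);
    }

    public static void main(String[] args) {
        System.out.println(countWays(15, 3, 6)); // 10
    }
}
```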
Here is an alternative algorithm, which computes all of the probabilities at once. The idea is to use discrete convolution to compute the distribution of the sum of two random variables given their distributions. By using a divide and conquer approach as in the exponentiation by squaring algorithm, we only have to do O(log n) convolutions.
The pseudocode is below; sum_distribution(v, n) returns an array where the value at index X - n is the number of combinations where the sum of n dice rolls is X.
// for exact results using integers, let v = [1, 1, 1, 1, 1, 1]
// and divide the result through by 6^n afterwards
let v = [1/6.0, 1/6.0, 1/6.0, 1/6.0, 1/6.0, 1/6.0]
// call as sum_distribution(v, n)
sum_distribution(distribution, n)
    if n == 0
        return [1]
    else if n == 1
        return distribution
    else
        let r = convolve(distribution, distribution)
        // the division here rounds down
        let d = sum_distribution(r, n / 2)
        if n is even
            return d
        else
            return convolve(d, distribution)
Since convolution is not a linear-time operation, the running time is dominated by the last convolutions, on arrays of length at most about 3n; the other convolutions are on much shorter arrays.
This means that with a simple convolution algorithm it should take O(n^2) time to compute all of the probabilities, and with a fast Fourier transform it should take O(n log n) time.
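A runnable sketch of this idea, assuming exact integer counts (v = [1, 1, 1, 1, 1, 1]) and the plain quadratic convolution rather than an FFT; the class and method names are hypothetical. Note that the recursion has to return, and multiply by, the `distribution` it was called with, not the original single-die `v`:

```java
import java.util.Arrays;

// Sketch of the divide-and-conquer convolution idea with exact integer counts
// and a plain O(a * b) convolution (no FFT).
final class DiceConvolution {

    // Discrete convolution: distribution of X + Y from the distributions of X and Y.
    static long[] convolve(long[] a, long[] b) {
        long[] out = new long[a.length + b.length - 1];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < b.length; j++) {
                out[i + j] += a[i] * b[j];
            }
        }
        return out;
    }

    // Counts for the sum of n independent copies of `dist`, by repeated squaring.
    static long[] sumDistribution(long[] dist, int n) {
        if (n == 0) {
            return new long[] {1};
        }
        if (n == 1) {
            return dist;
        }
        long[] half = sumDistribution(convolve(dist, dist), n / 2);
        return (n % 2 == 0) ? half : convolve(half, dist);
    }

    public static void main(String[] args) {
        long[] die = new long[6];
        Arrays.fill(die, 1L); // one way to roll each face 1..6
        long[] three = sumDistribution(die, 3);
        // Index X - 3 holds the number of ways three dice sum to X.
        System.out.println(three[15 - 3]); // 10
    }
}
```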
You might want to take a look at the Wolfram article for a completely different approach, which calculates the desired probability with a single loop.
The idea is to have an array storing the current "state" of each die, starting with every die at one, and count upwards. For example, with three dice you would generate the combinations:
111
112
...
116
121
122
...
126
...
665
666
Once you have a state, you can easily find if the sum is the one you are looking for.
I leave the details to you, as it seems a useful learning exercise :)
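If you want something to compare your version against, here is one minimal sketch of the counter (hypothetical names; it assumes six-sided dice and at least one die). It still visits all 6^n states, so it is only practical for a handful of dice:

```java
import java.util.Arrays;

// Sketch of the counter idea: the dice behave like the digits of a base-6 odometer
// running from 11...1 to 66...6. This still visits all 6^n states.
final class DiceOdometer {

    static long countWays(int target, int dice) {
        int[] faces = new int[dice];
        Arrays.fill(faces, 1); // first state: every die shows 1
        long matches = 0;
        while (true) {
            int sum = 0;
            for (int f : faces) {
                sum += f;
            }
            if (sum == target) {
                matches++;
            }
            // Advance the last die; a die showing 6 rolls over to 1 and carries left.
            int pos = dice - 1;
            while (pos >= 0 && faces[pos] == 6) {
                faces[pos] = 1;
                pos--;
            }
            if (pos < 0) {
                return matches; // rolled over past 66...6: every state was visited
            }
            faces[pos]++;
        }
    }

    static double probability(int target, int dice) {
        return countWays(target, dice) / Math.pow(6, dice);
    }

    public static void main(String[] args) {
        System.out.println(countWays(15, 3)); // 10
    }
}
```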

Multithreaded Segmented Sieve of Eratosthenes in Java

I am trying to create a fast prime generator in Java. It is (more or less) accepted that the fastest way for this is the segmented sieve of Eratosthenes: https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes. Lots of optimizations can be further implemented to make it faster. As of now, my implementation generates 50847534 primes below 10^9 in about 1.6 seconds, but I am looking to make it faster and at least break the 1 second barrier. To increase the chance of getting good replies, I will include a walkthrough of the algorithm as well as the code.
Still, as a TL;DR: I am looking to add multithreading to the code.
For the purposes of this question, I want to distinguish between the 'segmented' and the 'traditional' sieves of Eratosthenes. The traditional sieve requires O(n) space and is therefore very limited in the range of input (the limit) it can handle. The segmented sieve, however, only requires O(n^0.5) space and can operate on much larger limits. (A major speed-up comes from cache-friendly segmentation, taking the L1 and L2 cache sizes of the specific computer into account.) Finally, the difference that matters for my question is that the traditional sieve is sequential: it can only continue once the previous steps are completed. The segmented sieve is not. Each segment is independent and is 'processed' individually against the sieving primes (the primes not larger than n^0.5). This means that, in theory, once I have the sieving primes, I can divide the work between multiple computers, each processing a different segment, and each one's work is independent of the others'. Assuming (wrongly) that each segment takes the same time t to complete and there are k segments, one computer would need a total time of T = k * t, whereas k computers, each working on a different segment, would need only T = t for the entire process. (In practice this is not true, but it keeps the example simple.)
This led me to read about multithreading: dividing the work among a few threads, each processing a smaller amount of work, for better use of the CPU. As I understand it, the traditional sieve cannot be multithreaded precisely because it is sequential; each thread would depend on the previous one, rendering the entire idea unfeasible. But a segmented sieve may indeed (I think) be multithreaded.
Instead of jumping straight into my question, I think it is important to introduce my code first, so I am hereby including my current fastest implementation of the segmented sieve. I have worked quite hard on it. It took quite some time, slowly tweaking and adding optimizations to it. The code is not simple. It is rather complex, I would say. I therefore assume the reader is familiar with the concepts I am introducing, such as wheel factorization, prime numbers, segmentation and more. I have included notes to make it easier to follow.
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
public class primeGen {
    public static long x = (long)Math.pow(10, 9); //limit
    public static int sqrtx;
    public static boolean [] sievingPrimes; //the sieving primes, <= sqrtx
    public static int [] wheels = new int [] {2,3,5,7,11,13,17,19}; // base wheel primes
    public static int [] gaps; //the gaps, according to the wheel. will enable skipping multiples of the wheel primes
    public static int nextp; // the first prime > wheel primes
    public static int l; // the amount of gaps in the wheel
    public static void main(String[] args)
    {
        long startTime = System.currentTimeMillis();
        preCalc(); // creating the sieving primes and calculating the list of gaps
        int segSize = Math.max(sqrtx, 32768*8); //size of each segment
        long u = nextp; // 'u' is the running index of the program. will continue from one segment to the next
        int wh = 0; // this will be the gap index, indicating by how much we increment 'u' each time, skipping the multiples of the wheel primes
        long pi = pisqrtx(); // the primes count. initialize with the number of primes <= sqrtx
        for (long low = 0 ; low < x ; low += segSize) //the heart of the code. enumerating the primes through segmentation. enumeration will begin at p > sqrtx
        {
            long high = Math.min(x, low + segSize);
            boolean [] segment = new boolean [(int) (high - low + 1)];
            int g = -1;
            for (int i = nextp ; i <= sqrtx ; i += gaps[g])
            {
                if (sievingPrimes[(i + 1) / 2])
                {
                    long firstMultiple = (long) (low / i * i);
                    if (firstMultiple < low)
                        firstMultiple += i;
                    if (firstMultiple % 2 == 0) //start with the first odd multiple of the current prime in the segment
                        firstMultiple += i;
                    for (long j = firstMultiple ; j < high ; j += i * 2)
                        segment[(int) (j - low)] = true;
                }
                g++;
                //if (g == l) //due to segment size, the full list of gaps is never used **within just one segment** , and therefore this check is redundant.
                //should be used with bigger segment sizes or smaller lists of gaps
                //g = 0;
            }
            while (u <= high)
            {
                if (!segment[(int) (u - low)])
                    pi++;
                u += gaps[wh];
                wh++;
                if (wh == l)
                    wh = 0;
            }
        }
        System.out.println(pi);
        long endTime = System.currentTimeMillis();
        System.out.println("Solution took "+(endTime - startTime) + " ms");
    }
    public static boolean [] simpleSieve (int limit) // parameter renamed from 'l', which shadowed the static gap count 'l'
    {
        long sqrtl = (long)Math.sqrt(limit);
        boolean [] primes = new boolean [limit/2+2];
        Arrays.fill(primes, true);
        int g = -1;
        for (int i = nextp ; i <= sqrtl ; i += gaps[g])
        {
            if (primes[(i + 1) / 2])
                for (int j = i * i ; j <= limit ; j += i * 2)
                    primes[(j + 1) / 2]=false;
            g++;
            if (g == l) // wrap around the gap list (l = number of gaps)
                g=0;
        }
        return primes;
    }
    public static long pisqrtx ()
    {
        int pi = wheels.length;
        if (x < wheels[wheels.length-1])
        {
            if (x < 2)
                return 0;
            int k = 0;
            while (wheels[k] <= x)
                k++;
            return k;
        }
        int g = -1;
        for (int i = nextp ; i <= sqrtx ; i += gaps[g])
        {
            if(sievingPrimes[( i + 1 ) / 2])
                pi++;
            g++;
            if (g == l)
                g=0;
        }
        return pi;
    }
    public static void preCalc ()
    {
        sqrtx = (int) Math.sqrt(x);
        int prod = 1;
        for (long p : wheels)
            prod *= p; // primorial
        nextp = BigInteger.valueOf(wheels[wheels.length-1]).nextProbablePrime().intValue(); //the first prime that comes after the wheel
        int lim = prod + nextp; // circumference of the wheel
        boolean [] marks = new boolean [lim + 1];
        Arrays.fill(marks, true);
        for (int j = 2 * 2 ;j <= lim ; j += 2)
            marks[j] = false;
        for (int i = 1 ; i < wheels.length ; i++)
        {
            int p = wheels[i];
            for (int j = p * p ; j <= lim ; j += 2 * p)
                marks[j]=false; // removing all integers that are NOT coprime with the base wheel primes
        }
        ArrayList <Integer> gs = new ArrayList <Integer>(); //list of the gaps between the integers that are coprime with the base wheel primes
        int d = nextp;
        for (int p = d + 2 ; p < marks.length ; p += 2)
        {
            if (marks[p]) //d is coprime with the wheel primes. if p is too, then a gap is identified, and is noted.
            {
                gs.add(p - d);
                d = p;
            }
        }
        gaps = new int [gs.size()];
        for (int i = 0 ; i < gs.size() ; i++)
            gaps[i] = gs.get(i); // Arrays are faster than lists, so moving the list of gaps to an array
        l = gaps.length;
        sievingPrimes = simpleSieve(sqrtx); //initializing the sieving primes
    }
}
Currently, it produces 50847534 primes below 10^9 in about 1.6 seconds. This is very impressive, at least by my standards, but I am looking to make it faster, possibly break the 1 second barrier. Even then, I believe it can be made much faster still.
The whole program is based on wheel factorization: https://en.wikipedia.org/wiki/Wheel_factorization. I have noticed I am getting the fastest results using a wheel of all primes up to 19.
public static int [] wheels = new int [] {2,3,5,7,11,13,17,19}; // base wheel primes
This means that the multiples of those primes are skipped, resulting in a much smaller search range. The gaps between the numbers we need to visit are then calculated in the preCalc method. If we make those jumps between the numbers in the search range, we skip the multiples of the base primes.
public static void preCalc ()
{
    sqrtx = (int) Math.sqrt(x);
    int prod = 1;
    for (long p : wheels)
        prod *= p; // primorial
    nextp = BigInteger.valueOf(wheels[wheels.length-1]).nextProbablePrime().intValue(); //the first prime that comes after the wheel
    int lim = prod + nextp; // circumference of the wheel
    boolean [] marks = new boolean [lim + 1];
    Arrays.fill(marks, true);
    for (int j = 2 * 2 ;j <= lim ; j += 2)
        marks[j] = false;
    for (int i = 1 ; i < wheels.length ; i++)
    {
        int p = wheels[i];
        for (int j = p * p ; j <= lim ; j += 2 * p)
            marks[j]=false; // removing all integers that are NOT coprime with the base wheel primes
    }
    ArrayList <Integer> gs = new ArrayList <Integer>(); //list of the gaps between the integers that are coprime with the base wheel primes
    int d = nextp;
    for (int p = d + 2 ; p < marks.length ; p += 2)
    {
        if (marks[p]) //d is coprime with the wheel primes. if p is too, then a gap is identified, and is noted.
        {
            gs.add(p - d);
            d = p;
        }
    }
    gaps = new int [gs.size()];
    for (int i = 0 ; i < gs.size() ; i++)
        gaps[i] = gs.get(i); // Arrays are faster than lists, so moving the list of gaps to an array
    l = gaps.length;
    sievingPrimes = simpleSieve(sqrtx); //initializing the sieving primes
}
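To see what `preCalc` produces, here is a small stand-alone sketch of the same gap idea with the wheel {2, 3, 5} instead of {2, ..., 19} (the class and method names are hypothetical). It walks one full turn of the wheel, from `nextPrime` to `nextPrime + circumference`, and records the distance between consecutive numbers that are coprime to every wheel prime:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the gap-table idea from preCalc, shrunk to a small wheel.
final class WheelGaps {

    static int[] gaps(int[] wheelPrimes, int nextPrime) {
        int circumference = 1;
        for (int p : wheelPrimes) {
            circumference *= p; // primorial: 30 for {2, 3, 5}
        }
        List<Integer> result = new ArrayList<>();
        int previous = nextPrime;
        // One full turn of the wheel: record distances between numbers coprime to the wheel.
        for (int n = nextPrime + 1; n <= nextPrime + circumference; n++) {
            if (coprimeToAll(n, wheelPrimes)) {
                result.add(n - previous);
                previous = n;
            }
        }
        return result.stream().mapToInt(Integer::intValue).toArray();
    }

    static boolean coprimeToAll(int n, int[] primes) {
        for (int p : primes) {
            if (n % p == 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // From 7, the next numbers coprime to 30 are 11, 13, 17, 19, 23, 29, 31, 37.
        System.out.println(Arrays.toString(gaps(new int[] {2, 3, 5}, 7))); // [4, 2, 4, 2, 4, 6, 2, 6]
    }
}
```

The gaps sum to 30 and there are φ(30) = 8 of them; with the 19-wheel the same walk yields φ(9699690) = 1658880 gaps, which is why a single segment never runs through the whole list.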
At the end of the preCalc method, the simpleSieve method is called, efficiently sieving all the sieving primes mentioned before, i.e. the primes <= sqrtx. This is a simple (rather than segmented) sieve of Eratosthenes, but it is still based on the previously computed wheel factorization.
public static boolean [] simpleSieve (int limit) // parameter renamed from 'l', which shadowed the static gap count 'l'
{
    long sqrtl = (long)Math.sqrt(limit);
    boolean [] primes = new boolean [limit/2+2];
    Arrays.fill(primes, true);
    int g = -1;
    for (int i = nextp ; i <= sqrtl ; i += gaps[g])
    {
        if (primes[(i + 1) / 2])
            for (int j = i * i ; j <= limit ; j += i * 2)
                primes[(j + 1) / 2]=false;
        g++;
        if (g == l) // wrap around the gap list (l = number of gaps)
            g=0;
    }
    return primes;
}
Finally, we reach the heart of the algorithm. We start by enumerating all primes <= sqrtx, with the following call:
long pi = pisqrtx();
which uses the following method:
public static long pisqrtx ()
{
    int pi = wheels.length;
    if (x < wheels[wheels.length-1])
    {
        if (x < 2)
            return 0;
        int k = 0;
        while (wheels[k] <= x)
            k++;
        return k;
    }
    int g = -1;
    for (int i = nextp ; i <= sqrtx ; i += gaps[g])
    {
        if(sievingPrimes[( i + 1 ) / 2])
            pi++;
        g++;
        if (g == l)
            g=0;
    }
    return pi;
}
Then, after initializing the pi variable which keeps track of the enumeration of primes, we perform the mentioned segmentation, starting the enumeration from the first prime > sqrtx:
int segSize = Math.max(sqrtx, 32768*8); //size of each segment
long u = nextp; // 'u' is the running index of the program. will continue from one segment to the next
int wh = 0; // this will be the gap index, indicating by how much we increment 'u' each time, skipping the multiples of the wheel primes
long pi = pisqrtx(); // the primes count. initialize with the number of primes <= sqrtx
for (long low = 0 ; low < x ; low += segSize) //the heart of the code. enumerating the primes through segmentation. enumeration will begin at p > sqrtx
{
    long high = Math.min(x, low + segSize);
    boolean [] segment = new boolean [(int) (high - low + 1)];
    int g = -1;
    for (int i = nextp ; i <= sqrtx ; i += gaps[g])
    {
        if (sievingPrimes[(i + 1) / 2])
        {
            long firstMultiple = (long) (low / i * i);
            if (firstMultiple < low)
                firstMultiple += i;
            if (firstMultiple % 2 == 0) //start with the first odd multiple of the current prime in the segment
                firstMultiple += i;
            for (long j = firstMultiple ; j < high ; j += i * 2)
                segment[(int) (j - low)] = true;
        }
        g++;
        //if (g == l) //due to segment size, the full list of gaps is never used **within just one segment** , and therefore this check is redundant.
        //should be used with bigger segment sizes or smaller lists of gaps
        //g = 0;
    }
    while (u <= high)
    {
        if (!segment[(int) (u - low)])
            pi++;
        u += gaps[wh];
        wh++;
        if (wh == l)
            wh = 0;
    }
}
I have also included this as a note in the code, but I will explain it here as well. Because the segment size is relatively small, we never go through the entire list of gaps within a single segment, so checking for wrap-around there is redundant (assuming we use a 19-wheel). But over the whole run of the program we do use the entire array of gaps, so the gap index wh that advances u has to wrap around and must not run past the end of the array:
while (u <= high)
{
    if (!segment[(int) (u - low)])
        pi++;
    u += gaps[wh];
    wh++;
    if (wh == l)
        wh = 0;
}
Using higher limits will eventually produce bigger segments, which might make it necessary to check that we don't run past the end of the gaps list even within a segment. Changing the set of wheel primes might have the same effect. Switching to bit-sieving could greatly increase the usable segment size, though.
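As an illustration of bit-sieving, here is a minimal non-segmented sketch that stores one bit per odd number in a `long[]` (hypothetical class and method names; it is not tuned and does not use the wheel). Packing 64 odd candidates per word means a segment covering a range of 2^18 numbers would need 2^17 bits, i.e. 16 KB, instead of the roughly 256 KB that a `boolean[]` of that size typically occupies on HotSpot:

```java
// Sketch of "bit-sieving": one bit per odd number, packed into long words.
// Bit i stands for the odd number 2*i + 1; a set bit means "composite".
final class OddBitSieve {

    static long countPrimesBelow(int limit) {
        if (limit <= 2) {
            return 0;
        }
        int bits = limit / 2; // odd numbers 1, 3, 5, ... below limit
        long[] composite = new long[(bits + 63) / 64];
        setBit(composite, 0); // 1 is not prime
        for (long p = 3; p * p < limit; p += 2) {
            if (!getBit(composite, (int) (p / 2))) {
                // Odd multiples of p from p*p; stepping by 2p skips the even ones.
                for (long m = p * p; m < limit; m += 2 * p) {
                    setBit(composite, (int) (m / 2));
                }
            }
        }
        long count = 1; // the prime 2
        for (int i = 0; i < bits; i++) {
            if (!getBit(composite, i)) {
                count++;
            }
        }
        return count;
    }

    // A long shift only uses the low 6 bits of i, so (1L << i) selects the bit within its word.
    static boolean getBit(long[] words, int i) {
        return (words[i >>> 6] & (1L << i)) != 0;
    }

    static void setBit(long[] words, int i) {
        words[i >>> 6] |= 1L << i;
    }

    public static void main(String[] args) {
        System.out.println(countPrimesBelow(1_000_000)); // 78498
    }
}
```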
As an important side note, I am aware that efficient segmentation takes the L1 and L2 cache sizes into account. I get the fastest results using a segment size of 32,768 * 8 = 262,144 = 2^18. I am not sure what the cache size of my computer is, but I do not think it can be that big, as most cache sizes I see are <= 32,768. Still, this produces the fastest run time on my computer, which is why it is the chosen segment size.
As I mentioned, I am still looking to improve this by a lot. Based on my introduction, I believe that multithreading can give a speed-up factor of 4 using 4 threads (corresponding to 4 cores). The idea is that each thread still uses the segmented sieve, but works on a different portion: divide the range up to n into 4 equal portions, one per thread, each of which performs the segmentation on the n/4 numbers it is responsible for, using the program above. My question is: how do I do that? Unfortunately, reading about multithreading and looking at examples did not give me any insight into how to implement it efficiently in this case. It seemed to me, contrary to the logic behind it, that the threads were running sequentially rather than simultaneously. This is why I excluded it from the code, to keep it readable. I would really appreciate a code sample showing how to do it in this specific code, but a good explanation and reference might do the trick too.
Additionally, I would like to hear about more ways of speeding up this program; I would love to hear any ideas you have! I really want to make it very fast and efficient. Thank you!
An example like this should help you get started.
An outline of a solution:
1. Define a data structure ("Task") that encompasses a specific segment; you can put all the immutable shared data into it for extra neatness, too. If you're careful enough, you can pass a common mutable array to all tasks, along with the segment limits, and only update the part of the array within these limits. This is more error-prone, but can simplify the step of joining the results (AFAICT; YMMV).
2. Define a data structure ("Result") that stores the result of a Task computation. Even if you just update a shared resulting structure, you may need to signal which part of that structure has been updated so far.
3. Create a Runnable that accepts a Task, runs a computation, and puts the results into a given result queue.
4. Create a blocking input queue for Tasks, and a queue for Results.
5. Create a ThreadPoolExecutor with the number of threads close to the number of machine cores.
6. Submit all your Tasks to the thread pool executor. They will be scheduled to run on the threads from the pool, and will put their results into the output queue, not necessarily in order.
7. Wait for all the tasks in the thread pool to finish.
8. Drain the output queue and join the partial results into the final result.
Extra speedup may (or may not) be achieved by joining the results in a separate task that reads the output queue, or even by updating a mutable shared output structure under synchronized, depending on how much work the joining step involves.
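A compact sketch of this outline, under some simplifying assumptions: each segment is a `Callable<Long>` that returns its own prime count, and the `Future`s returned by `ExecutorService.submit` take the place of the explicit result queue, so joining the results is just summing them. The wheel is left out to keep the threading visible, and the names are hypothetical. Because every task writes only to its own local array and only reads the shared sieving primes, no locking is needed:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: one Callable per segment; the pool runs them in parallel and the Futures are summed.
final class ParallelSegmentedSieve {

    // Simple sieve for the sieving primes <= limit (shared, read-only by all tasks).
    static int[] smallPrimes(int limit) {
        boolean[] composite = new boolean[limit + 1];
        List<Integer> primes = new ArrayList<>();
        for (int i = 2; i <= limit; i++) {
            if (!composite[i]) {
                primes.add(i);
                for (long j = (long) i * i; j <= limit; j += i) {
                    composite[(int) j] = true;
                }
            }
        }
        return primes.stream().mapToInt(Integer::intValue).toArray();
    }

    // Counts primes in [low, high) using only a local array, independent of other segments.
    static long countSegment(long low, long high, int[] sievingPrimes) {
        boolean[] composite = new boolean[(int) (high - low)];
        for (int p : sievingPrimes) {
            long square = (long) p * p;
            if (square >= high) {
                break;
            }
            // First multiple of p inside the segment, but never below p*p (p itself stays unmarked).
            long start = Math.max(square, (low + p - 1) / p * p);
            for (long j = start; j < high; j += p) {
                composite[(int) (j - low)] = true;
            }
        }
        long count = 0;
        for (long v = Math.max(low, 2); v < high; v++) {
            if (!composite[(int) (v - low)]) {
                count++;
            }
        }
        return count;
    }

    // Counts primes below limit by sieving fixed-size segments on a thread pool.
    static long countPrimes(long limit, int segmentSize, int threads) {
        int root = (int) Math.sqrt((double) limit);
        while ((long) (root + 1) * (root + 1) <= limit) {
            root++; // guard against rounding in Math.sqrt
        }
        int[] sievingPrimes = smallPrimes(root);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (long low = 0; low < limit; low += segmentSize) {
                final long lo = low;
                final long hi = Math.min(limit, low + segmentSize);
                Callable<Long> task = () -> countSegment(lo, hi, sievingPrimes);
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // blocks until that segment is done; completion order does not matter
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println(countPrimes(1_000_000_000L, 1 << 18, threads));
    }
}
```

For 10^9 this should print 50847534, the count quoted in the question; how much faster it runs than a single-threaded version will depend on the machine and the number of cores.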
Hope this helps.
Are you familiar with the work of Tomas Oliveira e Silva? He has a very fast implementation of the Sieve of Eratosthenes.
How interested in speed are you? Would you consider using c++?
$ time ../c_code/segmented_bit_sieve 1000000000
50847534 primes found.
real 0m0.875s
user 0m0.813s
sys 0m0.016s
$ time ../c_code/segmented_bit_isprime 1000000000
50847534 primes found.
real 0m0.816s
user 0m0.797s
sys 0m0.000s
(on my newish laptop with an i5)
The first is from Kim Walisch, using a bit array of odd prime candidates.
https://github.com/kimwalisch/primesieve/wiki/Segmented-sieve-of-Eratosthenes
The second is my tweak to Kim's with IsPrime[] also implemented as bit array, which is slightly less clear to read, although a little faster for big N due to the reduced memory footprint.
I will read your post carefully as I am interested in primes and performance no matter what language is used. I hope this isn't too far off topic or premature. But I noticed I was already beyond your performance goal.

O(log n) Programming

I am trying to prepare for a contest, but my programs are always dreadfully slow because they are O(n). First of all, I don't even know how to make them O(log n); I've never heard of this paradigm. Where can I learn about it?
For example,
If you had an integer array of zeroes and ones, such as [0, 0, 0, 1, 0, 1], and you wanted to replace every 0 with 1 only if one of its neighbors has the value 1, what is the most efficient way to do this if it must be repeated t times? (The program must do this t times.)
EDIT:
Here's my inefficient solution:
import java.util.Scanner;
public class Main {
    static Scanner input = new Scanner(System.in);
    public static void main(String[] args) {
        int n;
        long t;
        n = input.nextInt();
        t = input.nextLong();
        input.nextLine();
        int[] units = new int[n + 2];
        String inputted = input.nextLine();
        input.close();
        for(int i = 1; i <= n; i++) {
            units[i] = Integer.parseInt((""+inputted.charAt(i - 1)));
        }
        int[] original;
        for(int j = 0; j <= t -1; j++) {
            units[0] = units[n];
            units[n + 1] = units[1];
            original = units.clone();
            for(int i = 1; i <= n; i++) {
                if(((original[i - 1] == 0) && (original[i + 1] == 1)) || ((original[i - 1] == 1) && (original[i + 1] == 0))) {
                    units[i] = 1;
                } else {
                    units[i] = 0;
                }
            }
        }
        for(int i = 1; i <= n; i++) {
            System.out.print(units[i]);
        }
    }
}
This is an elementary cellular automaton. Such dynamical systems have properties that you can use to your advantage. In your case, for example, you can set to 1 every cell at distance at most t from any initial 1 (the light-cone property). Then you may do something like:
get a 1 in the original sequence, say it is located at position p.
set to 1 every position from p-t to p+t.
In the next step, you can take advantage of the fact that you have already set positions p-t to p+t. This lets you compute the state after step t without computing the intermediate steps (a good acceleration, isn't it?).
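A minimal sketch of the light-cone idea, assuming the rule as described in the question text (a 0 next to a 1 becomes 1, and a 1 stays 1) on a line with no wrap-around; this is not the XOR rule that the posted code applies. Cell i ends up as 1 exactly when some initial 1 lies within distance t, so two passes computing the distance to the nearest 1 give the answer in O(n), independent of t (names hypothetical):

```java
// Sketch: after t steps of the "0 next to a 1 becomes 1" rule (no wrap-around),
// cell i is 1 exactly when some initial 1 lies within distance t.
final class LightCone {

    static int[] afterSteps(int[] cells, long t) {
        int n = cells.length;
        long far = Long.MAX_VALUE / 2;   // sentinel: no 1 on this side
        long[] distLeft = new long[n];   // distance to nearest initial 1 at or left of i
        long[] distRight = new long[n];  // distance to nearest initial 1 at or right of i
        for (int i = 0; i < n; i++) {
            distLeft[i] = cells[i] == 1 ? 0 : (i > 0 ? distLeft[i - 1] + 1 : far);
        }
        for (int i = n - 1; i >= 0; i--) {
            distRight[i] = cells[i] == 1 ? 0 : (i < n - 1 ? distRight[i + 1] + 1 : far);
        }
        int[] result = new int[n];
        for (int i = 0; i < n; i++) {
            result[i] = Math.min(distLeft[i], distRight[i]) <= t ? 1 : 0;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] cells = {0, 0, 0, 1, 0, 1};
        System.out.println(java.util.Arrays.toString(afterSteps(cells, 1))); // [0, 0, 1, 1, 1, 1]
    }
}
```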
You can also use tricks such as HashLife (see 1).
As I was saying in the comments, I'm fairly sure you can avoid the array and clone operations.
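The posted code, however, applies a different rule: a cell becomes 1 when exactly one neighbour is 1, on a ring (the elementary automaton known as rule 90). That rule is linear over GF(2), so applying it 2^k times is the same as XOR-ing the cells at distance 2^k, and t steps can be done in O(n log t) by following the binary digits of t. A sketch under those assumptions (hypothetical names):

```java
import java.util.Arrays;

// Sketch for rule 90 on a ring: next[i] = cur[i - 1] ^ cur[i + 1] (indices mod n).
// Over GF(2), applying the rule 2^k times gives next[i] = cur[i - 2^k] ^ cur[i + 2^k],
// so t steps take O(n log t) by combining the powers of two present in t.
final class Rule90 {

    static int[] afterSteps(int[] cells, long t) {
        int n = cells.length;
        int[] cur = cells.clone();
        if (n == 0) {
            return cur;
        }
        long shift = 1 % n; // 2^k mod n for the current binary digit of t
        while (t > 0) {
            if ((t & 1) == 1) {
                int s = (int) shift;
                int[] next = new int[n];
                for (int i = 0; i < n; i++) {
                    next[i] = cur[(i - s + n) % n] ^ cur[(i + s) % n];
                }
                cur = next;
            }
            t >>= 1;
            shift = shift * 2 % n;
        }
        return cur;
    }

    public static void main(String[] args) {
        int[] cells = {0, 1, 0, 1, 1};
        System.out.println(Arrays.toString(afterSteps(cells, 3))); // [1, 0, 1, 0, 0]
    }
}
```

The two samples from the other answer ("0000001" after 1 step and "01011" after 3 steps) come out the same way, but with a number of passes that grows with log t rather than t.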
You can modify a StringBuilder in-place, so no need to convert back and forth between int[] and String.
For example (note: each generation costs O(N), so T generations cost O(N * T)):
public static void main(String[] args) {
    System.out.println(conway1d("0000001", 7, 1));
    System.out.println(conway1d("01011", 5, 3));
}
private static String conway1d(CharSequence input, int N, long T) {
    System.out.println("Generation 0: " + input);
    StringBuilder sb = new StringBuilder(input); // Will update this for all generations
    StringBuilder copy = new StringBuilder(); // store a copy to reference current generation
    for (int gen = 1; gen <= T; gen++) {
        // Copy over next generation string
        copy.setLength(0);
        copy.append(input);
        for (int i = 0; i < N; i++) {
            conwayUpdate(sb, copy, i, N);
        }
        input = sb.toString(); // next generation string
        System.out.printf("Generation %d: %s\n", gen, input);
    }
    return input.toString();
}
private static void conwayUpdate(StringBuilder nextGen, final StringBuilder currentGen, int charPos, int N) {
    int prev = (N + (charPos - 1)) % N;
    int next = (charPos + 1) % N;
    // **Exactly one** adjacent '1'
    boolean adjacent = currentGen.charAt(prev) == '1' ^ currentGen.charAt(next) == '1';
    nextGen.setCharAt(charPos, adjacent ? '1' : '0'); // set cell as alive or dead
}
For the two samples in the problem you posted in the comments, this code generates the following output:
Generation 0: 0000001
Generation 1: 1000010
1000010
Generation 0: 01011
Generation 1: 00011
Generation 2: 10111
Generation 3: 10100
10100
Big-O notation is a simplification that helps you understand the complexity of an algorithm. Two O(n) algorithms can still have very different execution times, because big-O drops constant factors. Why does that matter here? Let's unroll your example:
You have two nested loops. The outer loop will run t times.
The inner loop will run n times
For each time the loop executes, it will take a constant k time.
So, in essence, your algorithm costs about k * t * n operations, i.e. O(t * n); big-O hides the constant k. If t is of the same order of magnitude as n, that is about k * n^2, i.e. O(n^2).
There are two approaches to optimizing this algorithm:
Reduce the constant time k. For example, do not clone the whole array in each iteration; it is time-consuming, because clone has to copy every element.
The second optimization is to use Dynamic Programming (https://en.wikipedia.org/wiki/Dynamic_programming), which can cache information between iterations and, depending on the structure of the problem, may lower k or even reduce the complexity from O(n^2) to O(n * log n).
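A minimal sketch of the first optimization (lowering k), assuming a cyclic row of 0/1 ints and the "exactly one neighbour is 1" rule, as in the conway1d answer above; `Rule90Buffers` and `run` are illustrative names. Two buffers are allocated once and their references are swapped after each generation, so nothing is cloned inside the loop. The asymptotic cost is still O(t * n); only the constant shrinks.

```java
public class Rule90Buffers {

    // Runs the given number of generations on a cyclic row of 0/1 values.
    static int[] run(int[] start, int generations) {
        int n = start.length;
        int[] cur = start.clone();           // one copy up front, none inside the loop
        int[] next = new int[n];
        for (int gen = 0; gen < generations; gen++) {
            for (int i = 0; i < n; i++) {
                int left = cur[(i - 1 + n) % n];
                int right = cur[(i + 1) % n];
                next[i] = left ^ right;      // 1 iff exactly one neighbour is 1
            }
            int[] tmp = cur;                 // swap buffers: O(1) instead of an O(n) clone
            cur = next;
            next = tmp;
        }
        return cur;
    }

    public static void main(String[] args) {
        int[] result = run(new int[] {0, 1, 0, 1, 1}, 3);
        System.out.println(java.util.Arrays.toString(result)); // [1, 0, 1, 0, 0]
    }
}
```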

Restaurant Maximum Profit using Dynamic Programming

It's an assignment task. I have spent 2 days trying to come up with a solution but am still quite confused, so I need to make a few points clear. The problem is as follows:
Yuckdonald’s is considering opening a series of restaurants along QVH. There are n possible locations along a straight line, and the distances of these locations from the start of QVH, in miles and in increasing order, are m1, m2, ..., mn. The constraints are as follows:
1. At each location, Yuckdonald’s may open one restaurant, and the expected profit from opening a restaurant at location i is pi
2. Any two restaurants should be at least k miles apart, where k is a positive integer
My solution:
public class RestaurantProblem {
int[] Profit;
int[] P;
int[] L;
int k;
public RestaurantProblem(int[] L , int[] P, int k) {
this.L = L;
this.P = P;
this.k = k;
Profit = new int[L.length];
}
public int compute(int i){
if(i==0)
return 0;
Profit[i]= P[i]+(L[i]-L[i-1]< k ? 0:compute(i-1));//if condition satisfies then adding previous otherwise zero
if (Profit[i]<compute(i-1)){
Profit[i] = compute(i-1);
}
return Profit[i];
}
public static void main(String args[]){
int[] m = {0,5,10,15,19,25,28,29};
int[] p = {0,10,4,61,21,13,19,15};
int k = 5;
RestaurantProblem rp = new RestaurantProblem(m, p ,k);
rp.compute(m.length-1);
for(int n : rp.Profit)
System.out.println(n);
}
}
This solution gives me 88; however, if I exclude the restaurant at 25 (profit 13) and include the restaurant at 28 (profit 19), I get a maximum of 94.
Please point out whether I am wrong, or how I can achieve this if I'm right.
I was able to identify 2 mistakes:
You are not actually using dynamic programming; you are just storing the results in a data structure. That wouldn't be bad for performance if the program worked the way you have written it and made only 1 recursive call.
However, compute(i-1) can be called up to three times per invocation, so in the worst case the program runs in Ω(2^n) instead of O(n).
Dynamic programming usually works like this (pseudocode):
calculate(input) {
if (value already calculated for input)
return previously calculated value
else
calculate and store value for input and return result
}
You could do this by initializing the array elements to -1 (or 0 if all profits are positive):
Profit = new int[L.length];
Arrays.fill(Profit, -1); // no need to do this, if you are using 0
public int compute(int i) {
if (Profit[i] >= 0) { // modify the check, if you're using 0 for non-calculated values
// reuse already calculated value
return Profit[i];
}
...
You assume the previous restaurant can only be built at the previous position
Profit[i] = P[i] + (L[i]-L[i-1]< k ? 0 : compute(i-1));
^
Just ignores all positions before i-1
Instead you should use the profit for the last position that is at least k miles away.
Example
k = 3
L 1 2 3 ... 100
P 5 5 5 ... 5
Here L[i] - L[i-1] < k holds for every i, so the result is just P[99] = 5, but it should be 34 * 5 = 170 (restaurants at 1, 4, 7, ..., 100).
int[] lastPos;
public RestaurantProblem(int[] L, int[] P, int k) {
this.L = L;
this.P = P;
this.k = k;
Profit = new int[L.length];
lastPos = new int[L.length];
Arrays.fill(lastPos, -2);
Arrays.fill(Profit, -1);
}
public int computeLastPos(int i) {
if (i < 0) {
return -1;
}
if (lastPos[i] >= -1) {
return lastPos[i];
}
int max = L[i] - k;
int lastLastPos = computeLastPos(i - 1), temp;
while ((temp = lastLastPos + 1) < i && L[temp] <= max) {
lastLastPos++;
}
return lastPos[i] = lastLastPos;
}
public int compute(int i) {
if (i < 0) {
// no restaurants can be built before pos 0
return 0;
}
if (Profit[i] >= 0) { // modify the check, if you're using 0 for non-calculated values
// reuse already calculated value
return Profit[i];
}
int profitNoRestaurant = compute(i - 1);
if (P[i] <= 0) {
// no profit can be gained by building this restaurant
return Profit[i] = profitNoRestaurant;
}
return Profit[i] = Math.max(profitNoRestaurant, P[i] + compute(computeLastPos(i)));
}
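For comparison, here is a minimal bottom-up (iterative) sketch of the same recurrence: best = max(skip location i, P[i] + best over the locations at least k miles before i). It assumes, as the question states, that the distances are sorted in increasing order, so the compatible earlier locations always form a prefix and a single forward-moving pointer replaces computeLastPos. `RestaurantBottomUp` and `maxProfit` are illustrative names, and there is no recursion, so deep inputs cannot overflow the stack.

```java
public class RestaurantBottomUp {

    // best[i] = maximum profit using only locations 0..i-1 (best[0] = 0).
    static int maxProfit(int[] m, int[] p, int k) {
        int n = m.length;
        int[] best = new int[n + 1];
        int j = 0;  // locations 0..j-1 are at least k miles before location i
        for (int i = 0; i < n; i++) {
            while (j < i && m[i] - m[j] >= k) {
                j++;                         // distances increase, so j never moves back
            }
            int take = p[i] + best[j];       // open at i, plus best compatible prefix
            int skip = best[i];              // do not open at i
            best[i + 1] = Math.max(take, skip);
        }
        return best[n];
    }

    public static void main(String[] args) {
        int[] m = {0, 5, 10, 15, 19, 25, 28, 29};
        int[] p = {0, 10, 4, 61, 21, 13, 19, 15};
        System.out.println(maxProfit(m, p, 5)); // 94 (locations 5, 10, 15, 28)
    }
}
```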
To my understanding, the problem can be modelled with a two-dimensional state space, which I don't find in the presented implementation. For each (i, j) in {0,...,n-1} × {0,...,n-1}, let
profit(i,j) := the maximum profit attainable for selecting locations
from {0,...,i} where the farthest location selected is
no further than at position j
(or minus infinity if no such solution exists)
and note the recurrence relation
profit(i,j) = max{ p[i] + profit(i-1,lastpos(i)),
profit(i-1,j)
}
where lastpos(i) is the location farthest from the start that is still at least k away from position i; the first case above corresponds to selecting location i for the solution, while the second corresponds to omitting location i from the solution. The overall solution can be obtained by evaluating profit(n-1,n-1); the evaluation can be done either recursively or by filling a two-dimensional array in a bottom-up manner and returning its contents at (n-1,n-1).

Given a number N, can N be expressed as the sum of two or more consecutive perfect squares?

At a recent computer programming competition that I attended, there was a problem where you have to determine whether a number N, for 1<=N<=1000, is a palindromic square. A palindromic square is a number that reads the same forwards and backwards and can be expressed as the sum of two or more consecutive perfect squares. For example, 595 is a palindrome and can be expressed as 6^2 + 7^2 + 8^2 + 9^2 + 10^2 + 11^2 + 12^2.
I understand how to determine if the number is a palindrome, but I'm having trouble trying to figure out if it can be expressed as the sum of two or more consecutive squares.
Here is the algorithm that I tried:
public static boolean isSumOfSquares(int num) {
int sum = 0;
int lowerBound = 1;
//largest square root that is less than num
int upperBound = (int)Math.floor(Math.sqrt(num));
while(lowerBound != upperBound) {
for(int x=lowerBound; x<upperBound; x++) {
sum += x*x;
}
if(sum != num) {
lowerBound++;
}
else {
return true;
}
sum=0;
}
return false;
}
My approach sets the upper bound to the integer square root of the number, sets the lower bound to 1, and keeps evaluating the sum of squares from the lower bound to the upper bound. The issue is that only the lower bound changes while the upper bound stays the same.
This should be an efficient algorithm for determining if it's a sum of squares of consecutive numbers.
Start with a lower bound and upper bound of 1. The current sum of squares is 1.
public static boolean isSumOfSquares(int num) {
int sum = 1;
int lowerBound = 1;
int upperBound = 1;
The maximum possible upper bound is the maximum number whose square is less than or equal to the number to test.
int max = (int) Math.floor(Math.sqrt(num));
While loop. If the sum of squares is too little, then add the next square, incrementing upperBound. If the sum of squares is too high, then subtract the first square, incrementing lowerBound. Exit if the number is found. If it can't be expressed as the sum of squares of consecutive numbers, then eventually upperBound will exceed the max, and false is returned.
while(sum != num)
{
if (sum < num)
{
upperBound++;
sum += upperBound * upperBound;
}
else if (sum > num)
{
sum -= lowerBound * lowerBound;
lowerBound++;
}
if (upperBound > max)
return false;
}
return true;
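Tests for 1 through 5, 11, 13, 54, 180, 181, 595 and 596. Yes, some of them aren't palindromes, but I'm only testing the sum-of-squares-of-consecutive-numbers part. Note that 1 and 4 come out true because a single square also matches; if two or more squares are required, a match with lowerBound == upperBound must be rejected.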
1: true
2: false
3: false
4: true
5: true
11: false
13: true
54: true
180: false
181: true
595: true
596: false
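A self-contained version of the two-pointer method above that only accepts windows of two or more squares, as the question requires (so 1 and 4 now return false). When the sum matches but the window holds a single square, the loop simply keeps widening. The class name `ConsecutiveSquares` is illustrative.

```java
public class ConsecutiveSquares {

    // Window [lower, upper] over the bases; sum = lower^2 + ... + upper^2.
    static boolean isSumOfSquares(int num) {
        int lower = 1, upper = 1, sum = 1;
        int max = (int) Math.floor(Math.sqrt(num));
        while (sum != num || upper == lower) {
            if (sum <= num) {                // too small, or a single square: widen
                upper++;
                sum += upper * upper;
            } else {                         // too large: drop the smallest square
                sum -= lower * lower;
                lower++;
            }
            if (upper > max) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] tests = {1, 2, 3, 4, 5, 11, 13, 54, 180, 181, 595, 596};
        for (int t : tests) {
            System.out.println(t + ": " + isSumOfSquares(t));
        }
    }
}
```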
Just for play, I created a Javascript function that gets all of the palindromic squares between a min and max value: http://jsfiddle.net/n5uby1wd/2/
HTML
<button text="click me" onclick="findPalindromicSquares()">Click Me</button>
<div id="test"></div>
JS
function isPalindrome(val) {
return ((val+"") == (val+"").split("").reverse().join(""));
}
function findPalindromicSquares() {
var max = 1000;
var min = 1;
var list = [];
var done = false,
first = true,
sum = 0,
maxsqrt = Math.floor(Math.sqrt(max)),
sumlist = [];
for(var i = min; i <= max; i++) {
if (isPalindrome(i)) {
done = false;
//Start walking up the number list
for (var j = 1; j <= maxsqrt; j++) {
first = true;
sum = 0;
sumlist = [];
for(var k = j; k <= maxsqrt; k++) {
sumlist.push(k);
sum = sum + (k * k);
if (!first && sum == i) {
list.push({"Value":i,"Sums":sumlist});
done = true;
}
else if (!first && sum > i) {
break;
}
first = false;
if (done) break;
}
if (done) break;
}
}
}
//write the list
var html = "";
for(var l = 0; l < list.length; l++) {
html += JSON.stringify(list[l]) + "<br>";
}
document.getElementById("test").innerHTML = html;
}
With min=1 and max=1000, it returns:
{"Value":5,"Sums":[1,2]}
{"Value":55,"Sums":[1,2,3,4,5]}
{"Value":77,"Sums":[4,5,6]}
{"Value":181,"Sums":[9,10]}
{"Value":313,"Sums":[12,13]}
{"Value":434,"Sums":[11,12,13]}
{"Value":505,"Sums":[2,3,4,5,6,7,8,9,10,11]}
{"Value":545,"Sums":[16,17]}
{"Value":595,"Sums":[6,7,8,9,10,11,12]}
{"Value":636,"Sums":[4,5,6,7,8,9,10,11,12]}
{"Value":818,"Sums":[2,3,4,5,6,7,8,9,10,11,12,13]}
An updated version which allows testing individual values: http://jsfiddle.net/n5uby1wd/3/
It only took a few seconds to find them all between 1 and 1,000,000.
You are looking for S(n, k) = n^2 + (n + 1)^2 + (n + 2)^2 + ... + (n + (k - 1))^2 that adds up to a specified sum m, i.e., S(n, k) = m. (I'm assuming you'll test for palindromes separately.) S(n, k) - m is a quadratic in n. You can easily work out an explicit expression for S(n, k) - m, so you can solve it using the quadratic formula. If S(n, k) - m has a positive integer root, keep that root; it gives a solution to your problem.
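Working out that explicit expression (a derivation added here for reference, using the same S(n, k) and m). Whenever m exceeds the constant term, the product of the two roots is negative, so only the root with +√Δ can be a positive n:

```latex
\begin{aligned}
S(n,k) &= \sum_{i=0}^{k-1} (n+i)^2
        = k\,n^2 + 2n\sum_{i=0}^{k-1} i + \sum_{i=0}^{k-1} i^2 \\
       &= k\,n^2 + k(k-1)\,n + \frac{(k-1)k(2k-1)}{6}, \\
S(n,k) - m = 0 \;&\Longrightarrow\;
n = \frac{-k(k-1) + \sqrt{\Delta}}{2k},
\qquad
\Delta = k^2(k-1)^2 - 4k\left(\frac{(k-1)k(2k-1)}{6} - m\right).
\end{aligned}
```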
I'm assuming you can easily test whether a quadratic has a positive integer root. The hard part is probably determining whether the discriminant has an integer square root; I'm guessing you can figure that out.
You'll have to look for k = 2, 3, 4, .... You can stop when 1 + 4 + 9 + ... + k^2 > m. You can probably work out an explicit expression for that.
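A minimal sketch of this quadratic-formula approach, assuming "positive" means n >= 1 and using long arithmetic throughout. `findStart` and `isqrt` are illustrative names. The integer square root corrects the floating-point guess so the perfect-square test is exact, and the loop stops once 1^2 + 2^2 + ... + k^2 = k(k+1)(2k+1)/6 exceeds m.

```java
public class ConsecutiveSquaresQuadratic {

    // Floor of the square root of x, corrected after the floating-point guess.
    static long isqrt(long x) {
        long r = (long) Math.sqrt((double) x);
        while (r * r > x) r--;
        while ((r + 1) * (r + 1) <= x) r++;
        return r;
    }

    // Returns a base n >= 1 with n^2 + (n+1)^2 + ... + (n+k-1)^2 == m for some k >= 2,
    // or -1 if there is none.
    static long findStart(long m) {
        for (long k = 2; k * (k + 1) * (2 * k + 1) / 6 <= m; k++) {
            long c = (k - 1) * k * (2 * k - 1) / 6;  // 1^2 + ... + (k-1)^2
            long b = k * (k - 1);                     // coefficient of n
            // S(n, k) - m = k*n^2 + b*n + (c - m); its discriminant:
            long disc = b * b - 4 * k * (c - m);
            if (disc < 0) {
                continue;
            }
            long root = isqrt(disc);
            if (root * root != disc) {
                continue;                             // no integer square root
            }
            long numerator = root - b;                // numerator of the larger root
            if (numerator > 0 && numerator % (2 * k) == 0) {
                return numerator / (2 * k);
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(findStart(595)); // 6, since 6^2 + ... + 12^2 = 595
        System.out.println(findStart(11));  // -1
    }
}
```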
Since there are only a few squares below your number, you can precompute an array of them.
Then keep a first and a last included index; initially they are both 1.
While the sum is lower than your number, increase the last included index and update the sum.
While the sum is higher, increase the first included index and update the sum.
Or do it without any array, as in rgettman's answer.
Start with an array of the first few perfect squares. Say your numbers are 13 and 17; then your array will contain 1, 4, 9 and 16.
Do this kind of checking:
13 minus 1 (1^2) is 12. 1 is a perfect square, 12 is not.
13 minus 4 (2^2) is 9. 4 is a perfect square and 9 is a perfect square, so 13 is the sum of two perfect squares.
17 minus 1 (1^2) is 16. 1 and 16 are both perfect squares, but 1^2 and 4^2 are not consecutive, so eliminate this choice.
Keep going until you either find such a pair or run out of squares.
One method (probably not efficient) that I can think of off the top of my head:
Suppose N is 90.
X = 9 (the integer part of sqrt(90))
1. Create an array of all the perfect squares up to X^2: [1,4,9,16,25,36,49,64,81]
2. Generate all possible combinations of the items in the array using recursion: [1,4],[1,9],[1,16],....[4,1],[4,9],....[1,4,9]....
3. For each combination (as you generate it), check whether it adds up to N.
To save memory, check each combination as soon as it is generated; if it does not sum to N, discard it and move on to the next.
One of the instances will be [9,81], where 9+81=90.
I think you can determine whether a number is a sum of consecutive squares quickly in the following manner, which vastly reduces the amount of arithmetic that needs to be done. First, precompute all the sums of squares and place them in an array:
0, 0+1=1, 1+4=5, 5+9=14, 14+16=30, 30+25=55, 55+36=91, ...
Now, if a number is the sum of two or more consecutive squares, we can complete it by adding a number from the above sequence to obtain another number in the above sequence. For example, 77=16+25+36, and we can complete it by adding the listed number 14=0+1+4+9 to obtain the listed number 91=14+77=(0+1+4+9)+(16+25+36). The converse holds as well, provided the two listed numbers are at least two positions apart on the list.
How long does our list have to be? We can stop at the first n that satisfies (n-1)^2 + n^2 > max, where max in this case is 1000. Since (n-1)^2 + n^2 > 2(n-1)^2, it is enough to stop when 2(n-1)^2 > max, i.e. n > sqrt(max/2) + 1. So for max=1000, we can stop when n=24.
To quickly test membership in the set, we should hash the numbers in the list as well as storing them in the list; the value of the hash should be the location of the number in the list so that we can quickly locate its position to determine whether it is at least two positions away from the starting point.
Here's my suggestion in Java:
import java.util.HashMap;
public class SumOfConsecutiveSquares {
// UPPER_BOUND is the largest N we are testing;
static final int UPPER_BOUND = 1000;
// UPPER_BOUND/2, sqrt, then round up, then add 1 give MAX_INDEX
static final int MAX_INDEX = (int)(Math.sqrt(UPPER_BOUND/2.0)) + 1 + 1;
static int[] sumsOfSquares = new int[MAX_INDEX+1];
static HashMap<Integer,Integer> sumsOfSquaresHash
= new HashMap<Integer,Integer>();
// pre-compute our list
static {
sumsOfSquares[0] = 0;
sumsOfSquaresHash.put(0,0);
for (int i = 1; i <= MAX_INDEX; ++i) {
sumsOfSquares[i] = sumsOfSquares[i-1] + i*i;
sumsOfSquaresHash.put(sumsOfSquares[i],i);
}
}
public static boolean isSumOfConsecutiveSquares(int N) {
for (int i=0; i <= MAX_INDEX; ++i) {
int candidate = sumsOfSquares[i] + N;
if (sumsOfSquaresHash.containsKey(candidate)
&& sumsOfSquaresHash.get(candidate) - i >= 2) {
return true;
}
}
return false;
}
public static void main(String[] args) {
for (int i=1; i <= UPPER_BOUND; ++i) {
if (isSumOfConsecutiveSquares(i)) {
System.out.println(i);
}
}
}
}
Each run of the function performs at most 25 additions and 25 hash table lookups. No multiplications.
To use it efficiently to solve the problem, construct the 1-, 2- and 3-digit palindromes (1-digit ones are easy: 1, 2, ..., 9; 2-digit ones by multiplying by 11: 11, 22, 33, ..., 99; 3-digit ones by the formula i*101 + j*10 for i = 1..9 and j = 0..9). Then check the palindromes with the function above and print those for which it returns true.
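A self-contained sketch of that procedure, assuming only N <= 999 matters. Instead of the HashMap lookup above, it uses a plain nested scan over the same prefix sums (0, 1, 5, 14, ...), which is simpler and still fast at this size; `PalindromicSquares` and its methods are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

public class PalindromicSquares {

    // PREFIX[i] = 1^2 + ... + i^2; 32 entries is more than enough for N <= 999.
    static final int[] PREFIX = new int[32];
    static {
        for (int i = 1; i < PREFIX.length; i++) {
            PREFIX[i] = PREFIX[i - 1] + i * i;
        }
    }

    // N = (lo+1)^2 + ... + hi^2 with at least two terms <=> N == PREFIX[hi] - PREFIX[lo], hi - lo >= 2.
    static boolean isSumOfConsecutiveSquares(int n) {
        for (int lo = 0; lo < PREFIX.length; lo++) {
            for (int hi = lo + 2; hi < PREFIX.length; hi++) {
                int diff = PREFIX[hi] - PREFIX[lo];
                if (diff == n) return true;
                if (diff > n) break;         // differences only grow as hi increases
            }
        }
        return false;
    }

    // 1-, 2- and 3-digit palindromes in increasing order: d, d*11, i*101 + j*10.
    static List<Integer> palindromesBelow1000() {
        List<Integer> result = new ArrayList<>();
        for (int d = 1; d <= 9; d++) result.add(d);
        for (int d = 1; d <= 9; d++) result.add(d * 11);
        for (int i = 1; i <= 9; i++) {
            for (int j = 0; j <= 9; j++) {
                result.add(i * 101 + j * 10);
            }
        }
        return result;
    }

    static List<Integer> palindromicSquares() {
        List<Integer> good = new ArrayList<>();
        for (int p : palindromesBelow1000()) {
            if (isSumOfConsecutiveSquares(p)) good.add(p);
        }
        return good;
    }

    public static void main(String[] args) {
        System.out.println(palindromicSquares());
    }
}
```

It should print the same eleven values as the JavaScript listing earlier in this thread.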
public static boolean isSumOfSquares(int num) {
int sum = 0;
int lowerBound = 1;
//largest square root that is less than num
int upperBound = (int)Math.floor(Math.sqrt(num));
while(lowerBound != upperBound) {
sum = 0;
for(int x=lowerBound; x<upperBound; x++) {
sum += x * x;
}
if(sum != num) {
lowerBound++;
}
else {
return true;
}
}
return false;
}
Perhaps I am missing the point, but given that 1 <= N <= 1000, the most efficient way would be to solve the problem once (perhaps by brute force) and store the solutions in a switch.
switch(n){
case 5:
case 13:
...
return true;
default:
return false;
}
public static boolean validNumber(int num) {
if (!isPalindrome(num))
return false;
int i = 1, j = 2, sum = 1*1 + 2*2;
while (i < j)
if (sum > num) {
sum = sum - i*i; i = i + 1;
} else if (sum < num) {
j = j + 1; sum = sum + j*j;
} else {
return true;
}
return false;
}
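The snippet above calls isPalindrome, which is not shown. A minimal hypothetical helper, assuming non-negative input: it reverses the decimal digits arithmetically (into a long, so reversal cannot overflow for any int) and compares.

```java
public class PalindromeCheck {

    // Hypothetical helper for the isPalindrome(num) call used above.
    static boolean isPalindrome(int num) {
        if (num < 0) {
            return false;
        }
        long reversed = 0;
        for (int rest = num; rest > 0; rest /= 10) {
            reversed = reversed * 10 + rest % 10;  // append the last digit
        }
        return reversed == num;
    }

    public static void main(String[] args) {
        System.out.println(isPalindrome(595)); // true
        System.out.println(isPalindrome(596)); // false
    }
}
```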
However, there are only eleven "good numbers" up to 1000: { 5, 55, 77, 181, 313, 434, 505, 545, 595, 636, 818 }. Their count grows very slowly: for N = 10^6 there are only 59.
