I'm trying to learn Java, Scala, & Clojure.
I'm working through the Project Euler problems in the three languages. Listed below is the code for problem #5 (http://projecteuler.net/problem=5) as well as the run time (in seconds) so far on the first five problems. It is striking to me that the Java and Clojure versions are so much slower than the Scala one for problem #5. They are running on the same machine, same jvm, and the results are consistent over several trials. What can I do to speed the two up (especially the Clojure version)? Why is the Scala version so much faster?
Running Times (in seconds)
| Problem |   Java | Scala  | Clojure |
|---------|--------|--------|---------|
| 1       | 0.0010 | 0.1570 | 0.0116  |
| 2       | 0.0120 | 0.0030 | 0.0003  |
| 3       | 0.0530 | 0.0200 | 0.1511  |
| 4       | 0.2120 | 0.2600 | 0.8387  |
| 5       | 3.9680 | 0.3020 | 33.8574 |
Java Version of Problem #5
import java.util.ArrayList;

public class Problem005 {
    private static ArrayList<Integer> divisors;

    private static void initializeDivisors(int ceiling) {
        divisors = new ArrayList<Integer>();
        for (Integer i = 1; i <= ceiling; i++)
            divisors.add(i);
    }

    private static boolean isDivisibleByAll(int n) {
        for (int divisor : divisors)
            if (n % divisor != 0)
                return false;
        return true;
    }

    public static int findSmallestMultiple(int ceiling) {
        initializeDivisors(ceiling);
        int number = 1;
        while (!isDivisibleByAll(number))
            number++;
        return number;
    }
}
Scala Version of Problem #5
object Problem005 {
  private def isDivisibleByAll(n: Int, top: Int): Boolean =
    (1 to top).forall(n % _ == 0)

  def findSmallestMultiple(ceiling: Int): Int = {
    def iter(n: Int): Int = if (isDivisibleByAll(n, ceiling)) n else iter(n + 1)
    iter(1)
  }
}
Clojure Version of Problem #5
(defn smallest-multiple-of-1-to-n
  [n]
  (loop [divisors (range 2 (inc n))
         i n]
    (if (every? #(= 0 (mod i %)) divisors)
      i
      (recur divisors (inc i)))))
EDIT
It was suggested that I compile the various answers into my own answer. However, I want to give credit where credit is due (I really didn't answer this question myself).
As to the first question, all three versions could be sped up by using a better algorithm. Specifically, take the highest power of each prime that occurs in the numbers 1-20 (2^4, 3^2, 5^1, 7^1, 11^1, 13^1, 17^1, 19^1) and multiply them out.
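As a quick illustration (my own sketch, not from the answers below), multiplying out those prime powers directly gives the answer:

```java
// Sketch: the product of the greatest prime powers <= 20, per the algorithm above.
public class SmallestMultipleByPrimePowers {
    static int product() {
        int[] primePowers = {16, 9, 5, 7, 11, 13, 17, 19}; // 2^4, 3^2, 5, 7, 11, 13, 17, 19
        int result = 1;
        for (int p : primePowers)
            result *= p;
        return result;
    }

    public static void main(String[] args) {
        System.out.println(product()); // prints 232792560
    }
}
```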
The far more interesting aspect is to understand the differences between the three languages using essentially the same algorithm. There are instances where a brute force algorithm such as this one can be helpful. So, why the performance difference?
For Java, one suggestion was to change the ArrayList to a primitive array of ints. This does decrease the running time, cutting about 0.5-1 second off (running it this morning cut the time from 4.386 seconds to 3.577 seconds). That helps a bit, but no one was able to come up with a way to bring it under half a second (similar to the Scala version). This is surprising, considering that all three compile down to Java byte-code. There was a suggestion by @didierc to use an immutable iterator; I tested this suggestion, and it increased the running time to just over 5 seconds.
For Clojure, @mikera and @Webb gave several suggestions to speed things up: use loop/recur for fast iteration with two loop variables, use unchecked-math for slightly faster maths operations (since we know there is no danger of overflow here), use primitive longs rather than boxed numbers, and avoid higher-order functions like every?.
Running @mikera's code, I end up with a running time of 2.453 seconds, not quite as good as the Scala code, but much better than my original version and better than the Java version:
(set! *unchecked-math* true)

(defn euler5
  []
  (loop [n 1
         d 2]
    (if (== 0 (unchecked-remainder-int n d))
      (if (>= d 20) n (recur n (inc d)))
      (recur (inc n) 2))))

(defn is-divisible-by-all?
  [number divisors]
  (= 0 (reduce + (map #(mod number %) divisors))))
For Scala, @didierc points out that the range object 1 to 20 isn't actually a list of objects but rather a single object. Very cool. Thus, the performance advantage in Scala comes from iterating over a single object instead of a list/array of the integers 1-20.
In fact, if I change the helper function in the Scala version from a range object to a list (see below), the running time increases from 0.302 seconds to 226.59 seconds.
private def isDivisibleByAll2(n: Int, top: Int): Boolean = {
  def divisors: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)
  divisors.forall(n % _ == 0)
}
Thus, it appears that @didierc has correctly identified the advantage Scala has in this instance. It would be interesting to know how this type of object might be implemented in Java and Clojure.
@didierc's suggestion to improve the code by creating an ImmutableRange class, as follows:
import java.util.Iterator;
import java.lang.Iterable;

public class ImmutableRange implements Iterable<Integer> {

    class ImmutableRangeIterator implements Iterator<Integer> {
        private int counter, end, step;

        public ImmutableRangeIterator(int start_, int end_, int step_) {
            end = end_;
            step = step_;
            counter = start_;
        }

        public boolean hasNext() {
            if (step > 0) return counter <= end;
            else return counter >= end;
        }

        public Integer next() {
            int r = counter;
            counter += step;
            return r;
        }

        public void remove() {
            throw new UnsupportedOperationException();
        }
    }

    private int start, end, step;

    public ImmutableRange(int start_, int end_, int step_) {
        // fix-me: properly check for parameters consistency
        start = start_;
        end = end_;
        step = step_;
    }

    public Iterator<Integer> iterator() {
        return new ImmutableRangeIterator(start, end, step);
    }
}
did not improve the running time. The Java version ran at 5.097 seconds on my machine. Thus, in the end, we have a satisfactory answer as to why the Scala version performs better, and we understand how to improve the performance of the Clojure version; what is still missing is an understanding of how to implement Scala's immutable Range object efficiently in Java.
FINAL THOUGHTS
As several have commented, the most effective way to improve the running time of this code is to use a better algorithm. For example, the following Java code computes the answer in less than 1 millisecond using the Sieve of Eratosthenes and trial division:
/**
 * Smallest Multiple
 *
 * 2520 is the smallest number that can be divided by each of the numbers
 * from 1 to 10 without any remainder. What is the smallest positive number
 * that is evenly divisible by all of the numbers from 1 to 20?
 *
 * User: Alexandros Bantis
 * Date: 1/29/13
 * Time: 7:06 PM
 */
public class Problem005 {
    final private static int CROSSED_OUT = 0;
    final private static int NOT_CROSSED_OUT = 1;

    private static int intPow(int base, int exponent) {
        int value = 1;
        for (int i = 0; i < exponent; i++)
            value *= base;
        return value;
    }

    /**
     * primesTo computes all prime numbers up to n using a sieve
     *
     * @param n designates primes should be in the range 2 ... n
     * @return int[] a sieve of all prime factors
     *         (0=CROSSED_OUT, 1=NOT_CROSSED_OUT)
     */
    private static int[] primesTo(int n) {
        int ceiling = (int) Math.sqrt(n * 1.0) + 1;
        int[] sieve = new int[n+1];

        // set default values
        for (int i = 2; i <= n; i++)
            sieve[i] = NOT_CROSSED_OUT;

        // cross out sieve values
        for (int i = 2; i <= ceiling; i++)
            for (int j = 2; i*j <= n; j++)
                sieve[i*j] = CROSSED_OUT;
        return sieve;
    }

    /**
     * getPrimeExp computes a prime factorization of n by trial division
     *
     * @param n the number subject to prime factorization
     * @return int[] an array of exponents for prime factors of n
     *         thus 8 => (0^0, 1^0, 2^3, 3^0, 4^0, 5^0, 6^0, 7^0, 8^0)
     */
    public static int[] getPrimeExp(int n) {
        int[] factor = primesTo(n);
        int[] primePowAll = new int[n+1];

        // set prime_factor_exponent for all factor/exponent pairs
        for (int i = 2; i <= n; i++) {
            if (factor[i] != CROSSED_OUT) {
                while (true) {
                    if (n % i == 0) {
                        n /= i;
                        primePowAll[i] += 1;
                    } else {
                        break;
                    }
                }
            }
        }
        return primePowAll;
    }

    /**
     * findSmallestMultiple computes the smallest number evenly divisible
     * by all numbers 1 to n
     *
     * @param n the top of the range
     * @return int evenly divisible by all numbers 1 to n
     */
    public static int findSmallestMultiple(int n) {
        int[] gcfAll = new int[n+1];

        // populate greatest common factor arrays
        int[] gcfThis = null;
        for (int i = 2; i <= n; i++) {
            gcfThis = getPrimeExp(i);
            for (int j = 2; j <= i; j++) {
                if (gcfThis[j] > 0 && gcfThis[j] > gcfAll[j]) {
                    gcfAll[j] = gcfThis[j];
                }
            }
        }

        // multiply out gcf arrays
        int value = 1;
        for (int i = 2; i <= n; i++) {
            if (gcfAll[i] > 0)
                value *= intPow(i, gcfAll[i]);
        }
        return value;
    }
}
Here's a much faster version in Clojure:
(set! *unchecked-math* true)

(defn euler5 []
  (loop [n 1
         d 2]
    (if (== 0 (unchecked-remainder-int n d))
      (if (>= d 20) n (recur n (inc d)))
      (recur (inc n) 2))))
(time (euler5))
=> "Elapsed time: 2438.761237 msecs"
i.e. it is around the same speed as your Java version.
The key tricks are:
use loop/recur for fast iteration with two loop variables
use unchecked-math for slightly faster maths operations (since we know there is no danger of overflow here)
use primitive longs rather than boxed numbers
avoid higher order functions like every? - they have a higher overhead than the low level operations
Obviously, if you really care about speed you would pick a better algorithm :-)
Scala is faster because the other solutions create explicit collections for no reason. In Scala, 1 to top creates an object that represents the numbers from 1 to top but doesn't explicitly list them anywhere. In Java, you do explicitly create the list--and it's a lot faster to create one object than an array of 20 (actually 21 objects, since ArrayList is also an object) every iteration.
(Note that none of the versions are actually anywhere near optimal. See "least common multiple", which is what Eastsun is doing without mentioning it.)
The first thing I noticed that will probably have some impact on speed in the Java version is that you're creating an ArrayList<Integer> instead of an int[].
Since version 5, Java has a feature (autoboxing) that automatically converts between an Integer and an int. You're iterating over this list, treating the elements as int in your comparisons and math calculations, which forces Java to spend a lot of cycles converting between the two types. Replacing your ArrayList<Integer> with an int[] will probably have some performance impact.
My first instinct when looking at your timings is to verify all are giving correct results. I assume you've properly tested all three to make sure the faster Scala version is indeed giving you correct results.
It doesn't seem related to the choice of algorithm for solving it since the strategy looks the same in all three (I'm not familiar with Clojure or Scala, so I might be missing on some subtle difference). Perhaps Scala is able to internally optimize this particular loop/algorithm, yielding much faster results?
On my painfully slow computer, the Clojure code takes nearly 10 minutes, so I am running about 20x slower on old faithful here.
user=> (time (smallest-multiple-of-1-to-n 20))
"Elapsed time: 561420.259 msecs"
232792560
You might be able to make this same algorithm more comparable with the others by avoiding laziness, using type hints / primitives / unchecked operations, etc. The Clojure code is boxing primitives for the anonymous function and creating/realizing a lazy sequence for range each iteration of the loop. This overhead is usually negligible, but here it is being looped hundreds of millions of times. The following non-idiomatic code gives a 3x speed-up.
(defn smallest-multiple-of-1-to-n [n]
  (loop [c (int n)]
    (if (loop [x (int 2)]
          (cond (pos? (unchecked-remainder-int c x)) false
                (>= x n) true
                :else (recur (inc x))))
      c
      (recur (inc c)))))
user=> (time (smallest-multiple-of-1-to-n 20))
"Elapsed time: 171921.80347 msecs"
232792560
You could continue to tinker with this and probably get even closer, but it is better to think through the algorithm instead and do better than iterating from 20 to ~200 million.
(defn gcd [a b]
  (if (zero? b) a (recur b (mod a b))))

(defn lcm
  ([a b] (* b (quot a (gcd a b))))
  ([a b & r] (reduce lcm (lcm a b) r)))
user=> (time (apply lcm (range 2 21)))
"Elapsed time: 0.268749 msecs"
232792560
So even on my ancient machine, this is over 1000x faster than any implementation of your algorithm on your quick machine. I noticed that a gcd/lcm fold solution was posted for Scala as well, so it would be interesting to compare the speeds of these similar algorithms.
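For comparison, a Java rendering of the same gcd/lcm fold might look like this (a sketch of my own; class and method names are illustrative):

```java
// Sketch: fold lcm over 2..20, mirroring the Clojure gcd/lcm solution above.
public class LcmFold {
    static long gcd(long a, long b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    static long lcm(long a, long b) {
        return a / gcd(a, b) * b; // divide first to limit overflow risk
    }

    static long lcmRange(int lo, int hi) {
        long result = 1;
        for (int i = lo; i <= hi; i++)
            result = lcm(result, i);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(lcmRange(2, 20)); // prints 232792560
    }
}
```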
Following your algorithm, the Clojure version is about 10 times slower than the Java version.
A bit faster for the clojure version:
46555ms => 23846ms
(defn smallest-multiple-of-1-to-n
  [n]
  (let [divisors (range 2 (inc n))]
    (loop [i n]
      (if (loop [d 2]
            (cond (> d n) true
                  (not= 0 (mod i d)) false
                  :else (recur (inc d))))
        i
        (recur (inc i))))))
A bit faster for the Java version: 3248ms => 2757ms
private static int[] divisors;

private static void initializeDivisors(int ceiling) {
    divisors = new int[ceiling];
    for (int i = 1; i <= ceiling; i++)
        divisors[i - 1] = i;
}
First of all, if a number is divisible by, for example, 4, it is also divisible by 2 (one of 4's factors).
So, from 1-20, you only need to check some of the numbers, not all of them.
Secondly, if you prime factorize the numbers, this is simply asking you for the lowest common multiple (that's another way to approach this problem). In fact, you could probably do it with pen and paper since it's only 1-20.
The algorithm that you're working with is fairly naive - it doesn't use the information that the problem is providing you with to its full extent.
Here is a more efficient solution in scala:
def smallestMultiple(n: Int): Int = {
  @scala.annotation.tailrec
  def gcd(x: Int, y: Int): Int = if (x == 0) y else gcd(y % x, x)

  (1 to n).foldLeft(1){ (x, y) => x / gcd(x, y) * y }
}
And I wonder why your Scala version of Problem 1 is so inefficient.
Here are two possible solution of Problem 1 in Scala:
A short one:
(1 until 1000) filter (n => n%3 == 0 || n%5 == 0) sum
A more efficient one:
(1 until 1000).foldLeft(0){ (r,n) => if(n%3==0||n%5==0) r+n else r }
The problem is not about boxing, laziness, list, vectors, etc. The problem is about the algorithm. Of course, the solution is "brute force", but it's about the proportion of "brute" in "force".
First, in Euler Problem 5, we are not asked to check divisibility by 1 to n, just 1 to 20. Second, the solution must be a multiple of 38. Third, the prime numbers must be checked first, and all divisors must be checked in descending order, to fail as soon as possible. Fourth, some divisors also imply other divisors: if a number is divisible by 18, it is also divisible by 9, 6, and 3. Finally, all numbers are divisible by 1.
This solution in Clojure runs in a negligible time of 410 ms on a MacBook Pro i7:
;Euler 5 helper
(defn divisible-by-all [n]
(let [divisors [19 17 13 11 20 18 16 15 14 12]
maxidx (dec (count divisors))]
(loop [idx 0]
(let [result (zero? (mod n (nth divisors idx)))]
(cond
(and (= idx maxidx) (true? result)) true
(false? result) false
:else (recur (inc idx)))))))
;Euler 5 solution
(defn min-divisible-by-one-to-twenty []
(loop[ x 38 ] ;this one can be set MUCH MUCH higher...
(let [result (divisible-by-all x)]
(if (true? result) x (recur (+ x 38))))))
user=>(time (min-divisible-by-one-to-twenty))
"Elapsed time: 410.06 msecs"
I believe this is the fastest pure Java code you could write for that problem and naive algorithm. It is faster than Scala.
public class Euler5 {
    public static void main(String[] args) {
        int test = 2520;
        int i;
        again: while (true) {
            test++;
            for (i = 20; i > 1; i--) {
                if (test % i != 0)
                    continue again;
            }
            break;
        }
        System.out.println(test);
    }
}
A couple of little details:
We can start testing at 2520 since the question mentioned it as a value :)
It seemed to me like we'd fail faster at the top of the range than at the bottom - I mean, how many things are divisible by 19 vs say, 3?
I used a label for the continue statement. This is basically a cheap, synthetic way to both reset the for loop and increment our test case.
Related
The factorial of 42 goes beyond the limit of the long data type in Java; that's why I can't compute it.
42!
The factorial of 42 has 52 digits, while the maximum value of the long datatype in Java, 9,223,372,036,854,775,807, has only 19 digits. But don't worry: Java has a BigInteger class to store large numbers such as 100!. It's a bit slower than primitive data types such as int and long, because it stores integers in the form of arrays. There are many ways to use the BigInteger class, but here's the most common one. This code calculates the factorial of 42 and prints it:
// Java program to find large factorials using BigInteger
import java.math.BigInteger;

public class Factorial
{
    // Returns Factorial of N
    static BigInteger factorial(int N)
    {
        BigInteger fact = new BigInteger("1"); // Or BigInteger.ONE

        // Multiply fact by 2, 3, ... N
        for (int i = 2; i <= N; i++)
            fact = fact.multiply(BigInteger.valueOf(i));
        return fact;
    }

    public static void main(String args[])
    {
        int N = 42;
        System.out.println(factorial(N));
    }
}
Output:
1405006117752879898543142606244511569936384000000000
Explanation
We have to import the Big Integer class, which is stored in java.math package. I have named my file Factorial.java, so my class name is Factorial.
In this method, I've created a function, if you want the code without function, just comment below. Now in this syntax-
BigInteger fact = new BigInteger("1");
I've assigned fact as a BigInteger equal to 1. In the for loop,
i starts at 2, since multiplying by 1 is redundant (1*1 = 1).
fact = fact.multiply(BigInteger.valueOf(i));
The above syntax is for the multiplication of Big integers. This multiplies the Biginteger fact by i.
Have a look at this GeeksforGeeks article- https://www.geeksforgeeks.org/biginteger-class-in-java/
If you only care about the number of digits, I would recommend taking a more mathematical approach. There are ways to compute this number without actually computing the factorial itself. This would not require so big a variable and would be a lot faster.
You could think it this way:
Digits(n!) = floor(log10(n!)) + 1 = floor(log10(n * (n-1) * ... * 1)) + 1 = floor(\sum_{i=1}^{n} log10(i)) + 1
This would still require iteration, but it deals with much smaller numbers.
If you still want O(1) complexity for this task, you can go with a pretty good approximation I've just tried.
Digits(n!) ~ floor(\int_{1}^{n} log10(x) dx) + 1 = floor((n*ln(n) - n + 1) / ln(10)) + 1
Of course, the latter is not absolutely exact, since we are now integrating a continuous function, so it can undercount slightly; it will probably still be worth implementing.
Digits(42!) ~ floor(50.37...) + 1 = 50 + 1 = 51 (the exact count is 52).
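Both formulas are easy to sketch in Java (method names are mine; the log-sum version is exact up to floating-point rounding):

```java
// Sketch: count digits of n! without ever computing n!.
public class FactorialDigits {
    // Exact count: floor(sum_{i=1..n} log10(i)) + 1
    static int digitsBySum(int n) {
        double sum = 0.0;
        for (int i = 1; i <= n; i++)
            sum += Math.log10(i);
        return (int) Math.floor(sum) + 1;
    }

    // O(1) approximation: floor((-n + n*ln(n) + 1) / ln(10)) + 1
    static int digitsByIntegral(int n) {
        return (int) Math.floor((-n + n * Math.log(n) + 1) / Math.log(10)) + 1;
    }

    public static void main(String[] args) {
        System.out.println(digitsBySum(42));      // prints 52
        System.out.println(digitsByIntegral(42)); // prints 51 (slight undercount)
    }
}
```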
This is a question regarding a piece of coursework so would rather you didn't fully answer the question but rather give tips to improve the run time complexity of my current algorithm.
I have been given the following information:
A function g(n) is given by g(n) = f(n,n) where f may be defined recursively by
I have implemented this algorithm recursively with the following code:
public static double f(int i, int j)
{
if (i == 0 && j == 0) {
return 0;
}
if (i ==0 || j == 0) {
return 1;
}
return ((f(i-1, j)) + (f(i-1, j-1)) + (f(i, j-1)))/3;
}
This algorithm gives the results I am looking for, but it is extremely inefficient and I am now tasked to improve the run time complexity.
I wrote an algorithm that creates an n*n matrix, computes every element up to the [n][n] element, and then returns that element; for example, f(1,1) returns 0.6 recurring, since it is the result of (1 + 0 + 1)/3.
I have also created a spreadsheet of the result from f(0,0) to f(7,7) which can be seen below:
Now although this is much faster than my recursive algorithm, it has a huge overhead of creating a n*n matrix.
Any suggestions to how I can improve this algorithm will be greatly appreciated!
I can now see that it is possible to make the algorithm O(n) complexity, but is it possible to work out the result without creating an [n][n] 2D array?
I have created a solution in Java that runs in O(n) time and O(n) space and will post the solution after I have handed in my coursework to stop any plagiarism.
This is another one of those questions where it's better to examine it, before diving in and writing code.
The first thing I'd say you should do is look at a grid of the numbers, and represent them not as decimals but as fractions.
The first thing that should be obvious is that the denominator of each entry is just a measure of its distance from the origin: every application of the recursion divides by another 3, so the denominator of f(i,j) is 3^(i+j-1).
If you look at a grid in this way, you can get all of the denominators:
Note that the first row and column are not all 1s - they've been chosen to follow the pattern, and the general formula which works for all of the other squares.
The numerators are a little bit more tricky, but still doable. As with most problems like this, the answer is related to combinations, factorials, and then some more complicated things. Typical entries here include Catalan numbers, Stirling's numbers, Pascal's triangle, and you will nearly always see Hypergeometric functions used.
Unless you do a lot of maths, it's unlikely you're familiar with all of these, and there is a hell of a lot of literature. So I have an easier way to find out the relations you need, which nearly always works. It goes like this:
Write a naive, inefficient algorithm to get the sequence you want.
Copy a reasonably large amount of the numbers into google.
Hope that a result from the Online Encyclopedia of Integer Sequences pops up.
3.b. If one doesn't, then look at some differences in your sequence, or some other sequence related to your data.
Use the information you find to implement said sequence.
So, following this logic, here are the numerators:
Now, unfortunately, googling those yielded nothing. However, there are a few things you can notice about them, the main ones being that the first row/column are just powers of 3, and that the second row/column are one less than powers of three. This kind of boundary is exactly the same as in Pascal's triangle and a lot of related sequences.
Here is the matrix of differences between the numerators and denominators:
Where we've decided that the f(0,0) element shall just follow the same pattern. These numbers already look much simpler. Also note, rather interestingly, that these numbers follow the same rule as the initial numbers, except that the first number is one (and they are offset by a column and a row): T(i,j) = T(i-1,j) + T(i,j-1) + 3*T(i-1,j-1):
1
1 1
1 5 1
1 9 9 1
1 13 33 13 1
1 17 73 73 17 1
1 21 129 245 129 21 1
1 25 201 593 593 201 25 1
This looks more like the sequences you see a lot in combinatorics.
If you google numbers from this matrix, you do get a hit.
And then if you cut off the link to the raw data, you get sequence A081578, which is described as a "Pascal-(1,3,1) array", which exactly makes sense - if you rotate the matrix, so that the 0,0 element is at the top, and the elements form a triangle, then you take 1* the left element, 3* the above element, and 1* the right element.
The question now is implementing the formulae used to generate the numbers.
Unfortunately, this is often easier said than done. For example, the formula given on the page:
T(n,k)=sum{j=0..n, C(k,j-k)*C(n+k-j,k)*3^(j-k)}
is wrong, and it takes a fair bit of reading the paper (linked on the page) to work out the correct formula. The sections you want are proposition 26, corollary 28. The sequence is mentioned in Table 2 after proposition 13. Note that r=4
The correct formula is given in proposition 26, but there is also a typo there :/. The k=0 in the sum should be a j=0:
Where T is the triangular matrix containing the coefficients.
The OEIS page does give a couple of implementations to calculate the numbers, but neither of them are in java, and neither of them can be easily transcribed to java:
There is a Mathematica example:
Table[ Hypergeometric2F1[-k, k-n, 1, 4], {n, 0, 10}, {k, 0, n}] // Flatten
which, as always, is ridiculously succinct. And there is also a Haskell version, which is equally terse:
a081578 n k = a081578_tabl !! n !! k
a081578_row n = a081578_tabl !! n
a081578_tabl = map fst $ iterate
    (\(us, vs) -> (vs, zipWith (+) (map (* 3) ([0] ++ us ++ [0])) $
                       zipWith (+) ([0] ++ vs) (vs ++ [0]))) ([1], [1, 1])
I know you're doing this in Java, but I could not be bothered to transcribe my answer to Java (sorry). Here's a Python implementation:
from __future__ import division
import math

#
# Helper functions
#

def cache(function):
    cachedResults = {}
    def wrapper(*args):
        if args in cachedResults:
            return cachedResults[args]
        else:
            result = function(*args)
            cachedResults[args] = result
            return result
    return wrapper

@cache
def fact(n):
    return math.factorial(n)

@cache
def binomial(n, k):
    if n < k: return 0
    return fact(n) / ( fact(k) * fact(n-k) )

def numerator(i, j):
    """
    Naive way to calculate numerator
    """
    if i == j == 0:
        return 0
    elif i == 0 or j == 0:
        return 3**(max(i,j)-1)
    else:
        return numerator(i-1,j) + numerator(i,j-1) + 3*numerator(i-1,j-1)

def denominator(i, j):
    return 3**(i+j-1)

def A081578(n, k):
    """
    http://oeis.org/A081578
    """
    total = 0
    for j in range(n-k+1):
        total += binomial(k, j) * binomial(n-k, j) * 4**(j)
    return int(total)

def diff(i, j):
    """
    Difference between the numerator and the denominator.
    Answer will then be 1 - diff/denom.
    """
    if i == j == 0:
        return 1/3
    elif i == 0 or j == 0:
        return 0
    else:
        return A081578(j+i-2, i-1)

def answer(i, j):
    return 1 - diff(i,j) / denominator(i,j)

# And a little bit at the end to demonstrate it works.
N, M = 10, 10

for i in range(N):
    row = "%10.5f"*M % tuple([numerator(i,j)/denominator(i,j) for j in range(M)])
    print row

print ""

for i in range(N):
    row = "%10.5f"*M % tuple([answer(i,j) for j in range(M)])
    print row
So, for a closed form:
Where the C's are just binomial coefficients.
Here's the result:
One final addition: if you are looking to do this for large numbers, then you'll need to compute the binomial coefficients a different way, as you'll overflow the integers. Your answers are all floating point though, and since you're apparently interested in large f(n) = T(n,n), I guess you could use Stirling's approximation or something.
Well for starters here are some things to keep in mind:
This condition can only occur once, yet you test it every time through every loop.
if (x == 0 && y == 0) {
    matrix[x][y] = 0;
}
You should instead: matrix[0][0] = 0; right before you enter your first loop and set x to 1. Since you know x will never be 0 you can remove the first part of your second condition x == 0 :
for (int x = 1; x <= i; x++)
{
    for (int y = 0; y <= j; y++)
    {
        if (y == 0) {
            matrix[x][y] = 1;
        }
        else
            matrix[x][y] = (matrix[x-1][y] + matrix[x-1][y-1] + matrix[x][y-1]) / 3;
    }
}
No point in declaring row and column since you only use it once. double[][] matrix = new double[i+1][j+1];
This algorithm has a minimum complexity of Ω(n) because you just need to multiply the values in the first column and row of the matrix with some factors and then add them up. The factors stem from unwinding the recursion n times.
However, you therefore need to do the unwinding of the recursion, which itself has a complexity of O(n^2). But by balancing unwinding and evaluation of the recursion, you should be able to reduce complexity to O(n^x) where 1 <= x <= 2. This is somewhat similar to algorithms for matrix-matrix multiplication, where the naive case has a complexity of O(n^3) but Strassen's algorithm is, for example, O(n^2.807).
Another point is the fact that the original formula uses a factor of 1/3. Since this is not accurately representable in fixed point or IEEE 754 floating point, the error grows as the recursion is evaluated successively. Therefore, unwinding the recursion could give you higher accuracy as a nice side effect.
For example when you unwind the recursion sqr(n) times then you have complexity O((sqr(n))^2+(n/sqr(n))^2). The first part is for unwinding and the second part is for evaluating a new matrix of size n/sqr(n). That new complexity actually can be simplified to O(n).
To describe time complexity we usually use a big O notation. It is important to remember that it only describes the growth given the input. O(n) is linear time complexity, but it doesn't say how quickly (or slowly) the time grows when we increase input. For example:
n=3 -> 30 seconds
n=4 -> 40 seconds
n=5 -> 50 seconds
This is O(n), we can clearly see that every increase of n increases the time by 10 seconds.
n=3 -> 60 seconds
n=4 -> 80 seconds
n=5 -> 100 seconds
This is also O(n): even though for every n we need twice as much time, and the increase is 20 seconds for every increase of n, the time complexity still grows linearly.
So if you have O(n*n) time complexity and you will half the number of operations you perform, you will get O(0.5*n*n) which is equal to O(n*n) - i.e. your time complexity won't change.
This is theory, in practice the number of operations sometimes makes a difference. Because you have a grid n by n, you need to fill n*n cells, so the best time complexity you can achieve is O(n*n), but there are a few optimizations you can do:
Cells on the edges of the grid could be filled in separate loops. Currently in majority of the cases you have two unnecessary conditions for i and j equal to 0.
Your grid has a line of symmetry, so you could calculate only half of it and then copy the results onto the other half: for every i and j, grid[i][j] = grid[j][i].
On final note, the clarity and readability of the code is much more important than performance - if you can read and understand the code, you can change it, but if the code is so ugly that you cannot understand it, you cannot optimize it. That's why I would do only first optimization (it also increases readability), but wouldn't do the second one - it would make the code much more difficult to understand.
As a rule of thumb, don't optimize the code, unless the performance is really causing problems. As William Wulf said:
More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason - including blind stupidity.
EDIT:
I think it may be possible to implement this function with O(1) complexity. Although it gives no benefit when you need to fill the entire grid, with O(1) time complexity you can instantly get any value without having a grid at all.
A few observations:
denominator is equal to 3 ^ (i + j - 1)
if i = 2 or j = 2, numerator is one less than denominator
EDIT 2:
The numerator can be expressed with the following function:
public static int n(int i, int j) {
    if (i == 1 || j == 1) {
        return 1;
    } else {
        return 3 * n(i - 1, j - 1) + n(i - 1, j) + n(i, j - 1);
    }
}
Very similar to original problem, but no division and all numbers are integers.
If the question is about how to output all values of the function for 0<=i<N, 0<=j<N, here is a solution in time O(N²) and space O(N). The time behavior is optimal.
Use a temporary array T of N numbers and set it to all ones, except for the first element.
Then row by row,
use a temporary element TT and set it to 1,
then column by column, simultaneously assign T[I-1] ← TT and TT ← (TT + T[I-1] + T[I]) / 3, using the old values on the right-hand sides.
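A sketch of this scheme in Java (my own rendering of the update rule above, using the convention f(0,0) = 0 with ones along both axes):

```java
// Sketch: O(N) space evaluation of f(n,n), overwriting one row in place.
public class RollingRow {
    static double f(int n) {
        double[] T = new double[n + 1];          // holds the previous row f(i-1, .)
        for (int j = 1; j <= n; j++) T[j] = 1.0; // f(0, j) = 1
        T[0] = 0.0;                              // f(0, 0) = 0
        for (int i = 1; i <= n; i++) {
            double TT = 1.0;                     // f(i, 0) = 1
            for (int j = 1; j <= n; j++) {
                double next = (TT + T[j - 1] + T[j]) / 3.0; // old row values
                T[j - 1] = TT;                   // f(i, j-1) is now final
                TT = next;
            }
            T[n] = TT;                           // store f(i, n)
        }
        return T[n];                             // f(n, n)
    }

    public static void main(String[] args) {
        System.out.println(f(1)); // 0.666...
        System.out.println(f(2)); // 0.8148...
    }
}
```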
Thanks to will's (first) answer, I had this idea:
Consider that any positive contribution comes only from the 1's along the x and y axes. Each of the recursive calls to f divides each component of the solution by 3, which means we can sum, combinatorially, how many ways each 1 features as a component of the solution, and consider its "distance" (measured as how many calls of f it is from the target) as a negative power of 3.
JavaScript code:
function f(n){
    var result = 0;
    for (var d = n; d < 2*n; d++){
        var temp = 0;
        for (var NE = 0; NE < 2*n - d; NE++){
            temp += choose(n, NE);
        }
        result += choose(d - 1, d - n) * temp / Math.pow(3, d);
    }
    return 2 * result;
}

function choose(n, k){
    if (k == 0 || n == k){
        return 1;
    }
    var product = n;
    for (var i = 2; i <= k; i++){
        product *= (n + 1 - i) / i;
    }
    return product;
}
Output:
for (var i = 1; i < 8; i++){
    console.log("F(" + i + "," + i + ") = " + f(i));
}
F(1,1) = 0.6666666666666666
F(2,2) = 0.8148148148148148
F(3,3) = 0.8641975308641975
F(4,4) = 0.8879743941472337
F(5,5) = 0.9024030889600163
F(6,6) = 0.9123609205913732
F(7,7) = 0.9197747256986194
I am writing code for a crypto method that computes x^d modulo n using repeated squaring:
public static long repeatedSquaring(long x, long d, long n){
    x = x % n;
    boolean dj = d % 2 == 1;
    long c = dj ? x : 1;
    d = d / 2;
    while (d > 0){
        dj = d % 2 == 1;
        x = x * x % n; //Here
        if (dj)
            c = c * x % n; //and here..
        d = d / 2;
    }
    return c;
}
This code works fine when n is small, but with n > sqrt(Long.MAX_VALUE) it gives unexpected results: with x ≈ n, we can have x*x > Long.MAX_VALUE, and then the modulo operator assigns an incorrect value to x (or c).
So my question is: how can we compute (A * B) % N (all of type long) using only ordinary arithmetic?
I don't want to use BigInteger (BigA.multiply(BigB).remainder(BigN), or BigX.modPow(BigD, BigN) directly for the whole problem). I expect plain numeric computation to run faster than BigInteger's, and in my problem all the values involved do fit in a long.
I also wonder whether a solution can handle the worst case: A, B, N close to Long.MAX_VALUE.
Multiplying can be done in O(log b) time, similarly to exponentiation:
if b is odd: (a + multiply(2*a, (b-1)/2)) mod N
else: multiply(2*a, b/2) mod N
This works as long as the values stay below Long.MAX_VALUE / 2.
http://en.wikipedia.org/wiki/Montgomery_reduction might be more optimal.
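In Python that doubling scheme looks like this (mulmod is my name for it; Python ints never overflow, so the point is only to show the structure that keeps every intermediate below 2*n when written with Java longs):

```python
def mulmod(a, b, n):
    # (a * b) % n by repeated doubling, assuming b >= 0; every
    # intermediate value stays below 2*n, which is what avoids
    # overflow when the same scheme is written with 64-bit longs
    a %= n
    result = 0
    while b > 0:
        if b & 1:                # low bit of b set: fold one copy of a in
            result = (result + a) % n
        a = (a + a) % n          # double a, keeping it reduced mod n
        b >>= 1
    return result

print(mulmod(7, 8, 5))  # → 1
```

Plugging a function like this in for the two marked multiplications in repeatedSquaring would fix the overflow, at the cost of O(log b) additions per multiply.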
Really, the short answer is that you need to use BigInteger, even if you don't want to. As you've discovered, the approach you're currently taking will overflow the bounds of a long; even if you improve the algorithm, you still can't get more than 64 bits into the answer with a long.
You say you're using this for crypto; but 64-bit public key crypto is so weak that it is worse than not having it (because it gives a false sense of security). Even 1024 bits is not enough these days for public key, and 64 bits could be cracked more or less instantaneously.
Note that this is not the same as symmetric crypto, where the keys can be much smaller. (But even there, 64 bits is not enough to stop even an amateur hacker.)
See this question, where it was pointed out that 64-bit RSA can be cracked in a fraction of a second... and that was four years ago!
Recently I became interested in the subset-sum problem: finding a zero-sum subset of a given set. I found some solutions on SO; in addition, I came across a particular solution using the dynamic programming approach, and I translated it into Python based on its qualitative description. I'm trying to optimize it for larger lists, which eats up a lot of my memory. Can someone recommend optimizations or other techniques to solve this particular problem? Here's my attempt in Python:
import random
from time import time
from itertools import product
time0 = time()
# create a zero matrix of size a (row), b(col)
def create_zero_matrix(a, b):
    return [[0] * b for x in xrange(a)]
# generate a list of size num with random integers with an upper and lower bound
def random_ints(num, lower=-1000, upper=1000):
    return [random.randrange(lower, upper + 1) for i in range(num)]
# split a list up into N and P where N be the sum of the negative values and P the sum of the positive values.
# 0 does not count because of additive identity
def split_sum(A):
    N_list = []
    P_list = []
    for x in A:
        if x < 0:
            N_list.append(x)
        elif x > 0:
            P_list.append(x)
    return [sum(N_list), sum(P_list)]
# since the column indexes are in the range from 0 to P - N
# we would like to retrieve them based on the index in the range N to P
# n := row, m := col
def get_element(table, n, m, N):
    if n < 0:
        return 0
    # note: a negative column index (m - N < 0) would silently wrap
    # around in Python instead of raising, so guard for it explicitly
    if m - N < 0:
        return 0
    try:
        return table[n][m - N]
    except IndexError:
        return 0
# same definition as above
def set_element(table, n, m, N, value):
    table[n][m - N] = value
# input array
#A = [1, -3, 2, 4]
A = random_ints(200)
[N, P] = split_sum(A)
# create a zero matrix of size m (row) by n (col)
#
# m := the number of elements in A
# n := P - N + 1 (by definition N <= s <= P)
#
# each element in the matrix will be a value of either 0 (false) or 1 (true)
m = len(A)
n = P - N + 1;
table = create_zero_matrix(m, n)
# set first element in index (0, A[0]) to be true
# Definition: Q(1,s) := (x1 == s). Note that index starts at 0 instead of 1.
set_element(table, 0, A[0], N, 1)
# iterate through each table element
#for i in xrange(1, m): #row
#    for s in xrange(N, P + 1): #col
for i, s in product(xrange(1, m), xrange(N, P + 1)):
    if get_element(table, i - 1, s, N) or A[i] == s or get_element(table, i - 1, s - A[i], N):
        #set_element(table, i, s, N, 1)
        table[i][s - N] = 1
# find zero-sum subset solution
s = 0
solution = []
for i in reversed(xrange(0, m)):
    if get_element(table, i - 1, s, N) == 0 and get_element(table, i, s, N) == 1:
        s = s - A[i]
        solution.append(A[i])
print "Solution: ",solution
time1 = time()
print "Time execution: ", time1 - time0
I'm not quite sure if your solution is exact or a PTA (poly-time approximation).
But, as someone pointed out, this problem is indeed NP-Complete.
Meaning, every known (exact) algorithm has exponential time behavior in the size of the input.
Meaning, if you can process one operation every 0.1 nanoseconds (10^10 operations per second), then for a list of 59 elements it will take roughly:
2^59 ops / 10^10 ops per second ≈ 5.8 x 10^7 seconds ≈ 2 years
You can find heuristics, which give you just a CHANCE of finding an exact solution in polynomial time.
On the other hand, if you restrict the problem (to a different one) by bounding the values of the numbers in the set, then the complexity drops to polynomial time. But even then, the memory consumed will be a polynomial of VERY high order: much larger than the few gigabytes you have in memory, and even much larger than the few terabytes on your hard drive (and that's for small values of the bound on the elements of the set).
That may be the case for your dynamic programming algorithm: it seemed to me that you were using a bound of 1000 when building your initialization matrix. You can try a smaller bound - that is, if your input consistently consists of small values.
Good luck!
Someone on Hacker News came up with the following solution to the problem, which I quite liked. It just happens to be in python :):
def subset_summing_to_zero(activities):
    subsets = {0: []}
    for (activity, cost) in activities.iteritems():
        old_subsets = subsets
        subsets = {}
        for (prev_sum, subset) in old_subsets.iteritems():
            subsets[prev_sum] = subset
            new_sum = prev_sum + cost
            new_subset = subset + [activity]
            if 0 == new_sum:
                new_subset.sort()
                return new_subset
            else:
                subsets[new_sum] = new_subset
    return []
I spent a few minutes with it and it worked very well.
An interesting article on optimizing Python code is available here. The main takeaway is that you should inline your hot loops: in your case, instead of calling get_element twice per iteration, put the body of that function inside the loop to avoid the function-call overhead.
Hope that helps! Cheers
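A sketch of what that inlining looks like for the question's main loop (fill_table is a hypothetical helper name wrapping it up so the sketch is self-contained; the variable names follow the question):

```python
def fill_table(A):
    # the question's main loop with get_element inlined: no function
    # calls per iteration, and explicit bounds checks instead of try/except
    N = sum(x for x in A if x < 0)
    P = sum(x for x in A if x > 0)
    m, n = len(A), P - N + 1
    table = [[0] * n for _ in range(m)]
    table[0][A[0] - N] = 1  # Q(1, s) := (x1 == s)
    for i in range(1, m):
        prev, row = table[i - 1], table[i]
        for s in range(N, P + 1):
            c2 = s - A[i] - N  # column of s - A[i], bounds-checked inline
            if prev[s - N] or A[i] == s or (0 <= c2 < n and prev[c2]):
                row[s - N] = 1
    return table, N

# e.g. for A = [1, -3, 2, 4], the zero-sum subset [1, -3, 2] exists,
# so table[2][0 - N] ends up set
```

Hoisting prev and row out of the inner loop also saves two list indexings per cell.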
First thing that caught my eye:
def split_sum(A):
    N_sum = 0
    P_sum = 0
    for x in A:
        if x < 0:
            N_sum += x
        elif x > 0:
            P_sum += x
    return [N_sum, P_sum]
Some advice:
Try using a 1D list with a bitarray to reduce the memory footprint to a minimum (http://pypi.python.org/pypi/bitarray); you would only need to change the get/set functions. This should cut your memory use by a factor of at least 64 (an integer stored in a list is a pointer to a boxed, typed integer, so the factor can be around 3*32).
Avoid try/except: figure out the proper index ranges at the beginning instead; you may find you gain a lot of speed.
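A sketch of both suggestions combined, using a flat bytearray of bits instead of the third-party bitarray module (the helper names are mine; explicit range checks replace the try/except):

```python
def make_bit_table(rows, cols):
    # one bit per cell instead of one boxed int per cell
    return bytearray((rows * cols + 7) // 8)

def get_bit(table, n, m, N, cols):
    col = m - N
    if n < 0 or not (0 <= col < cols):  # explicit bounds, no try/except
        return 0
    idx = n * cols + col
    return (table[idx >> 3] >> (idx & 7)) & 1

def set_bit(table, n, m, N, cols):
    idx = n * cols + (m - N)
    table[idx >> 3] |= 1 << (idx & 7)
```

Note the explicit check also fixes the silent wraparound that a negative column index would cause with plain list indexing.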
The following code works for Python 3.3+. I have used the itertools module, which has some great functions for this:
from itertools import chain, combinations
def powerset(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))
nums = input("Enter the Elements").strip().split()
inputSum = int(input("Enter the Sum You want"))
for i, combo in enumerate(powerset(nums), 1):
    total = 0  # avoid shadowing the built-in sum
    for num in combo:
        total += int(num)
    if total == inputSum:
        print(combo)
The Input Output is as Follows:
Enter the Elements 1 2 3 4
Enter the Sum You want 5
('1', '4')
('2', '3')
Just change the values in your set w, make an array x as long as w, and pass the target sum as the last argument to subsetsum; it will print every subset of w with that sum (so you can check it with your own values).
def subsetsum(cs, k, r, x, w, d):
    # cs: current sum, k: current index, r: sum of the remaining
    # elements w[k:], x: inclusion flags, d: target sum
    x[k] = 1
    if cs + w[k] == d:
        for i in range(0, k + 1):
            if x[i] == 1:
                print(w[i], end=" ")
        print()
    elif cs + w[k] + w[k + 1] <= d:
        subsetsum(cs + w[k], k + 1, r - w[k], x, w, d)
    if (cs + r - w[k] >= d) and (cs + w[k] <= d):
        x[k] = 0
        subsetsum(cs, k + 1, r - w[k], x, w, d)

# driver for the above code
w = [2, 3, 4, 5, 0]
x = [0, 0, 0, 0, 0]
subsetsum(0, 0, sum(w), x, w, 7)
This question was asked in my interview.
random(0,1) is a function that generates 0 and 1 uniformly at random.
Using this function, how would you design a function that takes two integers a, b as input and generates random integers between a and b, including a and b?
I have no idea how to solve this.
We can do this easily with bit logic (e.g., a = 4, b = 10):
Calculate the difference b - a (for the given example, 6).
Now calculate ceil(log2(b - a + 1)), i.e., the number of bits required to represent all numbers between a and b (for the given example, 3 bits, covering the range 000 - 111).
Call random(0,1) once for each bit to build a number num.
Repeat step 3 until num is between 000 and 110 (inclusive), i.e., until num < b - a + 1. There are only 7 valid states here, mapping to a, a+1, ..., a+6 = b.
Return num + a.
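The steps above can be sketched like this (random_bit stands in for the given random(0,1) primitive; the function name random_between is mine):

```python
import math
import random

def random_bit():
    # stand-in for the interview's random(0,1) primitive
    return random.getrandbits(1)

def random_between(a, b):
    # draw just enough bits to cover the b - a + 1 values,
    # and re-draw whenever the result overshoots the range
    span = b - a + 1
    bits = max(1, math.ceil(math.log2(span)))
    while True:
        num = 0
        for _ in range(bits):
            num = (num << 1) | random_bit()
        if num < span:
            return a + num
```

Because each out-of-range value is rejected rather than remapped, every value in [a, b] keeps exactly the same probability.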
I hate this kind of interview question, because there are answers that fulfill it to the letter but would make the interviewer pretty mad if you used them. For example:
Call random once;
if you obtain 0, output a;
if you obtain 1, output b.
A more sophisticated answer, and probably what the interviewer wants, is:
init(a, b){
    c = Max(a, b)
    d = ceil(log2(c + 1)) // so we know how many bits we need to cover both a and b
}

Random(){
    int r = 0;
    for(int i = 0; i < d; i++)
        r = (r << 1) | Random01();
    return r; // values outside [a, b] must be rejected and re-drawn
}
You can generate random strings of 0s and 1s by successively calling the sub function.
So we have randomBit() returning 0 or 1 independently, uniformly at random and we want a function random(a, b) that returns a value in the range [a,b] uniformly at random. Let's actually make that the range [a, b) because half-open ranges are easier to work with and equivalent. In fact, it is easy to see that we can just consider the case where a == 0 (and b > 0), i.e. we just want to generate a random integer in the range [0, b).
Let's start with the simple answer suggested elsewhere. (Forgive me for using C++ syntax; the concept is the same in Java.)
int random2n(int n) {
    return n ? randomBit() + (random2n(n - 1) << 1) : 0;
}

int random(int b) {
    int n = ceil(log2(b)), v;
    while ((v = random2n(n)) >= b);
    return v;
}
That is: it is easy to generate a value in the range [0, 2^n) given randomBit(). So to get a value in [0, b), we repeatedly generate something in the range [0, 2^ceil(log2(b))) until we get something in the correct range. It is rather trivial to show that this selects from the range [0, b) uniformly at random.
As stated before, the worst case expected number of calls to randomBit() for this is (1 + 1/2 + 1/4 + ...) ceil(log2(b)) = 2 ceil(log2(b)). Most of those calls are a waste, we really only need log2(n) bits of entropy and so we should try to get as close to that as possible. Even a clever implementation of this that calculates the high bits early and bails out as soon as it exits the wanted range has the same expected number of calls to randomBit() in the worst case.
We can devise a more efficient (in terms of calls to randomBit()) method quite easily. Let's say we want to generate a number in the range [0, b). With a single call to randomBit(), we should be able to approximately cut our target range in half. In fact, if b is even, we can do that. If b is odd, we will have a (very) small chance that we have to "re-roll". Consider the function:
int random(int b) {
    if (b < 2) return 0;
    int mid = (b + 1) / 2, ret = b;
    while (ret == b) {
        ret = (randomBit() ? mid : 0) + random(mid);
    }
    return ret;
}
This function essentially uses each random bit to select between two halves of the wanted range and then recursively generates a value in that half. While the function is fairly simple, the analysis of it is a bit more complex. By induction one can prove that this generates a value in the range [0, b) uniformly at random. Also, it can be shown that, in the worst case, this is expected to require ceil(log2(b)) + 2 calls to randomBit(). When randomBit() is slow, as may be the case for a true random generator, this is expected to waste only a constant number of calls rather than a linear amount as in the first solution.
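For a quick sanity check of that analysis, here is the same function transcribed to Python (random_bit simulated with the standard library; my transcription, not the answer's original code):

```python
import random

def random_bit():
    # stand-in for a true random-bit source
    return random.getrandbits(1)

def random_below(b):
    # the halving method from the answer above: each bit picks a half
    # of [0, b); landing on the boundary value b forces a re-roll
    if b < 2:
        return 0
    mid = (b + 1) // 2
    ret = b
    while ret == b:
        ret = (mid if random_bit() else 0) + random_below(mid)
    return ret
```

Drawing a few thousand samples for an odd b such as 5 shows every value in [0, 5) appearing at close to equal frequency, consistent with the uniformity argument.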
function randomBetween(a, b){
    var x = b - a; // assuming a is smaller than b
    var rand = random(); // assumes a uniform float in [0, 1), not the given random(0,1)
    return a + Math.floor(rand * (x + 1)); // floor, not ceil, so every value in [a, b] is equally likely
}