I am trying to calculate a factorial in a functional style.
I wrote this:
private static Function<BigInteger, BigInteger> factorial = x -> BigInteger.ONE.equals(x)
? BigInteger.ONE
: x.multiply(Main.factorial.apply(x.subtract(BigInteger.ONE)));
I get a StackOverflowError when trying to compute 11111!
But when I calculate the factorial using this method:
private static BigInteger factorial(BigInteger request) {
if (BigInteger.ONE.equals(request)) return BigInteger.ONE;
else return request.multiply(factorial(request.subtract(BigInteger.ONE)));
}
I can get the result without StackOverflowError.
Is the functional style less efficient? Why?
There are twice as many calls on the stack in the functional style as in the plain method version: each Function.apply call dispatches to the synthetic method that holds the lambda body, so every level of recursion adds two frames instead of one.
So while the stack grows to about 11,111 frames in the latter, it grows to over 22,222 frames in the functional style. I believe the stack limit in your environment lies somewhere between 11,111 and 22,222 frames, which explains why only one version breaks. In this sense the functional style is less efficient.
You can increase the stack size using -Xss, as described in the link below.
Or, better, use tail recursion, which looks something like this:
// call as factorialTR.apply(n, BigInteger.ONE); acc carries the product so far
private static BiFunction<BigInteger, BigInteger, BigInteger> factorialTR = (n, acc) -> BigInteger.ONE.equals(n)
    ? acc
    : Main.factorialTR.apply(n.subtract(BigInteger.ONE), acc.multiply(n));
This will still cause a StackOverflowError in Java, because Java does not perform tail call optimization. Languages such as Scala and Lisp dialects can optimize tail calls, so there you won't get one.
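Since Java won't eliminate the tail call, one way to stay in a functional style without deep recursion is a stream reduction. This is only a sketch: the class name StreamFactorial and the Function<Long, BigInteger> signature are my own choices, not something from the question.

```java
import java.math.BigInteger;
import java.util.function.Function;
import java.util.stream.LongStream;

final class StreamFactorial {
    // The reduction runs as a loop inside the stream, so the call stack
    // does not grow with n (hypothetical helper, not from the question).
    static final Function<Long, BigInteger> FACTORIAL = n -> {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative: " + n);
        }
        return LongStream.rangeClosed(1, n)        // empty for n == 0
                .mapToObj(BigInteger::valueOf)
                .reduce(BigInteger.ONE, BigInteger::multiply);
    };
}
```

StreamFactorial.FACTORIAL.apply(11111L) should then complete with the default stack size, because no frame is kept per factor.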
Refs
Tail-recursive factorial
Leetcode explanation(requires login)
max stack depth
Your terminology is somewhat confusing. Both of the examples you showed are written in a functional style: there are no side-effects, no mutable state, no loops. Both examples are referentially transparent.
Also, you seem to be under the impression that only one of those examples will throw a StackOverflowError. That is not the case. Both of those will eventually blow the stack.
In fact, in my testing, both of those blew the stack pretty much at the same value.
For the lambda version, I ran multiple tests, and the stack overflow happened at slightly different values each time, the smallest and biggest ones were around 11000 and around 15300.
For the method version, the stack overflow happened pretty consistently between 13901 and 13907.
Initially, I thought that the lambda version would consistently overflow earlier than the method version, because it uses much more complex runtime machinery (LambdaMetafactory, method handles, call sites, invokedynamic), which increases the stack usage. But it looks like, more than increasing the stack usage, that machinery increases the variance, because it relies more heavily on runtime optimizations and heuristics.
By the way, your code (both versions) has the same two bugs (which are actually the same bug): it doesn't handle factorial of zero (which is one) and it runs into an infinite recursion for negative numbers. A more correct version would be something like:
private static Function<BigInteger, BigInteger> factorial =
    x -> {
        // throw is a statement, so it cannot appear inside a ?: expression
        if (x.compareTo(BigInteger.ZERO) < 0) throw new IllegalArgumentException("negative argument: " + x);
        return BigInteger.ZERO.equals(x)
            ? BigInteger.ONE
            : x.multiply(App.factorial.apply(x.subtract(BigInteger.ONE)));
    };
private static BigInteger factorial(BigInteger request) {
    if (request.compareTo(BigInteger.ZERO) < 0) throw new IllegalArgumentException("negative argument: " + request);
    if (BigInteger.ZERO.equals(request)) return BigInteger.ONE;
    else return request.multiply(factorial(request.subtract(BigInteger.ONE)));
}
I have 2 strings in an array. I want there to be a 10% chance of one and 90% chance to select the other. Right now I am using:
Random random = new Random();
int x = random.nextInt(100 - 1) + 1;
if (x < 10) {
    string = stringArray[0];
} else {
    string = stringArray[1];
}
Is this the best way of accomplishing this or is there a better method?
I know it's typically a bad idea to submit a stack overflow response without submitting code, but I really challenge this question of " the best way." People ask this all the time and, while there are established design patterns in software worth knowing, this question almost always can be answered by "it depends."
For example, your pattern looks fine (I might add some comments). You might get a minuscule performance increase by using 1 - 10 instead of 1 - 100, but the things you need to ask yourself are as follows:
If I get hit by a bus, is the person who is going to be working on the application going to know what I was trying to do?
If it isn't intuitive, I should write a comment. Then I should ask myself, "Can I change this code so that a comment isn't necessary?"
Is there an existing library that solves this problem? If so, is it FOSS approved (if applicable) / can I use it?
What is the size of this codebase eventually going to be? Am I making a full program with microservices, a DAO, DTO, Controller, View, and different layers for validation?
Is there an existing convention to solve my problem (either at my company or in general), or is it unique enough that I can take my own spin on it?
Does this follow the DRY principle?
I'm in (apparently) a very small camp on stack overflow that doesn't always believe in universal "bests" for solving code problems. Just remember, programming is only as hard as the problem you're trying to solve.
EDIT
Since people asked, I'd do it like this:
/**
 * Convenience method that calculates 90% odds of A and 10% odds of B.
 *
 * @author DaveCat
 * @version 1.0
 * @since 2019-03-9
 */
public static String calculatesNinetyPercent()
{
    Random random = new Random();
    int x = random.nextInt(10) + 1; // uniform over 1..10
    //Option A
    if(x <= 9) {
        return stringArray[0];
    }
    else
    {
        //Option B
        return stringArray[1];
    }
}
As an aside, one of the common mistakes junior devs make in enterprise-level development is excessive commenting. This has a Javadoc, which is probably overkill, but I'm assuming this is a convenience method you're using in a larger program.
Edit (again)
You guys keep confusing me. This is how you randomly generate between 2 given numbers in Java
One alternative is to draw a random float value between 0 and 1 and compare it to the probability of the event. If the random value is less than the probability, then the event occurs.
In this specific example, set x to a random float and compare it to 0.1.
I like this method because it can be used for probabilities other than whole-number percentages.
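A minimal sketch of that idea, assuming the two strings are passed in directly. The class and method names (WeightedPick, pick) are made up for illustration:

```java
import java.util.Random;

final class WeightedPick {
    // Returns `rare` with the given probability, otherwise `common`.
    static String pick(Random random, double probability, String rare, String common) {
        // nextDouble() is uniform on [0.0, 1.0), so the comparison is true
        // with (approximately) the requested probability.
        return random.nextDouble() < probability ? rare : common;
    }
}
```

For the question's case that would be something like WeightedPick.pick(random, 0.1, stringArray[0], stringArray[1]).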
Is there a way to compare if two methods are equivalent by function (i.e. they do the same thing) rather than equivalent by value (i.e. all of the code in the method is the same) ?
For example these two methods are coded differently, but perform the same function.
public int doIt(int a, int b) {
a = a + 1;
b = b + 1;
return a + b;
}
public int doIt2(int z, int x) {
int total = z + x + 2;
return total;
}
I was looking for a way to do this in Eclipse, but am interested if this is even possible beyond a trivial method.
The only way to be 100% sure is to prove it mathematically.
There are ways to do this:
1- Theorem proving
2- Model checking
and so on.
These approaches can be very hard, though: it can take days to prove equivalence even for trivial programs, and days more to find an adequate level of abstraction.
There are also heuristic approaches, but by definition they are not 100% accurate.
A simple heuristic approach would be to try both methods on 1000 inputs and see if the results are the same.
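The 1000-inputs heuristic could be sketched like this for the two methods from the question. The helper agreeOnSamples, the seed and the use of uniformly random ints are my own assumptions; a real check would also want hand-picked edge cases.

```java
import java.util.Random;
import java.util.function.IntBinaryOperator;

final class EquivalenceCheck {
    static int doIt(int a, int b) {
        a = a + 1;
        b = b + 1;
        return a + b;
    }

    static int doIt2(int z, int x) {
        return z + x + 2;
    }

    // Returns true if f and g agreed on every sampled input pair.
    // Agreement here is evidence, not proof: unsampled inputs may still differ.
    static boolean agreeOnSamples(IntBinaryOperator f, IntBinaryOperator g, int samples, long seed) {
        Random random = new Random(seed);
        for (int i = 0; i < samples; i++) {
            int a = random.nextInt();
            int b = random.nextInt();
            if (f.applyAsInt(a, b) != g.applyAsInt(a, b)) {
                return false;
            }
        }
        return true;
    }
}
```

A call such as EquivalenceCheck.agreeOnSamples(EquivalenceCheck::doIt, EquivalenceCheck::doIt2, 1000, 1L) then reports whether any of the sampled inputs told the two apart.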
EDIT:
Here is a list of model checkers I found on Wikipedia. I haven't used any of them, and they may not be exactly what you are looking for.
https://en.wikipedia.org/wiki/List_of_model_checking_tools
Ignoring side effects, 2 functions will be functionally equivalent if for the same input, they produce the same output.
This will only work for pure code though. There's no way I know of to monitor for side effects in general since the side effects a function carries out could be anything.
Note, there wouldn't be a way to completely verify this without testing every possible input. If the input is just a limited Enum, that might be easy. If it's 2 integers though for example, the total number of combinations would be huge.
In general, the purpose of refactoring is to have a function behave the same before and after it is refactored. Developers generally do this by creating extensive unit tests, testing both normal, edge, and exception cases.
In the OP's two functions to be compared, doIt and doIt2, they might usually return the same answer, given any integer inputs a and b. Unit testing would demonstrate this.
But what if a or b were the largest integer that Java could store, MAX_VALUE?
What if there were a side effect from a=a+1?
In these cases, the two functions may appear similar on the surface, but yield different results.
I've been working on this recursive function, but I can't figure out by myself what leads to the overflow error, which keeps coming back. I've already tried casting to BigInteger, but it made no difference. I don't see any warning or syntax error in my Eclipse IDE. I have to submit an efficient algorithm for big numbers. Thanks in advance. :-)
public static BigInteger factorial(int n)
{
return n > 2 ? new BigInteger(n+"").multiply(factorial(n-1)) : new BigInteger(n+"");
}
The problem
You're getting the error because the computer has to remember every method call you make (and other information) until that method call is finished, and there's only so much space on the "stack" set aside to remember all that.
You recurse so many times that you overflow the stack space set up to remember method calls that are in progress. That's called a stack overflow.
A possible solution
A reasonably-efficient algorithm is to use a simple loop. This has the side benefit of not causing stack overflows, since you don't create more method calls over and over again, you just do stuff inside the first method call.
You should also use BigInteger.valueOf(n) instead of new BigInteger(n+""):
public static BigInteger factorial(int n) {
BigInteger result = BigInteger.ONE;
for (; n > 1; n--) {
result = result.multiply(BigInteger.valueOf(n));
}
return result;
}
This takes my computer about 6 seconds to compute 100,000!
More efficient solutions
There are faster algorithms than this. See another question and its links for more details.
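One commonly suggested improvement, sketched here under my own naming (SplitFactorial, product), is to multiply the range 1..n as a balanced tree instead of left to right. The idea is that the operands of each multiplication stay similar in size, which tends to suit BigInteger's multiplication routines for large numbers better. I haven't benchmarked this, so treat any speed-up as an assumption to measure.

```java
import java.math.BigInteger;

final class SplitFactorial {
    static BigInteger factorial(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative: " + n);
        }
        return product(1, n);
    }

    // Product of all integers in [lo, hi]. The range is halved at each level,
    // so the recursion depth is only about log2(hi - lo), not hi - lo.
    private static BigInteger product(int lo, int hi) {
        if (lo > hi) {
            return BigInteger.ONE;          // empty range, e.g. for 0!
        }
        if (lo == hi) {
            return BigInteger.valueOf(lo);
        }
        int mid = lo + (hi - lo) / 2;
        return product(lo, mid).multiply(product(mid + 1, hi));
    }
}
```

This is still recursive, but the depth for 100,000! is under 20 frames, so the stack is not a concern.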
This is the context of my program.
A function has 50% chance to do nothing, 50% to call itself twice.
What is the probability that the program will finish?
I wrote this piece of code, and it apparently works great. The answer, which may not be obvious to everyone, is that this program has a 100% chance of finishing. But there is a StackOverflowError (how convenient ;) ) when I run this program, occurring in Math.random(). Could someone point out where it comes from, and tell me whether my code is wrong?
static int bestDepth =0;
static int numberOfPrograms =0;
@Test
public void testProba(){
for(int i = 0; i <1000; i++){
long time = System.currentTimeMillis();
bestDepth = 0;
numberOfPrograms = 0;
loop(0);
LOGGER.info("Best depth:"+ bestDepth +" in "+(System.currentTimeMillis()-time)+"ms");
}
}
public boolean loop(int depth){
numberOfPrograms++;
if(depth> bestDepth){
bestDepth = depth;
}
if(proba()){
return true;
}
else{
return loop(depth + 1) && loop(depth + 1);
}
}
public boolean proba(){
return Math.random()>0.5;
}
.
java.lang.StackOverflowError
at java.util.Random.nextDouble(Random.java:394)
at java.lang.Math.random(Math.java:695)
.
I suspect the stack, and the number of function calls it can hold, is limited, but I don't really see the problem here.
Any advice or clues are obviously welcome.
Fabien
EDIT: Thanks for your answers, I ran it with java -Xss4m and it worked great.
Whenever a function is called or a local variable is created, the stack is used to reserve space for it.
Now, it seems that you are recursively calling the loop function. Each call places its arguments on the stack, along with its local variables and the return address. This means that a lot of information is being placed on the stack.
However, the stack is limited. The runtime guards against the stack growing past its reserved region and overwriting other memory: when a push crosses that boundary, the hardware raises a fault, the OS notifies the running task, and the JVM reports it as a StackOverflowError.
This seems to be happening in Math.random(), most likely simply because that is the deepest call in the chain at the moment the stack runs out.
To handle your problem, I suggest increasing the stack size using Java's -Xss option.
As you said, the loop function recursively calls itself. Now, tail-recursive calls can be rewritten into loops by the compiler so that they occupy no extra stack space (this is called tail call optimization, TCO). Unfortunately, the Java compiler does not do that. Also, your loop is not tail-recursive. Your options here are:
1. Increase the stack size, as suggested by the other answers. Note that this will just defer the problem further in time: no matter how large your stack is, its size is still finite. You just need a longer chain of recursive calls to break out of the space limit.
2. Rewrite the function in terms of loops.
3. Use a language whose compiler performs TCO.
3.1. You will still need to rewrite the function to be tail-recursive.
3.2. Or rewrite it with trampolines (only minor changes are needed). A good paper explaining trampolines and generalizing them further is "Stackless Scala with Free Monads".
To illustrate point 3.2, here's what the rewritten function would look like:
def loop(depth: Int): Trampoline[Boolean] = {
numberOfPrograms = numberOfPrograms + 1
if(depth > bestDepth) {
bestDepth = depth
}
if(proba()) done(true)
else for {
r1 <- loop(depth + 1)
r2 <- loop(depth + 1)
} yield r1 && r2
}
And the initial call would be loop(0).run.
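For comparison, option 2 (rewriting in terms of loops) can be sketched in Java. This relies on an observation of mine about the original code: loop() only ever returns true, so the && never short-circuits and both recursive calls always happen. It is then enough to count how many calls are still pending. The names PendingCalls, run and stops are hypothetical, and this version does not track bestDepth.

```java
import java.util.function.BooleanSupplier;

final class PendingCalls {
    // Simulates the recursion without a call stack and returns the total
    // number of loop() calls that the recursive version would have made.
    // `stops` plays the role of proba(): true means "do nothing and return".
    static long run(BooleanSupplier stops) {
        long pending = 1;   // calls still to be made; starts with loop(0)
        long calls = 0;
        while (pending > 0) {
            pending--;
            calls++;
            if (!stops.getAsBoolean()) {
                pending += 2;   // this call spawns two further calls
            }
        }
        return calls;
    }
}
```

It would be called as PendingCalls.run(() -> Math.random() > 0.5). Memory use is constant, but a run can still take a very long time, since nothing bounds how many calls a single run needs.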
Increasing the stack-size is a nice temporary fix. However, as proved by this post, though the loop() function is guaranteed to return eventually, the average stack-depth required by loop() is infinite. Thus, no matter how much you increase the stack by, your program will eventually run out of memory and crash.
There is nothing we can do to prevent this for certain; we always need to encode the stack in memory somehow, and we'll never have infinite memory. However, there is a way to reduce the amount of memory you're using by about 2 orders of magnitude. This should give your program a significantly higher chance of returning, rather than crashing.
We can do this by noticing that, at each layer in the stack, there's really only one piece of information we need to run your program: the piece that tells us if we need to call loop() again or not after returning. Thus, we can emulate the recursion using a stack of bits. Each emulated stack-frame will require only one bit of memory (right now it requires 64-96 times that, depending on whether you're running in 32- or 64-bit).
The code would look something like this (though I don't have a Java compiler right now so I can't test it):
static int bestDepth = 0;
static int numLoopCalls = 0;
public void emulateLoop() {
//Our fake stack (a java.util.BitSet). We push a 1 when this point on the stack still needs its second call to loop(), a 0 if it doesn't
BitSet fakeStack = new BitSet();
int currentDepth = 0; //must be an int: BitSet indices are ints, and so is bestDepth
numLoopCalls = 0;
while(currentDepth >= 0)
{
numLoopCalls++;
if(proba()) {
//"return" from the current function, going up the callstack until we hit a point where we need to "call loop()" a second time
fakeStack.clear(currentDepth);
while(!fakeStack.get(currentDepth))
{
currentDepth--;
if(currentDepth < 0)
{
return;
}
}
//At this point, we've hit a point where loop() needs to be called a second time.
//Mark it as called, and call it
fakeStack.clear(currentDepth);
currentDepth++;
}
else {
//Need to call loop() twice, so we push a 1 and continue the while-loop
fakeStack.set(currentDepth);
currentDepth++;
if(currentDepth > bestDepth)
{
bestDepth = currentDepth;
}
}
}
}
This will probably be slightly slower, but it will use about 1/100th the memory. Note that the BitSet is stored on the heap, so there is no longer any need to increase the stack-size to run this. If anything, you'll want to increase the heap-size.
The downside of recursion is that it fills up your stack, which will eventually cause a stack overflow if the recursion is too deep. If you want to ensure that the test ends, you can increase your stack size using the answers given in the following Stack Overflow thread:
How to increase to Java stack size?
Just for fun, I tried to compare the stack performance of a couple of programming languages by calculating the Fibonacci series using the naive recursive algorithm. The code is mainly the same in all languages; I'll post a Java version:
public class Fib {
public static int fib(int n) {
if (n < 2) return 1;
return fib(n-1) + fib(n-2);
}
public static void main(String[] args) {
System.out.println(fib(Integer.valueOf(args[0])));
}
}
Ok so the point is that using this algorithm with input 40 I got these timings:
C: 2.796s
Ocaml: 2.372s
Python: 106.407s
Java: 1.336s
C#(mono): 2.956s
They are taken in a Ubuntu 10.04 box using the versions of each language available in the official repositories, on a dual core intel machine.
I know that functional languages like OCaml pay a cost for treating functions as first-class citizens, and I have no trouble explaining CPython's running time, since it's the only interpreted language in this test. But I was impressed by the Java running time, which is half that of C for the same algorithm! Would you attribute this to JIT compilation?
How would you explain these results?
EDIT: thank you for the interesting replies! I recognize that this is not a proper benchmark (never said it was :P) and maybe I can make a better one and post it to you next time, in the light of what we've discussed :)
EDIT 2: I updated the runtime of the ocaml implementation, using the optimizing compiler ocamlopt. Also I published the testbed at https://github.com/hoheinzollern/fib-test. Feel free to make additions to it if you want :)
You might want to crank up the optimisation level of your C compiler. With gcc -O3, that makes a big difference, a drop from 2.015 seconds to 0.766 seconds, a reduction of about 62%.
Beyond that, you need to ensure you've tested correctly. You should run each program ten times, remove the outliers (fastest and slowest), then average the other eight.
In addition, make sure you're measuring CPU time rather than clock time.
Anything less than that, I would not consider a decent statistical analysis and it may well be subject to noise, rendering your results useless.
For what it's worth, those C timings above were for seven runs with the outliers taken out before averaging.
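The same procedure (several runs, CPU time rather than wall-clock time, outliers dropped before averaging) could be sketched for the Java version like this. The class name TrimmedTiming, the run count and the fib(30) workload are my own choices, and getCurrentThreadCpuTime() may be unsupported on some JVMs, so take it as a starting point only.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;

final class TrimmedTiming {
    static int fib(int n) {
        return n < 2 ? 1 : fib(n - 1) + fib(n - 2);
    }

    // Mean of the samples after discarding the single smallest and largest one.
    static double trimmedMean(long[] samples) {
        if (samples.length < 3) {
            throw new IllegalArgumentException("need at least 3 samples");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return Arrays.stream(sorted, 1, sorted.length - 1).average().getAsDouble();
    }

    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] cpuNanos = new long[10];
        long sink = 0; // keeps the results live so the calls are not optimized away
        for (int i = 0; i < cpuNanos.length; i++) {
            long start = threads.getCurrentThreadCpuTime();
            sink += fib(30);
            cpuNanos[i] = threads.getCurrentThreadCpuTime() - start;
        }
        System.out.printf("trimmed mean: %.3f ms (checksum %d)%n",
                trimmedMean(cpuNanos) / 1e6, sink);
    }
}
```

Note that the first runs include interpreter and JIT warm-up, so on the JVM it may make sense to discard them explicitly rather than rely on trimming alone.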
In fact, this question shows how important algorithm selection is when aiming for high performance. Although recursive solutions are usually elegant, this one suffers from the fault that you duplicate a lot of calculations. The iterative version:
int fib(unsigned int n) {
int t, a, b;
if (n < 2) return 1;
a = b = 1;
while (n-- >= 2) {
t = a + b;
a = b;
b = t;
}
return b;
}
further drops the time taken, from 0.766 seconds to 0.078 seconds, a further reduction of 89% and a whopping reduction of 96% from the original code.
And, as a final attempt, you should try out the following, which combines a lookup table with calculations beyond a certain point:
static int fib(unsigned int n) {
static int lookup[] = {
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377,
610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657,
46368, 75025, 121393, 196418, 317811, 514229, 832040,
1346269, 2178309, 3524578, 5702887, 9227465, 14930352,
24157817, 39088169, 63245986, 102334155, 165580141 };
int t, a, b;
if (n < sizeof(lookup)/sizeof(*lookup))
return lookup[n];
a = lookup[sizeof(lookup)/sizeof(*lookup)-2];
b = lookup[sizeof(lookup)/sizeof(*lookup)-1];
while (n-- >= sizeof(lookup)/sizeof(*lookup)) {
t = a + b;
a = b;
b = t;
}
return b;
}
That reduces the time yet again but I suspect we're hitting the point of diminishing returns here.
You say very little about your configuration (in benchmarking, details are everything: commandlines, computer used, ...)
When I try to reproduce for OCaml I get:
let rec f n = if n < 2 then 1 else (f (n-1)) + (f (n-2))
let () = Format.printf "%d@." (f 40)
$ ocamlopt fib.ml
$ time ./a.out
165580141
real 0m1.643s
This is on an Intel Xeon 5150 (Core 2) at 2.66GHz. If I use the bytecode OCaml compiler ocamlc on the other hand, I get a time similar to your result (11s). But of course, for running a speed comparison, there is no reason to use the bytecode compiler, unless you want to benchmark the speed of compilation itself (ocamlc is amazing for speed of compilation).
One possibility is that the C compiler is optimizing on the guess that the first branch (n < 2) is the one more frequently taken. It has to do that purely at compile time: make a guess and stick with it.
Hotspot gets to run the code, see what actually happens more often, and reoptimize based on that data.
You may be able to see a difference by inverting the logic of the if:
public static int fib(int n) {
if (n >= 2) return fib(n-1) + fib(n-2);
return 1;
}
It's worth a try, anyway :)
As always, check the optimization settings for all platforms, too. Obviously the compiler settings for C - and on Java, try using the client version of Hotspot vs the server version. (Note that you need to run for longer than a second or so to really get the full benefit of Hotspot... it might be interesting to put the outer call in a loop to get runs of a minute or so.)
I can explain the Python performance. Python's performance for recursion is abysmal at best, and it should be avoided like the plague when coding in it. Especially since stack overflow occurs by default at a recursion depth of only 1000...
As for Java's performance, that's amazing. It's rare that Java beats C (even with very little compiler optimization on the C side)... what the JIT might be doing is memoization or tail recursion...
Note that if the Java Hotspot VM is smart enough to memoise fib() calls, it can cut down the exponential cost of the algorithm to something nearer to linear cost.
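Whether or not the VM does anything of the sort, what memoisation would buy can be shown by doing it by hand. This is only an illustration with made-up names (MemoFib, CACHE); it uses the same fib(0) = fib(1) = 1 convention as the question.

```java
import java.util.HashMap;
import java.util.Map;

final class MemoFib {
    // Results already computed, keyed by n.
    private static final Map<Integer, Long> CACHE = new HashMap<>();

    static long fib(int n) {
        if (n < 2) {
            return 1;
        }
        Long cached = CACHE.get(n);
        if (cached != null) {
            return cached;      // each n is computed only once
        }
        long value = fib(n - 1) + fib(n - 2);
        CACHE.put(n, value);
        return value;
    }
}
```

With the cache, fib(40) needs on the order of 40 additions instead of hundreds of millions of calls.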
I wrote a C version of the naive Fibonacci function and compiled it to assembler in gcc (4.3.2 Linux). I then compiled it with gcc -O3.
The unoptimised version was 34 lines long and looked like a straight translation of the C code.
The optimised version was 190 lines long and (it was difficult to tell but) it appeared to inline at least the calls for values up to about 5.
With C, you should either declare the fibonacci function "inline", or, using gcc, add the -finline-functions argument to the compile options. That will allow the compiler to do recursive inlining. That's also the reason why with -O3 you get better performance, it activates -finline-functions.
Edit You need to at least specify -O/-O1 to have recursive inlining, also if the function is declared inline. Actually, testing myself I found that declaring the function inline and using -O as compilation flag, or just using -O -finline-functions, my recursive fibonacci code was faster than with -O2 or -O2 -finline-functions.
One C trick which you can try is to disable stack checking (i.e. built-in code which makes sure that the stack is large enough to permit the additional allocation of the current function's local variables). This could be dicey for a recursive function, and indeed could be the reason behind the slow C times: the executing program might well have run out of stack space, forcing the stack-checking code to reallocate the entire stack several times during the actual run.
Try to approximate the stack size you need and force the linker to allocate that much stack space. Then disable stack-checking and re-make the program.