Is my analysis of space complexity correct? - java

This is problem 9.5 from Cracking the Coding Interview 5th edition
The Problem: Write a method to compute all permutations of a string
Here is my solution, coded in Java (test it, it works :) ):
import java.util.LinkedList;
import java.util.Queue;

public static void generatePerm(String s) {
    Queue<Character> poss = new LinkedList<Character>();
    int len = s.length();
    for (int count = 0; count < len; count++)
        poss.add(s.charAt(count));
    generateRecurse(poss, len, "");
}

private static void generateRecurse(Queue<Character> possibles, int n, String word) {
    if (n == 0)
        System.out.println(word);
    else {
        for (int count = 0; count < n; count++) {
            // take the next candidate, recurse on the rest, then put it back at the tail
            char first = possibles.remove();
            generateRecurse(possibles, n - 1, word + first);
            possibles.add(first);
        }
    }
}
I agree with the author that my solution runs in O(n!) time, because solving this problem means enumerating factorially many arrangements: for a word like "top", there are three possibilities for the first letter, two for the second, and so on.
However, she didn't mention space complexity. I know that interviewers love to ask about the time and space complexity of your solution. What would the space complexity of this solution be? My initial guess was O(n^2), because there are n recursive calls at each level n. So you would add n + (n - 1) + (n - 2) + ... + 1 to get n(n+1)/2, which is in O(n^2). I reasoned that there are n recursive calls because you have to backtrack n times at each level, and that space complexity is the number of recursive calls your algorithm makes. For example, when considering all the permutations of "TOP", three recursive calls are made at the top level: gR([O,P],2,"T"), gR([P,T],2,"O"), and gR([T,O],2,"P"). Is my analysis of space complexity correct?

I think you got the right answer but for the wrong reason. The number of recursive calls doesn't have anything to do with it. When you make a recursive call, it will add a certain amount of space to the stack; but when that call exits, the stack space is released. So suppose you have something like this:
void method(int n) {
    if (n == 1) {
        for (int i = 0; i < 10000; i++) {
            method(0);
        }
    }
}
method(1);
Although method calls itself 10000 times, there will still be no more than 2 invocations of method on the stack at any one time. So the space complexity would be O(1) [constant].
The reason your algorithm has space complexity O(n^2) is the word string. When n gets down to 0, there will be len stack entries taken up by invocations of generateRecurse. There will be at most len stack entries, so the space usage of the stack itself is only O(n); but each of those stack entries has its own word, and all of those strings exist on the heap at the same time. The lengths of those word parameters are 1, 2, ..., len, which add up to (len * (len+1)) / 2, so the total space usage is O(n^2).
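To make the distinction between "calls ever made" and "space held at once" concrete, here is a minimal sketch that instruments the same recursion with counters. The class PermSpaceProbe and its fields are hypothetical names introduced for illustration, not part of the original solution, and the printing of each permutation is left out so that only the counters matter.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical instrumented variant of generateRecurse (illustration only).
class PermSpaceProbe {
    static long totalCalls;  // every invocation ever made
    static int liveFrames;   // invocations currently on the stack
    static int peakFrames;   // largest value liveFrames reached
    static int liveChars;    // combined length of all live word strings
    static int peakChars;    // largest value liveChars reached

    static void probe(String s) {
        totalCalls = 0;
        liveFrames = 0;
        peakFrames = 0;
        liveChars = 0;
        peakChars = 0;
        Deque<Character> poss = new ArrayDeque<>();
        for (char c : s.toCharArray()) {
            poss.add(c);
        }
        recurse(poss, s.length(), "");
    }

    private static void recurse(Deque<Character> possibles, int n, String word) {
        // entering: this frame and its word are now live
        totalCalls++;
        liveFrames++;
        liveChars += word.length();
        peakFrames = Math.max(peakFrames, liveFrames);
        peakChars = Math.max(peakChars, liveChars);
        for (int count = 0; count < n; count++) {
            char first = possibles.remove();      // head of the queue
            recurse(possibles, n - 1, word + first);
            possibles.add(first);                 // back to the tail
        }
        // leaving: the frame and its word are released
        liveFrames--;
        liveChars -= word.length();
    }
}
```

For a three-letter input the total number of calls is 1 + 3 + 6 + 6 = 16, while the peak number of live frames is only 4 (including the initial call with the empty word) and the peak combined word length is 1 + 2 + 3 = 6. The call count grows factorially; the two peaks grow like n and n(n+1)/2.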
MORE ABOUT STACK FRAMES: It appears that an explanation of the basics of stack frames would be helpful...
A "stack frame" is just an area of memory that's part of the "stack". Typically, the stack is a predefined area of memory; the location and size of stack frames, however, are not predefined. When a program is first executed, there won't be anything on the stack (actually, there will probably be some initial data there, but let's say there's nothing, just to keep things simple). So the stack area of memory looks like this:
bottom of stack                                       top of stack
------------------------------------------------------------------
|                            nothing                             |
------------------------------------------------------------------
^
+--- stack pointer
(This assumes that the stack grows upward, from lower to higher addresses. Many machines have stacks that grow downward. To simplify, I'll keep assuming that this is a machine whose stack grows upward.)
When a method (function, procedure, subroutine, etc.) is called, a certain area of the stack is allocated. The area is enough to hold the method's local variables (or references to them), parameters (or references to them), some data so that the program will know where to go back when you return, and possibly other information--the other information is highly dependent on the machine, the programming language, and the compiler. In Java, the first method will be main:
bottom of stack                                       top of stack
------------------------------------------------------------------
| main's frame |                     nothing                     |
------------------------------------------------------------------
               ^
               +--- stack pointer
Note that the stack pointer has moved up. Now main calls method1. Since method1 will return to main, the local variables and parameters of main have to be preserved for when main gets to resume executing. A new frame, of some size, is allocated on the stack:
bottom of stack                                       top of stack
------------------------------------------------------------------
| main's frame | method1's frame |            nothing            |
------------------------------------------------------------------
                                 ^
                                 +--- stack pointer
and then method1 calls method2:
bottom of stack                                       top of stack
------------------------------------------------------------------
| main's frame | method1's frame | method2's frame |   nothing   |
------------------------------------------------------------------
                                                   ^
                                                   +--- stack pointer
Now method2 returns. After method2 returns, its parameters and local variables will no longer be accessible. Therefore, the entire frame can be thrown out. This is done by moving the stack pointer back to where it was before. (The "previous stack pointer" is one of the things saved in some frame.) The stack goes back to looking like this:
bottom of stack                                       top of stack
------------------------------------------------------------------
| main's frame | method1's frame |            nothing            |
------------------------------------------------------------------
                                 ^
                                 +--- stack pointer
This means that, at this point, the machine will see the portion of the stack starting with the stack pointer as "unused". It's not really correct to speak of method2's frame being reused. You can't really use something that has ceased to exist, and method2's frame no longer exists. Conceptually, all there is is a big empty area of the stack. If method1 calls another method, whether it's method2, method1 recursively, System.out.println, or something else, a new frame will be created at the place where the stack pointer is now pointing. This frame could be smaller, equal, or larger in size than the method2 frame used to be. It will take up part or all of the memory where the method2 frame was. If it's another call to method2, it doesn't matter whether it's called with the same or different parameters. It can't matter, because the program doesn't remember what parameters were used last time. All it knows is that the area of memory starting with the stack pointer is empty and available for use. The program has no idea what frame most recently lived there. That frame is gone, gone, gone.
If you can follow this, you can see that when computing the space complexity and when looking just at the amount of space used by the stack, the only thing that matters is, how many frames can exist on the stack at any one point in time? Frames that may have existed in the past but no longer do are not relevant to the computation, no matter what parameters the methods were called with.
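The point about frames being discarded and recreated can be observed, roughly, on a real JVM. The sketch below is an illustration under assumptions: FrameDepthDemo, method1 and method2 are hypothetical names mirroring the diagrams above, and it relies on Thread.getStackTrace() reporting one element per active frame, which the specification allows a JVM to deviate from in some circumstances.

```java
// Hypothetical demo: two sequential calls to method2 occupy the same stack depth.
class FrameDepthDemo {
    static int depthInFirstCall;
    static int depthInSecondCall;

    // Number of frames the JVM currently reports for this thread.
    static int currentDepth() {
        return Thread.currentThread().getStackTrace().length;
    }

    static void method2(boolean first) {
        int depth = currentDepth();
        if (first) {
            depthInFirstCall = depth;
        } else {
            depthInSecondCall = depth;
        }
    }

    // Returns how much the stack depth changed across the two calls.
    static int method1() {
        int before = currentDepth();
        method2(true);    // a frame is created, then discarded on return
        method2(false);   // a fresh frame is created at the same place
        int after = currentDepth();
        return after - before;
    }
}
```

Under those assumptions, method1() reports no net change in depth, and both invocations of method2 see the same depth: two calls were made, but only one method2 frame ever existed at a time.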
(P.S. In case anyone was planning to point out how I'm technically wrong about this or that detail--I already know that this is a gross oversimplification.)

How can you properly make a Stack with Integers on BlueJ?

I've been using BlueJ for a while now and recently, we've started making and working on Stacks and Arrays in my class. This is basically what I have to do currently:
Create the class "StackTest", which contains a Stack called "zahlen" with values of the type "Integer". Add the numbers 5, 10, 50 and 30 to the Stack respectively. Finally, run the Stack and it should show all values that are bigger than 10 in the console.
They also gave us certain keywords that have to be used at least once in the class: Keywords
import java.util.Stack;
public class StackTest
{
public StackTest(){
Stack zahlen = new Stack();
zahlen.push(5);
zahlen.push(10);
zahlen.push(50);
zahlen.push(30);
while (!zahlen.isEmpty()){
if(zahlen.top()>10){
}
zahlen.pop();
}
}
}
My problem is that first of all, I don't know what exactly the Integer in parenthesis is or what it can be used for (talking about (Integer) ) and I also don't know how you can check if the top number ( zahlen.top() ) can be used in the if command.
I think it would be genuinely worth your while to read up on the Stack class in the official Java documentation:
https://docs.oracle.com/javase/8/docs/api/java/util/Stack.html
It can be a bit dense, but it contains a lot of useful information. This will give you the info you need regardless of using BlueJ, Eclipse, or any other IDE. :-)
A stack follows the LIFO rule (Last-In-First-Out). Think of a stack like a stack of dirty plates that you want to clean: to clean a plate, you take the one on top of the stack rather than reaching for the bottom or the middle. In your case, rather than a stack of dirty plates, it's a stack of Integers.
Once you've created your stack collection, you push() elements onto the stack. In your code example, it would look something like this:
|30|
|50|
|10|
|5 |
¯¯
Note that you don't have access to any of the elements aside from the top of the stack (30 in this case). To gain access to the elements below, you have to pop() the stack, which removes the top element from the collection.
For example:
int value = zahlen.pop();
will cause value to be equal to 30 and your new stack collection will look like this:
| |
|50|
|10|
|5 |
¯¯
You can now use value to check whether or not it's greater than 10, then use System.out.println() to print the value to the console. Simply loop this until your stack is empty. If you wish to look at the value on top of the stack without popping it off, you can use the peek() method.
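Putting those steps together, a minimal sketch might look like the following. The class name StackTestSketch and the helper popGreaterThan are hypothetical, and this is only one way to meet the assignment. Declaring the stack as Stack<Integer> tells the compiler that every element is an Integer, which is what allows the popped value to be compared with 10 directly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Stack;

// Hypothetical sketch of the exercise; names other than zahlen are made up.
class StackTestSketch {
    // Pops every element and returns those greater than limit, in pop order.
    static List<Integer> popGreaterThan(Stack<Integer> zahlen, int limit) {
        List<Integer> result = new ArrayList<>();
        while (!zahlen.isEmpty()) {
            int value = zahlen.pop();   // removes and returns the top element
            if (value > limit) {
                result.add(value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Stack<Integer> zahlen = new Stack<>();
        zahlen.push(5);
        zahlen.push(10);
        zahlen.push(50);
        zahlen.push(30);
        for (int value : popGreaterThan(zahlen, 10)) {
            System.out.println(value);  // prints 30, then 50
        }
    }
}
```

Because the stack is emptied from the top, the values come out in reverse push order, so 30 is printed before 50.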

Stack overflow occurring on first procedure call inside recursive procedure

I am getting a stack overflow error due to a bug I understand, but what I do not understand is why the stack overflow is occurring on the first procedure in the recursive procedure instead of on the call to the recursive procedure.
In a method to solve a sudoku puzzle, here is the recursive segment (the recursive call is the call to solveSudokuGrid near the end):
System.out.print(""); // <= stack overflow occurs here
int[] move_quality_sorted_keys = Sorting_jsc.RadixSort_unsigned_1( move_quality );
for( int xPossibleMove = 1; xPossibleMove <= ctPossibleMoves; xPossibleMove++ ){
int xMove = move_quality_sorted_keys[ctPossibleMoves - xPossibleMove + 1];
int[][] new_grid = new int[10][10];
for( int xRow = 1; xRow <= 9; xRow++ )
for( int xColumn = 1; xColumn <= 9; xColumn++ )
new_grid[xRow][xColumn] = grid[xRow][xColumn];
new_grid[move_row[xMove]][move_column[xMove]] = move_value[xMove];
int[][] solution = solveSudokuGrid( new_grid );
if( solution != null ) return solution;
}
The stack overflow error is the following (note it is occurring on the System.out.print() statement):
Exception in thread "main" java.lang.StackOverflowError
at java.io.BufferedWriter.write(BufferedWriter.java:221)
at java.io.Writer.write(Writer.java:157)
at java.io.PrintStream.write(PrintStream.java:525)
at java.io.PrintStream.print(PrintStream.java:669)
at Euler100.solveSudokuGrid(Euler100.java:2458)
at Euler100.solveSudokuGrid(Euler100.java:2467)
at Euler100.solveSudokuGrid(Euler100.java:2467)
at Euler100.solveSudokuGrid(Euler100.java:2467)
at Euler100.solveSudokuGrid(Euler100.java:2467)
at Euler100.solveSudokuGrid(Euler100.java:2467)
at Euler100.solveSudokuGrid(Euler100.java:2467)
at Euler100.solveSudokuGrid(Euler100.java:2467)
at Euler100.solveSudokuGrid(Euler100.java:2467)
at Euler100.solveSudokuGrid(Euler100.java:2467)
I would expect the stack overflow to occur on the call to solveSudokuGrid, not on the print statement. Why is it?
Look at it this way: each time you call System.out.println you push 4 (or more) additional stack frames on to the top of your stack as you see in the error. These are then popped off the stack before you call your own function recursively. The depth of the stack therefore goes like this:
your code, 1 level
println, 5 levels
your code, 2 levels
println, 6 levels
your code, 3 levels
println, 7 levels
...
your code, n levels
println, n + 4 levels
your code, n + 1 levels
...
Assuming each level takes the same amount of stack memory (which isn't actually true but is probably close enough for this kind of analysis) it should be quite obvious that for any particular limit on the size of the stack, the println code will break through it first.
All that is actually required is for the other procedure to use more memory on the stack than your procedure and this will always happen. If it uses less, it might still happen (because for any given level it is called before your code), and presumably as the println call is only there to demonstrate this, the radix sort code you have a call to in the next line was previously triggering the behaviour. It presumably uses more stack space than your own method (which seems quite likely; you only have 6 local variables and most of your expressions are very simple).
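The depth argument can be checked with a toy model. The sketch below is not a measurement of a real JVM stack: it assumes, as the analysis above does, that every frame costs the same, and the names OverflowModel, firstToOverflow, limit and helperFrames are hypothetical.

```java
// Toy model: which call is the first one that does not fit under the limit?
class OverflowModel {
    // limit: how many equally sized frames fit on the stack.
    // helperFrames: extra frames pushed by a helper call (such as a print call)
    // made at the start of each recursion level.
    // Returns "own" if the recursive method's own frame is the first that does
    // not fit, or "helper" if the helper call made inside it gets there first.
    static String firstToOverflow(int limit, int helperFrames) {
        for (int level = 1; ; level++) {
            if (level > limit) {
                return "own";      // the frame for this level does not fit
            }
            if (level + helperFrames > limit) {
                return "helper";   // the helper called at this level does not fit
            }
        }
    }
}
```

In this model, any helper that pushes at least one frame of its own always hits the limit before the recursive method does, which matches the stack trace in the question.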
Because you exceed the stack bound at that point. wiki here
It seems that System.out.println will eventually call BufferedWriter.write which is also a recursive function that will eventually cause the stackoverflow.

Declaring multiple arrays with 64 elements 1000 times faster than declaring array of 65 elements

Recently I noticed declaring an array containing 64 elements is a lot faster (>1000 fold) than declaring the same type of array with 65 elements.
Here is the code I used to test this:
public class Tests{
public static void main(String args[]){
double start = System.nanoTime();
int job = 100000000;//100 million
for(int i = 0; i < job; i++){
double[] test = new double[64];
}
double end = System.nanoTime();
System.out.println("Total runtime = " + (end-start)/1000000 + " ms");
}
}
This runs in approximately 6 ms; if I replace new double[64] with new double[65], it takes approximately 7 seconds. This problem becomes exponentially more severe if the job is spread across more and more threads, which is where my problem originates.
This problem also occurs with different types of arrays such as int[65] or String[65].
This problem does not occur with large strings: String test = "many characters";, but does start occurring when this is changed into String test = i + "";
I was wondering why this is the case and if it is possible to circumvent this problem.
You are observing a behavior that is caused by the optimizations done by the JIT compiler of your Java VM. This behavior is reproducibly triggered with scalar arrays of up to 64 elements, and is not triggered with arrays larger than 64.
Before going into details, let's take a closer look at the body of the loop:
double[] test = new double[64];
The body has no effect (observable behavior). That means it makes no difference outside of the program execution whether this statement is executed or not. The same is true for the whole loop. So the code optimizer may translate the loop into something (or nothing) with the same functional behavior but different timing behavior.
For benchmarks you should at least adhere to the following two guidelines. If you had done so, the difference would have been significantly smaller.
Warm-up the JIT compiler (and optimizer) by executing the benchmark several times.
Use the result of every expression and print it at the end of the benchmark.
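As a rough illustration of those two guidelines, here is a minimal sketch. The class AllocationBenchmarkSketch, the method run and the number of warm-up rounds are assumptions made for this example, and a hand-written loop like this remains a crude measurement: because the array still never leaves the method, the JIT may remove the allocation anyway, and a dedicated harness such as JMH is generally more reliable.

```java
// Hypothetical benchmark sketch: warm up first, and consume every result.
class AllocationBenchmarkSketch {
    // Allocates job arrays of size doubles and folds each one into a checksum,
    // so that the loop has an observable result.
    static long run(int job, int size) {
        long checksum = 0;
        for (int i = 0; i < job; i++) {
            double[] test = new double[size];
            test[i % size] = i;
            checksum += test.length + (long) test[i % size];
        }
        return checksum;
    }

    public static void main(String[] args) {
        int job = 10_000_000;
        long sink = 0;
        // Warm-up rounds: give the JIT a chance to compile run() before timing.
        for (int round = 0; round < 5; round++) {
            sink += run(job, 64) + run(job, 65);
        }
        for (int size : new int[] {64, 65}) {
            long start = System.nanoTime();
            sink += run(job, size);
            long end = System.nanoTime();
            System.out.println("size " + size + ": " + (end - start) / 1_000_000 + " ms");
        }
        // Print the accumulated result so it cannot be discarded as unused.
        System.out.println("checksum = " + sink);
    }
}
```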
Now let's go into details. Not surprisingly, there is an optimization that is triggered for scalar arrays not larger than 64 elements. The optimization is part of escape analysis. It puts small objects and small arrays onto the stack instead of allocating them on the heap, or, even better, optimizes them away entirely. You can find some information about it in the following article by Brian Goetz, written in 2005:
Urban performance legends, revisited: Allocation is faster than you think, and getting faster
The optimization can be disabled with the command line option -XX:-DoEscapeAnalysis. The magic value 64 for scalar arrays can also be changed on the command line. If you execute your program as follows, there will be no difference between arrays with 64 and 65 elements:
java -XX:EliminateAllocationArraySizeLimit=65 Tests
Having said that, I strongly discourage using such command line options. I doubt that it makes a huge difference in a realistic application. I would only use it if I were absolutely convinced of the necessity, and not based on the results of some pseudo benchmarks.
There are any number of ways that there can be a difference, based on the size of an object.
As nosid stated, the JITC may be (most likely is) allocating small "local" objects on the stack, and the size cutoff for "small" arrays may be at 64 elements.
Allocating on the stack is significantly faster than allocating in heap, and, more to the point, stack does not need to be garbage collected, so GC overhead is greatly reduced. (And for this test case GC overhead is likely 80-90% of the total execution time.)
Further, once the value is stack-allocated the JITC can perform "dead code elimination", determine that the result of the new is never used anywhere, and, after assuring there are no side-effects that would be lost, eliminate the entire new operation, and then the (now empty) loop itself.
Even if the JITC does not do stack allocation, it's entirely possible for objects smaller than a certain size to be allocated in a heap differently (eg, from a different "space") than larger objects. (Normally this would not produce quite so dramatic timing differences, though.)

StackOverflowError in Math.Random in a randomly recursive method

This is the context of my program.
A function has 50% chance to do nothing, 50% to call itself twice.
What is the probability that the program will finish?
I wrote this piece of code, and it apparently works great. The answer, which may not be obvious to everyone, is that this program has a 100% chance of finishing. But there is a StackOverflowError (how convenient ;) ) when I run this program, occurring in Math.random(). Could someone point out where it comes from, and tell me whether my code is wrong?
static int bestDepth =0;
static int numberOfPrograms =0;
@Test
public void testProba(){
for(int i = 0; i <1000; i++){
long time = System.currentTimeMillis();
bestDepth = 0;
numberOfPrograms = 0;
loop(0);
LOGGER.info("Best depth:"+ bestDepth +" in "+(System.currentTimeMillis()-time)+"ms");
}
}
public boolean loop(int depth){
numberOfPrograms++;
if(depth> bestDepth){
bestDepth = depth;
}
if(proba()){
return true;
}
else{
return loop(depth + 1) && loop(depth + 1);
}
}
public boolean proba(){
return Math.random()>0.5;
}
.
java.lang.StackOverflowError
at java.util.Random.nextDouble(Random.java:394)
at java.lang.Math.random(Math.java:695)
.
I suspect the stack and the amount of function in it is limited, but I don't really see the problem here.
Any advice or clue are obviously welcome.
Fabien
EDIT: Thanks for your answers, I ran it with java -Xss4m and it worked great.
Whenever a function is called or a non-static variable is created, the stack is used to place and reserve space for it.
Now, it seems that you are recursively calling the loop function. This places the arguments in the stack, along with the code segment and the return address. This means that a lot of information is being placed on the stack.
However, the stack is limited. The CPU has built-in mechanisms that protect against data being pushed onto the stack until it eventually overwrites the code itself (as the stack grows down). This is called a General Protection Fault. When that general protection fault happens, the OS notifies the currently running task, which is what gives rise to the stack overflow error.
This seems to be happening in Math.random().
In order to handle your problem, I suggest you to increase the stack size using the -Xss option of Java.
As you said, the loop function recursively calls itself. Now, tail-recursive calls can be rewritten as loops by the compiler, so that they do not occupy any stack space (this is called tail call optimization, TCO). Unfortunately, the Java compiler does not do that. Also, your loop is not tail-recursive. Your options here are:
1. Increase the stack size, as suggested by the other answers. Note that this will just defer the problem further in time: no matter how large your stack is, its size is still finite. You just need a longer chain of recursive calls to break out of the space limit.
2. Rewrite the function in terms of loops.
3. Use a language that has a compiler that performs TCO.
3.1. You will still need to rewrite the function to be tail-recursive.
3.2. Or rewrite it with trampolines (only minor changes are needed). A good paper explaining trampolines and generalizing them further is called "Stackless Scala with Free Monads".
To illustrate the point in 3.2, here's how the rewritten function would look:
def loop(depth: Int): Trampoline[Boolean] = {
numberOfPrograms = numberOfPrograms + 1
if(depth > bestDepth) {
bestDepth = depth
}
if(proba()) done(true)
else for {
r1 <- loop(depth + 1)
r2 <- loop(depth + 1)
} yield r1 && r2
}
And initial call would be loop(0).run.
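For the "rewrite the function in terms of loops" option, one possible Java sketch is shown below. It is an assumption-laden simplification, not the asker's method: PendingCallsLoop, simulate and maxCalls are hypothetical names, a java.util.Random passed in by the caller stands in for Math.random(), and the sketch relies on the observation that loop() can only ever return true, so it is enough to count how many calls are still pending. It does not track bestDepth.

```java
import java.util.Random;

// Hypothetical loop-based replacement for the recursive loop(depth).
class PendingCallsLoop {
    // Returns the number of simulated loop() calls,
    // or -1 if maxCalls was reached before the process finished.
    static long simulate(Random random, long maxCalls) {
        long pending = 1;   // calls that still have to run; starts with loop(0)
        long calls = 0;
        while (pending > 0) {
            if (calls == maxCalls) {
                return -1;  // safety cap, not part of the original problem
            }
            pending--;
            calls++;
            boolean finishesImmediately = random.nextDouble() > 0.5;  // same test as proba()
            if (!finishesImmediately) {
                pending += 2;   // this call would have called loop() twice
            }
        }
        return calls;
    }
}
```

The only state is a pair of counters on the current frame, so the stack does not grow no matter how long the process runs. A call such as PendingCallsLoop.simulate(new Random(), Long.MAX_VALUE) corresponds to one run of loop(0).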
Increasing the stack-size is a nice temporary fix. However, as proved by this post, though the loop() function is guaranteed to return eventually, the average stack-depth required by loop() is infinite. Thus, no matter how much you increase the stack by, your program will eventually run out of memory and crash.
There is nothing we can do to prevent this for certain; we always need to encode the stack in memory somehow, and we'll never have infinite memory. However, there is a way to reduce the amount of memory you're using by about 2 orders of magnitude. This should give your program a significantly higher chance of returning, rather than crashing.
We can do this by noticing that, at each layer in the stack, there's really only one piece of information we need to run your program: the piece that tells us if we need to call loop() again or not after returning. Thus, we can emulate the recursion using a stack of bits. Each emulated stack-frame will require only one bit of memory (right now it requires 64-96 times that, depending on whether you're running in 32- or 64-bit).
The code would look something like this (though I don't have a Java compiler right now so I can't test it):
import java.util.BitSet;

static int bestDepth = 0;
static int numLoopCalls = 0;
public void emulateLoop() {
    // Our fake stack. We'll push a 1 when this point on the stack needs a second call to loop() made yet, a 0 if it doesn't
    BitSet fakeStack = new BitSet();
    int currentDepth = 0; // int rather than long: BitSet indices and bestDepth are ints
    numLoopCalls = 0;
    while(currentDepth >= 0)
    {
        numLoopCalls++;
        if(proba()) {
            // "return" from the current function, going up the callstack until we hit a point that we need to "call loop()" a second time
            fakeStack.clear(currentDepth);
            while(!fakeStack.get(currentDepth))
            {
                currentDepth--;
                if(currentDepth < 0)
                {
                    return;
                }
            }
            // At this point, we've hit a point where loop() needs to be called a second time.
            // Mark it as called, and call it
            fakeStack.clear(currentDepth);
            currentDepth++;
        }
        else {
            // Need to call loop() twice, so we push a 1 and continue the while-loop
            fakeStack.set(currentDepth);
            currentDepth++;
            if(currentDepth > bestDepth)
            {
                bestDepth = currentDepth;
            }
        }
    }
}
This will probably be slightly slower, but it will use about 1/100th the memory. Note that the BitSet is stored on the heap, so there is no longer any need to increase the stack-size to run this. If anything, you'll want to increase the heap-size.
The downside of recursion is that it starts filling up your stack, which will eventually cause a stack overflow if your recursion is too deep. If you want to ensure that the test ends, you can increase your stack size using the answers given in the following Stack Overflow thread:
How to increase to Java stack size?

Problem with terminated paths in simple recursive algorithm

First of all: this is not a homework assignment, it's for a hobby project of mine.
Background:
For my Java puzzle game I use a very simple recursive algorithm to check if certain spaces on the 'map' have become isolated after a piece is placed. Isolated in this case means: where no pieces can be placed in.
Current Algorithm:
public int isolatedSpace(Tile currentTile, int currentSpace){
if(currentTile != null){
if(currentTile.isOpen()){
currentTile.flag(); // mark as visited
currentSpace += 1;
currentSpace = isolatedSpace(currentTile.rightNeighbor(),currentSpace);
currentSpace = isolatedSpace(currentTile.underNeighbor(),currentSpace);
currentSpace = isolatedSpace(currentTile.leftNeighbor(),currentSpace);
currentSpace = isolatedSpace(currentTile.upperNeighbor(),currentSpace);
if(currentSpace < 3){currentTile.markAsIsolated();} // <-- the problem
}
}
return currentSpace;
}
This piece of code returns the size of the empty space that the starting tile is part of. That part of the code works as intended. But I came across a problem regarding the marking of the tiles, and that is what makes the title of this question relevant ;)
The problem:
The problem is that certain tiles are never 'revisited' (they return a value and terminate, so never get a return value themselves from a later incarnation to update the size of the empty space). These 'forgotten' tiles can be part of a large space but still marked as isolated because they were visited at the beginning of the process when currentSpace had a low value.
Question:
How to improve this code so it sets the correct value to the tiles without too much overhead? I can think of ugly solutions like revisiting all flagged tiles and if they have the proper value check if the neighbors have the same value, if not update etc. But I'm sure there are brilliant people here on Stack Overflow with much better ideas ;)
Update:
I've made some changes.
public int isolatedSpace(Tile currentTile, int currentSpace, LinkedList<Tile> visitedTiles){
if(currentTile != null){
if(currentTile.isOpen()){
// do the same as before
visitedTiles.add(currentTile); // remember this tile so it can be marked later
}
}
return currentSpace;
}
And the marktiles function (only called when the returned space size is smaller than a given value):
void marktiles(LinkedList<Tile> visitedTiles){
for(Tile t : visitedTiles){
t.markAsIsolated();
}
}
This approach is in line with the answer of Rex Kerr, at least if I understood his idea.
This isn't a general solution, but you only mark spaces as isolated if they occur in a region of two or fewer spaces. Can't you simplify this test to "a space is isolated iff either (a) it has no open neighbours or (b) it has precisely one open neighbour and that neighbour has no other open neighbours"?
You need to have a two-step process: gathering info about whether a space is isolated, and then marking it as isolated separately. So you'll need to first count up all the spaces (using one recursive function) and then mark all connected spaces if the criterion passes (using a different recursive function).
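The two-step idea can be sketched as follows. To keep the example self-contained it uses plain boolean grids instead of the Tile class from the question, so IsolatedRegions, count, mark, findIsolated and minSize are all hypothetical names, and minSize = 3 mirrors the currentSpace < 3 test in the original code.

```java
// Hypothetical two-pass flood fill: first count a region, then mark it.
class IsolatedRegions {
    private static boolean inside(boolean[][] open, int r, int c) {
        return r >= 0 && r < open.length && c >= 0 && c < open[r].length;
    }

    // Step 1: count the open cells connected to (r, c), flagging each as visited.
    static int count(boolean[][] open, boolean[][] visited, int r, int c) {
        if (!inside(open, r, c) || !open[r][c] || visited[r][c]) {
            return 0;
        }
        visited[r][c] = true;
        return 1 + count(open, visited, r, c + 1) + count(open, visited, r + 1, c)
                 + count(open, visited, r, c - 1) + count(open, visited, r - 1, c);
    }

    // Step 2: mark every open cell connected to (r, c) as isolated.
    static void mark(boolean[][] open, boolean[][] isolated, int r, int c) {
        if (!inside(open, r, c) || !open[r][c] || isolated[r][c]) {
            return;
        }
        isolated[r][c] = true;
        mark(open, isolated, r, c + 1);
        mark(open, isolated, r + 1, c);
        mark(open, isolated, r, c - 1);
        mark(open, isolated, r - 1, c);
    }

    // Marks all cells that belong to an open region smaller than minSize.
    static boolean[][] findIsolated(boolean[][] open, int minSize) {
        boolean[][] visited = new boolean[open.length][];
        boolean[][] isolated = new boolean[open.length][];
        for (int r = 0; r < open.length; r++) {
            visited[r] = new boolean[open[r].length];
            isolated[r] = new boolean[open[r].length];
        }
        for (int r = 0; r < open.length; r++) {
            for (int c = 0; c < open[r].length; c++) {
                // The region size is known only after count() has fully returned.
                if (open[r][c] && !visited[r][c] && count(open, visited, r, c) < minSize) {
                    mark(open, isolated, r, c);
                }
            }
        }
        return isolated;
    }
}
```

Because marking starts only after the whole region has been counted, a tile visited early in the traversal can no longer be marked on the basis of a partial count.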
