This has already been discussed here, but I have an implementation below (which was never discussed in the thread),
public boolean isBalanced(BSTNode node) {
    if (maxHeight() > (int) (Math.log(size()) / Math.log(2)) + 1)
        return false;
    else
        return true;
}
where maxHeight() returns the maximum height of the tree. Basically I am checking whether maxHeight > log2(n) + 1, where n is the number of elements in the tree. Is this a correct solution?
This solution is not correct. A tree is balanced if its height is O(lg(n)), i.e. the height must be smaller than c*lg(n) for some CONSTANT c. Your solution assumes this constant is 1.
Note that only a complete tree has height exactly lg(n).
Look, for example, at a Fibonacci tree, which is balanced (and is in fact the worst case for an AVL tree). Its height, however, is larger than lg(n) (approximately 1.44*lg(n)), so the suggested algorithm will report that a Fibonacci tree is not balanced.
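To see that ~1.44 factor concretely, here is a small sketch of my own (not from the thread): the smallest AVL tree of height h satisfies the recurrence N(h) = N(h-1) + N(h-2) + 1, and the ratio h / lg(N(h)) approaches roughly 1.44, so these perfectly valid AVL trees would fail your check.

public class FibonacciTreeDemo {
    public static void main(String[] args) {
        // N(0) = 1 (single node), N(1) = 2: sizes of the smallest AVL trees.
        long nPrev = 1, nCurr = 2;
        for (int h = 2; h <= 40; h++) {
            long nNext = nCurr + nPrev + 1; // smallest AVL tree of height h
            nPrev = nCurr;
            nCurr = nNext;
            double lgN = Math.log(nCurr) / Math.log(2);
            // The check above requires h <= floor(lg(n)) + 1, but here
            // h/lg(n) -> ~1.44, so these balanced trees would be rejected.
            System.out.printf("h=%d, n=%d, h/lg(n)=%.3f%n", h, nCurr, h / lgN);
        }
    }
}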
In the book Cracking the Coding Interview there is an example of a recursive algorithm to calculate all the permutations of a string. The code:
void permutation(String str) {
    permutation(str, "");
}

void permutation(String str, String prefix) {
    if (str.length() == 0) {
        System.out.println(prefix);
    } else {
        for (int i = 0; i < str.length(); i++) {
            String rem = str.substring(0, i) + str.substring(i + 1);
            permutation(rem, prefix + str.charAt(i));
        }
    }
}
The complexity is stated to be something like O(N * N * N!). The intuition for this result is that each permutation is a leaf node, and there are N! permutations. It is stated that since each leaf node is N nodes away from the root, the upper bound on the number of nodes in the tree is O(N*N!), and we then do O(N) amount of work on each node.
I understand the linear amount of work on each node, and I also understand there being N! leaf nodes. But how can it be concluded that the number of nodes in the tree equals (# of leaf nodes * distance to root)? Won't we be counting the same nodes many times, since paths are shared by several of the leaf nodes?
For example, if the tree were a full binary tree, this method seems inaccurate. The number of nodes is then 2^n - 1, the sum of the geometric series 2^0 + 2^1 + ... + 2^(n-1).
If n = 4, a full tree has 15 nodes. There would be 2^(n-1) = 8 leaf nodes, and each root-to-leaf path contains n nodes, but 2^(n-1) * n gives a much larger number (32) than the actual 15 nodes. That product, n * 2^(n-1), is O(n * 2^n) rather than O(2^n), so it is still a valid upper bound, just a loose one; still, I feel like I have probably misunderstood something along the way.
My question is: what makes these two examples different when estimating the upper bound on the number of nodes in their respective recursion trees? When is it safe to assume the upper bound is the number of leaf nodes times the depth of the tree (assuming all leaf nodes have the same depth)?
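For what it's worth, here is a small counting sketch of my own (not from the book). Level i of the permutation call tree holds the n!/(n-i)! partial permutations of length i, so summing over levels gives the exact node count, which comes out near e * n!. Leaf count times depth (n * n!) therefore overcounts, but it can never undercount: it charges each node once for every leaf beneath it, and every node lies on at least one root-to-leaf path.

public class PermutationTreeCount {
    public static void main(String[] args) {
        for (int n = 1; n <= 10; n++) {
            long factorial = 1;
            for (int k = 2; k <= n; k++) factorial *= k;
            long total = 0;
            for (int i = 0; i <= n; i++) {
                long level = 1; // n! / (n - i)! nodes at depth i
                for (int k = n - i + 1; k <= n; k++) level *= k;
                total += level;
            }
            System.out.printf("n=%d, nodes=%d, e*n! ~ %.0f, n*n! = %d%n",
                    n, total, Math.E * factorial, n * factorial);
        }
    }
}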
I understand that in order to compute the height of a Binary Search Tree in O(n) we can use the following function
public static int treeHeight(Node root) {
    if (root == null) {
        return -1;
    }
    int left = treeHeight(root.left) + 1;
    int right = treeHeight(root.right) + 1;
    return Math.max(left, right);
}
However, given the fact that the tree is balanced, how can we calculate the height of a binary search tree in O(log n)?
For the given problem as stated: it's not possible.
If the bottom row were filled out in a predictable fashion it could be done. For example, if the last row were always filled out left-to-right you could descend down the left side in O(log n) time since the left side would be guaranteed to have max height.
In the problem statement the nodes in the bottom row can be anywhere. The exact height can't be computed in O(log n) time. You can get within 1 of the height in O(log n) steps, but to get the exact height you may have to examine up to n/2 nodes at the bottom of the tree to find the stragglers (if any).
The worst case is if the last level is completely full and every single node in the last level has to be checked for children. There would be n/2 nodes and two checks per node, thus n checks in total. There wouldn't be any children in this case, but it'd still take O(n) checks to verify it.
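To illustrate the "within 1" claim, here is a minimal sketch of my own, assuming (as in the problem) that every level above the last is completely full, and reusing the Node type from the question. Walking down the left spine takes O(log n) steps and must reach at least depth h-1, so the true height is either the returned value or that value plus one; pinning down which of the two may cost O(n).

static int approxHeight(Node root) {
    int depth = -1; // matches the convention above: empty tree -> -1
    for (Node cur = root; cur != null; cur = cur.left) {
        depth++; // follow left children only: O(log n) steps
    }
    return depth; // the exact height is depth or depth + 1
}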
This code is meant to check if a binary tree is balanced (balanced being defined as a tree such that the heights of the two subtrees of any node never differ by more than one).
I understand the N part of the runtime O(N log N): it is there because every node in the tree is visited at least once.
int getHeight(TreeNode root) {
    if (root == null) return -1; // base case
    return Math.max(getHeight(root.left), getHeight(root.right)) + 1;
}

boolean isBalanced(TreeNode root) {
    if (root == null) return true; // base case
    int heightDiff = getHeight(root.left) - getHeight(root.right);
    if (Math.abs(heightDiff) > 1) {
        return false;
    } else { // recurse
        return isBalanced(root.left) && isBalanced(root.right);
    }
}
What I don't understand is the log N part of the runtime O(N log N). The code will trace every possible path from a node to the bottom of the tree, so shouldn't the runtime be more like O(N * 2^N) or something? How does one conclude, step by step, that the runtime is O(N log N)?
I agree with you that the runtime of this code is not necessarily O(n log n). However, I don't believe that it will always trace out every path from a node to the bottom of the tree. For example, consider this tree:
*
/
*
/
*
Here, computing the depths of the left and right subtrees will indeed visit every node once. However, because an imbalance is found between the left and right subtrees, the recursion stops without recursively exploring the left subtree. In other words, finding an example where the recursion has to do a lot of work is going to require some creativity.
You are correct that the baseline check for the height difference will take time Θ(n) because every node must be scanned. The concern with this code is that it might rescan nodes many, many times as it recomputes the height differences during the recursion. If we want this function to run for a really long time - not necessarily as long as possible, but for a long time - we'd want to make it so that
the left and right subtrees have roughly the same height, so that the recursion proceeds to the left subtree, but
the tree is extremely imbalanced, placing most of the nodes into the left subtree.
One way to do this is to create trees where the right subtree is just a long spine that happens to have the same height as the left subtree, but with way fewer nodes. Here's one possible sequence of trees that has this property:
*
/ \
* * *
/ \ / \ \
* * * * * *
/ \ / \ \ / \ \ \
* * * * * * * * * *
Mechanically, each tree is formed by taking the previous tree and putting a rightward spine on top of it. Operationally, these trees are defined recursively as follows:
An order-0 tree is a single node.
An order-(k+1) tree is a node whose left child is an order-k tree and whose right child is a linked list of height k.
Notice that the number of nodes in an order-k tree is Θ(k^2). You can see this by noticing that the trees have a nice triangular shape, where each layer has one more node in it than the previous one. Sums of the form 1 + 2 + 3 + ... + k work out to Θ(k^2), and while we can be more precise than this, there really isn't a need to do so.
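In code, the recursive construction just described might look like this (a sketch of my own, assuming a TreeNode with left/right pointers and a no-argument constructor):

// An order-0 tree is a single node; an order-(k+1) tree has an order-k
// tree on the left and a right spine ("linked list") of height k on the right.
static TreeNode buildOrderK(int k) {
    TreeNode root = new TreeNode();
    if (k > 0) {
        root.left = buildOrderK(k - 1);
        root.right = buildSpine(k - 1); // same height as the left subtree
    }
    return root;
}

static TreeNode buildSpine(int height) {
    if (height < 0) return null;
    TreeNode node = new TreeNode();
    node.right = buildSpine(height - 1);
    return node;
}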
Now, what happens if we fire off this recursion on the root of any one of these trees? Well, the recursion will begin by computing the heights of the left and right subtrees, which will report that they have the same height as one another. It will then recursively explore the left subtree to see whether it's balanced. After doing some (large) amount of work, it'll find that the left subtree is not balanced, at which point the recursion won't branch to the right subtree. In other words, the amount of work done on an order-k tree is lower-bounded by
W(0) = 1 (there's a single node visited once), and
W(k+1) = W(k) + Θ(k^2).
To see where the W(k+1) term comes from, notice that we begin by scanning every node in the tree, and there are Θ(k^2) nodes to scan, then recursively apply the procedure to the left subtree. Expanding this recurrence, we see that in an order-k tree, the total work done is
W(k) = Θ(k^2) + W(k-1)
     = Θ(k^2 + (k-1)^2) + W(k-2)
     = Θ(k^2 + (k-1)^2 + (k-2)^2) + W(k-3)
     ...
     = Θ(k^2 + (k-1)^2 + ... + 2^2 + 1^2)
     = Θ(k^3).
This last step follows from the fact that the sum of the first k squares works out to Θ(k^3).
To finish things off, we have one more step. We've shown that order-k trees require Θ(k^3) total work to process with this recursive algorithm. However, we'd like a runtime bound in terms of n, the total number of nodes in the tree, not k, the order of the tree. Using the fact that the number of nodes in a tree of order k is Θ(k^2), we see that a tree with n nodes has order Θ(n^(1/2)). Plugging this in, we see that for arbitrarily large n, we can make the total work done equal to Θ((n^(1/2))^3) = Θ(n^(3/2)), which exceeds the proposed O(n log n) bound you mentioned. I'm not sure whether this is the worst-case input for this algorithm, but it's certainly not a good one.
So yes, you are correct - the runtime is not O(n log n) in general. However, it is the case that if the tree is perfectly balanced, the runtime is indeed O(n log n). To see why, notice that if the tree is perfectly balanced, each recursive call will
do O(n) work scanning each node in the tree, then
make two recursive calls on smaller trees, each of which is approximately half as large as the previous one.
That gives the recurrence T(n) = 2T(n / 2) + O(n), which solves to O(n log n). But that's just one specific case, not the general case.
A concluding note: with a minor modification, this code can be made to run in time O(n) in all cases. Instead of recomputing the height of each node, make an initial pass over the tree and annotate each node with its height (either by setting some internal field equal to the height or by having an auxiliary HashMap mapping each node to its height). This can be done in time O(n). From there, recursively walking the tree and checking whether the left and right subtrees have heights that differ by at most one requires O(1) work per node across n total nodes, for a total runtime of O(n).
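For concreteness, here is one common single-pass formulation of that O(n) idea (my sketch; the annotate-first approach described above works just as well). It computes each subtree's height and checks balance in the same recursion, using a sentinel value to short-circuit once an imbalance is found:

// Returns the height of root, or Integer.MIN_VALUE as a sentinel
// meaning "some subtree is already unbalanced". Each node is visited once.
int checkHeight(TreeNode root) {
    if (root == null) return -1;
    int left = checkHeight(root.left);
    if (left == Integer.MIN_VALUE) return Integer.MIN_VALUE;
    int right = checkHeight(root.right);
    if (right == Integer.MIN_VALUE) return Integer.MIN_VALUE;
    if (Math.abs(left - right) > 1) return Integer.MIN_VALUE;
    return Math.max(left, right) + 1;
}

boolean isBalancedLinear(TreeNode root) {
    return checkHeight(root) != Integer.MIN_VALUE;
}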
Hope this helps!
This is the implementation of add in a binary search tree, from BST Add:
private IntTreeNode add(IntTreeNode root, int value) {
    if (root == null) {
        root = new IntTreeNode(value);
    } else if (value <= root.data) {
        root.left = add(root.left, value);
    } else {
        root.right = add(root.right, value);
    }
    return root;
}
I understand why this runs in O(log n) on a balanced tree. Here's how I analyze it: we have a tree of size n; how many halvings does it take to reduce the tree to size 1? That gives the equation n * (1/2)^x = 1, where each factor of 1/2 represents one halving. Solving for x gives x = log2(n), so the log n comes from the search.
Here is a lecture slide from Heap that discusses the runtime for an unbalanced binary search tree.
My question is: even if the binary search tree is unbalanced, wouldn't the same strategy (counting how many cuts you have to make) work for analyzing the runtime of add? Wouldn't the runtime still be O(log n), not O(n)? If so, can someone show the math for why it would be O(n)?
With an unbalanced tree:
1
\
2
\
3
\
4
\
5
\
...
Your intuition of cutting the tree in half with each operation no longer applies. This unbalanced tree is the worst case of an unbalanced binary search tree: it is equivalent to a linked list. To search for 10 at the bottom of this list, you must make 10 operations, one for each element in the tree. That is why a search operation on an unbalanced binary search tree is O(n): each operation doesn't cut off half the tree, only the one node you've already visited.
That is why specialized versions of binary search trees, such as red-black trees and AVL trees, are important: they keep the tree balanced well enough that all operations (search, insert, delete) are still O(log n).
The O(n) situation in a BST happens when you have either the minimum or the maximum at the top, effectively turning your BST into a linked list. Suppose you added the elements in the order 1, 2, 3, 4, 5: the resulting BST is a linked list, because every element has only a right child. Adding 6 would then have to go right at every single node, passing through all the elements, hence making the asymptotic complexity of add O(n).
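As a quick sanity check, here is a self-contained sketch of my own that condenses the add method shown above and confirms the degenerate shape: inserting sorted keys yields a height of n - 1.

class DegenerateBstDemo {
    static class IntTreeNode {
        int data;
        IntTreeNode left, right;
        IntTreeNode(int data) { this.data = data; }
    }

    static IntTreeNode add(IntTreeNode root, int value) {
        if (root == null) return new IntTreeNode(value);
        if (value <= root.data) root.left = add(root.left, value);
        else root.right = add(root.right, value);
        return root;
    }

    static int height(IntTreeNode root) {
        if (root == null) return -1;
        return 1 + Math.max(height(root.left), height(root.right));
    }

    public static void main(String[] args) {
        IntTreeNode root = null;
        for (int value = 1; value <= 5; value++) {
            root = add(root, value); // each insert walks the whole right spine
        }
        System.out.println(height(root)); // prints 4: a linked list, not a bushy tree
    }
}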
Consider the following code:
public int heightOfBinaryTree(Node node) {
    if (node == null) {
        return 0;
    } else {
        return 1 + Math.max(heightOfBinaryTree(node.left),
                            heightOfBinaryTree(node.right));
    }
}
I want to know the logical reasoning behind this code. How did people come up with it? Does someone have an inductive proof?
Moreover, I thought of just doing a BFS with the root of the binary tree as the argument to get the height of the binary tree. Is the previous approach better than mine? Why?
if (node == null) {
    return 0;
}
The children of leaf nodes are null. Therefore this is saying that once we've gone past the leaves, there are no further nodes.
If we are not past the leaf nodes, we have to calculate the height and this code does so recursively.
return 1 +
The current node adds a height of 1 to the height of the subtree currently being calculated.
Math.max(heightOfBinaryTree(node.left),
heightOfBinaryTree(node.right));
We recursively calculate the height of the left subtree (node.left) and right subtree (node.right). Since we're calculating the maximum depth, we take the maximum of these two depths.
I've shown above that the recursive function is correct, so calling the function on the root node will calculate the height of the entire tree.
Here's a graphical representation of the height of a tree, from this document: h is the height of the tree, and hl and hr are the heights of the left and right subtrees, respectively.
Moreover, I thought of just doing a BFS with the root of the binary tree as the argument to get the height of the binary tree. Is the previous approach better than mine? Why?
The code you provided is a form of DFS. Since you have to process all nodes to find the height of the tree, there will be no runtime difference between DFS and BFS, although BFS will use O(N) memory while DFS uses memory proportional to the tree's height: O(log N) for a balanced tree, but O(N) in the worst case. BFS is also slightly more complex to code, since it requires a queue, while DFS makes use of the "built-in" recursion stack.
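For comparison, here is what the BFS version might look like (a sketch of my own, reusing the Node type from the question and keeping its convention that an empty tree has height 0). It counts levels with a queue instead of recursing:

static int heightBfs(Node root) {
    if (root == null) return 0;
    java.util.Queue<Node> queue = new java.util.ArrayDeque<>();
    queue.add(root);
    int height = 0;
    while (!queue.isEmpty()) {
        int levelSize = queue.size();
        for (int i = 0; i < levelSize; i++) { // drain exactly one level
            Node node = queue.remove();
            if (node.left != null) queue.add(node.left);
            if (node.right != null) queue.add(node.right);
        }
        height++; // one full level processed
    }
    return height;
}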
The logic behind that code is: since a node has at most two children, the height of the tree is the maximum of the heights of the trees rooted at its left and right children, plus 1 for the step down to those children.
As you can see, the description above is recursive, and so is the code.
BFS would also work, but it would be overkill in both implementation effort and space complexity.
As the saying goes: recursive functions may be hard to understand, but they are very elegant to implement.
The height of a tree is the length of the longest downward path from its root.
This function is a recursive way to count the levels of a binary tree. It effectively adds 1 for each level as it walks the tree, returning the largest count (the one from the deepest node).
I hope I have helped.
It's a recursive function. It's saying the height of a tree is 1 + the height of its tallest branch.
Is BFS a breadth first search? I'm not sure what difference there would be in efficiency, but I like the simplicity of the recursive function.
To extend the other answers and elaborate more on recursion and the recursive call stack:
Suppose the tree
    2
   / \
  5   9
 /
0
Let's take the left subtree first: root(2) calls the heightOfBinaryTree method on its left child.
The call stack of the method in question will be as follows:
node(5) calls node(0)
node(0) calls node(null)
node(null) hits the base case and ends the recursion
Note that these calls are made before the method returns anything.
Unwinding the recursive call stack, each node then returns its output:
node(null) returns 0
node(0) returns (result of node(null)) + 1 -> 1
node(5) returns (result of node(0)) + 1 -> 2
The same goes for the right subtree. Taking the maximum of the left and right results at the root and adding 1 gives the height of the tree: 1 + max(2, 1) = 3.
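Putting the walkthrough together, here is a tiny demo of my own (assuming a Node constructor that takes the value):

Node root = new Node(2);
root.left = new Node(5);
root.right = new Node(9);
root.left.left = new Node(0);
// left subtree returns 2, right subtree returns 1,
// so the root returns 1 + max(2, 1) = 3
System.out.println(heightOfBinaryTree(root));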