This is the implementation of add in a binary search tree, from BST Add:
private IntTreeNode add(IntTreeNode root, int value) {
    if (root == null) {
        root = new IntTreeNode(value);
    } else if (value <= root.data) {
        root.left = add(root.left, value);
    } else {
        root.right = add(root.right, value);
    }
    return root;
}
I understand why this runs in O(log n) on a balanced tree. Here's how I analyze it. We have a tree of size n. How many cuts in half will reduce this tree down to a size of 1? So we have the expression n(1/2)^x = 1, where each factor of 1/2 represents one halving. Solving this for x gives x = log2(n), so the log n comes from the search.
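Spelled out, the halving equation for a perfectly balanced tree solves as:

n * (1/2)^x = 1  =>  2^x = n  =>  x = log2(n)

For example, with n = 16 the cuts go 16 -> 8 -> 4 -> 2 -> 1, which is log2(16) = 4 cuts.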
Here is a lecture slide from Heap that discusses the runtime for an unbalanced binary search tree.
My question is: even if the binary search tree is unbalanced, wouldn't the same strategy (counting how many cuts you have to make) work for analyzing the runtime of add? Wouldn't the runtime still be O(log n), not O(n)? If so, can someone show the math of why it would be O(n)?
With an unbalanced tree:
1
 \
  2
   \
    3
     \
      4
       \
        5
         \
          ...
Your intuition of cutting the tree in half with each operation no longer applies. This tree is the worst case of an unbalanced binary search tree. To search for 10 at the bottom of such a tree holding the values 1 through 10, you must make 10 operations, one for each element in the tree. That is why a search operation on an unbalanced binary search tree is O(n): this unbalanced binary search tree is equivalent to a linked list. Each operation doesn't cut off half the tree, just the one node you've already visited.
That is why specialized versions of binary search trees, such as red-black trees and AVL trees, are important: they keep the tree balanced well enough that all operations (search, insert, delete) are still O(log n).
The O(n) situation in a BST happens when you have either the minimum or the maximum at the top, effectively turning your BST into a linked list. Suppose you added elements as 1, 2, 3, 4, 5, generating your BST, which will be a linked list because every element has only a right child. Adding 6 would have to go right at every single node, passing through all the elements, hence making the asymptotic complexity of add O(n).
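As a quick illustrative sketch of that scenario, reusing the add method and the IntTreeNode class from the question above:

IntTreeNode root = null;
for (int i = 1; i <= 5; i++) {
    root = add(root, i);  // each value is the new maximum, so it always
}                         // becomes a right child: 1 -> 2 -> 3 -> 4 -> 5
root = add(root, 6);      // walks the whole right spine: n comparisons, O(n)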
Related
In the book Cracking the Coding Interview there is an example of a recursive algorithm to calculate all the permutations of a string. The code:
void permutation(String str) {
    permutation(str, "");
}

void permutation(String str, String prefix) {
    if (str.length() == 0) {
        System.out.println(prefix);
    } else {
        for (int i = 0; i < str.length(); i++) {
            String rem = str.substring(0, i) + str.substring(i + 1);
            permutation(rem, prefix + str.charAt(i));
        }
    }
}
The complexity is stated to be something like O(N * N * N!). The intuition for this result is that each permutation is a leaf node, and there are N! permutations. It is stated that since each leaf node is N nodes away from the root, the upper bound on the number of nodes in the tree is O(N*N!), and we then do O(N) amount of work on each node.
I understand the linear amount of work on each node, and I also understand there being N! leaf nodes. But how can it be concluded that the number of nodes in the tree equals (# of leaf nodes * distance to root)? Are we not going to be counting the same nodes many times, since the paths near the root are shared by many leaves?
For example, if the tree were a binary tree it seems this method would not be accurate. The number of nodes is then equal to 2^n - 1, which is the result of the sum of the geometric series (2^0 + 2^1 + ... + 2^(n-1)).
If n=4 a full tree would have 15 nodes. There would be 2^(n-1) = 8 leaf nodes, and they would all have n distance to the root, but 2^(n-1)*n gives a much larger number (32) than the actual full tree of 15 nodes. This result would be 2^(n+1) which is still O(2^n), but I feel like I have probably misunderstood something along the way.
My question is: what makes these two examples different in terms of estimating the upper bound on the number of nodes in their respective recursion trees, and when is it safe to assume that the upper bound is the number of leaf nodes times the depth of the tree (assuming all leaf nodes have the same depth)?
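(A note on reconciling the two examples: leaves-times-depth is always a valid upper bound, because every node lies on some root-to-leaf path, there are exactly leaf-count such paths, and each path contains at most depth + 1 nodes; the bound is just not always tight. In the full binary tree, n * 2^(n-1) >= 2^n - 1, so the bound still holds there, it merely overcounts the shared ancestors. For the permutation tree the count can be done exactly: level k contains N!/(N-k)! nodes (the number of length-k prefixes), so

total nodes = sum over k = 0..N of N!/(N-k)! = N! * (1/0! + 1/1! + ... + 1/N!) <= e * N!

which is Θ(N!), comfortably inside the O(N * N!) upper bound the book uses.)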
This code is meant to check if a binary tree is balanced (balanced being defined as a tree such that the heights of the two subtrees of any node never differ by more than one).
I understand the N part of the runtime O(NlogN). The N is because every node in the tree is visited at least once.
int getHeight(TreeNode root) {
    if (root == null) return -1; // base case
    return Math.max(getHeight(root.left), getHeight(root.right)) + 1;
}

boolean isBalanced(TreeNode root) {
    if (root == null) return true; // base case
    int heightDiff = getHeight(root.left) - getHeight(root.right);
    if (Math.abs(heightDiff) > 1) {
        return false;
    } else { // recurse
        return isBalanced(root.left) && isBalanced(root.right);
    }
}
What I don't understand is the log N part of the runtime O(N log N). The code will trace every possible path from a node to the bottom of the tree. Shouldn't the runtime therefore be more like N * 2^N or something? How does one come, step by step, to the conclusion that the runtime is O(N log N)?
I agree with you that the runtime of this code is not necessarily O(n log n). However, I don't believe that it will always trace out every path from a node to the bottom of the tree. For example, consider this tree:
    *
   /
  *
 /
*
Here, computing the depths of the left and right subtrees will indeed visit every node once. However, because an imbalance is found between the left and right subtrees, the recursion stops without recursively exploring the left subtree. In other words, finding an example where the recursion has to do a lot of work is going to require some creativity.
You are correct that the baseline check for the height difference will take time Θ(n) because every node must be scanned. The concern with this code is that it might rescan nodes many, many times as it recomputes the height differences during the recursion. If we want this function to run for a really long time - not necessarily as long as possible, but for a long time - we'd want to make it so that
the left and right subtrees have roughly the same height, so that the recursion proceeds to the left subtree, but
the tree is extremely imbalanced, placing most of the nodes into the left subtree.
One way to do this is to create trees where the right subtree is just a long spine that happens to have the same height as the left subtree, but with way fewer nodes. Here's one possible sequence of trees that has this property:
                                    *
                                   / \
                    *             *   *
                   / \           / \   \
        *         *   *         *   *   *
       / \       / \   \       / \   \   \
*     *   *     *   *   *     *   *   *   *
Mechanically, each tree is formed by taking the previous tree and putting a rightward spine on top of it. Operationally, these trees are defined recursively as follows:
An order-0 tree is a single node.
An order-(k+1) tree is a node whose left child is an order-k tree and whose right child is a linked list of height k.
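To make the construction concrete, here is a hypothetical builder for these trees (a sketch only, assuming a TreeNode class with left/right fields and a no-argument constructor; buildOrderTree and buildSpine are names chosen here for illustration):

TreeNode buildOrderTree(int k) {
    if (k == 0) return new TreeNode();  // an order-0 tree is a single node
    TreeNode root = new TreeNode();
    root.left = buildOrderTree(k - 1);  // left child: an order-(k-1) tree
    root.right = buildSpine(k - 1);     // right child: a linked list of height k-1
    return root;
}

TreeNode buildSpine(int height) {       // a chain of right children
    TreeNode node = new TreeNode();
    if (height > 0) node.right = buildSpine(height - 1);
    return node;
}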
Notice that the number of nodes in an order-k tree is Θ(k^2). You can see this by noticing that the trees have a nice triangular shape, where each layer has one more node in it than the previous one. Sums of the form 1 + 2 + 3 + ... + k work out to Θ(k^2), and while we can be more precise than this, there really isn't a need to do so.
Now, what happens if we fire off this recursion on the root of any one of these trees? Well, the recursion will begin by computing the heights of the left and right subtrees, which will report that they have the same height as one another. It will then recursively explore the left subtree to see whether it's balanced. After doing some (large) amount of work, it'll find that the left subtree is not balanced, at which point the recursion won't branch to the right subtree. In other words, the amount of work done on an order-k tree is lower-bounded by
W(0) = 1 (there's a single node visited once), and
W(k+1) = W(k) + Θ(k^2).
To see where the W(k+1) term comes from, notice that we begin by scanning every node in the tree, and there are Θ(k^2) nodes to scan, then recursively apply the procedure to the left subtree. Expanding this recurrence, we see that in an order-k tree, the total work done is
W(k) = Θ(k^2) + W(k-1)
     = Θ(k^2 + (k-1)^2) + W(k-2)
     = Θ(k^2 + (k-1)^2 + (k-2)^2) + W(k-3)
     ...
     = Θ(k^2 + (k-1)^2 + ... + 2^2 + 1^2)
     = Θ(k^3).
This last step follows from the fact that the sum of the first k squares works out to Θ(k^3).
To finish things off, we have one more step. We've shown that order-k trees require Θ(k^3) total work to process with this recursive algorithm. However, we'd like a runtime bound in terms of n, the total number of nodes in the tree, not k, the order of the tree. Using the fact that the number of nodes in a tree of order k is Θ(k^2), we see that a tree with n nodes has order Θ(n^(1/2)). Plugging this in, we see that for arbitrarily large n, we can make the total work done equal to Θ((n^(1/2))^3) = Θ(n^(3/2)), which exceeds the proposed O(n log n) bound you mentioned. I'm not sure whether this is the worst-case input for this algorithm, but it's certainly not a good one.
So yes, you are correct - the runtime is not O(n log n) in general. However, it is the case that if the tree is perfectly balanced, the runtime is indeed O(n log n). To see why, notice that if the tree is perfectly balanced, each recursive call will
do O(n) work scanning each node in the tree, then
make two recursive calls on smaller trees, each of which is approximately half as large as the previous one.
That gives the recurrence T(n) = 2T(n / 2) + O(n), which solves to O(n log n). But that's just one specific case, not the general case.
A concluding note: with a minor modification, this code can be made to run in time O(n) in all cases. Instead of recomputing the height at each node, make an initial pass over the tree and annotate each node with its height (either by setting some internal field equal to the height or by having an auxiliary HashMap mapping each node to its height). This can be done in time O(n). From there, recursively walking the tree and checking whether the left and right subtrees have heights that differ by at most one requires O(1) work per node across n total nodes, for a total runtime of O(n).
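A minimal sketch of that modification, assuming the TreeNode class from the question and the auxiliary-HashMap variant (computeHeights and the two-argument isBalanced are names chosen here for illustration):

import java.util.HashMap;
import java.util.Map;

// First pass: compute every subtree height exactly once, memoizing it in the map.
int computeHeights(TreeNode node, Map<TreeNode, Integer> heights) {
    if (node == null) return -1;
    int h = 1 + Math.max(computeHeights(node.left, heights),
                         computeHeights(node.right, heights));
    heights.put(node, h);
    return h;
}

// Second pass: every height lookup is now O(1), so the walk does O(1) work
// per node and O(n) work overall.
boolean isBalanced(TreeNode node, Map<TreeNode, Integer> heights) {
    if (node == null) return true;
    int hl = (node.left == null) ? -1 : heights.get(node.left);
    int hr = (node.right == null) ? -1 : heights.get(node.right);
    if (Math.abs(hl - hr) > 1) return false;
    return isBalanced(node.left, heights) && isBalanced(node.right, heights);
}

boolean isBalanced(TreeNode root) {
    Map<TreeNode, Integer> heights = new HashMap<>();
    computeHeights(root, heights);
    return isBalanced(root, heights);
}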
Hope this helps!
I am working on below interview question:
Given a singly linked list where elements are sorted in ascending
order, convert it to a height balanced BST.
For this problem, a height-balanced binary tree is defined as a binary
tree in which the depth of the two subtrees of every node never differ
by more than 1.
I am trying to understand the solution below and its complexity. Can someone help me understand how it works? Is the solution below O(n) time complexity and O(log n) space complexity?
Also, is the algorithm below better than "counting the number of nodes in the given linked list. Let that be n. After counting nodes, we take the left n/2 nodes and recursively construct the left subtree. After the left subtree is constructed, we allocate memory for the root and link the left subtree with the root. Finally, we recursively construct the right subtree and link it with the root. While constructing the BST, we also keep moving the list head pointer to next so that we have the appropriate pointer in each recursive call"?
public TreeNode toBST(ListNode head) {
    if (head == null) return null;
    return helper(head, null);
}

public TreeNode helper(ListNode head, ListNode tail) {
    ListNode slow = head;
    ListNode fast = head;
    if (head == tail) return null;
    while (fast != tail && fast.next != tail) {
        fast = fast.next.next;
        slow = slow.next;
    }
    TreeNode thead = new TreeNode(slow.val);
    thead.left = helper(head, slow);
    thead.right = helper(slow.next, tail);
    return thead;
}
BST construction
A balanced tree can be constructed from a sorted list by subdividing the list into two equally long lists with one element in the middle being used as a root. E.g.:
1. [1, 2, 3, 4, 5, 6, 7]

2.           4
            / \
   [1, 2, 3]   [5, 6, 7]

3.        4
        /   \
       2     6
      / \   / \
     1   3 5   7
Even if the two sublists differ by one element, they can at most differ by 1 in their height, thus making the tree balanced. By taking the middle element of the list the resulting tree is guaranteed to be a BST, since all smaller elements are part of the left subtree and all larger elements of the right subtree.
slow and fast
Your code works using two iterators, where one (fast) advances over nodes twice as fast as the other (slow). So when fast has reached either the tail or the node right before the tail of the list, slow must be at the node in the middle of the list, thus dividing the list into two sublists of the same length (up to at most one element difference), which can then be processed recursively as shown in the diagram above.
Runtime Complexity
The algorithm runs in O(n lg n). Let's start with the recurrence of helper:
T(n) = n / 2 + 2 * T(n / 2)
T(1) = 1
In each call of helper, we must find the middle node of the linked list delimited by the two parameters passed to helper. This takes n / 2 steps, since we can only walk linearly through the list. In addition, helper is called recursively twice on linked lists of half the size of the original to build the left and right subtrees.
Applying the Master theorem (case 2) to the above recurrence, we get O(n lg n).
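Spelled out: with a = 2, b = 2 and f(n) = n/2, we have f(n) = Θ(n^(log_b a)) = Θ(n^1), which is exactly case 2 of the Master theorem, so T(n) = Θ(n^(log_b a) * lg n) = Θ(n lg n).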
Space complexity
Space complexity also needs to take the produced output structure into account. Since each element of the input linked list is converted into a node in the BST, the complexity is O(n).
EDIT
If the output is ignored, the space complexity depends solely on the recursion depth, which is O(lg n), thus making the space complexity O(lg n).
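For comparison, here is a minimal sketch of the counting approach quoted in the question, assuming the same ListNode and TreeNode classes (sortedListToBST, build and cur are names chosen here for illustration). Because the list pointer advances in lock-step with an in-order construction, every list node is consumed exactly once:

private ListNode cur; // advances through the list in sorted (in-order) order

public TreeNode sortedListToBST(ListNode head) {
    int n = 0;
    for (ListNode p = head; p != null; p = p.next) n++; // count the nodes once
    cur = head;
    return build(n);
}

private TreeNode build(int n) {
    if (n <= 0) return null;
    TreeNode left = build(n / 2);           // build the left subtree first
    TreeNode root = new TreeNode(cur.val);  // consume the middle list node
    cur = cur.next;
    root.left = left;
    root.right = build(n - n / 2 - 1);      // then build the right subtree
    return root;
}

This runs in O(n) time with O(lg n) recursion depth, so in that sense it is indeed better than the slow/fast version's O(n lg n).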
Which data structure can be used for storing a set of integers such that each of the following operations can be done in O(log N) time, where N is the number of elements?
deletion of the smallest element
insertion of an element if it is not already present in the set
PICK ONE OF THE CHOICES
A heap can be used, but not a balanced binary search tree
A balanced binary search tree can be used, but not a heap
Both balanced binary search and heap can be used
Neither balanced binary search tree nor heap can be used
I think it is the second one, "A balanced binary search tree can be used, but not a heap", because the worst-case complexity of inserting and finding in a balanced search tree is log N.
And we cannot use a heap because, for example, in a binary heap (the fastest variant), the worst case for finding an element is N.
A balanced binary search tree can be used, but not a heap
Because,
In a balanced binary search tree, the smallest element is the leftmost node, so you reach it by following left children down from the root, checking log(N) nodes. Once it is identified, deleting it (and rebalancing) is also O(log N), so there is no extra overhead.
When inserting an element, all you do is traverse until you find its position (you will have to traverse a maximum of log(N) nodes) and add the new element as a right or left child.
But in a heap, although inserting an element is only O(log N), checking whether it is already present may require scanning all N elements, because a heap imposes no left/right ordering.
Checking whether the element already exists in the balanced binary search tree, by contrast, costs nothing extra: it falls out of the same O(log N) descent used for insertion.
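As a sanity check, Java's TreeSet (a red-black tree under the hood) supports exactly this pair of operations in O(log N):

import java.util.TreeSet;

public class SetDemo {
    public static void main(String[] args) {
        TreeSet<Integer> set = new TreeSet<>();
        set.add(42);                     // O(log N) insertion
        boolean added = set.add(42);     // false: the duplicate is detected by
                                         // the same O(log N) search
        int smallest = set.pollFirst();  // removes and returns the smallest
                                         // element, O(log N)
        System.out.println(added + " " + smallest); // prints "false 42"
    }
}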
Consider the following code:
public int heightOfBinaryTree(Node node) {
    if (node == null) {
        return 0;
    } else {
        return 1 + Math.max(heightOfBinaryTree(node.left),
                            heightOfBinaryTree(node.right));
    }
}
I want to know the logical reasoning behind this code. How did people come up with it? Does someone have an inductive proof?
Moreover, I thought of just doing a BFS with the root of the binary tree as the argument to get the height of the binary tree. Is the previous approach better than mine? Why?
if (node == null) {
    return 0;
}
The children of leaf nodes are null. Therefore this is saying that once we've gone past the leaves, there are no further nodes.
If we are not past the leaf nodes, we have to calculate the height and this code does so recursively.
return 1 +
The current node adds a height of 1 to the height of the subtree currently being calculated.
Math.max(heightOfBinaryTree(node.left),
         heightOfBinaryTree(node.right));
We recursively calculate the height of the left subtree (node.left) and right subtree (node.right). Since we're calculating the maximum depth, we take the maximum of these two depths.
I've shown above that the recursive function is correct. So calling the function on the root node will calculate the height of the entire tree.
Here's a graphical representation of the height of a tree from this document: h is the height of the tree, h_l and h_r are the heights of the left and right subtrees respectively.
Moreover, I thought of just doing a BFS with the root of the binary tree as the argument to get the height of the binary tree. Is the previous approach better than mine? Why?
The code you provided is a form of DFS. Since you have to process all nodes to find the height of the tree, there will be no runtime difference between DFS and BFS, although BFS will use O(N) memory while DFS uses memory proportional to the tree's height: O(log N) on a balanced tree, O(N) in the worst case. BFS is also slightly more complex to code, since it requires a queue, while DFS makes use of the "built-in" recursion stack.
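For reference, here is a minimal sketch of the BFS alternative, assuming the same Node class (heightByBFS is a name chosen here for illustration). It counts levels by draining the queue one level at a time:

import java.util.ArrayDeque;
import java.util.Queue;

public int heightByBFS(Node root) {
    if (root == null) return 0;
    Queue<Node> queue = new ArrayDeque<>();
    queue.add(root);
    int height = 0;
    while (!queue.isEmpty()) {
        int levelSize = queue.size();  // nodes on the current level
        for (int i = 0; i < levelSize; i++) {
            Node n = queue.remove();
            if (n.left != null) queue.add(n.left);
            if (n.right != null) queue.add(n.right);
        }
        height++;                      // one full level processed
    }
    return height;
}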
The logic behind that code is:
since a node has up to two children, the height of the tree is the maximum of the heights of the subtrees rooted at its left and right children, and of course +1 for the step down to the children.
As you can see, the description above is recursive, and so is the code.
BFS would also do the job, but it would be overkill in both implementation effort and memory use.
As the saying goes: recursive functions may be hard to understand, but they are very elegant to implement.
The height of a tree is the length of the longest downward path from its root.
This function is a recursive way to count the levels of a binary tree. It just increments counters as it descends the tree, returning the maximum counter (the counter on the deepest node).
I hope I have helped.
It's a recursive function. It's saying the height of a tree is 1 + the height of its tallest branch.
Is BFS a breadth first search? I'm not sure what difference there would be in efficiency, but I like the simplicity of the recursive function.
To extend the other answers and elaborate on recursion and the recursive call stack:
Suppose the tree

    2
   / \
  5   9
 /
0
Let's take the left subtree first: root(2) calls the heightOfBinaryTree method on its left child.
The call stack of the method in question will be as follows:
node(5) calls node(0)
node(0) calls node(null)
node(null) breaks the recursive descent
Note that these calls are made before the method has returned anything.
Unwinding the recursive call stack, each node returns its output:
node(null) returned 0 -> 0
node(0) returned (return from node(null)) + 1 -> 1
node(5) returned (return from node(0)) + 1 -> 2
The same goes for the right subtree: node(9) has no children, so it returns 1. Taking the maximum of the outputs from the left and right subtrees and adding 1 for the root, root(2) returns max(2, 1) + 1 = 3, which is the height of the tree.