better way to find depth of a node in tree - java

I want to compute the depth of a tree.
I have written the code below.
My questions are:
What is the order of this code? Is it O(n), where n is the number of tree nodes?
Is there any other way that you think is better or faster?
Thanks in advance.
public int height(Node n)
{
    // An empty subtree contributes height 0.
    if (n == null)
        return 0;
    // One for this node, plus the taller of its two subtrees.
    return 1 + Math.max(height(n.left), height(n.right));
}

It's O(n), regardless of what type of tree you have, since you're visiting every single node to establish the maximum depth.
The only more efficient way is to have extra information about the tree stored somewhere. If it's balanced, you know the maximum height based on the number of nodes.
Alternatively, you can cache the information. Have two extra variables depth and dirty and initially set dirty to true:
When a caller requests the depth and dirty is true, call your function to work it out, then store the result into depth and set dirty to false.
When a caller requests the depth and dirty is false, just return depth.
Whenever the structure of the tree is changed (insert or delete node), set dirty to true.
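For illustration, a minimal sketch of that caching scheme (the CachedTree wrapper and Node class here are assumptions, not code from the question):

class Node { int value; Node left, right; }

class CachedTree {
    Node root;
    private int depth;            // cached height
    private boolean dirty = true; // true whenever the cache is stale

    public int height() {
        if (dirty) {              // recompute only when the tree has changed
            depth = height(root);
            dirty = false;
        }
        return depth;
    }

    private int height(Node n) {
        if (n == null)
            return 0;
        return 1 + Math.max(height(n.left), height(n.right));
    }

    public void insert(int value) {
        // ... normal BST insertion ...
        dirty = true; // structure changed, so the cached height is stale
    }
}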

The order of your function is O(n).
Store the height in each node as you create it, and maintain that height as you add, remove, and balance the tree.
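A minimal sketch of that approach (field and helper names are assumptions); the root's height can then be read in O(1):

class Node {
    int value;
    Node left, right;
    int height = 1; // height of the subtree rooted at this node

    static int heightOf(Node n) { return n == null ? 0 : n.height; }

    // BST insertion that recomputes heights on the way back up.
    static Node insert(Node n, int value) {
        if (n == null) {
            Node leaf = new Node();
            leaf.value = value;
            return leaf;
        }
        if (value < n.value)
            n.left = insert(n.left, value);
        else
            n.right = insert(n.right, value);
        n.height = 1 + Math.max(heightOf(n.left), heightOf(n.right));
        return n;
    }
}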

Well, I see that your question asks for depth, but the code finds the height of a node.
I am not really sure which you want, height or depth. If you need the depth, this is not the way; see below:
Depth(n) = 1 + Depth(P(n)), where P(n) is the parent of n,
is the recursive definition of depth.
For height, what you wrote is correct.
Check Tree Operations; I have written it up for most of the operations on a tree, with their asymptotic analysis.
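For the depth definition above, a minimal sketch, assuming each Node keeps a parent reference (the question's Node does not, so that field is an assumption):

// Depth(root) = 0; every step away from the root adds 1.
int depth(Node n) {
    if (n == null || n.parent == null)
        return 0;
    return 1 + depth(n.parent);
}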

Yes, your code is O(n), because in the worst case you call height on the root node (and then you have to touch every node to find the longest path to a leaf).
TTS's code is the same as yours, and I don't think there is a faster way to do it.

Related

Understanding Recursion logic

I really need your help to understand recursion properly. I can understand basic recursions and their logic, like Fibonacci or factorial:
int factorial(int n)
{
    if (n <= 1)
        return 1; // base case (the original returned n here, which makes factorial(0) == 0)
    else
        return n * factorial(n - 1);
}
That's easy: the function keeps calling factorial until n reaches the base case, and the results are multiplied together on the way back up. But recursion like tree traversal is hard for me to understand:
void inorderTraverse(Node* head)
{
    if (head != NULL) {
        inorderTraverse(head->left);
        cout << head->data;
        inorderTraverse(head->right);
    }
}
Here I lose the logic: if the first recursive call keeps going back into the function, how does it ever reach the cout line, and how can it show the right child's data? I really need your help.
Thank you
A binary search tree in alphabetical order:
B
/ \
A C
inorderTraverse(&B) will first descend to A and print it (recursively printing any subtree), then print B, then descend to C and print it (recursively printing any subtree).
So in this case, A B C.
For a more complicated tree:
D
/ \
B E
/ \
A C
This will be printed as A B C D E. Notice how the original tree is on the left of D, so is printed in its entirety first; the problem is reduced to a smaller instance of the starting problem. This is the essence of recursion. Next D is printed, then everything on the right of D (which is just E).
Note that in this implementation, the nodes don't know about their parent. The recursion means this information is stored on the call stack. You could replace the whole thing with an iterative implementation, but you would need some stack-like data structure to keep track of moving back up through the tree.
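For illustration, a minimal iterative sketch in Java (assuming a Node class with left, right and data fields analogous to the C++ one above):

import java.util.ArrayDeque;
import java.util.Deque;

// The explicit stack plays the role of the call stack
// in the recursive version.
void inorderIterative(Node root) {
    Deque<Node> stack = new ArrayDeque<>();
    Node current = root;
    while (current != null || !stack.isEmpty()) {
        while (current != null) {   // descend as far left as possible
            stack.push(current);
            current = current.left;
        }
        current = stack.pop();      // leftmost unvisited node
        System.out.print(current.data + " ");
        current = current.right;    // then handle the right subtree
    }
}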
In-order traversal means you traverse Left-Root-Right. For a single level that is easy: print in left-root-right order. But as the levels increase, you need to make sure your algorithm traverses the same way at every level, so you print the left subtree first, then the root of that subtree, and then the right subtree.
The recursive call inorderTraverse(head->left) says: while the node is not null, keep going to the left side of the tree. Once it reaches the end, it prints the left node, then prints the root node of that subtree; and whatever operation you performed on the left subtree, you perform the same on the right subtree, which is why you write inorderTraverse(head->right). Start debugging by creating 3-level trees. Happy learning.
Try to imagine a binary tree and then start traversing it from the root. You always go left. If there is no left child, you go right, and after that you go back up. You will finish back at the root (coming from the right side).
It is similar to going through a maze where you always turn left (you always keep touching the left wall). At the end you finish at the exit, or back at the entrance if there is no other exit.
The important thing in this code is that there are two recursive calls in the body: the first for the left subtree, the second for the right subtree. When one call finishes, the function returns you to the node where you started it.
Binary search trees have the property that for every node, the left subtree contains values that are smaller than the current node's value, and the right subtree contains values that are larger.
Thus, for a given node to yield the values in its subtree in-order the function needs to:
Handle the values less than the current value;
Handle its value;
Handle the values greater than the current value.
If you think of your function initially as a black box that deals with a subtree, then the recursion magically appears:
Apply the function to the left subtree;
Deal with the current value;
Apply the function to the right subtree.
Basically, the trick is to initially think of the function as a shorthand way to invoke an operation, without worrying about how it might be accomplished. When you think of it abstractly like that, and you note that the current problem's solution can be achieved by applying that same functionality to a subset of the current problem, you have a recursion.
Similarly, post-order traversal boils down to:
Deal with all my children (left subtree, then right subtree, or vice-versa if you're feeling contrary);
Now I can deal with myself.
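A minimal post-order sketch of that, under the same assumed Node class:

void postorderTraverse(Node n) {
    if (n == null) return;
    postorderTraverse(n.left);      // deal with the left subtree
    postorderTraverse(n.right);     // then the right subtree
    System.out.print(n.data + " "); // now deal with the node itself
}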

Getting max and min from BST (C++ vs Java)

I know that given a balanced binary search tree, getting the min and max elements takes O(log N). I want to ask about their implementations in C++ and Java respectively.
C++
Take std::set for example: getting min/max can be done by calling *set.begin() / *set.rbegin(), and it's constant time.
Java
Take TreeSet for example (HashSet keeps no order): getting min/max can be done by calling TreeSet.first() and TreeSet.last(), but it's logarithmic time.
I wonder if this is because std::set has done some extra trick to always keep the begin() and rbegin() pointers up to date; if so, can anyone show me that code? Btw, why didn't Java add this trick too? It seems pretty convenient to me...
EDIT:
My question might not be very clear. I want to see how std::set implements insert/erase; I'm curious to see how the begin() and rbegin() iterators are updated during those operations.
EDIT2:
I'm very confused now. Say I have the following code:
set<int> s;
s.insert(5);
s.insert(3);
s.insert(7);
... // say I inserted a total of n elements.
s.insert(0);
s.insert(9999);
cout<<*s.begin()<<endl; //0
cout<<*s.rbegin()<<endl; //9999
Aren't both *s.begin() and *s.rbegin() O(1) operations? Are you saying they aren't, and that s.rbegin() actually iterates to the last element?
My answer isn't language specific.
To fetch the MIN or MAX in a BST, you always iterate to the leftmost or the rightmost node respectively. This operation is O(height), which is roughly O(log n) in a balanced tree.
Now, to optimize this retrieval, a class that implements a tree can always store two extra pointers to the leftmost and the rightmost nodes; retrieving them then becomes O(1). Of course, these pointers bring in the overhead of updating them on each insert operation in the tree.
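A minimal sketch of that idea in Java (class and field names are assumptions):

class MinMaxTree {
    static class Node { int value; Node left, right; }

    Node root, min, max; // min/max track the leftmost/rightmost nodes

    void insert(int value) {
        Node leaf = new Node();
        leaf.value = value;
        if (root == null) {
            root = min = max = leaf;
            return;
        }
        Node cur = root;
        while (true) {
            if (value < cur.value) {
                if (cur.left == null) { cur.left = leaf; break; }
                cur = cur.left;
            } else {
                if (cur.right == null) { cur.right = leaf; break; }
                cur = cur.right;
            }
        }
        if (value < min.value) min = leaf;  // new leftmost
        if (value >= max.value) max = leaf; // new rightmost
    }

    int getMin() { return min.value; } // O(1)
    int getMax() { return max.value; } // O(1)
}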
begin() and rbegin() only return iterators, in constant time. Iterating them isn't constant-time.
There is no 'begin()/rbegin() pointer'. The min and max are reached by iterating to the leftmost or rightmost elements, just as in Java, only Java doesn't have the explicit iterator.

Why store the points in a binary tree?

This question covers a software algorithm, per the On topic guidelines.
I am working on an interview question from Amazon Software Question,
specifically "Given a set of points (x,y) and an integer "n", return n number of points which are close to the origin"
Here is the sample high-level pseudocode answer to this question, from Sample Answer:
Step 1: Design a class called point which has three fields - int x, int y, int distance
Step 2: For all the points given, find the distance between them and origin
Step 3: Store the values in a binary tree
Step 4: Heap sort
Step 5: print the first n values from the binary tree
I agree with steps 1 and 2 because it makes sense in terms of object-oriented design to have one software bundle of data, Point, encapsulate the fields x, y and distance (Encapsulation).
Can someone explain the design decisions from 3 to 5?
Here's how I would do steps of 3 to 5
Step 3: Store all the points in an array
Step 4: Sort the array with respect to distance (I would use some built-in sort here, like Arrays.sort)
Step 5: With the array sorted in ascending order, I print off the first n values
Why did the author of that response use a more complicated data structure, a binary tree, rather than something simpler like the array I used? I know what a binary tree is: a hierarchical data structure of nodes, each with two pointers. In his algorithm, would you have to use a BST?
First, I would not say that having Point(x, y, distance) is good design or encapsulation. distance is not really part of a point; it can be computed from x and y. In terms of design, I would certainly have a function, i.e. a static method on Point or a helper class Points:
double distance(Point a, Point b)
Then for the specific question, I actually agree with your solution, to put the data in an array, sort this array and then extract the N first.
What the example may be hinting at is that heapsort often uses a binary tree structure laid out inside the array being sorted, as explained here:
The heap is often placed in an array with the layout of a complete binary tree.
Of course, if the distance to the origin is not stored in the Point for performance reasons, it has to be stored alongside the corresponding Point object in the array, or as any information that allows recovering the Point object from the sorted distance (a reference or an index), e.g.
List<Pair<Long, Point>> distancesToOrigin = new ArrayList<>();
to be sorted with a Comparator<Pair<Long, Point>>
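Since the JDK has no Pair class, a minimal sketch of the same idea sorts the points directly with a Comparator on their squared distance (the Point class here is an assumption):

import java.util.Arrays;
import java.util.Comparator;

class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }

    // Squared distance avoids the sqrt and preserves the ordering.
    static long squaredDistanceToOrigin(Point p) {
        return (long) p.x * p.x + (long) p.y * p.y;
    }

    static Point[] nClosestToOrigin(Point[] points, int n) {
        Point[] copy = points.clone();
        Arrays.sort(copy, Comparator.comparingLong(Point::squaredDistanceToOrigin));
        return Arrays.copyOf(copy, n); // the n points closest to the origin
    }
}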
It is not necessary to use a BST. However, it is good practice to use a BST when you need a structure that is self-sorting. I do not see the need to both use a BST and heapsort it (somehow). You could use just the BST and retrieve the first n points. You could also use an array, sort it, and take the first n points.
If you want to sort an array of type Point, you can implement the Comparable interface (Point would implement that interface) and override the compareTo method.
You never have to choose any particular data structure, but by determining the needs you have, you can also easily determine the optimal structure.
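A sketch of the Comparable variant just described, assuming the distance has been precomputed and stored as a field:

class DistancePoint implements Comparable<DistancePoint> {
    int x, y;
    long distance; // precomputed squared distance to the origin

    @Override
    public int compareTo(DistancePoint other) {
        return Long.compare(this.distance, other.distance);
    }
}

// Arrays.sort(points) then orders by distance,
// and the first n entries are the answer.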
The approach described in this post is more complex than needed for such a question. As you noted, simple sorting by distance will suffice. However, to help explain your confusion about what your sample answer author was trying to get at, maybe consider the k nearest neighbors problem which can be solved with a k-d tree, a structure that applies space partitioning to the k-d dataset. For 2-dimensional space, that is indeed a binary tree. This tree is inherently sorted and doesn't need any "heap sorting."
It should be noted that building the k-d tree will take O(n log n), and is only worth the cost if you need to do repeated nearest neighbor searches on the structure. If you only need to perform one search to find k nearest neighbors from the origin, it can be done with a naive O(n) search.
How to build a k-d tree, straight from Wiki:
One adds a new point to a k-d tree in the same way as one adds an element to any other search tree. First, traverse the tree, starting from the root and moving to either the left or the right child depending on whether the point to be inserted is on the "left" or "right" side of the splitting plane. Once you get to the node under which the child should be located, add the new point as either the left or right child of the leaf node, again depending on which side of the node's splitting plane contains the new node.
Adding points in this manner can cause the tree to become unbalanced, leading to decreased tree performance. The rate of tree performance degradation is dependent upon the spatial distribution of tree points being added, and the number of points added in relation to the tree size. If a tree becomes too unbalanced, it may need to be re-balanced to restore the performance of queries that rely on the tree balancing, such as nearest neighbour searching.
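A minimal 2-d sketch of that insertion in Java (class and field names are assumptions):

class KdNode {
    final double[] point; // point[0] = x, point[1] = y
    KdNode left, right;
    KdNode(double[] point) { this.point = point; }

    // Depth 0 splits on x, depth 1 on y, alternating down the tree.
    static KdNode insert(KdNode node, double[] p, int depth) {
        if (node == null)
            return new KdNode(p);
        int axis = depth % 2;
        if (p[axis] < node.point[axis])
            node.left = insert(node.left, p, depth + 1);
        else
            node.right = insert(node.right, p, depth + 1);
        return node;
    }
}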
Once you have built the tree, you can find the k nearest neighbors to some point (the origin in your case) in O(k log n) time.
Straight from Wiki:
Searching for a nearest neighbour in a k-d tree proceeds as follows:
Starting with the root node, the algorithm moves down the tree recursively, in the same way that it would if the search point were being inserted (i.e. it goes left or right depending on whether the point is lesser than or greater than the current node in the split dimension).
Once the algorithm reaches a leaf node, it saves that node point as the "current best"
The algorithm unwinds the recursion of the tree, performing the following steps at each node:
If the current node is closer than the current best, then it becomes the current best.
The algorithm checks whether there could be any points on the other side of the splitting plane that are closer to the search point than the current best. In concept, this is done by intersecting the splitting hyperplane with a hypersphere around the search point that has a radius equal to the current nearest distance. Since the hyperplanes are all axis-aligned this is implemented as a simple comparison to see whether the difference between the splitting coordinate of the search point and current node is lesser than the distance (overall coordinates) from the search point to the current best.
If the hypersphere crosses the plane, there could be nearer points on the other side of the plane, so the algorithm must move down the other branch of the tree from the current node looking for closer points, following the same recursive process as the entire search.
If the hypersphere doesn't intersect the splitting plane, then the algorithm continues walking up the tree, and the entire branch on the other side of that node is eliminated.
When the algorithm finishes this process for the root node, then the search is complete.
This is a pretty tricky algorithm that I would hate to need to describe as an interview question! Fortunately the general case here is more complex than is needed, as you pointed out in your post. But I believe this approach may be close to what your (wrong) sample answer was trying to describe.
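For completeness, a hedged sketch of the single nearest-neighbour search just quoted, building on the KdNode above (finding k nearest would need a bounded max-heap on top of this):

class NearestNeighborSearch {
    private KdNode best;
    private double bestDist; // squared distance of the current best

    KdNode nearest(KdNode root, double[] target) {
        best = null;
        bestDist = Double.POSITIVE_INFINITY;
        search(root, target, 0);
        return best;
    }

    private void search(KdNode node, double[] target, int depth) {
        if (node == null) return;
        double d = squared(node.point, target);
        if (d < bestDist) { bestDist = d; best = node; } // new current best
        int axis = depth % 2;
        double diff = target[axis] - node.point[axis];
        KdNode near = diff < 0 ? node.left : node.right;
        KdNode far  = diff < 0 ? node.right : node.left;
        search(near, target, depth + 1); // side containing the target first
        // Cross the splitting plane only if the sphere around the target
        // with radius sqrt(bestDist) intersects it.
        if (diff * diff < bestDist)
            search(far, target, depth + 1);
    }

    private static double squared(double[] a, double[] b) {
        double dx = a[0] - b[0], dy = a[1] - b[1];
        return dx * dx + dy * dy;
    }
}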

Height of binary search tree in constant time

I need to find the height of a binary search tree in O(1) time. The only way I could think to do this is to put a check in the add and remove methods, updating a global counter. Is there any other way?
O(1) time suggests that you should already have the height when it is requested.
The best way is to keep/update the correct value whenever a node is added or deleted. You are doing it the right way; however, it does add cost to addition and deletion.
You can do it in a number of ways, for example by keeping the depth value along with each node in the tree:
class Node {
    int depth;    // depth of this node (1 + its parent's depth)
    Object value;
}
Node lowestNode; // reference to the deepest node in the tree
I can think of storing a reference to the maximum-depth node and using its depth as the height. Whenever you add a new element, you compute its depth from its parent and replace the max-depth reference if the new element is deeper.
If you are deleting the max-depth node, replace the reference with its parent; otherwise correct the depths recursively along the tree.
The height of the tree is then lowestNode.depth.
Storing an attribute for the height and updating it when you add/remove should be the most reasonable solution.
If the tree is guaranteed to be balanced (e.g. AVL), you can calculate the height from the number of elements in the tree (you have to keep the element count, though :P).
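For illustration: a perfectly balanced tree with n nodes has height floor(log2(n)) + 1 (counting nodes); for an AVL tree this is only a lower bound, since the true height can reach about 1.44 * log2(n). A minimal sketch:

// Height of a perfectly balanced tree with n >= 1 nodes.
int balancedHeight(int n) {
    return 32 - Integer.numberOfLeadingZeros(n); // floor(log2(n)) + 1
}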

Queue data structure supporting fast k-th largest element finding

I'm faced with a problem which requires a Queue data structure supporting fast k-th largest element finding.
The requirements of this data structure are as follows:
The elements in the queue are not necessarily integers, but they must be comparable to each other, i.e. we can tell which one is greater when we compare two elements (they can be equal as well).
The data structure must support enqueue (adds the element at the tail) and dequeue (removes the element at the head).
It can quickly find the k-th largest element in the queue; please note that k is not a constant.
You can assume that the operations enqueue, dequeue and k-th largest element finding all occur with the same frequency.
My idea is to use a modified balanced binary search tree. The tree is the same as an ordinary balanced binary search tree except that every node_i is augmented with another field n_i, which denotes the number of nodes contained in the subtree rooted at node_i. The aforementioned operations are supported as follows:
For simplicity assume that all elements are distinct.
Enqueue(x): x is first inserted into the tree; suppose the corresponding node is node_t. We append the pair (x, pointer to node_t) to the queue.
Dequeue: suppose (e1, node_1) is the element at the head, where node_1 is the pointer into the tree corresponding to e1. We delete node_1 from the tree and remove (e1, node_1) from the queue.
K-th largest element finding: suppose the root is node_root, and let n_right be the size of its right subtree. Since the right subtree holds exactly the n_right largest elements, three cases may happen:
if K <= n_right, we find the K-th largest element in the right subtree of node_root;
if K > n_right + 1, we find the (K - n_right - 1)-th largest element in the left subtree of node_root;
otherwise (K = n_right + 1), node_root is the node we want.
The time complexity of all three operations is O(log N), where N is the number of elements currently in the queue.
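A minimal sketch of that selection over size-augmented nodes (field names are assumptions; size maintenance on insert/delete is omitted):

class TreeNode {
    TreeNode left, right;
    int size = 1; // number of nodes in the subtree rooted here

    static int sizeOf(TreeNode n) { return n == null ? 0 : n.size; }

    // Returns the k-th largest node (1-based): the right subtree
    // holds exactly sizeOf(n.right) elements larger than n.
    static TreeNode kthLargest(TreeNode n, int k) {
        int rightSize = sizeOf(n.right);
        if (k <= rightSize)
            return kthLargest(n.right, k);
        if (k == rightSize + 1)
            return n;
        return kthLargest(n.left, k - rightSize - 1);
    }
}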
How can I speed up the operations mentioned above? With what data structures and how?
Note - you cannot achieve better than O(log n) for all of them; at best you get to "choose" which operation you care about most. (Otherwise, you could sort in O(n) by feeding the array to the DS and querying the 1st, 2nd, 3rd, ..., nth elements.)
Using a skip list instead of a balanced BST as the sorted structure can reduce dequeue complexity to O(1) average case. It does not affect the complexity of any other operation.
To remove from a skip list, all you need to do is get to the element using the pointer from the head of the queue, then follow the links up and remove each one. The expected number of nodes to delete is 1 + 1/2 + 1/4 + ... = 2.
Finding the K-th largest can be achieved in O(log K) by starting from the leftmost node (not the root) and making your way up until you find you have "more descendants than needed", then treating the node just found as the root, exactly like the algorithm in the question. Though this is better in asymptotic complexity, the constant factor is doubled.
I found an interesting paper:
Sliding-Window Top-k Queries on Uncertain Streams, published in VLDB 2008 and cited 71 times:
https://www.cse.ust.hk/~yike/wtopk.pdf
VLDB is the best conference in the database research area, and the number of citations suggests the data structure actually works.
The paper looks pretty difficult, but if you really need to improve your data structure, I suggest you read this paper or the papers in its reference list.
You can also use a finger tree.
For example, a priority queue can be implemented by labeling the internal nodes by the minimum priority of its children in the tree, or an indexed list/array can be implemented with a labeling of nodes by the count of the leaves in their children. Finger trees can provide amortized O(1) cons, reversing, cdr, O(log n) append and split; and can be adapted to be indexed or ordered sequences.
Also note that being a purely functional structure makes this a good choice for concurrent usage.
