This is for a homework problem -
I need to find the median of a binary search tree in O(h) time, meaning I can't go through the whole tree to find it. I need to find the median by going down either the left or the right side of the tree, using a recursive solution.
I've placed an index into each node of the BST, using inorder traversal.
So if a BST has 99 nodes, the smallest node has an index of 1, the next smallest has an index of 2, and so on, while the largest node has an index of 99. This indexing is also required by the assignment.
So whichever side is heavier has the median.
So here's what I was thinking:
If the left side has more nodes, the median is in the left subtree. If the right side has more nodes, the median is in the right subtree. If the left and right sides have an equal number of nodes, the median is the root of the whole tree.
But I can't think of how the nodes' indexes would play a role in any of this. I was thinking of using the difference of two nodes' indexes to see if one side is heavier, but I think this would require me to go through the whole tree, so it would be O(n).
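For example, maybe something like this? It's just a sketch of the idea, assuming each node stores its inorder index in an index field, with k = (n + 1) / 2 for the median:

// Find the k-th smallest node using the stored inorder indexes.
// Each step descends one level, so this should be O(h).
Node kthSmallest(Node node, int k)
{
    if (node == null) return null;                          // k is out of range
    if (node.index == k) return node;                       // found the k-th smallest
    if (k < node.index) return kthSmallest(node.left, k);   // k-th node is in the left subtree
    return kthSmallest(node.right, k);                      // k > node.index: go right
}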
Please set me on a clearer path or give me a hint or two.
I was given a graph with a start vertex, other vertices, and edges representing the costs of going from one vertex to another. I need to find the set of destination vertices that I can travel to from the start vertex. The budget is a certain amount of dollars, and the total travel cost should be within the budget. How can I apply Dijkstra's algorithm to this problem? I have only used Dijkstra to find the shortest path between two fixed vertices before, and I am not sure how to apply it to this budget problem. If someone can give some ideas, that would really help!
To my understanding, Dijkstra's algorithm solves the single-source shortest path problem. This means that the resulting data structure is a tree rooted at the starting node. Typically, an implementation stores the minimum cost to reach each node from the starting node. So the set of vertices that can be reached within the budget is exactly the set of vertices for which the cost of the shortest path does not exceed the budget. The algorithm itself does not need to be modified; in an additional step, the nodes that can be reached within budget have to be returned as the output.
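A minimal sketch of this in Java (the graph representation and all names here are my own assumptions; each edge is stored as an int array {neighbor, weight}):

import java.util.*;

class BudgetReach {
    // Standard Dijkstra: dist[v] = cost of the cheapest path from start to v.
    static int[] dijkstra(List<List<int[]>> adj, int start) {
        int n = adj.size();
        int[] dist = new int[n];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[start] = 0;
        PriorityQueue<int[]> pq = new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[1]));
        pq.add(new int[]{start, 0});
        while (!pq.isEmpty()) {
            int[] cur = pq.poll();
            int u = cur[0], d = cur[1];
            if (d > dist[u]) continue;              // stale queue entry
            for (int[] e : adj.get(u)) {            // e = {neighbor, weight}
                int v = e[0], w = e[1];
                if (dist[u] + w < dist[v]) {
                    dist[v] = dist[u] + w;
                    pq.add(new int[]{v, dist[v]});
                }
            }
        }
        return dist;
    }

    // The additional step: keep every vertex whose shortest-path cost fits the budget.
    static List<Integer> reachableWithinBudget(List<List<int[]>> adj, int start, int budget) {
        int[] dist = dijkstra(adj, start);
        List<Integer> result = new ArrayList<>();
        for (int v = 0; v < dist.length; v++) {
            if (v != start && dist[v] <= budget) result.add(v);
        }
        return result;
    }
}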
If you use Dijkstra's algorithm, you may end up with the scenarios below:
Assuming a budget of 50 and a graph of 4 nodes (start, node 1, node 2, node 3):
Start Node -> node 1 (15) -> node 2 (10): so total cost is 25
Start Node -> node 3 (15): total cost is 15
So now, what is your expected result? Should you go to node 1 and then node 2, ignoring node 3 (since you cannot return to the start and then go to node 3; the return would cost 30 too)? Or should you go to node 1, back to the start, then to node 3 (a total cost of 45, the largest cost you can utilise)?
What you need in that case is not a single shortest path; if you need the shortest paths between all pairs of vertices, that is the Floyd–Warshall algorithm: https://en.wikipedia.org/wiki/Floyd–Warshall_algorithm
I understand the worst case happens when the pivot is the smallest or the largest element. Then one of the partitions is empty and we repeat the recursion for N-1 elements.
But how is that calculated to be O(N^2)?
I have read a couple of articles and am still not able to understand it fully.
Similarly, the best case is when the pivot is the median of the array and the left and right parts are of the same size. But then how is the value O(N log N) calculated?
I understand the worst case happens when the pivot is the smallest or the largest element. Then one of the partitions is empty and we repeat the recursion for N-1 elements.
So, imagine that you repeatedly pick the worst pivot; i.e. in the N-1 case one partition is empty and you recurse with N-2 elements, then N-3, and so on until you get to 1.
The sum of N-1 + N-2 + ... + 1 is (N * (N - 1)) / 2. (Students typically learn this in high-school maths these days ...)
O(N(N-1)/2) is the same as O(N^2). You can deduce this from first principles from the mathematical definition of Big-O notation.
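Written out as a recurrence (a standard derivation; c is the constant per-element cost of a partition pass):

T(N) = T(N-1) + c(N-1)
     = T(N-2) + c(N-2) + c(N-1)
     = ...
     = c * ((N-1) + (N-2) + ... + 1)
     = c * N(N-1)/2
     = O(N^2)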
Similarly, the best case is when the pivot is the median of the array and the left and right parts are of the same size. But then how is the value O(N log N) calculated?
That is a bit more complicated.
Think of the problem as a tree:
At the top level, you split the problem into two equal-sized sub problems, and move N objects into their correct partitions.
At the 2nd level, you split the two sub-problems into four sub-sub-problems; in each of the 2 sub-problems you move N/2 objects into their correct partitions, for a total of N objects moved.
At the bottom level you have N/2 sub-problems of size 2 which you (notionally) split into N problems of size 1, again copying N objects.
Clearly, at each level you move N objects. The height of the tree for a problem of size N is log2N. So ... there are N * log2N object moves; i.e. O(N * log2N).
But log2N is logeN / loge2. (High-school maths, again.)
So O(N log2N) is O(N log N).
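In symbols:

log2(N) = loge(N) / loge(2)
N * log2(N) = (1 / loge(2)) * N * loge(N) = O(N log N)

since the constant factor 1/loge(2) disappears inside the Big-O.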
A little correction to your statement:
I understand the worst case happens when the pivot is the smallest or the largest element.
Actually, the worst case happens when each successive pivot is the smallest or the largest element of the remaining partitioned array.
To better understand the worst case, think about an already sorted array which you are trying to sort.
You select the first element as the first pivot. After comparing the rest of the array, you find that the n-1 elements are still on the other end (the right side) and the first element remains in the same position, which totally defeats the purpose of partitioning. You keep repeating these steps until the last element, with the same effect, which in turn accounts for (n-1) + (n-2) + (n-3) + ... + 1 comparisons, and that sums up to (n*(n-1))/2 comparisons. So,
O(n*(n-1)/2) = O(n^2) for the worst case.
To overcome this problem, it is always recommended to pick the pivot randomly at each step.
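For instance, a random pivot choice could look like this in Java (a sketch; the method name and the convention of keeping the pivot in the last slot are my own, meant to be called right before partitioning):

import java.util.concurrent.ThreadLocalRandom;

// Swap a randomly chosen element into the pivot slot (here, the last
// position), so that a pre-sorted input no longer triggers the worst
// case every time.
static void randomizePivot(int[] a, int lo, int hi) {
    int p = ThreadLocalRandom.current().nextInt(lo, hi + 1); // random index in [lo, hi]
    int tmp = a[p]; a[p] = a[hi]; a[hi] = tmp;
}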
I would try to add an explanation for the average case as well.
The best case can be derived from the Master theorem. See https://en.wikipedia.org/wiki/Master_theorem for instance, or Cormen, Leiserson, Rivest, and Stein: Introduction to Algorithms for a proof of the theorem.
One way to think of it is the following.
If quicksort makes a poor choice for a pivot, then the pivot induces an unbalanced partition, with most of the elements being on one side of the pivot (either below or above). In an extreme case, you could have, as you suggest, that all elements are below or above the pivot. In that case we can model the time complexity of quicksort with the recurrence T(n)=T(1)+T(n-1)+O(n). However, T(1)=O(1), and we can write out the recurrence T(n)=O(n)+O(n-1)+...+O(1) = O(n^2). (One has to take care to understand what it means to sum Big-Oh terms.)
On the other hand, if quicksort repeatedly makes good pivot choices, then those pivots induce balanced partitions. In the best case, about half the elements will be below the pivot, and about half the elements above the pivot. This means we recurse on two subarrays of roughly equal sizes. The recurrence then becomes T(n)=T(n/2)+T(n/2)+O(n) = 2T(n/2)+O(n). (Here, too, one must take care about rounding the n/2 if one wants to be formal.) This solves to T(n)=O(n log n).
The intuition for the last paragraph is that we compute the relative position of the pivot in the sorted order without actually sorting the whole array. We can then compute the relative position of the pivot in the below subarray without having to worry about the elements from the above subarray; similarly for the above subarray.
Firstly, here is a quicksort implementation to refer to:
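A standard Lomuto-partition version in Java (a sketch for reference; any textbook variant behaves the same way for this analysis):

static void quicksort(int[] a, int lo, int hi) {
    if (lo >= hi) return;                  // 0 or 1 element: already sorted
    int p = partition(a, lo, hi);
    quicksort(a, lo, p - 1);               // sort the elements below the pivot
    quicksort(a, p + 1, hi);               // sort the elements above the pivot
}

static int partition(int[] a, int lo, int hi) {
    int pivot = a[hi];                     // last element as the pivot
    int i = lo;                            // boundary of the "< pivot" region
    for (int j = lo; j < hi; j++) {
        if (a[j] < pivot) {
            int t = a[i]; a[i] = a[j]; a[j] = t;
            i++;
        }
    }
    int t = a[i]; a[i] = a[hi]; a[hi] = t; // place the pivot in its final slot
    return i;
}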
In my opinion, you need to understand two cases: the worst case and the best case.
The worst case:
The most unbalanced partition occurs when the pivot divides the list into two sublists of sizes 0 and n−1. The recurrence is T(n) = O(n) + T(0) + T(n-1) = O(n) + T(n-1). Unrolling this recurrence term by term gives
T(n) = O(n) + O(n-1) + ... + O(1) = O(n²).
The best case:
In the most balanced case, each time we perform a partition we divide the list into two nearly equal pieces. The recurrence relation is then T(n) = O(n) + 2T(n/2), which solves to T(n) = O(n log n).
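Concretely, the master theorem applies to this balanced recurrence: in T(n) = 2T(n/2) + O(n) we have a = 2 and b = 2, so n^(log_b a) = n^1, and f(n) = O(n) matches it exactly. That is the "equal" case of the theorem, which gives T(n) = O(n^(log_b a) * log n) = O(n log n).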
So how do I tackle this problem? I need to write a program that reads a positive integer n from standard input and writes to standard output a representation of all distinct rooted, ordered, labeled trees on the set {1, 2, 3, ..., n} of vertices.
For the output, I need to use the following linear textual representation L(t) of a tree t:
If t is empty then L(t) = ().
If t has root n and children, in sibling order, C = (c1, c2, ..., ck), then
L(t) = (n, (L(c1), L(c2), ..., L(ck)))
where L(ci) denotes the linear textual representation of the subtree rooted at child ci. There is a single space after each comma in the representation.
The output should contain the representation of one tree on each line and should be sorted in lexicographic order on the linear representations viewed as strings. The output should contain nothing else (such as spurious newline characters, prompts, or informative messages). Sample inputs and outputs for n = 1 and n = 2 appear below.
Input: 1
Output:
(1, ())
Input: 2
Output:
(1, ((2, ())))
(2, ((1, ())))
Any help will be greatly appreciated. I just need to be steered in a direction. Right now, I'm completely stumped :(
You can generate trees recursively. Start with a root. The root can have 0, 1, 2, ..., (m - 1) children, where m is the number of vertices you have left to place. Start by placing (m - 1) vertices under the root, and then go down all the way to 0. You "place" these vertices recursively, so placing a vertex as a child under the root means calling the same method again, but the maximum number of children will be a bit smaller this time.
You'll get two stopping criteria for the recursion:
You've placed all n vertices. You then need to output the current tree with your yet-to-define L(t) function and backtrack to try different trees.
All leaf vertices have been given 0 children but you haven't placed all n vertices yet. Don't output this tree, since it is invalid; just backtrack.
The algorithm is finished after it tries to give the root node 0 children.
As for the output L(t) function, it seems to suffice to do a depth-first tree traversal, as in the sketch below. Recursion is easiest to program (as this seems to be a practical assignment of some kind, that's probably what they want you to do). If you need speed, look for a non-recursive depth-first tree traversal algorithm on Wikipedia.
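For instance, the serializer might look like this in Java (a sketch; the Node class with a label and an ordered children list is my own assumption):

import java.util.*;

class Node {
    int label;
    List<Node> children = new ArrayList<>();
    Node(int label) { this.label = label; }

    // Builds L(t): "()" for an empty tree, otherwise "(label, (L(c1), ..., L(ck)))",
    // with a single space after each comma, as the assignment specifies.
    static String linearForm(Node t) {
        if (t == null) return "()";
        StringBuilder sb = new StringBuilder();
        sb.append('(').append(t.label).append(", (");
        for (int i = 0; i < t.children.size(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(linearForm(t.children.get(i)));
        }
        return sb.append("))").toString();
    }
}

For a single node labeled 1 this produces (1, ()), matching the sample output.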
I have an undirected graph A that has: no multiple links between any two nodes, no self-connected nodes, and possibly some isolated nodes (nodes with degree 0).
I need to go through all possible combinations of pairs of nodes in graph A to assign a score to each non-existent link. (If my graph has k nodes and n links, then the number of combinations should be k*(k-1)/2 - n.) The way I assign a score is based on the common neighbor nodes between the 2 nodes of the combination.
Ex: score between A-D should be 1, score between G-D should be 0 ...
The biggest problem is that my graph has more than 100,000 nodes, and my first attempt at a solution, which went through almost 10^10 combinations, was too slow.
My second thought is that, since the scoring is based on common neighbors, I might only need to look at each node's neighbors to assign the scores that differ from 0. The rest can be set to 0 with no computation needed. But this could possibly repeat a combination.
Any idea on how to approach this is appreciated. Please keep in mind that the actual network has more than 100,000 nodes.
If you represent your graph as an adjacency list (rather than an adjacency matrix), you can make use of the fact that your graph has only 600,000 edges to hopefully reduce the computation time.
Let's take a node V[j] with neighbors V[i] and V[k]:
V[i] ---- V[j] ---- V[k]
To find all such pairs of neighbors you can take the list of nodes adjacent to V[j] and find all combinations of those nodes. To avoid duplicates you will have to generate the combinations rather than the permutations of the end nodes V[i] and V[k] by requiring that i < k.
Alternatively, you can start with node V[i] and find all of the nodes that have a distance of 2 from V[i]. Let S be the set of all the nodes adjacent to V[i]. For each node V[j] in S, create a path V[i]-V[j]-V[k] where:
V[k] is adjacent to V[j]
V[k] is not an element of S (to avoid directly connected nodes)
k != i (to avoid cycles)
k > i (to avoid duplications)
I personally like this approach better because it completes the adjacency list for one node before moving on to the next.
Given that you have ~600,000 edges in a graph with ~100,000 nodes, assuming an even distribution of edges across all of the nodes, each node would have an average degree of 12. The number of possible paths for each node is then on the order of 10^2. Over 10^5 nodes, that gives on the order of 10^7 total paths rather than the theoretical limit of 10^10 for a complete graph. Still a large number, but a thousand times faster than before.
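A sketch of the second approach in Java (assumptions of mine: the graph is an adjacency list stored as a List of Sets, and the score is the number of common neighbors):

import java.util.*;

class CommonNeighbors {
    // For each node i, walk to each neighbor j and then to each of j's
    // neighbors k. Each such path i-j-k contributes one common neighbor (j)
    // to the pair (i, k). Pairs that never appear keep an implicit score of 0.
    static Map<Long, Integer> scores(List<Set<Integer>> adj) {
        int n = adj.size();
        Map<Long, Integer> score = new HashMap<>();
        for (int i = 0; i < n; i++) {
            for (int j : adj.get(i)) {
                for (int k : adj.get(j)) {
                    if (k <= i) continue;                 // k > i avoids duplicates and cycles
                    if (adj.get(i).contains(k)) continue; // skip directly connected pairs
                    long key = (long) i * n + k;          // encode the pair (i, k)
                    score.merge(key, 1, Integer::sum);
                }
            }
        }
        return score;
    }
}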
Consider the following code:
public int heightOfBinaryTree(Node node)
{
if (node == null)
{
return 0;
}
else
{
return 1 +
Math.max(heightOfBinaryTree(node.left),
heightOfBinaryTree(node.right));
}
}
I want to know the logical reasoning behind this code. How did people come up with it? Does someone have an inductive proof?
Moreover, I thought of just doing a BFS with the root of the binary tree as the argument to get the height of the binary tree. Is the previous approach better than mine? Why?
if (node == null)
{
return 0;
}
The children of leaf nodes are null. Therefore this is saying that once we've gone past the leaves, there are no further nodes.
If we are not past the leaf nodes, we have to calculate the height and this code does so recursively.
return 1 +
The current node adds a height of 1 to the height of the subtree currently being calculated.
Math.max(heightOfBinaryTree(node.left),
heightOfBinaryTree(node.right));
We recursively calculate the height of the left subtree (node.left) and right subtree (node.right). Since we're calculating the maximum depth, we take the maximum of these two depths.
I've shown above that the recursive function is correct. So calling the function on the parent node will calculate the depth of the entire tree.
Graphically: if h is the height of the tree, and hl and hr are the heights of the left and right subtrees respectively, then h = 1 + max(hl, hr).
Moreover, I thought of just doing a BFS with the root of the binary tree as the argument to get the height of the binary tree. Is the previous approach better than mine? Why?
The code you provided is a form of DFS. Since you have to process all nodes to find the height of the tree, there will be no runtime difference between DFS and BFS, although BFS will use O(N) memory while DFS will use memory proportional to the height of the tree: O(log N) for a balanced tree, O(N) in the worst case. BFS is also slightly more complex to code, since it requires a queue, while DFS makes use of the "built-in" recursive stack.
The logic behind that code is:
since a node has at most two children, the height of the tree is the maximum of the heights of the trees rooted at the left and right children, plus 1 for the step down to the children.
As you can see, the description above is recursive and so is the code.
BFS would also work, but it would be overkill in both implementation effort and space complexity.
There is a saying: recursive functions, though sometimes hard to understand, are very elegant to implement.
The height of a tree is the length of the longest downward path from its root.
This function is a recursive way to count the levels of a binary tree. It just increments counters as it descends the tree, returning the maximum counter (the counter on the lowest node).
I hope I have helped.
It's a recursive function. It's saying the height of a tree is 1 + the height of its tallest branch.
Is BFS a breadth first search? I'm not sure what difference there would be in efficiency, but I like the simplicity of the recursive function.
To extend the other answers and elaborate more on recursion and the recursive call stack:
Suppose the tree

    2
   / \
  5   9
 /
0
Let's consider the left subtree first: root(2) calls the heightOfBinaryTree method on the left subtree.
The call stack of the method will be as follows:
node(5) calls node(0)
node(0) calls node(null)
node(null) breaks the recursive loop
Note that these calls are made before the method returns anything.
Unwinding the recursive call stack, this is where each node returns its output:
node(null) returned 0 -> 0
node(0) returned (return from node(null) + 1) -> 1
node(5) returned (return from node(0) + 1) -> 2
The same goes for the right subtree. If we take the maximum of the outputs from the left and right subtrees and add 1 for the root, we have the height.
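Putting it together as a runnable example (a minimal sketch; the Node class here is my own assumption):

class HeightDemo {
    static class Node {
        int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    static int heightOfBinaryTree(Node node) {
        if (node == null) return 0;
        return 1 + Math.max(heightOfBinaryTree(node.left),
                            heightOfBinaryTree(node.right));
    }

    public static void main(String[] args) {
        // Build the example tree above: 2 at the root, children 5 and 9,
        // and 0 as the left child of 5.
        Node root = new Node(2);
        root.left = new Node(5);
        root.right = new Node(9);
        root.left.left = new Node(0);
        System.out.println(heightOfBinaryTree(root)); // prints 3: 1 + max(2, 1)
    }
}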