So how do I tackle this problem? I need to write a program that reads a positive integer n from standard input and writes to standard output a representation of all distinct rooted, ordered, labeled trees on the vertex set {1, 2, 3, ..., n}.
For the output, I need to use the following linear textual representation L(t) of a tree t:
If t is empty, then L(t) = ().
If t has root n and children, in sibling order, C = (c1, c2, ..., ck), then
L(t) = (n, (L(c1), L(c2), ..., L(ck)))
where L(ci) denotes the linear textual representation of the subtree rooted at child ci. There is a single space after each comma in the representation.
The output should contain the representation of one tree on each line and should be
sorted by lexicographic order on the linear representations viewed as strings. The
output should contain nothing else (such as spurious newline characters, prompts, or
informative messages). Sample inputs and outputs for n = 1 and n = 2 appear below.
Input: 1
Output:
(1, ())
Input: 2
Output:
(1, ((2, ())))
(2, ((1, ())))
Any help will be greatly appreciated. I just need to be steered in a direction. Right now, I'm completely stumped :(
You can generate trees recursively. Start with a root. The root can have 0, 1, 2 ... (m - 1) children, where m is the number of vertices you have left to place. Start by placing (m - 1) vertices under the root, and then go down all the way to 0. You'll "place" these vertices recursively, so placing a vertex as child under the root means calling the same method again, but the maximum number of children will be a bit less this time.
You'll get two stopping criteria for the recursion:
You've placed all n vertices. You then need to output the current tree with your yet-to-define L(t) function, and backtrack to try different trees.
The algorithm gave all leaf vertices a degree of 0 and you haven't placed n vertices yet. Don't output this tree since it is invalid, but backtrack.
The algorithm is finished after it tries to give the root node 0 children.
As for the output L(t) function, it seems to suffice to do a depth-first tree traversal. Recursive is easiest to program (as this seems to be a practical assignment of some kind, that's probably what they want you to do). If you need speed, look for a non-recursive depth-first tree traversal algorithm on Wikipedia.
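As for coding the L(t) part, here is a minimal Java sketch of that recursive depth-first traversal (the Node type, with an int label and an ordered List<Node> children, is an assumption on my part, since the assignment doesn't fix a tree representation):

// Sketch of L(t); a Node with `label` and ordered `children` is assumed.
String L(Node t) {
    if (t == null) return "()";                  // empty tree
    StringBuilder sb = new StringBuilder();
    sb.append("(").append(t.label).append(", (");
    for (int i = 0; i < t.children.size(); i++) {
        if (i > 0) sb.append(", ");              // single space after each comma
        sb.append(L(t.children.get(i)));
    }
    return sb.append("))").toString();
}

For a lone root labeled 1 this produces (1, ()), matching the sample output above.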
In the book Cracking the Coding Interview there is an example of a recursive algorithm to calculate all the permutations of a string. The code:
void permutation(String str) {
    permutation(str, "");
}

void permutation(String str, String prefix) {
    if (str.length() == 0) {
        System.out.println(prefix);
    } else {
        for (int i = 0; i < str.length(); i++) {
            String rem = str.substring(0, i) + str.substring(i + 1);
            permutation(rem, prefix + str.charAt(i));
        }
    }
}
The complexity is stated to be something like O(N * N * N!). The intuition for this result is that each permutation is a leaf node, and there are N! permutations. It is stated that since each leaf node is N nodes away from the root, the upper bound on the number of nodes in the tree is O(N*N!), and we then do O(N) amount of work on each node.
I understand the linear amount of work at each node, and I also understand there being N! leaf nodes. But how can it be concluded that the number of nodes in the tree equals (# of leaf nodes * distance to root)? Are we not counting the same nodes many times, since nodes near the root are shared by several root-to-leaf paths?
For example, if the tree were a binary tree, it seems this method would not be accurate. The number of nodes is then 2^n - 1, the sum of the geometric series 2^0 + 2^1 + ... + 2^(n-1).
If n = 4, a full tree would have 15 nodes. There would be 2^(n-1) = 8 leaf nodes, all at distance n from the root, but 2^(n-1) * n gives a much larger number (32) than the actual 15 nodes of the full tree. For n = 4 that result equals 2^(n+1), which is still O(2^n), but I feel like I have probably misunderstood something along the way.
My question is: what makes these two examples different when estimating the upper bound on the number of nodes in their respective recursion trees? When is it safe to assume that the upper bound is the number of leaf nodes times the depth of the tree (assuming all leaf nodes have the same depth)?
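For what it's worth, here is my own level-by-level count of the permutation tree (please correct me if this is off): at depth k there are N * (N-1) * ... * (N-k+1) = N!/(N-k)! calls, so the total number of nodes is

    sum over k = 0..N of N!/(N-k)!  =  N! * (1/0! + 1/1! + ... + 1/N!)  <=  e * N!  =  O(N!)

So "leaf count * depth" = N * N! overcounts here too; it just still happens to be a valid upper bound, whereas in the binary-tree case it overcounts 15 as 32.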
Please bear with me, I am very new to data structures.
I am getting confused about how a priority queue is used to find minimum distances. For example, if I have a matrix and want to find the min distance from the source to the destination, I know that I would run Dijkstra's algorithm, where with a queue I can easily find the distances between the source and all elements in the matrix.
However, I am confused about how a heap + priority queue is used here. For example, say I start at (1,1) on a grid and want to find the min distance to (3,3). I know how to implement the algorithm in the sense of finding the neighbours, checking the distances, and marking cells as visited. But I have read about priority queues and min heaps and want to implement those.
Right now, my only understanding is that a priority queue positions elements by a key. My issue is that when I insert the first neighbours (1,0), (0,0), (2,1), (1,2), they are inserted into the pq based on a key (which would be distance in this case), so the next search takes the element in the matrix with the shortest distance. But with the pq, how can a heap be used here with more than 2 neighbours? For example, the children of (1,1) are the 4 neighbours stated above. This seems to go against the 2*i and 2*i + 1 child indices and the i/2 parent index.
In conclusion, I don't understand how a min heap + priority queue works for finding the min of something like distance.
      0   1   2   3
     ___ ___ ___ ___
0 - | 2 | 1 | 3 | 2 |
1 - | 1 | 3 | 5 | 1 |
2 - | 5 | 2 | 1 | 4 |
3 - | 2 | 4 | 2 | 1 |
You need the priority queue to get the minimum weight on every move, so a MinPQ is a good fit here.
A MinPQ internally uses the heap technique to keep elements in the right position, through operations such as sink() and swim().
So the MinPQ is the data structure, and the heap is the technique it uses internally.
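To make the sink()/swim() remark concrete, here is a minimal sketch of those two operations on a 1-indexed array heap (the int[] keys and the 1-indexed layout follow the usual Sedgewick presentation; they are illustrative, not from the original answer):

// swim: a node smaller than its parent (at k/2) bubbles up.
void swim(int[] pq, int k) {
    while (k > 1 && pq[k / 2] > pq[k]) {
        int t = pq[k]; pq[k] = pq[k / 2]; pq[k / 2] = t;
        k = k / 2;
    }
}

// sink: a node larger than a child (at 2*k or 2*k + 1) sinks down.
void sink(int[] pq, int n, int k) {
    while (2 * k <= n) {
        int j = 2 * k;                           // left child
        if (j < n && pq[j + 1] < pq[j]) j++;     // pick the smaller child
        if (pq[k] <= pq[j]) break;               // heap order restored
        int t = pq[k]; pq[k] = pq[j]; pq[j] = t;
        k = j;
    }
}

Note that the 2*k, 2*k + 1, and k/2 arithmetic is purely about positions inside this array; it has nothing to do with how many grid neighbours a cell has.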
If I'm interpreting your question correctly, you're getting stuck at this point:
But with the pq, how can a heap be used here with more than 2 neighbours? For example, the children of (1,1) are the 4 neighbours stated above. This seems to go against the 2*i and 2*i + 1 child indices and the i/2 parent index.
It sounds like what's tripping you up is that there are two separate concepts here that you may be combining together. First, there's the notion of "two places in a grid might be next to one another." In that world, you have (up to) four neighbors for each location. Next, there's the shape of the binary heap, in which each node has two children whose locations are given by certain arithmetic computations on array indices. Those are completely independent of one another - the binary heap has no idea that the items it's storing come from a grid, and the grid has no idea that there's an array where each node has two children stored at particular positions.
For example, suppose you want to store locations (0, 0), (2, 0), (-2, 0) and (0, 2) in a binary heap, and that the weights of those locations are 1, 2, 3, and 4, respectively. Then the shape of the binary heap might look like this:
        (0, 0)
       Weight 1
       /      \
  (2, 0)      (0, 2)
 Weight 2    Weight 4
    /
(-2, 0)
Weight 3
This tree still gives each node two children; those children just don't necessarily map back to the relative positions of nodes in the grid.
More generally, treat the priority queue as a black box. Imagine that it's just a magic device that says "you can give me some new thing to store" and "I can give you the cheapest thing you've given me so far" and that's it. The fact that, internally, it coincidentally happens to be implemented as a binary heap is essentially irrelevant.
Hope this helps!
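To make the black-box view concrete, here is a minimal sketch of Dijkstra on a grid using java.util.PriorityQueue (the method name, the {distance, row, col} encoding, and the convention that grid[r][c] is the cost of entering a cell are all illustrative assumptions, not from your post):

import java.util.*;

static int minDistance(int[][] grid, int sr, int sc, int tr, int tc) {
    int rows = grid.length, cols = grid[0].length;
    int[][] dist = new int[rows][cols];
    for (int[] row : dist) Arrays.fill(row, Integer.MAX_VALUE);
    // Each entry is {distance, row, col}; the queue orders by distance.
    PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
    dist[sr][sc] = grid[sr][sc];
    pq.add(new int[]{dist[sr][sc], sr, sc});
    int[][] moves = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
    while (!pq.isEmpty()) {
        int[] cur = pq.poll();
        int d = cur[0], r = cur[1], c = cur[2];
        if (r == tr && c == tc) return d;        // target settled: done
        if (d > dist[r][c]) continue;            // stale entry, skip it
        for (int[] m : moves) {
            int nr = r + m[0], nc = c + m[1];
            if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
            int nd = d + grid[nr][nc];
            if (nd < dist[nr][nc]) {             // found a cheaper path
                dist[nr][nc] = nd;
                pq.add(new int[]{nd, nr, nc});
            }
        }
    }
    return -1;                                   // target unreachable
}

Notice the code never touches 2*i or i/2; it only ever asks the queue for the cheapest pending cell.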
This code is meant to check if a binary tree is balanced (balanced being defined as a tree such that the heights of the two subtrees of any node never differ by more than one).
I understand the N part of the runtime O(NlogN). The N is because every node in the tree is visited at least once.
int getHeight(TreeNode root) {
    if (root == null) return -1; // base case
    return Math.max(getHeight(root.left), getHeight(root.right)) + 1;
}

boolean isBalanced(TreeNode root) {
    if (root == null) return true; // base case
    int heightDiff = getHeight(root.left) - getHeight(root.right);
    if (Math.abs(heightDiff) > 1) {
        return false;
    } else { // recurse
        return isBalanced(root.left) && isBalanced(root.right);
    }
}
What I don't understand is the log N part of the runtime O(N log N). The code will trace every possible path from a node to the bottom of the tree. Shouldn't the runtime therefore be more like O(N * 2^N) or something? How does one come, step by step, to the conclusion that the runtime is O(N log N)?
I agree with you that the runtime of this code is not necessarily O(n log n). However, I don't believe that it will always trace out every path from a node to the bottom of the tree. For example, consider this tree:
    *
   /
  *
 /
*
Here, computing the depths of the left and right subtrees will indeed visit every node once. However, because an imbalance is found between the left and right subtrees, the recursion stops without recursively exploring the left subtree. In other words, finding an example where the recursion has to do a lot of work is going to require some creativity.
You are correct that the baseline check for the height difference will take time Θ(n) because every node must be scanned. The concern with this code is that it might rescan nodes many, many times as it recomputes the height differences during the recursion. If we want this function to run for a really long time - not necessarily as long as possible, but for a long time - we'd want to make it so that
the left and right subtrees have roughly the same height, so that the recursion proceeds to the left subtree, but
the tree is extremely imbalanced, placing most of the nodes into the left subtree.
One way to do this is to create trees where the right subtree is just a long spine that happens to have the same height as the left subtree, but with way fewer nodes. Here's one possible sequence of trees that has this property:
                              *
                             / \
                *           *   *
               / \         / \   \
      *       *   *       *   *   *
     / \     / \   \     / \   \   \
*   *   *   *   *   *   *   *   *   *
Mechanically, each tree is formed by taking the previous tree and putting a rightward spine on top of it. Operationally, these trees are defined recursively as follows:
An order-0 tree is a single node.
An order-(k+1) tree is a node whose left child is an order-k tree and whose right child is a linked list of height k.
Notice that the number of nodes in an order-k tree is Θ(k^2). You can see this by noticing that the trees have a nice triangular shape, where each layer has one more node in it than the previous one. Sums of the form 1 + 2 + 3 + ... + k work out to Θ(k^2), and while we can be more precise than this, there really isn't a need to do so.
Now, what happens if we fire off this recursion on the root of any one of these trees? Well, the recursion will begin by computing the heights of the left and right subtrees, which will report that they have the same height as one another. It will then recursively explore the left subtree to see whether it's balanced. After doing some (large) amount of work, it'll find that the left subtree is not balanced, at which point the recursion won't branch to the right subtree. In other words, the amount of work done on an order-k tree is lower-bounded by
W(0) = 1 (there's a single node visited once), and
W(k+1) = W(k) + Θ(k^2).
To see where the W(k+1) term comes from, notice that we begin by scanning every node in the tree, and there are Θ(k^2) nodes to scan, then recursively apply the procedure to the left subtree. Expanding this recurrence, we see that in an order-k tree the total work done is

W(k) = Θ(k^2) + W(k-1)
     = Θ(k^2 + (k-1)^2) + W(k-2)
     = Θ(k^2 + (k-1)^2 + (k-2)^2) + W(k-3)
     ...
     = Θ(k^2 + (k-1)^2 + ... + 2^2 + 1^2)
     = Θ(k^3).
This last step follows from the fact that the sum of the first k squares works out to Θ(k^3).
To finish things off, we have one more step. We've shown that order-k trees require Θ(k^3) total work to process with this recursive algorithm. However, we'd like a runtime bound in terms of n, the total number of nodes in the tree, not k, the order of the tree. Using the fact that the number of nodes in a tree of order k is Θ(k^2), we see that a tree with n nodes has order Θ(n^(1/2)). Plugging this in, we see that for arbitrarily large n we can make the total work done equal to Θ((n^(1/2))^3) = Θ(n^(3/2)), which exceeds the proposed O(n log n) bound you mentioned. I'm not sure whether this is the worst-case input for this algorithm, but it's certainly not a good one.
So yes, you are correct - the runtime is not O(n log n) in general. However, it is the case that if the tree is perfectly balanced, the runtime is indeed O(n log n). To see why, notice that if the tree is perfectly balanced, each recursive call will
do O(n) work scanning each node in the tree, then
make two recursive calls on smaller trees, each of which is approximately half as large as the previous one.
That gives the recurrence T(n) = 2T(n / 2) + O(n), which solves to O(n log n). But that's just one specific case, not the general case.
A concluding note - with a minor modification, this code can be made to run in time O(n) in all cases. Instead of recomputing the height of each node, make an initial pass over the tree and annotate each node with its height (either by setting some internal field equal to the height or by having an auxiliary HashMap mapping each node to its height). This can be done in time O(n). From there, recursively walking the tree and checking whether the left and right subtrees have heights that differ by at most one requires O(1) work per node across n total nodes, for a total runtime of O(n).
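As one concrete way to reach the same O(n) bound, here is a minimal sketch that folds the height computation and the balance check into a single pass rather than storing an annotation (the name checkHeight and the -2 sentinel are illustrative choices, not the code from the question):

// Returns the height of the subtree, or -2 if it is already known
// to be unbalanced; each node is visited exactly once.
int checkHeight(TreeNode root) {
    if (root == null) return -1;
    int left = checkHeight(root.left);
    if (left == -2) return -2;                   // propagate failure upward
    int right = checkHeight(root.right);
    if (right == -2) return -2;
    if (Math.abs(left - right) > 1) return -2;   // imbalance found here
    return Math.max(left, right) + 1;
}

boolean isBalancedLinear(TreeNode root) {
    return checkHeight(root) != -2;
}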
Hope this helps!
I have an undirected graph A with: no multiple links between any two nodes, no self-connected nodes, and possibly some isolated nodes (nodes with degree 0).
I need to go through all possible combinations of pairs of nodes in graph A to assign a score to each non-existent link (say my graph has k nodes and n links; then the number of combinations is k*(k-1)/2 - n). The score I assign is based on the common neighbour nodes between the 2 nodes of a combination.
Ex: score between A-D should be 1, score between G-D should be 0 ...
The biggest problem is that my graph has more than 100,000 nodes, and my first attempt at a solution, which handles almost 10^10 combinations, was far too slow.
My second thought is that since the score is based on common neighbours, I might only need to look at each node's neighbourhood, so that I only compute the scores that differ from 0. The rest can be set to 0 with no computation. But this could count a combination more than once.
Any idea on how to approach this is appreciated. Please keep in mind that the actual network has more than 100,000 nodes.
If you represent your graph as an adjacency list (rather than an adjacency matrix), you can make use of the fact that your graph has only 600,000 edges to hopefully reduce the computation time.
Let's take a node V[j] with neighbors V[i] and V[k]:
V[i] ---- V[j] ---- V[k]
To find all such pairs of neighbors you can take the list of nodes adjacent to V[j] and find all combinations of those nodes. To avoid duplicates you will have to generate the combinations rather than the permutations of the end nodes V[i] and V[k] by requiring that i < k.
Alternatively, you can start with node V[i] and find all of the nodes that have a distance of 2 from V[i]. Let S be the set of all the nodes adjacent to V[i]. For each node V[j] in S, create a path V[i]-V[j]-V[k] where:
V[k] is adjacent to V[j]
V[k] is not an element of S (to avoid directly connected nodes)
k != i (to avoid cycles)
k > i (to avoid duplications)
I personally like this approach better because it completes the adjacency list for one node before moving on to the next.
Given that you have ~600,000 edges in a graph with ~100,000 nodes, assuming an even distribution of edges across all of the nodes, each node would have an average degree of 12. The number of possible paths for each node is then on the order of 10^2. Over 10^5 nodes that gives on the order of 10^7 total paths rather than the theoretical limit of 10^10 for a complete graph. Still a large number, but a thousand times faster than before.
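For illustration, here is a minimal Java sketch of the first approach: for every middle vertex V[j], every unordered pair of its neighbours gains one common neighbour (the adjacency-list type and the pair encoding are my own assumptions, not from the question):

import java.util.*;

// adj.get(j) is the set of neighbours of vertex j; directly connected
// pairs are skipped, and pairs absent from the map implicitly score 0.
static Map<Long, Integer> commonNeighborScores(List<Set<Integer>> adj) {
    Map<Long, Integer> score = new HashMap<>();
    int n = adj.size();
    for (int j = 0; j < n; j++) {
        List<Integer> nbrs = new ArrayList<>(adj.get(j));
        for (int a = 0; a < nbrs.size(); a++) {
            for (int b = a + 1; b < nbrs.size(); b++) {
                int i = Math.min(nbrs.get(a), nbrs.get(b));
                int k = Math.max(nbrs.get(a), nbrs.get(b));
                if (adj.get(i).contains(k)) continue;  // existing edge
                long key = (long) i * n + k;           // encode pair (i, k), i < k
                score.merge(key, 1, Integer::sum);     // one more common neighbour
            }
        }
    }
    return score;
}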
Consider the following code:
public int heightOfBinaryTree(Node node)
{
    if (node == null)
    {
        return 0;
    }
    else
    {
        return 1 + Math.max(heightOfBinaryTree(node.left),
                            heightOfBinaryTree(node.right));
    }
}
I want to know the logical reasoning behind this code. How did people come up with it? Does someone have an inductive proof?
Moreover, I thought of just doing a BFS with the root of the binary tree as the argument to get the height of the binary tree. Is the previous approach better than mine? Why?
if (node == null)
{
    return 0;
}
The children of leaf nodes are null. Therefore this is saying that once we've gone past the leaves, there are no further nodes.
If we are not past the leaf nodes, we have to calculate the height and this code does so recursively.
return 1 +
The current node adds a height of 1 to the height of the subtree currently being calculated.
Math.max(heightOfBinaryTree(node.left),
heightOfBinaryTree(node.right));
We recursively calculate the height of the left subtree (node.left) and right subtree (node.right). Since we're calculating the maximum depth, we take the maximum of these two depths.
I've shown above that the recursive function is correct. So calling the function on the parent node will calculate the depth of the entire tree.
[Figure omitted: a graphical representation of the height of a tree, where h is the height of the tree and hl and hr are the heights of the left and right subtrees respectively.]
Moreover, I thought of just doing a BFS with the root of the binary tree as the argument to get the height of the binary tree. Is the previous approach better than mine? Why?
The code you provided is a form of DFS. Since you have to process all nodes to find the height of the tree, there will be no runtime difference between DFS and BFS, although BFS will use O(N) memory while DFS uses memory proportional to the height of the tree, which is O(log N) for a balanced tree. BFS is also slightly more complex to code, since it requires a queue, while DFS makes use of the "built-in" recursive stack.
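For comparison, here is what the BFS version might look like (a sketch assuming the same Node type as the question; heightByBfs is an illustrative name):

import java.util.*;

// Counts levels with an explicit queue instead of the call stack.
public int heightByBfs(Node root) {
    if (root == null) return 0;
    Queue<Node> queue = new LinkedList<>();
    queue.add(root);
    int height = 0;
    while (!queue.isEmpty()) {
        int levelSize = queue.size();            // nodes on the current level
        for (int i = 0; i < levelSize; i++) {
            Node cur = queue.poll();
            if (cur.left != null) queue.add(cur.left);
            if (cur.right != null) queue.add(cur.right);
        }
        height++;                                // finished one whole level
    }
    return height;
}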
The logic behind that code is:
since a node has at most two children, the height of the tree is the maximum of the heights of the subtrees rooted at its left and right children, plus 1 for the step down to those children.
As you can see, the description above is recursive, and so is the code.
BFS would also work, but it would be overkill in both implementation effort and space/time complexity.
As the saying goes: recursive functions may be hard to understand, but they are very elegant to implement.
The height of a tree is the length of the longest downward path from its root.
This function is a recursive way to count the levels of a binary tree. It counts levels as the recursion unwinds, returning the maximum count found (the count through the deepest node).
I hope I have helped.
It's a recursive function. It's saying the height of a tree is 1 + the height of its tallest branch.
Is BFS a breadth first search? I'm not sure what difference there would be in efficiency, but I like the simplicity of the recursive function.
To extend the other answers and elaborate more on recursion and the recursive call stack:
Suppose the tree
    2
   / \
  5   9
 /
0
Let's take the left subtree first: the root, node(2), calls the heightOfBinaryTree method on its left child, node(5).
The call stack of the method in question will then be as follows:
node(5) calls node(0)
node(0) calls node(null)
node(null) ends the recursive descent
Note that these calls are all made before the method returns anything.
Unwinding the recursive call stack, each node now returns its output:
node(null) returned 0 -> 0
node(0) returned (return from node(null) + 1) -> 1
node(5) returned (return from node(0) + 1) -> 2
The same goes for the right subtree. Taking the maximum of the outputs of the left and right subtrees, plus 1 for the root, gives the height.