loading binary tree from string (parent-left-right)

loading binary tree from string (parent-left-right) - java

I have a binary tree that looks like this
The object that represents it looks like this (Java):
public class TreeNode {
    private String value = "";
    private TreeNode aChild;
    private TreeNode bChild;
    ....
}
I want to read the data and build the tree from a string.
So I wrote some small method to serialize it and I have it like this
(parent-left-right)
0,null,O#1,left,A#2,left,C#3,left,D#4,left,E#4,right,F#1,right,B#
Then I read it back and have it as a list of objects in this order: O,A,C,D,E,F,B
And now my question is: how do I build the tree?
By iterating and putting the nodes on a stack or a queue?
Should I serialize in a different order?
(Basically, I want to learn the best practices for building a tree from string data.)
Can you refer me to a link on that subject?

Given your second string representation, there is no way to retrieve the original tree. So unless any tree with that sequence is acceptable, you'll have to include more information in your string. One possible way would be representing null references in some fashion. Another would be using parentheses or similar.
Given your first representation, restoring the data is still possible. One algorithm exploiting the level information would be the following:
Maintain a reference x to the current position in your tree
For every node n you want to add, move that reference x up in your tree as long as the level of x is no less than the level of n
Check that now the level of x is exactly one less than the level of n
Make x the parent of n, and n the next child of x
Move x to now point at n
This works if you have parent links in your nodes. If you don't, then you can maintain a list of the most recent node for every level. x would then correspond to the last element of that list, and moving x up the tree would mean removing the last element from the list. The level of x would be the length of the list minus one, since your root is at level 0.
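The list-based variant described above might be sketched as follows. This is only an illustration under assumptions: records are taken to be separated by '#' with the fields level,side,value as in the question's string, the class and method names (TreeLoader, parse) are made up, and the children are called left/right rather than aChild/bChild.

```java
import java.util.ArrayList;
import java.util.List;

class TreeLoader {
    static class TreeNode {
        final String value;
        TreeNode left, right;
        TreeNode(String value) { this.value = value; }
    }

    // Parses records of the assumed form "level,side,value#".
    static TreeNode parse(String data) {
        List<TreeNode> path = new ArrayList<>(); // path.get(i) = most recent node at level i
        TreeNode root = null;
        for (String record : data.split("#")) {
            if (record.isEmpty()) continue;
            String[] parts = record.split(",");
            int level = Integer.parseInt(parts[0]);
            String side = parts[1];
            TreeNode node = new TreeNode(parts[2]);

            // "Move x up": drop every remembered node at this level or deeper.
            while (path.size() > level) path.remove(path.size() - 1);
            // The parent must sit exactly one level above the new node.
            if (path.size() != level) {
                throw new IllegalArgumentException("level jumps at record: " + record);
            }
            if (level == 0) {
                root = node;
            } else {
                TreeNode parent = path.get(level - 1);
                if ("left".equals(side)) parent.left = node;
                else if ("right".equals(side)) parent.right = node;
                else throw new IllegalArgumentException("unknown side: " + side);
            }
            path.add(node); // x now points at the new node
        }
        return root;
    }
}
```

Each record is handled once and every node is pushed and popped at most once, so the whole string is processed in time linear in the number of nodes.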

Your serialization is not well explained, especially regarding how you represent missing nodes. There are several ways to do this, such as representing the tree structure with parentheses or storing the binary tree in an array. Both of these can be serialized easily. Take a look at Efficient Array Storage for Binary Tree for further explanations.
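One way to represent missing nodes explicitly is a parent-left-right (pre-order) listing with a marker for every null reference. The sketch below is an assumption-laden illustration, not the asker's format: it uses the literal token "null" as the marker, a comma as separator, and assumes that values never contain a comma and are never the string "null". The names PreorderCodec, serialize and deserialize are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

class PreorderCodec {
    static class TreeNode {
        final String value;
        TreeNode left, right;
        TreeNode(String value) { this.value = value; }
    }

    // Writes parent, then left subtree, then right subtree; "null" marks a missing child.
    static String serialize(TreeNode node) {
        if (node == null) return "null";
        return node.value + "," + serialize(node.left) + "," + serialize(node.right);
    }

    static TreeNode deserialize(String data) {
        Deque<String> tokens = new ArrayDeque<>(Arrays.asList(data.split(",")));
        return read(tokens);
    }

    // Consumes tokens in the same order in which serialize produced them.
    private static TreeNode read(Deque<String> tokens) {
        String token = tokens.removeFirst();
        if (token.equals("null")) return null;
        TreeNode node = new TreeNode(token);
        node.left = read(tokens);
        node.right = read(tokens);
        return node;
    }
}
```

Because every missing child is written out, the string determines the tree uniquely, at the cost of one extra token per null reference.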

Related

Generating hierarchy from flat data

I need to generate a hierarchy from flat data. This question is not for homework or for an interview test, although I imagine it'd make a good example for either one. I've seen this and this and this, none of them exactly fit my situation.
My data is as follows. I have a list of objects. Each object has a breadcrumb and a text. Examples are below:
Object 1:
---------
breadcrumb: [Person, Manager, Hourly, New]
text: hello world
Object 2:
---------
breadcrumb: [Person, Manager, Salary]
text: hello world again
And I need to convert that to a hierarchy:
Person
|--Manager
   |--Hourly
   |  |--New
   |     |--hello world
   |--Salary
      |--hello world again
I'm doing this in Java but any language would work.

You need a trie data structure, where each Node holds its children in a
List<Node>
1. The trie itself should contain one Node, the root, which is initially empty.
2. When a new sequence arrives, iterate over its items, trying to find the corresponding value among the existing children of the current Node, and move forward whenever the item is found. This way you find the longest prefix already in the trie that is shared with the given sequence.
3. If the longest common prefix doesn't cover the entire sequence, use the remaining items to build a chain of nodes in which each node has only one child (the next item), and attach it as a child of the node at which you stopped in step (2).
You see, this is not so easy: the implementation code can get long and is not obvious. Unfortunately, the JDK doesn't have a standard trie implementation, but you can try to find an existing one or write your own.
For more details, see https://en.wikipedia.org/wiki/Trie#Algorithms
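The insertion steps described in this answer could be sketched roughly as below. The names (BreadcrumbTrie, add, childNamed) are invented for illustration, and the sketch assumes the object's text is simply attached as one more child under the last breadcrumb item, as in the hierarchy shown in the question.

```java
import java.util.ArrayList;
import java.util.List;

class BreadcrumbTrie {
    static class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }

        // Linear scan over the children; returns null when no child has this label.
        Node childNamed(String wanted) {
            for (Node child : children) {
                if (child.label.equals(wanted)) return child;
            }
            return null;
        }
    }

    final Node root = new Node(""); // the initially empty root

    void add(List<String> breadcrumb, String text) {
        Node current = root;
        for (String item : breadcrumb) {
            Node next = current.childNamed(item);
            if (next == null) {
                // Past the longest common prefix: start extending the chain.
                next = new Node(item);
                current.children.add(next);
            }
            current = next;
        }
        current.children.add(new Node(text)); // the text becomes a leaf
    }
}
```

Walking the shared prefix and building the remaining chain happen in the same loop here: once a lookup fails, every later item also creates a new node.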

current between Linked list and BST - java

Is it possible to convert a linked list to a binary search tree (BST),
and also keep a link between them, so that the current element points to the same node in both the linked list and the BST?
linked list :
3 , 5 ,6 ,1, 2 ,0,4
BST :
      3
    /   \
   1     5
  / \   / \
 0   2 4   6
When current of the binary search tree points to 1,
it should also point to 1 in the linked list.

Absolutely, you can use the same nodes in both data structures. The nodes would then each have the two links of their "linked-list-ness" and the two children of their "BST-ness".
So as a quick example, a node might have fields like this:
class Node {
    public Node next, prev;   // for the linked list
    public Node left, right;  // for the BST
    // put some data in here too
}
It takes some care to use those nodes, because it's easy to get the two data structures "out of sync", for example if you remove a node from one but forget to remove it from the other. Then again, maybe that's what you want. But other things stay the same: for example, if you do rotations in the BST then you don't have to touch the linked-list fields at all, and similarly if you swap two nodes in the linked list then the BST fields are not affected at all. So in those cases the code will be the same as it would have been if you had a normal kind of node that is part of only one data structure.
Of course this trick does not apply to the standard library implementations.

You can absolutely have that, as harold pointed out above. And as he also points out, the two data structures can get out of sync if your implementation has bugs.
So, you could do this instead:
// YourBST.java
import java.util.List;
public class YourBST<T> {
    public List<T> toList() {
        // TODO: convert this tree to a list
        throw new UnsupportedOperationException("not implemented yet");
    }
}
// YourList.java
public class YourList<T> {
    public YourBST<T> toBST() {
        // ...
        throw new UnsupportedOperationException("not implemented yet");
    }
}
And then, when modifying one or the other, just call the conversion methods to update the other data structure.
YourBST<String> bst = new YourBST<String>();
YourList<String> list = new YourList<String>();
modify(bst); //modify your BST in some way
list = bst.toList();
modify(list); //modify your List in some way
bst = list.toBST();
Edit: if you indeed want the same object to have prev/next references as well as left/right references, that still requires the node class as Harold described above. This solution simply fixes the "out of sync" problem.

Why store the points in a binary tree?

This question covers a software algorithm, from On topic
I am working on an interview question from Amazon Software Question,
specifically "Given a set of points (x,y) and an integer "n", return n number of points which are close to the origin"
Here is the sample high-level pseudocode answer to this question, from Sample Answer
Step 1: Design a class called point which has three fields - int x, int y, int distance
Step 2: For all the points given, find the distance between them and origin
Step 3: Store the values in a binary tree
Step 4: Heap sort
Step 5: print the first n values from the binary tree
I agree with steps 1 and 2 because it makes sense in terms of object-oriented design to have one software bundle of data, Point, encapsulate the fields x, y and distance.
Can someone explain the design decisions from 3 to 5?
Here's how I would do steps of 3 to 5
Step 3: Store all the points in an array
Step 4: Sort the array with respect to distance (I use some built-in sort here, like Arrays.sort)
Step 5: With the array sorted in ascending order, I print off the first n values
Why did the author of that response use a more complicated data structure, a binary tree, and not something simpler like the array that I used? I know what a binary tree is: a hierarchical data structure of nodes with two pointers. In his algorithm, would you have to use a BST?

First, I would not say that having Point(x, y, distance) is good design or encapsulation. distance is not really part of a point; it can be computed from x and y. In terms of design, I would rather have a function, i.e. a static method of Point or of a helper class Points:
double distance(Point a, Point b)
Then, for the specific question, I actually agree with your solution: put the data in an array, sort the array, and then extract the first N elements.
What the example may be hinting at is that heapsort often uses a binary tree structure laid out inside the array being sorted, as explained here:
The heap is often placed in an array with the layout of a complete binary tree.
Of course, if the distance to the origin is not stored in the Point, then for performance reasons it has to be kept alongside the corresponding Point object in the array, together with any information that allows you to get back from the sorted distance to the Point object (a reference or an index), e.g.
List<Pair<Long, Point>> distancesToOrigin = new ArrayList<>();
to be sorted with a Comparator<Pair<Long, Point>>
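As a rough sketch of the sort-and-take-the-first-N approach: the version below deliberately swaps the stored Pair<Long, Point> for a comparator that computes the squared distance on demand, which keeps the example free of any Pair class (the JDK has none in java.util). Comparing squared distances gives the same order as comparing distances. NearestToOrigin, closest and squaredDistanceToOrigin are hypothetical names, and n is assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class NearestToOrigin {
    static class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }

        // Squared Euclidean distance to (0, 0); long avoids int overflow.
        long squaredDistanceToOrigin() {
            return (long) x * x + (long) y * y;
        }
    }

    // Returns the n points closest to the origin, nearest first.
    static List<Point> closest(List<Point> points, int n) {
        List<Point> sorted = new ArrayList<>(points); // leave the caller's list untouched
        sorted.sort(Comparator.comparingLong(Point::squaredDistanceToOrigin));
        return sorted.subList(0, Math.min(n, sorted.size()));
    }
}
```

If computing the distance were expensive, caching it next to each point, as the answer suggests, would avoid recomputing it on every comparison.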

It is not necessary to use BST. However, it is a good practice to use BST when needing a structure that is self-sorted. I do not see the need to both use BST and heapsort it (somehow). You could use just BST and retrieve the first n points. You could also use an array, sort it and use the first n points.
If you want to sort an array of type Point, Point could implement the Comparable interface and provide its compareTo method.
You never have to choose any data structures, but by determining the needs you have, you would also easily determine the optimum structure.

The approach described in this post is more complex than needed for such a question. As you noted, simple sorting by distance will suffice. However, to help explain your confusion about what your sample answer author was trying to get at, maybe consider the k nearest neighbors problem which can be solved with a k-d tree, a structure that applies space partitioning to the k-d dataset. For 2-dimensional space, that is indeed a binary tree. This tree is inherently sorted and doesn't need any "heap sorting."
It should be noted that building the k-d tree will take O(n log n), and is only worth the cost if you need to do repeated nearest neighbor searches on the structure. If you only need to perform one search to find k nearest neighbors from the origin, it can be done with a naive O(n) search.
How to build a k-d tree, straight from Wiki:
One adds a new point to a k-d tree in the same way as one adds an element to any other search tree. First, traverse the tree, starting from the root and moving to either the left or the right child depending on whether the point to be inserted is on the "left" or "right" side of the splitting plane. Once you get to the node under which the child should be located, add the new point as either the left or right child of the leaf node, again depending on which side of the node's splitting plane contains the new node.
Adding points in this manner can cause the tree to become unbalanced, leading to decreased tree performance. The rate of tree performance degradation is dependent upon the spatial distribution of tree points being added, and the number of points added in relation to the tree size. If a tree becomes too unbalanced, it may need to be re-balanced to restore the performance of queries that rely on the tree balancing, such as nearest neighbour searching.
Once you have built the tree, you can find the k nearest neighbors to some point (the origin in your case) in roughly O(k log n) time on average.
Straight from Wiki:
Searching for a nearest neighbour in a k-d tree proceeds as follows:
Starting with the root node, the algorithm moves down the tree recursively, in the same way that it would if the search point were being inserted (i.e. it goes left or right depending on whether the point is lesser than or greater than the current node in the split dimension).
Once the algorithm reaches a leaf node, it saves that node point as the "current best"
The algorithm unwinds the recursion of the tree, performing the following steps at each node:
If the current node is closer than the current best, then it becomes the current best.
The algorithm checks whether there could be any points on the other side of the splitting plane that are closer to the search point than the current best. In concept, this is done by intersecting the splitting hyperplane with a hypersphere around the search point that has a radius equal to the current nearest distance. Since the hyperplanes are all axis-aligned this is implemented as a simple comparison to see whether the difference between the splitting coordinate of the search point and current node is lesser than the distance (overall coordinates) from the search point to the current best.
If the hypersphere crosses the plane, there could be nearer points on the other side of the plane, so the algorithm must move down the other branch of the tree from the current node looking for closer points, following the same recursive process as the entire search.
If the hypersphere doesn't intersect the splitting plane, then the algorithm continues walking up the tree, and the entire branch on the other side of that node is eliminated.
When the algorithm finishes this process for the root node, then the search is complete.
This is a pretty tricky algorithm that I would hate to need to describe as an interview question! Fortunately the general case here is more complex than is needed, as you pointed out in your post. But I believe this approach may be close to what your (wrong) sample answer was trying to describe.

Binary Search Tree of Strings

I have a question about exactly how a binary search tree of strings works. I know and have implemented binary search trees of integers by checking whether the new data <= parent data, then branching left if it's less or right if it's greater. However, I am a little confused about how to implement this with nodes of strings.
With integers or characters I can just pass an array into the insert method of the tree I programmed, and it builds the tree nodes correctly. My question is how you would make this work with an array of strings. How would you get the strings to branch off correctly in the tree? For example, if I had an array of questions, how would I be able to branch the BST correctly so that I would eventually get to the correct answer?
For example look at the following trivial tree example.
                                             land animal?
                        have tentacles?------------^-------------indoor animal
               have claws?-----^----jellyfish       live in jungle?----^----does it bark?
   eat plankton?----^----lobster                   bear----^----lion       cat----^----dog
shark----^----whale
How would you populate a tree such as this so that the nodes end up where you want them? I am trying to make a BST for troubleshooting, and I am confused about how to populate the nodes of strings so that they appear in the correct positions. Do you need to hard-code the nodes?

Update 2, to build a binary decision tree:
A binary decision tree can be thought of as a bunch of questions that yield boolean responses about facets of leaf nodes - the facet either exists / holds true or it does not. That is, for every descendent of a particular node/edge we must be able to say "this question/answer holds" (answers can be "true" or "false"). For instance, a bark is a facet of a (normal) dog, but tentacles are not a facet of a Whale. In the presented tree, the false edge always leads to the left subtree: this is a convention to avoid labeling each edge with true/false or Y/N.
The tree can only be built from existing/external knowledge that allows one to answer each question for every animal.
Here is a rough algorithm that can be used to build such a tree:
Start with a set of possible animals, call this A, and a set of questions, call this Q.
Pick a question, q, from Q for which count(True(q, a in A)) is closest to that of count(False(q, a in A)) - if the resulting tree is a balanced binary tree these counts will always be equal for the best question to ask.
Remove q from Q and use it as the question to ask for the current node. Put all False(q,a) into the set of animals (A') available to the left child node and put all True(q,a) into the set of animals (A'') available to the right child node.
Following each edge/branch (false=left, true=right), pick a suitable question from the remaining Q and repeat (using A' or A'' for A, as appropriate).
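The rough algorithm above could be sketched like this. It is a simplified illustration, not a trained decision tree: the external knowledge is assumed to be given as a map from each question to the set of animals for which the answer is "yes", questions that do not split the current set are skipped, and all names (DecisionTreeBuilder, build, facts) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class DecisionTreeBuilder {
    static class Node {
        final String text; // a question, or an animal name at a leaf
        Node no, yes;      // convention from the answer: false = left, true = right
        Node(String text) { this.text = text; }
    }

    // facts: question -> animals for which the answer is "yes" (assumed input format)
    static Node build(Set<String> animals, Map<String, Set<String>> facts) {
        if (animals.size() == 1) return new Node(animals.iterator().next());

        // Pick the question whose yes/no counts are closest to each other.
        String best = null;
        int bestGap = Integer.MAX_VALUE;
        for (Map.Entry<String, Set<String>> fact : facts.entrySet()) {
            int yes = 0;
            for (String animal : animals) {
                if (fact.getValue().contains(animal)) yes++;
            }
            int no = animals.size() - yes;
            if (yes == 0 || no == 0) continue; // this question separates nothing here
            int gap = Math.abs(yes - no);
            if (gap < bestGap) {
                bestGap = gap;
                best = fact.getKey();
            }
        }
        if (best == null) {
            throw new IllegalStateException("no question separates " + animals);
        }

        Set<String> yesSet = new TreeSet<>();
        Set<String> noSet = new TreeSet<>();
        for (String animal : animals) {
            (facts.get(best).contains(animal) ? yesSet : noSet).add(animal);
        }
        // Remove the chosen question from Q before descending.
        Map<String, Set<String>> remaining = new LinkedHashMap<>(facts);
        remaining.remove(best);

        Node node = new Node(best);
        node.no = build(noSet, remaining);
        node.yes = build(yesSet, remaining);
        return node;
    }
}
```

When two questions split the animals equally well, this sketch keeps whichever comes first in the map's iteration order, so a LinkedHashMap gives repeatable results.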
(Of course, there are many more complete/detailed/accurate resources found online as course material or whitepapers. Not to mention a suitable selection of books at most college campuses ..)
Update, for a [binary] decision tree:
In this particular case (which is clear with the added diagram) the graph is based on the "yes" or "no" responses to the questions, which represent the edges between the nodes. That is, the tree is not built using an ordering of the string values themselves. In this case it might make sense to always have the left branch be "false" and the right branch be "true", although each node could have more edges/children if non-binary responses are allowed.
The decision tree must be "trained" (google search). That is, the graph must be built initially based on the questions/responses, which is unlike a BST that is based merely on the ordering between nodes. The initial graph cannot be built from merely an array of questions, as the edges do not follow an intrinsic ordering.
Initial response, for a binary search tree:
The same way it does for integers: the algorithm does not change.
Consider a function, compareTo(a,b) that will return -1, 0 or 1 for a < b, a == b, and a > b, respectively.
Then consider that the type of neither a nor b matter (as long as they are the same) when implementing a function with this contract if such a type supports ordering: it will be "raw" for integers and use the host language's corresponding string comparison for string types.
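In Java, the comparison contract described above is provided for strings by String.compareTo, which returns a negative number, zero, or a positive number (not necessarily exactly -1, 0 or 1). A minimal sketch of a BST of strings, with hypothetical names (StringBst, insert, inOrder) and the asker's convention that values <= the parent go left:

```java
import java.util.ArrayList;
import java.util.List;

class StringBst {
    static class Node {
        final String value;
        Node left, right;
        Node(String value) { this.value = value; }
    }

    Node root;

    void insert(String value) {
        root = insert(root, value);
    }

    // Same algorithm as for integers; only the comparison call differs.
    private static Node insert(Node node, String value) {
        if (node == null) return new Node(value);
        if (value.compareTo(node.value) <= 0) {
            node.left = insert(node.left, value);   // lexicographically smaller or equal
        } else {
            node.right = insert(node.right, value); // lexicographically greater
        }
        return node;
    }

    // In-order traversal yields the strings in lexicographic order.
    List<String> inOrder() {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Node node, List<String> out) {
        if (node == null) return;
        collect(node.left, out);
        out.add(node.value);
        collect(node.right, out);
    }
}
```

Note that compareTo orders by character code, so uppercase letters sort before lowercase ones; String.compareToIgnoreCase is an alternative when case should not matter.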

Data structure to represent nodes and lengths of paths between them?

Okay I am new to Java, and I'm asking this question because I'm sure there is a better simple way to deal with this and the more experienced folk out there may be able to give me some pointers.
I have a graph of cities with lengths of paths between them. I am trying to construct an algorithm using Java to go from a start city to a destination city, finding the shortest path. Each city will have a name and map coordinates. More specifically I will be using the A* algorithm, but that is (probably) not important to my question.
My issue is I am trying to figure out a good way to represent the nodes and the paths between them with the length.
The easiest way I could think of was to create a huge 2 dimensional square array with each city represented by an index, where the connecting cities can be represented by where they intersect in the array. I assigned an index # to each city. In the array values, 0's would go where there is no connection, and the distance would go where there is a connection.
I will also have a city subclass with an "index" attribute, with the value of its index in the array. The downside to this is to figure out which cities have connections, there have to be extra steps to lookup what the city's index is in the array, and also having to lookup which connecting city has the connecting index.
Is there a better way to represent this?

An alternative way would be to have a Node structure that stores all the pointers to its adjacent nodes.
E.g.
if you have something like this in your data structure
   A  B  C
A  /  0  1
B  0  /  1
C  1  1  /
in the new structure it would be
A: [C]
B: [C]
C: [A, B]
Compared to your 2D array approach, this way takes longer to check whether two nodes are connected, but uses less space.

Consider...
class Node {
    List<Link> link;
    String cityName;
}
class Link {
    Node destinationCity;
    Long distance;
}
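A possible way to use such Node/Link classes is sketched below. The helpers connect and distanceBetween are hypothetical additions, the roads are assumed to be two-way, and -1 is an arbitrary marker chosen here for "no direct connection".

```java
import java.util.ArrayList;
import java.util.List;

class CityGraph {
    static class Node {
        final String cityName;
        final List<Link> links = new ArrayList<>();
        Node(String cityName) { this.cityName = cityName; }
    }

    static class Link {
        final Node destinationCity;
        final long distance;
        Link(Node destinationCity, long distance) {
            this.destinationCity = destinationCity;
            this.distance = distance;
        }
    }

    // Adds a two-way road by storing one Link in each direction.
    static void connect(Node a, Node b, long distance) {
        a.links.add(new Link(b, distance));
        b.links.add(new Link(a, distance));
    }

    // Returns the length of the direct road, or -1 if the cities are not adjacent.
    static long distanceBetween(Node from, Node to) {
        for (Link link : from.links) {
            if (link.destinationCity == to) return link.distance;
        }
        return -1;
    }
}
```

With this layout the neighbours of a city are reached directly through its links list, with no index lookups, which is the access pattern a search such as A* needs when expanding a node.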
