resolve a tree knowing ALL parents and ancestors of leaf nodes - java

It is pretty straightforward to build a tree if you know the immediate parent of every node. But what if you have information about ALL ancestors of the leaf nodes (parents, grandparents, great-grandparents, etc.) without knowing which of them is the immediate parent?
For example consider the following tree:
A -----> B ------> C -----> G
         |
         D ------> E
         |
         F
The information available to describe this tree is the following CSV file:
child, parent
E,D
E,B
E,A
F,D
F,B
G,C
G,B
G,A
F,A
Could you please give some advice on a general algorithm to solve this?

parents(F) = {A,B,D}
parents(E) = {A,B,D}
parents(G) = {A,B,C}
It is not possible to recreate the tree from this data set, because the data does not show which node is the root: A and B are both listed as ancestors of every leaf, so is it A or is it B?
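The ambiguity can be checked mechanically. The sketch below is only an illustration, not part of the original answer: it collects the set of leaves listed under each ancestor and reports ancestors whose leaf sets are identical. In a tree, an ancestor's leaf set contains the leaf sets of all its descendants, so two ancestors with the same leaf set cannot be ordered from this data alone. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class AncestorAmbiguity {

    /** Maps each ancestor to the set of leaves listed under it; rows are {leaf, ancestor}. */
    static Map<String, Set<String>> leavesByAncestor(String[][] rows) {
        Map<String, Set<String>> covered = new TreeMap<>();
        for (String[] row : rows) {
            covered.computeIfAbsent(row[1], k -> new TreeSet<>()).add(row[0]);
        }
        return covered;
    }

    /** Groups ancestors that cover exactly the same leaves; their relative order is undetermined. */
    static List<Set<String>> ambiguousGroups(Map<String, Set<String>> covered) {
        Map<Set<String>, Set<String>> byLeafSet = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> entry : covered.entrySet()) {
            byLeafSet.computeIfAbsent(entry.getValue(), k -> new TreeSet<>()).add(entry.getKey());
        }
        List<Set<String>> ambiguous = new ArrayList<>();
        for (Set<String> group : byLeafSet.values()) {
            if (group.size() > 1) {
                ambiguous.add(group);
            }
        }
        return ambiguous;
    }

    public static void main(String[] args) {
        // The CSV rows from the question, as {child, parent} pairs.
        String[][] rows = {
            {"E", "D"}, {"E", "B"}, {"E", "A"}, {"F", "D"}, {"F", "B"},
            {"G", "C"}, {"G", "B"}, {"G", "A"}, {"F", "A"}
        };
        System.out.println(ambiguousGroups(leavesByAncestor(rows)));
    }
}
```

For the CSV in the question, A and B end up in the same group, which is exactly the root ambiguity described above. Ancestors with distinct leaf sets (C and D here) can be placed below the larger sets that contain them.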

Related

Graph traversal for hierarchical relations

I have a graph that represents a parent-child hierarchy. It also holds the relationships between objects.
In the graph above, the orange edges are my parent-child hierarchy and the green edges are my relations. So if I want the relation between E and F, I should get the relation between B and C (as they are the parents of E and F). This lookup can go all the way up to the topmost parents.
I can find the parents of a node using a Gremlin query like:
g.V().has('name', 'D').repeat(out('parent')).emit().values('name')
This query returns B and A.
Q. Along similar lines, does Gremlin or any other graph query language support relation inheritance? How should the Gremlin query be formed?
Note: The graph can be huge, containing many unique nodes and many unique relations. I want to get the inherited relations quickly, so that I don't have to precalculate and cache them or duplicate them for quick reference.
I assume that by inheritance you mean the ability to traverse from grandparent to parent to child to grandchild, etc. ArangoDB supports traversals and can traverse these types of relationships very fast. For example, to duplicate your example above of starting at node D and getting nodes B and A, you could do something like:
// Find all nodes that are named d
LET dNodes = (
  FOR test IN test2
    FILTER test.name == 'd'
    RETURN test
)
// Traverse outbound relationships starting at the dNodes and return up to 2 nodes up the hierarchy
FOR node IN dNodes
  FOR v, e IN 1..2 OUTBOUND node
    testEdge
    RETURN v
In terms of performance, I have traversed irregular hierarchies with thousands of nodes without performance issues and without having to cache anything. Keep in mind however that there is no magic here and a bad data model will cause trouble no matter the db engine.
There is some performance information available if you want to review it and play with it.
Traversing multiple edges (relationship types) is very similar to our earlier example. To find the path from E to F using hierarchy (orange) edges and relationship (green) edges, we can do :
// Find all nodes that are named E
LET eNodes = (
  FOR test IN test3
    FILTER test.name == 'E'
    RETURN test
)
// Start at node E and go up to three steps
// Traverse the hierarchy edges in any direction (so that we can find parent and child nodes)
// Traverse the relatedTo (green) edges in the outbound direction only
// Filter the traversal to paths that end at vertex F and return the path (E<-B->C->F)
FOR node IN eNodes
  FOR v, e, p IN 1..3 ANY node
    parentOf, OUTBOUND relatedTo
    FILTER v.name == 'F'
    RETURN p
Or if we just want the shortest path between E and F we can do for example:
LET eNodes = (
  FOR test IN test3
    FILTER test.name == 'E'
    RETURN test
)
// Find the shortest path between nodes E and F and return the path (E<-B->C->F)
FOR node IN eNodes
  FOR v, e IN ANY SHORTEST_PATH
    node TO 'test3/F'
    parentOf, OUTBOUND relatedTo
    RETURN e
Note that I just used the id of the "F" record in the code above, but we could have searched for the record by name, just as we did for the "E" record.
Also note that the edge data for our example was created as directed edges in the DB: parentOf edges go from parent to child (e.g. A to B), and the green relationship edges were created in alphabetical order (e.g. B to C).
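For comparison, the same "inherited relation" lookup can be sketched in plain Java, outside any graph database. This is only an illustration under assumptions: the hierarchy is acyclic, each node has a single parent, and the edge data (A over B and C, B over D and E, C over F, one relation from B to C) is reconstructed from the descriptions in this thread rather than taken from the original figure. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class InheritedRelation {
    private final Map<String, String> parentOf = new HashMap<>();   // child -> parent
    private final Map<String, String> relations = new HashMap<>();  // "from->to" -> label

    void addParent(String parent, String child) {
        parentOf.put(child, parent);
    }

    void addRelation(String from, String to, String label) {
        relations.put(from + "->" + to, label);
    }

    /** Returns the node followed by its ancestors, nearest first (assumes no cycles). */
    List<String> selfAndAncestors(String node) {
        List<String> chain = new ArrayList<>();
        for (String n = node; n != null; n = parentOf.get(n)) {
            chain.add(n);
        }
        return chain;
    }

    /** Finds the first directed relation between x (or an ancestor of x) and y (or an ancestor of y). */
    Optional<String> inheritedRelation(String x, String y) {
        for (String a : selfAndAncestors(x)) {
            for (String b : selfAndAncestors(y)) {
                String label = relations.get(a + "->" + b);
                if (label != null) {
                    return Optional.of(a + " -[" + label + "]-> " + b);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        InheritedRelation graph = new InheritedRelation();
        graph.addParent("A", "B");
        graph.addParent("A", "C");
        graph.addParent("B", "D");
        graph.addParent("B", "E");
        graph.addParent("C", "F");
        graph.addRelation("B", "C", "relatedTo");
        System.out.println(graph.inheritedRelation("E", "F").orElse("no relation"));
    }
}
```

The lookup costs at most depth(x) times depth(y) map probes per query, so nothing has to be precomputed, but it only follows relations in the stored direction.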

Hierarchical Data structure design

I have hierarchical data, something like:
The following are the characteristics:
A node can have any number of children
The nodes can be marked as special. Once a node is marked special, the whole subtree starting from that node becomes special.
The following are the operations I want to perform:
1. Tree.get("a.b.d.g") should give me node g
2. Tree.set("a.b.d.g",value) should set node g's value
3. At any node I should know who the root node is
4. At any node I should know if I'm part of a special subtree
5. I should be able to copy/move a subtree into another tree
6. I can add new nodes or delete nodes at every level
7. I should be able to serialize this data
I can currently think of a "hashmap of hashmaps" kind of data structure. I can always cache the answers to operations 3 and 4 at every node. Of course, I need to clear that cache when I copy or move a subtree.
Are there any other implementations that achieve the best performance for the above operations with a minimal memory footprint?
For basic modeling, you should use a Composite pattern:
public class TreeNode {
    private String id;
    private TreeNode parent;
    private List<TreeNode> treeNodes = new ArrayList<>();
    ...
}
Each node has a String id, a reference to its parent, and references to its children.
You can get the top root by following getParent() until it returns null (for example, recursively).
For parsing the path imagine something like:
public TreeNode get(final String path) {
    if (!path.isEmpty()) {
        for (TreeNode treeNode : treeNodes) {
            if (path.startsWith(treeNode.getId())) {
                return treeNode.get(path.substring(...));
            }
        }
    }
    return this;
}
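A fuller sketch of the same idea follows. It is not the answerer's implementation: it assumes dotted paths such as "b.d.g" that are resolved relative to the node they are called on, stores children in a map keyed by id instead of a list, and adds hypothetical helpers (root(), markSpecial(), isInSpecialSubtree()) for the root and special-subtree lookups from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PathTreeNode {
    private final String id;
    private PathTreeNode parent;
    private Object value;
    private boolean special;
    private final Map<String, PathTreeNode> children = new LinkedHashMap<>();

    PathTreeNode(String id) {
        this.id = id;
    }

    /** Creates a child with the given id, links it to this node, and returns it. */
    PathTreeNode addChild(String childId) {
        PathTreeNode child = new PathTreeNode(childId);
        child.parent = this;
        children.put(childId, child);
        return child;
    }

    /** Resolves a dotted path relative to this node; returns null if a segment is missing. */
    PathTreeNode get(String path) {
        if (path.isEmpty()) {
            return this;
        }
        PathTreeNode current = this;
        for (String segment : path.split("\\.")) {
            current = current.children.get(segment);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    /** Sets the value of the node at the given path; returns false if the path does not exist. */
    boolean set(String path, Object newValue) {
        PathTreeNode target = get(path);
        if (target == null) {
            return false;
        }
        target.value = newValue;
        return true;
    }

    Object getValue() {
        return value;
    }

    String getId() {
        return id;
    }

    void markSpecial() {
        special = true;
    }

    /** Walks up the parent chain until the node without a parent is reached. */
    PathTreeNode root() {
        PathTreeNode current = this;
        while (current.parent != null) {
            current = current.parent;
        }
        return current;
    }

    /** True if this node or any of its ancestors is marked special. */
    boolean isInSpecialSubtree() {
        for (PathTreeNode n = this; n != null; n = n.parent) {
            if (n.special) {
                return true;
            }
        }
        return false;
    }
}
```

Because root() and isInSpecialSubtree() walk the parent chain on demand, nothing has to be invalidated when a subtree is moved; the cost is one walk of the node's depth per call. Caching those answers, as the question suggests, trades that walk for invalidation work on copy or move.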
Now, if you are looking for a way to store this kind of (graph-like) data and run performant queries on it, you can consider using a graph database, as @sebgymn mentioned: Neo4j is a great database for that in Java.
It is about using a connected data model with NoSQL. In Neo4j, nodes store data in properties; relationships are also stored, explicitly named, and act as links between nodes. You can then execute queries on a node (its properties, its relationships to other nodes, ...).
Here is link to a presentation: http://fr.slideshare.net/neo4j/data-modeling-with-neo4j
A great tutorial: http://technoracle.blogspot.fr/2012/04/getting-started-with-neo4j-beginners.html
A test graph database to execute queries: http://www.neo4j.org/learn/cypher
For instance, you can try implementing a multi-level pattern tree in Neo4j (in your case it is important to know the top root, so it seems your model has different levels in the tree).

Storing parent child mapping in memory. To list all reachable child for a parent efficiently

I have parent and child mappings in a relational database, as below:
relationship_id | parent_id | child_id
1 | 100009 | 600009
2 | 100009 | 600010
3 | 600010 | 100008
For performance, I would like to keep all these mappings in memory.
Here, a child can have more than one parent and a parent can have more than two children.
I guess I should use a graph data structure.
Populating the memory is a one-time activity. My concern is that when I ask for all children of a parent (not only the immediate children), they should be returned as fast as possible. Additions and deletions happen rarely.
What data structure and algorithm should I use?
I tried MultiHashMap to achieve O(1) lookup time, but it has more redundancy.
Have a graph data structure for parent-child relationships. Each GraphNode can just have an ArrayList of children.
Then have HashMap that maps ID to GraphNode.
You need to figure something out so you don't create a cycle (if this is possible) which will cause an infinite loop.
You'll need a custom Node class and a hashmap to store node references for easy lookup.
for each row in database
    if parent node exists in map
        get it
    else
        create it and add it
    if child node exists in map
        get it
    else
        create it and add it
    set relationship between parent and child
The node class would look something like:
public class Node {
    private int id;
    private List<Node> parents = new ArrayList<Node>();
    private List<Node> children = new ArrayList<Node>();
    //getters and setters
}
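Putting the answers together, one possible sketch is shown below. It is a compact variant rather than the exact design above: instead of Node objects it keeps an adjacency map from parent ID to child IDs, and it lists all reachable children with a breadth-first traversal whose visited set guards against cycles. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class RelationGraph {
    // parent ID -> IDs of its immediate children
    private final Map<Integer, List<Integer>> childrenOf = new HashMap<>();

    /** Registers one row of the mapping table. */
    void addRelation(int parentId, int childId) {
        childrenOf.computeIfAbsent(parentId, k -> new ArrayList<>()).add(childId);
    }

    /** Returns every node reachable from parentId (excluding parentId itself), in breadth-first order. */
    Set<Integer> reachableChildren(int parentId) {
        Set<Integer> seen = new LinkedHashSet<>();
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(parentId);
        while (!queue.isEmpty()) {
            int current = queue.poll();
            for (int child : childrenOf.getOrDefault(current, Collections.emptyList())) {
                // seen.add returns false for already visited nodes, which stops cycles
                if (child != parentId && seen.add(child)) {
                    queue.add(child);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        RelationGraph graph = new RelationGraph();
        graph.addRelation(100009, 600009);
        graph.addRelation(100009, 600010);
        graph.addRelation(600010, 100008);
        System.out.println(graph.reachableChildren(100009));
    }
}
```

Each query visits every reachable node and edge once. Since additions and deletions are rare, the result per parent could also be cached and rebuilt on change, at the cost of the redundancy the question wants to avoid.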

Neo4j Using a custom evaluator to get the depth a node was retrieved at

I'm using the latest version of Neo4j to build up a graph of nodes and relationships with the Java API.
My problem is that I need to traverse the nodes to a certain depth. There may exist a relationship between two nodes at the same depth in the database, but I don't want to return that relationship in the traversal.
I tried to make a custom Evaluator by implementing the Evaluator interface but the only method it overrides is Evaluation evaluate(Path path) . It doesn't appear to have the notion of depth associated with it.
I would really appreciate some advice on how to either associate a node with its depth (when traversing from a particular node) or prune a relationship where two nodes are in the same level.
You can use Evaluators.atDepth() to get a predefined Evaluator that only includes paths with a certain depth.
In your custom evaluator you can simply check the length of the passed path parameter to decide if you want to include this path or not e.g. with:
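@Override
public Evaluation evaluate(Path path) {
    // path.length() is the number of relationships in the path, i.e. the depth from the start node
    return path.length() == 4 ? Evaluation.INCLUDE_AND_PRUNE : Evaluation.EXCLUDE_AND_CONTINUE;
}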
Have you tried Cypher for that, something like
start n = node(1) match p=n-[*4]->(x) return x, length(p)
?
The path has a length(), which is the depth. The length is equal to the number of relationships in the path, i.e. number of nodes - 1.
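The relationship between depth and path length can also be illustrated without Neo4j. The sketch below is plain Java, not the Neo4j traversal API: it assigns each node its breadth-first depth from a start node (the number of edges on the shortest path to it) and then keeps only the edges that go exactly one level deeper, which drops edges between nodes on the same level. The adjacency-map representation and all names are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DepthTraversal {

    /** Breadth-first depths from start, not expanding nodes beyond maxDepth. */
    static Map<String, Integer> depths(Map<String, List<String>> adjacency, String start, int maxDepth) {
        Map<String, Integer> depth = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        depth.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            int d = depth.get(current);
            if (d == maxDepth) {
                continue; // do not expand nodes at the depth limit
            }
            for (String next : adjacency.getOrDefault(current, Collections.emptyList())) {
                if (!depth.containsKey(next)) {
                    depth.put(next, d + 1);
                    queue.add(next);
                }
            }
        }
        return depth;
    }

    /** Keeps only edges leading exactly one level deeper; same-level edges are dropped. */
    static List<String[]> levelCrossingEdges(Map<String, List<String>> adjacency, Map<String, Integer> depth) {
        List<String[]> kept = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : adjacency.entrySet()) {
            Integer from = depth.get(entry.getKey());
            if (from == null) {
                continue; // source node was not reached
            }
            for (String target : entry.getValue()) {
                Integer to = depth.get(target);
                if (to != null && to == from + 1) {
                    kept.add(new String[] {entry.getKey(), target});
                }
            }
        }
        return kept;
    }
}
```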

In an AST Visitor, how can I know which node's property I am visiting?

I'm programming an AST Visitor (eclipse JDT).
An EnumDeclaration node contains the following structural properties:
JAVADOC, MODIFIERS, NAME, SUPER_INTERFACE_TYPES, ENUM_CONSTANTS and BODY_DECLARATIONS.
When I visit a child node of EnumDeclaration (a SimpleName node, for instance), is it possible to know which of the lists of nodes I'm visiting? Is it possible to differentiate?
I'd like to process a node differently, depending on whether I found it in ENUM_CONSTANTS or BODY_DECLARATIONS.
I found a solution: explicitly visit the nodes in the list (with accept(), not visit()). Something like this (for visiting the super interfaces):
List<Type> superInterfaces = enumDecNode.superInterfaceTypes();
for (Type superInterface : superInterfaces)
    superInterface.accept(this);
Note that it is not possible to use:
this.visit( superInterface);
because Type is an umbrella abstract class for which no visit( Type node) implementation exists.
This also forces the children of the nodes in the superInterfaces list to be visited as soon as their parent is visited. Problem solved.
On a side note, if you already process all the children of a node via these lists, you can forbid the visitor from re-visiting its children, by returning false.
Your nodes should invoke corresponding methods.
MODIFIERS -> visitModifiers
NAME -> visitNAME
and so on
Another alternative solution (thanks to Markus Keller @ eclipse JDT forum):
Use "node.getLocationInParent() == EnumDeclaration.NAME_PROPERTY" or other *_PROPERTY constants.
Markus
