Graph traversal for hierarchical relations - java

I have a graph that represents the parent-child hierarchy. It also holds the relationships between objects.
The graph above shows this: orange edges are my parent-child hierarchy and green edges are my relations. So if I want to get the relation between E and F, I will get the relation between B and C (as they are the parents of E and F). This relation lookup can go up to the topmost parents.
I can find the parents of a node using a Gremlin query like:
g.V().has('name', 'D').repeat(out('parent')).emit().values('name')
This query returns B and A.
Q. Along similar lines, does Gremlin or any other graph query language support relation inheritance? How should the Gremlin query be formed?
Note: the graph can be very large, containing many unique nodes and many unique relations. I want to get the inherited relations quickly, so that I don't have to pre-calculate and cache them or make duplicates for quick reference.
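To make the lookup rule described above concrete, here is a small in-memory sketch in Java (not Gremlin, and not tied to any graph database): walk up from each node to its topmost parent and return the first green relation found between any pair of ancestors. It assumes each node has at most one parent (a tree, no cycles); the class, method names and sample data are hypothetical.

```java
import java.util.*;

// Hypothetical in-memory model: each node has at most one parent (orange edges),
// and green relations are labelled edges between arbitrary nodes.
final class InheritedRelations {
    private final Map<String, String> parentOf = new HashMap<>();        // child -> parent
    private final Map<List<String>, String> relations = new HashMap<>(); // [from, to] -> label

    void addParent(String child, String parent) {
        parentOf.put(child, parent);
    }

    void addRelation(String from, String to, String label) {
        relations.put(List.of(from, to), label);
    }

    // The node itself followed by its ancestors, nearest first, up to the topmost parent.
    // Assumes the parent links contain no cycle.
    List<String> selfAndAncestors(String node) {
        List<String> chain = new ArrayList<>();
        for (String n = node; n != null; n = parentOf.get(n)) {
            chain.add(n);
        }
        return chain;
    }

    // Returns the first relation found between an ancestor of a and an ancestor of b
    // (in either direction), trying a's nearer ancestors first; empty if there is none.
    Optional<String> inheritedRelation(String a, String b) {
        List<String> bChain = selfAndAncestors(b);
        for (String x : selfAndAncestors(a)) {
            for (String y : bChain) {
                String label = relations.getOrDefault(List.of(x, y), relations.get(List.of(y, x)));
                if (label != null) {
                    return Optional.of(label);
                }
            }
        }
        return Optional.empty();
    }
}
```

With B as the parent of E, C as the parent of F and a relation between B and C, inheritedRelation("E", "F") finds the B-C relation. The cost is roughly (depth of a) x (depth of b) map lookups per query, with nothing precomputed.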

I assume that by inheritance you mean the ability to traverse from grandparent to parent to child to grandchild, etc. Arango supports traversals and can traverse these types of relationships very fast. For example, to reproduce your example above of starting at node D and getting nodes B and A, you could do something like:
// Find all nodes that are named d
let dNodes = (FOR test in test2
FILTER test.name == 'd'
RETURN test)
//Traverse outbound relationships starting at the dNodes and return the nodes 1 to 2 levels up the hierarchy
FOR node in dNodes
FOR v,e IN 1..2 OUTBOUND node
testEdge
RETURN v
In terms of performance, I have traversed irregular hierarchies with thousands of nodes without performance issues and without having to cache anything. Keep in mind, however, that there is no magic here: a bad data model will cause trouble no matter the DB engine.
There is some performance information here if you want to review it and play with it.
Traversing multiple edges (relationship types) is very similar to our earlier example. To find the path from E to F using hierarchy (orange) edges and relationship (green) edges, we can do :
// Find all nodes that are named E
let eNodes = (FOR test in test3
FILTER test.name == 'E'
RETURN test
)
// Start in node E and go up to three steps
// Traverse the hierarchy edges in any direction (so that we can find parent and child nodes)
// Traverse the relatedTo (green) edges in the outbound direction only
// Filter the traversal to paths that end in vertex F and return the path (E<-B->C->F)
FOR node in eNodes
FOR v,e,p IN 1..3 ANY node
parentOf, OUTBOUND relatedTo
FILTER v.name == 'F'
RETURN p
Or if we just want the shortest path between E and F we can do for example:
let eNodes = (FOR test in test3
FILTER test.name == 'E'
RETURN test
)
//Find shortest path between node E and F and return the path (E<-B->C->F)
FOR node in eNodes
FOR v, e IN ANY SHORTEST_PATH
node TO 'test3/F'
parentOf, OUTBOUND relatedTo
RETURN e
Note that I just used the id of the "F" record in the code above, but we could have searched for the record by name, just like we did for the "E" record.
Also note that we created the edge data for our example as directed edges in the DB: parentOf edges were created from parent to child (e.g. A to B), and the green relationship edges were created alphabetically (e.g. B to C).
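As a rough cross-check of what the second AQL query asks for, here is a breadth-first shortest-path sketch in plain Java (not ArangoDB code) in which parentOf edges may be followed in either direction but relatedTo edges only outbound. The graph and names are hypothetical and assume the edge directions described above (parentOf from parent to child, B relatedTo C); A as the parent of C is also an assumption.

```java
import java.util.*;

final class MixedDirectionBfs {
    // Adjacency list; mirrors "ANY ... parentOf, OUTBOUND relatedTo" from the AQL above.
    private final Map<String, List<String>> neighbors = new HashMap<>();

    private void link(String from, String to) {
        neighbors.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // parentOf edges are walkable in both directions.
    void addParentOf(String parent, String child) {
        link(parent, child);
        link(child, parent);
    }

    // relatedTo edges are walkable outbound only.
    void addRelatedTo(String from, String to) {
        link(from, to);
    }

    // Breadth-first search: the first time the target is dequeued it was reached
    // through a path with the fewest edges. Returns an empty list if unreachable.
    List<String> shortestPath(String start, String target) {
        Map<String, String> cameFrom = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        cameFrom.put(start, null);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (current.equals(target)) {
                LinkedList<String> path = new LinkedList<>();
                for (String n = target; n != null; n = cameFrom.get(n)) {
                    path.addFirst(n);
                }
                return path;
            }
            for (String next : neighbors.getOrDefault(current, List.of())) {
                if (!cameFrom.containsKey(next)) {
                    cameFrom.put(next, current);
                    queue.add(next);
                }
            }
        }
        return List.of();
    }
}
```

With these edges, shortestPath("E", "F") gives [E, B, C, F], while shortestPath("F", "E") has to detour through A ([F, C, A, B, E]) because the relatedTo edge cannot be walked backwards.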

Related

Sorting a parent object by value of its set child

I'm trying to sort a list of parent objects by the values of their children, which are stored in a set. So let's say I have the following:
Parent1 with a child named rose
Parent2 with a child named cameo
Parent3 with children named abba, zeon, max.
When I sort in descending order, it should show Parent3 first, since it has a z.
This is my current HQL, which gives the wrong order of 1 > 2 > 3:
SELECT DISTINCT p FROM Parent p JOIN p.children c ORDER BY c.name desc
Without DISTINCT it sorts correctly, but it returns the same parent multiple times.
I have a model set up like this:
public class Parent {
private Set<Child> children = new HashSet<Child>();
}
public class Child{
private String name;
}
Edit: I managed to sort it with HQL order by within a collection, but when both parents have the same children.name value, it doesn't compare the next possible value, e.g.:
If Parent1 has children abba, zeon
and Parent2 has children abba, cameo,
ascending order should put Parent2 first.
If you don't want the parent duplicated, you can't do it by joining the collection. What you can do is select the Parent, subselect the max element (alphabetically) in each Parent and order by it.
Something along these lines:
select p from Parent p
order by (select max(c.name) from Child c where c.parentID = p.parentID) desc
You can use @Formula to map the max element if you want to, so it is not really necessary to use a collection at all.
Actually, it may be even better with a formula: you can map the maximum name and order by it directly without fetching the collection at all.
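If the children are loaded anyway, the ordering, including the tie case from the edit, can also be done in memory. The sketch below is plain Java (16+ for records), not HQL, and Parent here is a hypothetical stand-in for the entity. Descending order compares the largest child names first, ascending order the smallest first, and both move on to the next name on a tie.

```java
import java.util.*;

final class ParentOrdering {
    // Hypothetical stand-in for the Parent entity: a name plus its children's names.
    record Parent(String name, Set<String> childNames) {}

    // Child names sorted ascending or descending.
    static List<String> sortedNames(Parent p, boolean descending) {
        List<String> names = new ArrayList<>(p.childNames());
        Comparator<String> order = descending ? Comparator.reverseOrder() : Comparator.naturalOrder();
        names.sort(order);
        return names;
    }

    // Element-by-element comparison; on a shared prefix the shorter list comes first.
    static int compareLists(List<String> a, List<String> b) {
        for (int i = 0; i < Math.min(a.size(), b.size()); i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.size(), b.size());
    }

    // Ascending: compare the smallest names first, then the next ones on a tie.
    static final Comparator<Parent> ASCENDING =
            (p, q) -> compareLists(sortedNames(p, false), sortedNames(q, false));

    // Descending: compare the largest names first; swapping p and q puts the larger key first.
    static final Comparator<Parent> DESCENDING =
            (p, q) -> compareLists(sortedNames(q, true), sortedNames(p, true));
}
```

Sorting Parent1 {rose}, Parent2 {cameo}, Parent3 {abba, zeon, max} with DESCENDING yields Parent3, Parent1, Parent2; sorting Parent1 {abba, zeon} and Parent2 {abba, cameo} with ASCENDING puts Parent2 first.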

Hierarchical Data structure design

I have hierarchical data in the form of a tree.
It has the following characteristics:
A node can have any number of children
The nodes can be marked as special. Once a node is marked special, the whole subtree starting from that node becomes special.
The following are the operations I want to perform:
1. Tree.get("a.b.d.g") should give me node g
2. Tree.set("a.b.d.g", value) should set node g's value
3. At any node, I should know which node is the root
4. At any node, I should know whether I'm part of a special subtree
5. I should be able to copy/move a subtree into another tree
6. I can add new nodes or delete nodes at every level
7. I should be able to serialize this data
I can currently think of a "hashmap of hashmaps" kind of data structure. I can always cache the answers to operations 3 and 4 at every node. Of course, I need to clear that cache when I copy, move, etc.
Are there other ways to implement this that give the best performance for the operations above with a minimal memory footprint?
For basic modeling, you should use the Composite pattern:
public class TreeNode {
private String id;
private TreeNode parent;
private List<TreeNode> treeNodes = new ArrayList<>();
...
}
Each node has a String id, a reference to its parent, and references its children.
You can get the top root by following getParent() until it returns null (iteratively or with recursion).
For parsing the path, imagine something like:
public TreeNode get(final String path) {
if (!path.isEmpty()) {
for (TreeNode treeNode : treeNodes) {
if (path.startsWith(treeNode.getId())) {
return treeNode.get(path.substring(...));
}
}
}
return this;
}
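A more complete, still minimal, sketch of the same Composite idea in Java, assuming dot-separated paths whose first segment is the root's id and children keyed by id in a map. The method names (get, set, root, isSpecial, moveTo) are hypothetical. Here isSpecial walks up the ancestors instead of caching, which costs O(depth) per call but leaves nothing to invalidate after a move; copying would be a recursive clone and is omitted.

```java
import java.util.*;

// Hypothetical composite node: dot-separated paths, children keyed by id.
final class TreeNode {
    private final String id;
    private TreeNode parent;
    private final Map<String, TreeNode> children = new LinkedHashMap<>();
    private Object value;
    private boolean special; // true only on the node that was explicitly marked

    TreeNode(String id) {
        this.id = id;
    }

    TreeNode addChild(String childId) {
        TreeNode child = new TreeNode(childId);
        child.parent = this;
        children.put(childId, child);
        return child;
    }

    // Resolves a path such as "a.b.d.g", whose first segment must be this node's id.
    // Returns null if any segment is missing.
    TreeNode get(String path) {
        String[] segments = path.split("\\.");
        if (!segments[0].equals(id)) {
            return null;
        }
        TreeNode node = this;
        for (int i = 1; i < segments.length && node != null; i++) {
            node = node.children.get(segments[i]);
        }
        return node;
    }

    void set(String path, Object newValue) {
        TreeNode node = get(path);
        if (node == null) {
            throw new NoSuchElementException(path);
        }
        node.value = newValue;
    }

    Object value() {
        return value;
    }

    // Operation 3: follow parent links up to the root.
    TreeNode root() {
        TreeNode node = this;
        while (node.parent != null) {
            node = node.parent;
        }
        return node;
    }

    void markSpecial() {
        special = true;
    }

    // Operation 4: special if this node or any ancestor was marked. Nothing is cached,
    // so nothing has to be invalidated when a subtree is moved.
    boolean isSpecial() {
        for (TreeNode n = this; n != null; n = n.parent) {
            if (n.special) {
                return true;
            }
        }
        return false;
    }

    // Operation 5 (move): detach from the current parent and attach under newParent.
    // Assumes newParent is not inside this subtree; a sibling with the same id is replaced.
    void moveTo(TreeNode newParent) {
        if (parent != null) {
            parent.children.remove(id);
        }
        parent = newParent;
        newParent.children.put(id, this);
    }
}
```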
Now, if you are looking for a way to store this kind of (graph-like) data and run performant queries on it, you can consider using a graph database as @sebgymn mentioned: Neo4j is a great database for that in Java.
It is about using a connected data model with NoSQL. Nodes store data in properties; relationships are also stored and explicitly named in Neo4j and act as links between nodes. You can then execute queries on a node (its properties, its relationships to others, etc.).
Here is a link to a presentation: http://fr.slideshare.net/neo4j/data-modeling-with-neo4j
A great tutorial: http://technoracle.blogspot.fr/2012/04/getting-started-with-neo4j-beginners.html
A test graph database to execute queries: http://www.neo4j.org/learn/cypher
For instance, you could try implementing a multi-level pattern tree in Neo4j (in your case it is important to find the top root, so your model seems to have different levels in the tree).

Neo4j Using a custom evaluator to get the depth a node was retrieved at

I'm using the latest version of Neo4j to build up a graph of nodes and relationships with the Java API.
My problem is that I need to traverse the nodes to a certain depth. There may exist a relationship between two nodes at the same depth in the database, but I don't want to return that relationship in the traversal.
I tried to make a custom Evaluator by implementing the Evaluator interface, but the only method it declares is Evaluation evaluate(Path path). It doesn't appear to have any notion of depth associated with it.
I would really appreciate some advice on how to either associate a node with its depth (when traversing from a particular node) or prune a relationship where two nodes are in the same level.
You can use Evaluators.atDepth() to get a predefined Evaluator that only includes paths with a certain depth.
In your custom evaluator you can simply check the length of the passed path parameter to decide if you want to include this path or not e.g. with:
public Evaluation evaluate(Path path) {
    // path.length() is the number of relationships from the start node, i.e. the depth
    return path.length() == 4 ? Evaluation.INCLUDE_AND_PRUNE : Evaluation.EXCLUDE_AND_CONTINUE;
}
Have you tried Cypher for that? Something like:
start n = node(1) match p=n-[*4]->(x) return x, length(p)
The path has a length(), which is the depth. The length equals the number of relationships in the path, i.e. the number of nodes minus 1.
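Outside of Neo4j, the idea can be illustrated with a plain breadth-first search in Java: the depth of a node is the number of relationships from the start (the same quantity as path.length()), and a relationship whose two ends sit at the same depth is simply not returned. The graph representation and method names are hypothetical; this is not the Neo4j traversal API.

```java
import java.util.*;

final class DepthLimitedBfs {
    // Breadth-first search from start; returns node -> depth (number of relationships
    // from start) for every node reachable within maxDepth.
    static Map<String, Integer> depths(Map<String, List<String>> graph, String start, int maxDepth) {
        Map<String, Integer> depth = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        depth.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            int d = depth.get(node);
            if (d == maxDepth) {
                continue; // prune: do not expand past the requested depth
            }
            for (String next : graph.getOrDefault(node, List.of())) {
                if (!depth.containsKey(next)) {
                    depth.put(next, d + 1);
                    queue.add(next);
                }
            }
        }
        return depth;
    }

    // Keeps only relationships that go exactly one level down; edges between two
    // nodes at the same depth (or pointing back up) are dropped.
    static List<String> downwardEdges(Map<String, List<String>> graph, Map<String, Integer> depth) {
        List<String> edges = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : depth.entrySet()) {
            for (String next : graph.getOrDefault(entry.getKey(), List.of())) {
                Integer nextDepth = depth.get(next);
                if (nextDepth != null && nextDepth == entry.getValue() + 1) {
                    edges.add(entry.getKey() + "->" + next);
                }
            }
        }
        return edges;
    }
}
```

For a graph 1->{2, 3}, 2->{3, 4}, 3->{5}, nodes 2 and 3 both sit at depth 1, so the 2->3 relationship is left out of the result.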

Load parent/child hierarchy with hibernate controlling leafs

I am trying to load a graph of parent/child objects from a database (similar to Java's DefaultMutableTreeNode). There is a simple one-to-many association between the two. The total number of levels of the graph is known, so I know exactly how many times to invoke the getChildren() method.
What I want to do is NOT call this method for the actual leaf nodes. Usually the graph consists of a few non-leaf nodes and several hundred leaf nodes. If I specify lazy=false in the Hibernate mapping, I get hundreds of unnecessary queries from Hibernate for the children of leaf nodes, whereas I know beforehand that they are not needed (since I know the total number of levels in the tree).
Unfortunately, I cannot use lazy=true and only loop down to the parents of the leaf nodes, because I am working on a disconnected client/server model and using beanlib to load the whole object graph (which contains several other objects).
So I am trying to find a way to intercept the loading of the children collection and instruct Hibernate to stop when it reaches the leaf nodes. Is there a way to do that?
I am looking at 2 solutions:
What I have in mind is this: when I call node.getChildren() (within a Hibernate session), Hibernate normally performs a DB query to get the children. Is there a way to intercept this call and just not make it? I know that there are no children, so I just want it to fail fast (in fact, I don't want to make it at all).
Why don't you just use a boolean leaf property, and make your getChildren method return an empty list if leaf is true?
private boolean leaf;
private List<Node> children;
// Leaf nodes never touch the (lazy) children collection, so no query is issued for them.
public List<Node> getChildren() {
    if (leaf) {
        return Collections.emptyList();
    }
    return children;
}
Unless your database is colocated with the Java code issuing these queries, issuing a query per node is probably a performance bottleneck, even if it is just a query per inner node. Since you know the maximum number of levels of the tree (let's assume 3 for the sake of example), the following ought to fetch the entire tree in a single query:
from Node n1
left join fetch n1.children as n2
left join fetch n2.children as n3
left join fetch n3.children as n4
The disadvantage of this method is that the result set repeats the data of each inner node for each of its descendants, i.e. the bandwidth taken is multiplied by the number of tree levels. If that is an issue because you have many levels, you could enable batch fetching for that collection, or even do something similar by hand:
List<Node> border = Collections.singletonList(rootNode);
while (!border.isEmpty()) {
List<Integer> ids = new ArrayList<Integer>();
for (Node n : border) {
ids.add(n.getId());
}
// initialize the children collection in all nodes in border
session.createQuery("select distinct n from Node n left join fetch n.children where n.id in (:ids)").setParameterList("ids", ids).list();
List<Node> newBorder = new ArrayList<Node>();
for (Node n : border) {
newBorder.addAll(n.getChildren());
}
border = newBorder;
}
This will issue as many queries as there are levels in the tree, and transmit the data for each node twice. (Some databases restrict the size of an IN clause; in that case you would have to batch within each level.)
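The level-by-level loop above can be sketched without Hibernate to show the query count and the IN-clause batching. Here fetchChildren is a hypothetical stand-in for one query of the form "... where n.id in (:ids)", node ids are assumed to be Integers, and maxInClause is an assumed database limit.

```java
import java.util.*;
import java.util.function.Function;

final class LevelLoader {
    // Loads a tree breadth-first, one level at a time. fetchChildren stands in for a single
    // query "... where n.id in (:ids)" and returns parentId -> childIds for the given ids.
    // Assumes maxInClause > 0.
    static List<List<Integer>> loadLevels(int rootId, int maxInClause,
            Function<List<Integer>, Map<Integer, List<Integer>>> fetchChildren) {
        List<List<Integer>> levels = new ArrayList<>();
        List<Integer> border = List.of(rootId);
        while (!border.isEmpty()) {
            levels.add(border);
            List<Integer> nextBorder = new ArrayList<>();
            // Split wide levels so that no IN clause carries more than maxInClause ids.
            for (int from = 0; from < border.size(); from += maxInClause) {
                List<Integer> chunk = border.subList(from, Math.min(from + maxInClause, border.size()));
                Map<Integer, List<Integer>> children = fetchChildren.apply(chunk);
                for (Integer id : chunk) {
                    nextBorder.addAll(children.getOrDefault(id, List.of()));
                }
            }
            border = nextBorder;
        }
        return levels;
    }
}
```

With a three-level tree and a large maxInClause this makes three calls to fetchChildren, one per level (the last one finds no children and ends the loop); lowering maxInClause only splits the wider levels into more calls.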
You can use AOP around advice on the getChildren call that does something like this (please note this is very rough pseudo-code; you will have to fill in the blanks):
childrenResult = node.getChildren()
if (Hibernate.isInitialized(childrenResult)) {
return node.getChildren()
} else {
// Do something else here
}
When you call getChildren and the collection is not initialized, the call can be ignored or not allowed to continue processing. If the collection is initialized, the call is allowed to continue. One thing to note about Hibernate.isInitialized is that it returns true for ALL objects except lazy-loaded collections that have not been populated yet.
If you are not able to use AOP, you could always do this check on your own call to getChildren in your code.

resolve a tree knowing ALL parents and ancestors of leaf nodes

It is pretty straightforward to build a tree if you know the immediate parent of every node. But what if you have information about ALL the ancestors of the leaf nodes (parents, grandparents, great-grandparents, etc.) without knowing which one is the immediate parent?
For example consider the following tree:
A -----> B ------> C -----> G
|
D ------> E
|
F
The information available to describe this tree is the following CSV file:
child, parent
E,D
E,B
E,A
F,D
F,B
G,C
G,B
G,A
F,A
Could you please give some advice on a general algorithm to solve this?
parents(F) = {A,B,D}
parents(E) = {A,B,D}
parents(G) = {A,B,C}
It is not possible to recreate the tree from this data set, because we can't tell from the data which node is the root: is it A or B? Every leaf lists both A and B as ancestors, so A above B and B above A are equally consistent with the data.
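A sketch in Java of how far the data does get you: for each ancestor, collect the set of leaves that list it. In a tree, an ancestor's leaf set contains the leaf sets of everything below it, so the nearest parent of an inner node is the ancestor with the smallest strictly larger leaf set. Ancestors with identical leaf sets, here A and B, cannot be ordered, which is the ambiguity described above. The method names are hypothetical, and rows are assumed to be "child,parent" strings without the header.

```java
import java.util.*;

final class AncestorTree {
    // Builds ancestor -> set of leaves that list it, from "child,parent" rows.
    static Map<String, Set<String>> leavesByAncestor(List<String> rows) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (String row : rows) {
            String[] parts = row.split(",");
            String leaf = parts[0].trim();
            String ancestor = parts[1].trim();
            result.computeIfAbsent(ancestor, k -> new TreeSet<>()).add(leaf);
        }
        return result;
    }

    // Ancestors covering exactly the same leaves: their relative order is not recoverable.
    static List<Set<String>> ambiguousGroups(Map<String, Set<String>> leavesByAncestor) {
        Map<Set<String>, Set<String>> ancestorsByLeafSet = new LinkedHashMap<>();
        leavesByAncestor.forEach((ancestor, leaves) ->
                ancestorsByLeafSet.computeIfAbsent(leaves, k -> new TreeSet<>()).add(ancestor));
        List<Set<String>> groups = new ArrayList<>();
        for (Set<String> group : ancestorsByLeafSet.values()) {
            if (group.size() > 1) {
                groups.add(group);
            }
        }
        return groups;
    }

    // Candidate immediate parents of an inner node x: ancestors whose leaf set strictly
    // contains x's leaf set and is as small as possible. More than one candidate means
    // the data does not determine the parent; none means x is a root candidate.
    static Set<String> parentCandidates(String x, Map<String, Set<String>> leavesByAncestor) {
        Set<String> own = leavesByAncestor.get(x);
        Set<String> candidates = new TreeSet<>();
        int best = Integer.MAX_VALUE;
        for (Map.Entry<String, Set<String>> e : leavesByAncestor.entrySet()) {
            Set<String> other = e.getValue();
            if (other.size() > own.size() && other.containsAll(own)) {
                if (other.size() < best) {
                    best = other.size();
                    candidates.clear();
                }
                if (other.size() == best) {
                    candidates.add(e.getKey());
                }
            }
        }
        return candidates;
    }
}
```

On the CSV above this gives A and B the same leaf set {E, F, G}, so both C and D end up with the two parent candidates {A, B}.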
