Load parent/child hierarchy with Hibernate, controlling the leaves - Java

I am trying to load a graph of parent/child objects from a database (similar to Java's DefaultMutableTreeNode). There is a simple one-to-many association between the two. The total number of levels in the graph is known, so I know exactly how many times to invoke the getChildren() method.
What I want is to NOT call this method for the actual leaf nodes. Usually the graph consists of a few non-leaf nodes and several hundred leaf nodes. If I specify lazy=false in the Hibernate mapping, Hibernate issues hundreds of unnecessary queries for the children of the leaf nodes, even though I know beforehand that they are not needed (since I know the total number of levels in the tree).
Unfortunately I cannot use lazy=true and loop only down to the parents of the leaf nodes, because I am working in a disconnected client/server model and use beanlib to load the whole object graph (which contains several other objects).
So I am trying to find a way to intercept the loading of the 'children' collection and tell Hibernate to stop when it reaches the leaf nodes. Is there a way to do that?
I am looking at two solutions:
What I have in mind is this: when I call node.getChildren() (within a Hibernate session), Hibernate normally performs a database query to get the children. Is there a way to intercept this call and simply not make it? I know there are no children, so I want it to fail fast (in fact, I don't want the query to be made at all).
Thank you
Costas

Why don't you just use a boolean leaf property, and make your getChildren method return an empty list if leaf is true?
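// requires java.util.Collections and java.util.List
private boolean leaf;
private List<Node> children;

public List<Node> getChildren() {
    if (leaf) {
        // leaf nodes never need their children collection loaded
        return Collections.<Node>emptyList();
    }
    return children;
}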

Unless your database is colocated with the Java code issuing these queries, issuing a query per node is probably a performance bottleneck, even if it is just a query per inner node. Since you know the maximum number of levels in the tree (let's assume 3 for the sake of example), the following ought to fetch the entire tree in a single query:
from Node n1
left join fetch n1.children as n2
left join fetch n2.children as n3
left join fetch n3.children as n4
The disadvantage of that method is that the result set repeats the data of each inner node for each of its descendants, i.e. the bandwidth taken is multiplied by the number of tree levels. If that is an issue because you have many levels, you could enable batch fetching for that collection, or even do something similar by hand:
// Loads the tree level by level: one query per level
List<Node> border = Collections.singletonList(rootNode);
while (!border.isEmpty()) {
    List<Integer> ids = new ArrayList<Integer>();
    for (Node n : border) {
        ids.add(n.getId());
    }
    // initialize the children collection of every node in border
    session.createQuery("select distinct n from Node n left join fetch n.children where n.id in (:ids)")
           .setParameterList("ids", ids)
           .list();
    List<Node> newBorder = new ArrayList<Node>();
    for (Node n : border) {
        newBorder.addAll(n.getChildren());
    }
    border = newBorder;
}
This issues as many queries as there are levels in the tree and transmits the data for each node twice. (Some databases restrict the size of an IN clause; in that case you'd have to batch within each level.)
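The level-by-level loop can be tried without a database. In the sketch below, LevelLoader, queryChildren and the in-memory map of child ids are hypothetical stand-ins for the Hibernate session and the Node table; each call to queryChildren represents one query. It also splits each level into chunks, for databases that limit the size of an IN clause.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the database: parent id -> child ids.
// queryChildren() plays the role of one "... where n.id in (:ids)" query.
class LevelLoader {
    private final Map<Integer, List<Integer>> childIdsByParent;
    private final int maxInClauseSize;
    int queryCount = 0; // number of simulated queries issued so far

    LevelLoader(Map<Integer, List<Integer>> childIdsByParent, int maxInClauseSize) {
        this.childIdsByParent = childIdsByParent;
        this.maxInClauseSize = maxInClauseSize;
    }

    // One simulated query: looks up the children of every id in the chunk.
    private Map<Integer, List<Integer>> queryChildren(List<Integer> chunk) {
        queryCount++;
        Map<Integer, List<Integer>> result = new HashMap<>();
        for (Integer id : chunk) {
            result.put(id, childIdsByParent.getOrDefault(id, Collections.<Integer>emptyList()));
        }
        return result;
    }

    /** Loads every node below rootId, level by level; returns parent id -> child ids. */
    Map<Integer, List<Integer>> loadTree(int rootId) {
        Map<Integer, List<Integer>> loaded = new HashMap<>();
        List<Integer> border = Collections.singletonList(rootId);
        while (!border.isEmpty()) {
            List<Integer> newBorder = new ArrayList<>();
            // Split the current level into chunks no larger than the IN-clause limit.
            for (int from = 0; from < border.size(); from += maxInClauseSize) {
                List<Integer> chunk = border.subList(from, Math.min(from + maxInClauseSize, border.size()));
                Map<Integer, List<Integer>> children = queryChildren(chunk);
                loaded.putAll(children);
                for (Integer id : chunk) {
                    newBorder.addAll(children.get(id));
                }
            }
            border = newBorder;
        }
        return loaded;
    }
}
```

For a three-level tree with a generous limit this issues three queries, one per level; with a limit of 2 ids per IN clause, a level of 3 nodes costs two queries instead of one.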

You can use AOP around advice on the getChildren call that does something like this (note that this is very rough pseudo-code; you will have to fill in the blanks):
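// Hibernate.isInitialized() is org.hibernate.Hibernate.isInitialized(Object)
List<Node> childrenResult = node.getChildren();
if (Hibernate.isInitialized(childrenResult)) {
    return childrenResult;
} else {
    // Do something else here, e.g. skip the load and return an empty list
}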
When getChildren is called and the collection is not initialized, the call can then be ignored or prevented from continuing; if the collection is initialized, the call proceeds as usual. Note that Hibernate.isInitialized returns true for everything except lazy-loaded collections (and proxies) that have not been populated yet.
If you are not able to use AOP, you could always do this check on your own call to getChildren in your code.
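The shape of such around advice can be seen without Spring AOP or AspectJ by using a JDK dynamic proxy (java.lang.reflect.Proxy). The sketch below is a variant that checks the known leaf depth (which the question says is known in advance) instead of calling Hibernate.isInitialized. TreeNodeView, SimpleNode and LeafGuard are hypothetical names; because Hibernate entities are normally classes rather than interfaces, a real setup would more likely rely on an AOP framework.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical node interface, used only for this sketch.
interface TreeNodeView {
    int getDepth();
    List<TreeNodeView> getChildren();
}

// Hypothetical in-memory node; each real getChildren() call stands for one children query.
class SimpleNode implements TreeNodeView {
    static int loads = 0;
    private final int depth;
    private final List<TreeNodeView> children = new ArrayList<>();

    SimpleNode(int depth) { this.depth = depth; }

    SimpleNode add(SimpleNode child) { children.add(child); return this; }

    public int getDepth() { return depth; }

    public List<TreeNodeView> getChildren() {
        loads++;
        return children;
    }
}

// "Around advice" implemented with a JDK dynamic proxy.
class LeafGuard {
    static TreeNodeView guard(TreeNodeView target, int leafDepth) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getName().equals("getChildren")) {
                if (target.getDepth() >= leafDepth) {
                    // Known leaf level: answer immediately, never touch the target.
                    return Collections.<TreeNodeView>emptyList();
                }
                // Inner node: proceed, then guard the children as well.
                List<TreeNodeView> guarded = new ArrayList<>();
                for (TreeNodeView child : target.getChildren()) {
                    guarded.add(guard(child, leafDepth));
                }
                return guarded;
            }
            return method.invoke(target, args); // any other call passes straight through
        };
        return (TreeNodeView) Proxy.newProxyInstance(
                TreeNodeView.class.getClassLoader(),
                new Class<?>[] {TreeNodeView.class},
                handler);
    }
}
```

Walking a guarded tree with two inner nodes and six leaves triggers only three real getChildren() calls (the root and the two inner nodes); the leaves answer with an empty list on their own.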

Related

Graph traversal for hierarchical relations

I have a graph that represents a parent-child hierarchy. It also holds the relationships between objects.
In my graph, the orange edges are the parent-child hierarchy and the green edges are the relations. So if I want the relation between E and F, I get the relation between B and C (since they are the parents of E and F). This relation lookup can go all the way up to the topmost parents.
I can find the parents of a node using Gremlin query like
g.V().has('name', 'D').repeat(out('parent')).emit().values('name')
This query returns B and A.
Q. Along similar lines, does Gremlin or any other graph query language support this kind of relation inheritance? How should the Gremlin query be formed?
Note: the graph can be very large, containing many unique nodes and many unique relations. I want to get the inherited relations quickly, so that I won't have to pre-calculate and cache them or make duplicates for quick reference.
I assume that by inheritance you mean the ability to traverse from grandparent to parent to child to grandchild, etc. ArangoDB supports traversals and can traverse these types of relationships very fast. For example, to reproduce your example above of starting at node D and getting nodes B and A, you could do something like:
// Find all nodes that are named d
let dNodes = (FOR test in test2
FILTER test.name == 'd'
RETURN test)
//Traverse outbound relationships starting at the dNodes and return up to 2 nodes up the hierarchy
FOR node in dNodes
FOR v,e IN 1..2 OUTBOUND node
testEdge
RETURN v
In terms of performance, I have traversed irregular hierarchies with thousands of nodes without performance issues and without having to cache anything. Keep in mind, however, that there is no magic here: a bad data model will cause trouble no matter the database engine.
There is some performance information here if you want to review it and play with it.
Traversing multiple edges (relationship types) is very similar to our earlier example. To find the path from E to F using hierarchy (orange) edges and relationship (green) edges, we can do :
// Find all nodes that are named E
let eNodes = (FOR test in test3
FILTER test.name == 'E'
RETURN test
)
// Start at node E and go up to three steps
// Traverse the hierarchy edges in any direction (so that we can find parent and child nodes)
// Traverse the relatedTo (green) edges in the outbound direction only
// Keep only traversals that end at vertex F and return the path (E<-B->C->F)
FOR node in eNodes
FOR v,e,p IN 1..3 ANY node
parentOf, OUTBOUND relatedTo
FILTER v.name == 'F'
RETURN p
Or if we just want the shortest path between E and F we can do for example:
let eNodes = (FOR test in test3
FILTER test.name == 'E'
RETURN test
)
//Find shortest path between node E and F and return the path (E<-B->C->F)
FOR node in eNodes
FOR v, e IN ANY SHORTEST_PATH
node TO 'test3/F'
parentOf, OUTBOUND relatedTo
RETURN e
Note that I just used the id of the "F" record in the code above, but we could have searched for the record by name, just like we did for the "E" record.
Also note we created the edge data for our example as directed edges in the DB: parentOf edges were created from parent to child (ex: A to B) and for green relationships edges we created them alphabetically (ex: B to C).
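For comparison, here is an in-memory Java sketch of the lookup the question describes: a relation between two nodes is searched on the nodes themselves and then on their ancestors. It assumes a single parent per node, and the example hierarchy (A parent of B and C, B parent of D and E, C parent of F, one green relation between B and C) is reconstructed from the text, since the original picture is not available. RelationLookup and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

class RelationLookup {
    private final Map<String, String> parentOf = new HashMap<>(); // child -> parent
    private final Set<String> relations = new HashSet<>();        // "X|Y" pairs, stored both ways

    void addParent(String parent, String child) { parentOf.put(child, parent); }

    void addRelation(String a, String b) {
        relations.add(a + "|" + b);
        relations.add(b + "|" + a);
    }

    // The node itself followed by its parent, grandparent, ... up to the root.
    private List<String> selfAndAncestors(String node) {
        List<String> chain = new ArrayList<>();
        for (String n = node; n != null; n = parentOf.get(n)) {
            chain.add(n);
        }
        return chain;
    }

    /** Checks x and its ancestors against y and its ancestors; returns "X-Y" or empty. */
    Optional<String> findRelation(String x, String y) {
        for (String ax : selfAndAncestors(x)) {
            for (String ay : selfAndAncestors(y)) {
                if (relations.contains(ax + "|" + ay)) {
                    return Optional.of(ax + "-" + ay);
                }
            }
        }
        return Optional.empty();
    }
}
```

With that data, findRelation("E", "F") finds the B-C relation. Each lookup costs at most depth(x) × depth(y) set lookups and nothing is precomputed or cached, which is the property the question asks for.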

Hierarchical Data structure design

I have hierarchical data, something like:
It has the following characteristics:
A node can have any number of children.
Nodes can be marked as special. Once a node is marked special, the whole subtree starting from that node becomes special.
These are the operations I want to perform:
1. Tree.get("a.b.d.g") should give me node g
2. Tree.set("a.b.d.g", value) sets node g's value
3. At any node, I should know which node is the root
4. At any node, I should know whether I'm part of a special subtree
5. I should be able to copy/move a subtree into another tree
6. I can add or delete nodes at every level
7. I should be able to serialize this data
I can currently think of a "hashmap of hashmaps" kind of data structure. I can always cache the answers to operations 3 and 4 at every node; of course, I need to clear that cache when I copy or move a subtree, etc.
Are there any other implementations that achieve the best performance for the above operations with a minimal memory footprint?
For basic modeling, you should use a Composite pattern:
public class TreeNode {
private String id;
private TreeNode parent;
private List<TreeNode> treeNodes = new ArrayList<>();
...
}
Each node has a String id, a reference to its parent, and references its children.
You can get the top root by calling getParent() repeatedly until it returns null (either in a loop or recursively).
For parsing the path imagine something like:
public TreeNode get(final String path) {
if (!path.isEmpty()) {
for (TreeNode treeNode : treeNodes) {
if (path.startsWith(treeNode.getId())) {
return treeNode.get(path.substring(...));
}
}
}
return this;
}
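Filling in the path parsing, a minimal sketch could look like the following. Assumptions: children are kept in a LinkedHashMap keyed by id (so each path segment is one hash lookup rather than a startsWith scan), paths are relative to the node get() is called on, and the root and "special" status are computed on demand by walking up the parent chain instead of being cached, so nothing needs clearing after a move. Copying and serialization are left out, and addChild does not check for cycles.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

class TreeNode {
    private final String id;
    private TreeNode parent;
    private Object value;
    private boolean markedSpecial;
    private final Map<String, TreeNode> children = new LinkedHashMap<>();

    TreeNode(String id) { this.id = id; }

    /** Attaches child here; if it already had a parent, it is moved (detached first). */
    TreeNode addChild(TreeNode child) {
        if (child.parent != null) {
            child.parent.children.remove(child.id);
        }
        child.parent = this;
        children.put(child.id, child);
        return child;
    }

    /** Resolves a dotted path such as "a.b.d.g" below this node; null if absent. */
    TreeNode get(String path) {
        TreeNode current = this;
        for (String segment : path.split("\\.")) {
            current = current.children.get(segment);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    void set(String path, Object newValue) {
        TreeNode node = get(path);
        if (node == null) {
            throw new NoSuchElementException(path);
        }
        node.value = newValue;
    }

    Object getValue() { return value; }

    void markSpecial() { markedSpecial = true; }

    TreeNode getRoot() {
        TreeNode n = this;
        while (n.parent != null) {
            n = n.parent;
        }
        return n;
    }

    /** True if this node or any of its ancestors was marked special. */
    boolean isSpecial() {
        for (TreeNode n = this; n != null; n = n.parent) {
            if (n.markedSpecial) {
                return true;
            }
        }
        return false;
    }
}
```

Computing operations 3 and 4 by walking up costs O(depth) per call; caching them would make the calls O(1) at the price of the invalidation work the question mentions.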
Now, if you are looking for a way to store this kind of (graph-like) data and run performant queries on it, you can consider using a graph database, as @sebgymn mentioned: Neo4j is a great database for that in Java.
It is about using a connected data model with NoSQL. Nodes store data in properties; relationships are also stored, are explicitly named in Neo4j, and act as links between nodes. You can then execute queries on a node (its properties, its relationships to others...).
Here is a link to a presentation: http://fr.slideshare.net/neo4j/data-modeling-with-neo4j
A great tutorial: http://technoracle.blogspot.fr/2012/04/getting-started-with-neo4j-beginners.html
A test graph database to execute queries: http://www.neo4j.org/learn/cypher
For instance, you can try implementing a multi-level pattern tree in Neo4j (checking the top root matters in your case, so your model seems to have different levels in the tree).

What's an intelligent way of creating a graph like object for the Ford-Fulkerson algorithm?

I'm trying to implement a Ford–Fulkerson algorithm in java and I've been having some problems where my code gets obnoxiously and unnecessarily complicated.
What I do want is to have:
class Node {
    private int id;
    private static int idAssigner; // I may move this to another class
    // etc.
}
class flowNetwork {
    private Node source; // begin point
    private Node sink;   // end point
}
Now I want to group nodes similarly to how I would in a (bidirectional) tree: each node has a list of all the nodes it's connected to.
My problem is this: how can I give this connection a value (maximum flow, current flow)?
Should I make another class, Connection, that has Node A, Node B and max flow / current flow? And if I do that, how should I connect the nodes? (As in, should every node have a Connection, and wouldn't that be redundant?) I'm a bit stuck.
Edit: Or should I just have Connections and implement some sort of search function to accommodate linking elements? It's all I can think of, to be honest.
P.S.
This class is mostly just the math part, so I have never implemented a graph, and the course doesn't cover it either, so thank you for helping a novice :) (that's if this doesn't get closed in like 5 minutes).
I think you can use a map of linked nodes in each node, with the node as the key and the link information as the value.
It's not a fast solution, but it's simple.
A faster option is a matrix whose elements are link objects containing all the link information; rows and columns are node indices.
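Following the idea of storing the link information per connection, here is a sketch in which each connection is an Edge object (capacity, current flow) kept in the adjacency list of its start node and paired with a reverse edge, so the residual graph needs no separate structure. Nodes are plain int indices standing in for the question's Node ids, and augmenting paths are found with BFS, which makes this the Edmonds-Karp variant of Ford-Fulkerson. FlowNetwork, Edge and the method names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class FlowNetwork {
    static final class Edge {
        final int from, to;
        final int capacity;
        int flow;      // current flow; negative on a reverse edge that cancels flow
        Edge reverse;  // paired edge in the opposite direction

        Edge(int from, int to, int capacity) {
            this.from = from;
            this.to = to;
            this.capacity = capacity;
        }

        int residual() { return capacity - flow; }
    }

    private final List<List<Edge>> adjacency = new ArrayList<>();

    FlowNetwork(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            adjacency.add(new ArrayList<>());
        }
    }

    /** Adds a connection plus its zero-capacity reverse edge; returns the forward edge. */
    Edge addEdge(int from, int to, int capacity) {
        Edge forward = new Edge(from, to, capacity);
        Edge backward = new Edge(to, from, 0);
        forward.reverse = backward;
        backward.reverse = forward;
        adjacency.get(from).add(forward);
        adjacency.get(to).add(backward);
        return forward;
    }

    int maxFlow(int source, int sink) {
        int total = 0;
        while (true) {
            // BFS over edges with remaining capacity, remembering the edge used to reach each node.
            Edge[] via = new Edge[adjacency.size()];
            Deque<Integer> queue = new ArrayDeque<>();
            queue.add(source);
            while (!queue.isEmpty() && via[sink] == null) {
                int u = queue.poll();
                for (Edge e : adjacency.get(u)) {
                    if (e.residual() > 0 && via[e.to] == null && e.to != source) {
                        via[e.to] = e;
                        queue.add(e.to);
                    }
                }
            }
            if (via[sink] == null) {
                return total; // no augmenting path left
            }
            // Find the bottleneck on the path, then push that much flow along it.
            int push = Integer.MAX_VALUE;
            for (Edge e = via[sink]; e != null; e = via[e.from]) {
                push = Math.min(push, e.residual());
            }
            for (Edge e = via[sink]; e != null; e = via[e.from]) {
                e.flow += push;
                e.reverse.flow -= push;
            }
            total += push;
        }
    }
}
```

This also addresses the redundancy worry: each direction is its own Edge object stored once, in the list of the node it leaves from, and the two halves only reference each other.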

Elegant way to implement a navigable graph?

This is a design problem. I'm struggling to create a conceptual model for a problem I'm facing.
I have a graph of a number of objects (<1000). These objects are connected together in a myriad of ways. Each of these objects has some attributes.
I need to be able to access these object via both their connections and their attributes.
For example let us assume following objects -
{name: A, attributes:{black, thin, invalid}, connections: {B,C}}
{name: B, attributes:{white, thin, valid}, connections: {A}}
{name: C, attributes:{black, thick, invalid}, connections: {A,B}}
Now I should be able to query this graph in following ways -
Using attributes -
black - yields [A,C]
black.thick - yields C
Using connections -
A.connections[0].connections[0] - yields A
Using combination thereof -
black[0].connections[0] - yields B
My primary language is Java. But I don't think Java is capable of handling these kinds of beasts. Thus I'm trying to implement this in a dynamic language like Python.
I have also thought about using expression language evaluation like OGNL, or a Graph database. But I'm confused. I'm not interested in coding solutions. But what is the correct way to model such a problem?
It sounds like you have some object model which you want to query in different ways. One solution would be to use Java to create your model and then use a scripting language to support querying against this model in different ways. e.g: Java + Groovy would be my recommendation.
You could use the following Java class for the model.
public class Node {
private String name;
private final Set<String> attributes = new HashSet<String>();
private final List<Node> connections = new ArrayList<Node>();
// getter / setter for all
}
You should then populate a list of such objects with 'connections' property properly populated.
To support different kinds of scripting, create a context for the scripts and then populate it. The context is basically a map: its keys become variables available to the script. The trick is to populate this context so that it supports your querying requirements.
For example, in Groovy the binding is the context (refer to http://groovy.codehaus.org/Embedding+Groovy). If you populate it the following way, your querying needs will be taken care of:
Context/Binding Map
1. <Node name(String), Node object instance(Node)>
2. <Attribute name(String), list of nodes having this attribute(List<Node>)>
When you evaluate a script such as 'A.connections[0]', the object stored against key 'A' in the binding is looked up. Then the returned object's 'connections' property is accessed. Since that is a list, the '[0]' syntax is permitted on it in Groovy, and it returns the object at index 0. Likewise, you populate the context to support your other querying requirements.
It depends where you want your performance to be.
If you want fast queries, and don't mind a bit of extra time/memory when adding an object, keeping an array/list of pointers to objects with specific attributes might be a good idea (particularly if you know during design-time what the possible attributes could be). Then, when adding a new object, say:
{name: A, attributes:{black, thin, invalid}, connections: {B,C}}
add a new pointer to the black list, the thin list, and the invalid list. Quick queries on connections will probably require keeping a list/array of pointers as a member of the object class. Then when you create an object, add pointers for the correct objects.
If you don't mind slower queries and want to optimize performance when adding objects, a linked list might be a better approach. You can just loop through all of the objects, checking at each one if it satisfies the condition of the query.
In this case, it would still be a good idea to keep member pointers for the connections if (as your question seems to indicate) you're looking to do multiple-level queries (i.e. A.connections[0].connections[0]); these would perform extremely poorly if done via nested loops.
Hopefully that helps; it really depends on what kind of queries you expect to call most frequently.
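The "index on insert" idea from this answer can be sketched as follows: each attribute keeps a list of the objects that have it, so "black" is one map lookup and "black.thick" filters that list down to objects that also have "thick", while connections are held as direct references on each object. IndexedGraph, Obj and query are hypothetical names, and the dotted query syntax is borrowed from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class IndexedGraph {
    static final class Obj {
        final String name;
        final Set<String> attributes;
        final List<Obj> connections = new ArrayList<>(); // direct references, no search needed

        Obj(String name, String... attributes) {
            this.name = name;
            this.attributes = new HashSet<>(Arrays.asList(attributes));
        }
    }

    private final Map<String, Obj> byName = new LinkedHashMap<>();
    private final Map<String, List<Obj>> byAttribute = new HashMap<>();

    /** Adds an object and, as extra work at insert time, registers it in each attribute list. */
    Obj add(Obj o) {
        byName.put(o.name, o);
        for (String a : o.attributes) {
            byAttribute.computeIfAbsent(a, k -> new ArrayList<>()).add(o);
        }
        return o;
    }

    void connect(String from, String... to) {
        for (String t : to) {
            byName.get(from).connections.add(byName.get(t));
        }
    }

    /** Answers dotted attribute queries such as "black" or "black.thick". */
    List<Obj> query(String attributePath) {
        String[] parts = attributePath.split("\\.");
        List<Obj> result = new ArrayList<>(byAttribute.getOrDefault(parts[0], Collections.<Obj>emptyList()));
        for (int i = 1; i < parts.length; i++) {
            final String attr = parts[i];
            result.removeIf(o -> !o.attributes.contains(attr));
        }
        return result;
    }
}
```

With the question's three objects, query("black") returns A and C in insertion order, query("black.thick") returns C, and query("black").get(0).connections.get(0) is B, matching black[0].connections[0].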
There is no problem expressing this in Java. Just define classes representing nodes and sets of nodes. Assuming there is a fixed set of attributes, it could look like this:
enum Attribute {
BLACK, WHITE, THIN, VALID /* etc. */ ;
}
class Node {
public final String name;
public final EnumSet<Attribute> attrs
= EnumSet.noneOf(Attribute.class);
public final NodeSet connections
= new NodeSet();
public Node(String name)
{
this.name = name;
}
// ... methods for adding attributes and connections
}
and then a class that represents a set of nodes:
class NodeSet extends LinkedHashSet<Node> {
/**
* Returns the nodes that have at least one of the given attributes.
*/
public NodeSet with(Attribute... as) {
NodeSet out = new NodeSet();
for (Node n : this) {
for (Attribute a : as) {
if (n.attrs.contains(a)) {
out.add(n);
break;
}
}
}
return out;
}
/**
* Returns all nodes connected to this set.
*/
public NodeSet connections() {
NodeSet out = new NodeSet();
for(Node n : this)
out.addAll(n.connections);
return out;
}
/**
* Returns the first node in the set.
*/
public Node first() {
return iterator().next();
}
}
(I haven't checked that the code compiles; it's just a sketch.) Then, assuming you have a NodeSet named all containing all the nodes, you can do things like
all.with(Attribute.BLACK).first().connections
I think that solving this problem with a graph makes sense. You mention the possibility of using a graph database which I think will allow you to better focus on your problem as opposed to coding infrastructure. A simple in-memory graph like TinkerGraph from the TinkerPop project would be a good place to start.
By using TinkerGraph you then get access to a query language called Gremlin (also see GremlinDocs), which can help answer the questions you posed in your post. Here's a Gremlin session in the REPL that shows how to construct the graph you presented, along with some sample graph traversals that yield the answers you wanted. This first part simply constructs the graph from your example:
gremlin> g = new TinkerGraph()
==>tinkergraph[vertices:0 edges:0]
gremlin> a = g.addVertex("A",['color':'black','width':'thin','status':'invalid'])
==>v[A]
gremlin> b = g.addVertex("B",['color':'white','width':'thin','status':'valid'])
==>v[B]
gremlin> c = g.addVertex("C",['color':'black','width':'thick','status':'invalid'])
==>v[C]
gremlin> a.addEdge('connection',b)
==>e[0][A-connection->B]
gremlin> a.addEdge('connection',c)
==>e[1][A-connection->C]
gremlin> b.addEdge('connection',a)
==>e[2][B-connection->A]
gremlin> c.addEdge('connection',a)
==>e[3][C-connection->A]
gremlin> c.addEdge('connection',b)
==>e[4][C-connection->B]
Now the queries:
// black - yields [A,C]
gremlin> g.V.has('color','black')
==>v[A]
==>v[C]
// black.thick - yields C
gremlin> g.V.has('color','black').has('width','thick')
==>v[C]
// A.connections[0].connections[0] - yields A
gremlin> a.out.out[0]
==>v[A]
// black[0].connections[0] - yields B
gremlin> g.V.has('color','black')[0].out[0]
==>v[B]
While this approach does introduce some learning curve if you are unfamiliar with the stack, I think you'll find that graphs fit as solutions to many problems and having some experience with the TinkerPop stack will be generally helpful for other scenarios you encounter.

Neo4j Using a custom evaluator to get the depth a node was retrieved at

I'm using the latest version of Neo4j to build up a graph of nodes and relationships with the Java API.
My problem is that I need to traverse the nodes to a certain depth. There may exist a relationship between two nodes at the same depth in the database, but I don't want to return that relationship in the traversal.
I tried to make a custom Evaluator by implementing the Evaluator interface, but the only method to implement is Evaluation evaluate(Path path). It doesn't appear to have any notion of depth associated with it.
I would really appreciate some advice on how to either associate a node with its depth (when traversing from a particular node) or prune a relationship where two nodes are in the same level.
You can use Evaluators.atDepth() to get a predefined Evaluator that only includes paths with a certain depth.
In your custom evaluator you can simply check the length of the passed path parameter to decide if you want to include this path or not e.g. with:
@Override
public Evaluation evaluate(Path path) {
    // path.length() is the number of relationships, i.e. the depth of path.endNode()
    return path.length() == 4 ? Evaluation.INCLUDE_AND_PRUNE : Evaluation.EXCLUDE_AND_CONTINUE;
}
Have you tried Cypher for that? Something like
start n = node(1) match p=n-[*4]->(x) return x, length(p)
The path has a length(), which is the depth. The length is equal to the number of relationships in the path, i.e. number of nodes - 1.
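To see the idea outside Neo4j, here is a plain-Java sketch (not the Neo4j traversal API) of what both answers rely on: depth as the number of relationships from the start node, which is what path.length() reports. It records the depth at which a breadth-first search first reaches each node and keeps only relationships that go from depth d to depth d + 1, dropping those between two nodes on the same level. Relationships are treated as directed, and LevelTraversal, depths and levelEdges are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LevelTraversal {
    /** Returns node -> depth for every node reachable within maxDepth relationships. */
    static Map<String, Integer> depths(Map<String, List<String>> graph, String start, int maxDepth) {
        Map<String, Integer> depth = new LinkedHashMap<>();
        depth.put(start, 0);
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            String n = queue.poll();
            int d = depth.get(n);
            if (d == maxDepth) {
                continue; // do not expand beyond the requested depth
            }
            for (String m : graph.getOrDefault(n, Collections.<String>emptyList())) {
                if (!depth.containsKey(m)) { // first visit = shallowest depth in BFS
                    depth.put(m, d + 1);
                    queue.add(m);
                }
            }
        }
        return depth;
    }

    /** Relationships "a->b" between consecutive levels only; same-level ones are skipped. */
    static List<String> levelEdges(Map<String, List<String>> graph, Map<String, Integer> depth) {
        List<String> edges = new ArrayList<>();
        for (Map.Entry<String, Integer> e : depth.entrySet()) {
            for (String m : graph.getOrDefault(e.getKey(), Collections.<String>emptyList())) {
                Integer dm = depth.get(m);
                if (dm != null && dm == e.getValue() + 1) {
                    edges.add(e.getKey() + "->" + m);
                }
            }
        }
        return edges;
    }
}
```

In a graph where r points to a and b, a points to b and c, and b points to d, the relationship a->b joins two depth-1 nodes and is left out, while r->a, r->b, a->c and b->d are kept.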
