how to print all result from Gremlin traversal - java

I have a topology, it is
aws_vpc<------(composition)------test---(membership)---->location
So I use query
graph.traversal().V().hasLabel("test").or(
__.out("membership").hasLabel("location"),
__.out("composition").hasLabel("aws_vpc"))
to select it, but how to print all elements' name,
I want to output : test, membership, location,composition, aws_vpc.
Is there a way to achieve this?

You've written a traversal that only detects if "test" vertices have outgoing "membership" edges that have adjacent "location" vertices OR outgoing "composition" edges that have adjacent "aws_vpc" vertices, so all that traversal will return is "test" vertices that match that filter. It does not "select" anything more than that. In fact, the or() is immediately satisfied as soon as a single __.out("membership").hasLabel("location") or __.out("composition").hasLabel("aws_vpc") is returned in the order they are provided to or() so you don't even traverse all of those paths (which is a good thing for a filtering operation).
If you want to return all of the data you describe, you need to write your query in such a way so as to traverse it all and transform it into a format to return. A simple way to do this in your case would be to use project():
g.V().hasLabel('test').
project('data','memberships', 'compositions').
by(__.elementMap()).
by(__.outE("membership").as('e').
inV().hasLabel("location").as('v').
select('e','v').
by(elementMap()).
fold()).
by(__.outE("composition").as('e').
inV().hasLabel("aws_vpc").as('v').
select('e','v').
by(elementMap()).
fold())
This takes each "test" vertex and transforms it to a Map with three keys: "data", "memberships" and "competitions" and then each by() modulator specifies what to do with the current "test" vertex being transformed and places it in the respective key. Note that I chose select() to get the edge and vertex combinations but that could have been a project() step as well if you liked. The key there is to end with fold() so that you reduce the stream of edge data for each "test" vertex to a List of data that can put in the related "memberships" and "compositions" keys.

Related

Tinkerpop/Gremlin: select vertices together with outgoing edge count

I try to find an efficient gremlin query that returns a traversal with the vertex and the number of outgoing edges. Or even better instead of the number of outgoing edges a boolean value if outgoing edges exist or not.
Background: I try to improve the performance of a program that writes some properties on the vertices and then iterates over the outgoing edges to remove some of it. In a lot of cases there are no outgoing edges and the iteration
for (Iterator<Edge> iE = v.edges(Direction.OUT); iE.hasNext();) { ... } consumes a significant part of the runtime. So instead of resolving the ids to vertices (with gts.V(ids) I want to collect the information about the existence of outgoing edges to skip the iteration, if possible.
My first try was:
gts.V(ids).as("v").choose(__.outE(), __.constant(true), __.constant(false)).as("e").select("v", "e");
Second idea was:
gts.V(ids).project("v", "e").by().by(__.outE().count());
Both seem to work, but is there a better solution that does not require the underlying graph implementation to fetch or count all edges?
(We currently use the sqlg implementation of tinkerpop/gremlin with Postgresql and both queries seem to fetch all outgoing edges from Postgresql. This may be a case where some optimization is missing. But my question is not sqlg specific.)
If you only need to know whether edges exist or not then you should limit() results in the by() modulator:
gremlin> g.V().project('v','e').by().by(outE().limit(1).count())
==>[v:v[1],e:1]
==>[v:v[2],e:0]
==>[v:v[3],e:0]
==>[v:v[4],e:1]
==>[v:v[5],e:0]
==>[v:v[6],e:1]
In this way you don't count all of the edges, just the first which is enough to answer your question. You can do true and false if you like with a minor modification:
gremlin> g.V().project('v','e').by().by(coalesce(outE().limit(1).constant(true),constant(false)))
==>[v:v[1],e:true]
==>[v:v[2],e:false]
==>[v:v[3],e:false]
==>[v:v[4],e:true]
==>[v:v[5],e:false]
==>[v:v[6],e:true]

Depth-first tree traversal in TinkerPop graph

Given a tree-structured TinkerPop graph with vertices connected by labeled parent-child relationships ([parent-PARENT_CHILD->child]), what's the idiomatic way to traverse and find all those nodes?
I'm new to graph traversals, so it seems more or less straightforward to traverse them with a recursive function:
Stream<Vertex> depthFirst(Vertex v) {
Stream<Vertex> selfStream = Stream.of(v);
Iterator<Vertex> childIterator = v.vertices(Direction.OUT, PARENT_CHILD);
if (childIterator.hasNext()) {
return selfStream.appendAll(
Stream.ofAll(() -> childIterator)
.flatMap(this::depthFirst)
);
}
return selfStream;
}
(N.b. this example uses Vavr streams, but the Java stream version is similar, just slightly more verbose.)
I assume a graph-native implementation would be more performant, especially on databases other than the in-memory TinkerGraph.
However, when I look at the TinkerPop tree recipes, it's not obvious what combination of repeat() / until() etc. is the right one to do what I want.
If I then want to find only those vertices (leaf or branch) having a certain label, again, I can see how to do it with the function above:
Stream<Vertex> nodesWithMyLabel = depthFirst(root)
.filter(v -> "myLabel".equals(v.label()));
but it's far from obvious that this is efficient, and I assume there must be a better graph-native approach.
If you are using TinkerPop, it is best to just write your traversals with Gremlin. Let's use the tree described in the recipe:
g.addV().property(id, 'A').as('a').
addV().property(id, 'B').as('b').
addV().property(id, 'C').as('c').
addV().property(id, 'D').as('d').
addV().property(id, 'E').as('e').
addV().property(id, 'F').as('f').
addV().property(id, 'G').as('g').
addE('hasParent').from('a').to('b').
addE('hasParent').from('b').to('c').
addE('hasParent').from('d').to('c').
addE('hasParent').from('c').to('e').
addE('hasParent').from('e').to('f').
addE('hasParent').from('g').to('f').iterate()
To find all the children of "A", you simply do:
gremlin> g.V('A').repeat(out()).emit()
==>v[B]
==>v[C]
==>v[E]
==>v[F]
The traversal above basically says, "Start at 'A" vertex and traverse on out edges until there are no more, and oh, by the way, emit each of those child vertices as you go." If you want to also get the root of "A" then you just need to switch things around a bit:
gremlin> g.V('A').emit().repeat(out())
==>v[A]
==>v[B]
==>v[C]
==>v[E]
==>v[F]
Going a step further, if you want to emit only certain vertices based on some filter (in your question you specified label) you can just provide a filtering argument to emit(). In this case, I only emit those vertices that have more than one incoming edge:
gremlin> g.V('A').emit(inE().count().is(gt(1))).repeat(out())
==>v[C]
==>v[F]
Here's what I ended up with, after a certain amount of trial and error:
GraphTraversal<Vertex, Vertex> traversal =
graph.traversal().V(parent)
.repeat(out(PARENT_CHILD)) // follow only edges labeled PARENT_CHILD
.emit()
.hasLabel("myLabel"); // filter for vertices labeled "myLabel"
Note that this is slightly different from the recursive version in the original question since I realized I don't actually want to include the parent in the result. (I think, from the Repeat Step docs, that I could include the parent by putting emit() before repeat(), but I haven't tried it.)

gremlin query to retrieve vertices which are having multiple edges between them

Consider the above graph. I would like a gremlin query that returns all nodes that have multiple edges between them as shown in the graph.
this graph was obtained using neo4j cypher query:
MATCH (d:dest)-[r]-(n:cust)
WITH d,n, count(r) as popular
RETURN d, n
ORDER BY popular desc LIMIT 5
for example:
between RITUPRAKA... and Asia there are 8 multiple edges hence the query has returned the 2 nodes along with the edges, similarly for other nodes.
Note: the graph has other nodes with only a single edge between them, these nodes will not be returned.
I would like same thing in gremlin.
I have used given below query
g.V().as('out').out().as('in').select('out','in').groupCount().unfold().filter(select(values).is(gt(1))).select(keys)
it is displaying
out:v[1234],in:v[3456] .....
but instead of displaying Ids of the node I want to display values of the node
like out:ICIC1234,in:HDFC234
I have modified query as
g.V().values("name").as('out').out().as('in').values("name").select('out','in').
groupCount().unfold().filter(select(values).is(gt(1))).select(keys)
but it is showing the error like classcastException, each vertex to be traversed use indexes for fast iteration
Your graph doesn't seem to indicate bi-directional edges are possible so I will answer with that assumption in mind. Here's a simple sample graph - please consider including one on future questions as it makes it much easier than pictures and textual descriptions for those reading your question to understand and to get started writing a Gremlin traversal to help you:
g.addV().property(id,'a').as('a').
addV().property(id,'b').as('b').
addV().property(id,'c').as('c').
addE('knows').from('a').to('b').
addE('knows').from('a').to('b').
addE('knows').from('a').to('c').iterate()
So you can see that vertex "a" has two outgoing edges to "b" and one outgoing edge to "c", thus we should get the "a b" vertex pair. One way to get this is with:
gremlin> g.V().as('out').out().as('in').
......1> select('out','in').
......2> groupCount().
......3> unfold().
......4> filter(select(values).is(gt(1))).
......5> select(keys)
==>[out:v[a],in:v[b]]
The above traversal uses groupCount() to count the number of times the "out" and "in" labelled vertices show up (i.e. the number of edges between them). It uses unfold() to iterate through the Map of <Vertex Pairs,Count> (or more literally <List<Vertex>,Long>) and filter out those that have a count greater than 1 (i.e. multiple edges). The final select(keys) drops the "count" as it is not needed anymore (i.e. we just need the keys which hold the vertex pairs for the result).
Perhaps another way to go is with this method:
gremlin> g.V().filter(outE()).
......1> project('out','in').
......2> by().
......3> by(out().
......4> groupCount().
......5> unfold().
......6> filter(select(values).is(gt(1))).
......7> select(keys)).
......8> select(values)
==>[v[a],v[b]]
This approach with project() forgoes the heavier memory requirements for a big groupCount() over the whole graph in favor of building a smaller Map over an individual Vertex that becomes eligible for garbage collection at the end of the by() (or essentially per initial vertex processed).
My suggestion is similar to Stephen's, but also includes the edges or rather the whole path (I guess the Cypher query returned the edges too).
g.V().as("dest").outE().inV().as("cust").
group().by(select("dest","cust")).by(path().fold()).
unfold().filter(select(values).count(local).is(gt(1))).
select(values).unfold()

How can I collect property values while traversing a graph with gremlin in Java?

Every vertex in my graph has at least a name property. I have a label L set S of name values. Now I want to collect the values of the name property of all vertices that can be reached (recursively) via a specific outgoing edge with edge label EL from the vertices with the names in set S.
My current solution for a single start node with name S1 looks like the following:
g.traversal().V().hasLabel(L)
.has("name", S1)
.repeat(__.optional(__.out(EL)))
.until(__.out(EL).count().is(0))
.path()
.forEachRemaining(path -> {
path.forEach(e -> System.out.println(((Vertex)e).property("name").value()));});
The println is only to see that this produces the expected result, normally I would collect the names in a Set.
Is there a better way to collect the values of the name property of all the vertices reachable via outgoing edges with label EL?
And what would be the best way to start with multiple vertices (where only the name is known from Set S)?
Currently, the structure is a tree, but if there may by cycles, does the code above prevents endless loops? If not, how can this be done?
Your approach is a good start.
To start from a set of multiple vertices, use the P.within() predicate. TinkerPop provides several other predicates.
Use simplePath() to prevent repeating through loops.
Use store() to keep track of items as it traverses the graph. The by("name") modulator will store the "name" property rather than the vertex.
To get out the result, use cap() to output the items it stored during the traversal. The result at this point is a Set which potentially contains duplicates. Use unfold() to turn the Set into an iterator that we can dedup() then finish with toSet().
graph.traversal().V().hasLabel(L).has("name", P.within(S)).
repeat( __.out(EL).simplePath().store("x").by("name") ).
until( __.outE(EL).count().is(0) ).
cap("x").unfold().dedup().toSet()

Cypher traversal

I'm implementing a something like a linked-list structure in a Neo4j graph. The graph is created by executing many statements similar to this:
CREATE (R1:root{edgeId:2})-[:HEAD]->
(:node{text: 'edge 2 head text', width:300})-[:NEXT{edge:2, hard:true}]->
(:node{text: 'edge 2 point 0'})-[:NEXT{edge:2}]->
(n0:node{text: 'edge 2 point 1'}),
(n0)-[:BRANCH]->(:root{edgeId:3}),
(n0)-[:NEXT{edge:2}]->
(:node{text: 'edge 2 point 2'})-[:NEXT{edge:2}]->
(:node{text: 'edge 2 point 3'})<-[:TAIL{edge:2}]->(R1)
Traversing an edge means starting with a root node, following its outgoing HEAD relationship to the first node, and following the chain of NEXT relationships until reaching a node with an incoming TAIL relationship from the root we started from.
i.e.:
MATCH path = (root:root:main)-[:HEAD]->(a:point)-[n:NEXT*]->(z:point)<-[:TAIL]-(root)
RETURN nodes(path), n
Every node has an outgoing NEXT relationship, but some nodes also have BRANCH relationships, which point to the root nodes of other edges.
In the above query, nodes(path) obviously returns all the nodes along the edge, and n lists the outgoing NEXT relationship for each node along it. How could I modify this query so that, in addition to the outgoing NEXT relationship, it also returns any outgoing BRANCH relationships
How can I modify the above query so that each record returned contains a node on the path along with a list of all outgoing relationships (both NEXT and BRANCH) from it?
Note that I don't want to traverse the BRANCH edges in this query, I just want it to tell me they're there.
(PS I'm implementing this strategy in Java, but so far have preferred executing Cypher queries directly rather than using the Traversal API. If I'm making this more difficult on myself by doing so, please bring it to my attention).
You can return path-expressions at any time.
MATCH path = (root:root:main)-[:HEAD]->(a:point)-[n:NEXT*]->(z:point)<-[:TAIL]-(root)
RETURN extract(x in nodes(path) | [x, x-[:BRANCH]->()]), n
This x-[:BRANCH]->() returns a collection of paths, so if you just want to access the relationship you'd have to do
[p in x-[:BRANCH]->() | head(rels(p)) ]
For an example of how to implement an activity stream as an unmanaged extension you might have a look at this: https://github.com/jexp/neo4j-activity-stream

Categories

Resources