I try to find an efficient gremlin query that returns a traversal with the vertex and the number of outgoing edges. Or even better instead of the number of outgoing edges a boolean value if outgoing edges exist or not.
Background: I try to improve the performance of a program that writes some properties on the vertices and then iterates over the outgoing edges to remove some of it. In a lot of cases there are no outgoing edges and the iteration
for (Iterator<Edge> iE = v.edges(Direction.OUT); iE.hasNext();) { ... } consumes a significant part of the runtime. So instead of resolving the ids to vertices (with gts.V(ids) I want to collect the information about the existence of outgoing edges to skip the iteration, if possible.
My first try was:
gts.V(ids).as("v").choose(__.outE(), __.constant(true), __.constant(false)).as("e").select("v", "e");
Second idea was:
gts.V(ids).project("v", "e").by().by(__.outE().count());
Both seem to work, but is there a better solution that does not require the underlying graph implementation to fetch or count all edges?
(We currently use the sqlg implementation of tinkerpop/gremlin with Postgresql and both queries seem to fetch all outgoing edges from Postgresql. This may be a case where some optimization is missing. But my question is not sqlg specific.)
If you only need to know whether edges exist or not then you should limit() results in the by() modulator:
gremlin> g.V().project('v','e').by().by(outE().limit(1).count())
==>[v:v[1],e:1]
==>[v:v[2],e:0]
==>[v:v[3],e:0]
==>[v:v[4],e:1]
==>[v:v[5],e:0]
==>[v:v[6],e:1]
In this way you don't count all of the edges, just the first which is enough to answer your question. You can do true and false if you like with a minor modification:
gremlin> g.V().project('v','e').by().by(coalesce(outE().limit(1).constant(true),constant(false)))
==>[v:v[1],e:true]
==>[v:v[2],e:false]
==>[v:v[3],e:false]
==>[v:v[4],e:true]
==>[v:v[5],e:false]
==>[v:v[6],e:true]
Related
I have a topology, it is
aws_vpc<------(composition)------test---(membership)---->location
So I use query
graph.traversal().V().hasLabel("test").or(
__.out("membership").hasLabel("location"),
__.out("composition").hasLabel("aws_vpc"))
to select it, but how to print all elements' name,
I want to output : test, membership, location,composition, aws_vpc.
Is there a way to achieve this?
You've written a traversal that only detects if "test" vertices have outgoing "membership" edges that have adjacent "location" vertices OR outgoing "composition" edges that have adjacent "aws_vpc" vertices, so all that traversal will return is "test" vertices that match that filter. It does not "select" anything more than that. In fact, the or() is immediately satisfied as soon as a single __.out("membership").hasLabel("location") or __.out("composition").hasLabel("aws_vpc") is returned in the order they are provided to or() so you don't even traverse all of those paths (which is a good thing for a filtering operation).
If you want to return all of the data you describe, you need to write your query in such a way so as to traverse it all and transform it into a format to return. A simple way to do this in your case would be to use project():
g.V().hasLabel('test').
project('data','memberships', 'compositions').
by(__.elementMap()).
by(__.outE("membership").as('e').
inV().hasLabel("location").as('v').
select('e','v').
by(elementMap()).
fold()).
by(__.outE("composition").as('e').
inV().hasLabel("aws_vpc").as('v').
select('e','v').
by(elementMap()).
fold())
This takes each "test" vertex and transforms it to a Map with three keys: "data", "memberships" and "competitions" and then each by() modulator specifies what to do with the current "test" vertex being transformed and places it in the respective key. Note that I chose select() to get the edge and vertex combinations but that could have been a project() step as well if you liked. The key there is to end with fold() so that you reduce the stream of edge data for each "test" vertex to a List of data that can put in the related "memberships" and "compositions" keys.
Given a tree-structured TinkerPop graph with vertices connected by labeled parent-child relationships ([parent-PARENT_CHILD->child]), what's the idiomatic way to traverse and find all those nodes?
I'm new to graph traversals, so it seems more or less straightforward to traverse them with a recursive function:
Stream<Vertex> depthFirst(Vertex v) {
Stream<Vertex> selfStream = Stream.of(v);
Iterator<Vertex> childIterator = v.vertices(Direction.OUT, PARENT_CHILD);
if (childIterator.hasNext()) {
return selfStream.appendAll(
Stream.ofAll(() -> childIterator)
.flatMap(this::depthFirst)
);
}
return selfStream;
}
(N.b. this example uses Vavr streams, but the Java stream version is similar, just slightly more verbose.)
I assume a graph-native implementation would be more performant, especially on databases other than the in-memory TinkerGraph.
However, when I look at the TinkerPop tree recipes, it's not obvious what combination of repeat() / until() etc. is the right one to do what I want.
If I then want to find only those vertices (leaf or branch) having a certain label, again, I can see how to do it with the function above:
Stream<Vertex> nodesWithMyLabel = depthFirst(root)
.filter(v -> "myLabel".equals(v.label()));
but it's far from obvious that this is efficient, and I assume there must be a better graph-native approach.
If you are using TinkerPop, it is best to just write your traversals with Gremlin. Let's use the tree described in the recipe:
g.addV().property(id, 'A').as('a').
addV().property(id, 'B').as('b').
addV().property(id, 'C').as('c').
addV().property(id, 'D').as('d').
addV().property(id, 'E').as('e').
addV().property(id, 'F').as('f').
addV().property(id, 'G').as('g').
addE('hasParent').from('a').to('b').
addE('hasParent').from('b').to('c').
addE('hasParent').from('d').to('c').
addE('hasParent').from('c').to('e').
addE('hasParent').from('e').to('f').
addE('hasParent').from('g').to('f').iterate()
To find all the children of "A", you simply do:
gremlin> g.V('A').repeat(out()).emit()
==>v[B]
==>v[C]
==>v[E]
==>v[F]
The traversal above basically says, "Start at 'A" vertex and traverse on out edges until there are no more, and oh, by the way, emit each of those child vertices as you go." If you want to also get the root of "A" then you just need to switch things around a bit:
gremlin> g.V('A').emit().repeat(out())
==>v[A]
==>v[B]
==>v[C]
==>v[E]
==>v[F]
Going a step further, if you want to emit only certain vertices based on some filter (in your question you specified label) you can just provide a filtering argument to emit(). In this case, I only emit those vertices that have more than one incoming edge:
gremlin> g.V('A').emit(inE().count().is(gt(1))).repeat(out())
==>v[C]
==>v[F]
Here's what I ended up with, after a certain amount of trial and error:
GraphTraversal<Vertex, Vertex> traversal =
graph.traversal().V(parent)
.repeat(out(PARENT_CHILD)) // follow only edges labeled PARENT_CHILD
.emit()
.hasLabel("myLabel"); // filter for vertices labeled "myLabel"
Note that this is slightly different from the recursive version in the original question since I realized I don't actually want to include the parent in the result. (I think, from the Repeat Step docs, that I could include the parent by putting emit() before repeat(), but I haven't tried it.)
Consider the above graph. I would like a gremlin query that returns all nodes that have multiple edges between them as shown in the graph.
this graph was obtained using neo4j cypher query:
MATCH (d:dest)-[r]-(n:cust)
WITH d,n, count(r) as popular
RETURN d, n
ORDER BY popular desc LIMIT 5
for example:
between RITUPRAKA... and Asia there are 8 multiple edges hence the query has returned the 2 nodes along with the edges, similarly for other nodes.
Note: the graph has other nodes with only a single edge between them, these nodes will not be returned.
I would like same thing in gremlin.
I have used given below query
g.V().as('out').out().as('in').select('out','in').groupCount().unfold().filter(select(values).is(gt(1))).select(keys)
it is displaying
out:v[1234],in:v[3456] .....
but instead of displaying Ids of the node I want to display values of the node
like out:ICIC1234,in:HDFC234
I have modified query as
g.V().values("name").as('out').out().as('in').values("name").select('out','in').
groupCount().unfold().filter(select(values).is(gt(1))).select(keys)
but it is showing the error like classcastException, each vertex to be traversed use indexes for fast iteration
Your graph doesn't seem to indicate bi-directional edges are possible so I will answer with that assumption in mind. Here's a simple sample graph - please consider including one on future questions as it makes it much easier than pictures and textual descriptions for those reading your question to understand and to get started writing a Gremlin traversal to help you:
g.addV().property(id,'a').as('a').
addV().property(id,'b').as('b').
addV().property(id,'c').as('c').
addE('knows').from('a').to('b').
addE('knows').from('a').to('b').
addE('knows').from('a').to('c').iterate()
So you can see that vertex "a" has two outgoing edges to "b" and one outgoing edge to "c", thus we should get the "a b" vertex pair. One way to get this is with:
gremlin> g.V().as('out').out().as('in').
......1> select('out','in').
......2> groupCount().
......3> unfold().
......4> filter(select(values).is(gt(1))).
......5> select(keys)
==>[out:v[a],in:v[b]]
The above traversal uses groupCount() to count the number of times the "out" and "in" labelled vertices show up (i.e. the number of edges between them). It uses unfold() to iterate through the Map of <Vertex Pairs,Count> (or more literally <List<Vertex>,Long>) and filter out those that have a count greater than 1 (i.e. multiple edges). The final select(keys) drops the "count" as it is not needed anymore (i.e. we just need the keys which hold the vertex pairs for the result).
Perhaps another way to go is with this method:
gremlin> g.V().filter(outE()).
......1> project('out','in').
......2> by().
......3> by(out().
......4> groupCount().
......5> unfold().
......6> filter(select(values).is(gt(1))).
......7> select(keys)).
......8> select(values)
==>[v[a],v[b]]
This approach with project() forgoes the heavier memory requirements for a big groupCount() over the whole graph in favor of building a smaller Map over an individual Vertex that becomes eligible for garbage collection at the end of the by() (or essentially per initial vertex processed).
My suggestion is similar to Stephen's, but also includes the edges or rather the whole path (I guess the Cypher query returned the edges too).
g.V().as("dest").outE().inV().as("cust").
group().by(select("dest","cust")).by(path().fold()).
unfold().filter(select(values).count(local).is(gt(1))).
select(values).unfold()
When solving the Chinese postman problem (route inspection problem), how can we find the pairings (between odd vertices) such that the sum of the weights is minimized?
This is the most crucial step in the algorithm that successfully solves the Chinese Postman Problem for a non-Eulerian Graph. Though it is easy to implement on paper, but I am facing difficulty in implementing in Java.
I was thinking about ways to find all possible pairs but if one runs the first loop over all the odd vertices and the next loop for all the other possible pairs. This will only give one pair, to find all other pairs you would need another two loops and so on. This is rather strange as one will be 'looping over loops' in a crude sense. Is there a better way to resolve this problem.
I have read about the Edmonds-Jonhson algorithm, but I don't understand the motivation behind constructing a bipartite graph. And I have also read Chinese Postman Problem: finding best connections between odd-degree nodes, but the author does not explain how to implement a brute-force algorithm.
Also the following question: How should I generate the partitions / pairs for the Chinese Postman problem? has been asked previously by a user of Stack overflow., but a reply to the post gives a python implementation of the code. I am not familiar with python and I would request any community member to rewrite the code in Java or if possible explain the algorithm.
Thank You.
Economical recursion
These tuples normally are called edges, aren't they?
You need a recursion.
0. Create main stack of edge lists.
1. take all edges into a current edge list. Null the found edge stack.
2. take a next current edge for the current edge list and add it in the found edge stack.
3. Create the next edge list from the current edge list. push the current edge list into the main stack. Make next edge list current.
4. Clean current edge list from all adjacent to current edge and from current edge.
5. If the current edge list is not empty, loop to 2.
6. Remember the current state of found edge stack - it is the next result set of edges that you need.
7. Pop the the found edge stack into current edge. Pop the main stack into current edge list. If stacks are empty, return. Repeat until current edge has a next edge after it.
8. loop to 2.
As a result, you have all possible sets of edges and you never need to check "if I had seen the same set in the different order?"
It's actually fairly simple when you wrap your head around it. I'm just sharing some code here in the hope it will help the next person!
The function below returns all the valid odd vertex combinations that one then needs to check for the shortest one.
private static ObjectArrayList<ObjectArrayList<IntArrayList>> getOddVertexCombinations(IntArrayList oddVertices,
ObjectArrayList<IntArrayList> buffer){
ObjectArrayList<ObjectArrayList<IntArrayList>> toReturn = new ObjectArrayList<>();
if (oddVertices.isEmpty()) {
toReturn.add(buffer.clone());
} else {
int first = oddVertices.removeInt(0);
for (int c = 0; c < oddVertices.size(); c++) {
int second = oddVertices.removeInt(c);
buffer.add(new IntArrayList(new int[]{first, second}));
toReturn.addAll(getOddVertexCombinations(oddVertices, buffer));
buffer.pop();
oddVertices.add(c, second);
}
oddVertices.add(0, first);
}
return toReturn;
}
I'm trying to get breadth first enumeration working with Gremlin, however I'm having trouble finding a way to output all the steps observed during the enumeration. I can only print out the result of the last iteration.
My question would be, given a starting node like this, how can I follow all paths (without knowing the overall depth) using Gremlin and print out everything I find along the way?
study=g.v('myId')
I have tried the scatter approach, looping approach (although both seem to require knowledge about the actual length of the path beforehand if I understand correctly)
Many thanks!
You don't provide any significant code that shows how you are using loop, but I think with the right arguments you can get it to do what you want:
gremlin> g = TinkerGraphFactory.createTinkerGraph()
==>tinkergraph[vertices:6 edges:6]
gremlin> g.v(1).as('x').out.gather.scatter.loop('x'){true}{true}
==>v[2]
==>v[4]
==>v[3]
==>v[5]
==>v[3]
I'm assuming that you understand the code through gather/scatter and the first part of the loop that points back to x. So, with that assumption in mind, I'll focus on the two closures passed to loop.
The first closure passed to loop tells Gremlin when to break out of the loop. By simply returning true, you're saying to exhaust the loop. Depending on the structure of your graph, that may not be advisable as you could be waiting for a very long time for a result to come back. At a minimum you should consider setting it to something impossibly high so that if you do hit some cycle in the graph, your traversal breaks.
The second closure is known as the "emit closure". You can read more about it here, but basically it determines if intermediate objects in the pipe (not just the ones at the end of the loop) should be returned or not. In this case, you can see I simply set that value to true so that it would emit all objects at all steps of the loop.