Breadth First enumeration in Gremlin

Breadth First enumeration in Gremlin - java

I'm trying to get breadth first enumeration working with Gremlin, however I'm having trouble finding a way to output all the steps observed during the enumeration. I can only print out the result of the last iteration.
My question would be, given a starting node like this, how can I follow all paths (without knowing the overall depth) using Gremlin and print out everything I find along the way?
study=g.v('myId')
I have tried the scatter approach, looping approach (although both seem to require knowledge about the actual length of the path beforehand if I understand correctly)
Many thanks!

You don't provide any significant code that shows how you are using loop, but I think with the right arguments you can get it to do what you want:
gremlin> g = TinkerGraphFactory.createTinkerGraph()
==>tinkergraph[vertices:6 edges:6]
gremlin> g.v(1).as('x').out.gather.scatter.loop('x'){true}{true}
==>v[2]
==>v[4]
==>v[3]
==>v[5]
==>v[3]
I'm assuming that you understand the code through gather/scatter and the first part of the loop that points back to x. So, with that assumption in mind, I'll focus on the two closures passed to loop.
The first closure passed to loop tells Gremlin when to break out of the loop. By simply returning true, you're saying to exhaust the loop. Depending on the structure of your graph, that may not be advisable as you could be waiting for a very long time for a result to come back. At a minimum you should consider setting it to something impossibly high so that if you do hit some cycle in the graph, your traversal breaks.
The second closure is known as the "emit closure". You can read more about it here, but basically it determines if intermediate objects in the pipe (not just the ones at the end of the loop) should be returned or not. In this case, you can see I simply set that value to true so that it would emit all objects at all steps of the loop.

Related

Tinkerpop/Gremlin: select vertices together with outgoing edge count

I try to find an efficient gremlin query that returns a traversal with the vertex and the number of outgoing edges. Or even better instead of the number of outgoing edges a boolean value if outgoing edges exist or not.
Background: I try to improve the performance of a program that writes some properties on the vertices and then iterates over the outgoing edges to remove some of it. In a lot of cases there are no outgoing edges and the iteration
for (Iterator<Edge> iE = v.edges(Direction.OUT); iE.hasNext();) { ... } consumes a significant part of the runtime. So instead of resolving the ids to vertices (with gts.V(ids) I want to collect the information about the existence of outgoing edges to skip the iteration, if possible.
My first try was:
gts.V(ids).as("v").choose(__.outE(), __.constant(true), __.constant(false)).as("e").select("v", "e");
Second idea was:
gts.V(ids).project("v", "e").by().by(__.outE().count());
Both seem to work, but is there a better solution that does not require the underlying graph implementation to fetch or count all edges?
(We currently use the sqlg implementation of tinkerpop/gremlin with Postgresql and both queries seem to fetch all outgoing edges from Postgresql. This may be a case where some optimization is missing. But my question is not sqlg specific.)

If you only need to know whether edges exist or not then you should limit() results in the by() modulator:
gremlin> g.V().project('v','e').by().by(outE().limit(1).count())
==>[v:v[1],e:1]
==>[v:v[2],e:0]
==>[v:v[3],e:0]
==>[v:v[4],e:1]
==>[v:v[5],e:0]
==>[v:v[6],e:1]
In this way you don't count all of the edges, just the first which is enough to answer your question. You can do true and false if you like with a minor modification:
gremlin> g.V().project('v','e').by().by(coalesce(outE().limit(1).constant(true),constant(false)))
==>[v:v[1],e:true]
==>[v:v[2],e:false]
==>[v:v[3],e:false]
==>[v:v[4],e:true]
==>[v:v[5],e:false]
==>[v:v[6],e:true]

Backtracking search for constraint satisfaction problem implementation

For a homework assignment, my goal is to have a backtracking search using minimum remaining values, a degree heuristic, and forward checking. Needs to solve a Boolean satisfiability problem consisting of sets of 3 Boolean variables or'd with each other, and each set must evaluate to true. As my current implementation stands, I believe it will eventually solve it, but it takes so long to finish that I end up with a java heap out of memory error.
With that out of the way, my implementation is as follows:
Have an arrayList of each constraint
an array of values, with the index being the value for each variable that's either true, false, or not yet given a value, initially all not given a value. The 0th element is used as a flag.
an array for the domain of each variable: either true or false, only true, only false or no possible value, initially true or false
An arrayList of arrays, each array being a list of values; used to avoid trying the same thing twice
An array for the number of times a variable is in a constraint: this is the degree heuristic
The back search returns an array of values, and takes in an array of values and an array of domains
First checks each constraint in the list to make sure the domain for each variable works to make it true. If none of them work, set 0th flag and exit. Additionally, if 2 variables have a domain that makes the the third variable have to be true in order to make the statement work, changes the domain of that variable.
After it passes that step, it checks if each variable in the vals array has an assigned value, ending the program on a success.
Next it makes a temporary array to keep track of values it's tried, and begins a loop. Inside, it adds each variable that has a domain of the smallest domain found (MRV) and doesn't have a value/hasn't been tried, adding it to the list, and exiting the recursion if it can't find any. It then selects from the variables the one that appears the most based on the degree heuristic array. On a tie, it picks the last one that appears in the tie, and sets a flag so we don't try that variable again in the same recursion.
Still inside the loop, it first tries to set the domain and value of that variable as true if the domain isn't only false, and false otherwise. It checks to see if that value combination for the variables has been done before, and if it has it reselects. If it hasn't, it adds that value combination to the list and does a recursive step. If that recursive step returns a stop flag, swap back the values to what they were for the domain and value of selected variable, and tries again, this time making the domain and value false, but first it adds it to the list of tried combinations, reselecting if its already there, and resets the domains of all variables that don't have a value yet, then does the recursive step. If that also fails, resets the values of the variable selected domain and value and tries to select a different variable in the same loop. The loop breaks once it's complete or has failed for all combinations, and then returns the values array.
I can tell that it's not repeating itself but I don't know how I can fix/speed up my implementation so that it runs at a reasonable time.

The first step is to create a knowledge model for the domain. This can be done with a domain specific language (DSL). In the DSL syntax a possible solution to the problem is formulated. The wanted side effect of a domain specific language is, that a grammar has to be created which can parse the language. This grammar can be used in the reverse order which is called a "generative grammar". The aim is to include as much domain knowledge as possible in the DSL which makes it easier for the solver to test out all states.
Unfortunately, the question has only a few information about the domain itself. According to the description there are three variables which can be on or off. This would generate a possible statespace of 2x2x2=8 which seems a bit too easy for a domain, because the solver is done if he tested out all 8 states. So i would guess, that the problem is bit harder but not explained in the description. Nevertheless, the first step is to convert the problem into a language which can be parsed by a grammar.

Finding path between two nodes without mentioning the number of loop to iterate

Situation here, I just do not want to loop externally in java for a gremlin query. Like I want to fetch all the paths or rather the name of the nodes/ edges in between of two separate nodes.
Here the first query with loop would be
(Name of the first Node).loop(1){it.loops<100}{true}.has('name', (Name of the second node)).path{it.name}
converting which in java has taken a lots of code !
but in case of the above there is a loop which looping for 100 nodes. If I try to traverse through any big number of nodes. or the count is not numerable to me then how the full traversal would be ?
here I have got one suggestion like : g.v(100).out.loop(1) { it.object != g.v(200) }.path
How will it directly work inside java ? I want to avoid groovy !

If you don't know the number of steps to loop you can just do:
g.v(100).out.loop(1){true}.path
That first closure after the loop represents a function that accepts loop metadata and return a boolean that decides whether or not to continue the looping process (true) or to terminate (false). As I simply return true above, the loop will not cease until all the paths are exhausted. Of course, the problem there is that if the graph cycles, the traversal will iterate indefinitely. It is therefore only safe to do a {true} if you are certain that your graph structure is such that you won't fall into that trap.
It is far better to simply have a max loop length of some reasonable size so that the traversal terminates. Or, alternatively, you could have some other intelligent method for managing loop termination. Maybe you could terminate the loop after some number of cycles is detected.
As far as wanting a Java answer instead of a groovy answer, I suggest that you read this answer about this topic. It explains how you go about converting groovy to java. For your specific problem, here's the java:
new GremlinPipeline(g.getVertex(100)).out().loop(1,
new PipeFunction<LoopPipe.LoopBundle<Vertex>,Boolean>() {
public Boolean compute(LoopPipe.LoopBundle<Vertex> argument) {
return true;
}
}, closure)

Find K max values from a N List

I got requirements-
1. Have random values in a List/Array and I need to find 3 max values .
2. I have a pool of values and each time this pool is getting updated may be in every 5 seconds, Now every time after the update , I need to find the 3 max Values from the list pool.
I thought of using Math.max thrice on the list but I dont think it as
a very optimized approach.
> Won't any sorting mechanism be costly as I am bothered about only top
3 Max Values , why to sort all these
Please suggest the best way to do it in JAVA

Sort the list, get the 3 max values. If you don't want the expense of the sort, iterate and maintain the n largest values.
Maintain the pool is a sorted collection.
Update: FYI Guava has an Ordering class with a greatestOf method to get the n max elements in a collection. You might want to check out the implementation.
Ordering.greatestOf

Traverse the list once, keeping an ordered array of three largest elements seen so far. This is trivial to update whenever you see a new element, and instantly gives you the answer you're looking for.

A priority queue should be the data structure you need in this case.

First, it would be wise to never say again, "I dont think it as a very optimized approach." You will not know which part of your code is slowing you down until you put a profiler on it.
Second, the easiest way to do what you're trying to do -- and what will be most clear to someone later if they are trying to see what your code does -- is to use Collections.sort() and pick off the last three elements. Then anyone who sees the code will know, "oh, this code takes the three largest elements." There is so much value in clear code that it will likely outweigh any optimization that you might have done. It will also keep you from writing bugs, like giving a natural meaning to what happens when someone puts the same number into the list twice, or giving a useful error message when there are only two elements in the list.
Third, if you really get data which is so large that O(n log n) operations is too slow, you should rewrite the data structure which holds the data in the first place -- java.util.NavigableSet for example offers a .descendingIterator() method which you can probe for its first three elements, those would be the three maximum numbers. If you really want, a Heap data structure can be used, and you can pull off the top 3 elements with something like one comparison, at the cost of making adding an O(log n) procedure.

Print a Tree of Recursion

I am writing minimax as part of a project, but its awfully hard to check that it is working correctly. If I could print a tree of what it does, it would be extremely useful.
Is there an easy way to print a tree of recursive calls, selecting whatever variables are important to the situation?

Keep track of recursion depth by means of a parameter (in minimax, you'd do that anyway). Then print depth * a small number of spaces, followed by the interesting variables in each call to obtain
player=1, move=...
player=2, move=...
player=1, move=...
...
player=2, move=...
You might also want to print the return value of each recursive call.
If you desperately want a pretty picture of a tree, post-process the output of the above and feed it to a tree-drawing package.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.