Cannot read Gremlin data from remote after writing

I use Java to connect to a "remote" Gremlin Server (localhost:8182, traversal source g) this way:
traversalSource = traversal().withRemote(DriverRemoteConnection.using("localhost", 8182, "g"));
Then I write a node like this:
traversalSource.addV("TenantProfile");
From the Gremlin console, connected to the same Gremlin Server, I see all the created nodes and edges:
gremlin> g
==>graphtraversalsource[tinkergraph[vertices:42 edges:64], standard]
and queries work there. But when I read the graph from Java it appears empty, so a query such as
traversalSource.V()
    .has("label", TENANT_PROFILE_LABEL)
    .has("fiscal id", "04228480408")
    .out(OWNS_LABEL)
    .has("type", "SH")
    .values("description")
    .toList();
returns an empty list.
Could anyone help me solve this mystery, please?
Thanks.

In reply to Stephen, here are the last instructions before iterate():
for (final Map<String, String> edgePropertyMap : edgePropertyTable) {
    edgeTraversal = traversalSource
        .V(vertices.get(edgePropertyMap.get(FROM_KEY)))
        .addE(edgeLabel)
        .to(vertices.get(edgePropertyMap.get(TO_KEY)));
    final Set<String> edgePropertyNames = edgePropertyMap.keySet();
    for (final String nodePropertyName : edgePropertyNames)
        if ((!nodePropertyName.equals(FROM_KEY)) && (!nodePropertyName.equals(TO_KEY))) {
            final String edgePropertyValue = edgePropertyMap.get(nodePropertyName);
            edgeTraversal = edgeTraversal.property(nodePropertyName, edgePropertyValue);
        }
    edgeTraversal.as(edgePropertyMap.get(IDENTIFIER_KEY)).iterate();
}
Anyway, if no iterate() were present, how could the nodes and edges be visible from inside the console? How could they have been "finalized" on the remote server?
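One thing that may be worth checking in the read query, offered as a guess rather than a diagnosis confirmed anywhere in this thread: in Gremlin, has("label", x) tests a property whose key is the string "label", whereas the element label itself is matched with hasLabel(x). The sketch below is a toy in-memory model in plain Java, not TinkerPop code; the ToyVertex class and the two filter methods are hypothetical names used only to illustrate why a vertex created with addV("TenantProfile") would not be matched by a filter on a "label" property.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LabelVersusProperty {

    // Hypothetical stand-in for a graph vertex: the label is kept
    // separately from the property map.
    static final class ToyVertex {
        final String label;
        final Map<String, String> properties = new HashMap<>();

        ToyVertex(String label) {
            this.label = label;
        }
    }

    // Models has(key, value): only the property map is consulted.
    static List<ToyVertex> hasProperty(List<ToyVertex> vertices, String key, String value) {
        List<ToyVertex> matches = new ArrayList<>();
        for (ToyVertex v : vertices) {
            if (value.equals(v.properties.get(key))) {
                matches.add(v);
            }
        }
        return matches;
    }

    // Models hasLabel(label): only the element label is consulted.
    static List<ToyVertex> hasLabel(List<ToyVertex> vertices, String label) {
        List<ToyVertex> matches = new ArrayList<>();
        for (ToyVertex v : vertices) {
            if (label.equals(v.label)) {
                matches.add(v);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        ToyVertex tenant = new ToyVertex("TenantProfile");
        tenant.properties.put("fiscal id", "04228480408");
        List<ToyVertex> graph = Arrays.asList(tenant);

        // No property named "label" exists, so nothing matches.
        System.out.println(hasProperty(graph, "label", "TenantProfile").size()); // prints 0
        // The element label does match.
        System.out.println(hasLabel(graph, "TenantProfile").size());             // prints 1
    }
}
```

If this is what is happening, replacing the first has(...) step with hasLabel(TENANT_PROFILE_LABEL) in the Java query would be the thing to try.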

Related

Janusgraph Remote Traversal with Java

I am building a Java application that needs to connect to a remote JanusGraph server and create graphs on the fly.
I have installed/configured a single node JanusGraph Server with a Berkeley database backend and ConfigurationManagementGraph support so that I can create/manage multiple graphs on the server.
In a Gremlin console I can connect to the remote server, create graphs, create vertices, etc. Example:
gremlin> :remote connect tinkerpop.server conf/remote.yaml session
gremlin> :remote console
gremlin> map = new HashMap<String, Object>();
gremlin> map.put("storage.backend", "berkeleyje");
gremlin> map.put("storage.directory", "db/test");
gremlin> ConfiguredGraphFactory.createTemplateConfiguration(new MapConfiguration(map));
gremlin> ConfiguredGraphFactory.create("test");
gremlin> graph = ConfiguredGraphFactory.open("test");
gremlin> g = graph.traversal();
gremlin> g.addV("person").property("name", "peter");
gremlin> g.tx().commit();
gremlin> graph.vertices().size();
==>1
gremlin> g.V();
==>v[4288]
gremlin> g.V().count();
==>1
gremlin> g.close();
So far so good. On the Java side, I can connect to the remote server and issue commands via the Client.submit() method. In the following example, I connect to the remote server and create a new graph called "test2":
Cluster cluster = Cluster.build()
    .addContactPoint(host)
    .port(port)
    .serializer(Serializers.GRYO_V3D0)
    .create();
String name = "test2";
String sessionId = UUID.randomUUID().toString();
Client client = cluster.connect(sessionId);
client.submit("map = new HashMap<String, Object>();");
client.submit("map.put(\"storage.backend\", \"berkeleyje\");");
client.submit("map.put(\"storage.directory\", \"db/" + name + "\");");
client.submit("ConfiguredGraphFactory.createTemplateConfiguration(new MapConfiguration(map));");
client.submit("ConfiguredGraphFactory.create(\"" + name + "\");");
I can confirm that the graph was created and see other graphs programmatically as well using the client.submit() method:
ResultSet results = client.submit("ConfiguredGraphFactory.getGraphNames()");
Iterator<Result> it = results.iterator();
while (it.hasNext()) {
    Result result = it.next();
    String graphName = result.getString();
    System.out.println(graphName);
}
Next I want to connect to a graph and traverse the nodes programmatically (in Java). However, I can't seem to figure out how to do this. From what I've read, it should be something as simple as this:
DriverRemoteConnection conn = DriverRemoteConnection.using(client, name); //"_traversal"
GraphTraversalSource g = AnonymousTraversalSource.traversal().withRemote(conn);
These commands don't raise any errors but the GraphTraversalSource appears to be empty:
System.out.println(g.getGraph()); // always returns emptygraph[empty]
System.out.println(g.V()); // appears to be empty [GraphStep(vertex,[])]
Iterator<Vertex> it = g.getGraph().vertices(); // empty
Any suggestions on how to get a GraphTraversalSource for a remote JanusGraph server in Java? I suspect that my issue has something to do with ConfigurationManagementGraph, but I can't put my finger on it. Again, client.submit() works. It would be cool if I could do something like this:
GraphTraversalSource g = (GraphTraversalSource) client.submit("ConfiguredGraphFactory.open(\"" + name + "\");").iterator().next();
...but of course, that doesn't work.
UPDATE
Looking at the code, it appears that the graph name (remoteTraversalSourceName) passed to the DriverRemoteConnection is being ignored.
Starting with the DriverRemoteConnection:
DriverRemoteConnection conn = DriverRemoteConnection.using(client, name);
Under the hood, the graph name (remoteTraversalSourceName) is simply used to set an alias (e.g. client.alias(name);)
Next, in the AnonymousTraversalSource.traversal().withRemote() method
GraphTraversalSource g = AnonymousTraversalSource.traversal().withRemote(conn);
Under the hood, withRemote() is calling:
traversalSourceClass.getConstructor(RemoteConnection.class).newInstance(remoteConnection);
Where traversalSourceClass is GraphTraversalSource.class
Which is the same as this:
g = GraphTraversalSource.class.getConstructor(RemoteConnection.class).newInstance(conn);
Finally, the constructor for the GraphTraversalSource looks like this:
public GraphTraversalSource(final RemoteConnection connection) {
    this(EmptyGraph.instance(), TraversalStrategies.GlobalCache.getStrategies(EmptyGraph.class).clone());
    this.connection = connection;
    this.strategies.addStrategies(new RemoteStrategy(connection));
}
As you can see, the graph variable in the GraphTraversalSource is never set.
I suspect that either (a) I shouldn't be using the AnonymousTraversalSource or (b) that I need to instantiate the GraphTraversalSource some other way, perhaps using a Graph object.

Updated answer
Most likely the channelizer you are using is TinkerPop's org.apache.tinkerpop.gremlin.server.channel.WsAndHttpChannelizer. Replacing it with JanusGraph's channelizer, org.janusgraph.channelizers.JanusGraphWsAndHttpChannelizer, properly binds the graph to the connection.
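As a sketch of that change, assuming the channelizer is set through the usual channelizer key in gremlin-server.yaml (the file name is the one mentioned in the older answer; every other key in the file is left untouched):

```yaml
# gremlin-server.yaml (fragment)
# Before:
# channelizer: org.apache.tinkerpop.gremlin.server.channel.WsAndHttpChannelizer
# After:
channelizer: org.janusgraph.channelizers.JanusGraphWsAndHttpChannelizer
```

The server has to be restarted for the new channelizer to take effect.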
Older answer
This is the workaround I used while the channelizer was still org.apache.tinkerpop.gremlin.server.channel.WsAndHttpChannelizer.
I'm having the same issue. For now, the workaround that I found is to bind the traversals during JanusGraph startup.
In addition to gremlin-server.yaml and janusgraph.properties, I also override empty-sample.groovy with the following content:
def globals = [:]
ConfiguredGraphFactory.getGraphNames().each { name ->
    globals << [ (name + "_traversal") : ConfiguredGraphFactory.open(name).traversal()]
}
Now the graph that you create is available under the name yourgraphname_traversal:
import org.apache.tinkerpop.gremlin.driver.Cluster
import org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection
import org.apache.tinkerpop.gremlin.driver.ser.Serializers
import org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource
...
val cluster = Cluster.build()
    .addContactPoint("your_load_balancer_host")
    .port(8182)
    .serializer(Serializers.GRAPHBINARY_V1D0.simpleInstance())
    .create()
val remoteConnection = DriverRemoteConnection.using(cluster, "yourgraphname_traversal")
val g = AnonymousTraversalSource.traversal().withRemote(remoteConnection)
The solution is not ideal, because it requires all the JanusGraph nodes to be restarted in order to update the bindings. Assuming that graphs are rarely created, this solution is at least something.

How can I use fetchplan with Java API?

As stated here, if I define [*]in_*:-2 out_*:-2 as the fetch plan, the query should return only properties and no information about the edges.
OrientGraph graph = new OrientGraph(URL, USER, USER);
try {
    Iterable resultList = graph.command(new OSQLSynchQuery("select from #11:0")).setFetchPlan("[*]in_*:-2 out_*:-2").execute();
    OrientVertex user = (OrientVertex) resultList.iterator().next();
    for (String s : user.getRecord().fieldNames()) {
        System.out.println(s);
    }
    Iterable resultList2 = graph.command(new OSQLSynchQuery("select from #11:0")).execute();
    OrientVertex user2 = (OrientVertex) resultList2.iterator().next();
    for (String s : user2.getRecord().fieldNames()) {
        System.out.println(s);
    }
} finally {
    graph.shutdown();
}
I get the same output (which includes information about the edges) with and without the fetch plan. What am I doing wrong?

Over the network protocol, a fetch plan is meant to optimize network transfer, not to exclude connections. In the case above, the OrientDB client will still fetch the connected edges even if the fetch plan excludes them.

Neo4j ExecutionEngine does not return valid results

I am trying to use an example similar to the sample code found here.
My sample function is:
void query()
{
    String nodeResult = "";
    String rows = "";
    String resultString;
    String columnsString;
    System.out.println("In query");
    // START SNIPPET: execute
    ExecutionEngine engine = new ExecutionEngine( graphDb );
    ExecutionResult result;
    try ( Transaction ignored = graphDb.beginTx() )
    {
        result = engine.execute( "start n=node(*) where n.Name =~ '.*79.*' return n, n.Name" );
        // END SNIPPET: execute
        // START SNIPPET: items
        Iterator<Node> n_column = result.columnAs( "n" );
        for ( Node node : IteratorUtil.asIterable( n_column ) )
        {
            // note: we're grabbing the name property from the node,
            // not from the n.name in this case.
            nodeResult = node + ": " + node.getProperty( "Name" );
            System.out.println("In for loop");
            System.out.println(nodeResult);
        }
        // END SNIPPET: items
        // START SNIPPET: columns
        List<String> columns = result.columns();
        // END SNIPPET: columns
        // the result is now empty, get a new one
        result = engine.execute( "start n=node(*) where n.Name =~ '.*79.*' return n, n.Name" );
        // START SNIPPET: rows
        for ( Map<String, Object> row : result )
        {
            for ( Entry<String, Object> column : row.entrySet() )
            {
                rows += column.getKey() + ": " + column.getValue() + "; ";
                System.out.println("nested");
            }
            rows += "\n";
        }
        // END SNIPPET: rows
        resultString = engine.execute( "start n=node(*) where n.Name =~ '.*79.*' return n.Name" ).dumpToString();
        columnsString = columns.toString();
        System.out.println(rows);
        System.out.println(resultString);
        System.out.println(columnsString);
        System.out.println("leaving");
    }
}
When I run this in the web console I get many results (as there are multiple nodes whose Name attribute contains the pattern 79). Yet running this code returns no results. The debug print statements "In for loop" and "nested" never print either. This must mean that no results are found in the Iterator, yet that doesn't make sense.
And yes, I already checked and made sure that the graphDb variable is the same as the path for the web console. I have other code earlier that uses the same variable to write to the database.
EDIT - More info
If I place the contents of query in the same function that creates my data, I get the correct results. If I run the query by itself, it returns nothing. It's almost as if the query works only in the instance where I add the data, and not when I come back to the database cold in a separate instance.
EDIT2 -
Here is a snippet of code that shows the bigger context of how it is being called and sharing the same DBHandle
package ContextEngine;
import ContextEngine.NeoHandle;
import java.util.LinkedList;
/*
 * Class to handle streaming data from any coded source
 */
public class Streamer {
    private NeoHandle myHandle;
    private String contextType;
    Streamer()
    {
    }
    public void openStream(String contextType)
    {
        myHandle = new NeoHandle();
        myHandle.createDb();
    }
    public void streamInput(String dataLine)
    {
        Context context = new Context();
        /*
         * get database instance
         * write to database
         * check for errors
         * report errors & success
         */
        System.out.println(dataLine);
        //apply rules to data (make ContextRules do this, send type and string of data)
        ContextRules contextRules = new ContextRules();
        context = contextRules.processContextRules("Calls", dataLine);
        //write data (using linked list from contextRules)
        NeoProcessor processor = new NeoProcessor(myHandle);
        processor.processContextData(context);
    }
    public void runQuery()
    {
        NeoProcessor processor = new NeoProcessor(myHandle);
        processor.query();
    }
    public void closeStream()
    {
        /*
         * close database instance
         */
        myHandle.shutDown();
    }
}
Now, if I call streamInput AND query in the same instance (parent calls), the query returns results. If I only call query and do not enter ANY data in that instance (yet the web console shows data for the same query), I get nothing. Why would I have to create the nodes and enter them into the database at runtime just to get a valid query result? Shouldn't I ALWAYS get the same results with such a query?

You mention that you are using the Neo4j Browser, which comes with Neo4j. However, the example you posted is for Neo4j Embedded, which is the in-process version of Neo4j. Are you sure you are talking to the same database when you try your query in the Browser?
In order to talk to Neo4j Server from Java, I'd recommend looking at the Neo4j JDBC driver, which has good support for connecting to the Neo4j server from Java.
http://www.neo4j.org/develop/tools/jdbc
You can set up a simple connection by adding the Neo4j JDBC jar to your classpath, available here: https://github.com/neo4j-contrib/neo4j-jdbc/releases. Then just use Neo4j like any other JDBC driver:
Connection conn = DriverManager.getConnection("jdbc:neo4j://localhost:7474/");
ResultSet rs = conn.executeQuery("start n=node({id}) return id(n) as id", map("id", id));
while(rs.next()) {
    System.out.println(rs.getLong("id"));
}
Refer to the JDBC documentation for more advanced usage.
As for why the data does not appear to be durably stored, there could be many reasons. I would incrementally scale back the complexity of the code to try and locate the culprit. For instance, until you've found your problem, try these one at a time:
Instead of looping through the result, print it using System.out.println(result.dumpToString());
Instead of the regex query, try just MATCH (n) RETURN n, to return all data in the database
Make sure the data you are seeing in the browser is not "old" data inserted earlier on, but really is an insert from your latest run of the Java program. You can verify this by deleting the data via the browser before running the Java program using MATCH (n) OPTIONAL MATCH (n)-[r]->() DELETE n,r;
Make sure you are actually working against the same database directory. You can verify this by leaving the server running: if you can still start your Java program (and it is not using the Neo4j REST bindings), you are not using the same directory. Two Neo4j databases cannot run against the same database directory simultaneously.

MongoDB SELF JOIN query having 1 collection

I'd like to do something like
SELECT e1.sender
FROM email as e1, email as e2
WHERE e1.sender = e2.receiver;
but in MongoDB. I found many forums about JOIN, which can be implemented via MapReduce in MongoDB, but I don't understand how to do it in this example with self-join.
I was thinking about something like this:
var map1 = function() {
    var output = {
        sender: db.collectionSender.email,
        receiver: db.collectionReceiver.findOne({email: db.collectionSender.email}).email
    };
    emit(this.email, output);
};
var reduce1 = function(key, values) {
    var outs = {sender: null, receiver: null};
    values.forEach(function(v) {
        if (outs.sender == null) {
            outs.sender = v.sender;
        }
        if (outs.receiver == null) {
            outs.receiver = v.receiver;
        }
    });
    return outs;
};
db.email.mapReduce(map1, reduce1, {out: 'rec_send_email'})
which involves creating 2 new collections: collectionReceiver, containing only receiver emails, and collectionSender, containing only sender emails,
OR
var map2 = function() {
    var output = {
        sender: this.sender,
        receiver: db.email.findOne({receiver: this.sender})
    };
    emit(this.sender, output);
};
var reduce2 = function(key, values) {
    var outs = {sender: null, receiver: null};
    values.forEach(function(v) {
        if (outs.sender == null) {
            outs.sender = v.sender;
        }
        if (outs.receiver == null) {
            outs.receiver = v.receiver;
        }
    });
    return outs;
};
db.email.mapReduce(map2, reduce2, {out: 'rec_send_email'})
but neither of them works, and I don't understand this MapReduce thing well. Could somebody explain it to me, please? I was inspired by this article: http://tebros.com/2011/07/using-mongodb-mapreduce-to-join-2-collections/ .
Additionally, I need to write it in Java. Is there a way to do that?

If you need to implement a "self-join" when using MongoDB, then you may have structured your schema incorrectly (or sub-optimally).
In MongoDB (and NoSQL in general) the schema structure should reflect the queries you will need to run against it.
It looks like you have a collection of emails where each document has one sender and one receiver, and you now want to find all senders who also happen to be receivers of email. The way to do this is with two simple queries, not with map/reduce, which would be far more complex and is unnecessary. The functions as you've written them also wouldn't work, because you can't query the database from within the map function.
You are writing in Java, so why not make two queries: the first to get all unique senders, and the second to find all unique receivers who are also in the list of senders?
In the shell it would be:
var senderList = db.email.distinct("sender");
var receiverList = db.email.distinct("receiver", {"receiver":{$in:senderList}})
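Since the question asks for Java, here is a minimal sketch of the same two-step logic in plain Java. It works on an in-memory list rather than through the MongoDB Java driver, so the Email class and the sendersWhoAlsoReceive method are hypothetical names used only for illustration; with a real collection, each step would be replaced by the corresponding distinct query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SendersWhoReceive {

    // Hypothetical in-memory stand-in for one document of the email collection.
    static final class Email {
        final String sender;
        final String receiver;

        Email(String sender, String receiver) {
            this.sender = sender;
            this.receiver = receiver;
        }
    }

    static List<String> sendersWhoAlsoReceive(List<Email> emails) {
        // Step 1: all unique senders (mirrors distinct("sender")).
        Set<String> senders = new LinkedHashSet<>();
        for (Email e : emails) {
            senders.add(e.sender);
        }
        // Step 2: unique receivers that are also in the sender set
        // (mirrors distinct("receiver", {receiver: {$in: senderList}})).
        Set<String> result = new LinkedHashSet<>();
        for (Email e : emails) {
            if (senders.contains(e.receiver)) {
                result.add(e.receiver);
            }
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        List<Email> emails = Arrays.asList(
            new Email("ann", "bob"),
            new Email("bob", "cat"),
            new Email("ann", "dan"));
        System.out.println(sendersWhoAlsoReceive(emails)); // prints [bob]
    }
}
```

Here bob is the only address that appears both as a sender and as a receiver, which matches what the SQL self-join in the question would return (without duplicates).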

Elasticsearch - Assigning Shards

I have recently discovered Elasticsearch and decided to have a play. Unfortunately, I am having trouble adding indexes.
The code used to add an index is as follows, and it runs every time I try to add a new index:
public void index ( String index, String type, String id, String json ){
    Node node = null;
    try{
        node = nodeBuilder().node();
        Client client = node.client();
        IndexResponse response = client.prepareIndex( index, type, id )
            .setSource( json )
            .execute()
            .actionGet();
    }
    catch ( Exception e ){
        Logger.error( e, " Error indexing JSON file: " + json );
    }
    finally {
        if( node != null)
            node.close();
    }
}
No indexes appear to be added and my cluster health is currently red (as one of the shards is red), but I have no idea how to resolve this. I receive confirmation that my index is being added each time, but the indexes do not show up when searched or in es-admin.
All help or ideas are greatly appreciated.

When starting a Node, one of the common settings to consider is if it should hold data or not. In other words, should indices and shards be allocated to it. Many times we would like to have the clients just be clients, without shards being allocated to them [1].
If you want to set up your client as being a non-data client (no shards) try setting it up like so by replacing this:
node = nodeBuilder().node();
with this:
node = nodeBuilder().client(true).node();
[1] http://www.elasticsearch.org/guide/reference/java-api/client.html
