Neo4J find unique results


I have a graph that models financial relations between companies: the OWES relation records which companies owe money to which. I am looking for unique circular relations, which are later closed. So if I owe you money and we find that, somehow, you owe me money, we close the debt. The companies are identified by tax number. For this I use the following Cypher query:
start n=node(*)
match p=n-[r:OWES*1..200]->n
where HAS(n.taxnumber)
return extract(s in relationships(p) : s.amount),
extract(t in nodes(p) : ID(t)),
length(p) ;
But I also get results like
Company1-Company2-Company1-Company3
I display these results in my Java application. Should I perhaps hide such results, where one company is shown twice, after I parse them in Java code?
This is fine when it comes to logic, but I need results where a company is shown only once; I do not want results where the same company appears multiple times. How do I modify my Cypher query for that? What I want is for a company to appear only at the beginning and at the end of a result, not circled somewhere in the middle.

You can check that the nodes inside the path do not contain your start node.
start n=node(*)
match p=n-[:OWES*1..200]->(m), (m)-[r:OWES]->n
where HAS(n.taxnumber)
AND NOT(n IN tail(nodes(p)))
return extract(s in relationships(p) : s.amount) + r.amount,
extract(t in nodes(p) : ID(t)) + ID(n),
length(p) + 1;
Unfortunately, 1.8.2 has no collection subscript, only tail(coll), so there is no simple way to exclude the last element from the check. That is why I had to break up p and fix your aggregations at the end.
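The question also asks about filtering on the Java side. As a minimal sketch, assuming each result arrives as the list of node IDs from the second return column, a check like the following keeps only circles in which no company repeats. SimpleCycleFilter and isSimpleCycle are hypothetical names, not part of the Neo4j API, and the check is stricter than the query above because it rejects any repeated node, not only a repeated start node.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not a Neo4j class): post-filters paths returned by the query.
final class SimpleCycleFilter {

    // True when the path starts and ends at the same node
    // and no node is repeated anywhere else along the path.
    static boolean isSimpleCycle(List<Long> nodeIds) {
        if (nodeIds.size() < 2) {
            return false;
        }
        long start = nodeIds.get(0);
        if (start != nodeIds.get(nodeIds.size() - 1)) {
            return false;
        }
        Set<Long> seen = new HashSet<>();
        // Skip the last element: it is the start node closing the circle.
        for (Long id : nodeIds.subList(0, nodeIds.size() - 1)) {
            if (!seen.add(id)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isSimpleCycle(Arrays.asList(1L, 2L, 3L, 1L)));     // true
        System.out.println(isSimpleCycle(Arrays.asList(1L, 2L, 1L, 3L, 1L))); // false
    }
}
```

Filtering in the query is still preferable when possible, since it avoids transferring paths that will be thrown away.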

Related

Java Neo4j Cypher Or Match

I have a graph where a user can have a post and can also have a friend who has a post; the friend may or may not be followed.
How can I query all the posts by the user and by the friends that he is following?
I tried this :
" MATCH (u1:User)-[:POSTED]->(p1:Post)"
+ " WHERE u1.username =~ '"+user+"'"
+ " OPTIONAL MATCH (u3:User)-[:FOLLOWING]->(u2:User)-[:POSTED]->(p2:Post),"
+ " (u3:User)-[:FRIEND_OF]->(u2:User)"
+ " WHERE u3.username =~ '"+user+"' return u1.username, u1.name,"
+ "p1 ,u2.username, u2.name , p2";
But this query returns duplicates. Let's say we have our user and a friend.
The friend has one post and the user has two; the query returns the friend's post twice. It is as if, for every MATCH result, the query also returns the result of the OPTIONAL MATCH.
To explain further:
(u:User)-[:POSTED]->(p:Post)
(u:User)-[:FRIEND_OF]->(u2:User)
(u:User)-[:FOLLOWING]->(u2:User)-[:POSTED]->(p2:Post)
These are the relationships that exist. What I want is all posts (:Post) that meet those relationships, without duplicates and preferably with a single query.

First of all, your query is much more complex than it needs to be. This simpler query should be equivalent. I assume that {user} is supplied as a parameter.
MATCH (u1:User {username: {user}})-[:POSTED]->(p1:Post)
OPTIONAL MATCH
(u1)-[:FOLLOWING]->(u2:User)-[:POSTED]->(p2:Post),
(u1)-[:FRIEND_OF]->(u2)
RETURN u1.username, u1.name, p1, u2.username, u2.name, p2;
The reason you get multiple rows with the same p2 values is that your RETURN clause returns values related to u1 and u2 together. If there are N u1/p1 results and M u2/p2 results, you get N*M result rows.
To get a result with N rows (one row per u1/p1 result), you can use something like this query:
MATCH (u1:User {username: {user}})-[:POSTED]->(p1:Post)
OPTIONAL MATCH
(u1)-[:FOLLOWING]->(u2:User)-[:POSTED]->(p2:Post),
(u1)-[:FRIEND_OF]->(u2)
RETURN
u1.username, u1.name, p1,
COLLECT({username: u2.username, name: u2.name, p2: p2}) AS friends;
Each result row will have a friends collection with the data for each relevant friend.
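To see the N*M effect outside the database, here is a small sketch in plain Java. It assumes the flat result has already been read into (p1, p2) string pairs; RowGrouping and groupByPost are hypothetical names, and the grouping only imitates what COLLECT does in the query above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: imitates COLLECT by grouping flat rows on the first column.
final class RowGrouping {

    // Each input row is {p1, p2}; the result has one entry per distinct p1.
    static Map<String, List<String>> groupByPost(List<String[]> rows) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String[] row : rows) {
            grouped.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // N = 2 own posts, M = 2 friend posts: the flat result has N*M = 4 rows.
        List<String[]> flat = Arrays.asList(
                new String[] {"postA", "friendPost1"},
                new String[] {"postA", "friendPost2"},
                new String[] {"postB", "friendPost1"},
                new String[] {"postB", "friendPost2"});
        // After grouping there are N = 2 rows, each carrying the M friend posts.
        System.out.println(groupByPost(flat));
    }
}
```

Note that each friend post still appears once per own post; the grouping reduces the row count, it does not remove that repetition.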

SPARQL ARQ Query Execution

I have this piece of Jena code, which builds a query from Triple objects in an ElementTriplesBlock and finally uses QueryFactory.make(). I have a local Virtuoso instance set up, so my SPARQL endpoint is on localhost, i.e. just http://localhost:8890/sparql. The RDF data that I am querying is generated by the Lehigh University Benchmark generator. Now I am trying to replace triples in the query pattern based on some conditions: say the query is made of two BGPs or triple patterns, and one of the triple patterns gives zero results; I would then want to change that triple pattern to something else. How do I achieve this in Jena? My code looks like this:
//Create your triples
Triple pattern1 = Triple.create(Var.alloc("X"),Node.createURI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"),Node.createURI("http://swat.cse.lehigh.edu/onto/univ-bench.owl#AssociateProfessor"));
Triple pattern = Triple.create(Var.alloc("X"), Node.createURI("http://swat.cse.lehigh.edu/onto/univ-bench.owl#emailAddress"), Var.alloc("Y2"));
ElementTriplesBlock block = new ElementTriplesBlock();
block.addTriple(pattern1);
block.addTriple(pattern);
ElementGroup body = new ElementGroup();
body.addElement(block);
//Build a Query here
Query q = QueryFactory.make();
q.setPrefix("ub", "http://swat.cse.lehigh.edu/onto/univ-bench.owl#");
q.setQueryPattern(body);
q.setQuerySelectType();
q.addResultVar("X");
//?X ub:emailAddress ?Y2 .
//Query to String
System.out.println(q.toString());
QueryExecution qexec = QueryExecutionFactory.sparqlService("http://localhost:8890/sparql", q);
Op op = Algebra.optimize(Algebra.compile(q));
System.out.println(op.toString());
So to be clear I am able to actually see the BGP in a Relational Algebra form by using the Op op = Algebra.optimize(Algebra.compile(q)) line. The output looks like
(project (?X)
(bgp
(triple ?X <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://swat.cse.lehigh.edu/onto/univ-bench.owl#AssociateProfessor>)
(triple ?X <http://swat.cse.lehigh.edu/onto/univ-bench.owl#emailAddress> ?Y2)
))
Now how would I go about evaluating the execution of each triple? In this case, if I just wanted to print the number of results at each step of the query pattern execution, how would I do it? I did read some of the examples here. I guess one has to use an OpExecutor and a QueryIterator but I am not sure how they all fit together. In this case I just would want to iterate through each of the basic graph patterns and then output the basic graph pattern and the number of results that it returns from the end point. Any help or pointers would be appreciated.

How to get country of a municipality?

The ontology lies in XML here.
I also tried to ask which are the classes of my world and then tried to check if my resource (the municipality) really belongs to that class, but still Country slips away (although it's fetched when I ask for all the classes, it fails to connect via the property belongs_to to my resource).
I have also enabled Forward chaining for reasoning in Sesame! BTW, I am a beginner, so any tip would be gold to me. What am I missing?
Edit:
New query:
"SELECT ?res ?belongs " +
"WHERE {" +
"?res a geo:Mun ;"+
"geo:hasName \"mun name\" ;"+
"geo:belongs_to+ ?belongs ."+
"}";
Output:
[belongs=http://geo.linkedopendata.gr/gag/id/1304;res=http://geo.linkedopendata.gr/gag/id/9325]
[belongs=http://geo.linkedopendata.gr/gag/id/13;res=http://geo.linkedopendata.gr/gag/id/9325]
[belongs=http://geo.linkedopendata.gr/gag/id/997;res=http://geo.linkedopendata.gr/gag/id/9325]

If I understand your data model correctly, you have instances of class geo:Municipality, which belong to an instance of geo:RegionUnit, which in turn belongs to an instance of geo:Region, and so on, until ultimately they belong to an instance of geo:Country. And unless I misunderstand, your query tries to get back all these instances for one particular municipality.
This is quite simply done, and does not even require any RDFS inferencing support.
Let's build up the query one step at a time. First, let's grab the actual municipality itself:
SELECT ?res
WHERE {
?res a geo:Municipality ;
geo:έχει_επίσημο_όνομα "ΔΗΜΟΣ ΧΑΝΙΩΝ" .
}
I'm assuming here (since I don't speak Greek) that geo:έχει_επίσημο_όνομα is the RDF property that links the Municipality resource with its name label.
The second step is that we want to grab all other resources it belongs to.
SELECT ?res ?belongs
WHERE {
?res a geo:Municipality ;
geo:έχει_επίσημο_όνομα "ΔΗΜΟΣ ΧΑΝΙΩΝ" ;
geo:belongsTo ?belongs .
}
Of course, the above only gets us back the things that it directly belongs to. If we believe your ontology, this will give us back the region unit(s) it belongs to, but nothing else. But we want all of them, N steps removed, so we want to transitively follow the geo:belongsTo relation. This can be done using a transitive property path:
SELECT ?res ?belongs
WHERE {
?res a geo:Municipality ;
geo:έχει_επίσημο_όνομα "ΔΗΜΟΣ ΧΑΝΙΩΝ" ;
geo:belongsTo+ ?belongs .
}
Notice the +. This means "one or more times", so the variable ?belongs will be bound to any values that can be reached by following the geo:belongsTo property one or more times (the * that you used in your question, btw, expresses 'zero or more times').
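If it helps to picture what the + operator computes, here is a rough in-memory analogy in Java. The resource names in it (municipality, regionUnit, region, country) are made up for illustration and are not taken from your data, and TransitiveBelongsTo is a hypothetical class, not part of Sesame.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of "one or more times" path semantics.
final class TransitiveBelongsTo {

    // Follows belongsTo edges one or more times from start.
    // The start itself is only included if an edge chain leads back to it.
    static Set<String> reachable(Map<String, List<String>> belongsTo, String start) {
        Set<String> found = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>(
                belongsTo.getOrDefault(start, Collections.<String>emptyList()));
        while (!todo.isEmpty()) {
            String current = todo.poll();
            if (found.add(current)) {
                todo.addAll(belongsTo.getOrDefault(current, Collections.<String>emptyList()));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, List<String>> belongsTo = new HashMap<>();
        belongsTo.put("municipality", Collections.singletonList("regionUnit"));
        belongsTo.put("regionUnit", Collections.singletonList("region"));
        belongsTo.put("region", Collections.singletonList("country"));
        System.out.println(reachable(belongsTo, "municipality"));
    }
}
```

A single-step lookup in the map corresponds to the plain geo:belongsTo query; the loop is what the + adds.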
Update
Now, if in addition to getting the individual resources, you also want to get back the classes themselves that they belong to (that is, geo:RegionUnit, geo:Country, etc), you can amend the query like so:
SELECT ?res ?belongs ?adminUnit
WHERE {
?res a geo:Municipality ;
geo:έχει_επίσημο_όνομα "ΔΗΜΟΣ ΧΑΝΙΩΝ" ;
geo:belongsTo+ ?belongs .
?belongs a ?adminUnit .
}
This will give you back all administrative units that the given municipality belongs to, and the class of each particular administrative unit. Here, by the way, RDFS inferencing will make a slight difference: because all your admin-unit classes are defined as a rdfs:subClassOf the class geo:AdministrativeUnit, you will get back two results for each unit: once the specific class, and once the superclass geo:AdministrativeUnit.
Update 2
If the problem is that you get all the administrative units back except the country, then the most likely cause is that the geo:belongsTo relation between the specific Decentralised Admin (let's call it geo:DecAdm for short) and the Country is missing. To verify this, you can run the following query:
ASK WHERE {
<http://geo.linkedopendata.gr/gag/id/997> geo:belongsTo [ a geo:Country ] .
}
Replace <http://geo.linkedopendata.gr/gag/id/997> with the actual id of the geo:DecAdm you're interested in. The query will return true if the relation exists, false otherwise.
Or if you want to check more generally, you can do the following:
SELECT ?decAdmin ?country
WHERE {
?decAdmin a geo:DecAdm .
OPTIONAL { ?decAdmin geo:belongsTo ?country .
?country a geo:Country .
}
} ORDER BY ?decAdmin
This query will give you an overview of all decentralized administration instances, and the country to which they belong. If there is no country known for a particular ?decAdmin, the second column in your query result will be empty for that row.

Load Social Network Data into Neo4J

I have a dataset similar to Twitter's graph. The data is in the following form:
<user-id1> <List of ids which he follows separated by spaces>
<user-id2> <List of ids which he follows separated by spaces>
...
I want to model this in the form of a unidirectional graph, expressed in the cypher syntax as:
(A:Follower)-[:FOLLOWS]->(B:Followee)
The same user can appear more than once in the dataset as he might be in the friend list of more than one person, and he might also have his friend list as part of the data set. The challenge here is to make sure that there are no duplicate nodes for any user. And if the user appears as a Follower and Followee both in the data set, then the node's label should have both the values, i.e., Follower:Followee. There are about 980k nodes in the graph and size of dataset is 1.4 GB.
I am not sure whether Cypher's LOAD CSV will work here, because each line of the dataset has a variable number of columns, which makes it impossible to write a query that generates a node for each column. So what would be the best way to import this data into Neo4j without creating any duplicates?

I did exactly the same for the Friendster dataset, which has almost the same format as yours.
There, ":" separates the user id from the friend list, and the friends themselves are separated by ",".
These are the queries I used:
create index on :User(id);
USING PERIODIC COMMIT 1000
LOAD CSV FROM "file:///home/michael/import/friendster/friends-000______.txt" as line FIELDTERMINATOR ":"
MERGE (u1:User {id:line[0]})
;
USING PERIODIC COMMIT 1000
LOAD CSV FROM "file:///home/michael/import/friendster/friends-000______.txt" as line FIELDTERMINATOR ":"
WITH line[1] as id2
WHERE id2 <> '' AND id2 <> 'private' AND id2 <> 'notfound'
UNWIND split(id2,",") as id
WITH distinct id
MERGE (:User {id:id})
;
USING PERIODIC COMMIT 1000
LOAD CSV FROM "file:///home/michael/import/friendster/friends-000______.txt" as line FIELDTERMINATOR ":"
WITH line[0] as id1, line[1] as id2
WHERE id2 <> '' AND id2 <> 'private' AND id2 <> 'notfound'
MATCH (u1:User {id:id1})
UNWIND split(id2,",") as id
MATCH (u2:User {id:id})
CREATE (u1)-[:FRIEND_OF]->(u2)
;
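For the space-separated format in the question, one option is a small preprocessing step before any import. The sketch below is only an assumption about how that could look: FollowLineParser, parseEdges and distinctUsers are hypothetical names, and it presumes every line is "<user-id> <followee-id> <followee-id> ..." with whitespace between ids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical preprocessing helper for "<user-id> <followee ids...>" lines.
final class FollowLineParser {

    // Turns one line into {follower, followee} pairs, one pair per followee.
    static List<String[]> parseEdges(String line) {
        String[] ids = line.trim().split("\\s+");
        List<String[]> edges = new ArrayList<>();
        for (int i = 1; i < ids.length; i++) {
            edges.add(new String[] {ids[0], ids[i]});
        }
        return edges;
    }

    // Collects every user id once, whether it appears as follower or followee.
    static Set<String> distinctUsers(List<String> lines) {
        Set<String> users = new LinkedHashSet<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                users.addAll(Arrays.asList(trimmed.split("\\s+")));
            }
        }
        return users;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1 2 3", "2 1");
        System.out.println(distinctUsers(lines));
        for (String[] edge : parseEdges(lines.get(0))) {
            System.out.println(edge[0] + " FOLLOWS " + edge[1]);
        }
    }
}
```

Writing the pairs out as a two-column file gives every line a fixed number of columns, which is the shape the queries above expect after their split step.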

Problem with JDOQL to obtain results with a "contains" request

I am using Google App Engine for a project and I need to run some queries on the database. I use JDOQL to query it. In my case I want to obtain the universities whose name contains the substring "array". I think my query has a mistake, because it returns the names of universities in alphabetical order and not the ones containing the substring.
Query query = pm.newQuery("SELECT FROM " + University.class.getName() + " WHERE name.contains("+array+") ORDER BY name RANGE 0, 5");
Could someone tell me what's wrong in my query?
Thank you for your help!
EDIT
I have a list of universities stored, and I have a suggest box where a university can be requested by its name. I want to autocomplete the requested name.

App Engine does not support full-text search; you should star issue 217. However, a partial workaround is possible, and in your case I think it is a good fit.
First, adjust your model so that it also stores a lower-case (or upper-case) version of the name (I will assume it is called lname), unless you want your queries to be case-sensitive.
Then you query like this:
Query query = pm.newQuery(University.class);
// setFilter and declareParameters each replace any earlier value,
// so both conditions and both parameters go into a single call.
query.setFilter("lname >= startNameParam && lname < stopNameParam");
query.setOrdering("lname asc");
query.declareParameters("String startNameParam, String stopNameParam");
query.setRange(0, 5);
List<University> results = (List<University>) query.execute(search_value, search_value + "z");
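To show why a range on lname behaves like a prefix match, here is a sketch that applies the same half-open range to an in-memory sorted set. It does not touch the datastore; PrefixRange and startingWith are hypothetical names, and the sample university names are made up.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration of the range trick on an in-memory sorted set.
final class PrefixRange {

    // Returns the names in the half-open range [prefix, prefix + "z"),
    // the same bounds that are passed to the query.
    static SortedSet<String> startingWith(SortedSet<String> sortedNames, String prefix) {
        return sortedNames.subSet(prefix, prefix + "z");
    }

    public static void main(String[] args) {
        SortedSet<String> names = new TreeSet<>(
                Arrays.asList("harvard", "hartford", "yale", "harz institute"));
        System.out.println(startingWith(names, "har"));
    }
}
```

With these bounds, a name whose next character after the prefix sorts at or after "z" (such as "harz institute" for the prefix "har") falls outside the range, so the upper bound may need a higher sentinel character depending on your data.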

The correct way to do this is as follows:
Query query = pm.newQuery(University.class, ":p.contains(name)");
query.setOrdering("name asc");
query.setRange(0, 5);
List univs = (List) query.execute(Arrays.asList(array));
(Note: in this case :p is an implicit parameter name; you can replace it with any name.)
