neo4j: Replace multiple nodes with same property by one node

Let's say my nodes in Neo4j have a property "name". I want to enforce that there is at most one node for a given name by merging all nodes that share a name. More precisely: if there are three nodes whose name is "dog", I want them to be replaced by a single node with name "dog", which:
Gathers all properties from all the original three nodes.
Has all arcs that were attached to the original three nodes.
The background is the following: in my graph there are often several nodes with the same name which should be considered "equal" (although some have richer property information than others). Putting a.name = b.name in a WHERE clause is extremely slow.
EDIT: I forgot to mention that my Neo4j is of version 2.3.7 currently (I cannot update it).
SECOND EDIT: There is a known list of labels for the nodes and for the possible arcs. The type of the nodes is known.
THIRD EDIT: I want to call the above "node collapse" procedure from Java, so a mixture of Cypher queries and procedural code would also be a useful solution.

I made a test case with the following data:
CREATE (n1:TestX {name:'A', val1:1})
CREATE (n2:TestX {name:'B', val2:2})
CREATE (n3:TestX {name:'B', val3:3})
CREATE (n4:TestX {name:'B', val4:4})
CREATE (n5:TestX {name:'C', val5:5})
MATCH (n6:TestX {name:'A', val1:1}) MATCH (m7:TestX {name:'B', val2:2}) CREATE (n6)-[:TEST]->(m7)
MATCH (n8:TestX {name:'C', val5:5}) MATCH (m10:TestX {name:'B', val3:3}) CREATE (n8)<-[:TEST]-(m10)
This results in a graph in which the three B nodes really represent the same node. Here is my solution:
//copy all properties
MATCH (n:TestX), (m:TestX) WHERE n.name = m.name AND ID(n)<ID(m) WITH n, m SET n += m;
//copy all outgoing relations
MATCH (n:TestX), (m:TestX)-[r:TEST]->(endnode) WHERE n.name = m.name AND ID(n)<ID(m) WITH n, collect(endnode) as endnodes
FOREACH (x in endnodes | CREATE (n)-[:TEST]->(x));
//copy all incoming relations
MATCH (n:TestX), (m:TestX)<-[r:TEST]-(endnode) WHERE n.name = m.name AND ID(n)<ID(m) WITH n, collect(endnode) as endnodes
FOREACH (x in endnodes | CREATE (n)<-[:TEST]-(x));
//delete duplicates
MATCH (n:TestX), (m:TestX) WHERE n.name = m.name AND ID(n)<ID(m) detach delete m;
The resulting graph contains a single B node, connected to both A and C.
Note that you have to know the types of the relationships involved.
All the properties are copied from the nodes with "higher" IDs to the nodes with the "lower" IDs.
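
Since the question asks for something callable from Java, the rules of the queries above can also be written out as procedural code. The following is a minimal in-memory sketch, not Neo4j API code: the class `NodeCollapseSketch`, its `Edge` type and the map-based node representation are all hypothetical. It assumes that nodes are visited in ascending id order, that properties of higher ids overwrite equal keys on the survivor, and that repeated edges are dropped (the Cypher above would create them again).

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** In-memory sketch of the "node collapse" rules; every name here is hypothetical. */
class NodeCollapseSketch {

    /** A directed edge between two node ids. */
    static final class Edge {
        final long from;
        final long to;
        Edge(long from, long to) { this.from = from; this.to = to; }
        @Override public boolean equals(Object o) {
            return o instanceof Edge && ((Edge) o).from == from && ((Edge) o).to == to;
        }
        @Override public int hashCode() { return Long.hashCode(from) * 31 + Long.hashCode(to); }
        @Override public String toString() { return from + "->" + to; }
    }

    /** Maps every node id to its survivor: the lowest id carrying the same "name". */
    static Map<Long, Long> survivors(TreeMap<Long, Map<String, Object>> nodes) {
        Map<Object, Long> firstIdByName = new LinkedHashMap<>();
        Map<Long, Long> survivorOf = new LinkedHashMap<>();
        for (Map.Entry<Long, Map<String, Object>> e : nodes.entrySet()) {
            Object name = e.getValue().get("name");
            Long id = e.getKey();
            // Nodes without a name are never merged with anything.
            Long survivor = (name == null) ? id : firstIdByName.computeIfAbsent(name, k -> id);
            survivorOf.put(id, survivor);
        }
        return survivorOf;
    }

    /** Collapses duplicates in place (property maps must be mutable) and returns the rewired edges. */
    static Set<Edge> collapse(TreeMap<Long, Map<String, Object>> nodes, List<Edge> edges) {
        Map<Long, Long> survivorOf = survivors(nodes);
        // Copy properties from each duplicate onto its survivor, then drop the duplicate.
        for (Map.Entry<Long, Long> e : survivorOf.entrySet()) {
            if (!e.getKey().equals(e.getValue())) {
                nodes.get(e.getValue()).putAll(nodes.remove(e.getKey()));
            }
        }
        // Redirect both ends of every edge to the survivors; the set drops repeats.
        // Assumes every edge endpoint is present in the node map.
        Set<Edge> rewired = new LinkedHashSet<>();
        for (Edge edge : edges) {
            rewired.add(new Edge(survivorOf.get(edge.from), survivorOf.get(edge.to)));
        }
        return rewired;
    }
}
```

In a real application the node map and edge list would be read from the database and the result written back with Cypher; that part is left out here because it depends on the driver and Neo4j version in use.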

I think you need something like a synonym of nodes.
1) Go through all nodes and create a node synonym:
MATCH (N)
WITH N
MERGE (S:Synonym {name: N.name})
MERGE (S)<-[:hasSynonym]-(N)
RETURN count(S);
2) Remove the synonyms with only one node:
MATCH (S:Synonym)
WITH S
MATCH (S)<-[:hasSynonym]-(N)
WITH S, count(N) as count
WITH S WHERE count = 1
DETACH DELETE S;
3) Transport properties and relationships for the remaining synonyms (using APOC, which requires Neo4j 3.0 or later):
MATCH (S:Synonym)
WITH S
MATCH (S)<-[:hasSynonym]-(N)
WITH [S] + collect(N) as nodesForMerge
CALL apoc.refactor.mergeNodes( nodesForMerge );
4) Remove Synonym label:
MATCH (S:Synonym)<-[:hasSynonym]-(N)
CALL apoc.create.removeLabels( [S], ['Synonym'] );

Related

Is there any simple way of Comparing two graphs in neo4j

My back-end generates a graph that contains a top node (let's call it Node 1). The graph looks as follows:
Level 1: TOPNODE
Level 2: nodes 1, 2
Level 3: nodes 3, 4
Level 4: nodes 5, 6
The top node contains the date the graph was generated.
Below that, all the even levels (levels 2 and 4, which contain nodes 1, 2, 5 and 6) contain a unique name and a value, e.g. a phone number.
All the odd levels (level 3, i.e. nodes 3 and 4) contain their parent's name and their children's information.
In my service I can edit parts of the graph. For example, I can change the value in a node (not the name), or I can delete nodes at once. But I can only access the edited information by regenerating that part of the graph.
So my question is: can I get the full graph into Java, compare only the affected subgraph with the newly generated subgraph, and then create a new version of the old graph that includes the changes?
What I have tried:
Pulling the whole graph into Java as JSON and comparing that with the smaller graph. This works, but I don't know whether there is a more efficient way, or whether there is a way to get the nodes into Java as actual nodes instead of JSON. To get the JSON I did the following:
Session session = driver.session();
String message = "START n=node(*) MATCH (n)-[r]->(m) RETURN n,r,m;";
StatementResult result = session.run(message);
Gson gson = new Gson(); // one instance can be reused for every record
while (result.hasNext()) {
    Record record = result.next();
    System.out.println(gson.toJson(record.asMap()));
    String m = gson.toJson(record.asMap().get("n"));
    JSONObject json = new JSONObject(gson.toJson(record.asMap()));
    convert(json, m);
}
session.close();
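
Because the even-level nodes are described as having a unique name and a value, one way to think about the comparison is as a merge of two name-to-value maps. The following is only a rough sketch under that assumption: `SubgraphMergeSketch`, `applyChanges` and the `scope` parameter (the names belonging to the regenerated part) are hypothetical, and relationships are ignored entirely.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Sketch of building a new graph version from a regenerated subgraph; names are hypothetical. */
class SubgraphMergeSketch {

    /**
     * Returns a new name -> value map for the whole graph.
     * full:        name -> value for every node of the old graph
     * scope:       names that belong to the part that was regenerated
     * regenerated: current name -> value pairs for that part
     */
    static Map<String, String> applyChanges(Map<String, String> full,
                                            Set<String> scope,
                                            Map<String, String> regenerated) {
        Map<String, String> next = new LinkedHashMap<>(full);
        for (String name : scope) {
            if (regenerated.containsKey(name)) {
                next.put(name, regenerated.get(name)); // value kept or changed
            } else {
                next.remove(name);                     // node was deleted
            }
        }
        return next;
    }
}
```

Nodes outside `scope` are copied unchanged, so only the regenerated part has to be compared.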

Java Neo4j Cypher Or Match

I have a graph where a user can have posts and can also have friends who have posts; a friend may or may not be followed.
How can I query all the posts by the user and by the friends that he is following?
I tried this :
" MATCH (u1:User)-[:POSTED]->(p1:Post)"
+ " WHERE u1.username =~ '"+user+"'"
+ " OPTIONAL MATCH (u3:User)-[:FOLLOWING]->(u2:User)-[:POSTED]->(p2:Post),"
+ " (u3:User)-[:FRIEND_OF]->(u2:User)"
+ " WHERE u3.username =~ '"+user+"' return u1.username, u1.name,"
+ "p1 ,u2.username, u2.name , p2";
But this query returns duplicates. Say we have our user and one friend.
The friend has one post and the user has two; the query returns the friend's post twice. It is as if, for every row of the MATCH, the query also returns the result of the OPTIONAL MATCH.
To explain further:
(u:User)-[:POSTED]->(p:Post)
(u:User)-[:FRIEND_OF]->(u2:User)
(u:User)-[:FOLLOWING]->(u2:User)-[:POSTED]->(p2:Post)
These are the relationships that exist. What I want is all posts (:Post) that meet those relationships, without duplicates and preferably with a single query.

First of all, your query is much more complex than it needs to be. This simpler query should be equivalent. I assume that {user} is supplied as a parameter.
MATCH (u1:User {username: {user}})-[:POSTED]->(p1:Post)
OPTIONAL MATCH
(u1)-[:FOLLOWING]->(u2:User)-[:POSTED]->(p2:Post),
(u1)-[:FRIEND_OF]->(u2)
RETURN u1.username, u1.name, p1, u2.username, u2.name, p2;
The reason you get multiple rows with the same p2 values is that your RETURN clause is returning values related to u1 and u2 together. If there are N u1/p1 results and M u2/p2 results, then you'd get N*M result rows.
To get a result with N rows (with one row per u1/p1 result), you can use something like this query:
MATCH (u1:User {username: {user}})-[:POSTED]->(p1:Post)
OPTIONAL MATCH
(u1)-[:FOLLOWING]->(u2:User)-[:POSTED]->(p2:Post),
(u1)-[:FRIEND_OF]->(u2)
RETURN
u1.username, u1.name, p1,
COLLECT({username: u2.username, name: u2.name, p2: p2}) AS friends;
Each result row will have a friends collection with the data for each relevant friend.
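
The N*M row multiplication described above, and one way to flatten such rows into a duplicate-free list of posts on the client side, can be illustrated in plain Java. This is a sketch only: rows are modelled as String pairs instead of driver records, and `RowProductSketch` with its two methods is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrates why MATCH followed by OPTIONAL MATCH yields N*M rows; names are hypothetical. */
class RowProductSketch {

    /** Builds one row per (p1, p2) combination, as the first query does. */
    static List<String[]> rows(List<String> userPosts, List<String> friendPosts) {
        List<String[]> result = new ArrayList<>();
        for (String p1 : userPosts) {
            if (friendPosts.isEmpty()) {
                result.add(new String[] {p1, null}); // OPTIONAL MATCH pads with null
            }
            for (String p2 : friendPosts) {
                result.add(new String[] {p1, p2});
            }
        }
        return result;
    }

    /** Collects every post once, keeping first-seen order. */
    static Set<String> distinctPosts(List<String[]> rows) {
        Set<String> posts = new LinkedHashSet<>();
        for (String[] row : rows) {
            posts.add(row[0]);
            if (row[1] != null) {
                posts.add(row[1]);
            }
        }
        return posts;
    }
}
```

With two user posts and one friend post, `rows` returns two rows that both carry the friend's post, which is exactly the duplication the question describes; `distinctPosts` reduces them to three posts.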

Neo4j-ogm query path

In my Java code I have a query to match the shortest path from root to a leaf in my tree.
String query = "Match path = (p:Root)-[*1..100]-(m:Leaf) "
+ "WITH p,m,path ORDER BY length(path) LIMIT 1 RETURN path";
However, when I try to query this as follows
SessionFactory sessionFactory = new SessionFactory("incyan.Data.Neo4j.Models");
Session session = sessionFactory.openSession("http://localhost:7474");
Object o = session.query(query, new HashMap<String,Object>());
o contains an ArrayList of LinkedHashMaps instead of mapped objects.
I cannot even determine the labels of the path elements and the start and end nodes of the relations.
What am I doing wrong?

The current neo4j-ogm release does not map query results to domain entities. Returning a path only gives you the properties of the nodes and relationships in that path (in order, so you can infer each relationship's start and end). IDs aren't returned by the Neo4j REST API that the OGM currently uses for this particular operation, which is why they are missing. You may instead have to extract the IDs and return them explicitly as part of your query.
Mapping individual query result columns to entities will be available in a Neo4j-OGM 2.0 release.

I'm not sure about the Java bit, but if you use the shortestPath function (keyword?) your query should be more efficient:
MATCH path=shortestPath((p:Root)-[*1..100]-(m:Leaf))
RETURN path
Also, I don't know what your data model is like, but I would expect the labels on the nodes of your tree (I'm assuming it's a tree) to all be the same. You can tell if a node is a root or a leaf using Cypher:
MATCH path=shortestPath((root:Element)-[*1..100]-(leaf:Element))
WHERE NOT((root)-[:HAS_PARENT]->()) AND NOT(()-[:HAS_PARENT]->(leaf))
RETURN path

Load Social Network Data into Neo4J

I have a dataset similar to Twitter's graph. The data is in the following form:
<user-id1> <List of ids which he follows separated by spaces>
<user-id2> <List of ids which he follows separated by spaces>
...
I want to model this in the form of a unidirectional graph, expressed in the cypher syntax as:
(A:Follower)-[:FOLLOWS]->(B:Followee)
The same user can appear more than once in the dataset as he might be in the friend list of more than one person, and he might also have his friend list as part of the data set. The challenge here is to make sure that there are no duplicate nodes for any user. And if the user appears as a Follower and Followee both in the data set, then the node's label should have both the values, i.e., Follower:Followee. There are about 980k nodes in the graph and size of dataset is 1.4 GB.
I am not sure if Cypher's load CSV will work here because each line of the dataset has a variable number of columns making it impossible to write a query to generate the nodes for each of the columns. So what would be the best way to import this data into Neo4j without creating any duplicates?

I did exactly the same for the Friendster dataset, which has almost the same format as yours.
There, the user id was separated from the friend list by ":", and the friends within the list by ",".
These are the queries I used:
create index on :User(id);
USING PERIODIC COMMIT 1000
LOAD CSV FROM "file:///home/michael/import/friendster/friends-000______.txt" as line FIELDTERMINATOR ":"
MERGE (u1:User {id:line[0]})
;
USING PERIODIC COMMIT 1000
LOAD CSV FROM "file:///home/michael/import/friendster/friends-000______.txt" as line FIELDTERMINATOR ":"
WITH line[1] as id2
WHERE id2 <> '' AND id2 <> 'private' AND id2 <> 'notfound'
UNWIND split(id2,",") as id
WITH distinct id
MERGE (:User {id:id})
;
USING PERIODIC COMMIT 1000
LOAD CSV FROM "file:///home/michael/import/friendster/friends-000______.txt" as line FIELDTERMINATOR ":"
WITH line[0] as id1, line[1] as id2
WHERE id2 <> '' AND id2 <> 'private' AND id2 <> 'notfound'
MATCH (u1:User {id:id1})
UNWIND split(id2,",") as id
MATCH (u2:User {id:id})
CREATE (u1)-[:FRIEND_OF]->(u2)
;
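
For the space-separated format described in the question, the de-duplication and labelling rules could also be applied in a Java pre-processing step before the import. A minimal sketch, assuming each line is a user id followed by the ids that user follows; the class `FollowerParseSketch` and its fields are hypothetical, and nothing here talks to Neo4j.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Parses "<user-id> <followed ids...>" lines into distinct users and edges; names are hypothetical. */
class FollowerParseSketch {

    enum Role { Follower, Followee }

    /** One entry per distinct user id, with every role seen so far. */
    final Map<String, Set<Role>> users = new LinkedHashMap<>();

    /** Each element is a {followerId, followeeId} pair. */
    final List<String[]> edges = new ArrayList<>();

    void addLine(String line) {
        String[] ids = line.trim().split("\\s+");
        if (ids[0].isEmpty()) {
            return; // blank line
        }
        Set<Role> own = users.computeIfAbsent(ids[0], k -> EnumSet.noneOf(Role.class));
        for (int i = 1; i < ids.length; i++) {
            own.add(Role.Follower);
            users.computeIfAbsent(ids[i], k -> EnumSet.noneOf(Role.class)).add(Role.Followee);
            edges.add(new String[] {ids[0], ids[i]});
        }
    }
}
```

Because `users` is keyed by id, a user who appears in many lines still ends up as one entry, carrying both roles if needed. For a 1.4 GB input the edge list would probably be written to a file as it is produced rather than held in memory.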

Hibernate criteria with restrictions on children

I have a Hibernate criteria call that I want to execute in one SQL statement. What I'm trying to do is select instances of Parent that have Children with a property in a range of values (SQL IN clause), all while loading the children using an outer join. Here's what I have so far:
Criteria c = session.createCriteria(Parent.class);
c.createAlias("children", "c", CriteriaSpecification.LEFT_JOIN)
.setFetchMode("c", FetchMode.JOIN)
.add(Restrictions.in("c.property", properties));
c.setResultTransformer(CriteriaSpecification.DISTINCT_ROOT_ENTITY);
return c.list();
Here's some sample data:
Parent:
Parent ID
A
B
C

Children:
Child ID | Parent ID | property
...      | A         | 0
...      | A         | 2
...      | A         | 7
...      | B         | 1
...      | C         | 1
...      | C         | 2
...      | C         | 3
What I want to do is return the parents and ALL their children if one of the children has a property equal to my bind parameter(s). Let's assume properties is an array containing {2}. In this case, the call will return parents A and C but their child collections will contain only element 2. I.e. Parent[Children]:
A[2] & C[2]
What I want is:
A[0, 2, 7] & C[1, 2, 3]
If this is not a bug, it seems to be a broken semantic. I don't see how calling A.getChildren() or C.getChildren() and returning 1 record would ever be considered correct -- this is not a projection. I.e. if I change the query to use the default select fetch, it returns the proper children collections, albeit with a multitude of queries:
c.createAlias("children", "c").add(
Restrictions.in("c.property", properties));
Is this a bug? If not, how can I achieve my desired result?

Criteria c = session.createCriteria(Parent.class);
c.createAlias("children", "children");
c.add(Restrictions.in("children.property", properties));
c.setResultTransformer(CriteriaSpecification.DISTINCT_ROOT_ENTITY);
return c.list();

getChildren() is just the name of the getter; your query determines how the objects get populated.
I'm going to guess here that the first part spits out
SELECT * FROM Parent
INNER JOIN Child c ON ...
WHERE c.property in (x,y,z)
which doesn't get you what you want. What you'd want to do if you were writing this in raw SQL is this:
SELECT * FROM Parent
WHERE ParentID IN (SELECT DISTINCT c.parentID FROM Child c WHERE c.property in (x,y,z))
Rearranging your criteria appropriately might do the trick if the last one isn't producing this query. (Could you also post what Hibernate is generating for each?)
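
The difference between the two SQL shapes above can be shown on the sample data from the question with plain Java collections. This is a sketch of the semantics only, not of what Hibernate does internally; `ChildRestrictionSketch` and both method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Contrasts restricting joined rows with selecting parents through a subquery; names are hypothetical. */
class ChildRestrictionSketch {

    /** Restriction on the joined rows: only the matching children survive. */
    static Map<String, List<Integer>> restrictJoinedRows(
            Map<String, List<Integer>> childrenByParent, Set<Integer> wanted) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> e : childrenByParent.entrySet()) {
            for (Integer property : e.getValue()) {
                if (wanted.contains(property)) {
                    result.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(property);
                }
            }
        }
        return result;
    }

    /** Subquery form: a parent qualifies if any child matches, and keeps all its children. */
    static Map<String, List<Integer>> selectParentsBySubquery(
            Map<String, List<Integer>> childrenByParent, Set<Integer> wanted) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> e : childrenByParent.entrySet()) {
            boolean anyMatch = false;
            for (Integer property : e.getValue()) {
                if (wanted.contains(property)) {
                    anyMatch = true;
                }
            }
            if (anyMatch) {
                result.put(e.getKey(), new ArrayList<>(e.getValue()));
            }
        }
        return result;
    }
}
```

With the question's data and properties = {2}, the first method yields A[2] and C[2], while the second yields A[0, 2, 7] and C[1, 2, 3], which is the result the question asks for.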

I would start the Criteria with the child class. You'll get a list with all the children, and then you can iterate and get the parent of each child.

This can be done with a workaround:
Criteria c1 = session.createCriteria(Child.class);
c1.add(Restrictions.in("property", properties));
c1.setProjection( Projections.distinct( Projections.property( "parentId" ) ) );
List<Integer> parentIds = c1.list();
Criteria c2 = session.createCriteria(Parent.class);
c2.createAlias("children", "children");
c2.add(Restrictions.in("id", parentIds));
return c2.list();
