Load Social Network Data into Neo4J - java

I have a dataset similar to Twitter's graph. The data is in the following form:
<user-id1> <List of ids which he follows separated by spaces>
<user-id2> <List of ids which he follows separated by spaces>
...
I want to model this in the form of a unidirectional graph, expressed in Cypher syntax as:
(A:Follower)-[:FOLLOWS]->(B:Followee)
The same user can appear more than once in the dataset, as he might be in the friend list of more than one person, and he might also have his own friend list as part of the dataset. The challenge here is to make sure that there are no duplicate nodes for any user. And if a user appears as both a Follower and a Followee in the dataset, then the node should carry both labels, i.e., Follower:Followee. There are about 980k nodes in the graph and the size of the dataset is 1.4 GB.
I am not sure if Cypher's LOAD CSV will work here, because each line of the dataset has a variable number of columns, making it impossible to write a query that generates the nodes for each of the columns. So what would be the best way to import this data into Neo4j without creating any duplicates?

I actually did exactly the same thing for the Friendster dataset, which has almost the same format as yours.
There, the separator between the many friends was ":".
The queries I used there are these:
// index so that the MERGE lookups below are fast
create index on :User(id);

// pass 1: create a node for the user on the left-hand side of each line
USING PERIODIC COMMIT 1000
LOAD CSV FROM "file:///home/michael/import/friendster/friends-000______.txt" as line FIELDTERMINATOR ":"
MERGE (u1:User {id:line[0]})
;

// pass 2: create a node for every friend id on the right-hand side
USING PERIODIC COMMIT 1000
LOAD CSV FROM "file:///home/michael/import/friendster/friends-000______.txt" as line FIELDTERMINATOR ":"
WITH line[1] as id2
WHERE id2 <> '' AND id2 <> 'private' AND id2 <> 'notfound'
UNWIND split(id2,",") as id
WITH distinct id
MERGE (:User {id:id})
;

// pass 3: all nodes exist now, so create the relationships
USING PERIODIC COMMIT 1000
LOAD CSV FROM "file:///home/michael/import/friendster/friends-000______.txt" as line FIELDTERMINATOR ":"
WITH line[0] as id1, line[1] as id2
WHERE id2 <> '' AND id2 <> 'private' AND id2 <> 'notfound'
MATCH (u1:User {id:id1})
UNWIND split(id2,",") as id
MATCH (u2:User {id:id})
CREATE (u1)-[:FRIEND_OF]->(u2)
;
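
Since the question is tagged java, here is a minimal sketch of how the same three-pass import could be driven from Java with the Neo4j Bolt driver (4.x packages assumed; the URI and credentials are placeholders, and the Cypher strings are the statements above). Statements that use USING PERIODIC COMMIT have to run in an auto-commit transaction, which is exactly what session.run() gives you:

import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Session;

public class FriendImport {

    private static final String CSV =
            "file:///home/michael/import/friendster/friends-000______.txt";

    // the same statements as above: index, then nodes, then relationships
    private static final String[] STATEMENTS = {
        "CREATE INDEX ON :User(id)",
        "USING PERIODIC COMMIT 1000 "
            + "LOAD CSV FROM '" + CSV + "' AS line FIELDTERMINATOR ':' "
            + "MERGE (u1:User {id:line[0]})",
        "USING PERIODIC COMMIT 1000 "
            + "LOAD CSV FROM '" + CSV + "' AS line FIELDTERMINATOR ':' "
            + "WITH line[1] AS id2 "
            + "WHERE id2 <> '' AND id2 <> 'private' AND id2 <> 'notfound' "
            + "UNWIND split(id2,',') AS id "
            + "WITH DISTINCT id "
            + "MERGE (:User {id:id})",
        "USING PERIODIC COMMIT 1000 "
            + "LOAD CSV FROM '" + CSV + "' AS line FIELDTERMINATOR ':' "
            + "WITH line[0] AS id1, line[1] AS id2 "
            + "WHERE id2 <> '' AND id2 <> 'private' AND id2 <> 'notfound' "
            + "MATCH (u1:User {id:id1}) "
            + "UNWIND split(id2,',') AS id "
            + "MATCH (u2:User {id:id}) "
            + "CREATE (u1)-[:FRIEND_OF]->(u2)"
    };

    public static void main(String[] args) {
        try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                     AuthTokens.basic("neo4j", "password"));
             Session session = driver.session()) {
            for (String statement : STATEMENTS) {
                // each run() is its own auto-commit transaction,
                // which USING PERIODIC COMMIT requires
                session.run(statement).consume();
            }
        }
    }
}

For the space-separated data in the question, one option is to pick a FIELDTERMINATOR character that never occurs in the file, so the whole line arrives as line[0], and then split() on a space instead of a comma.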

Related

How to update multiple rows using a single query with a mutable collection

I want to update rows in a table which contains the following columns:
`parameter_name`(PRIMARY KEY),
`option_order`,
`value`.
I have a collection called parameterCollection which contains "parameterNames", "optionOrders" and "values". This collection does not have a fixed size; it can receive as many parameters as you want.
Imagine I have 5 parameters inside my collection (I could have 28, or 10204, too) and I am trying to update the rows of the database using the following query. Example of query:
UPDATE insight_app_parameter_option
SET option_order IN (1,2,3,4,5), value IN ('a','b','c','d','e')
WHERE parameter_name IN ('name1', 'name2', 'name3', 'name4', 'name5')
But this isn't doing the job; instead it gives back an error which says: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'IN (1,2,3,4,5), value IN ('a','b','c','d','e') WHERE parameter_name IN ('name1'' at line 2
1,2,3,4,5 -> Represent the option orders inside parameterCollection.
'a','b','c','d','e' -> Represent the values inside parameterCollection.
'name1', 'name2', 'name3', 'name4', 'name5' -> Represent the names inside parameterCollection.
I know how to update each parameter separately, but I would like to do it all together. Here are some links I visited where people asked the same question, but they used a fixed collection of objects, not a mutable one:
MySQL - UPDATE multiple rows with different values in one query
Multiple rows update into a single query
SQL - Update multiple records in one query
That's not possible with MySQL. The error you are receiving is a syntax error; you are not able to set multiple values at once. This is the correct syntax for an UPDATE statement: (ref)
UPDATE [LOW_PRIORITY] [IGNORE] table_reference
SET assignment_list
[WHERE where_condition]
[ORDER BY ...]
[LIMIT row_count]
value:
{expr | DEFAULT}
assignment:
col_name = value
assignment_list:
assignment [, assignment] ...
You need to create a separate UPDATE for each row. I suggest executing them all in a single transaction, if that fits your case.
The correct syntax for your example is:
UPDATE insight_app_parameter_option
SET option_order = 1, value = 'a'
WHERE parameter_name = 'name1';
UPDATE insight_app_parameter_option
SET option_order = 2, value = 'b'
WHERE parameter_name = 'name2';
UPDATE insight_app_parameter_option
SET option_order = 3, value = 'c'
WHERE parameter_name = 'name3';
...
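
If these updates are issued from Java, a minimal JDBC sketch of the "separate UPDATEs inside a single transaction" approach could look like this (the method and parameter names are illustrative, and the three lists are assumed to be parallel, i.e. element i of each list describes the same parameter):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.List;

public class ParameterUpdater {

    // Runs one UPDATE per parameter, batched inside a single transaction.
    public static void updateParameters(Connection conn,
                                        List<String> parameterNames,
                                        List<Integer> optionOrders,
                                        List<String> values) throws Exception {
        String sql = "UPDATE insight_app_parameter_option "
                   + "SET option_order = ?, value = ? "
                   + "WHERE parameter_name = ?";
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);                 // start one transaction
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (int i = 0; i < parameterNames.size(); i++) {
                ps.setInt(1, optionOrders.get(i));
                ps.setString(2, values.get(i));
                ps.setString(3, parameterNames.get(i));
                ps.addBatch();                     // queue this UPDATE
            }
            ps.executeBatch();                     // execute all queued UPDATEs
            conn.commit();                         // make them visible together
        } catch (Exception e) {
            conn.rollback();                       // undo everything on failure
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}

addBatch()/executeBatch() lets the driver send the updates in batches, and the explicit commit/rollback keeps the whole set of updates atomic.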

Using a SQL stored procedure, can we get the following output? Or should we handle it within the code itself?

Image 1 is the current data; Image 2 is the data I need to store into a new table. The thing is, I want to combine all rows with the same ITEM_NO, putting their values into one comma-separated string, and insert the result into a new table.
Whilst I don't think storing data like this is a good idea at all (see what others have said in the comments), it is possible by doing:
SELECT REFERENCE_NO,
       ITEM_NO,
       ROLES = STUFF((SELECT N', ' + ENTITY_ROLE
                      FROM dbo.MyTable AS p2
                      WHERE p2.ITEM_NO = p.ITEM_NO
                      ORDER BY ENTITY_ROLE
                      FOR XML PATH(N'')), 1, 2, N'')
FROM dbo.MyTable AS p
GROUP BY REFERENCE_NO, ITEM_NO
ORDER BY ITEM_NO;
A demo of this in action: SQL Fiddle
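Since the goal is to store the combined rows in a new table, one option from code is to wrap the same SELECT in an INSERT ... SELECT and run it over JDBC. A minimal sketch; the target table NewTable and its columns are assumptions:

import java.sql.Connection;
import java.sql.Statement;

public class RoleAggregator {

    // Copies one combined row per ITEM_NO into a (hypothetical) target table
    // NewTable(REFERENCE_NO, ITEM_NO, ROLES).
    public static void copyCombinedRows(Connection conn) throws Exception {
        String sql =
              "INSERT INTO dbo.NewTable (REFERENCE_NO, ITEM_NO, ROLES) "
            + "SELECT REFERENCE_NO, ITEM_NO, "
            + "       STUFF((SELECT N', ' + ENTITY_ROLE "
            + "              FROM dbo.MyTable AS p2 "
            + "              WHERE p2.ITEM_NO = p.ITEM_NO "
            + "              ORDER BY ENTITY_ROLE "
            + "              FOR XML PATH(N'')), 1, 2, N'') "
            + "FROM dbo.MyTable AS p "
            + "GROUP BY REFERENCE_NO, ITEM_NO";
        try (Statement st = conn.createStatement()) {
            st.executeUpdate(sql);
        }
    }
}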

neo4j: Replace multiple nodes with same property by one node

Let's say I have a property "name" on nodes in Neo4j. Now I want to enforce that there is at most one node for a given name, by identifying all nodes with the same name. More precisely: if there are three nodes whose name is "dog", I want them to be replaced by just one node with name "dog", which:
Gathers all properties from all the original three nodes.
Has all arcs that were attached to the original three nodes.
The background for this is the following: in my graph there are often several nodes with the same name which should be considered "equal" (although some have richer property information than others). Putting a.name = b.name in a WHERE clause is extremely slow.
EDIT: I forgot to mention that my Neo4j is currently version 2.3.7 (I cannot update it).
SECOND EDIT: There is a known list of labels for the nodes and for the possible arcs. The type of the nodes is known.
THIRD EDIT: I want to call above "node collapse" procedure from Java, so a mixture of Cypher queries and procedural code would also be a useful solution.
I have made a test case with the following schema:
CREATE (n1:TestX {name:'A', val1:1})
CREATE (n2:TestX {name:'B', val2:2})
CREATE (n3:TestX {name:'B', val3:3})
CREATE (n4:TestX {name:'B', val4:4})
CREATE (n5:TestX {name:'C', val5:5})
MATCH (n6:TestX {name:'A', val1:1}) MATCH (m7:TestX {name:'B', val2:2}) CREATE (n6)-[:TEST]->(m7)
MATCH (n8:TestX {name:'C', val5:5}) MATCH (m10:TestX {name:'B', val3:3}) CREATE (n8)<-[:TEST]-(m10)
This results in the following output, where the three B nodes are really the same node. And here is my solution:
//copy all properties
MATCH (n:TestX), (m:TestX) WHERE n.name = m.name AND ID(n)<ID(m) WITH n, m SET n += m;
//copy all outgoing relations
MATCH (n:TestX), (m:TestX)-[r:TEST]->(endnode) WHERE n.name = m.name AND ID(n)<ID(m) WITH n, collect(endnode) as endnodes
FOREACH (x in endnodes | CREATE (n)-[:TEST]->(x));
//copy all incoming relations
MATCH (n:TestX), (m:TestX)<-[r:TEST]-(endnode) WHERE n.name = m.name AND ID(n)<ID(m) WITH n, collect(endnode) as endnodes
FOREACH (x in endnodes | CREATE (n)<-[:TEST]-(x));
//delete duplicates
MATCH (n:TestX), (m:TestX) WHERE n.name = m.name AND ID(n)<ID(m) detach delete m;
The resulting output looks like this:
Note that you have to know the types of the various relationships.
All the properties are copied from the nodes with "higher" IDs to the nodes with the "lower" IDs.
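
Regarding the third edit (calling this "node collapse" from Java): the Bolt Java driver needs Neo4j 3.0+, so on 2.3.7 the usual route is the embedded API (or the REST endpoint). A minimal sketch that runs the four statements above through GraphDatabaseService.execute() in one transaction, assuming you already have an embedded GraphDatabaseService handle:

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Transaction;

public class NodeCollapser {

    // The four Cypher statements from above, in order.
    private static final String[] COLLAPSE_STATEMENTS = {
        // copy all properties onto the surviving (lower-id) node
        "MATCH (n:TestX), (m:TestX) WHERE n.name = m.name AND ID(n)<ID(m) WITH n, m SET n += m",
        // copy all outgoing relationships
        "MATCH (n:TestX), (m:TestX)-[r:TEST]->(endnode) WHERE n.name = m.name AND ID(n)<ID(m) "
            + "WITH n, collect(endnode) as endnodes FOREACH (x in endnodes | CREATE (n)-[:TEST]->(x))",
        // copy all incoming relationships
        "MATCH (n:TestX), (m:TestX)<-[r:TEST]-(endnode) WHERE n.name = m.name AND ID(n)<ID(m) "
            + "WITH n, collect(endnode) as endnodes FOREACH (x in endnodes | CREATE (n)<-[:TEST]-(x))",
        // delete the duplicates
        "MATCH (n:TestX), (m:TestX) WHERE n.name = m.name AND ID(n)<ID(m) DETACH DELETE m"
    };

    // Runs the collapse inside a single transaction (Neo4j 2.x/3.x embedded API).
    public static void collapse(GraphDatabaseService db) {
        try (Transaction tx = db.beginTx()) {
            for (String statement : COLLAPSE_STATEMENTS) {
                db.execute(statement);
            }
            tx.success();   // mark the transaction to be committed on close
        }
    }
}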
I think you need something like synonym nodes.
1) Go through all nodes and create a node synonym:
MATCH (N)
WITH N
MERGE (S:Synonym {name: N.name})
MERGE (S)<-[:hasSynonym]-(N)
RETURN count(S);
2) Remove the synonyms with only one node:
MATCH (S:Synonym)
WITH S
MATCH (S)<-[:hasSynonym]-(N)
WITH S, count(N) as cnt
WHERE cnt = 1
DETACH DELETE S;
3) Transport properties and relationships for the remaining synonyms (with apoc):
MATCH (S:Synonym)<-[:hasSynonym]-(N)
WITH S, collect(N) as duplicates
WITH [S] + duplicates as nodesForMerge
CALL apoc.refactor.mergeNodes(nodesForMerge) YIELD node
RETURN count(node);
4) Remove Synonym label:
MATCH (S:Synonym)<-[:hasSynonym]-(N)
WITH DISTINCT S
CALL apoc.create.removeLabels([S], ['Synonym']) YIELD node
RETURN count(node);

ORMLite groupByRaw and groupBy issue on android SQLite db

I have a SQLite table content with the following columns:
-----------------------------------------------
|id|book_name|chapter_nr|verse_nr|word_nr|word|
-----------------------------------------------
the SQL query
select count(*) from content where book_name = 'John'
group by book_name, chapter_nr
in DB Browser returns 21 rows (which is the count of chapters)
the equivalent with ORMLite on Android:
long count = getHelper().getWordDao().queryBuilder()
.groupByRaw("book_name, chapter_nr")
.where()
.eq("book_name", book_name)
.countOf();
returns 828 rows (which is the count of verse numbers)
as far as I know the above code is translated to:
select count(*) from content
where book_name = 'John'
group by book_name, chapter_nr
result of this in DB Browser:
   | count(*)
---|---------
 1 | 828
 2 | 430
 3 | 653
...
21 | 542

21 rows returned from: select count(*)...
so it seems to me that ORMLite returns the first row of the query as the result of countOf().
I've searched Stack Overflow and Google a lot. I found this question (and, more interestingly, the answer):
You can also count the number of rows in a custom query by calling the countOf() method on the Where or QueryBuilder object.
// count the number of lines in this custom query
int numRows = dao.queryBuilder().where().eq("name", "Joe Smith").countOf();
this is (correct me if I'm wrong) exactly what I'm doing, but somehow I just get the wrong number of rows.
So... either I'm doing something wrong here or countOf() is not working the way it is supposed to.
Note: it's the same with groupBy instead of groupByRaw (according to the ORMLite documentation, chaining groupBy calls should work):
...
.groupBy("book_name")
.groupBy("chapter_nr")
.where(...)
.countOf()
EDIT: getWordDao returns a DAO for the class Word:
@DatabaseTable(tableName = "content")
public class Word { ... }
returns 828 rows (which is the count of verse numbers)
This seems to be a limitation of the QueryBuilder.countOf() mechanism. It is expecting a single value and does not understand the addition of GROUP BY to the count query. You can tell that it doesn't because that method returns a single long.
If you want to extract the counts for each of the groups, it looks like you will need to do a raw query; check out the docs.
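For example, a raw-query sketch along these lines (table and column names are the ones from the question; queryRawValue() returns a single long, queryRaw() streams the grouped rows, and GenericRawResults comes from com.j256.ormlite.dao.GenericRawResults):

// count of chapters (the 21 rows you see in DB Browser) as one scalar value
long chapterCount = getHelper().getWordDao().queryRawValue(
        "SELECT COUNT(DISTINCT chapter_nr) FROM content WHERE book_name = ?",
        book_name);

// per-chapter counts (828, 430, 653, ...) as raw rows
GenericRawResults<String[]> results = getHelper().getWordDao().queryRaw(
        "SELECT chapter_nr, COUNT(*) FROM content "
                + "WHERE book_name = ? GROUP BY book_name, chapter_nr",
        book_name);
for (String[] row : results) {
    int chapterNr = Integer.parseInt(row[0]);
    int verseCount = Integer.parseInt(row[1]);
    // ... use the per-chapter count ...
}
results.close(); // raw results hold an open cursor until closed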

Need suggestions to clarify the concept of mongoDB to store and retrieve images

I am new to MongoDB. I was told to use MongoDB for my photo management web app. I am not able to understand MongoDB's basic concept: documents.
What are documents in MongoDB?
j = { name : "mongo" };
t = { x : 3 };
On the MongoDB website they say that the above two lines are two documents.
But until now I thought .txt, .doc, .xls, etc. files were documents. (This may be funny, but I really need to understand these concepts!)
How do you represent a text file, for example example.txt, in MongoDB?
What is a collection?
A group of documents is known as a "collection" in MongoDB.
How many collections can I create?
Are all documents shared across all collections?
Finally I come to my part: how should I represent images in MongoDB?
With the help of tutorials I learned to store and retrieve images from MongoDB using Java!
But without understanding MongoDB's concepts I cannot move further!
The blogs and articles about MongoDB are pretty interesting, but I am still not able to understand its basic concepts!
Can anyone get MongoDB into my head?!
Perhaps comparing MongoDB to SQL would help you ...
In SQL, queries work against tables, columns and rows in set-based operations. There are pre-defined schemas (and hopefully indexes) to aid the query processor (as well as the querier!).
SQL Table / Rows
id | Column1 | Column2
-----------------------
1 | aaaaa | Bill
2 | bbbbb | Sally
3 | ccccc | Kyle
SQL Query
SELECT * FROM Table1 WHERE Column1 = 'aaaaa' ORDER BY Column2 DESC
This query would return all the columns in the table named Table1 where the column named Column1 has a value of aaaaa; it will then order the results by the value of Column2 and return them to the client in descending order.
MongoDB
In MongoDB there are no tables, columns or rows ... instead there are Collections (these are like tables) and Documents inside the Collections (like rows).
MongoDB Collection / Documents
{
    "_id" : ObjectId("497ce96f395f2f052a494fd4"),
    "attribute1" : "aaaaa",
    "attribute2" : "Bill",
    "randomAttribute" : "I am different"
}
{
    "_id" : ObjectId("497ce96f395f2f052a494fd5"),
    "attribute1" : "bbbbb",
    "attribute2" : "Sally"
}
{
    "_id" : ObjectId("497ce96f395f2f052a494fd6"),
    "attribute1" : "ccccc",
    "attribute2" : "Kyle"
}
However, there is no predefined "table structure" or "schema" like a SQL table has. For example, you can see that the first document in this collection has an attribute called randomAttribute that none of the other documents have.
This is just fine; it won't affect our queries, but it does allow for some very powerful things ...
The data is stored in a format called BSON, which is very close to the JavaScript JSON standard. You can find out more at http://bson.org/
MongoDB Query
SELECT * FROM Table1 WHERE Column1 = 'aaaaa' ORDER BY Column2 DESC
How would we do this same thing in MongoDB's Shell?
> db.collection1.find({attribute1:"aaaaa"}).sort({attribute2:-1});
Perhaps you can already see how similar a MongoDB query really is to SQL (while appearing quite different). I have some posts up on http://learnmongo.com which might help you as well.
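To come back to the original question about images: a single MongoDB document is limited to 16 MB, so the usual way to store photos is GridFS, which splits a file into chunk documents plus a metadata document. A minimal sketch with the MongoDB Java sync driver; the connection string, database name, bucket name and file names below are placeholders:

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.gridfs.GridFSBucket;
import com.mongodb.client.gridfs.GridFSBuckets;
import org.bson.types.ObjectId;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

public class PhotoStore {
    public static void main(String[] args) throws Exception {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoDatabase db = client.getDatabase("photos");
            GridFSBucket bucket = GridFSBuckets.create(db, "images");

            // store an image: the bytes go into chunk documents,
            // filename/length/upload date go into a files document
            ObjectId id;
            try (InputStream in = new FileInputStream("cat.jpg")) {
                id = bucket.uploadFromStream("cat.jpg", in);
            }

            // retrieve it again by its ObjectId
            try (OutputStream out = new FileOutputStream("cat-copy.jpg")) {
                bucket.downloadToStream(id, out);
            }
        }
    }
}

Small images such as thumbnails can alternatively be stored inline as a binary field of a regular document.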
MongoDB is a document database: http://en.wikipedia.org/wiki/Document-oriented_database
