How to do batch insert using neo4j cypher and java

I am performing single inserts in a for-each loop, one per value.
How can I do a batch insert using Cypher queries?
Here is my code:
Controller
@PostMapping("/geohash")
public Set<String> create(@RequestParam String name, @RequestBody LatLng[] latLngs) {
double[][] polygonPoints = convertTo2dArrayOfLatLng(latLngs);
Set<String> geoHashesForPolygon = GeoHashUtils.geoHashesForPolygon(6, polygonPoints);
for (String geohash : geoHashesForPolygon) {
min = Math.min(min, geohash.length());
geohashes = neoService.create(name, geohash);
}
return geoHashesForPolygon;
}
I want to insert each entry of geoHashesForPolygon as a single node.
Cypher query
@Query("MATCH (c:C) WHERE c.name = {name} CREATE (g: G{name : {geohash}} )<-[:cToG]-(c) RETURN c,g")
public GeohashOfCluster create(@Param("name") String name, @Param("geohash") String geohash);

You can put the params in a list and unwind it to create the nodes. Your query would look like this:
WITH [{name:'',geohash:''},{name:'',geohash:''},{name:'',geohash:''}] AS data
UNWIND data AS d
MATCH (c:C) WHERE c.name = d.name
CREATE (g:G {name: d.geohash})<-[:cToG]-(c)
RETURN c, g
Hope this helps!
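On the Java side, the per-geohash values from the controller loop could be collected into one list of maps, shaped like the entries in the WITH clause above, and then sent in a single call. This is only a minimal sketch: the class `GeohashBatch` and the method `toRows` are hypothetical names, and it assumes the query is changed to read the list from one query parameter instead of the inline literal.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the original code.
class GeohashBatch {

    // Builds one map per geohash, shaped like the {name: ..., geohash: ...}
    // entries that the UNWIND query iterates over.
    static List<Map<String, Object>> toRows(String clusterName, Set<String> geohashes) {
        List<Map<String, Object>> rows = new ArrayList<>(geohashes.size());
        for (String geohash : geohashes) {
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("name", clusterName);   // matched against c.name
            row.put("geohash", geohash);    // becomes the name of the new G node
            rows.add(row);
        }
        return rows;
    }
}
```

The resulting list would replace the for loop in the controller: one repository call with the whole list instead of one call per geohash.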

Related

How to return one random element by Query

I'm trying to return a random element in Spring using Query.
I have this:
@Override
public List<AdventureHolidays> findRandomTrekking() {
Query query = new Query();
query.addCriteria(Criteria.where("typeOfAdventureHolidays").is("trekking"));
return mongoTemplate.find(query, AdventureHolidays.class);
}
But this returns all elements that match my criteria.
I tried:
return mongoTemplate.findOne(query, AdventureHolidays.class);
but then I get a type error: required type List, provided AdventureHolidays.
I also tried this, but this way elements sometimes appear twice:
@Aggregation(pipeline = {"{'$match':{'typeOfAdventureHolidays':'trekking'}}", "{$sample: {size:1}}"})
So I found a way with this Query, but it lists all documents, while I want just one random document from the collection.

After some discussion this is what OP asked for:
private static Queue<AdventureHolidays> elementsToReturn = new LinkedList<>();
public AdventureHolidays findRandomTrekking() {
if (elementsToReturn.size() == 0) { //fetch data from db
Query query = new Query();
query.addCriteria(Criteria.where("typeOfAdventureHolidays")
.is("trekking"));
List<AdventureHolidays> newData = mongoTemplate.find(query, AdventureHolidays.class);
Collections.shuffle(newData);
elementsToReturn.addAll(newData);
}
return elementsToReturn.poll(); //returns null if no matching documents were found
}
Original answer:
You need to change the return type of the method:
public AdventureHolidays findRandomTrekking() {
Query query = new Query();
query.addCriteria(Criteria.where("typeOfAdventureHolidays").is("trekking"));
return mongoTemplate.findOne(query, AdventureHolidays.class);
}
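If fetching all matching documents is acceptable, a single random element can also be picked on the Java side. A minimal sketch, where `RandomPick` and `pickRandom` are hypothetical names and the list stands in for the result of `mongoTemplate.find(query, AdventureHolidays.class)`:

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper, not part of the original answer.
class RandomPick {

    // Returns one uniformly chosen element, or an empty Optional for an empty list.
    static <T> Optional<T> pickRandom(List<T> candidates) {
        if (candidates.isEmpty()) {
            return Optional.empty();
        }
        int index = ThreadLocalRandom.current().nextInt(candidates.size());
        return Optional.of(candidates.get(index));
    }
}
```

Unlike the shuffled queue above, this can return the same element on consecutive calls, because each call draws independently.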

Improve Performance with Multiple Row Inserts and Fetches with Oracle SQL Stored Procedures and Java Spring

I currently have stored procedures for Oracle SQL, version 18c, for both inserting and fetching multiple rows of data from one parent table and one child table, being called from my Java Spring Boot application. Everything works fine, but it is extremely slow, for only a few rows of data.
Inserting only 70 records across the two tables, both initially empty, takes up to 267 seconds. Fetching that same data back out takes about 40 seconds.
Any help would be greatly appreciated. Please let me know if any additional information is needed.
Below is a cut-down and renamed version of the stored procedures for my parent and child tables; the actual parent table has 32 columns and the child table has 11.
PROCEDURE processParentData(
i_field_one varchar2,
v_parent_id OUT number) is
v_new PARENT%ROWTYPE;
BEGIN
v_new.id := ROW_SEQUENCE.nextval;
v_new.insert_time := systimestamp;
v_new.field_one := i_field_one;
insert into PARENT values v_new;
v_parent_id := v_new.id;
END;
PROCEDURE readParentData(
i_field_one IN varchar2,
v_parent OUT SYS_REFCURSOR) AS
BEGIN
OPEN v_parent FOR select h.* from PARENT h
where h.field_one = i_field_one;
END;
PROCEDURE processChild(
i_field_one varchar2,
i_parent_id number) is
v_new CHILD%ROWTYPE;
BEGIN
v_new.id := ROW_SEQUENCE.nextval;
v_new.insert_time := systimestamp;
v_new.field_one := i_field_one;
v_new.parent_id := i_parent_id;
insert into CHILD values v_new;
END;
PROCEDURE readChild(
i_parent_id IN number,
v_child OUT SYS_REFCURSOR) AS
BEGIN
OPEN v_child FOR select h.* from CHILD h
where h.parent_id = i_parent_id;
END;
For my Java code I am using Spring JDBC. After I get the parent data, I then fetch each child data by looping through the parent data and calling readChild with the parent ID for each.
var simpleJdbcCall = new SimpleJdbcCall(jdbcTemplate)
.withCatalogName("PARENT_PACKAGE")
.withProcedureName("processParentData");
SqlParameterSource sqlParameterSource = new MapSqlParameterSource()
.addValue("i_field_one", locationId)
.addValue("v_parent_id", null);
Map<String, Object> out = simpleJdbcCall.execute(sqlParameterSource);
var stopId = (BigDecimal) out.get("v_parent_id");
return stopId.longValue();
var simpleJdbcCall = new SimpleJdbcCall(jdbcTemplate)
.withCatalogName("PARENT_PACKAGE")
.withProcedureName("readParentData")
.returningResultSet("v_parent", BeanPropertyRowMapper.newInstance(Parent.class));
SqlParameterSource sqlParameterSource = new MapSqlParameterSource()
.addValue("i_field_one", location.getId());
Map<String, Object> out = simpleJdbcCall.execute(sqlParameterSource);
return (List<Parent>) out.get("v_parent");
UPDATE 1: As far as I know and have tested, using the same data and tables, if I use pure JDBC or JPA/Hibernate to insert into and fetch from the tables directly and avoid stored procedures, the whole process of inserting and fetching takes only a few seconds.
The issue is that the company I work at has set a policy that, going forward, applications are not allowed to have direct read/write access to the database and everything must be done through stored procedures, for security reasons, they say. This means I need to work out how to do what we have been doing for years with direct read/write access using only Oracle stored procedures.
UPDATE 2: Adding my current Java code for fetching the child data.
for (Parent parent : parents) {
parent.setChilds(childRepository.readChildByParentId(parent.getId()));
}
public List<Child> readChildByParentId(long parentId) {
var simpleJdbcCall = new SimpleJdbcCall(jdbcTemplate)
.withCatalogName("CHILD_PACKAGE")
.withProcedureName("readChild")
.returningResultSet("v_child", BeanPropertyRowMapper.newInstance(Child.class));
SqlParameterSource sqlParameterSource = new MapSqlParameterSource()
.addValue("i_parent_id ", parentId);
Map<String, Object> out = simpleJdbcCall.execute(sqlParameterSource);
return (List<Child>) out.get("v_child");
}

The problem is that the insert you are performing through the stored procedure is not optimized, because you call the database every time you insert a row.
I strongly recommend that you transform the data to XML (CSV would also work), pass it to the procedure, then loop over it and perform the inserts you need.
Here is an example for Oracle:
CREATE OR REPLACE PROCEDURE MY_SCHEMA.my_procedure(xmlData clob) IS
begin
-- Parse the XML once and iterate over one row per CONTACT element
FOR CONTACT IN (SELECT *
FROM XMLTABLE(
'/CONTACTS/CONTACT' PASSING
XMLTYPE(xmlData)
COLUMNS param_id FOR ORDINALITY
,id NUMBER PATH 'ID'
,name VARCHAR2(100) PATH 'NAME'
,surname VARCHAR2(100) PATH 'SURNAME'
))
LOOP
INSERT INTO PARENT_TABLE VALUES (CONTACT.id, CONTACT.name, CONTACT.surname);
end loop;
end;
You can pass the XML to the procedure as a String:
<CONTACTS>
<CONTACT>
<ID>1</ID>
<NAME>John</NAME>
<SURNAME>Smith</SURNAME>
</CONTACT>
</CONTACTS>
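A String in this shape could be assembled in Java before calling the procedure. The following is only a sketch under assumptions: `ContactsXml`, `escape` and `toXml` are hypothetical names, each row is assumed to be an array of `{id, name, surname}`, and only the characters `&`, `<` and `>` are escaped.

```java
import java.util.List;

// Hypothetical helper, not part of the original answer.
class ContactsXml {

    // Escapes the characters that would otherwise break element content.
    static String escape(String value) {
        return value.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;");
    }

    // Each row is assumed to hold {id, name, surname}, in that order.
    static String toXml(List<String[]> contacts) {
        StringBuilder xml = new StringBuilder("<CONTACTS>");
        for (String[] contact : contacts) {
            xml.append("<CONTACT>")
               .append("<ID>").append(escape(contact[0])).append("</ID>")
               .append("<NAME>").append(escape(contact[1])).append("</NAME>")
               .append("<SURNAME>").append(escape(contact[2])).append("</SURNAME>")
               .append("</CONTACT>");
        }
        return xml.append("</CONTACTS>").toString();
    }
}
```

The whole document is then sent in one procedure call, so the number of round trips no longer grows with the number of rows.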

For my Java code I am using Spring JDBC. After I get the parent data, I then fetch each child data by looping through the parent data and calling readChild with the parent ID for each.
Instead of fetching the child data in a loop, you can modify your procedure to accept a list of parent IDs and return all the data in one call.
It would be helpful if you shared the Spring Boot for-loop code as well.
Update
Instead of fetching the children of a single parent at a time, you should update your code like this. You also have to update your procedure.
List<Long> parentIds = new ArrayList<>();
for (Parent parent : parents) {
parentIds.add(parent.getId());
}
You could also use Java streams, but that is secondary.
Now you have to modify your procedure and the repository method to accept multiple parent IDs:
List<Child> children = childRepository.readChildByParentId(parentIds);
public List<Child> readChildByParentId(long parentId) {
var simpleJdbcCall = new SimpleJdbcCall(jdbcTemplate)
.withCatalogName("CHILD_PACKAGE")
.withProcedureName("readChild")
.returningResultSet("v_child", BeanPropertyRowMapper.newInstance(Child.class));
SqlParameterSource sqlParameterSource = new MapSqlParameterSource()
.addValue("i_parent_id ", parentId);
Map<String, Object> out = simpleJdbcCall.execute(sqlParameterSource);
return (List<Child>) out.get("v_child");
}
Once you have all the children, you can assign them to their parents in Java code.
P.S.
Could you please check whether the children are already fetched together with the parents when the parents are loaded from the database?
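Assigning the children to their parents in Java, as described above, could be sketched roughly as follows. `ChildGrouping` and `ChildRow` are hypothetical stand-ins: the sketch assumes each fetched child row carries its parent ID, which the procedure would have to return.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original answer.
class ChildGrouping {

    // Minimal stand-in for the Child entity; the parentId field is an assumption.
    static class ChildRow {
        final long id;
        final long parentId;

        ChildRow(long id, long parentId) {
            this.id = id;
            this.parentId = parentId;
        }
    }

    // Groups all children fetched in one call by the ID of their parent.
    static Map<Long, List<ChildRow>> byParentId(List<ChildRow> children) {
        Map<Long, List<ChildRow>> grouped = new HashMap<>();
        for (ChildRow child : children) {
            grouped.computeIfAbsent(child.parentId, key -> new ArrayList<>()).add(child);
        }
        return grouped;
    }
}
```

Each parent can then look up its own list in the map by ID, so the single bulk fetch replaces the per-parent calls.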

Your performance problems are probably related to the number of operations performed against the database: you iterate over your collections in Java and interact with the database in every iteration. You need to minimize the number of operations performed.
One possible solution is to use the standard Oracle STRUCT and ARRAY types. Consider, for instance, the following example:
public static void insertData() throws SQLException {
DriverManagerDataSource dataSource = ...
JdbcTemplate jdbcTemplate = new JdbcTemplate(dataSource);
jdbcTemplate.setResultsMapCaseInsensitive(true);
SimpleJdbcCall insertDataCall = new SimpleJdbcCall(jdbcTemplate)
.withCatalogName("parent_child_pkg")
.withProcedureName("insert_data")
.withoutProcedureColumnMetaDataAccess()
.useInParameterNames("p_parents")
.declareParameters(
new SqlParameter("p_parents", OracleTypes.ARRAY, "PARENT_ARRAY")
);
OracleConnection connection = null;
try {
connection = insertDataCall
.getJdbcTemplate()
.getDataSource()
.getConnection()
.unwrap(OracleConnection.class)
;
List<Parent> parents = new ArrayList<>(100);
Parent parent = null;
List<Child> chilren = null;
Child child = null;
for (int i = 0; i < 100; i++) {
parent = new Parent();
parents.add(parent);
parent.setId((long) i);
parent.setName("parent-" + i);
chilren = new ArrayList<>(1000);
parent.setChildren(chilren);
for (int j = 0; j < 1000; j++) {
child = new Child();
chilren.add(child);
child.setId((long) j);
child.setName("child-" + j);
}
}
System.out.println("Inserting data...");
StopWatch stopWatch = new StopWatch();
stopWatch.start("insert-data");
StructDescriptor parentTypeStructDescriptor = StructDescriptor.createDescriptor("PARENT_TYPE", connection);
ArrayDescriptor parentArrayDescriptor = ArrayDescriptor.createDescriptor("PARENT_ARRAY", connection);
StructDescriptor childTypeStructDescriptor = StructDescriptor.createDescriptor("CHILD_TYPE", connection);
ArrayDescriptor childArrayDescriptor = ArrayDescriptor.createDescriptor("CHILD_ARRAY", connection);
Object[] parentArray = new Object[parents.size()];
int pi = 0;
for (Parent p : parents) {
List<Child> children = p.getChildren();
Object[] childArray = new Object[children.size()];
int ci = 0;
for (Child c : children) {
Object[] childrenObj = new Object[2];
childrenObj[0] = c.getId();
childrenObj[1] = c.getName();
STRUCT childStruct = new STRUCT(childTypeStructDescriptor, connection, childrenObj);
childArray[ci++] = childStruct;
}
ARRAY childrenARRAY = new ARRAY(childArrayDescriptor, connection, childArray);
Object[] parentObj = new Object[3];
parentObj[0] = p.getId();
parentObj[1] = p.getName();
parentObj[2] = childrenARRAY;
STRUCT parentStruct = new STRUCT(parentTypeStructDescriptor, connection, parentObj);
parentArray[pi++] = parentStruct;
}
ARRAY parentARRAY = new ARRAY(parentArrayDescriptor, connection, parentArray);
Map in = Collections.singletonMap("p_parents", parentARRAY);
insertDataCall.execute(in);
connection.commit();
stopWatch.stop();
System.out.println(stopWatch.prettyPrint());
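} catch (Throwable t) {
t.printStackTrace();
// The connection is still null if obtaining it failed, so guard the rollback
if (connection != null) {
connection.rollback();
}
} finally {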
if (connection != null) {
try {
connection.close();
} catch (Throwable nested) {
nested.printStackTrace();
}
}
}
}
Where:
CREATE OR REPLACE TYPE child_type AS OBJECT (
id NUMBER,
name VARCHAR2(512)
);
CREATE OR REPLACE TYPE child_array
AS TABLE OF child_type;
CREATE OR REPLACE TYPE parent_type AS OBJECT (
id NUMBER,
name VARCHAR2(512),
children child_array
);
CREATE OR REPLACE TYPE parent_array
AS TABLE OF parent_type;
CREATE SEQUENCE PARENT_SEQ INCREMENT BY 1 MINVALUE 1;
CREATE SEQUENCE CHILD_SEQ INCREMENT BY 1 MINVALUE 1;
CREATE TABLE parent_table (
id NUMBER,
name VARCHAR2(512)
);
CREATE TABLE child_table (
id NUMBER,
name VARCHAR2(512),
parent_id NUMBER
);
CREATE OR REPLACE PACKAGE parent_child_pkg AS
PROCEDURE insert_data(p_parents PARENT_ARRAY);
END;
CREATE OR REPLACE PACKAGE BODY parent_child_pkg AS
PROCEDURE insert_data(p_parents PARENT_ARRAY) IS
l_parent_id NUMBER;
l_child_id NUMBER;
BEGIN
FOR i IN 1..p_parents.COUNT LOOP
SELECT parent_seq.nextval INTO l_parent_id FROM dual;
INSERT INTO parent_table(id, name)
VALUES(l_parent_id, p_parents(i).name);
FOR j IN 1..p_parents(i).children.COUNT LOOP
SELECT child_seq.nextval INTO l_child_id FROM dual;
INSERT INTO child_table(id, name, parent_id)
VALUES(l_child_id, p_parents(i).children(j).name, l_parent_id);
END LOOP;
END LOOP;
END;
END;
And Parent and Child are simple POJOs:
import java.util.ArrayList;
import java.util.List;
public class Parent {
private Long id;
private String name;
private List<Child> children = new ArrayList<>();
public Long getId() {
return id;
}
public void setId(Long id) {
this.id = id;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public List<Child> getChildren() {
return children;
}
public void setChildren(List<Child> children) {
this.children = children;
}
}
public class Child {
private Long id;
private String name;
public Long getId() {
return id;
}
public void setId(Long id) {
this.id = id;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
}
Please forgive the poor code legibility and the incorrect error handling. I will improve the answer later and include some information about fetching the data as well.

The times you mention are horrible indeed. A big boost in performance will come from working set-based, which means reducing the row-by-row database calls.
Row by row is synonymous with slow, especially when network round trips are involved.
Make one call to get the parents.
Make one call to get the set of children and process them. The JDBC fetch size is a nice tunable here. Give it a chance to work for you.

You do not need to use dynamic SQL (OPEN v_parent FOR), and it is also not clear how the view v_parent is defined.
Check the execution plan of this query:
select h.* from PARENT h where h.field_one = ?;
Usually, returning a record set via SYS_REFCURSOR improves performance when you return more than, say, 10K records.

The SimpleJdbcCall object can be reused in your scenario, as only the parameters change. The SimpleJdbcCall object compiles the JDBC statement on the first invocation. To do that, it fetches some metadata and interacts with the database. Creating a separate object for every call therefore means fetching the same metadata that many times, which is not needed.
So I suggest initialising all 4 SimpleJdbcCall objects at the very beginning and then working with them.
var insertParentJdbcCall = new SimpleJdbcCall(jdbcTemplate)
.withCatalogName("PARENT_PACKAGE")
.withProcedureName("processParentData");
var readParentJdbcCall = new SimpleJdbcCall(jdbcTemplate)
.withCatalogName("PARENT_PACKAGE")
.withProcedureName("readParentData")
.returningResultSet("v_parent", BeanPropertyRowMapper.newInstance(Parent.class));
var insertChildJdbcCall = new SimpleJdbcCall(jdbcTemplate)
.withCatalogName("CHILD_PACKAGE")
.withProcedureName("processChild");
var readChildJdbcCall = new SimpleJdbcCall(jdbcTemplate)
.withCatalogName("CHILD_PACKAGE")
.withProcedureName("readChild")
.returningResultSet("v_child", BeanPropertyRowMapper.newInstance(Child.class));

Is there a way to make JPQL @Query annotation dynamic?

Currently, I am trying to write a JPQL query in a repository in my Spring project.
This is my current code for the repository:
@Query("select d.denda from DataTransaksi d WHERE d.tanggal= 1170130 AND d.nama = Suratno AND d.masaPajak=2016")
Collection<DataTransaksiModel> findAllDenda();
However, d.tanggal, d.nama, and d.masaPajak will not always be the same. I want to use the query in this method:
if(file.getContentType().equalsIgnoreCase("application/vnd.ms-excel")) {
InputStreamReader input = new InputStreamReader(file.getInputStream());
CSVParser csvParser = CSVFormat.EXCEL.withFirstRecordAsHeader().parse(input);
for (CSVRecord record : csvParser) {
String tanggal = record.get("Tanggal");
String nama = record.get("Nama WP");
String masaPajak = record.get("Masa Pajak");
String denda = record.get("Denda");
String jumlahSetoran = record.get("Jumlah Setoran");
String pokok = record.get("Pokok");
String luasTanah = record.get("L.Tanah");
String luasBangunan = record.get("L. Bangunan");
where d.tanggal, d.nama, and d.masaPajak are based on the tanggal, nama, and masaPajak values from the uploaded CSV.
Is there a way to make the values of d.tanggal, d.nama, and d.masaPajak dynamic, so that they follow the variables we set?

Just use query placeholders:
@Query("select d.denda from DataTransaksi d WHERE d.tanggal=:x AND d.nama = :y AND d.masaPajak=:z")
Collection<DataTransaksiModel> findAllDenda(Long x, String y, Long z);
Adjust the types of x, y and z to suit your needs. You can also rename the params to be more descriptive.

There are two ways you can do this. One way is to use indexed query params. In this case, each index value must match the position of the corresponding method parameter, as below.
@Query("select d.denda from DataTransaksi d WHERE d.tanggal= ?1 AND d.nama = ?2 AND d.masaPajak=?3")
Collection<DataTransaksiModel> findAllDenda( Long x, String y, Long z); // These should be in exact order.
The other way is to use named query parameters. This way, you are free to define the mapping between the method parameters and the query parameters, as below.
@Query("select d.denda from DataTransaksi d WHERE d.tanggal= :x AND d.nama = :y AND d.masaPajak=:z")
Collection<DataTransaksiModel> findAllDenda( @Param("x") Long x, @Param("y") String y, @Param("z") Long z);

Aggregate HQL result to list

I want to group some SQL data using criteria. Let's start with an entity that looks mostly like this:
class CityEntity {
private String name;
private Date lastVisited;
}
What I want to do is find all cities and return the result in a transformer:
class CityTransformer {
private String name;
private List<Date> lastVisited;
}
So, as you can see, the SQL result should be grouped by name, with the dates collected into a list.
I want to do it using criteria, so it will look almost like this:
Criteria criteria = session.createCriteria(CityEntity.class, "ce");
criteria.setProjection(Projections.projectionList().add(Projections.groupProperty("name"), "name"));
criteria.setResultTransformer(Transformers.aliasToBean(CityTransformer.class));
List<CityTransformer> cities = criteria.list();
The problem is that I don't know how to aggregate the dates (lastVisited) into a list. Any help?
For example, the input will look like this (name, lastVisited):
[Los Angeles, 10-11-2014],
[Los Angeles, 11-12-2011],
[Los Angeles, 10-01-2011],
[Berlin, 01-10-2011]
and the output should look like this:
[Los Angeles, list[10-11-2014, 11-12-2011, 10-01-2011]],
[Berlin, list[01-10-2011]]

You don't need SQL grouping for that. You can group it in Java:
Criteria criteria = session.createCriteria(CityEntity.class, "ce");
criteria.setProjection(
Projections.projectionList()
.add( Projections.property("ce.name"), "ceName" )
.add( Projections.property("ce.lastVisited"), "ceLastVisited" )
);
List<Object[]> citiesAndDates = (List<Object[]>) criteria.list();
Map<String, CityTransformer> cityTransformerMap = new HashMap<String, CityTransformer>();
for(Object[] citiesAndDate : citiesAndDates) {
String city = (String) citiesAndDate[0];
Date date = (Date) citiesAndDate[1];
CityTransformer cityTransformer = cityTransformerMap.get(city);
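if(cityTransformer == null) {
cityTransformer = new CityTransformer();
// Assumes CityTransformer has setters for both fields; without an initialized list, getLastVisited() would return null
cityTransformer.setName(city);
cityTransformer.setLastVisited(new ArrayList<Date>());
cityTransformerMap.put(city, cityTransformer);
}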
cityTransformer.getLastVisited().add(date);
}
return cityTransformerMap;
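The same grouping can also be written with Java streams. This is a sketch rather than part of the original answer: `CityGrouping` and `datesByCity` are hypothetical names, and it assumes each row is laid out as `{name, lastVisited}`, as in the projection above.

```java
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the original answer.
class CityGrouping {

    // Groups the visit dates by city name, keeping cities in first-seen order.
    static Map<String, List<Date>> datesByCity(List<Object[]> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                row -> (String) row[0],
                LinkedHashMap::new,
                Collectors.mapping(row -> (Date) row[1], Collectors.toList())));
    }
}
```

Each map entry then holds the name and the date list needed to fill one CityTransformer.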

Creating morphia query with regex

I have the following classes:
class Document{
Map<EnumChannelType, Channel> data;
//some more fields
}
class Channel{
String topic;
//some more fields
}
enum EnumChannelType{
BASIC_CHANNEL(1), ADVANCED_CHANNEL(2),......;
int value;
//constructor and some methods
}
Now I want to query on topic inside Channel. If channelType is known, we can easily query as below:
Query<Document> createQuery(EnumChannelType channelType, String topic){
Query<Document> query = dao.createQuery().disableValidation();
query.field("data." + channelType.name() + ".topic").equal(topic);
return query;
}
But what if I want to query by topic only (channelType can be anything)? How can we create a query for this?
One option is to use or, as follows:
Query<Document> createQueryForTopic(String topic) {
Query<Document> query = dao.createQuery().disableValidation();
// add all possible Channel Types
query.or(query.criteria("data." + EnumChannelType.BASIC_CHANNEL.name() + ".topic").equal(topic),
query.criteria("data." + EnumChannelType.ADVANCED_CHANNEL.name() + ".topic").equal(topic),
/*...add criteria for all possible channel types*/);
return query;
}
But this is not feasible if EnumChannelType changes over time or has a large number of members (like BASIC_CHANNEL(1), ADVANCED_CHANNEL(2), ...).
I'm looking for something like this:
Query<Document> createQuery(String topic){
Query<Document> query = dao.createQuery().disableValidation();
// use some regex instead of ????
query.field("data." + ???? + ".topic").equal(topic);
return query;
}

I am almost sure that Morphia and MongoDB don't support regular expressions on field names. In this case, the best option is to use the $or operator. You could iterate over the whole enum to avoid errors:
List<Criteria> criterias = new ArrayList<Criteria>();
for(EnumChannelType v : EnumChannelType.values()) {
criterias.add(query.criteria("data." + v.name() + ".topic").equal(topic));
}
query.or(criterias.toArray(new Criteria[criterias.size()]));
Remember that the $or operator executes the queries in parallel and then merges the results.
Info: http://docs.mongodb.org/manual/reference/operator/or/#op._S_or
