Performance issue when saving one-to-many JPA records

I have a one-to-many (parent-child) relationship and save it with the JPA save method.
The parent is a single record and has 5,000 to 10,000 child records, which are held in a list:
@OneToMany(cascade = CascadeType.ALL, fetch = FetchType.LAZY)
@JoinColumn(name = "testId")
private List<test> testList;
With this mapping the save takes a long time. How can I reduce the save time?

The first thing to try, since you need to insert a large number of elements, is batch inserts. Change your application.yml as follows:
spring:
  jpa:
    show-sql: true # Just to make sure that your inserts are batched
    properties:
      hibernate:
        jdbc:
          batch_size: 250 # 250 is just an example, feel free to change it as you want
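
To make the effect of batch_size concrete, here is a minimal plain-Java sketch of the grouping that batching performs. It does not use Hibernate at all: the class name BatchPlan and the method chunk are hypothetical, and the sketch only assumes that each group of up to batch_size statements costs one round trip.

```java
import java.util.ArrayList;
import java.util.List;

class BatchPlan {
    // Splits items into consecutive chunks of at most batchSize elements,
    // mirroring how a JDBC batch groups statements into round trips.
    static <T> List<List<T>> chunk(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // 5000 child records, as in the question, with batch_size 250.
        List<Integer> children = new ArrayList<>();
        for (int i = 0; i < 5000; i++) {
            children.add(i);
        }
        System.out.println(chunk(children, 250).size()
                + " round trips instead of " + children.size());
    }
}
```

With 5,000 children and a batch size of 250, this model needs 20 round trips rather than 5,000.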

Related

Spring JPA - Hibernate: Batch insert execute too much select nextval (‘sequence’)

I am trying to improve the performance of my web application. I use Spring JPA 2.3.0, Hibernate 5.4.15.Final and Postgres 12, and manage transactions with @Transactional. The web app is deployed on AWS Elastic Beanstalk and runs multiple instances at the same time, but the database instance is not scalable. I use the bigserial type for table IDs.
For instance, I have a STUDENTS table whose ID is bigserial, plus some other columns.
I ran into a problem when using
@GeneratedValue(strategy = GenerationType.IDENTITY)
:
Hibernate could not batch the inserts when saving a list of entities.
So I tried
@GeneratedValue(strategy = GenerationType.AUTO, generator = "students_id_seq")
@SequenceGenerator(name = "students_id_seq", sequenceName = "students_id_seq")
hibernate.id.new_generator_mappings=false
hibernate.jdbc.batch_size=10
hibernate.order_inserts=true
hibernate.order_updates=true
hibernate.batch_versioned_data=true
Now Hibernate can batch the inserts, but it executes select nextval ('students_id_seq') once per entity. If an entity list has 30 items, Hibernate executes select nextval 30 times and the batch insert query 3 times.
Some statistics:
If using GenerationType.IDENTITY
  save(entity):
    insert into ... : executed once
  saveAll(n entities):
    insert into ... : executed n times
If using GenerationType.SEQUENCE / GenerationType.AUTO
  save(entity):
    select nextval ('students_id_seq'): executed once
    insert into ... : executed once
  saveAll(n entities):
    select nextval ('students_id_seq'): executed n times
    insert into ... : executed n/batch_size times
In conclusion, when using GenerationType.AUTO or GenerationType.SEQUENCE with allocationSize = 1:
  when inserting one entity, the number of queries doubles (from a single insert to two queries: select nextval and insert)
  when batch inserting with batch_size = 10, the application executes more than 10% more queries (n select nextval queries plus n/10 inserts, compared with n inserts)
My question is: is there any way to batch insert without executing so many select nextval queries? Something like GenerationType.IDENTITY, which does not execute select nextval: just batch insert, with the IDs handled by the sequence in the database.
When I test with GenerationType.SEQUENCE and allocationSize=1 (GenerationType.AUTO), the application executes too many select nextval queries; I think it is even worse than the IDENTITY strategy.
For some reasons I don't want to use allocationSize: it may lead to duplicate primary key errors when running insert queries manually, when migrating data, or in some other cases.
After some research, I found a way to get a list of values from a sequence:
select nextval ('students_id_seq') from generate_series(1,10);
When batch inserting, we can replace 10 with entityList.size(), or with the number of entities in entityList that have no ID yet, so that we fetch just enough values and don't create large gaps between IDs. But I'm not sure whether Hibernate supports this; if it does, please share the documentation for reference.
Thank you
https://discourse.hibernate.org/t/batch-insert-execute-too-much-select-nextval-sequence/4232

What you are looking for is the HiLo algorithm for id generation.
For every id generated from the sequence, it generates a number of ids on the client without accessing the database.
You configure it on your entity like this:
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "hilo_sequence_generator")
@GenericGenerator(
    name = "hilo_sequence_generator",
    strategy = "org.hibernate.id.enhanced.SequenceStyleGenerator",
    parameters = {
        @Parameter(name = "sequence_name", value = "hilo_seqeunce"),
        @Parameter(name = "initial_value", value = "1"),
        @Parameter(name = "increment_size", value = "3"),
        @Parameter(name = "optimizer", value = "hilo")
    })
@Id
private Long id;
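
The idea behind hi/lo can be sketched in plain Java without Hibernate. This is a simplified model under stated assumptions, not Hibernate's actual optimizer code: the class HiLoSketch is hypothetical, an in-memory counter stands in for the database sequence, and each simulated sequence call reserves a block of incrementSize ids.

```java
import java.util.ArrayList;
import java.util.List;

class HiLoSketch {
    private final int incrementSize; // ids handed out per sequence call
    private long sequenceValue = 0;  // stands in for the database sequence (increment 1)
    private int sequenceCalls = 0;   // simulated round trips to the database
    private long next = 0;           // next id to hand out
    private long upperLimit = 0;     // exclusive end of the current block

    HiLoSketch(int incrementSize) {
        this.incrementSize = incrementSize;
    }

    long nextId() {
        if (next >= upperLimit) {
            // Block exhausted: one simulated "select nextval" reserves the next block.
            sequenceCalls++;
            long hi = ++sequenceValue;
            upperLimit = hi * incrementSize + 1;
            next = upperLimit - incrementSize;
        }
        return next++;
    }

    int sequenceCalls() {
        return sequenceCalls;
    }

    public static void main(String[] args) {
        HiLoSketch generator = new HiLoSketch(3);
        List<Long> ids = new ArrayList<>();
        for (int i = 0; i < 30; i++) {
            ids.add(generator.nextId());
        }
        System.out.println(ids.get(0) + ".." + ids.get(29)
                + " using " + generator.sequenceCalls() + " sequence calls");
    }
}
```

For the 30-entity list from the question and an increment size of 3, this model makes 10 sequence calls instead of 30; a larger increment size reduces the number further.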

I have some experience with this. I was inserting more than 128,000 records,
and my goal was to reduce the time this took. I will try to summarize the case below:
The code did not call persist() or save() explicitly; the records were saved when the @Transactional method exited.
I was using Hibernate batch inserts. These are the settings in the config map:
spring.jpa.properties.hibernate.jdbc.batch_size: "40"
spring.jpa.properties.hibernate.order_inserts: "true"
spring.jpa.properties.hibernate.order_updates: "true"
spring.main.allow-bean-definition-overriding: "true"
I modified the allocation size in my entity's Id configuration as below:
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "mappingentry_sql_generator")
@SequenceGenerator(name = "mappingentry_sql_generator", sequenceName = "mappingentry_id_seq", allocationSize = 40)
private Long id;
Note: I set "allocationSize" in the sequence generator equal to the "batch_size" value in the settings.
I also altered my sequence "mappingentry_id_seq" to increment by 40.
After these changes the time dropped from 55 seconds to 20 seconds, which was a great improvement.
The only thing I did not understand is that when I checked the values of the id column generated by the sequence, I found no gaps: every ID exceeded the previous one by 1, not 40. That is what I am currently trying to understand.
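
One plausible explanation for the gap-free ids, offered as an assumption rather than a confirmed diagnosis: when allocationSize is greater than 1, each sequence value acts as a reservation for a whole block of ids, and the values inside the block are handed out from memory. The sketch below is a simplified plain-Java model of that behaviour, not Hibernate's actual optimizer; the class PooledSketch and its fields are hypothetical.

```java
class PooledSketch {
    private final int allocationSize; // matches "increment by" on the simulated sequence
    private long dbSequence;          // simulated database sequence
    private long next;                // next id to hand out
    private long blockEnd;            // exclusive end of the reserved block
    private int calls = 0;            // simulated "select nextval" round trips

    PooledSketch(long start, int allocationSize) {
        this.dbSequence = start;
        this.allocationSize = allocationSize;
        this.next = start;
        this.blockEnd = start; // empty block, so the first nextId() reserves one
    }

    // The simulated sequence jumps by allocationSize on every call.
    private long nextval() {
        calls++;
        long value = dbSequence;
        dbSequence += allocationSize;
        return value;
    }

    long nextId() {
        if (next >= blockEnd) {
            next = nextval();
            blockEnd = next + allocationSize;
        }
        return next++;
    }

    int calls() {
        return calls;
    }

    public static void main(String[] args) {
        PooledSketch generator = new PooledSketch(1, 40);
        long last = 0;
        for (int i = 0; i < 100; i++) {
            last = generator.nextId();
        }
        System.out.println("last id " + last + " after " + generator.calls() + " sequence calls");
    }
}
```

In this model the sequence advances by 40 per call, yet the ids handed out are 1, 2, 3, and so on, because the client fills in the values between two sequence calls. Gaps would only appear for blocks that are reserved but not fully used, for example after a restart.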

Is Session.Save sending a request to the database?

I have to improve the performance of some very slow code, and I am pretty new to Hibernate. I have studied the code carefully and concluded that the issue is the large set of entities it has to load and update/insert. To translate the algorithm into a more digestible example, let's say we have an algorithm like this:
for each competitionToSave in competitionsToSave
    competition <- load a Competition by competitionToSave from database
    winner <- load Person by competitionToSave.personID
    do some preprocessing
    if (newCompetition) then
        insert competition
    else
        update competition
    end if
end for
This algorithm is of course problematic when there are lots of competitions in competitionsToSave. So my plan is to select all the competitions and winners involved with at most two database requests and preprocess the data, which will speed up the reads, but more importantly to make sure I save via insert/update batches of 100 competitions instead of saving them separately. Since I am pretty new to Hibernate, I consulted the documentation and found the following example:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for ( int i=0; i<100000; i++ ) {
    Customer customer = new Customer(.....);
    session.save(customer);
    if ( i % 20 == 0 ) { //20, same as the JDBC batch size
        //flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}
tx.commit();
session.close();
However, I am not sure I understand it correctly. About the .save() method I read:
Persist the given transient instance, first assigning a generated
identifier. (Or using the current value of the identifier property if
the assigned generator is used.) This operation cascades to associated
instances if the association is mapped with cascade="save-update".
But it is unclear to me whether a request is sent to the database on every save. Am I right to assume that, in the example taken from the documentation, session.save(customer) records the change in the Session without sending a request to the database, that on every 20th item session.flush() sends the requests to the database, and that session.clear() empties the Session cache?

You are correct in your assumptions, though the inserts will be triggered one by one:
insert into Customer(id , name) values (1, 'na1');
insert into Customer(id , name) values (2, 'na2');
insert into Customer(id , name) values (3, 'na3');
You can take advantage of Hibernate's JDBC batching to improve performance even more.
There is a Hibernate property that you can define as one of the properties of Hibernate's SessionFactory:
<property name="jdbc.batch_size">20</property>
With this batch setting the inserts are grouped into JDBC batches, so each flush sends them to the database together instead of one at a time. If the JDBC driver also rewrites batches into multi-row statements, the output after each flush looks like this:
insert into Customer(id , name) values (1, 'na1') , (2, 'na2') ,(3, 'na3')..
One round trip instead of twenty.
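
A side note on the loop quoted in the question: because it tests i % 20 == 0 with i starting at 0, the first flush happens after a single save. The plain-Java sketch below models only the counting, with no Hibernate involved; the class FlushSchedule and its parameters are hypothetical, and the comments mark which lines stand in for session calls.

```java
import java.util.ArrayList;
import java.util.List;

class FlushSchedule {
    // Returns how many pending entities each flush would write.
    // countFromOne = false models "i % batchSize == 0",
    // countFromOne = true models "(i + 1) % batchSize == 0".
    static List<Integer> flushSizes(int total, int batchSize, boolean countFromOne) {
        List<Integer> sizes = new ArrayList<>();
        int pending = 0;
        for (int i = 0; i < total; i++) {
            pending++; // stands in for session.save(entity)
            int position = countFromOne ? i + 1 : i;
            if (position % batchSize == 0) {
                // stands in for session.flush(); session.clear();
                sizes.add(pending);
                pending = 0;
            }
        }
        if (pending > 0) {
            sizes.add(pending); // whatever is left is written at commit
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(flushSizes(100, 20, false)); // [1, 20, 20, 20, 20, 19]
        System.out.println(flushSizes(100, 20, true));  // [20, 20, 20, 20, 20]
    }
}
```

Testing (i + 1) % 20 == 0 keeps every flush aligned with the JDBC batch size, which is usually what is intended.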

Spring data JPA with rest: how to filter out a subset of records in a one-to-many mapping

I am a bad java programmer...
Maybe I am designing my database in the wrong way. Any clue is welcome.
Right now it looks like this:
table1, named node, has:
  id
  info
  update-date
table2, named node_records, has:
  id
  node_id (reverse reference to node)
  value
  create-date
I want to select all records in node_records where node_records.node_id = node.id and node_records.create-date = node.update-date.
In Node.java:
@OneToMany(targetEntity = NodeRecord.class, mappedBy = "node")
List<NodeRecord> nodeRecord;
In NodeRecord.java:
@ManyToOne(fetch=FetchType.LAZY)
@JoinColumn(name="id")
Node node;
For now, I can select all records where record.node_id = node.id, but I have searched for hours trying to find out how to select only the records that also satisfy the condition record.create-date = node.update-date.

Use the @Query annotation: http://docs.spring.io/spring-data/jpa/docs/2.0.0.M1/reference/html/#jpa.query-methods.at-query
Or
Use a Spring Data custom implementation: http://docs.spring.io/spring-data/jpa/docs/current/reference/html/#repositories.custom-implementations

Dialects can help here. There are database-specific date functions that particular dialects support. For example:
If your database is MySQL, you can use the date function in your query:
Query query = session.createQuery("from NodeRecord as nr join nr.node as n where date(nr.createdDate) = date(n.updateDate)");
If you are using Oracle, you can use the TRUNC function in your query instead.

Hibernate: multiple selects @OneToMany

I have three entities:
Users(id, username, password...); BoughtItems(id, userid, itemcat, itemid); ItemCustomizations(id(id of item), slot(id of color), color)
When I fetch like this in BoughtItems.java:
@OneToMany(fetch = FetchType.EAGER, mappedBy = "id")
@OrderBy("slot")
private List<ItemCustomization> customColors = new ArrayList<>();
Hibernate queries the database once for each item:
Hibernate: select customcolo0_.itemid as itemid1_2_0_, customcolo0_.itemid
as itemid1_3_0_, customcolo0_.slot as slot2_3_0_, customcolo0_.itemid as
itemid1_3_1_, customcolo0_.slot as slot2_3_1_, customcolo0_.color as
color3_3_1_ from shop_customizations customcolo0_ where
customcolo0_.itemid=? order by customcolo0_.slot
My question:
How do I properly optimize my mappings to reduce the performance hit on the MySQL server?
Can I somehow reduce the one-query-per-item pattern to a single query?

I believe you are looking for Hibernate's batch size feature. See example here.
The batch size defines how many collections are loaded at once. For example, if you have 10 BoughtItems in the session and a batch size of 5, only two SQL queries will be fired to fetch the customColors collections for all of those BoughtItems.
See Possible fetching strategies in hibernate

Hibernate JPA: one table has one-to-many relationships with more than 10 tables, which causes performance problems

We have one basic table, T_RESOURCE, which contains the basic resource information, and some other tables that are related to T_RESOURCE. One kind is like T_RESOURCE_TAGS, which contains the tags of the resources in T_RESOURCE; a resource can have more than one tag. The other kind of relationship goes through a middle table: for example, T_FILES is connected to T_RESOURCE through T_RESOURCE_FILES. T_RESOURCE_TAGS is in fact also a middle table, but there is no T_TAGS table corresponding to T_FILES. In total we have 11 relationships of these two kinds.
We have an entity class for T_RESOURCE in which all the related tables are mapped, something like this:
@Field(index=Index.YES, analyze=Analyze.NO, store=Store.NO)
@IndexedEmbedded(indexNullAs = Field.DEFAULT_NULL_TOKEN)
@LazyCollection(LazyCollectionOption.FALSE)
@OneToMany
@JoinTable(name="T_RESOURCE_FILES",joinColumns={ @JoinColumn(name="RESOURCE_ID", referencedColumnName="ID") },inverseJoinColumns={ @JoinColumn(name="FILE_ID", referencedColumnName="ID", unique=true) })
private List<TFILE> files;
@IndexedEmbedded(indexNullAs = Field.DEFAULT_NULL_TOKEN)
@LazyCollection(LazyCollectionOption.FALSE)
@ElementCollection
@CollectionTable(name = "T_RESOURCE_TAG", joinColumns = @JoinColumn(name = "RESOURCE_ID"))
@Column(name = "tag")
private List<String> tags;
We then fetch the resource information from T_RESOURCE for all the resource ids (RESOURCE_ID) that match the user's search criteria, using a native query:
em.createNativeQuery(sql.toString(),TRESOUCE.class).setFirstResult(firstResult).setMaxResults(pageSize).getResultList();
where the sql is something like
"select * from T_RESOURCE where id in ('1','2','3','4') order by ..."
This works, but it takes a long time to fetch the tags from T_RESOURCE_TAGS, the files from T_RESOURCE_FILES, and the other 9 relationships, because they are loaded like this:
select * from T_RESOURCE_TAGS where asset_id = '1';
select * from T_RESOURCE_TAGS where asset_id = '2';
select * from T_RESOURCE_TAGS where asset_id = '3';
select * from T_RESOURCE_TAGS where asset_id = '4';
So the more resources match the search criteria, the longer it takes.
I also tried join fetch, but there seems to be a limit on the number of join fetches, so it did not work.
I would also like to separate these relationships from the TRESOUCE class and load them separately, the same way I load the basic TRESOURCE information. For reference, the middle tables have no unique id per record, only the resouce_id together with the related id (or directly the name) for files, tags, etc.
Could you please share your ideas? Thanks very much.

Some databases, such as Postgres, have a very useful array type. You can store the tags in a single column without an additional table. Hibernate supports mapping custom data types such as arrays to Java objects. This can significantly improve performance.
Oracle arrays are a different case, though: you have to specify the array length when creating the table.

You can improve performance by using the org.hibernate.annotations.BatchSize annotation on your collection.
@Column(name = "tag")
@BatchSize(size=20)
private List<String> tags;
will cause it to execute a statement like the following when loading the collection:
select * from T_RESOURCE_TAGS where asset_id IN ('1', '2', '3', '4');
This saves several round trips to the database and speeds up your application.
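
The grouping that batch fetching performs can be sketched in plain Java. This is a simplified illustration, not what Hibernate generates internally (Hibernate uses bind parameters and may split batches differently); the class BatchFetchSketch and the method selects are hypothetical, and the table and column names are taken from the statement above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class BatchFetchSketch {
    // Builds one IN-list select per group of at most batchSize owner ids,
    // instead of one select per owner id.
    static List<String> selects(List<String> ownerIds, int batchSize) {
        List<String> statements = new ArrayList<>();
        for (int start = 0; start < ownerIds.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ownerIds.size());
            String inList = ownerIds.subList(start, end).stream()
                    .map(id -> "'" + id + "'")
                    .collect(Collectors.joining(", "));
            statements.add("select * from T_RESOURCE_TAGS where asset_id IN (" + inList + ")");
        }
        return statements;
    }

    public static void main(String[] args) {
        // Four resources with a batch size of 20 need a single statement.
        for (String sql : selects(Arrays.asList("1", "2", "3", "4"), 20)) {
            System.out.println(sql);
        }
    }
}
```

With 10 owners and a batch size of 5, the same model produces two statements, which matches the two-query example given for BoughtItems earlier.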
