Batch update does not work when using JPQL - Java

I'm trying to add batch updates to my Spring Boot project, but the SQL logs and Hibernate statistics show that multiple individual queries are still executed.
Hibernate stats
290850400 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
3347700 nanoseconds spent preparing 19 JDBC statements;
5919028800 nanoseconds spent executing 19 JDBC statements;
0 nanoseconds spent executing 0 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
2635900 nanoseconds spent executing 1 flushes (flushing a total of 1 entities and 0 collections);
19447300 nanoseconds spent executing 19 partial-flushes (flushing a total of 18 entities and 18 collections)
Versions
Spring Boot v2.7.1
Spring v5.3.21
Java 17.0.3.1
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
application.yml
spring:
  datasource:
    url: jdbc:oracle:thin:@localhost:1521/db
    username: user
    password: password
    driver-class-name: oracle.jdbc.OracleDriver
  jpa:
    database-platform: org.hibernate.dialect.Oracle12cDialect
    hibernate:
      naming:
        physical-strategy: org.hibernate.boot.model.naming.PhysicalNamingStrategyStandardImpl
      ddl-auto: update
    show-sql: true
    properties:
      hibernate:
        dialect: org.hibernate.dialect.Oracle12cDialect
        format_sql: false
        jdbc:
          fetch_size: 100
          batch_size: 5
          batch_versioned_data: true
        order_updates: true
        generate_statistics: true
Snapshot entity
@Entity
@Table(name = "SNAPSHOT", schema = "SYSTEM", catalog = "")
public class Snapshot {

    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    @Id
    @Column(name = "ID")
    private long id;

    @Basic
    @Column(name = "CREATED_ON")
    private String createdOn;

    ...
}
SnapshotRepository
@Repository
public interface SnapshotRepository extends JpaRepository<Snapshot, Long> {

    @Modifying
    @Query("UPDATE Snapshot s SET s.fieldValue = ?1, s.createdOn = ?2 where s.id = ?3 and s.fieldName = ?4")
    int updateSnapshot(String fieldValue, String createdOn, String id, String fieldName);
}
And this repository method is called from the service class.
for (Map.Entry<String, String> entry : res.getValues().entrySet()) {
    snapshotRepository.updateSnapshot(entry.getValue(), createdOn, id, entry.getKey());
}
pom.xml
<dependency>
<groupId>com.oracle.database.jdbc</groupId>
<artifactId>ojdbc11</artifactId>
<version>21.7.0.0</version>
</dependency>
In application.yml, I think I have configured all the properties required to activate batch updates, but still no luck.
Please let me know what I'm doing incorrectly.

Batching only works when Hibernate itself flushes entity state changes. If you execute update queries manually (JPQL/HQL), Hibernate cannot batch them.
With your current approach, Hibernate will reuse the same prepared statement on the database side, but no JDBC batching takes place.
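To get real JDBC batching, one option is to load the managed entities and modify them in a single transaction, letting Hibernate's dirty checking emit the UPDATEs as batches at flush time. A sketch (not the asker's code: the `findByIdAndFieldName` query, the setters, and the service class are all assumed here; batch_size and order_updates must be configured as in the question):

```java
import java.util.Map;

import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class SnapshotBatchService {

    @PersistenceContext
    private EntityManager entityManager;

    @Autowired
    private SnapshotRepository snapshotRepository;

    @Transactional
    public void updateSnapshots(long id, String createdOn, Map<String, String> values) {
        for (Map.Entry<String, String> entry : values.entrySet()) {
            // Load the managed entity instead of firing a JPQL UPDATE per row.
            // findByIdAndFieldName is a hypothetical derived query; setters are assumed.
            Snapshot snapshot = snapshotRepository.findByIdAndFieldName(id, entry.getKey());
            snapshot.setFieldValue(entry.getValue());
            snapshot.setCreatedOn(createdOn);
            // No explicit save() needed: the entity is managed, and the flush
            // groups the pending UPDATEs into JDBC batches of batch_size.
        }
        entityManager.flush(); // optional: force the batch now so it shows up in the stats
    }
}
```

This trades one SELECT per row for batched UPDATEs; whether that is a net win depends on row count and network latency.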

Related

Hibernate Inconsistent and Conflicting Results with Merge

VERSION
Red Hat Build of Quarkus
<properties>
    <compiler-plugin.version>3.8.1</compiler-plugin.version>
    <maven.compiler.parameters>true</maven.compiler.parameters>
    <maven.compiler.source>11</maven.compiler.source>
    <maven.compiler.target>11</maven.compiler.target>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
    <quarkus-plugin.version>1.7.5.Final</quarkus-plugin.version>
    <quarkus.platform.artifact-id>quarkus-universe-bom</quarkus.platform.artifact-id>
    <quarkus.platform.group-id>io.quarkus</quarkus.platform.group-id>
    <quarkus.platform.version>1.7.5.Final</quarkus.platform.version>
    <surefire-plugin.version>3.0.0-M5</surefire-plugin.version>
</properties>
I am totally confused by this inconsistency: even with logging set to the maximum, all I can see is that my code does not behave the same in this project as in my other projects.
Versions and settings are configured identically everywhere.
I have three Quarkus JAX-RS services using hibernate-orm to push to PostgreSQL.
Two of the services behave as expected, each with approximately 25 identically configured PUT endpoints backed by a merge-based manager (code below). In the two working services, the logs show an UPDATE statement wherever one is required. The third (new) service, however, simply ignores the change: it responds with 200 and even returns the updated data, but the database is never updated.
simple entity
@Entity
@Table(name = "applicants")
public class Applicant implements Serializable {

    private static final long serialVersionUID = 6796452320678418766L;

    @Column(name = "applicant_id")
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "applicant_seq")
    public long applicant_id;

    @Column(name = "email", nullable = false, unique = true)
    public String email;
}
@PUT resource
@PUT
@Path("{id}")
public Response update(@PathParam("id") Long id, @Valid Applicant applicant) {
    try {
        Applicant applicantToUpdate = this.manager.findById(id);
        if (applicantToUpdate != null) {
            applicantToUpdate = this.manager.save(applicant);
            if (applicantToUpdate != null) {
                return Response.ok(applicantToUpdate).build();
            } else {
                ...
            }
        }
        return Response.status(Response.Status.NOT_FOUND).build();
    } catch (Exception ex) {
        ...
    }
}
Backing manager for persistence
@Transactional
public Applicant save(Applicant applicant) {
    return this.em.merge(applicant);
}
Both the ApplicantResource and the ApplicantManager classes are @RequestScoped.
The logs for the failing service
2020-11-16 13:51:26,122 DEBUG [org.hib.res.jdb.int.LogicalConnectionManagedImpl] (executor-thread-198) `hibernate.connection.provider_disables_autocommit` was enabled. This setting should only be enabled when you are certain that the Connections given to Hibernate by the ConnectionProvider have auto-commit disabled. Enabling this setting when the Connections do not have auto-commit disabled will lead to Hibernate executing SQL operations outside of any JDBC/SQL transaction.
2020-11-16 13:51:26,123 DEBUG [org.hib.SQL] (executor-thread-198)
select
applicant0_.applicant_id as applican1_0_0_,
applicant0_.email as email2_0_0_
from
applicants applicant0_
where
applicant0_.applicant_id=?
Hibernate:
select
applicant0_.applicant_id as applican1_0_0_,
applicant0_.email as email2_0_0_
from
applicants applicant0_
where
applicant0_.applicant_id=?
2020-11-16 13:51:26,156 DEBUG [org.hib.loa.pla.exe.pro.int.EntityReferenceInitializerImpl] (executor-thread-198) On call to EntityIdentifierReaderImpl#resolve, EntityKey was already known; should only happen on root returns with an optional identifier specified
2020-11-16 13:51:26,156 DEBUG [org.hib.eng.int.TwoPhaseLoad] (executor-thread-198) Resolving attributes for [org.protechskillsinstitute.applicant.entity.Applicant#1]
2020-11-16 13:51:26,157 DEBUG [org.hib.eng.int.TwoPhaseLoad] (executor-thread-198) Processing attribute `email` : value = original@email.com
2020-11-16 13:51:26,157 DEBUG [org.hib.eng.int.TwoPhaseLoad] (executor-thread-198) Attribute (`email`) - enhanced for lazy-loading? - false
2020-11-16 13:51:26,157 DEBUG [org.hib.eng.int.TwoPhaseLoad] (executor-thread-198) Done materializing entity [org.protechskillsinstitute.applicant.entity.Applicant#1]
2020-11-16 13:51:26,159 DEBUG [org.hib.res.jdb.int.LogicalConnectionManagedImpl] (executor-thread-198) Initiating JDBC connection release from afterStatement
2020-11-16 13:51:26,159 DEBUG [org.hib.loa.ent.pla.AbstractLoadPlanBasedEntityLoader] (executor-thread-198) Done entity load : org.protechskillsinstitute.applicant.entity.Applicant#1
2020-11-16 13:51:26,159 DEBUG [org.hib.res.jdb.int.LogicalConnectionManagedImpl] (executor-thread-198) Initiating JDBC connection release from afterTransaction
2020-11-16 13:51:26,160 DEBUG [org.hib.res.jdb.int.LogicalConnectionManagedImpl] (executor-thread-198) `hibernate.connection.provider_disables_autocommit` was enabled. This setting should only be enabled when you are certain that the Connections given to Hibernate by the ConnectionProvider have auto-commit disabled. Enabling this setting when the Connections do not have auto-commit disabled will lead to Hibernate executing SQL operations outside of any JDBC/SQL transaction.
2020-11-16 13:51:26,160 DEBUG [org.hib.res.tra.bac.jta.int.JtaTransactionCoordinatorImpl] (executor-thread-198) Hibernate RegisteredSynchronization successfully registered with JTA platform
2020-11-16 13:51:26,161 DEBUG [org.hib.res.tra.bac.jta.int.JtaTransactionCoordinatorImpl] (executor-thread-198) JTA transaction was already joined (RegisteredSynchronization already registered)
2020-11-16 13:51:26,161 DEBUG [org.hib.eve.int.EntityCopyObserverFactoryInitiator] (executor-thread-198) Configured EntityCopyObserver strategy: disallow
2020-11-16 13:51:26,161 DEBUG [org.hib.loa.Loader] (executor-thread-198) Loading entity: [org.protechskillsinstitute.applicant.entity.Applicant#1]
2020-11-16 13:51:26,212 DEBUG [org.hib.SQL] (executor-thread-198)
select
applicant0_.applicant_id as applican1_0_0_,
applicant0_.email as email2_0_0_
from
applicants applicant0_
where
applicant0_.applicant_id=?
Hibernate:
select
applicant0_.applicant_id as applican1_0_0_,
applicant0_.email as email2_0_0_
from
applicants applicant0_
where
applicant0_.applicant_id=?
2020-11-16 13:51:26,216 DEBUG [org.hib.loa.Loader] (executor-thread-198) Result set row: 0
2020-11-16 13:51:26,216 DEBUG [org.hib.loa.Loader] (executor-thread-198) Result row: EntityKey[org.protechskillsinstitute.applicant.entity.Applicant#1]
2020-11-16 13:51:26,217 DEBUG [org.hib.eng.int.TwoPhaseLoad] (executor-thread-198) Resolving attributes for [org.protechskillsinstitute.applicant.entity.Applicant#1]
2020-11-16 13:51:26,217 DEBUG [org.hib.eng.int.TwoPhaseLoad] (executor-thread-198) Processing attribute `email` : value = original@email.com
2020-11-16 13:51:26,217 DEBUG [org.hib.eng.int.TwoPhaseLoad] (executor-thread-198) Attribute (`email`) - enhanced for lazy-loading? - false
2020-11-16 13:51:26,217 DEBUG [org.hib.eng.int.TwoPhaseLoad] (executor-thread-198) Done materializing entity [org.protechskillsinstitute.applicant.entity.Applicant#1]
2020-11-16 13:51:26,218 DEBUG [org.hib.res.jdb.int.LogicalConnectionManagedImpl] (executor-thread-198) Initiating JDBC connection release from afterStatement
2020-11-16 13:51:26,218 DEBUG [org.hib.loa.Loader] (executor-thread-198) Done entity load
2020-11-16 13:51:26,219 DEBUG [org.hib.eve.int.AbstractFlushingEventListener] (executor-thread-198) Processing flush-time cascades
2020-11-16 13:51:26,219 DEBUG [org.hib.eve.int.AbstractFlushingEventListener] (executor-thread-198) Dirty checking collections
2020-11-16 13:51:26,219 DEBUG [org.hib.eve.int.AbstractFlushingEventListener] (executor-thread-198) Flushed: 0 insertions, 0 updates, 0 deletions to 1 objects
2020-11-16 13:51:26,220 DEBUG [org.hib.eve.int.AbstractFlushingEventListener] (executor-thread-198) Flushed: 0 (re)creations, 0 updates, 0 removals to 0 collections
2020-11-16 13:51:26,220 DEBUG [org.hib.int.uti.EntityPrinter] (executor-thread-198) Listing entities:
2020-11-16 13:51:26,220 DEBUG [org.hib.int.uti.EntityPrinter] (executor-thread-198) org.protechskillsinstitute.applicant.entity.Applicant{applicant_id=1, email=stephen.w.boyd@gmail.com}
2020-11-16 13:51:26,220 DEBUG [org.hib.res.jdb.int.LogicalConnectionManagedImpl] (executor-thread-198) Initiating JDBC connection release from afterStatement
2020-11-16 13:51:26,221 DEBUG [org.hib.eng.tra.int.TransactionImpl] (executor-thread-198) On TransactionImpl creation, JpaCompliance#isJpaTransactionComplianceEnabled == false
2020-11-16 13:51:26,221 DEBUG [org.hib.eve.int.AbstractFlushingEventListener] (executor-thread-198) Processing flush-time cascades
2020-11-16 13:51:26,221 DEBUG [org.hib.eve.int.AbstractFlushingEventListener] (executor-thread-198) Dirty checking collections
2020-11-16 13:51:26,221 DEBUG [org.hib.eve.int.AbstractFlushingEventListener] (executor-thread-198) Flushed: 0 insertions, 0 updates, 0 deletions to 1 objects
2020-11-16 13:51:26,222 DEBUG [org.hib.eve.int.AbstractFlushingEventListener] (executor-thread-198) Flushed: 0 (re)creations, 0 updates, 0 removals to 0 collections
2020-11-16 13:51:26,222 DEBUG [org.hib.int.uti.EntityPrinter] (executor-thread-198) Listing entities:
2020-11-16 13:51:26,222 DEBUG [org.hib.int.uti.EntityPrinter] (executor-thread-198) org.protechskillsinstitute.applicant.entity.Applicant{applicant_id=1, email=edited@email.com}
2020-11-16 13:51:26,222 DEBUG [org.hib.res.jdb.int.LogicalConnectionManagedImpl] (executor-thread-198) Initiating JDBC connection release from afterStatement
2020-11-16 13:51:26,223 INFO [org.hib.eng.int.StatisticalLoggingSessionEventListener] (executor-thread-198) Session Metrics {
112700 nanoseconds spent acquiring 1 JDBC connections;
9700 nanoseconds spent releasing 1 JDBC connections;
147400 nanoseconds spent preparing 1 JDBC statements;
340500 nanoseconds spent executing 1 JDBC statements;
0 nanoseconds spent executing 0 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
3546700 nanoseconds spent executing 2 flushes (flushing a total of 2 entities and 0 collections);
0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}
2020-11-16 13:51:26,227 DEBUG [org.hib.res.jdb.int.LogicalConnectionManagedImpl] (executor-thread-198) Initiating JDBC connection release from afterTransaction
2020-11-16 13:51:26,230 INFO [org.hib.eng.int.StatisticalLoggingSessionEventListener] (executor-thread-198) Session Metrics {
34900 nanoseconds spent acquiring 1 JDBC connections;
18700 nanoseconds spent releasing 1 JDBC connections;
77500 nanoseconds spent preparing 1 JDBC statements;
430200 nanoseconds spent executing 1 JDBC statements;
0 nanoseconds spent executing 0 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
0 nanoseconds spent executing 0 flushes (flushing a total of 0 entities and 0 collections);
0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}
Everything in the code and logs are identical for the working and non-working services with one exception. The log shows the Update is required for the working services and does not for the failing service.
Thoughts? As always, thank you for your assistance!
Drilled in logs
2020-11-16 13:51:26,156 DEBUG [org.hib.loa.pla.exe.pro.int.EntityReferenceInitializerImpl] (executor-thread-198) On call to EntityIdentifierReaderImpl#resolve, EntityKey was already known; should only happen on root returns with an optional identifier specified
2020-11-16 13:51:26,156 DEBUG [org.hib.eng.int.TwoPhaseLoad] (executor-thread-198) Resolving attributes for [org.protechskillsinstitute.applicant.entity.Applicant#1]
2020-11-16 13:51:26,157 DEBUG [org.hib.eng.int.TwoPhaseLoad] (executor-thread-198) Processing attribute `email` : value = original@email.com
2020-11-16 13:51:26,157 DEBUG [org.hib.eng.int.TwoPhaseLoad] (executor-thread-198) Attribute (`email`) - enhanced for lazy-loading? - false
and the check
2020-11-16 13:51:26,220 DEBUG [org.hib.int.uti.EntityPrinter] (executor-thread-198) Listing entities:
2020-11-16 13:51:26,220 DEBUG [org.hib.int.uti.EntityPrinter] (executor-thread-198) org.protechskillsinstitute.applicant.entity.Applicant{applicant_id=1, email=edited@email.com}
2020-11-16 13:51:26,220 DEBUG [org.hib.res.jdb.int.LogicalConnectionManagedImpl] (executor-thread-198) Initiating JDBC connection release from afterStatement
2020-11-16 13:51:26,221 DEBUG [org.hib.eng.tra.int.TransactionImpl] (executor-thread-198) On TransactionImpl creation, JpaCompliance#isJpaTransactionComplianceEnabled == false
2020-11-16 13:51:26,221 DEBUG [org.hib.eve.int.AbstractFlushingEventListener] (executor-thread-198) Processing flush-time cascades
2020-11-16 13:51:26,221 DEBUG [org.hib.eve.int.AbstractFlushingEventListener] (executor-thread-198) Dirty checking collections
2020-11-16 13:51:26,221 DEBUG [org.hib.eve.int.AbstractFlushingEventListener] (executor-thread-198) Flushed: 0 insertions, 0 updates, 0 deletions to 1 objects
2020-11-16 13:51:26,222 DEBUG [org.hib.eve.int.AbstractFlushingEventListener] (executor-thread-198) Flushed: 0 (re)creations, 0 updates, 0 removals to 0 collections
2020-11-16 13:51:26,222 DEBUG [org.hib.int.uti.EntityPrinter] (executor-thread-198) Listing entities:
2020-11-16 13:51:26,222 DEBUG [org.hib.int.uti.EntityPrinter] (executor-thread-198) org.protechskillsinstitute.applicant.entity.Applicant{applicant_id=1, email=edited@email.com}
Are you sure the email actually changed? Hibernate will obviously not execute an UPDATE query if the data didn't change.
Have you tried flushing and committing after the merge? Did you check whether the flush mode is MANUAL or AUTO?
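The flush suggestion above could be tried with a small change to the question's manager (a sketch only: it assumes the `em` field is an injected EntityManager and that the Quarkus `javax` namespace from that era applies):

```java
import javax.enterprise.context.RequestScoped;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import javax.transaction.Transactional;

@RequestScoped
public class ApplicantManager {

    @PersistenceContext
    EntityManager em;

    @Transactional
    public Applicant save(Applicant applicant) {
        Applicant merged = em.merge(applicant);
        // Force the UPDATE to be issued inside this transaction instead of
        // relying on an automatic flush at commit. If the flush mode were
        // effectively manual, the statement would otherwise never be sent.
        em.flush();
        return merged;
    }
}
```

If the explicit flush makes the UPDATE appear in the logs, the problem is flush timing rather than dirty checking.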
Based on the Resolved Dependencies it looks like Red Hat is using:
hibernate-core: 5.4.21.Final-redhat-00001
hibernate-commons-annotations: 5.1.0.Final-redhat-00005
hibernate-graalvm: 5.4.21.Final-redhat-00001
Per the comments below, I have opened a ticket with Red Hat.
Case 02805123

Spring Data JPA Is Too Slow

I recently switched my app to Spring Boot 2. I rely on Spring Data JPA to handle all transactions, and I noticed a huge speed difference compared to my old configuration: storing around 1000 elements used to take about 6 s and now takes over 25 s. I have seen SO posts about batching with Data JPA, but none of them worked.
Let me show you the 2 configurations:
The entity (common to both):
@Entity
@Table(name = "category")
public class CategoryDB implements Serializable
{
    private static final long serialVersionUID = -7047292240228252349L;

    @Id
    @Column(name = "category_id", length = 24)
    private String category_id;

    @Column(name = "category_name", length = 50)
    private String name;

    @Column(name = "category_plural_name", length = 50)
    private String pluralName;

    @Column(name = "url_icon", length = 200)
    private String url;

    @Column(name = "parent_category", length = 24)
    @JoinColumn(name = "parent_category", referencedColumnName = "category_id")
    private String parentID;

    //Getters & Setters
}
Old repository (showing an insert only):
@Override
public Set<String> insert(Set<CategoryDB> element)
{
    Set<String> ids = new HashSet<>();
    Transaction tx = session.beginTransaction();
    for (CategoryDB category : element)
    {
        String id = (String) session.save(category);
        ids.add(id);
    }
    tx.commit();
    return ids;
}
Old Hibernate XML Config File:
<property name="show_sql">true</property>
<property name="format_sql">true</property>
<!-- connection information -->
<property name="hibernate.connection.driver_class">com.mysql.cj.jdbc.Driver</property>
<property name="hibernate.dialect">org.hibernate.dialect.MySQLDialect</property>
<!-- database pooling information -->
<property name="connection_provider_class">org.hibernate.connection.C3P0ConnectionProvider</property>
<property name="hibernate.c3p0.min_size">5</property>
<property name="hibernate.c3p0.max_size">100</property>
<property name="hibernate.c3p0.timeout">300</property>
<property name="hibernate.c3p0.max_statements">50</property>
<property name="hibernate.c3p0.idle_test_period">3000</property>
Old Statistics:
18949156 nanoseconds spent acquiring 2 JDBC connections;
5025322 nanoseconds spent releasing 2 JDBC connections;
33116643 nanoseconds spent preparing 942 JDBC statements;
3185229893 nanoseconds spent executing 942 JDBC statements;
0 nanoseconds spent executing 0 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
3374152568 nanoseconds spent executing 1 flushes (flushing a total of 941 entities and 0 collections);
6485 nanoseconds spent executing 1 partial-flushes (flushing a total of 0 entities and 0 collections)
New Repository:
@Repository
public interface CategoryRepository extends JpaRepository<CategoryDB, String>
{
    @Query("SELECT cat.parentID FROM CategoryDB cat WHERE cat.category_id = :#{#category.category_id}")
    String getParentID(@Param("category") CategoryDB category);
}
And I'm using saveAll() in my service.
New application.properties:
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
spring.datasource.hikari.connection-timeout=6000
spring.datasource.hikari.maximum-pool-size=10
spring.jpa.properties.hibernate.show_sql=true
spring.jpa.properties.hibernate.format_sql=true
spring.jpa.properties.hibernate.generate_statistics = true
spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.MySQLDialect
spring.jpa.properties.hibernate.jdbc.batch_size=50
spring.jpa.properties.hibernate.order_inserts=true
New Statistics:
24543605 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
136919170 nanoseconds spent preparing 942 JDBC statements;
5457451561 nanoseconds spent executing 941 JDBC statements;
19985781508 nanoseconds spent executing 19 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
20256178886 nanoseconds spent executing 3 flushes (flushing a total of 2823 entities and 0 collections);
0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
I'm probably misconfiguring something on the Spring side. This is a huge performance difference and I'm at a dead end. Any hints on what is going wrong here are much appreciated.
Let's merge the statistics so they can be compared easily.
Old rows are prefixed with o, new ones with n.
Rows with a count of 0 are ignored.
Nanosecond values are split with a space so that the milliseconds stand out before it.
o: 18 949156 nanoseconds spent acquiring 2 JDBC connections;
n: 24 543605 nanoseconds spent acquiring 1 JDBC connections;
o: 33 116643 nanoseconds spent preparing 942 JDBC statements;
n: 136 919170 nanoseconds spent preparing 942 JDBC statements;
o: 3185 229893 nanoseconds spent executing 942 JDBC statements;
n: 5457 451561 nanoseconds spent executing 941 JDBC statements; // losing ~2 sec
o: 0 nanoseconds spent executing 0 JDBC batches;
n: 19985 781508 nanoseconds spent executing 19 JDBC batches; // losing ~20 sec
o: 3374 152568 nanoseconds spent executing 1 flushes (flushing a total of 941 entities and 0 collections);
n: 20256 178886 nanoseconds spent executing 3 flushes (flushing a total of 2823 entities and 0 collections); // losing ~20 sec, processing 3 times the entities
o: 6485 nanoseconds spent executing 1 partial-flushes (flushing a total of 0 entities and 0 collections)
n: 0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
The following seem to be the relevant points:
The new version has 19 batches, which take 20 s and don't exist in the old version at all.
The new version has 3 flushes instead of 1; together they take 20 s more, about 6 times as long. This is probably more or less the same extra time as the batches, since the batches are almost certainly part of those flushes.
Although batches are supposed to make things faster, there are reports of them making things slower, especially with MySQL: Why Spring's jdbcTemplate.batchUpdate() so slow?
This brings us to a couple of things you can try/investigate:
Disable batching, to test whether you are actually suffering from some kind of slow-batch problem.
Use the linked SO post to speed up batching.
Log the SQL statements that actually get executed, to find the difference.
Since this will produce rather lengthy logs to work through, try extracting only the SQL statements into two files and comparing them with a diff tool.
Log flushes, to get ideas about why the extra flushes are triggered.
Use breakpoints and a debugger, or extra logging, to find out which entities are getting flushed and why you have far more entities in the second variant.
All the proposals above operate on JPA.
But your statistics and the content of the question suggest that you are doing simple inserts into one or a few tables.
Doing this with plain JDBC, e.g. with a JdbcTemplate, might be more efficient and is at least easier to understand.
You can also use JdbcTemplate directly; it is much faster than Spring Data JPA.
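For reference, a plain-JDBC version of the insert might look like the following sketch. The table and column names come from the CategoryDB mapping in the question, but the getter names are assumed (the entity only says "//Getters & Setters"), and the JdbcTemplate is assumed to be wired against the same datasource:

```java
import java.util.List;

import org.springframework.jdbc.core.JdbcTemplate;

public class CategoryJdbcWriter {

    private final JdbcTemplate jdbcTemplate;

    public CategoryJdbcWriter(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    /** Inserts all categories in a single set of JDBC batches. */
    public void insertAll(List<CategoryDB> categories) {
        jdbcTemplate.batchUpdate(
                "INSERT INTO category (category_id, category_name, category_plural_name, url_icon, parent_category) "
                        + "VALUES (?, ?, ?, ?, ?)",
                categories,
                categories.size(), // one batch; use a smaller value to chunk
                (ps, category) -> {
                    // Getter names are assumptions, not from the question.
                    ps.setString(1, category.getCategoryId());
                    ps.setString(2, category.getName());
                    ps.setString(3, category.getPluralName());
                    ps.setString(4, category.getUrl());
                    ps.setString(5, category.getParentID());
                });
    }
}
```

This bypasses the persistence context entirely, so there are no flushes or dirty checks; the trade-off is that you lose generated-key handling and cascading.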

JPA batch inserts do not improve performance

I would like to improve the performance of my PostgreSQL inserts with JPA batch inserts.
I'm using:
spring-boot-starter-data-jpa 2.1.3.RELEASE
postgresql 42.2.5 (JDBC driver)
The database is PostgreSQL 9.6.2.
I have managed to activate JPA's batch inserts, but performance has not improved at all:
I use @GeneratedValue(strategy = GenerationType.SEQUENCE) in my entities.
I use reWriteBatchedInserts=true in my JDBC connection string.
I set the following properties:
spring.jpa.properties.hibernate.jdbc.batch_size=100
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.generate_statistics=true
I use the saveAll(collection) method.
I tried flushing and clearing my entityManager after each batch.
I tried batch sizes of 100 and 1000, flushing for each batch, with no noticeable change.
I can see in the logs that Hibernate does use batch inserts, but I'm unsure whether my database does (I'm trying to fetch its logs; a folder permission is pending).
@Service
@Configuration
@Transactional
public class SecteurGeographiqueServiceImpl implements SecteurGeographiqueService {

    private static final Logger logger = LoggerFactory.getLogger(SecteurGeographiqueServiceImpl.class);

    @Value("${spring.jpa.properties.hibernate.jdbc.batch_size}")
    private int batchSize;

    @PersistenceContext
    private EntityManager entityManager;

    @Autowired
    private SecteurGeographiqueRepository secteurGeographiqueRepository;

    @Override
    public List<SecteurGeographique> saveAllSecteurGeographiquesISOs(List<SecteurGeographique> listSecteurGeographiques) {
        logger.warn("BATCH SIZE : " + this.batchSize);
        final List<SecteurGeographique> tempList = new ArrayList<>();
        final List<SecteurGeographique> savedList = new ArrayList<>();
        for (int i = 0; i < listSecteurGeographiques.size(); i++) {
            if ((i % this.batchSize) == 0) {
                savedList.addAll(this.secteurGeographiqueRepository.saveAll(tempList));
                tempList.clear();
                this.entityManager.flush();
                this.entityManager.clear();
            }
            tempList.add(listSecteurGeographiques.get(i));
        }
        savedList.addAll(this.secteurGeographiqueRepository.saveAll(tempList));
        tempList.clear();
        this.entityManager.flush();
        this.entityManager.clear();
        return savedList;
    }
}
...
@Entity
public class SecteurGeographique {

    private static final long serialVersionUID = 1L;

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    @Column(name = "id")
    public Long id;
    ...
My repository implementation is:
org.springframework.data.jpa.repository.JpaRepository<SecteurGeographique, Long>
application.properties (connection part):
spring.datasource.url=jdbc:postgresql://xx.xx.xx.xx:5432/bddname?reWriteBatchedInserts=true
spring.jpa.properties.hibernate.default_schema=schema
spring.datasource.username=xxxx
spring.datasource.password=xxxx
spring.datasource.driverClassName=org.postgresql.Driver
spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.PostgreSQLDialect
spring.jpa.properties.hibernate.jdbc.lob.non_contextual_creation=true
spring.jpa.properties.hibernate.jdbc.batch_size=100
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.generate_statistics=true
And in the logs after my 16073 entities are inserted (this test does not include flushing):
13:31:40.882 [restartedMain] INFO o.h.e.i.StatisticalLoggingSessionEventListener - Session Metrics {
15721506 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
121091067 nanoseconds spent preparing 16074 JDBC statements;
240144821872 nanoseconds spent executing 16073 JDBC statements;
3778202166 nanoseconds spent executing 161 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
4012929596 nanoseconds spent executing 1 flushes (flushing a total of 16073 entities and 0 collections);
0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}
Note that this is just one table, with no constraint / foreign key. Just flat basic data in a table, nothing fancy.
From the logs it does look like there is a problem:
240144821872 nanoseconds spent executing 16073 JDBC statements;
3778202166 nanoseconds spent executing 161 JDBC batches;
Shouldn't it be "executing 161 JDBC statements" if everything goes through the batches?
Tests with flushes, and batch sizes 100 then 1000 :
15:32:17.612 [restartedMain] WARN f.g.j.a.r.s.i.SecteurGeographiqueServiceImpl - BATCH SIZE : 100
15:36:46.206 [restartedMain] INFO o.h.e.i.StatisticalLoggingSessionEventListener - Session Metrics {
15416324 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
105369002 nanoseconds spent preparing 16234 JDBC statements;
262388696401 nanoseconds spent executing 16073 JDBC statements;
3669253410 nanoseconds spent executing 161 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
3956493726 nanoseconds spent executing 161 flushes (flushing a total of 16073 entities and 0 collections);
0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}
15:43:54.155 [restartedMain] WARN f.g.j.a.r.s.i.SecteurGeographiqueServiceImpl - BATCH SIZE : 1000
15:48:22.335 [restartedMain] INFO o.h.e.i.StatisticalLoggingSessionEventListener - Session Metrics {
15676227 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
111370586 nanoseconds spent preparing 16090 JDBC statements;
265089247563 nanoseconds spent executing 16073 JDBC statements;
599946208 nanoseconds spent executing 17 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
866452023 nanoseconds spent executing 17 flushes (flushing a total of 16073 entities and 0 collections);
0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}
Each time I get a 4 min 30 s execution time. That feels enormous for batch inserts.
What am I missing / misinterpreting?
After trying a batch size of 1000 against a PostgreSQL server on localhost (https://gareth.flowers/postgresql-portable/ v10.1.1), the execution runs in under 3 seconds, so it seems the code and configuration are not to blame.
Unfortunately I cannot investigate why it was taking so long on the remote PostgreSQL instance (hosted on AWS); I can only conclude this was a network or database issue.
As of today I cannot access the remote PostgreSQL logs, but if you have any advice on what to look for on the PostgreSQL instance, I'm all ears.
Logs with batching (1000) and flush+clean :
16:20:52.360 [restartedMain] WARN f.g.j.a.r.s.i.SecteurGeographiqueServiceImpl - BATCH SIZE : 1000
16:20:54.844 [restartedMain] INFO o.h.e.i.StatisticalLoggingSessionEventListener - Session Metrics {
523125 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
44649191 nanoseconds spent preparing 16090 JDBC statements;
1311557995 nanoseconds spent executing 16073 JDBC statements;
204225325 nanoseconds spent executing 17 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
381230968 nanoseconds spent executing 17 flushes (flushing a total of 16073 entities and 0 collections);
0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}
Logs WITHOUT batching, flush or clean :
16:57:34.426 [restartedMain] INFO o.h.e.i.StatisticalLoggingSessionEventListener - Session Metrics {
725069 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
55763008 nanoseconds spent preparing 32146 JDBC statements;
2816525053 nanoseconds spent executing 32146 JDBC statements;
0 nanoseconds spent executing 0 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
1796451447 nanoseconds spent executing 1 flushes (flushing a total of 16073 entities and 0 collections);
0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}
This comparison shows a roughly 46% reduction in overall JDBC statement execution time.

Hibernate + enabling SQLite Pragmas to gain speed on Windows 7 machine

Used software:
Hibernate 3.6
SQLite JDBC 3.6.0
Java JRE 1.6.x
I have a problem with transferring data over a TCP connection (20,000 entries):
we create an SQLite database with the help of Hibernate
we use Hibernate views and Hibernate annotations to create queries
Hibernate properties are also used
storing 20,000 entries with Hibernate and NO SQLite pragmas enabled takes nearly 6 minutes (~330 s) on Windows 7
storing 20,000 entries without Hibernate and with all relevant SQLite pragmas enabled takes about 2 minutes (~109 s) on Windows 7
tests with Hibernate and SQLite without pragmas on Windows XP and Windows Vista run fast, but on Windows 7 they take nearly 3 times as long (~330 s) as on the XP machine
on Windows 7 we want to activate the SQLite pragmas to gain a speed boost
relevant pragmas are:
PRAGMA cache_size = 400000;
PRAGMA synchronous = OFF;
PRAGMA count_changes = OFF;
PRAGMA temp_store = MEMORY;
PRAGMA auto_vacuum = NONE;
Problem: we must use Hibernate (not NHibernate!).
Question:
How can these pragmas be enabled on a Hibernate SQLite connection, if it is possible at all?
I was also looking for a way to set another pragma, PRAGMA foreign_keys = ON, for Hibernate connections. I didn't find anything on the subject, and the only solution I came up with was to decorate the SQLite JDBC driver and set the required pragma each time a new connection is retrieved. See the sample code below:
@Override
public Connection connect(String url, Properties info) throws SQLException {
    // Delegate to the wrapped SQLite driver, then apply pragmas
    final Connection connection = originalDriver.connect(url, info);
    initPragmas(connection);
    return connection;
}

private void initPragmas(Connection connection) throws SQLException {
    // Enabling foreign keys
    connection.prepareStatement("PRAGMA foreign_keys = ON;").execute();
}
The full sample is here: https://gist.github.com/52dbc7066787684de634. Then set the hibernate.connection.driver_class property to your package.DriverDecorator.
Inserts one-by-one can be very slow; you may want to consider batching. Please see my answer to this other post.
For the PRAGMA foreign_keys = ON equivalent:
hibernate.connection.foreign_keys=true
or
<property name="connection.foreign_keys">true</property>
depending on your strategy
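By the same token, the other pragmas from the question could plausibly be passed the same way, since Hibernate forwards unrecognized hibernate.connection.* keys to the JDBC driver as connection properties. This assumes a SQLite JDBC driver that maps such properties to pragmas (the Xerial driver does for several of them); verify against your driver version, as unknown keys may be silently ignored:

```properties
hibernate.connection.synchronous=OFF
hibernate.connection.cache_size=400000
hibernate.connection.temp_store=MEMORY
```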

SELECT 1 from DUAL: MySQL

In looking over my Query log, I see an odd pattern that I don't have an explanation for.
After practically every query, I have "select 1 from DUAL".
I have no idea where this is coming from, and I'm certainly not making the query explicitly.
The log basically looks like this:
10 Query SELECT some normal query
10 Query select 1 from DUAL
10 Query SELECT some normal query
10 Query select 1 from DUAL
10 Query SELECT some normal query
10 Query select 1 from DUAL
10 Query SELECT some normal query
10 Query select 1 from DUAL
10 Query SELECT some normal query
10 Query select 1 from DUAL
...etc...
Has anybody encountered this problem before?
MySQL Version: 5.0.51
Driver: Java 6 app using JDBC. mysql-connector-java-5.1.6-bin.jar
Connection Pool: commons-dbcp 1.2.2
The validationQuery was set to "select 1 from DUAL" (obviously) and apparently the connection pool defaults testOnBorrow and testOnReturn to true when a validation query is non-null.
One further question that this brings up for me is whether or not I actually need to have a validation query, or if I can maybe get a performance boost by disabling it or at least reducing the frequency with which it is used. Unfortunately, the developer who wrote our "database manager" is no longer with us, so I can't ask him to justify it for me. Any input would be appreciated. I'm gonna dig through the API and google for a while and report back if I find anything worthwhile.
EDIT: added some more info
EDIT2: Added info that was asked for in the correct answer for anybody who finds this later
It could be coming from the connection pool your application is using. We use a simple query to test the connection.
Just had a quick look in the source to mysql-connector-j and it isn't coming from in there.
The most likely cause is the connection pool.
Common connection pools:
commons-dbcp has a configuration property validationQuery, this combined with testOnBorrow and testOnReturn could cause the statements you see.
c3p0 has preferredTestQuery, testConnectionOnCheckin, testConnectionOnCheckout and idleConnectionTestPeriod
For what it's worth, I tend to configure connection testing on checkout/borrow even if it means a little extra network chatter.
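If per-checkout validation proves too expensive, a common compromise in commons-dbcp 1.x is to validate idle connections in the background instead of on every borrow. A sketch of the relevant BasicDataSource properties (the values are illustrative; check the property names against your dbcp version):

```properties
validationQuery=SELECT 1
testOnBorrow=false
testOnReturn=false
testWhileIdle=true
timeBetweenEvictionRunsMillis=30000
numTestsPerEvictionRun=3
```

This keeps dead connections from lingering in the pool without putting a validation round-trip on the hot path of every request.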
I have performed 100 inserts/deletes and tested on both DBCP and C3P0.
DBCP :: testOnBorrow=true increases the response time by more than 4x.
C3P0 :: testConnectionOnCheckout=true increases the response time by more than 3x.
Here are the results :
DBCP – BasicDataSource
Average time for 100 transactions ( insert operation )
testOnBorrow=false :: 219.01 ms
testOnBorrow=true :: 1071.56 ms
Average time for 100 transactions ( delete operation )
testOnBorrow=false :: 223.4 ms
testOnBorrow=true :: 1067.51 ms
C3P0 – ComboPooledDataSource
Average time for 100 transactions ( insert operation )
testConnectionOnCheckout=false :: 220.08 ms
testConnectionOnCheckout=true :: 661.44 ms
Average time for 100 transactions ( delete operation )
testConnectionOnCheckout=false :: 216.52 ms
testConnectionOnCheckout=true :: 648.29 ms
Conclusion: setting testOnBorrow=true in DBCP or testConnectionOnCheckout=true in C3P0 degrades performance by 3-4x. Is there any other setting that will enhance performance?
The "dual" table/object name is an Oracle construct, which MySQL supports for compatibility, or to provide a target for queries that don't need one but where people want something to feel all warm and fuzzy. E.g.
select curdate()
can be
select curdate() from dual
Someone could be sniffing you to see if you're running Oracle.
