No matter what, I can't batch MySQL INSERT statements in Hibernate

I'm currently facing the well-known and common Hibernate batch insert problem.
I need to save batches of 5 million rows. I'm first trying with a much lighter payload. Since I only have to insert entities of 2 types (first all records of type A, then all records of type B, both pointing to a common ManyToOne parent of type C), I would like to take full advantage of JDBC batch inserts.
I have already read lots of documentation, but nothing I have tried has worked.
I know that in order to use batch inserts I must not use an identity generator. So I removed the AUTO_INCREMENT ID and I'm setting the ID with a trick: SELECT MAX(ID) FROM the table once, then increment a counter for every row (see the sketch below).
I know that I must flush the session regularly. I'll post the code below; in any case I commit a transaction every 500 elements.
I know that I have to set hibernate.jdbc.batch_size consistently with my application's bulk size, so I set it in the LocalSessionFactoryBean (Spring ORM integration).
I know I have to enable rewriting of batched statements in the connection URL.
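For reference, the getMaxId() DAO method used in the code further down is nothing more sophisticated than this (only a sketch; the real DAO may differ):
public long getMaxId()
{
    // fetch the current maximum id once, then hand out ids by incrementing a counter in memory
    Long max = (Long) sessionFactory.getCurrentSession()
        .createQuery("select max(l.id) from Loan l")
        .uniqueResult();
    return max == null ? 0L : max; // empty table: start counting from 0
}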
Here are my entities
The common parent entity. This gets inserted first, in a single transaction. I don't care about the auto increment column here; there is only one record per batch job.
@Entity
@Table(...)
@SequenceGenerator(...)
public class Deal
{
    @Id
    @Column(
        name = "DEAL_ID",
        nullable = false)
    @GeneratedValue(
        strategy = GenerationType.AUTO)
    protected Long id;
    ................
}
One of the children (let's say 2.5M records per batch)
@Entity
@Table(
    name = "TA_LOANS")
public class Loan
{
    @Id
    @Column(
        name = "LOAN_ID",
        nullable = false)
    protected Long id;

    @ManyToOne(
        optional = false,
        targetEntity = Deal.class,
        fetch = FetchType.LAZY)
    @JoinColumn(
        name = "DEAL_ID",
        nullable = false)
    protected Deal deal;
    .............
}
The other child type. Let's say the other 2.5M records.
@Entity
@Table(
    name = "TA_BONDS")
public class Bond
{
    @Id
    @Column(
        name = "BOND_ID")
    protected Long id;

    @ManyToOne(
        fetch = FetchType.LAZY,
        optional = false,
        targetEntity = Deal.class)
    @JoinColumn(
        name = "DEAL_ID",
        nullable = false,
        updatable = false)
    protected Deal deal;
}
Simplified code that inserts records
long loanIdCounter = loanDao.getMaxId(), bondIdCounter = bondDao.getMaxId(); // perform SELECT MAX(ID)
Deal deal = null;
List<Bond> bondList = new ArrayList<Bond>(COMMIT_BATCH_SIZE); // COMMIT_BATCH_SIZE = 500
List<Loan> loanList = new ArrayList<Loan>(COMMIT_BATCH_SIZE);
for (String msg : inputStreamReader)
{
    log.debug(msg.toString());
    if (this is a deal)
    {
        deal = parseDeal(msg.getMessage());
        deal = dealManager.persist(deal); // called in a separate transaction via Spring's @Transactional(propagation = REQUIRES_NEW)
    }
    else if (this is a loan)
    {
        Loan loan = parseLoan(msg.getMessage());
        loan.setId(++loanIdCounter);
        loan.setDeal(deal);
        loanList.add(loan);
        if (loanList.size() == COMMIT_BATCH_SIZE)
        {
            loanManager.bulkInsert(loanList); // performs a bulk insert in a single transaction, handled manually this time rather than via the annotation
            loanList.clear();
        }
    }
    else if (this is a bond)
    {
        Bond bond = parseBond(msg.getMessage());
        bond.setId(++bondIdCounter);
        bond.setDeal(deal);
        bondList.add(bond);
        if (bondList.size() == COMMIT_BATCH_SIZE) // as above
        {
            bondManager.bulkInsert(bondList);
            bondList.clear();
        }
    }
}
// flush the remaining items, not important
if (!bondList.isEmpty())
    bondManager.bulkInsert(bondList);
if (!loanList.isEmpty())
    loanManager.bulkInsert(loanList);
Implementation of bulkInsert:
@Override
public void bulkInsert(Collection<Bond> bonds)
{
    // StatelessSession session = sessionFactory.openStatelessSession();
    Session session = sessionFactory.openSession();
    try
    {
        Transaction t = session.beginTransaction();
        try
        {
            for (Bond bond : bonds)
                // session.persist(bond);
                // session.insert(bond);
                session.save(bond);
            t.commit();
        }
        catch (RuntimeException ex)
        {
            t.rollback();
            throw ex;
        }
    }
    finally
    {
        session.close();
    }
}
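For what it's worth, the pattern from the Hibernate batch-processing documentation is to flush and clear the session every batch_size entities inside a single transaction. A variant of bulkInsert along those lines (only a sketch; BATCH_SIZE is assumed to match hibernate.jdbc.batch_size):
public void bulkInsert(Collection<Bond> bonds)
{
    Session session = sessionFactory.openSession();
    Transaction t = session.beginTransaction();
    try
    {
        int count = 0;
        for (Bond bond : bonds)
        {
            session.save(bond);
            if (++count % BATCH_SIZE == 0)
            {
                session.flush(); // push the current batch of INSERTs to the JDBC driver
                session.clear(); // detach the saved entities to keep the session small
            }
        }
        t.commit();
    }
    catch (RuntimeException ex)
    {
        t.rollback();
        throw ex;
    }
    finally
    {
        session.close();
    }
}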
As you can see from the comments, I have tried several combinations of stateful/stateless sessions. None worked.
My dataSource is a ComboPooledDataSource with the following URL:
<b:property name="jdbcUrl" value="jdbc:mysql://server:3306/db?autoReconnect=true&amp;rewriteBatchedStatements=true" />
My SessionFactory
<b:bean id="sessionFactory" class="class.that.extends.org.springframework.orm.hibernate3.LocalSessionFactoryBean" lazy-init="false" depends-on="dataSource">
<b:property name="dataSource" ref="phoenixDataSource" />
<b:property name="hibernateProperties">
<b:props>
<b:prop key="hibernate.dialect">${hibernate.dialect}</b:prop> <!-- MySQL5InnoDb-->
<b:prop key="hibernate.show_sql">${hibernate.showSQL}</b:prop>
<b:prop key="hibernate.jdbc.batch_size">500</b:prop>
<b:prop key="hibernate.jdbc.use_scrollable_resultset">false</b:prop>
<b:prop key="hibernate.cache.use_second_level_cache">false</b:prop>
<b:prop key="hibernate.cache.provider_class">org.hibernate.cache.EhCacheProvider</b:prop>
<b:prop key="hibernate.cache.use_query_cache">false</b:prop>
<b:prop key="hibernate.validator.apply_to_ddl">false</b:prop>
<b:prop key="hibernate.validator.autoregister_listeners">false</b:prop>
<b:prop key="hibernate.order_inserts">true</b:prop>
<b:prop key="hibernate.order_updates">true</b:prop>
</b:props>
</b:property>
</b:bean>
Even though my project-wide class extends LocalSessionFactoryBean, it does not override any of its methods (it only adds a few project-wide helpers).
I have been going mad over this for days. I have read several articles, and none of them helped me enable batch inserts. I run all of my code from JUnit tests instrumented with a Spring context (so I can @Autowire my classes). All of my attempts only produce lots of separate INSERT statements:
https://stackoverflow.com/questions/12011343/how-do-you-enable-batch-inserts-in-hibernate
https://stackoverflow.com/questions/3469364/faster-way-to-batch-saves-with-hibernate
https://forum.hibernate.org/viewtopic.php?p=2374413
https://stackoverflow.com/questions/3026968/high-performance-hibernate-insert
What am I missing?

It's likely your queries are being rewritten, but you wouldn't know it by looking at the Hibernate SQL logs. Hibernate does not rewrite the insert statements; the MySQL driver rewrites them. In other words, Hibernate sends multiple insert statements to the driver, and the driver then rewrites them. So the Hibernate logs only show you what SQL Hibernate sent to the driver, not what SQL the driver sent to the database.
You can verify this by enabling MySQL's profileSQL parameter in the connection URL:
<b:property name="jdbcUrl" value="jdbc:mysql://server:3306/db?autoReconnect=true&amp;rewriteBatchedStatements=true&amp;profileSQL=true" />
Using an example similar to yours, this is what my output looks like:
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
Wed Feb 05 13:29:52 MST 2014 INFO: Profiler Event: [QUERY] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) duration: 1 ms, connection-id: 81, statement-id: 33, resultset-id: 0, message: insert into Person (firstName, lastName, id) values ('person1', 'Name', 1),('person2', 'Name', 2),('person3', 'Name', 3),('person4', 'Name', 4),('person5', 'Name', 5),('person6', 'Name', 6),('person7', 'Name', 7),('person8', 'Name', 8),('person9', 'Name', 9),('person10', 'Name', 10)
The first 10 lines are logged by Hibernate, but this is not what is actually sent to the MySQL database. The last line comes from the MySQL driver, and it clearly shows a single batch insert with multiple value tuples; that is what is actually sent to the MySQL database.

Related

JPA Inserting an entity if not exists in concurrent operation

I have a simple use case:
I want to insert an entity only if it doesn't already exist.
Here is the entity class; it has a combined unique constraint on two columns:
@Entity
@Table(uniqueConstraints = {
    @UniqueConstraint(columnNames = {
        "firstName", "lastName"
    })
})
public class Customer {
    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;
    private String firstName;
    private String lastName;
}
And here is the simple insertion logic:
Optional<Customer> customer = customerRepository.findByFirstNameAndLastName("John", "Doe");
if (!customer.isPresent()) {
    customerRepository.save(new Customer("John", "Doe"));
}
When I call this from concurrent threads, I get this ConstraintViolationException, which is expected:
insert into customer (first_name, last_name, id) values (?, ?, ?) [23505-200]]; nested exception is org.hibernate.exception.ConstraintViolationException: could not execute statement] with root cause
org.h2.jdbc.JdbcSQLIntegrityConstraintViolationException: Unique index or primary key violation: "PUBLIC.UKKIYY7M3FWM4VO5NIL9IBP5846_INDEX_5 ON PUBLIC.CUSTOMER(FIRST_NAME, LAST_NAME) VALUES 8"; SQL statement:
insert into customer (first_name, last_name, id) values (?, ?, ?) [23505-200]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:459) ~[h2-1.4.200.jar:1.4.200]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:429) ~[h2-1.4.200.jar:1.4.200]
I can catch this exception, but a ConstraintViolationException can also occur in scenarios other than this unique index violation.
I also tried JPA's optimistic locking, but it seems to apply only to update and delete operations, not to inserts.
I don't want a native query (like ON DUPLICATE KEY in MySQL or ON CONFLICT in PostgreSQL) because I want the operations to be handled by JPA itself.
Also, I cannot simply make it thread-safe, because the application will be running on multiple JVMs.
Is there any way I can handle this, such as locking on insert, or at least updating on the same keys?
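One workaround that fits the constraints described above (sketched here, not a definitive answer; it assumes customerRepository is a Spring Data JpaRepository, so that saveAndFlush and exception translation are available) is to try the insert and fall back to a re-read when a concurrent writer wins the race:
public Customer findOrCreate(String firstName, String lastName) {
    return customerRepository.findByFirstNameAndLastName(firstName, lastName)
        .orElseGet(() -> {
            try {
                // saveAndFlush so the unique-constraint violation surfaces here, not at commit
                return customerRepository.saveAndFlush(new Customer(firstName, lastName));
            } catch (DataIntegrityViolationException e) {
                // another thread/JVM inserted the same (firstName, lastName) in the meantime
                return customerRepository.findByFirstNameAndLastName(firstName, lastName)
                    .orElseThrow(() -> e);
            }
        });
}
This still relies on the database's unique constraint as the final arbiter; it just turns the race into a retryable condition. If this runs inside a wider transaction, the insert attempt usually needs its own REQUIRES_NEW transaction so the failed statement does not poison the outer one.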

What does Hibernate do with unmapped columns?

Suppose there is a class like this:
public class Entity {
    private Long id;
    private String name;
}
And a table with 3 columns: id, name and address:
CREATE TABLE entity (
    id NUMBER(9,0),
    name VARCHAR2(255),
    address VARCHAR2(1000)
);
Then an insert was performed:
INSERT INTO entity (id, name, address) VALUES (1, 'a', 'b')
Then we load the Hibernate entity:
Session session = ...
Entity entity = session.get(Entity.class, 1L);
Then we update the name and save it again:
entity.setName("newName");
session.save(entity);
So what is the address column value now: null or 'b'? Does Hibernate provide any strategies for such situations, or do I have to
add an address field to the entity and mark it as @Column(updatable = false, insertable = false)?
If you put the following properties in persistence.xml (or wherever you have defined your Hibernate properties):
<property name="hibernate.show_sql" value="true"/>
<property name="hibernate.format_sql" value="true"/>
then you can see the queries executed by Hibernate when the application is run with debug logging enabled.
If your entity is
public class Entity {
    private Long id;
    private String name;
    private String secondName;
    //Getters & Setters
}
Then executing the HQL below
SELECT e FROM Entity e WHERE e.id = 121
would produce SQL similar to
SELECT entity0_.id AS id1_63_,
       entity0_.name AS name6_63_,
       entity0_.secondName AS secondName6_63_
FROM your_db.Entity entity0_
WHERE entity0_.id = 121
You can see that SELECT * FROM Entity was not executed; instead, all of the fields from the class were listed explicitly in the query. So any DB column that is not mapped in the entity will NOT take part in the queries.
The same thing happens for select-then-update.
entity.setName("newName");
session.save(entity);
Below is the formatted query you would get when updating the entity:
UPDATE your_db.Entity
SET name = ?,
    secondName = ?
WHERE id = ?
This query will be executed even if only one field is changed.
Hibernate operates only on the columns mapped in the entity, based on property names or on the annotations describing them. So in your case the address value will still be 'b'.
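If you do want the entity to expose the column without ever writing to it, the read-only mapping mentioned in the question would look roughly like this (a sketch):
@Column(name = "address", insertable = false, updatable = false)
private String address; // loaded on select, never included in INSERT or UPDATE statements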

JPA ManyToMany persist

I have a NOTIFICATION table whose entity contains one ManyToMany association:
@Entity
@Table(name="NOTIFICATION")
@NamedQuery(name="Notification.findAll", query="SELECT f FROM Notification f")
public class Notification {
    /** SOME COLUMN DEFINITION NOT IMPORTANT FOR MY CASE
        COD, DATE, ID_THEME, ID_TYP, IC_ARCH, ID_CLIENT, INFOS, NAME, TITRE_NOT, ID_NOT
    **/
    @ManyToMany
    @JoinTable(
        name="PJ_PAR_NOTIF"
        , joinColumns={
            @JoinColumn(name="ID_NOTIF")
        }
        , inverseJoinColumns={
            @JoinColumn(name="ID_PJ_GEN")
        }
    )
    private List<PiecesJointesGen> piecesJointesGens;
}
So, I have an association table called PJ_PAR_NOTIF.
I try to persist a Notification entity. Here is the piecesJointesGens initialisation, from a Value Object:
@PersistenceContext(unitName="pu/middle")
private EntityManager entityMgr;

Notification lNotification = new Notification();
for (PieceJointeGenVO lPJGenVO : pNotificationVO.getPiecesJointes()) {
    PiecesJointesGen lPiecesJointesGen = new PiecesJointesGen();
    lPiecesJointesGen.setLienPjGen(lPJGenVO.getLienPieceJointeGen());
    lPiecesJointesGen.setIdPjGen(lPJGenVO.getIdPieceJointeGen());
    lNotification.getPiecesJointesGens().add(lPiecesJointesGen);
}
entityMgr.persist(lNotification);
The persist doesn't work. JPA generates a first insert for my Notification object, which is OK:
insert
into
NOTIFICATION
(COD, DATE, ID_THEME, ID_TYP, IC_ARCH, ID_CLIENT, INFOS, NAME, TITRE_NOT, ID_NOT)
values
(?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
Then JPA tries to insert values into my association table, but the piecesJointesGens rows don't exist yet:
insert
into
PJ_PAR_NOTIF
(ID_NOTIF, ID_PJ_GEN)
values
(?, ?)
So I get this error:
GRAVE: EJB Exception: : java.lang.IllegalStateException: org.hibernate.TransientObjectException: object references an unsaved transient instance - save the transient instance before flushing: com.entities.PiecesJointesGen
Is there a way to tell JPA to insert the PiecesJointesGen entities before the PJ_PAR_NOTIF insert?
Modify the piecesJointesGens mapping to @ManyToMany(cascade = CascadeType.PERSIST).
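Applied to the mapping above, that looks like this:
@ManyToMany(cascade = CascadeType.PERSIST)
@JoinTable(
    name="PJ_PAR_NOTIF"
    , joinColumns={ @JoinColumn(name="ID_NOTIF") }
    , inverseJoinColumns={ @JoinColumn(name="ID_PJ_GEN") }
)
private List<PiecesJointesGen> piecesJointesGens;
With the cascade in place, persisting the Notification also persists the new PiecesJointesGen instances before the join-table rows are written.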

Hibernate putting restrictions on collection elements conflicts with fetch mode

I am using Hibernate 4.3.8 as the ORM tool for our MySQL database. I have a class to be mapped which is annotated as follows:
@Entity
@DynamicUpdate
@Table(name = "myclass")
public class MyClass {
    @Id
    @Column(name = "myClassId")
    private String id;

    @Column(name = "status")
    private String status;

    @ElementCollection(fetch = FetchType.EAGER)
    @CollectionTable(name = "myclass_children", joinColumns = @JoinColumn(name = "myClassId"))
    @Column(name = "child")
    @Fetch(FetchMode.JOIN)
    @BatchSize(size = 100)
    @Cascade(value = CascadeType.ALL)
    private Set<String> children;
}
To perform read queries via Hibernate, I am required to use the Criteria API. I should mention up front that using HQL or SQL is not an option.
The following code does exactly what I want: it performs a second select query to retrieve the collection elements and returns exactly 20 MyClass objects.
public List<MyClass> listEntities() {
    Session session = sessionFactory.openSession();
    try {
        Criteria criteria = session.createCriteria(MyClass.class);
        criteria.setFetchMode("children", FetchMode.SELECT);
        criteria.add(Restrictions.eq("status", "open"));
        criteria.setResultTransformer(CriteriaSpecification.DISTINCT_ROOT_ENTITY);
        criteria.setMaxResults(20);
        return criteria.list();
    } finally {
        session.close();
    }
}
Here are the queries generated:
select
this.myClassId as myClassId_1_0_0,
this.status as status_2_0_0
from
myclass this
where
status = ?
limit ?
select
children0.myClassId as myClassId1_0_0,
children0.child as child2_0_0
from
myclass_children as children0_
where
children0_.myClassId in (
?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?
)
However, when I try to put a restriction on the collection elements, Hibernate performs a single join query. When the number of rows (not distinct root entities) in the result set of this single query reaches the limit, Hibernate returns the MyClass objects built so far as the result. If each MyClass object has 2 children, only 10 MyClass objects are returned.
public List<MyClass> listEntities() {
    Session session = sessionFactory.openSession();
    try {
        Criteria criteria = session.createCriteria(MyClass.class);
        criteria.setFetchMode("children", FetchMode.SELECT);
        criteria.createCriteria("children", "ch", JoinType.LEFT_OUTER_JOIN);
        criteria.add(Restrictions.eq("status", "open"));
        criteria.add(Restrictions.in("ch.elements", Arrays.asList("child1", "child2")));
        criteria.setResultTransformer(CriteriaSpecification.DISTINCT_ROOT_ENTITY);
        criteria.setMaxResults(20);
        return criteria.list();
    } finally {
        session.close();
    }
}
Here is the generated query:
select
this.id as id1_0_0,
this.status as status2_0_0,
ch.child as child1_0_2
from
myclass this
left outer join
myclass_children ch1_
on this.myClassId = ch1_.myClassId
where
this.status = ? limit ?
What can I do to obtain 20 MyClass objects while adding restrictions on collection elements? Any suggestions & answers are welcome!
NOTE: the @Fetch(FetchMode.JOIN) annotation is there for other parts of the code base (selecting by id, etc.). It should not have any effect on my question, since I am setting FetchMode.SELECT on the criteria object separately.
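One way around the row-based limit, sketched below with the same Criteria API (this is only a suggestion, not an accepted answer), is to restrict the root entities through an id subquery, so that setMaxResults applies to distinct MyClass rows while the children are still loaded by the secondary select:
// Sketch: collect the ids of matching roots in a subquery, then page over the roots only.
DetachedCriteria matchingIds = DetachedCriteria.forClass(MyClass.class, "mc");
matchingIds.createCriteria("children", "ch", JoinType.LEFT_OUTER_JOIN);
matchingIds.add(Restrictions.eq("mc.status", "open"));
matchingIds.add(Restrictions.in("ch.elements", Arrays.asList("child1", "child2"))); // same element restriction as above
matchingIds.setProjection(Projections.distinct(Projections.id()));

Criteria criteria = session.createCriteria(MyClass.class);
criteria.add(Subqueries.propertyIn("id", matchingIds));
criteria.setFetchMode("children", FetchMode.SELECT);
criteria.setMaxResults(20); // now limits distinct MyClass rows
List<MyClass> result = criteria.list();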

hibernate and oracle sequence GenericGenerator creates gap

I've mapped my class as follows (other fields omitted, as only the ID matters):
@Entity
@Table(name = "MODEL_GROUP")
@Cache(usage = CacheConcurrencyStrategy.TRANSACTIONAL)
public class SettlementModelGroup implements Serializable
{
    @Id
    @GeneratedValue(generator = "MODEL_GROUP_SEQ", strategy = GenerationType.SEQUENCE)
    @GenericGenerator(name = "MODEL_GROUP_SEQ",
        strategy = "sequence",
        parameters = @Parameter(name = "sequence", value = "SEQ_MODEL_GROUP_MODEL_GROUP_ID"))
    @Column(name = "MODEL_GROUP_ID", nullable = false)
    private Integer modelId;
}
When I'm saving a new object:
Integer modelGroupId = (Integer) sessionFactory.getCurrentSession().save(modelGroup);
System.out.println(modelGroupId);
The returned ID is, for example, 23, but when I look at the database it is actually 24. This leads to many problems, as I use this ID later on. Any idea why this gap appears?
The SQL logs suggest that everything is fine (or so I think):
Hibernate:
select
SEQ_MODEL_GROUP_MODEL_GROUP_ID.nextval
from
dual
Hibernate:
insert
into
MODEL_GROUP
(DOMAIN_ID, DESCRIPTION, NAME, PERIOD_TYPE_ID, MODEL_GROUP_TYPE_ID, STATUS_ID, OWNER_ID, MODEL_GROUP_ID)
values
(?, ?, ?, ?, ?, ?, ?, ?)
Trigger and Sequence:
CREATE SEQUENCE "SEQ_MODEL_GROUP_MODEL_GROUP_ID"
INCREMENT BY 1
START WITH 1
NOMAXVALUE
MINVALUE 1
NOCYCLE
NOCACHE
NOORDER
;
CREATE OR REPLACE TRIGGER "TRG_MODEL_GROUP_MODEL_GROUP_ID"
BEFORE INSERT
ON "MODEL_GROUP"
FOR EACH ROW
BEGIN
SELECT "SEQ_MODEL_GROUP_MODEL_GROUP_ID".NEXTVAL
INTO :NEW."MODEL_GROUP_ID"
FROM DUAL;
END;
Apparently, the trigger above fires on every insert: Hibernate first asks the database for the sequence's next value and gets, say, 23, but when the row is actually inserted the trigger calls NEXTVAL again and stores 24. The solution is described here:
HIbernate issue with Oracle Trigger for generating id from a sequence
To make it work correctly, I changed the trigger so that it only assigns an ID when none was provided:
CREATE OR REPLACE TRIGGER "TRG_MODEL_GROUP_MODEL_GROUP_ID"
BEFORE INSERT
ON "MODEL_GROUP"
FOR EACH ROW
WHEN (NEW."MODEL_GROUP_ID" is NULL)
BEGIN
SELECT "SEQ_MODEL_GROUP_MODEL_GROUP_ID".NEXTVAL
INTO :NEW."MODEL_GROUP_ID"
FROM DUAL;
END;
