First of all, thanks for your time.
I am trying to insert data into the database with JPA (Spring Boot); the project uses Oracle.
Currently, inserting 5,000 records takes a long time with repository.save(...) or repository.saveAll(...).
I tried batch_size, but it does not seem to work (it looks like it does not work for Oracle?).
Code config:
Properties properties = new Properties();
properties.setProperty("hibernate.ddl-auto", "none");
properties.setProperty("hibernate.dialect", "org.hibernate.dialect.Oracle12cDialect");
properties.setProperty("hibernate.show_sql", "true");
properties.put("hibernate.jdbc.batch_size", 5);
properties.put("hibernate.order_inserts", true);
properties.put("hibernate.order_updates", true);
setJpaProperties(properties);
I also created a SQL query that inserts several rows in a single statement:
INSERT ALL INTO table(...)...
I hope there is a better and more efficient way.
So, can you give me any solution?
Thank you so much!
How about:
batch_size: 1000
When the entity count reaches 1000, call repository.saveAndFlush(),
then move on to the next batch.
Another option is to call EntityManager.persist directly while saving the batch, like:
public int saveDemoEntities(List<DemoEntity> demoEntities) {
    long start = System.currentTimeMillis();
    int count = 0;
    for (DemoEntity o : demoEntities) {
        entityManager.persist(o);
        count++;
        if (count % BATCH_COUNT == 0) {
            // flush a batch of inserts and clear the persistence context
            entityManager.flush();
            entityManager.clear();
        }
    }
    // flush whatever is left over from the last partial batch
    entityManager.flush();
    entityManager.clear();
    return count;
}
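Note that for either approach to produce real JDBC batches, Hibernate batching must be enabled and the entity ids must not use IDENTITY generation, since IDENTITY disables insert batching in Hibernate. A minimal sketch, following the Properties-based setup from the question (the values here are just examples, not tuned recommendations):
Properties properties = new Properties();
properties.setProperty("hibernate.jdbc.batch_size", "100"); // flush inserts in groups of 100
properties.setProperty("hibernate.order_inserts", "true");  // group inserts by entity type
properties.setProperty("hibernate.order_updates", "true");
setJpaProperties(properties);
// The entity id should be generated from a DB sequence (not GenerationType.IDENTITY),
// otherwise Hibernate has to execute every insert immediately and batching never happens.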
Related
I am attempting to insert ~57,000 entities into my database, but the insert method takes longer and longer as the loop progresses. I have implemented batches of 25, each time flushing, clearing, and (I'm pretty sure) closing the transaction, without success. Is there something else I need to do in the code below to maintain the insert rate? I feel like it should not take 4+ hours to insert 57K records.
[Migrate.java]
This is the main class that loops through 'Xaction' entities and adds 'XactionParticipant' records based off each Xaction.
// Use hibernate cursor to efficiently loop through all xaction entities
String hql = "select xaction from Xaction xaction";
Query<Xaction> query = session.createQuery(hql, Xaction.class);
query.setFetchSize(100);
query.setReadOnly(true);
query.setLockMode("xaction", LockMode.NONE);
ScrollableResults results = query.scroll(ScrollMode.FORWARD_ONLY);

int count = 0;
Instant lap = Instant.now();
List<Xaction> xactionsBatch = new ArrayList<>();
while (results.next()) {
    count++;
    Xaction xaction = (Xaction) results.get(0);
    xactionsBatch.add(xaction);

    // save new XactionParticipants in batches of 25
    if (count % 25 == 0) {
        xactionParticipantService.commitBatch(xactionsBatch);
        float rate = ChronoUnit.MILLIS.between(lap, Instant.now()) / 25f / 1000;
        System.out.printf("Batch rate: %.4fs per xaction\n", rate);
        xactionsBatch = new ArrayList<>();
        lap = Instant.now();
    }
}
xactionParticipantService.commitBatch(xactionsBatch);
results.close();
[XactionParticipantService.java]
This service provides a method with "REQUIRES_NEW" in an attempt to close the transaction for each batch
@Transactional(propagation = Propagation.REQUIRES_NEW)
public void commitBatch(List<Xaction> xactionBatch) {
    for (Xaction xaction : xactionBatch) {
        try {
            XactionParticipant xp = new XactionParticipant();
            // ... create xp based off Xaction info ...

            // Use native query for efficiency
            String nativeQueryStr = "INSERT INTO XactionParticipant .... xp info/data";
            Query q = em.createNativeQuery(nativeQueryStr);
            q.executeUpdate();
        } catch (Exception e) {
            log.error("Unable to update", e);
        }
    }
    // Clear just in case??
    em.flush();
    em.clear();
}
It is not clear what the root cause of your performance problem is: Java memory consumption or DB performance. Please check some thoughts below.
The following code does not actually optimize memory consumption:
String hql = "select xaction from Xaction xaction";
Query<Xaction> query = session.createQuery(hql, Xaction.class);
query.setFetchSize(100);
query.setReadOnly(true);
query.setLockMode("xaction", LockMode.NONE);
ScrollableResults results = query.scroll(ScrollMode.FORWARD_ONLY);
Since you are retrieving full-blown entities, those entities get stored in the persistence context (the session-level cache). To free up memory you need to detach each entity once it has been processed (i.e. after xactionsBatch.add(xaction), or after // ... create xp based off Xaction info ... if the participant creation still needs the entity's state); otherwise, by the end of processing you consume the same amount of memory as if you had done List<Xaction> results = query.getResultList();. And here I am not sure which is better: consuming all the required memory at the start of the transaction and releasing the other resources, or keeping the cursor and JDBC connection open for 4 hours.
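For illustration, a rough sketch of evicting entities inside the scroll loop (assuming the same session variable as in the question; evict only after any lazy state that commitBatch needs has already been initialized):
while (results.next()) {
    count++;
    Xaction xaction = (Xaction) results.get(0);
    xactionsBatch.add(xaction);
    // remove the entity from the persistence context so memory does not
    // keep growing with every row scrolled over
    session.evict(xaction);
    // ... batch handling as before ...
}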
The following code does not actually optimize JDBC interactions:
for (Xaction xaction : xactionBatch) {
    try {
        XactionParticipant xp = new XactionParticipant();
        // ... create xp based off Xaction info ...

        // Use native query for efficiency
        String nativeQueryStr = "INSERT INTO XactionParticipant .... xp info/data";
        Query q = em.createNativeQuery(nativeQueryStr);
        q.executeUpdate();
    } catch (Exception e) {
        log.error("Unable to update", e);
    }
}
Yes, in general the plain JDBC API is faster than the JPA API, but that is not your case here - you are inserting records one by one instead of using batch inserts. To take advantage of batching, your code should look like:
@Transactional(propagation = Propagation.REQUIRES_NEW)
public void commitBatch(List<Xaction> xactionBatch) {
    session.doWork(connection -> {
        String insert = "INSERT INTO XactionParticipant VALUES (?, ?, ...)";
        try (PreparedStatement ps = connection.prepareStatement(insert)) {
            for (Xaction xaction : xactionBatch) {
                ps.setString(1, "val1");
                ps.setString(2, "val2");
                ps.addBatch();
                ps.clearParameters();
            }
            ps.executeBatch();
        }
    });
}
By the way, Hibernate can do the same for you if hibernate.jdbc.batch_size is set to a large enough positive integer and the entities are designed properly (id generation backed by a DB sequence with a large enough allocationSize).
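For reference, a sketch of what such an id mapping could look like; the generator and sequence names here are made up for illustration, and a larger allocationSize simply reduces the number of sequence round-trips:
@Entity
public class XactionParticipant {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "xp_seq")
    @SequenceGenerator(name = "xp_seq", sequenceName = "XACTION_PARTICIPANT_SEQ",
                       allocationSize = 50) // one sequence round-trip per 50 ids
    private Long id;

    // ... remaining fields ...
}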
I am working on a tool that lets admins purge data from the database. One of our collections has millions of records, which makes deletes seize up the system. Originally I was just running a query that returns a Page and passing that into the standard delete. Ideally I'd prefer to run the query and delete in one go.
@Query(value = "{ 'timestamp' : {$gte : ?0, $lte: ?1 }}")
public Page deleteByTimestampBetween(Date from, Date to, Pageable pageable);
Is this possible? With the above code the system behaves the same: the program does not continue past the delete call and the data is not removed from Mongo. Or is there a better approach?
I don't think it is possible using Pageable with the @Query annotation. You can use bulk writes to process the deletes in batches.
Something like:
int count = 0;
int batch = 100; // Send 100 requests at a time
BulkOperations bulkOps = mongoTemplate.bulkOps(BulkOperations.BulkMode.UNORDERED, YourPojo.class);
List<DateRange> dateRanges = generateDateRanges(from, to, step); // Add a function to generate date ranges with the defined step (see the sketch below).
for (DateRange dateRange : dateRanges) {
    Query query = new Query();
    Criteria criteria = new Criteria().andOperator(
            Criteria.where("timestamp").gte(dateRange.from),
            Criteria.where("timestamp").lte(dateRange.to));
    query.addCriteria(criteria);
    bulkOps.remove(query);
    count++;
    if (count == batch) {
        bulkOps.execute();
        count = 0;
    }
}
if (count > 0) {
    bulkOps.execute();
}
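The generateDateRanges call above is just a placeholder. A possible sketch, assuming a simple DateRange holder class with public from/to fields and a step given in milliseconds:
// Hypothetical helper: splits [from, to] into consecutive sub-ranges of stepMillis each.
private List<DateRange> generateDateRanges(Date from, Date to, long stepMillis) {
    List<DateRange> ranges = new ArrayList<>();
    long start = from.getTime();
    while (start < to.getTime()) {
        long end = Math.min(start + stepMillis, to.getTime());
        ranges.add(new DateRange(new Date(start), new Date(end)));
        start = end;
    }
    return ranges;
}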
I am new to Hibernate and I have a doubt about Hibernate batch processing. I read some tutorials on Hibernate batch processing, and they said:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for (int i = 0; i < 100000; i++) {
    Employee employee = new Employee(.....);
    session.save(employee);
}
tx.commit();
session.close();
Hibernate will cache all the persisted objects in the session-level cache, and ultimately your application will fall over with an OutOfMemoryException somewhere around the 50,000th row. You can resolve this problem by using batch processing with Hibernate, like:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for (int i = 0; i < 100000; i++) {
    Employee employee = new Employee(.....);
    session.save(employee);
    if (i % 50 == 0) { // Same as the JDBC batch size
        // flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}
tx.commit();
session.close();
My doubt is: instead of initializing the session outside the loop, why can't we initialize it inside the for loop, like:
Session session = null;
Transaction tx = session.beginTransaction();
for (int i = 0; i < 100000; i++) {
    session = sessionFactory.openSession();
    Employee employee = new Employee(.....);
    session.save(employee);
}
tx.commit();
session.close();
Is this the correct way or not? Can anyone suggest the correct way?
No. Don't initialize the session inside the for loop; every time you open a new session you are starting a new batch (so your way has a batch size of one, i.e. it is effectively non-batching), and your way would also be much slower. That is why the first example has
if (i % 50 == 0) {
    // flush a batch of inserts and release memory:
    session.flush();
    session.clear();
}
That is what "flush a batch of inserts and release memory" is for.
Batch processing in Hibernate means dividing one huge task into a number of smaller tasks.
When you call session.save(obj), Hibernate actually caches that object in memory (the object is not yet written to the database) and saves it to the database when you commit the transaction, i.e. when you call transaction.commit().
Let's say you have millions of records to insert: calling session.save(obj) for all of them would consume a lot of memory and eventually result in an OutOfMemoryException.
Solution:
Create smaller batches and save them to the database batch by batch.
if (i % 50 == 0) {
    // flush a batch of inserts and release memory:
    session.flush();
    session.clear();
}
Note:
In the code above, session.flush() actually writes the objects to the database, and session.clear() frees the memory occupied by those objects, for each batch of size 50.
Batch processing allows you to optimize writing data.
However, the usual advice of flushing and clearing the Hibernate Session is incomplete.
You need to commit the transaction at the end of the batch to avoid long-running transactions which can hurt performance and, if the last item fails, undoing all changes is going to put a lot of pressure on the DB.
Therefore, this is how you should do batch processing:
int entityCount = 50;
int batchSize = 25;

EntityManager entityManager = entityManagerFactory().createEntityManager();
EntityTransaction entityTransaction = entityManager.getTransaction();

try {
    entityTransaction.begin();

    for (int i = 0; i < entityCount; i++) {
        if (i > 0 && i % batchSize == 0) {
            entityTransaction.commit();
            entityTransaction.begin();

            entityManager.clear();
        }

        Post post = new Post(
            String.format("Post %d", i + 1)
        );

        entityManager.persist(post);
    }

    entityTransaction.commit();
} catch (RuntimeException e) {
    if (entityTransaction.isActive()) {
        entityTransaction.rollback();
    }
    throw e;
} finally {
    entityManager.close();
}
I use OpenJPA 2.2.0 on WebSphere Application Server 8 with a MySQL 5.0 DB.
I have a list of objects which I want to merge into the DB.
It's like:
for (Object ob : list) {
    Long start = Calendar.getInstance().getTimeInMillis();

    em = factory.createEntityManager();
    em.getTransaction().begin();

    em.merge(ob);

    em.getTransaction().commit();
    em.close();

    Long end = Calendar.getInstance().getTimeInMillis();
    Long diff = end - start;
    LOGGER.info("Time: " + diff);
}
When I run this loop, it takes about 300-600 milliseconds to merge one object. When I delete the line em.merge(ob);, it takes "0" milliseconds to iterate over one list element.
So my question is: What can I do to improve the time to merge one object?
Thanks!
You can try starting the transaction before the iteration and committing it afterwards, so everything happens within a single transaction. Basically you are creating a batch which is merged/persisted on commit.
You can also limit the number of objects processed in a batch at a time and explicitly flush the changes to the database.
Currently you are beginning a transaction and committing it in each iteration, and also creating/closing the entity manager each time, which hurts performance when there is a lot of data.
It will be something like the code below.
em = factory.createEntityManager();
em.getTransaction().begin();

int i = 0;
for (Object ob : list) {
    Long start = Calendar.getInstance().getTimeInMillis();

    em.merge(ob);

    Long end = Calendar.getInstance().getTimeInMillis();
    Long diff = end - start;
    LOGGER.info("Time: " + diff);

    /* BATCH_SIZE is the number of entities
       that will be persisted/merged at once */
    if (i % BATCH_SIZE == 0) {
        em.flush();
        em.clear();
    }
    i++;
}

em.getTransaction().commit();
em.close();
Here you can also roll back the whole transaction if any of the objects fails to persist/merge, as sketched below.
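A rough sketch of that, using the same names as above (BATCH_SIZE is still a placeholder constant):
em = factory.createEntityManager();
try {
    em.getTransaction().begin();
    int i = 0;
    for (Object ob : list) {
        em.merge(ob);
        if (++i % BATCH_SIZE == 0) {
            em.flush();
            em.clear();
        }
    }
    em.getTransaction().commit();
} catch (RuntimeException e) {
    if (em.getTransaction().isActive()) {
        em.getTransaction().rollback(); // undo the whole batch if anything failed
    }
    throw e;
} finally {
    em.close();
}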
Given the following, I am trying to force the child collection (countryData) to be loaded when I perform the query. This works; however, I end up with duplicates of the Bin records being loaded.
public Collection<Bin> getBinsByPromotion(String season, String promotion) {
    final Session session = sessionFactory.getCurrentSession();
    try {
        session.beginTransaction();
        return (List<Bin>) session.createCriteria(Bin.class).
                setFetchMode("countryData", FetchMode.JOIN).
                add(Restrictions.eq("key.seasonCode", season)).
                add(Restrictions.eq("key.promotionCode", promotion)).
                add(Restrictions.ne("status", "closed")).
                list();
    } finally {
        session.getTransaction().commit();
    }
}
I don't want the default (lazy) behavior, as the query will return ~8k records and would thus send ~16k additional queries to fetch the child records. If nothing else, I'd prefer:
select ... from bins b where b.seasonCode = ?
and b.promotionCode = ?
and b.status <> 'Closed';
select ... from binCountry bc where bc.seasonCode = ?
and bc.promotionCode = ?;
You can use CriteriaSpecification.DISTINCT_ROOT_ENTITY:
criteria.setResultTransformer(CriteriaSpecification.DISTINCT_ROOT_ENTITY);
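Applied to the query from the question, it would look roughly like this (unchanged apart from the added result transformer):
return (List<Bin>) session.createCriteria(Bin.class).
        setFetchMode("countryData", FetchMode.JOIN).
        add(Restrictions.eq("key.seasonCode", season)).
        add(Restrictions.eq("key.promotionCode", promotion)).
        add(Restrictions.ne("status", "closed")).
        // collapse the join fan-out back to one row per Bin
        setResultTransformer(CriteriaSpecification.DISTINCT_ROOT_ENTITY).
        list();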