Batch insert using Spring Data - Java

I have 60K records to insert. I want to commit the records in batches of 100. Below is my code:
for (int i = 0; i < 60000; i++) {
    entityRepo.save(entity);
    if (i % 100 == 0) {
        entityManager.flush();
        entityManager.clear();
        LOG.info("Committed = " + i);
    }
}
entityManager.flush();
entityManager.clear();
I keep checking the database whenever the log line appears, but I don't see the records getting committed. What am I missing?

It is not enough to call flush() and clear(): flush() only pushes the pending SQL to the database inside the still-open transaction, and clear() only detaches the entities. You need a reference to the Transaction and must call .commit() on it (from the reference guide):
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for (int i = 0; i < 100000; i++) {
    Customer customer = new Customer(.....);
    session.save(customer);
}
tx.commit();
session.close();
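Since the question uses Spring Data rather than the plain Hibernate API, here is a minimal sketch of the same per-batch-commit idea using Spring's TransactionTemplate. It assumes an injected PlatformTransactionManager plus the repository and EntityManager from the question, and that the 60K records sit in a list called entities (that name is illustrative):
TransactionTemplate txTemplate = new TransactionTemplate(transactionManager);
int batchSize = 100;
for (int start = 0; start < entities.size(); start += batchSize) {
    int from = start;
    int to = Math.min(from + batchSize, entities.size());
    // each executeWithoutResult(...) call runs in its own transaction, so the
    // batch is committed, and visible in the database, when the callback returns
    txTemplate.executeWithoutResult(status -> {
        for (int i = from; i < to; i++) {
            entityRepo.save(entities.get(i));
        }
        entityManager.flush();
        entityManager.clear();
    });
    LOG.info("Committed through record " + to);
}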

There are two ways to do this. One is to define the transaction declaratively and call the transactional method from an external (parent) method:
Parent:
List<Domain> domainList = new ArrayList<>();
for (int i = 0; i < 60000; i++) {
    domainList.add(domain);
    if (domainList.size() == 100) {
        child.saveAll(domainList);
        domainList.clear();
    }
}
// don't forget the last partial batch
if (!domainList.isEmpty()) {
    child.saveAll(domainList);
}
Child (a separate Spring bean, so the proxy-based @Transactional actually applies):
@Transactional
public void saveAll(List<Domain> domainList) {
    // each call runs in, and commits, its own transaction,
    // e.g. domainRepo.saveAll(domainList);
}
This invokes the declarative method at regular intervals, as driven by the parent. The other way is to manually begin and commit the transaction (and close the session) yourself, as sketched below.
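A rough sketch of the manual variant, assuming an injected PlatformTransactionManager (txManager) and the same per-batch chunking as above; domainRepo is an illustrative repository name:
DefaultTransactionDefinition def = new DefaultTransactionDefinition();
TransactionStatus status = txManager.getTransaction(def);
try {
    domainRepo.saveAll(domainList);   // one batch of up to 100 records
    txManager.commit(status);         // the batch becomes visible here
} catch (RuntimeException e) {
    txManager.rollback(status);
    throw e;
}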

Related

Java JDBC bugs with transaction and losing data

I have two bugs that have been happening rarely over the last 3 years.
Out of ~100 orders per day, 1-2 orders raise an alert saying the counter was not incremented, but when I check the DB manually it actually has been incremented.
Out of ~3000 orders per month, 3-5 orders raise an alert that the lock was not released after order completion, yet when I check the DB manually the lock column is not null when it should be null.
I am using only JdbcTemplate and TransactionTemplate (select, update, read). I use JPA only when inserting a model into MySQL.
Everything is executed under a lock by a single thread.
Code snippet to show the issue:
public synchronized void test() {
    long payment = 999;
    long bought_times_before = jdbcTemplate.queryForObject(
            "select bought_times from user where id = ?", new Object[]{1}, Long.class);
    TransactionTemplate tmpl = new TransactionTemplate(txManager);
    tmpl.setTimeout(300);
    tmpl.setName("p:" + payment);
    tmpl.executeWithoutResult(status -> {
        jdbcTemplate.update("update orders set attempts_to_verify = attempts_to_verify + 1, transaction_value = null where id = ?", payment);
        jdbcTemplate.update("update orders set locked = null where id = ?", payment);
        jdbcTemplate.update("update user set bought_times = bought_times + 1 where id = 1");
    });
    long bought_times_after = jdbcTemplate.queryForObject(
            "select bought_times from user where id = ?", new Object[]{1}, Long.class);
    if (bought_times_after <= bought_times_before) {
        log.error("bought_times_after <= bought_times_before");
    }
}
I upgraded MySQL and implemented a Redis distributed lock so that only one thread at a time runs the select-transaction-select code.
UPDATE:
The default isolation level is READ COMMITTED.
I tried SERIALIZABLE, but the same bug still occurs.
UPDATE 2:
Regarding lock != null after the transaction: it is somehow related to high load on MySQL, since it never occurs under low load.
UPDATE 3:
I checked the MySQL logs: nothing, no errors.
I also tried REQUIRES_NEW + SERIALIZABLE but got deadlocks.
UPDATE 4:
I wrote a test and cannot reproduce the issue. On production there is more than one transaction, as well as more updates and reads, so I suspected a hardware issue or a MySQL bug.
@PostConstruct
public void test() {
    jdbcTemplate.execute("CREATE TEMPORARY TABLE IF NOT EXISTS TEST ( id int, name int, locked boolean )");
    jdbcTemplate.execute("insert into TEST values(1, 1, 1);");
    for (int i = 0; i < 100000; i++) {
        long prev = jdbcTemplate.queryForObject("select name from TEST where id = 1", Long.class);
        TransactionTemplate tmpl = new TransactionTemplate(txManager);
        jdbcTemplate.update("update TEST set locked = true where id = 1;");
        tmpl.execute(new TransactionCallbackWithoutResult() {
            @SneakyThrows
            @Override
            protected void doInTransactionWithoutResult(org.springframework.transaction.TransactionStatus status) {
                jdbcTemplate.update("update TEST set name = name + 1 where id = 1;");
                jdbcTemplate.update("update TEST set locked = false where id = 1;");
            }
        });
        long curr = jdbcTemplate.queryForObject("select name from TEST where id = 1", Long.class);
        boolean lock = jdbcTemplate.queryForObject("select locked from TEST where id = 1", Boolean.class);
        if (curr <= prev) {
            log.error("curr <= prev");
        }
        if (lock) {
            log.error("lock = true");
        }
    }
}
UPDATE 5: WAS ABLE TO REPRODUCE IT!!!!
@PostConstruct
public void test() {
    jdbcTemplate.execute("CREATE TEMPORARY TABLE IF NOT EXISTS TEST ( id int, name int, locked boolean )");
    jdbcTemplate.execute("insert into TEST values(1, 1, 1);");
    ExecutorService executorService = Executors.newFixedThreadPool(100);
    for (int i = 0; i < 100000; i++) {
        executorService.submit(() -> {
            RLock rLock = redissonClient.getFairLock("lock");
            try {
                rLock.lock(120, TimeUnit.SECONDS);
                long prev = jdbcTemplate.queryForObject("select name from TEST where id = 1", Long.class);
                TransactionTemplate tmpl = new TransactionTemplate(txManager);
                jdbcTemplate.update("update TEST set locked = true where id = 1;");
                tmpl.execute(new TransactionCallbackWithoutResult() {
                    @SneakyThrows
                    @Override
                    protected void doInTransactionWithoutResult(org.springframework.transaction.TransactionStatus status) {
                        jdbcTemplate.update("update TEST set name = name + 1 where id = 1;");
                        jdbcTemplate.update("update TEST set locked = false where id = 1;");
                    }
                });
                long curr = jdbcTemplate.queryForObject("select name from TEST where id = 1", Long.class);
                boolean lock = jdbcTemplate.queryForObject("select locked from TEST where id = 1", Boolean.class);
                if (curr <= prev) {
                    log.error("curr <= prev");
                }
                if (lock) {
                    log.error("lock = true");
                }
            } finally {
                rLock.unlock();
            }
        });
    }
}
UPDATE 7: After the second and third runs I cannot reproduce it again, neither with Lock nor with FairLock.
UPDATE 8: On production I am using 3 Redis locks with 120-second timeouts, so I think a timeout occasionally occurs on one of the three locks, and thus the code may end up being executed by two threads without the lock.
SOLUTION: Increase the lock timeout as well as the transaction timeout to 500 seconds (a sketch follows the updates).
UPDATE 9: It looks like the issue has been resolved, but I need to monitor it for a couple of weeks before closing the issue on Stack Overflow.
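A minimal sketch of that solution, reusing the names from the snippets above; the 500-second values are the ones stated in the SOLUTION, not tuned numbers:
RLock rLock = redissonClient.getFairLock("lock");
// lease time raised from 120 s so the lock cannot expire mid-transaction
rLock.lock(500, TimeUnit.SECONDS);
try {
    TransactionTemplate tmpl = new TransactionTemplate(txManager);
    tmpl.setTimeout(500);   // transaction timeout raised from 300 s to match
    tmpl.executeWithoutResult(status -> {
        jdbcTemplate.update("update orders set attempts_to_verify = attempts_to_verify + 1, transaction_value = null where id = ?", payment);
        jdbcTemplate.update("update orders set locked = null where id = ?", payment);
        jdbcTemplate.update("update user set bought_times = bought_times + 1 where id = 1");
    });
} finally {
    rLock.unlock();
}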

How does pessimistic locking work in Hibernate?

I'm currently using Hibernate 6 and H2. I want to safely increment the count field of an Entity class, using more than one thread at a time just to make sure the transaction is actually locking my entity. But when I run this code, the resulting count column in H2 isn't 10 but some random number under 10. What am I missing about pessimistic locking?
for (int a = 0; a < 5; a++) {
    executorService.execute(() -> {
        Session innerSession = sessionFactory.openSession();
        Transaction innerTransaction = innerSession.beginTransaction();
        Entity entity = innerSession.get(Entity.class, id, LockMode.PESSIMISTIC_WRITE);
        entity.setCount(entity.getCount() + 1);
        innerSession.flush();
        innerTransaction.commit();
        innerSession.close();
    });
    executorService.execute(() -> {
        Session innerSession = sessionFactory.openSession();
        Transaction innerTransaction = innerSession.beginTransaction();
        Entity entity = innerSession.get(Entity.class, id, LockMode.PESSIMISTIC_WRITE);
        entity.setCount(entity.getCount() + 1);
        innerSession.flush();
        innerTransaction.commit();
        innerSession.close();
    });
}
Entire method:
Long id;
SessionFactory sessionFactory;
Session session;
Transaction transaction;
ExecutorService executorService = Executors.newFixedThreadPool(4);

Properties properties = new Properties();
Configuration configuration = new Configuration();
properties.put(AvailableSettings.URL, "jdbc:h2:tcp://localhost/~/test");
properties.put(AvailableSettings.USER, "root");
properties.put(AvailableSettings.PASS, "root");
properties.put(AvailableSettings.DIALECT, H2Dialect.class.getName());
properties.put(AvailableSettings.SHOW_SQL, true);
properties.put(AvailableSettings.HBM2DDL_AUTO, Action.CREATE.getExternalHbm2ddlName());
// classes are provided by another library
entityClasses.forEach(configuration::addAnnotatedClass);
sessionFactory = configuration.buildSessionFactory(new StandardServiceRegistryBuilder().applySettings(properties).build());

session = sessionFactory.openSession();
transaction = session.beginTransaction();
// initial value of count field is 0
id = (Long) session.save(new Entity());
transaction.commit();

for (int a = 0; a < 5; a++) {
    executorService.execute(() -> {
        Session innerSession = sessionFactory.openSession();
        Transaction innerTransaction = innerSession.beginTransaction();
        Entity entity = innerSession.get(Entity.class, id, LockMode.PESSIMISTIC_WRITE);
        entity.setCount(entity.getCount() + 1);
        innerSession.flush();
        innerTransaction.commit();
        innerSession.close();
    });
    executorService.execute(() -> {
        Session innerSession = sessionFactory.openSession();
        Transaction innerTransaction = innerSession.beginTransaction();
        Entity entity = innerSession.get(Entity.class, id, LockMode.PESSIMISTIC_WRITE);
        entity.setCount(entity.getCount() + 1);
        innerSession.flush();
        innerTransaction.commit();
        innerSession.close();
    });
}

executorService.shutdown();
executorService.awaitTermination(5, TimeUnit.SECONDS);
session.clear(); // prevent reading from cache
System.out.println(session.get(Entity.class, id).getCount()); // printed result doesn't match 10, same for reading from H2 browser interface
session.close();
The answer was simple: I just needed to upgrade Hibernate to 6.0.0.Alpha9 (higher versions require Java 11 to compile, and I'm using Java 8). It seems it was a bug in 6.0.0.Alpha6, which I used previously; there was no problem with H2 1.4.200. From the Hibernate SQL logs I understood that the main problem in 6.0.0.Alpha6 was an incorrect select query for a transaction with a pessimistic lock: it issued just a regular select, whereas 6.0.0.Alpha9 uses select ... for update, which blocks other transactions from taking the lock on that row until the first transaction commits.
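As an aside, a version-independent way to make the increment safe, regardless of how a given Hibernate build implements pessimistic locks, is to push the read-modify-write into a single UPDATE so the database serializes it. A sketch, not the poster's fix (newer Hibernate 6 releases prefer createMutationQuery over createQuery for mutations):
Session s = sessionFactory.openSession();
Transaction tx = s.beginTransaction();
// one atomic statement; no entity has to be loaded or locked in memory
s.createQuery("update Entity e set e.count = e.count + 1 where e.id = :id")
        .setParameter("id", id)
        .executeUpdate();
tx.commit();
s.close();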

java transaction management - propagation

@Transactional(propagation = Propagation.REQUIRES_NEW)
public void jasperToInvoice1(BigDecimal generateNo) throws ClassNotFoundException, SQLException, JRException {
    List<TInvoiceSummary> invoices = invoiceSummaryRepository.findListSummary(generateNo);
    Connection connection = dataSource.getConnection();
    if (invoices.size() > 0) {
        int i = 1;
        for (TInvoiceSummary invoice : invoices) {
            jasperConvert.jInvoicePdf(invoice, connection);
            invoice.setIsPdf(true);
            em.merge(invoice);
            if (i % batchSize == 0) {
                em.flush();
                em.clear();
            }
            i++;
        }
    }
}
I want to ask something about Spring transactional propagation. Let's say I want to insert 10,000 rows into my DB, flushing every 100 records. Is it possible to select the already-processed records while the process is still running, and how can I do that? Please let me know, or suggest an alternative way to do something like this (a sketch of one option follows below).
Thanks
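No answer is recorded here, but one hedged sketch consistent with the chunked-commit pattern in the answers above: flushing alone never makes rows visible, because the whole method runs in a single REQUIRES_NEW transaction; committing each chunk in its own transaction does. Spring only honors @Transactional across bean boundaries, so the chunk method has to live on a separate bean. chunkWorker and processChunk are illustrative names:
// driver method, deliberately not transactional itself
public void jasperToInvoices(BigDecimal generateNo) throws Exception {
    List<TInvoiceSummary> invoices = invoiceSummaryRepository.findListSummary(generateNo);
    for (int from = 0; from < invoices.size(); from += batchSize) {
        int to = Math.min(from + batchSize, invoices.size());
        // commits when it returns, so these rows become selectable immediately
        chunkWorker.processChunk(invoices.subList(from, to));
    }
}

// on a separate Spring bean:
@Transactional(propagation = Propagation.REQUIRES_NEW)
public void processChunk(List<TInvoiceSummary> chunk) throws Exception {
    Connection connection = dataSource.getConnection();
    for (TInvoiceSummary invoice : chunk) {
        jasperConvert.jInvoicePdf(invoice, connection);
        invoice.setIsPdf(true);
        em.merge(invoice);
    }
}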

What is the use of Hibernate batch processing

I am new to Hibernate and I have a doubt about Hibernate batch processing. I read a tutorial on batch processing which said:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for (int i = 0; i < 100000; i++) {
    Employee employee = new Employee(.....);
    session.save(employee);
}
tx.commit();
session.close();
Hibernate will cache all the persisted objects in the session-level cache, and ultimately your application will fall over with an OutOfMemoryException somewhere around the 50,000th row. You can resolve this problem by using batch processing with Hibernate, like:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for (int i = 0; i < 100000; i++) {
    Employee employee = new Employee(.....);
    session.save(employee);
    if (i % 50 == 0) { // Same as the JDBC batch size
        // flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}
tx.commit();
session.close();
My doubt is: instead of initializing the session outside, why can't we initialize it inside the for loop, like:
Session session = null;
Transaction tx = session.beginTransaction();
for (int i = 0; i < 100000; i++) {
    session = sessionFactory.openSession();
    Employee employee = new Employee(.....);
    session.save(employee);
}
tx.commit();
session.close();
Is this the correct way, or can anyone suggest a better one?
No. Don't initialize the session in the for loop; every time you open a new session you start a new batch, so your way has a batch size of one, i.e. it does not batch at all. It would also be much slower your way. That is why the first example has
if (i % 50 == 0) {
    // flush a batch of inserts and release memory:
    session.flush();
    session.clear();
}
which is what "flush a batch of inserts and release memory" refers to.
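One detail worth adding: the "Same as the JDBC batch size" comment refers to Hibernate's JDBC statement batching, which is off by default. flush()/clear() only manages memory; to have Hibernate actually group the inserts into JDBC batches you also need to set the batch size, shown here programmatically as a sketch (the same keys work in hibernate.cfg.xml or persistence.xml):
Configuration configuration = new Configuration();
// let Hibernate send inserts to the driver in groups of 50,
// matching the flush interval used in the loop above
configuration.setProperty("hibernate.jdbc.batch_size", "50");
// optional: order inserts by entity so batches aren't broken up
configuration.setProperty("hibernate.order_inserts", "true");
SessionFactory sessionFactory = configuration.buildSessionFactory();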
Batch processing in Hibernate means dividing a huge task into a number of smaller tasks.
When you fire session.save(obj), Hibernate actually caches that object in its memory (the object is not yet written to the database), and saves it to the database when you commit your transaction, i.e. when you call transaction.commit().
Let's say you have millions of records to insert; firing session.save(obj) for all of them would consume a lot of memory and eventually result in an OutOfMemoryException.
Solution:
Create batches of a smaller size and save them to the database.
if (i % 50 == 0) {
    // flush a batch of inserts and release memory:
    session.flush();
    session.clear();
}
Note:
In the code above, session.flush() actually writes the objects to the database, and session.clear() frees the memory occupied by those objects, for each batch of size 50.
Batch processing allows you to optimize writing data.
However, the usual advice of flushing and clearing the Hibernate Session is incomplete.
You need to commit the transaction at the end of each batch, both to avoid long-running transactions, which can hurt performance, and because if the last item fails, undoing all the changes would put a lot of pressure on the DB.
Therefore, this is how you should do batch processing:
int entityCount = 50;
int batchSize = 25;

EntityManager entityManager = entityManagerFactory().createEntityManager();
EntityTransaction entityTransaction = entityManager.getTransaction();

try {
    entityTransaction.begin();
    for (int i = 0; i < entityCount; i++) {
        if (i > 0 && i % batchSize == 0) {
            entityTransaction.commit();
            entityTransaction.begin();
            entityManager.clear();
        }
        Post post = new Post(
            String.format("Post %d", i + 1)
        );
        entityManager.persist(post);
    }
    entityTransaction.commit();
} catch (RuntimeException e) {
    if (entityTransaction.isActive()) {
        entityTransaction.rollback();
    }
    throw e;
} finally {
    entityManager.close();
}

OpenJPA merging/persisting is very slow

I use OpenJPA 2.2.0 on WebSphere Application Server 8 with a MySQL 5.0 DB.
I have a list of objects which I want to merge into the DB. It goes like this:
for (Object ob : list) {
    Long start = Calendar.getInstance().getTimeInMillis();
    em = factory.createEntityManager();
    em.getTransaction().begin();
    em.merge(ob);
    em.getTransaction().commit();
    em.close();
    Long end = Calendar.getInstance().getTimeInMillis();
    Long diff = end - start;
    LOGGER.info("Time: " + diff);
}
When I run this loop, merging one object takes about 300-600 milliseconds. When I delete the line em.merge(ob);, iterating over one list element takes "0" milliseconds.
So my question is: what can I do to reduce the time it takes to merge one object?
Thanks!
You can try starting the transaction before the iteration and committing it afterwards, so everything happens within a single transaction. Basically, you are creating a batch which is merged/persisted on commit.
You can also limit the number of objects in a batch to be processed at a time and explicitly flush the changes to the database.
Currently you are initiating a transaction and committing it in each iteration, and also creating/closing the entity manager each time; that will hurt performance for a large amount of data.
It will look something like the code below.
em = factory.createEntityManager();
em.getTransaction().begin();
int i = 0;
for (Object ob : list) {
    Long start = Calendar.getInstance().getTimeInMillis();
    em.merge(ob);
    Long end = Calendar.getInstance().getTimeInMillis();
    Long diff = end - start;
    LOGGER.info("Time: " + diff);
    i++;
    /* BATCH_SIZE is the number of entities
       that will be persisted/merged at once */
    if (i % BATCH_SIZE == 0) {
        em.flush();
        em.clear();
    }
}
em.getTransaction().commit();
em.close();
Here, you can also roll back the whole transaction if any of the objects fails to persist/merge.
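Beyond batching the merges, two configuration knobs may help on this stack; both are hedged suggestions to verify against your OpenJPA and MySQL driver versions, not confirmed fixes: OpenJPA's statement batching (the batchLimit setting on the DBDictionary) and Connector/J's rewriteBatchedStatements, which collapses a JDBC batch into one multi-row INSERT on the wire. A sketch (the persistence-unit name myUnit is illustrative):
Map<String, Object> props = new HashMap<>();
// let OpenJPA batch up to 100 statements per round trip (verify for your version)
props.put("openjpa.jdbc.DBDictionary", "mysql(batchLimit=100)");
// let the MySQL driver rewrite batches into multi-row inserts
props.put("openjpa.ConnectionURL",
        "jdbc:mysql://localhost:3306/mydb?rewriteBatchedStatements=true");
EntityManagerFactory factory = Persistence.createEntityManagerFactory("myUnit", props);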
