Spring - How to commit inside forEach

I have the following code:
private void sendList(List<Data> myData) {
    myData.forEach(x -> {
        sendData(x);
    });
}
@Transactional
private void sendData(Data myData) {
    // make some changes to the myData object and insert it into a table
}
Currently, it commits after the insert is complete for every Data object.
But, I would like to commit every 500 records.
Is it possible to do this?

First, you don't need a loop to save the data; use the repository's saveAll method instead.
Second, you should enable JDBC batching, which is switched off by default. To do so, add these parameters to application.properties:
spring.jpa.properties.hibernate.jdbc.batch_size=500
spring.jpa.properties.hibernate.order_inserts=true
The first property tells Hibernate to collect inserts in batches of 500. The order_inserts property tells Hibernate to take the time to group inserts by entity, which creates larger batches.
Source
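Batching changes how statements are sent to the database, not how often the transaction commits. If a commit every 500 records is really what is needed, one option is to split the list into chunks and run each chunk in its own transaction. The following is a minimal sketch in plain Java, not the answerer's code: partition and sendInChunks are hypothetical helper names, and the Consumer stands in for a @Transactional method on a separate bean (assumed here so that each call commits on its own).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class BatchCommitSketch {

    // Splits the input into consecutive chunks of at most chunkSize elements.
    static <T> List<List<T>> partition(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < items.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, items.size());
            chunks.add(new ArrayList<>(items.subList(from, to)));
        }
        return chunks;
    }

    // Hands each chunk to saveChunk and returns the number of chunks.
    // Assumption: in a Spring application saveChunk would delegate to a
    // @Transactional method on another bean, giving one commit per chunk.
    static <T> int sendInChunks(List<T> items, int chunkSize, Consumer<List<T>> saveChunk) {
        List<List<T>> chunks = partition(items, chunkSize);
        chunks.forEach(saveChunk);
        return chunks.size();
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1200; i++) {
            data.add(i);
        }
        int commits = sendInChunks(data, 500, chunk ->
                System.out.println("would commit " + chunk.size() + " records"));
        System.out.println(commits + " chunks");
    }
}
```

With 1200 records and a chunk size of 500 this produces three chunks (500, 500 and 200 records). If one chunk fails, the chunks committed before it stay in the database, which may or may not be acceptable for the use case.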

Related

Cannot save after deleting entity in Hibernate

I want to delete all records with a given lineId and then save other records with the same lineId (as a refresh), but after deleting I can't save anything. There isn't any error, but my record is not in the database.
When I remove my deleting code, everything saves correctly.
public void deleteAndSaveEntities(List<Entity> entities, Long lineId) {
    deleteEntities(lineId);
    saveEntities(entities);
}
private void deleteEntities(Long lineId) {
    List<Entity> entitiesToDelete = entityRepository.findAllByLineId(lineId);
    entityRepository.deleteAll(entitiesToDelete);
}
private void saveEntities(List<Entity> entities) {
    entityRepository.saveAll(entities);
}

What you actually want is to update the entries that have the lineId. Try it like this:
First fetch them with find..().
Make the related changes to those entries.
Then save them.

As thomas mentioned, Hibernate reorders the statements within the transaction for performance reasons and executes the delete after the insert or update.
I would commit the transaction between these two operations.
Add @Transactional to deleteEntities and saveEntities.
But be aware that @Transactional does not take effect when the method is invoked from within the same object.
You must inject the service into itself and then call the methods on the self reference.
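Why calls within the same object skip @Transactional can be illustrated without Spring. The sketch below assumes Spring's default proxy-based mode, where the annotation is applied by a proxy that wraps the bean. The JDK dynamic proxy is only a stand-in for that transactional proxy, and EntityService, EntityServiceImpl and proxyFor are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

class SelfInvocationSketch {

    interface EntityService {
        void deleteAndSave();
        void deleteEntities();
        void saveEntities();
    }

    static class EntityServiceImpl implements EntityService {
        public void deleteAndSave() {
            // Plain calls on "this": they never pass through the proxy.
            deleteEntities();
            saveEntities();
        }
        public void deleteEntities() { }
        public void saveEntities() { }
    }

    // Wraps target in a JDK proxy that records the name of every method it
    // intercepts; this is where a transactional proxy would open a transaction.
    static EntityService proxyFor(EntityService target, List<String> intercepted) {
        InvocationHandler handler = (proxy, method, methodArgs) -> {
            intercepted.add(method.getName());
            return method.invoke(target, methodArgs);
        };
        return (EntityService) Proxy.newProxyInstance(
                EntityService.class.getClassLoader(),
                new Class<?>[] {EntityService.class},
                handler);
    }

    public static void main(String[] args) {
        List<String> intercepted = new ArrayList<>();
        EntityService service = proxyFor(new EntityServiceImpl(), intercepted);
        service.deleteAndSave();
        System.out.println(intercepted); // prints [deleteAndSave]
    }
}
```

Only the outer call is intercepted. The two inner calls would run inside whatever transaction the outer call opened, or in none at all, which is why the answer suggests calling them through an injected reference to the proxied bean.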

How to Rollback the entire process when database error occurs within the loop?

Consider two microservices (1 and 2). The first service retrieves data from the user and sends it to the second service, which stores it in the database. This takes place in a loop.
For example, suppose the loop runs 5 times. On each iteration it calls the REST service and saves the data to the database. Suppose that on the 3rd iteration service 2 throws an error and execution stops, so the remaining data is not stored in the database. Is there any way to remove the two records that were already saved successfully without writing a separate function that deletes them individually?
Is there any way to roll back the entire process when an error occurs within the loop?
Microservice-1
class MicroService1 {
    @Autowired
    private RestTemplate restTemplate;
    void rollBack() {
        String[] name = {"hello", "hai", "hey"};
        for (String n : name) {
            MyObject object = new MyObject();
            object.setName(n);
            try {
                restTemplate.postForEntity("micorservice-2", object, String.class);
            } catch (Exception ex) {
                .....
            }
        }
    }
}
Microservice-2
class MicroService2 {
    @Autowired
    private MyRepository repo;
    // The RestTemplate call arrives at this method, which saves the data to the database
    void saveData(MyObject obj) {
        try {
            repo.save(obj);
        } catch (Exception ex) {
            ....
        }
    }
}

Every external REST call is technically independent of the others, and every request starts a new transaction in MicroService2. These transactions are also independent and unaware of each other.
Your business transaction (saving entities in a loop) spans multiple technical transactions.
"Is there any way I can remove the other two data which got saved to Database successfully without writing a separate function for removing the data individually?" No, there is no out-of-the-box solution. You have to write custom rollback logic that deletes the entities saved by the previous transactions.
If your use case allows it, you could wait for the loop in Microservice1 to end and then make a single external REST call to Microservice2 whose payload is the list of entities, i.e. a bulk save. This way either all entities are saved or none are, and if the save fails in Microservice2 you can also roll back Microservice1.
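The custom rollback logic mentioned above could look roughly like the following plain-Java sketch. It is illustrative only: RemoteStore is a hypothetical stand-in for the REST client of Microservice-2, it assumes that service also exposes a delete operation, and InMemoryStore exists only to make the example runnable.

```java
import java.util.ArrayList;
import java.util.List;

class CompensationSketch {

    // Hypothetical stand-in for the REST client that talks to Microservice-2.
    interface RemoteStore {
        String save(String name);   // returns the id of the stored record
        void delete(String id);     // compensating action for an earlier save
    }

    // In-memory implementation used only for the demo; it rejects empty names
    // to simulate a database error and uses the name itself as the id.
    static class InMemoryStore implements RemoteStore {
        final List<String> rows = new ArrayList<>();

        public String save(String name) {
            if (name.isEmpty()) {
                throw new IllegalArgumentException("empty name");
            }
            rows.add(name);
            return name;
        }

        public void delete(String id) {
            rows.remove(id);
        }
    }

    // Saves every name; if one save fails, deletes the records saved so far
    // (newest first) and rethrows the original failure.
    static List<String> saveAllOrCompensate(List<String> names, RemoteStore store) {
        List<String> savedIds = new ArrayList<>();
        try {
            for (String name : names) {
                savedIds.add(store.save(name));
            }
        } catch (RuntimeException failure) {
            for (int i = savedIds.size() - 1; i >= 0; i--) {
                store.delete(savedIds.get(i));
            }
            throw failure;
        }
        return savedIds;
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        try {
            saveAllOrCompensate(List.of("hello", "hai", ""), store);
        } catch (IllegalArgumentException e) {
            System.out.println("failed, rows left: " + store.rows.size());
        }
    }
}
```

Unlike a real database rollback, a compensating delete can itself fail, and other clients can see the intermediate records before they are removed. That is one reason the single bulk call is the simpler option when the use case allows it.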

Try using "Transactions" or the "Saga" pattern: https://microservices.io/patterns/data/saga.html

Spring Boot JPARepository performance on save()

I have an issue where my spring boot application performance is very slow when inserting data.
I am extracting a large subset of data from one database and inserting the data into another database.
The following is my entity.
@Entity
@Table(name = "element")
public class VXMLElementHistorical {
    @Id
    @Column(name = "elementid")
    private long elementid;
    @Column(name = "elementname")
    private String elementname;
    // getter/setter methods...
}
I have configured a JPA repository
public interface ElementRepository extends JpaRepository<Element, Long> {
}
and call the save() method with my object
@Transactional
public void processData(List<sElement> hostElements)
        throws DataAccessException {
    List<Element> elements = new ArrayList<Element>();
    for (int i = 0; i < hostElements.size(); i++) {
        Element element = new Element();
        element.setElementid(hostElements.get(i).getElementid());
        element.setElementname(hostElements.get(i).getElementname());
        elements.add(element);
    }
    try {
        elementRepository.save(elements);
    }
    // catch etc...
}
What happens is that each item takes between 6 and 12 seconds to insert. I have turned on Hibernate trace logging and statistics, and they show that when I call the save function Hibernate performs two queries, a select and an insert. The select query takes 99% of the overall time.
I have run the select query directly on the database and the result returns in nanoseconds, which leads me to believe it is not an indexing issue; however, I am no DBA.
I have created a load test in my dev environment, and with similar load sizes the overall processing time is nowhere near as long as in my prod environment.
Any suggestions?

Instead of creating a list of elements and saving those, save the individual elements. Every now and then do a flush and clear to prevent dirty checking from becoming a bottleneck.
@PersistenceContext
private EntityManager entityManager;
@Transactional
public void processData(List<sElement> hostElements)
        throws DataAccessException {
    for (int i = 0; i < hostElements.size(); i++) {
        Element element = new Element();
        element.setElementid(hostElements.get(i).getElementid());
        element.setElementname(hostElements.get(i).getElementname());
        elementRepository.save(element);
        // flush and clear after every 50th element
        if ((i + 1) % 50 == 0) {
            entityManager.flush();
            entityManager.clear();
        }
    }
    entityManager.flush(); // flush the last records
}
You want to flush and clear every x elements (here it is 50, but you might want to find your own best number).
Now as you are using Spring Boot you also might want to add some additional properties. Like configuring the batch-size.
spring.jpa.properties.hibernate.jdbc.batch_size=50
If your JDBC driver supports it, this sends the 50 single insert statements to the database as one JDBC batch, i.e. one round trip instead of 50.
See also https://vladmihalcea.com/how-to-batch-insert-and-update-statements-with-hibernate/

As @M. Deinum said in a comment, you can improve performance by calling flush() and clear() after a certain number of inserts, as shown below.
int i = 0;
for (Element element : elements) {
    dao.save(element);
    if (++i % 20 == 0) {
        dao.flushAndClear();
    }
}

Since loading the entities seems to be the bottleneck and you really just want to do inserts (i.e. you know the entities don't exist in the database), you probably shouldn't use the standard save method of Spring Data JPA.
The reason is that it performs a merge, which triggers Hibernate to load an entity that might already exist in the database.
Instead, add a custom method to your repository that calls persist on the entity manager. Since you are setting the Id in advance, make sure you have a version property so that Hibernate can determine that this is indeed a new entity.
This should make the select go away.
The advice given in other answers is worth considering as a second step:
Enable batching.
Experiment with intermediate flushing and clearing of the session.
Save one instance at a time without gathering them in a collection, since the call to merge or persist doesn't actually trigger writing to the database; only the flush does (this is a simplification, but it will do in this context).

Orphan removal in JPA 2 based on only one plain string

I have the following situation:
class Container {
    ...
    String key;
    ...
}
class Item {
    String containerKey;
}
I require a mechanism to automatically delete all items "referencing" containers, something like cascading.
Is there such a mechanism in JPA 2?

No, you'll have to get them all and delete them, or execute a delete query:
delete from Item i where i.containerKey = :containerKey

This is not a JPA-based solution, but what I did was create a DB trigger, so that every time a record is deleted from the first table, the matching records in the second table are deleted as well.

Find or insert based on unique key with Hibernate

I'm trying to write a method that will return a Hibernate object based on a unique but non-primary key. If the entity already exists in the database I want to return it, but if it doesn't I want to create a new instance and save it before returning.
UPDATE: Let me clarify that the application I'm writing this for is basically a batch processor of input files. The system needs to read a file line by line and insert records into the db. The file format is basically a denormalized view of several tables in our schema, so what I have to do is parse out the parent record and either insert it into the db so I can get a new synthetic key or, if it already exists, select it. Then I can add additional associated records in other tables that have foreign keys back to that record.
The reason this gets tricky is that each file needs to be either totally imported or not imported at all, i.e. all inserts and updates done for a given file should be a part of one transaction. This is easy enough if there's only one process that's doing all the imports, but I'd like to break this up across multiple servers if possible. Because of these constraints I need to be able to stay inside one transaction, but handle the exceptions where a record already exists.
The mapped class for the parent records looks like this:
@Entity
public class Foo {
    @Id
    @GeneratedValue(strategy = IDENTITY)
    private int id;
    @Column(unique = true)
    private String name;
    ...
}
My initial attempt at writing this method is as follows:
public Foo findOrCreate(String name) {
    Foo foo = new Foo();
    foo.setName(name);
    try {
        session.save(foo);
    } catch (ConstraintViolationException e) {
        foo = session.createCriteria(Foo.class).add(eq("name", name)).uniqueResult();
    }
    return foo;
}
The problem is when the name I'm looking for exists, an org.hibernate.AssertionFailure exception is thrown by the call to uniqueResult(). The full stack trace is below:
org.hibernate.AssertionFailure: null id in com.searchdex.linktracer.domain.LinkingPage entry (don't flush the Session after an exception occurs)
at org.hibernate.event.def.DefaultFlushEntityEventListener.checkId(DefaultFlushEntityEventListener.java:82) [hibernate-core-3.6.0.Final.jar:3.6.0.Final]
at org.hibernate.event.def.DefaultFlushEntityEventListener.getValues(DefaultFlushEntityEventListener.java:190) [hibernate-core-3.6.0.Final.jar:3.6.0.Final]
at org.hibernate.event.def.DefaultFlushEntityEventListener.onFlushEntity(DefaultFlushEntityEventListener.java:147) [hibernate-core-3.6.0.Final.jar:3.6.0.Final]
at org.hibernate.event.def.AbstractFlushingEventListener.flushEntities(AbstractFlushingEventListener.java:219) [hibernate-core-3.6.0.Final.jar:3.6.0.Final]
at org.hibernate.event.def.AbstractFlushingEventListener.flushEverythingToExecutions(AbstractFlushingEventListener.java:99) [hibernate-core-3.6.0.Final.jar:3.6.0.Final]
at org.hibernate.event.def.DefaultAutoFlushEventListener.onAutoFlush(DefaultAutoFlushEventListener.java:58) [hibernate-core-3.6.0.Final.jar:3.6.0.Final]
at org.hibernate.impl.SessionImpl.autoFlushIfRequired(SessionImpl.java:1185) [hibernate-core-3.6.0.Final.jar:3.6.0.Final]
at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1709) [hibernate-core-3.6.0.Final.jar:3.6.0.Final]
at org.hibernate.impl.CriteriaImpl.list(CriteriaImpl.java:347) [hibernate-core-3.6.0.Final.jar:3.6.0.Final]
at org.hibernate.impl.CriteriaImpl.uniqueResult(CriteriaImpl.java:369) [hibernate-core-3.6.0.Final.jar:3.6.0.Final]
Does anyone know what is causing this exception to be thrown? Does hibernate support a better way of accomplishing this?
Let me also preemptively explain why I'm inserting first and then selecting if and when that fails. This needs to work in a distributed environment so I can't synchronize across the check to see if the record already exists and the insert. The easiest way to do this is to let the database handle this synchronization by checking for the constraint violation on every insert.

I had a similar batch processing requirement, with processes running on multiple JVMs. The approach I took for this was as follows. It is very much like jtahlborn's suggestion. However, as vbence pointed out, if you use a NESTED transaction, when you get the constraint violation exception, your session is invalidated. Instead, I use REQUIRES_NEW, which suspends the current transaction and creates a new, independent transaction. If the new transaction rolls back it will not affect the original transaction.
I am using Spring's TransactionTemplate but I'm sure you could easily translate it if you do not want a dependency on Spring.
public T findOrCreate(final T t) throws InvalidRecordException {
    // 1) look for the record
    T found = findUnique(t);
    if (found != null)
        return found;
    // 2) if not found, start a new, independent transaction
    TransactionTemplate tt = new TransactionTemplate((PlatformTransactionManager)
            transactionManager);
    tt.setPropagationBehavior(TransactionDefinition.PROPAGATION_REQUIRES_NEW);
    try {
        found = (T) tt.execute(new TransactionCallback<T>() {
            @Override
            public T doInTransaction(TransactionStatus status) {
                try {
                    // 3) store the record in this new transaction
                    return store(t);
                } catch (ConstraintViolationException e) {
                    // another thread or process created this already, possibly
                    // between 1) and 2)
                    status.setRollbackOnly();
                    return null;
                }
            }
        });
        // 4) if we failed to create the record in the second transaction, found will
        // still be null; however, this would happen only if another process
        // created the record. let's see what they made for us!
        if (found == null)
            found = findUnique(t);
    } catch (...) {
        // handle exceptions
    }
    return found;
}

You need to use UPSERT or MERGE to achieve this goal.
However, Hibernate does not offer support for this construct, so you need to use jOOQ instead.
private PostDetailsRecord upsertPostDetails(
        DSLContext sql, Long id, String owner, Timestamp timestamp) {
    sql
        .insertInto(POST_DETAILS)
        .columns(POST_DETAILS.ID, POST_DETAILS.CREATED_BY, POST_DETAILS.CREATED_ON)
        .values(id, owner, timestamp)
        .onDuplicateKeyIgnore()
        .execute();
    return sql.selectFrom(POST_DETAILS)
        .where(field(POST_DETAILS.ID).eq(id))
        .fetchOne();
}
Calling this method on PostgreSQL:
PostDetailsRecord postDetailsRecord = upsertPostDetails(
    sql,
    1L,
    "Alice",
    Timestamp.from(LocalDateTime.now().toInstant(ZoneOffset.UTC))
);
Yields the following SQL statements:
INSERT INTO "post_details" ("id", "created_by", "created_on")
VALUES (1, 'Alice', CAST('2016-08-11 12:56:01.831' AS timestamp))
ON CONFLICT DO NOTHING;
SELECT "public"."post_details"."id",
"public"."post_details"."created_by",
"public"."post_details"."created_on",
"public"."post_details"."updated_by",
"public"."post_details"."updated_on"
FROM "public"."post_details"
WHERE "public"."post_details"."id" = 1
On Oracle and SQL Server, jOOQ will use MERGE while on MySQL it will use ON DUPLICATE KEY.
The concurrency mechanism is ensured by the row-level locking mechanism employed when inserting, updating, or deleting a record, which you can view in the following diagram:
Code available on GitHub.

Two solutions come to mind:
That's what TABLE LOCKS are for
Hibernate does not support table locks, but this is a situation where they come in handy. Fortunately, you can use native SQL through Session.createSQLQuery(). For example (on MySQL):
// no access to the table for any other clients
session.createSQLQuery("LOCK TABLES foo WRITE").executeUpdate();
// safe zone
Foo foo = session.createCriteria(Foo.class).add(eq("name", name)).uniqueResult();
if (foo == null) {
    foo = new Foo();
    foo.setName(name);
    session.save(foo);
}
// releasing locks
session.createSQLQuery("UNLOCK TABLES").executeUpdate();
This way when a session (client connection) gets the lock, all the other connections are blocked until the operation ends and the locks are released. Read operations are also blocked for other connections, so needless to say use this only in case of atomic operations.
What about Hibernate's locks?
Hibernate uses row level locking. We can not use it directly, because we can not lock non-existent rows. But we can create a dummy table with a single record, map it to the ORM, then use SELECT ... FOR UPDATE style locks on that object to synchronize our clients. Basically we only need to be sure that no other clients (running the same software, with the same conventions) will do any conflicting operations while we are working.
// begin transaction
Transaction transaction = session.beginTransaction();
// blocks while any other client holds the lock
session.load("dummy", 1, LockOptions.UPGRADE);
// virtual safe zone
Foo foo = session.createCriteria(Foo.class).add(eq("name", name)).uniqueResult();
if (foo == null) {
    foo = new Foo();
    foo.setName(name);
    session.save(foo);
}
// ends transaction (releasing locks)
transaction.commit();
Your database has to support the SELECT ... FOR UPDATE syntax (Hibernate is going to use it), and of course this only works if all your clients follow the same convention (they need to lock the same dummy entity).

The Hibernate documentation on transactions and exceptions states that all HibernateExceptions are unrecoverable and that the current transaction must be rolled back as soon as one is encountered. This explains why the code above does not work. Ultimately you should never catch a HibernateException without exiting the transaction and closing the session.
The only real way to accomplish this it would seem would be to manage the closing of the old session and reopening of a new one within the method itself. Implementing a findOrCreate method which can participate in an existing transaction and is safe within a distributed environment would seem to be impossible using Hibernate based on what I have found.

The solution is in fact really simple. First perform a select using your name value. If a result is found, return it. If not, create a new one. If the creation fails with an exception, it is because another client added this very same value between your select and your insert statement, so the exception is to be expected. Catch it, roll back your transaction and run the same code again. Because the row now exists, the select statement will find it and you'll return your object.
You can find an explanation of the optimistic and pessimistic locking strategies in Hibernate here: http://docs.jboss.org/hibernate/core/3.3/reference/en/html/transactions.html

A couple of people have mentioned different parts of the overall strategy. Assuming that you generally expect to find an existing object more often than you create a new one:
1. Search for an existing object by name. If found, return it.
2. Start a nested (separate) transaction.
3. Try to insert the new object.
4. Commit the nested transaction.
5. Catch any failure from the nested transaction; if it is anything but a constraint violation, re-throw it.
6. Otherwise, search for the existing object by name and return it.
Just to clarify, as pointed out in another answer, the "nested" transaction is actually a separate transaction (many databases don't even support true nested transactions).
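The control flow of the steps above can be sketched in plain Java. This is a simplified illustration, not Hibernate code: FooStore is a hypothetical in-memory stand-in for the table, its insert method is assumed to run in its own, already committed transaction, and DuplicateKeyException stands in for the constraint violation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class FindOrCreateSketch {

    // Hypothetical stand-in for a unique-constraint violation.
    static class DuplicateKeyException extends RuntimeException {
        DuplicateKeyException(String name) {
            super("duplicate name: " + name);
        }
    }

    // Hypothetical store with a unique constraint on name. Each insert is
    // assumed to be a separate transaction that commits or fails on its own.
    static class FooStore {
        private final Map<String, Integer> idsByName = new HashMap<>();
        private int nextId = 1;

        synchronized Optional<Integer> findByName(String name) {
            return Optional.ofNullable(idsByName.get(name));
        }

        synchronized int insert(String name) {
            if (idsByName.containsKey(name)) {
                throw new DuplicateKeyException(name);
            }
            int id = nextId++;
            idsByName.put(name, id);
            return id;
        }
    }

    static int findOrCreate(FooStore store, String name) {
        // Look for an existing row first.
        Optional<Integer> existing = store.findByName(name);
        if (existing.isPresent()) {
            return existing.get();
        }
        try {
            // Not found: try to insert it in the separate transaction.
            return store.insert(name);
        } catch (DuplicateKeyException e) {
            // Another client inserted the row in between; read what it created.
            return store.findByName(name).orElseThrow(() -> e);
        }
    }

    public static void main(String[] args) {
        FooStore store = new FooStore();
        int first = findOrCreate(store, "alpha");
        int second = findOrCreate(store, "alpha");
        System.out.println(first == second); // prints true
    }
}
```

Any other exception thrown by insert propagates unchanged, which corresponds to the re-throw in step 5.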

Well, here's one way to do it - but it's not appropriate for all situations.
In Foo, remove the "unique = true" attribute on name. Add a timestamp that gets updated on every insert.
In findOrCreate(), don't bother checking if the entity with the given name already exists - just insert a new one every time.
When looking up Foo instances by name, there may be 0 or more with a given name, so you just select the newest one.
The nice thing about this method is that it doesn't require any locking, so everything should run pretty fast. The downside is that your database will be littered with obsolete records, so you may have to do something somewhere else to deal with them. Also, if other tables refer to Foo by its id, then this will screw up those relations.

Maybe you should change your strategy:
First find the user by name, and only if the user does not exist, create it.

I would try the following strategy:
A. Start a main transaction (at time 1)
B. Start a sub-transaction (at time 2)
Now, any object created after time 1 will not be visible in the main transaction. So when you do
C. Create new race-condition object, commit sub-transaction
D. Handle conflict by starting a new sub-transaction (at time 3) and getting the object from a query (the sub-transaction from point B is now out-of-scope).
only return the object primary key and then use EntityManager.getReference(..) to obtain the object you will be using in the main transaction. Alternatively, start the main transaction after D; it is not totally clear to me in how many race conditions you will have within your main transaction, but the above should allow for n times B-C-D in a 'large' transaction.
Note that you might want to do multi-threading (one thread per CPU) and then you can probably reduce this issue considerably by using a shared static cache for these kind of conflicts - and point 2 can be kept 'optimistic', i.e. not doing a .find(..) first.
Edit: For a new transaction, you need an EJB interface method call annotated with transaction type REQUIRES_NEW.
Edit: Double check that the getReference(..) works as I think it does.
