Assume there is a service component making use of a CrudRepository of Book entities. Assume one of the service's methods should be transactional and should (among other things) delete one entity from the database with transactional semantics (i.e. should the delete be impossible to perform, all effects should be rolled back).
Roughly,
@Component
public class TraService {

    @Autowired
    BookRepo repo;

    @Transactional
    public void removeLongest() {
        // some repo.find's and business logic --> Book toDel
        repo.delete(toDel);
    }
}
Now this should work in a multithreaded context, e.g. in a Spring MVC application. For simplicity I launch two threads, each running a task given a reference to the TraService bean. Logs show that two EntityManagers are indeed created and bound to the respective threads. However, as the first thread succeeds with the delete, the other throws
org.springframework.orm.jpa.JpaOptimisticLockingFailureException: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1;
from which I do not know how to recover (I suspect the rollback is not complete, and the code the thread was supposed to execute after calling the transactional method will not be executed). The worker's code:
public void run() {
    service.removeLongest(); // transactional
    System.out.println("Code progressing really well " + Thread.currentThread()); // not executed on the thread with the exception
}
How do we properly handle such transactional deletes in Spring/JPA?
Short answer: the correct behaviour on an optimistic-locking exception is to catch the exception and retry.
Long answer: optimistic locking is a concurrency-control strategy that assumes that
from: https://en.wikipedia.org/wiki/Optimistic_concurrency_control
multiple transactions can frequently complete without interfering with each other
Optimistic locking exists mainly for performance reasons and is usually implemented with a version field that is updated on each modification. If the version counter changes during a transaction, a concurrent modification is happening, and that causes the exception to be thrown.
Pessimistic locking will instead prevent any possible concurrent modification; it is easier to reason about, but performs worse.
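The catch-and-retry pattern can be sketched in plain Java. Note the exception type below is a stand-in for Spring's JpaOptimisticLockingFailureException (so the sketch runs without Spring on the classpath), and withRetry is a hypothetical helper, not a framework API:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class RetryDemo {

    // Stand-in for JpaOptimisticLockingFailureException.
    public static class OptimisticLockException extends RuntimeException {}

    // Run the operation, retrying up to maxAttempts times when another
    // transaction wins the race; rethrow if every attempt fails.
    public static <T> T withRetry(int maxAttempts, Supplier<T> op) {
        OptimisticLockException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (OptimisticLockException e) {
                last = e; // concurrent modification detected; try again
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Fails twice, then succeeds -- mimics losing the race and retrying.
        String result = withRetry(3, () -> {
            if (calls.incrementAndGet() < 3) throw new OptimisticLockException();
            return "deleted";
        });
        System.out.println(result + " after " + calls.get() + " attempts");
    }
}
```

In the question's worker, run() would wrap service.removeLongest() this way, so the losing thread re-reads fresh state on retry and either finds a new candidate to delete or nothing left to do.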
Related
In my service I have an endpoint that creates a resource. After creation the resource needs to be validated and otherwise "prepared" for future processing. To accomplish this my service creates the resource in the database, spawns an asynchronous thread to perform the validation, and then returns to the user.
Entry point:
@Override
public FooDto createFoo(FooDto fooDto) {
    FooDto retDto = fooService.createFoo(fooDto); // annotated with @Transactional
    asyncFooService.initializeFooAsync(retDto.getFooId()); // annotated with @Transactional and @Async
    return retDto;
}
Async call
@Transactional
@Async
@Override
public void initializeFooAsync(String fooId) {
    Foo foo = fooRepository.findById(fooId);
    logger.info("Found foo with id={}", foo.getId());
    // more processing, which can take a while to perform
}
I was careful to ensure that the transactional boundaries are exited, so that the commit runs before the async call happens, and that the async method lives in a different bean than the entry method. So logically the second method should have no issue seeing the data written by the first, and it should run asynchronously.
What I have noticed is that the log statement in the async call sometimes throws a NullPointerException because foo is null. By the time I get notified of this, I can see in the database that the wanted foo record exists.
My persistence layer consists of three MySQL or MariaDB replicas (depending on the environment) in "master/master" configuration, so what I have derived is that the insert done in fooService.createFoo goes to nodeA, while the select done by initializeFooAsync goes to nodeB, which has yet to receive the replicated row from nodeA. Further evidence: I deployed a patch in which initializeFooAsync checks for a null Foo and tries to find it again after 3 seconds. This patch has worked.
I'm looking for other, "cleaner" approaches that don't rely on Thread.sleep. The other approach I thought of was using RMQ (which is available to me) and dead-letter exchanges to create a delayed processing queue with a limited number of retries should the Foo not be found (so if not found, try again in X ms, up to Y times). However, this approach is frowned upon by the chief architect.
The other approach I see is to do more of the same and just perform more checks in initializeFooAsync at shorter intervals to minimize the wait time. It would still essentially be the same solution, using Thread.sleep to deal with replication delay.
Doing the initialization inline with the creation is not possible as this is a specific requirement from product, and the initialization may end up taking what they consider a "significant" amount of time due to coordination.
Is there some other utility or tool in the Spring/Java ecosystem that can help me deliver a cleaner approach? Preferably something that doesn't rely on sleeping my thread.
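The retry-after-delay patch can at least be made non-blocking: instead of Thread.sleep, reschedule the lookup on a ScheduledExecutorService and complete a CompletableFuture once the row appears. A rough sketch under that assumption; pollUntilFound and the lookup Supplier are invented names standing in for the fooRepository.findById call:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class DelayedRetry {

    // Poll the lookup until it returns non-null, rescheduling instead of
    // sleeping the worker thread between attempts.
    public static <T> CompletableFuture<T> pollUntilFound(
            Supplier<T> lookup, int maxAttempts, long delayMs,
            ScheduledExecutorService scheduler) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(lookup, maxAttempts, delayMs, scheduler, result);
        return result;
    }

    private static <T> void attempt(Supplier<T> lookup, int attemptsLeft,
            long delayMs, ScheduledExecutorService scheduler,
            CompletableFuture<T> result) {
        T value = lookup.get();
        if (value != null) {
            result.complete(value);
        } else if (attemptsLeft <= 1) {
            result.completeExceptionally(
                    new IllegalStateException("record never became visible"));
        } else {
            scheduler.schedule(
                    () -> attempt(lookup, attemptsLeft - 1, delayMs, scheduler, result),
                    delayMs, TimeUnit.MILLISECONDS);
        }
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger attempts = new AtomicInteger();
        // Simulated replication lag: the "row" shows up on the third lookup.
        String found = pollUntilFound(
                () -> attempts.incrementAndGet() < 3 ? null : "foo-123",
                5, 50, scheduler).join();
        System.out.println("found " + found + " after " + attempts.get() + " lookups");
        scheduler.shutdown();
    }
}
```

That said, the root cause is reading from a lagging replica; if the driver or datasource supports it, pinning the initializeFooAsync read to the node that took the write avoids the retry entirely.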
In a Spring application, when we receive a message, a @Service persist bean calls the database operation to insert it into the database, while a parallel @Service parses and processes the message. In this case persist uses @Transactional. In order to make the flow parallel, is it advisable to add @Async to persist?
Additionally, there is an @Aspect on each save method called by the persist service, for logging and audit.
Is @Async advisable for database operations?
Does @Async create table locks?
All that @Async does is cause the methods of the annotated component to be executed on another thread, taken from a pool (which can be specified, so you can give some operations a dedicated pool).
@Async itself doesn't do anything to lock database tables, or anything else database-related. If you want database-level locking you will have to implement that through some other means. If you want the call to use a transaction you have to use the @Transactional annotation on the component being called asynchronously. The transaction will be separate from the caller's transaction. Of course the transaction can possibly cause database locking depending on the isolation level and database implementation.
It's tricky to use @Async with database work. One pitfall occurs with JPA persistent entities passed across threads: when a lazy property gets realized in the new thread, the proxy fails because it can no longer reach the EntityManager from the old thread. It's safer if the things passed between threads are immutable.
@Async adds complexity and is hard to reason about. There are opportunities for race conditions and deadlocks where, if you don't get it exactly right, bad things can happen, and you can't count on testing to uncover the issues. It's working without a net: if you want any infrastructure to help with exception handling, retries, or other recovery, you will have to provide it yourself.
So no, I wouldn’t necessarily call it advisable. It's a good capability to have in your toolbox that might be helpful for a few isolated cases, but pervasive usage would seem like a bad thing. There are alternatives if you’re looking for ways to persist data without blocking.
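As a minimal illustration of what @Async amounts to, here is the same hand-off done directly with an ExecutorService and CompletableFuture. Only an immutable String crosses the thread boundary, per the pitfall above; the class and method names are invented for the sketch:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncPersistDemo {

    // Dedicated pool, like the one @Async can be configured to use.
    static final ExecutorService persistPool = Executors.newFixedThreadPool(2);

    // Hand the work to another thread; pass an immutable id across the
    // boundary, not a managed entity with lazy proxies.
    public static CompletableFuture<String> persistAsync(String recordId) {
        return CompletableFuture.supplyAsync(() -> {
            // a real implementation would open its own transaction here
            return "saved:" + recordId;
        }, persistPool);
    }

    public static void main(String[] args) {
        String result = persistAsync("book-42").join();
        System.out.println(result);
        persistPool.shutdown();
    }
}
```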
I have an EJB as below. This has been created solely for test purposes - I'm "sleeping" the thread as I want to simulate the case where the query is scheduled again before the synchronized method has finished executing.
The observed behaviour is as expected. But is this the correct way to poll the database for, say, rows that have been inserted, so that some processing can be performed before they are updated? I want the method to be synchronized because I don't want another call to modify the database state while rows from a previous call are still being processed.
@Singleton
public class MyResource {

    @PersistenceContext(unitName = "MyMonitor")
    private EntityManager em;

    @Schedule(second = "*", minute = "*", hour = "*")
    public synchronized void checkDb() throws SQLException, InterruptedException {
        List<Clients> l =
            em.createQuery("from Clients cs", Clients.class).getResultList();
        Thread.sleep(2000);
        System.out.println(l.size());
    }
}
You should not implement a single point of database access yourself just to make sure that records are not changed during an update. For that, you want to use database locking. In Java EE / JPA 2.0 you have several lock modes at hand; check out, for example, this Oracle blog or this wikibook article. As for other components trying to write while the lock is held, you have to react to the lock exception and implement some sort of retry mechanism.
Please correct me if I am wrong somewhere.
I am having an issue where my transactions are not being saved to the database and some sort of race is occurring that corrupts the data. The app is hit in parallel by multiple instances. I have used @Transactional, which I know makes the method run in a database transaction that is committed when the method returns.
The question is: does hitting it through multiple instances still maintain this one-transaction-per-hit behaviour, or does it not handle the situation, so that the data gets corrupted because of the race?
Can a solution be suggested for the given condition?
@Transactional is not related to synchronization. It just makes sure that your flow either succeeds or fails as a whole. Each hit has its own flow and its own success or failure.
I guess what you're experiencing is due to the use of shared data.
For example. If you have a class Foo that looks like this:
public class Foo {

    private static boolean flag = true;

    @Transactional
    public void doSomething() {
        flag = false;
    }
}
In this case it doesn't matter that you have many Foo instances because they all use the same flag.
Another scenario would be if you have one instance of Foo (very common if you use something like Spring) and you have data that is changed for this instance. You can look at the same Foo example and just remove the static from flag:
public class Foo {

    private boolean flag = true;

    @Transactional
    public void doSomething() {
        flag = false;
    }
}
In either of those cases you need to synchronize the data changes somehow. It has nothing to do with #Transactional.
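To make the point concrete, here is shared state that @Transactional would not protect: a plain field updated without synchronization loses updates under concurrency, while an AtomicInteger does not. The class and counters are invented for the demo:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SharedFlagDemo {

    // Shared mutable state, like the flag in the Foo examples above.
    private int unsafeCount = 0;                                  // racy under concurrency
    private final AtomicInteger safeCount = new AtomicInteger();  // synchronized access

    public void hit() {
        unsafeCount++;               // read-modify-write: updates can be lost
        safeCount.incrementAndGet(); // atomic: never loses an update
    }

    public int getUnsafeCount() { return unsafeCount; }
    public int getSafeCount()   { return safeCount.get(); }

    public static void main(String[] args) throws InterruptedException {
        SharedFlagDemo demo = new SharedFlagDemo();
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 10_000; j++) demo.hit();
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        // safeCount is always 80000; unsafeCount is often less.
        System.out.println("unsafe=" + demo.getUnsafeCount()
                + " safe=" + demo.getSafeCount());
    }
}
```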
Those transactions are database transactions, and the behaviour is database-engine dependent, but it usually works this way:
1. A thread enters the method.
2. A thread enters the same or any other transactional method. It does not block, as @Transactional is not about synchronization.
3. One thread executes a query that locks a database resource, e.g. SELECT * FROM MYTABLE FOR UPDATE;.
4. Another thread tries to execute anything that needs the same database resource, e.g. UPDATE MYTABLE SET A = A + 1; and it blocks.
5. The thread that acquired the lock in step 3 completes the transactional method, either successfully (making an implicit commit) or not (making an implicit rollback).
6. The blocked thread wakes up and continues, as it can now acquire the resource that was locked.
I have a DAO class that uses Spring JDBC to access an SQLite database. I have declared transactions on the DAO methods themselves since my service layer never combines queries in a transaction.
Since I use a few worker threads in parallel, but only one thread can update an SQLite DB at a time, I use synchronized to serialize access to the DAO.
At first, I synchronized externally from my service class, for example:
synchronized (dao) {
    dao.update(...);
}
Then, I figured I might as well get rid of the external synchronization and put synchronized on the DAO method itself:
public synchronized void update(...) {
    // Spring JDBC calls here
}
The strange thing is: my queries now take twice as long as they used to!
Why?
Well, one difference is obvious:
synchronized (dao) {
    // here you are synchronizing on the transactional proxy
}

public synchronized void update(...) {
    // and here you are synchronizing on the target class, *inside* the proxy
}
What the implications of this are depends on your other code, but that's the obvious difference.
My guess is that your update method, or the entire class, is annotated with @Transactional or wrapped in a transactional proxy through other means. This means that whenever you call the DAO's method, the transactional proxy retrieves a DB connection from the pool and opens a transaction, and only then calls the real method.
In your first scenario you synchronize before even reaching the proxy, so none of the connection and transaction magic has happened yet. In the second scenario you do the waiting after it.
If multiple threads attempt simultaneous updates, only one will actually be updating while the rest first open new connections and then wait for DAO access. As a consequence, instead of one connection being constantly reused, you will have multiple connections in use. I can only guess how much this really affects performance, but you can experiment with different pool sizes, starting with one.
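The proxy-versus-target distinction can be reproduced without Spring using java.lang.reflect.Proxy: the handler below stands in for the transactional proxy and does its "get a connection" work before the target's synchronized method is reached. All names are invented for the demo:

```java
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;

public class ProxyDemo {

    public interface Dao { void update(); }

    public static final AtomicInteger connectionsOpened = new AtomicInteger();

    public static class DaoImpl implements Dao {
        // Threads serialize HERE, inside the proxy...
        public synchronized void update() { /* Spring JDBC calls would go here */ }
    }

    // Stand-in for Spring's transactional proxy: it "opens a connection"
    // before delegating to the target.
    public static Dao transactionalProxy(Dao target) {
        return (Dao) Proxy.newProxyInstance(
                Dao.class.getClassLoader(), new Class<?>[]{Dao.class},
                (proxy, method, args) -> {
                    connectionsOpened.incrementAndGet(); // ...so this runs per thread, BEFORE the lock
                    return method.invoke(target, args);
                });
    }

    public static void main(String[] args) throws InterruptedException {
        Dao dao = transactionalProxy(new DaoImpl());
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(dao::update);
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        // Every thread passed through the proxy (and would have taken a
        // pooled connection) even though the target serialized them.
        System.out.println("connections opened: " + connectionsOpened.get());
    }
}
```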