Spring #Transaction and #Async usage for database operations

Spring #Transaction and #Async usage for database operations - java

In a spring application when we receive message #Service persist bean is calling the database operation to insert in to database & parallel #Service to parse & process message. In this case persist is using #Transactional. In order to make the flow in parallel, is it advised to add #Async for persist.
Additionally there is #Aspect on each save method called by persist service for logging & audit.
Is #Async advisable for database operations?
Does #Async create table locks?

All that #Async does is cause the methods of the annotated component to be executed on another thread, where it gets the thread from a pool (which can be specified, so you can choose for some operations to have a dedicated pool).
#Async itself doesn’t do anything to lock database tables, or anything else database-related. If you want database-level locking you will have to implement that through some other means. If you want the call to use a transaction you have to use the #Transactional annotation on the component being called asynchronously. The transaction will be separate from the caller's transaction. Of course the transaction can possibly cause database locking depending on the isolation level and database implementation.
It’s tricky to use #Async with database work. One pitfall occurs with jpa persistent entities passed across threads, when they have a lazy property that gets realized in the new thread (where the proxy is now invalid because it can’t get to the entityManager from the old thread). It’s safer if the things passed between threads are immutable.
#Async adds complexity and is hard to reason about. There are opportunities for race conditions and deadlocks where if you don’t get it exactly right then bad things can happen, and you can’t count on testing to uncover the issues. It’s working without a net, if you want any infrastructure to help with exception handling, retries, or other recovery you will have to provide it yourself.
So no, I wouldn’t necessarily call it advisable. It's a good capability to have in your toolbox that might be helpful for a few isolated cases, but pervasive usage would seem like a bad thing. There are alternatives if you’re looking for ways to persist data without blocking.

Related

What is a transaction boundary in hibernate

I have 2 questions related to each other
Q1 What exactly is a transaction boundary in hibernate/Spring Data JPA.
I am new to JPA , so please give a very basic example so I can understand as I tried to read multiple blogs but still not very clear.
Q2 And on top of it, what does this mean-
In hibernate, persist() method guarantees that it will not execute an INSERT statement if it is called outside of transaction boundaries, save() method does not guarantee the same.
What is outside and inside of a transaction boundary and how executions are performed outside boundaries?

A transaction is a unit of work that is either executed completely or not at all.
Transactions are fairly simple to use in a typical relational database.
You start a transaction by modifying some data. Every modification starts a transaction, you typically can't avoid it. You end the transaction by executing a commit or rollback.
Before your transaction is finished your changes can't be seen in other transactions (there are exceptions, variations and details). If you rollback your transaction all your changes in the database are undone.
If you commit your changes your changes become visible to other transactions, i.e. for other users connected to the same database. Implementations vary among many other things if changes become visible only for new transactions or also for already running transactions.
A transaction in JPA is a database transaction plus additional stuff.
You can start and end a transaction by getting a Transaction object and calling methods on it. But nobody does that anymore since it is error prone. Instead you annotate methods with #Transaction and entering the method will start a transaction and exiting the method will end the transaction.
The details are taken care of by Spring.
The tricky part with JPAs transactions is that within a JPA transaction JPA might (and will) choose to delay or even avoid read and write operations as long as possible. For example when you load an entity, and load it again in the same JPA transaction, JPA won't load it from the database but return the same instance it returned during the first load operation. If you want to learn more about this I recommend looking into JPAs first level cache.

A transaction boundary it's where the transaction starts or is committed/rollbacked.

When a transaction is started, the transaction context is bound to the current thread. So regardless of how many endpoints and channels you have in your Message flow your transaction context will be preserved as long as you are ensuring that the flow continues on the same thread. As soon as you break it by introducing a Pollable Channel or Executor Channel or initiate a new thread manually in some service, the Transactional boundary will be broken as well.
some other people ask about it - look it up.
If you do not understand something write to me again more accurately and I will explain.
I really hope I helped!

Why do I need Transaction in Hibernate for read-only operations?

Why do I need Transaction in Hibernate for read-only operations?
Does the following transaction put a lock in the DB?
Example code to fetch from DB:
Transaction tx = HibernateUtil.getCurrentSession().beginTransaction(); // why begin transaction?
//readonly operation here
tx.commit() // why tx.commit? I don't want to write anything
Can I use session.close() instead of tx.commit()?

Transactions for reading might look indeed strange and often people don't mark methods for transactions in this case. But JDBC will create transaction anyway, it's just it will be working in autocommit=true if different option wasn't set explicitly. But there are practical reasons to mark transactions read-only:
Impact on databases
Read-only flag may let DBMS optimize such transactions or those running in parallel.
Having a transaction that spans multiple SELECT statements guarantees proper Isolation for levels starting from Repeatable Read or Snapshot (e.g. see PostgreSQL's Repeatable Read). Otherwise 2 SELECT statements could see inconsistent picture if another transaction commits in parallel. This isn't relevant when using Read Committed.
Impact on ORM
ORM may cause unpredictable results if you don't begin/finish transactions explicitly. E.g. Hibernate will open transaction before the 1st statement, but it won't finish it. So connection will be returned to the Connection Pool with an unfinished transaction. What happens then? JDBC keeps silence, thus this is implementation specific: MySQL, PostgreSQL drivers roll back such transaction, Oracle commits it. Note that this can also be configured on Connection Pool level, e.g. C3P0 gives you such an option, rollback by default.
Spring sets the FlushMode=MANUAL in case of read-only transactions, which leads to other optimizations like no need for dirty checks. This could lead to huge performance gain depending on how many objects you loaded.
Impact on architecture & clean code
There is no guarantee that your method doesn't write into the database. If you mark method as #Transactional(readonly=true), you'll dictate whether it's actually possible to write into DB in scope of this transaction. If your architecture is cumbersome and some team members may choose to put modification query where it's not expected, this flag will point you to the problematic place.

All database statements are executed within the context of a physical transaction, even when we don’t explicitly declare transaction boundaries (e.g., BEGIN, COMMIT, ROLLBACK).
If you don't declare transaction boundaries explicitly, then each statement will have to be executed in a separate transaction (autocommit mode). This may even lead to opening and closing one connection per statement unless your environment can deal with connection-per-thread binding.
Declaring a service as #Transactional will give you one connection for the whole transaction duration, and all statements will use that single isolation connection. This is way better than not using explicit transactions in the first place.
On large applications, you may have many concurrent requests, and reducing database connection acquisition request rate will definitely improve your overall application performance.
JPA doesn't enforce transactions on read operations. Only writes end up throwing a TransactionRequiredException in case you forget to start a transactional context. Nevertheless, it's always better to declare transaction boundaries even for read-only transactions (in Spring #Transactional allows you to mark read-only transactions, which has a great performance benefit).

Transactions indeed put locks on the database — good database engines handle concurrent locks in a sensible way — and are useful with read-only use to ensure that no other transaction adds data that makes your view inconsistent. You always want a transaction (though sometimes it is reasonable to tune the isolation level, it's best not to do that to start out with); if you never write to the DB during your transaction, both committing and rolling back the transaction work out to be the same (and very cheap).
Now, if you're lucky and your queries against the DB are such that the ORM always maps them to single SQL queries, you can get away without explicit transactions, relying on the DB's built-in autocommit behavior, but ORMs are relatively complex systems so it isn't at all safe to rely on such behavior unless you go to a lot more work checking what the implementation actually does. Writing the explicit transaction boundaries in is far easier to get right (especially if you can do it with AOP or some similar ORM-driven technique; from Java 7 onwards try-with-resources could be used too I suppose).

It doesn't matter whether you only read or not - the database must still keep track of your resultset, because other database clients may want to write data that would change your resultset.
I have seen faulty programs to kill huge database systems, because they just read data, but never commit, forcing the transaction log to grow, because the DB can't release the transaction data before a COMMIT or ROLLBACK, even if the client did nothing for hours.

Spring transaction REQUIRED vs REQUIRES_NEW : Rollback Transaction

I have a method that has the propagation = Propagation.REQUIRES_NEW transactional property:
#Transactional(propagation = Propagation.REQUIRES_NEW)
public void createUser(final UserBean userBean) {
//Some logic here that requires modification in DB
}
This method can be called multiple times simultaneously, and for every transaction if an error occurs than it's rolled back (independently from the other transactions).
The problem is that this might force Spring to create multiple transactions, even if another one is available, and may cause some performance problems.
Java doc of propagation = Propagation.REQUIRED says: Support a current transaction, create a new one if none exists.
This seems to solve the performance problem, doesn't it?
What about the rollback issue ? What if a new method call rolls back while using an existing transaction ? won't that rollback the whole transaction even the previous calls ?
[EDIT]
I guess my question wasn't clear enough:
We have hundreds of clients connected to our server.
For each client we naturally need to send a feedback about the transaction (OK or exception -> rollback).
My question is: if I use REQUIRED, does it mean only one transaction is used, and if the 100th client encounters a problem the 1st client's transaction will rollback as well ?

Using REQUIRES_NEW is only relevant when the method is invoked from a transactional context; when the method is invoked from a non-transactional context, it will behave exactly as REQUIRED - it will create a new transaction.
That does not mean that there will only be one single transaction for all your clients - each client will start from a non-transactional context, and as soon as the the request processing will hit a #Transactional, it will create a new transaction.
So, with that in mind, if using REQUIRES_NEW makes sense for the semantics of that operation - than I wouldn't worry about performance - this would textbook premature optimization - I would rather stress correctness and data integrity and worry about performance once performance metrics have been collected, and not before.
On rollback - using REQUIRES_NEW will force the start of a new transaction, and so an exception will rollback that transaction. If there is also another transaction that was executing as well - that will or will not be rolled back depending on if the exception bubbles up the stack or is caught - your choice, based on the specifics of the operations.
Also, for a more in-depth discussion on transactional strategies and rollback, I would recommend: «Transaction strategies: Understanding transaction pitfalls», Mark Richards.

If you really need to do it in separate transaction you need to use REQUIRES_NEW and live with the performance overhead. Watch out for dead locks.
I'd rather do it the other way:
Validate data on Java side.
Run everyting in one transaction.
If anything goes wrong on DB side -> it's a major error of DB or validation design. Rollback everything and throw critical top level error.
Write good unit tests.

Webservice calls within the same JDBC transaction are causing DB lock timeouts

I'm using the H2 database embedded within a Spring-MVC app.
I have declare Transactions at my service level. In particular I have a case where we do the folllowing:
Start a transaction
Make a webservices call
Make the subsequent database calls
Upon rollback (an exception), the webservices call is manually rolled back, and the Spring Transaction Manager handles calling a rollback on the DB transaction.
But I realized that the existence of the webservices call within the DB transaction is causing a table-wide lock, which causes other users to receive errors (I can produce this on a 1-user system with 2 browsers).
I am trying to plot my best course of action.
Should I be changing the transaction isolation level in Spring, will that affect the DB locking?
Is it just bad form to have a webservice and DB call in the same transaction?
Should I enable row level locking within the H2 DB because of this?
Am I not thinking of other solutions?
I should add that my service method serviceA() calls two other methods webServiceX() and daoMethodY(). serviceA() is encompassed in a transaction because any exception needs to rollback both the webServiceX() (a function I provide), and rollback the daoMethodY() database operations (a function of the DataSourceTransactionManager).

I think your approach is reasonable, and you should definitely try row-level locking if it's possible.
You might want to reconsider your design. Is the database duplicating state which really comes from the web service? In that case, you might want to consider caching the web service calls instead. But that depends on your application.
Alternatively, you might just have to roll your own transaction management. As long as it's generic, it shouldn't be too much trouble. We've done this in our project where we're not using Spring's transactions. Something like
performTransaction() {
doWSCall();
// no need to worry about WS call exception, because DB call won't happen
try {
doDbCall()
} catch (Exception ex) {
rollbackWSCall()
// rethrow ex
}
}
where all the methods are abstract.

I wouldn't have a web service call mingled into a database call. Your method is breaking the rule that says "do one thing well".
Have your service call other web services and the DAO.

Testing simultaneous calls to transactional service

How should I test a service method that is transactional for its simultaneous use (it updates a database row by decreasing a value)?
I have setup a JUnit test class with SpringJunit4ClassRunner and components are #autowired.
Just spawning threads which would call the method doesn't seem to work. I'm not sure whether this has something to do with the Spring proxy mechanism.
What I would like to achieve is to create a situation where simultaneously two threads are "inside" the tested method and other one will fail and rollback. e.g. The row value is 3 and both method calls try to decrease the value by 2; if the method wouldn't work, the value would be -1, which is illegal. But I want that either both of the calls fail and rollback, or failing the one that tries to update it an instant later than the other.
Is this even possible?

The first problem is that the transaction context is bound to one thread (with a thread local). So you have to start a transaction in each of your threads. (I think there is no support for this in spring. You can start transaction programmatically with the transaction manager.)
The code you described: read, decrement, write does only work with the right isolation level (serialized and repeatable read would work).
After this setup is done, you can test the behavior by blocking one thread while he has the database lock. You can use a Latch for this.
The thread without database lock will now still not rollback. It will block until the database lock is available again. The scheme you're describing is quite similiar to Optimistic concurrency control so maybe this is already implemented.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.