Long running transactions with Spring and Hibernate?

Long running transactions with Spring and Hibernate? - java

The underlying problem I want to solve is running a task that generates several temporary tables in MySQL, which need to stay around long enough to fetch results from Java after they are created. Because of the size of the data involved, the task must be completed in batches. Each batch is a call to a stored procedure called through JDBC. The entire process can take half an hour or more for a large data set.
To ensure access to the temporary tables, I run the entire task, start to finish, in a single Spring transaction with a TransactionCallbackWithoutResult. Otherwise, I could get a different connection that does not have access to the temporary tables (this would happen occasionally before I wrapped everything in a transaction).
This worked fine in my development environment. However, in production I got the following exception:
java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction
This happened when a different task tried to access some of the same tables during the execution of my long running transaction. What confuses me is that the long running transaction only inserts or updates into temporary tables. All access to non-temporary tables are selects only. From what documentation I can find, the default Spring transaction isolation level should not cause MySQL to block in this case.
So my first question, is this the right approach? Can I ensure that I repeatedly get the same connection through a Hibernate template without a long running transaction?
If the long running transaction approach is the correct one, what should I check in terms of isolation levels? Is my understanding correct that the default isolation level in Spring/MySQL transactions should not lock tables that are only accessed through selects? What can I do to debug which tables are causing the conflict, and prevent those tables from being locked by the transaction?

I consider keeping transaction open for an extended time evil. During my career the definition of "extended" has descended from seconds to milli-seconds.
It is an unending source of non-repeatable problems and headscratching problems.
I would bite the bullet in this case and keep a 'work log' in software which you can replay in reverse to clean up if the batch fails.

When you say your table is temporary, is it transaction scoped? That might lead to other transactions (perhaps on a different transaction) not being able to see/access it. Perhaps a join involving a real table and a temporary table somehow locks the real table.
Root cause: Have you tried to use the MySQL tools to determine what is locking the connection? It might be something like next row locking. I don't know the MySQL tools that well, but on oracle you can see what connections are blocking other connections.
Transaction timeout: You should create a second connection pool/data source with a much longer timeout. Use that connection pool for your long running task. I think your production environment is 'trying' to help you out by detecting stuck connections.

As mentioned by Justin regarding Transaction timeout, I recently faced the problem in which the connection pool ( in my case tomcat dbcp in Tomcat 7), had setting which was supposed to mark the long running connections mark abandon and then close them. After tweaking those parameters I could avoid that issue.

Related

Postgres vacuum/demon partially working when issued from JDBC

First of all I know it's odd to rely on a manual vacuum from the application layer, but this is how we decided to run it.
I have the following stack :
HikariCP
JDBC
Postgres 11 in AWS
Now here is the problem. When we start fresh with brand new tables with autovacuum=off the manual vacuum is working fine. I can see the number of dead_tuples growing up to the threshold then going back to 0. The tables are being updated heavily in parallel connections (HOT is being used as well). At some point the number of dead rows is like 100k jumping up to the threshold and going back to 100k. The n_dead_tuples slowly creeps up.
Now the worst of all is that when you issue vacuum from a pg console ALL the dead tuples are cleaned, but oddly enough when the application is issuing vacuum it's successful, but partially cleans "threshold amount of records", but not all ?
Now I am pretty sure about the following:
Analyze is not running, nor auto-vacuum
There are no long running transactions
No replication is going on
These tables are "private"
Where is the difference between issuing a vacuum from the console with auto-commit on vs JDBC ? Why the vacuum issued from the console is cleaning ALL the tupples whereas the vacuum from the JDBC cleans it only partially ?
The JDBC vacuum is ran in a fresh connection from the pool with the default isolation level, yes there are updates going on in parallel, but this is the same as when a vacuum is executed from the console.
Is the connection from the pool somehow corrupted and can not see the updates? Is the ISOLATION the problem?
Visibility Map corruption?
Index referencing old tuples?
Side-note: I have observed that same behavior with autovacuum on and cost limit through the roof like 4000-8000 , threshold default + 5% . At first the n_dead_tuples is close to 0 for like 4-5 hours... The next day the table is 86gigs with milions of dead tuples. All the other tables are vacuumed and ok...
PS: I will try to log a vac verbose in the JDBC..
PS2: Because we are running in AWS could it be a backup that is causing it to stop cleaning ?
PS3: When refering to vaccum I mean simple vacuum, not full vacuum. We are not issuing full vacuum.

The main problem was that vacuum is run by another user. The vacuuming that I was seeing was the HOT updates + selects running over that data resulting in on-the-fly vacuum of the page.
Next: Vacuuming is affected by long running transactions ACROSS ALL schemas and tables. Yes, ALL schemas and tables. Changing to the correct user fixed the vacuum, but it will get ignored if there is an open_in_transaction in any other schema.table.
Work maintance memory helps, but in the end when the system is under heavy load all vacuuming is paused.
So we upgraded the DB's resources a bit and added a monitor to help us if there are any issues.

Threads on Multiple VMs accessing a table on single Instance of DB causing low performance and Exceptions occasionally

Application is hosted on multiple Virtual Machines and DB is on single server. All VMs are pointing to single Instance of DB.
In this architecture, I have a table having very few record. But this table is accessed and updated by threads running on VMs very heavily. This is causing a performance bottleneck and sometimes record level exception. Database level locking does not seem to be best option as it is introducing significant delays in request processing.
Please suggest if there is any other technique to solve this problem.

Few questions first!
Is your application using connection pooling? If not, please use it. Creating a JDBC connection is expensive!
Is your application read heavy/write heavy?
What kind of storage engine are you using in your MySQL tables? InnoDB or MyISAM. If your application is write heavy, please use InnoDB based tables as it uses row level locking and will serve concurrent requests better.

One special case - if you are implementing queues on top of database tables, find a database that has a built-in queue operation and use that, or use a reliable messaging service. Building queues on top of databases is typically not efficient. See e.g. http://mikehadlow.blogspot.co.uk/2012/04/database-as-queue-anti-pattern.html
In general, running transactions on databases is slow because at the end of each transaction the database needs to be sure that enough has been saved out to disk that if the system died right now the changes made by the transaction would be safely preserved. If you don't need this you might find it faster to write a single non-database application that does what the database does but doesn't write anything out to disk, or still does database IO but does the minimum possible. Then instead of all of the VMs talking to the database directly they would all talk to this application.

Postgres Issue Isolation level

Lot of SHOW TRANSACTION ISOLATION LEVEL appears in process list in Postgres 9.0 .
What are reasons for this and when it appears ?. All are in idle state.
How to disable this ?

I assume that with process list you mean the system view pg_stat_activity (accessible in pgAdmin III in the "statistics" tab or with "Tools/Server Status").
Since you say that the connections are idle, the query column does not show an active query, it shows the last query that has been issued in this database connection. I don't know which ORM or connection pooler you are using, but some software in your stack must insert these statements routinely at the end of a database action.
I wouldn't worry about them, this statements are not resource intensive and probably won't cause you any trouble.
If you really need to get rid of them, figure out which software in your stack causes them and investigate there.

Hibernate: avoid Meta Data Locking in MySQL 5.5

We have a lot of long running processes in our Spring/Hibernate based application which span one single long running transaction.
Since we upgraded to MySQL 5.5. we often run into effects of Meta-Data Locking (see https://www.percona.com/blog/2013/02/01/implications-of-metadata-locking-changes-in-mysql-5-5/)
Before MySQL 5.5. we could easily do a ALTER TABLE ADD COLUMN even though a long running process was active. This was useful when we do staggered deployments (one server at a time until all servers are deployed with newest code)
With MySQL 5.5. this is not possible anymore, as the long transactions holds a Meta-data lock and we basically need to wait until all long running processes holding those locks have finished.
In other words: We cannot do DDL statements when there is at least one long running process running.
That basically means a downtime, which we would like to avoid.
Question: Which strategies do exist to handle that situation?
My idea would be:
have multiple sessionFactories:
a) SessionFactoryA (autocommit=true) for all tables which are not transactional (the ones we would like to touch with DDL statements)
b) SessionFactoryB (autocommit=false) for all other tables which should be transactional, even during long running processes.
Would that work? Which downsides are there? Any other ideas?

Hibernate Java batch operation deadlock

We have J2EE application built using Hibernate and struts. We have RMI registry based implementation for business functionality.
In our application around 250 concurrent users are going to upload batches containing huge data named BATCHDET. These batches are first validated against 30 validation and then they are inserted to tables where we have parent and child relationship. Similar there are other operation which need huge processing. like printing etc.
There is one table containing 10 million record which gets accessed for all types of transactions and every process inserts and updates this table. This table has emerged as bottleneck. We have added all the required indexes as well.
After 30 minutes of run system JVM utilizes all the allocated 6GB of RAM and goes in no response state. When we tried to find out root cause we realized there was lock at database site and all the update queries related to BATCHDET table were in wait state. We tried everything which we could but no luck.
System run smooth when tried with 50 concurrent user but dies with 250 users which are expected. BATCHDET has lot of dependency on almost every module, not in mood to rewrite the implementation, could you please provide quick fix to it.
we have Thread based transaction demarcation at Hibernate implemented with HIbernateUtil.java. Transaction isolation is ReadCommitted. Is there any way where we can define no lock for all search operation. we have oracle 10G RDBMS.
Let me know if you need any other details.
~Amar

" Is there any way where we can define no lock for all search operation. we have oracle 10G RDBMS."
Oracle doesn't lock on selects, so in effect this is already in place.
Oracle also locks at a row level, so you need to stop thinking about the table as a whole and start thinking individual rows.
You need to talk with your DBA. There's a whole bunch of stuff to monitor in Oracle at both the system and session level. The DBA will be able to be able to look at v$session and tell you what the individual sessions are waiting on. There might be locks, it might be a disk bottle neck, it may be index contention, or it may be the database is sat there idle and all the inefficiency is in the java layer.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.