JDBC transactions in multi-threaded environment - java

When developing a Java application that shares a single Connection between multiple threads, the problem of concurrency arises.
If thread A updates record 1 in table T, and simultaneously thread B issues a SELECT on record 1 in table T, how do I ensure thread B reads the updated values of thread A?
java.sql.Connection offers transaction control via setAutoCommit(false), commit() and rollback(), but does this mechanism also cover data correctness?
I think I'm missing something.

Two points:
You shouldn't share a java.sql.Connection between threads, at least not in any serious production code, see here. For demo purposes, I think, sharing a Connection is OK;
If a thread reads from the DB after the relevant DB transaction has been committed, it will see the data written by the other thread.
For your second question
will thread B timeout until the first transaction has commit() or rollback()
-- B will block until A's transaction is finished (either by commit or rollback) if:
B tries to update/delete the same table row that is being updated by A, and ...
A updates that row under a DB-level lock, using SELECT ... FOR UPDATE.
You can get this behavior using two consoles (for example, with PostgreSQL psql), each console standing for a thread:
In console A, type the following:
BEGIN;
SELECT some_col FROM some_tbl WHERE some_col = some_val FOR UPDATE;
Now, in console B, type:
BEGIN;
UPDATE some_tbl SET some_col = new_val WHERE some_col = some_val;
You should see that the UPDATE blocks until, in A, you issue either COMMIT or ROLLBACK.
The above explanation uses separate DB connections, just like a Java JDBC connection pool would. When you share a single connection between Java threads, I think any interaction with the DB will block while the connection is being used by some other thread.
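The same thing can be sketched in plain JDBC with two separate connections, one per thread (the table and column names are the placeholders from above; the URL, user and password are assumptions):

import java.sql.*;

// Thread A: open a transaction and take a row-level lock, then hold it for a while.
try (Connection a = DriverManager.getConnection(url, user, pass)) {
    a.setAutoCommit(false);
    try (Statement st = a.createStatement()) {
        st.executeQuery("SELECT some_col FROM some_tbl WHERE some_col = 'some_val' FOR UPDATE");
    }
    // ... while this transaction is open, the UPDATE in thread B below blocks ...
    a.commit();   // or a.rollback(); either one releases the lock
}

// Thread B (run concurrently from another thread): blocks until A commits or rolls back.
try (Connection b = DriverManager.getConnection(url, user, pass)) {
    b.setAutoCommit(false);
    try (Statement st = b.createStatement()) {
        st.executeUpdate("UPDATE some_tbl SET some_col = 'new_val' WHERE some_col = 'some_val'");
    }
    b.commit();
}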

JDBC is a standard that is broadly adopted but with uneven levels of adherence, so it is probably not wise to make sweeping statements about what is safe.
I would not expect there to be anything that keeps statement executions, commits and rollbacks made from multiple threads from getting interleaved. Best case, only one thread can use the connection at a time and the others block, making multithreading useless.
If you don't want to provide a connection to each thread, you could have the threads submit work items to a queue that is consumed by a single worker thread handling all the JDBC work, as sketched below. But it probably has less impact on existing code to introduce a connection pool.
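A rough sketch of that single-worker idea (the SQL, table name and connection details are made up; the point is that only the worker thread ever touches the Connection):

import java.sql.*;
import java.util.concurrent.*;

// All JDBC work is funneled through one single-threaded executor, so the shared
// Connection is only ever used by the worker thread.
ExecutorService dbWorker = Executors.newSingleThreadExecutor();
Connection conn = DriverManager.getConnection(url, user, pass);

// Other threads submit work items instead of touching the connection directly.
Future<Integer> result = dbWorker.submit(() -> {
    try (PreparedStatement ps = conn.prepareStatement("UPDATE t SET col = ? WHERE id = ?")) {
        ps.setString(1, "value");
        ps.setLong(2, 1L);
        return ps.executeUpdate();
    }
});
int rowsUpdated = result.get();   // the caller blocks here only if it needs the result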
In general if you have concurrent updates and reads then they happen in the order that they happen. Locking and isolation levels provide consistency guarantees for concurrent transactions but if one hasn't started its transaction yet those aren't applicable. You could have a status flag, version number, or time stamp on each row to indicate when an update occurred.
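For instance, a version-number check could look roughly like this (table and column names are invented):

// Optimistic check: the update only succeeds if nobody changed the row since we read it;
// zero affected rows means our copy was stale.
String sql = "UPDATE some_tbl SET some_col = ?, version = version + 1 "
           + "WHERE id = ? AND version = ?";
try (PreparedStatement ps = conn.prepareStatement(sql)) {
    ps.setString(1, newValue);
    ps.setLong(2, rowId);
    ps.setLong(3, versionWeRead);   // the version seen when the row was loaded
    if (ps.executeUpdate() == 0) {
        // the row was changed (or deleted) by someone else in the meantime
    }
}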
If you have a lot of updates it can be better to collect them in a flat file and execute a bulk copy. It can be much faster than using JDBC. Then, with updates out of the way, execute the selects over JDBC.

Related

Why does a SELECT wait for a lock?

In my application I have the problem that sometimes SELECT statements run into a java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction exception. Sadly I can't create an example as the circumstances are very complex. So the question is just about general understanding.
A little bit of background information: I'm using MySQL (InnoDB) with the READ_COMMITTED isolation level.
Actually I don't understand how a SELECT can ever run into a lock timeout with that setup. I thought that a SELECT would never lock, as it just returns the latest committed state (managed by MySQL). Anyway, according to what is happening, this seems to be wrong. So how is it really?
I already read this https://dev.mysql.com/doc/refman/8.0/en/innodb-locking.html but that didn't really give me a clue. No SELECT ... FOR UPDATE or something like that is used.
That is probably due to your database. In my experience, this kind of problem usually comes from that side, not from the programming side that accesses it. In the end, the programming side is just a "go and get that for me in that DB" thing.
I found this without much effort.
It basically explains that:
Lock wait timeout occurs typically when a transaction is waiting on row(s) of data to update which have already been locked by some other transaction.
You should also check this answer, which deals with a specific transaction problem and might help you, as trying to change different tables can also cause the timeout:
the query was attempting to change at least one row in one or more InnoDB tables. Since you know the query, all the tables being accessed are candidates for being the culprit.
To speed up queries in a DB, several transactions can be executed at the same time. For example if someone runs a select query over a table for the wages of the employees of a company (each employee identified by an id) and another one changes the last name of someone who e.g. has married, you can execute both queries at the same time because they don't interfere.
But in other cases even a SELECT statement might interfere with another statement.
To prevent unexpected results in SQL transactions, transactions follow the ACID model, which stands for Atomicity, Consistency, Isolation and Durability (for further information read Wikipedia).
Let's say transaction 1 starts to calculate something and then wants to write the results to table A. Before writing, it blocks SELECT statements on table A. Otherwise this would violate the Isolation requirement: if a transaction 2 started while 1 was still writing, 2's results would depend on where 1 had already written and where it had not.
Now, it might even produce a deadlock. E.g. before transaction 1 can write the last field in table A, it still has to write something to table B, but transaction 2 has already blocked table B in order to read safely from it after reading from A, and now you have a deadlock: 2 wants to read from A, which is blocked by 1, so it waits for 1 to finish, but 1 waits for 2 to unlock table B before it can finish itself.
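A minimal sketch of that situation from Java, using two conflicting updates for simplicity (table names are invented; each thread has its own connection, and the database will abort one of the two transactions with a deadlock error):

import java.sql.*;

// Both transactions grab the same two rows, but in opposite order.
Runnable tx1 = () -> {
    try (Connection c = DriverManager.getConnection(url, user, pass)) {
        c.setAutoCommit(false);
        Statement st = c.createStatement();
        st.executeUpdate("UPDATE a SET x = x + 1 WHERE id = 1");   // locks the row in a
        Thread.sleep(1000);                                        // give tx2 time to lock b
        st.executeUpdate("UPDATE b SET y = y + 1 WHERE id = 1");   // waits for tx2's lock
        c.commit();
    } catch (Exception e) { e.printStackTrace(); }                 // one of the two gets the deadlock error
};
Runnable tx2 = () -> {
    try (Connection c = DriverManager.getConnection(url, user, pass)) {
        c.setAutoCommit(false);
        Statement st = c.createStatement();
        st.executeUpdate("UPDATE b SET y = y + 1 WHERE id = 1");   // locks the row in b
        Thread.sleep(1000);
        st.executeUpdate("UPDATE a SET x = x + 1 WHERE id = 1");   // waits for tx1's lock -> deadlock
        c.commit();
    } catch (Exception e) { e.printStackTrace(); }
};
new Thread(tx1).start();
new Thread(tx2).start();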
To solve this problem one strategy is to rollback certain transactions after a certain timeout. (more here)
So that might be a reason for your SELECT statement to run into a lock wait timeout exceeded error.
But a deadlock usually just happens by coincidence, so if transaction 2 was forced to roll back, transaction 1 should be able to finish, and 2 should then succeed on a later try.

Concurrent update and delete on one table

Using Java, Hibernate and Oracle database.
I have two concurrent processes:
Process1 removes some entities from table1. (multiple: delete from table1 where id =...) Done by native hibernate query.
Process2 updates SAME/other entities in table1. (multiple: update table1 set name=... where id=...) Done by jpa repository delete method.
Currently, a CannotAcquireLockException is sometimes thrown:
(SQL Error: 60, SQLState: 61000..
ORA-00060: deadlock detected while waiting for resource)
So, the question is: what is going on and how I can avoid exception? Any workaround?
IMPORTANT: In case of collisions I would be satisfied if delete succeeds and update won't do anything.
Session A waits for B, B waits for A - this is what a deadlock basically is.
With nothing left to wait for, Oracle kills one of the two sessions.
Option 1
Create a semaphore table to effectively serialize the concurrent processes.
create table my_semaphore(dummy char(1));
Session 1:
LOCK TABLE my_semaphore in exclusive mode;
UPDATE <your update here>;
COMMIT;
Session 2:
LOCK TABLE my_semaphore in exclusive mode;
DELETE <your delete here>;
COMMIT;
Option 2
Try processing rows with both statements in the same order, say by rowid or whatever.
So that session B never returns to rows held by A if A is stuck behind rows locked by B. This is more tricky and resource-intensive.
"locking tables doesnt look attractive at all -what the point then of having severaal processes working with database"
Obviously we want to enable concurrent processes. The trick is to design processes which can run concurrently without interfering with each other. Your architecture is failing to address this point. It should not be possible for Process B to update records which are being deleted by Process A.
This is an unfortunate side-effect of the whole web paradigm which is stateless and favours an optimistic locking strategy. Getting locks at the last possible moment "scales" but incurs the risk of deadlock.
The alternative is a pessimistic locking strategy, in which a session locks the rows it wants upfront. In Oracle we can do this with SELECT .. FOR UPDATE. This locks a subset of rows (the set defined by the WHERE clause) and not the whole table. Find out more.
So it doesn't hinder concurrent processes which operate on different subsets of data but it will prevent a second session grabbing records which are already being processed. This still results in an exception for the second session but at least that happens before the session has done any work, and provides information to re-evaluate the task (hmmm, do we want to delete these records if they're being updated?).
Hibernate supports the SELECT FOR UPDATE syntax. This StackOverflow thread discusses it.
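For illustration, a sketch of what that looks like through JPA/Hibernate (the entity and field names are invented); PESSIMISTIC_WRITE maps to SELECT ... FOR UPDATE on Oracle:

import java.util.List;
import javax.persistence.LockModeType;

// Inside a transaction: lock the rows we intend to delete before touching them,
// so a concurrent updater either waits or fails fast instead of deadlocking later.
List<Entity1> rows = entityManager
        .createQuery("SELECT e FROM Entity1 e WHERE e.id IN :ids", Entity1.class)
        .setParameter("ids", idsToDelete)
        .setLockMode(LockModeType.PESSIMISTIC_WRITE)   // issues SELECT ... FOR UPDATE
        .getResultList();
for (Entity1 e : rows) {
    entityManager.remove(e);
}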

Java concurrency in DB write operation

I have a scenario where I need to insert 16k records into a DB table. So, along with a normal DB batch insert, I have created Callable tasks, each of which takes its respective batch (500 records) and does the insertion independently. I am curious to know how the underlying database will handle these requests. Will database locking at the page level block the rest of the Java threads until the first thread's batch of 500 records gets committed?
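Roughly the kind of setup I mean (a simplified sketch; the table, columns and row type are placeholders, and each task takes its own connection from the pool):

import java.sql.*;
import java.util.List;
import java.util.concurrent.Callable;
import javax.sql.DataSource;

// One Callable per batch of 500 rows; each task uses its own pooled connection.
Callable<int[]> batchTask(DataSource ds, List<Row> batch) {
    return () -> {
        try (Connection c = ds.getConnection();
             PreparedStatement ps = c.prepareStatement("INSERT INTO t (id, name) VALUES (?, ?)")) {
            c.setAutoCommit(false);
            for (Row r : batch) {
                ps.setLong(1, r.id);
                ps.setString(2, r.name);
                ps.addBatch();
            }
            int[] counts = ps.executeBatch();
            c.commit();               // the whole batch commits together
            return counts;
        }
    };
}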
My answer is for Sybase ASE. For Sybase IQ see Guillaume's answer.
Will database locking at the page level block the rest of the Java threads until the first thread's batch of 500 records gets committed?
That depends on what locking granularity you have set. According to Sybase's doc, there are three locking granularities:
Allpages locking, which locks datapages and index pages
Datapages locking, which locks only the data pages
Datarows locking, which locks only the data rows
So, if you select Allpages your threads will block until the current batch gets committed. Otherwise, your threads will not block, but will naturally incur a higher locking overhead.
For full details on Sybase ASE's locking granularity, see this documentation.
From Sybase IQ's documentation:
Sybase allows multiple readers, but only one writer to a table.
So, unless you open and close a transaction for every row you insert (which will be slow), your threads will have to wait until one transaction closes before starting a new one.

Concurrent use of same JDBC connection by multiple threads

I'm trying to better understand what will happen if multiple threads try to execute different sql queries, using the same JDBC connection, concurrently.
Will the outcome be functionally correct?
What are the performance implications?
Will thread A have to wait for thread B to be completely done with its query?
Or will thread A be able to send its query immediately after thread B has sent its query, after which the database will execute both queries in parallel?
I see that the Apache DBCP uses synchronization protocols to ensure that connections obtained from the pool are removed from the pool, and made unavailable, until they are closed. This seems more inconvenient than it needs to be. I'm thinking of building my own "pool" simply by creating a static list of open connections, and distributing them in a round-robin manner.
I don't mind the occasional performance degradation, and the convenience of not having to close the connection after every use seems very appealing. Is there any downside to me doing this?
I ran the following set of tests using an AWS RDS Postgres database and Java 11 (a rough sketch of the harness follows the list):
Create a table with 11M rows, each row containing a single TEXT column, populated with a random 100-char string
Pick a random 5 character string, and search for partial-matches of this string, in the above table
Time how long the above query takes to return results. In my case, it takes ~23 seconds. Because there are very few results returned, we can conclude that the majority of this 23 seconds is spent waiting for the DB to run the full-table-scan, and not in sending the request/response packets
Run multiple queries in parallel (with different keywords), using different connections. In my case, I see that they all complete in ~23 seconds, i.e. the queries are being efficiently parallelized
Run multiple queries on parallel threads, using the same connection. I now see that the first result comes back in ~23 seconds, the second in ~46 seconds, the third in ~1 minute, and so on. All the results are functionally correct, in that they match the specific keyword queried by that thread
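A rough sketch of the harness for the last two tests (URL, table and column names are placeholders); swapping the shared connection for one connection per thread is the only difference between the two runs:

import java.sql.*;
import java.util.List;
import java.util.concurrent.*;

ExecutorService pool = Executors.newFixedThreadPool(4);
Connection shared = DriverManager.getConnection(url, user, pass);   // same connection for every thread

for (String keyword : List.of("abcde", "fghij", "klmno", "pqrst")) {
    pool.submit(() -> {
        long start = System.nanoTime();
        try (PreparedStatement ps = shared.prepareStatement(
                "SELECT id FROM big_table WHERE payload LIKE ?")) {
            ps.setString(1, "%" + keyword + "%");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) { /* consume the few matches */ }
            }
        }
        System.out.printf("%s finished after %d s%n", keyword, (System.nanoTime() - start) / 1_000_000_000L);
        return null;
    });
}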
To add on to what Joni mentioned earlier, his conclusion matches the behavior I'm seeing on Postgres as well. It appears that all "correctness" is preserved, but all parallelism benefits are lost, if multiple queries are sent on the same connection at the same time.
Since the JDBC spec doesn't give guarantees of concurrent execution, this question can only be answered by testing the drivers you're interested in, or reading their source code.
In the case of MySQL Connector/J, all methods to execute statements lock the connection with a synchronized block. That is, if one thread is running a query, other threads using the connection will be blocked until it finishes.
Doing things the wrong way will have undefined results... if someone runs some tests, maybe they'll answer all your questions exactly, but then a new JVM comes out, or someone tries it on another JDBC driver or database version, or they hit a different set of race conditions, or tries another platform or JVM implementation, and another, different undefined result happens.
If two threads modify the same state at the same time, anything could happen depending on the timing. Maybe the 2nd one overwrites the first's query, and then both run the same query. Maybe the library will detect your error and throw an exception. I don't know and wouldn't bother testing... (or maybe someone already knows or it should be obvious what would happen) so this isn't "the answer", but just some advice. Just use a connection pool, or use a synchronized block to ensure problems don't happen.
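If you really must share one connection, the crude version of that advice is just a monitor around every use (a sketch; it keeps things correct at the price of serializing all DB work):

// Every thread takes the same lock before touching the shared connection.
synchronized (sharedConnection) {
    try (PreparedStatement ps = sharedConnection.prepareStatement("SELECT 1")) {
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) { /* ... */ }
        }
    }
}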
We had to disable the statement cache on WebSphere, because it was throwing an ArrayIndexOutOfBoundsException at the PreparedStatement level.
The issue was that some guy thought it was smart to share a connection with multiple threads.
He said it was to save connections, but there is no point multithreading queries because the DB won't run them in parallel.
There was also an issue with Java Runnables that were blocking each other because they used the same connection.
So that's just something not to do; there is nothing to gain.
There is an option in WebSphere to detect this multithreaded access.
I implemented my own since we use Jetty in development.

Mysql/JDBC: Deadlock

I have a J2EE server, currently running only one thread (the problem arises even within one single request), to save its internal model of data to MySQL/InnoDB tables.
Basic idea is to read data from flat files, do a lot of calculation and then write the result to MySQL. Read another set of flat files for the next day and repeat with step 1. As only a minor part of the rows change, I use a recordset of already written rows, compare to the current result in memory and then update/insert it correspondingly (no delete, just setting a deletedFlag).
Problem: Despite a purely sequential process I get lock timeout errors (#1204) and Innodump shows record locks (though I do not know how to figure out the details). To complicate things, on my Windows machine everything works, while the production system (where I can't install innotop) has some record locks.
To the critical code:
Read data and calculate (works)
Get Connection from Tomcat Pool and set to autocommit=false
Use Statement to issue "LOCK TABLES order WRITE"
Open Recordset (updatable) on table order
For each row in Recordset --> if difference, update from in-memory-object
For objects not yet in the database --> Insert data
Commit Connection, Close Connection
Steps 5/6 have a commit counter so that the rows are committed every 500 changes (to avoid having 50,000 rows uncommitted). In the first run (so without any locks) this takes at most 30 sec per table.
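Roughly, the critical section looks like this (a sketch, not my literal code; the in-memory model and column names are simplified):

try (Connection conn = dataSource.getConnection()) {
    conn.setAutoCommit(false);
    try (Statement lock = conn.createStatement()) {
        lock.execute("LOCK TABLES `order` WRITE");
    }
    int pending = 0;
    try (Statement st = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_UPDATABLE);
         ResultSet rs = st.executeQuery("SELECT id, amount, deletedFlag FROM `order`")) {
        while (rs.next()) {
            OrderRow mem = model.get(rs.getLong("id"));   // in-memory object (placeholder type)
            if (mem != null && mem.differsFrom(rs)) {
                rs.updateBigDecimal("amount", mem.amount);
                rs.updateRow();
                if (++pending % 500 == 0) {
                    conn.commit();        // commit counter: flush every 500 changes
                }
            }
        }
    }
    // ... insert the objects not yet in the database, with the same commit counter ...
    conn.commit();
}   // closing the connection releases the table lock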
As stated above, right now I avoid any other interaction with the database, but in the future other processes (user requests) might read data or even write some fields. I would not mind those processes reading either old or new data, or waiting for a couple of minutes to save their changes to the DB (that is, for a lock).
I would be happy about any recommendation to do better than that.
Summary: Complex code calculates in-memory objects which are to be synchronized with the database. This sync currently seems to lock itself, despite the fact that it sequentially locks, changes and unlocks the tables without any exceptions thrown. But for some reason row locks seem to remain.
Kind regards
Additional information:
MySQL: show processlist lists no active connections (all asleep or alternatively waiting for table locks on table order), while "show engine INNODB" reports a number of row locks (unfortunately I can't tell which transaction is meant, as the output is quite cryptic).
Solved: I wrongly declared a ResultSet as updatable. The ResultSet was closed only in a "finalize()" method via the garbage collector, which was not fast enough - before that happened I reopened the ResultSet and therefore tried to acquire a lock on an already locked table.
Yet it was odd, that innotop showed another query of mine to hang on a completely different table. Though as it works for me, I do not care about oddities:-)
