Why does a SELECT wait for a lock?

Why does a SELECT wait for a lock? - java

In my application I have the problem that sometimes SELECT statements run into a java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction exception. Sadly I can't create an example as the circumstances are very complex. So the question is just about general understanding.
A little bit of background information: I'm using MySQL (InnoDB) with READ_COMMITED isolation level.
Actually I don't understand how a SELECT can ever run into a lock timeout with that setup. I thought that a SELECT would never lock as it will just return the latest commited state (managed by MySQL). Anyway according to what is happening this seems to be wrong. So how is it really?
I already read this https://dev.mysql.com/doc/refman/8.0/en/innodb-locking.html but that didn't really give me a clue. No SELECT ... FOR UPDATE or something like that is used.

That is probably due to your database. Usually this kind of problems come from that side, not from the programming side that access it.In my experience with db's, these problems are usually due to that. In the end, the programming side is just "go and get that for me in that db" thing.
I found this without much effort.
It basically explains that:
Lock wait timeout occurs typically when a transaction is waiting on row(s) of data to update which is already been locked by some other transaction.
You should also check this answer that has a specific transaction problem, which might help you, as trying to change different tables might do the timeout
the query was attempting to change at least one row in one or more InnoDB tables. Since you know the query, all the tables being accessed are candidates for being the culprit.

To speed up queries in a DB, several transactions can be executed at the same time. For example if someone runs a select query over a table for the wages of the employees of a company (each employee identified by an id) and another one changes the last name of someone who e.g. has married, you can execute both queries at the same time because they don't interfere.
But in other cases even a SELECT statement might interfere with another statement.
To prevent unexpected results in a SQL transactions, transactions follow the ACID-model which stands for Atomicity, Consistency, Isolation and Durability (for further information read wikipedia).
Let's say transaction 1 starts to calculate something and then wants to write the results to table A. Before writing it it locks all SELECT statements to table A. Otherwise this would interfere with the Isolation requirement. Because if a transaction 2 would start while 1 is still writing, 2's results depend on where 1 has already written to and where not.
Now, it might even produce a dead-lock. E.g. before transaction 1 can write the last field in table A, it still has to write something to table B but transaction 2 has already blocked table B to read safely from it after it read from A and now you have a deadlock. 2 wants to read from A which is blocked by 1, so it waits for 1 to finish but 1 waits for 2 to unlock table B to finish by itself.
To solve this problem one strategy is to rollback certain transactions after a certain timeout. (more here)
So that might be a read on for your select statement to get a lock wait timeout exceeded.
But a dead-lock usually just happens by coincidence, so if transaction 2 was forced to rollback, transaction 1 should be able to finish so that 2 should be able to succeed on a later try.

Related

Can executeQuery and executeBatch be used simultaneously with one connection for each?

I have a use case where in I need to process some queries in batch ( stmt.addBatch() ) and some queries need to be executed as soon as they are created (stmt.executeQuery() ) because their result will be used in the queries going for batch processing. The database is (obviously) common for both. My question is can I keep two open connections throughout for the above use case ? Will that be a good idea considering consistency & viability in mind ?
Edit : What if the queries in question were to act upon different records, can 2 simultaneous active connection objects be kept alive for handling each scenario ?
P.S. - I am relatively new to backend & databases specifically, request you to include some explanation / further reading pointers for the same.

In your case it is basically not predictable which will start first as also processors can shift commands ad risc level.
in a view of a rdms, you can have simultaneous access to the same table/ row for that you can read more about locking
for a executebatch is one transaction for the hole batch, so if it would start first, your single command would be locked out, till the lock was lifted, and then proceed if not the timeout hits first.
so for one rdms the consistency is guaranteed, for replicated system it gets more complicated
Oh I see an edit:
Row level locks are mentioned in the first link, which would answer your edit also:

"Select for update" and update with pessimistic locking

I'm trying to implement pessimistic locking using select for update, as I want other threads to wait until the lock on the selected row is released.
The part that I have understood is after going through multiple threads Spring JDBC select for update and various similar threads is it is achievable in case select and update are happening within same method and hence they are part of same transaction.
The issue in my case is I have a JAR for DAO functionality where in a selectforUpdate method is available and a separate update method is available, both method has a finally block which contains
resultSet.close();
statement.close();
connection.close();
Now I'm struggling to find out is there a way in which I can use both the methods from outside of the JAR, maybe by annotating my method with #Transactional annotation and make it work in some way. So that lock is only released once update method has been executed.

You're making a mistake. Using the wrong tool for the job. Transaction levels and FOR UPDATE has the purpose of ensuring data integrity. Period. It it isn't designed for control flow and if you use it for this, it will bite you in the butt sooner rather than later.
Let me try to explain what SELECT FOR UPDATE is for, so that, when later I tell you that it is most definitely not for what you're trying to do with it, it is easier to follow.
Imagine a bank. Simple enough. The bank has some ATMs out front and a website where you can see your transactions and transfer money to other accounts.
Imagine you (ABC) and I (Reinier) are trying to fleece the bank some. Here is our plan: We set it up so that you have €1000,- in your account and I have nothing.
Then, you log into the website from your phone, and start a transfer, transferring €1000,- to my account. But, while you're doing that, right in the middle, you withdraw €10,- from the ATM.
If the bank messed up their transactions, it's possible you end up with €990,- in your account and I have €1000,- in my account, and we fleeced the bank. This is how that could happen (and if halfway through the example you think: I already know this stuff, I know what FOR UPDATE does! - I'm not so sure you do, read it carefully)
ATM code
startTransaction();
int currentBalance = sql("SELECT balance FROM account WHERE user = ?", abc);
if (currentBalance < requestedWithdrawal) throw new InsufficientFundsEx();
sql("UPDATE account SET balance = ? WHERE user = ?", currentBalance - requestedWithdrawal, abc);
commit();
moneyHopper.spitOut(requestedWithdrawal();
Website code
startTransaction();
int balanceTo = sql("SELECT balance FROM account WHERE user = ?", reinier);
int balanceFrom = sql("SELECT balance FROM account WHERE user = ?", abc);
if (transfer > balanceFrom) throw new InsufficientFundsEx();
sql("UPDATE account SET balance = ? WHERE user = ?", balanceTo + transfer, reinier);
sql("UPDATE account SET balance = ? WHERE user = ?", balanceFrom - transfer, abc);
commit();
controller.notifyTransferSucceeded();
How it can go wrong
The way it goes wrong is if the balanceTo and balanceFrom are 'locked in', then the ATM withdrawal goes through, and then the update SQL statements from the website transaction go through (this wipes out the ATM withdrawal, effectively - whatever the ATM spit out is free money), or if the ATM's balance check locks in, then the transfer goes through, and then the ATM's update goes through (which gives the recipient, i.e. me their €1000,-, and ensures that the ATM code's update, setting your balance to 990, is the last thing that happens, giving us €990,- of free money.
So what's the fix? Hint: Not FOR UPDATE
The fix is to consider what a transaction means. The purpose of transactions is to turn operations into atomic notions. Either both your account is reduced by the transfer amount and mine is raised by the same, or nothing happens.
It's obvious enough with statements that change things (UPDATE and INSERT). It's a bit more wonky when we talk about reading data. Should those reads be considered part of the transaction?
One way to go is to say: No, unless you add FOR UPDATE at the end of it all, in which case, yes - i.e. lock those rows only if FOR UPDATE is applied until the transaction ends.
But that is not the only way to ensure data integrity.
Optimistic locking to the rescue - or rather, to your doom
A much more common way is called MVCC (MultiVersion Concurrency Control) and is far faster. The idea behind MVCC (also called optimistic locking), is to just assume no clashes ever occur. Nothing is ever locked. Instead, [A] all changes made within a transaction are completely invisible to things running in any other transaction until you commit, and [B] when you COMMIT a transaction, the database checks if everything you have done within the span of this transaction still 'holds up' - for example, if you updated a row within this transaction that was also modified by another transaction that has committed already, you get an error when you commit, not when you ran the UPDATE statement.
In this framework, we can still talk about what SELECT even means. This, in java/JDBC, is called the Transaction Isolation Level and is configurable on a DB connection. The best level, the level the bank should be using to avoid this issue, is called the TransactionLevel.SERIALIZABLE. Serializable effectively means everything dirties everything else: If during a transaction you read some data, and when you commit, that same SELECT statement would have produced different results because some other transaction modified something, then the COMMIT just fails.
They fail with a so-called 'RetryException'. This means literally what it says: Just start your transaction over, from the top. It makes sense if you think about that bank example: What WOULD have happened, had the bank done it right and set up serializable transaction isolation level, is that either the ATM machine's transaction or the transfer transaction would get the retryexception. Assuming the bank wrote their code right and they actually do what the exception tells you to (start over), then they would start over, and that includes re-reading the balances out. No cheating of the bank can occur now.
Crucially, in the SERIALIZABLE model, locking NEVER occurs, and FOR UPDATE does not mean anything at all.
Thus, usually, FOR UPDATE does literal stone cold nothing, a complete no-op, depending on how the db is setup.
FOR UPDATE does not mean 'lock other transactions that touch this row'. No matter how much you want it to.
Some DB implementations, or even some combination of DB engine and connection configuration may be implemented in that fashion, but that is an extremely finicky setup, and your app should include documentation that strongly recommends the operator to never change the db settings, never switch db engines, never update the db engine, never update the JDBC driver, and never mess with the connection settings.
That's the kind of silly caveat you really, really don't want to put on your code.
The solution is to stop buttering your toast with that chainsaw. Even if you think you can manage to get some butter on that toast with it, it's just not what it was made for, like at all, and we're all just waiting until you lose a thumb here. Just stop doing it. Get a butterknife, please.
If you want to have one thread wait for another, don't use the database, use a lock object. If you want to have one process wait for another, don't use the database, don't use a lock object (you can't; processes don't share memory); use a file. the new java file IO has an option to make a file atomically (meaning, if the file already exists, throw an exception, otherwise make the file, and do so atomically, meaning if two processes both run this 'create atomically new file' code, you have a guarantee that one succeeds and one throws).
If you want data integrity and that's the only reason you wanted pessimistic locking in the first place, stop thinking that way - it's the DBs job, not your job, to guarantee data integrity. MVCC/Optimistic locking DBs guarantee that the bank will never get fleeced no matter how hard you try with the shenanigans at the top of this answer and nevertheless, pessimistic locking just isn't involved.
JDBC itself sucks (intentionally, a bit too much to get into) for 'end use' like what you are doing here. Get yourself an abstraction that makes it nice such as JDBI or JOOQ. These tools also have the only proper way to interact with databases, which is that all DB code must be in a lambda. That's because you don't want to manually handle those retry exceptions, you want your DB access framework to take care of it. This is what the bank code should really look like:
dbAccess.run(db -> {
int balance = db.sql("SELECT balance FROM account WHERE user =?", abc);
if (balance < requested) throw new InsufficientBalanceEx();
db.update("UPDATE account SET balance = ? WHERE user = ?", balance - requested, abc);
return requested;
};
This way, the 'framework' (the code behind that run method) can catch the retryex and just rerun the lambda as often as it needs to. rerunning is tricky - if two threads on a server both cause the other to retry, which is not that hard to do, then you can get into an endless loop where they both restart and both again cause the other to retry, at infinitum. The solution is literally dicethrowing. When retrying, you should roll a random number and wait that many milliseconds, and for every further retry, the range on which you're rolling should increase. If this sounds dumb to you, know that you're currently using it: It's how Ethernet works, too (ethernet uses randomized backoff when collisions occur on the wire). Ethernet won, token ring lost. It's the exact same principle at work (token ring is pessimistic locking, ethernet is optimistic 'eh just try it and detect if it went wrong, then just redo it, with some randomized exponential backoff sprinkled in to ensure you don't get 2 systems in lock-step forever screwing up the other's attempt).

Concurrent update and delete on one table

Using Java, Hibernate and Oracle database.
I have two concurrent processes:
Process1 removes some entities from table1. (multiple: delete from table1 where id =...) Done by native hibernate query.
Process2 updates SAME/other entities in table1. (multiple: update table1 set name=... where id=...) Done by jpa repository delete method.
Currently sometimes exception
CannotAcquireLockException is thrown,
(SQL Error: 60, SQLState: 61000..
ORA-00060: deadlock detected while waiting for resource)
So, the question is: what is going on and how I can avoid exception? Any workaround?
IMPORTANT: In case of collisions I would be satisfied if delete succeeds and update won't do anything.

Session A waits for B, B waits for A - this is what a deadlock basically is.
Nothing to wait for any more, Oracle kills either of the sessions.
Option 1
Create semaphore to effectively serialize concurrent processes.
create table my_semaphore(dummy char(1));
Session 1:
LOCK TABLE my_semaphore in exclusive mode;
UPDATE <your update here>;
COMMIT;
Session 2:
LOCK TABLE my_semaphore in exclusive mode;
DELETE <your delete here>;
COMMIT;
Option 2
Try processing rows with both statements in the same order, say by rowid or whatever.
So that session B never returns to rows held by A, if A is stuck in behind by rows locked by B. This more tricky and resource-intesive.

"locking tables doesnt look attractive at all -what the point then of having severaal processes working with database"
Obviously we want to enable concurrent processes. The trick is to design processes which can run concurrently without interfering with each other. Your architecture is failing to address this point. It should not be possible for Process B to update records which are being deleted by Process A.
This is an unfortunate side-effect of the whole web paradigm which is stateless and favours an optimistic locking strategy. Getting locks at the last possible moment "scales" but incurs the risk of deadlock.
The alternative is a pessimistic locking strategy, in which a session locks the rows it wants upfront. In Oracle we can do this with SELECT .. FOR UPDATE. This locks a subset of rows (the set defined by the WHERE clause) and not the whole table. Find out more.
So it doesn't hinder concurrent processes which operate on different subsets of data but it will prevent a second session grabbing records which are already being processed. This still results in an exception for the second session but at least that happens before the session has done any work, and provides information to re-evaluate the task (hmmm, do we want to delete these records if they're being updated?).
Hibernate supports the SELECT FOR UPDATE syntax. This StackOverflow thread discusses it.

Transaction in PostgreSql

I'd like to realize following scenario in PosgreSql from java:
User selects data
User starts transaction: inserts, updates, deletes data
User commits transaction
I'd like data not be available for other users during the transaction. It would be enough if I'd get an exception when other user tries to update the table.
I've tried to use select for update or select for share, but it locks data for reading also. I've tried to use lock command, but I'm not able to get a lock (ERROR: could not obtain lock on relation "fppo10") or another transaction gets lock when trying to commit transaction, not when updating the data.
Does it exist a way to lock data in a moment of transaction start to prevent any other call of update, insert or delete statement?
I have this scenario working successfully for a couple of years on DB2 database. Now I need the same application to work also for PostgreSql.

Finally, I think I get what you're going for.
This isn't a "transaction" problem per se (and depending on the number of tables to deal with and the required statements, you may not even need one), it's an application design problem. You have two general ways to deal with this; optimistic and pessimistic locking.
Pessimistic locking is explicitly taking and holding a lock. It's best used when you can guarantee that you will be changing the row plus stuff related to it, and when your transactions will be short. You would use it in situations like updating "current balance" when adding sales to an account, once a purchase has been made (update will happen, short transaction duration time because no further choices to be made at that point). Pessimistic locking becomes frustrating if a user reads a row and then goes to lunch (or on vacation...).
Optimistic locking is reading a row (or set of), and not taking any sort of db-layer lock. It's best used if you're just reading through rows, without any immediate plan to update any of them. Usually, row data will include a "version" value (incremented counter or last updated timestamp). If your application goes to update the row, it compares the original value(s) of the data to make sure it hasn't been changed by something else first, and alerts the user if the data changed. Most applications interfacing with users should use optimistic locking. It does, however, require that users notice and pay attention to updated values.
Note that, because a lock is rarely (and for a short period) taken in optimistic locking, it usually will not conflict with a separate process that takes a pessimistic lock. A pessimistic locking app would prevent an optimistic one from updating locked rows, but not reading them.
Also note that this doesn't usually apply to bulk updates, which will have almost no user interaction (if any).
tl;dr
Don't lock your rows on read. Just compare the old value(s) with what the app last read, and reject the update if they don't match (and alert the user). Train your users to respond appropriately.

Instead of select for update try a "row exclusive" table lock:
LOCK TABLE YourTable IN ROW EXCLUSIVE MODE;
According to the documentation, this lock:
The commands UPDATE, DELETE, and INSERT acquire this lock mode on the
target table (in addition to ACCESS SHARE locks on any other
referenced tables). In general, this lock mode will be acquired by any
command that modifies data in a table.
Note that the name of the lock is confusing, but it does lock the entire table:
Remember that all of these lock modes are table-level locks, even if
the name contains the word "row"; the names of the lock modes are
historical

Mysql/JDBC: Deadlock

I have a J2EE server, currently running only one thread (the problem arises even within one single request) to save its internal model of data to MySQL/INNODB-tables.
Basic idea is to read data from flat files, do a lot of calculation and then write the result to MySQL. Read another set of flat files for the next day and repeat with step 1. As only a minor part of the rows change, I use a recordset of already written rows, compare to the current result in memory and then update/insert it correspondingly (no delete, just setting a deletedFlag).
Problem: Despite a purely sequential process I get lock timeout errors (#1204) and Innodump show record locks (though I do not know how to figure the details). To complicate things under my windows machine everything works, while the production system (where I can't install innotop) has some record locks.
To the critical code:
Read data and calculate (works)
Get Connection from Tomcat Pool and set to autocommit=false
Use Statement to issue "LOCK TABLES order WRITE"
Open Recordset (Updateable) on table order
For each row in Recordset --> if difference, update from in-memory-object
For objects not yet in the database --> Insert data
Commit Connection, Close Connection
The Steps 5/6 have an Commitcounter so that every 500 changes the rows are committed (to avoid having 50.000 rows uncommitted). In the first run (so w/o any locks) this takes max. 30sec / table.
As stated above right now I avoid any other interaction with the database, but it in future other processes (user requests) might read data or even write some fields. I would not mind for those processes to read either old/new data and to wait for a couple of minutes to save changes to the db (that is for a lock).
I would be happy to any recommendation to do better than that.
Summary: Complex code calculates in-memory objects which are to be synchronized with database. This sync currently seems to lock itself despite the fact that it sequentially locks, changes unlocks the tables without any exceptions thrown. But for some reason row locks seem to remain.
Kind regards
Additional information:
Mysql: show processlist lists no active connections (all asleep or alternatively waiting for table locks on table order) while "show engine INNODB" reports a number of row locks (unfortuantely I can't understand which transaction is meant as output is quite cryptic).

Solved: I wrongly declared a ResultSet as updateable. The ResultSet was closed only on a "finalize()" method via Garbage Collector which was not fast enough - before I reopended the ResultSet and tried therefore to aquire a lock on an already locked table.
Yet it was odd, that innotop showed another query of mine to hang on a completely different table. Though as it works for me, I do not care about oddities:-)

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.