"Select for update" and update with pessimistic locking - java

I'm trying to implement pessimistic locking using select for update, as I want other threads to wait until the lock on the selected row is released.
From going through Spring JDBC select for update and various similar threads, I have understood that this is achievable when the select and the update happen within the same method, so that they are part of the same transaction.
The issue in my case is that I have a JAR for DAO functionality in which a selectforUpdate method and a separate update method are available; both methods have a finally block which contains
resultSet.close();
statement.close();
connection.close();
Now I'm struggling to find out whether there is a way I can use both methods from outside the JAR, perhaps by annotating my method with the @Transactional annotation, so that the lock is only released once the update method has been executed.

You're making a mistake: using the wrong tool for the job. Transaction levels and FOR UPDATE have one purpose: ensuring data integrity. Period. They aren't designed for control flow, and if you use them for that, it will bite you in the butt sooner rather than later.
Let me try to explain what SELECT FOR UPDATE is for, so that, when later I tell you that it is most definitely not for what you're trying to do with it, it is easier to follow.
Imagine a bank. Simple enough. The bank has some ATMs out front and a website where you can see your transactions and transfer money to other accounts.
Imagine you (ABC) and I (Reinier) are trying to fleece the bank some. Here is our plan: We set it up so that you have €1000,- in your account and I have nothing.
Then, you log into the website from your phone, and start a transfer, transferring €1000,- to my account. But, while you're doing that, right in the middle, you withdraw €10,- from the ATM.
If the bank messed up their transactions, it's possible you end up with €990,- in your account and I end up with €1000,- in mine, and we've fleeced the bank. This is how that could happen (and if halfway through the example you think "I already know this stuff, I know what FOR UPDATE does!" - I'm not so sure you do; read it carefully):
ATM code
startTransaction();
int currentBalance = sql("SELECT balance FROM account WHERE user = ?", abc);
if (currentBalance < requestedWithdrawal) throw new InsufficientFundsEx();
sql("UPDATE account SET balance = ? WHERE user = ?", currentBalance - requestedWithdrawal, abc);
commit();
moneyHopper.spitOut(requestedWithdrawal);
Website code
startTransaction();
int balanceTo = sql("SELECT balance FROM account WHERE user = ?", reinier);
int balanceFrom = sql("SELECT balance FROM account WHERE user = ?", abc);
if (transfer > balanceFrom) throw new InsufficientFundsEx();
sql("UPDATE account SET balance = ? WHERE user = ?", balanceTo + transfer, reinier);
sql("UPDATE account SET balance = ? WHERE user = ?", balanceFrom - transfer, abc);
commit();
controller.notifyTransferSucceeded();
How it can go wrong
The way it goes wrong: if the balanceTo and balanceFrom are read, then the ATM withdrawal goes through, and then the update SQL statements from the website transaction go through, this effectively wipes out the ATM withdrawal (whatever the ATM spat out is free money). Or, if the ATM's balance check happens first, then the transfer goes through, and then the ATM's update goes through (which gives the recipient, i.e. me, their €1000,-, and makes the ATM code's update, setting your balance to 990, the last thing that happens, giving us €990,- of free money).
So what's the fix? Hint: Not FOR UPDATE
The fix is to consider what a transaction means. The purpose of transactions is to turn operations into atomic notions. Either both your account is reduced by the transfer amount and mine is raised by the same, or nothing happens.
It's obvious enough with statements that change things (UPDATE and INSERT). It's a bit more wonky when we talk about reading data. Should those reads be considered part of the transaction?
One way to go is to say: no, unless you add FOR UPDATE to the query, in which case, yes - i.e. those rows are locked (but only if FOR UPDATE is applied) until the transaction ends.
But that is not the only way to ensure data integrity.
Optimistic locking to the rescue - or rather, to your doom
A much more common way is called MVCC (MultiVersion Concurrency Control) and is far faster. The idea behind MVCC (also called optimistic locking) is to just assume no clashes ever occur. Nothing is ever locked. Instead, [A] all changes made within a transaction are completely invisible to things running in any other transaction until you commit, and [B] when you COMMIT a transaction, the database checks whether everything you have done within the span of this transaction still 'holds up' - for example, if you updated a row within this transaction that was also modified by another transaction that has committed already, you get an error when you commit, not when you ran the UPDATE statement.
In this framework, we still need to define what a SELECT even means. This, in java/JDBC, is governed by the Transaction Isolation Level, which is configurable on a DB connection. The strictest level, the level the bank should be using to avoid this issue, is Connection.TRANSACTION_SERIALIZABLE. Serializable effectively means everything dirties everything else: if during a transaction you read some data, and, by the time you commit, that same SELECT statement would have produced different results because some other transaction modified something, then the COMMIT just fails.
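In plain JDBC, making that choice looks something like this (a minimal sketch; url, user, and pass are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;

try (Connection con = DriverManager.getConnection(url, user, pass)) {
    con.setAutoCommit(false); // take control of the transaction boundaries
    con.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
    // ... the SELECTs and UPDATEs that form one atomic unit of work ...
    con.commit();
}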
Such commits fail with a so-called 'retry exception'. This means literally what it says: just start your transaction over, from the top. It makes sense if you think about that bank example: what WOULD have happened, had the bank done it right and set up the serializable transaction isolation level, is that either the ATM transaction or the transfer transaction would get the retry exception. Assuming the bank wrote their code right and actually does what the exception tells you to (start over), they would start over, and that includes re-reading the balances. No cheating of the bank can occur now.
Crucially, in the SERIALIZABLE model, locking NEVER occurs, and FOR UPDATE does not mean anything at all.
Thus, FOR UPDATE usually does literal stone-cold nothing - a complete no-op - depending on how the db is set up.
FOR UPDATE does not mean 'lock other transactions that touch this row'. No matter how much you want it to.
Some DB implementations, or even some combinations of DB engine and connection configuration, may implement it in that fashion, but that is an extremely finicky setup, and your app would need documentation that strongly recommends the operator never change the db settings, never switch db engines, never update the db engine, never update the JDBC driver, and never mess with the connection settings.
That's the kind of silly caveat you really, really don't want to put on your code.
The solution is to stop buttering your toast with that chainsaw. Even if you think you can manage to get some butter on that toast with it, it's just not what it was made for, like at all, and we're all just waiting until you lose a thumb here. Just stop doing it. Get a butterknife, please.
If you want to have one thread wait for another, don't use the database; use a lock object. If you want to have one process wait for another, don't use the database and don't use a lock object (you can't; processes don't share memory); use a file. The new java file IO has an option to create a file atomically (meaning: if the file already exists, throw an exception, otherwise create it, and do so atomically - if two processes both run this 'atomically create new file' code, you have a guarantee that one succeeds and one throws).
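For the thread case, that lock object can be as simple as this (a minimal sketch using java.util.concurrent; the names are made up):

import java.util.concurrent.locks.ReentrantLock;

// One lock object, shared by every thread that touches the resource.
static final ReentrantLock RESOURCE_LOCK = new ReentrantLock();

void updateResource() {
    RESOURCE_LOCK.lock(); // blocks until whoever holds the lock calls unlock()
    try {
        // ... read-modify-write the shared state here ...
    } finally {
        RESOURCE_LOCK.unlock();
    }
}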
If you want data integrity, and that's the only reason you wanted pessimistic locking in the first place, stop thinking that way - it's the DB's job, not yours, to guarantee data integrity. MVCC/optimistic locking DBs guarantee that the bank will never get fleeced no matter how hard you try with the shenanigans at the top of this answer, and yet pessimistic locking just isn't involved.
JDBC itself sucks (intentionally; the reasons are a bit too much to get into here) for 'end use' like what you are doing here. Get yourself an abstraction that makes it nice, such as JDBI or JOOQ. These tools also offer the only proper way to interact with databases, which is that all DB code must be in a lambda. That's because you don't want to manually handle those retry exceptions; you want your DB access framework to take care of it. This is what the bank code should really look like:
dbAccess.run(db -> {
    int balance = db.sql("SELECT balance FROM account WHERE user = ?", abc);
    if (balance < requested) throw new InsufficientBalanceEx();
    db.update("UPDATE account SET balance = ? WHERE user = ?", balance - requested, abc);
    return requested;
});
This way, the 'framework' (the code behind that run method) can catch the retry exception and just rerun the lambda as often as it needs to. Rerunning is tricky: if two threads on a server each cause the other to retry, which is not that hard to do, you can get into an endless loop where they both restart and both again cause the other to retry, ad infinitum.

The solution is literally dice-throwing. When retrying, you roll a random number and wait that many milliseconds, and for every further retry, the range you roll on should increase. If this sounds dumb to you, know that you're already relying on it: it's how Ethernet works, too (Ethernet uses randomized backoff when collisions occur on the wire). Ethernet won; token ring lost. It's the exact same principle at work (token ring is pessimistic locking; Ethernet is optimistic: just try it, detect if it went wrong, then redo it, with some randomized exponential backoff sprinkled in to ensure you don't get two systems in lock-step forever screwing up each other's attempts).
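A sketch of what that retry loop with a randomized, widening backoff could look like (RetryException is hypothetical; it stands in for whatever your DB layer throws on a serialization failure):

import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

<T> T runWithRetry(Supplier<T> work) throws InterruptedException {
    long range = 16; // milliseconds; the 'dice range' to roll on
    while (true) {
        try {
            return work.get();
        } catch (RetryException e) { // hypothetical; depends on your framework
            Thread.sleep(ThreadLocalRandom.current().nextLong(range));
            range *= 2; // widen the range on every further retry
        }
    }
}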

Related

Need a locking a mechanism (Persistent lock) which will not be lost even if application restarts?

I have the following scenario. There are 2 applications that share a database. Both of these applications can be used to alter the underlying database; for example, Customer 1 can be modified from both systems. I want to make sure that when someone performs an action on, say, customer 1 in application 1, a persistent lock is taken on that customer so that nobody from application 2 can perform any action on the same customer. Even if either of these applications goes down, it should still hold the lock. What would be the right approach for solving such an issue?
As @Turing85's comment hints at, this is extremely dangerous territory: if someone trips over a power cable, your app is out of the running and cannot be started again. Permanently. At least, until someone goes in and manually addresses the problem. This is rarely what you want.
The normal solution is to do the locking at the DB level: if it is a 'single file is the database' model, such as H2 or SQLite, then let the DB engine lock the file for writing, and let the OS-level file lock serve as your gating mechanism. This has the considerable advantage that if app A falls out of the air for any reason (power shortage, hard crash, who knows), the lock is relinquished.
If the DB is a separate running process (psql, mysql, mssql, etc), those have locking features you can use.
If none of those options are available, you can handroll it: You can make files with the new file API that are guaranteed atomic/unique:
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

int pid = 0; // see below for how to obtain the process id
Path p = Paths.get("/absolute/path/to/agreed/upon/lockfile/location/lockfile.pid");
Files.write(p, String.valueOf(pid).getBytes(StandardCharsets.UTF_8), StandardOpenOption.CREATE_NEW);
The CREATE_NEW open option is asking java to ensure atomicity: Either [A] the file did not exist before, exists now, and it is this process that made it, or [B] this will throw.
There is no [C] 'this process created it, but another process unluckily was doing the same thing at the same time and also created it, and one of these processes is now overwriting the efforts of the other'. That is what CREATE_NEW means and guarantees: that this won't happen (vs the CREATE option, which will happily write over what's there and makes no atomicity guarantees).
You can now use the file system as your global unique lock: to acquire the lock, you create that file. If you can, great: you got it. If you can't, then you have to wait (you'll need to use the watcher API or a polling loop if you care about acquiring it as soon as you can - not a great option; it is a very expensive operation compared to in-process locking!). To relinquish the lock, simply delete the file.
To guard against a hard crash leaving the file there, stuck, permanently, preventing your app from ever running again, it can help to register the 'pid' (process id) inside it. This gives you something to debug with if you are manually fixing matters, and you can use it to check automatically ('hey, OS, is there even still a process running with id 123981? No? Okay, then it must have hard-crashed and left the lock file in place!'). Unfortunately, working with pids in java is convoluted, as java is more or less designed around the notion that you shouldn't rely too much on the underlying OS, and java does not really assume that 'process IDs' are a thing the underlying OS does. Google around for how to obtain it; you CAN do this.
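(Side note: on Java 9 and newer, the ProcessHandle API makes both steps straightforward. A sketch, where pidFromFile is assumed to have been read back out of the lock file:)

long myPid = ProcessHandle.current().pid(); // the id to write into the lock file

boolean ownerAlive = ProcessHandle.of(pidFromFile)
        .map(ProcessHandle::isAlive)
        .orElse(false);
if (!ownerAlive) {
    // the lock holder hard-crashed; safe to delete the stale lock file and retry
}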
Which gets us to the final point: your evident fear of inconsistency. After all, you actually appear to desire the clearly insane notion that the app be permanently disabled after a hard crash (a process crashes and the lock is not explicitly relinquished). I assume you want this because you are afraid that the database is left in an inconsistent state and you don't want anything to ever touch it again until you have manually looked at it.
Oookay, well, the lock file business is precisely how you get that. However, this is rather user-hostile, and not needed: you can design databases and process flows (using transactions, append-only tables, and journal systems) so that they always cleanly survive hard crashes.
For example, consider file systems. In ye old aged sepia toned past, when you stumbled over your power cord, then on bootup you'd get a nasty thing where the system would do a 'full disk check', and it may well find a bunch of errors.
But on modern systems this is no longer the case. Trip over that power cord all day long. You won't get corrupted files (unless processes are badly designed, in which case the corruption is the fault of the app, not the file system), and no extensive disk checks are needed.
This works primarily by a concept known as 'journalling'.
Say, you want to replace a file that reads "Hello, World!" with the text "Goodbye now!". You could just start writing bytes. Let's say you get to "Goodb, World!" and then someone trips over a cable.
You're now hosed. The data is inconsistent and who knows what was happening.
But imagine a different system:
Journalling
The system first makes a file called '.jobrecord' and writes in it: "I'm going to open this file and overwrite the data at the start with 'Goodbye now!'."
Then, it actually goes ahead and does that.
Then, it deletes the job record in an atomic way (by updating a single byte for example, to mark: "done").
Now, on bootup, the system can check if that file is there, and if it is, check that the job was actually done, or finish it if need be. Voila, now you can never have an inconsistent system.
You can write such tools too.
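A minimal sketch of the idea in java; the file names and the job record format here are made up:

import java.nio.charset.StandardCharsets;
import java.nio.file.*;

// 1. Record the intent first, atomically.
Path journal = Paths.get("work.jobrecord");
Files.write(journal, "target.txt|Goodbye now!".getBytes(StandardCharsets.UTF_8),
        StandardOpenOption.CREATE_NEW);
// 2. Actually do the work.
Files.write(Paths.get("target.txt"), "Goodbye now!".getBytes(StandardCharsets.UTF_8));
// 3. Mark the job done by deleting the record.
Files.delete(journal);
// On startup: if work.jobrecord exists, the job described in it may be half-done;
// redo (or finish) it, then delete the record. The write is now crash-safe.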
Alternative: append-only
Another way to roll is that data is only ever added, with a validity marker. You never overwrite any files; you only make new ones and 'rotate them into place'. For example, instead of writing over the file, you make a new file called 'target.new', copy over the data, overwrite its start with "Goodbye now!", and then atomically rename it over the original 'target'. This guarantees that the original is never harmed: at one moment in time, the 'view' of the target file is the old one, and in the next atomic moment it is the new one, with never a point in between that is halfway between the two.
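In java, the 'rotate into place' step maps to Files.move with the ATOMIC_MOVE option; a sketch (note: whether an atomic move silently replaces an existing target is platform-dependent, although a POSIX rename does):

import java.nio.charset.StandardCharsets;
import java.nio.file.*;

Path target = Paths.get("target");
Path fresh = Paths.get("target.new");
// Build the replacement off to the side...
Files.write(fresh, "Goodbye now!".getBytes(StandardCharsets.UTF_8));
// ...then swap it into place in a single atomic step.
Files.move(fresh, target, StandardCopyOption.ATOMIC_MOVE);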
A similar concept in databases is to never UPDATE, only INSERT, keeping a monotonically increasing counter and knowing that 'the current state' is always the row with the highest counter number.
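Using the same sql() pseudocode style as the answer above, an insert-only account state could look like this (the table and column names are made up):

// Writing a change: INSERT a new row with the next counter value, never UPDATE.
sql("INSERT INTO account_state (user_id, balance, version) VALUES (?, ?, ?)",
    userId, newBalance, currentVersion + 1);

// Reading the current state: the row with the highest counter wins.
int balance = sql("SELECT balance FROM account_state WHERE user_id = ? "
    + "ORDER BY version DESC LIMIT 1", userId);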
The point is: There are ways to build robust systems that do not ever result in data being inconsistent unless an external force corrupts your data stores.

Why does a SELECT wait for a lock?

In my application I have the problem that sometimes SELECT statements run into a java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction exception. Sadly I can't create an example as the circumstances are very complex. So the question is just about general understanding.
A little bit of background information: I'm using MySQL (InnoDB) with the READ_COMMITTED isolation level.
Actually I don't understand how a SELECT can ever run into a lock timeout with that setup. I thought a SELECT would never lock, as it just returns the latest committed state (managed by MySQL). Anyway, according to what is happening, this seems to be wrong. So how is it really?
I already read this https://dev.mysql.com/doc/refman/8.0/en/innodb-locking.html but that didn't really give me a clue. No SELECT ... FOR UPDATE or something like that is used.
That is probably due to your database. In my experience, this kind of problem usually comes from the database side, not from the programming side that accesses it; in the end, the programming side is just a "go and get that for me from that db" thing.
I found this without much effort.
It basically explains that:
Lock wait timeout occurs typically when a transaction is waiting on row(s) of data to update which have already been locked by some other transaction.
You should also check this answer, which covers a specific transaction problem that might apply to you, as trying to change different tables can cause the timeout:
the query was attempting to change at least one row in one or more InnoDB tables. Since you know the query, all the tables being accessed are candidates for being the culprit.
To speed up queries, a DB can execute several transactions at the same time. For example, if someone runs a select query over a table for the wages of the employees of a company (each employee identified by an id) and someone else changes the last name of an employee who has, e.g., married, both queries can be executed at the same time because they don't interfere.
But in other cases even a SELECT statement might interfere with another statement.
To prevent unexpected results in SQL transactions, transactions follow the ACID model, which stands for Atomicity, Consistency, Isolation and Durability (for further information read wikipedia).
Let's say transaction 1 starts to calculate something and then wants to write the results to table A. Before writing, it locks table A against SELECT statements. Otherwise this would violate the isolation requirement: if a transaction 2 started while 1 was still writing, 2's results would depend on which rows 1 had already written and which not.
Now, it might even produce a deadlock. E.g., before transaction 1 can write the last field in table A, it still has to write something to table B, but transaction 2 has already locked table B (to read safely from it after it read from A), and now you have a deadlock: 2 wants to read from A, which is blocked by 1, so it waits for 1 to finish, but 1 waits for 2 to unlock table B before it can finish itself.
To solve this problem, one strategy is to roll back certain transactions after a certain timeout. (more here)
So that might be a reason for your select statement to get a lock wait timeout exceeded.
But a deadlock usually just happens by coincidence, so if transaction 2 was forced to roll back, transaction 1 should be able to finish, and 2 should succeed on a later try.
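If you want to tell the two cases apart in code: MySQL reports a lock wait timeout as vendor error code 1205 and a deadlock as 1213, and both are retrievable from the SQLException. A sketch (the helper method names are hypothetical):

import java.sql.SQLException;

try {
    runMyTransaction(); // hypothetical: the SELECT/UPDATE work
} catch (SQLException e) {
    switch (e.getErrorCode()) { // MySQL vendor codes
        case 1205: retryLater(); break; // lock wait timeout exceeded
        case 1213: retryNow(); break;   // chosen as deadlock victim
        default: throw e;
    }
}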

Can a write to the datastore fail?

Say I'm creating an entity like this:
Answer answer = new Answer(this, question, optionId);
ofy().save().entity(answer);
Should I check whether the write process is successful?
Say I want to perform another action (increment a counter); should I make a transaction that includes the writing process?
And also, how can I check if the writing process is successful?
An error while saving will produce an exception. Keep in mind that since you are not calling now(), you have started an async operation, and the actual exception may occur when the session is closed (e.g., at the end of the request).
Yes, if you want to increment a counter, you need to start a transaction which encompasses the load, increment, and save. Also keep in mind that it's possible for a transaction to retry even though it is successful, so a naive transaction can possibly overcount. If you need a rigidly exact increment, the pattern is significantly more complex. All databases suffer from some variation of this problem.
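A sketch of such a transactional increment, assuming Objectify 5+ (where transact accepts a lambda) and a hypothetical Counter entity with an id and a count field:

Integer newCount = ofy().transact(() -> {
    Counter c = ofy().load().type(Counter.class).id(counterId).now();
    c.count++;
    ofy().save().entity(c).now(); // now() forces the write inside the txn
    return c.count;
});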

Check MySQL table's ROW LOCK STATUS via Java

I have a Java frontend and a MySQL backend. I used LOCK IN SHARE MODE for a SELECT. If I request the same row from another process, it returns the data, but it does not allow me to update. What I would like to do is inform the user that they only have a read-only copy, so they can view the information now or request it again later. How could I check the status of the row so that the user can be informed about this situation? If I use FOR UPDATE, it just waits until the first user saves the data. I find it less user-friendly if the user just gets a blank screen, or a button that does nothing when clicked. Any help will be greatly appreciated. Using MySQL 5.5, Java 7.
The short answer is "You can't"!
You may want to take a look at this discussion.
[EDIT]
The answer to that post states:
You can't (check lock's state) for non-named locks!!!! More info:
http://forums.mysql.com/read.php?21,222363,223774#msg-223774
Row-level locks are not meant for application-level locks. They are just means to implement consistent reads and writes. That means you have to release them as soon as possible. You need to implement your own application-level lock, and it's not that hard. Perhaps a simple user_id field will do. If it is null, then there's no lock. If it's not null, the id indicates who is holding the record. In that case you'll need row-level locking to update the user_id field. And as I said before, you'll have to release the MySQL lock as soon as you are done locking/unlocking the record.
The question's entire premise rests on a rather liberal use of the RDBMS's row-level locking (which is meant for short-lived concurrency control) directly for interactive UI control.
But putting that aside and answering the question: one can set the session's innodb_lock_wait_timeout to a very short value (the minimum being 1) and catch the resulting Lock wait timeout exceeded; try restarting transaction when unable to lock.
The exception class was com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException when I just tried with mysql-connector-java 5.1.38, but exception classes have changed over releases, so this may differ in older versions of MySQL Connector/J.
The "attempt and fail" method of acquiring locks is the standard way of tackling these types of concurrency situations, as the alternate method of "check before attempting" is an anti-pattern that creates a race-condition between checking and the actual attempt to lock.

locking DB records for concurrency between threads

This kind of thing has been done a million times, I'm sure, but my search-fu appears weak today, and I'd like to get opinions on what is generally considered the best way to accomplish this goal.
My application keeps track of sessions for online users in a system. Each session corresponds to a single record in a database. A session can be ended in one of two ways: either a "stop" message is received, or the session times out. The former case is easy: it is handled in the message processing thread and everything is fine. The latter case is where the concern comes from.
In order to process timeouts, each record has an ending-time column that is updated each time a message is received for that session. To make timeouts work, I have a thread that fetches all records from the database whose endtime < NOW() (i.e., whose end time is in the past) and goes through the processing to close those sessions. The problem here is that I might receive a message for a session while the timeout thread is processing that same session. I end up with a race between the timeout thread and the message processing thread.
I could use a semaphore or the like and just prevent the message thread from processing while the timeout pass is taking place, as it only needs to run every 30 seconds or a minute. However, as the user table gets large, this is going to run into serious performance issues. What I think I would like is a way for the message thread to know that a record is currently being processed by the timeout thread. If I could achieve that, I could either discard the message or wait for the timeout thread to finish, but only when there is an actual conflict, instead of always.
Currently my application uses JDBC directly. Would there be an easier/standard method for solving this issue if I used a framework such as Hibernate?
This is a great opportunity for all kinds of crazy bugs to occur, and some of the cures can cause performance issues.
The classic solution would be to use transactions (http://dev.mysql.com/doc/refman/5.0/en/commit.html). This allows you to guarantee the consistency of your data - but a long-running transaction on the database turns it into a huge bottleneck; if your "find timed-out sessions" code runs for a minute, the transaction may run for that entire period, effectively locking write access to the affected table(s). Most systems would not deal well with this.
My favoured solution for this kind of situation is to have a "state machine" for status; I like to implement this as a history table, but that does tend to lead to a rapidly growing database.
You define the states of a session as "initiated", "running", "timed-out - closing", "timed-out - closed", and "stopped by user" (for example).
You implement code which honours the state transition logic in whatever data access logic you've got. The pseudo code for your "clean-up" script might then be:
update all records whose endtime < now() and whose status is "running", set status = "timed-out - closing"
for each record whose status is "timed-out - closing"
    do whatever other stuff you need to do
    update that record to set status = "timed-out - closed" where status = "timed-out - closing"
next record
All other attempts to modify the current state of the session record must check that the current status is valid for the attempted change.
For instance, the "manual" stop code should be something like this:
update sessions
set status = 'stopped by user'
where session_id = xxxxx
and status = 'running'
If the auto-close routine has kicked off in the time between showing the user interface and running this database code, the where clause won't match any records, so the rest of the code simply doesn't run.
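In JDBC you can see which side won the race by checking the affected-row count; a sketch, assuming an open connection con:

try (PreparedStatement ps = con.prepareStatement(
        "UPDATE sessions SET status = 'stopped by user' "
        + "WHERE session_id = ? AND status = 'running'")) {
    ps.setLong(1, sessionId);
    int rows = ps.executeUpdate(); // 1 = our transition won; 0 = we lost the race
    if (rows == 0) {
        // the timeout thread got there first; skip the manual-stop path
    }
}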
For this to work, all code that modifies the session status must check its pre-conditions; the most maintainable way is to encode status and allowed transitions into a separate database table.
You could also write triggers to enforce this logic, though I'm normally not a fan of triggers - only do this if you have to.
I don't think this adds significant performance worries - but test and optimize. The majority of the extra work on the database comes from the extra "where" clauses on your update statements; assuming you have an index on status, it's unlikely to have a measurable impact.
