Inside our Java application we are using a SQL Server statement to pause some processes.
This is the SQL statement:
SELECT * FROM MESSAGES WITH (UPDLOCK, ROWLOCK)
WHERE MESSAGES.INTERNAL_ID IN ('6f53448f-1c47-4a58-8839-e126e81130f0');
The UUIDs in the IN clause change from run to run, of course.
This is the Java code we use for locking:
entityManager.createNativeQuery(sqlString).getResultList()
The above SQL statement returns only one row. Unfortunately, it seems that the whole table gets locked. The result is that all processes are blocked, even though none, or only some, should be.
Why is the whole table locked even though I specify UPDLOCK?
Additional information:
MESSAGES.INTERNAL_ID is NVARCHAR(255) which is not nullable.
Otherwise there is no constraint on the column.
The isolation level is READ_COMMITTED.
This is because MESSAGES.INTERNAL_ID is not a key. Once a row is locked, you cannot read it to check its value. Try creating a primary key on this column.
If that is not possible, create an index on it and rewrite your query:
SELECT MESSAGES.INTERNAL_ID FROM MESSAGES WITH (UPDLOCK, ROWLOCK)
WHERE MESSAGES.INTERNAL_ID IN ('6f53448f-1c47-4a58-8839-e126e81130f0');
MSDN says:
Lock hints ROWLOCK, UPDLOCK, and XLOCK that acquire row-level locks
may place locks on index keys rather than the actual data rows. For
example, if a table has a nonclustered index, and a SELECT statement
using a lock hint is handled by a covering index, a lock is acquired
on the index key in the covering index rather than on the data row in
the base table.
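To make the suggestion concrete, here is a rough sketch of how the rewritten query could be issued from the Java side; the index name, the hard-coded UUID, and the surrounding transaction are my assumptions rather than part of the original question.

// One-time DDL, e.g. in a migration (index name is an assumption):
//   CREATE INDEX IX_MESSAGES_INTERNAL_ID ON MESSAGES (INTERNAL_ID);

// Must run inside an active transaction, otherwise the UPDLOCK is released right away.
String sqlString =
    "SELECT MESSAGES.INTERNAL_ID FROM MESSAGES WITH (UPDLOCK, ROWLOCK) " +
    "WHERE MESSAGES.INTERNAL_ID IN (?1)";

entityManager.createNativeQuery(sqlString)
             .setParameter(1, "6f53448f-1c47-4a58-8839-e126e81130f0")
             .getResultList();   // now only the matching index entries are locked until commit/rollback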
I have a user table with field lastusedecnumber.
I need to access and increment lastusedecnumber.
During that accessing time I need to lock that particular user row (not the entire table).
How do I do this?
The table type is MyISAM.
MySQL uses only table-level locking for MyISAM tables. If you can, switch to InnoDB for row-level locking.
Here's a link to the MySQL site describing Locks set by SQL Statements for InnoDB tables.
http://dev.mysql.com/doc/refman/5.0/en/innodb-locks-set.html
Kind of late, but hope it will help someone:
UPDATE user SET lastusedecnumber = LAST_INSERT_ID(lastusedecnumber + 1);
SELECT LAST_INSERT_ID();
This gives you an atomic increment of lastusedecnumber and the ability to read the new value of the lastusedecnumber field (after the increment) using SELECT LAST_INSERT_ID().
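A minimal JDBC sketch of this pattern, assuming the increment should be scoped to one user row; the id column, the userId variable and dataSource are my assumptions, the rest comes from the answer above.

try (Connection conn = dataSource.getConnection()) {
    // atomic increment; LAST_INSERT_ID(expr) stores the new value per connection
    try (PreparedStatement upd = conn.prepareStatement(
            "UPDATE user SET lastusedecnumber = LAST_INSERT_ID(lastusedecnumber + 1) WHERE id = ?")) {
        upd.setLong(1, userId);
        upd.executeUpdate();
    }
    // read back the value written by this connection's UPDATE
    try (Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT LAST_INSERT_ID()")) {
        rs.next();
        long newNumber = rs.getLong(1);
    }
}

Because LAST_INSERT_ID() is maintained per connection, concurrent threads cannot read each other's value.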
As a workaround you could add a column to your table, like locked TINYINT(1). Whenever you want the row to be locked, you set it to 1. When you then try to access the row, the first thing you do is check whether the locked field is set.
To unlock the row, simply set it to 0 again. Not nice, but a really simple workaround.
I didn't feel like converting my whole database from MyISAM, so I simply try to create a new table whose name is based on the id of the record I want to lock. If CREATE TABLE succeeds, I do my work and drop the table at the end. If CREATE TABLE fails, I stop.
A better workaround is to create a column containing a timestamp. Whenever you want to lock the row, you update it to the current time. To unlock, update it to a time at least x minutes in the past. Then, to check whether the row is locked, check whether the timestamp is less than x minutes old.
This way, if the process crashes (or the user never completes their operation), the lock effectively expires after x minutes.
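A rough sketch of that idea, folding the check and the claim into a single UPDATE so the check-and-set itself is atomic; the lock_ts column, the 10-minute expiry and the user/id names are assumptions for illustration.

// returns true if this caller now holds the lock
boolean tryLock(Connection conn, long userId) throws SQLException {
    try (PreparedStatement ps = conn.prepareStatement(
            "UPDATE user SET lock_ts = NOW() " +
            "WHERE id = ? AND (lock_ts IS NULL OR lock_ts < NOW() - INTERVAL 10 MINUTE)")) {
        ps.setLong(1, userId);
        return ps.executeUpdate() == 1;   // 0 rows updated => someone else holds a fresh lock
    }
}

// releases the lock by pushing the timestamp into the past, as described above
void unlock(Connection conn, long userId) throws SQLException {
    try (PreparedStatement ps = conn.prepareStatement(
            "UPDATE user SET lock_ts = NOW() - INTERVAL 10 MINUTE WHERE id = ?")) {
        ps.setLong(1, userId);
        ps.executeUpdate();
    }
}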
Just to be sure that I correctly understand how things work.
If I do em.lock(employee, LockModeType.PESSIMISTIC_WRITE); - will it block only this entity (employee) or the whole table Employees?
If it matters, I am talking about PostgreSQL.
It should block only the entity.
The PostgreSQL Hibernate dialect adds FOR UPDATE in case of write locks:
https://github.com/hibernate/hibernate-orm/blob/master/hibernate-core/src/main/java/org/hibernate/dialect/PostgreSQL81Dialect.java#L549
(newer versions just use the same implementation)
FOR UPDATE is treated row-wise by PostgreSQL:
https://www.postgresql.org/docs/9.5/static/explicit-locking.html
FOR UPDATE causes the rows retrieved by the SELECT statement to be
locked as though for update. This prevents them from being locked,
modified or deleted by other transactions until the current
transaction ends. That is, other transactions that attempt UPDATE,
DELETE, SELECT FOR UPDATE, SELECT FOR NO KEY UPDATE, SELECT FOR SHARE
or SELECT FOR KEY SHARE of these rows will be blocked until the
current transaction ends; conversely, SELECT FOR UPDATE will wait for
a concurrent transaction that has run any of those commands on the
same row, and will then lock and return the updated row (or no row, if
the row was deleted).
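A minimal sketch of what that looks like in application code; Employee, employeeId and em are placeholders, and the point is only that the lock is scoped to the one selected row and held until the transaction ends.

EntityTransaction tx = em.getTransaction();
tx.begin();

Employee employee = em.find(Employee.class, employeeId);
em.lock(employee, LockModeType.PESSIMISTIC_WRITE);   // Hibernate issues SELECT ... FOR UPDATE for this row only

// ... other transactions can still read and write other Employee rows here ...

tx.commit();   // the row lock is released here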
I have a multi-threaded client/server system with thousands of clients continuously sending data to the server that is stored in a specific table. This data is only important for a few days, so it's deleted afterwards.
The server is written in J2SE, database is MySQL and my table uses InnoDB engine. It contains some millions of entries (and is indexed properly for the usage).
One scheduled thread is running once a day to delete old entries. This thread could take a large amount of time for deleting, because the number of rows to delete could be very large (some millions of rows).
On my specific system deletion of 2.5 million rows would take about 3 minutes.
The inserting threads (and reading threads) get a timeout error telling me
Lock wait timeout exceeded; try restarting transaction
How can I simply get that state from my Java code? I would prefer handling the situation on my own instead of waiting. But the more important point is, how to prevent that situation?
Could I use
conn.setIsolationLevel( Connection.TRANSACTION_READ_UNCOMMITTED )
for the reading threads, so they get their information regardless of whether it is completely up to date (which is absolutely OK for this use case)?
What can I do to my inserting threads to prevent blocking? They purely insert data into the table (primary key is the tuple userid, servertimemillis).
Should I change my deletion thread? It is purely deleting data for the tuple userid, greater than specialtimestamp.
Edit:
When reading the MySQL documentation, I wonder if I cannot simply define the connection for inserting and deleting rows with
conn.setIsolationLevel( Connection.TRANSACTION_READ_COMMITTED )
and achieve what I need. It says that UPDATE and DELETE statements that use a unique index with a unique search condition lock only the matching index entry, not the gap before it, so rows can still be inserted into that gap. It would be great to hear your experience with that, since I can't simply try it in production, and it is a big effort to simulate it in the test environment.
Try in your deletion thread to first load the IDs of the records to be deleted and then delete one at a time, committing after each delete.
If you run the thread that does the huge delete once a day and it takes 3 minutes, you can split it into smaller transactions that each delete a small number of records, and still manage to get it all done fast enough.
A better solution:
First of all. Any solution you try must be tested prior to deployment in production. Especially a solution suggested by some random person on some random web site.
Now, here's the solution I suggest (making some assumptions regarding your table structure and indices, since you didn't specify them):
Alter your table. It's not recommended to have a primary key of multiple columns in InnoDB, especially in large tables (since the primary key is included automatically in any other indices). See the answer to this question for more reasons. You should add some unique RecordID column as primary key (I'd recommend a long identifier, or BIGINT in MySQL).
Select the rows for deletion - execute "SELECT RecordID FROM YourTable where ServerTimeMillis < ?".
Commit (to quickly release the lock on the ServerTimeMillis index, which I assume you have).
For each RecordID, execute "DELETE FROM YourTable WHERE RecordID = ?"
Commit after each record or after each X records (I'm not sure whether that would make much difference). Perhaps even one Commit at the end of the DELETE commands will suffice, since with my suggested new logic, only the deleted rows should be locked.
As for changing the isolation level. I don't think you have to do it. I can't suggest whether you can do it or not, since I don't know the logic of your server, and how it will be affected by such a change.
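Here is a rough JDBC sketch of steps 2-5, under this answer's assumption that a RecordID primary key has been added; YourTable, dataSource and cutoffMillis are placeholders.

List<Long> ids = new ArrayList<>();
try (Connection conn = dataSource.getConnection()) {
    conn.setAutoCommit(false);

    // step 2: collect the ids of the rows to delete
    try (PreparedStatement sel = conn.prepareStatement(
            "SELECT RecordID FROM YourTable WHERE ServerTimeMillis < ?")) {
        sel.setLong(1, cutoffMillis);
        try (ResultSet rs = sel.executeQuery()) {
            while (rs.next()) {
                ids.add(rs.getLong(1));
            }
        }
    }
    conn.commit();   // step 3: end this transaction quickly

    // steps 4/5: delete by primary key, committing every X records
    try (PreparedStatement del = conn.prepareStatement(
            "DELETE FROM YourTable WHERE RecordID = ?")) {
        int n = 0;
        for (long id : ids) {
            del.setLong(1, id);
            del.executeUpdate();
            if (++n % 1000 == 0) {
                conn.commit();
            }
        }
    }
    conn.commit();
}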
You can try to replace your one huge DELETE with multiple shorter DELETE ... LIMIT n statements, with n determined by testing (not so small that it causes many queries, and not so large that it causes long locks). Since the locks would only last a few ms (or seconds, depending on your n), you could let the delete thread run continuously, provided it can keep up; again, n can be adjusted so that it does.
Also, table partitioning can help.
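A sketch of the DELETE ... LIMIT n loop suggested above; n = 1000 is only a starting point to tune, and the table/column names follow the earlier example.

try (Connection conn = dataSource.getConnection();
     PreparedStatement del = conn.prepareStatement(
         "DELETE FROM YourTable WHERE ServerTimeMillis < ? LIMIT 1000")) {
    del.setLong(1, cutoffMillis);
    int deleted;
    do {
        // with autocommit on, each statement is its own short transaction,
        // so locks are only held for the duration of one small delete
        deleted = del.executeUpdate();
    } while (deleted > 0);
}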
The problem is this:
I have multiple competing threads (100+) that need to access one database table
Each thread will pass a String name - where that name exists in the table, the database should return the id for the row, where the name doesn't already exist, the name should be inserted and the id returned.
There can only ever be one instance of name in the database - ie. name must be unique
How do I ensure that thread one doesn't insert name1 at the same time as thread two also tries to insert name1? In other words, how do I guarantee the uniqueness of name in a concurrent environment? This also needs to be as efficient as possible - this has the potential to be a serious bottleneck.
I am using MySQL and Java.
Thanks
Assuming there is a unique constraint on the name column, each insert will acquire a lock. Any thread that attempts to insert the same name concurrently will wait until the first insert either succeeds or fails (the transaction commits or rolls back).
If the first transaction succeeds, the second transaction will fail with a unique key violation. Then you know the name already exists.
If there is one insert per transaction, it's OK. If there is more than one insert per transaction, you may deadlock.
Each thread will pass a String name -
where that name exists in the table,
the database should return the id for
the row, where the name doesn't
already exist, the name should be
inserted and the id returned.
So all in all, the algo is like this:
1 read row with name
2.1 if found, return row id
2.2 if not found, attempt to insert
2.2.1 if insert succeeds, return new row id
2.2.2 if insert fails with unique constraint violation
2.2.2.1 read row with name
2.2.2.2 read should succeed this time, so return row id
Because there can be high contention on the unique index, the insert may block for some time, in which case the transaction may time out. Run some stress tests and tune the configuration until it works correctly under your load.
Also, you should check if you get a unique constraint violation exception or some other exception.
And again, this works only if there is one insert per transaction, otherwise it may deadlock.
Also, you can try to read the row at step 1 with "select * for update". In that case, the read waits until a concurrent insert either commits or rolls back. This can slightly reduce the number of errors at step 2.2.2 caused by contention on the index.
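For reference, a sketch of that algorithm in JDBC; the names table, its id/name columns and the one-insert-per-transaction assumption are mine.

long getOrCreateId(Connection conn, String name) throws SQLException {
    Long id = findId(conn, name);                        // step 1
    if (id != null) {
        return id;                                       // step 2.1
    }
    try (PreparedStatement ins = conn.prepareStatement(
            "INSERT INTO names (name) VALUES (?)", Statement.RETURN_GENERATED_KEYS)) {
        ins.setString(1, name);
        ins.executeUpdate();                             // step 2.2
        try (ResultSet keys = ins.getGeneratedKeys()) {
            keys.next();
            return keys.getLong(1);                      // step 2.2.1
        }
    } catch (SQLIntegrityConstraintViolationException e) {
        return findId(conn, name);                       // steps 2.2.2.1 / 2.2.2.2
    }
}

Long findId(Connection conn, String name) throws SQLException {
    try (PreparedStatement sel = conn.prepareStatement("SELECT id FROM names WHERE name = ?")) {
        sel.setString(1, name);
        try (ResultSet rs = sel.executeQuery()) {
            return rs.next() ? rs.getLong(1) : null;
        }
    }
}

Catching SQLIntegrityConstraintViolationException (rather than plain SQLException) is what distinguishes a duplicate-key failure from other errors, as mentioned above.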
Create a unique constraint on the name column in the database.
Add a unique constraint for the name column.
Consider following schema in postgres database.
CREATE TABLE employee
(
  id_employee serial NOT NULL PRIMARY KEY,
  tx_email_address text NOT NULL UNIQUE,
  tx_passwd character varying(256)
);
I have a Java class which does the following:
conn.setAutoCommit(false);

ResultSet rs = stmt.executeQuery("SELECT * FROM employee WHERE tx_email_address = 'test1'");
if (!rs.next()) {
    stmt.executeUpdate("INSERT INTO employee (tx_email_address, tx_passwd) VALUES ('test1', 'test1')");
}

rs = stmt.executeQuery("SELECT * FROM employee WHERE tx_email_address = 'test2'");
if (!rs.next()) {
    stmt.executeUpdate("INSERT INTO employee (tx_email_address, tx_passwd) VALUES ('test2', 'test2')");
}

rs = stmt.executeQuery("SELECT * FROM employee WHERE tx_email_address = 'test3'");
if (!rs.next()) {
    stmt.executeUpdate("INSERT INTO employee (tx_email_address, tx_passwd) VALUES ('test3', 'test3')");
}

rs = stmt.executeQuery("SELECT * FROM employee WHERE tx_email_address = 'test4'");
if (!rs.next()) {
    stmt.executeUpdate("INSERT INTO employee (tx_email_address, tx_passwd) VALUES ('test4', 'test4')");
}

conn.commit();
conn.setAutoCommit(true);
The problem here is that if two or more concurrent instances of the above transaction try to write data, only one transaction will eventually succeed and the rest will throw an SQLException "unique key constraint violation". How do we get around this?
PS: I have chosen only one table and simple insert queries to demonstrate the problem. My application is a Java-based application whose sole purpose is to write data to the target database, and there can be concurrent processes doing so, with a high probability that some of them will try to write the same data (as shown in the example above).
The simplest way would seem to be to use the transaction isolation level 'serializable', which prevents phantom reads (other people inserting data which would satisfy a previous SELECT during your transaction).
if (!conn.getMetaData().supportsTransactionIsolationLevel(Connection.TRANSACTION_SERIALIZABLE)) {
// OK, you're hosed. Hope for your sake your driver supports this isolation level
}
conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
There are also techniques like Oracle's "MERGE" statement -- a single statement which does 'insert or update', depending on whether the data's there. I don't know if Postgres has an equivalent, but there are techniques to 'fake it' -- see e.g.
How to write INSERT IF NOT EXISTS queries in standard SQL.
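As a side note on the Postgres part of that: newer PostgreSQL versions (9.5 and later) do have an equivalent, INSERT ... ON CONFLICT, which covers the question's "insert only if not there" case without a prior SELECT. A minimal sketch for the employee table from the question:

try (PreparedStatement ps = conn.prepareStatement(
        "INSERT INTO employee (tx_email_address, tx_passwd) VALUES (?, ?) " +
        "ON CONFLICT (tx_email_address) DO NOTHING")) {
    ps.setString(1, "test1");
    ps.setString(2, "test1");
    ps.executeUpdate();   // returns 0 if the address already existed, 1 if it was inserted
}

On older versions, one of the "fake it" techniques from the linked article is still needed.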
I would first try to design the data flow in a way that only one transaction will ever get one instance of the data. In that scenario the "unique key constraint violation" should never happen and therefore indicate a real problem.
Failing that, I would catch and ignore the "unique key constraint violation" after each insert. Of course, logging that it happened might be a good idea still.
If both approaches were not feasible for some reason, then I would most probably create a transit table with the same structure as "employee", but without the primary key constraint and with a "transit status" field. No "unique key constraint violation" would ever happen on inserts into this transit table.
A job would be needed that reads this transit table and transfers the data into the "employee" table. This job would use the "transit status" field to keep track of processed rows. I would let the job do different things on each run:
execute an update statement on the transit table to set the "transit status" to "work in progress" for a number of rows. How large that number is or if all currently new rows get marked would need some thinking over.
execute an update statement that sets "transit status" to "duplicate" for all rows whose data is already in the "employee" table and whose "transit status" is not in ("duplicate", "processed")
repeat as long as there are rows in the transit table with "transit status" = "work in progress":
select a row from the transit table with "transit status" = "work in progress".
Insert that row's data into the "employee" table.
Set this row's "transit status" to "processed".
Update all rows in the transit table that have the same data as the currently processed row and "transit status" = "work in progress", setting their "transit status" to "duplicate".
I would most probably want another job to regularly delete the rows with "transit status" in ("duplicate", "processed")
If Postgres does not support database jobs, an OS-level job would do.
A solution is to use a table-level exclusive lock, blocking other writers while allowing concurrent reads, using the LOCK command.
Pseudo-sql-code:
select * from employee where tx_email_address = 'test1';
if not exists
lock table employee in exclusive mode;
select * from employee where tx_email_address = 'test1';
if still not exists //may be inserted before lock
insert into employee values ('test1', 'test1');
commit; //releases exclusive lock
Note that using this method will block all other writes until the lock is released, lowering throughput.
If all inserts are dependent on a parent row, then a better approach is to lock only the parent row, serializing child inserts, instead of locking the whole table.
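A JDBC version of the pseudo-SQL above might look as follows; the insertIfAbsent and emailExists method names are assumptions, and the double check (before and after LOCK TABLE) mirrors the pseudo-code.

void insertIfAbsent(Connection conn, String email, String passwd) throws SQLException {
    conn.setAutoCommit(false);
    if (!emailExists(conn, email)) {                    // cheap check without the lock
        try (Statement stmt = conn.createStatement()) {
            stmt.execute("LOCK TABLE employee IN EXCLUSIVE MODE");  // blocks other writers, readers still allowed
        }
        if (!emailExists(conn, email)) {                // re-check under the lock
            try (PreparedStatement ins = conn.prepareStatement(
                    "INSERT INTO employee (tx_email_address, tx_passwd) VALUES (?, ?)")) {
                ins.setString(1, email);
                ins.setString(2, passwd);
                ins.executeUpdate();
            }
        }
    }
    conn.commit();                                      // releases the exclusive table lock
}

boolean emailExists(Connection conn, String email) throws SQLException {
    try (PreparedStatement ps = conn.prepareStatement(
            "SELECT 1 FROM employee WHERE tx_email_address = ?")) {
        ps.setString(1, email);
        try (ResultSet rs = ps.executeQuery()) {
            return rs.next();
        }
    }
}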
You could expose a public method that queues the write operations and handles queue concurrency, then create another method to run on a different thread (or another process entirely) that actually performs the writes serially.
You could add concurrency control at the application level by making the code a critical section:
synchronized(lock) {
// Code to perform selects / inserts within database transaction.
}
This way one thread is prevented from querying the table while the other is querying and inserting into the table. When the first thread completes, the second thread enters the synchronized block. However, at this point each select attempt will return data and hence the thread will not attempt to insert data.
EDIT:
In cases where you have multiple processes inserting into the same table you could consider taking out a table lock when performing the transaction to prevent other transactions from commencing. This is effectively doing the same as the code above (i.e. serializing the two transactions) but at the database level. Obviously there are potential performance implications in doing this.
One way to solve this particular problem is by ensuring that each of the individual threads/instances process rows in a mutually exclusive manner. In other words if instance 1 processes rows where tx_email_address='test1' then no other instance should process these rows again.
This can be achieved by generating a unique server id on instance startup and marking the rows to be processed with this server id. The way to do it is:
1. Add two columns, status and server_id, to the employee table.
<LOOP>
2. update employee set status='In Progress', server_id='<unique_id_for_instance>' where status='Uninitialized' and rownum < 2
3. commit
4. select * from employee where server_id='<unique_id_for_instance>' and status='In Progress'
5. process the rows selected in step 4.
<END LOOP>
Following the above sequence of steps ensures that all the VM instances get different rows to process and there is no deadlock. It is necessary to do the update before the select to make the operation atomic. Doing it the other way round can lead to concurrency issues.
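A rough JDBC sketch of that loop, substituting LIMIT 1 for the Oracle-style rownum < 2 since the question is about PostgreSQL; the status and server_id columns come from the steps above, the rest is assumption.

String serverId = UUID.randomUUID().toString();      // unique id generated at instance startup
conn.setAutoCommit(false);
while (true) {
    int claimed;
    try (PreparedStatement claim = conn.prepareStatement(
            "UPDATE employee SET status = 'In Progress', server_id = ? " +
            "WHERE id_employee IN (SELECT id_employee FROM employee WHERE status = 'Uninitialized' LIMIT 1)")) {
        claim.setString(1, serverId);
        claimed = claim.executeUpdate();             // step 2: claim at most one row for this instance
    }
    conn.commit();                                   // step 3
    if (claimed == 0) {
        break;                                       // nothing left to claim
    }
    try (PreparedStatement sel = conn.prepareStatement(
            "SELECT * FROM employee WHERE server_id = ? AND status = 'In Progress'")) {
        sel.setString(1, serverId);
        try (ResultSet rs = sel.executeQuery()) {    // step 4
            while (rs.next()) {
                // step 5: process the claimed row, then mark it done so it is not picked up again
            }
        }
    }
    conn.commit();
}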
Hope this helps
An often used approach is to have a primary key that is a UUID (Universally Unique ID) generated by a UUID generator;
see http://jug.safehaus.org/ or similar; Google has lots of answers.
This will prevent the unique key constraint violation from happening on the primary key.
But that of course is only part of your problem: your tx_email_address would still have to be unique, and nothing solves that.
There is no way to prevent the constraint violation from happening; as long as you have concurrency you will run into it, and in itself this really is no problem.