Concurrent process inserting data in database - java

Consider following schema in postgres database.
CREATE TABLE employee
(
    id_employee serial NOT NULL PRIMARY KEY,
    tx_email_address text NOT NULL UNIQUE,
    tx_passwd character varying(256)
);
I have a java class which does following
conn.setAutoCommit(false);

ResultSet rs = stmt.executeQuery("SELECT * FROM employee WHERE tx_email_address = 'test1'");
if (!rs.next()) {
    stmt.executeUpdate("INSERT INTO employee (tx_email_address, tx_passwd) VALUES ('test1', 'test1')");
}

rs = stmt.executeQuery("SELECT * FROM employee WHERE tx_email_address = 'test2'");
if (!rs.next()) {
    stmt.executeUpdate("INSERT INTO employee (tx_email_address, tx_passwd) VALUES ('test2', 'test2')");
}

rs = stmt.executeQuery("SELECT * FROM employee WHERE tx_email_address = 'test3'");
if (!rs.next()) {
    stmt.executeUpdate("INSERT INTO employee (tx_email_address, tx_passwd) VALUES ('test3', 'test3')");
}

rs = stmt.executeQuery("SELECT * FROM employee WHERE tx_email_address = 'test4'");
if (!rs.next()) {
    stmt.executeUpdate("INSERT INTO employee (tx_email_address, tx_passwd) VALUES ('test4', 'test4')");
}

conn.commit();
conn.setAutoCommit(true);
The problem here is that if there are two or more concurrent instances of the above transaction trying to write data, only one transaction will eventually succeed and the rest will throw an SQLException "unique key constraint violation". How do we get around this?
PS: I have chosen only one table and simple insert queries to demonstrate the problem. My application is a Java application whose sole purpose is to write data to the target database. There can be concurrent processes doing so, and there is a very high probability that some processes will try to write the same data (as shown in the example above).

The simplest way would seem to be to use the transaction isolation level 'serializable', which prevents phantom reads (other people inserting data which would satisfy a previous SELECT during your transaction).
if (!conn.getMetaData().supportsTransactionIsolationLevel(Connection.TRANSACTION_SERIALIZABLE)) {
// OK, you're hosed. Hope for your sake your driver supports this isolation level
}
conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
There are also techniques like Oracle's "MERGE" statement -- a single statement which does 'insert or update', depending on whether the data's there. I don't know if Postgres has an equivalent, but there are techniques to 'fake it' -- see e.g.
How to write INSERT IF NOT EXISTS queries in standard SQL.
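For what it's worth, newer PostgreSQL versions (9.5 and later) do have a native equivalent: INSERT ... ON CONFLICT DO NOTHING lets the unique constraint on tx_email_address resolve the race atomically. A minimal JDBC sketch, assuming conn is an open Connection to the schema above:
String sql = "INSERT INTO employee (tx_email_address, tx_passwd) "
           + "VALUES (?, ?) ON CONFLICT (tx_email_address) DO NOTHING";
try (PreparedStatement ps = conn.prepareStatement(sql)) {
    ps.setString(1, "test1");
    ps.setString(2, "test1");
    int inserted = ps.executeUpdate(); // 0 means the address already existed
}
With this, the select-before-insert in the original code becomes unnecessary, and concurrent writers no longer see the constraint violation.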

I would first try to design the data flow in a way that only one transaction will ever get one instance of the data. In that scenario the "unique key constraint violation" should never happen and therefore indicate a real problem.
Failing that, I would catch and ignore the "unique key constraint violation" after each insert. Of course, logging that it happened might still be a good idea.
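A minimal sketch of that catch-and-ignore approach, assuming PostgreSQL (whose SQLSTATE for unique_violation is 23505) and a hypothetical logger:
String insert = "INSERT INTO employee (tx_email_address, tx_passwd) VALUES (?, ?)";
try (PreparedStatement ps = conn.prepareStatement(insert)) {
    ps.setString(1, "test1");
    ps.setString(2, "test1");
    ps.executeUpdate();
} catch (SQLException e) {
    if ("23505".equals(e.getSQLState())) {
        log.info("Row already present, ignoring: " + e.getMessage()); // log is a hypothetical logger
    } else {
        throw e;
    }
}
Note that inside a single PostgreSQL transaction a failed statement aborts the whole transaction, so either commit each insert on its own or wrap each one in a SAVEPOINT before ignoring the error.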
If neither approach is feasible for some reason, then I would most probably create a transit table with the same structure as "employee", but without the primary key constraint and with a "transit status" field. No "unique key constraint violation" would ever happen on inserts into this transit table.
A job would be needed that reads this transit table and transfers the data into the "employee" table. This job would use the "transit status" field to keep track of processed rows. I would let the job do different things each run:
execute an update statement on the transit table to set the "transit status" to "work in progress" for a number of rows. How large that number is or if all currently new rows get marked would need some thinking over.
execute an update statement that sets "transit status" to "duplicate" for all rows whose data is already in the "employee" table and whose "transit status" is not in ("duplicate", "processed")
repeat as long as there are rows in the transit table with "transit status" = "work in progress":
select a row from the transit table with "transit status" = "work in progress".
Insert that row's data into the "employee" table.
Set this row's "transit status" to "processed".
Update all rows in the transit table that contain the same data as the currently processed row and have "transit status" = "work in progress" to "transit status" = "duplicate".
I would most probably want another job to regularly delete the rows with "transit status" in ("duplicate", "processed").
If Postgres does not have database jobs, an OS-level job would do.
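A compressed, set-based sketch of such a job; the transit table name (employee_transit) and the transit_status values are assumptions made for this example, and within-batch duplicates simply end up marked 'processed' here rather than 'duplicate':
conn.setAutoCommit(false);
try (Statement st = conn.createStatement()) {
    // 1. Claim the currently new rows.
    st.executeUpdate("UPDATE employee_transit SET transit_status = 'work in progress' "
            + "WHERE transit_status = 'new'");

    // 2. Mark claimed rows whose address already exists in employee as duplicates.
    st.executeUpdate("UPDATE employee_transit t SET transit_status = 'duplicate' "
            + "WHERE t.transit_status = 'work in progress' "
            + "AND EXISTS (SELECT 1 FROM employee e "
            + "WHERE e.tx_email_address = t.tx_email_address)");

    // 3. Transfer one row per distinct address from the remaining claimed rows.
    st.executeUpdate("INSERT INTO employee (tx_email_address, tx_passwd) "
            + "SELECT DISTINCT ON (tx_email_address) tx_email_address, tx_passwd "
            + "FROM employee_transit WHERE transit_status = 'work in progress'");

    // 4. Everything claimed in this run has now been dealt with.
    st.executeUpdate("UPDATE employee_transit SET transit_status = 'processed' "
            + "WHERE transit_status = 'work in progress'");
    conn.commit();
}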

A solution is to use a table-level exclusive lock, blocking concurrent writes while allowing concurrent reads, using the LOCK command.
Pseudo-sql-code:
select * from employee where tx_email_address = 'test1';
if not exists
    lock table employee in exclusive mode;
    select * from employee where tx_email_address = 'test1';
    if still not exists   -- may have been inserted before the lock was acquired
        insert into employee values ('test1', 'test1');
commit;   -- releases the exclusive lock
Note that using this method will block all other writes until the lock is released, lowering throughput.
If all inserts are dependent on a parent row, then a better approach is to lock only the parent row, serializing child inserts, instead of locking the whole table.
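A JDBC sketch of that check-lock-recheck-insert sequence for PostgreSQL; in EXCLUSIVE mode other writers block but plain SELECTs still go through. conn is assumed to be an open Connection:
conn.setAutoCommit(false);
try (PreparedStatement check = conn.prepareStatement(
         "SELECT 1 FROM employee WHERE tx_email_address = ?");
     Statement lock = conn.createStatement();
     PreparedStatement insert = conn.prepareStatement(
         "INSERT INTO employee (tx_email_address, tx_passwd) VALUES (?, ?)")) {

    check.setString(1, "test1");
    boolean exists;
    try (ResultSet rs = check.executeQuery()) {
        exists = rs.next();
    }
    if (!exists) {
        // Block other writers; plain SELECTs are still allowed in EXCLUSIVE mode.
        lock.execute("LOCK TABLE employee IN EXCLUSIVE MODE");
        // Re-check: another transaction may have inserted before we got the lock.
        try (ResultSet rs = check.executeQuery()) {
            exists = rs.next();
        }
        if (!exists) {
            insert.setString(1, "test1");
            insert.setString(2, "test1");
            insert.executeUpdate();
        }
    }
    conn.commit(); // releases the table lock
}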

You could expose a public method that queues the write operations and handles queue concurrency, then create another method to run on a different thread (or another process entirely) that actually performs the writes serially.
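A minimal in-JVM sketch of that idea: callers enqueue rows and a single background thread performs every insert, so writes to the table are naturally serialized. Class and method names here are illustrative, not from the question:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class SerialWriter {
    private final BlockingQueue<String[]> queue = new LinkedBlockingQueue<>();
    private final Connection conn;

    SerialWriter(Connection conn) {
        this.conn = conn;
        Thread writer = new Thread(this::drain, "employee-writer");
        writer.setDaemon(true);
        writer.start();
    }

    /** Public API: queue an (address, password) pair for insertion. */
    public void enqueue(String address, String passwd) {
        queue.add(new String[] { address, passwd });
    }

    private void drain() {
        String sql = "INSERT INTO employee (tx_email_address, tx_passwd) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            while (true) {
                String[] row = queue.take();
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                try {
                    ps.executeUpdate();
                } catch (SQLException e) {
                    // duplicate or other failure: log it and move on
                }
            }
        } catch (SQLException e) {
            // preparing the statement failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // writer shut down
        }
    }
}
This only serializes writes within one JVM; with several processes the same idea needs an external queue or one dedicated writer process.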

You could add concurrency control at the application level by making the code a critical section:
synchronized (lock) {
    // Code to perform selects / inserts within database transaction.
}
This way one thread is prevented from querying the table while the other is querying and inserting into the table. When the first thread completes, the second thread enters the synchronized block. However, at this point each select attempt will return data and hence the thread will not attempt to insert data.
EDIT:
In cases where you have multiple processes inserting into the same table you could consider taking out a table lock when performing the transaction to prevent other transactions from commencing. This is effectively doing the same as the code above (i.e. serializing the two transactions) but at the database level. Obviously there are potential performance implications in doing this.

One way to solve this particular problem is by ensuring that each of the individual threads/instances process rows in a mutually exclusive manner. In other words if instance 1 processes rows where tx_email_address='test1' then no other instance should process these rows again.
This can be achieved by generating a unique server id on instance startup and marking the rows to be processed with this server id. First add two columns, status and server_id, to the employee table; then each instance loops over the following steps:
1. update employee set status='In Progress', server_id='<unique_id_for_instance>' where status='Uninitialized' and rownum < 2
2. commit
3. select * from employee where server_id='<unique_id_for_instance>' and status='In Progress'
4. process the rows selected in step 3
Following the above sequence of steps ensures that all the VM instances get different rows to process and there is no deadlock. It is necessary to perform the update before the select to make the operation atomic; doing it the other way round can lead to concurrency issues.
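A PostgreSQL-flavoured JDBC sketch of one loop iteration, assuming the status and server_id columns have been added; the LIMIT ... FOR UPDATE SKIP LOCKED subquery (PostgreSQL 9.5+) plays the role of "rownum < 2" and also keeps two instances from claiming the same rows:
String serverId = java.util.UUID.randomUUID().toString(); // generated once at startup
try (PreparedStatement claim = conn.prepareStatement(
         "UPDATE employee SET status = 'In Progress', server_id = ? "
         + "WHERE id_employee IN (SELECT id_employee FROM employee "
         + "WHERE status = 'Uninitialized' LIMIT 10 FOR UPDATE SKIP LOCKED)");
     PreparedStatement fetch = conn.prepareStatement(
         "SELECT * FROM employee WHERE server_id = ? AND status = 'In Progress'")) {
    claim.setString(1, serverId);
    claim.executeUpdate();
    conn.commit();                       // autocommit is assumed to be off

    fetch.setString(1, serverId);
    try (ResultSet rs = fetch.executeQuery()) {
        while (rs.next()) {
            // process the claimed row, then mark it e.g. status = 'Processed'
        }
    }
}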
Hope this helps

An often used approach is to have a primary key that is a UUID (Universally Unique Identifier) and a UUID generator,
see http://jug.safehaus.org/ or similar; Google has lots of answers.
This will prevent the unique key constraint violation on the primary key.
But that of course is only part of your problem; your tx_email_address would still have to be unique, and nothing solves that.
There is no way to prevent the constraint violation from happening; as long as you have concurrency you will run into it, and in itself this really is no problem.

Related

(UPDLOCK, ROWLOCK) locks whole table even though only 1 row is selected

Inside our Java application we are using a SQL Server statement to pause some processes.
This is the SQL statement:
SELECT * FROM MESSAGES WITH (UPDLOCK, ROWLOCK)
WHERE MESSAGES.INTERNAL_ID IN ('6f53448f-1c47-4a58-8839-e126e81130f0');
The UUIDs in the IN clause changes of course from run to run.
This the Java code we use for locking:
entityManager.createNativeQuery(sqlString).getResultList()
The above SQL statement returns only one row. Unfortunately it seems that the whole table gets locked. The result is that all processes are locked even though none or only some should be blocked.
Why is the whole table locked even though I specify UPDLOCK?
Additional information:
MESSAGES.INTERNAL_ID is NVARCHAR(255) which is not nullable.
Otherwise there is no constraint on the column.
The isolation level is READ_COMMITTED.
This is because your MESSAGES.INTERNAL_ID is not a key. Once a row is locked you cannot read it and check its value. Try to create a primary key on this column.
If that is impossible, create an index on it and rewrite your query:
SELECT MESSAGES.INTERNAL_ID FROM MESSAGES WITH (UPDLOCK, ROWLOCK)
WHERE MESSAGES.INTERNAL_ID IN ('6f53448f-1c47-4a58-8839-e126e81130f0');
MSDN says:
Lock hints ROWLOCK, UPDLOCK, AND XLOCK that acquire row-level locks
may place locks on index keys rather than the actual data rows. For
example, if a table has a nonclustered index, and a SELECT statement
using a lock hint is handled by a covering index, a lock is acquired
on the index key in the covering index rather than on the data row in
the base table.

How to prevent MySQL InnoDB setting a lock for delete statement through JDBC

I have a multi-threaded client/server system with thousands of clients continuously sending data to the server that is stored in a specific table. This data is only important for a few days, so it's deleted afterwards.
The server is written in J2SE, database is MySQL and my table uses InnoDB engine. It contains some millions of entries (and is indexed properly for the usage).
One scheduled thread is running once a day to delete old entries. This thread could take a large amount of time for deleting, because the number of rows to delete could be very large (some millions of rows).
On my specific system deletion of 2.5 million rows would take about 3 minutes.
The inserting threads (and reading threads) get a timeout error telling me
Lock wait timeout exceeded; try restarting transaction
How can I simply get that state from my Java code? I would prefer handling the situation on my own instead of waiting. But the more important point is, how to prevent that situation?
Could I use
conn.setIsolationLevel( Connection.TRANSACTION_READ_UNCOMMITTED )
for the reading threads, so they will get their information regardless of whether it is completely up to date (which is absolutely OK for this use case)?
What can I do to my inserting threads to prevent blocking? They purely insert data into the table (primary key is the tuple userid, servertimemillis).
Should I change my deletion thread? It is purely deleting data for the tuple userid, greater than specialtimestamp.
Edit:
When reading the MySQL documentation, I wonder if I cannot simply define the connection for inserting and deleting rows with
conn.setIsolationLevel( Connection.TRANSACTION_READ_COMMITTED )
and achieve what I need. It says that UPDATE- and DELETE statements, that use a unique index with a unique search pattern only lock the matching index entry, but not the gap before and with that, rows can still be inserted into that gap. It would be great to get your experience on that, since I can't simply try it on production - and it is a big effort to simulate it on test environment.
Try in your deletion thread to first load the IDs of the records to be deleted and then delete one at a time, committing after each delete.
If you run the thread that does the huge delete once a day and it takes 3 minutes, you can split it to smaller transactions that delete a small number of records, and still manage to get it done fast enough.
A better solution:
First of all. Any solution you try must be tested prior to deployment in production. Especially a solution suggested by some random person on some random web site.
Now, here's the solution I suggest (making some assumptions regarding your table structure and indices, since you didn't specify them):
Alter your table. It's not recommended to have a primary key of multiple columns in InnoDB, especially in large tables (since the primary key is included automatically in any other indices). See the answer to this question for more reasons. You should add some unique RecordID column as primary key (I'd recommend a long identifier, or BIGINT in MySQL).
Select the rows for deletion - execute "SELECT RecordID FROM YourTable where ServerTimeMillis < ?".
Commit (to quickly release the lock on the ServerTimeMillis index, which I assume you have)
For each RecordID, execute "DELETE FROM YourTable WHERE RecordID = ?"
Commit after each record or after each X records (I'm not sure whether that would make much difference). Perhaps even one Commit at the end of the DELETE commands will suffice, since with my suggested new logic, only the deleted rows should be locked.
As for changing the isolation level. I don't think you have to do it. I can't suggest whether you can do it or not, since I don't know the logic of your server, and how it will be affected by such a change.
You can try to replace your one huge DELETE with multiple shorter DELETE ... LIMIT n with n being determined after testing (not too small to cause many queries and not too large to cause long locks). Since the locks would last for a few ms (or seconds, depending on your n) you could let the delete thread run continuously (provided it can keep-up; again n can be adjusted so it can keep-up).
Also, table partitioning can help.
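A sketch of the chunked-delete loop for MySQL; the table name, the cutoff variable and the chunk size of 10000 are placeholders to tune under load, and autocommit is assumed to be off:
String sql = "DELETE FROM mytable WHERE servertimemillis < ? LIMIT 10000";
try (PreparedStatement ps = conn.prepareStatement(sql)) {
    ps.setLong(1, cutoffMillis);
    int deleted;
    do {
        deleted = ps.executeUpdate();   // each chunk only locks its own rows briefly
        conn.commit();                  // release the locks before the next chunk
    } while (deleted > 0);
}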

Handling the concurrent request while persisting in oracle database?

I have this scenario: on an airline website (using Java), two separate customers send two requests at the same time to book the same seat on the same flight from New York to Chicago. I am using an Oracle database and the isolation level is read committed. My question is: does the Oracle database provide any solution to deal with this kind of concurrent scenario? What I know is that when the first transaction's DML statement is fired, it will get a lock on the affected rows and release it when the transaction completes, i.e. on issuing rollback or commit. But the second request will proceed as soon as the first is completed and will override the first one. So that does not help?
Yes, in Java I can deal with this by making my db class a singleton and using the synchronized keyword on the method which is doing the update. But I want to know: is there any way to handle this kind of issue at the database level itself? Perhaps the serializable isolation level can help, but I'm not sure.
It will only overwrite if you allow it. You can try something like
UPDATE seatTable
SET seatTaken = true
WHERE .. find the seat, flight etc.. AND seatTaken = false
This will return 1 row updated the first time and 0 rows updated after that.
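In JDBC that "1 row / 0 rows" distinction is simply the return value of executeUpdate. A sketch with illustrative table and column names, assuming seatTaken is a numeric flag since Oracle has no boolean column type:
String sql = "UPDATE seatTable SET seatTaken = 1 "
           + "WHERE flight_id = ? AND seat_no = ? AND seatTaken = 0";
try (PreparedStatement ps = conn.prepareStatement(sql)) {
    ps.setLong(1, flightId);
    ps.setString(2, seatNo);
    if (ps.executeUpdate() == 1) {
        // the seat is ours
    } else {
        // someone else took the seat first
    }
}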
As you mention, transaction settings will help you achieve one operation at a time. The best way to enforce this kind of restriction is to ensure that your relational model is constrained not to accept the 2nd operation once the 1st one succeeds.
Instead of having to do an update on a row, say update .... seat = "taken", create a reservation table (customer, flight, seat) which has a unique constraint on the seat column (look up the Oracle docs for the syntax on table creation). That way your reservation process becomes an insert into the reservation table and you can rely on the RDBMS to enforce your relational constraints to keep your business model valid.
e.g. Let t1 be the earlier operation time, you'll have:
t1=> insert into reservations(customer1,flight-x,seat-y) // succeeds. Customer 1 reserved the seat-y
t2=> insert into reservations(customer2,flight-x,seat-y) // fails with RDBMS unique constraint violated.
The only way to reserve seat-y again is to first remove the previous reservation, which is probably what your business process wants to achieve.
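A sketch of that reservation table and the insert, in Oracle syntax with illustrative names; the second concurrent insert for the same flight/seat fails with ORA-00001:
try (Statement st = conn.createStatement()) {
    st.execute("CREATE TABLE reservations ("
            + " customer_id NUMBER NOT NULL,"
            + " flight_id   NUMBER NOT NULL,"
            + " seat_no     VARCHAR2(4) NOT NULL,"
            + " CONSTRAINT uq_flight_seat UNIQUE (flight_id, seat_no))");
}

String sql = "INSERT INTO reservations (customer_id, flight_id, seat_no) VALUES (?, ?, ?)";
try (PreparedStatement ps = conn.prepareStatement(sql)) {
    ps.setLong(1, customerId);
    ps.setLong(2, flightId);
    ps.setString(3, seatNo);
    ps.executeUpdate();
} catch (SQLException e) {
    // ORA-00001: unique constraint violated -> the seat is already reserved
}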
To handle concurrency on a web site, a common practice is to have a column on each record that allows you to check that it has not been updated since you got it: either a last update date or a sequential version number (auto-incremented by a trigger).
Typically you will read the data (plus the concurrency column)
SELECT seat,etc,version_no
FROM t1
WHERE column = a_value
Then when the user eventually gets round to booking the seat the update will work unless there has been an update.
(the version number or update date will change after every update)
BEGIN
  UPDATE t1
  SET seatTaken = true
  WHERE seatid = .....
  AND version_no = p_version
  RETURNING version_no INTO p_version;

  IF SQL%ROWCOUNT = 0 THEN
    -- concurrency violation: the record has been updated already,
    -- so raise a custom exception
    RAISE_APPLICATION_ERROR(-20001, 'record has been updated by another session');
  END IF;
END;
The trigger to auto-update the version number would look a little like this (it must be a BEFORE trigger, since :new can only be modified before the row is written):
CREATE OR REPLACE TRIGGER t1_version
BEFORE INSERT OR UPDATE ON t1
FOR EACH ROW
BEGIN
  IF :new.version_no IS NULL THEN
    :new.version_no := 0;
  ELSE
    :new.version_no := :old.version_no + 1;
  END IF;
END;
Aside from doing everything in a single UPDATE by carefully crafting the WHERE clause, you can do this:
Transaction 1:
SELECT ... FOR UPDATE exclusively locks the row for the duration of the transaction.
Check if the returned status of the row is "booked" and exit (or retry another row) if it is.
UPDATE the row and set its "status" to "booked" - it is guaranteed nobody else updated it in the meantime.
Commit. This removes the exclusive lock.
Transaction 2:
SELECT ... FOR UPDATE blocks until Transaction 1 finishes, then exclusively locks the row.
The returned status of the row is "booked" (since Transaction 1 marked it that way), so exit (or possibly retry another row).
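A JDBC sketch of Transaction 1, with illustrative table and column names (seats, seat_id, status):
conn.setAutoCommit(false);
try (PreparedStatement lock = conn.prepareStatement(
         "SELECT status FROM seats WHERE seat_id = ? FOR UPDATE");
     PreparedStatement book = conn.prepareStatement(
         "UPDATE seats SET status = 'booked' WHERE seat_id = ?")) {
    lock.setLong(1, seatId);
    try (ResultSet rs = lock.executeQuery()) {
        if (rs.next() && !"booked".equals(rs.getString("status"))) {
            book.setLong(1, seatId);
            book.executeUpdate();   // safe: this transaction holds the row lock
            conn.commit();          // releases the lock
        } else {
            conn.rollback();        // already booked: give up or try another seat
        }
    }
}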

How do I ensure data consistency in this concurrent situation?

The problem is this:
I have multiple competing threads (100+) that need to access one database table
Each thread will pass a String name - where that name exists in the table, the database should return the id for the row, where the name doesn't already exist, the name should be inserted and the id returned.
There can only ever be one instance of name in the database - ie. name must be unique
How do I ensure that thread one doesn't insert name1 at the same time as thread two also tries to insert name1? In other words, how do I guarantee the uniqueness of name in a concurrent environment? This also needs to be as efficient as possible - this has the potential to be a serious bottleneck.
I am using MySQL and Java.
Thanks
Assuming there is a unique constraint on the name column, each insert will acquire a lock. Any thread that attempts to insert the same name concurrently will wait until the 1st insert either succeeds or fails (the transaction commits or rolls back).
If the 1st transaction succeeds, the 2nd transaction will fail with a unique key violation. Then you know it exists already.
If there is one insert per transaction, it's OK. If there is more than one insert per transaction, you may deadlock.
Each thread will pass a String name - where that name exists in the table, the database should return the id for the row, where the name doesn't already exist, the name should be inserted and the id returned.
So all in all, the algo is like this:
1 read row with name
2.1 if found, return row id
2.2 if not found, attempt to insert
2.2.1 if insert succeeds, return new row id
2.2.2 if insert fails with unique constraint violation
2.2.2.1 read row with name
2.2.2.2 read should succeed this time, so return row id
Because there can be high contention on the unique index, the insert may block for some time. In that case the transaction may time out. Do some stress testing, and tune the configuration until it works correctly with your load.
Also, you should check if you get a unique constraint violation exception or some other exception.
And again, this works only if there is one insert per transaction, otherwise it may deadlock.
Also, you can try to read the row at step 1 with "select ... for update". In this case, it waits until a concurrent insert either commits or rolls back. This can slightly reduce the number of errors at step 2.2.2 due to contention on the index.
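Putting the algorithm together for MySQL, a sketch assuming a table names(id BIGINT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(255) UNIQUE), java.sql imports, and auto-commit, i.e. one insert per transaction:
public long getOrCreate(Connection conn, String name) throws SQLException {
    Long id = findId(conn, name);                       // 1. read row with name
    if (id != null) {
        return id;                                      // 2.1 found, return row id
    }
    try (PreparedStatement ps = conn.prepareStatement(
            "INSERT INTO names (name) VALUES (?)", Statement.RETURN_GENERATED_KEYS)) {
        ps.setString(1, name);
        ps.executeUpdate();                             // 2.2 attempt to insert
        try (ResultSet keys = ps.getGeneratedKeys()) {
            keys.next();
            return keys.getLong(1);                     // 2.2.1 insert succeeded
        }
    } catch (SQLIntegrityConstraintViolationException e) {
        return findId(conn, name);                      // 2.2.2 lost the race; re-read
    }
}

private Long findId(Connection conn, String name) throws SQLException {
    try (PreparedStatement ps = conn.prepareStatement(
            "SELECT id FROM names WHERE name = ?")) {
        ps.setString(1, name);
        try (ResultSet rs = ps.executeQuery()) {
            return rs.next() ? rs.getLong(1) : null;
        }
    }
}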
Create a unique constraint on name column in database.
Add a unique constraint for the name column.

Updating a database while using a preparedStatement select

I'm selecting a subset of data from a MS SQL database, using a PreparedStatement.
While iterating through the resultset, I also want to update the rows. At the moment I use something like this:
prepStatement = con.prepareStatement(
        selectQuery,
        ResultSet.TYPE_FORWARD_ONLY,
        ResultSet.CONCUR_UPDATABLE);
rs = prepStatement.executeQuery();
while (rs.next()) {
    rs.updateInt("number", 20);
    rs.updateRow();
}
The database is updated with the correct values, but I get the following exception:
Optimistic concurrency check failed. The row was modified outside of this cursor.
I've Googled it, but haven't been able to find any help on the issue.
How do I prevent this exception? Or since the program does do what I want it to do, can I just ignore it?
The record has been modified between the moment it was retrieved from the database (through your cursor) and the moment when you attempted to save it back. If the number column can be safely updated independently of the rest of the record or independently of some other process having already set the number column to some other value, you could be tempted to do:
con.execute("update table set number = 20 where id=" & rs("id") )
However, the race condition persists, and your change may be in turn overwritten by another process.
The best strategy is to ignore the exception (the record was not updated), possibly pushing the failed record to a queue (in memory), then do a second pass over the failed records (re-evaluating the conditions in query and updating as appropriate - add number <> 20 as one of the conditions in query if this is not already the case.) Repeat until no more records fail. Eventually all records will be updated.
Assuming you know exactly which rows you will update, I would do
SET your AUTOCOMMIT to OFF
SET ISOLATION Level to SERIALIZABLE
SELECT col1, col2 FROM table WHERE somecondition FOR UPDATE
UPDATE the rows
COMMIT
This is achieved via pessimistic locking (and assuming row locking is supported in your DB, it should work)
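A JDBC sketch of those steps; since the question targets SQL Server, the "SELECT ... FOR UPDATE" step is expressed with an UPDLOCK/ROWLOCK hint, and mytable/somecondition are placeholders for the real query:
conn.setAutoCommit(false);
conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);

List<Integer> ids = new ArrayList<>();
try (Statement st = conn.createStatement();
     ResultSet rs = st.executeQuery(
         "SELECT id FROM mytable WITH (UPDLOCK, ROWLOCK) WHERE somecondition = 1")) {
    while (rs.next()) {
        ids.add(rs.getInt("id"));       // the selected rows are now locked for update
    }
}
try (PreparedStatement upd = conn.prepareStatement(
         "UPDATE mytable SET number = 20 WHERE id = ?")) {
    for (int id : ids) {
        upd.setInt(1, id);
        upd.executeUpdate();
    }
}
conn.commit();                           // releases the locks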
