I have a scenario where I need to insert 16k records into a DB table. Alongside a normal DB batch insert, I have created Callable tasks, each of which takes a batch of 500 records and performs the insertion independently. I am curious how the underlying database will handle these requests. Will database locking at the page level block the rest of the Java threads until the first thread's batch of 500 records is committed?
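For illustration, here is a minimal sketch of the setup described, assuming a hypothetical table records(payload) and a pooled DataSource; each task gets its own pooled connection and commits independently:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.List;
import java.util.concurrent.Callable;
import javax.sql.DataSource;

// One Callable per batch of 500 rows; each task takes its own
// pooled Connection, so the batches are inserted and committed
// independently of each other.
class BatchInsertTask implements Callable<Integer> {
    private final DataSource ds;       // connection pool
    private final List<String> batch;  // one batch of 500 payloads

    BatchInsertTask(DataSource ds, List<String> batch) {
        this.ds = ds;
        this.batch = batch;
    }

    @Override
    public Integer call() throws Exception {
        try (Connection con = ds.getConnection();
             PreparedStatement ps = con.prepareStatement(
                     "INSERT INTO records (payload) VALUES (?)")) {
            con.setAutoCommit(false);
            for (String payload : batch) {
                ps.setString(1, payload);
                ps.addBatch();
            }
            int[] counts = ps.executeBatch();
            con.commit();              // one commit per 500-row batch
            return counts.length;
        }
    }
}

The tasks are then submitted to an ExecutorService, so the batches run in parallel on separate connections.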
My answer is for Sybase ASE. For Sybase IQ see Guillaume's answer.
Will database locking at the page level block the rest of the Java threads until the first thread's batch of 500 records is committed?
That depends on what locking granularity you have set. According to Sybase's documentation, there are three locking granularities:
Allpages locking, which locks datapages and index pages
Datapages locking, which locks only the data pages
Datarows locking, which locks only the data rows
So, if you select Allpages locking, your threads will block until the current batch is committed. Otherwise, your threads will not block, but will naturally incur higher locking overhead.
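For example, the lock scheme is set per table; a sketch, assuming a hypothetical orders table:

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

class LockSchemeSetup {
    // Switch a hypothetical `orders` table to row-level locking on
    // Sybase ASE, so concurrent batch inserts contend on individual
    // rows rather than on pages.
    static void useDatarowsLocking(Connection con) throws SQLException {
        try (Statement st = con.createStatement()) {
            st.execute("ALTER TABLE orders LOCK DATAROWS");
        }
    }
}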
For full details on Sybase ASE's locking granularity, see this documentation.
From Sybase IQ's documentation:
Sybase allows multiple readers, but only one writer to a table.
So, unless you open and close a transaction for every row you insert (which will be slow), your threads will have to wait until one transaction closes before they can start a new one.
When developing a Java application that shares a single Connection between multiple threads, the problem of concurrency arises.
If thread A updates record 1 in table T, and simultaneously thread B issues a SELECT on record 1 in table T, how do I ensure thread B reads the updated values of thread A?
java.sql.Connection offers transaction control via setAutoCommit(false), commit(), and rollback(), but does this mechanism also ensure data correctness?
I think I'm missing something.
Two points:
You shouldn't share a java.sql.Connection between threads, at least in any seriously production-grade code; see here. For demo purposes, I think, sharing a Connection is OK;
If a thread reads from the DB after the relevant DB transaction has been committed, it will see the data written by the other thread.
For your second question
will thread B wait until the first transaction has called commit() or rollback()
-- B will block until A's transaction is finished (either by commit or rollback) if:
B tries to update/delete the same table row which is being updated by A, and ...
A updates that row under a DB-level lock, using SELECT ... FOR UPDATE.
You can reproduce this behavior using two consoles (for example, with PostgreSQL's psql), where each console stands for a thread.
In console A, type the following:
BEGIN;
SELECT some_col FROM some_tbl WHERE some_col = some_val FOR UPDATE;
Now in console B, type:
BEGIN;
UPDATE some_tbl SET some_col = new_val WHERE some_col = some_val;
You should see that the UPDATE blocks until you issue either COMMIT or ROLLBACK in console A.
The explanation above uses separate DB connections, just like a Java JDBC connection pool does. When you share a single connection between Java threads, I think any interaction with the DB will block while the connection is being used by another thread.
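The same experiment can be run from Java with two separate connections; a sketch, with placeholder table/column names and connection URL:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class LockDemo {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://localhost/test?user=demo&password=demo";
        Connection a = DriverManager.getConnection(url);
        a.setAutoCommit(false);                   // console A: BEGIN
        try (Statement sa = a.createStatement()) {
            sa.execute("SELECT some_col FROM some_tbl WHERE some_col = 1 FOR UPDATE");
        }
        Thread b = new Thread(() -> {             // console B
            try (Connection cb = DriverManager.getConnection(url);
                 Statement sb = cb.createStatement()) {
                // Blocks here until connection A commits or rolls back.
                sb.executeUpdate("UPDATE some_tbl SET some_col = 2 WHERE some_col = 1");
                System.out.println("B: update finished");
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        b.start();
        Thread.sleep(2000);                       // give B time to hit the lock
        a.commit();                               // releases the row lock
        b.join();
        a.close();
    }
}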
JDBC is a broadly adopted standard, but adherence to it is uneven, so it is probably not wise to make sweeping statements about what is safe.
I would not expect anything to keep statement executions, commits, and rollbacks made from multiple threads from getting interleaved. In the best case, only one thread can use the connection at a time while the others block, which makes multithreading useless.
If you don't want to give each thread its own connection, you could have the threads submit work items to a queue that is consumed by a single worker thread handling all the JDBC work, as sketched below. But introducing a connection pool probably has less impact on existing code.
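A minimal sketch of that queue pattern, assuming a hypothetical work_items table; only the worker thread ever touches the Connection:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class JdbcWorker implements Runnable {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Connection connection;    // owned by the worker thread only

    public JdbcWorker(Connection connection) {
        this.connection = connection;
    }

    public void submit(String payload) {    // safe to call from any thread
        queue.add(payload);
    }

    @Override
    public void run() {
        try (PreparedStatement ps = connection.prepareStatement(
                "INSERT INTO work_items (payload) VALUES (?)")) {
            while (!Thread.currentThread().isInterrupted()) {
                String payload = queue.take();  // blocks until work arrives
                ps.setString(1, payload);
                ps.executeUpdate();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // shut down cleanly
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }
}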
In general, if you have concurrent updates and reads, they happen in the order in which they happen. Locking and isolation levels provide consistency guarantees for concurrent transactions, but they don't apply to a transaction that hasn't started yet. You could add a status flag, version number, or timestamp to each row to indicate when an update occurred.
If you have a lot of updates, it can be better to collect them in a flat file and execute a bulk copy; this can be much faster than going through JDBC. Then, with the updates out of the way, execute the selects via JDBC.
Using Java, Hibernate, and an Oracle database.
I have two concurrent processes:
Process1 removes some entities from table1 (multiple: delete from table1 where id = ...), done by a native Hibernate query.
Process2 updates the SAME or other entities in table1 (multiple: update table1 set name = ... where id = ...), done by a JPA repository method.
Currently, a CannotAcquireLockException is sometimes thrown:
SQL Error: 60, SQLState: 61000
ORA-00060: deadlock detected while waiting for resource
So, the question is: what is going on, and how can I avoid the exception? Is there any workaround?
IMPORTANT: in case of collisions I would be satisfied if the delete succeeds and the update does nothing.
Session A waits for B, and B waits for A - this is what a deadlock basically is. Since there is then nothing left to wait for, Oracle kills one of the sessions.
Option 1
Create a semaphore table to effectively serialize the concurrent processes:
create table my_semaphore(dummy char(1));
Session 1:
LOCK TABLE my_semaphore in exclusive mode;
UPDATE <your update here>;
COMMIT;
Session 2:
LOCK TABLE my_semaphore in exclusive mode;
DELETE <your delete here>;
COMMIT;
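From JDBC the same pattern might look like the following sketch (the UPDATE statement is a placeholder); note that the table lock is released by the COMMIT or ROLLBACK:

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

class SerializedUpdate {
    // Both processes take the semaphore lock first, so the UPDATE and
    // the DELETE can never interleave and deadlock on table1's rows.
    static void updateUnderSemaphore(Connection con) throws SQLException {
        con.setAutoCommit(false);
        try (Statement st = con.createStatement()) {
            st.execute("LOCK TABLE my_semaphore IN EXCLUSIVE MODE");
            st.executeUpdate("UPDATE table1 SET name = 'x' WHERE id = 42");
            con.commit();            // COMMIT releases the table lock
        } catch (SQLException e) {
            con.rollback();          // so does ROLLBACK
            throw e;
        }
    }
}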
Option 2
Try processing the rows with both statements in the same order, say by ROWID or whatever suits you, so that session B never returns to rows held by A while A is stuck behind rows locked by B. This is more tricky and resource-intensive.
"locking tables doesnt look attractive at all -what the point then of having severaal processes working with database"
Obviously we want to enable concurrent processes. The trick is to design processes which can run concurrently without interfering with each other. Your architecture is failing to address this point. It should not be possible for Process B to update records which are being deleted by Process A.
This is an unfortunate side-effect of the whole web paradigm which is stateless and favours an optimistic locking strategy. Getting locks at the last possible moment "scales" but incurs the risk of deadlock.
The alternative is a pessimistic locking strategy, in which a session locks the rows it wants up front. In Oracle we can do this with SELECT ... FOR UPDATE. This locks a subset of rows (the set defined by the WHERE clause) and not the whole table. Find out more.
So it doesn't hinder concurrent processes which operate on different subsets of data but it will prevent a second session grabbing records which are already being processed. This still results in an exception for the second session but at least that happens before the session has done any work, and provides information to re-evaluate the task (hmmm, do we want to delete these records if they're being updated?).
Hibernate supports the SELECT ... FOR UPDATE syntax; this StackOverflow thread discusses it.
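With plain JPA, a sketch of this might look as follows, assuming a hypothetical Entity1 mapped to table1; conveniently, a row that the other process has already deleted is simply not found, so the update does nothing, which matches the requirement above:

import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.Id;
import javax.persistence.LockModeType;

@Entity
class Entity1 {                      // hypothetical mapped entity
    @Id long id;
    String name;
    void setName(String n) { this.name = n; }
}

class PessimisticUpdate {
    // JPA's equivalent of SELECT ... FOR UPDATE: lock the row up front,
    // then update it while no other session can grab it.
    static void updateName(EntityManager em, long id, String name) {
        em.getTransaction().begin();
        Entity1 e = em.find(Entity1.class, id, LockModeType.PESSIMISTIC_WRITE);
        if (e != null) {             // already deleted: do nothing
            e.setName(name);
        }
        em.getTransaction().commit();
    }
}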
I'd like to implement the following scenario in PostgreSQL from Java:
User selects data
User starts transaction: inserts, updates, deletes data
User commits transaction
I'd like the data not to be available to other users for the duration of the transaction. It would be enough if I got an exception when another user tries to update the table.
I've tried using SELECT FOR UPDATE and SELECT FOR SHARE, but both lock the data for reading as well. I've tried the LOCK command, but either I can't obtain the lock (ERROR: could not obtain lock on relation "fppo10") or the other transaction acquires its lock only when it tries to commit, not when it updates the data.
Is there a way to lock the data at the moment the transaction starts, to prevent any other update, insert, or delete statement?
I have had this scenario working successfully for a couple of years on a DB2 database. Now I need the same application to work with PostgreSQL as well.
Finally, I think I get what you're going for.
This isn't a "transaction" problem per se (and depending on the number of tables and statements involved, you may not even need one); it's an application design problem. There are two general ways to deal with this: optimistic and pessimistic locking.
Pessimistic locking is explicitly taking and holding a lock. It's best used when you can guarantee that you will be changing the row plus stuff related to it, and when your transactions will be short. You would use it in situations like updating "current balance" when adding sales to an account, once a purchase has been made (update will happen, short transaction duration time because no further choices to be made at that point). Pessimistic locking becomes frustrating if a user reads a row and then goes to lunch (or on vacation...).
Optimistic locking is reading a row (or set of), and not taking any sort of db-layer lock. It's best used if you're just reading through rows, without any immediate plan to update any of them. Usually, row data will include a "version" value (incremented counter or last updated timestamp). If your application goes to update the row, it compares the original value(s) of the data to make sure it hasn't been changed by something else first, and alerts the user if the data changed. Most applications interfacing with users should use optimistic locking. It does, however, require that users notice and pay attention to updated values.
Note that, because a lock is rarely (and for a short period) taken in optimistic locking, it usually will not conflict with a separate process that takes a pessimistic lock. A pessimistic locking app would prevent an optimistic one from updating locked rows, but not reading them.
Also note that this doesn't usually apply to bulk updates, which will have almost no user interaction (if any).
tl;dr
Don't lock your rows on read. Just compare the old value(s) with what the app last read and reject the update if they don't match (alerting the user), as sketched below. Train your users to respond appropriately.
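A sketch of such an optimistic update in JDBC, assuming a hypothetical accounts(id, balance, version) table:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class OptimisticUpdate {
    // No lock is held while the user works; the WHERE clause rejects
    // the write if someone else bumped the version in the meantime.
    static boolean updateBalance(Connection con, long id,
                                 long newBalance, int versionRead)
            throws SQLException {
        String sql = "UPDATE accounts SET balance = ?, version = version + 1 "
                   + "WHERE id = ? AND version = ?";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setLong(1, newBalance);
            ps.setLong(2, id);
            ps.setInt(3, versionRead);
            return ps.executeUpdate() == 1; // false: stale read, alert the user
        }
    }
}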
Instead of SELECT FOR UPDATE, try a "row exclusive" table lock:
LOCK TABLE YourTable IN ROW EXCLUSIVE MODE;
According to the documentation, this lock:
The commands UPDATE, DELETE, and INSERT acquire this lock mode on the target table (in addition to ACCESS SHARE locks on any other referenced tables). In general, this lock mode will be acquired by any command that modifies data in a table.
Note that the name of the lock mode is confusing, but it is a table-level lock:
Remember that all of these lock modes are table-level locks, even if the name contains the word "row"; the names of the lock modes are historical.
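A sketch of taking this lock from JDBC at the start of the transaction; PostgreSQL holds the table lock until the transaction commits or rolls back:

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

class TableLockExample {
    static void withTableLock(Connection con) throws SQLException {
        con.setAutoCommit(false);    // the lock only makes sense in a transaction
        try (Statement st = con.createStatement()) {
            st.execute("LOCK TABLE YourTable IN ROW EXCLUSIVE MODE");
            // ... inserts / updates / deletes here ...
            con.commit();            // releases the lock
        } catch (SQLException e) {
            con.rollback();          // so does a rollback
            throw e;
        }
    }
}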
I have a J2EE server, currently running only one thread (the problem arises even within one single request), that saves its internal model of data to MySQL/InnoDB tables.
The basic idea is to read data from flat files, do a lot of calculation, and then write the result to MySQL; then read another set of flat files for the next day and repeat from step 1. As only a minor part of the rows changes, I use a recordset of already-written rows, compare it to the current result in memory, and then update/insert accordingly (no deletes, just setting a deletedFlag).
Problem: despite a purely sequential process I get lock timeout errors (#1204), and SHOW ENGINE INNODB STATUS shows record locks (though I do not know how to work out the details). To complicate things, everything works on my Windows machine, while the production system (where I can't install innotop) ends up with some record locks.
On to the critical code:
1. Read data and calculate (works)
2. Get a Connection from the Tomcat pool and set autocommit=false
3. Use a Statement to issue "LOCK TABLES order WRITE"
4. Open an updatable Recordset on table order
5. For each row in the Recordset: if there is a difference, update it from the in-memory object
6. For objects not yet in the database: insert the data
7. Commit the Connection, close the Connection
Steps 5 and 6 use a commit counter, so that the rows are committed after every 500 changes (to avoid having 50,000 rows uncommitted). In the first run (i.e. without any locks) this takes at most about 30 seconds per table.
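For illustration, a minimal sketch of steps 4-7 with the commit counter, assuming a hypothetical Order class and an order table with columns id and amount; note that the ResultSet is closed deterministically rather than left to the garbage collector (see the "Solved" note below):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Map;

class OrderSync {
    static class Order { long id; long amount; }  // hypothetical in-memory row

    static void sync(Connection con, Map<Long, Order> inMemory) throws Exception {
        con.setAutoCommit(false);
        int pending = 0;
        try (Statement st = con.createStatement(
                 ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_UPDATABLE);
             ResultSet rs = st.executeQuery("SELECT id, amount FROM `order`")) {
            while (rs.next()) {
                Order o = inMemory.get(rs.getLong("id"));
                if (o != null && o.amount != rs.getLong("amount")) {
                    rs.updateLong("amount", o.amount);
                    rs.updateRow();
                    if (++pending % 500 == 0) {
                        con.commit();      // flush every 500 changes
                    }
                }
            }
        }                                  // try-with-resources closes rs/st here
        con.commit();                      // commit the remainder
    }
}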
As stated above, right now I avoid any other interaction with the database, but in the future other processes (user requests) might read data or even write some fields. I would not mind those processes reading either old or new data, or waiting a couple of minutes (that is, on a lock) to save their changes to the DB.
I would be grateful for any recommendation on how to do this better.
Summary: complex code calculates in-memory objects which are to be synchronized with the database. This sync currently seems to lock itself, despite the fact that it sequentially locks, changes, and unlocks the tables without any exceptions being thrown. But for some reason row locks seem to remain.
Kind regards
Additional information:
MySQL: SHOW PROCESSLIST lists no active connections (all asleep, or alternatively waiting for table locks on table order), while SHOW ENGINE INNODB STATUS reports a number of row locks (unfortunately I can't tell which transaction is meant, as the output is quite cryptic).
Solved: I had wrongly declared a ResultSet as updatable. The ResultSet was only closed in a finalize() method via the garbage collector, which was not fast enough: before that happened I reopened the ResultSet and therefore tried to acquire a lock on an already-locked table.
Still, it was odd that innotop showed another query of mine hanging on a completely different table. But as it works for me, I do not care about oddities :-)
I wish to execute a LOAD DATA LOW_PRIORITY INFILE statement from Java.
I am dealing only with MyISAM engine.
I am interested in whether statement.execute("LOAD DATA LOW_PRIORITY INFILE ...") will execute this query asynchronously, or whether it will block until the statement has completed.
I am asking because I have SQL operations after this statement that are based on the loaded data, but I still want any read operations on this table that execute concurrently to have higher priority than the LOAD DATA statement.
LOAD DATA LOW_PRIORITY INFILE ... blocks until completion on the command line, so I assume your code will block too.
If you want concurrent transactions to be able to read from your table during the import, use the CONCURRENT option instead of LOW_PRIORITY.
As stated in the manual:
If you specify CONCURRENT with a MyISAM table that satisfies the condition for concurrent inserts (that is, it contains no free blocks in the middle), other threads can retrieve data from the table while LOAD DATA is executing.
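A sketch of the call from JDBC, with a placeholder file path and table name; execute() returns only after the load finishes, so statements issued afterwards on the same connection always see the imported rows:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class LoadDataExample {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://localhost/test?user=demo&password=demo";
        try (Connection con = DriverManager.getConnection(url);
             Statement st = con.createStatement()) {
            // Blocks until the file has been loaded.
            st.execute("LOAD DATA CONCURRENT INFILE '/tmp/data.csv' "
                     + "INTO TABLE my_table "
                     + "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'");
            // ... subsequent SQL that depends on the loaded data ...
        }
    }
}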