I am writing a Java application where different threads (each thread has its own Connection object, obtained from a c3p0 connection pool) call a method like the following.
Pseudo code:
void example(Connection connection) throws SQLException {
    try (Statement stmt = connection.createStatement()) {
        stmt.executeUpdate("LOCK TABLES Test WRITE");
        try (ResultSet resultSet = stmt.executeQuery("SELECT * FROM Test WHERE Id = '5'")) {
            if (resultSet.next()) {
                stmt.executeUpdate("UPDATE Test SET Amount = '10' WHERE Id = '5'");
            } else {
                stmt.executeUpdate("INSERT INTO Test (Id, Amount) VALUES ('5', '10')");
            }
        }
        stmt.executeUpdate("UNLOCK TABLES");
    }
    connection.commit();
}
There are a few other similar methods that lock a table, select/update/insert something, and then unlock the table. The aim is to prevent race conditions and deadlocks.
Is it possible to cause MySQL deadlocks when I call such a method from different threads? If so, can you give me an example of how that happens (the timing of two transactions that cause a deadlock)? I am a noob with deadlocks and want to get into this.
Edit: Made clear that the connection used in the method is passed in by the calling thread.
Edit: Replaced READ with WRITE.
It cannot deadlock here. As there is no complex logic and the code commits immediately after the update, there is always one thread that gets through. Even in more complex scenarios, a deadlock would probably require the highest isolation level (SERIALIZABLE), which is not MySQL's default.
This would possibly create a deadlock. Actually I'm not sure if it'll even execute, because you need to acquire a "WRITE" lock, not a "READ".
Related
I am writing a program in which multiple threads insert into a database.
Example
public static void save(String name) {
    try (PreparedStatement preparedStatement = connection.prepareStatement("INSERT ...")) {
        preparedStatement.setString(1, name);
        preparedStatement.executeUpdate();
    } catch (SQLException e) {
        // handle / log the exception
    }
}
Question: When threads simultaneously execute inserts into a table, could one thread end up using (via preparedStatement.executeUpdate()) the PreparedStatement of another thread?
Absolutely. You should not be doing this - each thread needs to have its own database connection (which therefore implies it necessarily also ends up having its own PreparedStatement).
Better yet - don't do this. You're just making things confusing and slower, it's lose-lose-lose. There is no benefit at all to your plan. The database isn't going to magically do the job faster if you insert from 2 threads simultaneously.
The conclusion is simple: threads are a really bad idea when INSERTing a lot of data into the same table, so DO NOT DO IT!
But I really want to speed up my INSERTs!
My data gathering is slow
IF (big if!!) gathering the data for insertion is slower than the database can insert records for you, and the data gathering job lends itself well to multi-threading, then have threads that gather the data, but have these threads put objects onto a queue, and have a separate 'DB inserter' thread (the only thread that even has a connection to the DB) that pulls objects off this queue and runs an INSERT.
If you can gather the data quickly, or the source does not lend itself to multi-threaded, this only makes your code longer, harder to understand, considerably harder to test, and slower. No point at all.
Useful tools: LinkedBlockingQueue - an instance of this is the one piece of shared data all threads have. Your data-gatherer threads toss objects onto this queue, and your single DB inserter thread fetches objects off of it.
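A minimal sketch of that setup, assuming a hypothetical people table, placeholder JDBC URL and credentials, and a simple poison-pill value to signal the end of the data: the gatherer thread only touches the queue, and the single inserter thread owns the only Connection.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueuedInserter {
    // The one piece of shared state: gatherer threads put, the single inserter thread takes.
    private static final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private static final String POISON_PILL = "__DONE__"; // signals the inserter to stop

    public static void main(String[] args) throws Exception {
        // A gatherer thread: it never touches the database.
        Thread gatherer = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                queue.add("name-" + i); // data gathering is simulated here
            }
            queue.add(POISON_PILL);
        });
        gatherer.start();

        // The DB inserter: the only thread holding a Connection (URL/credentials are placeholders).
        try (Connection con = DriverManager.getConnection("jdbc:mysql://localhost/test", "user", "password");
             PreparedStatement ps = con.prepareStatement("INSERT INTO people (name) VALUES (?)")) {
            while (true) {
                String name = queue.take(); // blocks until a gatherer produces something
                if (POISON_PILL.equals(name)) {
                    break;
                }
                ps.setString(1, name);
                ps.executeUpdate();
            }
        }
    }
}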
General insert speed advice 1: bundling
DBs work in transactions. If you have autocommit mode on (and Connections start in this mode), that's not 'no transactions'. It merely means (hence the name) that the DB commits after every statement, each statement being its own transaction. You can't do 'non-transactional' in proper databases. A commit() is heavy (it takes a long time to process), but so are excessively long transactions (doing thousands of things in a single transaction before committing). Thus, you get the goldilocks principle: you want to run about 500 or so inserts, then commit.
Note that this has a downside: if an error occurs halfway through this process, then some records have been committed and some haven't. Keep that in mind - either your process is idempotent, or, if partial batches are not acceptable, you need to make them recoverable (e.g. by having a column that records the 'insert session' id, so you can delete the partial batch if the operation cannot be completed properly) - and if your DB is simultaneously used by other stuff, you need more complexity as well (some sort of flag or filter so that other concurrent code doesn't consider any of the committed, inserted records until the entire batch has been added).
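As an illustration of the 'insert session' idea, a minimal sketch; the people table, the insert_session column, and the chunk size are all hypothetical, not something your schema already has:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import java.util.UUID;

class SessionTaggedInsert {
    // Tags every row of one batch with the same session id; deletes the partial batch on failure.
    static void insertBatch(Connection con, List<String> names) throws SQLException {
        String sessionId = UUID.randomUUID().toString();
        con.setAutoCommit(false);
        try (PreparedStatement ps = con.prepareStatement(
                "INSERT INTO people (name, insert_session) VALUES (?, ?)")) { // placeholder table/column
            int inserted = 0;
            for (String name : names) {
                ps.setString(1, name);
                ps.setString(2, sessionId);
                ps.executeUpdate();
                if (++inserted % 500 == 0) {
                    con.commit(); // commit every ~500 rows
                }
            }
            con.commit();
        } catch (SQLException e) {
            con.rollback();
            // Remove rows from earlier, already-committed chunks of this session.
            // (If the connection itself is broken, this cleanup will fail as well.)
            try (PreparedStatement del = con.prepareStatement(
                    "DELETE FROM people WHERE insert_session = ?")) {
                del.setString(1, sessionId);
                del.executeUpdate();
                con.commit();
            }
            throw e;
        }
    }
}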
Relevant methods:
con.setAutoCommit(false);
con.commit()
This general structure:
con.setAutoCommit(false);
try (PreparedStatement ps = con.prepareStatement("INSERT ...")) {
    int inserted = 0;
    while (!allGenerationDone) {
        Data d = queue.take(); // blocks until a gatherer thread offers something; may throw InterruptedException
        ps.setString(1, d.getName());
        ps.setDate(2, d.getBirthDate());
        // set the other columns
        ps.execute();
        if (++inserted % 500 == 0) con.commit(); // commit every ~500 inserts
    }
}
con.commit();
General insert speed advice 2: bulk
Most DB engines have special commands for bulk insertion. From a DB engine's perspective, various cleanup and care tasks take a ton of time and may not even be necessary, or can be combined to save a load of time, when doing bulk inserts. Specifically, checking of constraints (particularly, reference constraints) and building of indices takes most of the time of processing an INSERT, and these things can either be skipped entirely or sped up considerably by doing them in bulk all at once at the end.
The way to do this is highly dependent on the underlying database. For example, in postgres, you can turn off constraint checking and turn off index building, then run your inserts, then re-enable. You can even choose to omit constraint checks entirely (meaning, your DB could be in an invalid state if your code is messed up, but if speed is more important than safety this can be the right way to go about it). Index building is considerably faster if done at the end.
Other databases generally have similar strategies. Alternatively, there are commands that combine it all, generally called COPY (instead of INSERT). Check your DB engine's docs.
Read this SO question for some info and benchmarks on how COPY compares to INSERT. And use a web search engine searching for e.g. mysql bulk insert.
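For MySQL specifically, the bulk-load command is LOAD DATA rather than COPY. A minimal sketch, under the assumption that a CSV file has already been written, that local_infile is enabled on the server, and that the file path, table, and connection details are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class BulkLoadExample {
    public static void main(String[] args) throws Exception {
        // allowLoadLocalInfile lets Connector/J send a local file to the server (assumes the server allows it too).
        String url = "jdbc:mysql://localhost/test?allowLoadLocalInfile=true"; // placeholder URL
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             Statement stmt = con.createStatement()) {
            stmt.execute(
                "LOAD DATA LOCAL INFILE '/tmp/people.csv' " + // placeholder file
                "INTO TABLE people " +                        // placeholder table
                "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n' " +
                "(name, birth_date)");
        }
    }
}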
When developing a Java application that shares a single Connection between multiple threads, the problem of concurrency arises.
If thread A updates record 1 in table T, and simultaneously thread B issues a SELECT on record 1 in table T, how do I ensure thread B reads the updated values of thread A?
java.sql.Connection offers transactions with setAutoCommit(false), commit() and rollback(), but does this mechanism also cover data correctness?
I think I'm missing something.
Two points:
You shouldn't share a jdbc.Connection between threads, at least not in any serious production code, see here. For demo purposes, I think, sharing a Connection is OK;
If a thread reads from the DB after the relevant DB transaction is committed, it will see the data written by the other thread.
For your second question
will thread B timeout until the first transaction has commit() or rollback()
-- B will block until A's transaction is finished (either by commit or rollback) if:
B tries to update/delete the same table row which is being updated by A, and ...
A updates that row under a DB-level lock, using SELECT ... FOR UPDATE.
You can get this behavior using two consoles (for example, with PostgreSQL psql), each console standing for a thread:
in console A, type the following:
BEGIN;
SELECT some_col FROM some_tbl WHERE some_col = some_val FOR UPDATE;
now in console B, type:
BEGIN;
UPDATE some_tbl SET some_col = new_val WHERE some_col = some_val;
You should see that UPDATE blocks until in A you do either COMMIT or ROLLBACK.
The above explanation uses separate DB connections, just like a Java JDBC connection pool does. When you share a single connection between Java threads, I think any interaction with the DB will block if the connection is being used by some other thread.
JDBC is a standard that is broadly adopted but with uneven levels of adherence, so it is probably not wise to make sweeping statements about what is safe.
I would not expect there is anything to keep statement executions and commits and rollbacks made from multiple threads from getting interleaved. Best case, only one thread can use the connection at a time and the others block, making multithreading useless.
If you don't want to provide a connection to each thread, you could have the threads submit work items to a queue that is consumed by a single worker thread handling all the jdbc work. But it's probably less impact on existing code to introduce a connection pool.
In general if you have concurrent updates and reads then they happen in the order that they happen. Locking and isolation levels provide consistency guarantees for concurrent transactions but if one hasn't started its transaction yet those aren't applicable. You could have a status flag, version number, or time stamp on each row to indicate when an update occurred.
If you have a lot of updates it can be better to collect them in a flat file and execute a bulk copy. It can be much faster than using jdbc. Then with updates out of the way execute selects in jdbc.
I am working on a program that allows multiple users to access a db (MySQL), and at various times I'm getting an SQLException: Lock wait timeout exceeded.
The connection is created using:
conn = DriverManager.getConnection(connString, username, password);
conn.setAutoCommit(false);
and the calls all go through this bit of code:
try {
    saveRecordInternal(record);
    conn.commit();
} catch (Exception ex) {
    conn.rollback();
    throw ex;
}
Where saveRecordInternal has some internal logic, saving the given record. Somewhere along the way is the method which I suspect is the problem:
private long getNextIndex() throws Exception {
    String query = "SELECT max(IDX) FROM MyTable FOR UPDATE";
    PreparedStatement stmt = conn.prepareStatement(query);
    ResultSet rs = stmt.executeQuery();
    if (rs.next()) {
        return (rs.getLong("IDX") + 1);
    } else {
        return 1;
    }
}
This method is called by saveRecordInternal somewhere along its operation, if needed.
For reasons that are currently beyond my control I cannot use auto-increment index, and anyway the index to-be-inserted is needed for some internal-program logic.
I would assume that having either conn.commit() or conn.rollback() called would suffice to release the lock, but apparently it doesn't. So my question is: should I call stmt.close() or rs.close() inside getNextIndex? Would that release the lock before the transaction is either committed or rolled back, or would it simply ensure the lock is indeed released when calling conn.commit() or conn.rollback()?
Is there anything else I'm missing / doing entirely wrong?
Edit: At the time the lock occurs all connected clients seem to be responsive, with no queries currently under-way, but closing all connected clients does resolve the issue. This leads me to think the lock is somehow preserved even though the transaction (supposedly?) ends, either by committing or rolling back.
Even though not closing a Statement or ResultSet is a bad idea, that function doesn't seem responsible for the error you are receiving. The function getNextIndex() is creating a local Statement and ResultSet but not closing them. Close them right there, or create the Statement and ResultSet objects in saveRecordInternal() and pass them as parameters, or better yet, create them at your starting point and reuse them. Finally, close them when they are no longer needed (in the following order: ResultSet, Statement, Connection).
The error simply means that a lock was present on some DB object (table, row, etc.) while another thread/process needed it at the same time and had to wait because the object was already locked, and the wait timed out because it took longer than expected.
Refer to "How to Avoid Lock wait timeout exceeded exception?" to learn more about this issue.
All in all, this is an environment-specific issue and needs to be debugged on your machine with extensive logging turned on.
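As a sketch of the closing advice, getNextIndex() rewritten with try-with-resources (an alias is added for the aggregate column so it can be read by name); note the table/row lock itself is still only released by commit() or rollback():

private long getNextIndex() throws Exception {
    String query = "SELECT max(IDX) AS maxIdx FROM MyTable FOR UPDATE";
    try (PreparedStatement stmt = conn.prepareStatement(query);
         ResultSet rs = stmt.executeQuery()) {
        if (rs.next()) {
            // getLong returns 0 when max(IDX) is NULL (empty table), so this yields 1 in that case
            return rs.getLong("maxIdx") + 1;
        }
        return 1;
    }
}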
Hope it helps !!
From the statements above I don't see any locks that remain open!
In general MySql should release the locks whenever a commit or rollback is called, or when the connection is closed.
In your case
SELECT max(IDX) FROM MyTable FOR UPDATE
would result in locking the whole table, but I assume that this is the expected logic! You lock the table until the new row is inserted and then release it to let the others insert.
I would test with:
SELECT IDX FROM MyTable FOR UPDATE Order by IDX Desc LIMIT 1
to make sure that the lock remains open even when locking a single row.
If this is not the case, it might be a lock timeout due to a very large table.
So, what I think happens here is: your query is trying to execute on some table, but that table is locked by another process. Until the older lock is released from the table, your query will wait to be executed, and if the lock is not released after some time, you will get the lock wait timeout exception.
You can also take a look at table-level locks and row-level locks.
In brief: a table-level lock locks the whole table, and while that lock is held you won't be able to execute any other query against the same table,
while
a row-level lock locks a specific row, so queries against the rest of the table can still be executed.
You can also check whether some query has been running against the table for a long time, preventing your query from executing and causing the exception. How to check this varies by database, but you can search for the query that lists open connections or long-running statements for your specific database.
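For MySQL, one way to do that check from Java is to read the process list; a sketch, with an arbitrary threshold in seconds:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

class LongQueryCheck {
    // Prints statements that have been running longer than the given number of seconds.
    static void printLongRunning(Connection conn, long thresholdSeconds) throws Exception {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW FULL PROCESSLIST")) {
            while (rs.next()) {
                long seconds = rs.getLong("Time");
                String sql = rs.getString("Info"); // null for idle connections
                if (seconds > thresholdSeconds && sql != null) {
                    System.out.println("Running for " + seconds + "s: " + sql);
                }
            }
        }
    }
}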
I have a set of queries which have to be executed simultaneously. For this I am starting a Runnable thread for each query, from a loop iterating through the list of queries.
Each thread executes its query and also measures the time the query takes to execute. This is done by capturing the start and end times and taking the difference. This time should be specific to each query and needs to be printed out along with the corresponding query.
In this scenario, do I just need to capture the times and display them? Will it lead to any synchronization problems?
This question is too generic. Will it lead to concurrency problems? Maybe. Do you share objects between the Runnables? If you do, you MIGHT have an issue; if you don't, then no.
If your loop looks like this, for example (simplified code, just to illustrate the point):
for (int i = 0; i < queries.size(); ++i) {
    final String query = queries.get(i);
    new Thread(new Runnable() {
        @Override
        public void run() {
            // Execute `query` here
        }
    }).start();
}
Each thread will execute its own query WITHOUT accessing any shared data - then no, you will not have concurrency problems - at least in the Java code. You could have problems in the database - for example, multiple queries trying to update the same row.
In this scenario, do I just need to capture times and display? Will it
lead to any synchronization problems?
I don't know, do you? Why don't you just have a member field in each of your Runnable instances that keeps track of the time it takes to execute - and then, when all threads are finished, iterate over the Runnables and display the data? Who besides you knows what your true intentions are? There are only two synchronization problems I know of: deadlock and race conditions. Neither one applies to this scenario.
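A sketch of that suggestion, with the actual query execution stubbed out: each Runnable records its own elapsed time in a field, and the main thread reads the fields only after joining, so no synchronization beyond join() is needed.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TimedQueryRunner implements Runnable {
    private final String query;
    private long elapsedMillis; // written by the worker thread; safe to read after join()

    TimedQueryRunner(String query) {
        this.query = query;
    }

    @Override
    public void run() {
        long start = System.currentTimeMillis();
        // execute `query` here (stubbed out in this sketch)
        elapsedMillis = System.currentTimeMillis() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> queries = Arrays.asList("SELECT 1", "SELECT 2"); // placeholder queries
        List<TimedQueryRunner> runners = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (String q : queries) {
            TimedQueryRunner r = new TimedQueryRunner(q);
            Thread t = new Thread(r);
            runners.add(r);
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join(); // wait for all queries to finish
        }
        for (TimedQueryRunner r : runners) {
            System.out.println(r.query + " took " + r.elapsedMillis + " ms");
        }
    }
}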
I'm writing an application to analyse a MySQL database, and I need to execute several DMLs simultaneously; for example:
// In ResultSet rsA: SELECT * FROM A
rsA.beforeFirst();
while (rsA.next()) {
    int id = rsA.getInt("id");
    // Retrieve data from table B: SELECT * FROM B WHERE B.Id = <id>
    // Crunch some numbers using the data from B
    // Close ResultSet B
}
I'm declaring an array of data objects, each with its own Connection to the database, which in turn calls several methods for the data analysis. The problem is that all threads use the same connection, so all tasks throw exceptions: "Lock wait timeout exceeded; try restarting transaction"
I believe there is a way to write the code in such a way that any given object has its own connection and executes the required tasks independent from any other object. For example:
DataObject[] dataObject = new DataObject[N + 1];
dataObject[0] = new DataObject(id[0]);
dataObject[1] = new DataObject(id[1]);
dataObject[2] = new DataObject(id[2]);
...
dataObject[N] = new DataObject(id[N]);
// The 'DataObject' class has its own connection to the database,
// so each instance of the object should use its own connection.
// It also has a "run" method, which contains all the tasks required.
Executor ex = Executors.newFixedThreadPool(10);
for (int i = 0; i <= N; i++) {
ex.execute(dataObject[i]);
}
// Here is where the problem is: each instance creates a new connection,
// but every DML from any of the objects is funneled through just one connection
// (in the MySQL command line, "SHOW PROCESSLIST;" shows every connection, and all but
// one are idle).
Can you point me in the right direction?
Thanks
I think the problem is that you've mixed a lot of middle-tier, transactional, and persistence logic into one class.
If you're dealing directly with ResultSet, you're not thinking about things in a very object-oriented fashion.
You're smart if you can figure out how to get the database to do some of your calculations.
If not, I'd recommend keeping Connections open for the minimum time possible. Open a Connection, get the ResultSet, map it into an object or data structure, close the ResultSet and Connection in local scope, and return the mapped object/data structure for processing.
You keep persistence and processing logic separate this way. You save yourself a lot of grief by keeping connections short-lived.
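A sketch of that pattern, assuming a hypothetical Customer value class, a placeholder customer table, and a pooled DataSource: the Connection and ResultSet live only inside the method, and only the mapped objects escape for processing.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import javax.sql.DataSource;

class Customer {
    final int id;
    final String name;

    Customer(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

class CustomerDao {
    private final DataSource dataSource;

    CustomerDao(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Connection and ResultSet are opened and closed here; callers only see mapped objects.
    List<Customer> findByCity(String city) throws SQLException {
        List<Customer> result = new ArrayList<>();
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement("SELECT id, name FROM customer WHERE city = ?")) {
            ps.setString(1, city);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    result.add(new Customer(rs.getInt("id"), rs.getString("name")));
                }
            }
        }
        return result; // number crunching happens on these objects, after the connection is closed
    }
}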
If a stored procedure solution is slow it could be due to poor indexing. Another solution will perform equally poorly if not worse. Try running EXPLAIN PLAN and see if any of your queries are using TABLE SCAN. If yes, you have some indexes to add. It could also be due to large rollback logs if your transactions are long-running. There's a lot you could and should do to ensure you've done everything possible with the solution you have before switching. You could go to a great deal of effort and still not address the root cause.
After some brain-breaking, I figured out my own mistakes... I want to share this new knowledge, so... here I go.
I made a very big mistake by declaring the Connection object as a static field in my code... so obviously, even though I created a new Connection for each new data object, every transaction went through a single, static connection.
With that first issue corrected, I went back to the drawing board and realized that my process was:
Read an Id from an input table
Take a block of data related to the Id read in step 1, stored in other input tables
Crunch numbers: Read the related input tables and process the data stored in them
Save the results in one or more output tables
Repeat the process while I have pending Ids in the input table
Just by using a dedicated connection for input reading and a dedicated connection for output writing, the performance of my program increased... but I needed a lot more!
My original approach for steps 3 and 4 was to save each result to the output as soon as I had it... But I found a better approach:
Read the input data
Crunch the numbers, and put the results in a bunch of queues (one for each output table)
A separate thread checks every second whether there's data in any of the queues. If there is, it writes it to the tables.
So, by dividing input and output tasks using different connections, and by redirecting the core process output to a queue, and by using a dedicated thread for output storage tasks, I finally achieved what I wanted: Multithreaded DML execution!
I know there are better approaches to this particular problem, but this one works quite fine.
So... if anyone is stuck with a problem like this... I hope this helps.