I encountered some curious behavior today and was wondering whether it is expected or standard. We are using Hibernate against MySQL 5. During the course of coding I forgot to close a transaction; I presume others can relate.
When I finally closed the transaction, ran the code and checked the table, I noticed the following: all the times I mistakenly ran my code without closing the transaction, which therefore did not result in actual rows being inserted, nevertheless incremented the auto-increment surrogate primary key, so that I now have a gap (no rows with an id value of 751 to 762).
Is this expected or standard behavior? Might it vary depending on the database? And/or does Hibernate's own transaction abstraction have some possible effect on this?
Yes that's expected.
If you think about it, what else can the database do? If you increment the column and then use that value as a foreign key in other inserts within the same transaction, and someone else commits while you're doing that, they can't use your value. You'll get a gap.
Sequences in databases like Oracle work much the same way. Once a particular value is requested, it does not matter whether it is then committed; it will never be reused. Sequences are also only loosely, not absolutely, ordered.
It's pretty much expected behaviour. Without it, the database would have to wait for each transaction that has inserted a record to complete before assigning an id to the next insert.
Yes, this is expected behaviour. This documentation explains it very well.
Beginning with 5.1.22, MySQL actually has three different lock modes (innodb_autoinc_lock_mode) that control how concurrent transactions obtain auto-increment values. All three will cause gaps for rolled-back transactions: auto-increment values used by a rolled-back transaction are thrown away.
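The gap behaviour can be illustrated without a database at all. Here is a minimal sketch in plain Java (all names are invented for illustration): an allocator that, like InnoDB's auto-increment, hands out the next value immediately and never takes it back when the caller rolls back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: an allocator that, like InnoDB auto-increment, reserves the next
// value unconditionally; a rollback simply discards the reserved value.
class GapDemo {
    static final AtomicLong nextId = new AtomicLong(1);
    static final List<Long> committedIds = new ArrayList<>();

    // Simulated insert: the id is consumed whether or not we "commit".
    static void insert(boolean commit) {
        long id = nextId.getAndIncrement(); // value reserved here, always
        if (commit) {
            committedIds.add(id);           // rollback never returns the id
        }
    }

    public static void main(String[] args) {
        insert(true);   // id 1 committed
        insert(false);  // id 2 consumed, then "rolled back"
        insert(false);  // id 3 consumed, then "rolled back"
        insert(true);   // id 4 committed -> gap between 1 and 4
        System.out.println(committedIds); // [1, 4]
    }
}
```

The alternative, handing out ids only at commit time, would force every insert to wait on every other open transaction, which is exactly what the answers above say the database avoids.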
Database sequences are not meant to guarantee an id sequence without gaps. They are designed to be transaction-independent; only that way can they be non-blocking.
If you want no gaps, you must maintain the counter yourself, transactionally (for example in a stored procedure), but such code will block other transactions, so you must be careful.
You do SELECT CURRVAL FROM SEQUENCE_TABLE WHERE TYPE = :YOUR_SEQ_NAME FOR UPDATE;
then UPDATE SEQUENCE_TABLE SET CURRVAL = :INCREMENTED_CURRVAL WHERE TYPE = :YOUR_SEQ_NAME;
I am peer reviewing some code.
I found lots of DELETE statements without conditions, where developers remove all data from a table and insert fresh data.
public void deleteAll() throws Exception {
    String sql = "DELETE FROM ERP_FND_USER";
    entityManager.createNativeQuery(sql).executeUpdate();
    LOG.log(Level.INFO, "ERP_FND_USER all data deleted");
}
Shall I make it a standard to always use TRUNCATE when deleting all data, since TRUNCATE is more efficient for deleting everything? (Or shall I be suspicious that a condition will be added in the future and we would then need to change the statement?)
I also think rollback is not implemented in this code, i.e. it is not transactional.
Truncating a table means we have no way of recovering the data once it is done.
I believe DELETE is the better option in this case, given that the table size is not expected to be very big.
Even if we are planning to store a very large volume of data in the table, I would still recommend DELETE, since in such cases we do not want to clear tables without any conditions.
Also, if the table is only used for the session of the Java program, I believe we can use a TEMP table instead of the main table; that way you do not have to DELETE explicitly, and it will be purged once the session is over.
TRUNCATE should only be used when you are absolutely sure you want to DELETE the entire table contents and you have no intention of recovering them at all.
There is no strict answer.
There are several differences between DELETE and TRUNCATE commands.
In general, TRUNCATE works faster; the reason is evident: it is unconditional and performs no search on the table.
Another difference is identity: TRUNCATE reseeds the table's identity, DELETE does not.
For example, you have a users table with an identity column ID and a column Name:

ID | Name
 1 | John Doe
 2 | Max Mustermann
 3 | Israel Israeli
Suppose you delete the user with ID=3 via the DELETE command (with a WHERE clause or not, it does not matter). Inserting another user will NEVER create a user with ID=3; most probably the new ID will be 4 (though there are situations where it can be different).
Truncating the table will restart the identity from 1.
If you do not worry about identity and there are no foreign keys that may prevent you from deleting records, I would use TRUNCATE.
Update: Dinesh (below) is right, TRUNCATE is irreversible. This should also be taken into consideration.
You should use TRUNCATE if you need to reset AUTO_INCREMENT fields; a DELETE of all rows will not.
The other difference is performance: TRUNCATE is faster than deleting all rows.
Either TRUNCATE or DELETE removes rows definitively,
contrary to what was mentioned in another answer, except when DELETE is executed inside a TRANSACTION that is rolled back. Once the transaction is committed, no recovery is possible.
We need to generate sequential numbers for our transactions. We encountered sqlcode=-911, sqlstate=40001, sqlerrmc=2 (deadlock) when concurrent users tried to book transactions at the same time. The deadlock occurs because they read and update the same record. How can we design this so that deadlock is prevented?
Create a "seed" table that contains a single data row.
This "seed" table row holds the "Next Sequential" value.
When you wish to insert a new business data row using the "Next Sequential" value, perform the following steps:
1) Open a cursor FOR UPDATE on the "seed" table and fetch the current row. This gives you exclusive control over the seed value.
2) You will employ this fetched value as your "Next Value"; however, before doing so,
3) increment the fetched "Next Value" and commit the update. The commit closes your cursor and releases the seed row with the new "Next Value".
You are now free to employ your "Next Value".
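In code terms, the pattern above is a counter guarded by an exclusive lock: the FOR UPDATE cursor plays the role of the lock, and the commit plays the role of the unlock. A minimal in-memory sketch in Java (class and method names are invented; a real implementation would run the SQL steps above over JDBC):

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the seed-row pattern: the lock stands in for the
// "open cursor FOR UPDATE" step; unlock() stands in for the commit.
class SeedCounter {
    private final ReentrantLock rowLock = new ReentrantLock();
    private long nextValue = 1; // the single "seed" row

    long reserveNext() {
        rowLock.lock();             // 1) lock the seed row (FOR UPDATE)
        try {
            long value = nextValue; // 2) this fetched value is yours to use
            nextValue = value + 1;  // 3) increment ...
            return value;
        } finally {
            rowLock.unlock();       // ... and commit, releasing the row
        }
    }

    public static void main(String[] args) {
        SeedCounter c = new SeedCounter();
        System.out.println(c.reserveNext()); // 1
        System.out.println(c.reserveNext()); // 2 -- strictly sequential, no gaps
    }
}
```

Note the tradeoff the answers keep coming back to: every caller serializes on the lock, which is precisely why the held section must be as short as possible.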
There are a number of ways around this issue, some less performant than others.
Deadlocks can be prevented if all objects are locked in the same hierarchical sequence. [https://en.wikipedia.org/wiki/Deadlock#Prevention]
However, solutions to the Dining Philosophers Problem [https://en.wikipedia.org/wiki/Dining_philosophers_problem] which completely prevent deadlocks are often less performant than simply rolling back the transaction and retrying. You'll need to test your solution.
If you're looking for a data-side solution, an old-fashioned (and potentially under-performant) approach is to force the acquisition of new transaction IDs to be atomic by establishing a rigorous lock sequence.
A quick-ish solution (test this under load before releasing to production!) could be to use TRANSACTION boundaries and have a control row acting as a gatekeeper. Here's a stupid example which demonstrates the basic technique.
It has no error checking, and the code to reclaim ghost IDs is outside the scope of this example:
DECLARE @NewID INTEGER;
BEGIN TRANSACTION;
UPDATE [TABLE] SET [LOCKFLAG] = CURRENT_TIMESTAMP WHERE [ROW_ID] = 0;
SELECT @NewID = MAX([ROW_ID]) + 1 FROM [TABLE];
INSERT INTO [TABLE] ([ROW_ID]) VALUES (@NewID);
UPDATE [TABLE] SET [LOCKFLAG] = NULL WHERE [ROW_ID] = 0;
COMMIT TRANSACTION;
The idea is to make this atomic, single-threaded, serialized operation very, very short in duration -- do only what is needed to safely reserve the ID and get out of the way.
By making the first step an update to row 0, and provided all ID requests comply with this standard, competing users will simply queue up behind that first step.
Once you have your ID reserved, go off and do what you like, and you can use a new transaction to update the row you've created.
You'd need to cover cases where the later steps decide to ROLLBACK, as there would now be a ghost row in the table. You'd want a way to reclaim those; a variety of simple solutions can be used.
I have a multi-threaded client/server system with thousands of clients continuously sending data to the server that is stored in a specific table. This data is only important for a few days, so it's deleted afterwards.
The server is written in J2SE, database is MySQL and my table uses InnoDB engine. It contains some millions of entries (and is indexed properly for the usage).
One scheduled thread is running once a day to delete old entries. This thread could take a large amount of time for deleting, because the number of rows to delete could be very large (some millions of rows).
On my specific system deletion of 2.5 million rows would take about 3 minutes.
The inserting threads (and reading threads) get a timeout error telling me
Lock wait timeout exceeded; try restarting transaction
How can I simply get that state from my Java code? I would prefer handling the situation on my own instead of waiting. But the more important point is, how to prevent that situation?
Could I use
conn.setTransactionIsolation( Connection.TRANSACTION_READ_UNCOMMITTED )
for the reading threads, so they will get their information regardless if it is most currently accurate (which is absolutely OK for this usecase)?
What can I do to my inserting threads to prevent blocking? They purely insert data into the table (primary key is the tuple userid, servertimemillis).
Should I change my deletion thread? It is purely deleting data for the tuple userid, greater than specialtimestamp.
Edit:
When reading the MySQL documentation, I wonder if I cannot simply define the connection for inserting and deleting rows with
conn.setTransactionIsolation( Connection.TRANSACTION_READ_COMMITTED )
and achieve what I need. The documentation says that UPDATE and DELETE statements that use a unique index with a unique search condition lock only the matching index entry, not the gap before it, so rows can still be inserted into that gap. It would be great to get your experience on this, since I can't simply try it in production, and it is a big effort to simulate it in the test environment.
In your deletion thread, try first loading the IDs of the records to be deleted and then deleting them one at a time, committing after each delete.
If the thread that does the huge delete runs once a day and takes 3 minutes, you can split it into smaller transactions that each delete a small number of records and still get it done fast enough.
A better solution:
First of all. Any solution you try must be tested prior to deployment in production. Especially a solution suggested by some random person on some random web site.
Now, here's the solution I suggest (making some assumptions regarding your table structure and indices, since you didn't specify them):
Alter your table. It is not recommended to have a multi-column primary key in InnoDB, especially in large tables, since the primary key is automatically included in all secondary indexes. See the answer to this question for more reasons. You should add a unique RecordID column as the primary key (I'd recommend a long identifier, i.e. BIGINT in MySQL).
Select the rows for deletion - execute "SELECT RecordID FROM YourTable where ServerTimeMillis < ?".
Commit (to release the lock on the ServerTimeMillis index, which I assume you have, quickly)
For each RecordID, execute "DELETE FROM YourTable WHERE RecordID = ?"
Commit after each record, or after every X records (I'm not sure whether that would make much difference). Perhaps even a single commit at the end of all the DELETE statements will suffice, since with the suggested new logic only the deleted rows should be locked.
As for changing the isolation level. I don't think you have to do it. I can't suggest whether you can do it or not, since I don't know the logic of your server, and how it will be affected by such a change.
You can try to replace your one huge DELETE with multiple shorter DELETE ... LIMIT n with n being determined after testing (not too small to cause many queries and not too large to cause long locks). Since the locks would last for a few ms (or seconds, depending on your n) you could let the delete thread run continuously (provided it can keep-up; again n can be adjusted so it can keep-up).
Also, table partitioning can help.
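The chunked-delete loop suggested above is simple: repeat the limited delete while it keeps removing a full chunk. Here is a sketch with an in-memory list standing in for the table (names and the chunk size are placeholders; a real version would issue DELETE ... LIMIT n over JDBC and commit per pass):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch of chunked deletion: each pass removes at most n "old" rows,
// mirroring a per-pass DELETE ... WHERE ts < cutoff LIMIT n plus commit.
class ChunkedDelete {
    // One pass; the return value is analogous to executeUpdate()'s row count.
    static int deleteChunk(List<Long> table, long cutoff, int n) {
        int removed = 0;
        Iterator<Long> it = table.iterator();
        while (it.hasNext() && removed < n) {
            if (it.next() < cutoff) { it.remove(); removed++; }
        }
        return removed;
    }

    // Keep going while a full chunk was deleted: more rows may remain.
    static int purge(List<Long> table, long cutoff, int n) {
        int passes = 0;
        int removed;
        do {
            removed = deleteChunk(table, cutoff, n);
            passes++;
        } while (removed == n);
        return passes;
    }

    public static void main(String[] args) {
        List<Long> table = new ArrayList<>();
        for (long t = 1; t <= 10; t++) table.add(t);
        int passes = purge(table, 8L, 3); // rows 1..7 are "old"
        System.out.println(passes + " passes, " + table.size() + " rows left");
    }
}
```

Each short pass holds its locks only briefly, which is the whole point: inserters get a chance to run between passes instead of waiting out one multi-minute transaction.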
I am curious to know the following, on two levels.
Is it considered bad practice, or is it worse for performance, to continually try to insert duplicate data and let the DBMS enforce the entity's constraints to reject those inserts? Or is it better to do some sort of SELECT COUNT(1) and only insert if the count is not 1?
Assuming, from the first item, that it is more efficient from the DBMS perspective to enforce the entity's constraints and not make multiple calls: will the application code (Java, .NET, etc.) suffer a greater performance impact from unnecessarily entering an exception block, even though the exception is not handled?
In terms of performance, you are better off using the in-database feature to enforce constraints.
When you attempt to enforce the constraint outside the database you have two issues. The first is that you have overhead of running a separate query, returning the results, and performing logic -- several database operations. Using the constraint, on the other hand, might do the same work, but it does it all inside the database without the extra overhead of passing things back and forth.
Second, when you attempt to enforce the constraint yourself, you introduce race conditions. This means that you might run the count() and it returns 0. Another transaction, meanwhile, inserts the value and then your insert fails anyway. You really want to avoid such race conditions. One solution, of course, is to put all the logic in a single transaction. This introduces its own overhead.
If you have to do a select and then insert each time that will be considerably slower as it requires two round trips. The cost to check an exception is nothing compared to the time required to do a database statement.
Some databases offer "upsert" statements that perform an update, or an insert if the row doesn't exist, in a single call.
Really if you are doing this a lot you need to step back and think about the overall algorithm and architecture. Why are you constantly trying to insert values that already exist and is there something you can change so that it isn't happening at all - rather than handling the failure at the database point.
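The race in check-then-insert, and the atomic alternative, can be seen even in plain Java, with a map standing in for a table with a unique key (all names invented for illustration; putIfAbsent plays the role of an INSERT rejected by the unique constraint):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch: a map as a stand-in for a table with a unique key constraint.
class DupDemo {
    static final ConcurrentMap<String, String> table = new ConcurrentHashMap<>();

    // Racy: another thread can insert between the check and the put.
    static boolean checkThenInsert(String key, String value) {
        if (table.containsKey(key)) return false; // the "SELECT COUNT(1)" step
        table.put(key, value);                    // race window is right here
        return true;
    }

    // Atomic: one operation, like letting the constraint reject the row.
    static boolean insertOrFail(String key, String value) {
        return table.putIfAbsent(key, value) == null;
    }

    public static void main(String[] args) {
        System.out.println(insertOrFail("42", "first"));  // true
        System.out.println(insertOrFail("42", "second")); // false: duplicate
    }
}
```

The database's unique constraint gives you the second shape for free: one round trip, no race, with the "failure" reported as an exception instead of a boolean.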
You just increased the round-trip time: one trip to check and another to insert if there is no duplicate.
That may not be good. What about catching the exception and handling it instead? There is a large time difference between check-then-insert and simply catching an exception.
Edit:
One issue that can arise with check-then-insert is synchronization. Some other thread may update/insert the data after your thread has checked that there is no duplicate, and the result will be an exception at the end anyway. So you must handle that case; you cannot leave it out.
Personally, I would probably write a cursor to check whether the record exists. To me it is a cleaner approach than relying on an exception to control program flow. Yes, there is a performance implication, but it is likely to be marginal. Clean, readable code takes priority for me.
I have to create a MySQL InnoDB table that assigns a strictly sequential ID to each element (row). There cannot be any gap in the IDs: each element has to have a different ID, and they HAVE TO be sequentially assigned. Concurrent users create data in this table.
I have experienced MySQL's "auto-increment" behaviour, where if a transaction fails the PK number is not used, leaving a gap. I have read complicated solutions online that did not convince me, and some others that don't really address my problem (Emulate auto-increment in MySQL/InnoDB, Setting manual increment value on synchronized mysql servers).
I want to maximise write concurrency. I can't afford having users writing to the table and waiting for long periods.
I might need to shard the table... but still keeping the ID count.
The sequence of the elements in the table is NOT important, but the IDs have to be sequential (ie, if an element is created before another does not need to have a lower ID, but gaps between IDs are not allowed).
The only solution I can think of is to use an additional COUNTER table to keep the count. Then create the element in the table with an empty "ID" (not the PK), lock the COUNTER table, get the number, write it into the element, increase the number, and unlock the table. I think this will work fine, but it has an obvious bottleneck: while the table is locked, nobody else can obtain an ID.
Also, it is a single point of failure if the node holding the table is unavailable. I could create "master-master" replication, but I am not sure whether I would then risk using an out-of-date ID counter (I have never used replication).
Thanks.
I am sorry to say this, but allowing high concurrency to achieve high performance and, at the same time, asking for a strictly monotone sequence are conflicting requirements.
Either you have a single point of control/failure that issues the IDs and makes sure there are neither duplicates nor skips, or you will have to accept the chance of one or both of these situations.
As you have stated, there are attempts to circumvent this kind of problem, but in the end you will always find that you need to make a tradeoff between speed and correctness, because as soon as you allow concurrency you can run into split-brain situations or race conditions.
Maybe a strictly monotone sequence would be ok for each of possibly many servers/databases/tables?