I have to create a MySQL InnoDB table that assigns a strictly sequential ID to each element (row) in the table. There cannot be any gap in the IDs - each element has to have a different ID and they HAVE TO be sequentially assigned. Concurrent users create data on this table.
I have experienced MySQL's "auto-increment" behaviour where, if a transaction fails, the PK number is not used, leaving a gap. I have read complicated solutions online that did not convince me, and some others that don't really address my problem (Emulate auto-increment in MySQL/InnoDB, Setting manual increment value on synchronized mysql servers).
I want to maximise write concurrency. I can't afford having users writing to the table and waiting for long periods.
I might need to shard the table... while still keeping the ID count going.
The sequence of the elements in the table is NOT important, but the IDs have to be sequential (i.e., an element created before another does not need to have a lower ID, but gaps between IDs are not allowed).
The only solution I can think of is to use an additional COUNTER table to keep the count. Then I create the element in the table with an empty "ID" (not the PK), lock the COUNTER table, get the number, write it on the element, increase the number, and unlock the table. I think this will work fine but has an obvious bottleneck: while the lock is held, nobody else can obtain an ID.
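Roughly what I have in mind, as a JDBC sketch (the counter table and column names are placeholders, and I use a row lock via SELECT ... FOR UPDATE instead of locking the whole table):

import java.sql.*;

public class CounterIdAllocator {
    // Placeholder schema: counter(next_id BIGINT) containing exactly one row.
    static long nextId(Connection conn) throws SQLException {
        conn.setAutoCommit(false);
        try (Statement st = conn.createStatement()) {
            // Row lock instead of a table lock: only ID allocation serializes here.
            ResultSet rs = st.executeQuery("SELECT next_id FROM counter FOR UPDATE");
            rs.next();
            long id = rs.getLong(1);
            st.executeUpdate("UPDATE counter SET next_id = next_id + 1");
            conn.commit(); // releases the row lock; this is the serialization point
            return id;
        }
    }
}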
It is also a single point of failure if the node holding the table is not available. I could create "master-master" replication, but I am not sure whether that way I take the risk of using an out-of-date ID counter (I have never used replication).
Thanks.
I am sorry to say this, but allowing high concurrency to achieve high performance and at the same time asking for a strictly monotone sequence are conflicting requirements.
Either you have a single point of control/failure that issues the IDs and makes sure there are neither duplicates nor is one skipped, or you will have to accept the chance of one or both of these situations.
As you have stated, there are attempts to circumvent this kind of problem, but in the end you will always find that you need to make a tradeoff between speed and correctness, because as soon as you allow concurrency you can run into split-brain situations or race conditions.
Maybe a strictly monotone sequence would be ok for each of possibly many servers/databases/tables?
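For instance, one counter row per server would keep each server's sequence gap-free and monotone without global contention - a minimal sketch, assuming a hypothetical counters table keyed by server_id:

import java.sql.*;

public class PerServerSequence {
    // Hypothetical schema: counters(server_id VARCHAR, next_id BIGINT), one row per server.
    // Servers never contend with each other, and each server's IDs stay gap-free.
    static long nextId(Connection conn, String serverId) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement sel = conn.prepareStatement(
                "SELECT next_id FROM counters WHERE server_id = ? FOR UPDATE");
             PreparedStatement upd = conn.prepareStatement(
                "UPDATE counters SET next_id = next_id + 1 WHERE server_id = ?")) {
            sel.setString(1, serverId);
            ResultSet rs = sel.executeQuery();
            rs.next();
            long id = rs.getLong(1);
            upd.setString(1, serverId);
            upd.executeUpdate();
            conn.commit();
            return id;
        }
    }
}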
Related
I have a Java EE 7 project. I use Hibernate as the ORM and my database is Oracle.
I use @SequenceGenerator with allocationSize = 1 for the id of my entity, and @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "seq"). My database sequence in Oracle has cache=1000.
But when I persist two records in the database, the first record's id can be greater than the second record's (even when the second is persisted a day later), and the ids are not consecutive.
What is causing this, and how can I resolve it?
As you are using 11g (a very old version - your company should think about upgrading ASAP), the choice for RAC comes down to a balance between performance and ordering/gap integrity.
You have two options, noorder vs order:
create sequence xxx start with 1 increment by 1 noorder|order cache xxx
How do the instances co-ordinate their use of sequence values and avoid the risk of two instances using the same value?
There are two solutions: the default noorder mechanism, where each instance behaves as if it doesn't know about the other instances, and the order option, where the instances continuously negotiate through global enqueues to determine which instance should be responsible for the sequence at any moment.
Noorder
The upshot of this noorder mechanism is that each instance will be working its way through a different range of numbers, and there will be no overlaps between instances. If you had sessions that logged on to the database once per second to issue a call to nextval (and they ended up connecting through a different instance each time), then the values returned would appear to be fairly randomly scattered over a range dictated by “number of instances x cache size.” Uniqueness would be guaranteed, but ordering would not.
Order
If you declare a sequence with the order option, Oracle adopts a strategy of using a single “cache” for the values and introduces a mechanism for making sure that only one instance at a time can access and modify that cache. Oracle does this by taking advantage of its Global Enqueue services. Whenever a session issues a call to nextval, the instance acquires an exclusive SV lock (global enqueue) on the sequence cache, effectively saying, “who’s got the most up to date information about this sequence – I want control”. The one instance holding the SV lock in exclusive mode is then the only instance that can increment the cached value and, if necessary, update the seq$ table by incrementing the highwater. This means that the sequence numbers will, once again, be generated in order. But this option has a penalty in performance and should be considered carefully.
Summary
If your transactions are fast, you can use order and test how it behaves. If your transactions are not fast, I would avoid order altogether. The best option is to upgrade to 19c (12c is already nearing obsolescence) and use IDENTITY columns.
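For reference, a minimal sketch of what an identity column looks like on 12c and later (the table name and columns here are made up):

import java.sql.*;

public class IdentityColumnExample {
    // Oracle 12c+ identity column: the database populates "id" automatically,
    // with no separate sequence object to manage in application code.
    static void createTable(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute("CREATE TABLE orders ("
                     + " id NUMBER GENERATED ALWAYS AS IDENTITY,"
                     + " payload VARCHAR2(100),"
                     + " CONSTRAINT orders_pk PRIMARY KEY (id))");
        }
    }
}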
If you have unordered (separate) caches on each node (the default):
node 1: cache values (1 - 1000)
node 2: cache values (1001 - 2000)
then the caches cannot overlap, and the value used will depend on which node performs the insert. That is why your sequence values currently appear to be out of order.
Using the NOCACHE and/or ORDER options will result in sequential numbers, but you can expect at least some performance impact on your application, as the database must perform more overhead to determine the current sequence value before making it available to your SQL command. Reducing the cache size or eliminating the cache entirely can have a severe negative impact on performance if you are executing a lot of inserts (as suggested by your current cache value of 1000).
Assuming for now that you continue to use a cache (whether ordered or not), be aware that every time you restart your database, or a node (depending on your exact configuration), the unused cached values will be flushed and lost and a new cache will be created.
In the end, it is important to realize that sequence values are not intended (for most applications) to be perfectly sequential without gaps, or even (as in your case) ordered. They are only intended to be unique. Be sure to understand your requirement, and don't be put off if sequences don't behave quite like you expected. Ask yourself: must the values be sequential in the order inserted, and will gaps in the sequence affect your application? If the answer is no and the application won't care, then stick with what you've got for the sake of performance.
As we know, the Hibernate annotation below generates a new number each time from the sequence, starting from 1. Consider a situation wherein I have a set of records with ids (1-5). Now the record with id 3 is deleted from the table, so number 3 is missing from the sequence 1-5. I have a requirement for the sequence to regenerate and reassign that number 3 when I add a new record to the table. How can I do this?
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private int id;
I don't think this is a great idea. A sequence is just a number incremented by 1 each time. This keeps it fast, but even so it is a bottleneck for writes in a distributed database, as all the nodes need to synchronize on that number.
If you try to get the first available integer, you basically need to do a full table scan, order the records by id, and check for the first missing one. That's extremely costly and inefficient for something that should be as cheap as possible.
You should view the id as a technical ID without functional meaning and thus do not care if there are holes in the sequence or not.
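To make the cost concrete, "find the first free id" boils down to something like the following self-join (a sketch with a hypothetical items table); the database has to walk a large part of the id index, and even then the result is stale the moment a concurrent insert happens:

import java.sql.*;

public class FirstFreeId {
    // Smallest id whose successor is missing. Does not handle an empty table
    // or a missing id 1, and is not safe under concurrent inserts without
    // extra locking - which is exactly why reusing gaps is a bad idea.
    static long firstFreeId(Connection conn) throws SQLException {
        String sql = "SELECT MIN(t1.id) + 1 FROM items t1"
                   + " LEFT JOIN items t2 ON t2.id = t1.id + 1"
                   + " WHERE t2.id IS NULL";
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            rs.next();
            return rs.getLong(1);
        }
    }
}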
Edit:
I would also add that the implications go deeper, even in terms of business.
If I get an ID for an article I sell as a merchant, and I model its deletion by removing the record - or, even better, by putting a "deleted" status on it, potentially with a date and a reason for deletion - I have much easier bookkeeping. Actually, I would prefer the latter design: keep the record and have a status that is dynamic, potentially with history. The item could be unavailable for a year and be used again if I sell it again.
If, on the contrary, I silently reuse the ID, then my system may display an old bill with the data of the new article. Instead of ski boots that I don't sell anymore, it may become a PS5 or 1 kg of rice. This is error prone.
This may not apply to all business cases, of course, but it's better to consider this kind of usage before going with a design that deletes data.
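A minimal sketch of the status-based design described above, as a JPA entity (the entity and field names are purely illustrative):

import javax.persistence.*;
import java.util.Date;

@Entity
public class Article {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private long id;

    private String name;

    // Instead of deleting the row, flip the status and keep the history:
    // old bills keep pointing at the real article data.
    @Enumerated(EnumType.STRING)
    private Status status = Status.ACTIVE;

    @Temporal(TemporalType.TIMESTAMP)
    private Date deletedAt;

    private String deletionReason;

    public enum Status { ACTIVE, DELETED }

    public void markDeleted(String reason) {
        this.status = Status.DELETED;
        this.deletedAt = new Date();
        this.deletionReason = reason;
    }
}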
I agree with Nicolas, but just to clarify:
You are using an "identity" and not a "sequence". There are some differences between them, in how they are declared and used (each database can have its own proprietary implementation).
A sequence is an independent object in your database with some properties (like start, end, increment, ...), while an identity is a "property" of the column that depends on how the database handles it.
In the case of a sequence (and, depending on the database, some identities) you can create "cyclic" sequences that repeat the numbers after the cycle ends. But a sequence or identity never scans for "gaps" in the ids (as Nicolas said, that is really bad for performance).
Depending on how your code works, you could create a cycle in a sequence to prevent having an ever-increasing value - but only if you are sure there will be no conflicts when inserting new records.
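For example, a cyclic Oracle sequence that wraps around instead of growing forever - a sketch with made-up names, safe only if rows using the old values are guaranteed to be gone before the cycle repeats:

import java.sql.*;

public class CyclicSequenceSetup {
    // The sequence restarts at 1 after reaching 999999 instead of growing forever.
    static void create(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute("CREATE SEQUENCE ticket_seq"
                     + " START WITH 1 INCREMENT BY 1"
                     + " MAXVALUE 999999 CYCLE CACHE 20");
        }
    }
}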
We need to generate sequential numbers for our transactions. We encountered sqlcode=-911, sqlstate=40001, sqlerrmc=2 (deadlock) when concurrent users try to book transactions at the same time. Currently the deadlock occurs because concurrent transactions are reading and updating the same record. How can we design this so that deadlocks are prevented?
Create a "seed" table that contains a single data row.
This "seed" table row holds the "Next Sequential" value.
When you wish to insert a new business data row using the "Next Sequential" value, perform the following steps.
1). Open a cursor for UPDATE on the "seed" table and fetch the current row. This gives you exclusive control over the seed value.
2). You will employ this fetched value as the "Next Value"... however, before doing so:
3). Increment the fetched "Next Value" and commit the update. This commit closes your cursor and releases the seed row with the new "Next Value".
You are now free to employ your "Next Value".
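A JDBC sketch of these steps, assuming a hypothetical SEED table holding a single NEXT_VAL row:

import java.sql.*;

public class SeedTableAllocator {
    static long nextSequential(Connection conn) throws SQLException {
        conn.setAutoCommit(false);
        try (Statement st = conn.createStatement()) {
            // 1) Fetch the seed row with update intent: exclusive control.
            ResultSet rs = st.executeQuery("SELECT NEXT_VAL FROM SEED FOR UPDATE");
            rs.next();
            long next = rs.getLong(1);
            // 3) Increment and commit immediately to release the seed row.
            st.executeUpdate("UPDATE SEED SET NEXT_VAL = NEXT_VAL + 1");
            conn.commit();
            return next; // 2) you are now free to employ the value
        }
    }
}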
There are a number of ways around this issue, some less performant than others.
Deadlocks can be prevented if all objects are locked in the same hierarchical sequence. [https://en.wikipedia.org/wiki/Deadlock#Prevention]
However, solutions to the Dining Philosophers Problem [https://en.wikipedia.org/wiki/Dining_philosophers_problem] which completely prevent deadlocks are often less performant than simply rolling back the transaction and retrying. You'll need to test your solution.
If you're looking for a data-side solution, an old-fashioned (and potentially under-performant) approach is to force the acquisition of new transaction IDs to be atomic by establishing a rigorous lock sequence.
A quick-ish solution (test this under load before releasing to production!) could be to use TRANSACTION boundaries and have a control row acting as a gatekeeper. Here's a stupid example which demonstrates the basic technique.
It has no error checking, and the code to reclaim ghost IDs is outside the scope of this example:
DECLARE @NewID INTEGER;
BEGIN TRANSACTION;
UPDATE [TABLE] SET [LOCKFLAG] = CURRENT_TIMESTAMP WHERE [ROW_ID] = 0;
SELECT @NewID = MAX([ROW_ID])+1 FROM [TABLE];
INSERT INTO [TABLE] ([ROW_ID]) VALUES (@NewID);
UPDATE [TABLE] SET [LOCKFLAG] = NULL WHERE [ROW_ID] = 0;
COMMIT TRANSACTION;
The idea is to make this atomic, single-threaded, serialized operation very, very short in duration -- do only what is needed to safely reserve the ID and get out of the way.
By making the first step to update row 0, if all ID requests comply with this standard, then competing users will simply queue up behind that first step.
Once you have your ID reserved, go off and do what you like, and you can use a new transaction to update the row you've created.
You'd need to cover cases where the later steps decide to ROLLBACK, as there would now be a ghost row in the table. You'd want a way to reclaim those; a variety of simple solutions can be used.
We are developing an application in which entity ids for tables must be in incremental order, starting from 1 and so on, for each namespace.
We came across the allocateIdRange and allocateIds methods in the DatastoreService interface, but these ids must be assigned manually and will not be assigned by DatastoreService itself. Assigning ids manually may lead to synchronization problems with multiple instances.
Can anyone provide me suggestions to overcome this problem?
We are using objectify 3.0 for DatastoreService operations.
I agree with Tim Hoffman and tx802 when they say you should reconsider your design regarding sequential ids. However, a while ago I had to implement something very similar, because the customer forced us to use sequential and uninterrupted numbers for order numbers (for unclear reasons). We complied with the customer's wishes by using sharding counters (the link contains a full code sample) for the order numbers. Sharding counters work like this:
You create a couple of entities of the same kind in your datastore which are just counter values
The actual value is calculated by querying all entities of that kind and summarizing their values
When you wish to increase the value, one of the entities is randomly chosen and incremented
The current counter value may be cached in memcache for improved performance
Why does this work:
As you may know, there is a restriction/limitation of 1 transaction per second per entity group in the datastore. Therefore you shard the counter into multiple entities and avoid this limitation. The more traffic you expect, the more shards you are going to need. Luckily, you can increase the number of shards at any time.
We also know that writes are slow in comparison to reads. Therefore summing up all the shards (reads) is a fast operation, while increasing a single shard value (a write) is slow - which doesn't bother us when using sharding counters, because we have sufficient time.
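To illustrate the mechanics outside the datastore, here is a plain-Java, in-memory analogue; in the real implementation each slot below would be a separate datastore entity, and the increment would run inside a transaction:

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

public class ShardedCounter {
    private final AtomicLong[] shards;

    ShardedCounter(int shardCount) {
        shards = new AtomicLong[shardCount];
        for (int i = 0; i < shardCount; i++) shards[i] = new AtomicLong();
    }

    // Writers pick a random shard, so no single slot becomes a hotspot
    // (in the datastore: no single entity group takes all the writes).
    void increment() {
        shards[ThreadLocalRandom.current().nextInt(shards.length)].incrementAndGet();
    }

    // Readers sum all shards; reads are cheap compared to writes.
    long get() {
        long sum = 0;
        for (AtomicLong s : shards) sum += s.get();
        return sum;
    }
}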
Summarized:
You can use sharding counters for sequential ids. If you can avoid the whole sequential id dilemma it would be a better solution though.
I have a multi-threaded client/server system with thousands of clients continuously sending data to the server that is stored in a specific table. This data is only important for a few days, so it's deleted afterwards.
The server is written in J2SE, database is MySQL and my table uses InnoDB engine. It contains some millions of entries (and is indexed properly for the usage).
One scheduled thread is running once a day to delete old entries. This thread could take a large amount of time for deleting, because the number of rows to delete could be very large (some millions of rows).
On my specific system deletion of 2.5 million rows would take about 3 minutes.
The inserting threads (and reading threads) get a timeout error telling me
Lock wait timeout exceeded; try restarting transaction
How can I simply get that state from my Java code? I would prefer handling the situation on my own instead of waiting. But the more important point is, how to prevent that situation?
Could I use
conn.setTransactionIsolation( Connection.TRANSACTION_READ_UNCOMMITTED )
for the reading threads, so they will get their information regardless if it is most currently accurate (which is absolutely OK for this usecase)?
What can I do to my inserting threads to prevent blocking? They purely insert data into the table (primary key is the tuple userid, servertimemillis).
Should I change my deletion thread? It purely deletes data for a given userid with servertimemillis beyond specialtimestamp.
Edit:
When reading the MySQL documentation, I wonder if I cannot simply define the connection for inserting and deleting rows with
conn.setTransactionIsolation( Connection.TRANSACTION_READ_COMMITTED )
and achieve what I need. The documentation says that UPDATE and DELETE statements that use a unique index with a unique search condition lock only the matching index record, not the gap before it, so rows can still be inserted into that gap. It would be great to get your experience on that, since I can't simply try it in production - and it is a big effort to simulate it in the test environment.
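i.e., something along these lines when setting up the connections (just a sketch of what I mean):

import java.sql.Connection;
import java.sql.SQLException;

public class IsolationSetup {
    // Writers and the deleter at READ COMMITTED (no gap locks on unique lookups);
    // readers at READ UNCOMMITTED, since slightly stale data is acceptable here.
    static void configure(Connection writer, Connection reader) throws SQLException {
        writer.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
        reader.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);
    }
}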
Try, in your deletion thread, to first load the IDs of the records to be deleted, and then delete them one at a time, committing after each delete.
If you run the thread that does the huge delete once a day and it takes 3 minutes, you can split it to smaller transactions that delete a small number of records, and still manage to get it done fast enough.
A better solution:
First of all. Any solution you try must be tested prior to deployment in production. Especially a solution suggested by some random person on some random web site.
Now, here's the solution I suggest (making some assumptions regarding your table structure and indices, since you didn't specify them):
Alter your table. It's not recommended to have a primary key of multiple columns in InnoDB, especially in large tables (since the primary key is included automatically in any other indices). See the answer to this question for more reasons. You should add some unique RecordID column as primary key (I'd recommend a long identifier, or BIGINT in MySQL).
Select the rows for deletion - execute "SELECT RecordID FROM YourTable where ServerTimeMillis < ?".
Commit (to release the lock on the ServerTimeMillis index, which I assume you have, quickly)
For each RecordID, execute "DELETE FROM YourTable WHERE RecordID = ?"
Commit after each record or after each X records (I'm not sure whether that would make much difference). Perhaps even one Commit at the end of the DELETE commands will suffice, since with my suggested new logic, only the deleted rows should be locked.
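A minimal JDBC sketch of steps 2-5, using the YourTable, RecordID, and ServerTimeMillis names assumed above:

import java.sql.*;
import java.util.ArrayList;
import java.util.List;

public class BatchedDelete {
    static void deleteOld(Connection conn, long cutoffMillis) throws SQLException {
        conn.setAutoCommit(false);
        // Step 2: collect the ids, then commit to release any locks quickly.
        List<Long> ids = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT RecordID FROM YourTable WHERE ServerTimeMillis < ?")) {
            ps.setLong(1, cutoffMillis);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) ids.add(rs.getLong(1));
            }
        }
        conn.commit();
        // Steps 4-5: delete by primary key, committing every batch so row locks
        // are held only briefly and inserting threads are never blocked for long.
        try (PreparedStatement del = conn.prepareStatement(
                "DELETE FROM YourTable WHERE RecordID = ?")) {
            int n = 0;
            for (long id : ids) {
                del.setLong(1, id);
                del.executeUpdate();
                if (++n % 1000 == 0) conn.commit();
            }
        }
        conn.commit();
    }
}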
As for changing the isolation level. I don't think you have to do it. I can't suggest whether you can do it or not, since I don't know the logic of your server, and how it will be affected by such a change.
You can try to replace your one huge DELETE with multiple shorter DELETE ... LIMIT n statements, with n determined after testing (not so small that it causes many queries, and not so large that it causes long locks). Since the locks would last for a few ms (or seconds, depending on your n), you could let the delete thread run continuously (provided it can keep up; again, n can be adjusted so it can keep up).
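For example (a sketch; it assumes autocommit, so each chunk commits and releases its locks independently, and reuses the hypothetical YourTable/ServerTimeMillis names from the other answer):

import java.sql.*;

public class ChunkedDelete {
    // Deletes in chunks of n rows; each executeUpdate is its own short
    // transaction under autocommit, so locks are held only briefly.
    static void deleteInChunks(Connection conn, long cutoffMillis, int n)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "DELETE FROM YourTable WHERE ServerTimeMillis < ? LIMIT " + n)) {
            ps.setLong(1, cutoffMillis);
            while (ps.executeUpdate() > 0) {
                // loop until a chunk deletes zero rows
            }
        }
    }
}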
Also, table partitioning can help.