I'm developing a REST API using Java and Spring Boot to manage purchases and customers. In my MySQL database, I have a table Purchase with a column that stores the unique ticketId. It is not the primary key.
When a new purchase is added (via a PUT request), I create a new purchase from the data provided in the request, obtain the max ticketId, increment it by one, and store it in the database. The primary key is auto-incremented.
This is my code:
@Transactional
public boolean saveNewPurchase(PurchaseDTO data) {
    Purchase p = createPurchaseFromData(data);
    // look up the current max ticketId and assign max + 1
    Long idTicket = purchaseDao.getMaxIdTicket();
    p.setIdTicket(idTicket + 1);
    save(p);
    return true;
}
Are there concurrency issues here? Say two PUT requests execute this method in parallel: could they retrieve the same max idTicket and thus violate the unique idTicket constraint when saving the second purchase?
If so, how could I solve it? Would making the method synchronized solve the problem?
Thanks.
Yes, there is a concurrency issue here. Two different threads could get the same maxIdTicket and then save two purchases with the same ticketId.
I see three solutions here:
1. Use synchronized.
2. Use an AtomicInteger to keep the counter in memory.
3. Use a dedicated table in MySQL with a single AUTO_INCREMENT column, and insert/read a row each time you need a counter value. With other RDBMSs you could use a sequence, but I am pretty sure there are no sequences in MySQL.
The first two solutions do not work in a distributed environment, so I would go with the third.
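A minimal sketch of the counter-table idea (option 3), assuming a table such as ticket_counter with a single AUTO_INCREMENT primary-key column; the table and class names here are illustrative, not from the original post:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class TicketIdGenerator {

    // Inserts a row into the counter table and returns the generated AUTO_INCREMENT value.
    // MySQL serializes AUTO_INCREMENT assignment, so each call yields a distinct value
    // even when several application instances call it concurrently.
    public long nextTicketId(Connection conn) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO ticket_counter () VALUES ()", Statement.RETURN_GENERATED_KEYS)) {
            ps.executeUpdate();
            try (ResultSet keys = ps.getGeneratedKeys()) {
                keys.next();
                return keys.getLong(1);
            }
        }
    }
}

Old rows in the counter table carry no information, so they can be purged periodically.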
Yes, it is certainly prone to concurrency issues. One suggestion is to keep the counter in memory with an AtomicInteger, so that you don't fetch the current maxId from the DB on every request, which is what causes the race condition. When the application starts up, query the DB and store the maxId in memory. This is robust even in case of a crash, since the information can always be read back from the DB.
This won't work in a distributed environment; in that case, having another table dedicated to storing the counter is the better approach, as suggested by JamieB in the comments.
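A minimal sketch of that in-memory counter, assuming the question's DAO (sketched here as a PurchaseDao interface) and a single application instance; the class name and startup wiring are illustrative:

import java.util.concurrent.atomic.AtomicLong;

interface PurchaseDao { Long getMaxIdTicket(); } // the DAO from the question, sketched as an interface

public class TicketIdCounter {

    private final AtomicLong counter;

    // Seed the counter once at startup with the current maximum from the database.
    public TicketIdCounter(PurchaseDao purchaseDao) {
        Long maxId = purchaseDao.getMaxIdTicket();
        this.counter = new AtomicLong(maxId != null ? maxId : 0L);
    }

    // Each call returns a new value, but it is only unique within this one JVM.
    public long nextTicketId() {
        return counter.incrementAndGet();
    }
}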
Related
I am using Cassandra as the DB. I want to assign a serial number to every record, in sequential form, such that every record is unique.
Even if the application crashes, any record inserted after the restart should get the latest serial number.
I have looked for a solution but haven't found one for Cassandra.
The solution I thought of is to get the count(*) of the table and then insert the record with that value incremented by 1. But getting the count does not seem like a good approach, as over time the number of records will grow far too large.
Trying to create a sequential key like this in Cassandra isn't a good idea as Cassandra is a highly available distributed database that generally sacrifices consistency for availability. 'Read before write' (getting a count(*) and then inserting a record) is considered an anti-pattern in Cassandra due to consistency issues. It's not safe to modify data based on a read, as that data could have been changed by another process during the read.
A viable solution to this problem would be to use a TimeUUID. If generated correctly, the IDs will all be unique and as a bonus can also be ordered by time. Check https://cwiki.apache.org/confluence/display/CASSANDRA2/TimeBaseUUIDNotes for more info. There are also plenty of answers on how to create a TimeUUID out there.
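A minimal sketch of generating a TimeUUID on the client, assuming the DataStax Java driver 3.x is on the classpath (its UUIDs utility class provides time-based UUIDs); the class name is illustrative:

import java.util.UUID;
import com.datastax.driver.core.utils.UUIDs;

public class RecordIdGenerator {

    // Generates a version 1 (time-based) UUID: unique across nodes without any
    // read-before-write, and orderable by its embedded timestamp.
    public UUID newRecordId() {
        return UUIDs.timeBased();
    }
}

Alternatively, Cassandra can generate the value server-side with the CQL now() function into a column of type timeuuid.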
I have a multi-threaded client/server system with thousands of clients continuously sending data to the server that is stored in a specific table. This data is only important for a few days, so it's deleted afterwards.
The server is written in J2SE, database is MySQL and my table uses InnoDB engine. It contains some millions of entries (and is indexed properly for the usage).
One scheduled thread is running once a day to delete old entries. This thread could take a large amount of time for deleting, because the number of rows to delete could be very large (some millions of rows).
On my specific system deletion of 2.5 million rows would take about 3 minutes.
The inserting threads (and reading threads) get a timeout error telling me
Lock wait timeout exceeded; try restarting transaction
How can I simply detect that state from my Java code? I would prefer handling the situation myself instead of just waiting. But the more important point is: how can I prevent that situation?
Could I use
conn.setTransactionIsolation( Connection.TRANSACTION_READ_UNCOMMITTED )
for the reading threads, so they will get their information regardless of whether it is completely up to date (which is absolutely OK for this use case)?
What can I do to my inserting threads to prevent blocking? They purely insert data into the table (primary key is the tuple userid, servertimemillis).
Should I change my deletion thread? It is purely deleting data for the tuple userid, greater than specialtimestamp.
Edit:
When reading the MySQL documentation, I wonder if I cannot simply define the connection for inserting and deleting rows with
conn.setTransactionIsolation( Connection.TRANSACTION_READ_COMMITTED )
and achieve what I need. It says that UPDATE and DELETE statements that use a unique index with a unique search condition lock only the matching index record, not the gap before it, so rows can still be inserted into that gap. It would be great to hear about your experience with that, since I can't simply try it in production, and simulating it in a test environment would be a big effort.
In your deletion thread, try first loading the IDs of the records to be deleted and then deleting them one at a time, committing after each delete.
If you run the thread that does the huge delete once a day and it takes 3 minutes, you can split it to smaller transactions that delete a small number of records, and still manage to get it done fast enough.
A better solution:
First of all: any solution you try must be tested prior to deployment in production, especially a solution suggested by some random person on some random web site.
Now, here's the solution I suggest (making some assumptions regarding your table structure and indices, since you didn't specify them):
1. Alter your table. It is not recommended to have a multi-column primary key in InnoDB, especially in large tables (since the primary key is automatically included in every other index). See the answer to this question for more reasons. You should add a unique RecordID column as the primary key (I'd recommend a long identifier, i.e. BIGINT in MySQL).
2. Select the rows for deletion: execute "SELECT RecordID FROM YourTable WHERE ServerTimeMillis < ?".
3. Commit, to quickly release the lock on the ServerTimeMillis index (which I assume you have).
4. For each RecordID, execute "DELETE FROM YourTable WHERE RecordID = ?".
5. Commit after each record or after each X records (I'm not sure whether that would make much difference). Perhaps even a single commit at the end of the DELETE commands will suffice, since with the suggested new logic only the deleted rows should be locked.
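A rough JDBC sketch of steps 2-5, assuming a plain java.sql.Connection; the table and column names follow the examples above, and the batch size is an arbitrary choice to be tuned by testing:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class OldRecordCleaner {

    private static final int COMMIT_EVERY = 1000; // arbitrary; tune after testing

    public void deleteOlderThan(Connection conn, long cutoffMillis) throws SQLException {
        conn.setAutoCommit(false);

        // Steps 2-3: collect the IDs to delete, then commit to release locks quickly.
        List<Long> ids = new ArrayList<>();
        try (PreparedStatement select = conn.prepareStatement(
                "SELECT RecordID FROM YourTable WHERE ServerTimeMillis < ?")) {
            select.setLong(1, cutoffMillis);
            try (ResultSet rs = select.executeQuery()) {
                while (rs.next()) {
                    ids.add(rs.getLong(1));
                }
            }
        }
        conn.commit();

        // Steps 4-5: delete by primary key in small transactions so inserts are never blocked for long.
        try (PreparedStatement delete = conn.prepareStatement(
                "DELETE FROM YourTable WHERE RecordID = ?")) {
            int sinceCommit = 0;
            for (Long id : ids) {
                delete.setLong(1, id);
                delete.executeUpdate();
                if (++sinceCommit >= COMMIT_EVERY) {
                    conn.commit();
                    sinceCommit = 0;
                }
            }
        }
        conn.commit();
    }
}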
As for changing the isolation level: I don't think you have to. I can't say whether you can do it or not, since I don't know the logic of your server or how it would be affected by such a change.
You can try to replace your one huge DELETE with multiple shorter DELETE ... LIMIT n statements, with n determined after testing (not so small as to cause many queries, and not so large as to cause long locks). Since each lock would last only a few ms (or seconds, depending on your n), you could let the delete thread run continuously, provided it can keep up (again, n can be adjusted so that it does).
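A minimal sketch of that chunked delete, reusing the illustrative table and column names from above; the chunk size is only a starting point for testing:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class ChunkedDeleter {

    // Repeatedly deletes at most chunkSize old rows per statement until none are left,
    // so each statement holds its locks only briefly.
    public void deleteInChunks(Connection conn, long cutoffMillis, int chunkSize) throws SQLException {
        conn.setAutoCommit(true); // each chunk commits on its own
        try (PreparedStatement ps = conn.prepareStatement(
                "DELETE FROM YourTable WHERE ServerTimeMillis < ? LIMIT ?")) {
            int deleted;
            do {
                ps.setLong(1, cutoffMillis);
                ps.setInt(2, chunkSize);
                deleted = ps.executeUpdate();
            } while (deleted > 0);
        }
    }
}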
Also, table partitioning can help.
I have to create a MySQL InnoDB table that assigns a strictly sequential ID to each element (row). There cannot be any gap in the IDs: each element has to have a different ID, and they HAVE TO be sequentially assigned. Concurrent users create data in this table.
I have experienced MySQL's auto-increment behaviour, where if a transaction fails the PK number is not reused, leaving a gap. I have read complicated solutions online that did not convince me, and some others that don't really address my problem (Emulate auto-increment in MySQL/InnoDB, Setting manual increment value on synchronized mysql servers).
I want to maximise write concurrency. I can't afford having users write to the table and wait a long time.
I might need to shard the table... but still keep the ID count.
The order of the elements in the table is NOT important, but the IDs have to be sequential (i.e., an element created before another does not need to have a lower ID, but gaps between IDs are not allowed).
The only solution I can think of is to use an additional COUNTER table to keep the count. Then create the element in the table with an empty "ID" (not the PK), lock the COUNTER table, get the number, write it on the element, increase the number, and unlock the table. I think this will work fine, but it has an obvious bottleneck: while the lock is held, nobody else can obtain an ID.
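A rough sketch of that counter-table idea, assuming a table such as id_counter(name VARCHAR PRIMARY KEY, next_id BIGINT) seeded with one row; the names are illustrative, and InnoDB only locks the counter row rather than the whole table:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class GaplessIdAllocator {

    // Must be called inside the caller's transaction (auto-commit off). SELECT ... FOR UPDATE
    // locks only the counter row, so concurrent callers queue up on that single row, and
    // if the surrounding transaction rolls back, the increment rolls back too, leaving no gap.
    public long nextId(Connection conn, String counterName) throws SQLException {
        long next;
        try (PreparedStatement select = conn.prepareStatement(
                "SELECT next_id FROM id_counter WHERE name = ? FOR UPDATE")) {
            select.setString(1, counterName);
            try (ResultSet rs = select.executeQuery()) {
                rs.next();
                next = rs.getLong(1);
            }
        }
        try (PreparedStatement update = conn.prepareStatement(
                "UPDATE id_counter SET next_id = next_id + 1 WHERE name = ?")) {
            update.setString(1, counterName);
            update.executeUpdate();
        }
        return next;
    }
}

The row lock is held until the surrounding transaction commits, which is exactly the bottleneck described above.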
It is also a single point of failure if the node holding the table is not available. I could set up "master-master" replication, but I am not sure whether that risks using an out-of-date ID counter (I have never used replication).
Thanks.
I am sorry to say this, but allowing high concurrency to achieve high performance and at the same time asking for a strictly monotone sequence are conflicting requirements.
Either you have a single point of control/failure that issues the IDs and makes sure there are neither duplicates nor is one skipped, or you will have to accept the chance of one or both of these situations.
As you have stated, there are attempts to circumvent this kind of problem, but in the end you will always find that you need to make a tradeoff between speed and correctness, because as soon as you allow concurrency you can run into split-brain situations or race-conditions.
Maybe a strictly monotone sequence would be ok for each of possibly many servers/databases/tables?
I have more of a theoretical question:
When does data actually get inserted into the database: after persist or after commit is called? I'm asking because I have a problem with unique keys (manually generated): they get duplicated. I'm thinking this is due to multiple users inserting data into the same table simultaneously.
UPDATE 1:
I generate the keys in my application. Example keys: '123456789123', '123456789124', '123456789125'...
The key field is of varchar type, because there are a lot of old keys (which I can't delete or change) like 'VP123456', 'VP15S3456'. Another problem is that after being inserted into one database, these keys have to be inserted into another database. And I don't know what DB sequences and Atomic objects are.
UPDATE 2:
These keys are used in finance documents, not as database keys. So they must be unique, but they are not used anywhere in the code as object keys.
I would suggest you create a Singleton that takes care of generating your keys. Make sure you can only get a new id once the singleton has been initialized with the latest value from the database.
To safeguard against incomplete inserts into the two databases, I would suggest you try XA transactions. They give you all-or-nothing inserts and updates: if any operation on any of the databases fails, everything is rolled back. Of course there is a downside to XA transactions: they are quite slow, and not all databases and database drivers support them.
How do you generate these keys? Have you tried using sequences in DB or atomic objects?
I'm asking because it is normal to populate DB concurrently.
EDIT1:
You can write a method that returns new keys based on an atomic counter; this way you know that any time you request a new key you receive a unique one. This strategy can lead to some keys being discarded, but that is a small price to pay, unless it is a requirement that the keys in the database are sequential.
private AtomicLong counter; // java.util.concurrent.atomic.AtomicLong, initialized somewhere else (e.g. from the DB's current max key)

public String getKey() {
    // Atomic increment: each caller receives a distinct value, even under concurrent access.
    return "VP" + counter.incrementAndGet();
}
And here's some help on DB Sequences in Oracle, MySql, etc.
This is a use case in member enrollment via a web application/web service. We have a complex algorithm for checking whether a member is a duplicate, by looking at multiple tables like phone, address, etc. The algorithm varies based on the member's country, so this restriction cannot be implemented with a primary key/unique key constraint.
So we have the checks in Java code. But if there are 2 concurrent duplicate requests, the 2 Java threads both see that the member doesn't exist and both insert the record, resulting in duplicates. How can I prevent such duplicate inserts?
For updates, I can prevent this by using row-level locks or Hibernate's optimistic concurrency. I could use table-level locks to prevent such inserts, but that limits application performance, as it also blocks updates. Another option I can think of is to create a lock table with a record with id='memberInsert' and force all inserts via JDBC to obtain a row-level lock on this record.
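A minimal sketch of that lock-row idea, assuming a table such as app_lock(id VARCHAR PRIMARY KEY) containing one row with id='memberInsert'; the MemberDTO type and the duplicate-check/insert methods are placeholders for the existing logic:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class MemberDTO { /* enrollment fields omitted */ }

public class MemberEnrollmentService {

    // Serializes all member inserts on a single lock row. SELECT ... FOR UPDATE blocks until
    // any other transaction holding that row commits, so the duplicate check and the insert
    // below cannot interleave with a competing insert.
    public void enrollIfNotDuplicate(Connection conn, MemberDTO member) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement lock = conn.prepareStatement(
                "SELECT id FROM app_lock WHERE id = 'memberInsert' FOR UPDATE")) {
            lock.executeQuery();
            if (!isDuplicate(conn, member)) {   // existing country-specific checks
                insertMember(conn, member);     // existing insert logic
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }

    private boolean isDuplicate(Connection conn, MemberDTO member) { /* existing algorithm */ return false; }

    private void insertMember(Connection conn, MemberDTO member) { /* existing insert */ }
}

The price is that all enrollments are serialized; a per-country lock row would reduce that contention if the duplicate rules never cross countries.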
Thanks
Suneel
If this check is going to live anywhere, I'd expect it to be in a write trigger, not in the Java code; otherwise some other application, or another area of this application, could do something badly.
Offloading this onto the database gives you two advantages: 1) it prevents the race condition you mention above, and 2) it protects the integrity of the data by not allowing some errant application to modify records and put them into an illegal state.
Can't you hash the outcome of the algorithm or something and simply use that as a unique primary key?
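A small sketch of that idea, assuming the country-specific rules can be reduced to a normalized string per member; the field choice and normalization here are illustrative only:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class MemberFingerprint {

    // Builds a deterministic fingerprint of the identity-defining fields. Stored in a column
    // with a UNIQUE constraint, the second of two concurrent duplicate inserts fails at the
    // database level instead of slipping past the check in Java.
    public static String fingerprint(String country, String phone, String address) {
        String normalized = (country + "|" + phone + "|" + address)
                .toLowerCase().replaceAll("\\s+", "");
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(normalized.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(hash);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}

This only works if the duplicate rule is an equality check over normalized fields; fuzzy matching cannot be captured in a single hashed key.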
As long as the database is not aware of your requirements, it will not help you. And then you probably have no other choice than table level locking.