This question already has answers here:
SQL Server Insert if not exists
(13 answers)
Closed 8 years ago.
I have a table users with primary key column email.
I have a piece of code that stores the user by simply invoking userDao.store(user);
Since the constraint exists, I can catch the exception it throws and show the error on the UI. This approach works fine.
Another solution is to first check whether the user exists and only then store him in the database. This results in two consecutive queries, a SELECT and then an INSERT, and if the user exists I show the error. The issue I see here is that two users may try to register at the same time with the same email. Both threads may check for the user, find nothing, and proceed; then the first thread saves the user and the second one throws an exception.
The third approach is to use a MERGE query (I use HSQLDB). Basically, in one query I insert the user only if he does not exist. Then I check the result of the query: if no rows were changed, the user exists and I can show the error. None of these approaches would violate the consistency of my data, but I am looking for best practices for handling this kind of problem.
Your first instinct was correct. To protect against duplicates, define a UNIQUE constraint on that column. Then catch any exception resulting from a violation of that constraint.
SQL lacks an atomic insert-if-not-exists command. You will see code using a nested SELECT statement, but such code is not atomic, so you would still need to trap for the UNIQUE constraint violations.
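A minimal JDBC sketch of that approach, assuming a users table with an email primary key and a hypothetical name column: the INSERT is attempted directly, and a failure whose SQLState is in class 23 (integrity constraint violation) is turned into a "duplicate" result for the UI. The class name UserStore and the Result enum are illustrative only.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;

class UserStore {

    /** Outcome shown to the UI. */
    enum Result { CREATED, DUPLICATE_EMAIL }

    /**
     * Inserts the user and lets the PRIMARY KEY / UNIQUE constraint decide
     * whether the email is already taken. Table and column names are assumptions.
     */
    static Result store(Connection conn, String email, String name) throws SQLException {
        String sql = "INSERT INTO users (email, name) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, email);
            ps.setString(2, name);
            ps.executeUpdate();
            return Result.CREATED;
        } catch (SQLException e) {
            if (isIntegrityConstraintViolation(e)) {
                return Result.DUPLICATE_EMAIL; // show "email already registered"
            }
            throw e; // anything else is a real failure
        }
    }

    /**
     * SQLState class "23" means "integrity constraint violation" in the SQL
     * standard. It also covers NOT NULL and foreign key violations.
     */
    static boolean isIntegrityConstraintViolation(SQLException e) {
        if (e instanceof SQLIntegrityConstraintViolationException) {
            return true;
        }
        String state = e.getSQLState();
        return state != null && state.startsWith("23");
    }
}
```

Because class 23 is broader than "duplicate key", you may want to narrow the check to your driver's specific code (PostgreSQL, for example, reports 23505 for unique violations).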
This question is basically a duplicate. Search Stack Overflow for more discussion and examples.
By the way, I would recommend against using the email address as a primary key. If a user wants to change the email address on their account, you will have to update every related record that uses that value as a foreign key. I suggest almost always using a surrogate key instead of a natural key.
The chance of that happening is so remote that you really don't have to consider it, especially if you require email validation before someone can use the system. If you are still worried, you can minimize the chance by synchronizing the call that checks for the existence of the email. The only case where this would not work is a clustered environment with the code running on two or more load-balanced servers.
Suppose that we want to insert a record into some table. To be allowed to do that, the table must not already contain a record with duplicate values in certain fields, and this rule cannot be enforced by the database's primary key, so the check must be done by the application's code. Suppose the code for inserting a record looked like this:
check duplicates
if no duplicates:
    insert the record
else:
    show the user an error
That code would be wrong, because two different threads could check for duplicates at the same time, both pass the check, and both insert the same record. The table would then contain duplicates and be in an inconsistent state.
As the code is a Java web application, I guess it would be enough to synchronize the critical section on the same static object, so that any request whose execution reaches the critical section must wait until the request that entered it earlier has left it. But is that enough? Is there a more elegant way of doing this?
You can use database triggers for that; the right kind to use in this case is a BEFORE INSERT trigger. If the use case is simpler, another option would be to use check constraints or other constraints.
A third option would be to do it in the Java code; synchronizing the section as you described would work too, as long as the application runs in a single JVM.
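The synchronized critical section from the question could look roughly like this sketch. The in-memory Set is a hypothetical stand-in for the table so that the example runs on its own; in the real application the check and the insert would be a SELECT and an INSERT, and the transaction would have to commit before the lock is released, otherwise the next thread could still miss the uncommitted row.

```java
import java.util.HashSet;
import java.util.Set;

class SynchronizedRegistration {

    // One static lock shared by every request thread in this JVM.
    private static final Object LOCK = new Object();

    // Hypothetical stand-in for the table (real code: SELECT + INSERT + COMMIT).
    private static final Set<String> TABLE = new HashSet<>();

    /** Returns true if the record was inserted, false if it was a duplicate. */
    static boolean insertIfNoDuplicates(String key) {
        synchronized (LOCK) {
            // check duplicates
            if (TABLE.contains(key)) {
                return false; // caller shows the user an error
            }
            // insert the record (and commit, before leaving the lock)
            TABLE.add(key);
            return true;
        }
    }
}
```

The lock only serializes threads inside one JVM, so once the application runs on several load-balanced servers, a database-level guard such as a trigger or a unique index is still needed.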
Based on the clarification in the question's comments, a unique index is exactly what's needed here. Create a unique index on both the name column and the boolean column. It will then not allow two rows in which both columns have the same values.
Sorry if the question title is misleading or not accurate enough, but I didn't see how to ask it in one sentence.
Let's say we have a table where the PK is a String (numbers from '100,000' to '999,999'; the comma is for readability only).
Let's also say the PK values are not used sequentially.
Now I want to insert a new row into the table using java.sql and show the PK of the inserted row to the user. Since the PK is not generated by default (e.g. inserting values without the PK doesn't work, and something like generated_keys is not available in the given environment), I've seen two different approaches:
In two different statements, first find a possible next key, then try to insert it (and expect that another transaction may have used the same key between the two statements). Is it valid to retry until success, or could some SQL trick with transaction settings or locks help here? How can I implement that in java.sql?
For me, that's a disappointing solution because of the non-deterministic behaviour (perhaps you could convince me otherwise), so I searched for another one:
Insert with a nested SELECT statement that looks up the next possible PK. Looking through other answers on generating the PK myself, I came close to a working solution with this statement (casts from string to int left out):
INSERT INTO mytable (pk, othercolumns)
VALUES (
    (SELECT MIN(empty_numbers.empty_number)
     FROM (SELECT t1.pk + 1 AS empty_number
           FROM mytable t1
           LEFT OUTER JOIN mytable t2
             ON t1.pk + 1 = t2.pk
           WHERE t2.pk IS NULL
             AND t1.pk > 100000)
     AS empty_numbers),
    othervalues);
That works like a charm and is (AFAIK) more predictable and stable than my first approach, but how can I retrieve the generated PK from that statement? I've read that there is no way to return the inserted row (or any of its columns) directly, and most of the Google results I've found point to returning generated keys. Even though my key is generated, it's not generated by the DBMS directly but by my statement.
Note that the DBMS used in development is MSSQL 2008 and the production system is currently DB2 on an AS/400 (I don't know which version), so I have to stick close to SQL standards. I can't change the DB structure in any way (e.g. to use generated keys); I'm not sure about stored procedures.
DB2 for i allows generated keys, stored procedures, user defined functions - pretty much all of the things SQL Server can do. The exact implementation is different, but that's what manuals are for :-) Ask your admin what version of IBM i they're running, then hit up the Infocenter for specifics.
The constraining factor is that you can't alter the database design; you are stuck with apparently multiple processes trying to INSERT while backfilling 'holes' in the existing keyspace. That's a very tough nut to crack. Because you can't change the DB design, there's nothing to be done except to allow for and handle PK collisions. There's no SQL trick that'll help - the SQL way is to have the DB generate the PK, not the application.
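Handling PK collisions in practice means the question's first approach: pick a key, try the INSERT, and retry on a collision. The sketch below bounds the number of attempts. nextCandidate and tryInsert are hypothetical callbacks standing in for the "find a possible next key" SELECT and for an INSERT that returns false when it hits a duplicate-key error (for example a SQLException with SQLState class 23), so the loop itself can be shown without a database.

```java
import java.util.function.IntPredicate;
import java.util.function.IntSupplier;

class RetryingKeyInsert {

    /**
     * @param nextCandidate hypothetical: runs the SELECT that finds a free key
     * @param tryInsert     hypothetical: runs the INSERT; false means the key
     *                      was taken by another transaction in the meantime
     * @param maxAttempts   upper bound so the loop cannot spin forever
     * @return the key that was actually inserted (the PK to show the user)
     */
    static int insertWithRetry(IntSupplier nextCandidate, IntPredicate tryInsert, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            int candidate = nextCandidate.getAsInt();
            if (tryInsert.test(candidate)) {
                return candidate;
            }
            // Collision between SELECT and INSERT: look up a fresh key and retry.
        }
        throw new IllegalStateException("no free key after " + maxAttempts + " attempts");
    }
}
```

Since the caller chose the key itself, it already knows the PK of the inserted row once tryInsert succeeds.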
There are several alternatives to suggest, in the event that some change is allowed. All have issues needing a workaround, but that is unavoidable at this point due to the application design.
Create a UDF that all INSERT clients use to retrieve the next available PK. Use a table of 'available numbers' and delete them as they are issued.
Pre-INSERT all the available numbers and force clients to do an UPDATE. Make them FETCH...FOR UPDATE a row whose remaining data is not yet populated. This will lock the row, avoiding collisions as well as making the PK immediately available.
Leave the DB and the other application programs using this table as-is, but have your INSERT process draw from a block of keys that's been set aside for your use. Keep the next available number in an SQL SEQUENCE or an IBM i data area. This only works if there's a very large hole in the keyspace that's not yet used.
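For the third option, the key dispenser might look like this minimal sketch, assuming a reserved hole such as 900000 to 999999 (hypothetical numbers). The counter is kept in an AtomicLong only so the example is self-contained; in the setup described above it would live in an SQL SEQUENCE or an IBM i data area, so that it survives restarts and is shared by every job that inserts.

```java
import java.util.concurrent.atomic.AtomicLong;

class ReservedKeyBlock {

    private final long last;       // last key of the reserved hole (inclusive)
    private final AtomicLong next; // next key to hand out (stand-in for a SEQUENCE)

    ReservedKeyBlock(long first, long last) {
        if (first > last) {
            throw new IllegalArgumentException("empty key block");
        }
        this.last = last;
        this.next = new AtomicLong(first);
    }

    /** Returns the next key in the table's String PK format. */
    String nextKey() {
        long key = next.getAndIncrement();
        if (key > last) {
            throw new IllegalStateException("reserved key block exhausted");
        }
        return Long.toString(key);
    }
}
```

No collision handling is needed as long as no other program ever writes keys inside the reserved range.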
Is it a bad practice to expose DB internal IDs in URLs?
For example, suppose I have a users table with some IDs (primary key) for each row. Would exposing the URL myapp.com/accountInfo.html?userId=5, where 5 is an actual primary key, be considered a "bad thing" and why?
Also assume that we properly defend against SQL injection.
I am mostly interested in answers related to the Java web technology stack (hence the java tag), but general answers will also be very helpful.
Thanks.
That depends on how you parse the URL. If you allow blind SQL injection, that is bad; you only have to validate the id from the user input.
Stack Exchange also puts the id of the row into the URL, as you can see in your address bar. The trick is to parse that part and get rid of all possible SQL. The simplest way is to check that the id is a number.
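A minimal sketch of that check, assuming ids are positive integers that fit in a long: only plain decimal digits are accepted, so a value such as "5 OR 1=1" never reaches the query. The class and method names are illustrative, and the parsed value should still be bound through a PreparedStatement parameter rather than concatenated into SQL.

```java
import java.util.OptionalLong;

class UserIdParam {

    /** Returns the id if raw is a plain positive decimal number, else empty. */
    static OptionalLong parseUserId(String raw) {
        // At most 18 digits, so Long.parseLong below can never overflow.
        if (raw == null || raw.isEmpty() || raw.length() > 18) {
            return OptionalLong.empty();
        }
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c < '0' || c > '9') {
                return OptionalLong.empty(); // rejects signs, spaces, SQL fragments
            }
        }
        long id = Long.parseLong(raw);
        return id > 0 ? OptionalLong.of(id) : OptionalLong.empty();
    }
}
```

Checking the characters explicitly, rather than only catching NumberFormatException, also rejects inputs like "+5" or "-5" that Long.parseLong would accept.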
It isn't a bad thing to pass it in the URL, as it doesn't mean much to the end user; it's only bad if you rely on that value in the running of your application. For example, you don't want the user to notice userId=5 and change it to userId=10 to display another person's account.
It would be much safer to store this information in a session on the server. For example, when the user logs in, their userID value is stored in the session on the server, and you use this value whenever you query the database. If you do it this way, there usually wouldn't be any need to pass through the userID in the URL, however it wouldn't hurt because it isn't used by your DB-querying code.
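A sketch of that idea, where the session's user id decides which account is loaded and a userId parameter from the URL is accepted only if it matches. The OptionalLong arguments are hypothetical inputs: in a servlet application the first would typically come from an attribute stored in the HttpSession at login, and the second from the parsed request parameter.

```java
import java.util.OptionalLong;

class AccountAccess {

    /**
     * @param sessionUserId id stored server-side at login, or empty if not logged in
     * @param urlUserId     id parsed from ?userId=..., or empty if absent
     * @return the id to use in the database query
     */
    static long accountToShow(OptionalLong sessionUserId, OptionalLong urlUserId) {
        if (!sessionUserId.isPresent()) {
            throw new SecurityException("not logged in");
        }
        long own = sessionUserId.getAsLong();
        if (urlUserId.isPresent() && urlUserId.getAsLong() != own) {
            // Someone changed userId=5 to userId=10: refuse rather than
            // display another person's account.
            throw new SecurityException("not your account");
        }
        return own; // the query only ever uses the session value
    }
}
```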
Using the database ID in URLs is good, because this ID should never change during an object's (DB row's) lifetime. Thus the URL is durable, which is the most important aspect of a URL. See also Cool URIs don't change.
Yes, it is a bad thing. You are exposing an implementation detail. How bad? That depends. It forces you to do otherwise unneeded checks of the user input, and if other applications start depending on it, you are no longer free to change the database schema.
PKs are meant for the system.
To the user, they may represent a different meaning.
For example, consider the following links, which use the primary key to display an item under the products productA, productB and productC:
(A) http://blahblahsite.com/browse/productA/111 (pkey)
(B) http://blahblahsite.com/browse/productB/112 (pkey)
(C) http://blahblahsite.com/browse/productC/113 (pkey)
A user on link B may think there are 112 items under productB, which is misleading.
It will also cause problems when merging tables, since the PKs are auto-incremented.
Consider that I am using Java, Struts, Hibernate and Oracle. How can I prevent duplicate entries from being stored in the database? One way is to make the field unique. For example, I am entering the country "USA" in a JSP page; if USA is already stored, how can I prevent it from being stored again? Please let me know.
Regards,
sara
You should always indeed put a unique constraint on fields which must stay unique. This will, however, lead to a cryptic exception at commit time. If you want to be more user-friendly, you should check if the entry already exists (using a query) before inserting it, and display a useful and readable error message to the user if the entry already exists.
This still allows two concurrent users to check at the same time, then insert at the same time, but it greatly reduces the probability, and the unique constraint makes sure that one of the commits will fail, leaving your database in a consistent state.
Query your database whether it already contains USA or not. If it does, then don't store it. If not, then do.
Add a unique index to your database table on the country column.
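Additionally, you can annotate the country attribute of your Hibernate entity with @Column(unique = true). Note that this attribute is used when Hibernate generates the schema; the database constraint is what actually enforces uniqueness.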
I am looking for a way to save or update records according to the table's unique key (which is composed of several columns).
I want to achieve the same functionality used by INSERT ... ON DUPLICATE KEY UPDATE - meaning to blindly save a record, and have the DB/Hibernate insert a new one, or update the existing one if the unique key already exists.
I know I can use @SQLInsert(sql = "INSERT INTO .. ON DUPLICATE KEY UPDATE"), but I was hoping not to write my own SQL and to let Hibernate do the job. (I am assuming it will do a better job; otherwise, why use Hibernate?)
Hibernate may throw a ConstraintViolationException when you attempt to insert a row that breaks a constraint (including a unique constraint). If you don't get that exception, you may get some other general Hibernate exception - it depends on the version of Hibernate and the ability of Hibernate to map the MySQL exception to a Hibernate exception in the version and type of database you are using (I haven't tested it on everything).
You will only get the exception after calling flush(), so you should make sure this is also in your try-catch block.
I would be careful of implementing solutions where you check that the row exists first. If multiple sessions are updating the table concurrently you could get a race condition. Two processes read the row at nearly-the-same time to see if it exists; they both detect that it is not there, and then they both try to create a new row. One will fail depending on who wins the race.
A better solution is to attempt the insert first and if it fails, assume it was there already. However, once you have an exception you will have to roll back, so that will limit how you can use this approach.
This doesn't really sound like a clean approach to me. It would be better to first see if an entity with given key(s) exists. If so, update it and save it, if not create a new one.
EDIT
Or maybe consider if merge() is what you're looking for:
if there is a persistent instance with the same identifier currently associated with the session, copy the state of the given object onto the persistent instance
if there is no persistent instance currently associated with the session, try to load it from the database, or create a new persistent instance
the persistent instance is returned
the given instance does not become associated with the session, it remains detached
http://docs.jboss.org/hibernate/core/3.3/reference/en/html/objectstate.html
You could use saveOrUpdate() from Session class.