Prevent duplicate inserts without using unique keys in Oracle - java

This is a use case in member enrollment via a web application/web service. We have a complex algorithm for checking whether a member is a duplicate, which looks at multiple tables such as phone and address. The algorithm varies based on the member's country, so this restriction cannot be implemented with a primary key or unique key constraint.
So we have the checks in Java code. But if there are two concurrent requests for the same member, the two Java threads each see that the member doesn't exist, and both insert the record, resulting in duplicates. How can I prevent such duplicate inserts?
I can prevent duplicate updates by using row-level locks or Hibernate's optimistic concurrency. I could use table-level locks to prevent such inserts, but that limits application performance because it also blocks updates. Another option would be to create a lock table with a record with id='memberInsert', and force all inserts via JDBC to first obtain a row-level lock on this record.
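The lock-record idea can be sketched in plain Java. This is only an in-process stand-in: the class name, the dedupe key, and the single ReentrantLock are illustrative, and in the real system the lock would be the row with id='memberInsert', acquired via SELECT ... FOR UPDATE on the same JDBC transaction that performs the duplicate check and insert.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// In-process stand-in for the proposed lock-table approach: all inserts
// serialize on one lock, so the check-then-insert sequence is atomic.
public class MemberRegistry {
    private static final Map<String, String> members = new ConcurrentHashMap<>();
    private static final ReentrantLock insertLock = new ReentrantLock();

    // Returns true if the member was inserted, false if a duplicate was found.
    public static boolean insertIfNotDuplicate(String dedupeKey, String name) {
        insertLock.lock();          // stands in for locking the 'memberInsert' row
        try {
            if (members.containsKey(dedupeKey)) {   // the complex duplicate check
                return false;
            }
            members.put(dedupeKey, name);           // the actual insert
            return true;
        } finally {
            insertLock.unlock();    // in the DB version, commit releases the row lock
        }
    }
}
```

Because every insert must pass through the same lock, two concurrent requests for the same member can no longer both see "not present" at the same time; the second one waits and then observes the first one's insert.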
Thanks
Suneel

If this check is going to live anywhere, I'd expect it to be in a database trigger, not in the Java code. Some other application, or another area of the same application, could otherwise write bad data.
Offloading this onto the database gives you two advantages: 1) it prevents the race condition you mention, and 2) it protects the integrity of the data by not allowing an errant application to put records into an illegal state.

Can't you hash the outcome of the algorithm or something and simply use that as a unique primary key?
As long as the database is not aware of your requirements, it cannot help you, and then you probably have no choice other than table-level locking.


Sequence values are not in order on Oracle tables

I have a Java EE 7 project. I use Hibernate as the ORM and my database is Oracle.
I use @SequenceGenerator with allocationSize = 1 for the id of my entity, together with @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "seq"). My Oracle sequence has cache=1000.
But when I persist two records, the ids are not assigned in order: a record persisted later can end up with a smaller id, and the values are neither consecutive nor contiguous.
What is causing this, and how can I resolve it?
As you are using 11g (a very old version, so your company should plan an upgrade as soon as possible), the choice for RAC comes down to a balance between performance and ordering guarantees.
You have two options noorder vs order
create sequence xxx start with 1 increment by 1 noorder|order cache xxx
How do the instances co-ordinate their use of sequence values and avoid the risk of two instances using the same value?
There are two mechanisms: the default noorder, where each instance behaves as if it doesn't know about the other instances, and order, where the instances continuously negotiate through global enqueues to determine which instance is responsible for the sequence at any moment.
Noorder
The upshot of this noorder mechanism is that each instance will be working its way through a different range of numbers, and there will be no overlaps between instances. If you had sessions that logged on to the database once per second to issue a call to nextval (and they ended up connecting through a different instance each time), then the values returned would appear to be fairly randomly scattered over a range dictated by “number of instances x cache size.” Uniqueness would be guaranteed, but ordering would not.
Order
If you declare a sequence with the order option, Oracle adopts a strategy of using a single “cache” for the values and introduces a mechanism for making sure that only one instance at a time can access and modify that cache. Oracle does this by taking advantage of its Global Enqueue services. Whenever a session issues a call to nextval, the instance acquires an exclusive SV lock (global enqueue) on the sequence cache, effectively saying, “who’s got the most up to date information about this sequence – I want control”. The one instance holding the SV lock in exclusive mode is then the only instance that can increment the cached value and, if necessary, update the seq$ table by incrementing the highwater. This means that the sequence numbers will, once again, be generated in order. But this option has a penalty in performance and should be considered carefully.
Summary
If your transactions are fast, you can use order and test how it behaves. If your transactions are not fast, I would avoid order altogether. The best option is to upgrade to 19c (12c is already nearing obsolescence) and use IDENTITY columns.
If you have unordered (separate) caches on each node (the default):
node 1: cache values (1 - 1000)
node 2: cache values (1001 - 2000)
then the caches cannot overlap values and value used will depend on which node performs the insert. That is why your sequence values currently appear to be out of order.
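The per-node caching behaviour can be modelled in a few lines of Java. This is a toy sketch, not Oracle's implementation: each "instance" grabs its own block of values from a shared high-water mark (modelling the seq$ table), so interleaved calls across instances produce unique but unordered values.

```java
// Toy model of RAC noorder caching: each instance reserves a block of
// `cacheSize` values from a shared high-water mark, then hands them out
// locally. Values are unique across instances but not globally ordered.
public class NoorderSequence {
    private static long highWater = 0;      // models the seq$ high-water mark
    private final long cacheSize;
    private long next, max;                 // this instance's current block

    public NoorderSequence(long cacheSize) { this.cacheSize = cacheSize; }

    public synchronized long nextval() {
        if (next >= max) {                  // cache exhausted: reserve a new block
            synchronized (NoorderSequence.class) {
                next = highWater + 1;
                highWater += cacheSize;
                max = highWater + 1;
            }
        }
        return next++;
    }
}
```

With a cache of 5, the first node hands out 1..5 and the second 6..10, so alternating inserts across nodes yield 1, 6, 2, 7, ... : unique, but scattered over "number of instances x cache size", exactly as described above.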
Using the NOCACHE and/or ORDER options will result in sequential numbers, but you can expect at least some performance impact on your application, as the database must perform more overhead to determine the current sequence value before making it available to your SQL command. Reducing the cache size, or eliminating the cache entirely, can have a severe negative impact on performance if you are executing a lot of inserts (as suggested by your current cache value of 1000).
Assuming for now that you continue to use a cache (whether ordered or not), be aware that every time you restart your database, or a node (depending on your exact configuration), the unused cached values will be flushed and lost and a new cache will be created.
In the end, it is important to realize that sequence values are not intended (for most applications) to be perfectly sequential without gaps, or even (as in your case) ordered; they are only intended to be unique. Be sure to understand your requirements, and don't be put off if sequences don't behave quite as you expected: must the values be sequential in the order inserted, and will gaps in the sequence affect your application? If the answer to both is no, then stick with what you've got for the sake of performance.

Concurrency Control on my Code

I am working on an order capture and generation application. The application works fine with concurrent users working on different orders. The problem starts when two users from different systems/locations try to work on the same order: the application generates duplicate data for that order, because two users are working on it simultaneously.
I have tried synchronizing the method where I generate the order, but that means no other user can work on any new order, since synchronized puts a lock on the whole method: all users are blocked from generating a new order while any one order is in progress.
I have also tried criteria initialization for an order, but with no success.
Can anyone please suggest a proper approach??
All suggestions/comments are welcome. Thanks in advance.
Instead of synchronizing at the method level, you can use block-level synchronization for only those blocks of code that must be executed by one thread at a time. This increases the scope for processing different orders in parallel.
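One common refinement of this is to lock per order rather than per method, so that two users on the same order are serialized while users on different orders proceed in parallel. A minimal sketch, assuming a string order id; the class and method names are illustrative, and generating the order is reduced to recording a single entry:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Per-order locking: each order id gets its own lock object, so concurrent
// requests for the SAME order serialize, while DIFFERENT orders run in parallel.
public class OrderService {
    private final Map<String, Object> orderLocks = new ConcurrentHashMap<>();
    private final Map<String, Integer> generatedCount = new ConcurrentHashMap<>();

    public void generateOrder(String orderId) {
        Object lock = orderLocks.computeIfAbsent(orderId, id -> new Object());
        synchronized (lock) {
            // Only one thread per orderId gets here at a time, so the
            // check-then-generate sequence cannot produce duplicates.
            if (!generatedCount.containsKey(orderId)) {
                generatedCount.put(orderId, 1);   // stands in for order generation
            }
        }
    }

    public int countFor(String orderId) {
        return generatedCount.getOrDefault(orderId, 0);
    }
}
```

Note this only works within a single JVM; for users coming through different systems, as in the question, the equivalent lock must live in the shared database (for example via optimistic locking, discussed next).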
On a grander scale, if you are backing your entities with a database, I would advise you to look at optimistic locking.
Add a version field to your order entity. When the order is first placed, the version is 1. Every update must then carry the version it read, so imagine two concurrent processes:
a -> Read data (version=1)
Update data
Store data (set version=2 if version=1)
b -> Read data (version=1)
Update data
Store data (set version=2 if version=1)
If the processing of these two is concurrent rather than serialized, you will notice that one of the processes will indeed fail to store its data. That is the losing user, who will have to retry his edits (re-reading the row, where he now sees version=2).
If you use JPA, optimistic locking is as easy as adding a @Version attribute to your model. If you use raw JDBC, you will need to add it to the update condition yourself:
update table set version=2, data=xyz where orderid=x and version=1
That is by far the best and in fact preferred solution to your general problem.
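The versioned-update scheme above can be modelled in memory to see the losing writer fail. This is a sketch, not a JPA implementation: the store holds one row as a {version, data} pair, and an update is rejected when its expected version no longer matches, mirroring the WHERE version=1 condition.

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal in-memory model of optimistic locking: an update carries the version
// it read, and the store rejects it if the version has already moved on --
// the same effect as "update ... set version=2 where orderid=x and version=1".
public class OptimisticStore {
    // holds {version, data} for a single order row
    private static final AtomicReference<String[]> row =
            new AtomicReference<>(new String[] {"1", "initial"});

    // Returns true if the update won; false means "stale read, retry your edit".
    public static boolean update(int expectedVersion, String newData) {
        String[] current = row.get();
        if (Integer.parseInt(current[0]) != expectedVersion) {
            return false;                                   // lost the race
        }
        String[] next = {String.valueOf(expectedVersion + 1), newData};
        return row.compareAndSet(current, next);            // atomic version bump
    }

    public static int currentVersion() {
        return Integer.parseInt(row.get()[0]);
    }
}
```

The losing caller gets false back, re-reads the row (now at version 2), and retries with the fresh version, which is exactly the retry loop the prose describes.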

Strictly auto-increment value in MySQL

I have to create a MySQL InnoDB table with a strictly sequential ID for each element (row) in the table. There cannot be any gap in the IDs: each element has to have a different ID and they HAVE TO be sequentially assigned. Concurrent users create data in this table.
I have experienced MySQL's auto-increment behaviour, where if a transaction fails the PK number is not used, leaving a gap. I have read complicated solutions online that did not convince me, and others that don't really address my problem (Emulate auto-increment in MySQL/InnoDB, Setting manual increment value on synchronized mysql servers).
I want to maximise write concurrency. I can't afford to have users waiting a long time to write to the table.
I might need to shard the table... but still keeping the ID count.
The sequence of the elements in the table is NOT important, but the IDs have to be sequential (ie, if an element is created before another does not need to have a lower ID, but gaps between IDs are not allowed).
The only solution I can think of is to use an additional COUNTER table to keep the count. Then create the element in the table with an empty "ID" (not the PK), lock the COUNTER table, get the number, write it on the element, increase the number, and unlock the table. I think this will work fine, but it has an obvious bottleneck: while the lock is held, nobody else can obtain an ID.
It is also a single point of failure if the node holding the table is unavailable. I could set up master-master replication, but I am not sure whether that risks using an out-of-date ID counter (I have never used replication).
Thanks.
I am sorry to say this, but allowing high concurrency to achieve high performance and at the same time asking for a strictly monotone sequence are conflicting requirements.
Either you have a single point of control/failure that issues the IDs and makes sure there are neither duplicates nor is one skipped, or you will have to accept the chance of one or both of these situations.
As you have stated, there are attempts to circumvent this kind of problem, but in the end you will always find that you need to make a tradeoff between speed and correctness, because as soon as you allow concurrency you can run into split-brain situations or race-conditions.
Maybe a strictly monotone sequence would be ok for each of possibly many servers/databases/tables?
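If one does accept the single point of serialization, the counter-table approach from the question can be sketched in-process. This is an illustrative model, not a database implementation: the synchronized method plays the role of the lock on the COUNTER table, and the id is only consumed when the insert succeeds, so failed inserts leave no gaps.

```java
// In-process sketch of the COUNTER-table idea: every id assignment funnels
// through one lock, guaranteeing a gapless sequence at the cost of
// serializing all writers (the bottleneck the question anticipates).
public class GaplessCounter {
    private long next = 1;

    // The id is consumed only after the insert succeeds; if the insert
    // throws, `next` is never incremented, so no gap appears.
    public synchronized long assignId(Runnable insert) {
        insert.run();
        return next++;
    }
}
```

This makes the tradeoff concrete: correctness (no duplicates, no gaps) is bought by forcing every writer through one critical section, which is precisely the concurrency loss the answer warns about.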

What is the difference between: sequence IDs using JPA @TableGenerator and @GeneratedValue vs database AUTO_INCREMENT

Q1.: What is the difference between applying sequence Id in a database using
A.
CREATE TABLE Person
(
  id BIGINT NOT NULL AUTO_INCREMENT,
  ...
  PRIMARY KEY (id)
)
versus
B.
@Entity
public class Person {
    @Id
    @TableGenerator(name="TABLE_GEN", table="SEQUENCE_TABLE", pkColumnName="SEQ_NAME",
        valueColumnName="SEQ_COUNT", pkColumnValue="PERSON_SEQ")
    @GeneratedValue(strategy=GenerationType.TABLE, generator="TABLE_GEN")
    private long id;
    ...
}
My system is highly concurrent. Since my DB is Microsoft SQL Server, I do not think it supports @SequenceGenerator, so I have to stay with @TableGenerator, which is prone to concurrency issues.
Q2. This link here (http://en.wikibooks.org/wiki/Java_Persistence/Identity_and_Sequencing#Advanced_Sequencing) suggests that B might suffer from concurrency issues, but I do not understand the proposed solution. I would greatly appreciate it if someone could explain to me how to avoid concurrency issues with B. Here is a snippet of their solution:
If a large sequence pre-allocation size is used this becomes less of an issue, because the sequence table is rarely accessed.
Q2.1: How much allocation size are we talking about here? Should I do allocationSize=10 or allocationSize=100?
Some JPA providers use a separate (non-JTA) connection to allocate the sequence ids in, avoiding or limiting this issue. In this case, if you use a JTA data-source connection, it is important to also include a non-JTA data-source connection in your persistence.xml.
Q2.2: I use EclipseLink as my provider; do I have to do what it suggests above?
Q3. If B suffers from concurrency issues, does A suffer the same?
Using a TableGenerator, the next id value is looked up and maintained in a table, basically managed by JPA rather than by your database. This may lead to concurrency issues when multiple threads access your database and try to figure out what the next value for the id field should be.
An auto_increment column makes your database take care of the next id for your table, i.e. it is determined automatically by the database server when running the insert, which is concurrency safe.
Update:
Is there something that keeps you away from using GenerationType.AUTO?
GenerationType.AUTO selects an appropriate way to retrieve the id for your entity, so in the best case it uses the built-in functionality. However, you need to check the generated SQL and see what exactly happens there; as MSSQL does not offer sequences (in the version in question), I assume it would use GenerationType.IDENTITY.
As said the auto_increment column takes care about assigning the next id value, ie. there is no concurrency issue there - even with multiple threads tackling the database in parallel. The challenge is to transfer this feature to be used by JPA.
A: uses IDENTITY id generation, i.e. @GeneratedValue(strategy = GenerationType.IDENTITY)
B: uses TABLE id generation
JPA supports three types, IDENTITY, SEQUENCE and TABLE.
There are trade-offs with both.
IDENTITY does not allow preallocation, so requires an extra SELECT after every INSERT, prevents batch writing, and requires a flush to access the id which may lead to poor concurrency.
TABLE allows preallocation, but can have concurrency issues with locks on the sequence table.
Technically SEQUENCE id generation is the best, but not all databases support it.
With TABLE sequencing, if you use a preallocation size of 100, then only every 100th insert will lock the row in the sequence table, so as long as you don't commonly have 100 inserts at the same time you will not suffer any loss in concurrency. If your application does a lot of inserts, use 1000 or a larger value.
EclipseLink will use a separate transaction for TABLE sequencing, so any concurrency issue with locks to the sequence table will be reduced. If you are using JTA, then you need to specify a non-jta-datasource to do this and configure a sequence-connection-pool in your persistence.xml properties.
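The preallocation size is configured through the allocationSize attribute. A sketch of the question's entity with a larger allocation (the generator and table names are carried over from the question; this is a configuration fragment, not a runnable program):

```java
import javax.persistence.*;

@Entity
public class Person {
    @Id
    @TableGenerator(name = "TABLE_GEN", table = "SEQUENCE_TABLE",
                    pkColumnName = "SEQ_NAME", valueColumnName = "SEQ_COUNT",
                    pkColumnValue = "PERSON_SEQ",
                    allocationSize = 100)   // one sequence-table lock per 100 inserts
    @GeneratedValue(strategy = GenerationType.TABLE, generator = "TABLE_GEN")
    private long id;
}
```

With allocationSize = 100 the provider fetches a block of 100 ids in one round trip and hands them out from memory, which is how the lock contention described above is amortized.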

Way to know when a table is modified

There are two different processes, developed in Java, running independently.
If either process modifies the table, can I get a notification? My objective is to keep an object always in sync with a table in the database: if any modification happens to the table, I want to update the object.
Do databases provide any facility like this?
We use SQL Server and have certain triggers that fire when a table is modified and call an external binary. The binary we call sends a Tib rendezvous message to notify other applications that the table has been updated.
However, I'm not a huge fan of this solution - Much better to control writing to your table through one "custodian" process and have other applications delegate to that. To enforce this you could change permissions on your table so that only your custodian process can write to the database.
The other advantage of this approach is being able to provide a caching layer within your custodian process to cater for common access patterns. Granted that a DBMS performs caching anyway, but by offering it at the application layer you will have more control / visibility over it.
No, the database doesn't generally provide these services. You have to query it periodically to check for modifications, or use some JMS solution to send notifications from one app to the other.
You could add a timestamp column (last_modified) to the tables and check it periodically for updates, or use sequence numbers that are incremented on updates (similar in concept to optimistic locking).
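The last_modified polling idea reduces to remembering the newest timestamp seen and reporting a change whenever the table's value moves past it. A sketch, with hypothetical names; in a real application the long passed in would come from something like "select max(last_modified) from the_table":

```java
// Polling sketch: remember the newest modification timestamp seen and
// report a change exactly once when the table's value moves past it.
public class TablePoller {
    private long lastSeen;

    public TablePoller(long initial) { this.lastSeen = initial; }

    // Returns true exactly once per observed modification.
    public boolean poll(long tableLastModified) {
        if (tableLastModified > lastSeen) {
            lastSeen = tableLastModified;
            return true;    // table changed since last poll: refresh the object
        }
        return false;
    }
}
```

The polling interval is then the staleness bound: the in-memory object can lag the table by at most one interval, which is the tradeoff against the trigger/notification approaches above.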
You could use jboss cache which provides update mechanisms.
One way you can do this: enclose your database statement in a method that returns true when it completes successfully, and keep that flag in scope in your code so that you can later check whether the table has been modified. Why not try it like this?
If you're willing to take the hack approach, and your database stores tables as files (eg, mySQL), you could always have something that can check the modification time of the files on disk, and look to see if it's changed.
Of course, for databases like Oracle, where tables are assigned to tablespaces and the tablespaces are what have storage on disk, it won't work.
(yes, I know this is a bad approach, that's why I said it's a hack -- but we don't know all of the requirements, and if he needs something quick, without re-writing the whole application, this would technically work for some databases)
