Guaranteed FIFO using JPA (Hibernate implementation) with MySQL - java

I need to persist a queue of tasks in MySQL. When reading them from the DB I have to make sure the order is exactly the same as the order in which they were persisted.
In general I prefer to have the solution DB agnostic (i.e. pure JPA) but adding some flavor of Hibernate and/or MySQL is acceptable as well.
My (probably naive) first version looks like:
em.createNamedQuery("MyQuery", MyTask.class).setFirstResult(0).setMaxResults(count).getResultList();
Where MyQuery doesn't have any "order by" clause i.e. it looks like:
SELECT t FROM MyTask t
Would such approach guarantee that the incoming results/entities are ordered in the way they have been persisted? What if I enable caching as well?
I was also thinking of adding an extra field to the task entity which is a timestamp in milliseconds (UTC from 1970-01-01) and then order by it in the query but then I might be in a situation where two tasks get generated immediately one after the other and they have the same timestamp.
Any solutions/ideas are welcome!
EDIT:
I just realised that auto increment (at least in MySQL) would throw an exception once it reaches its max value, and no more inserts would be possible. This means I shouldn't worry about the DB resetting the counter, and I could explicitly order by an "auto increment" column in my query. Of course, I would then have another problem to deal with, i.e. what to do if the volume is so high that the largest possible unsigned integer type in MySQL is not big enough, but this problem is not necessarily coupled with the problem I am dealing with right now.
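A quick back-of-the-envelope check of how far away that limit is, as a minimal Java sketch. MySQL's BIGINT UNSIGNED tops out at 18446744073709551615 (2^64 - 1); the insert rates passed in are assumptions for illustration, not measurements, and the class name is hypothetical.

```java
import java.math.BigInteger;

// Rough estimate of how long a BIGINT UNSIGNED AUTO_INCREMENT column lasts
// at a constant (assumed) insert rate.
final class AutoIncrementCapacity {
    // MySQL BIGINT UNSIGNED maximum: 2^64 - 1
    static final BigInteger UNSIGNED_BIGINT_MAX =
            BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE);
    // Seconds in a Julian year (365.25 days)
    static final long SECONDS_PER_YEAR = 365L * 24 * 3600 + 6 * 3600;

    // Whole years until the counter is exhausted (integer division, rounded down).
    static BigInteger yearsUntilExhausted(long insertsPerSecond) {
        return UNSIGNED_BIGINT_MAX
                .divide(BigInteger.valueOf(insertsPerSecond))
                .divide(BigInteger.valueOf(SECONDS_PER_YEAR));
    }
}
```

At an assumed one million inserts per second this comes to roughly 584,000 years; even at a billion inserts per second it is still about 584 years.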

Focusing on a pure JPA solution: since the entity MyTask must have a primary key anyway, I suggest you use a sequence generator for its primary key and sort the result of your query with an ORDER BY clause on the key.
For example:
import javax.persistence.*;
@Entity
class MyTask {
    @Id @GeneratedValue(strategy=GenerationType.SEQUENCE)
    private Long id;
    // ... other task fields
}
You can also tie it a little more closely to your database by using @SequenceGenerator to specify a generator defined in the database.
Edit: Did you look at the @PrePersist option for setting the timestamp? Maybe you can combine the timestamp field with the sequence-generated id and order by both, in that order, so timestamp conflicts are resolved by comparing the ids (which are unique).
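A small in-memory illustration of that ordering, assuming a hypothetical `createdAt` field (set in a `@PrePersist` callback, e.g. `createdAt = System.currentTimeMillis();`) next to the sequence-generated `id`. The comparator sorts the way the JPQL query `SELECT t FROM MyTask t ORDER BY t.createdAt, t.id` would: by timestamp first, with the unique id breaking ties. The class names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// In-memory stand-in for a task row (field names are hypothetical).
final class TaskRow {
    final long createdAt; // milliseconds since 1970-01-01 UTC, set on persist
    final long id;        // unique, generated by the sequence

    TaskRow(long createdAt, long id) {
        this.createdAt = createdAt;
        this.id = id;
    }
}

final class TaskOrdering {
    // Equivalent of "ORDER BY t.createdAt, t.id": equal timestamps fall back to the id.
    static final Comparator<TaskRow> FIFO =
            Comparator.comparingLong((TaskRow t) -> t.createdAt)
                      .thenComparingLong(t -> t.id);

    static List<TaskRow> sorted(List<TaskRow> rows) {
        List<TaskRow> copy = new ArrayList<>(rows);
        copy.sort(FIFO);
        return copy;
    }
}
```

Two tasks created in the same millisecond still come back in id order, which is the collision the question worried about.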

Most RDBMSs will store rows in the order of insertion and, given no other instruction, will return them in that order too. If you don't want to leave it to chance, you have a couple of options.
1) You can generate a reasonably unique ID from a timestamp plus an incrementing fixed-length number,
OR
2) You can just define your table with an autonumbered primary key (which is probably easier).
If the table has a primary key to order by, most RDBMSs will by default return rows in ascending primary key order... or you can enforce that explicitly with an ORDER BY in your query.
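The first option (timestamp plus an incrementing fixed-length number) can be sketched like this, assuming a single JVM issues all ids and a three-digit counter is wide enough; the class name and counter width are illustrative choices. The id is the time in milliseconds times 1000 plus a counter that restarts each millisecond; if the counter runs out within one millisecond, this sketch borrows the next millisecond instead of waiting, so ids stay strictly increasing.

```java
// Sketch of option 1: id = millis * 1000 + counter (000..999).
// Assumes one generator instance in one JVM; it is NOT unique across servers.
final class TimestampCounterIdGenerator {
    private static final int COUNTER_LIMIT = 1000; // fixed-length, three-digit counter
    private long lastMillis = -1;
    private int counter = 0;

    synchronized long nextId() {
        // Never step backwards, even if the system clock is adjusted.
        long now = Math.max(System.currentTimeMillis(), lastMillis);
        if (now == lastMillis) {
            counter++;
            if (counter == COUNTER_LIMIT) {
                // Counter exhausted for this millisecond: move on to the next one.
                lastMillis++;
                counter = 0;
            }
        } else {
            lastMillis = now;
            counter = 0;
        }
        return lastMillis * COUNTER_LIMIT + counter;
    }
}
```

Because uniqueness depends on one in-process generator, option 2 (letting the database number the rows) is usually the simpler choice once more than one application instance inserts tasks.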

Neither JPA (with or without a cache) nor the RDBMS guarantees that rows are persisted or loaded in any particular sequence unless you use an ORDER BY clause. To solve this, add an integral primary key to the entity and order by it when fetching the data, as the other answerers mentioned.

Related

Hibernate - Fetch a sequential number from database, preventing duplicated keys during concurrency

I have a situation maintaining a legacy project, using JSF / Primefaces / Hibernate, the database is DB2, the original code was migrated from Delphi to Java, but keeping the database structure since it came from a vendor (we can't change it). There are some tables used to fetch a sequential id (SELECT MAX and UPDATE after that).
The table has a composite key (year and number). The issue today is: we select the max number for the year from a param table (which holds the "next sequential" value). Sometimes concurrent users get the same number, causing errors when we try to persist duplicated keys.
I tried to implement a Hibernate Interceptor that fetches and sets the value in the onSave method, but it did not avoid the duplicated key issue (I tried it as a SessionFactory-scoped interceptor). I also tried making the methods synchronized, but that didn't work either.
Is there a way to prevent this duplicated key issue (programmatically, without the need of changing the database) using Hibernate features?
Thanks in advance!

Return (self) generated value from insert statement (no id, no returning)

Sorry if the question title is misleading or not accurate enough, but I didn't see how to ask it in one sentence.
Let's say we have a table where the PK is a String (numbers from '100,000' to '999,999', comma is for readability only).
Let's also say, the PK is not sequentially used.
Now I want to insert a new row into the table using java.sql and show the PK of the inserted row to the user. Since the PK is not generated by default (e.g. inserting values without the PK doesn't work, and something like generated_keys is not available in the given environment), I've seen two different approaches:
in two different statements: first find a possible next key, then try to insert it (and expect that another transaction may have used the same key between the two statements). Is it valid to retry until success, or could some SQL trick with transaction settings/locks help here? How can I implement that in java.sql?
For me, that's a disappointing solution because of its non-deterministic behaviour (perhaps you could convince me otherwise), so I searched for another one:
insert with a nested select statement that looks up the next possible PK. Looking at other answers on generating the PK myself, I came close to a working solution with this statement (casts from string to int left out):
INSERT INTO mytable (pk,othercolumns)
VALUES(
(SELECT MIN(empty_numbers.empty_number)
FROM (SELECT t1.pk + 1 as empty_number
FROM mytable t1
LEFT OUTER JOIN mytable t2
ON t1.pk + 1 = t2.pk
WHERE t2.pk IS NULL
AND t1.pk > 100000)
as empty_numbers),
othervalues);
That works like a charm and is (AFAIK) more predictable and stable than my first approach, but how can I retrieve the generated PK from that statement? I've read that there is no way to return the inserted row (or any of its columns) directly, and most of the Google results I've found point to returning generated keys; even though my key is generated, it's generated by my statement, not directly by the DBMS.
Note that the DBMS used in development is MSSQL 2008 and the production system is currently DB2 on AS/400 (I don't know which version), so I have to stick close to SQL standards. I can't change the DB structure in any way (e.g. use generated keys; I'm not sure about stored procedures).
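To make the logic of the nested SELECT in the INSERT above easier to follow, here is the same computation done in memory as a Java sketch (class and method names are illustrative): among the used keys above 100000, find the smallest one whose successor is unused, and return that successor. Like the SQL, it applies the `pk > 100000` filter and does not cap the result at 999,999.

```java
import java.util.SortedSet;

// In-memory equivalent of:
//   SELECT MIN(t1.pk + 1) FROM mytable t1 LEFT OUTER JOIN mytable t2
//   ON t1.pk + 1 = t2.pk WHERE t2.pk IS NULL AND t1.pk > 100000
final class NextFreeKey {
    static Integer find(SortedSet<Integer> usedKeys) {
        // tailSet(100001) keeps only keys > 100000, in ascending order.
        for (int pk : usedKeys.tailSet(100001)) {
            if (!usedKeys.contains(pk + 1)) {
                return pk + 1; // first hit in ascending order is the MIN
            }
        }
        return null; // no matching row, like MIN over an empty set
    }
}
```

Computing the key inside the INSERT narrows the window between "find" and "insert", but two concurrent statements can still pick the same gap, which is what the answer below addresses.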
DB2 for i allows generated keys, stored procedures, user defined functions - pretty much all of the things SQL Server can do. The exact implementation is different, but that's what manuals are for :-) Ask your admin what version of IBM i they're running, then hit up the Infocenter for specifics.
The constraining factor is that you can't alter the database design; you are stuck with apparently multiple processes trying to INSERT while backfilling 'holes' in the existing keyspace. That's a very tough nut to crack. Because you can't change the DB design, there's nothing to be done except to allow for and handle PK collisions. There's no SQL trick that'll help - the SQL way is to have the DB generate the PK, not the application.
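A minimal sketch of "allow for and handle PK collisions": pick a candidate key, try the INSERT, and on a duplicate key try again with a fresh candidate, up to a limit. `candidateKeys` and `tryInsert` are hypothetical hooks, not an existing API. With JDBC, `tryInsert` would execute the INSERT and return false when the driver reports an integrity constraint violation (SQLSTATE class "23", available via `SQLException.getSQLState()`); check what your DB2 and SQL Server drivers actually report.

```java
import java.util.function.IntPredicate;
import java.util.function.IntSupplier;

// Retry-on-collision loop. Both functional parameters are illustrative hooks:
//   candidateKeys - e.g. runs the "next free key" query
//   tryInsert     - runs the INSERT, returns false on a duplicate-key error
final class CollisionRetry {
    static int insertWithRetry(IntSupplier candidateKeys, IntPredicate tryInsert, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            int key = candidateKeys.getAsInt();
            if (tryInsert.test(key)) {
                return key; // insert succeeded; this is the PK to show the user
            }
            // Another transaction took the key in the meantime: pick a new candidate.
        }
        throw new IllegalStateException("no free key after " + maxAttempts + " attempts");
    }
}
```

Since the key is chosen by the application, the method already knows the PK when the insert succeeds, which sidesteps the question of reading it back from the INSERT.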
There are several alternatives to suggest, in the event that some change is allowed. All have issues needing a workaround, but that is unavoidable at this point due to the application design.
Create a UDF that all INSERT clients use to retrieve the next available PK. Use a table of 'available numbers' and delete them as they are issued.
Pre-INSERT all the available numbers. Force clients to do an UPDATE. Make them FETCH...FOR UPDATE where (rest of data = not populated). This will lock the row, avoiding collisions as well as make the PK immediately available.
Leave the DB and the other application programs using this table as-is, but have your INSERT process draw from a block of keys that's been set aside for your use. Keep the next available number in an SQL SEQUENCE or an IBM i data area. This only works if there's a very large hole in the keyspace that's not yet used.
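The third alternative can be sketched in Java as follows, assuming a block of keys (the bounds passed in are placeholders) has been reserved for this process. Here an in-memory `AtomicInteger` stands in for the SQL SEQUENCE or data area purely for illustration; in production that counter has to live in the database so it survives restarts and is shared by every job using the block.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hands out keys only from a reserved block [firstKeyInBlock, lastKeyInBlock].
// The AtomicInteger is an in-memory stand-in for the SEQUENCE / data area.
final class ReservedBlockKeys {
    private final int lastKeyInBlock;
    private final AtomicInteger next;

    ReservedBlockKeys(int firstKeyInBlock, int lastKeyInBlock) {
        this.lastKeyInBlock = lastKeyInBlock;
        this.next = new AtomicInteger(firstKeyInBlock);
    }

    // Returns the next key as the String the PK column expects.
    String nextKey() {
        int key = next.getAndIncrement();
        if (key > lastKeyInBlock) {
            throw new IllegalStateException("reserved key block exhausted");
        }
        return Integer.toString(key);
    }
}
```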

JPA insert transaction concurrency

I have more of theoretical question:
When does data get inserted into the database: after persist() or after commit() is called? I ask because I have a problem with unique keys (generated manually): they get duplicated. I think this is due to multiple users inserting data into the same table simultaneously.
UPDATE 1:
I generate keys in my application. Keys example: '123456789123','123456789124','123456789125'...
The key field is of varchar type because there are a lot of old keys (I can't delete or change them) like 'VP123456', 'VP15S3456'. Another problem is that after being inserted into one database, these keys have to be inserted into another database. And I don't know what DB sequences and atomic objects are.
UPDATE 2:
These keys are used in finance documents and not as database keys. So they must be unique, but they are not used anywhere in programming as object keys.
I would suggest you create a Singleton that takes care of generating your keys. Make sure you can only get a new id once the singleton has initialized with the latest value from the database.
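A minimal sketch of such a singleton, assuming this application is the only process that issues these keys. `loadLatestKey` is a hypothetical hook that would read the highest existing key from the database at startup (e.g. via a `SELECT MAX(...)` over the numeric keys); the class name is illustrative. It does not address the two-database consistency discussed next.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Singleton key generator: refuses to hand out keys until it has been
// initialized with the latest value from the database.
final class SequentialKeyGenerator {
    private static volatile SequentialKeyGenerator instance;
    private final AtomicLong counter;

    private SequentialKeyGenerator(long latestKey) {
        this.counter = new AtomicLong(latestKey);
    }

    // Call once at startup; later calls are ignored.
    static synchronized void initialize(LongSupplier loadLatestKey) {
        if (instance == null) {
            instance = new SequentialKeyGenerator(loadLatestKey.getAsLong());
        }
    }

    static SequentialKeyGenerator get() {
        SequentialKeyGenerator current = instance;
        if (current == null) {
            throw new IllegalStateException("SequentialKeyGenerator not initialized");
        }
        return current;
    }

    // incrementAndGet() is atomic, so concurrent callers never receive the same key.
    String nextKey() {
        return Long.toString(counter.incrementAndGet());
    }
}
```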
To safeguard you from incomplete inserts into the two databases, I would suggest you try XA transactions. They allow all-or-nothing inserts and updates: if any of the operations on any of the databases fails, everything is rolled back. Of course, there is a downside to XA transactions: they are quite slow, and not all databases and database drivers support them.
How do you generate these keys? Have you tried using sequences in DB or atomic objects?
I'm asking because it is normal to populate DB concurrently.
EDIT1:
You can write a method that returns new keys based on an atomic counter; this way you know that any time you request a new key you receive a unique one. This strategy can (and will) lead to some keys being discarded, but that is a small price to pay, unless it is a requirement that the keys in the database are sequential.
import java.util.concurrent.atomic.AtomicLong;
class KeyGenerator {
    private AtomicLong counter; // initialized somewhere else, e.g. from the highest key already stored
    public String getKey() {
        // incrementAndGet() is atomic, so two concurrent callers never get the same value
        return "VP" + counter.incrementAndGet();
    }
}
And here's some help on DB Sequences in Oracle, MySql, etc.

What is the difference between: sequence id using JPA @TableGenerator, @GeneratedValue vs database Auto_Increment

Q1.: What is the difference between applying sequence Id in a database using
A.
CREATE TABLE Person
(
id BIGINT NOT NULL AUTO_INCREMENT,
...
PRIMARY KEY (id)
)
versus
B.
@Entity
public class Person {
@Id
@TableGenerator(name="TABLE_GEN", table="SEQUENCE_TABLE", pkColumnName="SEQ_NAME",
valueColumnName="SEQ_COUNT", pkColumnValue="PERSON_SEQ")
@GeneratedValue(strategy=GenerationType.TABLE, generator="TABLE_GEN")
private long id;
...
}
My system is highly concurrent. Since my DB is a Microsoft SQL Server, I do not think it supports @SequenceGenerator, so I have to stay with @TableGenerator, which is prone to concurrency issues.
Q2. This link here (http://en.wikibooks.org/wiki/Java_Persistence/Identity_and_Sequencing#Advanced_Sequencing) suggests that B might suffer from concurrency issues, but I do not understand the proposed solution. I would greatly appreciate it if someone could explain to me how to avoid concurrency issues with B. Here is a snippet of their solution:
If a large sequence pre-allocation size is used this becomes less of an issue, because the sequence table is rarely accessed.
Q2.1: How much allocation size are we talking about here? Should I do allocationSize=10 or allocationSize=100?
Some JPA providers use a separate (non-JTA) connection to allocate the sequence ids in, avoiding or limiting this issue. In this case, if you use a JTA data-source connection, it is important to also include a non-JTA data-source connection in your persistence.xml.
Q2.2: I use EclipseLink as my provider; do I have to do what it suggests above?
Q3. If B suffers from concurrency issues, does A suffer the same?
With a TableGenerator, the next id value is looked up and maintained in a table, essentially by JPA rather than by your database. This may lead to concurrency issues when multiple threads access your database and try to figure out what the next value for the id field should be.
The auto_increment type makes your database take care of the next id of your table, i.e. it is determined automatically by the database server when running the INSERT, which is certainly concurrency safe.
Update:
Is there something that keeps you away from using GenerationType.AUTO?
GenerationType.AUTO selects an appropriate way to retrieve the id for your entity, so in the best case it uses the built-in functionality. However, you need to check the generated SQL to see exactly what happens there; as MSSQL does not offer sequences, I assume it would use GenerationType.IDENTITY.
As mentioned, the auto_increment column takes care of assigning the next id value, i.e. there is no concurrency issue there, even with multiple threads hitting the database in parallel. The challenge is to make JPA use this feature.
A: uses IDENTITY id generation, @GeneratedValue(strategy=GenerationType.IDENTITY)
B: uses TABLE id generation
JPA supports three types, IDENTITY, SEQUENCE and TABLE.
There are trade-offs with both.
IDENTITY does not allow preallocation, so requires an extra SELECT after every INSERT, prevents batch writing, and requires a flush to access the id which may lead to poor concurrency.
TABLE allows preallocation, but can have concurrency issues with locks on the sequence table.
Technically SEQUENCE id generation is the best, but not all databases support it.
With TABLE sequencing, if you use a preallocation size of 100, then only every 100th insert will lock the row in the sequence table, so as long as you don't commonly have 100 inserts at the same time, you will not suffer any loss in concurrency. If your application does a lot of inserts, maybe use 1000 or a larger value.
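To make that arithmetic concrete, here is a simplified in-memory simulation of TABLE sequencing with preallocation, which `@TableGenerator` controls through its `allocationSize` attribute. It is not EclipseLink's actual implementation; the `seqCountRow` field merely stands in for the SEQ_COUNT row, and each counted "table access" marks where a real provider would lock and update that row.

```java
// Simplified model of TABLE sequencing with preallocation.
final class PreallocatingTableSequence {
    private final int allocationSize;
    private long seqCountRow = 0; // value stored in SEQUENCE_TABLE
    private long nextId = 1;      // next id handed out from the current block
    private long blockEnd = 0;    // last id of the current block (inclusive)
    private int tableAccesses = 0;

    PreallocatingTableSequence(int allocationSize) {
        this.allocationSize = allocationSize;
    }

    synchronized long nextId() {
        if (nextId > blockEnd) {
            // Block used up: one locked UPDATE of the sequence row reserves the next block.
            tableAccesses++;
            nextId = seqCountRow + 1;
            seqCountRow += allocationSize;
            blockEnd = seqCountRow;
        }
        return nextId++;
    }

    synchronized int tableAccesses() {
        return tableAccesses;
    }
}
```

Drawing 1000 ids touches the sequence row 10 times with an allocation size of 100, versus 1000 times with an allocation size of 1.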
EclipseLink will use a separate transaction for TABLE sequencing, so any concurrency issue with locks to the sequence table will be reduced. If you are using JTA, then you need to specify a non-jta-datasource to do this and configure a sequence-connection-pool in your persistence.xml properties.

Order of rows in Table

I have the following problem: I'm loading a "dif"-table with only deletes and inserts. A "change" is determined by a delete followed by an insert.
This table has no primary key and no field to order by. I want to use Hibernate to load this table. (SELECT obj FROM MyDifTable obj WHERE obj.group = groupId)
Right now, I'm using a key over all values and the above query and I'm not getting the rows in the same order as in the source table.
My questions are:
How do I let Hibernate always return the same order as in the table?
How to specify a primary key in this case?
I know this is such a badly designed thing and I've tried to argument with the design team but without success.
Best Regards,
Kai
Relational databases are based upon a branch of Set theory. Information is either in the database (in the Set) or absent. Order is not a concept in sets, only presence is a concept in sets. This means that you must model ordering explicitly if you need it.
If Order must be maintained, you must add an "order by" clause to sort the resulting members of the set. Sometimes that sort is external (so you can delay decision of the ordering until query submission time). On very rare occasions it is important to have that order internalized in the "data set".
On the odd case you need an internalized ordering, you must do so by storing the order along with the data in the set; but, to enforce the presence of the "order by" clause (query writers might omit it), you might also have to put a few stored procedures / views of the data to ensure that the query always returns with the correct ordering.
Examples of internalized ordering are common when storing a list in a database. One typically stores each list item along with its index in the list in the same row. That way, when it is retrieved, you can restore both the items and their order.
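A minimal Java sketch of that idea (class and field names are illustrative): each row stores the item together with its list index, and sorting by that index restores the list whatever order the rows come back in. On the JPA side, the `@OrderColumn` annotation (JPA 2.0) maps a `List` this way for you.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// "Internalized ordering": the position in the list is stored with each row.
final class IndexedRows {
    static final class Row {
        final int listIndex;
        final String item;

        Row(int listIndex, String item) {
            this.listIndex = listIndex;
            this.item = item;
        }
    }

    static List<Row> toRows(List<String> items) {
        List<Row> rows = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            rows.add(new Row(i, items.get(i)));
        }
        return rows;
    }

    // Client-side equivalent of "ORDER BY list_index".
    static List<String> restore(List<Row> rowsInAnyOrder) {
        List<Row> sorted = new ArrayList<>(rowsInAnyOrder);
        sorted.sort(Comparator.comparingInt((Row r) -> r.listIndex));
        List<String> items = new ArrayList<>();
        for (Row r : sorted) {
            items.add(r.item);
        }
        return items;
    }
}
```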
