Query multiple tables modified inside transaction

Query multiple tables modified inside transaction - java

I have two tables A and B. My application continuously executes transactions that consist of:
Insert rows in table B.
Update a row in table A.
(The two steps belong to the same transaction to keep table A and B mutually consistent.)
At any time t, I need a way to get a snapshot of the tables. More particularly, at any time t, I need the value of a particular row in table B, and I need the rows inserted in table A during the transaction that last updated the row of table B.
For example, at time t0, my tables have the following content:
Table A => (rowA1)
Table B => (rowB11, rowB12)
The rows rowB11 and rowB12 have been inserted inside the transaction that updated the row in table A to the state rowA1.
At time t1, the transaction is executed again, and my tables have now the following content:
Table A => (rowA2)
Table B => (rowB11, rowB12, rowB21, rowB22)
The rows rowB21 and rowB22 have been inserted inside the transaction that moved the row in table A from state rowA1 to state rowA2.
Now, at any time t, I would like to select the row in table A (i.e. now it's rowA2) and also to select the rows that have been inserted to reach state rowA2 (i.e. rowB21 and rowB22). What I don't want, is to select the row in table A (i.e. rowA2) and to get rows rowB31 and rowB32 from table B since the state I got from table A doesn't match these inserted rows (that just have been inserted during a still running transaction).
I hope my question is clear enough.
I precise I'm using MySQL and I manage my transactions using Spring.
Thanks,
Mickael
EDIT:
Finally, simply using transactions with a transaction level that is at least READ_COMMITTED is not enough. If between the the two SELECTs (the one to get current state of a row in table A and the one to get the rows associated to this state in table B), one or more other transactions are executed (i.e. one or more executions of steps 1-2), the rows fetched from table B will not correspond to the state of the row previously fetched from table A.

Add a column in B, that allows you to match rows in B with a specific status in A:
Time t0:
Table A => (rowA1)
Table B => (rowB11, rowA1), (rowB12, rowA1)
Time t1:
Table A => (rowA2)
Table B => (rowB11, null), (rowB12, null), (rowB21, rowA2), (rowB22, rowA2)
At t1, the rows in B you want are something like SELECT * FROM B WHERE ref_to_A = [current_value_in_A].
It appears that your question was related to transactions isolation, after all. So here we go:
Anything that happens during a transaction (unless isolation level is READ_UNCOMMITTED), i.e. between BEGIN and COMMIT (or ROLLBACK), is invisible to concurrent transactions.

Related

Best way (or ways) to insert multiple cross-reference records (using Java and SQLite)?

---- Note: working with JavaFX and SQLite ----
Ok guys, first let me illustrate the scenario:
So A has a one-to-many relation with B and C, and the last two have a many-to-many relation between them, as described above.
The sequence of record insertion is such that records in table A are always created first, and when a record in table B is created, at least one record exists in table A.
Similarly, when a record in C is created, there will be at least one record in A and one in B.
Here's the question: it's possible that a record being created in C, references multiple records in B, so there will be multiple inserts in table B_C_XRef (dashed circle in picture above), at the same time.
In other words, this is what's happening when a record is created in table C:
Insert record in C with foreign key from A
Insert as many records in B_C_XRef as there are references to B
As an example, let's imagine that "C" is an "Orders" table, and "B" is a "Products" table. That way since an order can contain multiple products, how would I add all those records to B_C_XRef representing each order-product relationship?
Thank you in advance.

Small table to Big table update

Problem Details:-
Table 1:- Product_Country A
Description:- Contains Product ID and other details along with Country Code.
Table 2:- Product_Outlet B
Description:- Contains Product ID and other details along with Outlet Code
A country can have many outlets, suppose country Australia has 50 outlets.
So suppose if i update any details of a product in table A, i want to update the same detail in table B based on some if else conditions.
Points to consider:-
1.) Table B is having 50 times more data than table A.
2.) There is a Java application through which we update table A.
3.) There are some rules to update the details in table B, some if else conditions based on which we update and create records in it.
Current Solution:-
There is a trigger which puts entry in a Temp table while updating/inserting A,
from where a PL/SQL job scheduled twice a day, picks up the data and update/insert in the table B.
This solution was considered, because updating the B table right after A table will consume much time and the application will be unresponsive.
Solutions Considered but rejected:-
1.) Updating table B right after A table, will consume much time.
2.) Increasing frequency of scheduled job, would hog the Database.
More Solution Proposals??????

A solution would be to have a "product" table, with references from table A and table B.
So if you update product for country in the A set, it's instantly updated for occurrences in the B set as well.
This means review your data model: a basic rule is you should not have replicated information in your database.

SymmetricDS - Can't synchronize row with foreign key

I have two tables, let call it table A and B. Table A has a foreign key of table B. My system creates first a row in table B, and in another screen user can create a row in table A that is related to the created row in table B.
These two rows need be send to a specific SymmetricDS client, to do this I'm using a subselect router for each table. The problem is: the row created in table B only know where it need go when the row in table A is created. When it happens, SymmetricDS already evaluated the subselect router of table B and considered the batch as unrouted. As the row of table B was not routed, the client can't create the row in table A due a foreign key error.
Is there a way I can guarantee that the two rows will synchronize together?

yes there is. use trigger customization. you'll have to wait until the version 3.7 is released or take the latest version of the source, apply the patch http://www.symmetricds.org/issues/view.php?id=1570 and declare before trigger customization for the table A which will update the row with the foreign key in the table B and have it being routed to the target before the row in table A

How can I avoid changes in DB while making a set of queries?

In my web application (Java + Spring + JPA) I have a method executing a set of related queries. In particular I need to know the total row count of a table and the row count of the result set for a certain query.
Obviously between these two queries changes can happen in my table: a new row added, row removed, field value changed, etc.
Table has millions of rows, so it's impossible to load the whole table in memory and make filtering in application context.
So I need to find a way to execute a set of queries maintaining the same "state" for the table (some kind of snapshot).
Is it sufficient to execute queries inside the same transaction, or is there some other approach?
UPDATE
The method is used for table pagination. I need to show n rows (PAGE) taken from m pages (SEARCH) filtered from a a total of t existing rows (TOTAL).
So basically I need to extract n records and to provide two numeric info: filtered rows count and total rows count.
I can execute SELECT count(*) from table, then SELECT count(*) from table where <search criteria> and then SELECT * from table where <search criteria> limit <n>, but I must be sure that no change appens in between...
I'm using MySQL 5

My tests in MySQL 5.5.28 show that you could rely on the fact simply being in a transaction won't count any new inserted/deleted/changed rows (being in a REPEATABLE READ transaction isolation level). This means, that COUNT() is subject to the transaction isolation levels. According to the documentation this mode is exactly what you want: a snapshot-based read based on the first read.

Concurrent process inserting data in database

Consider following schema in postgres database.
CREATE TABLE employee
(
id_employee serial NOT NULL PrimarKey,
tx_email_address text NOT NULL Unique,
tx_passwd character varying(256)
)
I have a java class which does following
conn.setAutoComit(false);
ResultSet rs = stmt.("select * from employee where tx_email_address = 'test1'");
if (!rs.next()) {
Insert Into employee Values ('test1', 'test1');
}
ResultSet rs = stmt.("select * from employee where tx_email_address = 'test2'");
if (!rs.next()) {
Insert Into employee Values ('test2', 'test2');
}
ResultSet rs = stmt.("select * from employee where tx_email_address = 'test3'");
if (!rs.next()) {
Insert Into employee Values ('test3', 'test3');
}
ResultSet rs = stmt.("select * from employee where tx_email_address = 'test4'");
if (!rs.next()) {
Insert Into employee Values ('test4', 'test4');
}
conn.commit();
conn.setAutoComit(true);
The problem here is if there are two or more concurrent instance of the above mentioned transaction trying to write data. Only one transaction would eventually succeeds and rest would throw SQLException "unique key constraint violation". How do we get around this.
PS: I have chosen only one table and simple insert queries to demonstrate the problem. My application is java based application whose sole purpose is to write data to the target database. and there can be concurrent process doing so and there is very high probability that some process might be trying to write in same data(as shown in example above).

The simplest way would seem to be to use the transaction isolation level 'serializable', which prevents phantom reads (other people inserting data which would satisfy a previous SELECT during your transaction).
if (!conn.getMetaData().supportsTransactionIsolationLevel(Connection.TRANSACTION_SERIALIZABLE)) {
// OK, you're hosed. Hope for your sake your drivers supports this isolation level
}
conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
There are also techniques like Oracle's "MERGE" statement -- a single statement which does 'insert or update', depending on whether the data's there. I don't know if Postgres has an equivalent, but there are techniques to 'fake it' -- see e.g.
How to write INSERT IF NOT EXISTS queries in standard SQL.

I would first try to design the data flow in a way that only one transaction will ever get one instance of the data. In that scenario the "unique key constraint violation" should never happen and therefore indicate a real problem.
Failing that, I would catch and ignore the "unique key constraint violation" after each insert. Of course, logging that it happened might be a good idea still.
If both approaches were not feasible for some reason, then I would most probably create a transit table of the same structure as "employee", but without primary key constraint and with a "transit status" field. No "unique key constraint violation" would ever happen on the insert into this transit table.
A job would be needed, that reads out this transit table and transfers the data into the "employee" table. This job would utilize the "transit status" to keep track of processed rows. I would let the job do different things each run:
execute an update statement on the transit table to set the "transit status" to "work in progress" for a number of rows. How large that number is or if all currently new rows get marked would need some thinking over.
execute an update statement that sets "transit status" to "duplicate" for all rows whose data is already in the "employee" table and whose "transit status" is not in ("duplicate", "processed")
repeat as long as there are rows in the transit table with "transit status" = "work in progress":
select a row from the transit table with "transit status" = "work in progress".
Insert that rows data into the "employee" table.
Set this rows "transit status" to "processed".
update all rows in the transit table with the same data as the currently processed row and "transit status" = "work in progress" to "transit status" = "duplicate".
I would most probably want another job to regularly delete the rows with "transit status" in ("duplicate", "processed")
If postgres does not know database jobs, an os side job would do.

A solution is to use a table level exclusive lock, locking for write while allowing concurrent reads, using the command LOCK.
Pseudo-sql-code:
select * from employee where tx_email_address = 'test1';
if not exists
lock table employee in exclusive mode;
select * from employee where tx_email_address = 'test1';
if still not exists //may be inserted before lock
insert into employee values ('test1', 'test1');
commit; //releases exclusive lock
Note that using this method will block all other writes until the lock is released, lowering throughput.
If all inserts are dependent on a parent row, then a better approach is to lock only the parent row, serializing child inserts, instead of locking the whole table.

You could expose a public method that queues the write operations and handles queue concurrency, then create another method to run on a different thread (or another process entirely) that actually performs the writes serially.

You could add concurrency control at the application level this by making the code a critical section:
synchronized(lock) {
// Code to perform selects / inserts within database transaction.
}
This way one thread is prevented from querying the table while the other is querying and inserting into the table. When the first thread completes, the second thread enters the synchronized block. However, at this point each select attempt will return data and hence the thread will not attempt to insert data.
EDIT:
In cases where you have multiple processes inserting into the same table you could consider taking out a table lock when performing the transaction to prevent other transactions from commencing. This is effectively doing the same as the code above (i.e. serializing the two transactions) but at the database level. Obviously there are potential performance implications in doing this.

One way to solve this particular problem is by ensuring that each of the individual threads/instances process rows in a mutually exclusive manner. In other words if instance 1 processes rows where tx_email_address='test1' then no other instance should process these rows again.
This can be achieved by generating a unique server id on instance startup and marking the rows to be processed with this server id. The way to do it is by -
<LOOP>
adding 2 columns status and server_id to employee table.
update employee set status='In Progress', server_id='<unique_id_for_instance>' where status='Uninitialized' and rownum<2
commit
select * from employee where server_id='<unique_id_for_instance>' and status='In Progress'
process the rows selected in step 4.
<END LOOP>
Following the above sequence of steps ensures that all the VM instances get different rows to process and there is no deadlock. It is necessary to have update before select to make the operation atomic. Doing it the other way round can lead to concurrency issues.
Hope this helps

An often used system is to have a primary key that is a UUID ( Unique Universal ID ) and a UUIDGenerator,
see http://jug.safehaus.org/ or similar things google has lots of answers
This will prevent the Unique Key constraint to happen
But that offcourse is only a part of your problem, you tx_email_address would still have to be unique and nothing solves that.
There is no way to prevent the constraint violation to happen, as long as you have concurrency you will run into it, and in itself this really is no problem.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.