Small table to Big table update - Java

Problem Details:-
Table 1:- Product_Country A
Description:- Contains Product ID and other details along with Country Code.
Table 2:- Product_Outlet B
Description:- Contains Product ID and other details along with Outlet Code
A country can have many outlets; for example, Australia might have 50 outlets.
So if I update any detail of a product in table A, I want to update the same detail in table B, based on some if-else conditions.
Points to consider:-
1.) Table B holds roughly 50 times more data than table A.
2.) There is a Java application through which we update table A.
3.) There are rules for updating the details in table B: if-else conditions based on which we update and create records in it.
Current Solution:-
There is a trigger that puts an entry into a temp table whenever table A is inserted into or updated; a PL/SQL job scheduled twice a day then picks up that data and updates/inserts into table B.
This solution was chosen because updating table B right after table A would take too long and leave the application unresponsive.
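For reference, a minimal sketch of what such a trigger-plus-staging-table setup typically looks like in Oracle; the table and column names here are assumptions, not the actual schema.

-- Assumed staging table that the scheduled PL/SQL job reads from.
CREATE TABLE product_country_stage (
    product_id    NUMBER,
    country_code  VARCHAR2(3),
    changed_on    DATE DEFAULT SYSDATE
);

-- Queue every insert/update on table A for the twice-daily sync job.
CREATE OR REPLACE TRIGGER trg_product_country_sync
AFTER INSERT OR UPDATE ON product_country
FOR EACH ROW
BEGIN
    INSERT INTO product_country_stage (product_id, country_code)
    VALUES (:NEW.product_id, :NEW.country_code);
END;
/

The PL/SQL job then reads product_country_stage, applies the if-else rules, and updates/inserts into table B in bulk.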
Solutions Considered but rejected:-
1.) Updating table B right after table A would take too much time.
2.) Increasing the frequency of the scheduled job would hog the database.
Any more solution proposals?

A solution would be to have a "product" table, referenced by both table A and table B.
Then if you update a product for a country in the A set, the change is instantly reflected for its occurrences in the B set as well.
This means reviewing your data model: a basic rule is that you should not have replicated information in your database.
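A minimal sketch of that normalization, with assumed table and column names: the shared product details live exactly once in a product table, and tables A and B only carry a foreign key to it.

-- Shared product details stored once.
CREATE TABLE product (
    product_id   NUMBER PRIMARY KEY,
    description  VARCHAR2(200),
    price        NUMBER
);

-- Table A: one row per product/country, no duplicated product details.
CREATE TABLE product_country (
    product_id    NUMBER REFERENCES product (product_id),
    country_code  VARCHAR2(3),
    PRIMARY KEY (product_id, country_code)
);

-- Table B: one row per product/outlet, no duplicated product details.
CREATE TABLE product_outlet (
    product_id   NUMBER REFERENCES product (product_id),
    outlet_code  VARCHAR2(10),
    PRIMARY KEY (product_id, outlet_code)
);

With this layout, updating a shared product detail is a single UPDATE on the product table, and no synchronization job is needed for those attributes.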

Best way (or ways) to insert multiple cross-reference records (using Java and SQLite)?

---- Note: working with JavaFX and SQLite ----
OK, first let me illustrate the scenario:
Table A has a one-to-many relation with tables B and C, and those last two have a many-to-many relation between them, resolved through a cross-reference table B_C_XRef.
The sequence of record insertion is such that records in table A are always created first, and when a record in table B is created, at least one record exists in table A.
Similarly, when a record in C is created, there will be at least one record in A and one in B.
Here's the question: it's possible that a record being created in C references multiple records in B, so there will be multiple inserts into the cross-reference table B_C_XRef at the same time.
In other words, this is what's happening when a record is created in table C:
Insert record in C with foreign key from A
Insert as many records in B_C_XRef as there are references to B
As an example, let's imagine that "C" is an "Orders" table and "B" is a "Products" table. Since an order can contain multiple products, how would I add all the records to B_C_XRef representing each order-product relationship?
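A sketch of that sequence in plain SQLite SQL (the id values and column names beyond the table names above are placeholders; in a Java application the new order id would normally come from last_insert_rowid() or JDBC's Statement.getGeneratedKeys() rather than being hard-coded):

BEGIN TRANSACTION;

-- 1. Insert the new order (C), referencing its parent row in A.
INSERT INTO C (c_id, a_id) VALUES (100, 1);

-- 2. Insert one cross-reference row per product (B) the order contains.
INSERT INTO B_C_XRef (c_id, b_id) VALUES (100, 10), (100, 11), (100, 12);

COMMIT;

Wrapping all the inserts in one transaction keeps the order and its cross-reference rows atomic: either they all appear, or none do.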
Thank you in advance.

Hive: Best way to do incremental updates on a main table

So I have a main table in Hive; it will store all my data.
I want to be able to load an incremental data update about every month, with a large amount of data (a couple of billion rows). There will be new data as well as updated entries.
What is the best way to approach this? I know Hive was recently upgraded and now supports update/insert/delete.
What I've been thinking is to somehow find the entries that will be updated, remove them from the main table, and then just insert the new incremental update. However, after trying this, the inserts are very fast but the deletes are very slow.
The other way is to use the update statement to match the key values between the main table and the incremental update and update their fields. I haven't tried this yet, and it also sounds painfully slow, since Hive would have to update each entry one by one.
Anyone got any ideas as to how to do this most efficiently and effectively?
I'm pretty new to Hive and databases in general.
If MERGE in ACID mode is not applicable, then it's possible to update using a FULL OUTER JOIN or using UNION ALL + row_number().
To find all entries that will be updated, you can join the increment data with the old data:
INSERT OVERWRITE TABLE target_data [PARTITION() if applicable]
SELECT
    -- take the new value if the key exists in the increment, otherwise keep the old one
    CASE WHEN i.PK IS NOT NULL THEN i.PK    ELSE t.PK    END AS PK,
    CASE WHEN i.PK IS NOT NULL THEN i.COL1  ELSE t.COL1  END AS COL1,
    ...
    CASE WHEN i.PK IS NOT NULL THEN i.COL_n ELSE t.COL_n END AS COL_n
FROM
    target_data t  -- restrict partitions if applicable
    FULL JOIN increment_data i ON (t.PK = i.PK);
It's possible to optimize this by restricting the partitions in target_data that will be overwritten and joined, for example with WHERE partition_col IN (SELECT DISTINCT partition_col FROM increment_data), or by passing the partition list as a parameter and using it in the WHERE clause; that will work even faster.
Also, if you want to update all columns with new data, you can apply this solution with UNION ALL + row_number(); it works faster than a full join: https://stackoverflow.com/a/44755825/2700344
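For completeness, a rough sketch of the UNION ALL + row_number() variant, reusing the same hypothetical target_data/increment_data tables and column names as above: new rows get priority 1, old rows priority 2, and row_number() keeps exactly one row per key, preferring the increment.

INSERT OVERWRITE TABLE target_data
SELECT PK, COL1, COL_n
FROM
(
    SELECT s.*,
           ROW_NUMBER() OVER (PARTITION BY PK ORDER BY priority) AS rn
    FROM
    (
        SELECT PK, COL1, COL_n, 1 AS priority FROM increment_data
        UNION ALL
        SELECT PK, COL1, COL_n, 2 AS priority FROM target_data
    ) s
) ranked
WHERE rn = 1;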
Here is my solution/workaround if you are using an old Hive version. It works better when you have so much data in the target table that you can't drop and recreate it with the full data every time.
Create one more table, say delete_keys. It will hold all the keys from the main table that are superseded, along with their surrogate keys.
While loading incremental data into the main table, do a left join with the main table. For all matching records we ideally should update the main table; instead, we take the keys (along with the surrogate key) from the main table for all matching records and insert them into the delete_keys table. Now we can insert all delta records into the main table as they are, irrespective of whether they are updates or inserts.
Create a view on the main table using the delete_keys table, so that rows whose keys appear in delete_keys are not fetched. This view is the final target table; it will not show the records from the main table that have been superseded by the latest records.
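A rough sketch of this workaround, assuming the delta table has the same layout as the main table and all table and column names are placeholders:

-- Step 1: capture the keys of main-table rows that are superseded by the delta.
INSERT INTO TABLE delete_keys
SELECT m.surrogate_key, m.business_key
FROM main_table m
JOIN delta_table d ON m.business_key = d.business_key;

-- Step 2: append all delta rows to the main table, updates and inserts alike.
INSERT INTO TABLE main_table
SELECT * FROM delta_table;

-- Step 3: the view hides the superseded rows and becomes the effective target table.
CREATE VIEW main_table_v AS
SELECT m.*
FROM main_table m
LEFT JOIN delete_keys k ON m.surrogate_key = k.surrogate_key
WHERE k.surrogate_key IS NULL;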

Query multiple tables modified inside transaction

I have two tables A and B. My application continuously executes transactions that consist of:
Insert rows in table B.
Update a row in table A.
(The two steps belong to the same transaction to keep table A and B mutually consistent.)
At any time t, I need a way to get a snapshot of the tables. More precisely, at any time t, I need the value of a particular row in table A, and I need the rows inserted in table B during the transaction that last updated that row of table A.
For example, at time t0, my tables have the following content:
Table A => (rowA1)
Table B => (rowB11, rowB12)
The rows rowB11 and rowB12 have been inserted inside the transaction that updated the row in table A to the state rowA1.
At time t1, the transaction is executed again, and my tables now have the following content:
Table A => (rowA2)
Table B => (rowB11, rowB12, rowB21, rowB22)
The rows rowB21 and rowB22 have been inserted inside the transaction that moved the row in table A from state rowA1 to state rowA2.
Now, at any time t, I would like to select the row in table A (i.e. now it's rowA2) and also to select the rows that were inserted to reach state rowA2 (i.e. rowB21 and rowB22). What I don't want is to select the row in table A (i.e. rowA2) and then get rows rowB31 and rowB32 from table B, rows that have just been inserted by a still-running transaction, since the state I got from table A doesn't match them.
I hope my question is clear enough.
To be precise, I'm using MySQL and I manage my transactions using Spring.
Thanks,
Mickael
EDIT:
Finally, simply using transactions with an isolation level of at least READ_COMMITTED is not enough. If, between the two SELECTs (the one that gets the current state of a row in table A and the one that gets the rows associated with this state in table B), one or more other transactions are executed (i.e. one or more executions of steps 1-2), then the rows fetched from table B will not correspond to the state of the row previously fetched from table A.
Add a column to B that allows you to match rows in B with a specific state of A:
Time t0:
Table A => (rowA1)
Table B => (rowB11, rowA1), (rowB12, rowA1)
Time t1:
Table A => (rowA2)
Table B => (rowB11, null), (rowB12, null), (rowB21, rowA2), (rowB22, rowA2)
At t1, the rows in B you want are something like SELECT * FROM B WHERE ref_to_A = [current_value_in_A].
It appears that your question was related to transaction isolation after all. So here we go:
Anything that happens during a transaction, i.e. between BEGIN and COMMIT (or ROLLBACK), is invisible to concurrent transactions (unless the isolation level is READ_UNCOMMITTED).
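A rough sketch of the two pieces combined in MySQL (table, column, and value names are illustrative): B gets a ref_to_A column, and both reads run in one transaction. Because the second SELECT filters on the exact state value read from A, rows inserted by later transactions cannot sneak into the result.

-- One-time schema change: each row in B records which state of A it belongs to.
ALTER TABLE B ADD COLUMN ref_to_A VARCHAR(50);

-- Snapshot read.
START TRANSACTION;
SELECT state FROM A WHERE id = 1;          -- suppose this returns 'rowA2'
SELECT * FROM B WHERE ref_to_A = 'rowA2';  -- only the rows inserted with that state
COMMIT;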

Many subscribers for many shops

I have many shops in a Shop table and many users in a User table in the DB.
A user can subscribe to any number of shops.
How should I create a table for this? A first column with the shop name and a second column holding a string of comma-separated user ids?
This is generally referred to as a many-to-many relationship. In a standard relational database, this is usually represented by a table with user_id and shop_id, one row for each user -> shop relationship.
You will never want to use a comma-separated string to represent multiple items in a database, as this will make queries very inefficient. For example, to find which stores user a subscribes to, you would have to parse each row of the table to find which of the comma-separated values contains a.
This is a many-to-many relationship. You need a third table, say it's called Subscription. The fields will be two foreign keys, one referencing the primary key of the User table and the other referencing the primary key of the Shop table.
You can create a table where each row represents a subscription (i.e., a pair (user,shop)).
So, you will have (at least) two columns in the table: the id of the shop and the id of the user.
Assuming that a user can subscribe to a shop only once, the pair (user_id,shop_id) should be the primary key of the table (in case you need to define a surrogate key for the table, you could add a unique constraint to the pair (user_id,shop_id)).
If you have a lot of data and need to quickly get all the shops to which a user has subscribed, you should add an index to the user_id column.
On the other hand, if you need to quickly get all the user that subscribed to a shop, you should add an index to the shop_id column.
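A minimal sketch of such a table in MySQL-style SQL, assuming User and Shop both have an integer id primary key:

CREATE TABLE subscription (
    user_id  INT NOT NULL,
    shop_id  INT NOT NULL,
    PRIMARY KEY (user_id, shop_id),   -- a user can subscribe to a given shop only once
    KEY idx_shop (shop_id),           -- fast lookup of all users subscribed to a shop
    FOREIGN KEY (user_id) REFERENCES User (id),
    FOREIGN KEY (shop_id) REFERENCES Shop (id)
);

The composite primary key's leading column is user_id, which serves "shops of a user" lookups, while idx_shop serves "users of a shop" lookups.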
I believe a user can subscribe to multiple shops, and one shop can have multiple users subscribed to it. In this case, it is best to use a mapping table with shop_id from the Shop table and user_id from the User table (keep the foreign key constraints). This table holds one entry per subscription, so no comma-separated string is necessary.
It sounds like you have a classic situation in databases, the one that marks the difference between a "flat file" database and a "relational" database like MySQL, Oracle or MS SQL Server. A similar situation is authors and books: an author can have more than one book, and some books have more than one author.
A simple comma-delimited database is a flat file, and you are going to need a separate line for each relationship. For example:
ShopA, UserA
ShopB, UserA
ShopA, UserB
ShopB, UserB
In a relational database, you can isolate the users and the shops in two separate tables and form a third table, a relation table that describes the relationship, and so remove redundancy from the user and shop tables:
ShopTable
ShopA, SA
ShopB, SB
UserTable
UserA, UA
UserB, UB
ShopUserRelationTable
SA, UA
SA, UB
SB, UA
SB, UB
When you want records displaying the relationship, you apply SQL to join the tables and produce that list of records.
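For example, a query along these lines (column names are assumed from the sample data above) rebuilds the flat list of shop/user pairs from the three normalized tables:

SELECT s.shop_name, u.user_name
FROM ShopUserRelationTable r
JOIN ShopTable s ON s.shop_id = r.shop_id
JOIN UserTable u ON u.user_id = r.user_id
ORDER BY s.shop_name, u.user_name;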
This basic problem of removing redundancy in tables is what motivates the development of relational database software.
It sometimes happens that a table is complicated enough to contain a many-to-many relationship like this, and SQL can be further applied to analyze and create records reflecting these relationships. Such a table may also serve as a many-to-many relation table for other tables.

ADF - Edit multiple rows of detail table simultaneously from master table

My question concerns the master-detail relationship in Oracle ADF development.
What I am trying to do is make a "batch" change to all rows of a particular detail table without having to change each row individually.
For example, I have a master table of countries and corresponding detail tables of all the cities within those countries. The cities table has a column for "dialing code". If I select "UK" from the countries table, I want to be able to populate the "dialing code" column of all the cities in the UK with the value "+44" without having to go through them one by one.
Is this possible? I'm assuming a Master-Detail relationship is perfect for this but if not, what else is out there?
Thanks in advance for your help.
You can write a service method on your master VO that iterates over all the detail rows of a master row and updates an attribute in each of them.
You can then expose that service method to the client and drag it onto the page you are building.
Some links:
http://download.oracle.com/otn_hosted_doc/jdeveloper/11gdemos/ADF_Insider_Essentials/IterateRows/IterateRows.html
https://blogs.oracle.com/shay/entry/am_service_method_-_simple_dem
