I am trying to detect changes to the rows of some SQL Server 2008 tables from a Java application.
I have investigated several approaches, such as a timestamp column, the Change Tracking mechanism, and Change Data Capture.
But all of these approaches need some customization in the database, as follows:
1. A timestamp column must be present in each table.
2. Change Tracking requires a primary key on each table.
3. Change Data Capture requires the creation of system tables and other customizations.
I need an approach that does not require such heavy customization, because the database is critical and altering its configuration is not allowed.
Can anyone help or suggest something in this regard?
The following changes can accomplish a data audit:
Create an identity column in all transaction tables.
Fetch this identity data to the front end along with the transaction data.
Create history tables for all transaction tables and move the original data there prior to every transaction, using a version ID.
After modification in the UI, pass the data back to the database and compare it with the existing information using a SQL MERGE statement to perform the update/insert/delete.
Compare the latest version available in the history table with the data in the current table using the following logic:
New data inserted - an identity key exists in the current table and is NOT present in the latest version data of the history table
WHERE C.IdentityColumn NOT IN (Select identitycolumn from History H)
Data deleted - an identity key exists in the latest version data of the history table and does NOT exist in the current table.
WHERE H.IdentityColumn NOT IN (Select identitycolumn from Current C)
Data updated - an identity key exists in both the current table and the latest version of the history table, and at least one column's data has been modified
WHERE (C.IdentityColumn = H.IdentityColumn)
AND
(
C.Col1 <> H.Col1
OR
C.Col2 <> H.Col2
OR
C.ColN <> H.ColN
)
C - Current table
H - History table
Using the above logic, the modified data can be tracked in a separate audit table with columns such as Record ID, Field Name, Old Value, New Value, Modification, Modified By, and Modified Date/Time, as sketched below.
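A minimal T-SQL sketch of the update branch, assuming hypothetical tables CurrentTxn, HistoryTxn (holding only the latest version) and AuditLog:

-- Hypothetical sketch: write one audit row per modified column.
-- CurrentTxn, HistoryTxn and AuditLog are placeholder names.
INSERT INTO AuditLog (RecordID, FieldName, OldValue, NewValue, Modification, ModifiedBy, ModifiedDate)
SELECT C.IdentityColumn, 'Col1', H.Col1, C.Col1, 'UPDATE', SUSER_SNAME(), GETDATE()
FROM CurrentTxn C
JOIN HistoryTxn H ON C.IdentityColumn = H.IdentityColumn
WHERE C.Col1 <> H.Col1;
-- Repeat per column (Col2 ... ColN), or UNPIVOT the columns first.
-- Inserts and deletes follow the NOT IN checks shown above.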
I have a requirement to fetch the records from Dataverse in which changes have been made to specific column values. For example, let's say we have a table named employee with a field called position, which can change over time from intern to software developer to development lead, etc. If we have 10 records and the position of one of the employees changes, I need only that one employee record. I have gone through Retrieve and detect changes to table definitions, but I believe that is related to changes in the schema, not to changes in the data. I am using Spring Boot with Java 11, and to work with Dataverse I am using the Olingo library; I may also use the Web APIs if required. Is there a way to detect changes in the data as described above?
EDIT
To add more details: we will have a scheduled job that triggers every X minutes and needs to fetch the employee records whose position has changed since the last fetch, X minutes ago. As we can see in the image below, all 3 records were updated in that X-minute interval and the last modified time has been updated for all of them. I need to fetch only the records highlighted in green, for which the position attribute has changed. The record with Id 2 should not be fetched, as its position is the same.
Solution 1: Custom changes table
If you are able and allowed to extend your current Dataverse environment:
Create a new table called Employee Change. Add a column of type Lookup named Employee and link it to your Employee table.
Modify the Main Form and add the Employee column to the form.
Create a workflow process that fires on field change. Inside the workflow process, create an Employee Change record and set the lookup column value to the changed record.
You can now query the Employee Change table for changed records. You will need to expand the lookup column to get the required columns from the Employee table.
Example Web API query:
GET [Organization URI]/api/data/v9.1/employee?$select=createdon, employeeid
&$expand=employeeid($select=employeeid,fullname,column2,column3) HTTP/1.1
Accept: application/json
OData-MaxVersion: 4.0
OData-Version: 4.0
More info on expanding lookup columns can be found in the docs.
Solution 2: Auditing
Use the built-in auditing feature:
Make sure auditing is enabled. Details can be found in the docs.
Enable auditing on the required columns in the Employee table.
Query the audit records for changes in the Employee table. You only have to pay attention to changes to the specific attributes of interest.
You will get a list of changed records, and then you have to query once again to retrieve the columns of those records.
Solution 3: Push instead of pull
It might make more sense to push changes from Dataverse to your API instead of constantly querying for changes.
You could use Microsoft Power Automate to create a simple flow that calls your API / platform when a change is detected in Dataverse.
A good start could be exploring the following Power Automate template: When a record is updated in Microsoft Dataverse, send an email. You could then replace the "send email" steps with querying other APIs.
I have a Java requirement where I have 1500 records that I have to update or insert into the database.
If a record exists with userId, then update it.
If a record does not exist with userId, then insert it.
And if there is an error in, let's say, the 10th record, I need to get the error code for that record.
It looks like I have 2 options using JPA 1.0:
A) Fire a select to check if the record exists. If yes, fire an update. If not, fire an insert.
B) Always fire an insert; only when I get a unique constraint exception, fire an update query.
Are there any other, more efficient ways? How can this be done with as few queries and as quickly as possible?
Environment: Java, JPA 1.0, DB2
You did not specify which version of DB2 you use, and on which system. Anyway, check whether the MERGE statement is available on your DB:
LUW from 9.5.0: http://www.ibm.com/support/knowledgecenter/SSEPGG_9.5.0/com.ibm.db2.luw.sql.ref.doc/doc/r0010873.html
z/OS from 10.0.0: http://www.ibm.com/support/knowledgecenter/SSEPEK_10.0.0/sqlref/src/tpc/db2z_sql_merge.html
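If MERGE is available, an upsert per record could look like the following minimal sketch (users, user_id and name are placeholder names); from JPA 1.0 you could fire it as a native query:

-- Hypothetical sketch: insert the row, or update it when user_id already exists.
MERGE INTO users AS t
USING (VALUES (?, ?)) AS s (user_id, name)
ON t.user_id = s.user_id
WHEN MATCHED THEN
    UPDATE SET t.name = s.name
WHEN NOT MATCHED THEN
    INSERT (user_id, name) VALUES (s.user_id, s.name);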
Another way is to do a delete + insert for every record (poor performance).
A third option is to build one dynamic DELETE statement with the IDs/keys of the data you are going to update listed in the WHERE clause, fire the delete, and then insert all the data, as sketched below.
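A minimal sketch of this third option, with the same placeholder names:

-- Hypothetical sketch: one dynamic DELETE for every key in the batch,
-- then plain inserts for all records (updated and new alike).
DELETE FROM users WHERE user_id IN (101, 102, 103);               -- key list built dynamically
INSERT INTO users (user_id, name) VALUES (101, 'a'), (102, 'b'), (103, 'c');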
The performance of each option will depend on the table specification, indexes, etc.
You can write the query in MySQL as below:
-- suppose a is the primary key
INSERT INTO table (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE c=c+1,b=b+1;
Here the update will run when a record with primary key a=1 is already present.
Refer to this link: http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
So I have a main table in Hive that will store all my data.
I want to be able to load an incremental data update about every month, with a large amount of data, a couple billion rows. There will be new data as well as updated entries.
What is the best way to approach this? I know Hive was recently upgraded and supports update/insert/delete.
What I've been thinking is to somehow find the entries that will be updated, remove them from the main table, and then just insert the new incremental update. However, after trying this, the inserts are very fast, but the deletes are very slow.
The other way is to do something using the UPDATE statement to match the key values from the main table and the incremental update and update their fields. I haven't tried this yet. This also sounds painfully slow, since Hive would have to update each entry one by one.
Anyone got any ideas as to how to do this most efficiently and effectively?
I'm pretty new to Hive and databases in general.
If MERGE in ACID mode is not applicable, then it's possible to update using a FULL OUTER JOIN or using UNION ALL + row_number().
To find all entries that will be updated, you can join the increment data with the old data:
insert overwrite table target_data [partition() if applicable]
SELECT
--select new if exists, old if not exists
case when i.PK is not null then i.PK else t.PK end as PK,
case when i.PK is not null then i.COL1 else t.COL1 end as COL1,
...
case when i.PK is not null then i.COL_n else t.COL_n end as COL_n
FROM
target_data t --restrict partitions if applicable
FULL JOIN increment_data i on (t.PK=i.PK);
It's possible to optimize this by restricting the partitions in target_data that will be overwritten and joined, using WHERE partition_col in (select distinct partition_col from increment_data), or, if possible, by passing the partition list as a parameter and using it in the WHERE clause; that will work even faster.
Also, if you want to update all columns with new data, you can apply this solution with UNION ALL + row_number(); it works faster than a full join: https://stackoverflow.com/a/44755825/2700344
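A minimal sketch of the UNION ALL + row_number() variant, reusing the same placeholder column names; rows coming from the increment win over old rows:

insert overwrite table target_data
select PK, COL1, COL_n            -- list all columns of target_data here
from
(
    select PK, COL1, COL_n,
           row_number() over (partition by PK order by priority) as rn
    from
    (
        select 1 as priority, PK, COL1, COL_n from increment_data
        union all
        select 2 as priority, PK, COL1, COL_n from target_data
    ) all_rows
) ranked
where rn = 1;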
Here is my solution/workaround if you are using an old Hive version. This works better when you have a large target table that you can't drop and recreate with full data every time.
Create one more table, say a delete_keys table. This will hold all the keys from the main table that are logically deleted, along with their surrogate keys.
While loading incremental data into the main table, do a left join with the main table. For all the matching records, we ideally should update the main table. But instead, we take the keys (along with the surrogate key) from the main table for all matching records and insert them into the delete_keys table. Now we can insert all delta records into the main table, irrespective of whether they are to be updated or inserted.
Create a view on the main table using the delete_keys table so that keys matching the delete_keys table are not fetched, as sketched below. This view will be the final target table. It will not show records from the main table that have been superseded by the latest records.
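A minimal sketch of that view, with main_table, delete_keys, key_col and surrogate_id as placeholder names:

-- Hypothetical sketch: expose only rows not superseded by a delta load.
create view main_table_final as
select m.*
from main_table m
left join delete_keys d
    on m.key_col = d.key_col
   and m.surrogate_id = d.surrogate_id
where d.key_col is null;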
I need to maintain two tables: one with the data and one with versions.
So each time a new record is inserted into or deleted from the table, I need to update the version in the other table.
How can I do this from Java/JPA?
The database is Postgres.
If Java-only, you do it by executing all the SQL statements necessary to accomplish the changes you want to both tables in the same database transaction.
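In SQL terms, assuming hypothetical tables data_table and versions, that amounts to:

BEGIN;
-- the data change and the version bump succeed or fail together
INSERT INTO data_table (id, payload) VALUES (1, 'new row');
UPDATE versions SET version = version + 1 WHERE table_name = 'data_table';
COMMIT;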
Alternatively, if you want changes to the main table to auto-replicate to the version table, create a trigger on the main table.
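A minimal Postgres sketch of such a trigger, again assuming hypothetical tables data_table and versions:

-- Hypothetical sketch: bump the version row whenever data_table changes.
CREATE OR REPLACE FUNCTION bump_version() RETURNS trigger AS $$
BEGIN
    UPDATE versions SET version = version + 1
    WHERE table_name = 'data_table';
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER data_table_version
AFTER INSERT OR DELETE ON data_table
FOR EACH ROW EXECUTE PROCEDURE bump_version();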
The situation is as follows:
I have two databases featuring an identical structure. On top of each of these databases runs an instance of the same app, using Hibernate for ORM. The two are completely independent.
Now I have to merge both applications into one. In some tables, adjustments need to be made to avoid violating unique key constraints.
Since both databases are identical in terms of structure and the same Hibernate mapping is used, is there a way to use Hibernate for the task? I'm thinking of loading an object from database A, modifying it in code, and simply saving it to a Session from a SessionFactory based on database B. I'm wondering whether Hibernate would be able to update the primary and foreign key values accordingly, and how difficult it would be to handle dependencies on objects that are not copied from database A (because they are no longer needed).
Any recommendations?
Isn't it easier to just do a database dump from database A and import it into database B? Or, as an alternative, use INSERT INTO B.table (col1, col2) SELECT col1, col3 FROM A.table?
If your databases are MySQL, you can use the MERGE storage engine. Here are the steps:
- In one of your databases, update all your ids via Hibernate using cascade all. All ids have to be incremented by the last id of your other database on each table (a raw-SQL sketch of this shift follows the list):
User1 (2000 rows, lastId: 2000) and User2 (3000 rows, lastId: 3000) -> User1 (2000 rows, lastId: 2000) and User2 (3000 rows, firstId: 2001, lastId: 5000)
- Create another database that merges all your databases.
- Extract a dump from your new database and load this dump into your final database -> http://dev.mysql.com/doc/refman/5.0/en/merge-storage-engine.html
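In raw SQL, the id shift in the first step looks something like this (second_db, users and orders are placeholder names, and 2000 is the last id of the other database):

-- Hypothetical sketch: shift all ids in the second database past the
-- last id of the first database, so merged keys cannot collide.
UPDATE second_db.users SET id = id + 2000;
UPDATE second_db.orders SET user_id = user_id + 2000;  -- shift every FK the same way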
This is one possible way :)
I know it is an old thread, but I had a similar problem.
I solved it by adding two date fields, included_date and changed_date, to my tables. I also added another field to save the date I last synced the databases somewhere else (I have a table with configuration info).
When my system connects to the server, I send the date of the last sync; then my routine can work out which rows have been included or changed since my last sync.
For every new row I set the included_date field, so when I sync I know which rows were created after my last sync and can do an INSERT. The same happens with row changes and the changed_date field, where I do an UPDATE.
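A minimal sketch of the sync query, assuming a placeholder table my_table and a :last_sync parameter holding the stored last-sync date:

-- Hypothetical sketch: rows created or changed since the last sync.
SELECT *
FROM my_table
WHERE included_date > :last_sync   -- created after last sync -> INSERT remotely
   OR changed_date  > :last_sync;  -- modified after last sync -> UPDATE remotely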