I have a web project that uses a database to store data that is used to generate tasks that would be processed for remote machines to alter that records and store new data. My problem here is that I have to store all that changes on each table but I don't need all these information. For example, a table A could have 5 fields but I only need 2 for historical purposes. Another table B could have 3 and I would have to add another one (date for example). Also, I don't need changes during daily task generation, only the most recent one.
Which is the best way to maintain a change history? Someone told me that a good idea is having two tables, the A (B) table and another one called A_history (B_history) with the needed fields. This is actually what I'm doing, using triggers to insert into history tables but I don't feel comfortable with this approach. My project uses Spring (Spring-data, Hibernate and JPA) and if I change the DB (currently MySQL) I'd have to migrate triggers. Is there a good way to manage history records? Tables could be generated with Hibernate/JPA annotations.
If I maintain the two tables approach, can I add a method to the repository to fetch rows from current table and history table at once?
For this pourpose there is a special Hibernate Envers project. See official documentation here. Just configure it, annotate necessary properties with #Audited annotation and that's all. No need for DB triggers.
One pitfall: if you want to have a record for each delete operation then you need to use Session.delete(entity) way instead of HQL "delete ...".
EDIT. Also take a look into native auditing support of spring data jpa.
I am not a database expert. What I have seen them do boils down to a few ways of approach.
1) They add a trigger to the transactional table that copies inserts and updates to a history table but not deletes. This means any queries that need to include history can be done from the history table since all the current info is there too.
a) They can tag each entry in the history table with time and date and
keep track of all the states of the original records.
b) They can only
keep track of the current state of the original record and then it
settles when the original is deleted.
2) They have a periodic task that goes around and copies data marked as deletable into the history table. It then deletes the data from the transactional table. Any queries in the transactional table have to make sure to ignore the deletable rows. Any queries that need history have to search both tables and merge the results.
3) If the volume of data isn't too large, they just leave everything in one table and mark some entries as historical. Queries have to ignore historical rows. Queries that include history are easy. This may slow down database access as the table grows to include many unused rows but that can sometimes be ameliorated by clever use of indexes.
Related
I'm currently sourcing some static data from a third party. It's a simple one-to-many, like this
garage:
id
name
desc
location
garage_price:
id
garage_id
price_type
price
Sometimes, the data is incorrect, and I will need to correct it. At the same time, I'd like to preserve the original sourced data somewhere and potentially run some queries to show the changes.
My question is whether someone is doing something like this with SQL, Java and Hibernate, and what's the approach you've taken, or would take.
I could add a boolean column, "original_data", to both tables, and before an update happens, run a trigger to copy the row from garage or garage_price into an "original_garage" or "original_price" table as long as original_data is true. Then set original_data to false, and all further updates will just happen on the garage/garage_price tables.
Anything wrong with that approach, and how do people typically work with multiple tables with the same data in Hibernate/JPA? Previously, I'd create a class that holds all the data, and subclass it twice, once per each table, while setting
#Inheritance(strategy=InheritanceType.TABLE_PER_CLASS)
on the parent.
As so often there are various options:
Use Hibernate Envers. It will keep a complete history of changes, so if you do multiple changes each will result in a row in the auditing tables. These tables are separate from your main data tables which might be a pro or a con, depending on your requirements.
Use the approach that you described: Write the original dataset, copy it before modifying it. You'll need two additional attributes:
A flag marking the original and a technical id do have a unique primary key.
Just as the second version, but you could actually do that in a trigger in the database. Which probably is faster, works no matter how the data gets inserted and to copy rows in the database is actually really easy, while it feels rather cumbersome in Java. Of course, writing triggers is considered a PITA in itself by many Java developers. If your application doesn't usually use triggers and stored procedures it is also really easy to forget about the trigger and being rather confused where these additional rows come from.
I have a DB table with many columns and associated Entities.
Update is supported on some of the columns. I need to maintain history of the data that's overwritten in update/delete in a separate table. Options that I have considered are below:
1. Hibernate-envers: Most easiest to use but issue with this is the insert in audit table are synchronous and also it becomes a part of actual transaction. Which is not a desired solution for my use-case.
2. Debezium: While it does make the audit insert asynchronous, but it looks like an overkill for my use-case as it includes installation of a lot of services like Kafka, zookeeper and there seem to be multiple points of failure.
3. JPA listeners: I can use these to get the data being updated/deleted and call an async insert in history table. Only issue I see here is I'll have to replicate actual entity classes code in the history entities.
Please suggest a solution I can go ahead with. Thanks.
I'm using spring-batch jobs to persist content of a large csv file to a database.
JpaItemWriter is used for persistence, which is fine so far.
But now I'd like first to check if an entity already exists in the database (by id - the id field in csv and in database are equal), and in case just update the entity instead.
How could this be done?
When I needed to do this, the best I came up with was having my custom FieldSetMapper (used by the FlatFileItemReader) load the item from the database (or create a new instance of it doesn't exist) and then setting the properties based on the input. Since JpaItemWriter uses .merge, it will write the entity by updating if it was loaded from the database and insert if it was a new entity.
I also needed to have it run with a batch size of 1, to ensure that if there were duplicates in my input (which I did have), it would actually go one row at a time and insert or update for each one and not try to insert them all at once causing key problems.
As you might imagine, all this worked a lot slower than I would have liked. It queries the database for each and every row, and then does the corresponding update or insert. But as for my case it was for a monthly overnight batch process, it was good enough for our needs, even if it took many hours to run.
Following situations:
I got two databases featuring an identical structure. On top of each of these databases runs an instance of the same app using Hibernate for ORM. The two are completely independent.
Now I have to merge both applications into one. In some tables, adjustments need to be made to avoid violating unique key constraints.
Since both databases are identical in terms of structure and the same Hibernate mapping is used, is there a way to use Hibernate for the task? I'm thinking of loading an Object from database A, modifying it in code and simply saving it to a Session from a SessionFactory based on database B. I'm wondering whether Hibernate would be able to update the primary and foreign key values accordingly and how difficult it would be to handle dependencies to objects that are not copied from the database A (because they are not needed any more).
Any recommendations?
isn't it easier to just do a database dump from database A and import it into database B? Or as an alternative use insert into B.table (col1,col2) values (select col1,col3 from A.table) ?
If your databases are MySQL, you use the MERGE storage engine. Here are the steps:
-In one of your databases, update all your id via Hibernate using the cascade all. All your id have to be increment by the last id of your other database on each table:
User1 (2000 rows, lastId: 2000) and User2 (3000 rows, lastId: 3000) -> User1 (2000 rows, lastId: 2000) and User2 (3000 rows, firstId:3000, lastId: 6000)
-Create an other database that merge all your databases
-Extract a dump from your new database and load this dump in your final database -> http://dev.mysql.com/doc/refman/5.0/en/merge-storage-engine.html
This is one possible way :)
I know it is an old thread, but I had a similar problem.
I solved including two date fields : included_date and changed_date to my tables, and also, I included another field to save the date I last sync the databases somewhere else (I have a table with configuration info).
When my system connects to the server I send the date from the last sync, then my routine can compare which rows hava been included or changed since my last sync.
Every new row I set the date into the included_date field, so when I sync I know which rows were created after my last sync, then I can do an INSERT. The same happens with row changes and the changed_date field, then I do an UPDATE.
I have an existing application that I am working w/ and the customer has defined the table structure they would like for an audit log. It has the following columns:
storeNo
timeChanged
user
tableChanged
fieldChanged
BeforeValue
AfterValue
Usually I just have simple audit columns on each table that provide a userChanged, and timeChanged value. The application that will be writing to these tables is a java application, and the calls are made via jdbc, on an oracle database. The question I have is what is the best way to get the before/after values. I hate to compare objects to see what changes were made to populate this table, this is not going to be efficient. If several columns change in one update, then this new table will have several entries. Or is there a way to do this in oracle? What have others done in the past to track not only changes but changed values?
This traditionally what oracle triggers are for. Each insert or update triggers a stored procedure which has access to the "before and after" data, which you can do with as you please, such as logging the old values to an audit table. It's transparent to the application.
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:59412348055
If you use Oracle 10g or later, you can use built in auditing functions. You paid good money for the license, might as well use it.
Read more at http://www.oracle.com/technology/pub/articles/10gdba/week10_10gdba.html
"the customer has defined the table structure they would like for an audit log"
Dread words.
Here is how you would implement such a thing:
create or replace trigger emp_bur before insert on emp for each row
begin
if :new.ename = :old.ename then
insert_audit_record('EMP', 'ENAME', :old.ename, :new.ename);
end if;
if :new.sal = :old.sal then
insert_audit_record('EMP', 'SAL', :old.sal, :new.sal);
end if;
if :new.deptno = :old.deptno then
insert_audit_record('EMP', 'DEPTNO', :old.deptno, :new.deptno);
end if;
end;
/
As you can see, it involves a lot of repetition, but that is easy enough to handle, with a code generator built over the data dictionary. But there are more serious problems with this approach.
It has a sizeable overhead: an
single update which touches ten
field will generate ten insert
statements.
The BeforeValue and AfterValue
columns become problematic when we
have to handle different datatypes -
even dates and timestamps become
interesting, let alone CLOBs.
It is hard to reconstruct the state
of a record at a point in time. We
need to start with the earliest
version of the record and apply the
subsequent changes incrementally.
It is not immediately obvious how
this approach would handle INSERT
and DELETE statements.
Now, none of those objections are a problem if the customer's underlying requirement is to monitor changes to a handful of sensitive columns: EMPLOYEES.SALARY, CREDIT_CARDS.LIMIT, etc. But if the requirement is to monitor changes to every table, a "whole record" approach is better: just insert a single audit record for each row affected by the DML.
I'll ditto on triggers.
If you have to do it at the application level, I don't see how it would be possible without going through these steps:
start a transaction
SELECT FOR UPDATE of the record to be changed
for each field to be changed, pick up the old value from the record and the new value from the program logic
for each field to be changed, write an audit record
update the record
end the transaction
If there's a lot of this, I think I would be creating an update-record function to do the compares, either at a generic level or a separate function for each table.