Merge two databases with identical structure and Hibernate mappings - java

The situation is as follows:
I have two databases with an identical structure. An instance of the same application runs on top of each database, using Hibernate for ORM. The two are completely independent.
Now I have to merge both applications into one. In some tables, adjustments need to be made to avoid violating unique key constraints.
Since both databases are identical in terms of structure and the same Hibernate mapping is used, is there a way to use Hibernate for the task? I'm thinking of loading an Object from database A, modifying it in code and simply saving it to a Session from a SessionFactory based on database B. I'm wondering whether Hibernate would be able to update the primary and foreign key values accordingly and how difficult it would be to handle dependencies to objects that are not copied from the database A (because they are not needed any more).
Any recommendations?

Isn't it easier to just do a database dump from database A and import it into database B? Or, as an alternative, use insert into B.table (col1, col2) select col1, col3 from A.table?

If your databases are MySQL, you can use the MERGE storage engine. Here are the steps:
- In one of your databases, update all ids via Hibernate using cascade all. On each table, every id has to be incremented by the last id of the same table in the other database:
User1 (2000 rows, lastId: 2000) and User2 (3000 rows, lastId: 3000) -> User1 (2000 rows, lastId: 2000) and User2 (3000 rows, firstId: 2001, lastId: 5000)
- Create another database that merges all your databases.
- Extract a dump from the new database and load this dump into your final database -> http://dev.mysql.com/doc/refman/5.0/en/merge-storage-engine.html
This is one possible way :)
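The id shift in the first step can be sketched in plain Java, independent of Hibernate. The class and field names below (IdOffsetRemapper, OrderRow, userId) are hypothetical and only illustrate the arithmetic: primary keys move by the offset of their own table, and foreign keys move by the offset of the table they point to.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: shift ids of one database past the highest id of the other. */
class IdOffsetRemapper {

    /** Minimal stand-in for a mapped entity: a primary key and one foreign key. */
    static final class OrderRow {
        final long id;
        final long userId; // foreign key into the user table

        OrderRow(long id, long userId) {
            this.id = id;
            this.userId = userId;
        }
    }

    /** Returns the new id for a row whose table is shifted by the given offset. */
    static long shift(long id, long offset) {
        return id + offset;
    }

    /**
     * Shifts primary keys by orderOffset and foreign keys by userOffset,
     * so each reference still points at the same (shifted) user row.
     */
    static List<OrderRow> remap(List<OrderRow> rows, long orderOffset, long userOffset) {
        List<OrderRow> result = new ArrayList<>();
        for (OrderRow row : rows) {
            result.add(new OrderRow(shift(row.id, orderOffset), shift(row.userId, userOffset)));
        }
        return result;
    }
}
```

With a user offset of 2000, a row that referenced user 1 references user 2001 after the shift. The important point is that every foreign key column has to use the offset of the table it references, not the offset of its own table.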

I know it is an old thread, but I had a similar problem.
I solved it by adding two date fields, included_date and changed_date, to my tables. I also added another field elsewhere (I have a table with configuration info) to store the date of my last database sync.
When my system connects to the server, I send the date of the last sync, so my routine can determine which rows have been included or changed since then.
For every new row I set the date in the included_date field, so when I sync I know which rows were created after my last sync and can do an INSERT for them. The same happens with row changes and the changed_date field, for which I do an UPDATE.
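The decision described above can be sketched as a small Java helper. The class and method names are hypothetical, and the sketch assumes changed_date may be null for rows that were never modified.

```java
import java.time.Instant;

/** Hypothetical sketch of the included_date / changed_date sync decision. */
class SyncClassifier {

    enum Action { INSERT, UPDATE, SKIP }

    /**
     * Rows created after the last sync are inserted on the other side,
     * rows changed after the last sync are updated, everything else is skipped.
     */
    static Action classify(Instant includedDate, Instant changedDate, Instant lastSync) {
        if (includedDate.isAfter(lastSync)) {
            return Action.INSERT;
        }
        if (changedDate != null && changedDate.isAfter(lastSync)) {
            return Action.UPDATE;
        }
        return Action.SKIP;
    }
}
```

A row that was both created and changed after the last sync is classified as INSERT, since the target does not have it yet.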

Related

Mysql data export tool

Say I have a SQL (MySQL) database containing 4 tables {A, B, C, D},
and I want to create a testing database that contains a subset of the data from the first database (both in time and type).
For example:
"I want to create a new (identical in structure) database containing all of the data for user "bob" for the last two weeks."
The naive approach is to dump two weeks of data from the first database, use vagrant / chef to spin up a new empty database, and import the dumped data.
However, this does not work because the tables reference each other through foreign keys.
So if I have two weeks of data from "A", it might rely on year-old data from "D".
My current solution is to use the data layer of my Java application to load the data into memory and then insert it into the new database. However, this is not sustainable or scalable.
So, in a roundabout way, my question is: does anyone know of any tools or tricks to migrate a "complete" set of data from one database to another, selecting by a time period on one table and including all related data from the other tables as well?
Any suggestions would be fantastic :)
Try your "naive approach", but SET FOREIGN_KEY_CHECKS=0; first, then run your backup queries, then SET FOREIGN_KEY_CHECKS=1;
There is a way to recreate a similar database with part of the data. It is not so simple, but it can be used:
Create a new database (schema only), for example using a backup or a schema comparer tool.
The next step is to copy the table data. You could do it with the help of a data comparer tool (look at schema comparer link). Select the tables you need and check the records to synchronize.
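Whichever tool is used, the underlying problem in the question is collecting every row that the selected rows depend on. A minimal sketch of that idea, assuming the foreign key references have already been read into a map (the row keys such as "A:1" and the class name are hypothetical):

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: collect all rows reachable through foreign keys. */
class ForeignKeyClosure {

    /**
     * Starting from the seed rows, follows foreign key references transitively
     * and returns the seeds plus every row they depend on.
     *
     * @param references maps a row key to the keys of the rows it references
     */
    static Set<String> closure(Set<String> seeds, Map<String, List<String>> references) {
        Set<String> visited = new LinkedHashSet<>(seeds);
        Deque<String> queue = new ArrayDeque<>(seeds);
        while (!queue.isEmpty()) {
            String row = queue.poll();
            List<String> parents = references.get(row);
            if (parents == null) {
                parents = Collections.emptyList();
            }
            for (String parent : parents) {
                // add() returns false for rows already collected, which also ends cycles
                if (visited.add(parent)) {
                    queue.add(parent);
                }
            }
        }
        return visited;
    }
}
```

The resulting set is what would have to be exported so that the import succeeds with foreign key checks left on.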

MySQL check if table has some records changed

I am working on a Java application that uses a MySQL database as its data storage layer. There are a few configuration tables in the database, but each table has many thousands of records / rows. All of this configuration is cached / loaded into memory in corresponding data structures / beans (Java POJOs) when the application starts up.
Everything is fine, except that the caching takes place every time the application starts and usually takes 15-20 minutes, because the data to be cached is huge and some columns contain XML strings that are parsed and then stored in beans.
So what's the big deal?
Why should we rebuild the cache when no data has changed between consecutive start-ups? I can have all the beans encapsulated in a common Config bean and serialize it, then load this serialized object the next time I find that no data has changed. And yes, of course, loading a serialized object is far faster than a database hit plus bean population.
So is there any way I can figure this out?
At the database level, of course. When the application starts, I would query whether there was any change in the database tables since it was last started. If yes, run the same old boring caching process, store some unique identifier, and serialize. If the last identifier and the current identifier are the same, just load the serialized object. This unique identifier will of course be persistent.
Add a last_updated column of type timestamp to the table.
When you need to check if there are changes on the table simply execute the query:
select max(last_updated) from YOUR_TABLE
If the last_updated is after the time you created the last cache copy you can update the cache with only the elements changed since last creation of the cache with a query similar to this one:
select * from YOUR_TABLE where last_updated > LAST_CACHE_UPDATE
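The start-up decision built on these two queries can be sketched as follows. The class and method names are hypothetical; maxLastUpdated stands for the result of the max(last_updated) query and lastCacheBuild for the persisted time of the last cache build.

```java
import java.time.Instant;

/** Hypothetical sketch: decide whether the serialized cache is still usable. */
class CacheFreshness {

    /**
     * @param maxLastUpdated result of "select max(last_updated) ...", null for an empty table
     * @param lastCacheBuild time the serialized cache was written, null if none exists
     * @return true if the cache has to be rebuilt or refreshed from the database
     */
    static boolean needsRefresh(Instant maxLastUpdated, Instant lastCacheBuild) {
        if (lastCacheBuild == null) {
            return true; // no serialized cache yet
        }
        if (maxLastUpdated == null) {
            return false; // table is empty, so no row is newer than the cache
        }
        return maxLastUpdated.isAfter(lastCacheBuild);
    }
}
```

One limitation worth keeping in mind: a last_updated column only reflects inserts and updates. Deleted rows leave no timestamp behind, so deletions need a separate mechanism if they can occur.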
As explained in the comments, it is highly recommended to add an index on the last_updated column. With an index, the maximum value in a table of 1.000.000.000 records can be found in about 30 steps (not 1.000.000.000, as wrongly claimed in the comments).
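The figure of 30 steps matches a binary-search estimate, since log2 of 1.000.000.000 is just under 30. The sketch below only reproduces that arithmetic under the assumption of a binary-search-like lookup; it does not model how a real MySQL index is traversed, and the class name is hypothetical.

```java
/** Hypothetical sketch: number of halving steps needed to narrow n entries down to one. */
class IndexLookupSteps {

    /** Returns ceil(log2(n)) for n >= 1, computed with integer arithmetic. */
    static int binarySearchSteps(long n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        // The bit length of (n - 1) equals ceil(log2(n)) for n >= 2.
        return n == 1 ? 0 : 64 - Long.numberOfLeadingZeros(n - 1);
    }
}
```

Calling binarySearchSteps(1_000_000_000L) returns 30, compared with up to 1.000.000.000 row reads for a full scan without an index.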
If you restart your application a lot and your cache can live in an external data store such as Redis or Hazelcast, use that as the cache instead of JVM memory. When you update data, update both sides.

How to update existing entities with JpaItemWriter?

I'm using spring-batch jobs to persist content of a large csv file to a database.
JpaItemWriter is used for persistence, which is fine so far.
But now I'd like to first check whether an entity already exists in the database (by id, since the id field in the csv and in the database are equal), and if so, just update the entity instead.
How could this be done?
When I needed to do this, the best I came up with was having my custom FieldSetMapper (used by the FlatFileItemReader) load the item from the database (or create a new instance if it doesn't exist) and then set the properties based on the input. Since JpaItemWriter uses .merge, it will write the entity by updating if it was loaded from the database and inserting if it was a new entity.
I also needed to have it run with a batch size of 1, to ensure that if there were duplicates in my input (which I did have), it would actually go one row at a time and insert or update for each one and not try to insert them all at once causing key problems.
As you might imagine, all this worked a lot slower than I would have liked. It queries the database for each and every row, and then does the corresponding update or insert. But as for my case it was for a monthly overnight batch process, it was good enough for our needs, even if it took many hours to run.
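The load-or-create logic can be illustrated without Spring Batch. In the sketch below an in-memory map stands in for the database, and all names (LoadOrCreateUpsert, Person) are hypothetical. It shows why processing one row at a time handles duplicate ids in the input: the second occurrence finds the row written by the first and becomes an update.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of per-row load-or-create, with a map standing in for the database. */
class LoadOrCreateUpsert {

    static final class Person {
        final long id;
        String name;

        Person(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    private final Map<Long, Person> store = new LinkedHashMap<>();
    int inserts;
    int updates;

    /** Looks the row up by id first; updates it if present, otherwise inserts a new one. */
    void write(long id, String name) {
        Person existing = store.get(id);
        if (existing == null) {
            store.put(id, new Person(id, name));
            inserts++;
        } else {
            existing.name = name;
            updates++;
        }
    }

    /** Returns the stored name for the id, or null if no such row exists. */
    String nameOf(long id) {
        Person person = store.get(id);
        return person == null ? null : person.name;
    }
}
```

Writing the input rows (1, "a"), (1, "b"), (2, "c") in that order results in two inserts and one update, with "b" as the final name for id 1.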

Managing history records in a database

I have a web project that uses a database to store data used to generate tasks. Remote machines process those tasks, alter the records, and store new data. My problem is that I have to store the changes made to each table, but I don't need all of that information. For example, a table A could have 5 fields but I only need 2 for historical purposes. Another table B could have 3 and I would have to add another one (a date, for example). Also, I don't need the changes made during daily task generation, only the most recent one.
What is the best way to maintain a change history? Someone told me that a good idea is to have two tables: the A (B) table and another one called A_history (B_history) with the needed fields. This is actually what I'm doing, using triggers to insert into the history tables, but I don't feel comfortable with this approach. My project uses Spring (Spring Data, Hibernate and JPA), and if I change the DB (currently MySQL) I'd have to migrate the triggers. Is there a good way to manage history records? Tables could be generated with Hibernate/JPA annotations.
If I keep the two-table approach, can I add a method to the repository that fetches rows from the current table and the history table at once?
For this purpose there is a dedicated project, Hibernate Envers. See the official documentation here. Just configure it, annotate the necessary properties with the @Audited annotation, and that's all. No need for DB triggers.
One pitfall: if you want a record for each delete operation, you need to use Session.delete(entity) instead of an HQL "delete ...".
EDIT. Also take a look at the native auditing support of Spring Data JPA.
I am not a database expert. What I have seen database people do boils down to a few approaches.
1) They add a trigger to the transactional table that copies inserts and updates, but not deletes, to a history table. This means any queries that need to include history can be done from the history table, since all the current info is there too.
a) They can tag each entry in the history table with a time and date and keep track of all the states of the original records.
b) They can keep track of only the current state of the original record, which then settles when the original is deleted.
2) They have a periodic task that goes around and copies data marked as deletable into the history table. It then deletes the data from the transactional table. Any queries in the transactional table have to make sure to ignore the deletable rows. Any queries that need history have to search both tables and merge the results.
3) If the volume of data isn't too large, they just leave everything in one table and mark some entries as historical. Queries have to ignore historical rows. Queries that include history are easy. This may slow down database access as the table grows to include many unused rows but that can sometimes be ameliorated by clever use of indexes.
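For comparison with the trigger in approach 1a, here is an application-side sketch that appends a timestamped snapshot containing only the fields needed for history. It is an illustration under assumptions, not how Envers or Spring Data auditing work internally; all names are hypothetical and a list stands in for the history table.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: keep timestamped snapshots of selected fields only. */
class HistoryRecorder {

    static final class Entry {
        final long entityId;
        final Map<String, Object> fields;
        final Instant changedAt;

        Entry(long entityId, Map<String, Object> fields, Instant changedAt) {
            this.entityId = entityId;
            this.fields = fields;
            this.changedAt = changedAt;
        }
    }

    private final Set<String> trackedFields;
    private final List<Entry> history = new ArrayList<>();

    HistoryRecorder(Set<String> trackedFields) {
        this.trackedFields = trackedFields;
    }

    /** Copies only the tracked fields of the row into a new history entry. */
    void onChange(long entityId, Map<String, Object> row, Instant changedAt) {
        Map<String, Object> snapshot = new LinkedHashMap<>();
        for (String field : trackedFields) {
            snapshot.put(field, row.get(field));
        }
        history.add(new Entry(entityId, snapshot, changedAt));
    }

    /** Read-only view of the recorded entries, oldest first. */
    List<Entry> history() {
        return Collections.unmodifiableList(history);
    }
}
```

The date column that the question wants to add to the history of table B corresponds to changedAt here.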

Accessing database multiple times

I am working on a solution for the problem described below but could not find any best practice or tool for it.
For a batch of requests (say 5000 unique ids and records) received by a web service, the service has to fetch the rows for those unique ids from the database, keep them in a buffer (or cache), and compare them with the records received. If a particular value (say a column) has changed, it is updated in the table for that unique id. In turn, the child tables of that table are also affected. For example, if someone changes his laptop model number and country, the model number is updated in one table and the country value in another. In this way multiple tables are accessed in a short time. The number of records arriving in one web service call might reach 70K, with one call per hour.
I don't have any option other than implementing it in Java. Is there a good practice for implementing this, or can it be achieved using open source Java tools? Please suggest. Thanks.
Hibernate is likely to be the first thing you should try. I tend to avoid it because it is overkill for most of my applications, but it is a standard tool for database access that anyone who knows Java should at least understand. There are dozens of other solutions you could use, but Hibernate is the most often used.
JDBC is the API to use to access a relational database. Useful performance and security tips:
- use prepared statements
- use where ... in () queries to load many rows at once, but beware of the limit on the number of values in the in clause (1000 max in Oracle)
- use batched statements to make your updates, rather than executing each update separately (see http://download.oracle.com/javase/1.3/docs/guide/jdbc/spec2/jdbc2.1.frame6.html)
See http://download.oracle.com/javase/tutorial/jdbc/ for a tutorial on JDBC.
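The first two tips can be combined by splitting the ids into chunks that respect the in clause limit and generating one placeholder per id for a prepared statement. The sketch below covers only that preparation step and uses hypothetical names, including the table and column passed in by the caller; executing the statement through JDBC is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: prepare chunked "where ... in (?, ?, ...)" queries. */
class InClauseBatcher {

    /** Splits the ids into consecutive chunks of at most maxSize elements. */
    static <T> List<List<T>> chunk(List<T> ids, int maxSize) {
        if (maxSize < 1) {
            throw new IllegalArgumentException("maxSize must be at least 1");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += maxSize) {
            int end = Math.min(start + maxSize, ids.size());
            chunks.add(new ArrayList<>(ids.subList(start, end)));
        }
        return chunks;
    }

    /** Builds the SQL text for a prepared statement with one placeholder per id. */
    static String selectSql(String table, String idColumn, int idCount) {
        if (idCount < 1) {
            throw new IllegalArgumentException("idCount must be at least 1");
        }
        StringBuilder sql = new StringBuilder("SELECT * FROM ")
                .append(table).append(" WHERE ").append(idColumn).append(" IN (");
        for (int i = 0; i < idCount; i++) {
            if (i > 0) {
                sql.append(", ");
            }
            sql.append('?');
        }
        return sql.append(')').toString();
    }
}
```

For 5000 ids and a limit of 1000 this yields five queries instead of 5000 single-row lookups. Only the placeholders are generated dynamically; the id values themselves are still bound as parameters, which keeps the benefit of prepared statements. The table and column names must come from trusted code, never from user input.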
This does not sound that complicated. Of course, you must know (or learn):
- SQL
- JDBC
Then you can go through the web service data record by record and do the following for each record:
fetch corresponding database record
for each field in record
    if updated
        execute corresponding update SQL statement
commit // every so many records
70K records per hour should not be the slightest problem for a decent RDBMS.
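The "if updated" check in the loop above can be sketched as a field-by-field comparison between the stored record and the incoming one. Records are represented as maps from column name to value, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch: find the fields whose incoming value differs from the stored one. */
class RecordDiff {

    /**
     * Returns the incoming fields that differ from the existing record,
     * in the iteration order of the incoming map. Objects.equals handles null values.
     */
    static Map<String, Object> changedFields(Map<String, Object> existing,
                                             Map<String, Object> incoming) {
        Map<String, Object> changed = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : incoming.entrySet()) {
            if (!Objects.equals(existing.get(field.getKey()), field.getValue())) {
                changed.put(field.getKey(), field.getValue());
            }
        }
        return changed;
    }
}
```

An empty result means no update statement is needed for that record, which keeps the number of writes proportional to the actual changes rather than to the 70K incoming records.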
