How to synchronize remote database with local database in JAVA

How to synchronize remote database with local database in JAVA - java

I need to create project in which there are two databases local and remote. Remote database needs to be synchronized daily with local database reflecting changes made in local database.
I am using JAVA. Database is ORACLE. I have JAVA/JPA code that does CRUD operations on local database.
How to synchronize changes to remote database.

I would not do this in Java, but look for native Oracle database synchronization mechanisms/tools. This will
be quicker to implement
be more robust
have faster replication events
be more 'correct'

Please look at some synchronization products. SQL Anywhere from Sybase where I work is one such product. You may be able to get a developer/evaluation copy that you can use to explore your options. I am sure Oracle has something similar too.
The basic idea is to be able to track the changes that have happened in the central database. This is typically done by keeping a timestamp for each row. During the synchronization, the remote database provides the last sync time and the server sends to it all rows that have changed since then. Note that the rows that have been deleted in the central database will need some special handling to ensure they get deleted from the remote database.
A true two-way synchronization is lot more complex. You need to also upload the changes from remote database to central and also some conflict resolution strategies have to be implemented for the cases when the same row has been changed in both the remote and central database in incompatible way in the two.
The general problem is too complex to be explained in a respone here but I hope I have been able to provide some useful pointers.

The problem is that what you are asking can range from moderately difficult (for a simple, not very robust system) to a very complex product that could keep a small team busy for a year depending on requirements.
That's why the other answers said "Find another way" (basically)
If you have to do this for a class assignment or something, it's possible but it probably won't be quick, robust or easy.
You need server software on each side, a way to translate unknown tables to data that can be transferred over the wire (along with enough meta-data to re-create it on the other side) and you'll probably want to track database changes (perhaps with a flag or timestamp) so that you don't have to send each record over every time.
It's a hard enough problem that we can't really help much. If I HAD to do that for a customer, I'd quote him at least a man year of work to get it even moderately reliable.
Good Luck

Oracle has a sophistication replication functionality to synchronise databases. Find out more..
From your comments it appears you're using the Oracle Lite: this supports replication, which is covered in the Lite documentation.

Never worked with it, but http://symmetricds.codehaus.org/ might be of use

Related

Database distribution

What are the possibilities to distribute data selectively?
I explain my question with an example.
Consider a central database that holds all the data. This database is located in a certain geographical location.
Application A needs a subset of the information present in the central database. Also, application A may be located in a geographical location different (and maybe far) from the one where the central database is located.
So, I thought about creating a new database at the same location of application A that would contain a subset of information of the central database.
Which technology/product allow me to deploy such a configuration?
Thanks

Look for database replication. SQL Server can do this for sure, others (Oracle, MySQL, ...) should have it, too.
The idea is that the other location maintains a (subset) copy. Updates are exchanged incrementally. The way to treat conflicts depends on your application.

Most major database software such as MySql and SQL server can do the job, but it
is not a good model. With the growth of the application (traffic and users),
not only will you create a load on the central database server (which might be serving
other applications),but you will also be abusing your network bandwidth to transfer data
between the far away database and the application server.
A better model is to keep your data close to the application server, and use the far away
database for backup and recovery purposes only. You can use an FC\IP SAN (or any other
storage network architecture) as your storage network model, based on your applications' needs.

One big question that you didn't address is if Application A needs read-only access to the data or if it needs to be read-write.
The immediate concept that comes to mind when reading your requirements is sharding. In MySQL, this can be accomplished with partitioning. That being said, before you jump into partitions, make sure you read up on their pros and cons. There are instances where partitioning can slow things down if your indexes are not well chosen, or your partitioning scheme is not well thought out.
If your needs are read-only, then this should be a fairly simple solution. You can use MySQL in a Master-Slave context, and use App A off a slave. If you need read-write, then this becomes much more complex.
Depending on your write needs, you can split your reads to your slave, and your writes to the master, but that significantly adds complexity to your code structure (need to deal with multiple connections to multiple dbs). The advantage of this kind of layout is that you don't need to have complex DB infrastructure.
On the flip side, you can keep your code as is, and use a Master-Master replication in MySQL. Although not officially supported by Oracle, a lot of people have had success in this. A quick Google search will find you a huge list of blogs, howtos, etc. Just keep in mind that your code has to be properly written to support this (ex: you cannot use auto-increment fields for PKs, etc).
If you have cash to spend, then you can look at some of the more commercial offerings. Oracle DB and SQL Server both support this.
You can also use Block Based data replication, such as DRDB (and Mysql DRDB) to handle the replication between your nodes, but the problem you always will encounter is what happens if your link between the two nodes fails.
The biggest issue you will encounter is how to handle conflicting updates in 2 separate DB nodes. If your data is geographically dependent, then this may not be an issue for you.
Long story short, this is not an easy (or inexpensive) problem to resolve.

It's important to address the possibility of conflicts at the design phase anytime you are talking about replicating databases.
Moving on from that, SAP's Sybase Replication Server will allow you to do just that, either with Sybase database's or 3rd party databases.
In Sybase's world this is frequently called a corporate roll-up environment. There may be multiple geographically seperated databases each with a subset of data which they have primary control over. At the HQ, there is a server that contains all the various subsets in one repository. You can choose to replicate whole tables, or replicate based on values in individual rows/columns.
This keeps the databases in a loosely consistent state. Transaction rates, Geographic separation, and the latency that can be inherent to network will impact how quickly updates move from one database to another. If a network connection is temporarily down, Sybase Replication Server will queue up transaction, and send them as soon as the link comes back up, but the reliability and stability of the replication system will be affected by the stability of the network connection.
Again, as others have stated it's not cheap, but it's relatively straight forward to implement and maintain.
Disclaimer: I have worked for Sybase, and am still part of the SAP family of companies.

how to keep object instances synchronized

I am working with an object that serves as a database in my application. However, I need to have redundant copies of this database. So, on init, I create multiple instances (say 5) copies of the same object. (I am using JAVA for this, so any hint of pre-existing libraries could be helpful as well.)
The object is a server that listens on a port for request for the information it is holding. This information may be updated by other entities via the same or a different port at any time.
My question is as follows:
Would a lock strategy
work in this case? That is, every time an update is made in
any instance, that instance contacts
all other instances and passes the
update.
During this time, all the requests
(read or update) from other entities
are queued.
Would this approach work? I have my doubts because, even if this works, I think the system is creating its own bottleneck. What do you guys say? Is there a better way of doing this distributed synchronization?

What you're describing is a distributed cache. The big player in that space is currently Coherence though I believe JBoss Cache is catching up.
As for rolling your own, having seen the complexity in what superficially sounds quite a simple problem, I wouldn't recommend it in a comercial setting, though it'd be a fun home project.

Are you talking about a distributed cache? Have you looked at ehcache?

Would this approach work? I have my
doubts because, even if this works, I
think the system is creating its own
bottleneck.
It would be creating its own bottleneck. You'd be better off using an in-memory database like HSQLDB or an embedded database like SQLite.

There is lot more to distributed syntonization than it's possible to mention in a single answer. You have to worry about two-phase commits, network partitions, etc. etc. I would advise you to look into an existing distributed DB solution combined with an n-tier Java EE architecture that includes load-balancing.

Strategies for dealing with constantly changing requirements for MySQL schemas?

I'm using Hibernate EntityManager and Hibernate Annotations for ORM in a very early stage project. The project needs to launch soon, but the specs are changing constantly and I am concerned that the system will be launched and live data will be collected, and then the specs will change again and I will be in a situation where I need to change the database schema.
How can I set things up in order to minimize the impact of this? Are there any open source projects that deal with this kind of migration? Can Hibernate do this automatically (without wiping the database)?
Your advice is much appreciated.

It's more a functional or organizational problem than a technical one. No tool will automatically guess how to migrate data from one schema to another one. You'd better learn how to write stored procedure in order to migrate your data.
You'll probably need to disable constraints, create temporary table and columns, copy lots of data, and then delete the temporary tables and columns and re-enable constraints to have migrate your data.
Once in maintenance mode, every new feature that modifies the schema should also come with the script allowing to migrate from the current schema and data in production to the new one.

No system can possibly create datamigration scripts automatically from just the original and the final schema. There just isn't enough information.
Consider for example a new column. Should it just contain the default value? Or a value calculated from other fields/tables.
There is a good book about refactoring databases: http://www.amazon.com/Refactoring-Databases-Evolutionary-Addison-Wesley-Signature/dp/0321774515/ref=sr_1_1?ie=UTF8&qid=1300140045&sr=8-1
But there is little to no tool support for this kind of stuff.
I think the best thing you can do in advance:
Don't let anybody access the database but your application
If something else absolutely must access the db directly, give it a separate set of view specially for that purpose. This allows you to change your table structure by keeping at least the structure of what other systems see.
Have tons of tests. I just posted an article wich (with the upcoming 2nd and 3rd part) might help a little with this: http://blog.schauderhaft.de/2011/03/13/testing-databases-with-junit-and-hibernate-part-1-one-to-rule-them/

Hibernate can update the database entity model with data in the database. So do that and write migration code in java which sets or removes data relationships.
This works, and we have done it multiple times. But of course, try to follow a flexible development process; make what you know for sure first, then reevaluate the requirements - scrum etc.

In your case, I would recommend a NoSQL database. I don't have much experience with such kind of databases so I can't recommend any current implementation so you may want to check this too.

Getting events from a database

I am not very familiar with databases and what they offer outside of the CRUD operations.
My research has led me to triggers. Basically it looks like triggers offer this type of functionality:
(from Wikipedia)
There are typically three triggering events that cause triggers to "fire":
INSERT event (as a new record is being inserted into the database).
UPDATE event (as a record is being changed).
DELETE event (as a record is being deleted).
My question is: is there some way I can be notified in Java (preferably including the data that changed) by the database when a record is Updated/Deleted/Inserted using some sort of trigger semantics?
What might be some alternate solutions to this problem? How can I listen to database events?
The main reason I want to do this is a scenario like this:
I have 5 client applications all in different processes/existing across different PCs. They all share a common database (Postgres in this case).
Lets say one client changes a record in the DB that all 5 of the clients are "interested" in. I am trying to think of ways for the clients to be "notified" of the change (preferably with the affected data attached) instead of them querying for the data at some interval.

Using Oracle you can setup a Trigger on a table and then have the trigger send a JMS message. Oracle has two different JMS implementations. You can then have a process that will 'listen' for the message using the JDBC Driver. I have used this method to push changes out to my application vs. polling.
If you are using a Java database (H2) you have additional options. In my current application (SIEM) I have triggers in H2 that publish change events using JMX.

Don't mix up the database (which contains the data), and events on that data.
Triggers are one way, but normally you will have a persistence layer in your application. This layer can choose to fire off events when certain things happen - say to a JMS topic.
Triggers are a last ditch thing, as you're operating on relational items then, rather than "events" on the data. (For example, an "update", could in reality map to a "company changed legal name" event) If you rely on the db, you'll have to map the inserts & updates back to real life events.... which you already knew about!
You can then layer other stuff on top of these notifications - like event stream processing - to find events that others are interested in.
James

Hmm. So you're using PostgreSQL and you want to "listen" for events and be "notified" when they occur?
http://www.postgresql.org/docs/8.3/static/sql-listen.html
http://www.postgresql.org/docs/8.3/static/sql-notify.html
Hope this helps!

Calling external processes from the database is very vendor specific.
Just off the top of my head:
SQLServer can call CLR programs from
triggers,
postgresql can call arbitrary C
functions loaded dynamically,
MySQL can call arbitrary C functions,
but they must be compiled in,
Sybase can make system calls if set
up to do so.

The simplest thing to do is to have the insert/update/delete triggers make an entry in some log table, and have your java program monitor that table. Good columns to have in your log table would be things like EVENT_CODE, LOG_DATETIME, and LOG_MSG.
Unless you require very high performance or need to handle 100Ks of records, that is probably sufficient.

I think you're confusing two things. They are both highly db vendor specific.
The first I shall call "triggers". I am sure there is at least one DB vendor who thinks triggers are different than this, but bear with me. A trigger is a server-side piece of code that can be attached to table. For instance, you could run a PSQL stored procedure on every update in table X. Some databases allow you to write these in real programming languages, others only in their variant of SQL. Triggers are typically reasonably fast and scalable.
The other I shall call "events". These are triggers that fire in the database that allow you to define an event handler in your client program. IE, any time there are updates to the clients database, fire updateClientsList in your program. For instance, using python and firebird see http://www.firebirdsql.org/devel/python/docs/3.3.0/beyond-python-db-api.html#database-event-notification
I believe the previous suggestion to use a monitor is an equivalent way to implement this using some other database. Maybe oracle? MSSQL Notification services, mentioned in another answer is another implementation of this as well.
I would go so far as to say you'd better REALLY know why you want the database to notify your client program, otherwise you should stick with server side triggers.

What you're asking completely depends on both the database you're using and the framework you're using to communicate with your database.
If you're using something like Hibernate as your persistence layer, it has a set of listeners and interceptors that you can use to monitor records going in and out of the database.

There are a few different techniques here depending on the database you're using. One idea is to poll the database (which I'm sure you're trying to avoid). Basically you could check for changes every so often.
Another solution (if you're using SQL Server 2005) is to use Notification Services, although this techonology is supposedly being replaced in SQL 2008 (we haven't seen a pure replacement yet, but Microsoft has talked about it publicly).

This is usually what the standard client/server application is for. If all inserts/updates/deletes go through the server application, which then modifies the database, then client applications can find out much easier what changes were made.

If you are using postgresql it has capability to listen notifications from JDBC client.

I would suggest using a timestamp column, last updated, together with possibly the user updating the record, and then let the clients check their local record timestamp against that of the persisted record.
The added complexity of adding a callback/trigger functionality is just not worth it in my opinion, unless supported by the database backend and the client library used, like for instance the notification services offered for SQL Server 2005 used together with ADO.NET.

What JDBC tools do you use for synchronization of data sources?

I'm hoping to find out what tools folks use to synchronize data between databases. I'm looking for a JDBC solution that can be used as a command-line tool.
There used to be a tool called Sync4J that used the SyncML framework but this seems to have fallen by the wayside.

I have heard that the Data Replication Service provided by Db4O is really good. It allows you to use Hibernate to back onto a RDBMS - I don't think it supports JDBC tho (http://www.db4o.com/about/productinformation/drs/Default.aspx?AspxAutoDetectCookieSupport=1)
There is an open source project called Daffodil, but I haven't investigated it at all. (https://daffodilreplicator.dev.java.net/)
The one I am currently considering using is called SymmetricDS (http://symmetricds.sourceforge.net/)
There are others, they each do it slightly differently. Some use triggers, some poll, some use intercepting JDBC drivers. You need to decide what technical limitations you are under to determine which one you really want to use.
Wikipedia provides a nice overview of different techniques (http://en.wikipedia.org/wiki/Multi-master_replication) and also provides a link to another alternative DBReplicator (http://dbreplicator.org/).

If you have a model and DAO layer that exists already for your codebase, you can just create your own sync framework, it isn't hard.
Copy data is as simple as:
read an object from database A
remove database metadata (uuid, etc)
insert into database B
Syncing has some level of knowledge about what has been synced already. You can either do it at runtime by getting a list of uuids from TableInA and TableInB and working out which entries are new, or you can have a table of items that need to be synced (populate with a trigger upon insert/update in TableInA), and run from that. Your tool can be a TimerTask so databases are kept synced at the time granularity that you desire.
However there is probably some tool out there that does it all without any of this implementation faff, and each implementation would be different based on business needs anyway. In addition at the database level there will be replication tools.

True synchronization requires some data that I hope your database schema has (you can read the SyncML doc to see how they proceed). Sync4J won't help you much, it's really high-level and XML oriented. If you don't foresee any conflicts (which means: really easy synchronisation), you could try with a lightweight ETL like Enhydra Octopus.

I'm primarily using Oracle at the moment, and the most full-featured route I've come across is Red Gate's Data Compare:
http://www.red-gate.com/products/oracle-development/data-compare-for-oracle/
This old blog gives a good summary of the solution routes available:
http://www.novell.com/coolsolutions/feature/17995.html
The JDBC-specific offerings I've come across have been very basic. The solution mentioned by Aidos seems the most feature complete if you want to go down the publish-subscribe route:
http://symmetricds.codehaus.org/
Hope this helps.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.