Strategies for dealing with constantly changing requirements for MySQL schemas? - java

I'm using Hibernate EntityManager and Hibernate Annotations for ORM in a very early stage project. The project needs to launch soon, but the specs are changing constantly and I am concerned that the system will be launched and live data will be collected, and then the specs will change again and I will be in a situation where I need to change the database schema.
How can I set things up in order to minimize the impact of this? Are there any open source projects that deal with this kind of migration? Can Hibernate do this automatically (without wiping the database)?
Your advice is much appreciated.

It's more a functional or organizational problem than a technical one. No tool will automatically guess how to migrate data from one schema to another. You'd better learn how to write stored procedures in order to migrate your data.
You'll probably need to disable constraints, create temporary tables and columns, copy lots of data, and then drop the temporary tables and columns and re-enable the constraints once the data has been migrated.
Once in maintenance mode, every new feature that modifies the schema should also come with a script that migrates the current production schema and data to the new one. A rough sketch of this pattern is shown below.
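A minimal sketch of what such a script boils down to, written here as plain JDBC against MySQL rather than a stored procedure (the customer table, its columns, and the connection details are hypothetical):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SplitNameMigration {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/mydb", "user", "password");
             Statement st = conn.createStatement()) {
            // Disable constraint checks while the data is in flux
            st.execute("SET FOREIGN_KEY_CHECKS = 0");
            // Add the new columns alongside the old one
            st.execute("ALTER TABLE customer ADD COLUMN first_name VARCHAR(100)");
            st.execute("ALTER TABLE customer ADD COLUMN last_name VARCHAR(100)");
            // Copy and transform the existing data
            st.execute("UPDATE customer SET "
                    + "first_name = SUBSTRING_INDEX(full_name, ' ', 1), "
                    + "last_name = SUBSTRING_INDEX(full_name, ' ', -1)");
            // Drop the old column and restore the constraints
            st.execute("ALTER TABLE customer DROP COLUMN full_name");
            st.execute("SET FOREIGN_KEY_CHECKS = 1");
        }
    }
}
```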

No system can possibly create data migration scripts automatically from just the original and the final schema. There just isn't enough information.
Consider, for example, a new column. Should it just contain the default value, or a value calculated from other fields/tables?
There is a good book about refactoring databases: http://www.amazon.com/Refactoring-Databases-Evolutionary-Addison-Wesley-Signature/dp/0321774515/ref=sr_1_1?ie=UTF8&qid=1300140045&sr=8-1
But there is little to no tool support for this kind of stuff.
I think the best thing you can do in advance:
Don't let anybody access the database but your application
If something else absolutely must access the db directly, give it a separate set of views specifically for that purpose. This allows you to change your table structure while keeping at least the structure that other systems see stable.
Have tons of tests. I just posted an article which (with the upcoming 2nd and 3rd part) might help a little with this: http://blog.schauderhaft.de/2011/03/13/testing-databases-with-junit-and-hibernate-part-1-one-to-rule-them/ (a sketch of such a test follows below).
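For a flavour of what such tests look like, here is a minimal sketch in the spirit of the linked article; the test-pu persistence unit and the Customer entity (with an id and a name) are hypothetical:

```java
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;
import org.junit.Assert;
import org.junit.Test;

public class CustomerPersistenceTest {

    @Test
    public void savedCustomerCanBeReloaded() {
        EntityManagerFactory emf = Persistence.createEntityManagerFactory("test-pu");
        EntityManager em = emf.createEntityManager();
        em.getTransaction().begin();

        Customer c = new Customer(); // hypothetical @Entity
        c.setName("Alice");
        em.persist(c);
        em.flush();
        em.clear(); // detach everything to force a real read from the database

        Customer reloaded = em.find(Customer.class, c.getId());
        Assert.assertEquals("Alice", reloaded.getName());

        em.getTransaction().rollback(); // leave the test database untouched
        em.close();
        emf.close();
    }
}
```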

Hibernate can update the database schema to match the entity model without wiping the data that is already there (hbm2ddl.auto set to update). So do that, and write migration code in Java which sets or removes data relationships - see the sketch below.
This works, and we have done it multiple times. But of course, try to follow a flexible development process; build what you know for sure first, then re-evaluate the requirements - Scrum etc.
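A rough sketch of that combination (the my-pu persistence unit and the Customer entity with a status field are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class UpdateAndMigrate {
    public static void main(String[] args) {
        Map<String, String> props = new HashMap<String, String>();
        // "update" alters the schema in place instead of dropping it
        props.put("hibernate.hbm2ddl.auto", "update");

        EntityManagerFactory emf =
                Persistence.createEntityManagerFactory("my-pu", props);
        EntityManager em = emf.createEntityManager();
        em.getTransaction().begin();
        // Hand-written migration step: fill the column Hibernate just added
        em.createQuery("UPDATE Customer c SET c.status = 'ACTIVE' "
                + "WHERE c.status IS NULL").executeUpdate();
        em.getTransaction().commit();
        em.close();
        emf.close();
    }
}
```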

In your case, I would recommend a NoSQL database. I don't have much experience with that kind of database, so I can't recommend any particular implementation; you may want to look into this yourself.

Related

Without using hibernate.hbm2ddl.auto, how do I export all the initial schema into Flyway?

My JEE application is almost ready for release. Given the many recommendations NOT to use Hibernate's hbm2ddl.auto in production, I decided to remove it.
So now, I found out about Flyway, which seems great for future db changes and migrations, but I am stuck at the first step: I have many entities, and some entities inherit from base entities. This makes the CREATE statements very complex.
What is the best practice to create the first migration file?
Thanks!
If you've taken an "entities first" approach during development, you'll need to generate the initial schema in the same way for the first live deployment: this will produce the first creation script used by Flyway, and there may also need to be a second associated script for populating reference data.
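To make the Flyway side concrete (a sketch assuming the Flyway 5+ fluent API and placeholder connection details): once the generated DDL is saved as db/migration/V1__init.sql on the classpath, Flyway applies it to an empty database and then every later versioned script in order:

```java
import org.flywaydb.core.Flyway;

public class Migrator {
    public static void main(String[] args) {
        Flyway flyway = Flyway.configure()
                .dataSource("jdbc:mysql://localhost/mydb", "user", "password")
                .load();
        // Applies V1__init.sql first, then V2__..., V3__... as they appear
        flyway.migrate();
    }
}
```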
In a nutshell, the reasons for no longer being able to use hbm2ddl.auto after the first deployment are that create will destroy existing data and update isn't reliable enough to cover all types of schema changes (as it sounds like you may already know from this SO question).
Flyway is a very useful tool but it does require a level of discipline that may not have existed during development. When going forward from the initial release, database update scripts need to be produced for Flyway that are equivalent to the changes made to the entities since the last release. There are tools (e.g. various commercial products from Redgate) that may help here: These attempt to "diff" two schemas and generate schema and/or data update scripts for getting from database A to database B. But in my experience, none of them are perfect and they don't quite reach the holy grail of enabling a completely automated approach.
Arguably, the best way is an "as you go" manual approach to ensure that non-destructive update scripts are committed to source control whenever an entity change is made that affects the schema or reference data - but as already mentioned, this will require some discipline and/or documented processes for all team members to follow.
For the first migration file, you just need the current ddl of your database. There are many tools which can get this for you (such as the "copy ddl" option in the IntelliJ IDEA Database tool or a GUI client from your database vendor).
I am not sure about Flyway, but there is an alternative way: you can use the Hibernate Ant tasks to generate or update the schema.
Hope it helps.
If you build your project with Maven, you could use the Hibernate Maven plugin.

How to get rid of database dependency of an already developed application having Oracle native queries?

I have an application with a huge code base which uses an Oracle database. I want to develop a Hibernate app which can handle incoming and outgoing requests from the application above without any dependency on the database.
For example, if I wanted to change the database to MySQL or PostgreSQL, it should not cause any problems. Is this practical? Can it be done? Asking for help.
As to practicality, very seldom does an app ever change databases. While the idea sounds great, it isn't often done, and the benefits you can get from using built-in database features often outweigh the work of keeping the application database-independent.
As to it being done, it certainly can be between SQL databases. Going from SQL to NoSQL is a bit more tricky, as JPA support for NoSQL stores is still in progress; if you're interested in that, take a look at Hibernate OGM. If you want to truly keep the option of easily switching databases, you need to stick to the JPA standard. See this on generating JPA-compliant entities from the database. As long as you use ONLY JPA, you can easily switch between the databases that provide a JPA implementation: you just include the correct implementation, set the dialect, and you are switched. A sketch of what "only JPA" looks like follows below.
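For illustration, "only JPA" means entities like this hypothetical one - note that nothing is imported from org.hibernate, only from javax.persistence:

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;

@Entity
public class Customer {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String name;

    public Long getId() { return id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}
```

Switching databases then comes down to swapping the JDBC driver and the dialect property in the configuration; the Java code itself stays untouched.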
If you have access to change the current application, it will probably be easier to just update each of the actions that contain the hard-coded queries with your JPA code. If you have unit tests, that will make this process much easier as well.
If you want to write something new but not change the front end, you would need to handle whatever actions your forms on the front end are submitting - making sure to expose them at the same paths and with the same HTTP methods (GET, POST, PUT, etc.), taking the same parameters, and returning the same structure as your actions do today.
Both approaches would allow you to go action by action, replacing them as you go. With writing something new, though, replacing them one at a time is a little more difficult if the new app and the old app aren't in the same domain OR if authentication/authorization is involved.
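As a sketch of that second approach (JAX-RS is used here just as one way to do it; the path, parameter, and returned structure are made up for illustration):

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;

// Stands in for an old action: same path, same method, same parameters
@Path("/orders/list")
public class OrderListResource {

    @GET
    @Produces("application/json")
    public String list(@QueryParam("customerId") long customerId) {
        // ...load the orders via JPA and serialize them in exactly the
        // structure the old action returned...
        return "[]";
    }
}
```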
Good luck and best wishes!

Best practices in database changes in web applications already deployed

I am trying to find a standard approach to the following problem I have.
I have a web application deployed in a container (specifically Tomcat) and it uses a database for its functionality (in my case it is an SQL database in file mode, so there is no back-end SQL server).
What I am interested in is the best way to handle the various changes to my database across newer versions of my web application as the database schema changes (new tables, new columns, removal of columns, etc.).
I.e. how can I handle the case of someone upgrading to a newer version of my web application while still retaining their old data from the old database, in the best (automatic? seamless? least manual?) manner.
I think this is not a rare case, so I believe there is some best practice I can follow here.
Can anyone help me on this?
Recently we discovered Flyway - it works pretty well and embraces versioning of database schema changes (plain SQL scripts).
Obviously this topic is much broader. For instance, you need to be extra careful when both the old and the new version of the application should run flawlessly on the updated schema. You should also consider a rollback strategy (for when the upgrade didn't work well or you want to downgrade your application) - sometimes it is as simple as removing the added objects (tables, columns), but when your script removes something, the rollback should restore it.
First of all, you'd want to keep changes to the database, and especially to existing columns, to a minimum.
Second, if you need to rename a column or change some constraints (being careful not to get more restrictive, because there might be some data that would not match), use ALTER TABLE statements. This way the data in the columns is preserved unless you drop columns. :)
Additionally, provide default values for new columns that have constraints (like NOT NULL), because there might already be rows in that table that need to be updated in order not to violate those constraints. (Alternatively, add the column, run some code to fill it, and then add the constraint - as sketched below.)
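A sketch of that add/backfill/constrain sequence, shown with JDBC and MySQL syntax for concreteness (the account table and connection details are hypothetical):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class AddNotNullColumn {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/mydb", "user", "password");
             Statement st = conn.createStatement()) {
            // 1. Add the column without the constraint
            st.execute("ALTER TABLE account ADD COLUMN status VARCHAR(20)");
            // 2. Backfill existing rows so nothing violates the constraint
            st.execute("UPDATE account SET status = 'ACTIVE' WHERE status IS NULL");
            // 3. Only now tighten the column
            st.execute("ALTER TABLE account MODIFY status VARCHAR(20) NOT NULL");
        }
    }
}
```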
Third, since there seem to be multiple users of your application who might be on different versions, the easiest way to provide updates is as sequential updates to the next higher version. Thus if someone wants to update from version 2 to 5, you'd first do the 2->3 update, then 3->4 and finally 4->5.
This might take longer to run, but it should reduce complexity since you'd not have to worry about all possible combinations (e.g. 2->4, 2->5, 3->5 etc.).
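A sketch of the sequential idea in plain Java (the Migration interface is hypothetical; it only exists to show the stepping logic):

```java
import java.util.List;

public class SequentialUpdater {

    interface Migration {
        int fromVersion();  // the version this step upgrades from
        void apply();       // runs the ALTER/UPDATE statements for one step
    }

    // Upgrades strictly one version at a time: 2->3, then 3->4, then 4->5
    public static void upgrade(int installedVersion, int targetVersion,
                               List<Migration> steps) {
        for (int v = installedVersion; v < targetVersion; v++) {
            for (Migration step : steps) {
                if (step.fromVersion() == v) {
                    step.apply();
                }
            }
        }
    }
}
```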

Standard Workflow when working with JPA

I am currently trying to wrap my head around working with JPA. I can't help but feel like I am missing something or doing it the wrong way. It just seems forced so far.
What I think I know so far is that there are a couple of ways to work with JPA, and tools to support this.
You can do everything in Java using annotations, and let JPA (whatever implementation you decide to use) create your schema and update it when changes are made.
You can use a tool to reverse engineer your database and generate the entity classes for you. When the schema is updated you have to regenerate these classes, or update them manually.
There seems to be drawbacks to both, and benefits to both (as with all things). My question is in an ideal situation what is the standard workflow with JPA? Most schemas will require updates during the maintenance phase and especially during the development phase, so how is this handled?
It's not always a good approach to generate the DB schema from the annotated entities. Although in theory it sounds great, in practice the generated schema is often not optimal and would not satisfy an experienced DBA.
The approach that I follow in my workflow is to create the entities and the db schema separately, while still using a pretty intelligent tool for the schema creation - either something like Liquibase, which is database-agnostic and supports revisions, rollbacks, etc., or a custom-baked migration tool that simply runs heavily optimized, database-specific SQL scripts.
It probably sounds less than ideal to you, but I can assure you it gets the job done and keeps your schema-related code consistent, since, as grigory pointed out, not everything related to the database can be generated from the entities anyway.
It can, however, be useful to generate the schema from the entities for the test database against which unit and integration tests are run. Assuming you're using, say, PostgreSQL in production, you might decide to speed up the unit tests by running some embedded in-memory database like H2, which gets created from the entities before the tests are started and disappears automatically (since it was in-memory) after the tests finish executing. This is a very common practice.
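A sketch of that test setup (the my-pu unit name is hypothetical; the property keys are the standard JPA/Hibernate ones):

```java
import java.util.HashMap;
import java.util.Map;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class TestEntityManagerFactory {
    public static EntityManagerFactory create() {
        Map<String, String> props = new HashMap<String, String>();
        props.put("javax.persistence.jdbc.driver", "org.h2.Driver");
        props.put("javax.persistence.jdbc.url",
                "jdbc:h2:mem:test;DB_CLOSE_DELAY=-1");
        // The schema is generated from the entities and disappears with the JVM
        props.put("hibernate.hbm2ddl.auto", "create-drop");
        return Persistence.createEntityManagerFactory("my-pu", props);
    }
}
```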
As usual the answer is it depends...
Ideal approach (in an ideal world) would probably be your 1st option: maintain everything using JPA annotations and forward-engineer the database artifacts using a utility tool (e.g. the Hibernate Maven plugin).
It depends on the level of support for your database artifacts - not everything belongs in, or is suitable for, annotations. That is why my projects usually maintain both in parallel, using unit tests to keep them in sync.
It also depends on resources available. If you have a dedicated DBA who is responsible for your database then delegating maintenance to her would make sense.
Another consideration is how much database development is really done in JPA. Are there also stored procedures or other non-JPA applications that use the same back end, or maybe you just integrate with another team's database...
If this is an existing application, I would check what you already have. If the database structure is complex, as can be seen from the DDL, and the DDL shows that significant logic is being done in the database itself, then you are better off using plain SQL and letting the DBA maintain your data structures. JPA does not lend itself well to cases where the database structures are already complicated and there is no business benefit to using JPA at that point.
What needs to happen is a project to migrate to JPA. There are a few advantages to that:
Business logic is moved from the database layer (which is harder to scale horizontally) to the application tier.
Java developers are generally cheaper compared to a DBA. Though you still need someone who can do both database thinking and Java thinking to do this properly, and that's rarer.
By reducing the database to a simple datastore, you can break free from vendor lock-in.
If done right, you can have a different database for development (e.g. DB2 Express-C, which is free) and a more robust database for your integration and production environments (e.g. DB2 for z/OS). This allows you to have more developers without worrying as much about licensing costs.
As for schemas being generated and such, there are actually four workflows that can occur:
For design, an Object-Relational (rather than an Entity-Relational) diagram serves as a contract between the application team and the database team. The end result is that the JPA objects will run against the physical data structure that the DBA sets.
For Java application development, just let each developer have their own database and let them blow it up as much as they want. The JPA code will generate the schemas for you.
For database development, the generated schemas and class diagrams are passed on for review by the DBA to see where performance can be improved. Specifically, the DBA is there to specify the indices, which are not available in the JPA standard since it is not cross-database, and to set up the table spaces and all the access controls and schemas for development - but at least the gist of the structure can be taken away from them and passed on to the application team, which gives the application team more flexibility to adapt to changes. What would normally happen is the DBA just includes some generated SQL and then has ALTERs to add additional columns and other things used for purposes outside the application (the JPA structure needs only what is needed by the application; it does not need to map one-to-one, 100%, to the database).
For migration, the DBA needs to do a differential analysis between the two schemas. There's a program called dbsolo (not free) that can do it with most databases. However, if things were done in JPA, the structures are simpler, since in theory there is no longer any business logic in the database, which reduces the complexity of data migrations during upgrades.
The net of it is that you can't just say you're using JPA without involving the whole delivery team, which will have to include a DBA willing to relinquish control and ownership of the structure of the data to the application team while still being part of the design and reviews.

What JDBC tools do you use for synchronization of data sources?

I'm hoping to find out what tools folks use to synchronize data between databases. I'm looking for a JDBC solution that can be used as a command-line tool.
There used to be a tool called Sync4J that used the SyncML framework but this seems to have fallen by the wayside.
I have heard that the Data Replication Service provided by Db4O is really good. It allows you to use Hibernate to back onto an RDBMS - I don't think it supports JDBC, though (http://www.db4o.com/about/productinformation/drs/Default.aspx?AspxAutoDetectCookieSupport=1).
There is an open source project called Daffodil, but I haven't investigated it at all. (https://daffodilreplicator.dev.java.net/)
The one I am currently considering using is called SymmetricDS (http://symmetricds.sourceforge.net/)
There are others, they each do it slightly differently. Some use triggers, some poll, some use intercepting JDBC drivers. You need to decide what technical limitations you are under to determine which one you really want to use.
Wikipedia provides a nice overview of different techniques (http://en.wikipedia.org/wiki/Multi-master_replication) and also provides a link to another alternative DBReplicator (http://dbreplicator.org/).
If you already have a model and DAO layer for your codebase, you can just create your own sync framework; it isn't hard.
Copying data is as simple as:
read an object from database A
remove database metadata (uuid, etc)
insert into database B
Syncing requires some level of knowledge about what has been synced already. You can either do it at runtime, by getting the lists of uuids from TableInA and TableInB and working out which entries are new, or you can keep a table of items that need to be synced (populated by a trigger upon insert/update in TableInA) and run from that. Your tool can be a TimerTask so the databases are kept in sync at the time granularity you desire. A sketch of the uuid-diff variant follows below.
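A rough sketch of the uuid-diff variant in plain JDBC (the item table and its columns are hypothetical, and conflict handling is deliberately left out):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.HashSet;
import java.util.Set;

public class SimpleSync {
    public static void sync(Connection a, Connection b) throws SQLException {
        // 1. Collect the uuids database B already knows about
        Set<String> known = new HashSet<String>();
        try (Statement st = b.createStatement();
             ResultSet rs = st.executeQuery("SELECT uuid FROM item")) {
            while (rs.next()) {
                known.add(rs.getString(1));
            }
        }
        // 2. Copy every row from A whose uuid B has not seen yet
        try (Statement st = a.createStatement();
             ResultSet rs = st.executeQuery("SELECT uuid, payload FROM item");
             PreparedStatement insert = b.prepareStatement(
                     "INSERT INTO item (uuid, payload) VALUES (?, ?)")) {
            while (rs.next()) {
                if (!known.contains(rs.getString("uuid"))) {
                    insert.setString(1, rs.getString("uuid"));
                    insert.setString(2, rs.getString("payload"));
                    insert.executeUpdate();
                }
            }
        }
    }
}
```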
However there is probably some tool out there that does it all without any of this implementation faff, and each implementation would be different based on business needs anyway. In addition at the database level there will be replication tools.
True synchronization requires some data that I hope your database schema has (you can read the SyncML doc to see how they proceed). Sync4J won't help you much; it's really high-level and XML-oriented. If you don't foresee any conflicts (which means: really easy synchronization), you could try a lightweight ETL like Enhydra Octopus.
I'm primarily using Oracle at the moment, and the most full-featured route I've come across is Red Gate's Data Compare:
http://www.red-gate.com/products/oracle-development/data-compare-for-oracle/
This old blog gives a good summary of the solution routes available:
http://www.novell.com/coolsolutions/feature/17995.html
The JDBC-specific offerings I've come across have been very basic. The solution mentioned by Aidos seems the most feature complete if you want to go down the publish-subscribe route:
http://symmetricds.codehaus.org/
Hope this helps.
