My goal is to enable schema and data migration for an existing application.
This kind of question has been asked many times, but, I think, always with requirements and circumstances different from mine.
Since I am inexperienced in this domain, allow me to lay out the architecture of the app and my assumptions first.
Architecture
The app is a multi-user, enterprise desktop application with a backend server that can persist to any major DB (MySQL, PostgreSQL, SQL Server, Oracle DB, etc.). It is assumed the DB is on-premises and maintained by our clients.
The tech stack used is a fairly common Hibernate + Spring + RMI/JMS combination.
Currently, migrations are done by the server in the following way:
On server start it checks for the latest expected schema version
If it is newer than the current version, migrate to the next version repeatedly until current == latest:
Create new database
Load (whole) latest schema (SQL script with a lot of CREATE TABLE ...)
Migrate data (in Java classes using 2 JDBC-Connections to old and new schema)
Load (all) latest constraints (SQL script with a lot of ALTER TABLE ...)
This migration is slow and forward-only, but it is simple. The problem is that, until now, the schema scripts and the queries in the data migrations have used MySQL syntax and features.
Note that by migrate data I mean: the backend server copies the data from the old schema to the new one, transforming it where necessary (see the sketch below).
Also, the migration process starts automatically on our clients' premises. That means we only have control over the JDBC connection; we have no direct access to the database and no knowledge of the specific product being used (MySQL, SQL Server, ...).
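To make the migrate data step concrete, here is a minimal sketch of what one such copy-and-transform step looks like with two JDBC connections; all table and column names are invented:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Statement;

    /** Sketch of one copy-and-transform step; all names are invented. */
    public class MigratePersonStep {
        public static void run(String oldUrl, String newUrl, String user, String pw) throws Exception {
            try (Connection oldDb = DriverManager.getConnection(oldUrl, user, pw);
                 Connection newDb = DriverManager.getConnection(newUrl, user, pw);
                 Statement read = oldDb.createStatement();
                 ResultSet rs = read.executeQuery(
                         "SELECT id, first_name, last_name FROM person");
                 PreparedStatement write = newDb.prepareStatement(
                         "INSERT INTO person (id, full_name) VALUES (?, ?)")) {
                while (rs.next()) {
                    write.setLong(1, rs.getLong("id"));
                    // transformation: two old columns collapse into one new column
                    write.setString(2, rs.getString("first_name") + " " + rs.getString("last_name"));
                    write.addBatch();
                }
                write.executeBatch();
            }
        }
    }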
The goal is to either replace or augment this migration scheme with a database-independent one.
Assumptions and research
StackOverflow 1 2 3 4 5 6 7: Answers suggest using Hibernate's built-in schema update feature. However, the docs state that this is not production-ready. Also, AFAICT, all answers are concerned with schema migration only.
Liquibase: Uses a custom DSL (in XML/JSON/YAML/etc.) to allow for database-independent schema migration only.
DBUnit: Uses a custom XML DSL to capture snapshots of database states. It cannot transform a snapshot taken at schema version 1 into one matching version 2.
Flyway: In principle the same as Liquibase, but not database-independent, because plain SQL scripts are used for migrations.
jOOQ: A database-independent query DSL in Java on top of JDBC, comparable to the Criteria API but without the drawbacks of JPA. It should in principle allow for database-independent data migration (see the sketch after this list); however, it does not help with schema migration.
JPA query languages like HQL, JPQL and the Criteria API are not sufficient, because
One cannot reference tables not mapped by the entity manager, e.g. join tables, metadata and audit tables.
A copy of all versions of the entity classes needs to be kept around for the mapping.
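Regarding jOOQ, here is a minimal sketch of the kind of dialect-agnostic query I mean; the audit table is invented, and note that plain table()/field() references work even for tables no entity maps:

    import static org.jooq.impl.DSL.field;
    import static org.jooq.impl.DSL.table;

    import java.sql.Connection;
    import java.sql.DriverManager;
    import org.jooq.DSLContext;
    import org.jooq.impl.DSL;

    public class JooqAuditQuery {
        public static void main(String[] args) throws Exception {
            // H2 in-memory DB used purely for illustration
            try (Connection con = DriverManager.getConnection("jdbc:h2:mem:demo")) {
                // jOOQ infers the SQL dialect from the connection and renders
                // vendor-appropriate SQL from the same Java code
                DSLContext ctx = DSL.using(con);
                // plain table()/field() references: no entity mapping required,
                // so unmapped join/metadata/audit tables are reachable too
                ctx.select(field("user_id"), field("role"))
                   .from(table("user_role_audit")) // invented audit table
                   .where(field("role").eq("ADMIN"))
                   .fetch()
                   .forEach(System.out::println);
            }
        }
    }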
Question
I realize that, as it stands, this question may be dismissed as opinion-based.
However, I am not necessarily looking for specific solutions to this problem (I doubt a clear solution exists for such a complex problem space) but rather to validate my assumptions.
Namely, is it true, that
Liquibase and Flyway are mainly concerned with schema migration and data migration is left as an exercise for the reader?
in order for Flyway to support multiple, different databases, one needs to duplicate the migration scripts per database?
by and large, the problem of database independent data migration remains unresolved in enterprise Java?
Even if I were to combine Liquibase/Flyway with jOOQ, I do not see how to perform a data migration, because Liquibase/Flyway migrate a database in place: the old schema is destroyed, and with it the opportunity to transform the old data into the new schema.
Thanks for your attention!
Let's break it down a little bit. You're right that this is largely opinion-based, but here's what I've noticed in my experience.
Liquibase and Flyway are mainly concerned with schema migration and data migration is left as an exercise for the reader?
You can do data migration with Liquibase and Flyway; it's something I've done pretty often. Take the example where I want to split a User table into User and Address tables. I'd write a migration script, which is basically just a SQL file, to create the new Address table and then copy all the relevant data into it.
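For reference, Flyway also supports Java-based migrations (classes extending org.flywaydb.core.api.migration.BaseJavaMigration) for cases where the copy needs real logic instead of a plain .sql file. A minimal sketch of that User/Address split in this style; all table and column names are invented:

    package db.migration;

    import java.sql.Statement;
    import org.flywaydb.core.api.migration.BaseJavaMigration;
    import org.flywaydb.core.api.migration.Context;

    /** Sketch of the User -> User + Address split; all names are invented. */
    public class V2__Split_user_address extends BaseJavaMigration {
        @Override
        public void migrate(Context context) throws Exception {
            try (Statement stmt = context.getConnection().createStatement()) {
                stmt.execute("CREATE TABLE address (id BIGINT PRIMARY KEY, "
                        + "user_id BIGINT NOT NULL, street VARCHAR(255), city VARCHAR(255))");
                // copy the relevant data across before dropping the old columns
                stmt.execute("INSERT INTO address (id, user_id, street, city) "
                        + "SELECT id, id, street, city FROM users");
                stmt.execute("ALTER TABLE users DROP COLUMN street");
                stmt.execute("ALTER TABLE users DROP COLUMN city");
            }
        }
    }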
in order for Flyway to support multiple, different databases, one needs to duplicate the migration scripts per database?
Possibly. Flyway and Liquibase are better thought of as database versioning tools: if my app needs version 10 of the database, these tools help me get to that point. Again, the migration scripts are just basic .sql files. If you're using some MySQL-specific functions, those will go in the migration script, and they won't work on SQL Server.
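If only a few scripts differ per vendor, one common mitigation is to keep shared ANSI SQL in a common location and add a vendor-specific folder resolved at startup. A minimal sketch; the folder layout and the crude vendor normalization are my invention, while the fluent configuration calls are standard Flyway API:

    import java.sql.Connection;
    import javax.sql.DataSource;
    import org.flywaydb.core.Flyway;

    public class VendorAwareMigration {
        public static void run(DataSource dataSource) throws Exception {
            // crude vendor key, e.g. "mysql" or "microsoftsqlserver"; a real
            // implementation would map product names to folder names explicitly
            String vendor;
            try (Connection con = dataSource.getConnection()) {
                vendor = con.getMetaData().getDatabaseProductName()
                            .toLowerCase().replace(" ", "");
            }
            Flyway flyway = Flyway.configure()
                    .dataSource(dataSource)
                    // shared ANSI-SQL scripts plus vendor-specific overrides
                    .locations("classpath:db/migration/common",
                               "classpath:db/migration/" + vendor)
                    .load();
            flyway.migrate();
        }
    }

If I recall correctly, Spring Boot offers a similar effect out of the box with a {vendor} placeholder in its Flyway locations property.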
by and large, the problem of database independent data migration remains unresolved in enterprise Java?
Eh, I'm not sure about this one. I agree it's a problem, but in practice it's not a huge one. For the past 8+ years I've only written ANSI SQL, which should be portable everywhere, so in theory we can lift those applications onto a different database. JPA and its various implementations help with the remaining differences. If your project was built, say, as an application with all of its business logic in implementation-specific SQL functions, then it's going to be a headache. If you're using the database for CRUD, and I'd argue that's all you should be using it for, then it's not a huge deal.
So, all that said, I think you might have the wrong idea about Flyway and Liquibase. Like I said earlier, they aren't really 'migration tools' so much as database versioning tools. With an ordered list of specific SQL migration scripts, I can guarantee the state of my database at any version. These are not tools I'd use to 'migrate' a legacy SQL Server based application into a Postgres based application.
Related
As an application gets complicated, one thing that changes a lot is the queries, especially the complex ones. Wouldn't it be easier to maintain the queries in the DB rather than in a resources location inside the package, so that they can be enhanced easily without a code change? What are the drawbacks of this?
You can use stored procedures to save your queries in the database. Then your Java code can just call the procedure instead of building a complex query.
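For example, a minimal sketch of calling a hypothetical procedure find_orders_by_customer from Java via JDBC's CallableStatement; the connection details and all names are invented:

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;

    public class ProcedureCall {
        public static void main(String[] args) throws Exception {
            try (Connection con = DriverManager.getConnection(
                         "jdbc:mysql://localhost/shop", "user", "secret");
                 // JDBC escape syntax {call ...} is translated per vendor
                 CallableStatement cs = con.prepareCall("{call find_orders_by_customer(?)}")) {
                cs.setInt(1, 42); // hypothetical customer id parameter
                try (ResultSet rs = cs.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString("order_number"));
                    }
                }
            }
        }
    }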
See Wikipedia for a more detailed explanation of stored procedures:
https://en.wikipedia.org/wiki/Stored_procedure
You can find details about the implementation and usage in the documentation of your database system (MySQL, MariaDB, Oracle...).
When you decide to move logic into the database, you should use a version control system for databases like Liquibase: https://www.liquibase.org/get-started/quickstart
You can write the changes to your database code in XML, JSON or even YAML and check them in to your version control system (SVN, Git...). This way you have a history of the changes and can roll back to a previous version of your procedure if something goes wrong.
You also asked why some people use stored procedures and others keep their queries in the code.
Stored procedures can encapsulate a query and provide an interface to the data. They can be faster than queries sent from the application. That is good.
But there are also problems:
You distribute the business logic of your application between the database and the program code. It can really be troublesome if the logic is spread across all technical layers of your application.
It is no longer simple to switch from an Oracle database to MariaDB if you use specific features of the database system; you have to migrate or rewrite the procedures.
You have to integrate Liquibase or another such system into your build pipeline to keep track of your database changes.
So it depends on the project and its size which of the solutions is better.
What is the best approach for database creation and relationship management when working with microservices? Hibernate or scripts? I feel it shouldn't be the responsibility of microservices to create a database.
As already pointed out by @Vadim in the comments, it is ultimately the designer's or developer's job to decide what to use.
My two cents from experience: in the long run it is always good to use schema generation scripts, and there are lots of open-source libraries available.
For instance, in Java we have Liquibase and Flyway.
The reason I am saying this is that your DB will undergo a lot of changes in the long run. Hibernate can easily handle creating tables and modifying columns, but sometimes, for example when you add a new column, you may want to fill in the existing records, for which you need to write custom SQL.
Similarly, from time to time you may want to update records from the back end, which is difficult to achieve using Hibernate.
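As a concrete illustration, a minimal sketch of such a hand-written step, assuming an invented users table that gains a display_name column which must be backfilled for existing rows:

    import java.sql.Connection;
    import java.sql.Statement;

    /** Sketch of an add-column-plus-backfill step; names are invented. */
    public class BackfillDisplayName {
        public static void apply(Connection con) throws Exception {
            try (Statement stmt = con.createStatement()) {
                stmt.execute("ALTER TABLE users ADD COLUMN display_name VARCHAR(255)");
                // backfill existing rows -- the part a schema generator cannot infer;
                // note that string concatenation syntax varies by vendor
                stmt.execute("UPDATE users SET display_name = CONCAT(first_name, ' ', last_name)");
            }
        }
    }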
I have observed that DB creation is generally part of the pre-deploy scripts, and schema generation happens during application startup.
My advice is to use a schema generation tool for schema migration and use Hibernate for schema validation, so that the two remain in sync.
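Concretely, the validation half can be a single Hibernate setting, so startup fails fast if entities and schema diverge. A minimal sketch using a plain JPA bootstrap; the persistence-unit name is invented:

    import java.util.Map;
    import javax.persistence.EntityManagerFactory;
    import javax.persistence.Persistence;

    public class ValidateSchema {
        public static EntityManagerFactory bootstrap() {
            // "validate" compares the mapped entities against the live schema
            // and fails on mismatch, instead of silently altering anything
            return Persistence.createEntityManagerFactory(
                    "app-unit", Map.of("hibernate.hbm2ddl.auto", "validate"));
        }
    }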
I currently use jOOQ to generate Java code from my database and Flyway to manage my binary (Java) migrations as well as SQL migrations.
However, I run into problems when I modify existing tables. For example, if I drop a column in one migration and a past binary migration depended on that column, the old migration no longer compiles, because the field no longer exists in the generated jOOQ classes.
I know I could just comment out the body of the old migration, but that kind of defeats the whole purpose of Flyway (or any database version manager) if I can't rerun my migrations, or at least makes it very tedious (run one migration, uncomment, run the next, regenerate jOOQ, etc.).
Is there a better way to approach this problem?
I'd argue this is a workflow problem.
You are effectively upgrading an API with each migration; expecting legacy consumers of that API to continue to work would be nothing short of miraculous.
jOOQ is a great tool but using it in this context (to assist migrations) is certainly going to lead to trouble.
My suggestion would be to rethink your schema evolution strategy: use raw SQL, which comes naturally to Flyway, and leave jOOQ exclusively to assisting your application code instead.
For development and deployment of my WAR application I use the drop-and-create functionality: basically erasing everything from the database and then automatically recreating all the necessary tables and fields according to my @Entity classes.
Obviously, for production the drop-and-create functionality is out of the question. How would I have to create the database tables and fields?
The nice thing about @Entity classes is that, thanks to JPQL and the use of the EntityManager, all the database queries are generated, so the WAR application stays database-independent. If I now had to create the queries by hand in SQL and let the application execute them, I would have to decide which SQL dialect they are in (i.e. MySQL, Oracle, SQL Server, ...). Is there a way to create the tables database-independently? Is there also a way to run structural database updates database-independently (i.e. from database version 1 to database version 2)? Like altering field or table names, adding tables, dropping tables, etc.?
Thank you @Qwerky for mentioning Liquibase. It absolutely is a solution and perfect for my case, as I won't have to worry about versioning anymore. Liquibase is very easy to understand and can be picked up in minutes.
For anyone looking for database versioning / scheme appliance:
Liquibase
I am currently trying to wrap my head around working with JPA. I can't help but feel like I am missing something or doing it the wrong way. It just seems forced so far.
What I think I know so far is that there are a couple of ways to work with JPA, and tools to support them.
You can do everything in Java using annotations, and let JPA (whatever implementation you decide to use) create your schema and update it when changes are made.
You can use a tool to reverse engineer your database and generate the entity classes for you. When the schema is updated you have to regenerate these classes, or manually update them.
There seem to be drawbacks to both, and benefits to both (as with all things). My question is: in an ideal situation, what is the standard workflow with JPA? Most schemas will require updates during the maintenance phase, and especially during the development phase, so how is this handled?
It's not always a good approach to generate the DB schema from the annotated entities. Although in theory it sounds great, in practice the generated schema is often not optimal and would not satisfy an experienced DBA.
The approach I follow in my workflow is to create the entities and the DB schema separately, while still using a pretty intelligent tool for the schema creation: either something like Liquibase, which is database-agnostic and supports revisions, rollbacks, etc., or a custom-baked migration tool that simply runs heavily optimized, DB-specific SQL scripts.
It probably sounds less than ideal to you, but I can assure you it gets the job done and keeps your schema-related code consistent, since, as grigory pointed out, not everything related to the database can be generated from the entities anyway.
It can, however, be useful to generate the schema from the entities for the test database against which unit and integration tests are run. Assuming you're using, say, PostgreSQL in production, you might decide to speed up the unit tests by running some embedded in-memory database like H2, which gets created from the entities before the tests start and disappears automatically (since it was in-memory) after the tests finish executing. This is a very common practice.
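A minimal sketch of that test setup, generating the schema from the entities into an in-memory H2 database; the persistence-unit name is invented:

    import java.util.Map;
    import javax.persistence.EntityManagerFactory;
    import javax.persistence.Persistence;

    public class TestDatabase {
        public static EntityManagerFactory inMemory() {
            return Persistence.createEntityManagerFactory("app-unit", Map.of(
                    "javax.persistence.jdbc.url", "jdbc:h2:mem:test;DB_CLOSE_DELAY=-1",
                    "javax.persistence.jdbc.driver", "org.h2.Driver",
                    // create the schema from the @Entity classes, drop it on close
                    "hibernate.hbm2ddl.auto", "create-drop"));
        }
    }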
As usual, the answer is: it depends...
The ideal approach (in an ideal world) would probably be your first option: maintain everything using JPA annotations and forward-engineer the database artifacts using a utility tool (e.g. the Hibernate Maven plugin).
It depends on the level of support for your database artifacts: not everything either belongs in or is suitable for annotations. That is why my projects usually maintain both in parallel and use unit tests to keep them in sync.
It also depends on the resources available. If you have a dedicated DBA who is responsible for your database, then delegating maintenance to her makes sense.
Another consideration is how much database development is really done in JPA. Are there also stored procedures or other non-JPA applications that use the same back end, or maybe you just integrate with another team's database...
If this is an existing application, I would check what you already have. If the database structure is complex, as can be seen from the DDL, and the DDL shows that significant logic is being done in the database itself, then you are better off using plain SQL and letting the DBA maintain your data structures. JPA does not lend itself well to cases where the database structures are already complicated and there is no business benefit to using JPA.
If you do go down the JPA route, what needs to happen is a project to migrate to JPA. There are a few advantages to that:
Business logic is removed from the database layer (which is harder to scale horizontally) into the application tier.
Java developers are generally cheaper than DBAs. Though you still need someone who can think in both database terms and Java terms to do this properly, and that's rarer.
By reducing the database to a simple datastore, you can break free from vendor lock-in.
If done right, you can have a different database for development (e.g. DB2 Express-C, which is free) and a more robust database for your integration and production environments (e.g. DB2 for z/OS). This allows you to add more developers without worrying as much about licensing costs.
As for schemas being generated and such, there are actually four workflows that can occur:
For design, an Object-Relational (rather than an Entity-Relational) diagram serves as a contract between the application team and the database team. The end result is that the JPA objects will run in the physical data structure that the DBA sets.
For Java application development, just let each developer have their own database and let them blow it up as much as they want. The JPA code will generate the schemas for you.
For database development, the generated schemas and class diagrams are passed on for review by the DBA to see where performance can be improved. Specifically, the DBA is there to specify the indices, which are not available in the JPA standard since it is not cross-database, and to set up the table spaces and all the access controls and schemas for development. But at least the gist of the structure can be taken away from the DBA and passed on to the application team, which gives the application team more flexibility to adapt to changes. What would normally happen is that the DBA includes some generated SQL and then adds alters for additional columns and other things used for purposes outside of the application (the JPA structure needs only what the application needs; it does not need to map one-to-one, 100%, to the database).
For migration, the DBA needs to do a differential analysis between the two schemas. There's a program called dbsolo (not free) that can do this with most databases. However, if things were done in JPA, the structures are simpler, since in theory there is no longer any business logic in the database, which reduces the complexity of data migrations during upgrades.
The net of it is that you can't just say you're using JPA without involving the whole delivery team, which will have to include a DBA willing to relinquish control and ownership of the structure of the data to the application team while still being part of the design and reviews.