Standard Workflow when working with JPA - java

I am currently trying to wrap my head around working with JPA. I can't help but feel like I am missing something or doing it the wrong way. It just seems forced so far.
What I think I know so far is that there are a couple of ways to work with JPA, plus tools to support this.
You can do everything in Java using annotations, and let JPA (whatever implementation you decide to use) create your schema and update it when changes are made.
You can use a tool to reverse engineer your database and generate the entity classes for you. When the schema is updated you have to regenerate these classes, or update them manually.
There seem to be drawbacks and benefits to both (as with all things). My question is: in an ideal situation, what is the standard workflow with JPA? Most schemas will require updates during the maintenance phase and especially during the development phase, so how is this handled?

It's not always a good approach to generate the DB schema from the annotated entities. Although in theory it sounds great, in practice the generated schema is often not optimal and would not satisfy an experienced DBA.
The approach that I follow in my workflow is to create the entities and DB schema separately, while still using a fairly intelligent tool for the schema creation - either something like Liquibase, which is database agnostic and supports revisions, rollbacks, etc., or a custom-baked migration tool that simply runs heavily optimized, DB-specific SQL scripts.
It probably sounds less than ideal to you, but I can assure you it gets the job done and keeps your schema-related code consistent, since, as grigory pointed out, not everything related to the database can be generated from the entities anyway.
It can, however, be useful to generate the schema from the entities for the test database against which unit and integration tests are run. Assuming you're using, say, PostgreSQL in production, you might decide to speed up the unit tests by running them against an embedded in-memory database like H2, which gets created from the entities before the tests start and disappears automatically (since it was in-memory) after the tests finish executing. This is a very common practice.
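For illustration, a minimal sketch of that test setup could look like the following, assuming H2 is on the test classpath and Hibernate is the JPA provider; the persistence unit name "app" and the connection settings are assumptions, not taken from the answer above.

```java
import java.util.HashMap;
import java.util.Map;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class TestPersistence {

    // Creates an EntityManagerFactory backed by an in-memory H2 database whose
    // schema is generated from the annotated entities and dropped on shutdown.
    public static EntityManagerFactory createForTests() {
        Map<String, String> props = new HashMap<>();
        props.put("javax.persistence.jdbc.driver", "org.h2.Driver");
        props.put("javax.persistence.jdbc.url", "jdbc:h2:mem:testdb;DB_CLOSE_DELAY=-1");
        props.put("javax.persistence.jdbc.user", "sa");
        props.put("javax.persistence.jdbc.password", "");
        // Hibernate-specific: build the schema from the entities, drop it afterwards.
        props.put("hibernate.hbm2ddl.auto", "create-drop");
        return Persistence.createEntityManagerFactory("app", props);
    }
}
```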

As usual the answer is it depends...
The ideal approach (in an ideal world) would probably be your first option: maintain everything using JPA annotations and forward-engineer the database artifacts using a utility tool (e.g. the Hibernate Maven plugin).
It depends on the level of support for your database artifacts - not everything belongs in, or is suitable for, annotations. That is why my projects usually maintain both in parallel and use unit tests to keep them in sync.
It also depends on resources available. If you have a dedicated DBA who is responsible for your database then delegating maintenance to her would make sense.
Another consideration is how much of the database development is really done in JPA. Are there also stored procedures or other non-JPA applications that use the same back end, or maybe you just integrate with another team's database...

If this is an existing application, I would check what you already have. If the database structure is complex, as can be seen from the DDL, and the DDL shows that significant logic is being done in the database itself, then you are better off using plain SQL and letting the DBA maintain your data structures. JPA does not lend itself well to situations where the database structures are already complicated, and there is no business benefit to using JPA at that point.
If you do want to use JPA, what needs to happen is a project to migrate to it. There are a few advantages to that:
Business Logic is removed from the database layer (which is harder to scale horizontally) to the application tier.
Java developers are generally cheaper than DBAs, though you still need someone who can think in both database and Java terms to do this properly, and that is rarer.
By reducing the database to a simple datastore, you can free yourself from vendor lock-in.
If done right, you can have a different database for development (can be DB2 Express C which is free) and have a more robust database for your integration and production environments (e.g. DB/2 for zOS). This allows you to be able to have more developers without worrying about licensing costs as much.
As for schemas being generated and such, there are actually four workflows that can occur:
For design, an Object-Relational (rather than an Entity-Relational) diagram serves as a contract between the application team and the database team. The end result is the JPA objects will run in the physical data structure that the DBA sets.
For Java application development, just let each developer have their own database and let them blow it up as much as they want. The JPA code will generate the schemas for you.
For database development, the generated schemas and class diagrams are passed on for review by the DBA to see where performance can be improved. Specifically, the DBAs are there to specify the indexes, which are not available in the JPA standard since indexing is not cross-database. They are also there to set up the tablespaces, access controls, and schemas for development, but at least the gist of the structure can be taken from them and passed on to the application team, which gives the application team more flexibility to adapt to changes. What would normally happen is that the DBA includes some generated SQL and then adds alters for additional columns and other things used for purposes outside the application (the JPA structure needs only what the application needs; it does not have to map one-to-one, 100%, to the database).
For migration, the DBA needs to do a differential analysis between the two schemas. There's a program called dbsolo (not free) that can do it with most databases. However, if things were done in JPA, the structures are simpler since in theory there is no longer any business logic on the database thus reducing the complexity of data migrations due to upgrades.
The net of it is that you can't just say you're using JPA without involving the whole delivery team, which has to include a DBA willing to relinquish control and ownership of the data structures to the application team while still being part of the design and reviews.

Related

Working with microservices. Hibernate or Scripts

What is the best approach for database creation and relationship management when working with microservices? Hibernate or scripts? I feel it shouldn't be the responsibility of microservices to create a database.
As already pointed out by #Vadim in the comments, it is ultimately the designer's or developer's job to decide what to use.
My two cents from experience: in the long run it is always good to use schema generation scripts, and there are lots of open-source libraries available.
For instance, in Java we have Liquibase and Flyway.
The reason I say this is that your DB will undergo a lot of changes in the long run. Hibernate can easily handle creating tables and modifying columns, but sometimes, for example when you add a new column, you may want to backfill the existing records, for which you may need to write custom SQL.
Similarly, from time to time you may want to update records from the back end, which is difficult to achieve using Hibernate.
I have observed that DB creation generally is part of pre-deploy scripts and schema generation happens during application startup.
My advice is to use a schema migration tool for the schema changes and use Hibernate only for schema validation, so that the two remain in sync.
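As a rough sketch of that split (the JDBC URL, credentials, and persistence unit name are placeholders, and the fluent API shown is the Flyway 5+ style): run the migration tool at startup, then let Hibernate only validate that the entities still match the schema.

```java
import java.util.HashMap;
import java.util.Map;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;
import org.flywaydb.core.Flyway;

public class Bootstrap {

    public static EntityManagerFactory start() {
        // 1. Apply the versioned SQL migrations (by default from classpath:db/migration).
        Flyway flyway = Flyway.configure()
                .dataSource("jdbc:postgresql://localhost/app", "app", "secret")
                .load();
        flyway.migrate();

        // 2. Let Hibernate check, but never change, the resulting schema.
        Map<String, String> props = new HashMap<>();
        props.put("hibernate.hbm2ddl.auto", "validate");
        return Persistence.createEntityManagerFactory("app", props);
    }
}
```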

How to make reporting section of site database independent

I need to build a reporting section of my site that consists of some decently complicated queries including things like UNION, GROUP_CONCAT, etc. JPA integration with my entities has maintained database independence so far. Currently the system uses MsSQL, but we want to be sure later we can switch to Postgres or MySQL if needed.
What's a good approach to take with these reports so that without too much work I can make it work on MySQL or Postgres?
The site also uses Spring
As I see it, your question is more or less "how can I make use of vendor specific features without becoming tied to that vendor?".
There is no easy answer to this; probably the most flexible option would be to stick to JPA and accept the performance hit.
Other possibilities:
Define the reports as a component that publishes a set of interfaces. Use CDI to inject the implementation related to your DB of choice.
A variation of the above: set up your own DAO interfaces for data access, or another ORM framework that, being more specific, can offer better performance. Build the reports on top of that.
If your business allows for it, choose one RDBMS to work with for reports. During the night (maybe even on demand, if there is not too much data), dump your production database into it.
The best option is to use data access objects, one per database, each conforming to a common interface. Any client code can use the common DAO interface without needing to know about the underlying database.
With Spring it is easy to swap between the DAO implementation classes as a configuration option, e.g. if you have a CustomerDao interface that has Oracle and DB2 implementations, then use either:
<bean id="customerDao" class="my.package.customer.OracleCustomerDao"/>
or
<bean id="customerDao" class="my.package.customer.Db2CustomerDao"/>
If you want your SQL to work across multiple databases, then here's a plan you can follow:
Test your SQL across multiple databases
To get portability, you need to write portable SQL. The only way to ensure that your SQL is portable is to check it for portability.
If you stick to using standard SQL, then this should be fairly straightforward; you won't be able to use database-specific features, but there is a huge amount you can do without them (they're mostly syntactic sugar, or for stepping outside the relational model, which you hopefully won't need to do). If you've already strayed into using nonstandard SQL, then it may be very hard to get to the point where you can do this, but if you start off working in a disciplined way, I would be optimistic about your ability to stick to standards.
If you're working on SQL Server, then PostgreSQL would be a good choice for a second database to test against, as it's free, easy to set up, and very capable.

JPA Native Queries versus 'pure' JPA persistence

I have a scenario wherein I need to keep a log of all incoming files (flat, XML) to an application. This log table is hardly used, except for fault investigation or regulatory purposes and things like that, and the data will be purged regularly.
We are using JPA 2.0 for persistence. We tried the initial prototype with pure JPA persistence, using entityManager.persist() and flushing immediately, but the performance was not up to expectations. So I suggested NativeNamedQueries for this operation, and the performance improvement was huge (300 milliseconds vs 47 milliseconds) in tests.
But the lead engineer is a bit adamant against using NativeNamedQueries, saying that they're coupled to the database, less maintainable, and things like that.
Questions :
What is your take on this if you had to make the decision? How often do database or schema changes happen once the application goes to production?
Is there any other way to improve performance? Performance is very very critical for this application.
It's only been 4 years since I started programming, but I have never seen a DB schema change or DB provider change happen for an existing application.
Note: We are using EclipseLink 2.3 and Oracle. Also, it's a fresh application that we are developing, just in case these points make the question clearer.
How often do database or schema changes happen once the application goes to production?
This is immaterial to your problem at hand. The quantity of changes to database schemas does not matter. What matters is the maintainability of your database model, how well it has been designed. Most business apps will see a lot of changes being done if sufficient performance testing hasn't been done, which is sadly true for most apps.
If you are writing a typical line-of-business application, I would expect some form of round-trip engineering between the object model and the database model to occur during development. Your DBAs ought to own and know the database model quite well, so that they can aid in, or perform, the fine-tuning of the queries issued by your ORM framework. Keep in mind that you may not be able to rely on the queries issued by the ORM framework alone. All changes should preferably be done and tested in the development and integration-testing (and possibly UAT, if you have one) environments before they are rolled out to production, and, as common sense would suggest, all changes would be under version control.
On the topic of coupling the queries to a database, that is a decision your business has to make. If you are in the business of supporting multiple databases, then you ought to be testing against all of them. Also, you should be capable of providing different distributions supporting different databases; this is made easier if you place your native queries in database-specific orm.xml files like orm-oracle.xml, orm-mysql.xml, etc., and rename the appropriate file to orm.xml before you prepare a distribution. Using Maven or Ant would make the proposed change easy to implement.
Is there any other way to improve performance? Performance is very very critical for this application.
That would depend on how well you have designed your object and data models, how well you've understood your ORM framework and how willing you are in "corrupting" your object model.
The first bit of performance tuning any application is to always measure twice and cut once. You cannot simply iterate through a list of possible solutions and try each one of them without knowing how they work and in what circumstances they are useful; okay, you could do that if your business is willing to invest time in that, but it is often not the case.
To begin, you'll need to understand why native queries are providing, or appear* to provide, better performance. Maybe this has a lot to do with the fact that you are merely inserting data, and it would be better for an ORM framework to simply issue the INSERT statement rather than construct one from HQL or the abstract query notation used under the hood; only a profiler will reveal the difference.
If the above is true, then you could reconsider whether your audit tables must be managed by the ORM framework. If your application is responsible for only writing to these tables and not reading from them (and it is quite possible that another app is responsible for reading the entries), then I would suspect that not managing these tables in ORM would provide better performance, especially if you use plain JDBC to issue the INSERT statement. The reason is quite simple - if your ORM framework is managing the entity, then it is also responsible for managing the persistence context (which now includes the class and the associated table); not having ORM manage the entity would possibly result in the scenario where the persistence context need not be updated at all for audit entries.
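A minimal sketch of what writing such an audit entry with plain JDBC might look like; the table and column names are invented for illustration and are not part of the original question.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import javax.sql.DataSource;

public class FileAuditLogger {

    private final DataSource dataSource;

    public FileAuditLogger(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Bypasses the persistence context entirely: one INSERT, no managed entity.
    public void logIncomingFile(String fileName, String fileType) {
        String sql = "INSERT INTO incoming_file_log (file_name, file_type, received_at) VALUES (?, ?, ?)";
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, fileName);
            ps.setString(2, fileType);
            ps.setTimestamp(3, new Timestamp(System.currentTimeMillis()));
            ps.executeUpdate();
        } catch (SQLException e) {
            throw new RuntimeException("Failed to write audit entry", e);
        }
    }
}
```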
There is a healthy possibility of other performance tuning measures that you can undertake, but like I stated earlier, it would require you to understand a profiler report and estimate which possible choices would be better in your application.
* I'm afraid that unless you publish benchmarks and how you conducted them I will be skeptical of claims.
It's quite rare that you actually DO switch the database provider, especially once you've paid several hundred thousand in license fees for an excellent and high-performance database like Oracle. Besides, the SQL syntax variants of the INSERT statement are not so different that you wouldn't be able to switch databases later, even if you use native SQL in a few exceptional places.
I don't see why patching a single query that needs extra tuning is bad. Ask your lead developer why he's so strict. But before you do, use a profiler, such as JProfiler, or Yourkit to identify the exact spot that's causing the performance issues. With JPA, any of these may cause issues: caching, eager loading of dependent data (which you wouldn't need, probably), inefficient SQL generation, a bad query execution plan in your Oracle database, etc... Maybe you don't need a native query after all.
If performance is so critical, then maybe JPA is not good enough for the job. Have you (and your lead developer) considered other frameworks such as jOOQ, QueryDSL, MyBatis or anything similar? I have understood from your comments that your main use-cases are OLAP-querying, and not OLTP, hence you might even like to use advanced Oracle features, such as analytic functions and data-warehousing functionality, for which jOOQ has native support, for instance...
1) In 10 years I have seen only 2 applications that moved from Oracle to MySQL (to save on license costs), so it's not something that happens very often, BUT if you want to write integration tests against another database (e.g. HSQLDB) you'll be in trouble.
As for how often the schema changes after an app goes to production, my answer is: A LOT!! If the app is updated regularly, expect LOTS of changes, as the team usually comes to understand the business better. I even worked on a project in which the schema was considerably different one year after the app went live.
At the same time, it looks like you deferred optimization until the last possible moment (a good thing to do) and now you need to optimize the SQL using some native queries (which also happens quite regularly)... What I'm trying to say is that your idea doesn't sound bad at all to me.
2) In the past I've used a mix of Hibernate and iBATIS (or MyBatis nowadays) for similar situations, in case you want to check out iBATIS. And one question: why are you doing a flush() after each persist()? You shouldn't really need to do that.
Also, I'm quite surprised that the inserts take so much longer when done through EclipseLink. The calls to persist() should take almost the same amount of time as a native query (I assume they'll take longer if there are any lifecycle callbacks). I assume you've seen the SQL generated by EclipseLink; is it that different?
I know my answer is not specific at all, but I hope it helps.

Strategies for dealing with constantly changing requirements for MySQL schemas?

I'm using Hibernate EntityManager and Hibernate Annotations for ORM in a very early stage project. The project needs to launch soon, but the specs are changing constantly and I am concerned that the system will be launched and live data will be collected, and then the specs will change again and I will be in a situation where I need to change the database schema.
How can I set things up in order to minimize the impact of this? Are there any open source projects that deal with this kind of migration? Can Hibernate do this automatically (without wiping the database)?
Your advice is much appreciated.
It's more a functional or organizational problem than a technical one. No tool will automatically guess how to migrate data from one schema to another. You'd better learn how to write stored procedures in order to migrate your data.
You'll probably need to disable constraints, create temporary tables and columns, copy lots of data, and then delete the temporary tables and columns and re-enable the constraints in order to migrate your data.
Once in maintenance mode, every new feature that modifies the schema should also come with a script that migrates the current schema and data in production to the new one.
No system can possibly create data migration scripts automatically from just the original and the final schema. There just isn't enough information.
Consider, for example, a new column. Should it just contain a default value? Or a value calculated from other fields/tables?
There is a good book about refactoring databases: http://www.amazon.com/Refactoring-Databases-Evolutionary-Addison-Wesley-Signature/dp/0321774515/ref=sr_1_1?ie=UTF8&qid=1300140045&sr=8-1
But there is little to no tool support for this kind of stuff.
I think the best thing you can do in advance:
Don't let anybody access the database but your application
If something else absolutely must access the DB directly, give it a separate set of views specifically for that purpose. This allows you to change your table structure while keeping at least the structure of what other systems see stable.
Have tons of tests. I just posted an article which (with the upcoming 2nd and 3rd parts) might help a little with this: http://blog.schauderhaft.de/2011/03/13/testing-databases-with-junit-and-hibernate-part-1-one-to-rule-them/
Hibernate can update the database schema from the entity model without wiping the data in the database. So do that, and write migration code in Java which sets or removes data relationships.
This works, and we have done it multiple times. But of course, try to follow a flexible development process; make what you know for sure first, then reevaluate the requirements - scrum etc.
In your case, I would recommend a NoSQL database. I don't have much experience with that kind of database, so I can't recommend any particular implementation; you may want to check this yourself too.

Hibernate or JDBC

I have a thick client, java swing application with a schema of 25 tables and ~15 JInternalFrames (data entry forms for the tables). I need to make a design choice of straight JDBC or ORM (hibernate with spring framework in this case) for DBMS interaction. Build out of the application will occur in the future.
Would hibernate be overkill for a project of this size? An explanation of either yes or no answer would be much appreciated (or even a different approach if warranted).
TIA.
Good question with no single simple answer.
I used to be a big fan of Hibernate after using it in multiple projects over multiple years.
I used to believe that any project should default to hibernate.
Today I am not so sure.
Hibernate (and JPA) is great for some things, especially early in the development cycle.
It is much faster to get to something working with Hibernate than it is with JDBC.
You get a lot of features for free - caching, optimistic locking and so on.
On the other hand it has some hidden costs. Hibernate is deceptively simple when you start. Follow some tutorial, put some annotations on your class - and you've got yourself persistence. But it's not simple, and writing good code with it requires a good understanding of both its internal workings and database design. If you are just starting out you may not be aware of some issues that may bite you later on, so here is an incomplete list.
Performance
The runtime performance is good enough; I have yet to see a situation where Hibernate was the reason for poor performance in production. The problem is the startup performance and how it affects your unit test times and development speed. When Hibernate loads, it analyzes all entities and does a lot of pre-caching - this can take about 5-15 seconds for a not very big application. So your 1-second unit test is now going to take 11 seconds. Not fun.
Database Independency
It is very cool as long as you don't need to do some fine tuning on the database.
In-memory Session
For every transaction Hibernate will store an object in memory for every database row it "touches". It's a nice optimization when you are doing some simple data entry. If you need to process lots of objects for some reason though, it can seriously affect performance, unless you explicitly and carefully clean up the in-memory session on your own.
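A hedged sketch of that clean-up during bulk processing (the batch size and the use of a raw item list are assumptions made for illustration):

```java
import java.util.List;
import javax.persistence.EntityManager;

public class BulkImporter {

    // Flush and clear periodically so the in-memory session does not keep
    // one managed object per inserted row for the whole transaction.
    public void importAll(EntityManager em, List<?> items) {
        int batchSize = 50; // assumption; tune for your setup
        for (int i = 0; i < items.size(); i++) {
            em.persist(items.get(i));
            if (i > 0 && i % batchSize == 0) {
                em.flush();  // push the pending inserts to the database
                em.clear();  // detach everything, freeing session memory
            }
        }
    }
}
```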
Cascades
Cascades allow you to simplify working with object graphs. For example, if you have a root object and some children and you save the root object, you can configure Hibernate to save the children as well. The problem starts when your object graph grows complex. Unless you are extremely careful and have a good understanding of what goes on internally, it's easy to mess this up. And when you do, it is very hard to debug these problems.
Lazy Loading
Lazy loading means that when you load an object, Hibernate will not load all of its related objects but will instead provide placeholders which are resolved as soon as you try to access them. A great optimization, right? It is, except you need to be aware of this behaviour, otherwise you will get cryptic errors. Google "LazyInitializationException" for an example. And be careful with performance: depending on the order in which you load your objects and the shape of your object graph, you may hit the "n+1 selects problem". Google it for more information.
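A small illustration of the n+1 issue and the usual fix (the Author/Book entities are invented for this sketch): touching a lazy collection inside a loop fires one extra select per parent, whereas a fetch join loads everything in a single query.

```java
import java.util.ArrayList;
import java.util.List;
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.FetchType;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToOne;
import javax.persistence.OneToMany;

@Entity
class Author {
    @Id @GeneratedValue Long id;
    String name;

    @OneToMany(mappedBy = "author", fetch = FetchType.LAZY)
    List<Book> books = new ArrayList<>();
}

@Entity
class Book {
    @Id @GeneratedValue Long id;
    String title;

    @ManyToOne Author author;
}

class BookReport {
    static void printBookCounts(EntityManager em) {
        // Without the "join fetch", accessing a.books inside the loop would
        // trigger one extra select per author (the n+1 selects problem).
        List<Author> authors = em.createQuery(
                "select distinct a from Author a left join fetch a.books",
                Author.class).getResultList();
        for (Author a : authors) {
            System.out.println(a.name + ": " + a.books.size());
        }
    }
}
```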
Schema Upgrades
Hibernate allows easy schema changes by just refactoring the Java code and restarting. That's great when you start. But then you release version one, and unless you want to lose your customers you need to provide them with schema upgrade scripts, which means no more simple refactoring, as all schema changes must be done in SQL.
Views and Stored Procedures
Hibernate requires exclusive write access to the data it works with, which means you can't really use views, stored procedures, and triggers, as those can change data without Hibernate being aware of it. You can have external processes writing data to the database in separate transactions, but if you do, your cache will contain stale data - one more thing to care about.
Single Threaded Sessions
Hibernate sessions are single threaded. Any object loaded through a session can only be accessed (including reading) from the same thread. This is acceptable for server-side applications but might complicate things unnecessarily if you are writing a GUI-based application.
I guess my point is that there are no free meals.
Hibernate is a good tool, but it's a complex tool, and it requires time to understand it properly. If you or your team members don't have that knowledge, it might be simpler and faster to go with pure JDBC (or Spring JDBC) for a single application. On the other hand, if you are willing to invest time in learning it (including learning by doing and debugging), then in the future you will be able to understand the tradeoffs better.
Hibernate can be good but it and other JPA ORMs tend to dictate your database structure to a degree. For example, composite primary keys can be done in Hibernate/JPA but they're a little awkward. There are other examples.
If you're comfortable with SQL I would strongly suggest you take a look at Ibatis. It can do 90%+ of what Hibernate can but is far simpler in implementation.
I can't think of a single reason why I'd ever choose straight JDBC (or even Spring JDBC) over Ibatis. Hibernate is a more complex choice.
Take a look at the Spring and Ibatis Tutorial.
No doubt Hibernate has its complexity.
But what I really like about the Hibernate approach (and some others too) is that the conceptual model you get in Java is better. Although I don't think of OO as a panacea, and I don't look for theoretical purity in the design, I have found many times that OO does in fact simplify my code. As you asked specifically for details, here are some examples:
the added complexity is not in the model and entities, but in your framework for manipulating all entities for example. For maintainers, the hard part is not a few framework classes but your model, so Hibernate allows you to keep the hard part (the model) at its cleanest.
if a field (like an id, or audit fields, etc.) is used in all your entities, then you can create a superclass with it (see the sketch after these points). Therefore:
you write less code, but more importantly ...
there are fewer concepts in your model (each unique concept is defined in a single place in the code)
for free, you can write more generic code that, given any entity (unknown type, no type-switching or casts), can access the id.
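A minimal sketch of that shared superclass (class and field names are illustrative): the id mapping is declared once and every entity inherits it.

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.MappedSuperclass;

// Declared once; not a table of its own, just inherited mapping.
@MappedSuperclass
abstract class BaseEntity {
    @Id
    @GeneratedValue
    Long id;

    public Long getId() {
        return id;
    }
}

// Every concrete entity simply extends it and gains the id for free.
@Entity
class Invoice extends BaseEntity {
    String number;
}
```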
Hibernate also has many features to deal with other model characteristics you might need (now or later; add them only as needed). Take it as an extensibility quality of your design.
You might replace inheritance (subclassing) with composition (several entities sharing the same member type, which contains a few related fields that happen to be needed in several entities).
There can be inheritance between a few of your entities. It often happens that you have two tables that have pretty much the same structure (but you don't want to store all the data in one table, because you would lose referential integrity to a different parent table).
With reuse between your entities (only appropriate inheritance, plus composition), there are usually some additional advantages. Examples:
there is often some way to read the data of the entities that is similar but different. Suppose I read the "title" field for three entities, but for some I replace the result with a differing default value if it is null. It is easy to have a signature "getActualTitle" (in a superclass or an interface) and implement the default value handling in the three implementations. That means the code outside my entities just deals with the concept of an "actual title" (I made this functional concept explicit), and method inheritance takes care of executing the correct code (no more switch or if, no code duplication); see the sketch after this list.
...
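A hedged sketch of that "actual title" idea (the class names and default values are invented): callers only know about getActualTitle(), and each entity owns its own fallback.

```java
interface Titled {
    String getActualTitle();
}

class Article implements Titled {
    String title;

    @Override
    public String getActualTitle() {
        // Default-value handling lives inside the entity, not in callers.
        return title != null ? title : "(untitled article)";
    }
}

class Page implements Titled {
    String title;

    @Override
    public String getActualTitle() {
        return title != null ? title : "New page";
    }
}
```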
Over time, the requirements evolve. There will be a point where your database structure has problems. With JDBC alone, any change to the database must impact the code (i.e. double the cost). With Hibernate, many changes can be absorbed by changing only the mapping, not the code. The same happens the other way around: Hibernate lets you change your code (between versions, for example) without altering your database (by changing the mapping, although that is not always sufficient). To summarize, Hibernate lets you evolve your database and your code independently.
For all these reasons, I would choose Hibernate :-)
I think either is a fine choice, but personally I would use hibernate. I don't think hibernate is overkill for a project of that size.
Where Hibernate really shines for me is dealing with relationships between entities/tables. Doing JDBC by hand can take a lot of code if you deal with modifying parent and children (grandchildren, siblings, etc) at the same time. Hibernate can make this a breeze (often a single save of the parent entity is enough).
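A minimal sketch of that single-save behaviour (Parent/Child are illustrative names, not from the original answer): with a cascading one-to-many mapping, persisting the parent also persists every child in the collection.

```java
import java.util.ArrayList;
import java.util.List;
import javax.persistence.CascadeType;
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToOne;
import javax.persistence.OneToMany;

@Entity
class Parent {
    @Id @GeneratedValue Long id;

    // Operations on Parent cascade to its children.
    @OneToMany(mappedBy = "parent", cascade = CascadeType.ALL)
    List<Child> children = new ArrayList<>();
}

@Entity
class Child {
    @Id @GeneratedValue Long id;

    @ManyToOne Parent parent;
}

class SaveExample {
    static void saveFamily(EntityManager em, Parent parent) {
        // One call: the parent and all of its new children are inserted.
        em.persist(parent);
    }
}
```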
There are certainly complexities when dealing with Hibernate though, such as understanding how the Session flushing works, and dealing with lazy loading.
Straight JDBC would fit the simplest cases at best.
If you want to stay within Java and OOD then going Hibernate or Hibernate/JPA or any-other-JPA-provider/JPA should be your choice.
If you are more comfortable with SQL then having Spring for JDBC templates and other SQL-oriented frameworks won't hurt.
In contrast, besides transactional control, there is not much help from having Spring when working with JPA.
Hibernate is best suited for middleware applications. Assume we build a middleware layer on top of the database and that middleware is accessed by around 20 applications; in that case one Hibernate layer can satisfy the requirements of all 20 applications.
In JDBC, if we open a database connection we need to wrap the code in a try block, handle any exception in the catch block, and close the connection in the finally block.
In JDBC the relevant exceptions are checked exceptions, so we must write try/catch/throws code everywhere, whereas Hibernate only throws unchecked exceptions.
With JDBC, the programmer must close the connection, or risk running out of connections.
If we don't close the connection in the finally block, JDBC is not responsible for closing it for us.
In JDBC we need to write SQL statements in various places; if the table structure is modified after the program is written, the JDBC code breaks, and we need to modify, recompile, and redeploy it, which is tedious.
JDBC surfaces database-specific error codes when an exception occurs, and Java programmers are generally not familiar with those codes.
While inserting a record, if the target table does not exist in the database, JDBC raises an error like "table or view does not exist" and throws an exception; with Hibernate, if it is configured to manage the schema and a table is missing, it can create the table for us.
Hibernate supports both lazy and eager loading of associations; with JDBC you have to hand-code whichever loading strategy you need.
Hibernate supports Inheritance, Associations, Collections
In Hibernate, if we save a derived class object, its base class state is also stored in the database; in other words, Hibernate supports inheritance mapping.
Hibernate supports relationships like One-To-Many, One-To-One, Many-To-Many, and Many-To-One.
Hibernate supports a caching mechanism; this reduces the number of round trips between the application and the database, which automatically improves application performance.
Getting pagination in Hibernate is quite simple (see the sketch after this list).
Hibernate can generate primary keys automatically when records are stored in the database.
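As an illustration of the last two points (the Product entity and the query are made up for this sketch): the primary key is generated on persist, and pagination is just setFirstResult/setMaxResults on the query.

```java
import java.util.List;
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;

@Entity
class Product {
    // Hibernate assigns the primary key automatically when the entity is persisted.
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    Long id;

    String name;
}

class ProductQueries {
    // Pagination: fetch one page of results instead of the whole table.
    static List<Product> findPage(EntityManager em, int page, int pageSize) {
        return em.createQuery("select p from Product p order by p.id", Product.class)
                 .setFirstResult(page * pageSize) // offset
                 .setMaxResults(pageSize)         // page size
                 .getResultList();
    }
}
```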
... In-memory Session ... LazyInitializationException ...
You could look at Ebean ORM which doesn't use session objects ... and where lazy loading just works. Certainly an option, not overkill, and will be simpler to understand.
If billions of users hit our app or website, then with plain JDBC the query gets executed for every request, but with Hibernate (thanks to its caching) the query may be executed only once regardless of the number of users; this is the most important and easily obtained advantage of Hibernate over JDBC.
