in-memory DBs evaluation

in-memory DBs evaluation - java

I am trying to increase the overall Integration test execution time and I am currently evaluating various in-memory db solutions. The idea is to have DAOs hit in-mem db during the tests as opposed to hitting a real DB. This is a java app using Hibernate for persistence.
I'd be interested to see your experience with one of these products H2, Derby, HSQLDB, Oracle Berkeley DB.
Some of my concerns are: will in-mem DBs be able to execute stored procedures, custom native sql? Can you selectively choose which one of your services should hit real DB vs in mem DB?
And overall, since this approach involves DB bootstrapping(pre-load/pre-create all tables with data) I am now thinking if it'd be simply easier to just mock out the DAO layer and not even worry about all the unknown problems that in mem DB may bring...
thanks.

My suggestion is to test everything, including the DAO layer as you mention. But see if you can test it in pieces. Services, DAOs, UI.
For service layer testing, mock out the DAOs. That way the service layer tests are independent of whether the DAOs are working. If the service layer tests are using DAOs and using a real database then I'd argue that it's not really a Unit test but an Integration test. Although those are valuable too, if they fail it doesn't pinpoint the problem like a Unit test.
For our DAO layer tests we use DbUnit with HSQLDB. (Using Unitils helps if you are using Spring/Hibernate/DbUnit to tie it all together.) Our DAO tests execute nice and quickly (which is important when you have 500+ tests). The memory db schema is built from our schema creation scripts so as a side effect we are testing those as well. We load/refresh a known set of data from some flat files into the memory database. (Compared to when we were using the DEV database and some data would get removed which then broke tests). This solution is working great for us and I would recommend it to anyone.
Note, however, that we are not able to test the DAO that uses a stored proc this way (but we only have one). I disagree somewhat with the poster who mentioned that using different databases is "bad" -- just be aware of the differences and know the implications of doing so.
You didn't mention if you are using Hibernate or not -- that is one important factor in that it abstracts us away from modifying any SQL that may be specific to Oracle or SQLServer or HSQLDB which another poster mentioned.

Mock out the DAO layer.
Despite what some claim unless you are just using trivial sql the subtle implementation differences and differing feature sets between databases will limit what you can do (stored procedures, views etc.) and also to some extent invalidate the tests.
My personal mocking framework of choice is Mockito. But there are lots that do the job and mocking out the DAO is standard practice so you'll find lots of documentation.

It is bad idea to have different databases for unit-testing and for production.
BTW, testing in real database should be fast, probably you are doing something wrong in your tests.

I just came across Oracle Times Ten in mem db.
http://www.oracle.com/technology/products/timesten/index.html
This may seem like possibly the most painless solution. Since no additional mocking/configuration is required. You still have all of your Integration tests intact hitting the DB but now the data is delivered faster. What do you guys think ?

Related

DAO layer testing strategy with JUnit and Spring

I'm trying to devise an optimal strategy to unit-test DAO layer of my Spring app.
Many existing approaches like in-memory DB usage, etc (posts: 12289800, 12390813, 9940010, 12801926). do not appeal to me.
So, here is a straightforward way that occurs to me:
Create Spring test-context.xml and put there all the data needed for testing all the DAO classes;
For each test class create a template method to test CRUD operations and all 'select' operations;
Before testing, insert all needed data from test-context.xml to your real DB. We may need also some dependencies (references), so insert them as well, let's say in #Before method.
After all CRUD operations, delete all dependencies (references) from DB, let's say in #After method.
If we have a lot of dependencies, this may become a terribly expensive and laborious approach. Also we have only one #Test method (template method, to ensure the order of operations: create, read... etc.) - so one test per test class.
So, I need an advice whether this strategy is viable? What similar did you do to test your DAOs?

After all, I ended up with this strategy for testing classes responsible for interacting with the database in Spring-based app. Key thoughts:
Use an in-memory DB (H2 is OK), a separate Spring profile with test data source and settings.
Database is set up at the beginning of the entire test process from the schema.sql scripts. So we need to have the sql sources to rebuild the test database. Possibly it comes from DBA or yourself if you are designing the database on your own. Tools like liquibase or flyway are if you work with the database in a large team, where everybody needs the actual state of the database by applying incremental scripts. In this way the results setup script in managed by the tool.
Obviously each test case will require its own set of data to be initialized before executing the test. We do it by making sample.sql/clear_sample.sql scripts (a pair for each test case) to insert and delete data before and after each case. For this we can use either spring's annotations: #Sql or ScriptUtils
To help designing tests we can inject EntityManager for example for retrieving inserted with the help of sql scripts.
Basic JUnit asserts are used to compare.
Thus, we have no additional software layer, like DbUnit or anything and write isolated and maintainable unit-tests.
The unavoidable downside is that when a more or less significant change comes to the DB we need to rewrite the whole test, or even several.

How to setup an in memory db schema for test as close as possible of the production db

This question is extracted from a comment I posted here:
What's the best strategy for unit-testing database-driven applications?
So I have a huge database schema for a legacy application (with quite an old code base) that has many tables, synonyms, triggers, and dblinks. We and we have (finally) started to test some part of the application.
Our tests are already using mocks, but in order to test the queries that we are using we have decided to use an in-memory db with short-lived test dataset.
But the setup of the in-memory database requires a specific SQL script for the db schema setup. The script is not the real DDL we have in production because we can not import it directly.
To make things harder, the database contains functions and procedures that needs to be implemented in Java (we use the h2 db, and that is the way to declare procedures).
I'm afraid that our test won't break the day the real db will change and we will spot the problem only at runtime, potentially in production.
I know that our tests are quite at the border between integration and unit. However with the current architecture it is quite hard to insulate the test from the db. And we want to have proper tests for the db queries (no ORM inside).
What would be solution to have a DDL as close as possible of the real one and without the need to manually maintain it ?

If your environments are Dockerized I would highly suggest checking out Testcontainers (https://www.testcontainers.org/modules/databases/). We have used it to replace in-memory databases in our tests with database instances created from production DDL scripts.
Additionally, you can use tmpfs mounting to get performance levels similar to in-memory databases. This is nicely explained in following post from Vlad Mihalcea: https://vladmihalcea.com/how-to-run-integration-tests-at-warp-speed-with-docker-and-tmpfs/.
This combination works great for our purposes (especially when combined with Hibernate auto-ddl option) and I recommend that you check it out.

Junit good practices

I have a code which retrieves few information from Database.
For example if you pass person Id, method will return you person details like:
Name: XXX X XXX
Address: XXXXXXXXXXXXX XXXXX
Phone: XXXXXX
In Junit what is the good practice to test this type of code? Is it good practice that Junit to have DB connection?
Is it a good practice, that JUnit will connect to DB and retrieve information for same person Id and do assertion.
Thanks.

For testing the code that really needs to work with the database, you should look at dbunit. As little of the code as possible should know about the database though - allowing you to fake out the "fetch or update the data" parts when testing other components.
I'd strongly advise a mixture of DB tests - lots of unit tests which hit an in-memory database (e.g. HSQLDB) and "enough" integration tests which talk to the real kind of database that will be used in production. You may well want to make sure that all your tests can actually run against both environments - typically develop against HSQLDB, but then run against your production-like database (which is typically slower to set up/tear down) before check-in and in your continuous build.

It sounds like you're talking about something like a Data Access Object. I'd say it's essential to test that kind of thing with a real database. Look at H2 for a fast, in-memory database that's excellent for testing. Create your populated object, use your persistence code to save it to the database and then to load it back. Then make sure the object you get back has the same state as what you saved in the first place.
Consider using the Spring test framework for help managing transactions in persistence tests and for general test support if you're using Spring elsewhere.

JPA Native Queries versus 'pure' JPA persistence

I have a scenario where in I need to keep a log of all incoming files (flat, xml) to an application. This log table is hardly used, except for fault investigation or regulatory purposes and things like that, and data will be purged regularly.
We are using JPA 2.0 for persistence. We tried the initial prototype with pure JPA persistence using entityManager.persist(); and flush immediately. But the performance was not up to the expectation. So I suggested NativeNamedQueries for this operation and the performance improvement was huge (300 milliseconds vs 47 milliseconds) on tests.
But the lead engineer is bit adamant on using NativeNamedQueries, saying that its coupled to the database and less maintainable and things like that.
Questions :
What is your take on this, in case if you had to take a decision. How often does database or schema changes happen once the application goes to production ?
Is there any other way to improve performance? Performance is very very critical for this application.
Its only 4 years since I started programming, but never seen a DB schema change or DB provider change happening for an existing application.
Note : We are using EclipseLink 2.3 and Oracle. Also its a fresh application that we are developing. Just in case these points makes question more clear

How often does database or schema changes happen once the application goes to production ?
This is immaterial to your problem at hand. The quantity of changes to database schemas does not matter. What matters is the maintainability of your database model, how well it has been designed. Most business apps will see a lot of changes being done if sufficient performance testing hasn't been done, which is sadly true for most apps.
If you are a writing a typical line-of-business application, I would expect some form of round-trip engineering between the object model and the database model to occur in development. Your DBAs ought to own and know the database model quite well, so that they can aid or perform the fine-tuning the queries issued by your ORM framework. This is keeping in mind that you may not rely on the queries issued by the ORM framework alone. All changes should preferably be done and tested in the development and integration-testing (and possibly UAT, if you have one) environments before it is rolled out to production, and as common sense would suggest, all changes would be under version control.
On the topic of coupling the queries to a database, then that is a decision your business has to take. If you are in the business of supporting multiple databases, then you ought to testing against all. Also, you should be capable of providing different distributions for supporting different databases; this is made easier if you place your native queries in database specific orm.xml files like orm-oracle.xml, orm-mysql.xml etc. and rename the files to orm.xml before you prepare a distribution. Using Maven or Ant would make the proposed change easy to implement.
Is there any other way to improve performance? Performance is very very critical for this application.
That would depend on how well you have designed your object and data models, how well you've understood your ORM framework and how willing you are in "corrupting" your object model.
The first bit of performance tuning any application is to always measure twice and cut once. You cannot simply iterate through a list of possible solutions and try each one of them without knowing how they work and in what circumstances they are useful; okay, you could do that if your business is willing to invest time in that, but it is often not the case.
To begin, you'll need to understand why native queries are providing or appear* to provide a better performance. Maybe this has got a lot to do with the fact that you are merely inserting data, and it would be better for an ORM framework to simply issue the INSERT statement rather than construct one from HQL or the abstract query notation used under the hood; only a profiler will reveal the difference.
If the above is true, then you could reconsider whether your audit tables must be managed by the ORM framework. If your application is responsible for only writing to these tables and not reading from them (and it is quite possible that another app is responsible for reading the entries), then I would suspect that not managing these tables in ORM would provide better performance, especially if you use plain JDBC to issue the INSERT statement. The reason is quite simple - if your ORM framework is managing the entity, then it is also responsible for managing the persistence context (which now includes the class and the associated table); not having ORM manage the entity would possibly result in the scenario where the persistence context need not be updated at all for audit entries.
There is a healthy possibility of other performance tuning measures that you can undertake, but like I stated earlier, it would require you to understand a profiler report and estimate which possible choices would be better in your application.
* I'm afraid that unless you publish benchmarks and how you conducted them I will be skeptical of claims.

It's quite rare that you actually DO switch the database provider, especially once you've paid several 100k's of license for an excellent and high-performant database like Oracle. Besides, the SQL syntax variants of the INSERT statement are not so distinct that you wouldn't be able to switch the database, even when using native SQL, exceptionally.
I don't see why patching a single query that needs extra tuning is bad. Ask your lead developer why he's so strict. But before you do, use a profiler, such as JProfiler, or Yourkit to identify the exact spot that's causing the performance issues. With JPA, any of these may cause issues: caching, eager loading of dependent data (which you wouldn't need, probably), inefficient SQL generation, a bad query execution plan in your Oracle database, etc... Maybe you don't need a native query after all.
If performance is so critical, then maybe JPA is not good enough for the job. Have you (and your lead developer) considered other frameworks such as jOOQ, QueryDSL, MyBatis or anything similar? I have understood from your comments that your main use-cases are OLAP-querying, and not OLTP, hence you might even like to use advanced Oracle features, such as analytic functions and data-warehousing functionality, for which jOOQ has native support, for instance...

1) I have seen only 2 applications that moved from oracle to MySQL (to save on license costs) in 10 years, so it's not something that happens very often, BUT if you want to write integration tests using another database (eg hsqldb) you'll be in trouble.
About how often schema changes after an app goes to production, my answer is: A LOT!! If the app will be updated regularly, expect LOTs of changes, as usually the team understand the business better. I even worked on the project in which the schema was considerably different after one year of the app going live.
At the same time, this looks like you deferred optimizing the until the last posible time (a good thing to do) and now you need optimize the sql using some native queries (which also happens quite regularly)... What I'm trying to say is that your idea doesn't sound bad at all for me.
2) In the past I've used a mix of Hibernate and iBatis (or mybatis nowadays) for similar situations (in case you want to check iBatis). And one question, why are you doing a flush() after each persist()? You shoulnd't really need to do that.
Also, I'm quite surprised that the inserts take so much longer if they're done in EclipseLink. The calls to persist() should take almost the same amount of time as native query (I assuming they'll take longer if there is any lifecycle callbacks). I assume you've seen the sql generated by eclipseLink, is it that different?
I know my answer is not specific at all, but I hope it helps.

Standard Workflow when working with JPA

I am currently trying to wrap my head around working with JPA. I can't help but feel like I am missing something or doing it the wrong way. It just seems forced so far.
What I think I know so far is that their are couple of ways to work with JPA and tools to support this.
You can do everything in Java using annotations, and let JPA (whatever implementation you decide to use) create your schema and update it when changes are made.
You can use a tool to reverse engineer you database and generate the entity classes for you. When the schema is updated you have to regenerate these classes, or manually update them.
There seems to be drawbacks to both, and benefits to both (as with all things). My question is in an ideal situation what is the standard workflow with JPA? Most schemas will require updates during the maintenance phase and especially during the development phase, so how is this handled?

It's not always a good approach to generate the DB schema from the annotated entities. Although in theory it sounds great - in practice often the generated schema is not optimal and would not satisfy and experienced DBA.
The approach that I follow in my workflow is to create the entities and db schema separately, while still using a pretty intelligent tool for the schema creating - either something like Liquibase, that is database agnostic, supports revisions, rollbacks, etc... or a custom baked migration tool that simply runs heavily optimized db specific sql scripts.
It probably sounds to you less than ideal, but I can assure it gets the jobs done and keep your schema related code consistent since, as grigory pointed out - not everything related to the database can be generated from the entities anyways.
I can, however, be useful to generate the schema from the entities for the test database against which unit and integration tests are being run. Assuming you're using say PostgreSQL is production you might decide to speed things up for the unit tests running some embedded in-memory database like H2 which gets created from the entities before the tests are started and disappears automatically(since it was in-memory) after the tests finish executing. This is a very common practice.

As usual the answer is it depends...
Ideal approach (in ideal world) would probably be your 1st option: maintain everything using JPA annotations and forward engineer database artifacts using utility tool (e.g. use Hibernate Maven plugin).
It depends on the level of support for your database artifacts - not everything either belongs or suitable for annotations. That is why my projects usually use parallel maintenance for both and using unit tests to keep them in sync.
It also depends on resources available. If you have a dedicated DBA who is responsible for your database then delegating maintenance to her would make sense.
Other consideration is how much database development is really done in JPA. Are there also stored procedures or other non-JPA applications that use the same back-end, or maybe you just integrate with other team's database...

If this is an existing application, I would check what you have existing, if the database structure complex as can be seen with the DDL and the DDL shows significant logic is being done on the database itself, then you are better off using plain SQLs and let the DBA maintain your data structures. JPA does not really lend well when the database structures are already complicated and there is no business benefit to use JPA at that point.
What needs to happen is a project to migrate to JPA. There are a few advantages to that:
Business Logic is removed from the database layer (which is harder to scale horizontally) to the application tier.
Java developers are generally cheaper compared to a DBA. Though you still need someone who can do both database thinking and Java thinking to do this properly and that's more rare.
By reducing the database to become as simple datastore, you can break yourself from vendor lock-in.
If done right, you can have a different database for development (can be DB2 Express C which is free) and have a more robust database for your integration and production environments (e.g. DB/2 for zOS). This allows you to be able to have more developers without worrying about licensing costs as much.
As for schemas being generated and such, there are actually four workflows that can occur:
For design, an Object-Relational (rather than an Entity-Relational) diagram serves as a contract between the application team and the database team. The end result is the JPA objects will run in the physical data structure that the DBA sets.
For Java application development, just let each developer have their own database and let them blow it up as much as they want. The JPA code will generate the schemas for you.
For database development, the generated schemas and class diagrams are passed onto review by the DBA to see where performance can be improved upon. Specifically they are there to specify the indices which are not available in the JPA standard since it is not cross database. They are also there to set up the table spaces and all the access controls and schemas for the development, but at least the gist of the structure can be taken away from them and passed onto the application team which gives the application team more flexibility to adapt to changes. What would normally happen is the DBA just includes some generated SQL and then have alters to add additional columns and others that would be used for other purposes outside of the application (the JPA structure needs only what is needed by the application, it does not need to map one-to-one 100% to the database)
For migration, the DBA needs to do a differential analysis between the two schemas. There's a program called dbsolo (not free) that can do it with most databases. However, if things were done in JPA, the structures are simpler since in theory there is no longer any business logic on the database thus reducing the complexity of data migrations due to upgrades.
The net of it is you can't just say you're using JPA without involving the whole delivery team which will have to include the DBA willing to relinquish control and ownership of the structure of the data to the application team, but still be part of the design and reviews.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.