Here is my scenario:
I have a java application that reads data from a table T1 of database D1, processes it and puts it in another table T2 of another database D2. This happens real time, i.e., as and when a record is inserted or updated in table T1, the application will pick the data, process it and pushes it to the destination table. I wish to monitor performance of this application using a testing(preferrably JUnit) and/or performance framework. In my test case I wish to have following
Insert and update a fixed number of records for fixed time at fixed intervals on table T1 of database D1.
After a fixed time, either check the number of records that are present in T2 of database D2 or look for existence of a specific record.
The tests that I wish to create should be
Database agnostic
Provide results that can show trends and be configurable with a CI tool like Jenkins
So, my question is, what is the best way to test this kind of scenario? Are there any available tools that will help me achieve this?
Database agnostic
In order to achieve that I would suggest to use simplest possible SQL and some low-level JDBC abstraction layer:
DBUtils
The Commons DbUtils library is a small set of classes designed to make
working with JDBC easier. JDBC resource cleanup code is mundane, error
prone work so these classes abstract out all of the cleanup tasks from
your code leaving you with what you really wanted to do with JDBC in
the first place: query and update data.
MyBatis
MyBatis is a first class persistence framework with support for custom
SQL, stored procedures and advanced mappings. MyBatis eliminates
almost all of the JDBC code and manual setting of parameters and
retrieval of results. MyBatis can use simple XML or Annotations for
configuration and map primitives, Map interfaces and Java POJOs (Plain
Old Java Objects) to database records.
Both will do the trick for you. With good attention to details you'll manage to provide flexible enough solution and test as many databases as you want.
Provide results that can show trends and be configurable with a CI tool like Jenkins
Define several KPIs and make sure you can get all values periodically. For example you can measure a throughput (records per second). Export data periodically (as CSV or properties for example) and use PlotPlugin for visualization:
You can also check relevant question: How do I plot benchmark data in a Jenkins matrix project
Proper testing
Please make sure your testing strategy is well defined and you will not miss anything:
Load testing
Stress testing
Related
As the application gets complicated, one thing that change a lot is the queries, especially if they are complex queries. Wouldn't it be easier to maintain the queries in the db rather then the resources location inside the package, so that it can be enhanced easily without a code change. What are the drawbacks of this?
You can use stores procedures, to save your queries in the database. Than your Java code can just call the procedure from the database instead of building a complex query.
See wikipedia for a more detailed explanation about stored procedures:
https://en.wikipedia.org/wiki/Stored_procedure
You can find details about the implementation and usage in the documentation of your database system (MySql, MariaDb, Oracle...)
When you decide to move logic to the database, you should use a version control system for databases like liquibase: https://www.liquibase.org/get-started/quickstart
You can write the changes to you database code in xml, json or even yaml and check that in in your version control system (svn, git...). This way you have a history of the changes and can roll back to a previous version of your procedure, if something goes wrong.
You also asked, why some people use stored procedures and others keep their queries in the code.
Stored procedures can encapsulate the query and provide an interface to the data. They can be faster than queries. That is good.
But there are also problems
you distribute the buisiness logic of your application to the database and the programm code. It can realy be troublesome, if the logic is spread through all technical layers of your applicaton.
it is not so simple anymore to switch from a Oracle database to a MariaDb, if you use specific features of the database system. You have to migrate or rewrite the procedures.
you have to integrate liquibase or another system into you build pipeline, to keep track of you database changes.
So it depends on the project and it's size, if either of the solutions is better.
What is the best approach for database creation and relationship management when working with microservices?Hibernate or scripts, as i feel it shouldn't be the responsibility of microservices to create a database
As already pointed by #Vadim in the comment it ultimately it is the desginer's or developer's job to decide what to use.
My two cents from experience, in long run it is always good to use schema generation scripts and there are lots of opensource libraries available.
For instance in java we have Liquibase and Flyway.
The reason why I am saying this is, your DB will undergo lot of changes in long run. Hibernate can easily handle creating and modifying table and column changes, but sometimes for example when you add a new column you may want to fill the existing records for which you may need to write custom sqls.
Similarly from time to time you may want to update records from back-end which is difficult to achieve using hibernate.
I have observed that DB creation generally is part of pre-deploy scripts and schema generation happens during application startup.
My advise is to use some schema generation tool for schema migration and use hibernate for schema validation so that the two remain in sync.
Our application uses java / sql server.
We have ETL jobs (around 35 for different upstreams) using sprint batch. Some of the code is in java and some in database. We want to track lifecycle of a job from database. E.g. when a job started, when a particular component got called, when a method / stored procedure got called and how much time that took. The purpose is to do health check which component is taking more time and in case some stored procedure takes lots of time in production we should be able to query database. Moreover, we also want to store intermediate calculations for audit and debug purpose.
This time tracking and intermediate calculations would be stored besides normal application logging.
Current solution we have implemented is normalized tables in database (e.g. Job, Task, status, etc) for which we have stored procedure wrapper and then have java classes as well to call those stored procedures.
We are not redesigning our application, so wanted to check what is the best approach to track such information. AOP? but I believe that usually gets called for before and after what about the intermediate calculations we want to store?
Our current approach is working, but it is cluttering code as method is doing logging & auditing, instead of just concentrating on the main logic.
A free and open-source tool you should consider is Jamon, it is a comprehensive monitoring framework that provides a lots of useful features:
JAMon allows developers to track their applications performance and
behavior using predefined modules. There are modules that
automatically monitor : SQL, HTTP page requests, Spring beans, method
invocations, Log4j, and Exceptions. Other modules are often easy to
build. JAMon keeps track of the following metrics for any of the items
it tracks in the modules: hits, total, average, min, max and
concurrency (average, max, current/active) to name a few.
Now about storing calculation, I would suggest to break your methods in smaller sub-methods and then use AOP or any other tool to capture the returned value and perform whatever operation you want on these data.
In addition, if you need to have more details on the database layer I would recommend log4jdbc, which will give you nice audit and metrics around jdbc calls. For example you'll be able to get the execution time, the in and out parameters of called procedures, parameters provided to any statements.
You can even extends this tool to provide custom behavior (audit only some procedures, do something specific with collected data.
Aspects are a very good way to isolate the timing code in one place.
Stored procedures seem unnecessary to me. A simple SQL INSERT ought to do the trick. It's fine if you're using the stored proc as an interface to hide the schema from users, but I doubt that this table will evolve much.
Logging, timing, and auditing are the "hello world" of aspect oriented programming.
I am developping an application that tests different WebServices, and I want it to be as generic as possible. I need to populate database to do JUnit tests, but I don't want these changes to be commited.
I know that some in-memory databases like HSQL DB allow testing on a sort of a virtual (or mock) database, but unfortunately I use oracle and I cannot change it now because of my complex data tables structure.
What is the best practice you suggest?
Thanks.
First of all, HSQL and Hibernate aren't related in any way. The question is whether you can find an embedded database which supports the same SQL as your production database (or rather the subset of SQL which your application uses).
A good candidate for this is H2 database since it emulates a lot of different SQL flavours.
On top of that: Don't test the database. Assume that the database is tested thoroughly by your vendor and just works.
In my code, I aim for:
Save and load each entity.
Generate the SQL for all the queries that I use and compare them against String literals in tests (i.e. I don't run the queries against the database all the time).
Some tests look for a System property. If it's set, then they will run the queries against the database. This happens during the night on my CI server.
The rationale for this: As long as the DB schema doesn't change, there is no point to actually run the queries. That means running them during the day while I sit in front of the computer is a huge waste of time.
To make sure that "low impact" changes don't slip through the gaps, I let a computer run them when I don't care.
Along the same lines, I have mocks for many DAOs which return various predefined results, so I don't have to query the database. The rationale here is that I want to test the processing of results from the database, not the JDBC API, the DB driver, the OS's TCP/IP stack, the network hardware (and software), or any other of the 1000 things between my code and the database records on a harddisk somewhere.
More details in my blog: http://blog.pdark.de/2008/07/26/testing-with-databases/
Our development databases (Oracle 9i) use a remote database link to a remote shared database.
This decision was made years ago when it wasn't practical to put some of the database schemas on a development machine - they were too big.
We have certain schemas on the development machines and we make the remote schemas look local by using Oracle's database links, together with some synonyms on the development machines.
The problem I have is that I would like to test a piece of SQL which joins tables in schemas on either side of the database link.
e.g. (a simplified case):
select a.col, b.col
from a, b
where a.b_id = b.id
a is on the local database
b is on the remove database
I have a synonymn on the locale DB so that 'b' actually points at b#remotedb.
Running the query takes ages in the development environment because of the link. The queries run fine in production (I don't think the Oracle cost based optimiser can cope very well with database links).
We have not been very good at writing unit tests for these types of queries in the past - probably due to the due to the poor performance - so I'd like to start creating some tests for them.
Does anyone have any strategies for writing a unit test for such a query, so as to avoid the performance problems of using the database link?
I'd normally be looking at ways of trying to mock out remote service, but since all this is in a SQL query, I can't see anyway of easily mocking out the remove database.
You should create exact copies of all the schema you need from production on development but without all the data. You should populate the schema with enough data so you can do a proper test. You can also manipulate the optimizer to behave on the test system to be like production by exporting the statistics from the production server and importing them to the development database for the schemas you are duplicating. That way the query will run with the data set you've made but the query will optimize with plans that is similar to that of production. Then you can estimate theoretically how it will scale on production.
Copy the relevant data into your development database and create the tables locally.
Ideally, just build a test case which tells you:
The SQL is correct (it parses)
It operates correctly with a few rows of test data
Don't fall for the "let's copy everything" because that means you'll have no idea what you're testing anymore (and what you're missing).
If in doubt, create a table b with just a single record. If you get an error in this area, add more rows as you learn where it can fail.
If you want to take this to the edge, create the test table (with all data) in a unit test. This way, you can document the test data you're using.
[EDIT] What you need is a test database. Don't run tests against a database which can change. Ideally, the tests should tear down the whole database and recreate it from scratch (tables, indexes, data, everything) as the first step.
In this test database, only keep well defined test data that only changes by defining new tests (and not by someone "just doing something"). If you can, try to run your tests against an in-memory database.
I would suggest materialized views. These are views that store remote data locally.
In theory to do the unit-testing you can work with any set of controlled data created and designed based on your test-cases. It doesn't have to be your live or development system. That's assuming your unit is portable enough. You would test it with your current databases/application when you come to integration testing, which might as well be on the live system anyway (so no db links will be required - I understand your live databases are in one place).
What I'm trying to say, is that you can/should test your unit (i.e. your component, query or whatever you define as a unit) on a controlled set of data that would simulate different 'use cases' and once you complete your testing to satisfactory results, then you can proceed to integration + running integration tests.
Integration tests - you could run this in the live environment, but only after you've proved by unit-testing that your component is 'bullet-proof' (if that's OK with your company's approach/philosophy :) - sys admin's reaction:"Are you flippin creazy?!")
If you are trying to go back in time and test already implemented units, then why bother? If they've been in a production use for some time without any incidents then I would argue that they're OK. However, there's always a chance that your unit/query might have some 'slowly ticking time bomb' effect on the side (cumulative effect over time). Well, analyse the impact is the answer.