I want to connect to multiple databases at the same time in Openbravo, so that I can store data in two different databases (for example, MySQL and PostgreSQL) for any transaction in the app.
Is there a clean way to do that while keeping changes to the existing code minimal?
Thanks
I think you should use replication for this task. It is the cleaner and more correct solution from an application architecture perspective.
You could configure two databases (with some out-of-the-box solution or boilerplate code), but it would hurt application performance, because every query the app triggers must be executed against two DB instances; with transactions it gets even more complex and slow (see the sketch at the end of this answer).
So replication is the best way to handle this. If you want selective replication, use Tungsten. Let me know about any specific need that can't be met with replication; I might be able to suggest some more ideas.
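Here is a rough plain-JDBC sketch of what the naive dual write discussed above would look like (the connection strings, table, and column are invented for illustration):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class DualWrite {
        public void insertIntoBoth(String name) throws SQLException {
            // Hypothetical connection details; replace with your own.
            try (Connection mysql = DriverManager.getConnection("jdbc:mysql://localhost/app", "user", "pass");
                 Connection pg = DriverManager.getConnection("jdbc:postgresql://localhost/app", "user", "pass")) {
                mysql.setAutoCommit(false);
                pg.setAutoCommit(false);
                try {
                    String sql = "INSERT INTO customers (name) VALUES (?)";
                    try (PreparedStatement m = mysql.prepareStatement(sql);
                         PreparedStatement p = pg.prepareStatement(sql)) {
                        m.setString(1, name);
                        p.setString(1, name);
                        m.executeUpdate();
                        p.executeUpdate();
                    }
                    // If the second commit fails after the first succeeds, the
                    // databases diverge -- exactly the consistency problem above.
                    mysql.commit();
                    pg.commit();
                } catch (SQLException e) {
                    mysql.rollback();
                    pg.rollback();
                    throw e;
                }
            }
        }
    }

Even this simplified version has a window where the first commit succeeds and the second fails, leaving the two databases out of sync; a real solution would need distributed (XA) transactions, which is exactly the complexity replication avoids.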
I am going to make a desktop application with a MySQL database. My database tables change frequently -- almost 60% of them -- so I think caching may be a bad idea. Can anyone suggest:
How can I make a fast desktop application with a remote database?
My language is Java.
The biggest problem with most projects that have performance as their primary concern is that people tend to make exotic choices that end up complicating the project without any real benefit. Unless you have actual hands-on experience with the environment you will be working in, start simple.
Set some realistic goals about how often you have to refresh your data before you start. If your data changes very frequently, e.g. every second, does it make sense to try to show the changes in real time? A query every second will make everyone involved miserable.
Use a thread to take care of the queries (see the sketch at the end of this answer). You don't need more than one, since any more will only make the race conditions in the database worse.
Design your database layer to be insulated from the rest of the application. Also time your DB-related operations from the beginning in order to track the impact of your optimizations.
Start with Hibernate / ORMLite. Although I cannot talk about ORMLite, I have used (optimized) Hibernate in heavy load environments without any problems. If you have complicated objects you should give it a try, it sure beats using plain JDBC and implementing the cache mechanism yourself.
Find out when you need lazy loading and when it's slowing you down (due to the select n+1 problem).
If you have performance issues, optimize. You don't have to map every single relationship. Use custom SQL in separate methods to get the objects you need when you need them. You can write a query that only returns table ids and afterwards ask Hibernate to load the corresponding objects.
Optimize your SQL. Avoid joins; use subselects, WHERE id IN (...), etc.
Implement (database) paging if it makes sense.
If all else fails, start using plain SQL. You'll have already written the most complex queries, and you'll know where your biggest bottlenecks are.
You could use a local SQLite database to store the less volatile data and talk to the remote database mainly to get lists of ids and whatever you're missing. For example, if you have users and orders, you can assume you will see many more new orders per minute/second than new users per hour.
To sum up, set clear performance goals before you start, always use a separate thread for data retrieval, avoid reinventing the wheel and keep it as simple as possible.
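Here is a minimal sketch of the single query thread mentioned above (the interval, class, and method names are invented):

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class DataRefresher {
        // One thread is enough: queries run sequentially and never race each other.
        private final ScheduledExecutorService executor =
                Executors.newSingleThreadScheduledExecutor();

        public void start() {
            // The 5-second interval is an example; pick one that matches your goals.
            executor.scheduleWithFixedDelay(this::refresh, 0, 5, TimeUnit.SECONDS);
        }

        private void refresh() {
            // Run the queries here, then hand the results to the UI thread
            // (e.g. SwingUtilities.invokeLater in a Swing application).
        }

        public void stop() {
            executor.shutdown();
        }
    }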
Here are some generic approaches to the problem.
0) Hardware: make sure you don't have bottlenecks in hardware that you can cheaply upgrade (adding hardware is faster and cheaper than dev hours in most cases).
1) Caching:
Perhaps you can cache (locally, or in a distributed cache like Memcached) the 40% of the data that tends to be immutable, and invalidate the cache when the data is modified. You should choose the right entities and granularity level for building the keys (a minimal in-process sketch appears near the end of this answer).
2) Replication:
If the first is too much overhead, you could create read replicas (slaves) of your MySQL database and read from those. Again, you have to know when you can afford to serve some stale data.
3) NoSQL:
Moving in that direction, but increasing the dev effort, you could move to some distributed store (take a look at the CAP theorem before making a choice).
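A minimal in-process sketch of option 1 (all names invented; a distributed cache like Memcached would replace the map at scale):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    public class EntityCache<K, V> {
        private final Map<K, V> cache = new ConcurrentHashMap<>();
        private final Function<K, V> loader; // e.g. a DAO lookup against MySQL

        public EntityCache(Function<K, V> loader) {
            this.loader = loader;
        }

        public V get(K key) {
            // Hit the database only on a cache miss.
            return cache.computeIfAbsent(key, loader);
        }

        public void invalidate(K key) {
            // Call this whenever the entity is updated or deleted.
            cache.remove(key);
        }
    }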
Hope it helps
It depends on your database structure and application. You can use an object-relational mapping library like ORMLite and refresh objects loaded from the database in the background with threads. With ORMLite you can also use a LazyForeignCollection to load only the data your application actually needs.
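As a sketch of what a lazy collection looks like with ORMLite's documented annotations (the Account entity and table name are invented; Order is assumed to be another annotated entity with a foreign Account field):

    import com.j256.ormlite.dao.ForeignCollection;
    import com.j256.ormlite.field.DatabaseField;
    import com.j256.ormlite.field.ForeignCollectionField;
    import com.j256.ormlite.table.DatabaseTable;

    @DatabaseTable(tableName = "accounts")
    public class Account {
        @DatabaseField(generatedId = true)
        int id;

        @DatabaseField
        String name;

        // eager = false makes this a lazy collection: the orders query runs
        // only when the collection is iterated, not when the account is loaded.
        @ForeignCollectionField(eager = false)
        ForeignCollection<Order> orders;
    }

A screen that never displays orders then never pays for loading them.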
Minimize unnecessary database calls.
If your database schema changes frequently, you can shift from a relational database to a NoSQL database like MongoDB.
You can use multithreading on the server side for data processing, and cluster your application servers. Use multithreading effectively, and be careful with the synchronized keyword; it will degrade performance to some extent.
Follow coding best practices: avoid unnecessary instance variables and prefer local variables, which also helps keep your code thread-safe.
You can use MyBatis as the ORM, which also handles large queries well.
You can cache at the DAO layer, the service layer, and even on the client side, but be sure to keep the cache synchronized with the database; there are various caching solutions available.
Add database indexes for faster retrieval.
Do not use a single service for querying large amounts of data; break it into several services so the work can be processed by multiple threads (see the sketch after this list).
If the application is not a hard real-time system, you can also use a messaging solution, i.e. process the data asynchronously.
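As a sketch of the last two points (all names invented), splitting a large data set into id chunks lets independent tasks process them in parallel without shared mutable state, so no synchronized blocks are needed:

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class PartitionedProcessor {
        private final ExecutorService pool = Executors.newFixedThreadPool(4);

        // Instead of one service call that loads everything, submit each
        // chunk of ids as an independent task.
        public void processAll(List<List<Long>> idChunks) throws InterruptedException {
            for (List<Long> chunk : idChunks) {
                pool.submit(() -> processChunk(chunk));
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
        }

        private void processChunk(List<Long> ids) {
            // Query and process only these ids.
        }
    }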
I am developing a Java application based on the Spring Framework.
It
Connects to a MySQL database
Gets data from MySQLTable1 into POJOs
Manipulates it (update, delete) in memory
Inserts into a Netezza database table
The above 4 steps are done for each client (A, B, C) every hour.
I am using a Spring JdbcTemplate to get the data, like this:
SELECT COL1,COL2,COL3 FROM MySQLTable1 WHERE CLIENTID='A' AND COL4='CONDITION'
and read each record into a POJO before writing it to a Netezza table.
There are going to be multiple instances of this application running every hour through a scheduler.
So Client A and Client B can be running concurrently, but the SELECT will be unique;
I mean data for:
SELECT COL1,COL2,COL3 FROM MySQLTable1 WHERE CLIENTID='A' AND COL4='CONDITION'
will be different from
SELECT COL1,COL2,COL3 FROM MySQLTable1 WHERE CLIENTID='B' AND COL4='CONDITION'
But remember all of these are stored in memory as POJOs.
My questions are:
Is there a risk of data contamination?
Is there a need to implement database transaction using spring data transaction manager?
Does my application really need to use something like Spring Batch to deal with this?
I appreciate your thoughts and feedback.
I know this is a perfect scenario for using an ETL tool but that is out of scope.
Is there a risk of data contamination?
It depends on what you are doing with your data, but I don't see how you can have data contamination if every instance is independent; you just have to make sure that instances running concurrently are not working on the same data (client ID).
Is there a need to implement database transaction using spring data transaction manager?
You will probably need a transaction for the insertion into the Netezza table. You certainly want your data to end up in a consistent state in the result table. If an error occurs in the middle of the process, you'll probably want to roll back everything that was inserted before the failure. Regarding the transaction manager, you don't specifically need the Spring transaction manager, but since you are already using Spring it is a good option.
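A minimal sketch with Spring's TransactionTemplate (the writer class and the Netezza table/column names are invented, loosely following the question):

    import java.util.List;
    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.transaction.PlatformTransactionManager;
    import org.springframework.transaction.support.TransactionTemplate;

    public class NetezzaWriter {
        private final JdbcTemplate netezza;
        private final TransactionTemplate tx;

        public NetezzaWriter(JdbcTemplate netezza, PlatformTransactionManager txManager) {
            this.netezza = netezza;
            this.tx = new TransactionTemplate(txManager);
        }

        public void writeBatch(List<Object[]> rows) {
            // Everything inside execute() commits or rolls back as one unit:
            // a failure halfway through leaves the target table unchanged.
            tx.execute(status -> netezza.batchUpdate(
                    "INSERT INTO NetezzaTable1 (COL1, COL2, COL3) VALUES (?, ?, ?)", rows));
        }
    }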
Does my application really need to use something like Spring Batch to deal with this?
Does it really need it? Probably not, but Spring Batch was made for this kind of application, so it might help you structure your code (Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management). Everything can be built without the framework, and it might be overkill if you have a really small application. But in the end, if you need those features, you'll probably want to use it.
Spring Batch is ETL, so using it would be a good fit for this use case and also a good alternative to a commercial ETL tool.
Is there a risk of data contamination? Client A and B read separate data, so they can never interfere with each other by reading or writing the same data by accident. The risk would be if two clients with the same ID are created, but that is not the case.
Is there a need to implement database transaction using spring data transaction manager?
There is no mandatory need to do that, although programmatic transaction management has many pitfalls and is best avoided. Spring Batch would manage transactions for you, as well as other aspects such as paging.
Does my application really need to use something like Spring Batch to deal with this? There is no mandatory need to do this either, although it would help a lot, especially with paging. How will you handle queries that return thousands of rows? Without a framework, this has to be handled manually.
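For illustration, a paging reader for the question's SELECT might look like this sketch (it uses Spring Batch's JdbcPagingItemReader; the factory class and the Object[] row shape are invented):

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import javax.sql.DataSource;
    import org.springframework.batch.item.database.JdbcPagingItemReader;
    import org.springframework.batch.item.database.Order;
    import org.springframework.batch.item.database.support.MySqlPagingQueryProvider;

    public class ClientReaderFactory {
        // Pages through one client's rows 1000 at a time instead of
        // loading everything into memory in a single SELECT.
        public JdbcPagingItemReader<Object[]> reader(DataSource mysql, String clientId) {
            MySqlPagingQueryProvider provider = new MySqlPagingQueryProvider();
            provider.setSelectClause("SELECT COL1, COL2, COL3");
            provider.setFromClause("FROM MySQLTable1");
            provider.setWhereClause("WHERE CLIENTID = :clientId AND COL4 = 'CONDITION'");
            // The paging reader needs a unique sort key to keep pages stable.
            provider.setSortKeys(Collections.singletonMap("COL1", Order.ASCENDING));

            Map<String, Object> params = new HashMap<>();
            params.put("clientId", clientId);

            JdbcPagingItemReader<Object[]> reader = new JdbcPagingItemReader<>();
            reader.setDataSource(mysql);
            reader.setQueryProvider(provider);
            reader.setParameterValues(params);
            reader.setPageSize(1000);
            reader.setRowMapper((rs, i) ->
                    new Object[] { rs.getString(1), rs.getString(2), rs.getString(3) });
            return reader;
        }
    }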
Here is my scenario:
I have a Java application that reads data from a table T1 of database D1, processes it, and puts it in another table T2 of another database D2. This happens in real time, i.e., as soon as a record is inserted or updated in table T1, the application picks up the data, processes it, and pushes it to the destination table. I wish to monitor the performance of this application using a testing (preferably JUnit) and/or performance framework. In my test case I wish to have the following:
Insert and update a fixed number of records for fixed time at fixed intervals on table T1 of database D1.
After a fixed time, either check the number of records present in T2 of database D2 or check for the existence of a specific record.
The tests that I wish to create should be
Database agnostic
Provide results that can show trends and be configurable with a CI tool like Jenkins
So, my question is, what is the best way to test this kind of scenario? Are there any available tools that will help me achieve this?
Database agnostic
In order to achieve that, I would suggest using the simplest possible SQL and a low-level JDBC abstraction layer:
DbUtils
The Commons DbUtils library is a small set of classes designed to make working with JDBC easier. JDBC resource cleanup code is mundane, error prone work so these classes abstract out all of the cleanup tasks from your code leaving you with what you really wanted to do with JDBC in the first place: query and update data.
MyBatis
MyBatis is a first class persistence framework with support for custom SQL, stored procedures and advanced mappings. MyBatis eliminates almost all of the JDBC code and manual setting of parameters and retrieval of results. MyBatis can use simple XML or Annotations for configuration and map primitives, Map interfaces and Java POJOs (Plain Old Java Objects) to database records.
Both will do the trick for you. With good attention to detail you'll manage to build a flexible enough solution and test against as many databases as you want.
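For example, a small DbUtils-based helper for the scenario in the question might look like this sketch (the COL1 column and the class name are invented):

    import java.sql.SQLException;
    import javax.sql.DataSource;
    import org.apache.commons.dbutils.QueryRunner;
    import org.apache.commons.dbutils.handlers.ScalarHandler;

    public class PipelineProbe {
        private final QueryRunner source; // database D1
        private final QueryRunner target; // database D2

        public PipelineProbe(DataSource d1, DataSource d2) {
            this.source = new QueryRunner(d1);
            this.target = new QueryRunner(d2);
        }

        // Step 1 of the test: insert a marker row into T1 on D1.
        public void insertMarker(String marker) throws SQLException {
            source.update("INSERT INTO T1 (COL1) VALUES (?)", marker);
        }

        // Step 2, after the fixed wait: count matching rows in T2 on D2.
        public long countInTarget(String marker) throws SQLException {
            Number n = target.query("SELECT COUNT(*) FROM T2 WHERE COL1 = ?",
                    new ScalarHandler<Number>(), marker);
            return n.longValue();
        }
    }

Because both methods use plain parameterized SQL and javax.sql.DataSource, the same test code runs against any JDBC database.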
Provide results that can show trends and be configurable with a CI tool like Jenkins
Define several KPIs and make sure you can collect their values periodically. For example, you can measure throughput (records per second). Export the data periodically (as CSV or properties, for example) and use the Plot Plugin for visualization.
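A minimal sketch of such an export (the file name and KPI label are invented; as far as I know, the Plot Plugin reads a CSV with a header row followed by a row of values):

    import java.io.IOException;
    import java.io.PrintWriter;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class ThroughputReport {
        // Writes one KPI value per build; Jenkins charts the series over time.
        public static void write(long records, long elapsedMillis) throws IOException {
            double perSecond = records * 1000.0 / elapsedMillis;
            try (PrintWriter out = new PrintWriter(
                    Files.newBufferedWriter(Paths.get("throughput.csv")))) {
                out.println("records_per_second");
                out.println(perSecond);
            }
        }
    }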
You can also check the related question: How do I plot benchmark data in a Jenkins matrix project
Proper testing
Please make sure your testing strategy is well defined, so that you will not miss anything:
Load testing
Stress testing
I have developed a small Swing desktop application. This app needs data from another database, so I created a small Java process that gets the info (using JDBC) from the remote DB and copies it (using JPA) to the local database. The problem is that this process takes a lot of time. Is there another way to do it that would make this task faster?
Please let me know if I am not clear; I'm not a native speaker.
Thanks
Diego
One good option is to use the Replication feature in MySQL. Please refer to the MySQL manual here for more information.
JPA is less suited here, as object-relational mapping is costly and this is a bulk data transfer. You probably do not need database replication here either.
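A rough sketch of a plain-JDBC bulk copy (table and column names are invented): read from the remote connection and batch-insert into the local one, with no entity objects in between.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class BulkCopier {
        private static final int BATCH_SIZE = 1000;

        public void copy(Connection remote, Connection local) throws SQLException {
            local.setAutoCommit(false);
            try (Statement select = remote.createStatement();
                 ResultSet rs = select.executeQuery("SELECT col1, col2 FROM remote_table");
                 PreparedStatement insert = local.prepareStatement(
                         "INSERT INTO local_table (col1, col2) VALUES (?, ?)")) {
                int count = 0;
                while (rs.next()) {
                    insert.setObject(1, rs.getObject(1));
                    insert.setObject(2, rs.getObject(2));
                    insert.addBatch();
                    if (++count % BATCH_SIZE == 0) {
                        insert.executeBatch(); // send 1000 rows in one round trip
                    }
                }
                insert.executeBatch(); // flush the final partial batch
                local.commit();
            }
        }
    }

Batching the inserts and committing once avoids the per-entity overhead (change tracking, cascades, flushes) that makes JPA slow for this kind of transfer.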
Maybe backup is a solution: several different approaches listed there.
In general, one can also run mysqldump (on a single table, for instance) as a cron task, compress the dump, and retrieve it.
I need to create a project in which there are two databases, local and remote. The remote database needs to be synchronized daily with the local database, reflecting changes made in the local database.
I am using Java. The database is Oracle. I have Java/JPA code that does CRUD operations on the local database.
How do I synchronize changes to the remote database?
I would not do this in Java, but look for native Oracle database synchronization mechanisms/tools. This will
be quicker to implement
be more robust
have faster replication events
be more 'correct'
Please look at some synchronization products. SQL Anywhere from Sybase, where I work, is one such product. You may be able to get a developer/evaluation copy that you can use to explore your options. I am sure Oracle has something similar too.
The basic idea is to be able to track the changes that have happened in the central database. This is typically done by keeping a timestamp for each row. During synchronization, the remote database provides the last sync time and the server sends it all rows that have changed since then. Note that rows that have been deleted from the central database will need some special handling to ensure they also get deleted from the remote database.
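A minimal JDBC sketch of that pull, assuming the tracked table has a last_modified timestamp column maintained on every write (table and column names are invented):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Timestamp;

    public class ChangePuller {
        // Fetches every row changed in the central database since the last sync.
        public void pullChangesSince(Connection central, Timestamp lastSync) throws SQLException {
            String sql = "SELECT id, data, last_modified FROM tracked_table"
                    + " WHERE last_modified > ?";
            try (PreparedStatement ps = central.prepareStatement(sql)) {
                ps.setTimestamp(1, lastSync);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // Upsert the row into the remote database here.
                        // Deletes need separate handling, e.g. a tombstone table.
                    }
                }
            }
        }
    }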
A true two-way synchronization is a lot more complex. You also need to upload changes from the remote database to the central one, and conflict resolution strategies have to be implemented for cases where the same row has been changed in both the remote and the central database in incompatible ways.
The general problem is too complex to explain in a response here, but I hope I have provided some useful pointers.
The problem is that what you are asking for can range from moderately difficult (for a simple, not very robust system) to a very complex product that could keep a small team busy for a year, depending on requirements.
That's why the other answers basically said "find another way".
If you have to do this for a class assignment or something, it's possible but it probably won't be quick, robust or easy.
You need server software on each side, a way to translate arbitrary tables into data that can be transferred over the wire (along with enough metadata to re-create them on the other side), and you'll probably want to track database changes (perhaps with a flag or timestamp) so that you don't have to send every record over each time.
It's a hard enough problem that we can't really help much here. If I HAD to do this for a customer, I'd quote at least a man-year of work to get it even moderately reliable.
Good Luck
Oracle has sophisticated replication functionality for synchronising databases. Find out more..
From your comments it appears you're using Oracle Lite: this supports replication, which is covered in the Lite documentation.
Never worked with it, but http://symmetricds.codehaus.org/ might be of use