MongoDB replica set - distribute queries

MongoDB replica set - distribute queries - java

I have a MongoDB replica set and want to know if it is possible to distribute queries evenly between the members of the set to increase the performance. If it is possible please let me know how to achieve this.
I'm using the 10gen Java-driver 2.12.1.
Thank for your help.

It is possible, but you would need to implement the query distribution at the driver level using a ReadPreference.
You should carefully consider the fact that secondary reads may return stale data. You should also carefully consider whether or not the added complexity of implementing the query distribution and tolerating stale data are warranted. I would suggest that you start by doing some performance tests using the primary (default) ReadPreference.
It is also worth pointing out that what you are trying to accomplish (read scaling) can be better accomplished using sharding, without the need for additional client logic and use of a secondary ReadPreference.

Related

How Can I Make Fast Desktop Application With Remote Database?

I am going to make a desktop application with mysql database. My database tables are frequenlty changing -- almost 60% of the tables. So I think caching may be a bad idea. Can anyone suggest me:
How can I make a fast desktop application with a remote database ?
My language is Java.

The biggest problem with most projects that have performance as their primary concern is that people tend make some exotic choices that end up complicating the project without any real benefits. Unless you have previous actual hands-on experience with the environment you will be working start simple.
Set some realistic goals about how often you have to refresh your data before you start. If your data changes very frequently, eg. every second, does it make sense to try and show the changes in real time? A query every second will make everyone involved miserable.
Use a thread to take care of the queries. You don't need more than one, since any more will only make the race conditions in the database worst.
Design your database layer to be insulated from the rest of the application. Also time your DB-related operations from the beginning in order to track the impact of your optimizations.
Start with Hibernate / ORMLite. Although I cannot talk about ORMLite, I have used (optimized) Hibernate in heavy load environments without any problems. If you have complicated objects you should give it a try, it sure beats using plain JDBC and implementing the cache mechanism yourself.
Find out when you need lazy loading and when it's slowing you down (due to the select n+1 problem).
If you have performance issues optimize. You don't have to map every single relationship. Use custom SQL in separate methods to get the objects you need when you need them. You can write a query that only returns table ids and afterwards ask Hibernate to load the corresponding objects.
Optimize your SQL. Avoid joins, use subselects, where id in etc.
Implement (database) paging if it makes sense.
If all else fails, start using plain SQL. You' ll have already written the most complex queries and you'll know where your bigger bottlenecks are.
You could use a local SQLite to save the less volatile data and talk to the database mainly to get lists of ids and the stuff that you're missing. For example if you have users and orders, you can assume that you will have many more new orders per minute/second than users per hour.
To sum up, set clear performance goals before you start, always use a separate thread for data retrieval, avoid reinventing the wheel and keep it as simple as possible.

Here goes some generic approaches to the problem.
0) HW: make sure you are not having bottlenecks in you hardware, that you can cheaply increase. (adding HW is faster and cheaper that dev hours in most cases)
1) Caching:
Perhaps you can cache (locally or in a distributed cache like memcache) the 40% of data that tends to be immutable. You could invalidate the cache when data gets modified. You should choose the right entities and granularity level for building the keys.
2) Replication:
If the first is to much overhead, you could create slaves of your mysql and read from there. Again, you have to know when you can afford to have some stale data.
3) NoSQL:
Moving in that direction, but increasing the dev effort, you could move to some distributed store (take a look at the CAP theorem before making a choice)
Hope it helps

Depends on your database structure and application. You can use an object relational mapping library like ormlite and refresh objects loaded from database at the background with threads. With ormlite you may also use LazyForeignCollection to load only required data in your application.

Minimize unnecessary database call.
If your fields on database is changing, you can shift from relational to NoSQL database like MongoDB.
You can perform multithreading on the server side for data processing and clustering of application servers. While using multithreading use it effectively, be aware of the sychronized keyword, it will degrade the performance to some extend.
Perform best practice of coding, don't use more instance variable, try to use local variable, it will make you thread safe also.
You can use Mybatis for ORM also for large queries.
You can perform caching on DAO layer, service layer and even in client side but be sure to sychronize with the database, you can use different caching soutions.
You can do database indexing for first retrival.
Do not use same service for large data querying break it down into different services which will help u to process in multithreading way.
If the application is not very hard real time system you can use messaging solution also, like asychronously processing of data.

Way to find incremental changes in the system

We have a Java based system with postgres as database. For some reasons we want to propagate certain changes on timely basis (say 1 hour) to a different location. The two broad approaches are
Logging all the changes to a file as and when that happens. However
this approach will scatter the code everywhere.
Somehow find the incremental changes in postgres between two time stamps in
some log files and send that. However I am not sure how feasible is this
approach.
Anyone has any thoughts/ideas around this?

Provided that the database size is not very great, you could do it quick&dirt by just:
Dumping the entire postgresql to a textfile.
(If the dump file is not sorted *1) sorting the textfile.
Create a diff file with the previous dump file.
Of course, I would only advice this for a situation where your database is going to be kept relatively small and you are just going to use it for a couple of servers.
*1: I do not know if it is somehow sorted, check the docs.

There are a few different options available:
Depending on the amount of data being written you could give Bucardo a try.
Otherwise it is also possible to do something with PgQ in combination with Londiste
Or create something yourself by using triggers so you can generate some kind of audit table

There are many pre-packaged approaches, so you probably don't need to develop your own. Many of the options are summarized and compared on this Wiki page:
http://wiki.postgresql.org/wiki/Replication,_Clustering,_and_Connection_Pooling
Many of them are based on the use of triggers to capture the data, with automatic generation of the triggers based on a more user-friendly interface.

Instead of writing your own solution, I would advise to leverage work already done by others. And in the case you described I would go for PgQ + Londiste (both part of Skytools package), that are easy to set up and use. If you do not want streaming replication, you could still use PgQ / Londiste to easily capture DMLs and write them to a file that you can load when needed. This would allow you expand your setup / processing when new requirements come.

connect to multiple databases at the same time in openBravo?

I want to connect to multiple databases at the same time in openbravo so I would be able to store data in two different databases(for example: mysql and postgresql) for any transaction in the app.
Is there any clean way to do that and keep minimal changes to the existing code?
Thanks

I think, you should use replication for this task. It would be more clean and right solution from application architecture perspective.
You might configure 2 databases (with some out of-the-box solution or boiler-plate code). But it would decrease the application performance because each time when app would trigger a query, it must be executed at two DB instances. And in case of transactions, it would get even more complex/slow.
So replication is best way for such task. If you want to use selective replication use Tungsten. Let me know your specific need that can't be met with replication. I might point some more ideas for that.

Strategies for dealing with constantly changing requirements for MySQL schemas?

I'm using Hibernate EntityManager and Hibernate Annotations for ORM in a very early stage project. The project needs to launch soon, but the specs are changing constantly and I am concerned that the system will be launched and live data will be collected, and then the specs will change again and I will be in a situation where I need to change the database schema.
How can I set things up in order to minimize the impact of this? Are there any open source projects that deal with this kind of migration? Can Hibernate do this automatically (without wiping the database)?
Your advice is much appreciated.

It's more a functional or organizational problem than a technical one. No tool will automatically guess how to migrate data from one schema to another one. You'd better learn how to write stored procedure in order to migrate your data.
You'll probably need to disable constraints, create temporary table and columns, copy lots of data, and then delete the temporary tables and columns and re-enable constraints to have migrate your data.
Once in maintenance mode, every new feature that modifies the schema should also come with the script allowing to migrate from the current schema and data in production to the new one.

No system can possibly create datamigration scripts automatically from just the original and the final schema. There just isn't enough information.
Consider for example a new column. Should it just contain the default value? Or a value calculated from other fields/tables.
There is a good book about refactoring databases: http://www.amazon.com/Refactoring-Databases-Evolutionary-Addison-Wesley-Signature/dp/0321774515/ref=sr_1_1?ie=UTF8&qid=1300140045&sr=8-1
But there is little to no tool support for this kind of stuff.
I think the best thing you can do in advance:
Don't let anybody access the database but your application
If something else absolutely must access the db directly, give it a separate set of view specially for that purpose. This allows you to change your table structure by keeping at least the structure of what other systems see.
Have tons of tests. I just posted an article wich (with the upcoming 2nd and 3rd part) might help a little with this: http://blog.schauderhaft.de/2011/03/13/testing-databases-with-junit-and-hibernate-part-1-one-to-rule-them/

Hibernate can update the database entity model with data in the database. So do that and write migration code in java which sets or removes data relationships.
This works, and we have done it multiple times. But of course, try to follow a flexible development process; make what you know for sure first, then reevaluate the requirements - scrum etc.

In your case, I would recommend a NoSQL database. I don't have much experience with such kind of databases so I can't recommend any current implementation so you may want to check this too.

What JDBC tools do you use for synchronization of data sources?

I'm hoping to find out what tools folks use to synchronize data between databases. I'm looking for a JDBC solution that can be used as a command-line tool.
There used to be a tool called Sync4J that used the SyncML framework but this seems to have fallen by the wayside.

I have heard that the Data Replication Service provided by Db4O is really good. It allows you to use Hibernate to back onto a RDBMS - I don't think it supports JDBC tho (http://www.db4o.com/about/productinformation/drs/Default.aspx?AspxAutoDetectCookieSupport=1)
There is an open source project called Daffodil, but I haven't investigated it at all. (https://daffodilreplicator.dev.java.net/)
The one I am currently considering using is called SymmetricDS (http://symmetricds.sourceforge.net/)
There are others, they each do it slightly differently. Some use triggers, some poll, some use intercepting JDBC drivers. You need to decide what technical limitations you are under to determine which one you really want to use.
Wikipedia provides a nice overview of different techniques (http://en.wikipedia.org/wiki/Multi-master_replication) and also provides a link to another alternative DBReplicator (http://dbreplicator.org/).

If you have a model and DAO layer that exists already for your codebase, you can just create your own sync framework, it isn't hard.
Copy data is as simple as:
read an object from database A
remove database metadata (uuid, etc)
insert into database B
Syncing has some level of knowledge about what has been synced already. You can either do it at runtime by getting a list of uuids from TableInA and TableInB and working out which entries are new, or you can have a table of items that need to be synced (populate with a trigger upon insert/update in TableInA), and run from that. Your tool can be a TimerTask so databases are kept synced at the time granularity that you desire.
However there is probably some tool out there that does it all without any of this implementation faff, and each implementation would be different based on business needs anyway. In addition at the database level there will be replication tools.

True synchronization requires some data that I hope your database schema has (you can read the SyncML doc to see how they proceed). Sync4J won't help you much, it's really high-level and XML oriented. If you don't foresee any conflicts (which means: really easy synchronisation), you could try with a lightweight ETL like Enhydra Octopus.

I'm primarily using Oracle at the moment, and the most full-featured route I've come across is Red Gate's Data Compare:
http://www.red-gate.com/products/oracle-development/data-compare-for-oracle/
This old blog gives a good summary of the solution routes available:
http://www.novell.com/coolsolutions/feature/17995.html
The JDBC-specific offerings I've come across have been very basic. The solution mentioned by Aidos seems the most feature complete if you want to go down the publish-subscribe route:
http://symmetricds.codehaus.org/
Hope this helps.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.