I am trying to write to 3 different databases: MySQL, Oracle, and MongoDB. The requirement is that all 3 databases should be in a consistent state. For example, if the writes to MySQL and Oracle succeeded but the write to Mongo failed (e.g. due to a network failure), then there should be a way to write the failed record back to Mongo to keep all 3 records consistent. What's the best way to do this? Should I implement a queue to store failed records and have some background process read records from the queue and try to write them again to the failed database?
Your best bet would probably be the Java Transaction API (JTA). I have not personally used it but it seems to be the Java "industry standard" for distributed transactions.
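For illustration, here is a rough sketch of what JTA usage can look like in a Java EE-style environment with XA-capable data sources; the JNDI names and SQL are placeholders. Note that this only covers the XA-capable resources (MySQL, Oracle): a plain MongoDB driver cannot enlist in an XA transaction, so the Mongo write would still need a compensating mechanism such as the retry queue the question describes.

import javax.naming.InitialContext;
import javax.sql.DataSource;
import javax.transaction.UserTransaction;
import java.sql.Connection;
import java.sql.PreparedStatement;

// Rough sketch only: assumes an application server (or a standalone JTA
// implementation such as Atomikos or Narayana) exposing XA DataSources via
// JNDI. JNDI names, table and column names are placeholders.
public class DistributedWriteSketch {
    public void writeBoth(String data) throws Exception {
        InitialContext ctx = new InitialContext();
        UserTransaction utx = (UserTransaction) ctx.lookup("java:comp/UserTransaction");
        DataSource mysqlDs = (DataSource) ctx.lookup("jdbc/mysqlXA");   // placeholder JNDI name
        DataSource oracleDs = (DataSource) ctx.lookup("jdbc/oracleXA"); // placeholder JNDI name

        utx.begin();
        try (Connection mysql = mysqlDs.getConnection();
             Connection oracle = oracleDs.getConnection()) {
            // Both inserts commit or roll back together under the JTA coordinator.
            try (PreparedStatement ps = mysql.prepareStatement("INSERT INTO records(data) VALUES (?)")) {
                ps.setString(1, data);
                ps.executeUpdate();
            }
            try (PreparedStatement ps = oracle.prepareStatement("INSERT INTO records(data) VALUES (?)")) {
                ps.setString(1, data);
                ps.executeUpdate();
            }
            utx.commit();
        } catch (Exception e) {
            utx.rollback();
            throw e;
        }
        // The MongoDB write stays outside the XA transaction and needs its own
        // retry/compensation handling (e.g. the failed-record queue mentioned above).
    }
}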
Related
I have a situation wherein, as part of an online transaction, I have to save some data into another database; a slight latency (a few seconds) in updating the other database is fine. Since both databases are Oracle, I have the 3 options below and need some insight into which one is better.
Oracle database links: I convert the SQL into PL/SQL and let my database take care of writing into the other Oracle database. In the DEV environment both databases live on the same server as different schemas, while in production they are two separate Oracle RACs separated by a few routers and switches.
Spring Batch: Use a batch job to pick up the transactions from my source database, process them, and write them into the target database. This way my online transactions would not fail if the other database ever goes down, hits a performance issue, or has a network problem, and if the job ever fails I can code for restartability. Is Spring Batch well suited for such an event-publishing case? Would I hit any challenges in the future?
2-Phase Commit: I simply implement 2PC and save the data in both databases in one transaction. Or, to make it more future-proof, save into a messaging system as well as my source database (a sketch of that variant follows this list).
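For what it's worth, one common way to realize that last "source database plus messaging system" idea without true 2PC is a transactional outbox: the business row and an outbox row are committed atomically in the source database, and a separate relay (a poller, a Spring Batch job, CDC, etc.) forwards pending outbox rows to the target. A rough sketch assuming Spring; the table and column names are placeholders, not from the question:

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.transaction.annotation.Transactional;

// Sketch of a transactional outbox (placeholder table/column names): both inserts
// share one local transaction, so they commit or roll back together; a separate
// relay process forwards PENDING outbox rows to the second database or a queue.
public class OrderService {
    private final JdbcTemplate jdbc;

    public OrderService(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    @Transactional // single local transaction in the source database
    public void saveOrder(long orderId, String payload) {
        jdbc.update("INSERT INTO ORDERS (ID, PAYLOAD) VALUES (?, ?)", orderId, payload);
        jdbc.update("INSERT INTO OUTBOX (AGGREGATE_ID, PAYLOAD, STATUS) VALUES (?, ?, 'PENDING')",
                orderId, payload);
    }
}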
I am trying to use DB data as the source for my Kafka producer in my Java code. The source data grows continuously (say 20 rows per second). Currently the whole table is read from the DB and added to the Kafka topic every time a new record is inserted into the DB table. I want only the newly appended rows to be sent to the topic (i.e. if the table already holds 10 rows and 4 more rows are appended to it, only those 4 rows need to be sent to the topic).
Is there a way to achieve this in Java, given that we can also use the Kafka API?
A much easier route would be to use change-data-capture to feed the changes from the database to the Kafka topic. Trying to build this yourself is reinventing a wheel that has already been perfected ;-)
What's your source database? For proprietary RDBMS (Oracle, DB2, MS SQL etc) you have commercial tools such as GoldenGate, Attunity, DBVisit and so on. For open source RDBMS (e.g. MySQL, PostgreSQL) you should look at the open source Debezium tool.
All of these CDC tools integrate directly with Kafka.
The other option you have, depending on your use case, scale, etc., is just to pull changed rows from the database using the Kafka Connect JDBC connector. This is not as flexible or scalable as CDC, but it is still useful and easier than trying to poll the database yourself.
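If you do end up polling the database yourself from Java (the approach the question describes), a minimal sketch might look like the following; it assumes the table has a monotonically increasing id column, and the connection URL, table, and topic names are placeholders. This is essentially what the Kafka Connect JDBC connector's incrementing mode does for you, with offset tracking and fault tolerance built in.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Properties;

// Illustrative sketch of manual incremental polling: only rows with an id greater
// than the last one seen are produced to the topic. Names are placeholders.
public class IncrementalDbProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        long lastSeenId = 0L; // in practice, persist this offset somewhere durable
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
             Connection conn = DriverManager.getConnection("jdbc:mysql://localhost/sourcedb", "user", "pass")) {
            while (true) {
                try (PreparedStatement ps = conn.prepareStatement(
                        "SELECT id, payload FROM source_table WHERE id > ? ORDER BY id")) {
                    ps.setLong(1, lastSeenId);
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            lastSeenId = rs.getLong("id");
                            producer.send(new ProducerRecord<>("my-topic",
                                    Long.toString(lastSeenId), rs.getString("payload")));
                        }
                    }
                }
                Thread.sleep(1000); // poll interval
            }
        }
    }
}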
I am developing a Java application based on the Spring framework.
It:
Connects to a MySQL database
Gets data from MySQLTable1 into POJOs
Manipulates (updates, deletes) it in memory
Inserts it into a Netezza database table
The above 4 steps are done for each client (A, B, C) every hour.
I am using a Spring JdbcTemplate to get the data like this:
SELECT COL1,COL2,COL3 FROM MySQLTable1 WHERE CLIENTID='A' AND COL4='CONDITION'
and read each record into a POJO before I write it to a Netezza table.
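For illustration, the read step described above might look something like this with JdbcTemplate and a row mapper; ClientRecord is a hypothetical POJO standing in for the real one, not the actual code:

import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;

// Sketch of the read step: one query per client, each row mapped into a POJO.
public class ClientReader {
    private final JdbcTemplate jdbcTemplate;

    public ClientReader(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    public List<ClientRecord> readFor(String clientId) {
        return jdbcTemplate.query(
                "SELECT COL1, COL2, COL3 FROM MySQLTable1 WHERE CLIENTID = ? AND COL4 = ?",
                (rs, rowNum) -> new ClientRecord(
                        rs.getString("COL1"), rs.getString("COL2"), rs.getString("COL3")),
                clientId, "CONDITION");
    }
}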
There are going to be multiple instances of this application running every hour through a scheduler.
So Client A and Client B can be running concurrently, but the SELECTs will be unique;
I mean the data for:
SELECT COL1,COL2,COL3 FROM MySQLTable1 WHERE CLIENTID='A' AND COL4='CONDITION'
will be different from
SELECT COL1,COL2,COL3 FROM MySQLTable1 WHERE CLIENTID='B' AND COL4='CONDITION'
But remember all of these are stored in memory as POJOs.
My questions are:
Is there a risk of data contamination?
Is there a need to implement database transaction using spring data transaction manager?
Does my application really need to use something like Spring Batch to deal with this?
I appreciate your thoughts and feedback.
I know this is a perfect scenario for using an ETL tool but that is out of scope.
Is there a risk of data contamination?
It depends on what you are doing with your data, but I don't see how you can have data contamination if every instance is independent; you just have to make sure that instances running concurrently are not working on the same data (client ID).
Is there a need to implement database transaction using spring data transaction manager?
You will probably need a transaction for the insertion into the Netezza table. You certainly want your data to end up in a consistent state in the result table: if an error occurs in the middle of the process, you'll probably want to roll back everything that was inserted before the failure. Regarding the transaction manager, you don't specifically need the Spring transaction manager, but since you are already using Spring it might be a good option.
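As a rough sketch of that idea, assuming a transaction manager wired to the Netezza DataSource and the same hypothetical ClientRecord POJO as above (table name is a placeholder):

import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.transaction.PlatformTransactionManager;
import org.springframework.transaction.support.TransactionTemplate;

// Illustrative sketch: wraps all Netezza inserts in one transaction so a
// mid-batch failure rolls back everything inserted so far.
public class NetezzaWriter {
    private final JdbcTemplate netezzaJdbcTemplate;
    private final TransactionTemplate txTemplate;

    public NetezzaWriter(JdbcTemplate netezzaJdbcTemplate, PlatformTransactionManager netezzaTxManager) {
        this.netezzaJdbcTemplate = netezzaJdbcTemplate;
        this.txTemplate = new TransactionTemplate(netezzaTxManager);
    }

    public void writeAll(List<ClientRecord> rows) {
        txTemplate.execute(status -> {
            for (ClientRecord r : rows) {
                netezzaJdbcTemplate.update(
                        "INSERT INTO NZ_TABLE (COL1, COL2, COL3) VALUES (?, ?, ?)",
                        r.getCol1(), r.getCol2(), r.getCol3());
            }
            return null; // any runtime exception above rolls back the whole batch
        });
    }
}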
Does my application really need to use something like Spring Batch to deal with this?
Does it really need it? Probably not, but Spring Batch was made for this kind of application, so it might help you structure it (Spring Batch provides reusable functions that are essential for processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management). Everything can be done without the framework, and it might be overkill for a really small application. But in the end, if you need those features, you'll probably want to use it.
Spring Batch is ETL, so using it would be a good fit for this use case and also a good alternative to a commercial ETL tool.
Is there a risk of data contamination? Clients A and B read separate data, so they can never interfere with each other by reading or writing the same data by accident. The risk would only arise if two clients with the same ID were created, and that is not the case here.
Is there a need to implement database transaction using spring data transaction manager?
There is no mandatory need to do that, although programmatic transaction management has many pitfalls and is best avoided. Spring Batch would manage transactions for you, as well as other aspects such as paging.
Does my application really need to use something like Spring Batch to deal with this? There is no mandatory need to do this, although it would help a lot, especially in the paging aspect. How will you handle queries that return thousands of rows? Without a framework this needs to be handled manually.
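To make the Spring Batch suggestion concrete, here is a rough sketch (Spring Batch 4.x-style builders) of a chunk-oriented step that pages through the MySQL query and batch-inserts into Netezza. ClientRecord is the same hypothetical POJO as above, and the table, step, and method names are placeholders, not a definitive configuration.

import java.util.Map;
import javax.sql.DataSource;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.JdbcPagingItemReader;
import org.springframework.batch.item.database.Order;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.database.builder.JdbcPagingItemReaderBuilder;

// Sketch of a read-process-write chunk step: the paging reader keeps only one
// page of rows in memory, and each chunk runs in its own transaction.
public class CopyStepConfig {

    public JdbcPagingItemReader<ClientRecord> reader(DataSource mysqlDataSource, String clientId) {
        return new JdbcPagingItemReaderBuilder<ClientRecord>()
                .name("mysqlReader")
                .dataSource(mysqlDataSource)
                .selectClause("SELECT COL1, COL2, COL3")
                .fromClause("FROM MySQLTable1")
                .whereClause("WHERE CLIENTID = :clientId AND COL4 = :cond")
                .parameterValues(Map.of("clientId", clientId, "cond", "CONDITION"))
                .sortKeys(Map.of("COL1", Order.ASCENDING))
                .rowMapper((rs, i) -> new ClientRecord(
                        rs.getString("COL1"), rs.getString("COL2"), rs.getString("COL3")))
                .pageSize(500) // only 500 rows held in memory at a time
                .build();
    }

    public JdbcBatchItemWriter<ClientRecord> writer(DataSource netezzaDataSource) {
        return new JdbcBatchItemWriterBuilder<ClientRecord>()
                .dataSource(netezzaDataSource)
                .sql("INSERT INTO NZ_TABLE (COL1, COL2, COL3) VALUES (:col1, :col2, :col3)")
                .beanMapped()
                .build();
    }

    public Step copyStep(StepBuilderFactory steps, DataSource mysqlDs, DataSource netezzaDs) {
        return steps.get("copyStep")
                .<ClientRecord, ClientRecord>chunk(500) // each chunk is its own transaction
                .reader(reader(mysqlDs, "A"))
                .processor(item -> item) // in-memory update/delete logic would go here
                .writer(writer(netezzaDs))
                .build();
    }
}

Each chunk commits independently and only a page of rows is held in memory at a time, which addresses both the transaction question and the thousands-of-rows concern.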
I have two MongoDB instances running on two different servers connected via LAN. I want to replicate records from a few collections on server 1 to collections on server 2. Is there any way to do it? Below is a pictorial representation of what I want to achieve.
The following are the methods I am considering.
MongoDB replication - but it replicates all collections. Is selective replication possible in MongoDB?
Oplog watcher APIs - please suggest some reliable Java APIs.
Is there any other way to do this? And what is the best way of doing it?
MongoDB does not yet support selective replication, and it sounds as though you are not actually looking for selective replication but rather for selective copying, since replication imposes certain rules on how that server can be used.
I am not sure what you mean by an oplog watcher API, but it is easy enough to read the oplog over time by just querying it:
> use local
> db.oplog.rs.find()
( http://docs.mongodb.org/manual/reference/local-database/ )
and then storing, within a script you write, the timestamp of the latest record you have copied.
You can also use a tailable cursor on the oplog to effectively listen (pub/sub style) for changes and copy them over to your other server.
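A rough sketch of that tailable-cursor approach with the MongoDB Java driver follows; it assumes server 1 is a replica set member (a standalone mongod has no oplog), and the host, database, and collection names are placeholders.

import com.mongodb.CursorType;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.BsonTimestamp;
import org.bson.Document;

// Sketch: tail the oplog on server 1 and copy only operations on the chosen
// collections to server 2. Update/delete handling and checkpoint persistence
// are left as comments.
public class SelectiveCopier {
    public static void main(String[] args) {
        try (MongoClient source = MongoClients.create("mongodb://server1:27017");
             MongoClient target = MongoClients.create("mongodb://server2:27017")) {

            MongoCollection<Document> oplog =
                    source.getDatabase("local").getCollection("oplog.rs");

            // Start from "now"; persist the last processed timestamp to resume after restarts.
            BsonTimestamp lastTs = new BsonTimestamp((int) (System.currentTimeMillis() / 1000), 0);

            for (Document op : oplog.find(Filters.gt("ts", lastTs))
                                    .cursorType(CursorType.TailableAwait)
                                    .noCursorTimeout(true)) {
                String ns = op.getString("ns"); // e.g. "mydb.collection1"
                if (!"mydb.collection1".equals(ns) && !"mydb.collection2".equals(ns)) {
                    continue; // skip collections we don't want to copy
                }
                String opType = op.getString("op"); // i = insert, u = update, d = delete
                if ("i".equals(opType)) {
                    Document doc = op.get("o", Document.class);
                    String collName = ns.substring(ns.indexOf('.') + 1);
                    target.getDatabase("mydb").getCollection(collName).insertOne(doc);
                }
                // handle "u" and "d" similarly, then remember "ts" as the new checkpoint
                lastTs = (BsonTimestamp) op.get("ts");
            }
        }
    }
}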
I have developed a small Swing desktop application. This app needs data from another database, so I've created a small process in Java that gets the info (using JDBC) from the remote DB and copies it (using JPA) to the local database. The problem is that this process takes a lot of time. Is there another way to do it in order to make this task faster?
Please let me know if I am not clear; I'm not a native speaker.
Thanks
Diego
One good option is to use the Replication feature in MySQL. Please refer to the MySQL manual for more information.
JPA is less suited here, as object-relational mapping is costly and this is a bulk data transfer. You probably also do not need database replication for this.
Maybe a backup-based approach is a solution; there are several different ways to do it.
In general, one can also run a mysqldump (on a single table, for instance) as a cron task, compress the dump, and retrieve it.
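If you stay with a pull-over-JDBC approach, a plain JDBC batch copy is usually much faster than persisting entities one by one through JPA. A minimal sketch, with placeholder URLs, credentials, and table/column names:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch of a bulk copy over plain JDBC: read from the remote DB, batch-insert
// into the local DB, and commit once at the end to avoid per-row overhead.
public class BulkCopy {
    public static void main(String[] args) throws SQLException {
        try (Connection src = DriverManager.getConnection("jdbc:mysql://remote-host/sourcedb", "user", "pass");
             Connection dst = DriverManager.getConnection("jdbc:mysql://localhost/localdb", "user", "pass")) {
            dst.setAutoCommit(false);
            try (Statement read = src.createStatement();
                 ResultSet rs = read.executeQuery("SELECT id, name, value FROM source_table");
                 PreparedStatement write = dst.prepareStatement(
                         "INSERT INTO target_table (id, name, value) VALUES (?, ?, ?)")) {
                int count = 0;
                while (rs.next()) {
                    write.setLong(1, rs.getLong("id"));
                    write.setString(2, rs.getString("name"));
                    write.setString(3, rs.getString("value"));
                    write.addBatch();
                    if (++count % 1000 == 0) {
                        write.executeBatch(); // flush in chunks to keep memory bounded
                    }
                }
                write.executeBatch();
                dst.commit();
            }
        }
    }
}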