I have a requirement to create a Java application which will read data from 52 database tables, copy all data that is more than 3 years old to flat files (CSV or TXT), delete that data from the tables, and store the files on an SFTP server. The database in this case is Sybase ASE 15.
I also need to restore this data into temporary tables when certain reports involving the archived data have to be prepared.
If I make this a single-threaded application, it will take many hours to complete the task, so I need to make it multi-threaded.
Please suggest whether I should use only core Java or a framework like Spring Batch, and how to achieve multi-threading in each case.
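For illustration, a minimal multi-threading sketch of what I had in mind, using a plain ExecutorService with one task per table (the table list and the archiveTable helper are placeholders, not the real code):

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ArchiveRunner {

    // Placeholder list of the 52 tables to archive
    private static final List<String> TABLES = List.of("orders", "invoices" /* ... */);

    public static void main(String[] args) throws InterruptedException {
        // One worker per CPU core; each task archives a single table
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());

        for (String table : TABLES) {
            pool.submit(() -> archiveTable(table));
        }

        pool.shutdown();
        pool.awaitTermination(8, TimeUnit.HOURS);
    }

    private static void archiveTable(String table) {
        // Placeholder: SELECT rows older than 3 years, write them to a CSV file,
        // upload the file via SFTP, then DELETE the exported rows.
    }
}
```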
We have a source Oracle database with a lot of tables (say 100) which we need to mirror to a target database, so we need to periodically copy data increments to the target tables. The target database is currently Oracle, but in the near future it will probably be changed to a different database technology.
Currently we could create a PL/SQL procedure which dynamically generates DML (insert, update, or merge statements) for each table from Oracle metadata, assuming that the source and target tables have exactly the same attributes.
But we would rather build a database-technology-independent solution, so that when we switch the target database to another product (e.g. MS SQL or Postgres) we do not need to change the whole data-mirroring logic.
Does anyone have a suggestion on how to do this differently (preferably in Java)?
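For illustration, a minimal sketch of the kind of vendor-neutral approach we are considering (table names and connection details are just placeholders): read the column names via JDBC DatabaseMetaData and build a parameterized INSERT for each table, so the same code works regardless of the target database.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class InsertBuilder {

    /**
     * Builds a parameterized INSERT for the given table by reading its column
     * names from JDBC metadata, so no vendor-specific dictionary views are needed.
     */
    public static String buildInsert(Connection con, String table) throws Exception {
        DatabaseMetaData meta = con.getMetaData();
        List<String> columns = new ArrayList<>();
        try (ResultSet rs = meta.getColumns(null, null, table, null)) {
            while (rs.next()) {
                columns.add(rs.getString("COLUMN_NAME"));
            }
        }
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + String.join(", ", columns) + ") "
                + "VALUES (" + placeholders + ")";
    }
}
```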
Thanks for any advice.
The problem you have is called CDC - change data capture. In the case of Oracle this is complicated, because Oracle usually charges money for it.
So you can use:
PL/SQL or Java code that uses SQL queries to incrementally detect changes in the data (see the sketch after this list). This requires plenty of work and performance is poor.
Tools based on Oracle triggers, which detect data changes and push them into some queue.
Tools which can parse the content of Oracle archive logs. These are commercial products: GoldenGate (from Oracle) and SharePlex (from Quest). GoldenGate also offers a Java technology (XStream) which allows you to inject a Java visitor into the data stream. These technologies also support sending data changes into a Kafka stream.
There are plenty of tools, such as Debezium, Informatica, or Tibco, which cannot parse the archived logs themselves but instead use Oracle's built-in tool LogMiner. These tools usually do not scale well and cannot cope with higher data volumes.
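A minimal sketch of the first option, assuming the source table has a LAST_MODIFIED timestamp column (table, column, and method names here are hypothetical, not part of any product): poll for rows changed since the last run and merge them into the target. Note that the MERGE syntax itself is vendor-specific, which is exactly the portability problem the question describes.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;

public class IncrementalCopy {

    /**
     * Copies rows changed since the given watermark from source to target and
     * returns the new watermark to persist for the next run.
     * Assumes both tables share the columns ID, NAME and LAST_MODIFIED.
     */
    public static Timestamp copyChanges(Connection source, Connection target,
                                        Timestamp lastRun) throws Exception {
        Timestamp newWatermark = lastRun;
        String select = "SELECT id, name, last_modified FROM customers WHERE last_modified > ?";
        // MERGE syntax differs per vendor; this one is Oracle-flavoured
        String merge = "MERGE INTO customers t USING (SELECT ? id, ? name FROM dual) s "
                + "ON (t.id = s.id) "
                + "WHEN MATCHED THEN UPDATE SET t.name = s.name "
                + "WHEN NOT MATCHED THEN INSERT (id, name) VALUES (s.id, s.name)";

        try (PreparedStatement sel = source.prepareStatement(select);
             PreparedStatement upd = target.prepareStatement(merge)) {
            sel.setTimestamp(1, lastRun);
            try (ResultSet rs = sel.executeQuery()) {
                while (rs.next()) {
                    upd.setLong(1, rs.getLong("id"));
                    upd.setString(2, rs.getString("name"));
                    upd.addBatch();
                    Timestamp modified = rs.getTimestamp("last_modified");
                    if (modified.after(newWatermark)) {
                        newWatermark = modified;
                    }
                }
                upd.executeBatch();
            }
        }
        return newWatermark;
    }
}
```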
There is quite a good article summarizing these options. If you have the money, pick GoldenGate or SharePlex. If you don't, pick Debezium or any other Java CDC project based on LogMiner.
I am trying to use DB data as the source for a Kafka producer in my Java code. The source data grows continuously (say 20 rows per second). Currently the whole table is read from the DB and added to the Kafka topic every time a new record is inserted. I want only the newly appended rows to be sent to the topic (i.e. if the table already holds 10 rows and 4 more rows are appended, only those 4 rows should be sent).
Is there a way to achieve this in Java, given that we can also use the Kafka API?
A much easier route would be to use change-data-capture to feed the changes from the database to the Kafka topic. Trying to build this yourself is reinventing a wheel that has already been perfected ;-)
What's your source database? For a proprietary RDBMS (Oracle, DB2, MS SQL, etc.) you have commercial tools such as GoldenGate, Attunity, Dbvisit, and so on. For an open-source RDBMS (e.g. MySQL, PostgreSQL) you should look at the open-source Debezium tool.
All of these CDC tools integrate directly with Kafka.
The other option, depending on your use case, scale, etc., is simply to pull changed rows from the database using the Kafka Connect JDBC connector. This is not as flexible or scalable as CDC, but it is still useful and easier than trying to poll the database yourself.
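If you do end up polling the database yourself, here is a minimal sketch of the idea (the table, column, topic name, and connection details are placeholders): track the highest ID already sent and only produce rows above it on each poll.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class NewRowsPublisher {

    /**
     * Sends only rows with an id greater than lastSentId to the topic and
     * returns the new high-water mark to use on the next poll.
     */
    public static long publishNewRows(Connection db, KafkaProducer<String, String> producer,
                                      long lastSentId) throws Exception {
        long maxId = lastSentId;
        String sql = "SELECT id, payload FROM events WHERE id > ? ORDER BY id";
        try (PreparedStatement ps = db.prepareStatement(sql)) {
            ps.setLong(1, lastSentId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    long id = rs.getLong("id");
                    producer.send(new ProducerRecord<>("events-topic",
                            String.valueOf(id), rs.getString("payload")));
                    maxId = id;
                }
            }
        }
        producer.flush();
        return maxId;
    }

    static KafkaProducer<String, String> buildProducer() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return new KafkaProducer<>(props);
    }
}
```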
I have a web application written in Java for course scheduling. I use the file system to store my data: I serialize my collections (and the objects in them) into a single serialized object. Every user has a few files for system data and for backup or prepared table data. My program reads the system-data file when a user logs in and writes it back when the user exits.
The object design is very complex and there are many many-to-many relations between my objects. Storing this program's data in SQL would require many join operations and a well-designed schema. Furthermore, a user generally has thousands of pieces of information, covering hundreds of courses, teachers, and rooms, and even thousands of constraints. Besides that, there is no connection between users.
In this program I do not see any advantage to using SQL, and my system already runs very fast. However, lots of developers who do not know the structure of my program have advised me to move my storage to SQL.
What is the advantage of SQL over the file system (with serialized objects), especially for my unusual system?
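For context, the current persistence boils down to roughly the following (class and file names are simplified placeholders, not the real code):

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Holds all collections for one user in a single serializable object
class UserData implements Serializable {
    Map<String, List<String>> coursesByTeacher = new HashMap<>();
    // ... rooms, constraints, etc.
}

public class FileStore {

    // Written when the user exits
    static void save(UserData data, String path) throws Exception {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(path))) {
            out.writeObject(data);
        }
    }

    // Read when the user logs in
    static UserData load(String path) throws Exception {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(path))) {
            return (UserData) in.readObject();
        }
    }
}
```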
To name just a few advantages of using a SQL database in general:
Scalability: You can move your data to a separate server(s) to reduce the load on your application server.
Accessibility: Should you wish to release an extension of your program in the future, that program can run independently and still access the same data without you having to worry about locking files etc.
Security: Database systems allow you to configure "privileges" as who is allowed to write data and who is allowed to read data.
Portability: Should you decide in the future to move your program to a different technology, the SQL language is an industry standard that can be ported relatively easily from one database vendor to another. You can also port your main application to another technology independently of your database, provided both technologies have drivers / libraries / connectors / plug-ins / whatever-you-want-to-call-it to communicate with your database.
Backups: You can easily run a backup while your application is online. With files, you might run into locking issues or have to shut down the application to do a backup.
Human resources: You can employ a database administrator to take care of your databases and reduce the responsibility of your developers/engineers so that they can focus on other tasks.
Monitoring: You can monitor or inspect your data, while your application is live.
However, an application that uses a SQL database is usually designed with a relational data structure in mind from the beginning. Without fully understanding your application, it sounds like you have a very complex data structure and it might not be that easy to migrate to a SQL database. I would recommend that you also have a look at a NoSQL database, which:
allows large volumes of structured, semi-structured, and unstructured data
encourages object-oriented programming
is easy to use
and is much more flexible than SQL
I need to copy about 50 million rows, joined across 3-4 tables, from one Oracle DB into a single table in another. This is a repeating process that happens from time to time. I copy only active data (there is some outdated archived data that is not needed any more). We have a special Java service which does this via JDBC, but it is too slow for our needs.
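Roughly, the current service does something like the sketch below (query, table, and column names are simplified placeholders); even with a large fetch size and batched inserts it is not fast enough for this volume:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class CopyService {

    static void copyActiveRows(Connection source, Connection target) throws Exception {
        source.setAutoCommit(false); // needed on some drivers for the fetch size to take effect
        target.setAutoCommit(false);

        String select = "SELECT a.id, a.val, b.name FROM a JOIN b ON b.a_id = a.id WHERE a.active = 1";
        String insert = "INSERT INTO target_table (id, val, name) VALUES (?, ?, ?)";

        try (Statement sel = source.createStatement();
             PreparedStatement ins = target.prepareStatement(insert)) {
            sel.setFetchSize(10_000); // stream rows instead of one round trip per row
            try (ResultSet rs = sel.executeQuery(select)) {
                int pending = 0;
                while (rs.next()) {
                    ins.setLong(1, rs.getLong(1));
                    ins.setString(2, rs.getString(2));
                    ins.setString(3, rs.getString(3));
                    ins.addBatch();
                    if (++pending == 10_000) { // flush inserts in batches
                        ins.executeBatch();
                        target.commit();
                        pending = 0;
                    }
                }
                ins.executeBatch();
                target.commit();
            }
        }
    }
}
```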
You can use the Transportable Modules feature:
The fundamental functionality of a Transportable Module is to copy a group of related database objects from one database to another using the fastest possible mechanisms.
You can use the Data Pump utility, available from Oracle 10g onwards. It gives you the capability to use direct-path export. To learn more, here is the link:
http://docs.oracle.com/cd/B19306_01/server.102/b14215/dp_export.htm
I have developed a small Swing desktop application. This app needs data from another database, so I've created a small process in Java that gets the info from the remote DB (using JDBC) and copies it to the local database (using JPA). The problem is that this process takes a lot of time. Is there another way to do it in order to make this task faster?
Please let me know if I am not clear, I'm not a native speaker.
Thanks
Diego
One good option is to use the Replication feature in MySQL. Please refer to the MySQL manual here for more information.
JPA is less suited here, as object-relational mapping is costly and this is a bulk data transfer. You probably also do not need database replication here.
Maybe a backup is a solution: there are several different approaches to that.
In general, one can also do a mysqldump (of a single table, for instance) from a cron task, compress the dump, and retrieve it.