We have a source Oracle database with a lot of tables (say 100) which we need to mirror to a target database, so we need to copy data increments periodically to the target tables. The target database is currently Oracle, but in the near future it will probably be changed to a different database technology.
Currently we can create a PL/SQL procedure which dynamically generates DML (insert, update or merge statements) for each table from Oracle metadata, assuming that the source and target tables have exactly the same attributes.
But we would rather create a database-technology-independent solution, so that when we change the target database to another one (e.g. MS SQL or Postgres) we will not need to change the whole data-mirroring logic.
Does anyone have a suggestion how to do it differently (preferably in Java)?
Thanks for every piece of advice.
The problem you have is called CDC - change data capture. In the case of Oracle this is complicated, because Oracle usually asks money for it.
So you can use:
Use PL/SQL or Java with plain SQL to incrementally detect changes in the data. It requires plenty of work and performance is poor.
Use tools based on Oracle triggers, which detect data changes and push them into some queue.
Use a tool which can parse the content of Oracle archive logs. These are commercial products: GoldenGate (from Oracle) and SharePlex (Quest/Dell). GoldenGate also contains a Java technology (XStreams) which allows you to inject a Java visitor into the data stream. These technologies also support sending data changes into a Kafka stream.
There are plenty of tools like Debezium, Informatica and Tibco which cannot parse archived logs by themselves, but instead use Oracle's internal tool LogMiner. These tools usually do not scale well and cannot cope with higher data volumes.
As a summary: if you have money, pick GoldenGate or SharePlex. If you don't, pick Debezium or any other Java CDC project based on LogMiner.
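The first option (incremental detection with plain SQL) can be sketched as a pair of SQL-building helpers. This assumes every mirrored table has a LAST_MODIFIED watermark column - a hypothetical convention, not something Oracle provides by itself - and the table/column names are made up for illustration:

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the "plain SQL" approach: build portable delta/merge
// statements from metadata. Assumes each mirrored table has a
// LAST_MODIFIED timestamp column maintained by the application.
public class DeltaSqlBuilder {

    // SELECT that fetches only rows changed since the last sync watermark.
    static String deltaSelect(String table, String watermarkColumn) {
        return "SELECT * FROM " + table
                + " WHERE " + watermarkColumn + " > ? ORDER BY " + watermarkColumn;
    }

    // ANSI-style MERGE skeleton; in a real tool the column names would
    // be read from ALL_TAB_COLUMNS in the source database.
    static String merge(String table, String key, List<String> cols) {
        String setList = cols.stream()
                .filter(c -> !c.equals(key))
                .map(c -> "t." + c + " = s." + c)
                .collect(Collectors.joining(", "));
        String colList = String.join(", ", cols);
        String valList = cols.stream().map(c -> "s." + c)
                .collect(Collectors.joining(", "));
        return "MERGE INTO " + table + " t USING staging s ON (t." + key
                + " = s." + key + ") WHEN MATCHED THEN UPDATE SET " + setList
                + " WHEN NOT MATCHED THEN INSERT (" + colList
                + ") VALUES (" + valList + ")";
    }
}
```

Because the generated statements stick to ANSI syntax, the same builder would keep working after a switch to Postgres or MS SQL, which is the portability the question asks about.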
Related
As the application gets complicated, one thing that changes a lot is the queries, especially the complex ones. Wouldn't it be easier to maintain the queries in the database rather than in resource files inside the package, so that they can be enhanced easily without a code change? What are the drawbacks of this?
You can use stored procedures to keep your queries in the database. Then your Java code can just call the procedure instead of building a complex query.
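Calling a stored procedure from Java usually goes through JDBC's CallableStatement and the call-escape syntax. The procedure name get_order_total and its parameters below are hypothetical:

```java
import java.math.BigDecimal;
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.Types;

// Sketch of invoking a stored procedure via JDBC escape syntax.
// The procedure "get_order_total" and its parameters are made up.
public class ProcedureCall {

    // Builds the JDBC call-escape string, e.g. "{call p(?, ?)}".
    static String callSyntax(String procedure, int paramCount, boolean returnsValue) {
        StringBuilder sb = new StringBuilder(returnsValue ? "{? = call " : "{call ");
        sb.append(procedure).append("(");
        for (int i = 0; i < paramCount; i++) {
            sb.append(i == 0 ? "?" : ", ?");
        }
        return sb.append(")}").toString();
    }

    // Executes the hypothetical procedure: one IN and one OUT parameter.
    static BigDecimal orderTotal(Connection conn, long orderId) throws Exception {
        try (CallableStatement cs = conn.prepareCall(callSyntax("get_order_total", 2, false))) {
            cs.setLong(1, orderId);                     // IN: order id
            cs.registerOutParameter(2, Types.NUMERIC);  // OUT: computed total
            cs.execute();
            return cs.getBigDecimal(2);
        }
    }
}
```

The escape syntax is portable across JDBC drivers, so the Java side stays the same even if the procedure body is rewritten for another database.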
See wikipedia for a more detailed explanation about stored procedures:
https://en.wikipedia.org/wiki/Stored_procedure
You can find details about the implementation and usage in the documentation of your database system (MySql, MariaDb, Oracle...)
When you decide to move logic to the database, you should use a version control system for databases like liquibase: https://www.liquibase.org/get-started/quickstart
You can write the changes to your database code in XML, JSON or even YAML and check that in to your version control system (SVN, Git...). This way you have a history of the changes and can roll back to a previous version of your procedure if something goes wrong.
You also asked, why some people use stored procedures and others keep their queries in the code.
Stored procedures can encapsulate the query and provide an interface to the data. They can be faster than queries. That is good.
But there are also problems:
you distribute the business logic of your application between the database and the program code. It can really be troublesome if the logic is spread through all technical layers of your application.
it is not so simple anymore to switch from an Oracle database to MariaDB if you use specific features of the database system. You have to migrate or rewrite the procedures.
you have to integrate Liquibase or another such system into your build pipeline to keep track of your database changes.
So it depends on the project and its size which of the solutions is better.
I thought about this solution: get data from the web service, insert it into a table and then join it with the other table, but that will affect performance, and afterwards I must also delete all that data.
Are there other ways to do this?
You don't return a record set from a web service. HTTP knows nothing about your database or result sets.
HTTP requests and responses are strings. You'll have to parse out the data, turn it into queries, and manipulate it.
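Parsing the response body is plain string-to-structure work. For an XML response this can be done with the JDK's built-in DOM parser; the rows/row/id element names here are hypothetical:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: turn an XML web-service response string into a list of
// values that can then be bound into SQL queries. The <row>/<id>
// element names are made up for illustration.
public class ResponseParser {

    static List<String> extractIds(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList ids = doc.getElementsByTagName("id");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < ids.getLength(); i++) {
            result.add(ids.item(i).getTextContent());
        }
        return result;
    }
}
```

The extracted values can then be bound as parameters of a prepared statement instead of being staged in a table first.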
Performance depends a great deal on things like having proper indexes on columns in WHERE clauses, the nature of the queries, and a lot of details that you don't provide here.
This sounds like a classic case of "client versus server". Why don't you write a stored procedure that does all that work on the database server? You are describing a lot of work to bring a chunk of data to the middle tier, manipulate it, put it back, and then delete it. I'd figure out how to have the database do it if I could.
No, you don't need to save anything into the database; there are a number of ways to convert XML to a table without storing it.
For example, in an Oracle database you can use XMLTable/XMLType/XQuery/DBMS_XML
to convert the XML result from the web service into a table and then use it in your queries.
for example:
if you use Oracle 12c you can use JSON_QUERY: Oracle 12c JSON
XMLTable: oracle-xmltable-tutorial
this week's discussion about converting XML into table data
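As a sketch of the XMLTable approach from Java: the query below turns an XML string bound from JDBC into relational rows, so it can be joined directly without a staging table. The XPath and column names are hypothetical:

```java
// Sketch: an Oracle XMLTABLE query that converts an XML string,
// bound as a single JDBC parameter, into relational rows.
// The /rows/row path and the id/name columns are made up.
public class XmlTableQuery {

    static String xmlTableSql() {
        return "SELECT x.id, x.name "
             + "FROM XMLTABLE('/rows/row' "
             + "PASSING XMLTYPE(?) "
             + "COLUMNS id NUMBER PATH 'id', "
             + "name VARCHAR2(100) PATH 'name') x";
    }
}
```

The statement would be prepared with JDBC, the raw web-service response bound as the one parameter, and the result joined against other tables in the same query - no insert or cleanup needed.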
It is common to think about applications having a three-tier structure: user interface, "business logic"/middleware, and backend data management. The idea of pulling records from a web service and (temporarily) inserting them into a table in your SQL database has some advantages, as the "join" you wish to perform can be quickly implemented in SQL.
Oracle (as other SQL DBMS) features temporary tables which are optimized for just such tasks.
However, this might not be the best approach given your concerns about performance. It's a guess that your "middleware" layer is written in Java, given the tags placed on the question; the lack of any explicit description suggests you may be attempting a two-tier design, where user interface programs connect directly with the backend data management resources.
Given your apparent investment in Oracle products, you might find it worthwhile to incorporate Oracle Middleware elements in your design. In particular Oracle Fusion Middleware promises to enable "data integration" between web services and databases.
I need to copy about 50 million rows, with joins among 3-4 tables, from one Oracle db to another into a single table. This is a repeating process happening from time to time. I copy only active data (meaning there's some outdated archived data that is not needed any more). We have a special Java service which does this procedure via JDBC, which is too slow for our needs.
You can use the Transportable Modules
The fundamental functionality of a Transportable Module is to copy a group of related database objects from one database to another using the fastest possible mechanisms.
You can use DataPump utility available in Oracle 10g onwards. It gives you the capability to use direct path export. To know more, here is the link -
http://docs.oracle.com/cd/B19306_01/server.102/b14215/dp_export.htm
I have a data-centric & data-sensitive application which is written in Java, but almost all the business logic is maintained in .sql files.
These SQL files are executed one by one; a temporary table is created and updated by them.
Internally these SQL files fire update queries on the temporary table with the available data values under various conditions.
Finally the temporary table is dumped into a physical table.
We are planning to move this to a Java rule-based application, as the SQL scripts are getting huge and hard to understand as well as maintain.
We plan to keep all the data in memory using Lucene & its RAMDirectory. What would be the preferred choice for building rules (these are nothing but the update queries in SQL)?
I was looking at scripting languages to have dynamic rules, but scripts (Rhino/Groovy)
have the same characteristics as the SQL files (hard to write & maintain).
Please post your suggestions.
Thanks in advance!!
Our company uses Drools. Works really great for us. Drools normally has you write your rules in an XML-based format, but we just extended some of their classes so we could write our rules in Java (this allows us to debug the rules at runtime).
We also use JBoss Rules / Drools. The newer versions (> 4.0.0) have a nice DSL that is perfectly readable and maintainable. No more XML is needed.
I need to create a storage file format for some simple data in a tabular format, was trying to use HDF5 but have just about given up due to some issues, and I'd like to reexamine the use of embedded databases to see if they are fast enough for my application.
Is there a reputable embedded Java database out there that has the option to store data in one file? The only one I'm aware of is SQLite (Java bindings available). I tried H2 and HSQLDB but out of the box they seem to create several files, and it is highly desirable for me to have a database in one file.
edit: reasonably fast performance is important. Object storage is not; for performance concerns I only need to store integers and BLOBs. (+ some strings but nothing performance critical)
edit 2: storage data efficiency is important for larger datasets, so XML is out.
Nitrite Database http://www.dizitart.org/nitrite-database.html
NOsql Object (NO2 a.k.a. Nitrite) database is an open source NoSQL embedded document store written in Java with a MongoDB-like API. It supports both in-memory and single-file based persistent stores.
H2 uses only one file, if you use the latest H2 build with the PAGE_STORE option. It's a new feature, so it might not be solid.
If you only need read access then H2 is able to read the database files from a zip file.
Likewise if you don't need persistence it's possible to have an in-memory only version of H2.
If you need both read/write access and persistence, then you may be out of luck with standard SQL-type databases, as these pretty much all uniformly maintain the index and data files separately.
Once I used an object database that saved its data to a file. It has a Java and a .NET interface. You might want to check it out. It's called db4o.
Chronicle Map is an embedded pure Java database.
It stores data in one file, i. e.
ChronicleMap<Integer, String> map = ChronicleMap
.of(Integer.class, String.class)
.averageValue("my-value")
.entries(10_000)
.createPersistedTo(databaseFile);
Chronicle Map is mature (no severe storage bugs reported for months now, while it's in active use).
Independent benchmarks show that Chronicle Map is the fastest and the most memory-efficient key-value store for Java.
The major disadvantage for your use case is that Chronicle Map supports only a simple key-value model, however a more complex solution could be built on top of it.
Disclaimer: I'm the developer of Chronicle Map.
If you are looking for a small and fast database, perhaps to ship with another program, I would check Apache Derby. I don't know exactly how you define "embedded database", but I used Derby in some projects as a debugging database that can be checked in with the source and is instantly available on every developer machine.
This isn't an SQL engine, but If you use Prevayler with XStream, you can easily create a single XML file with all your data. (Prevayler calls it a snapshot file.)
Although it isn't SQL-based, and so requires a little elbow grease, its self-contained nature makes development (and especially good testing) much easier. Plus, it's incredibly fast and reliable.
You may want to check out jdbm - we use it on several projects, and it is quite fast. It does use 2 files (a database file and a log file) if you are using it for ACID type apps, but you can drop directly to direct database access (no log file) if you don't need solid ACID.
JDBM will easily support integers and blobs (anything you want), and is quite fast. It isn't really designed for concurrency, so you have to manage the locking yourself if you have multiple threads, but if you are looking for a simple, solid embedded database, it's a good option.
Since you mentioned sqlite, I assume that you don't mind a native db (as long as good java bindings are available). Firebird works well with java, and does single file storage by default.
Both H2 and HSQLDB would be excellent choices, if you didn't have the single file requirement.
I think for now I'm just going to continue to use HDF5 for the persistent data storage, in conjunction with H2 or some other database for in-memory indexing. I can't get SQLite to use BLOBs with the Java driver I have, and I can't get embedded Firebird up and running, and I don't trust H2 with PAGE_STORE yet.