Processing a text file from PL/SQL vs Java

Processing a text file from PL/SQL vs Java - java

I need to implement a store procedure in an Oracle database that will do the following:
Read an external file that needs to be processed (extract data from file and validate)
Call another store procedure in the database in charge to validate/insert the data.
Manage exceptions.
Write to another file with the results of the operations executed.
I know I can do all these things with PL/SQL or Java (store procedure), but which will be more efficient/faster or better? most of the operations are reading/writing a file, and the database operations are done in a store procedure already.
I have read other posts about PL/SQL vs Java (like this and this) but none talks about this.

I would never want to use a SQL dialect, no matter how versatile it may be, to do anything outside the DB. I would do what you're trying to do preferably in a shell or perl script, to use the lowest common denominator, although Java is OK, just perhaps a bit too sophisticated for a job this simple. But if Java is all you have or know how to use, then go for it.

Related

Java Application - Can i Store my sql queries in the DB rather than a file packaged inside the application?

As the application gets complicated, one thing that change a lot is the queries, especially if they are complex queries. Wouldn't it be easier to maintain the queries in the db rather then the resources location inside the package, so that it can be enhanced easily without a code change. What are the drawbacks of this?

You can use stores procedures, to save your queries in the database. Than your Java code can just call the procedure from the database instead of building a complex query.
See wikipedia for a more detailed explanation about stored procedures:
https://en.wikipedia.org/wiki/Stored_procedure
You can find details about the implementation and usage in the documentation of your database system (MySql, MariaDb, Oracle...)
When you decide to move logic to the database, you should use a version control system for databases like liquibase: https://www.liquibase.org/get-started/quickstart
You can write the changes to you database code in xml, json or even yaml and check that in in your version control system (svn, git...). This way you have a history of the changes and can roll back to a previous version of your procedure, if something goes wrong.
You also asked, why some people use stored procedures and others keep their queries in the code.
Stored procedures can encapsulate the query and provide an interface to the data. They can be faster than queries. That is good.
But there are also problems
you distribute the buisiness logic of your application to the database and the programm code. It can realy be troublesome, if the logic is spread through all technical layers of your applicaton.
it is not so simple anymore to switch from a Oracle database to a MariaDb, if you use specific features of the database system. You have to migrate or rewrite the procedures.
you have to integrate liquibase or another system into you build pipeline, to keep track of you database changes.
So it depends on the project and it's size, if either of the solutions is better.

Which is the best way to handle big CSV files (Java, MySQL, MongoDB)

i need to handle a big CSV file with around +750.000 rows of data. Each line has around 1000+ characters and ~50 columns, and i am really not sure what's the best (or atleast good and sufficient) way to handle and manipulate this kind of data.
I need to do the following steps:
Compare the values of two Colomns and write the result to a new column (this one seems easy)
Compare values of two lines and do stuff. (e.g delete if one value is duplicated.)
Compare values of two different files.
My Problem is that this is currently done with PHP and/ or Excel and the limits are nearly exceeded + this takes a long time to process and will be no longer possible when the files get even bigger.
I have 3 different possibilities in mind:
Use MySQL, create a table (or two) and do the comparing, adding or deleting part. (I am not really familiar with SQL and would have to learn it, also it should be done automatically so there is the problem that you cant create tables of CSV files )
Use Java creating Objects in ArrayList or Linked Lists and to "the stuff" (to operations would be easy but handling that much data will probably be the problem)
(Is it even possible to save that many files in Java or does it crash / is there a good tool etc.?)
Use Clojure along with MongoDB to add files from CSV to MongoDB and read files using Mongo.
(Name additional possibilities if you have another idea ..)
All in all I am not a Pro in any of these but would like to solve this problem / get some hints or even your opinion.
Thanks in advance

Since in our company we work a lot with huge csv files here are some ideas:
because these files are in our case always exported from some other relational database we always use PostgreSQL, MySQL or golang + SQLite to be able to use simple plain SQL queries which are in these cases most simple and reliable solution
number of rows you describe is quite low from the point of view of all these databases so do not worry
all have native internal solution for import / export of CSV - which works much quicker than anything created manually
for repeated standard checks I use golang + SQLite with :memory: database - this is definitely the quickest solution
MySQL is definitely very good and quick for checks you described but choose of database depends also on how sophisticated analysis you would need to do further - for example MySQL up to 5.7 still does not have window functions which you could need later - so consider using PostgreSQL in some cases too...

I normally use PostgreSQL for this kind of tasks. PostgreSQL COPY allows importing CSV data easily. Then you get a table with your CSV data and the power for SQL (and a reasonable database) to do basically anything you want with the data.
I am pretty sure MySQL have similar capabilities of importing CSV, I just generally prefer PostgreSQL.
I would not use Java for CSV processing. This will be too much code and unless you take care of indices, the processing will not be performant. An SQL database is much better equiped for tabular data processing (should not be a surprize).
I wouldn't use MongoDB, my impression is that it is less powerful in update operations compared to an SQL database. But this is just an opinion, take it with a grain of salt.

You should try Python with the pandas package. On a machine with enough memory (say 16GB) it should be able to handle your CSV files with ease. The main thing is - anyone with some experience with pandas will be able to develop a quick script for you and tell you in a few minutes if your job is doable or not. To get you started:
import pandas
df = pandas.read_csv('filename.csv')
You might need to specify the column type if you get into memory issues.
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html

I'd suggest to use Spark. Even in a standalone machine the performance is incredible. You can use Scala and Python to handle your data. It's flexible and you can do processing that is impossible in Java or relational database.
The other choices are great also, but I'd consider Spark to all analytics needs from now on.

Execute Java upon change in a mysql database [duplicate]

I am trying to create some MySQL code that will invoke a Java program from a trigger.
Here is what I have so far:
CREATE TRIGGER trig_name after insert ON studentinfo
FOR EACH ROW
BEGIN
END
The trigger content would then call the Java program. Is this possible?

Though not a standard feature this is very well possible with MySQL. You can use the SELECT .. INTO OUTFILE statement from inside the trigger to write to a named pipe (Windows) or memroy filesystem (Linux). Both of those can easily be monitored from Java code (or any other code for that matter). Using this technique you will avoid polling and because no actual disk access takes place either you will have good performance.
I have written a Java package for this actually so I'm 100% sure it is possible and performs well. Unfortunately I am not allowed to share my efforts here (my previous answer was deleted by a moderator) so you will have to code it yourself, sorry.

A direct answer: no you can't call a java method from a mysql trigger. If you had an oracle database you could, but not mysql.
To do what you want to do with mysql you can
make the code that updates the database also notify the swing application. Or you can
make the trigger accumulate data on pending operations in a separate table that you read periodically from the swing app.

Calling a java method from an SQL database isn't a standard feature. The Informix DB can call a shell script from a stored procedure, but I don't know of a feature like this in MySQL (I'm not an expert on mysql).
The closest thing that works with all databases would be to have a thread and periodically poll the database for new records.
SELECT * FROM studentinfo WHERE id > last_seen_id
Or you could use a timestamp:
SELECT * FROM studentinfo WHERE create_date >= last_seen_create_date
In this case you would have to filter duplicated rows which have already loaded from the previous run.

using sqlldr from java

I have a Java utility for database imports. I'd like to be able to use sqlldr for performance on oracle. I could create the control and data files, but that doesn't seem like The Right Thing™ to do. I should be able to stream the data by providing INFILE "-" in the control file (q1 - how? from command line, I can pipe "echo <data...>" to the sqlldr, but there must be a way to just stream the string into the input stream for the process? never used Java for this before). I can't see how to stream the control file itself (q2 - or am I missing something obvious?). I could use named pipes, but I have no idea how to instantiate and use them from Java in windows (q3 - would that work and how?).
<moan>why must oracle be so complicated? it was trivial in mysql...<moan>

"why must oracle be so complicated? it
was trivial in mysql"
What you must remember is, Oracle is a venerable product. SQL Loader as a utility must be twenty years old, maybe more. So naturally it is harder to work with than some newer tools.
And that is why you should stop trying to fit SQL Loader into your new-fangled Java app :-) Look at external tables instead. Because these are database objects we can use SQL SELECTs against them, so it's a whole easier to automate load processes with them. I wrote a bit more about external tables in my answer to another question. Check it out.

Fundamentally SQLLDR is about getting data from one or more files into a database table. It is powerful in that role, especially when dealing with multiple files or parallel loads from a single file (it can have multiple threads/processes reading from the same file at the same time).
Not all of these fit well with reading from something that isn't a real file. If your data stream is coming from a web service, then I'd pull it using UTL_HTTP. If it is coming from FTP, then I'd FTP straight into the database as a CLOB/BLOB and process it from there.
Depending on your version, also look at the preprocessor capabilities of external tables

Alternative of Storing data except databases like mysql,sql etc

I had completed my project Address Book in Java core, in which my data is stored in database (MySql).
I am facing a problem that when i run my program on other computer than tere is the requirement of creating the hole data base again.
So please tell me any alternative for storing my data without using any database software like mysql, sql etc.

You can use an in-memory database such as HSQLDB, Derby (a.k.a JavaDB), H2, ..
All of those can run without any additional software installation and can be made to act like just another library.

I would suggest using an embeddable, lightweight database such as SQLite. Check it out.
From the features page (under the section Suggested Uses For SQLite):
Application File Format. Rather than
using fopen() to write XML or some
proprietary format into disk files
used by your application, use an
SQLite database instead. You'll avoid
having to write and troubleshoot a
parser, your data will be more easily
accessible and cross-platform, and
your updates will be transactional.

The whole point of StackOverflow was so that you would not have to email around questions/answers :)
You could store data in a filesystem, memory (use serialisation etc) which are simple alternatives to DB. You can even use HSQLDB which can be run completely in memory

If you data is not so big, you may use simple txt file and store everything in it. Then load it in memory. But this will lead to changing the way you modify/query data.

Database software like mysql, sql etc provides an abstraction in terms of implementation effort. If you wish to avoid using the same, you can think of having your own database like XML or flat files. XML is still a better choice as XML parsers or handlers are available. Putting your data in your customised database/flat files will not be manageable in the long run.
Why don't you explore sqlite? It is file based, means you don't need to install it separately and still you have the standard SQL to retrieve or interact with the data? I think, sqlite will be a better choice.

Just use a prevayler (.org). Faster and simpler than using a database.

I assume from your question that you want some form of persistent storage to the local file system of the machine your application runs on. In addition to that, you need to decide on how the data in your application is to be used, and the volume of it. Do you need a database? Are you going to be searching the data different fields? Do you need a query language? Is the data small enough to fit in to a simple data structure in memory? How resilient does it need to be? The answers to these types of questions will help lead to the correct choice of storage. It could be that all you need is a simple CSV file, XML or similar. There are a host of lightweight databases such as SQLite, Berkelely DB, JavaDB etc - but whether or not you need the power of a database is up to your requirements.

A store that I'm using a lot these days is Neo4j. It's a graph database and is not only easy to use but also is completely in Java and is embedded. I much prefer it to a SQL alternative.

In addition of the others answers about embedded databases I was working on a objects database that directly serialize java objects without the need for ORM. Its name is Sofof and I use it in my projects. It has many features which are described in its website page.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.