I have a Java utility for database imports. I'd like to be able to use sqlldr for performance on Oracle. I could create the control and data files, but that doesn't seem like The Right Thing™ to do. I should be able to stream the data by providing INFILE "-" in the control file (q1: how? From the command line I can pipe "echo <data...>" into sqlldr, but there must be a way to just stream the string into the process's input stream - I've never used Java for this before). I can't see how to stream the control file itself (q2: or am I missing something obvious?). I could use named pipes, but I have no idea how to instantiate and use them from Java on Windows (q3: would that work, and how?).
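For q1, this is roughly what I'm picturing - a minimal sketch, assuming sqlldr is on the PATH, the control file contains INFILE "-", and the credentials and file names are placeholders:

    import java.io.OutputStream;
    import java.nio.charset.StandardCharsets;

    public class SqlldrStream {
        public static void main(String[] args) throws Exception {
            ProcessBuilder pb = new ProcessBuilder(
                    "sqlldr", "scott/tiger@orcl", "control=load.ctl");
            pb.redirectErrorStream(true);                      // merge stderr into stdout
            pb.redirectOutput(ProcessBuilder.Redirect.INHERIT); // show sqlldr's output
            Process p = pb.start();
            // Whatever we write to the process's stdin is what sqlldr sees as INFILE "-".
            try (OutputStream stdin = p.getOutputStream()) {
                stdin.write("1,foo\n2,bar\n".getBytes(StandardCharsets.UTF_8));
            } // closing the stream signals end-of-data
            System.out.println("sqlldr exited with " + p.waitFor());
        }
    }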
<moan>why must oracle be so complicated? it was trivial in mysql...</moan>
"why must oracle be so complicated? it
was trivial in mysql"
What you must remember is, Oracle is a venerable product. SQL Loader as a utility must be twenty years old, maybe more. So naturally it is harder to work with than some newer tools.
And that is why you should stop trying to fit SQL Loader into your new-fangled Java app :-) Look at external tables instead. Because these are database objects, we can use SQL SELECTs against them, so it's a whole lot easier to automate load processes with them. I wrote a bit more about external tables in my answer to another question. Check it out.
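To give a flavour of the approach, here is a hedged sketch of creating and querying an external table from Java via JDBC (the connection details, the DATA_DIR directory object and the column layout are all made up; DATA_DIR must be an Oracle DIRECTORY object pointing at the folder where input.csv lives):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ExternalTableLoad {
        public static void main(String[] args) throws Exception {
            try (Connection con = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/orcl", "scott", "tiger");
                 Statement st = con.createStatement()) {
                // The external table definition points at the flat file.
                st.execute("CREATE TABLE ext_data (id NUMBER, name VARCHAR2(100))"
                         + " ORGANIZATION EXTERNAL ("
                         + "   TYPE ORACLE_LOADER DEFAULT DIRECTORY data_dir"
                         + "   ACCESS PARAMETERS (RECORDS DELIMITED BY NEWLINE"
                         + "     FIELDS TERMINATED BY ',')"
                         + "   LOCATION ('input.csv'))");
                // Being a database object, the external table is just SELECTable.
                try (ResultSet rs = st.executeQuery("SELECT id, name FROM ext_data")) {
                    while (rs.next()) {
                        System.out.println(rs.getLong(1) + " -> " + rs.getString(2));
                    }
                }
            }
        }
    }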
Fundamentally SQLLDR is about getting data from one or more files into a database table. It is powerful in that role, especially when dealing with multiple files or parallel loads from a single file (it can have multiple threads/processes reading from the same file at the same time).
Not all of those capabilities fit well with reading from something that isn't a real file. If your data stream is coming from a web service, then I'd pull it using UTL_HTTP. If it is coming from FTP, then I'd FTP straight into the database as a CLOB/BLOB and process it from there.
Depending on your version, also look at the preprocessor capabilities of external tables.
I need to handle a big CSV file with around 750,000+ rows of data. Each line has around 1,000+ characters and ~50 columns, and I am really not sure what's the best (or at least a good and sufficient) way to handle and manipulate this kind of data.
I need to do the following steps:
Compare the values of two columns and write the result to a new column (this one seems easy)
Compare values of two lines and do stuff (e.g. delete a line if one value is duplicated).
Compare values of two different files.
My problem is that this is currently done with PHP and/or Excel; the limits are nearly exceeded, it takes a long time to process, and it will no longer be possible once the files get even bigger.
I have 3 different possibilities in mind:
Use MySQL: create a table (or two) and do the comparing, adding and deleting there. (I am not really familiar with SQL and would have to learn it; it should also run automatically, and there is the problem that you can't create tables directly from CSV files.)
Use Java, creating objects in an ArrayList or LinkedList, and do "the stuff" (the operations would be easy, but handling that much data will probably be the problem).
(Is it even possible to hold that much data in Java without it crashing, or is there a good tool for it, etc.?)
Use Clojure along with MongoDB: add the data from the CSV to MongoDB and read it back using Mongo.
(Name additional possibilities if you have another idea.)
All in all I am not a pro in any of these, but I would like to solve this problem and get some hints, or even your opinion.
Thanks in advance
Since in our company we work a lot with huge csv files here are some ideas:
because these files are in our case always exported from some other relational database, we always use PostgreSQL, MySQL or golang + SQLite, so that we can use plain SQL queries, which are in these cases the simplest and most reliable solution
the number of rows you describe is quite low from the point of view of all these databases, so do not worry
all of them have a native solution for CSV import/export, which works much quicker than anything created manually
for repeated standard checks I use golang + SQLite with a :memory: database - this is definitely the quickest solution
MySQL is definitely very good and quick for the checks you described, but the choice of database also depends on how sophisticated an analysis you will need to do later - for example, MySQL up to 5.7 still does not have window functions, which you might need - so consider using PostgreSQL in some cases too...
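If golang isn't an option, the same in-memory trick works from Java; here is a hedged sketch using H2 (assumes the h2 jar on the classpath; the file and column names are made up):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class CsvChecks {
        public static void main(String[] args) throws Exception {
            // Purely in-memory database; nothing touches the disk except the CSV.
            try (Connection con = DriverManager.getConnection("jdbc:h2:mem:checks");
                 Statement st = con.createStatement()) {
                // H2's CSVREAD pulls the file straight into a table.
                st.execute("CREATE TABLE csv_data AS SELECT * FROM CSVREAD('big.csv')");
                // Example check: values that occur on more than one line.
                try (ResultSet rs = st.executeQuery(
                        "SELECT some_column, COUNT(*) FROM csv_data"
                      + " GROUP BY some_column HAVING COUNT(*) > 1")) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1) + " x" + rs.getInt(2));
                    }
                }
            }
        }
    }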
I normally use PostgreSQL for this kind of task. PostgreSQL COPY allows importing CSV data easily. Then you get a table with your CSV data and the power of SQL (and a reasonable database) to do basically anything you want with the data.
I am pretty sure MySQL has similar capabilities for importing CSV; I just generally prefer PostgreSQL.
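For what it's worth, COPY is also easy to drive from Java via the PostgreSQL JDBC driver's CopyManager (the connection details, table and file name below are made up; the target table must already exist):

    import java.io.FileReader;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import org.postgresql.PGConnection;
    import org.postgresql.copy.CopyManager;

    public class CopyDemo {
        public static void main(String[] args) throws Exception {
            try (Connection con = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/mydb", "user", "secret");
                 FileReader csv = new FileReader("big.csv")) {
                // CopyManager streams the file through the COPY protocol.
                CopyManager copy = con.unwrap(PGConnection.class).getCopyAPI();
                long rows = copy.copyIn(
                        "COPY my_table FROM STDIN WITH (FORMAT csv, HEADER)", csv);
                System.out.println("Imported " + rows + " rows");
            }
        }
    }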
I would not use Java for CSV processing. It would be too much code, and unless you take care of indices, the processing will not be performant. An SQL database is much better equipped for tabular data processing (which should not be a surprise).
I wouldn't use MongoDB; my impression is that it is less powerful for update operations than an SQL database. But this is just an opinion, take it with a grain of salt.
You should try Python with the pandas package. On a machine with enough memory (say 16GB) it should be able to handle your CSV files with ease. The main thing is - anyone with some experience with pandas will be able to develop a quick script for you and tell you in a few minutes if your job is doable or not. To get you started:
import pandas

# dtype= and usecols= can reduce memory use on wide files
df = pandas.read_csv('filename.csv')
You might need to specify the column type if you get into memory issues.
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
I'd suggest using Spark. Even on a standalone machine the performance is impressive. You can use Scala or Python to handle your data. It's flexible, and you can do processing that would be hard to express in plain Java or in a relational database.
The other choices are great too, but I'd consider Spark for all analytics needs from now on.
I need to implement a stored procedure in an Oracle database that will do the following:
Read an external file that needs to be processed (extract data from the file and validate it)
Call another stored procedure in the database, in charge of validating/inserting the data.
Manage exceptions.
Write the results of the executed operations to another file.
I know I can do all these things with PL/SQL or Java (as a stored procedure), but which will be more efficient/faster, or simply better? Most of the operations are reading/writing a file, and the database operations are already done in a stored procedure.
I have read other posts about PL/SQL vs Java (like this and this) but none talks about this.
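To make this concrete, here is a rough sketch of the Java stored procedure variant I have in mind (validate_insert and the file layout are made up; jdbc:default:connection: is the server-side JDBC URL for the session the procedure runs in):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.PrintWriter;
    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;

    public class FileLoader {
        // Intended to be loaded into Oracle and published as a Java stored procedure.
        public static void load(String inFile, String outFile) throws Exception {
            Connection con = DriverManager.getConnection("jdbc:default:connection:");
            try (BufferedReader in = new BufferedReader(new FileReader(inFile));
                 PrintWriter out = new PrintWriter(outFile);
                 CallableStatement cs = con.prepareCall("{call validate_insert(?)}")) {
                String line;
                while ((line = in.readLine()) != null) {
                    try {
                        cs.setString(1, line);
                        cs.execute();                       // step 2: validate/insert
                        out.println("OK: " + line);
                    } catch (Exception e) {
                        // step 3: manage exceptions per record
                        out.println("FAILED: " + line + " (" + e.getMessage() + ")");
                    }
                }
            } // step 4: outFile now holds the results of the executed operations
        }
    }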
I would never want to use a SQL dialect, no matter how versatile it may be, to do anything outside the DB. I would do what you're trying to do preferably in a shell or perl script, to use the lowest common denominator, although Java is OK, just perhaps a bit too sophisticated for a job this simple. But if Java is all you have or know how to use, then go for it.
I was planning to use XML to store the data for a Java DVD database application I'm writing. I know that the word "database" is right there in the title, but XML just seemed so much more portable, was human readable and (I assumed before looking into it) simpler to implement.
Parsing XML seems to be the easiest thing in the world... even creating a new XML file isn't much trouble, but changing, inserting or deleting records I can only see doing by writing out a fresh XML file.
Am I missing something? Or is the thing that I'm missing that I should switch over to a database format (unless there's some wonderful database format I've not heard of that's totally portable and that users won't need to install anything separate to use :) )?
The most popular way to use a file as a database is probably SQLite (http://www.sqlite.org/), and that's what I would use to solve your problem (it's pretty much a standard SQL database, but uses just one file as storage). Another, pure-Java option is Apache Derby (http://db.apache.org/derby/).
However, pure XML databases do exist (and were quite fashionable about 10 years ago - the "NoSQL" of their time). The associated standards are XPath (http://en.wikipedia.org/wiki/XPath) and XQuery (http://en.wikipedia.org/wiki/Xquery). I haven't used it, but it seems like BaseX (http://basex.org/open-source/) is an open-source implementation that you could use (and it does claim to provide ACID guarantees - http://basex.org/products/).
If you're more familiar with XML than SQL, I don't see any great harm in using an XML database for a small project. Just structure your code so that most of the program doesn't care what the storage is, i.e. by providing a neutral interface (sketched below). Then if XML doesn't work out you can switch to SQL by re-implementing just that interface and leaving the rest of your program alone (and if it does work, post back here saying so - it would be interesting to know).
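For example, a neutral interface along these lines (all names hypothetical) keeps the rest of the program storage-agnostic:

    import java.util.List;
    import java.util.Optional;

    // The rest of the app talks only to this interface; an XmlDvdStore and a
    // SqliteDvdStore would then be interchangeable implementations.
    public interface DvdStore {
        void save(Dvd dvd);
        Optional<Dvd> findByTitle(String title);
        List<Dvd> findAll();
        void delete(String title);
    }

    // Minimal value type for the example.
    final class Dvd {
        final String title;
        final int year;
        Dvd(String title, int year) { this.title = title; this.year = year; }
    }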
If you're going to have a web-based front end, it seems that a regular database is the way to go as the back end. I don't believe your users would have a need to download anything new, since that's all taken care of server-side. A real database also has the ACID advantage over a pseudobase; it should be atomic, consistent, isolated, and durable, and I can't imagine XML would be a good substitute in those respects.
I have completed my Address Book project in core Java, in which my data is stored in a database (MySQL).
I am facing a problem: when I run my program on another computer, the whole database has to be created again.
So please tell me about any alternative for storing my data without using database software like MySQL, SQL Server, etc.
You can use an embedded database such as HSQLDB, Derby (a.k.a. JavaDB), H2, ...
All of those can run without any additional software installation and can be made to act like just another library.
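For instance, a file-backed HSQLDB needs nothing but the hsqldb jar on the classpath (the table, columns and path below are made up):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class EmbeddedDbDemo {
        public static void main(String[] args) throws Exception {
            // File-based HSQLDB: creates ./data/addressbook.* on first run;
            // shutdown=true flushes everything to disk when the connection closes.
            try (Connection con = DriverManager.getConnection(
                    "jdbc:hsqldb:file:data/addressbook;shutdown=true", "SA", "");
                 Statement st = con.createStatement()) {
                st.execute("CREATE TABLE IF NOT EXISTS contacts"
                         + " (name VARCHAR(100), phone VARCHAR(30))");
                st.execute("INSERT INTO contacts VALUES ('Alice', '555-0100')");
            }
        }
    }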
I would suggest using an embeddable, lightweight database such as SQLite. Check it out.
From the features page (under the section Suggested Uses For SQLite):
Application File Format. Rather than using fopen() to write XML or some proprietary format into disk files used by your application, use an SQLite database instead. You'll avoid having to write and troubleshoot a parser, your data will be more easily accessible and cross-platform, and your updates will be transactional.
The whole point of StackOverflow was so that you would not have to email around questions/answers :)
You could store data in the filesystem or in memory (using serialisation etc.), which are simple alternatives to a DB. You can even use HSQLDB, which can run completely in memory.
If your data is not so big, you may use a simple txt file and store everything in it, then load it into memory. But this will mean changing the way you modify/query the data.
Database software like MySQL etc. provides an abstraction in terms of implementation effort. If you wish to avoid using it, you can think of having your own database in the form of XML or flat files. XML is still the better choice, as XML parsers and handlers are readily available. Putting your data in customised flat files will not be manageable in the long run.
Why don't you explore SQLite? It is file based, meaning you don't need to install it separately, and you still have standard SQL to retrieve or interact with the data. I think SQLite would be a better choice.
Just use Prevayler (prevayler.org). Faster and simpler than using a database.
I assume from your question that you want some form of persistent storage on the local file system of the machine your application runs on. In addition to that, you need to decide how the data in your application is to be used, and the volume of it. Do you need a database? Are you going to be searching the data by different fields? Do you need a query language? Is the data small enough to fit into a simple data structure in memory? How resilient does it need to be? The answers to these types of questions will help lead to the correct choice of storage. It could be that all you need is a simple CSV file, XML or similar. There is a host of lightweight databases such as SQLite, Berkeley DB, JavaDB etc. - but whether or not you need the power of a database is down to your requirements.
A store that I'm using a lot these days is Neo4j. It's a graph database and is not only easy to use but also is completely in Java and is embedded. I much prefer it to a SQL alternative.
In addition to the other answers about embedded databases: I have been working on an object database that directly serializes Java objects without the need for an ORM. Its name is Sofof and I use it in my projects. It has many features, which are described on its website.
I've got a Java web project handling several objects (each containing n objects of type A (e.g. time and value) and m objects of type B (e.g. time and a String array)). The web project itself contains several servlets/JSPs for visualization as well as some logic for data manipulation, and it currently runs on Apache Tomcat.
Is it possible to store the whole data set in the server's (or, most of the time, local) memory while the server is running? If Tomcat is shut down, the data could be stored in a simple file; no restrictions there. On server startup, I just want to read in the files and write the objects to memory. How can I get Tomcat to do so?
The reason why I do not want to use an extra database is that I want to deliver a zip file containing Tomcat with the deployed *.war file (as I don't want my prof getting stuck with Tomcat server setup etc.).
Thanks, ChrisH
You could implement ServletContextListener and put the load-from-file and save-to-file logic in the contextInitialized() and contextDestroyed() methods, which are invoked during the webapp's startup and shutdown respectively.
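A minimal sketch of such a listener, using plain serialization (the file path and attribute name are made up; assumes a Servlet 3.0 container so @WebListener works):

    import java.io.*;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import javax.servlet.ServletContextEvent;
    import javax.servlet.ServletContextListener;
    import javax.servlet.annotation.WebListener;

    @WebListener
    public class DataLifecycleListener implements ServletContextListener {

        private static final File FILE = new File("data.ser"); // path is made up

        @Override
        @SuppressWarnings("unchecked")
        public void contextInitialized(ServletContextEvent sce) {
            Map<String, Object> data = new ConcurrentHashMap<>();
            if (FILE.exists()) {
                try (ObjectInputStream in = new ObjectInputStream(
                        new BufferedInputStream(new FileInputStream(FILE)))) {
                    data.putAll((Map<String, Object>) in.readObject());
                } catch (IOException | ClassNotFoundException e) {
                    throw new IllegalStateException("Could not load " + FILE, e);
                }
            }
            // Servlets/JSPs reach the data through the ServletContext.
            sce.getServletContext().setAttribute("appData", data);
        }

        @Override
        public void contextDestroyed(ServletContextEvent sce) {
            Object data = sce.getServletContext().getAttribute("appData");
            try (ObjectOutputStream out = new ObjectOutputStream(
                    new BufferedOutputStream(new FileOutputStream(FILE)))) {
                out.writeObject(data);
            } catch (IOException e) {
                e.printStackTrace(); // shutdown path: log and carry on
            }
        }
    }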
You can read and write objects to disk, but they all need to implement java.io.Serializable first. Here is a Serialization tutorial with code examples.
That said, have you considered an embedded database so that you don't need to install a database server? You could use the JDK 6 built-in JavaDB for this, or its competitor HSQLDB. Alternatively, if it is pure key-value pairs, then you could also just use the java.util.Properties API for this (tutorial here). Just place the properties file somewhere in the classpath and use ClassLoader#getResourceAsStream() to get an InputStream of it, or place it somewhere in WEB-INF and use ServletContext#getResourceAsStream().
I think that HSQLDB is exactly what you need: a small database that can run embedded in the same JVM as Apache Tomcat. It stores data in memory while also allowing contents to be written to and read back from a file.
If the app shuts down unexpectedly, you'll lose all your data, because it won't have time to write it to disk.
You could use a database like SQLite/derby/hsql etc. which store their data to the filesystem.
If you don't want to mess with a DB, then you could store everything in memory and flush it to disk every time it's modified. A couple tips here:
Serialization can make this really easy. Make all your objects implement Serializable, and give them a serialVersionUID.
Use a BufferedOutputStream when writing to disk; this is faster than a straight FileOutputStream.
DO NOT overwrite your old data file directly! Write to a new file, and when done writing, move the completed file on top of your old file (see the sketch after this list). That way, if the server shuts down while you're in the middle of writing your data file, you still have the good file which was written before.
You should acquire a read lock on your data while writing it to disk; any other code which modifies the data should acquire a write lock.
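A minimal sketch of the write-then-move tip from the list above (the names are made up; ATOMIC_MOVE support depends on the filesystem):

    import java.io.BufferedOutputStream;
    import java.io.IOException;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;

    public class SafeSave {
        public static void save(Serializable data, Path target) throws IOException {
            // Write the snapshot to a temporary sibling file first...
            Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
            try (ObjectOutputStream out = new ObjectOutputStream(
                    new BufferedOutputStream(Files.newOutputStream(tmp)))) {
                out.writeObject(data);
            }
            // ...and only then move it over the old file, so a crash mid-write
            // never destroys the previous good copy.
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING,
                    StandardCopyOption.ATOMIC_MOVE);
        }
    }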
If you don't care about the possibility that your application may scribble all over your data files, that your Tomcat/JVM may crash, or that your machine may die losing all in-memory objects, then managing persistence as you suggest is an option. But you'll have quite a bit of infrastructure to build, test and maintain. And you'll miss out on the "value add" tools that most RDBMSs provide: backup, a query tool, optimizers, replication, etc.
But if catastrophic data loss is not an option for you, you should use an RDBMS, ODBMS, or whatever to do your persistence.