Create a Java ResultSet using a CSV file instead of a SQL statement

I want to use a result set to hold my user data. Just one wrinkle - the data is in a CSV (comma delimited) format, not in a database.
The CSV file contains a header row at the top, then data rows. I would like to dynamically create a result set but not using an SQL query. The input would be the CSV file.
I am expecting (from my Java experience) that there would be methods like rs.newRow, rs.newColumn, rs.updateRow and so on, so I could use the structure of a result set without having any database query first.
Are there such methods on result sets? I read the docs and did not find any. It would be very useful to build a result set from the CSV and then use it as if it had come from an SQL query. I find it difficult to use a Java bean, an ArrayList of beans, and the like, because I want to read in CSV files with different structures: the equivalent of SELECT * in SQL.
I don't know the names or number of columns in advance, so other SO questions like "Manually add data to a Java ResultSet" do not answer my needs. There, the asker already has a result set created by an SQL query and wants to add rows to it.
In my case, I want to create a result set from scratch and add columns as the first row is read in. Now that I think about it, I could create a query from the first row, using a statement like SELECT COL1, COL2, COL3 ... FROM DUMMY, and then use the INSERT ROW statement to read and insert the rest. (Later:) On trying that, it fails because there is no real table associated with the result set.

The CsvJdbc open source project should do just that:
// Load the driver.
Class.forName("org.relique.jdbc.csv.CsvDriver");
// Create a connection using the directory containing the file(s)
Connection conn = DriverManager.getConnection("jdbc:relique:csv:/path/to/files");
// Create a Statement object to execute the query with.
Statement stmt = conn.createStatement();
// Query the table. The name of the table is the name of the file without ".csv"
ResultSet results = stmt.executeQuery("SELECT col1, col2 FROM myfile");
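Since the column names are not known in advance here, it is worth noting that the driver also accepts SELECT *; reading the columns through ResultSetMetaData then covers the dynamic case. A short continuation of the snippet above:
// SELECT * works even when the file's columns are unknown in advance.
ResultSet rs = stmt.executeQuery("SELECT * FROM myfile");
ResultSetMetaData meta = rs.getMetaData();
while (rs.next()) {
    for (int i = 1; i <= meta.getColumnCount(); i++) {
        System.out.print(meta.getColumnName(i) + "=" + rs.getString(i) + "  ");
    }
    System.out.println();
}
rs.close();
stmt.close();
conn.close();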

Just another idea to complement @Mureinik's answer, which I find really good.
Alternatively, you could use any CSV reader library (there are many out there) and load the file into an in-memory table using an in-memory database such as H2, HyperSQL, or Derby. These give you a complete SQL engine, so you can run complex queries.
It requires more work, but you get a lot of flexibility to use the data afterwards.
Once you have tried the in-memory solution, switching to a persistent database is really easy (just change the URL). That way you could load the CSV file into the database only once; from the second execution on, the database would be ready to use, with no need to load the CSV again.
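A minimal sketch of that approach with H2, assuming the H2 driver is on the classpath. H2 ships a CSVREAD table function that picks up the column names from the CSV header row; the file path and table name below are placeholders.
import java.sql.*;

public class CsvToH2 {
    public static void main(String[] args) throws Exception {
        // Purely in-memory database; change the URL to e.g. "jdbc:h2:./csvdb"
        // to make it persistent, as suggested above.
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:csvdb");
             Statement stmt = conn.createStatement()) {
            // CSVREAD reads the header row for column names, so the file's
            // structure does not need to be known in advance.
            stmt.execute("CREATE TABLE mydata AS SELECT * FROM CSVREAD('/path/to/files/myfile.csv')");
            // From here on it is an ordinary table and the full SQL engine applies.
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM mydata")) {
                ResultSetMetaData meta = rs.getMetaData();
                while (rs.next()) {
                    for (int i = 1; i <= meta.getColumnCount(); i++) {
                        System.out.print(rs.getString(i) + "\t");
                    }
                    System.out.println();
                }
            }
        }
    }
}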

Related

Is it possible to save Dynamic Tables in SQL server?

I'm working on a school project where I need to make a dynamic table in JavaFX, which should be saved in my SQL database.
The user should be able to specify the number of rows and columns needed.
Is it possible to save a dynamic table in SQL?
I've thought about exporting the dynamic table to a CSV file and then saving the contents of the CSV file as a string. Then, when you load the table again, a method would be needed to convert the string back into a table. Seems like a stupid and inefficient way to do it, though.
I've read some places that I should use XML or JSON in some way, but I never understood how.
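For what it's worth, the round trip described above is only a few lines. A hypothetical sketch (no CSV escaping handled, so cell values must not contain commas or newlines):
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TableCsv {
    // Flatten a dynamic table (a list of rows, each a list of cell values)
    // into one CSV string that can be stored in a single database column.
    static String toCsvString(List<List<String>> table) {
        StringBuilder sb = new StringBuilder();
        for (List<String> row : table) {
            sb.append(String.join(",", row)).append('\n');
        }
        return sb.toString();
    }

    // Parse the stored string back into rows and cells when loading the table.
    static List<List<String>> fromCsvString(String csv) {
        List<List<String>> table = new ArrayList<>();
        for (String line : csv.split("\n")) {
            table.add(new ArrayList<>(Arrays.asList(line.split(","))));
        }
        return table;
    }
}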

Netezza Streaming ResultSet

I am having memory issues because I am trying to read a huge ResultSet from a Netezza database. Does Netezza support any kind of "streaming" ResultSet like MySQL does? If not, will limiting the fetch size like this work instead?:
stmt.setFetchSize(50);
conn.setAutoCommit(false);
If you want to pull the rows in order to store them in a file, then your best bet is to use a remote external table.
Here is an example that creates a transient remote external table over JDBC. This invokes the bulk export/load functionality provided with the JDBC driver and creates a pipe-delimited text file.
CREATE EXTERNAL TABLE 'c:\mytest.txt'
USING (DELIMITER '|' REMOTESOURCE 'JDBC') AS
SELECT *
FROM table1;
You can run this using conn.createStatement().execute, and in the Java string you will likely have to write the file specification as c:\\mytest.txt to escape the backslash.
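A minimal sketch of issuing that statement from Java; the host, port, database, and credentials are placeholders, and note the doubled backslashes in the string literal:
// Export table1 to a pipe-delimited file on the client machine
// via a transient external table.
try (Connection conn = DriverManager.getConnection(
        "jdbc:netezza://nzhost:5480/mydb", "user", "password");
     Statement stmt = conn.createStatement()) {
    stmt.execute("CREATE EXTERNAL TABLE 'c:\\mytest.txt' "
            + "USING (DELIMITER '|' REMOTESOURCE 'JDBC') AS "
            + "SELECT * FROM table1");
}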
You can read more about external tables in the Netezza documentation.
You can use setFetchSize, by the way. I'm not sure it would solve your memory issue, though.
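For reference, the usual fetch-size pattern looks like the snippet below; whether the driver then streams in chunks or ignores the hint is driver-specific:
conn.setAutoCommit(false);          // some drivers only honor the fetch size inside a transaction
try (Statement stmt = conn.createStatement()) {
    stmt.setFetchSize(1000);        // ask for rows in chunks rather than all at once
    try (ResultSet rs = stmt.executeQuery("SELECT * FROM table1")) {
        while (rs.next()) {
            // process one row at a time; only one chunk is held in memory
        }
    }
}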
The following page has information about ROWSET_LIMIT; with this setting you can limit the number of rows a query returns and create streams to suit your needs:
https://www-01.ibm.com/support/knowledgecenter/SSULQD_7.1.0/com.ibm.nz.adm.doc/c_sysadm_user_rowset_limits.html?lang=en
"You can place a limit on the number of rows a query can return and thus restrict resources for large result sets. Specifying a rowset limit when you create a user or a group automatically limits the rows that are returned so that users do not have to append a limit clause to their SQL queries."

Updating row while iterating through ResultSet takes a lot of time

I am trying to improve a data transfer program that I wrote. I am looking for suggestions on how to make it quicker.
My program extracts data from a database (usually Oracle 11g) by filling a ResultSet and writing the result into a file. The program periodically checks the tables to see whether a special column has changed. For example, the query could look like this:
select columnA, columnB from scheme.table where changeColumn = '1'
Now comes the critical part. After extracting the data, I need to update this changeColumn to '0'. Since I have just used the ResultSet to export the data into a file, I have to rewind it, so the code looks like this:
extractedData.beforeFirst();
while (extractedData.next()) {
    extractedData.updateString("changeColumn", "0");
    extractedData.updateRow();
}
Now if this ResultSet is bigger (say, more than 100,000 entries), this loop can take hours. Does anyone have suggestions on how to improve its performance?
I have heard of setting the fetch size to a bigger value, but usually the ResultSet contains fewer than a dozen entries. Is there a way to set the fetch size dynamically?
Use a JDBC batch update: for each row that needs updating, take its primary key, add it to a batch, and then execute the whole batch.
A good example from Mkyong shows how to do a JDBC batch update with a JDBC PreparedStatement.
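A minimal sketch of that approach, assuming the table has an id primary key column (the table and column names follow the question):
// Collect the primary keys during the export pass, then reset the flag in batches.
String sql = "UPDATE scheme.table SET changeColumn = '0' WHERE id = ?";
conn.setAutoCommit(false);
try (PreparedStatement ps = conn.prepareStatement(sql)) {
    int count = 0;
    for (long id : idsToReset) {    // primary keys gathered while writing the file
        ps.setLong(1, id);
        ps.addBatch();
        if (++count % 1000 == 0) {
            ps.executeBatch();      // send every 1000 updates in one round trip
        }
    }
    ps.executeBatch();              // send the remainder
    conn.commit();
}
If every matching row gets the same new value, a single UPDATE scheme.table SET changeColumn = '0' WHERE changeColumn = '1' is faster still, though it would also reset rows that changed after the export started.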

Speeding Up Query Calls to MySQL using Java

I am parsing an XML file consisting of ~600K lines. Parsing and inserting the data from the XML into the database is not a problem, as I am using SAX to parse and LOAD DATA INFILE (from a .txt file) to insert into the database. The .txt file is populated in Java using JDBC. All of this takes a good 5 seconds.
My bottleneck is now executing multiple SELECT queries. Basically, each time I hit a certain XML tag, I call a SELECT query to grab data from another DB table. Adding these SELECT queries brings my populating time to 2 minutes.
For example:
I am parsing through an XML file consisting of books, articles, theses, etc.
Each book/article has child elements such as isbn, title, author, editor, publisher.
At each author/editor/publisher, I need to query a table in a database.
Let's say I encountered the author tag with value Tolkien.
I need to query a table that already exist in the database called author_table
The query is [select author_id from author_table where name = 'Tolkien']
This is where the bottleneck happens.
Now my question is: Is there a way to speed this up?
BTW, the reason why I think 2 minutes is long is that this is a homework assignment and I am not yet finished with populating the database. I estimate that the whole DB population would take 5 minutes, which is why I am seeking advice on performance optimization.
There are a few things you can consider:
Use connection pooling so you don't create and close a new connection every time you execute a query; doing so is expensive.
Cache whatever data you obtain via SELECT queries. Is it possible to prefetch all the data beforehand so you don't have to query it on the spot? (See the sketch after this list.)
If your SELECT is slow, ensure the query is optimized and you have an appropriate index in place to avoid scanning the whole table.
Ensure you use buffered I/O in Java.
Can you subdivide the work into multiple threads? If so, create multiple worker threads to run instances of your job in parallel.
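For the caching point, here is a minimal sketch that prefetches the whole author_table into a map once, so each author/editor/publisher tag becomes an in-memory lookup instead of a round trip (the column names follow the question):
// Load name -> author_id once, before the XML parsing starts.
Map<String, Integer> authorIds = new HashMap<>();
try (Statement stmt = conn.createStatement();
     ResultSet rs = stmt.executeQuery("SELECT name, author_id FROM author_table")) {
    while (rs.next()) {
        authorIds.put(rs.getString("name"), rs.getInt("author_id"));
    }
}
// Later, inside the SAX handler:
Integer authorId = authorIds.get("Tolkien");    // no database round trip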

what is the best way to put resultset data into a text file?

I want to put all the data in the ResultSet into a text file, in the same order. Is there any method to get the data from all the rows at once and write it to a file, or do I have to write it row by row?
Why go through Java to do this? Many DBMS provide this feature out of the box.
MySQL example:
select
    your_first_field,
    your_second_field
from
    your_favorite_table
into outfile '/path/to/favorite/file.csv'
fields terminated by ','
enclosed by '"'
lines terminated by '\n';
A ResultSet is not what you think it is. It is just a reference to the actual result set held in the database, so you cannot convert it in one shot; you have to iterate it row by row. Whenever a select query is fired, the result it produces is held by the database, and JDBC hands you a ResultSet reference to make that data easier to access.
So the answer to your question is: yes, you need to iterate row by row, and a CSV file is a good format to store the values in.
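A minimal sketch of the row-by-row version (the file name is a placeholder, and quoting/escaping is left out; a CSV library handles that properly):
// Write every row of the ResultSet to a comma-delimited text file, in order.
ResultSetMetaData meta = rs.getMetaData();
int cols = meta.getColumnCount();
try (PrintWriter out = new PrintWriter(new FileWriter("result.csv"))) {
    while (rs.next()) {
        StringBuilder line = new StringBuilder();
        for (int i = 1; i <= cols; i++) {
            if (i > 1) line.append(',');
            line.append(rs.getString(i));
        }
        out.println(line);
    }
}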
This can be handled in two ways:
1) Whatever ResultSet you have got, iterate through it and, as you store the data into variables, write it out to a CSV file through the java.io package.
2) If you have direct access to the database, export the data to a CSV file from there.
