My problem relates to JDBC queries where the number of records in the table is huge. The end goal is getting the data from the DB in a streamed fashion, whereby you receive the data chunk by chunk.
This is possible by creating multiple SQL statements with keywords such as LIMIT and OFFSET, but in that case there will be multiple DB calls, which will cost more time.
Is there a way whereby you do not load the entire result set into memory and can get the data in chunks without making additional DB calls?
Thanks
First, if you are getting data in chunks from the database, you will still be making multiple database calls. You just won't be executing as many queries.
Second, yes it is possible. There is a DB concept known as a "cursor".
Connection cn = //..
// Very important - JDBC default is to commit after every statement
// which will cause DB to close the cursor
cn.setAutoCommit(false);
PreparedStatement st = cn.prepareStatement("SELECT * FROM TBL_FOO");
// Cache 50 rows at the client at a time
st.setFetchSize(50);
ResultSet rs = st.executeQuery();
while (rs.next()) {
// Move the cursor position forward - moving past cached rows triggers another fetch
}
rs.close();
st.close();
Note, the database will have fetched all the rows when it executed the query, and the result set will occupy DB memory until you close the cursor. Remember, the DB is a shared resource.
Related
I'm using this code to load data from a database:
PreparedStatement inputStmt = connection.prepareStatement("select * from A");
inputStmt.setFetchSize(3000);
inputStmt.executeQuery();
Since I'm using setFetchSize, I know the request will fetch only 3000 rows at a time and, if needed, fetch the next 3000 rows, and so on.
My question is: when we fetch the second 3000 rows, do the first 3000 remain in the cache?
I'm reading a table with millions of rows, and if I do not manage my calls well I will have a memory issue.
The default ResultSet type is ResultSet.TYPE_FORWARD_ONLY, meaning that you can only go forward, one row at a time. Unless the driver is very poorly implemented, it won't keep rows in memory once they are no longer accessible.
The default fetch size differs between drivers. Some load the full result set by default (for example Postgres), while others use a smaller fetch size (e.g. Oracle), which can even be inefficient for some types of tasks.
What happens upon executing a statement? Does it retrieve all the rows into memory? Where are the rows stored in the ResultSet, and how are they fetched into the Java program?
Update
On calling resultSet.next(), does it go to the database, fetch a single row, come back to the Java side, and display it? And the ResultSet has a cursor; is it similar to a database cursor?
A common misunderstanding is that the ResultSet must be some kind of container that holds all the rows from the query, so that people try to pass it around the application and are surprised that it becomes invalid when the database connection used to create it closes. The ResultSet is more like a database cursor, it's something you use to pull rows back from the database. The ResultSet has a fetch size that suggests to the driver how many rows it can get from the server at a time, so that it can retrieve the rows in chunks and buffer them for when they're needed.
It is driver-specific. For example, the Postgres JDBC driver loads all rows into memory by default, while Oracle uses a server-side cursor and fetches only part of the rows at a time.
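For example, with the Postgres driver you have to opt in to cursor-based fetching explicitly. A minimal sketch, assuming a made-up connection URL and table name:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

try (Connection cn = DriverManager.getConnection(
        "jdbc:postgresql://localhost/mydb", "user", "pass")) {
    // Postgres only streams via a server-side cursor when autocommit is
    // off and a non-zero fetch size is set on a forward-only result set
    cn.setAutoCommit(false);
    try (PreparedStatement st = cn.prepareStatement("SELECT * FROM big_table")) {
        st.setFetchSize(1000); // 1000 rows per network round trip
        try (ResultSet rs = st.executeQuery()) {
            while (rs.next()) {
                // only the current block of 1000 rows is buffered client-side
            }
        }
    }
    cn.commit();
}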
In my java code, I access an oracle database table with an select statement.
I receive a lot of rows (about 50,000), so iterating over them all with rs.next() takes some time.
using ResultSet, the processing of all rows (rs.next) takes about 30 secs
My goal is to speed up this process, so I changed the code and now using a CachedRowSet:
using CachedRowSet, the processing of all rows takes about 35 secs
I don't understand why the CachedRowSet is slower than the normal ResultSet, because the CachedRowSet retrieves all data at once, while the ResultSet retrieves the data every time the rs.next is called.
Here is a part of the code:
try {
stmt = masterCon.prepareStatement(sql);
rs = stmt.executeQuery();
CachedRowSet crset = new CachedRowSetImpl();
crset.populate(rs);
while (crset.next()) { // iterate the cached copy, not the already-consumed ResultSet
int countStar = crset.getInt("COUNT");
...
}
} finally {
//cleanup
}
CachedRowSet caches the results in memory, i.e. you don't need the connection anymore. That is why it is "slower" in the first place.
A CachedRowSet object is a container for rows of data that caches its
rows in memory, which makes it possible to operate without always
being connected to its data source.
-> http://download.oracle.com/javase/1.5.0/docs/api/javax/sql/rowset/CachedRowSet.html
There is an issue with CachedRowSet coupled together with a postgres jdbc driver.
CachedRowSet needs to know the types of the columns so it knows which java objects to create
(god knows what else it fetches from DB behind the covers!).
It therefore makes more round trips to the DB to fetch column metadata.
In very high volumes this becomes a real problem.
If the DB is on a remote server, this is a real problem as well because of network latency.
We've been using CachedRowSet for years and just discovered this. We now implement our own CachedRowSet, as we never used any of its fancy features anyway.
We do getString for all types and convert ourselves as this seems the quickest way.
This clearly wasn't an issue with fetch size as postgres driver fetches everything by default.
What makes you think that ResultSet will retrieve the data each time rs.next() is called? It's up to the implementation exactly how it works - and I wouldn't be surprised if it fetches a chunk at a time; quite possibly a fairly large chunk.
I suspect you're basically seeing the time it takes to copy all the data into the CachedRowSet and then access it all - basically you've got an extra copying operation for no purpose.
Using a normal ResultSet you get more optimization options with RowPrefetch and FetchSize.
Those optimize the network transport chunks and the processing in the while loop, so rs.next() always has data to work with.
FetchSize defaults to 10 (in recent Oracle versions), but as far as I know RowPrefetch is not set, which means the network transport is not optimized at all.
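A hedged sketch of both knobs; defaultRowPrefetch is an Oracle-specific connection property, and the URL, credentials, and table below are made up, so check your driver version's documentation:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Properties;

Properties props = new Properties();
props.setProperty("user", "scott");
props.setProperty("password", "tiger");
props.setProperty("defaultRowPrefetch", "100"); // Oracle-specific; raises the default of 10
try (Connection cn = DriverManager.getConnection(
        "jdbc:oracle:thin:@//localhost:1521/XEPDB1", props);
     PreparedStatement st = cn.prepareStatement("SELECT * FROM big_table")) {
    st.setFetchSize(100); // standard JDBC hint, set per statement
    try (ResultSet rs = st.executeQuery()) {
        while (rs.next()) {
            // each network round trip now carries about 100 rows
        }
    }
}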
I have a requirement to read an Excel file, with its headers and data, and create a table in a database (MySQL) based on the headers, inserting the values extracted from the file. I am using JDBC with prepared statements to create the table and insert the data into it.
It works nicely, but when the number of records grows large (suppose the file contains 200,000 or more records) it becomes slow. Please guide me on how to optimize the speed of inserting data into a DB table.
Thanks, Sameek
To optimize it, you should first use the same PreparedStatement object for all of the inserts.
To further optimize the code you can send batches of updates.
e.g. batches of 5:
// reuse one PreparedStatement for every insert
PreparedStatement pstmt = conn.prepareStatement(sql);
for (int i = 0; i < rows.length; ++i) {
    if (i != 0 && i % 5 == 0) {
        pstmt.executeBatch(); // send the accumulated batch of 5 to the DB
    }
    pstmt.setString(1, rows[i].getName());
    pstmt.setLong(2, rows[i].getId());
    pstmt.addBatch();
}
pstmt.executeBatch(); // flush the final, possibly partial, batch
Wrap your inserts in a transaction. Pseudo code:
1) Begin transaction
2) Create prepared statement
3) Loop for all inserts, setting prepared statement parameters and executing for each insert
4) Commit transaction
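In plain JDBC that pseudo code might look like the following sketch; the table, columns, and data arrays are made up for illustration:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

void insertAll(Connection conn, String[] names, long[] ids) throws SQLException {
    conn.setAutoCommit(false);                              // 1) begin transaction
    try (PreparedStatement ps = conn.prepareStatement(
            "INSERT INTO my_table (name, id) VALUES (?, ?)")) { // 2) create prepared statement
        for (int i = 0; i < names.length; i++) {            // 3) loop over all inserts
            ps.setString(1, names[i]);
            ps.setLong(2, ids[i]);
            ps.executeUpdate();
        }
        conn.commit();                                      // 4) commit transaction
    } catch (SQLException e) {
        conn.rollback();                                    // undo everything on failure
        throw e;
    }
}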
I'll take the example of Hibernate. Hibernate has a concept called the Session, which stores the SQL commands that have not yet been sent to the DB.
With Hibernate you can do inserts and flush the session every 100 inserts, which means sending the SQL queries in groups of 100. This helps performance because it communicates with the database once per 100 inserts rather than on every insert.
You can do the same thing in plain JDBC by executing your updates in batches of 100 (or however many you want) with a PreparedStatement.
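For illustration, here is a sketch of that flush-every-100 pattern with Hibernate's Session API; entities and sessionFactory are hypothetical and assumed to be configured:

import org.hibernate.Session;
import org.hibernate.Transaction;

Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for (int i = 0; i < entities.size(); i++) {
    session.save(entities.get(i));
    if (i > 0 && i % 100 == 0) {
        session.flush(); // send the pending inserts to the database
        session.clear(); // detach flushed entities to free session memory
    }
}
tx.commit(); // flushes whatever remains, then commits
session.close();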
I have a .jsp page where I have a GUI table that displays records from an Oracle database. This table allows typical pagination behaviour, such as "FIRST", "NEXT", "PREVIOUS" and "LAST". The records are obtained from a Java ResultSet object that is returned from executing a SQL statement.
This ResultSet might be very big, so my question is:
If I have a ResultSet containing one million records but my table only displays the data from the first ten records in the ResultSet, is the data only fetched when I start requesting record data or does all of the data get loaded into memory entirely once the ResultSet is returned from executing a SQL statement?
The Java ResultSet is a pointer (or cursor) to the results in the database. The ResultSet loads records in blocks from the database. So to answer your question, the data is only fetched when you request it but in blocks.
If you need to control how many rows are fetched at once by the driver, you can use the setFetchSize(int rows) method on the ResultSet. This will allow you to control how big the blocks it retrieves at once.
The JDBC spec does not specify whether the data is streamed or if it is loaded into memory. Oracle streams by default. MySQL does not. To get MySQL to stream the resultset, you need to set the following on the Statement:
pstmt = conn.prepareStatement(
        sql,
        ResultSet.TYPE_FORWARD_ONLY,
        ResultSet.CONCUR_READ_ONLY);
// Integer.MIN_VALUE is MySQL's signal to stream rows one at a time
// instead of buffering the whole result set in memory
pstmt.setFetchSize(Integer.MIN_VALUE);
The best idea is to make a subquery and display 100 or 1000 rows at a time per page, and to manage the connection with connection pooling.
To build the subquery you can use ROWNUM in Oracle and LIMIT in MySQL.
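For example, with MySQL a page query might look like this sketch (the table, ordering column, and page size are made up):

int pageSize = 100;
int pageNumber = 3; // zero-based
try (PreparedStatement ps = conn.prepareStatement(
        "SELECT * FROM my_table ORDER BY id LIMIT ? OFFSET ?")) {
    ps.setInt(1, pageSize);
    ps.setInt(2, pageNumber * pageSize);
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            // render one page of rows
        }
    }
}

On Oracle (before 12c's OFFSET ... FETCH syntax) the equivalent is typically written with ROWNUM in a subquery.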
While the JDBC spec does not specify whether or not all the data in the result set will get fetched, any well-written driver won't do that.
That said, a scrollable result set might be more what you have in mind:
(link redacted, it pointed to a spyware page)
You may also consider a disconnected row set, that's stored in the session (depending on how scalable your site needs to be):
http://java.sun.com/j2se/1.4.2/docs/api/javax/sql/RowSet.html
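On Java 7 and later you can obtain one through the standard RowSetProvider factory. A minimal sketch of the disconnected usage (dataSource and the query are made up):

import javax.sql.rowset.CachedRowSet;
import javax.sql.rowset.RowSetProvider;

CachedRowSet crs = RowSetProvider.newFactory().createCachedRowSet();
try (Connection conn = dataSource.getConnection();
     PreparedStatement ps = conn.prepareStatement("SELECT * FROM my_table");
     ResultSet rs = ps.executeQuery()) {
    crs.populate(rs); // copies the rows into client memory
} // the connection is closed here, but crs keeps working
while (crs.next()) {
    // read rows, e.g. from the HTTP session, without an open connection
}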
Let's say we have a table that contains 500 records:
PreparedStatement stm = con.prepareStatement("select * from table");
stm.setFetchSize(100); // 100 records at a time will be loaded from the database into memory,
// and since we have 500 records, 5 server round trips will occur
ResultSet rs = stm.executeQuery();
rs.setFetchSize(50); // overrides the fetch size provided on the statement,
// and the next trip to the database will fetch records based on the new fetch size