Java JDBC Lazy-Loaded ResultSet

Java JDBC Lazy-Loaded ResultSet - java

Is there a way to get a ResultSet you obtain from running a JDBC query to be lazily-loaded? I want each row to be loaded as I request it and not beforehand.

Short answer:
Use Statement.setFetchSize(1) before calling executeQuery().
Long answer:
This depends very much on which JDBC driver you are using. You might want to take a look at this page, which describes the behavior of MySQL, Oracle, SQL Server, and DB2.
Major take-aways:
Each database (i.e. each JDBC driver) has its own default behavior.
Some drivers will respect setFetchSize() without any caveats, whereas others require some "help".
MySQL is an especially strange case. See this article. It sounds like if you call setFetchSize(Integer.MIN_VALUE), then it will download the rows one at a time, but it's not perfectly clear.
Another example: here's the documentation for the PostgreSQL behavior. If auto-commit is turned on, then the ResultSet will fetch all the rows at once, but if it's off, then you can use setFetchSize() as expected.
One last thing to keep in mind: these JDBC driver settings only affect what happens on the client side. The server may still load the entire result set into memory, but you can control how the client downloads the results.

Could you not achieve this by setting the fetch size for your Statement to 1?
If you only fetch 1 row at a time each row shouldn't be loaded until you called next() on the ResultSet.
e.g.
Statement statement = connection.createStatement();
statement.setFetchSize(1);
ResultSet resultSet = statement.executeQuery("SELECT .....");
while (resultSet.next())
{
// process results. each call to next() should fetch the next row
}

There is an answer provided here.
Quote:
The Presto JDBC driver never buffers the entire result set in memory. The server API will return at most ~1MB of data to the driver per request. The driver will not request more data from the server until that data is consumed (by calling the next() method on ResultSet an appropriate number of times).
Because of how the server API works, the driver fetch size is ignored (per the JDBC specification, it is only a hint).
Prove that the setFetchSize is ignored

I think what you would want to do is defer the actually loading of the ResultSet itself. You would need to implement that manually.

You will find this a LOT easier using hibernate. You will basically have to roll-your-own if you are using jdbc directly.
The fetching strategies in hibernate are highly configurable, and will most likely offer performance options you weren't even aware of.

Related

hibernate scrollable result fetch immediatelly all rows in mysql

Our applications may run over multiple DB depending on customer infrastructure.
We use Hibernate orm, so we can deploy our applications over various RDBMS.
We notice that we have an abnormal consumption of memory in all the environments where a MySql database is used.
Analyzing the problem we see that it raises when we use scrollable result.
At the opposite of other environnements (SQL server, Oracle,....) it looks like the scrollable result in MySql fetch immediately all the results of the query.
As far as I know and I have seen in Oracle and SQL Server when you use scrollable result a cursor is open, and rows are fetched from the DB server only when next() method is called. All our cursors are `FORWARD_ONLY.
I think that the cause is the MySql-jdbc-connector that doesn't handle correctly the scrollable result.
Is that correct?
Can I have the scrollable result working correctly on MySql? And, if yes, how?
Thanks in advance to anyone

Yes apparently MySQL caches ResultSet data by default because it's the most efficent way for it
https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html
By default, ResultSets are completely retrieved and stored in memory.
In most cases this is the most efficient way to operate and, due to
the design of the MySQL network protocol, is easier to implement.
you can try: query.setFetchSize(), it will give the underlying jdbc hint about your requirment.
But it all depends on the driver, some may actually just ignore it and proceed to fetch everything

Is the JDBC ResultSet an application-level query cursor

The database cursor definition is strikingly resembling with the JDBC ResultSet API.
the database cursor can be forward-only just like ResultSet.TYPE_FORWARD_ONLY.
the database cursor can be scrollable and even have a sensitivity setting just like ResultSet.TYPE_SCROLL_SENSITIVE.
there is also support for holdability like ResultSet.HOLD_CURSORS_OVER_COMMIT
and even the support for positional update/delete is being replicated into JDBC ResultSet.CONCUR_UPDATABLE
But in spite of all these resembling, MySQL doesn't support database cursors:
MySQL does not support SQL cursors, and the JDBC driver doesn't
emulate them, so setCursorName() has no effect.
So, is the JDBC implementation a data access specification that mimics a database cursor implementation, even if the database doesn't really support such a feature?

What's in a name...
Indeed, a ResultSet and a database cursor are semantically similar. The SQL:2011 standard specifies:
A cursor is a mechanism by which the rows of a table may be acted on (e.g., returned to a host programming language) one at a time.
That does sound a lot like a ResultSet. Further down, the SQL:2011 standard goes on and mentions:
A cursor declaration descriptor and a result set descriptor have four properties: the sensitivity property (either SENSITIVE, INSENSITIVE, or ASENSITIVE), the scrollability property (either SCROLL or NO SCROLL), the holdability property (either WITH HOLD or WITHOUT HOLD), and the returnability property (either WITH RETURN or WITHOUT RETURN).
In other words, none of these features were "invented" by the JDBC (or ODBC) spec teams. They do exist exactly in this form in many SQL database implementations, and as with any specs, many of the above features are optional in SQL implementations as well.
You've gotten an authoritative response on the MySQL part already by Jess. I'd like to add that JDBC, like any specification on a high level, has parts that are required and parts that are optional.
Looking at the JDBC Spec, I can see the following relevant parts.
6.3 JDBC 4.2 API Compliance
A driver that is compliant with the JDBC specification must do the following:
[...]
It must implement the Statement interface with the exception of the following
optional methods:
[...]
setCursorName
[...]
It must implement the ResultSet interface with the exception of the following
optional methods:
[...]
getCursorName
[...]
The same is true for the implementation of ResultSet types. Further down in the specs, you will find:
The method DatabaseMetaData.supportsResultSetType returns true if the
specified type is supported by the driver and false otherwise.

You can certainly think of it that way. All of these concepts are inherited from ODBC so you can thank (blame?) history for things being this way. Cursors aren't widely supported by most dbs to the full extent that features are provided in APIs such as JDBC. In MySQL specifically, there is a cursor "fetch" supported as of MySQL 5.0 which means that the driver isn't forced to read the entire result, whether it's needed or not. This means that it is possible to abandon a result set early with little to no cost. However, an additional round-trip is required to request blocks of rows periodically. MySQL Connector/J doesn't enforce the FORWARD_ONLY semantics by default and buffers the entire result in the client allowing "scrollability". However, due to the implementation in the server, this does not allow for being sensitive to changes committed in other transactions. Features are typically mimicked/emulated where possible to provide the convenience of the API.

Based on my understanding about JDBC ResultSet i will say it does not depends Database which it connects, its behaviour would be same.
JDBC will always fetches default number of rows (not the entire result set) to your local memory. Once you reach at the last line of the fetched rows (say by doing next() and try to access next row) and if there are more rows in the result, then another round-trip call will be made to the database to fetch next batch of rows to local memory.
Even you can set number of rows you want fetch in local memory than usual, you may consider CachedRowSet.
When you set the fetchSize() on the Statement, you are only giving a instruction to the JDBC driver how much you want it should fetch, but JDBC driver is free to ignore your instructions. I do not know what the Oracle driver does with the fetchSize(). Most of times its observed that MySQL JDBC driver will always fetch all rows unless you set the fetchSize() to Integer.MIN_VALUE.

What is Usage of TYPE_SCROLL_INSENSITIVE while creating Statement object

JavaDoc says: "The constant indicating the type for a ResultSet object that is scrollable but generally not sensitive to changes to the data that underlies the ResultSet"
.
I am clear about the scrollable part but have doubts regarding the latter part of statement.
I am using following code snippet for validate my understanding.
conn = getConnection();
Statement stmt = conn
.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE,
ResultSet.CONCUR_UPDATABLE);
String query = "select * from vehicle";
ResultSet rs = stmt.executeQuery(query);
rs.absolute(2);
System.out.print(rs.getString(2));
System.out.println("Waiting........");
Thread.sleep(20000); //1 manually changed database entry
rs.refreshRow();
System.out.println(rs.getString(2));//2 Surprisingly changes is reflected
At Comment 1, I did manual changes in the database then I called rs.refreshRow() method. After this At Comment 2, when when I accessed the value of second column then surprisingly change in value of second column is reflected. As per my understanding this change should not be reflected, as 'it is insensitive to changes done by other'(as per JavaDoc). Can anybody explain me what is its actual usage?

I investigated this a while ago, specifically with regard to MySQL Connector/J. As far as I could tell, the settings ResultSet.TYPE_SCROLL_SENSITIVE and ResultSet.TYPE_SCROLL_INSENSITIVE did not actually affect the behaviour when retrieving data from MySQL.
Several similar questions and blog posts I found referred to the MySQL Connector/J documentation, where in the section on JDBC API Implementation Notes it says that
By default, ResultSets are completely retrieved and stored in memory. In most cases this is the most efficient way to operate and, due to the design of the MySQL network protocol, is easier to implement.
It goes on to talk about using ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY, and stmt.setFetchSize(Integer.MIN_VALUE); as "a signal to the driver to stream result sets row-by-row", but even in that case my testing showed that the entire ResultSet was still being retrieved as soon as I did stmt.executeQuery(...). (Although perhaps I missed some other connection setting that wasn't explicitly mentioned in that section of the MySQL Connector/J documentation.)
Finally I came to the conclusion that the ResultSet.TYPE_SCROLL_[IN]SENSITIVE setting really doesn't make any difference under MySQL Connector/J. While simply scrolling around the ResultSet it always seems to act like it is INSENSITIVE (ignoring any changes to existing rows that were made by other processes), but rs.refreshRow(); always returns the latest data (including changes made by other processes) as though it is SENSITIVE even if the ResultSet is supposed to be INSENSITIVE.

CachedRowSet slower than ResultSet?

In my java code, I access an oracle database table with an select statement.
I receive a lot of rows (about 50.000 rows), so the rs.next() needs some time to process all of the rows.
using ResultSet, the processing of all rows (rs.next) takes about 30 secs
My goal is to speed up this process, so I changed the code and now using a CachedRowSet:
using CachedRowSet, the processing of all rows takes about 35 secs
I don't understand why the CachedRowSet is slower than the normal ResultSet, because the CachedRowSet retrieves all data at once, while the ResultSet retrieves the data every time the rs.next is called.
Here is a part of the code:
try {
stmt = masterCon.prepareStatement(sql);
rs = stmt.executeQuery();
CachedRowSet crset = new CachedRowSetImpl();
crset.populate(rs);
while (rs.next()) {
int countStar = iterRs.getInt("COUNT");
...
}
} finally {
//cleanup
}

CachedRowSet caches the results in memory i.e. that you don't need the connection anymore. Therefore it it "slower" in the first place.
A CachedRowSet object is a container for rows of data that caches its
rows in memory, which makes it possible to operate without always
being connected to its data source.
-> http://download.oracle.com/javase/1,5.0/docs/api/javax/sql/rowset/CachedRowSet.html

There is an issue with CachedRowSet coupled together with a postgres jdbc driver.
CachedRowSet needs to know the types of the columns so it knows which java objects to create
(god knows what else it fetches from DB behind the covers!).
It therefor makes more roundtrips to the DB to fetch column metadata.
In very high volumes this becomes a real problem.
If the DB is on a remote server, this is a real problem as well because of network latency.
We've been using CachedRowSet for years and just discovered this. We now implement our own CachedRowSet, as we never used any of it's fancy stuff anyway.
We do getString for all types and convert ourselves as this seems the quickest way.
This clearly wasn't an issue with fetch size as postgres driver fetches everything by default.

What makes you think that ResultSet will retrieve the data each time rs.next() is called? It's up to the implementation exactly how it works - and I wouldn't be surprised if it fetches a chunk at a time; quite possibly a fairly large chunk.
I suspect you're basically seeing the time it takes to copy all the data into the CachedRowSet and then access it all - basically you've got an extra copying operation for no purpose.

Using normal ResultSet you can get more optimization options with RowPrefetch and FetchSize.
Those optimizes the network transport chunks and processing in the while loop, so the rs.next() has always a data to work with.
FetchSize has a default set to 10(Oracle latest versions), but as I know RowPrefetch is not set. Thus means network transport is not optimized at all.

sql server query running slow from java

I have a java program that runs a bunch of queries against an sql server database. The first of these, which queries against a view returns about 750k records. I can run the query via sql server management studio, and I get results in about 30 seconds. however, I kicked off the program to run last night. when I checked on it this morning, this query still had not returned results back to the java program, some 15 hours later.
I have access to the database to do just about anything I want, but I'm really not sure how to begin debugging this. What should one do to figure out what is causing a situation like this? I'm not a dba, and am not intimately familiar with the sql server tool set, so the more detail you can give me on how to do what you might suggest would be appreciated.
heres the code
stmt = connection.createStatement();
clientFeedRS = stmt.executeQuery(StringBuffer.toString());
EDIT1:
Well it's been a while, and this got sidetracked, but this issue is back. I looked into upgrading from jdbc driver v 1.2 to 2.0, but we are stuck on jdk 1.4, and v 2.0 require jdk 1.5 so that's a non starter. Now I'm looking at my connection string properties. I see 2 that might be useful.
SelectMethod=cursor|direct
responseBuffering=adaptive|full
Currently, with the latency issue, I am running with cursor as the selectMethod, and with the default for responseBuffering which is full. Is changing these properties likely to help? if so, what would be the ideal settings? I'm thinking, based on what I can find online, that using a direct select method and adaptive response buffering might solve my issue. any thoughts?
EDIT2:
WEll I ended changing both of these connection string params, using the default select method(direct) and specifying the responseBuffering as adaptive. This ends up working best for me and alleviates the latency issues I was seeing. thanks for all the help.

I had similar problem, with a very simple request (SELECT . FROM . WHERE = .) taking up to 10 seconds to return a single row when using a jdbc connection in Java, while taking only 0.01s in sqlshell. The problem was the same whether i was using the official MS SQL driver or the JTDS driver.
The solution was to setup this property in the jdbc url :
sendStringParametersAsUnicode=false
Full example if you are using MS SQL official driver : jdbc:sqlserver://yourserver;instanceName=yourInstance;databaseName=yourDBName;sendStringParametersAsUnicode=false;
Instructions if using different jdbc drivers and more detailled infos about the problem here : http://emransharif.blogspot.fr/2011/07/performance-issues-with-jdbc-drivers.html
SQL Server differentiates its data types that support Unicode from the ones that just support ASCII. For example, the character data types that support Unicode are nchar, nvarchar, longnvarchar where as their ASCII counter parts are char, varchar and longvarchar respectively. By default, all Microsoft’s JDBC drivers send the strings in Unicode format to the SQL Server, irrespective of whether the datatype of the corresponding column defined in the SQL Server supports Unicode or not. In the case where the data types of the columns support Unicode, everything is smooth. But, in cases where the data types of the columns do not support Unicode, serious performance issues arise especially during data fetches. SQL Server tries to convert non-unicode datatypes in the table to unicode datatypes before doing the comparison. Moreover, if an index exists on the non-unicode column, it will be ignored. This would ultimately lead to a whole table scan during data fetch, thereby slowing down the search queries drastically.
In my case, i had 30M+ records in the table i was searching from. The duration to complete the request went from more than 10 seconds, to approximatively 0.01s after applying the property.
Hope this will help someone !

It appears this may not have applied to your particular situation, but I wanted to provide another possible explanation for someone searching for this problem.
I just had a similar problem where a query executed directly in SQL Server took 1 minute while the same query took 5 minutes through a java prepared statemnent. I tracked it down to the fact that it is was done as a prepared statement.
When you execute a query directly in SQL Server, you are providing it a non-parameterized query, in which it knows all of the search criteria at optimization time. In my case, my search criteria included a date range, and SQL server was able to look at it, decide "that date range is huge, let's not use the date index" and then it chose something much better.
When I execute the same query through a java prepared statement, at the time that SQL Server is optimizing the query, you haven't yet provided it any of the parameter values, so it has to make a guess which index to use. In the case of my date range, if it optimizes for a small range and I give it a large range, it will perform slower than it could. Likewise if it optimizes for a large range and I give it a small one, it's again going to perform slower than it could.
To demonstrate this was indeed the problem, as an experiment I tried giving it hints as to what to optimize for using SQL Server's "OPTIMIZE FOR" option. When I told it to use a tiny date range, my java query (which actually had a wide date range) actually took twice as long as before (10 minutes, as opposed to 5 minutes before, and as opposed to 1 minute in SQL Server). When I told it my exact dates to optimize for, the execution time was identical between the java prepared statement.
So my solution was to hard code the exact dates into the query. This worked for me because this was just a one-off statement. The PreparedStatement was not intended to be reused, but merely to parameterize the values to avoid SQL injection. Since these dates were coming from a java.sql.Date object, I didn't have to worry about my date values containing injection code.
However, for a statement that DOES need to be reused, hard coding the dates wouldn't work. Perhaps a better option for that would be to create multiple prepared statements optimized for different date ranges (one for a day, one for a week, one for a month, one for a year, and one for a decade...or maybe you only need 2 or 3 options...I don't know) and then for each query, execute the one prepared statement whose time range best matches the range in the actual query.
Of course, this only works well if your date ranges are evenly distributed. If 80% of your records were in the last year, and 20% percent spread out over the previous 10 years, then doing the "multiple queries based on range size" thing might not be best. You'd have to optimize you queries based on specific ranges or something. You'd need to figure that out through trial an error.

Be sure that your JDBC driver is configured to use a direct connection and not a cusror based connection. You can post your JDBC connection URL if you are not sure.
Make sure you are using a forward-only, read-only result set (this is the default if you are not setting it).
And make sure you are using updated JDBC drivers.
If all of this is not working, then you should look at the sql profiler and try to capture the sql query as the jdbc driver executes the statement, and run that statement in the management studio and see if there is a difference.
Also, since you are pulling so much data, you should be try to be sure you aren't having any memory/garbage collection slowdowns on the JVM (although in this case that doesn't really explain the time discrepancy).

If the query is parametrized it can be a missing parameter or a parameter that is set with the wrong function, e.g. setLong for string, etc.
Try to run your query with all parameters hardcoded into the query body without any ? to see of this is a problem.

I know this is an old question but since it's one of the first results when searching for this issue I figured I should post what worked for me. I had a query that took less than 10 seconds when I used SQL Server JDBC driver but more than 4 minutes when using jTDS. I tried all suggestions mentioned here and none of it made any difference. The only thing that worked is adding this to the URL ";prepareSQL=1"
See Here for more

I know this is a very old question but since it's one of the first results when searching for this issue I thought that I should post what worked for me.
I had a query that took about 3 seconds when I used SQL Server Management Studio (SSMS) but took 3.5 minutes when running using jTDS JDBC driver via the executeQuery method.
None of the suggestion mentioned above worked for me mainly because I was using just Statement and not Prepared Statement. The only thing that worked for me was to specify the name of the initial or default database in the connection string, to which the connecting user has at least the db_datareader database role membership. Having only the public role is not sufficient.
Here’s the sample connection string:
jdbc:jtds:sqlserver://YourSqlServer.name:1433/DefaultDbName
Please ensure that you have the ending /DefaultDbName specified in the connection string. Here DefaultDbName is the name of the database to which the user ID specified for making the JDBC connection has at least the db_datareader database role. If omitted, SQL Server defaults to using the master database. If the user ID used to make the JDBC connection only has the public role in the master database, the query takes exceptionally long.
I don’t know why this happens. However, I know a different query plan is used in such circumstances. I confirmed this using the SQL Profiler tool.
Environment details:
SQL Server version: 2016
jTDS driver version: 1.3.1
Java version: 11

Pulling back that much data is going to require lots of time. You should probably figure out a way to not require that much data in your application at any given time. Page the data or use lazy loading for example. Without more details on what you're trying to accomplish, it's hard to say.

The fact that it is quick when run from management studio could be due to an incorrectly cached query plan and out of date indexes (say, due to a large import or deletions). Is it returning all 750K records quickly in SSMS?
Try rebuilding your indexes (or if that would take too long, update your statistics); and maybe flushing the procedure cache (use caution if this is a production system...): DBCC FREEPROCCACHE

To start debugging this, it would be good to determine whether the problem area is in the database or in the app. Have you tried changing the query such that it returns a much smaller result? If that doesnt return, I would suggest targeting the way you are accessing the DB from Java.

Try adjusting the fetch size of the Statement and try selectMethod of cursor
http://technet.microsoft.com/en-us/library/aa342344(SQL.90).aspx
We had issues with large result sets using mysql and needed to make it stream the result set as explained in the following link.
http://helpdesk.objects.com.au/java/avoiding-outofmemoryerror-with-mysql-jdbc-driver

Quote from the MS Adaptive buffer guidelines:
Avoid using the connection string property selectMethod=cursor to allow the application to process a very large result set. The adaptive buffering feature allows applications to process very large forward-only, read-only result sets without using a server cursor. Note that when you set selectMethod=cursor, all forward-only, read-only result sets produced by that connection are impacted. In other words, if your application routinely processes short result sets with a few rows, creating, reading, and closing a server cursor for each result set will use more resources on both client-side and server-side than is the case where the selectMethod is not set to cursor.
And
There are some cases where using selectMethod=cursor instead of responseBuffering=adaptive would be more beneficial, such as:
If your application processes a forward-only, read-only result set slowly, such as reading each row after some user input, using selectMethod=cursor instead of responseBuffering=adaptive might help reduce resource usage by SQL Server.
If your application processes two or more forward-only, read-only result sets at the same time on the same connection, using selectMethod=cursor instead of responseBuffering=adaptive might help reduce the memory required by the driver while processing these result sets.
In both cases, you need to consider the overhead of creating, reading, and closing the server cursors.
See more: http://technet.microsoft.com/en-us/library/bb879937.aspx

Sometimes it could be due to the way parameters are binding to the query object.
I found the following code is very slow when executing from java program.
Query query = em().createNativeQuery(queryString)
.setParameter("param", SomeEnum.DELETED.name())
Once I remove the "deleted" parameter and directly append that "DELETED" string to the query, it became super fast. It may be due to that SQL server is expecting to have all the parameters bound to decide the optimized plan.

Two connections instead of two Statements
I had one connection to SQL server and used it for running all queries I needed, creating a new Statement in each method that needed DB interaction.
My application was traversing a master table and, for each record, fetching all related information from other tables, so the first and largest query would be running from beginning to end of the execution while iterating its result set.
Connection conn;
conn = DriverManager.getConnection("jdbc:jtds:sqlserver://myhostname:1433/DB1", user, pasword);
Statement st = conn.createStatement();
ResultSet rs = st.executeQuery("select * from MASTER + " ;");
// iterating rs will cause the other queries to complete Entities read from MASTER
// ...
Statement st1 = conn.createStatement();
ResultSet rs1 = st1.executeQuery("select * from TABLE1 where id=" + masterId + ";");
// st1.executeQuery() makes rs to be cached
// ...
Statement st2 = conn.createStatement();
ResultSet rs2 = st2.executeQuery("select * from TABLE2 where id=" + masterId + ";");
// ...
This meant that any subsequent queries (to read single records from the other tables) would cause the first result set to be cached entirely and not before that the other queries would run at all.
The solution was running all other queries in a second connection. This let the first query and its result set alone and undisturbed while the rest of the queries run swiftly in the other connection.
Connection conn;
conn = DriverManager.getConnection("jdbc:jtds:sqlserver://myhostname:1433/DB1", user, pasword);
Statement st = conn.createStatement();
ResultSet rs = st.executeQuery("select * from MASTER + " ;");
// ...
Connection conn2 = DriverManager.getConnection("jdbc:jtds:sqlserver://myhostname:1433/DB1", user, pasword);
Statement st1 = conn2.createStatement();
ResultSet rs1 = st1.executeQuery("select * from TABLE1 where id=" + masterId + ";");
// ...
Statement st2 = conn2.createStatement();
ResultSet rs2 = st2.executeQuery("select * from TABLE2 where id=" + masterId + ";");
// ...

Does it take a similar amount of time with SQLWB? If the Java version is much slower, then I would check a couple of things:
You shoudl get the best performance with a forward-only, read-only ResultSet.
I recall that the older JDBC drivers from MSFT were slow. Make sure you are using the latest-n-greatest. I think there is a generic SQL Server one and one specifically for SQL 2005.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.