What is the fastest way to retrieve sequential data from database?

What is the fastest way to retrieve sequential data from database? - java

I have a lot of rows in a database and it must be processed, but I can't retrieve all the data to the memory due to memory limitations.
At the moment, I using LIMIT and OFFSET to retrieve the data to get the data in some especified interval.
I want to know if the is the faster way or have another method to getting all the data from a table in database. None filter will be aplied, all the rows will be processed.

SELECT * FROM table ORDER BY column
There's no reason to be sucking the entire table in to RAM. Simply open a cursor and start reading. You can play games with fetch sizes and what not, but the DB will happily keep its place while you process your rows.
Addenda:
Ok, if you're using Java then I have a good idea what your problem is.
First, just by using Java, you're using a cursor. That's basically what a ResultSet is in Java. Some ResultSets are more flexible than others, but 99% of them are simple, forward only ResultSets that you call 'next' upon to get each row.
Now as to your problem.
The problem is specifically with the Postgres JDBC driver. I don't know why they do this, perhaps it's spec, perhaps it's something else, but regardless, Postgres has the curious characteristic that if your Connection has autoCommit set to true, then Postgres decides to suck in the entire result set on either the execute method or the first next method. Not really important as to where, only that if you have a gazillion rows, you get a nice OOM exception. Not helpful.
This can easily be exactly what you're seeing, and I appreciate how it can be quite frustrating and confusing.
Most Connection default to autoCommit = true. Instead, simply set autoCommit to false.
Connection con = ...get Connection...
con.setAutoCommit(false);
PreparedStatement ps = con.prepareStatement("SELECT * FROM table ORDER BY columm");
ResultSet rs = ps.executeQuery();
while(rs.next()) {
String col1 = rs.getString(1);
...and away you go here...
}
rs.close();
ps.close();
con.close();
Note the distinct lack of exception handling, left as an exercise for the reader.
If you want more control over how many rows are fetched at a time into memory, you can use:
ps.setFetchSize(numberOfRowsToFetch);
Playing around with that might improve your performance.
Make sure you have an appropriate index on the column you use in the ORDER BY if you care about sequencing at all.

Since its clear your using Java based on your comments:
If you are using JDBC you will want to use:
http://download.oracle.com/javase/1.5.0/docs/api/java/sql/ResultSet.html
If you are using Hibernate it gets trickier:
http://docs.jboss.org/hibernate/core/3.3/reference/en/html/batch.html

Related

Implement next and previous from database to form

I have a problem in implementing this case.
1) some examples show that we can use as in the following:
try{
bd.conectarBaseDeDatos();
PreparedStatement stmt;
String sql=("SELECT * FROM cab_pedido a, det_pedido b WHERE a.numero = b.numero and a.cod_agencia = b.cod_agencia");
System.out.println(sql);
stmt = bd.conexion.prepareStatement(sql,ResultSet.TYPE_SCROLL_INSENSITIVE,ResultSet.CONCUR_READ_ONLY);
bd.result = stmt.executeQuery(sql);
}
catch (InstantiationException | IllegalAccessException | SQLException e){ System.out.println(e);}
return rs;
}
This returns my resultSet and I can use rs.next(). rs.previous(), etc to solve my problem, but I see some comments that say we should close rs and db connection. How dangerous is that? I can implement without closing the resultset and connection? Because when we close resultSet, I will not able to get data anymore.
2) Store the data into Hashmap or list
This is another possibility but if I want to get the last or the first values how can I do that?
I need the next, prev, last, and first functions but I'm not sure about my first implementation.
Can anybody give me some advices of how start this.
I need solutions, advices. that duplicate means nothing.

I am going to try to start an answer here since I don't have option to comment yet All that is understandable from your code and question is:
You have a resultset from which you want to read first, last, previous, next etc set of data.
Closing the db connection resource
Either to use HashMap or List
Well, from what you have so far, you can definitely read the data from your resultset into List or HashMap
Secondly on closing the resources, Yes! You should close the db connection resources always to avoid memory leak. You can do that by adding a finally block after the catch block in your code.
On HasMap or List. If you aren't sure about the type of data you will be reading from your resultset then go with some implementation of List, e.g. ArrayList<String>. If you know the data will have Keys and Values, then you can go for HashMap.
These are just general guidelines. If you post some more code, maybe we can help further.

ANSWER for #1 -- This is a repeated question.
When you close your DB connection. ResultSet and Statement are also closed.
But that is not a guaranteed action. Therefore you should close all your DB resources.
Need for closing DB resources separately
Also consider scenario where max no of DB connection allowed to be open in the connection pool is set to a fix number. You will eventually reach that and your application will simply crash or not respond correctly.
This not only applies to DB connection/resulorces but to all IO resources.
You should (as a good practice) always close all your IO resources after you are done using them.
Not doing so you are simply leaving reason for memory leaks.
Answer to #2
Its a good to move the information/data into a proper data structure before you may want to close resultset. This is more of implementation scenario which we always face.
Form of data structure to use again will be based on the scenario. But in all cases you may want to use a Bean class/POJO to store relevant information for each row fetched because you are getting multiple values each row.
Another suggestion would be not to do a select * call via JDBC. This is not very useful. You should be preferably mentioning column names, this will allow you to control the data you are fetching into the resultset, also in many cases make the query execution faster.

JAVA Getting data from Database Without a loop

I am Programming a software with JAVA and using the Oracle DB.
Normally we obtain the values from the Database using a Loop like
Resultset rt = (Resultset) cs.getObject(1);
while(rt.next){
....
}
But it sound is more slowly when fetch thousand of data from the database.
My question is:
In Oracle DB: I created a Procedure like this and it is the Iterating data and assign to the cursor.
Ex.procedure test_pro(sysref_cursor out info) as
open info select * from user_tbl ......
end test_pro;
In JAVA Code: As I mentioned before I Iterate a the resultset for obtain values, but the side of database, even I select the values, why should I use a loop for getting that values?
(another fact in the .net frameworks, there are using the database binding concept. So is any way in the java, binding the database procedures like .net 's, without the iterating.
)

Depending on what you are going to do with that data and at which frequence, the choice for a ref_cursor might be a good or a bad one. Ref_cursors are intended to give non Oracle aware programs a way to pass it data, for reporting purposes.
In you case, stick to the looping but don't forget to implement array fetching because this has a tremendous effect on the performance. The database passes blocks of rows to your jdbc buffer at the client and your code fetches rows from that buffer. By the time you hit the end of the buffer, the Jdbc layer requests the next chunk of rows from the database, eliminating lot's of network round trips. The default already fetches 10 rows at a time. For larger sets, use bigger numbers, if memory can provide the room.
See Oracle® Database JDBC Developer's Guide and Reference

If you know for sure there will always be exactly one result, like in this case, you can even skip the if and just call rs.next() once:
For example :
ResultSet resultset = statement.executeQuery("SELECT MAX (custID) FROM customer");
resultset.next(); // exactly one result so allowed
int max = resultset.getInt(1); // use indexed retrieval since the column has no name

Yes,you can call procedure in java.
http://www.mkyong.com/jdbc/jdbc-callablestatement-stored-procedure-out-parameter-example/

You can't avoid looping. For performance reasons you need to adjust your prefetch on Statement or Resultset object (100 is a solid starting point).
Why is done this way? It's similar to reading streams - you never know how big it can be - so you read by chunk/buffer, one after another...

CachedRowSet slower than ResultSet?

In my java code, I access an oracle database table with an select statement.
I receive a lot of rows (about 50.000 rows), so the rs.next() needs some time to process all of the rows.
using ResultSet, the processing of all rows (rs.next) takes about 30 secs
My goal is to speed up this process, so I changed the code and now using a CachedRowSet:
using CachedRowSet, the processing of all rows takes about 35 secs
I don't understand why the CachedRowSet is slower than the normal ResultSet, because the CachedRowSet retrieves all data at once, while the ResultSet retrieves the data every time the rs.next is called.
Here is a part of the code:
try {
stmt = masterCon.prepareStatement(sql);
rs = stmt.executeQuery();
CachedRowSet crset = new CachedRowSetImpl();
crset.populate(rs);
while (rs.next()) {
int countStar = iterRs.getInt("COUNT");
...
}
} finally {
//cleanup
}

CachedRowSet caches the results in memory i.e. that you don't need the connection anymore. Therefore it it "slower" in the first place.
A CachedRowSet object is a container for rows of data that caches its
rows in memory, which makes it possible to operate without always
being connected to its data source.
-> http://download.oracle.com/javase/1,5.0/docs/api/javax/sql/rowset/CachedRowSet.html

There is an issue with CachedRowSet coupled together with a postgres jdbc driver.
CachedRowSet needs to know the types of the columns so it knows which java objects to create
(god knows what else it fetches from DB behind the covers!).
It therefor makes more roundtrips to the DB to fetch column metadata.
In very high volumes this becomes a real problem.
If the DB is on a remote server, this is a real problem as well because of network latency.
We've been using CachedRowSet for years and just discovered this. We now implement our own CachedRowSet, as we never used any of it's fancy stuff anyway.
We do getString for all types and convert ourselves as this seems the quickest way.
This clearly wasn't an issue with fetch size as postgres driver fetches everything by default.

What makes you think that ResultSet will retrieve the data each time rs.next() is called? It's up to the implementation exactly how it works - and I wouldn't be surprised if it fetches a chunk at a time; quite possibly a fairly large chunk.
I suspect you're basically seeing the time it takes to copy all the data into the CachedRowSet and then access it all - basically you've got an extra copying operation for no purpose.

Using normal ResultSet you can get more optimization options with RowPrefetch and FetchSize.
Those optimizes the network transport chunks and processing in the while loop, so the rs.next() has always a data to work with.
FetchSize has a default set to 10(Oracle latest versions), but as I know RowPrefetch is not set. Thus means network transport is not optimized at all.

Fastest way to iterate through large table using JDBC

I'm trying to create a java program to cleanup and merge rows in my table. The table is large, about 500k rows and my current solution is running very slowly. The first thing I want to do is simply get an in-memory array of objects representing all the rows of my table. Here is what I'm doing:
pick an increment of say 1000 rows at a time
use JDBC to fetch a resultset on the following SQL query
SELECT * FROM TABLE WHERE ID > 0 AND ID < 1000
add the resulting data to an in-memory array
continue querying all the way up to 500,000 in increments of 1000, each time adding results.
This is taking way to long. In fact its not even getting past the second increment from 1000 to 2000. The query takes forever to finish (although when I run the same thing directly through a MySQL browser its decently fast). Its been a while since I've used JDBC directly. Is there a faster alternative?

First of all, are you sure you need the whole table in memory? Maybe you should consider (if possible) selecting rows that you want to update/merge/etc. If you really have to have the whole table you could consider using a scrollable ResultSet. You can create it like this.
// make sure autocommit is off (postgres)
con.setAutoCommit(false);
Statement stmt = con.createStatement(
ResultSet.TYPE_SCROLL_INSENSITIVE, //or ResultSet.TYPE_FORWARD_ONLY
ResultSet.CONCUR_READ_ONLY);
ResultSet srs = stmt.executeQuery("select * from ...");
It enables you to move to any row you want by using 'absolute' and 'relative' methods.

One thing that helped me was Statement.setFetchSize(Integer.MIN_VALUE). I got this idea from Jason's blog. This cut down execution time by more than half. Memory consumed went down dramatically (as only one row is read at a time.)
This trick doesn't work for PreparedStatement, though.

Although it's probably not optimum, your solution seems like it ought to be fine for a one-off database cleanup routine. It shouldn't take that long to run a query like that and get the results (I'm assuming that since it's a one off a couple of seconds would be fine). Possible problems -
is your network (or at least your connection to mysql ) very slow? You could try running the process locally on the mysql box if so, or something better connected.
is there something in the table structure that's causing it? pulling down 10k of data for every row? 200 fields? calculating the id values to get based on a non-indexed row? You could try finding a more db-friendly way of pulling the data (e.g. just the columns you need, have the db aggregate values, etc.etc)
If you're not getting through the second increment something is really wrong - efficient or not, you shouldn't have any problem dumping 2000, or 20,000 rows into memory on a running JVM. Maybe you're storing the data redundantly or extremely inefficiently?

sql server query running slow from java

I have a java program that runs a bunch of queries against an sql server database. The first of these, which queries against a view returns about 750k records. I can run the query via sql server management studio, and I get results in about 30 seconds. however, I kicked off the program to run last night. when I checked on it this morning, this query still had not returned results back to the java program, some 15 hours later.
I have access to the database to do just about anything I want, but I'm really not sure how to begin debugging this. What should one do to figure out what is causing a situation like this? I'm not a dba, and am not intimately familiar with the sql server tool set, so the more detail you can give me on how to do what you might suggest would be appreciated.
heres the code
stmt = connection.createStatement();
clientFeedRS = stmt.executeQuery(StringBuffer.toString());
EDIT1:
Well it's been a while, and this got sidetracked, but this issue is back. I looked into upgrading from jdbc driver v 1.2 to 2.0, but we are stuck on jdk 1.4, and v 2.0 require jdk 1.5 so that's a non starter. Now I'm looking at my connection string properties. I see 2 that might be useful.
SelectMethod=cursor|direct
responseBuffering=adaptive|full
Currently, with the latency issue, I am running with cursor as the selectMethod, and with the default for responseBuffering which is full. Is changing these properties likely to help? if so, what would be the ideal settings? I'm thinking, based on what I can find online, that using a direct select method and adaptive response buffering might solve my issue. any thoughts?
EDIT2:
WEll I ended changing both of these connection string params, using the default select method(direct) and specifying the responseBuffering as adaptive. This ends up working best for me and alleviates the latency issues I was seeing. thanks for all the help.

I had similar problem, with a very simple request (SELECT . FROM . WHERE = .) taking up to 10 seconds to return a single row when using a jdbc connection in Java, while taking only 0.01s in sqlshell. The problem was the same whether i was using the official MS SQL driver or the JTDS driver.
The solution was to setup this property in the jdbc url :
sendStringParametersAsUnicode=false
Full example if you are using MS SQL official driver : jdbc:sqlserver://yourserver;instanceName=yourInstance;databaseName=yourDBName;sendStringParametersAsUnicode=false;
Instructions if using different jdbc drivers and more detailled infos about the problem here : http://emransharif.blogspot.fr/2011/07/performance-issues-with-jdbc-drivers.html
SQL Server differentiates its data types that support Unicode from the ones that just support ASCII. For example, the character data types that support Unicode are nchar, nvarchar, longnvarchar where as their ASCII counter parts are char, varchar and longvarchar respectively. By default, all Microsoft’s JDBC drivers send the strings in Unicode format to the SQL Server, irrespective of whether the datatype of the corresponding column defined in the SQL Server supports Unicode or not. In the case where the data types of the columns support Unicode, everything is smooth. But, in cases where the data types of the columns do not support Unicode, serious performance issues arise especially during data fetches. SQL Server tries to convert non-unicode datatypes in the table to unicode datatypes before doing the comparison. Moreover, if an index exists on the non-unicode column, it will be ignored. This would ultimately lead to a whole table scan during data fetch, thereby slowing down the search queries drastically.
In my case, i had 30M+ records in the table i was searching from. The duration to complete the request went from more than 10 seconds, to approximatively 0.01s after applying the property.
Hope this will help someone !

It appears this may not have applied to your particular situation, but I wanted to provide another possible explanation for someone searching for this problem.
I just had a similar problem where a query executed directly in SQL Server took 1 minute while the same query took 5 minutes through a java prepared statemnent. I tracked it down to the fact that it is was done as a prepared statement.
When you execute a query directly in SQL Server, you are providing it a non-parameterized query, in which it knows all of the search criteria at optimization time. In my case, my search criteria included a date range, and SQL server was able to look at it, decide "that date range is huge, let's not use the date index" and then it chose something much better.
When I execute the same query through a java prepared statement, at the time that SQL Server is optimizing the query, you haven't yet provided it any of the parameter values, so it has to make a guess which index to use. In the case of my date range, if it optimizes for a small range and I give it a large range, it will perform slower than it could. Likewise if it optimizes for a large range and I give it a small one, it's again going to perform slower than it could.
To demonstrate this was indeed the problem, as an experiment I tried giving it hints as to what to optimize for using SQL Server's "OPTIMIZE FOR" option. When I told it to use a tiny date range, my java query (which actually had a wide date range) actually took twice as long as before (10 minutes, as opposed to 5 minutes before, and as opposed to 1 minute in SQL Server). When I told it my exact dates to optimize for, the execution time was identical between the java prepared statement.
So my solution was to hard code the exact dates into the query. This worked for me because this was just a one-off statement. The PreparedStatement was not intended to be reused, but merely to parameterize the values to avoid SQL injection. Since these dates were coming from a java.sql.Date object, I didn't have to worry about my date values containing injection code.
However, for a statement that DOES need to be reused, hard coding the dates wouldn't work. Perhaps a better option for that would be to create multiple prepared statements optimized for different date ranges (one for a day, one for a week, one for a month, one for a year, and one for a decade...or maybe you only need 2 or 3 options...I don't know) and then for each query, execute the one prepared statement whose time range best matches the range in the actual query.
Of course, this only works well if your date ranges are evenly distributed. If 80% of your records were in the last year, and 20% percent spread out over the previous 10 years, then doing the "multiple queries based on range size" thing might not be best. You'd have to optimize you queries based on specific ranges or something. You'd need to figure that out through trial an error.

Be sure that your JDBC driver is configured to use a direct connection and not a cusror based connection. You can post your JDBC connection URL if you are not sure.
Make sure you are using a forward-only, read-only result set (this is the default if you are not setting it).
And make sure you are using updated JDBC drivers.
If all of this is not working, then you should look at the sql profiler and try to capture the sql query as the jdbc driver executes the statement, and run that statement in the management studio and see if there is a difference.
Also, since you are pulling so much data, you should be try to be sure you aren't having any memory/garbage collection slowdowns on the JVM (although in this case that doesn't really explain the time discrepancy).

If the query is parametrized it can be a missing parameter or a parameter that is set with the wrong function, e.g. setLong for string, etc.
Try to run your query with all parameters hardcoded into the query body without any ? to see of this is a problem.

I know this is an old question but since it's one of the first results when searching for this issue I figured I should post what worked for me. I had a query that took less than 10 seconds when I used SQL Server JDBC driver but more than 4 minutes when using jTDS. I tried all suggestions mentioned here and none of it made any difference. The only thing that worked is adding this to the URL ";prepareSQL=1"
See Here for more

I know this is a very old question but since it's one of the first results when searching for this issue I thought that I should post what worked for me.
I had a query that took about 3 seconds when I used SQL Server Management Studio (SSMS) but took 3.5 minutes when running using jTDS JDBC driver via the executeQuery method.
None of the suggestion mentioned above worked for me mainly because I was using just Statement and not Prepared Statement. The only thing that worked for me was to specify the name of the initial or default database in the connection string, to which the connecting user has at least the db_datareader database role membership. Having only the public role is not sufficient.
Here’s the sample connection string:
jdbc:jtds:sqlserver://YourSqlServer.name:1433/DefaultDbName
Please ensure that you have the ending /DefaultDbName specified in the connection string. Here DefaultDbName is the name of the database to which the user ID specified for making the JDBC connection has at least the db_datareader database role. If omitted, SQL Server defaults to using the master database. If the user ID used to make the JDBC connection only has the public role in the master database, the query takes exceptionally long.
I don’t know why this happens. However, I know a different query plan is used in such circumstances. I confirmed this using the SQL Profiler tool.
Environment details:
SQL Server version: 2016
jTDS driver version: 1.3.1
Java version: 11

Pulling back that much data is going to require lots of time. You should probably figure out a way to not require that much data in your application at any given time. Page the data or use lazy loading for example. Without more details on what you're trying to accomplish, it's hard to say.

The fact that it is quick when run from management studio could be due to an incorrectly cached query plan and out of date indexes (say, due to a large import or deletions). Is it returning all 750K records quickly in SSMS?
Try rebuilding your indexes (or if that would take too long, update your statistics); and maybe flushing the procedure cache (use caution if this is a production system...): DBCC FREEPROCCACHE

To start debugging this, it would be good to determine whether the problem area is in the database or in the app. Have you tried changing the query such that it returns a much smaller result? If that doesnt return, I would suggest targeting the way you are accessing the DB from Java.

Try adjusting the fetch size of the Statement and try selectMethod of cursor
http://technet.microsoft.com/en-us/library/aa342344(SQL.90).aspx
We had issues with large result sets using mysql and needed to make it stream the result set as explained in the following link.
http://helpdesk.objects.com.au/java/avoiding-outofmemoryerror-with-mysql-jdbc-driver

Quote from the MS Adaptive buffer guidelines:
Avoid using the connection string property selectMethod=cursor to allow the application to process a very large result set. The adaptive buffering feature allows applications to process very large forward-only, read-only result sets without using a server cursor. Note that when you set selectMethod=cursor, all forward-only, read-only result sets produced by that connection are impacted. In other words, if your application routinely processes short result sets with a few rows, creating, reading, and closing a server cursor for each result set will use more resources on both client-side and server-side than is the case where the selectMethod is not set to cursor.
And
There are some cases where using selectMethod=cursor instead of responseBuffering=adaptive would be more beneficial, such as:
If your application processes a forward-only, read-only result set slowly, such as reading each row after some user input, using selectMethod=cursor instead of responseBuffering=adaptive might help reduce resource usage by SQL Server.
If your application processes two or more forward-only, read-only result sets at the same time on the same connection, using selectMethod=cursor instead of responseBuffering=adaptive might help reduce the memory required by the driver while processing these result sets.
In both cases, you need to consider the overhead of creating, reading, and closing the server cursors.
See more: http://technet.microsoft.com/en-us/library/bb879937.aspx

Sometimes it could be due to the way parameters are binding to the query object.
I found the following code is very slow when executing from java program.
Query query = em().createNativeQuery(queryString)
.setParameter("param", SomeEnum.DELETED.name())
Once I remove the "deleted" parameter and directly append that "DELETED" string to the query, it became super fast. It may be due to that SQL server is expecting to have all the parameters bound to decide the optimized plan.

Two connections instead of two Statements
I had one connection to SQL server and used it for running all queries I needed, creating a new Statement in each method that needed DB interaction.
My application was traversing a master table and, for each record, fetching all related information from other tables, so the first and largest query would be running from beginning to end of the execution while iterating its result set.
Connection conn;
conn = DriverManager.getConnection("jdbc:jtds:sqlserver://myhostname:1433/DB1", user, pasword);
Statement st = conn.createStatement();
ResultSet rs = st.executeQuery("select * from MASTER + " ;");
// iterating rs will cause the other queries to complete Entities read from MASTER
// ...
Statement st1 = conn.createStatement();
ResultSet rs1 = st1.executeQuery("select * from TABLE1 where id=" + masterId + ";");
// st1.executeQuery() makes rs to be cached
// ...
Statement st2 = conn.createStatement();
ResultSet rs2 = st2.executeQuery("select * from TABLE2 where id=" + masterId + ";");
// ...
This meant that any subsequent queries (to read single records from the other tables) would cause the first result set to be cached entirely and not before that the other queries would run at all.
The solution was running all other queries in a second connection. This let the first query and its result set alone and undisturbed while the rest of the queries run swiftly in the other connection.
Connection conn;
conn = DriverManager.getConnection("jdbc:jtds:sqlserver://myhostname:1433/DB1", user, pasword);
Statement st = conn.createStatement();
ResultSet rs = st.executeQuery("select * from MASTER + " ;");
// ...
Connection conn2 = DriverManager.getConnection("jdbc:jtds:sqlserver://myhostname:1433/DB1", user, pasword);
Statement st1 = conn2.createStatement();
ResultSet rs1 = st1.executeQuery("select * from TABLE1 where id=" + masterId + ";");
// ...
Statement st2 = conn2.createStatement();
ResultSet rs2 = st2.executeQuery("select * from TABLE2 where id=" + masterId + ";");
// ...

Does it take a similar amount of time with SQLWB? If the Java version is much slower, then I would check a couple of things:
You shoudl get the best performance with a forward-only, read-only ResultSet.
I recall that the older JDBC drivers from MSFT were slow. Make sure you are using the latest-n-greatest. I think there is a generic SQL Server one and one specifically for SQL 2005.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.