Until now I was using something like this to query my database, and it was working perfectly fine:
PreparedStatement prepStmt = dbCon.prepareStatement(mySql);
ResultSet rs = prepStmt.executeQuery();
But then I needed to call rs.first() in order to be able to iterate over my rs multiple times. So I now use:
PreparedStatement prepStmt = dbCon.prepareStatement(mySql,ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_UPDATABLE);
My question is about the performance of the two. What do I lose if I use the second option? Will using the second option have any negative effect on the code that I have written so far?
PS: Note that my application is a multi-user, database-intensive web application (on WebLogic 10.3.4) that uses a back-end Oracle 11g database.
Thanks all for your attention.
UPDATE
My maximum result set size will be less than 1000 rows and 15-20 columns.
If you're using scrollability (your second option), pay attention to this:
Important: Because all rows of any scrollable result set are stored in
the client-side cache, a situation where the result set contains many
rows, many columns, or very large columns might cause the client-side
Java Virtual Machine (JVM) to fail. Do not specify scrollability for a
large result set.
Source: Oracle Database JDBC Developer's Guide and Reference
Since an Oracle cursor is a forward-only structure, in order to simulate a scrollable cursor, the JDBC driver would generally need to cache the results in memory if it wants to be able to ensure that the same results are returned when you iterate through the results a second time. Depending on the number and size of the results returned from the query, that can involve a substantial amount of additional memory being consumed on the application server. On the other hand, that should mean that iterating through the ResultSet a second time should be much more efficient than the first time.
Whether the extra memory required is meaningful depends on your application. You say that the largest ResultSet will have 1000 rows. If you figure that each row is 500 bytes (this will obviously depend on data types: if your ResultSet just has a bunch of numbers, it would be much smaller; if it contains a bunch of long description strings, it may be much larger), 1000 rows is 500 KB per user. If you've got 1000 simultaneous users, that's only 500 MB of storage, which probably isn't prohibitive. If you've got 1 million simultaneous users, on the other hand, that's 500 GB, which probably means that you're buying a few new servers. If your rows are 5000 bytes rather than 500, then with 1000 simultaneous users you're talking about 5 GB of RAM, which could be a large fraction of the memory required on the application server to run your application.
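The arithmetic above can be reproduced with a tiny helper. This is only a back-of-the-envelope sketch: the class and method names are made up for illustration, and the bytes-per-row figures are the same guesses used above, not measurements.

```java
// Rough estimate of the client-side memory held by cached scrollable result sets.
// All inputs are assumptions; measure real row sizes before relying on the numbers.
class ScrollableCacheEstimate {

    /** Bytes cached if every concurrent user holds one fully cached result set. */
    static long totalBytes(long concurrentUsers, long rowsPerResultSet, long bytesPerRow) {
        // multiplyExact throws ArithmeticException on overflow instead of wrapping silently
        return Math.multiplyExact(
                Math.multiplyExact(concurrentUsers, rowsPerResultSet), bytesPerRow);
    }

    public static void main(String[] args) {
        // 1000 users x 1000 rows x 500 bytes = 500,000,000 bytes (about 500 MB)
        System.out.println(totalBytes(1_000, 1_000, 500));
        // Same load with 5000-byte rows = 5,000,000,000 bytes (about 5 GB)
        System.out.println(totalBytes(1_000, 1_000, 5_000));
    }
}
```

The real footprint will be higher than this, since Java objects carry per-object overhead on top of the raw column data.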
Related
I am using a java jdbc application to fetch about 500,000 records from DB. The Database being used is Oracle. I write the data into a file as soon as each row is fetched. Since it takes about an hour to complete fetching the entire data, I am trying to increase the fetch size of the result set. I have seen in multiple links that while increasing the fetch size one should be careful about the memory consumption. Does increasing the fetch size actually increase the heap memory used by the jvm?
Suppose the fetch size is 10 and the query returns 100 rows in total. During the first fetch the result set contains 10 records. Once I read the first 10 records, the result set fetches the next 10. Does this mean that after the 2nd fetch the result set will contain 20 records? Are the earlier 10 records still maintained in memory, or are they removed while fetching the newer batch?
Any help is appreciated.
It depends. Different drivers may behave differently and different ResultSet settings may behave differently.
If you have a CONCUR_READ_ONLY, FETCH_FORWARD, TYPE_FORWARD_ONLY ResultSet, the driver will almost certainly actively store in memory the number of rows that corresponds to your fetch size (of course data for earlier rows will remain in memory for some period of time until it is garbage collected). If you have a TYPE_SCROLL_INSENSITIVE ResultSet, on the other hand, it is very likely that the driver would store all the data that was fetched in memory in order to allow you to scroll backwards and forwards through the data. That's not the only possible way to implement this behavior, so different drivers (and different versions of drivers) may have different behaviors but it is the simplest and the way that most drivers I've come across behave.
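For the export case in the question, a forward-only, read-only statement with a larger fetch size might look like the sketch below. The class and method names are hypothetical, and writing only the first column is a placeholder for the real formatting. The round-trip helper is a rough estimate that ignores any extra trip a driver makes to detect the end of the data.

```java
import java.io.IOException;
import java.io.Writer;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class ForwardOnlyExport {

    /** Rough number of fetch round trips: ceil(totalRows / fetchSize). */
    static long fetchRoundTrips(long totalRows, int fetchSize) {
        if (fetchSize <= 0) {
            throw new IllegalArgumentException("fetchSize must be positive");
        }
        return (totalRows + fetchSize - 1) / fetchSize;
    }

    /** Writes the first column of every row to the writer, one line per row. */
    static long export(Connection con, String sql, Writer out, int fetchSize)
            throws SQLException, IOException {
        long rows = 0;
        // Forward-only and read-only: the driver has no need to keep earlier batches around.
        try (PreparedStatement ps = con.prepareStatement(
                sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            ps.setFetchSize(fetchSize); // a hint: rows requested per round trip
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    out.write(String.valueOf(rs.getObject(1)));
                    out.write('\n');
                    rows++;
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(fetchRoundTrips(100, 10));        // 10
        System.out.println(fetchRoundTrips(500_000, 10));    // 50000
        System.out.println(fetchRoundTrips(500_000, 1_000)); // 500
    }
}
```

With this shape, the memory held at any moment should be roughly one batch of fetchSize rows, so the trade-off is batch size against the number of round trips.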
While increasing the fetch size may help performance a bit, I would also look into tuning the SDU size, which controls the size of the packets at the sqlnet layer. Increasing the SDU size can speed up data transfers.
Of course the time it takes to fetch these 500,000 rows largely depends on how much data you're fetching. If it takes an hour I'm guessing you're fetching a lot of data and/or you're doing it from a remote client over a WAN.
To change the SDU size:
First change the default SDU size on the server to 32k (starting in 11.2.0.3 you can even use 64kB and up to 2MB starting in 12c) by changing or adding this line in sqlnet.ora on the server:
DEFAULT_SDU_SIZE=32767
Then modify your JDBC URL:
jdbc:oracle:thin:@(DESCRIPTION=(SDU=32767)(ADDRESS=(PROTOCOL=tcp)(HOST=...)(PORT=...))(CONNECT_DATA=
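As a sketch of how such a URL could be assembled in code, assuming the usual connect-descriptor layout with an ADDRESS section and a SERVICE_NAME in CONNECT_DATA. The class name, host, port and service name below are placeholders, not values from the answer.

```java
// Builds an Oracle thin-driver URL whose connect descriptor carries an SDU setting.
class SduJdbcUrl {

    static String thinUrl(String host, int port, String serviceName, int sduBytes) {
        return "jdbc:oracle:thin:@(DESCRIPTION=(SDU=" + sduBytes + ")"
                + "(ADDRESS=(PROTOCOL=tcp)(HOST=" + host + ")(PORT=" + port + "))"
                + "(CONNECT_DATA=(SERVICE_NAME=" + serviceName + ")))";
    }

    public static void main(String[] args) {
        // Hypothetical host and service name, for illustration only
        System.out.println(thinUrl("db.example.com", 1521, "orcl", 32767));
    }
}
```

The SDU is negotiated between client and server, so the value here only helps if the server side allows it as well.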
I'm using this code to load data from a database :
PreparedStatement inputStmt = connection.prepareStatement("select * from A");
inputStmt.setFetchSize(3000);
inputStmt.executeQuery();
Since I'm using setFetchSize, I know that the request will fetch only 3000 rows at a time, and if needed it will fetch the next 3000 rows ...
My question is: when we fetch the second 3000 rows, do the first 3000 remain in the cache?
I ask because I'm reading a table with millions of rows, and if I do not manage this well I will have a memory issue.
The default ResultSet type is ResultSet.TYPE_FORWARD_ONLY, meaning that you can only go forward, one row at a time. Unless the driver is very poorly implemented, it won't keep rows that are no longer accessible in memory.
The default fetch size differs between drivers. Some load the full result set by default (for example Postgres), while others use a small fetch size (e.g. Oracle, which defaults to 10 rows), which can be inefficient for some types of tasks.
I need to update / insert a large number of entries very fast. I see 2 options:
1. create many queries and send them via executeBatch
2. create one big query (containing all updates/inserts in db-specific syntax) and just execute it. Since the number of updates is fixed ("batch size"), I can prepare this statement too
The target db is Oracle. The number of inserts/updates in a batch is a fixed number between 1000 and 10000 (does this number have some impact on performance?)
So which way to go?
Your options are essentially the same. In fact they may be identical, unless your second option is implemented in a poor way.
Using the built-in PreparedStatement batching is safer, since the driver will know what to do a lot better than you do. There are fewer chances for programmer error, and should you ever change your database provider, you won't need to double-check whether your solution is still valid.
Make sure to check out how to properly perform the batching. For example the batch size is commonly 100 instead of the full amount of rows you wish to insert (so you would have 10 executeBatch()es to insert your 1000 rows).
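A minimal sketch of that chunked batching pattern follows. The class and method names are hypothetical, rows are assumed to arrive as Object[] values bound positionally, and transaction handling (autocommit, commit, rollback) is left to the caller. The batch size of 100 is just the figure mentioned above, not a tuned value.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class ChunkedBatchInsert {

    /** Number of executeBatch() calls needed: ceil(totalRows / batchSize). */
    static int batchCount(int totalRows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return (totalRows + batchSize - 1) / batchSize;
    }

    /** Adds every row to the batch and flushes each time batchSize rows are pending. */
    static int insertAll(Connection con, String sql, List<Object[]> rows, int batchSize)
            throws SQLException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int executed = 0;
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            int pending = 0;
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]); // JDBC parameters are 1-based
                }
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch();
                    executed++;
                    pending = 0;
                }
            }
            if (pending > 0) { // flush the final partial batch
                ps.executeBatch();
                executed++;
            }
        }
        return executed;
    }

    public static void main(String[] args) {
        System.out.println(batchCount(1_000, 100)); // 10
    }
}
```

A call might look like insertAll(con, "INSERT INTO t (a, b) VALUES (?, ?)", rows, 100), where the table and columns are placeholders.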
I have a Java application using JDBC that runs once per day on my server and interacts with a MySQL database (v5.5) also running on the same server. The app is querying for and iterating through all rows in a table.
The table is reasonably small at the moment (~5000 rows) but will continue to grow indefinitely. My server's memory is limited and I don't like the idea of the app's memory consumption being indeterminate.
If I use statement.setFetchSize(n) prior to running the query, it's unclear to me what is happening here. For example, if I use something like:
PreparedStatement query = db.prepareStatement("SELECT x, y FROM z");
query.setFetchSize(n);
ResultSet result = query.executeQuery();
while ( result.next() ){
...
}
Is this how to appropriately control potentially large select queries? What's happening here? If n is 1000, then will MySQL only pull 1000 rows into memory at a time (knowing where it left off) and then grab the next 1000 (or however many) rows each time it needs to?
Edit:
It's clear to me now that setting the fetch size is not useful for me. Remember that my application and MySQL server are both running on the same machine. If MySQL is pulling in the entire query result into memory, then that affects the app too since they both share the same physical memory.
The MySQL Connector/J driver will fetch all rows, unless the fetch size is set to Integer.MIN_VALUE (in which case it will fetch one row at a time AFAIK). See the MySQL Connector/J JDBC API Implementation Notes under ResultSet.
If you expect memory usage to become a problem (or when it actually becomes a problem), you could also implement paging using the LIMIT clause (instead of using setFetchSize(Integer.MIN_VALUE)).
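A rough sketch of the LIMIT-based paging idea follows. The paging loop is kept separate from the query so it can be tried without a database. The class and interface names are made up, and the id column used for ordering is an assumption: the table z in the question only shows x and y, but some stable ordering is needed for pages not to overlap.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class LimitPaging {

    /** Loads one page of rows starting at the given offset. */
    interface PageFetcher<T> {
        List<T> fetch(long offset, int limit) throws SQLException;
    }

    /** Hands every row to the handler, page by page; a short page signals the end. */
    static <T> long forEachRow(PageFetcher<T> fetcher, int pageSize, Consumer<T> handler)
            throws SQLException {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        long offset = 0;
        while (true) {
            List<T> page = fetcher.fetch(offset, pageSize);
            page.forEach(handler);
            offset += page.size();
            if (page.size() < pageSize) {
                return offset; // total number of rows processed
            }
        }
    }

    /** MySQL-style page query; ordering by "id" is an assumption about table z. */
    static PageFetcher<Object[]> mysqlFetcher(Connection con) {
        return (offset, limit) -> {
            List<Object[]> rows = new ArrayList<>();
            try (PreparedStatement ps = con.prepareStatement(
                    "SELECT x, y FROM z ORDER BY id LIMIT ? OFFSET ?")) {
                ps.setInt(1, limit);
                ps.setLong(2, offset);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        rows.add(new Object[] {rs.getObject(1), rs.getObject(2)});
                    }
                }
            }
            return rows;
        };
    }
}
```

Usage would be along the lines of LimitPaging.forEachRow(LimitPaging.mysqlFetcher(con), 1000, row -> process(row)), where process is whatever the daily job does per row. Only one page is held in memory at a time, at the cost of one query per page.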
I'm trying to create a Java program to clean up and merge rows in my table. The table is large, about 500k rows, and my current solution is running very slowly. The first thing I want to do is simply get an in-memory array of objects representing all the rows of my table. Here is what I'm doing:
pick an increment of say 1000 rows at a time
use JDBC to fetch a resultset on the following SQL query
SELECT * FROM TABLE WHERE ID > 0 AND ID < 1000
add the resulting data to an in-memory array
continue querying all the way up to 500,000 in increments of 1000, each time adding results.
This is taking way too long. In fact it's not even getting past the second increment from 1000 to 2000. The query takes forever to finish (although when I run the same thing directly through a MySQL browser it's decently fast). It's been a while since I've used JDBC directly. Is there a faster alternative?
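One detail in the steps above: the query uses strict bounds on both sides (ID > 0 AND ID < 1000). If later increments follow the same pattern, for example ID > 1000 AND ID < 2000 (an assumption, since only the first query is shown), boundary ids such as 1000 would never be selected. This would not explain the slowness, but half-open ranges avoid the gap. A small sketch, with hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

class IdRanges {

    /** Half-open ranges [from, to) that together cover ids 1..maxId with no gaps. */
    static List<long[]> chunks(long maxId, long chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<long[]> ranges = new ArrayList<>();
        for (long from = 1; from <= maxId; from += chunkSize) {
            long to = Math.min(from + chunkSize, maxId + 1); // exclusive upper bound
            ranges.add(new long[] {from, to});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Each pair would be bound to: SELECT * FROM TABLE WHERE ID >= ? AND ID < ?
        for (long[] r : chunks(3_500, 1_000)) {
            System.out.println(r[0] + " <= ID < " + r[1]);
        }
    }
}
```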
First of all, are you sure you need the whole table in memory? Maybe you should consider (if possible) selecting rows that you want to update/merge/etc. If you really have to have the whole table you could consider using a scrollable ResultSet. You can create it like this.
// make sure autocommit is off (postgres)
con.setAutoCommit(false);
Statement stmt = con.createStatement(
        ResultSet.TYPE_SCROLL_INSENSITIVE, // or ResultSet.TYPE_FORWARD_ONLY
        ResultSet.CONCUR_READ_ONLY);
ResultSet srs = stmt.executeQuery("select * from ...");
It enables you to move to any row you want by using 'absolute' and 'relative' methods.
One thing that helped me was Statement.setFetchSize(Integer.MIN_VALUE). I got this idea from Jason's blog. This cut down execution time by more than half. Memory consumed went down dramatically (as only one row is read at a time.)
This trick doesn't work for PreparedStatement, though.
Although it's probably not optimal, your solution seems like it ought to be fine for a one-off database cleanup routine. It shouldn't take that long to run a query like that and get the results (I'm assuming that since it's a one-off, a couple of seconds would be fine). Possible problems -
is your network (or at least your connection to mysql ) very slow? You could try running the process locally on the mysql box if so, or something better connected.
is there something in the table structure that's causing it? pulling down 10k of data for every row? 200 fields? calculating the id values to get based on a non-indexed column? You could try finding a more db-friendly way of pulling the data (e.g. just the columns you need, have the db aggregate values, etc.)
If you're not getting through the second increment something is really wrong - efficient or not, you shouldn't have any problem dumping 2000, or 20,000 rows into memory on a running JVM. Maybe you're storing the data redundantly or extremely inefficiently?