I'm using this code to load data from a database:
PreparedStatement inputStmt = connection.prepareStatement("select * from A");
inputStmt.setFetchSize(3000);
ResultSet rs = inputStmt.executeQuery();
Since I'm using setFetchSize, I know that the query will fetch only 3000 rows at a time, and when needed it will fetch the next 3000 rows...
My question is: when we fetch the second batch of 3000 rows, do the first 3000 remain in the cache?
I'm asking because I'm reading a table with millions of rows, and if I don't manage my cache well I will have a memory issue.
The default ResultSet type is ResultSet.TYPE_FORWARD_ONLY, meaning that you can only go forward, one row at a time. Unless the driver is very poorly implemented, it won't keep rows in memory once you have moved past them.
The default fetch size differs between drivers. Some load the full result set by default (for example PostgreSQL), others use a smaller fetch size (e.g. Oracle, which defaults to 10 rows), which can even be inefficient for some kinds of tasks.
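Put together, a minimal sketch of the pattern this answer describes. The connection is assumed to already exist; only the query on table A comes from the question:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class StreamingRead {
    // Streams through table A (from the question) without holding more than
    // one fetch block (~3000 rows) in driver memory at a time.
    static void readAll(Connection connection) throws SQLException {
        try (PreparedStatement inputStmt = connection.prepareStatement("select * from A")) {
            inputStmt.setFetchSize(3000); // rows transferred per database round trip
            try (ResultSet rs = inputStmt.executeQuery()) {
                while (rs.next()) {
                    // rows already passed over are eligible for garbage
                    // collection in a TYPE_FORWARD_ONLY result set
                    // ... process the current row
                }
            }
        }
    }
}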
I have a Java application using JDBC that runs once per day on my server and interacts with a MySQL database (v5.5) running on the same server. The app queries for and iterates through all rows in a table.
The table is reasonably small at the moment (~5000 rows) but will continue to grow indefinitely. My server's memory is limited and I don't like the idea of the app's memory consumption being indeterminate.
If I use statement.setFetchSize(n) prior to running the query, it's unclear to me what is actually happening. For example, if I use something like:
PreparedStatement query = db.prepareStatement("SELECT x, y FROM z");
query.setFetchSize(n);
ResultSet result = query.executeQuery();
while (result.next()) {
    ...
}
Is this how to appropriately control potentially large select queries? What's happening here? If n is 1000, will MySQL only pull 1000 rows into memory at a time (remembering where it left off) and then grab the next 1000 (or however many) rows each time it needs them?
Edit:
It's clear to me now that setting the fetch size is not useful for me. Remember that my application and the MySQL server run on the same machine: if MySQL pulls the entire query result into memory, that affects the app too, since both share the same physical memory.
The MySQL Connector/J driver will fetch all rows, unless the fetch size is set to Integer.MIN_VALUE (in which case it will fetch one row at a time AFAIK). See the MySQL Connector/J JDBC API Implementation Notes under ResultSet.
If you expect memory usage to become a problem (or when it actually becomes a problem), you could also implement paging using the LIMIT clause (instead of using setFetchSize(Integer.MIN_VALUE)).
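If you go the LIMIT route, a rough sketch of what the paging loop might look like. The column x and the assumption that it gives a stable ordering are illustrative; the table z and its columns come from the question:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class TablePager {
    // Walks table z in fixed-size pages; assumes x provides a stable
    // ordering so consecutive pages don't overlap or skip rows.
    static void pageThrough(Connection db, int pageSize) throws SQLException {
        int offset = 0;
        while (true) {
            int rowsInPage = 0;
            try (PreparedStatement query = db.prepareStatement(
                    "SELECT x, y FROM z ORDER BY x LIMIT ? OFFSET ?")) {
                query.setInt(1, pageSize);
                query.setInt(2, offset);
                try (ResultSet result = query.executeQuery()) {
                    while (result.next()) {
                        rowsInPage++;
                        // ... process the row
                    }
                }
            }
            if (rowsInPage < pageSize) {
                break; // a short page means we've reached the end
            }
            offset += pageSize;
        }
    }
}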
So far I have been using something like this to query my database, and it was working perfectly fine:
PreparedStatement prepStmt = dbCon.prepareStatement(mySql);
ResultSet rs = prepStmt.executeQuery();
But then I needed to use rs.first(); in order to be able to iterate over my ResultSet multiple times, so I now use:
PreparedStatement prepStmt = dbCon.prepareStatement(mySql,ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_UPDATABLE);
My question is about the performance of the two. What do I lose if I use the second option? Will using the second option have any negative effect on the code that I have written so far?
PS: Note that my application is a multi-user, database-intensive web application (on WebLogic 10.3.4) that uses a back-end Oracle 11g database.
Thanks all for your attention.
UPDATE
My maximum result set size will be less than 1000 rows and 15-20 columns
If you're using scrollability (your second option), pay attention to this:
Important: Because all rows of any scrollable result set are stored in
the client-side cache, a situation where the result set contains many
rows, many columns, or very large columns might cause the client-side
Java Virtual Machine (JVM) to fail. Do not specify scrollability for a
large result set.
Source: Oracle Database JDBC Developer's Guide and Reference
Since an Oracle cursor is a forward-only structure, in order to simulate a scrollable cursor, the JDBC driver would generally need to cache the results in memory if it wants to be able to ensure that the same results are returned when you iterate through the results a second time. Depending on the number and size of the results returned from the query, that can involve a substantial amount of additional memory being consumed on the application server. On the other hand, that should mean that iterating through the ResultSet a second time should be much more efficient than the first time.
Whether the extra memory required is meaningful depends on your application. You say that the largest ResultSet will have 1000 rows. If you figure that each row is 500 bytes (this will obviously depend on the data types: a ResultSet of nothing but numbers would be much smaller, one full of long description strings much larger), 1000 rows is 500 KB per user. With 1000 simultaneous users, that's only 500 MB of storage, which probably isn't prohibitive. With 1 million simultaneous users, on the other hand, that's 500 GB, which probably means you're buying a few new servers. If your rows are 5000 bytes rather than 500, then you're talking about 5 GB of RAM, which could be a large fraction of the memory required on the application server to run your application.
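For illustration, here is a minimal sketch of the two-pass iteration that scrollability buys you. dbCon and mySql are the names from the question; note that CONCUR_READ_ONLY, rather than the question's CONCUR_UPDATABLE, is sufficient if you only need to re-read rows:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class TwoPassRead {
    // Iterates the same ResultSet twice; beforeFirst() is only legal on a
    // scrollable ResultSet, which is what forces the client-side caching
    // described above.
    static void iterateTwice(Connection dbCon, String mySql) throws SQLException {
        try (PreparedStatement prepStmt = dbCon.prepareStatement(
                mySql, ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
             ResultSet rs = prepStmt.executeQuery()) {
            while (rs.next()) {
                // first pass: rows are fetched from Oracle and cached client-side
            }
            rs.beforeFirst(); // rewind to before the first row
            while (rs.next()) {
                // second pass: rows are served from the client-side cache
            }
        }
    }
}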
I would like to display 100000 records in the browser across multiple pages with minimal impact on memory, i.e. 100 records per page.
I would like to be able to move back and forth between pages. My doubts are:
1. Can I keep all the records in memory? Is this a good idea?
2. Can I make a database connection/query for every page? If so, how do I write such a query?
Could anyone please help me?
It's generally not a good idea to keep that many records in memory. If the application is accessed by several users at the same time, the memory impact will be huge.
I don't know which DBMS you are using, but in MySQL and several others you can rely on the database for pagination with a query such as:
SELECT * FROM MyTable
LIMIT 0, 100
The first number after LIMIT is the offset (how many records will be skipped) and the second is the number of records to fetch.
Bear in mind that this SQL does not have the same syntax on every database (some don't even support it).
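From Java you would bind the offset and page size as parameters. A rough sketch using MyTable from the query above; everything else is illustrative:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class PageQuery {
    // Loads one page of 100 records; pageNumber is zero-based. Uses the
    // MySQL "LIMIT offset, count" form shown above. The caller is
    // responsible for closing the returned ResultSet (and its statement).
    static ResultSet loadPage(Connection con, int pageNumber) throws SQLException {
        PreparedStatement ps = con.prepareStatement("SELECT * FROM MyTable LIMIT ?, ?");
        ps.setInt(1, pageNumber * 100);
        ps.setInt(2, 100);
        return ps.executeQuery();
    }
}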
I would not hold the data in memory (either in the browser or in the serving application). Instead I'd page through the results using SQL.
How you do this can be database-specific. See here for one example in MySQL; mechanisms will exist for other databases.
1) No, keeping all the records in memory rather defeats the point of having a database. Look into a scrollable result set; that way you can get the functionality you want without having to play with the SQL. You can also adjust how many records are fetched at a time so that you don't load more records than you need.
2) DB connections are expensive to create and destroy, but any serious system will pool its connections, so the impact on performance won't be that great.
If you want to get a bit more fancy you can do away with pages altogether and just load more records as the user scrolls through the list.
It would not be a good idea, as you would be making the browser hold all of that data.
When I had something like this to do, I used JavaScript to render the page and made Ajax calls to get the next page. There is a slight delay in displaying the next table while you fetch it, but users are used to that.
If you are showing 100 records per page, use JSON to pass the data from the server, since JavaScript can parse it quickly, and then use innerHTML to insert the HTML, as the DOM is much slower at rendering tables.
As mentioned by others here, it is not a good idea to store a large list of results in memory; querying for each page of results is certainly a much better approach. To do that you have two options. One is to use whatever database-specific features your DBMS provides for targeting a specific subsection of a query's results. The other is to use the generic methods provided by JDBC to achieve the same effect, which keeps your code from being tied to a specific database:
// get a ResultSet from some query; note that absolute() below requires a
// scrollable ResultSet (e.g. created with ResultSet.TYPE_SCROLL_INSENSITIVE)
ResultSet results = ...
if (count > 0) {
    results.setFetchSize(count + 1);
    results.setFetchDirection(ResultSet.FETCH_FORWARD);
    // jump directly to the first row of the requested page
    results.absolute(count * beginIndex);
}
for (int rowNumber = 0; results.next(); ++rowNumber) {
    if (count > 0 && rowNumber >= count) {
        break; // a full page has been processed
    }
    // process the ResultSet below
    ...
}
Using a library like Spring JDBC or Hibernate can make this even easier.
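For example, with Spring's JdbcTemplate the same page query collapses to a few lines. This is a sketch; the table, column, and mapping are invented for illustration:
import java.util.List;
import javax.sql.DataSource;
import org.springframework.jdbc.core.JdbcTemplate;

class SpringPageDao {
    private final JdbcTemplate jdbc;

    SpringPageDao(DataSource dataSource) {
        this.jdbc = new JdbcTemplate(dataSource);
    }

    // Returns one page of names; offset and pageSize map onto a MySQL LIMIT clause.
    List<String> findPage(int offset, int pageSize) {
        return jdbc.query(
                "SELECT name FROM items LIMIT ?, ?",
                (rs, rowNum) -> rs.getString("name"),
                offset, pageSize);
    }
}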
Many SQL dialects have a notion of LIMIT (MySQL, ...) or OFFSET (MSSQL).
You can use this kind of clause to limit the number of rows per page.
It depends on the data. 100k ints might not be too bad if you are caching them.
T-SQL has SET ROWCOUNT 100 to limit the number of records returned.
But to do it right and return the total number of pages, you need a more advanced paging sproc.
It's a pretty hotly debated topic and there are many ways to do it.
Here's a sample of an old sproc I wrote:
CREATE PROCEDURE Objects_GetPaged
(
    @sort VARCHAR(255),
    @Page INT,
    @RecsPerPage INT,
    @Total INT OUTPUT
)
AS
SET NOCOUNT ON

-- Create a temporary table
CREATE TABLE #TempItems
(
    id INT IDENTITY,
    memberid INT
)

INSERT INTO #TempItems (memberid)
SELECT Objects.id
FROM Objects
ORDER BY CASE @sort WHEN 'Alphabetical' THEN Objects.UserName ELSE NULL END ASC,
         CASE @sort WHEN 'Created' THEN Objects.Created ELSE NULL END DESC,
         CASE @sort WHEN 'LastLogin' THEN Objects.LastLogin ELSE NULL END DESC

SELECT @Total = COUNT(*) FROM #TempItems

-- Find out the first and last record we want
DECLARE @FirstRec INT, @LastRec INT
SELECT @FirstRec = (@Page - 1) * @RecsPerPage
SELECT @LastRec = (@Page * @RecsPerPage + 1)

SELECT *
FROM #TempItems
INNER JOIN Objects ON (Objects.id = #TempItems.memberid)
WHERE #TempItems.id > @FirstRec AND #TempItems.id < @LastRec
ORDER BY #TempItems.id
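Calling such a procedure from JDBC would look roughly like this. The procedure and parameter names come from the sproc above; the connection and the chosen page values are a sketch:
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;

class PagedCaller {
    // Fetches page 2 at 100 records per page and reads the @Total OUT parameter.
    static void fetchPage(Connection con) throws SQLException {
        try (CallableStatement cs = con.prepareCall("{call Objects_GetPaged(?, ?, ?, ?)}")) {
            cs.setString(1, "Created");   // @sort
            cs.setInt(2, 2);              // @Page
            cs.setInt(3, 100);            // @RecsPerPage
            cs.registerOutParameter(4, java.sql.Types.INTEGER); // @Total
            try (ResultSet rs = cs.executeQuery()) {
                while (rs.next()) {
                    // ... render the row
                }
            }
            // read the OUT parameter after consuming the result set
            int totalRecords = cs.getInt(4); // use this to compute the page count
        }
    }
}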
I would recommend using a CachedRowSet.
A CachedRowSet object is a container for rows of data that caches its rows in memory, which makes it possible to operate without always being connected to its data source.
A CachedRowSet object is a disconnected rowset, which means that it makes use of a connection to its data source only briefly. It connects to its data source while it is reading data to populate itself with rows and again while it is propagating changes back to its underlying data source.
Because a CachedRowSet object stores data in memory, the amount of data that it can contain at any one time is determined by the amount of memory available. To get around this limitation, a CachedRowSet object can retrieve data from a ResultSet object in chunks of data, called pages. To take advantage of this mechanism, an application sets the number of rows to be included in a page using the method setPageSize. In other words, if the page size is set to five, a chunk of five rows of data will be fetched from the data source at one time. An application can also optionally set the maximum number of rows that may be fetched at one time. If the maximum number of rows is set to zero, or no maximum number of rows is set, there is no limit to the number of rows that may be fetched at a time.
After properties have been set, the CachedRowSet object must be populated with data using either the method populate or the method execute. The following lines of code demonstrate using the method populate. Note that this version of the method takes two parameters, a ResultSet handle and the row in the ResultSet object from which to start retrieving rows.
CachedRowSet crs = new CachedRowSetImpl();
crs.setMaxRows(20);
crs.setPageSize(4);
crs.populate(rsHandle, 10);
When this code runs, crs will be populated with four rows from rsHandle starting with the tenth row.
Along the same lines, you could build a strategy to paginate your data on the JSP, and so on.
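Building on that, a sketch of a page-by-page loop. It uses the javax.sql.rowset.RowSetProvider factory available since Java 7 rather than instantiating CachedRowSetImpl directly; the 100-row page size is arbitrary:
import java.sql.ResultSet;
import javax.sql.rowset.CachedRowSet;
import javax.sql.rowset.RowSetProvider;

class RowSetPaging {
    static void pageThrough(ResultSet rsHandle) throws Exception {
        CachedRowSet crs = RowSetProvider.newFactory().createCachedRowSet();
        crs.setPageSize(100);       // hold 100 rows in memory at a time
        crs.populate(rsHandle, 1);  // fill the first page, starting at row 1
        do {
            while (crs.next()) {
                // ... render the current page's rows
            }
        } while (crs.nextPage());   // fetch the next chunk from rsHandle
    }
}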
What is the difference between setting statement fetch size in JDBC or firing a SQL query with LIMIT clause?
The SQL LIMIT will limit your SQL query results to those that fall within a specified range. You can use it to show the first X number of results, or to show a range from X - Y results.
The fetch size is the number of rows physically retrieved from the database at one time by the JDBC driver as you scroll through a query ResultSet with next(). For example, you set the query fetch size to 100. When you retrieve the first row, the JDBC driver retrieves the first 100 rows (or all of them if fewer than 100 rows satisfy the query). When you retrieve the second row, the JDBC driver merely returns the row from local memory - it doesn't have to retrieve that row from the database. This feature improves performance by reducing the number of calls (which are frequently network transmissions) to the database.
So, even if setting the fetch size were translated by JDBC into a SQL LIMIT clause, the big difference from forcing a SQL query with LIMIT is that with JDBC you're still able to browse all the results.
SQL LIMIT applies first, at the moment the SQL server builds the result set as its answer to the query. It can have (and usually has) an influence on the query plan used, and consequently on the SQL server's response time. The result set contains only the rows that fall within the limit; you cannot fetch more.
Fetch size comes afterwards, when the content of the result set is transferred to the client. For details see Pascal's answer.
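To make the contrast concrete, a side-by-side sketch; the table t and the batch size of 100 are invented for illustration:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class LimitVsFetchSize {
    // LIMIT: the server builds a result set of at most 100 rows;
    // rows beyond the limit simply never reach the client.
    static ResultSet firstHundred(Connection con) throws SQLException {
        return con.prepareStatement("SELECT * FROM t LIMIT 100").executeQuery();
    }

    // Fetch size: the result set logically contains ALL rows of t;
    // the driver just transfers them 100 at a time as next() advances.
    static ResultSet allRowsInBatches(Connection con) throws SQLException {
        PreparedStatement ps = con.prepareStatement("SELECT * FROM t");
        ps.setFetchSize(100);
        return ps.executeQuery();
    }
}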
I have a .jsp page where I have a GUI table that displays records from an Oracle database. This table allows typical pagination behaviour, such as "FIRST", "NEXT", "PREVIOUS" and "LAST". The records are obtained from a Java ResultSet object that is returned from executing a SQL statement.
This ResultSet might be very big, so my question is:
If I have a ResultSet containing one million records but my table only displays the data from the first ten records in the ResultSet, is the data only fetched when I start requesting record data or does all of the data get loaded into memory entirely once the ResultSet is returned from executing a SQL statement?
The Java ResultSet is a pointer (or cursor) to the results in the database. The ResultSet loads records in blocks from the database. So to answer your question: the data is only fetched when you request it, but in blocks.
If you need to control how many rows are fetched at once by the driver, you can use the setFetchSize(int rows) method on the ResultSet. This will allow you to control how big the blocks it retrieves at once.
The JDBC spec does not specify whether the data is streamed or if it is loaded into memory. Oracle streams by default. MySQL does not. To get MySQL to stream the resultset, you need to set the following on the Statement:
pstmt = conn.prepareStatement(
sql,
ResultSet.TYPE_FORWARD_ONLY,
ResultSet.CONCUR_READ_ONLY);
pstmt.setFetchSize(Integer.MIN_VALUE);
The best idea is to use a subquery and display 100 or 1000 rows at a time/per page, and to manage connections with connection pooling.
To build the subquery you can use ROWNUM in Oracle and LIMIT in MySQL.
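A sketch of the classic ROWNUM wrapper for Oracle; the table and column names are illustrative:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class OraclePager {
    // Returns rows (first, last] of my_table ordered by id. The inner ORDER BY
    // must be applied before ROWNUM for the numbering to be deterministic.
    static ResultSet page(Connection con, int first, int last) throws SQLException {
        String sql =
              "SELECT * FROM ("
            + "  SELECT t.*, ROWNUM rn FROM ("
            + "    SELECT * FROM my_table ORDER BY id"
            + "  ) t WHERE ROWNUM <= ?"
            + ") WHERE rn > ?";
        PreparedStatement ps = con.prepareStatement(sql);
        ps.setInt(1, last);
        ps.setInt(2, first);
        return ps.executeQuery(); // caller closes the ResultSet and its statement
    }
}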
While the JDBC spec does not specify whether or not all the data in the result set gets fetched up front, any well-written driver won't do that.
That said, a scrollable result set might be more what you have in mind:
(link redacted, it pointed to a spyware page)
You may also consider a disconnected row set, that's stored in the session (depending on how scalable your site needs to be):
http://java.sun.com/j2se/1.4.2/docs/api/javax/sql/RowSet.html
Let's say we have a table that contains 500 records:
PreparedStatement stm = con.prepareStatement("select * from table");
stm.setFetchSize(100); // 100 records at a time will be loaded from the database into memory,
                       // so for our 500 records, 5 server round trips will occur
ResultSet rs = stm.executeQuery();
rs.setFetchSize(50);   // overrides the fetch size provided on the statement;
                       // the next trip to the database will fetch records using the new fetch size