Java: Getting data from a database without a loop

I am writing a piece of software in Java against an Oracle DB.
Normally we obtain the values from the database using a loop, like:
ResultSet rt = (ResultSet) cs.getObject(1);
while (rt.next()) {
....
}
But this seems slow when fetching thousands of rows from the database.
My question is:
On the Oracle side, I created a procedure like the one below, which iterates over the data and assigns it to a cursor.
Ex.
procedure test_pro(info out sys_refcursor) as
begin
open info for select * from user_tbl ...;
end test_pro;
In the Java code, as I mentioned before, I iterate over the ResultSet to obtain the values. But on the database side the values have already been selected, so why should I need a loop to read them?
(Another thought: the .NET framework has a data-binding concept for this. Is there any way in Java to bind to database procedures the way .NET does, without iterating?)

Depending on what you are going to do with that data, and how frequently, a ref cursor might be a good or a bad choice. Ref cursors are intended to give non-Oracle-aware programs a way to receive data, for example for reporting purposes.
In your case, stick with the loop, but don't forget to enable array fetching, because it has a tremendous effect on performance. The database passes blocks of rows to your JDBC buffer at the client, and your code fetches rows from that buffer. By the time you hit the end of the buffer, the JDBC layer requests the next chunk of rows from the database, eliminating lots of network round trips. The default already fetches 10 rows at a time; for larger sets, use a bigger number, if memory can provide the room.
See the Oracle® Database JDBC Developer's Guide and Reference.
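For example, a minimal sketch of raising the fetch size through plain JDBC (the connection string, credentials, and query are stand-ins):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ArrayFetchExample {
    public static void main(String[] args) throws SQLException {
        // connection details are placeholders
        try (Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@//localhost:1521/ORCL", "user", "password");
             Statement stmt = con.createStatement()) {
            stmt.setFetchSize(100); // fetch 100 rows per round trip instead of the default 10
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM user_tbl")) {
                while (rs.next()) {
                    // each next() reads from the client-side buffer;
                    // the driver refills the buffer a block at a time
                }
            }
        }
    }
}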

If you know for sure there will always be exactly one result, as in this case, you can even skip the check and just call resultset.next() once.
For example:
ResultSet resultset = statement.executeQuery("SELECT MAX(custID) FROM customer");
resultset.next(); // exactly one result, so this is allowed
int max = resultset.getInt(1); // use indexed retrieval, since the column has no name

Yes, you can call a stored procedure from Java:
http://www.mkyong.com/jdbc/jdbc-callablestatement-stored-procedure-out-parameter-example/
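A minimal sketch of calling the test_pro procedure from the question and reading its ref cursor (OracleTypes.CURSOR comes from the Oracle JDBC driver; the connection is assumed to be open):

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import oracle.jdbc.OracleTypes;

public class RefCursorCall {
    static void readAll(Connection con) throws SQLException {
        try (CallableStatement cs = con.prepareCall("{call test_pro(?)}")) {
            cs.registerOutParameter(1, OracleTypes.CURSOR); // the OUT ref cursor
            cs.execute();
            try (ResultSet rs = (ResultSet) cs.getObject(1)) {
                rs.setFetchSize(100); // array fetching helps here as well
                while (rs.next()) {
                    // read the user_tbl columns for one row
                }
            }
        }
    }
}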

You can't avoid looping. For performance reasons you need to adjust the prefetch size on the Statement or ResultSet object (100 is a solid starting point).
Why is it done this way? It's similar to reading streams: you never know how big the result can be, so you read it chunk by chunk, one buffer after another...

Related

Sorting functionality optimization using MySQL and Java

I am going to generate a simple CSV file report in Java using Hibernate and MySQL.
I am using Hibernate's native SQL support (because the query is too complex for HQL or a Criteria query, though that doesn't matter here) to fetch the data, and simply writing it out using a CSVWriter API (which also doesn't matter here).
So far all is well, but now the problem starts.
Requirements:
The report can contain 5000K to 15000K records with 25 fields.
It can be run in real time.
There is one report column (let's say finalValue) that I want to sort on, and it is computed like this: (sum(b.quantity*c.unit_gross_price) - COALESCE(sum(pai.value),0)).
Problem:
MySQL indexing cannot be used for the finalValue column (mentioned above), as it is a complex combination of aggregate functions. So if I execute the query with sorting (with or without a limit), it takes 40 sec; otherwise 0.075 sec.
The Solutions:
These are some solutions I can think of, but each has limitations.
Sorting using java.util.TreeSet: it will throw an OutOfMemoryError, which is expected, since the heap space will be exceeded if I load 15000K heavy objects.
Using LIMIT in the MySQL query and writing the file on each iteration: it will take far too long, as every query takes the same ~50 sec (the sort has to be redone each time), and LIMIT can't be used without the sort.
So the main problem here is to balance two constraints: memory and time.
Any ideas, suggestions?
NOTE: I have not included any code snippets here, but that does not mean the question lacks detail. Code is not required here.
I think you can use a streaming ResultSet here, as documented on this page under the ResultSet section.
Here are the main points from the documentation.
By default, ResultSets are completely retrieved and stored in memory. In most cases this is the most efficient way to operate and, due to the design of the MySQL network protocol, is easier to implement. If you are working with ResultSets that have a large number of rows or large values and cannot allocate heap space in your JVM for the memory required, you can tell the driver to stream the results back one row at a time.
To enable this functionality, create a Statement instance in the following manner:
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
                            java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
The combination of a forward-only, read-only result set, with a fetch size of Integer.MIN_VALUE serves as a signal to the driver to stream result sets row-by-row. After this, any result sets created with the statement will be retrieved row-by-row.
There are some caveats with this approach. You must read all of the rows in the result set (or close it) before you can issue any other queries on the connection, or an exception will be thrown.
The earliest the locks these statements hold can be released (whether they be MyISAM table-level locks or row-level locks in some other storage engine such as InnoDB) is when the statement completes.
If using streaming results, process them as quickly as possible if you want to maintain concurrent access to the tables referenced by the statement producing the result set.
So, with a streaming result set, write your ORDER BY query and then start writing the results into your CSV file, as in the sketch below.
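A rough sketch of that flow (the query and column names are stand-ins for the real report query):

import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class StreamingCsvReport {
    static void export(Connection conn) throws SQLException, IOException {
        try (Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            stmt.setFetchSize(Integer.MIN_VALUE); // MySQL: stream row by row
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT id, finalValue FROM report_rows ORDER BY finalValue"); // stand-in query
                 PrintWriter out = new PrintWriter(new FileWriter("report.csv"))) {
                while (rs.next()) {
                    // only the current row is held in memory; write it out immediately
                    out.println(rs.getLong("id") + "," + rs.getBigDecimal("finalValue"));
                }
            }
        }
    }
}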
This still probably doesn't solve the sorting issue, but I think if you can't pre-generate that value and put an index on it, the sorting is going to take some time.
However, there might be some server config variables that you can use to optimize the sorting performance.
From the MySQL Order-By optimization page
I think you can set the read_rnd_buffer_size value, which, according to the docs, can:
Setting the variable to a large value can improve ORDER BY performance by a lot
Another one is sort_buffer_size, for which the docs say the following:
If you see many Sort_merge_passes per second in SHOW GLOBAL STATUS output, you can consider increasing the sort_buffer_size value to speed up ORDER BY or GROUP BY operations that cannot be improved with query optimization or improved indexing.
Another variable that can probably help is innodb_buffer_pool_size, which allows InnoDB to keep as much table data in memory as possible and avoid some disk seeks.
However, all of these variables require some tuning. Some trial-and-error and probably some kind of benchmarking to get right.
There are some other suggestions on that MySQL Order-By optimization page as well.
Use a temporary table to store your select result with an index on finalValue. This will store and index your intermediate result.
CREATE TEMPORARY TABLE my_temp_table (INDEX my_index_name (finalValue))
SELECT ... -- your select
Note that complex expressions will require an alias in your SELECT to be used as a part of a CREATE TABLE SELECT. I assume that your SELECT has the alias finalValue (the column you mentioned).
Then select the temporary table ordered by the finalValue (the index will be used).
SELECT * FROM my_temp_table ORDER BY finalValue;
And finally, drop the temporary table (or reuse it if you want, but remember that temporary data is automatically deleted when the client session terminates).
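A rough JDBC sketch of this flow (the inner SELECT is a stand-in for the real report query; both statements must run on the same connection, since MySQL temporary tables are visible only to the session that created them):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class TempTableSort {
    static void run(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            // materialize and index the expensive result once
            stmt.execute(
                "CREATE TEMPORARY TABLE my_temp_table (INDEX my_index_name (finalValue)) " +
                "SELECT o.id, SUM(o.quantity * o.unit_price) AS finalValue " + // stand-in query
                "FROM orders o GROUP BY o.id");
            // the ORDER BY can now use the index
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT * FROM my_temp_table ORDER BY finalValue")) {
                while (rs.next()) {
                    // write one CSV row here
                }
            }
            stmt.execute("DROP TEMPORARY TABLE my_temp_table");
        }
    }
}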
Summary tables. (We'd need more details to be sure this is data-warehouse-type data.) Summary tables are augmented periodically with subtotals and counts. Then, when the report is needed, the data is readily available almost directly from the summary table, rather than scanning lots of raw data and computing aggregates.
My blog on Summary Tables. Let's see your schema and report query; we can discuss this in more detail.

Getting the count of results returned by a MySQL Query with JDBC in the most performance efficient way

The accepted way of getting the number of results from a JDBC result set seems to be to do resultSet.last() and then resultSet.getRow(), according to this answer. But in that answer, the author also says:
it may not be a good idea as it can mean reading the entire table
over the network and throwing away the data. Do a SELECT COUNT(*) FROM
... query instead.
I'm looking for a definite answer on this. Performance-wise, will it be better to do a separate COUNT(*) query to get the number of results, or to do resultSet.last() and resultSet.getRow(), followed by resultSet.first() again?
If the ResultSet has already fetched the results and is holding them in memory, then it would undoubtedly be better just to do last() and getRow(), as (I assume) it would just loop over the results in memory. But the answer above seems to imply that it lazily loads the results from the DB as they are requested.
Making a separate query is not a good solution (in my opinion), because the server will scan the rows twice to produce one result set.
To prevent this "double" query, MySQL has the function FOUND_ROWS(): run the query with the SQL_CALC_FOUND_ROWS modifier, and FOUND_ROWS() then returns the number of rows matching the WHERE clause. This is very useful when you use LIMIT and OFFSET in the query.
I believe that using this function is a better solution.
http://dev.mysql.com/doc/refman/5.0/en/information-functions.html#function_found-rows
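A minimal sketch of that pattern (the items table is hypothetical; FOUND_ROWS() must be read on the same connection, immediately after the page query):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class FoundRowsExample {
    static long pageAndCount(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            // SQL_CALC_FOUND_ROWS makes MySQL count every matching row,
            // even though LIMIT truncates what is actually returned
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT SQL_CALC_FOUND_ROWS id, name FROM items WHERE active = 1 LIMIT 100")) {
                while (rs.next()) {
                    // process this page of rows
                }
            }
            try (ResultSet count = stmt.executeQuery("SELECT FOUND_ROWS()")) {
                count.next();
                return count.getLong(1); // total rows matching the WHERE clause
            }
        }
    }
}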
I read some time ago that JDBC streams rows from the server, but that was not in the official documentation and I can't say how it works at present. Besides, queries can differ, with complicated sub-queries and joins. I think the server could exhaust its resources if it "streamed" rows only when the client asked for them.
In my opinion it depends on your use case. If your result set is small enough, the performance difference will be negligible. I highly doubt that the ResultSet fetches the whole result into main memory for larger result sets; here is a link that talks about this particular scenario. In such a case, calculating the size using last() and getRow() followed by first() is obviously inefficient, as it has to load a portion of the result set into memory and perform those operations (and don't forget the network transfer time) just to compute the size. On the other hand, COUNT(*) just goes in and counts the rows of your result set.
This is my understanding of which one outperforms the other. I am open to input from others.
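For reference, the separate COUNT(*) approach discussed above is a one-liner on the server side (a sketch using a hypothetical customer table; conn is an open Connection):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class CountExample {
    static int countCustomers(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM customer")) {
            rs.next(); // COUNT(*) always returns exactly one row
            return rs.getInt(1); // only the count crosses the network
        }
    }
}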

What is the fastest way to retrieve sequential data from database?

I have a lot of rows in a database that must be processed, but I can't load all the data into memory due to memory limitations.
At the moment, I am using LIMIT and OFFSET to retrieve the data in specified intervals.
I want to know if this is the fastest way, or whether there is another method to get all the data from a table. No filter will be applied; all rows will be processed.
SELECT * FROM table ORDER BY column
There's no reason to suck the entire table into RAM. Simply open a cursor and start reading. You can play games with fetch sizes and whatnot, but the DB will happily keep its place while you process your rows.
Addenda:
OK, if you're using Java, then I have a good idea what your problem is.
First, just by using Java, you're already using a cursor. That's basically what a ResultSet is in Java. Some ResultSets are more flexible than others, but 99% of them are simple, forward-only ResultSets on which you call next to get each row.
Now as to your problem.
The problem is specific to the Postgres JDBC driver. I don't know why it does this, perhaps it's the spec, perhaps it's something else, but regardless, Postgres has the curious characteristic that if your Connection has autoCommit set to true, Postgres sucks in the entire result set on either the execute method or the first next method. It's not really important where, only that if you have a gazillion rows, you get a nice OOM exception. Not helpful.
This can easily be exactly what you're seeing, and I appreciate how frustrating and confusing it can be.
Most Connections default to autoCommit = true. Instead, simply set autoCommit to false.
Connection con = ...get Connection...
con.setAutoCommit(false);
PreparedStatement ps = con.prepareStatement("SELECT * FROM table ORDER BY column");
ResultSet rs = ps.executeQuery();
while (rs.next()) {
    String col1 = rs.getString(1);
    ...and away you go here...
}
rs.close();
ps.close();
con.close();
Note the distinct lack of exception handling, left as an exercise for the reader.
If you want more control over how many rows are fetched at a time into memory, you can use:
ps.setFetchSize(numberOfRowsToFetch);
Playing around with that might improve your performance.
Make sure you have an appropriate index on the column you use in the ORDER BY if you care about sequencing at all.
Since it's clear you're using Java, based on your comments:
If you are using JDBC you will want to use:
http://download.oracle.com/javase/1.5.0/docs/api/java/sql/ResultSet.html
If you are using Hibernate it gets trickier:
http://docs.jboss.org/hibernate/core/3.3/reference/en/html/batch.html
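With Hibernate 3.x, the batch-processing chapter linked above boils down to a scrollable query. A sketch (MyEntity is a hypothetical mapped class):

import org.hibernate.ScrollMode;
import org.hibernate.ScrollableResults;
import org.hibernate.Session;

public class ScrollExample {
    static void process(Session session) {
        ScrollableResults rows = session.createQuery("from MyEntity order by id")
                .setReadOnly(true)
                .scroll(ScrollMode.FORWARD_ONLY); // cursor-style iteration
        int n = 0;
        while (rows.next()) {
            Object entity = rows.get(0); // the current MyEntity instance
            // process entity ...
            if (++n % 1000 == 0) {
                session.clear(); // detach processed entities so the session cache stays flat
            }
        }
        rows.close();
    }
}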

Fastest way to iterate through large table using JDBC

I'm trying to create a Java program to clean up and merge rows in my table. The table is large, about 500k rows, and my current solution is running very slowly. The first thing I want to do is simply get an in-memory array of objects representing all the rows of my table. Here is what I'm doing:
pick an increment of, say, 1000 rows at a time
use JDBC to fetch a result set for the following SQL query:
SELECT * FROM TABLE WHERE ID > 0 AND ID < 1000
add the resulting data to an in-memory array
continue querying all the way up to 500,000 in increments of 1000, each time adding the results.
This is taking way too long. In fact, it's not even getting past the second increment, from 1000 to 2000. The query takes forever to finish (although when I run the same thing directly through a MySQL browser it's decently fast). It's been a while since I've used JDBC directly. Is there a faster alternative?
First of all, are you sure you need the whole table in memory? Maybe you should consider (if possible) selecting only the rows that you want to update/merge/etc. If you really have to have the whole table, you could consider using a scrollable ResultSet. You can create it like this:
// make sure autocommit is off (postgres)
con.setAutoCommit(false);
Statement stmt = con.createStatement(
    ResultSet.TYPE_SCROLL_INSENSITIVE, // or ResultSet.TYPE_FORWARD_ONLY
    ResultSet.CONCUR_READ_ONLY);
ResultSet srs = stmt.executeQuery("select * from ...");
It enables you to move to any row you want by using the 'absolute' and 'relative' methods.
One thing that helped me was Statement.setFetchSize(Integer.MIN_VALUE). I got this idea from Jason's blog. It cut execution time by more than half, and memory consumption went down dramatically (since only one row is read at a time).
This trick doesn't work for PreparedStatement, though.
Although it's probably not optimal, your solution seems like it ought to be fine for a one-off database cleanup routine. It shouldn't take that long to run a query like that and get the results (I'm assuming that since it's a one-off, a couple of seconds would be fine). Possible problems:
is your network (or at least your connection to MySQL) very slow? You could try running the process locally on the MySQL box if so, or on something better connected.
is there something in the table structure that's causing it? Pulling down 10k of data for every row? 200 fields? Calculating the ID values to fetch based on a non-indexed column? You could try finding a more DB-friendly way of pulling the data (e.g. just the columns you need, having the DB aggregate values, etc.).
If you're not getting through the second increment, something is really wrong; efficient or not, you shouldn't have any problem dumping 2,000 or 20,000 rows into memory on a running JVM. Maybe you're storing the data redundantly or extremely inefficiently?

Display 100000 records on browser / multiple pages

I would like to display 100,000 records in the browser across multiple pages with minimal impact on memory, i.e. 100 records per page.
I would like to be able to move back and forth between pages. My doubts are:
1. Can I keep all the records in memory? Is this a good idea?
2. Can I make a database connection/query for every page? If so, how do I write such a query?
Could anyone please help me?
It's generally not a good idea to keep that many records in memory. If the application is accessed by several users at the same time, the memory impact will be huge.
I don't know which DBMS you are using, but in MySQL and several others you can rely on the DB for pagination with a query such as:
SELECT * FROM MyTable
LIMIT 0, 100
The first number after LIMIT is the offset (how many records to skip) and the second is the number of records to fetch.
Bear in mind that this SQL does not have the same syntax on every DB (and some don't support it at all).
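A parameterized version of that query through JDBC might look like this (page is 0-based; the ORDER BY keeps the pages stable between requests):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class PageQuery {
    static void showPage(Connection conn, int page, int pageSize) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT * FROM MyTable ORDER BY id LIMIT ?, ?")) {
            ps.setInt(1, page * pageSize); // offset: records to skip
            ps.setInt(2, pageSize);        // number of records for this page
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // render one record of the current page
                }
            }
        }
    }
}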
I would not hold the data in memory (either in the browser or in the serving application). Instead, I'd page through the results using SQL.
How you do this can be database-specific. See here for one example in MySQL. Mechanisms exist for other databases.
1) No, having all the records in memory kind of defeats the point of having a database. Look into a scrollable result set; that way you can get the functionality you want without having to play with the SQL. You can also adjust how many records are fetched at a time, so that you don't load more records than you need.
2) DB connections are expensive to create and destroy, but any serious system will pool connections, so the impact on performance won't be that great.
If you want to get a bit fancier, you can do away with pages altogether and just load more records as the user scrolls through the list.
It would not be a good idea, as you would be making the browser hold all of that data.
When I had something like this to do, I used JavaScript to render the page and made Ajax calls to get the next page. There is a slight delay in displaying the next table while you fetch it, but users are used to that.
If you are showing 100 records per page, use JSON to pass the data from the server, as JavaScript can parse it quickly, and then use innerHTML to insert the HTML, as the DOM is much slower at rendering tables.
As mentioned by others here, it is not a good idea to store a large list of results in memory. Querying for each page's results is certainly a much better approach. To do that you have two options. One is to use whatever database-specific features your DBMS provides for targeting a specific subsection of a query's results. The other approach is to use the generic methods provided by JDBC to achieve the same effect. This keeps your code from being tied to a specific database:
// get a ResultSet from some query
ResultSet results = ...
if (count > 0) {
    results.setFetchSize(count + 1);
    results.setFetchDirection(ResultSet.FETCH_FORWARD);
    results.absolute(count * beginIndex); // skip the preceding pages
}
for (int rowNumber = 0; results.next(); ++rowNumber) {
    if (count > 0 && rowNumber >= count) {
        break; // a full page has been read
    }
    // process the ResultSet below
    ...
}
Using a library like Spring JDBC or Hibernate can make this even easier.
Many SQL dialects have a notion of LIMIT (MySQL, ...) or OFFSET (MSSQL).
You can use this kind of feature to limit the number of rows per page.
It depends on the data. 100k ints might not be too bad if you are caching them.
T-SQL has SET ROWCOUNT 100 to limit the number of records returned.
But to do it right and return the total number of pages, you need a more advanced paging sproc.
It's a pretty hotly debated topic and there are many ways to do it.
Here's a sample of an old sproc I wrote:
CREATE PROCEDURE Objects_GetPaged
(
    @sort VARCHAR(255),
    @Page INT,
    @RecsPerPage INT,
    @Total INT OUTPUT
)
AS
SET NOCOUNT ON
-- Create a temporary table
CREATE TABLE #TempItems
(
    id INT IDENTITY,
    memberid int
)
INSERT INTO #TempItems (memberid)
SELECT Objects.id
FROM Objects
ORDER BY CASE @sort WHEN 'Alphabetical' THEN Objects.UserName ELSE NULL END ASC,
         CASE @sort WHEN 'Created' THEN Objects.Created ELSE NULL END DESC,
         CASE @sort WHEN 'LastLogin' THEN Objects.LastLogin ELSE NULL END DESC
SELECT @Total = COUNT(*) FROM #TempItems
-- Find out the first and last record we want
DECLARE @FirstRec int, @LastRec int
SELECT @FirstRec = (@Page - 1) * @RecsPerPage
SELECT @LastRec = (@Page * @RecsPerPage + 1)
SELECT *
FROM #TempItems
INNER JOIN Objects ON (Objects.id = #TempItems.memberid)
WHERE #TempItems.id > @FirstRec AND #TempItems.id < @LastRec
ORDER BY #TempItems.id
I would recommend using a CachedRowSet.
A CachedRowSet object is a container for rows of data that caches its rows in memory, which makes it possible to operate without always being connected to its data source.
A CachedRowSet object is a disconnected rowset, which means that it makes use of a connection to its data source only briefly. It connects to its data source while it is reading data to populate itself with rows and again while it is propagating changes back to its underlying data source.
Because a CachedRowSet object stores data in memory, the amount of data that it can contain at any one time is determined by the amount of memory available. To get around this limitation, a CachedRowSet object can retrieve data from a ResultSet object in chunks of data, called pages. To take advantage of this mechanism, an application sets the number of rows to be included in a page using the method setPageSize. In other words, if the page size is set to five, a chunk of five rows of data will be fetched from the data source at one time. An application can also optionally set the maximum number of rows that may be fetched at one time. If the maximum number of rows is set to zero, or no maximum number of rows is set, there is no limit to the number of rows that may be fetched at a time.
After properties have been set, the CachedRowSet object must be populated with data using either the method populate or the method execute. The following lines of code demonstrate using the method populate. Note that this version of the method takes two parameters, a ResultSet handle and the row in the ResultSet object from which to start retrieving rows.
CachedRowSet crs = new CachedRowSetImpl();
crs.setMaxRows(20);
crs.setPageSize(4);
crs.populate(rsHandle, 10);
When this code runs, crs will be populated with four rows from rsHandle, starting with the tenth row.
Along similar lines, you could build a strategy to paginate your data on the JSP, and so on and so forth.
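A sketch of page-by-page iteration using the standard RowSet factory (available since Java 7; resultSet is assumed to be an open ResultSet):

import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.rowset.CachedRowSet;
import javax.sql.rowset.RowSetProvider;

public class PagedRowSet {
    static void renderAll(ResultSet resultSet) throws SQLException {
        CachedRowSet crs = RowSetProvider.newFactory().createCachedRowSet();
        crs.setPageSize(100);       // hold 100 rows in memory at a time
        crs.populate(resultSet, 1); // fill the first page, starting at row 1
        do {
            while (crs.next()) {
                // render one row of the current page
            }
        } while (crs.nextPage());   // pull the next 100-row chunk, if any
    }
}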
