I have a Java application using JDBC that runs once per day on my server and interacts with a MySQL database (v5.5) also running on the same server. The app is querying for and iterating through all rows in a table.
The table is reasonably small at the moment (~5000 rows) but will continue to grow indefinitely. My server's memory is limited, and I don't like the idea of the app's memory consumption being indeterminate.
If I use statement.setFetchSize(n) prior to running the query, it's unclear to me what actually happens. For example, if I use something like:
PreparedStatement query = db.prepareStatement("SELECT x, y FROM z");
query.setFetchSize(n);
ResultSet result = query.executeQuery();
while (result.next()) {
...
}
Is this how to appropriately control potentially large select queries? What's happening here? If n is 1000, then will MySQL only pull 1000 rows into memory at a time (knowing where it left off) and then grab the next 1000 (or however many) rows each time it needs to?
Edit:
It's clear to me now that setting the fetch size is not useful for me. Remember that my application and MySQL server are both running on the same machine. If MySQL is pulling in the entire query result into memory, then that affects the app too since they both share the same physical memory.
The MySQL Connector/J driver will fetch all rows, unless the fetch size is set to Integer.MIN_VALUE (in which case it will fetch one row at a time AFAIK). See the MySQL Connector/J JDBC API Implementation Notes under ResultSet.
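For instance, here is a minimal streaming sketch reusing the question's table and connection names; per those implementation notes, streaming also requires a forward-only, read-only statement:
try (PreparedStatement query = db.prepareStatement(
        "SELECT x, y FROM z",
        ResultSet.TYPE_FORWARD_ONLY,
        ResultSet.CONCUR_READ_ONLY)) {
    query.setFetchSize(Integer.MIN_VALUE); // signals Connector/J to stream
    try (ResultSet result = query.executeQuery()) {
        while (result.next()) {
            // process one row; only the current row is buffered client-side
        }
    }
}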
If you expect memory usage to become a problem (or when it actually becomes a problem), you could also implement paging using the LIMIT clause (instead of using setFetchSize(Integer.MIN_VALUE)).
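A sketch of what that paging could look like, again reusing the question's table (the page size and the ORDER BY column are my assumptions; a stable ordering is needed for LIMIT/OFFSET to be deterministic):
try (PreparedStatement stmt = db.prepareStatement(
        "SELECT x, y FROM z ORDER BY x LIMIT ? OFFSET ?")) {
    int pageSize = 1000; // assumption: tune to your memory budget
    for (int offset = 0; ; offset += pageSize) {
        stmt.setInt(1, pageSize);
        stmt.setInt(2, offset);
        int rowsInPage = 0;
        try (ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
                rowsInPage++;
                // process row
            }
        }
        if (rowsInPage < pageSize) break; // short page: we're done
    }
}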
Related
Let's say I have a small Postgres database (< 500 MB), and I have an app which is very read-intensive; 99% of requests are reads.
Is there a way to tell Postgres to load all tables into RAM so it can do selects faster?
I think Oracle and SQL Server have that kind of functionality.
I have done some tests on my local machine with a table of 500 records: the Java HashMap lookups took 2 ms, while the SQL selects took 12000 ms.
Obviously the Java HashMap is faster because it is within the same process, but is there a way to speed up SQL queries for small tables in Postgres? Thanks
for (int i = 0; i < 100_000; i++) {
    // 1) select * from someTable where id = 10
    // 2) get from Java HashMap by key
}
PostgreSQL caches data in RAM automatically. All you have to do is set shared_buffers to be at least as big as your database; then all data has to be read only once and will stay cached in RAM.
If you want to reduce the initial slowness caused by reading the data, set shared_preload_libraries = 'pg_prewarm,pg_stat_statements'.
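As a hedged sketch, once the extension exists you can also warm a table by hand from JDBC (the table name here is a placeholder, and CREATE EXTENSION needs sufficient privileges):
// Sketch: read a whole table into PostgreSQL's cache via pg_prewarm.
// "mytable" is a placeholder; java.sql.* imports assumed.
try (Statement stmt = conn.createStatement()) {
    stmt.execute("CREATE EXTENSION IF NOT EXISTS pg_prewarm");
    try (ResultSet rs = stmt.executeQuery("SELECT pg_prewarm('mytable')")) {
        if (rs.next()) {
            System.out.println("Blocks loaded into cache: " + rs.getLong(1));
        }
    }
}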
I suspect that the excessive slowness you report is not slowness of the database, but network lag or inefficient queries. pg_stat_statements will help profile your database workload.
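For example, a quick look at the most expensive statements might look like this (a sketch; note the timing column is total_exec_time on PostgreSQL 13+ and total_time on older versions):
// Sketch: list the five most time-consuming statements recorded by
// pg_stat_statements (column names assume PostgreSQL 13 or later).
try (Statement stmt = conn.createStatement();
     ResultSet rs = stmt.executeQuery(
             "SELECT query, calls, total_exec_time " +
             "FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 5")) {
    while (rs.next()) {
        System.out.printf("%8.1f ms  %6d calls  %s%n",
                rs.getDouble("total_exec_time"),
                rs.getLong("calls"),
                rs.getString("query"));
    }
}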
My scenario: I have a big query with a lot of joins and a lot of decode/case calls in the select list, and I am passing one parameter to the where condition from Java. For 150,000 rows the Java fetch is very slow, but the same query runs fast in the SQL Developer client.
I thought of creating or replacing a view which takes one parameter and calling that view from Java.
I did not find a resource explaining how to pass params to a create or replace view statement from Java.
Can anyone suggest another approach that fetches rows quickly?
Using Oracle 12c, the JDBC 7 driver, and JDK 8.
First (and easiest):
Set the JDBC fetch size to a high number in your statement. There is a setFetchSize(int) method on Statement, PreparedStatement, CallableStatement, and ResultSet objects.
This defaults to something small like 10 rows. Set that to a reasonably high number, such as 500 or more.
This is a setting that can dramatically speed up a query that pulls back hundreds of thousands of records.
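A minimal sketch, assuming a connection conn; the table, columns, and bind value are placeholders standing in for the question's query:
// Sketch: raise the fetch size so each network round trip returns many rows.
try (PreparedStatement stmt = conn.prepareStatement(
        "SELECT col1, col2 FROM big_view WHERE key_col = ?")) {
    stmt.setFetchSize(500);            // Oracle's default is only ~10 rows
    stmt.setString(1, "someValue");    // placeholder for the single parameter
    try (ResultSet rs = stmt.executeQuery()) {
        while (rs.next()) {
            // write each row to the output
        }
    }
}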
Second:
Verify that the query is indeed running fast in SQL Developer, to the last row.
You can export to a file or try wrapping the query in a PL/SQL statement that will loop through all records.
If you wish, you can use AUTOTRACE in SQL*Plus to your advantage:
SET TIMING ON
SET AUTOTRACE TRACEONLY
<your query>
This will run the query to the end, pulling all records over the network but not displaying them.
The goal here is to prove that your SQL statement is indeed returning all records as quickly as needed.
If not, then you have a standard tuning exercise. Get it running to completion quickly in SQL Developer first.
I am using a Java JDBC application to fetch about 500,000 records from the DB. The database being used is Oracle. I write the data into a file as soon as each row is fetched. Since it takes about an hour to fetch all of the data, I am trying to increase the fetch size of the result set. I have seen in multiple links that while increasing the fetch size one should be careful about memory consumption. Does increasing the fetch size actually increase the heap memory used by the JVM?
Suppose the fetch size is 10 and the query returns 100 rows in total. During the first fetch the result set contains 10 records. Once I read the first 10 records, the result set fetches the next 10. Does this mean that after the second fetch the result set will contain 20 records? Are the earlier 10 records still maintained in memory, or are they removed while fetching the newer batch?
Any help is appreciated.
It depends. Different drivers may behave differently and different ResultSet settings may behave differently.
If you have a CONCUR_READ_ONLY, FETCH_FORWARD, TYPE_FORWARD_ONLY ResultSet, the driver will almost certainly actively store in memory only the number of rows that corresponds to your fetch size (of course, data for earlier rows will remain in memory for some period of time until it is garbage collected).

If you have a TYPE_SCROLL_INSENSITIVE ResultSet, on the other hand, it is very likely that the driver stores all the data that has been fetched in memory in order to let you scroll backwards and forwards through it. That is not the only possible way to implement scrolling, so different drivers (and different versions of drivers) may behave differently, but it is the simplest approach and the way most drivers I've come across behave.
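To make the contrast concrete, here is a sketch of how the two kinds of ResultSet are requested (the connection name is assumed):
// Forward-only, read-only: the driver needs to buffer only about one
// fetch batch (here 10 rows) at a time.
Statement forwardOnly = conn.createStatement(
        ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
forwardOnly.setFetchSize(10);

// Scroll-insensitive: most drivers cache every row fetched so far, so
// memory grows with the full result size as you iterate.
Statement scrollable = conn.createStatement(
        ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);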
While increasing the fetch size may help performance a bit, I would also look into tuning the SDU size, which controls the size of the packets at the SQL*Net layer. Increasing the SDU size can speed up data transfers.
Of course, the time it takes to fetch these 500,000 rows largely depends on how much data you're fetching. If it takes an hour, I'm guessing you're fetching a lot of data and/or doing it from a remote client over a WAN.
To change the SDU size:
First, change the default SDU size on the server to 32 KB (starting in 11.2.0.3 you can use up to 64 KB, and up to 2 MB starting in 12c) by changing or adding this line in sqlnet.ora on the server:
DEFAULT_SDU_SIZE=32767
Then modify your JDBC URL:
jdbc:oracle:thin:@(DESCRIPTION=(SDU=32767)(ADDRESS=(PROTOCOL=TCP)(HOST=...)(PORT=...))(CONNECT_DATA=...))
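Assembled into a connection from Java, the descriptor might look like this (host, port, service name, and credentials are all placeholders; java.sql.* imports assumed):
// Sketch: thin-driver URL with an explicit SDU; identifiers are placeholders.
String url = "jdbc:oracle:thin:@(DESCRIPTION=(SDU=32767)"
        + "(ADDRESS=(PROTOCOL=TCP)(HOST=dbhost)(PORT=1521))"
        + "(CONNECT_DATA=(SERVICE_NAME=orcl)))";
Connection conn = DriverManager.getConnection(url, "user", "password");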
So far I have been using something like this to query my database, and it was working perfectly fine:
PreparedStatement prepStmt = dbCon.prepareStatement(mySql);
ResultSet rs = prepStmt.executeQuery();
But then I needed to use rs.first() in order to be able to iterate over my ResultSet multiple times, so now I use:
PreparedStatement prepStmt = dbCon.prepareStatement(mySql, ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_UPDATABLE);
My question is related to the performance of the two. What do I lose if I use the second option? Will using the second option have any negative effect on the code that I have written so far?
PS: Note that my application is a multi-user, database-intensive web application (on WebLogic 10.3.4) that uses a back-end Oracle 11g database.
Thanks all for your attention.
UPDATE
My maximum ResultSet size will be less than 1000 rows and 15-20 columns
If you're using scrollability (your second option), pay attention to this:
Important: Because all rows of any scrollable result set are stored in the client-side cache, a situation where the result set contains many rows, many columns, or very large columns might cause the client-side Java Virtual Machine (JVM) to fail. Do not specify scrollability for a large result set.
Source: Oracle Database JDBC Developer's Guide and Reference
Since an Oracle cursor is a forward-only structure, in order to simulate a scrollable cursor, the JDBC driver would generally need to cache the results in memory if it wants to be able to ensure that the same results are returned when you iterate through the results a second time. Depending on the number and size of the results returned from the query, that can involve a substantial amount of additional memory being consumed on the application server. On the other hand, that should mean that iterating through the ResultSet a second time should be much more efficient than the first time.
Whether the extra memory required is meaningful depends on your application. You say that the largest ResultSet will have 1000 rows. If you figure that each row is 500 bytes (this will obviously depend on data types: if your ResultSet just has a bunch of numbers it would be much smaller; if it contains a bunch of long description strings it may be much larger), 1000 rows is 500 KB per user. If you've got 1000 simultaneous users, that's only 500 MB of storage, which probably isn't prohibitive. If you've got 1 million simultaneous users, on the other hand, that's 500 GB, which probably means you're buying a few new servers. If your rows are 5000 bytes rather than 500, then you're talking about 5 GB of RAM, which could be a large fraction of the memory required on the application server to run your application.
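If the scrollable cache worries you, one alternative (my sketch, not something from the Oracle documentation) is to read a forward-only ResultSet once into a plain list and re-iterate that, which makes the memory cost explicit and under your control. This reuses dbCon and mySql from the question; java.util and java.sql imports are assumed:
// Sketch: materialize the (small) result once, then iterate it repeatedly
// without a scrollable cursor.
List<Object[]> rows = new ArrayList<>();
try (PreparedStatement stmt = dbCon.prepareStatement(mySql);
     ResultSet rs = stmt.executeQuery()) {
    int cols = rs.getMetaData().getColumnCount();
    while (rs.next()) {
        Object[] row = new Object[cols];
        for (int i = 0; i < cols; i++) {
            row[i] = rs.getObject(i + 1); // JDBC columns are 1-based
        }
        rows.add(row);
    }
}
// iterate over "rows" as many times as needed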
I am trying to write a database-independent application with JDBC. I now need a way to fetch the top N entries out of some table. I saw there is a setMaxRows method in JDBC, but I don't feel comfortable using it, because I am scared the database will compute all results and only the JDBC driver will trim the result set. If I need the top 5 results from a table with a billion rows, this will break my neck (the table has a usable index).
Writing special SQL statements for every kind of database isn't very nice, but it would let the database do clever query planning and stop fetching more results than necessary.
Can I rely on setMaxRows to tell the database not to work too much?
I guess in the worst case I can't rely on this working the way I hope. I'm mostly interested in Postgres 9.1 and Oracle 11.2, so if anyone has experience with these databases, please step forward.
will let the database do clever query planning and stop fetching more results than necessary
If you use
PostgreSQL:
SELECT * FROM tbl ORDER BY col1 LIMIT 10; -- slow without index
Or:
SELECT * FROM tbl LIMIT 10; -- fast even without index
Oracle:
SELECT *
FROM (SELECT * FROM tbl ORDER BY col1 DESC)
WHERE ROWNUM <= 10;
.. then only 10 rows will be returned. But if you sort your rows before picking the top 10, basically all qualifying rows have to be read before they can be sorted.
Matching indexes can prevent this overhead!
If you are unsure what JDBC actually sends to the database server, run a test and have the database engine log the statements it receives. In PostgreSQL you can set this in postgresql.conf:
log_statement = all
(and reload) to log all statements sent to the server. Be sure to reset that setting after the test or your log files may grow huge.
The thing which could kill you with billion(s) of rows is the (highly likely) ORDER BY clause in your query. If this order cannot be established using an index then ... it'll break your neck :)
I would not depend on the JDBC driver here. As a previous comment suggests, it's unclear what it really does (across different RDBMSs).
If you are concerned about the speed of your query, you can use a LIMIT clause as well. If you use LIMIT, you can at least be sure that it's passed on to the DB server, as in the sketch below.
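For example (a sketch; the table and ordering column are placeholders, and PostgreSQL accepts a bind parameter in LIMIT):
// Sketch: the LIMIT is part of the SQL text, so it is guaranteed to
// reach the server regardless of driver behavior.
try (PreparedStatement stmt = conn.prepareStatement(
        "SELECT * FROM tbl ORDER BY col1 LIMIT ?")) {
    stmt.setInt(1, 5); // top 5
    try (ResultSet rs = stmt.executeQuery()) {
        while (rs.next()) {
            // process row
        }
    }
}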
Edit: Sorry, I was not aware that Oracle doesn't support LIMIT.
In direct answer to your question regarding PostgreSQL 9.1: Yes, the JDBC driver will tell the server to stop generating rows beyond what you set.
As others have pointed out, depending on indexes and the plan chosen, the server might scan a very large number of rows to find the five you want. Proper server configuration can help accurately model the costs to prevent this, but if the value distribution is unusual you may need to introduce an optimization barrier (like a CTE) to coerce the planner into producing a good plan.
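To tie that back to the question, a sketch of the setMaxRows route against PostgreSQL (table and sort column are placeholders):
// Sketch: with the PostgreSQL JDBC driver, setMaxRows(5) is passed to the
// server as a row limit, so it stops generating rows beyond the five needed.
try (PreparedStatement stmt = conn.prepareStatement(
        "SELECT * FROM tbl ORDER BY col1")) {
    stmt.setMaxRows(5);
    try (ResultSet rs = stmt.executeQuery()) {
        while (rs.next()) {
            // at most 5 rows arrive here
        }
    }
}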