Updating rows while iterating through a ResultSet takes a lot of time - java

I am trying to improve a data transfer program that I wrote. I am looking for suggestions on how to make it quicker.
My program extracts data from a database (usually Oracle 11g) by filling a ResultSet and writing the result into a file. The program periodically checks the tables to see whether a special column has changed. For example, a query could look like this:
select columnA, columnB from scheme.table where changeColumn = '1'
Now comes the critical part. After extracting the data I need to update this changeColumn to '0'. Since I have just used the ResultSet for exporting the data into a file, I have to rewind it, so the code looks like this:
extractedData.beforeFirst();
while (extractedData.next()) {
    extractedData.updateString("changeColumn", "0");
    extractedData.updateRow();
}
Now if this ResultSet is bigger (let's say more than 100,000 entries), this loop can take hours. Does anyone have suggestions on how to improve the performance of this?
I have heard of setting the fetch size to a bigger value, but usually the ResultSet contains fewer than a dozen entries. Is there a way to set the fetch size dynamically?

Use a JDBC batch update. For each row that needs updating, take its primary key, add the corresponding update statement to a batch, and execute the whole batch in one go.
A good example from Mkyong shows how to do a JDBC batch update with a JDBC PreparedStatement.
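A minimal sketch of that approach, assuming the table has a numeric primary key column (called id here, which is an assumption) and that the keys were collected while exporting the rows:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Minimal sketch: reset changeColumn with one batched UPDATE keyed on a
// hypothetical primary key column "id", instead of calling updateRow() per row.
void resetChangeColumn(Connection con, List<Long> exportedIds) throws SQLException {
    String sql = "update scheme.table set changeColumn = '0' where id = ?";
    boolean oldAutoCommit = con.getAutoCommit();
    con.setAutoCommit(false);
    try (PreparedStatement ps = con.prepareStatement(sql)) {
        for (Long id : exportedIds) {
            ps.setLong(1, id);
            ps.addBatch();          // queue the update instead of sending it immediately
        }
        ps.executeBatch();          // one round trip for the whole batch
        con.commit();
    } finally {
        con.setAutoCommit(oldAutoCommit);
    }
}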

Related

Efficient way to check whether a large number of strings already exist in the database

I have a very large table in the database with a column called
"unique_code_string"; this table has almost 100,000,000 records.
Every 2 minutes I receive 100,000 code strings in an array, and they are unique among themselves. I need to insert them into the large table if they are all "good".
The meaning of "good" is this:
None of the 100,000 codes in the array already occurs in the large database table.
If one or more of the codes already occurs in the large table, the whole array is not used at all;
that is, none of the codes in the array are inserted into the large table.
Currently, I do it this way:
First, I loop over the array and check each code to see if the same code is already in the large database table.
Second, if all the codes are "new", I do the real insert.
But this way is very slow, and I must finish everything within 2 minutes.
I am thinking of other ways:
Join the 100,000 codes into a SQL "in" clause; each code is 32 characters long, and I think no database will accept an "in" clause of 32*100,000 characters.
Use a database transaction: force-insert the codes anyway and roll the transaction back if an error happens. This causes some performance issues.
Use a database temporary table: I am not good at writing SQL queries, so please give me an example if this idea could work.
Can any experts give me some advice or some solutions?
I am not a native English speaker; I hope you can see the issue I am facing.
Thank you very much.
Load the 100,000 rows into a table!
Create a unique index on the original table:
create unique index unq_bigtable_uniquecodestring on bigtable (unique_code_string);
Now, you have the tools you need. I think I would go for a transaction, something like this:
insert into bigtable ( . . . )
select . . .
from smalltable;
If any row fails (due to the unique index), then the transaction will fail and nothing is inserted. You can also be explicit:
insert into bigtable ( . . . )
select . . .
from smalltable
where not exists (select 1
                  from smalltable st join
                       bigtable bt
                       on st.unique_code_string = bt.unique_code_string
                 );
For this version, you should also have an index/unique constraint on smalltable(unique_code_string).
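A minimal JDBC sketch of the transactional version, assuming the 100,000 codes have already been loaded into smalltable and that the unique index above exists on bigtable (the method name is a placeholder):
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Minimal sketch: the insert either succeeds for all 100,000 codes or is
// rolled back entirely when the unique index rejects a duplicate.
void insertIfAllNew(Connection con) throws SQLException {
    con.setAutoCommit(false);
    try (Statement stmt = con.createStatement()) {
        stmt.executeUpdate(
            "insert into bigtable (unique_code_string) " +
            "select unique_code_string from smalltable");
        con.commit();                   // every code was new
    } catch (SQLException duplicate) {
        con.rollback();                 // at least one code already existed
        throw duplicate;
    }
}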
It's hard to find an optimal solution with so little information. Often this depends on the network latency between the application and the database server and on hardware resources.
You could load the 100,000,000 unique_code_string values from the database and use a HashSet or TreeSet to de-duplicate in memory before inserting into the database. If your database server is resource constrained or there is considerable network latency, this might be faster.
Depending on how you receive the 100,000-record delta, you could load it into the database; e.g. a CSV file can be read using an external table. If you can get the data efficiently into a temporary table and the database server is not overloaded, you can do this very efficiently with SQL or a stored procedure.
You should spend some time understanding how real-time the update has to be, e.g. how many SQL queries are reading the 100,000,000-row table and whether you can allow some of these queries to be cancelled or blocked while you update the rows. Often it's a good idea to create a shadow table:
Create a new table as a copy of the existing 100,000,000-row table.
Disable the indexes on the new table.
Load the delta rows into the new table.
Rebuild the indexes on the new table.
Delete the existing table.
Rename the new table to the existing 100,000,000-row table's name.
The approach here is database specific. It will depend on how your database defines the indexes; e.g. if you have a partitioned table it might not be necessary.
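A rough, database-specific sketch of that shadow-table swap driven from JDBC; the exact DDL (create-table-as-select, index handling, rename syntax) varies by vendor, and the object names are placeholders:
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Rough sketch only: copy the table, load the delta, rebuild the index,
// then swap the tables. Adjust the DDL to your database.
void swapInShadowTable(Connection con) throws SQLException {
    try (Statement stmt = con.createStatement()) {
        stmt.executeUpdate("create table bigtable_new as select * from bigtable");
        // ... bulk-load the 100,000 delta rows into bigtable_new here ...
        stmt.executeUpdate(
            "create unique index unq_bigtable_new_code on bigtable_new (unique_code_string)");
        stmt.executeUpdate("drop table bigtable");
        stmt.executeUpdate("alter table bigtable_new rename to bigtable");
    }
}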

Is the Java code for querying an indexed database the same as that for an un-indexed database?

I have indexed some columns in my MS Access database, and I am using Java to query the database.
Before indexing, I used this code:
ResultSet rs = statement.executeQuery("Select * from Employees where FirstName = 'Sarah'");
After indexing some columns in the database, should I make any changes to the code? Is something like this needed/possible:
statement.getIndexes();
I am asking this because my MS Access database has 300,000+ records. Fetching records was too slow because of the size. After indexing, fetching records did not speed up at all. I think I might still be accessing the unindexed version of that column.
(I am writing the code for an Android app, if that matters)
No. The SQL command tells the database what result to return; how it finds that result (use of indexes and the like) is an implementation detail of the DB. You may need to do something on the database side to get it to use the index, though. And you really ought to think about moving to a real database; Access is just not meant for large amounts of data.
It's likely that your issue is the query itself. You should never use select * on a table; always specify the columns you need. Have a look here.
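A minimal sketch of both points, selecting only the needed columns and binding the value with a PreparedStatement; every column name other than FirstName is an assumption:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Minimal sketch: name the columns explicitly and let the driver handle
// quoting of the parameter value.
void findEmployees(Connection con, String firstName) throws SQLException {
    String sql = "select EmployeeID, FirstName, LastName from Employees where FirstName = ?";
    try (PreparedStatement ps = con.prepareStatement(sql)) {
        ps.setString(1, firstName);
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                System.out.println(rs.getString("LastName"));
            }
        }
    }
}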

JDBC Performance - passing large resultset from database to Java

I have a performance-related question. I need to retrieve about 500 rows from the database in order to use Apache POI to export the results into a Microsoft Excel spreadsheet.
Up until now, for all my database queries, I have been populating a PL/SQL object in the database layer, returning that PL/SQL object to the Java code, and looping through the results.
But now that I need to return such a large result set from the DB layer, I've been asked whether I think it might be better, performance-wise, to return the 500 rows to the Java code via an XML CLOB.
This is a bit of an open question but I was hoping to get peoples opinion on this please.
thanks
As per http://docs.oracle.com/cd/E11882_01/java.112/e16548/resltset.htm#JJDBC28621:
By default, when Oracle JDBC runs a query, it retrieves a result set of 10 rows at a time from the database cursor. This is the default Oracle row fetch size value. You can change the number of rows retrieved with each trip to the database cursor by changing the row fetch size value.
The following methods are available in all Statement, PreparedStatement, CallableStatement, and ResultSet objects for setting and getting the fetch size:
void setFetchSize(int rows) throws SQLException
int getFetchSize() throws SQLException
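A minimal sketch of raising the fetch size for the 500-row export; the query, column names, and the value 500 are assumptions sized to the expected result:
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Minimal sketch: fetch the ~500 rows in one or two round trips instead of
// fifty trips of Oracle's default 10 rows.
void exportRows(Connection con) throws SQLException {
    try (Statement stmt = con.createStatement()) {
        stmt.setFetchSize(500);
        try (ResultSet rs = stmt.executeQuery("select colA, colB from export_view")) {
            while (rs.next()) {
                // hand each row to the Apache POI sheet writer here
            }
        }
    }
}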
Use a Java ResultSet. This will fetch only some rows at a time, as you need them. Here is an example of how to use it: http://docs.oracle.com/javase/tutorial/jdbc/basics/retrieving.html
Basically, every time you ask for a new row, as in rs.next(), the JDBC system decides if the data is available on the client, or needs to be fetched from the server. This way, you are not fetching all of the data at once.

Insert, Select and Update Query Slows Down the Entire Server

I have an application that handles more than 10,000,000 rows of data.
The MainTable has more than 10,000,000 rows.
I am trying to insert data into the SubTable from the MainTable as:
INSERT INTO SubTable(Value1,Value2)
SELECT Value1,Value2 FROM MainTable
GROUP BY Value1_ID;
After performing certain processing in the SubTable, I update the new values back into the MainTable as:
UPDATE MainTable inf,SubTable in
SET inf.Value1=in.Value1, inf.Value2=in.Value2
WHERE inf.Value1_ID= in.Value1_ID;
While running this query the entire server becomes very slow and it blocks all the other transactions. I am using a JDBC DriverManager connection here. How can I avoid this? How do I solve this problem?
If it's something that you have to do only once in a while, instead of updating the whole table in a single statement you can set up a small script that updates a batch of rows every few seconds or minutes. The other processes will have their queries executed freely between two updates.
For example, by updating a batch of 100,000 rows every minute, if your tables have the right indexes, the whole job would take 1~2 hours, but with a far smaller impact on performance.
The other solution would be to do the update when the activity on the server is at its lowest (maybe during the weekends?); that way you won't impact the other processes as much.
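A minimal sketch of that batched approach, assuming Value1_ID is a numeric key; the batch size, the pause between batches, and the min/max key arguments are assumptions to be tuned:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Minimal sketch: update the MainTable in key ranges so each statement is a
// short transaction and other queries can run between batches.
void updateInBatches(Connection con, long minId, long maxId)
        throws SQLException, InterruptedException {
    String sql =
        "update MainTable inf join SubTable s on inf.Value1_ID = s.Value1_ID " +
        "set inf.Value1 = s.Value1, inf.Value2 = s.Value2 " +
        "where inf.Value1_ID between ? and ?";
    long batchSize = 100_000L;
    try (PreparedStatement ps = con.prepareStatement(sql)) {
        for (long start = minId; start <= maxId; start += batchSize) {
            ps.setLong(1, start);
            ps.setLong(2, start + batchSize - 1);
            ps.executeUpdate();        // locks are released at the end of each batch
            Thread.sleep(60_000);      // give other transactions room to run
        }
    }
}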

Fastest way to iterate through large table using JDBC

I'm trying to create a java program to cleanup and merge rows in my table. The table is large, about 500k rows and my current solution is running very slowly. The first thing I want to do is simply get an in-memory array of objects representing all the rows of my table. Here is what I'm doing:
pick an increment of say 1000 rows at a time
use JDBC to fetch a resultset on the following SQL query
SELECT * FROM TABLE WHERE ID > 0 AND ID < 1000
add the resulting data to an in-memory array
continue querying all the way up to 500,000 in increments of 1000, each time adding results.
This is taking way too long. In fact, it's not even getting past the second increment from 1000 to 2000. The query takes forever to finish (although when I run the same thing directly through a MySQL browser it's decently fast). It's been a while since I've used JDBC directly. Is there a faster alternative?
First of all, are you sure you need the whole table in memory? Maybe you should consider (if possible) selecting only the rows that you want to update/merge. If you really have to have the whole table, you could consider using a scrollable ResultSet. You can create it like this:
// make sure autocommit is off (postgres)
con.setAutoCommit(false);
Statement stmt = con.createStatement(
        ResultSet.TYPE_SCROLL_INSENSITIVE, // or ResultSet.TYPE_FORWARD_ONLY
        ResultSet.CONCUR_READ_ONLY);
ResultSet srs = stmt.executeQuery("select * from ...");
It enables you to move to any row you want by using 'absolute' and 'relative' methods.
One thing that helped me was Statement.setFetchSize(Integer.MIN_VALUE). I got this idea from Jason's blog. This cut execution time down by more than half. Memory consumption also went down dramatically (as only one row is read at a time).
This trick doesn't work for PreparedStatement, though.
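A minimal sketch of that streaming setup, assuming the MySQL Connector/J driver (the query itself is a placeholder):
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Minimal sketch: with MySQL Connector/J, a forward-only, read-only statement
// with fetch size Integer.MIN_VALUE streams rows one at a time instead of
// buffering the whole result in memory.
void streamRows(Connection con) throws SQLException {
    try (Statement stmt = con.createStatement(
            ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
        stmt.setFetchSize(Integer.MIN_VALUE);
        try (ResultSet rs = stmt.executeQuery("select * from TABLE")) {
            while (rs.next()) {
                // process one row at a time
            }
        }
    }
}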
Although it's probably not optimal, your solution seems like it ought to be fine for a one-off database cleanup routine. It shouldn't take that long to run a query like that and get the results (I'm assuming that since it's a one-off, a couple of seconds would be fine). Possible problems:
Is your network (or at least your connection to MySQL) very slow? You could try running the process locally on the MySQL box if so, or on something better connected.
Is there something in the table structure that's causing it? Pulling down 10k of data for every row? 200 fields? Calculating the id values to fetch based on a non-indexed column? You could try finding a more DB-friendly way of pulling the data (e.g. just the columns you need, having the DB aggregate values, etc.).
If you're not getting through the second increment, something is really wrong. Efficient or not, you shouldn't have any problem dumping 2,000 or 20,000 rows into memory on a running JVM. Maybe you're storing the data redundantly or extremely inefficiently?
