How to optimize the processing speed for inserting data using Java?

I have a requirement to read an Excel file with its headers and data, create a table in a MySQL database based on the headers, and insert the values extracted from the file. For that I am using JDBC to create the table and insert the data (with a prepared statement).
It works nicely, but when the number of records increases (suppose the file contains 200,000 or more records) it becomes slow. Please guide me on how I can optimize the processing speed of inserting data into a DB table.
Thanks, Sameek

To optimize it you should first use the same PreparedStatement object in all of the inserts.
To further optimize the code you can send batches of updates.
e.g. batches of 5:
// create the table, then prepare the insert statement once
PreparedStatement pstmt = conn.prepareStatement(sql);
for (int i = 0; i < rows.length; ++i) {
    if (i != 0 && i % 5 == 0) {
        pstmt.executeBatch();   // send the accumulated batch to the database
    }
    pstmt.setString(1, rows[i].getName());
    pstmt.setLong(2, rows[i].getId());
    pstmt.addBatch();
}
pstmt.executeBatch();           // flush the remaining rows

Wrap your inserts in a transaction. Pseudo code:
1) Begin transaction
2) Create prepared statement
3) Loop for all inserts, setting prepared statement parameters and executing for each insert
4) Commit transaction
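A minimal JDBC sketch of that pseudocode, assuming an already-open Connection conn, a hypothetical users(name, id) table, and a rows array exposing getName()/getId():

conn.setAutoCommit(false);                                   // 1) begin transaction
String sql = "INSERT INTO users (name, id) VALUES (?, ?)";   // placeholder table/columns
PreparedStatement pstmt = conn.prepareStatement(sql);        // 2) create prepared statement
try {
    for (int i = 0; i < rows.length; i++) {                  // 3) loop over all inserts
        pstmt.setString(1, rows[i].getName());
        pstmt.setLong(2, rows[i].getId());
        pstmt.executeUpdate();
    }
    conn.commit();                                           // 4) commit transaction
} catch (SQLException e) {
    conn.rollback();                                         // undo everything on failure
    throw e;
} finally {
    pstmt.close();
}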

I'll take the example of Hibernate. Hibernate has a concept called the Session, which buffers the SQL commands that have not yet been sent to the DB.
With Hibernate you can do the inserts and flush the session every 100 inserts, which means the SQL queries are sent every 100 inserts. This helps performance because it communicates with the database once per 100 inserts instead of once per insert.
You can do the same thing in plain JDBC by executing your updates in groups of 100 (or whatever size you want), for example with PreparedStatement batching.
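A minimal Hibernate sketch of that pattern, assuming an open Session session, a list called records, and a mapped entity class (all placeholder names):

Transaction tx = session.beginTransaction();
for (int i = 0; i < records.size(); i++) {
    session.save(records.get(i));          // queued in the session, not yet sent to the DB
    if (i > 0 && i % 100 == 0) {
        session.flush();                   // send the buffered inserts to the database
        session.clear();                   // detach the saved entities to keep memory flat
    }
}
tx.commit();                               // flushes whatever is left and commits
session.close();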

Related

Inserting 1 million tuples in DB taking a long time

I have a scenario where I need to insert 1 million entries using SQL, and it is taking a long time.
My database structure has 9 tables, and in every operation I need to perform the tasks below:
1. Open a connection
2. Insert tuples into the 9 tables, maintaining the primary/foreign key relationships
3. Commit
4. Close the connection
Repeat the operation 1 million times.
I am currently managing about 800 iterations per hour, which I feel is too slow.
Do you have any better ways to improve on this?
Try inserting in batches (i.e. with PreparedStatement.addBatch()). Maybe you are inserting the rows individually.
e.g.
for (int i = 0; i < rows.length; i++) {
    preparedStatement.setObject(1, rows[i][0]);   // set parameter 1 to rows[i][0]
    preparedStatement.setObject(2, rows[i][1]);   // set parameter 2 to rows[i][1]
    // ... set the remaining parameters the same way
    preparedStatement.addBatch();
    if (i % 10000 == 0) {
        preparedStatement.executeBatch();         // insert 10k rows at a time
    }
}
preparedStatement.executeBatch();                 // flush the last partial batch
Since you have foreign keys, batch insert the data into the tables without FKs first.
Why are batch inserts/updates faster? How do batch updates work?
Try executing the query in bulk as suggested in the answers above. Make sure to take care of the rollback strategy in case an error is thrown for a particular piece of data.
Different strategies if an error is thrown by a batch are (see the sketch below):
1. Full rollback
2. Commit at a logical point and roll back the rest
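A minimal JDBC sketch of the second strategy (commit at a logical point, roll back only the failing chunk); the table, columns, and chunk size are assumptions, and conn is an open Connection with auto-commit turned off:

final int CHUNK_SIZE = 1000;                                         // assumed "logical point"
String sql = "INSERT INTO target_table (col1, col2) VALUES (?, ?)";  // placeholder SQL
try (PreparedStatement ps = conn.prepareStatement(sql)) {
    for (int i = 0; i < rows.length; i++) {
        ps.setObject(1, rows[i][0]);
        ps.setObject(2, rows[i][1]);
        ps.addBatch();
        if ((i + 1) % CHUNK_SIZE == 0 || i == rows.length - 1) {
            try {
                ps.executeBatch();
                conn.commit();                                       // keep everything inserted so far
            } catch (SQLException e) {
                conn.rollback();                                     // discard only the failed chunk
                // log the offending chunk and decide whether to stop or continue
            }
        }
    }
}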

Getting Data as chunks from Database to save memory

My problem is related to JDBC queries where the number of records in the table is huge. The end goal is to get the data from the DB in a streamed fashion, whereby you receive the data chunk by chunk.
This is possible by creating multiple SQL statements with keywords such as LIMIT and OFFSET, but in that case there will be multiple DB calls, which costs more time.
Is there a way whereby you do not load the entire ResultSet into memory and can get the data in chunks without issuing additional DB calls?
Thanks
First, if you are getting data in chunks from the database, you will still be making multiple round trips to the database; you just won't be executing as many separate queries.
Second, yes, it is possible. There is a DB concept known as a "cursor".
Connection cn = // ..
// Very important: the JDBC default is to commit after every statement,
// which will cause the DB to close the cursor
cn.setAutoCommit(false);
PreparedStatement st = cn.prepareStatement("SELECT * FROM TBL_FOO");
// Cache 50 rows at the client at a time
st.setFetchSize(50);
ResultSet rs = st.executeQuery();
while (rs.next()) {
    // Each next() moves the cursor forward; moving past the cached rows triggers another fetch
}
rs.close();
st.close();
Note, the database will have fetched all the rows when it executed the query, and the result set will occupy DB memory until you close the cursor. Remember, the DB is a shared resource.

Partial Data deletes using Java SQL Hibernate

Our Java program has to delete a large number of records from DB2 tables, but it is running out of transaction log space.
We are working on increasing the log space, but due to internal processes and other things it will take a week or more to complete.
We are looking for a temporary way to delete a few records at a time instead of deleting them all at once. For example, when we have 1000 records to delete, we want the program to delete 50 records at a time, commit, and proceed with the next 50 until all 1000 records are deleted. If the delete fails after X records have been deleted, that is still fine.
We are using Hibernate.
Requesting your suggestions on how this can be achieved. I have looked at checking the SQLSTATE and SQLCODE in Java even on the successful execution path, but I couldn't find a way to do it, so something like: loop do ... while (SQLCODE does not indicate completion).
We cannot delete from the back end, as the deletions are supposed to happen on user requests from the Java web application, and in addition we also have some table constraints and delete cascades.
I don't know Hibernate but you can change your delete to:
delete from (
select * from schema.table where id=:id fetch first 50 rows only
)
Assuming executeUpdate() returns the number of rows deleted, you can put this in a loop (pseudocode):
Query stmt = session.createSQLQuery("delete from ( ... fetch first 50 rows only )");
int n = 1;
while (n > 0) {
    Transaction transaction = session.beginTransaction();
    n = stmt.executeUpdate();
    transaction.commit();
}

How to handle bad or duplicate records in Statement.executeBatch()

I have a Java program which processes a file of 1 million records and inserts them into a table using bulk insertion, i.e. Statement.addBatch() and then Statement.executeBatch() after every 1000 records. The program runs quite fast.
However, if there is a duplicate record, i.e. the table raises an exception, the whole batch is gone and the rest of the records are untrackable.
Even getting the update counts is of no help, because I cannot insert the duplicates into another table, etc.
Is there a way that, within one particular batch insert of 1000, if there is a bad record, each record in that batch can be processed one by one, so that the bad/duplicate records can be placed in another table and the non-duplicates in the regular table?
Is there any other class I can use? I know that in C++ Oracle provides OCI, which can handle single records in a batch (called a host array operation), but in Java bulk insertion is usually done by calling Statement.addBatch() in a loop and then inserting with Statement.executeBatch().
Thanks.
I would break it up into smaller chunks of 1,000, like this
final int BATCH_SIZE = 1000;
for (int i = 0; i < DATA_SIZE; i++) {
    statement.setString(1, "a@a.com");
    statement.setLong(2, 1);
    statement.addBatch();
    if (i % BATCH_SIZE == BATCH_SIZE - 1)
        statement.executeBatch();     // send a full batch of 1000
}
if (DATA_SIZE % BATCH_SIZE != 0)
    statement.executeBatch();         // flush the final partial batch
It is quite common for a batch of records to contain a few bad ones. If you try to insert all the records in one go and one record fails, the whole lot of insertions will be rejected. That is expected and is the core purpose of transaction handling.
Usually for batch inserts you can take two approaches:
1) Commit after each record insertion --> very slow, since every row pays the cost of a commit.
2) Divide the total records into smaller "chunks" and insert each chunk into the database, so that only the chunk containing the bad record fails and the other chunks are inserted (see the sketch below for isolating the bad rows within a failed chunk).
Alternatively, if you don't want to handle these things yourself, go for a framework; Spring Batch could be one of your options in that case.
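A minimal sketch of the chunk-plus-retry idea, assuming a duplicate key raises a BatchUpdateException and that a separate rejects table exists for the bad rows; insertStmt, rejectStmt, rows, and conn are placeholders, with auto-commit turned off:

final int CHUNK = 1000;
for (int start = 0; start < rows.length; start += CHUNK) {
    int end = Math.min(start + CHUNK, rows.length);
    try {
        for (int i = start; i < end; i++) {
            insertStmt.setString(1, rows[i].getEmail());     // placeholder columns
            insertStmt.setLong(2, rows[i].getId());
            insertStmt.addBatch();
        }
        insertStmt.executeBatch();
        conn.commit();                                       // chunk was clean
    } catch (BatchUpdateException e) {
        conn.rollback();                                     // throw away the failed chunk
        for (int i = start; i < end; i++) {                  // replay it one row at a time
            try {
                insertStmt.setString(1, rows[i].getEmail());
                insertStmt.setLong(2, rows[i].getId());
                insertStmt.executeUpdate();
            } catch (SQLException dup) {
                rejectStmt.setString(1, rows[i].getEmail()); // divert the duplicate/bad row
                rejectStmt.setLong(2, rows[i].getId());
                rejectStmt.executeUpdate();
            }
        }
        conn.commit();                                       // keep the good rows of this chunk
    }
}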

Does a ResultSet load all data into memory or only when requested?

I have a .jsp page where I have a GUI table that displays records from an Oracle database. This table allows typical pagination behaviour, such as "FIRST", "NEXT", "PREVIOUS" and "LAST". The records are obtained from a Java ResultSet object that is returned from executing a SQL statement.
This ResultSet might be very big, so my question is:
If I have a ResultSet containing one million records but my table only displays the data from the first ten records in the ResultSet, is the data only fetched when I start requesting record data or does all of the data get loaded into memory entirely once the ResultSet is returned from executing a SQL statement?
The Java ResultSet is a pointer (or cursor) to the results in the database. The ResultSet loads records in blocks from the database. So to answer your question, the data is only fetched when you request it but in blocks.
If you need to control how many rows are fetched at once by the driver, you can use the setFetchSize(int rows) method on the ResultSet. This lets you control how many rows the driver retrieves in each block.
The JDBC spec does not specify whether the data is streamed or if it is loaded into memory. Oracle streams by default. MySQL does not. To get MySQL to stream the resultset, you need to set the following on the Statement:
pstmt = conn.prepareStatement(
sql,
ResultSet.TYPE_FORWARD_ONLY,
ResultSet.CONCUR_READ_ONLY);
pstmt.setFetchSize(Integer.MIN_VALUE);
The best idea is to write a sub-query and display 100 or 1000 rows at a time on a single page, and to manage the connections with connection pooling.
To write such a sub-query you can use ROWNUM in Oracle and LIMIT in MySQL; a sketch follows below.
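A minimal sketch of that kind of paginated query, assuming a table tbl_foo ordered by an id column; pageSize and pageStart are values computed by the application (all names are placeholders):

// MySQL: LIMIT/OFFSET
PreparedStatement ps = conn.prepareStatement(
        "SELECT * FROM tbl_foo ORDER BY id LIMIT ? OFFSET ?");
ps.setInt(1, pageSize);    // rows per page
ps.setInt(2, pageStart);   // rows to skip
ResultSet page = ps.executeQuery();

// Oracle (pre-12c): wrap the ordered query and filter on ROWNUM
String oracleSql =
        "SELECT * FROM ("
      + "  SELECT t.*, ROWNUM rn FROM (SELECT * FROM tbl_foo ORDER BY id) t"
      + "  WHERE ROWNUM <= ?"      // pageStart + pageSize
      + ") WHERE rn > ?";          // pageStart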
While the JDBC spec does not specify whether or not all the data in the result set gets fetched, any well-written driver won't do that.
That said, a scrollable result set might be more what you have in mind:
(link redacted, it pointed to a spyware page)
You may also consider a disconnected row set that is stored in the session (depending on how scalable your site needs to be); a sketch follows after the link:
http://java.sun.com/j2se/1.4.2/docs/api/javax/sql/RowSet.html
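A minimal sketch of the disconnected row set idea using the standard CachedRowSet (available through javax.sql.rowset.RowSetProvider since Java 7); dataSource and the query are placeholders, and exception handling is omitted:

import java.sql.*;
import javax.sql.rowset.CachedRowSet;
import javax.sql.rowset.RowSetProvider;

// Fill the row set once, then release the connection; the rows live in memory
// (e.g. stored in the HTTP session) and can be scrolled without further DB calls.
CachedRowSet crs = RowSetProvider.newFactory().createCachedRowSet();
try (Connection conn = dataSource.getConnection();
     PreparedStatement ps = conn.prepareStatement("SELECT * FROM tbl_foo");
     ResultSet rs = ps.executeQuery()) {
    crs.populate(rs);              // copy every row into the in-memory row set
}

// Later, e.g. while rendering a page in the JSP:
while (crs.next()) {
    // read crs.getString(...) / crs.getInt(...) for the current row
}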
Let's say we have a table that contains 500 records:
PreparedStatement stm = con.prepareStatement("select * from table");
stm.setFetchSize(100); // each fetch loads 100 records from the database into memory,
                       // so with 500 records there will be 5 round trips to the server
ResultSet rs = stm.executeQuery();
rs.setFetchSize(50);   // overrides the fetch size set on the statement; the next trip
                       // to the database will fetch records based on the new fetch size
