I have a scenario where I need to insert 1 million entries using SQL, and it is taking a long time.
My database has 9 tables, and for every entry I perform the following steps:
1. Open a connection
2. Insert tuples into the 9 tables, maintaining the primary/foreign key relationships
3. Commit
4. Close the connection
I repeat this operation 1 million times and am currently getting through about 800 iterations per hour, which feels far too slow.
Do you have any better ways to improve on this?
Try inserting in batches with a PreparedStatement; you may currently be inserting the rows individually.
e.g.
for (int i = 0; i < rows.length; i++) {
    // set parameter 1 to rows[i][0]
    preparedStatement.setObject(1, rows[i][0]);
    // set parameter 2 to rows[i][1]
    preparedStatement.setObject(2, rows[i][1]);
    // ... set the remaining parameters ...
    preparedStatement.addBatch();
    // send the batch every 10k rows
    if ((i + 1) % 10000 == 0)
        preparedStatement.executeBatch();
}
preparedStatement.executeBatch();   // flush whatever is left over
In this case, where you have foreign keys, batch-insert the data into the tables without FKs (the parent tables) first, then into the tables that reference them.
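A minimal sketch of that ordering, assuming a hypothetical parent/child pair of tables (child.parent_id referencing parent.id), an already-open java.sql.Connection, and the usual java.sql imports:

// Sketch only: the tables, their columns, and the row arrays are placeholders.
void insertParentsThenChildren(Connection conn,
                               Object[][] parentRows, Object[][] childRows) throws SQLException {
    try (PreparedStatement parentPs = conn.prepareStatement(
             "INSERT INTO parent (id, name) VALUES (?, ?)");
         PreparedStatement childPs = conn.prepareStatement(
             "INSERT INTO child (id, parent_id) VALUES (?, ?)")) {

        for (Object[] p : parentRows) {
            parentPs.setObject(1, p[0]);
            parentPs.setObject(2, p[1]);
            parentPs.addBatch();
        }
        parentPs.executeBatch();          // parents go in first, so the FK targets exist

        for (Object[] c : childRows) {
            childPs.setObject(1, c[0]);
            childPs.setObject(2, c[1]);   // parent_id must refer to a row inserted above
            childPs.addBatch();
        }
        childPs.executeBatch();
    }
}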
Why are batch inserts/updates faster? How do batch updates work?
Try to execute the queries in bulk as suggested in the answers above, and make sure to take care of the rollback strategy in case an error is thrown for a particular row.
Different strategies when a batch throws an error are:
1. Full rollback.
2. Commit at logical points and roll back only the rest (see the sketch below).
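A rough JDBC sketch of strategy 2 (commit per chunk, roll back only the failing chunk); the table, columns, and rows array are placeholders, and the usual java.sql imports are assumed:

// Sketch: commit after each chunk; if a chunk fails, roll back that chunk only and continue.
void insertInChunks(Connection conn, Object[][] rows) throws SQLException {
    final int CHUNK = 1000;
    conn.setAutoCommit(false);
    try (PreparedStatement ps = conn.prepareStatement(
            "INSERT INTO my_table (col1, col2) VALUES (?, ?)")) {
        for (int start = 0; start < rows.length; start += CHUNK) {
            int end = Math.min(start + CHUNK, rows.length);
            try {
                for (int i = start; i < end; i++) {
                    ps.setObject(1, rows[i][0]);
                    ps.setObject(2, rows[i][1]);
                    ps.addBatch();
                }
                ps.executeBatch();
                conn.commit();               // logical commit point per chunk
            } catch (SQLException e) {
                conn.rollback();             // only this chunk is rolled back
                ps.clearBatch();             // drop any rows still queued in the batch
                // record e and the range [start, end) somewhere for reprocessing
            }
        }
    }
}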
Our Java program has to delete a large number of records from DB2 tables, but it is running out of transaction log space.
We are working on increasing the log space, but due to internal processes and other things it will take a week or more to complete.
We are looking for a temporary way to delete a few records at a time instead of deleting them all at once. For example, when we have 1000 records to delete, we want the program to delete 50 records, commit, and then proceed with the next 50 until all 1000 records are deleted. If the delete fails after X records have been deleted, that is still fine.
We are using Hibernate.
Any suggestions on how this can be achieved? I looked at checking the SQLSTATE and SQLCODE in Java even on the successful execution path, but unfortunately I could not find a way to do it.
Something like: loop do ... while (SQLCODE does not indicate completion).
We cannot delete from the back end because the deletions are supposed to happen on user requests from the Java web application, and in addition we have some table constraints and delete cascades.
I don't know Hibernate, but you can change your delete to:
delete from (
select * from schema.table where id=:id fetch first 50 rows only
)
Assuming executeUpdate() returns the number of rows deleted, you can put this in a loop (pseudocode):
Query stmt = session.createSQLQuery(
    "delete from (select * from schema.table where id = :id fetch first 50 rows only)");
stmt.setParameter("id", id);
int n = 1;
while (n > 0) {
    Transaction transaction = session.beginTransaction();
    n = stmt.executeUpdate();
    transaction.commit();
}
I have a Java program which processes a file of 1 million records and inserts them into a table using bulk insertion, i.e. Statement.addBatch() and then Statement.executeBatch() after every 1000 records. The program runs quite fast.
However, if there is a duplicate record, i.e. the table raises an exception, the whole batch is gone and the rest of the records are untrackable.
Even the update counts are of no help, because I cannot insert the duplicates into another table, etc.
Is there a way that, within one particular batch insert of 1000, if there is a bad record, each record in that batch can be processed one by one, so that the bad/duplicate records can be placed in another table and the non-duplicates in the regular table?
Is there any other class I can use? I know that in C++ Oracle provides OCI, which can handle single records in a batch (called host array operations), but in Java bulk insertion is usually done by adding to the Statement in a loop and then inserting with Statement.executeBatch().
Thanks.
I would break it up into smaller chunks of 1,000, like this:
final int BATCH_SIZE = 1000;
for (int i = 0; i < DATA_SIZE; i++) {
    statement.setString(1, "a#a.com");
    statement.setLong(2, 1);
    statement.addBatch();
    if (i % BATCH_SIZE == BATCH_SIZE - 1)
        statement.executeBatch();
}
if (DATA_SIZE % BATCH_SIZE != 0)
    statement.executeBatch();
It is quite common for a batch of records to contain a few bad ones. If you try to insert all the records in one go and one record fails, the whole set of insertions is rejected. That is expected and is the core purpose of transaction handling.
Usually for batch inserts you can take two approaches:
1) Commit after each record insertion: very expensive in terms of performance.
2) Divide the total records into smaller chunks and insert them chunk by chunk, so that only the chunk containing the bad record fails and the other chunks make it into the database (see the sketch below).
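One way to implement approach 2 in plain JDBC, sketched under a few assumptions (auto-commit off, java.sql and java.util imports in place, placeholder table and column names): execute each chunk as a batch, and if it throws a BatchUpdateException, retry that chunk row by row, diverting the rows that still fail (the duplicates) into a separate rejects table.

// Sketch of a per-row fallback for a failed chunk. Note: on databases that
// abort the whole transaction on any error you would need savepoints or
// per-row commits in the fallback instead.
void insertChunkWithFallback(Connection conn, List<Object[]> chunk) throws SQLException {
    try (PreparedStatement insert = conn.prepareStatement(
             "INSERT INTO main_table (id, value) VALUES (?, ?)");
         PreparedStatement reject = conn.prepareStatement(
             "INSERT INTO reject_table (id, value) VALUES (?, ?)")) {
        try {
            for (Object[] row : chunk) {
                insert.setObject(1, row[0]);
                insert.setObject(2, row[1]);
                insert.addBatch();
            }
            insert.executeBatch();
            conn.commit();
            return;                            // whole chunk inserted cleanly
        } catch (BatchUpdateException e) {
            conn.rollback();                   // discard the partially applied chunk
            insert.clearBatch();
        }
        // Fallback: one row at a time, so a single bad record no longer
        // poisons its neighbours.
        for (Object[] row : chunk) {
            try {
                insert.setObject(1, row[0]);
                insert.setObject(2, row[1]);
                insert.executeUpdate();
            } catch (SQLException duplicate) { // e.g. unique-constraint violation
                reject.setObject(1, row[0]);
                reject.setObject(2, row[1]);
                reject.executeUpdate();
            }
        }
        conn.commit();
    }
}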
Alternatively, if you don't want to handle these things yourself, go for a framework; Spring Batch could be one of your options in that case.
All,
I have to redesign an existing logging system used in a web application. The existing system reads an Excel sheet, validates each record, writes the error message for each entry into the database as soon as an error is found, and displays the results for all records at the end. So,
if I have 2 records in the Excel sheet, R1 and R2, and both fail with 3 validation errors each, an insert query is fired 6 times, once per validation message, and the user sees all 6 messages at the end of the validation process.
This method worked for smaller sets of entries, but for 20,000 records it has obviously become a bottleneck.
As per my initial redesign approach, these are the options I would like suggestions on from everyone at SO:
1> Create a custom logger class holding all the required logging information. For each record in error, store the record ID as the key and the logger object as the value in a HashMap. When all the records have been processed, perform the database inserts for everything in the HashMap in one shot.
2> Fire SQL inserts periodically, i.e. for X records in total, process Y <= X records at a time, perform one insert operation, and then continue with the remaining records.
We really have no set criteria at this point other than definitely improving the performance.
Could everyone please provide feedback on what would be an efficient logging-system design, and whether there are better approaches than the ones I mentioned above?
I would guess your problems are due to the fact that you are doing row-based operations rather than set-based ones.
A set-based operation would be the quickest way to load the data. If that is not possible, I would go with inserting X records at a time, as it is more scalable; inserting them all at once would require ever-increasing amounts of memory (but would probably be quicker).
There is a good discussion on Ask Tom here: http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1583402705463
Instead of storing every error in a HashMap, you could try (provided the DBMS supports it) to batch all those insert statements together and fire them at the end. Somewhat like this:
PreparedStatement ps = connection.prepareStatement("INSERT INTO table (...) values (?, ?, ...)");
for (...) {
    ps.setString(1, ...);
    ...
    ps.addBatch();
}
int[] results = ps.executeBatch();
I have a requirement to read an Excel file with its headers and data, create a table in a database (MySQL) based on the headers, and insert the values extracted from the file into that table. For this I am using JDBC (with a prepared statement) to create the table and insert the data.
It works nicely, but when the number of records grows (suppose the file contains 200,000 or more records) it becomes slow. Please guide me on how to optimize the speed of inserting data into a DB table.
Thanks, Sameek
To optimize it, you should first reuse the same PreparedStatement object for all of the inserts.
To further optimize the code you can send the updates in batches,
e.g. batches of 5:
// the table has already been created; sql is the INSERT statement
PreparedStatement pstmt = conn.prepareStatement(sql);
for (int i = 0; i < rows.length; ++i) {
    if (i != 0 && i % 5 == 0) {
        pstmt.executeBatch();
    }
    pstmt.setString(1, rows[i].getName());
    pstmt.setLong(2, rows[i].getId());
    pstmt.addBatch();
}
pstmt.executeBatch();
Wrap your inserts in a transaction. Pseudo code:
1) Begin transaction
2) Create prepared statement
3) Loop for all inserts, setting prepared statement parameters and executing for each insert
4) Commit transaction
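In plain JDBC that pseudocode looks roughly like the sketch below; the SQL and the rows array are placeholders, and the usual java.sql imports are assumed.

// Sketch: one transaction around all the inserts, one PreparedStatement reused.
void insertAllInOneTransaction(Connection conn, Object[][] rows) throws SQLException {
    conn.setAutoCommit(false);                                     // 1) begin transaction
    try (PreparedStatement ps = conn.prepareStatement(
            "INSERT INTO my_table (col1, col2) VALUES (?, ?)")) {  // 2) prepared statement
        for (Object[] row : rows) {                                // 3) loop over the inserts
            ps.setObject(1, row[0]);
            ps.setObject(2, row[1]);
            ps.executeUpdate();
        }
        conn.commit();                                             // 4) commit once at the end
    } catch (SQLException e) {
        conn.rollback();                                           // undo everything on failure
        throw e;
    }
}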
I'll take Hibernate as an example. Hibernate has a Session, which holds the SQL commands that have not yet been sent to the database.
With Hibernate you can do the inserts and flush the session every 100 inserts, which means the SQL queries are sent to the database every 100 inserts. This improves performance because it talks to the database once per 100 inserts rather than on every single insert.
You can do the same thing yourself by executing the accumulated statements every 100 inserts (or whatever interval you want), for example with a PreparedStatement batch.
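A sketch of the Hibernate pattern described above; LogEntry, sessionFactory, and entries are placeholder names, and transaction/session handling is simplified:

// Sketch: flush (and clear) the Hibernate session every 100 inserts.
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
int count = 0;
for (LogEntry entry : entries) {
    session.save(entry);
    if (++count % 100 == 0) {
        session.flush();   // send the pending INSERTs to the database
        session.clear();   // detach them so the session cache does not grow unbounded
    }
}
tx.commit();
session.close();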
I'm selecting a subset of data from an MS SQL database, using a PreparedStatement.
While iterating through the result set, I also want to update the rows. At the moment I use something like this:
prepStatement = con.prepareStatement(
        selectQuery,
        ResultSet.TYPE_FORWARD_ONLY,
        ResultSet.CONCUR_UPDATABLE);
rs = prepStatement.executeQuery();
while (rs.next()) {
    rs.updateInt("number", 20);
    rs.updateRow();
}
The database is updated with the correct values, but I get the following exception:
Optimistic concurrency check failed. The row was modified outside of this cursor.
I've Googled it, but haven't been able to find any help on the issue.
How do I prevent this exception? Or since the program does do what I want it to do, can I just ignore it?
The record was modified between the moment it was retrieved from the database (through your cursor) and the moment you attempted to save it back. If the number column can be safely updated independently of the rest of the record, or regardless of some other process having already set it to another value, you could be tempted to do:
con.createStatement().executeUpdate("update table set number = 20 where id = " + rs.getLong("id"));
However, the race condition persists, and your change may in turn be overwritten by another process.
The best strategy is to ignore the exception (the record was not updated), possibly pushing the failed record onto an in-memory queue, and then do a second pass over the failed records, re-evaluating the conditions in the query and updating as appropriate (add number <> 20 as one of the conditions in the query if it is not already there). Repeat until no more records fail; eventually all records will be updated.
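A simplified sketch of that retry approach: instead of keeping an explicit queue, it just re-runs the query with the extra number <> 20 condition until no row fails. The table name and the remaining WHERE conditions are placeholders; con is the connection from the question.

// Sketch: rows modified elsewhere simply fail this pass and are retried next time.
boolean anyFailed = true;
while (anyFailed) {
    anyFailed = false;
    // keep your original WHERE conditions and add "number <> 20" so rows that
    // were already updated are skipped on later passes
    try (PreparedStatement ps = con.prepareStatement(
             "SELECT id, number FROM the_table WHERE number <> 20",
             ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_UPDATABLE);
         ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            try {
                rs.updateInt("number", 20);
                rs.updateRow();
            } catch (SQLException modifiedElsewhere) {
                anyFailed = true;        // leave this row for the next pass
            }
        }
    }
}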
Assuming you know exactly which rows you will update, I would:
SET AUTOCOMMIT to OFF
SET the ISOLATION LEVEL to SERIALIZABLE
SELECT col1, col2 FROM table WHERE somecondition FOR UPDATE
UPDATE the rows
COMMIT
This is achieved via pessimistic locking (and assuming row locking is supported in your DB, it should work).
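A JDBC sketch of those steps; the table, columns, and bind value are placeholders, and note that SELECT ... FOR UPDATE is not supported by every database (on SQL Server the usual equivalent is a WITH (UPDLOCK) locking hint):

// Sketch: pessimistic locking via SELECT ... FOR UPDATE inside one serializable transaction.
con.setAutoCommit(false);                                           // AUTOCOMMIT off
con.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);   // SERIALIZABLE
try (PreparedStatement select = con.prepareStatement(
         "SELECT id, number FROM the_table WHERE somecondition = ? FOR UPDATE");
     PreparedStatement update = con.prepareStatement(
         "UPDATE the_table SET number = 20 WHERE id = ?")) {
    select.setInt(1, someValue);                 // someValue is a placeholder
    try (ResultSet rs = select.executeQuery()) {
        while (rs.next()) {
            update.setLong(1, rs.getLong("id")); // rows stay locked until commit
            update.executeUpdate();
        }
    }
    con.commit();                                // COMMIT releases the locks
} catch (SQLException e) {
    con.rollback();
    throw e;
}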