Efficient way to do batch INSERTS with JDBC - java

In my app I need to do a lot of INSERTs. It's a Java app and I am using plain JDBC to execute the queries, with Oracle as the database. I have enabled batching, so it saves me network latency when executing queries, but the queries still execute serially as separate INSERTs:
insert into some_table (col1, col2) values (val1, val2)
insert into some_table (col1, col2) values (val3, val4)
insert into some_table (col1, col2) values (val5, val6)
I was wondering if the following form of INSERT might be more efficient:
insert into some_table (col1, col2) values (val1, val2), (val3, val4), (val5, val6)
i.e. collapsing multiple INSERTs into one.
Any other tips for making batch INSERTs faster?

This is a mix of the two previous answers:
PreparedStatement ps = c.prepareStatement("INSERT INTO employees VALUES (?, ?)");
ps.setString(1, "John");
ps.setString(2,"Doe");
ps.addBatch();
ps.clearParameters();
ps.setString(1, "Dave");
ps.setString(2,"Smith");
ps.addBatch();
ps.clearParameters();
int[] results = ps.executeBatch();

Though the question asks about inserting efficiently into Oracle using JDBC, I'm currently working with DB2 (on an IBM mainframe). Conceptually, inserting is similar, so I thought it might be helpful to show my metrics for:
inserting one record at a time
inserting a batch of records (very efficient)
Here are the metrics:
1) Inserting one record at a time
public void writeWithCompileQuery(int records) {
    PreparedStatement statement;
    try {
        Connection connection = getDatabaseConnection();
        connection.setAutoCommit(true);
        String compiledQuery = "INSERT INTO TESTDB.EMPLOYEE(EMPNO, EMPNM, DEPT, RANK, USERNAME)" +
                " VALUES" + "(?, ?, ?, ?, ?)";
        statement = connection.prepareStatement(compiledQuery);

        long start = System.currentTimeMillis();
        for (int index = 1; index < records; index++) {
            statement.setInt(1, index);
            statement.setString(2, "emp number-" + index);
            statement.setInt(3, index);
            statement.setInt(4, index);
            statement.setString(5, "username");

            long startInternal = System.currentTimeMillis();
            statement.executeUpdate();
            System.out.println("each transaction time taken = " + (System.currentTimeMillis() - startInternal) + " ms");
        }
        long end = System.currentTimeMillis();
        System.out.println("total time taken = " + (end - start) + " ms");
        System.out.println("avg total time taken = " + (end - start) / records + " ms");

        statement.close();
        connection.close();
    } catch (SQLException ex) {
        System.err.println("SQLException information");
        while (ex != null) {
            System.err.println("Error msg: " + ex.getMessage());
            ex = ex.getNextException();
        }
    }
}
The metrics for 100 transactions:
each transaction time taken = 123 ms
each transaction time taken = 53 ms
each transaction time taken = 48 ms
each transaction time taken = 48 ms
each transaction time taken = 49 ms
each transaction time taken = 49 ms
...
..
.
each transaction time taken = 49 ms
each transaction time taken = 49 ms
total time taken = 4935 ms
avg total time taken = 49 ms
The first transaction takes around 120-150 ms, which covers the query parse plus execution; the subsequent transactions take only around 50 ms each. (That is still high, but my database is on a different server, so I need to troubleshoot the network.)
2) Inserting in a batch (the efficient one), achieved with preparedStatement.executeBatch()
public int[] writeInABatchWithCompiledQuery(int records) {
    PreparedStatement preparedStatement;
    try {
        Connection connection = getDatabaseConnection();
        connection.setAutoCommit(true);
        String compiledQuery = "INSERT INTO TESTDB.EMPLOYEE(EMPNO, EMPNM, DEPT, RANK, USERNAME)" +
                " VALUES" + "(?, ?, ?, ?, ?)";
        preparedStatement = connection.prepareStatement(compiledQuery);

        for (int index = 1; index <= records; index++) {
            preparedStatement.setInt(1, index);
            preparedStatement.setString(2, "emp number-" + index);
            preparedStatement.setInt(3, index + 100);
            preparedStatement.setInt(4, index + 200);
            preparedStatement.setString(5, "usernames");
            preparedStatement.addBatch();
        }

        long start = System.currentTimeMillis();
        int[] inserted = preparedStatement.executeBatch();
        long end = System.currentTimeMillis();

        System.out.println("total time taken to insert the batch = " + (end - start) + " ms");
        System.out.println("avg time taken per record = " + (end - start) / (double) records + " ms");

        preparedStatement.close();
        connection.close();
        return inserted;
    } catch (SQLException ex) {
        System.err.println("SQLException information");
        while (ex != null) {
            System.err.println("Error msg: " + ex.getMessage());
            ex = ex.getNextException();
        }
        throw new RuntimeException("Error");
    }
}
The metrics for a batch of 100 records:
total time taken to insert the batch = 127 ms
and for 1000 transactions
total time taken to insert the batch = 341 ms
So, 100 inserts that take ~5000 ms one transaction at a time drop to ~150 ms with a batch of 100 records.
NOTE - My network is quite slow, but the metric values are relative, so the comparison still holds.

The Statement class gives you the following option:
Statement stmt = con.createStatement();
stmt.addBatch("INSERT INTO employees VALUES (1000, 'Joe Jones')");
stmt.addBatch("INSERT INTO departments VALUES (260, 'Shoe')");
stmt.addBatch("INSERT INTO emp_dept VALUES (1000, 260)");
// submit a batch of update commands for execution
int[] updateCounts = stmt.executeBatch();

You'll have to benchmark, obviously, but over JDBC issuing multiple inserts will be much faster if you use a PreparedStatement rather than a Statement.

If you are using MySQL, you can use the rewriteBatchedStatements connection parameter to make the batch insert even faster.
You can read about the parameter here: MySQL and JDBC with rewriteBatchedStatements=true
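For reference, the parameter is set on the MySQL Connector/J connection; a minimal sketch, where the host, database name and credentials are placeholders:
// Sketch: enabling rewriteBatchedStatements on a MySQL Connector/J connection.
// Host, database name and credentials below are placeholders.
String url = "jdbc:mysql://localhost:3306/mydb?rewriteBatchedStatements=true";
try (Connection conn = DriverManager.getConnection(url, "user", "password");
     PreparedStatement ps = conn.prepareStatement(
             "insert into some_table (col1, col2) values (?, ?)")) {
    for (int i = 0; i < 1000; i++) {
        ps.setString(1, "val" + i);
        ps.setString(2, "other" + i);
        ps.addBatch();
    }
    ps.executeBatch(); // the driver may rewrite the batch into multi-row INSERTs
}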

SQLite: The above answers are all correct. For SQLite, it is a little bit different: nothing really helps, and even putting the statements in a batch (sometimes) does not improve performance. In that case, try disabling auto-commit and committing by hand after you are done. (Warning! When multiple connections write at the same time, these operations can clash.)
// connect(), yourList and compiledQuery you have to implement/define beforehand
try (Connection conn = connect()) {
    conn.setAutoCommit(false);
    PreparedStatement pstmt = conn.prepareStatement(compiledQuery);
    for (Object o : yourList) {
        pstmt.setString(1, o.toString());
        pstmt.executeUpdate();
        pstmt.getGeneratedKeys(); // if you need the generated keys
    }
    pstmt.close();
    conn.commit();
}

How about using Oracle's INSERT ALL statement?
INSERT ALL
INTO table_name VALUES ()
INTO table_name VALUES ()
...
SELECT Statement;
I remember that the trailing SELECT statement is mandatory for this request to succeed, though I don't remember why.
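For illustration, a filled-in version of that form, run over JDBC, might look like this (the table and column names are made up, connection is assumed to be an open JDBC Connection, and the trailing SELECT is commonly written against DUAL):
// Sketch: a multi-row INSERT ALL executed over JDBC.
// Table and column names are made up for illustration.
String sql = "INSERT ALL "
        + "INTO some_table (col1, col2) VALUES ('val1', 'val2') "
        + "INTO some_table (col1, col2) VALUES ('val3', 'val4') "
        + "SELECT * FROM dual";
try (Statement st = connection.createStatement()) {
    st.executeUpdate(sql); // returns the total number of rows inserted
}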
You might also consider using a PreparedStatement; it has lots of advantages!
Farid

You can use addBatch and executeBatch for batch inserts in Java. See the example: Batch Insert In Java

In my code I have no direct access to the PreparedStatement, so I cannot use batching; I just pass it the query and a list of parameters. The trick, however, is to create a variable-length INSERT statement and a LinkedList of parameters. The effect is the same as the top example, but with a variable number of input parameters. See below (error checking omitted).
Assuming 'myTable' has 3 updatable fields: f1, f2 and f3
String[] args = {"A", "B", "C", "X", "Y", "Z"}; // etc., input list of triplets
final String QUERY = "INSERT INTO [myTable] (f1,f2,f3) values ";
LinkedList<String> params = new LinkedList<>();
String comma = "";
StringBuilder q = new StringBuilder(QUERY);
for (int nl = 0; nl < args.length; nl += 3) { // args is a list of value triplets
    params.add(args[nl]);
    params.add(args[nl + 1]);
    params.add(args[nl + 2]);
    q.append(comma).append("(?,?,?)");
    comma = ",";
}
int nr = insertIntoDB(q.toString(), params);
In my DBInterface class I have:
int insertIntoDB(String query, LinkedList<String> params) {
    preparedUPDStmt = connectionSQL.prepareStatement(query);
    int n = 1;
    for (String x : params) {
        preparedUPDStmt.setString(n++, x);
    }
    int updates = preparedUPDStmt.executeUpdate();
    return updates;
}

If you use JdbcTemplate, then:
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.BatchPreparedStatementSetter;

public int[] batchInsert(List<Book> books) {
    return this.jdbcTemplate.batchUpdate(
        "insert into books (name, price) values(?,?)",
        new BatchPreparedStatementSetter() {
            public void setValues(PreparedStatement ps, int i) throws SQLException {
                ps.setString(1, books.get(i).getName());
                ps.setBigDecimal(2, books.get(i).getPrice());
            }
            public int getBatchSize() {
                return books.size();
            }
        });
}
Or with more advanced configuration:
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.ParameterizedPreparedStatementSetter;

public int[][] batchInsert(List<Book> books, int batchSize) {
    int[][] updateCounts = jdbcTemplate.batchUpdate(
        "insert into books (name, price) values(?,?)",
        books,
        batchSize,
        new ParameterizedPreparedStatementSetter<Book>() {
            public void setValues(PreparedStatement ps, Book argument)
                    throws SQLException {
                ps.setString(1, argument.getName());
                ps.setBigDecimal(2, argument.getPrice());
            }
        });
    return updateCounts;
}
link to source

Using PreparedStatements will be MUCH slower than Statements if you have only a few iterations. To gain a performance benefit from using a PreparedStatement over a Statement, you need to be using it in a loop with at least 50 iterations or more.
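If you want to check that claim on your own setup, a rough micro-benchmark sketch could look like the following; it assumes an open connection, an existing some_table, and an iteration count n, and real measurements would need warm-up and repetition:
// Rough sketch: compare per-row Statement inserts vs. per-row PreparedStatement inserts.
long t0 = System.currentTimeMillis();
try (Statement st = connection.createStatement()) {
    for (int i = 0; i < n; i++) {
        st.executeUpdate("insert into some_table (col1, col2) values ('a" + i + "', 'b')");
    }
}
long statementMs = System.currentTimeMillis() - t0;

long t1 = System.currentTimeMillis();
try (PreparedStatement ps = connection.prepareStatement(
        "insert into some_table (col1, col2) values (?, ?)")) {
    for (int i = 0; i < n; i++) {
        ps.setString(1, "a" + i);
        ps.setString(2, "b");
        ps.executeUpdate();
    }
}
long preparedMs = System.currentTimeMillis() - t1;

System.out.println("Statement: " + statementMs + " ms, PreparedStatement: " + preparedMs + " ms");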

Related

JdbcTemplate.batchUpdate() returns 0 on an insert error for one item, but inserts the remaining items into a SQL Server DB despite using @Transactional

My code is very similar to the one below. Despite configuring the transaction manager, all items except the incorrect one are inserted into the DB. This is absurd, as with @Transactional it should be all inserts or none.
List<Book> books = new ArrayList();
for (int count = 0; count < size; count++) {
    if (count == 500) {
        // Create invalid data for id 500, to test rollback
        // Name max 255, this book has a length of 300
        books.add(new Book(NameGenerator.randomName(300), new BigDecimal(1.99)));
        continue;
    }
    books.add(new Book(NameGenerator.randomName(20), new BigDecimal(1.99)));
}
@Transactional
public int[][] batchInsert(List<Book> books, int batchSize) {
    int[][] updateCounts = jdbcTemplate.batchUpdate(
        "insert into books (name, price) values(?,?)",
        books,
        batchSize,
        new ParameterizedPreparedStatementSetter<Book>() {
            public void setValues(PreparedStatement ps, Book argument) throws SQLException {
                ps.setString(1, argument.getName());
                ps.setBigDecimal(2, argument.getPrice());
            }
        });
    return updateCounts;
}
This is a difficult situation I was facing for batch updates. The actual solution lies in using a PreparedStatement directly rather than JdbcTemplate (although under the hood JdbcTemplate uses a PreparedStatement). The scenario was to insert 5000 records into the DB in a distributed application, with one batch (JSON payload) of 1000 records.
On the connection object one needs to turn off auto-commit and then commit the transaction only if all the inserts happen successfully. The scenario is tried and tested in Spring Boot.
Connection conn = dataSource.getConnection(); // Configure the datasource in the config class
conn.setAutoCommit(false); // The most important part of the code

PreparedStatement pstmt = conn.prepareStatement(
        "INSERT INTO customers (CustID, Last_Name, " +
        "First_Name, Email, Phone_Number)" +
        " VALUES(?,?,?,?,?)");

for (int i = 0; i < firstNames.length; i++) {
    // Add each parameter to the row.
    pstmt.setInt(1, i + 1);
    pstmt.setString(2, lastNames[i]);
    pstmt.setString(3, firstNames[i]);
    pstmt.setString(4, emails[i]);
    pstmt.setString(5, phoneNumbers[i]);
    // Add row to the batch.
    pstmt.addBatch();
}

try {
    // Batch is ready, execute it to insert the data
    pstmt.executeBatch();
    conn.commit(); // On any exception while inserting a record, the commit is skipped
} catch (SQLException e) {
    System.out.println("Error message: " + e.getMessage());
    return; // Exit if there was an error
}
Using a PreparedStatement directly gives you control over the transaction and has far better performance than using JdbcTemplate.
This is also explained in the article https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/ConnectingToVertica/ClientJDBC/BatchInsertsUsingJDBCPreparedStatements.htm

Insert performance tuning

Currently we are selecting data from one database and inserting it into a backup database (SQL Server).
This data always contains more than 15K records in one select.
We are using an Enumeration to iterate over the selected data.
We are using a JDBC PreparedStatement to insert the data as follows:
Enumeration values = ht.elements(); // ht is a Hashtable containing the selected data
while (values.hasMoreElements())
{
    pstmt = conn.prepareStatement("insert query");
    pstmt.executeUpdate();
}
I am not sure if this is the correct or efficient way to do faster inserts.
Inserting 10k rows takes about 30 minutes or more.
Is there an efficient way to make it faster?
Note: Not using any indexes on the table.
Use a batch insert, but commit after every few entries; don't try to send all 10K at once. Try experimenting to find the best size; it's a trade-off between memory and network round trips.
Connection connection = getConnection();
Statement statement = connection.createStatement();
int i = 0;
for (String query : queries) {
    statement.addBatch(query);
    if (++i % 500 == 0) {
        // Do an execute every 500 statements; don't send too many at once
        statement.executeBatch();
    }
}
statement.executeBatch(); // execute the remaining statements
statement.close();
connection.close();
Also, from your code I'm not sure what you are doing, but use parameterised queries rather than sending 10K insert statements as text. Something like:
String q = "INSERT INTO data_table (id) values (?)";
Connection connection = getConnection();
PreparedStatement ps = connection.prepareStatement(q);
for (Data d : data) {
    ps.setString(1, d.getId());
    ps.addBatch();
}
ps.executeBatch();
ps.close();
connection.close();
You can insert all the values in one SQL command:
INSERT INTO Table1 ( Column1, Column2 ) VALUES
( V1, V2 ), ( V3, V4 ), .......
You may also insert the values in chunks of 500 records, for example, if the query would otherwise become very big. It is not efficient at all to insert one row per statement remotely (over a connection). Another solution is to do the inserts using a stored procedure; you just pass the values to it as parameters (see the sketch after the code below).
Here is how you can do it using the INSERT command above:
Enumeration values = ht.elements(); // ht is a Hashtable containing the selected data
int i = 0;
StringBuilder sql = new StringBuilder(); // better than String concatenation here
while (values.hasMoreElements())
{
    if (sql.length() > 0)
        sql.append(" , ");
    sql.append("(").append(values.nextElement()).append(")");
    i++;
    if (i % 500 == 0) {
        pstmt = conn.prepareStatement("insert query " + sql);
        pstmt.executeUpdate();
        sql.setLength(0);
    }
}
if (sql.length() > 0) { // insert the remaining rows
    pstmt = conn.prepareStatement("insert query " + sql);
    pstmt.executeUpdate();
}
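The stored-procedure route mentioned above could look roughly like the sketch below; the procedure name and its parameter list are hypothetical, and how the Hashtable rows map to parameters depends on your data:
// Sketch: batching calls to a (hypothetical) stored procedure that does the insert.
CallableStatement cs = conn.prepareCall("{call insert_backup_row(?, ?)}");
Enumeration values = ht.elements(); // ht is the Hashtable from the question
while (values.hasMoreElements()) {
    Object row = values.nextElement();
    cs.setString(1, row.toString());     // map your row fields to the procedure parameters
    cs.setString(2, "some_other_value"); // placeholder second parameter
    cs.addBatch();
}
cs.executeBatch();
cs.close();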

Update in bulk through JDBC batch gives SQLException TransactionImpl

I'm using EJB3 with an Oracle database and JDBC.
I'm working on an app where I have to fire 25000 UPDATE queries.
My code is as follows:
public int updateStatus(List<String> idList) throws SQLException {
    Connection connection = getConnection(); // Connection initialized properly for the Oracle DB
    Statement statement = connection.createStatement();
    String sql = null;
    for (String id : idList) { // idList is properly filled
        sql = "UPDATE TBLTEST SET STATUS = 'FIXED' WHERE ID = '" + id + "'";
        statement.addBatch(sql);
    }
    int[] affectedRecords = statement.executeBatch();
    return affectedRecords.length;
}
Please note that the class in which this method is written is annotated with
@TransactionManagement(TransactionManagementType.CONTAINER)
This code works perfectly fine up to 8000 queries. For more IDs, it throws the following exception:
org.jboss.util.NestedSQLException: Transaction TransactionImple < ac, BasicAction: 0:ffffc0a80272:1652:56bd6be5:57e status: ActionStatus.ABORTED > cannot proceed STATUS_ROLLEDBACK; - nested throwable: (javax.transaction.RollbackException: Transaction TransactionImple < ac, BasicAction: 0:ffffc0a80272:1652:56bd6be5:57e status: ActionStatus.ABORTED > cannot proceed STATUS_ROLLEDBACK)
at org.jboss.resource.adapter.jdbc.WrapperDataSource.checkTransactionActive(WrapperDataSource.java:165)
at org.jboss.resource.adapter.jdbc.WrappedConnection.checkTransactionActive(WrappedConnection.java:843)
at org.jboss.resource.adapter.jdbc.WrappedConnection.checkStatus(WrappedConnection.java:858)
at org.jboss.resource.adapter.jdbc.WrappedConnection.checkTransaction(WrappedConnection.java:835)
at org.jboss.resource.adapter.jdbc.WrappedConnection.createStatement(WrappedConnection.java:183)
Can anyone help with the exception?
Best guess: by using individual SQL statements instead of a PreparedStatement, you force the driver to send all your statements (> 400k characters of data) to the DB, and the DB to parse all of those 400k characters, which hits a limit at some point and breaks things (the exception is not clear on where or what broke, as it hides the causing exception).
How to fix:
Go for individual batches of "not too many" statements at a time - say... 1000:
public int updateStatus(List<String> idList) {
    List<Integer> affectedRecords = new ArrayList<Integer>(idList.size());
    try (Connection connection = getConnection();
         Statement statement = connection.createStatement()) {
        int count = 0;
        for (String id : idList) {
            statement.addBatch("UPDATE TBLTEST SET STATUS = 'FIXED' WHERE ID = '" + id + "'");
            // Execute after every 1000 rows
            if (++count % 1000 == 0) {
                int[] result = statement.executeBatch();
                // Utility method - you need to implement it to add the int[] into the List
                addResults(affectedRecords, result);
                statement.clearBatch();
            }
        }
        // Final execute for the remaining statements
        if (count % 1000 > 0) {
            int[] result = statement.executeBatch();
            // Utility method - you need to implement it to add the int[] into the List
            addResults(affectedRecords, result);
        }
    } catch (SQLException se) {
        se.printStackTrace();
    }
    return affectedRecords.size(); // or aggregate the update counts as needed
}
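Since the diagnosis above points at the lack of a PreparedStatement, the same chunked batch can also be written with one, so only the parameter values travel to the DB instead of full SQL text per row. A sketch under the same assumptions (addResults is still the placeholder helper from the answer above):
// Sketch: the chunked UPDATE batch using a PreparedStatement instead of string-built SQL.
try (Connection connection = getConnection();
     PreparedStatement ps = connection.prepareStatement(
             "UPDATE TBLTEST SET STATUS = 'FIXED' WHERE ID = ?")) {
    int count = 0;
    for (String id : idList) {
        ps.setString(1, id);
        ps.addBatch();
        // Execute after every 1000 rows
        if (++count % 1000 == 0) {
            int[] result = ps.executeBatch();
            addResults(affectedRecords, result); // placeholder helper from the answer
            ps.clearBatch();
        }
    }
    // Final execute for the remaining rows
    if (count % 1000 > 0) {
        int[] result = ps.executeBatch();
        addResults(affectedRecords, result);
    }
} catch (SQLException se) {
    se.printStackTrace();
}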

Java bulk insertion loops take time (code attached)

Hi, I am new to Java, and I am inserting into the database using a loop over an array. It takes time. How would I insert the data into the DB as a bulk insertion? My code is here:
if (con != null)
{
    rs = dboperation.DBselectstatement(con, "select host_object_id from nagios_hosts where address='" + ip + "'");
    if (rs != null)
    {
        rs.next();
        String id = rs.getString(1);
        for (int i = 0; i < serviceArray.length; i++)
        {
            status.append(serviceArray[i] + "\n");
            dboperation.DbupdateStatement(DbAcess.getNagios_connection(), "insert into nagios_servicelist(service_name,host_object_id) values('" + serviceArray[i] + "','" + id + "')");
        }
    }
}
Don't go into detail about this code; I'll just say that I am getting the id from the first query in the "rs" ResultSet, and "serviceArray" holds the services that I want to insert into the DB. But the loop takes time. How can I do this array insertion into the database as a bulk insert?
Hoping to hear from you soon.
Thanks in advance.
You should use a JDBC batch insert for your purpose:
//Create a new statement
Statement st = con.createStatement();
//Add SQL statements to be executed
st.addBatch("insert into nagios_servicelist(service_name,host_object_id) values('"+serviceArray[0]+"','"+id+"')");
st.addBatch("insert into nagios_servicelist(service_name,host_object_id) values('"+serviceArray[1]+"','"+id+"')");
st.addBatch("insert into nagios_servicelist(service_name,host_object_id) values('"+serviceArray[2]+"','"+id+"')");
// Execute the statements in batch
st.executeBatch();
You can insert your own logic here. But this is the overview of how this is to be done.
The following code avoids out-of-memory errors as well as SQL injection:
String sql = "insert into employee (name, city, phone) values (?, ?, ?)";
Connection connection = getConnection();
PreparedStatement ps = connection.prepareStatement(sql);

final int batchSize = 1000;
int count = 0;
for (Employee employee : employees) {
    ps.setString(1, employee.getName());
    ps.setString(2, employee.getCity());
    ps.setString(3, employee.getPhone());
    ps.addBatch();
    if (++count % batchSize == 0) {
        ps.executeBatch();
    }
}
ps.executeBatch(); // insert the remaining records
ps.close();
connection.close();

class java.lang.OutOfMemoryError while saving data in Oracle database

I have an Excel sheet with about 25,000 rows. Each row in the Excel sheet will be a row in my table as well. I tried to do the following and it just keeps giving me an out-of-memory error. I tried changing the batchSize from 25 to 50, 100, and 500. None of them works. Can anyone tell me what I am doing wrong? Changing the heap size of the JVM is not an option for me.
public void saveForecast(List list) throws FinderException {
    final Session session = getCurrentSession();
    final int batchSize = 25;
    Connection con = null;
    PreparedStatement pstmt = null;
    Iterator iterator = list.iterator();
    int rowCount = list.size();
    String sqlStatement = "INSERT INTO DMD_VOL_UPLOAD (ORIGIN, DESTINATION, DAY_OF_WEEK, EFFECTIVE_DATE, DISCONTINUE_DATE, VOLUME)";
    sqlStatement += " VALUES(?, ?, ?, ?, ?, ?)";
    System.out.println(sqlStatement);
    System.out.println("Number of rows to be inserted: " + rowCount);
    System.out.println("Starting time: " + new Date().toString());
    try {
        con = session.connection();
        for (int i = 0; i < rowCount; i++) {
            ForecastBatch forecastBatch = (ForecastBatch) iterator.next();
            pstmt = con.prepareStatement(sqlStatement);
            pstmt.setString(1, forecastBatch.getOrigin());
            pstmt.setString(2, forecastBatch.getDestination());
            pstmt.setInt(3, forecastBatch.getDayOfWeek());
            java.util.Date effJavaDate = forecastBatch.getEffectiveDate();
            java.sql.Date effSqlDate = new java.sql.Date(effJavaDate.getTime());
            pstmt.setDate(4, effSqlDate);
            java.util.Date disJavaDate = forecastBatch.getDiscontinueDate();
            java.sql.Date disSqlDate = new java.sql.Date(disJavaDate.getTime());
            pstmt.setDate(5, disSqlDate);
            pstmt.setInt(6, forecastBatch.getVolumeSum());
            pstmt.addBatch();
            if (i % batchSize == 0) {
                pstmt.executeBatch();
                session.flush();
                session.clear();
            }
        }
        pstmt.executeBatch();
        pstmt.close();
        System.out.println("Ending Time: " + new Date().toString());
    } catch (SQLException e) {
        e.printStackTrace();
        throw new FinderException(e);
    } finally {
        HibernateUtil.closeSession();
    }
}
You are creating a new statement inside your loop but only closing the last statement after the loop ends. That means you're actually creating 25,000 statements and closing only a single one, leaving 24,999 statements open, which I'm not surprised is causing you to run out of resources.
Furthermore, you're not using batch statements correctly (you'd have to create the statement once, then set the parameters, call addBatch, set more parameters, call addBatch again, and so on, then call executeBatch when you want to submit all the values in the batch).
EDIT:
You'll probably fix this by moving the prepareStatement call to just before the for loop; I don't think calling session flush/clear is necessary either.
Your main problem seems to be that you're re-preparing the statement for every single row, which likely consumes a huge amount of memory. You should prepare the statement once.
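A minimal sketch of that fix, keeping the question's own types and variables (only the statement handling changes; everything else stays as in the original):
// Sketch: prepare the statement once, bind and addBatch per row, execute in chunks.
pstmt = con.prepareStatement(sqlStatement); // moved out of the loop
for (int i = 0; i < rowCount; i++) {
    ForecastBatch forecastBatch = (ForecastBatch) iterator.next();
    pstmt.setString(1, forecastBatch.getOrigin());
    // ... set the remaining parameters exactly as before ...
    pstmt.addBatch();
    if (i % batchSize == 0) {
        pstmt.executeBatch();
    }
}
pstmt.executeBatch(); // flush the remaining rows
pstmt.close();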
