I am using Oracle Database 12c with oracle.jdbc.driver.OracleDriver.
How does batch insert work? I know that it groups statements, but what exactly happens during preparedStatement.executeBatch? Is it executing only 1 insert per batch?
Which approach for executing batches is better? This one, with executeBatch outside the loop:
PreparedStatement ps = c.prepareStatement("insert into some_tab(id, val) values (?, ?)");
for (int i = 0; i < 3000; i++) {
    ps.setLong(1, i);
    ps.setString(2, "value" + i);
    if ((i + 1) % 3 == 0) {
        ps.addBatch();
    }
}
ps.executeBatch();
Or this one, with execution inside the loop:
PreparedStatement ps = c.prepareStatement("insert into some_tab(id, val) values (?, ?)");
for (int i = 0; i < 3000; i++) {
    ps.setLong(1, i);
    ps.setString(2, "value" + i);
    ps.addBatch();
    if ((i + 1) % 3 == 0) {
        ps.executeBatch();
    }
}
ps.executeBatch();
Which approach for executing batches is better?
PreparedStatement.addBatch inserts the current parameters into the batch:
Adds a set of parameters to this PreparedStatement object's batch of commands.
PreparedStatement.executeBatch sends the current batch to the DB:
Submits a batch of commands to the database for execution
So your two snippets don't have the same logic. The first will only add 1/3 of the insert queries into the batch. The second will execute the batch every 3 iterations.
I would suggest a mix of both: add the parameters to the batch on every iteration, and every N iterations, execute the batch:
while (...) {
    ...
    ps.addBatch();
    if ((i + 1) % 3 == 0) { // 3 is just the example, this can be much higher
        ps.executeBatch();
    }
}
//Send the rest if the loop ended with `(i + 1) % 3 != 0`
ps.executeBatch();
Note that a batch size of 3 is probably not necessary; you can increase the value drastically. But I don't really know a way to "estimate" an efficient batch size; I usually use batches of 1000 items, but don't take that for granted...
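Putting that together, here is a minimal, self-contained sketch of the suggested pattern against the some_tab table from the question (the batch size of 1000 and the loop bound of 3000 are only illustrative; c is an open java.sql.Connection as in your snippets):
// Sketch: add every row to the batch, flush every 1000 rows, then flush the remainder.
try (PreparedStatement ps = c.prepareStatement("insert into some_tab(id, val) values (?, ?)")) {
    final int batchSize = 1000; // illustrative; tune for your workload
    for (int i = 0; i < 3000; i++) {
        ps.setLong(1, i);
        ps.setString(2, "value" + i);
        ps.addBatch();                      // every row goes into the batch
        if ((i + 1) % batchSize == 0) {
            ps.executeBatch();              // flush a full batch to the database
        }
    }
    ps.executeBatch();                      // flush whatever is left over
}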
what exactly is happening during preparedStatement.executeBatch? Is it executing only 1 insert per batch?
I always picture a batch like a package:
1. you write a label with the address (create the PreparedStatement)
2. you take a box and stick the address on it
3. you fill the box with parameters (multiple addBatch)
4. when the box is full, you send it (executeBatch)
5. if there are still items to send, restart at point 2
The idea is to limit the number of round trips between the JDBC driver and the DB by using a package/batch.
How this works under the hood is driver-specific and should not really be a concern.
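One detail that is part of the JDBC API itself and answers the "only 1 insert per batch?" part: executeBatch returns one update count per addBatch call, so every queued insert is executed when the batch is submitted. A small sketch (conn is an assumed open connection; the table is the one from the question):
// Sketch: executeBatch returns an int[] with one entry per statement queued via addBatch.
try (PreparedStatement ps = conn.prepareStatement("insert into some_tab(id, val) values (?, ?)")) {
    for (int i = 0; i < 5; i++) {
        ps.setLong(1, i);
        ps.setString(2, "value" + i);
        ps.addBatch();
    }
    int[] counts = ps.executeBatch();
    System.out.println("statements executed in this batch: " + counts.length); // 5
}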
Related
Which one will give me better performance?
Have Java simply loop over the values, append them to the SQL string, and execute the statement at once (note that a PreparedStatement is also used):
INSERT INTO tbl ( c1 , c2 , c3 )
VALUES ('r1c1', 'r1c2', 'r1c3'),
('r2c1', 'r2c2', 'r2c3'),
('r3c1', 'r3c2', 'r3c3')
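For completeness, a sketch of that first option built with bind placeholders rather than literal values (the tbl / c1..c3 names come from the example above; the fixed row count of 3 and the connection variable are only for illustration):
// Sketch: one multi-row INSERT built from repeated (?, ?, ?) groups, sent as a single statement.
int rows = 3; // illustrative
StringBuilder sql = new StringBuilder("INSERT INTO tbl (c1, c2, c3) VALUES ");
for (int r = 0; r < rows; r++) {
    sql.append(r == 0 ? "(?, ?, ?)" : ", (?, ?, ?)");
}
try (PreparedStatement ps = connection.prepareStatement(sql.toString())) {
    int p = 1;
    for (int r = 1; r <= rows; r++) {
        ps.setString(p++, "r" + r + "c1");
        ps.setString(p++, "r" + r + "c2");
        ps.setString(p++, "r" + r + "c3");
    }
    ps.executeUpdate(); // all three rows in one round trip
}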
Or use batch execution as below:
String SQL_INSERT = "INSERT INTO tbl (c1, c2, c3) VALUES (?, ?, ?);";
try (
    Connection connection = database.getConnection();
    PreparedStatement statement = connection.prepareStatement(SQL_INSERT);
) {
    int i = 0;
    for (Entity entity : entities) {
        statement.setString(1, entity.getSomeProperty());
        // ...
        statement.addBatch();
        i++;
        if (i % 1000 == 0 || i == entities.size()) {
            statement.executeBatch(); // Execute every 1000 items.
        }
    }
}
I did a presentation a few years ago that I called Load Data Fast!, in which I compared many different methods of inserting data as fast as possible and benchmarked them.
LOAD DATA INFILE was much faster than any other method.
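(For reference, a minimal sketch of driving LOAD DATA LOCAL INFILE from JDBC; this assumes a MySQL Connector/J URL with allowLoadLocalInfile=true, a server with local_infile enabled, and a purely illustrative file path, delimiter, and table name:)
// Sketch: bulk-load a CSV file through a plain Statement instead of batched INSERTs.
String url = "jdbc:mysql://localhost:3306/test?allowLoadLocalInfile=true"; // illustrative URL
try (Connection conn = DriverManager.getConnection(url, "user", "password");
     Statement st = conn.createStatement()) {
    st.execute("LOAD DATA LOCAL INFILE '/tmp/rows.csv' "
             + "INTO TABLE tbl "
             + "FIELDS TERMINATED BY ',' "
             + "LINES TERMINATED BY '\\n'");
}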
But there are other factors that affect the speed, like the type of data, the type of hardware, and perhaps the load on the system from other concurrent clients of the database. The results I got only describe the performance on a MacBook Pro.
Ultimately, you need to test your specific case on your server to get the most accurate answer.
This is what being a software engineer is about. You don't always get the answers spoon-fed to you. You have to do some testing to confirm them.
So I have created a simple program to insert rows into my database (MySQL) table. I am inserting the records in batches of 1,000. It works perfectly fine, but while it is still inserting the first batch of 1,000, I am already able to check my table and see records in it (<1,000).
I thought inserting by batch meant that the table would only be populated once the first 1,000 records had completed, and that if I checked the table before the first batch finished it would appear empty. That's how I believed it should work.
Code (snippet):
for (int i = 1; i < words.length; i++) {
    preparedStatement.setString(1, path);
    preparedStatement.setString(2, words[i]);
    preparedStatement.addBatch();
    if (i % 1000 == 0) {
        preparedStatement.executeBatch();
        System.out.print("Add Thousand");
    }
}
if (words.length % 1000 > 0) {
    preparedStatement.executeBatch();
    System.out.print("Add Remaining");
}
Is this just how MySQL behaves: even though it is inserting in batches, it "appears" to insert row by row as soon as the batch begins?
Shouldn't the table appear empty until the first batch of 1000 has been executed?
Can anyone explain what is going on?
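(A likely factor worth ruling out, offered only as a sketch: if the connection is in the default auto-commit mode, every executeBatch is committed immediately, so its rows become visible right away. Assuming an InnoDB table and the same preparedStatement / words / path variables as above, plus a connection reference, turning auto-commit off keeps the rows invisible to other sessions until the final commit:)
// Sketch: disable auto-commit so batches only become visible to other sessions at commit time.
connection.setAutoCommit(false);
try {
    for (int i = 1; i < words.length; i++) {
        preparedStatement.setString(1, path);
        preparedStatement.setString(2, words[i]);
        preparedStatement.addBatch();
        if (i % 1000 == 0) {
            preparedStatement.executeBatch(); // sent to the server, but not yet committed
        }
    }
    preparedStatement.executeBatch();         // flush the remainder
    connection.commit();                      // everything becomes visible here
} catch (SQLException e) {
    connection.rollback();
    throw e;
}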
This may be asked a lot, but I am trying to insert 4 million records into a database using Java.
I did a lot of googling and tried Access and MySQL; both were almost the same.
With MySQL I tried statement.addBatch(), but it still takes forever.
The question is: what is the best time I can get, and what is the best way?
counter++;
String sqlQuery = "INSERT INTO employees VALUES("some query")";
sqlState.addBatch(sqlQuery);
if (counter == 1000) {
    sqlState.executeBatch();
    counter = 0;
}
Also, am I using the batch right?
Reuse a single PreparedStatement, set its parameters for each record, then add it to the batch.
PreparedStatement ps = conn.prepareStatement("Insert into employees (f1, f2, f3) VALUES (?, ?, ?)");
while (hasMoreRecords) {
    ps.setString(1, "v1");
    ps.setString(2, "v2");
    ....
    ps.addBatch();
    if (++i % 1000 == 0) {
        ps.executeBatch();
    }
}
ps.executeBatch();
This won't make a huge difference, but it is the optimal approach.
Does your INSERT use a sub-query? Maybe that sub-query is slow.
I have a prepared statement like so
insert into mytable (id, name) values (?,?) , (?,?);
I am using multiple rows per PreparedStatement because I was seeing massive speed gains.
Now, if I have an odd number of rows to insert, preparedStatement.executeBatch() does not insert any rows into the DB. It does not throw any error.
Here is how I insert the values:
int count = 0;
for (int i = 0; i < size; i++) {
    statement.setObject(1, id[i]);
    statement.setObject(2, name[i]);
    // second row
    if (i + 1 != size) {
        statement.setObject(1, id[i+1]);
        statement.setObject(2, name[i+1]);
    }
    statement.addBatch();
    if (count % 200 == 0 && count > 0) {
        statement.executeBatch();
    }
}
statement.executeBatch();
What can I do to make it work?
You can do this automatically using the "rewriteBatchedStatements" option in the MySQL driver. You can write a single-row insert statement and execute it as a batch, and the driver will rewrite it for you automatically to execute in as few round trips as possible. See http://dev.mysql.com/doc/connector-j/en/connector-j-reference-configuration-properties.html
With this solution, you do not have to use the multiple row form of INSERT.
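A minimal sketch of what that looks like (the JDBC URL and credentials are placeholders; rewriteBatchedStatements=true is the Connector/J property described above, and size / id[] / name[] are taken from the question):
// Sketch: a plain single-row INSERT, batched; Connector/J rewrites it into multi-row INSERTs.
String url = "jdbc:mysql://localhost:3306/test?rewriteBatchedStatements=true"; // placeholder URL
try (Connection conn = DriverManager.getConnection(url, "user", "password");
     PreparedStatement ps = conn.prepareStatement("insert into mytable (id, name) values (?, ?)")) {
    for (int i = 0; i < size; i++) {
        ps.setObject(1, id[i]);
        ps.setObject(2, name[i]);
        ps.addBatch();
        if ((i + 1) % 200 == 0) {
            ps.executeBatch();
        }
    }
    ps.executeBatch(); // remainder, so an odd row count is handled as well
}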
I have a SQL query as shown below.
SELECT O_DEF,O_DATE,O_MOD from OBL_DEFINITVE WHERE OBL_DEFINITVE_ID =?
A collection of IDs is passed to this query, which is run as a batch query. It executes 10,000 times to retrieve values from the database (someone else's mess).
public static Map getOBLDefinitionsAsMap(Collection oblIDs)
    throws java.sql.SQLException
{
    Map retVal = new HashMap();
    if (oblIDs != null && (!oblIDs.isEmpty()))
    {
        BatchStatementObject stmt = new BatchStatementObject();
        stmt.setSql("SELECT O_DEF,O_DATE,O_MOD from OBL_DEFINITVE WHERE OBL_DEFINITVE_ID=?");
        stmt.setParameters(
            PWMUtils.convertCollectionToSubLists(oblIDs, 1));
        stmt.setResultsAsArray(true);
        QueryResults rows = stmt.executeBatchSelect();
        int rowSize = rows.size();
        for (int i = 0; i < rowSize; i++)
        {
            QueryResults.Row aRow = (QueryResults.Row) rows.getRow(i);
            CoblDefinition ctd = new CoblDefinition(aRow);
            retVal.put(aRow.getLong(0), ctd);
        }
    }
    return retVal;
}
Now we have identified that the query can be modified to
SELECT O_DEF,O_DATE,O_MOD from OBL_DEFINITVE WHERE OBL_DEFINITVE_ID in (???)
so that we can reduce it to 1 query.
The problem here is that MSSQL Server throws the exception:
Prepared or callable statement has more than 2000 parameter
And we are stuck here. Can someone provide a better alternative to this?
There is a maximum number of allowed parameters, let's call it n. You can do one of the following:
If you have m*n + k parameters, you can create m batches (or m+1 batches, if k is not 0). If you have 10000 parameters and 2000 is the maximum allowed parameters, you will only need 5 batches.
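A minimal sketch of that chunking idea with plain JDBC (it bypasses the BatchStatementObject wrapper from the question; the 2000 limit, the connection variable, and adding OBL_DEFINITVE_ID to the select list so the map can be keyed are all assumptions for illustration):
// Sketch: split the IDs into chunks of at most 2000 and run one IN (...) query per chunk.
Map<Long, Object[]> results = new HashMap<>();
List<Long> ids = new ArrayList<Long>(oblIDs);        // assuming oblIDs is a Collection of Longs
final int maxParams = 2000;                           // assumed parameter limit
for (int from = 0; from < ids.size(); from += maxParams) {
    List<Long> chunk = ids.subList(from, Math.min(from + maxParams, ids.size()));
    String placeholders = String.join(",", Collections.nCopies(chunk.size(), "?"));
    String sql = "SELECT O_DEF, O_DATE, O_MOD, OBL_DEFINITVE_ID FROM OBL_DEFINITVE"
               + " WHERE OBL_DEFINITVE_ID IN (" + placeholders + ")";
    try (PreparedStatement ps = connection.prepareStatement(sql)) {
        for (int i = 0; i < chunk.size(); i++) {
            ps.setLong(i + 1, chunk.get(i));
        }
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                results.put(rs.getLong("OBL_DEFINITVE_ID"),
                        new Object[] { rs.getObject("O_DEF"), rs.getObject("O_DATE"), rs.getObject("O_MOD") });
            }
        }
    }
}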
Another solution is to generate the query string in your application and add the parameters as strings. This way you will run your query only once. This is an obvious optimization in speed, but you will have a query string generated in your application. You would build your where clause like this:
String myWhereClause = "where TaskID = " + taskIDs[0];
for (int i = 1; i < numberOfTaskIDs; i++)
{
    myWhereClause += " or TaskID = " + taskIDs[i];
}
It looks like you are using your own wrapper around PreparedStatement and addBatch(). You are clearly reaching a limit on how many statements/parameters can be batched at once. You will need to call executeBatch periodically (e.g. every 100 or 1000 statements), instead of letting the batch build up until the limit is reached.
Edit: based on the comment below I reread the problem. The solution: make sure you use fewer than 2000 parameters when building the query, breaking it up into two or more queries if necessary.