We are using JDBC batch updates (Statement's void addBatch(String sql) and int[] executeBatch()) in our Java code. The job is supposed to insert about 27k records into a table and then update about 18k records in a subsequent batch.
When our job runs at 6am, it is missing a few thousand records (we observed this from the database audit logs). We can see from the job logs that the update statements are being generated for all 18k records. We understand that all the update statements get added to the batch in sequence; however, only records from the beginning of the batch seem to be missing. Also, it is not a fixed number every day - one day it skips the first 4534 update statements, another day the first 8853 records, and another day 5648 records.
We initially thought this could be a threading issue, but have since moved away from that theory because the block being skipped does not always contain the same number of update statements. If the first few thousand updates were happening even before the insert, they should at least show up in the database audit logs. However, this is not the case.
We are now thinking this is due to a memory/heap issue, since running the job at any other time picks up all 18k update statements and they execute successfully. We reviewed the Oracle database audit logs and noticed that the missing update statements are never executed against the table during the 6am run; at any other time, all the update statements show up in the audit logs.
This job had been running successfully for almost 3 years, and this behavior started only a few weeks ago. We tried to look for any changes to the server/environment, but nothing jumps out at us.
We are trying to pinpoint why this is happening: specifically, whether there are any processes using up too much of the JVM heap, causing our update statements to be overwritten or not executed.
Database: Oracle 11g Enterprise Edition Release 11.2.0.3.0 - 64bit
Java: java version "1.6.0_51"
Java(TM) SE Runtime Environment (build 1.6.0_51-b11)
Java HotSpot(TM) Server VM (build 20.51-b01, mixed mode)
public static void main(String[] args)
{
    DataBuffer dataBuffer; // assume that all the selected data to be updated is stored in this object
    List<String> transformedList = transform(dataBuffer);
    int status = bulkDML(transformedList);
}
public List<String> transform(DataBuffer i_SourceData)
{
    // i_SourceData has all the data selected from the source table that has to be updated
    List<Row> allRows = i_SourceData.getAllRows();
    List<String> allColumns = i_SourceData.getColumnNames();
    List<String> transformedList = new ArrayList<String>();
    for (Row row : allRows)
    {
        int index = allColumns.indexOf("unq_idntfr_col");
        String unqIdntfrVal = (String) row.getFieldValues().get(index);
        index = allColumns.indexOf("col1");
        String val1 = (String) row.getFieldValues().get(index);
        String query = "UPDATE TABLE SET col1 = " + val1 + " where unq_idntfr_col=" + unqIdntfrVal; // this query is not the issue either - it is parameterized in our code
        transformedList.add(query);
    }
    return transformedList;
}
public int bulkDML(List<String> i_QueryList)
{
    Connection connection = getConnection();
    Statement statement = getStatement(connection);
    try
    {
        connection.setAutoCommit(false);
        for (String query : i_QueryList)
        {
            statement.addBatch(query);
        }
        statement.executeBatch();
        connection.commit();
    }
    // handle various exceptions; all of them return -1
    // not pertinent to the issue at hand
    catch (Exception e)
    {
        return -1;
    }
    finally
    {
        CloseResources(connection, statement, null);
    }
    return 0;
}
Any suggestions would be greatly appreciated, thank you.
If you want to execute multiple updates on the same table, then I suggest modifying your query to use bind variables and a PreparedStatement, because that's really the only way to do real DML batching with the Oracle database. For example, your query would become:
UPDATE TABLE SET col1=? WHERE unq_idntfr_col=?
and then use JDBC batching with the same PreparedStatement. This change would require you to revisit your bulkDML method to make it take bind values as parameters instead of SQL strings.
The JDBC pseudo code would then look like this:
PreparedStatement pstmt = connection.prepareStatement("UPDATE TABLE SET col1=? WHERE unq_idntfr_col=?");
pstmt.setXXX(1, x);
pstmt.setYYY(2, y);
pstmt.addBatch();
pstmt.setXXX(1, x);
pstmt.setYYY(2, y);
pstmt.addBatch();
pstmt.setXXX(1, x);
pstmt.setYYY(2, y);
pstmt.addBatch();
pstmt.executeBatch();
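For reference, here is a minimal sketch of what the reworked bulkDML could look like, reusing the getConnection and CloseResources helpers from the question and assuming (for illustration only) that each batch entry is passed as a String pair of the new col1 value and the unique identifier; adjust the setter calls to your actual column types:

public int bulkDML(List<String[]> i_BindValues)
{
    Connection connection = getConnection();
    PreparedStatement pstmt = null;
    try
    {
        connection.setAutoCommit(false);
        pstmt = connection.prepareStatement("UPDATE TABLE SET col1 = ? WHERE unq_idntfr_col = ?");
        for (String[] pair : i_BindValues)
        {
            pstmt.setString(1, pair[0]); // new value for col1
            pstmt.setString(2, pair[1]); // unique identifier of the row to update
            pstmt.addBatch();
        }
        pstmt.executeBatch();
        connection.commit();
        return 0;
    }
    catch (Exception e)
    {
        return -1;
    }
    finally
    {
        CloseResources(connection, pstmt, null);
    }
}

Batching a single parameterized statement lets the driver send all the rows in far fewer round trips than executing 18k individual SQL strings.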
Related
I'm trying to move a large number of records from one MySQL instance to another inside RDS. They are in different VPCs and different AWS accounts, so I can't create a data pipeline that would do the copy for me.
I've written a quick java program that connects to both the import database and the export database and does the following:
query the import database for the highest value of table.primary_key with SELECT MAX(primary_key) FROM table
get a result set from the export table with SELECT * FROM table WHERE(primary_key > max_from_import) LIMIT 1000000
create a PreparedStatement object from the import connection and set its query string to INSERT INTO table (col1....coln) VALUES (?....n?)
iterate over the result set and set the prepared statement columns to the ones from the result cursor (with some minor manipulations to the data), call execute on the PreparedStatement object, clear its parameters, then move to the next result.
With this method I'm able to import around 100000 records an hour, but I know from this question that a way to optimize inserts is not to create a new query each time, but to append more data with each insert, i.e.
INSERT INTO table (col1...coln) VALUES (val1...valn), (val1...valn)....(val1...valn);
Does the jdbc driver know to do this, or is there some sort of optimization I can make on my end to improve insert run time?
UPDATE:
Both answers recommended using addBatch and executeBatch, as well as removing auto-commit. Removing auto-commit gave a slight improvement (10%); batching yielded a run time of less than 50% of that of the individual inserts.
You need to use batch inserts. Internally, Connector/J (the MySQL JDBC driver) can rewrite batched inserts into multi-value insert statements.
(Note that this is the default Connector/J behavior. You can add the option useServerPrepStmts=true to the JDBC URL to enable server-side prepared statements.)
The code looks like the following:
try (PreparedStatement stmt = connection.prepareStatement(sql)) {
    for (Object value : valueList) {
        stmt.clearParameters();
        stmt.setObject(1, value);
        stmt.addBatch();
    }
    stmt.executeBatch();
}
The code above will generate a multi-value insert:
INSERT INTO tablename(field) VALUES(value1), (value2), (value3) ...
First create a JDBC connection to the destination database and set its auto-commit property to false.
After that, in a loop, do the following:
Read N (for example 1000) rows from the source database and write them to the destination database.
After some number of inserts, commit the destination database connection.
Sample code to give you a better idea is given below:
Connection sourceCon = getSourceDbConnection();
Connection destCon = getDestinationDbConnection();
Statement statement = destCon.createStatement();
destCon.setAutoCommit(false);
int i = 0;
String query;
while ((query = getInsertQuery()) != null)
{
    statement.executeUpdate(query);
    i++;
    if (i % 10 == 0)
    {
        destCon.commit();
        i = 0;
    }
}
destCon.commit();
The getInsertQuery function should return a string in the INSERT INTO table (col1...coln) VALUES (val1...valn), (val1...valn)....(val1...valn); format.
It should also return null once all the tables have been processed.
If you are using PreparedStatements, you can use the addBatch and executeBatch methods: inside the loop, add values using addBatch, and after some number of inserts call executeBatch, as sketched below.
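A minimal sketch of that PreparedStatement variant, assuming (for illustration) a single-column insert into a hypothetical dest_table and a hypothetical nextValue() helper that returns null when the source rows are exhausted:

Connection destCon = getDestinationDbConnection();
destCon.setAutoCommit(false);
PreparedStatement pstmt = destCon.prepareStatement("INSERT INTO dest_table (col1) VALUES (?)");
int batchSize = 1000; // flush and commit every 1000 rows
int count = 0;
String value;
while ((value = nextValue()) != null) // hypothetical helper reading from the source database
{
    pstmt.setString(1, value);
    pstmt.addBatch();
    count++;
    if (count % batchSize == 0)
    {
        pstmt.executeBatch();
        destCon.commit();
    }
}
pstmt.executeBatch(); // flush whatever is left in the batch
destCon.commit();
pstmt.close();
destCon.close();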
I'm having a problem with MySQL queries taking a little too long to execute.
private void sqlExecute(String sql, Map<Integer, Object> params) {
try (Connection conn = dataSource.getConnection(); PreparedStatement statement = conn.prepareStatement(sql)) {
if (params.size() > 0) {
for (Integer key : params.keySet()) {
statement.setObject(key, params.get(key));
}
}
statement.executeUpdate();
} catch (SQLException e) {
e.printStackTrace();
}
}
I've narrowed the problem down to the executeUpdate() line specifically. Everything else runs smoothly, but this particular line (and executeQuery() as well) takes around 70ms to execute. This may not seem like an unreasonable amount of time, but this is currently a small test table with under 100 rows. Columns are indexed, so a typical query only looks at around 15 rows.
Ultimately however, we'll need to scan much larger tables with thousands of rows. Additionally, we're running numerous queries at a time (they can't really be batched because the results of each query are used for future queries), so all of those queries together are taking more like 7s.
Here's an example of a method for running a MySQL query:
public void addRating(String db, int user_id, int item_id) {
parameters = new HashMap<>();
parameters.put(1, user_id);
parameters.put(2, item_id);
sql = "INSERT IGNORE INTO " + db + " (user_id, item_id) VALUES (?, ?)";
sqlExecute(sql, parameters);
}
A couple of things to note:
The column indexing is probably not the problem. When running the same MySQL statements in our phpMyAdmin console, the execution time is more like 0.3ms.
Also of note is that the execution time is consistently 70ms, regardless of the actual MySQL statement.
I'm using connection pooling and wonder if this is possibly a source of the problem. In other tests, dataSource.getConnection() also takes about 70ms. I'm running this code locally.
The above example is for an INSERT using executeUpdate(), but the same problem happens for SELECT statements using executeQuery().
I have tried using /dev/urandom per Oracle's suggestion, but this made no difference.
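One way to narrow this down is to time connection acquisition and statement execution separately while reusing a single connection. A rough sketch follows; the table and column names are hypothetical placeholders, and dataSource is the same pool used above:

try (Connection conn = dataSource.getConnection()) {
    try (PreparedStatement stmt = conn.prepareStatement(
            "INSERT IGNORE INTO ratings (user_id, item_id) VALUES (?, ?)")) {
        for (int i = 0; i < 100; i++) {
            stmt.setInt(1, i);
            stmt.setInt(2, i);
            long start = System.nanoTime();
            stmt.executeUpdate();
            // Print the cost of execution alone, without connection acquisition
            System.out.println("executeUpdate took " + (System.nanoTime() - start) / 1000000 + " ms");
        }
    }
}

If the per-statement time drops well below 70ms once the connection is reused, the overhead is in obtaining a connection (or in per-connection setup) rather than in query execution itself.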
I need to insert a couple hundred million records into the MySQL db. I'm batch inserting 1 million at a time. Please see my code below. It seems to be slow. Is there any way to optimize it?
try {
    // Disable auto-commit
    connection.setAutoCommit(false);
    // Create a prepared statement
    String sql = "INSERT INTO mytable (xxx) VALUES (?)";
    PreparedStatement pstmt = connection.prepareStatement(sql);
    Object[] vals = set.toArray();
    for (int i = 0; i < vals.length; i++) {
        pstmt.setString(1, vals[i].toString());
        pstmt.addBatch();
    }
    // Execute the batch
    int[] updateCounts = pstmt.executeBatch();
    System.out.append("inserted " + updateCounts.length);
I had a similar performance issue with MySQL and solved it by setting the useServerPrepStmts and rewriteBatchedStatements properties in the connection URL.
Connection c = DriverManager.getConnection("jdbc:mysql://host:3306/db?useServerPrepStmts=false&rewriteBatchedStatements=true", "username", "password");
I'd like to expand on Bertil's answer, as I've been experimenting with the connection URL parameters.
rewriteBatchedStatements=true is the important parameter. useServerPrepStmts is already false by default, and even changing it to true doesn't make much difference in terms of batch insert performance.
Now I think it is time to explain how rewriteBatchedStatements=true improves performance so dramatically. It does so by rewriting prepared INSERT statements into multi-value inserts when executeBatch() is called (source). That means that instead of sending the following n INSERT statements to the MySQL server each time executeBatch() is called:
INSERT INTO X VALUES (A1,B1,C1)
INSERT INTO X VALUES (A2,B2,C2)
...
INSERT INTO X VALUES (An,Bn,Cn)
It would send a single INSERT statement :
INSERT INTO X VALUES (A1,B1,C1),(A2,B2,C2),...,(An,Bn,Cn)
You can observe this by turning on MySQL's general query log (SET GLOBAL general_log = 1), which logs to a file each statement sent to the MySQL server.
You can insert multiple rows with one INSERT statement; doing a few thousand at a time can greatly speed things up. That is, instead of doing e.g. 3 inserts of the form INSERT INTO tbl_name (a,b,c) VALUES(1,2,3);, you do INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(1,2,3),(1,2,3);. (It may be that JDBC's .addBatch() does a similar optimization now - the MySQL addBatch used to be entirely unoptimized and just issued individual queries anyway - I don't know if that's still the case with recent drivers.)
If you really need speed, load your data from a comma-separated file with LOAD DATA INFILE; we get around a 7-8x speedup doing that versus tens of millions of individual inserts. A JDBC sketch of this is shown below.
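A minimal sketch of running LOAD DATA through JDBC, assuming a hypothetical data.csv and the mytable name from the question; the client must allow it (e.g. allowLoadLocalInfile=true on the Connector/J URL), the server's local_infile setting must permit it, and the field/line terminators need to match your file:

try (Connection conn = DriverManager.getConnection(
         "jdbc:mysql://host:3306/db?allowLoadLocalInfile=true", "username", "password");
     Statement stmt = conn.createStatement()) {
    // Stream the CSV from the client machine into the table in one statement
    stmt.execute("LOAD DATA LOCAL INFILE 'data.csv' INTO TABLE mytable "
               + "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'");
}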
If:
It's a new table, or the amount to be inserted is greater than the already inserted data
There are indexes on the table
You do not need other access to the table during the insert
Then ALTER TABLE tbl_name DISABLE KEYS can greatly improve the speed of your inserts. When you're done, run ALTER TABLE tbl_name ENABLE KEYS to start rebuilding the indexes, which can take a while, but not nearly as long as maintaining them for every insert.
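A short sketch of wrapping the bulk load with those statements through JDBC (note that DISABLE KEYS mainly benefits non-unique indexes, and on some storage engines it is effectively a no-op):

try (Statement stmt = connection.createStatement()) {
    stmt.execute("ALTER TABLE tbl_name DISABLE KEYS"); // stop maintaining non-unique indexes during the load
    // ... run the batched inserts or LOAD DATA INFILE here ...
    stmt.execute("ALTER TABLE tbl_name ENABLE KEYS");  // rebuild the indexes once, at the end
}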
You may try using a DDBulkLoad object.
// Get a DDBulkLoad object
DDBulkLoad bulkLoad = DDBulkLoadFactory.getInstance(connection);
bulkLoad.setTableName("mytable");
bulkLoad.load("data.csv");
try {
    // Disable auto-commit
    connection.setAutoCommit(false);
    int maxInsertBatch = 10000;
    // Create a prepared statement
    String sql = "INSERT INTO mytable (xxx) VALUES (?)";
    PreparedStatement pstmt = connection.prepareStatement(sql);
    Object[] vals = set.toArray();
    int count = 0;
    for (int i = 0; i < vals.length; i++) {
        pstmt.setString(1, vals[i].toString());
        pstmt.addBatch();
        count++;
        // Flush the batch every maxInsertBatch rows instead of all at once
        if (count % maxInsertBatch == 0) {
            pstmt.executeBatch();
        }
    }
    // Execute whatever is left in the batch
    pstmt.executeBatch();
    System.out.append("inserted " + count);
Is there any way of 'previewing' SQL select statements?
What I'm trying to do is to get the names of the columns that are returned by a SQL statement without actually running the statement.
On application startup I need to know the column names; the problem is that some of the queries can run for a while.
ResultSetMetaData may help
You still have to execute the query to get the metadata, but you may be able to add a restriction to the WHERE clause which means it returns zero rows very quickly. For example, you could append and 1 = 0 to the WHERE clause.
The DBMS still has to do all the query parsing that it would normally do - it just means that the execution should hopefully finish very quickly, as in the sketch below.
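A minimal sketch of that trick, using a hypothetical query against a table foo and assuming con is an open Connection:

// Append a contradiction so the query is parsed and planned normally but returns no rows
String original = "select * from foo where bar = 42"; // hypothetical query
try (Statement stmt = con.createStatement();
     ResultSet rs = stmt.executeQuery(original + " and 1 = 0")) {
    ResultSetMetaData metaData = rs.getMetaData();
    for (int i = 1; i <= metaData.getColumnCount(); i++) {
        System.out.println(metaData.getColumnName(i));
    }
}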
You didn't mention your DBMS, but the following works with the Postgres and Oracle JDBC drivers. I didn't test any other.
// the statement is only prepared, not executed!
PreparedStatement pstmt = con.prepareStatement("select * from foo");
ResultSetMetaData metaData = pstmt.getMetaData();
for (int i=1; i <= metaData.getColumnCount(); i++)
{
System.out.println(metaData.getColumnName(i));
}
I need to get the result set from an executed batch:
String [] queries = {"create volatile table testTable as (select * from orders) with data;",
"select top 10 * from testTable;" ,
"drop table testTable" };
for (String query : queries) {
statement.addBatch(query);
}
statement.executeBatch();
Once I execute the batch, how can I get the result set from the select query?
In short, you should not. Plain multiple execute() calls should be used instead.
According to the javadoc of executeBatch(), it should not support the getResultSet()/getMoreResults() API.
Also, in JDBC™ 4.0 Specification #14.1.2
Only DDL and DML commands that return a simple update count may be
executed as part of a batch. The method executeBatch throws a
BatchUpdateException if any of the commands in the batch fail to
execute properly or if a command attempts to return a result set.
But some JDBC drivers might support it; try at your own risk.
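For the plain-execute() route, a minimal sketch using the same statement and queries array as in the question; execute() returns true when a statement produces a ResultSet, so only the SELECT yields rows:

for (String query : queries) {
    boolean hasResultSet = statement.execute(query);
    if (hasResultSet) {
        try (ResultSet rs = statement.getResultSet()) {
            while (rs.next()) {
                // read the SELECT's columns here, e.g. rs.getString(1)
            }
        }
    }
}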
I can think of 2 options off the top of my head.
1) As the other guy said...
"Plain multiple execute() should be used"
this way you can get the result set.
2) You can query again after you execute your batch and get the info from the db.
According to the Java 7 API, the executeBatch function doesn't return a ResultSet, but an array of integers. You can process those values based on the API to see which commands in the batch were successful.