I am inserting data into a Teradata table using the executeBatch method. Currently, if one insert in the batch fails, all the other inserts in the batch also fail and no records end up being inserted. How can I change this behaviour so that the other inserts in the batch succeed when one insert fails, and so that I have some ability to track the rejected records?
PS: I have ensured that TMODE is set to TERA and autocommit is enabled.
UPDATE:
Target table definition:
CREATE SET TABLE mydb.mytable ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
col1 INTEGER,
col2 VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL)
PRIMARY INDEX ( col1 );
Below is the sample Scala code. As you can see, this batch contains 5 insert statements. The first insert is set up to fail because it tries to insert NULL into a NOT NULL column (col2). The other 4 inserts don't have any issues and should succeed. But as you can see from the result below, all 5 inserts in the batch failed. Is there any way to make the other inserts succeed? As stated above, TMODE is TERA and autocommit is enabled. If there is no way other than re-submitting all failed queries individually, then we would have to reduce the batch size and settle for lower throughput.
import java.sql.{DriverManager, Types}
Class.forName("com.teradata.jdbc.TeraDriver")
val conn = DriverManager.getConnection("jdbc:teradata://teradata-server/mydb,tmode=TERA","username","password")
val insertSQL = "INSERT INTO mydb.mytable VALUES (?,?)"
val stmt = conn.prepareStatement(insertSQL)
stmt.setInt(1,1)
stmt.setNull(2,Types.VARCHAR) // Inserting Null here. This insert will fail
stmt.addBatch()
stmt.setInt(1,2)
stmt.setString(2,"XXX")
stmt.addBatch()
stmt.setInt(1,3)
stmt.setString(2,"YYY")
stmt.addBatch()
stmt.setInt(1,4)
stmt.setString(2,"ZZZ")
stmt.addBatch()
stmt.setInt(1,5)
stmt.setString(2,"ABC")
stmt.addBatch()
try {
val res = stmt.executeBatch()
println(res.mkString(","))
}
catch {
case th: BatchUpdateException => {
println(th.getUpdateCounts().mkString(","))
}
}
Result
-3,-3,-3,-3,-3
This is from Teradata's JDBC manual:
Beginning with Teradata Database 13.10 and Teradata JDBC Driver 13.00.00.16, PreparedStatement batch execution can return individual success and error conditions for each parameter set.
An application using the PreparedStatement executeBatch method must have a catch-block for BatchUpdateException and the application must examine the error code returned by the BatchUpdateException getErrorCode method.
PreparedStatement BatchUpdateException Handling
Executes a multi-statement request using a PreparedStatement batch request and demonstrates the handling of the PreparedStatement BatchUpdateException.
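Following that advice, here is a minimal catch-block sketch in Java (the Scala version is analogous); stmt is the PreparedStatement from the question, and the update-count constants come from java.sql.Statement. Whether individual parameter sets are reported as succeeded or failed depends on the database and driver versions quoted above.
try {
    int[] counts = stmt.executeBatch();
    // No exception: every parameter set succeeded; counts[i] is the row count for set i
} catch (BatchUpdateException bue) {
    int[] counts = bue.getUpdateCounts();
    for (int i = 0; i < counts.length; i++) {
        if (counts[i] == Statement.EXECUTE_FAILED) {            // -3: this parameter set was rejected
            System.out.println("parameter set " + i + " failed");
        } else if (counts[i] == Statement.SUCCESS_NO_INFO) {     // -2: succeeded, row count unknown
            System.out.println("parameter set " + i + " succeeded (no count)");
        } else {
            System.out.println("parameter set " + i + " inserted " + counts[i] + " row(s)");
        }
    }
    // The chained SQLExceptions describe why the failed sets were rejected
    SQLException cause = bue.getNextException();
    while (cause != null) {
        System.out.println(cause.getErrorCode() + ": " + cause.getMessage());
        cause = cause.getNextException();
    }
}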
Related
I'm trying to move a large number of records from one MySQL instance to another inside RDS. They are on different VPCs and in different AWS accounts, so I can't create a data pipeline that would do the copy for me.
I've written a quick java program that connects to both the import database and the export database and does the following:
query the import database for the highest value of table.primary_key with SELECT MAX(primary_key) FROM table
get a result set from the export table with SELECT * FROM table WHERE(primary_key > max_from_import) LIMIT 1000000
create a PreparedStatement object from the import connection and set the queryString to INSERT INTO table (col1....coln) VALUES (?....n?)
iterate over the result set and set the prepared statement columns to the ones from the result cursor (with some minor manipulations to the data), call execute on the PreparedStatement object, clear its parameters, then move to the next result.
With this method I'm able to see around 100000 records being imported per hour, but I know from this question that a way to optimize inserts is to not create a new query each time, but to append more data to each insert, i.e.
INSERT INTO table (col1...coln) VALUES (val1...valn), (val1...valn)....(val1...valn);
Does the jdbc driver know to do this, or is there some sort of optimization I can make on my end to improve insert run time?
UPDATE:
Both answers recommended using addBatch and executeBatch, as well as disabling auto-commit. Disabling auto-commit gave a slight improvement (10%); batching yielded a run time of less than 50% of that of the individual inserts.
You need to use batch inserts. Internally, Connector/J (the MySQL JDBC driver) can rewrite batched inserts into multi-value INSERT statements.
(Note that this rewrite is not on by default; you enable it by adding rewriteBatchedStatements=true to the JDBC URL. The separate useServerPrepStmts option controls whether server-side prepared statements are used.)
The code looks like the following:
try (PreparedStatement stmt = connection.prepareStatement(sql)) {
    for (String value : valueList) {
        stmt.clearParameters();
        stmt.setString(1, value);   // bind the value for this row
        stmt.addBatch();
    }
    stmt.executeBatch();
}
The code above will generate a multi-value insert:
INSERT tablename(field) VALUES(value1), (value2), (value3) ...
First create a JDBC connection to the destination database and set its auto-commit property to false.
After that, in a loop, do the following:
Read N rows (for example 1,000) from the source database and write them to the destination database.
After some number of inserts, commit the destination database connection.
Sample code to give you a better idea is given below:
Connection sourceCon = getSourceDbConnection();
Connection destCon = getDestinationDbConnection();
destCon.setAutoCommit(false);
Statement statement = destCon.createStatement();
int i = 0;
String query;
while ((query = getInsertQuery()) != null)
{
    statement.executeUpdate(query);
    i++;
    if (i % 10 == 0)
    {
        destCon.commit();
        i = 0;
    }
}
destCon.commit();
The getInsertQuery function should return a string in the INSERT INTO table (col1...coln) VALUES (val1...valn), (val1...valn)....(val1...valn); format.
Also, it should return null once all tables have been processed.
If you are using prepared statements, you can use the addBatch and executeBatch methods. Inside the loop, add values using addBatch; after some number of inserts, call executeBatch, as in the sketch below.
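For illustration, a rough sketch of that prepared-statement variant, assuming the destination connection destCon from above with auto-commit off; the table, columns, and the source result set sourceRs are placeholders:
String sql = "INSERT INTO dest_table (col1, col2) VALUES (?, ?)";
PreparedStatement ps = destCon.prepareStatement(sql);
int pending = 0;
while (sourceRs.next()) {               // sourceRs: rows read from the source database
    ps.setInt(1, sourceRs.getInt(1));
    ps.setString(2, sourceRs.getString(2));
    ps.addBatch();
    if (++pending % 1000 == 0) {        // flush and commit every 1000 rows
        ps.executeBatch();
        destCon.commit();
    }
}
ps.executeBatch();                      // flush the remaining rows
destCon.commit();
ps.close();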
I have a strange problem. I'm executing an insert using a prepared statement like this:
try (Connection connection = connectionPool.getConnection();
PreparedStatement ps = connection.prepareStatement(sql, Statement.RETURN_GENERATED_KEYS)) { //TODO: caching of PS
int i = 1;
ParameterMetaData pmd = ps.getParameterMetaData();
...
} catch (SQLException e) {
throw new TGFIOException("Error executing SQL command " + sql, e);
}
Insert statement is like this:
insert into dbo.CurrencyRates(RateDate, CurrencyID, Rate) values ( ?, ?, ? )
Unfortunately it fails with the following exception:
com.microsoft.sqlserver.jdbc.SQLServerException: com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near the keyword 'WHERE'.
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:190)
at com.microsoft.sqlserver.jdbc.SQLServerParameterMetaData.<init>(SQLServerParameterMetaData.java:426)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.getParameterMetaData(SQLServerPreparedStatement.java:1532)
at com.jolbox.bonecp.PreparedStatementHandle.getParameterMetaData(PreparedStatementHandle.java:246)
There is no WHERE in the statement, so I am puzzled why it fails on metadata extraction...
EDIT:
SQL Server = 10.50.2500.0 Express Edition,
Driver = sqljdbc4.jar from 4.0 package
Also, I am using getParameterMetaData because I need to set some params to null, and the preferred method is to use setNull(), which requires the SQL type.
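For context, this is roughly how the metadata is used (a sketch; value and i stand for whatever is being bound at that parameter position):
ParameterMetaData pmd = ps.getParameterMetaData();
if (value == null) {
    // setNull needs the SQL type of the parameter, which the metadata provides
    ps.setNull(i, pmd.getParameterType(i));
} else {
    ps.setObject(i, value);
}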
EDIT2:
I've tested with the sqljdbc41 driver from the newest 6.0 package - the results are the same.
EDIT3:
I've removed the call to getParameterMetaData() and it worked. Unfortunately, this is a generic piece of code that should be maximally portable, yet it does not work with this single table (inserts into other tables in the same database work fine!).
EDIT4:
I've tried different insert statements for this table and all of them work fine if I skip ps.getParameterMetaData(), and fail when I call it. With 2 or more params I get the usual 'near WHERE' error. With a one-column insert I get an error stating that the column name is incorrect, even though it is correct and the insert works perfectly fine without the metadata call. I will try to trace what the driver does underneath...
After tracing what the driver actually does (many thanks a_horse_with_no_name), I've come to a funny conclusion.
The solution for my question is to:
Replace the following insert statement
INSERT INTO CurrencyRates(RateDate, CurrencyID, Rate) VALUES ( ?, ?, ? )
With this statement
INSERT INTO CurrencyRates (RateDate, CurrencyID, Rate) VALUES ( ?, ?, ? )
The logic behind this is that the SQL Server driver does some metadata extraction in the background and builds a query containing the fragment ... FROM CurrencyRates(RateDate WHERE ... if you do not put a space after the table name, yet for ordinary execution the statement without the space is perfectly valid!
EDIT:
This is obviously an inconsistency, as (putting aside what actually counts as a valid insert) the driver should consistently accept or reject this query whether or not I ask for metadata.
I need to insert a couple hundred million records into a MySQL database. I'm batch inserting 1 million records at a time. Please see my code below. It seems to be slow. Is there any way to optimize it?
try {
// Disable auto-commit
connection.setAutoCommit(false);
// Create a prepared statement
String sql = "INSERT INTO mytable (xxx), VALUES(?)";
PreparedStatement pstmt = connection.prepareStatement(sql);
Object[] vals=set.toArray();
for (int i=0; i<vals.length; i++) {
pstmt.setString(1, vals[i].toString());
pstmt.addBatch();
}
// Execute the batch
int [] updateCounts = pstmt.executeBatch();
System.out.append("inserted "+updateCounts.length);
I had a similar performance issue with mysql and solved it by setting the useServerPrepStmts and the rewriteBatchedStatements properties in the connection url.
Connection c = DriverManager.getConnection("jdbc:mysql://host:3306/db?useServerPrepStmts=false&rewriteBatchedStatements=true", "username", "password");
I'd like to expand on Bertil's answer, as I've been experimenting with the connection URL parameters.
rewriteBatchedStatements=true is the important parameter. useServerPrepStmts is already false by default, and even changing it to true doesn't make much difference in terms of batch insert performance.
Now I think it is time to explain how rewriteBatchedStatements=true improves performance so dramatically. It does so by rewriting prepared INSERT statements into multi-value inserts when executeBatch() is called (Source). That means that instead of sending the following n INSERT statements to the MySQL server each time executeBatch() is called:
INSERT INTO X VALUES (A1,B1,C1)
INSERT INTO X VALUES (A2,B2,C2)
...
INSERT INTO X VALUES (An,Bn,Cn)
It would send a single INSERT statement :
INSERT INTO X VALUES (A1,B1,C1),(A2,B2,C2),...,(An,Bn,Cn)
You can observe this by turning on the MySQL general query log (SET global general_log = 1), which logs every statement sent to the MySQL server to a file.
You can insert multiple rows with one INSERT statement; doing a few thousand at a time can greatly speed things up. That is, instead of doing e.g. 3 inserts of the form INSERT INTO tbl_name (a,b,c) VALUES(1,2,3);, you do INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(1,2,3),(1,2,3); (It might be that JDBC's .addBatch() does a similar optimization now - though the MySQL addBatch used to be entirely un-optimized and just issued individual queries anyhow - I don't know if that's still the case with recent drivers.)
If you really need speed, load your data from a comma-separated file with LOAD DATA INFILE; we get around a 7-8 times speedup doing that vs. doing tens of millions of individual inserts.
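A fragment-style sketch of issuing that from JDBC, assuming Connector/J with local-infile loading permitted on both client and server; the file name, table, and columns are placeholders:
// allowLoadLocalInfile=true lets Connector/J send a local file; the server must also permit it
Connection conn = DriverManager.getConnection(
        "jdbc:mysql://host:3306/db?allowLoadLocalInfile=true", "username", "password");
try (Statement stmt = conn.createStatement()) {
    stmt.execute("LOAD DATA LOCAL INFILE 'data.csv' INTO TABLE tbl_name " +
                 "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n' (a, b, c)");
}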
If:
It's a new table, or the amount to be inserted is greater than the already inserted data
There are indexes on the table
You do not need other access to the table during the insert
Then ALTER TABLE tbl_name DISABLE KEYS can greatly improve the speed of your inserts. When you're done, run ALTER TABLE tbl_name ENABLE KEYS to start building the indexes, which can take a while, but not nearly as long as doing it for every insert.
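As a sketch, assuming the same connection and a placeholder table name, the bulk insert could be wrapped like this (note: DISABLE KEYS mainly affects non-unique indexes, and on some storage engines it has no effect):
try (Statement ddl = connection.createStatement()) {
    // Stop maintaining (non-unique) indexes while rows are loaded
    ddl.execute("ALTER TABLE tbl_name DISABLE KEYS");
}
// ... run the batched INSERTs here ...
try (Statement ddl = connection.createStatement()) {
    // Rebuild the indexes in one pass
    ddl.execute("ALTER TABLE tbl_name ENABLE KEYS");
}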
You may try using a DDBulkLoad object.
// Get a DDBulkLoad object
DDBulkLoad bulkLoad = DDBulkLoadFactory.getInstance(connection);
bulkLoad.setTableName("mytable");
bulkLoad.load("data.csv");
try {
// Disable auto-commit
connection.setAutoCommit(false);
int maxInsertBatch = 10000;
// Create a prepared statement
String sql = "INSERT INTO mytable (xxx), VALUES(?)";
PreparedStatement pstmt = connection.prepareStatement(sql);
Object[] vals=set.toArray();
int count = 1;
for (int i=0; i<vals.length; i++) {
pstmt.setString(1, vals[i].toString());
pstmt.addBatch();
if(count%maxInsertBatch == 0){
pstmt.executeBatch();
}
count++;
}
// Execute the batch
pstmt.executeBatch();
System.out.println("inserted " + (count - 1));
I need to get the result set from an executed batch :
String [] queries = {"create volatile table testTable as (select * from orders) with data;",
"select top 10 * from testTable;" ,
"drop table testTable" };
for (String query : queries) {
statement.addBatch(query);
}
statement.executeBatch();
Once I execute the batch, how can I get the result set from the select query?
In short, you should not. Plain multiple execute() calls should be used.
According to the javadoc of executeBatch(), it should not support the getResultSet()/getMoreResults() API.
Also, in the JDBC™ 4.0 Specification, #14.1.2:
Only DDL and DML commands that return a simple update count may be executed as part of a batch. The method executeBatch throws a BatchUpdateException if any of the commands in the batch fail to execute properly or if a command attempts to return a result set.
But some JDBC drivers might support it; try at your own risk.
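If your driver does not, a plain-execute sketch for the statements in the question (same connection, so the volatile table stays in the session) might look like this:
try (Statement statement = connection.createStatement()) {
    statement.execute("create volatile table testTable as (select * from orders) with data");
    try (ResultSet rs = statement.executeQuery("select top 10 * from testTable")) {
        while (rs.next()) {
            // read the columns you need from rs here
        }
    }
    statement.execute("drop table testTable");
}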
I can think of 2 options off the top of my head.
1) As the other guy said...
"Plain multiple execute() should be used"
this way you can get the result set.
2) you can query after you execute your batch and get its info from the db.
According to the Java 7 API, the executeBatch method doesn't return a ResultSet object but an array of integers. You can process those values, based on the API, to see which commands in the batch were successful.
I have a table named CUSTOMERS with the following columns :
CUSTOMER_ID (NUMBER), DAY(DATE), REGISTERED_TO(NUMBER)
There are more columns in the table but they are irrelevant to my question, as only the above columns are defined together as the primary key.
In our application we do a large number of inserts into this table, so we do not use MERGE but use the following statement:
INSERT INTO CUSTOMERS (CUSTOMER_ID , DAY, REGISTERED_TO)
SELECT ?, ?, ?
FROM DUAL WHERE NOT EXISTS
(SELECT NULL
FROM CUSTOMERS
WHERE CUSTOMER_ID = ?
AND DAY = ?
AND REGISTERED_TO = ?
)";
We use a PreparedStatement object using the batch feature to insert a large number of records collected through the flow of the application per customer.
Problem is that sometimes I get the following error :
ORA-00001: unique constraint (CUSTOMERS_PK) violated
The strange thing is that when I do NOT use batch inserts and insert each record one by one (by simply executing pstmt.execute()) there are no errors.
Is there something wrong with the insert statement? The JDBC driver? Am I not using the batch mechanism correctly?
Here is a semi-pseudo-code of my insertion loop :
pstmt = conn.prepareStatement(statement);
pstmt.setQueryTimeout(90);
for each customer :
- pstmt.setObject(1, customer id);
- pstmt.setObject(2, current day);
- pstmt.setObject(3, registered to);
- pstmt.addBatch();
end for
pstmt.executeBatch();
It is all enclosed in a try/catch/finally block making sure the statement and connection are closed at the end of this process.
I guess you are using several threads or processes in parallel, each doing inserts. In this case, Oracle's transaction isolation feature defeats your attempt to do the merge, because sometimes the following is bound to happen:
session A runs your statement, inserts a row (x,y,z)
session B runs the same statement, tries to insert row (x,y,z), gets a lock and waits
session A commits
session B receives the "unique constraint violated" error
That's because until session A commits, session B doesn't see the new row, so it tries to insert the same.