I'm trying to move a large number of records from one MySQL instance to another inside RDS. They are in different VPCs and different AWS accounts, so I can't create a data pipeline that would do the copy for me.
I've written a quick Java program that connects to both the import database and the export database and does the following:
query the import database for the highest value in table.primary_key with SELECT MAX(primary_key) FROM table
get a result set from the export table with SELECT * FROM table WHERE(primary_key > max_from_import) LIMIT 1000000
create a PreparedStatement object from the import connection and set the queryString to INSERT INTO table (col1....coln) VALUES (?....n?)
iterate over the result set and set the prepared statement columns to the ones from the result cursor (with some minor manipulations to the data), call execute on the PreparedStatement object, clear its parameters, then move to the next result.
With this method I'm able to see around 100000 records being imported an hour, but I know from this question that a way to optimize inserts is not to create a new query each time, but to append more data with each insert, i.e.:
INSERT INTO table (col1...coln) VALUES (val1...valn), (val1...valn)....(val1...valn);
Does the JDBC driver know to do this, or is there some optimization I can make on my end to improve insert run time?
UPDATE:
Both answers recommended using addBatch and executeBatch, as well as turning off auto-commit. Turning off auto-commit gave a slight improvement (10%); batching brought the run time down to less than 50% of that of the individual inserts.
You need to use batch inserts. Internally, Connector/J (the MySQL JDBC driver) can rewrite batch inserts into multi-value INSERT statements.
(Note that this rewriting is not the default Connector/J behavior; add the option rewriteBatchedStatements=true to the JDBC URL to enable it. You can also add
the option useServerPrepStmts=true to the JDBC URL to enable server-side prepared statements.)
The code looks like the following:
try (PreparedStatement stmt = connection.prepareStatement(sql)) {
    for (String value : valueList) {
        stmt.clearParameters();
        stmt.setString(1, value); // bind the single placeholder in sql
        stmt.addBatch();
    }
    stmt.executeBatch();
}
With rewriteBatchedStatements=true, the code above is sent to the server as a multi-value insert:
INSERT INTO tablename (field) VALUES (value1), (value2), (value3) ...
First, create a JDBC connection to the destination database and set its auto-commit property to false.
Then, in a loop, do the following:
Read N rows (for example, 1000) from the source database and write them to the destination database.
After some number of inserts, commit on the destination connection.
Sample code to illustrate the idea is given below:
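Connection sourceCon = getSourceDbConnection();
Connection destCon = getDestinationDbConnection();
destCon.setAutoCommit(false);
Statement statement = destCon.createStatement();
int i = 0;
String query;
// getInsertQuery() returns the next multi-row INSERT, or null when done
while ((query = getInsertQuery()) != null)
{
    statement.executeUpdate(query);
    i++;
    if (i % 10 == 0)
    {
        destCon.commit(); // commit every 10 statements
        i = 0;
    }
}
destCon.commit(); // commit whatever remains
statement.close();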
The getInsertQuery function should return a string in the format INSERT INTO table (col1...coln) VALUES (val1...valn), (val1...valn)....(val1...valn);
It should also return null once all tables are processed.
If you are using prepared statements, you can use the addBatch and executeBatch methods. Inside the loop, add each row's values with addBatch; after some number of inserts, call executeBatch.
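To illustrate the string format that getInsertQuery is expected to produce, here is a minimal sketch that builds a multi-row INSERT with ? placeholders instead of inlined values, so it can be used with a PreparedStatement. The names MultiRowInsertSketch and buildMultiRowInsert are made up for this example, and the table and column names are assumed to be trusted identifiers, since they are concatenated into the SQL as-is.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of JDBC or Connector/J.
final class MultiRowInsertSketch {

    /** Builds e.g. "INSERT INTO t (a, b) VALUES (?, ?), (?, ?)" for rowCount = 2. */
    static String buildMultiRowInsert(String table, List<String> columns, int rowCount) {
        if (columns.isEmpty() || rowCount < 1) {
            throw new IllegalArgumentException("need at least one column and one row");
        }
        // One "(?, ?, ...)" group with a placeholder per column.
        String group = "(" + String.join(", ", Collections.nCopies(columns.size(), "?")) + ")";
        // Repeat the group once per row.
        String values = String.join(", ", Collections.nCopies(rowCount, group));
        return "INSERT INTO " + table + " (" + String.join(", ", columns) + ") VALUES " + values;
    }
}
```

When binding, the parameter index for column c (0-based) of row r (0-based) would be r * columns.size() + c + 1, since the placeholders are laid out row by row.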
I am trying to incrementally read SQL Server CDC changes.
In my first interval, I query
Statement statement = connection.createStatement();
String queryString = "SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_mytable(sys.fn_cdc_get_min_lsn('dbo_mytable'), " +
"sys.fn_cdc_get_max_lsn(), 'all') ORDER BY __$seqval";
ResultSet rs = statement.executeQuery(queryString);
Now I know that __$start_lsn is an LSN (Log Sequence Number) stored as binary(10). However, I don't understand how to read it as a Java type so that I can include it in my next query, or how to write that next query so that min_lsn is the last LSN I processed.
You can use several options for retrieving the data from the ResultSet; for a binary column such as __$start_lsn, getBytes returns a byte[].
Then, to create the new query, look at using a PreparedStatement. There are several options for setting the data, based on the type that you pulled out of the initial query.
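As one possible sketch of that approach: read the LSN with getBytes, keep it as a byte[], and pass it back with setBytes. The class CdcLsnSketch and its methods are hypothetical names for this example. I am assuming the capture instance dbo_mytable from the question, and I have used sys.fn_cdc_increment_lsn so that the next read starts just after the last LSN already processed. I also order by __$start_lsn first so that the last row carries the highest LSN. As far as I know, SQL Server raises an error when the lower bound is beyond the upper bound, so real code would probably compare against sys.fn_cdc_get_max_lsn() before querying.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper for illustration only.
final class CdcLsnSketch {

    // Assumption: same capture instance as in the question (dbo_mytable).
    static final String NEXT_CHANGES_SQL =
        "SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_mytable("
        + "sys.fn_cdc_increment_lsn(?), sys.fn_cdc_get_max_lsn(), 'all') "
        + "ORDER BY __$start_lsn, __$seqval";

    /** Reads the changes after lastLsn and returns the newest __$start_lsn seen. */
    static byte[] readChangesAfter(Connection con, byte[] lastLsn) throws SQLException {
        byte[] newest = lastLsn;
        try (PreparedStatement ps = con.prepareStatement(NEXT_CHANGES_SQL)) {
            ps.setBytes(1, lastLsn); // binary(10) travels as a plain byte[]
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    newest = rs.getBytes("__$start_lsn");
                    // ... process the other columns of the change row here ...
                }
            }
        }
        return newest;
    }

    /** Hex form of an LSN, e.g. for logging or storing a checkpoint as text. */
    static String toHex(byte[] lsn) {
        StringBuilder sb = new StringBuilder(lsn.length * 2);
        for (byte b : lsn) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Inverse of toHex; expects an even number of hex digits. */
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }
}
```

The value returned by readChangesAfter can be stored between intervals (as bytes, or as text via toHex) and passed in again on the next run.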
We are trying to insert a record into a HANA database. We currently use JdbcTemplate and a KeyHolder to save data and retrieve the newly generated ID column. This works fine for Postgres but not for HANA.
query ="SELECT CURRENT_IDENTITY_VALUE() FROM \"HALOSYS\".\"HaloTestDemo\"";
resultSet = stmt.executeQuery(query);
The statement above returns the current identity value, but it doesn't fit our context, where we want to use JdbcTemplate.
Please give me an idea of how to fix this.
If we add a PreparedStatement with generated keys, I get the following exception:
com.sap.db.jdbc.exceptions.jdbc40.SQLFeatureNotSupportedException: Method prepareStatement( String, int )() of Connection is not supported.
Try selecting FROM DUMMY instead of from the table name, and do it in the same session as the insert. If you use connection pooling, that will be much harder.
Combine the INSERT and the SELECT CURRENT_IDENTITY_VALUE() from the same table into one statement, as an anonymous block like the one below:
DO BEGIN
INSERT INTO <TABLE_NAME> .....;
SELECT CURRENT_IDENTITY_VALUE() FROM <TABLE_NAME>;
END;
I need to insert a couple hundred million records into a MySQL database. I'm batch inserting them 1 million at a time. Please see my code below. It seems to be slow. Is there any way to optimize it?
try {
    // Disable auto-commit
    connection.setAutoCommit(false);
    // Create a prepared statement
    String sql = "INSERT INTO mytable (xxx) VALUES (?)";
    PreparedStatement pstmt = connection.prepareStatement(sql);
    Object[] vals = set.toArray();
    for (int i = 0; i < vals.length; i++) {
        pstmt.setString(1, vals[i].toString());
        pstmt.addBatch();
    }
    // Execute the batch
    int[] updateCounts = pstmt.executeBatch();
    System.out.println("inserted " + updateCounts.length);
    // Auto-commit is off, so commit explicitly
    connection.commit();
} catch (SQLException e) {
    e.printStackTrace();
}
I had a similar performance issue with MySQL and solved it by setting the useServerPrepStmts and rewriteBatchedStatements properties in the connection URL.
Connection c = DriverManager.getConnection("jdbc:mysql://host:3306/db?useServerPrepStmts=false&rewriteBatchedStatements=true", "username", "password");
I'd like to expand on Bertil's answer, as I've been experimenting with the connection URL parameters.
rewriteBatchedStatements=true is the important parameter. useServerPrepStmts is already false by default, and even changing it to true doesn't make much difference in terms of batch insert performance.
Now is a good time to explain why rewriteBatchedStatements=true improves performance so dramatically. The driver rewrites prepared INSERT statements into multi-value inserts when executeBatch() is called (Source). That means that instead of sending the following n INSERT statements to the MySQL server each time executeBatch() is called:
INSERT INTO X VALUES (A1,B1,C1)
INSERT INTO X VALUES (A2,B2,C2)
...
INSERT INTO X VALUES (An,Bn,Cn)
it sends a single INSERT statement:
INSERT INTO X VALUES (A1,B1,C1),(A2,B2,C2),...,(An,Bn,Cn)
You can observe this by turning on the MySQL general query log (SET GLOBAL general_log = 1), which logs to a file each statement sent to the MySQL server.
You can insert multiple rows with one INSERT statement; doing a few thousand at a time can greatly speed things up. That is, instead of doing, e.g., 3 inserts of the form INSERT INTO tbl_name (a,b,c) VALUES(1,2,3); you do INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(1,2,3),(1,2,3); (JDBC's .addBatch() may do a similar optimization now. The MySQL driver's addBatch used to be entirely unoptimized and just issued individual queries anyway; I don't know whether that's still the case with recent drivers.)
If you really need speed, load your data from a comma-separated file with LOAD DATA INFILE; we get around a 7-8x speedup doing that compared with doing tens of millions of inserts.
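A rough sketch of the LOAD DATA route from Java follows. Everything named here (LoadDataSketch, csvLine, writeCsv, loadDataSql, load) is made up for illustration. I am assuming the LOCAL variant, which reads the file from the client side; with Connector/J that needs allowLoadLocalInfile=true on the connection, and the server has to permit local_infile. The sketch also assumes values contain no backslashes or newlines and that the path and table name are trusted, since they are concatenated into the statement.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only.
final class LoadDataSketch {

    /** Quotes every field and doubles embedded double quotes. */
    static String csvLine(List<String> fields) {
        return fields.stream()
            .map(f -> "\"" + f.replace("\"", "\"\"") + "\"")
            .collect(Collectors.joining(","));
    }

    /** Writes one CSV line per row, terminated by '\n'. */
    static void writeCsv(Path file, List<List<String>> rows) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (List<String> row : rows) {
                out.write(csvLine(row));
                out.write('\n');
            }
        }
    }

    /** Builds the LOAD DATA statement matching the format written by writeCsv. */
    static String loadDataSql(String csvPath, String table) {
        return "LOAD DATA LOCAL INFILE '" + csvPath + "' INTO TABLE " + table
            + " FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'"
            + " LINES TERMINATED BY '\\n'";
    }

    /** Runs the load on an existing connection. */
    static void load(Connection con, String csvPath, String table) throws SQLException {
        try (Statement st = con.createStatement()) {
            st.execute(loadDataSql(csvPath, table));
        }
    }
}
```

Typical use would be to call writeCsv for a chunk of rows and then load with the same path and the target table name.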
If:
It's a new table, or the amount to be inserted is greater than the already inserted data
There are indexes on the table
You do not need other access to the table during the insert
Then ALTER TABLE tbl_name DISABLE KEYS can greatly improve the speed of your inserts. When you're done, run ALTER TABLE tbl_name ENABLE KEYS to start building the indexes, which can take a while, but not nearly as long as doing it for every insert. (DISABLE KEYS only affects non-unique indexes on MyISAM tables; InnoDB ignores it.)
You may try using a DDBulkLoad object.
// Get a DDBulkLoad object
DDBulkLoad bulkLoad = DDBulkLoadFactory.getInstance(connection);
bulkLoad.setTableName("mytable");
bulkLoad.load("data.csv");
try {
    // Disable auto-commit
    connection.setAutoCommit(false);
    int maxInsertBatch = 10000;
    // Create a prepared statement
    String sql = "INSERT INTO mytable (xxx) VALUES (?)";
    PreparedStatement pstmt = connection.prepareStatement(sql);
    Object[] vals = set.toArray();
    int count = 0;
    for (int i = 0; i < vals.length; i++) {
        pstmt.setString(1, vals[i].toString());
        pstmt.addBatch();
        count++;
        // Send a full batch to the server
        if (count % maxInsertBatch == 0) {
            pstmt.executeBatch();
        }
    }
    // Execute the remaining partial batch
    pstmt.executeBatch();
    // Auto-commit is off, so commit explicitly
    connection.commit();
    System.out.println("inserted " + count);
} catch (SQLException e) {
    e.printStackTrace();
}
Is there any way of 'previewing' SQL SELECT statements?
What I'm trying to do is get the names of the columns returned by a SQL statement without actually running the statement.
On application startup I need to know the column names; the problem is that some of the queries can run for a while.
ResultSetMetaData may help.
You still have to execute the query to get the metadata, but you may be able to add a restriction to the WHERE clause so that it returns zero rows very quickly. For example, you could append AND 1 = 0 to the WHERE clause.
The DBMS still has to do all the query parsing that it would normally do; it just means that the execution should finish very quickly.
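Here is a small sketch of that idea. Rather than editing the WHERE clause of each query, it wraps the whole query in a subquery with WHERE 1 = 0, which is my own variation on the suggestion above. ColumnPreviewSketch, zeroRowQuery and columnNames are made-up names. Whether the wrapped query is really cheap depends on the DBMS optimizer, and some systems may reject certain subqueries (for example ones with ORDER BY or with duplicate column names), so treat it as a starting point.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
final class ColumnPreviewSketch {

    /** Wraps a SELECT so that it can return no rows but keeps its column list. */
    static String zeroRowQuery(String query) {
        String trimmed = query.trim();
        // A trailing semicolon would break the subquery, so drop it.
        if (trimmed.endsWith(";")) {
            trimmed = trimmed.substring(0, trimmed.length() - 1).trim();
        }
        return "SELECT * FROM (" + trimmed + ") preview_q WHERE 1 = 0";
    }

    /** Executes the zero-row variant and reads the column labels from its metadata. */
    static List<String> columnNames(Connection con, String query) throws SQLException {
        List<String> names = new ArrayList<>();
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(zeroRowQuery(query))) {
            ResultSetMetaData md = rs.getMetaData();
            for (int i = 1; i <= md.getColumnCount(); i++) {
                names.add(md.getColumnLabel(i));
            }
        }
        return names;
    }
}
```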
You didn't mention your DBMS, but the following works with the Postgres and Oracle JDBC drivers. I didn't test any others.
// the statement is only prepared, not executed!
PreparedStatement pstmt = con.prepareStatement("select * from foo");
ResultSetMetaData metaData = pstmt.getMetaData();
for (int i=1; i <= metaData.getColumnCount(); i++)
{
System.out.println(metaData.getColumnName(i));
}