I'm trying to set up a while loop that inserts multiple rows into a MySQL table using the jdbc drivers in Java. The idea is that I end up with a statement along the lines of:
INSERT INTO table (column1, column2) VALUES (value1, value2), (value1, value2);
I want to set up this statement using a java.sql.PreparedStatement, but I'd like to prepare small bits of the statement, one row at a time - mainly because the number of entries will be dynamic, and this seems like the best way to create one big query.
This requires the small parts to be 'merged' together every time another chunk is generated. How do I merge these together? Or would you suggest forgetting this idea and simply executing thousands of separate INSERT statements?
Thank you,
Patrick
It depends somewhat on how often you plan to run this loop, but executing many statements is one of the exact purposes of prepared statements and stored procedures. Since the query does not have to be recompiled on each execution, you get potentially large performance gains when querying in a loop, compared to a plain SQL statement that must be parsed and planned on every iteration.
Those gains may still not match the performance of a single prepared statement built up into a long multi-row insert as you're describing, but the loop will be simpler to code. I would recommend staying with an execution loop unless the performance becomes problematic.
Better to prepare one PreparedStatement and reuse it as much as you want:
INSERT INTO table (column1, column2) VALUES (?, ?)
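For example, a minimal sketch of that reuse (table and column names are placeholders):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class ReuseSketch {
    // Prepare once, then bind and execute per row; the DB can reuse the plan.
    static void insertAll(Connection con, List<String[]> rows) throws SQLException {
        String sql = "INSERT INTO my_table (column1, column2) VALUES (?, ?)";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.executeUpdate(); // same compiled statement, new bind values
            }
        }
    }
}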
Requirement:
Have to insert a row into a database table multiple times (possibly 50,000 times in a batch job)
Database used: MS SQL Server
Approach taken:
Used NamedParameterJdbcTemplate and PreparedStatement to implement the above.
For example,
String query = "insert into table_Name (Column1, Column2, column3) values (?,?,?)";
Then, with the help of PreparedStatement, I assign dynamic values to the ? fields in the insert query and trigger the executeUpdate function of NamedParameterJdbcTemplate to execute it.
This process is repeated many times, each time creating a new insert query with different values bound to the ? fields via PreparedStatement.
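For illustration, the per-row call looks roughly like this (note that NamedParameterJdbcTemplate expects named parameters such as :c1 rather than ?; names here are placeholders):

import java.util.HashMap;
import java.util.Map;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;

public class RowAtATimeSketch {
    // One update() call per row, repeated ~50,000 times in the batch job.
    static void insertOne(NamedParameterJdbcTemplate jdbc, String v1, String v2, String v3) {
        String sql = "insert into table_Name (Column1, Column2, column3) values (:c1, :c2, :c3)";
        Map<String, Object> params = new HashMap<>();
        params.put("c1", v1);
        params.put("c2", v2);
        params.put("c3", v3);
        jdbc.update(sql, params);
    }
}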
Issue:
Performance of the executeUpdate function of NamedParameterJdbcTemplate is very slow.
After some research, I found the explanation below:
"The cost based optimizer (that 's what we 're talking about, not) makes its
choices based on the availability of indexes (among other objects), and the
distribution of values in the indexes (how selective the index will be for a
given value). Obviuosly, when working with bind variables, the suitability
of the index from a distribution point of view is harder to determine. The
optimizer has no way to determine beforehand to what value matches will be
sought. This might (should) lead to another execution plan. No surprise
here, as far as I am concerned."
https://bytes.com/topic/oracle/answers/65559-jdbc-oracle-beware-bind-variables
A PreparedStatement has two advantages over a regular Statement:
We add parameters to the SQL using methods instead of embedding them in the SQL query itself. This avoids SQL injection attacks and also lets the driver do type conversions for us.
The same PreparedStatement can be called with different parameters, and the database engine can reuse the query execution plan.
It seems that NamedParameterJdbcTemplate helps us with the first advantage, but does nothing for the latter.
Query:
If NamedParameterJdbcTemplate does not help us with the second advantage, then what is the alternative to NamedParameterJdbcTemplate?
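For what it's worth, NamedParameterJdbcTemplate does expose a batch API; a minimal sketch, assuming each row arrives as a map of column values:

import java.util.List;
import java.util.Map;
import org.springframework.jdbc.core.namedparam.MapSqlParameterSource;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;
import org.springframework.jdbc.core.namedparam.SqlParameterSource;

public class BatchInsertSketch {
    // One JDBC batch instead of 50,000 separate round trips.
    static int[] insertAll(NamedParameterJdbcTemplate jdbc, List<Map<String, Object>> rows) {
        String sql = "insert into table_Name (Column1, Column2, column3) values (:c1, :c2, :c3)";
        SqlParameterSource[] batch = rows.stream()
                .map(MapSqlParameterSource::new)
                .toArray(SqlParameterSource[]::new);
        return jdbc.batchUpdate(sql, batch);
    }
}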
I came across the below statement describing the performance improvement we get with the JDBC PreparedStatement class.
If you submit a new, full SQL statement for every query or update to the database, the database has to parse the SQL and, for queries, create a query plan. By reusing an existing PreparedStatement you can reuse both the SQL parsing and the query plan for subsequent queries. This speeds up query execution by decreasing the parsing and query-planning overhead of each execution.
Let's say I am creating the statement and providing different values when running the queries, like this:
String sql = "update people set firstname=? , lastname=? where id=?";
PreparedStatement preparedStatement =
connection.prepareStatement(sql);
preparedStatement.setString(1, "Gary");
preparedStatement.setString(2, "Larson");
preparedStatement.setLong (3, 123);
int rowsAffected = preparedStatement.executeUpdate();
preparedStatement.setString(1, "Stan");
preparedStatement.setString(2, "Lee");
preparedStatement.setLong (3, 456);
int rowsAffected = preparedStatement.executeUpdate();
Will I still get the performance benefit, given that I am setting different values, so the final query effectively changes based on those values?
Can you please explain exactly when we get the performance benefit? Do the values also have to be the same?
When you use a prepared statement (i.e. a pre-compiled statement), the DB compiles it as soon as it receives it and caches the compiled form, so it can reuse it for successive calls of the same statement. So it is pre-compiled for successive calls.
You generally use a prepared statement with bind variables, supplying the values at run time. On successive executions of the prepared statement you can provide values that differ from previous calls. From the DB's point of view, it does not have to compile the statement every time; it just plugs in the bind variables at run time. So it becomes faster.
Another advantage of prepared statements is protection against SQL injection attacks.
So no, the values do not have to be the same.
Although it is not obvious, SQL is not a scripting language but a "compiled" one, and this compilation (a.k.a. optimization, a.k.a. hard parse) is a very expensive task. Oracle has a lot of work to do: it must parse the query, resolve table names, validate access privileges, perform some algebraic transformations, and then find an efficient execution plan. Oracle (and other databases too) can join only TWO tables at a time, not more. This means that when you join several tables in SQL, Oracle has to join them one by one, i.e. if you join n tables in a query there can be up to n! possible execution plans. By default Oracle is limited to 8,000 permutations when searching for an "optimal" (not necessarily the best) execution plan.
So the compilation (hard parse) might be more expensive than the query execution itself. To spare resources, Oracle shares execution plans between sessions in a memory structure called the library cache. And here another problem can occur: too much parsing requires exclusive access to a shared resource.
So if you do too much (hard) parsing, your application cannot scale: sessions end up blocking each other.
On the other hand, there are situations where bind variables are NOT helpful.
Imagine such a query:
update people set firstname=? , lastname=? where group=? and deleted='N'
Since the column deleted is indexed and Oracle knows that 98% of its values are 'Y' and only 2% are 'N', it will decide to use the index on the deleted column. If you used a bind variable for the condition on the deleted column, Oracle could not find an effective execution plan, because the plan also depends on input that is unknown at compilation time.
(PS: since 11g it is more complicated with bind variable peeking)
I want the DBMS to help me gain speed when doing a lot of inserts.
Today I do an INSERT Query in Java and catch the exception if the data already is in the database.
The exception I get is :
SQLite Exception : [19] DB[1] exec() columns recorddate, recordtime are not unique.
If I get an exception I do a SELECT Query with the primary keys (recorddate, recordtime) and compare the result with the data I am trying to insert in Java. If it is the same I continue with next insert, otherwise I evaluate the data and decide what to save and maybe do an UPDATE.
This process takes time and I would like to speed it up.
I have thought of INSERT IF NOT EXIST, but this just ignores the insert if there is any data with the same primary keys, am I right? And I want to make sure it is exactly the same data before I ignore the insert.
I would appreciate any suggestions for how to make this faster.
I'm using Java to handle large amount of data to insert into a SQLite database (SQLite v. 3.7.10). As the connection between Java and SQLite I am using sqlite4java (http://code.google.com/p/sqlite4java/)
I do not think letting the DBMS handle more of that logic would be faster, at least not with plain SQL; as far as I can think of, there is no "create or update" there.
When handling lots of entries, latency is often an important issue, especially with DBs accessed via network, so you want to use mass operations wherever possible. Even if it were provided, a "create or update" instead of a select followed by an update or insert would at best only halve the latency.
I realize that is not what you asked for, but I would try to optimize in a different way: process chunks of data, select all of them into a map, then partition the input into creates, updates, and ignores. That way ignores are almost free, and further lookups are guaranteed to happen in memory. It is unlikely that the DBMS can be significantly faster.
If unsure whether that is the right approach for you, profiling the overhead should help.
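A rough sketch of that partitioning, with a hypothetical Row type keyed by (recorddate, recordtime):

import java.util.List;
import java.util.Map;

public class ChunkPartitionSketch {
    // Minimal stand-in for a table row; key() combines the primary-key columns.
    static final class Row {
        final String recorddate, recordtime, payload;
        Row(String date, String time, String payload) {
            this.recorddate = date; this.recordtime = time; this.payload = payload;
        }
        String key() { return recorddate + "|" + recordtime; }
        boolean sameData(Row other) { return payload.equals(other.payload); }
    }

    // existingByKey holds the chunk previously SELECTed from the DB.
    static void partition(Map<String, Row> existingByKey, List<Row> incoming,
                          List<Row> toInsert, List<Row> toUpdate) {
        for (Row row : incoming) {
            Row existing = existingByKey.get(row.key());
            if (existing == null) {
                toInsert.add(row);             // new key: plain INSERT
            } else if (!existing.sameData(row)) {
                toUpdate.add(row);             // key exists, data differs: evaluate/UPDATE
            }                                  // identical row: ignore, no DB access needed
        }
    }
}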
Wrap all of your inserts and updates into a transaction. In SQL this will be written as follows.
BEGIN;
INSERT OR REPLACE INTO Table(Col1,Col2) VALUES(Val1,Val2);
COMMIT;
There are two things to note here: database pages and commits are not written to disk until COMMIT is called, which speeds up your queries significantly; and the INSERT OR REPLACE syntax does precisely what you want for UNIQUE or PRIMARY KEY conflicts.
Most database wrappers have special syntax for managing transactions. Alternatively, you can execute BEGIN, follow it with your inserts and updates, and finish by executing COMMIT. Read your database wrapper's documentation.
One more thing you can do is switch to Write-Ahead Logging. Run the following command, only once, on the database.
PRAGMA journal_mode = wal;
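In plain JDBC terms the transaction wrapping looks roughly like this (a sketch; the sqlite4java API differs but follows the same shape, and table/column names are examples):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class TransactionSketch {
    static void insertAll(Connection con, List<String[]> rows) throws SQLException {
        con.setAutoCommit(false); // implicit BEGIN; nothing is flushed until commit
        String sql = "INSERT OR REPLACE INTO records (Col1, Col2) VALUES (?, ?)";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.executeUpdate();
            }
            con.commit(); // one COMMIT for the whole batch of inserts
        } catch (SQLException e) {
            con.rollback();
            throw e;
        }
    }
}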
Without further information, I would:
BEGIN;
UPDATE table SET othervalues=... WHERE recorddate=... AND recordtime=...;
INSERT OR IGNORE INTO table(recorddate, recordtime, ...) VALUES(...);
COMMIT;
The UPDATE will update all existing rows, ignoring nonexistent ones because of the WHERE clause.
The INSERT will then add new rows, ignoring existing ones because of OR IGNORE.
I'm trying to create a java program to cleanup and merge rows in my table. The table is large, about 500k rows and my current solution is running very slowly. The first thing I want to do is simply get an in-memory array of objects representing all the rows of my table. Here is what I'm doing:
pick an increment of say 1000 rows at a time
use JDBC to fetch a resultset on the following SQL query
SELECT * FROM TABLE WHERE ID > 0 AND ID < 1000
add the resulting data to an in-memory array
continue querying all the way up to 500,000 in increments of 1000, each time adding results.
This is taking way too long. In fact, it's not even getting past the second increment from 1000 to 2000. The query takes forever to finish (although when I run the same thing directly through a MySQL browser it's decently fast). It's been a while since I've used JDBC directly. Is there a faster alternative?
First of all, are you sure you need the whole table in memory? Maybe you should consider (if possible) selecting only the rows that you want to update/merge/etc. If you really have to have the whole table, you could consider using a scrollable ResultSet. You can create it like this:
// make sure autocommit is off (postgres)
con.setAutoCommit(false);
Statement stmt = con.createStatement(
        ResultSet.TYPE_SCROLL_INSENSITIVE, // or ResultSet.TYPE_FORWARD_ONLY
        ResultSet.CONCUR_READ_ONLY);
ResultSet srs = stmt.executeQuery("select * from ...");
It enables you to move to any row you want by using the absolute and relative methods.
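For example, once you have the scrollable ResultSet:

srs.absolute(1000);  // jump directly to row 1000
srs.relative(-10);   // move back ten rows from the current position
while (srs.next()) {
    // process rows forward from there
}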
One thing that helped me was Statement.setFetchSize(Integer.MIN_VALUE). I got this idea from Jason's blog. This cut execution time by more than half, and memory consumption went down dramatically (since only one row is read at a time).
This trick doesn't work for PreparedStatement, though.
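For reference, the usual streaming setup with MySQL Connector/J looks like this (a plain Statement, per the caveat above; the query is an example):

Statement stmt = con.createStatement(
        ResultSet.TYPE_FORWARD_ONLY,
        ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE); // signals Connector/J to stream rows one at a time
ResultSet rs = stmt.executeQuery("SELECT * FROM my_table");
while (rs.next()) {
    // only the current row is held in memory
}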
Although it's probably not optimal, your solution seems like it ought to be fine for a one-off database cleanup routine. It shouldn't take that long to run a query like that and get the results (I'm assuming that since it's a one-off, a couple of seconds would be fine). Possible problems:
is your network (or at least your connection to MySQL) very slow? If so, you could try running the process locally on the MySQL box, or somewhere better connected.
is there something in the table structure that's causing it? Pulling down 10k of data for every row? 200 fields? Calculating the id values to fetch based on a non-indexed column? You could try finding a more DB-friendly way of pulling the data (e.g. just the columns you need, having the DB aggregate values, etc.)
If you're not getting through the second increment, something is really wrong; efficient or not, you shouldn't have any problem dumping 2,000 or 20,000 rows into memory on a running JVM. Maybe you're storing the data redundantly or extremely inefficiently?
I've got an application that parses log files and inserts a huge amount of data into database. It's written in Java and talks to a MySQL database over JDBC. I've experimented with different ways to insert the data to find the fastest for my particular use case. The one that currently seems to be the best performer is to issue an extended insert (e.g. a single insert with multiple rows), like this:
INSERT INTO the_table (col1, col2, ..., colN) VALUES
(v1, v2, v3, ..., vN),
(v1, v2, v3, ..., vN),
...,
(v1, v2, v3, ..., vN);
The number of rows can be tens of thousands.
I've tried using prepared statements, but they're nowhere near as fast, probably because each insert is still sent to the DB separately and the table needs to be locked and whatnot. My colleague who worked on the code before me tried batching, but that didn't perform well enough either.
The problem is that using extended inserts means that, as far as I can tell, I need to build the SQL string myself (since the number of rows is variable), and that opens up all sorts of SQL injection vectors that I'm nowhere near clever enough to find myself. There's got to be a better way to do this.
Obviously I escape the strings I insert, but only with something like str.replace("\"", "\\\""); (repeated for ', ? and \), but I'm sure that isn't enough.
prepared statements + batch insert:
PreparedStatement stmt = con.prepareStatement(
"INSERT INTO employees VALUES (?, ?)");
stmt.setInt(1, 101);
stmt.setString(2, "Paolo Rossi");
stmt.addBatch();
stmt.setInt(1, 102);
stmt.setString(2, "Franco Bianchi");
stmt.addBatch();
// as many as you want
stmt.executeBatch();
I would try batching your inserts and see how that performs.
Have a read of this (http://www.onjava.com/pub/a/onjava/excerpt/javaentnut_2/index3.html?page=2) for more information on batching.
If you are loading tens of thousands of records then you're probably better off using a bulk loader.
http://dev.mysql.com/doc/refman/5.0/en/load-data.html
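From JDBC, the bulk loader can be driven as a plain statement; a sketch, assuming a CSV file and that local infile is enabled on both client and server (allowLoadLocalInfile=true in the Connector/J URL):

try (Statement stmt = con.createStatement()) {
    // Streams the whole file server-side instead of row-by-row inserts
    stmt.execute(
        "LOAD DATA LOCAL INFILE '/tmp/data.csv' " +
        "INTO TABLE the_table " +
        "FIELDS TERMINATED BY ',' " +
        "(col1, col2, colN)");
}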
Regarding the difference between extended inserts and batching single inserts: the reason I decided to use extended inserts is that I noticed it took my code a lot longer to insert a lot of rows than it takes mysql from the terminal, even though I was batching inserts in batches of 5,000. The solution in the end was to use extended inserts.
I quickly retested this theory.
I took two dumps of a table with 1.2 million rows. One using the default extended insert statements you get with mysqldump and the other using:
mysqldump --skip-extended-insert
Then I simply imported the files again into new tables and timed it.
The extended insert test finished in 1m35s and the other in 3m49s.
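If you do go the extended-insert route, the injection concern from the question can still be avoided by generating only the placeholders and binding every value; a minimal sketch (two columns assumed; very large row lists should be chunked to stay under the server's limits):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class ExtendedInsertSketch {
    // Builds INSERT ... VALUES (?, ?), (?, ?), ... so no user data is ever
    // concatenated into the SQL string itself.
    static void insertAll(Connection con, List<String[]> rows) throws SQLException {
        StringBuilder sql = new StringBuilder("INSERT INTO the_table (col1, col2) VALUES ");
        for (int i = 0; i < rows.size(); i++) {
            sql.append(i == 0 ? "(?, ?)" : ", (?, ?)");
        }
        try (PreparedStatement ps = con.prepareStatement(sql.toString())) {
            int idx = 1;
            for (String[] row : rows) {
                ps.setString(idx++, row[0]);
                ps.setString(idx++, row[1]);
            }
            ps.executeUpdate();
        }
    }
}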
The full answer is to use the rewriteBatchedStatements=true configuration option along with dfa's answer of using a batched statement.
The relevant MySQL documentation
A worked MySQL example
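Putting the two together, a sketch (connection details are examples):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class RewriteBatchSketch {
    public static void main(String[] args) throws Exception {
        // rewriteBatchedStatements=true lets Connector/J rewrite the batch into
        // multi-row INSERTs on the wire, approximating extended-insert speed.
        try (Connection con = DriverManager.getConnection(
                 "jdbc:mysql://localhost:3306/mydb?rewriteBatchedStatements=true",
                 "user", "password");
             PreparedStatement ps = con.prepareStatement(
                 "INSERT INTO employees VALUES (?, ?)")) {
            ps.setInt(1, 101);
            ps.setString(2, "Paolo Rossi");
            ps.addBatch();
            ps.setInt(1, 102);
            ps.setString(2, "Franco Bianchi");
            ps.addBatch();
            ps.executeBatch();
        }
    }
}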