I am working with MySQL in Java.
Basically, I have multiple queries that each create a table in the database, along with a single ALTER statement that adjusts the auto-increment initial value for one of the attributes. I am executing those queries as a transaction: either all are committed to the database or none are. But to do so I have to create a separate Statement for each query (8 in total) and execute each one. Afterwards I commit all the results, and then I have to close each Statement.
But this seems inefficient: too many Statements. So I wonder whether batch methods would work. My concern is that batch methods execute all the queries simultaneously, and since I have referential integrity constraints and the ALTER query, there is a dependency between the tables, and thus the order in which they are created matters. Is this not correct? Am I misunderstanding how batch statements work?
If my logic above is correct, should I group a few unrelated queries together and use batch methods to execute them? That would at least reduce the number of Statements I have.
I don't think you can batch DDL (i.e. CREATE, DROP, ALTER). Also, it's not a great idea, performance-wise, to require dynamic DDL.
You can batch DML statements using Statement.addBatch(String) (i.e. INSERT, UPDATE, and DELETE; SELECT statements can't be batched, since executeBatch() only returns update counts) and then call Statement.executeBatch().
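For illustration, a minimal sketch of batched DML with plain JDBC; the person table and the connection details are made-up placeholders:

import java.sql.*;

static void insertBatch(String url, String user, String pass) throws SQLException {
    try (Connection con = DriverManager.getConnection(url, user, pass)) {
        con.setAutoCommit(false);                 // run the whole batch in one transaction
        try (Statement st = con.createStatement()) {
            st.addBatch("INSERT INTO person (name) VALUES ('Alice')");
            st.addBatch("INSERT INTO person (name) VALUES ('Bob')");
            st.addBatch("UPDATE person SET name = 'Carol' WHERE id = 1");
            int[] counts = st.executeBatch();     // statements sent together; returns update counts
            con.commit();
        } catch (SQLException e) {
            con.rollback();                       // undo the partial batch on failure
            throw e;
        }
    }
}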
I have a Java application which executes queries on a PostgreSQL 9.3 server using JDBC. In my application I have to execute the same query many times (thousands of times), with different arguments only in the WHERE clause predicates. I have been using the Statement class until now. I recently read about the PreparedStatement class and I am wondering whether I should use it to speed up processing. But my doubt is this: since my query executes each time with different values in the WHERE clause predicates, the selectivity will change, and hence the plan chosen by the server will change. In that case, will using PreparedStatement speed up the processing? Is the plan chosen when the PreparedStatement is created, or only when execute is called on the PreparedStatement object? If the plan is chosen when the PreparedStatement is created, how is that done, given that the optimizer chooses plans based on selectivity calculated from actual predicate values?
My query is a complex one involving many tables. The template is like:
select something from tables where predicate1 and predicate2 and price < X and date < Y;
where X and Y vary for each query.
From the PostgreSQL docs:
PREPARE creates a prepared statement. A prepared statement is a server-side object that can be used to optimize performance. When the PREPARE statement is executed, the specified statement is parsed, analyzed, and rewritten. When an EXECUTE command is subsequently issued, the prepared statement is planned and executed. This division of labor avoids repetitive parse analysis work, while allowing the execution plan to depend on the specific parameter values supplied.
moe was right: preparing a query only removes the overhead of parsing it again and again. The planning is done only when you execute the prepared query with its parameters.
In 9.3, it uses a heuristic. It plans the query with the specific bind values for roughly the first 5 executions of the prepared statement. If none of those plans turns out to be substantially better than the generic plan, it stops the individual planning and just uses the generic plan from then on.
But there is another wrinkle: just because your code told the driver to use a prepared statement doesn't mean the driver is actually doing so. A lot of drivers do weird things; the PostgreSQL JDBC driver, for instance, only switches to a server-side prepared statement after the same statement has been executed several times (the prepareThreshold connection parameter, 5 by default).
The real answer is test, test, test.
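For reference, here is roughly what the switch to PreparedStatement looks like for the template above. This is a sketch: QueryArgs is a hypothetical holder for the X and Y values, and con is assumed to be an open java.sql.Connection.

// The statement is prepared once and re-executed with new bind values each time.
String sql = "SELECT something FROM tables "
           + "WHERE predicate1 AND predicate2 AND price < ? AND date < ?";
try (PreparedStatement ps = con.prepareStatement(sql)) {
    for (QueryArgs a : allArgs) {            // thousands of (X, Y) pairs
        ps.setBigDecimal(1, a.price);        // X
        ps.setDate(2, a.date);               // Y (a java.sql.Date)
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                // ... consume the row ...
            }
        }
    }
}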
I want the DBMS to help me gain speed when doing a lot of inserts.
Today I do an INSERT query in Java and catch the exception if the data is already in the database.
The exception I get is:
SQLite Exception : [19] DB[1] exec() columns recorddate, recordtime are not unique.
If I get an exception I do a SELECT query with the primary keys (recorddate, recordtime) and compare the result with the data I am trying to insert in Java. If it is the same, I continue with the next insert; otherwise I evaluate the data, decide what to save, and maybe do an UPDATE.
This process takes time and I would like to speed it up.
I have thought of INSERT IF NOT EXISTS, but that just ignores the insert if there is any data with the same primary keys, am I right? And I want to make sure it is exactly the same data before I ignore the insert.
I would appreciate any suggestions for how to make this faster.
I'm using Java to handle a large amount of data to insert into a SQLite database (SQLite v3.7.10). As the bridge between Java and SQLite I am using sqlite4java (http://code.google.com/p/sqlite4java/).
I do not think letting the DBMS handle more of that logic would be faster, at least not with plain SQL; as far as I can tell, there is no "create or update" there.
When handling lots of entries, latency is often an important issue, especially with databases accessed over a network, so at least in that case you want to use mass operations wherever possible. Even if a "create or update" were provided, using it instead of a select followed by an update or insert would only halve the latency.
I realize that is not what you asked for, but I would try to optimize in a different way: process chunks of data, select all of them into a map, then partition the input into creates, updates, and ignores. That way the ignores are almost free, and further lookups are guaranteed to happen in memory. It is unlikely that the DBMS can be significantly faster.
If you are unsure whether this is the right approach for you, profiling the overhead should help.
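A rough sketch of that chunked partitioning; Record, chunks(), selectByKeys(), insertAll(), and updateAll() are hypothetical helpers, and con is an open connection:

for (List<Record> chunk : chunks(input, 500)) {
    // One SELECT fetches every row of this chunk by primary key,
    // mapped by a "recorddate|recordtime" key string.
    Map<String, Record> existing = selectByKeys(con, chunk);
    List<Record> inserts = new ArrayList<>();
    List<Record> updates = new ArrayList<>();
    for (Record r : chunk) {
        Record db = existing.get(r.key());
        if (db == null) {
            inserts.add(r);              // not in the database yet
        } else if (!db.equals(r)) {
            updates.add(r);              // same key, different data: update
        }                                // identical rows are ignored at no extra cost
    }
    insertAll(con, inserts);             // mass INSERT
    updateAll(con, updates);             // mass UPDATE
}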
Wrap all of your inserts and updates into a transaction. In SQL this will be written as follows.
BEGIN;
INSERT OR REPLACE INTO Table(Col1,Col2) VALUES(Val1,Val2);
COMMIT;
There are two things to note here. First, database paging and commits will not be written to disk until COMMIT is called, which speeds up your queries significantly. Second, the INSERT OR REPLACE syntax does precisely what you want for UNIQUE or PRIMARY KEY fields.
Most database wrappers have a special syntax for managing transactions. You can certainly execute a query, BEGIN, followed by your inserts and updates, and finish by executing COMMIT. Read the database wrapper documentation.
One more thing you can do is switch to Write-Ahead Logging. Run the following command, only once, on the database.
PRAGMA journal_mode = wal;
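Through JDBC, the whole recipe might look like the sketch below. This assumes the Xerial sqlite-jdbc driver (the question uses sqlite4java, whose API differs but accepts the same SQL); Row and the table t are made-up placeholders for the question's schema.

try (Connection con = DriverManager.getConnection("jdbc:sqlite:data.db")) {
    try (Statement st = con.createStatement()) {
        st.execute("PRAGMA journal_mode = wal");       // one-time switch to WAL
    }
    con.setAutoCommit(false);                          // BEGIN
    String sql = "INSERT OR REPLACE INTO t (recorddate, recordtime, value) VALUES (?, ?, ?)";
    try (PreparedStatement ps = con.prepareStatement(sql)) {
        for (Row r : rows) {                           // Row: hypothetical data holder
            ps.setString(1, r.date);
            ps.setString(2, r.time);
            ps.setDouble(3, r.value);
            ps.addBatch();
        }
        ps.executeBatch();
    }
    con.commit();                                      // COMMIT: pages hit disk here
}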
Without further information, I would:
BEGIN;
UPDATE table SET othervalues=... WHERE recorddate=... AND recordtime=...;
INSERT OR IGNORE INTO table(recorddate, recordtime, ...) VALUES(...);
COMMIT;
The UPDATE will update all existing rows, ignoring non-existent ones because of the WHERE clause.
The INSERT will then add the new rows, ignoring existing ones because of OR IGNORE.
I am looking for something similar to the JDBC driver in Java to perform a batch of updates in PHP.
In JDBC there is the PreparedStatement.executeBatch() API, which executes all the statements in one round trip to the DB.
Does PHP PDO have a similar API? If not, does starting a transaction, doing the updates, and then committing have the same effect of executing all updates in one round trip to the DB, or does each update make its own round trip and execute immediately (although not visible to others, since it is inside a transaction)?
There is no such thing as a "batch update" in MySQL. There are only SQL queries.
As long as you can do your updates in one query, they will be done in one round trip. Otherwise there will be many, no matter what API is used.
Speaking of single SQL queries, there are two possible ways:
a CASE statement in the UPDATE's SET clause;
a neat trick with an INSERT(!) query using the ON DUPLICATE KEY UPDATE clause, which will actually update your data (see the sketch below).
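A sketch of the second trick, written here through JDBC since that is the API used elsewhere on this page; the table t, its columns, and con are placeholders, and id is assumed to be the primary key:

// Update (or insert) many rows in one round trip.
String sql = "INSERT INTO t (id, price) VALUES (1, 10), (2, 20), (3, 30) "
           + "ON DUPLICATE KEY UPDATE price = VALUES(price)";
try (Statement st = con.createStatement()) {
    st.executeUpdate(sql);   // rows with existing ids are updated, the rest inserted
}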
PHP PDO doesn't have batch execution of queries.
Running many inserts and updates inside a transaction usually improves execution speed greatly. If you're running batch jobs against a database, you should run the queries in bulk within a transaction.
I am trying to create multiple tables (up to 20) via java.sql prepared-statement batch execution. Most of the tables are related to each other. But there is some confusion in my mind:
1) Should I set the connection's auto-commit to true or false?
2) Is there any special ordering for batch execution, like top-down? I need the parent tables' CREATE queries to execute first.
3) If an error occurs, is the whole batch rolled back?
The behavior of batch execution with auto-commit on is implementation-defined; some drivers may not even support it. So if you want to use batch execution, set auto-commit to false.
That said, some databases implicitly commit each DDL statement, which might interfere with the correct working of batched execution. I would advise taking the safe route: don't use batched execution for DDL, but use a normal Statement and execute(String) for executing DDL.
Actually, using batch execution in this case does not make much sense. Batch execution gives you a (big) performance improvement when inserting or updating thousands of rows at once.
You just need to have all your statements within a transaction:
call Connection.setAutoCommit(false)
execute your create-table statements with Statement.executeUpdate
call Connection.commit()
You need to order the create-table statements yourself based on the foreign keys between them (see the sketch below).
As Mark pointed out, the DB you are using might commit each create-table right away and ignore the transaction; not all DBs support transactional creation of tables. You will need to test this or do some more research on this aspect.
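Put together, a minimal sketch of the above; con is an open java.sql.Connection, and the table names are made-up placeholders, with the parent created first because the child references it:

con.setAutoCommit(false);
try (Statement st = con.createStatement()) {
    st.executeUpdate("CREATE TABLE parent (id INT PRIMARY KEY)");
    st.executeUpdate("CREATE TABLE child (id INT PRIMARY KEY, parent_id INT, "
                   + "FOREIGN KEY (parent_id) REFERENCES parent(id))");
    con.commit();
} catch (SQLException e) {
    con.rollback();   // may be a no-op on databases that auto-commit DDL
    throw e;
}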
I've been looking around trying to pin down some Hibernate behavior that I'm unsure about. In a scenario where Hibernate batching is properly set up, will it only ever use multiple insert statements when a batch is sent? Is it not possible to use a DB-independent multi-row insert statement?
I guess I'm trying to determine if I actually have the batching set up correctly. I see the multiple insert statements but then I also see the line "Executing batch size: 25."
There's a lot of code I could post but I'm trying to keep this general. So, my questions are:
1) What can you read in the logs to be certain that batching is being used?
2) Is it possible to make Hibernate use a multi-row insert versus multiple insert statements?
Hibernate uses multiple insert statements (one per entity to insert), but sends them to the database in batch mode (using Statement.addBatch() and Statement.executeBatch()). This is the reason you're seeing multiple insert statements in the log, but also "Executing batch size: 25".
The use of batched statements greatly reduces the number of round trips to the database, and I would be surprised if it were less efficient than executing a single statement with multiple inserts. Moreover, it allows, for example, mixing updates and inserts in a single database call.
I'm pretty sure it's not possible to make Hibernate use multi-row inserts, but I'm also pretty sure it would be useless.
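For reference, a typical batching setup: hibernate.jdbc.batch_size in the Hibernate configuration plus a flush/clear rhythm in the code. This is a sketch; Item and the batch size of 25 are illustrative.

// Assumed configuration (e.g. hibernate.cfg.xml): hibernate.jdbc.batch_size = 25
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

static void saveAll(SessionFactory sessionFactory, List<Item> items) {
    Session session = sessionFactory.openSession();
    Transaction tx = session.beginTransaction();
    for (int i = 0; i < items.size(); i++) {
        session.save(items.get(i));
        if (i % 25 == 24) {       // a full JDBC batch has accumulated
            session.flush();      // push the batch to the database
            session.clear();      // detach entities so the session doesn't keep growing
        }
    }
    tx.commit();
    session.close();
}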
I know that this is an old question, but I had the same problem: I thought that Hibernate batching meant Hibernate would combine multiple inserts into one statement, which it doesn't seem to do.
After some testing I found this answer: that a batch of multiple inserts is just as good as a multi-row insert. I did a test inserting 1000 rows, once using Hibernate batching and once without. Both tests took about 20s, so there was no performance gain from using Hibernate batching.
To be sure, I tried the rewriteBatchedStatements option from MySQL Connector/J, which actually combines multiple inserts into one statement. It reduced the time to insert 1000 records down to 3s.
So after all, Hibernate batching seems to be useless and a real multi-row insert to be much better. Am I doing something wrong, or what causes my test results?
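For anyone wanting to reproduce that last test, the option goes on the Connector/J JDBC URL; host, port, and database name here are placeholders:

// rewriteBatchedStatements=true lets Connector/J rewrite addBatch() inserts
// into a single multi-row INSERT on the wire.
String url = "jdbc:mysql://localhost:3306/mydb?rewriteBatchedStatements=true";
Connection con = DriverManager.getConnection(url, "user", "password");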
Oracle bulk insert collects an array of entities and passes it to the database in a single block, applying a single looped insert/update/delete to it. It is the only real way to improve network throughput. Oracle suggests doing it by calling a stored procedure from Hibernate and passing it an array of data: http://biemond.blogspot.it/2012/03/oracle-bulk-insert-or-select-from-java.html?m=1
This is not only a software problem but an infrastructural one: the problem is network data-flow optimization and TCP stack fragmentation. MySQL has a similar function. You have to do something like what is described in that article: transferring the right volume of data over the network in one go is the solution. You should also check the network MTU and the Oracle SDU/TDU settings against the amount of data transferred between the application and the database.