I'm using java.sql.Connection.setAutoCommit(false) and java.sql.PreparedStatement.addBatch() to do some bulk inserts. I'm wondering how many insert/update statements can be safely executed before a commit. For example, could executing 100,000 inserts before a commit result in a JDBC driver complaint, a memory leak, or something else? I guess there is a limit on how many statements I can execute before a commit; where can I find such information?
There's no limit on the number of DML statements. Every INSERT/UPDATE/DELETE you push to the database is tracked by the database itself, not by the JDBC driver, so there would not be any client-side memory leak of the kind you mention. Memory leaks in JDBC are usually caused by unclosed result sets or prepared statements.
On the other hand, a large number of DML operations without a COMMIT generates a lot of logging in the database, and this can impact the performance of other operations. When you finally issue a COMMIT after, say, millions of INSERTs, operations such as index maintenance and data replication (if any) put additional overhead on the DBMS. These points are entirely DBMS-specific, though; the JDBC driver has nothing to do with them.
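In practice the usual compromise is to commit in fixed-size chunks rather than holding one giant transaction. A minimal sketch (the table name `items`, its column, and the 1,000-row chunk size are made up for illustration):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;

class ChunkedBatchInsert {
    // Commit every BATCH_SIZE rows to bound log growth on the server.
    static final int BATCH_SIZE = 1_000;

    // Pure helper: how many commits a run of `rows` inserts will issue.
    static int commitsNeeded(int rows) {
        return (rows + BATCH_SIZE - 1) / BATCH_SIZE;
    }

    static void insertAll(Connection conn, int totalRows) throws Exception {
        conn.setAutoCommit(false);
        try (PreparedStatement ps =
                 conn.prepareStatement("INSERT INTO items (id) VALUES (?)")) {
            for (int i = 1; i <= totalRows; i++) {
                ps.setInt(1, i);
                ps.addBatch();
                if (i % BATCH_SIZE == 0) {   // flush and commit one chunk
                    ps.executeBatch();
                    conn.commit();
                }
            }
            ps.executeBatch();               // flush any remaining rows
            conn.commit();
        }
    }
}
```

The chunk size is a tuning knob: larger chunks mean fewer round trips but more uncommitted work held by the server.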
Related
Lots of SHOW TRANSACTION ISOLATION LEVEL statements appear in the process list in Postgres 9.0.
What are the reasons for this, and when does it appear? All of them are in the idle state.
How can I disable this?
I assume that by "process list" you mean the system view pg_stat_activity (accessible in pgAdmin III on the "Statistics" tab or via "Tools/Server Status").
Since you say that the connections are idle, the query column does not show an active query, it shows the last query that has been issued in this database connection. I don't know which ORM or connection pooler you are using, but some software in your stack must insert these statements routinely at the end of a database action.
I wouldn't worry about them; these statements are not resource-intensive and probably won't cause you any trouble.
If you really need to get rid of them, figure out which software in your stack causes them and investigate there.
I am running a Java program with Hibernate. Hibernate generates queries, but we also have some custom DAO queries. Static code analysis/review for poorly designed SQL queries is hard and resource-intensive, but such queries cause real trouble. Is there a JDBC interceptor available that can warn about sub-par or poorly written SQL queries? I know there are SQL monitors such as log4jdbc, but to my knowledge they do not trace this kind of information at runtime.
By heuristic I mean that executing one 5-second query a day is OK, but executing 100 of those 5-second queries every minute is not. Or it could warn about SQL queries with no WHERE clause, which pull back a large number of rows every time.
Perhaps something like XRebel could solve the issue? You could set thresholds to get notifications if the query is running too long or if too many queries were executed. Plus, you could actually spot N+1 issues just by looking at the results. And the queries are logged with inlined parameters, so you wouldn't have to figure out which parameter is which.
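If a third-party tool is not an option, a crude runtime check can be hand-rolled with a JDK dynamic proxy around `java.sql.Statement`. This is a sketch, not a production interceptor; the threshold and the reporting callback are placeholders you would wire into your own logging:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.sql.Statement;
import java.util.function.Consumer;

class SlowQueryProxy implements InvocationHandler {
    private final Statement target;
    private final long thresholdMillis;
    private final Consumer<String> reporter;

    private SlowQueryProxy(Statement target, long thresholdMillis,
                           Consumer<String> reporter) {
        this.target = target;
        this.thresholdMillis = thresholdMillis;
        this.reporter = reporter;
    }

    // Wrap a real Statement; every execute* call is timed and reported
    // through `reporter` when it exceeds the threshold.
    static Statement wrap(Statement target, long thresholdMillis,
                          Consumer<String> reporter) {
        return (Statement) Proxy.newProxyInstance(
                Statement.class.getClassLoader(),
                new Class<?>[] { Statement.class },
                new SlowQueryProxy(target, thresholdMillis, reporter));
    }

    @Override
    public Object invoke(Object proxy, Method m, Object[] args) throws Throwable {
        if (!m.getName().startsWith("execute")) {
            return m.invoke(target, args);   // pass everything else through
        }
        long start = System.nanoTime();
        try {
            return m.invoke(target, args);
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            if (elapsedMs >= thresholdMillis) {
                String sql = (args != null && args.length > 0)
                        ? String.valueOf(args[0]) : "<batch>";
                reporter.accept("slow query (" + elapsedMs + " ms): " + sql);
            }
        }
    }
}
```

Detecting "100 of these per minute" would additionally need a counter keyed by SQL text, but the same wrapping point serves for that too.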
We're currently trying to make our server software use a connection pool to greatly reduce lag; however, instead of reducing the time queries take to run, it is doubling the time, making things even slower than before the connection pool.
Are there any reasons for this? Does JDBC only allow a single query at a time or is there another issue?
Also, does anyone have any examples of using a connection pool from multiple threads to reduce the time hundreds of queries take? The examples we have found only made things worse.
We've tried using BoneCP and Apache DBCP with similar results...
A connection pool helps mitigate the overhead of creating new connections to the database by reusing existing ones. This matters if your workload requires many short- to medium-lived connections, e.g. an app that processes concurrent user requests by querying the database. Unfortunately, your example benchmark code does not have such a profile: you are just using 4 connections in parallel, and there is no reuse involved.
What a connection pool cannot do is magically speed up execution times or improve the concurrency level beyond what the database itself provides. If the benchmark code represents the expected workload, I would advise you to look into batching statements instead of threading. That will massively increase the performance of INSERT/UPDATE operations.
Update:
Using multiple connections in parallel can enhance performance. Just keep in mind that there is not necessarily a relation between multiple threads in your Java application and parallelism in the database. JDBC is just a wrapper around the database driver; using multiple connections results in multiple queries being submitted to the database server in parallel. If those queries are suited for it, every modern RDBMS can process them in parallel, but if the queries are very work-intensive, or worse, involve table locks or conflicting updates, the DB may not be able to. If you experience bad performance, check which queries are lagging and optimize them: are they efficient? Are proper indexes in place? Denormalizing the schema may help in more extreme cases; use prepared statements and batch mode for larger updates. If your DB is overloaded with many small, similar queries, consider caching frequently used data.
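To make the reuse point concrete, here is a toy pool stripped down to the one mechanism that saves time: paying the creation cost at most once per slot. It is deliberately naive; real pools such as DBCP and BoneCP add validation, timeouts, and eviction on top of this idea:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Toy illustration of what a connection pool buys you: the expensive
// factory (think DriverManager.getConnection) runs at most `size` times,
// after which borrowers keep getting recycled objects.
class TinyPool<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final int size;
    private int created = 0;

    TinyPool(int size, Supplier<T> factory) {
        this.size = size;
        this.factory = factory;
    }

    synchronized T borrow() {
        T t = idle.poll();
        if (t != null) return t;             // reuse an idle object
        if (created == size) {
            throw new IllegalStateException("pool exhausted");
        }
        created++;
        return factory.get();                // pay creation cost, at most `size` times
    }

    synchronized void release(T t) {
        idle.push(t);                        // return for reuse, don't discard
    }

    synchronized int createdCount() { return created; }
}
```

If your benchmark opens 4 connections up front and never returns them until the end, none of this machinery ever kicks in, which is why the pool cannot make it faster.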
The underlying problem I want to solve is running a task that generates several temporary tables in MySQL, which need to stay around long enough to fetch results from Java after they are created. Because of the size of the data involved, the task must be completed in batches. Each batch is a call to a stored procedure called through JDBC. The entire process can take half an hour or more for a large data set.
To ensure access to the temporary tables, I run the entire task, start to finish, in a single Spring transaction with a TransactionCallbackWithoutResult. Otherwise, I could get a different connection that does not have access to the temporary tables (this would happen occasionally before I wrapped everything in a transaction).
This worked fine in my development environment. However, in production I got the following exception:
java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction
This happened when a different task tried to access some of the same tables during the execution of my long-running transaction. What confuses me is that the long-running transaction only inserts into or updates temporary tables; all access to non-temporary tables is select-only. From what documentation I can find, the default Spring transaction isolation level should not cause MySQL to block in this case.
So my first question, is this the right approach? Can I ensure that I repeatedly get the same connection through a Hibernate template without a long running transaction?
If the long running transaction approach is the correct one, what should I check in terms of isolation levels? Is my understanding correct that the default isolation level in Spring/MySQL transactions should not lock tables that are only accessed through selects? What can I do to debug which tables are causing the conflict, and prevent those tables from being locked by the transaction?
I consider keeping transactions open for an extended time evil. During my career the definition of "extended" has shrunk from seconds to milliseconds.
It is an unending source of hard-to-reproduce, head-scratching problems.
I would bite the bullet in this case and keep a 'work log' in software which you can replay in reverse to clean up if the batch fails.
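The "work log" idea can be sketched as a stack of compensating actions that is replayed newest-first when a later batch fails. The `Runnable` actions here stand in for whatever cleanup SQL your batches would need:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Compensation log: after each committed batch, record an undo action;
// if a later batch fails, replay the log in reverse to clean up, instead
// of holding one transaction open for the whole long-running task.
class WorkLog {
    private final Deque<Runnable> undo = new ArrayDeque<>();

    // Call after each successful, committed batch.
    void record(Runnable undoAction) {
        undo.push(undoAction);
    }

    // Run undo actions newest-first; call when a later batch fails.
    void rollback() {
        while (!undo.isEmpty()) {
            undo.pop().run();
        }
    }
}
```

Each batch then commits independently, so no lock is held for half an hour; the price is that cleanup is your code's responsibility rather than the database's.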
When you say your table is temporary, is it transaction-scoped? That might lead to other transactions (perhaps on a different connection) not being able to see/access it. Perhaps a join involving a real table and a temporary table somehow locks the real table.
Root cause: have you tried using the MySQL tools to determine what is locking the connection? It might be something like next-key locking. I don't know the MySQL tools that well, but on Oracle you can see which connections are blocking other connections.
Transaction timeout: You should create a second connection pool/data source with a much longer timeout. Use that connection pool for your long running task. I think your production environment is 'trying' to help you out by detecting stuck connections.
As mentioned by Justin regarding the transaction timeout, I recently faced a problem in which the connection pool (in my case Tomcat DBCP in Tomcat 7) had settings that were supposed to mark long-running connections as abandoned and then close them. After tweaking those parameters I could avoid the issue.
We are using WebSphere 6.1 on Windows, connecting to a DB2 database on a different Windows machine, and we use prepared statements in our application. While tuning a database index (adding a column to the end of an index), we did not see the performance boost we saw on a test database with the same query; after changing the index, the processor on the database server was actually pegged.
Are the prepared statements' query plans actually stored in JNDI? If so, how can they be cleared? If not, how can we clear the cache on the DB2 server?
The execution plans for prepared statements are stored in the DB2 package cache. It's possible that after an index is added, the package cache is still holding on to old access plans that are now sub-optimal.
After adding an index, you will want to issue a RUNSTATS statement on at least that index in order to provide the DB2 optimizer with the information it needs to choose a reasonable access plan.
Once the RUNSTATS statistics exist for the new index, issue a FLUSH PACKAGE CACHE statement to release any access plans that involved the affected table. The downside of this is that access plans for other dynamic SQL statements will also be ejected, leading to a temporary uptick in optimizer usage as each distinct SQL statement is optimized and cached.
http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/topic/com.ibm.db2.luw.sql.ref.doc/doc/r0007117.html
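The RUNSTATS-then-flush sequence can be driven straight over JDBC. A sketch assuming DB2 for LUW, where RUNSTATS is available through the SYSPROC.ADMIN_CMD stored procedure; the schema and table names are placeholders:

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.Statement;

class RefreshPlans {
    // Build the ADMIN_CMD payload for RUNSTATS on a table and its indexes.
    static String runstatsCommand(String schema, String table) {
        return "RUNSTATS ON TABLE " + schema + "." + table
             + " WITH DISTRIBUTION AND DETAILED INDEXES ALL";
    }

    static void refresh(Connection conn, String schema, String table)
            throws Exception {
        // 1. Collect fresh statistics so the optimizer sees the new index.
        try (CallableStatement cs =
                 conn.prepareCall("CALL SYSPROC.ADMIN_CMD(?)")) {
            cs.setString(1, runstatsCommand(schema, table));
            cs.execute();
        }
        // 2. Evict cached dynamic access plans so they are re-optimized.
        try (Statement st = conn.createStatement()) {
            st.execute("FLUSH PACKAGE CACHE DYNAMIC");
        }
    }
}
```

Note the flush is instance-wide for dynamic SQL, so expect the brief re-optimization cost described above.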
Query plans are normally held in the database by the RDBMS itself, with the exact life cycle being vendor-specific, I'd guess. They are definitely not held in a JNDI registry.
I assume there is a similar volume of data in both databases?
If so, have you looked at the explain plan in both databases and confirmed they match?
If the answer to both questions is yes, I'm out of ideas and it's time to reboot the database server.