The Java application uses ojdbc6.jar, Tomcat 7, tomcat-dbcp-8.0.3.jar (and other jars that are probably not relevant to this question), and JDK 7 (u51).
We have identified a connection leak using a v$session report, where some connections stay in the INACTIVE state for 7+ hours. This is also confirmed by a thread dump taken in the frozen state.
A heap dump (taken in the frozen state) shows:
The total number of PoolableConnection and DefaultPooledObject instances equals maxTotal (expected for an exhausted pool).
Each connection is in PooledObjectState ALLOCATED (expected).
lastReturnTime = lastUsedTime = lastBorrowedTime (in DefaultPooledObject), which to me means: THREAD-1 (good workflow) returned the connection, which was immediately borrowed by THREAD-2 (bad workflow with the leak), and THREAD-2 never closed the connection, leaving it dangling!
All of the above observations make sense, since we definitely have a connection leak and an eventually exhausted pool.
My question is:
When I look at the details of a PoolableConnection, it has an associated boolean _closed which is true. Why/how could it have _closed = true?
When I decompiled the tomcat-dbcp jar, I could see that every time _closed is set to true, the connection object is also moved to the IDLE state (instead of ALLOCATED).
I am looking for theories on why this boolean is true.
PS: We have various ideas (like setting logAbandoned) to find the exact piece of code responsible for the connection leak; here I am looking for a reason (or theory) why the heap dump captures these PoolableConnection objects with _closed=true.
Looking at the DelegatingConnection source code, it can be seen that closed=true can be set as the result of connection.close(), or in a finally block after some exceptions, as a safety measure.
} finally {
closed = true;
}
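For context, the close path follows roughly this shape (a simplified sketch of the pattern described above, not the literal DBCP source):

public void close() throws SQLException {
    if (!closed) {
        try {
            connection.close();   // closing the underlying connection may throw
        } finally {
            closed = true;        // the wrapper is marked closed regardless of success
        }
    }
}

Either way, the wrapper ends up with closed = true, which matches what the heap dump shows.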
There is a leak; the connection is in an inconsistent state because it could not be closed and is probably ready to be processed in the ABANDONED life-cycle phase.
Inspecting the pool through JMX could give another perspective.
The leak could be related to an improperly handled exception that otherwise would have given a hint on the pool bad state.
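To pinpoint the leaking call site, abandoned-connection tracking can also be enabled in the pool itself. A minimal sketch, assuming the repackaged DBCP2 BasicDataSource is configured programmatically (the URL and pool size are placeholders; the equivalent attributes can be set on a Tomcat JNDI Resource instead):

import org.apache.tomcat.dbcp.dbcp2.BasicDataSource;

BasicDataSource ds = new BasicDataSource();
ds.setUrl("jdbc:oracle:thin:@//db-host:1521/SERVICE");  // placeholder connection details
ds.setMaxTotal(20);                                     // placeholder pool size
// Reclaim and log connections that a caller borrowed but never closed.
ds.setLogAbandoned(true);            // print the stack trace of the borrowing code when a connection is reclaimed
ds.setRemoveAbandonedOnBorrow(true); // check for abandoned connections when the pool is nearly exhausted
ds.setRemoveAbandonedTimeout(300);   // seconds a connection may stay borrowed before it counts as abandoned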
Related
I asked this question (How do I call java.sql.Connection::abort?) and it led me to another question.
With
java.sql.Connection conn = ... ;
What is the difference between
conn.close();
and
conn.abort(...);
?
You use Connection.close() for a normal, synchronous close of the connection. The abort method, on the other hand, is for abruptly terminating a connection that may be stuck.
In most cases you will need to use close(), but close() can sometimes not complete in time; for example, it could block if the connection is currently busy (e.g. executing a long-running query or update, or waiting for a lock).
The abort method is for that situation: the driver will mark the connection as closed (hopefully) immediately, the method returns, and the driver can then use the provided Executor to asynchronously perform the necessary cleanup work (e.g. making sure the statement that is stuck gets aborted, cleaning up other resources, etc).
I hadn't joined the JSR-221 (JDBC specification) Expert Group yet when this method was defined, but as far as I'm aware, the primary intended users of this method are not so much application code as connection pools, transaction managers and other connection-management code that may want to forcibly end connections that are in use too long or 'stuck'.
That said, application code can use abort as well. It may be faster than close (depending on the implementation), but you won't get notified of problems during the asynchronous cleanup, and you may abort operations currently in progress.
However, keep in mind that an abort is considered an abrupt termination of the connection, so it may be less graceful than a close, and it could lead to unspecified behaviour. Also, I'm not sure how well it is supported in drivers compared to a normal close().
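For illustration, a call might look like this (a minimal sketch; forceAbort is a hypothetical helper, and the executor would typically be one your application already manages):

import java.sql.Connection;
import java.sql.SQLException;
import java.util.concurrent.Executor;

// Hypothetical helper: abruptly terminate a connection that close() cannot end in time.
static void forceAbort(Connection conn, Executor cleanupExecutor) throws SQLException {
    // abort() marks the connection as closed and returns quickly; the driver uses
    // the supplied executor to release the remaining resources asynchronously.
    conn.abort(cleanupExecutor);
}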
Consulting the Java docs seems to indicate that abort is more thorough than close, which is interesting.
abort...
Terminates an open connection. Calling abort results in:
The connection marked as closed
Closes any physical connection to the database
Releases resources used by the connection
Insures that any thread that is currently accessing the connection will either progress to completion or throw an SQLException.
close...
Releases this Connection object's database and JDBC resources immediately instead of waiting for them to be automatically released. Calling the method close on a Connection object that is already closed is a no-op.
So it seems if you are only concerned with releasing the objects, use close. If you want to make sure it's somewhat more "thread safe", using abort appears to provide a more graceful disconnect.
Per Mark Rotteveel's comment (which gives an accurate summary of the practical difference), my interpretation was incorrect.
Reference: https://docs.oracle.com/javase/8/docs/api/java/sql/Connection.html#close--
I encountered a critical problem with the c3p0 library (version 0.9.5.2) that I use in my Java SE application.
My application uses a thread pool to parallelize work by executing jobs.
Each job uses the database to read, update or delete data at least once, and up to (in very rare cases, but it can happen) more than 10,000 times.
I therefore included the c3p0 library in my project to have a connection pool to the database, so that all workers in my thread pool can interact with it simultaneously.
I do not have any problems when running my application in my development environment (OS X 10.11), but when I run it in production (Linux Debian 8) I encounter a big problem! Indeed, it freezes...
At first it was a deadlock with the following stack trace:
[WARNING] com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector#479d237b -- APPARENT DEADLOCK!!! Creating emergency threads for unassigned pending tasks!
[WARNING] com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector#479d237b -- APPARENT DEADLOCK!!! Complete Status:
Managed Threads: 3
Active Threads: 3
Active Tasks:
com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask#264fb34f
on thread: C3P0PooledConnectionPoolManager[identityToken->z8kfsx9l1adv4kd1qtfdi6|659f3099]-HelperThread-#2
com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask#39a5576b
on thread: C3P0PooledConnectionPoolManager[identityToken->z8kfsx9l1adv4kd1qtfdi6|659f3099]-HelperThread-#1
com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask#5e676544
on thread: C3P0PooledConnectionPoolManager[identityToken->z8kfsx9l1adv4kd1qtfdi6|659f3099]-HelperThread-#0
Pending Tasks:
com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask#6848208c
Pool thread stack traces:
Thread[C3P0PooledConnectionPoolManager[identityToken->z8kfsx9l1adv4kd1qtfdi6|659f3099]-HelperThread-#2,5,main]
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
com.microsoft.sqlserver.jdbc.SocketFinder.findSocketUsingJavaNIO(IOBuffer.java:2438)
com.microsoft.sqlserver.jdbc.SocketFinder.findSocket(IOBuffer.java:2290)
com.microsoft.sqlserver.jdbc.TDSChannel.open(IOBuffer.java:551)
com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:1962)
com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:1627)
com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:1458)
com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:772)
com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:1168)
com.mchange.v2.c3p0.DriverManagerDataSource.getConnection(DriverManagerDataSource.java:175)
com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:220)
com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:206)
com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.acquireResource(C3P0PooledConnectionPool.java:203)
com.mchange.v2.resourcepool.BasicResourcePool.doAcquire(BasicResourcePool.java:1138)
com.mchange.v2.resourcepool.BasicResourcePool.doAcquireAndDecrementPendingAcquiresWithinLockOnSuccess(BasicResourcePool.java:1125)
com.mchange.v2.resourcepool.BasicResourcePool.access$700(BasicResourcePool.java:44)
com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask.run(BasicResourcePool.java:1870)
com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:696)
Thread[C3P0PooledConnectionPoolManager[identityToken->z8kfsx9l1adv4kd1qtfdi6|659f3099]-HelperThread-#1,5,main]
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
com.microsoft.sqlserver.jdbc.SocketFinder.findSocketUsingJavaNIO(IOBuffer.java:2438)
com.microsoft.sqlserver.jdbc.SocketFinder.findSocket(IOBuffer.java:2290)
com.microsoft.sqlserver.jdbc.TDSChannel.open(IOBuffer.java:551)
com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:1962)
com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:1627)
com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:1458)
com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:772)
com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:1168)
com.mchange.v2.c3p0.DriverManagerDataSource.getConnection(DriverManagerDataSource.java:175)
com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:220)
com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:206)
com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.acquireResource(C3P0PooledConnectionPool.java:203)
com.mchange.v2.resourcepool.BasicResourcePool.doAcquire(BasicResourcePool.java:1138)
com.mchange.v2.resourcepool.BasicResourcePool.doAcquireAndDecrementPendingAcquiresWithinLockOnSuccess(BasicResourcePool.java:1125)
com.mchange.v2.resourcepool.BasicResourcePool.access$700(BasicResourcePool.java:44)
com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask.run(BasicResourcePool.java:1870)
com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:696)
Thread[C3P0PooledConnectionPoolManager[identityToken->z8kfsx9l1adv4kd1qtfdi6|659f3099]-HelperThread-#0,5,main]
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
com.microsoft.sqlserver.jdbc.SocketFinder.findSocketUsingJavaNIO(IOBuffer.java:2438)
com.microsoft.sqlserver.jdbc.SocketFinder.findSocket(IOBuffer.java:2290)
com.microsoft.sqlserver.jdbc.TDSChannel.open(IOBuffer.java:551)
com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:1962)
com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:1627)
com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:1458)
com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:772)
com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:1168)
com.mchange.v2.c3p0.DriverManagerDataSource.getConnection(DriverManagerDataSource.java:175)
com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:220)
com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:206)
com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.acquireResource(C3P0PooledConnectionPool.java:203)
com.mchange.v2.resourcepool.BasicResourcePool.doAcquire(BasicResourcePool.java:1138)
com.mchange.v2.resourcepool.BasicResourcePool.doAcquireAndDecrementPendingAcquiresWithinLockOnSuccess(BasicResourcePool.java:1125)
com.mchange.v2.resourcepool.BasicResourcePool.access$700(BasicResourcePool.java:44)
com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask.run(BasicResourcePool.java:1870)
com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:696)
Subsequently I made some changes following the advice on different websites:
System.setProperty("com.mchange.v2.log.MLog", "com.mchange.v2.log.FallbackMLog");
System.setProperty("com.mchange.v2.log.FallbackMLog.DEFAULT_CUTOFF_LEVEL", "WARNING");
// Create db pool
final ComboPooledDataSource cpds = new ComboPooledDataSource() ;
// Driver
cpds.setDriverClass( "com.microsoft.sqlserver.jdbc.SQLServerDriver" ); // loads the jdbc driver
// Url
cpds.setJdbcUrl( "jdbc:xxxx://xxxxx:xxxx;database=xxxxx;" );
// Username / Password
cpds.setUser( "xxxx" ) ;
cpds.setPassword( "xxxx" ) ;
// Start size of db pool
cpds.setInitialPoolSize( 8 );
// Min and max db pool size
cpds.setMinPoolSize( 8 ) ;
cpds.setMaxPoolSize( 10 ) ;
// ????
cpds.setNumHelperThreads( 5 ) ;
// Max allowed time to execute statement for a connection
// #See http://stackoverflow.com/questions/14730379/apparent-deadlock-creating-emergency-threads-for-unassigned-pending-tasks
cpds.setMaxAdministrativeTaskTime( 60 ) ;
// ?????
cpds.setMaxStatements( 180 ) ;
cpds.setMaxStatementsPerConnection( 180 ) ;
// ?????
cpds.setUnreturnedConnectionTimeout( 60 ) ;
// ?????
cpds.setStatementCacheNumDeferredCloseThreads(1);
// We make a test : open and close opened connection
cpds.getConnection().close() ;
After these changes, and after the execution of some jobs, the application freezes for several tens of seconds and then displays this error message:
[WARNING] A task has exceeded the maximum allowable task time. Will interrupt() thread [Thread[C3P0PooledConnectionPoolManager[identityToken->z8kfsx9l1ao3z0x88z7oi|4dd889bd]-HelperThread-#4,5,main]], with current task: com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask#4128b402
[WARNING] Thread [Thread[C3P0PooledConnectionPoolManager[identityToken->z8kfsx9l1ao3z0x88z7oi|4dd889bd]-HelperThread-#4,5,main]] interrupted.
[WARNING] A task has exceeded the maximum allowable task time. Will interrupt() thread [Thread[C3P0PooledConnectionPoolManager[identityToken->z8kfsx9l1ao3z0x88z7oi|4dd889bd]-HelperThread-#3,5,main]], with current task: com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask#5d6aab6d
[WARNING] Thread [Thread[C3P0PooledConnectionPoolManager[identityToken->z8kfsx9l1ao3z0x88z7oi|4dd889bd]-HelperThread-#3,5,main]] interrupted.
[WARNING] A task has exceeded the maximum allowable task time. Will interrupt() thread [Thread[C3P0PooledConnectionPoolManager[identityToken->z8kfsx9l1ao3z0x88z7oi|4dd889bd]-HelperThread-#0,5,main]], with current task: com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask#70a3328f
[WARNING] Thread [Thread[C3P0PooledConnectionPoolManager[identityToken->z8kfsx9l1ao3z0x88z7oi|4dd889bd]-HelperThread-#0,5,main]] interrupted.
My questions are :
Why does the application work perfectly in a development environment but encounter these problems in production?
Above all, how to remedy it?
When a connection reaches the maximum number of statements defined with setMaxStatements and setMaxStatementsPerConnection, what happens to it? Is the connection closed, and then another takes over while a new one is created?
I did not quite understand the impact that the setStatementCacheNumDeferredCloseThreads function has on my application.
Thank you very much! Have a good day.
OK. So. Your basic problem is simple. In your production environment, Connection acquisition attempts are eventually freezing, that is, they are neither succeeding nor failing with an Exception; they simply hang. Ultimately, this is what you have to debug: why is it that when c3p0 tries to connect to your production database, sometimes those calls to Driver.connect() hang? Whatever is causing that is outside of c3p0's control.
You could be hitting limits on total connections at the DBMS side (not from this application, your maxPoolSize is quite modest, but perhaps your production server is overextended). If you are running on an older JVM, there was a known problem with hangs to SQL Server, see e.g. 'JDBC connection hangs with no response from SQL Server 2008 r2' and 'Driver.getConnection hangs using SQLServer driver and Java 1.6.0_29', but I doubt you are running Java 6 at this point, and I don't know of more recent issues.
In any case, it's quite clear from your logs that this is what is happening: c3p0 is trying to acquire Connections from the DBMS, the DBMS is hanging indefinitely, and eventually all of c3p0's helper threads get saturated by hung tasks and you see an APPARENT DEADLOCK. To resolve the issue, you have to debug why attempts by your JDBC driver to connect to your DBMS sometimes hang.
Most of the things you did after scrounging random troubleshooting posts were not very relevant to this issue. The thing that did cause your logs to change was this setting:
cpds.setMaxAdministrativeTaskTime( 60 );
That works around the problem in an ugly way. If a task hangs for a long period of time, that setting causes c3p0 to interrupt() the Thread on which it is running and abandon it. That prevents the deadlocks, but doesn't address their cause.
There is a surprising change, though, between the two logs. The replacement of the APPARENT DEADLOCK spew with reports that a 'task has exceeded the maximum allowable task time' was to be expected. But interestingly, in your second log, the tasks that get interrupt()ed are not Connection acquisition attempts, but Connection destruction attempts. I don't know why that has changed, but the core issue is the same: attempts by your JDBC driver to interact with your DBMS are freezing indefinitely, neither succeeding nor failing promptly with an Exception. That is what you need to debug.
If you can't resolve the problem, you may be able to work around it. It's very ugly, but if you reduce maxAdministrativeTaskTime (say to 30) and increase numHelperThreads (say to 20), you may be able to largely eliminate the application pauses, as long as the freezes are infrequent. Increasing numHelperThreads increases the number of frozen tasks c3p0's Thread pool can tolerate before being completely blocked; reducing maxAdministrativeTaskTime reduces the lifetime of blockages. Obviously, the right thing to do is to debug the problem between the JDBC driver and the DBMS. But if that proves impossible, sometimes a workaround is the best you can do.
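As a sketch, that workaround amounts to something like this (30 and 20 are just the example values mentioned above):

// Ugly workaround, not a fix: tolerate more frozen tasks and abandon them sooner.
cpds.setNumHelperThreads( 20 );            // more helper threads before c3p0's thread pool saturates
cpds.setMaxAdministrativeTaskTime( 30 );   // seconds before a hung task is interrupt()ed and abandoned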
I would eliminate (at least for now) these three settings:
// ?????
cpds.setMaxStatements( 180 ) ;
cpds.setMaxStatementsPerConnection( 180 ) ;
// ?????
cpds.setStatementCacheNumDeferredCloseThreads(1);
The first two turn Statement caching on, which may or may not be desirable from a performance perspective for your application. But they increase the complexity of c3p0's interaction with the DBMS, and SQL Server (among several databases) is very fragile with respect to multithreaded use of a Connection (which, at least per early versions of the JDBC spec, should be legal, but too bad).
Setting statementCacheNumDeferredCloseThreads to 1 ensures that the Statement cache doesn't try to close an expiring Statement while the Connection is otherwise in use, and so prevents freezes and APPARENT DEADLOCKs that usually show up as hung Statement close tasks (not your issue). If you turn the Statement cache on, by all means keep statementCacheNumDeferredCloseThreads set to 1 to avoid freezes.
But the safest, sanest thing is to avoid all the complexity of the Statement cache until you have your main issue debugged. You can restore these settings later to test whether they improve your application's performance. (If you do turn the Statement cache back on, my suggestion is that you just set maxStatementsPerConnection and do not set a global maxStatements; or, if you set both, set the per-Connection limit to a value much smaller than the global limit. But again, for now, just turn all this stuff off.)
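A sketch of the 'for now' configuration this implies (0 is the c3p0 default and disables the Statement cache):

// Leave Statement caching off until the acquisition hangs are debugged.
cpds.setMaxStatements( 0 );
cpds.setMaxStatementsPerConnection( 0 );
// If caching is re-enabled later, prefer setting only maxStatementsPerConnection,
// and keep statementCacheNumDeferredCloseThreads at 1.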
To get to your specific questions:
Why does the application work perfectly in a development environment but encounter these problems in production?
That's an important clue that you want to use in debugging the hang between your JDBC driver and your DBMS. Something about your production server leads to a hang that does not show up in your development server. That may just be a matter of the relatively low load on your development server and the high load on your production server. But there may be other differences in settings that provide clues about the hangs.
Above all, how to remedy it?
Debug the hangs. If you cannot debug the hangs, try working around the issue with a shorter maxAdministrativeTaskTime and a larger numHelperThreads.
When a connection reaches the maximum number of statements defined with setMaxStatements and setMaxStatementsPerConnection, what happens to it? Is the connection closed, and then another takes over while a new one is created?
A Connection doesn't reach any of those things. These are parameters that describe the Statement cache. When the total number of cached Statements hits maxStatements, the least-recently-used cached Statement is closed (just the Statement, not its Connection). When a Connection's maxStatementsPerConnection is hit, that Connection's least-recently-used cached Statement is closed (but the Connection itself remains open and active).
I did not quite understand the impact that the setStatementCacheNumDeferredCloseThreads function has on my application.
If you are using the Statement cache (again, I recommend you turn it off for now), this setting ensures that expired Statements (see above) are not close()ed while their parent Connection is in use by some other Thread. The setting creates a dedicated Thread (or Threads) whose sole purpose is to wait until a Connection is no longer in use and only then close its expired cached Statements (thus, "statement cache deferred close threads").
I hope this helps!
Update: The bug that you are experiencing does look very much like the Java 6 bug. If you are running Java 6, you are in luck, the fix is probably just to update your production JVM to the most recent version of Java 6.
We are trying to integrate HikariCP (release 2.4.4) into our application. After some time of use, the pool fails to acquire new connections, throwing:
java.lang.NullPointerException
at com.zaxxer.hikari.pool.PoolBase.setNetworkTimeout(PoolBase.java:464) ~[HikariCP-2.4.4.jar:?]
at com.zaxxer.hikari.pool.PoolBase.isConnectionAlive(PoolBase.java:131) ~[HikariCP-2.4.4.jar:?]
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:171) ~[HikariCP-2.4.4.jar:?]
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:147) ~[HikariCP-2.4.4.jar:?]
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:83) ~[HikariCP-2.4.4.jar:?]
The JDBC driver we are using is ojdbc7, version 12.1.0.2. The pool uses the following configuration:
allowPoolSuspension.............false
autoCommit......................false
catalog.........................null
connectionInitSql..............."BEGIN EXECUTE IMMEDIATE 'SET ROLE SOME_ROLE IDENTIFIED BY SOME_PASSWORD '; END;"
connectionTestQuery............."SELECT 1 FROM DUAL"
connectionTimeout...............15000
dataSource......................null
dataSourceClassName.............null
dataSourceJNDI..................null
dataSourceProperties............{v$session.machine=host, password=<masked>, v$session.program=my application}
driverClassName................."oracle.jdbc.OracleDriver"
healthCheckProperties...........{}
healthCheckRegistry.............null
idleTimeout.....................60000
initializationFailFast..........true
isolateInternalQueries..........false
jdbc4ConnectionTest.............false
jdbcUrl........................."jdbc:oracle:thin:#//host.company.com:1521/database.company.com"
leakDetectionThreshold..........1800000
maxLifetime.....................0
maximumPoolSize.................18
metricRegistry..................null
metricsTrackerFactory...........null
minimumIdle.....................2
password........................<masked>
poolName........................"TEST_POOL"
readOnly........................false
registerMbeans..................false
scheduledExecutorService........null
threadFactory...................null
transactionIsolation............null
username........................"USER_NAME"
validationTimeout...............5000
Is it a bug or a misconfiguration?
I'm not 100% sure, but it looks like either:
You are evicting a connection after you have called close(), which is not allowed. Or..
You are evicting a connection and then calling close(), which is not allowed.
When you evict a connection, you must be the owner of that connection (obtained from getConnection()), and subsequently you must not close() the connection (it will be closed automatically). And as explained above, if you have called close() already, the connection is already back in the pool and it is not valid to evict it, as you are no longer the owner.
EDIT: Let me be clearer. From studying how this exception could be reached it seems clear that you are first closing the connection, and secondly evicting the connection. The reverse (evict and then close) would not result in this error.
This is known as "use after return" and is similar to "use after free" bugs in languages without garbage collection. When you close a connection, it is returned to the pool. From that instant the connection is available to be claimed by another thread -- the caller of close() is no longer the owner.
This is exactly analogous to calling free() on memory in C/C++. Instantly after doing so the memory is available to be claimed by another caller -- the caller of free() is no longer the owner. In the C/C++ case, if you continue to use a reference to the freed memory you risk corrupting data of another thread that has now allocated it.
In the case of nearly any pooling library in Java (connection or otherwise), once you release an object back to the pool you are no longer the owner. Nothing can prevent you from retaining a reference to the returned object.
In this case, once you have called close(), the object is returned to the pool instantly. If another thread obtains the connection legally from the pool (getConnection()), while at the same time the previous owner calls evict(), you will easily run into this issue.
We may choose to harden this code path (or not). HikariCP is not particularly paternalistic philosophically, favoring documentation over code. For example, if you pass a null into evict() you will be met with an NPE somewhere. Could we check for null and ignore it? Sure. Multiply that approach across the codebase and it could easily grow by 20%. Or, how about don't do that, developer?
It is a fairly simple contract:
You can only evict a connection that you own.
As soon as you have closed a connection, you no longer own it.
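A minimal sketch of that contract in code (evictConnection is HikariDataSource's eviction method; the data source setup is omitted and assumed to exist):

// Correct: the borrower owns the connection, evicts it, and does NOT call close().
Connection conn = ds.getConnection();      // ds is an already-configured HikariDataSource
// ... the code decides this particular connection is bad ...
ds.evictConnection(conn);                  // the pool closes the physical connection itself

// Incorrect: after close() the connection is back in the pool and another thread may
// already have borrowed it, so evicting it here is a "use after return".
// conn.close();
// ds.evictConnection(conn);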
So I was looking into the c3p0 API to debug one of our production issues that was resulting in a stack overflow error while checking out a connection.
I found the following comment in the BasicResourcePool class's checkoutResource method:
/*
* This function recursively calls itself... under nonpathological
* situations, it shouldn't be a problem, but if resources can never
* successfully check out for some reason, we might blow the stack...
*
* by the semantics of wait(), a timeout of zero means forever.
*/
I want to know what might be the reasons the resources from this pool can never get successfully checked out.
The answer might help me look into what might be going possibly wrong in my application.
So, although it's a reasonable guess, mere pool exhaustion (what happens if you leak or forget to close() Connections) won't lead to the stack overflow.
The stack overflow occurs when checkoutResource(...)
finds a Connection available to check out, and "preliminarily" checks it out; then
something goes wrong, indicating that the preliminarily checked-out Connection is not usable; so
the function goes "back to the well", recursively calling itself to try again with a fresh Connection
The mystery is in the "something goes wrong" part. There are really two things that can go wrong:
(most likely!) You have testConnectionOnCheckout set to true and all Connections are failing their Connection tests
The Connection happened to be removed (e.g. expired for exceeding maxIdleTime or maxConnectionAge) from the pool during the checkout procedure
If you are seeing this, the first thing to examine is whether there is a problem with your Connection or your Connection testing regime. Try...
Log com.mchange.v2.resourcepool.BasicResourcePool at DEBUG or FINE and look for Exceptions indicating an inability to check out. You can grep for "A resource could not be refurbished for checkout". Alternatively, switch Connection testing regimes to testing idle Connections and on Connection check-in rather than on check-out, and watch the problem show up in a perhaps less disruptive way.
If you are doing something that would force the pool to really churn Connections, setting very short timeouts or something, it's imaginable that the race condition is biting. Check your values for configuration properties maxConnectionAge, maxIdleTime, and maxIdleTimeExcessConnections and make sure that they are reasonable or not set (i.e. left at reasonable defaults).
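A sketch of switching the testing regime on a ComboPooledDataSource such as cpds from the earlier configuration example (the values are only illustrative):

// Test idle connections periodically and on check-in instead of on every check-out.
cpds.setTestConnectionOnCheckout( false );
cpds.setTestConnectionOnCheckin( true );
cpds.setIdleConnectionTestPeriod( 30 );   // seconds between tests of idle connections
cpds.setPreferredTestQuery( "SELECT 1" ); // cheap test query; use the variant your database supports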
Encountering the following error with our J2EE application:
java.sql.SQLException: Error in allocating a connection. Cause: In-use connections equal max-pool-size and expired max-wait-time. Cannot allocate more connections.
How do I know how many connections the application is currently using, and what should be the optimal connection pool settings for a heavily trafficked application? Can I change it, and how can I determine what I should set it to (is it a memory issue, bandwidth, etc.)?
How do I know how many connections the application is currently using
You don't give enough information to answer that. Most appservers will have some sort of JMX reporting on things like this. Alternatively, depending on the database, you could find the number of currently open connections.
what should be the optimal connection pool settings for a heavily trafficked application
Higher than what you've got?
The above of course assumes that you're not mishandling connections. By that I mean if you're using them directly in code you should always use this idiom:
Connection conn = null;
try {
conn = ... ; // get connection
// do stuff
} finally {
if (conn != null) try { conn.close(); } catch (Exception e) { }
}
If you're not releasing connections back to the pool and are waiting for the garbage collector to clean them up and release them, you're going to use way more connections than you actually need.
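On Java 7 and later, the same guarantee can be written more compactly with try-with-resources (a minimal sketch, assuming a DataSource named dataSource):

// close() runs automatically at the end of the block, returning the connection to the pool.
try (Connection conn = dataSource.getConnection()) {
    // do stuff
} catch (SQLException e) {
    // handle or rethrow
}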
The first thing to check is whether you have resource leaks. Try monitoring with jconsole or jvisualvm to see how your application behaves and whether anything stands out.
Beyond that, the actual JDBC connection pool is inherently Java EE container specific, so you need to give more information.
Are you sure you are closing everything you need to? Take a look here.
Make sure that the close goes in the finally block. You will see this code in the link:
finally
{
closeAll(resultSet, statement, connection);
}
I helped someone else find a similar issue (theirs was a memory issue instead... but if the process had gone on longer it would have had this result) where they had not closed the result set or the connection.
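For reference, a closeAll helper like the one above might look roughly like this (a hypothetical sketch; the linked article's actual implementation may differ):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

// Close JDBC resources in reverse order of acquisition, ignoring close() failures
// so that one failure does not prevent the remaining resources from being released.
static void closeAll(ResultSet resultSet, Statement statement, Connection connection) {
    if (resultSet != null)  try { resultSet.close(); }  catch (Exception ignored) { }
    if (statement != null)  try { statement.close(); }  catch (Exception ignored) { }
    if (connection != null) try { connection.close(); } catch (Exception ignored) { }
}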
I think you may need to ask yourself some questions; here are some to think about:
A. Are you using your connections correctly: are you closing them and returning them to the pool after use?
B. Are you using long-running transactions, e.g. "conversations" with users? Can they be left hanging if the user stops using the application?
C. Have you designed your data access to fit your application? E.g. are you using caching techniques in areas where you expect frequent repeated reads?
D. Is your connection pool big enough? Five years ago I had an application with 250 simultaneous connections to an Oracle database, a lot more than what you typically find out of the box; running the application with, say, 50 didn't work.
To give a more detailed answer, you need to provide more info on the app.
Good luck!