I encountered a critical problem with the c3p0 library (version 0.9.5.2) that I use in my Java SE application.
My application uses a Thread Pool to parallelize task by executing jobs.
Each job uses the database to read, update or delete data at least once up to (in very rare cases but it can happen) more than 10,000 times.
I therefore included in my project c3p0 library to have a connection pool to the database so that all workers in my thread pool can simultaneously interact with it.
I do not have any problems when running my application on my development environment (OSX 10.11), but when I run it in production (Linux Debian 8) I encounter a big problem ! Indeed it freezes....
At first it was a deadlock with the following trace stack:
[WARNING] com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector#479d237b -- APPARENT DEADLOCK!!! Creating emergency threads for unassigned pending tasks!
[WARNING] com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector#479d237b -- APPARENT DEADLOCK!!! Complete Status:
Managed Threads: 3
Active Threads: 3
Active Tasks:
com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask#264fb34f
on thread: C3P0PooledConnectionPoolManager[identityToken->z8kfsx9l1adv4kd1qtfdi6|659f3099]-HelperThread-#2
com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask#39a5576b
on thread: C3P0PooledConnectionPoolManager[identityToken->z8kfsx9l1adv4kd1qtfdi6|659f3099]-HelperThread-#1
com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask#5e676544
on thread: C3P0PooledConnectionPoolManager[identityToken->z8kfsx9l1adv4kd1qtfdi6|659f3099]-HelperThread-#0
Pending Tasks:
com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask#6848208c
Pool thread stack traces:
Thread[C3P0PooledConnectionPoolManager[identityToken->z8kfsx9l1adv4kd1qtfdi6|659f3099]-HelperThread-#2,5,main]
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
com.microsoft.sqlserver.jdbc.SocketFinder.findSocketUsingJavaNIO(IOBuffer.java:2438)
com.microsoft.sqlserver.jdbc.SocketFinder.findSocket(IOBuffer.java:2290)
com.microsoft.sqlserver.jdbc.TDSChannel.open(IOBuffer.java:551)
com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:1962)
com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:1627)
com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:1458)
com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:772)
com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:1168)
com.mchange.v2.c3p0.DriverManagerDataSource.getConnection(DriverManagerDataSource.java:175)
com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:220)
com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:206)
com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.acquireResource(C3P0PooledConnectionPool.java:203)
com.mchange.v2.resourcepool.BasicResourcePool.doAcquire(BasicResourcePool.java:1138)
com.mchange.v2.resourcepool.BasicResourcePool.doAcquireAndDecrementPendingAcquiresWithinLockOnSuccess(BasicResourcePool.java:1125)
com.mchange.v2.resourcepool.BasicResourcePool.access$700(BasicResourcePool.java:44)
com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask.run(BasicResourcePool.java:1870)
com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:696)
Thread[C3P0PooledConnectionPoolManager[identityToken->z8kfsx9l1adv4kd1qtfdi6|659f3099]-HelperThread-#1,5,main]
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
com.microsoft.sqlserver.jdbc.SocketFinder.findSocketUsingJavaNIO(IOBuffer.java:2438)
com.microsoft.sqlserver.jdbc.SocketFinder.findSocket(IOBuffer.java:2290)
com.microsoft.sqlserver.jdbc.TDSChannel.open(IOBuffer.java:551)
com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:1962)
com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:1627)
com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:1458)
com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:772)
com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:1168)
com.mchange.v2.c3p0.DriverManagerDataSource.getConnection(DriverManagerDataSource.java:175)
com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:220)
com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:206)
com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.acquireResource(C3P0PooledConnectionPool.java:203)
com.mchange.v2.resourcepool.BasicResourcePool.doAcquire(BasicResourcePool.java:1138)
com.mchange.v2.resourcepool.BasicResourcePool.doAcquireAndDecrementPendingAcquiresWithinLockOnSuccess(BasicResourcePool.java:1125)
com.mchange.v2.resourcepool.BasicResourcePool.access$700(BasicResourcePool.java:44)
com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask.run(BasicResourcePool.java:1870)
com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:696)
Thread[C3P0PooledConnectionPoolManager[identityToken->z8kfsx9l1adv4kd1qtfdi6|659f3099]-HelperThread-#0,5,main]
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
com.microsoft.sqlserver.jdbc.SocketFinder.findSocketUsingJavaNIO(IOBuffer.java:2438)
com.microsoft.sqlserver.jdbc.SocketFinder.findSocket(IOBuffer.java:2290)
com.microsoft.sqlserver.jdbc.TDSChannel.open(IOBuffer.java:551)
com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:1962)
com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:1627)
com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:1458)
com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:772)
com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:1168)
com.mchange.v2.c3p0.DriverManagerDataSource.getConnection(DriverManagerDataSource.java:175)
com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:220)
com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:206)
com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.acquireResource(C3P0PooledConnectionPool.java:203)
com.mchange.v2.resourcepool.BasicResourcePool.doAcquire(BasicResourcePool.java:1138)
com.mchange.v2.resourcepool.BasicResourcePool.doAcquireAndDecrementPendingAcquiresWithinLockOnSuccess(BasicResourcePool.java:1125)
com.mchange.v2.resourcepool.BasicResourcePool.access$700(BasicResourcePool.java:44)
com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask.run(BasicResourcePool.java:1870)
com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:696)
Subsequently I made some changes following the advice on different websites:
System.setProperty("com.mchange.v2.log.MLog", "com.mchange.v2.log.FallbackMLog");
System.setProperty("com.mchange.v2.log.FallbackMLog.DEFAULT_CUTOFF_LEVEL", "WARNING");
// Create db pool
final ComboPooledDataSource cpds = new ComboPooledDataSource() ;
// Driver
cpds.setDriverClass( "com.microsoft.sqlserver.jdbc.SQLServerDriver" ); // loads the jdbc driver
// Url
cpds.setJdbcUrl( "jdbc:xxxx://xxxxx:xxxx;database=xxxxx;" );
// Username / Password
cpds.setUser( "xxxx" ) ;
cpds.setPassword( "xxxx" ) ;
// Start size of db pool
cpds.setInitialPoolSize( 8 );
// Min and max db pool size
cpds.setMinPoolSize( 8 ) ;
cpds.setMaxPoolSize( 10 ) ;
// ????
cpds.setNumHelperThreads( 5 ) ;
// Max allowed time to execute statement for a connection
// #See http://stackoverflow.com/questions/14730379/apparent-deadlock-creating-emergency-threads-for-unassigned-pending-tasks
cpds.setMaxAdministrativeTaskTime( 60 ) ;
// ?????
cpds.setMaxStatements( 180 ) ;
cpds.setMaxStatementsPerConnection( 180 ) ;
// ?????
cpds.setUnreturnedConnectionTimeout( 60 ) ;
// ?????
cpds.setStatementCacheNumDeferredCloseThreads(1);
// We make a test : open and close opened connection
cpds.getConnection().close() ;
After these changes, after the execution of some jobs, the application freezes for several ten of seconds and then displays this error message:
[WARNING] A task has exceeded the maximum allowable task time. Will interrupt() thread [Thread[C3P0PooledConnectionPoolManager[identityToken->z8kfsx9l1ao3z0x88z7oi|4dd889bd]-HelperThread-#4,5,main]], with current task: com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask#4128b402
[WARNING] Thread [Thread[C3P0PooledConnectionPoolManager[identityToken->z8kfsx9l1ao3z0x88z7oi|4dd889bd]-HelperThread-#4,5,main]] interrupted.
[WARNING] A task has exceeded the maximum allowable task time. Will interrupt() thread [Thread[C3P0PooledConnectionPoolManager[identityToken->z8kfsx9l1ao3z0x88z7oi|4dd889bd]-HelperThread-#3,5,main]], with current task: com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask#5d6aab6d
[WARNING] Thread [Thread[C3P0PooledConnectionPoolManager[identityToken->z8kfsx9l1ao3z0x88z7oi|4dd889bd]-HelperThread-#3,5,main]] interrupted.
[WARNING] A task has exceeded the maximum allowable task time. Will interrupt() thread [Thread[C3P0PooledConnectionPoolManager[identityToken->z8kfsx9l1ao3z0x88z7oi|4dd889bd]-HelperThread-#0,5,main]], with current task: com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask#70a3328f
[WARNING] Thread [Thread[C3P0PooledConnectionPoolManager[identityToken->z8kfsx9l1ao3z0x88z7oi|4dd889bd]-HelperThread-#0,5,main]] interrupted.
My questions are :
Why does the application work perfectly in a development environment and encounters these problems during production?
Above all, how to remedy it?
When a connection reaches the maximum number of statements defined with setMaxStatements and setMaxStatementsPerConnection, what happens to it? The connection is closed then another takes over while another one is created?
I did not quite understand the impact that the setStatementCacheNumDeferredCloseThreads function has on my application.
Thank you very much ! Have a good day.
OK. So. Your basic problem is simple. In your production environment, Connection acquisition attempts are eventually freezing, that is they are neither succeeding nor failing with an Exception, they are simply hanging. Ultimately, this is what you have to debug: Why is it that when c3p0 tries to connect to your production database, sometimes those calls to Driver.connect() hang? Whatever is causing that is outside of c3p0's control. You could be hitting limits in total connections at the DBMS side (not from this application, your maxPoolSize is quite modest, but perhaps your production server is overextended). If you are running on an older JVM, there was a known problem with hangs to SQLServer, see e.g. JDBC connection hangs with no response from SQL Server 2008 r2 Driver.getConnection hangs using SQLServer driver and Java 1.6.0_29 but I doubt you are running Java 6 at this point, and I don't know of more recent issues. In any case, it's quite clear from your logs that this is what is happening: c3p0 is trying to acquire Connections from the DBMS, the DBMS is hanging indefinitely, eventually all of c3p0's helper threads get saturated by hung tasks and you see an APPARENT DEADLOCK. To resolve the issue, you have to debug why attempts by your JDBC driver to connect to your DBMS sometimes hang.
Most of the things you did after scrounging random troubleshooting posts were not very relevant to this issue. The thing that did cause your logs to change was this setting
cpds.setMaxAdministrativeTaskTime( 60 );
That works around the problem in an ugly way. If a task hangs for a long period of time, that setting causes c3p0 to interrupt() the Thread on which it is running and abandon it. That prevents the deadlocks, but doesn't address their cause.
There is a surprising change, though, between the two logs. The replacement of APPARENT DEADLOCK spew with reports that a 'task has exceeded the maximum allowable task time' were to be expected. But interestingly, in your second log, the tasks that get interrupt()ed are not Connection acquisition attempts, but Connection destruction attempts. I don't know why that has changed, but the core issue is the same: attempts by your JDBC driver to interact with your DBMS are freezing indefinitely, nether succeeding nor failing promptly with an Exception. That is what you need to debug.
If you can't resolve the problem, you may be able to work around it. It's very ugly, but if you reduce maxAdministrativeTaskTime (say to 30) and increase numHelperThreads (say to 20), you may be able to largely eliminate the application pauses, as long as the freezes are infrequent. Increasing numHelperThreads increases the number of frozen tasks c3p0's Thread pool can tolerate before being completely blocked. reducing maxAdministrativeTaskTime reduces the lifetime of blockages. Obviously, the right thing to do is to debug the problem between the JDBC driver and the DBMS. But if that proves impossible, sometimes a workaround is the best you can do.
I would eliminate (at least for now) these three settings:
// ?????
cpds.setMaxStatements( 180 ) ;
cpds.setMaxStatementsPerConnection( 180 ) ;
// ?????
cpds.setStatementCacheNumDeferredCloseThreads(1);
The first two turn Statement caching on, which may or may not be desirable from a performance perspective for your application. But they increase the complexity of c3p0's interaction with the DBMS. SQLServer (among several databases) is very fragile with respect to multithreaded use of a Connection (which, at least per early versions of the JDBC spec, should be legal, but too bad). Setting statementCacheNumDeferredCloseThreads to 1 ensures that the Statement cache doesn't try to close an expiring Statement while the Connection is otherwise in use, and so prevents freezes, APPARENT DEADLOCKs that show up usually as hung Statement close tasks, not your issue. If you turn the Statement cache on, by all means keep statementCacheNumDeferredCloseThreads set to 1 to avoid freezes. But the safest, sanest thing is to avoid all the complexity of the Statement cache until you have your main issue debugged. You can restore these settings later to test whether they improve your application's performance. (If you do turn the Statement cache back on, my suggestion is that you just set maxStatementsPerConnection, and do not set a global maxStatements, or if you set both, set the per-Connection limit to a value much smaller than the global limit. But again, for now, just turn all this stuff off.)
To get to your specific questions:
Why does the application work perfectly in a development environment and encounters these problems during production?
That's an important clue that you want to use in debugging the hang between you JDBC driver and your DBMS. Something about your production server leads to a hang that does not show up in your development server. That may just a matter of the relatively low load on your development server and high load on your production server. But there may be other differences in settings that provide clues about the hangs.
Above all, how to remedy it?
Debug the hangs. If you cannot debug the hangs, try working around the issue with a shorter maxAdministrativeTaskTime and a larger numHelperThreads.
When a connection reaches the maximum number of statements defined with setMaxStatements and setMaxStatementsPerConnection, what happens to it? The connection is closed then another takes over while another one is created?
A Connection doesn't reach any of those things. These are parameters that describe the Statement cache. When the total number of cached Statements hits maxStatements, the least-recently-used cached Statement is closed (just the Statement, not its Connection). When a Connection's maxStatementsPerConnection is hit, that Connection's least-recently-used cached Statement is closed (but the Connection itself remains open and active).
I did not quite understand the impact that the setStatementCacheNumDeferredCloseThreads function has on my application.
If you are using the Statement cache (again, I recommend you turn it off for now), this setting ensures that expired Statements (see above) are not close()ed while their parent Connection is in use by some other Thread. The setting creates a dedicated Thread (or Threads) whose sole purpose is to wait for when Connections are no longer in use and close them only then (thus, statement cache deferred close threads).
I hope this helps!
Update: The bug that you are experiencing does look very much like the Java 6 bug. If you are running Java 6, you are in luck, the fix is probably just to update your production JVM to the most recent version of Java 6.
Related
The java application using ojdbc6.jar, tomcat 7, tomcat-dbcp-8.0.3.jar (and other jars that's probably not relevant to this question), JDK 7 (u51)
We have identified that there is a connection leak using v$session report where some connection gets into INACTIVE state for 7+ hours. This is also confirmed by Thread dump taken at frozen state.
Heap Dump (taken at frozen state) shows:
Total PoolableConnection and DefaultPooledObject is equal to maxTotal (expected for exhausted pool)
Each connection is associated to PooledObjectState ALLOCATED
(expected)
lastReturnTime = lastUserTime = lastBorrowedTime (in
DefaultPooledObject), which to me means: THREAD-1 (good workflow)
returned the connection which was immediately borrowed by THREAD-2
(bad workflow having leak) and THREAD-2 never closed the
connection, let it dangling!
ALL of the above observation makes sense, since we definitely have connection leak and eventual exhausted pool
My question is:
When I see, details about PoolableConnection, it has associated boolean _closed which is "true". Why/How could it have "_closed = true".
when I decompiled tomcat-dbcp jar, I could see that every time _closed is marked true, it will also associate IDLE state to connection object (instead of ALLOCATED).
Looking for theories on why this boolean is true.
PS: We have various ideas (like setting logAbandoned) to find the exact piece of code responsible for connection leak, I am looking forward to find a reason (or theory) for heap dump to capture these PoolableConnection _closed=true case.
Looking at DelegatingConnection source code it can be seen that closed=true can be set as the result of connection.close() or in a finally block after some exceptions as a safety measure.
} finally {
closed = true;
}
There's a leak, the connection is in an inconsistent state because it could not be closed and is probably ready to be processed in the ABANDONED life cycle phase.
Inspecting the pool through JMX could give another perspective.
The leak could be related to an improperly handled exception that otherwise would have given a hint on the pool bad state.
I have a scenario in production for a web app, where when a form is submitted the data gets stored in 3 tables in Oracle DB through JDBC. Sometimes I am seeing connection time out errors in logs while the app is trying to connect to Oracle DB through Java code. This is intermittent.
Below is the exception:
SQL exception while storing data in table
java.sql.SQLRecoverableException: IO Error: Connection timed out
Most of the times the web app is able to connect to data base and insert values in it but some times and I am getting this time out error and unable to insert data in it. I am not sure why am I getting this intermittent issue. When I checked the connections pool config in my application, I noticed the following things:
Pool Size (Maximum number of Connections that this pool can open) : 10
Pool wait (Maximum wait time, in milliseconds, before throwing an Exception if all pooled Connections are in use) : 1000
Since the pool size is just 10 and if there are multiple users trying to connect to data base will this connection time out issue occur ?
Also since there are 3 tables where the data insertion occurs we are doing the whole insertion in just one connection itself. We are not opneing each DB connection for each individual table.
NOTE: This application is deployed on AEM (Content Management system) server and connections pool config is provided by them.
Update: I tried setting the validation query in the connections pool but still I am getting the connection time out error. I am not sure whether the connections pool has checked the validation query or not. I have attached the connections pool above for reference.
I would try two things:
Try setting a validation query so each time the pool leases a connection, you're sure it's actually available. select 1 from dual should work. On recent JDBC drivers that should not be required but you might give it a go.
Estimate the concurrency of your form. A 10 connections pool is not too small depending on the complexity of your work on DB. It seems you're saving a form so it should not be THAT complex. How many users per day do you expect? Then, on peak time, how many users do you expect to be using the form at the same time? A 10 connections pool often leases and retrieves connections quite fast so it can handle several transactions per second. If you expect more, increase the size slightly (more than 25-30 actually degrades DB performance as more queries compete for resources there).
If nothing seems to work, it would be good to check what's happening on your DB. If possible, use Enterprise Manager to see if there are latches while doing stuff on those three tables.
I give this answer from programming point of view. There are multiple possibilities for this problem. These are following and i have added appropriate solution for it. As connection timeout occurs, means your new thread do not getting database access within mentioned time and it is due to:
Possibility I: Not closing connection, there should be connection leakage somewhere in your application Solution
You need to ensure this thing and need to check for this leakage and close the connection after use.
Possibility II: Big Transaction Solution
i. Is these insertion synchronized, if it is so then use it very carefully. Use it at block level not method level. And your synchronized block size should be minimum as much as possible.
What happen is if we have big synchronized block, we give connection, but it will be in waiting state as this synchronized block needs too much time for execution. so other thread waiting time increases. Suppose we have 100 users, each have 100 threads for that operation. 1st is executing and it takes too long time. and others are waiting. So there may be a case where 80th 90th,etc thread throw timeout. And For some thread this issue occurs.
So you must need to reduce size of the synchronized block.
ii. And also for this case also check If the transaction is big, then try to cut the transaction into smaller ones if possible:-
For an example here, for one insertion one small transaction. for second other small transaction, like this. And these three small transaction completes operation.
Possibility III: Pool size is not enough if usability of application is too high Solution
Need to increase the pool size. (It is applicable if you properly closes all the connection after use)
You can use Java Executor service in this case .One thread One connection , all asynchronous .Once transaction completed , release the connection back to pool.That way , you can get rid of this timeout issue.
If one connection is inserting the data in 3 tables and other threads trying to make connection are waiting, timeout is bound to happen.
I'm having an issue with the jdbc Connection Pool on glassfish handing out dead database connections. I'm running Glassfish 3.1.2.2 using jconn3 (com.sybase.jdbc3) to connect to Sybase 12.5. Our organization has a nightly reboot process during which time we restart the Sybase server. My issue manifests itself when an attempt to use a database connection during the reboot occurs. Here are the order of operations to produce my issue:
Sybase is down for restart.
Connection is requested from the pool.
DB operation fails as expected.
Connection is returned to the pool in a closed state.
Sybase is back up.
Connection is requested from the pool.
DB operation fails due to "Connection is already closed" exception.
Connection is returned to the pool
I've implemented a database recovery singleton that attempts to recover from this scenario. Any time a database exception occurs I make a jmx call to pause all queue's and execute a flushConnectionPool operation on the JDBC Connection Pool. If the database connection is still not up the process sets up a timer to retry in 10 minutes. While this process works, it's not without flaws.
I realize there's a setting on the pool so that you can require validation on the database connection prior to handing it out but I've shied away from this for performance reasons. My process performs approximately 5 million database transactions a day.
My question is, does anyone know of a way to avoid returning a dead connection back to the pool in the first place?
You've pretty well summed up your options. We had that problem, the midnight DB going down. For us, we turned on connection validation, but we don't have your transaction volume.
Glassfish offers a custom validation option, with which a class can be specified to do the validation.
By default, all the classes provided by Glassfish do (You'll see them offered as options in the console) is a SQL statement like this:
SELECT 1;
The syntax varies a bit between databases, SQL Server is uses '1', whereas for Postgres, it just uses 1. But the intent is the same.
The net is that it will cost you an extra DB hit every time you try to get a connection, but it's a really, really cheap hit. But still, it's a hit.
But you could implement your own version. It could do the check, say, every 10th request, or even less frequent. Roll a random number from 1 to N (N = 10, 20, 100...), if you get a '1', do the select (and fail, if it fails), otherwise return "true". But at the same time, configure it so that if you do detect an error, purge the entire pool. Clearly tweak this so you have a good chance of it happening when your db goes down at night (dunno how busy your system is at night) vs peak processing.
You could even "lower the odds" during peak processing. "if time between 6am and 6pm then odds = 1000 else odds = 100; if (random(odds) == 1) { do select... }"
A random option removes the need to maintain a thread safe counter.
In the end, it doesn't really matter, you just want a timely note that the DB is down so you can ask GF to abort the pool.
I can definitely see it thrashing a bit at the very beginning as the DB comes up, possibly refreshing the pool more than once, but that should be harmless.
Different ways you could play with that, but that's an avenue to consider.
I encounter a problem with java threads and I am not sure if its related to my approach or if thread pooling is going to resolve what I am trying to achieve.
for (int i = 0; i < 100; i++) {
verifier[i]=new Thread();
verifier[i].start();
}
I initialize 100 threads and start them. In the threads the code that gets executed is just
con = (HttpURLConnection) website.openConnection(url);
// gets only the header
con.setRequestMethod("HEAD");
con.setConnectTimeout(2000); // set timeout to 2 seconds
These threads repeat the process above over a long list of url/data.
The first 50 threads execute almost instantly then they just stop for 60 seconds or so and then there is another spike of execution that 20 of them or so finish at the same time and so on. The same deadlock occurs even if there is 4 of them.
My first guess was a deadlock. I am not sure how to resolve the issue and maintain a constant execution pace, without deadlocks and stops.
I am looking for an explanation of why this occurs and how it can be resolved.
By DeadLock I reefer to the Java Virtual Machine and how it handles thread. Not deadlock caused by my threads.
SCREENSHOT OF THREAD EXECUTION:
It looks like the threads are dying for no reason and I don't know why?!
It could be that the operating system configurable limit of tcp/ip connections gets hit, which causes the JVM to block waiting for a new TCP/IP connection to be created, which will only happen if a connection already used get's closed.
This could help to find what is going on:
profile the run with visualvm which comes with the JVM itself (run it on the command line with jvisualvm). There should be indication of how many threads are created and why are they blocked, deadlocks, etc.
Wait for it to block and take thread dumps of the JVM process to check for deadlocks in the thread stack traces using jstack or visualvm, search for the deadlock keyword.
Check with netstat -nao the state of your TCP connections, to see if the operating system limit is getting hit, if there are many connections in CLOSE_WAIT at the times the blocking occurs
Are you behind a corporate proxy/firewall, you could be hitting some other sort of security limit that prevents you from opening more TCP connections, not necessarily the limit of the operating system
If none of this helps you can always edit the question with further findinds, but based on the description of the code other limits are getting hit that on a first look don't seem related to JVM thread deadlocks, hope this helps.
While stress testing my JPA based DAO layer (Running 500 simultanious updates at the same time each in a separate thread). I encountered following - system always stuck unable to make any progress.
The problem was, that there were no available connections at some point for any thread, so no running thread could make any progress.
I have investigated this for a while and the root was REQUIRES_NEW annotation on add method in one of my JPA DAO's.
So the scenario was:
Test starts acquiring new Connection from ConnectionPool to start transaction.
After some initial phase, I call add on my DAO, causing it to request another Connection from ConnectionPool which there are no, because all the Connections by that time, were taken by parallel running tests.
I tried to play with DataSource configurations
c3p0 stucks
DBCP stucks
BoneCP stucks
MySQLDataSource fail some requests, with error - number of connections exceeded allowed.
Although I solved it by getting read of REQUIRES_NEW, with which all DataSources worked perfectly, still the best result seems to be of MySQLDataSource, since it did not stuck and just fail:)
So it seems you should not use REQUIRES_NEW at all, if you expect high throughput.
And my question:
Is there a configuration for either DataSources that could prevent this REQUIRES_NEW problem?
I played with checkout timeout in c3p0, and tests started to fail, as expected.
2 sec - 8 % passed
4 sec - 12 % passed
6 sec - 16 % passed
8 sec - 26 % passed
10 sec - 34 % passed
12 sec - 36 % passed
14/16/18 sec - 40 % passed
This is highly subjective ofcourse.
MySQLDataSource with plain configurations gave 20% of passed tests.
What about configuring a timeout for obtaining the connection? If the connection can't be obtained in say 2 seconds, the pool will abort and throw an exception.
Also note that REQUIRES is more typical. Often you'd want a call chain to share a transaction instead of starting a new transaction for each new call in a chain.
Probably any of the Connection pools can be configured to deal with this in any number of ways. Ultimately, all that REQUIRES_NEW is probably forcing your app to acquire more than one Connection per client, which is multiplying the stressfulness of your stress tests. If pools are hanging, it's probably because they are running out of Connections. If you set a sufficiently large pool size, you might resolve the issue. Alternatively, as Arjan suggests above, you can configure pools to "fail fast" instead of hanging indefinitely, if clients have to wait for a Connection. With c3p0, the config param for that would be checkoutTimeout.
Without more information about exactly what's going on when you say a Connection pool "stucks", this is necessarily guesswork. But under very high concurrent load, with any Connection pool, you'll either need to make lots of resources available (high maxPoolSize + numHelperThreads in c3p0), kick out excess clients (checkoutTimeout), or let clients just endure long (but finite!) wait times.