Gracefully handling DB Node reboot in J2EE application - java

In my organization, Oracle database is a 2 node RAC database. Each member of the cluster is on a reboot schedule that is:
Node 1 - First Sunday of each month at 1:00am
Node 2 - Second Sunday of each month at 1:00am
Whenever these node gets rebooted, I see below exception in my J2EE application log file:
org.hibernate.engine.transaction.spi.AbstractTransactionImpl.rollback(AbstractTransactionImpl.java:209)
... 154 more
Caused by: java.sql.SQLRecoverableException: ORA-01089: immediate shutdown in progress - no operations are permitted
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:445)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:389)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:382)
at oracle.jdbc.driver.T4C7Ocommoncall.processError(T4C7Ocommoncall.java:93)
at oracle.jdbc.driver.T4CTTIfun.receive
Our DBAs said that "We do a shutdown transactional local. It is supposed to try to wait for in flight transactions to finish and not allow new transactions. "
As I mentioned above, out of 2 nodes, only one node gets rebooted at a time, and considering DBAs answer.. our app should never block on database during reboot process.
My question is, why my application is throwing this exception then? And why my application is trying to connect to a DB node for which shut down is in progress?

Your application isn't trying to connect to a node that is shutting down. It's already connected when the shutdown starts.
I assume that your application maintains a connection pool in the middle tier. So, presumably, just before one of the nodes restarts, your connection pool has open connections to both nodes. When the DBA does a shutdown transactional, sessions that have active transactions are allowed to complete but most of the sessions in your connection pool that are connected to the node that is shutting down will not have active transactions at that point. When you get one of these connections from the connection pool and try to start a transaction, you'll get this error.
Most likely, you want to catch this error and reconnect which should cause a new connection to be made to the remaining node. The error you're getting is a SQLRecoverableException so it would generally make sense to try to recover.

Related

JDBC Connections not released till end of thread

I have a websphere application server with an MSSQL database connection with 10 in the connection pool. When my worker thread runs I have a process that runs the following:
Psuedo code
for(List<String> items){
Connection connection = null;
Statement statement = null;
try{
connection = getConnection();
statement = connection.createStatement();
String sql = "exec someStoredProcedure '" + item + "';";
statement.execute(sql);
}finally{
//real code includes null checks and try catch for errors
statement.close();
connection.close();
}
}
My problem is that if my loop is larger than 10 it will hang till it hits the timeout and error "ConnectionWaitTimeoutException: J2CA1010E: Connection not available; timed out waiting for 180 seconds".
I would think that closing the statements and connections would allow me to reuse the connection from the pool. The connections do get closed/collected after the thread finishes. I've updated my code to reuse the same connection through the for loop and I can run the process rapid fire over and over without problems because each thread cleans up after itself it seems. Any idea what's going on or how to resolve? I'm worried in the future I'll have a process that needs more than 10 threads over the course of running.
The behavior that you describe sounds inconsistent with how the connection pool is supposed to work. However, there are a number of important details that impact connection reuse which are not apparent from your writeup.
You are either using sharable or unsharable connections. You can find that out by looking at the resource reference that you used for the DataSource lookup. For example, #Resource with shareable=false or a deployment descriptor resource reference with <res-sharing-scope>Unshareable</res-sharing-scope> is unsharable, whereas if you set true for the former or Shareable for the latter or omit the attribute, then the connection is sharable. If you didn't use a resource reference, connections will typically be sharable, unless you explicitly overrode that with advanced configuration.
If you are using an unsharable connection and the entire code block is not encompassed within a global transaction, then every connection.close will return the connection to the pool to become immediately reusable. If you are in a global transaction, then each connection will remain unavailable after close because there is outstanding work on it that still needs to be committed or rolled back.
If you are using a sharable connection and each of the connection requests match (same user/password or absence thereof for all), then you should keep getting the same underlying connection back and should not run out of connections. If however, the connection requests do not match, then the connections are kept around for the duration of the scope (could be a transaction or a request scope like servlet boundary) and you will keep getting a new connection with each request until you run out.
If the above isn't enough to figure out the issue, add more detail to your scenario indicating the sharability, how you request each connections, and the placement of transaction boundaries and request boundaries, so that a more detailed answer for your specific scenario can be given.

Gremlin server withRemote connection closed - how to reconnect automatically?

I am using withRemote to connect my java application to gremlin server running in AWS with dynamodb storage backend. I am getting connection timeout after few seconds (~3.3 seconds):
org.apache.tinkerpop.gremlin.process.remote.RemoteConnectionException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.nio.channels.ClosedChannelException]]
I need to figure out how to reconnect which means detecting if the connection is closed. I am not sure how to detect that. I get the above exception when I use the graph traversal, is there a way to discover it before and reconnect or is there an option in configuration that allows reconnecting automatically (like create new connection before this one closes) so my application is always connected?
In case you need, this is how I am doing connection - currently connection part is singleton when the application starts:
this.graph = EmptyGraph.instance();
GryoMessageSerializerV1d0 gryoMessageSerializerV1d0 = new GryoMessageSerializerV1d0(
GryoMapper.build().addRegistry(JanusGraphIoRegistry.getInstance()));
this.cluster = Cluster.build().serializer(gryoMessageSerializerV1d0)
.addContactPoint(configuration.getString("graphDb.host", "localhost"))
.port(configuration.getInt("graphDb.port", 8182)).create();
this.graphTraversalSource = this.graph.traversal().withRemote(DriverRemoteConnection.using(cluster));
I feel like this problem is already solved with connection.keepAlive configuration option. It defaults to 180 seconds so it's longer than your timeout of 60 seconds in your load balancer which is why it gives up.
That said, the driver should be reconnecting on its own. It's constantly trying to do that given the connectionPool.reconnectInterval but perhaps there is a condition where you're quickly exhausting all the connections to the point of getting that error....not sure. Either way, hopefully the

Using galera mariadb with jdbc

I've created a mariadb cluster and I'm trying to get a Java application to be able to failover to another host when one of them dies.
I've created an application that creates a connection with "jdbc:mysql:sequential://host1,host2,host3/database?socketTimeout=2000&autoReconnect=true". The application makes a query in a loop every second. If I kill the node where the application is currently executing the query (Statement.executeQuery()) I get a SQLException because of a timeout. I can catch the exception and re-execute the statement and I see that the request is being sent to another server, so failover in that case works ok. But I was expecting that executeQuery() would not throw an exception and silently retry another server automatically.
Am I wrong in assuming that I shouldn't have to handle an exception and explicitely retry the query? Is there something more I need to configure for that to happen?
It is dangerous to auto reconnect for the following reason. Let's say you have this code:
BEGIN;
SELECT ... FROM tbl WHERE ... FOR UPDATE;
(line 3)
UPDATE tbl ... WHERE ...;
COMMIT;
Now let's say the server crashes at (line 3). The transaction will be rolled back. In my fabricated example, that only involves releasing the lock on tbl.
Now let's say that some other connection succeeds in performing the same transaction on the same row while you are auto-reconnecting.
Now, with auto-reconnect, the first thread is oblivious that the first half of the transaction was rolled back and proceeds to do the UPDATE based on data that is now out of date.
You need to get an exception so that you can go back to the BEGIN so that you can be "transaction safe".
You need this anyway -- With Galera, and no crashes, a similar thing could happen. Two threads performing that transaction on two different nodes at the same time... Each succeeds until it gets to the COMMIT, at which point the Galera magic happens and one of the COMMITs is told to fail. The 'right' response is replay the entire transaction on the server that was chosen for failure.
Note that Galera, unlike non-Galera, requires checking for errors on COMMIT.
More Galera tips (aimed at devs and dbas migrating from non-Galera)
Failover doesn't mean that application doesn't have to handle exceptions.
Driver will try to reconnect to another server when connection is lost.
If driver fail to reconnect to another server a SQLNonTransientConnectionException will be thrown, pools will automatically discard those connection.
If connection is recovered, there is some marginals cases where relaunching query is safe: when query is not in a transaction, and connection is currently in read-only mode (using Spring #Transactional(readOnly = false)) for example. For thoses cases, MariaDb java connection will then relaunch query automatically. In those particular cases, no exception will be thrown, and failover is transparent.
Driver cannot re-execute current query during a transaction.
Even without without transaction, if query is an UPDATE command, driver cannot know if the last request has been received by the database server and executed.
Then driver will send an SQLException (with SQLState begining by "25" = INVALID_TRANSACTION_STATE), and it's up to the application to handle those cases.

MQ Connection - 2009 Connection broken error on active channel

I'm upgrading our application to MQ7 (7.5.0.5) and I'm seeing some odd behavior in a small test application that I have written.
My application uses Springs CachingConnectionFactory and is configured to use only one thread.
I can see that by debugging through the code 2 tcp connections are created, one for the initial connection and one for the JMS session. Every 60 seconds, the 2 tcp connections that are used by my message sink are broken and replaced with 2 new connections.
The following error is present in the error logs on the queue manager.
05/16/2016 09:38:26 AM - Process(1609.14) User(mqm) Program(amqrmppa)
Host(xxxxxxxxx) Installation(Installation1)
VRMF(7.5.0.2) QMgr(xxxxx)
AMQ9271: Channel 'XX.XXXX.X' timed out.
EXPLANATION:
A timeout occurred while waiting to receive from the other end of channel
'XX.XXX.X'. The address of the remote end of the connection was '57.4.4.145'.
ACTION:
The return code from the (recv) [TIMEOUT] 60 seconds call was 0 (X'0').
Record these values and tell the systems administrator.
I have the following settings on my channel: DISCINT(60), SHARECNV(1), the exceptions are linked to the DISCINT time, changing that changes the frequency of the exceptions, also the
exceptions disappear with a SHARECNV value >1
Can anyone tell my why the connections are broken even when the channel is active and messages are being sent and received?
Thanks!
This sounds like APAR IV62728 which describes the symptoms you're seeing:
http://www-01.ibm.com/support/docview.wss?rs=171&uid=swg1IV62728
Fixed in 7.5.0.6. Try upgrading to that level and see if it solves the problem.
I managed to find a solution to this issue. When using the CachingConnectionFactory with an underlying IBM connection factory, an initial connection is created in a stopped state. That connection in then used to create JMSSessions.
The issue was that the initial common connection was timing out.
I managed to keep the connection active by adjusting the HBINT value to 5. It appears that a number of heatbeats are required to keep the connection open and my initial value of 20 was too high.

Timeout implementation of JPA transactions and Session invalidation

I have been handling a application which uses wicket+JPA+springs technologies.Recently we got many 5XX error in logs(greater than threshhold).During that time,There were some general problems due to unstable response times of the mainframe db2 which is backend for our application.
But after that once the mainframe is OK this application servers did not come to normal again.
There are a lot of hanging transactions (from my appplication).
There are many threads in the server that may be hung.
As users will go on keeping login or will access the links in aplication during that time the situation becomes worse.
When I look at webspehere logs I found following exceptions:
00000035 ThreadMonitor W WSVR0605W: Thread "WebContainer : 88" (000005ac)
has been active for 637111 milliseconds and may be hung.
There is/are 43 thread(s) in total in the server that may be hung.
In application logs i found following exceptions:
-->CouldNotLockPageException: Could not lock page 4. Attempt lasted 3 minutes
-->DefaultExceptionMapper - Connection lost, give up responding.
org.apache.wicket.protocol.http.servlet.ResponseIOException:
com.ibm.wsspi.webcontainer.ClosedConnectionException: OutputStream encountered error during
write.
--> JDBCExceptionReporter - [jcc][t4][2030][11211][3.67.27] A communication error occurred
during operations on the connection's underlying socket, socket input stream,
or socket output stream.
Error location: Reply.fill() - socketInputStream.read (-1). Message:
Connection reset. ERRORCODE=-4499, SQLSTATE=08001DSRA0010E: SQL State = 08001, Error Code = - 4.499
Now we are working on the solutions to this problem.The follwing are two solutions that we are thinking as of now.
1.I have gone through many forums and found that whenever we get CouldNotLockPageException then it would be better to invaidate the session and force user to login page.Currently We do not have session invalidation (logout) mechanism.So we will implement that one.
2.We need to implement transaction timeouts so that we can stop hanging transactions.
I need solution for this problem from java or server side.Here we are using wicket,jpa and springs frameworks.I have few queries.
1.How can we implement transaction timeouts in the above frameworks?
2.Will invalidating session can stop hanging transaction or threads that may hung?
Since you are already using Spring, it's as simple as that:
#Transactional(timeout = 300)
The Transaction annotation allow you to supply a timeout value(in seconds) and the transaction manager will forward it to the JTA transaction manager or your Data Source connection pool. It works nice with Bitronix Transaction Manager, which automatically picks it up.
You also need to make sure the java.sql.Conenction are always being closed and Transaction are always committed (when all operations succeeded) or rollbacked on failure.
Invalidating the user http session has nothing to do with jdbc connections. Your jdbc connection should always be committed/rollbacked and closed(which in case on connection pooling, will release the connection to the pool).
And make sure the max pool size is not greater than tour db max concurrent connections setting.

Categories

Resources