MongoCursorNotFoundException - Query failed with error code -5 - java

We are getting the following exception.
com.mongodb.MongoCursorNotFoundException: Query failed with error code -5 and error message 'Cursor 43249415092 not found on server xx.xx.xx.xx:27017'
at com.mongodb.connection.GetMoreProtocol.receiveMessage(GetMoreProtocol.java:115)
at com.mongodb.connection.GetMoreProtocol.execute(GetMoreProtocol.java:68)
at com.mongodb.connection.GetMoreProtocol.execute(GetMoreProtocol.java:37)
at com.mongodb.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:155)
at com.mongodb.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:219)
at com.mongodb.connection.DefaultServerConnection.getMore(DefaultServerConnection.java:194)
at com.mongodb.operation.QueryBatchCursor.getMore(QueryBatchCursor.java:197)
at com.mongodb.operation.QueryBatchCursor.hasNext(QueryBatchCursor.java:93)
at com.mongodb.MongoBatchCursorAdapter.hasNext(MongoBatchCursorAdapter.java:46)
at com.mongodb.DBCursor.hasNext(DBCursor.java:152)
We are unable to find the root cause, since we get this exception only rarely.
We also observed that sometimes the application is unable to read from the cursor
but no exception is thrown.
In the cases where no exception is thrown, we took a thread dump and found that the thread reading from Mongo is stuck in the RUNNABLE state:
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at com.mongodb.connection.SocketStream.read(SocketStream.java:85)
at com.mongodb.connection.InternalStreamConnection.receiveResponseBuffers(InternalStreamConnection.java:503)
at com.mongodb.connection.InternalStreamConnection.receiveMessage(InternalStreamConnection.java:221)
at com.mongodb.connection.UsageTrackingInternalConnection.receiveMessage(UsageTrackingInternalConnection.java:102)
at com.mongodb.connection.DefaultConnectionPool$PooledConnection.receiveMessage(DefaultConnectionPool.java:416)
at com.mongodb.connection.GetMoreProtocol.receiveMessage(GetMoreProtocol.java:112)
at com.mongodb.connection.GetMoreProtocol.execute(GetMoreProtocol.java:68)
at com.mongodb.connection.GetMoreProtocol.execute(GetMoreProtocol.java:37)
at com.mongodb.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:155)
at com.mongodb.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:219)
at com.mongodb.connection.DefaultServerConnection.getMore(DefaultServerConnection.java:194)
at com.mongodb.operation.QueryBatchCursor.getMore(QueryBatchCursor.java:197)
at com.mongodb.operation.QueryBatchCursor.hasNext(QueryBatchCursor.java:93)
at com.mongodb.MongoBatchCursorAdapter.hasNext(MongoBatchCursorAdapter.java:46)
at com.mongodb.DBCursor.hasNext(DBCursor.java:152)
Can anyone help me find the root cause of this issue?

Recently, I met the same issue. After a long time researching it, I figured it out. In my scenario, I have 4 mongos instances sitting behind a load balancer (which returns a mongos IP address randomly). In my connection string, I used the load balancer host as the address of the MongoDB cluster.
When the app starts, the MongoDB driver creates a server with a connection pool. In that pool there are mixed connections coming from the 4 mongos instances.
When you query a large data set (larger than the batchSize), the first batch comes from, say, mongos A; when the following getMore request is sent, the connection used may point to mongos B, C, or D (source code). Those mongos instances can't find the cursor, of course, so the MongoCursorNotFoundException is thrown.
How do you handle it?
Do not use the load balancer host in your connection string; use all the mongos IP addresses instead, and let the MongoDB driver itself balance the requests.
WRONG: mongodb://your.load.balance.host:27000/yourDB?connectTimeoutMS=60000&minPoolSize=100&maxPoolSize=100&waitqueuemultiple=20&waitqueuetimeoutms=60000
RIGHT: mongodb://10.0.0.1:27017,10.0.0.2:27017,10.0.0.3:27017,10.0.0.4:27017/yourDB?connectTimeoutMS=60000&minPoolSize=100&maxPoolSize=100&waitqueuemultiple=20&waitqueuetimeoutms=60000
There is an even better solution: configure a unique, dedicated hostname for each mongos, then modify the RIGHT connection string by replacing the IP addresses with those hostnames.
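As a minimal sketch of the RIGHT approach with the legacy 3.x Java driver (matching the stack trace above; the hosts and database name are placeholders), the client is built from the full seed list so the driver routes each cursor's getMore requests back to the server that created the cursor:

import com.mongodb.MongoClient;
import com.mongodb.MongoClientURI;

// Seed list of every mongos; the driver, not an external load
// balancer, now decides which server each request goes to.
MongoClientURI uri = new MongoClientURI(
        "mongodb://10.0.0.1:27017,10.0.0.2:27017,10.0.0.3:27017,10.0.0.4:27017/yourDB"
        + "?connectTimeoutMS=60000&minPoolSize=100&maxPoolSize=100");
MongoClient client = new MongoClient(uri);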

Related

Setting connectionTimeout to 1 sec is throwing Socket Timed Out error

I am working on a web application which runs in a PCF environment and has approximately 100 users. I am using the HikariCP library to manage database connections, and I customized the connectionTimeout property by setting it to 1 sec in the application code. The connection pool size is set to 100.
In one scenario, I make a call to a stored procedure for which I explicitly create a
Connection = DriverManager.getConnection()
object, because ArrayDescriptor() expects a connection object. I am using ArrayDescriptor because the stored procedure requires an array of objects as input.
However, this code throws a Socket Read Timed Out error randomly.
The same code was working fine when configured with a dbcp-managed connection pool.
Can anyone help? What's the problem with the HikariCP library?
As per compliance rules I can't post code on public domains.
connectionTimeout
This property controls the maximum number of milliseconds that a client (that's you) will wait for a connection from the pool. If this time is exceeded without a connection becoming available, a SQLException will be thrown. Lowest acceptable connection timeout is 250 ms. Default: 30000 (30 seconds)
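Note that connectionTimeout bounds the wait for a free connection from the pool; it is not a socket read timeout, and a connection created directly via DriverManager.getConnection() bypasses the pool (and this setting) entirely. A minimal sketch of configuring it, with a placeholder JDBC URL:

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:oracle:thin:@//db.example.com:1521/ORCL"); // placeholder
config.setMaximumPoolSize(100);
// Maximum time a caller waits for a pooled connection; HikariCP
// rejects values below 250 ms, and the default is 30 seconds.
config.setConnectionTimeout(1000);
HikariDataSource dataSource = new HikariDataSource(config);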

Gremlin server withRemote connection closed - how to reconnect automatically?

I am using withRemote to connect my Java application to a Gremlin server running in AWS with the DynamoDB storage backend. I am getting a connection timeout after a few seconds (~3.3 seconds):
org.apache.tinkerpop.gremlin.process.remote.RemoteConnectionException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.nio.channels.ClosedChannelException]]
I need to figure out how to reconnect, which means detecting whether the connection is closed. I am not sure how to detect that. I get the above exception when I use the graph traversal. Is there a way to discover the problem earlier and reconnect, or is there a configuration option that reconnects automatically (for example, creating a new connection before this one closes) so my application is always connected?
In case you need it, this is how I make the connection; currently the connection part is a singleton created when the application starts:
this.graph = EmptyGraph.instance();
GryoMessageSerializerV1d0 gryoMessageSerializerV1d0 = new GryoMessageSerializerV1d0(
        GryoMapper.build().addRegistry(JanusGraphIoRegistry.getInstance()));
this.cluster = Cluster.build()
        .serializer(gryoMessageSerializerV1d0)
        .addContactPoint(configuration.getString("graphDb.host", "localhost"))
        .port(configuration.getInt("graphDb.port", 8182))
        .create();
this.graphTraversalSource = this.graph.traversal()
        .withRemote(DriverRemoteConnection.using(cluster));
I feel like this problem is already solved by the connection.keepAlive configuration option. It defaults to 180 seconds, which is longer than the 60-second timeout in your load balancer, which is why the connection gives up.
That said, the driver should be reconnecting on its own. It constantly tries to do that, governed by connectionPool.reconnectInterval, but perhaps there is a condition where you're quickly exhausting all the connections to the point of getting that error... not sure. Either way, hopefully the keepAlive setting gets you what you need.
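A minimal sketch of setting those options on the driver side (the contact point is a placeholder; keepAliveInterval and reconnectInterval are the Cluster.Builder counterparts of the connection.keepAlive and connectionPool.reconnectInterval settings):

Cluster cluster = Cluster.build()
        .addContactPoint("gremlin.example.com")   // placeholder host
        .port(8182)
        .serializer(gryoMessageSerializerV1d0)
        // Heartbeat idle connections before an intermediary such as
        // a load balancer closes them; the default is 180 seconds.
        .keepAliveInterval(30_000)
        // How often the driver retries a host it has marked dead.
        .reconnectInterval(1_000)
        .create();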

MQ Connection - 2009 Connection broken error on active channel

I'm upgrading our application to MQ 7 (7.5.0.5) and I'm seeing some odd behavior in a small test application that I have written.
My application uses Spring's CachingConnectionFactory and is configured to use only one thread.
By debugging through the code I can see that 2 TCP connections are created: one for the initial connection and one for the JMS session. Every 60 seconds, the 2 TCP connections used by my message sink are broken and replaced with 2 new connections.
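For reference, a minimal sketch of that setup (the host, port, queue manager, and channel names are placeholders):

import com.ibm.mq.jms.MQConnectionFactory;
import com.ibm.msg.client.wmq.WMQConstants;
import org.springframework.jms.connection.CachingConnectionFactory;

MQConnectionFactory mqFactory = new MQConnectionFactory();
mqFactory.setHostName("mq.example.com");   // placeholder
mqFactory.setPort(1414);
mqFactory.setQueueManager("QMGR1");        // placeholder
mqFactory.setChannel("XX.XXXX.X");
mqFactory.setTransportType(WMQConstants.WMQ_CM_CLIENT);

// One shared JMS connection plus one cached session gives the two
// TCP conversations described above when SHARECNV(1) is in effect.
CachingConnectionFactory cachingFactory = new CachingConnectionFactory(mqFactory);
cachingFactory.setSessionCacheSize(1);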
The following error is present in the error logs on the queue manager.
05/16/2016 09:38:26 AM - Process(1609.14) User(mqm) Program(amqrmppa)
Host(xxxxxxxxx) Installation(Installation1)
VRMF(7.5.0.2) QMgr(xxxxx)
AMQ9271: Channel 'XX.XXXX.X' timed out.
EXPLANATION:
A timeout occurred while waiting to receive from the other end of channel
'XX.XXX.X'. The address of the remote end of the connection was '57.4.4.145'.
ACTION:
The return code from the (recv) [TIMEOUT] 60 seconds call was 0 (X'0').
Record these values and tell the systems administrator.
I have the following settings on my channel: DISCINT(60) and SHARECNV(1). The exceptions are linked to the DISCINT time: changing it changes the frequency of the exceptions. The exceptions also disappear with a SHARECNV value greater than 1.
Can anyone tell me why the connections are broken even when the channel is active and messages are being sent and received?
Thanks!
This sounds like APAR IV62728, which describes the symptoms you're seeing:
http://www-01.ibm.com/support/docview.wss?rs=171&uid=swg1IV62728
It is fixed in 7.5.0.6. Try upgrading to that level and see if it solves the problem.
I managed to find a solution to this issue. When using the CachingConnectionFactory with an underlying IBM connection factory, an initial connection is created in a stopped state. That connection is then used to create JMS sessions.
The issue was that this initial common connection was timing out.
I managed to keep the connection active by lowering the HBINT value to 5. It appears that a number of heartbeats are required to keep the connection open, and my initial value of 20 was too high.

Timeout implementation of JPA transactions and Session invalidation

I have been maintaining an application which uses Wicket, JPA, and Spring. Recently we got many 5XX errors in the logs (greater than the threshold). During that time there were some general problems caused by unstable response times of the mainframe DB2 that serves as the backend for our application.
But after the mainframe recovered, the application servers did not return to normal.
There are a lot of hanging transactions (from my application), and many threads in the server may be hung.
As users keep logging in or accessing links in the application during that time, the situation becomes worse.
When I look at the WebSphere logs I find the following exceptions:
00000035 ThreadMonitor W WSVR0605W: Thread "WebContainer : 88" (000005ac)
has been active for 637111 milliseconds and may be hung.
There is/are 43 thread(s) in total in the server that may be hung.
In the application logs I found the following exceptions:
-->CouldNotLockPageException: Could not lock page 4. Attempt lasted 3 minutes
-->DefaultExceptionMapper - Connection lost, give up responding.
org.apache.wicket.protocol.http.servlet.ResponseIOException:
com.ibm.wsspi.webcontainer.ClosedConnectionException: OutputStream encountered error during
write.
--> JDBCExceptionReporter - [jcc][t4][2030][11211][3.67.27] A communication error occurred
during operations on the connection's underlying socket, socket input stream,
or socket output stream.
Error location: Reply.fill() - socketInputStream.read (-1). Message:
Connection reset. ERRORCODE=-4499, SQLSTATE=08001DSRA0010E: SQL State = 08001, Error Code = - 4.499
Now we are working on solutions to this problem. The following are the two solutions we are considering at the moment.
1. I have gone through many forums and found that whenever we get CouldNotLockPageException, it is better to invalidate the session and force the user back to the login page. Currently we do not have a session-invalidation (logout) mechanism, so we will implement one.
2. We need to implement transaction timeouts so that we can stop hanging transactions.
I need a solution to this problem on the Java or server side. We are using the Wicket, JPA, and Spring frameworks. I have a few queries:
1. How can we implement transaction timeouts in the above frameworks?
2. Will invalidating the session stop the hanging transactions or the threads that may be hung?
Since you are already using Spring, it's as simple as this:
@Transactional(timeout = 300)
The @Transactional annotation allows you to supply a timeout value (in seconds), and the transaction manager will forward it to the JTA transaction manager or your data source connection pool. It works nicely with the Bitronix Transaction Manager, which picks it up automatically.
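A minimal sketch of how that looks on a Spring service (the class and method names are hypothetical):

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class ReportService {

    // Spring marks the transaction for rollback if it runs longer
    // than 300 seconds, so it cannot hang indefinitely.
    @Transactional(timeout = 300)
    public void loadReportData() {
        // JPA operations here run inside the timed transaction
    }
}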
You also need to make sure java.sql.Connection objects are always closed, and that transactions are always committed (when all operations succeed) or rolled back on failure.
Invalidating the user HTTP session has nothing to do with JDBC connections. Your JDBC connections should always be committed or rolled back and then closed (which, with connection pooling, releases the connection back to the pool).
Also make sure the max pool size is not greater than your DB's maximum concurrent connections setting.

Handle HikariCP Oracle connection attempts

I presume I have a close-to-default HikariConfig with maximumPoolSize(5).
The problem I am faced with is that there are a lot of attempts to connect to the database even after the first one has failed. For instance, if the password I use to connect to Oracle is wrong and the connection fails, the pool still makes further attempts to connect, which locks the account as a result.
Question: Which HikariCP setting is supposed to limit the number of connection attempts to 1?
Thanks for any information!
UPDATE:
env.conf:
jdbc {
  test1 {
    datasourceClassName = "oracle.jdbc.pool.OracleDataSource"
    dataSourceUrl = .....jdbc url
    dataSourceUser = USER
    dataSourcePassword = password
    setMaximumPoolSize = 5
    setJdbc4ConnectionTest = true
  }
}
The conf file is read by means of ConfigFactory, and a HikariConfig is created based on it (setDriverClassName etc.).
Output of HikariConfig:
autoCommit.....................true
connectionTimeOut..............30000
idleTimeOut....................600000
initializationFailFast.........false
isolateInternalQueries.........false
jdbc4ConnectionTest............test
maxLifetime....................1800000
minimumIdle....................5
As explained at the end of HikariCP issue #312 (https://github.com/brettwooldridge/HikariCP/issues/312), HikariCP will keep trying to acquire a connection. The acquireRetries parameter was removed deliberately, so the fix is to configure the right username/password, since the DB only locks the account after repeated authentication failures.
Here is an extract from the issue; HikariCP intends to retry forever:
Back to acquireRetries... Without a concept of acquireRetries, how long does the dedicated thread continue to try to create a new connection? Forever. The background creation thread will continue to try to add a connection to the pool forever, or until one of three conditions is met:
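If the goal is simply to fail fast on bad credentials instead of retrying in the background, newer HikariCP versions expose initializationFailTimeout (older versions used the initializationFailFast boolean shown in the output above). A hedged sketch, with placeholder URL and credentials:

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

HikariConfig config = new HikariConfig();
config.setDataSourceClassName("oracle.jdbc.pool.OracleDataSource");
config.addDataSourceProperty("URL", "jdbc:oracle:thin:@//db:1521/ORCL"); // placeholder
config.addDataSourceProperty("user", "USER");                            // placeholder
config.addDataSourceProperty("password", "password");                    // placeholder
config.setMaximumPoolSize(5);
// Make the HikariDataSource constructor throw immediately if the
// first connection attempt fails, instead of retrying forever.
config.setInitializationFailTimeout(1);
HikariDataSource dataSource = new HikariDataSource(config);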
