We have a Kafka cluster which connects to AWS Kinesis. The Docker container starts and the Pod shows healthy, but the connector task raises the exception below. Any idea what this error means? We are using the Kinesis client and producer dependencies:
```
implementation 'com.amazonaws:amazon-kinesis-client:1.14.8'
implementation 'com.amazonaws:amazon-kinesis-producer:0.14.10'
```
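For reference, the path the connector goes through boils down to the KinesisProducer API from that producer library. A minimal sketch (stream name, partition key, and region are placeholders, and this is not the connector's actual code):

```
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import com.amazonaws.services.kinesis.producer.KinesisProducer;
import com.amazonaws.services.kinesis.producer.KinesisProducerConfiguration;

public class KplSketch {
    public static void main(String[] args) {
        // The KPL starts a native child process (the "daemon") that performs the
        // actual calls to Kinesis; addUserRecord only hands the record to it.
        KinesisProducerConfiguration config = new KinesisProducerConfiguration()
                .setRegion("us-east-1"); // placeholder region
        KinesisProducer producer = new KinesisProducer(config);

        // Once that child process has exited (it never started, crashed, or was
        // killed), every later addUserRecord/flush throws DaemonException:
        // "The child process has been shutdown and can no longer accept messages."
        producer.addUserRecord("my-stream", "partition-key",
                ByteBuffer.wrap("payload".getBytes(StandardCharsets.UTF_8)));

        producer.flushSync();
        producer.destroy();
    }
}
```

The error we see when the container starts: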
[2022-10-13 18:14:21,591] ERROR WorkerSinkTask{id=fm-sync-nprod-test-01-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted. Error: The child process has been shutdown and can no longer accept messages. (org.apache.kafka.connect.runtime.WorkerSinkTask)
com.amazonaws.services.kinesis.producer.DaemonException: The child process has been shutdown and can no longer accept messages.
at com.amazonaws.services.kinesis.producer.Daemon.add(Daemon.java:173)
at com.amazonaws.services.kinesis.producer.KinesisProducer.addUserRecord(KinesisProducer.java:625)
at com.amazonaws.services.kinesis.producer.KinesisProducer.addUserRecord(KinesisProducer.java:535)
at com.amazonaws.services.kinesis.producer.KinesisProducer.addUserRecord(KinesisProducer.java:411)
at com.amazon.kinesis.kafka.AmazonKinesisSinkTask.addUserRecord(AmazonKinesisSinkTask.java:233)
at com.amazon.kinesis.kafka.AmazonKinesisSinkTask.put(AmazonKinesisSinkTask.java:142)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:581)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:333)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:234)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:203)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:189)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:244)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:514)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
at java.base/java.lang.Thread.run(Thread.java:844)
[2022-10-13 18:14:21,596] WARN WorkerSinkTask{id=fm-sync-nprod-test-01-0} Offset commit failed during close (org.apache.kafka.connect.runtime.WorkerSinkTask)
[2022-10-13 18:14:21,597] ERROR WorkerSinkTask{id=fm-sync-nprod-test-01-0} Commit of offsets threw an unexpected exception for sequence number 1: null (org.apache.kafka.connect.runtime.WorkerSinkTask)
com.amazonaws.services.kinesis.producer.DaemonException: The child process has been shutdown and can no longer accept messages.
at com.amazonaws.services.kinesis.producer.Daemon.add(Daemon.java:173)
at com.amazonaws.services.kinesis.producer.KinesisProducer.flush(KinesisProducer.java:916)
at com.amazonaws.services.kinesis.producer.KinesisProducer.flush(KinesisProducer.java:936)
at com.amazonaws.services.kinesis.producer.KinesisProducer.flushSync(KinesisProducer.java:962)
at com.amazon.kinesis.kafka.AmazonKinesisSinkTask.lambda$flush$0(AmazonKinesisSinkTask.java:107)
at java.base/java.util.HashMap$Values.forEach(HashMap.java:981)
at com.amazon.kinesis.kafka.AmazonKinesisSinkTask.flush(AmazonKinesisSinkTask.java:105)
at org.apache.kafka.connect.sink.SinkTask.preCommit(SinkTask.java:125)
at org.apache.kafka.connect.runtime.WorkerSinkTask.commitOffsets(WorkerSinkTask.java:404)
at org.apache.kafka.connect.runtime.WorkerSinkTask.closePartitions(WorkerSinkTask.java:646)
at org.apache.kafka.connect.runtime.WorkerSinkTask.closeAllPartitions(WorkerSinkTask.java:641)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:204)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:189)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:244)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:514)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
at java.base/java.lang.Thread.run(Thread.java:844)
[2022-10-13 18:14:22,338] ERROR WorkerSinkTask{id=fm-sync-nprod-test-01-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask)
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:611)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:333)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:234)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:203)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:189)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:244)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:514)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
at java.base/java.lang.Thread.run(Thread.java:844)
Caused by: com.amazonaws.services.kinesis.producer.DaemonException: The child process has been shutdown and can no longer accept messages.
at com.amazonaws.services.kinesis.producer.Daemon.add(Daemon.java:173)
at com.amazonaws.services.kinesis.producer.KinesisProducer.addUserRecord(KinesisProducer.java:625)
at com.amazonaws.services.kinesis.producer.KinesisProducer.addUserRecord(KinesisProducer.java:535)
at com.amazonaws.services.kinesis.producer.KinesisProducer.addUserRecord(KinesisProducer.java:411)
at com.amazon.kinesis.kafka.AmazonKinesisSinkTask.addUserRecord(AmazonKinesisSinkTask.java:233)
at com.amazon.kinesis.kafka.AmazonKinesisSinkTask.put(AmazonKinesisSinkTask.java:142)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:581)
... 10 more
We have deployed an Ignite 2.11.1 cluster in Kubernetes. When a client node attempts to join the grid, we get the tcp-disco-msg-worker blocked errors below.
We have used the same configuration successfully with another deployment, so we are not certain why we are getting this error. The only change is that in the failing deployment we have added resource limits to the configuration.
[20:07:24,201][SEVERE][tcp-disco-msg-worker-[crd]-#2-#48][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [workerName=db-checkpoint-thread, threadName=db-checkpoint-thread-#78, blockedFor=4562s]
[20:07:24] Possible failure suppressed accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=db-checkpoint-thread, igniteInstanceName=null, finished=false, heartbeatTs=1657047082073]]]
[20:07:27,073][SEVERE][tcp-disco-msg-worker-[crd]-#2-#48][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [workerName=sys-stripe-0, threadName=sys-stripe-0-#1, blockedFor=4072s]
Thread [name="qtp2015455415-61", id=61, state=TIMED_WAITING, blockCnt=1, waitCnt=702]
Lock [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject#78322a4, ownerName=null, ownerId=-1]
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:382)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.idleJobPoll(QueuedThreadPool.java:973)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1023)
at java.lang.Thread.run(Thread.java:748)
Thread [name="qtp2015455415-60", id=60, state=RUNNABLE, blockCnt=3, waitCnt=679]
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked sun.nio.ch.Util$3#7a7b4390
- locked java.util.Collections$UnmodifiableSet#220c68db
- locked sun.nio.ch.EPollSelectorImpl#4f569077
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)
at org.eclipse.jetty.io.ManagedSelector.nioSelect(ManagedSelector.java:183)
at org.eclipse.jetty.io.ManagedSelector.select(ManagedSelector.java:190)
at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:606)
at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:543)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produceTask(EatWhatYouKill.java:360)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:184)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:383)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:882)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1036)
at java.lang.Thread.run(Thread.java:748)
Thread [name="qtp2015455415-59", id=59, state=TIMED_WAITING, blockCnt=1, waitCnt=684]
Lock [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject#78322a4, ownerName=null, ownerId=-1]
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:382)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.idleJobPoll(QueuedThreadPool.java:973)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1023)
at java.lang.Thread.run(Thread.java:748)
Thread [name="qtp2015455415-58", id=58, state=TIMED_WAITING, blockCnt=1, waitCnt=688]
Lock [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject#78322a4, ownerName=null, ownerId=-1]
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:382)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.idleJobPoll(QueuedThreadPool.java:973)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1023)
at java.lang.Thread.run(Thread.java:748)
Client side errors:
06-Jul-2022 15:16:04.772 INFO [exchange-worker-#45%igniteClientInstance%] org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading Illegal access: this web application instance has been stopped already. Could not load [java.lang.OutOfMemoryError]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
java.lang.IllegalStateException: Illegal access: this web application instance has been stopped already. Could not load [java.lang.OutOfMemoryError]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading(WebappClassLoaderBase.java:1427)
at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForClassLoading(WebappClassLoaderBase.java:1415)
at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1254)
at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1215)
at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3219)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.base/java.lang.Thread.run(Thread.java:833)
06-Jul-2022 15:16:43.252 INFO [page-lock-tracker-timeout] org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading Illegal access: this web application instance has been stopped already. Could not load [org.apache.ignite.internal.processors.cache.persistence.diagnostic.pagelocktracker.SharedPageLockTracker$State]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
java.lang.IllegalStateException: Illegal access: this web application instance has been stopped already. Could not load [org.apache.ignite.internal.processors.cache.persistence.diagnostic.pagelocktracker.SharedPageLockTracker$State]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading(WebappClassLoaderBase.java:1427)
at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForClassLoading(WebappClassLoaderBase.java:1415)
at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1254)
at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1215)
at org.apache.ignite.internal.processors.cache.persistence.diagnostic.pagelocktracker.SharedPageLockTracker.getThreadOperationState(SharedPageLockTracker.java:285)
at org.apache.ignite.internal.processors.cache.persistence.diagnostic.pagelocktracker.SharedPageLockTracker.hangThreads(SharedPageLockTracker.java:301)
at org.apache.ignite.internal.processors.cache.persistence.diagnostic.pagelocktracker.SharedPageLockTracker.access$200(SharedPageLockTracker.java:53)
at org.apache.ignite.internal.processors.cache.persistence.diagnostic.pagelocktracker.SharedPageLockTracker$TimeOutWorker.iteration(SharedPageLockTracker.java:359)
at org.apache.ignite.internal.util.worker.CycleThread.run(CycleThread.java:49)
06-Jul-2022 15:25:42.469 INFO [tcp-comm-worker-#1%igniteClientInstance%-#28%igniteClientInstance%] org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading Illegal access: this web application instance has been stopped already. Could not load [java.lang.OutOfMemoryError]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
java.lang.IllegalStateException: Illegal access: this web application instance has been stopped already. Could not load [java.lang.OutOfMemoryError]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading(WebappClassLoaderBase.java:1427)
at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForClassLoading(WebappClassLoaderBase.java:1415)
at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1254)
at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1215)
at org.apache.ignite.spi.communication.tcp.internal.CommunicationWorker.body(CommunicationWorker.java:194)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$6.body(TcpCommunicationSpi.java:928)
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)
We have an AWS Neptune db.r4.large instance, which with 2 vCPUs can run 4 threads and hence at most 4 queries at a time. We use the Gremlin Java client to connect to AWS Neptune. We are seeing regular, persistent java.util.concurrent.TimeoutException errors while querying data.
Reading up on this AWS doc: https://docs.aws.amazon.com/neptune/latest/userguide/best-practices-gremlin-java-exceptions.html
It looks to be a case of client-side throttling, since we do not see any utilization of the Neptune queue (MainRequestQueuePendingRequests = 0). We did as recommended in the doc (code below). With maxConnectionPoolSize = maxSimultaneousUsagePerConnection = maxInProcessPerConnection all set to 64, we have maxParallelQueries = 4096. Given our db instance and query latency this seems like overkill, but we hoped we would at least shift some of these errors onto AWS Neptune, see the queue being utilized, and see some throttling exceptions. However, all the requests that reach Neptune are processed correctly. The problem remains that almost half the requests never reach AWS Neptune and error out on the client side.
Other relevant metrics are maxWaitForConnection = 16s (the default) and
avg(aws.neptune.gremlin_web_socket_open_connections) = ~10 (pretty steady). Does anyone have suggestions for how we could resolve these exceptions?
```
@Bean("gremlinClusterRW", destroyMethod = "close")
fun gremlinClusterRW(): Cluster {
    return Cluster.build()
        .addContactPoint(endpointRW)
        .port(port)
        .channelizer(SigV4WebSocketChannelizer::class.java)
        .maxConnectionPoolSize(64)
        .maxInProcessPerConnection(64)
        .maxSimultaneousUsagePerConnection(64)
        .enableSsl(true)
        .create()
}
```
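For context, the cluster bean is then consumed by the Gremlin Java driver roughly like this (a stripped-down sketch, not our actual wiring; the label in the example query is a placeholder):

```
import org.apache.tinkerpop.gremlin.driver.Cluster;
import org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;

import static org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource.traversal;

public class NeptuneTraversalSketch {

    // Builds a remote traversal source on top of the injected Cluster bean.
    // Each traversal borrows a connection from the driver's pool; the
    // "Timed out while waiting for an available host" TimeoutException is raised
    // on the client side when the driver cannot obtain a usable connection/host
    // in time (per the AWS best-practices doc linked above).
    public static GraphTraversalSource traversalSource(Cluster gremlinClusterRW) {
        return traversal().withRemote(DriverRemoteConnection.using(gremlinClusterRW));
    }

    public static long countSomeVertices(GraphTraversalSource g) {
        // Example read query; the label is a placeholder.
        return g.V().hasLabel("person").limit(10).count().next();
    }
}
```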
Exception Stacktrace:
org.springframework.kafka.listener.ListenerExecutionFailedException: Listener method 'public void com.listeners.a.b(org.apache.kafka.clients.consumer.ConsumerRecord<java.lang.Long, a>)' threw exception; nested exception is java.lang.IllegalStateException: org.apache.tinkerpop.gremlin.process.remote.RemoteConnectionException: java.lang.RuntimeException: java.lang.RuntimeException: java.util.concurrent.TimeoutException: Timed out while waiting for an available host - check the client configuration and connectivity to the server if this message persists; nested exception is java.lang.IllegalStateException: org.apache.tinkerpop.gremlin.process.remote.RemoteConnectionException: java.lang.RuntimeException: java.lang.RuntimeException: java.util.concurrent.TimeoutException: Timed out while waiting for an available host - check the client configuration and connectivity to the server if this message persists
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.decorateException(KafkaMessageListenerContainer.java:2114)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.invokeErrorHandler(KafkaMessageListenerContainer.java:2106)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.doInvokeRecordListener(KafkaMessageListenerContainer.java:2001)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.doInvokeWithRecords(KafkaMessageListenerContainer.java:1928)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.invokeRecordListener(KafkaMessageListenerContainer.java:1814)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.invokeListener(KafkaMessageListenerContainer.java:1531)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.pollAndInvoke(KafkaMessageListenerContainer.java:1178)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:1075)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.IllegalStateException: org.apache.tinkerpop.gremlin.process.remote.RemoteConnectionException: java.lang.RuntimeException: java.lang.RuntimeException: java.util.concurrent.TimeoutException: Timed out while waiting for an available host - check the client configuration and connectivity to the server if this message persists
at
...
(redacted)
...
at jdk.internal.reflect.GeneratedMethodAccessor251.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.springframework.messaging.handler.invocation.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:171)
at org.springframework.messaging.handler.invocation.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:120)
at org.springframework.kafka.listener.adapter.HandlerAdapter.invoke(HandlerAdapter.java:48)
at org.springframework.kafka.listener.adapter.MessagingMessageListenerAdapter.invokeHandler(MessagingMessageListenerAdapter.java:330)
at org.springframework.kafka.listener.adapter.RecordMessagingMessageListenerAdapter.onMessage(RecordMessagingMessageListenerAdapter.java:86)
at org.springframework.kafka.listener.adapter.RecordMessagingMessageListenerAdapter.onMessage(RecordMessagingMessageListenerAdapter.java:51)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.doInvokeOnMessage(KafkaMessageListenerContainer.java:2069)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.invokeOnMessage(KafkaMessageListenerContainer.java:2051)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.doInvokeRecordListener(KafkaMessageListenerContainer.java:1988)
... 8 common frames omitted
Caused by: org.apache.tinkerpop.gremlin.process.remote.RemoteConnectionException: java.lang.RuntimeException: java.lang.RuntimeException: java.util.concurrent.TimeoutException: Timed out while waiting for an available host - check the client configuration and connectivity to the server if this message persists
at org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection.submitAsync(DriverRemoteConnection.java:227)
at org.apache.tinkerpop.gremlin.process.remote.traversal.step.map.RemoteStep.promise(RemoteStep.java:89)
... 36 common frames omitted
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.util.concurrent.TimeoutException: Timed out while waiting for an available host - check the client configuration and connectivity to the server if this message persists
at org.apache.tinkerpop.gremlin.driver.Client$AliasClusteredClient.submitAsync(Client.java:573)
at org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection.submitAsync(DriverRemoteConnection.java:225)
... 37 common frames omitted
Caused by: java.lang.RuntimeException: java.util.concurrent.TimeoutException: Timed out while waiting for an available host - check the client configuration and connectivity to the server if this message persists
at org.apache.tinkerpop.gremlin.driver.Client.submitAsync(Client.java:371)
at org.apache.tinkerpop.gremlin.driver.Client$AliasClusteredClient.submitAsync(Client.java:591)
at org.apache.tinkerpop.gremlin.driver.Client$AliasClusteredClient.submitAsync(Client.java:571)
... 38 common frames omitted
Caused by: java.util.concurrent.TimeoutException: Timed out while waiting for an available host - check the client configuration and connectivity to the server if this message persists
at org.apache.tinkerpop.gremlin.driver.Client$ClusteredClient.chooseConnection(Client.java:495)
at org.apache.tinkerpop.gremlin.driver.Client$AliasClusteredClient.chooseConnection(Client.java:630)
at org.apache.tinkerpop.gremlin.driver.Client.submitAsync(Client.java:366)
... 40 common frames omitted
How do I deal with the H2 database's inability to deal with interrupts? I was occasionally seeing that my embedded H2 database appeared to get corrupted. In particular, I had amended an ExecutorService so that if a task took too long it would cancel the task. The task would be cancelled okay, but then subsequent database access failed with exceptions such as:
23/07/2019 14.23.31:BST:DeleteDuplicatesController:start:SEVERE: commit failed
org.hibernate.TransactionException: commit failed
at org.hibernate.engine.transaction.spi.AbstractTransactionImpl.commit(AbstractTransactionImpl.java:187)
at com.jthink.songkong.db.ReportCache.save(ReportCache.java:46)
at com.jthink.songkong.reports.AbstractReport.setReportDatabaseObject(AbstractReport.java:365)
at com.jthink.songkong.reports.DeleteDuplicatesReport.setReportDatabaseObject(DeleteDuplicatesReport.java:333)
at com.jthink.songkong.reports.DeleteDuplicatesReport.closeReport(DeleteDuplicatesReport.java:377)
at com.jthink.songkong.analyse.toplevelanalyzer.DeleteDuplicatesController.deleteAnyDups(DeleteDuplicatesController.java:606)
at com.jthink.songkong.analyse.toplevelanalyzer.DeleteDuplicatesController.start(DeleteDuplicatesController.java:665)
at com.jthink.songkong.ui.swingworker.DeleteDuplicates.doInBackground(DeleteDuplicates.java:43)
at com.jthink.songkong.ui.swingworker.DeleteDuplicates.doInBackground(DeleteDuplicates.java:20)
at javax.swing.SwingWorker$1.call(SwingWorker.java:295)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at javax.swing.SwingWorker.run(SwingWorker.java:334)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.hibernate.TransactionException: unable to commit against JDBC connection
at org.hibernate.engine.transaction.internal.jdbc.JdbcTransaction.doCommit(JdbcTransaction.java:116)
at org.hibernate.engine.transaction.spi.AbstractTransactionImpl.commit(AbstractTransactionImpl.java:180)
... 14 more
Caused by: org.h2.jdbc.JdbcSQLNonTransientException: General error: "java.lang.IllegalStateException: Reading from nio:C:/Users/Paul/AppData/Roaming/SongKong/Database/Database.mv.db failed; file length -1 read length 4096 at 1541494 [1.4.199/1]"; SQL statement:
COMMIT [50000-199]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:502)
at org.h2.message.DbException.getJdbcSQLException(DbException.java:427)
at org.h2.message.DbException.get(DbException.java:194)
at org.h2.message.DbException.convert(DbException.java:347)
at org.h2.command.Command.executeUpdate(Command.java:280)
at org.h2.jdbc.JdbcConnection.commit(JdbcConnection.java:542)
at com.mchange.v2.c3p0.impl.NewProxyConnection.commit(NewProxyConnection.java:1284)
at org.hibernate.engine.transaction.internal.jdbc.JdbcTransaction.doCommit(JdbcTransaction.java:112)
... 15 more
Caused by: java.lang.IllegalStateException: Reading from nio:C:/Users/Paul/AppData/Roaming/SongKong/Database/Database.mv.db failed; file length -1 read length 4096 at 1541494 [1.4.199/1]
at org.h2.mvstore.DataUtils.newIllegalStateException(DataUtils.java:883)
at org.h2.mvstore.DataUtils.readFully(DataUtils.java:420)
at org.h2.mvstore.FileStore.readFully(FileStore.java:98)
at org.h2.mvstore.MVStore.readBufferForPage(MVStore.java:1048)
at org.h2.mvstore.MVStore.readPage(MVStore.java:2186)
at org.h2.mvstore.MVMap.readPage(MVMap.java:554)
at org.h2.mvstore.Page$NonLeaf.getChildPage(Page.java:1086)
at org.h2.mvstore.Page.get(Page.java:221)
at org.h2.mvstore.MVMap.get(MVMap.java:402)
at org.h2.mvstore.MVMap.get(MVMap.java:389)
at org.h2.mvstore.MVStore.getMapName(MVStore.java:2737)
at org.h2.mvstore.MVStore.renameMap(MVStore.java:2650)
at org.h2.mvstore.tx.TransactionStore.commit(TransactionStore.java:453)
at org.h2.mvstore.tx.Transaction.commit(Transaction.java:389)
at org.h2.engine.Session.commit(Session.java:691)
at org.h2.command.dml.TransactionCommand.update(TransactionCommand.java:46)
at org.h2.command.CommandContainer.update(CommandContainer.java:133)
at org.h2.command.Command.executeUpdate(Command.java:267)
... 18 more
Caused by: java.nio.channels.ClosedChannelException
at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110)
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:721)
at org.h2.store.fs.FileNio.read(FilePathNio.java:74)
at org.h2.mvstore.DataUtils.readFully(DataUtils.java:406)
... 34 more
23/07/2019 14.23.31:BST:Errors:addError:SEVERE: Adding Error:commit failed
I have since found this issue.
Basically, if using H2 in embedded mode and it receives an interrupt, then all subsequent access fails until the thread pool is closed and reopened. In the example I gave, where a process has to be cancelled because it appears to be stuck, there is no solution except interrupting it.
I also have another case where the controller thread doesn't usually do any database work directly, so I was struggling to see why an interrupt delivered to it would cause database errors. I have now worked out that the issue is that I'm using an ExecutorService with a fixed-size BlockingQueue (so that we don't have a big queue building up in memory), but if the queue gets full then the new task is actually executed by the controller thread (because of CallerRunsPolicy), so the controller thread can be making calls to the database after all, as sketched below.
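A minimal sketch of that executor setup (pool sizes, queue capacity, and the task body are placeholders, not my real code):

```
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CallerRunsSketch {
    public static void main(String[] args) {
        // Fixed-size queue so pending tasks don't build up in memory.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                4, 4,                              // core/max pool size (placeholders)
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(100),     // bounded queue (placeholder capacity)
                new ThreadPoolExecutor.CallerRunsPolicy());

        // When the queue is full, CallerRunsPolicy runs the rejected task on the
        // submitting thread - here, the controller thread - so an interrupt sent to
        // the controller can land in the middle of H2/Hibernate I/O.
        executor.execute(() -> {
            // task that writes to the embedded H2 database via Hibernate
        });

        executor.shutdown();
    }
}
```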
I'm using H2 with Hibernate, and in both cases calling the following immediately after the interrupt
HibernateUtil.closeFactory();
seems to solve the issue. However, I guess this means that any other threads with open Hibernate sessions will be broken, but at least newly opened sessions will be okay. So I'm not particularly happy with this workaround; any other ideas?
Using H2 as a server is not a solution, since the whole point of H2 was an embedded database self-contained within the application.
Although not properly documented, using the async protocol allows a connection to be interrupted without breaking all other connections.
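If I understand that correctly, it means switching the database file-system prefix in the JDBC URL from nio: (seen in the stack trace above) to async:, roughly like this (a sketch; the exact path and the sa/empty-password credentials are assumptions):

```
import java.sql.Connection;
import java.sql.DriverManager;

public class H2AsyncUrlSketch {
    public static void main(String[] args) throws Exception {
        // nio: uses a FileChannel, which the JVM closes when a thread blocked on it
        // is interrupted - hence the ClosedChannelException for every later access.
        // async: uses an AsynchronousFileChannel instead, which survives the interrupt.
        String url = "jdbc:h2:async:C:/Users/Paul/AppData/Roaming/SongKong/Database/Database";
        try (Connection connection = DriverManager.getConnection(url, "sa", "")) {
            System.out.println("Connected via " + connection.getMetaData().getURL());
        }
    }
}
```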
When I run the Gateway I get this error on the Registry service.
2018-07-31 11:23:33.472 ERROR 1617 --- [get_localhost-3] c.n.e.cluster.ReplicationTaskProcessor : It seems to be a socket read timeout exception, it will retry later. if it continues to happen and some eureka node occupied all the cpu time, you should set property 'eureka.server.peer-node-read-timeout-ms' to a bigger value
com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
at com.sun.jersey.client.apache4.ApacheHttpClient4Handler.handle(ApacheHttpClient4Handler.java:187)
at com.netflix.eureka.cluster.DynamicGZIPContentEncodingFilter.handle(DynamicGZIPContentEncodingFilter.java:48)
at com.netflix.discovery.EurekaIdentityHeaderFilter.handle(EurekaIdentityHeaderFilter.java:27)
at com.sun.jersey.api.client.Client.handle(Client.java:652)
at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:570)
at com.netflix.eureka.transport.JerseyReplicationClient.submitBatchUpdates(JerseyReplicationClient.java:116)
at com.netflix.eureka.cluster.ReplicationTaskProcessor.process(ReplicationTaskProcessor.java:80)
at com.netflix.eureka.util.batcher.TaskExecutors$BatchWorkerRunnable.run(TaskExecutors.java:187)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:161)
at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:82)
at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:278)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:286)
at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:257)
at org.apache.http.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:230)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:684)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:486)
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:835)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:118)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
at com.sun.jersey.client.apache4.ApacheHttpClient4Handler.handle(ApacheHttpClient4Handler.java:173)
... 10 common frames omitted
I use the pre-packaged war file downloaded from the Releases page.
Version 4.0.0
The interesting part is that when I run a microservice I don't get this error. It happens only with the gateway.
Also, when I create a new gateway and run it, the exception does not appear at first. It starts happening after about a day, and once you have got this exception you get it every time...
I got the same error using the default configuration. The solution lies in the error message itself, which is to increase the peer-node-read-timeout-ms value (the default value is 200).
Go to your application.yml:
```
eureka:
  server:
    # see discussion about enable-self-preservation:
    # https://github.com/jhipster/generator-jhipster/issues/3654
    enable-self-preservation: false
    peer-node-read-timeout-ms: 5000
```
We post messages to a queue from our Java application. Recently we moved to a new high-availability Production server with the same configuration as our old one. But now we see a new issue whenever we try to post messages. After posting a few messages we get:
"MQJMS2005: failed to create MQQueueManager An MQException occurred: Completion Code 2, Reason 2059 MQJE011: Socket connection attempt refused"
We did a telnet and everything looks fine. The other odd part is that whenever our MQ team enables trace to capture the error, it works fine.
org.springframework.jms.UncategorizedJmsException: Uncategorized exception occured during JMS processing; nested exception is javax.jms.JMSException: MQJMS2005: failed to create MQQueueManager for 'AXMQMTIMSPRDHA:AXMQMTIMSPRDHA_QM'; nested exception is com.ibm.mq.MQException: MQJE001: An MQException occurred: Completion Code 2, Reason 2059
MQJE011: Socket connection attempt refused
at org.springframework.jms.support.JmsUtils.convertJmsAccessException(JmsUtils.java:316)
at org.springframework.jms.support.JmsAccessor.convertJmsAccessException(JmsAccessor.java:168)
at org.springframework.jms.core.JmsTemplate.execute(JmsTemplate.java:469)
at org.springframework.jms.core.JmsTemplate.send(JmsTemplate.java:534)
at org.springframework.jms.core.JmsTemplate.convertAndSend(JmsTemplate.java:641)
at org.springframework.jms.core.JmsTemplate.convertAndSend(JmsTemplate.java:630)
at com.lowes.trf.rerate.jms.MessageSender.sendMessageAsXml(MessageSender.java:54)
at com.lowes.trf.rerate.jms.MessageSender$$FastClassByCGLIB$$b52d5402.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:698)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:96)
at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:260)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:94)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:631)
at com.lowes.trf.rerate.jms.MessageSender$$EnhancerByCGLIB$$c88e6908.sendMessageAsXml(<generated>)
at com.lowes.trf.rerate.service.ReRateService.sendMessageAsXml(ReRateService.java:151)
at com.lowes.trf.rerate.batch.controller.ReRateBatchController.postResponseToEsbAsXml(ReRateBatchController.java:300)
at com.lowes.trf.rerate.batch.controller.ReRateBatchController.execute(ReRateBatchController.java:229)
at com.lowes.trf.rerate.batch.controller.ReRateBatchController$$FastClassByCGLIB$$66bcc521.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:627)
at com.lowes.trf.rerate.batch.controller.ReRateBatchController$$EnhancerByCGLIB$$44b4ed47.execute(<generated>)
at com.lowes.trf.rerate.batch.controller.ReRateBatchController.main(ReRateBatchController.java:71)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:95)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:56)
at java.lang.reflect.Method.invoke(Method.java:620)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: javax.jms.JMSException: MQJMS2005: failed to create MQQueueManager for 'AXMQMTIMSPRDHA:AXMQMTIMSPRDHA_QM'
at com.ibm.mq.jms.services.ConfigEnvironment.newException(ConfigEnvironment.java:586)
at com.ibm.mq.jms.MQConnection.createQM(MQConnection.java:2110)
at com.ibm.mq.jms.MQConnection.createQMNonXA(MQConnection.java:1532)
at com.ibm.mq.jms.MQQueueConnection.<init>(MQQueueConnection.java:150)
at com.ibm.mq.jms.MQQueueConnectionFactory.createQueueConnection(MQQueueConnectionFactory.java:185)
at com.ibm.mq.jms.MQQueueConnectionFactory.createQueueConnection(MQQueueConnectionFactory.java:112)
at com.ibm.mq.jms.MQQueueConnectionFactory.createConnection(MQQueueConnectionFactory.java:1050)
at org.springframework.jms.support.JmsAccessor.createConnection(JmsAccessor.java:184)
at org.springframework.jms.core.JmsTemplate.access$500(JmsTemplate.java:85)
at org.springframework.jms.core.JmsTemplate$JmsTemplateResourceFactory.createConnection(JmsTemplate.java:1031)
at org.springframework.jms.connection.ConnectionFactoryUtils.doGetTransactionalSession(ConnectionFactoryUtils.java:297)
at org.springframework.jms.core.JmsTemplate.execute(JmsTemplate.java:453)
... 27 more
IBM documentation says the remote queue manager is possibly down, and to make sure the channels you are using are fine. Also, when you see the connection being refused, run dspmq on the remote machine to ensure the queue manager is really up.