I have an application that is using a Future for asynchronous execution.
I set the parameter on the get method so that the thread should get killed after 10 seconds when it does not get the response:
Future<RecordMetadata> meta = producer.send(record, new ProducerCallBack());
RecordMetadata data = meta.get(10, TimeUnit.SECONDS);
But the thread gets killed after 60 seconds:
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
at org.apache.kafka.clients.producer.KafkaProducer$FutureFailure.<init>(KafkaProducer.java:1124)
at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:823)
at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:760)
at io.khinkali.KafkaProducerClient.main(KafkaProducerClient.java:49)
Caused by: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
What am I doing wrong?
From the docs:
The threshold for time to block is determined by max.block.ms after
which it throws a TimeoutException.
Check the Kafka appender config in logback.xml and look for:
<producerConfig>max.block.ms=60000</producerConfig>
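If the producer is created directly in code rather than through the logback appender, the same property can be lowered on the producer configuration; a minimal sketch (the broker address and serializers are placeholders):
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder broker
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
        "org.apache.kafka.common.serialization.StringSerializer");
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
        "org.apache.kafka.common.serialization.StringSerializer");
// fail metadata fetches after 10 s instead of the 60 s default
props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, "10000");

KafkaProducer<String, String> producer = new KafkaProducer<>(props);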
I set the parameter on the get method so that the thread should get killed after 10 seconds when it does not get the response:
If we are talking about Future.get(...) there is nothing about it that "kills" the thread at all. To quote from the javadocs, the Future.get(...) method:
Waits if necessary for at most the given time for the computation to complete, and then retrieves its result, if available.
If the get(...) method times out then it will throw a TimeoutException, but your thread is free to continue to run. If you want to stop the thread then you'll need to catch the TimeoutException and call meta.cancel(true), but even that doesn't guarantee that the thread will be "killed". Cancelling causes the thread to be interrupted, which means that certain methods will throw InterruptedException, or the thread needs to be checking Thread.currentThread().isInterrupted().
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
Yeah this timeout has nothing to do with the Future.get(...) timeout.
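Putting both points together, a minimal sketch that reuses the producer, record and ProducerCallBack from the question (the exception handling choices here are illustrative):
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import org.apache.kafka.clients.producer.RecordMetadata;

Future<RecordMetadata> meta = producer.send(record, new ProducerCallBack());
try {
    RecordMetadata data = meta.get(10, TimeUnit.SECONDS);
} catch (TimeoutException e) {
    // get() gave up waiting; the send may still be running.
    // cancel(true) only requests interruption, it does not "kill" any thread.
    meta.cancel(true);
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();   // restore the interrupt flag
} catch (ExecutionException e) {
    // the send itself failed, e.g. the metadata timeout shown above
    e.getCause().printStackTrace();
}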
Related
How to deal with the H2 database's inability to handle interrupts: I was occasionally seeing that my embedded H2 database appeared to get corrupted. In particular, I had amended an ExecutorService so that if a task took too long it would cancel the task. The task would be cancelled okay, but subsequent database access then failed with exceptions such as
23/07/2019 14.23.31:BST:DeleteDuplicatesController:start:SEVERE: commit failed
org.hibernate.TransactionException: commit failed
at org.hibernate.engine.transaction.spi.AbstractTransactionImpl.commit(AbstractTransactionImpl.java:187)
at com.jthink.songkong.db.ReportCache.save(ReportCache.java:46)
at com.jthink.songkong.reports.AbstractReport.setReportDatabaseObject(AbstractReport.java:365)
at com.jthink.songkong.reports.DeleteDuplicatesReport.setReportDatabaseObject(DeleteDuplicatesReport.java:333)
at com.jthink.songkong.reports.DeleteDuplicatesReport.closeReport(DeleteDuplicatesReport.java:377)
at com.jthink.songkong.analyse.toplevelanalyzer.DeleteDuplicatesController.deleteAnyDups(DeleteDuplicatesController.java:606)
at com.jthink.songkong.analyse.toplevelanalyzer.DeleteDuplicatesController.start(DeleteDuplicatesController.java:665)
at com.jthink.songkong.ui.swingworker.DeleteDuplicates.doInBackground(DeleteDuplicates.java:43)
at com.jthink.songkong.ui.swingworker.DeleteDuplicates.doInBackground(DeleteDuplicates.java:20)
at javax.swing.SwingWorker$1.call(SwingWorker.java:295)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at javax.swing.SwingWorker.run(SwingWorker.java:334)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.hibernate.TransactionException: unable to commit against JDBC connection
at org.hibernate.engine.transaction.internal.jdbc.JdbcTransaction.doCommit(JdbcTransaction.java:116)
at org.hibernate.engine.transaction.spi.AbstractTransactionImpl.commit(AbstractTransactionImpl.java:180)
... 14 more
Caused by: org.h2.jdbc.JdbcSQLNonTransientException: General error: "java.lang.IllegalStateException: Reading from nio:C:/Users/Paul/AppData/Roaming/SongKong/Database/Database.mv.db failed; file length -1 read length 4096 at 1541494 [1.4.199/1]"; SQL statement:
COMMIT [50000-199]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:502)
at org.h2.message.DbException.getJdbcSQLException(DbException.java:427)
at org.h2.message.DbException.get(DbException.java:194)
at org.h2.message.DbException.convert(DbException.java:347)
at org.h2.command.Command.executeUpdate(Command.java:280)
at org.h2.jdbc.JdbcConnection.commit(JdbcConnection.java:542)
at com.mchange.v2.c3p0.impl.NewProxyConnection.commit(NewProxyConnection.java:1284)
at org.hibernate.engine.transaction.internal.jdbc.JdbcTransaction.doCommit(JdbcTransaction.java:112)
... 15 more
Caused by: java.lang.IllegalStateException: Reading from nio:C:/Users/Paul/AppData/Roaming/SongKong/Database/Database.mv.db failed; file length -1 read length 4096 at 1541494 [1.4.199/1]
at org.h2.mvstore.DataUtils.newIllegalStateException(DataUtils.java:883)
at org.h2.mvstore.DataUtils.readFully(DataUtils.java:420)
at org.h2.mvstore.FileStore.readFully(FileStore.java:98)
at org.h2.mvstore.MVStore.readBufferForPage(MVStore.java:1048)
at org.h2.mvstore.MVStore.readPage(MVStore.java:2186)
at org.h2.mvstore.MVMap.readPage(MVMap.java:554)
at org.h2.mvstore.Page$NonLeaf.getChildPage(Page.java:1086)
at org.h2.mvstore.Page.get(Page.java:221)
at org.h2.mvstore.MVMap.get(MVMap.java:402)
at org.h2.mvstore.MVMap.get(MVMap.java:389)
at org.h2.mvstore.MVStore.getMapName(MVStore.java:2737)
at org.h2.mvstore.MVStore.renameMap(MVStore.java:2650)
at org.h2.mvstore.tx.TransactionStore.commit(TransactionStore.java:453)
at org.h2.mvstore.tx.Transaction.commit(Transaction.java:389)
at org.h2.engine.Session.commit(Session.java:691)
at org.h2.command.dml.TransactionCommand.update(TransactionCommand.java:46)
at org.h2.command.CommandContainer.update(CommandContainer.java:133)
at org.h2.command.Command.executeUpdate(Command.java:267)
... 18 more
Caused by: java.nio.channels.ClosedChannelException
at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110)
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:721)
at org.h2.store.fs.FileNio.read(FilePathNio.java:74)
at org.h2.mvstore.DataUtils.readFully(DataUtils.java:406)
... 34 more
23/07/2019 14.23.31:BST:Errors:addError:SEVERE: Adding Error:commit failed
I have since found this issue
Basically, if you use H2 in embedded mode and it receives an interrupt, all subsequent access fails until the connection pool is closed and reopened. In the example I give, of a process having to be cancelled because it appears to be stuck, there is no solution except interrupting it.
I also have another case whereby the controller thread doesn't usually do any database work directly itself, so I was struggling to see why an interrupt of that thread would cause database errors. I have now worked out the issue: I'm using an ExecutorService with a fixed-size BlockingQueue (so that we don't build up a big queue in memory), but if the queue gets full then the new task is actually executed by the controller thread (because of CallerRunsPolicy), so the controller thread can be making calls to the database after all.
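A minimal sketch of that kind of pool setup (sizes are illustrative), showing where CallerRunsPolicy pushes work back onto the submitting controller thread:
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

ThreadPoolExecutor executor = new ThreadPoolExecutor(
        4, 4,                                        // fixed pool of 4 workers (illustrative)
        0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<>(100),               // bounded queue to cap memory use
        new ThreadPoolExecutor.CallerRunsPolicy());  // queue full => the caller runs the task

// If the queue is full, this task runs on the submitting (controller) thread,
// so the controller thread can end up touching the database after all.
executor.execute(() -> { /* database work */ });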
I'm using H2 with Hibernate, and in both cases calling the following immediately after the interrupt
HibernateUtil.closeFactory();
seems to solve the issue. However, I guess this means that any other threads with Hibernate sessions will be broken, but at least newly opened sessions will be okay. So I'm not particularly happy with this workaround; any other ideas?
Using H2 as a server is not a solution, since the whole point of H2 was an embedded database self-contained within the application.
Although not properly documented, using the async protocol allows a connection to be interrupted without breaking all other connections.
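A sketch of what that looks like, assuming your H2 version supports the async: file-system prefix (the path is the one from the stack trace above, minus the .mv.db suffix):
import java.sql.Connection;
import java.sql.DriverManager;

// The default "nio:" store (visible in the stack trace above) is backed by a FileChannel,
// which is closed for every user of the file when a reading thread is interrupted.
// The "async:" prefix uses an asynchronous channel instead, which survives interrupts.
Connection conn = DriverManager.getConnection(
        "jdbc:h2:async:C:/Users/Paul/AppData/Roaming/SongKong/Database/Database");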
I am making a blocking service call in one of my worker verticles, and it logged a warning. This was "addressed" by increasing the time limit, but I am more curious about how to read the thread name in the log line: vert.x-worker-thread-3,5,main. The full log was this:
io.vertx.core.impl.BlockedThreadChecker
WARNING: Thread Thread[vert.x-worker-thread-3,5,main] has been blocked for 64134 ms, time limit is 60000
io.vertx.core.VertxException: Thread blocked
What does the 3,5,main indicate? Is it some kind of trace from the main verticle? Thanks.
ThreadName: vert.x-worker-thread-3
ThreadPriority: 5
ThreadGroup: main
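That triple is just the standard java.lang.Thread.toString() format, Thread[name,priority,threadGroupName], which the Vert.x blocked-thread checker logs verbatim. A quick way to see it:
public class ThreadNameDemo {
    public static void main(String[] args) {
        // prints something like: Thread[main,5,main]
        // i.e. the thread's name, its priority, and the name of its thread group
        System.out.println(Thread.currentThread());
    }
}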
I have a client that needs to disconnect from one server and connect to another. It's taking about 16 seconds. I still haven't debugged the connection logic, but I can see that the shutdown of the channel is taking 5 seconds. Is this expected behavior, or should I be looking for thread starvation in my code?
LOG.debug("==============SHUTTING DOWN MANAGED CHANNEL");
long startTime=System.currentTimeMillis();
channel.shutdown().awaitTermination(20, SECONDS);
long endTime=System.currentTimeMillis();
LOG.debug("Time to shutdown channel ms = {}",endTime-startTime);
LOG.debug("==============RETURN FROM SHUTTING DOWN MANAGED CHANNEL");
From the log
2018-07-09 14:41:23,143 DEBUG [com.ticomgeo.ftc.client.FTCClient] (EE-ManagedExecutorService-singleThreaded-Thread-1) ==============SHUTTING DOWN MANAGED CHANNEL
2018-07-09 14:41:28,151 INFO [io.grpc.internal.ManagedChannelImpl] (grpc-default-worker-ELG-1-1) [io.grpc.internal.ManagedChannelImpl-1] Terminated
2018-07-09 14:41:28,152 DEBUG [com.ticomgeo.ftc.client.FTCClient] (EE-ManagedExecutorService-singleThreaded-Thread-1) Time to shutdown channel ms = 5009
2018-07-09 14:41:28,152 DEBUG [com.ticomgeo.ftc.client.FTCClient] (EE-ManagedExecutorService-singleThreaded-Thread-1) ==============RETURN FROM SHUTTING DOWN MANAGED CHANNEL
There are two shutdown functions, shutdown and shutdownNow. Is there any chance you have calls in flight that are blocking shutdown? You may be better served by shutdownNow.
shutdown
Initiates an orderly shutdown in which preexisting calls continue but new calls are rejected.
shutdownNow
Initiates a forceful shutdown in which preexisting and new calls are rejected. Although forceful, the shutdown process is still not instantaneous; isTerminated() will likely return false immediately after this method returns.
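For example, a rough sketch of forcing the shutdown and bounding the wait (channel is the question's ManagedChannel; the timeout and logging are illustrative):
import java.util.concurrent.TimeUnit;

try {
    channel.shutdownNow();                               // reject new AND in-flight calls
    if (!channel.awaitTermination(5, TimeUnit.SECONDS)) {
        // even shutdownNow is not instantaneous; log and move on rather than block forever
        LOG.warn("Channel did not terminate within 5 seconds");
    }
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}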
I have an issue with @Retryable in an async context. I have a service call which is returning a SocketTimeoutException. I would have expected this to retry 3 times, and I do have @EnableRetry; however, I am seeing something a little strange in the logs: a sleep InterruptedException. Here is part of the stack trace.
Caused by: java.lang.InterruptedException: sleep interrupted
at org.springframework.retry.interceptor.RetryOperationsInterceptor.invoke(RetryOperationsInterceptor.java:118) ~[spring-retry-1.2.1.RELEASE.jar!/:na]
at someservice.somemethod(someservice.java) ~[classes/:na]
2018-01-18 18:59:39.818 INFO 14 --- [lTaskExecutor-1] someExceptionHandler : Thread interrupted while sleeping; nested exception is java.lang.InterruptedException: sleep interrupted
at org.springframework.retry.backoff.ThreadWaitSleeper.sleep(ThreadWaitSleeper.java:30) ~[spring-retry-1.2.1.RELEASE.jar!/:na]
at org.springframework.retry.backoff.StatelessBackOffPolicy.backOff(StatelessBackOffPolicy.java:36) ~[spring-retry-1.2.1.RELEASE.jar!/:na]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_141]
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:673) ~[spring-aop-4.3.8.RELEASE.jar!/:4.3.8.RELEASE]
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:738) ~[spring-aop-4.3.8.RELEASE.jar!/:4.3.8.RELEASE]
Not sure if this is a red herring, but this happens after the read timeout occurs; I would have expected it to retry, but instead I am seeing this in the logs. I know Spring Retry has a default back-off of 1 second, and I am wondering if it is being interrupted, thus having an impact on its ability to retry.
T.I.A
Caused by: java.lang.InterruptedException: sleep interrupted
This simply means that something else in your app is interrupting the thread while it is waiting in the backOff, thus killing the retry scenario.
This is by design...
if (this.logger.isDebugEnabled()) {
    this.logger.debug("Abort retry because interrupted: count=" + context.getRetryCount());
}
...when you interrupt a thread, you are explicitly telling it to stop what it's doing (if it's doing something interruptible, such as sleeping).
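The same abort can be reproduced outside Spring Retry with a plain sleeping thread; a standalone illustration (not Spring Retry internals):
Thread worker = new Thread(() -> {
    try {
        Thread.sleep(1000); // stands in for the retry back-off
        System.out.println("back-off finished, would retry now");
    } catch (InterruptedException e) {
        // this is what Spring Retry sees: the back-off is cut short and the retry aborts
        System.out.println("sleep interrupted, aborting");
    }
});
worker.start();
worker.interrupt(); // something else in the app interrupting the thread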
I've noticed that some of the commands in my application fail with
Caused by: ! com.netflix.hystrix.exception.HystrixRuntimeException: GetAPICommand timed-out and no fallback available.
out: ! at com.netflix.hystrix.HystrixCommand.getFallbackOrThrowException(HystrixCommand.java:1631)
out: ! at com.netflix.hystrix.HystrixCommand.access$2000(HystrixCommand.java:97)
out: ! at com.netflix.hystrix.HystrixCommand$TimeoutObservable$1$1.tick(HystrixCommand.java:1025)
out: ! at com.netflix.hystrix.HystrixCommand$1.performBlockingGetWithTimeout(HystrixCommand.java:621)
out: ! at com.netflix.hystrix.HystrixCommand$1.get(HystrixCommand.java:516)
out: ! at com.netflix.hystrix.HystrixCommand.execute(HystrixCommand.java:425)
out: Caused by: ! java.util.concurrent.TimeoutException: null
out: !... 11 common frames omitted
This is my Hystrix configuration override:
hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds=210000
hystrix.threadpool.default.coreSize=50
hystrix.threadpool.default.maxQueueSize=100
hystrix.threadpool.default.queueSizeRejectionThreshold=50
What kind of timeout is this? Is it a read/connection timeout to the external application? How do I go about debugging this?
This is a Hystrix command timeout. This timeout is enabled by default for each command; you define the value using the property:
execution.isolation.thread.timeoutInMilliseconds:
This property sets the time in milliseconds after which the caller will observe a timeout and walk away from the command execution. Hystrix marks the HystrixCommand as a TIMEOUT, and performs fallback logic.
So you can increase your timeout value or disable the default timeout (if that applies in your case) for your command using the property:
@HystrixProperty(name = "hystrix.command.default.execution.timeout.enabled", value = "false")
You can find more information here: https://github.com/Netflix/Hystrix/wiki/Configuration#CommandExecution
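With hystrix-javanica, the same settings can also be applied per command on the annotation. A sketch with a made-up command method (getApi and restClient are hypothetical):
import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import com.netflix.hystrix.contrib.javanica.annotation.HystrixProperty;

@HystrixCommand(commandProperties = {
        // per-command override of the execution timeout (value from the question)
        @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "210000")
        // or disable the timeout entirely:
        // @HystrixProperty(name = "execution.timeout.enabled", value = "false")
})
public String getApi() {          // hypothetical command method
    return restClient.call();     // hypothetical external call
}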
It might be that you are debugging or your connection is too slow; the default thread execution timeout is only 1 second, so you could easily get this message if you put a breakpoint in your command, for example.
Although this is not your case, it might help somebody else.
After adding the following dependency
<dependency>
<groupId>com.netflix.hystrix</groupId>
<artifactId>hystrix-javanica</artifactId>
<version>1.5.18</version>
</dependency>
I was able to resolve the timeout issue.
What you need to do is set the hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds=210000 property in application.properties or bootstrap.properties.
This works for me; hopefully it will work for you too.
Looking at the stack trace, this is an exception thrown by Hystrix after the 210 seconds you defined above.
TimeoutException is a checked exception that needs to be declared on each method that could throw it; you would see it declared in the run() method of your code.
You can debug this like any other program, but be aware that the run() method runs in a thread separate from the caller. After 210 seconds the caller will just continue despite your debugging session.
You should increase your REST client's underlying HTTP client readTimeout property.
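For example, if the client happens to be a Spring RestTemplate, the read timeout could be raised roughly like this (the factory choice and values are illustrative):
import org.springframework.http.client.SimpleClientHttpRequestFactory;
import org.springframework.web.client.RestTemplate;

SimpleClientHttpRequestFactory factory = new SimpleClientHttpRequestFactory();
factory.setConnectTimeout(5000);     // ms, illustrative
factory.setReadTimeout(200000);      // ms, kept below the 210 s Hystrix command timeout
RestTemplate restTemplate = new RestTemplate(factory);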