I am working on a piece of code that routes from FTP input to a bean process and then to FTP output. The process takes about 15 minutes to complete and when it finishes, the Camel tries to delete a input file from FTP input route. Currently the FTP server throws an error:
13 Jan 2015 18:21:26 DEBUG org.apache.camel.component.file.remote.FtpOperations.deleteFile - Deleting file: ../flex-brazil/Portal_Forn/request_appr/364/NF_4.txt
13 Jan 2015 18:21:26 WARN org.slf4j.helpers.MarkerIgnoringBase.warn - Error during commit. Exchange[org.apache.camel.component.file.GenericFileMessage#3207779]. Caused by: [org.apache.camel.component.file.GenericFileOperationFailedException - File operation failed: 421 Timeout (900 seconds): closing control connection.
FTP response 421 received. Server closed connection.. Code: 421]
org.apache.camel.component.file.GenericFileOperationFailedException: File operation failed: 421 Timeout (900 seconds): closing control connection.
FTP response 421 received. Server closed connection.. Code: 421
at org.apache.camel.component.file.remote.FtpOperations.getCurrentDirectory(FtpOperations.java:701)
at org.apache.camel.component.file.remote.FtpOperations.deleteFile(FtpOperations.java:224)
at org.apache.camel.component.file.strategy.GenericFileDeleteProcessStrategy.commit(GenericFileDeleteProcessStrategy.java:71)
at org.apache.camel.component.file.GenericFileOnCompletion.processStrategyCommit(GenericFileOnCompletion.java:124)
at org.apache.camel.component.file.GenericFileOnCompletion.onCompletion(GenericFileOnCompletion.java:80)
at org.apache.camel.component.file.GenericFileOnCompletion.onComplete(GenericFileOnCompletion.java:54)
at org.apache.camel.util.UnitOfWorkHelper.doneSynchronizations(UnitOfWorkHelper.java:100)
at org.apache.camel.impl.DefaultUnitOfWork.done(DefaultUnitOfWork.java:228)
at org.apache.camel.util.UnitOfWorkHelper.doneUow(UnitOfWorkHelper.java:61)
at org.apache.camel.processor.CamelInternalProcessor$UnitOfWorkProcessorAdvice.after(CamelInternalProcessor.java:613)
at org.apache.camel.processor.CamelInternalProcessor$UnitOfWorkProcessorAdvice.after(CamelInternalProcessor.java:581)
at org.apache.camel.processor.CamelInternalProcessor$InternalCallback.done(CamelInternalProcessor.java:240)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:173)
at org.apache.camel.component.file.GenericFileConsumer.processExchange(GenericFileConsumer.java:401)
at org.apache.camel.component.file.remote.RemoteFileConsumer.processExchange(RemoteFileConsumer.java:99)
at org.apache.camel.component.file.GenericFileConsumer.processBatch(GenericFileConsumer.java:201)
at org.apache.camel.component.file.GenericFileConsumer.poll(GenericFileConsumer.java:165)
at org.apache.camel.impl.ScheduledPollConsumer.doRun(ScheduledPollConsumer.java:187)
at org.apache.camel.impl.ScheduledPollConsumer.run(ScheduledPollConsumer.java:114)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.commons.net.ftp.FTPConnectionClosedException: FTP response 421 received. Server closed connection.
at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:367)
at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:294)
at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:483)
at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:608)
at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:582)
at org.apache.commons.net.ftp.FTP.pwd(FTP.java:1454)
at org.apache.commons.net.ftp.FTPClient.printWorkingDirectory(FTPClient.java:2658)
at org.apache.camel.component.file.remote.FtpOperations.getCurrentDirectory(FtpOperations.java:697)
... 25 more
There are any parameter to teach the Camel to send keepalives to FTP server to avoid this situation?
I suggest to split the whole process into two routes:
Reader Route: Only reads the file from the FTP server.
Process and Writer Route: This route processes the incoming message and writes a response to the FTP server
The advantage is, that the connection will be closed after reading and deleting without exception.
When a new message will be written back to server a new connection will be created.
The first route can only close the connection, when the process is asynchronous. For this purpose you can use queues for reading and writing e.g. seda or jms.
You may consider to set a bigger timeout option - the default is 30s.
To set the timeout to 15 minutes: ftp://user#host?timeout=900000
Keep in mind, however, that the connection may be closed by the FTP server.
Related
I have deployed Spring Boot application that has a Database based queue with jobs on App Service.
Yesterday I performed a few Scale out and Scale in operations while the application was working to see how it will behave.
At some point (not necessary related to scaling operations) application started to throw Hikari errors.
com.zaxxer.hikari.pool.PoolBase : HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection#1ae66f34 (This connection has been closed.). Possibly consider using a shorter maxLifetime value.
com.zaxxer.hikari.pool.ProxyConnection : HikariPool-1 - Connection org.postgresql.jdbc.PgConnection#1ef85079 marked as broken because of SQLSTATE(08006), ErrorCode(0)
The following are stack traces from my scheduled job in spring and other information:
org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
Caused by: javax.net.ssl.SSLException: Connection reset by peer (Write failed)
Suppressed: java.net.SocketException: Broken pipe (Write failed)
Caused by: java.net.SocketException: Connection reset by peer (Write failed)
Next the following stack of errors:
WARN 1 --- [ scheduling-1] com.zaxxer.hikari.pool.PoolBase : HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection#48d0d6da (This connection has been closed.).
Possibly consider using a shorter maxLifetime value.
org.springframework.jdbc.support.MetaDataAccessException: Error while extracting DatabaseMetaData; nested exception is java.sql.SQLException: Connection is closed
Caused by: java.sql.SQLException: Connection is closed
The code which is invoked periodically - every 500 milliseconds is here:
#Scheduled(fixedDelayString = "${worker.delay}")
#Transactional
public void execute() {
jobManager.next(jobClass).ifPresent(this::handleJob);
}
Update.
The above code is almost all the time doing nothing, since there was no traffic on the website.
Update2. I've checked Postgres logs and found this:
2020-07-11 22:48:09 UTC-5f0866f0.f0-LOG: checkpoint starting: immediate force wait
2020-07-11 22:48:10 UTC-5f0866f0.f0-LOG: checkpoint complete (240): wrote 30 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.046 s, sync=0.046 s, total=0.437 s; sync files=13, longest=0.009 s, average=0.003 s; distance=163 kB, estimate=13180 kB
2020-07-11 22:48:10 UTC-5f0866ee.68-LOG: received immediate shutdown request
2020-07-11 22:48:10 UTC-5f0a3f41.8914-WARNING: terminating connection because of crash of another server process
2020-07-11 22:48:10 UTC-5f0a3f41.8914-DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
// Same text about 10 times
2020-07-11 22:48:10 UTC-5f0866f2.7c-HINT: In a moment you should be able to reconnect to the database and repeat your command.
2020-07-11 22:48:10 UTC-5f0866ee.68-LOG: src/port/kill.c(84): Process (272) exited OOB of pgkill.
2020-07-11 22:48:10 UTC-5f0866f1.fc-WARNING: terminating connection because of crash of another server process
2020-07-11 22:48:10 UTC-5f0866f1.fc-DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-07-11 22:48:10 UTC-5f0866f1.fc-HINT: In a moment you should be able to reconnect to the database and repeat your command.
2020-07-11 22:48:10 UTC-5f0866ee.68-LOG: archiver process (PID 256) exited with exit code 1
2020-07-11 22:48:11 UTC-5f0866ee.68-LOG: database system is shut down
It looks like it is a problem with Azure PostgresSQL server and it closed itself. Am I reading this right?
Like mentioned in your logs, have you tried setting maxLifetime property for the Hikari CP ? I think after setting that property this issue should be resolved.
Based on Hikari doc (https://github.com/brettwooldridge/HikariCP) --
maxLifetime
This property controls the maximum lifetime of a connection in the pool. An in-use connection will never be retired, only when it is closed will it then be removed. On a connection-by-connection basis, minor negative attenuation is applied to avoid mass-extinction in the pool. We strongly recommend setting this value, and it should be several seconds shorter than any database or infrastructure imposed connection time limit. A value of 0 indicates no maximum lifetime (infinite lifetime), subject of course to the idleTimeout setting. The minimum allowed value is 30000ms (30 seconds). Default: 1800000 (30 minutes)
At biz code (java) sometimes will appear error(below code):
and mysql server memory was highest,mysql slowlog not show some helpful information,How should I do that can troubleshoot this problem?
tks.
java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3011)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3469)
... 86 common frames omitted
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 81,184 milliseconds ago. The last packet sent successfully to the server was 81,184 milliseconds ago.**
I use SpringBoot and Jersey into my project and I often tackle the following error :
[ERROR - ServerRuntime$Responder - 2018-02-13 13:16:45,983] An I/O error has occurred while writing a response message entity to the container output stream.
org.glassfish.jersey.server.internal.process.MappableException: org.apache.catalina.connector.ClientAbortException: java.io.IOException: Broken pipe
at org.glassfish.jersey.server.internal.MappableExceptionWrapperInterceptor.aroundWriteTo(MappableExceptionWrapperInterceptor.java:92)
at org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:162)
at org.glassfish.jersey.message.internal.MessageBodyFactory.writeTo(MessageBodyFactory.java:1130)
at org.glassfish.jersey.server.ServerRuntime$Responder.writeResponse(ServerRuntime.java:711)
at org.glassfish.jersey.server.ServerRuntime$Responder.processResponse(ServerRuntime.java:444)
at org.glassfish.jersey.server.ServerRuntime$Responder.process(ServerRuntime.java:434)
at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:329)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
It results with a 503 status response to my client.
Could you explain me why this error occured ?
Thanks
This kind of error usually happens servlet is writing data back on the stream and connection closed from client side.
It is like server in sending some data may be file, string, bytes etc.. but on client side like a browser has closed the connection, like you close the browser tab.
It cased early End of file exception on the server.
I have had this exception when I accidentally passed a null Response at the end of a call. You can verify this in the'caused by' section following the first exception:
Caused by: org.eclipse.jetty.io.EofException: null
Problem Description: MongoDB version is 3.4
In fact, did not do anything on the normal query, write,
because it is in the testing phase, QPS is small.
Question:
1: How is this anomaly produced.
2: what configuration or adjustment needs to be done? help me
02-01 15:11:47 WARN - Got socket exception on connection [connectionId{localValue:43}] to 172.16.199.96:22001. All connections to 172.16.199.96:22001 will be closed.
02-01 15:11:47 INFO - Closed connection [connectionId{localValue:43}] to 172.16.199.96:22001 because there was a socket exception raised by this connection.
org.springframework.data.mongodb.UncategorizedMongoDbException: Exception receiving message; nested exception is com.mongodb.MongoSocketReadException: Exception receiving message
at org.springframework.data.mongodb.core.MongoExceptionTranslator.translateExceptionIfPossible(MongoExceptionTranslator.java:107)
at org.springframework.data.mongodb.core.MongoTemplate.potentiallyConvertRuntimeException(MongoTemplate.java:2135)
at org.springframework.data.mongodb.core.MongoTemplate.executeFindMultiInternal(MongoTemplate.java:1978)
at org.springframework.data.mongodb.core.MongoTemplate.doFind(MongoTemplate.java:1784)
at org.springframework.data.mongodb.core.MongoTemplate.doFind(MongoTemplate.java:1767)
at org.springframework.data.mongodb.core.MongoTemplate.find(MongoTemplate.java:641)
at org.springframework.data.mongodb.core.MongoTemplate.findOne(MongoTemplate.java:606)
at org.springframework.data.mongodb.core.MongoTemplate.findOne(MongoTemplate.java:598)
at com.xxx.xxx.xxx.xxx(xxxService.java:46)
at com.xxx.xxx.xxx.xxx(xxxService.java:157)
at com.xxx.xxx.xxx.xxx(xxxService.java:142)
at com.xxx.xxx.xxx.xxx(xxxService.java:87)
at com.alibaba.dubbo.common.bytecode.Wrapper2.invokeMethod(Wrapper2.java)
at com.alibaba.dubbo.rpc.proxy.javassist.JavassistProxyFactory$1.doInvoke(JavassistProxyFactory.java:46)
at com.alibaba.dubbo.rpc.proxy.AbstractProxyInvoker.invoke(AbstractProxyInvoker.java:72)
at com.alibaba.dubbo.rpc.protocol.InvokerWrapper.invoke(InvokerWrapper.java:53)
at com.alibaba.dubbo.rpc.filter.ExceptionFilter.invoke(ExceptionFilter.java:64)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:69)
at com.alibaba.dubbo.monitor.support.MonitorFilter.invoke(MonitorFilter.java:75)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:69)
at com.alibaba.dubbo.rpc.filter.TimeoutFilter.invoke(TimeoutFilter.java:42)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:69)
at com.alibaba.dubbo.rpc.protocol.dubbo.filter.TraceFilter.invoke(TraceFilter.java:78)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:69)
at com.alibaba.dubbo.rpc.filter.ContextFilter.invoke(ContextFilter.java:61)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:69)
at com.alibaba.dubbo.rpc.filter.GenericFilter.invoke(GenericFilter.java:132)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:69)
at com.alibaba.dubbo.rpc.filter.ClassLoaderFilter.invoke(ClassLoaderFilter.java:38)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:69)
at com.alibaba.dubbo.rpc.filter.EchoFilter.invoke(EchoFilter.java:38)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:69)
at com.alibaba.dubbo.rpc.protocol.dubbo.DubboProtocol$1.reply(DubboProtocol.java:100)
at com.alibaba.dubbo.remoting.exchange.support.header.HeaderExchangeHandler.handleRequest(HeaderExchangeHandler.java:98)
at com.alibaba.dubbo.remoting.exchange.support.header.HeaderExchangeHandler.received(HeaderExchangeHandler.java:170)
at com.alibaba.dubbo.remoting.transport.DecodeHandler.received(DecodeHandler.java:52)
at com.alibaba.dubbo.remoting.transport.dispatcher.ChannelEventRunnable.run(ChannelEventRunnable.java:81)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.mongodb.MongoSocketReadException: Exception receiving message
at com.mongodb.connection.InternalStreamConnection.translateReadException(InternalStreamConnection.java:483)
at com.mongodb.connection.InternalStreamConnection.receiveMessage(InternalStreamConnection.java:228)
at com.mongodb.connection.UsageTrackingInternalConnection.receiveMessage(UsageTrackingInternalConnection.java:96)
at com.mongodb.connection.DefaultConnectionPool$PooledConnection.receiveMessage(DefaultConnectionPool.java:440)
at com.mongodb.connection.CommandProtocol.execute(CommandProtocol.java:112)
at com.mongodb.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:168)
at com.mongodb.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:289)
at com.mongodb.connection.DefaultServerConnection.command(DefaultServerConnection.java:176)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:216)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:207)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:113)
at com.mongodb.operation.FindOperation$1.call(FindOperation.java:516)
at com.mongodb.operation.FindOperation$1.call(FindOperation.java:510)
at com.mongodb.operation.OperationHelper.withConnectionSource(OperationHelper.java:431)
at com.mongodb.operation.OperationHelper.withConnection(OperationHelper.java:404)
at com.mongodb.operation.FindOperation.execute(FindOperation.java:510)
at com.mongodb.operation.FindOperation.execute(FindOperation.java:81)
at com.mongodb.Mongo.execute(Mongo.java:836)
at com.mongodb.Mongo$2.execute(Mongo.java:823)
at com.mongodb.DBCursor.initializeCursor(DBCursor.java:870)
at com.mongodb.DBCursor.hasNext(DBCursor.java:142)
at org.springframework.data.mongodb.core.MongoTemplate.executeFindMultiInternal(MongoTemplate.java:1964)
... 37 common frames omitted
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at com.mongodb.connection.SocketStream.read(SocketStream.java:85)
at com.mongodb.connection.InternalStreamConnection.receiveResponseBuffers(InternalStreamConnection.java:494)
at com.mongodb.connection.InternalStreamConnection.receiveMessage(InternalStreamConnection.java:224)
... 57 common frames omitted
java version 1.8.
spring boot version 1.5.3.
deployed with docker.
mongo.hosts=ip:port,ip:port,ip:port
mongo.database.name=dbname
mongo.username=username
mongo.password=pwd
mongo.connections.per.host=32
mongo.max.wait.time=2000
mongo.connect.timeout=2000
You can try,
autoConnectRetry simply means the driver will automatically attempt to reconnect to the server(s) after unexpected disconnects. In production environments you usually want this set to true.
This is from another post, How to configure MongoDB Java driver MongoOptions for production use?
for everybody who is experiencing the same random MongoSocketReadException, you may need the socketTimeoutMS or maxIdleTimeMS parameters instead. The parameter autoConnectRetry is not exposed any more in the mongodb connection string.
Our situation: we switched to mongodb atlas serverless solution for our development and testing environments, ever since then we got this MongoSocketReadException like every 15 min. or randomly. We are also behind a enterprise firewall.
According to https://www.mongodb.com/docs/v6.0/tutorial/connection-pool-performance-tuning/:
a misconfigured firewall closes a socket connection incorrectly and the driver cannot detect that the connection closed improperly.
you need => Use socketTimeoutMS to ensure that sockets are always closed. Set socketTimeoutMS to two or three times the length of the slowest operation that the driver runs.
because the socketTimeoutMS is by default 0, which will never timeout.
And another parameter maxIdleTimeMS may also affect the connection because if the socket is closed and on the client side it's not detected, the connection will be still waiting in idle time and not cloesd. And by default it's 0 meaning it waits forever with no upper boundaries.
So configure this to a small amount may help the driver to close the the problematic connection with its closed socket, before it tries to connect to the db using the same connection and presumes the connection is still there.
So our solution:
...mongodbUri...?socketTimeoutMS=150000&maxIdleTimeMS=150000
we changed the socketTimeoutMS from 0 to 15s and same for the maxIdleTimeMS.
Can any one help to explain why I get so many "Broken pipe IOE" in ZooKeeper log?
ZooKeeper throws this exception almost every minute. I don't think we use the four letter command to dumpWatches so frequently. So what does this mean?
This may be caused by the command wchc because our ZooKeeper has more than ten thousand znodes. And I have found that this command is executed from the same server with the different port. Will ZooKeeper call this command frequently?
2014-09-17,10:52:09,179 ERROR org.apache.zookeeper.server.NIOServerCnxn: [myid:0] Error sending data synchronously
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:69)
at sun.nio.ch.IOUtil.write(IOUtil.java:40)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:336)
at org.apache.zookeeper.server.NIOServerCnxn.sendBufferSync(NIOServerCnxn.java:138)
at org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.checkFlush(NIOServerCnxn.java:453)
at org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.write(NIOServerCnxn.java:474)
at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:111)
at java.io.BufferedWriter.write(BufferedWriter.java:212)
at java.io.PrintWriter.write(PrintWriter.java:412)
at java.io.PrintWriter.write(PrintWriter.java:429)
at java.io.PrintWriter.print(PrintWriter.java:559)
at java.io.PrintWriter.println(PrintWriter.java:695)
at org.apache.zookeeper.server.WatchManager.dumpWatches(WatchManager.java:166)
at org.apache.zookeeper.server.DataTree.dumpWatches(DataTree.java:1240)
at org.apache.zookeeper.server.NIOServerCnxn$WatchCommand.commandRun(NIOServerCnxn.java:722)
at org.apache.zookeeper.server.NIOServerCnxn$CommandThread.run(NIOServerCnxn.java:496)
2014-09-17,10:52:09,179 INFO org.apache.zookeeper.server.NIOServerCnxn: [myid:0] Closed socket connection for client /10.20.201.234:53756 which had sessionid 0x34840357f664081
2014-09-17,10:52:09,179 ERROR org.apache.zookeeper.server.NIOServerCnxn: [myid:0] Error sending data synchronously
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:69)
at sun.nio.ch.IOUtil.write(IOUtil.java:40)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:336)
at org.apache.zookeeper.server.NIOServerCnxn.sendBufferSync(NIOServerCnxn.java:138)
at org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.checkFlush(NIOServerCnxn.java:453)
at org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.write(NIOServerCnxn.java:474)
at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:111)
at java.io.BufferedWriter.flush(BufferedWriter.java:235)
at java.io.PrintWriter.flush(PrintWriter.java:276)
at org.apache.zookeeper.server.NIOServerCnxn.cleanupWriterSocket(NIOServerCnxn.java:424)
at org.apache.zookeeper.server.NIOServerCnxn.access$000(NIOServerCnxn.java:60)
at org.apache.zookeeper.server.NIOServerCnxn$CommandThread.run(NIOServerCnxn.java:500)
I have found the reason. We're using TaoKeeper to monitor the status of ZooKeeper. It will periodically send wchc and other commands to check the status. When ZooKeeper receives wchc, the exception occurs because we have too many znodes.
I think TaoKeeper should use mntr rather than wchc which may affect ZooKeeper's performance when it has large number of znodes.