GemFire with GFSH - Region creation with server start throws BindException - java

I am trying to create a GemFire cache and region at the time of starting a cache server, using a GFSH command.
GFSH:
gfsh start server --name=server1 --server-port=40405 --classpath=$CLASSPATH --cache-xml=/tmp/gemfire/8.2.7/config/cache.xml --locators=hostA[10334],hostB[10334] --mcast-port=0
Cache.xml:
<?xml version="1.0" encoding="UTF-8"?>
<cache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns="http://schema.pivotal.io/gemfire/cache"
       xsi:schemaLocation="http://schema.pivotal.io/gemfire/cache http://schema.pivotal.io/gemfire/cache/cache-8.1.xsd"
       version="8.1" lock-lease="120" lock-timeout="60" search-timeout="300"
       is-server="false" copy-on-read="false">
  <pdx>
    <pdx-serializer>
      <class-name>com.gemstone.gemfire.pdx.ReflectionBasedAutoSerializer</class-name>
      <parameter name="classes">
        <string>com.gemfire.DomainObjects</string>
      </parameter>
    </pdx-serializer>
  </pdx>
  <region name="Customer" refid="REPLICATE"></region>
</cache>
When I run the gfsh command, I get the exception below.
Caused by: java.net.BindException: Failed to create server socket on null[40,405]
at com.gemstone.gemfire.internal.SocketCreator.createServerSocket(SocketCreator.java:828)
at com.gemstone.gemfire.internal.SocketCreator.createServerSocket(SocketCreator.java:758)
at com.gemstone.gemfire.internal.cache.tier.sockets.AcceptorImpl.<init>(AcceptorImpl.java:466)
at com.gemstone.gemfire.internal.cache.BridgeServerImpl.start(BridgeServerImpl.java:342)
at com.gemstone.gemfire.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:607)
... 11 more
Caused by: java.net.BindException: Address already in use
at java.net.PlainSocketImpl.socketBind(Native Method)
at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
at java.net.ServerSocket.bind(ServerSocket.java:375)
at com.gemstone.gemfire.internal.SocketCreator.createServerSocket(SocketCreator.java:825)
... 15 more
Any help?
I used netstat to check whether the port is occupied. I couldn't find anything on it, and when I executed the script again I still got the same exception.
bash-4.1$ netstat | grep 40405
bash-4.1$ ./startServer.sh
.......................................................................................................................................................................................................................................................The Cache Server process terminated unexpectedly with exit status 1. Please refer to the log file in /var/tmp/sn17180/gemfire/8.2.7/config/server1 for full details.
[severe 2018/05/06 04:37:53.156 IST libgemfire.so nid=0x17921700] SIGQUIT received, dumping threads
java.io.EOFException: Locator at hostB(server1:18877)<v73>:10334 did not respond. This is normal if the locator was shutdown. If it wasn't check its log for exceptions.
at com.gemstone.org.jgroups.stack.tcpserver.TcpClient.requestToServer(TcpClient.java:125)
at com.gemstone.org.jgroups.stack.tcpserver.TcpClient.requestToServer(TcpClient.java:78)
at com.gemstone.gemfire.internal.cache.ClusterConfigurationLoader.requestConfigurationFromLocators(ClusterConfigurationLoader.java:171)
at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.requestAndApplySharedConfiguration(GemFireCacheImpl.java:874)
at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.init(GemFireCacheImpl.java:1025)
at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.create(GemFireCacheImpl.java:688)
at com.gemstone.gemfire.cache.CacheFactory.create(CacheFactory.java:182)
at com.gemstone.gemfire.cache.CacheFactory.create(CacheFactory.java:229)
at com.gemstone.gemfire.distributed.ServerLauncher.startWithGemFireApi(ServerLauncher.java:793)
at com.gemstone.gemfire.distributed.ServerLauncher.start(ServerLauncher.java:695)
at com.gemstone.gemfire.distributed.ServerLauncher.run(ServerLauncher.java:625)
at com.gemstone.gemfire.distributed.ServerLauncher.main(ServerLauncher.java:195)
Exception in thread "main" com.gemstone.gemfire.GemFireIOException: While starting bridge server CacheServer on port=40405 client subscription config policy=none client subscription config capacity=1 client subscription config overflow directory=.
at com.gemstone.gemfire.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:611)
at com.gemstone.gemfire.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:340)
at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4269)
at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.initializeDeclarativeCache(GemFireCacheImpl.java:1184)
at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.init(GemFireCacheImpl.java:1026)
at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.create(GemFireCacheImpl.java:688)
at com.gemstone.gemfire.cache.CacheFactory.create(CacheFactory.java:182)
at com.gemstone.gemfire.cache.CacheFactory.create(CacheFactory.java:229)
at com.gemstone.gemfire.distributed.ServerLauncher.startWithGemFireApi(ServerLauncher.java:793)
at com.gemstone.gemfire.distributed.ServerLauncher.start(ServerLauncher.java:695)
at com.gemstone.gemfire.distributed.ServerLauncher.run(ServerLauncher.java:625)
at com.gemstone.gemfire.distributed.ServerLauncher.main(ServerLauncher.java:195)
Caused by: java.net.BindException: Failed to create server socket on null[40,405]

This means some other process running on localhost is already bound and listening to client socket connections on port 40405. Perhaps you have another GemFire Server already running on that port?
In that case, you need to vary the port number for each server you start, for example...
gfsh> start server --name=One --server-port=40405 ...
...
gfsh> start server --name=Two --server-port=40406 ...
...
gfsh> start server --name=Three --server-port=40407 ...
...
... and so on and so forth.
If a non-GemFire process is already using that port number, then either consider killing that process or use another port for your GemFire CacheServers.
You can use the Linux netstat command to find which process is using the port and act appropriately.
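If netstat shows nothing, keep in mind that without the -a or -l flags it normally omits sockets in the LISTEN state, so try netstat -an (or lsof -i) for the port. Another quick sanity check is to try binding the port yourself just before starting the server; here is a throwaway probe along those lines (the class name is made up for illustration):
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

public class PortProbe {
    public static void main(String[] args) throws IOException {
        int port = 40405;
        // Try to grab the port exactly as the cache server would.
        try (ServerSocket probe = new ServerSocket(port)) {
            System.out.println("Port " + port + " is free on this host");
        } catch (BindException e) {
            // Same condition the server hits: something is already bound to it.
            System.out.println("Port " + port + " is already in use: " + e.getMessage());
        }
    }
}
If the probe binds cleanly but gfsh still reports the BindException, check the server log referenced in the gfsh output for the full startup sequence.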

Related

Every 15 minutes this exception occurs; see the fillInStackTrace information

Problem description: the MongoDB version is 3.4.
We are not doing anything unusual, just normal queries and writes;
because it is in the testing phase, QPS is small.
Questions:
1. How is this exception produced?
2. What configuration or adjustment needs to be done? Please help.
02-01 15:11:47 WARN - Got socket exception on connection [connectionId{localValue:43}] to 172.16.199.96:22001. All connections to 172.16.199.96:22001 will be closed.
02-01 15:11:47 INFO - Closed connection [connectionId{localValue:43}] to 172.16.199.96:22001 because there was a socket exception raised by this connection.
org.springframework.data.mongodb.UncategorizedMongoDbException: Exception receiving message; nested exception is com.mongodb.MongoSocketReadException: Exception receiving message
at org.springframework.data.mongodb.core.MongoExceptionTranslator.translateExceptionIfPossible(MongoExceptionTranslator.java:107)
at org.springframework.data.mongodb.core.MongoTemplate.potentiallyConvertRuntimeException(MongoTemplate.java:2135)
at org.springframework.data.mongodb.core.MongoTemplate.executeFindMultiInternal(MongoTemplate.java:1978)
at org.springframework.data.mongodb.core.MongoTemplate.doFind(MongoTemplate.java:1784)
at org.springframework.data.mongodb.core.MongoTemplate.doFind(MongoTemplate.java:1767)
at org.springframework.data.mongodb.core.MongoTemplate.find(MongoTemplate.java:641)
at org.springframework.data.mongodb.core.MongoTemplate.findOne(MongoTemplate.java:606)
at org.springframework.data.mongodb.core.MongoTemplate.findOne(MongoTemplate.java:598)
at com.xxx.xxx.xxx.xxx(xxxService.java:46)
at com.xxx.xxx.xxx.xxx(xxxService.java:157)
at com.xxx.xxx.xxx.xxx(xxxService.java:142)
at com.xxx.xxx.xxx.xxx(xxxService.java:87)
at com.alibaba.dubbo.common.bytecode.Wrapper2.invokeMethod(Wrapper2.java)
at com.alibaba.dubbo.rpc.proxy.javassist.JavassistProxyFactory$1.doInvoke(JavassistProxyFactory.java:46)
at com.alibaba.dubbo.rpc.proxy.AbstractProxyInvoker.invoke(AbstractProxyInvoker.java:72)
at com.alibaba.dubbo.rpc.protocol.InvokerWrapper.invoke(InvokerWrapper.java:53)
at com.alibaba.dubbo.rpc.filter.ExceptionFilter.invoke(ExceptionFilter.java:64)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:69)
at com.alibaba.dubbo.monitor.support.MonitorFilter.invoke(MonitorFilter.java:75)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:69)
at com.alibaba.dubbo.rpc.filter.TimeoutFilter.invoke(TimeoutFilter.java:42)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:69)
at com.alibaba.dubbo.rpc.protocol.dubbo.filter.TraceFilter.invoke(TraceFilter.java:78)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:69)
at com.alibaba.dubbo.rpc.filter.ContextFilter.invoke(ContextFilter.java:61)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:69)
at com.alibaba.dubbo.rpc.filter.GenericFilter.invoke(GenericFilter.java:132)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:69)
at com.alibaba.dubbo.rpc.filter.ClassLoaderFilter.invoke(ClassLoaderFilter.java:38)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:69)
at com.alibaba.dubbo.rpc.filter.EchoFilter.invoke(EchoFilter.java:38)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:69)
at com.alibaba.dubbo.rpc.protocol.dubbo.DubboProtocol$1.reply(DubboProtocol.java:100)
at com.alibaba.dubbo.remoting.exchange.support.header.HeaderExchangeHandler.handleRequest(HeaderExchangeHandler.java:98)
at com.alibaba.dubbo.remoting.exchange.support.header.HeaderExchangeHandler.received(HeaderExchangeHandler.java:170)
at com.alibaba.dubbo.remoting.transport.DecodeHandler.received(DecodeHandler.java:52)
at com.alibaba.dubbo.remoting.transport.dispatcher.ChannelEventRunnable.run(ChannelEventRunnable.java:81)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.mongodb.MongoSocketReadException: Exception receiving message
at com.mongodb.connection.InternalStreamConnection.translateReadException(InternalStreamConnection.java:483)
at com.mongodb.connection.InternalStreamConnection.receiveMessage(InternalStreamConnection.java:228)
at com.mongodb.connection.UsageTrackingInternalConnection.receiveMessage(UsageTrackingInternalConnection.java:96)
at com.mongodb.connection.DefaultConnectionPool$PooledConnection.receiveMessage(DefaultConnectionPool.java:440)
at com.mongodb.connection.CommandProtocol.execute(CommandProtocol.java:112)
at com.mongodb.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:168)
at com.mongodb.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:289)
at com.mongodb.connection.DefaultServerConnection.command(DefaultServerConnection.java:176)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:216)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:207)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:113)
at com.mongodb.operation.FindOperation$1.call(FindOperation.java:516)
at com.mongodb.operation.FindOperation$1.call(FindOperation.java:510)
at com.mongodb.operation.OperationHelper.withConnectionSource(OperationHelper.java:431)
at com.mongodb.operation.OperationHelper.withConnection(OperationHelper.java:404)
at com.mongodb.operation.FindOperation.execute(FindOperation.java:510)
at com.mongodb.operation.FindOperation.execute(FindOperation.java:81)
at com.mongodb.Mongo.execute(Mongo.java:836)
at com.mongodb.Mongo$2.execute(Mongo.java:823)
at com.mongodb.DBCursor.initializeCursor(DBCursor.java:870)
at com.mongodb.DBCursor.hasNext(DBCursor.java:142)
at org.springframework.data.mongodb.core.MongoTemplate.executeFindMultiInternal(MongoTemplate.java:1964)
... 37 common frames omitted
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at com.mongodb.connection.SocketStream.read(SocketStream.java:85)
at com.mongodb.connection.InternalStreamConnection.receiveResponseBuffers(InternalStreamConnection.java:494)
at com.mongodb.connection.InternalStreamConnection.receiveMessage(InternalStreamConnection.java:224)
... 57 common frames omitted
Java version: 1.8.
Spring Boot version: 1.5.3.
Deployed with Docker.
mongo.hosts=ip:port,ip:port,ip:port
mongo.database.name=dbname
mongo.username=username
mongo.password=pwd
mongo.connections.per.host=32
mongo.max.wait.time=2000
mongo.connect.timeout=2000
You can try autoConnectRetry. It simply means the driver will automatically attempt to reconnect to the server(s) after unexpected disconnects. In production environments you usually want this set to true.
This is from another post: How to configure MongoDB Java driver MongoOptions for production use?
For everybody who is experiencing the same random MongoSocketReadException: you may need the socketTimeoutMS or maxIdleTimeMS parameters instead. The autoConnectRetry parameter is no longer exposed in the MongoDB connection string.
Our situation: we switched to the MongoDB Atlas serverless solution for our development and testing environments, and ever since then we have been getting this MongoSocketReadException roughly every 15 minutes, or at random. We are also behind an enterprise firewall.
According to https://www.mongodb.com/docs/v6.0/tutorial/connection-pool-performance-tuning/:
a misconfigured firewall closes a socket connection incorrectly and the driver cannot detect that the connection closed improperly.
You need to use socketTimeoutMS to ensure that sockets are always closed: set socketTimeoutMS to two or three times the length of the slowest operation that the driver runs.
This is because socketTimeoutMS defaults to 0, which means the socket never times out.
Another parameter, maxIdleTimeMS, may also affect the connection: if the socket is closed but the client side does not detect it, the connection keeps waiting idle and is never closed. By default it is also 0, meaning it waits forever with no upper bound.
So configuring this to a small value may help the driver close the problematic connection (with its dead socket) before it tries to reach the database over that same connection on the assumption that it is still alive.
So our solution:
...mongodbUri...?socketTimeoutMS=150000&maxIdleTimeMS=150000
We changed socketTimeoutMS from 0 to 150 seconds (150000 ms) and did the same for maxIdleTimeMS.
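For reference, here are roughly the same two settings applied through the MongoClientOptions builder of the 3.x Java driver (the generation Spring Boot 1.5.x pulls in). This is only a sketch; the host names and port are placeholders, not our actual topology:
import com.mongodb.MongoClient;
import com.mongodb.MongoClientOptions;
import com.mongodb.ServerAddress;
import java.util.Arrays;

public class MongoTimeoutConfig {
    public static void main(String[] args) {
        // Bound both the socket read timeout and the idle time so that
        // half-dead connections are dropped by the pool instead of reused.
        MongoClientOptions options = MongoClientOptions.builder()
                .socketTimeout(150_000)          // socketTimeoutMS
                .maxConnectionIdleTime(150_000)  // maxIdleTimeMS
                .build();
        MongoClient client = new MongoClient(
                Arrays.asList(new ServerAddress("host1", 27017),
                              new ServerAddress("host2", 27017)),
                options);
        System.out.println(client.getMongoClientOptions().getSocketTimeout());
        client.close();
    }
}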

Socket survives JBoss shutdown - how to fix?

I have a NIO-based app (uses mina-core) that starts listening on a socket when it gets contextInitialized() from JBoss and issues IoAcceptor.unbind() in response to contextDestroyed(). The concrete subclass of IoAcceptor is NioSocketAcceptor.
However, every time I restart JBoss, I get the following error when trying to create the new socket:
Caused by: java.io.IOException: Error while binding on 0.0.0.0/0.0.0.0:9191
original message : Address already in use
at org.apache.mina.transport.socket.nio.NioSocketAcceptor.open(NioSocketAcceptor.java:238) [mina-core-2.0.9.jar:]
at org.apache.mina.transport.socket.nio.NioSocketAcceptor.open(NioSocketAcceptor.java:51) [mina-core-2.0.9.jar:]
at org.apache.mina.core.polling.AbstractPollingIoAcceptor.registerHandles(AbstractPollingIoAcceptor.java:583) [mina-core-2.0.9.jar:]
at org.apache.mina.core.polling.AbstractPollingIoAcceptor.access$400(AbstractPollingIoAcceptor.java:71) [mina-core-2.0.9.jar:]
at org.apache.mina.core.polling.AbstractPollingIoAcceptor$Acceptor.run(AbstractPollingIoAcceptor.java:457) [mina-core-2.0.9.jar:]
at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64) [mina-core-2.0.9.jar:]
Do I need to do something more than call unbind?
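For context, here is a minimal sketch of the lifecycle described above against the MINA 2.x API. The setReuseAddress(true) and dispose() calls are not in the original code; they are additions that are commonly needed in this situation, since unbind() alone only stops accepting and leaves the acceptor's resources (and possibly TIME_WAIT sockets) behind:
import java.io.IOException;
import java.net.InetSocketAddress;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import org.apache.mina.core.service.IoHandlerAdapter;
import org.apache.mina.transport.socket.nio.NioSocketAcceptor;

public class SocketLifecycleListener implements ServletContextListener {

    private NioSocketAcceptor acceptor;

    @Override
    public void contextInitialized(ServletContextEvent sce) {
        acceptor = new NioSocketAcceptor();
        // Allows a fresh bind even if old connections are still in TIME_WAIT.
        acceptor.setReuseAddress(true);
        acceptor.setHandler(new IoHandlerAdapter()); // real handler goes here
        try {
            acceptor.bind(new InetSocketAddress(9191));
        } catch (IOException e) {
            throw new IllegalStateException("Could not bind port 9191", e);
        }
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) {
        acceptor.unbind();   // stop accepting new connections
        acceptor.dispose();  // release the NIO selector and I/O threads as well
    }
}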

Cannot programmatically submit Spark application (with Cassandra connector) to cluster from remote client

I'm running a standalone Spark cluster on EC2, and I'm writing an application using the Spark-Cassandra connector and trying to submit the job to the Spark cluster programmatically.
The job itself is simple:
import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
// CassandraJavaRDD and CassandraRow are imported from the connector's japi
// packages; the exact paths depend on the connector version in use.

// host, [my_public_ip] and [spark_master_host] are placeholders.
public static void main(String[] args) {
    SparkConf conf;
    JavaSparkContext sc;
    conf = new SparkConf()
            .set("spark.cassandra.connection.host", host);
    conf.set("spark.driver.host", "[my_public_ip]");
    conf.set("spark.driver.port", "15000");
    sc = new JavaSparkContext("spark://[spark_master_host]", "test", conf);
    CassandraJavaRDD<CassandraRow> rdd = javaFunctions(sc).cassandraTable(
            "keyspace", "table");
    System.out.println(rdd.first().toString());
    sc.stop();
}
This runs fine when I run it on the Spark master node of my EC2 cluster.
I'm trying to run it from a remote Windows client.
The problem comes from these two lines:
conf.set("spark.driver.host", "[my_public_ip]");
conf.set("spark.driver.port", "15000");
First, if I comment out these two lines, the application does not throw an exception, but the executor does not run, and I get the following log:
14/12/06 22:40:03 INFO client.AppClient$ClientActor: Executor updated: app-20141207033931-0021/3 is now LOADING
14/12/06 22:40:03 INFO client.AppClient$ClientActor: Executor updated: app-20141207033931-0021/0 is now EXITED (Command exited with code 1)
14/12/06 22:40:03 INFO cluster.SparkDeploySchedulerBackend: Executor app-20141207033931-0021/0 removed: Command exited with code 1
This never ends, and when I check the worker node log, I find:
14/12/06 22:40:21 ERROR security.UserGroupInformation: PriviledgedActionException as:[username] cause:java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException: Unknown exception in doAs
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1134)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:113)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:156)
at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.security.PrivilegedActionException: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
... 4 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:125)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52)
... 7 more
I have no idea what that is about; my guess is that the worker node could not connect to the driver, which initially appears to be set as:
14/12/06 22:39:30 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver#[some_host_name]:52660]
14/12/06 22:39:30 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver#[some_host_name]:52660]
Obviously, no DNS is going to resolve my host name...
Since I can't set the deploy mode to "client" or "cluster" except via the ./spark-submit script (which I think is absurd), I tried adding a host resolution entry "XX.XXX.XXX.XX [host-name]" to /etc/hosts on all Spark master and worker nodes.
No luck, of course...
That leads me to the second attempt: un-commenting those two lines, which gives me:
14/12/06 22:59:41 INFO Remoting: Starting remoting
14/12/06 22:59:41 ERROR Remoting: Remoting error: [Startup failed] [
akka.remote.RemoteTransportException: Startup failed
at akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:129)
at akka.remote.Remoting.start(Remoting.scala:194)
...
Cause:
Caused by: org.jboss.netty.channel.ChannelException: Failed to bind to: /[my_public_ip]:15000
at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:391)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:388)
I double-checked my firewall and router settings and confirmed that my firewall is disabled; I ran netstat -an to confirm that port 15000 is not in use (in fact I tried changing to several other available ports, with no luck); and I pinged my public IP from both another machine and a machine in my cluster, with no problem.
Now I'm utterly stuck and have simply run out of ideas for fixing this. Any suggestions? Any help is appreciated!
Please check whether port 15000 is allowed in your security group.
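Building on that: the driver-side settings need to point at an address the EC2 workers can route to, and every port the driver listens on has to be allowed inbound in that security group. Here is a sketch of the relevant SparkConf entries, under the assumption of a Spark 1.x standalone setup like the one above; spark.blockManager.port is included because that port is otherwise chosen at random and would also need whitelisting:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class DriverPortConfig {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setMaster("spark://[spark_master_host]")
                .setAppName("test")
                // Address the workers can resolve and route to (not a LAN-only name).
                .set("spark.driver.host", "[my_public_ip]")
                // Fixed ports so they can be opened in the EC2 security group
                // for inbound traffic from the worker nodes.
                .set("spark.driver.port", "15000")
                .set("spark.blockManager.port", "15001");
        JavaSparkContext sc = new JavaSparkContext(conf);
        sc.stop();
    }
}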

So many "Broken pipe IOE" in ZooKeeper log

Can anyone help explain why I get so many "Broken pipe" IOExceptions in the ZooKeeper log?
ZooKeeper throws this exception almost every minute. I don't think we use the four-letter command to dumpWatches so frequently, so what does this mean?
This may be caused by the wchc command, because our ZooKeeper has more than ten thousand znodes. I have also found that this command is executed from the same server but from different ports. Would ZooKeeper call this command frequently?
2014-09-17,10:52:09,179 ERROR org.apache.zookeeper.server.NIOServerCnxn: [myid:0] Error sending data synchronously
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:69)
at sun.nio.ch.IOUtil.write(IOUtil.java:40)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:336)
at org.apache.zookeeper.server.NIOServerCnxn.sendBufferSync(NIOServerCnxn.java:138)
at org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.checkFlush(NIOServerCnxn.java:453)
at org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.write(NIOServerCnxn.java:474)
at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:111)
at java.io.BufferedWriter.write(BufferedWriter.java:212)
at java.io.PrintWriter.write(PrintWriter.java:412)
at java.io.PrintWriter.write(PrintWriter.java:429)
at java.io.PrintWriter.print(PrintWriter.java:559)
at java.io.PrintWriter.println(PrintWriter.java:695)
at org.apache.zookeeper.server.WatchManager.dumpWatches(WatchManager.java:166)
at org.apache.zookeeper.server.DataTree.dumpWatches(DataTree.java:1240)
at org.apache.zookeeper.server.NIOServerCnxn$WatchCommand.commandRun(NIOServerCnxn.java:722)
at org.apache.zookeeper.server.NIOServerCnxn$CommandThread.run(NIOServerCnxn.java:496)
2014-09-17,10:52:09,179 INFO org.apache.zookeeper.server.NIOServerCnxn: [myid:0] Closed socket connection for client /10.20.201.234:53756 which had sessionid 0x34840357f664081
2014-09-17,10:52:09,179 ERROR org.apache.zookeeper.server.NIOServerCnxn: [myid:0] Error sending data synchronously
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:69)
at sun.nio.ch.IOUtil.write(IOUtil.java:40)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:336)
at org.apache.zookeeper.server.NIOServerCnxn.sendBufferSync(NIOServerCnxn.java:138)
at org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.checkFlush(NIOServerCnxn.java:453)
at org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.write(NIOServerCnxn.java:474)
at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:111)
at java.io.BufferedWriter.flush(BufferedWriter.java:235)
at java.io.PrintWriter.flush(PrintWriter.java:276)
at org.apache.zookeeper.server.NIOServerCnxn.cleanupWriterSocket(NIOServerCnxn.java:424)
at org.apache.zookeeper.server.NIOServerCnxn.access$000(NIOServerCnxn.java:60)
at org.apache.zookeeper.server.NIOServerCnxn$CommandThread.run(NIOServerCnxn.java:500)
I have found the reason. We're using TaoKeeper to monitor the status of ZooKeeper. It will periodically send wchc and other commands to check the status. When ZooKeeper receives wchc, the exception occurs because we have too many znodes.
I think TaoKeeper should use mntr rather than wchc, which can hurt ZooKeeper's performance when it has a large number of znodes.
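For illustration, a four-letter word is nothing more than four ASCII bytes written to the client port, which is why a monitoring tool can fire them so cheaply. Here is a minimal sketch that issues mntr instead of wchc (host name and port are placeholders):
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;

public class FourLetterWord {
    public static void main(String[] args) throws Exception {
        // The server writes its reply and then closes the connection.
        try (Socket socket = new Socket("zk-host", 2181)) {
            OutputStream out = socket.getOutputStream();
            out.write("mntr".getBytes("US-ASCII"));
            out.flush();
            InputStream in = socket.getInputStream();
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                System.out.write(buffer, 0, read);
            }
            System.out.flush();
        }
    }
}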

Issues with CORBA communication

This may be too localized, but I am hoping someone can help me articulate my questions properly.
We have a front-end web server that communicates with a back-end app server using CORBA. I have been asked to port the back-end application to a Linux box, which I did. But in order to test it, I am trying to point the front-end web server at the Linux back end.
We are using omniORB-4.1.4, and here is how the instance of the back end system is obtained:
String args[] = new String[0];
System.out.println(getDateTime()+"Instance: Connecting to: "+initialHost+" "+initialPort+" "+enviornment+" "+version);
java.util.Properties props = new java.util.Properties();
props.put("org.omg.CORBA.ORBInitialPort", initialPort);
props.put("org.omg.CORBA.ORBInitialHost", initialHost);
props.put("com.sun.CORBA.giop.ORBGIOPVersion", "1.0");
orb=org.omg.CORBA.ORB.init(args,props);
When both the front end and back end are running on a Sun/Solaris box, it seems to get an instance fine. But when the back end is running on a Linux box, it gives a Connection Refused exception, with the hostname reported as 127.0.0.1:
2012/10/22 13:53:22.033 EvaluateInstance: Connecting to: cmrheldv 23026 DEV87 0871
Oct 22, 2012 1:53:22 PM com.sun.corba.se.impl.transport.SocketOrChannelConnectionImpl <init>
WARNING: "IOP00410201: (COMM_FAILURE) Connection failure: socketType: IIOP_CLEAR_TEXT; hostname: cmrheldv; port: 23026"
org.omg.CORBA.COMM_FAILURE: vmcid: SUN minor code: 201 completed: No
at com.sun.corba.se.impl.logging.ORBUtilSystemException.connectFailure(ORBUtilSystemException.java:2200)
at com.sun.corba.se.impl.logging.ORBUtilSystemException.connectFailure(ORBUtilSystemException.java:2221)
at com.sun.corba.se.impl.transport.SocketOrChannelConnectionImpl.<init>(SocketOrChannelConnectionImpl.java:205)
at com.sun.corba.se.impl.transport.SocketOrChannelConnectionImpl.<init>(SocketOrChannelConnectionImpl.java:218)
at com.sun.corba.se.impl.transport.SocketOrChannelContactInfoImpl.createConnection(SocketOrChannelContactInfoImpl.java:101)
at com.sun.corba.se.impl.protocol.CorbaClientRequestDispatcherImpl.beginRequest(CorbaClientRequestDispatcherImpl.java:171)
at com.sun.corba.se.impl.protocol.CorbaClientDelegateImpl.request(CorbaClientDelegateImpl.java:118)
at com.sun.corba.se.impl.resolver.BootstrapResolverImpl.invoke(BootstrapResolverImpl.java:74)
at com.sun.corba.se.impl.resolver.BootstrapResolverImpl.resolve(BootstrapResolverImpl.java:107)
at com.sun.corba.se.impl.resolver.CompositeResolverImpl.resolve(CompositeResolverImpl.java:22)
at com.sun.corba.se.impl.resolver.CompositeResolverImpl.resolve(CompositeResolverImpl.java:22)
at com.sun.corba.se.impl.resolver.CompositeResolverImpl.resolve(CompositeResolverImpl.java:22)
at com.sun.corba.se.impl.orb.ORBImpl.resolve_initial_references(ORBImpl.java:1151)
at EvaluateInstance.InitializeModules(EvaluateInstance.java:152)
at EvaluateInstance.initializeVariables(EvaluateInstance.java:326)
at EvaluateCF.initializeInstances(EvaluateCF.java:1792)
at EvaluateCF.processRequest(EvaluateCF.java:112)
at coldfusion.tagext.CfxTag.doStartTag(CfxTag.java:102)
at coldfusion.runtime.CfJspPage._emptyTcfTag(CfJspPage.java:2722)
at cfconfglobalconstants2ecfm330318830._factor9(/opt/jrun4/servers/or_dev87/cfusion-ear/cfusion-war/origenate/confglobalconstants.cfm:372)
at cfconfglobalconstants2ecfm330318830._factor10(/opt/jrun4/servers/or_dev87/cfusion-ear/cfusion-war/origenate/confglobalconstants.cfm:13)
at cfconfglobalconstants2ecfm330318830._factor11(/opt/jrun4/servers/or_dev87/cfusion-ear/cfusion-war/origenate/confglobalconstants.cfm:6)
at cfconfglobalconstants2ecfm330318830.runPage(/opt/jrun4/servers/or_dev87/cfusion-ear/cfusion-war/origenate/confglobalconstants.cfm:1)
at coldfusion.runtime.CfJspPage.invoke(CfJspPage.java:231)
at coldfusion.tagext.lang.IncludeTag.doStartTag(IncludeTag.java:416)
at coldfusion.runtime.CfJspPage._emptyTcfTag(CfJspPage.java:2722)
at cfapp_globals2ecfm1890385339.runPage(/opt/jrun4/servers/or_dev87/cfusion-ear/cfusion-war/origenate/app_globals.cfm:61)
at coldfusion.runtime.CfJspPage.invoke(CfJspPage.java:231)
at coldfusion.tagext.lang.IncludeTag.doStartTag(IncludeTag.java:416)
at coldfusion.runtime.CfJspPage._emptyTcfTag(CfJspPage.java:2722)
at cfapp_locals2ecfm610494134.runPage(/opt/jrun4/servers/or_dev87/cfusion-ear/cfusion-war/origenate/securitycontrol/app_locals.cfm:49)
at coldfusion.runtime.CfJspPage.invoke(CfJspPage.java:231)
at coldfusion.tagext.lang.IncludeTag.doStartTag(IncludeTag.java:416)
at coldfusion.runtime.CfJspPage._emptyTcfTag(CfJspPage.java:2722)
at cfdefault2ecfm129406838._factor9(/opt/jrun4/servers/or_dev87/cfusion-ear/cfusion-war/origenate/securitycontrol/default.cfm:107)
at cfdefault2ecfm129406838.runPage(/opt/jrun4/servers/or_dev87/cfusion-ear/cfusion-war/origenate/securitycontrol/default.cfm:1)
at coldfusion.runtime.CfJspPage.invoke(CfJspPage.java:231)
at coldfusion.tagext.lang.IncludeTag.doStartTag(IncludeTag.java:416)
at coldfusion.filter.CfincludeFilter.invoke(CfincludeFilter.java:65)
at coldfusion.filter.ApplicationFilter.invoke(ApplicationFilter.java:381)
at coldfusion.filter.RequestMonitorFilter.invoke(RequestMonitorFilter.java:48)
at coldfusion.filter.MonitoringFilter.invoke(MonitoringFilter.java:40)
at coldfusion.filter.PathFilter.invoke(PathFilter.java:94)
at coldfusion.filter.ExceptionFilter.invoke(ExceptionFilter.java:70)
at coldfusion.filter.BrowserDebugFilter.invoke(BrowserDebugFilter.java:79)
at coldfusion.filter.ClientScopePersistenceFilter.invoke(ClientScopePersistenceFilter.java:28)
at coldfusion.filter.BrowserFilter.invoke(BrowserFilter.java:38)
at coldfusion.filter.NoCacheFilter.invoke(NoCacheFilter.java:46)
at coldfusion.filter.GlobalsFilter.invoke(GlobalsFilter.java:38)
at coldfusion.filter.DatasourceFilter.invoke(DatasourceFilter.java:22)
at coldfusion.filter.CachingFilter.invoke(CachingFilter.java:62)
at coldfusion.CfmServlet.service(CfmServlet.java:200)
at coldfusion.bootstrap.BootstrapServlet.service(BootstrapServlet.java:89)
at jrun.servlet.FilterChain.doFilter(FilterChain.java:86)
at coldfusion.monitor.event.MonitoringServletFilter.doFilter(MonitoringServletFilter.java:42)
at coldfusion.bootstrap.BootstrapFilter.doFilter(BootstrapFilter.java:46)
at jrun.servlet.FilterChain.doFilter(FilterChain.java:94)
at jrun.servlet.FilterChain.service(FilterChain.java:101)
at jrun.servlet.ServletInvoker.invoke(ServletInvoker.java:106)
at jrun.servlet.JRunInvokerChain.invokeNext(JRunInvokerChain.java:42)
at jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:286)
at jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:543)
at jrun.servlet.http.WebService.invokeRunnable(WebService.java:172)
at jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPool.java:320)
at jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java:428)
at jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.java:266)
at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.Net.connect(Native Method)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:500)
at java.nio.channels.SocketChannel.open(SocketChannel.java:146)
at com.sun.corba.se.impl.transport.DefaultSocketFactoryImpl.createSocket(DefaultSocketFactoryImpl.java:60)
at com.sun.corba.se.impl.transport.SocketOrChannelConnectionImpl.<init>(SocketOrChannelConnectionImpl.java:188)
... 64 more
Is it something to do with the properties? The two boxes can reach each other perfectly well on the defined ports.
Also, when I run netstat on the Linux box, it shows a LISTENING connection on that port from the Solaris box. What makes the response come back as 127.0.0.1?
The Solaris box is called yyyy, and the Linux box is called xxxx. The initial port and host are read from an ini file.
The problem had nothing to do with the code; it was a connectivity issue.
On the Linux box, cmrheldv, I had to edit the /etc/hosts file and change the 127.0.0.1 entry so the hostname maps to the box's real IP instead of the loopback address.
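For example, the relevant /etc/hosts entries end up looking something like this (the address shown is a placeholder for the box's real IP):
127.0.0.1       localhost localhost.localdomain
XX.XXX.XXX.XX   cmrheldv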
http://jeewesley.blogspot.com/2008/12/glassfish-ejb3-remote-interface-on.html
Try adding the JDK bin directory to your PATH variable and then starting orbd.
Open cmd and type:
set path=%path%;"C:\Program Files\Java\jdk1.8.0_65\bin"
After that:
start orbd
You should first start the ORB daemon (orbd) on the initial port with this command:
orbd -ORBInitialPort [PORT NUMBER]
