Zookeeper Network Ensemble does not start appropriately - java

I've been working with ZooKeeper lately to meet a reliability requirement for distributed applications. I'm working with three computers, and I followed this tutorial:
http://sanjivblogs.blogspot.ie/2011/04/deploying-zookeeper-ensemble.html
I followed it step by step to make sure I did it right, but now when I start my ZooKeeper servers with
./zkServer.sh start
I'm getting these exceptions on all of my computers:
2013-04-05 21:46:58,995 [myid:2] - WARN [SendWorker:1:QuorumCnxManager$SendWorker#679] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1961)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2038)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:831)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:62)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:667)
2013-04-05 21:46:58,995 [myid:2] - WARN [SendWorker:1:QuorumCnxManager$SendWorker#688] - Send worker leaving thread
2013-04-05 21:47:58,363 [myid:2] - WARN [RecvWorker:3:QuorumCnxManager$RecvWorker#762] - Connection broken for id 3, my id = 2, error =
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:747)
But I don't know what I'm doing wrong. My objective is to synchronize my ZooKeeper servers on different machines so the service is always available. I went to the zookeeper.apache.org web page and looked for information on how to configure and start ZooKeeper, but it describes the same steps I already followed.
If somebody could help me, I would really appreciate it. Thanks in advance.

I needed to follow some strict steps to achieve this, but I finally got it done. If somebody is facing the same issue, remember the following to build the ZooKeeper ensemble:
You need 3 ZooKeeper servers running (locally or over the network); this is the minimum needed for a fault-tolerant ensemble. On each server you need to create a file called "myid" (inside the directory configured as dataDir); the content of each myid file must be a unique sequential number. For instance, I have three ZooKeeper servers (folders), so I have one myid with content 1, another with content 2 and another with content 3.
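For example (a sketch of my local layout; the zookeeper2 and zookeeper3 paths are just illustrative, following the dataDir naming used below), each data directory contains a one-line myid file:
/home/mtataje/var/zookeeper1/myid contains 1
/home/mtataje/var/zookeeper2/myid contains 2
/home/mtataje/var/zookeeper3/myid contains 3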
Then, in zoo.cfg, it is necessary to set the required parameters:
tickTime=2000
#dataDir=/var/lib/zookeeper
dataDir=/home/mtataje/var/zookeeper1
clientPort=2184
initLimit=10
syncLimit=20
server.1=192.168.3.41:2888:3888
server.2=192.168.3.41:2889:3889
server.3=192.168.3.41:2995:2999
The zoo.cfg varies from one server to another; in my case, because I was testing locally, I needed to change the clientPort and the dataDir on each one.
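For illustration, here is a sketch of what the second server's zoo.cfg could look like in a local test (the dataDir path and the clientPort value here are hypothetical; the server.N lines must stay the same on every server):
tickTime=2000
dataDir=/home/mtataje/var/zookeeper2
clientPort=2185
initLimit=10
syncLimit=20
server.1=192.168.3.41:2888:3888
server.2=192.168.3.41:2889:3889
server.3=192.168.3.41:2995:2999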
After that, execute:
./zkServer.sh start
Some exceptions may appear at first, but that is because at least two ZooKeeper servers must be up to synchronize; once you have started at least two of them, the exceptions should go away.
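As a quick sanity check (my own suggestion, using standard ZooKeeper tooling rather than a step from the tutorial), you can run
./zkServer.sh status
on each machine; once a quorum has formed, one server should report itself as the leader and the others as followers.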
Best regards.

Related

Finagle service discovery issue

Here I wanted to register two endpoints and send requests to them. You can see this in the code below; I named one env1 and the other env2.
val client = Http.client
.configured(Transport.Options(noDelay = false, reuseAddr = false))
.newService("gexampleapi-env1.localhost.net:8081,gexampleapi-env2.localhost.net:8081")
So far everything is normal. But the env1 instance happened to be down for a while (a few hours of maintenance, etc.; I'm not sure why). Under normal circumstances, our expectation is that the client keeps sending requests through the env2 instance, but this didn't happen: we could not send requests to either server. Normally it works correctly, but it didn't work that day for a reason we don't know.
Since the event took place months ago, I only have the following log.
2022-02-15 12:09:40,181 [finagle/netty4-1-3] INFO com.twitter.finagle
FailureAccrualFactory marking connection to "gExampleAPI" as dead.
Remote Address:
Inet(gexampleapi-env1.localhost.net/10.0.0.1:8081,Map())
To solve the problem, we removed the gexampleapi-env1.localhost.net:8081 host from the config file, and after restarting, the service continued to process requests. If you have any ideas about why we may have experienced this problem and how to avoid it next time, I would appreciate it if you could share them.

akka.pattern.AskTimeoutException while running Lagom HelloWorld example

I have a problem while trying my hand at the Hello World example explained here.
Kindly note that I have only modified the HelloEntity.java file so that it returns something other than "Hello, World!". Most likely my changes are taking time, and hence I am getting the timeout error below.
I am currently trying (doing a PoC) on a single node to understand the Lagom framework and do not have liberty to deploy multiple nodes.
I have also tried modifying the default lagom.circuit-breaker in application.conf ("call-timeout = 100s"); however, this does not seem to have helped.
Following is the exact error message for your reference:
{"name":"akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://hello-impl-application/system/sharding/HelloEntity#1074448247]] after [5000 ms]. Sender[null] sent message of type \"com.lightbend.lagom.javadsl.persistence.CommandEnvelope\".","detail":"akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://hello-impl-application/system/sharding/HelloEntity#1074448247]] after [5000 ms]. Sender[null] sent message of type \"com.lightbend.lagom.javadsl.persistence.CommandEnvelope\".\n\tat akka.pattern.PromiseActorRef$.$anonfun$defaultOnTimeout$1(AskSupport.scala:595)\n\tat akka.pattern.PromiseActorRef$.$anonfun$apply$1(AskSupport.scala:605)\n\tat akka.actor.Scheduler$$anon$4.run(Scheduler.scala:140)\n\tat scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:866)\n\tat scala.concurrent.BatchingExecutor.execute(BatchingExecutor.scala:109)\n\tat scala.concurrent.BatchingExecutor.execute$(BatchingExecutor.scala:103)\n\tat scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:864)\n\tat akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:328)\n\tat akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:279)\n\tat akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:283)\n\tat akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:235)\n\tat java.lang.Thread.run(Thread.java:748)\n"}
Question: Is there a way to increase the Akka timeout by modifying application.conf or any of the Java source files in the Hello World project? Can you please help me with the exact details?
Thanks in advance for your time and help.
The call timeout is the timeout for circuit breakers, which is configured using lagom.circuit-breaker.default.call-timeout. But that's not what is timing out above; what is timing out is the request to your HelloEntity, and that timeout is configured using lagom.persistence.ask-timeout. The reason there's a timeout on requests to entities is that in a multi-node environment your entities are sharded across nodes, so an ask on them may go to another node, and a timeout is needed in case that node is not responding.
All that said, I don't think changing the ask-timeout will solve your problem. If you have a single node, then your entities should respond instantly if everything is working ok.
Is that the only error you're seeing in the logs?
Are you seeing this in devmode (ie, using the runAll command), or are you running the Lagom service some other way?
Is your database responding?
Thanks James for the help/pointer.
Adding following lines to resources/application.conf did the trick for me:
lagom.persistence.ask-timeout=30s
hello {
  ..
  ..
  call-timeout = 30s
  call-timeout = ${?CIRCUIT_BREAKER_CALL_TIMEOUT}
  ..
}
A Call is service-to-service communication: a ServiceClient communicating with a remote server. It uses a circuit breaker. It is an extra-service call.
An ask (in the context of lagom.persistence) sends a command to a persistent entity. That happens across the nodes inside your Lagom service. It does not use circuit breaking. It is an intra-service call.
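To make the distinction concrete, here is a minimal application.conf sketch (the 30s values are just examples) that sets the two different timeouts side by side:
# intra-service: timeout for asks (commands) sent to persistent entities
lagom.persistence.ask-timeout = 30s
# extra-service: circuit-breaker call timeout used by service clients
lagom.circuit-breaker.default.call-timeout = 30s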

How to avoid "RuntimeException: My id 0 not in the peer list" when starting embedded Zookeeper?

I'm trying to start a ZooKeeper node with QuorumPeerMain.runFromConfig in a BeforeAll method so that I can use it in tests (it's embedded in the same JVM). I'm getting
Invalid configuration, only one server specified (ignoring)
followed by
java.lang.RuntimeException: My id 0 not in the peer list.
My config is:
dataDir=/tmp/zookeeper
clientPort=2181
maxClientCnxns=0
server.0=127.0.0.1:2888:3888
That last line, which should reference the node itself, is getting discarded after the "Invalid configuration" error message, and then later zookeeper throws an Exception because its id is not in the list. Its id, 0, would have been in the list if zookeeper hadn't just discarded it. How can I avoid this RuntimeException?
Or, is there a better way to run zookeeper from a BeforeAll?
As indicated in the documentation, the server ID must be unique and between 1 and 255. You have set an ID of 0, which is outside that range, and that causes this error.
Documentation: https://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_zkMulitServerSetup
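Following that advice, a sketch of the corrected configuration would simply renumber the server entry (and put 1 in the myid file under dataDir):
dataDir=/tmp/zookeeper
clientPort=2181
maxClientCnxns=0
server.1=127.0.0.1:2888:3888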
Try the Apache Curator testing framework; it's much more comfortable to work with. See the TestingServer javadoc for details.
You can find test examples on GitHub.
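For instance, a minimal sketch of a JUnit 5 test using Curator's TestingServer (assuming the curator-test dependency is on the classpath; the class and method names here are made up):
import org.apache.curator.test.TestingServer;
import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;

class ZookeeperBackedTest {
    private static TestingServer zkServer;

    @BeforeAll
    static void startZookeeper() throws Exception {
        // Starts a single in-process ZooKeeper instance on a free port;
        // no server.N entries or myid file are needed for this standalone test server.
        zkServer = new TestingServer();
    }

    @AfterAll
    static void stopZookeeper() throws Exception {
        zkServer.close();
    }

    // Tests would connect to the instance via zkServer.getConnectString().
}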

Grizzly maxPendingBytes property ignored

I'm developing a project using Grizzly 2.3.22 with its WebSocket support. Everything was OK until an OOM happened. Looking through the dump, I found that all the memory was eaten up by a single org.glassfish.grizzly.nio.transport.TCPNIOConnection holding a huge (1.5 GB) write queue. I guess one of the client developers was debugging their connected application and stopped on a breakpoint for a long time. Anyway, this can easily happen if a client has a very slow connection, so my server should be ready for that.
In the Grizzly documentation I found the maxPendingBytes property, which seems like a solution, at least for now. But I cannot get it to work at all. I set the log level to ALL for AbstractNIOAsyncQueueWriter, connect with the client, put it on hold and observe how the server's queue grows like this:
TRACE 2016-07-05 21:02:26.330 [nioEventLoopGroup-2-1] o.g.g.n.AbstractNIOAsyncQueueWriter - AsyncQueueWriter.write connection=TCPNIOConnection{localSocketAddress={/127.0.0.1:8445}, peerSocketAddress={/127.0.0.1:56185}}, record=org.glassfish.grizzly.asyncqueue.AsyncWriteQueueRecord#1e35bafb, directWrite=false, size=165, isUncountable=false, bytesToReserve=165, pendingBytes=16170
TRACE 2016-07-05 21:02:26.368 [nioEventLoopGroup-2-1] o.g.g.n.AbstractNIOAsyncQueueWriter - AsyncQueueWriter.write connection=TCPNIOConnection{localSocketAddress={/127.0.0.1:8445}, peerSocketAddress={/127.0.0.1:56185}}, record=org.glassfish.grizzly.asyncqueue.AsyncWriteQueueRecord#3d6e05dd, directWrite=false, size=165, isUncountable=false, bytesToReserve=165, pendingBytes=16335
...
When I set maxPendingBytes=10000, I expect an exception to be thrown when pendingBytes in the log above exceeds 10000, but that doesn't happen.
Moreover, I tried debugging the server with the Grizzly's source code, and found that while the property's value does get assigned to the NIOConnection.maxAsyncWriteQueueSize field, the AbstractNIOAsyncQueueWriter.canWrite(...) method - the only place where the field seems to be used - is never called.
I'm at a loss. Am I missing something here?

MQJE018: Protocol error - unexpected segment type received

Calling all MQ Gurus,
I have a box under my desk which we use to replicate our production environment which is:
WebSphere 6.1
Fedora Linux
MQ 6.0
Whenever one of our applications tries to send a message to an MQ queue, we get the following error: MQJE018: Protocol error - unexpected segment type received
Any suggestions on what this might mean would be appreciated, stack traces are below.
Dump of callerThis =
Object type = com.ibm.ejs.jms.listener.MDBListenerImpl
com.ibm.ejs.jms.listener.MDBListenerImpl#744c744c
==> Performing default dump from com.ibm.ejs.jms.JMSDiagnosticModule = Wed May 06 13:09:58 BST 2009
Dump of callerThis =
Object type = com.ibm.ejs.jms.listener.MDBListenerImpl
com.ibm.ejs.jms.listener.MDBListenerImpl#744c744c
Linked exception = com.ibm.mq.MQException: MQJE001: An MQException occurred: Completion Code 2, Reason 2195
MQJE018: Protocol error - unexpected segment type received
at com.ibm.mq.MQManagedConnectionJ11.<init>(MQManagedConnectionJ11.java:238)
at com.ibm.mq.MQClientManagedConnectionFactoryJ11._createManagedConnection(MQClientManagedConnectionFactoryJ11.java:318)
at com.ibm.mq.MQClientManagedConnectionFactoryJ11.createManagedConnection(MQClientManagedConnectionFactoryJ11.java:338)
at com.ibm.mq.StoredManagedConnection.<init>(StoredManagedConnection.java:84)
at com.ibm.mq.MQSimpleConnectionManager.allocateConnection(MQSimpleConnectionManager.java:168)
at com.ibm.mq.MQQueueManagerFactory.obtainBaseMQQueueManager(MQQueueManagerFactory.java:774)
at com.ibm.mq.MQQueueManagerFactory.procure(MQQueueManagerFactory.java:690)
at com.ibm.mq.MQQueueManagerFactory.constructQueueManager(MQQueueManagerFactory.java:646)
at com.ibm.mq.MQQueueManagerFactory.createQueueManager(MQQueueManagerFactory.java:153)
at com.ibm.mq.MQQueueManager.<init>(MQQueueManager.java:544)
at com.ibm.mq.MQSPIQueueManager.<init>(MQSPIQueueManager.java:69)
at com.ibm.mq.jms.MQConnection.createQM(MQConnection.java:2401)
at com.ibm.mq.jms.MQConnection.createQMXA(MQConnection.java:1783)
at com.ibm.mq.jms.MQQueueConnection.<init>(MQQueueConnection.java:110)
at com.ibm.mq.jms.MQQueueConnection.<init>(MQQueueConnection.java:67)
at com.ibm.mq.jms.MQXAQueueConnection.<init>(MQXAQueueConnection.java:57)
at com.ibm.mq.jms.MQXAQueueConnectionFactory.createXAQueueConnection(MQXAQueueConnectionFactory.java:80)
EDIT: I have looked up the reason codes in the IBM documentation, which gives little help:
2195 (X'0893')
MQRC_UNEXPECTED_ERROR
Explanation:
The call was rejected because an unexpected error occurred.
Completion Code:
MQCC_FAILED
Programmer Response:
Check the application's parameter list to ensure, for example, that the correct number of parameters was passed, and that data pointers and storage keys are valid. If the problem cannot be resolved, contact your system programmer.
* On z/OS, check whether any information has been displayed on the console. If this error occurs on an MQCONN or MQCONNX call, check that the subsystem named is an active MQ subsystem. In particular, check that it is not a DB2(TM) subsystem. If the problem cannot be resolved, rerun the application with a CSQSNAP DD card (if you have not already got a dump) and send the resulting dump to IBM.
* On OS/2 and i5/OS, consult the FFST record to obtain more detail about the problem.
* On HP OpenVMS, Compaq NonStop Kernel, and UNIX systems, consult the FDC file to obtain more detail about the problem.
OK, I finally managed to get around this error. It had nothing to do with the MQ install itself. I knew it was some kind of network issue, so I changed the host name settings within WebSphere from the hostname to the IP of the box, and everything worked fine.
Note that I made the host name changes on the queues and on the queue connection factories.
Hope that helps someone.
Karl
