Connection issue on Cassandra database - Java

I have problems connecting Cassandra to Spark. I can connect to Cassandra via cqlsh, but when I launch my program:
public static void main(String[] args) {
    Cluster cluster;
    Session session;
    cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
    session = cluster.connect();
    SparkConf conf = new SparkConf().setAppName("CassandraExamples").setMaster("local[1]")
            .set("spark.cassandra.connection.host", "9.168.86.84");
    JavaSparkContext sc = new JavaSparkContext("spark://9.168.86.84:9042", "CassandraExample", conf);
    CassandraJavaPairRDD<String, String> rdd1 = javaFunctions(sc).cassandraTable("keyspace", "table",
            mapColumnTo(String.class), mapColumnTo(String.class)).select("row1", "row2");
    System.out.println("Data fetched: \n" + StringUtils.join(rdd1.toArray(), "\n"));
}
I'm getting this error:
15/06/11 11:41:15 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@9.168.86.84:9042]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: no further information: /9.168.86.84:9042
15/06/11 11:41:34 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@9.168.86.84:9042/user/Master...
15/06/11 11:41:35 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@9.168.86.84:9042: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster@9.168.86.84:9042
15/06/11 11:41:35 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@9.168.86.84:9042]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: no further information: /9.168.86.84:9042
15/06/11 11:41:54 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@9.168.86.84:9042/user/Master...
15/06/11 11:41:55 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@9.168.86.84:9042: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster@9.168.86.84:9042
15/06/11 11:41:55 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@9.168.86.84:9042]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: no further information: /9.168.86.84:9042
15/06/11 11:42:14 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
15/06/11 11:42:14 WARN SparkDeploySchedulerBackend: Application ID is not initialized yet.
15/06/11 11:42:14 ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: All masters are unresponsive! Giving up.
cassandra.yaml has these properties:
listen_address: 9.168.86.84
start_native_transport: true
rpc_address: 0.0.0.0
native_transport_port: 9042
rpc_port: 9160
Can someone tell me what's wrong?
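For reference, a hedged sketch of how this setup would normally look: in standalone mode the Spark master listens on its own port (7077 by default), while 9042 in the logs above is Cassandra's native transport port, so the master URL should not reuse it. Host and port values below are placeholders taken from the question, not confirmed settings.

// Minimal sketch, assuming a standalone Spark master on its default port 7077.
// 9042 is Cassandra's native transport port and belongs only in the connector
// setting, not in the Spark master URL.
SparkConf conf = new SparkConf()
        .setAppName("CassandraExamples")
        .setMaster("spark://9.168.86.84:7077")                   // Spark master URL
        .set("spark.cassandra.connection.host", "9.168.86.84");  // Cassandra contact point
JavaSparkContext sc = new JavaSparkContext(conf);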

Related

Problem creating EMS application supporting Failover/FaultTolerance

I am starting to study how I can implement an application supporting Failover/FaultTolerance on top of JMS, more precisely EMS.
I configured two EMS servers, both with FaultTolerance enabled:
For EMS running on server1 I have
in tibemsd.conf
ft_active = tcp://server2:7232
in factories.conf
[GenericConnectionFactory]
type = generic
url = tcp://server1:7232
[FTTopicConnectionFactory]
type = topic
url = tcp://server1:7232,tcp://server2:7232
[FTQueueConnectionFactory]
type = queue
url = tcp://server1:7232,tcp://server2:7232
And for EMS running on server2 I have
in tibemsd.conf
ft_active = tcp://server1:7232
in factories.conf
[GenericConnectionFactory]
type = generic
url = tcp://server2:7232
[FTTopicConnectionFactory]
type = topic
url = tcp://server2:7232,tcp://server1:7232
[FTQueueConnectionFactory]
type = queue
url = tcp://server2:7232,tcp://server1:7232
I am not a TIBCO EMS expert, but my config seems to be good. When I start EMS on server1 I get:
$ tibemsd -config tibemsd.conf
...
2022-07-20 23:04:58.566 Server is active.
2022-07-20 23:05:18.563 Standby server 'SERVERNAME@server1' has connected.
Then if I start EMS on server2, I get:
$ tibemsd -config tibemsd.conf
...
2022-07-20 23:05:18.564 Accepting connections on tcp://server2:7232.
2022-07-20 23:05:18.564 Server is in standby state for 'tcp://server1:7232'
Moreover, if I kill the active EMS on server1, I immediately get the following message on server2:
2022-07-20 23:21:52.891 Connection to active server 'tcp://server1:7232' has been lost.
2022-07-20 23:21:52.891 Server activating on failure of 'tcp://server1:7232'.
...
2022-07-20 23:21:52.924 Server is now active.
Up to this point everything looks OK; the active/standby EMS servers seem to be correctly configured.
Things get more complicated when I write a piece of code that is supposed to connect to these EMS servers and periodically publish messages. Let's try the following code sample:
@Test
public void testEmsFailover() throws JMSException, InterruptedException {
    int NB = 1000;
    TibjmsConnectionFactory factory = new TibjmsConnectionFactory();
    factory.setServerUrl("tcp://server1:7232,tcp://server2:7232");
    Connection connection = factory.createConnection();
    Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
    connection.start();
    for (int i = 0; i < NB; i++) {
        LOG.info("sending message");
        Queue queue = session.createQueue(QUEUE__CLIENT_TO_FRONTDOOR__CONNECTION_REQUEST);
        MessageProducer producer = session.createProducer(queue);
        MapMessage mapMessage = session.createMapMessage();
        mapMessage.setStringProperty(PROPERTY__CLIENT_KIND, USER.toString());
        mapMessage.setStringProperty(PROPERTY__CLIENT_NAME, "name");
        producer.send(mapMessage);
        LOG.info("done!");
        Thread.sleep(1000);
    }
}
If I run this code while both the active and standby servers are up, everything looks good:
23:26:32.431 [main] INFO JmsEndpointTest - sending message
23:26:32.458 [main] INFO JmsEndpointTest - done!
23:26:33.458 [main] INFO JmsEndpointTest - sending message
23:26:33.482 [main] INFO JmsEndpointTest - done!
Now if I kill the active EMS server, I would expect that:
the standby server would instantly become the active one, and
my code would continue to publish as if nothing had happened.
However, in my code I get the following error:
javax.jms.JMSException: Connection is closed
at com.tibco.tibjms.TibjmsxLink.sendRequest(TibjmsxLink.java:307)
at com.tibco.tibjms.TibjmsxLink.sendRequestMsg(TibjmsxLink.java:261)
at com.tibco.tibjms.TibjmsxSessionImp._createProducer(TibjmsxSessionImp.java:1004)
at com.tibco.tibjms.TibjmsxSessionImp.createProducer(TibjmsxSessionImp.java:4854)
at JmsEndpointTest.testEmsFailover(JmsEndpointTest.java:103)
...
and in the logs of the server (the previous standby server, now supposed to be the active one) I get:
2022-07-20 23:32:44.447 [anonymous@cersei]: connect failed: server not in active state
2022-07-20 23:33:02.969 Connection to active server 'tcp://server2:7232' has been lost.
2022-07-20 23:33:02.969 Server activating on failure of 'tcp://server2:7232'.
2022-07-20 23:33:02.969 Server rereading configuration.
2022-07-20 23:33:02.971 Recovering state, please wait.
2022-07-20 23:33:02.980 Recovered 46 messages.
2022-07-20 23:33:02.980 Server is now active.
2022-07-20 23:33:03.545 [anonymous@cersei]: reconnect failed: connection unknown for id=8
2022-07-20 23:33:04.187 [anonymous@cersei]: reconnect failed: connection unknown for id=8
2022-07-20 23:33:04.855 [anonymous@cersei]: reconnect failed: connection unknown for id=8
2022-07-20 23:33:05.531 [anonymous@cersei]: reconnect failed: connection unknown for id=8
I would appreciate any help to enhance my code. Thank you.
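One knob worth checking in the test above is the client's reconnection window, which is configured on the connection factory. A hedged sketch, assuming the TibjmsConnectionFactory setters named below exist in your EMS client version (worth verifying against its javadoc):

// Sketch: widen the reconnection window so a fault-tolerant switch has time
// to complete. Method names assume the TIBCO EMS Java client API; confirm
// them for your EMS version before relying on this.
TibjmsConnectionFactory factory = new TibjmsConnectionFactory();
factory.setServerUrl("tcp://server1:7232,tcp://server2:7232");
factory.setReconnAttemptCount(10);    // number of reconnect attempts
factory.setReconnAttemptDelay(1000);  // delay between attempts, in ms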
I think I found the origin of my problem:
according to the page Tibco-Ems Failover Issue, the error message
reconnect failed: connection unknown for id=8
means: "the store (EMS db) wasn't shared between the active and the standby node, so when the active EMS failed, the new active EMS wasn't able to recover connections and messages."
I realized that it is painful to configure a shared store. To avoid it, I configured two tibemsd instances on the same host, following the page Step By Step How to Setup TIBCO EMS In Fault Tolerant Mode:
two tibemsd.conf configuration files
configure a different listen port in each file
configure ft_active with the url of the other server
configure factories.conf
By doing so, I can replay my test and it works as expected.
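For illustration, a hedged sketch of that same-host layout; the file names and port numbers are assumptions, and the parameter spelling should be checked against the EMS documentation:

# tibemsd-1.conf (hypothetical file name; ports are placeholders)
listen    = tcp://7222
ft_active = tcp://localhost:7224

# tibemsd-2.conf
listen    = tcp://7224
ft_active = tcp://localhost:7222

Each factories.conf would then list both URLs, e.g. url = tcp://localhost:7222,tcp://localhost:7224, so clients can fail over between the two instances.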

Unable to set up a connection between Kafka (Hortonworks sandbox) and IntelliJ IDEA (local Windows system)

Here is the exception:
exception: java.nio.channels.ClosedChannelException
Whole logs in the console:
[main] INFO kafka.utils.Log4jControllerRegistration$ - Registered kafka:type=kafka.Log4jController MBean
[main] INFO kafka.utils.VerifiableProperties - Verifying properties
[main] INFO kafka.utils.VerifiableProperties - Property metadata.broker.list is overridden to xxx.xxx.xxx.xxx:6667
[main] INFO kafka.utils.VerifiableProperties - Property request.required.acks is overridden to 1
[main] INFO kafka.utils.VerifiableProperties - Property serializer.class is overridden to kafka.serializer.StringEncoder
[Thread-0] INFO kafka.client.ClientUtils$ - Fetching metadata from broker BrokerEndPoint(0,xxx.xxx.xxx.xxx,6667) with correlation id 0 for 1 topic(s) Set(test)
[Thread-0] INFO kafka.producer.SyncProducer - Connected to xxx.xxx.xxx.xxx:6667 for producing
[Thread-0] INFO kafka.producer.SyncProducer - Disconnecting from xxx.xxx.xxx.xxx:6667
[Thread-0] WARN kafka.client.ClientUtils$ - Fetching topic metadata with correlation id 0 for topics [Set(test)] from broker [BrokerEndPoint(0,xxx.xxx.xxx.xxx,6667)] failed
java.nio.channels.ClosedChannelException
I have found some answers online telling me that I should set advertised.host.name in server.properties, but I don't know which IP to set advertised.host.name to.
I am totally lost in this situation. Here is the Java code I wrote in IntelliJ; I just want to know which hostname I should put in BROKER_LIST to make a connection between the Hortonworks sandbox and the local machine.
public class KafkaProperties {
    public static final String ZK = "127.0.0.1:2181";
    public static final String TOPIC = "test";
    public static final String BROKER_LIST = "xx.xxx.xxx.xxx:6667";
}
server.properties:
# Generated by Apache Ambari. Sun Mar 1 19:04:58 2020
auto.create.topics.enable=true
auto.leader.rebalance.enable=true
compression.type=producer
controlled.shutdown.enable=true
controlled.shutdown.max.retries=3
controlled.shutdown.retry.backoff.ms=5000
controller.message.queue.size=10
controller.socket.timeout.ms=30000
default.replication.factor=1
delete.topic.enable=true
external.kafka.metrics.exclude.prefix=kafka.network.RequestMetrics,kafka.server.DelayedOperationPurgatory,kafka.server.BrokerTopicMetrics.BytesRejectedPerSec
external.kafka.metrics.include.prefix=kafka.network.RequestMetrics.ResponseQueueTimeMs.request.OffsetCommit.98percentile,kafka.network.RequestMetrics.ResponseQueueTimeMs.request.Offsets.95percentile,kafka.network.RequestMetrics.ResponseSendTimeMs.request.Fetch.95percentile,kafka.network.RequestMetrics.RequestsPerSec.request
fetch.purgatory.purge.interval.requests=10000
kafka.ganglia.metrics.group=kafka
kafka.ganglia.metrics.host=localhost
kafka.ganglia.metrics.port=8671
kafka.ganglia.metrics.reporter.enabled=true
kafka.metrics.reporters=
kafka.timeline.metrics.host_in_memory_aggregation=
kafka.timeline.metrics.host_in_memory_aggregation_port=
kafka.timeline.metrics.host_in_memory_aggregation_protocol=
kafka.timeline.metrics.hosts=
kafka.timeline.metrics.maxRowCacheSize=10000
kafka.timeline.metrics.port=
kafka.timeline.metrics.protocol=
kafka.timeline.metrics.reporter.enabled=true
kafka.timeline.metrics.reporter.sendInterval=5900
kafka.timeline.metrics.truststore.password=
kafka.timeline.metrics.truststore.path=
kafka.timeline.metrics.truststore.type=
leader.imbalance.check.interval.seconds=300
leader.imbalance.per.broker.percentage=10
listeners=PLAINTEXT://sandbox-hdp.hortonworks.com:6667
log.cleanup.interval.mins=10
log.dirs=/kafka-logs
log.index.interval.bytes=4096
log.index.size.max.bytes=10485760
log.retention.bytes=-1
log.retention.check.interval.ms=600000
log.retention.hours=168
log.roll.hours=168
log.segment.bytes=1073741824
message.max.bytes=1000000
min.insync.replicas=1
num.io.threads=8
num.network.threads=3
num.partitions=1
num.recovery.threads.per.data.dir=1
num.replica.fetchers=1
offset.metadata.max.bytes=4096
offsets.commit.required.acks=-1
offsets.commit.timeout.ms=5000
offsets.load.buffer.size=5242880
offsets.retention.check.interval.ms=600000
offsets.retention.minutes=86400000
offsets.topic.compression.codec=0
offsets.topic.num.partitions=50
offsets.topic.replication.factor=1
offsets.topic.segment.bytes=104857600
port=6667
producer.metrics.enable=false
producer.purgatory.purge.interval.requests=10000
queued.max.requests=500
replica.fetch.max.bytes=1048576
replica.fetch.min.bytes=1
replica.fetch.wait.max.ms=500
replica.high.watermark.checkpoint.interval.ms=5000
replica.lag.max.messages=4000
replica.lag.time.max.ms=10000
replica.socket.receive.buffer.bytes=65536
replica.socket.timeout.ms=30000
sasl.enabled.mechanisms=GSSAPI
sasl.mechanism.inter.broker.protocol=GSSAPI
security.inter.broker.protocol=PLAINTEXT
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
socket.send.buffer.bytes=102400
ssl.client.auth=none
ssl.key.password=
ssl.keystore.location=
ssl.keystore.password=
ssl.truststore.location=
ssl.truststore.password=
zookeeper.connect=sandbox-hdp.hortonworks.com:2181
zookeeper.connection.timeout.ms=25000
zookeeper.session.timeout.ms=30000
zookeeper.sync.time.ms=2000
Now the code in IntelliJ:
public class KafkaProperties {
    public static final String ZK = "sandbox-hdp.hortonworks.com:2181";
    public static final String TOPIC = "yanzhao";
    public static final String BROKER_LIST = "sandbox-hdp.hortonworks.com:6667";
}
/etc/hosts file:
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.18.0.2 sandbox-hdp.hortonworks.com sandbox-hdp
A part of the jps command output:
7846 QuorumPeerMain /usr/hdp/current/zookeeper-server/conf/zoo.cfg
363 AmbariServer
6444 JournalNode
25581 Kafka /usr/hdp/3.0.1.0-187/kafka/config/server.properties
Connection:
[root@sandbox-hdp ~]# netstat -lpn | grep 6667
tcp 0 0 172.18.0.2:6667 0.0.0.0:* LISTEN 25581/java
The command I ran in the sandbox (to set up the consumer):
kafka-console-consumer.sh --bootstrap-server sandbox-hdp.hortonworks.com:6667 --topic yanzhao
Exception logs:
[main] INFO kafka.utils.Log4jControllerRegistration$ - Registered kafka:type=kafka.Log4jController MBean
[main] INFO kafka.utils.VerifiableProperties - Verifying properties
[main] INFO kafka.utils.VerifiableProperties - Property metadata.broker.list is overridden to sandbox-hdp.hortonworks.com:6667
[main] INFO kafka.utils.VerifiableProperties - Property request.required.acks is overridden to 1
[main] INFO kafka.utils.VerifiableProperties - Property serializer.class is overridden to kafka.serializer.StringEncoder
[Thread-0] INFO kafka.client.ClientUtils$ - Fetching metadata from broker BrokerEndPoint(0,sandbox-hdp.hortonworks.com,6667) with correlation id 0 for 1 topic(s) Set(yanzhao)
[Thread-0] INFO kafka.producer.SyncProducer - Connected to sandbox-hdp.hortonworks.com:6667 for producing
[Thread-0] INFO kafka.producer.SyncProducer - Disconnecting from sandbox-hdp.hortonworks.com:6667
[Thread-0] WARN kafka.client.ClientUtils$ - Fetching topic metadata with correlation id 0 for topics [Set(yanzhao)] from broker [BrokerEndPoint(0,sandbox-hdp.hortonworks.com,6667)] failed
java.nio.channels.ClosedChannelException
at kafka.network.BlockingChannel.send(BlockingChannel.scala:112)
at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:80)
at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:79)
at kafka.producer.SyncProducer.send(SyncProducer.scala:124)
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:63)
at kafka.producer.BrokerPartitionInfo.updateInfo(BrokerPartitionInfo.scala:83)
at kafka.producer.async.DefaultEventHandler$$anonfun$handle$1.apply$mcV$sp(DefaultEventHandler.scala:76)
at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:85)
at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:76)
at kafka.producer.Producer.send(Producer.scala:78)
at kafka.javaapi.producer.Producer.send(Producer.scala:35)
at com.yanzhao.spark.kafka.KafkaProducer.run(KafkaProducer.java:32)
[Thread-0] INFO kafka.producer.SyncProducer - Disconnecting from sandbox-hdp.hortonworks.com:6667
[Thread-0] ERROR kafka.utils.CoreUtils$ - fetching topic metadata for topics [Set(yanzhao)] from broker [ArrayBuffer(BrokerEndPoint(0,sandbox-hdp.hortonworks.com,6667))] failed
kafka.common.KafkaException: fetching topic metadata for topics [Set(yanzhao)] from broker [ArrayBuffer(BrokerEndPoint(0,sandbox-hdp.hortonworks.com,6667))] failed
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:77)
at kafka.producer.BrokerPartitionInfo.updateInfo(BrokerPartitionInfo.scala:83)
at kafka.producer.async.DefaultEventHandler$$anonfun$handle$1.apply$mcV$sp(DefaultEventHandler.scala:76)
at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:85)
at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:76)
at kafka.producer.Producer.send(Producer.scala:78)
at kafka.javaapi.producer.Producer.send(Producer.scala:35)
at com.yanzhao.spark.kafka.KafkaProducer.run(KafkaProducer.java:32)
Caused by: java.nio.channels.ClosedChannelException
at kafka.network.BlockingChannel.send(BlockingChannel.scala:112)
at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:80)
at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:79)
at kafka.producer.SyncProducer.send(SyncProducer.scala:124)
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:63)
... 7 more
You need to use the hostname of the machine on which your Kafka broker is running (and obviously not the IP of the machine the client is running on).
Now your client needs to use the address the Kafka broker publishes to the public. This address is configured through advertised.listeners:
Listeners to publish to ZooKeeper for clients to use, if different than the listeners config property. In IaaS environments, this may need to be different from the interface to which the broker binds. If this is not set, the value for listeners will be used. Unlike listeners it is not valid to advertise the 0.0.0.0 meta-address.
Therefore, you should use this address. In case advertised.listeners is not configured in server.properties, you can probably still use the listeners address.
On a final note, I can see that you have used "127.0.0.1:2181" as the ZooKeeper address. Likewise, you need to use the hostname of the machine where ZooKeeper is running.
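In practical terms, since listeners is set to sandbox-hdp.hortonworks.com:6667 here, the Windows client must be able to resolve that name. A hedged sketch of the client-side hosts entry (the IP is a placeholder; use whatever address actually lets your Windows machine reach the sandbox):

# C:\Windows\System32\drivers\etc\hosts on the Windows client
# (placeholder IP: substitute the address through which the sandbox is reachable)
192.168.56.101  sandbox-hdp.hortonworks.com  sandbox-hdp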

HBase RegionServer abort

When I start HBase on my cluster, the HMaster and HQuorumPeer processes start on the master node, while only the HQuorumPeer process starts on the slaves.
In the GUI console, in the task section, I can see the master (node0) in the RUNNING state with the status "Waiting for region servers count to settle; currently checked in 0, slept for 250920 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms".
In the software attributes section I can find all my nodes in the zookeeper quorum with the description "Addresses of all registered ZK servers".
So it seems that ZooKeeper is working, but the log files suggest it is the problem.
Log hbase-clusterhadoop-master:
2016-09-08 12:26:14,875 INFO [main-SendThread(node0:2181)] zookeeper.ClientCnxn: Opening socket connection to server node0/192.168.1.113:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Impossibile trovare una configurazione di login)
2016-09-08 12:26:14,882 WARN [main-SendThread(node0:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2016-09-08 12:26:14,994 WARN [main] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=node3:2181,node2:2181,node1:2181,node0:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
........
2016-09-08 12:32:53,063 INFO [master:node0:60000] zookeeper.ZooKeeper: Initiating client connection, connectString=node3:2181,node2:2181,node1:2181,node0:2181 sessionTimeout=90000 watcher=replicationLogCleaner0x0, quorum=node3:2181,node2:2181,node1:2181,node0:2181, baseZNode=/hbase
2016-09-08 12:32:53,064 INFO [master:node0:60000-SendThread(node3:2181)] zookeeper.ClientCnxn: Opening socket connection to server node3/192.168.1.112:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Impossibile trovare una configurazione di login)
2016-09-08 12:32:53,065 INFO [master:node0:60000-SendThread(node3:2181)] zookeeper.ClientCnxn: Socket connection established to node3/192.168.1.112:2181, initiating session
2016-09-08 12:32:53,069 INFO [master:node0:60000-SendThread(node3:2181)] zookeeper.ClientCnxn: Session establishment complete on server node3/192.168.1.112:2181, sessionid = 0x357095a4b940001, negotiated timeout = 90000
2016-09-08 12:32:53,072 INFO [master:node0:60000] zookeeper.RecoverableZooKeeper: Node /hbase/replication/rs already exists and this is not a retry
2016-09-08 12:32:53,072 DEBUG [master:node0:60000] cleaner.CleanerChore: initialize cleaner=org.apache.hadoop.hbase.replication.master.ReplicationLogCleaner
2016-09-08 12:32:53,075 DEBUG [master:node0:60000] cleaner.CleanerChore: initialize cleaner=org.apache.hadoop.hbase.master.snapshot.SnapshotLogCleaner
2016-09-08 12:32:53,076 DEBUG [master:node0:60000] cleaner.CleanerChore: initialize cleaner=org.apache.hadoop.hbase.master.cleaner.HFileLinkCleaner
2016-09-08 12:32:53,077 DEBUG [master:node0:60000] cleaner.CleanerChore: initialize cleaner=org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner
2016-09-08 12:32:53,078 DEBUG [master:node0:60000] cleaner.CleanerChore: initialize cleaner=org.apache.hadoop.hbase.master.cleaner.TimeToLiveHFileCleaner
2016-09-08 12:32:53,078 INFO [master:node0:60000] master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 0 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2016-09-08 12:32:54,607 INFO [master:node0:60000] master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 1529 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2016-09-08 12:32:56,137 INFO [master:node0:60000] master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 3059 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
Log hbase-clusterhadoop-zookeeper-node0 (master):
2016-09-08 12:26:18,315 WARN [WorkerSender[myid=0]] quorum.QuorumCnxManager: Cannot open channel to 1 at election address node1/192.168.1.156:3888
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:382)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:241)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:228)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:431)
at java.net.Socket.connect(Socket.java:527)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
at java.lang.Thread.run(Thread.java:695)
Log hbase-clusterhadoop-regionserver-node1 (one of the slaves):
2016-09-08 12:33:32,690 INFO [regionserver60020-SendThread(node3:2181)] zookeeper.ClientCnxn: Opening socket connection to server node3/192.168.1.112:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Impossibile trovare una configurazione di login)
2016-09-08 12:33:32,691 INFO [regionserver60020-SendThread(node3:2181)] zookeeper.ClientCnxn: Socket connection established to node3/192.168.1.112:2181, initiating session
2016-09-08 12:33:32,692 INFO [regionserver60020-SendThread(node3:2181)] zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2016-09-08 12:33:32,793 WARN [regionserver60020] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=node3:2181,node2:2181,node1:2181,node0:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
2016-09-08 12:33:32,794 ERROR [regionserver60020] zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 4 attempts
2016-09-08 12:33:32,794 WARN [regionserver60020] zookeeper.ZKUtil: regionserver:600200x0, quorum=node3:2181,node2:2181,node1:2181,node0:2181, baseZNode=/hbase Unable to set watcher on znode /hbase/master
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:222)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:427)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:77)
at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:778)
at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:751)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:884)
at java.lang.Thread.run(Thread.java:695)
2016-09-08 12:33:32,794 ERROR [regionserver60020] zookeeper.ZooKeeperWatcher: regionserver:600200x0, quorum=node3:2181,node2:2181,node1:2181,node0:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:222)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:427)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:77)
at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:778)
at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:751)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:884)
at java.lang.Thread.run(Thread.java:695)
2016-09-08 12:33:32,795 FATAL [regionserver60020] regionserver.HRegionServer: ABORTING region server node1,60020,1473330794709: Unexpected exception during initialization, aborting
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:222)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:427)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:77)
at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:778)
at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:751)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:884)
at java.lang.Thread.run(Thread.java:695)
2016-09-08 12:33:32,798 FATAL [regionserver60020] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: []
2016-09-08 12:33:32,798 INFO [regionserver60020] regionserver.HRegionServer: STOPPED: Unexpected exception during initialization, aborting
2016-09-08 12:33:32,867 INFO [regionserver60020-SendThread(node0:2181)] zookeeper.ClientCnxn: Opening socket connection to server node0/192.168.1.113:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Impossibile trovare una configurazione di login)
Log hbase-clusterhadoop-zookeeper-node1:
2016-09-08 12:33:32,075 WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0%0:2181] quorum.Learner: Unexpected exception, tries=0, connecting to node3/192.168.1.112:2888
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:382)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:241)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:228)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:431)
at java.net.Socket.connect(Socket.java:527)
at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:225)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:71)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
2016-09-08 12:33:32,227 INFO [node1/192.168.1.156:3888] quorum.QuorumCnxManager: Received connection request /192.168.1.113:49844
2016-09-08 12:33:32,233 INFO [WorkerReceiver[myid=1]] quorum.FastLeaderElection: Notification: 1 (message format version), 0 (n.leader), 0x10000002d (n.zxid), 0x1 (n.round), LOOKING (n.state), 0 (n.sid), 0x1 (n.peerEpoch) FOLLOWING (my state)
2016-09-08 12:33:32,239 INFO [WorkerReceiver[myid=1]] quorum.FastLeaderElection: Notification: 1 (message format version), 3 (n.leader), 0x10000002d (n.zxid), 0x1 (n.round), LOOKING (n.state), 0 (n.sid), 0x1 (n.peerEpoch) FOLLOWING (my state)
2016-09-08 12:33:32,725 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /192.168.1.111:49534
2016-09-08 12:33:32,725 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2016-09-08 12:33:32,725 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: Closed socket connection for client /192.168.1.111:49534 (no session established for client)
The conf file hbase-site.xml:
<configuration>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://node0:9000/hbase</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>node0,node1,node2,node3</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/Users/clusterhadoop/usr/local/zookeeper</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/Users/clusterhadoop/usr/local/hbtmp</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>hbase.master</name>
<value>node0:60000</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.zookeeper.property.maxClientCnxns</name>
<value>1000</value>
</property>
</configuration>
Hosts file:
127.0.0.1 localhost
127.0.0.1 node3
192.168.1.112 node3
192.168.1.156 node1
192.168.1.111 node2
192.168.1.113 node0
Any idea what the problem is and how to solve it?
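One hedged observation on the hosts file above, offered as a sketch rather than a confirmed diagnosis: the first matching entry wins, so the line "127.0.0.1 node3" makes node3 resolve its own hostname to loopback, which is a common cause of "ZooKeeperServer not running" and connection-refused errors within a quorum. A hosts file without that loopback alias would look like:

127.0.0.1      localhost
192.168.1.113  node0
192.168.1.156  node1
192.168.1.111  node2
192.168.1.112  node3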

Hazelcast: connecting to remote cluster

We have a cluster of Hazelcast nodes all running on one remote system (a single physical system with many nodes). We would like to connect to this cluster from an external client: a Java application which uses code as below to connect to Hazelcast:
ClientConfig clientConfig = new ClientConfig();
clientConfig.addAddress(config.getHost() + ":" + config.getPort());
client = HazelcastClient.newHazelcastClient(clientConfig);
where host is the IP of the remote system and port is 5701.
This still connects to the local host (127.0.0.1). What am I missing?
Edit:
If the Java client is the only Hazelcast app running on the local system, it fails to connect and throws the exception: java.lang.IllegalStateException: Cannot get initial partitions!
From the logs:
14:58:26.717 [main] INFO c.m.b.p.s.s.HazelcastCacheClient - creating new Hazelcast instance
14:58:26.748 [main] INFO com.hazelcast.core.LifecycleService - HazelcastClient[hz.client_0_dev][3.2.1] is STARTING
14:58:27.029 [main] INFO com.hazelcast.core.LifecycleService - HazelcastClient[hz.client_0_dev][3.2.1] is STARTED
14:58:27.061 [hz.client_0_dev.cluster-listener] INFO com.hazelcast.core.LifecycleService - HazelcastClient[hz.client_0_dev][3.2.1] is CLIENT_CONNECTED
14:58:27.061 [hz.client_0_dev.cluster-listener] INFO c.h.client.spi.ClientClusterService -
Members [5] {
Member [127.0.0.1]:5701
Member [127.0.0.1]:5702
Member [127.0.0.1]:5703
Member [127.0.0.1]:5704
Member [127.0.0.1]:5705
}
14:58:47.278 [main] ERROR c.h.c.spi.ClientPartitionService - Error while fetching cluster partition table!
com.hazelcast.spi.exception.RetryableIOException: java.util.concurrent.ExecutionException: com.hazelcast.core.HazelcastException: java.net.ConnectException: Connection refused: no further information
...
Caused by: java.util.concurrent.ExecutionException: com.hazelcast.core.HazelcastException: java.net.ConnectException: Connection refused: no further information
at java.util.concurrent.FutureTask.report(Unknown Source) ~[na:1.8.0_31]
at java.util.concurrent.FutureTask.get(Unknown Source) ~[na:1.8.0_31]
at com.hazelcast.client.connection.nio.ClientConnectionManagerImpl.getOrConnect(ClientConnectionManagerImpl.java:282) ~[BRBASE-service-manager-1.0.0-jar-with-dependencies.jar:na]
... 14 common frames omitted
Caused by: com.hazelcast.core.HazelcastException: java.net.ConnectException: Connection refused: no further information
at com.hazelcast.util.ExceptionUtil.rethrow(ExceptionUtil.java:45) ~[BRBASE-service-manager-1.0.0-jar-with-dependencies.jar:na]
...
To connect to the remote cluster, make sure the cluster uses the external IP and not 127.0.0.1. In our case we have a single physical system, with multiple nodes, with tcp-ip mode enabled. The hazelcast.xml has the configuration:
<tcp-ip enabled="true">
<!-- This should be external IP -->
<interface>172.x.x.x</interface>
</tcp-ip>
Can you try:
ClientConfig config = new ClientConfig();
config.getNetworkConfig().addAddress(host + ":" + port);
HazelcastInstance instance = HazelcastClient.newHazelcastClient(config);
If you want to connect to multiple IPs running Hazelcast as a cluster, add the below to your client config and then instantiate the client.
//configure client properties
ClientConfig config = new ClientConfig();
String[] addresses = {"172.20.250.118" + ":" + "5701","172.20.250.49" + ":" + "5701"};
config.getNetworkConfig().addAddress(addresses);
//start Hazelcast client
HazelcastInstance hazelcastInstance = HazelcastClient.newHazelcastClient(config);

Cannot programmatically submit Spark application (with Cassandra connector) to cluster from remote client

I'm running a standalone Spark cluster on EC2, and I'm writing an application using the Spark-Cassandra connector driver, trying to submit the job to the Spark cluster programmatically.
The job itself is simple:
public static void main(String[] args) {
    SparkConf conf;
    JavaSparkContext sc;
    conf = new SparkConf()
            .set("spark.cassandra.connection.host", host);
    conf.set("spark.driver.host", "[my_public_ip]");
    conf.set("spark.driver.port", "15000");
    sc = new JavaSparkContext("spark://[spark_master_host]", "test", conf);
    CassandraJavaRDD<CassandraRow> rdd = javaFunctions(sc).cassandraTable(
            "keyspace", "table");
    System.out.println(rdd.first().toString());
    sc.stop();
}
This runs fine when I run it on the Spark master node of my EC2 cluster.
I'm trying to run this from a remote Windows client.
The problem comes from these two lines:
conf.set("spark.driver.host", "[my_public_ip]");
conf.set("spark.driver.port", "15000");
First, if I comment out these two lines, the application does not throw an exception, but the executor does not run, with the following log:
14/12/06 22:40:03 INFO client.AppClient$ClientActor: Executor updated: app-20141207033931-0021/3 is now LOADING
14/12/06 22:40:03 INFO client.AppClient$ClientActor: Executor updated: app-20141207033931-0021/0 is now EXITED (Command exited with code 1)
14/12/06 22:40:03 INFO cluster.SparkDeploySchedulerBackend: Executor app-20141207033931-0021/0 removed: Command exited with code 1
This never ends. When I check the worker node log, I find:
14/12/06 22:40:21 ERROR security.UserGroupInformation: PriviledgedActionException as:[username] cause:java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException: Unknown exception in doAs
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1134)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:113)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:156)
at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.security.PrivilegedActionException: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
... 4 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:125)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52)
... 7 more
I've no idea what that's about; my guess is that the worker node could not connect to the driver, which was initially set up as:
14/12/06 22:39:30 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@[some_host_name]:52660]
14/12/06 22:39:30 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@[some_host_name]:52660]
Obviously, no DNS is going to resolve my host name...
Since I can't set the deploy mode to "client" or "cluster" except via the ./spark-submit script (which I think is absurd...), I tried adding a host resolution "XX.XXX.XXX.XX [host-name]" to /etc/hosts on all Spark master and worker nodes.
No luck of course...
That leads me to the second attempt: uncommenting those two lines, which gives me:
14/12/06 22:59:41 INFO Remoting: Starting remoting
14/12/06 22:59:41 ERROR Remoting: Remoting error: [Startup failed] [
akka.remote.RemoteTransportException: Startup failed
at akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:129)
at akka.remote.Remoting.start(Remoting.scala:194)
...
Cause:
Caused by: org.jboss.netty.channel.ChannelException: Failed to bind to: /[my_public_ip]:15000
at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:391)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:388)
I double-checked my firewall and router settings and confirmed that my firewall is disabled; ran netstat -an to confirm port 15000 is not in use (in fact I tried changing to several available ports, no luck); and pinged my public IP both from another machine and from a machine in my cluster, no problem.
Now I'm utterly stuck; I've run out of ideas to fix this. Any suggestions? Any help is appreciated!
Please check if port 15000 is open in your security group.
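For reference, a hedged sketch of opening that port with the AWS CLI; the security group ID and client CIDR below are hypothetical placeholders, not values from the question:

# Allow inbound TCP 15000 from the remote client's address.
# sg-0123456789abcdef0 and 203.0.113.5/32 are placeholder values.
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 15000 \
    --cidr 203.0.113.5/32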
