Hazelcast: connecting to remote cluster - java

We have a cluster of Hazelcast nodes all running on one remote system (single physical system with many nodes). We would like to connect to this cluster from an external client - a Java application which uses code as below to connect to Hazelcast:
ClientConfig clientConfig = new ClientConfig();
clientConfig.addAddress(config.getHost() + ":" + config.getPort());
client = HazelcastClient.newHazelcastClient(clientConfig);
where host is the IP of the remote system and port is 5701.
This still connects to the local host (127.0.0.1). What am I missing?
Edit:
If the Java client is the only Hazelcast app running on the local system, it fails to connect and throws the exception: java.lang.IllegalStateException: Cannot get initial partitions!
From the logs:
14:58:26.717 [main] INFO c.m.b.p.s.s.HazelcastCacheClient - creating new Hazelcast instance
14:58:26.748 [main] INFO com.hazelcast.core.LifecycleService - HazelcastClient[hz.client_0_dev][3.2.1] is STARTING
14:58:27.029 [main] INFO com.hazelcast.core.LifecycleService - HazelcastClient[hz.client_0_dev][3.2.1] is STARTED
14:58:27.061 [hz.client_0_dev.cluster-listener] INFO com.hazelcast.core.LifecycleService - HazelcastClient[hz.client_0_dev][3.2.1] is CLIENT_CONNECTED
14:58:27.061 [hz.client_0_dev.cluster-listener] INFO c.h.client.spi.ClientClusterService - Members [5] {
    Member [127.0.0.1]:5701
    Member [127.0.0.1]:5702
    Member [127.0.0.1]:5703
    Member [127.0.0.1]:5704
    Member [127.0.0.1]:5705
}
14:58:47.278 [main] ERROR c.h.c.spi.ClientPartitionService - Error while fetching cluster partition table!
com.hazelcast.spi.exception.RetryableIOException: java.util.concurrent.ExecutionException: com.hazelcast.core.HazelcastException: java.net.ConnectException: Connection refused: no further information
...
Caused by: java.util.concurrent.ExecutionException: com.hazelcast.core.HazelcastException: java.net.ConnectException: Connection refused: no further information
    at java.util.concurrent.FutureTask.report(Unknown Source) ~[na:1.8.0_31]
    at java.util.concurrent.FutureTask.get(Unknown Source) ~[na:1.8.0_31]
    at com.hazelcast.client.connection.nio.ClientConnectionManagerImpl.getOrConnect(ClientConnectionManagerImpl.java:282) ~[BRBASE-service-manager-1.0.0-jar-with-dependencies.jar:na]
    ... 14 common frames omitted
Caused by: com.hazelcast.core.HazelcastException: java.net.ConnectException: Connection refused: no further information
    at com.hazelcast.util.ExceptionUtil.rethrow(ExceptionUtil.java:45) ~[BRBASE-service-manager-1.0.0-jar-with-dependencies.jar:na]
    ...

To connect to the remote cluster, make sure the cluster members bind to the external IP and not 127.0.0.1. In our case we have a single physical system running multiple nodes, with tcp-ip mode enabled. The hazelcast.xml has this configuration:
<tcp-ip enabled="true">
    <!-- This should be the external IP -->
    <interface>172.x.x.x</interface>
</tcp-ip>
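
If you configure the members programmatically instead of through hazelcast.xml, the equivalent setup looks roughly like this. This is a minimal sketch against the 3.x API; 172.x.x.x is the same placeholder as in the XML above:

import com.hazelcast.config.Config;
import com.hazelcast.config.JoinConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

// Sketch of the equivalent programmatic member config.
// "172.x.x.x" is a placeholder for the external IP, as in the XML above.
Config memberConfig = new Config();
JoinConfig join = memberConfig.getNetworkConfig().getJoin();
join.getMulticastConfig().setEnabled(false);  // disable multicast discovery
join.getTcpIpConfig()
    .setEnabled(true)
    .addMember("172.x.x.x");                  // external IP, not 127.0.0.1
HazelcastInstance member = Hazelcast.newHazelcastInstance(memberConfig);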

Can you try:
ClientConfig config = new ClientConfig();
config.getNetworkConfig().addAddress(host + ":" + port);
HazelcastInstance instance = HazelcastClient.newHazelcastClient(config);

If you want to connect to multiple IPs running Hazelcast as a cluster, add the following to your client config and then instantiate the client.
// configure client properties
ClientConfig config = new ClientConfig();
String[] addresses = {"172.20.250.118:5701", "172.20.250.49:5701"};
config.getNetworkConfig().addAddress(addresses);
// start Hazelcast client
HazelcastInstance hazelcastInstance = HazelcastClient.newHazelcastClient(config);

Related

Problem at creating EMS application supporting Failover/FaultTolerance

I am starting to study how I can implement an application supporting failover/fault tolerance on top of JMS, more precisely EMS.
I configured two EMS servers working both with FaultTolerance enabled:
For the EMS server running on server1 I have
in tibemsd.conf
ft_active = tcp://server2:7232
in factories.conf
[GenericConnectionFactory]
type = generic
url = tcp://server1:7232
[FTTopicConnectionFactory]
type = topic
url = tcp://server1:7232,tcp://server2:7232
[FTQueueConnectionFactory]
type = queue
url = tcp://server1:7232,tcp://server2:7232
And for the EMS server running on server2 I have
in tibemsd.conf
ft_active = tcp://server1:7232
in factories.conf
[GenericConnectionFactory]
type = generic
url = tcp://server2:7232
[FTTopicConnectionFactory]
type = topic
url = tcp://server2:7232,tcp://server1:7232
[FTQueueConnectionFactory]
type = queue
url = tcp://server2:7232,tcp://server1:7232
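
For reference, a JMS client would normally obtain these factories through EMS's JNDI provider rather than hard-coding URLs. A minimal sketch, assuming the standard EMS naming provider is enabled on the same listen ports (verify against your EMS setup):

import java.util.Properties;
import javax.jms.QueueConnectionFactory;
import javax.naming.Context;
import javax.naming.InitialContext;

// Sketch: look up the fault-tolerant queue factory defined in factories.conf.
// Assumes the EMS naming service is reachable on both listen ports.
Properties env = new Properties();
env.put(Context.INITIAL_CONTEXT_FACTORY, "com.tibco.tibjms.naming.TibjmsInitialContextFactory");
env.put(Context.PROVIDER_URL, "tibjmsnaming://server1:7232,tibjmsnaming://server2:7232");
Context ctx = new InitialContext(env);
QueueConnectionFactory factory = (QueueConnectionFactory) ctx.lookup("FTQueueConnectionFactory");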
I am not a TIBCO EMS expert, but my config seems good: when I start EMS on server1 I get
$ tibemsd -config tibemsd.conf
...
2022-07-20 23:04:58.566 Server is active.
2022-07-20 23:05:18.563 Standby server 'SERVERNAME@server1' has connected.
then if I start EMS on server2, I get
$ tibemsd -config tibemsd.conf
...
2022-07-20 23:05:18.564 Accepting connections on tcp://server2:7232.
2022-07-20 23:05:18.564 Server is in standby state for 'tcp://server1:7232'
Moreover, if I kill active EMS on server1, I immediately get the following message on server2:
2022-07-20 23:21:52.891 Connection to active server 'tcp://server1:7232' has been lost.
2022-07-20 23:21:52.891 Server activating on failure of 'tcp://server1:7232'.
...
2022-07-20 23:21:52.924 Server is now active.
Up to here, everything looks OK; the active/standby EMS servers seem to be correctly configured.
Things get more complicated when I write a piece of code that is supposed to connect to these EMS servers and periodically publish messages. Let's try with the following code sample:
@Test
public void testEmsFailover() throws JMSException, InterruptedException {
    int NB = 1000;
    TibjmsConnectionFactory factory = new TibjmsConnectionFactory();
    factory.setServerUrl("tcp://server1:7232,tcp://server2:7232");
    Connection connection = factory.createConnection();
    Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
    connection.start();
    for (int i = 0; i < NB; i++) {
        LOG.info("sending message");
        Queue queue = session.createQueue(QUEUE__CLIENT_TO_FRONTDOOR__CONNECTION_REQUEST);
        MessageProducer producer = session.createProducer(queue);
        MapMessage mapMessage = session.createMapMessage();
        mapMessage.setStringProperty(PROPERTY__CLIENT_KIND, USER.toString());
        mapMessage.setStringProperty(PROPERTY__CLIENT_NAME, "name");
        producer.send(mapMessage);
        LOG.info("done!");
        Thread.sleep(1000);
    }
}
If I run this code while both active and standby servers are up, everything looks good
23:26:32.431 [main] INFO JmsEndpointTest - sending message
23:26:32.458 [main] INFO JmsEndpointTest - done!
23:26:33.458 [main] INFO JmsEndpointTest - sending message
23:26:33.482 [main] INFO JmsEndpointTest - done!
Now if I kill the active EMS server, I would expect that:
the standby server would instantly become the active one
my code would continue to publish as if nothing had happened
However, in my code I get the following error:
javax.jms.JMSException: Connection is closed
at com.tibco.tibjms.TibjmsxLink.sendRequest(TibjmsxLink.java:307)
at com.tibco.tibjms.TibjmsxLink.sendRequestMsg(TibjmsxLink.java:261)
at com.tibco.tibjms.TibjmsxSessionImp._createProducer(TibjmsxSessionImp.java:1004)
at com.tibco.tibjms.TibjmsxSessionImp.createProducer(TibjmsxSessionImp.java:4854)
at JmsEndpointTest.testEmsFailover(JmsEndpointTest.java:103)
...
and in the logs of the server (the previous standby, now supposed to be the active one) I get
2022-07-20 23:32:44.447 [anonymous@cersei]: connect failed: server not in active state
2022-07-20 23:33:02.969 Connection to active server 'tcp://server2:7232' has been lost.
2022-07-20 23:33:02.969 Server activating on failure of 'tcp://server2:7232'.
2022-07-20 23:33:02.969 Server rereading configuration.
2022-07-20 23:33:02.971 Recovering state, please wait.
2022-07-20 23:33:02.980 Recovered 46 messages.
2022-07-20 23:33:02.980 Server is now active.
2022-07-20 23:33:03.545 [anonymous@cersei]: reconnect failed: connection unknown for id=8
2022-07-20 23:33:04.187 [anonymous@cersei]: reconnect failed: connection unknown for id=8
2022-07-20 23:33:04.855 [anonymous@cersei]: reconnect failed: connection unknown for id=8
2022-07-20 23:33:05.531 [anonymous@cersei]: reconnect failed: connection unknown for id=8
I would appreciate any help to enhance my code
Thank you
I think I found the origin of my problem:
according to the page Tibco-Ems Failover Issue, the error message
reconnect failed: connection unknown for id=8
means: "the store (EMS db) wasn't shared between the active and the standby node, so when the active EMS failed, the new active EMS wasn't able to recover connections and messages."
I realized that it is painful to configure a shared store. To avoid it, I configured two tibemsd instances on the same host, following the page Step By Step How to Setup TIBCO EMS In Fault Tolerant Mode:
two tibemsd.conf configuration files
configure a different listen port in each file
configure ft_active with the URL of the other server
configure factories.conf
By doing so, I can replay my test and it works as expected
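
One client-side detail also worth checking: by default the EMS client may give up on reconnecting before the standby finishes activating. A minimal sketch, assuming the reconnect setters exposed by TibjmsConnectionFactory (check the exact names against your EMS client version):

import com.tibco.tibjms.TibjmsConnectionFactory;

// Sketch: let the client retry long enough for the standby to activate.
// Setter names assumed from the EMS client API; verify for your version.
TibjmsConnectionFactory factory = new TibjmsConnectionFactory();
factory.setServerUrl("tcp://server1:7232,tcp://server2:7232");
factory.setReconnAttemptCount(10);    // retry up to 10 times
factory.setReconnAttemptDelay(1000);  // wait 1 s between attempts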

When using Apache Curator, why does creating a zNode cause NoNodeException

I am trying to create a "directory" in Zookeeper like this:
curatorFramework = CuratorFrameworkFactory.newClient(
        "ip-111-11-111-1.us-west-2.compute.internal/111.11.111.1:2181",
        zkInfo.getSessionTimeoutMs(),
        zkInfo.getConnectionTimeoutMs(),
        new RetryNTimes(zkInfo.getRetryAttempts(), zkInfo.getRetryIntervalMs()));
curatorFramework.start();

byte[] byteArray = new byte[1];
byteArray[0] = (byte) 7;

curatorFramework.create()
        .withMode(CreateMode.PERSISTENT)
        .withACL(ZooDefs.Ids.OPEN_ACL_UNSAFE)
        .forPath("/my_node", byteArray);
Perplexingly, it is giving me a "NoNodeException" on the very node I'm trying to create.
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /my_node
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) ~[stormjar.jar:?]
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[stormjar.jar:?]
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783) ~[stormjar.jar:?]
at org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1176) ~[stormjar.jar:?]
at org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1156) ~[stormjar.jar:?]
at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64) ~[stormjar.jar:?]
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100) ~[stormjar.jar:?]
at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:1153) ~[stormjar.jar:?]
at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:607) ~[stormjar.jar:?]
at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:597) ~[stormjar.jar:?]
at org.apache.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:362) ~[stormjar.jar:?]
at org.apache.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:310) ~[stormjar.jar:?]
Note that I am able to connect to Zookeeper:
Socket connection established to ip-111-11-111-1.us-west-2.compute.internal/111.11.111.1:2181, initiating session
Session establishment complete on server ip-111-11-111-1.us-west-2.compute.internal/111.11.111.1:2181, sessionid = 0x100000363b13354, negotiated timeout = 20000
Please note that the Zookeeper server is on a remote machine and the ip ("111.11.111.1") has been changed in this post.
So I tried connecting with zkCli and found that it was my connectString that was the problem:
/opt/zookeeper-3.6.1/bin/zkCli.sh -server ip-111-11-111-1.us-west-2.compute.internal/111.11.111.1:2181
Welcome to ZooKeeper!
[myid:ip-111-11-111-1.us-west-2.compute.internal:2181] - INFO [main-SendThread(ip-111-11-111-1.us-west-2.compute.internal:2181):ClientCnxn$SendThread@1154] - Opening socket connection to server ip-111-11-111-1.us-west-2.compute.internal/111.11.111.1:2181.
[myid:ip-111-11-111-1.us-west-2.compute.internal:2181] - INFO [main-SendThread(ip-111-11-111-1.us-west-2.compute.internal:2181):ClientCnxn$SendThread@986] - Socket connection established, initiating session, client: /111.11.111.2:43736, server: ip-111-11-111-1.us-west-2.compute.internal/111.11.111.1:2181
[myid:ip-111-11-111-1.us-west-2.compute.internal:2181] - INFO [main-SendThread(ip-111-11-111-1.us-west-2.compute.internal:2181):ClientCnxn$SendThread@1420] - Session establishment complete on server ip-111-11-111-1.us-west-2.compute.internal/111.11.111.1:2181, session id = 0x100000363b1d2e8, negotiated timeout = 30000
[zk: ip-111-11-111-1.us-west-2.compute.internal/111.11.111.1:2181(CONNECTED) 0] ls /
Node does not exist: /
[zk: ip-111-11-111-1.us-west-2.compute.internal/111.11.111.1:2181(CONNECTED) 1] create /my_node
Node does not exist: /my_node
As you can see, trying to create a node gives a NoNode error in zkCli too.
It turns out that ip-111-11-111-1.us-west-2.compute.internal/111.11.111.1:2181 was not a correct connect string: everything after the first slash in a ZooKeeper connect string is treated as a chroot path, so my client was rooted at the nonexistent path /111.11.111.1:2181 and every operation failed with NoNode. I was confused because zkCli still allowed me to connect.
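
For reference, a corrected version of the Curator setup, as a minimal sketch: plain host:port with no slash suffix (a "/..." suffix is a chroot and must already exist on the server); creatingParentsIfNeeded() is optional but avoids NoNodeException when intermediate znodes are missing:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.RetryNTimes;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;

// Sketch: connect string is host:port only, no "/..." chroot suffix.
CuratorFramework curatorFramework = CuratorFrameworkFactory.newClient(
        "111.11.111.1:2181", new RetryNTimes(3, 1000));
curatorFramework.start();

curatorFramework.create()
        .creatingParentsIfNeeded()   // create missing parent znodes
        .withMode(CreateMode.PERSISTENT)
        .withACL(ZooDefs.Ids.OPEN_ACL_UNSAFE)
        .forPath("/my_node", new byte[] { 7 });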
Seems related to ACLs; just to be sure, you could manually create it.
Locate your local ZooKeeper binaries (they don't need to be on the remote host) and launch the client (zkCli) pointing at your server. Once connected, create the new znode:
bin/zkCli.sh -server 111.11.111.1:2181
[zkshell:x] create /my_node
Created /my_node
The shell should print that last line, confirming the node has been created. Once done, launch the Curator process again.
Take a look here for more detailed info about the zk client.

Trouble with Glassfish Server and ActiveMQ: peer did not send his wire format

I'm getting this error while trying to set up a JMSPublisher and JMSSubscriber
jndi.properties
java.naming.factory.initial = org.apache.activemq.jndi.ActiveMQInitialContextFactory
java.naming.provider.url = tcp://localhost:4848?wireFormat.maxInactivityDurationInitalDelay=30000
topic.topic/flightStatus = flightStatus
Glassfish server is running on: http://localhost:4848
Publisher:
JmsPublisher publisher = new JmsPublisher("ConnectionFactory", "topic/flightStatus");
...
public JmsPublisher(String factoryName, String topicName) throws JMSException, NamingException {
    Context jndiContext = new InitialContext();
    TopicConnectionFactory factory = (TopicConnectionFactory) jndiContext.lookup(factoryName);
    Topic topic = (Topic) jndiContext.lookup(topicName);
    this.connect = factory.createTopicConnection();
    this.session = connect.createTopicSession(false, Session.AUTO_ACKNOWLEDGE);
    this.publisher = session.createPublisher(topic);
}
Exception:
Exception in thread "main" javax.jms.JMSException: Wire format negotiation timeout: peer did not send his wire format.
at org.apache.activemq.util.JMSExceptionSupport.create(JMSExceptionSupport.java:62)
at org.apache.activemq.ActiveMQConnection.syncSendPacket(ActiveMQConnection.java:1395)
at org.apache.activemq.ActiveMQConnection.ensureConnectionInfoSent(ActiveMQConnection.java:1481)
at org.apache.activemq.ActiveMQConnection.createSession(ActiveMQConnection.java:323)
at org.apache.activemq.ActiveMQConnection.createTopicSession(ActiveMQConnection.java:1112)
at com.mycompany.testejms.JmsPublisher.<init>(JmsPublisher.java:34)
at com.mycompany.testejms.JmsPublisher.main(JmsPublisher.java:51)
Caused by: java.io.IOException: Wire format negotiation timeout: peer did not send his wire format.
at org.apache.activemq.transport.WireFormatNegotiator.oneway(WireFormatNegotiator.java:98)
at org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:68)
at org.apache.activemq.transport.ResponseCorrelator.asyncRequest(ResponseCorrelator.java:81)
at org.apache.activemq.transport.ResponseCorrelator.request(ResponseCorrelator.java:86)
at org.apache.activemq.ActiveMQConnection.syncSendPacket(ActiveMQConnection.java:1366)
... 5 more
The error indicates that the ActiveMQ client is not actually communicating with an ActiveMQ broker. Glassfish may be listening on http://localhost:4848, but apparently that's not where the ActiveMQ broker is listening for connections. From what I understand, port 4848 is where the Glassfish web admin console listens for connections. Note the http in the URL you provided. By default, ActiveMQ listens on port 61616.
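
Assuming a default ActiveMQ broker, the jndi.properties should point at the broker's transport connector instead; a sketch (adjust host/port to wherever your broker actually listens):

java.naming.factory.initial = org.apache.activemq.jndi.ActiveMQInitialContextFactory
java.naming.provider.url = tcp://localhost:61616?wireFormat.maxInactivityDurationInitalDelay=30000
topic.topic/flightStatus = flightStatus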

Connection issue with Cassandra database

I have problems connecting Cassandra to Spark. I can connect to Cassandra with cqlsh, but when I launch my program:
public static void main(String[] args) {
    Cluster cluster;
    Session session;
    cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
    session = cluster.connect();
    SparkConf conf = new SparkConf().setAppName("CassandraExamples").setMaster("local[1]")
            .set("spark.cassandra.connection.host", "9.168.86.84");
    JavaSparkContext sc = new JavaSparkContext("spark://9.168.86.84:9042", "CassandraExample", conf);
    CassandraJavaPairRDD<String, String> rdd1 = javaFunctions(sc).cassandraTable("keyspace", "table",
            mapColumnTo(String.class), mapColumnTo(String.class)).select("row1", "row2");
    System.out.println("Data fetched: \n" + StringUtils.join(rdd1.toArray(), "\n"));
}
I'm getting this error:
15/06/11 11:41:15 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@9.168.86.84:9042]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: no further information: /9.168.86.84:9042
15/06/11 11:41:34 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@9.168.86.84:9042/user/Master...
15/06/11 11:41:35 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@9.168.86.84:9042: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster@9.168.86.84:9042
15/06/11 11:41:35 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@9.168.86.84:9042]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: no further information: /9.168.86.84:9042
15/06/11 11:41:54 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@9.168.86.84:9042/user/Master...
15/06/11 11:41:55 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@9.168.86.84:9042: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster@9.168.86.84:9042
15/06/11 11:41:55 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@9.168.86.84:9042]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: no further information: /9.168.86.84:9042
15/06/11 11:42:14 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
15/06/11 11:42:14 WARN SparkDeploySchedulerBackend: Application ID is not initialized yet.
15/06/11 11:42:14 ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: All masters are unresponsive! Giving up.
cassandra.yaml has these properties:
listen_address: 9.168.86.84
start_native_transport: true
rpc_address: 0.0.0.0
native_transport_port: 9042
rpc_port: 9160
Can someone tell me what's wrong?
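
The logs point at the likely culprit: the JavaSparkContext is given a Spark master URL on port 9042, which is Cassandra's native transport port, not the Spark master's. A sketch of the corrected setup, assuming a standalone Spark master on its default port 7077 (check your master's web UI); the Cassandra address belongs only in spark.cassandra.connection.host:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Sketch: master URL on the Spark master port (7077 by default, an assumption),
// Cassandra host passed only through the connector property.
SparkConf conf = new SparkConf()
        .setAppName("CassandraExamples")
        .set("spark.cassandra.connection.host", "9.168.86.84");
JavaSparkContext sc = new JavaSparkContext("spark://9.168.86.84:7077", "CassandraExample", conf);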

Cannot programmatically submit Spark application (with Cassandra connector) to cluster from remote client

I'm running a standalone Spark cluster on EC2, writing an application using the Spark-Cassandra connector, and trying to submit the job to the Spark cluster programmatically.
The job itself is simple:
public static void main(String[] args) {
    SparkConf conf;
    JavaSparkContext sc;
    conf = new SparkConf().set("spark.cassandra.connection.host", host);
    conf.set("spark.driver.host", "[my_public_ip]");
    conf.set("spark.driver.port", "15000");
    sc = new JavaSparkContext("spark://[spark_master_host]", "test", conf);
    CassandraJavaRDD<CassandraRow> rdd = javaFunctions(sc).cassandraTable("keyspace", "table");
    System.out.println(rdd.first().toString());
    sc.stop();
}
This runs fine when I run it on the Spark master node of my EC2 cluster.
I'm trying to run it from a remote Windows client.
The problem comes from these two lines:
conf.set("spark.driver.host", "[my_public_ip]");
conf.set("spark.driver.port", "15000");
First, if I comment out these two lines, the application does not throw an exception, but the executor never runs, with the following log:
14/12/06 22:40:03 INFO client.AppClient$ClientActor: Executor updated: app-20141207033931-0021/3 is now LOADING
14/12/06 22:40:03 INFO client.AppClient$ClientActor: Executor updated: app-20141207033931-0021/0 is now EXITED (Command exited with code 1)
14/12/06 22:40:03 INFO cluster.SparkDeploySchedulerBackend: Executor app-20141207033931-0021/0 removed: Command exited with code 1
This repeats forever. When I check the worker node log, I find:
14/12/06 22:40:21 ERROR security.UserGroupInformation: PriviledgedActionException as:[username] cause:java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException: Unknown exception in doAs
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1134)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:113)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:156)
at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.security.PrivilegedActionException: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
... 4 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:125)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52)
... 7 more
I've no idea what that's about; my guess is that the worker node could not connect back to the driver, which initially started as:
14/12/06 22:39:30 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@[some_host_name]:52660]
14/12/06 22:39:30 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@[some_host_name]:52660]
Obviously, no DNS is going to resolve my host name...
Since I can't set the deploy mode to "client" or "cluster" except via the ./spark-submit script (which I find absurd...), I tried adding a host entry "XX.XXX.XXX.XX [host-name]" to /etc/hosts on all Spark master and worker nodes.
No luck, of course...
That leads me to the second attempt: un-commenting those two lines,
which gives me:
14/12/06 22:59:41 INFO Remoting: Starting remoting
14/12/06 22:59:41 ERROR Remoting: Remoting error: [Startup failed] [
akka.remote.RemoteTransportException: Startup failed
at akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:129)
at akka.remote.Remoting.start(Remoting.scala:194)
...
Cause:
Caused by: org.jboss.netty.channel.ChannelException: Failed to bind to: /[my_public_ip]:15000
at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:391)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:388)
I double-checked my firewall and router settings and confirmed that my firewall is disabled; ran netstat -an to confirm port 15000 is not in use (in fact I tried several other available ports, no luck); and pinged my public IP both from another machine and from machines in my cluster, no problem.
Now I'm utterly stuck and have run out of ideas. Any suggestions? Any help is appreciated!
Please check whether port 15000 is open in your EC2 security group.
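
A sketch of what that driver setup typically needs, with one caveat that is an assumption about this setup: if the client sits behind NAT, the public IP usually cannot be bound locally, which is exactly what the "Failed to bind" error looks like. In that case spark.driver.host must be an address the machine can actually bind and the workers can reach (or the job must be submitted from inside the cluster via spark-submit):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Sketch: the driver must advertise an address that (a) this machine can bind
// and (b) the workers can reach; the chosen port must be open in the EC2
// security group. Bracketed values are placeholders.
SparkConf conf = new SparkConf()
        .setAppName("test")
        .set("spark.cassandra.connection.host", "[cassandra_host]")
        .set("spark.driver.host", "[reachable_driver_address]")
        .set("spark.driver.port", "15000");
JavaSparkContext sc = new JavaSparkContext("spark://[spark_master_host]:7077", "test", conf);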
