I'm running Kafka on WSL. I'm trying to make a simple producer like this (I'm using IntelliJ):
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;
public class ProducerDemo {
    public static void main(String[] args) {
        String bootstrapServers = "127.0.0.1:9092";

        // create Producer properties
        Properties properties = new Properties();
        properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // create the producer
        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);

        // create a producer record
        ProducerRecord<String, String> record =
                new ProducerRecord<>("first_topic", "hallo world");

        // send data
        producer.send(record);

        // flush + close
        producer.flush();
        producer.close();
    }
}
but there's a problem: when I try to run the code, it shows this error:
[kafka-producer-network-thread | producer-1] WARN org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Error connecting to node AD17-2.localdomain:9092 (id: 0 rack: null)
java.net.UnknownHostException: No such host is known (AD17-2.localdomain)
at java.base/java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
at java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:932)
at java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1505)
at java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:851)
at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1495)
at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1354)
at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1288)
at org.apache.kafka.clients.ClientUtils.resolve(ClientUtils.java:110)
at org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.currentAddress(ClusterConnectionStates.java:403)
at org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.access$200(ClusterConnectionStates.java:363)
at org.apache.kafka.clients.ClusterConnectionStates.currentAddress(ClusterConnectionStates.java:151)
at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:955)
at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:293)
at org.apache.kafka.clients.producer.internals.Sender.sendProducerData(Sender.java:350)
at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:323)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:239)
at java.base/java.lang.Thread.run(Thread.java:832)
Process finished with exit code -1
I even checked the port on Windows:
TCP 0.0.0.0:9092 0.0.0.0:0 LISTENING
My question is: is it impossible to run a producer on Windows while Kafka runs on WSL?
Your Kafka broker is using the local hostname (AD17-2.localdomain) in its advertised listener. When your client first connects on 127.0.0.1:9092, the broker returns this advertised address for the client to use when producing messages. Your producer cannot resolve that hostname, so the connection fails.
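You can confirm what the broker is advertising from the client side; here is a minimal sketch using the AdminClient from the same kafka-clients library (the bootstrap address mirrors the code above):
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.Node;

public class DescribeClusterDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092");

        // The host printed here is the advertised address the producer
        // has to resolve after the initial bootstrap connection.
        try (AdminClient admin = AdminClient.create(props)) {
            for (Node node : admin.describeCluster().nodes().get()) {
                System.out.println("Broker " + node.id() + " advertises " + node.host() + ":" + node.port());
            }
        }
    }
}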
To fix it, in the broker's server.properties set
advertised.listeners=PLAINTEXT://127.0.0.1:9092
listener.security.protocol.map=PLAINTEXT:PLAINTEXT
listeners=PLAINTEXT://0.0.0.0:9092
More info: https://rmoff.net/2018/08/02/kafka-listeners-explained/
It works for me with this solution:
With the command "ip addr | grep eth0" on the Ubuntu terminal I can get the external interface IP. I set this IP in advertised.listeners, and with the following command:
netsh interface portproxy add v4tov4 listenport=9092 listenaddress=0.0.0.0 connectport=9092 connectaddress=XXX.XX.XX.XX
on the Windows cmd I can forward ports. With the command "netstat -ab" I can see on the Windows cmd the entry TCP 0.0.0.0:9092. It works correctly!
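For reference, the server.properties inside WSL then looks something like this (XXX.XX.XX.XX stands for the eth0 address found above):
# server.properties inside WSL
listeners=PLAINTEXT://0.0.0.0:9092
# XXX.XX.XX.XX is the eth0 address reported by "ip addr | grep eth0"
advertised.listeners=PLAINTEXT://XXX.XX.XX.XX:9092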
I am starting to study how I can implement an application supporting failover/fault tolerance on top of JMS, more precisely TIBCO EMS.
I configured two EMS servers, both with fault tolerance enabled:
For the EMS instance running on server1 I have
in tibemsd.conf
ft_active = tcp://server2:7232
in factories.conf
[GenericConnectionFactory]
type = generic
url = tcp://server1:7232
[FTTopicConnectionFactory]
type = topic
url = tcp://server1:7232,tcp://server2:7232
[FTQueueConnectionFactory]
type = queue
url = tcp://server1:7232,tcp://server2:7232
And for the EMS instance running on server2 I have
in tibemsd.conf
ft_active = tcp://server1:7232
in factories.conf
[GenericConnectionFactory]
type = generic
url = tcp://server2:7232
[FTTopicConnectionFactory]
type = topic
url = tcp://server2:7232,tcp://server1:7232
[FTQueueConnectionFactory]
type = queue
url = tcp://server2:7232,tcp://server1:7232
I am not a TIBCO EMS expert, but my config seems to be correct: when I start EMS on server1 I get:
$ tibemsd -config tibemsd.conf
...
2022-07-20 23:04:58.566 Server is active.
2022-07-20 23:05:18.563 Standby server 'SERVERNAME@server1' has connected.
then if I start EMS on server2, I get
$ tibemsd -config tibemsd.conf
...
2022-07-20 23:05:18.564 Accepting connections on tcp://server2:7232.
2022-07-20 23:05:18.564 Server is in standby state for 'tcp://server1:7232'
Moreover, if I kill the active EMS on server1, I immediately get the following message on server2:
2022-07-20 23:21:52.891 Connection to active server 'tcp://server1:7232' has been lost.
2022-07-20 23:21:52.891 Server activating on failure of 'tcp://server1:7232'.
...
2022-07-20 23:21:52.924 Server is now active.
Up to this point everything looks OK; the active/standby EMS servers seem to be correctly configured.
Things get more complicated when I write a piece of code that is supposed to connect to these EMS servers and periodically publish messages. Let's try with the following code sample:
@Test
public void testEmsFailover() throws JMSException, InterruptedException {
    int NB = 1000;
    TibjmsConnectionFactory factory = new TibjmsConnectionFactory();
    factory.setServerUrl("tcp://server1:7232,tcp://server2:7232");
    Connection connection = factory.createConnection();
    Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
    connection.start();
    for (int i = 0; i < NB; i++) {
        LOG.info("sending message");
        Queue queue = session.createQueue(QUEUE__CLIENT_TO_FRONTDOOR__CONNECTION_REQUEST);
        MessageProducer producer = session.createProducer(queue);
        MapMessage mapMessage = session.createMapMessage();
        mapMessage.setStringProperty(PROPERTY__CLIENT_KIND, USER.toString());
        mapMessage.setStringProperty(PROPERTY__CLIENT_NAME, "name");
        producer.send(mapMessage);
        LOG.info("done!");
        Thread.sleep(1000);
    }
}
If I run this code while both active and standby servers are up, everything looks good:
23:26:32.431 [main] INFO JmsEndpointTest - sending message
23:26:32.458 [main] INFO JmsEndpointTest - done!
23:26:33.458 [main] INFO JmsEndpointTest - sending message
23:26:33.482 [main] INFO JmsEndpointTest - done!
Now, if I kill the active EMS server, I would expect that:
the standby server would instantaneously become the active one
my code would continue to publish as if nothing had happened
However, in my code I get the following error:
javax.jms.JMSException: Connection is closed
at com.tibco.tibjms.TibjmsxLink.sendRequest(TibjmsxLink.java:307)
at com.tibco.tibjms.TibjmsxLink.sendRequestMsg(TibjmsxLink.java:261)
at com.tibco.tibjms.TibjmsxSessionImp._createProducer(TibjmsxSessionImp.java:1004)
at com.tibco.tibjms.TibjmsxSessionImp.createProducer(TibjmsxSessionImp.java:4854)
at JmsEndpointTest.testEmsFailover(JmsEndpointTest.java:103)
...
and in the logs of the server (the previous standby server, now supposed to be the active one) I get:
2022-07-20 23:32:44.447 [anonymous@cersei]: connect failed: server not in active state
2022-07-20 23:33:02.969 Connection to active server 'tcp://server2:7232' has been lost.
2022-07-20 23:33:02.969 Server activating on failure of 'tcp://server2:7232'.
2022-07-20 23:33:02.969 Server rereading configuration.
2022-07-20 23:33:02.971 Recovering state, please wait.
2022-07-20 23:33:02.980 Recovered 46 messages.
2022-07-20 23:33:02.980 Server is now active.
2022-07-20 23:33:03.545 [anonymous@cersei]: reconnect failed: connection unknown for id=8
2022-07-20 23:33:04.187 [anonymous@cersei]: reconnect failed: connection unknown for id=8
2022-07-20 23:33:04.855 [anonymous@cersei]: reconnect failed: connection unknown for id=8
2022-07-20 23:33:05.531 [anonymous@cersei]: reconnect failed: connection unknown for id=8
I would appreciate any help to enhance my code.
Thank you
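As an aside, one small thing I could add to the test above (a generic JMS sketch, not an EMS-specific recovery mechanism) is an ExceptionListener, so the client is at least notified asynchronously when the connection to the active server drops:
// Right after factory.createConnection() in the test above; LOG is the
// same logger. Plain JMS, nothing EMS-specific.
connection.setExceptionListener(exception ->
        LOG.info("connection to EMS lost: " + exception.getMessage()));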
I think I found the origin of my problem:
according to the page Tibco-Ems Failover Issue, the error message
reconnect failed: connection unknown for id=8
means: "the store (ems db) was'nt share between the active and the standby node, so when the active ems failed, the new active ems was'nt able to recover connections and messages."
I realized that it is painful to configure a shared store. To avoid it, I configured two tibemsd instances on the same host, by following the page Step By Step How to Setup TIBCO EMS In Fault Tolerant Mode (a sketch of the resulting configuration follows the list below):
two tibemsd.conf configuration files
configure a different listen port in each file
configure ft_active with the URL of the other server
configure factories.conf
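Here is a minimal sketch of what the two configuration files might look like; the server name, ports and store directory are assumptions, not the exact values from my setup:
# tibemsd-1.conf (assumed values)
server    = EMS-SERVER
listen    = tcp://7222
ft_active = tcp://localhost:7224
# same directory for both instances: on a single host the store is trivially shared
store     = /ems/datastore

# tibemsd-2.conf (assumed values)
server    = EMS-SERVER
listen    = tcp://7224
ft_active = tcp://localhost:7222
store     = /ems/datastore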
By doing so, I can replay my test and it works as expected.
I have set up 3 Redis server and 3 Redis Sentinel instances on my localhost. The servers are running at:
127.0.0.1:6379 // Master
127.0.0.1:6380 // Slave
127.0.0.1:6381 // Slave
and the sentinels are running at:
127.0.0.1:5000
127.0.0.1:5001
127.0.0.1:5002
I have a (Java) client that tries to connect to one of the sentinels and set keys in the Redis server:
import java.net.Socket;
import java.util.HashSet;
import java.util.Set;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisSentinelPool;

public class RedisPush {
    private static final String MASTER_NAME = "mymaster";
    private static final String PASSWORD = "foobared";
    private static final Set<String> sentinels;
    static {
        sentinels = new HashSet<>();
        sentinels.add("127.0.0.1:5000");
        sentinels.add("127.0.0.1:5001");
        sentinels.add("127.0.0.1:5002");
    }

    public static void pushToRedis() {
        Jedis jedis = null;
        try {
            JedisSentinelPool pool = new JedisSentinelPool(MASTER_NAME, sentinels);
            System.out.println("Fetching connection from pool.");
            jedis = pool.getResource();
            jedis.auth(PASSWORD);
            Socket socket = jedis.getClient().getSocket();
            System.out.println("Connected to " + socket.getRemoteSocketAddress());
            int i = 0;
            while (true) {
                jedis.set("sentinel_key" + i, "value" + i);
                System.out.println(i);
                i++;
                Thread.sleep(1000);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (jedis != null)
                jedis.close();
        }
    }

    public static void main(String[] args) throws Exception {
        while (true) {
            pushToRedis();
            Thread.sleep(1000);
        }
    }
}
Initially, my sentinel configuration is as follows (for example, the first sentinel running on port 5000):
bind 127.0.0.1
port 5000
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
sentinel auth-pass mymaster foobared
If I try to run my (Java) client, I get the following error:
Nov 23, 2018 1:52:57 AM redis.clients.jedis.JedisSentinelPool initSentinels
INFO: Trying to find master from available Sentinels...
Nov 23, 2018 1:52:57 AM redis.clients.jedis.JedisSentinelPool initSentinels
WARNING: Cannot get master address from sentinel running @ 192.168.0.102:5001. Reason: redis.clients.jedis.exceptions.JedisConnectionException: java.net.ConnectException: Connection refused (Connection refused). Trying next one.
Nov 23, 2018 1:52:57 AM redis.clients.jedis.JedisSentinelPool initSentinels
WARNING: Cannot get master address from sentinel running @ 192.168.0.102:5000. Reason: redis.clients.jedis.exceptions.JedisConnectionException: java.net.ConnectException: Connection refused (Connection refused). Trying next one.
Nov 23, 2018 1:52:57 AM redis.clients.jedis.JedisSentinelPool initSentinels
WARNING: Cannot get master address from sentinel running @ 192.168.0.102:5002. Reason: redis.clients.jedis.exceptions.JedisConnectionException: java.net.ConnectException: Connection refused (Connection refused). Trying next one.
redis.clients.jedis.exceptions.JedisConnectionException: All sentinels down, cannot determine where is mymaster master is running...
at redis.clients.jedis.JedisSentinelPool.initSentinels(JedisSentinelPool.java:180)
at redis.clients.jedis.JedisSentinelPool.<init>(JedisSentinelPool.java:95)
at redis.clients.jedis.JedisSentinelPool.<init>(JedisSentinelPool.java:82)
at redis.clients.jedis.JedisSentinelPool.<init>(JedisSentinelPool.java:70)
at redis.clients.jedis.JedisSentinelPool.<init>(JedisSentinelPool.java:44)
at RedisPush.pushToRedis(RedisPush.java:29)
at RedisPush.main(RedisPush.java:61)
However, if I change my sentinel configuration to the one below:
# bind 127.0.0.1
port 5000
protected-mode no
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
sentinel auth-pass mymaster foobared
the client works perfectly. I don't understand why.
AFAIK, protected mode takes effect (rejecting any client other than localhost) ONLY when requirepass is not set AND bind is commented out in the sentinel.conf file. In my first sentinel configuration I did have the bind directive, but it still didn't work.
Why does commenting out bind and explicitly setting protected-mode to no work?
P.S. I also tried having both bind 127.0.0.1 and protected-mode no, but even that didn't work.
I had a similar issue where Jedis was throwing the same error; it was related to https://github.com/xetorthio/jedis/issues/1367
To get around this on my Mac I ran sudo ifconfig lo0 alias 127.0.1.1
Then I changed my sentinel.conf to bind 127.0.1.1
and then updated my YAML for Spring Data to connect to that IP:
sentinel.nodes: 127.0.1.1:5000, 127.0.1.1:5002, 127.0.1.1:5003
Jedis doesn't support Redis Sentinel well. Use Lettuce instead, which has better pool management and is also well supported in Spring Boot.
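For illustration, a minimal Lettuce sketch against the sentinel setup from the question (assuming a recent Lettuce version, 5.2+; the key and value are placeholders):
import io.lettuce.core.RedisClient;
import io.lettuce.core.RedisURI;
import io.lettuce.core.api.StatefulRedisConnection;

public class LettucePush {
    public static void main(String[] args) {
        // Sentinel-aware URI: Lettuce asks the sentinels for the current
        // master of "mymaster" and follows failovers automatically.
        RedisURI uri = RedisURI.Builder
                .sentinel("127.0.0.1", 5000, "mymaster")
                .withSentinel("127.0.0.1", 5001)
                .withSentinel("127.0.0.1", 5002)
                .withPassword("foobared".toCharArray())
                .build();
        RedisClient client = RedisClient.create(uri);
        try (StatefulRedisConnection<String, String> conn = client.connect()) {
            conn.sync().set("sentinel_key0", "value0"); // placeholder key/value
        }
        client.shutdown();
    }
}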
I've installed ZooKeeper and Kafka from Ambari, on CentOS 7.
Ambari version: 2.1.2.1
Zookeeper version: 3.4.6.2.3
Kafka version: 0.8.2.2.3
Java Kafka client: kafka_2.10, 0.8.2.2
I'm trying to save the Kafka offset, using the following code:
SimpleConsumer simpleConsumer = new SimpleConsumer(host, port, soTimeout, bufferSize, clientId);
TopicAndPartition topicAndPartition = new TopicAndPartition(topicName, partitionId);
Map<TopicAndPartition, OffsetAndMetadata> requestInfo = new HashMap<>();
requestInfo.put(topicAndPartition, new OffsetAndMetadata(readOffset, "", ErrorMapping.NoError()));
OffsetCommitRequest offsetCommitRequest = new OffsetCommitRequest(groupName, requestInfo, correlationId, clientName, (short)0);
simpleConsumer.commitOffsets(offsetCommitRequest);
simpleConsumer.close();
But when I run this, I get the following error in my client:
java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
Also in the Kafka logs I have the following error:
[2015-11-24 15:38:53,566] ERROR Closing socket for /192.168.186.1 because of error (kafka.network.Processor)
java.nio.BufferUnderflowException
at java.nio.Buffer.nextGetIndex(Buffer.java:498)
at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:406)
at kafka.api.OffsetCommitRequest$$anonfun$1$$anonfun$apply$1.apply(OffsetCommitRequest.scala:73)
at kafka.api.OffsetCommitRequest$$anonfun$1$$anonfun$apply$1.apply(OffsetCommitRequest.scala:68)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.Range.foreach(Range.scala:141)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at kafka.api.OffsetCommitRequest$$anonfun$1.apply(OffsetCommitRequest.scala:68)
at kafka.api.OffsetCommitRequest$$anonfun$1.apply(OffsetCommitRequest.scala:65)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.immutable.Range.foreach(Range.scala:141)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at kafka.api.OffsetCommitRequest$.readFrom(OffsetCommitRequest.scala:65)
at kafka.api.RequestKeys$$anonfun$9.apply(RequestKeys.scala:47)
at kafka.api.RequestKeys$$anonfun$9.apply(RequestKeys.scala:47)
at kafka.network.RequestChannel$Request.<init>(RequestChannel.scala:55)
at kafka.network.Processor.read(SocketServer.scala:547)
at kafka.network.Processor.run(SocketServer.scala:405)
at java.lang.Thread.run(Thread.java:745)
Now I've also downloaded and installed the official Kafka 0.8.2.2 release from https://www.apache.org/dyn/closer.cgi?path=/kafka/0.8.2.2/kafka_2.10-0.8.2.2.tgz and it works OK; you can save the Kafka offset without any error.
Can anybody give me some direction on why the Ambari Kafka is failing to save the offset?
P.S.: I know that if versionId is 0 (in OffsetCommitRequest), then the offset is actually saved in ZooKeeper.
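For reference, here is the same commit with versionId set to 1, which from Kafka 0.8.2 onwards stores the offset in Kafka's internal offsets topic instead of ZooKeeper; this only illustrates the versionId switch, and I have not verified it against the Ambari build:
// Identical to the commit above, except versionId = 1 stores the offset
// in Kafka rather than in ZooKeeper (untested against the Ambari build).
OffsetCommitRequest kafkaBackedRequest =
        new OffsetCommitRequest(groupName, requestInfo, correlationId, clientName, (short) 1);
simpleConsumer.commitOffsets(kafkaBackedRequest);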
I have written a simple Java program for creating a table in HBase, but somehow it is not working. I checked that all services are working fine, i.e. HMaster, RegionServer and ZooKeeper. Below is the code that I wrote:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
public class CreateSchema {
    public static void main(String[] args) throws IOException {
        try {
            HBaseConfiguration conf = new HBaseConfiguration(new Configuration());
            HBaseAdmin hbase = new HBaseAdmin(conf);
            HTableDescriptor desc = new HTableDescriptor("sample");
            HColumnDescriptor meta = new HColumnDescriptor("samplecolumn1".getBytes());
            HColumnDescriptor prefix = new HColumnDescriptor("samplecolumn2".getBytes());
            desc.addFamily(meta);
            desc.addFamily(prefix);
            System.out.println("Creating table");
            hbase.createTable(desc);
            System.out.println("Done");
        } catch (Exception e) {
            // Print the actual exception; silently swallowing it hides why
            // the table was never created.
            System.out.println("Error Occurred");
            e.printStackTrace();
        }
    }
}
Here is the ZooKeeper log:
2015-01-15 07:46:01,594 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:60599
2015-01-15 07:46:01,595 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /127.0.0.1:60599; will be dropped if server is in r-o mode
2015-01-15 07:46:01,595 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /127.0.0.1:60599
2015-01-15 07:46:01,619 - INFO  [SyncThread:0:ZooKeeperServer@617] - Established session 0x14aec781bb9000b with negotiated timeout 40000 for client /127.0.0.1:60599
2015-01-15 07:46:37,151 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.0.2.15:58102
2015-01-15 07:46:37,152 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:745)
2015-01-15 07:46:37,153 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /10.0.2.15:58102 (no session established for client)
After running the HBase Java program, nothing happens:
java CreateSchema
Creating table
Please let me know what the issue could be.
One of the reasons this might occur is that your hbase-site.xml is not available on the classpath.
When you run:
HBaseConfiguration conf = new HBaseConfiguration(new Configuration());
an HBaseConfiguration based on the hbase-site.xml found on the classpath is created.
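If you'd rather not rely on the classpath, you can also set the connection details explicitly; this is just a sketch, and the quorum host and port are assumptions about your setup:
// Point the client at the cluster explicitly instead of relying on
// hbase-site.xml being on the classpath (host/port are assumptions).
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "localhost");
conf.set("hbase.zookeeper.property.clientPort", "2181");
HBaseAdmin hbase = new HBaseAdmin(conf);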
PS: I ran your code as such in my Eclipse and it worked fine. The table 'sample' was created without any error.
I'm running a standalone Spark cluster on EC2, and I'm writing an application using the Spark-Cassandra connector and trying to submit the job to the Spark cluster programmatically.
The job itself is simple:
public static void main(String[] args) {
    SparkConf conf;
    JavaSparkContext sc;
    conf = new SparkConf()
            .set("spark.cassandra.connection.host", host);
    conf.set("spark.driver.host", "[my_public_ip]");
    conf.set("spark.driver.port", "15000");
    sc = new JavaSparkContext("spark://[spark_master_host]", "test", conf);
    CassandraJavaRDD<CassandraRow> rdd = javaFunctions(sc).cassandraTable(
            "keyspace", "table");
    System.out.println(rdd.first().toString());
    sc.stop();
}
This runs fine when I run it on the Spark master node of my EC2 cluster.
I'm trying to run it from a remote Windows client.
The problem comes from these two lines:
conf.set("spark.driver.host", "[my_public_ip]");
conf.set("spark.driver.port", "15000");
First, if I comment out these two lines, the application does not throw an exception, but the executor never runs; I see the following log:
14/12/06 22:40:03 INFO client.AppClient$ClientActor: Executor updated: app-20141207033931-0021/3 is now LOADING
14/12/06 22:40:03 INFO client.AppClient$ClientActor: Executor updated: app-20141207033931-0021/0 is now EXITED (Command exited with code 1)
14/12/06 22:40:03 INFO cluster.SparkDeploySchedulerBackend: Executor app-20141207033931-0021/0 removed: Command exited with code 1
This never ends. When I checked the worker node log, I found:
14/12/06 22:40:21 ERROR security.UserGroupInformation: PriviledgedActionException as:[username] cause:java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException: Unknown exception in doAs
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1134)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:113)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:156)
at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.security.PrivilegedActionException: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
... 4 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:125)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52)
... 7 more
I have no idea what that's about. My guess is that the worker node could not connect to the driver, which was initially set as:
14/12/06 22:39:30 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@[some_host_name]:52660]
14/12/06 22:39:30 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@[some_host_name]:52660]
Obviously, no DNS is going to resolve my host name...
Since I can't set the deploy mode to "client" or "cluster" except via the ./spark-submit script (which I think is absurd...), I tried adding a host resolution "XX.XXX.XXX.XX [host-name]" to /etc/hosts on all Spark master and worker nodes.
No luck of course...
That leads me to the second attempt: un-commenting those two lines, which gives me:
14/12/06 22:59:41 INFO Remoting: Starting remoting
14/12/06 22:59:41 ERROR Remoting: Remoting error: [Startup failed] [
akka.remote.RemoteTransportException: Startup failed
at akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:129)
at akka.remote.Remoting.start(Remoting.scala:194)
...
Cause:
Caused by: org.jboss.netty.channel.ChannelException: Failed to bind to: /[my_public_ip]:15000
at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:391)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:388)
I double-checked my firewall and router settings and confirmed that my firewall is disabled; netstat -an confirms that port 15000 is not in use (in fact I tried changing to several other available ports, no luck); and I can ping my public IP both from another machine and from machines in my cluster, no problem.
Now I'm utterly stuck; I've simply run out of ideas for fixing this. Any suggestions? Any help is appreciated!
Please check whether port 15000 is open in your security group.
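For example, with the AWS CLI (the security group ID and client IP are placeholders):
aws ec2 authorize-security-group-ingress \
    --group-id sg-XXXXXXXX \
    --protocol tcp \
    --port 15000 \
    --cidr [my_public_ip]/32
Note that the executors may also connect back to the driver on other ports (for example the block manager port), which may need the same treatment.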