I installed Hadoop 2.5.1 on CentOS 7.0, and I'm using 3 computers with the hosts file below, which is identical on all 3 computers.
I'm not using DNS.
XXX.XXX.XXX.65 mccb-com65 #server
XXX.XXX.XXX.66 mccb-com66 #client01
XXX.XXX.XXX.67 mccb-com67 #client02
127.0.0.1 localhost
127.0.1.1 mccb-com65
I executed the command
$hadoop fs -copyFromLocal /home/hadoop/hdfs/hdfs/s_corpus.txt hdfs://XXX.XXX.XXX.65:9000/tmp/
and got the error message below:
INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.net.NoRouteToHostException: No route to host
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
    at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1526)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1328)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1281)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:526)
15/02/27 16:57:40 INFO hdfs.DFSClient: Abandoning
BP-1257634566-XXX.XXX.XXX.65-1425014347197:blk_1073741837_1013
15/02/27 16:57:40 INFO hdfs.DFSClient: Excluding datanode
XXX.XXX.XXX.67:50010 <-- the same happens for the other slave node, XXX.XXX.XXX.66
I turned off the firewall on both mccb-com66 and mccb-com67, as the state below shows:
$systemctl status iptables
iptables.service - IPv4 firewall with iptables
   Loaded: loaded (/usr/lib/systemd/system/iptables.service; disabled)
   Active: inactive (dead)
Additionally, I also turned off SELinux.
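(The usual way to do this, for reference, is:
$setenforce 0
which puts SELinux into permissive mode for the running system, plus setting SELINUX=disabled in /etc/selinux/config so it persists across reboots.)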
The datanode and nodemanager are alive on both machines; I can check their state with
jps and via
http://mccb-com65:50070 and
http://mccb-com65:8088
What am I missing? Could anybody help me?
Turning off iptables alone turned out not to be a valid solution, because CentOS 7 uses firewalld rather than the iptables service. After I opened the needed ports one by one with firewall-cmd on both slaves (66 and 67), it worked:
$firewall-cmd --zone=public --add-port=8042/tcp
$firewall-cmd --zone=public --add-port=50010/tcp
$firewall-cmd --zone=public --add-port=50020/tcp
$firewall-cmd --zone=public --add-port=50075/tcp
$firewall-cmd --reload
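Note that firewall-cmd --add-port without --permanent changes only the runtime configuration, which does not survive a reload or reboot; to persist a rule you would add --permanent and then reload, e.g.:
$firewall-cmd --zone=public --permanent --add-port=50010/tcp
$firewall-cmd --reload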
and then it works.
However, since I cannot open every port that a Hadoop application needs one by one, turning off firewalld entirely is reasonable:
$systemctl stop firewalld
$systemctl disable firewalld
and check the status
$systemctl status firewalld
Your /etc/hosts should contain:
XXX.XXX.XXX.65 mccb-com65 #server
XXX.XXX.XXX.66 mccb-com66 #client01
XXX.XXX.XXX.67 mccb-com67 #client02
Remove
127.0.0.1 localhost
127.0.1.1 mccb-com65
The 127.0.1.1 entry makes mccb-com65 resolve its own hostname to the loopback interface, so daemons bind there instead of the real network address and remote nodes cannot reach them.
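After fixing /etc/hosts and restarting HDFS, one way to verify the fix (a sketch; 50010 is the default Hadoop 2.x datanode transfer port) is:
$hdfs dfsadmin -report
$netstat -tlnp | grep 50010
The datanodes should report their real XXX.XXX.XXX.* addresses, and each datanode should be listening on its network interface rather than only on loopback.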
I have seen the following error reported in a few places, but not in the context I am getting it, so if anyone can give some advice, I would appreciate it.
java.rmi.ConnectException: Connection refused to host: 127.0.1.1;
I have two Rio (http://www.rio-project.org/docs/index.html) servers running on the same network. If I run them one at a time, they run with no problems; however, if I try to run them at the same time, I get the above error. So it looks like they are trying to share the same resource (a port?).
For example:
./bin/startall
More info:
Server 1
[jboss@primary etc]$ cat hosts
xx.10.7.xx ws1.company.com
xxx.25.20.xxx CLapp.xxxx.co.za
xxx.216.xxx.7 api.xxxx.com
xx.199.180.xxx smtp.outlook365.com webmail.domainlocalhost.com smtp.domainlocalhost.com
xxx.28.xx.10 ota.tulx.co.za
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 primary primary.company.com
::1 primary.company.com localhost localhost.localdomain localhost6 localhost6.localdomain6 primary
xxx.101.60.xxx repo1.maven.org
xx.200.xxx.14 primary.company.com primary
Server 2
[jboss@uat etc]$ cat hosts
xx.10.7.13 ws1.company.com
xxx.25.20.xxx CLapp.xxxx.co.za
xxx.216.xxx.7 api.xxxx.com
xxx.199.180.xxx smtp.outlook365.com webmail.domainlocalhost.com smtp.domainlocalhost.com
xxx.28.65.xx ota.tulx.co.za
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 uat.company.com uat
::1 uat.company.com localhost localhost.localdomain localhost6 localhost6.localdomain6 uat
xx.200.xxx.5 uat.company.com uat localhost
xxx.101.60.xxx repo1.maven.org
UPDATE
I tried removing 'localhost' where it was repeated for two IPs in Server 2's hosts file, but I still get the error.
i.e. I changed:
xx.200.xxx.5 uat.company.com uat localhost
to:
xx.200.xxx.5 uat.company.com uat
Now I want to deploy Tachyon 0.8.2 on my Ubuntu 14.04 machines; I already have Hadoop and Spark running:
on the master
bd@master$ jps
11871 Jps
3388 Master
2919 NameNode
3266 ResourceManager
3123 SecondaryNameNode
on the slave
bd@slave$ jps
4350 Jps
2778 NodeManager
2647 DataNode
2879 Worker
I edited tachyon-env.sh:
export TACHYON_MASTER_ADDRESS=${TACHYON_MASTER_ADDRESS:-master}
export TACHYON_UNDERFS_ADDRESS=${TACHYON_UNDERFS_ADDRESS:-hdfs://master:9000}
Then I ran bin/tachyon format and bin/tachyon-start.sh local.
I cannot see TachyonMaster in jps:
/usr/local/bigdata/tachyon-0.8.2 [06:06:32]
bd$ bin/tachyon-start.sh local
Killed 0 processes on master
Killed 0 processes on master
Connecting to master as bd...
Killed 0 processes on master
Connection to master closed.
[sudo] password for bd:
Formatting RamFS: /mnt/ramdisk (512mb)
Starting master @ master
Starting worker @ master
/usr/local/bigdata/tachyon-0.8.2 [06:06:54]
bd$ jps
12183 TachyonWorker
3388 Master
2919 NameNode
3266 ResourceManager
3123 SecondaryNameNode
12203 Jps
and when I look at master.log, it says:
2015-12-27 18:06:50,635 ERROR MASTER_LOGGER (MetricsConfig.java:loadConfigFile) - Error loading metrics configuration file.
2015-12-27 18:06:51,735 ERROR MASTER_LOGGER (HdfsUnderFileSystem.java:<init>) - Exception thrown when trying to get FileSystem for hdfs://master:9000
org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4
at org.apache.hadoop.ipc.Client.call(Client.java:1070)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at com.sun.proxy.$Proxy1.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
at tachyon.underfs.hdfs.HdfsUnderFileSystem.<init>(HdfsUnderFileSystem.java:74)
at tachyon.underfs.hdfs.HdfsUnderFileSystemFactory.create(HdfsUnderFileSystemFactory.java:30)
at tachyon.underfs.UnderFileSystemRegistry.create(UnderFileSystemRegistry.java:116)
at tachyon.underfs.UnderFileSystem.get(UnderFileSystem.java:100)
at tachyon.underfs.UnderFileSystem.get(UnderFileSystem.java:83)
at tachyon.master.TachyonMaster.connectToUFS(TachyonMaster.java:412)
at tachyon.master.TachyonMaster.startMasters(TachyonMaster.java:280)
at tachyon.master.TachyonMaster.start(TachyonMaster.java:261)
at tachyon.master.TachyonMaster.main(TachyonMaster.java:64)
2015-12-27 18:06:51,742 ERROR MASTER_LOGGER (TachyonMaster.java:main) - Uncaught exception terminating Master
java.lang.IllegalArgumentException: All eligible Under File Systems were unable to create an instance for the given path: hdfs://master:9000
java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4
at tachyon.underfs.UnderFileSystemRegistry.create(UnderFileSystemRegistry.java:132)
at tachyon.underfs.UnderFileSystem.get(UnderFileSystem.java:100)
at tachyon.underfs.UnderFileSystem.get(UnderFileSystem.java:83)
at tachyon.master.TachyonMaster.connectToUFS(TachyonMaster.java:412)
at tachyon.master.TachyonMaster.startMasters(TachyonMaster.java:280)
at tachyon.master.TachyonMaster.start(TachyonMaster.java:261)
at tachyon.master.TachyonMaster.main(TachyonMaster.java:64)
What should I do about this problem?
This exception arises from a version mismatch between the Hadoop client and server: "Server IPC version 9" corresponds to Hadoop 2.x, while "client version 4" corresponds to Hadoop 1.x, so your Tachyon build was compiled against Hadoop 1.x client libraries. Check your Hadoop version, and then recompile Tachyon against that version using this command:
mvn -Dhadoop.version=your_hadoop_version clean install
Example: mvn -Dhadoop.version=2.4.0 clean install
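If you're not sure which Hadoop version you're running, you can check it first; the first line of output (e.g. "Hadoop 2.5.1") is the value to pass to -Dhadoop.version:
$hadoop version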
Now configure your compiled Tachyon and it should work fine.
I started a jstatd on the remote server (Ubuntu Server 14.04):
jstatd -J-Djava.security.policy=.jstatd.all.policy -J-Djava.rmi.server.logCalls=true -p 9099
and tried to connect to it with jvisualvm from Windows. I checked netstat: the connection is established, and the remote logs the call:
Sep 11, 2015 12:48:51 PM sun.rmi.server.UnicastServerRef logCall
FINER: RMI TCP Connection(4)-10.82.199.0: [10.82.199.0: sun.rmi.registry.RegistryImpl[0:0:0, 0]: java.rmi.Remote lookup(java.lang.String)]
Sep 11, 2015 12:48:55 PM sun.rmi.server.UnicastServerRef logCall
FINER: RMI TCP Connection(4)-10.82.199.0: [10.82.199.0: sun.rmi.registry.RegistryImpl[0:0:0, 0]: java.rmi.Remote lookup(java.lang.String)]
Sep 11, 2015 12:48:59 PM sun.rmi.server.UnicastServerRef logCall
FINER: RMI TCP Connection(4)-10.82.199.0: [10.82.199.0: sun.rmi.registry.RegistryImpl[0:0:0, 0]: java.rmi.Remote lookup(java.lang.String)]
All signs say that it's working, but no applications show up in jvisualvm.
Apparently VisualVM expects a consistent DNS name for the server you're trying to connect to remotely (the Ubuntu Server 14.04 in your case). Hence, if you're specifying an IP address instead of a DNS name to VisualVM you should add the following to your jstatd startup line:
-J-Djava.rmi.server.hostname=<the IP address to your Ubuntu server here>
Additionally, I found out that specifying the port option (-p 9099 in your case) is not supported in some VisualVM releases:
Known limitation: In this VisualVM release the jstatd's default port and rminame must be used when starting the jstatd utility, i.e. the use of the -p and -n options is not supported.
VisualVM Troubleshooting Guide
All in all, you should try running the following jstatd line on your Ubuntu Server:
jstatd -J-Djava.security.policy=.jstatd.all.policy -J-Djava.rmi.server.hostname=10.82.83.117 -J-Djava.rmi.server.logCalls=true
Sources:
http://www.catify.com/2012/09/26/remote-monitoring-with-visualvm/
It worked for me :)
jstatd -p 1099 -J-Djava.rmi.server.hostname=10.250.105.112 -J-Djava.security.policy=<(echo 'grant codebase "file:${java.home}/../lib/tools.jar" {permission java.security.AllPermission;};')
Works for me perfectly.
In case this helps someone else...
I was running into problems where neither jstatd nor adding a plain JMX connection in VisualVM worked. The former would not give any error messages, it just wouldn't list any apps. The latter would give me an error saying "Cannot connect to some-server:30648 using service:jmx:rmi:///jndi/rmi://some-server:30648/jmxrmi".
Trying to use the excellent sjk-plus tool to manually connect to the JMX service gave the following error:
$ java --add-opens java.base/jdk.internal.perf=ALL-UNNAMED \
    --add-opens jdk.attach/sun.tools.attach=ALL-UNNAMED \
    -Dsjk.breakCage=false \
    -jar scripts/sjk-plus-0.14.jar \
    mx --get --allMatched -b com.acme.some.package:name=* -f Count \
    -s some-server:30648
JMX Connection failed: java.rmi.ConnectException: Connection refused to host: 127.0.1.1; nested exception is:
java.net.ConnectException: Connection refused (Connection refused)
Do you see it? 127.0.1.1, what is that weird IP address doing there?
This was caused by a particular entry in the /etc/hosts file on the server:
user@some-server:~$ cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 some-server
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
Changing the some-server entry in the hosts file (pointing it at the machine's real network address instead of 127.0.1.1) and restarting the process made it work with sjk-plus, and made it discoverable with jstatd as well.
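For example, assuming the machine's real address were 192.168.1.20 (a made-up address for illustration), the entry would become:
192.168.1.20 some-server
while the 127.0.0.1 localhost line stays as it is.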
In Java 11+:
jstatd -J-Djava.rmi.server.logCalls=true \
-J-Djava.security.policy=.jstatd.all.policy \
-J-Djava.net.preferIPv4Stack=true \
-J-Djava.security.policy=<(echo 'grant codebase "jrt:/jdk.jstatd" {permission java.security.AllPermission;}; grant codebase "jrt:/jdk.internal.jvmstat" {permission java.security.AllPermission;};')
I am able to set up a ZooKeeper cluster on one machine with 3 different ports, but when I do the same with different IPs, to run ZooKeeper instances on different machines, it throws the following error:
2014-11-20 12:16:24,819 [myid:1] - INFO [main:QuorumPeerMain#127] - Starting quorum peer
2014-11-20 12:16:24,827 [myid:1] - INFO [main:NIOServerCnxnFactory#94] - binding to port 0.0.0.0/0.0.0.0:2181
2014-11-20 12:16:24,842 [myid:1] - INFO [main:QuorumPeer#959] - tickTime set to 2000
2014-11-20 12:16:24,842 [myid:1] - INFO [main:QuorumPeer#979] - minSessionTimeout set to -1
2014-11-20 12:16:24,842 [myid:1] - INFO [main:QuorumPeer#990] - maxSessionTimeout set to -1
2014-11-20 12:16:24,842 [myid:1] - INFO [main:QuorumPeer#1005] - initLimit set to 10
2014-11-20 12:16:24,857 [myid:1] - INFO [Thread-1:QuorumCnxManager$Listener#504] - My election bind port: /172.16.1.175:2223
2014-11-20 12:16:24,870 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer#714] - LOOKING
2014-11-20 12:16:24,873 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#815] - New election. My id = 1, proposed zxid=0x0
2014-11-20 12:16:24,876 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection#597] - Notification: 1 (message format version), 1 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x0 (n.peerEpoch) LOOKING (my state)
2014-11-20 12:16:24,881 [myid:1] - WARN [WorkerSender[myid=1]:QuorumCnxManager#382] - Cannot open channel to 2 at election address /172.16.1.170:2223
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
at java.lang.Thread.run(Thread.java:744)
Have you started ZooKeeper on all three nodes? In a multi-node setup (assuming you have a distributed environment with multiple machines), every server knows about the other nodes in the cluster, collectively known as the ensemble. It learns this from the following lines in the zoo.cfg file:
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
The multi-server setup documentation says:
As long as a majority of the ensemble are up, the service will be available. Because Zookeeper requires a majority, it is best to use an odd number of machines. For example, with four machines ZooKeeper can only handle the failure of a single machine; if two machines fail, the remaining two machines do not constitute a majority. However, with five machines ZooKeeper can handle the failure of two machines
Unless you start the process on all three nodes, they won't be able to communicate with each other and will keep logging such errors. This might help you get somewhere.
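A quick way to confirm, assuming a standard install layout, is to run the status command on each node:
$ /opt/zookeeper/bin/zkServer.sh status
Each running node reports its mode (leader or follower); a node whose process isn't up reports an error instead.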
How to set up ZooKeeper for multiple or remote servers?
Step 1: Check that Java 1.8.0 or above is available on the system:
java -version
Step 2: Download ZooKeeper 3.3.6 using the command below:
sudo wget http://redrockdigimark.com/apachemirror/zookeeper/zookeeper-3.3.6/zookeeper-3.3.6.tar.gz
Step 3: Extract the file using the command below:
sudo tar xzf zookeeper-3.3.6.tar.gz -C /opt/
Step 4: Symlink zookeeper-3.3.6 to zookeeper, then change into its conf directory:
/opt/> ln -s zookeeper-3.3.6 zookeeper
/opt/> cd zookeeper/conf
Step 5: Create the configuration file by copying zoo_sample.cfg to zoo.cfg:
/opt/zookeeper/conf/> cp zoo_sample.cfg zoo.cfg
Step 6: Edit zoo.cfg:
/opt/zookeeper/conf/> sudo vi zoo.cfg
and set the data directory: dataDir=/var/lib/zookeeper
Step 7: On server1, create a file named myid (no extension) under /var/lib/zookeeper and give it the unique id 1.
Add all the cluster servers at the bottom of zoo.cfg:
server.1=0.0.0.0:2888:3888
server.2=184.72.205.209:2888:3888
server.3=34.207.92.20:2888:3888
Step 8: On server2, likewise create a myid file under /var/lib/zookeeper and give it the unique id 2.
Step 9: Apply the same configuration on the second server, as below:
server.1=34.229.138.19:2888:3888
server.2=0.0.0.0:2888:3888
server.3=34.207.92.20:2888:3888
Step 10: Install the nc and lsof packages:
sudo yum install nc
sudo yum install lsof
Step 11: Now start ZooKeeper on all servers:
sudo /opt/zookeeper/bin/zkServer.sh start
Step 12: To stop the ZooKeeper server:
sudo /opt/zookeeper/bin/zkServer.sh stop
To check the status of the ZooKeeper server:
sudo /opt/zookeeper/bin/zkServer.sh status
Important points to note:
1. ZooKeeper needs 2F+1 servers, where F is the number of server failures to tolerate: to survive 1 failure you need (2*1)+1 = 3 servers, and to survive 2 failures you need (2*2)+1 = 5 servers.
2. All the servers should have the zoo.cfg configuration file, and the local server's own IP should be 0.0.0.0.
3. ZooKeeper uses port 2888 for the follower nodes to connect to the leader node.
4. Port 3888 is for peer-to-peer communication (leader election).
5. Leader election is handled automatically by ZooKeeper; if the leader goes down, a new leader is elected quickly and the follower information is shared with it.
6. In the zoo.cfg configuration file, the client port must be 2181.
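Putting the steps together, a minimal zoo.cfg for server1 might look like this (a sketch built from the example addresses above; tickTime, initLimit, and syncLimit are the usual sample defaults):
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=0.0.0.0:2888:3888
server.2=184.72.205.209:2888:3888
server.3=34.207.92.20:2888:3888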
I'm trying query a remote JVM with jps using jstatd, in order to eventually monitor it using VisualVM.
I got jstatd running with the following security policy:
grant codebase "file:${java.home}/../lib/tools.jar" {
permission java.security.AllPermission;
};
jstatd is running on a 64-bit Linux box with a 1.6.0_10 HotSpot VM. The jstatd command is:
jstatd -J-Djava.security.policy=jstatd.tools.policy -J-Djava.rmi.server.logCalls=true
I'm trying to run jps from a Windows 7 machine. Due to firewall restrictions, I'm tunneling the RMI data through an SSH tunnel to my Windows machine such that the jps command line is:
.\jps.exe -m -l rmi://localhost
When I run jps, I see the connection attempt in the jstatd log, which looks like this:
Feb 1, 2011 11:50:34 AM sun.rmi.server.UnicastServerRef logCall
FINER: RMI TCP Connection(3)-127.0.0.1: [127.0.0.1: sun.rmi.registry.RegistryImpl[0:0:0, 0]: java.rmi.Remote lookup(java.lang.String)]
but on the jps side I get the following error:
Error communicating with remote host: Connection refused to host: 192.168.1.137; nested exception is:
java.net.ConnectException: Connection refused: connect
Based on the connection attempt listed in the jstatd log, I think jps is actually reaching the host, but for some reason is getting blocked. Is there some security policy I have to set, or some other setting somewhere I can change, so that I can get jps to pull stats from the remote jstatd?
My guess is that you're only forwarding the RMI registry port (1099), but jstatd also listens on a second, randomly chosen port that you need to forward as well.
Check which ports jstatd is listening on, on the remote side:
# netstat -nap | grep jstatd
tcp 0 0 :::1099 :::* LISTEN 453/jstatd
tcp 0 0 :::58204 :::* LISTEN 453/jstatd
In this case you will need to forward port 58204 as well as 1099
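For example, with the ports above, the extra tunnel from the Windows machine could look like this (user and host are placeholders); note that the second port is chosen at random each time jstatd starts, so it must be re-checked after every restart:
ssh -L 1099:localhost:1099 -L 58204:localhost:58204 user@remote-host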
Here is how you could easily do this.
Launch ejstatd on your remote host this way (executing from the ejstatd folder): mvn exec:java -Dexec.args="-pr 2000 -ph 2001 -pv 2002"
Open those 3 ports on your remote host and make them available to your local machine: 2000, 2001, and 2002.
On your local machine, you will be able to use jps by replacing <remotehost> with your remote host name: jps -m -l rmi://<remotehost>:2000
Disclaimer: I'm the author of the open source ejstatd tool