Zookeeper cluster set up - java

I am able to set up a ZooKeeper cluster on one machine with three different ports, but when I do the same with different IPs, so that each ZooKeeper instance runs on a different machine, it throws the following error:
2014-11-20 12:16:24,819 [myid:1] - INFO [main:QuorumPeerMain@127] - Starting quorum peer
2014-11-20 12:16:24,827 [myid:1] - INFO [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:2181
2014-11-20 12:16:24,842 [myid:1] - INFO [main:QuorumPeer@959] - tickTime set to 2000
2014-11-20 12:16:24,842 [myid:1] - INFO [main:QuorumPeer@979] - minSessionTimeout set to -1
2014-11-20 12:16:24,842 [myid:1] - INFO [main:QuorumPeer@990] - maxSessionTimeout set to -1
2014-11-20 12:16:24,842 [myid:1] - INFO [main:QuorumPeer@1005] - initLimit set to 10
2014-11-20 12:16:24,857 [myid:1] - INFO [Thread-1:QuorumCnxManager$Listener@504] - My election bind port: /172.16.1.175:2223
2014-11-20 12:16:24,870 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer@714] - LOOKING
2014-11-20 12:16:24,873 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@815] - New election. My id = 1, proposed zxid=0x0
2014-11-20 12:16:24,876 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 1 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x0 (n.peerEpoch) LOOKING (my state)
2014-11-20 12:16:24,881 [myid:1] - WARN [WorkerSender[myid=1]:QuorumCnxManager@382] - Cannot open channel to 2 at election address /172.16.1.170:2223
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
at java.lang.Thread.run(Thread.java:744)

Have you started ZooKeeper on all three nodes? In a multi-server setup (assuming you have a distributed environment with multiple machines), every server knows about the other nodes in the cluster, which is known as the ensemble. It learns about them from the following lines in the zoo.cfg file:
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
The multi-server setup doc page says:
As long as a majority of the ensemble are up, the service will be available. Because Zookeeper requires a majority, it is best to use an odd number of machines. For example, with four machines ZooKeeper can only handle the failure of a single machine; if two machines fail, the remaining two machines do not constitute a majority. However, with five machines ZooKeeper can handle the failure of two machines
Until you start the process on the other nodes, a server cannot reach its peers and will keep logging errors like these. This will probably help you get somewhere.
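The "Connection refused" in the log means nothing was listening on the peer's election port when server 1 tried to connect. A minimal Python sketch for checking this from one node before digging further; the function name `can_connect` and the 2 second timeout are assumptions made for illustration, not part of ZooKeeper.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when the peer cannot be reached.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run from server 1, `can_connect("172.16.1.170", 2223)` should return True once ZooKeeper is running on server 2 and no firewall blocks the port.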

How to Setup Zookeeper for Multiple Clusters or Remote servers?
Step 1: Check that Java 1.8.0 or later is available on the system:
/opt/> java -version
Step 2: Download zookeeper-3.3.6 with the command below:
sudo wget http://redrockdigimark.com/apachemirror/zookeeper/zookeeper-3.3.6/zookeeper-3.3.6.tar.gz
Step 3: Extract the archive with the command below:
sudo tar xzf zookeeper-3.3.6.tar.gz -C /opt/
Step 4: Symlink zookeeper-3.3.6 to zookeeper, then change to the conf directory:
/opt/> ln -s zookeeper-3.3.6 zookeeper
/opt/> cd zookeeper/conf
Step 5: Create the configuration file zoo.cfg by copying zoo_sample.cfg:
/opt/zookeeper/conf/> cp zoo_sample.cfg zoo.cfg
Step 6: Edit zoo.cfg:
/opt/zookeeper/conf/> sudo vi zoo.cfg
Set the data directory with dataDir=/var/lib/zookeeper (the key is case-sensitive).
Step 7: On server 1, create a file named myid (no extension) under /var/lib/zookeeper
and put the unique id 1 in it.
Add all the cluster servers at the bottom of zoo.cfg as:
server.1=0.0.0.0:2888:3888
server.2=184.72.205.209:2888:3888
server.3=34.207.92.20:2888:3888
Step 8: On server 2, create a file named myid (no extension) under /var/lib/zookeeper
and put the unique id 2 in it.
Step 9: Apply the same configuration on the second server, with the server list as below:
server.1=34.229.138.19:2888:3888
server.2=0.0.0.0:2888:3888
server.3=34.207.92.20:2888:3888
Step 10: Install the nc and lsof packages as below:
sudo yum install nc
sudo yum install lsof
Step 11: Now start ZooKeeper on all servers:
sudo /opt/zookeeper/bin/zkServer.sh start
Step 12: To stop the ZooKeeper server:
sudo /opt/zookeeper/bin/zkServer.sh stop
To check the status of the ZooKeeper server:
sudo /opt/zookeeper/bin/zkServer.sh status
Important points to note:
1. To tolerate F failed servers, ZooKeeper needs 2F+1 servers: to survive 1 failure you need (2*1)+1=3 servers, and to survive 2 failures you need (2*2)+1=5 servers. F stands for the number of failures to tolerate.
2. Every server needs a zoo.cfg configuration file, and in each file the local server's own IP should be 0.0.0.0.
3. Port 2888 is used by followers to connect to the leader.
4. Port 3888 is used for peer-to-peer communication during leader election.
5. Leader election is handled by ZooKeeper automatically: if the leader goes down, the remaining servers elect a new leader, which then shares state with the followers.
6. In zoo.cfg the client port is set with clientPort; 2181 is the conventional value.
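The 2F+1 rule in point 1 can be checked with a few lines of arithmetic, reading F as the number of failed servers the ensemble should survive. This is a small Python sketch of the majority rule only; the function names are made up for illustration and nothing here talks to ZooKeeper.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers may fail while a majority is still up."""
    return (ensemble_size - 1) // 2


def servers_needed(failures: int) -> int:
    """Ensemble size needed to survive the given number of failures (2F+1)."""
    return 2 * failures + 1
```

With these definitions an ensemble of 4 tolerates the same single failure as an ensemble of 3, which is why odd sizes are usually recommended.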

Related

DataNode cannot connect with Name Node - "org.apache.hadoop.ipc.Client: Retrying connect to server"

I've deployed a Hadoop 3.1.2 cluster with 1 NameNode and 2 DataNodes. On the master node the NameNode, SecondaryNameNode and ResourceManager are all up; however, the DataNodes cannot connect to the NameNode, so no capacity is shown.
I've been trying to find out what the error might be, but haven't succeeded so far.
I removed domain name resolution because I was getting odd errors:
WARNING: Attempting to start all Apache Hadoop daemons as hadoop in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [server]
lim_sbo_bigdata_master: ERROR: Cannot set priority of namenode process 11606
Starting datanodes
Starting secondary namenodes [server]
lim_sbo_bigdata_master: ERROR: Cannot set priority of secondarynamenode process 11825
Starting resourcemanager
Starting nodemanagers
* SELinux is disabled
* IPtables is OPEN for all traffic:
[hadoop@lim_server]$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
Servers belong to the same network.
NameNode:
[hadoop@server ~]$ hadoop version
Hadoop 3.1.2
Source code repository https://github.com/apache/hadoop.git -r 1019dde65bcf12e05ef48ac71e84550d589e5d9a
Compiled by sunilg on 2019-01-29T01:39Z
Compiled with protoc 2.5.0
From source with checksum 64b8bdd4ca6e77cce75a93eb09ab2a9
This command was run using /home/hadoop/hadoop-3.1.2/share/hadoop/common/hadoop-common-3.1.2.jar
[hadoop@server ~]$ jps
27089 Jps
26760 ResourceManager
26491 SecondaryNameNode
26239 NameNode
[hadoop@server ~]$ hdfs dfsadmin -report
Configured Capacity: 0 (0 B)
Present Capacity: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used: 0 (0 B)
DFS Used%: 0.00%
Replicated Blocks:
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
Erasure Coded Block Groups:
Low redundancy block groups: 0
Block groups with corrupt internal blocks: 0
Missing block groups: 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
DataNode Error
[hadoop@server_2]$ jps
17052 DataNode
17166 NodeManager
17406 Jps
2019-08-27 05:46:09,086 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 9867
2019-08-27 05:46:09,229 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened IPC server at /0.0.0.0:9867
2019-08-27 05:46:09,243 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh request received for nameservices: null
2019-08-27 05:46:09,251 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting BPOfferServices for nameservices: <default>
2019-08-27 05:46:09,260 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering> (Datanode Uuid unassigned) service to /10.30.17.228:9000 starting to offer service
2019-08-27 05:46:09,265 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2019-08-27 05:46:09,265 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 9867: starting
2019-08-27 05:46:10,330 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 10.30.17.228/10.30.17.228:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-08-27 05:46:11,331 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 10.30.17.228/10.30.17.228:9000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
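Try changing "localhost" in fs.default.name on the NameNode to the NameNode's actual hostname or IP address. With hdfs://localhost:9000 the NameNode listens only on the loopback interface, so DataNodes on other machines cannot reach port 9000.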
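A quick way to spot this kind of misconfiguration is to check whether the host part of the filesystem URI is a loopback name or address. The following is a minimal Python sketch; the function name `points_at_loopback` is hypothetical, and the check deliberately does no DNS lookup, so a hostname that resolves to 127.x.x.x through /etc/hosts would not be caught.

```python
import ipaddress
from urllib.parse import urlsplit


def points_at_loopback(fs_uri: str) -> bool:
    """Return True if the URI's host is 'localhost' or a loopback IP literal."""
    host = urlsplit(fs_uri).hostname
    if host is None:
        return False
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # A real hostname: this sketch does not resolve it.
        return False
```

For the values in this question, `points_at_loopback("hdfs://localhost:9000")` is True, while the address the DataNode is trying, `hdfs://10.30.17.228:9000`, gives False.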

Remote debug ec2 java instance

My Java application is running in a Docker container on EC2.
I'm exposing port 5005 for debugging, and locally it works perfectly. However, in the EC2 environment I get
java.net.ConnectException "Connection refused (Connection refused)"
when trying to connect using IntelliJ.
Security group is set to open ports 80, 5005, 22
Docker is exposing port 80 and 5005
Application is running with java args
-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=5005,suspend=n
Am I missing something ?
For those who are still interested, here is how to set up Remote JVM Debug on EC2 with Docker.
In the YAML file, add the 'ports' attribute:
ports:
- "5005:5005"
In the Dockerfile, run the JAR with the following option:
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005
In the inbound rules of the EC2 security group, add:
Custom TCP => 5005 => Your IP
In IntelliJ, create a Remote JVM Debug configuration:
Host: the host IP
Port: 5005
Choose JDK 9 or later, because the address must be given as *:5005. Since JDK 9 the debug agent listens only on localhost when the address is just a port number, so address=5005 is not reachable from outside the container.
Click the Debug button and it should work.
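The only difference between the failing and the working option is the address part. As an illustration, this small Python sketch assembles the option string for both cases; the helper name `jdwp_agent_arg` and its parameters are invented for this example and are not part of any JDK tooling.

```python
def jdwp_agent_arg(port: int, all_interfaces: bool = True, suspend: bool = False) -> str:
    """Build a -agentlib:jdwp option for a JVM that waits for a debugger."""
    # "*:5005" listens on every interface; a bare "5005" is local-only on JDK 9+.
    address = f"*:{port}" if all_interfaces else str(port)
    suspend_flag = "y" if suspend else "n"
    return (
        "-agentlib:jdwp=transport=dt_socket,server=y,"
        f"suspend={suspend_flag},address={address}"
    )
```

Calling `jdwp_agent_arg(5005)` yields the option used in the Dockerfile step above.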

Tachyon 0.8.2 deployed with Hadoop 2.6.0, but the IPC versions do not match

I want to deploy Tachyon 0.8.2 on Ubuntu 14.04. I already have Hadoop and Spark running:
on the master
bd@master$ jps
11871 Jps
3388 Master
2919 NameNode
3266 ResourceManager
3123 SecondaryNameNode
on the slave
bd@slave$ jps
4350 Jps
2778 NodeManager
2647 DataNode
2879 Worker
Then I edited tachyon-env.sh:
export TACHYON_MASTER_ADDRESS=${TACHYON_MASTER_ADDRESS:-master}
export TACHYON_UNDERFS_ADDRESS=${TACHYON_UNDERFS_ADDRESS:-hdfs://master:9000}
Then I ran bin/tachyon format and bin/tachyon-start.sh local.
I cannot see TachyonMaster in the jps output:
/usr/local/bigdata/tachyon-0.8.2 [06:06:32]
bd$ bin/tachyon-start.sh local
Killed 0 processes on master
Killed 0 processes on master
Connecting to master as bd...
Killed 0 processes on master
Connection to master closed.
[sudo] password for bd:
Formatting RamFS: /mnt/ramdisk (512mb)
Starting master @ master
Starting worker @ master
/usr/local/bigdata/tachyon-0.8.2 [06:06:54]
bd$ jps
12183 TachyonWorker
3388 Master
2919 NameNode
3266 ResourceManager
3123 SecondaryNameNode
12203 Jps
I checked master.logs, which says:
2015-12-27 18:06:50,635 ERROR MASTER_LOGGER (MetricsConfig.java:loadConfigFile) - Error loading metrics configuration file.
2015-12-27 18:06:51,735 ERROR MASTER_LOGGER (HdfsUnderFileSystem.java:<init>) - Exception thrown when trying to get FileSystem for hdfs://master:9000
org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4
at org.apache.hadoop.ipc.Client.call(Client.java:1070)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at com.sun.proxy.$Proxy1.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
at tachyon.underfs.hdfs.HdfsUnderFileSystem.<init>(HdfsUnderFileSystem.java:74)
at tachyon.underfs.hdfs.HdfsUnderFileSystemFactory.create(HdfsUnderFileSystemFactory.java:30)
at tachyon.underfs.UnderFileSystemRegistry.create(UnderFileSystemRegistry.java:116)
at tachyon.underfs.UnderFileSystem.get(UnderFileSystem.java:100)
at tachyon.underfs.UnderFileSystem.get(UnderFileSystem.java:83)
at tachyon.master.TachyonMaster.connectToUFS(TachyonMaster.java:412)
at tachyon.master.TachyonMaster.startMasters(TachyonMaster.java:280)
at tachyon.master.TachyonMaster.start(TachyonMaster.java:261)
at tachyon.master.TachyonMaster.main(TachyonMaster.java:64)
2015-12-27 18:06:51,742 ERROR MASTER_LOGGER (TachyonMaster.java:main) - Uncaught exception terminating Master
java.lang.IllegalArgumentException: All eligible Under File Systems were unable to create an instance for the given path: hdfs://master:9000
java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4
at tachyon.underfs.UnderFileSystemRegistry.create(UnderFileSystemRegistry.java:132)
at tachyon.underfs.UnderFileSystem.get(UnderFileSystem.java:100)
at tachyon.underfs.UnderFileSystem.get(UnderFileSystem.java:83)
at tachyon.master.TachyonMaster.connectToUFS(TachyonMaster.java:412)
at tachyon.master.TachyonMaster.startMasters(TachyonMaster.java:280)
at tachyon.master.TachyonMaster.start(TachyonMaster.java:261)
at tachyon.master.TachyonMaster.main(TachyonMaster.java:64)
What should I do for this problem?
This exception is caused by a version mismatch between the Hadoop client and server: the Hadoop client inside Tachyon speaks IPC version 4, while your Hadoop NameNode speaks IPC version 9. Check your Hadoop version, and then recompile Tachyon against that version using this command:
mvn -Dhadoop.version=your_hadoop_version clean install
Example: mvn -Dhadoop.version=2.4.0 clean install
Now configure your compiled Tachyon and it should work fine. Reference link.

Failed to write pid zookeeper installing zookeeper

I followed previous posts but am still unable to resolve the issue. I am trying to install ZooKeeper and start it in order to run Summingbird, which provides bolts/spouts to Storm for online and batch processing. I first installed ZooKeeper 3.4.6 and got a ClassNotFoundException. After looking at the post
ClassNotFoundException for Zookeeper while building Storm
I downgraded to version 3.3.6, and now I cannot even start the ZooKeeper server. Any help will be really appreciated.
root@cp-1:/users/username/zookeeper-3.3.6/bin# ./zkServer.sh start
JMX enabled by default
Using config: /users/username/zookeeper-3.3.6/bin/../conf/zoo.cfg
Starting zookeeper ... ./zkServer.sh: 93: [: /tmp/zookeeper/: unexpected operator
./zkServer.sh: 103: ./zkServer.sh: cannot create /tmp/zookeeper/
The number of snapshots to retain in dataDir/zookeeper_server.pid: Directory nonexistent
FAILED TO WRITE PID
This is what my zoo.cfg file looks like:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/tmp/zookeeper/
dataLogDir=/tmp/logs/zookeeper/
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=10.11.10.3:2888:3888
server.2=10.11.10.4:2888:3888
These are the directory permissions:
drwxr-xr-x 2 username oppts-PG0 4096 Nov 25 14:35 zookeeper
drwxr-xr-x 3 root root 4096 Nov 25 14:46 logs
drwxr-xr-x 2 root root 4096 Nov 25 14:46 logs/zookeeper
As stated in the comments in zoo.cfg, you had better not set dataDir to /tmp/zookeeper.
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
Try setting dataDir to another directory that you have created, and then restart with zkServer.sh.
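The error output in the question also shows the comment text "The number of snapshots to retain in dataDir" ending up inside the PID file path. One possible explanation, which is a guess from that output and not confirmed against the script's source, is that the start script finds dataDir with a plain text search and so also picks up the comment line that mentions dataDir. The Python sketch below only illustrates that difference; both function names and the SAMPLE text are made up for the example.

```python
from typing import List

SAMPLE = """\
dataDir=/tmp/zookeeper/
dataLogDir=/tmp/logs/zookeeper/
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
"""


def data_dir_naive(cfg_text: str) -> List[str]:
    """Mimic a plain substring search: every line mentioning dataDir matches."""
    return [line.split("=")[-1] for line in cfg_text.splitlines() if "dataDir" in line]


def data_dir_strict(cfg_text: str) -> List[str]:
    """Accept only real 'dataDir=...' settings and skip comment lines."""
    values = []
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue
        key, sep, value = stripped.partition("=")
        if sep and key.strip() == "dataDir":
            values.append(value.strip())
    return values
```

If this is what happens, deleting or rewording that comment line in zoo.cfg would be another thing to try alongside changing dataDir.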

NoRouteToHostException while hadoop fs -copyFromLocal

I installed Hadoop 2.5.1 on CentOS 7.0.
I'm using 3 computers with the hosts file below, which is the same on all 3 computers.
I'm not using DNS.
XXX.XXX.XXX.65 mccb-com65 #server
XXX.XXX.XXX.66 mccb-com66 #client01
XXX.XXX.XXX.67 mccb-com67 #client02
127.0.0.1 localhost
127.0.1.1 mccb-com65
I executed the command
$hadoop fs -copyFromLocal /home/hadoop/hdfs/hdfs/s_corpus.txt hdfs://XXX.XXX.XXX.65:9000/tmp/
and got the error message below:
INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1526)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1328)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1281)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:526)
15/02/27 16:57:40 INFO hdfs.DFSClient: Abandoning BP-1257634566-XXX.XXX.XXX.65-1425014347197:blk_1073741837_1013
15/02/27 16:57:40 INFO hdfs.DFSClient: Excluding datanode XXX.XXX.XXX.67:50010 <-- the same happens for the other slave node, XXX.XXX.XXX.66
I turned off the firewall on both mccb-com66 and mccb-com67, as the status below shows.
$systemctl status iptables
iptables.service - IPv4 firewall with iptables
Loaded: loaded (/usr/lib/systemd/system/iptables.service; disabled)
Active: inactive (dead)
Additionally, I also turned off SELinux.
The DataNode and NodeManager are alive on both machines.
I can check their state with jps and at
http://mccb-com65:50070 and
http://mccb-com65:8088
What am I missing?
Could anybody help me?
Turning off iptables was not a valid solution, because CentOS 7 uses firewalld by default.
After I opened the ports one by one with firewall-cmd, it worked.
On all slaves (66 and 67), with --permanent so that the rules survive the reload:
$firewall-cmd --permanent --zone=public --add-port=8042/tcp
$firewall-cmd --permanent --zone=public --add-port=50010/tcp
$firewall-cmd --permanent --zone=public --add-port=50020/tcp
$firewall-cmd --permanent --zone=public --add-port=50075/tcp
$firewall-cmd --reload
However, since I cannot open one by one every port that a Hadoop application needs,
turning off firewalld is a reasonable alternative:
$systemctl stop firewalld
$systemctl disable firewalld
Then check the status:
$systemctl status firewalld
Your /etc/hosts should contain:
XXX.XXX.XXX.65 mccb-com65 #server
XXX.XXX.XXX.66 mccb-com66 #client01
XXX.XXX.XXX.67 mccb-com67 #client02
Remove these lines:
127.0.0.1 localhost
127.0.1.1 mccb-com65
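The entry most likely to cause trouble here is the machine's own hostname mapped to a loopback address such as 127.0.1.1, since daemons that look up that name may then bind to an address other machines cannot reach. A minimal Python sketch that lists such entries in a hosts file follows; the function name `loopback_aliases` is hypothetical, and masked addresses like XXX.XXX.XXX.65 are simply skipped.

```python
import ipaddress


def loopback_aliases(hosts_text: str) -> dict:
    """Map each non-localhost name to the loopback address it is bound to."""
    found = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        address, *names = line.split()
        try:
            is_loopback = ipaddress.ip_address(address).is_loopback
        except ValueError:
            continue  # masked or malformed address
        if is_loopback:
            for name in names:
                if name != "localhost":
                    found[name] = address
    return found
```

On the hosts file from the question this reports only mccb-com65 at 127.0.1.1, which is the mapping to remove.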
