Hadoop job hangs at ACCEPTED, with java.net.UnknownHostException in the YARN ResourceManager log

As described in the title, I deployed a Hadoop v2.6.3 cluster on an internal network with static IPs like 10.0.0.x.
Then I ran an example WordCount program. However, the shell just gives the following output and hangs:
hadoop jar wc.jar WordCount /user/alex/data/kaggle.sample /user/alex/wc/output
16/04/06 10:44:29 INFO client.RMProxy: Connecting to ResourceManager at master/10.0.0.7:8032
16/04/06 10:44:29 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/04/06 10:44:30 INFO input.FileInputFormat: Total input paths to process : 1
16/04/06 10:44:30 INFO mapreduce.JobSubmitter: number of splits:1
16/04/06 10:44:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1459942813464_0002
16/04/06 10:44:30 INFO impl.YarnClientImpl: Submitted application application_1459942813464_0002
16/04/06 10:44:30 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1459942813464_0002/
16/04/06 10:44:30 INFO mapreduce.Job: Running job: job_1459942813464_0002
Then I went to the Hadoop cluster web UI and found that the job status is ACCEPTED, not RUNNING. I checked the YARN ResourceManager log file, and its last ERROR message looks like this:
2016-04-06 10:34:42,466 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: Error trying to assign container token and NM token to an allocated container container_1459942813464_0001_02_000001
java.lang.IllegalArgumentException: java.net.UnknownHostException: worker14.alex
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374)
at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:256)
at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:220)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.pullNewlyAllocatedContainersAndNMTokens(SchedulerApplicationAttempt.java:448)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.getAllocation(FiCaSchedulerApp.java:269)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:896)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:937)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:930)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:842)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:823)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: worker14.alex
... 19 more
The Hadoop configuration files are as follows:
#core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:8020/</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/alex/hadoop-2.6.3/tmp/</value>
</property>
</configuration>
#yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/home/alex/hadoop-2.6.3/tmp/nm.local</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/home/alex/hadoop-2.6.3/log/nm.log</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
#mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>10.0.0.7:10020</value>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/home/alex/hadoop-2.6.3/tmp/staging</value>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/home/alex/hadoop-2.6.3/tmp/mr-history/tmp</value>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/home/alex/hadoop-2.6.3/tmp/mr-history/done</value>
</property>
</configuration>
The /etc/hosts file maps the IPs to master and worker1 through worker14.
The slaves file lists master and worker1 through worker14.
It seems that my hostname resolution goes wrong: the node resolves as worker14.alex rather than worker14 (alex is my Linux username).
So what's wrong with my configuration? Do I need to restart all the servers, or just restart some services, like service networking restart?
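For reference, checks like these show where the stray .alex suffix can come from (the worker14 IP below is hypothetical; only the 10.0.0.x scheme is given above):
# run on the ResourceManager host and on worker14
hostname                 # expected: worker14, not worker14.alex
hostname -f              # the fully-qualified name the daemons report
getent hosts worker14    # what /etc/hosts (or DNS) returns for the name
getent hosts 10.0.0.21   # reverse lookup; 10.0.0.21 is a made-up worker14 IP
# a typical /etc/hosts layout for such a cluster (illustrative):
# 10.0.0.7    master
# 10.0.0.21   worker14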

Were you able to get to a resolution? I'm seeing the exact same issue; I get a Caused by: java.net.UnknownHostException: var exception. – Nishant Kelkar
Check this value in your yarn-site.xml:
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/var/log/hadoop-yarn/apps</value>
If you put "hdfs://" before the path, the error occurs.
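That is, the value should be a plain path, for example (a sketch mirroring the snippet above):
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/var/log/hadoop-yarn/apps</value>
</property>
With hdfs:// in front, the first path segment (var) gets parsed as a hostname, which matches the UnknownHostException: var in the comment above.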

Related

Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException) for hadoop 3.1.3

I am trying to run a MapReduce job on Hadoop 3.1.3 but I am getting the following error:
hadoop jar WordCount.jar WordcountDemo.WordCount /mapwork/Mapwork /r_out
Error
2020-04-04 19:59:11,379 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2020-04-04 19:59:12,499 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2020-04-04 19:59:12,569 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/tejashri/.staging/job_1586009643433_0007
2020-04-04 19:59:12,727 WARN hdfs.DataStreamer: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/hadoop-yarn/staging/tejashri/.staging/job_1586009643433_0007/job.jar could only be written to 0 of the 1 minReplication nodes. There are 0 datanode(s) running and 0 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2205)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2731)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:892)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:568)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1000)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2916)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1545)
at org.apache.hadoop.ipc.Client.call(Client.java:1491)
at org.apache.hadoop.ipc.Client.call(Client.java:1388)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:514)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1081)
at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1866)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1668)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:716)
2020-04-04 19:59:12,734 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/tejashri/.staging/job_1586009643433_0007
Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/hadoop-yarn/staging/tejashri/.staging/job_1586009643433_0007/job.jar could only be written to 0 of the 1 minReplication nodes. There are 0 datanode(s) running and 0 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2205)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2731)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:892)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:568)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1000)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2916)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1545)
at org.apache.hadoop.ipc.Client.call(Client.java:1491)
at org.apache.hadoop.ipc.Client.call(Client.java:1388)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:514)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1081)
at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1866)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1668)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:716)
Update (from comments):
core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>C:\hadoop\hdfstmp</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>C:\hadoop\data\namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>C:\hadoop\data\datanode</value>
</property>
<property>
<name>dfs.datanode.failed.volumes.tolerated</name>
<value>0</value>
</property>
</configuration>
Output of jps:
16832 NodeManager
5556 ResourceManager
18280 NameNode
11708 Jps
datanode error log:
2020-04-04 21:42:25,150 WARN common.Storage: Failed to add storage directory [DISK]file:/C:/hadoop/data/datanode
java.io.IOException: Incompatible clusterIDs in C:\hadoop\data\datanode: namenode clusterID = CID-199fd5c5-1f1d-4c44-9e39-80995486695e; datanode clusterID = CID-16d0af22-57e1-4531-a5c8-4bf3eefd351d
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:744)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadStorageDirectory(DataStorage.java:294)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadDataStorage(DataStorage.java:407)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:387)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:559)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1743)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1679)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:390)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:282)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:822)
at java.lang.Thread.run(Thread.java:748)
2020-04-04 21:42:25,156 ERROR datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid 7578b7ba-c42a-476b-abc2-2088b15b3474) service to localhost/127.0.0.1:9000. Exiting.
java.io.IOException: All specified directories have failed to load.
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:560)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1743)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1679)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:390)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:282)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:822)
at java.lang.Thread.run(Thread.java:748)
2020-04-04 21:42:25,158 WARN datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid 7578b7ba-c42a-476b-abc2-2088b15b3474) service to localhost/127.0.0.1:9000
2020-04-04 21:42:25,261 INFO datanode.DataNode: Removed Block pool <registering> (Datanode Uuid 7578b7ba-c42a-476b-abc2-2088b15b3474)
2020-04-04 21:42:27,274 WARN datanode.DataNode: Exiting Datanode
The MapReduce job fails because it is unable to access HDFS, since "There are 0 datanode(s) running and 0 node(s) are excluded in this operation."
From the datanode logs, we can see that the datanode daemon is unable to register itself with the HDFS cluster due to incompatible clusterIDs.
When a namenode is formatted (during installation and setup), a clusterID is generated, and this clusterID is stored in the VERSION file of each daemon when it initializes. This clusterID acts as the identifier for the datanodes, letting them rejoin the cluster whenever they are stopped and started.
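You can confirm the mismatch directly: each daemon records its clusterID in a VERSION file under the current/ subdirectory of its storage dir (paths below follow the hdfs-site.xml above):
type C:\hadoop\data\namenode\current\VERSION
type C:\hadoop\data\datanode\current\VERSION
The two clusterID lines should be identical; here they are not, as the log already shows.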
To get the cluster back in shape:
Stop the cluster
Delete the contents of the directories C:\hadoop\hdfstmp, C:\hadoop\data\namenode, and C:\hadoop\data\datanode
Format the namenode
Start the cluster
You will then have to recopy the data required for the MapReduce job and rerun the job; these steps are sketched below.
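A sketch of those steps for this Windows setup (paths from the hdfs-site.xml above; the .cmd scripts live in Hadoop's sbin directory):
stop-yarn.cmd
stop-dfs.cmd
rmdir /s /q C:\hadoop\hdfstmp
rmdir /s /q C:\hadoop\data\namenode
rmdir /s /q C:\hadoop\data\datanode
hdfs namenode -format
start-dfs.cmd
start-yarn.cmd
hdfs dfsadmin -report
The final dfsadmin -report should now show one live datanode.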
I do not have the option to shut down and restart my cluster. However, running the following command solved the problem without causing any other issue that I could see.
hdfs dfsadmin -safemode leave
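To confirm the state before and after:
hdfs dfsadmin -safemode get
hdfs dfsadmin -safemode leave
Note that safemode here is usually a symptom of the missing datanodes; the clusterID fix above addresses the cause.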
See the following:
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#DFSAdmin_Command
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Safemode

Running MapReduce word count on Hadoop gives Exception message: The system cannot find the path specified

This is my first Stack Overflow question ever. I've set up my Hadoop (2.9.2) single-node cluster in pseudo-distributed mode. When I try to run hadoop jar C:/MapReduceClient.jar wordcount /input_dir /output_dir, I get the following log with errors:
19/01/16 20:19:17 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/01/16 20:19:18 INFO input.FileInputFormat: Total input files to process : 1
19/01/16 20:19:19 INFO mapreduce.JobSubmitter: number of splits:1
19/01/16 20:19:19 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/01/16 20:19:19 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1547662294790_0002
19/01/16 20:19:19 INFO impl.YarnClientImpl: Submitted application application_1547662294790_0002
19/01/16 20:19:19 INFO mapreduce.Job: The url to track the job: http://DESKTOP-XXXXXX:8088/proxy/application_1547662294790_0002/
19/01/16 20:19:19 INFO mapreduce.Job: Running job: job_1547662294790_0002
19/01/16 20:19:27 INFO mapreduce.Job: Job job_1547662294790_0002 running in uber mode : false
19/01/16 20:19:27 INFO mapreduce.Job: map 0% reduce 0%
19/01/16 20:19:27 INFO mapreduce.Job: Job job_1547662294790_0002 failed with state FAILED due to: Application application_1547662294790_0002 failed 2 times due to AM Container for appattempt_1547662294790_0002_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2019-01-16 20:19:25.234]Exception from container-launch.
Container id: container_1547662294790_0002_02_000001
Exit code: 1
Exception message: The system cannot find the path specified.
The system cannot find the path specified.
The system cannot find the path specified.
[2019-01-16 20:19:25.236]Container exited with a non-zero exit code 1.
[2019-01-16 20:19:25.236]Container exited with a non-zero exit code 1.
For more detailed output, check the application tracking page: http://DESKTOP-XXXXX:8088/cluster/app/application_1547662294790_0002 Then click on links to logs of each attempt
. Failing the application.
19/01/16 20:19:28 INFO mapreduce.Job: Counters: 0
The same setup with the same .jar works on my other PC, and the output there is correct. Both machines run Windows 10 Pro x64.
The only difference is that the working one has Java 1.8.0_171 installed.
JAVA_HOME= C:\Java\jdk1.8.0_201
HADOOP_HOME= C:\hadoop-2.9.2
PATH=%JAVA_HOME%\bin;C:\hadoop-2.9.2\bin
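A quick sanity check of that environment on both machines (a sketch; run from a fresh cmd window so the variables are current):
echo %JAVA_HOME%
echo %HADOOP_HOME%
java -version
hadoop version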
My config files:
/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>C:\hadoop-2.9.2\data\namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>C:\hadoop-2.9.2\data\datanode</value>
</property>
</configuration>
/etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.auxservices.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
Solved. It was the default user name containing non-Latin characters that was messing up the NodeManager. I checked with a whoami command, only to find out that the default user name was "???????".

HBase access from Java API Client

I'm having some trouble accessing HBase from the Java API client, and I can't figure out what I'm doing wrong.
I'm using HBase 1.1.2 in standalone mode on a VM (10.166.205.41) with RHEL 6 and Java 1.7.
Here is my HBase configuration from hbase-site.xml:
<configuration>
<property>
<name>hbase.zookeeper.quorum</name>
<value>10.166.205.41</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>9091</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>file:///usr/local/hbaserootdir/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/usr/local/hbaserootdir/zookeeper</value>
</property>
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
</property>
</configuration>
My regionservers file is defined as follows:
10.166.205.41
The HBase shell client is working fine, and I can access the HBase master UI at 10.166.205.41:16010.
Here is my Java API client, run from Eclipse on Windows 7.
Pom.xml
<dependencies>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>1.1.2</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase</artifactId>
<version>1.1.2</version>
<type>pom</type>
</dependency>
</dependencies>
Source code:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.log4j.Logger;

public class InsertData {
    final static Logger logger = Logger.getLogger(InsertData.class);

    public static void main(String[] args) throws IOException {
        // Point the client at the standalone HBase's ZooKeeper
        Configuration config = HBaseConfiguration.create();
        config.setInt("timeout", 120000);
        config.set("hbase.zookeeper.quorum", "10.166.205.41");
        config.set("hbase.zookeeper.property.clientPort", "9091");
        Connection connection = ConnectionFactory.createConnection(config);
        Table table = connection.getTable(TableName.valueOf("emp"));
        try {
            // Read column "personal data:name" of row "1"
            Get g = new Get(Bytes.toBytes("1"));
            Result result = table.get(g);
            byte[] name = result.getValue(Bytes.toBytes("personal data"), Bytes.toBytes("name"));
            logger.info("name : " + Bytes.toString(name));
        } finally {
            table.close();
            connection.close();
        }
    }
}
During execution, the connection to server 10.166.205.41:41571 fails:
2018-12-14 16:35:19 DEBUG FailedServers:56 - Added failed server with address hlzudd5hdf01.yres.ytech/10.166.205.41:41571 to list caused by org.apache.hbase.thirdparty.io.netty.channel.ConnectTimeoutException: connection timed out: hlzudd5hdf01.yres.ytech/10.166.205.41:41571
2018-12-14 16:35:19 DEBUG ClientCnxn:843 - Reading reply sessionid:0x167ad4d815d000b, packet:: clientPath:/hbase/meta-region-server serverPath:/hbase/meta-region-server finished:false header:: 3,4 replyHeader:: 3,4697,0 request:: '/hbase/meta-region-server,F response:: #ffffffff0001a726567696f6e7365727665723a3431353731ffffffa0ffffffe9ffffff80fffffffd5611ffffff8c6a50425546a24a17686c7a7564643568646630312e797265732e797465636810ffffffe3ffffffc4218ffffffceffffff93ffffffb6ffffffeafffffffa2c100183,s{4519,4519,1544800813291,1544800813291,0,0,0,0,77,0,4519}
2018-12-14 16:35:19 DEBUG ClientCnxn:742 - Got ping response for sessionid: 0x167ad4d815d000b after 38ms
2018-12-14 16:35:19 DEBUG AbstractRpcClient:349 - Not trying to connect to hlzudd5hdf01.yres.ytech/10.166.205.41:41571 this server is in the failed servers list
2018-12-14 16:35:19 DEBUG ClientCnxn:843 - Reading reply sessionid:0x167ad4d815d000b, packet:: clientPath:/hbase/meta-region-server serverPath:/hbase/meta-region-server finished:false header:: 4,4 replyHeader:: 4,4697,0 request:: '/hbase/meta-region-server,F response:: #ffffffff0001a726567696f6e7365727665723a3431353731ffffffa0ffffffe9ffffff80fffffffd5611ffffff8c6a50425546a24a17686c7a7564643568646630312e797265732e797465636810ffffffe3ffffffc4218ffffffceffffff93ffffffb6ffffffeafffffffa2c100183,s{4519,4519,1544800813291,1544800813291,0,0,0,0,77,0,4519}
On the HBase master UI this is the region server address, and clicking that link doesn't load a page either.
I packaged my program as a jar file, and running it on the VM itself works fine, which makes me think it could be a port access issue.
Running netstat -tanp | grep LISTEN on my RHEL6 VM tells me the region server port is listening:
tcp 0 0 10.166.205.41:41571 0.0.0.0:* LISTEN 26322/java
It seems I don't have any firewall running, so I don't know why the connection fails. Maybe it is something else.
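One way to test the port-access theory from the Windows client (the hostname comes from the debug log above; the telnet client may need to be enabled in Windows features):
ping hlzudd5hdf01.yres.ytech
telnet 10.166.205.41 41571
The debug log suggests the name already resolves (the IP appears after the slash), so if telnet cannot connect, something between the Windows client and the VM, such as VM NAT or network-level filtering, is likely dropping traffic to that port even with the VM's own firewall off.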
I'm out of ideas on how to fix this issue, so if you could help me, that would be much appreciated ^^
Thanks a lot.

Unable to start jobtracker and tasktracker

I am able to start the namenode and secondary namenode, but I am not able to start the jobtracker and tasktracker.
When I check the log, it shows something like this:
************************************************************/
2013-05-30 07:27:50,962 FATAL org.apache.hadoop.conf.Configuration: bad conf file: top-level element not <configuration>
2013-05-30 07:27:50,963 WARN org.apache.hadoop.conf.Configuration: bad conf file: element not <property>
2013-05-30 07:27:50,963 WARN org.apache.hadoop.conf.Configuration: bad conf file: element not <property>
2013-05-30 07:27:50,963 WARN org.apache.hadoop.conf.Configuration: bad conf file: element not <property>
2013-05-30 07:27:51,204 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2013-05-30 07:27:51,360 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2013-05-30 07:27:51,365 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2013-05-30 07:27:51,365 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: TaskTracker metrics system started
2013-05-30 07:27:51,440 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2013-05-30 07:27:51,587 FATAL org.apache.hadoop.conf.Configuration: bad conf file: top-level element not <configuration>
2013-05-30 07:27:51,587 WARN org.apache.hadoop.conf.Configuration: bad conf file: element not <property>
2013-05-30 07:27:51,588 WARN org.apache.hadoop.conf.Configuration: bad conf file: element not <property>
2013-05-30 07:27:51,588 WARN org.apache.hadoop.conf.Configuration: bad conf file: element not <property>
2013-05-30 07:27:51,594 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2013-05-30 07:27:51,603 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2013-05-30 07:27:51,773 FATAL org.apache.hadoop.conf.Configuration: bad conf file: top-level element not <configuration>
2013-05-30 07:27:51,773 WARN org.apache.hadoop.conf.Configuration: bad conf file: element not <property>
2013-05-30 07:27:51,773 WARN org.apache.hadoop.conf.Configuration: bad conf file: element not <property>
2013-05-30 07:27:51,773 WARN org.apache.hadoop.conf.Configuration: bad conf file: element not <property>
2013-05-30 07:27:51,799 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.lang.IllegalArgumentException: Does not contain a valid host:port authority: local
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:130)
at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:2121)
at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1540)
at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3937)
2013-05-30 07:27:51,805 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down TaskTracker at ubuntu2/192.168.44.131
************************************************************/
There is some problem with your config files: core-site.xml, hdfs-site.xml and mapred-site.xml. As per the error message:
java.lang.IllegalArgumentException: Does not contain a valid host:port
authority: local
Check for correct values and make sure the XML is valid (a small typo can ruin it).
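In this case the stack trace points at JobTracker.getAddress, i.e. the mapred.job.tracker value in mapred-site.xml, which must be a real host:port pair rather than local. A sketch (hostname and port are placeholders; ubuntu2 is taken from the shutdown message above):
<property>
<name>mapred.job.tracker</name>
<value>ubuntu2:54311</value>
</property>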
Try these: hadoop job tracker cannot start up and Error in starting hadoop Job Tracker

Hadoop datanode fails to start throwing org.apache.hadoop.hdfs.server.common.Storage: Cannot lock storage

I have some problems trying to start a datanode in Hadoop. From the log I can see that the datanode is started twice (partial log follows):
2012-05-22 16:25:00,369 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = master/192.168.0.1
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.0.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1243785; compiled by 'hortonfo' on Tue Feb 14 08:15:38 UTC 2012
************************************************************/
2012-05-22 16:25:00,375 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = master/192.168.0.1
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.0.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1243785; compiled by 'hortonfo' on Tue Feb 14 08:15:38 UTC 2012
************************************************************/
2012-05-22 16:25:00,490 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2012-05-22 16:25:00,500 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2012-05-22 16:25:00,500 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2012-05-22 16:25:00,500 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2012-05-22 16:25:00,512 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2012-05-22 16:25:00,523 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2012-05-22 16:25:00,523 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2012-05-22 16:25:00,524 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2012-05-22 16:25:00,722 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2012-05-22 16:25:00,724 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2012-05-22 16:25:00,727 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2012-05-22 16:25:00,729 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2012-05-22 16:20:15,894 INFO org.apache.hadoop.hdfs.server.common.Storage: Cannot lock storage /app/hadoop/tmp/dfs/data. The directory is already locked.
2012-05-22 16:20:16,008 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Cannot lock storage /app/hadoop/tmp/dfs/data. The directory is already locked.
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:602)
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:455)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:111)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:385)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:299)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1582)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1521)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1539)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1665)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1682)
I've searched online and found this question, but I didn't override anything in conf/hdfs-site.xml, which is shown below, so Hadoop should be using default values that (as described here) cannot cause any failed lock.
This is my conf/hdfs-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
</configuration>
This is my conf/core-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>
This is the content of hadoop/conf/slaves:
master
slave
Stop the datanode
Remove the in_use.lock file from the dfs data dir location
Start the datanode again
It should work just fine; a sketch of these steps follows.
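A minimal sketch, assuming the hadoop.tmp.dir from this question (/app/hadoop/tmp) and the Hadoop 1.x daemon scripts:
hadoop-daemon.sh stop datanode
rm /app/hadoop/tmp/dfs/data/in_use.lock
hadoop-daemon.sh start datanode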
Also, add the following two properties in your hdfs-site.xml file:
<property>
<name>dfs.name.dir</name>
<value>/some_path</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/some_path</value>
</property>
Their default location is under /tmp; because of this, you lose data on each restart.
I was facing a similar issue, but I read a post that said dfs.name.dir and dfs.data.dir should be different from each other. I had the two set to the same path, and changing them to be different from each other fixed my issue.
I had the same issue, and setting umask to 0022 in the shell where I was running the tests fixed it for me.
sudo rm -r /tmp/hadoop-[YourNameHere]/dfs/data/
And restart dfs.
