I was following the tutorial to run WordCount.java mentioned here, and when I run the following line from the tutorial
hadoop jar wordcount.jar org.myorg.WordCount /user/cloudera/wordcount/input /user/cloudera/wordcount/output
I get the following error -
17/09/04 01:57:29 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/09/04 01:57:30 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
The docker image that I used was docker pull cloudera/quickstart
I couldn't find any setup tutorials for Hadoop with Docker, so it would be helpful if you could tell me what configuration changes need to be made to overcome this issue.
That tutorial assumes you are inside the cluster, with the hadoop client command available and the Hadoop services started and properly configured.
0.0.0.0:8032 is the default YARN ResourceManager address, so you need to configure the XML files in your HADOOP_CONF_DIR (specifically yarn-site.xml for this error) to point at the Docker container for the correct YARN addresses. core-site.xml and hdfs-site.xml will need to be configured to point at HDFS as well.
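As a rough sketch of the relevant properties, assuming the container is reachable under the hostname quickstart.cloudera (the usual hostname of the Cloudera quickstart image; substitute your container's actual hostname or IP, and note that 8020 is just the common CDH NameNode port), yarn-site.xml would contain something like:

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>quickstart.cloudera</value>
</property>

and core-site.xml something like:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://quickstart.cloudera:8020</value>
</property>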
I have a Spring Boot app in Docker that runs on Heroku.
Recently, after updating Tomcat to 10.1.0-M10, I started getting this error:
Error R10 (Boot timeout) -> Web process failed to bind to $PORT within
60 seconds of launch
My first thought was to downgrade, but that doesn't work because of vulnerabilities in the earlier versions. I looked into possible causes and it appears to be a Tomcat port-binding issue.
I cannot hard-code a fixed port in the configuration, as I am deploying to Heroku and depend on the random port it assigns.
My Dockerfile:
FROM azul/zulu-openjdk-alpine:11
ENV PORT=$PORT
COPY /target/app.jar /app.jar
CMD java -Xms256m -Xmx512m \
-Dlog4j2.formatMsgNoLookups=true \
-Djava.security.egd=file:/dev/./urandom \
-Dserver.port=$PORT \
-jar /app.jar
What is the way to solve it? Is there anything I am missing?
UPDATE:
There are more logs from Heroku:
Feb 22 12:50:16 integration-test app/web.1 2022-02-22 20:50:16.057 [main] INFO c.g.s.z.ApplicationKt - Started ApplicationKt in 8.09 seconds (JVM running for 9.062)
Feb 22 12:50:16 integration-test app/web.1 2022-02-22 20:50:16.060 [main] DEBUG o.s.b.a.ApplicationAvailabilityBean - Application availability state LivenessState changed to CORRECT
Feb 22 12:50:16 integration-test app/web.1 2022-02-22 20:50:16.063 [main] DEBUG o.s.b.a.ApplicationAvailabilityBean - Application availability state ReadinessState changed to ACCEPTING_TRAFFIC
Feb 22 12:51:06 integration-test heroku/web.1 Error R10 (Boot timeout) -> Web process failed to bind to $PORT within 60 seconds of launch
I found a solution that wasn't perfect but seemed to work for me.
Downgraded Spring Boot from 2.6.3 to 2.6.1
Downgraded Tomcat from 10.X.X to 9.X.X
Removed dev tools dependencies
I think the last two did the trick. Dev tools stopped asking for an extra port in the test/prod environment, and Tomcat bound the port in version 9.X.X but not in 10.X.X.
Even though I found a solution, I don't know why it behaved like this, and it isn't ideal security-wise.
From the error message it seems that $PORT is not resolved to any environment variable.
When deploying to Heroku you must use a .env file to define env vars (you can't use docker run -e PORT=1234); see the documentation:
When you use heroku locally, you can set config vars in a .env file. When heroku local is run .env is read and each name/value pair is set in the environment. You can use this same .env file when using Docker: docker run -p 5000:5000 --env-file .env <image-name>
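For example, a minimal sketch of such a .env file for local testing (the value 8080 and the image name my-app are placeholders for illustration):

PORT=8080

Then run the container against that file:

docker run -p 8080:8080 --env-file .env my-app

On Heroku itself, PORT is injected into the dyno at startup, so the container only needs to read it at run time rather than having it baked in at build time.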
I am trying to install Hadoop in pseudo-distributed (single-node) mode on my MacBook, and I have been seeing errors.
When I try to execute sbin/start-dfs.sh, I get the following error.
$ sbin/start-dfs.sh
Starting namenodes on [localhost]
localhost: Connection closed by ::1 port 22
Starting datanodes
localhost: Connection closed by ::1 port 22
Starting secondary namenodes [Kartiks-MacBook-Pro.local]
Kartiks-MacBook-Pro.local: Connection closed by 100.110.189.236 port 22
2018-01-22 00:20:19,441 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
I used the following pages as my reference
1) http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
2) https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html#Pseudo-Distributed_Operation
3) http://zhongyaonan.com/hadoop-tutorial/setting-up-hadoop-2-6-on-mac-osx-yosemite.html
Neither of the URLs to view the namenode loads for me:
1) http://localhost:9870/ (from the Hadoop's website)
2) http://localhost:50070/ (from http://zhongyaonan.com/hadoop-tutorial/setting-up-hadoop-2-6-on-mac-osx-yosemite.html)
I am using my personal login ID on my MacBook.
I was writing a Spark program on my development machine, which is a Mac.
The Hadoop version is 2.6 and the Spark version is 1.6.2. The Hadoop cluster has 3 nodes, all of them Linux machines.
I run the Spark program from the IDEA IDE in Spark standalone mode and it works successfully. But when I change it to yarn-client mode, it doesn't work and gives the following messages:
...
2017-02-23 11:01:33,725-[HL] INFO main org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2017-02-23 11:01:34,839-[HL] INFO main org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-02-23 11:01:35,842-[HL] INFO main org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-02-23 11:01:36,847-[HL] INFO main org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-02-23 11:01:37,854-[HL] INFO main org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
...
I have already added the corresponding configuration files to the resources directory of the project. If I package it as a jar and run it with spark-submit, it works fine. Now I want to run this program from the IDE in yarn-client mode, just like in Spark standalone mode. How can I fix this problem? Thanks.
Ensure the YARN configuration is available for Spark to use when running in yarn mode: add the core-site.xml, hdfs-site.xml and yarn-site.xml files to Spark's conf directory.
Also make sure that yarn-site.xml contains the address of the resource manager:
<property>
  <name>yarn.resourcemanager.address</name>
  <value>resource_manager_ip:8032</value>
</property>
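Alternatively, rather than copying the files, you can point Spark at an existing client configuration directory. This is just a sketch: the path below is a placeholder for wherever your cluster's client-side config files actually live, and when launching from the IDE these variables have to be set in the run configuration's environment:

export HADOOP_CONF_DIR=/path/to/cluster/conf
export YARN_CONF_DIR=/path/to/cluster/conf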
Set your conf object like this; it works for me:
conf = new SparkConf().setAppName(setup.getAppname).setMaster("yarn")
    .set("spark.hadoop.yarn.resourcemanager.hostname", "resourcemanager.fqdn")
    .set("spark.hadoop.yarn.resourcemanager.address", "resourcemanager.fqdn:8032")
Source: hortonworks.com
Now I want to deploy Tachyon 0.8.2 on my Ubuntu 14.04 machine. I already have Hadoop and Spark running:
on the master
bd#master$ jps
11871 Jps
3388 Master
2919 NameNode
3266 ResourceManager
3123 SecondaryNameNode
on the slave
bd#slave$ jps
4350 Jps
2778 NodeManager
2647 DataNode
2879 Worker
And I edited tachyon-env.sh:
export TACHYON_MASTER_ADDRESS=${TACHYON_MASTER_ADDRESS:-master}
export TACHYON_UNDERFS_ADDRESS=${TACHYON_UNDERFS_ADDRESS:-hdfs://master:9000}
Then I ran bin/tachyon format and bin/tachyon-start.sh local.
I cannot see TachyonMaster in jps:
/usr/local/bigdata/tachyon-0.8.2 [06:06:32]
bd$ bin/tachyon-start.sh local
Killed 0 processes on master
Killed 0 processes on master
Connecting to master as bd...
Killed 0 processes on master
Connection to master closed.
[sudo] password for bd:
Formatting RamFS: /mnt/ramdisk (512mb)
Starting master @ master
Starting worker @ master
/usr/local/bigdata/tachyon-0.8.2 [06:06:54]
bd$ jps
12183 TachyonWorker
3388 Master
2919 NameNode
3266 ResourceManager
3123 SecondaryNameNode
12203 Jps
And in master.log I see the following:
2015-12-27 18:06:50,635 ERROR MASTER_LOGGER (MetricsConfig.java:loadConfigFile) - Error loading metrics configuration file.
2015-12-27 18:06:51,735 ERROR MASTER_LOGGER (HdfsUnderFileSystem.java:<init>) - Exception thrown when trying to get FileSystem for hdfs://master:9000
org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4
at org.apache.hadoop.ipc.Client.call(Client.java:1070)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at com.sun.proxy.$Proxy1.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
at tachyon.underfs.hdfs.HdfsUnderFileSystem.<init>(HdfsUnderFileSystem.java:74)
at tachyon.underfs.hdfs.HdfsUnderFileSystemFactory.create(HdfsUnderFileSystemFactory.java:30)
at tachyon.underfs.UnderFileSystemRegistry.create(UnderFileSystemRegistry.java:116)
at tachyon.underfs.UnderFileSystem.get(UnderFileSystem.java:100)
at tachyon.underfs.UnderFileSystem.get(UnderFileSystem.java:83)
at tachyon.master.TachyonMaster.connectToUFS(TachyonMaster.java:412)
at tachyon.master.TachyonMaster.startMasters(TachyonMaster.java:280)
at tachyon.master.TachyonMaster.start(TachyonMaster.java:261)
at tachyon.master.TachyonMaster.main(TachyonMaster.java:64)
2015-12-27 18:06:51,742 ERROR MASTER_LOGGER (TachyonMaster.java:main) - Uncaught exception terminating Master
java.lang.IllegalArgumentException: All eligible Under File Systems were unable to create an instance for the given path: hdfs://master:9000
java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4
at tachyon.underfs.UnderFileSystemRegistry.create(UnderFileSystemRegistry.java:132)
at tachyon.underfs.UnderFileSystem.get(UnderFileSystem.java:100)
at tachyon.underfs.UnderFileSystem.get(UnderFileSystem.java:83)
at tachyon.master.TachyonMaster.connectToUFS(TachyonMaster.java:412)
at tachyon.master.TachyonMaster.startMasters(TachyonMaster.java:280)
at tachyon.master.TachyonMaster.start(TachyonMaster.java:261)
at tachyon.master.TachyonMaster.main(TachyonMaster.java:64)
What should I do about this problem?
This exception arises from a version mismatch between the Hadoop client and server sides. Check your Hadoop version, and then recompile Tachyon against that version using this command:
mvn -Dhadoop.version=your_hadoop_version clean install
Example: mvn -Dhadoop.version=2.4.0 clean install
Now configure your compiled Tachyon and it should work fine. Reference link.
WTP (Eclipse 3.7) cannot start Tomcat 6 in debug mode under Windows 7. Here is the error log:
!ENTRY org.eclipse.wst.server.core 4 0 2011-11-23 22:26:33.001
!MESSAGE Server Tomcat v6.0 Server at localhost failed to start.
As there is not enough information to find the root cause, here are some things you can try:
- Does Tomcat start outside of Eclipse, i.e. by launching Tomcat's startup.bat directly? Make sure your Tomcat works properly on its own.
- Did you add your project as a resource when configuring the server in WTP?
- Since you specifically said it's not working in debug mode, does it work in normal (run) mode? If yes, then copy the run configuration settings to the debug configuration.
Do share if you have any more information on this.