MapReduce: executing WordCount v1.0


I am trying to learn MapReduce from the official documentation. To build a jar file for the WordCount class, the documentation says to run the following command:
javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d wordcount_classes WordCount.java
However, my Hadoop directory has no core.jar in it. I think my Hadoop installation is fine, since I can run the hadoop shell script from the bin folder.

Try compiling against the output of hadoop classpath, which prints the classpath the hadoop script itself uses:
javac -classpath `hadoop classpath` -d wordcount_classes WordCount.java
It probably isn't best practice, but it works for me.
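To see what hadoop classpath actually hands to javac, here is a minimal Java sketch (ClasspathEntries is a hypothetical class name) that splits a classpath string on the platform path separator and expands entries ending in * the way javac does: into the .jar files directly inside that directory, not recursively. It is only a diagnostic aid under those assumptions, not part of Hadoop.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: lists what a classpath string resolves to.
public class ClasspathEntries {

    /**
     * Splits a classpath on the platform separator (':' on Linux) and expands
     * entries ending in "*" into the .jar/.JAR files directly inside that directory.
     */
    public static List<String> expand(String classpath) throws IOException {
        List<String> result = new ArrayList<>();
        for (String raw : classpath.split(File.pathSeparator)) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue;
            }
            if (entry.equals("*") || entry.endsWith("/*") || entry.endsWith(File.separator + "*")) {
                String dirName = entry.substring(0, entry.length() - 1); // drop the trailing '*'
                Path dir = Paths.get(dirName.isEmpty() ? "." : dirName);
                if (!Files.isDirectory(dir)) {
                    continue; // javac ignores classpath entries that do not exist
                }
                List<String> jars = new ArrayList<>();
                // Not recursive: subdirectories and .class files are not picked up.
                try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.{jar,JAR}")) {
                    for (Path jar : stream) {
                        jars.add(jar.toString());
                    }
                }
                Collections.sort(jars); // directory listing order is unspecified
                result.addAll(jars);
            } else {
                // Plain entries (directories, jar files, or anything else) are kept as-is.
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ClasspathEntries "$(hadoop classpath)"
        String cp = args.length > 0 ? args[0] : System.getProperty("java.class.path");
        for (String entry : expand(cp)) {
            System.out.println(entry);
        }
    }
}
```

Note that an entry such as /contrib/capacity-scheduler/*.jar is not a wildcard javac understands (only a bare * is), so the sketch passes it through unchanged.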

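Check the hadoop-1.2.1 folder (the version in my case) that you unzipped in the "Prepare to Start Cluster" step of the single node setup. There you should find hadoop-1.2.1-core.jar, which is the file that ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar in the compile command refers to. Hadoop 2.x and later no longer ship a single core jar; the classes are split across the jars under share/hadoop.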
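If you are not sure where that jar lives, a small Java sketch like the following can search for it (FindJar is a hypothetical class name). It walks an installation directory and returns every regular file whose name matches a glob such as hadoop-*-core.jar; the directory and pattern are assumptions you would adjust to your own install.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: finds files by name pattern under a directory tree.
public class FindJar {

    /** Returns all regular files under root whose file name matches the glob, sorted. */
    public static List<Path> find(Path root, String glob) throws IOException {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    // Match on the file name only, so the glob need not mention directories.
                    .filter(p -> matcher.matches(p.getFileName()))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java FindJar /path/to/hadoop-1.2.1 'hadoop-*-core.jar'
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        String glob = args.length > 1 ? args[1] : "hadoop-*-core.jar";
        for (Path p : find(root, glob)) {
            System.out.println(p);
        }
    }
}
```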

Related

How to run jar files sequentially from a shell script

I am trying to run two Java applications one after the other in my Docker container.
In my Dockerfile I have specified invoker.sh as the entry point:
ENTRYPOINT ["sh", "/opt/invoker.sh"]
Then I use this script to run the two jar files:
#!/bin/sh
java -jar loader.jar
java -jar service.jar
But this does not work. It gives:
Error: Unable to access jarfile javaimpl-loader.jar
and only service.jar is executed. When I tried echo $(ls), it showed that both jar files are there.
But if I change the script to
#!/bin/sh
echo $(java -jar loader.jar)
java -jar service.jar
then both jars run. Why can't I use the first script? Any help with this is highly appreciated.

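It appears the first script is being treated as a single line; you could work with that by chaining the two commands with &&, so that service.jar starts only if loader.jar exits successfully. I would also prefer bash to /bin/sh (note that with ENTRYPOINT ["sh", "/opt/invoker.sh"] the script is run by sh whatever its shebang line says, so the entry point would need to call bash instead). For example: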
#!/usr/bin/env bash
java -jar loader.jar && java -jar service.jar
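If you would rather drive both applications from a single Java process, a minimal ProcessBuilder sketch can mirror the && behaviour: run each command, wait for it, and stop at the first non-zero exit code. RunSequentially is a hypothetical class name, and loader.jar and service.jar are just the placeholders from the question.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical launcher: runs commands one after another, like cmd1 && cmd2.
public class RunSequentially {

    /**
     * Runs each command in order, waiting for it to finish.
     * Returns the exit code of the first failing command, or 0 if all succeed.
     */
    public static int run(List<List<String>> commands) throws IOException, InterruptedException {
        for (List<String> command : commands) {
            Process process = new ProcessBuilder(command)
                    .inheritIO()                  // child output goes to this process's console
                    .start();
            int exitCode = process.waitFor();     // block until this command has finished
            if (exitCode != 0) {
                return exitCode;                  // like &&: do not start the next command
            }
        }
        return 0;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Path to the java launcher of the JVM running this class.
        String javaBin = System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
        int status = run(Arrays.asList(
                Arrays.asList(javaBin, "-jar", "loader.jar"),
                Arrays.asList(javaBin, "-jar", "service.jar")));
        System.exit(status);
    }
}
```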

Hadoop WordCount error

I am following the documentation at this link:
https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Usage
When I try to compile WordCount.java and create a jar, I get the following error:
bin/hadoop com.sun.tools.javac.Main WordCount.java
Error: Could not find or load main class com.sun.tools.javac.Main
I have checked $JAVA_HOME and $HADOOP_CLASSPATH in the hadoop-env.sh file, and also verified that I have the JDK installed.
Here are the relevant contents of hadoop-env.sh:
export JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_111.jdk/Contents/Home/"
.......
.........
for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
if [ "$HADOOP_CLASSPATH" ]; then
export HADOOP_CLASSPATH="$JAVA_HOME/lib/tools.jar"
else
export HADOOP_CLASSPATH=$f
fi
I am not sure what is causing the error, or whether I am missing another key configuration.

This doesn't make sense inside that loop, and neither does checking whether the variable is already set first:
if [ "$HADOOP_CLASSPATH" ]; then
export HADOOP_CLASSPATH="$JAVA_HOME/lib/tools.jar"
else
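You need to set HADOOP_CLASSPATH="$JAVA_HOME/lib/tools.jar", as the documentation says, for that class to be found, and you only need to set it once, outside that loop. The class is only available in the JDK (in JDK 8 and earlier it ships in lib/tools.jar), not in a plain JRE.
But you could also just run the javac command to compile the code; I'm not sure why the docs have you call that class.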
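The hadoop command in the tutorial simply runs com.sun.tools.javac.Main, javac's entry point, as a main class. From Java code, the standard javax.tools API reaches the same compiler, and ToolProvider.getSystemJavaCompiler() returns null when no compiler is available (for example on a plain JRE), which is the same underlying problem as the error above. A minimal sketch, with CompileCheck as a hypothetical class name:

```java
import java.io.File;
import java.util.Arrays;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical helper: compiles sources in-process with the JDK's compiler.
public class CompileCheck {

    /**
     * Compiles the given source files into outDir using the system Java compiler.
     * Returns javac's exit code (0 on success).
     * Throws IllegalStateException if no compiler is available (e.g. running on a JRE).
     */
    public static int compile(String classpath, File outDir, String... sources) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No Java compiler found: run this on a JDK, not a JRE");
        }
        if (!outDir.isDirectory() && !outDir.mkdirs()) {
            throw new IllegalStateException("Cannot create " + outDir);
        }
        String[] args = new String[sources.length + 4];
        args[0] = "-classpath";
        args[1] = classpath;
        args[2] = "-d";
        args[3] = outDir.getPath();
        System.arraycopy(sources, 0, args, 4, sources.length);
        // null streams mean System.in, System.out and System.err are used.
        return compiler.run(null, null, null, args);
    }

    public static void main(String[] args) {
        // Usage: java CompileCheck "$(hadoop classpath)" WordCount.java
        if (args.length < 2) {
            System.err.println("usage: CompileCheck <classpath> <source>...");
            System.exit(2);
        }
        int status = compile(args[0], new File("wordcount_classes"),
                Arrays.copyOfRange(args, 1, args.length));
        System.exit(status);
    }
}
```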
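How to compile a Hadoop program (use the output of hadoop classpath here, since HADOOP_CLASSPATH only holds extra entries such as tools.jar, not Hadoop's own jars; with JDK 8 the -d directory must already exist):
$ mkdir -p WordCount
$ javac -classpath "$(hadoop classpath)" -d WordCount/ WordCount.java
To create the jar: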
$ jar -cvf WordCount.jar -C WordCount/ .
To run:
$ hadoop jar WordCount.jar WordCount input/ output
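The jar step above just packs everything under WordCount/ with paths relative to that directory, plus a manifest. For a look at what ends up inside the jar, here is a hedged Java sketch of the same step using java.util.jar.JarOutputStream (MakeJar is a hypothetical class name); the jar tool remains the simpler option in practice.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical packager: roughly what "jar -cf out.jar -C classesDir ." does.
public class MakeJar {

    /** Packs every regular file under classesDir into jarFile, with names relative to classesDir. */
    public static void pack(Path classesDir, Path jarFile) throws IOException {
        Manifest manifest = new Manifest();
        // Minimal manifest with only the Manifest-Version attribute.
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");

        // Collect the files first, so a jar written inside classesDir is not packed into itself.
        List<Path> files;
        try (Stream<Path> walk = Files.walk(classesDir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }

        try (OutputStream out = Files.newOutputStream(jarFile);
             JarOutputStream jar = new JarOutputStream(out, manifest)) {
            for (Path file : files) {
                // Entry names are relative to classesDir and always use '/' separators.
                // (The jar tool also adds directory entries; they are omitted here.)
                String name = classesDir.relativize(file).toString().replace('\\', '/');
                jar.putNextEntry(new JarEntry(name));
                Files.copy(file, jar);
                jar.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java MakeJar WordCount/ WordCount.jar
        pack(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```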
Suggestion: use Maven or Gradle to build proper JAR files, and an IDE to write the code.
P.S. Not many people actually write plain MapReduce.

Can't execute the basic Hadoop Mapreduce Wordcount example

I am trying to run the WordCount example, but I am having trouble compiling the program.
I get the error:
error: package org.apache.hadoop.mapred does not exist
after executing:
javac -classpath /usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.3.jar -d wordcount_classes WordCount.java
I set up Hadoop using this tutorial. I also looked this up on Stack Overflow (a related question) and ran the bin/hadoop classpath command in /usr/local/hadoop. This is the output I got:
/usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/*:/usr/local/hadoop/share/hadoop/common/*:/usr/local/hadoop/share/hadoop/hdfs:/usr/local/hadoop/share/hadoop/hdfs/lib/*:/usr/local/hadoop/share/hadoop/hdfs/*:/usr/local/hadoop/share/hadoop/yarn/lib/*:/usr/local/hadoop/share/hadoop/yarn/*:/usr/local/hadoop/share/hadoop/mapreduce/lib/*:/usr/local/hadoop/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar
But I don't know what to make of it or what my next step should be! Please help!

You're trying to compile the source code against just one of Hadoop's many jars (hadoop-common-x.x.x.jar). The org.apache.hadoop.mapred package named in the error message lives in the hadoop-mapreduce-client-core jar.
I suggest you use a build tool such as Maven or Gradle to build your source code, as it will manage transitive dependencies for you.
Alternatively, to proceed with your manual invocation of javac, try something like this (untested). Repeating -cp does not add the paths together, so put all the directories into one colon-separated classpath, quoted so that the shell leaves the * wildcards for javac to expand:
HADOOP_SHARE=/usr/local/hadoop/share/hadoop
mkdir -p wordcount_classes
javac -cp "$HADOOP_SHARE/common/*:$HADOOP_SHARE/hdfs/lib/*:$HADOOP_SHARE/hdfs/*:$HADOOP_SHARE/yarn/lib/*:$HADOOP_SHARE/yarn/*:$HADOOP_SHARE/mapreduce/lib/*:$HADOOP_SHARE/mapreduce/*" \
-d wordcount_classes WordCount.java
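To check on your own machine which jar provides a package, a rough Java sketch like this one can help (WhichJar is a hypothetical class name). It opens every .jar file under a directory and returns those containing an entry whose name starts with a given prefix, such as org/apache/hadoop/mapred/. With the layout in the question you would point it at /usr/local/hadoop/share/hadoop; the same idea applies to the jars under /usr/lib/hadoop on Cloudera.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarFile;
import java.util.stream.Stream;

// Hypothetical helper: finds jars that contain entries under a package path.
public class WhichJar {

    /** Returns the jars under root that contain at least one entry whose name starts with prefix. */
    public static List<Path> find(Path root, String prefix) throws IOException {
        List<Path> jars = new ArrayList<>();
        try (Stream<Path> walk = Files.walk(root)) {
            walk.filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".jar"))
                .forEach(jars::add);
        }
        Collections.sort(jars);

        List<Path> matches = new ArrayList<>();
        for (Path jar : jars) {
            try (JarFile jf = new JarFile(jar.toFile())) {
                // Entry names look like "org/apache/hadoop/mapred/JobConf.class".
                if (jf.stream().anyMatch(e -> e.getName().startsWith(prefix))) {
                    matches.add(jar);
                }
            } catch (IOException e) {
                // Skip files that are named .jar but are not readable zip archives.
                System.err.println("Skipping " + jar + ": " + e.getMessage());
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java WhichJar /usr/local/hadoop/share/hadoop org/apache/hadoop/mapred/
        for (Path jar : find(Paths.get(args[0]), args[1])) {
            System.out.println(jar);
        }
    }
}
```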

Compile a java file using docker with own path

Hi. I'm trying to compile a .java file using Docker. I read the documentation on Docker's website, and also these links:
docker's website
about volumes
another question I had posted about the gcc compiler
I understood the concept for the gcc compiler, since it doesn't create any extra files when compiling.
But the Java one does: it creates a Main.class file in my /home directory if I use the following command to compile a file named Main.java:
sudo docker run --rm -v "$PWD":/usr/src/myapp -w /usr/src/myapp java:7 javac Main.java
After reading the links above, I was able to compile a Java file from my own path using:
docker run --rm -v /mypathhere/mycode.java:/mycode.java java:7 javac mycode.java
If there is an error, it shows the error; if there isn't, it just compiles and prints nothing, which makes sense because it creates a Main.class file.
My problem is that I can't find that Main.class file. I don't know where Docker is creating it, and I have no understanding of how this works. Please help me out.

The .class file will be inside the container, under the root directory, and because you run with --rm the container (including that file) is deleted as soon as javac finishes.
The best plan is to mount the whole source directory and have javac write its output to the same directory, e.g.:
docker run --rm -v /mypathhere:/mycode java:7 sh -c "cd /mycode && javac mycode.java"
That way, the class file should be written to the mypathhere directory on the host.
Apologies if that doesn't quite work; it's off the top of my head. Hopefully you get the idea, though.

How do you get WordCount.java to compile on Cloudera 4?

I'm trying to compile a simple WordCount.java MapReduce example on a Linux (CentOS) installation of Cloudera 4. I keep hitting compiler errors whenever I reference any of the Hadoop classes, but I can't figure out which of the hundreds of jars under /usr/lib/hadoop I need to add to my classpath to get things to compile. Any help would be greatly appreciated! What I'd like most is a Java file for word count (just in case the one I found is bad for some reason), along with the commands to compile and run it.
I am trying to do this using just javac rather than Eclipse. My main question either way is exactly which Hadoop libraries from the Cloudera 4 install I need to include to get the classic WordCount example to compile. Basically, I need to put the Java MapReduce API classes (Mapper, Reducer, etc.) on my classpath.

I have a script that builds my Hadoop classes. Try:
#!/bin/bash
# Usage: <this script> WordCount.java  (builds WordCount.jar)
# Derive the program name from the source file name, e.g. WordCount.java -> WordCount
program=$(echo "$1" | awk -F "." '{print $1}')
if [ ! -d "${program}_classes" ]
then mkdir "${program}_classes/"
fi
javac -classpath /usr/lib/hadoop/hadoop-common-2.0.0-cdh4.0.1.jar:/usr/lib/hadoop/client/hadoop-mapreduce-client-core-2.0.0-cdh4.0.1.jar \
  -d "${program}_classes/" "$1"
jar -cvf "${program}.jar" -C "${program}_classes/" .
You were probably missing the key jars:
/usr/lib/hadoop/hadoop-common-2.0.0-cdh4.0.1.jar
and
/usr/lib/hadoop/client/hadoop-mapreduce-client-core-2.0.0-cdh4.0.1.jar

If you are running the Cloudera CDH4 virtual machine, the following should get you running:
javac -classpath /usr/lib/hadoop/hadoop-common-2.0.0-cdh4.0.0.jar:/usr/lib/hadoop/client/hadoop-mapreduce-client-core-2.0.0-cdh4.0.0.jar -d wordcount_classes WordCount.java

Or you can export these environment variables:
export JAVA_HOME=/usr/java/default
export PATH=${JAVA_HOME}/bin:${PATH}
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
and use the commands below:
$ bin/hadoop com.sun.tools.javac.Main WordCount.java
$ jar cf wc.jar WordCount*.class

If you are using Eclipse, please add the Hadoop packages to your project; you may get them from java2s or similar sites. I can't say more without knowing what you have done so far.
