I am new to Hadoop.
I have a file WordCount.java which refers to hadoop.jar and stanford-parser.jar.
I am running the following commands:
javac -classpath .:hadoop-0.20.1-core.jar:stanford-parser.jar -d ep WordCount.java
jar cvf ep.jar -C ep .
bin/hadoop jar ep.jar WordCount gutenburg gutenburg1
After executing, I am getting the following error:
java.lang.ClassNotFoundException: edu.stanford.nlp.parser.lexparser.LexicalizedParser
The class is in stanford-parser.jar ...
What could the problem be?
Thanks
I think you need to add the stanford-parser jar when invoking hadoop as well, not just the compiler. (If you look in ep.jar, I imagine it will only have one file in it: WordCount.class.)
E.g.
bin/hadoop jar ep.jar WordCount -libjars stanford-parser.jar gutenburg gutenburg1
See Map/Reduce Tutorial
mdma is on the right track, but you'll also need your job driver to implement Tool.
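A minimal sketch of such a driver, assuming the org.apache.hadoop.mapreduce API that the 0.20/0.21 examples use (mapper/reducer setup elided):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        // By the time run() is called, ToolRunner has already consumed
        // generic options such as -libjars, so args holds only the paths.
        Job job = new Job(getConf(), "wordcount");
        job.setJarByClass(WordCount.class);
        // set your mapper, reducer and output key/value classes here ...
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new WordCount(), args));
    }
}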
I had the same problem. I think the reason the -libjars option doesn't get recognized by your program is that you are not parsing it by calling GenericOptionsParser.getRemainingArgs(). In Hadoop 0.21.0's WordCount.java example (in mapred/src/examples/org/apache/hadoop/examples/), this piece of code is found, and after doing the same in my program, -libjars comma-separated-jars is recognized:
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2) {
    System.err.println("Usage: wordcount <in> <out>");
    System.exit(2);
}
...
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
I've just found out that you can simply edit $HADOOP_HOME/conf/hadoop-env.sh and add your JARs to HADOOP_CLASSPATH.
This is probably the simplest and most efficient option.
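For example, a line like this at the end of hadoop-env.sh (the path is illustrative):
export HADOOP_CLASSPATH=/path/to/stanford-parser.jar:$HADOOP_CLASSPATH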
Another option you can try, since -libjars doesn't seem to be working for you, is to package everything into a single jar, i.e. your code plus the dependencies.
This was how it had to be done prior to roughly Hadoop 0.18.0 (somewhere around there they fixed this).
Using Ant (I use Ant in Eclipse) you can set up a build that unpacks the dependencies and adds them to the target build project. You can probably do this yourself, though, by manually unpacking the dependency jar and adding its contents to your jar, as sketched below.
Even though I use 0.20.1 now, I still use this method. It makes starting a job from the command line simpler.
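By hand, a rough sketch using the jars from the question would be:
mkdir ep
(cd ep && jar xf ../stanford-parser.jar)
rm -rf ep/META-INF
javac -classpath .:hadoop-0.20.1-core.jar:stanford-parser.jar -d ep WordCount.java
jar cvf ep.jar -C ep .
ep.jar then contains WordCount.class plus the parser classes. (Deleting the extracted META-INF avoids clashing with the manifest that jar cvf generates.)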
While running the wordcount example in Hadoop, I am facing the following error saying "JAR does not exist or is not a normal file:
/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduceexamples-2.2.0.jar"
My input command was:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduceexamples-2.2.0.jar wordcount input output
Just go to that path and check whether the name is correct; the convention may differ by distribution.
For example, Hadoop 3.1.0 has it in the following path:
$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar
I faced the same problem, and the problem was with the version number in the file name. For example, in the installation instructions, the command was:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar grep input output 'dfs[a-z.]+'
The version I was working with was 3.1.3, so what worked for me was:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar grep input output 'dfs[a-z.]+'
Just check whether all the dependencies are included in your jar file.
Try something like this:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduceexamples-2.2.0-jar-with-dependencies.jar wordcount input output
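To see what actually ended up inside a jar, list its contents (the jar name is a placeholder):
jar tf yourjob.jar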
Since the .jar path can differ from distribution to distribution, it's hard to say the exact path, and obviously you can cd into each directory and check, but there is an easier way. Just execute the following command and it will list all the jar files named hadoop-mapreduceexamples-2.2.0.jar with their exact locations:
find . -name hadoop-mapreduceexamples-2.2.0.jar
Or, if you do not know the name of the .jar file, you can try this:
find . -name '*.jar'
I faced this same problem because I was in the /opt/module/hadoop-3.1.3/wcinput directory.
After cd /opt/module/hadoop-3.1.3, I tried this again:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount wcinput wcoutput
It worked.
I am trying to run a hadoop jar command from Java using Runtime.exec.
Below is the sample code:
Runtime.getRuntime().exec(new String[]{"bin/hadoop", "jar /home/hadoop/jar/test.jar /user/hduser/myinput/input /user/hduser/newoutput"});
However, I am not getting the desired output. Below is my hadoop command, which I want to execute from Java:
bin/hadoop jar /home/hadoop/jar/test.jar /user/hduser/myinput/input /user/hduser/newoutput
I am not getting any exception either. Is the way Runtime.getRuntime().exec is used wrong?
Replace your command with the following (HADOOP_HOME here stands for the absolute path to your Hadoop installation):
Runtime.getRuntime().exec("HADOOP_HOME/bin/hadoop jar /home/hadoop/jar/test.jar /user/hduser/myinput/input /user/hduser/newoutput");
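Runtime.exec(String) tokenizes that single string on whitespace, which is why this form works; with the String[] form, each token must be its own array element. A minimal sketch along those lines (the /usr/local/hadoop installation path is illustrative), which also reads the job's console output:

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class RunHadoopJob {
    public static void main(String[] args) throws Exception {
        // Each token is a separate element; exec(String[]) does not split
        // "jar /home/... /user/..." into arguments for you.
        ProcessBuilder pb = new ProcessBuilder(
                "/usr/local/hadoop/bin/hadoop", "jar",
                "/home/hadoop/jar/test.jar",
                "/user/hduser/myinput/input",
                "/user/hduser/newoutput");
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process p = pb.start();
        // Drain the output so the child process doesn't block on a full buffer.
        BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
        String line;
        while ((line = r.readLine()) != null) {
            System.out.println(line);
        }
        System.out.println("exit code: " + p.waitFor());
    }
}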
Give the class name where you defined the driver code:
bin/hadoop jar /home/hadoop/jar/test.jar Package_name.className /user/hduser/myinput/input /user/hduser/newoutput
I am trying to run an example as pointed out in the Hadoop in Action book, page 15.
This is the command that needs to be run:
bin/hadoop jar hadoop-*-examples.jar
but I get this error
"Error: Could not find or load main class org.apache.hadoop.util.RunJar"
It seems like a classpath issue or something. Can someone point out some guidelines?
Actually, I am not able to run any of the hadoop commands, like version, fs, jar, and so on!
NOTE: I am using windows.
[Edited]
Okay, I was reading too fast; you mentioned you were not able to run hadoop fs commands either. I guess you might be missing a few configurations or haven't started the services at all. Try following this tutorial step by step.
You will need to pass in the class name, for example:
bin/hadoop jar hadoop-*-examples.jar org.apache.hadoop.examples.WordCount [input] [output]
This is probably a very late answer, but anyway:
Just check that the HADOOP_PREFIX environment variable is set correctly (it should point to your Hadoop installation directory).
You need to check your environment variable: HADOOP_PREFIX=/path/to/hadoop
Setting the HADOOP_CLASSPATH as follows in your ~/.bashrc worked for me:
export HADOOP_CLASSPATH=$(cygpath -pw $(hadoop classpath)):$HADOOP_CLASSPATH
As described in the documentation (http://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-common/SingleCluster.html), you have to set HADOOP_PREFIX to point to your Hadoop installation folder.
This is an example script to start in stand-alone mode:
#!/bin/bash
cd /etc/hadoop-2.6.5/
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre
export HADOOP_PREFIX=/etc/hadoop-2.6.5
mkdir input
cp etc/hadoop/*.xml input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar grep input output 'dfs[a-z.]+'
cat output/*
I'm really new to Hadoop and not familiar with terminal commands.
I followed a tutorial step by step to install Hadoop on my Mac and can run some of the built-in Hadoop examples. However, when I tried to run the WordCount example, it generated many errors, such as org.apache cannot be resolved.
The post online said you should put it where you write your Java code. I used to use Eclipse. However, in Eclipse there were so many errors that the project was unable to compile.
Any suggestions?
Thanks!
Assuming you have also followed the directions to start up a local cluster, or pseudo-distributed cluster, then here is the easiest way.
Go to the hadoop directory, which should be whatever directory is unzipped when you download the Hadoop library from Apache. From there you can run these commands to run Hadoop:
for Hadoop version 0.23.*
cd $HOME/path/to/hadoop-0.23.*
./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-0.23.5.jar wordcount myinput outputdir
for Hadoop version 0.20.*
cd $HOME/path/to/hadoop-0.20.*
./bin/hadoop jar hadoop-0.20.2-examples.jar wordcount myinput outputdir
I am learning to work on a Hadoop cluster. I have worked for some time on Hadoop Streaming, where I coded map-reduce scripts in Perl/Python and ran the jobs.
However, I didn't find any good explanation of running a Java map-reduce job.
For example:
I have the following program:
http://www.infosci.cornell.edu/hadoop/wordcount.html
Can somebody tell me how I should actually compile this program and run the job?
Create a directory to hold the compiled class:
mkdir WordCount_classes
Compile your class:
javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d WordCount_classes WordCount.java
Create a jar file from your compiled class:
jar -cvf $HOME/code/hadoop/WordCount.jar -C WordCount_classes/ .
Create a directory for your input and copy all your input files into it, then run your job as follows:
bin/hadoop jar $HOME/code/hadoop/WordCount.jar WordCount ${INPUTDIR} ${OUTPUTDIR}
The output of your job will be put in the ${OUTPUTDIR} directory. This directory is created by the Hadoop job, so make sure it doesn't exist before you run the job.
See here for a full example.