How to execute Mahout with a Hadoop installation - java

I'm trying to figure out how to run the Mahout example jars with Hadoop. I have configured Mahout and Hadoop; now I go into the Hadoop directory and type something like this:
/Users/hadoop/hadoop-0.20.2/bin/hadoop jar /Users/hadoop/trunk/examples/mahout-examples-0.5-SNAPSHOT-job.jar org.apache.mahout.SpareVectorsFromSequenceFile -w -i ratings -o ratings_vectors
My goal is to run a Hadoop job on the GroupLens dataset. I used the put command to upload my ratings.dat to HDFS, but what next? The command above always gives me something like this:
Exception in thread "main" java.lang.ClassNotFoundException: org.apache.mahout.SpareVectorsFromSequenceFile
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
My questions are:
How can I set the right path in the Hadoop directory to call Mahout?
How can I use org.apache.mahout.cf.taste.example.grouplens.GroupLensRecommenderEvaluatorRunner to process my ratings.dat with Hadoop?
Thank you very much; I'm just beginning with Hadoop and Mahout ;)

You have a typo: they are "sparse vectors", not "spare vectors". SpareVectorsFromSequenceFile should be SparseVectorsFromSequenceFile.
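With the name fixed, the invocation looks like this (the package is org.apache.mahout.vectorizer in recent Mahout builds, but it has moved between releases, so confirm it inside the job jar first; I've also dropped the -w flag, which is not a documented seq2sparse option - check the tool's --help output):
jar tf /Users/hadoop/trunk/examples/mahout-examples-0.5-SNAPSHOT-job.jar | grep SparseVectorsFromSequenceFile
/Users/hadoop/hadoop-0.20.2/bin/hadoop jar /Users/hadoop/trunk/examples/mahout-examples-0.5-SNAPSHOT-job.jar org.apache.mahout.vectorizer.SparseVectorsFromSequenceFile -i ratings -o ratings_vectors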

Related

Hbase example, Exception in thread "main" java.lang.NoClassDefFoundError

We are trying to execute a basic HBase example on the Hortonworks sandbox (2.3).
hadoop jar /usr/hdp/2.3.0.0-2557/hbase/lib/hbase-examples.jar org.apache.hadoop.hbase.mapreduce.IndexBuilder
We get the exception below after executing this program.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/util/Bytes
at org.apache.hadoop.hbase.mapreduce.IndexBuilder.<clinit>(IndexBuilder.java:67)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:278)
at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.util.Bytes
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 5 more
Based on this error, we tried to set the Hadoop classpath in hbase-env.sh:
/usr/hdp/2.3.0.0-2557/hbase/lib/hbase-client-1.1.1.2.3.0.0-2557.jar:/usr/hdp/2.3.0.0-2557/hbase/lib/hbase-common-1.1.1.2.3.0.0-2557.jar:/usr/hdp/2.3.0.0-2557/hbase/lib/protobuf-java-2.5.0.jar:/usr/hdp/2.3.0.0-2557/hbase/lib/guava-12.0.1.jar:$/usr/hdp/2.3.0.0-2557/hbase/lib/zookeeper.jar:/usr/hdp/2.3.0.0-2557/hbase/lib/hbase-protocol-1.1.1.2.3.0.0-2557.jar:/usr/hdp/2.3.0.0-2557/hbase/lib/commons-configuration-1.6.jar:/usr/hdp/2.3.0.0-2557/hbase/lib/hadoop-common.jar:/usr/hdp/2.3.0.0-2557/hbase/lib/hbase-0.94.27.jar
But we still get the same error.
Instead of manually adding jars to the classpath, you can use the command below directly.
$(hbase classpath) recursively searches the Hortonworks Hadoop folders and finds the required jars in the sandbox:
HADOOP_CLASSPATH=$(hbase classpath):/usr/hdp/2.3.0.0-2557/hbase/conf hadoop jar /usr/hdp/2.3.0.0-2557/hbase/lib/hbase-examples.jar org.apache.hadoop.hbase.mapreduce.IndexBuilder
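To make this persistent for the shell session, export it first:
export HADOOP_CLASSPATH=$(hbase classpath):/usr/hdp/2.3.0.0-2557/hbase/conf
hadoop jar /usr/hdp/2.3.0.0-2557/hbase/lib/hbase-examples.jar org.apache.hadoop.hbase.mapreduce.IndexBuilder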
When I face a NoClassDefFoundError with MapReduce, I resolve it by pointing the Job at the jar through one of the classes it contains, e.g.:
// Point the job at the jar containing the missing class; Hadoop
// then ships that jar to the cluster along with the job.
Job job = new Job(conf);
job.setJarByClass(org.apache.hadoop.hbase.util.Bytes.class);
Alternatively, supply extra jars to your job using the -libjars parameter, e.g.:
LIB=hbase-x.x.x.jar
hadoop jar /usr/hdp/2.3.0.0-2557/hbase/lib/hbase-examples.jar org.apache.hadoop.hbase.mapreduce.IndexBuilder -libjars ${LIB}
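Note that -libjars is a generic option handled by GenericOptionsParser, so it only takes effect when the driver runs through ToolRunner. A minimal sketch of such a driver (the class name IndexBuilderDriver is illustrative and the job setup is elided):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class IndexBuilderDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // Build and submit the job here using getConf(), which already
        // reflects anything passed via -libjars or -D generic options.
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner invokes GenericOptionsParser, which strips -libjars
        // out of args and adds the listed jars to the job's classpath.
        System.exit(ToolRunner.run(new Configuration(), new IndexBuilderDriver(), args));
    }
}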
You can also add the jar to the HADOOP_CLASSPATH environment variable before launching the job.
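For example (a sketch; substitute the jar that actually contains the missing class - in HBase 1.x, org.apache.hadoop.hbase.util.Bytes lives in hbase-common):
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/hdp/2.3.0.0-2557/hbase/lib/hbase-common-1.1.1.2.3.0.0-2557.jar
hadoop jar /usr/hdp/2.3.0.0-2557/hbase/lib/hbase-examples.jar org.apache.hadoop.hbase.mapreduce.IndexBuilder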
Is all the latest code included in the jar? Use a Java decompiler such as JD-GUI to look inside the jar file and make sure the class you are referencing is actually there. Also check that the necessary import statements are present in the Java class.

java.lang.ClassNotFoundException when trying to run camus

I downloaded the Confluent package, which includes the Camus jars, and followed the instructions online.
Hadoop is properly set up (meaning I can use hadoop fs -ls and other hadoop jar commands). However, when I tried to run
hadoop jar confluent-camus-1.0.jar com.linkedin.camus.etl.kafka.CamusJob
I got "main" classNotFound error
Exception in thread "main" java.lang.ClassNotFoundException: com.linkedin.camus.
etl.kafka.CamusJob
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:344)
at org.apache.hadoop.util.RunJar.main(RunJar.java:205)
The path to confluent-camus-1.0.jar is correct (right under the folder). I didn't start the Kafka service; I just tried to run the job.
Anyone got similar problems?
Thanks.
You should try to inspect your jar file:
jar tvf confluent-camus-1.0.jar | grep com.linkedin.camus.etl.kafka.CamusJob
If you do not find this class, try to find it in one of the other jars generated by Camus.
Then add the target jar with:
hadoop jar confluent-camus-1.0.jar com.linkedin.camus.etl.kafka.CamusJob -libjars {JAR_NAME}
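Note that -libjars is only honored when the main class parses generic options (for example via ToolRunner). If the class turns out to live in a different jar entirely, it may be simpler to run that jar directly (the jar name below is illustrative):
hadoop jar camus-etl-kafka-1.0.jar com.linkedin.camus.etl.kafka.CamusJob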

Configuring memory for mappers and reducer during mapreduce job submission

I am trying to configure the mapper/reducer memory during MapReduce job submission, as below:
hadoop jar Word-0.0.1-SNAPSHOT.jar -Dmapreduce.map.memory.mb=5120 com.test.Word.App /tmp/ilango/input /tmp/ilango/output/
Is there anything wrong with the command above? I am getting the following exception. Do I need to position the JAR file differently, or configure something else to use the -D option in Hadoop? Thanks in advance.
Exception in thread "main" java.lang.ClassNotFoundException: -Dmapreduce.map.memory.mb=5120
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.util.RunJar.main(RunJar.java:205)
The command to run an MR job is:
hadoop jar jarname classname input output
As per your command,
hadoop jar jarname -Dmapreduce.map.memory.mb=5120 classname input output
Hadoop takes the first argument after the jar as the driver class, so it looks for a driver class named "-Dmapreduce.map.memory.mb=5120".
That's why it throws java.lang.ClassNotFoundException.
The -D option should be supplied after your driver class. Try the command below.
hadoop jar Word-0.0.1-SNAPSHOT.jar com.test.Word.App -D mapreduce.map.memory.mb=5120 /tmp/ilango/input /tmp/ilango/output/
Hope this solves your issue.
It looks like you are missing a space after -D.
Try -D mapreduce.map.memory.mb=5120 instead.
There is a difference between -Dproperty=value and -D property=value. The first one sets a JVM system property, whereas the second one sets a Hadoop configuration property.
Quoting from the book Hadoop: The Definitive Guide:
-D property=value Sets the given Hadoop configuration property to the given value. Overrides any default or site properties in the configuration, and any properties set via the -conf option.
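Only the Hadoop configuration property is visible to the job. A quick way to check (a sketch, assuming the com.test.Word.App driver implements Tool and runs through ToolRunner):
// inside run(String[] args) of a Tool-based driver
Configuration conf = getConf();
// prints 5120 when the job was launched with -D mapreduce.map.memory.mb=5120
System.out.println(conf.get("mapreduce.map.memory.mb"));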
If you're using Maven and have added the main class (here com.test.Word.App) to the jar's manifest, hadoop jar no longer expects a class-name argument, and -D mapreduce.map.memory.mb=5120 will be taken as the first input instead.
So in that case, just remove com.test.Word.App from the command line.

Error running Jar file in Hadoop's MapReduce program

I'm trying to compile my code into a JAR file for use with Hadoop's MapReduce. My main class is VectorMaker.java and the directory structure is as follows.
RandomForestVectors
/bin
/lib
/hadoop-core-1.2.0.jar
/mahout-core-0.7.jar
/mahout-math-0.7.jar
/opencsv-2.3.jar
/VectorMaker.java
These are the commands I'm using to make my JAR file:
javac -classpath "./lib/*" -d ./bin ./VectorMaker.java
jar cf VectorMaker.jar -C "./bin/" . &
This is the command I used to try to run my JAR file as a Hadoop MapReduce program:
hadoop jar VectorMaker.jar VectorMaker user/starmine/AlphaDefault/mahout/random_forest/prevectors /user/starmine/AlphaDefault/mahout/random_forest/test1
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/math/VectorWritable
at VectorMaker.main(VectorMaker.java:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.ClassNotFoundException: org.apache.mahout.math.VectorWritable
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
I know I need to somehow make my JAR file contain all my dependencies, but I'm not sure how.
Make sure you have all your dependencies on your classpath, including all the Mahout libraries. Alternatively, you can specify the path with -classpath while compiling your program:
javac -classpath "$MAHOUT_HOME/lib/*" -d ./bin ./VectorMaker.java
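That fixes compilation, but the stack trace above comes from the client JVM at launch time, so the Mahout jars also need to be visible when the job starts. One way (a sketch reusing the lib directory from the question):
export HADOOP_CLASSPATH=./lib/mahout-core-0.7.jar:./lib/mahout-math-0.7.jar:./lib/opencsv-2.3.jar
hadoop jar VectorMaker.jar VectorMaker /user/starmine/AlphaDefault/mahout/random_forest/prevectors /user/starmine/AlphaDefault/mahout/random_forest/test1
For the map and reduce tasks themselves, either pass the same jars with -libjars (which requires the driver to use ToolRunner) or repackage VectorMaker.jar with the dependency jars in an internal lib/ directory, which Hadoop adds to the task classpath automatically.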

HADOOP :: java.lang.ClassNotFoundException: WordCount

I am using Eclipse to export the jar file of a MapReduce program. When I run the jar using the command
hadoop jar hadoop-prog.jar WordCount /home/temp/input /home/temp/output
it always shows the error:
Exception in thread "main" java.lang.ClassNotFoundException: WordCount
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
By the way, I got a sample wordcount example jar from the internet, and it ran fine.
I cannot figure out where the problem is.
If you're trying to run the wordcount provided in the examples, you should run:
hadoop jar hadoop*examples*.jar wordcount /home/temp/input /home/temp/output
There is more info on how to run wordcount at this link.
In general, if you're developing your own Map/Reduce jobs, you should include the full package name of your driver class, so something like this might work:
hadoop jar wordcount.jar com.something.WordCount /home/temp/input /home/temp/output
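If you're not sure what the full package name is, list the jar's contents and look for the class (the same jar tvf trick as above):
jar tf hadoop-prog.jar | grep WordCount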
