I'm new to Java and trying to run a MapReduce job that uses HIPI: http://hipi.cs.virginia.edu/
I've used the command as described in:
http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html
I'm using Hadoop 0.20.2.
My command looks like:
hadoop jar grayscalefromfile_exc.jar grayscalefromfile_exc.StubDriver -libjars hipi-0.0.1.jar imgs imgsOut1
where the directory layout looks like:
--
--grayscalefromfile_exc.jar
--hipi-0.0.1.jar
The error I get:
Exception in thread "main" java.lang.NoClassDefFoundError: hipi/imagebundle/mapreduce/ImageBundleInputFormat
at grayscalefromfile_exc.StubDriver.run(StubDriver.java:89)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at grayscalefromfile_exc.StubDriver.main(StubDriver.java:103)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Caused by: java.lang.ClassNotFoundException: hipi.imagebundle.mapreduce.ImageBundleInputFormat
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 9 more
Needless to say, hipi-0.0.1.jar does contain the class at that path: hipi/imagebundle/mapreduce/ImageBundleInputFormat.
Thanks
-libjars uploads the given jars to the cluster and then makes them available on the classpath for each mapper / reducer instance.
If you want to add additional jars to the driver client classpath, you'll need to use the HADOOP_CLASSPATH environment variable:
#> export HADOOP_CLASSPATH=hipi-0.0.1.jar
#> hadoop jar grayscalefromfile_exc.jar grayscalefromfile_exc.StubDriver -libjars hipi-0.0.1.jar imgs imgsOut1
And my output when I run this (the error relates to the fact that I haven't got a HIPI image bundle file):
cswhite@Studio-1555:~/workspace/sandbox/so-hipi/target$ export HADOOP_CLASSPATH=/home/cswhite/Downloads/hipi-0.0.1.jar
cswhite@Studio-1555:~/workspace/sandbox/so-hipi/target$ echo $HADOOP_CLASSPATH
/home/cswhite/Downloads/hipi-0.0.1.jar
cswhite@Studio-1555:~/workspace/sandbox/so-hipi/target$ hadoop jar so-hipi-0.0.1-SNAPSHOT.jar StubDriver -libjars ~/Downloads/hipi-0.0.1.jar images output
num of args: 2:images,output
****hdfs://localhost:9000/user/cswhite/images
12/05/14 14:06:34 INFO input.FileInputFormat: Total input paths to process : 1
12/05/14 14:06:34 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:9000/tmp/hadoop-hadoop/mapred/staging/cswhite/.staging/job_201205141351_0003
12/05/14 14:06:34 ERROR security.UserGroupInformation: PriviledgedActionException as:cswhite cause:java.io.IOException: not a hipi image bundle
Exception in thread "main" java.io.IOException: not a hipi image bundle
at hipi.imagebundle.HipiImageBundle.readBundleHeader(HipiImageBundle.java:322)
at hipi.imagebundle.HipiImageBundle.openForRead(HipiImageBundle.java:388)
at hipi.imagebundle.AbstractImageBundle.open(AbstractImageBundle.java:82)
at hipi.imagebundle.AbstractImageBundle.open(AbstractImageBundle.java:55)
at hipi.imagebundle.mapreduce.ImageBundleInputFormat.getSplits(ImageBundleInputFormat.java:61)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
at StubDriver.run(StubDriver.java:53)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at StubDriver.main(StubDriver.java:57)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
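Note that -libjars is only honored when the driver goes through ToolRunner / GenericOptionsParser, which the stack trace above shows is already the case here. For reference, a minimal driver skeleton of that pattern (the class name and job wiring are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class StubDriver extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        // args arrive here with -libjars and other generic options already stripped
        // ... build and submit the Job using getConf() ...
        return 0;
    }
    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new StubDriver(), args));
    }
}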
I was able to resolve a similar issue by using the following API in the main class:
DistributedCache.addFileToClassPath(new Path("/path/application.jar"), conf);
The jar must be present at the HDFS path /path/application.jar.
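For reference, a minimal driver sketch of this approach on Hadoop 0.20/1.x (the HDFS path, class name, and job setup are placeholders, not taken from the question):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class Driver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // the jar must already be in HDFS, e.g.: hadoop fs -put application.jar /path/
        DistributedCache.addFileToClassPath(new Path("/path/application.jar"), conf);
        Job job = new Job(conf, "my job");
        // ... set mapper, reducer, input/output formats and paths ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}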
Related
I'm a Hadoop & HBase newbie. I've already run the WordCount example successfully. Now I've modified the Mapper to use an HBase row as the input data, so I need to import some HBase classes.
After I rebuild WordCount.jar and run:
$ hadoop jar ./out/artifacts/WordCount_jar/WordCount.jar WordCount
I got an error like:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
at WordCount.main(WordCount.java:83)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
So I copied all the HBase libraries to a folder and set HADOOP_CLASSPATH:
$ export HADOOP_CLASSPATH=/home/kayuuzu/jar/*
$ hadoop fs -put /home/kayuuzu/jar/* /home/kayuuzu/jar/
$ hadoop jar ./out/artifacts/WordCount_jar/WordCount.jar WordCount
Now it finds the HBase classes but prints an error like:
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://mycluster/ldata/bin/hadoop-2.3.0-cdh5.0.1/share/hadoop/mapreduce2/hadoop-mapreduce-client-core-2.3.0-cdh5.0.1.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1128)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1313)
at WordCount.main(WordCount.java:101)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
The output of hadoop classpath:
$ hadoop classpath
/home/kayuzu/jar/*:/ldata/bin/hadoop-2.3.0-cdh5.0.1/etc/hadoop:/ldata/bin/hadoop-2.3.0-cdh5.0.1/share/hadoop/common/lib/*:/ldata/bin/hadoop-2.3.0-cdh5.0.1/share/hadoop/common/*:/ldata/bin/hadoop-2.3.0-cdh5.0.1/share/hadoop/hdfs:/ldata/bin/hadoop-2.3.0-cdh5.0.1/share/hadoop/hdfs/lib/*:/ldata/bin/hadoop-2.3.0-cdh5.0.1/share/hadoop/hdfs/*:/ldata/bin/hadoop-2.3.0-cdh5.0.1/share/hadoop/yarn/lib/*:/ldata/bin/hadoop-2.3.0-cdh5.0.1/share/hadoop/yarn/*:/ldata/bin/hadoop-2.3.0-cdh5.0.1/share/hadoop/mapreduce/lib/*:/ldata/bin/hadoop-2.3.0-cdh5.0.1/share/hadoop/mapreduce/*
Strangely, it seems to use "/ldata/bin/hadoop-2.3.0-cdh5.0.1" (my Hadoop installation path) to expand the classpath, and tries to load the jars from the HDFS filesystem just as if they were local.
If I move hadoop-mapreduce-client-core-2.3.0-cdh5.0.1.jar to /home/kayuuzu/jar/ and upload it to hdfs://home/kayuuzu/jar/, this error goes away and it then fails to load some other class. It seems Hadoop tries to load classes from HDFS using the same paths as on my local machine.
I guess it would work if I moved all the Hadoop library files to one directory and uploaded them to HDFS keeping the same paths, but that would destroy my local Hadoop installation, and there are a lot of jar files.
Have I misunderstood something? How do I specify the library path for the remote MapReduce job?
I have Hadoop set up in fully distributed mode with one master and 3 slaves. I am trying to execute a jar file named Tasks.jar, which takes arg[0] as the input directory and arg[1] as the output directory.
In my Hadoop environment, I have the input files in the /input directory, and there is no /output directory.
I checked the above using the hadoop fs -ls / command.
Now, when I try to execute my jar file by using the below command:
hadoop jar Tasks.jar ProgrammingAssignment/Tasks /input /output
I get the below exception:
ubuntu#ip-172-31-5-213:~$ hadoop jar Tasks.jar ProgrammingAssignment/Tasks /input /output
16/10/14 02:26:23 INFO client.RMProxy: Connecting to ResourceManager at ec2-52-55-2-64.compute-1.amazonaws.com/172.31.5.213:8032
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://ec2-52-55-2-64.compute-1.amazonaws.com:9000/input already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:266)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
at ProgrammingAssignment.Tasks.main(Tasks.java:96)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
The stack trace shows hdfs://ec2-52-55-2-64.compute-1.amazonaws.com:9000/input as the output directory that already exists; in your code the output directory is set to /input, which is why the exception occurred. You need to either change the output directory in your code or rename the input directory in HDFS.
Make sure that /input is passed as your input directory, not your output directory. Judging from the exception in
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs
/input was treated as your output directory.
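To make the argument order concrete, here is a minimal sketch of how such a driver typically wires args[0] and args[1] (an assumption about Tasks.java, not its actual source). If the two paths are swapped on the command line, checkOutputSpecs sees the existing /input directory as the output directory and throws FileAlreadyExistsException:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Tasks {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        // args[0] is the input directory, args[1] the output directory
        FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. /input
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. /output, must not exist yet
        // ... mapper/reducer setup elided ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}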
I was working through the movie recommendations example from crcsmnky's repository: https://github.com/crcsmnky/mongodb-spark-demo
I have compiled mongo-hadoop and mongo-java-driver and stored the jars mongo-hadoop-core-1.3.2-SNAPSHOT.jar and mongo-java-driver-2.13.3.jar in the $HADOOP_HOME/lib folder.
After doing all this, I built the project and ran it per the instructions in the README file.
I get the error:
Exception in thread "main" java.lang.NoClassDefFoundError: com/mongodb/hadoop/BSONFileInputFormat
at com.mongodb.spark.demo.Recommender.main(Recommender.java:59)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.mongodb.hadoop.BSONFileInputFormat
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
What could have possibly gone wrong? I followed all instructions correctly.
I had the exact same problem and it took me forever to solve. Try this:
Locate your mongo-hadoop-core-1.4.1-SNAPSHOT.jar and mongo-java-driver-2.12.3.jar.
Add them to --jars in the spark-submit command before your --master flag and the application jar location. This is the crucial step: if you put --jars after those two, you will for some insane reason keep getting the BSONFileInputFormat exception. So effectively your spark-submit command would be:
./bin/spark-submit --class "com.mongodb.spark.demo.Recommender" --jars /home/killshot/Downloads/mongo-hadoop/core/build/libs/mongo-hadoop-core-1.4.1-SNAPSHOT.jar,/home/killshot/Downloads/mongo-hadoop/work/mongodb-spark-demo/target/lib/mongo-java-driver-2.12.3.jar --master local[4]
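Note that spark-submit expects the application jar and its arguments to come after all the flags, so the general shape of the command (placeholders, not the exact paths from above) is:

./bin/spark-submit --class <main-class> --jars <dep1.jar>,<dep2.jar> --master local[4] <application-jar> [application-args]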
I'm trying to compile my code into a JAR file for use with Hadoop's MapReduce. My main class is VectorMaker.java and the directory structure is as follows.
RandomForestVectors
/bin
/lib
/hadoop-core-1.2.0.jar
/mahout-core-0.7.jar
/mahout-math-0.7.jar
/opencsv-2.3.jar
/VectorMaker.java
These are the commands I'm using to make my JAR file.
javac -classpath "./lib/*" -d ./bin ./VectorMaker.java
jar cf VectorMaker.jar -C "./bin/" .
This is the command I used to try and run my JAR file as a Hadoop MapReduce program:
hadoop jar VectorMaker.jar VectorMaker user/starmine/AlphaDefault/mahout/random_forest/prevectors /user/starmine/AlphaDefault/mahout/random_forest/test1
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/math/VectorWritable
at VectorMaker.main(VectorMaker.java:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.ClassNotFoundException: org.apache.mahout.math.VectorWritable
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
I know I need to somehow make my JAR file contain all my dependencies but I'm not sure how.
Make sure you have all your dependencies on your classpath, including all the Mahout libraries. Alternatively, you can specify the path via -classpath while compiling your program:
javac -classpath "$MAHOUT_HOME/lib/*" -d ./bin ./VectorMaker.java
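As for making the job JAR carry its own dependencies: hadoop jar unpacks the job JAR and adds any JARs found in its lib/ subdirectory to the classpath, so one sketch (reusing the paths from the question; the Hadoop core jar stays out because the cluster provides it) is:

mkdir -p bin/lib
cp ./lib/mahout-core-0.7.jar ./lib/mahout-math-0.7.jar ./lib/opencsv-2.3.jar bin/lib/
javac -classpath "./lib/*" -d ./bin ./VectorMaker.java
jar cf VectorMaker.jar -C ./bin .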
I'm very new to Hadoop. I followed the basic tutorial on how to create the word count program in Hadoop. Everything was fine. I then tried to create my own MapReduce job and put it in a separate jar file. When I tried to run the program, it gave me this error:
shean#ubuntu-PC:~/hadoop/bin$ hadoop jar ../weather.jar weather.Weather /user/hadoop/weather_log_sample.txt /user/hadoop/output
Warning: $HADOOP_HOME is deprecated.
Exception in thread "main" java.lang.NoClassDefFoundError: org/myorg/WordCount
at weather.Weather.main(Weather.java:45)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassNotFoundException: org.myorg.WordCount
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
... 6 more
But the problem is, it's looking for the WordCount class...
If I am not wrong, you are missing the jar wordcount.jar. Please add it to the build path.
My advice: remove the "package" declaration first; that makes NoClassDefFoundError less likely. Then at compile time:
javac -classpath "$HADOOP_HOME/hadoop-core-1.2.0.jar:$HADOOP_HOME/lib/commons-cli-1.2.jar" -d ./weather Weather.java
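If Weather.java genuinely references the WordCount class, the more direct fix is to compile both packages into one JAR; a sketch, assuming the WordCount source still lives under org/myorg/:

mkdir -p classes
javac -classpath "$HADOOP_HOME/hadoop-core-1.2.0.jar:$HADOOP_HOME/lib/commons-cli-1.2.jar" -d classes weather/Weather.java org/myorg/WordCount.java
jar cf weather.jar -C classes .
hadoop jar weather.jar weather.Weather /user/hadoop/weather_log_sample.txt /user/hadoop/output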