Hadoop MR job - java.lang.ClassNotFoundException: au.com.bytecode.opencsv.CSVParser

An Oozie workflow triggers the Java main class of a Hadoop MapReduce job. I added opencsv-2.3.jar and commons-lang3-3.1.jar as dependencies in my Eclipse project. The project builds successfully; however, when I move it to the Hadoop cluster I get a ClassNotFoundException, even though my project contains the jar.
Since this is a working legacy system, I do not want to change the environment's dependencies. I have therefore tried different combinations of adding the libraries to the classpath, without success.
Tried the suggestions from: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader - Upload File Vaadin
Also checked against the MapReduce client Maven dependency org.apache.hadoop:hadoop-mapreduce-client-common:2.6.0-cdh5.4.2.
The legacy jar in the production environment runs fine, but the jar compiled from my project throws the following errors:
Oozie syslog:
INFO [uber-SubtaskRunner] org.apache.hadoop.mapreduce.Job: Running job: job_123213123123_35305
INFO [communication thread] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1548794054671_35304_m_000000_0 is : 1.0
INFO [uber-SubtaskRunner] org.apache.hadoop.mapreduce.Job: Job job_123213123123_35305 running in uber mode : false
INFO [uber-SubtaskRunner] org.apache.hadoop.mapreduce.Job: map 0% reduce 0%
INFO [uber-SubtaskRunner] org.apache.hadoop.mapreduce.Job: Task Id : attempt_123213123123_35305_m_000001_0, Status : FAILED
Oozie stderr:
Error: java.lang.ClassNotFoundException: au.com.bytecode.opencsv.CSVParser
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Please suggest what I might be missing and what I can try.

The opencsv-2.3.jar library had been added via the Eclipse Build Path as an external jar, so it was never packaged into the job jar. I had to do a clean Maven build instead and, finally, use the "*jar-with-dependencies.jar" from the target folder, which fixed the issue.
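If building a fat jar is not desirable, another common approach is to ship the dependency through the distributed cache at submission time. A minimal sketch, assuming the jar has already been uploaded to HDFS (the /libs/opencsv-2.3.jar path and the CsvJobDriver class name are assumptions for illustration):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class CsvJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "csv-parse");
        job.setJarByClass(CsvJobDriver.class);
        // Ships the jar to every node and puts it on the task classpath;
        // the HDFS location /libs/opencsv-2.3.jar is assumed here.
        job.addFileToClassPath(new Path("/libs/opencsv-2.3.jar"));
        // ... set mapper, reducer, and input/output paths as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Either way, the key point is the same: jars on the Eclipse build path exist only at compile time and have to be delivered to the task JVMs separately.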

Related

HBase example, Exception in thread "main" java.lang.NoClassDefFoundError

We are trying to execute a basic HBase example on the Hortonworks sandbox (2.3):
hadoop jar /usr/hdp/2.3.0.0-2557/hbase/lib/hbase-examples.jar org.apache.hadoop.hbase.mapreduce.IndexBuilder
We get the exception below after executing this command.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/util/Bytes
at org.apache.hadoop.hbase.mapreduce.IndexBuilder.<clinit>(IndexBuilder.java:67)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:278)
at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.util.Bytes
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 5 more
Based on this error we tried to set the Hadoop classpath in hbase-env.sh:
/usr/hdp/2.3.0.0-2557/hbase/lib/hbase-client-1.1.1.2.3.0.0-2557.jar:/usr/hdp/2.3.0.0-2557/hbase/lib/hbase-common-1.1.1.2.3.0.0-2557.jar:/usr/hdp/2.3.0.0-2557/hbase/lib/protobuf-java-2.5.0.jar:/usr/hdp/2.3.0.0-2557/hbase/lib/guava-12.0.1.jar:/usr/hdp/2.3.0.0-2557/hbase/lib/zookeeper.jar:/usr/hdp/2.3.0.0-2557/hbase/lib/hbase-protocol-1.1.1.2.3.0.0-2557.jar:/usr/hdp/2.3.0.0-2557/hbase/lib/commons-configuration-1.6.jar:/usr/hdp/2.3.0.0-2557/hbase/lib/hadoop-common.jar:/usr/hdp/2.3.0.0-2557/hbase/lib/hbase-0.94.27.jar
But we still get the same error.
Instead of manually adding jars to the classpath, you can use the command below directly. $(hbase classpath) expands to the full HBase classpath, picking up the required jars from the sandbox's Hortonworks Hadoop folders.
HADOOP_CLASSPATH=$(hbase classpath):/usr/hdp/2.3.0.0-2557/hbase/conf hadoop jar /usr/hdp/2.3.0.0-2557/hbase/lib/hbase-examples.jar org.apache.hadoop.hbase.mapreduce.IndexBuilder
When I face a NoClassDefFoundError with MapReduce, I resolve it by pointing the job at a jar via one of the classes that jar contains.
e.g.
Job job = Job.getInstance(conf);
job.setJarByClass(org.apache.hadoop.hbase.util.Bytes.class);
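Note that setJarByClass(Bytes.class) ships the HBase jar containing Bytes as the job jar; more commonly you would pass a class from your own code and pull in the HBase jars separately, for example with TableMapReduceUtil.addDependencyJars(job) (assuming the HBase MapReduce utility class is available on the client classpath).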
Alternatively, supply jars to your job using the -libjars parameter:
e.g.
LIB=hbase-x.x.x.jar
hadoop jar /usr/hdp/2.3.0.0-2557/hbase/lib/hbase-examples.jar org.apache.hadoop.hbase.mapreduce.IndexBuilder -libjars ${LIB}
You can also add the jar to the HADOOP_CLASSPATH variable before launching the job.
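Note that -libjars is only honored when the driver parses Hadoop's generic options, i.e. when it implements Tool and is launched through ToolRunner. A minimal sketch of such a driver, with hypothetical class and job names:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() already reflects generic options such as -libjars.
        Job job = Job.getInstance(getConf(), "my-job");
        job.setJarByClass(MyDriver.class);
        // ... configure mapper/reducer and input/output paths here ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options before calling run().
        System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
    }
}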
Is all the latest code included in the jar? Use a Java decompiler such as JD-GUI to look inside the jar file and make sure the class you are referencing is actually there. Also check that the necessary import statements are present in the Java class.

Nutch without Solr & with MySQL

I am new to Nutch and just set it up following this URL: Nutch+MySQL
The command mentioned in the URL is deprecated and not available with Nutch 2.3, so I ran the following command instead.
bin/crawl urls 1 null 10
Injecting seed URLs
/home/Aayush/NUTCH_HOME/runtime/local/bin/nutch inject urls -crawlId 1
InjectorJob: starting at 2015-03-15 17:14:50
InjectorJob: Injecting urlDir: urls
InjectorJob: java.lang.ClassNotFoundException: org.apache.gora.sql.store.SqlStore
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:259)
at org.apache.nutch.storage.StorageUtils.getDataStoreClass(StorageUtils.java:93)
at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:77)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:218)
at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:252)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:275)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:284)
And I get the following error:
Error running:
/home/Aayush/NUTCH_HOME/runtime/local/bin/nutch inject urls -crawlId 1
Failed with exit value 255.
I found somewhere that Nutch + MySQL is not supported by Gora 0.5 and that you have to downgrade to version 0.2.1. I downgraded the version in ivy.xml, removed the build and runtime directories, and rebuilt the project using ant.
After rebuilding and rerunning, I still get the above-mentioned error.
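(If it helps: in a standard Nutch 2.x setup, the store class named in the stack trace comes from the storage.data.store.class property in conf/nutch-site.xml, and org.apache.gora.sql.store.SqlStore is provided by the gora-sql artifact, so presumably that jar is still missing from the runtime classpath after the rebuild.)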
Please provide any information/help.
Thank you for reading.

Hadoop external jars

I am trying to run a Hadoop job on a server. The version is 0.20.2.
I have a large number of jars, and I am running:
hadoop jar GenData.jar -libjars /path/jar1,path/jar2,...
I get the error below even though the corresponding classes are inside the jars:
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/avro/mapreduce/AvroKeyInputFormat at
GenerateTrainningData.main(GenerateTrainningData.java:256) at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606) at
org.apache.hadoop.util.RunJar.main(RunJar.java:197) Caused by:
java.lang.ClassNotFoundException:
org.apache.avro.mapreduce.AvroKeyInputFormat at
java.net.URLClassLoader$1.run(URLClassLoader.java:366) at
java.net.URLClassLoader$1.run(URLClassLoader.java:355) at
java.security.AccessController.doPrivileged(Native Method) at
java.net.URLClassLoader.findClass(URLClassLoader.java:354) at
java.lang.ClassLoader.loadClass(ClassLoader.java:425) at
java.lang.ClassLoader.loadClass(ClassLoader.java:358)
It looks like you are getting this exception on the Hadoop client side; MapReduce driver code executes in the client JVM. In Hadoop, -libjars is a generic option used for adding dependent jars to the mapper/reducer classpath, and it is only honored when the driver parses generic options via ToolRunner. To add jars to the client's own classpath, set the following environment variable before executing the hadoop command:
export HADOOP_CLASSPATH=<PATH_to_jar>/Jar1:<PATH_to_jar>/Jar2
(A colon ":" separates multiple jars. In your case, add the jar that contains the class org.apache.avro.mapreduce.AvroKeyInputFormat.)
Edit:
First of all, you need to find the jar containing the class org.apache.avro.mapreduce.AvroKeyInputFormat. The class lives in avro-mapred*.jar (get a version of avro-mapred compatible with your Avro and Hadoop versions), then include it in your classpath using the command above.
You are missing the avro-mapred dependency.
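For context, a minimal sketch of a driver that references AvroKeyInputFormat, which is why the class must be visible in the client JVM and not only in the tasks (the job name and schema below are assumptions for illustration):
import org.apache.avro.Schema;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.avro.mapreduce.AvroKeyInputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class AvroDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "avro-input");
        job.setJarByClass(AvroDriver.class);
        // Referencing AvroKeyInputFormat here is what triggers the
        // NoClassDefFoundError in the client JVM when avro-mapred is absent.
        job.setInputFormatClass(AvroKeyInputFormat.class);
        // The input key schema is an assumption for illustration.
        AvroJob.setInputKeySchema(job, Schema.create(Schema.Type.STRING));
        // ... mapper/reducer and paths as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}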

Apache Spark: issue with Scala example

I'm trying to learn Apache Spark and I have a problem with a simple example, but I cannot find a solution. I'm working on Ubuntu 13.04 with Oracle Java 7 and Scala 2.9.3.
When I try to run the SparkPi example I get this output:
filippo#filippo-HP-Pavilion-dv6-Notebook-PC:/usr/local/spark$ ./bin/run-example SparkPi 10
java.lang.ClassNotFoundException: org.apache.spark.examples.SparkPi
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:337)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
This is the example shown in the Spark documentation, but I don't understand what the problem is :(
You may have downloaded a source release rather than a pre-built one?
To build and assemble with sbt, you can run sbt assembly in the Spark root directory.
Your installation directory is /usr/local/spark, which doesn't contain the required class.
Try simply extracting the downloaded tgz file from http://spark.apache.org/downloads.html, then cd into that directory and run the example command.
Make sure that lib/spark-examples-XXX-YYY.jar is present when you run bin/run-example.

Hadoop job.setJar not working for jars on HDFS

I am trying to solve an issue where a Hadoop app throws java.lang.ClassNotFoundException:
WARN mapreduce.FaunusCompiler: Using the distribution Faunus job jar: ../lib/faunus-0.4.4-hadoop2-job.jar
INFO mapreduce.FaunusCompiler: Compiled to 1 MapReduce job(s)
INFO mapreduce.FaunusCompiler: Executing job 1 out of 1: VerticesMap.Map > CountMapReduce.Map > CountMapReduce.Reduce
INFO mapreduce.FaunusCompiler: Job data location: output/job-0
INFO client.RMProxy: Connecting to ResourceManager at yuriys-bigdata3/172.31.8.161:8032
WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner
INFO input.FileInputFormat: Total input paths to process : 1
INFO mapreduce.JobSubmitter: number of splits:1
INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1402963354379_0016
INFO impl.YarnClientImpl: Submitted application application_1402963354379_0016
INFO mapreduce.Job: The url to track the job: http://local-bigdata3:8088/proxy/application_1402963354379_0016/
INFO mapreduce.Job: Running job: job_1402963354379_0016
INFO mapreduce.Job: Job job_1402963354379_0016 running in uber mode : false
INFO mapreduce.Job: map 0% reduce 0%
INFO mapreduce.Job: Task Id : attempt_1402963354379_0016_m_000000_0, Status : FAILED
Error: java.lang.ClassNotFoundException:
com.tinkerpop.blueprints.util.DefaultVertexQuery
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat.setConf(GraphSONInputFormat.java:39)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:726)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
The app does create a "fat" jar file, where all the dependency jars (including the one that contains the not found class) are included under the lib node
The app does set the Job.setJar on this fat jar file.
The code does not do anything strange:
job.setJar(hadoopFileJar);
...
boolean success = job.waitForCompletion(true);
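For comparison, the more common pattern lets Hadoop locate and ship the jar from a class it contains; a minimal sketch (FaunusDriver is a hypothetical name):
Job job = Job.getInstance(conf, "faunus-count");
// setJarByClass ships the jar that contains the given class; on each node
// YARN unpacks it and normally puts job.jar, job.jar/classes and
// job.jar/lib/* on the task classpath.
job.setJarByClass(FaunusDriver.class);
boolean success = job.waitForCompletion(true);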
Besides, I looked at the configuration in yarn-site.xml and verified that a job directory under yarn.nodemanager.local-dirs does contain that jar (renamed to job.jar, though), along with a lib directory holding the extracted jars.
That is, the jar that contains the missing class is there. YARN/MR recreates this directory with all the required files every time a job is scheduled, so the files do get transferred there.
What I've discovered so far is that the classpath environment variable of the Java worker processes executing the failing code is set to
C:\hdp\data\hadoop\local\usercache\user\appcache\application_1402963354379_0013\container_1402963354379_0013_02_000001\classpath-3824944728798396318.jar
and this jar contains just a MANIFEST.MF. That manifest lists classpath entries pointing at the fat jar file and its directories:
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.jar/job.jar
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.jar/classes/
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/jobSubmitDir/job.splitmetainfo
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/jobSubmitDir/job.split
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.xml
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.jar/
However, this classpath does not explicitly add the jars inside those directories; i.e., the directory from the above manifest
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.jar/
does contain the jar file with the class that YARN is not finding (this directory holds all the jars from the fat jar's lib section), but to the Java classloader this kind of classpath entry is ineffective: for the jars to be picked up, the directory would have to be included with a wildcard,
e.g.:
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.jar/*
What am I doing wrong in passing dependencies to YARN?
Could the cluster configuration be an issue, or could this be a bug in my Hadoop distro (HDP 2.1, Windows x64)?
