Hadoop Job.setJar does not work for jars on HDFS - java

I am trying to solve an issue where a Hadoop app throws a java.lang.ClassNotFoundException:
WARN mapreduce.FaunusCompiler: Using the distribution Faunus job jar: ../lib/faunus-0.4.4-hadoop2-job.jar
INFO mapreduce.FaunusCompiler: Compiled to 1 MapReduce job(s)
INFO mapreduce.FaunusCompiler: Executing job 1 out of 1: VerticesMap.Map > CountMapReduce.Map > CountMapReduce.Reduce
INFO mapreduce.FaunusCompiler: Job data location: output/job-0
INFO client.RMProxy: Connecting to ResourceManager at yuriys-bigdata3/172.31.8.161:8032
WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner
INFO input.FileInputFormat: Total input paths to process : 1
INFO mapreduce.JobSubmitter: number of splits:1
INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1402963354379_0016
INFO impl.YarnClientImpl: Submitted application application_1402963354379_0016
INFO mapreduce.Job: The url to track the job: http://local-bigdata3:8088/proxy/application_1402963354379_0016/
INFO mapreduce.Job: Running job: job_1402963354379_0016
INFO mapreduce.Job: Job job_1402963354379_0016 running in uber mode : false
INFO mapreduce.Job: map 0% reduce 0%
INFO mapreduce.Job: Task Id : attempt_1402963354379_0016_m_000000_0, Status : FAILED
Error: java.lang.ClassNotFoundException:
com.tinkerpop.blueprints.util.DefaultVertexQuery
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat.setConf(GraphSONInputFormat.java:39)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:726)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
The app does create a "fat" jar file, where all the dependency jars (including the one that contains the missing class) are included under the lib directory.
The app does call Job.setJar with this fat jar file.
The code does not do anything unusual:
job.setJar(hadoopFileJar);
...
boolean success = job.waitForCompletion(true);
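For context, here is roughly how that fits into a complete driver. This is only a minimal sketch with placeholder names, paths, and job setup, not the actual application code:

// Minimal driver sketch; the jar path, job name, and I/O paths are placeholders.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FatJarDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "fat-jar-example");

        // Point the job at the fat jar; as observed below, YARN copies it into the
        // container working area as job.jar with its lib/ contents extracted next to it.
        String hadoopFileJar = "/local/path/to/app-job.jar";   // placeholder path
        job.setJar(hadoopFileJar);

        job.setMapperClass(Mapper.class);                      // identity mapper as a stand-in
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        boolean success = job.waitForCompletion(true);
        System.exit(success ? 0 : 1);
    }
}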
Besides, I checked the configuration in yarn-site.xml and verified that the job directory under yarn.nodemanager.local-dirs does contain that jar (renamed to job.jar), as well as a lib directory with the extracted jars in it,
i.e. the jar that contains the missing class is there. YARN/MR recreates this directory with all the required files every time a job is scheduled, so the files do get transferred there.
What I've discovered so far is that the classpath environment variable on the Java worker processes that execute the failing code is set to
C:\hdp\data\hadoop\local\usercache\user\appcache\application_1402963354379_0013\container_1402963354379_0013_02_000001\classpath-3824944728798396318.jar
and this jar contains only a MANIFEST.MF. That manifest contains paths to the directory with the "fat" jar file and its subdirectories (original formatting preserved):
file:/c:/hdp/data/hadoop/loc al/usercache/user/appcache/application_1402963354379_0013/container
_1402963354379_0013_02_000001/job.jar/job.jar file:/c:/hdp/data/hadoo p/local/usercache/user/appcache/application_1402963354379_0013/cont ainer_1402963354379_0013_02_000001/job.jar/classes/ file:/c:/hdp/data /hadoop/local/usercache/user/appcache/application_1402963354379_001 3/container_1402963354379_0013_02_000001/jobSubmitDir/job.splitmetain fo file:/c:/hdp/data/hadoop/local/usercache/user/appcache/applicati on_1402963354379_0013/container_1402963354379_0013_02_000001/jobSubmi tDir/job.split file:/c:/hdp/data/hadoop/local/usercache/user/appcac he/application_1402963354379_0013/container_1402963354379_0013_02_000 001/job.xml file:/c:/hdp/data/hadoop/local/usercache/user/appcache/ application_1402963354379_0013/container_1402963354379_0013_02_000001 /job.jar/
However, this classpath does not explicitly add the jars inside those directories. For example, the directory from the above manifest
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.jar/
does contain the jar file with the class that YARN cannot find (this directory holds all the jars from the fat jar's lib section), but from the Java point of view this way of setting the classpath seems incorrect: the directory was supposed to be included with a wildcard,
e.g.:
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.jar/*
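(Note that the Java launcher only expands * wildcards given on the -cp/-classpath command line or in the CLASSPATH variable; entries in a jar manifest's Class-Path attribute must be explicit, and a bare directory entry only exposes loose .class files under it, not the jars it contains.) To see exactly which entries the container JVM ends up with, the Class-Path attribute of that generated classpath-*.jar can be dumped with a few lines of java.util.jar code. This is just an inspection helper written for illustration, not part of the job:

// Inspection helper (illustrative only): print the Class-Path entries of a classpath-*.jar.
import java.util.jar.JarFile;
import java.util.jar.Manifest;

public class PrintClasspathJarEntries {
    public static void main(String[] args) throws Exception {
        // args[0] = path to the generated classpath-<id>.jar
        try (JarFile jar = new JarFile(args[0])) {
            Manifest mf = jar.getManifest();
            String classPath = mf.getMainAttributes().getValue("Class-Path");
            // The manifest parser joins the 72-character continuation lines; entries are space-separated.
            for (String entry : classPath.split("\\s+")) {
                System.out.println(entry);
            }
        }
    }
}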
What am I doing wrong when passing dependencies to YARN?
Could the cluster configuration be an issue, or could this be a bug in my Hadoop distribution (HDP 2.1, Windows x64)?

Related

Hadoop MR job - java.lang.ClassNotFoundException: au.com.bytecode.opencsv.CSVParser

An Oozie workflow triggers a Hadoop MapReduce job's Java class. I have added the opencsv-2.3.jar and commons-lang3-3.1.jar dependencies to my Eclipse project. The project builds successfully, but when it is moved to the Hadoop cluster I get a ClassNotFoundException even though my project contains the jar.
Since this is a working legacy system, I do not wish to change the environment dependencies. Hence, I tried different combinations of adding libraries to the classpath, without success.
Tried: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader - Upload File Vaadin
Checked with an MR client Maven dependency: org.apache.hadoop:hadoop-mapreduce-client-common:2.6.0-cdh5.4.2.
The legacy jar in the production environment runs fine, but the jar compiled from my project throws the following errors:
oozie syslog:
INFO [uber-SubtaskRunner] org.apache.hadoop.mapreduce.Job: Running job: job_123213123123_35305
INFO [communication thread] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1548794054671_35304_m_000000_0 is : 1.0
INFO [uber-SubtaskRunner] org.apache.hadoop.mapreduce.Job: Job job_123213123123_35305 running in uber mode : false
INFO [uber-SubtaskRunner] org.apache.hadoop.mapreduce.Job: map 0% reduce 0%
INFO [uber-SubtaskRunner] org.apache.hadoop.mapreduce.Job: Task Id : attempt_123213123123_35305_m_000001_0, Status : FAILED
oozie stderr:
Error: java.lang.ClassNotFoundException: au.com.bytecode.opencsv.CSVParser
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Please suggest if I am missing anything and what I can try.
The opencsv-2.3.jar library had been added as an external jar via the Eclipse Build Path. I had to run mvn clean and rebuild the project with Maven, and then use the "*jar-with-dependencies.jar" from the target folder, which fixed the issue.

Neo4j - reco : Engine FriendsComputingEngine wasn't found on the classpath

I am using neo4j-reco to pre-compute real-time recommendations.
I have a sample graph, and the .jar files have been placed into the plugins directory of the Neo4j installation as mentioned in the readme file,
but I get the following error when restarting the server.
2015-12-01 15:38:35.769+0530 INFO Neo4j Server shutdown initiated by request
15:38:35.788 [Thread-12] INFO c.g.s.f.b.GraphAwareServerBootstrapper - stopped
2015-12-01 15:38:35.789+0530 INFO Successfully shutdown Neo4j Server
15:38:36.399 [Thread-12] INFO c.g.runtime.BaseGraphAwareRuntime - Shutting down GraphAware Runtime...
15:38:36.399 [Thread-12] INFO c.g.r.schedule.RotatingTaskScheduler - Terminating task scheduler...
15:38:36.399 [Thread-12] INFO c.g.r.schedule.RotatingTaskScheduler - Task scheduler terminated successfully.
15:38:36.399 [Thread-12] INFO c.g.runtime.BaseGraphAwareRuntime - GraphAware Runtime shut down.
2015-12-01 15:38:36.405+0530 INFO Successfully stopped database
2015-12-01 15:38:36.405+0530 INFO Successfully shutdown database
15:38:40.041 [main] INFO c.g.r.b.RuntimeKernelExtension - GraphAware Runtime enabled, bootstrapping...
15:38:40.069 [main] INFO c.g.r.b.RuntimeKernelExtension - Bootstrapping module with order 1, ID reco, using com.graphaware.reco.neo4j.module.RecommendationModuleBootstrapper
15:38:40.077 [main] INFO c.g.r.n.m.RecommendationModuleBootstrapper - Constructing new recommendation module with ID: reco
15:38:40.080 [main] INFO c.g.r.n.m.RecommendationModuleBootstrapper - Trying to instantiate class FriendsComputingEngine
15:38:40.089 [main] ERROR c.g.r.n.m.RecommendationModuleBootstrapper - Engine FriendsComputingEngine wasn't found on the classpath. Will not pre-compute recommendations
java.lang.ClassNotFoundException: FriendsComputingEngine
at java.net.URLClassLoader$1.run(URLClassLoader.java:366) ~[na:1.7.0_91]
at java.net.URLClassLoader$1.run(URLClassLoader.java:355) ~[na:1.7.0_91]
at java.security.AccessController.doPrivileged(Native Method) ~[na:1.7.0_91]
at java.net.URLClassLoader.findClass(URLClassLoader.java:354) ~[na:1.7.0_91]
at java.lang.ClassLoader.loadClass(ClassLoader.java:425) ~[na:1.7.0_91]
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) ~[na:1.7.0_91]
at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ~[na:1.7.0_91]
at java.lang.Class.forName0(Native Method) ~[na:1.7.0_91]
at java.lang.Class.forName(Class.java:195) ~[na:1.7.0_91]
How do I solve this?
You need to build FriendsComputingEngine first if you're referring to it in your config. If you follow the steps in the readme file you mention, you will end up building one.

ClassNotFoundException is thrown when running ExportSnapshot

I'm getting a confusing ClassNotFoundException when I try to run ExportSnapshot from my HBase master node. hbase shell and other commands work just fine, and my cluster is fully operational.
This feels like a Classpath issue, but I don't know what I'm missing.
$ /usr/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot ambarismoketest-snapshot -copy-to hdfs://10.0.1.90/apps/hbase/data -mappers 16
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2015-10-13 20:05:02,339 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2015-10-13 20:05:04,351 INFO [main] util.FSVisitor: No logs under directory:hdfs://cinco-de-nameservice/apps/hbase/data/.hbase-snapshot/impression_event_production_hbase-transfer-to-staging-20151013/WALs
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/Job
at org.apache.hadoop.hbase.snapshot.ExportSnapshot.runCopyJob(ExportSnapshot.java:529)
at org.apache.hadoop.hbase.snapshot.ExportSnapshot.run(ExportSnapshot.java:646)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.snapshot.ExportSnapshot.innerMain(ExportSnapshot.java:697)
at org.apache.hadoop.hbase.snapshot.ExportSnapshot.main(ExportSnapshot.java:701)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.Job
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 5 more
Problem
It turns out this is because the mapreduce2 JARs are not available in the classpath. The classpath was properly set up, but I did not have the mapreduce2 client installed on that node. HBase's ExportSnapshot apparently depends on those client JARs when exporting snapshots to another cluster because it writes to HDFS.
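A quick way to confirm that diagnosis on a given node is to check whether the MapReduce client classes resolve at all. This is just a throwaway check (hypothetical class name), run against whatever classpath the hbase launcher builds:

// Throwaway diagnostic: is org.apache.hadoop.mapreduce.Job visible on this node?
public class CheckMapReduceClasspath {
    public static void main(String[] args) {
        try {
            Class.forName("org.apache.hadoop.mapreduce.Job");
            System.out.println("MapReduce client jars are on the classpath");
        } catch (ClassNotFoundException e) {
            System.out.println("org.apache.hadoop.mapreduce.Job not found - the mapreduce2 client jars are missing");
        }
    }
}

Compiling this and running it with java -cp "$(hbase classpath)" CheckMapReduceClasspath reproduces the same missing-class condition without having to kick off a snapshot export.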
Fix
If you use Ambari:
Load Ambari UI
Pull up the node where you were running ExportSnapshot and getting the above error
Under "components", click "Add"
Click "Mapreduce 2 client"
Background
There's a ticket here, https://issues.apache.org/jira/browse/HBASE-9687, titled "ClassNotFoundException is thrown when ExportSnapshot runs against hadoop cluster where HBase is not installed on the same node as resourcemanager". The title implies that installing the ResourceManager is the fix, and that may work; however, the crux is that you need the Hadoop mapreduce2 jars on the classpath, and you can get them by simply installing the mapreduce2 client.
For us, specifically, the reason our snapshot exports were working one day and broken the next is that our HBase master switched on us because of another issue we had. Our backup HBase master did not have the mapreduce2 client JARs, but the original primary master did.

Talend Server External Jar Files

I've got a problem when I try to deploy a job to my Talend enterprise server. When I run the job in the Talend Administration Center I get the following error:
java.lang.NoClassDefFoundError: javax/xml/rpc/encoding/SerializerFactory
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
In this job I'm using some external jar files (axis.jar and jaxrpc.jar), and I added a tLibraryLoad component to the job, but without any import commands. I ran locate on the Talend server to make sure the needed files are present on the server. I found them in several directories, and now I'm not sure whether they have to be relocated. The directories are the following:
{Talend-Installation}/cmdline/studio/commandline-workspace/.Java/lib/jaxrpc.jar
{Talend-Installation}/cmdline/studio/configuration/lib/java/jaxrpc.jar
{Talend-Installation}/cmdline/studio/plugins/javax.xml.rpc_[version]/lib/jaxrpc.jar
{Talend-Installation}/studio/plugins/javax.xml.rpc_[version]
On my client the job runs without any errors. Can someone help me with that?
Don't hesitate to ask me for additional content if needed.
Cheers.
External libraries have to be on the machine where the job will be executed. Therefore, in order to make it run:
Use a context variable in the tLibraryLoad component: context.my_jar_path+"/jaxrpc.jar"
Put the jar files on the execution server
Depending on how you load your context, set the context variable to the jar path: context.my_jar_path = /Data/Talend/ExtJars/

running hadoop .. -libjars using HIPI

I'm new to Java and trying to run an MR job that uses HIPI: http://hipi.cs.virginia.edu/
I've used the command as described in:
http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html
I'm using hadoop 0.20.2
My command looks like:
hadoop jar grayscalefromfile_exc.jar grayscalefromfile_exc.StubDriver -libjars hipi-0.0.1.jar imgs imgsOut1
where the path looks like:
--
--grayscalefromfile_exc.jar
--hipi-0.0.1.jar
The error I get:
Exception in thread "main" java.lang.NoClassDefFoundError: hipi/imagebundle/mapreduce/ImageBundleInputFormat
at grayscalefromfile_exc.StubDriver.run(StubDriver.java:89)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at grayscalefromfile_exc.StubDriver.main(StubDriver.java:103)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Caused by: java.lang.ClassNotFoundException: hipi.imagebundle.mapreduce.ImageBundleInputFormat
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 9 more
Needless to say, the hipi-0.0.1.jar does contain the path hipi/imagebundle/mapreduce/ImageBundleInputFormat.java inside it.
Thanks
-libjars uploads the given jars to the cluster and then makes them available on the classpath of each mapper / reducer instance.
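One thing worth noting: -libjars is a generic option handled by GenericOptionsParser, so it is only honored when the driver is launched through ToolRunner (which the StubDriver in the stack trace above already does). A minimal sketch of such a driver, with placeholder names:

// Minimal Tool-based driver sketch so that generic options such as -libjars are honored.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class LibJarsDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() already reflects -libjars and the other generic options.
        Job job = new Job(getConf(), "libjars-example");   // placeholder job name
        job.setJarByClass(LibJarsDriver.class);
        // ... input/output formats and mapper/reducer setup go here ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options before handing the remaining args to run().
        int exitCode = ToolRunner.run(new Configuration(), new LibJarsDriver(), args);
        System.exit(exitCode);
    }
}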
If you want to add additional jars to the driver client classpath, you'll need to use the HADOOP_CLASSPATH env variable:
#> export HADOOP_CLASSPATH=hipi-0.0.1.jar
#> hadoop jar grayscalefromfile_exc.jar grayscalefromfile_exc.StubDriver -libjars hipi-0.0.1.jar imgs imgsOut1
And my output when I run this (the error relates to the fact that I haven't got a HIPI image bundle file):
cswhite#Studio-1555:~/workspace/sandbox/so-hipi/target$ export $HADOOP_CLASSPATH=/home/cswhite/Downloads/hipi-0.0.1.jar
cswhite#Studio-1555:~/workspace/sandbox/so-hipi/target$ echo $HADOOP_CLASSPATH
/home/cswhite/Downloads/hipi-0.0.1.jar
cswhite#Studio-1555:~/workspace/sandbox/so-hipi/target$ hadoop jar so-hipi-0.0.1-SNAPSHOT.jar StubDriver -libjars ~/Downloads/hipi-0.0.1.jar images output
num of args: 2:images,output
****hdfs://localhost:9000/user/cswhite/images
12/05/14 14:06:34 INFO input.FileInputFormat: Total input paths to process : 1
12/05/14 14:06:34 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:9000/tmp/hadoop-hadoop/mapred/staging/cswhite/.staging/job_201205141351_0003
12/05/14 14:06:34 ERROR security.UserGroupInformation: PriviledgedActionException as:cswhite cause:java.io.IOException: not a hipi image bundle
Exception in thread "main" java.io.IOException: not a hipi image bundle
at hipi.imagebundle.HipiImageBundle.readBundleHeader(HipiImageBundle.java:322)
at hipi.imagebundle.HipiImageBundle.openForRead(HipiImageBundle.java:388)
at hipi.imagebundle.AbstractImageBundle.open(AbstractImageBundle.java:82)
at hipi.imagebundle.AbstractImageBundle.open(AbstractImageBundle.java:55)
at hipi.imagebundle.mapreduce.ImageBundleInputFormat.getSplits(ImageBundleInputFormat.java:61)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
at StubDriver.run(StubDriver.java:53)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at StubDriver.main(StubDriver.java:57)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
I was able to resolve a similar issue by using the following API in the main class
DistributedCache.addFileToClassPath(new Path("/path/application.jar"), conf);
The jar must already be present at the HDFS path /path/application.jar.
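For reference, a sketch of how that call sits in a driver, assuming the jar has already been copied to HDFS; the path, job name, and class name are placeholders:

// Sketch of the DistributedCache approach; the HDFS path and names are placeholders.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class DistributedCacheDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // The jar must already be on HDFS (e.g. uploaded with "hadoop fs -put application.jar /path/").
        // Add it before constructing the Job, so the Job picks up the cache entry from this conf.
        DistributedCache.addFileToClassPath(new Path("/path/application.jar"), conf);

        Job job = new Job(conf, "distributed-cache-example");
        // ... mapper/reducer and input/output format setup ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}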
