ClassNotFoundException when setting hive.exec.pre.hooks - java

I am following this guide to implement a Hive hook:
http://dharmeshkakadia.github.io/hive-hook/
But I get this error when I run show tables:
2018-08-12 09:57:38,122 ERROR org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-315]: hive.exec.pre.hooks Class not found: HiveExampleHook
2018-08-12 09:57:38,122 ERROR org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-315]: FAILED: Hive Internal Error: java.lang.ClassNotFoundException(HiveExampleHook)
java.lang.ClassNotFoundException: HiveExampleHook
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.hive.ql.hooks.HooksLoader.getHooks(HooksLoader.java:100)
at org.apache.hadoop.hive.ql.hooks.HooksLoader.getHooks(HooksLoader.java:64)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1674)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1501)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1285)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1280)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:236)
at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:89)
at org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:301)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:314)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2018-08-12 09:57:38,122 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-315]: </PERFLOG method=Driver.execute start=1534067858120 end=1534067858122 duration=2 from=org.apache.hadoop.hive.ql.Driver>
2018-08-12 09:57:38,122 INFO org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-315]: Completed executing command(queryId=hive_20180812095757_e6516d83-ddc9-4f82-8151-def7e7f1eb37); Time taken: 0.002 seconds
2018-08-12 09:57:38,122 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-315]: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
2018-08-12 09:57:38,122 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-315]: </PERFLOG method=releaseLocks start=1534067858122 end=1534067858122 duration=0 from=org.apache.hadoop.hive.ql.Driver>
2018-08-12 09:57:38,130 ERROR org.apache.hive.service.cli.operation.Operation: [HiveServer2-Background-Pool: Thread-315]: Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Hive Internal Error: java.lang.ClassNotFoundException(HiveExampleHook)
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:400)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:238)
at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:89)
at org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:301)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:314)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: HiveExampleHook
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.hive.ql.hooks.HooksLoader.getHooks(HooksLoader.java:100)
at org.apache.hadoop.hive.ql.hooks.HooksLoader.getHooks(HooksLoader.java:64)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1674)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1501)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1285)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1280)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:236)
... 11 more
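For reference, the hook class from the guide is essentially something like this (my rough sketch; the guide's actual source may implement a different hook interface, but the idea is the same):

import org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext;
import org.apache.hadoop.hive.ql.hooks.HookContext;

// Sketch of the example pre-execution hook. Note there is no package
// declaration, which is why it is referenced simply as "HiveExampleHook".
public class HiveExampleHook implements ExecuteWithHookContext {
    @Override
    public void run(HookContext hookContext) throws Exception {
        System.out.println("Hello from the hook !!");
    }
}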
I am fairly sure the problem is with the last step, add jar target/Hive-hook-example-1.0.jar;. I tried the following:
I put the jar file into HDFS under /user/hive/ and ran:
add jar hdfs:///user/hive/Hive-hook-example-1.0.jar;
I also set "Hive Auxiliary JARs Directory" to /home/centos/HiveExampleHook/target/Hive-hook-example-1.0.jar on the HiveServer2 node, then restarted Hive and beeline.
I copied the jar file to /opt/cloudera/parcels/CDH/jars/.
I copied the jar file to /opt/cloudera/parcels/CDH/lib/hive/lib/.
Nothing helps.
Any idea?
UPDATE 1:
If I run LIST JARS; it shows:
+----------------------------------------------------+--+
| resource |
+----------------------------------------------------+--+
| /tmp/3fe67bb1-5cfd-427f-8faa-cab6524afeb3_resources/Hive-hook-example-1.0.jar |
+----------------------------------------------------+--+
I tried two ways to do CREATE FUNCTION too:
CREATE TEMPORARY FUNCTION test1 AS 'HiveExampleHook';
INFO : Compiling command(queryId=hive_20180812153838_47589f9d-eaeb-410d-80b0-9cf414ae557f): CREATE TEMPORARY FUNCTION test1 AS 'HiveExampleHook'
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20180812153838_47589f9d-eaeb-410d-80b0-9cf414ae557f); Time taken: 0.002 seconds
INFO : Executing command(queryId=hive_20180812153838_47589f9d-eaeb-410d-80b0-9cf414ae557f): CREATE TEMPORARY FUNCTION test1 AS 'HiveExampleHook'
INFO : Starting task [Stage-0:FUNC] in serial mode
ERROR : FAILED: Class HiveExampleHook not found
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask
INFO : Completed executing command(queryId=hive_20180812153838_47589f9d-eaeb-410d-80b0-9cf414ae557f); Time taken: 0.003 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1)
and...
CREATE TEMPORARY FUNCTION test1 AS 'HiveExampleHook' USING JAR 'hdfs:///user/hive/Hive-hook-example-1.0.jar';
INFO : Compiling command(queryId=hive_20180812153939_cf1f31c9-0361-47dc-8903-78221bd12401): CREATE TEMPORARY FUNCTION test1 AS 'HiveExampleHook' USING JAR 'hdfs:///user/hive/Hive-hook-example-1.0.jar'
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20180812153939_cf1f31c9-0361-47dc-8903-78221bd12401); Time taken: 0.004 seconds
INFO : Executing command(queryId=hive_20180812153939_cf1f31c9-0361-47dc-8903-78221bd12401): CREATE TEMPORARY FUNCTION test1 AS 'HiveExampleHook' USING JAR 'hdfs:///user/hive/Hive-hook-example-1.0.jar'
INFO : Starting task [Stage-0:FUNC] in serial mode
INFO : converting to local hdfs:///user/hive/Hive-hook-example-1.0.jar
INFO : Added [/tmp/3fe67bb1-5cfd-427f-8faa-cab6524afeb3_resources/Hive-hook-example-1.0.jar] to class path
INFO : Added resources: [hdfs:///user/hive/Hive-hook-example-1.0.jar]
ERROR : FAILED: Class HiveExampleHook not found
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask
INFO : Completed executing command(queryId=hive_20180812153939_cf1f31c9-0361-47dc-8903-78221bd12401); Time taken: 0.03 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1)
So clearly it can find the jar, but not the class. Am I right?
UPDATE 2:
I tried this:
[Hive-hook-example]# java -cp `pwd`/target/Hive-hook-example-1.0.jar HiveExampleHook
And still got this:
Error: Could not find or load main class HiveExampleHook
I believe I am making some silly mistake here.
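To double-check what the class is actually called inside the jar, I can list its contents (a sketch using the same jar path as above):

jar tf target/Hive-hook-example-1.0.jar
# Expect HiveExampleHook.class at the root of the jar.
# If it shows up under a package instead, e.g. com/example/HiveExampleHook.class
# (hypothetical), the hook has to be registered by its fully qualified name.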
UPDATE 3:
OK, I figured it out. You have to use the hive CLI, not beeline, for this to work.
hive> add jar hdfs:///user/hive/Hive-hook-example-1.0.jar;
add jar hdfs:///user/hive/Hive-hook-example-1.0.jar
converting to local hdfs:///user/hive/Hive-hook-example-1.0.jar
Added [/tmp/0a90132d-70cd-4ef0-b4cd-e75dc823e5ca_resources/Hive-hook-example-1.0.jar] to class path
Added resources: [hdfs:///user/hive/Hive-hook-example-1.0.jar]
hive> set hive.exec.pre.hooks=HiveExampleHook;
set hive.exec.pre.hooks=HiveExampleHook
hive> show tables;
show tables
Hello from the hook !!
OK
test1
Time taken: 0.023 seconds, Fetched: 5 row(s)
So the question is: how do I run this in beeline, given that the hive CLI is deprecated?
UPDATE 4:
I decided to do this:
I ran beeline and saw this in the HiveServer2 log:
2018-08-12 16:39:13,286 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-60]: <PERFLOG method=PreHook.HiveExampleHook from=org.apache.hadoop.hive.ql.Driver>
2018-08-12 16:39:13,286 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-60]: </PERFLOG method=PreHook.HiveExampleHook start=1534091953286 end=1534091953286 duration=0 from=org.apache.hadoop.hive.ql.Driver>
That is some progress, although I am not sure what it means or whether the hook class actually ran, since I see no output.

With beeline, you have to use an HDFS path when adding the jar. Remember that beeline is just a JDBC client, so when you use add jar with a local path, the reference is to your local path, which is not accessible to the Hive session running on the cluster.
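For example, a minimal beeline session that should exercise the hook would look like this (a sketch; it assumes the jar has already been uploaded to hdfs:///user/hive/ as above and that the session is allowed to set hive.exec.pre.hooks):

ADD JAR hdfs:///user/hive/Hive-hook-example-1.0.jar;
SET hive.exec.pre.hooks=HiveExampleHook;
SHOW TABLES;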
(Thanks for asking for help at https://twitter.com/quanghoc/status/1028671393376874496. I am the author of the blog post you referred to.)

Related

Hadoop MR job - java.lang.ClassNotFoundException: au.com.bytecode.opencsv.CSVParser

An Oozie workflow triggers a Hadoop MapReduce job's Java class. I have added the opencsv-2.3.jar and commons-lang3-3.1 jar dependencies to my Eclipse project. The project builds successfully; however, when I move it onto the Hadoop cluster I get a ClassNotFoundException even though my project contains the jar.
Since this is an existing, working legacy system, I do not wish to change the environment dependencies. Hence, I tried different combinations of adding the libraries to the classpath, without success.
Tried: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader - Upload File Vaadin
I also checked with the MR client Maven dependency org.apache.hadoop:hadoop-mapreduce-client-common:2.6.0-cdh5.4.2.
The legacy jar in the production environment runs fine, but my project's compiled jar throws the following errors:
oozie syslog:
INFO [uber-SubtaskRunner] org.apache.hadoop.mapreduce.Job: Running job: job_123213123123_35305
INFO [communication thread] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1548794054671_35304_m_000000_0 is : 1.0
INFO [uber-SubtaskRunner] org.apache.hadoop.mapreduce.Job: Job job_123213123123_35305 running in uber mode : false
INFO [uber-SubtaskRunner] org.apache.hadoop.mapreduce.Job: map 0% reduce 0%
INFO [uber-SubtaskRunner] org.apache.hadoop.mapreduce.Job: Task Id : attempt_123213123123_35305_m_000001_0, Status : FAILED
oozie stderr:
Error: java.lang.ClassNotFoundException: au.com.bytecode.opencsv.CSVParser
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Please suggest what I might be missing and what I can try.
The opencsv-2.3.jar library had been added via the Eclipse Build Path as an external jar. I had to run mvn clean and rebuild the project with Maven, and finally use the *-jar-with-dependencies.jar from the target folder, which fixed the issue.
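For reference, the *-jar-with-dependencies.jar artifact comes from the maven-assembly-plugin; a minimal pom.xml snippet that produces it looks roughly like this (a sketch, not necessarily the exact configuration used here):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <configuration>
    <descriptorRefs>
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
    </execution>
  </executions>
</plugin>

Running mvn clean package then leaves the *-jar-with-dependencies.jar in the target folder.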

Neo4j - reco : Engine FriendsComputingEngine wasn't found on the classpath

I am using neo4j-reco to pre-compute real-time recommendations.
I have a sample graph, and the .jar files have been placed into the plugins directory of the Neo4j installation as described in the readme file,
but I get the following error when restarting the server.
2015-12-01 15:38:35.769+0530 INFO Neo4j Server shutdown initiated by request
15:38:35.788 [Thread-12] INFO c.g.s.f.b.GraphAwareServerBootstrapper - stopped
2015-12-01 15:38:35.789+0530 INFO Successfully shutdown Neo4j Server
15:38:36.399 [Thread-12] INFO c.g.runtime.BaseGraphAwareRuntime - Shutting down GraphAware Runtime...
15:38:36.399 [Thread-12] INFO c.g.r.schedule.RotatingTaskScheduler - Terminating task scheduler...
15:38:36.399 [Thread-12] INFO c.g.r.schedule.RotatingTaskScheduler - Task scheduler terminated successfully.
15:38:36.399 [Thread-12] INFO c.g.runtime.BaseGraphAwareRuntime - GraphAware Runtime shut down.
2015-12-01 15:38:36.405+0530 INFO Successfully stopped database
2015-12-01 15:38:36.405+0530 INFO Successfully shutdown database
15:38:40.041 [main] INFO c.g.r.b.RuntimeKernelExtension - GraphAware Runtime enabled, bootstrapping...
15:38:40.069 [main] INFO c.g.r.b.RuntimeKernelExtension - Bootstrapping module with order 1, ID reco, using com.graphaware.reco.neo4j.module.RecommendationModuleBootstrapper
15:38:40.077 [main] INFO c.g.r.n.m.RecommendationModuleBootstrapper - Constructing new recommendation module with ID: reco
15:38:40.080 [main] INFO c.g.r.n.m.RecommendationModuleBootstrapper - Trying to instantiate class FriendsComputingEngine
15:38:40.089 [main] ERROR c.g.r.n.m.RecommendationModuleBootstrapper - Engine FriendsComputingEngine wasn't found on the classpath. Will not pre-compute recommendations
java.lang.ClassNotFoundException: FriendsComputingEngine
at java.net.URLClassLoader$1.run(URLClassLoader.java:366) ~[na:1.7.0_91]
at java.net.URLClassLoader$1.run(URLClassLoader.java:355) ~[na:1.7.0_91]
at java.security.AccessController.doPrivileged(Native Method) ~[na:1.7.0_91]
at java.net.URLClassLoader.findClass(URLClassLoader.java:354) ~[na:1.7.0_91]
at java.lang.ClassLoader.loadClass(ClassLoader.java:425) ~[na:1.7.0_91]
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) ~[na:1.7.0_91]
at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ~[na:1.7.0_91]
at java.lang.Class.forName0(Native Method) ~[na:1.7.0_91]
at java.lang.Class.forName(Class.java:195) ~[na:1.7.0_91]
How do I solve this?
You need to build a FriendsComputingEngine first if you're referring to it in your config. If you follow the steps in the readme file you mention, you will end up building one.
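Roughly, that amounts to something like the following (a sketch; the module name and jar path are hypothetical and come from the readme in practice):

mvn clean package                                        # builds the module that contains FriendsComputingEngine
cp target/your-reco-module-1.0.jar $NEO4J_HOME/plugins/  # jar name is hypothetical
# restart Neo4j so the plugin is picked up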

Hadoop jobs.setJar does not working for jars on HDFS

I am trying to solve an issue where a Hadoop app throws java.lang.ClassNotFoundException:
WARN mapreduce.FaunusCompiler: Using the distribution Faunus job jar: ../lib/faunus-0.4.4-hadoop2-job.jar
INFO mapreduce.FaunusCompiler: Compiled to 1 MapReduce job(s)
INFO mapreduce.FaunusCompiler: Executing job 1 out of 1: VerticesMap.Map > CountMapReduce.Map > CountMapReduce.Reduce
INFO mapreduce.FaunusCompiler: Job data location: output/job-0
INFO client.RMProxy: Connecting to ResourceManager at yuriys-bigdata3/172.31.8.161:8032
WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner
INFO input.FileInputFormat: Total input paths to process : 1
INFO mapreduce.JobSubmitter: number of splits:1
INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1402963354379_0016
INFO impl.YarnClientImpl: Submitted application application_1402963354379_0016
INFO mapreduce.Job: The url to track the job: http://local-bigdata3:8088/proxy/application_1402963354379_0016/
INFO mapreduce.Job: Running job: job_1402963354379_0016
INFO mapreduce.Job: Job job_1402963354379_0016 running in uber mode : false
INFO mapreduce.Job: map 0% reduce 0%
INFO mapreduce.Job: Task Id : attempt_1402963354379_0016_m_000000_0, Status : FAILED
Error: java.lang.ClassNotFoundException:
com.tinkerpop.blueprints.util.DefaultVertexQuery
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat.setConf(GraphSONInputFormat.java:39)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:726)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
The app does create a "fat" jar file, in which all the dependency jars (including the one that contains the missing class) are included under the lib node.
The app does call Job.setJar with this fat jar file.
The code does not do anything strange:
job.setJar(hadoopFileJar);
...
boolean success = job.waitForCompletion(true);
I also looked at the configuration in yarn-site.xml and verified that a job dir under yarn.nodemanager.local-dirs does contain that jar (renamed to job.jar) as well as a lib directory with the extracted jars in it,
i.e. the jar that contains the missing class is there. YARN/MR recreates this dir with all the required files each time a job is scheduled, so the files do get transferred there.
What I've discovered so far is that the classpath environment variable of the Java worker processes that execute the failing code is set to
C:\hdp\data\hadoop\local\usercache\user\appcache\application_1402963354379_0013\container_1402963354379_0013_02_000001\classpath-3824944728798396318.jar
and this jar contains just a MANIFEST.MF. That manifest contains the following paths to the directory with the "fat" jar file and its contents (unwrapped here for readability):
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.jar/job.jar
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.jar/classes/
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/jobSubmitDir/job.splitmetainfo
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/jobSubmitDir/job.split
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.xml
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.jar/
However, this classpath does not explicitly add the jars inside the directories. The directory from the above manifest,
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.jar/
does contain the jar with the class that YARN cannot find (this directory holds all the jars from the "fat" jar's lib section), but in the Java world this way of setting the classpath seems incorrect: the directory would need to be included with a wildcard,
e.g.:
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.jar/*
What am I doing wrong when passing dependencies to YARN?
Could the cluster configuration be an issue, or is this possibly a bug in my Hadoop distro (HDP 2.1, Windows x64)?

How do you specify additional jars for the mr job launched from hive jdbc queries?

I am trying to query a Hive table that uses the Avro storage format via JDBC, but I am getting a class-not-found error in the MR job spawned by the query. The strange thing is that I can run the query from the Hive shell without the exception occurring.
I can also run a query that does not spawn a MapReduce job (select * from table limit 10) and it works fine.
2014-03-12 10:23:34,040 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:344)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:291)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:405)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:560)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:168)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:409)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:330)
... 11 more
Caused by: java.lang.NoClassDefFoundError: org/apache/avro/mapred/FsInput
at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.<init>(AvroGenericRecordReader.java:82)
at org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat.getRecordReader(AvroContainerInputFormat.java:51)
at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
... 16 more
Caused by: java.lang.ClassNotFoundException: org.apache.avro.mapred.FsInput
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 19 more
Probably hive.aux.jars.path: the location of the plugin jars that contain implementations of user-defined functions and SerDes. The CLI can pick up a different config value than your JDBC HiveServer/HiveServer2. Try running set hive.aux.jars.path; in the two environments and compare the results. E.g., here Denny adds all the Avro jars to hive.aux.jars.path in hive-site.xml.
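If you go the hive-site.xml route, the property looks something like this (a sketch; substitute the path to your own Avro jar):

<property>
  <name>hive.aux.jars.path</name>
  <value>file:///hive-ext/avro-mapred-1.7.5-cdh5.0.0-beta-2-hadoop2.jar</value>
</property>

HiveServer2 has to be restarted for the change to take effect.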
The solution is to execute the following SQL statement for each new connection created:
add jar /hive-ext/avro-mapred-1.7.5-cdh5.0.0-beta-2-hadoop2.jar;
The path /hive-ext is local to the Hive server, i.e. on the same machine that is running the Hive server.
Be sure to change the statement to match the name of your avro-mapred jar file.

Hadoop "Spill Failed" Exception in an ec2 instance with 420GB of instance storage

I am using Hadoop 2.3.0 and have installed it as a single-node cluster (pseudo-distributed mode) on a CentOS 6.4 Amazon EC2 instance with 420 GB of instance storage and 7.5 GB of RAM. My understanding is that the "Spill Failed" exception only occurs when the node runs out of disk space; however, after running map/reduce tasks for only a short amount of time (nowhere near 420 GB of data) I get the following exception.
I would like to mention that I moved the Hadoop installation from an 8 GB EBS volume (where I had installed it originally) to the 420 GB instance store volume on the same node, changed the $HADOOP_HOME environment variable and other properties to point to the instance store volume accordingly, and Hadoop 2.3.0 is now completely contained on the 420 GB drive.
However, I still see the following exception. Can you please let me know if there is anything besides disk space that can cause the Spill Failed exception?
2014-02-28 15:35:07,630 ERROR [IPC Server handler 12 on 58189] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1393591821307_0013_m_000000_0 - exited :
java.io.IOException: Spill failed
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.checkSpillException(MapTask.java:1533)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1442)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for attempt_1393591821307_0013_m_000000_0_spill_26.out
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:402)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
at org.apache.hadoop.mapred.YarnOutputFiles.getSpillFileForWrite(YarnOutputFiles.java:159)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1564)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$900(MapTask.java:853)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1503)
2014-02-28 15:35:07,604 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:java.io.IOException: Spill failed
2014-02-28 15:35:07,605 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: Spill failed
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.checkSpillException(MapTask.java:1533)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1442)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for attempt_1393591821307_0013_m_000000_0_spill_26.out
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:402)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
at org.apache.hadoop.mapred.YarnOutputFiles.getSpillFileForWrite(YarnOutputFiles.java:159)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1564)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$900(MapTask.java:853)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1503)
I was able to solve this by setting the hadoop.tmp.dir value to a location on the instance storage; by default it was pointing to the EBS-backed root volume.
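In core-site.xml that is something like the following (a sketch; the instance-store mount point shown here is hypothetical):

<property>
  <name>hadoop.tmp.dir</name>
  <!-- point to a directory on the 420 GB instance store instead of the EBS-backed root volume -->
  <value>/mnt/instance-store/hadoop/tmp</value>
</property>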
