Apache spark - java.lang.NoClassDefFoundError - java

I have a Maven-based mixed Scala/Java application that can submit Spark jobs. My application jar, "myapp.jar", has some nested jars inside its lib folder, one of which is "common.jar". I have defined the Class-Path attribute in the manifest file as Class-Path: lib/common.jar. The Spark executor throws java.lang.NoClassDefFoundError: com/myapp/common/myclass when I submit the application in yarn-client mode. The class (com/myapp/common/myclass.class) and the jar (common.jar) are both there, nested inside my main myapp.jar. The fat jar is created using the spring-boot-maven-plugin, which nests the other jars inside the lib folder of the parent jar. I prefer not to create a shaded flat jar, as that would create other issues. Is there any way the Spark executor JVM can load nested jars here?
EDIT: Spark (the JVM classloader) can find all the classes that sit flat inside myapp.jar itself, i.e. com/myapp/abc.class, com/myapp/xyz.class, etc.
EDIT2: The Spark executor classloader can also find some classes from the nested jar, but it throws NoClassDefFoundError for other classes in the same nested jar!
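A likely reason the nested jar is not picked up: under the JAR file specification, each manifest Class-Path entry is a relative URL resolved against the location of the jar that declares it. That means lib/common.jar names a file next to myapp.jar on disk, not an entry inside it, and a standard URLClassLoader does not read jars nested inside another jar. The sketch below resolves Class-Path entries the same way and reports whether each target exists on disk. The name ManifestClassPath is hypothetical, and the sketch assumes the executor uses a standard URLClassLoader.

```java
import java.io.File;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper: shows where the JVM would look for each manifest Class-Path entry.
class ManifestClassPath {

    // Resolves every Class-Path entry relative to the jar's own location on disk,
    // which is how the JAR specification defines these relative URLs.
    // Assumes relative file entries (an absolute http: entry would not map to a File).
    static List<File> resolve(File jar) throws IOException {
        List<File> result = new ArrayList<>();
        try (JarFile jf = new JarFile(jar)) {
            Manifest mf = jf.getManifest();
            if (mf == null) {
                return result; // no manifest, so no Class-Path
            }
            String cp = mf.getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
            if (cp == null) {
                return result;
            }
            URI base = jar.getAbsoluteFile().toURI();
            for (String entry : cp.trim().split("\\s+")) {
                if (entry.isEmpty()) {
                    continue;
                }
                // "lib/common.jar" becomes <dir of myapp.jar>/lib/common.jar:
                // a file beside the jar, never an entry nested inside it.
                result.add(new File(base.resolve(entry)));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        for (File f : resolve(new File(args[0]))) {
            System.out.println(f + (f.exists() ? "  (exists)" : "  (missing on disk)"));
        }
    }
}
```

Run as java ManifestClassPath myapp.jar against the manifest described above, it prints a path ending in lib/common.jar in the same directory as myapp.jar. That entry is marked missing unless such a file has been extracted there.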
Here's the error:
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, host4.local): java.lang.NoClassDefFoundError: com/myapp/common/myclass
at com.myapp.UserProfileRDD$.parse(UserProfileRDDInit.scala:111)
at com.myapp.UserProfileRDDInit$$anonfun$generateUserProfileRDD$1.apply(UserProfileRDDInit.scala:87)
at com.myapp.UserProfileRDDInit$$anonfun$generateUserProfileRDD$1.apply(UserProfileRDDInit.scala:87)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:249)
at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:172)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:79)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.ClassNotFoundException:
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 14 more
I do submit myapp.jar with sparkConf.setJars(new String[] {"myapp.jar"}), and I also tried setting it on spark.yarn.executor.extraClassPath.
As a workaround, I extracted myapp.jar and manually set sparkConf.setJars(new String[] {"myapp.jar", "lib/common.jar"}). The error went away, but I would have to do that for every nested jar, which is not desirable.
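The manual workaround above can be automated. Below is a minimal sketch that assumes the nested jars sit under lib/ as described. Newer Spring Boot layouts use BOOT-INF/lib/ instead, so the prefix is a parameter. NestedJarExtractor and extractNestedJars are hypothetical names. The method copies each nested jar to a directory on disk and returns the resulting paths.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: unpacks nested jars (e.g. lib/common.jar) from a fat jar.
class NestedJarExtractor {

    // Copies every "<prefix>*.jar" entry of fatJar into targetDir and
    // returns the absolute paths of the extracted files.
    static List<String> extractNestedJars(File fatJar, String prefix, Path targetDir) throws IOException {
        List<String> extracted = new ArrayList<>();
        Files.createDirectories(targetDir);
        try (JarFile jar = new JarFile(fatJar)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                String name = entry.getName();
                if (entry.isDirectory() || !name.startsWith(prefix) || !name.endsWith(".jar")) {
                    continue; // only nested jars under the prefix
                }
                // Keep just the file name so an entry cannot escape targetDir.
                Path out = targetDir.resolve(name.substring(name.lastIndexOf('/') + 1));
                try (InputStream in = jar.getInputStream(entry)) {
                    Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
                }
                extracted.add(out.toAbsolutePath().toString());
            }
        }
        return extracted;
    }
}
```

The returned paths, together with "myapp.jar", can be turned into a String[] and passed to sparkConf.setJars(...). The extraction must run on the driver machine before the SparkContext is created from that conf.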

You can use the --jars option to give a comma-separated list of jars when starting the Spark application.
Something like:
spark-submit --jars lib/abc.jar,lib/xyz.jar --class <CLASSNAME> myapp.jar
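Listing every jar by hand gets tedious when there are many. The hypothetical helper below builds the comma-separated value that --jars expects, assuming the nested jars have already been extracted into a local lib directory. It sorts the jars by name so the result is deterministic.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: builds "lib/a.jar,lib/b.jar,..." for spark-submit --jars.
class JarsArgument {

    // Every regular *.jar file directly inside dir, sorted, joined with commas.
    static String commaSeparatedJars(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files
                    .filter(p -> Files.isRegularFile(p) && p.getFileName().toString().endsWith(".jar"))
                    .sorted()
                    .map(Path::toString)
                    .collect(Collectors.joining(","));
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(commaSeparatedJars(Paths.get(args[0])));
    }
}
```

After compiling it, the output can be substituted into the command, e.g. spark-submit --jars "$(java JarsArgument lib)" --class <CLASSNAME> myapp.jar.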


Split Spring Boot fat jar to two jars (app / libs)

To optimise Docker layers, I am trying to split our 30M Spring Boot fat jar into the 2M app.jar and 28M libs.jar.
I can use exploded mode, but I prefer using 2 jars because it simplifies a few things, such as deployments, scripts, etc. In particular, the fat jar is more easily and intuitively executed with java -jar, as opposed to the more cumbersome java org.springframework.boot.loader....Launcher.
My problem is that the moment I separate the libs out, I can't get the Launcher to find them. In either jar or exploded mode (with two directories), I keep getting:
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:53)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoClassDefFoundError: com/odoro/common/api/ServiceType
at com.odoro.sync.service.Application.main(Application.java:14)
... 6 more
Caused by: java.lang.ClassNotFoundException: com.odoro.common.api.ServiceType
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at org.springframework.boot.loader.LaunchedURLClassLoader.doLoadClass(LaunchedURLClassLoader.java:178)
at org.springframework.boot.loader.LaunchedURLClassLoader.loadClass(LaunchedURLClassLoader.java:142)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
I get this in all the following cases:
# java -jar app.jar -cp ../lib.jar
# java -cp .:../lib org.springframework.boot.loader.JarLauncher
# java -Dloader.path=../lib org.springframework.boot.loader.PropertiesLauncher
Any idea how I can get this to work?
It seems that the Spring Boot thin launcher is what you're looking for.

java.lang.ClassNotFoundException: com.github.lwhite1.tablesaw.api.Table error while running Spark-submit using Eclipse

I am running a Java archive on Spark using spark-submit on Ubuntu. The command is given below. This JAR file was built with Maven's package goal. The dependencies are specified in the pom.xml file.
$ spark-submit --class HighScore.Driver --master local[*] JarfilePath/Levelwise_PCFS-0.0.1-SNAPSHOT.jar InputFilePath/K9_Site1.csv 1000
I am getting the following error even though packageName.className (HighScore.Driver) is specified in the command.
Here is the error message:
Exception in thread "main" java.lang.NoClassDefFoundError: com/github/lwhite1/tablesaw/api/Table
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:727)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.github.lwhite1.tablesaw.api.Table
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 10 more
The dependency that provides com/github/lwhite1/tablesaw/api/Table is also specified in the pom.xml file, but the exception is still thrown.
Can someone help me fix this error?
Remember that the classloader (specifically java.net.URLClassLoader) will look for classes in package a.b.c in folder a/b/c/ in each entry in your classpath.
NoClassDefFoundError can also indicate that you're missing a transitive dependency of a .jar file that you've compiled against and you're trying to use.
For example, if you had a class com.example.Foo, after compiling you would have a class file Foo.class.
Say for example your working directory is .../project/. That class file must be placed in .../project/com/example, and you would set your classpath to .../project/.
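The mapping described above, from a fully qualified (binary) class name to the file the classloader looks for, can be sketched as follows. ClassFileLocator is a hypothetical name. Nested classes keep their $ (e.g. Outer$Inner.class), which the simple dot-to-slash replacement preserves.

```java
import java.io.File;

// Hypothetical helper illustrating how URLClassLoader maps a class name to a path.
class ClassFileLocator {

    // "com.example.Foo" -> "com/example/Foo.class" (the path inside a jar or directory).
    static String toResourcePath(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Where the .class file must be when classpathRoot is a directory on the classpath.
    static File expectedLocation(File classpathRoot, String className) {
        return new File(classpathRoot, toResourcePath(className).replace('/', File.separatorChar));
    }

    public static void main(String[] args) {
        File f = expectedLocation(new File(args[0]), args[1]);
        System.out.println(f + (f.exists() ? " exists" : " is missing"));
    }
}
```

For example, java ClassFileLocator .../project com.example.Foo checks for .../project/com/example/Foo.class, matching the example above.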
Please refer to this post for more details.

Hbase example, Exception in thread "main" java.lang.NoClassDefFoundError

We are trying to run a basic HBase example on the Hortonworks Sandbox (2.3).
hadoop jar /usr/hdp/ org.apache.hadoop.hbase.mapreduce.IndexBuilder
We get the exception below when running this program.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/util/Bytes
at org.apache.hadoop.hbase.mapreduce.IndexBuilder.<clinit>(IndexBuilder.java:67)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:278)
at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.util.Bytes
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 5 more
Based on this error, we tried to set the Hadoop classpath in hbase-env.sh, but we still get the same error.
Instead of manually adding jars to the classpath, you can use the command below.
$(hbase classpath) prints the classpath that HBase itself uses, so it picks up the required jars from the sandbox's HBase and Hadoop installations.
HADOOP_CLASSPATH=$(hbase classpath):/usr/hdp/ hadoop jar /usr/hdp/ org.apache.hadoop.hbase.mapreduce.IndexBuilder
When I face a NoClassDefFoundError with MapReduce, I resolve it by adding the jar through one of its classes when building the Job:
Job job = new Job(conf);
Supply jars to your job using the -libjars parameter:
hadoop jar /usr/hdp/ org.apache.hadoop.hbase.mapreduce.IndexBuilder -libjars ${LIB}
You can also add the jar to the HADOOP_CLASSPATH variable before launching the job.
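Whichever approach is used, it can help to check what the driver JVM actually sees on its classpath. The minimal sketch below uses ClassProbe, a hypothetical name. It calls Class.forName with initialize set to false, so a class is looked up without running its static initializer. That matters here because the trace above failed inside IndexBuilder's static initializer (<clinit>). Probing org.apache.hadoop.hbase.util.Bytes directly is therefore more telling than probing IndexBuilder.

```java
// Hypothetical helper: reports whether class names resolve on the current classpath.
class ClassProbe {

    // True if the loader can load the class; static initializers are NOT run.
    static boolean isLoadable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false; // not on the classpath at all
        } catch (LinkageError e) {
            // e.g. NoClassDefFoundError for a missing superclass, or
            // UnsupportedClassVersionError: the class file exists but cannot be loaded.
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader cl = ClassProbe.class.getClassLoader();
        for (String name : args) {
            System.out.println(name + " -> " + (isLoadable(name, cl) ? "loadable" : "NOT loadable"));
        }
    }
}
```

Running it with the same classpath as the job, e.g. java -cp "$(hbase classpath):." ClassProbe org.apache.hadoop.hbase.util.Bytes, shows whether that classpath is sufficient.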
Is all the latest code included in the jar? Use a Java decompiler such as jd-gui to look inside the jar file and make sure the class you are referencing is actually there. Also check that the necessary import statements are present in the Java class.
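If jd-gui is not at hand, the same check can be scripted. The sketch below uses JarClassCheck, a hypothetical name. It opens a jar with java.util.jar.JarFile and looks up the entry for a class, e.g. org/apache/hadoop/hbase/util/Bytes.class for org.apache.hadoop.hbase.util.Bytes.

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper: checks whether a jar contains a given class.
class JarClassCheck {

    // True if the jar has an entry for the binary class name (dots -> slashes, + ".class").
    static boolean containsClass(File jarFile, String className) throws IOException {
        String entryName = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarFile)) {
            return jar.getJarEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(containsClass(new File(args[0]), args[1]));
    }
}
```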

java.lang.ClassNotFoundException when trying to run camus

I downloaded the Confluent package, which includes the Camus jars, and followed the instructions online.
Hadoop is properly set up (meaning I can use hadoop fs -ls and other hadoop jar commands). However, when I tried to run
hadoop jar confluent-camus-1.0.jar com.linkedin.camus.etl.kafka.CamusJob
I got a ClassNotFoundException in thread "main":
Exception in thread "main" java.lang.ClassNotFoundException: com.linkedin.camus.
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:344)
at org.apache.hadoop.util.RunJar.main(RunJar.java:205)
The path to "confluent-camus-1.0.jar" is correct (it is right under the folder). I didn't start the Kafka service; I just tried to run the job.
Has anyone had a similar problem?
You should try to inspect your jar file:
jar tvf confluent-camus-1.0.jar | grep com.linkedin.camus.etl.kafka.CamusJob
If you do not find this class, look for it in the other jars generated by Camus.
Then add the jar that contains it with:
hadoop jar confluent-camus-1.0.jar com.linkedin.camus.etl.kafka.CamusJob -libjars {JAR_NAME}
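When it is not clear which of several jars holds the class, a small search over a directory tree helps. The sketch below assumes the Camus build output lives under one root directory, and FindClassInJars is a hypothetical name. Files that end in .jar but are not readable zip archives are skipped instead of aborting the search.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: finds which jars under a directory contain a given class.
class FindClassInJars {

    // Walks root recursively and returns every *.jar that has an entry for className.
    static List<Path> jarsContaining(Path root, String className) throws IOException {
        String entryName = className.replace('.', '/') + ".class";
        List<Path> jars;
        try (Stream<Path> walk = Files.walk(root)) {
            jars = walk.filter(p -> Files.isRegularFile(p) && p.toString().endsWith(".jar"))
                       .sorted()
                       .collect(Collectors.toList());
        }
        List<Path> hits = new ArrayList<>();
        for (Path jar : jars) {
            try (JarFile jf = new JarFile(jar.toFile())) {
                if (jf.getJarEntry(entryName) != null) {
                    hits.add(jar);
                }
            } catch (IOException e) {
                // Not a readable jar/zip file: skip it and keep searching.
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        for (Path p : jarsContaining(Paths.get(args[0]), args[1])) {
            System.out.println(p);
        }
    }
}
```

For example, java FindClassInJars <camus-build-dir> com.linkedin.camus.etl.kafka.CamusJob, where <camus-build-dir> is a placeholder for wherever the Camus jars were generated.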

Hadoop external jars

I am trying to run a Hadoop job on a server. The Hadoop version is 0.20.2.
I have a large number of jars. I am running:
hadoop jar GenData.jar -libjars /path/jar1,path/jar2,...
I am getting the error below even though the corresponding classes are inside the jars:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/avro/mapreduce/AvroKeyInputFormat
at GenerateTrainningData.main(GenerateTrainningData.java:256)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Caused by: org.apache.avro.mapreduce.AvroKeyInputFormat
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
It looks like you are getting this exception on the Hadoop client side: MapReduce driver code runs in the client JVM. In Hadoop, -libjars is a generic option used to add dependent jars to the mappers and reducers. To add jars to the client's classpath, you can set the following environment variable before executing the hadoop command:
export HADOOP_CLASSPATH=<PATH_to_jar>/Jar1:<PATH_to_jar>/Jar2;
(Use a colon ":" to separate multiple jars. In your case, add the jar that contains the class org.apache.avro.mapreduce.AvroKeyInputFormat.)
EDIT:
First of all, you need to find the jar containing the class org.apache.avro.mapreduce.AvroKeyInputFormat. The class is in avro-mapred*.jar (get a version of avro-mapred-<version>.jar that is compatible with your setup). Include that jar in your classpath using the command above.
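Finding the avro-mapred*.jar file and building the export line can also be scripted. The minimal sketch below uses HadoopClasspathBuilder, a hypothetical name. It matches jars in a single directory against a glob such as avro-mapred*.jar and joins them with File.pathSeparator, which is ":" on Linux, as HADOOP_CLASSPATH uses.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds a HADOOP_CLASSPATH value from jars matching a glob.
class HadoopClasspathBuilder {

    // Absolute paths of files directly inside dir whose name matches the glob, sorted.
    static List<String> matchingJars(Path dir, String glob) throws IOException {
        List<String> jars = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                jars.add(p.toAbsolutePath().toString());
            }
        }
        jars.sort(null); // natural (alphabetical) order
        return jars;
    }

    // Joins entries with the platform path separator (":" on Linux).
    static String join(List<String> entries) {
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) throws IOException {
        // e.g. java HadoopClasspathBuilder /path/to/jars 'avro-mapred*.jar'
        System.out.println("export HADOOP_CLASSPATH=" + join(matchingJars(Paths.get(args[0]), args[1])));
    }
}
```

The printed line replaces any existing HADOOP_CLASSPATH. Append :$HADOOP_CLASSPATH by hand if other entries must be kept.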
You are missing the avro-mapred dependency.

