Apache Spark -- using spark-submit throws a NoSuchMethodError - java

To submit a Spark application to a cluster, the Spark documentation notes:
To do this, create an assembly jar (or “uber” jar) containing your code and its dependencies. Both sbt and Maven have assembly plugins. When creating assembly jars, list Spark and Hadoop as provided dependencies; these need not be bundled since they are provided by the cluster manager at runtime. -- http://spark.apache.org/docs/latest/submitting-applications.html
So, I added the Apache Maven Shade Plugin to my pom.xml file. (version 3.0.0)
And I turned my Spark dependency's scope into provided. (version 2.1.0)
(I also added the Apache Maven Assembly Plugin to ensure I was wrapping all of my dependencies in the jar when I run mvn clean package. I'm unsure if it's truly necessary.)
This is how spark-submit fails. It throws a NoSuchMethodError for one of my dependencies (note that the code works in a local instance when compiled inside IntelliJ, as long as provided is removed).
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.createStarted()Lcom/google/common/base/Stopwatch;
The line of code that throws the error is irrelevant: it's simply the first line in my main method that creates a Stopwatch, part of the Google Guava utilities. (version 21.0)
Other solutions online suggest that it has to do with version conflicts of Guava, but I haven't had any luck yet with those suggestions. Any help would be appreciated, thank you.

If you take a look at the /jars subdirectory of the Spark 2.1.0 installation, you will likely see guava-14.0.1.jar. Per the API for the Guava Stopwatch#createStarted method you are using, createStarted did not exist until Guava 15.0. What is most likely happening is that the Spark process Classloader is finding the Spark-provided Guava 14.0.1 library before it finds the Guava 21.0 library packaged in your uberjar.
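A quick way to confirm which Guava version your installation bundles (assuming SPARK_HOME points at it) is:
ls "$SPARK_HOME"/jars | grep -i guava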
One possible resolution is to use the class-relocation feature provided by the Maven Shade plugin (which you're already using to construct your uberjar). Via "class relocation", Maven-Shade moves the Guava 21.0 classes (needed by your code) during the packaging of the uberjar from a pattern location reflecting their existing package name (e.g. com.google.common.base) to an arbitrary shadedPattern location, which you specify in the Shade configuration (e.g. myguava123.com.google.common.base).
The result is that the older and newer Guava libraries no longer share a package name, avoiding the runtime conflict.

Most likely you're having a dependency conflict, yes.
First, check whether you have a dependency conflict when you build your jar. A quick way is to look inside the jar directly to see whether the Stopwatch.class file is there and, by inspecting the bytecode, whether the createStarted method is present.
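For example, assuming the uberjar ends up at target/myapp.jar (the name here is hypothetical), javap can read the class straight out of the jar and list its methods:
javap -cp target/myapp.jar com.google.common.base.Stopwatch | grep createStarted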
Otherwise you can also list the dependency tree and work from there: https://maven.apache.org/plugins/maven-dependency-plugin/examples/resolving-conflicts-using-the-dependency-tree.html
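For instance, the output can be narrowed to Guava with the plugin's includes filter:
mvn dependency:tree -Dincludes=com.google.guava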
If it's not an issue with your jar, you might have a dependency issue due to a conflict between your spark installation and your jar.
Look in the lib and jars folders of your Spark installation. There you can see if you have jars that include an alternate version of Guava that wouldn't support the createStarted() method from Stopwatch.

Applying the answers above, I solved the problem with the following config:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.1.0</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>shade.com.google.common</shadedPattern>
          </relocation>
          <relocation>
            <pattern>com.google.thirdparty.publicsuffix</pattern>
            <shadedPattern>shade.com.google.thirdparty.publicsuffix</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
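After rebuilding, you can verify the relocation took effect by listing the jar's contents (the jar name below is hypothetical):
jar -tf target/myapp-1.0.jar | grep shade/com/google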

Related

My Kotlin fat JAR is failing to load a dependency correctly but only when it's run under FreeBSD, what can cause this?

This is a really weird one. I have a Kotlin web service that was originally written as a hybrid app of both Kotlin and Java, but I've recently migrated it to pure Kotlin (although many of its libraries are still in Java). The framework I'm using is sparkjava, and I'm using Maven to manage dependencies and packaging. In the past the service was built with manually included dependencies as JAR files using an IntelliJ configuration; this was horribly messy and difficult to reproduce, so I moved all the dependencies into Maven and set up a process for this. This is where things get weird:
I included this plugin in my pom.xml to manage the creation of the fat JAR which looks like this:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <version>3.1.1</version>
  <configuration>
    <archive>
      <manifest>
        <mainClass>unifessd.MainKt</mainClass>
      </manifest>
    </archive>
    <descriptorRefs>
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
  </configuration>
  <executions>
    <execution>
      <id>make-assembly</id>
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
    </execution>
  </executions>
</plugin>
When I run this configuration, however, I get a JAR that won't execute. I didn't think this was a major problem, as running the "package" lifecycle in Maven does produce an executable JAR. The resultant JAR will happily run on my development machine (macOS Big Sur) and passes all my external testing scripts. However, when I deploy the very same JAR to my production environment, which is a FreeBSD server on AWS, it starts up correctly, but whenever I make a request I get the following error:
[qtp248514407-20] WARN org.eclipse.jetty.server.HttpChannel -
//<redacted.com>/moderation/users/administrators
java.lang.NoClassDefFoundError: Could not initialize class
de.mkammerer.argon2.jna.Argon2Library
at de.mkammerer.argon2.BaseArgon2.hashBytes(BaseArgon2.java:267)
at de.mkammerer.argon2.BaseArgon2.hashBytes(BaseArgon2.java:259)
at de.mkammerer.argon2.BaseArgon2.hash(BaseArgon2.java:66)
at de.mkammerer.argon2.BaseArgon2.hash(BaseArgon2.java:49)
at [...]
I've truncated the stack trace to keep things concise, but all it's doing before that is opening the appropriate DAO and hashing the password attempt. The offending package is of course de.mkammerer.argon2, a dependency I use to hash passwords with the Argon2 algorithm. This has me really stumped for the following reasons:
When this dependency was linked in manually using a JAR in IntelliJ, it worked absolutely fine in production.
Even though the class fails to load in production, it works fine locally despite the packages being identical.
macOS and FreeBSD aren't exactly a million miles apart in terms of how they're put together, so why are they behaving so differently?
A few other points in my efforts to debug this:
I've tried linking in my argon2 library in the old way, and it's still failing in the same fashion.
IntelliJ isn't recognising the main class of my Kotlin app any more if I try to create an artifact without Maven. This is really weird: I can set up a Kotlin build and run configuration just fine by specifying unifessd.MainKt as my main class, but when it comes to building an artifact it's simply not having it. It doesn't appear in the artifact creation dialogue, and when I specify it as my Main-Class in MANIFEST.MF, IntelliJ tells me it's an invalid main class. What on Earth is going on here? It'll run just fine when I tell Maven that's my main class and package it in a JAR, even in the faulty production environment.
Robert and dan1st were correct: the problem was that my argon2 library had a dependency on JNA and native code that was incompatible with FreeBSD. I tested the JAR on an Ubuntu server to confirm that this was the case, and the program ran correctly.

How to automate discovering dependency conflict

The Java application I develop should run on servers I have no direct access to. Sometimes dependency conflicts arise: on some servers the app works perfectly, while on others the same application fails, and the errors indicate a library version conflict. I would like the application to report the library version conflict rather than just crash with NoSuchFieldError, NoSuchMethodError, NoClassDefFoundError, etc.
I can obtain the list of libraries on the build platform with mvn dependency:tree. So I need the application to read the library versions on the platform where it runs, compare them with the library list from the build platform, and report any version mismatches. How can the application determine its libraries at runtime? Or is there a more convenient way to automate dependency-conflict discovery?
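For the runtime side, the standard JDK can at least report which jar a given class was actually loaded from and, if the jar's manifest carries an Implementation-Version, its version. A minimal sketch (using Guava's Stopwatch purely as an example; any class from the dependency in question works):

import com.google.common.base.Stopwatch;

public class LibraryProbe {
    public static void main(String[] args) {
        Class<?> clazz = Stopwatch.class;
        // The jar (or directory) this class was loaded from;
        // getCodeSource() can be null for boot-classpath classes.
        System.out.println(clazz.getProtectionDomain().getCodeSource().getLocation());
        // Implementation-Version from the jar manifest, if present.
        System.out.println(clazz.getPackage().getImplementationVersion());
    }
}

Comparing values like these against the mvn dependency:tree output from the build machine gives a rough runtime check, though as the answer below argues, catching the conflict at build time is more reliable.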
I believe you need to resolve this kind of problem before deployment; you cannot rely on a runtime tool to save you, so I would suggest catching it at a much earlier stage. If you use Maven to manage dependencies, you can try the following:
<plugins>
  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-enforcer-plugin</artifactId>
    <version>1.4.1</version>
    <configuration>
      <rules>
        <dependencyConvergence/>
      </rules>
    </configuration>
  </plugin>
</plugins>
Then
mvn enforcer:enforce
You can read this good material for further background.

Strange Maven nullpointer

I have modified this question since initially asking it. Please refer to sections UPDATE 1 and, specifically, UPDATE 2 at the end.
I am building a large JavaFX application with a lot of dependencies. I am using IntelliJ to do the job and am at the point of deployment. I am using IntelliJ's own tutorial on artifacts to build an executable jar. The tutorial can be seen in the "Working with artifacts" tutorial on jetbrains.
I have built my executable jar and it is working as it should, with one caveat, however:
I have to manually mimic the directory structure of my IntelliJ project for the executable jar file to find the resources necessary for the program to function properly.
This is where my question comes in: shouldn't IntelliJ include these files in the artifact, so it can run in and on its own?
My directory structure in IntelliJ looks like this:
Project root
  .idea
  out
  src
    main
      java
        com
          myCompany
            package-with-classes1
              class1 ... N
            package-with-classes2
              class1 ... N
            package-with-files
              file1.someExtension
              file2.someExtension
            other-package-classes
            and-so-on
When I build the artifact under Project Structure - Artifacts - Output Layout, I then manually add the directory structure as can be seen above, and then place the files where they belong.
As per my question above, I would expect these files to be automatically included with the executable jar file.
UPDATE 1: Added Maven to project
Due to Andrey's comment I have now implemented Maven in my project. I have a lot of external dependencies which I have added to my pom.xml file like so:
<dependency>
  <groupId>some.group.id</groupId>
  <artifactId>some-artifact-id</artifactId>
  <scope>system</scope>
  <version>1.0.0</version>
  <systemPath>${basedir}\path\to\jar\jarfile.jar</systemPath>
</dependency>
I then do:
mvn clean
mvn compile
mvn package
All runs with no errors.
It places 2 jar files in my \target folder: (1) name-of-jar.jar and (2) name-of-jar-with-dependencies.jar.
Running (1) throws the error: no main manifest attribute. Running (2) throws ClassNotFoundException and NoClassDefFoundError errors. Why is this? The classes throwing the errors are included as dependencies using the above approach.
UPDATE 2: Progress with Maven, but...
I solved the issue in section UPDATE 1 by installing all my third-party jar libraries into my local machine's Maven repository at C:\Users\$USER$\.m2\repository. However, I am now getting a null pointer...
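For reference, jars are typically installed there with the maven-install-plugin's install-file goal; all coordinates below are placeholders:
mvn install:install-file -Dfile=path\to\jarfile.jar -DgroupId=some.group.id -DartifactId=some-artifact-id -Dversion=1.0.0 -Dpackaging=jar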
I changed my dependency declarations in my pom.xml to the following:
<dependency>
  <groupId>some.group.id</groupId>
  <artifactId>some-artifact-id</artifactId>
  <version>some.version.number</version>
</dependency>
I am currently building my fat jar using the maven assembly plugin (I have also tried using the shade plugin but am having the same issue). Here's the excerpt of the assembly plugin from my pom.xml:
<plugin>
  <artifactId>maven-assembly-plugin</artifactId>
  <configuration>
    <archive>
      <manifest>
        <mainClass>com.myCompany.myMainClass</mainClass>
      </manifest>
    </archive>
    <descriptorRefs>
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
  </configuration>
  <executions>
    <execution>
      <id>make-assembly</id>
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
    </execution>
  </executions>
</plugin>
This produces the same two jar files in the \target directory as described in section UPDATE 1.
Now, if I run jar -tf name-of-jar-with-dependencies.jar, I can see from the directory contents that this jar does in fact contain all the third-party libraries that were missing, and running it with java -jar name-of-jar-with-dependencies.jar no longer throws the errors described in section UPDATE 1. So far, so good.
However, it does throw a NullPointerException, which puzzles me. Specifically, it is complaining that a certain class is missing. This seems a little strange to me, since this class is part of a third-party jar library which I did add as a dependency in my pom.xml. That the class is indeed included in the final jar was confirmed by the approach above: printing out the contents of name-of-jar-with-dependencies.jar, which, among a lot of other files, includes this very jar file.
Any thoughts?

Deploying a library to Maven repo when it depends on non-Maven libraries

I run an open source library and am considering having it fully embrace Maven and uploading it to a central repository so that people can easily add it to their projects.
The problem is that it depends on a couple of older libraries that do not exist on any Maven repos. Currently, that means a pom file has to use the system scope of the dependency. I've also read about creating a local repository for the project to install the 3rd party libraries.
However, my impression is that neither of these approaches will work well when I deploy my library to a Maven repository. That is, if it depends on external "system" or local repositories, then when someone adds my library to their pom file, they're not actually done. They also have to download the 3rd party libraries and manually install them or add them to their own local repository.
What I'd like to have happen is for these couple of 3rd party libraries to simply be included in the jar file that I deploy to the central repository. That way, if someone adds my library to their pom file, they really are done and don't have to worry about getting and installing the 3rd party libraries. Is it possible to do that?
First off, I'll start by saying that you should back away as far as possible from the system scope. You can refer to this answer for more information.
A way to circumvent your problem is indeed to include in the deployed JAR all the libraries that aren't present in Maven Central. (Let's say you have installed those libraries in your local repository.) You can do that with the maven-shade-plugin, which is a plugin used to create uber jars. The attribute artifactSet controls what will be included in the shaded JAR. In the following snippet, only the two dependencies groupId:artifactId and groupId2:artifactId2 will be included.
<plugin>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.4.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <artifactSet>
          <includes>
            <include>groupId:artifactId</include>
            <include>groupId2:artifactId2</include>
          </includes>
        </artifactSet>
      </configuration>
    </execution>
  </executions>
</plugin>
By default, the shaded JAR will replace your main artifact and will be the JAR that gets deployed. The deployed POM will not contain the dependency entries for the artifacts that were included in the shaded JAR; as such, a client depending on your deployed artifact won't transitively depend on them.

do not pack specified resources in jar

I have a Maven 2 project consisting of a root project (with pom packaging) and a set of modules with dependencies on each other. The project is a library with a set of apps built on top of it. Now the problem.
My library uses some resources which cannot be packed into the jar, namely some SQLite databases, and I can't find a way to put them next to the jar instead of inside it, and to bundle the library this way for dependent applications.
Thanks. Any ideas?
Creating a custom assembly to distribute the project as an archive (e.g. a zip or tar.gz) is clearly the way to go here.
To customize the way the Assembly Plugin creates your assemblies, you'll need to provide your custom descriptor (this gives you all the flexibility you need). Then, to build the assembly as part of the build, all you have to do is to bind the single or single-directory mojos into the default build lifecycle as explained in the Configuration and Usage of the plugin's documentation.
Another great resource is Sonatype's book, which has an entire chapter dedicated to assemblies: see Chapter 14, Maven Assemblies.
Sounds like you could use the maven assembly plugin to create a distribution file of your choice (zip, jar, tar...) which would include the extra resources.
Here is the important fact from Maven: The Complete Reference's Assemblies chapter, Section 8.3.2:
When you generate assemblies as part of your normal build process, those assembly archives will be attached to your main project's artifact. This means they will be installed and deployed alongside the main artifact, and are then resolvable in much the same way. Each assembly artifact is given the same basic coordinates (groupId, artifactId, and version) as the main project. However, these artifacts are attachments, which in Maven means they are derivative works based on some aspect of the main project build. To provide a couple of examples, source assemblies contain the raw inputs for the project build, and jar-with-dependencies assemblies contain the project's classes plus its dependencies. Attached artifacts are allowed to circumvent the Maven requirement of one project, one artifact precisely because of this derivative quality.
Since assemblies are (normally) attached artifacts, each must have a classifier to distinguish it from the main artifact, in addition to the normal artifact coordinates. By default, the classifier is the same as the assembly descriptor's identifier. When using the built-in assembly descriptors, as above, the assembly descriptor's identifier is generally also the same as the identifier used in the descriptorRef for that type of assembly.
It is important to understand that, while most Maven projects only generate a single artifact, it is possible to generate more than one and use the classifier coordinate to associate these artifacts with the same GAV coordinates. In your case, you'll want to attach the assembly plugin's "single" goal using something similar to this:
<plugin>
  <artifactId>maven-assembly-plugin</artifactId>
  <version>2.2-beta-2</version>
  <dependencies>
    <dependency>
      <groupId>org.sonatype.mavenbook.assemblies</groupId>
      <artifactId>web-fragment-descriptor</artifactId>
      <version>1.0-SNAPSHOT</version>
    </dependency>
  </dependencies>
  <executions>
    <execution>
      <id>assemble</id>
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
      <configuration>
        <descriptorRefs>
          <descriptorRef>web-fragment</descriptorRef>
        </descriptorRefs>
      </configuration>
    </execution>
  </executions>
</plugin>
You can attach as many of these executions as you wish, but once you have more than one execution for a particular plugin, each execution requires a unique "id" element. The "single" goal in the Maven Assembly plugin does the same thing as the "assembly" goal, except that it was designed to be bound to the lifecycle.
The other part of your question is about excluding specific resources from a JAR; you can accomplish this by excluding resources in your POM.
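For example (the .sqlite pattern is just an illustration for the databases mentioned in the question), resources can be filtered out of the jar in the build section of the POM:

<build>
  <resources>
    <resource>
      <directory>src/main/resources</directory>
      <excludes>
        <!-- keep the database files out of the packaged jar -->
        <exclude>**/*.sqlite</exclude>
      </excludes>
    </resource>
  </resources>
</build>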
