I'm a beginner at using Apache Spark in my Java project. I'm using Spark 3.3, and the jars are downloaded from the Maven repository. A simple snippet like the following throws an error, and I'm very confused:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class Main {
    public static void main(String[] args) {
        String in_path = "./test.csv";
        String out_path = "./out.csv";
        SparkSession spark = SparkSession.builder()
                .appName("CSV to Dataset")
                .master("local")
                .getOrCreate();
        Dataset<Row> df = spark.read().format("csv")
                .option("header", "true")
                .load(in_path);
        // write
        Dataset<Row> outputDf = df
                .filter("confidence_level = 'high'")
                .repartition(1);
        outputDf
                .write()
                .format("csv")
                .option("header", true)
                .mode(SaveMode.Overwrite)
                .save(out_path);
    }
}
One can make a .csv file to reproduce this error:
,step,value
0,0,0.48335474743993967
1,1,0.1158508331018181
2,2,0.9587111373188968
3,3,0.8701416114549719
4,4,0.1568403204008163
5,5,0.12215751676273201
6,6,0.5040615339539852
7,7,0.5291894043380058
8,8,0.40721487378992893
9,9,0.9284453533942072
10,10,0.8224097122571449
11,11,0.31928057533043286
12,12,0.9255140336657344
The error is this:
java: cannot access scala.collection.immutable.Seq class file for
scala.collection.immutable.Seq not found
Many thanks
Update 1:
After including all these jars:
spark-sql_2.13-3.3.0.jar
spark-network-common_2.13-3.3.0.jar
spark-mllib_2.13-3.3.0.jar
spark-core_2.13-3.3.0.jar
spark-catalyst_2.13-3.3.0.jar
slf4j-simple-1.7.36.jar
slf4j-api-1.7.36.jar
scala-library-2.13.8.jar
log4j-core-2.18.0.jar
hadoop-mapreduce-client-jobclient-3.3.4.jar
hadoop-mapreduce-client-core-3.3.4.jar
hadoop-mapreduce-client-common-3.3.4.jar
hadoop-common-3.3.4.jar
hadoop-client-3.3.4.jar
guava-31.1-jre.jar
commons-lang3-3.12.0.jar
commons-configuration2-2.8.0.jar
It still throws:
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/hadoop/thirdparty/com/google/common/collect/Maps
This is a nightmare; now I cannot even find which jar this class belongs to.
I have a slightly different version of Spark, but I think the steps below won't be any different.
Suppose I have spark-3.1.2 installed here: /home/qq/.sdkman/candidates/spark/3.1.2/
Suppose also I have Main.java like yours.
I compiled and ran it without errors like this (the classpath is quoted so that the * wildcard is expanded by the JVM rather than by the shell):
javac -cp ".:/home/qq/.sdkman/candidates/spark/3.1.2/jars/*" Main.java
java -cp ".:/home/qq/.sdkman/candidates/spark/3.1.2/jars/*" Main
I strongly suggest using a dependency management tool like Maven. You will sooner or later have to update your Spark version (if only for bug fixes or patched security flaws). Spark 3.3.0 needs at least 139 different jars (plus optional packages like MLlib or GraphX). Managing the correct versions of 139+ different jars is no fun!
If you cannot use Maven directly in your project, you can at least use this tool to create the list of jars that you have to include:
First create a minimal pom.xml:
<project>
<modelVersion>4.0.0</modelVersion>
<groupId>mygroup</groupId>
<artifactId>simplesparkapp</artifactId>
<version>1</version>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.13</artifactId>
<version>3.3.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.13</artifactId>
<version>3.3.0</version>
</dependency>
</dependencies>
</project>
Then use Maven's dependency plugin to either
create a list of all required jars: mvn dependency:list (result is here) or
let Maven copy all required jars into a folder on your machine: mvn dependency:copy-dependencies -DoutputDirectory=out
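For example, assuming the jars were copied into the out folder next to your sources, you can then compile and run against that folder just like against the Spark distribution's jars directory above:
javac -cp ".:out/*" Main.java
java -cp ".:out/*" Main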
This way you can at least automate parts of your build and deployment road.
I have been working with Cucumber/Java and JUnit 4 (CucumberOptions) for years without trouble running the tests in both IntelliJ and the Maven command line.
Recently, I have been trying to make the move to JUnit 5, and I was able to get all tests running in IntelliJ (only there, unfortunately).
My POC project has the following structure:
junit5
-Features (folder with feature files)
-resources (folder with files used in tests)
-src
--test
---java
----stepdefs
-----SetupEnvHook
-----StepDefs
----AllTest (testrunner wip)
----JU4Test (testrunner JUnit4)
----JU5Test (testrunner Junit5)
---resources (test resources)
-junit5.iml
-pom.xml
The JU5Test.java file :
import org.junit.platform.suite.api.ConfigurationParameter;
import org.junit.platform.suite.api.SelectDirectories;
import org.junit.platform.suite.api.Suite;
import stepdefs.SetupEnvHook;
import io.cucumber.java.Before;
import static io.cucumber.core.options.Constants.*;

@Suite
@SelectDirectories("Features")
//@ConfigurationParameter(key = PARALLEL_EXECUTION_ENABLED_PROPERTY_NAME, value = "true")
@ConfigurationParameter(key = PLUGIN_PUBLISH_ENABLED_PROPERTY_NAME, value = "false")
@ConfigurationParameter(key = PLUGIN_PUBLISH_QUIET_PROPERTY_NAME, value = "true")
@ConfigurationParameter(key = PLUGIN_PROPERTY_NAME, value = "json:target/cucumber-reports/cucumber.json")
@ConfigurationParameter(key = GLUE_PROPERTY_NAME, value = "stepdefs, my.external.steps.stepdefinition")
public class JU5Test {
    @Before
    public static void beforeSuite() {
        SetupEnvHook.setEnvironment("QA");
    }
}
The beforeSuite() method is also used in the JU4Test.
When I set a breakpoint at SetupEnvHook.setEnvironment("QA"); it is completely ignored because the @Before annotation is not working, while another breakpoint under the
@io.cucumber.java.BeforeAll(order = 9999)
annotation in the SetupEnvHook class is triggered correctly.
My pom file is as follows :
<dependencies>
    <dependency>
        <groupId>my.external</groupId>
        <artifactId>steps</artifactId>
        <version>1.0-SNAPSHOT</version>
    </dependency>
    <dependency>
        <groupId>org.junit.jupiter</groupId>
        <artifactId>junit-jupiter-engine</artifactId>
        <version>5.9.1</version>
        <scope>test</scope>
    </dependency>
</dependencies>
<build>
    <plugins>
        <plugin>
            <artifactId>maven-surefire-plugin</artifactId>
        </plugin>
        <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
                <source>8</source>
                <target>8</target>
            </configuration>
        </plugin>
    </plugins>
</build>
Please ignore the my.external dependency; it is related to the step definitions named in the test runner's glue property.
I know that the group and version values are also missing in places, but these are all fed from the same shared dependency so as to have more control over the versions everyone uses.
This is all done in Java 8 using
org.apache.maven.plugins:maven-compiler-plugin:3.10.1
org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M7
io.cucumber:cucumber-java:7.8.1
io.cucumber:cucumber-junit:7.8.1
io.cucumber:cucumber-junit-platform-engine:7.8.1
org.junit.jupiter:junit-jupiter-api:5.9.1
org.junit.jupiter:junit-jupiter-engine:5.9.1
org.junit.platform:junit-platform-suite-api:1.9.1
org.junit.platform:junit-platform-suite-engine:1.9.1
I already tried using different annotations, not only from io.cucumber.java but also from org.junit (which is basically JUnit 4) and org.junit.jupiter.api, with no success.
Running it through the Maven command line ends up with:
Results :
Tests run: 0, Failures: 0, Errors: 0, Skipped: 0
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12.4:test (default-test) on project junit5: No tests were executed!
It does not, however, state that 'no tests were found'; I had that issue initially and got it solved.
From looking at the error I suspect I may have something missing from the pom.xml surefire plugin, but I cannot figure out what. (This pom is the same one used to run the JU4Test without issues.)
Does anyone have any thoughts on what I can try next? Or better yet, the solution for this? xD
From your description it is impossible to say what is wrong with your project. Your list of dependencies includes dependencies not declared in your POM.
You may want to consider starting your project from scratch. You can use the https://github.com/cucumber/cucumber-java-skeleton for that.
When I set a breakpoint in SetupEnvHook.setEnvironment("QA"); it is completely ignored due to the fact that the @Before annotation is not working
The reason the @Before annotation is ignored is that you are using a Cucumber annotation on a class that is not part of the glue path.
Though I suspect you are trying to find a mapping for JUnit 4's @BeforeClass. Currently there is no such thing in JUnit 5's Suite Engine. If you need it, you should consider making a pull request.
Alternatively, you could create a package with a single class for each environment and use Cucumber's @BeforeAll hooks to set the environment. Then for each @Suite you configure a different glue path to include those hooks.
Though I think it would be even better to read the target environment from an environment variable and have it default to something sane, as sketched below. You can then use different CI jobs for each environment.
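A minimal sketch of the environment-variable idea (the TEST_ENV variable name and the class are made up for illustration; the class lives in the stepdefs package so it is already on the glue path):
package stepdefs;

import io.cucumber.java.BeforeAll;

public class EnvironmentFromVariableHook {
    @BeforeAll(order = 9999)
    public static void setEnvironment() {
        // Read the target environment from a variable, defaulting to QA.
        SetupEnvHook.setEnvironment(System.getenv().getOrDefault("TEST_ENV", "QA"));
    }
}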
I am new to Java and the Akka toolkit. I have created a Java project and tried to include the code below:
package com.postgresqltutorial;
import akka.actor.ActorSystem;
public class App {
    public static void main(String[] args) {
        final ActorSystem system = ActorSystem.create("QuickStart");
    }
}
I have added the Akka libraries as referenced libraries: akka-actor_2.12-2.6.15.jar, akka-protobuf_2.12-2.6.15.jar and akka-stream_2.12-2.6.15.jar.
And my project structure is as shown in the screenshot (omitted here).
Please help me to resolve this.
Most likely you've not referenced the libraries correctly. That is why you should use a build tool such as Maven. Check the referenced link to understand how it works; it handles the libraries for you, you just have to declare them in the pom.xml file.
Example:
<dependencies>
    <dependency>
        <groupId>com.typesafe.akka</groupId>
        <artifactId>akka-actor_3</artifactId>
        <version>2.6.18</version>
    </dependency>
</dependencies>
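One caveat that applies to Akka artifacts in general (not specific to this example): the suffix after the artifact name (_2.12, _2.13, _3) is the Scala binary version, and all Akka modules on the classpath must share the same suffix and the same version. So with the jars from your question you would declare akka-actor_2.12, akka-protobuf_2.12 and akka-stream_2.12, all at version 2.6.15.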
This is my test (maven-plugin-testing-harness 3.3.0, junit 5.6.2):
import java.io.File;
import org.apache.maven.plugin.testing.AbstractMojoTestCase;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

public final class MyMojoTest extends AbstractMojoTestCase {
    @BeforeEach
    public void setup() throws Exception {
        this.setUp();
    }

    @Test
    public void executeIt() throws Exception {
        final File pom = new File("src/test/resources/my-test-pom.xml");
        final MyMojo mojo = MyMojo.class.cast(
            this.lookupMojo("mygoal", pom)
        );
        mojo.execute();
    }
}
This is what I have in MyMojo (maven-plugin-api 3.8.4):
import org.apache.maven.plugin.AbstractMojo;
import org.apache.maven.plugins.annotations.LifecyclePhase;
import org.apache.maven.plugins.annotations.Mojo;
import org.apache.maven.plugins.annotations.Parameter;
import org.apache.maven.project.MavenProject;

@Mojo(name = "my", defaultPhase = LifecyclePhase.COMPILE)
public final class MyMojo extends AbstractMojo {
    @Parameter(defaultValue = "${project}", readonly = true)
    private MavenProject project;

    @Override
    public void execute() {
        // Plugin logic omitted in the question.
    }
}
The problem is that the mojo returned by lookupMojo() doesn't have the project attribute set (it's null).
Some solution was proposed here, but I'm not sure how it can work with JUnit 5.
I tried with the same configuration as mentioned above. The plugin works fine, but none of the tests using lookupMojo() seem to work.
A similar test example can be found here. There is a difference in the setUp method between your class MyMojoTest and the example provided in the link:
super.setUp(); should be called instead of this.setUp(); so as to initialize all the objects in the AbstractMojoTestCase class.
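A minimal sketch of the corrected setup method (everything else in MyMojoTest unchanged):
@BeforeEach
public void setup() throws Exception {
    // Initialize the harness objects (Plexus container, component lookup, etc.)
    super.setUp();
}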
The likely reason the test case with maven-plugin-testing-harness 3.3.0 and JUnit 5.6.2 does not work is that they are not compatible.
The reasons are:
maven-plugin-testing-harness was built to be compatible with JUnit 4; its latest update was a long time ago, i.e. Dec 17, 2014. JUnit 4 and JUnit 5 are not compatible, so we have to make use of the JUnit Vintage engine to make it work (a dependency sketch follows below).
maven-plugin-testing-harness was developed using JDK 7, and the minimum requirement for JUnit 5 is JDK 8.
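For illustration, the Vintage engine would be declared along these lines (a sketch; the version simply matches the JUnit 5.6.2 used in the test):
<dependency>
    <groupId>org.junit.vintage</groupId>
    <artifactId>junit-vintage-engine</artifactId>
    <version>5.6.2</version>
    <scope>test</scope>
</dependency>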
Information from the harness plugin's manifest file:
Implementation-Vendor-Id: org.apache.maven.plugin-testing
Built-By: igor
Build-Jdk: 1.7.0_55
Specification-Vendor: The Apache Software Foundation
Specification-Title: Maven Plugin Testing Mechanism
The supported Maven version is also different for the two jars (link).
A few other links confirm the same.
There are very few libraries and informational links available for plugin testing with JUnit 5. I could find only a handful of them, although I haven't tried them yet.
Library:
<dependency>
<groupId>com.soebes.itf.jupiter.extension</groupId>
<artifactId>itf-assertj</artifactId>
<version>0.11.0</version>
<scope>test</scope>
</dependency>
A few more Jupiter extension libraries are listed in this link.
Examples related to it:
Example 1
Example 2
Example 3
Possible solutions
Solution #1: Use the AbstractMojoTestCase.lookupConfiguredMojo() method
Please consider the implementation of this test class as an example: maven-plugin-testing/ParametersMojoTest.java at maven-plugin-testing-3.3.0 · apache/maven-plugin-testing.
Considering this example, please note the Mojo instantiation approach: it uses the readMavenProject() helper method (defined in that test class) together with lookupConfiguredMojo():
MavenProject project = readMavenProject( new File( "src/test/projects/default" ) );
ParametersMojo mojo = (ParametersMojo) lookupConfiguredMojo( project, "parameters" );
This Mojo instantiation approach provides the instantiated Mojo with the correct MavenProject parameter value.
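Adapted to the question's Mojo, this might look roughly as follows (a sketch: the goal name "mygoal" and the project directory are placeholders, and readMavenProject() is the helper from the linked example class, not part of the harness API):
MavenProject project = readMavenProject(new File("src/test/projects/default"));
MyMojo mojo = (MyMojo) lookupConfiguredMojo(project, "mygoal");
mojo.execute();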
Some additional references
Related Stack Overflow answer.
Solution #2: Test pom.xml for the plugin: use a project stub
It is necessary to update the test pom.xml file by introducing the project element (within the configuration element) with the stub.
For example:
<project>
    <…>
    <build>
        <plugins>
            <plugin>
                <artifactId>touch-maven-plugin</artifactId>
                <configuration>
                    <project implementation="org.apache.maven.plugin.testing.stubs.MavenProjectStub">
                        <groupId implementation="java.lang.String">test-group-id</groupId>
                        <artifactId implementation="java.lang.String">test-artifact-id</artifactId>
                        <version implementation="java.lang.String">1.0.0-SNAPSHOT</version>
                    </project>
                    <…>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
Some additional references
An example. maven-jar-plugin/pom.xml at maven-jar-plugin-3.2.2 · apache/maven-jar-plugin.
Related Stack Overflow answer.
I'm trying to follow this example from Adobe:
How to programmatically access the AEM JCR
My code looks like this:
package com.example;
import javax.jcr.Repository;
import javax.jcr.Session;
import javax.jcr.SimpleCredentials;
import javax.jcr.Node;
import org.apache.jackrabbit.commons.JcrUtils;
/**
* Hello world!
*
*/
public class App
{
    public static void main( String[] args )
    {
        try {
            // Create a connection to the CQ repository running on localhost
            Repository repository = JcrUtils.getRepository("http://localhost:4502/crx/server");
        } catch (Exception ex) {
            System.out.println(ex.toString());
            ex.printStackTrace();
        }
    }
}
My pom.xml has these dependencies:
<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.11</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.jackrabbit</groupId>
        <artifactId>jackrabbit-core</artifactId>
        <version>2.21.7</version>
    </dependency>
    <dependency>
        <groupId>org.apache.jackrabbit</groupId>
        <artifactId>jackrabbit-jcr-commons</artifactId>
        <version>2.21.7</version>
    </dependency>
    <dependency>
        <groupId>org.apache.jackrabbit</groupId>
        <artifactId>jackrabbit-jcr2dav</artifactId>
        <version>2.21.7</version>
    </dependency>
</dependencies>
When I run with this command:
`java -jar .\target\demo-jar-with-dependencies.jar`
I get the following output:
javax.jcr.RepositoryException: Unable to access a repository with the following settings:
org.apache.jackrabbit.repository.uri: http://localhost:4502/crx/server
The following RepositoryFactory classes were consulted:
org.apache.jackrabbit.core.RepositoryFactoryImpl: declined
Perhaps the repository you are trying to access is not available at the moment.
javax.jcr.RepositoryException: Unable to access a repository with the following settings:
org.apache.jackrabbit.repository.uri: http://localhost:4502/crx/server
The following RepositoryFactory classes were consulted:
org.apache.jackrabbit.core.RepositoryFactoryImpl: declined
Perhaps the repository you are trying to access is not available at the moment.
at org.apache.jackrabbit.commons.JcrUtils.getRepository(JcrUtils.java:224)
at org.apache.jackrabbit.commons.JcrUtils.getRepository(JcrUtils.java:264)
at com.example.App.main(App.java:20)
I've found a number of articles both here and on other sites but none of the suggestions I've found have done anything to resolve the issue.
Any thoughts on what I'm doing wrong? I'm pretty new to AEM. Yes, the AEM Author server is running and it's on port 4502; in fact, if I open this URL in a web browser I get a response (screenshot omitted) which seems normal.
Update: The consensus seems to be that I'm going about this incorrectly by using Maven instead of downloading the Jackrabbit Standalone library from Apache. So I created an entirely new project in Eclipse, without Maven, and added jackrabbit-standalone.jar version 2.23.0 (which is the latest) as a referenced library. The result is exactly the same exception.
I've set up a GitHub repository for this code at:
Github Repo
Please feel free to clone it and see what I'm doing wrong.
We usually write our AEM code in OSGi bundles that we deploy in AEM. If you want to use the JCR API to connect to an AEM instance follow the advice here:
To use the JCR API, add the jackrabbit-standalone-2.4.0.jar file to
your Java application’s classpath. You can obtain this JAR file from
the Java JCR API web page at
https://jackrabbit.apache.org/jcr/jcr-api.html.
Once you download that file you have two options:
1/ No Maven - compile your program manually using javac -classpath ... (and specify the jackrabbit-standalone JAR in the classpath), and run it using java -cp ...
2/ With Maven - install the JAR in the Maven repo (or add it as a dependency using <scope>system</scope>), but then you also have to run your program using Maven
Note that the Maven dependencies do not affect the program; the jackrabbit-standalone JAR is all you need to run the code you wrote.
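Concretely, option 1/ might look like this (a sketch assuming jackrabbit-standalone-2.4.0.jar sits next to your sources; adjust the path and version to the JAR you actually downloaded):
javac -classpath .:jackrabbit-standalone-2.4.0.jar com/example/App.java
java -cp .:jackrabbit-standalone-2.4.0.jar com.example.App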
So you're trying to access AEM using Jackrabbit WebDAV.
It's pretty clear that you don't have the right components on the classpath. You likely need all "spi"-related components (these contain the WebDAV-related client code), plus jackrabbit-jcr-commons.
You definitely do not need jackrabbit-core, as that is for a local repository instance.
I'm following the Mahout in Action tutorial for k-means clustering; I use the same code found here:
with the same pom.xml as well.
On my local machine using Eclipse everything works fine, so I built the jar file (clustering-0.0.1-SNAPSHOT.jar) and brought it to the cluster (Hortonworks 2.3). When trying to run it using hadoop jar clustering-0.0.1-SNAPSHOT.jar com.digimarket.clustering.App (I named my project differently), I get this error:
java.lang.NoClassDefFoundError:
org/apache/mahout/common/distance/DistanceMeasure
I know it's a dependency issue; I found questions asked by users who had this issue before, but I couldn't understand how they solved it:
here and here
This is the content of mahout directory in my cluster:
ls /usr/hdp/2.3.4.0-3485/mahout/
bin
conf
doc
lib
mahout-examples-0.9.0.2.3.4.0-3485.jar
mahout-examples-0.9.0.2.3.4.0-3485-job.jar
mahout-integration-0.9.0.2.3.4.0-3485.jar
mahout-math-0.9.0.2.3.4.0-3485.jar
mahout-mrlegacy-0.9.0.2.3.4.0-3485.jar
mahout-mrlegacy-0.9.0.2.3.4.0-3485-job.jar
Thanks.
It looks like you have a dependency that is not available to your code on your cluster.
Based on the pom.xml from that project you should be using:
<properties>
    <mahout.version>0.5</mahout.version>
    <mahout.groupid>org.apache.mahout</mahout.groupid>
</properties>
...
<dependencies>
    <dependency>
        <groupId>${mahout.groupid}</groupId>
        <artifactId>mahout-core</artifactId>
        <version>${mahout.version}</version>
    </dependency>
    ...
</dependencies>
The class org.apache.mahout.common.distance.DistanceMeasure is included in mahout-core-0.*.jar; I have mahout-core-0.7.jar and the class is present in there.
You can download that jar and include it with the -libjars flag or you can put it on the hadoop classpath.
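For example (paths are illustrative, and note that -libjars is only honored if the driver parses its arguments with Hadoop's GenericOptionsParser/Tool):
hadoop jar clustering-0.0.1-SNAPSHOT.jar com.digimarket.clustering.App -libjars /path/to/mahout-core-0.7.jar
or, to put it on the client classpath instead:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/path/to/mahout-core-0.7.jar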