Hi, I am trying to run a simple Java program using Apache Hive and Apache Spark. The program compiles without any errors, but at runtime I get the following error:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.hive.HiveContext.sql(Ljava/lang/String;)Lorg/apache/spark/sql/DataFrame;
at SparkHiveExample.main(SparkHiveExample.java:13)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Following is my code:
import org.apache.spark.SparkContext;
import org.apache.spark.SparkConf;
import org.apache.spark.sql.hive.HiveContext;
import org.apache.spark.sql.DataFrame;
public class SparkHiveExample {
public static void main(String[] args) {
SparkConf conf = new SparkConf().setAppName("SparkHive Example");
SparkContext sc = new SparkContext(conf);
HiveContext hiveContext = new HiveContext(sc);
System.out.println("Hello World");
DataFrame df = hiveContext.sql("show tables");
df.show();
}
}
My pom.xml file looks as follows:
<project>
<groupId>edu.berkeley</groupId>
<artifactId>simple-project</artifactId>
<modelVersion>4.0.0</modelVersion>
<name>Simple Project</name>
<packaging>jar</packaging>
<version>1.0</version>
<dependencies>
<dependency> <!-- Spark dependency -->
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.3.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.10</artifactId>
<version>1.3.0</version>
</dependency>
</dependencies>
</project>
What could be the problem?
EDIT: I tried using the SQLContext.sql() method and I still get a similar "method not found" runtime error. This Stack Overflow answer suggests that the cause is a dependency problem, but I am unable to figure out which one.
Make sure your spark-core and spark-hive dependencies use the provided scope, as shown below. These dependencies are supplied by the cluster, not by your application.
Also make sure the installed Spark version is 1.3 or later. Before 1.3, the sql method returned a SchemaRDD (an RDD) instead of a DataFrame, so the most likely cause is that the installed Spark is older than 1.3.
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.3.0</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.10</artifactId>
<version>1.3.0</version>
<scope>provided</scope>
</dependency>
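To see why a changed return type shows up as a NoSuchMethodError at runtime rather than as a compile error, it can help to look at the descriptor the JVM links against. A compiled call site records the method name together with its parameter and return types, which is exactly the sql(Ljava/lang/String;)Lorg/apache/spark/sql/DataFrame; string in the stack trace above. The following is only a minimal sketch using the JDK alone: the class name MethodDescriptorDemo and its describe helper are made up for illustration, and String.concat stands in for HiveContext.sql because Spark is not assumed to be on the classpath here.

```java
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

// Hypothetical helper, not part of Spark: prints a method the way the JVM
// names it in linkage errors (name + parameter types + return type).
final class MethodDescriptorDemo {

    static String describe(Class<?> owner, String name, Class<?>... parameterTypes) {
        try {
            Method method = owner.getMethod(name, parameterTypes);
            // The descriptor includes the return type, so a method that keeps
            // its name and parameters but changes its return type is a
            // different method as far as the JVM linker is concerned.
            MethodType type = MethodType.methodType(
                    method.getReturnType(), method.getParameterTypes());
            return method.getName() + type.toMethodDescriptorString();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(
                    "No public method " + name + " on " + owner.getName(), e);
        }
    }

    public static void main(String[] args) {
        // prints: concat(Ljava/lang/String;)Ljava/lang/String;
        System.out.println(describe(String.class, "concat", String.class));
    }
}
```

Running the same helper against org.apache.spark.sql.hive.HiveContext on the cluster's classpath would show which return type the installed sql method really has.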
On Spark 2.0 and later, it is recommended to use a SparkSession object instead of HiveContext to run queries. The Scala snippet below shows the usage of SparkSession.
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder
.master("local")
.appName("spark session example")
.enableHiveSupport()
.getOrCreate()
spark.sql("show tables")
The error occurs because you are running a SHOW TABLES query and assigning the result to a DataFrame.
You can assign the result to a DataFrame when you use a SELECT query or similar queries, but not a SHOW query.
from pyspark.sql.types import DecimalType,StringType
from pyspark.sql.functions import *
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Your APPName").enableHiveSupport().getOrCreate()
from pyspark.sql import HiveContext
hive_context = HiveContext(spark.sparkContext)  # HiveContext expects a SparkContext, not a SparkSession
hive_context.sql("select current_date()").show()
I want to use Spark to retrieve some data from Elasticsearch. I followed the method in the official documentation, but something goes wrong.
This is my code (using Java and JDK 1.8_221):
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.elasticsearch.spark.rdd.api.java.JavaEsSpark;
import scala.Tuple2;
import java.util.Map;
public class Main {
public static void main(String[] args) {
SparkConf conf = new SparkConf();
conf.setMaster("local");
conf.setAppName("Spark ElasticSearch");
conf.set("es.index.auto.create", "true");
conf.set("es.nodes", "10.245.142.213");
conf.set("es.port", "9200");
JavaSparkContext sc = new JavaSparkContext(conf);
sc.setLogLevel("ERROR");
JavaPairRDD<String, Map<String, Object>> esRDD =
JavaEsSpark.esRDD(sc, "au_pkt_ams/au_pkt_ams");
for (Tuple2<String, Map<String, Object>> tuple : esRDD.collect()) {
System.out.print(tuple._1()+"-------------");
System.out.println(tuple._2());
}
}
}
And here is the error report(All logs):
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/Partition$class
at org.elasticsearch.spark.rdd.EsPartition.<init>(AbstractEsRDD.scala:84)
at org.elasticsearch.spark.rdd.AbstractEsRDD$$anonfun$getPartitions$1.apply(AbstractEsRDD.scala:49)
at org.elasticsearch.spark.rdd.AbstractEsRDD$$anonfun$getPartitions$1.apply(AbstractEsRDD.scala:48)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike.map(TraversableLike.scala:237)
at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
at scala.collection.immutable.List.map(List.scala:298)
at org.elasticsearch.spark.rdd.AbstractEsRDD.getPartitions(AbstractEsRDD.scala:48)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:253)
at scala.Option.getOrElse(Option.scala:138)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:945)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.collect(RDD.scala:944)
at org.apache.spark.api.java.JavaRDDLike.collect(JavaRDDLike.scala:361)
at org.apache.spark.api.java.JavaRDDLike.collect$(JavaRDDLike.scala:360)
at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
at Main.main(Main.java:24)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.Partition$class
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 22 more
Process finished with exit code 1
The log says esRDD.collect() fails because the class org.apache.spark.Partition$class cannot be found, but the file Partition.class actually exists.
I had the same issue. It turned out to be a Scala version incompatibility between the Spark and Elasticsearch libraries. In my case the Spark libraries were built for Scala 2.12, but then I realised that the elasticsearch-spark connector only supported Scala 2.11.
I switched the Spark libraries to Scala 2.11 and also replaced elasticsearch-hadoop with elasticsearch-spark-20_2.11. Here are the new pom.xml entries, which work fine for me.
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.4.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.4.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.11</artifactId>
<version>2.4.2</version>
</dependency>
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch-spark-20_2.11</artifactId>
<version>7.4.2</version>
</dependency>
I'm trying to run a spark stream from a kafka queue containing Avro messages.
As per https://spark.apache.org/docs/latest/sql-data-sources-avro.html I should be able to use from_avro to convert column value to Dataset<Row>.
However, I'm unable to compile the project as it complains from_avro cannot be found. I can see the method declared in package.class of the dependency.
How can I use the from_avro method from org.apache.spark.sql.avro in my Java code locally?
import java.io.IOException;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.*;
import org.apache.spark.sql.avro.*;
public class AvroStreamTest {
public static void main(String[] args) throws IOException, InterruptedException {
// Creating local sparkSession here...
Dataset<Row> df = sparkSession
.readStream()
.format("kafka")
.option("kafka.bootstrap.servers", "host:port")
.option("subscribe", "avro_queue")
.load();
// Cannot resolve method 'from_avro'...
df.select(from_avro(col("value"), jsonFormatSchema)).writeStream().format("console")
.outputMode("update")
.start();
}
}
pom.xml:
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.4.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.4.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-avro_2.11</artifactId>
<version>2.4.0</version>
</dependency>
<!-- more dependencies below -->
</dependencies>
It seems like Java is unable to import names from sql.avro.package.class
This happens because of the class names Scala generates for a package object: from_avro ends up in a class called package$. Importing it with import org.apache.spark.sql.avro.package$; and then calling package$.MODULE$.from_avro(...) should work.
You need to include spark-sql-avro in your pom.xml which is available at
https://mvnrepository.com/artifact/org.apache.spark/spark-sql-avro_2.11/2.4.0-palantir.28-1-gdf34e2d
I'm trying to run the Logistic Regression example (https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/ml/JavaLogisticRegressionWithElasticNetExample.java)
This is the code:
public final class GettingStarted {
public static void main(final String[] args) throws InterruptedException {
System.setProperty("hadoop.home.dir", "C:\\winutils");
SparkSession spark = SparkSession
.builder()
.appName("JavaLogisticRegressionWithElasticNetExample")
.config("spark.master", "local")
.getOrCreate();
// $example on$
// Load training data
Dataset<Row> training = spark.read().format("libsvm").load("data/mllib/sample_libsvm_data.txt");
LogisticRegression lr = new LogisticRegression()
.setMaxIter(10)
.setRegParam(0.3)
.setElasticNetParam(0.8);
// Fit the model
LogisticRegressionModel lrModel = lr.fit(training);
// Print the coefficients and intercept for logistic regression
System.out.println("Coefficients: "
+ lrModel.coefficients() + " Intercept: " + lrModel.intercept());
// We can also use the multinomial family for binary classification
LogisticRegression mlr = new LogisticRegression()
.setMaxIter(10)
.setRegParam(0.3)
.setElasticNetParam(0.8)
.setFamily("multinomial");
// Fit the model
LogisticRegressionModel mlrModel = mlr.fit(training);
// Print the coefficients and intercepts for logistic regression with multinomial family
System.out.println("Multinomial coefficients: " + mlrModel.coefficientMatrix()
+ "\nMultinomial intercepts: " + mlrModel.interceptVector());
// $example off$
spark.stop();
}
}
I'm also using the same file of the example (https://github.com/apache/spark/blob/master/data/mllib/sample_libsvm_data.txt)
But I get these errors:
Exception in thread "main" java.lang.AssertionError: assertion failed: unsafe symbol CompatContext (child of package macrocompat) in runtime reflection universe
at scala.reflect.internal.Symbols$Symbol.<init>(Symbols.scala:184)
at scala.reflect.internal.Symbols$TypeSymbol.<init>(Symbols.scala:2984)
at scala.reflect.internal.Symbols$ClassSymbol.<init>(Symbols.scala:3176)
at scala.reflect.internal.Symbols$StubClassSymbol.<init>(Symbols.scala:3471)
at scala.reflect.internal.Symbols$Symbol.newStubSymbol(Symbols.scala:498)
at scala.reflect.internal.pickling.UnPickler$Scan.readExtSymbol$1(UnPickler.scala:258)
at scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:284)
at scala.reflect.internal.pickling.UnPickler$Scan.readSymbolRef(UnPickler.scala:649)
at scala.reflect.internal.pickling.UnPickler$Scan.readType(UnPickler.scala:417)
at scala.reflect.internal.pickling.UnPickler$Scan$LazyTypeRef$$anonfun$6.apply(UnPickler.scala:725)
at scala.reflect.internal.pickling.UnPickler$Scan$LazyTypeRef$$anonfun$6.apply(UnPickler.scala:725)
at scala.reflect.internal.pickling.UnPickler$Scan.at(UnPickler.scala:179)
at scala.reflect.internal.pickling.UnPickler$Scan$LazyTypeRef.completeInternal(UnPickler.scala:725)
at scala.reflect.internal.pickling.UnPickler$Scan$LazyTypeRef.complete(UnPickler.scala:749)
at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1489)
at scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$12.scala$reflect$runtime$SynchronizedSymbols$SynchronizedSymbol$$super$info(SynchronizedSymbols.scala:162)
at scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anonfun$info$1.apply(SynchronizedSymbols.scala:127)
at scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anonfun$info$1.apply(SynchronizedSymbols.scala:127)
at scala.reflect.runtime.Gil$class.gilSynchronized(Gil.scala:19)
at scala.reflect.runtime.JavaUniverse.gilSynchronized(JavaUniverse.scala:16)
at scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$class.gilSynchronizedIfNotThreadsafe(SynchronizedSymbols.scala:123)
at scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$12.gilSynchronizedIfNotThreadsafe(SynchronizedSymbols.scala:162)
at scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$class.info(SynchronizedSymbols.scala:127)
at scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$12.info(SynchronizedSymbols.scala:162)
at scala.reflect.internal.Mirrors$RootsBase.ensureClassSymbol(Mirrors.scala:94)
at scala.reflect.internal.Mirrors$RootsBase.getClassByName(Mirrors.scala:102)
at scala.reflect.internal.Mirrors$RootsBase.getClassIfDefined(Mirrors.scala:114)
at scala.reflect.internal.Mirrors$RootsBase.getClassIfDefined(Mirrors.scala:111)
at scala.reflect.internal.Definitions$DefinitionsClass.BlackboxContextClass$lzycompute(Definitions.scala:496)
at scala.reflect.internal.Definitions$DefinitionsClass.BlackboxContextClass(Definitions.scala:496)
at scala.reflect.runtime.JavaUniverseForce$class.force(JavaUniverseForce.scala:305)
at scala.reflect.runtime.JavaUniverse.force(JavaUniverse.scala:16)
at scala.reflect.runtime.JavaUniverse.init(JavaUniverse.scala:147)
at scala.reflect.runtime.JavaUniverse.<init>(JavaUniverse.scala:78)
at scala.reflect.runtime.package$.universe$lzycompute(package.scala:17)
at scala.reflect.runtime.package$.universe(package.scala:17)
at org.apache.spark.sql.catalyst.ScalaReflection$.<init>(ScalaReflection.scala:40)
at org.apache.spark.sql.catalyst.ScalaReflection$.<clinit>(ScalaReflection.scala)
at org.apache.spark.sql.catalyst.encoders.RowEncoder$.org$apache$spark$sql$catalyst$encoders$RowEncoder$$serializerFor(RowEncoder.scala:74)
at org.apache.spark.sql.catalyst.encoders.RowEncoder$.apply(RowEncoder.scala:61)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:67)
at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:415)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:172)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:156)
at GettingStarted.main(GettingStarted.java:95)
Do you know what I'm doing wrong?
EDIT:
I run it in IntelliJ. It is a Maven project, and I added these dependencies:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.2.0</version>
</dependency>
<dependency>
<groupId>org.mongodb.spark</groupId>
<artifactId>mongo-spark-connector_2.11</artifactId>
<version>2.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.10</artifactId>
<version>2.2.0</version>
</dependency>
tl;dr: As soon as you start seeing errors internal to Scala that mention the reflection universe, think incompatible Scala versions.
The Scala versions of your libraries do not match one another (2.10 and 2.11).
You should align them all on your actual Scala version.
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId> <!-- This is scala v2.11 -->
<version>2.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.10</artifactId> <!-- This is scala v2.10 -->
<version>2.2.0</version>
</dependency>
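The check described above (every Scala-based artifact should carry the same _2.xx suffix) can be automated. This is only a sketch under stated assumptions: the class ScalaSuffixCheck and its methods are hypothetical names, the artifactIds are typed in by hand rather than read from the pom.xml, and only Scala 2.x style suffixes such as _2.10 or _2.11 are recognised.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: groups Maven artifactIds by their Scala binary suffix.
final class ScalaSuffixCheck {

    // Matches a trailing Scala binary version such as "_2.10" or "_2.11".
    private static final Pattern SUFFIX = Pattern.compile("_(\\d+\\.\\d+)$");

    static Map<String, List<String>> groupByScalaVersion(List<String> artifactIds) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String id : artifactIds) {
            Matcher m = SUFFIX.matcher(id);
            if (m.find()) {
                groups.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(id);
            }
            // Artifacts without a suffix (plain Java libraries) are ignored.
        }
        return groups;
    }

    // True when at most one Scala binary version appears among the artifacts.
    static boolean isConsistent(List<String> artifactIds) {
        return groupByScalaVersion(artifactIds).size() <= 1;
    }

    public static void main(String[] args) {
        // The four artifactIds from the question's pom.xml.
        List<String> ids = Arrays.asList(
                "spark-core_2.11",
                "mongo-spark-connector_2.11",
                "spark-sql_2.11",
                "spark-mllib_2.10");
        System.out.println(groupByScalaVersion(ids));
        System.out.println("consistent: " + isConsistent(ids));
    }
}
```

For the question's dependencies the grouping has two keys, 2.10 and 2.11, which is the mismatch the answer points out.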
I am trying to get data from Cloudant using Java code and I am getting an error.
I tried the following Spark and cloudant-spark versions:
Spark 2.0.0,
Spark 2.0.1,
Spark 2.0.2
I get the same error, posted below, for all versions.
If I add Scala dependencies to resolve this error, they conflict with the Spark library.
Below is my Java code:
package spark.cloudant.connecter;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.SQLContext;
import com.cloudant.spark.*;
public class cloudantconnecter {
public static void main(String[] args) throws Exception {
try {
SparkConf sparkConf = new SparkConf().setAppName("spark cloudant connecter").setMaster("local[*]");
sparkConf.set("spark.streaming.concurrentJobs", "30");
JavaSparkContext sc = new JavaSparkContext(sparkConf);
SQLContext sqlContext = new SQLContext(sc);
System.out.print("initialization successfully");
Dataset<org.apache.spark.sql.Row> st = sqlContext.read().format("com.cloudant.spark")
.option("cloudant.host", "HOSTNAME").option("cloudant.username", "USERNAME")
.option("cloudant.password", "PASSWORD").load("DATABASENAME");
st.printSchema();
} catch (Exception e) {
e.printStackTrace();
}
}
}
Maven Dependencies
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>2.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.10</artifactId>
<version>2.0.0</version>
</dependency>
<dependency>
<groupId>cloudant-labs</groupId>
<artifactId>spark-cloudant</artifactId>
<version>2.0.0-s_2.11</version>
</dependency>
</dependencies>
Getting error details,
Exception in thread "main" java.lang.NoSuchMethodError: scala/Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object; (loaded from file:/C:/Users/Administrator/.m2/repository/org/scala-lang/scala-library/2.10.6/scala-library-2.10.6.jar by sun.misc.Launcher$AppClassLoader#9f916f97) called from class scalaj.http.HttpConstants$ (loaded from file:/C:/Users/Administrator/.m2/repository/org/scalaj/scalaj-http_2.11/2.3.0/scalaj-http_2.11-2.3.0.jar by sun.misc.Launcher$AppClassLoader#9f916f97).
at scalaj.http.HttpConstants$.liftedTree1$1(Http.scala:637)
at scalaj.http.HttpConstants$.<init>(Http.scala:636)
at scalaj.http.HttpConstants$.<clinit>(Http.scala)
at scalaj.http.BaseHttp$.$lessinit$greater$default$2(Http.scala:754)
at scalaj.http.Http$.<init>(Http.scala:738)
at scalaj.http.Http$.<clinit>(Http.scala)
at com.cloudant.spark.common.JsonStoreDataAccess.getQueryResult(JsonStoreDataAccess.scala:152)
at com.cloudant.spark.common.JsonStoreDataAccess.getTotalRows(JsonStoreDataAccess.scala:99)
at com.cloudant.spark.common.JsonStoreRDD.totalRows$lzycompute(JsonStoreRDD.scala:56)
at com.cloudant.spark.common.JsonStoreRDD.totalRows(JsonStoreRDD.scala:55)
at com.cloudant.spark.common.JsonStoreRDD.totalPartition$lzycompute(JsonStoreRDD.scala:59)
at com.cloudant.spark.common.JsonStoreRDD.totalPartition(JsonStoreRDD.scala:58)
at com.cloudant.spark.common.JsonStoreRDD.getPartitions(JsonStoreRDD.scala:81)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1934)
at org.apache.spark.rdd.RDD$$anonfun$fold$1.apply(RDD.scala:1046)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
at org.apache.spark.rdd.RDD.fold(RDD.scala:1040)
at org.apache.spark.sql.execution.datasources.json.InferSchema$.infer(InferSchema.scala:68)
at org.apache.spark.sql.DataFrameReader$$anonfun$3.apply(DataFrameReader.scala:317)
at org.apache.spark.sql.DataFrameReader$$anonfun$3.apply(DataFrameReader.scala:317)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:316)
at com.cloudant.spark.DefaultSource.create(DefaultSource.scala:127)
at com.cloudant.spark.DefaultSource.createRelation(DefaultSource.scala:105)
at com.cloudant.spark.DefaultSource.createRelation(DefaultSource.scala:100)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:315)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:132)
at spark.cloudant.connecter.cloudantconnecter.main(cloudantconnecter.java:24)
The error occurs because the Spark libraries in the question are built for Scala 2.10, while the spark-cloudant package is built for Scala 2.11.
So change spark-core_2.10 to spark-core_2.11 (and likewise spark-mllib_2.10 to spark-mllib_2.11).
The dependencies then become:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.0.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.11</artifactId>
<version>2.0.1</version>
</dependency>
<dependency>
<groupId>cloudant-labs</groupId>
<artifactId>spark-cloudant</artifactId>
<version>2.0.0-s_2.11</version>
</dependency>
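The error message in the question happens to name the jar each class was loaded from (scala-library-2.10.6.jar versus scalaj-http_2.11-2.3.0.jar). When a JVM does not print that detail, the same information can usually be obtained from the class's code source. A minimal sketch follows; ClassOrigin and its two methods are hypothetical names, and the class names passed in main are just the ones from the stack trace.

```java
import java.security.CodeSource;

// Hypothetical diagnostic: reports where a class was loaded from.
final class ClassOrigin {

    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // Classes loaded by the bootstrap loader have no code source.
            return "(JDK runtime, no code source)";
        }
        return source.getLocation().toString();
    }

    static String locationOfName(String className) {
        try {
            // initialize=false: we only want to locate the class, not run
            // its static initializer.
            Class<?> type = Class.forName(
                    className, false, ClassOrigin.class.getClassLoader());
            return locationOf(type);
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Classes named in the NoSuchMethodError above.
        System.out.println("scala.Predef$ -> " + locationOfName("scala.Predef$"));
        System.out.println("scalaj.http.HttpConstants$ -> "
                + locationOfName("scalaj.http.HttpConstants$"));
    }
}
```

If the two locations point at jars with different Scala suffixes or versions, the classpath mixes Scala binary versions.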
I am trying to connect to a Spark master on a remote system from a Java app.
I am using
<dependency> <!-- Spark dependency -->
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.0.1</version>
</dependency>
and code
{
SparkSession sparkSession = SparkSession.builder().
master("spark://ip:7077")
.appName("spark session example")
.getOrCreate();
JavaSparkContext sc = new JavaSparkContext(sparkSession.sparkContext());
}
Getting
Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
at org.apache.spark.sql.SparkSession$Builder.config(SparkSession.scala:713)
at org.apache.spark.sql.SparkSession$Builder.master(SparkSession.scala:766)
at com.mobelisk.spark.JavaSparkPi.main(JavaSparkPi.java:9)
Also, if I change the dependency to
<dependency> <!-- Spark dependency -->
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>2.0.1</version>
</dependency>
the same program fails with
Caused by: java.lang.RuntimeException: java.io.InvalidClassException: org.apache.spark.rpc.netty.RequestMessage; local class incompatible: stream classdesc serialVersionUID = -2221986757032131007, local class serialVersionUID = -5447855329526097695
Output of spark-shell on the remote machine:
Spark context available as 'sc' (master = local[*], app id = local-1477561433881).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.1
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_101)
Type in expressions to have them evaluated.
Type :help for more information.
As I am very new to all this, I am not able to figure out what is wrong with the program.
I figured it out; I'm posting this in case someone else follows a similar approach.
I had added
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector_2.10</artifactId>
<version>2.0.0-M3</version>
which comes with scala-library 2.10.6,
but spark-core already brings in scala-library 2.11.8,
so I had to exclude the earlier one like this:
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector_2.10</artifactId>
<version>2.0.0-M3</version>
<exclusions>
<exclusion>
<artifactId>scala-library</artifactId>
<groupId>org.scala-lang</groupId>
</exclusion>
<exclusion>
<artifactId>scala-reflect</artifactId>
<groupId>org.scala-lang</groupId>
</exclusion>
</exclusions>
</dependency>
Now everything is working fine.
This is a Scala version mismatch:
your project uses 2.10,
but the cluster uses 2.11.
Update the dependency to 2.11.
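One way to confirm such a mismatch from inside the application is to ask the Scala library on the classpath for its version and compare it with the artifact suffix. The sketch below makes two assumptions that are not from the thread: that scala.util.Properties exposes versionNumberString() as a static method callable from Java, and that the binary version is simply the first two components of the full version (true for released Scala 2.x versions such as 2.11.8). ScalaRuntimeVersion and its methods are hypothetical names.

```java
import java.util.Optional;

// Hypothetical helper: compares the Scala version on the classpath with the
// suffix of a Maven artifactId.
final class ScalaRuntimeVersion {

    // "2.11.8" -> "2.11"
    static String toBinaryVersion(String fullVersion) {
        String[] parts = fullVersion.split("\\.");
        if (parts.length < 2) {
            return fullVersion;
        }
        return parts[0] + "." + parts[1];
    }

    // Looks up the Scala version reflectively so this class compiles and
    // runs even when no Scala library is present.
    static Optional<String> detect() {
        try {
            Class<?> props = Class.forName("scala.util.Properties");
            Object version = props.getMethod("versionNumberString").invoke(null);
            return Optional.of(String.valueOf(version));
        } catch (ReflectiveOperationException | LinkageError e) {
            return Optional.empty();
        }
    }

    static boolean matchesArtifact(String artifactId, String fullScalaVersion) {
        return artifactId.endsWith("_" + toBinaryVersion(fullScalaVersion));
    }

    public static void main(String[] args) {
        Optional<String> scala = detect();
        if (scala.isPresent()) {
            System.out.println("Scala on classpath: " + scala.get());
            System.out.println("spark-core_2.10 matches: "
                    + matchesArtifact("spark-core_2.10", scala.get()));
        } else {
            System.out.println("No Scala library found on the classpath");
        }
    }
}
```

With the cluster's Scala 2.11.8 from the spark-shell banner, matchesArtifact returns false for spark-core_2.10 and true for spark-core_2.11.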