The problem is that every job fails with the following exception:
Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)[Ljava/lang/Object;
at ps.sparkapp.Classification$.main(Classification.scala:35)
at ps.sparkapp.Classification.main(Classification.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
This exception means that the task cannot find the method. I develop using the IntelliJ IDEA Community Edition. I have no problems compiling the package, and all dependencies are packaged correctly. Here is my build.sbt:
name := "SparkApp"
version := "1.0"
scalaVersion := "2.11.6"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.1.1"
libraryDependencies += "org.apache.spark" % "spark-mllib_2.11" % "2.1.1"
scala -version
Scala code runner version 2.11.6 -- Copyright 2002-2013, LAMP/EPFL
I found out that this error is somehow related to Scala, because it only happens when I use functionality that is native to Scala, e.g. a Scala for loop, .map or .drop(2).
The class and everything else is still written in Scala, but if I avoid functionality like .map or .drop(2), then everything works fine.
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.linalg.Vector

object Classification {

  def main(args: Array[String]) {
    ...
    //df.printSchema()
    var dataset = df.groupBy("user_id", "measurement_date").pivot("rank").min()
    val col = dataset.schema.fieldNames.drop(2) // <- here the error happens

    // take all features and put them into one vector
    val assembler = new VectorAssembler()
      .setInputCols(col)
      .setOutputCol("features")
    val data = assembler.transform(dataset)

    data.printSchema()
    data.show()

    sc.stop()
  }
}
As said, if I do not use .drop(2) everything works perfectly, but avoiding these methods is not an option, since that would be very painful.
I could not find any solution on the web; any ideas?
BTW: I can use these methods within the spark-shell, which I find strange.
Thanks in advance.
NOTE 1)
I use:
Spark version 2.1.1
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131)
Try adding the actual Scala libraries (scala-library etc.) as an explicit project dependency. E.g.:
libraryDependencies += "org.scala-lang" % "scala-library" % "2.11.6"
Related
I used sbt-assembly on a project where I have some Java 14 jars, and my local machine has JDK 8 as the default JDK.
The sbt assembly task was successful and produced a fat jar.
When I run it with JDK 8, I get the error:
Exception in thread "main" java.lang.UnsupportedClassVersionError: javafx/event/EventTarget has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to 52.0
JDK 11 (version 55.0) is the one I need. And sure enough, when I set JDK 11 on my shell, I can run the fat jar.
Is there a way to be explicit about the target JDK version in the build.sbt file?
Also, I'm surprised that even though I have Java 14 jars in the dependency, the application runs fine on JDK 11. Is it just an example of Java's supreme backwards compatibility in action? I would like to know what else could be at work.
This is what my build.sbt looks like:
name := "scalafx-app"
version := "0.1"
scalaVersion := "2.13.3"
scalacOptions += "-Ymacro-annotations"
useCoursier := false
assemblyMergeStrategy in assembly := {
  case "module-info.class" => MergeStrategy.concat
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}

lazy val scalaTest = "org.scalatest" %% "scalatest" % "3.1.1"

lazy val osName = System.getProperty("os.name") match {
  case n if n.startsWith("Linux") => "linux"
  case n if n.startsWith("Mac") => "mac"
  case n if n.startsWith("Windows") => "win"
  case _ => throw new Exception("Unknown platform!")
}
lazy val javaFXModules = Seq("base", "controls", "fxml", "graphics", "media", "web")
lazy val root = (project in file("."))
  .settings(
    libraryDependencies += scalaTest % Test,
    // scalafx
    libraryDependencies += "org.scalafx" %% "scalafx" % "14-R19",
    libraryDependencies ++= javaFXModules.map(m =>
      "org.openjfx" % s"javafx-$m" % "14.0.1" classifier(osName) withJavadoc()
    ),
    libraryDependencies += "org.scalafx" %% "scalafxml-core-sfx8" % "0.5",
    // javafx custom components
    libraryDependencies += "com.jfoenix" % "jfoenix" % "9.0.9",
    libraryDependencies += "org.kordamp.ikonli" % "ikonli-javafx" % "11.4.0",
    libraryDependencies += "org.kordamp.ikonli" % "ikonli-material-pack" % "11.4.0",
    // json parsing
    libraryDependencies += "com.typesafe.play" %% "play-json" % "2.9.0",
    libraryDependencies += "com.squareup.moshi" % "moshi" % "1.9.3",
    // logging
    libraryDependencies += "com.typesafe.scala-logging" %% "scala-logging" % "3.9.2",
    libraryDependencies += "ch.qos.logback" % "logback-classic" % "1.2.3",
  )
A JAR is essentially just a zip of class files; you can inspect each class with javap and look at the value of the "major version" field to see which JDK version it needs; see this.
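For example, something along these lines should print the class file version of a class inside the fat jar (the jar name is illustrative; the class is the one from the error message):

javap -verbose -cp scalafx-app-assembly-0.1.jar javafx.event.EventTarget | grep "major version"
# prints e.g. "major version: 55", which corresponds to Java 11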
If you want to ensure the classes are compiled to a specific Java version, you can use the release & target scalac options.
Like this:
// Ensure the compiler emits Java 8 bytecode.
scalacOptions ++= Seq("-release", "8", "-target:8")
release is used to specify which Java stdlib is used.
target is used to specify which bytecode version is emitted.
The eventual goal I want to achieve is to query my MongoDB collection through Spark SQL using Scala code as an independent application. I have successfully installed Spark on my local machine, which is running the Windows 10 operating system. I can run spark-shell, the Spark master node and a worker node, so from the looks of it Apache Spark is working fine on my PC.
I can also query my MongoDB collection by running the Scala code in the Spark shell.
Problem:
When I try to use the same code from my Scala project using the MongoDB Spark Connector for Scala, I run into an error which I am unable to figure out. It seems like an environment issue; I looked it up and many people suggested that it happens if you use Java 9 or a higher version. I am using Java 8, so that's not the issue in my case. That is why I have also posted my java -version snapshot in the post.
But when I run the code, I get the following error. It would be a great help if somebody could advise me in any direction.
Scala Code:
import org.apache.spark.sql.SparkSession
import org.apache.spark.{SparkConf, SparkContext}
import com.mongodb.spark.config._
import com.mongodb.spark._

object SparkSQLMongoDBConnector {

  def main(args: Array[String]): Unit = {
    var sc: SparkContext = null
    var conf = new SparkConf()
    conf.setAppName("MongoSparkConnectorIntro")
      .setMaster("local")
      .set("spark.hadoop.validateOutputSpecs", "false")
      .set("spark.mongodb.input.uri", "mongodb://127.0.0.1/metadatastore.metadata_collection?readPreference=primaryPreferred")
      .set("spark.mongodb.output.uri", "mongodb://127.0.0.1/metadatastore.metadata_collection?readPreference=primaryPreferred")

    sc = new SparkContext(conf)

    val spark = SparkSession.builder()
      .master("spark://192.168.137.221:7077")
      .appName("MongoSparkConnectorIntro")
      .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/metadatastore.metadata_collection?readPreference=primaryPreferred")
      .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/metadatastore.metadata_collection?readPreference=primaryPreferred")
      .getOrCreate()

    val readConfig = ReadConfig(Map("collection" -> "spark", "readPreference.name" -> "secondaryPreferred"), Some(ReadConfig(sc)))
    val customRdd = MongoSpark.load(sc, readConfig)

    println(customRdd.count)
    println(customRdd.first.toString())
  }
}
SBT:
scalaVersion := "2.12.8"
libraryDependencies += "org.mongodb.spark" %% "mongo-spark-connector" % "2.4.0"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.0"
Java Version: (screenshot of the java -version output)
Error:
This is the error that I face when I run the Scala code in IntelliJ.
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:116)
at org.apache.hadoop.security.Groups.<init>(Groups.java:93)
at org.apache.hadoop.security.Groups.<init>(Groups.java:73)
at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:293)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:283)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:260)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java)
at org.apache.spark.util.Utils$.$anonfun$getCurrentUserName$1(Utils.scala:2422)
at scala.Option.getOrElse(Option.scala:138)
at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2422)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:293)
at SparkSQLMongoDBConnector$.main(SparkSQLMongoDBConnector.scala:35)
at SparkSQLMongoDBConnector.main(SparkSQLMongoDBConnector.scala)
Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2
at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3410)
at java.base/java.lang.String.substring(String.java:1883)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:50)
... 16 more
Any help would be much appreciated.
Hadoop's Shell class checks your Java version via the java.version system property:
private static boolean IS_JAVA7_OR_ABOVE =
System.getProperty("java.version").substring(0, 3).compareTo("1.7") >= 0;
Make sure it is defined. Note that the "begin 0, end 3, length 2" in your stack trace means the java.version string was shorter than three characters, which is what you get on Java 9+ (e.g. "10" or "11"), not on a 1.8.x runtime.
This line was changed in Hadoop 2.7+, but by default Spark uses Hadoop 2.6.5.
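If that turns out to be the case, one possible workaround (just a sketch, assuming the sbt build from the question; 2.7.7 is only one example of a Hadoop 2.7+ release) is to pull a newer Hadoop client onto the classpath, where this check no longer breaks:

libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.7.7"

Running the application on an actual Java 8 runtime, whose java.version string (e.g. "1.8.0_131") is long enough for substring(0, 3), also avoids the crash.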
I am using Play 2.7.x with Java, and I am trying to parse MongoDB's ObjectId from the URL in the routes config file like this:
GET /tasks/:id/view controllers.TaskController.viewTask(id: org.bson.types.ObjectId)
I don't really need any MongoDB features except validating the ObjectId in the URL.
My build.sbt file is:
name := """bla-bla-core"""
organization := "com.bla"
maintainer := "bla#bla.com"
version := "1.0-SNAPSHOT"
lazy val root = (project in file(".")).enablePlugins(PlayJava)
scalaVersion := "2.12.8"
libraryDependencies ++= Seq(
guice,
ws,
ehcache,
filters,
"org.mongodb" % "mongo-java-driver" % "3.0.1",
)
I get the following compilation error:
Compilation error[No URL path binder found for type org.bson.types.ObjectId. Try to implement an implicit PathBindable for this type.]
Did anybody set up MongoDB's ObjectId parsing from the route in Play Framework before? I assumed it is quite a common issue and I would find the solution easily, but nothing I tried works :/
All the solutions were talking about some package called "se.radley" %% "play-plugins-salat" that was last maintained in 2016 :D
Try having it as a String and transforming it to an ObjectId in the controller.
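A minimal sketch of that approach (the controller and method names are taken from the route in the question; the body is illustrative):

GET /tasks/:id/view controllers.TaskController.viewTask(id: String)

public Result viewTask(String id) {
    // validate the id before touching the database
    if (!org.bson.types.ObjectId.isValid(id)) {
        return badRequest("Invalid task id: " + id);
    }
    org.bson.types.ObjectId objectId = new org.bson.types.ObjectId(id);
    // ... load and render the task using objectId ...
    return ok();
}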
I am using Play Framework 2.3.9, and my build.sbt is giving the error: eof expected but '}' found.
}
^
Below is my build.sbt file:
import sbt.Keys._
import sbt._

object ApplicationBuild extends Build {

  val appName = "ReliaCloud"
  val appVersion = "1.0-SNAPSHOT"

  //added this for 2.3
  lazy val root = (project in file(".")).enablePlugins(PlayJava)

  val appDependencies = Seq(
    // Add your project dependencies here,
    javaCore, jdbc, javaJdbc,
    "org.mongodb.morphia" % "morphia" % "1.0.1",
    "org.mongodb" % "mongo-java-driver" % "2.10.1",
    "postgresql" % "postgresql" % "9.1-901-1.jdbc4",
    "ws.securesocial" %% "securesocial" % "2.1.4"
  )

  val main = Project(appName, file(".")).enablePlugins(play.PlayJava).settings(
    resolvers += "Maven repository" at "http://morphia.googlecode.com/svn/mavenrepo/",
    resolvers += "MongoDb Java Driver Repository" at "http://repo1.maven.org/maven2/org/mongodb/mongo-java-driver/",
    resolvers += Resolver.sonatypeRepo("releases")
  )
}
I am not able to figure out why I am getting this error.
My application is built using Play Framework 2.2 and I am trying to migrate it to Play Framework 2.3.x.
Please look at the migration document here regarding plugin settings when you move from 2.2 to 2.3 with Play Framework's Java version.
A little note: if possible, move to the latest versions, currently 2.6 and soon to be 2.7. They are much more efficient and secure.
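As a rough pointer to where the parse error likely comes from: the snippet above is a project/Build.scala-style definition (object ... extends Build), which a plain .sbt file does not accept, hence the complaint about the closing }. In the 2.3 build.sbt style the same settings would look roughly like this (a sketch; names, dependencies and resolvers copied from the question):

name := "ReliaCloud"

version := "1.0-SNAPSHOT"

lazy val root = (project in file(".")).enablePlugins(PlayJava)

libraryDependencies ++= Seq(
  javaCore, jdbc, javaJdbc,
  "org.mongodb.morphia" % "morphia" % "1.0.1",
  "org.mongodb" % "mongo-java-driver" % "2.10.1",
  "postgresql" % "postgresql" % "9.1-901-1.jdbc4",
  "ws.securesocial" %% "securesocial" % "2.1.4"
)

resolvers += "Maven repository" at "http://morphia.googlecode.com/svn/mavenrepo/"

resolvers += "MongoDb Java Driver Repository" at "http://repo1.maven.org/maven2/org/mongodb/mongo-java-driver/"

resolvers += Resolver.sonatypeRepo("releases")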
I have a problem with Scala code that calls Java methods.
It is saying:
value getDepth is not a member of amqpManagment.utils.data.ChessObject
var depth: Int = chessObjects.getDepth()
^
However, I use getDepth in many other places in Java code and it works fine.
Also, after adding that code it kept working in IntelliJ for a few hours, which is weird, but maybe the project didn't rebuild itself after that change...
IntelliJ shows the code as okay, but during compilation it shows that error. Rebuilding from IntelliJ or the terminal doesn't help.
Scala code:
import amqpManagment.utils.data.ChessObject

object ChessScheduler {

  // DEPTH GAME
  def startGameWithDepthRule(chessObject: ChessObject): Integer = {
    ...
    val depth: Int = chessObjects.getDepth()
    ...
  }
}
Java Code:
import lombok.Getter;
import lombok.Setter;

@Getter
@Setter
public class ChessObject {
    private Integer depth;
    ...
}
build.sbt
import sbt.Keys._
import sbt.Level
name := "ChessEngineModuler"
logLevel := Level.Warn
version := "1.0"
scalaVersion := "2.12.2"
Thank you for your help.
Hello @Chenna Reddy :)
Thank you for your post, it seems it was a problem with Lombok indeed. However, after your answer I realised the problem was that the Scala code was compiled before the Java code.
I checked three solutions, since I had already added the dependency and turned annotation processing on.
The first solution is adding the getters and setters to the Java class by hand instead of via Lombok; however, that is an ugly solution.
The second solution is setting Files -> Settings -> Build, Execution, Deployment -> Compiler -> Scala Compiler -> Compile Order to "Java then Scala".
The third one is setting compileOrder := CompileOrder.JavaThenScala in build.sbt (see the sketch below).
I think the third is the best one if we want to deploy that code somewhere :)
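A minimal build.sbt sketch of that third option (the Provided scope is an assumption, since Lombok is only needed at compile time; the version matches the one suggested in the answer below):

// compile Java sources first so the Lombok-generated getters exist before scalac runs
compileOrder := CompileOrder.JavaThenScala

// Lombok on the compile classpath only
libraryDependencies += "org.projectlombok" % "lombok" % "1.16.16" % Provided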
It looks like you are using Lombok for auto-generation of getters. Please add the Lombok dependency:
libraryDependencies += "org.projectlombok" % "lombok" % "1.16.16"
The above step is not required if you are building the Java project separately and that project has Lombok as a compile-time dependency; in that case the generated jar file already has all the getters.
Regarding why IntelliJ sometimes shows the error, it's possible that you didn't enable annotation processing under Files -> Settings -> Build, Execution, Deployment -> Compiler -> Annotation Processors.