How can I run DataNucleus Bytecode Enhancer from SBT? - java

I've put together a proof of concept which aims to provide a skeleton SBT multimodule project which utilizes DataNucleus JDO Enhancer with mixed Java and Scala sources.
The difficulty appears when I try to enhance persistence classes from SBT. Apparently, I'm not passing the correct classpath when calling Fork.java.fork(...) from SBT.
See also this question:
How can SBT generate metamodel classes from model classes using DataNucleus?
Exception in thread "main" java.lang.NoClassDefFoundError: Could not initialize class org.datanucleus.util.Localiser
at org.datanucleus.metadata.MetaDataManagerImpl.loadPersistenceUnit(MetaDataManagerImpl.java:1104)
at org.datanucleus.enhancer.DataNucleusEnhancer.getFileMetadataForInput(DataNucleusEnhancer.java:768)
at org.datanucleus.enhancer.DataNucleusEnhancer.enhance(DataNucleusEnhancer.java:488)
at org.datanucleus.api.jdo.JDOEnhancer.enhance(JDOEnhancer.java:125)
at javax.jdo.Enhancer.run(Enhancer.java:196)
at javax.jdo.Enhancer.main(Enhancer.java:130)
[info] Compiling 2 Java sources to /home/rgomes/workspace/poc-scala-datanucleus/model/target/scala-2.11/klasses...
java.lang.IllegalStateException: errno = 1
at $54321831a5683ffa07b5$.runner(build.sbt:230)
at $54321831a5683ffa07b5$$anonfun$model$7.apply(build.sbt:259)
at $54321831a5683ffa07b5$$anonfun$model$7.apply(build.sbt:258)
at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:40)
at sbt.std.Transform$$anon$4.work(System.scala:63)
at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:226)
at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:226)
at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:17)
at sbt.Execute.work(Execute.scala:235)
at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:226)
at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:226)
at sbt.ConcurrentRestrictions$$anon$4$$anonfun$1.apply(ConcurrentRestrictions.scala:159)
at sbt.CompletionService$$anon$2.call(CompletionService.scala:28)
For the sake of completeness and information, below you can see a java command line generated by SBT which can be executed by hand on a separate window, for example. It just works fine.
$ java -cp /home/rgomes/.ivy2/cache/org.scala-lang/scala-library/jars/scala-library-2.11.6.jar:/home/rgomes/.ivy2/cache/com.google.code.gson/gson/jars/gson-2.3.1.jar:/home/rgomes/.ivy2/cache/javax.jdo/jdo-api/jars/jdo-api-3.0.jar:/home/rgomes/.ivy2/cache/javax.transaction/transaction-api/jars/transaction-api-1.1.jar:/home/rgomes/.ivy2/cache/org.datanucleus/datanucleus-core/jars/datanucleus-core-4.0.4.jar:/home/rgomes/.ivy2/cache/org.datanucleus/datanucleus-api-jdo/jars/datanucleus-api-jdo-4.0.4.jar:/home/rgomes/.ivy2/cache/org.datanucleus/datanucleus-jdo-query/jars/datanucleus-jdo-query-4.0.4.jar:/home/rgomes/.ivy2/cache/org.datanucleus/datanucleus-rdbms/jars/datanucleus-rdbms-4.0.4.jar:/home/rgomes/.ivy2/cache/com.h2database/h2/jars/h2-1.4.185.jar:/home/rgomes/.ivy2/cache/org.postgresql/postgresql/jars/postgresql-9.4-1200-jdbc41.jar:/home/rgomes/.ivy2/cache/com.github.dblock.waffle/waffle-jna/jars/waffle-jna-1.7.jar:/home/rgomes/.ivy2/cache/net.java.dev.jna/jna/jars/jna-4.1.0.jar:/home/rgomes/.ivy2/cache/net.java.dev.jna/jna-platform/jars/jna-platform-4.1.0.jar:/home/rgomes/.ivy2/cache/org.slf4j/slf4j-simple/jars/slf4j-simple-1.7.7.jar:/home/rgomes/.ivy2/cache/org.slf4j/slf4j-api/jars/slf4j-api-1.7.7.jar:/home/rgomes/workspace/poc-scala-datanucleus/model/src/main/resources:/home/rgomes/workspace/poc-scala-datanucleus/model/target/scala-2.11/klasses javax.jdo.Enhancer -v -pu persistence-h2 -d /home/rgomes/workspace/poc-scala-datanucleus/model/target/scala-2.11/classes
May 13, 2015 3:30:07 PM org.datanucleus.enhancer.ClassEnhancerImpl save
INFO: Writing class file "/home/rgomes/workspace/poc-scala-datanucleus/model/target/scala-2.11/classes/model/AbstractModel.class" with enhanced definition
May 13, 2015 3:30:07 PM org.datanucleus.enhancer.DataNucleusEnhancer addMessage
INFO: ENHANCED (Persistable) : model.AbstractModel
May 13, 2015 3:30:07 PM org.datanucleus.enhancer.ClassEnhancerImpl save
INFO: Writing class file "/home/rgomes/workspace/poc-scala-datanucleus/model/target/scala-2.11/classes/model/Identifier.class" with enhanced definition
May 13, 2015 3:30:07 PM org.datanucleus.enhancer.DataNucleusEnhancer addMessage
INFO: ENHANCED (Persistable) : model.Identifier
May 13, 2015 3:30:07 PM org.datanucleus.enhancer.DataNucleusEnhancer addMessage
INFO: DataNucleus Enhancer completed with success for 2 classes. Timings : input=112 ms, enhance=102 ms, total=214 ms. Consult the log for full details
Enhancer Processing -v.
Enhancer adding Persistence Unit persistence-h2.
Enhancer processing output directory /home/rgomes/workspace/poc-scala-datanucleus/model/target/scala-2.11/classes.
Enhancer found JDOEnhancer of class org.datanucleus.api.jdo.JDOEnhancer.
Enhancer property key:VendorName value:DataNucleus.
Enhancer property key:VersionNumber value:4.0.4.
Enhancer property key:API value:JDO.
Enhancer enhanced 2 classes.
Below you can see some debugging information which is passed to Fork.java.fork(...):
=============================================================
mainClass=javax.jdo.Enhancer
args=-v -pu persistence-h2 -d /home/rgomes/workspace/poc-scala-datanucleus/model/target/scala-2.11/classes
javaHome=None
cwd=/home/rgomes/workspace/poc-scala-datanucleus/model/target/scala-2.11/classes
runJVMOptions=
bootJars ---------------------------------------------
/home/rgomes/.ivy2/cache/org.scala-lang/scala-library/jars/scala-library-2.11.6.jar
/home/rgomes/.ivy2/cache/com.google.code.gson/gson/jars/gson-2.3.1.jar
/home/rgomes/.ivy2/cache/javax.jdo/jdo-api/jars/jdo-api-3.0.jar
/home/rgomes/.ivy2/cache/javax.transaction/transaction-api/jars/transaction-api-1.1.jar
/home/rgomes/.ivy2/cache/org.datanucleus/datanucleus-core/jars/datanucleus-core-4.0.4.jar
/home/rgomes/.ivy2/cache/org.datanucleus/datanucleus-api-jdo/jars/datanucleus-api-jdo-4.0.4.jar
/home/rgomes/.ivy2/cache/org.datanucleus/datanucleus-jdo-query/jars/datanucleus-jdo-query-4.0.4.jar
/home/rgomes/.ivy2/cache/org.datanucleus/datanucleus-rdbms/jars/datanucleus-rdbms-4.0.4.jar
/home/rgomes/.ivy2/cache/com.h2database/h2/jars/h2-1.4.185.jar
/home/rgomes/.ivy2/cache/org.postgresql/postgresql/jars/postgresql-9.4-1200-jdbc41.jar
/home/rgomes/.ivy2/cache/com.github.dblock.waffle/waffle-jna/jars/waffle-jna-1.7.jar
/home/rgomes/.ivy2/cache/net.java.dev.jna/jna/jars/jna-4.1.0.jar
/home/rgomes/.ivy2/cache/net.java.dev.jna/jna-platform/jars/jna-platform-4.1.0.jar
/home/rgomes/.ivy2/cache/org.slf4j/slf4j-simple/jars/slf4j-simple-1.7.7.jar
/home/rgomes/.ivy2/cache/org.slf4j/slf4j-api/jars/slf4j-api-1.7.7.jar
/home/rgomes/workspace/poc-scala-datanucleus/model/src/main/resources
/home/rgomes/workspace/poc-scala-datanucleus/model/target/scala-2.11/klasses
envVars ----------------------------------------------
=============================================================
The project is available in github for your convenience at
https://github.com/frgomes/poc-scala-datanucleus
Just download it and type
./sbt compile
Any help is immensely appreciated. Thanks

You can either use java.lang.ProcessBuilder or sbt.Fork.
See below a generic javaRunner you can add to your build.sbt which employs java.lang.ProcessBuilder.
See also a generic sbtRunner you can add to your build.sbt which employs sbt.Fork. Thanks to #dwijnand for providing insightful information for making sbtRunner work as expected.
def javaRunner(mainClass: String,
args: Seq[String],
classpath: Seq[File],
cwd: File,
javaHome: Option[File] = None,
runJVMOptions: Seq[String] = Nil,
envVars: Map[String, String] = Map.empty,
connectInput: Boolean = false,
outputStrategy: Option[OutputStrategy] = Some(StdoutOutput)): Seq[File] = {
val java_ : String = javaHome.fold("") { p => p.absolutePath + "/bin/" } + "java"
val jvm_ : Seq[String] = runJVMOptions.map(p => p.toString)
val cp_ : Seq[String] = classpath.map(p => p.absolutePath)
val env_ = envVars.map({ case (k,v) => s"${k}=${v}" })
val xcmd_ : Seq[String] = Seq(java_) ++ jvm_ ++ Seq("-cp", cp_.mkString(java.io.File.pathSeparator), mainClass) ++ args
println("=============================================================")
println(xcmd_.mkString(" "))
println("=============================================================")
println("")
IO.createDirectory(cwd)
import scala.collection.JavaConverters._
val cmd = xcmd_.asJava
val pb = new java.lang.ProcessBuilder(cmd)
pb.directory(cwd)
pb.inheritIO
val process = pb.start()
def cancel() = {
println("Run canceled.")
process.destroy()
1
}
val errno = try process.waitFor catch { case e: InterruptedException => cancel() }
if(errno==0) {
if (args.contains("-v")) cwd.list.foreach(f => println(f))
cwd.listFiles
} else {
throw new IllegalStateException(s"errno = ${errno}")
}
}
def sbtRunner(mainClass: String,
args: Seq[String],
classpath: Seq[File],
cwd: File,
javaHome: Option[File] = None,
runJVMOptions: Seq[String] = Nil,
envVars: Map[String, String] = Map.empty,
connectInput: Boolean = false,
outputStrategy: Option[OutputStrategy] = Some(StdoutOutput)): Seq[File] = {
val args_ = args.map(p => p.toString)
val java_ = javaHome.fold("None") { p => p.absolutePath }
val cp_ = classpath.map(p => p.absolutePath)
val jvm_ = runJVMOptions.map(p => p.toString) ++ Seq("-cp", cp_.mkString(java.io.File.pathSeparator))
val env_ = envVars.map({ case (k,v) => s"${k}=${v}" })
def dump: String =
s"""
|mainClass=${mainClass}
|args=${args_.mkString(" ")}
|javaHome=${java_}
|cwd=${cwd.absolutePath}
|runJVMOptions=${jvm_.mkString(" ")}
|classpath --------------------------------------------
|${cp_.mkString("\n")}
|envVars ----------------------------------------------
|${env_.mkString("\n")}
""".stripMargin
def cmd: String =
s"""java ${jvm_.mkString(" ")} ${mainClass} ${args_.mkString(" ")}"""
println("=============================================================")
println(dump)
println("=============================================================")
println(cmd)
println("=============================================================")
println("")
IO.createDirectory(cwd)
val options =
ForkOptions(
javaHome = javaHome,
outputStrategy = outputStrategy,
bootJars = Seq.empty,
workingDirectory = Option(cwd),
runJVMOptions = jvm_,
connectInput = connectInput,
envVars = envVars)
val process = new Fork("java", Option(mainClass)).fork(options, args)
def cancel() = {
println("Run canceled.")
process.destroy()
1
}
val errno = try process.exitValue() catch { case e: InterruptedException => cancel() }
if(errno==0) {
if (args.contains("-v")) cwd.list.foreach(f => println(f))
cwd.listFiles
} else {
throw new IllegalStateException(s"errno = ${errno}")
}
}
Then you need to wire DataNucleus Enhancer as part of your build process. This is done via manipulateBytecode sub-task, as demonstrated below:
lazy val model =
project.in(file("model"))
// .settings(publishSettings:_*)
.settings(librarySettings:_*)
.settings(paranoidOptions:_*)
.settings(otestFramework: _*)
.settings(deps_tagging:_*)
//-- .settings(deps_stream:_*)
.settings(deps_database:_*)
.settings(
Seq(
// This trick requires SBT 0.13.8
manipulateBytecode in Compile := {
val previous = (manipulateBytecode in Compile).value
sbtRunner( // javaRunner also works!
mainClass = "javax.jdo.Enhancer",
args =
Seq(
"-v",
"-pu", "persistence-h2",
"-d", (classDirectory in Compile).value.absolutePath),
classpath =
(managedClasspath in Compile).value.files ++
(unmanagedResourceDirectories in Compile).value :+
(classDirectory in Compile).value,
cwd = (classDirectory in Compile).value,
javaHome = javaHome.value,
envVars = (envVars in Compile).value
)
previous
}
):_*)
.dependsOn(util)
For a complete example, including a few JDO annotated persistence classes and some rudimentary test cases, please have a look at
http://github.com/frgomes/poc-scala-datanucleus

I think the issue is you're passing your dependency jars as boot jars not as the classpath.
From your poc project perhaps something like:
val jvm_ = runJVMOptions.map(p => p.toString) ++
Seq("-cp", cp_ mkString java.io.File.pathSeparator)

Related

sbt-assembly and Lucene "An SPI class of type org.apache.lucene.codecs.Codec with name 'Lucene94' does not exist.ยจ exception

OS: Ubuntu 22.10
java: openjdk version "19.0.1" 2022-10-18
scala: 2.13.10
Apache Lucene: 9.4.2
I took the Lucene documentation example and convert it to Scala program:
package test
import org.apache.lucene.analysis.standard.StandardAnalyzer
import org.apache.lucene.document.{Document, Field, TextField}
import org.apache.lucene.index.{DirectoryReader, IndexWriter, IndexWriterConfig}
import org.apache.lucene.queryparser.classic.QueryParser
import org.apache.lucene.search.{IndexSearcher, Query, ScoreDoc}
import org.apache.lucene.store.FSDirectory
import java.nio.file.{Files, Path}
object Test extends App {
val analyzer: StandardAnalyzer = new StandardAnalyzer()
val indexPath: Path = Files.createTempDirectory("tempIndex")
val directory: FSDirectory = FSDirectory.open(indexPath)
val config: IndexWriterConfig = new IndexWriterConfig(analyzer)
val iwriter: IndexWriter = new IndexWriter(directory, config)
val doc: Document = new Document()
val text: String = "This is the text to be indexed."
doc.add(new Field("fieldname", text, TextField.TYPE_STORED))
iwriter.addDocument(doc)
iwriter.close()
// Now search the index:
val ireader: DirectoryReader = DirectoryReader.open(directory)
val isearcher: IndexSearcher = new IndexSearcher(ireader)
// Parse a simple query that searches for "text":
val parser: QueryParser = new QueryParser("fieldname", analyzer)
val query: Query = parser.parse("text")
val hits: Array[ScoreDoc] = isearcher.search(query, 10).scoreDocs
assert (hits.length == 1)
// Iterate through the results:
for (i <- hits.indices) {
val hitDoc = isearcher.doc(hits(i).doc)
assert("This is the text to be indexed.".equals(hitDoc.get("fieldname")))
}
ireader.close()
directory.close()
println("The end!")
}
If I use the following sbt file:
ThisBuild / version := "0.1.0-SNAPSHOT"
ThisBuild / scalaVersion := "2.13.10"
lazy val root = (project in file("."))
.settings(
name := "Test"
)
val luceneVersion = "9.4.2"
libraryDependencies ++= Seq(
"org.apache.lucene" % "lucene-core" % luceneVersion,
"org.apache.lucene" % "lucene-queryparser" % luceneVersion
)
The compilation gives me the error:
[error] Deduplicate found different file contents in the following:
[error] Jar name = lucene-core-9.4.2.jar, jar org = org.apache.lucene, entry target = module-info.class
[error] Jar name = lucene-queries-9.4.2.jar, jar org = org.apache.lucene, entry target = module-info.class
[error] Jar name = lucene-queryparser-9.4.2.jar, jar org = org.apache.lucene, entry target = module-info.class
[error] Jar name = lucene-sandbox-9.4.2.jar, jar org = org.apache.lucene, entry target = module-info.class
So I included in the sbt file:
assembly / assemblyMergeStrategy := {
case PathList("META-INF", xs # _*) => MergeStrategy.discard
case _ => MergeStrategy.first
}
After that the compilation and execution of the program were ok:
sbt "runMain test.Test"
But if I want to create a fat jar file and execute it, I got the following exception:
plugins.sbt :
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.0")
java -cp target/scala-2.13/Test-assembly-0.1.0-SNAPSHOT.jar test.Test
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.lucene.codecs.Codec.getDefault(Codec.java:141)
at org.apache.lucene.index.LiveIndexWriterConfig.<init>(LiveIndexWriterConfig.java:128)
at org.apache.lucene.index.IndexWriterConfig.<init>(IndexWriterConfig.java:145)
at test.Test$.delayedEndpoint$test$Test$1(Test.scala:17)
at test.Test$delayedInit$body.apply(Test.scala:12)
at scala.Function0.apply$mcV$sp(Function0.scala:42)
at scala.Function0.apply$mcV$sp$(Function0.scala:42)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)
at scala.App.$anonfun$main$1(App.scala:98)
at scala.App.$anonfun$main$1$adapted(App.scala:98)
at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:575)
at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:573)
at scala.collection.AbstractIterable.foreach(Iterable.scala:933)
at scala.App.main(App.scala:98)
at scala.App.main$(App.scala:96)
at test.Test$.main(Test.scala:12)
at test.Test.main(Test.scala)
Caused by: java.lang.IllegalArgumentException: An SPI class of type org.apache.lucene.codecs.Codec with name 'Lucene94' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath. The current classpath supports the following names: []
at org.apache.lucene.util.NamedSPILoader.lookup(NamedSPILoader.java:113)
at org.apache.lucene.codecs.Codec$Holder.<clinit>(Codec.java:58)
... 17 more
So, what did I do wrong?
Thanks.
case PathList("META-INF", xs # _*) => MergeStrategy.discard means that you're ignoring all META-INF directories (the whole their content). This is dangerous. The dependencies lucene-core and lucene-sandbox have service files in their META-INF. You should be more selective in what you ignore. Try to ignore only Java 9+ files module-info.class
assembly / assemblyMergeStrategy := {
case x if x.endsWith("module-info.class") => MergeStrategy.discard
case x =>
val oldStrategy = (assembly / assemblyMergeStrategy).value
oldStrategy(x)
}
or at least unignore META-INF/services subdirectories
assembly / assemblyMergeStrategy := {
case PathList("META-INF", "services", xs # _*) => MergeStrategy.concat
case PathList("META-INF", xs # _*) => MergeStrategy.discard
case _ => MergeStrategy.first
}
Drools fat jar nullpointer KieServices
Run Drools Kie project from fat jar

Scala classNotFound on Any type

I'm trying to make a generic value type in my HashMap like so:
val aMap = ArrayBuffer[HashMap[String, Any]]()
aMap += HashMap()
aMap(0)("aKey") = "aStringVal"
aMap(0)("aKey2") = true // a bool value
aMap(0)("aKey3") = 23 // an int value
This works in my spark-shell but it gives me this ClassNotFoundException on scala.Any in my IntelliJ Project:
org.apache.spark.streaming.scheduler.JobScheduler logError - Error running job streaming job 1521859195000 ms.0
java.lang.ClassNotFoundException: scala.Any
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
I'm using Scala 2.11. Any ideas what could be causing this?
What this ended up being for me was creating a DataFrame with mixed data using .toDF
I had:
val baseDataFrame = Seq(
("value1", "one"),
("value2", 2),
("value3", 3)
).toDF("column1", "column2")
and this change fixed the issue:
val baseDataFrame = Seq(
("value1", "one"),
("value2", "2"),
("value3", "3")
).toDF("column1", "column2")

java.lang.NoClassDefFoundError: Could not initialize class when launching spark job via spark-submit in scala code

I have a code which looks like below
object ErrorTest {
case class APIResults(status:String, col_1:Long, col_2:Double, ...)
def funcA(rows:ArrayBuffer[Row])(implicit defaultFormats:DefaultFormats):ArrayBuffer[APIResults] = {
//call some API ang get results and return APIResults
...
}
// MARK: load properties
val props = loadProperties()
private def loadProperties(): Properties = {
val configFile = new File("config.properties")
val reader = new FileReader(configFile)
val props = new Properties()
props.load(reader)
props
}
def main(args: Array[String]): Unit = {
val prop_a = props.getProperty("prop_a")
val session = Context.initialSparkSession();
import session.implicits._
val initialSet = ArrayBuffer.empty[Row]
val addToSet = (s: ArrayBuffer[Row], v: Row) => (s += v)
val mergePartitionSets = (p1: ArrayBuffer[Row], p2: ArrayBuffer[Row]) => (p1 ++= p2)
val sql1 =
s"""
select * from tbl_a where ...
"""
session.sql(sql1)
.rdd.map{row => {implicit val formats = DefaultFormats; (row.getLong(6), row)}}
.aggregateByKey(initialSet)(addToSet,mergePartitionSets)
.repartition(40)
.map{case (rowNumber,rows) => {implicit val formats = DefaultFormats; funcA(rows)}}
.flatMap(x => x)
.toDF()
.write.mode(SaveMode.Overwrite).saveAsTable("tbl_b")
}
}
when I run it via spark-submit, it throws error Caused by: java.lang.NoClassDefFoundError: Could not initialize class staging_jobs.ErrorTest$. But if I move val props = loadProperties() into the first line of main method, then there's no error anymore. Could anyone give me a explanation on this phenomenon? Thanks a lot!
Caused by: java.lang.NoClassDefFoundError: Could not initialize class staging_jobs.ErrorTest$
at staging_jobs.ErrorTest$$anonfun$main$1.apply(ErrorTest.scala:208)
at staging_jobs.ErrorTest$$anonfun$main$1.apply(ErrorTest.scala:208)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:243)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:190)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:188)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1341)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:193)
... 8 more
I've met the same question as you. I defined a method convert outside main method. When I use it with dataframe.rdd.map{x => convert(x)} in main , NoClassDefFoundError:Could not initialize class Test$ happened.
But when I use a function object convertor, which is the same code with convert method, in main method, no error happened.
I used spark 2.1.0, scala 2.11, it seems like a bug in spark?
I guess the problem is that val props = loadProperties() defines a member for the outer class (of main). Then this member will be serialized (or run) on the executors, which do not have the save environment with the driver.

ClassNotFoundException while trying to use slick codegen

I am trying to use a custom codegen for the purpose of acquiring DateTime types from mysql instead of Timestamp. I just couldn't make the sbt task to run with the custom code generator.
class is located at /project-root/app/com/my/name
val conf = ConfigFactory.parseFile(new File("conf/application.conf")).resolve()
slick <<= slickCodeGenTask
lazy val slick = TaskKey[Seq[File]]("gen-tables")
lazy val slickCodeGenTask = (sourceManaged, dependencyClasspath in Compile, runner in Compile, streams) map { (dir, cp, r, s) =>
val outputDir = (dir / "slick").getPath
val url = conf.getString("slick.dbs.default.db.url")
val jdbcDriver = conf.getString("slick.dbs.default.db.driver")
val slickDriver = conf.getString("slick.dbs.default.driver").dropRight(1)
val pkg = "com.my.name"
val user = conf.getString("slick.dbs.default.db.user")
val password = conf.getString("slick.dbs.default.db.password")
toError(r.run(s"$pkg.CustomCodeGenerator", cp.files, Array(slickDriver, jdbcDriver, url, outputDir, pkg, user, password), s.log))
val fname = outputDir + s"/$pkg/Tables.scala"
Seq(file(fname))
}
it always gives the same exception below when i try to run sbt gen-tables
java.lang.ClassNotFoundException: com.my.name.CustomCodeGenerator
at java.lang.ClassLoader.findClass(ClassLoader.java:530)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sbt.classpath.ClasspathFilter.loadClass(ClassLoaders.scala:59)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at sbt.Run.getMainMethod(Run.scala:72)
at sbt.Run.run0(Run.scala:60)
at sbt.Run.sbt$Run$$execute$1(Run.scala:51)
at sbt.Run$$anonfun$run$1.apply$mcV$sp(Run.scala:55)
at sbt.Run$$anonfun$run$1.apply(Run.scala:55)
at sbt.Run$$anonfun$run$1.apply(Run.scala:55)
at sbt.Logger$$anon$4.apply(Logger.scala:84)
at sbt.TrapExit$App.run(TrapExit.scala:248)
at java.lang.Thread.run(Thread.java:745)
when i try some built in java classes or default slick codegen class just to experiment it founds the class
i tried changing the order of this task in the build.sbt class but didn't solved
Instead of
dependencyClasspath in Compile
use fullClasspath in Compile
see https://www.scala-sbt.org/1.x/docs/Howto-Classpaths.html

Scala code not compiling in SBT - Eclipse Maven build

I am trying to compile sample Spark scala file through sbt and have built maven project in Eclipse IDE
Image
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
object simpleSpark {
def main(args : Arrayt[String]){
val logfile = "C:\\spark-1.6.1-bin-hadoop2.6\spark-1.6.1-bin-hadoop2.6\README.md"
val conf = new SparkConf().setAppName("Simple Application").setMaster("local[2]").set("spark.executor.memory", "1g")
val sc = new SparkContext(conf)
val logData = sc.textFile(logFile, 2).cache()
val numHadoops = logData.filter(line => line.contains("Hadoop")).count()
val numSparks = logData.filer(line => line.contains("Spark")).count()
println("Lines with Hadoop: %s, Lines with Spark: %s".format(numHadoops, numHadoops))
}
}
The error says you have illegal start of expression here set("spark.executor.memory",) . Are you sure you set spark.executor.memory correctly in actual code ?
If yes , can you show what you wrote is .sbt file ?

Categories

Resources