PARSER - NoSuchFieldError while loading data from HDFS in Spark - Java

I am trying to run the following code. It looks mostly like a dependency issue to me.
Dataset<Row> ds = spark.read().parquet("hdfs://localhost:9000/test/arxiv.parquet");
I am getting the following error:
Exception in thread "main" java.io.IOException: com.google.protobuf.ServiceException: java.lang.NoSuchFieldError: PARSER
at org.apache.hadoop.ipc.ProtobufHelper.getRemoteException(ProtobufHelper.java:71)
I have added the Apache hadoop-common dependency.
Can someone point out the possible problem with the code?

First post, I don't know if I'm doing this right, but I had the same problem and kept adding dependencies until it was solved. I'll give you the list; I don't really know which ones are actually needed:
libraryDependencies += "org.apache.parquet" % "parquet-hadoop" % "1.11.0"
libraryDependencies += "org.apache.parquet" % "parquet-avro" % "1.11.0"
libraryDependencies += "org.apache.parquet" % "parquet-encoding" % "1.11.0"
libraryDependencies += "org.apache.parquet" % "parquet-column" % "1.11.0"
libraryDependencies += "org.apache.parquet" % "parquet-common" % "1.11.0"
libraryDependencies += "org.apache.parquet" %% "parquet-scala" % "1.11.0"
libraryDependencies += "org.apache.parquet" % "parquet-hadoop-bundle" % "1.11.0"
libraryDependencies += "org.apache.hadoop" % "hadoop-hdfs" % "3.3.1" % Test
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "3.3.1"

Following up on the answer given by MinoS03, the required dependencies are the following:
hadoop-common
hadoop-hdfs-client
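For reference, a minimal sbt sketch of those two dependencies (the version here is illustrative and should match the Hadoop version your Spark build expects):
// build.sbt -- keep the Hadoop version aligned with your cluster / Spark distribution
libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "3.3.1"
libraryDependencies += "org.apache.hadoop" % "hadoop-hdfs-client" % "3.3.1"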

Related

How do you specify/figure out which minimum JDK is required for running the fat jar?

I used sbt-assembly on a project where I have some Java 14 jars, and my local machine has JDK 8 as the default JDK.
The sbt assembly task was successful and produced a fat jar.
When I run it with JDK 8, I get the error:
Exception in thread "main" java.lang.UnsupportedClassVersionError: javafx/event/EventTarget has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to 52.0
JDK 11 (class file version 55.0) is the one I need. And sure enough, when I set JDK 11 in my shell, I can run the fat jar.
Is there a way to be explicit about the target JDK version in the build.sbt file?
Also, I'm surprised that even though I have Java 14 jars in the dependency, the application runs fine on JDK 11. Is it just an example of Java's supreme backwards compatibility in action? I would like to know what else could be at work.
This is what my build.sbt looks like:
name := "scalafx-app"
version := "0.1"
scalaVersion := "2.13.3"
scalacOptions += "-Ymacro-annotations"
useCoursier := false
assemblyMergeStrategy in assembly := {
  case "module-info.class" => MergeStrategy.concat
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
lazy val scalaTest = "org.scalatest" %% "scalatest" % "3.1.1"
lazy val osName = System.getProperty("os.name") match {
  case n if n.startsWith("Linux") => "linux"
  case n if n.startsWith("Mac") => "mac"
  case n if n.startsWith("Windows") => "win"
  case _ => throw new Exception("Unknown platform!")
}
lazy val javaFXModules = Seq("base", "controls", "fxml", "graphics", "media", "web")
lazy val root = (project in file("."))
  .settings(
    libraryDependencies += scalaTest % Test,
    // scalafx
    libraryDependencies += "org.scalafx" %% "scalafx" % "14-R19",
    libraryDependencies ++= javaFXModules.map(m =>
      "org.openjfx" % s"javafx-$m" % "14.0.1" classifier(osName) withJavadoc()
    ),
    libraryDependencies += "org.scalafx" %% "scalafxml-core-sfx8" % "0.5",
    // javafx custom components
    libraryDependencies += "com.jfoenix" % "jfoenix" % "9.0.9",
    libraryDependencies += "org.kordamp.ikonli" % "ikonli-javafx" % "11.4.0",
    libraryDependencies += "org.kordamp.ikonli" % "ikonli-material-pack" % "11.4.0",
    // json parsing
    libraryDependencies += "com.typesafe.play" %% "play-json" % "2.9.0",
    libraryDependencies += "com.squareup.moshi" % "moshi" % "1.9.3",
    // logging
    libraryDependencies += "com.typesafe.scala-logging" %% "scala-logging" % "3.9.2",
    libraryDependencies += "ch.qos.logback" % "logback-classic" % "1.2.3",
  )
A JAR is basically just a zip of classes; you can check each class with javap to see which JDK version it needs by looking at the value of the "major version" field; see this.
If you want to ensure the classes are compiled to a specific Java version, you can use the -release and -target scalac options, like this:
// Ensure the compiler emits Java 8 bytecode.
scalacOptions ++= Seq("-release", "8", "-target:8")
-release is used to specify which Java stdlib is used.
-target is used to specify which bytecode version is emitted.
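For a mixed Scala/Java project, a minimal build.sbt sketch along those lines (flag spellings assume a recent Scala 2.13 and can vary between versions; the javacOptions line is my own addition for plain Java sources):
// emit Java 8 bytecode and compile against the Java 8 stdlib
scalacOptions ++= Seq("-release", "8", "-target:8")
// keep any Java sources in line as well (these javac flags work on JDK 8 and later)
javacOptions ++= Seq("-source", "1.8", "-target", "1.8")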

How to fix scala.tools.nsc.typechecker.Contexts$Context.imports(Contexts.scala:232) in an sbt project?

The issue is with the below error:
[error] at scala.tools.nsc.typechecker.Typers$Typer.typedApply$1(Typers.scala:4580)
[error] at scala.tools.nsc.typechecker.Typers$Typer.typedInAnyMode$1(Typers.scala:5343)
[error] at scala.tools.nsc.typechecker.Typers$Typer.typed1(Typers.scala:5360)
[error] at scala.tools.nsc.typechecker.Typers$Typer.runTyper$1(Typers.scala:5396)
[error] (Compile / compileIncremental) java.lang.StackOverflowError
[error] Total time: 11 s, completed Apr 25, 2019 7:11:28 PM
I also tried to increase the JVM parameters:
javaOptions ++= Seq("-Xms512M", "-Xmx4048M", "-XX:MaxPermSize=4048M", "-XX:+CMSClassUnloadingEnabled")
but it didn't help. All the dependencies seem to resolve properly, but this error has me stuck.
build.properties
sbt.version=1.2.8
plugin.sbt
addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "5.2.4")
addSbtPlugin("org.scoverage" % "sbt-scoverage" % "1.5.1")
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.9")
And the build.sbt
name := "ProjectNew"
version := "4.0"
scalaVersion := "2.11.8"
fork := true
libraryDependencies ++= Seq(
  "org.scalaz" %% "scalaz-core" % "7.1.0" % "test",
  ("org.apache.spark" %% "spark-core" % "2.1.0.cloudera1").
    exclude("org.mortbay.jetty", "servlet-api").
    exclude("commons-beanutils", "commons-beanutils-core").
    //exclude("commons-collections", "commons-collections").
    exclude("com.esotericsoftware.minlog", "minlog").
    //exclude("org.apache.hadooop","hadoop-client").
    exclude("commons-logging", "commons-logging") % "provided",
  ("org.apache.spark" %% "spark-sql" % "2.1.0.cloudera1")
    .exclude("com.esotericsoftware.minlog","minlog")
    //.exclude("org.apache.hadoop","hadoop-client")
    % "provided",
  ("org.apache.spark" %% "spark-hive" % "2.1.0.cloudera1")
    .exclude("com.esotericsoftware.minlog","minlog")
    //.exclude("org.apache.hadoop","hadoop-client")
    % "provided",
  "spark.jobserver" % "job-server-api" % "0.4.0",
  "org.scalatest" %% "scalatest" % "2.2.4" % "test",
  "com.github.nscala-time" %% "nscala-time" % "1.6.0"
)
//libraryDependencies ++= Seq(
// "org.apache.spark" %% "spark-core" % "1.5.0-cdh5.5.0" % "provided",
// "org.apache.spark" %% "spark-sql" % "1.5.0-cdh5.5.0" % "provided",
// "org.scalatest"%"scalatest_2.10" % "2.2.4" % "test",
// "com.github.nscala-time" %% "nscala-time" % "1.6.0"
// )
resolvers ++= Seq(
  "cloudera" at "http://repository.cloudera.com/artifactory/cloudera-repos/",
  "Job Server Bintray" at "http://dl.bintray.com/spark-jobserver/maven"
)
scalacOptions ++= Seq("-unchecked", "-deprecation")
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
parallelExecution in Test := false
fork in Test := true
javaOptions ++= Seq("-Xms512M", "-Xmx4048M", "-XX:MaxPermSize=4048M", "-XX:+CMSClassUnloadingEnabled")
It was a memory issue; the javaOptions setting in build.sbt only affects forked run/test JVMs, while scalac runs inside sbt's own JVM, so the flags have to be given to sbt itself (and for a StackOverflowError the stack size -Xss is likely the one that matters).
I referred to the following answer: Answer.
In my C:\Program Files (x86)\sbt\conf\sbtconfig file, I added/increased the parameters below for memory.
-Xmx2G
-XX:MaxPermSize=1000m
-XX:ReservedCodeCacheSize=1000m
-Xss8M
After that, running sbt package worked seamlessly and compilation succeeded.
Thank you all.

spark-submit dependency resolution for spark-csv

I am writing a small Scala program which converts CSV to Parquet.
I am using Databricks spark-csv.
Here is my build.sbt:
name := "tst"
version := "1.0"
scalaVersion := "2.10.5"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.1" % "provided",
  "org.apache.spark" %% "spark-sql" % "1.6.1",
  "com.databricks" % "spark-csv_2.10" % "1.5.0",
  "org.apache.spark" %% "spark-hive" % "1.6.1",
  "org.apache.commons" % "commons-csv" % "1.1",
  "com.univocity" % "univocity-parsers" % "1.5.1",
  "org.slf4j" % "slf4j-api" % "1.7.5" % "provided",
  "org.scalatest" %% "scalatest" % "2.2.1" % "test",
  "com.novocode" % "junit-interface" % "0.9" % "test",
  "com.typesafe.akka" % "akka-actor_2.10" % "2.3.11",
  "org.scalatest" %% "scalatest" % "2.2.1",
  "com.holdenkarau" %% "spark-testing-base" % "1.6.1_0.3.3",
  "com.databricks" % "spark-csv_2.10" % "1.5.0",
  "org.joda" % "joda-convert" % "1.8.1"
)
After sbt package, when I run the command
spark-submit --master local[*] target/scala-2.10/tst_2.10-1.0.jar
I get the following error:
Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.csv. Please find packages at http://spark-packages.org
I can see the com.databricks_spark-csv_2.10-1.5.0.jar file in ~/.ivy2/jars/, downloaded by the sbt package command.
The source code of dataconversion.scala:
import org.apache.spark.sql.SQLContext
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
object dataconversion {
  def main(args: Array[String]) {
    val conf =
      new SparkConf()
        .setAppName("ClusterScore")
        .set("spark.storage.memoryFraction", "1")
    val sc = new SparkContext(conf)
    val sqlc = new SQLContext(sc)
    val df = sqlc.read
      .format("com.databricks.spark.csv")
      .option("header", "true") // Use first line of all files as header
      .option("inferSchema", "true") // Automatically infer data types
      .load("/tmp/cars.csv")
    df.printSchema() // printSchema prints directly; wrapping it in println only prints "()"
  }
}
I can run spark-submit without error if I specify the --jars option with an explicit jar path, but that's not ideal. Please suggest.
Use the sbt-assembly plugin to build a "fat jar" containing all your dependencies with sbt assembly, and then call spark-submit on that.
In general, when you get a ClassNotFoundException, try inspecting the jar you created to see what's actually in it with jar tvf target/scala-2.10/tst_2.10-1.0.jar. Checking what's in the Ivy cache is meaningless; that just tells you that sbt found it. As mathematicians say, that's necessary but not sufficient.
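For illustration, a minimal sketch of that setup on top of the build above (plugin version and jar names are only examples):
// project/plugins.sbt -- pull in sbt-assembly (version is illustrative)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.9")
// build.sbt -- mark Spark itself as "provided" (spark-submit supplies it at runtime),
// so the fat jar only bundles extras such as spark-csv
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.1" % "provided",
  "org.apache.spark" %% "spark-sql" % "1.6.1" % "provided",
  "com.databricks" % "spark-csv_2.10" % "1.5.0"
)
// then run sbt assembly and spark-submit the assembly jar from target/scala-2.10/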
The mentioned library is required at runtime, so you have a few options:
Place com.databricks_spark-csv_2.10-1.5.0.jar in a local or HDFS-reachable path and provide it as a dependency with the --jars parameter.
Use --packages com.databricks:spark-csv_2.10:1.5.0, which will provide the required lib to your process.
Build a fat jar with your dependencies and forget about --jars.

Unable to import com.google.firebase.FirebaseApplication in a Play Framework application

I am trying to add Firebase to my Play Framework project. I followed this link:
https://medium.com/@RICEaaron/scala-firebase-da433df93bd2#.m1fwlvc8l
I have completed the following steps:
created a project in the Firebase developer console
generated a private server key and downloaded the JSON file
added the Firebase server SDK dependency in build.sbt
This is my build.sbt code:
name := """NeutrinoRPM"""
version := "1.0-SNAPSHOT"
lazy val root = (project in file(".")).enablePlugins(PlayJava)
scalaVersion := "2.11.1"
resolvers += Resolver.sonatypeRepo("snapshots")
libraryDependencies ++= Seq(
  javaJdbc,
  cache,
  javaWs,
  javaCore,
  "ws.securesocial" %% "securesocial" % "3.0-M3",
  "org.julienrf" %% "play-jsmessages" % "1.6.2",
  javaJpa.exclude("org.hibernate.javax.persistence", "hibernate-jpa-2.0-api"),
  "org.hibernate" % "hibernate-entitymanager" % "4.3.4.Final",
  "mysql" % "mysql-connector-java" % "5.1.9",
  "com.typesafe.play" %% "play-mailer" % "2.4.0",
  "com.nimbusds" % "nimbus-jose-jwt" % "3.8.2",
  "com.wordnik" %% "swagger-play2" % "1.3.12",
  "org.webjars" % "swagger-ui" % "2.1.8-M1",
  "com.google.api-client" % "google-api-client" % "1.21.0",
  "com.google.apis" % "google-api-services-analytics" % "v3-rev127-1.21.0",
  "com.google.code.gson" % "gson" % "2.6.2",
  "com.google.http-client" % "google-http-client-gson" % "1.21.0",
  "org.apache.pdfbox" % "pdfbox" % "2.0.1",
  "com.google.firebase" % "firebase-server-sdk" % "3.0.1"
)
Now I am trying to initialize the Firebase server SDK with this code snippet:
FileInputStream serviceAccount = new FileInputStream("path/to/serviceAccountKey.json");
FirebaseOptions options = new FirebaseOptions.Builder()
    .setCredential(FirebaseCredentials.fromCertificate(serviceAccount))
    .setDatabaseUrl("https://<DATABASE_NAME>.firebaseio.com/")
    .build();
FirebaseApp.initializeApp(options);
But when I try to import
com.google.firebase.FirebaseApplication
com.google.firebase.FirebaseOptions
com.google.firebase.database
I get this error: The import com.google.firebase.FirebaseApplication cannot be resolved
I have spent many hours searching Google for a solution to my problem but found no help. Please help me.
Your dependency on the Firebase server SDK is old:
"com.google.firebase" % "firebase-server-sdk" % "3.0.1"
For new Firebase projects created through firebase.google.com, you should be using the Firebase Admin SDK when running in the JVM. The maven dependency is com.google.firebase:firebase-admin:4.1.0.
There is no FirebaseApplication in that SDK - perhaps you are instead looking for FirebaseApp?
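For reference, the same coordinate expressed as an sbt dependency (a sketch; 4.1.0 is just the version named above, and newer releases may exist):
// build.sbt -- replace the old firebase-server-sdk line with the Admin SDK
libraryDependencies += "com.google.firebase" % "firebase-admin" % "4.1.0"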

package org.apache.poi.hwpf does not exist

I am trying to compile some code to test out, and I am receiving some errors:
package:
org.apache.poi.hwpf.usermodel
org.apache.poi.hwpf.extractor
org.apache.poi.hwpf
does not exist
Does anyone know where I can find these packages?
It's to compile a simple piece of code that should just allow the conversion of a docx file to a PDF file.
The Word 2007 XML formats are in xwpf, as hwpf is for the older versions of Word; for example, usermodel is org.apache.poi.xwpf.usermodel.
The jar for this is under poi-ooxml, and currently there is a copy on the Maven repo1 at: http://repo1.maven.org/maven2/org/apache/poi/poi-ooxml/3.9/
I'm using SBT with these dependencies:
// Add multiple dependencies
libraryDependencies ++= Seq(
  "org.apache.poi" % "poi" % "3.9" % "compile->default",
  "org.apache.poi" % "poi-ooxml" % "3.9" % "compile->default",
  "org.apache.poi" % "poi-ooxml-schemas" % "3.9" % "compile->default",
  "org.mortbay.jetty" % "jetty" % "6.1.22" % "test->default",
  "junit" % "junit" % "4.5" % "test->default",
  "org.scalatest" %% "scalatest" % "1.6.1" % "test->default"
)
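In case it helps, a minimal Scala sketch of reading a .docx with the xwpf classes (the object name and file path are illustrative, and producing a PDF still needs a separate library on top of this):
import java.io.FileInputStream
import org.apache.poi.xwpf.usermodel.XWPFDocument
import org.apache.poi.xwpf.extractor.XWPFWordExtractor

object DocxText {
  def main(args: Array[String]): Unit = {
    // Open the .docx (illustrative path) and print its plain-text content
    val in = new FileInputStream("/tmp/sample.docx")
    try {
      val doc = new XWPFDocument(in)
      val extractor = new XWPFWordExtractor(doc)
      println(extractor.getText)
    } finally in.close()
  }
}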
