I am currently packaging the code below as a JAR to register a permanent UDF on a Databricks cluster, but I am hitting a NoClassDefFoundError even though I added the required library dependencies when building the JAR with SBT. Source code: https://databricks.com/notebooks/enforcing-column-level-encryption.html
I used the following in build.sbt:
scalaVersion := "2.13.4"
libraryDependencies += "org.apache.hive" % "hive-exec" % "0.13.1"
libraryDependencies += "com.macasaet.fernet" % "fernet-java8" % "1.5.0"
Please point me to the right libraries if anything above is wrong. Kindly help me with this.
import java.time.{Duration, Instant}

import com.macasaet.fernet.{Key, StringValidator, Token}
import org.apache.hadoop.hive.ql.exec.UDF

class Validator extends StringValidator {
  override def getTimeToLive(): java.time.temporal.TemporalAmount =
    Duration.ofSeconds(Instant.MAX.getEpochSecond())
}

class udfDecrypt extends UDF {
  def evaluate(inputVal: String, sparkKey: String): String = {
    if (inputVal != null && inputVal != "") {
      val keys: Key = new Key(sparkKey)
      val token = Token.fromString(inputVal)
      val validator = new Validator() {}
      val payload = token.validateAndDecrypt(keys, validator)
      payload
    } else inputVal
  }
}
Make sure the fernet-java8 library is installed on your cluster.
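As an aside on the NoClassDefFoundError itself, two things are worth double-checking: (a) that the JAR attached to the cluster actually contains the Fernet classes (for example by building a fat JAR with sbt-assembly, or by attaching fernet-java8 to the cluster as a separate library), and (b) that scalaVersion in build.sbt matches the Scala version of the Databricks runtime (2.12 on recent runtimes), since a 2.13 build will not load on a 2.12 cluster. Once the JAR is attached, a permanent function can be registered with standard Spark SQL DDL; the sketch below is illustrative, and the function name and DBFS path are placeholders:

// Run in a notebook cell where `spark` is in scope; 'udfDecrypt' is the
// (package-less) class from the code above, the rest are placeholders.
spark.sql("""
  CREATE FUNCTION default.udf_decrypt
  AS 'udfDecrypt'
  USING JAR 'dbfs:/FileStore/jars/<your-udf-assembly>.jar'
""")

// Example call (column name and key are placeholders):
// SELECT default.udf_decrypt(encrypted_col, '<fernet-key>') FROM some_table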
This topic is related to: "Databricks SCALA UDF cannot load class when registering function".
I went further and installed the JAR file on the cluster via the Libraries section of the cluster config, rather than dropping it directly into DBFS as the user guide describes; I then hit the "validator not found" issue, and that question routed me here.
I also added the Maven coordinates to the Libraries config, but the cluster failed to install the library, with this error:
Library resolution failed because unresolved dependency: com.macasaet.fernet:fernet-java8:1.5.0: not found
(screenshot: Databricks cluster libraries)
Have you experienced this?
Related
In my Akka project I get the following error while trying to build, even though I have downloaded all of the libraries:
Uncaught error from thread [PersistentActors-akka.persistence.dispatchers.default-plugin-dispatcher-5]: Could not load library. Reasons: [no leveldbjni64-1.8 in java.library.path, no leveldbjni-1.8 in java.library.path, no leveldbjni in java.library.path, /private/var/folders/52/md0sb9l50k3b8rxw4sclh7kh0000gn/T/libleveldbjni-64-1-498021637280856383.8: dlopen(/private/var/folders/52/md0sb9l50k3b8rxw4sclh7kh0000gn/T/libleveldbjni-64-1-498021637280856383.8, 1): no suitable image found. Did find:
/private/var/folders/52/md0sb9l50k3b8rxw4sclh7kh0000gn/T/libleveldbjni-64-1-498021637280856383.8: no matching architecture in universal wrapper
Could you help me? On Windows I fixed this problem by installing the Microsoft Visual C++ 2010 Redistributable Package, but on macOS I am out of ideas.
My code:
import akka.actor.{ActorLogging, ActorSystem, Props}
import akka.persistence.PersistentActor

class SimplePersistentActor extends PersistentActor with ActorLogging {
  override def persistenceId: String = "simple-persistence"

  override def receiveCommand: Receive = {
    case message => log.info(s"Received: $message")
  }

  override def receiveRecover: Receive = {
    case event => log.info(s"Recovered: $event")
  }
}

val system = ActorSystem("Playground")
val simpleActor = system.actorOf(Props[SimplePersistentActor], "simplePersistentActor")
simpleActor ! "I love Akka!"
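For context, the "no leveldbjni in java.library.path / no matching architecture" failure comes from the native JNI driver that the default LevelDB journal tries to load at start-up. Below is a minimal build.sbt sketch of the two journal artifacts the Akka docs mention, with the pure-Java fallback noted in a comment; the exact versions are assumptions, so check them against your Akka release:

// build.sbt -- LevelDB journal dependencies (versions are illustrative)
libraryDependencies ++= Seq(
  "org.fusesource.leveldbjni" % "leveldbjni-all" % "1.8", // native driver (the one failing to dlopen here)
  "org.iq80.leveldb"          % "leveldb"        % "0.7"  // pure-Java port, no native library required
)

// In application.conf, switching the journal to the Java port side-steps the
// native dlopen step entirely:
//   akka.persistence.journal.leveldb.native = off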
I have the following setup.
This is the dependency in my POM:
<dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>3.14.0</version>
</dependency>
I have a very simple proto file:
syntax = "proto3";
package com.ziath.genericdecoderserver;
option java_outer_classname = "DecodePackage";
message DecodeData {
  string template = 1;
  bytes image = 2;
  int32 startColumn = 3;
  int32 endColumn = 4;
}
I'm generating the Java code from the proto file using protoc version 3.14.0, the binary for win64:
PS C:\Users\neilb\Documents\GitHub\GenericDecoderServer\src\main\protobuf\bin> .\protoc.exe --version
libprotoc 3.14.0
This matches the Maven dependency I'm pulling in. However, the generated file has errors on the @Override annotations:
@java.lang.Override
public com.ziath.genericdecoderserver.DecodePackage.DecodeData buildPartial() {
  com.ziath.genericdecoderserver.DecodePackage.DecodeData result =
      new com.ziath.genericdecoderserver.DecodePackage.DecodeData(this);
  result.template_ = template_;
  result.image_ = image_;
  result.startColumn_ = startColumn_;
  result.endColumn_ = endColumn_;
  onBuilt();
  return result;
}
The reported error is:
The method buildPartial() of type DecodePackage.DecodeData.Builder must override a superclass method
So this method is in the Builder class which is defined as:
public static final class Builder extends
    com.google.protobuf.GeneratedMessageV3.Builder<Builder> implements
    // @@protoc_insertion_point(builder_implements:com.ziath.genericdecoderserver.DecodeData)
    com.ziath.genericdecoderserver.DecodePackage.DecodeDataOrBuilder {
Eclipse is correct: the method buildPartial is not in either of the interfaces protobuf is referencing, so it looks like a version mismatch, but the versions are the same. There are scores of errors along the same lines in this generated code. Does anybody know what the problem is, or has anyone seen this before? My searches turn up nothing.
Thanks.
Cheers,
Neil
Solved it! The project was created using Spring Initializr, and for some reason that set the Java compliance level to 1.5 in Eclipse. Source level 1.5 does not allow @Override on implementations of interface methods.
The eventual goal I want to achieve is to query my MongoDB collection through Spark SQL using Scala code, as a standalone application. I have successfully installed Spark on my local machine, which runs Windows 10. I can run spark-shell and the Spark master and worker nodes, so by the looks of it Apache Spark is working fine on my PC.
I can also query my MongoDB collection by running the scala code in Spark Shell.
Problem:
When I try to use the same code from my Scala project, using the MongoDB Spark Connector for Scala, I run into an error that I am unable to figure out. It seems like an environment issue; I looked it up and many people suggested it happens if you use Java 9 or a higher version. I am using Java 8, so that's not the issue in my case, which is why I have also posted my java -version snapshot.
But when I run the code, I get the following error. It would be a great help if somebody could point me in any direction.
Scala Code:
import org.apache.spark.sql.SparkSession
import org.apache.spark.{SparkConf, SparkContext}
import com.mongodb.spark.config._
import com.mongodb.spark._

object SparkSQLMongoDBConnector {
  def main(args: Array[String]): Unit = {
    var sc: SparkContext = null
    var conf = new SparkConf()
    conf.setAppName("MongoSparkConnectorIntro")
      .setMaster("local")
      .set("spark.hadoop.validateOutputSpecs", "false")
      .set("spark.mongodb.input.uri", "mongodb://127.0.0.1/metadatastore.metadata_collection?readPreference=primaryPreferred")
      .set("spark.mongodb.output.uri", "mongodb://127.0.0.1/metadatastore.metadata_collection?readPreference=primaryPreferred")

    sc = new SparkContext(conf)

    val spark = SparkSession.builder()
      .master("spark://192.168.137.221:7077")
      .appName("MongoSparkConnectorIntro")
      .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/metadatastore.metadata_collection?readPreference=primaryPreferred")
      .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/metadatastore.metadata_collection?readPreference=primaryPreferred")
      .getOrCreate()

    val readConfig = ReadConfig(Map("collection" -> "spark", "readPreference.name" -> "secondaryPreferred"), Some(ReadConfig(sc)))
    val customRdd = MongoSpark.load(sc, readConfig)

    println(customRdd.count)
    println(customRdd.first.toString())
  }
}
SBT:
scalaVersion := "2.12.8"
libraryDependencies += "org.mongodb.spark" %% "mongo-spark-connector" % "2.4.0"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.0"
Java version: (screenshot of the java -version output, showing Java 8)
Error:
This is the error that I get when I run the Scala code in IntelliJ:
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:116)
at org.apache.hadoop.security.Groups.<init>(Groups.java:93)
at org.apache.hadoop.security.Groups.<init>(Groups.java:73)
at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:293)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:283)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:260)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java)
at org.apache.spark.util.Utils$.$anonfun$getCurrentUserName$1(Utils.scala:2422)
at scala.Option.getOrElse(Option.scala:138)
at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2422)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:293)
at SparkSQLMongoDBConnector$.main(SparkSQLMongoDBConnector.scala:35)
at SparkSQLMongoDBConnector.main(SparkSQLMongoDBConnector.scala)
Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2
at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3410)
at java.base/java.lang.String.substring(String.java:1883)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:50)
... 16 more
Any help would be much appreciated.
Hadoop's Shell class checks your Java version via the java.version system property:
private static boolean IS_JAVA7_OR_ABOVE =
System.getProperty("java.version").substring(0, 3).compareTo("1.7") >= 0;
Make sure it is defined.
This line was changed in Hadoop 2.7+, but by default, Spark uses 2.6.5.
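To make the failure mode concrete, here is a small, self-contained reproduction of the check quoted above; the version strings are illustrative, but the "length 2" in the exception means the JVM that actually ran the program reported a two-character java.version (Java 10/11 style), so the JDK IntelliJ launches with is worth double-checking even if java -version on the command line prints 1.8:

import scala.util.Try

object JavaVersionCheckDemo extends App {
  // Hadoop 2.6.x's Shell does roughly this in a static initialiser, which is
  // why the failure surfaces as an ExceptionInInitializerError:
  //   System.getProperty("java.version").substring(0, 3).compareTo("1.7") >= 0
  def isJava7OrAbove(versionString: String): Try[Boolean] =
    Try(versionString.substring(0, 3).compareTo("1.7") >= 0)

  println(isJava7OrAbove("1.8.0_202")) // Success(true)  -- what a Java 8 JVM reports
  println(isJava7OrAbove("10"))        // Failure(... begin 0, end 3, length 2)
}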
I'm writing a custom Spark structured streaming source (using v2 interfaces and Spark 2.3.0) in Java/Scala.
When testing the integration with the Spark offsets/checkpoint, I get the following error:
18/06/20 11:58:49 ERROR MicroBatchExecution: Query [id = 58ec2604-3b04-4912-9ba8-c757d930ac05, runId = 5458caee-6ef7-4864-9968-9cb843075458] terminated with error
java.lang.ClassCastException: org.apache.spark.sql.execution.streaming.SerializedOffset cannot be cast to org.apache.spark.sql.sources.v2.reader.streaming.Offset
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1$$anonfun$apply$9.apply(MicroBatchExecution.scala:405)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1$$anonfun$apply$9.apply(MicroBatchExecution.scala:390)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at org.apache.spark.sql.execution.streaming.StreamProgress.foreach(StreamProgress.scala:25)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at org.apache.spark.sql.execution.streaming.StreamProgress.flatMap(StreamProgress.scala:25)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply(MicroBatchExecution.scala:390)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply(MicroBatchExecution.scala:390)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:271)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:389)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:133)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:121)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:121)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:271)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:121)
at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:117)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:279)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:189)
This is my Offset implementation (a simplified version; I removed the JSON (de)serialization):
package mypackage
import org.apache.spark.sql.execution.streaming.SerializedOffset
import org.apache.spark.sql.sources.v2.reader.streaming.Offset
case class MyOffset(offset: Long) extends Offset {
  override val json = s"""{"offset":$offset}"""
}

private object MyOffset {
  def apply(offset: SerializedOffset): MyOffset = new MyOffset(0L)
}
Do you have any advice on how to solve this problem?
Check that the Spark version of your client app is exactly the same as the Spark version of your cluster. I used Spark 2.4.0 dependencies in my Spark job application, but the cluster had Spark engine 2.3.0. When I downgraded the dependencies to 2.3.0, the error went away.
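Separately from the version alignment above, it may help to know that Spark 2.3's MicroBatchReader interface exposes a deserializeOffset hook for turning the checkpointed JSON back into the source's own Offset type, which is exactly the boundary where SerializedOffset appears. A minimal sketch follows; the trait name is hypothetical, and the crude string parse stands in for the JSON handling the question omitted:

package mypackage

import org.apache.spark.sql.sources.v2.reader.streaming.{MicroBatchReader, Offset}

// Hypothetical mix-in for the custom reader; MyOffset is the case class above.
trait MyOffsetDeserialization extends MicroBatchReader {
  override def deserializeOffset(json: String): Offset =
    MyOffset(json.replaceAll("[^0-9]", "").toLong) // crude parse of {"offset":N}
}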
I have a problem with Scala code calling Java methods. The compiler reports:
value getDepth is not a member of amqpManagment.utils.data.ChessObject
var depth: Int = chessObjects.getDepth()
^
However, I use getDepth in many other places in the Java code and it works fine.
Also, after adding that code it worked in IntelliJ for a few hours, which is weird, but maybe the project didn't rebuild itself after the change.
IntelliJ shows the code as okay, but compilation produces that error, and rebuilding from IntelliJ or the terminal doesn't help.
Scala code:
import amqpManagment.utils.data.ChessObject

object ChessScheduler {
  // DEPTH GAME
  def startGameWithDepthRule(chessObject: ChessObject): Integer = {
    ...
    val depth: Int = chessObjects.getDepth()
    ...
  }
}
Java Code:
@Getter
@Setter
public class ChessObject {
    private Integer depth;
    ...
}
build.sbt
import sbt.Keys._
import sbt.Level
name := "ChessEngineModuler"
logLevel := Level.Warn
version := "1.0"
scalaVersion := "2.12.2"
Thank you for your help.
Hello @Chenna Reddy :)
Thank you for your post. It was indeed a problem with Lombok; after your answer I realised it happened because the Scala code was compiled before the Java code.
I checked three solutions, since I had already added the dependency and turned annotation processing on.
The first solution is to add the getters and setters to the Java class by hand instead of via Lombok, but that is an ugly solution.
The second is to set Files -> Settings -> Build, Execution, Deployment -> Compiler -> Scala Compiler -> Compile Order to Java then Scala.
The third is to set compileOrder := CompileOrder.JavaThenScala in build.sbt.
I think the third is the best one if we want to deploy the code somewhere :)
It looks like you are using Lombok for auto-generation of the getters. Please add the Lombok dependency:
libraryDependencies += "org.projectlombok" % "lombok" % "1.16.16"
The step above is not required if you are building the Java project separately and that project has Lombok as a compile-time dependency; the generated JAR file must then already contain all the getters.
Regarding why IntelliJ sometimes shows the error: it's possible that you didn't enable annotation processing under Files -> Settings -> Build, Execution, Deployment -> Compiler -> Annotation Processors.
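Putting the answer and the follow-up together, here is a sketch of how the original build.sbt might look with both changes applied; the name, Scala version and Lombok version are the ones quoted in this thread:

import sbt.Keys._
import sbt.Level

name := "ChessEngineModuler"

logLevel := Level.Warn

version := "1.0"

scalaVersion := "2.12.2"

// Compile Java first so the Lombok-generated getters/setters exist before scalac runs.
compileOrder := CompileOrder.JavaThenScala

// Lombok on the compile classpath so @Getter/@Setter are processed by javac.
libraryDependencies += "org.projectlombok" % "lombok" % "1.16.16"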