I created a function based on the Java MaskFormatter class in Databricks/Scala.
But when I call it from Spark SQL, I receive this error message:
Error in SQL statement: AnalysisException: Undefined function:
formatAccount. This function is neither a built-in/temporary function,
nor a persistent function that is qualified as
spark_catalog.default.formataccount.; line 1 pos 32
Here is my function
import javax.swing.text.MaskFormatter
def formatAccount(account: String, mask: String): String = {
  val formatter = new MaskFormatter(mask.replace("X", "A"))
  formatter.setValueContainsLiteralCharacters(false)
  val formatAccount = formatter.valueToString(account)
  formatAccount
}
Here is the query that produces the error message:
sql("""select java_method(emitToKafka ,formatAccount("1222233334", "X-XXXX-XXXX-X"))""")
However, if I run the code below directly in Scala, it works fine.
formatAccount("1222233334", "X-XXXX-XXXX-X")
res0: String = 1-2222-3333-4
What could be missing?
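For reference, Spark SQL can only resolve functions that have been registered with the session, so one likely fix is registering the Scala function as a SQL UDF. A minimal sketch, assuming the usual spark session available in a Databricks notebook:

// Register the Scala function so Spark SQL can resolve it by name
spark.udf.register("formatAccount", (account: String, mask: String) => formatAccount(account, mask))

// The SQL call can now find the function
spark.sql("""select formatAccount("1222233334", "X-XXXX-XXXX-X")""").show()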
Related
I am trying to create a Google Cloud Function that automates the process of creating a Windows password for my VM instance. I found this link: https://cloud.google.com/compute/docs/instances/windows/automate-pw-generation#python
Unfortunately, I normally work in JavaScript, so I need help with Java, Python, or Go. In this case I decided to use Python, but the language doesn't matter much.
My settings are:
Runtime: Python 3.7
Entry point: main
Code
import base64
import copy
import datetime
import json
import time
from Crypto.Cipher import PKCS1_OAEP
from Crypto.PublicKey import RSA
from Crypto.Util.number import long_to_bytes
from oauth2client.client import GoogleCredentials
from googleapiclient.discovery import build
def GetCompute():
    credentials = GoogleCredentials.get_application_default()
    compute = build('compute', 'v1', credentials=credentials)
    return compute

def GetInstance(compute, instance, zone, project):
    cmd = compute.instances().get(instance=instance, project=project, zone=zone)
    return cmd.execute()

def GetKey():
    key = RSA.generate(2048)
    return key

def GetModulusExponentInBase64(key):
    mod = long_to_bytes(key.n)
    exp = long_to_bytes(key.e)
    modulus = base64.b64encode(mod)
    exponent = base64.b64encode(exp)
    return modulus, exponent

def GetExpirationTimeString():
    utc_now = datetime.datetime.utcnow()
    expire_time = utc_now + datetime.timedelta(minutes=5)
    return expire_time.strftime('%Y-%m-%dT%H:%M:%SZ')

def GetJsonString(user, modulus, exponent, email):
    expire = GetExpirationTimeString()
    data = {'userName': user,
            'modulus': modulus,
            'exponent': exponent,
            'email': email,
            'expireOn': expire}
    return json.dumps(data)

def UpdateWindowsKeys(old_metadata, metadata_entry):
    new_metadata = copy.deepcopy(old_metadata)
    new_metadata['items'] = [{
        'key': "windows-keys",
        'value': metadata_entry
    }]
    return new_metadata

def UpdateInstanceMetadata(compute, instance, zone, project, new_metadata):
    cmd = compute.instances().setMetadata(instance=instance, project=project, zone=zone, body=new_metadata)
    return cmd.execute()

def GetSerialPortFourOutput(compute, instance, zone, project):
    port = 4
    cmd = compute.instances().getSerialPortOutput(instance=instance, project=project, zone=zone, port=port)
    output = cmd.execute()
    return output['contents']

def GetEncryptedPasswordFromSerialPort(serial_port_output, modulus):
    output = serial_port_output.split('\n')
    for line in reversed(output):
        try:
            entry = json.loads(line)
            if modulus == entry['modulus']:
                return entry['encryptedPassword']
        except ValueError:
            pass

def DecryptPassword(encrypted_password, key):
    decoded_password = base64.b64decode(encrypted_password)
    cipher = PKCS1_OAEP.new(key)
    password = cipher.decrypt(decoded_password)
    return password

def main(request):
    instance = 'my-instance'
    zone = 'my-zone'
    project = 'my-project'
    user = 'my-user'
    email = 'my-email'

    compute = GetCompute()
    key = GetKey()
    modulus, exponent = GetModulusExponentInBase64(key)

    instance_ref = GetInstance(compute, instance, zone, project)
    old_metadata = instance_ref['metadata']
    metadata_entry = GetJsonString(user, modulus, exponent, email)
    new_metadata = UpdateWindowsKeys(old_metadata, metadata_entry)
    result = UpdateInstanceMetadata(compute, instance, zone, project, new_metadata)

    time.sleep(30)

    serial_port_output = GetSerialPortFourOutput(compute, instance, zone, project)
    enc_password = GetEncryptedPasswordFromSerialPort(serial_port_output, modulus)
    password = DecryptPassword(enc_password, key)

    print(f'Username: {user}')
    print(f'Password: {password}')

    ip = instance_ref['networkInterfaces'][0]['accessConfigs'][0]['natIP']
    print(f'IP Address: {ip}')
As you can see, I added my details to the main function, and my requirements.txt looks like this:
pycrypto==2.6.1
oauth2client==4.1.3
Unfortunately it doesn't work and I receive the following error:
Object of type bytes is not JSON serializable.
I hope you can help me here. Thanks.
========
EDIT
I added ".decode()" to modulus and exponent to avoid the previous error:
def GetJsonString(user, modulus, exponent, email):
    expire = GetExpirationTimeString()
    data = {'userName': user,
            'modulus': modulus.decode(),
            'exponent': exponent.decode(),
            'email': email,
            'expireOn': expire}
    return json.dumps(data)
But I am still not able to generate a password. I receive an error at "serial_port_output = GetSerialPortFourOutput(compute, instance, zone, project)":
error decoding modulus: illegal base64 data at input byte 1
Your function is not working because you're using the Python 2.x print statement instead of the Python 3 print() function.
Replace
print 'Username: {0}'.format(user)
with
print ('Username: {0}'.format(user))
You can also use Python 3.6+ f-strings instead of format():
print(f'Username: {user}')
I have a formattedDataInputDateTime String that I want to insert into a table as the second field, as a Timestamp type.
// Returns 2019-10-30T13:00Z
val localDateTimeZoned = OffsetDateTime.of(java.time.LocalDate.parse(currentDate), java.time.LocalTime.now, ZoneOffset.UTC).truncatedTo(ChronoUnit.HOURS)
// Returns 2019-10-30T13:00:00.000+0000
val formattedDataInputDateTime: String = localDateTimeZoned.format(DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSxx")).toString
So I wrote the following query, but I can't figure out how to insert formattedDataInputDateTime as a timestamp here:
spark.sql(
s"""INSERT INTO main.basic_metrics
|VALUES ('metric_name', ???,
|'metric_type', current_timestamp, false)""".stripMargin)
I've tried to test this approach but it resulted in the following error:
val ts = cast(unix_timestamp("$formattedDataInputDateTime", "yyyy-MM-dd'T'HH:mm:ss.SSSxx") as timestamp)
type mismatch;
found : String("$formattedDataInputDateTime")
required: org.apache.spark.sql.Column
This basically means the $ is inside the quoted string. It should be outside like $"formattedDataInputDateTime"
You are passing a String instead of a Column; you can wrap it using lit:
unix_timestamp(lit(formattedDataInputDateTime), "yyyy-MM-dd'T'HH:mm:ss.SSSxx").cast("timestamp")
However, you can also get the current date and format it using the Spark functions current_date and date_format.
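A minimal sketch of the conversion, assuming the imports shown below; the exact format pattern may need adjusting depending on the Spark version (Spark 2.x uses SimpleDateFormat patterns, Spark 3.x uses DateTimeFormatter patterns):

import org.apache.spark.sql.functions.{lit, to_timestamp, unix_timestamp}

// Both expressions yield a TimestampType column from the formatted string
val tsViaUnix = unix_timestamp(lit(formattedDataInputDateTime), "yyyy-MM-dd'T'HH:mm:ss.SSSxx").cast("timestamp")
val tsViaTo   = to_timestamp(lit(formattedDataInputDateTime), "yyyy-MM-dd'T'HH:mm:ss.SSSxx")

// Quick sanity check
spark.range(1).select(tsViaUnix.as("ts1"), tsViaTo.as("ts2")).show(false)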
Apache Spark has an input_file_name function, which I use to add a new column to a Dataset with the name of the file currently being processed.
The problem is that I'd like to somehow customize this function to return only the file name, omitting its full path on s3.
For now, I am replacing the path in a second step using a map function:
val initialDs = spark.sqlContext.read
.option("dateFormat", conf.dateFormat)
.schema(conf.schema)
.csv(conf.path).withColumn("input_file_name", input_file_name)
...
...
def fromFile(fileName: String): String = {
val baseName: String = FilenameUtils.getBaseName(fileName)
val tmpFileName: String = baseName.substring(0, baseName.length - 8) //here is magic conversion ;)
this.valueOf(tmpFileName)
}
But I'd like to use something like
val initialDs = spark.sqlContext.read
.option("dateFormat", conf.dateFormat)
.schema(conf.schema)
.csv(conf.path).withColumn("input_file_name", **customized_input_file_name_function**)
In Scala:
// register the UDF (keep the returned handle so it can also be used with the DataFrame API)
import org.apache.spark.sql.functions.input_file_name

val get_only_file_name = spark.udf
  .register("get_only_file_name", (fullPath: String) => fullPath.split("/").last)

// use the UDF to get the last token (the file name) from the full path
val initialDs = spark.read
  .option("dateFormat", conf.dateFormat)
  .schema(conf.schema)
  .csv(conf.path)
  .withColumn("input_file_name", get_only_file_name(input_file_name()))
Edit: in Java, as per the comment:
// register the UDF (cast the lambda to UDF1 so the right overload is picked)
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;
import static org.apache.spark.sql.functions.callUDF;
import static org.apache.spark.sql.functions.input_file_name;

spark.udf()
    .register("get_only_file_name", (UDF1<String, String>) fullPath -> {
        int lastIndex = fullPath.lastIndexOf("/");
        return fullPath.substring(lastIndex + 1);
    }, DataTypes.StringType);

// use the UDF to get the last token (the file name) from the full path
Dataset<Row> initialDs = spark.read()
    .option("dateFormat", conf.dateFormat)
    .schema(conf.schema)
    .csv(conf.path)
    .withColumn("input_file_name", callUDF("get_only_file_name", input_file_name()));
Borrowing from a related question here, the following method is more portable and does not require a custom UDF.
Spark SQL Code Snippet: reverse(split(path, '/'))[0]
Spark SQL Sample:
WITH sample_data as (
SELECT 'path/to/my/filename.txt' AS full_path
)
SELECT
full_path
, reverse(split(full_path, '/'))[0] as basename
FROM sample_data
Explanation:
The split() function breaks the path into its chunks, and reverse() puts the final item (the file name) at the front of the array, so that [0] extracts just the file name.
Full code example:
spark.sql(
"""
|WITH sample_data as (
| SELECT 'path/to/my/filename.txt' AS full_path
| )
| SELECT
| full_path
| , reverse(split(full_path, '/'))[0] as basename
| FROM sample_data
|""".stripMargin).show(false)
Result :
+-----------------------+------------+
|full_path |basename |
+-----------------------+------------+
|path/to/my/filename.txt|filename.txt|
+-----------------------+------------+
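If the path comes from input_file_name, the same SQL expression can also be applied from the DataFrame API via expr; a small sketch, assuming the initialDs DataFrame from the question:

import org.apache.spark.sql.functions.{expr, input_file_name}

val withBasename = initialDs
  .withColumn("full_path", input_file_name())
  .withColumn("basename", expr("reverse(split(full_path, '/'))[0]"))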
Commons IO is the natural/easiest import in Spark, meaning there is no need to add an additional dependency:
import org.apache.commons.io.FilenameUtils
getBaseName(String fileName)
Gets the base name, minus the full path and extension, from a full fileName.
val baseNameOfFile = udf((longFilePath: String) => FilenameUtils.getBaseName(longFilePath))
Usage looks like this:
yourdataframe.withColumn("shortpath" ,baseNameOfFile(yourdataframe("input_file_name")))
.show(1000,false)
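Note that getBaseName also strips the extension ("path/to/my/filename.txt" becomes "filename"). If the extension should be kept, as in the earlier example's output, commons-io's FilenameUtils.getName does that; a minimal sketch:

import org.apache.commons.io.FilenameUtils
import org.apache.spark.sql.functions.{input_file_name, udf}

// getName keeps the extension: "path/to/my/filename.txt" -> "filename.txt"
val fileNameOfFile = udf((longFilePath: String) => FilenameUtils.getName(longFilePath))

yourdataframe.withColumn("shortname", fileNameOfFile(input_file_name()))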
I have a case class which I want to serialize first. After that, I want to deserialize it for storage in MongoDB, but Java 8 LocalDateTime is causing a problem. I took help from this link:
how to deserialize DateTime in Lift
but with no luck. I am unable to make it work for the Java 8 date/time API.
Can anyone please help me with this date/time issue? Here is my code:
import java.time.LocalDateTime
import net.liftweb.json.{NoTypeHints, Serialization}
import net.liftweb.json.Serialization.{read, write}

implicit val formats = Serialization.formats(NoTypeHints)

case class Child(var str: String, var Num: Int, var abc: Option[String], MyList: List[Int], val dateTime: LocalDateTime = LocalDateTime.now())

val ser = write(Child("Mary", 5, None, List(1, 2)))
println("Child class converted to string " + ser)
var obj = read[Child](ser)
println("object of Child is " + obj)
And here is the error message printed on the console:
(run-main-0) java.lang.ArrayIndexOutOfBoundsException: 49938
java.lang.ArrayIndexOutOfBoundsException: 49938
at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.<init>(BytecodeReadingParanamer.java:451)
at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.<init>(BytecodeReadingParanamer.java:431)
at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.<init>(BytecodeReadingParanamer.java:492)
at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.<init>(BytecodeReadingParanamer.java:337)
at com.thoughtworks.paranamer.BytecodeReadingParanamer.lookupParameterNames(BytecodeReadingParanamer.java:100)
at com.thoughtworks.paranamer.CachingParanamer.lookupParameterNames(CachingParanamer.java:75)
at com.thoughtworks.paranamer.CachingParanamer.lookupParameterNames(CachingParanamer.java:68)
at net.liftweb.json.Meta$ParanamerReader$.lookupParameterNames(Meta.scala:89)
at net.liftweb.json.Meta$Reflection$.argsInfo$1(Meta.scala:237)
at net.liftweb.json.Meta$Reflection$.constructorArgs(Meta.scala:253)
at net.liftweb.json.Meta$Reflection$.net$liftweb$json$Meta$Reflection$$findMostComprehensive$1(Meta.scala:266)
at net.liftweb.json.Meta$Reflection$$anonfun$primaryConstructorArgs$1.apply(Meta.scala:269)
at net.liftweb.json.Meta$Reflection$$anonfun$primaryConstructorArgs$1.apply(Meta.scala:269)
at net.liftweb.json.Meta$Memo.memoize(Meta.scala:199)
at net.liftweb.json.Meta$Reflection$.primaryConstructorArgs(Meta.scala:269)
at net.liftweb.json.Extraction$.decompose(Extraction.scala:88)
at net.liftweb.json.Extraction$$anonfun$1.applyOrElse(Extraction.scala:91)
at net.liftweb.json.Extraction$$anonfun$1.applyOrElse(Extraction.scala:89)
at scala.collection.immutable.List.collect(List.scala:305)
at net.liftweb.json.Extraction$.decompose(Extraction.scala:89)
at net.liftweb.json.Serialization$.write(Serialization.scala:38)
at TestActor$.delayedEndpoint$TestActor$1(TestActor.scala:437)
at TestActor$delayedInit$body.apply(TestActor.scala:54)
at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:383)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at TestActor$.main(TestActor.scala:54)
at TestActor.main(TestActor.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
If I remove the dateTime parameter from the case class, it works fine, so the problem seems to be with dateTime.
I ran your code in IntelliJ IDEA and got the same error. I tried to debug the cause, but the invocation stack is so deep that I eventually gave up.
But I guess it may be because Lift doesn't provide a default Format for LocalDateTime, just like the post you mentioned says: "it is the DateParser format that Lift uses by default."
Here is a compromise for your reference. Lift-JSON provides a default Date format for us:
// net.liftweb.json.Serialization Line 72
def formats(hints: TypeHints) = new Formats {
val dateFormat = DefaultFormats.lossless.dateFormat
override val typeHints = hints
}
So instead of going all the way and writing a customized serializer, we may as well change our data type to fit the default Date format. Plus, net.liftweb.mongodb.DateSerializer (line 79) provides support for Date serialization.
Then we can provide methods to easily get a LocalDateTime back. The following is how I worked it out.
package jacoffee.scalabasic
import java.time.{ ZoneId, LocalDateTime }
import java.util.Date
// the package object is where the Scala compiler looks for the implicit conversions used by the case class's date parameter
package object stackoverflow {
implicit def toDate(ldt: LocalDateTime): Date =
Date.from(ldt.atZone(ZoneId.systemDefault()).toInstant())
implicit def toLDT(date: Date): LocalDateTime =
LocalDateTime.ofInstant(date.toInstant(), ZoneId.systemDefault())
}
package jacoffee.scalabasic.stackoverflow
import java.time.LocalDateTime
import java.util.Date
import net.liftweb.json.{ NoTypeHints, Serialization }
import net.liftweb.json.Serialization.{ read, write }
case class Child(var str: String, var Num: Int, var abc: Option[String],
myList: List[Int], val date : Date = LocalDateTime.now()) {
def getLDT: LocalDateTime = date
}
object DateTimeSerialization extends App {
implicit val formats = Serialization.formats(NoTypeHints)
val ser = write(Child("Mary", 5, None, List(1, 2)))
// Child class converted to string {"str":"Mary","Num":5,"myList":[1,2],"date":"2015-07-21T03:07:05.699Z"}
println(" Child class converted to string " + ser)
var obj=read[Child](ser)
// Object of Child is Child(Mary,5,None,List(1, 2),Tue Jul 21 11:48:22 CST 2015)
println(" Object of Child is "+ obj)
// LocalDateTime of Child is 2015-07-21T11:48:22.075
println(" LocalDateTime of Child is "+ obj.getLDT)
}
Anyway, hope it helps.
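For completeness, the other route mentioned above, a customized serializer, is also possible. A minimal, untested sketch, assuming lift-json's CustomSerializer and ISO-8601 strings, which would let Child keep its LocalDateTime field:

import java.time.LocalDateTime
import net.liftweb.json._

// Hypothetical serializer: writes LocalDateTime as an ISO-8601 JSON string and parses it back
case object LocalDateTimeSerializer extends CustomSerializer[LocalDateTime](_ => (
  { case JString(s) => LocalDateTime.parse(s) },
  { case ldt: LocalDateTime => JString(ldt.toString) }
))

implicit val formats = Serialization.formats(NoTypeHints) + LocalDateTimeSerializer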
In my Scala code this works fine:
import org.springframework.jmx.export.annotation.{ManagedOperationParameters, ManagedResource, ManagedOperation, ManagedOperationParameter}
#Override #ManagedOperation(description = "somedesk")
def getStatsAsStr: String = "blabla"
but as soon as I add @ManagedOperationParameters, I get "illegal start of simple expression" for @ManagedOperationParameter(, although I do import it.
So while in Java this compiles fine:
#Override #ManagedOperation(description = "some description")
#ManagedOperationParameters({#ManagedOperationParameter(name = "myname", description = "myname")
})
in Scala it does not compile:
import org.springframework.jmx.export.annotation.{ManagedOperationParameters, ManagedResource, ManagedOperation, ManagedOperationParameter}
#Override #ManagedOperation(description = "some description")
#ManagedOperationParameters(Array(#ManagedOperationParameter(name = "myname", description = "mydesc")) // PRODUCES 'illegal start of simple expression for #ManagedOperationParameter('
def getStatsAsStr(myname: String): String = "blabla"
Is there a way to make it work? If I create it as a .java file with Java syntax in the same project, all is fine (which means my dependencies are fine). I think it's something about the Scala syntax that I don't get. What is it?
Inner annotation values have to be constructed with a different syntax. This should work (whitespace added for clarity, not relevant); if not, try replacing the named parameters with positional.
@ManagedOperationParameters(
  Array(
    new ManagedOperationParameter(name = "myname", description = "mydesc")
  ))
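For reference, a minimal sketch of the full method with that syntax applied (untested, assuming the same Spring JMX imports as in the question):

import org.springframework.jmx.export.annotation.{ManagedOperation, ManagedOperationParameter, ManagedOperationParameters}

@ManagedOperation(description = "some description")
@ManagedOperationParameters(
  Array(
    new ManagedOperationParameter(name = "myname", description = "mydesc")
  ))
def getStatsAsStr(myname: String): String = "blabla"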