I have a DataFrame a2 in Scala:
val a3 = a2.select(printme.apply(col("PlayerReference")))
The column PlayerReference contains a string.
The select calls a UDF:
val printme = udf({
st: String =>
val x = new JustPrint(st)
x.printMe();
})
This UDF calls a Java class:
public class JustPrint {
private String ss = null;
public JustPrint(String ss) {
this.ss = ss;
}
public void printMe() {
System.out.println("Value : " + this.ss);
}
}
But I get this error for the UDF:
java.lang.UnsupportedOperationException: Schema for type Unit is not supported
The goal of this exercise is to validate the chain of calls.
What should I do to solve this problem?
You get this error because your UDF doesn't return anything: its result type is Scala's Unit, and Spark has no schema for Unit.
What you should do depends on what you actually want. Assuming you just want to trace the values passing through your UDF, either change printMe so that it returns a String, or change the UDF so that it returns one.
Like this:
public String printMe() {
System.out.println("Value : " + this.ss);
return this.ss;
}
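or change the UDF so that it returns the input string (returning x itself would fail in the same way, because Spark has no schema for JustPrint either):
val printme = udf({
  st: String =>
    val x = new JustPrint(st)
    x.printMe()
    st
})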
I have Hive/Redshift tables and I want to create a Spark DataFrame with precisely the DDL of the original tables, in Java. Is there a way to achieve that?
I think it may be better to convert the DDL string to a Spark schema JSON and create a StructType from that. I started to investigate the Spark parser API:
String ddlString = "CREATE TABLE data.baab (" +
"id STRING, " +
"test STRING, " +
"test2 STRING, " +
"audit STRUCT<createdDate: TIMESTAMP, createdBy: STRING, lastModifiedDate: TIMESTAMP, lastModifiedBy: STRING>) " +
"USING parquet " +
"LOCATION 's3://test.com' " +
"TBLPROPERTIES ('transient_lastDdlTime' = '1676593278')";
SparkSqlParser parser = new SparkSqlParser();
and I can't see anything related to a DDL parser:
override def parseDataType(sqlText : _root_.scala.Predef.String) : org.apache.spark.sql.types.DataType = { /* compiled code */ }
override def parseExpression(sqlText : _root_.scala.Predef.String) : org.apache.spark.sql.catalyst.expressions.Expression = { /* compiled code */ }
override def parseTableIdentifier(sqlText : _root_.scala.Predef.String) : org.apache.spark.sql.catalyst.TableIdentifier = { /* compiled code */ }
override def parseFunctionIdentifier(sqlText : _root_.scala.Predef.String) : org.apache.spark.sql.catalyst.FunctionIdentifier = { /* compiled code */ }
override def parseMultipartIdentifier(sqlText : _root_.scala.Predef.String) : scala.Seq[_root_.scala.Predef.String] = { /* compiled code */ }
override def parseTableSchema(sqlText : _root_.scala.Predef.String) : org.apache.spark.sql.types.StructType = { /* compiled code */ }
override def parsePlan(sqlText : _root_.scala.Predef.String) : org.apache.spark.sql.catalyst.plans.logical.LogicalPlan = { /* compiled code */ }
protected def astBuilder : org.apache.spark.sql.catalyst.parser.AstBuilder
protected def parse[T](command : _root_.scala.Predef.String)(toResult : scala.Function1[org.apache.spark.sql.catalyst.parser.SqlBaseParser, T]) : T = { /* compiled code */ }
This is what I tried:
StructType struct = null;
Pattern pattern = Pattern.compile("\\(([^()]*)\\)");
Matcher matcher = pattern.matcher(ddlString);
if (matcher.find()) {
String result = matcher.group(1);
struct = StructType.fromDDL(result);
}
return struct;
This works, but I am afraid this solution will not cover all cases.
Any suggestions?
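One case the regular expression above cannot cover is a column type that itself contains parentheses, such as DECIMAL(10,2), because [^()]* stops at the first nested parenthesis. A minimal sketch of an alternative, assuming the column list is the first parenthesized group in the statement and that no parentheses appear inside quoted strings or comments before it ends: scan for the matching closing parenthesis by counting depth. The class and method names (DdlColumns, extractColumnList) are made up for illustration and are not part of Spark.

```java
class DdlColumns {
    // Returns the text between the first '(' and its matching ')',
    // or null if there is no balanced pair.
    // Nested parentheses such as DECIMAL(10,2) are kept intact.
    // Assumption: no parentheses occur inside quoted strings or comments
    // before the column list ends.
    static String extractColumnList(String ddl) {
        int start = ddl.indexOf('(');
        if (start < 0) {
            return null;
        }
        int depth = 0;
        for (int i = start; i < ddl.length(); i++) {
            char c = ddl.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth == 0) {
                    return ddl.substring(start + 1, i).trim();
                }
            }
        }
        return null; // unbalanced parentheses
    }

    public static void main(String[] args) {
        String ddl = "CREATE TABLE data.baab (id STRING, amount DECIMAL(10,2), "
                + "audit STRUCT<createdDate: TIMESTAMP, createdBy: STRING>) "
                + "USING parquet TBLPROPERTIES ('k' = 'v')";
        System.out.println(extractColumnList(ddl));
    }
}
```

The returned string could then be passed to StructType.fromDDL in the same way as in the attempt above.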
I am currently translating a legacy Groovy class to Java, and most methods have been easy to port with slight modifications.
Now I am stuck on a method that takes a closure as a parameter:
transformer.renameNumbers([:], { number ->
return "${number.name}#somecompany.com"
})
}
The renameNumbers implementation is:
renameNumbers(Map<String,String> renameMap, someclosure = {it}) {
numbers.each { it->
if(newUsername == null ) {
newNumbername = someclosure.call(it)
}
if(newNumbername!=null && newNumbername!=it.number) {
def oldNumber= it.number
it.number = newNumbername
log.info("Changed numbername key of from '$oldNumber' to '$newNumbername'")
}
}
The problem is that if I simply pass a plain object, as in transformer.renameNumbers(Map, Object),
it complains:
groovy.lang.MissingMethodException: No signature of method: org.eclipse.emf.ecore.util.EObjectContainmen.call() is applicable for argument types:
I guess it's because my plain Java object doesn't have a call() method.
Is there a way around this? For example, could I create a custom Java class with its own call method?
Thanks
You could try Java 8's functional interfaces, such as Function<T,R>, together with lambdas:
//Function<Number, String> f = (n) -> n.name + "#somecompany.com";
transformer.renameNumbers(new HashMap<>(), (n) -> n.name + "#somecompany.com");
Implementation:
void renameNumbers(Map<String, String> renameMap, Function<Number, String> somefunction) {
numbers.forEach(it -> {
String newNumbername = somefunction.apply(it); // <-----
if (newNumbername != null && !newNumbername.equals(it.number)) { // compare String contents, not references
String oldNumber = it.number;
it.number = newNumbername;
log.info("Changed numbername key of from '" + oldNumber + "' to '" + newNumbername + "'");
}
});
}
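For reference, here is a self-contained sketch of the same idea that compiles on its own. NumberEntry and Transformer are hypothetical stand-ins for the real model classes, which the question does not show, and the renameMap parameter is kept only to mirror the original signature.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the model element held by the transformer.
class NumberEntry {
    String name;
    String number;

    NumberEntry(String name, String number) {
        this.name = name;
        this.number = number;
    }
}

class Transformer {
    final List<NumberEntry> numbers = new ArrayList<>();

    // The Function parameter plays the role of the Groovy closure.
    // renameMap is unused here, as in the translated method above.
    void renameNumbers(Map<String, String> renameMap, Function<NumberEntry, String> rename) {
        for (NumberEntry entry : numbers) {
            String newNumber = rename.apply(entry);
            if (newNumber != null && !newNumber.equals(entry.number)) {
                entry.number = newNumber;
            }
        }
    }

    public static void main(String[] args) {
        Transformer t = new Transformer();
        t.numbers.add(new NumberEntry("alice", "old"));
        t.renameNumbers(new HashMap<>(), n -> n.name + "#somecompany.com");
        System.out.println(t.numbers.get(0).number);
    }
}
```

Any lambda or method reference with the shape NumberEntry -> String can be passed, so no custom class with a call() method is needed.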
I am learning the Akka framework for parallel processing in Scala, and I am migrating a Java project to Scala so I can learn both Akka and Scala at the same time. I get a NullPointerException in the master actor when it receives a mutable object from the worker actor after some computation in the worker. All the code is below...
import akka.actor._
import java.math.BigInteger
import akka.routing.ActorRefRoutee
import akka.routing.Router
import akka.routing.RoundRobinRoutingLogic
object Main extends App {
val system = ActorSystem("CalcSystem")
val masterActor = system.actorOf(Props[Master], "master")
masterActor.tell(new Calculate, ActorRef.noSender)
}
class Master extends Actor {
private val messages: Int = 10;
var resultList: Seq[String] = _
//val workerRouter = this.context.actorOf(Props[Worker].withRouter(new RoundRobinRouter(2)), "worker")
var router = {
val routees = Vector.fill(5) {
val r = context.actorOf(Props[Worker])
context watch r
ActorRefRoutee(r)
}
Router(RoundRobinRoutingLogic(), routees)
}
def receive() = {
case msg: Calculate =>
processMessages()
case msg: Result =>
resultList :+ msg.getFactorial().toString
println(msg.getFactorial())
if (resultList.length == messages) {
end
}
}
private def processMessages() {
var i: Int = 0
for (i <- 1 to messages) {
// workerRouter.tell(new Work, self)
router.route(new Work, self)
}
}
private def end() {
println("List = " + resultList)
this.context.system.shutdown()
}
}
import akka.actor._
import java.math.BigInteger
class Worker extends Actor {
private val calculator = new Calculator
def receive() = {
case msg: Work =>
println("Called calculator.calculateFactorial: " + context.self.toString())
val result = new Result(calculator.calculateFactorial)
sender.tell(result, this.context.parent)
case _ =>
println("I don't know what to do with this...")
}
}
import java.math.BigInteger
class Result(bigInt: BigInteger) {
def getFactorial(): BigInteger = bigInt
}
import java.math.BigInteger
class Calculator {
def calculateFactorial(): BigInteger = {
var result: BigInteger = BigInteger.valueOf(1)
var i = 0
for(i <- 1 to 4) {
result = result.multiply(BigInteger.valueOf(i))
}
println("result: " + result)
result
}
}
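You initialize resultList with null (var resultList: Seq[String] = _) and then try to append to it, which throws the NullPointerException. Initialize it with an empty sequence instead, for example Seq.empty[String].
Does your calculation ever stop? In the line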
resultList :+ msg.getFactorial().toString
you create a copy of the sequence with an element appended, but you never assign it back to the var resultList.
This line does what you want:
resultList = resultList :+ msg.getFactorial().toString
I also recommend avoiding mutable variables in actors and using context.become instead:
https://github.com/alexandru/scala-best-practices/blob/master/sections/5-actors.md#52-should-mutate-state-in-actors-only-with-contextbecome
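The two points above (start from an empty collection rather than null, and keep the result of an immutable append) do not depend on Akka. As a rough Java analogue, not a translation of the actor code, the following sketch uses a hypothetical append helper that returns a new list in the way :+ does:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ImmutableAppend {
    // Returns a new unmodifiable list with the element appended;
    // the input list is left untouched.
    static List<String> append(List<String> list, String element) {
        List<String> copy = new ArrayList<>(list);
        copy.add(element);
        return Collections.unmodifiableList(copy);
    }

    public static void main(String[] args) {
        List<String> results = Collections.emptyList(); // start empty, not null

        append(results, "24");             // result discarded: results is unchanged
        System.out.println(results.size());

        results = append(results, "24");   // result assigned: results has one element
        System.out.println(results.size());
    }
}
```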
Is it possible to get ClassTag information from a Java Class instance obtained via reflection?
Here's the situation. I have a Scala case class that looks like this:
case class Relation[M : ClassTag](id: UUID,
model: Option[M] = None)
And it is used like this (although with many more classes related to each other):
case class Organization(name: String)
case class Person(firstName: String,
lastName: String,
organization: Relation[Organization])
What I'm trying to do is programmatically build up a tree of these relations using something that looks like this:
private def generateFieldMap(clazz: Class[_]): Map[String, Class[_]] = {
clazz.getDeclaredFields.foldLeft(Map.empty[String, Class[_]])((map, field) => {
map + (field.getName -> field.getType)
})
}
private def getRelationModelClass[M : ClassTag](relationClass: Class[_ <: Relation[M]]): Class[_] = {
classTag[M].runtimeClass
}
def treeOf[M: ClassTag](relations: List[String]): Map[String, Any] = {
val normalizedRelations = ModelHelper.normalize(relations)
val initialFieldMap = Map("" -> generateFieldMap(classTag[M].runtimeClass))
val relationFieldMap = relations.foldLeft(initialFieldMap)((map, relation) => {
val parts = relation.split('.')
val parentRelation = parts.dropRight(1).mkString(".")
val relationClass = map(parentRelation)(parts.last)
val relationModelClass = relationClass match {
case clazz: Class[_ <: Relation[_]] => getRelationModelClass(clazz)
case _ => throw ProcessStreetException("cannot follow non-relation: " + relation)
}
val fieldMap = generateFieldMap(relationModelClass)
map + (relation -> fieldMap)
})
relationFieldMap
}
val relations = List("organization")
val tree = treeOf[Person](relations)
This won't compile. I get this error:
[error] Foo.scala:148: not found: type _$12
[error] case clazz: Class[_ <: Relation[_]] => getRelationModelClass(clazz)
[error] ^
[error] one error found
[error] (compile:compile) Compilation failed
Basically, what I'd like to do is be able to access the ClassTag information when all I have is a Java Class. Is this possible?
Yes, it is absolutely possible and very easy:
val clazz = classOf[String]
val ct = ClassTag(clazz) // just use ClassTag.apply() method
In your example you'd want to call the getRelationModelClass method like this:
getRelationModelClass(clazz)(ClassTag(clazz))
This is possible because the [T: ClassTag] syntax implicitly creates a second parameter list like (implicit ct: ClassTag[T]). Usually the compiler fills it in, but nothing prevents you from passing it explicitly.
You also don't really need to pass both the class and its class tag to the method. You're not even using the class object in its body. Just pass the class tag; that is enough.
I ended up accomplishing my goal using TypeTags and the Scala reflection API. Here are the changes necessary.
First, change the Relation class to use a TypeTag.
case class Relation[M : TypeTag](id: UUID,
model: Option[M] = None)
Then change the rest of the code to use the Scala reflection API:
private def generateFieldMap(tpe: Type): Map[String, Type] =
tpe.members.filter(m => m.isTerm && m.asTerm.isVal).foldLeft(Map.empty[String, Type])((map, member) => {
map + (member.name.toString.trim -> member.typeSignature)
})
private def getRelationModelType(tpe: Type): Type =
tpe match { case TypeRef(_, _, args) => args.head }
def treeOf[M: TypeTag](relations: List[String]): Map[String, Any] = {
val normalizedRelations = ModelHelper.normalize(relations)
val initialFieldMap = Map("" -> generateFieldMap(typeTag[M].tpe))
val relationFieldMap = relations.foldLeft(initialFieldMap)((map, relation) => {
val parts = relation.split('.')
val parentRelation = parts.dropRight(1).mkString(".")
val relationType = map(parentRelation)(parts.last)
val relationModelType = getRelationModelType(relationType)
val fieldMap = generateFieldMap(relationModelType)
map + (relation -> fieldMap)
})
relationFieldMap
}
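A different route, which stays with plain Java reflection instead of TypeTags, is to read the field's generic type rather than its erased type: Field.getGenericType() keeps the type argument that Field.getType() drops. The sketch below uses hypothetical Java counterparts of Relation, Organization and Person, and it assumes the generic signature is present in the compiled class, which I have not verified for the Scala case classes in the question.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Hypothetical Java counterparts of the case classes in the question.
class Relation<M> {
    M model;
}

class Organization {
    String name;
}

class Person {
    String firstName;
    Relation<Organization> organization;
}

class RelationTypes {
    // Returns the first type argument of a parameterized field as a Class,
    // or null when the field is not parameterized with a concrete class.
    static Class<?> relationModelClass(Class<?> owner, String fieldName) {
        Field field;
        try {
            field = owner.getDeclaredField(fieldName);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("no such field: " + fieldName, e);
        }
        Type generic = field.getGenericType();
        if (generic instanceof ParameterizedType) {
            Type arg = ((ParameterizedType) generic).getActualTypeArguments()[0];
            if (arg instanceof Class) {
                return (Class<?>) arg;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(relationModelClass(Person.class, "organization"));
    }
}
```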
I have tested three variations of the same code and got it working. I want to know why they behave differently.
So I have this working code, which converts a long timestamp to a string in the ECMA standard date format:
lazy val dateFormat = new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ")
implicit def dateToECMAFormat(time: Long) = new {
def asECMADateString: String = {
dateFormat.format(new java.util.Date(time))
}
}
Another variation that works:
implicit def dateToECMAFormat(time: Long) = new {
val dateFormat = new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ")
def asECMADateString: String = {
dateFormat.format(new java.util.Date(time))
}
}
But I do not want the SimpleDateFormat to be re-instantiated every time, so I prefer the first one. But now the real mystery:
val dateFormat = new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ")
implicit def dateToECMAFormat(time: Long) = new {
def asECMADateString: String = {
dateFormat.format(new java.util.Date(time))
}
}
This last piece of code compiles but throws an exception at run time; I did not manage to get the stack trace from Play Framework. I just know that my controller in Play Framework 2.1 returns a 500 (Internal Server Error) without any further information (the other controllers still work and the main services are still up).
In each case the call looks like this: 100000L.asECMADateString
Can someone explain the different behaviors and why the last one does not work? I thought I had a good grasp of the difference between val, lazy val and def, but now I feel like I am missing something.
UPDATE
The code is called in an object like this:
object MyController extends Controller{
implicit val myExecutionContext = getMyExecutionContext
lazy val dateFormat = new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ")
implicit def dateToECMAFormat(time: Long) = new {
def asECMADateString: String = {
dateFormat.format(new java.util.Date(time))
}
}
def myAction = Action {
Async {
future {
blocking{
//here get some result from a db
val result = getStuffFromDb
result.someLong.asECMADateString
}
} map { result => Ok(result) } recover { /* return some error code */ }
}
}
}
It is your basic Play Framework Async action call.
Since the only difference between the 1st and 3rd examples is the lazy val, I'd look at exactly where your call (100000L.asECMADateString) is being made. A lazy val helps correct some "order of initialization" issues with mix-ins, for example: see this recent issue to see if it's similar to yours.
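To make "order of initialization" concrete, here is a rough Java analogue. It is not a diagnosis of the failing controller, and all class names are made up. A field initialized eagerly in a subclass is still null while the superclass constructor runs, whereas a field created on first use is not.

```java
import java.text.SimpleDateFormat;

abstract class Base {
    final boolean formatReadyDuringConstruction;

    Base() {
        // Runs before the subclass field initializers have executed.
        formatReadyDuringConstruction = format() != null;
    }

    abstract SimpleDateFormat format();
}

class Eager extends Base {
    // Assigned only after Base() has finished.
    private final SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");

    @Override
    SimpleDateFormat format() {
        return format; // still null while Base() is running
    }
}

class Lazy extends Base {
    // No initializer, so nothing overwrites the value set on first use.
    // Unlike a Scala lazy val, this simple version is not thread-safe.
    private SimpleDateFormat format;

    @Override
    SimpleDateFormat format() {
        if (format == null) {
            format = new SimpleDateFormat("yyyy-MM-dd"); // created on first use
        }
        return format;
    }
}

class InitOrderDemo {
    public static void main(String[] args) {
        System.out.println(new Eager().formatReadyDuringConstruction);
        System.out.println(new Lazy().formatReadyDuringConstruction);
    }
}
```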