Custom transformer conversion to PMML with the SkLearn2PMML-Plugin on the Java side

I know of the SkLearn2PMML-Plugin project on GitHub (https://github.com/jpmml/sklearn2pmml-plugin/blob/master/README.md), but I have little experience with Java. Can someone help me write the Java plugin for my feature transformers? They are shown below.
import pandas as pd
from sklearn.base import TransformerMixin

class FeatureSelector(TransformerMixin):
    '''A transformer for extracting certain column(s)'''
    def __init__(self, cols):
        self.cols = cols
    def fit(self, X, y=None, **fit_params):
        return self
    def transform(self, X, **transform_params):
        return X[self.cols]

class ModelTransformer(TransformerMixin):
    '''A transformer that wraps a fitted model and returns its predictions as a DataFrame'''
    def __init__(self, model):
        self.model = model
    def fit(self, *args, **kwargs):
        self.model.fit(*args, **kwargs)
        return self
    def transform(self, X, **transform_params):
        return pd.DataFrame(self.model.predict(X))

You can achieve the FeatureSelector functionality using the sklearn2pmml.preprocessing.ExpressionTransformer transformation:
selector = ExpressionTransformer("X[0]")
The ModelTransformer functionality is a bit trickier, but certainly doable. Next time, please consider opening a feature request with the SkLearn2PMML project directly (instead of asking Stack Overflow to write code for you): https://github.com/jpmml/sklearn2pmml/issues/118

Related

Java and Python Integration using Jep

I am trying Python and Java integration using Jep. I have loaded a random forest model from a pickle file (rf.pkl) as a sklearn.ensemble.forest.RandomForestClassifier object in a Java program using Jep.
I want this loading to happen only once, so I want to execute a Python function defined in the script prediction.py (to predict using the rf model) by sending the "rfmodel" argument from Java when calling the Python function.
But the argument sent from Java is read as a string in Python. How can I retain the argument's datatype in Python as sklearn.ensemble.forest.RandomForestClassifier?
Jep jep = new Jep();
jep.eval("import pickle");
jep.eval("clf = pickle.load(open('C:/Downloads/DSRFmodel.pkl', 'rb'))");
jep.eval("print(type(clf))");
Object randomForest = jep.getValue("clf");
jep.eval("import integration");
jep.set("arg1", requestId);
jep.set("arg2", randomForest);
jep.eval("result = integration.trainmodel(arg1, arg2)");
------------
integration.py (imported above as integration):

import pickle

def trainmodel(requestid, rf):
    # when rf is printed here, its type is 'str'
    print(type(rf))
When Jep converts a Python object into a Java object and it does not recognize the Python type, it returns the String representation of the Python object; see this bug for discussion of that behavior. If you are running the latest version of Jep (3.8), you can override this behavior by passing a Java class to the getValue function. The PyObject class was created to serve as a generic wrapper around arbitrary Python objects. The following code should do what you want:
import jep.python.PyObject;

Jep jep = new Jep();
jep.eval("import pickle");
jep.eval("clf = pickle.load(open('C:/Downloads/DSRFmodel.pkl', 'rb'))");
jep.eval("print(type(clf))");
Object randomForest = jep.getValue("clf", PyObject.class);
jep.eval("import integration");
jep.set("arg1", requestId);
jep.set("arg2", randomForest);
jep.eval("result = integration.trainmodel(arg1, arg2)");

Scala interop with Java SAM without 2.12 M2 flag

Is there any accepted technique for writing Scala code against a Java 8 API which uses Java @FunctionalInterface / SAM / lambda expressions?
While Java lambda-expression interop is available under a flag in 2.12.0-M2 (http://www.scala-lang.org/news/2.12.0-M2), I was rather hoping that a type class / AnyVal solution might work together with the scala.FunctionX traits.
Unfortunately, though, scala.FunctionX extends AnyRef and not Any, so one cannot use/mix these traits into an implicit AnyVal class implementation.
Added: I'm not entirely sure that I have thought through how I would achieve my aim even if the scala.FunctionX traits were universal traits (extending Any). My use case is this, though:
In a project of mine, I've chosen to provide a Java 8 API with FunctionalInterfaces like the Java Stream interfaces and classes, so as to cater for the widest possible audience of JVM-based client languages, e.g. Clojure, Scala, Kotlin. For each client language using my Java 8 API, I will write appropriate bindings (if necessary) to use language-specific idioms if accessing the Java 8 API feels clunky in that language.
btw. I would be interested in any comments with this question taken in a Kotlin-Java interop context also.
This Scala program demonstrates one side of the coin of my question, that is, how to get Scala functions to masquerade as Java 8 lambdas.
Syntactically and idiomatically this seems to work fine, by creating some implicit Scala functions to convert Scala functions to their Java 8 FunctionalInterface counterpart types.
The caveat is, of course, that this method does not take advantage of Java 8's ability to optimize lambda creation via invokedynamic.
Accordingly, this approach results in a JVM object being created for each Scala function instance, which may affect memory usage and performance compared with native Java 8 lambdas.
For the flip side of the coin, that is, how to get Java 8 lambdas to masquerade as Scala functions, I guess one would have to write some Java code to interop with Scala (if one's aim were to have a Scala API that was callable from Java).
/**
* Scala Functions masquerading as Java 8 Lambdas.
*
* (C) Justin Johansson 2015.
*
* Microblogging about my Project Clockwork, a
* new implementation of XPath/XQuery on the JVM,
* as #MartianOdyssey on Twitter (https://twitter.com/MartianOdyssey).
*
* Permission to use this code is granted under Apache License,
* Version 2.0 and providing attribution is afforded to author,
* Justin Johansson.
*/
package lab
import scala.language.implicitConversions
import java.util.{ Arrays => JArrays, List => JList }
import java.util.function.{ Consumer => JConsumer, Function => JFunction, Predicate => JPredicate }
import java.util.stream.{ Stream => JStream }
object JLambda extends App {
  println("JLambda: Scala to Java 8 lambda test")

  implicit def func1ToJConsumer[T](func: T => Unit): JConsumer[T] = {
    new JConsumer[T] {
      def accept(arg: T): Unit = func(arg)
    }
  }

  implicit def func1ToJFunction[T, R](func: T => R): JFunction[T, R] = {
    new JFunction[T, R] {
      def apply(arg: T): R = func(arg)
    }
  }

  implicit def func1ToJPredicate[T](func: T => Boolean): JPredicate[T] = {
    new JPredicate[T] {
      def test(arg: T): Boolean = func(arg)
    }
  }

  val myList = JArrays.asList("cake", "banana", "apple", "coffee")
  println(s"myList = $myList")

  val myListFiltered: JStream[String] = myList.stream
    .filter { x: String => x.startsWith("c") }

  val myListFilteredAndMapped: JStream[String] = myListFiltered
    .map { x: String => x.toUpperCase }

  myListFilteredAndMapped.forEach { x: String => println(s"x=$x") }
}
/*
Outputs:
JLambda: Scala to Java 8 lambda test
myList = [cake, banana, apple, coffee]
x=CAKE
x=COFFEE
*/
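Worth noting for readers on newer tooling: from Scala 2.12 onward, SAM conversion is built in (no flag needed), so the implicit conversions above become unnecessary. A minimal sketch of the same idea under that assumption (the object and value names are just for illustration):

import java.util.{Arrays => JArrays}
import java.util.function.{Predicate => JPredicate}

object JLambdaSam extends App {
  // Scala 2.12+ converts a function literal directly to the expected
  // Java functional interface, much as javac does for Java lambdas.
  val startsWithC: JPredicate[String] = s => s.startsWith("c")

  JArrays.asList("cake", "banana", "apple", "coffee")
    .stream
    .filter(startsWithC)
    .forEach(x => println(s"x=$x"))
}

Since these function literals compile via invokedynamic, this also addresses the caveat above about an extra wrapper object per function.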
Regarding the Kotlin-Java interop context asked about above: Kotlin's FunctionX interfaces are SAMs, so there's no need to do anything extra to make Java 8 understand them.

How To Convert Scala Case Class to Java HashMap

I'm using Mule ESB (Java based) and I have some Scala components that modify and create data. My data is represented in case classes. I'm trying to convert them to Java, but just getting them to convert to Scala types is a challenge. Here's a simplified example of what I'm trying to do:
package com.echostar.ese.experiment

import scala.collection.JavaConverters

case class Resource(guid: String, filename: String)
case class Blackboard(name: String, guid: String, resource: Resource)

object CCC extends App {
  val res = Resource("4alskckd", "test.file")
  val bb = Blackboard("Test", "123asdfs", res)
  val myMap = getCCParams(bb)
  val result = new java.util.HashMap[String, Object](myMap)
  println("Result:" + result)

  def getCCParams(cc: AnyRef) =
    (Map[String, Any]() /: cc.getClass.getDeclaredFields) { (a, f) =>
      f.setAccessible(true)
      val value = f.get(cc) match {
        // this covers tuples as well as case classes, so there may be a more specific way
        case caseClassInstance: Product => getCCParams(caseClassInstance): Map[String, Any]
        case x => x
      }
      a + (f.getName -> value)
    }
}
Current Error: Recursive method needs return type.
My Scala foo isn't very strong. I grabbed this method from another answer here, and I basically know what it's doing, but not well enough to change it to use java.util.HashMap and java.util.List.
Expected Output:
Result:{"name"="Test", "guid"="123asdfs", "resource"= {"guid"="4alskckd", "filename"="test.file"}}
UPDATE 1:
1. Added getCCParams(caseClassInstance): Map[String, Any] to line 22 above, per @cem-catikkas. The IDE syntax error still says "recursive method ... needs result type" and "overloaded method java.util.HashMap cannot be applied to scala.collection.immutable.Map".
2. Changed to java.util.HashMap[String, Object].
You should follow what the error tells you. Since getCCParams is a recursive method you need to declare its return type.
def getCCParams(cc: AnyRef): Map[String, Any]
Answering this in case anyone else who runs into the issue ends up here (as happened to me).
I believe the error you were getting had to do with the fact that the return type was being declared at the method invocation (line 22), whereas the compiler expects it at the method's declaration (in your case, line 17). The below seems to have worked:
def getCCParams(cc: AnyRef): Map[String, Any] = ...
Regarding the conversion from a Scala Map to a Java HashMap: by adding the ._ wildcard to the JavaConverters import statement, you import all the methods of the object as single identifiers, which is a requirement for implicit conversions. This brings in the asJava method, which can be used to convert the Scala Map to a Java one; the result can then be passed to the java.util.HashMap(Map<? extends K, ? extends V> m) constructor to instantiate a HashMap:
import scala.collection.JavaConverters._
import java.util.{HashMap => JHashMap}
...
val myMap = getCCParams(bb)
val r = myMap.asJava // converting to java.util.Map[String, Any]
val result: JHashMap[String,Any] = new JHashMap(r)
I wonder if you've considered going at it the other way around, by implementing the java.util.Map interface in your case class. Then you wouldn't have to convert back and forth, and any consumers downstream that use the Map interface would just work (for example, if you're using Groovy's field dot-notation).
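A minimal sketch of that direction: a variant of the question's Resource case class that also implements java.util.Map by extending java.util.AbstractMap, so only entrySet needs writing (the class name and read-only behaviour are assumptions for illustration):

import java.util.{AbstractMap, LinkedHashSet, Map => JMap, Set => JSet}

// Hypothetical case class that is also a read-only java.util.Map,
// so Java/Groovy consumers can use it through the Map interface.
case class ResourceMap(guid: String, filename: String)
    extends AbstractMap[String, Any] {

  // AbstractMap derives get/containsKey/size/etc. from entrySet.
  override def entrySet(): JSet[JMap.Entry[String, Any]] = {
    val entries = new LinkedHashSet[JMap.Entry[String, Any]]()
    entries.add(new AbstractMap.SimpleImmutableEntry[String, Any]("guid", guid))
    entries.add(new AbstractMap.SimpleImmutableEntry[String, Any]("filename", filename))
    entries
  }
}

Java and Groovy consumers can then call get("guid") or use dot-notation without any conversion; put is not overridden here, so mutation throws AbstractMap's UnsupportedOperationException.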

Get Java reflection representation of Scala type

This seems like a simple question, but it's very challenging to search for, so I'm asking a new question. My apologies if it's already been asked.
Due to the compiler bug described in "Scala 2.11.5 compiler crash with type aliases and manifests" (https://issues.scala-lang.org/browse/SI-9155), I need to use Scala TypeTags and friends for discovery of type parameters to methods. However, I then need to use that type information with a Java library that uses java.lang.Class and java.lang.reflect.Type.
How can I convert a scala.reflect.runtime.universe.Type into a java.lang.reflect.Type or java.lang.Class?
Put concretely, how would I fill out the body of this method:
def typeFor[T](implicit tag: TypeTag[T]): java.lang.reflect.Type = ...
or, if that's not possible:
def typeFor[T](implicit tag: TypeTag[T]): java.lang.Class[T] = ...
And note, due to the bug posted above, I cannot use scala.reflect.Manifest.
The short answer is no, but you can try to do something similar to this SO question. However, there is an open ticket....
This may have some limitations I'm not aware of, but you could drop down to Java reflection and try something like:
import scala.reflect.ClassTag
import scala.reflect.runtime.universe.TypeTag
import scala.util.control.Exception._

def typeMe[T](implicit t: TypeTag[T]) = {
  catching(classOf[Exception]) opt Class.forName(t.tpe.typeSymbol.asClass.fullName)
}

println(typeMe[String])
println(typeMe[ClassTag[_]])
Results in:
Some(class java.lang.String)
Some(interface scala.reflect.ClassTag)
The way I solved it with manifests was:

import java.lang.reflect.{ParameterizedType, Type}

private def typeFromManifest(m: Manifest[_]): Type = {
  if (m.typeArguments.isEmpty) { m.runtimeClass }
  else new ParameterizedType {
    def getRawType = m.runtimeClass
    def getActualTypeArguments = m.typeArguments.map(typeFromManifest).toArray
    def getOwnerType = null
  }
}
Right now I'm trying to solve this using something other than Manifest, since Manifest is slated for removal from the Scala runtime.
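For the TypeTag version asked about, the same shape seems to port over. Below is a sketch under the assumption that a single runtime mirror suffices; mirror.runtimeClass, Type#erasure and Type#typeArgs are standard scala-reflect API, but the ParameterizedType construction is illustrative rather than battle-tested:

import java.lang.reflect.{ParameterizedType, Type => JType}
import scala.reflect.runtime.{universe => ru}

val mirror = ru.runtimeMirror(getClass.getClassLoader)

// Recursively translate a scala-reflect Type into a java.lang.reflect.Type,
// mirroring the Manifest-based version above.
def typeFromTag(tpe: ru.Type): JType = {
  val rawClass = mirror.runtimeClass(tpe.erasure)
  if (tpe.typeArgs.isEmpty) rawClass
  else new ParameterizedType {
    def getRawType = rawClass
    def getActualTypeArguments = tpe.typeArgs.map(typeFromTag).toArray
    def getOwnerType = null
  }
}

def typeFor[T](implicit tag: ru.TypeTag[T]): JType = typeFromTag(tag.tpe)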

How to provide a codec to the SaveAsSequenceFile method in Spark?

I am trying to figure out how to pass a codec to the saveAsSequenceFile method in Apache Spark. Below is the code I am trying to run. I am running Scala 2.10.4, Spark 1.0.0, Java 1.7.60, and Apache Hadoop 2.4.0.
val rdd: RDD[(String, String)] = sc.sequenceFile(secPath,
  classOf[Text],
  classOf[Text]
).map { case (k, v) => (k.toString, v.toString) }

val sortedOutput = rdd.sortByKey(true, 1)
sortedOutput.saveAsSequenceFile(secPathOut)
My issue is that I am new to Spark and Scala. I do not understand what the javadoc means for the codec variable passed to the saveAsSequenceFile method.
def saveAsSequenceFile(path: String, codec: Option[Class[_ <: CompressionCodec]] = None): Unit
What does the <: mean? I get that the codec is optional, because when I run the above code it works. Could someone please show an example of a properly formatted codec call to this method?
Thanks!
The <: is an upper type bound: it indicates that the class you pass in must extend org.apache.hadoop.io.compress.CompressionCodec. Spark uses a lot of HDFS features and is pretty heavily integrated with Hadoop at this point. This means you can pass the class of any of the following as the codec: BZip2Codec, DefaultCodec, or GzipCodec; there are likely also other extensions of CompressionCodec not built into Hadoop. Here is an example of calling the method:
import org.apache.hadoop.io.compress.GzipCodec

sc.parallelize(List((1, 2))).saveAsSequenceFile("path", Some(classOf[GzipCodec]))
Option[...] is used in Scala in favor of Java's null, even though null exists in Scala; an Option is either Some(...) or None.
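To round the answer out, here is a small end-to-end sketch in the same spirit; the path, app name, and local master are placeholders, and on Spark 1.x the SparkContext._ import supplies the implicits for sortByKey and saveAsSequenceFile:

import org.apache.hadoop.io.compress.BZip2Codec
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._

object CodecDemo extends App {
  val sc = new SparkContext(new SparkConf().setAppName("codec-demo").setMaster("local[*]"))

  val sortedOutput = sc.parallelize(Seq(("b", "2"), ("a", "1"))).sortByKey(true, 1)

  // The codec parameter is an Option defaulting to None (no compression),
  // so the codec class is wrapped in Some(...).
  sortedOutput.saveAsSequenceFile("/tmp/demo-seq", Some(classOf[BZip2Codec]))

  // Reading back needs no codec argument: Hadoop detects the compression
  // from the files themselves.
  println(sc.sequenceFile[String, String]("/tmp/demo-seq").collect().mkString(", "))

  sc.stop()
}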
