Java and Python Integration using Jep - java

I am trying to integrate Python and Java using Jep. I have loaded a random forest model from a pickle file (rf.pkl) as a sklearn.ensemble.forest.RandomForestClassifier object from a Java program using Jep.
I want this loading to happen only once, so I want to execute a Python function defined in the script prediction.py (to predict using the rf model) by sending the "rfmodel" argument from Java when calling the Python function.
But the argument sent from Java to Python is read as a string in Python. How can I retain the argument's datatype in Python as sklearn.ensemble.forest.RandomForestClassifier?
Jep jep = new Jep();
jep.eval("import pickle");
jep.eval("clf = pickle.load(open('C:/Downloads/DSRFmodel.pkl', 'rb'))");
jep.eval("print(type(clf))");
Object randomForest = jep.getValue("clf");
jep.eval("import integration");
jep.set("arg1", requestId);
jep.set("arg2", randomForest);
jep.eval("result = integration.trainmodel(arg1, arg2)");
------------
integration.py
import pickle

def trainmodel(requestid, rf):
    # when rf is printed here, its type is 'str'

When Jep converts a Python object into a Java object, if it does not recognize the Python type it will return the String representation of the Python object; see this bug for a discussion of that behavior. If you are running the latest version of Jep (3.8), you can override this behavior by passing a Java class to the getValue function. The PyObject class was created to serve as a generic wrapper around arbitrary Python objects. The following code should do what you want:
Jep jep = new Jep();
jep.eval("import pickle");
jep.eval("clf = pickle.load(open('C:/Downloads/DSRFmodel.pkl', 'rb'))");
jep.eval("print(type(clf))");
Object randomForest = jep.getValue("clf", PyObject.class);
jep.eval("import integration");
jep.set("arg1", requestId);
jep.set("arg2", randomForest);
jep.eval("result = integration.trainmodel(arg1, arg2)");

Related

Jython - Scala array/list not JSON serializable within python script

I have a Scala class that wraps an Avro record with getters and setters. I am using Jython to allow users to write Python scripts that process the Avro record and ultimately do a json.dumps on the new processed record.
The issue is that if the user wants to grab a value that is an array from the record, the interpreter complains that the object is not JSON serializable.
import json
json.dumps(<AClass>.getArray('myArray'))
The AClass is made available to any given Python script at run time. The Scala AClass:
class AClass {
  def getArray(fieldName: String): Array[Integer] = {
    val value: GenericData.Array[T] = [....]
    value
      .asInstanceOf[GenericData.Array[T]]
      .asScala
      .toArray[T]
  }
}
I've tried a few other return types: 1) List[Integer], 2) mutable.Buffer[Integer], and 3) the plain Avro generic array, GenericData.Array[T]. All give the same serialization error with slightly varying objects:
Runtime exception occurred during Python processing. TypeError: List(1, 2, 3) is not JSON serializable.
... Buffer(1, 2, 3) is not JSON serializable
... [1, 2, 3] is not JSON serializable.
Now it seems that if we convert it to a list() from within the Python script, it works fine. This gave some leads, but I need it to happen at the Scala level.
import json
json.dumps(list(<AClass>.getArray('myArray')))
Is there any way to achieve this? What Scala/Java list type would translate directly into the Python list type and/or be JSON serializable within the Jython interpreter?
The json module in Jython only accepts standard Jython data types (that is why converting to list() works). See: https://docs.python.org/3/library/json.html#py-to-json-table

Method showString([class java.lang.Integer, class java.lang.Integer, class java.lang.Boolean]) does not exist in PySpark

This is the snippet:
from pyspark import SparkContext
from pyspark.sql.session import SparkSession
sc = SparkContext()
spark = SparkSession(sc)
d = spark.read.format("csv").option("header", True).option("inferSchema", True).load('file.csv')
d.show()
After this, it runs into the error:
An error occurred while calling o163.showString. Trace:
py4j.Py4JException: Method showString([class java.lang.Integer, class java.lang.Integer, class java.lang.Boolean]) does not exist
All the other methods work well. I've tried researching a lot, but in vain. Any lead will be highly appreciated.
This is an indicator of a Spark version mismatch. Before Spark 2.3, the show method took only two arguments:
def show(self, n=20, truncate=True):
Since 2.3 it takes three arguments:
def show(self, n=20, truncate=True, vertical=False):
In your case the Python client seems to invoke the latter, while the JVM backend uses the older version.
Since SparkContext initialization underwent significant changes in 2.4, which would cause a failure in SparkContext.__init__, you're likely using:
2.3.x Python library.
2.2.x JARs.
You can confirm that by checking versions directly from your session, Python:
sc.version
vs. JVM:
sc._jsc.version()
Problems like this are usually a result of a misconfigured PYTHONPATH (either directly, or by using pip-installed PySpark on top of pre-existing Spark binaries) or SPARK_HOME.
On the spark-shell console, enter the variable name and see the data type.
Alternatively, you can press Tab twice after the variable name followed by a dot, and it will show the functions that can be applied.
Example of a DataFrame object.
res23: org.apache.spark.sql.DataFrame = [order_id: string, book_name: string ... 1 more field]

How can I use JSONata in Java?

JSONata is an expression language designed to query and transform JSON data structures.
I find that current implementations of JSONata are in JavaScript only (https://github.com/jsonata-js/jsonata).
I want to use JSONata in my Java code. It would make manipulating JSON documents in Java much easier.
A possible way could be to use the standard Java classes under the javax.script package to interact with the JavaScript-based JSONata implementation.
Has anyone already done this? Is there any sample code to demonstrate how this can be achieved?
Has anyone implemented other mechanisms of using JSONata in Java?
The following snippet shows how you could invoke the JSONata processor from Java using the embedded JavaScript engine...
ScriptEngineManager factory = new ScriptEngineManager();
ScriptEngine engine = factory.getEngineByName("JavaScript");
Invocable inv = (Invocable) engine;
FileReader jsonata = new FileReader("jsonata.js");
// load the JSONata processor
engine.eval(jsonata);
// read and JSON.parse the input data
byte[] sample = Files.readAllBytes(Paths.get("sample.json"));
engine.put("input", new String(sample));
Object inputjson = engine.eval("JSON.parse(input);");
// query the data
String expression = "$sum(Account.Order.Product.(Price * Quantity))"; // JSONata expression
Object expr = inv.invokeFunction("jsonata", expression);
Object resultjson = inv.invokeMethod(expr, "evaluate", inputjson);
// JSON.stringify the result
engine.put("resultjson", resultjson);
Object result = engine.eval("JSON.stringify(resultjson);");
System.out.println(result);
In this example, the jsonata.js file has been pulled down from the JSONata GitHub repo, along with the 'Invoice' sample data from try.jsonata.org.
Extra code would be needed to handle errors, but this gives the general idea.
I have just posted a Java implementation of JSONata called JSONata4Java.
Maven Central:
JSONata4Java jar files are located in Maven Central: https://search.maven.org/search?q=g:com.ibm.jsonata4java
pom.xml dependency:
<dependency>
  <groupId>com.ibm.jsonata4java</groupId>
  <artifactId>JSONata4Java</artifactId>
  <version>1.0.0</version>
</dependency>
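With that dependency on the classpath, usage is roughly along these lines. The Expressions and Jackson class names below follow my reading of the project README, so treat them as assumptions to verify against the version you pull in; the class name JsonataExample and the reduced sample JSON are mine.
import com.api.jsonata4java.expressions.Expressions;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonataExample {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        // Sample input roughly shaped like the 'Invoice' data used above.
        JsonNode input = mapper.readTree(
            "{\"Account\":{\"Order\":[{\"Product\":[{\"Price\":10,\"Quantity\":2}]}]}}");

        // Parse the JSONata expression once, then evaluate it against the input.
        Expressions expr = Expressions.parse("$sum(Account.Order.Product.(Price * Quantity))");
        JsonNode result = expr.evaluate(input);
        System.out.println(result);  // expected: 20
    }
}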
GitHub:
The Java port of the jsonata project, named JSONata4Java, has been posted here: https://github.com/IBM/JSONata4Java
If you would like to contribute, please print and sign the appropriate JSONata4Java CLA document and mail it to me:
IBM Corporation
c/o Nathaniel Mills
16 Deer Hill Ln.
Coventry, CT, 06238, US
Attn: OSS CLA Processing
Thanks in advance.
You can use the JSONata-Java project:
https://github.com/cow-co/jsonata-java
A Java port of the original (JavaScript) interpreter for the JSONata
JSON query and transformation language.

Scala interop with Java SAM without 2.12 M2 flag

Is there any accepted technique for writing Scala code against a Java 8 API which uses Java @FunctionalInterface / SAM / lambda expressions?
While Java lambda expression interop is available under a flag in 2.12 M2 (http://www.scala-lang.org/news/2.12.0-M2), I was rather hoping that a type class / AnyVal solution might work together with the scala.FunctionX traits.
Unfortunately, though, scala.FunctionX extends AnyRef and not Any, so one cannot use/mix these traits into an implicit AnyVal class implementation.
Added: I'm not entirely sure that I have thought out how I would achieve my aim even if scala.FunctionX were universal traits (extending from Any). My use case is this, though:
In a project of mine, I've chosen to provide a Java 8 API with FunctionalInterfaces like the Java Stream interfaces and classes, so as to cater for the widest possible audience of JVM-based client languages, e.g. Clojure, Scala, Kotlin. For each client language using my Java 8 API, I will write appropriate bindings (if necessary) to use language-specific idioms if accessing the Java 8 API feels clunky in that language.
By the way, I would be interested in any comments on this question taken in a Kotlin-Java interop context also.
This Scala program demonstrates one side of the coin for my question, that is, how to get Scala functions to masquerade as Java 8 Lambdas.
Syntactically and idiomatically this seems to work fine by creating some implicit Scala functions to convert Scala functions to their Java 8 FunctionalInterface counterpart types.
The caveat is, of course, that this method does not take advantage of Java 8's ability to optimize lambda creation via invokedynamic.
Accordingly, this approach results in a JVM object being created for each Scala function instance, which may impact memory usage and performance compared with native Java 8 lambdas.
For the flip side of the coin, that is, how to get Java 8 Lambdas to masquerade as Scala functions, I guess one would have to write some Java code to interop with Scala (if one's aim was to have a Scala API that was callable from Java).
Justin Johansson,
Microblogging about my Project Clockwork,
A new implementation of XPath/XQuery on the JVM,
as #MartianOdyssey on Twitter
https://twitter.com/MartianOdyssey
/**
* Scala Functions masquerading as Java 8 Lambdas.
*
* (C) Justin Johansson 2015.
*
* Microblogging about my Project Clockwork, a
* new implementation of XPath/XQuery on the JVM,
* as #MartianOdyssey on Twitter (https://twitter.com/MartianOdyssey).
*
* Permission to use this code is granted under Apache License,
* Version 2.0 and providing attribution is afforded to author,
* Justin Johansson.
*/
package lab
import scala.language.implicitConversions
import java.util.{ Arrays => JArrays, List => JList }
import java.util.function.{ Consumer => JConsumer, Function => JFunction, Predicate => JPredicate }
import java.util.stream.{ Stream => JStream }
object JLambda extends App {

  println("JLambda: Scala to Java 8 lambda test")

  implicit def func1ToJConsumer[T](func: T => Unit): JConsumer[T] = {
    new JConsumer[T] {
      def accept(arg: T): Unit = func(arg)
    }
  }

  implicit def func1ToJFunction[T, R](func: T => R): JFunction[T, R] = {
    new JFunction[T, R] {
      def apply(arg: T): R = func(arg)
    }
  }

  implicit def func1ToJPredicate[T](func: T => Boolean): JPredicate[T] = {
    new JPredicate[T] {
      def test(arg: T): Boolean = func(arg)
    }
  }

  val myList = JArrays.asList("cake", "banana", "apple", "coffee")
  println(s"myList = $myList")

  val myListFiltered: JStream[String] = myList.stream
    .filter { x: String => x.startsWith("c") }

  val myListFilteredAndMapped: JStream[String] = myListFiltered
    .map { x: String => x.toUpperCase }

  myListFilteredAndMapped.forEach { x: String => println(s"x=$x") }
}
/*
Outputs:
JLambda: Scala to Java 8 lambda test
myList = [cake, banana, apple, coffee]
x=CAKE
x=COFFEE
*/
By the way, I would be interested in any comments on this question taken in a Kotlin-Java interop context also.
Kotlin's FunctionX interfaces are SAMs, so there's no need to do anything extra to make Java 8 understand them.

Java nested Map to Scala nested sequence

I'm new to Scala and our project mixes Java and Scala code together (using the Play Framework). I'm trying to write a Scala method that can take a nested Java Map such as:
LinkedHashMap<String, LinkedHashMap<String, String>> groupingA = new LinkedHashMap<String, LinkedHashMap<String,String>>();
And have that Java object passed to a Scala function that can loop through it. I have the following Scala type definition to try to support the above Java nested map:
Seq[(String, Seq[(String,String)])]
Both the Java file and the Scala file compile fine individually, but when my java object tries to create a new instance of my scala class and pass in the nested map, I get a compiler error with the following details:
[error] ..... overloaded method value apply with alternatives:
[error] (options: java.util.List[String])scala.collection.mutable.Buffer[(String, String)] <and>
[error] (options: scala.collection.immutable.List[String])List[(String, String)] <and>
[error] (options: java.util.Map[String,String])Seq[(String, String)] <and>
[error] (options: scala.collection.immutable.Map[String,String])Seq[(String, String)] <and>
[error] (options: (String, String)*)Seq[(String, String)]
[error] cannot be applied to (java.util.LinkedHashMap[java.lang.String,java.util.LinkedHashMap[java.lang.String,java.lang.String]])
Any ideas on how I can pass a nested Java LinkedHashMap such as the above into a Scala file where I can generically iterate over the nested collection? I'm trying to write this generically enough that it would also work for a nested Scala collection, in case we ever switch to writing our Play Framework controllers in Scala instead of Java.
Seq is a base trait defined in the Scala collections hierarchy. While Java and Scala offer bytecode compatibility, Scala defines a number of its own types, including its own collection library. The rub here is that if you want to write idiomatic Scala, you need to convert your Java data to Scala data. The way I see it, you have a few options.
You can use Richard's solution and convert the Java types to Scala types in your Scala code. I think this is ugly because it assumes your input will always be coming from Java land.
You can write a beautiful, perfect Scala handler and provide a companion object that offers the ugly Java conversion behavior. This disentangles your Scala implementation from the Java details.
Or you could write an implicit def like the one below, genericizing it to your heart's content.
import java.util.LinkedHashMap
import scala.collection.JavaConversions.mapAsScalaMap

object App {

  implicit def wrapLhm[K, V, G](i: LinkedHashMap[K, LinkedHashMap[G, V]]): LHMWrapper[K, V, G] =
    new LHMWrapper[K, V, G](i)

  def main(args: Array[String]) {
    println("Hello World!")
    val lhm = new LinkedHashMap[String, LinkedHashMap[String, String]]()
    val inner = new LinkedHashMap[String, String]()
    inner.put("one", "one")
    lhm.put("outer", inner)
    val s = lhm.getSeq()
    println(s.toString())
  }

  class LHMWrapper[K, V, G](value: LinkedHashMap[K, LinkedHashMap[G, V]]) {
    def getSeq(): Seq[(K, Seq[(G, V)])] =
      mapAsScalaMap(value).mapValues(mapAsScalaMap(_).toSeq).toSeq
  }
}
Try this:
import scala.collection.JavaConversions.mapAsScalaMap
val lhm: LinkedHashMap[String, LinkedHashMap[String, String]] = getLHM()
val scalaMap = mapAsScalaMap(lhm).mapValues(mapAsScalaMap(_).toSeq).toSeq
I tested this, and got a result of type Seq[(String, Seq[(String, String)])].
(The conversions wrap the original Java object rather than actually creating a Scala object with a copy of the values. So the conversions to Seq aren't necessary; you could leave it as a Map, and the iteration order will be the same.)
Let me guess, are you processing query parameters?
