How to load and use a native library in the Scala REPL?

In the Scala REPL, simply calling System.loadLibrary("opencv_410") (when trying to load libopencv_410.so) does not let you use the native library. If you then try to create an object of a class that makes JNI calls, it throws java.lang.UnsatisfiedLinkError, as if no library had been loaded.
Welcome to Scala 2.12.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_201).
Type in expressions for evaluation. Or try :help.
scala> System.loadLibrary(org.opencv.core.Core.NATIVE_LIBRARY_NAME)
scala> new org.opencv.core.Mat()
java.lang.UnsatisfiedLinkError: org.opencv.core.Mat.n_Mat()J
at org.opencv.core.Mat.n_Mat(Native Method)
at org.opencv.core.Mat.<init>(Mat.java:26)
... 24 elided
scala>
A solution is provided below, though without an explanation of why it works.

To load a native library in the REPL, you have to load it on behalf of the interpreter class scala.tools.nsc.interpreter.IMain. Since the two-argument method loadLibrary0 of the Runtime class is not accessible from our scope, we use reflection to invoke it.
Welcome to Scala 2.12.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_201).
Type in expressions for evaluation. Or try :help.
scala> val loadLibrary0 = Runtime.getRuntime.getClass.getDeclaredMethods()(4)
loadLibrary0: java.lang.reflect.Method = synchronized void java.lang.Runtime.loadLibrary0(java.lang.Class,java.lang.String)

scala> loadLibrary0.setAccessible(true)

scala> loadLibrary0.invoke(Runtime.getRuntime, scala.tools.nsc.interpreter.ILoop.getClass, "opencv_java410")
res1: Object = null

scala> new org.opencv.core.Mat()
res2: org.opencv.core.Mat = Mat [ 0*0*CV_8UC1, isCont=false, isSubmat=false, nativeObj=0x7f5162f2a1f0, dataAddr=0x0 ]
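Indexing into getDeclaredMethods()(4) is fragile, since the order of declared methods is not guaranteed. A slightly more robust sketch (assuming a Java 8 JDK, where the private two-argument Runtime.loadLibrary0(Class, String) exists; later JDKs may differ) looks the method up by name instead:

// Load the native library on behalf of the REPL interpreter's class,
// so that classes compiled inside the REPL can resolve the JNI symbols.
val loadLibrary0 = classOf[Runtime].getDeclaredMethod(
  "loadLibrary0", classOf[Class[_]], classOf[String])
loadLibrary0.setAccessible(true)
loadLibrary0.invoke(
  Runtime.getRuntime,
  scala.tools.nsc.interpreter.ILoop.getClass,
  "opencv_java410")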

Related

Method showString([class java.lang.Integer, class java.lang.Integer, class java.lang.Boolean]) does not exist in PySpark

This is the snippet:
from pyspark import SparkContext
from pyspark.sql.session import SparkSession
sc = SparkContext()
spark = SparkSession(sc)
d = spark.read.format("csv").option("header", True).option("inferSchema", True).load('file.csv')
d.show()
Running this produces the error:
An error occurred while calling o163.showString. Trace:
py4j.Py4JException: Method showString([class java.lang.Integer, class java.lang.Integer, class java.lang.Boolean]) does not exist
All the other methods work well. I have tried researching this a lot, but in vain. Any lead will be highly appreciated.
This is an indicator of a Spark version mismatch. Before Spark 2.3, the show method took only two arguments:
def show(self, n=20, truncate=True):
Since 2.3, it takes three arguments:
def show(self, n=20, truncate=True, vertical=False):
In your case, the Python client seems to invoke the latter, while the JVM backend uses the older version.
Since SparkContext initialization underwent significant changes in 2.4 (which would cause a failure in SparkContext.__init__), you're likely using:
2.3.x Python library.
2.2.x JARs.
You can confirm that by checking the versions directly from your session. Python:
sc.version
vs. JVM:
sc._jsc.version()
Problems like this are usually a result of a misconfigured PYTHONPATH (either directly, or by using pip-installed PySpark on top of pre-existing Spark binaries) or SPARK_HOME.
On the spark-shell console, enter the variable name to see its data type.
As an alternative, you can press Tab twice after typing the variable name followed by a dot, and it will show the functions that can be applied.
Example with a DataFrame object:
res23: org.apache.spark.sql.DataFrame = [order_id: string, book_name: string ... 1 more field]

Java and Python Integration using Jep

I am trying Python and Java integration using Jep. I have loaded a random forest model from a pickle file (rf.pkl) as an sklearn.ensemble.forest.RandomForestClassifier object in a Java program using Jep.
I want this loading to happen only once, so I want to execute a Python function defined in the Python script prediction.py (to predict using the rf model), passing the "rfmodel" argument from Java when calling the Python function.
But the argument sent to Python from Java is read as a string in Python. How can I retain the argument's type in Python as sklearn.ensemble.forest.RandomForestClassifier?
Jep jep = new Jep();
jep.eval("import pickle");
jep.eval("clf = pickle.load(open('C:/Downloads/DSRFmodel.pkl', 'rb'))");
jep.eval("print(type(clf))");
Object randomForest = jep.getValue("clf");
jep.eval("import integration");
jep.set("arg1", requestId);
jep.set("arg2", randomForest);
jep.eval("result = integration.trainmodel(arg1, arg2)");
------------
python.py
import pickle

def trainmodel(requestid, rf):
    # when rf is printed here, it is of 'str' type
When Jep converts a Python object into a Java object and does not recognize the Python type, it returns the String representation of the Python object; see this bug for discussion of that behavior. If you are running the latest version of Jep (3.8), you can override this behavior by passing a Java class to the getValue function. The PyObject class was created to serve as a generic wrapper around arbitrary Python objects. The following code should do what you want:
Jep jep = new Jep();
jep.eval("import pickle");
jep.eval("clf = pickle.load(open('C:/Downloads/DSRFmodel.pkl', 'rb'))");
jep.eval("print(type(clf))");
Object randomForest = jep.getValue("clf", PyObject.class);
jep.eval("import integration");
jep.set("arg1", requestId);
jep.set("arg2", randomForest);
jep.eval("result = integration.trainmodel(arg1, arg2)");

convert java to scala code - change of method signatures

Trying to convert some Java code to Scala, I face the problem of a different method signature which compiled fine in the Java world.
The following Java code (from https://github.com/DataSystemsLab/GeoSpark/blob/master/babylon/src/main/java/org/datasyslab/babylon/showcase/Example.java#L122-L126)
visualizationOperator = new ScatterPlot(1000,600,USMainLandBoundary,false,-1,-1,true,true);
visualizationOperator.CustomizeColor(255, 255, 255, 255, Color.GREEN, true);
visualizationOperator.Visualize(sparkContext, spatialRDD);
imageGenerator = new SparkImageGenerator();
imageGenerator.SaveAsFile(visualizationOperator.distributedVectorImage, "file://"+outputPath,ImageType.SVG);
Is translated to https://github.com/geoHeil/geoSparkScalaSample/blob/master/src/main/scala/myOrg/visualization/Vis.scala#L45-L57
val vDistributedVector = new ScatterPlot(1000, 600, USMainLandBoundary, false, -1, -1, true, true)
vDistributedVector.CustomizeColor(255, 255, 255, 255, Color.GREEN, true)
vDistributedVector.Visualize(s, spatialRDD)
sparkImageGenerator.SaveAsFile(vDistributedVector.distributedVectorImage, outputPath + "distributedVector", ImageType.SVG)
Which will throw the following error:
overloaded method value SaveAsFile with alternatives:
[error] (x$1: java.util.List[String],x$2: String,x$3: org.datasyslab.babylon.utils.ImageType)Boolean <and>
[error] (x$1: java.awt.image.BufferedImage,x$2: String,x$3: org.datasyslab.babylon.utils.ImageType)Boolean <and>
[error] (x$1: org.apache.spark.api.java.JavaPairRDD,x$2: String,x$3: org.datasyslab.babylon.utils.ImageType)Boolean
[error] cannot be applied to (org.apache.spark.api.java.JavaPairRDD[Integer,String], String, org.datasyslab.babylon.utils.ImageType)
[error] sparkImageGenerator.SaveAsFile(vDistributedVector.distributedVectorImage, outputPath + "distributedVector", ImageType.SVG)
Unfortunately, I am not really sure how to fix this or how to properly call the method in Scala.
This is a problem in ImageGenerator, inherited by SparkImageGenerator. As you can see here, it has a method
public boolean SaveAsFile(JavaPairRDD distributedImage, String outputPath, ImageType imageType)
which uses a raw type (JavaPairRDD without <...>). Raw types exist primarily for compatibility with pre-Java 5 code and shouldn't normally be used otherwise. For this code there is certainly no good reason, as it actually expects specific type parameters; using raw types merely loses type safety. Maybe some subclasses (current or potential) might override it and expect different type parameters, but that would be a misuse of inheritance and there must be a better solution.
Scala doesn't support raw types in any way, so you can't call this method from it (AFAIK). As a workaround, you could write a wrapper in Java which uses correct types and call this wrapper from Scala. (Correction: I misremembered; it's extending Java classes that extend raw types which is impossible, and even then there are workarounds.)
You might be able to call it by explicit type ascription (preferable to casting):
sparkImageGenerator.SaveAsFile(
(vDistributedVector.distributedVectorImage: JavaPairRDD[_, _]),
outputPath + "distributedVector", ImageType.SVG)
But given the error message shows just JavaPairRDD, I don't particularly expect it to work. If this fails, I'd still go with a Java wrapper.
The accepted answer is correct in saying that raw types should be avoided. However, Scala can interoperate with Java code that has raw types. Scala interprets the raw type java.util.List as the existential type java.util.List[_].
Take, for example, this Java code:
// Test.java
import java.util.Map;

public class Test {
    public boolean foo(Map map, String s) {
        return true;
    }
}
Then try to call it from Scala:
Welcome to Scala 2.12.1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131).
Type in expressions for evaluation. Or try :help.
scala> import java.util.{Map,HashMap}
import java.util.{Map,HashMap}
scala> new Test().foo(new HashMap[String,Integer], "a")
res0: Boolean = true
scala> val h: Map[_,_] = new HashMap[String,Integer]
h: java.util.Map[_, _] = {}
scala> new Test().foo(h, "a")
res1: Boolean = true
So it looks like there must be some other problem.

Scala interop with Java SAM without 2.12 M2 flag

Is there any accepted technique of writing Scala code against a Java 8 API which uses Java @FunctionalInterface / SAM / lambda expressions?
While Java lambda expression interop is available under a flag in 2.12 M2 (http://www.scala-lang.org/news/2.12.0-M2), I was rather hoping that a type class / AnyVal solution might work together with the scala.FunctionX traits.
Unfortunately, though, scala.FunctionX extends AnyRef and not Any, so one cannot use/mix these traits into an implicit AnyVal class implementation.
Added: I'm not entirely sure that I have thought out how I would achieve my aim even if scala.FunctionX were universal traits (extending Any). My use case is this, though:
In a project of mine, I've chosen to provide a Java 8 API with FunctionalInterfaces like the Java Stream interfaces and classes, so as to cater for the widest possible audience of JVM-based client languages, e.g. Clojure, Scala, Kotlin. For each client language using my Java 8 API, I will write appropriate bindings (if necessary) to use language-specific idioms, in the event that accessing the Java 8 API feels clunky in that language.
By the way, I would also be interested in any comments on this question taken in a Kotlin-Java interop context.
This Scala program demonstrates one side of the coin for my question, that is, how to get Scala functions to masquerade as Java 8 Lambdas.
Syntactically and idiomatically this seems to work fine, by defining implicit conversions from Scala functions to their Java 8 FunctionalInterface counterpart types.
The caveat is, of course, that this method does not take advantage of Java 8's ability to optimize lambda creation via invokedynamic.
Accordingly this approach results in a JVM object being created for the Scala function instance and this may impact upon memory usage and performance compared with Java 8 native lambdas.
For the flip side of the coin, that is, how to get Java 8 Lambdas to masquerade as Scala functions, I guess one would have to write some Java code to interop with Scala (if one's aim was to have a Scala API that was callable from Java).
Justin Johansson,
Microblogging about my Project Clockwork,
A new implementation of XPath/XQuery on the JVM,
as @MartianOdyssey on Twitter
https://twitter.com/MartianOdyssey
/**
 * Scala Functions masquerading as Java 8 Lambdas.
 *
 * (C) Justin Johansson 2015.
 *
 * Microblogging about my Project Clockwork, a
 * new implementation of XPath/XQuery on the JVM,
 * as @MartianOdyssey on Twitter (https://twitter.com/MartianOdyssey).
 *
 * Permission to use this code is granted under Apache License,
 * Version 2.0 and providing attribution is afforded to author,
 * Justin Johansson.
 */
package lab

import scala.language.implicitConversions

import java.util.{ Arrays => JArrays, List => JList }
import java.util.function.{ Consumer => JConsumer, Function => JFunction, Predicate => JPredicate }
import java.util.stream.{ Stream => JStream }

object JLambda extends App {

  println("JLambda: Scala to Java 8 lambda test")

  implicit def func1ToJConsumer[T](func: T => Unit): JConsumer[T] = {
    new JConsumer[T] {
      def accept(arg: T): Unit = func(arg)
    }
  }

  implicit def func1ToJFunction[T, R](func: T => R): JFunction[T, R] = {
    new JFunction[T, R] {
      def apply(arg: T): R = func(arg)
    }
  }

  implicit def func1ToJPredicate[T](func: T => Boolean): JPredicate[T] = {
    new JPredicate[T] {
      def test(arg: T): Boolean = func(arg)
    }
  }

  val myList = JArrays.asList("cake", "banana", "apple", "coffee")
  println(s"myList = $myList")

  val myListFiltered: JStream[String] = myList.stream
    .filter { x: String => x.startsWith("c") }

  val myListFilteredAndMapped: JStream[String] = myListFiltered
    .map { x: String => x.toUpperCase }

  myListFilteredAndMapped.forEach { x: String => println(s"x=$x") }
}
/*
Outputs:
JLambda: Scala to Java 8 lambda test
myList = [cake, banana, apple, coffee]
x=CAKE
x=COFFEE
*/
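For the flip side mentioned above (letting Java 8 lambdas masquerade as Scala functions), a minimal sketch under the same pre-2.12 assumptions is a set of implicit conversions in the opposite direction; the names below are illustrative, not an established library API:

import scala.language.implicitConversions
import java.util.function.{ Function => JFunction, Predicate => JPredicate }

object JLambdaReverse {

  // Wrap a java.util.function.Function as a Scala Function1.
  implicit def jFunctionToFunc1[T, R](jf: JFunction[T, R]): T => R =
    (t: T) => jf.apply(t)

  // Wrap a java.util.function.Predicate as a Scala predicate function.
  implicit def jPredicateToFunc1[T](jp: JPredicate[T]): T => Boolean =
    (t: T) => jp.test(t)
}

The same caveat applies in this direction: each wrapped lambda allocates an extra function object rather than relying on invokedynamic.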
By the way, I would also be interested in any comments on this question taken in a Kotlin-Java interop context.
Kotlin's FunctionX interfaces are SAMs, so there's no need to do anything extra to make Java 8 understand them.

How to provide a codec to the SaveAsSequenceFile method in Spark?

I am trying to figure out how to pass a codec to the saveAsSequenceFile method in Apache Spark. Below is the code I am trying to run. I am running Scala 2.10.4, Spark 1.0.0, Java 1.7.60, and Apache Hadoop 2.4.0.
val rdd:RDD[(String, String)] = sc.sequenceFile(secPath,
classOf[Text],
classOf[Text]
).map { case (k,v) => (k.toString, v.toString)}
val sortedOutput = rdd.sortByKey(true, 1)
sortedOutput.saveAsSequenceFile(secPathOut)
My issue is that I am new to Spark and Scala. I do not understand what the javadoc means for the codec variable passed to the saveAsSequenceFile method.
def saveAsSequenceFile(path: String, codec: Option[Class[_ <: CompressionCodec]] = None): Unit
What does the <: mean? I get that the codec is optional, because when I run the above code it works. Could someone please show an example of a properly formatted codec call to this method?
Thanks!
The <: indicates that the class you pass in must extend org.apache.hadoop.io.compress.CompressionCodec (read this); Spark uses a lot of HDFS features and is pretty heavily integrated with Hadoop at this point. This means you can pass the class of any of the following as the codec: BZip2Codec, DefaultCodec, GzipCodec. There are likely also other extensions of CompressionCodec not built into Hadoop. Here is an example of calling the method:
sc.parallelize(List((1,2))).saveAsSequenceFile("path",Some(classOf[GzipCodec]))
Option[...] is used in Scala in favor of Java's null, even though null exists in Scala. An Option can be Some(...) or None.
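Applied to the code from the question, a sketch (reusing the question's secPath and secPathOut values, and assuming the Hadoop codec classes are on the classpath) looks like this:

import org.apache.hadoop.io.Text
import org.apache.hadoop.io.compress.GzipCodec
import org.apache.spark.rdd.RDD

val rdd: RDD[(String, String)] = sc.sequenceFile(secPath, classOf[Text], classOf[Text])
  .map { case (k, v) => (k.toString, v.toString) }

val sortedOutput = rdd.sortByKey(true, 1)

// Some(...) selects the codec; None (the default) writes an uncompressed sequence file.
sortedOutput.saveAsSequenceFile(secPathOut, Some(classOf[GzipCodec]))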
