I have a Java program with a function that I wrote: I send it a list of strings, the function encrypts each element and returns a list with the encrypted elements.
My problem is this:
I need to use this function from a Python script (send a Python "list" object as input and receive a Java "ArrayList" object).
How can I call a Java function that I wrote from a Python script?
And are the list objects compatible between Python and Java (list vs. ArrayList)?
A big thank you to all!
** Edit: I'm about to use this entire package in an AWS Lambda function **
The main decisions for choosing a solution seem to be:
What do we use to execute the Java program?
How do we transfer computed data from the Java program to the Python program?
For example, you could decide to launch the JVM and execute the Java program via a call to the operating system from Python.
The computed data could be written to standard output (in some suitable format) and read in and processed by Python. (See the link for the OS call and I/O.)
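A minimal sketch of that approach, assuming a hypothetical Java class named Encryptor whose main() reads one string per line from stdin and prints the encrypted list as JSON to stdout (the class name and the I/O format are illustrative, not part of the question):

import json
import subprocess

plaintexts = ["alpha", "beta", "gamma"]

# Launch the JVM via an operating-system call and feed the Python list
# to the Java program, one element per line on stdin.
result = subprocess.run(
    ["java", "Encryptor"],
    input="\n".join(plaintexts),
    capture_output=True,
    text=True,
    check=True,
)

# The Java side prints its ArrayList as JSON; json.loads turns it back
# into a plain Python list. This also answers the list-vs-ArrayList
# concern: the objects are never shared, only a serialized
# representation crosses the process boundary.
encrypted = json.loads(result.stdout)
print(encrypted)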
Related
I am trying to understand how Apache PySpark works. The video Spark Python API - Josh Rosen says the Python API is a wrapper over the Java API; internally it invokes Java methods (see around timestamp 6:41):
https://www.youtube.com/watch?v=mJXl7t_k0wE
This documentation says the Java API is a wrapper over the Scala API:
https://cwiki.apache.org/confluence/display/SPARK/Java+API+Internals
I have a few questions, as mentioned below:
1) So does that mean that for each method in PySpark, such as map, reduce, etc., it will invoke the corresponding method (say map) in Java, and the Java code will in turn invoke the similar method (map) in Scala? Actual execution happens through the Scala code, and the results are returned from Scala -> Java -> Python in reverse order again.
2) Also, are the closures/functions which are used for "map" also sent from Python -> Java -> Scala?
3) In the PySpark source, the RDD class is defined as:

class RDD(object):
    """
    A Resilient Distributed Dataset (RDD), the basic abstraction in Spark.
    Represents an immutable, partitioned collection of elements that can be
    operated on in parallel.
    """
    def __init__(self, jrdd, ctx,
                 jrdd_deserializer=AutoBatchedSerializer(PickleSerializer())):
        self._jrdd = jrdd
        self.is_cached = False
        self.is_checkpointed = False
        self.ctx = ctx
        self._jrdd_deserializer = jrdd_deserializer
        self._id = jrdd.id()
        self.partitioner = None
Does self._jrdd represent the Java version of that particular RDD?
5) I am using PySpark in IntelliJ and have loaded the source code from https://spark.apache.org/downloads.html.
Is it possible to debug down from PySpark to the Scala API for any function invocation, e.g. the "map" function? When I tried, I could see some Java-related functions being invoked, but after that I could not move forward in IntelliJ's debug mode.
Any help/explanation/pointers will be appreciated.
So does that mean that for each method in PySpark, such as map, reduce, etc., it will invoke the corresponding method (say map) in Java, and the Java code will in turn invoke the similar method (map) in Scala?
Yes and no. First of all, Java and Scala compile to the same bytecode; by the time the code is executed, both run in the same context. Python is a different story: with RDDs, the internal mechanics differ from the JVM languages, the JVM serves mostly as a transport layer, and the worker-side code is Python. With Spark SQL there is no worker-side Python.
Also, are the closures/functions which are used for "map" also sent from Python -> Java -> Scala?
Serialized versions are sent via the JVM, but the execution context is Python.
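As a rough illustration: PySpark serializes the closure on the driver (it uses cloudpickle, an extension of pickle); the JVM only ships the resulting bytes to the Python workers, which unpickle and run them. A sketch with the standard pickle module standing in for cloudpickle:

import pickle

def add_one(x):
    return x + 1

# The driver turns the "map" closure into bytes; the JVM treats this
# payload as opaque and merely forwards it to the Python workers.
payload = pickle.dumps(add_one)

# On the worker side the bytes are unpickled and executed in Python.
restored = pickle.loads(payload)
print(restored(41))  # 42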
Does self._jrdd represent the Java version of that particular RDD?
Yes, it does.
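You can see this from a PySpark shell: _jrdd is a Py4J proxy whose method calls are forwarded over a socket to the JVM. A small sketch, assuming a local SparkContext named sc:

rdd = sc.parallelize([1, 2, 3])

# _jrdd is not a Python object graph but a Py4J handle to the JVM-side
# RDD; calling toString() on it executes in the JVM and returns the result.
print(type(rdd._jrdd))        # py4j.java_gateway.JavaObject
print(rdd._jrdd.toString())   # the Java RDD's description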
Is it possible to debug down from PySpark to the Scala API for any function invocation, e.g. the "map" function?
See: How can pyspark be called in debug mode?
I'm trying to pipe data from Spark (an app written in Java) to a C++ executable.
My RDDs look like JavaRDD<CustomMatrix>, where CustomMatrix implements Serializable. It is made up of metadata (int, long, String, ...) and a short[][].
Other transformations, like map/flatMap/..., work well.
I would like to send the array (short[][]) to a C++ program, to perform some transformations, and get back the modified array.
I used the pipe function to pipe data as String to a C++ executable, but now I have to serialize my data and send it to the C++ executable. Does anybody have an idea how this can be handled effectively?
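For reference, pipe() hands each element to the external process as a line of text on stdin and reads lines back from its stdout, which is why a binary payload such as a short[][] has to be encoded as text (e.g. base64) on the way through. A minimal sketch in the Python API, with cat standing in for the C++ executable (the Java pipe works the same way):

from pyspark import SparkContext

sc = SparkContext("local", "pipe-demo")

# Each element becomes one line on the child process's stdin; each line
# the process prints becomes one element of the resulting RDD.
rdd = sc.parallelize(["1 2 3", "4 5 6"])
piped = rdd.pipe("cat")        # "cat" stands in for the C++ executable
print(piped.collect())         # ['1 2 3', '4 5 6']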
I have a Fortran program that is calling into Java using JNI. My Java function receives an array, writes the array to a file, and makes a system call to a Python function that computes something and writes the result to a file, which in turn is read by the Java function and passed back to Fortran. This works as expected.
Unfortunately, I cannot use Jython because Jython does not support NumPy yet.
The serial implementation of my program works as expected but when I run the parallel implementation of Fortran code that uses OpenMP, file I/O is messed up. Is there any way I can safely read/write from files with the parallel implementation?
I assume that you use hard-coded filenames. The problem is that all active threads are using the same files to pass data to the next program. Try to separate them: if you are running 3 OpenMP threads, then you need 3 files for data transfer.
To separate them, you could name your files based on UUIDs and pass the filename to your Python program as a parameter.
// Use a unique filename per thread so concurrent OpenMP threads don't clash
String filename = "myFile" + UUID.randomUUID() + ".dat";
Process p = Runtime.getRuntime().exec("python myProgram.py " + filename);
p.waitFor(); // block until the Python program has finished writing its result
Python program:
import sys
# argv[0] is the script name; the filename passed by Java is argv[1]
print('using file: ' + sys.argv[1])
Is it possible to access a pointer created in C++ from Java? For example, if I make a string in C++ and make a pointer to the variable (giving the variable a memory location), is there some command in Java that would let me take that pointer and view it? Or would I have to write the string out to a file and then perform Java file I/O?
You would have to convert it into something Java understands through JNI. JNI has a method to convert your pointer to a string, but Java will copy the memory and create a regular Java String out of it; changing your memory after giving it to Java will not change the Java string.
I don't think even JNI allows communication through direct memory access, but I could be wrong; I haven't looked at it lately.
You should be able to use JNI: http://java.sun.com/docs/books/jni/
You could also try using SWIG: http://www.swig.org/
But before you dive in, you should evaluate whether you really need to do that. Are you just trying to share data? You could use networking for that: pass a TCP message between the two programs. Many options exist for sharing data.
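A minimal sketch of the networking option, with a Python listener standing in for one of the two programs (host, port, and buffer size are arbitrary choices for illustration):

import socket

# Listen on localhost; the other program (C++ or Java) would connect
# and send the string's bytes instead of sharing a raw pointer.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 5005))
server.listen(1)

conn, _ = server.accept()
data = conn.recv(4096)          # receive the serialized string
print(data.decode("utf-8"))

conn.close()
server.close()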
I have searched many ways, but I didn't find information about how to write data to a file when calling a database function (PostgreSQL) from a Java program. Please clarify this for me.
Thanks in advance.
I assume you want the file to be written on the database server, not on the application server (where the Java program is running).
You need to implement a stored function in an untrusted language (like Python or C). You would then call that function either from Java or from within the already existing function.
Here is an example that I found when googling for this:
CREATE FUNCTION makefile(text)
RETURNS text AS
$$
o = open("/path/to/file", "w")  # "w" is needed; the default mode is read-only
o.write(args[0])
o.close()
return "ok"
$$
LANGUAGE plpythonu;
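Calling the stored function is then an ordinary query. A sketch from Python with psycopg2 (connection parameters are placeholders); from Java it would be the equivalent SELECT through JDBC:

import psycopg2

# Placeholder connection details; the file itself is written by the
# server process on the database host, at the function's hard-coded path.
conn = psycopg2.connect(dbname="mydb", user="me", password="secret")
cur = conn.cursor()
cur.execute("SELECT makefile(%s)", ("data to write on the server",))
print(cur.fetchone()[0])  # "ok"
conn.close()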
Here is another one:
http://www.leidecker.info/pgshell/Having_Fun_With_PostgreSQL.txt
A (very) short description is also given here:
http://archives.postgresql.org/pgsql-novice/2007-01/msg00010.php