How to write a custom Protobuf CodeGenerator in Java


I'm trying to write a custom code generator for an in-house proprietary programming language. I figured I could write the generator in Java, using the protoc plugin guide. My main() does something like this:
public static void main(String[] args) throws IOException {
    CodeGenerator gen = new CodeGenerator();
    PluginProtos.CodeGeneratorRequest codeGeneratorRequest = PluginProtos.CodeGeneratorRequest.parseFrom(args[0].getBytes());
    codeGeneratorRequest.getProtoFileList().forEach(gen::handleFile);
    // get the response and do something with it
    //PluginProtos.CodeGeneratorResponse response = PluginProtos.CodeGeneratorResponse.newBuilder().build();
    //response.writeTo(System.out);
}
(Obviously I've only just started; I wanted to get something stubby working before actually writing the generation logic.)
Problem is: how do I invoke protoc with the --plugin argument to generate code in my custom language, using my plugin? I tried writing a shell script to do it like this:
#!/bin/bash
java -cp ./codegen.jar CodeGeneratorMain "$@"
And I tried invoking protoc like this:
protoc --plugin=protoc-gen-code --code_out=./build hello.proto
However, when I run that, I get this error:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
at CodeGeneratorMain.main(CodeGeneratorMain.java:12)
--code_out: protoc-gen-code: Plugin failed with status code 1.
It's as though protoc isn't passing the CodeGeneratorRequest on stdin at all. How would I verify that? Am I doing something obviously wrong?

After reading and re-reading the docs, I realized my very silly error: protoc passes the serialized CodeGeneratorRequest via stdin, not via argv. So if I change this:
PluginProtos.CodeGeneratorRequest codeGeneratorRequest = PluginProtos.CodeGeneratorRequest.parseFrom(args[0].getBytes());
to this:
PluginProtos.CodeGeneratorRequest codeGeneratorRequest = PluginProtos.CodeGeneratorRequest.parseFrom(System.in);
it works. The response travels the same way in reverse: the plugin writes a serialized CodeGeneratorResponse to stdout.
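For reference, a complete plugin along these lines might look like the sketch below. It assumes protobuf-java (com.google.protobuf:protobuf-java) is on the classpath, which provides the PluginProtos and DescriptorProtos classes; the .code file extension and the output format are hypothetical placeholders for the in-house language. Because protoc reads the response from stdout, any logging has to go to System.err.

```java
import com.google.protobuf.DescriptorProtos.DescriptorProto;
import com.google.protobuf.DescriptorProtos.FieldDescriptorProto;
import com.google.protobuf.DescriptorProtos.FileDescriptorProto;
import com.google.protobuf.compiler.PluginProtos.CodeGeneratorRequest;
import com.google.protobuf.compiler.PluginProtos.CodeGeneratorResponse;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class CodeGeneratorMain {

    public static void main(String[] args) throws IOException {
        // protoc sends the serialized CodeGeneratorRequest on stdin, never in argv
        CodeGeneratorRequest request = CodeGeneratorRequest.parseFrom(System.in);
        CodeGeneratorResponse response = generate(request);
        // protoc reads the serialized response from stdout, so log to System.err only
        response.writeTo(System.out);
        System.out.flush();
    }

    static CodeGeneratorResponse generate(CodeGeneratorRequest request) {
        CodeGeneratorResponse.Builder response = CodeGeneratorResponse.newBuilder();
        // proto_file also lists imported dependencies; only generate for the
        // files that were named on the protoc command line
        Set<String> toGenerate = new HashSet<>(request.getFileToGenerateList());
        for (FileDescriptorProto file : request.getProtoFileList()) {
            if (!toGenerate.contains(file.getName())) {
                continue;
            }
            // Hypothetical placeholder output format for the in-house language
            StringBuilder out = new StringBuilder();
            for (DescriptorProto message : file.getMessageTypeList()) {
                out.append("message ").append(message.getName()).append('\n');
                for (FieldDescriptorProto field : message.getFieldList()) {
                    out.append("  field ").append(field.getName())
                       .append(" = ").append(field.getNumber()).append('\n');
                }
            }
            // Output names are relative to the --code_out directory
            String outName = file.getName().replaceAll("\\.proto$", "") + ".code";
            response.addFile(CodeGeneratorResponse.File.newBuilder()
                    .setName(outName)
                    .setContent(out.toString())
                    .build());
        }
        return response.build();
    }
}
```

With the wrapper script saved as an executable protoc-gen-code (chmod +x), protoc can also be pointed at it explicitly using the NAME=PATH form: protoc --plugin=protoc-gen-code=./protoc-gen-code --code_out=./build hello.proto.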

Related

pyspark: call a custom java function from pyspark. Do I need Java_Gateway?

I wrote the following MyPythonGateway.java so that I can call my custom Java class from Python:
import py4j.GatewayServer;

public class MyPythonGateway {
    public String findMyNum(String input) {
        return MyUtility.parse(input).getMyNum();
    }

    public static void main(String[] args) {
        GatewayServer server = new GatewayServer(new MyPythonGateway());
        server.start();
    }
}
And here is how I use it in my Python code:
from py4j.java_gateway import JavaGateway

def main():
    gateway = JavaGateway()  # connect to the JVM
    myObj = gateway.entry_point.findMyNum("1234 GOOD DAY")
    print(myObj)

if __name__ == '__main__':
    main()
Now I want to use MyPythonGateway.findMyNum() function from PySpark, not just a standalone python script. I did the following:
myNum = sparkcontext._jvm.myPackage.MyPythonGateway.findMyNum("1234 GOOD DAY")
print(myNum)
However, I got the following error:
... line 43, in main:
myNum = sparkcontext._jvm.myPackage.MyPythonGateway.findMyNum("1234 GOOD DAY")
File "/home/edamameQ/spark-1.5.2/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 726, in __getattr__
py4j.protocol.Py4JError: Trying to call a package.
So what did I miss here? I don't know whether I should run MyPythonGateway as a separate Java application to start a gateway server when using PySpark. Please advise. Thanks!
Below is exactly what I need:
input.map(f)

def f(row):
    # call MyUtility.java
    # x = MyUtility.parse(row).getMyNum()
    # return x
What would be the best way to approach this? Thanks!

First of all, the error you see usually means the class you're trying to use is not accessible, so it is most likely a CLASSPATH issue.
Regarding the general idea, there are two important issues:
1. You cannot access the SparkContext inside an action or transformation, so using the PySpark gateway won't work (see How to use Java/Scala function from an action or a transformation? for some details). If you want to use Py4J from the workers, you'll have to start a separate gateway on each worker machine.
2. You really don't want to pass data between Python and the JVM this way. Py4J is not designed for data-intensive tasks.

In PySpark, before calling the method
myNum = sparkcontext._jvm.myPackage.MyPythonGateway.findMyNum("1234 GOOD DAY")
you have to import the MyPythonGateway Java class as follows:
from py4j.java_gateway import java_import
java_import(sparkContext._jvm, "myPackage.MyPythonGateway")
myPythonGateway = sparkContext._jvm.MyPythonGateway()
myPythonGateway.findMyNum("1234 GOOD DAY")
Also specify the jar containing myPackage.MyPythonGateway with the --jars option of spark-submit.

If input in input.map(f) is an RDD, for example, the following might work. You can't access the JVM handle (attached to the SparkContext) inside an executor's map function over an RDD, and to my knowledge PySpark has no equivalent of Scala's @transient lazy val, so each partition opens its own gateway connection:
import py4j.java_gateway

def pythonGatewayIterator(iterator):
    # Connects to a GatewayServer that must already be running on this worker
    jvm = py4j.java_gateway.JavaGateway().jvm
    mygw = jvm.myPackage.MyPythonGateway()
    results = []
    for value in iterator:
        results.append(mygw.findMyNum(value))
    return results

input.mapPartitions(pythonGatewayIterator)

All you need to do is compile a jar and add it to the PySpark classpath with the --jars or --driver-class-path spark-submit options. Then access the class and method with the code below:
sc._jvm.com.company.MyClass.func1()
where sc is the SparkContext.
Tested with Spark 2.3. Keep in mind that you can call a JVM class method only from the driver program, not from an executor.
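The call sc._jvm.com.company.MyClass.func1() invokes a method directly on the class, which only works for a static method; findMyNum in the question is an instance method, so it needs an object first. A minimal sketch of a driver-side helper in that static shape, where MyNumFinder is a hypothetical name and the digit extraction is a placeholder for MyUtility.parse(input).getMyNum(), whose implementation the question doesn't show:

```java
// In a real jar this would sit in a package, e.g. myPackage.
// Py4J can only reach public classes and public methods.
public class MyNumFinder {

    /**
     * Hypothetical placeholder for MyUtility.parse(input).getMyNum():
     * returns the leading whitespace-separated token if it is all digits,
     * otherwise null.
     */
    public static String findMyNum(String input) {
        if (input == null || input.isBlank()) {
            return null;
        }
        String first = input.trim().split("\\s+", 2)[0];
        return first.chars().allMatch(Character::isDigit) ? first : null;
    }
}
```

Packaged as myPackage.MyNumFinder in a jar passed via --jars or --driver-class-path, the driver could then call sc._jvm.myPackage.MyNumFinder.findMyNum("1234 GOOD DAY"). As noted above, this works on the driver only, not inside input.map(f).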

Running a Python program in Java using Jython

I wrote a Python program that consists of five .py script files.
I want to execute the main script from within a Java application.
What are my options? Using the PythonInterpreter doesn't work because, for example, the datetime module can't be loaded from Jython (and I don't want users to have to configure their Python path for those dependencies to work).
I compiled the whole folder to .class files using Jython's compileall. Can I embed these .class files somehow to execute the main file from within my Java Application, or how should I proceed?

Have a look at Java's ProcessBuilder class: https://docs.oracle.com/javase/7/docs/api/java/lang/ProcessBuilder.html.
The arguments passed to the constructor should be the same as what you would type on the command line. For example:
Process p = new ProcessBuilder("python", "myScript.py", "firstargument").start();
(ProcessBuilder does the same job as Python's subprocess module.)
Have a look at running scripts through processbuilder
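A slightly fuller version of the ProcessBuilder approach, as a sketch: it merges stderr into stdout, drains the output before waiting for the exit code (a child blocked on a full pipe buffer would otherwise never exit), and returns both. ScriptRunner and Result are hypothetical names; InputStream.readAllBytes() needs Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Runs an external command (e.g. a CPython script) and captures its output. */
public class ScriptRunner {

    /** Exit code plus combined stdout/stderr of a finished process. */
    public static final class Result {
        public final int exitCode;
        public final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    public static Result run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Merge stderr into stdout so only one stream has to be drained
        builder.redirectErrorStream(true);
        Process process = builder.start();
        // The script gets no input; closing its stdin signals EOF right away
        process.getOutputStream().close();
        String output;
        try (InputStream in = process.getInputStream()) {
            // Read everything before waitFor(), so the child never blocks on a full pipe
            output = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        int exitCode = process.waitFor();
        return new Result(exitCode, output);
    }
}
```

For the question this would be called as ScriptRunner.run(List.of("python", "myScript.py", "firstargument")). Note that it relies on a python executable on the PATH, which is the dependency the question hoped to avoid; in exchange, every CPython module such as datetime is available.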
N.B. As for the Jython part of the question, have a look at the FAQ section of the Jython website (www.jython.org), in particular the entry "use jython from java".

I'm also interested in running Python code directly within Java, using Jython, and avoiding the need for an installed Python interpreter.
The article 'Embedding Jython in Java Applications' explains how to reference an external *.py Python script and pass it arguments, with no installed Python interpreter necessary:
# pymodule.py - make this file accessible to your Java code
def square(value):
    return value*value
This function can then be executed either by creating a string that
executes it, or by retrieving a pointer to the function and calling
its call method with the correct parameters:
//Java code implementing Jython and calling pymodule.py
import org.python.util.PythonInterpreter;
import org.python.core.*;

public class ImportExample {
    public static void main(String[] args) throws PyException {
        PythonInterpreter pi = new PythonInterpreter();
        pi.exec("from pymodule import square");
        pi.set("integer", new PyInteger(42));
        pi.exec("result = square(integer)");
        pi.exec("print(result)");
        PyInteger result = (PyInteger) pi.get("result");
        System.out.println("result: " + result.asInt());
        PyFunction pf = (PyFunction) pi.get("square");
        System.out.println(pf.__call__(new PyInteger(5)));
    }
}
Jython's Maven/Gradle/etc dependency strings can be found at http://mvnrepository.com/artifact/org.python/jython-standalone/2.7.1
Jython JavaDoc

It is possible to load other modules; you just need to specify the Python path where your custom modules can be found. See the following test case: I use the Python datetime/math modules inside my calling function (my_maths()), and I have multiple Python files on the python.path which are imported by main.py.
@Test
public void testJython() {
    Properties properties = System.getProperties();
    properties.put("python.path", ".\\src\\test\\resources");
    PythonInterpreter.initialize(System.getProperties(), properties, new String[0]);
    PythonInterpreter interpreter = new PythonInterpreter();
    interpreter.execfile(".\\src\\test\\resources\\main.py");
    interpreter.set("id", 150); // set variable value
    interpreter.exec("val = my_maths(id)"); // the calling function in main.py
    Integer returnVal = (Integer) interpreter.eval("val").__tojava__(Integer.class);
    System.out.println("return from python: " + returnVal);
}

Using Expect for Groovy to automate an interactive CLI session

I'm using this code:
http://groovy.codehaus.org/Expect+for+Groovy
to try to automate a Python-based CLI.
My test main function is below.
However, when I run it, it never seems to actually read any data from the process.
If I change the process to /bin/ls and expect some file name, it works correctly, which leads me to believe it can't handle the fact that Python is waiting for input, whereas /bin/ls closes the stream and flushes it.
Any ideas? Thanks.
public static void test2(String[] args) {
    println "Main"
    def builder = new ProcessBuilder("/usr/bin/python");
    // merge stderr into stdout; the no-arg overload only reads this flag
    builder.redirectErrorStream(true)
    builder.redirectOutput(ProcessBuilder.Redirect.PIPE);
    builder.redirectInput(ProcessBuilder.Redirect.PIPE);
    def expectSession = new IOSession(builder.start());
    expectSession.expect(">>>");
    expectSession.send("print('%d' % (1+1))")
    expectSession.expect("2");
    expectSession.send("quit()");
    expectSession.close();
    println "Done...";
}

Looking through the source for IOSession, it looks like this might be a bug in the constructor. Try:
def expectSession = new IOSession();
expectSession.addProcess(builder.start());
Also, you have to add \r to the end of the strings you are sending.
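The same pipe mechanics are behind this problem. Assuming CPython, when stdin is a pipe rather than a terminal, python treats it as a script and does not print the >>> prompt (which it writes to stderr) unless started with -i, so waiting for ">>>" can block forever. A minimal line-oriented sketch in plain Java, using a hypothetical InteractiveSession class rather than the IOSession API, showing what an interactive child needs: a line terminator on every send and a flush after it.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

/** Minimal line-oriented driver for an interactive child process (not a full Expect). */
public class InteractiveSession implements AutoCloseable {
    private final Process process;
    private final BufferedWriter toChild;
    private final BufferedReader fromChild;

    public InteractiveSession(String... command) throws IOException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // prompts on stderr become readable too
        process = builder.start();
        toChild = new BufferedWriter(
                new OutputStreamWriter(process.getOutputStream(), StandardCharsets.UTF_8));
        fromChild = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8));
    }

    /** Writes one line plus '\n' and flushes, so the child sees it immediately. */
    public void sendLine(String line) throws IOException {
        toChild.write(line);
        toChild.write('\n');
        toChild.flush();
    }

    /** Blocks until a line containing expected arrives; returns it, or null at EOF. */
    public String expectLine(String expected) throws IOException {
        String line;
        while ((line = fromChild.readLine()) != null) {
            if (line.contains(expected)) {
                return line;
            }
        }
        return null;
    }

    /** Closes the child's stdin (sending EOF) and waits for it to exit. */
    @Override
    public void close() throws IOException, InterruptedException {
        toChild.close();
        process.waitFor();
        fromChild.close();
    }
}
```

On a Unix-like system, cat echoes each line back and is a convenient stand-in for testing. Python could be started as new InteractiveSession("/usr/bin/python", "-i", "-u"), but expectLine only matches complete lines, so a prompt such as ">>> " without a trailing newline would need a character-based reader instead; readLine() also blocks without any timeout.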

Serializing a JRuby CompiledScript in Java

I have a Ruby script that I'd like to run at the startup of my Java program.
When you tell the ScriptEngine to evaluate the code for the first time, it takes a while. I'm under the impression that this is because it first needs to compile the code, right?
I found that you can compile Ruby code, and then evaluate it later. The evaluation itself is fast - the compilation part is the slow one. Here I am compiling:
ScriptEngine jruby = new ScriptEngineManager().getEngineByName("jruby");
Compilable compilingEngine = (Compilable) jruby;
String code = "print 'HELLO!'";
CompiledScript script = compilingEngine.compile(code);
This snippet is what takes a while. Later when you evaluate it, it is fine.
So of course, I was wondering if it would be possible to "save" this compiled code into a file, so in the future I can "load" it and just execute it without compiling again.

As others have said, this is not possible with CompiledScript. However, with JRuby you have another option. You can use the command line tool jrubyc to compile a Ruby script to Java bytecode like so:
jrubyc <scriptname.rb>
This will produce a class file named scriptname.class. You can run this class from the command line as if it were a normal class with a main(String[] argv) method (note: the jruby runtime needs to be in the classpath) and you can of course load it into your application at runtime.
You can find more details on the output of jrubyc here: https://github.com/jruby/jruby/wiki/JRubyCompiler#methods-in-output-class-file
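The answer notes that the jrubyc output can be loaded into your application at runtime. A minimal sketch using plain reflection, assuming the generated class exposes a static main(String[]) as described and that the JRuby runtime jar is already on the application classpath, so the parent class loader can resolve the org.jruby classes it references. CompiledScriptLauncher and runMain are hypothetical names.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;

public class CompiledScriptLauncher {

    /**
     * Loads className from a directory or jar (e.g. jrubyc output) and calls its
     * static main(String[]). The parent loader is this class's loader, so the
     * JRuby runtime must already be on the application classpath.
     */
    public static void runMain(Path classpathEntry, String className, String... args) throws Exception {
        URL[] urls = { classpathEntry.toUri().toURL() };
        try (URLClassLoader loader = new URLClassLoader(urls, CompiledScriptLauncher.class.getClassLoader())) {
            Class<?> scriptClass = Class.forName(className, true, loader);
            Method main = scriptClass.getMethod("main", String[].class);
            try {
                // Cast to Object so the array is passed as the single String[] argument
                main.invoke(null, (Object) args);
            } catch (InvocationTargetException e) {
                // Rethrow whatever the script itself threw, if it is an Exception
                Throwable cause = e.getCause();
                if (cause instanceof Exception) {
                    throw (Exception) cause;
                }
                throw e;
            }
        }
    }
}
```

For a scriptname.class produced in a hypothetical build directory, this would be called as CompiledScriptLauncher.runMain(Path.of("build"), "scriptname"). The loader is closed when main returns, so a script that keeps background threads running would need the loader kept open instead.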

According to this, no.
"Unfortunately, compiled scripts are not, by default, serializable, so they can't be pre-compiled as part of a deployment process, so compilation should be applied at runtime when you know it makes sense."

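I think a really simple cache will solve your problem:
import java.util.ArrayList;
import java.util.List;
import javax.script.Compilable;
import javax.script.CompiledScript;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

class CompiledScriptCache {
    private static final CompiledScriptCache INSTANCE = new CompiledScriptCache();

    public static CompiledScriptCache get() {
        return INSTANCE;
    }

    // Reuse one engine instead of creating a new one for every script
    private final Compilable compilingEngine =
            (Compilable) new ScriptEngineManager().getEngineByName("jruby");
    private final List<CompiledScript> scripts = new ArrayList<>();

    public synchronized CompiledScript get(int id) {
        return scripts.get(id);
    }

    // Compiles the code once and returns an id for later lookups
    public synchronized int add(String code) throws ScriptException {
        scripts.add(compilingEngine.compile(code));
        return scripts.size() - 1;
    }
}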
Update:
I thought this question was about avoiding compiling the source more than once.
The only other approach I can imagine is to generate Java classes from the Ruby code (a cross-compile):
https://github.com/jruby/jruby/wiki/GeneratingJavaClasses

Running caliper commandline

OK, I'm having problems with Caliper again.
I am now running on Linux and trying to use the beta snapshot: I am attempting to run Google's Caliper from the command line using just the jar.
I do not have access to Maven on this machine, and installing it is out of the question. I would just like to use a jar and, once this is working, maybe write a script or something.
Here is what I am doing:
1. Using a small example benchmark:
import com.google.caliper.Benchmark;

public class Tutorial {
    public static class Benchmark1 {
        @Benchmark void timeNanoTime(int reps) {
            for (int i = 0; i < reps; i++) {
                System.nanoTime();
            }
        }
    }
}
2. Compile with javac -cp caliper-1.0-beta-SNAPSHOT-all.jar Tutorial.java
3. (Attempt to) run with
java -cp caliper-1.0-beta-SNAPSHOT-all.jar com.google.caliper.runner.CaliperMain Tutorial.Benchmark1
and receive the message Benchmark class not found: Tutorial.Benchmark1.
I've tried to work this out from bits and pieces of information from various sources but I am really having a heck of a time with this. I would appreciate any input.

I believe you really don't need Maven; this should work.
Your own class isn't being found, and I think it's a classpath problem. Since nested classes usually cause more trouble, simply try
java -cp caliper-1.0-beta-SNAPSHOT-all.jar com.google.caliper.runner.CaliperMain Tutorial
If the message changes to something like "class contains no benchmarks", then you'll know more. If you insist on using a nested class, you may need to pass Tutorial$Benchmark1 (unlikely, but possible; Java's naming of nested classes is awkward).
Please also try
java -cp caliper-1.0-beta-SNAPSHOT-all.jar Tutorial.Benchmark1
to see whether your class is on the classpath (the message should change to something like "no main method").
See also this older post.
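The hint about Tutorial$Benchmark1 comes from how the JVM names nested classes: the source name Tutorial.Benchmark1 compiles to Tutorial$Benchmark1.class, and Class.forName only accepts that binary name. A small self-contained check, using a stand-in Tutorial class and a hypothetical ClassNameCheck helper (not part of Caliper); whether CaliperMain itself expects the binary name is exactly what the answer suggests testing.

```java
/** Stand-in for the question's Tutorial class: a static nested class inside a top-level class. */
class Tutorial {
    public static class Benchmark1 {}
}

public class ClassNameCheck {

    /** Returns true if the JVM can load a class under exactly this name. */
    static boolean canLoad(String name) {
        try {
            Class.forName(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The source-level name uses '.', but the JVM's binary name uses '$'
        System.out.println(Tutorial.Benchmark1.class.getName()); // Tutorial$Benchmark1
        System.out.println(canLoad("Tutorial.Benchmark1"));       // false
        System.out.println(canLoad("Tutorial$Benchmark1"));       // true
    }
}
```

When passing such a name in a Unix shell, quote it ('Tutorial$Benchmark1'); otherwise the shell expands $Benchmark1 as a variable. Also, an explicit -cp replaces the default classpath, so the directory holding Tutorial.class has to be listed as well, e.g. -cp caliper-1.0-beta-SNAPSHOT-all.jar:. (with ; instead of : on Windows).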
