How do I execute BASH 'source' command using scala 2.11?
import scala.sys.process.Process
object putFileToHDFS {
def main(args: Array[String]): Unit = {
println("""In class putFileToHDFS""");
val tester = Process("source ./trial.sh").lineStream;
tester.foreach(println)
}
}
I think this is the equivalent of what source does.
import scala.sys.process._
//send file contents to sh for interpretation
val tester = "/bin/cat /path/trial.sh".#|("/bin/sh").lineStream
Of course, things are much easier if trial.sh is executable.
val tester = "/bin/sh -c /path/trial.sh".lineStream
You do not need to source the script file, since you only need to do that when you want to include a script from another one. All you need to do is run it:
Process("bash trial.sh").lineStream
Related
I'm working on an exercise. I need to create a java project that can be run from the scala command line. The final output should be this:
scala> int2str(6)
res0: String = six
scala> int2str(65)
res0: String = sixty-five
How do I create a function that can be accessed by scala like that? I can create a Scala project in IntelliJ, but I don't know how to export that function to be used like that.
Any help is appreciated.
Thanks
Assuming you have the following scala code:
object A {
def int2str(i: Int): String = ""
}
You can just make a jar file from your scala project with a command
sbt package for instance, then run scala -cp <your jar>.
In the scala console just import your function:
import A.int2str
I wrote the following MyPythonGateway.java so that I can call my custom java class from Python:
public class MyPythonGateway {
public String findMyNum(String input) {
return MyUtiltity.parse(input).getMyNum();
}
public static void main(String[] args) {
GatewayServer server = new GatewayServer(new MyPythonGateway());
server.start();
}
}
and here is how I used it in my Python code:
def main():
gateway = JavaGateway() # connect to the JVM
myObj = gateway.entry_point.findMyNum("1234 GOOD DAY")
print(myObj)
if __name__ == '__main__':
main()
Now I want to use MyPythonGateway.findMyNum() function from PySpark, not just a standalone python script. I did the following:
myNum = sparkcontext._jvm.myPackage.MyPythonGateway.findMyNum("1234 GOOD DAY")
print(myNum)
However, I got the following error:
... line 43, in main:
myNum = sparkcontext._jvm.myPackage.MyPythonGateway.findMyNum("1234 GOOD DAY")
File "/home/edamameQ/spark-1.5.2/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 726, in __getattr__
py4j.protocol.Py4JError: Trying to call a package.
So what did I miss here? I don't know if I should run a separate JavaApplication of MyPythonGateway to start a gateway server when using pyspark. Please advice. Thanks!
Below is exactly what I need:
input.map(f)
def f(row):
// call MyUtility.java
// x = MyUtility.parse(row).getMyNum()
// return x
What would be the best way to approach this? Thanks!
First of all the error you see usually means the class you're trying to use is not accessible. So most likely it is a CLASSPATH issue.
Regarding general idea there are two important issues:
you cannot access SparkContext inside an action or transformation so using PySpark gateway won't work (see How to use Java/Scala function from an action or a transformation? for some details)). If you want to use Py4J from the workers you'll have to start a separate gateways on each worker machine.
you really don't want to pass data between Python an JVM this way. Py4J is not designed for data intensive tasks.
In PySpark before start calling the method -
myNum = sparkcontext._jvm.myPackage.MyPythonGateway.findMyNum("1234 GOOD DAY")
you have to import MyPythonGateway java class as follows
java_import(sparkContext._jvm, "myPackage.MyPythonGateway")
myPythonGateway = spark.sparkContext._jvm.MyPythonGateway()
myPythonGateway.findMyNum("1234 GOOD DAY")
specify the jar containing myPackage.MyPythonGateway with --jars option in spark-submit
If input.map(f) has inputs as an RDD for example, this might work, since you can't access the JVM variable (attached to spark context) inside the executor for a map function of an RDD (and to my knowledge there is no equivalent for #transient lazy val in pyspark).
def pythonGatewayIterator(iterator):
results = []
jvm = py4j.java_gateway.JavaGateway().jvm
mygw = jvm.myPackage.MyPythonGateway()
for value in iterator:
results.append(mygw.findMyNum(value))
return results
inputs.mapPartitions(pythonGatewayIterator)
all you need to do is compile jar and add to pyspark classpath with --jars or --driver-class-path spark submit options. Then access class and method with below code-
sc._jvm.com.company.MyClass.func1()
where sc - spark context
Tested with Spark 2.3. Keep in mind, you can call JVM class method only from driver program and not executor.
I wrote a Python program that consists out of five .py script files.
I want to execute the main of those python scripts from within a Java Application.
What are my options to do so? Using the PythonInterpreter doesn't work, as for example the datetime module can't be loaded from Jython (and I don't want the user to determine his Python path for those dependencies to work).
I compiled the whole folder to .class files using Jython's compileall. Can I embed these .class files somehow to execute the main file from within my Java Application, or how should I proceed?
Have a look at the ProcessBuilder class in java: https://docs.oracle.com/javase/7/docs/api/java/lang/ProcessBuilder.html.
The command used in the java constructor should be the same as what you would type in a command line. For example:
Process p = new ProcessBuilder("python", "myScript.py", "firstargument").start();
(the process builder does the same thing as the python subprocess module).
Have a look at running scripts through processbuilder
N.B. as for the Jython part of the question, if you go to the jython website (have a look at the FAQ section of their website www.jython.org). Check the entry "use jython from java".
I'm also interested in running Python code directly within Java, using Jython, and avoiding the need for an installed Python interpreter.
The article, 'Embedding Jython in Java Applications' explains how to reference an external *.py Python script, and pass it argument parameters, no installed Python interpreter necessary:
#pymodule.py - make this file accessible to your Java code
def square(value):
return value*value
This function can then be executed either by creating a string that
executes it, or by retrieving a pointer to the function and calling
its call method with the correct parameters:
//Java code implementing Jython and calling pymodule.py
import org.python.util.PythonInterpreter;
import org.python.core.*;
public class ImportExample {
public static void main(String [] args) throws PyException
{
PythonInterpreter pi = new PythonInterpreter();
pi.exec("from pymodule import square");
pi.set("integer", new PyInteger(42));
pi.exec("result = square(integer)");
pi.exec("print(result)");
PyInteger result = (PyInteger)pi.get("result");
System.out.println("result: "+ result.asInt());
PyFunction pf = (PyFunction)pi.get("square");
System.out.println(pf.__call__(new PyInteger(5)));
}
}
Jython's Maven/Gradle/etc dependency strings can be found at http://mvnrepository.com/artifact/org.python/jython-standalone/2.7.1
Jython JavaDoc
It is possible to load the other modules. You just need to specify the python path where your custom modules can be found. See the following test case and I am using the Python datatime/math modules inside my calling function (my_maths()) and I have multiple python files in the python.path which are imported by the main.py
#Test
public void testJython() {
Properties properties = System.getProperties();
properties.put("python.path", ".\\src\\test\\resources");
PythonInterpreter.initialize(System.getProperties(), properties, new String[0]);
PythonInterpreter interpreter = new PythonInterpreter();
interpreter.execfile(".\\src\\test\\resources\\main.py");
interpreter.set("id", 150); //set variable value
interpreter.exec("val = my_maths(id)"); //the calling function in main.py
Integer returnVal = (Integer) interpreter.eval("val").__tojava__(Integer.class);
System.out.println("return from python: " + returnVal);
}
I installed Groovy.
And I am trying to run groovy scripts from a command prompt that I created using Java, like so:
Runtime.getRuntime().exec("groovy");
So if I type in "groovy" to the command line, this is what I get:
>>>groovy
Cannot run program "groovy": CreateProcess error=2, The system cannot find the file specified
Does anyone have an idea as to what might be going wrong? Should I just use Groovy's implementation of exec? Like:
def processBuilder=new ProcessBuilder("ls")
processBuilder.redirectErrorStream(true)
processBuilder.directory(new File("Your Working dir")) // <--
def process = processBuilder.start()
My guess is that it wouldn't matter whether using Java's implementation or Groovy's implementation.
So how do I run a groovy script?
The way originally described in the question above calling the groovy executable invokes a second Java runtime instance and class loader while the efficient way is to embed the Groovy script directly into the Java runtime as a Java class and invoke it.
Here are three ways to execute a Groovy script from Java:
1) Simplest way is using GroovyShell:
Here is an example Java main program and target Groovy script to invoke:
== TestShell.java ==
import groovy.lang.Binding;
import groovy.lang.GroovyShell;
// call groovy expressions from Java code
Binding binding = new Binding();
binding.setVariable("input", "world");
GroovyShell shell = new GroovyShell(binding);
Object retVal = shell.evaluate(new File("hello.groovy"));
// prints "hello world"
System.out.println("x=" + binding.getVariable("x")); // 123
System.out.println("return=" + retVal); // okay
== hello.groovy ==
println "Hello $input"
x = 123 // script-scoped variables are available via the GroovyShell
return "ok"
2) Next is to use GroovyClassLoader to parse the script into a class then create an instance of it. This approach treats the Groovy script as a class and invokes methods on it as on any Java class.
GroovyClassLoader gcl = new GroovyClassLoader();
Class clazz = gcl.parseClass(new File("hello.groovy");
Object aScript = clazz.newInstance();
// probably cast the object to an interface and invoke methods on it
3) Finally, you can create GroovyScriptEngine and pass in objects as variables using binding. This runs the Groovy script as a script and passes in input using binding variables as opposed to calling explicit methods with arguments.
Note: This third option is for developers who want to embed groovy scripts into a server and have them reloaded on modification.
import groovy.lang.Binding;
import groovy.util.GroovyScriptEngine;
String[] roots = new String[] { "/my/groovy/script/path" };
GroovyScriptEngine gse = new GroovyScriptEngine(roots);
Binding binding = new Binding();
binding.setVariable("input", "world");
gse.run("hello.groovy", binding);
System.out.println(binding.getVariable("output"));
Note: You must include the groovy_all jar in your CLASSPATH for these approaches to work.
Reference: http://groovy.codehaus.org/Embedding+Groovy
I am trying to find the way to run the explain command over the entire pig script in java.
I was using PigServer but it offers only to do explain over the single query (alias) not the entire script.
Is there a way to do something like:
$ pig -x local -e 'explain -script Temp1/TPC_test.pig -out explain-out9.txt'
but from my Java code?
You may use PigRunner for this purpose.
E.g:
import org.apache.pig.PigRunner;
import org.apache.pig.tools.pigstats.PigStats;
public class PigTest {
public static void main(String[] args) throws Exception {
args = new String [] {
"-x", "local",
"-e", "explain -script Temp1/TPC_test.pig -out explain-out9.txt"
};
PigStats stats = PigRunner.run(args, null);
//print plan:
//stats.getJobGraph().explain(System.out, "text", true);
}
}
I found that the following runtime dependencies are needed to avoid NoClassDefFoundError:
jackson-mapper-asl
antlr-runtime
guava
You can use org.apache.pig.PigServer to run pig scripts from Java programs:
PigServer pigServer = new PigServer(ExecType.MAPREDUCE);
pigServer.registerScript("scripts/test.pig");
Requires 'pig.properties' on classpath.
fs.default.name=hdfs://<namenode-hostname>:<port>
mapred.job.tracker=<jobtracker-hostname>:<port>
Or pass an instance of java.util.Properties to PigServer constructor.
Properties props = new Properties();
props.setProperty("fs.default.name", "hdfs://<namenode-hostname>:<port>");
props.setProperty("mapred.job.tracker", "<jobtracker-hostname>:<port>");
PigServer pigServer = new PigServer(ExecType.MAPREDUCE, props);
Hope this helps
Of course you can also use the grunt shell! (I always forget about this.)
On our site, we're using a launcher script which prepares a pig invocation command like follows:
$ pig -p param1=foo -p param2=bar script.pig
You can use explain -script in the grunt shell:
invoke pig
wrap the script call with explain
It looks like:
$ pig
grunt> explain -param param1=foo -param param2=bar script.pig