I'm trying to define a Pentaho Kettle (ktr) transformation via code. I would like to add a Text File Input step to the transformation: http://wiki.pentaho.com/display/EAI/Text+File+Input.
I don't know how to do this (note that I want to achieve the result in a custom Java application, not using the standard Spoon GUI). I think I should use the TextFileInputMeta class, but when I try to define the filename the transformation doesn't work anymore (it appears empty in Spoon).
This is the code I'm using. I think something is wrong in the third line:
PluginRegistry registry = PluginRegistry.getInstance();
TextFileInputMeta fileInMeta = new TextFileInputMeta();
fileInMeta.setFileName(new String[] {myFileName});
String fileInPluginId = registry.getPluginId(StepPluginType.class, fileInMeta);
StepMeta fileInStepMeta = new StepMeta(fileInPluginId, myStepName, fileInMeta);
fileInStepMeta.setDraw(true);
fileInStepMeta.setLocation(100, 200);
transAWMMeta.addStep(fileInStepMeta);
To run a transformation programmatically, you should do the following (a minimal sketch follows this list):
Initialise Kettle
Prepare a TransMeta object
Prepare your steps
Don't forget about Meta and Data objects!
Add them to TransMeta
Create Trans and run it
By default, each transformation spawns a thread per step, so use trans.waitUntilFinished() to force your thread to wait until execution completes
Pick up the execution's results if necessary
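Putting those steps together, here is a minimal sketch. Method names are from the Kettle 4.x/5.x API and the file path is made up, so adjust to your version:
KettleEnvironment.init(); // initialise Kettle: plugin registry, logging, etc.

// prepare a TransMeta object describing the transformation
TransMeta transMeta = new TransMeta();
transMeta.setName("my-transformation");

// prepare the step's meta object; setDefault() initialises the meta's
// internal arrays before individual values are overwritten
TextFileInputMeta fileInMeta = new TextFileInputMeta();
fileInMeta.setDefault();
fileInMeta.setFileName(new String[] { "/tmp/input.txt" }); // made-up path

// wrap the meta in a StepMeta and add it to the TransMeta
String pluginId = PluginRegistry.getInstance().getPluginId(StepPluginType.class, fileInMeta);
transMeta.addStep(new StepMeta(pluginId, "Text file input", fileInMeta));

// create Trans and run it; each step runs in its own thread
Trans trans = new Trans(transMeta);
trans.execute(null);
trans.waitUntilFinished();

// pick up the execution's results if necessary
Result result = trans.getResult();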
Use this test as an example: https://github.com/pentaho/pentaho-kettle/blob/master/test/org/pentaho/di/trans/steps/textfileinput/TextFileInputTests.java
Also, if it is acceptable in your circumstances, I would recommend creating the transformation manually in Spoon and loading it from the file. This will help you avoid lots of boilerplate code. It is quite easy to run transformations in this case; see an example here: https://github.com/pentaho/pentaho-kettle/blob/master/test/org/pentaho/di/TestUtilities.java#L346
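If you go that route, the code shrinks considerably. A sketch, assuming the .ktr was created in Spoon beforehand (the path is made up):
KettleEnvironment.init();
TransMeta transMeta = new TransMeta("/path/to/my-transformation.ktr");
Trans trans = new Trans(transMeta);
trans.execute(null);
trans.waitUntilFinished();
if (trans.getErrors() > 0) {
    // the transformation failed; inspect the logs
}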
First, some background on why I want this crazy thing. I'm building a plugin in Jenkins that provides an API for scripts that are started from a pipeline script to independently communicate with Jenkins.
For example, a shell script can then tell Jenkins to start a new stage from the running script.
I've got the communication between the script and Jenkins working, but now I want to start a stage from a callback in my code, and I can't seem to figure out how to do it.
Stuff I've tried and failed at:
Start a new StageStep.java
I can't seem to find a way to correctly instantiate and inject the step into the lifecycle. I've looked into DSL.java, but can't seem to get to an instance to call invokeStep(), nor was I able to find out how to instantiate DSL.java with the right environment.
Look at StageStepExecution.java and do what it does.
It seems to either invoke the body with an Environment Variable and nothing else, or set some actions and save the state in a config file when it has no body. I could not find out how the Pipeline: Stage View Plugin hooks into this, but it doesn't seem to read the config file. I've tried setting the Actions (even the inner class through reflection) but that did not seem to do anything.
Inject a custom string as Groovy body and call it with csc.newBodyInvoker()
A hacky solution I came up with was just generating the Groovy script and running it like the ParallelStep does. But the sandbox does not allow me to call new GroovyShell().evaluate(""), and if I approve that call, the 'stage' step throws a MissingMethodException. So I also do not instantiate the script with the right environment. Providing the EnvironmentExpander does not make any difference.
Referencing and modifying workflow/{n}.xml
Changing the name of a stage in the relevant workflow/{n}.xml and rebooting the server updates the name of the stage, but modifying my custom stage to look like a regular one does not seem to add the step as a stage.
Stuff I've researched:
Whether some other plugin does something like this, but I couldn't find any example of plugins starting other steps.
How Jenkins handles the scripts and starts the steps, but it seems as though every step is directly called through the method name after the script is parsed, and I found no way to hook into this.
Other plugins using the StageView through other methods, but I could not find any.
Adding an AtomNode as a head onto the running thread, but I couldn't find how to replace/add the head and am hesitant to mess with Jenkins' threading.
I've spent multiple days on this seemingly trivial call, but I can't seem to figure it out.
So the latest thing I tried actually worked, and is displayed correctly, but it ain't pretty.
I basically reimplemented the implementation of DSL.invokeStep(), which required me to use reflection A LOT. This is not safe and will of course break with any changes, so I'll open an issue in Jenkins' ticket system in the hope that they will add a public interface for doing this. I'm just hoping this won't give me any weird side effects.
// First, get some environment stuff
CpsThread cpsThread = CpsThread.current();
CpsFlowExecution currentFlowExecution = (CpsFlowExecution) getContext().get(FlowExecution.class);
// instantiate the stage's descriptor
StageStep.DescriptorImpl stageStepDescriptor = new StageStep.DescriptorImpl();
// now we need to put a new FlowNode as the head of the step-stack. This is of course
// not possible directly, and since everything runs outside of the sandbox, putting our
// class in the same package doesn't work either, so: reflection
// get the 'head' field
Field cpsHeadField = CpsThread.class.getDeclaredField("head");
cpsHeadField.setAccessible(true);
Object headValue = cpsHeadField.get(cpsThread);
// get its value
Method head_get = headValue.getClass().getDeclaredMethod("get");
head_get.setAccessible(true);
FlowNode currentHead = (FlowNode) head_get.invoke(headValue);
// create a new StepAtomNode starting at the current value of 'head'.
FlowNode an = new StepAtomNode(currentFlowExecution, stageStepDescriptor, currentHead);
// now set this as the new head.
Method head_setNewHead = headValue.getClass().getDeclaredMethod("setNewHead", FlowNode.class);
head_setNewHead.setAccessible(true);
head_setNewHead.invoke(headValue, an);
// Create a new CpsStepContext, and as the constructor is protected, use reflection again
Constructor<?> declaredConstructor = CpsStepContext.class.getDeclaredConstructors()[0];
declaredConstructor.setAccessible(true);
CpsStepContext context = (CpsStepContext) declaredConstructor.newInstance(stageStepDescriptor, cpsThread, currentFlowExecution.getOwner(), an, null);
stageStepDescriptor.checkContextAvailability(context); // Good to check stuff I guess
// Create a new instance of the step, passing in arguments as a Map
Map<String, Object> stageArguments = new HashMap<>();
stageArguments.put("name", "mynutest");
Step stageStep = stageStepDescriptor.newInstance(stageArguments);
// so start the damn thing
StepExecution execution = stageStep.start(context);
// now that we have a callable instance, we set the step on the Cps Thread. Reflection to the rescue
Method mSetStep = cpsThread.getClass().getDeclaredMethod("setStep", StepExecution.class);
mSetStep.setAccessible(true);
mSetStep.invoke(cpsThread, execution);
// Finally. Start running the step
execution.start();
I'm able to create JMX files using Java code; those output files contain elements such as a test plan and samplers. However, I'm running an initialization routine that loads variables from disk and should create/configure new samplers based on them. I don't know how to access the running test plan element or add new sampler elements on the fly, though.
Is this possible to do? I've been browsing the API docs but haven't found a way to do it yet.
It is possible to add new elements on the fly, but these new elements will not be executed: the StandardJMeterEngine has already started and is not aware of them unless you restart the test.
You can create one test containing the routines, and create other test plans from a template that you modify, save and run.
Another solution is to work only with variables loaded by the routine, as sketched below.
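As a sketch of that last option (loadVariablesFromDisk() is a made-up name standing in for your routine), you can push the loaded values into JMeter properties before starting the engine and reference them inside the static plan with ${__P(name)}:
// assumption: your initialization routine returns the values loaded from disk
Map<String, String> loaded = loadVariablesFromDisk();
for (Map.Entry<String, String> entry : loaded.entrySet()) {
    // properties are visible to the whole test via the __P function
    JMeterUtils.setProperty(entry.getKey(), entry.getValue());
}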
This method provides access to the test plan at runtime so you can modify its child elements:
org.apache.jmeter.gui.GuiPackage.getInstance().getTreeModel().getTestPlan();
To execute a test:
// load the test plan from disk
FileInputStream in = new FileInputStream(testPlanPath);
HashTree testPlanTree = SaveService.loadTree(in);
in.close();
// configure the engine with the loaded tree and run it
StandardJMeterEngine jmeter = new StandardJMeterEngine();
jmeter.configure(testPlanTree);
jmeter.runTest();
Do not hesitate to ask if you need more information.
Looking through Oozie examples and documentation, it looks like you need a workflow file in order to run an Oozie job from Java code. Is there any way to submit a job directly from Java code, without needing a workflow file? Is there any pre-existing way to dynamically generate these files through Java code? Are there any pre-existing tools that will make generating them easier? Or will I have to write the entirety of the code to generate the file?
Current Situation
OozieClient wc = new OozieClient("http://bar:8080/oozie");
Properties conf = wc.createConfiguration();
conf.setProperty(OozieClient.APP_PATH, "workflow file path");
// set other properties
...
// submit and start the workflow job
wc.run(conf);
Ideal situation is something vaguely like this.
OozieAction action = new OozieAction("actionName");
action.setOkDestination("nextAction");
action.setErrorDestination("errorDestination");
//Rest of config for action
OozieWorkflow workflow = new OozieWorkflow();
workflow.setStartAction(action);
workflow.addAction(otherAction);
//rest of conf
OozieClient wc = new OozieClient("http://bar:8080/oozie");
wc.runWorkflow(workflow);
Another situation that would be desirable, if the former is impossible, is:
OozieAction action = new OozieAction("actionName");
action.setOkDestination("nextAction");
action.setErrorDestination("errorDestination");
//Rest of config for action
OozieWorkflow workflow = new OozieWorkflow();
workflow.setStartAction(action);
workflow.addAction(otherAction);
//rest of conf
workflow.writeToFile("some localFile");
//load file to HDFS
//This would also work
// workflow.writeToHDFS("someHdfsLocation");
OozieClient wc = new OozieClient("http://bar:8080/oozie");
//run with created workflow
I have been in a similar situation.
What I would suggest is to use the Oozie schema definition (XSD) and generate the equivalent Java objects through xjc. Given these objects, you can probably create the workflow (not trivial, though).
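A hedged sketch of that idea: WORKFLOWAPP is the class name xjc typically generates from the workflow XSD, and the namespace version below is an assumption to verify against your schema (JAXB classes from javax.xml.bind):
// build the workflow object graph with the xjc-generated classes
WORKFLOWAPP app = new WORKFLOWAPP();
app.setName("my-workflow");
// ... populate start node, actions, kill and end nodes ...

// marshal the object graph to workflow.xml with JAXB
JAXBContext ctx = JAXBContext.newInstance(WORKFLOWAPP.class);
Marshaller marshaller = ctx.createMarshaller();
marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
marshaller.marshal(
        new JAXBElement<>(new QName("uri:oozie:workflow:0.5", "workflow-app"),
                WORKFLOWAPP.class, app),
        new File("workflow.xml"));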
There are Scala-based DSLs you can use:
https://github.com/klout/scoozie does something similar with Scala -> Oozie generation
Oozie 5.1.0 added support for the Fluent Job API, which makes it possible to write Java code instead of workflow XML files (under the hood, Oozie will generate the XML file for you).
A simple example of the Java code, creating a workflow similar to Oozie's shell action demo:
public class MyFirstWorkflowFactory implements WorkflowFactory {

    @Override
    public Workflow create() {
        final ShellAction shellAction = ShellActionBuilder.create()
                .withName("shell-action")
                .withResourceManager("${resourceManager}")
                .withNameNode("${nameNode}")
                .withConfigProperty("mapred.job.queue.name", "${queueName}")
                .withExecutable("echo")
                .withArgument("my_output=Hello Oozie")
                .withCaptureOutput(true)
                .build();

        final Workflow shellWorkflow = new WorkflowBuilder()
                .withName("shell-workflow")
                .withDagContainingNode(shellAction).build();

        return shellWorkflow;
    }
}
More detailed documentation can be found here: https://oozie.apache.org/docs/5.1.0/DG_FluentJobAPI.html
There's a graphical tool to generate Oozie workflows via an Eclipse plugin. Find it on the Eclipse Marketplace: https://marketplace.eclipse.org/content/oozie-eclipse-plugin
Have a static Oozie workflow in your HDFS which just takes two parameters and writes the content of parameter1 (say, content which the user enters) to parameter2 (say, a location on HDFS to write to). Now call the Oozie CLI and specify app.path as the location of that static workflow.
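The same submission can also be done from Java instead of the CLI, mirroring the snippet further up. The parameter names here are assumptions and must match what the static workflow declares:
OozieClient wc = new OozieClient("http://bar:8080/oozie");
Properties conf = wc.createConfiguration();
// point at the static two-parameter workflow already uploaded to HDFS (made-up path)
conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode/apps/static-workflow");
conf.setProperty("parameter1", "content the user entered");
conf.setProperty("parameter2", "/user/foo/output");
wc.run(conf);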
I have two .m files. One is the function and the other one (read.m) calls the function and exports the results into an Excel file. I have a Java program that makes some changes to the .m files. After the changes, I want to automate the execution/running of the .m files. I have downloaded matlabcontrol.jar and I am looking for a way to use it to invoke and run the read.m file, which then calls the function.
Can anyone help me with the code? Thanks
I have tried this code but it does not work.
public static void tomatlab() throws MatlabConnectionException, MatlabInvocationException {
    // reuse a previously controlled MATLAB session if one is available
    MatlabProxyFactoryOptions options = new MatlabProxyFactoryOptions.Builder()
            .setUsePreviouslyControlledSession(true)
            .build();
    MatlabProxyFactory factory = new MatlabProxyFactory(options);
    MatlabProxy proxy = factory.getProxy();

    // put read.m on the MATLAB path, run it, then remove the path again
    proxy.eval("addpath('C:\\path_to_read.m')");
    proxy.feval("read");
    proxy.eval("rmpath('C:\\path_to_read.m')");

    // close connection
    proxy.disconnect();
}
Based on the official tutorial in the Wiki of the project, it seems quite straightforward to start with this API.
The path manipulation might be a bit tricky, but I would try loading the whole script into a string and passing it to eval (please note I have no prior experience with this specific MATLAB library). That could be done quite easily, for example by joining the lines from Files.readAllLines().
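A minimal sketch of that suggestion, assuming read.m is a plain script (a function definition would not be registered by eval'ing its source) and that the path is correct:
// read the whole script into one string (java.nio.file)
String script = String.join("\n",
        Files.readAllLines(Paths.get("C:\\path_to\\read.m")));

// evaluate it in the MATLAB session
MatlabProxy proxy = new MatlabProxyFactory().getProxy();
proxy.eval(script);
proxy.disconnect();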
Hope that helps something.
Using Eclipse jdt facilities, you can traverse the AST of java code snippets as follows:
// parse a snippet into a JLS3 AST and visit it
ASTParser astParser = ASTParser.newParser(AST.JLS3);
astParser.setSource("package x;class X{}".toCharArray());
astParser.createAST(null).accept(...);
But when trying to perform code completion & code selection, it seems that I have to do it in a plug-in application, since I have to write code like:
IFile file = ResourcesPlugin.getWorkspace().getRoot().getFile(new Path(somePath));
ICodeAssist i = JavaCore.createCompilationUnitFrom(file);
i.codeComplete(...) / i.codeSelect(...)
Is there any way that I can finally get a stand-alone Java application which incorporates the JDT code completion/selection facilities?
Thanks a lot!
I have noticed that using org.eclipse.jdt.internal.codeassist.complete.CompletionParser I can parse a code snippet as well:
CompletionParser parser = new CompletionParser(
        new ProblemReporter(
                DefaultErrorHandlingPolicies.proceedWithAllProblems(),
                new CompilerOptions(null),
                new DefaultProblemFactory(Locale.getDefault())),
        false);
org.eclipse.jdt.internal.compiler.batch.CompilationUnit sourceUnit =
        new org.eclipse.jdt.internal.compiler.batch.CompilationUnit(
                "class T{f(){new T().=1;} \nint j;}".toCharArray(), "testName", null);
CompilationResult compilationResult = new CompilationResult(sourceUnit, 0, 0, 0);
CompilationUnitDeclaration unit = parser.dietParse(sourceUnit, compilationResult, 25);
But I have 2 questions:
1. How to retrieve the assist information?
2. How can I specify a class path or source path for the compiler to look up type/method/field information?
I don't think so, unless you provide your own implementation of ICodeAssist.
As the "Performing code assist on Java code" documentation mentions, elements that allow this manipulation should implement ICodeAssist.
There are two kinds of manipulation:
Code completion - compute the completion of a Java token.
Code selection - answer the Java element indicated by the selected text of a given offset and length.
In the Java model there are two elements that implement this interface: IClassFile and ICompilationUnit.
Code completion and code selection only answer results for a class file if it has attached source.
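For reference, the plug-in-side usage looks roughly like this (a sketch: offset stands for the caret position, and file is an IFile from a workspace as in the question):
ICompilationUnit unit = JavaCore.createCompilationUnitFrom(file);
unit.codeComplete(offset, new CompletionRequestor() {
    @Override
    public void accept(CompletionProposal proposal) {
        // called once per completion candidate at the given offset
        System.out.println(new String(proposal.getCompletion()));
    }
});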
You could try opening a File outside of any workspace (like this FAQ), but the result wouldn't implement ICodeAssist.
So the IFile most of the time comes from a workspace location.