Spring Batch (java-config) executing step after a JobExecutionDecider - java

I'm trying to configure a Flow in Spring Batch using java-config.
This flow basically has to do the following:
1. Execute an init step (which adds a record in the database).
2. Execute a decider to check file existence.
2.1. If the files exist, execute the load job (which is another flow with a bunch of steps in parallel).
3. Execute a finish step (which adds a record in the database); this should always run, even if 2.1 was not executed.
I tried the configuration below, but the finish step never runs:
Flow flow = new FlowBuilder<SimpleFlow>("commonFlow")
        .start(stepBuilderFactory.get("initStep").tasklet(initTasklet).build())
        .next(decider)
        .on(FlowExecutionStatus.COMPLETED.getName())
        .to(splitFlow)
        .from(decider).on("*")
        .end()
        .next(stepBuilderFactory.get("finishStep").tasklet(finishTasklet).build())
        .end();
I'm able to make it work doing as below, but it is not elegant at all:
Step finishStep = stepBuilderFactory.get("finishStep").tasklet(finishTasklet).build();
Flow flow = new FlowBuilder<SimpleFlow>("commonFlow")
        .start(stepBuilderFactory.get("initStep").tasklet(initTasklet).build())
        .next(decider)
        .on(FlowExecutionStatus.COMPLETED.getName())
        .to(splitFlow)
        .next(finishStep)
        .from(decider).on("*")
        .to(finishStep)
        .end();
Does anybody know the right way to execute a step after a decision using java-config?

It sounds like this is being made MUCH more complicated than it needs to be. You do not need to configure a flow or decider. This is a VERY simple in-and-out job.
The simplest option is to use Spring Integration to detect the presence of a file and trigger the job.
The next simplest option is to have a Quartz or cron job check for the file and start the batch job.
Last but not least, you can have the job run and, if the ItemReader cannot find the file(s), just swallow the exception. Or set a listener on the ItemReader's step to check for files in its before method.
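For that last option, a minimal sketch of a StepExecutionListener that checks for the file before the step runs; the file path, class name and "NO_FILE" status are illustrative assumptions, not from the original answer:
import java.io.File;

import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.StepExecutionListener;

// Checks for the input file before the step runs and reports a custom exit status
// that the job flow could branch on.
public class InputFileCheckListener implements StepExecutionListener {

    private static final String INPUT_FILE = "/data/in/input.csv"; // placeholder path

    private boolean fileExists;

    @Override
    public void beforeStep(StepExecution stepExecution) {
        fileExists = new File(INPUT_FILE).exists();
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        return fileExists ? stepExecution.getExitStatus() : new ExitStatus("NO_FILE");
    }
}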

You can use two separate Flows to achieve this.
Flow flowWithDecider = new FlowBuilder<SimpleFlow>("flow-with-decider")
        .start(decider)
        .on("ok")
        .to(okStep1)
        .next(okStep2)
        .from(decider)
        .on("failed")
        .to(failedStep)
        .build();

Flow commonFlow = new FlowBuilder<SimpleFlow>("common-flow")
        .start(commonStep)
        .build();

Job job = jobs
        .get("my-job")
        .start(flowWithDecider)
        .next(commonFlow) // this will execute after the previous flow, regardless of the decision
        .end()
        .build();
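For completeness, a minimal sketch of a decider that could produce the "ok"/"failed" statuses used above; basing the decision on file existence mirrors the original question, but the path and class name are assumptions:
import java.io.File;

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.job.flow.FlowExecutionStatus;
import org.springframework.batch.core.job.flow.JobExecutionDecider;

// Returns "ok" when the expected input file exists, "failed" otherwise.
public class FileExistsDecider implements JobExecutionDecider {

    private static final String INPUT_FILE = "/data/in/input.csv"; // placeholder path

    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        return new File(INPUT_FILE).exists()
                ? new FlowExecutionStatus("ok")
                : new FlowExecutionStatus("failed");
    }
}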

Related

How to have a Step and Split in a single job?

I'm working on a Spring Batch job.
In it, the Job should contain a Step (STEP_1) which must always be executed first, and then there are three other steps that can be executed in parallel.
Without STEP_1 I can execute all three steps in parallel using the Java configuration. But when I add this STEP_1, the parallel steps aren't executed. Can anyone show how it needs to be done using Java configuration?
Attached is a link where it is explained for XML-based configuration, but I'm looking for Java config.
Spring-batch flow / split after a step
Sample code:
@Bean
public Flow splitStep() {
    Flow flow1 = new FlowBuilder<Flow>(step01().getName()).from(step01()).end();
    Flow flow2 = new FlowBuilder<Flow>(step02().getName()).from(step02()).end();
    Flow flow3 = new FlowBuilder<Flow>(step03().getName()).from(step03()).end();
    Flow splitFlow = new FlowBuilder<Flow>("splitStep")
            //.start(step_job_details()) // Single separate step that must always be executed first, and then the split steps.
            .split(new SimpleAsyncTaskExecutor())
            .add(flow1, flow2, flow3).build();
            //.add(flow1).build();
    return splitFlow;
}
Thanks.
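One way this is typically wired up in Java config: build the job as a flow that starts with the single step and then hands off to the split flow. The sketch below assumes a jobBuilderFactory is available and reuses step_job_details() and splitStep() from the question; everything else is illustrative:
@Bean
public Job job(JobBuilderFactory jobBuilderFactory) {
    // step_job_details() runs first; once it completes, the split flow from
    // splitStep() runs flow1, flow2 and flow3 in parallel on the task executor.
    return jobBuilderFactory.get("job")
            .flow(step_job_details())
            .next(splitStep())
            .end()
            .build();
}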

Job parameters are getting cached

I am facing a problem with jobParameters in Spring Batch. I have a jobParameter which is optional. The first time, when I pass the job parameter through CommandLineJobRunner, it works. The second time, I do not pass any jobParameter, but it still uses the previous jobParameter. When I clear my meta-data, the jobParameter comes through as null, since I am not passing it. How can I fix this without clearing the meta-data? Does this happen normally in Spring Batch?
Edit:
I am using MapJobRegistry, and "next" is used while launching the job. When I debugged, I observed that in order to increment the run.id it loads all the previous parameters:
public JobParameters getNext(JobParameters parameters) {
    if (parameters == null) {
        parameters = new JobParameters();
    }
    long id = parameters.getLong(key, 0L) + 1;
    return new JobParametersBuilder(parameters).addLong(key, id).toJobParameters();
}
The first thing to mention is that you should not use the Map... classes. They are not intended for production and, therefore, you are better off using the various JDBC implementations. If you don't want to use a real DB, you can always use an in-memory DB.
But regarding your initial question:
You are using the CommandLineJobRunner together with the option "next".
Having a look at the method CommandLineJobRunner.start(), you find the following lines:
if (opts.contains("-next")) {
    JobParameters nextParameters = getNextJobParameters(job);
    Map<String, JobParameter> map = new HashMap<String, JobParameter>(nextParameters.getParameters());
    map.putAll(jobParameters.getParameters());
    jobParameters = new JobParameters(map);
}
You can see that getNextJobParameters is called. Inside this method, the data of the previous run is loaded via 'jobExplorer.getJobInstances(jobIdentifier, 0, 1);' (if there was a previous run). If there is a previous run, the job parameters of that old run are returned after applying the incrementer's next method; hence, this is the reason you get your old parameters.
Now, this is the technical explanation, but the question that follows is how you should use the "next" and "restart" options in order to get what you want.
Using next:
- next only works as expected if you launch a job with the same name and the same job parameters. Otherwise the results can be confusing. Actually, I use next only inside unit and integration tests.
Using restart:
- you can use "restart" if the previous JobExecution with the same job name failed. Here too, the job parameters will be taken from your previous launch.
For a normal start of a job, you should use neither next nor restart. A normal start of a job should always have a unique job parameter, for instance a "runId" whose value changes with every start of the job (otherwise, you would get a JobInstanceAlready... exception). A sketch of this is shown below.
In the case of unit tests, I use a unique "runId" for every test case, and there I use the "next" option.
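A minimal sketch of launching with a unique parameter per run; it assumes a jobLauncher and job are available, and the parameter name "runId" is just an example:
// Each launch gets a fresh "runId", so a new JobInstance is created every time.
JobParameters params = new JobParametersBuilder()
        .addLong("runId", System.currentTimeMillis())
        .toJobParameters();

JobExecution execution = jobLauncher.run(job, params);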

Pentaho SDK, how to define a text file input

I'm trying to define a Pentaho Kettle (ktr) transformation via code. I would like to add a Text File Input step to the transformation: http://wiki.pentaho.com/display/EAI/Text+File+Input.
I don't know how to do this (note that I want to achieve the result in a custom Java application, not using the standard Spoon GUI). I think I should use the TextFileInputMeta class, but when I try to define the filename the transformation doesn't work anymore (it appears empty in Spoon).
This is the code I'm using. I think there is something wrong with the third line:
PluginRegistry registry = PluginRegistry.getInstance();
TextFileInputMeta fileInMeta = new TextFileInputMeta();
fileInMeta.setFileName(new String[] {myFileName});
String fileInPluginId = registry.getPluginId(StepPluginType.class, fileInMeta);
StepMeta fileInStepMeta = new StepMeta(fileInPluginId, myStepName, fileInMeta);
fileInStepMeta.setDraw(true);
fileInStepMeta.setLocation(100, 200);
transAWMMeta.addStep(fileInStepMeta);
To run a transformation programmatically, you should do the following:
- Initialise Kettle
- Prepare a TransMeta object
- Prepare your steps
- Don't forget about the Meta and Data objects!
- Add them to the TransMeta
- Create a Trans and run it
- By default, each transformation spawns a thread per step, so use trans.waitUntilFinished() to force your thread to wait until execution completes
- Pick up the execution's results if necessary
Use this test as example: https://github.com/pentaho/pentaho-kettle/blob/master/test/org/pentaho/di/trans/steps/textfileinput/TextFileInputTests.java
Also, I would recommend you create the transformation manually and load it from a file, if that is acceptable in your circumstances. This will help you avoid lots of boilerplate code. It is quite easy to run transformations in this case; see an example here: https://github.com/pentaho/pentaho-kettle/blob/master/test/org/pentaho/di/TestUtilities.java#L346
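A minimal sketch of that load-and-run approach; the .ktr path and class name are placeholders, and it assumes the Kettle engine libraries are on the classpath:
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class RunTransformation {
    public static void main(String[] args) throws Exception {
        // Initialise the Kettle environment (registers step plugins, etc.).
        KettleEnvironment.init();

        // Load a transformation that was designed in Spoon and saved as a .ktr file.
        TransMeta transMeta = new TransMeta("/path/to/my-transformation.ktr"); // placeholder path

        // Create and execute the transformation; each step runs in its own thread.
        Trans trans = new Trans(transMeta);
        trans.execute(null); // no extra command-line arguments
        trans.waitUntilFinished();

        if (trans.getErrors() > 0) {
            throw new IllegalStateException("Transformation finished with errors");
        }
    }
}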

Passing value between two separate MapReduce jobs

I have a Hadoop program where I need to pass a single output, generated by the first MapReduce job, to a second MapReduce job.
Ex.
MapReduce-1 -> writes a double value to HDFS (the file name is similar to part-00000).
In the second MapReduce job I want to use the double value from the part-00000 file.
How can I do it? Can anyone please give a code snippet?
Wait for the first job to finish and then run the second one on the output of the first. You can do it:
1) In the Driver:
int code = firstJob.waitForCompletion(true) ? 0 : 1;
if (code == 0) {
    Job secondJob = new Job(new Configuration(), "JobChaining-Second");
    TextInputFormat.addInputPath(secondJob, outputDirOfFirstJob);
    ...
}
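If the driver also needs the actual double value (as the question asks), one way is to read it back from the first job's output before configuring the second job. This continues the driver snippet above; the key/value layout, file name and configuration property name are assumptions:
// Assumes the first job wrote a single line like "result\t3.14" to its output file.
FileSystem fs = FileSystem.get(conf);
Path resultFile = new Path(outputDirOfFirstJob, "part-r-00000"); // use part-00000 for the old API

double value;
try (BufferedReader reader =
        new BufferedReader(new InputStreamReader(fs.open(resultFile)))) {
    String[] fields = reader.readLine().split("\t");
    value = Double.parseDouble(fields[fields.length - 1]);
}

// Make the value available to the second job's mappers/reducers via the Configuration.
secondJob.getConfiguration().set("first.job.result", String.valueOf(value));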
2) Use JobControl and ControlledJob:
http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapred/jobcontrol/JobControl.html
To use JobControl, start by wrapping your jobs with ControlledJob. Doing this is relatively simple: you create your job as you usually would, except you also create a ControlledJob that takes your Job or Configuration as a parameter, along with a list of its dependencies (other ControlledJobs). Then you add them one by one to the JobControl object, which handles the rest (see the sketch after this list).
3) Externally (e.g., from a shell script). Pass input/output paths as arguments.
4) Use Apache Oozie. You will specify your jobs in XML.
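A minimal sketch of option 2 using the new-API jobcontrol classes; the job names, group name and polling interval are placeholders, and the mapper/reducer/path setup is omitted:
import java.util.Collections;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

public class ChainDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Configure the two jobs as usual (mapper, reducer, input/output paths, ...).
        Job firstJob = Job.getInstance(conf, "JobChaining-First");
        Job secondJob = Job.getInstance(conf, "JobChaining-Second");

        // Wrap them in ControlledJobs; the second depends on the first.
        ControlledJob first = new ControlledJob(firstJob, null);
        ControlledJob second =
                new ControlledJob(secondJob, Collections.singletonList(first));

        // JobControl submits the jobs in dependency order.
        JobControl control = new JobControl("JobChaining");
        control.addJob(first);
        control.addJob(second);

        Thread controlThread = new Thread(control);
        controlThread.start();
        while (!control.allFinished()) {
            Thread.sleep(500); // poll until both jobs have finished
        }
        control.stop();
    }
}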

Retrieving Jobs and/or Process

Is there an easy way to retrieve a job and check e.g. the status with Play?
I have a few encoding jobs/downloading jobs which run for a long time. In some cases I want to cancel them.
Is there a way to retrieve a list of Jobs or something?
E.g. one Job calls the FFMPEG encoder using a ProcessBuilder. I would like to be able to get this job and kill the process if it is no longer required (e.g. the wrong file was uploaded and I don't want to wait for an hour before it is finished). If I can get a handle to that Job, then I can get to the process as well.
I am using Play 1.2.4
See JobsPlugin.java to see how to list all the scheduledJobs.
Getting the task currently being executed is trickier, but you can find your jobs in the JobsPlugin.scheduledJobs list by checking the Job class and calling a method on your custom Job to tell it to cancel.
Something like:
for (Job<?> job : JobsPlugin.scheduledJobs) {
    if (job instanceof MyJob) {
        ((MyJob) job).cancelWork();
    }
}
where cancelWork is your custom method
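A possible shape for such a job, sketched under the assumption that the job keeps a reference to the Process it started; MyJob and cancelWork come from the answer above, everything else (the ffmpeg command, field names) is illustrative:
import java.util.concurrent.atomic.AtomicReference;

import play.jobs.Job;

public class MyJob extends Job {

    // Reference to the external ffmpeg process so it can be killed from outside the job.
    private final AtomicReference<Process> process = new AtomicReference<Process>();

    @Override
    public void doJob() throws Exception {
        ProcessBuilder builder = new ProcessBuilder("ffmpeg", "-i", "in.avi", "out.mp4"); // placeholder command
        Process p = builder.start();
        process.set(p);
        p.waitFor(); // blocks until encoding finishes or the process is destroyed
    }

    public void cancelWork() {
        Process p = process.get();
        if (p != null) {
            p.destroy(); // kills the running ffmpeg process
        }
    }
}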
