How to have a Step and a Split in a single job? - java

I'm working on a Spring Batch job.
In it, the Job should contain a Step (STEP_1) which must always be executed first, and then three other steps that can be executed in parallel.
Without STEP_1 I can execute all three steps in parallel using the Java configuration, but when I add STEP_1 the parallel steps are not executed. Can anyone show how this needs to be done using Java configuration?
Below is a link where it is explained for XML-based configuration, but I'm looking for Java config.
Spring-batch flow / split after a step
Sample code:
@Bean
public Flow splitStep() {
    Flow flow1 = new FlowBuilder<Flow>(step01().getName()).from(step01()).end();
    Flow flow2 = new FlowBuilder<Flow>(step02().getName()).from(step02()).end();
    Flow flow3 = new FlowBuilder<Flow>(step03().getName()).from(step03()).end();
    Flow splitFlow = new FlowBuilder<Flow>("splitStep")
            //.start(step_job_details()) // single separate step that must always be executed first, before the split steps
            .split(new SimpleAsyncTaskExecutor())
            .add(flow1, flow2, flow3)
            .build();
            //.add(flow1).build();
    return splitFlow;
}
Thanks.
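A minimal sketch of one way to wire this in Java config (not verified against the original setup; stepJobDetails() stands in for STEP_1, step01()–step03() are the three parallel steps from the sample above, and everything is assumed to live in the same @Configuration class with a JobBuilderFactory available): build the split as its own Flow and let the job start with STEP_1 before routing into it.
@Bean
public Job stepThenSplitJob(JobBuilderFactory jobBuilderFactory) {
    // The three flows that may run in parallel, as in the sample code.
    Flow flow1 = new FlowBuilder<Flow>("flow1").from(step01()).end();
    Flow flow2 = new FlowBuilder<Flow>("flow2").from(step02()).end();
    Flow flow3 = new FlowBuilder<Flow>("flow3").from(step03()).end();

    // The split is built as its own flow, separate from the first step.
    Flow splitFlow = new FlowBuilder<SimpleFlow>("splitFlow")
            .split(new SimpleAsyncTaskExecutor())
            .add(flow1, flow2, flow3)
            .build();

    return jobBuilderFactory.get("stepThenSplitJob")
            .start(stepJobDetails())      // STEP_1 always runs first
            .on("*").to(splitFlow)        // then the three flows execute in parallel
            .end()
            .build();
}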

Related

Cucumber Scenarios to be run in Sequential Order

I have a few concerns regarding the Cucumber framework:
1. I have a single feature file (steps are dependent on each other) and I want to run all the scenarios in order; by default they are running in a random order.
2. How do I run a single feature file multiple times?
I put some tags and tried to run, but no luck.
#Given("Get abc Token")
public void get_abc_Token(io.cucumber.datatable.DataTable dataTable) throws URISyntaxException {
DataTable data=dataTable.transpose();
String tkn= given()
.formParam("parm1",data.column(0).get(1))
.formParam("parm2", data.column(1).get(1))
.formParam("parm3", data.column(2).get(1))
.when()
.post(new URI(testurl)+"/abcapi")
.asString();
jp=new JsonPath(tkn);
Token=jp.getString("access_token");
if (Token==null) {
Assert.assertTrue(false,"Token is NULL");
}else {
}
}
#Given("Get above token")
public void get_abovetoken(io.cucumber.datatable.DataTable dataTable) throws URISyntaxException {
System.out.println("Token is " +Token);
}
}
So in the above steps I am getting the token in one step and trying to print it in another step, but I get null instead of the actual value, because my steps are running in a random order.
Please note I am running the TestRunner via a testng.xml file.
Cucumber and testing tools in general are designed to run each test/scenario as a completely independent thing. Linking scenarios together is a terrible anti-pattern; don't do it.
Instead, learn to write scenarios properly. Scenarios and feature files should have no programming in them at all. Programming needs to be pushed down into the step definitions.
Any scenario, no matter how complicated, can be written in 3 steps if you really want to. Your Given can set up any amount of state, your When deals with what you are doing, and your Then can check any number of conditions.
You do this by pushing all the detail down out of the scenario and into the step definitions. You improve this further by having the step definitions call helper methods that do all the work.
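For illustration, a minimal sketch of that shape (class, helper and step names are made up, not taken from the question): the step definitions keep only the scenario-level state, and a helper hides all the HTTP detail.
import io.cucumber.java.en.Given;
import io.cucumber.java.en.When;
import org.testng.Assert;

public class TokenSteps {

    // Hypothetical helper that encapsulates the RestAssured/HTTP calls.
    private final AuthHelper authHelper = new AuthHelper();

    // State shared between the steps of the SAME scenario.
    private String token;

    @Given("a valid token has been obtained")
    public void a_valid_token_has_been_obtained() {
        token = authHelper.fetchToken();
        Assert.assertNotNull(token, "Token is NULL");
    }

    @When("the protected endpoint is called")
    public void the_protected_endpoint_is_called() {
        authHelper.callProtectedEndpoint(token);
    }
}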

Limit parallel builds to a specific number in Jenkins

I have a number of builds in my Jenkinsfile which now run in parallel, but the master server is a bit overstrained. So my idea is to limit its builds to a configured value concurrentBuilds.
https://issues.jenkins-ci.org/browse/JENKINS-44085 inspired me, but I'm a bit stuck with my plan. I have a list of services, now gathered in a map, which are run in parallel like this:
def stepsForParallel = [:]

stage('read modules') {
    readMavenPom().modules.findAll { module ->
        module.endsWith('-service')
    }.each { service ->
        stepsForParallel[service] = transformIntoStep(service) // this returns { build module } to avoid immediate execution
    }
}

stage('modules') {
    parallel stepsForParallel
}
The build function uses parallel too, so I get a lot of parallel tasks.
My idea was to create a LinkedBlockingDeque (let's call it stepDeque) that gathers all steps that should be done in parallel. Then I'd create a second one (let's call it workingDeque) with a size of the configured concurrentBuilds.
But then my issue arises: as far as I know, I can only run parallel on a map. So when one of the tasks in the workingDeque finishes, I have a free thread.
So my question is: when I poll a job from stepDeque and add it to workingDeque, is there a way to run only the step I just added? Or is there a simpler way to achieve this?
I have written a Step class which knows its dependents. At the start I gather all the steps and put them in a LinkedBlockingQueue, and I create n workers using:
def worker = [:]
maxConcurrentSteps.times {
    worker["worker${it}"] = {
        Step work = getWork() // uses take() to get new work
        while (work != null) {
            work.run()
            work = getWork()
        }
    }
}
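The same worker-pool idea expressed as a minimal plain-Java sketch (the real code would live in the Jenkinsfile or a shared library, and the "builds" here are stand-ins): a fixed number of workers drain a single queue, so at most maxConcurrentSteps tasks run at the same time.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class WorkerPoolSketch {

    public static void main(String[] args) throws InterruptedException {
        int maxConcurrentSteps = 3;
        BlockingQueue<Runnable> stepQueue = new LinkedBlockingQueue<>();

        // Fill the queue with the "builds" (here they just print the module name).
        for (int i = 0; i < 10; i++) {
            final int module = i;
            stepQueue.add(() -> System.out.println("building module " + module));
        }

        // Start a fixed number of workers; each keeps taking work until the queue is empty.
        List<Thread> workers = new ArrayList<>();
        for (int w = 0; w < maxConcurrentSteps; w++) {
            Thread worker = new Thread(() -> {
                Runnable work;
                while ((work = stepQueue.poll()) != null) { // poll() returns null once the queue is drained
                    work.run();
                }
            });
            worker.start();
            workers.add(worker);
        }
        for (Thread worker : workers) {
            worker.join();
        }
    }
}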

Job parameters are getting cached

I am facing a problem with job parameters in Spring Batch. I have a job parameter which is optional. The first time, when I pass the job parameter through CommandLineJobRunner, it works. The second time I do not pass any job parameter, but it still takes the previous job parameter. When I clear my meta-data, the job parameter comes through as null when I don't pass it. How can I fix this without clearing the meta-data? Does this happen normally in Spring Batch?
Edited code:
I am using MapJobRegistry, and next is used while launching the job. When I debugged, I observed that in order to increment the run.id it loads all the previous parameters:
public JobParameters getNext(JobParameters parameters) {
    if (parameters == null) {
        parameters = new JobParameters();
    }
    long id = parameters.getLong(key, 0L) + 1;
    return new JobParametersBuilder(parameters).addLong(key, id).toJobParameters();
}
The first thing to mention is that you should not use the Map... classes. They are not intended for production and, therefore, you are better off using the various JDBC implementations. If you don't want to use a real DB, you can always use an in-memory DB.
But about your initial question:
You are using the CommandLineJobRunner together with the option "next".
Having a look at the method CommandLineJobRunner.start(), you find the following lines:
if (opts.contains("-next")) {
    JobParameters nextParameters = getNextJobParameters(job);
    Map<String, JobParameter> map = new HashMap<String, JobParameter>(nextParameters.getParameters());
    map.putAll(jobParameters.getParameters());
    jobParameters = new JobParameters(map);
}
You can see that getNextJobParameters is called. Inside this method, the data of the previous run is loaded via 'jobExplorer.getJobInstances(jobIdentifier, 0, 1);' (if there was a previous run). If there is a previous run, then the job parameters of this old run are returned after applying the incrementer's getNext method; hence, this is the reason you get your old parameters.
Now, this is the technical explanation, but the question that follows is how you should use the "next" and "restart" options in order to get what you want.
Using next:
- next only works as expected if you launch a job with the same name and the same job parameters. Otherwise the results can be confusing. Actually, I only use next inside unit and integration tests.
Using restart:
- you can use "restart" if the previous job execution with the same job name failed. Here too, the job parameters will be taken from your previous launch.
For a normal start of a job, you should use neither next nor restart. A normal start of a job should always have a unique job parameter, for instance a "runId" whose value changes with every start of the job (otherwise you would get a JobInstanceAlready... exception).
In the case of unit tests, I use a unique "runId" for every test case, and there I am using the "next" option.
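As a minimal sketch of that last point (parameter and bean names are assumed): a normal launch can be given a parameter that is unique per run, instead of relying on -next, so the old parameters of a previous instance are never reused.
// jobLauncher and job are assumed to be injected Spring beans.
JobParameters params = new JobParametersBuilder()
        .addLong("runId", System.currentTimeMillis()) // unique for every launch
        .toJobParameters();

jobLauncher.run(job, params);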

Spring Batch (java-config) executing a step after a JobExecutionDecider

I'm trying to configure a Flow in Spring Batch using Java config;
this flow basically has to do this:
1. Execute an init step (which adds a record to the database),
2. then execute a decider to check file existence,
2.1. IF the files exist, it will execute the load job (which is another flow with a bunch of steps in parallel),
3. Execute a finish step (which adds a record to the database); this should always run, even if 2.1 was not executed.
I tried to do this configuration, but the finish step never runs:
Flow flow = new FlowBuilder<SimpleFlow>("commonFlow")
        .start(stepBuilderFactory.get("initStep").tasklet(initTasklet).build())
        .next(decider)
        .on(FlowExecutionStatus.COMPLETED.getName())
        .to(splitFlow)
        .from(decider).on("*")
        .end()
        .next(stepBuilderFactory.get("finishStep").tasklet(finishTasklet).build())
        .end();
I'm able to make it work by doing it as below, but it is not elegant at all:
Step finishStep = stepBuilderFactory.get("finishStep").tasklet(finishTasklet).build();

Flow flow = new FlowBuilder<SimpleFlow>("commonFlow")
        .start(stepBuilderFactory.get("initStep").tasklet(initTasklet).build())
        .next(decider)
        .on(FlowExecutionStatus.COMPLETED.getName())
        .to(splitFlow)
        .next(finishStep)
        .from(decider).on("*")
        .to(finishStep)
        .end();
Does anybody know the right way to execute a step after a decision using Java config?
It sounds like this is being made MUCH more complicated than it needs to be. You do not need to configure a flow or decider. This is a VERY simple in and out job.
The simplest option is to use Spring Integration to detect the presence of a file and trigger the job.
The next simplest option is just to have a Quartz or cron job check for the file and start the batch job.
Last but not least, you can have the job run and, if the ItemReader cannot find the file(s), just swallow the exception. Or set a listener on the file ItemReader to check for files in its before method.
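One hedged way to get the "swallow the missing file" behaviour, assuming a FlatFileItemReader is used (record type, path and helper names here are made up): mark the reader as non-strict so a missing input resource does not fail the step and the step simply reads zero items.
@Bean
public FlatFileItemReader<MyRecord> inputReader() {
    FlatFileItemReader<MyRecord> reader = new FlatFileItemReader<>();
    reader.setResource(new FileSystemResource("/data/input.csv")); // hypothetical path
    reader.setLineMapper(lineMapper());                            // assumed to be defined elsewhere
    // With strict=false a missing file is only logged and the step completes with zero items read.
    reader.setStrict(false);
    return reader;
}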
You can use two separate Flows to achieve this.
Flow flowWithDecider = new FlowBuilder<SimpleFlow>("flow-with-decider")
        .start(decider)
        .on("ok")
        .to(okStep1)
        .next(okStep2)
        .from(decider)
        .on("failed")
        .to(failedStep)
        .build();

Flow commonFlow = new FlowBuilder<SimpleFlow>("common-flow")
        .start(commonStep)
        .build();

Job job = jobs
        .get("my-job")
        .start(flowWithDecider)
        .next(commonFlow) // this will execute after the previous flow, regardless of the decision
        .end()
        .build();
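For completeness, a minimal sketch (file path assumed, status strings matching the "ok"/"failed" transitions above) of what the decider itself could look like:
import java.nio.file.Files;
import java.nio.file.Paths;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.job.flow.FlowExecutionStatus;
import org.springframework.batch.core.job.flow.JobExecutionDecider;

public class FileExistsDecider implements JobExecutionDecider {

    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        // Route into the load flow only when the input file is present.
        boolean filePresent = Files.exists(Paths.get("/data/input.csv")); // hypothetical path
        return new FlowExecutionStatus(filePresent ? "ok" : "failed");
    }
}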

Passing value between two separate MapReduce jobs

I have a Hadoop program where I need to pass a single output value, which is generated by the first MapReduce job, to a second MapReduce job.
Ex.
MapReduce-1 -> writes a double value to HDFS (the file name is similar to part-00000).
In the second MapReduce job I want to use the double value from the part-00000 file.
How can I do it? Can anyone please give a code snippet?
Wait for the first job to finish and then run the second one on the output of the first. You can do it:
1) In the Driver (see the sketch after this list for reading the single value and handing it to the second job):
int code = firstJob.waitForCompletion(true) ? 0 : 1;
if (code == 0) {
    Job secondJob = new Job(new Configuration(), "JobChaining-Second");
    TextInputFormat.addInputPath(secondJob, outputDirOfFirstJob);
    ...
}
2) Use JobControl and ControlledJob:
http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapred/jobcontrol/JobControl.html
To use JobControl, start by wrapping your jobs with ControlledJob. Doing this is relatively simple: you create your job like you usually would, except you also create a ControlledJob that takes in your Job or Configuration as a parameter, along with a list of its dependencies (other ControlledJobs). Then, you add them one by one to the JobControl object, which handles the rest.
3) Externally (e.g. from a shell script), passing input/output paths as arguments.
4) Use Apache Oozie. You will specify your jobs in XML.
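As a hedged sketch of option 1 (paths, key name and record format are assumptions): the driver can read the single double value from the first job's part-00000 output with the FileSystem API and hand it to the second job through its Configuration.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class DriverSketch {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // ... configure and run the first job, writing to /tmp/job1-output ...

        // Read the single value written by the first job (assumes it is the only token on the first line).
        Path firstOutput = new Path("/tmp/job1-output/part-00000"); // hypothetical path
        FileSystem fs = FileSystem.get(conf);
        double value;
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(firstOutput)))) {
            value = Double.parseDouble(reader.readLine().trim());
        }

        // Make the value available to the second job's tasks via the Configuration.
        Configuration secondConf = new Configuration();
        secondConf.setDouble("jobchaining.first.result", value);
        Job secondJob = Job.getInstance(secondConf, "JobChaining-Second");
        // ... set mapper/reducer classes and input/output paths, then secondJob.waitForCompletion(true) ...
    }
}
Inside the second job's mapper or reducer, the value can then be read back with context.getConfiguration().getDouble("jobchaining.first.result", 0.0).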
