I am trying to achieve the flow shown in the image below using Spring Batch. I was referring to the Java configuration section on page 85 of https://docs.spring.io/spring-batch/4.0.x/reference/pdf/spring-batch-reference.pdf.
For some reason, when the decider returns TYPE2, the batch ends in a FAILED state without any error message. Here is the Java configuration of my job:
jobBuilderFactory.get("myJob")
.incrementer(new RunIdIncrementer())
.preventRestart()
.start(firstStep())
.next(typeDecider()).on("TYPE1").to(stepType1()).next(lastStep())
.from(typeDecider()).on("TYPE2").to(stepType2()).next(lastStep())
.end()
.build();
I think something is not right with the Java configuration, even though it matches the Spring documentation. A flow could be useful here, but I am sure there must be a way without one. Any idea how to achieve this?
You need to define the flow not only from the decider to the next steps, but also from stepType1 and stepType2 to lastStep. Here is an example:
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.job.flow.FlowExecutionStatus;
import org.springframework.batch.core.job.flow.JobExecutionDecider;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@Configuration
@EnableBatchProcessing
public class MyJob {
@Autowired
private JobBuilderFactory jobs;
@Autowired
private StepBuilderFactory steps;
@Bean
public Step firstStep() {
return steps.get("firstStep")
.tasklet((contribution, chunkContext) -> {
System.out.println("firstStep");
return RepeatStatus.FINISHED;
})
.build();
}
@Bean
public JobExecutionDecider decider() {
return (jobExecution, stepExecution) -> new FlowExecutionStatus("TYPE1"); // or TYPE2
}
@Bean
public Step stepType1() {
return steps.get("stepType1")
.tasklet((contribution, chunkContext) -> {
System.out.println("stepType1");
return RepeatStatus.FINISHED;
})
.build();
}
@Bean
public Step stepType2() {
return steps.get("stepType2")
.tasklet((contribution, chunkContext) -> {
System.out.println("stepType2");
return RepeatStatus.FINISHED;
})
.build();
}
@Bean
public Step lastStep() {
return steps.get("lastStep")
.tasklet((contribution, chunkContext) -> {
System.out.println("lastStep");
return RepeatStatus.FINISHED;
})
.build();
}
@Bean
public Job job() {
return jobs.get("job")
.start(firstStep())
.next(decider())
.on("TYPE1").to(stepType1())
.from(decider()).on("TYPE2").to(stepType2())
.from(stepType1()).on("*").to(lastStep())
.from(stepType2()).on("*").to(lastStep())
.build()
.build();
}
public static void main(String[] args) throws Exception {
ApplicationContext context = new AnnotationConfigApplicationContext(MyJob.class);
JobLauncher jobLauncher = context.getBean(JobLauncher.class);
Job job = context.getBean(Job.class);
jobLauncher.run(job, new JobParameters());
}
}
This prints:
firstStep
stepType1
lastStep
If the decider returns TYPE2, the sample prints:
firstStep
stepType2
lastStep
Hope this helps.
I ran into a similar issue where the else branch was never taken (technically, only the first configured on() transition was taken).
Almost all websites with flow and decider examples show similar job configurations, and I was not able to figure out what the issue was.
After some research, I found out how Spring maintains deciders and decisions.
At a high level, while initializing the application, Spring builds a list of decision states for a decider object based on the job configuration (decision0, decision1, and so on).
When you call the decider() method, it returns a new object every time. Because each call returns a new object, the list contains only one mapping per object (i.e., decision0), and since it is a list, the first configured decision is always returned. This is why only the first configured transition is ever taken.
Solution:
Instead of making a method call to the decider, create a singleton bean for the decider and use it in the job configuration.
Example:
@Bean
public JobExecutionDecider stepDecider() {
return new CustomStepDecider();
}
Inject it and use it in the job creation bean:
@Bean
public Job sampleJob(Step step1, Step step2, Step step3,
JobExecutionDecider stepDecider) {
return jobBuilderFactory.get("sampleJob")
.start(step1)
.next(stepDecider).on("TYPE1").to(step2)
.from(stepDecider).on("TYPE2").to(step3)
.end()
.build();
}
Hope this helps.
Create a dummy step which returns a FINISHED status and jumps to the next decider. You need to redirect the flow cursor to the next decider or a virtual step after finishing the current step:
.next(copySourceFilesStep())
.next(firstStepDecider).on(STEP_CONTINUE).to(executeStep_1())
.from(firstStepDecider).on(STEP_SKIP).to(virtualStep_1())
//-executeStep_2
.from(executeStep_1()).on(ExitStatus.COMPLETED.getExitCode())
.to(secondStepDecider).on(STEP_CONTINUE).to(executeStep_2())
.from(secondStepDecider).on(STEP_SKIP).to(virtualStep_3())
.from(virtualStep_1()).on(ExitStatus.COMPLETED.getExitCode())
.to(secondStepDecider).on(STEP_CONTINUE).to(executeStep_2())
.from(secondStepDecider).on(STEP_SKIP).to(virtualStep_3())
//-executeStep_3
.from(executeStep_2()).on(ExitStatus.COMPLETED.getExitCode())
.to(thirdStepDecider).on(STEP_CONTINUE).to(executeStep_3())
.from(thirdStepDecider).on(STEP_SKIP).to(virtualStep_4())
.from(virtualStep_3()).on(ExitStatus.COMPLETED.getExitCode())
.to(thirdStepDecider).on(STEP_CONTINUE).to(executeStep_3())
.from(thirdStepDecider).on(STEP_SKIP).to(virtualStep_4())
@Bean
public Step virtualStep_2() {
return stepBuilderFactory.get("continue-virtualStep2")
.tasklet((contribution, chunkContext) -> {
return RepeatStatus.FINISHED;
})
.build();
}
I'm working with Spring Batch and have a job with two steps. The first step (a tasklet) validates the header of a CSV file, and the second step reads that CSV file and writes to another CSV file, like this:
@Bean
public ClassifierCompositeItemWriter<POJO> classifierCompositeItemWriter() throws Exception {
Classifier<POJO, ItemWriter<? super POJO>> classifier = new ClassiItemWriter(ClassiItemWriter.itemWriter());
return new ClassifierCompositeItemWriterBuilder<POJO>()
.classifier(classifier)
.build();
}
@Bean
public Step readAndWriteCsvFile() throws Exception {
return stepBuilderFactory.get("readAndWriteCsvFile")
.<POJO, POJO>chunk(10000)
.reader(ClassitemReader.itemReader())
.processor(processor())
.writer(classifierCompositeItemWriter())
.build();
}
I used a FlatFileItemReader (in ClassitemReader) and a FlatFileItemWriter (in ClassiItemWriter). Before reading the CSV, I check whether the header of the CSV file is correct via a tasklet, like this:
@Bean
public Step fileValidatorStep() {
return stepBuilderFactory
.get("fileValidatorStep")
.tasklet(fileValidator)
.build();
}
And if so, I process the transformation from the received CSV file to another CSV file.
In the jobBuilderFactory, I check whether the ExitStatus coming from the fileValidatorStep tasklet is "COMPLETED" in order to forward processing to readAndWriteCsvFile(); if instead the fileValidatorStep tasklet returns the ExitStatus "ERROR", the job ends and exits processing.
@Bean
public Job job() throws Exception {
return jobBuilderFactory.get("job")
.incrementer(new RunIdIncrementer())
.start(fileValidatorStep()).on("ERROR").end()
.next(fileValidatorStep()).on("COMPLETED").to(readAndWriteCsvFile())
.end().build();
}
The problem is that when I launch my job, the readAndWriteCsvFile bean runs before the tasklet. That is, the standard reader and writer beans of Spring Batch are loaded in the lifecycle before I can validate the header and check the ExitStatus: the reader still reads the file and puts data in another file without the check, because the beans are loaded during job launch, before any tasklet runs.
How can I make the readAndWriteCsvFile method run after fileValidatorStep?
You don't need a flow job for that; a simple job is enough. Here is a quick example:
import java.util.Arrays;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.item.support.ListItemReader;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@Configuration
@EnableBatchProcessing
public class MyJobConfiguration {
@Bean
public Step validationStep(StepBuilderFactory stepBuilderFactory) {
return stepBuilderFactory.get("validationStep")
.tasklet(new Tasklet() {
@Override
public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
if(!isValid()) {
throw new Exception("Invalid file");
}
return RepeatStatus.FINISHED;
}
private boolean isValid() {
// TODO implement validation logic
return false;
}
})
.build();
}
@Bean
public Step readAndWriteCsvFile(StepBuilderFactory stepBuilderFactory) {
return stepBuilderFactory.get("readAndWriteCsvFile")
.<Integer, Integer>chunk(2)
.reader(new ListItemReader<>(Arrays.asList(1, 2, 3, 4)))
.writer(items -> items.forEach(System.out::println))
.build();
}
@Bean
public Job job(JobBuilderFactory jobBuilderFactory, StepBuilderFactory stepBuilderFactory) {
return jobBuilderFactory.get("job")
.start(validationStep(stepBuilderFactory))
.next(readAndWriteCsvFile(stepBuilderFactory))
.build();
}
public static void main(String[] args) throws Exception {
ApplicationContext context = new AnnotationConfigApplicationContext(MyJobConfiguration.class);
JobLauncher jobLauncher = context.getBean(JobLauncher.class);
Job job = context.getBean(Job.class);
jobLauncher.run(job, new JobParameters());
}
}
In this example, if the validationStep fails, the next step will not be executed.
I solved my problem: I annotated the FlatFileItemReader bean inside the job configuration class with @StepScope. Now this bean is only loaded when I need it. You should also avoid declaring the FlatFileItemReader bean outside the scope of the job.
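For reference, here is a minimal sketch of such a step-scoped reader bean (the ImportLine type, the field names, and the inputFile job parameter are hypothetical placeholders):
@Bean
@StepScope
public FlatFileItemReader<ImportLine> itemReader(
        @Value("#{jobParameters['inputFile']}") String inputFile) {
    // with @StepScope, the bean is created when the step runs, not at application startup
    return new FlatFileItemReaderBuilder<ImportLine>()
            .name("itemReader")
            .resource(new FileSystemResource(inputFile))
            .delimited()
            .names(new String[] {"field1", "field2"})
            .targetType(ImportLine.class)
            .build();
}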
I have an existing Spring Batch project that has multiple steps. I want to modify a step so that I can stop the job: jobExecution.getStatus() == STOPPED.
My step :
@Autowired
public StepBuilderFactory stepBuilderFactory;
@Autowired
private StepReader reader;
@Autowired
private StepProcessor processor;
@Autowired
private StepWriter writer;
@Autowired
public GenericListener listener;
@Bean
@JobScope
@Qualifier("mystep")
public Step MyStep() throws ReaderException {
return stepBuilderFactory.get("mystep")
.reader(reader.read())
.listener(listener)
.processor(processor)
.writer(writer)
.build();
}
GenericListener implements ItemReadListener, ItemProcessListener, and ItemWriteListener, and overrides the before and after methods, which basically write logs.
The focus here is on the StepReader class and its read() method, which returns a FlatFileItemReader:
@Component
public class StepReader {
public static final String DELIMITER = "|";
@Autowired
private ClassToAccessProperties classToAccessProperties;
private Logger log = Logger.create(StepReader.class);
@Autowired
private FlatFileItemReaderFactory<MyObject> flatFileItemReaderFactory;
public ItemReader<MyObject> read() throws ReaderException {
try {
String csv = classToAccessProperties.getInputCsv();
FlatFileItemReader<MyObject> reader = flatFileItemReaderFactory.create(csv, getLineMapper());
return reader;
} catch (ReaderException | EmptyInputfileException | IOException e) {
throw new ReaderException(e);
} catch (NoInputFileException e) {
log.info("Oh no !! No input file");
// Here I want to stop the job
return null;
}
}
private LineMapper<MyObject> getLineMapper () {
DefaultLineMapper<MyObject> mapper = new DefaultLineMapper<>();
DelimitedLineTokenizer delimitedLineTokenizer = new DelimitedLineTokenizer();
delimitedLineTokenizer.setDelimiter(DELIMITER);
mapper.setLineTokenizer(delimitedLineTokenizer);
mapper.setFieldSetMapper(new MyObjectFieldSetMapper());
return mapper;
}
}
I tried to implement StepExecutionListener in StepReader, but with no luck. I think this is because the reader method in StepBuilderFactory expects an ItemReader from reader.read() and doesn't care about the rest of the class.
I'm looking for ideas or a solution to stop the entire job (not fail it) when a NoInputFileException is caught.
I'm looking for ideas or a solution to stop the entire job (not fail it) when a NoInputFileException is caught.
This is a common pattern and is described in detail in the Handling Step Completion When No Input is Found section of the reference documentation. The example in that section shows how to fail a job when no input file is found, but since you want to stop the job instead of failing it, you can use StepExecution#setTerminateOnly() in a listener and your job will end with status STOPPED. In your example, you would add that listener to the MyStep step.
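As a minimal sketch (my own illustration, not the exact code from the documentation), a StepExecutionListener could flag the step for a graceful stop before any item is read; the input path check here is a hypothetical placeholder for your NoInputFileException detection:
import java.nio.file.Files;
import java.nio.file.Paths;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.listener.StepExecutionListenerSupport;

public class NoInputFileStopListener extends StepExecutionListenerSupport {

    @Override
    public void beforeStep(StepExecution stepExecution) {
        // hypothetical check; replace with however you detect the missing input file
        if (!Files.exists(Paths.get("input.csv"))) {
            // flags the execution so the step throws JobInterruptedException
            // at the next chunk boundary and the job ends with status STOPPED
            stepExecution.setTerminateOnly();
        }
    }
}
You would then register it on the step with .listener(new NoInputFileStopListener()).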
However, I would suggest adding a pre-validation step and stopping the job if there is no file. Here is a quick example:
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@Configuration
@EnableBatchProcessing
public class MyJob {
@Autowired
private JobBuilderFactory jobs;
@Autowired
private StepBuilderFactory steps;
@Bean
public Step fileValidationStep() {
return steps.get("fileValidationStep")
.tasklet((contribution, chunkContext) -> {
// TODO add code to check if the file exists
System.out.println("file not found");
chunkContext.getStepContext().getStepExecution().setTerminateOnly();
return RepeatStatus.FINISHED;
})
.build();
}
@Bean
public Step fileProcessingStep() {
return steps.get("fileProcessingStep")
.tasklet((contribution, chunkContext) -> {
System.out.println("processing file");
return RepeatStatus.FINISHED;
})
.build();
}
@Bean
public Job job() {
return jobs.get("job")
.start(fileValidationStep())
.next(fileProcessingStep())
.build();
}
public static void main(String[] args) throws Exception {
ApplicationContext context = new AnnotationConfigApplicationContext(MyJob.class);
JobLauncher jobLauncher = context.getBean(JobLauncher.class);
Job job = context.getBean(Job.class);
JobExecution jobExecution = jobLauncher.run(job, new JobParameters());
System.out.println("Job status: " + jobExecution.getExitStatus().getExitCode());
}
}
The example prints:
file not found
Job status: STOPPED
Hope this helps.
Goal: if there is an AdmisSkipException (a custom exception), I want the job to skip the record and keep processing the next lines.
If there is any other exception, I want the job to stop.
Here is what I have so far:
Conf:
.<Admis, PreCandidat>chunk(100)
.reader(readerDBAdmis())
.processor(new AdmisItemProcessor(preCandidatRepository, scolFormationSpecialisationRepository, preCandidatureRepository))
.faultTolerant()
.skipPolicy(AdmisVerificationSkipper())
.writer(writerPGICocktail()).build();
AdmisSkipException :
public class AdmisSkipException extends Exception {
private TypeRejet typeRejet;
private Admis admis;
public AdmisSkipException(TypeRejet typeRejet, Admis admis) {
super();
this.typeRejet = typeRejet;
this.admis = admis;
}
public TypeRejet getTypeRejet() {
return typeRejet;
}
public Admis getAdmis() {
return admis;
}
}
AdmisVerificationSkipper :
public class AdmisVerificationSkipper implements SkipPolicy {
private AdmisRejetRepository admisRejetRepository;
public AdmisVerificationSkipper(AdmisRejetRepository admisRejetRepository) {
this.admisRejetRepository = admisRejetRepository;
}
@Override
public boolean shouldSkip(Throwable exception, int skipCount) throws SkipLimitExceededException {
if (exception instanceof AdmisSkipException) {
AdmisSkipException admisSkipException = (AdmisSkipException) exception;
AdmisRejet rejet = new AdmisRejet();
rejet.setAdmis(admisSkipException.getAdmis());
rejet.setTypeRejet(admisSkipException.getTypeRejet());
admisRejetRepository.save(rejet);
return true;
}
return false;
}
}
With this configuration, if a NullPointerException (for example) is thrown in AdmisItemProcessor, the job continues instead of failing.
What should I change to stop the job?
If there is an AdmisSkipException (a custom exception), I want the job to skip the record and keep processing the next lines. If there is any other exception, I want the job to stop.
You can achieve this with:
.<Admis, PreCandidat>chunk(100)
.reader(readerDBAdmis())
.processor(new AdmisItemProcessor(preCandidatRepository, scolFormationSpecialisationRepository, preCandidatureRepository))
.writer(writerPGICocktail())
.faultTolerant()
.skip(AdmisSkipException.class)
.skipLimit(SKIP_LIMIT)
.build();
Looking at your code, you probably had to create a custom skip policy because you want to save skipped items somewhere. I would recommend using a SkipListener instead, which is designed specifically for this type of requirement. Having a shouldSkip method save items to a repository is a side effect, so this is better done with a listener. That said, you won't need a custom policy, and .skip(AdmisSkipException.class).skipLimit(SKIP_LIMIT) should be enough.
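For example, here is a minimal sketch of such a SkipListener, reusing the Admis, PreCandidat, AdmisRejet, and AdmisRejetRepository types from your code:
import org.springframework.batch.core.SkipListener;

public class AdmisSkipListener implements SkipListener<Admis, PreCandidat> {

    private final AdmisRejetRepository admisRejetRepository;

    public AdmisSkipListener(AdmisRejetRepository admisRejetRepository) {
        this.admisRejetRepository = admisRejetRepository;
    }

    @Override
    public void onSkipInProcess(Admis item, Throwable t) {
        // same persistence logic as in shouldSkip, but as a dedicated callback
        if (t instanceof AdmisSkipException) {
            AdmisSkipException e = (AdmisSkipException) t;
            AdmisRejet rejet = new AdmisRejet();
            rejet.setAdmis(e.getAdmis());
            rejet.setTypeRejet(e.getTypeRejet());
            admisRejetRepository.save(rejet);
        }
    }

    @Override
    public void onSkipInRead(Throwable t) {
        // nothing to persist for read failures
    }

    @Override
    public void onSkipInWrite(PreCandidat item, Throwable t) {
        // nothing to persist for write failures
    }
}
You would register it on the step with .listener(new AdmisSkipListener(admisRejetRepository)) after .faultTolerant(), so the persistence happens in a dedicated callback instead of inside the skip decision.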
With this configuration, if a NullPointerException (for example) is thrown in AdmisItemProcessor, the job continues instead of failing. What should I change to stop the job?
Here is an example you can run to see how it works:
import java.util.Arrays;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.support.ListItemReader;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.lang.Nullable;
@Configuration
@EnableBatchProcessing
public class MyJob {
@Autowired
private JobBuilderFactory jobs;
@Autowired
private StepBuilderFactory steps;
@Bean
public ItemReader<Integer> itemReader() {
return new ListItemReader<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10));
}
@Bean
public ItemProcessor<Integer, Integer> itemProcessor() {
return new ItemProcessor<Integer, Integer>() {
@Nullable
@Override
public Integer process(Integer item) throws Exception {
if (item.equals(3)) {
throw new IllegalArgumentException("No 3!");
}
if (item.equals(9)) {
throw new NullPointerException("Boom at 9!");
}
return item;
}
};
}
@Bean
public ItemWriter<Integer> itemWriter() {
return items -> {
for (Integer item : items) {
System.out.println("item = " + item);
}
};
}
@Bean
public Step step() {
return steps.get("step")
.<Integer, Integer>chunk(1)
.reader(itemReader())
.processor(itemProcessor())
.writer(itemWriter())
.faultTolerant()
.skip(IllegalArgumentException.class)
.skipLimit(3)
.build();
}
@Bean
public Job job() {
return jobs.get("job")
.start(step())
.build();
}
public static void main(String[] args) throws Exception {
ApplicationContext context = new AnnotationConfigApplicationContext(MyJob.class);
JobLauncher jobLauncher = context.getBean(JobLauncher.class);
Job job = context.getBean(Job.class);
JobExecution jobExecution = jobLauncher.run(job, new JobParameters());
System.out.println(jobExecution);
}
}
This example skips items when IllegalArgumentExceptions are thrown and fails the job if a NullPointerException happens.
Hope this helps.
I am just wondering: is this doable in Spring Batch?
Step1
Step2 (flow) -> flow1, flow2, flow3
Step3
Where each flow is partitioned:
flow1 -> partition with gridSize = 5
flow2 -> partition with gridSize = 5
flow3 -> partition with gridSize = 5
return jobBuilderFactory.get("dataLoad")
.incrementer(new RunIdIncrementer())
.listener(listener)
.start(step1())
.next(step2())
.next(step3())
.build()
.build();
@Bean
public Flow step2() {
Flow subflow1 = new FlowBuilder<Flow>("readTable1Flow").from(readTable1()).end();
Flow subflow2 = new FlowBuilder<Flow>("readTable2Flow").from(readTable2()).end();
Flow subflow3 = new FlowBuilder<Flow>("readTable3Flow").from(readTable3()).end();
return new FlowBuilder<Flow>("splitflow").split(new SimpleAsyncTaskExecutor())
.add(subflow1, subflow2, subflow3).build();
}
@Bean
public Step readTable1() {
return stepBuilderFactory.get("readTable1Step")
.partitioner(slaveStep1().getName(), partitioner1())
.partitionHandler(slaveStep1Handler())
.build();
}
@Bean
public Step readTable2() {
return stepBuilderFactory.get("readTable2Step")
.partitioner(slaveStep2().getName(), partitioner2())
.partitionHandler(slaveStep2Handler())
.build();
}
@Bean
public Step readTable3() {
return stepBuilderFactory.get("readTable3Step")
.partitioner(slaveStep3().getName(), partitioner2())
.partitionHandler(slaveStep3Handler())
.build();
}
This is possible. You can have a split flow where each step running in parallel is a partitioned step. Here is an example:
import java.util.HashMap;
import java.util.Map;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.core.job.builder.FlowBuilder;
import org.springframework.batch.core.job.flow.Flow;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.SimpleAsyncTaskExecutor;
@Configuration
@EnableBatchProcessing
public class MyJob {
@Autowired
private JobBuilderFactory jobs;
@Autowired
private StepBuilderFactory steps;
@Bean
public Step step1() {
return steps.get("step1")
.tasklet((contribution, chunkContext) -> {
System.out.println(Thread.currentThread().getName() + ": step1");
return RepeatStatus.FINISHED;
})
.build();
}
@Bean
public Flow step2() {
Flow subflow1 = new FlowBuilder<Flow>("step21_master").from(step21_master()).end();
Flow subflow2 = new FlowBuilder<Flow>("step22_master").from(step22_master()).end();
Flow subflow3 = new FlowBuilder<Flow>("step23_master").from(step23_master()).end();
return new FlowBuilder<Flow>("splitflow").split(taskExecutor())
.add(subflow1, subflow2, subflow3).build();
}
@Bean
public Step step21_master() {
return steps.get("step21_master")
.partitioner("workerStep", partitioner("step21_master"))
.step(workerStep())
.gridSize(3)
.taskExecutor(taskExecutor())
.build();
}
@Bean
public Step step22_master() {
return steps.get("step22_master")
.partitioner("workerStep", partitioner("step22_master"))
.step(workerStep())
.gridSize(3)
.taskExecutor(taskExecutor())
.build();
}
@Bean
public Step step23_master() {
return steps.get("step23_master")
.partitioner("workerStep", partitioner("step23_master"))
.step(workerStep())
.gridSize(3)
.taskExecutor(taskExecutor())
.build();
}
@Bean
public Step step3() {
return steps.get("step3")
.tasklet((contribution, chunkContext) -> {
System.out.println(Thread.currentThread().getName() + ": step3");
return RepeatStatus.FINISHED;
})
.build();
}
@Bean
public Step workerStep() {
return steps.get("workerStep")
.tasklet(getTasklet(null))
.build();
}
@Bean
@StepScope
public Tasklet getTasklet(@Value("#{stepExecutionContext['data']}") String partitionData) {
return (contribution, chunkContext) -> {
System.out.println(Thread.currentThread().getName() + " processing partitionData = " + partitionData);
return RepeatStatus.FINISHED;
};
}
@Bean
public Job job() {
return jobs.get("job")
.flow(step1()).on("*").to(step2())
.next(step3())
.build()
.build();
}
@Bean
public SimpleAsyncTaskExecutor taskExecutor() {
return new SimpleAsyncTaskExecutor();
}
public Partitioner partitioner(String stepName) {
return gridSize -> {
Map<String, ExecutionContext> map = new HashMap<>(gridSize);
for (int i = 0; i < gridSize; i++) {
ExecutionContext executionContext = new ExecutionContext();
executionContext.put("data", stepName + ":data" + i);
String key = stepName + ":partition" + i;
map.put(key, executionContext);
}
return map;
};
}
public static void main(String[] args) throws Exception {
ApplicationContext context = new AnnotationConfigApplicationContext(MyJob.class);
JobLauncher jobLauncher = context.getBean(JobLauncher.class);
Job job = context.getBean(Job.class);
jobLauncher.run(job, new JobParameters());
}
}
In this example, step2 is a split flow (as in your example) where each sub flow is a partitioned step (a master step) with 3 worker steps. If you run this sample, you should see something like:
[main] INFO org.springframework.batch.core.launch.support.SimpleJobLauncher - Job: [FlowJob: [name=job]] launched with the following parameters: [{}]
[main] INFO org.springframework.batch.core.job.SimpleStepHandler - Executing step: [step1]
main: step1
[SimpleAsyncTaskExecutor-1] INFO org.springframework.batch.core.job.SimpleStepHandler - Executing step: [step21_master]
[SimpleAsyncTaskExecutor-3] INFO org.springframework.batch.core.job.SimpleStepHandler - Executing step: [step23_master]
[SimpleAsyncTaskExecutor-2] INFO org.springframework.batch.core.job.SimpleStepHandler - Executing step: [step22_master]
SimpleAsyncTaskExecutor-4 processing partitionData = step21_master:data2
SimpleAsyncTaskExecutor-12 processing partitionData = step22_master:data2
SimpleAsyncTaskExecutor-11 processing partitionData = step22_master:data0
SimpleAsyncTaskExecutor-10 processing partitionData = step22_master:data1
SimpleAsyncTaskExecutor-9 processing partitionData = step23_master:data1
SimpleAsyncTaskExecutor-8 processing partitionData = step23_master:data2
SimpleAsyncTaskExecutor-7 processing partitionData = step23_master:data0
SimpleAsyncTaskExecutor-5 processing partitionData = step21_master:data0
SimpleAsyncTaskExecutor-6 processing partitionData = step21_master:data1
[main] INFO org.springframework.batch.core.job.SimpleStepHandler - Executing step: [step3]
main: step3
[main] INFO org.springframework.batch.core.launch.support.SimpleJobLauncher - Job: [FlowJob: [name=job]] completed with the following parameters: [{}] and the following status: [COMPLETED]
step1, step2 and step3 run in sequence, where step2 is split into 3 sub-flows running in parallel, and each sub-flow is a partitioned (master) step with 3 worker steps running in parallel.
I got it working now. The reason I was not able to make it work was a thread deadlock: Spring Batch was trying to insert/update its metadata in the database (HSQLDB), and each flow was stuck in a waiting state. Once I switched to a different database, it worked. Thank you Mahmoud for your input!
Before the Spring Batch job runs, I have an import table which contains all items that need importing into our system. At that point it is verified to contain only items that do not exist in our system.
Next, I have a Spring Batch job which reads from this import table using a JpaPagingItemReader.
After the work is done, it writes to the database using an ItemWriter.
I run with a page size and chunk size of 10000.
Now this works absolutely fine when running on MySQL InnoDB. I can even use multiple threads and everything works fine.
But now we are migrating to PostgreSQL, and the same batch job runs into a very strange problem:
What happens is that it tries to insert duplicates into our system. These are naturally rejected by the unique index constraints, and an error is thrown.
Since the import table is verified to contain only non-existing items before the batch job starts, the only reason for this I can think of is that the JpaPagingItemReader reads some rows multiple times from the import table when I run on Postgres. But why would it do that?
I have experimented with a lot of settings. Turning the chunk and page size down to around 100 only makes the import slower, with the same error. Running single-threaded instead of multi-threaded only makes the error happen slightly later.
So what on earth could be the reason for my JpaPagingItemReader reading the same items multiple times, only on PostgreSQL?
The select statement backing the reader is simple; it's a NamedQuery:
@NamedQuery(name = "ImportDTO.findAllForInsert",
query = "select h from ImportDTO h where h.toBeImported = true")
Please also note that the toBeImported flag is not altered by the batch job at all during runtime, so the results of this query should always be the same before, during, and after the batch job.
Any insights, tips or help is greatly appreciated!
Here is the batch configuration code:
import javax.persistence.EntityManagerFactory;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.core.launch.support.SimpleJobLauncher;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.item.database.JpaPagingItemReader;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.SimpleAsyncTaskExecutor;
import org.springframework.core.task.TaskExecutor;
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {
@Autowired
private JobBuilderFactory jobBuilderFactory;
@Autowired
private StepBuilderFactory stepBuilderFactory;
@Autowired
private OrganizationItemWriter organizationItemWriter;
@Autowired
private EntityManagerFactory entityManagerFactory;
@Autowired
private OrganizationUpdateProcessor organizationUpdateProcessor;
@Autowired
private OrganizationInsertProcessor organizationInsertProcessor;
private Integer organizationBatchSize = 10000;
private Integer organizationThreadSize = 3;
private Integer maxThreadSize = organizationThreadSize;
@Bean
public SimpleJobLauncher jobLauncher(JobRepository jobRepository) {
SimpleJobLauncher launcher = new SimpleJobLauncher();
launcher.setJobRepository(jobRepository);
return launcher;
}
@Bean
public JpaPagingItemReader<ImportDTO> findNewImportsToImport() throws Exception {
JpaPagingItemReader<ImportDTO> databaseReader = new JpaPagingItemReader<>();
databaseReader.setEntityManagerFactory(entityManagerFactory);
JpaQueryProviderImpl<ImportDTO> jpaQueryProvider = new JpaQueryProviderImpl<>();
jpaQueryProvider.setQuery("ImportDTO.findAllForInsert");
databaseReader.setQueryProvider(jpaQueryProvider);
databaseReader.setPageSize(organizationBatchSize);
// must be set to false if multi threaded
databaseReader.setSaveState(false);
databaseReader.afterPropertiesSet();
return databaseReader;
}
@Bean
public JpaPagingItemReader<ImportDTO> findImportsToUpdate() throws Exception {
JpaPagingItemReader<ImportDTO> databaseReader = new JpaPagingItemReader<>();
databaseReader.setEntityManagerFactory(entityManagerFactory);
JpaQueryProviderImpl<ImportDTO> jpaQueryProvider = new JpaQueryProviderImpl<>();
jpaQueryProvider.setQuery("ImportDTO.findAllForUpdate");
databaseReader.setQueryProvider(jpaQueryProvider);
databaseReader.setPageSize(organizationBatchSize);
// must be set to false if multi threaded
databaseReader.setSaveState(false);
databaseReader.afterPropertiesSet();
return databaseReader;
}
@Bean
public OrganizationItemWriter writer() throws Exception {
return organizationItemWriter;
}
@Bean
public StepExecutionNotificationListener stepExecutionListener() {
return new StepExecutionNotificationListener();
}
@Bean
public ChunkExecutionListener chunkListener() {
return new ChunkExecutionListener();
}
@Bean
public TaskExecutor taskExecutor() {
SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor();
taskExecutor.setConcurrencyLimit(maxThreadSize);
return taskExecutor;
}
@Bean
public Job importOrganizationsJob(JobCompletionNotificationListener listener) throws Exception {
return jobBuilderFactory.get("importAndUpdateOrganizationJob")
.incrementer(new RunIdIncrementer())
.listener(listener)
.start(importNewOrganizationsFromImports())
.next(updateOrganizationsFromImports())
.build();
}
@Bean
public Step importNewOrganizationsFromImports() throws Exception {
return stepBuilderFactory.get("importNewOrganizationsFromImports")
.<ImportDTO, Organization> chunk(organizationBatchSize)
.reader(findNewImportsToImport())
.processor(organizationInsertProcessor)
.writer(writer())
.taskExecutor(taskExecutor())
.listener(stepExecutionListener())
.listener(chunkListener())
.throttleLimit(organizationThreadSize)
.build();
}
@Bean
public Step updateOrganizationsFromImports() throws Exception {
return stepBuilderFactory.get("updateOrganizationsFromImports")
.<ImportDTO, Organization> chunk(organizationBatchSize)
.reader(findImportsToUpdate())
.processor(organizationUpdateProcessor)
.writer(writer())
.taskExecutor(taskExecutor())
.listener(stepExecutionListener())
.listener(chunkListener())
.throttleLimit(organizationThreadSize)
.build();
}
}
You need to add an order by clause to the select statement. A paging reader runs the query once per page, and without a deterministic sort order the database is free to return rows in a different order for each page, so some rows can be read twice and others skipped. MySQL/InnoDB happened to return rows in a stable (clustered index) order, which is why the problem only shows up on PostgreSQL.
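For example, assuming ImportDTO has an id field, ordering by it makes the paging deterministic:
@NamedQuery(name = "ImportDTO.findAllForInsert",
query = "select h from ImportDTO h where h.toBeImported = true order by h.id asc")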