I'm working with Spring Batch and have a job with two steps: the first step (a tasklet) validates the header of the CSV file, and the second step reads the CSV file and writes to another CSV file, like this:
@Bean
public ClassifierCompositeItemWriter<POJO> classifierCompositeItemWriter() throws Exception {
    Classifier<POJO, ItemWriter<? super POJO>> classifier = new ClassiItemWriter(ClassiItemWriter.itemWriter());
    return new ClassifierCompositeItemWriterBuilder<POJO>()
            .classifier(classifier)
            .build();
}
@Bean
public Step readAndWriteCsvFile() throws Exception {
    return stepBuilderFactory.get("readAndWriteCsvFile")
            .<POJO, POJO>chunk(10000)
            .reader(ClassitemReader.itemReader())
            .processor(processor())
            .writer(classifierCompositeItemWriter())
            .build();
}
I used a FlatFileItemReader (in ClassitemReader) and a FlatFileItemWriter (in ClassiItemWriter). Before reading the CSV, I check whether the header of the CSV file is correct via a tasklet, like this:
@Bean
public Step fileValidatorStep() {
    return stepBuilderFactory
            .get("fileValidatorStep")
            .tasklet(fileValidator)
            .build();
}
If it is, I process the transformation from the received CSV file to another CSV file.
In the jobBuilderFactory I check whether the ExitStatus coming from the fileValidatorStep tasklet is "COMPLETED" in order to forward the process to readAndWriteCsvFile(); if it is not "COMPLETED" and the fileValidatorStep tasklet returns ExitStatus "ERROR", the job ends and exits processing.
@Bean
public Job job() throws Exception {
    return jobBuilderFactory.get("job")
            .incrementer(new RunIdIncrementer())
            .start(fileValidatorStep()).on("ERROR").end()
            .next(fileValidatorStep()).on("COMPLETED").to(readAndWriteCsvFile())
            .end().build();
}
The problem is that when I launch my job, the readAndWriteCsvFile bean is created before the tasklet runs. That means the standard reader and writer beans of Spring Batch are always loaded in the life cycle before I can validate the header and check the ExitStatus: the reader still works, reads the file, and puts data in the other file without any check, because the beans are loaded during the launch of the job, before any tasklet runs.
How can I launch the readAndWriteCsvFile method after fileValidatorStep?
You don't need a flow job for that, a simple job is enough. Here is a quick example:
import java.util.Arrays;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.item.support.ListItemReader;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@Configuration
@EnableBatchProcessing
public class MyJobConfiguration {

    @Bean
    public Step validationStep(StepBuilderFactory stepBuilderFactory) {
        return stepBuilderFactory.get("validationStep")
                .tasklet(new Tasklet() {
                    @Override
                    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
                        if (!isValid()) {
                            throw new Exception("Invalid file");
                        }
                        return RepeatStatus.FINISHED;
                    }

                    private boolean isValid() {
                        // TODO implement validation logic
                        return false;
                    }
                })
                .build();
    }

    @Bean
    public Step readAndWriteCsvFile(StepBuilderFactory stepBuilderFactory) {
        return stepBuilderFactory.get("readAndWriteCsvFile")
                .<Integer, Integer>chunk(2)
                .reader(new ListItemReader<>(Arrays.asList(1, 2, 3, 4)))
                .writer(items -> items.forEach(System.out::println))
                .build();
    }

    @Bean
    public Job job(JobBuilderFactory jobBuilderFactory, StepBuilderFactory stepBuilderFactory) {
        return jobBuilderFactory.get("job")
                .start(validationStep(stepBuilderFactory))
                .next(readAndWriteCsvFile(stepBuilderFactory))
                .build();
    }

    public static void main(String[] args) throws Exception {
        ApplicationContext context = new AnnotationConfigApplicationContext(MyJobConfiguration.class);
        JobLauncher jobLauncher = context.getBean(JobLauncher.class);
        Job job = context.getBean(Job.class);
        jobLauncher.run(job, new JobParameters());
    }
}
In this example, if the validationStep fails, the next step will not be executed.
I solved my problem: I annotated the FlatFileItemReader bean method inside the job configuration class with @StepScope. Now this bean is only loaded when I need it. One should also avoid declaring the FlatFileItemReader bean outside the scope of the job.
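For reference, here is a minimal sketch of what such a step-scoped reader bean can look like (the POJO type, the inputFile job parameter, and the tokenizer settings are placeholders, not taken from the original code). Since the bean is step-scoped, Spring instantiates it only when the owning step actually starts, i.e. after the validation tasklet has completed:
@Bean
@StepScope
public FlatFileItemReader<POJO> itemReader(@Value("#{jobParameters['inputFile']}") String inputFile) { // hypothetical job parameter
    FlatFileItemReader<POJO> reader = new FlatFileItemReader<>();
    reader.setResource(new FileSystemResource(inputFile));
    reader.setLinesToSkip(1); // the header was already checked by the validation tasklet
    DefaultLineMapper<POJO> lineMapper = new DefaultLineMapper<>();
    lineMapper.setLineTokenizer(new DelimitedLineTokenizer());
    BeanWrapperFieldSetMapper<POJO> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
    fieldSetMapper.setTargetType(POJO.class);
    lineMapper.setFieldSetMapper(fieldSetMapper);
    reader.setLineMapper(lineMapper);
    return reader;
}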
I have an existing Spring Batch project that has multiple steps. I want to modify a step so I can stop the job: jobExecution.getStatus() == STOPPED.
My step:
@Autowired
public StepBuilderFactory stepBuilderFactory;
@Autowired
private StepReader reader;
@Autowired
private StepProcessor processor;
@Autowired
private StepWriter writer;
@Autowired
public GenericListener listener;

@Bean
@JobScope
@Qualifier("mystep")
public Step MyStep() throws ReaderException {
    return stepBuilderFactory.get("mystep")
            .reader(reader.read())
            .listener(listener)
            .processor(processor)
            .writer(writer)
            .build();
}
GenericListener implements ItemReadListener, ItemProcessListener, and ItemWriteListener, and overrides the before and after methods, which basically write logs.
The focus here is on the StepReader class and its read() method, which returns a FlatFileItemReader:
@Component
public class StepReader {

    public static final String DELIMITER = "|";

    @Autowired
    private ClassToAccessProperties classToAccessProperties;

    private Logger log = Logger.create(StepReader.class);

    @Autowired
    private FlatFileItemReaderFactory<MyObject> flatFileItemReaderFactory;

    public ItemReader<MyObject> read() throws ReaderException {
        try {
            String csv = classToAccessProperties.getInputCsv();
            FlatFileItemReader<MyObject> reader = flatFileItemReaderFactory.create(csv, getLineMapper());
            return reader;
        } catch (ReaderException | EmptyInputfileException | IOException e) {
            throw new ReaderException(e);
        } catch (NoInputFileException e) {
            log.info("Oh no !! No input file");
            // Here I want to stop the job
            return null;
        }
    }

    private LineMapper<MyObject> getLineMapper() {
        DefaultLineMapper<MyObject> mapper = new DefaultLineMapper<>();
        DelimitedLineTokenizer delimitedLineTokenizer = new DelimitedLineTokenizer();
        delimitedLineTokenizer.setDelimiter(DELIMITER);
        mapper.setLineTokenizer(delimitedLineTokenizer);
        mapper.setFieldSetMapper(new MyObjectFieldSetMapper());
        return mapper;
    }
}
I tried to implement StepExecutionListener in StepReader, but with no luck; I think this is because the reader method in StepBuilderFactory expects an ItemReader from the reader.read() method and doesn't care about the rest of the class.
I'm looking for ideas or a solution to be able to stop the entire job (not fail it) when NoInputFileException is caught.
I'm looking for ideas or a solution to be able to stop the entire job (not fail it) when NoInputFileException is caught.
This is a common pattern and is described in detail in the "Handling Step Completion When No Input is Found" section of the reference documentation. The example in that section shows how to fail a job when no input file is found, but since you want to stop the job instead of failing it, you can use StepExecution#setTerminateOnly() in the listener, and your job will end with status STOPPED. In your example, you would add that listener to the MyStep step.
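As a sketch, such a listener might look like the following (the class name is illustrative, and getReadCount() == 0 stands in for whatever "no input" condition applies in your case):
public class NoInputFileStepListener extends StepExecutionListenerSupport {

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        if (stepExecution.getReadCount() == 0) {
            // request a graceful stop: the job ends with status STOPPED instead of FAILED
            stepExecution.setTerminateOnly();
        }
        return stepExecution.getExitStatus();
    }
}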
However, I would suggest adding a pre-validation step and stopping the job if there is no file. Here is a quick example:
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@Configuration
@EnableBatchProcessing
public class MyJob {

    @Autowired
    private JobBuilderFactory jobs;

    @Autowired
    private StepBuilderFactory steps;

    @Bean
    public Step fileValidationStep() {
        return steps.get("fileValidationStep")
                .tasklet((contribution, chunkContext) -> {
                    // TODO add code to check if the file exists
                    System.out.println("file not found");
                    chunkContext.getStepContext().getStepExecution().setTerminateOnly();
                    return RepeatStatus.FINISHED;
                })
                .build();
    }

    @Bean
    public Step fileProcessingStep() {
        return steps.get("fileProcessingStep")
                .tasklet((contribution, chunkContext) -> {
                    System.out.println("processing file");
                    return RepeatStatus.FINISHED;
                })
                .build();
    }

    @Bean
    public Job job() {
        return jobs.get("job")
                .start(fileValidationStep())
                .next(fileProcessingStep())
                .build();
    }

    public static void main(String[] args) throws Exception {
        ApplicationContext context = new AnnotationConfigApplicationContext(MyJob.class);
        JobLauncher jobLauncher = context.getBean(JobLauncher.class);
        Job job = context.getBean(Job.class);
        JobExecution jobExecution = jobLauncher.run(job, new JobParameters());
        System.out.println("Job status: " + jobExecution.getExitStatus().getExitCode());
    }
}
The example prints:
file not found
Job status: STOPPED
Hope this helps.
I am using Spring Batch and FlatFileItemReader to read a .CSV file. The file has a header (first line), detail lines, and a footer (last line). I want to validate the total number of detail lines against the footer line.
This is my example .csv file.
movie.csv
Name|Type|Year
Notting Hill|romantic comedy|1999
Toy Story 3|Animation|2010
Captain America: The First Avenger|Action|2011
3
From the example file:
The first line is a header (which I ignore).
Lines 2-4 are detail lines, and the last line is the footer.
I want to read the footer and get its value (last line = 3),
then get the total number of detail records (in this case, 3 lines),
and finally validate that the total from the footer (3) and the total number of detail records (3) are equal.
And this is my code:
@Bean
@StepScope
public FlatFileItemReader<Movie> movieItemReader(String filePath) throws Exception {
    FlatFileItemReader<Movie> reader = new FlatFileItemReader<>();
    reader.setLinesToSkip(1); // skip header line
    reader.setResource(new PathResource(filePath));
    DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer("|");
    DefaultLineMapper<Movie> movieLineMapper = new DefaultLineMapper<>();
    FieldSetMapper<Movie> movieMapper = movieFieldSetMapper();
    movieLineMapper.setLineTokenizer(tokenizer);
    movieLineMapper.setFieldSetMapper(movieMapper);
    movieLineMapper.afterPropertiesSet();
    reader.setLineMapper(movieLineMapper);
    return reader;
}

public FieldSetMapper<Movie> movieFieldSetMapper() {
    BeanWrapperFieldSetMapper<Movie> movieMapper = new BeanWrapperFieldSetMapper<>();
    movieMapper.setTargetType(Movie.class);
    return movieMapper;
}
You can use a chunk-oriented step as a validation step before your job's business logic. This step would use an ItemReadListener to save the last item and a StepExecutionListener for the validation. Here is a quick example:
import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.ItemReadListener;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.StepExecutionListener;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.listener.StepExecutionListenerSupport;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.PassThroughLineMapper;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ByteArrayResource;
@Configuration
@EnableBatchProcessing
public class MyJob {

    @Autowired
    private JobBuilderFactory jobs;

    @Autowired
    private StepBuilderFactory steps;

    @Bean
    @StepScope
    public FlatFileItemReader<String> itemReader() {
        FlatFileItemReader<String> reader = new FlatFileItemReader<>();
        reader.setLinesToSkip(1); // skip header line
        reader.setResource(new ByteArrayResource("header\nitem1\nitem2\n2".getBytes()));
        reader.setLineMapper(new PassThroughLineMapper());
        return reader;
    }

    @Bean
    public ItemWriter<String> itemWriter() {
        return items -> {
            for (String item : items) {
                System.out.println("item = " + item);
            }
        };
    }

    @Bean
    public Step step1() {
        MyListener myListener = new MyListener();
        return steps.get("step1")
                .<String, String>chunk(5)
                .reader(itemReader())
                .writer(itemWriter())
                .listener((ItemReadListener<String>) myListener)
                .listener((StepExecutionListener) myListener)
                .build();
    }

    @Bean
    public Step step2() {
        return steps.get("step2")
                .tasklet((contribution, chunkContext) -> {
                    System.out.println("Total count is ok as validated by step1");
                    return RepeatStatus.FINISHED;
                })
                .build();
    }

    @Bean
    public Job job() {
        return jobs.get("job")
                .start(step1())
                .next(step2())
                .build();
    }

    static class MyListener extends StepExecutionListenerSupport implements ItemReadListener<String> {

        private String lastItem;

        @Override
        public void beforeRead() {
        }

        @Override
        public void afterRead(String item) {
            this.lastItem = item;
        }

        @Override
        public void onReadError(Exception ex) {
        }

        @Override
        public ExitStatus afterStep(StepExecution stepExecution) {
            int readCount = stepExecution.getReadCount();
            int totalCountInFooter = Integer.valueOf(this.lastItem); // TODO sanity checks (number format, etc)
            System.out.println("readCount = " + (readCount - 1)); // subtract footer from the read count
            System.out.println("totalCountInFooter = " + totalCountInFooter);
            // TODO do validation on readCount vs totalCountInFooter
            return ExitStatus.COMPLETED; // return appropriate exit status according to validation result
        }
    }

    public static void main(String[] args) throws Exception {
        ApplicationContext context = new AnnotationConfigApplicationContext(MyJob.class);
        JobLauncher jobLauncher = context.getBean(JobLauncher.class);
        Job job = context.getBean(Job.class);
        jobLauncher.run(job, new JobParameters());
    }
}
This example prints:
item = item1
item = item2
item = 2
readCount = 2
totalCountInFooter = 2
Total count is ok as validated by step1
Hope this helps.
Goal: if there is an AdmisSkipException (a custom exception), I want the job to skip the record and keep processing the next lines.
If there is any other exception, I want the job to stop.
Here is what I have so far:
Conf:
.<Admis, PreCandidat>chunk(100)
.reader(readerDBAdmis())
.processor(new AdmisItemProcessor(preCandidatRepository, scolFormationSpecialisationRepository, preCandidatureRepository))
.faultTolerant()
.skipPolicy(AdmisVerificationSkipper())
.writer(writerPGICocktail()).build();
AdmisSkipException :
public class AdmisSkipException extends Exception {

    private TypeRejet typeRejet;
    private Admis admis;

    public AdmisSkipException(TypeRejet typeRejet, Admis admis) {
        super();
        this.typeRejet = typeRejet;
        this.admis = admis;
    }

    public TypeRejet getTypeRejet() {
        return typeRejet;
    }

    public Admis getAdmis() {
        return admis;
    }
}
AdmisVerificationSkipper :
public class AdmisVerificationSkipper implements SkipPolicy {

    private AdmisRejetRepository admisRejetRepository;

    public AdmisVerificationSkipper(AdmisRejetRepository admisRejetRepository) {
        this.admisRejetRepository = admisRejetRepository;
    }

    @Override
    public boolean shouldSkip(Throwable exception, int skipCount) throws SkipLimitExceededException {
        if (exception instanceof AdmisSkipException) {
            AdmisSkipException admisSkipException = (AdmisSkipException) exception;
            AdmisRejet rejet = new AdmisRejet();
            rejet.setAdmis(admisSkipException.getAdmis());
            rejet.setTypeRejet(admisSkipException.getTypeRejet());
            admisRejetRepository.save(rejet);
            return true;
        }
        return false;
    }
}
With this configuration, if a NullPointerException (for example) is thrown in AdmisItemProcessor, the job will continue instead of failing.
What should I change to stop the job?
if there is an AdmisSkipException (custom exception) I want the job to skip the record and keep processing the next lines. If there is any other exception I want the job to stop.
You can achieve this with:
.<Admis, PreCandidat>chunk(100)
.reader(readerDBAdmis())
.processor(new AdmisItemProcessor(preCandidatRepository, scolFormationSpecialisationRepository, preCandidatureRepository))
.writer(writerPGICocktail())
.faultTolerant()
.skip(AdmisSkipException.class)
.skipLimit(SKIP_LIMIT)
.build();
Looking at your code, you probably had to create a custom skip policy because you want to save skipped items somewhere. I would recommend using a SkipListener instead, which is designed specifically for this type of requirement. Having a shouldSkip method save items to a repository is a side effect, so this is better done with a listener; see the sketch below. That said, you won't need a custom policy, and .skip(AdmisSkipException.class).skipLimit(SKIP_LIMIT) should be enough.
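As a sketch, a SkipListener version of your skipper could look like this (it reuses the types from your code; only the listener class itself is new). You would register it on the step with .listener(new AdmisSkipListener(admisRejetRepository)) after .faultTolerant():
public class AdmisSkipListener implements SkipListener<Admis, PreCandidat> {

    private final AdmisRejetRepository admisRejetRepository;

    public AdmisSkipListener(AdmisRejetRepository admisRejetRepository) {
        this.admisRejetRepository = admisRejetRepository;
    }

    @Override
    public void onSkipInProcess(Admis item, Throwable t) {
        // invoked only for exceptions already declared as skippable, so persisting
        // the rejection here is no longer a side effect of the skip decision
        if (t instanceof AdmisSkipException) {
            AdmisSkipException e = (AdmisSkipException) t;
            AdmisRejet rejet = new AdmisRejet();
            rejet.setAdmis(e.getAdmis());
            rejet.setTypeRejet(e.getTypeRejet());
            admisRejetRepository.save(rejet);
        }
    }

    @Override
    public void onSkipInRead(Throwable t) {
    }

    @Override
    public void onSkipInWrite(PreCandidat item, Throwable t) {
    }
}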
With this configuration, if a NullPointerException (for example) is thrown in AdmisItemProcessor, the job will continue instead of failing. What should I change to stop the job?
Here is an example you can run to see how it works:
import java.util.Arrays;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.support.ListItemReader;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.lang.Nullable;
@Configuration
@EnableBatchProcessing
public class MyJob {

    @Autowired
    private JobBuilderFactory jobs;

    @Autowired
    private StepBuilderFactory steps;

    @Bean
    public ItemReader<Integer> itemReader() {
        return new ListItemReader<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10));
    }

    @Bean
    public ItemProcessor<Integer, Integer> itemProcessor() {
        return new ItemProcessor<Integer, Integer>() {
            @Nullable
            @Override
            public Integer process(Integer item) throws Exception {
                if (item.equals(3)) {
                    throw new IllegalArgumentException("No 3!");
                }
                if (item.equals(9)) {
                    throw new NullPointerException("Boom at 9!");
                }
                return item;
            }
        };
    }

    @Bean
    public ItemWriter<Integer> itemWriter() {
        return items -> {
            for (Integer item : items) {
                System.out.println("item = " + item);
            }
        };
    }

    @Bean
    public Step step() {
        return steps.get("step")
                .<Integer, Integer>chunk(1)
                .reader(itemReader())
                .processor(itemProcessor())
                .writer(itemWriter())
                .faultTolerant()
                .skip(IllegalArgumentException.class)
                .skipLimit(3)
                .build();
    }

    @Bean
    public Job job() {
        return jobs.get("job")
                .start(step())
                .build();
    }

    public static void main(String[] args) throws Exception {
        ApplicationContext context = new AnnotationConfigApplicationContext(MyJob.class);
        JobLauncher jobLauncher = context.getBean(JobLauncher.class);
        Job job = context.getBean(Job.class);
        JobExecution jobExecution = jobLauncher.run(job, new JobParameters());
        System.out.println(jobExecution);
    }
}
This example skips items when IllegalArgumentExceptions are thrown and fails the job if a NullPointerException happens.
Hope this helps.
I am trying to achieve the flow shown in the image below using Spring Batch. I was referring to the Java configuration described on page 85 of https://docs.spring.io/spring-batch/4.0.x/reference/pdf/spring-batch-reference.pdf.
For some reason, when the decider returns TYPE2, the batch ends in a FAILED state without any error message. Following is the Java configuration of my job:
jobBuilderFactory.get("myJob")
.incrementer(new RunIdIncrementer())
.preventRestart()
.start(firstStep())
.next(typeDecider()).on("TYPE1").to(stepType1()).next(lastStep())
.from(typeDecider()).on("TYPE2").to(stepType2()).next(lastStep())
.end()
.build();
I think something is not right with the Java configuration, though it matches the Spring documentation. A flow could be useful here, but I am sure there is a way without it. Any idea how to achieve this?
You need to define the flow not only from the decider to the next steps, but also from stepType1 and stepType2 to lastStep. Here is an example:
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.job.flow.FlowExecutionStatus;
import org.springframework.batch.core.job.flow.JobExecutionDecider;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@Configuration
@EnableBatchProcessing
public class MyJob {

    @Autowired
    private JobBuilderFactory jobs;

    @Autowired
    private StepBuilderFactory steps;

    @Bean
    public Step firstStep() {
        return steps.get("firstStep")
                .tasklet((contribution, chunkContext) -> {
                    System.out.println("firstStep");
                    return RepeatStatus.FINISHED;
                })
                .build();
    }

    @Bean
    public JobExecutionDecider decider() {
        return (jobExecution, stepExecution) -> new FlowExecutionStatus("TYPE1"); // or TYPE2
    }

    @Bean
    public Step stepType1() {
        return steps.get("stepType1")
                .tasklet((contribution, chunkContext) -> {
                    System.out.println("stepType1");
                    return RepeatStatus.FINISHED;
                })
                .build();
    }

    @Bean
    public Step stepType2() {
        return steps.get("stepType2")
                .tasklet((contribution, chunkContext) -> {
                    System.out.println("stepType2");
                    return RepeatStatus.FINISHED;
                })
                .build();
    }

    @Bean
    public Step lastStep() {
        return steps.get("lastStep")
                .tasklet((contribution, chunkContext) -> {
                    System.out.println("lastStep");
                    return RepeatStatus.FINISHED;
                })
                .build();
    }

    @Bean
    public Job job() {
        return jobs.get("job")
                .start(firstStep())
                .next(decider())
                .on("TYPE1").to(stepType1())
                .from(decider()).on("TYPE2").to(stepType2())
                .from(stepType1()).on("*").to(lastStep())
                .from(stepType2()).on("*").to(lastStep())
                .build()
                .build();
    }

    public static void main(String[] args) throws Exception {
        ApplicationContext context = new AnnotationConfigApplicationContext(MyJob.class);
        JobLauncher jobLauncher = context.getBean(JobLauncher.class);
        Job job = context.getBean(Job.class);
        jobLauncher.run(job, new JobParameters());
    }
}
This prints:
firstStep
stepType1
lastStep
If the decider returns TYPE2, the sample prints:
firstStep
stepType2
lastStep
Hope this helps.
Ran into a similar issue where the "else" part was not being called (technically, only the first configured on() was being called).
Almost all the websites with flow and decider examples have similar job configurations, and I was not able to figure out what the issue was.
After some research, I found out how Spring maintains deciders and decisions.
At a high level, while initializing the application, Spring maintains, based on the job configuration, a list of decisions for a decider object (like decision0, decision1, and so on).
When we call the decider() method, it always returns a new object for the decider. Since it returns a new object, the list contains only one mapping for each object (i.e., decision0), and since it is a list, it always returns the first configured decision. This is why only the first configured transition is called.
Solution:
Instead of making a method call to the decider, create a singleton bean for the decider and use it in the job configuration.
Example:
@Bean
public JobExecutionDecider stepDecider() {
    return new CustomStepDecider();
}
Inject it and use it in the job creation bean:
@Bean
public Job sampleJob(Step step1, Step step2, Step step3,
                     JobExecutionDecider stepDecider) {
    return jobBuilderFactory.get("sampleJob")
            .start(step1)
            .next(stepDecider).on("TYPE1").to(step2)
            .from(stepDecider).on("TYPE2").to(step3)
            .end()
            .build();
}
Hope this helps.
Create a dummy step which returns FINISHED status and jumps to the next decider. You need to redirect the flow cursor to the next decider or a virtual step after finishing the current step:
.next(copySourceFilesStep())
.next(firstStepDecider).on(STEP_CONTINUE).to(executeStep_1())
.from(firstStepDecider).on(STEP_SKIP).to(virtualStep_1())
//-executeStep_2
.from(executeStep_1()).on(ExitStatus.COMPLETED.getExitCode())
.to(secondStepDecider).on(STEP_CONTINUE).to(executeStep_2())
.from(secondStepDecider).on(STEP_SKIP).to(virtualStep_3())
.from(virtualStep_1()).on(ExitStatus.COMPLETED.getExitCode())
.to(secondStepDecider).on(STEP_CONTINUE).to(executeStep_2())
.from(secondStepDecider).on(STEP_SKIP).to(virtualStep_3())
//-executeStep_3
.from(executeStep_2()).on(ExitStatus.COMPLETED.getExitCode())
.to(thirdStepDecider).on(STEP_CONTINUE).to(executeStep_3())
.from(thirdStepDecider).on(STEP_SKIP).to(virtualStep_4())
.from(virtualStep_3()).on(ExitStatus.COMPLETED.getExitCode())
.to(thirdStepDecider).on(STEP_CONTINUE).to(executeStep_3())
.from(thirdStepDecider).on(STEP_SKIP).to(virtualStep_4())
@Bean
public Step virtualStep_2() {
    return stepBuilderFactory.get("continue-virtualStep2")
            .tasklet((contribution, chunkContext) -> {
                return RepeatStatus.FINISHED;
            })
            .build();
}
Before the Spring Batch job runs, I have an import table which contains all items that need importing into our system. At that point it is verified to contain only items that do not exist in our system.
Next, I have a Spring Batch job which reads from this import table using a JpaPagingItemReader.
After the work is done, it writes to the database using an ItemWriter.
I run with page size and chunk size at 10000.
Now, this works absolutely fine when running on MySQL InnoDB. I can even use multiple threads and everything works fine.
But now we are migrating to PostgreSQL, and the same batch job runs into a very strange problem.
What happens is that it tries to insert duplicates into our system. These are naturally rejected by the unique index constraints, and an error is thrown.
Since the import table is verified to contain only non-existing items before the batch job starts, the only reason I can think of is that the JpaPagingItemReader reads some rows multiple times from the import table when I run on Postgres. But why would it do that?
I have experimented with a lot of settings. Turning chunk and page size down to around 100 only makes the import slower, but I still get the same error. Running single-threaded instead of multi-threaded only makes the error happen slightly later.
So what on earth could be the reason for my JpaPagingItemReader reading the same items multiple times, only on PostgreSQL?
The select statement backing the reader is simple; it's a NamedQuery:
@NamedQuery(name = "ImportDTO.findAllForInsert",
        query = "select h from ImportDTO h where h.toBeImported = true")
Please also note that the toBeImported flag will not be altered by the batch job at all during runtime, so the results of this query should always be the same before, during, and after the batch job.
Any insights, tips, or help are greatly appreciated!
Here is the batch config code:
import javax.persistence.EntityManagerFactory;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.core.launch.support.SimpleJobLauncher;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.item.database.JpaPagingItemReader;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.SimpleAsyncTaskExecutor;
import org.springframework.core.task.TaskExecutor;
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Autowired
    private OrganizationItemWriter organizationItemWriter;

    @Autowired
    private EntityManagerFactory entityManagerFactory;

    @Autowired
    private OrganizationUpdateProcessor organizationUpdateProcessor;

    @Autowired
    private OrganizationInsertProcessor organizationInsertProcessor;

    private Integer organizationBatchSize = 10000;
    private Integer organizationThreadSize = 3;
    private Integer maxThreadSize = organizationThreadSize;

    @Bean
    public SimpleJobLauncher jobLauncher(JobRepository jobRepository) {
        SimpleJobLauncher launcher = new SimpleJobLauncher();
        launcher.setJobRepository(jobRepository);
        return launcher;
    }

    @Bean
    public JpaPagingItemReader<ImportDTO> findNewImportsToImport() throws Exception {
        JpaPagingItemReader<ImportDTO> databaseReader = new JpaPagingItemReader<>();
        databaseReader.setEntityManagerFactory(entityManagerFactory);
        JpaQueryProviderImpl<ImportDTO> jpaQueryProvider = new JpaQueryProviderImpl<>();
        jpaQueryProvider.setQuery("ImportDTO.findAllForInsert");
        databaseReader.setQueryProvider(jpaQueryProvider);
        databaseReader.setPageSize(organizationBatchSize);
        // must be set to false if multi threaded
        databaseReader.setSaveState(false);
        databaseReader.afterPropertiesSet();
        return databaseReader;
    }

    @Bean
    public JpaPagingItemReader<ImportDTO> findImportsToUpdate() throws Exception {
        JpaPagingItemReader<ImportDTO> databaseReader = new JpaPagingItemReader<>();
        databaseReader.setEntityManagerFactory(entityManagerFactory);
        JpaQueryProviderImpl<ImportDTO> jpaQueryProvider = new JpaQueryProviderImpl<>();
        jpaQueryProvider.setQuery("ImportDTO.findAllForUpdate");
        databaseReader.setQueryProvider(jpaQueryProvider);
        databaseReader.setPageSize(organizationBatchSize);
        // must be set to false if multi threaded
        databaseReader.setSaveState(false);
        databaseReader.afterPropertiesSet();
        return databaseReader;
    }

    @Bean
    public OrganizationItemWriter writer() throws Exception {
        return organizationItemWriter;
    }

    @Bean
    public StepExecutionNotificationListener stepExecutionListener() {
        return new StepExecutionNotificationListener();
    }

    @Bean
    public ChunkExecutionListener chunkListener() {
        return new ChunkExecutionListener();
    }

    @Bean
    public TaskExecutor taskExecutor() {
        SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor();
        taskExecutor.setConcurrencyLimit(maxThreadSize);
        return taskExecutor;
    }

    @Bean
    public Job importOrganizationsJob(JobCompletionNotificationListener listener) throws Exception {
        return jobBuilderFactory.get("importAndUpdateOrganizationJob")
                .incrementer(new RunIdIncrementer())
                .listener(listener)
                .start(importNewOrganizationsFromImports())
                .next(updateOrganizationsFromImports())
                .build();
    }

    @Bean
    public Step importNewOrganizationsFromImports() throws Exception {
        return stepBuilderFactory.get("importNewOrganizationsFromImports")
                .<ImportDTO, Organization>chunk(organizationBatchSize)
                .reader(findNewImportsToImport())
                .processor(organizationInsertProcessor)
                .writer(writer())
                .taskExecutor(taskExecutor())
                .listener(stepExecutionListener())
                .listener(chunkListener())
                .throttleLimit(organizationThreadSize)
                .build();
    }

    @Bean
    public Step updateOrganizationsFromImports() throws Exception {
        return stepBuilderFactory.get("updateOrganizationsFromImports")
                .<ImportDTO, Organization>chunk(organizationBatchSize)
                .reader(findImportsToUpdate())
                .processor(organizationUpdateProcessor)
                .writer(writer())
                .taskExecutor(taskExecutor())
                .listener(stepExecutionListener())
                .listener(chunkListener())
                .throttleLimit(organizationThreadSize)
                .build();
    }
}
You need to add an order by clause to the select statement. A paging reader executes a separate query for each page; without a deterministic sort order, PostgreSQL is free to return rows in a different order on each query, so pages can overlap and the same rows get read multiple times while others are skipped.
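For example, assuming ImportDTO has a unique id attribute to sort on, the named query could become:
@NamedQuery(name = "ImportDTO.findAllForInsert",
        query = "select h from ImportDTO h where h.toBeImported = true order by h.id asc")
With a unique sort key, every page query returns rows in the same global order, so consecutive pages no longer overlap or skip rows.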