Based on job parameters, configure a Spring Batch writer - java

I created a simple Spring Batch job using Spring Boot that reads from our database and writes to a topic. During development I also had a hook where I could comment out the topicWriter and write to a CSV file instead. Both work by commenting one out and running the other writer (topicWriter or writer). The business now wants to be able to choose ad hoc between the topic and the CSV output, so I opted to pass in an output parameter that contains either "topic" or "csv". From my reading it looked like I could use a decider, but this may be wrong. As it stands, the code below complains about a duplicate step and loops when I try to run it. I was unable to figure out how to run without a starting step, so I created a do-nothing tasklet because the job needed a start step before the decider. So I think I have this all screwed up. Any ideas or directions?
@Bean
public Job job(@Qualifier("step") Step step) {
    return jobBuilderFactory.get(BatchConstants.JOB_NAME).listener(jobListener())
        .start(step)
        .next(decider()).on("COMPLETED").to(step1(null, null))
        .from(decider()).on("FAILED").to(step2(null, null))
        .end().build();
}
@Bean
protected Step step() {
    return stepBuilderFactory.get("step")
        .tasklet(new Tasklet() {
            public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
                return RepeatStatus.FINISHED;
            }
        })
        .build();
}
@Bean
protected Step step1(ItemReader<someDto> reader,
                     ItemWriter<someDto> topicWriter) {
    return stepBuilderFactory.get(BatchConstants.STEP_NAME)
        .<someDto, someDto>chunk(BatchConstants.CHUNKSIZE)
        .reader(reader)
        .writer(topicWriter) // writes to a Kafka topic
        .build();
}
@Bean
protected Step step2(ItemReader<someDto> reader,
                     ItemWriter<someDto> writer) {
    return stepBuilderFactory.get(BatchConstants.STEP_NAME)
        .<someDto, someDto>chunk(BatchConstants.CHUNKSIZE)
        .reader(reader)
        .writer(writer) // writes to CSV
        .build();
}

In a single step you can do this with @StepScope. Based on the job parameters you can select the writer.
@Bean
@StepScope
protected Step step2(ItemReader<someDto> reader,
                     ItemWriter<someDto> writer,
                     ItemWriter<someDto> topicWriter,
                     @Value("#{jobParameters['writerType']}") final String type) {
    ItemWriter<someDto> myWriter;
    if (type.equals("topic")) {
        myWriter = topicWriter;
    } else {
        myWriter = writer;
    }
    return stepBuilderFactory.get(BatchConstants.STEP_NAME)
        .<someDto, someDto>chunk(BatchConstants.CHUNKSIZE)
        .reader(reader)
        .writer(myWriter)
        .build();
}
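The job can then be run ad hoc with the desired output selected through the writerType job parameter. A minimal launch sketch (assuming an injected JobLauncher and the Job bean from above; the parameter name matches the SpEL expression in the step):
// Pass writerType as a job parameter so the step-scoped bean above picks the matching writer.
JobParameters params = new JobParametersBuilder()
        .addString("writerType", "topic")              // or "csv"
        .addLong("run.id", System.currentTimeMillis()) // keeps each ad hoc run unique
        .toJobParameters();
jobLauncher.run(job, params);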

Related

Why does my Spring Batch job run step2 while step1 is still going on?

First of all, thank you for checking out my post.
I'll list the technologies used, what I need to achieve, and what is happening.
Services used:
Spring Boot in Groovy
AWS: AWS Batch, S3 Bucket, SNS, Parameter Store, CloudFormation
What I need to achieve:
1. Read two CSV files from the S3 bucket (student.csv with columns id,studentName and score.csv with columns id,studentId,score).
2. Determine who failed by comparing the two files: a student whose score is below 50 has failed.
3. Create a new CSV file in the S3 bucket and store it as failedStudents.csv.
4. Send the failedStudents.csv that was just created by email via an SNS topic.
In the Spring Batch config class, items 1-3 above make up step1 and item 4 is step2.
@Bean
Step step1() {
    return this.stepBuilderFactory
        .get("step1")
        .<BadStudent, BadStudent>chunk(100)
        .reader(new IteratorItemReader<Student>(this.StudentLoader.Students.iterator()) as ItemReader<? extends BadStudent>)
        .processor(this.studentProcessor as ItemProcessor<? super BadStudent, ? extends BadStudent>)
        .writer(this.csvWriter())
        .build()
}
@Bean
Step step2() {
    return this.stepBuilderFactory
        .get("step2")
        .tasklet(new PublishSnsTopic())
        .build()
}
@Bean
Job job() {
    return this.jobBuilderFactory
        .get("scoring-students-batch")
        .incrementer(new RunIdIncrementer())
        .start(this.step1())
        .next(this.step2())
        .build()
}
ItemWriter
@Component
@EnableContextResourceLoader
class BadStudentWriter implements ItemWriter<BadStudent> {
    @Autowired
    ResourceLoader resourceLoader
    @Resource
    FileProperties fileProperties
    WritableResource resource
    PrintStream writer
    CSVPrinter csvPrinter

    @PostConstruct
    void setup() {
        this.resource = this.resourceLoader.getResource("s3://students/failedStudents.csv") as WritableResource
        this.writer = new PrintStream(this.resource.outputStream)
        this.csvPrinter = new CSVPrinter(this.writer, CSVFormat.DEFAULT.withDelimiter('|' as char).withHeader('id', 'studentsName', 'score'))
    }

    @Override
    void write(List<? extends BadStudent> items) throws Exception {
        this.csvPrinter.with {CSVPrinter csvPrinter ->
            items.each {BadStudent badStudent ->
                csvPrinter.printRecord(
                    badStudent.id,
                    badStudent.studentsName,
                    badStudent.score
                )
            }
        }
    }

    @AfterStep
    void aferStep() {
        this.csvPrinter.close()
    }
}
PublishSnsTopic
@Configuration
@Service
class PublishSnsTopic implements Tasklet {
    @Autowired
    ResourceLoader resourceLoader
    List<BadStudent> badStudents
    @Autowired
    FileProperties fileProperties

    @PostConstruct
    void setup() {
        String badStudentCSVFileName = "s3://students/failedStudents.csv"
        Reader badStudentReader = new InputStreamReader(
            this.resourceLoader.getResource(badStudentCSVFileName).inputStream
        )
        this.badStudents = new CsvToBeanBuilder(badStudentReader)
            .withSeparator((char)'|')
            .withType(BadStudent)
            .withFieldAsNull(CSVReaderNullFieldIndicator.BOTH)
            .build()
            .parse()
        String messageBody = ""
        messageBody += this.badStudents.collect {it -> return "${it.id}"}
        SnsClient snsClient = SnsClient.builder().build()
        if (snsClient) {
            publishTopic(snsClient, messageBody, this.fileProperties.topicArn)
            snsClient.close()
        }
    }

    void publishTopic(SnsClient snsClient, String message, String arn) {
        try {
            PublishRequest request = PublishRequest.builder()
                .message(message)
                .topicArn(arn)
                .build()
            PublishResponse result = snsClient.publish(request)
        } catch (SnsException e) {
            log.error "SOMETHING WENT WRONG"
        }
    }

    @Override
    RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        return RepeatStatus.FINISHED
    }
}
The issue here is that step2 gets executed while step1 is still executing, and that leads to serious problems.
This batch job runs every day. Say today is Tuesday: the failedStudents.csv is generated and stored fine, but because step2 runs instead of waiting for step1 to finish, step2 sends the failedStudents.csv that was generated on Monday. The CSV is regenerated with the new dataset, yet step2 sends a day-old student list.
Also, if failedStudents.csv is not in S3 (or has been moved to Glacier), the batch job fails: step2 crashes with a FileNotFoundException because step1 is still running and failedStudents.csv has not been created yet.
This is my first AWS and Spring Batch project, and so far step1 works perfectly, so I am guessing step2 has the problem.
I appreciate you reading my long post, and I would be very happy if anyone could help me with this.
I have been tweaking PublishSnsTopic and the ItemWriter, but this is where I have been stuck for a while; I'm not sure how to figure it out.
Thank you so much again for reading.
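A likely cause, going by the code above, is that all of the S3 reading and SNS publishing happens in the @PostConstruct setup() method, which Spring calls when the PublishSnsTopic bean is created at context startup rather than when step2 runs. A minimal sketch of doing that work inside execute() instead (Java here; the class, field, and helper names are illustrative and not from the original post):
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.core.io.Resource;
import org.springframework.core.io.ResourceLoader;
import software.amazon.awssdk.services.sns.SnsClient;
import software.amazon.awssdk.services.sns.model.PublishRequest;

// Sketch only: the tasklet does nothing at construction time and performs the S3 read
// and SNS publish when Spring Batch actually runs it, i.e. after step1 has written the file.
public class PublishFailedStudentsTasklet implements Tasklet {

    private final ResourceLoader resourceLoader;
    private final String topicArn; // assumed to come from FileProperties or similar config

    public PublishFailedStudentsTasklet(ResourceLoader resourceLoader, String topicArn) {
        this.resourceLoader = resourceLoader;
        this.topicArn = topicArn;
    }

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        // By this point step1 has completed, so failedStudents.csv exists in S3.
        Resource csv = resourceLoader.getResource("s3://students/failedStudents.csv");
        String messageBody;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(csv.getInputStream(), StandardCharsets.UTF_8))) {
            messageBody = reader.lines().collect(Collectors.joining("\n"));
        }
        try (SnsClient snsClient = SnsClient.builder().build()) {
            snsClient.publish(PublishRequest.builder()
                .message(messageBody)
                .topicArn(topicArn)
                .build());
        }
        return RepeatStatus.FINISHED;
    }
}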

FlatFileItemWriter - Writer must be open before it can be written to

I have a Spring Batch job where I skip all duplicate items and write them to a flat file.
However the FlatFileItemWriter throws the below error whenever there's a duplicate:
Writer must be open before it can be written to
Below is the Writer & SkipListener configuration -
@Bean(name = "duplicateItemWriter")
public FlatFileItemWriter<InventoryFileItem> dupItemWriter(){
    return new FlatFileItemWriterBuilder<InventoryFileItem>()
        .name("duplicateItemWriter")
        .resource(new FileSystemResource("duplicateItem.txt"))
        .lineAggregator(new PassThroughLineAggregator<>())
        .append(true)
        .shouldDeleteIfExists(true)
        .build();
}
public class StepSkipListener implements SkipListener<InventoryFileItem, InventoryItem> {
    private FlatFileItemWriter<InventoryFileItem> skippedItemsWriter;

    public StepSkipListener(FlatFileItemWriter<InventoryFileItem> skippedItemsWriter) {
        this.skippedItemsWriter = skippedItemsWriter;
    }

    @Override
    public void onSkipInProcess(InventoryFileItem item, Throwable t) {
        System.out.println(item.getBibNum() + " Process - " + t.getMessage());
        try {
            skippedItemsWriter.write(Collections.singletonList(item));
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }
The overall Job is defined as below and I'm using the duplicateItemWriter from the SkipListener.
@Bean(name = "fileLoadJob")
@Autowired
public Job fileLoadJob(JobBuilderFactory jobs, StepBuilderFactory steps,
                       FlatFileItemReader<InventoryFileItem> fileItemReader,
                       CompositeItemProcessor compositeItemProcessor,
                       @Qualifier(value = "itemWriter") ItemWriter<InventoryItem> itemWriter,
                       StepSkipListener skipListener) {
    return jobs.get("libraryFileLoadJob")
        .start(steps.get("step").<InventoryFileItem, InventoryItem>chunk(chunkSize)
            .reader(fileItemReader)
            .processor(compositeItemProcessor)
            .writer(itemWriter)
            .faultTolerant()
            .skip(Exception.class)
            .skipLimit(Integer.parseInt(skipLimit))
            .listener(skipListener)
            .build())
        .build();
}
I've also tried writing all of the data to the FlatFileItemWriter, and that doesn't work either. However, if I write to a DB instead, there's no issue.
The Spring Batch version I'm using is 4.3.3.
I've referred to the below threads as well:
unit testing a FlatFileItemWriter outside of Spring - "Writer must be open before it can be written to" exception
Spring Batch WriterNotOpenException
FlatfileItemWriter with Compositewriter example
This was just gross oversight: I missed that the FlatFileItemWriter needs to be registered as a stream. Because the duplicate writer is only called from the SkipListener and is not the step's own writer, the step never opens it, and a FlatFileItemWriter must be opened before it can be written to.
I'm somewhat embarrassed to put up this question, but I'm posting the answer just in case it helps someone.
The solution was as simple as adding .stream(dupItemWriter) to the job definition.
@Bean(name = "fileLoadJob")
@Autowired
public Job fileLoadJob(JobBuilderFactory jobs, StepBuilderFactory steps,
                       FlatFileItemReader<InventoryFileItem> fileItemReader,
                       CompositeItemProcessor compositeItemProcessor,
                       @Qualifier(value = "itemWriter") ItemWriter<InventoryItem> itemWriter,
                       @Qualifier(value = "duplicateItemWriter") FlatFileItemWriter<InventoryFileItem> dupItemWriter,
                       StepSkipListener skipListener) {
    return jobs.get("libraryFileLoadJob")
        .start(steps.get("step").<InventoryFileItem, InventoryItem>chunk(chunkSize)
            .reader(fileItemReader)
            .processor(compositeItemProcessor)
            .writer(itemWriter)
            .faultTolerant()
            .skip(Exception.class)
            .skipLimit(Integer.parseInt(skipLimit))
            .listener(skipListener)
            .stream(dupItemWriter)
            .build())
        .build();
}
It's not absolutely necessary to include .stream(dupItemWriter); you can also call the writer's .open() method instead.
In my case I was creating dynamic/programmatic ItemWriters, so adding each of them as a stream was not feasible:
.stream(writer-1)
.stream(writer-2)
.stream(writer-N)
Instead I called the .open() method myself:
FlatFileItemWriter<OutputContact> itemWriter = new FlatFileItemWriter<>();
itemWriter.setResource(outPutResource);
itemWriter.setAppendAllowed(true);
itemWriter.setLineAggregator(lineAggregator);
itemWriter.setHeaderCallback(writer -> writer.write("ACCT,MEMBER,SOURCE"));
itemWriter.open(new ExecutionContext());
I had the same issue.
I created two writers by inheriting from FlatFileItemWriter.
That worked until I added the @StepScope annotation; after that, the first one threw an exception with the "Writer must be open before it can be written to" message, while the second one worked without any problem.
I solved it by calling open(new ExecutionContext()), but I still don't understand why the second one works and the first one doesn't.

How to continue processing the next row in the processor when it fails in spring batch?

Suppose two rows were read using a HibernateCursorItemReader in Spring Batch.
Suppose an exception occurred while processing the first row in the processor.
The job listener handles the failure.
But then the job simply ends.
I want the processor to go on and process the second row; what should I do?
Job
@Bean
public Job sampleJob() {
    return jobBuilderFactory.get("sampleJob")
        .start(sampleStep())
        .listener(jobListener)
        .build();
}
Reader
@Bean
@StepScope
public HibernateCursorItemReader<Sample> sampleItemReader() {
    return new HibernateCursorItemReaderBuilder<Sample>()
        .sessionFactory(entityManagerFactory.unwrap(SessionFactory.class))
        .queryString("select ...")
        .fetchSize(100)
        .saveState(false)
        .build();
}
processor
@Override
public Sample process(Sample sample) throws Exception {
    try {
        ...
    } catch (Exception e) {
        throw new MyException();
        // When an exception occurs, the job listener handles it. **When is the next row read from the reader processed?** It just ended...
    }
    return sample;
}
You can use fault-tolerant skip logic in your sampleStep() bean. Add the configuration below to your sampleStep() bean:
.faultTolerant()
.skipLimit(10)
.skip(Exception.class)
// To skip only a specific exception, put that exception class above instead of Exception.
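Put together, the step definition might look roughly like this (a sketch only; sampleProcessor and sampleWriter are assumed beans from the rest of the configuration, and MyException is the exception thrown by the processor):
@Bean
public Step sampleStep() {
    return stepBuilderFactory.get("sampleStep")
        .<Sample, Sample>chunk(100)
        .reader(sampleItemReader())
        .processor(sampleProcessor)
        .writer(sampleWriter)
        .faultTolerant()
        .skipLimit(10)
        .skip(MyException.class) // rows whose processing throws MyException are skipped and the step continues
        .build();
}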

Choose between different steps depending on argument in spring batch

I'm using Spring Batch to write an application that reads from a table and then writes the output to a CSV file. The application receives several input parameters; one of them is the database table to read. I want to write a single job that reads the correct table depending on that input parameter. This is my configuration class:
@Configuration
public class ExtractorConfiguration {
    @Bean(name="readerA")
    @StepScope
    public JdbcCursorItemReader<ClassA> readerA(
            @Value("#{jobParameters['REF_DATE']}") String dataRef
    ){
        ...
        return reader;
    }
    @Bean(name="writerA")
    @StepScope
    public FlatFileItemWriter<ClassA> writerA(
            @Value("#{jobParameters['OUTPUT_FILE_PATH']}") String outputPath
    ) {
        ...
        return writer;
    }
    //endregion
    @Bean(name="extractStep")
    @StepScope
    public Step extractStep(
            @Value("#{jobParameters['DATABASE_TABLE']}") String tableName
    ) throws Exception {
        switch (tableName) {
            case tableA:
                return steps.get("extractStep")
                    .<ClassA, ClassA>chunk(applicationProperties.getChunkSize())
                    .reader(readerA(""))
                    .writer(writerA(""))
                    .build();
            default:
                throw new Exception("Wrong table: " + tableName);
        }
    }
    @Bean(name = "myJob")
    public Job myJob() throws Exception {
        return jobs.get("myJob")
            .flow(extractStep(""))
            .end()
            .build();
    }
}
The idea was to add a second case inside the extractStep switch (something like this):
case tableB:
    return steps.get("extractStep")
        .<ClassB, ClassB>chunk(applicationProperties.getChunkSize())
        .reader(readerB(""))
        .writer(writerB(""))
        .build();
and then write the corresponding readerB and writerB methods; with this approach I'm receiving this error:
Caused by: java.lang.IllegalStateException: No context holder available for step scope
I would like to know:
1- what is the error?
2- is there a method to get the JobParameters inside myJob rather than inside the steps?
3- is there a better approach?
Thanks.
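One common way around the step-scope problem (a sketch, not taken from the question; the statuses and the extractStepA/extractStepB beans are illustrative) is to keep the Step beans themselves as plain singletons, step-scope only the readers and writers, and route between the steps with a JobExecutionDecider that reads the DATABASE_TABLE job parameter:
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.job.flow.FlowExecutionStatus;
import org.springframework.batch.core.job.flow.JobExecutionDecider;

// Sketch: decide which table-specific step to run from the DATABASE_TABLE job parameter.
public class TableDecider implements JobExecutionDecider {

    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        String table = jobExecution.getJobParameters().getString("DATABASE_TABLE");
        return new FlowExecutionStatus("tableA".equals(table) ? "TABLE_A" : "TABLE_B");
    }
}
In the job definition the decider's statuses can then be mapped to the two steps with .on("TABLE_A").to(extractStepA()) and .on("TABLE_B").to(extractStepB()) transitions, so the job parameter is only read at runtime and no Step bean needs to be step-scoped.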

Using ClassifierCompositeItemWriter and FlatFileItemWriter to write to multiple files

I'm trying to create a Spring Batch job that reads from a MySQL database and writes the data to different files depending on a value from the database. I am getting this error:
org.springframework.batch.item.WriterNotOpenException: Writer must be open before it can be written to
at org.springframework.batch.item.file.FlatFileItemWriter.write(FlatFileItemWriter.java:255)
Here's my ClassifierCompositeItemWriter
ClassifierCompositeItemWriter<WithdrawalTransaction> classifierCompositeItemWriter = new ClassifierCompositeItemWriter<WithdrawalTransaction>();
classifierCompositeItemWriter.setClassifier(new Classifier<WithdrawalTransaction,
        ItemWriter<? super WithdrawalTransaction>>() {
    @Override
    public ItemWriter<? super WithdrawalTransaction> classify(WithdrawalTransaction wt) {
        ItemWriter<? super WithdrawalTransaction> itemWriter = null;
        if (wt.getPaymentMethod().equalsIgnoreCase("PDDTS")) { // condition
            itemWriter = pddtsWriter();
        } else {
            itemWriter = swiftWriter();
        }
        return itemWriter;
    }
});
As you can see, I only used two file writers for now.
#Bean("pddtsWriter")
private FlatFileItemWriter<WithdrawalTransaction> pddtsWriter()
And
#Bean("swiftWriter")
private FlatFileItemWriter<WithdrawalTransaction> swiftWriter()
I also added them as stream
@Bean
public Step processWithdrawalTransactions() throws Exception {
    return stepBuilderFactory.get("processWithdrawalTransactions")
        .<WithdrawalTransaction, WithdrawalTransaction>chunk(10)
        .processor(withdrawProcessor())
        .reader(withdrawReader)
        .writer(withdrawWriter)
        .stream(swiftWriter)
        .stream(pddtsWriter)
        .listener(headerWriter())
        .build();
}
Am I doing something wrong?
