FlatFileItemWriter - Writer must be open before it can be written to - java

I have a Spring Batch job in which I skip all duplicate items and write them to a flat file.
However, the FlatFileItemWriter throws the error below whenever there is a duplicate:
Writer must be open before it can be written to
Below is the writer and SkipListener configuration:
@Bean(name = "duplicateItemWriter")
public FlatFileItemWriter<InventoryFileItem> dupItemWriter() {
    return new FlatFileItemWriterBuilder<InventoryFileItem>()
            .name("duplicateItemWriter")
            .resource(new FileSystemResource("duplicateItem.txt"))
            .lineAggregator(new PassThroughLineAggregator<>())
            .append(true)
            .shouldDeleteIfExists(true)
            .build();
}
public class StepSkipListener implements SkipListener<InventoryFileItem, InventoryItem> {

    private FlatFileItemWriter<InventoryFileItem> skippedItemsWriter;

    public StepSkipListener(FlatFileItemWriter<InventoryFileItem> skippedItemsWriter) {
        this.skippedItemsWriter = skippedItemsWriter;
    }

    @Override
    public void onSkipInProcess(InventoryFileItem item, Throwable t) {
        System.out.println(item.getBibNum() + " Process - " + t.getMessage());
        try {
            skippedItemsWriter.write(Collections.singletonList(item));
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }
}
The overall Job is defined as below and I'm using the duplicateItemWriter from the SkipListener.
@Bean(name = "fileLoadJob")
@Autowired
public Job fileLoadJob(JobBuilderFactory jobs, StepBuilderFactory steps,
                       FlatFileItemReader<InventoryFileItem> fileItemReader,
                       CompositeItemProcessor compositeItemProcessor,
                       @Qualifier(value = "itemWriter") ItemWriter<InventoryItem> itemWriter,
                       StepSkipListener skipListener) {
    return jobs.get("libraryFileLoadJob")
            .start(steps.get("step").<InventoryFileItem, InventoryItem>chunk(chunkSize)
                    .reader(fileItemReader)
                    .processor(compositeItemProcessor)
                    .writer(itemWriter)
                    .faultTolerant()
                    .skip(Exception.class)
                    .skipLimit(Integer.parseInt(skipLimit))
                    .listener(skipListener)
                    .build())
            .build();
}
I've also tried writing all data to the FlatFileItemWriter, and that doesn't work either. However, if I write to a database instead, there is no issue.
The Spring Batch version I'm using is 4.3.3.
I've referred to the below threads as well:
unit testing a FlatFileItemWriter outside of Spring - "Writer must be open before it can be written to" exception
Spring Batch WriterNotOpenException
FlatfileItemWriter with Compositewriter example

This was just a gross oversight: I missed that the FlatFileItemWriter needs to be registered as a stream.
I'm somewhat disappointed to put up this question, but I'm posting the answer just in case it helps someone.
The solution was as simple as adding .stream(dupItemWriter) to the job definition.
FlatfileItemWriter with Compositewriter example
@Bean(name = "fileLoadJob")
@Autowired
public Job fileLoadJob(JobBuilderFactory jobs, StepBuilderFactory steps,
                       FlatFileItemReader<InventoryFileItem> fileItemReader,
                       CompositeItemProcessor compositeItemProcessor,
                       @Qualifier(value = "itemWriter") ItemWriter<InventoryItem> itemWriter,
                       @Qualifier(value = "duplicateItemWriter") FlatFileItemWriter<InventoryFileItem> dupItemWriter,
                       StepSkipListener skipListener) {
    return jobs.get("libraryFileLoadJob")
            .start(steps.get("step").<InventoryFileItem, InventoryItem>chunk(chunkSize)
                    .reader(fileItemReader)
                    .processor(compositeItemProcessor)
                    .writer(itemWriter)
                    .faultTolerant()
                    .skip(Exception.class)
                    .skipLimit(Integer.parseInt(skipLimit))
                    .listener(skipListener)
                    .stream(dupItemWriter)
                    .build())
            .build();
}

It's not absolutely necessary to include the .stream(dupItemWriter); you can call the writer's .open() method instead.
In my case I was creating ItemWriters dynamically/programmatically, so adding each of them as a stream was not feasible:
.stream(writer-1)
.stream(writer-2)
.stream(writer-N)
Instead I called the .open() method myself:
FlatFileItemWriter<OutputContact> itemWriter = new FlatFileItemWriter<>();
itemWriter.setResource(outPutResource);
itemWriter.setAppendAllowed(true);
itemWriter.setLineAggregator(lineAggregator);
itemWriter.setHeaderCallback(writer -> writer.write("ACCT,MEMBER,SOURCE"));
itemWriter.open(new ExecutionContext());
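One caveat worth adding (my note, not part of the original answer): if you open the writer manually, the step will not manage its lifecycle for you, so you should also close it yourself once writing is done, otherwise buffered lines may never be flushed to the file. A minimal sketch, assuming the writer lives in a listener-style class and using a hypothetical @AfterStep callback:
import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.annotation.AfterStep;

// Closes the manually opened writer when the step finishes, flushing any
// buffered output and releasing the file handle.
@AfterStep
public ExitStatus afterStep(StepExecution stepExecution) {
    itemWriter.close();
    return stepExecution.getExitStatus();
}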

I had the same issue.
I created two writers by inheriting from FlatFileItemWriter.
That worked before I added the @StepScope annotation. After that, the first one threw an exception with the "Writer must be open before it can be written to" error message, but the second one worked without any problem.
I solved it by calling open(new ExecutionContext()), but I still do not understand why the second one works and the first one does not.

Related

Spring batch Step does not read full file

Hi, I have a problem with Spring Batch. I created a job with two steps: the first step reads a CSV file in chunks, filters out bad values and saves the rest into the DB, and the second step calls a stored procedure.
My problem is that for some reason the first step only partially reads the data file, a 2.5 GB CSV.
The file has about 13M records, but only about 400K are saved.
Does anybody know why this happens and how to solve it?
Java version: 8
Spring Boot version: 2.7.1
This is my step:
@Autowired
@Bean(name = "load_data_in_db_step")
public Step importData(
        MyProcessor processor,
        MyReader reader,
        TaskExecutor executor,
        @Qualifier("step-transaction-manager") PlatformTransactionManager transactionManager
) {
    return stepFactory.get("experian_portals_imports")
            .<ExperianPortal, ExperianPortal>chunk(chunkSize)
            .reader(reader)
            .processor(processor)
            .writer(new JpaItemWriterBuilder<ExperianPortal>()
                    .entityManagerFactory(factory)
                    .usePersist(true)
                    .build()
            )
            .transactionManager(transactionManager)
            .allowStartIfComplete(true)
            .taskExecutor(executor)
            .build();
}
This is the definition of MyReader:
@Slf4j
@Component
public class MyReader extends FlatFileItemReader<ExperianPortal> {

    private final MyLineMapper mapper;
    private final Resource fileToRead;

    @Autowired
    public MyReader(
            MyLineMapper mapper,
            @Value("${ext.datafile}") String pathToDataFile
    ) {
        this.mapper = mapper;
        val formatter = DateTimeFormatter.ofPattern("yyyyMM");
        fileToRead = new FileSystemResource(String.format(pathToDataFile, formatter.format(LocalDate.now())));
    }

    @Override
    public void afterPropertiesSet() throws Exception {
        setLineMapper(mapper);
        setEncoding(StandardCharsets.ISO_8859_1.name());
        setLinesToSkip(1);
        setResource(fileToRead);
        super.afterPropertiesSet();
    }
}
Edit:
I already tried using a single-threaded strategy. I think the problem could be with the RepeatTemplate, but I don't know how to use it correctly.
Edit 2:
I gave up on a custom solution and ended up using the default components; they work fine and the problem was solved.
Remember to use only Spring Batch components.
This is because you are using a non-thread-safe item reader in a multi-threaded step. Your item reader extends FlatFileItemReader, and FlatFileItemReader is not thread-safe: Using FlatFileItemReader with a TaskExecutor (Thread Safety). You can try with a single-threaded step (remove .taskExecutor(executor)) and you will see that the entire file is read.
What happens is that the threads read records concurrently and the read count is not honored (the threads increment the read count and the step "thinks" the file has been read entirely). You have a few options here:
synchronize the call to read in your item reader
wrap your reader in a SynchronizedItemStreamReader (the result would be the same as the previous point; see the sketch below)
make your item reader bean step-scoped
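As a rough sketch of the second option (the bean name and wiring are illustrative, assuming the MyReader bean from the question), the non-thread-safe reader can be wrapped so that calls to read() are serialized across the worker threads:
import org.springframework.batch.item.support.SynchronizedItemStreamReader;
import org.springframework.context.annotation.Bean;

// Delegates read() to MyReader under a lock; open/update/close are forwarded
// to the delegate as well, so its restart state is still maintained.
@Bean
public SynchronizedItemStreamReader<ExperianPortal> synchronizedReader(MyReader delegate) {
    SynchronizedItemStreamReader<ExperianPortal> reader = new SynchronizedItemStreamReader<>();
    reader.setDelegate(delegate);
    return reader;
}
The step would then reference synchronizedReader in its .reader(...) call instead of the raw MyReader bean.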

Add timestamp to filename in StaxEventItemWriter

I have the following Spring Batch item writer that streams to a file, but I wish to add a timestamp to the filename.
The following code works but is incorrect, because the timestamp is set at startup time rather than on every batch run.
What I'd like to achieve is that a new filename is assigned on every job run. Any idea how to do this?
@Component
public class EccAddSumoItemWriter extends SumoStaxEventItemWriter<AddSubscriptionXml> {

    public static final DateTimeFormatter DATE_FORMATTER = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH-mm-ss.SSS");

    public EccAddSumoItemWriter(@Value("${sumo.output_folder}") String sumoSavePath) {
        setShouldDeleteIfEmpty(true);
        setRootIdentification("editionCodeChanged_add");
        setResourcePath(sumoSavePath + "/edition_code_changed_add-" + now().format(DATE_FORMATTER) + ".xml");
    }
}
setResourcePath merely refers to:
protected void setResourcePath(String resourceFilePath) {
    this.setResource(new FileSystemResource(resourceFilePath));
}
What is the advised way of doing this in Spring Batch?
Ditch your EccAddSumoItemWriter and write a @Bean method that creates a @StepScope or @JobScope SumoStaxEventItemWriter.
@StepScope
@Bean
public SumoStaxEventItemWriter<AddSubscriptionXml> writer(@Value("${sumo.output_folder}") String sumoSavePath) {
    SumoStaxEventItemWriter<AddSubscriptionXml> writer = new SumoStaxEventItemWriter<>();
    writer.setShouldDeleteIfEmpty(true);
    writer.setRootIdentification("editionCodeChanged_add");
    writer.setResourcePath(sumoSavePath + "/edition_code_changed_add-" + now().format(DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH-mm-ss.SSS")) + ".xml");
    return writer;
}
Now when building your step use this step scoped writer and each time you will get a fresh instance configured accordingly.
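For illustration, a minimal sketch of how a step could pick up the step-scoped writer (the step name and reader here are placeholders, not from the original question):
@Bean
public Step exportStep(StepBuilderFactory steps,
                       ItemReader<AddSubscriptionXml> reader,
                       SumoStaxEventItemWriter<AddSubscriptionXml> writer) {
    return steps.get("exportStep")
            .<AddSubscriptionXml, AddSubscriptionXml>chunk(100)
            .reader(reader)
            // the injected 'writer' is a step-scope proxy, so a fresh instance
            // (and a fresh timestamped filename) is created for each execution
            .writer(writer)
            .build();
}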
From the theory it seems that simply using @StepScope should resolve the problem.
I will test this.
After testing, it worked perfectly.

Why does my Spring Batch job run step2 while step1 is still going on?

First of all, thank you for checking out my post.
I will list the technologies that are used, what I need to achieve, and what is happening.
Services used:
Spring Boot in Groovy
AWS: AWS Batch, S3 Bucket, SNS, Parameter Store, CloudFormation
What I need to achieve:
1. Read two CSV files from the S3 bucket (student.csv with columns id,studentName and score.csv with columns id,studentId,score).
2. Determine who failed by comparing the two files: if a student's score is below 50, he/she failed.
3. Create a new CSV file in the S3 bucket and store it as failedStudents.csv.
4. Send the failedStudents.csv that was just created in an email via an SNS topic.
In the Spring Batch config class, I implement items 1-3 as step1 and item 4 as step2.
@Bean
Step step1() {
    return this.stepBuilderFactory
        .get("step1")
        .<BadStudent, BadStudent>chunk(100)
        .reader(new IteratorItemReader<Student>(this.StudentLoader.Students.iterator()) as ItemReader<? extends BadStudent>)
        .processor(this.studentProcessor as ItemProcessor<? super BadStudent, ? extends BadStudent>)
        .writer(this.csvWriter())
        .build()
}

@Bean
Step step2() {
    return this.stepBuilderFactory
        .get("step2")
        .tasklet(new PublishSnsTopic())
        .build()
}

@Bean
Job job() {
    return this.jobBuilderFactory
        .get("scoring-students-batch")
        .incrementer(new RunIdIncrementer())
        .start(this.step1())
        .next(this.step2())
        .build()
}
ItemWriter
@Component
@EnableContextResourceLoader
class BadStudentWriter implements ItemWriter<BadStudent> {

    @Autowired
    ResourceLoader resourceLoader
    @Resource
    FileProperties fileProperties
    WritableResource resource
    PrintStream writer
    CSVPrinter csvPrinter

    @PostConstruct
    void setup() {
        this.resource = this.resourceLoader.getResource("s3://students/failedStudents.csv") as WritableResource
        this.writer = new PrintStream(this.resource.outputStream)
        this.csvPrinter = new CSVPrinter(this.writer, CSVFormat.DEFAULT.withDelimiter('|' as char).withHeader('id', 'studentsName', 'score'))
    }

    @Override
    void write(List<? extends BadStudent> items) throws Exception {
        this.csvPrinter.with { CSVPrinter csvPrinter ->
            items.each { BadStudent badStudent ->
                csvPrinter.printRecord(
                    badStudent.id,
                    badStudent.studentsName,
                    badStudent.score
                )
            }
        }
    }

    @AfterStep
    void afterStep() {
        this.csvPrinter.close()
    }
}
PublishSnsTopic
@Configuration
@Service
class PublishSnsTopic implements Tasklet {

    @Autowired
    ResourceLoader resourceLoader
    List<BadStudent> badStudents
    @Autowired
    FileProperties fileProperties

    @PostConstruct
    void setup() {
        // note: this runs when the bean is created at application-context startup,
        // not when the step executes
        String badStudentCSVFileName = "s3://students/failedStudents.csv"
        Reader badStudentReader = new InputStreamReader(
            this.resourceLoader.getResource(badStudentCSVFileName).inputStream
        )
        this.badStudents = new CsvToBeanBuilder(badStudentReader)
            .withSeparator((char) '|')
            .withType(BadStudent)
            .withFieldAsNull(CSVReaderNullFieldIndicator.BOTH)
            .build()
            .parse()
        String messageBody = ""
        messageBody += this.badStudents.collect { it -> return "${it.id}" }
        SnsClient snsClient = SnsClient.builder().build()
        if (snsClient) {
            publishTopic(snsClient, messageBody, this.fileProperties.topicArn)
            snsClient.close()
        }
    }

    void publishTopic(SnsClient snsClient, String message, String arn) {
        try {
            PublishRequest request = PublishRequest.builder()
                .message(message)
                .topicArn(arn)
                .build()
            PublishResponse result = snsClient.publish(request)
        } catch (SnsException e) {
            log.error "SOMETHING WENT WRONG"
        }
    }

    @Override
    RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        return RepeatStatus.FINISHED
    }
}
The issue here is that while step1 is executing, step2 also gets executed, and that leads to serious problems:
This batch job runs every day. Let's say today is Tuesday: failedStudents.csv is generated and stored fine, but because step2 runs instead of waiting for step1 to be done, step2 sends the failedStudents.csv that was generated on Monday. So the CSV is regenerated with the new dataset, but step2 sends a day-old student list.
If failedStudents.csv is not stored in S3 (or has been moved to a Glacier bucket), the batch job fails, because step2 crashes with a FileNotFoundException while step1 is still going on and failedStudents.csv has not been created yet.
This is my first AWS and Spring Batch project, and so far step1 is working perfectly, but I am guessing step2 has a problem.
I appreciate you reading my long post. I would be very happy if anyone could answer or help me with this problem.
I have been working on PublishSnsTopic and the ItemWriter, but this is where I have been stuck for a while, and I'm not sure how to figure it out.
Thank you so much again for reading.

@StepScope causing issue: reader must be open before it can be read [duplicate]

I want to use Spring Batch (v3.0.9) restart functionality so that when a JobInstance is restarted, the process step reads from the last failed chunk point forward. My restart works fine as long as I don't use the @StepScope annotation on my myBatisPagingItemReader bean method.
I was using @StepScope so that I could do late binding to get the JobParameters in my myBatisPagingItemReader bean method: @Value("#{jobParameters['run-date']}").
If I use the @StepScope annotation on the myBatisPagingItemReader() bean method, the restart does not work, as it creates a new instance (scope=step, name=scopedTarget.myBatisPagingItemReader).
If I use @StepScope, is it possible for my myBatisPagingItemReader to set read.count from the last failure so that the restart works?
I have explained this issue with an example below.
@Configuration
@EnableBatchProcessing
public class BatchConfig {

    @Bean
    public Step step1(StepBuilderFactory stepBuilderFactory,
                      ItemReader<Model> myBatisPagingItemReader,
                      ItemProcessor<Model, Model> itemProcessor,
                      ItemWriter<Model> itemWriter) {
        return stepBuilderFactory.get("data-load")
                .<Model, Model>chunk(10)
                .reader(myBatisPagingItemReader)
                .processor(itemProcessor)
                .writer(itemWriter)
                .listener(itemReadListener())
                .listener(new JobParameterExecutionContextCopyListener())
                .build();
    }

    @Bean
    public Job job(JobBuilderFactory jobBuilderFactory, @Qualifier("step1") Step step1) {
        return jobBuilderFactory.get("load-job")
                .incrementer(new RunIdIncrementer())
                .start(step1)
                .listener(jobExecutionListener())
                .build();
    }

    @Bean
    @StepScope
    public ItemReader<Model> myBatisPagingItemReader(
            SqlSessionFactory sqlSessionFactory,
            @Value("#{jobParameters['run-date']}") String runDate) {
        MyBatisPagingItemReader<Model> reader = new MyBatisPagingItemReader<>();
        Map<String, Object> parameterValues = new HashMap<>();
        parameterValues.put("runDate", runDate);
        reader.setSqlSessionFactory(sqlSessionFactory);
        reader.setParameterValues(parameterValues);
        reader.setQueryId("query");
        return reader;
    }
}
Restart example: when I use the @StepScope annotation on myBatisPagingItemReader(), the reader fetches 5 records and I have the chunk size (commit-interval) set to 3.
Job Instance - 01 - Job Parameter - 01/02/2019.
chunk-1:
- process record-1
- process record-2
- process record-3
- writer writes all 3 records
- chunk-1 commit successful
chunk-2:
- process record-4
- process record-5 - throws an exception
Job completes and is set to 'FAILED' status.
Now the job is restarted using the same job parameter.
Job Instance - 01 - Job Parameter - 01/02/2019.
chunk-1:
- process record-1
- process record-2
- process record-3
- writer writes all 3 records
- chunk-1 commit successful
chunk-2:
- process record-4
- process record-5 - throws an exception
Job completes and is set to 'FAILED' status.
The @StepScope annotation on the myBatisPagingItemReader() bean method creates a new instance; see the log messages below.
Creating object in scope=step, name=scopedTarget.myBatisPagingItemReader
Registered destruction callback in scope=step, name=scopedTarget.myBatisPagingItemReader
As it is a new instance, it starts the process from the beginning instead of starting from chunk-2.
If I don't use @StepScope, it restarts from chunk-2, because the restarted job step sets MyBatisPagingItemReader.read.count=3.
The issue here is that you are returning an ItemReader instead of the concrete class (MyBatisPagingItemReader) or at least ItemStreamReader. When you use Spring Batch's step scope, we create a proxy to allow for late initialization. The proxy is based on the return type of the method (ItemReader in your case). The issue you are running into is that because the proxy is of type ItemReader, Spring Batch does not know that your bean also implements ItemStream, and it is that interface that enables restartability. By default, Spring Batch will automatically register all beans of type ItemStream for you (you can also explicitly register the beans yourself, but that's typically not needed).
To address your issue, the following should work (note the change in the return type):
@Bean
@StepScope
public MyBatisPagingItemReader<Model> myBatisPagingItemReader(
        SqlSessionFactory sqlSessionFactory,
        @Value("#{jobParameters['run-date']}") String runDate) {
    MyBatisPagingItemReader<Model> reader = new MyBatisPagingItemReader<>();
    Map<String, Object> parameterValues = new HashMap<>();
    parameterValues.put("runDate", runDate);
    reader.setSqlSessionFactory(sqlSessionFactory);
    reader.setParameterValues(parameterValues);
    reader.setQueryId("query");
    return reader;
}
This is why it is my recommendation that, where possible, when using @Bean annotated methods, you should return the most concrete type possible to allow Spring to help as much as possible.

Configure a Spring Batch writer based on job parameters

I created a simple Spring Batch job using Spring Boot that reads from our database and writes to a topic. I also had a hook so that during development I could comment out the topicWriter and write to a CSV file instead. Both work by commenting out one and running the other writer (topicWriter or writer). The business now wants to be able to run ad hoc with either the topic or the CSV writer, so I opted to pass in an output parameter that contains either "topic" or "csv". From my reading it looked like I could use a decider, but this may be wrong. As it stands, the code below complains about a duplicate step and loops when I try to run it. I was unable to figure out how to run without a starting step, so I created a do-nothing tasklet because the job needed a start step before the decider. So I think I have this all screwed up. Any ideas for a solution or direction?
@Bean
public Job job(@Qualifier("step") Step step) {
    return jobBuilderFactory.get(BatchConstants.JOB_NAME).listener(jobListener())
            .start(step).next(decider()).on("COMPLETED").to(step1(null, null))
            .from(decider()).on("FAILED").to(step2(null, null)).end().build();
}

@Bean
protected Step step() {
    return stepBuilderFactory.get("step")
            .tasklet(new Tasklet() {
                public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
                    return RepeatStatus.FINISHED;
                }
            })
            .build();
}

@Bean
protected Step step1(ItemReader<someDto> reader,
                     ItemWriter<someDto> topicWriter) {
    return stepBuilderFactory.get(BatchConstants.STEP_NAME)
            .<someDto, someDto>chunk(BatchConstants.CHUNKSIZE)
            .reader(reader)
            .writer(topicWriter) // writes to kafka topic
            .build();
}

@Bean
protected Step step2(ItemReader<someDto> reader,
                     ItemWriter<someDto> writer) {
    return stepBuilderFactory.get(BatchConstants.STEP_NAME)
            .<someDto, someDto>chunk(BatchConstants.CHUNKSIZE)
            .reader(reader)
            .writer(writer) // writes to csv
            .build();
}
You can define this in a single step using @StepScope. Based on the job parameters you can select the writer:
@Bean
@StepScope
protected Step step2(ItemReader<someDto> reader,
                     ItemWriter<someDto> writer,
                     ItemWriter<someDto> topicWriter,
                     @Value("#{jobParameters['writerType']}") final String type) {
    ItemWriter<someDto> myWriter;
    if (type.equals("topic")) {
        myWriter = topicWriter;
    } else {
        myWriter = writer;
    }
    return stepBuilderFactory.get(BatchConstants.STEP_NAME)
            .<someDto, someDto>chunk(BatchConstants.CHUNKSIZE)
            .reader(reader)
            .writer(myWriter)
            .build();
}
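For completeness, a hedged usage sketch (the jobLauncher and job wiring are assumed, not part of the original answer) showing how the writerType job parameter would be supplied when launching the job:
// The step-scoped bean above reads jobParameters['writerType'] at execution time,
// so the caller only needs to pass the desired value as a job parameter.
JobParameters params = new JobParametersBuilder()
        .addString("writerType", "topic")              // or "csv"
        .addLong("run.id", System.currentTimeMillis()) // keeps each run's job instance unique
        .toJobParameters();
jobLauncher.run(job, params);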
