Spring Batch FlatFileItemReader: provide filename in a later step - java

So I'm building a batch process that uses Spring Batch. I've defined a job with a few steps, the first being an implementation of Tasklet that acts as a file watcher and checks a directory for any file matching a particular file mask. Once that file is found, we move forward with the next step in the process. Initially the next step was also an implementation of Tasklet, and we looped through the file and, for each record, loaded it to Oracle in batch loads. This was taking way too long. I have found that using FlatFileItemReader and JdbcBatchItemWriter is literally 1000 times faster. Anyway, my issue is that when using FlatFileItemReader I have to define my resource and provide a FileSystemResource when that bean is created. I really want to supply that filename after my first step is completed, because I need to run the file watcher first and figure out what the filename is that we want to process. Is there a way of achieving this?
@Bean
public FlatFileItemReader<PartnerRelationship> partnerRelationshipReader() throws ParseException {
    FlatFileItemReader<PartnerRelationship> reader = new FlatFileItemReader<>();
    reader.setResource(new FileSystemResource("/path/to/my/file/file_20210714.dat"));
    reader.setBufferedReaderFactory(new CustomFileReaderFactory());
    reader.setStrict(false);
    reader.setLineMapper(new DefaultLineMapper<PartnerRelationship>() {{
        setLineTokenizer(new FixedLengthTokenizer() {{
            setNames(Constants.partnerRelationshipFields);
            setColumns(Constants.partnerRelationshipIndeces);
        }});
        setFieldSetMapper(new PartnerRelationshipFieldSetMapper());
    }});
    return reader;
}

You could pass the resource to the jobExecutionContext:
ExecutionContext jobExecutionContext = stepExecution.getJobExecution().getExecutionContext();
jobExecutionContext.put("resource", res);
It can then be retrieved if you make your reader bean step-scoped:
@Bean
@StepScope
public FlatFileItemReader<PartnerRelationship> partnerRelationshipReader(@Value("#{jobExecutionContext['resource']}") Resource res) throws ParseException {
    FlatFileItemReader<PartnerRelationship> reader = new FlatFileItemReader<>();
    reader.setResource(res);
    reader.setBufferedReaderFactory(new CustomFileReaderFactory());
    reader.setStrict(false);
    reader.setLineMapper(new DefaultLineMapper<PartnerRelationship>() {{
        setLineTokenizer(new FixedLengthTokenizer() {{
            setNames(Constants.partnerRelationshipFields);
            setColumns(Constants.partnerRelationshipIndeces);
        }});
        setFieldSetMapper(new PartnerRelationshipFieldSetMapper());
    }});
    return reader;
}
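For context, here is a minimal sketch of where the first snippet could live; the FileWatcherTasklet name and the directory-scanning logic are hypothetical, not from the original post:
import java.io.File;

import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.core.io.FileSystemResource;

// Hypothetical file-watcher tasklet: it resolves the file matching the mask and
// publishes it under the "resource" key of the job execution context so the
// step-scoped reader above can pick it up.
public class FileWatcherTasklet implements Tasklet {

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
        // placeholder for the real directory scan / file-mask matching
        File found = new File("/path/to/my/file/file_20210714.dat");

        ExecutionContext jobExecutionContext = chunkContext.getStepContext()
                .getStepExecution()
                .getJobExecution()
                .getExecutionContext();
        jobExecutionContext.put("resource", new FileSystemResource(found));
        return RepeatStatus.FINISHED;
    }
}
If serializing a Resource into the batch metadata tables is a concern, storing the absolute path as a String and building the FileSystemResource inside the step-scoped reader method works just as well.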

Related

Spring Batch step does not read the full file

Hi, I have a problem with Spring Batch. I created a Job with two steps: the first step reads a CSV file in chunks, filters out bad values, and saves into the DB; the second calls a stored procedure.
My problem is that for some reason the first step only partially reads the data file, a 2.5 GB CSV.
The file has about 13M records but only about 400K get saved.
Does anybody know why this happens and how to solve it?
Java version: 8
Spring Boot version: 2.7.1
This is my step:
@Autowired
@Bean(name = "load_data_in_db_step")
public Step importData(
        MyProcessor processor,
        MyReader reader,
        TaskExecutor executor,
        @Qualifier("step-transaction-manager") PlatformTransactionManager transactionManager
) {
    return stepFactory.get("experian_portals_imports")
            .<ExperianPortal, ExperianPortal>chunk(chunkSize)
            .reader(reader)
            .processor(processor)
            .writer(new JpaItemWriterBuilder<ExperianPortal>()
                    .entityManagerFactory(factory)
                    .usePersist(true)
                    .build()
            )
            .transactionManager(transactionManager)
            .allowStartIfComplete(true)
            .taskExecutor(executor)
            .build();
}
This is the definition of MyReader:
@Slf4j
@Component
public class MyReader extends FlatFileItemReader<ExperianPortal> {
    private final MyLineMapper mapper;
    private final Resource fileToRead;

    @Autowired
    public MyReader(
            MyLineMapper mapper,
            @Value("${ext.datafile}") String pathToDataFile
    ) {
        this.mapper = mapper;
        val formatter = DateTimeFormatter.ofPattern("yyyyMM");
        fileToRead = new FileSystemResource(String.format(pathToDataFile, formatter.format(LocalDate.now())));
    }

    @Override
    public void afterPropertiesSet() throws Exception {
        setLineMapper(mapper);
        setEncoding(StandardCharsets.ISO_8859_1.name());
        setLinesToSkip(1);
        setResource(fileToRead);
        super.afterPropertiesSet();
    }
}
Edit:
I already tried a single-threaded strategy; I think the problem could be with the RepeatTemplate, but I don't know how to use it correctly.
Edit 2:
I gave up on a custom solution and ended up using the default components; they work fine and the problem was solved.
Remember to use only Spring Batch components.
This is because you are using a non-thread-safe item reader in a multi-threaded step. Your item reader extends FlatFileItemReader, and FlatFileItemReader is not thread-safe: Using FlatFileItemReader with a TaskExecutor (Thread Safety). You can try a single-threaded step (remove .taskExecutor(executor)) and you will see that the entire file is read.
What happens is that threads are reading records concurrently and the read count is not honored (threads are incrementing the read count and the step "thinks" that the file has been read entirely). You have a few options here:
- synchronize the call to read in your item reader
- wrap your reader in a SynchronizedItemStreamReader (the result would be the same as the previous point; see the sketch after this list)
- make your item reader bean step-scoped
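A minimal sketch of the second option, assuming the MyReader bean from the question (the synchronizedReader bean name is illustrative):
// Wrap the non-thread-safe reader so that concurrent chunks in the
// multi-threaded step read from it one thread at a time.
@Bean
public SynchronizedItemStreamReader<ExperianPortal> synchronizedReader(MyReader reader) {
    SynchronizedItemStreamReader<ExperianPortal> synchronizedReader = new SynchronizedItemStreamReader<>();
    synchronizedReader.setDelegate(reader);
    return synchronizedReader;
}
The step would then use this wrapper instead of MyReader directly.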

FlatFileItemWriter - Writer must be open before it can be written to

I have a Spring Batch Job where I skip all duplicate items and write them to a flat file.
However the FlatFileItemWriter throws the below error whenever there's a duplicate:
Writer must be open before it can be written to
Below is the Writer & SkipListener configuration:
#Bean(name = "duplicateItemWriter")
public FlatFileItemWriter<InventoryFileItem> dupItemWriter(){
return new FlatFileItemWriterBuilder<InventoryFileItem>()
.name("duplicateItemWriter")
.resource(new FileSystemResource("duplicateItem.txt"))
.lineAggregator(new PassThroughLineAggregator<>())
.append(true)
.shouldDeleteIfExists(true)
.build();
}
public class StepSkipListener implements SkipListener<InventoryFileItem, InventoryItem> {
private FlatFileItemWriter<InventoryFileItem> skippedItemsWriter;
public StepSkipListener(FlatFileItemWriter<InventoryFileItem> skippedItemsWriter) {
this.skippedItemsWriter = skippedItemsWriter;
}
#Override
public void onSkipInProcess(InventoryFileItem item, Throwable t) {
System.out.println(item.getBibNum() + " Process - " + t.getMessage());
try {
skippedItemsWriter.write(Collections.singletonList(item));
} catch (Exception e) {
System.out.println(e.getMessage());
}
}
The overall Job is defined as below and I'm using the duplicateItemWriter from the SkipListener.
#Bean(name = "fileLoadJob")
#Autowired
public Job fileLoadJob(JobBuilderFactory jobs, StepBuilderFactory steps,
FlatFileItemReader<inventoryFileItem> fileItemReader,
CompositeItemProcessor compositeItemProcessor,
#Qualifier(value = "itemWriter") ItemWriter<InventoryItem> itemWriter,
StepSkipListener skipListener) {
return jobs.get("libraryFileLoadJob")
.start(steps.get("step").<InventoryFileItem, InventoryItem>chunk(chunkSize)
.reader(FileItemReader)
.processor(compositeItemProcessor)
.writer(itemWriter)
.faultTolerant()
.skip(Exception.class)
.skipLimit(Integer.parseInt(skipLimit))
.listener(skipListener)
.build())
.build();
}
I've also tried writing all data to the FlatFileItemWriter - that doesn't work either. However, if I write to a DB, there's no issue.
The Spring Batch version I'm using is 4.3.3.
I've referred to the below threads as well:
unit testing a FlatFileItemWriter outside of Spring - "Writer must be open before it can be written to" exception
Spring Batch WriterNotOpenException
FlatfileItemWriter with Compositewriter example
This was just a gross oversight: I missed that the FlatFileItemWriter needs to be registered as a stream.
I'm somewhat disappointed to put up this question, but I'm posting the answer just in case it helps someone.
The solution was as simple as adding .stream(dupItemWriter) to the job definition.
FlatfileItemWriter with Compositewriter example
#Bean(name = "fileLoadJob")
#Autowired
public Job fileLoadJob(JobBuilderFactory jobs, StepBuilderFactory steps,
FlatFileItemReader<inventoryFileItem> fileItemReader,
CompositeItemProcessor compositeItemProcessor,
#Qualifier(value = "itemWriter") ItemWriter<InventoryItem> itemWriter,
#Qualifier(value = "duplicateItemWriter")FlatFileItemWriter<InventoryFileItem> dupItemWriter,
StepSkipListener skipListener) {
return jobs.get("libraryFileLoadJob")
.start(steps.get("step").<InventoryFileItem, InventoryItem>chunk(chunkSize)
.reader(FileItemReader)
.processor(compositeItemProcessor)
.writer(itemWriter)
.faultTolerant()
.skip(Exception.class)
.skipLimit(Integer.parseInt(skipLimit))
.listener(skipListener)
.stream(dupItemWriter)
.build())
.build();
}
It's not absolutely necessary to include the .stream(dupItemWriter); you can also call the writer's .open() method instead.
In my case I was creating ItemWriters dynamically/programmatically, so registering each of them as a stream was not feasible:
.stream(writer-1)
.stream(writer-2)
.stream(writer-N)
Instead I called the .open() method myself:
FlatFileItemWriter<OutputContact> itemWriter = new FlatFileItemWriter<>();
itemWriter.setResource(outPutResource);
itemWriter.setAppendAllowed(true);
itemWriter.setLineAggregator(lineAggregator);
itemWriter.setHeaderCallback(writer -> writer.write("ACCT,MEMBER,SOURCE"));
itemWriter.open(new ExecutionContext());
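One caveat, as a hedged side note not from the original answer: when you open the writer yourself, the step does not manage its lifecycle, so it should also be closed when processing finishes so buffered lines are flushed. A minimal sketch, assuming the writer is reachable from a step listener method:
// Hypothetical clean-up hook: close the manually opened writer once the step
// completes so any buffered output is flushed to the file.
@AfterStep
public ExitStatus afterStep(StepExecution stepExecution) {
    itemWriter.close();
    return stepExecution.getExitStatus();
}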
I had the same issue. I created two writers by inheriting from FlatFileItemWriter.
That was working before I added the @StepScope annotation. After that, the first one threw an exception with the "Writer must be open before it can be written to" error message, but the second one worked without any problem.
I solved it by calling open(new ExecutionContext()), but I still don't understand why the second one works and not the first.

@StepScope causing issue: reader must be open before it can be read [duplicate]

I want to use Spring Batch (v3.0.9) restart functionality so that when a JobInstance is restarted, the process step reads from the last failed chunk point forward. My restart works fine as long as I don't use the @StepScope annotation on my myBatisPagingItemReader bean method.
I was using @StepScope so that I can do late binding to get the JobParameters in my myBatisPagingItemReader bean method: @Value("#{jobParameters['run-date']}")
If I use the @StepScope annotation on the myBatisPagingItemReader() bean method, the restart does not work as it creates a new instance (scope=step, name=scopedTarget.myBatisPagingItemReader).
If I use step scope, is it possible for my myBatisPagingItemReader to set the read.count from the last failure to get restart working?
I have explained this issue with an example below.
@Configuration
@EnableBatchProcessing
public class BatchConfig {

    @Bean
    public Step step1(StepBuilderFactory stepBuilderFactory,
            ItemReader<Model> myBatisPagingItemReader,
            ItemProcessor<Model, Model> itemProcessor,
            ItemWriter<Model> itemWriter) {
        return stepBuilderFactory.get("data-load")
                .<Model, Model>chunk(10)
                .reader(myBatisPagingItemReader)
                .processor(itemProcessor)
                .writer(itemWriter)
                .listener(itemReadListener())
                .listener(new JobParameterExecutionContextCopyListener())
                .build();
    }

    @Bean
    public Job job(JobBuilderFactory jobBuilderFactory, @Qualifier("step1") Step step1) {
        return jobBuilderFactory.get("load-job")
                .incrementer(new RunIdIncrementer())
                .start(step1)
                .listener(jobExecutionListener())
                .build();
    }

    @Bean
    @StepScope
    public ItemReader<Model> myBatisPagingItemReader(
            SqlSessionFactory sqlSessionFactory,
            @Value("#{jobParameters['run-date']}") String runDate) {
        MyBatisPagingItemReader<Model> reader = new MyBatisPagingItemReader<>();
        Map<String, Object> parameterValues = new HashMap<>();
        parameterValues.put("runDate", runDate);
        reader.setSqlSessionFactory(sqlSessionFactory);
        reader.setParameterValues(parameterValues);
        reader.setQueryId("query");
        return reader;
    }
}
Restart example: when I use the @StepScope annotation on myBatisPagingItemReader(), the reader fetches 5 records and I have the chunk size (commit-interval) set to 3.
Job Instance - 01 - Job Parameter - 01/02/2019.
chunk-1:
- process record-1
- process record-2
- process record-3
- writer - writes all 3 records
- chunk-1 commit successful
chunk-2:
- process record-4
- process record-5 - throws an exception
Job completes and is set to 'FAILED' status
Now the job is restarted using the same job parameters.
Job Instance - 01 - Job Parameter - 01/02/2019.
chunk-1:
- process record-1
- process record-2
- process record-3
- writer - writes all 3 records
- chunk-1 commit successful
chunk-2:
- process record-4
- process record-5 - throws an exception
Job completes and is set to 'FAILED' status
The @StepScope annotation on the myBatisPagingItemReader() bean method creates a new instance, see the log messages below.
Creating object in scope=step, name=scopedTarget.myBatisPagingItemReader
Registered destruction callback in scope=step, name=scopedTarget.myBatisPagingItemReader
As it is a new instance, it starts the process from the beginning instead of starting from chunk-2.
If I don't use @StepScope, it restarts from chunk-2, as the restarted job step sets MyBatisPagingItemReader.read.count=3.
The issue here is that you are returning an ItemReader instead of the fully qualified class (MyBatisPagingItemReader) or at least ItemStreamReader. When you use Spring Batch's step scope, we create a proxy to allow for late initialization. The proxy is based on the return type of the method (ItemReader in your case). The issue you are running into is that because the proxy is of ItemReader, Spring Batch does not know that your bean also implements ItemStream and it is that interface that enables restartability. By default, Spring Batch will automatically register all beans of type ItemStream for you (you can also explicitly register the beans yourself, but it's typically not needed).
To address your issue, the following should work (note the change in the return type):
@Bean
@StepScope
public MyBatisPagingItemReader<Model> myBatisPagingItemReader(
        SqlSessionFactory sqlSessionFactory,
        @Value("#{jobParameters['run-date']}") String runDate) {
    MyBatisPagingItemReader<Model> reader = new MyBatisPagingItemReader<>();
    Map<String, Object> parameterValues = new HashMap<>();
    parameterValues.put("runDate", runDate);
    reader.setSqlSessionFactory(sqlSessionFactory);
    reader.setParameterValues(parameterValues);
    reader.setQueryId("query");
    return reader;
}
This is why it is my recommendation that where possible, when using @Bean annotated methods, you should return the most concrete type possible to allow Spring to help as much as possible.

Spring Batch: Job can't be started with different JobParameters and JobParameters can't be accessed

I have two issues with Spring Batch, both regarding the JobParameters that are passed in via the command line.
First issue:
I'm using Eclipse to develop my application and test it. Therefore, I added Program arguments to the Run Configurations. These arguments are:
-ts=${current_date} -path="file.csv"
Running the application will throw an exception. The exception is:
Caused by: org.springframework.batch.core.repository.JobInstanceAlreadyCompleteException:
A job instance already exists and is complete for parameters={ts=20210211_1631, path=file.csv}.
If you want to run this job again, change the parameters.
As you can see the JobParameters should be different for each execution, because one of the parameters is a timestamp that is changing each minute. I had a look at this question Spring Batch: execute same job with different parameters, but here the solution is to set a new name for each job execution (e.g. name + System.currentTimeMillis()). Is there another solution to this problem? I don't want to create a 'random' name for the job each time it is executed. My Job is implemented as this:
#Bean(name = "inJob")
public Job inJob(JobRepository jobRepository) {
return jobBuilderFactory.get("inJob")
.repository(jobRepository)
.incrementer(new RunIdIncrementer())
.start(truncateTable())
.next(loadCsv())
.next(updateType())
.build();
}
I'm using a custom implementation of the JobRepository to store the metadata in a different database schema:
@Override
public JobRepository createJobRepository() throws Exception {
    JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
    factory.setDataSource(dataSource);
    factory.setTransactionManager(transactionManager);
    factory.setTablePrefix("logging.BATCH_");
    return factory.getObject();
}
Second issue:
My second issue is accessing the JobParameters. One of the above parameters is a file path I want to use in the FlatFileItemReader:
#Bean(name = "inReader")
#StepScope
public FlatFileItemReader<CsvInfile> inReader() {
FlatFileItemReader<CsvInfile> reader = new FlatFileItemReader<CsvInfile>();
reader.setResource(new FileSystemResource(path));
DefaultLineMapper<CsvInfile> lineMapper = new DefaultLineMapper<>();
lineMapper.setFieldSetMapper(new CsvInfileFieldMapper());
DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
tokenizer.setDelimiter("|");
tokenizer.setNames(ccn.names);
lineMapper.setLineTokenizer(tokenizer);
reader.setLineMapper(lineMapper);
reader.setLinesToSkip(1);
reader.open(new ExecutionContext());
return reader;
}
To get the path from the JobParameters, I used the @BeforeStep annotation to load the JobParameters and copy them into local variables. Unfortunately this is not working. The variable is null and the execution fails, because the file can't be opened.
private String path;

@BeforeStep
public void beforeStep(StepExecution stepExecution) {
    JobParameters jobParameters = stepExecution.getJobParameters();
    this.path = jobParameters.getString("path");
}
How can I access the JobParameters within my reader? I want to pass in the file path as command line argument and then read this file.
First issue: Is there another solution to this problem?
Your current date is resolved per minute, so if you run your job more than once during that minute, there would already be a job instance with the same parameters, hence the issue. Your ts parameter should have a precision of a second (or less if needed).
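As a hedged illustration, if the job is launched programmatically, a second-precision timestamp parameter could be built like this (names are illustrative, not from the original post):
// Second-precision timestamp so each launch yields a distinct JobInstance.
JobParameters params = new JobParametersBuilder()
        .addString("ts", LocalDateTime.now().format(DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss")))
        .addString("path", "file.csv")
        .toJobParameters();
jobLauncher.run(inJob, params);
When launching from the Eclipse run configuration instead, the substituted timestamp value needs the same second-level precision.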
Second issue: How can I access the JobParameters within my reader? I want to pass in the file path as command line argument and then read this file.
You don't need that beforeStep method. You can late-bind the job parameter in your bean definition as follows:
#Bean(name = "inReader")
#StepScope
public FlatFileItemReader<CsvInfile> inReader(#Value("#{jobParameters['path']}") String path) {
FlatFileItemReader<CsvInfile> reader = new FlatFileItemReader<CsvInfile>();
reader.setResource(new FileSystemResource(path));
// ...
return reader;
}
This would inject the file path in your reader definition if you pass path as a job parameter, something like:
java -jar myjob.jar path=/absolute/path/to/your/file
This is explained in the Late Binding of Job and Step Attributes section of the reference documentation.

Can we process multiple files sequentially with Spring Batch while multiple threads are used to process each individual file's data?

I want to process multiple files sequentially, and each file needs to be processed with the help of multiple threads, so I used the Spring Batch FlatFileItemReader and a TaskExecutor, and it seems to be working fine for me. As mentioned in the requirement, we have to process multiple files, so along with FlatFileItemReader I am using MultiResourceItemReader, which takes a number of files and processes them one by one, and this is where I am facing issues. Can someone help me understand the cause of the exception, and what the approach to fix it is?
org.springframework.batch.item.ReaderNotOpenException: Reader must be open before it can be read.
at org.springframework.batch.item.file.FlatFileItemReader.readLine(FlatFileItemReader.java:195) ~[spring-batch-infrastructure-3.0.5.RELEASE.jar:3.0.5.RELEASE]
at org.springframework.batch.item.file.FlatFileItemReader.doRead(FlatFileItemReader.java:173) ~[spring-batch-infrastructure-3.0.5.RELEASE.jar:3.0.5.RELEASE]
at org.springframework.batch.item.support.AbstractItemCountingItemStreamItemReader.read(AbstractItemCountingItemStreamItemReader.java:88) ~[spring-batch-infrastructure-3.0.5.RELEASE.jar:3.0.5.RELEASE]
at org.springframework.batch.item.file.MultiResourceItemReader.readFromDelegate(MultiResourceItemReader.java:140) ~[spring-batch-infrastructure-3.0.5.RELEASE.jar:3.0.5.RELEASE]
at org.springframework.batch.item.file.MultiResourceItemReader.readNextItem(MultiResourceItemReader.java:119)
customer2.csv
200,Zoe,Nelson,1973-01-12 17:19:30
201,Vivian,Love,1951-10-31 08:57:08
202,Charde,Lang,1967-02-23 12:24:26
customer3.csv
400,Amelia,Osborn,1972-05-09 09:21:22
401,Gemma,Finch,1989-09-25 23:00:59
402,Orli,Slater,1959-03-30 15:54:32
403,Donovan,Beasley,1986-06-18 14:50:30
customer4.csv
600,Zelenia,Henson,1982-07-03 03:28:39
601,Thomas,Mathews,1954-11-21 20:34:03
602,Kevyn,Whitney,1984-09-21 06:24:25
603,Marny,Leon,1984-06-10 21:32:09
604,Jarrod,Gay,1960-06-22 19:11:04
customer5.csv
800,Imogene,Lee,1966-10-19 17:53:44
801,Mira,Franks,1964-03-08 09:47:43
802,Silas,Dixon,1953-04-11 01:37:51
803,Paloma,Daniels,1962-06-14 17:01:02
My code:
@Bean
public MultiResourceItemReader<Customer> multiResourceItemReader() {
    System.out.println("In multiResourceItemReader");
    MultiResourceItemReader<Customer> reader = new MultiResourceItemReader<>();
    reader.setDelegate(customerItemReader());
    reader.setResources(inputFiles);
    return reader;
}

@Bean
public FlatFileItemReader<Customer> customerItemReader() {
    FlatFileItemReader<Customer> reader = new FlatFileItemReader<>();
    DefaultLineMapper<Customer> customerLineMapper = new DefaultLineMapper<>();
    DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
    tokenizer.setNames(new String[] {"id", "firstName", "lastName", "birthdate"});
    customerLineMapper.setLineTokenizer(tokenizer);
    customerLineMapper.setFieldSetMapper(new CustomerFieldSetMapper());
    customerLineMapper.afterPropertiesSet();
    reader.setLineMapper(customerLineMapper);
    return reader;
}
The below snippet works fine:
@Bean
public Step step1() {
    return stepBuilderFactory.get("step1")
            .<Customer, Customer>chunk(100)
            .reader(customerItemReader())
            .writer(customerItemWriter())
            .taskExecutor(taskExecutor())
            .throttleLimit(10)
            .build();
}
The below snippet is not working; it throws the above-mentioned exception:
@Bean
public Step step1() {
    return stepBuilderFactory.get("step1")
            .<Customer, Customer>chunk(100)
            .reader(multiResourceItemReader())
            .writer(customerItemWriter())
            .taskExecutor(taskExecutor())
            .throttleLimit(10)
            .build();
}
Since you are using the reader in a multi-threaded step, a thread could have closed the current file while another thread is trying to read from that file at the same time. You need to synchronize access to your reader with a SynchronizedItemStreamReader:
@Bean
public SynchronizedItemStreamReader<Customer> multiResourceItemReader() {
    System.out.println("In multiResourceItemReader");
    MultiResourceItemReader<Customer> reader = new MultiResourceItemReader<>();
    reader.setDelegate(customerItemReader());
    reader.setResources(inputFiles);
    SynchronizedItemStreamReader<Customer> synchronizedItemStreamReader = new SynchronizedItemStreamReader<>();
    synchronizedItemStreamReader.setDelegate(reader);
    return synchronizedItemStreamReader;
}
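For completeness, a minimal sketch of the step wired against the synchronized reader, reusing the beans from the question:
@Bean
public Step step1() {
    return stepBuilderFactory.get("step1")
            .<Customer, Customer>chunk(100)
            // the bean now returns a SynchronizedItemStreamReader<Customer>
            .reader(multiResourceItemReader())
            .writer(customerItemWriter())
            .taskExecutor(taskExecutor())
            .throttleLimit(10)
            .build();
}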
