Add timestamp to filename in StaxEventItemWriter - java

I have the following Spring Batch item writer that streams to a file, but I wish to add a timestamp to the filename.
The following code works but is incorrect, because the timestamp is set at startup time rather than on every batch run.
What I'd like to achieve is that a new filename gets assigned on every job run; any idea how to do this?
@Component
public class EccAddSumoItemWriter extends SumoStaxEventItemWriter<AddSubscriptionXml> {

    public static final DateTimeFormatter DATE_FORMATTER = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH-mm-ss.SSS");

    public EccAddSumoItemWriter(@Value("${sumo.output_folder}") String sumoSavePath) {
        setShouldDeleteIfEmpty(true);
        setRootIdentification("editionCodeChanged_add");
        setResourcePath(sumoSavePath + "/edition_code_changed_add-" + now().format(DATE_FORMATTER) + ".xml");
    }
}
setResourcePath merely refers to:
protected void setResourcePath(String resourceFilePath) {
    this.setResource(new FileSystemResource(resourceFilePath));
}
What is the advised way of doing this in Spring Batch?

Ditch your EccAddSumoItemWriter and write a @Bean method that creates a @StepScope or @JobScope SumoStaxEventItemWriter.
@StepScope
@Bean
public SumoStaxEventItemWriter writer(@Value("${sumo.output_folder}") String sumoSavePath) {
    SumoStaxEventItemWriter writer = new SumoStaxEventItemWriter();
    writer.setShouldDeleteIfEmpty(true);
    writer.setRootIdentification("editionCodeChanged_add");
    writer.setResourcePath(sumoSavePath + "/edition_code_changed_add-" + now().format(DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH-mm-ss.SSS")) + ".xml");
    return writer;
}
Now, when building your step, use this step-scoped writer; each time the step runs you will get a fresh instance configured accordingly, as in the sketch below.
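For illustration only, a minimal sketch of wiring such a step-scoped writer into a step; the step name, reader, and chunk size are assumptions rather than part of the original answer:
@Bean
public Step exportStep(StepBuilderFactory steps,
                       ItemReader<AddSubscriptionXml> reader, // hypothetical reader for this step
                       SumoStaxEventItemWriter<AddSubscriptionXml> writer) { // the step-scoped bean above
    return steps.get("exportStep")
            .<AddSubscriptionXml, AddSubscriptionXml>chunk(100) // chunk size chosen arbitrarily
            .reader(reader)
            .writer(writer) // the scoped proxy resolves to a fresh writer on every step execution
            .build();
}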

In theory, it seems that @StepScope alone should resolve the problem.
I will test this.
After testing, it worked perfectly.

Related

Spring batch Step does not read full file

Hi, I have a problem with Spring Batch. I created a Job with two steps: the first step reads a CSV file in chunks, filters out bad values, and saves into the DB; the second calls a stored procedure.
My problem is that, for some reason, the first step only partially reads the data file, a 2.5 GB CSV.
The file has about 13M records but only about 400K are saved.
Does anybody know why this happens and how to solve it?
Java version: 8
Spring Boot version: 2.7.1
This is my step:
@Autowired
@Bean(name = "load_data_in_db_step")
public Step importData(
        MyProcessor processor,
        MyReader reader,
        TaskExecutor executor,
        @Qualifier("step-transaction-manager") PlatformTransactionManager transactionManager
) {
    return stepFactory.get("experian_portals_imports")
            .<ExperianPortal, ExperianPortal>chunk(chunkSize)
            .reader(reader)
            .processor(processor)
            .writer(new JpaItemWriterBuilder<ExperianPortal>()
                    .entityManagerFactory(factory)
                    .usePersist(true)
                    .build()
            )
            .transactionManager(transactionManager)
            .allowStartIfComplete(true)
            .taskExecutor(executor)
            .build();
}
This is the definition of MyReader:
@Slf4j
@Component
public class MyReader extends FlatFileItemReader<ExperianPortal> {

    private final MyLineMapper mapper;
    private final Resource fileToRead;

    @Autowired
    public MyReader(
            MyLineMapper mapper,
            @Value("${ext.datafile}") String pathToDataFile
    ) {
        this.mapper = mapper;
        val formatter = DateTimeFormatter.ofPattern("yyyyMM");
        fileToRead = new FileSystemResource(String.format(pathToDataFile, formatter.format(LocalDate.now())));
    }

    @Override
    public void afterPropertiesSet() throws Exception {
        setLineMapper(mapper);
        setEncoding(StandardCharsets.ISO_8859_1.name());
        setLinesToSkip(1);
        setResource(fileToRead);
        super.afterPropertiesSet();
    }
}
edit:
I already tried a single-threaded strategy. I think the problem may be with the RepeatTemplate, but I don't know how to use it correctly.
edit 2:
I gave up on a custom solution and ended up using the default components; they work fine and the problem was solved.
Remember to use only Spring Batch components.
This is because you are using a non thread-safe item reader in a multi-threaded step. Your item reader extends FlatFileItemReader, and FlatFileItemReader is not thread-safe: Using FlatFileItemReader with a TaskExecutor (Thread Safety). You can try with a single-threaded step (remove .taskExecutor(executor)) and you will see that the entire file is read.
What happens is that threads are reading records concurrently and the read count is not honored (threads are incrementing the read count and the step "thinks" that the file has been read entirely). You have a few options here:
synchronize the call to read in your item reader
wrap your reader in a SynchronizedItemStreamReader (the result would be the same as the previous point; see the sketch below)
make your item reader bean step-scoped
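As an illustration of the second option, a minimal sketch (bean names and wiring are assumptions, not part of the original answer) of wrapping the existing reader in a SynchronizedItemStreamReader:
@Bean
public SynchronizedItemStreamReader<ExperianPortal> synchronizedReader(MyReader delegate) {
    // Delegate all reads to the existing FlatFileItemReader, but synchronize read() calls
    SynchronizedItemStreamReader<ExperianPortal> reader = new SynchronizedItemStreamReader<>();
    reader.setDelegate(delegate);
    return reader;
}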

FlatFileItemWriter - Writer must be open before it can be written to

I have a Spring Batch Job where I skip all duplicate items and write them to a flat file.
However, the FlatFileItemWriter throws the below error whenever there's a duplicate:
Writer must be open before it can be written to
Below is the Writer & SkipListener configuration:
@Bean(name = "duplicateItemWriter")
public FlatFileItemWriter<InventoryFileItem> dupItemWriter() {
    return new FlatFileItemWriterBuilder<InventoryFileItem>()
            .name("duplicateItemWriter")
            .resource(new FileSystemResource("duplicateItem.txt"))
            .lineAggregator(new PassThroughLineAggregator<>())
            .append(true)
            .shouldDeleteIfExists(true)
            .build();
}

public class StepSkipListener implements SkipListener<InventoryFileItem, InventoryItem> {

    private FlatFileItemWriter<InventoryFileItem> skippedItemsWriter;

    public StepSkipListener(FlatFileItemWriter<InventoryFileItem> skippedItemsWriter) {
        this.skippedItemsWriter = skippedItemsWriter;
    }

    @Override
    public void onSkipInProcess(InventoryFileItem item, Throwable t) {
        System.out.println(item.getBibNum() + " Process - " + t.getMessage());
        try {
            skippedItemsWriter.write(Collections.singletonList(item));
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }
}
The overall Job is defined as below and I'm using the duplicateItemWriter from the SkipListener.
@Bean(name = "fileLoadJob")
@Autowired
public Job fileLoadJob(JobBuilderFactory jobs, StepBuilderFactory steps,
                       FlatFileItemReader<InventoryFileItem> fileItemReader,
                       CompositeItemProcessor compositeItemProcessor,
                       @Qualifier(value = "itemWriter") ItemWriter<InventoryItem> itemWriter,
                       StepSkipListener skipListener) {
    return jobs.get("libraryFileLoadJob")
            .start(steps.get("step").<InventoryFileItem, InventoryItem>chunk(chunkSize)
                    .reader(fileItemReader)
                    .processor(compositeItemProcessor)
                    .writer(itemWriter)
                    .faultTolerant()
                    .skip(Exception.class)
                    .skipLimit(Integer.parseInt(skipLimit))
                    .listener(skipListener)
                    .build())
            .build();
}
I've also tried writing all the data to the FlatFileItemWriter; that doesn't work either. However, if I write to a DB, there's no issue at all.
The Spring Batch version I'm using is 4.3.3.
I've referred to the below threads as well:
unit testing a FlatFileItemWriter outside of Spring - "Writer must be open before it can be written to" exception
Spring Batch WriterNotOpenException
FlatfileItemWriter with Compositewriter example
This was just a gross oversight; I missed that the FlatFileItemWriter needs to be registered as a stream.
I'm somewhat disappointed to put up this question, but I'm posting the answer just in case it helps someone.
The solution was as simple as adding .stream(dupItemWriter) to the job definition.
FlatfileItemWriter with Compositewriter example
@Bean(name = "fileLoadJob")
@Autowired
public Job fileLoadJob(JobBuilderFactory jobs, StepBuilderFactory steps,
                       FlatFileItemReader<InventoryFileItem> fileItemReader,
                       CompositeItemProcessor compositeItemProcessor,
                       @Qualifier(value = "itemWriter") ItemWriter<InventoryItem> itemWriter,
                       @Qualifier(value = "duplicateItemWriter") FlatFileItemWriter<InventoryFileItem> dupItemWriter,
                       StepSkipListener skipListener) {
    return jobs.get("libraryFileLoadJob")
            .start(steps.get("step").<InventoryFileItem, InventoryItem>chunk(chunkSize)
                    .reader(fileItemReader)
                    .processor(compositeItemProcessor)
                    .writer(itemWriter)
                    .faultTolerant()
                    .skip(Exception.class)
                    .skipLimit(Integer.parseInt(skipLimit))
                    .listener(skipListener)
                    .stream(dupItemWriter)
                    .build())
            .build();
}
It's not absolutely necessary to include the .stream(dupItemWriter); you can also call the writer's .open() method instead.
In my case I was creating dynamic/programmatic ItemWriters, so adding them all as streams was not feasible:
.stream(writer-1)
.stream(writer-2)
.stream(writer-N)
Instead, I called the .open() method myself:
FlatFileItemWriter<OutputContact> itemWriter = new FlatFileItemWriter<>();
itemWriter.setResource(outPutResource);
itemWriter.setAppendAllowed(true);
itemWriter.setLineAggregator(lineAggregator);
itemWriter.setHeaderCallback(writer -> writer.write("ACCT,MEMBER,SOURCE"));
itemWriter.open(new ExecutionContext());
I had the same issue: I created two writers by inheriting from FlatFileItemWriter.
That was working before I added the @StepScope annotation. After that, the first one threw an exception with the "Writer must be open before it can be written to" error message, while the second one worked without any problem.
I solved it by calling open(new ExecutionContext()), but I still do not understand why the second one works and not the first.

Spring Batch: Job can't be started with different JobParameters and JobParameters can't be accessed

I have two issues with Spring Batch, both regarding the JobParameters that are passed in via the command line.
First issue:
I'm using Eclipse to develop my application and test it. Therefore, I added Program arguments to the Run Configurations. These arguments are:
-ts=${current_date} -path="file.csv"
Running the application will throw an exception. The exception is:
Caused by: org.springframework.batch.core.repository.JobInstanceAlreadyCompleteException:
A job instance already exists and is complete for parameters={ts=20210211_1631, path=file.csv}.
If you want to run this job again, change the parameters.
As you can see, the JobParameters should be different for each execution, because one of the parameters is a timestamp that changes each minute. I had a look at this question, Spring Batch: execute same job with different parameters, but there the solution is to set a new name for each job execution (e.g. name + System.currentTimeMillis()). Is there another solution to this problem? I don't want to create a 'random' name for the job each time it is executed. My Job is implemented like this:
@Bean(name = "inJob")
public Job inJob(JobRepository jobRepository) {
    return jobBuilderFactory.get("inJob")
            .repository(jobRepository)
            .incrementer(new RunIdIncrementer())
            .start(truncateTable())
            .next(loadCsv())
            .next(updateType())
            .build();
}
I'm using a custom implementation of the JobRepository to store the metadata in a different database schema:
@Override
public JobRepository createJobRepository() throws Exception {
    JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
    factory.setDataSource(dataSource);
    factory.setTransactionManager(transactionManager);
    factory.setTablePrefix("logging.BATCH_");
    return factory.getObject();
}
Second issue:
My second issue is accessing the JobParameters. One of the above parameters is a file path I want to use in the FlatFileItemReader:
@Bean(name = "inReader")
@StepScope
public FlatFileItemReader<CsvInfile> inReader() {
    FlatFileItemReader<CsvInfile> reader = new FlatFileItemReader<CsvInfile>();
    reader.setResource(new FileSystemResource(path));
    DefaultLineMapper<CsvInfile> lineMapper = new DefaultLineMapper<>();
    lineMapper.setFieldSetMapper(new CsvInfileFieldMapper());
    DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
    tokenizer.setDelimiter("|");
    tokenizer.setNames(ccn.names);
    lineMapper.setLineTokenizer(tokenizer);
    reader.setLineMapper(lineMapper);
    reader.setLinesToSkip(1);
    reader.open(new ExecutionContext());
    return reader;
}
To get the path from the JobParameters, I used the @BeforeStep annotation to load the JobParameters and copy them into local variables. Unfortunately, this is not working: the variable is null and the execution fails because the file can't be opened.
private String path;

@BeforeStep
public void beforeStep(StepExecution stepExecution) {
    JobParameters jobParameters = stepExecution.getJobParameters();
    this.path = jobParameters.getString("path");
}
How can I access the JobParameters within my reader? I want to pass in the file path as command line argument and then read this file.
First issue: Is there another solution to this problem?
Your current date is resolved per minute, so if you run your job more than once during that minute, there would already be a job instance with the same parameters, hence the issue. Your ts parameter should have a precision of a second (or finer if needed), as in the sketch below.
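For example (only a sketch, assuming the job is launched programmatically with a JobLauncher; none of this is from the original answer), building the parameters with a millisecond-precision timestamp gives every run a distinct JobInstance:
JobParameters params = new JobParametersBuilder()
        .addString("path", "file.csv")
        .addString("ts", LocalDateTime.now()
                .format(DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss_SSS")))
        .toJobParameters();
jobLauncher.run(inJob, params); // each run now has unique parameters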
Second issue: How can I access the JobParameters within my reader? I want to pass in the file path as command line argument and then read this file.
You don't need that beforeStep method. You can late-bind the job parameter in your bean definition as follows:
@Bean(name = "inReader")
@StepScope
public FlatFileItemReader<CsvInfile> inReader(@Value("#{jobParameters['path']}") String path) {
    FlatFileItemReader<CsvInfile> reader = new FlatFileItemReader<CsvInfile>();
    reader.setResource(new FileSystemResource(path));
    // ...
    return reader;
}
This would inject the file path in your reader definition if you pass path as a job parameter, something like:
java -jar myjob.jar path=/absolute/path/to/your/file
This is explained in the Late Binding of Job and Step Attributes section of the reference documentation.

unit testing a FlatFileItemWriter outside of Spring - "Writer must be open before it can be written to" exception

I am writing a simple batch that writes a CSV file, and I wanted to use Spring Batch FlatFileItemWriter for that, using Spring Boot 2.3.1.RELEASE.
I want to unit test my writer so that I can confirm it's configured properly.
The code is very simple:
public class CSVResultWriter implements ItemWriter<Project> {

    private final FlatFileItemWriter writer;

    public CSVResultWriter(String outputResource) {
        writer = new FlatFileItemWriterBuilder<Project>()
                .name("itemWriter")
                .resource(new FileSystemResource(outputResource))
                .lineAggregator(new PassThroughLineAggregator<>())
                .append(true)
                .build();
    }

    @Override
    public void write(List<? extends Project> items) throws Exception {
        writer.write(items);
    }
}
and I am writing a simple unit test without Spring, something like:
File generatedCsvFile = new File(workingDir.toString() + File.separator + "outputData.csv");
CSVResultWriter writer = new CSVResultWriter(generatedCsvFile.getAbsolutePath());
Project sampleProject = Project.builder().name("sampleProject1").build();
writer.write(List.of(sampleProject));
assertThat(generatedCsvFile).exists();
But the test fails, saying:
org.springframework.batch.item.WriterNotOpenException: Writer must be open before it can be written to
Looking at the Spring source code, I don't understand how it's possible to make it work. When trying to write items, the first thing Spring does is check that the writer is initialized:
https://github.com/spring-projects/spring-batch/blob/744d1834fe313204f06c0bcd0eedd472ab4af6be/spring-batch-infrastructure/src/main/java/org/springframework/batch/item/support/AbstractFileItemWriter.java#L237
@Override
public void write(List<? extends T> items) throws Exception {
    if (!getOutputState().isInitialized()) {
        throw new WriterNotOpenException("Writer must be open before it can be written to");
    }
    ...
but the way OutputState is built doesn't give it a chance to be marked as initialized:
https://github.com/spring-projects/spring-batch/blob/744d1834fe313204f06c0bcd0eedd472ab4af6be/spring-batch-infrastructure/src/main/java/org/springframework/batch/item/support/AbstractFileItemWriter.java#L363
// Returns object representing state.
protected OutputState getOutputState() {
    if (state == null) {
        File file;
        try {
            file = resource.getFile();
        }
        catch (IOException e) {
            throw new ItemStreamException("Could not convert resource to file: [" + resource + "]", e);
        }
        Assert.state(!file.exists() || file.canWrite(), "Resource is not writable: [" + resource + "]");
        state = new OutputState();
        state.setDeleteIfExists(shouldDeleteIfExists);
        state.setAppendAllowed(append);
        state.setEncoding(encoding);
    }
    return state;
}
--> the initialized flag in OutputState keeps its default value, which is false.
So I am a bit puzzled. I guess when this is managed by Spring, some magic happens and it works.
Am I missing something obvious, or can we really not test this outside of Spring?
The FlatFileItemWriter implements the ItemStream contract, which will be automatically honored when used in a Spring Batch job.
If you want to use the writer outside of Spring, you need to call these methods (open/update/close) manually. This is mentioned in the Item Stream section of the reference docs:
Clients of an ItemReader that also implement ItemStream should call open before any calls
to read, in order to open any resources such as files or to obtain connections.
A similar restriction applies to an ItemWriter that implements ItemStream
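For instance, a minimal sketch of how the test could drive the ItemStream lifecycle manually (the open/close calls could equally be added inside CSVResultWriter itself); names mirror the question, everything else is an assumption:
FlatFileItemWriter<Project> writer = new FlatFileItemWriterBuilder<Project>()
        .name("itemWriter")
        .resource(new FileSystemResource(generatedCsvFile.getAbsolutePath()))
        .lineAggregator(new PassThroughLineAggregator<>())
        .append(true)
        .build();

ExecutionContext executionContext = new ExecutionContext();
writer.open(executionContext); // honor the ItemStream contract manually
writer.write(List.of(Project.builder().name("sampleProject1").build()));
writer.close();
assertThat(generatedCsvFile).exists();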

Choose between different steps depending on argument in spring batch

I'm using Spring Batch to write an application that reads from a table and then writes the output to a CSV file. The application receives several input params; one of them is the database table to read. I want to write a single job that reads the correct table depending on the input param. This is my Configuration class:
@Configuration
public class ExtractorConfiguration {

    @Bean(name = "readerA")
    @StepScope
    public JdbcCursorItemReader<ClassA> readerA(
            @Value("#{jobParameters['REF_DATE']}") String dataRef
    ) {
        ...
        return reader;
    }

    @Bean(name = "writerA")
    @StepScope
    public FlatFileItemWriter<ClassA> writerA(
            @Value("#{jobParameters['OUTPUT_FILE_PATH']}") String outputPath
    ) {
        ...
        return writer;
    }
    //endregion

    @Bean(name = "extractStep")
    @StepScope
    public Step extractStep(
            @Value("#{jobParameters['DATABASE_TABLE']}") String tableName
    ) throws Exception {
        switch (tableName) {
            case tableA:
                return steps.get("extractStep")
                        .<ClassA, ClassA>chunk(applicationProperties.getChunkSize())
                        .reader(readerA(""))
                        .writer(writerA(""))
                        .build();
            default:
                throw new Exception("Wrong table: " + tableName);
        }
    }

    @Bean(name = "myJob")
    public Job myJob() throws Exception {
        return jobs.get("myJob")
                .flow(extractStep(""))
                .end()
                .build();
    }
}
The idea was to add a second case to the switch inside extractStep (something like this):
case tableB:
    return steps.get("extractStep")
            .<ClassB, ClassB>chunk(applicationProperties.getChunkSize())
            .reader(readerB(""))
            .writer(writerB(""))
            .build();
and then write corresponding readerB and writerB methods; with this approach I'm receiving this error:
Caused by: java.lang.IllegalStateException: No context holder available for step scope
I would like to know:
1- what is the error?
2- is there a method to get the JobParameters inside myJob rather than inside the steps?
3- is there a better approach?
Thanks.
