Spring Batch: multiple output files for input files with MultiResourceItemReader - java

I am using a MultiResourceItemReader to read from multiple CSV files whose lines map to ObjectX(field1, field2, field3...).
The problem is that when the processor finishes, the writer receives all the ObjectX lines from all the files, and I have to write the accepted data to a file with the same name as the input file.
I am using a DelimitedLineAggregator.
Is there a way to have one writer per file while using MultiResourceItemReader, since the writer accepts only one resource at a time?
This is an example of what I have:
@Bean
public MultiResourceItemReader<ObjectX> multiResourceItemReader() {
    MultiResourceItemReader<ObjectX> resourceItemReader = new MultiResourceItemReader<>();
    resourceItemReader.setResources(inputResources);
    resourceItemReader.setDelegate(flatFileItemReader());
    return resourceItemReader;
}

@Bean
public FlatFileItemReader<ObjectX> flatFileItemReader() {
    FlatFileItemReader<ObjectX> flatFileItemReader = new FlatFileItemReader<>();
    flatFileItemReader.setComments(new String[]{});
    flatFileItemReader.setLineMapper(lineMapper());
    return flatFileItemReader;
}

@Override
@StepScope
public Sinistre process(ObjectX objectX) throws Exception {
    // business logic
    return objectX;
}

@Bean
@StepScope
public FlatFileItemWriter<Sinistre> flatFileItemWriter(
        @Value("${doneFile}") FileSystemResource doneFile,
        @Value("#{stepExecution.jobExecution}") JobExecution jobExecution
) {
    FlatFileItemWriter<Sinistre> writer = new FlatFileItemWriter<Sinistre>() {
        private String resourceName;

        @Override
        public String doWrite(List<? extends Sinistre> items) {
            // business logic
            return super.doWrite(items);
        }
    };
    DelimitedLineAggregator<Sinistre> delimitedLineAggregator = new DelimitedLineAggregator<>();
    delimitedLineAggregator.setDelimiter(";");
    BeanWrapperFieldExtractor<Sinistre> beanWrapperFieldExtractor = new BeanWrapperFieldExtractor<>();
    beanWrapperFieldExtractor.setNames(new String[]{"field1", "field2", "field3", "field4" /* ... */});
    delimitedLineAggregator.setFieldExtractor(beanWrapperFieldExtractor);
    writer.setResource(doneFile);
    writer.setLineAggregator(delimitedLineAggregator);
    // how to write the header
    writer.setHeaderCallback(new FlatFileHeaderCallback() {
        @Override
        public void writeHeader(Writer writer) throws IOException {
            writer.write((String) jobExecution.getExecutionContext().get("header"));
        }
    });
    writer.setAppendAllowed(false);
    writer.setFooterCallback(new FlatFileFooterCallback() {
        @Override
        public void writeFooter(Writer writer) throws IOException {
            writer.write("#--- fin traitement ---");
        }
    });
    return writer;
}
This is what I called ObjectX:
public class SinistreDto implements ResourceAware {
private String codeCompagnieA;//A
private String numPoliceA;//B
private String numAttestationA;//C
private String immatriculationA;//D
private String numSinistreA;//E
private String pctResponsabiliteA;//F
private String dateOuvertureA;//G
private String codeCompagnieB;//H
private String numPoliceB;//I
private String numAttestationB;//J
private String immatriculationB;//K
private String numSinistreB;//L
private Resource resource;
}
And this is the CSV files' data (I will have a bunch of files with data exactly like this):
38;5457;16902-A;0001-02-34;84485;000;20221010 12:15;55;5457;W3456;22-A555
76;544687;16902;1234-56;8448;025;20221010 12:15;22;544687;WW456;22-A555
65;84987;16902;WW 123456;74478;033;20221010 12:15;88;84987;WW3456;22-A555
This is how I expect the output file to look for each input file:
#header
38;5457;16902-A;0001-02-34;84485;000;20221010 12:15;55;5457;W3456;22-A555
76;544687;16902;1234-56;8448;025;20221010 12:15;22;544687;WW456;22-A555
65;84987;16902;WW 123456;74478;033;20221010 12:15;88;84987;WW3456;22-A555
#--- fin traitement ---

I see no difference between the input file and the output file except the header and trailer lines. But that is not an issue; you probably omitted the processing part as it is not relevant to the question.
I believe the MultiResourceItemReader is not suitable for your case, as data from different input files can end up in the same chunk and hence be written to the same output file, which is not what you want.
I think a good option for your use case is to use partitioning, where each partition is a file. This way, each input file will be read, processed and written to a corresponding output file. Spring Batch provides the MultiResourcePartitioner, which creates a partition per file. You can find an example here: https://github.com/spring-projects/spring-batch/blob/main/spring-batch-samples/src/main/resources/jobs/iosample/multiResource.xml.
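A rough sketch of that approach is below (the input file pattern, the done/ output folder, and the reuse of your lineMapper() and delimitedLineAggregator() helpers are assumptions; the master step that drives the partitioner and the worker chunk-oriented step are omitted). MultiResourcePartitioner puts each file into the step execution context under the key fileName, so a step-scoped reader and writer can both be bound to that single file:
import java.io.IOException;

import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.core.partition.support.MultiResourcePartitioner;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.FileSystemResource;
import org.springframework.core.io.Resource;
import org.springframework.core.io.support.PathMatchingResourcePatternResolver;

@Bean
public MultiResourcePartitioner partitioner() throws IOException {
    MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
    // one partition (and therefore one worker step execution) per input file
    partitioner.setResources(new PathMatchingResourcePatternResolver()
            .getResources("file:input/*.csv"));
    return partitioner;
}

@Bean
@StepScope
public FlatFileItemReader<ObjectX> partitionedReader(
        @Value("#{stepExecutionContext['fileName']}") Resource inputFile) {
    FlatFileItemReader<ObjectX> reader = new FlatFileItemReader<>();
    reader.setResource(inputFile);   // the single file assigned to this partition
    reader.setLineMapper(lineMapper());
    return reader;
}

@Bean
@StepScope
public FlatFileItemWriter<Sinistre> partitionedWriter(
        @Value("#{stepExecutionContext['fileName']}") Resource inputFile) {
    FlatFileItemWriter<Sinistre> writer = new FlatFileItemWriter<>();
    // the output file gets the same name as this partition's input file
    writer.setResource(new FileSystemResource("done/" + inputFile.getFilename()));
    writer.setLineAggregator(delimitedLineAggregator());
    return writer;
}
With this layout the MultiResourceItemReader is no longer needed: each partition gets a plain FlatFileItemReader for its own file, and the header/footer callbacks from your current writer can be added to partitionedWriter unchanged.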

Related

Spring batch, how do I write simple strings into a file?

I'm new to Spring Batch, and so far I've studied how to take a list of objects as input from CSV, XML, and JSON files or DBs and write those same objects to external files or DBs.
However, I just realized that I don't know how to output simple strings. For example, I've made this simple processor:
public class ProductProcessorObjToStr implements ItemProcessor<Product,String> {
@Override
public String process(Product product) throws Exception {
return product.getProductName();
}
}
So I get a simple list of names, but I have no idea how to create the correct item writer.
I've studied these kinds of writers, where I map the various object fields:
@Bean
@StepScope
public FlatFileItemWriter<Product> flatFileItemWriter(@Value("#{jobParameters['outputFile']}") FileSystemResource outputFile) {
    FlatFileItemWriter<Product> writer = new FlatFileItemWriter<>();
    writer.setResource(outputFile);
    writer.setLineAggregator(new DelimitedLineAggregator<Product>() {
        {
            setDelimiter("|");
            setFieldExtractor(new BeanWrapperFieldExtractor<Product>() {
                {
                    setNames(new String[]
                            {"productID", "productName", "productDesc", "price", "unit"});
                }
            });
        }
    });
    writer.setHeaderCallback(new FlatFileHeaderCallback() {
        @Override
        public void writeHeader(Writer writer) throws IOException {
            writer.write("productID,productName,ProductDesc,price,unit");
        }
    });
    writer.setFooterCallback(new FlatFileFooterCallback() {
        @Override
        public void writeFooter(Writer writer) throws IOException {
            // write the footer
            writer.write("****** File created at " + new SimpleDateFormat().format(new Date()) + " ******");
        }
    });
    return writer;
}
Which writer do I use for strings and how do I create it?
Thank you in advance for your suggestions!
Have a nice day.
For this case I think java.io.PrintWriter should do it; you can use its println() method to print each string on its own line.
If you want to write an object and then read it back and load it in your app, use java.io.ObjectOutputStream to write the object and java.io.ObjectInputStream to read it back into an instance of your class.
Note that the class must implement java.io.Serializable for this to work.
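If you want to stay inside the Spring Batch step, a minimal sketch of an ItemWriter<String> that delegates to a java.io.PrintWriter as suggested above could look like this (the class name and output path are assumptions):
import java.io.FileWriter;
import java.io.PrintWriter;
import java.util.List;

import org.springframework.batch.item.ItemWriter;

public class ProductNameWriter implements ItemWriter<String> {

    private final PrintWriter printWriter;

    public ProductNameWriter(String outputPath) throws Exception {
        // append mode so successive chunks keep adding lines to the same file
        this.printWriter = new PrintWriter(new FileWriter(outputPath, true));
    }

    @Override
    public void write(List<? extends String> items) throws Exception {
        for (String item : items) {
            printWriter.println(item);   // one product name per line
        }
        printWriter.flush();             // make the lines visible right away
    }
}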

Spring batch FlatFileItemWriter write as csv from Object

I am using Spring batch and have an ItemWriter as follows:
public class MyItemWriter implements ItemWriter<Fixing> {
private final FlatFileItemWriter<Fixing> writer;
private final FileSystemResource resource;
public MyItemWriter () {
this.writer = new FlatFileItemWriter<>();
this.resource = new FileSystemResource("target/output-teste.txt");
}
@Override
public void write(List<? extends Fixing> items) throws Exception {
this.writer.setResource(new FileSystemResource(resource.getFile()));
this.writer.setLineAggregator(new PassThroughLineAggregator<>());
this.writer.afterPropertiesSet();
this.writer.open(new ExecutionContext());
this.writer.write(items);
}
@AfterWrite
private void close() {
this.writer.close();
}
}
When I run my spring batch job, the items are written to file as:
Fixing{id='123456', source='TEST', startDate=null, endDate=null}
Fixing{id='1234567', source='TEST', startDate=null, endDate=null}
Fixing{id='1234568', source='TEST', startDate=null, endDate=null}
1/ How can I write just the data so that the values are comma-separated and null values are not written? The target file should look like this:
123456,TEST
1234567,TEST
1234568,TEST
2/ Secondly, I am having an issue where the file only appears once I exit the Spring Boot application. What I would like is for the file to be available once all the items have been processed and written, without closing the Spring Boot application.
There are multiple options for writing the CSV file. Regarding the second question, flushing the writer will solve the issue.
https://howtodoinjava.com/spring-batch/flatfileitemwriter-write-to-csv-file/
We prefer to use OpenCSV with Spring Batch as it gives us more speed and control over huge files. An example snippet is below:
class DocumentWriter implements ItemWriter<BaseDTO>, Closeable {
    private static final Logger LOG = LoggerFactory.getLogger(DocumentWriter.class);

    private ColumnPositionMappingStrategy<Statement> strategy;
    private static final String[] columns = new String[] { "csvcolumn1", "csvcolumn2", "csvcolumn3",
            "csvcolumn4", "csvcolumn5", "csvcolumn6", "csvcolumn7" };
    private String filename;
    private BufferedWriter writer;
    private StatefulBeanToCsv<Statement> beanToCsv;

    public DocumentWriter() throws Exception {
        strategy = new ColumnPositionMappingStrategy<Statement>();
        strategy.setType(Statement.class);
        strategy.setColumnMapping(columns);
        // env and processCount are provided elsewhere in the original code
        filename = env.getProperty("globys.statement.cdf.path") + "-" + processCount + ".dat";
        File cdf = new File(filename);
        if (cdf.exists()) {
            writer = Files.newBufferedWriter(Paths.get(filename), StandardCharsets.UTF_8, StandardOpenOption.APPEND);
        } else {
            writer = Files.newBufferedWriter(Paths.get(filename), StandardCharsets.UTF_8, StandardOpenOption.CREATE_NEW);
        }
        beanToCsv = new StatefulBeanToCsvBuilder<Statement>(writer).withQuotechar(CSVWriter.NO_QUOTE_CHARACTER)
                .withMappingStrategy(strategy).withSeparator(',').build();
    }

    @Override
    public void write(List<? extends BaseDTO> items) throws Exception {
        List<Statement> settlementList = new ArrayList<Statement>();
        for (int i = 0; i < items.size(); i++) {
            BaseDTO baseDTO = items.get(i);
            settlementList.addAll(baseDTO.getStatementList());
        }
        beanToCsv.write(settlementList);
        writer.flush();
    }

    @PreDestroy
    @Override
    public void close() throws IOException {
        writer.close();
    }
}
Since you are using PassThroughLineAggregator, which calls item.toString() to write the object, overriding the toString() method of Fixing (or of the classes extending it) should fix it.
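For instance, a toString() along these lines on Fixing would produce the desired output (the field names are taken from the printed output above, so treat them as assumptions):
@Override
public String toString() {
    StringBuilder sb = new StringBuilder();
    // append only non-null fields, separated by commas
    for (Object field : new Object[] { id, source, startDate, endDate }) {
        if (field != null) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(field);
        }
    }
    return sb.toString();   // e.g. "123456,TEST"
}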
1/ How can I write just the data so that the values are comma separated and where it is null, it is not written.
You need to provide a custom LineAggregator that filters out null fields.
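A minimal sketch of such an aggregator (the getters on Fixing are assumptions based on the printed output):
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import org.springframework.batch.item.file.transform.LineAggregator;

public class NonNullDelimitedLineAggregator implements LineAggregator<Fixing> {

    @Override
    public String aggregate(Fixing item) {
        return Stream.of(item.getId(), item.getSource(), item.getStartDate(), item.getEndDate())
                .filter(Objects::nonNull)             // skip null fields entirely
                .map(Object::toString)
                .collect(Collectors.joining(","));    // comma-separated values
    }
}
You would then register it with writer.setLineAggregator(new NonNullDelimitedLineAggregator()) instead of the PassThroughLineAggregator.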
2/ Secondly, I am having an issue where only when I exit spring boot application, I am able to see the file get created
This is probably because you are calling this.writer.open in the write method, which is not correct. You need to make your item writer implement ItemStream and call this.writer.open and this.writer.close respectively in ItemStream#open and ItemStream#close.
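A sketch of that change, reusing the writer configuration from the question (whether this class is registered directly as the step's writer, and therefore picked up as a stream automatically, is an assumption):
import java.util.List;

import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemStream;
import org.springframework.batch.item.ItemStreamException;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.transform.PassThroughLineAggregator;
import org.springframework.core.io.FileSystemResource;

public class MyItemWriter implements ItemWriter<Fixing>, ItemStream {

    private final FlatFileItemWriter<Fixing> writer;

    public MyItemWriter() {
        this.writer = new FlatFileItemWriter<>();
        this.writer.setResource(new FileSystemResource("target/output-teste.txt"));
        this.writer.setLineAggregator(new PassThroughLineAggregator<>());
    }

    @Override
    public void open(ExecutionContext executionContext) throws ItemStreamException {
        writer.open(executionContext);   // open the file once, before the first chunk
    }

    @Override
    public void update(ExecutionContext executionContext) throws ItemStreamException {
        writer.update(executionContext);
    }

    @Override
    public void close() throws ItemStreamException {
        writer.close();                  // flush and close when the step finishes
    }

    @Override
    public void write(List<? extends Fixing> items) throws Exception {
        writer.write(items);             // no open/close per chunk
    }
}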

How to create a generic FlatFileItemReader to read CSV files with different headers?

I'm creating a job that will read and process different .csv files based on an input parameter. There are 3 different types of .csv files with different headers. I want to map each line of a file to a POJO using a generic FlatFileItemReader.
Each type of file will have its own POJO implementation, and all "File Specific POJOs" are subclassed from an abstract GenericFilePOJO.
A tasklet will first read the input parameter to decide which file type needs to be read, and construct a LineTokenizer with the appropriate header columns. It places this information in the infoHolder for retrieval at the reader step.
@Bean
public FlatFileItemReader<GenericFilePOJO> reader() {
FlatFileItemReader<GenericFilePOJO> reader = new FlatFileItemReader<>();
reader.setLinesToSkip(1); // header
reader.setLineMapper(new DefaultLineMapper() {
{
// The infoHolder will contain the file-specific LineTokenizer
setLineTokenizer(infoHolder.getLineTokenizer());
setFieldSetMapper(new BeanWrapperFieldSetMapper<GenericFilePOJO>() {
{
setTargetType(GenericFilePOJO.class);
}
});
}
});
return reader;
}
Can this reader handle the different File Specific POJOs despite returning the GenericFilePOJO?
You wrote:
A tasklet will first read the input parameter to decide which file
type needs to be read.
Because the tasklet or infoHolder knows about the type of the file, you can implement the creation of a specific FieldSetMapper instance.
This is a demo example of how it can be implemented:
public class Solution<T extends GenericFilePOJO> {
private InfoHolder infoHolder = new InfoHolder();
@Bean
public FlatFileItemReader<T> reader()
{
FlatFileItemReader<T> reader = new FlatFileItemReader<T>();
reader.setLinesToSkip(1);
reader.setLineMapper(new DefaultLineMapper() {
{
setLineTokenizer(infoHolder.getLineTokenizer());
setFieldSetMapper(infoHolder.getFieldSetMapper());
}
});
return reader;
}
private class InfoHolder {
DelimitedLineTokenizer getLineTokenizer() {
return <some already existent logic>;
}
FieldSetMapper<T> getFieldSetMapper() {
if (some condition for specific file POJO 1){
return new BeanWrapperFieldSetMapper<T>() {
{
setTargetType(FileSpecificPOJO_1.class);
}
};
} else if (some condition for specific file POJO 2){
return new BeanWrapperFieldSetMapper<T>() {
{
setTargetType(FileSpecificPOJO_2.class);
}
};
}
}
}
}

Read two lines of a file at once in a flink streaming process

I want to process files with a Flink stream in which two lines belong together: the first line is a header and the second line is the corresponding text.
The files are located on my local file system. I am using the readFile(fileInputFormat, path, watchType, interval, pathFilter, typeInfo) method with a custom FileInputFormat.
My streaming job class looks like this:
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<Read> inputStream = env.readFile(new ReadInputFormatTest("path/to/monitored/folder"), "path/to/monitored/folder", FileProcessingMode.PROCESS_CONTINUOUSLY, 100);
inputStream.print();
env.execute("Flink Streaming Java API Skeleton");
and my ReadInputFormatTest like this:
public class ReadInputFormatTest extends FileInputFormat<Read> {
private transient FileSystem fileSystem;
private transient BufferedReader reader;
private final String inputPath;
private String headerLine;
private String readLine;
public ReadInputFormatTest(String inputPath) {
this.inputPath = inputPath;
}
@Override
public void open(FileInputSplit inputSplit) throws IOException {
FileSystem fileSystem = getFileSystem();
this.reader = new BufferedReader(new InputStreamReader(fileSystem.open(inputSplit.getPath())));
this.headerLine = reader.readLine();
this.readLine = reader.readLine();
}
private FileSystem getFileSystem() {
if (fileSystem == null) {
try {
fileSystem = FileSystem.get(new URI(inputPath));
} catch (URISyntaxException | IOException e) {
throw new RuntimeException(e);
}
}
return fileSystem;
}
@Override
public boolean reachedEnd() throws IOException {
return headerLine == null;
}
@Override
public Read nextRecord(Read r) throws IOException {
r.setHeader(headerLine);
r.setSequence(readLine);
headerLine = reader.readLine();
readLine = reader.readLine();
return r;
}
}
As expected, the headers and the text are stored together in one object. However, the file is read eight times. So the problem is the parallelization. Where and how can I specify that a file is processed only once, but several files in parallel?
Or do I have to change my custom FileInputFormat even further?
I would modify your source to emit the available filenames (instead of the actual file contents) and then add a new processor that reads a name from the input stream and emits pairs of lines. In other words, split the current source into a source followed by a processor. The processor can be made to run at any degree of parallelism, while the source would be a single instance.
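A rough sketch of that split (assuming, for illustration, that the file names arrive as plain strings; in practice they would come from your monitoring source rather than from fromElements):
import java.io.BufferedReader;
import java.io.FileReader;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.util.Collector;

// fromElements runs as a single, non-parallel source of file names
DataStream<String> fileNames = env.fromElements(
        "path/to/monitored/folder/a.txt", "path/to/monitored/folder/b.txt");

DataStream<Read> reads = fileNames
        .flatMap(new FlatMapFunction<String, Read>() {
            @Override
            public void flatMap(String path, Collector<Read> out) throws Exception {
                try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
                    String header;
                    while ((header = reader.readLine()) != null) {
                        Read r = new Read();
                        r.setHeader(header);
                        r.setSequence(reader.readLine()); // the line that follows the header
                        out.collect(r);
                    }
                }
            }
        })
        .setParallelism(4); // files are spread across these parallel flatMap instances
Each file is then read exactly once, by whichever flatMap instance receives its name, while several files can still be processed in parallel.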

How do I store the whole read line using Spring Batch?

I am using Spring Boot 1.4 and Spring Batch 1.4 to read in a file and, of course, parse the data into the database.
What I would like to do is store the entire line read in the database before the fields are mapped. The entire row would be stored as a string in the database. This is for auditing purposes; therefore I do not want to rebuild the row string from its components.
We have all seen the common mappers in use to get the data from the delimited line:
@Bean
@StepScope
public FlatFileItemReader<Claim> claimFileReader(@Value("#{jobParameters[fileName]}") String pathToFile) {
logger.debug("Setting up FlatFileItemReader for claim");
logger.debug("Job Parameter for input filename: " + pathToFile);
FlatFileItemReader<Claim> reader = new FlatFileItemReader<Claim>();
reader.setResource(new FileSystemResource(pathToFile));
reader.setLineMapper(claimLineMapper());
logger.debug("Finished setting up FlatFileItemReader for claim");
return reader;
}
@Bean
public LineMapper<Claim> claimLineMapper() {
logger.debug("Setting up lineMapper");
DefaultLineMapper<Claim> lineMapper = new DefaultLineMapper<Claim>();
DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
lineTokenizer.setDelimiter("|");
lineTokenizer.setStrict(false);
lineTokenizer.setNames(new String[] { "RX_NUMBER", "SERVICE_DT", "CLAIM_STS", "PROCESSOR_CLAIM_ID", "CARRIER_ID", "GROUP_ID", "MEM_UNIQUE_ID" });
BeanWrapperFieldSetMapper<Claim> fieldSetMapper = new BeanWrapperFieldSetMapper<Claim>();
fieldSetMapper.setTargetType(Claim.class);
lineMapper.setLineTokenizer(lineTokenizer);
lineMapper.setFieldSetMapper(fieldSetMapper);
logger.debug("Finished Setting up lineMapper");
return lineMapper;
}
If this is my row:
463832|20160101|PAID|504419000000|XYZ|GOLD PLAN|561868
I would want to store "463832|20160101|PAID|504419000000|XYZ|GOLD PLAN|561868" as the string in the database (probably with some additional data such as job_instance_id).
Any ideas on how to hook this in during the file reading process?
Instead of using DefaultLineMapper, you can create a new class (suggest CustomLineMapper) as below:
public class CustomLineMapper extends DefaultLineMapper<Claim> {
@Override
public Claim mapLine(String line, int lineNumber) throws Exception {
// here you can handle *line content*
return super.mapLine(line, lineNumber);
}
}
The line argument will contain the raw data, before it is mapped to an object.
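For example, assuming Claim is given an extra property to hold the raw row (the rawLine field here is hypothetical), the override could capture it before returning the mapped object:
public class CustomLineMapper extends DefaultLineMapper<Claim> {

    @Override
    public Claim mapLine(String line, int lineNumber) throws Exception {
        Claim claim = super.mapLine(line, lineNumber); // normal tokenizing + field mapping
        claim.setRawLine(line);                        // keep the unparsed row for auditing
        return claim;
    }
}
The writer (or a step listener) can then persist claim.getRawLine() together with additional data such as job_instance_id.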
