I am trying to read multiple CSV files from "https://raw.githubusercontent.com/Shrutika09/SpringBatchTemplateUploaderPOC/main/order-data-*.csv" and insert their records into a database in parallel using Spring Batch.
When I use the URL of a single CSV file (https://raw.githubusercontent.com/Shrutika09/SpringBatchTemplateUploaderPOC/main/order-data-1.csv), all records are read and inserted into the database.
But when I try to read all files matching a particular naming pattern (https://raw.githubusercontent.com/Shrutika09/SpringBatchTemplateUploaderPOC/main/order-data-*.csv), the files are not recognized and the job does not work as expected.
Is there any way to read all files matching a particular naming pattern from a GitHub location?
I am using a Spring Batch partitioner.
Partitioner:
@Bean
public Partitioner partitioner() throws Exception {
    System.out.println("In Partitioner");
    MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
    PathMatchingResourcePatternResolver resolver = new PathMatchingResourcePatternResolver();
    partitioner.setResources(resolver.getResources("https://raw.githubusercontent.com/Shrutika09/SpringBatchTemplateUploaderPOC/main/order-data-*.csv"));
    partitioner.partition(5);
    return partitioner;
}
Reader:
@Bean
@StepScope
public FlatFileItemReader<Orders> reader(@Value("#{stepExecutionContext['fileName']}") String path)
        throws MalformedURLException {
    System.out.println("In Reader");
    System.out.println("In Reader " + path);
    FlatFileItemReader<Orders> reader = new FlatFileItemReader<Orders>();
    reader.setResource(new UrlResource(path));
    reader.setLineMapper(new DefaultLineMapper<Orders>() {
        {
            setLineTokenizer(new DelimitedLineTokenizer() {
                {
                    setNames(new String[] { "id", "firstName", "lastName" });
                }
            });
            setFieldSetMapper(new BeanWrapperFieldSetMapper<Orders>() {
                {
                    setTargetType(Orders.class);
                }
            });
        }
    });
    return reader;
}
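Note: PathMatchingResourcePatternResolver can only expand wildcards where it can list directory contents (file system, classpath, JARs); it cannot enumerate files behind an http(s) URL, which is why the order-data-*.csv pattern resolves nothing here. As a minimal workaround sketch, and assuming the file names are known in advance (the numeric suffix 1 to 5 below is my assumption, not from the post), the resources can be enumerated explicitly:

@Bean
public Partitioner partitioner() throws MalformedURLException {
    MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
    String base = "https://raw.githubusercontent.com/Shrutika09/SpringBatchTemplateUploaderPOC/main/order-data-";
    List<Resource> resources = new ArrayList<>();
    // Hypothetical: assumes the files are numbered order-data-1.csv .. order-data-5.csv
    for (int i = 1; i <= 5; i++) {
        resources.add(new UrlResource(base + i + ".csv"));
    }
    partitioner.setResources(resources.toArray(new Resource[0]));
    return partitioner;
}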
Related
I am using MultiResourceItemReader to read from multiple CSV files whose lines represent ObjectX(field1, field2, field3, ...).
The problem is that when the processor ends, the writer gets the ObjectX lines from all the files, and I have to write the accepted data to a file with the same name as the input file.
I am using DelimitedLineAggregator.
Is there a way to have one writer per file while using MultiResourceItemReader, given that the writer accepts only one resource at a time?
This is an example of what I have:
@Bean
public MultiResourceItemReader<ObjectX> multiResourceItemReader() {
    MultiResourceItemReader<ObjectX> resourceItemReader = new MultiResourceItemReader<ObjectX>();
    resourceItemReader.setResources(inputResources);
    resourceItemReader.setDelegate(flatFileItemReader());
    return resourceItemReader;
}

@Bean
public FlatFileItemReader<ObjectX> flatFileItemReader() {
    FlatFileItemReader<ObjectX> flatFileItemReader = new FlatFileItemReader<>();
    flatFileItemReader.setComments(new String[]{});
    flatFileItemReader.setLineMapper(lineMapper());
    return flatFileItemReader;
}
@Override
@StepScope
public Sinistre process(ObjectX objectX) throws Exception {
    // business logic
    return objectX;
}
@Bean
@StepScope
public FlatFileItemWriter<Sinistre> flatFileItemWriter(
        @Value("${doneFile}") FileSystemResource doneFile,
        @Value("#{stepExecution.jobExecution}") JobExecution jobExecution) {
    FlatFileItemWriter<Sinistre> writer = new FlatFileItemWriter<Sinistre>() {
        private String resourceName;

        @Override
        public String doWrite(List<? extends Sinistre> items) {
            // business logic
            return super.doWrite(items);
        }
    };
    DelimitedLineAggregator<Sinistre> delimitedLineAggregator = new DelimitedLineAggregator<>();
    delimitedLineAggregator.setDelimiter(";");
    BeanWrapperFieldExtractor<Sinistre> beanWrapperFieldExtractor = new BeanWrapperFieldExtractor<>();
    beanWrapperFieldExtractor.setNames(new String[]{"field1", "field2", "field3", "field4" /* ... */});
    delimitedLineAggregator.setFieldExtractor(beanWrapperFieldExtractor);
    writer.setResource(doneFile);
    writer.setLineAggregator(delimitedLineAggregator);
    // how to write the header
    writer.setHeaderCallback(new FlatFileHeaderCallback() {
        @Override
        public void writeHeader(Writer writer) throws IOException {
            writer.write((String) jobExecution.getExecutionContext().get("header"));
        }
    });
    writer.setAppendAllowed(false);
    writer.setFooterCallback(new FlatFileFooterCallback() {
        @Override
        public void writeFooter(Writer writer) throws IOException {
            writer.write("#--- fin traitement ---");
        }
    });
    return writer;
}
This is what I called ObjectX:
public class SinistreDto implements ResourceAware {
private String codeCompagnieA;//A
private String numPoliceA;//B
private String numAttestationA;//C
private String immatriculationA;//D
private String numSinistreA;//E
private String pctResponsabiliteA;//F
private String dateOuvertureA;//G
private String codeCompagnieB;//H
private String numPoliceB;//I
private String numAttestationB;//J
private String immatriculationB;//K
private String numSinistreB;//L
private Resource resource;
}
And this is the CSV data (I will have a bunch of files with data exactly like this):
38;5457;16902-A;0001-02-34;84485;000;20221010 12:15;55;5457;W3456;22-A555
76;544687;16902;1234-56;8448;025;20221010 12:15;22;544687;WW456;22-A555
65;84987;16902;WW 123456;74478;033;20221010 12:15;88;84987;WW3456;22-A555
This is how I expect the output file to look for each input file:
#header
38;5457;16902-A;0001-02-34;84485;000;20221010 12:15;55;5457;W3456;22-A555
76;544687;16902;1234-56;8448;025;20221010 12:15;22;544687;WW456;22-A555
65;84987;16902;WW 123456;74478;033;20221010 12:15;88;84987;WW3456;22-A555
#--- fin traitement ---
I see no difference between the input file and the output file except the header and trailer lines. But that is not an issue; you probably omitted the processing part as it is not relevant to the question.
I believe the MultiResourceItemReader is not suitable for your case as data from different input files can end up in the same chunk, and hence written to the same output file, which is not what you want.
I think a good option for your use case is to use partitioning, where each partition is a file. This way, each input file will be read, processed and written to a corresponding output file. Spring Batch provides the MultiResourcePartitioner that will create a partition per file. You can find an example here: https://github.com/spring-projects/spring-batch/blob/main/spring-batch-samples/src/main/resources/jobs/iosample/multiResource.xml.
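The linked sample is XML-based; a rough Java-config sketch of the same idea follows. MultiResourcePartitioner stores each file under the fileName key of the step execution context, so step-scoped readers and writers can bind to it. Bean names, paths, and the chunk size below are illustrative assumptions, not taken from the original post:

@Bean
public Partitioner filePartitioner() throws IOException {
    MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
    // hypothetical input location; one partition is created per matching file
    partitioner.setResources(new PathMatchingResourcePatternResolver()
            .getResources("file:input/*.csv"));
    return partitioner;
}

@Bean
@StepScope
public FlatFileItemReader<ObjectX> partitionedReader(
        @Value("#{stepExecutionContext['fileName']}") String fileName) throws MalformedURLException {
    FlatFileItemReader<ObjectX> reader = new FlatFileItemReader<>();
    reader.setResource(new UrlResource(fileName)); // fileName is set by MultiResourcePartitioner
    reader.setLineMapper(lineMapper());            // same line mapper as in the question
    return reader;
}

@Bean
@StepScope
public FlatFileItemWriter<ObjectX> partitionedWriter(
        @Value("#{stepExecutionContext['fileName']}") String fileName) {
    FlatFileItemWriter<ObjectX> writer = new FlatFileItemWriter<>();
    // one output file per input file, named after the input (illustrative)
    writer.setResource(new FileSystemResource("output/" + StringUtils.getFilename(fileName)));
    // set the line aggregator, header and footer callbacks as in the original writer
    return writer;
}

@Bean
public Step workerStep() {
    return stepBuilderFactory.get("workerStep")
            .<ObjectX, ObjectX>chunk(100) // insert the existing processor here if needed
            .reader(partitionedReader(null))
            .writer(partitionedWriter(null))
            .build();
}

@Bean
public Step managerStep() {
    return stepBuilderFactory.get("managerStep")
            .partitioner("workerStep", filePartitioner())
            .step(workerStep())
            .build();
}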
I am new to Spring Batch.
I have a folder that contains multiple CSV files, and I have implemented MultiResourceItemReader to read those files. It works only if all the CSV files are pipe ("|") delimited.
I want to read both comma (",") delimited and pipe-delimited CSV files with a single reader. Is it possible? If yes, how?
Here is my code:
@Bean
@StepScope
public MultiResourceItemReader<Person> multiResourceItemReader(
        @Value("#{jobParameters[x]}") String x,
        @Value("#{jobParameters[y]}") String y,
        @Value("#{jobParameters[z]}") String z) {
    Resource[] resources = null;
    ClassLoader cl = this.getClass().getClassLoader();
    ResourcePatternResolver resolver = new PathMatchingResourcePatternResolver(cl);
    try {
        resources = resolver.getResources("file:" + z);
    } catch (Exception e) {
        // ignored
    }
    MultiResourceItemReader<Person> resourceItemReader = new MultiResourceItemReader<Person>();
    resourceItemReader.setResources(resources);
    resourceItemReader.setDelegate(reader());
    return resourceItemReader;
}
@Bean
public FlatFileItemReader<Person> reader() {
    FlatFileItemReader<Person> reader = new FlatFileItemReader<Person>();
    reader.setLineMapper(new DefaultLineMapper<Person>() {
        {
            setLineTokenizer(new DelimitedLineTokenizer() {
                {
                    setNames(new String[]{"Id", "postCode"});
                    setDelimiter("|");
                }
            });
            setFieldSetMapper(new BeanWrapperFieldSetMapper<Person>() {
                {
                    setTargetType(Person.class);
                }
            });
        }
    });
    return reader;
}
Take a look at the PatternMatchingCompositeLineTokenizer. There, you can use a Pattern to identify what records get parsed by what LineTokenizer. In your case, you'd have one Pattern that identifies comma delimited records and map them to the tokenizer that parses via commas. You'd also have a Pattern that identifies records delimited by pipes and maps those to the appropriate LineTokenizer. It would look something like this:
@Bean
public LineTokenizer compositeLineTokenizer() throws Exception {
    DelimitedLineTokenizer commaTokenizer = new DelimitedLineTokenizer();
    commaTokenizer.setNames("a", "b", "c");
    commaTokenizer.setDelimiter(",");
    commaTokenizer.afterPropertiesSet();

    DelimitedLineTokenizer pipeTokenizer = new DelimitedLineTokenizer();
    pipeTokenizer.setNames("a", "b", "c");
    pipeTokenizer.setDelimiter("|");
    pipeTokenizer.afterPropertiesSet();

    // I have not tested the patterns here so they may need to be adjusted
    Map<String, LineTokenizer> tokenizers = new HashMap<>(2);
    tokenizers.put("*,*", commaTokenizer);
    tokenizers.put("*|*", pipeTokenizer);

    PatternMatchingCompositeLineTokenizer lineTokenizer = new PatternMatchingCompositeLineTokenizer();
    lineTokenizer.setTokenizers(tokenizers);
    return lineTokenizer;
}
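To wire it in, the composite tokenizer replaces the single DelimitedLineTokenizer in the delegate reader's line mapper. A minimal sketch based on the reader above (the tokenizers' column names would of course have to match the Person fields, "Id" and "postCode"):

@Bean
public FlatFileItemReader<Person> reader() throws Exception {
    FlatFileItemReader<Person> reader = new FlatFileItemReader<>();
    DefaultLineMapper<Person> lineMapper = new DefaultLineMapper<>();
    lineMapper.setLineTokenizer(compositeLineTokenizer()); // handles both delimiters
    BeanWrapperFieldSetMapper<Person> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
    fieldSetMapper.setTargetType(Person.class);
    lineMapper.setFieldSetMapper(fieldSetMapper);
    reader.setLineMapper(lineMapper);
    return reader;
}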
I want to process multiple files sequentially, and each file needs to be processed with multiple threads, so I used Spring Batch's FlatFileItemReader with a TaskExecutor, and that works fine for me. Since the requirement is to process multiple files, I combined the FlatFileItemReader with a MultiResourceItemReader, which takes a number of files and processes them one by one, and that is where I am facing issues. Can someone help me understand the cause of the exception and the approach to fix it?
org.springframework.batch.item.ReaderNotOpenException: Reader must be open before it can be read.
at org.springframework.batch.item.file.FlatFileItemReader.readLine(FlatFileItemReader.java:195) ~[spring-batch-infrastructure-3.0.5.RELEASE.jar:3.0.5.RELEASE]
at org.springframework.batch.item.file.FlatFileItemReader.doRead(FlatFileItemReader.java:173) ~[spring-batch-infrastructure-3.0.5.RELEASE.jar:3.0.5.RELEASE]
at org.springframework.batch.item.support.AbstractItemCountingItemStreamItemReader.read(AbstractItemCountingItemStreamItemReader.java:88) ~[spring-batch-infrastructure-3.0.5.RELEASE.jar:3.0.5.RELEASE]
at org.springframework.batch.item.file.MultiResourceItemReader.readFromDelegate(MultiResourceItemReader.java:140) ~[spring-batch-infrastructure-3.0.5.RELEASE.jar:3.0.5.RELEASE]
at org.springframework.batch.item.file.MultiResourceItemReader.readNextItem(MultiResourceItemReader.java:119)
customer2.csv
200,Zoe,Nelson,1973-01-12 17:19:30
201,Vivian,Love,1951-10-31 08:57:08
202,Charde,Lang,1967-02-23 12:24:26
customer3.csv
400,Amelia,Osborn,1972-05-09 09:21:22
401,Gemma,Finch,1989-09-25 23:00:59
402,Orli,Slater,1959-03-30 15:54:32
403,Donovan,Beasley,1986-06-18 14:50:30
customer4.csv
600,Zelenia,Henson,1982-07-03 03:28:39
601,Thomas,Mathews,1954-11-21 20:34:03
602,Kevyn,Whitney,1984-09-21 06:24:25
603,Marny,Leon,1984-06-10 21:32:09
604,Jarrod,Gay,1960-06-22 19:11:04
customer5.csv
800,Imogene,Lee,1966-10-19 17:53:44
801,Mira,Franks,1964-03-08 09:47:43
802,Silas,Dixon,1953-04-11 01:37:51
803,Paloma,Daniels,1962-06-14 17:01:02
My code:
@Bean
public MultiResourceItemReader<Customer> multiResourceItemReader() {
    System.out.println("In multiResourceItemReader");
    MultiResourceItemReader<Customer> reader = new MultiResourceItemReader<>();
    reader.setDelegate(customerItemReader());
    reader.setResources(inputFiles);
    return reader;
}
@Bean
public FlatFileItemReader<Customer> customerItemReader() {
    FlatFileItemReader<Customer> reader = new FlatFileItemReader<>();
    DefaultLineMapper<Customer> customerLineMapper = new DefaultLineMapper<>();
    DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
    tokenizer.setNames(new String[] {"id", "firstName", "lastName", "birthdate"});
    customerLineMapper.setLineTokenizer(tokenizer);
    customerLineMapper.setFieldSetMapper(new CustomerFieldSetMapper());
    customerLineMapper.afterPropertiesSet();
    reader.setLineMapper(customerLineMapper);
    return reader;
}
The snippet below works fine:
@Bean
public Step step1() {
    return stepBuilderFactory.get("step1")
            .<Customer, Customer>chunk(100)
            .reader(customerItemReader())
            .writer(customerItemWriter())
            .taskExecutor(taskExecutor())
            .throttleLimit(10)
            .build();
}
The snippet below does not work and produces the exception above:
@Bean
public Step step1() {
    return stepBuilderFactory.get("step1")
            .<Customer, Customer>chunk(100)
            .reader(multiResourceItemReader())
            .writer(customerItemWriter())
            .taskExecutor(taskExecutor())
            .throttleLimit(10)
            .build();
}
Since you are using the reader in a multi-threaded step, a thread could have closed the current file while another thread is trying to read from that file at the same time. You need to synchronize access to your reader with a SynchronizedItemStreamReader:
@Bean
public SynchronizedItemStreamReader<Customer> multiResourceItemReader() {
    System.out.println("In multiResourceItemReader");
    MultiResourceItemReader<Customer> reader = new MultiResourceItemReader<>();
    reader.setDelegate(customerItemReader());
    reader.setResources(inputFiles);
    SynchronizedItemStreamReader<Customer> synchronizedItemStreamReader = new SynchronizedItemStreamReader<>();
    synchronizedItemStreamReader.setDelegate(reader);
    return synchronizedItemStreamReader;
}
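Note that the SynchronizedItemStreamReader only serializes the read calls; processing and writing still run concurrently across threads. If reading itself becomes the bottleneck, the file-per-partition approach with MultiResourcePartitioner described earlier avoids sharing a single reader between threads altogether.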
I'm trying to create a Spring Batch job that will read from a MySQL database and write the data to different files depending on a value from the database. I am getting this error:
org.springframework.batch.item.WriterNotOpenException: Writer must be open before it can be written to
at org.springframework.batch.item.file.FlatFileItemWriter.write(FlatFileItemWriter.java:255)
Here's my ClassifierCompositeItemWriter:
ClassifierCompositeItemWriter<WithdrawalTransaction> classifierCompositeItemWriter =
        new ClassifierCompositeItemWriter<WithdrawalTransaction>();
classifierCompositeItemWriter.setClassifier(
        new Classifier<WithdrawalTransaction, ItemWriter<? super WithdrawalTransaction>>() {
            @Override
            public ItemWriter<? super WithdrawalTransaction> classify(WithdrawalTransaction wt) {
                ItemWriter<? super WithdrawalTransaction> itemWriter = null;
                if (wt.getPaymentMethod().equalsIgnoreCase("PDDTS")) { // condition
                    itemWriter = pddtsWriter();
                } else {
                    itemWriter = swiftWriter();
                }
                return itemWriter;
            }
        });
As you can see, I only used two file writers for now.
#Bean("pddtsWriter")
private FlatFileItemWriter<WithdrawalTransaction> pddtsWriter()
And
#Bean("swiftWriter")
private FlatFileItemWriter<WithdrawalTransaction> swiftWriter()
I also added them as streams:
@Bean
public Step processWithdrawalTransactions() throws Exception {
    return stepBuilderFactory.get("processWithdrawalTransactions")
            .<WithdrawalTransaction, WithdrawalTransaction>chunk(10)
            .reader(withdrawReader)
            .processor(withdrawProcessor())
            .writer(withdrawWriter)
            .stream(swiftWriter)
            .stream(pddtsWriter)
            .listener(headerWriter())
            .build();
}
Am I doing something wrong?
I am using Spring Boot 1.4 and Spring Batch 1.4 to read in a file and, of course, parse the data into the database.
What I would like to do is store the entire line that was read in the database before the fields are mapped; the whole row would be stored as a string in the database. This is for auditing purposes, so I do not want to rebuild the row string from its components.
We have all seen the common mappers used to get the data from a delimited line:
@Bean
@StepScope
public FlatFileItemReader<Claim> claimFileReader(@Value("#{jobParameters[fileName]}") String pathToFile) {
    logger.debug("Setting up FlatFileItemReader for claim");
    logger.debug("Job Parameter for input filename: " + pathToFile);
    FlatFileItemReader<Claim> reader = new FlatFileItemReader<Claim>();
    reader.setResource(new FileSystemResource(pathToFile));
    reader.setLineMapper(claimLineMapper());
    logger.debug("Finished setting up FlatFileItemReader for claim");
    return reader;
}
@Bean
public LineMapper<Claim> claimLineMapper() {
    logger.debug("Setting up lineMapper");
    DefaultLineMapper<Claim> lineMapper = new DefaultLineMapper<Claim>();
    DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
    lineTokenizer.setDelimiter("|");
    lineTokenizer.setStrict(false);
    lineTokenizer.setNames(new String[] { "RX_NUMBER", "SERVICE_DT", "CLAIM_STS", "PROCESSOR_CLAIM_ID", "CARRIER_ID", "GROUP_ID", "MEM_UNIQUE_ID" });
    BeanWrapperFieldSetMapper<Claim> fieldSetMapper = new BeanWrapperFieldSetMapper<Claim>();
    fieldSetMapper.setTargetType(Claim.class);
    lineMapper.setLineTokenizer(lineTokenizer);
    lineMapper.setFieldSetMapper(fieldSetMapper);
    logger.debug("Finished setting up lineMapper");
    return lineMapper;
}
If this is my row:
463832|20160101|PAID|504419000000|XYZ|GOLD PLAN|561868
I would want to store "463832|20160101|PAID|504419000000|XYZ|GOLD PLAN|561868" as the string in the database (probably with some additional data such as job_instance_id).
Any ideas on how to hook this in during the file reading process?
Instead of using DefaultLineMapper directly, you can have a new class (suggest CustomLineMapper) as below:
public class CustomLineMapper extends DefaultLineMapper<Claim> {
    @Override
    public Claim mapLine(String line, int lineNumber) throws Exception {
        // here you can handle the raw line content
        return super.mapLine(line, lineNumber);
    }
}
The line argument contains the raw data, before it is mapped to an object.
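For example, as a sketch only and assuming a hypothetical rawLine property on Claim for auditing, the raw line could be captured on the mapped item itself and persisted along with it:

public class CustomLineMapper extends DefaultLineMapper<Claim> {
    @Override
    public Claim mapLine(String line, int lineNumber) throws Exception {
        Claim claim = super.mapLine(line, lineNumber);
        claim.setRawLine(line); // hypothetical setter; stored for auditing
        return claim;
    }
}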