Spring batch doesn't seem to be closing item writers properly - java

I have a job that writes each item to a separate file. To do this, the job uses a ClassifierCompositeItemWriter whose classifier returns a new FlatFileItemWriter for each item (code below).
@Bean
@StepScope
public ClassifierCompositeItemWriter<MyItem> writer(@Value("#{jobParameters['outputPath']}") String outputPath) {
    ClassifierCompositeItemWriter<MyItem> compositeItemWriter = new ClassifierCompositeItemWriter<>();
    compositeItemWriter.setClassifier((item) -> {
        String filePath = outputPath + "/" + item.getFileName();

        BeanWrapperFieldExtractor<MyItem> fieldExtractor = new BeanWrapperFieldExtractor<>();
        fieldExtractor.setNames(new String[]{"content"});

        DelimitedLineAggregator<MyItem> lineAggregator = new DelimitedLineAggregator<>();
        lineAggregator.setFieldExtractor(fieldExtractor);

        FlatFileItemWriter<MyItem> itemWriter = new FlatFileItemWriter<>();
        itemWriter.setResource(new FileSystemResource(filePath));
        itemWriter.setLineAggregator(lineAggregator);
        itemWriter.setShouldDeleteIfEmpty(true);
        itemWriter.setShouldDeleteIfExists(true);
        itemWriter.open(new ExecutionContext());
        return itemWriter;
    });
    return compositeItemWriter;
}
Here's how the job is configured:
@Bean
public Step step1() {
    return stepBuilderFactory
            .get("step1")
            .<String, MyItem>chunk(1)
            .reader(reader(null))
            .processor(processor(null, null, null))
            .writer(writer(null))
            .build();
}

@Bean
public Job job() {
    return jobBuilderFactory
            .get("job")
            .incrementer(new RunIdIncrementer())
            .flow(step1())
            .end()
            .build();
}
Everything works perfectly. All the files are generated as I expected. However, one of the files cannot be deleted. Just one. If I try to delete it, I get a message saying that "OpenJDK Platform binary" is using it. If I increase the chunk size to something bigger than the number of files I'm generating, none of the files can be deleted. It seems there's an issue deleting the files generated in the last chunk, as if their writers were not being closed properly by the Spring Batch lifecycle or something.
If I kill the application process, I can delete the file.
Any idea why this could be happening? Thanks in advance!
PS: I'm calling itemWriter.open(new ExecutionContext()) because if I don't, I get an "org.springframework.batch.item.WriterNotOpenException: Writer must be open before it can be written to".
EDIT:
If someone is facing a similar problem, I suggest reading Mahmoud's answer to this question: Spring batch : ClassifierCompositeItemWriter footer not getting called.

Probably you are using the itemwriter outside of the step scope when doing this:
itemWriter.open(new ExecutionContext());
Please check this question, hope that this helps you.
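For reference, here is a minimal sketch of the pattern described in the linked answer, which applies when the set of output files is known up front. The names typeAWriter, typeBWriter and MyItem#getType() are assumptions for illustration; the key point is that the delegate writers are beans registered on the step via stream(...), so Spring Batch opens and closes them as part of the step lifecycle instead of the classifier calling open() itself:

@Bean
@StepScope
public FlatFileItemWriter<MyItem> typeAWriter(@Value("#{jobParameters['outputPath']}") String outputPath) {
    return new FlatFileItemWriterBuilder<MyItem>()
            .name("typeAWriter")
            .resource(new FileSystemResource(outputPath + "/typeA.txt")) // hypothetical file name
            .delimited()
            .names(new String[]{"content"})
            .build();
}

@Bean
@StepScope
public FlatFileItemWriter<MyItem> typeBWriter(@Value("#{jobParameters['outputPath']}") String outputPath) {
    return new FlatFileItemWriterBuilder<MyItem>()
            .name("typeBWriter")
            .resource(new FileSystemResource(outputPath + "/typeB.txt")) // hypothetical file name
            .delimited()
            .names(new String[]{"content"})
            .build();
}

@Bean
@StepScope
public ClassifierCompositeItemWriter<MyItem> classifierWriter(FlatFileItemWriter<MyItem> typeAWriter,
                                                              FlatFileItemWriter<MyItem> typeBWriter) {
    ClassifierCompositeItemWriter<MyItem> writer = new ClassifierCompositeItemWriter<>();
    // route each item to the pre-built writer for its type (getType() is hypothetical)
    writer.setClassifier(item -> "A".equals(item.getType()) ? typeAWriter : typeBWriter);
    return writer;
}

@Bean
public Step step1() {
    return stepBuilderFactory.get("step1")
            .<String, MyItem>chunk(1)
            .reader(reader(null))
            .writer(classifierWriter(null, null))
            // registering the delegates lets the step open, update and close them
            .stream(typeAWriter(null))
            .stream(typeBWriter(null))
            .build();
}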

Related

Springbatch read from csv, how does it work?

I am new to Spring Batch, and I wonder how this reader/processor/writer works if I am reading a CSV file which contains 10k rows, using a chunk size of 10, and writing to a CSV file.
My question is:
Does Spring Batch load all 10k rows from the CSV at once, process them individually (10k times), and then store all of them into the destination file in one go? If so, what's the point of using Spring Batch? I could have three methods doing the same job, right?
Or:
Does Spring Batch open a stream over the CSV, read 10 rows at a time, process those 10 rows, and write/append those 10 rows to the destination file? Basically repeating 10k/10 = 1k times.
@Configuration
public class SampleJob3 {

    @Bean
    public Job job3(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
        return new JobBuilder("Job3", jobRepository)
                .incrementer(new RunIdIncrementer()) // work with program args
                .start(step(jobRepository, transactionManager))
                .build();
    }

    private Step step(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
        return new StepBuilder("Job3 Step started ")
                .<Student, Student>chunk(3)
                .repository(jobRepository)
                .transactionManager(transactionManager)
                .reader(reader(true))
                .processor(student -> {
                    System.out.println("processor");
                    return new Student(student.getId(), student.getFirstName() + "!", student.getLastName() + "!", student.getEmail() + "!");
                })
                .writer(writer())
                .build();
    }

    private FlatFileItemReader<Student> reader(boolean isValid) {
        System.out.println("reader");
        FlatFileItemReader<Student> reader = new FlatFileItemReader<>();
        // using FileSystemResource if file is stored in a directory instead of the resource folder
        reader.setResource(new PathMatchingResourcePatternResolver().getResource(isValid ? "input/students.csv" : "input/students_invalid.csv"));
        reader.setLineMapper(new DefaultLineMapper<>() {
            {
                setLineTokenizer(new DelimitedLineTokenizer() {{
                    setNames("ID", "First Name", "Last Name", "Email");
                }});
                setFieldSetMapper(new BeanWrapperFieldSetMapper<>() {{
                    setTargetType(Student.class);
                }});
            }
        });
        reader.setLinesToSkip(1);
        return reader;
    }

    //@Bean
    public FlatFileItemWriter<Student> writer() {
        System.out.println("writer");
        FlatFileItemWriter<Student> writer = new FlatFileItemWriter<>();
        writer.setResource(new FileSystemResource("output/students.csv"));
        writer.setHeaderCallback(writer1 -> writer1.write("Id,First Name,Last Name, Email"));
        writer.setLineAggregator(new DelimitedLineAggregator<>() {{
            setFieldExtractor(new BeanWrapperFieldExtractor<>() {{
                setNames(new String[]{"id", "firstName", "lastName", "email"});
            }});
        }});
        writer.setFooterCallback(writer12 -> writer12.write("Created @ " + Instant.now()));
        return writer;
    }
}
My last question is basically the same, but the data source is a database, e.g. reading a table containing 10k rows from dbA and writing to dbB. Am I able to read 10 rows at a time, process them, and write them to dbB? If so, can you share some pseudocode?
A chunk-oriented step in Spring Batch will not read the entire file or table at once. It will rather stream data from the source in chunks (of a configurable size).
You can find more details about the processing model in the reference documentation here: Chunk-oriented Processing.
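Since pseudocode for the database-to-database case was requested, here is a minimal sketch. The Person class, table names, and the two DataSource beans (dbA and dbB) are assumptions, not from the question. With chunk(10), the step reads 10 rows from dbA, processes them, writes those 10 rows to dbB in one transaction, and repeats until the cursor is exhausted:

@Bean
public JdbcCursorItemReader<Person> personReader(DataSource dbA) {
    return new JdbcCursorItemReaderBuilder<Person>()
            .name("personReader")
            .dataSource(dbA)
            .sql("SELECT id, first_name, last_name FROM person") // hypothetical table
            .rowMapper(new BeanPropertyRowMapper<>(Person.class))
            .build();
}

@Bean
public JdbcBatchItemWriter<Person> personWriter(DataSource dbB) {
    return new JdbcBatchItemWriterBuilder<Person>()
            .dataSource(dbB)
            .sql("INSERT INTO person (id, first_name, last_name) VALUES (:id, :firstName, :lastName)")
            .beanMapped()
            .build();
}

@Bean
public Step copyStep(JobRepository jobRepository, PlatformTransactionManager transactionManager,
                     JdbcCursorItemReader<Person> personReader, JdbcBatchItemWriter<Person> personWriter) {
    return new StepBuilder("copyStep", jobRepository)
            .<Person, Person>chunk(10, transactionManager) // 10 items read, processed and written per transaction
            .reader(personReader)
            .processor(person -> person) // transform each item here if needed
            .writer(personWriter)
            .build();
}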

Spring Batch EmptyResultDataAccessException while deleting

I have a batch job that reads data from multiple tables on a datasource with a complicated select query with many joins and writes to a table on a different datasource using an insert query.
@Bean
public JdbcCursorItemReader<Employee> myReader2() {
    JdbcCursorItemReader<Employee> reader = new JdbcCursorItemReader<>();
    reader.setSql(COMPLICATED_QUERY_WITH_MANY_JOINS);
    reader.setDataSource(dataSourceOne);
    reader.setPreparedStatementSetter(new MyPrepStSetterOne());
    reader.setRowMapper(new EmployeeRowMapper());
    return reader;
}

@Bean
public JdbcBatchItemWriter<Employee> myWriter2(DataSource dataSource) {
    JdbcBatchItemWriter<Employee> writer = new JdbcBatchItemWriter<>();
    writer.setSql(INSERT_QUERY);
    writer.setPreparedStatementSetter(new MyPrepStSetterTwo());
    writer.setDataSource(dataSourceTwo);
    return writer;
}
I have the above reader and writer in a step.
I want to delete the employee records that were inserted by the previous day's job (not all of them) if they could be duplicated by today's records.
So I added another step before the above step, with the same select query in the reader but a delete query in the writer.
@Bean
public JdbcCursorItemReader<Employee> myReader1() {
    JdbcCursorItemReader<Employee> reader = new JdbcCursorItemReader<>();
    reader.setSql(COMPLICATED_QUERY_WITH_MANY_JOINS);
    reader.setDataSource(dataSourceOne);
    reader.setPreparedStatementSetter(new MyPrepStSetterOne());
    reader.setRowMapper(new EmployeeRowMapper());
    return reader;
}

@Bean
public JdbcBatchItemWriter<Employee> myWriter1(DataSource dataSource) {
    JdbcBatchItemWriter<Employee> writer = new JdbcBatchItemWriter<>();
    writer.setSql(DELETE_QUERY_WHERE_EMPLOYEE_NAME_IS);
    writer.setPreparedStatementSetter(new MyPrepStSetterZero());
    writer.setDataSource(dataSourceTwo);
    return writer;
}
I am getting EmptyResultDataAccessException: Item 3 of 10 did not update any rows, because not all of today's records may have been inserted yesterday.
How can I make myWriter1 ignore the case where a record does not exist and proceed to the next item?
Your approach seems correct. You can set JdbcBatchItemWriter#setAssertUpdates to false in your writer and this should ignore the case where no records have been updated by your query (which is a valid business case according to your description).
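For reference, a minimal sketch of the suggestion applied to the delete writer from the question (same bean, with the update-count assertion disabled so a delete that matches no rows no longer throws EmptyResultDataAccessException):

@Bean
public JdbcBatchItemWriter<Employee> myWriter1(DataSource dataSource) {
    JdbcBatchItemWriter<Employee> writer = new JdbcBatchItemWriter<>();
    writer.setSql(DELETE_QUERY_WHERE_EMPLOYEE_NAME_IS);
    writer.setPreparedStatementSetter(new MyPrepStSetterZero());
    writer.setDataSource(dataSourceTwo);
    writer.setAssertUpdates(false); // do not fail when an item deletes zero rows
    return writer;
}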

How to process larger files from JSON To CSV using Spring batch

I am trying to implement a batch job for the following use-case (new to Spring Batch).
Use-case
From one source system I will get 200+ compressed (.gz) files every day. Each .gz file gives about a 1GB file on unzip, which means 200GB of files in my input directory. The content type is JSON.
Sample Format of JSON File
{"name":"abc1","age":20}
{"name":"abc2","age":20}
{"name":"abc3","age":20}
.....
I need to process these files from JSON to CSV into an output directory, and the CSV generation should roll over by size, similar to size-based rolling in Log4j. After writing, I need to remove each file from the input directory.
Question 1
Can Spring Batch handle this huge amount of data? For a single day I am getting nearly 200GB.
Question 2
I am thinking Spring Batch can handle it, so I implemented the code with a partitioner using Spring Batch. But while reading, I am seeing some dirty lines without any end of line.
Faulty lines structure
{"name":"abc1","age":20,....}
{"name":"abc2","age":20......}
{"name":"abc3","age":20
{"name":"abc1","age":20,....}
{"name":"abc1","age":20,....}
.....
For this I have written a skip policy, but it's not working as expected. It's skipping all lines from the error line onwards instead of one line. How can I skip only that error line?
I am sharing my sample snippet below; please give some suggestions or corrections on my code and on the above questions and issues.
JobConfig.java
@Bean
public Job myJob() throws Exception {
    return jobBuilderFactory.get(Constants.JOB.JOB_NAME)
            .incrementer(new RunIdIncrementer())
            .listener(jobCompleteListener())
            .start(masterStep())
            .build();
}

// master
@Bean
public Step masterStep() throws Exception {
    return stepBuilderFactory.get("step")
            .listener(new UnzipListener())
            .partitioner(slaveStep())
            .partitioner("P", partitioner())
            .gridSize(10)
            .taskExecutor(executor())
            .build();
}

// slave step
@Bean
public Step slaveStep() throws Exception {
    return stepBuilderFactory.get("slavestep")
            .reader(reader(null))
            .writer(customWriter)
            .faultTolerant()
            .skipPolicy(fileVerificationSkipper())
            .build();
}

@Bean
public SkipPolicy fileVerificationSkipper() {
    return new LineVerificationSkipper();
}

@Bean
@StepScope
public Partitioner partitioner() throws Exception {
    MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
    PathMatchingResourcePatternResolver resolver = new PathMatchingResourcePatternResolver();
    Resource[] resources = resolver.getResources("...path of files...");
    partitioner.setResources(resources);
    partitioner.partition(20);
    return partitioner;
}
Skip Policy Code
public class LineVerificationSkipper implements SkipPolicy {

    @Override
    public boolean shouldSkip(Throwable exception, int skipCount) throws SkipLimitExceededException {
        if (exception instanceof FileNotFoundException) {
            return false;
        } else if (exception instanceof FlatFileParseException && skipCount <= 5) {
            FlatFileParseException ffpe = (FlatFileParseException) exception;
            StringBuilder errorMessage = new StringBuilder();
            errorMessage.append("An error occurred while processing line " + ffpe.getLineNumber()
                    + " of the file. Below was the faulty input.\n");
            errorMessage.append(ffpe.getInput() + "\n");
            System.err.println(errorMessage.toString());
            return true;
        } else {
            return false;
        }
    }
}
Question 3
How do I delete the input source files after processing each file? I am not getting any info like the file path or name in the ItemWriter.

MultiResourceItemReader - Skip entire file if header is invalid

My Spring Batch job reads a list of csv files containing two types of headers. I want the reader to skip the entire file if its header does not match one of the two possible header types.
I've taken a look at Spring Boot batch - MultiResourceItemReader : move to next file on error.
But I don't see how to validate the header tokens to ensure they match up in count and content.
I was able to figure this out by doing the following:
public FlatFileItemReader<RawFile> reader() {
    return new FlatFileItemReaderBuilder<RawFile>()
            .skippedLinesCallback(line -> {
                // Verify the file header is what we expect
                if (!StringUtils.equals(line, header)) {
                    throw new IllegalArgumentException(String.format("Bad header: %s", line));
                }
            })
            .name("myReader")
            .linesToSkip(1)
            .lineMapper(new DefaultLineMapper() {
                {
                    setLineTokenizer(lineTokenizer);
                    setFieldSetMapper(fieldSetMapper);
                }
            })
            .build();
}
I call the reader() method when setting the delegate in my MultiResourceItemReader.
Note that header, lineTokenizer, and fieldSetMapper are all variables that I set depending on which type of file (and hence which set of headers) my job is expected to read.
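For completeness, here is a minimal sketch of how the reader() above could be wired as the delegate of a MultiResourceItemReader; the resource pattern and bean name are assumptions, not from the original answer:

@Bean
public MultiResourceItemReader<RawFile> multiResourceReader() throws IOException {
    Resource[] resources = new PathMatchingResourcePatternResolver()
            .getResources("file:input/*.csv"); // hypothetical input location
    MultiResourceItemReader<RawFile> multiReader = new MultiResourceItemReader<>();
    multiReader.setResources(resources);
    multiReader.setDelegate(reader()); // each file's header is checked by skippedLinesCallback
    return multiReader;
}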
Can we do this in XML-based configuration?

Spring Batch creating multiple files .Gradle based project

I need to create 3 separate files.
My batch job should read from Mongo, then parse through the information and find the "business" column (3 types of business: RETAIL, HPP, SAX), then create a file for each respective business. The file name should be either RETAIL + formattedDate, HPP + formattedDate, or SAX + formattedDate, with the information found in the DB written inside a .txt file. Also, I need to change the .resource(new FileSystemResource("C:\filewriter\index.txt")) into something that will send the information to the right location; right now hard coding works but only creates one .txt file.
example:
@Bean
public FlatFileItemWriter<PaymentAudit> writer() {
    LOG.debug("Mongo-writer");
    FlatFileItemWriter<PaymentAudit> flatFile = new FlatFileItemWriterBuilder<PaymentAudit>()
            .name("flatFileItemWriter")
            .resource(new FileSystemResource("C:\\filewriter\\index.txt"))
            // trying to create a path instead of hard coding it
            .lineAggregator(createPaymentPortalLineAggregator())
            .build();
    String exportFileHeader = "CREATE_DTTM";
    StringHeaderWriter headerWriter = new StringHeaderWriter(exportFileHeader);
    flatFile.setHeaderCallback(headerWriter);
    return flatFile;
}
My idea would be something like the following, but I'm not sure where to go from here:
public Map<String, List<PaymentAudit>> getPaymentPortalRecords() {
    List<PaymentAudit> recentlyCreated =
            PaymentPortalRepository.findByCreateDttmBetween(yesterdayMidnight, yesterdayEndOfDay);
    List<PaymentAudit> retailList = new ArrayList<>();
    List<PaymentAudit> saxList = new ArrayList<>();
    List<PaymentAudit> hppList = new ArrayList<>();
    //String exportFilePath = "C://filewriter/";??????
    recentlyCreated.parallelStream().forEach(paymentAudit -> {
        if (paymentAudit.getBusiness().equalsIgnoreCase(RETAIL)) {
            retailList.add(paymentAudit);
        } else if (paymentAudit.getBusiness().equalsIgnoreCase(SAX)) {
            saxList.add(paymentAudit);
        } else if (paymentAudit.getBusiness().equalsIgnoreCase(HPP)) {
            hppList.add(paymentAudit);
        }
    });
To create a file for each business object type, you can use the ClassifierCompositeItemWriter. In your case, you can create a writer for each type and add them as delegates in the composite item writer.
As for creating the file name dynamically, you need to use a step-scoped writer. There is an example in the Step Scope section of the reference documentation.
Hope this helps.
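For illustration, here is a minimal sketch of that approach, assuming an outputPath job parameter and a PaymentAudit#getBusiness() returning "RETAIL", "HPP" or "SAX". Each business gets its own step-scoped FlatFileItemWriter with a date-based file name, and the classifier routes items to the matching delegate:

@Bean
@StepScope
public FlatFileItemWriter<PaymentAudit> retailWriter(@Value("#{jobParameters['outputPath']}") String outputPath) {
    String formattedDate = LocalDate.now().format(DateTimeFormatter.BASIC_ISO_DATE);
    return new FlatFileItemWriterBuilder<PaymentAudit>()
            .name("retailWriter")
            .resource(new FileSystemResource(outputPath + "/RETAIL" + formattedDate + ".txt"))
            .lineAggregator(createPaymentPortalLineAggregator())
            .build();
}

// hppWriter() and saxWriter() would be defined the same way with their own file names

@Bean
@StepScope
public ClassifierCompositeItemWriter<PaymentAudit> classifierWriter(
        FlatFileItemWriter<PaymentAudit> retailWriter,
        FlatFileItemWriter<PaymentAudit> hppWriter,
        FlatFileItemWriter<PaymentAudit> saxWriter) {
    ClassifierCompositeItemWriter<PaymentAudit> writer = new ClassifierCompositeItemWriter<>();
    writer.setClassifier(paymentAudit -> {
        switch (paymentAudit.getBusiness().toUpperCase()) {
            case "RETAIL": return retailWriter;
            case "HPP": return hppWriter;
            default: return saxWriter;
        }
    });
    return writer;
}

Note that the delegate writers also need to be registered on the step (for example via the step builder's stream(...) method) so that Spring Batch opens and closes them as part of the step lifecycle.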
