Implement fault tolerance in Spring Batch - java

I have the following batch job implemented in my Spring Batch config:
@Bean
public Job myJob(Step step1, Step step2, Step step3) {
    return jobs.get("myJob").start(step1).next(step2).next(step3).build();
}

@Bean
public Step step1(ItemReader<String> myReader,
                  ItemProcessor<String, String> myProcessor,
                  ItemWriter<String> myWriter) {
    return steps.get("step1").<String, String>chunk(1)
            .reader(myReader)
            .processor(myProcessor)
            .writer(myWriter)
            .build();
}
I would like to retry step1 (and likewise step2, step3, and so forth) on certain exceptions and roll back the job on any failure (including between retries). I understand that the rollback is not going to be automatic, and I know what to roll back for each step by writing custom code.
What is the best way to implement this?

The Spring framework provides the @Retryable and @Recover annotations (from Spring Retry) to retry an operation and to recover when it ultimately fails. You can check this article: https://www.baeldung.com/spring-retry
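A minimal sketch of that approach, assuming Spring Retry is on the classpath, @EnableRetry is declared on a configuration class, and MyService / MyTransientException are placeholder names:

import org.springframework.retry.annotation.Recover;
import org.springframework.retry.annotation.Retryable;
import org.springframework.stereotype.Service;

@Service
public class MyService {

    // retry the call up to 3 times when MyTransientException is thrown
    @Retryable(value = MyTransientException.class, maxAttempts = 3)
    public void doWork() {
        // ... work that may fail transiently ...
    }

    // invoked once all retry attempts are exhausted
    @Recover
    public void recover(MyTransientException e) {
        // ... compensating / rollback logic ...
    }
}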

Fault tolerance features in Spring Batch are applied to items in chunk-oriented steps, not to the entire step.
What you can try to do is use a flow with a decider where you restart from step1 if an exception occurs in one of the subsequent steps.
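For item-level retry inside such a chunk-oriented step, here is a minimal sketch of the step1 from the question with fault tolerance enabled (MyTransientException is a placeholder for the exception you want to retry on):

@Bean
public Step step1(ItemReader<String> myReader,
                  ItemProcessor<String, String> myProcessor,
                  ItemWriter<String> myWriter) {
    return steps.get("step1").<String, String>chunk(1)
            .reader(myReader)
            .processor(myProcessor)
            .writer(myWriter)
            .faultTolerant()
            .retry(MyTransientException.class) // retry items that fail with this exception
            .retryLimit(3)                     // fail the step after 3 attempts
            .build();
}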

Related

Spring Batch - Is there a way to commit data even if the chunk raise some exception?

I have a process that reads from a queue, processes the items and writes them into a DB. Even if the process fails, I have to store the data in the DB. But Spring Batch steps are transactional and always roll back the changes. So, is there a way to commit data even if the chunk raises some exception?
EDIT I:
I tried with a Tasklet but got the same behaviour.
Thanks in advance.
When configuring a step, you can use noRollback() to configure a list of exceptions that will not cause a rollback. Any exception that is a subclass of a configured exception will not trigger a rollback either. That means if you simply never want to roll back, set it to Exception, which is the parent of all exceptions.
An example can be found in the docs:
@Bean
public Step step1() {
    return this.stepBuilderFactory.get("step1")
            .<String, String>chunk(2)
            .reader(itemReader())
            .writer(itemWriter())
            .faultTolerant()
            .noRollback(Exception.class)
            .build();
}
I tried to use noRollback() as Ken Chan suggested, but it didn't work. I also tried listing the specific exceptions, but it keeps rolling back.
The conditional flow works at the Step level, not the item level, so it doesn't help me. I also tried with listeners, but the documentation says:
This listener is designed to work around the lifecycle of an item. This means that each method should be called once within the lifecycle of an item and in fault tolerant scenarios, any transactional work that is done in one of these methods would be rolled back and not re-applied. Because of this, it is recommended to not perform any logic using this listener that participates in a transaction.
I solved my problem using a Tasklet instead of the chunk-oriented solution and adding @Transactional to the execute method of the Tasklet.
@Transactional(propagation = Propagation.REQUIRES_NEW, isolation = Isolation.SERIALIZABLE, noRollbackFor = {
        ErrorInternoServidorException.class, SolicitudIncorrectaException.class,
        RegistroNoEncontradoException.class, SolicitudEventoObjetaException.class,
        SolicitudEventoValidaException.class, MimCargueSolicitudException.class, ConflictException.class,
        UnauthorizedException.class, ForbiddenException.class })
public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
    // ... read from the queue, process and write to the DB here ...
    return RepeatStatus.FINISHED;
}
Spring Batch's chunk-oriented processing is wrapped in a Tasklet with its own transaction, so I created a new transaction with my own rules.
Thanks to everyone for your replies. I learned a lot.
One way to have your job commit all your data even when exceptions are raised during processing is to use a SkipPolicy and write your data to the destination DB there.
One of the main benefits of a SkipPolicy is to log the data that caused an exception during processing, and the logging part could even be inserting the record into a temporary table in your DB.
public class FileVerificationSkipper implements SkipPolicy {

    private static final int MAX_VALUES_TO_SKIP = 1000;

    private final JdbcTemplate jdbcTemplate;

    public FileVerificationSkipper(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    public boolean shouldSkip(Throwable exception, int skipCount) throws SkipLimitExceededException {
        if (exception instanceof FlatFileParseException && skipCount <= MAX_VALUES_TO_SKIP) {
            FlatFileParseException ffpe = (FlatFileParseException) exception;
            // log the offending line (and its line number) into a temporary table
            jdbcTemplate.update("INSERT INTO YourTable(column1, column2) VALUES(?,?)",
                    ffpe.getInput(), ffpe.getLineNumber());
            return true;
        } else {
            return false;
        }
    }
}
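A sketch of how such a skip policy could be plugged into a fault-tolerant step (the step, reader and writer names here are illustrative):

@Bean
public Step importStep(ItemReader<String> reader, ItemWriter<String> writer, JdbcTemplate jdbcTemplate) {
    return stepBuilderFactory.get("importStep")
            .<String, String>chunk(10)
            .reader(reader)
            .writer(writer)
            .faultTolerant()
            .skipPolicy(new FileVerificationSkipper(jdbcTemplate)) // exceptions accepted by the policy are skipped, not rolled back
            .build();
}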
Hope this helps.

How to make a Spring Batch step depend on a previous step?

I am using Spring Batch to read some data from CSV files and put it in a database.
My batch job must be composed of 2 steps:
Check files (names, extensions, content ...)
Read lines from the CSV and save them in the DB (ItemReader, ItemProcessor, ItemWriter ...)
Step 2 must not be executed if Step 1 generated an error (files are not conformant, files don't exist ...)
FYI, I am using Spring Batch without XML configuration! Only annotations.
Here's what my job config class looks like:
@Configuration
@EnableBatchProcessing
public class ProductionOutConfig {

    @Autowired
    private StepBuilderFactory steps;

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private ProductionOutTasklet productionOutTasklet;

    @Autowired
    private CheckFilesForProdTasklet checkFilesForProdTasklet;

    @Bean
    public Job productionOutJob(@Qualifier("productionOut") Step productionOutStep,
                                @Qualifier("checkFilesForProd") Step checkFilesForProd) {
        return jobBuilderFactory.get("productionOutJob").start(checkFilesForProd).next(productionOutStep).build();
    }

    @Bean(name = "productionOut")
    public Step productionOutStep() {
        return steps.get("productionOut")
                .tasklet(productionOutTasklet)
                .build();
    }

    @Bean(name = "checkFilesForProd")
    public Step checkFilesForProd() {
        return steps.get("checkFilesForProd")
                .tasklet(checkFilesForProdTasklet)
                .build();
    }
}
What you are looking for is already the default behavior of Spring Batch, i.e. the next step won't be executed if the previous step has failed. To mark the current step as failed, you need to throw a runtime exception that is not caught.
If the exception is not handled, Spring Batch will mark that step as failed and the next step won't be executed. So all you need to do is throw an exception in your failure scenarios.
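For example, the file-check tasklet from the question could simply throw when the files are invalid; a minimal sketch (the validation helper is only a placeholder):

@Component
public class CheckFilesForProdTasklet implements Tasklet {

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        boolean filesAreValid = checkNamesExtensionsAndContent(); // your own validation logic
        if (!filesAreValid) {
            // any unhandled runtime exception marks this step as FAILED,
            // so the next step is not executed
            throw new IllegalStateException("Input files are not valid");
        }
        return RepeatStatus.FINISHED;
    }

    private boolean checkNamesExtensionsAndContent() {
        // ... inspect names, extensions and content ...
        return true;
    }
}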
For complicated job flows, you might like to use a JobExecutionDecider (programmatic flow decisions); a sketch of one follows the example below.
As the documentation specifies, you can use the method on(), which starts a transition to a new state if the exit status from the previous state matches the given pattern.
Your code could look something like this:
return jobBuilderFactory.get("productionOutJob")
        .start(checkFilesForProd)
        .on(ExitStatus.FAILED.getExitCode()).end()
        .from(checkFilesForProd)
        .on("*")
        .to(productionOutStep)
        .end()
        .build();
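For the JobExecutionDecider alternative mentioned above, a hedged sketch of a decider (the class name is illustrative) that routes on the outcome of the previous step:

public class FileCheckDecider implements JobExecutionDecider {

    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        // stepExecution belongs to the step that ran just before this decider
        if (ExitStatus.FAILED.getExitCode().equals(stepExecution.getExitStatus().getExitCode())) {
            return FlowExecutionStatus.FAILED;
        }
        return FlowExecutionStatus.COMPLETED;
    }
}

The decider is then placed into the flow with .next(decider) and the same .on(...) transitions as in the step-based example above.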

How to create and launch spring batch jobs at runtime

We have a requirement to carry out data movement from one database to another and are exploring Spring Batch for this. A user of our application selects the source and target datasources along with the list of tables for which the data needs to be moved.
I need help with the following:
The information necessary to build a job comes at runtime from our web application; it includes the datasource details and the list of table names. We would like to create a new job by sending these details to a job builder module and launch it using JobLauncher. How do we write this job builder module?
We may have multiple users raising data movement requests in parallel, so we need a way to create multiple jobs and run them in a suitable order.
We have used Java-based configuration to create a job and launch it from a web container. The configuration is as follows:
@Bean
public Job loadDataJob(JobCompletionNotificationListener listener) {
    RunIdIncrementer inc = new RunIdIncrementer();
    inc.setKey(new Date().toString());
    JobBuilder builder = jobBuilderFactory.get("loadDataJob")
            .incrementer(inc)
            .listener(listener);
    SimpleJobBuilder simpleBuilder = builder.start(preExecute());
    for (String s : getTables()) {
        simpleBuilder.next(etlTable(s));
    }
    simpleBuilder.next(postExecute());
    return simpleBuilder.build();
}

@Bean
@Scope("prototype")
public Step etlTable(String tableName) {
    return stepBuilderFactory.get(tableName)
            .<Map<String, Object>, Map<String, Object>>chunk(1000)
            .reader(dbDataReader(tableName))
            .processor(processor())
            .writer(dbDataWriter(tableName))
            .build();
}
Currently we have hardcoded the source and target datasource details in the respective beans. getTables() returns a (hardcoded) list of tables for which the data needs to be moved.
The RestController that launches the job:
@RestController
public class MyController {

    @Autowired
    JobLauncher jobLauncher;

    @Autowired
    Job job;

    @RequestMapping("/launchjob")
    public String handle() throws Exception {
        try {
            JobParameters jobParameters = new JobParametersBuilder().addLong("time", new Date().getTime()).toJobParameters();
            jobLauncher.run(job, jobParameters);
        } catch (Exception e) {
        }
        return "Done";
    }
}
Concerning your first question, you definitely have to use Java configuration. Moreover, you shouldn't define your steps as Spring beans if you want to create a job with a dynamic number of steps (for instance, a step per table you have to copy).
I've written a couple of answers to questions about how to create jobs dynamically. Have a look at them, they might be helpful:
Spring batch execute dynamically generated steps in a tasklet
Spring batch repeat step ending up in never ending loop
Spring Batch - How to generate parallel steps based on params created in a previous step
Spring Batch - Looping a reader/processor/writer step
Edited
Some remarks concerning your second question:
Firstly, you are using a normal JobLauncher, and I assume you instantiate a SimpleJobLauncher. This means you can provide a job together with JobParameters, as you have shown in your code above. However, the provided "job" does not have to be a Spring bean instance, so you don't have to autowire it; instead, you can use create methods as I suggested in the answers to the questions mentioned above.
Secondly, if you create your Job instance dynamically for every request, there is no need to pass the whole configuration as JobParameters, since you can pass the "configuration properties" like the datasource and the tables to be copied directly as parameters to your "createJob" method. You could even create your DataSource instances on the fly if you don't know all possible datasources in advance.
Thirdly, I would consider every request a "single run" that cannot be "restarted". Hence, I'd just put some meta information into the JobParameters, such as the user, date/time, datasource names (URLs), and the list of tables to be copied. I would use this kind of information only for logging/auditing which requests were issued, but I wouldn't use the JobParameters instances as control parameters inside the job itself. (Again, you can pass the values of these parameters at construction time of the job and its steps via your create methods, so the structure of your job is created according to your parameters; hence, at runtime, when you could access your JobParameters, there is nothing left to do based on them.)
Finally, if a request fails (meaning the job exits with an error), a new request simply has to be executed in order to retry, but this would be a completely new request and not a restart of an already executed job launch (since I would add the request time to my JobParameters, every launch would be a unique launch).
Edited 2:
Not creating the Job as a bean doesn't mean not using autowiring. Here is an example of how I would structure my beans.
@Component
@EnableBatchProcessing
@Import() // list of imports as needed
public class JobCreatorComponent {

    @Autowired
    private StepBuilderFactory stepBuilder;

    @Autowired
    private JobBuilderFactory jobBuilder;

    public Job createJob(all the parameters you need) {
        return jobBuilder.get(). ....
    }
}

@RestController
@Import(JobCreatorComponent.class)
public class MyController {

    @Autowired
    JobLauncher jobLauncher;

    @Autowired
    JobCreatorComponent jobCreator;

    @RequestMapping("/launchjob")
    public String handle() throws Exception {
        try {
            Job job = jobCreator.createJob(... params ...);
            JobParameters jobParameters = new JobParametersBuilder().addLong("time", new Date().getTime()).toJobParameters();
            jobLauncher.run(job, jobParameters);
        } catch (Exception e) {
        }
        return "Done";
    }
}
By using @JobScope on the ItemReader there is no need to do things manually at runtime; you just have to annotate your respective reader with @JobScope, and on each interaction with the controller you will get fresh record processing.
This is a type of on-demand job, where you can execute the job for goals like performing a DB migration or producing specific reports.
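For instance, a job-scoped reader can receive the table name from the job parameters at runtime. A minimal sketch (the parameter name, SQL and sourceDataSource bean are assumptions):

@Bean
@JobScope
public JdbcCursorItemReader<Map<String, Object>> dbDataReader(
        @Value("#{jobParameters['tableName']}") String tableName) {
    JdbcCursorItemReader<Map<String, Object>> reader = new JdbcCursorItemReader<>();
    reader.setDataSource(sourceDataSource);       // assumed source DataSource bean
    reader.setSql("SELECT * FROM " + tableName);  // resolved per job execution
    reader.setRowMapper(new ColumnMapRowMapper());
    return reader;
}

The controller would then add the table name as a job parameter, e.g. with addString("tableName", table), when launching the job.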

Configuring Spring transactions in Spring Integration DSL

I'm currently configuring Spring Integration using spring-integration-dsl as follows:
@Bean
public IntegrationFlow flow() {
    return IntegrationFlows.from(inboundServer())
            .transform(Transformers.objectToString())
            .transform(...)
            .route(...)
            .transform(Transformers.toJson())
            .channel(...)
            .get();
}

@Bean
public PlatformTransactionManager transactionManager() {
    ....
}
I don't know how I can configure the flow to use the transaction manager I've configured.
Actually, Spring Integration Java DSL supports all the transaction features that are available for the XML components.
Please provide more info on where you want to start a transaction, and keep in mind that TX support is restricted to thread boundaries. So, you can start a TX from a poller or from a JMS (AMQP) message-driven channel adapter.
Or use a TransactionInterceptor as an advice on any endpoint within the flow. But in this case the TX is restricted just to AbstractReplyProducingMessageHandler.handleRequestMessage.
UPDATE
Starting the TX for only some part of the flow isn't such a standard task, but it can be achieved by treating that part as a unit of work behind a transactional black box. For this purpose we have a component like the Gateway. So, you specify some interface, mark it with @MessagingGateway, add @IntegrationComponentScan alongside @EnableConfiguration, and mark the method of that interface with @Transactional. The requestChannel of this gateway should send the message to some separate flow with the JDBC and Jackson conversion and wait for the result to continue in the main flow. The TX will be finished on return from that gateway's method invocation.
And call that gateway as a regular service activator from the main flow: .handle("myGateway", "getData").
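A hedged sketch of such a gateway (the interface and channel names are illustrative):

@MessagingGateway
public interface TxGateway {

    // The transaction starts when this method is invoked and is committed
    // (or rolled back) when it returns, so the whole sub-flow behind
    // "txFlow.input" runs inside it.
    @Transactional
    @Gateway(requestChannel = "txFlow.input")
    Object getData(Object payload);
}

From the main flow it is then invoked with .handle(...) just like the service activator shown above.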

How to propagate Spring transaction to another thread?

Perhaps I am doing something wrong, but I can't find a good way out of the following situation.
I would like to unit test a service that uses Spring Batch underneath to execute jobs. The jobs are executed via a pre-configured AsyncTaskExecutor in separate threads. In my unit test I would like to:
Create few domain objects and persist them via DAO
Invoke the service method to launch the job
Wait until the job is completed
Use DAO to retrieve domain objects and check their state
Obviously, all of the above should be executed within one transaction, but unfortunately transactions are not propagated to new threads (I understand the rationale behind this).
Ideas that came to my mind:
Commit transaction #1 after step (1). This is not good, as the DB state should be rolled back after the unit test.
Use Isolation.READ_UNCOMMITTED in the job configuration. But this requires two different configurations for test and production.
I think the simplest solution would be to configure the JobLauncher with a SyncTaskExecutor during test execution; this way the job is executed in the same thread as the test and shares the transaction.
The task executor configuration can be moved to a separate Spring configuration XML file. Have two versions of it: one with a SyncTaskExecutor that is used during testing, and the other with an AsyncTaskExecutor that is used for production runs.
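In Java configuration, a test-only launcher along those lines might look like this (a sketch, assuming the JobRepository bean is available):

@Bean
public JobLauncher testJobLauncher(JobRepository jobRepository) throws Exception {
    SimpleJobLauncher launcher = new SimpleJobLauncher();
    launcher.setJobRepository(jobRepository);
    launcher.setTaskExecutor(new SyncTaskExecutor()); // run the job in the calling (test) thread
    launcher.afterPropertiesSet();
    return launcher;
}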
Although this is not a true solution to your question, I found it possible to start a new transaction inside a worker thread manually. In some cases this might be sufficient.
Source: Spring programmatic transactions.
Example:
@PersistenceContext
private EntityManager entityManager;

@Autowired
private PlatformTransactionManager txManager;

/* in a worker thread... */
public void run() {
    TransactionStatus tx = txManager.getTransaction(new DefaultTransactionDefinition());
    try {
        entityManager.find(...)
        ...
        entityManager.flush(...)
        etc...
        txManager.commit(tx);
    } catch (RuntimeException e) {
        txManager.rollback(tx);
    }
}
If you do want separate configurations, I'd recommend templating the isolation policy in your configuration and getting its value out of a property file so that you don't wind up with a divergent set of Spring configs for testing and prod.
But I agree that using the same policy production uses is best. How vast is your fixture data, and how bad would it be to have a setUp() step that blew away and rebuilt your data (maybe from a snapshot, if it's a lot of data) so that you don't have to rely on rollbacks?
