Maintain pre-job object through Spring Batch job steps - java

For my batch application, I have a handful of steps I need to take prior to the execution of the Spring Batch job. For instance, I need to do a specific query and store data in a property - a List with a complex type (List<ComplexType>) - so that it can be used and manipulated throughout the Spring Batch job (primarily in the ItemReader).
I've tried autowiring in my list and accessing it in the step, but I can't access the data in the list that way. I get a null ComplexType in my List, no matter what values have been added to my autowired List property prior to the job.
I have also tried passing data using ExecutionContext, but I don't think this is supposed to work outside the Spring Batch job execution.
What I want to know is: what is the best way to populate an object prior to executing a Spring Batch job and maintain that object throughout the lifecycle of the job?
If the best way is one of the approaches I've already tried, any guidance on common mistakes with those approaches is appreciated. Thanks.

Thanks Luca Basso Ricci for the JobExecutionListener pointer. I ended up creating my own StepExecutionListener where my pre-step processing would happen.
I followed this example from Mkyong which goes over different types of Spring Batch Listeners.
I created a custom listener like this one in the Java code:
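import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.StepExecutionListener;
import org.springframework.beans.factory.annotation.Autowired;

public class CustomStepListener implements StepExecutionListener {
    @Autowired
    private CustomObject customObject;

    @Override
    public void beforeStep(StepExecution stepExecution) {
        // initialize customObject and do other pre-step setup
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        // returning null leaves the step's exit status unchanged
        return null;
    }
}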
And I initialized the autowired CustomObject class here. The CustomObject class is a custom object that simply contained my List of type ComplexType.
@Component
public class CustomObject {
    private List<ComplexType> customObjectList;

    public List<ComplexType> getCustomObjectList() {
        return customObjectList;
    }

    public void setCustomObjectList(List<ComplexType> customObjectList) {
        this.customObjectList = customObjectList;
    }
}
Finally, in my job configuration 'batch-job-context.xml' I added my new listener:
<!-- ... -->
<beans:bean id="customStepListener"
class="com.robotsquidward.CustomStepListener"/>
<job id="robotsquidwardJob"
job-repository="jobRepository"
incrementer="runIdIncrementer">
<step id="robotsquidwardStep">
<tasklet task-executor="taskExecutor" throttle-limit="1">
<chunk
reader="robotsquidwardReader"
processor="robotsquidwardProcessor"
writer="robotsquidwardWriter"
commit-interval="1"/>
</tasklet>
<listeners>
<listener ref="customStepListener"/>
</listeners>
</step>
</job>
When I followed these steps I was able to initialize my ComplexType List within the beforeStep function and access the values of the ComplexType List within my job's Reader class:
@Component
@Scope(value = "step")
public class RobotsquidwardReader implements ItemReader<ComplexType> {
    @Autowired
    private CustomObject customObject;

    @Override
    public ComplexType read() throws Exception, UnexpectedInputException,
            ParseException, NonTransientResourceException {
        List<ComplexType> items = customObject.getCustomObjectList();
        // returning null tells Spring Batch that the input is exhausted
        if (items != null && !items.isEmpty()) {
            return items.remove(0);
        }
        return null;
    }
}
Easy as that. All it took is two new classes, a config change, and a major headache :)
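Since the step above runs with a task executor, here is a minimal plain-Java sketch of a holder that can be drained safely and that returns null once it is empty, which is what an ItemReader is expected to do at end of input. PreloadedItems is a hypothetical stand-in for the CustomObject holder and is not part of Spring Batch; the idea is only that a concurrent queue avoids both the remove(0)-on-empty-list problem and manual synchronization.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical stand-in for the CustomObject holder; not a Spring Batch class.
final class PreloadedItems<T> {
    private final Queue<T> items = new ConcurrentLinkedQueue<>();

    // Called once before the step starts, e.g. from beforeStep(...).
    // Note: ConcurrentLinkedQueue rejects null elements.
    void load(List<T> loaded) {
        items.clear();
        items.addAll(loaded);
    }

    // Mirrors the reader contract: the next item, or null when nothing is left.
    T next() {
        return items.poll();
    }
}
```

A reader's read() method could then simply return holder.next().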

You can do this with a job parameters incrementer:
<j:job id="Your_job" incrementer="incrementerBean">
and
<bean id="incrementerBean" class="com.whatever.IncrementerClass"/>
The incrementer class:
public class IncrementerClass implements JobParametersIncrementer {
    @Override
    public JobParameters getNext(JobParameters parameters) {
        Map<String, JobParameter> map = new HashMap<String, JobParameter>(
                parameters.getParameters());
        ...
        // You can put your list here if it fits one of the supported types.
        // A JobParameter is the domain representation of a parameter to a batch job;
        // only String, Long, Date, and Double can be parameters.
        // Run your query
        List<String> listStrings = Query.getYourQuery();
        // Join the query results into a single string such as "abc, abc, abc"
        map.put("listOfSomething", new JobParameter(String.join(", ", listStrings)));
        ...
        return new JobParameters(map);
    }
}
And that's all.
You can then use this parameter, for example, in some processing bean:
@Value("#{jobParameters['listOfSomething']}")
private String yourList;
You can then rebuild your list from the string.
Good luck.
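The join-and-split round trip described in this answer could look roughly like the sketch below. ListParameterCodec is a hypothetical helper name, and the comma separator is an assumption: it only works if none of the values contain a comma themselves.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: packs a list into one String parameter and unpacks it again.
final class ListParameterCodec {
    private static final String SEPARATOR = ",";

    private ListParameterCodec() {}

    // Join the query results into one comma-separated String.
    static String encode(List<String> values) {
        return String.join(SEPARATOR, values);
    }

    // Split the parameter back into a list, trimming whitespace around each entry.
    static List<String> decode(String parameter) {
        if (parameter == null || parameter.trim().isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.stream(parameter.split(SEPARATOR))
                .map(String::trim)
                .collect(Collectors.toList());
    }
}
```

The incrementer would call encode(...) before putting the value into the parameters map, and the bean receiving the injected String would call decode(...) on it.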

Related

Batch Tasklet to read from database with select query

How can I create a tasklet class to make a custom select query from DB and pass the data to the next tasklet? I have to use tasklet (no jdbcReader or any reader)
Code Sample:
public class Taskletreader implements Tasklet, StepExecutionListener {
    private final Logger logger = LoggerFactory.getLogger(Taskletreader.class);
    @Autowired
    private DataSource dataSource;
    private List<FichierEclate> FichierEclates;
    private String query = "select * from FicherEclate where .......some conditions....";

    @Override
    public void beforeStep(StepExecution stepExecution) {
        FichierEclates = new ArrayList<FichierEclate>();
        logger.debug("Reader initialized.");
    }

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        new JdbcTemplate(dataSource)
                .execute(query);
        return RepeatStatus.FINISHED;
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        // fu.closeReader();
        stepExecution
                .getJobExecution()
                .getExecutionContext()
                .put("FichierEclates", this.FichierEclates);
        logger.debug("Reader ended.");
        return ExitStatus.COMPLETED;
    }
}
I can't understand where the result of the select is, or how to pass it to the next tasklet for processing.
"Can't understand where is the result of the select"
If you want to consume the result of the query, you can use the query method on JdbcTemplate:
List<FichierEclate> fichierEclates = jdbcTemplate.query(query, new BeanPropertyRowMapper<>(FichierEclate.class));
This method accepts a RowMapper that maps each row from the database to an instance of your domain object. The sample uses BeanPropertyRowMapper, but you can provide a custom mapper if needed.
"how to pass it to next tasklet for processing ?"
You can use the execution context for that as you did. However, you should not be passing the entire items list. The execution context is persisted and it is not recommended to pass a lot of data between steps like this. A list of item IDs could be fine, but the list of entire items objects is not a good idea. Please refer to the Passing Data to Future Steps section of the reference documentation for more details.
That said, I really recommend using a chunk-oriented tasklet for that. I know you said "I have to use tasklet (no jdbcReader or any reader)", but I don't understand this constraint.
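To illustrate the advice about sharing item IDs rather than whole items, here is a small plain-Java sketch. LoadedItem and its id field are hypothetical, since the fields of FichierEclate are not shown in the question; the point is only that the first tasklet reduces its result to identifiers before handing it over.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical item type; the real FichierEclate fields are not shown in the question.
final class LoadedItem {
    final long id;
    final String payload;

    LoadedItem(long id, String payload) {
        this.id = id;
        this.payload = payload;
    }
}

final class IdHandoff {
    private IdHandoff() {}

    // Reduce the loaded items to their ids before sharing them with the next step.
    static List<Long> idsOf(List<LoadedItem> items) {
        return items.stream()
                .map(item -> item.id)
                .collect(Collectors.toList());
    }
}
```

The resulting list of IDs is what would be put into the execution context, and the next tasklet would load the full rows again by ID.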

Can I inject a single, static item into a Spring Batch item reader?

We have a Spring Batch job that pulls a dynamic list of recipients from a file. We want to add a single extra recipient to serve as a quality control. I thought about adding a new tasklet that just spits out this record and passes it along to the real reader. I've read a few questions here, articles elsewhere and the documentation about transferring data between Spring Batch steps, but I'm not sure that's the easiest, or best way to accomplish this.
For example, I've looked at the official documentation (which uses listeners), this article (which uses autowired components and different listeners), and this question and its answers.
If I did get a generator tasklet set up and pass its data into the reader, how would I insert it into the reader's actual records?
Some snippets of the code we're working with – it's purely annotation driven, no XML config setup anywhere.
Step builder
public Step loadRecipients() {
return stepBuilderFactory.get("loadRecipients").<Recipient, Recipient>chunk(chunkSize)
.reader(recipientsItemReader)
.processor(recipientsItemProcessor)
.writer(recipientsWriter)
.taskExecutor(taskExecutor)
.throttleLimit(1)
.build();
}
Reader config
@StepScope
public FlatFileItemReader<Recipient> recipientItemReader() {
    FlatFileItemReader<Recipient> itemReader = new FilePrefixItemReader<>(
            "theFilePath",
            staticResourceLoader(),
            FunctionUtils.propagateExceptions((org.springframework.core.io.Resource resource) -> new GZIPInputStream(resource.getInputStream()))
    );
    itemReader.setLineMapper(userCategoriesDefaultLineMapper);
    return itemReader;
}
Should I just finagle my extra record into the resource input stream with some funky wrapper? Is there some other Spring magic I can use to add my static record?
Wrap or extend the writer and add the static item there. Rough source code:
public class AddStaticItemWriter implements ItemWriter<String> {
    @Override
    public void write(final List<? extends String> items) throws Exception {
        // copy first: the incoming list cannot accept new elements through its wildcard type
        List<String> toWrite = new ArrayList<>(items);
        // check some funky condition
        if (addStaticItem) {
            toWrite.add(STATIC_ITEM);
        }
        // business code
        // or delegate to underlying writer
    }
}
Some hints (pros and cons):
The added item is not known to Spring Batch, which might lead to some weird behaviour in rollback scenarios (skip, retry).
As above, you could instead wrap the reader and add the item there.
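The reader-wrapping variant mentioned in the last hint could be sketched as follows. SimpleReader is a hypothetical minimal interface standing in for Spring Batch's ItemReader (same convention: read() returns null when the input is exhausted), and AppendStaticItemReader is an illustrative name, not an existing class.

```java
// Hypothetical minimal reader contract, standing in for Spring Batch's ItemReader:
// read() returns the next item, or null once the input is exhausted.
interface SimpleReader<T> {
    T read();
}

// Decorator that emits one extra static item after the delegate runs dry.
final class AppendStaticItemReader<T> implements SimpleReader<T> {
    private final SimpleReader<T> delegate;
    private final T staticItem;
    private boolean delegateExhausted = false;
    private boolean staticItemEmitted = false;

    AppendStaticItemReader(SimpleReader<T> delegate, T staticItem) {
        this.delegate = delegate;
        this.staticItem = staticItem;
    }

    @Override
    public synchronized T read() {
        if (!delegateExhausted) {
            T next = delegate.read();
            if (next != null) {
                return next;
            }
            delegateExhausted = true; // do not call the delegate again
        }
        if (!staticItemEmitted) {
            staticItemEmitted = true;
            return staticItem;
        }
        return null; // both the delegate and the static item are used up
    }
}
```

Because the extra item comes out of read() like any other, it goes through the processor and writer with the rest of the chunk, which avoids the rollback concern noted for the writer approach.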
Rather than perverting an item writer, I ended up making a specific tasklet for this. The major downside for the item writer approach was that the current implementation is very lean and has a lot of reused code. Extending the item writer added some code that didn't really belong there.
The major upside of the tasklet was upholding the single-responsibility principle. It was very easy to get the tasklet to write to the database resource. If the writer were writing to a more complicated resource (such as a REST template or file destination), the hybridized writer would have been much cleaner. (Note: more code was needed to get all the recipient parameters in order; this is just a basic tasklet example.)
/**
 * Inject the internal email recipient, for monitoring and informational purposes.
 */
public class InjectInternalEmailRecipientTasklet implements Tasklet {
    public static final Float DEFAULT_MAX_AFFINITY_SCORE = 1.0f;
    private final RecipientRepository recipientRepository;

    public InjectInternalEmailRecipientTasklet(RecipientRepository recipientRepository) {
        this.recipientRepository = recipientRepository;
    }

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        // We can safely inject this record even on non-prod environments because the email processor obfuscates all emails on
        // non-prod environments. N.B. we do not want the internal user to receive TEST emails/placements.
        recipientRepository.bulkInsert(new Recipient("testemail@example.com"));
        return RepeatStatus.FINISHED;
    }
}
And adding the tasklet step to the job config is straightforward as well.
public Job loadRecipients() {
    return jobs.get("loadRecipients")
            .start(truncateRecipientsStep())
            .next(injectInternalEmailRecipientStep())
            .next(loadRecipients())
            .preventRestart()
            .build();
}

public Step injectInternalEmailRecipientStep() {
    return stepBuilderFactory.get("injectAnalyticsEmailUserCategoryStep")
            .tasklet(injectInternalEmailRecipientTasklet())
            .build();
}

public Tasklet injectInternalEmailRecipientTasklet() {
    return new InjectInternalEmailRecipientTasklet(recipientRepository);
}
The job configuration is this verbose because it follows patterns that serve our more complicated jobs well.

How to build and utilize a cache using CacheBuilder in Java

I have a method that pulls in a bunch of data. This has the potential to take a decent amount of time due to the large data set and the amount of computation required. The method that does this call will be used many times. The result list should return the same results each time. With that being said, I want to cache the results, so I only have to do that computation once. I'm supposed to use the CacheBuilder class. The script I have is essentially something like:
class CheckValidValues implements AValidValueInterface {
    private ADataSourceInterface dataSource;

    public CheckValidValues(ADataSourceInterface dataSource) {
        this.dataSource = dataSource;
    }

    @Override
    public void validate(String value) {
        List<?> validValues = dataSource.getValidValues();
        if (!validValues.contains(value)) {
            // throw an exception
        }
    }
}
So I'm not even sure where I should be putting the caching code (i.e. in the CheckValidValues class or in the getValidValues() method of dataSource). Also, I'm not entirely sure how to add code to one of the methods without it instantiating the cache multiple times. Here's the route that I'm trying to take, but I have no idea if it's correct. Adding above the List validValues = dataSource.getValidValues() line:
LoadingCache<String, List<?>> validValuesCache = CacheBuilder.newBuilder()
        .expireAfterAccess(30, TimeUnit.SECONDS)
        .build(
                new CacheLoader<String, List<?>>() {
                    public List<?> load(@Nonnull String validValues) {
                        return valuesSupplier.getValidValues();
                    }
                }
        );
Then later, I'd think I could get that value with:
validValuesCache.get("validValues");
What I think should happen there is that it will do the getValidValues command and store that in the cache. However, if this method is being called multiple times, then, to me, that would mean it would create a new cache each time.
Any idea what I should do for this? I simply want to add the results of the getValidValues() method to cache so that it can be used in the next iteration without having to redo any computations.
You only want to cache a single value, the list of valid values. Use Guava's Suppliers.memoizeWithExpiration(Supplier delegate, long duration, TimeUnit unit).
Each valid value exists only once, so your List is essentially a Set. Back it with a HashSet (or a more efficient variant in Guava). That way contains() is a hash table lookup instead of a sequential search through the list.
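To show the shape of that idea without pulling in Guava, here is a hand-rolled sketch of a memoizing supplier with expiration. It is a simplified stand-in, not Guava's implementation, and ExpiringMemoizer and CachedValueValidator are hypothetical names. The key point for the question is that the memoizer lives in a field created once in the constructor, so it is not rebuilt on every validate call, and the values are held in a HashSet.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hand-rolled stand-in for a memoizing supplier with expiration (not Guava's code).
final class ExpiringMemoizer<T> implements Supplier<T> {
    private final Supplier<T> delegate;
    private final long ttlNanos;
    private T value;
    private long loadedAtNanos;
    private boolean loaded = false;

    ExpiringMemoizer(Supplier<T> delegate, long duration, TimeUnit unit) {
        this.delegate = delegate;
        this.ttlNanos = unit.toNanos(duration);
    }

    @Override
    public synchronized T get() {
        long now = System.nanoTime();
        if (!loaded || now - loadedAtNanos >= ttlNanos) {
            value = delegate.get(); // the expensive call happens only here
            loadedAtNanos = now;
            loaded = true;
        }
        return value;
    }
}

// Hypothetical validator: keeps the memoizer in a field so it is created once.
final class CachedValueValidator {
    private final Supplier<Set<String>> validValues;

    CachedValueValidator(Supplier<List<String>> slowLookup) {
        this.validValues = new ExpiringMemoizer<Set<String>>(
                () -> new HashSet<String>(slowLookup.get()), 30, TimeUnit.SECONDS);
    }

    boolean isValid(String value) {
        return validValues.get().contains(value);
    }
}
```

With Guava on the classpath, the ExpiringMemoizer constructor call would be replaced by Suppliers.memoizeWithExpiration with the same three arguments.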
We use Guava and Spring-Caching in a couple of projects where we defined the beans via Java configuration like this:
@Configuration
@EnableCaching
public class GuavaCacheConfig {
    ...
    @Bean(name="CacheEnabledService")
    public SomeService someService() {
        return new CacheableSomeService();
    }

    @Bean(name="guavaCacheManager")
    public CacheManager cacheManager() {
        // if different caching strategies should occur use this technique:
        // http://www.java-allandsundry.com/2014/10/spring-caching-abstraction-and-google.html
        GuavaCacheManager guavaCacheManager = new GuavaCacheManager();
        guavaCacheManager.setCacheBuilder(cacheBuilder());
        return guavaCacheManager;
    }

    @Bean(name = "expireAfterAccessCacheBuilder")
    public CacheBuilder<Object, Object> cacheBuilder() {
        return CacheBuilder.newBuilder()
                .recordStats()
                .expireAfterAccess(5, TimeUnit.SECONDS);
    }

    @Bean(name = "keyGenerator")
    public KeyGenerator keyGenerator() {
        return new CustomKeyGenerator();
    }
    ...
}
Note that the code above was taken from one of our integration tests.
The service whose return values should be cached is defined as follows:
@Component
@CacheConfig(cacheNames="someCache", keyGenerator=CustomKeyGenerator.NAME, cacheManager="guavaCacheManager")
public class CacheableService {
    public final static String CACHE_NAME = "someCache";
    ...
    @Cacheable
    public <E extends BaseEntity> E findEntity(String id) {
        ...
    }
    ...
    @CachePut
    public <E extends BaseEntity> ObjectId persist(E entity) {
        ...
    }
    ...
}
As Spring-Caching uses an AOP approach, on invoking a @Cacheable-annotated method Spring first checks whether a previously stored return value is already available in the cache for the invoked method (depending on the cache key; we use a custom key generator for that). If no value is available yet, Spring invokes the actual service method and stores the return value in the local cache, where it is available on subsequent calls.
@CachePut always executes the service method and puts the return value into the cache. This is useful when an existing cached value should be replaced by a new one, for example on an update.
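The difference between the two behaviours can be illustrated with a plain map. This is only a sketch of the semantics described above, not how Spring implements them (Spring does this through proxies and a CacheManager); CacheSemanticsSketch is a hypothetical name and the sketch assumes non-null keys and results.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Plain-Java illustration of the two caching behaviours (assumes non-null results).
final class CacheSemanticsSketch<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();

    // Like @Cacheable: return the cached value if present, otherwise compute and store it.
    V cacheable(K key, Function<K, V> method) {
        return cache.computeIfAbsent(key, method);
    }

    // Like @CachePut: always run the method and overwrite whatever is cached.
    V cachePut(K key, Function<K, V> method) {
        V result = method.apply(key);
        cache.put(key, result);
        return result;
    }
}
```

Calling cacheable twice with the same key runs the method once; a cachePut in between runs it again and replaces the stored value.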

Best way to access StepExecution/JobExecution in ItemReader/Processor/Writer

I've been using Spring Batch for a few months now.
I used to store execution-related variables (like page count, item count, current position of a batch, and so on) in beans. Those beans were then attached to the ItemReader, ItemProcessor, and ItemWriter through setters and getters such as setVar() and getVar(), and shared among threads with manual synchronization.
But now I've found out this could be the wrong way of doing batch jobs. Beans attached to ItemReaders can't be persisted in the JobRepository, so they can't record state for stopping and restarting a job. So I still need to go back and use StepExecution/JobExecution.
The examples I found online are all based on either XML config or, worse, SpEL autowired into a setter method.
I use purely Java config. Is there a Java config or Java code-oriented way of accessing StepExecution? What's the best practice for accessing the various sorts of ExecutionContext?
To get access to the StepExecution and the JobExecution your ItemReader, ItemProcessor, or ItemWriter will have to implement the StepExecutionListener.
For instance:
public class MyCustomItemWriter implements ItemWriter<Object>, StepExecutionListener {
    private StepExecution stepExecution;

    @Override
    public void beforeStep(StepExecution stepExecution) {
        this.stepExecution = stepExecution;
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        return stepExecution.getExitStatus();
    }

    @Override
    public void write(List<? extends Object> list) throws Exception {
        if (null == list || list.isEmpty()) {
            throw new Exception("Cannot write null or empty list");
        }
        ExecutionContext stepExecContext = this.stepExecution.getExecutionContext();
        ExecutionContext jobExecContext = this.stepExecution.getJobExecution().getExecutionContext();
        // TODO: Write your code here
    }
}
To get access to StepExecution or JobExecution, you can use methods with annotations from the package org.springframework.batch.core.annotation, or implement interfaces like JobExecutionListener or StepExecutionListener, depending on your needs.

How to get JobParameter and JobExecutionContext in the ItemWriter?

I want to retrieve JobParameter and JobExecutionContext object in my ItemWriter class.
How to proceed?
I tried implementing StepExecutionListener through which I am just calling the parent class methods. But it is not succeeding.
Thanks in advance.
Implementing StepExecutionListener is one way. In fact that's the only way in Spring Batch 1.x.
Starting from Spring Batch 2, you have another choice: you can inject any entry from the job parameters or the job execution context into your item writer. Give your item writer step scope, then use expressions like #{jobParameters['theKeyYouWant']} or #{jobExecutionContext['someOtherKey']} to inject values into your item writer.
Use the @BeforeStep annotation to call a method before step processing.
// From the StepExecution, get the currently running JobExecution object.
public class MyDataProcessor implements ItemProcessor<MyDataRow, MyDataRow> {
    private JobExecution jobExecution;

    @BeforeStep
    public void beforeStep(StepExecution stepExecution) {
        jobExecution = stepExecution.getJobExecution();
    }

    @Override
    public MyDataRow process(MyDataRow item) throws Exception {
        // jobExecution is available here; pass the item through unchanged
        return item;
    }
}
To add to Adrian Shum's answer, if you want to avoid injecting each job parameter as a separate class property, you can directly inject the Map of JobParameters as follows:
@Value("#{jobParameters}")
private Map<String, JobParameter> jobParameters;
If you are using a Spring XML configuration file, you can access the StepExecution object with:
<bean id="aaaReader" class="com.AAAReader" scope="step">
<property name="stepExecution" value="#{stepExecution}" />
</bean>
In AAAReader class you need to create the proper field and a setter:
private StepExecution stepExecution;
public void setStepExecution(final StepExecution stepExecution) {
this.stepExecution = stepExecution;
}
Same for Processor and Writer classes.
