I am trying to configure Spring Batch steps for partitioning. The nice sample found here shows partitioning by ID range, but I don't know where to start for a "data page" range.
In my sequential step, I have:
reader: a RepositoryItemReader using a PagingAndSortingRepository
processor: a data converter
writer: a RepositoryItemWriter using a CrudRepository
chunk: 5
listener: a StepListener
return stepBuilderFactory.get("stepApplicationForm")
        .<OldApplicationForm, NewApplicationForm>chunk(5)
        .reader(reader).processor(processor).writer(writer)
        .listener(listener).build();
As I understand it, for partitioning I have to create a partitioner, then a "parent" step that tells Spring Batch to use the partitioner with the child step, and then a "child" step whose reader is aware of the "pagination" parameters.
For the TaskExecutor, I think the ThreadPoolTaskExecutor will fit.
What is the right way to implement/configure partitioning based on data "pages"? And what threading caveats should I check?
Thanks :)
Each partition has its own item reader and item writer instances. Your Partitioner implementation finds the min and max values of the data to load; using your own logic, you put those min and max values into each partition's execution context. When querying the database you can then use them so that each step instance handles a specific slice of the data and no concurrency issues take place.
@Bean
public Step myMasterStep() {
    return stepBuilderFactory.get("myMasterStep")
            .partitioner("mySlaveWorker", myPartitioner())
            .partitionHandler(myPartitionHandler()).build();
}
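The myPartitionHandler() bean referenced above is not shown in the answer. A minimal sketch using TaskExecutorPartitionHandler, wired to the ThreadPoolTaskExecutor the question mentions (the pool and grid sizes of 4 are arbitrary choices, not values from the original):
@Bean
public PartitionHandler myPartitionHandler() {
    TaskExecutorPartitionHandler handler = new TaskExecutorPartitionHandler();
    handler.setTaskExecutor(myTaskExecutor());
    handler.setStep(mySlaveWorker());
    handler.setGridSize(4); // number of partitions created and run concurrently
    return handler;
}

@Bean
public TaskExecutor myTaskExecutor() {
    ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
    taskExecutor.setCorePoolSize(4); // match the grid size so all partitions run in parallel
    taskExecutor.setMaxPoolSize(4);
    return taskExecutor;
}
As for threading caveats: every stateful bean the slave step uses (reader, writer) should be @StepScope so that each partition gets its own instance, and partitions must not touch overlapping rows; non-overlapping ID ranges guarantee that.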
@Bean
public Step mySlaveWorker() {
    return stepBuilderFactory
            .get("mySlaveWorker")
            .<OldApplicationForm, NewApplicationForm>chunk(5)
            .faultTolerant()
            .listener(new MyStepListener())
            .skip(DataAccessException.class)
            .skip(FatalStepExecutionException.class)
            .skip(Exception.class)
            .skipLimit(75)
            .noRollback(DataAccessException.class)
            .noRollback(FatalStepExecutionException.class)
            .noRollback(Exception.class)
            .reader(myDataItemReader(null, null)) // step-scoped proxy; real values come from the step execution context
            .writer(myDataItemWriter()).build();
}
@Bean
@StepScope
public MyDataItemReader myDataItemReader(
        @Value("#{stepExecutionContext[minId]}") Long minId,
        @Value("#{stepExecutionContext[maxId]}") Long maxId) {
    MyDataItemReader myDataItemReader = new MyDataItemReader();
    myDataItemReader.setPageSize(100);
    myDataItemReader.setMinId(minId);
    myDataItemReader.setMaxId(maxId);
    return myDataItemReader;
}
@Bean
@StepScope
public MyDataItemWriter myDataItemWriter() {
    return new MyDataItemWriter();
}

@Bean
@StepScope
public MyPartitioner myPartitioner() {
    MyPartitioner myPartitioner = new MyPartitioner();
    myPartitioner.setDataSource(dataSource);
    return myPartitioner;
}
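MyPartitioner itself is not shown in the answer either. A minimal sketch of a range-based implementation (the OLD_APPLICATION_FORM table and ID column are assumptions; adjust the queries to your schema):
public class MyPartitioner implements Partitioner {

    private DataSource dataSource;

    public void setDataSource(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        JdbcTemplate jdbcTemplate = new JdbcTemplate(dataSource);
        // Hypothetical table and column names -- replace with your own.
        long min = jdbcTemplate.queryForObject("SELECT MIN(ID) FROM OLD_APPLICATION_FORM", Long.class);
        long max = jdbcTemplate.queryForObject("SELECT MAX(ID) FROM OLD_APPLICATION_FORM", Long.class);
        long targetSize = (max - min) / gridSize + 1;

        Map<String, ExecutionContext> result = new HashMap<>();
        long start = min;
        long end = start + targetSize - 1;
        for (int i = 0; i < gridSize; i++) {
            ExecutionContext context = new ExecutionContext();
            context.putLong("minId", start); // read back via #{stepExecutionContext[minId]}
            context.putLong("maxId", Math.min(end, max));
            result.put("partition" + i, context);
            start += targetSize;
            end += targetSize;
        }
        return result;
    }
}
Each map entry becomes one slave step execution, and its minId/maxId values land in that step's execution context, which is exactly what the @Value expressions on myDataItemReader pick up.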
public class MyStepListener implements SkipListener<OldApplicationForm, NewApplicationForm> {

    private static final Logger LOGGER = LoggerFactory.getLogger(MyStepListener.class);

    @Override
    public void onSkipInProcess(OldApplicationForm item, Throwable t) {
        LOGGER.error("onSkipInProcess {}", t.getMessage());
    }

    @Override
    public void onSkipInRead(Throwable t) {
        LOGGER.error("onSkipInRead {}", t.getMessage());
    }

    @Override
    public void onSkipInWrite(NewApplicationForm item, Throwable t) {
        //logs
        LOGGER.error("In MyStepListener --> onSkipInWrite {}", t.getMessage());
    }
}
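MyDataItemReader is the other missing piece. A sketch under stated assumptions (Spring Batch 4.x's AbstractPagingItemReader, Spring Data 2.x, and a hypothetical findByIdBetween repository method), not the original author's implementation:
public class MyDataItemReader extends AbstractPagingItemReader<OldApplicationForm> {

    private OldApplicationFormRepository repository; // hypothetical Spring Data repository
    private Long minId;
    private Long maxId;

    public void setRepository(OldApplicationFormRepository repository) {
        this.repository = repository;
    }

    public void setMinId(Long minId) {
        this.minId = minId;
    }

    public void setMaxId(Long maxId) {
        this.maxId = maxId;
    }

    @Override
    protected void doReadPage() {
        if (results == null) {
            results = new CopyOnWriteArrayList<>();
        } else {
            results.clear();
        }
        // Only rows inside this partition's [minId, maxId] slice are fetched,
        // one page at a time, so concurrent partitions never overlap.
        Page<OldApplicationForm> page = repository.findByIdBetween(
                minId, maxId, PageRequest.of(getPage(), getPageSize(), Sort.by("id")));
        results.addAll(page.getContent());
    }

    @Override
    protected void doJumpToPage(int itemIndex) {
        // nothing to restore; the superclass tracks the current page number
    }
}
With this variant, the myDataItemReader bean above would also need a setRepository(...) call with your injected repository.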
I coded a custom implementation of ListItemReader, trying to follow the example in Spring Batch's GitHub. In my case, though, I need to read a variable from the job context; this variable is the path of a directory whose files I have to read. I can't do this in the constructor because constructors execute before the beforeStep event, so I don't have this variable at that moment.
This works on the first execution, but since the list never goes back to null, I can't run the initialize method again.
If I try adding an else to the !list.isEmpty() condition that sets my list back to null, I enter an infinite loop.
Is there another way to solve this? Maybe I am overcomplicating it.
public class ListItemReader implements ItemReader<Path>, StepExecutionListener {

    private List<Path> list;
    private org.springframework.batch.core.JobExecution jobExecution;

    public ListItemReader() {
        this.list = new ArrayList<>();
    }

    public void initialize() {
        // Here I list the directories of a path and add all the files to the list
        String pathString = jobExecution.getExecutionContext().getString(Constants.CONTEXT_PATH);
        Path path = Paths.get(pathString);
        ...
        list.add(Paths.get(..path..));
    }

    @Nullable
    @Override
    public Path read() {
        if (list == null) initialize();
        if (!list.isEmpty()) {
            return list.remove(0);
        }
        return null;
    }

    @Override
    public ExitStatus afterStep(StepExecution se) {
        return ExitStatus.COMPLETED;
    }

    @Override
    public void beforeStep(StepExecution se) {
        jobExecution = se.getJobExecution();
    }
}
I can't do this in the constructor because constructors execute before the beforeStep event, so I don't have this variable at that moment.
Actually, you can delay your bean's construction by using @JobScope and @StepScope. Additionally, you can use @Value and Spring SpEL to inject your jobParameters.
In your case, you might want to rewrite your code, e.g.:
@Configuration
@RequiredArgsConstructor
public class YourJobConfig {

    private final StepBuilderFactory stepBuilder;

    @Bean("yourStep")
    public Step doWhatever() {
        return stepBuilder
                .get("yourStepName")
                .<Path, Path>chunk(50) // <-- replace your chunk size here
                .reader(yourItemReader(null, 0)) // <-- place your default values here
                .processor(yourItemProcessor())
                .writer(yourItemWriter())
                .build();
    }

    @Bean
    @StepScope // needed so the jobParameters can be injected late
    public ItemReader<Path> yourItemReader(
            @Value("#{jobParameters['attribute1']}") String attribute1,
            @Value("#{jobParameters['attribute2']}") long attribute2
    ) {
        return new ListItemReader(attribute1, attribute2); // <-- this is your ListItemReader
    }

    @Bean
    public ItemProcessor<Path, Path> yourItemProcessor() {
        return new YourItemProcessor<>();
    }

    @Bean
    public ItemWriter<Path> yourItemWriter() {
        return new YourItemWriter<>();
    }
}
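Since the reader bean is now step-scoped, the reader itself can do its initialization eagerly in the constructor. A minimal sketch, assuming attribute1 is the directory to scan (attribute2 just stands in for whatever second parameter you need):
public class ListItemReader implements ItemReader<Path> {

    private final List<Path> list = new ArrayList<>();

    public ListItemReader(String attribute1, long attribute2) {
        // The job parameter is already available here because @StepScope
        // delays bean creation until the step actually starts.
        try (Stream<Path> files = Files.list(Paths.get(attribute1))) {
            files.forEach(list::add);
        } catch (IOException e) {
            throw new ItemStreamException("Could not list files in " + attribute1, e);
        }
    }

    @Nullable
    @Override
    public Path read() {
        return list.isEmpty() ? null : list.remove(0);
    }
}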
Before starting your job, you can add some jobParameters:
public JobParameters getJobParams() {
    JobParametersBuilder params = new JobParametersBuilder();
    params.addString("attribute1", "something_cool");
    params.addLong("attribute2", 123456L);
    return params.toJobParameters();
}
and add it to your job:
Job yourJob = getYourJob();
JobParameters jobParams = getJobParams();
JobExecution execution = jobLauncher.run(yourJob, jobParams);
References
The Delegate Pattern and Registering with the Step
How to get access to job parameters from ItemReader
I'm testing Spring Batch for my next project. In my step I indicated my ItemReader, ItemProcessor, and ItemWriter.
In my ItemReader I'm fetching only the IDs using JdbcPagingItemReader and passing them to the ItemProcessor, which fetches the whole entity using JpaRepository.findOne(). I'm applying the driving query pattern here.
At the same time, in my ItemProcessor implementation, I'm setting one of the entity's fields: setUpdated(new Date()).
Then in my ItemWriter, I'm just logging the entity passed on from the ItemProcessor.
My question: when I check the logs, Hibernate is updating the entities' values in the table even though in my ItemWriter I'm just logging them.
ItemProcessor implementation:
public class IdToContractItemProcessor implements ItemProcessor<BigDecimal, Contract> {

    private ContractRepository repo; // repository declaration elided in the original post; the name is assumed

    @Override
    public Contract process(BigDecimal id) {
        Contract contract = repo.findOne(id.longValue());
        contract.setUpdated(new Date());
        return contract;
    }
}
ItemWriter implementation:
public class CustomItemWriter implements ItemWriter<Contract> {

    @Override
    public void write(List<? extends Contract> list) {
        for (Contract c : list) {
            log.info("id {}", c.getId());
        }
    }
}
Step bean:
@Bean
public Step step() {
    return stepBuilderFactory
            .get("myStep")
            .<BigDecimal, Contract>chunk(3)
            .reader(jdbcPagingItemReader)
            .processor(idToContractItemProcessor)
            .writer(customWriter)
            .build();
}
Contract item is an entity class.
Why is it that after every 3 Contracts (one chunk), Hibernate logs an update statement even though I'm not saving anything in my ItemWriter?
I'm having trouble with Spring Batch regarding the configuration of my custom writer, which is basically a RepositoryItemWriter.
@Bean
@StepScope
public ItemReader<DTO> itemReader() {
    [...] Reading from database and mapping into DTO class
    return reader;
}

@Bean
@StepScope
public ItemProcessor<DTO, Entity> itemProcessor(Mapper mapper) {
    return dto -> {
        dto.check();
        return mapper.toEntity(dto);
    };
}

@Bean
@StepScope
public ItemWriter<Entity> itemWriter() {
    [...] Save into database from repository
    return writer;
}

@Bean
public Step step() {
    return stepBuilderFactory.get("step")
            .<DTO, Entity>chunk(500)
            .reader(itemReader)
            .writer(itemWriter)
            .build();
}
I am using MapStruct to map the DTO to the Entity within the processor. Even though everything seems right, my writer is actually receiving DTO items instead of Entity items and thus cannot persist them.
Some complementary but probably irrelevant information on the structure of the batch: I'm reading from a large file and splitting it into smaller files, then partitioning my step with a MultiResourcePartitioner. The processor does a few format checks, and the writer batch-inserts the items into the database.
Edit: I guess I could copy/paste the generated source, but the MapperImpl is pretty straightforward:
@Override
public Entity toEntity(DTO dto) {
    if (dto == null) {
        return null;
    }
    Entity entity = new Entity();
    [Bunch of controls and mapping]
    return entity;
}
That's pretty much it.
Thank you for your help
return mapper.toEntity(dto);
Perhaps the problem is in the mapper implementation. It's hard to say how the mapper works without the implementation source.
Mistake from coding during the night, I guess. The processor was not declared for the step, so items were going straight from the reader to the writer without being processed and transformed into entities.
@Bean
@StepScope
public ItemReader<DTO> itemReader() {
    [...] Reading from database and mapping into DTO class
    return reader;
}

@Bean
@StepScope
public ItemProcessor<DTO, Entity> itemProcessor(Mapper mapper) {
    return dto -> {
        dto.check();
        return mapper.toEntity(dto);
    };
}

@Bean
@StepScope
public ItemWriter<Entity> itemWriter() {
    [...] Save into database from repository
    return writer;
}

@Bean
public Step step() {
    return stepBuilderFactory.get("step")
            .<DTO, Entity>chunk(500)
            .reader(itemReader)
            .processor(itemProcessor) // Edit with solution: this line was missing
            .writer(itemWriter)
            .build();
}
Still wondering why it compiled, though. (It does compile: the reader satisfies ItemReader<? extends DTO>, the writer satisfies ItemWriter<? super Entity>, and the processor is simply optional, so thanks to type erasure nothing at runtime stops the unprocessed DTO items from reaching the writer.)
I have a complex job flow with 3 separate jobs built into a JobStep, and I call that JobStep from a Job. There will be four of these JobSteps running in parallel from the calling job.
I need to pass a string into each of them as a parameter.
Somewhat simplified code:
My main looks like this:
public static void main(String[] args) {
    SpringApplication.run(SomeApplication.class, args);
}
One of the JobSteps looks like this:
@Bean
public JobStep jobStep1(<snip>) {
    <snip for clarity>
    JobStep jobStep = new JobStep();
    jobStep.setJob(jobs.get(jobName)
            .incrementer(new RunIdIncrementer()).listener(listener)
            .start(Flow1)
            .next(Flow2)
            .next(Flow3)
            .end().build());
    jobStep.setJobRepository(jobRepository);
    jobStep.setJobLauncher(jobLauncher);
    return jobStep;
}
The top job that runs the rest looks like this:
@Bean
public Job parentJob(<snip>) {
    Flow childJobFlow = new FlowBuilder<SimpleFlow>("childJob").start(job1).build();
    Flow childJobFlow2 = new FlowBuilder<SimpleFlow>("childJob2").start(job2).build();
    FlowBuilder<SimpleFlow> builder = new FlowBuilder<SimpleFlow>("jobFlow");
    Flow jobFlow = builder.split(new SimpleAsyncTaskExecutor()).add(childJobFlow, childJobFlow2).build();
    return jobs.get("parentJob")
            .incrementer(new RunIdIncrementer()).listener(listener)
            .start(jobFlow)
            .end().build();
}
I need each JobStep to get a different string.
I was able to accomplish this, as Nghia Do suggested in his comment, by using a Partitioner. With the Partitioner I was able to push a string onto the context and then retrieve it in a @BeforeStep method.
In my ItemReader I have:
private ExecutionContext executionContext;
private String keyString;

@BeforeStep
public void beforeStep(StepExecution stepExecution) throws Exception {
    this.executionContext = stepExecution.getExecutionContext();
    this.keyString = this.executionContext.getString("keyString");
}
The Partitioner:
@Override
public Map<String, ExecutionContext> partition(int gridSize) {
    Map<String, ExecutionContext> partitionMap = new HashMap<String, ExecutionContext>();
    List<String> codes = getCodes();
    for (String code : codes) {
        ExecutionContext context = new ExecutionContext();
        context.put("keyString", code);
        partitionMap.put(code, context);
    }
    return partitionMap;
}
getCodes is just a placeholder function right now that returns a list of strings for testing. Eventually it will be replaced with something more useful.
private List<String> getCodes() {
    ArrayList<String> result = new ArrayList<String>();
    result.add("One");
    result.add("Two");
    result.add("Three");
    result.add("Four");
    result.add("Five");
    result.add("Six");
    result.add("Seven");
    return result;
}
Then, to get the steps, I had to make a master step that calls my existing steps:
@Bean
public Step masterStep(@Value("#{proccessFilesStep}") Step readFilesStep) {
    return stepBuilders.get("masterStep")
            .partitioner(readFilesStep)
            .partitioner("proccessFilesStep", partitioner())
            .taskExecutor(taskExecutor())
            .build();
}
And stepBuilders is:
@Autowired
private StepBuilderFactory stepBuilders;
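The taskExecutor() bean referenced in masterStep is not shown in the original answer either. Any TaskExecutor works; for example (a sketch, with an arbitrary concurrency limit):
@Bean
public TaskExecutor taskExecutor() {
    SimpleAsyncTaskExecutor executor = new SimpleAsyncTaskExecutor("partition-");
    executor.setConcurrencyLimit(7); // e.g. one thread per code returned by getCodes()
    return executor;
}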
I had to combine about 20 different examples from around the net to get all the pieces, so I am putting them all in this answer for the next person who needs it.
I have a web application that takes input from a user and uses it to generate a report based on the results of calling various external web services.
I want to track the progress of the report generation, being able to see the status, start time and stop time of each step.
I've added the domain objects Job and JobStep:
@Entity
@Table(name="jobs")
@Data
@EqualsAndHashCode(callSuper=false, of={ "id" })
@ToString()
public class Job extends DomainObject {

    @NotNull
    @OneToMany(cascade=CascadeType.ALL)
    @JoinColumn(name="job_id")
    private Set<JobStep> steps = new TreeSet<JobStep>();

    // Hibernate also uses this no-arg constructor.
    public Job() {
        // Create all the steps in the beginning with the default settings:
        // status=WAITING, start/stop both null.
        for (JobStep.Type stepType : JobStep.Type.values()) {
            JobStep step = new JobStep(stepType);
            steps.add(step);
        }
    }

    public Set<JobStep> getSteps() {
        return steps;
    }

    public void startStep(JobStep.Type stepType) {
        for (JobStep step : steps) {
            if (step.getType() == stepType) {
                step.start();
                return;
            }
        }
    }

    public void stopStep(JobStep.Type stepType, JobStep.Status status) {
        for (JobStep step : steps) {
            if (step.getType() == stepType) {
                step.stop(status);
                return;
            }
        }
    }
}
@Entity
@Table(name="job_steps")
@Data
@EqualsAndHashCode(callSuper=false, of={ "type", "job" })
@ToString
public class JobStep extends DomainObject implements Comparable<JobStep> {

    private static final Logger LOG = LoggerFactory.getLogger(JobStep.class);

    public enum Type {
        TEST_STEP1,
        TEST_STEP2,
        TEST_STEP3
    }

    public enum Status {
        WAITING,
        RUNNING,
        FINISHED,
        ERROR
    }

    @NotNull
    @Getter
    @Enumerated(EnumType.STRING)
    private Type type;

    @NotNull
    @Setter(AccessLevel.NONE)
    @Enumerated(EnumType.STRING)
    private Status status = Status.WAITING;

    @Setter(AccessLevel.NONE)
    private DateTime start = null;

    @Setter(AccessLevel.NONE)
    private DateTime stop = null;

    @ManyToOne
    private Job job;

    protected JobStep() {/* Hibernate requirement */}

    public JobStep(Type type) {
        this.type = type;
    }

    public void start() {
        assert(status == Status.WAITING);
        status = Status.RUNNING;
        start = new DateTime();
    }

    public void stop(Status newStatus) {
        assert(newStatus == Status.FINISHED ||
               newStatus == Status.ERROR);
        assert(status == Status.RUNNING);
        status = newStatus;
        stop = new DateTime();
    }

    @Override
    public int compareTo(final JobStep o) {
        return getType().compareTo(o.getType());
    }
}
These are manipulated using the JobService class:
@Service
public class JobService {

    private static final Logger LOG = LoggerFactory.getLogger(JobService.class);

    @Autowired
    private JobDAO jobDao;

    @Transactional
    public void createJob() {
        Job job = new Job();
        Long id = jobDao.create(job);
        LOG.info("Created job: {}", id);
    }

    @Transactional
    public Job getJob(Long id) {
        return jobDao.get(id);
    }

    @Transactional
    public void startJobStep(Job job, JobStep.Type stepType) {
        LOG.debug("Starting JobStep '{}' for Job {}", stepType, job.getId());
        job.startStep(stepType);
    }

    @Transactional
    public void stopJobStep(Job job, JobStep.Type stepType,
                            JobStep.Status status) {
        LOG.debug("Stopping JobStep '{}' for Job {} with status {}", stepType,
                job.getId(), status);
        job.stopStep(stepType, status);
    }
}
So in a method that starts a step, I can write:
class Foo {

    @Autowired
    JobService jobService;

    public void methodThatStartsAStep(Job job) {
        jobService.startJobStep(job, JobStep.Type.TEST_STEP1);
        // Implementation here
    }
}
The problem I'm having is finding a way to give the Job instance to the method that requires it in order to record that the step has started.
The obvious solution is to pass the Job as a parameter (as above), but it doesn't always make sense to pass a Job; it's only there to record the step (extreme example below):
public int multiplySomeNumbers(Job job, int num1, int num2) {
    jobService.startJobStep(job, JobStep.Type.TEST_STEP1);
    // Implementation here.
}
I have two thoughts on an ideal solution:
Use an aspect and annotate the functions that can cause a change in the job step state. This makes things less coupled, but the aspect would still need to get the Job from somewhere;
Store the Job object or its id in a global-like scope (e.g. a session or context). I tried using @Scope("session") on my JobService with the intention of storing the Job instance there, but I kept getting java.lang.IllegalStateException: No thread-bound request found. I'm not even sure this is the right use case for such a solution.
My questions are:
Is it possible to store the Job or its id somewhere so that I don't have to add the Job as a parameter to every such method?
Is there a way of doing this that I'm not aware of?
re: question 2, I'm going to go out on a limb and take the widest definition of that question possible.
You seem to be reimplementing Spring Batch. Batch has extensive support for defining and executing jobs, persisting job progress, and supporting resumption. It also has contexts for remembering state and moving state from one step to another, chunk-oriented processing, and a generally well thought out and extensive infrastructure, including a bunch of readers and writers for common workflows.
Feel free to ignore this answer, I just wanted to throw the suggestion out there in case it spares you a ton of work.
You can keep it in a ThreadLocal and access the Job object directly from there, or you can create a custom Spring scope (for more info about custom scopes, see http://springindepth.com/book/in-depth-ioc-scope.html). You can then define the Job in the custom scope and inject it into your beans.
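For example, a bare-bones holder class (all names hypothetical) might look like:
public final class JobContextHolder {

    private static final ThreadLocal<Job> CURRENT_JOB = new ThreadLocal<>();

    private JobContextHolder() {}

    public static void set(Job job) {
        CURRENT_JOB.set(job);
    }

    public static Job get() {
        return CURRENT_JOB.get();
    }

    // Always clear when the work is done, otherwise the Job leaks into the
    // next task that reuses this pooled thread.
    public static void clear() {
        CURRENT_JOB.remove();
    }
}
With that, methodThatStartsAStep no longer needs the Job parameter: it can call jobService.startJobStep(JobContextHolder.get(), JobStep.Type.TEST_STEP1), provided something sets the Job at the start of the request and clears it at the end.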
EDIT: This will work only if your entire process runs in a single thread. If your job steps are also static, you can follow the process you mentioned. If your jobs are not static (meaning the external services called, or their order, may change based on the input), I would implement the Chain of Responsibility and Command patterns (commands as the actual processing, the chain as your job steps); then you can track, stop, or change the steps based on configuration.