We have created a simple Spring Batch job with a single step, using custom ItemReader and ItemWriter implementations. The ItemReader gets its initial data from a job parameter. The batch runs perfectly as a standalone Java process, but we want to host it on a server, so we created a REST service to start it. The service calls the job URL and passes a parameter, which is handed to the batch as a job parameter. The service and the job run fine when called with a single parameter.
But when we call the service more than once (twice, for testing), the batch behaves strangely. We pass different job parameters, yet when the second execution starts, the job parameter value received by the ItemReader is the same as for the first execution. The two executions also interfere with each other: they share a database connection, mix up the retrieved data, and so on.
We tried setting the restartable attribute to false, but it didn't help. We also tried the solution from the following question:
Can we create multiple instances of a same java(spring) batch job?
That solution started producing an "Interrupted attempting lock" error in JBoss.
On further investigation we found that the ItemReader is initialized only once. That is why it keeps the job parameter value of the first execution and interferes with it.
EDIT
Following is the job configuration:
<bean id="jobLauncher"
      class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
    <property name="jobRepository" ref="jobRepository" />
</bean>
<job id="jobid" restartable="false">
    <step id="step1">
        <tasklet>
            <chunk reader="reader" writer="writer"
                   commit-interval="2">
            </chunk>
        </tasklet>
    </step>
</job>
Following is the code snippet to launch the job:
JobLauncher jobLauncher = (JobLauncher) context.getBean("jobLauncher");
Job job = (Job) context.getBean("jobid");
try {
    JobParameters param = new JobParametersBuilder().addString("key", "value").toJobParameters();
    JobExecution execution = jobLauncher.run(job, param);
} catch (Exception e) {
    e.printStackTrace();
}
Can anyone please suggest some solution? Am I missing some configuration for the step?
Thanks in advance.
I found that this works if we create the context and the JobLauncher statically, that is, if there is only one instance of each of these two objects. In this way, we can launch the same job multiple times, but with different parameters.
class MyClass {
    private static ConfigurableApplicationContext context = null;
    private static JobLauncher jobLauncher = null;

    static {
        String[] springConfig = {BatchTokeniserConstants.SPRING_CONFIG_FILE_NAME};
        try {
            // Create the context and the launcher once; every request reuses them
            context = new ClassPathXmlApplicationContext(springConfig);
            jobLauncher = (JobLauncher) context.getBean("jobLauncher");
            BatchTokeniserUtils.loadSystemVaiables();
        } catch (BeansException e) {
            // Fail fast instead of silently leaving jobLauncher null
            throw new ExceptionInInitializerError(e);
        }
    }
}
Now the jobLauncher can be used to launch any job any number of times.
I hope it helps others.
Related
I'm using Spring Batch with Spring Cloud Task. I have the following configuration in my job:
@Bean
public Job jobDemo(
        @Value("${jobname}") String jobName,
        JobBuilderFactory jobBuilderFactory,
        JobCompletionNotificationListener listener
) {
    return jobBuilderFactory.get(jobName)
            .incrementer(new RunIdIncrementer())
            .preventRestart()
            .listener(listener)
            .flow(stepA())
            .end()
            .build();
}
I don't want restart functionality in this job, which is why I added .preventRestart(). I want a new job instance to be launched every time the task runs, even if the previous run failed, was stopped, or ended in any other way. But I'm getting the following error:
org.springframework.batch.core.repository.JobRestartException: JobInstance already exists and is not restartable
This happens only when the job does not finish successfully. Any ideas about a solution?
A JobInstance can only be completed successfully once. When you start a Spring Batch job via Spring Boot, Spring Batch handles the logic to increment a JobParameter if a JobParametersIncrementer is provided (as you have). However, when Spring Batch does that incrementing, it only increments if the previous job was successful. In your case, you want it to always increment. Because of that, you're going to need to write your own CommandLineRunner that always increments the JobParameters.
Spring Boot's JobLauncherCommandLineRunner is where the code to launch a job lives. You'll probably want to extend it and override its execute method so that job parameters are always incremented.
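The rule described in this answer can be illustrated with a small plain-Java model. This is only a sketch and not Spring Batch code: JobInstanceModel, launch and launchWithFreshRunId are hypothetical names, and the in-memory set merely stands in for the job repository. It assumes that a job instance is identified by the job name plus a run id, and that a non-restartable job rejects a second launch of an existing instance.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified model (not the Spring Batch API): a job instance is identified by
// job name + run id, and a non-restartable job may not reuse an existing instance.
final class JobInstanceModel {
    private final Set<String> knownInstances = new HashSet<>();
    private long lastRunId = 0;

    // Mimics launching a non-restartable job: an already known instance is rejected.
    void launch(String jobName, long runId) {
        String instanceKey = jobName + "#run.id=" + runId;
        if (!knownInstances.add(instanceKey)) {
            throw new IllegalStateException(
                    "JobInstance already exists and is not restartable: " + instanceKey);
        }
    }

    // Always bumps the run id first, regardless of how the previous run ended,
    // so every launch creates a new instance.
    long launchWithFreshRunId(String jobName) {
        lastRunId++;
        launch(jobName, lastRunId);
        return lastRunId;
    }
}
```

In this model, reusing the same run id reproduces the "already exists" failure, while launchWithFreshRunId never collides because the id changes on every call, whether or not the previous run succeeded.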
I am using Spring Batch, but because of the "job instance already exists" error I need to add the current time to my job parameters. I am unable to figure out where to add job parameters. Here is my code:
<step id="myStep">
    <tasklet>
        <chunk reader="myReader" processor="myProcessor" writer="myWriter" commit-interval="6000" skip-limit="9000">
            <!-- some more code -->
        </chunk>
    </tasklet>
</step>
<bean id="myReader" class="org.springframework.batch.item.database.StoredProcedureItemReader" scope="step">
    <!-- define properties for datasource, procedureName, rowMapper, parameters -->
    <property name="preparedStatementSetter" ref="myPreparedStatmentSetter" />
</bean>
<bean id="myPreparedStatmentSetter" class="com.mypackage.MyPreparedStatementSetter" scope="step">
    <property name="kId" value="#{jobParameters['kId']}" />
</bean>
When I try to run the job for the same kId multiple times, I get the "job instance already exists" error, so I need to add the current timestamp to my job parameters.
Would adding the current timestamp as a property of the myPreparedStatmentSetter bean be sufficient, or do I need to add the job parameter somewhere else too? Where exactly are job parameters picked up from in the Spring file?
If I do need to add the timestamp to the bean, I have a question: my stored procedure takes only kId as a parameter and I don't need to pass the current timestamp to it, so why would I need to add it to myPreparedStatmentSetter?
Also, how would I add the current timestamp in an XML file without Java code?
EDIT
Here is my jobLauncher bean:
<bean id="jobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
    <property name="jobRepository" ref="myJobRepo" />
</bean>
Adding a "random" job parameter by hand can work, but it isn't the ideal way to get around the "job instance already exists" error. Instead, you should consider adding a JobParametersIncrementer to your job. Spring provides the RunIdIncrementer as an implementation of this out of the box. A job configured with it would look something like the following:
@Bean
public Job myJob() {
    return jobBuilderFactory.get("myJob")
            .incrementer(runIdIncrementer())
            .start(step1())
            .build();
}

@Bean
public JobParametersIncrementer runIdIncrementer() {
    return new RunIdIncrementer();
}
I am guessing that you are already adding kId to your job parameters. Pass the following as the job parameters in your jobLauncher.run() call:
new JobParametersBuilder()
        .addLong("time", System.currentTimeMillis())
        .addLong("kId", <your kId>)
        .toJobParameters();
We have a requirement to move data from one database to another and are exploring Spring Batch for this. The user of our application selects the source and target datasources along with the list of tables whose data needs to be moved.
We need help with the following:
The information necessary to build a job comes at runtime from our web application; it includes the datasource details and the list of table names. We would like to create a new job by sending these details to a job builder module and launch it using JobLauncher. How do we write this job builder module?
We may have multiple users raising data movement requests in parallel, so we need a way to create multiple jobs and run them in a suitable order.
We have used the Java based configuration to create a job and launch it from a web container. The configuration is as follows
@Bean
public Job loadDataJob(JobCompletionNotificationListener listener) {
    RunIdIncrementer inc = new RunIdIncrementer();
    inc.setKey(new Date().toString());
    JobBuilder builder = jobBuilderFactory.get("loadDataJob")
            .incrementer(inc)
            .listener(listener);
    SimpleJobBuilder simpleBuilder = builder.start(preExecute());
    for (String s : getTables()) {
        simpleBuilder.next(etlTable(s));
    }
    simpleBuilder.next(postExecute());
    return simpleBuilder.build();
}

@Bean
@Scope("prototype")
public Step etlTable(String tableName) {
    return stepBuilderFactory.get(tableName)
            .<Map<String,Object>, Map<String,Object>> chunk(1000)
            .reader(dbDataReader(tableName))
            .processor(processor())
            .writer(dbDataWriter(tableName))
            .build();
}
Currently we have hardcoded the source and target datasource details in their respective beans. getTables() returns a hardcoded list of the tables whose data needs to be moved.
The RestController that launches the job:
@RestController
public class MyController {
    @Autowired
    JobLauncher jobLauncher;
    @Autowired
    Job job;

    @RequestMapping("/launchjob")
    public String handle() throws Exception {
        try {
            JobParameters jobParameters = new JobParametersBuilder().addLong("time", new Date().getTime()).toJobParameters();
            jobLauncher.run(job, jobParameters);
        } catch (Exception e) {
        }
        return "Done";
    }
}
Concerning your first question, you definitely have to use Java configuration. Moreover, you shouldn't define your steps as Spring beans if you want to create a job with a dynamic number of steps (for instance, one step per table you have to copy).
I've written a couple of answers to questions about how to create jobs dynamically. Have a look at them; they might be helpful:
Spring batch execute dynamically generated steps in a tasklet
Spring batch repeat step ending up in never ending loop
Spring Batch - How to generate parallel steps based on params created in a previous step
Spring Batch - Looping a reader/processor/writer step
Edited
Some remarks concerning your second question:
Firstly, you are using a normal JobLauncher, and I assume you instantiate the SimpleJobLauncher. This means you can provide a job with job parameters, as you have shown in your code above. However, the provided job does not have to be a Spring bean instance, so you don't have to autowire it; you can use create methods instead, as I suggested in the answers to the questions mentioned above.
Secondly, if you create your Job instance dynamically for every request, there is no need to pass the whole configuration as job parameters, since you can pass "configuration properties" such as the datasource and the tables to be copied directly as arguments to your "createJob" method. You could even create your DataSource instances on the fly if you don't know all possible datasources in advance.
Thirdly, I would consider every request a single run that cannot be restarted. Hence, I'd just put some meta information into the job parameters, such as user, date/time, datasource names (URLs) and the list of tables to be copied. I would use this information only for logging and auditing which requests were issued, not as control parameters inside the job itself. Again, you can pass these values to your create methods when the job and steps are constructed, so the structure of your job is built according to your parameters; at runtime, when you could access the job parameters, there is nothing left to decide based on them.
Finally, if a request fails (meaning the job exits with an error), a new request simply has to be executed in order to retry. This would be a completely new request and not a restart of an already executed job launch (since I would add the request time to my job parameters, every launch would be unique).
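The idea of deriving the job structure from the request, rather than from beans wired once at startup, can be sketched in plain Java. This is only a model under stated assumptions, not Spring Batch code: DynamicJobPlan and buildStepNames are hypothetical names, the "etl-" prefix is made up for illustration, and the strings stand in for the Step objects a real create method would build with the step builder.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model (not the Spring Batch API): the step sequence is built per
// request from the table list instead of being wired once as singleton beans.
final class DynamicJobPlan {
    // Returns the ordered step names for one request:
    // a pre step, one copy step per table, then a post step.
    static List<String> buildStepNames(List<String> tables) {
        List<String> steps = new ArrayList<>();
        steps.add("preExecute");
        for (String table : tables) {
            steps.add("etl-" + table);
        }
        steps.add("postExecute");
        return steps;
    }
}
```

Because the table list is an ordinary method argument, two parallel requests with different tables each get their own plan and do not share any state.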
Edited 2:
Not creating the Job as a bean doesn't mean you can't use autowiring. Here is an example of how I would structure my beans.
@Component
@EnableBatchProcessing
@Import() // list the imports as needed
public class JobCreatorComponent {
    @Autowired
    private StepBuilderFactory stepBuilder;
    @Autowired
    private JobBuilderFactory jobBuilder;

    public Job createJob(all the parameters you need) {
        return jobBuilder.get(). ....
    }
}

@RestController
@Import(JobCreatorComponent.class)
public class MyController {
    @Autowired
    JobLauncher jobLauncher;
    @Autowired
    JobCreatorComponent jobCreator;

    @RequestMapping("/launchjob")
    public String handle() throws Exception {
        try {
            Job job = jobCreator.createJob(... params ...);
            JobParameters jobParameters = new JobParametersBuilder().addLong("time", new Date().getTime()).toJobParameters();
            jobLauncher.run(job, jobParameters);
        } catch (Exception e) {
        }
        return "Done";
    }
}
By using @JobScope on the ItemReader there is no need to do anything manually at run time: just annotate the respective reader with @JobScope, and each call to the controller will process a fresh set of records.
This is a kind of on-demand job, which you can run for tasks such as a database migration or generating a specific report.
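Why scoping the reader helps can be seen in a small plain-Java model. It is only a sketch and does not use Spring: ReaderScopeModel, Reader, runWithSingleton and runWithScopedReader are hypothetical names. It assumes that a singleton reader captures its parameter when it is first created, whereas a job-scoped reader is created anew for every run.

```java
import java.util.function.Function;

// Simplified model (not the Spring Batch API) of singleton versus job-scoped readers.
final class ReaderScopeModel {
    // A reader that captures its parameter once, at construction time.
    static final class Reader {
        private final String parameter;

        Reader(String parameter) {
            this.parameter = parameter;
        }

        String read() {
            return "processing " + parameter;
        }
    }

    private Reader singleton;
    private final Function<String, Reader> readerFactory = Reader::new;

    // Singleton behaviour: the first parameter wins and is reused by later runs.
    String runWithSingleton(String parameter) {
        if (singleton == null) {
            singleton = new Reader(parameter);
        }
        return singleton.read();
    }

    // Job-scoped behaviour: a fresh reader is created for every run.
    String runWithScopedReader(String parameter) {
        return readerFactory.apply(parameter).read();
    }
}
```

In the model, a second call to runWithSingleton with a different parameter still reports the first one, which mirrors the symptom of a reader that is initialized only once; runWithScopedReader always sees the current parameter.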
Is there any way to find out whether a job has been restarted in Spring Batch?
We provide some Tasklets without Spring Batch's restart support and have to implement our own handling if a job is restarted.
I can't find any way to do this in JobRepository, JobOperator, JobExplorer, etc.
Define a JobExplorer bean with the required properties:
<bean id="jobExplorer"
      class="org.springframework.batch.core.explore.support.JobExplorerFactoryBean">
    <property name="dataSource" ref="dataSource"/>
    <property name="lobHandler" ref="lobHandler"/>
</bean>
Query it with your jobName:
// getJobInstances takes paging arguments (start index, count); 0 and 100 are example values
List<JobInstance> jobInstances = jobExplorer.getJobInstances(jobName, 0, 100);
for (JobInstance jobInstance : jobInstances) {
    List<JobExecution> jobExecutions = jobExplorer.getJobExecutions(jobInstance);
    for (JobExecution jobExecution : jobExecutions) {
        if (jobExecution.getExitStatus().equals(ExitStatus.COMPLETED)) {
            // You found a completed job, possible candidate for a restart
            // You may check if the job is restarted by comparing job parameters
            // (on JobExecution since Spring Batch 2.2; older versions expose them on JobInstance)
            JobParameters jobParameters = jobExecution.getJobParameters();
            // Check whether your running job has the same job parameters
        }
    }
}
I did not compile this, but I hope it gives an idea.
Another way to use jobExplorer is to evaluate the following expression:
jobExplorer.getJobExecutions(jobExplorer.getJobInstance(currentJobExecution.getJobInstance().getId())).size() > 1;
This statement checks whether another execution of the same job instance (same id) exists. In a reasonably controlled environment, that other execution can only be one that failed or was stopped, so the current execution must be a restart.
You can potentially find this information in Spring Batch's database tables. I can't remember the exact table name, but you can figure it out quickly because there are only a few tables. I guess there is some information regarding restarts.
I have a Spring-Batch job that I launch from a Spring MVC controller. The controller gets an uploaded file from the user and the job is supposed to process the file:
@RequestMapping(value = "/upload")
public ModelAndView uploadInventory(UploadFile uploadFile, BindingResult bindingResult) {
    // code for saving the uploaded file to disk goes here...
    // now I want to launch the job of reading the file line by line and saving it to the database,
    // but I want to launch this job in a new thread, not in the HTTP request thread,
    // since I do not want the user to wait until the job ends.
    jobLauncher.run(
            jobRegistry.getJob(JOB_NAME),
            new JobParametersBuilder().addString("targetDirectory", folderPath).addString("targetFile", fileName).toJobParameters()
    );
    return mav;
}
I've tried the following XML config:
<job id="writeProductsJob" xmlns="http://www.springframework.org/schema/batch">
    <step id="readWrite">
        <tasklet task-executor="taskExecutor">
            <chunk reader="productItemReader" writer="productItemWriter" commit-interval="10" />
        </tasklet>
    </step>
</job>
<bean id="taskExecutor"
      class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
    <property name="corePoolSize" value="5" />
    <property name="maxPoolSize" value="5" />
</bean>
...but it seems like the multithreading happens only within the job boundaries itself. I.e., the controller thread waits until the job ends, and the job execution is handled by multiple threads (which is good but not the main thing I wanted). The main thing I wanted is that the job will be launched on a separate thread (or threads) while the controller thread will continue its execution without waiting for the job threads to end.
Is there a way to achieve this with Spring-batch?
The official documentation describes your exact problem and a solution in 4.5.2. Running Jobs from within a Web Container:
[...] The controller launches a Job using a JobLauncher that has been configured to launch asynchronously, which immediately returns a JobExecution. The Job will likely still be running, however, this nonblocking behaviour allows the controller to return immediately, which is required when handling an HttpRequest.
Spring Batch http://static.springsource.org/spring-batch/reference/html-single/images/launch-from-request.png
So you were pretty close in trying to use TaskExecutor, however it needs to be passed to the JobLauncher instead:
<bean id="jobLauncher"
      class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
    <property name="jobRepository" ref="jobRepository" />
    <property name="taskExecutor" ref="taskExecutor"/>
</bean>
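The effect of giving the launcher a task executor can be modelled with the standard library alone. This is only a sketch and not Spring Batch code: AsyncLaunchModel and its launch method are hypothetical names, the pool size of 5 simply mirrors the taskExecutor bean from the question, and the Future stands in for the JobExecution handle that an asynchronous launcher returns.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Simplified model (not the Spring Batch API) of a launcher backed by an executor.
final class AsyncLaunchModel {
    private final ExecutorService executor = Executors.newFixedThreadPool(5);

    // Hands the job to the pool and returns at once; the Future plays the role
    // of the handle that the caller can inspect later.
    Future<String> launch(Runnable job) {
        return executor.submit(job, "COMPLETED");
    }

    // Stops accepting new jobs; jobs already submitted still run to completion.
    void shutdown() {
        executor.shutdown();
    }
}
```

A controller that calls launch returns to the client immediately, while the job keeps running on a pool thread; the caller can poll the returned handle later if it needs the outcome.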
Disclaimer: I have never used Spring Batch...
The jobLauncher.run() method can be called in a new Thread like so:
@RequestMapping(value = "/upload")
public ModelAndView uploadInventory(UploadFile uploadFile, BindingResult bindingResult) {
    [...]
    final SomeObject jobLauncher = [...]
    Thread thread = new Thread() {
        @Override
        public void run() {
            jobLauncher.run([...]);
        }
    };
    thread.start();
    return mav;
}
The thread.start() line will spawn a new thread, and then continue to execute the code below it.
Note that, if jobLauncher is a local variable, it must be declared final in order for it to be used inside of the anonymous Thread class.
If you don't need to show processing errors to your client, you can start the Spring Batch job in a separate thread.