Synchronize Batch Jobs across multiple Application Instances

I am writing a Spring Batch application that should run only one job instance at a time. This must also hold when multiple application instances are started. Unfortunately, the jobs can't be parallelized and are invoked at random times.
So, what I am looking for is a Spring Boot configuration that lets me synchronize job execution within a single process as well as in the distributed case. I have already found some approaches, such as the JobLauncherSynchronizer (https://docs.spring.io/spring-batch-admin/trunk/apidocs/org/springframework/batch/admin/launch/JobLauncherSynchronizer.html), but every solution I have found either works only within a single process or protects just a fraction of the job execution.
Is there any Spring Boot configuration that prevents multiple concurrent executions of the same job, even across multiple concurrently running application instances (which share the same database)?
Thank you in advance.

Is there any Spring Boot configuration that prevents multiple concurrent executions of the same job, even across multiple concurrently running application instances (which share the same database)?
Not to my knowledge. If you really want global synchronization at the job level (i.e. a single job instance at a time), you need a global synchronizer like the JobLauncherSynchronizer you linked to.

Related

Quartz: Multiple Jobs that share a resource

I have multiple jobs and they all share the same resource. This resource is an ad-hoc build script, so it cannot be run concurrently.
Is it possible to define in Quartz that some jobs cannot run concurrently?
That is, if one of the jobs is already running, the newly spawned job is queued.

I encountered a similar scenario in my application. Try the approach below and see if it works for you.
Put the code that runs your ad-hoc build script in a synchronized block.
That way only one thread runs the script at a time, even when multiple threads try to use the same resource.
It also means you can raise the thread count to a suitable value instead of setting it to 1, like below.
spring.quartz.properties.org.quartz.threadPool.threadCount=5
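As a minimal sketch of the synchronized-block idea, assuming all jobs run in the same JVM: the class and method names below (BuildScriptRunner, runExclusively, runFromPool) are hypothetical and not part of Quartz. Each job would call runExclusively from its execute method; the counters only exist to show that no two runs overlap.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: serializes access to a resource that must not run concurrently. */
final class BuildScriptRunner {
    // One lock object shared by every job in this JVM.
    private static final Object LOCK = new Object();
    private static int active = 0;     // runs currently inside the block
    private static int maxActive = 0;  // highest overlap ever observed

    /** Runs the script while holding the lock; other callers wait their turn. */
    static void runExclusively(Runnable script) {
        synchronized (LOCK) {
            active++;
            maxActive = Math.max(maxActive, active);
            try {
                script.run();
            } finally {
                active--;
            }
        }
    }

    /** Highest number of scripts that were ever inside the block at once. */
    static int maxConcurrentRuns() {
        synchronized (LOCK) {
            return maxActive;
        }
    }

    /** Simulates several scheduler threads firing the same script at once. */
    static void runFromPool(int threads, Runnable script) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> runExclusively(script));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("scripts did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Note that a synchronized block only coordinates threads inside one JVM; it does nothing for scheduler instances running on other machines.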
If you want to run multiple Quartz scheduler instances on different machines sharing a single database, you should consider configuring clustering with JDBC-JobStore. See this link for more information: http://www.quartz-scheduler.org/documentation/quartz-2.2.2/configuration/ConfigJDBCJobStoreClustering.html

Avoiding concurrency in Spring Batch jobs in a cluster environment

I want to ensure that a Spring job is not started a second time while it still runs. This would be trivial in a single JVM environment.
However, how can I achieve this in a cluster environment? More specifically, I am on JBoss 5.1 (a bit antiquated, I know); if solutions exist for later versions, I'd be interested in those as well.
So, it should be kind of a Singleton pattern across all cluster nodes.
I am considering using database locks or a message queue. Is there a simpler / better performing solution?

You need to synchronize threads that know nothing about each other, so the easiest way is to share some information in a common place. Valid alternatives are:
A shared database
A shared file
An external web service holding the status of the batch process
If you prefer a shared database, consider something like Redis to improve performance. It is an in-memory database with persistence on disk, so reading the status of the batch process should be fast enough.
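The shared-file alternative can be sketched with the JDK's file locking, under the assumption that every node can reach the same lock file and that the file system honours locks (behaviour on network file systems is platform-dependent). SharedFileGuard and runIfFree are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical guard: runs a job only while holding an exclusive lock on a shared file. */
final class SharedFileGuard {

    /** Returns true if the job ran, false if the lock was already taken. */
    static boolean runIfFree(Path lockFile, Runnable job) {
        try (FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            FileLock lock;
            try {
                // Non-blocking attempt; null means another process holds the lock.
                lock = channel.tryLock();
            } catch (OverlappingFileLockException e) {
                // This JVM already holds a lock on the file.
                return false;
            }
            if (lock == null) {
                return false;
            }
            try {
                job.run();
                return true;
            } finally {
                lock.release();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A node that gets false back can skip the run or retry later. The operating system drops the lock if the process dies, which avoids the stale "running" flag that a plain status file or status row can leave behind.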

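This is late, but for future lookups: Spring Batch records executions in its JobRepository, which is backed by the shared database (through JDBC rather than JPA), and it refuses to start a job instance that is already marked as running there, so you can avoid concurrency that way.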

You can add a job listener and, in its before callback, use JobExecutionDao to find all JobExecutions. If more than one is running, throw an exception and exit the job.

Behavior of executor service in cluster

I have written some code using an executor service in Java. It creates 10 worker threads to process rows fetched from the database, and each thread is assigned one result row. This approach works fine when the application is deployed and running on a single instance/node.
Can anyone suggest how this will behave when my application is deployed on multiple nodes in a cluster?
Do I have to take care of any part of the code before deploying to a cluster?
04/12/15: Any more suggestions?

You should consider the overhead of each task. Unless each task is of at least moderate size, you might want to batch them.
In a distributed context the overhead is much higher, so you are more likely to need to batch the work.
You will need a framework, so the considerations will depend on the framework you choose.
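To illustrate the batching advice on a single node, here is a minimal sketch that submits one task per chunk of rows instead of one task per row. RowBatcher, partition and processInBatches are hypothetical names, the rows are plain integers standing in for database rows, and the per-batch work is a placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper: groups rows into batches so each task carries enough work. */
final class RowBatcher {

    /** Splits rows into consecutive chunks of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            int end = Math.min(i + batchSize, rows.size());
            batches.add(new ArrayList<>(rows.subList(i, end)));
        }
        return batches;
    }

    /** Submits one task per batch and returns the total number of rows handled. */
    static int processInBatches(List<Integer> rows, int batchSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<Integer> batch : partition(rows, batchSize)) {
                // Placeholder work: a real task would process every row in the batch.
                Callable<Integer> task = () -> batch.size();
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

This does not by itself stop two nodes from fetching the same rows; in a cluster each node would still need some way to claim its own batches.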

How to abandon zombie jobs?

I am using Spring Batch in a web application. The jobs are run by a custom TaskExecutor that manages the submitted Runnables in a queue and executes them one after another. Thus many jobs have the status STARTING or STARTED for a long time.
Now it happened that the server was shut down while there were still jobs in the queue. After the server restarts, the jobs are still marked as running, but they should be abandoned instead.
How can this be done?

Spring Batch provides no tools out of the box for something like this. The reason is that it typically requires a human decision about the conditions under which this behavior is appropriate.
That being said, in your case, if you are not running in a clustered environment, you could create a component that, on the initialization of the application context, examines the job repository and updates the statuses as required (similar to how you can initialize a datasource on startup).

Clustered Quartz scheduler configuration

I'm working on an application that uses Quartz for scheduling Jobs. The Jobs to be scheduled are created programmatically by reading a properties file. My question is: if I have a cluster of several nodes which of these should create schedules programmatically? Only one of these? Or maybe all?

I have used Quartz in a web app where users, among other things, could create Quartz jobs that performed certain tasks.
We had no problems in that app, provided that at least the job names were different for each job. You can also have different group names, and if I remember correctly the job group + job name combination forms the job key.
We also had no problem creating and running the jobs from different nodes, but Quartz at the time (some 6 months ago; I do not believe this has changed, but I am not sure) did not offer a way to stop jobs across the cluster. It could only stop jobs on the node where the stop command was executed.
If instead you just want to create a fixed number of jobs when the application starts, you had better delegate that task to one of the nodes, because the job names/groups will be read from the same properties file on each node, and conflicts will arise.

Have you tried creating them on all of them? I think you would get some conflict because of duplicate names.
So I think one of the members should create the schedules during startup.

You should only have one system scheduling jobs for the cluster if they are predefined in properties like you say. If all of the systems did it you would needlessly recreate the jobs and maybe put them in a weird state if every server made or deleted the same jobs and triggers.
You could simply only deploy the properties for the jobs to one server and then only one server would try to create them.
You could make a separate app that has the purpose of scheduling the jobs and only run it once.
If these are web servers, you could make a simple secured REST API that triggers the scheduling process. Then you could write an automated script that calls the API and kicks off the scheduling of jobs as part of a deployment, or whenever else you want. If you have multiple servers behind a load balancer, the request should reach only one server, which schedules the jobs, and Quartz saves them to the database-backed job store. The other nodes in the cluster would pick them up the next time they update from the database.
