How to abandon zombie jobs? - java

I am using Spring Batch in a web application. The jobs are run by a custom TaskExecutor that manages the submitted Runnables in a queue and executes them one after another. Thus many jobs have the status STARTING or STARTED for a long time.
Now it happened that the server was shut down while there were still jobs in the queue. After a restart of the server, those jobs are still marked as running, but they should instead be abandoned.
How can this be done?

Spring Batch provides no out-of-the-box tooling for something like this. The reason is that deciding under what conditions this kind of clean-up is appropriate typically requires a human decision of some kind.
That being said, in your case, if you are not running in a clustered environment, you could create a component that, on the initialization of the application context, examines the job repository and updates the statuses as required (similar to how you can initialize a datasource on startup).
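For example, a minimal sketch of such a startup component (assuming Spring Batch 4.x APIs; the class name StaleJobCleaner is hypothetical) could mark everything still flagged as running as ABANDONED:

```java
import java.util.Date;

import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.beans.factory.InitializingBean;
import org.springframework.stereotype.Component;

// Hypothetical startup bean: any execution still STARTING/STARTED when the
// context comes up is a leftover from the previous (crashed or stopped) JVM.
@Component
public class StaleJobCleaner implements InitializingBean {

    private final JobExplorer jobExplorer;
    private final JobRepository jobRepository;

    public StaleJobCleaner(JobExplorer jobExplorer, JobRepository jobRepository) {
        this.jobExplorer = jobExplorer;
        this.jobRepository = jobRepository;
    }

    @Override
    public void afterPropertiesSet() {
        for (String jobName : jobExplorer.getJobNames()) {
            for (JobExecution execution : jobExplorer.findRunningJobExecutions(jobName)) {
                execution.setStatus(BatchStatus.ABANDONED);
                execution.setEndTime(new Date());
                execution.setExitStatus(ExitStatus.UNKNOWN);
                jobRepository.update(execution);
            }
        }
    }
}
```

Registering this as a bean is enough; afterPropertiesSet runs once the JobExplorer and JobRepository are wired, before any new launches. As noted above, this is only safe outside a clustered environment, since another node may genuinely still be running one of those executions.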

Related

Synchronize Batch Jobs across multiple Application Instances

I am writing a spring batch application which should only run one Job Instance at a time. This should also be true if multiple application instances are started. Sadly, the jobs can’t be parallelized and are invoked at random.
So, what I am looking for is a spring boot configuration which allows me to synchronize the job execution within one processor as well as in the distributed case. I have already found some approaches like the JobLauncherSynchronizer (https://docs.spring.io/spring-batch-admin/trunk/apidocs/org/springframework/batch/admin/launch/JobLauncherSynchronizer.html) but all the solutions I have found work either only on one processor or protect just a fraction of the job execution.
Is there any spring boot configuration which prevents multiple concurrent executions of the same job, even across multiple concurrently running application instances (which share the same database)?
Thank you in advance.
Not to my knowledge. If you really want global synchronization at the job level (i.e. a single job instance running at a time), you need a global synchronizer like the JobLauncherSynchronizer you linked to.
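As a weaker, best-effort alternative to a true global synchronizer, one can at least consult the shared job repository before launching. This sketch (the class name GuardedLauncher is hypothetical) assumes all instances share the same Spring Batch database; note that the check-then-launch window is not atomic, so it does not fully rule out races:

```java
import java.util.Set;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.batch.core.launch.JobLauncher;

// Hypothetical guard: skip the launch when the shared repository already shows
// a running execution of this job. The check and the launch are not atomic,
// so two instances can still race; a true global lock is safer.
public class GuardedLauncher {

    private final JobLauncher jobLauncher;
    private final JobExplorer jobExplorer;

    public GuardedLauncher(JobLauncher jobLauncher, JobExplorer jobExplorer) {
        this.jobLauncher = jobLauncher;
        this.jobExplorer = jobExplorer;
    }

    public boolean launchIfIdle(Job job, JobParameters params) throws Exception {
        Set<JobExecution> running = jobExplorer.findRunningJobExecutions(job.getName());
        if (!running.isEmpty()) {
            return false; // another instance appears to be running this job
        }
        jobLauncher.run(job, params);
        return true;
    }
}
```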

Spring-compatible mechanism to manage background jobs

I am working on a new functionality for a multi-tenancy web-app, which allows the admin to start a potentially very long running process (ca. 1 - 5 min) by the click of a button in the admin panel.
However it is crucial that such a task can only be executed ONCE at a time for each tenant. Of course we can disable the button after a click, but we cannot prevent the admin (or another admin) from opening another browser tab and clicking the button again.
Is there any existing library which allows us to:
Uniquely identify a job (e.g. by an id like "tenant_001_activation_task")
Start the task in the background
Query if such a task is already running in the background and if so reject any further calls to this function.
I already had a look into quartz and the Spring TaskExecutor. However these two seem to mainly focus on scheduling tasks at a given time (like a cronjob). What I'm looking for is a solution for running and monitoring a background job at any time programmatically.
If you decide to use Quartz, you can simply annotate the relevant job implementation classes with the @DisallowConcurrentExecution annotation.
Please note that this annotation is effective at the Quartz job detail level, not at the job implementation class level. Let us say you have a job implementation class com.foo.MyTenantTask and you annotate this class with @DisallowConcurrentExecution. Then you register 2 jobs that use this job implementation class - tenant_001_task and tenant_002_task.
If you run tenant_001_task and tenant_002_task, they are allowed to run concurrently, because they are different jobs (job details) even though they share an implementation class. However, if you attempt to run multiple instances of tenant_001_task concurrently, Quartz will only execute the first instance; the others are queued up and wait for the running instance to finish. Quartz then picks one queued instance of tenant_001_task and executes it, and so on, until all queued instances have been executed.
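To make the job-detail scoping concrete, here is a small sketch under those assumptions (com.foo.MyTenantTask is the class name from the answer; TenantJobs is a hypothetical helper):

```java
import org.quartz.DisallowConcurrentExecution;
import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;

// The annotation sits on the implementation class...
@DisallowConcurrentExecution
public class MyTenantTask implements Job {
    @Override
    public void execute(JobExecutionContext context) {
        // long-running tenant work goes here
    }
}

// ...but Quartz enforces it per job detail (job key): each tenant_XXX_task
// serializes with itself, while different tenants may run concurrently.
class TenantJobs {
    static JobDetail tenantTask(String tenantId) {
        return JobBuilder.newJob(MyTenantTask.class)
                .withIdentity("tenant_" + tenantId + "_task")
                .storeDurably()
                .build();
    }
}
```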
Quartz provides various (local, JMX, RMI) APIs that allow you to obtain the list of currently executing jobs, list of all registered jobs and their triggers etc. It will certainly allow you to implement the scheduling logic you described.
If you are building an app to manage and monitor your Quartz jobs, triggers etc., I recommend that you take a quick look into our product called QuartzDesk. It is a management and monitoring GUI for all types of Java Quartz-based applications. There is a public online demo and if you want to experiment locally, you can request a 30-day trial license key. If you need to interact with your Quartz schedulers programmatically (and possibly remotely), you can use various JAX-WS service APIs provided by QuartzDesk.

Blocking a load balanced server environment from sending two emails

I am currently working on a scheduled task that runs behind the scenes of my Spring web application. The task uses a cron scheduler to execute at midnight every night and clean up unused applications for my portal (my site allows users to create an application to fill out; if they don't access the form within 30 days, my background task deletes it from our DB and emails the user to create a new form if needed). Everything works great in my test environment, and I am ready to move to QA.
However, my next environment uses two load-balanced servers to process requests. This is a problem, as the cron scheduler and my polling task run concurrently on both servers. While the reads/writes to the DB won't be an issue, the problem lies with sending the notification email to the application user: without any polling lock, two emails could be generated and sent, which I would like to avoid. Normally, we would use a SQL stored procedure with a lock field in our DB, set and released whenever the polling code is called, so that only one instance of the polling is executed. However, with my new polling task, we don't have any fields available, so I am looking for a Spring solution. I found this resource online:
http://www.springframework.net/doc-latest/reference/html/threading.html
And I was thinking of using it as
Semaphore _pollingLock = new Semaphore(1);
_pollingLock.acquire();
try {
    // run my polling task
} finally {
    _pollingLock.release();
}
However, I'm not sure whether this will just make the second instance execute afterwards, or skip the second instance so that it never executes. Or is this solution not even appropriate, and is there a better one? Again, I am using the Spring Java framework, so any solution that exists there would be my best bet.
Two ways that we've handled this sort of problem in the past both start with designating one of our clustered servers as the one responsible for a specific task (say, sending email, or running a job).
In one solution, we set a JVM parameter on all clustered servers identifying the server name of the one server on which your process should run. For example -DemailSendServer=clusterMember1
In another solution, we simply provided a JVM parameter in the startup of this designated server alone. For example -DsendEmailFromMe=true
In both cases, you can add a tiny bit of code in your process to gate it based on the value or presence of the startup parameter.
I've found the second option simpler to use since the presence of the parameter is enough to allow the process to run. In the first solution, you would have to compare the current server name against the value of the parameter instead.
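Concretely, with the second option the gate can be a one-liner around the task. In this sketch, EmailGate is a hypothetical class and sendEmailFromMe is the property name suggested above:

```java
// Minimal sketch: only the cluster member started with -DsendEmailFromMe=true
// runs the clean-up/email task; on every other member the call is a no-op.
public class EmailGate {

    static boolean designatedSender() {
        // Boolean.getBoolean returns true only if the system property
        // exists and equals "true"
        return Boolean.getBoolean("sendEmailFromMe");
    }

    public static void runIfDesignated(Runnable nightlyCleanup) {
        if (designatedSender()) {
            nightlyCleanup.run();
        }
    }
}
```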
We haven't done much with Spring Batch, but I would assume there is a way to configure Batch to run a job on a single server within a cluster as well.

Clustered Quartz scheduler configuration

I'm working on an application that uses Quartz for scheduling Jobs. The Jobs to be scheduled are created programmatically by reading a properties file. My question is: if I have a cluster of several nodes which of these should create schedules programmatically? Only one of these? Or maybe all?
I have used Quartz in a web app where users, among other things, could create Quartz jobs that performed certain tasks.
We have had no problems on that app, provided that at least the job names differ for each job. You can also use different group names; if I remember correctly, the job group + job name combination forms a job key.
Anyway, we had no problem creating and running jobs from different nodes, but Quartz at the time (some 6 months ago; I do not believe this has changed, but I am not sure) did not offer the possibility to stop jobs across the cluster - it could only stop jobs on the node the stop command was executed on.
If instead you just want to create a fixed number of jobs when the application starts, you had better delegate that to one of the nodes, as the job names/groups will be read from the same properties file on each node, and conflicts will arise.
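If you do let every node attempt registration, a common pattern is to make the registration idempotent by checking for the job key first. A minimal sketch (the class name IdempotentRegistrar is hypothetical; a concurrent node can still win the race, which surfaces as an ObjectAlreadyExistsException a caller could catch and ignore):

```java
import org.quartz.JobDetail;
import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.Trigger;

// Sketch: every node may call this at startup, but the job is only scheduled
// once per job key. With a clustered (JDBC) jobstore, another node can still
// schedule the same key between checkExists and scheduleJob.
public class IdempotentRegistrar {

    public static void register(Scheduler scheduler, JobDetail job, Trigger trigger)
            throws SchedulerException {
        if (!scheduler.checkExists(job.getKey())) {
            scheduler.scheduleJob(job, trigger);
        }
    }
}
```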
Have you tried creating them on all of the nodes? I think you would get conflicts because of duplicate names. So I think one of the members should create the schedules during startup.
You should only have one system scheduling jobs for the cluster if they are predefined in properties like you say. If all of the systems did it you would needlessly recreate the jobs and maybe put them in a weird state if every server made or deleted the same jobs and triggers.
You could simply only deploy the properties for the jobs to one server and then only one server would try to create them.
You could make a separate app that has the purpose of scheduling the jobs and only run it once.
If these are web servers you could make a simple secured REST API that triggers the scheduling process. Then you could write an automated script to access the API and kick off the scheduling of jobs as part of a deployment or whenever else you desired. If you have multiple servers behind a load balancer it should go to only one server and schedule the jobs which quartz would save to the database backed jobstore. The other nodes in the cluster would receive them the next time they update from the database.

Concurrent database access pattern for web applications

I'm trying to write a Spring web application on a WebLogic server that makes several independent database SELECTs (i.e. they can safely be called concurrently), one of which takes 15 minutes to execute.
Once all the results are fetched, an email containing the results will be sent to a user list.
What's a good way to get around this problem? Is there a Spring library that can help or do I go ahead and create daemon threads to do the job?
EDIT: This will have to be done at the application layer (business requirement) and the email will be sent out by the web application.
Are you sure you are doing everything optimally? 15 minutes is a really long time unless you have a gabillion rows across dozens of tables and need a heckofalot of joins... this is your highest priority - why is it taking so long?
Do you do the email job at set intervals, or is it invoked from your web app? If set intervals, you should do it in an outside job, possibly on another machine. You can use daemons or the quartz scheduler.
If you need to fire this process off from the web app, you need to do it asynchronously. You could use JMS, or you could just have a table into which you enter a new job request, with a daemon process that looks for new jobs every X time period. Firing off background threads is possible, but it's error-prone and not worth the complication, especially since you have other valid options that are simpler.
If you are asking about Spring support for long-running, possibly asynchronous tasks, you have a choice between Spring JMS support and Spring Batch.
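Besides JMS and Spring Batch, plain Spring's @Async support is another lightweight way to push the long SELECTs off the request thread. A minimal sketch with hypothetical names (the caller would join the returned futures and then send the email):

```java
import java.util.concurrent.CompletableFuture;

import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.Async;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.stereotype.Service;

@Configuration
@EnableAsync
class AsyncConfig {
    // enables @Async proxying for beans in this context
}

// Hypothetical service: each slow SELECT runs on a background thread once the
// bean is proxied by Spring; outside a container the method runs synchronously.
@Service
public class ReportService {

    @Async
    public CompletableFuture<String> runSlowSelect(String query) {
        return CompletableFuture.completedFuture(executeQuery(query));
    }

    private String executeQuery(String query) {
        // placeholder for the real (possibly 15-minute) JDBC call
        return "result for: " + query;
    }
}
```

The caller collects the futures from the independent SELECTs, waits on them with CompletableFuture.allOf(...), and only then sends the single results email to the user list.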
You can use Spring's Quartz integration to schedule the job. That way the jobs will run in the same container but will not require an HTTP request to trigger them.
