How to create an arbiter for scheduled tasks on different servers - Java

I am looking for a good solution for executing a scheduled task on only one of several instances.
The problem:
I have a Java server built with Spring Boot, and a scheduled task that runs via the @Scheduled(cron="...") annotation. My application sits behind a load balancer and usually runs on 3 instances. The scheduled task updates a Postgres DB, and it always runs on all 3 servers simultaneously.
How can I run the scheduled task on only one of the servers?
Thanks a lot!

You have to select a leader somehow, and selecting a leader can be quite hard (https://en.wikipedia.org/wiki/Consensus_(computer_science)). There are, however, quite a lot of solutions that can help with leader election.
I personally like http://curator.apache.org/ a lot. However, depending on the tools you already use, there might already be something that can provide the needed leader election or locking support, such as Redis (https://redis.io/topics/distlock) or your database (Postgres -> advisory locks).
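Since the question already involves Postgres and Spring Boot, a Postgres advisory lock is probably the lightest-weight option. Here is a minimal sketch, assuming a JdbcTemplate is available; the class name, cron expression, and the lock key 42 are illustrative, not taken from the question:

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
import org.springframework.transaction.annotation.Transactional;

@Component
public class NightlyUpdateTask {

    private final JdbcTemplate jdbcTemplate;

    public NightlyUpdateTask(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // All 3 instances still fire the schedule, but only the one that wins the
    // advisory lock performs the update; pg_try_advisory_xact_lock never blocks
    // and the lock is released automatically when the transaction ends.
    @Scheduled(cron = "0 0 2 * * *") // illustrative schedule
    @Transactional
    public void runUpdate() {
        Boolean winner = jdbcTemplate.queryForObject(
                "SELECT pg_try_advisory_xact_lock(42)", Boolean.class); // 42 = arbitrary lock key
        if (Boolean.TRUE.equals(winner)) {
            // ... perform the Postgres update here ...
            // keep the update idempotent: if the instances' clocks drift, a second
            // instance may grab the lock after the first one has already committed
        }
    }
}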
The simplest solution, however, if you do not need failover capabilities, is to designate one app as the leader via a config flag and skip the task on instances where the flag is not set.
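A sketch of that config-flag approach; the scheduler.leader property name and the schedule are hypothetical:

import org.springframework.beans.factory.annotation.Value;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class LeaderOnlyTask {

    // set scheduler.leader=true on exactly one instance (hypothetical property name)
    @Value("${scheduler.leader:false}")
    private boolean leader;

    @Scheduled(cron = "0 0 2 * * *") // illustrative schedule
    public void runUpdate() {
        if (!leader) {
            return; // this instance is not the designated one
        }
        // ... perform the Postgres update here ...
    }
}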

Related

How to autorun a java file on WildFly

I'm developing a web app with JEE technology, with WildFly as the production server. I must make some tasks run automatically every day to perform some operations on the databases.
But I've never done this before with JEE technology. If someone could help me.
Thanks.
If you want to create a task/job that needs to run every day on a schedule, I would suggest you consider the following options.
Database Events / Triggers:
Since you highlighted that you need some database operations, my first option is database events/triggers, where you define the operations and schedule when the task should run.
Quartz:
My second option is the Quartz scheduler, where you configure your existing Java class or servlet to run at the time you set in the Quartz configuration (a minimal sketch follows after these options).
Java Thread Executor:
You can also achieve this with plain Java threads via the Executor framework, which provides more options and flexibility than traditional threads. However, I recommend option 1 or 2 and would treat this as a last resort, since you would need to keep at least one thread alive forever.
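To make option 2 concrete, here is a minimal sketch of starting Quartz from a servlet context listener on WildFly; it assumes the Quartz library is on the classpath, and the job class, identities, and the 02:00 firing time are illustrative:

import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.annotation.WebListener;
import org.quartz.CronScheduleBuilder;
import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

@WebListener
public class DailyJobBootstrap implements ServletContextListener {

    // The job class Quartz instantiates and runs at every firing.
    public static class DailyDatabaseJob implements Job {
        @Override
        public void execute(JobExecutionContext context) {
            // ... perform the daily database operations here ...
        }
    }

    private Scheduler scheduler;

    @Override
    public void contextInitialized(ServletContextEvent sce) {
        try {
            scheduler = StdSchedulerFactory.getDefaultScheduler();
            JobDetail job = JobBuilder.newJob(DailyDatabaseJob.class)
                    .withIdentity("dailyDatabaseJob", "daily")
                    .build();
            Trigger trigger = TriggerBuilder.newTrigger()
                    .withIdentity("dailyDatabaseTrigger", "daily")
                    .withSchedule(CronScheduleBuilder.dailyAtHourAndMinute(2, 0)) // every day at 02:00
                    .build();
            scheduler.scheduleJob(job, trigger);
            scheduler.start();
        } catch (SchedulerException e) {
            throw new IllegalStateException("Could not start Quartz scheduler", e);
        }
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) {
        try {
            if (scheduler != null) {
                scheduler.shutdown();
            }
        } catch (SchedulerException e) {
            // nothing sensible to do during shutdown
        }
    }
}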
Note: these are just pointers, so please go ahead, explore the concepts, and pick the right one based on the pros and cons of each approach and how well it suits your spec.
Refer to the points discussed here; a few things related to your spec are covered there.

Scheduler in a java spring boot microservice

We have a microservice written using Spring Boot which has its own NoSQL datastore. We are working on functionality whereby we want to delete some old data (on the order of 0.5 million documents) on a regular basis (once a day), based on the presence of records of a particular type in the datastore.
Is a scheduler which runs once every day and does the deletion the correct approach for this? Also, since it's a microservice and several instances of it will be running, how do we ensure that this scheduler runs on only 1 instance?
There are multiple options I can think of:
If there is a single instance of the microservice deployed, you can use something like Quartz to time the job.
Create a RESTful API for cleanup and invoke it using a script; please refer to https://stackoverflow.com/a/15090893/2817980 for an example. This will make sure that only one instance of the service works on the cleanup.
If there is a master-slave replica, ask the master to allocate the job to only 1 instance.
Create a scheduled job using something like Quartz and then check whether the job has already been taken up by some other scheduler in ZooKeeper/Redis/a DB or any other storage (a sketch of this option follows at the end of this answer).
I can discuss more on this.
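Here is a sketch of the last option using a shared SQL table as the "has someone already taken this run?" check. The job_run table, its unique constraint on (job_name, run_date), and the cron expression are assumptions; any store with an atomic insert or compare-and-set (a ZooKeeper node, Redis SETNX, a document with a unique key) works the same way:

import org.springframework.dao.DuplicateKeyException;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class DailyCleanupTask {

    private final JdbcTemplate jdbcTemplate;

    public DailyCleanupTask(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Scheduled(cron = "0 0 3 * * *") // illustrative schedule
    public void cleanupOldDocuments() {
        try {
            // every instance tries to claim today's run; the unique constraint lets exactly one succeed
            jdbcTemplate.update(
                    "INSERT INTO job_run (job_name, run_date) VALUES (?, CURRENT_DATE)",
                    "dailyCleanup");
        } catch (DuplicateKeyException alreadyClaimed) {
            return; // another instance is handling today's run
        }
        // ... delete the old documents from the datastore here ...
    }
}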

Spring Integration polling inbound-channel-adapter on multiple servers

We have a Spring Integration application which polls MongoDB via an inbound-channel-adapter, like so:
<int-mongodb:inbound-channel-adapter channel="n2s.mongoResults"
                                     collection-name="entities"
                                     query="{_id: {$regex: 'mpl/objectives'}}">
    <!-- Run every 15 minutes -->
    <int:poller fixed-rate="900000"/>
</int-mongodb:inbound-channel-adapter>
Everything works fine. However, this application is deployed to a cluster and so multiple servers are running the same poller. We'd like to coordinate these servers such that only one runs the pipeline.
Of course, the servers don't know about each other, so we probably need to coordinate them through a locking mechanism in a database. Any suggestions on how to achieve this?
Notes:
We have access to both a MongoDB database and an Oracle database in this workflow. From the perspective of the workflow, it makes more sense to lock on the Oracle database.
It's fine if all servers execute the polling step and then one server locks to actually process the records, if that's easier to achieve.
Any suggestions on how to achieve this?
You could use a distributed locking tool like Zookeeper. An alternative would be to change from a simple fixed-rate trigger to a scheduling framework like Quartz, which (when clustered) will ensure that the job only executes on a single node.
It's fine if all servers execute the polling step and then one server locks to actually process the records, if that's easier to achieve.
Yeah, that's what I would do; I think it's by far the easiest approach. See this post for details on how to do locking with Oracle (a minimal sketch of the pattern follows).
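A minimal sketch of that pattern, assuming a one-row helper table (pipeline_lock) created up front and that Spring's Oracle error-code translation maps the "resource busy" error from NOWAIT to a lock-acquisition exception; downstream processing should still mark or remove the documents so a later polling cycle does not pick them up again:

import org.springframework.dao.ConcurrencyFailureException;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Component;
import org.springframework.transaction.annotation.Transactional;

@Component
public class OracleGuardedProcessor {

    private final JdbcTemplate jdbcTemplate;

    public OracleGuardedProcessor(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // Called by every server after it polls; only the server that wins the row lock
    // processes the records. The lock is held until this transaction commits.
    @Transactional
    public void processIfLockWinner(Object polledDocuments) {
        try {
            jdbcTemplate.queryForObject(
                    "SELECT id FROM pipeline_lock WHERE id = 1 FOR UPDATE NOWAIT",
                    Long.class);
        } catch (ConcurrencyFailureException busy) {
            return; // another server already holds the lock for this cycle
        }
        // ... process the polled MongoDB documents here ...
    }
}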
There are several options, including:
Set auto-startup="false" and use some management tool to monitor the servers and ensure that exactly one adapter is running (you can use a control bus or JMX to start/stop the adapter).
Run the application in Spring XD containers; set the module count for the source (containing the mongo adapter) and the XD admin will make sure an instance is running. It uses Zookeeper to manage state.
Use a distributed lock to ensure only one instance processes messages. Spring Integration itself comes with a RedisLockRegistry for such things, or you can use any distributed lock mechanism (a sketch follows below).
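A sketch of the RedisLockRegistry option, assuming annotation-based Spring Integration configuration is enabled alongside the XML above; the registry key, lock key, and class name are illustrative:

import java.util.concurrent.locks.Lock;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.integration.annotation.ServiceActivator;
import org.springframework.integration.redis.util.RedisLockRegistry;
import org.springframework.messaging.Message;
import org.springframework.stereotype.Component;

@Component
public class ObjectivesLockGuard {

    private final RedisLockRegistry lockRegistry;

    public ObjectivesLockGuard(RedisConnectionFactory connectionFactory) {
        this.lockRegistry = new RedisLockRegistry(connectionFactory, "n2s-locks"); // illustrative registry key
    }

    // Consumes the messages produced by the inbound adapter on channel n2s.mongoResults
    @ServiceActivator(inputChannel = "n2s.mongoResults")
    public void handle(Message<?> polledDocuments) {
        Lock lock = lockRegistry.obtain("objectives-pipeline");
        if (lock.tryLock()) {          // only the first instance to get here proceeds
            try {
                // ... process the polled documents ...
            } finally {
                lock.unlock();
            }
        }
        // instances that do not get the lock simply skip this polling cycle
    }
}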

How to see all jobs in a cluster using Spring Batch?

I'm trying to determine all the things I need to consider when deploying jobs to a clustered environment.
I'm not concerned about parallel processing or other scaling things at the moment; I'm more interested in how I make everything act as if it was running on a single server.
So far I've determined that triggering a job should be done via messaging.
The thing that's throwing me for a loop right now is how to utilize something like the Spring Batch Admin UI (even if it's a hand rolled solution) in a clustered deployment. Getting the job information from a JobExplorer seems like one of the keys.
Is Will Schipp's spring-batch-cluster project the answer, or is there a more agreed upon community answer?
Or do I not even need to worry because the JobRepository will be pulling from a shared database?
Or do I need to publish job execution info to a message queue to update the separate Job Repositories?
Are there other things I should be concerned about, like the jobIncrementers?
BTW, if it wasn't clear that I'm a total noob to Spring batch, let it now be known :-)
Spring XD (http://projects.spring.io/spring-xd/) provides a distributed runtime for deploying clusters of containers for batch jobs. It manages the job repository and provides ways to deploy, start, restart, etc., the jobs on the cluster. It addresses fault tolerance (if a node goes down, the job is redeployed, for example) as well as many other features that are needed to maintain a clustered Spring Batch environment.
I'm adding the answer that I think we're going to roll with unless someone comments on why it's dumb.
If Spring Batch is configured to use a shared database for all the DAOs that the JobExplorer will use, then running in a cluster isn't much of a concern.
We plan on using Quartz jobs to create JobRequest messages which will be put on a queue. The first server to get the message will actually kick off the Spring Batch job.
Monitoring running jobs will not be an issue because the JobExplorer gets all of its information from the database, and it doesn't look like it caches that information, so we won't run into cluster issues there either.
So to directly answer the questions...
Is Will Schipp's spring-batch-cluster project the answer, or is there a more agreed upon community answer?
There is some cool stuff in there, but it seems like overkill when just getting started. I'm not sure there is a "community" agreed-upon answer.
Or do I not even need to worry because the JobRepository will be pulling from a shared database?
This seems correct. If you use a shared database, all of the nodes in the cluster can read and write all the job information. You just need a way to ensure a timer job isn't triggered more than once; Quartz already has a clustering solution for that (a minimal configuration sketch appears at the end of this answer).
Or do I need to publish job execution info to a message queue to update the separate Job Repositories?
Again, this shouldn't be needed because the execution info is written to the database.
Are there other things I should be concerned about, like the jobIncrementers?
It doesn't seem like this is a concern. The JDBC DAO implementations use a database sequence to increment values.
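For reference, a minimal sketch of a clustered Quartz setup via Spring's SchedulerFactoryBean: every node points at the same Quartz tables in the shared database, and isClustered=true makes Quartz fire each trigger on exactly one node. The bean wiring and property values are assumptions, and the standard Quartz tables must already exist in that database:

import java.util.Properties;
import javax.sql.DataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.quartz.SchedulerFactoryBean;

@Configuration
public class ClusteredQuartzConfig {

    @Bean
    public SchedulerFactoryBean quartzScheduler(DataSource dataSource) {
        SchedulerFactoryBean factory = new SchedulerFactoryBean();
        factory.setDataSource(dataSource); // the shared database also used by the JobRepository

        Properties props = new Properties();
        props.setProperty("org.quartz.scheduler.instanceName", "batchTriggerScheduler");
        props.setProperty("org.quartz.scheduler.instanceId", "AUTO");      // unique id per node
        props.setProperty("org.quartz.jobStore.isClustered", "true");      // each trigger fires on one node only
        props.setProperty("org.quartz.jobStore.clusterCheckinInterval", "20000");
        factory.setQuartzProperties(props);
        return factory;
    }
}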

Can I configure a kind of trigger isolation with Quartz?

My application is split into 2 web applications running in the same container and sharing one DB.
The first war does only background processing and the other serves the client GUI plus some background work.
The application with the client GUI allows the user to configure the scheduling of some tasks that will be executed by the "background application". Basically it configures the Quartz jobs and triggers.
I'd like the scheduler of the background application to handle only the jobs of a certain group (bg-jobs), and the other scheduler to handle the other group (fg-jobs).
Is it possible to configure this kind of isolation with Quartz?
Note: I'd like to keep it simple and, if I can, avoid Quartz Where, which looks like a sledgehammer and is probably overkill for my needs.
Thanks in advance
The simplest and quickest way is to create a separate set of tables for each application. So have one set of Quartz tables prefixed with "bg-" and another prefixed with "fg-", then just change each scheduler's config to point at the appropriate tables. I know it might be a little awkward, but you did say you wanted to keep it simple :).
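A sketch of what that configuration could look like with plain Quartz properties; the prefixes use underscores (BG_ / FG_) since hyphens are awkward in table names, and the connection details, scheduler names, and thread counts are placeholders:

import java.util.Properties;
import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.impl.StdSchedulerFactory;

public class SchedulerFactories {

    // Background application: its own scheduler name and its own table prefix.
    public static Scheduler backgroundScheduler() throws SchedulerException {
        return withTablePrefix("bgScheduler", "BG_QRTZ_");
    }

    // GUI application: same database, different tables, so the two schedulers never interfere.
    public static Scheduler foregroundScheduler() throws SchedulerException {
        return withTablePrefix("fgScheduler", "FG_QRTZ_");
    }

    private static Scheduler withTablePrefix(String name, String tablePrefix) throws SchedulerException {
        Properties props = new Properties();
        props.setProperty("org.quartz.scheduler.instanceName", name);
        props.setProperty("org.quartz.threadPool.class", "org.quartz.simpl.SimpleThreadPool");
        props.setProperty("org.quartz.threadPool.threadCount", "3");
        props.setProperty("org.quartz.jobStore.class", "org.quartz.impl.jdbcjobstore.JobStoreTX");
        props.setProperty("org.quartz.jobStore.driverDelegateClass", "org.quartz.impl.jdbcjobstore.StdJDBCDelegate");
        props.setProperty("org.quartz.jobStore.tablePrefix", tablePrefix); // each app sees only its own tables
        props.setProperty("org.quartz.jobStore.dataSource", "sharedDS");
        // the shared database; placeholder connection details
        props.setProperty("org.quartz.dataSource.sharedDS.driver", "org.postgresql.Driver");
        props.setProperty("org.quartz.dataSource.sharedDS.URL", "jdbc:postgresql://db-host:5432/app");
        props.setProperty("org.quartz.dataSource.sharedDS.user", "app");
        props.setProperty("org.quartz.dataSource.sharedDS.password", "secret");
        return new StdSchedulerFactory(props).getScheduler();
    }
}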
