We have a Spring Integration application which is polling a mongodb:inbound-channel-adapter like so:
<int-mongodb:inbound-channel-adapter channel="n2s.mongoResults"
                                     collection-name="entities"
                                     query="{_id: {$regex: 'mpl/objectives'}}">
    <!-- Run every 15 minutes -->
    <int:poller fixed-rate="900000"/>
</int-mongodb:inbound-channel-adapter>
Everything works fine. However, this application is deployed to a cluster and so multiple servers are running the same poller. We'd like to coordinate these servers such that only one runs the pipeline.
Of course, the servers don't know about each other, so we probably need to coordinate them through a locking mechanism in a database. Any suggestions on how to achieve this?
Notes:
We have access to both a MongoDB database and an Oracle database in this workflow. From the perspective of the workflow, it makes more sense to lock on the Oracle database.
It's fine if all servers execute the polling step and then one server locks to actually process the records, if that's easier to achieve.
You could use a distributed locking tool like ZooKeeper. Another alternative would be to change from a simple fixed-rate trigger to a scheduling framework like Quartz, which can ensure that the job only executes on a single node.
It's fine if all servers execute the polling step and then one server locks to actually process the records, if that's easier to achieve.
Yeah, that's what I would do. I think it's by far the easiest approach. See this post for details on how to do locking with Oracle.
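Since the linked post isn't reproduced here, a minimal sketch of the idea, assuming a `job_locks` table with one row per job name (the table name, the `OracleJobLock` class, and the error handling are illustrative, not from the original post): whichever node's transaction wins `SELECT ... FOR UPDATE NOWAIT` processes the records, and the others back off until the next poll.

```java
import java.sql.*;

/**
 * Hypothetical Oracle-backed lock: one row in a JOB_LOCKS table per job name.
 * SELECT ... FOR UPDATE NOWAIT either grabs the row lock or fails fast with
 * ORA-00054 (resource busy), so exactly one node proceeds per polling cycle.
 */
public class OracleJobLock {
    static final int ORA_RESOURCE_BUSY = 54; // vendor code for ORA-00054

    private final Connection conn;

    public OracleJobLock(Connection conn) {
        this.conn = conn;
    }

    /** Returns true if this node won the lock; keep the transaction open while processing. */
    public boolean tryAcquire(String jobName) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT name FROM job_locks WHERE name = ? FOR UPDATE NOWAIT")) {
            ps.setString(1, jobName);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next(); // row locked by us until commit/rollback
            }
        } catch (SQLException e) {
            if (isLockHeldElsewhere(e)) {
                return false; // another node is already processing
            }
            throw e;
        }
    }

    /** ORA-00054 means another session holds the row lock. */
    static boolean isLockHeldElsewhere(SQLException e) {
        return e.getErrorCode() == ORA_RESOURCE_BUSY;
    }
}
```

The lock is released when the winning transaction commits or rolls back, so no explicit unlock step is needed.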
There are several options, including:
Set auto-startup="false" and use some management tool to monitor the servers and ensure that exactly one adapter is running (you can use a control bus or JMX to start/stop the adapter).
Run the application in Spring XD containers; set the module count for the source (containing the mongo adapter) and the XD admin will make sure an instance is running. It uses ZooKeeper to manage state.
Use a distributed lock to ensure only one instance processes messages. Spring Integration itself comes with a RedisLockRegistry for such things or you can use any distributed lock mechanism.
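For the third option, `RedisLockRegistry.obtain(...)` hands back a standard `java.util.concurrent.locks.Lock`, so the processing guard can be written against that interface and works the same whether the lock is local or distributed. A hedged sketch (class and method names here are illustrative, not from the original answer):

```java
import java.util.concurrent.locks.Lock;

/**
 * Guard for option 3: wrap message processing in a lock obtained from a
 * LockRegistry. In production the Lock would come from Spring Integration's
 * RedisLockRegistry, along the lines of (assumed wiring):
 *
 *   RedisLockRegistry registry = new RedisLockRegistry(redisConnectionFactory, "poll-locks");
 *   Lock lock = registry.obtain("mongo-poller");
 *
 * Because obtain(...) returns a java.util.concurrent.locks.Lock, the same
 * guard also works with a plain ReentrantLock in tests.
 */
public class ExclusiveRunner {
    /** Runs the task only if the lock is free; returns true if this node ran it. */
    public static boolean runExclusively(Lock lock, Runnable task) {
        if (!lock.tryLock()) {
            return false; // another node is processing this cycle
        }
        try {
            task.run();
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```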
Related
I am looking for a good solution for executing a scheduled task on only one of several instances.
The problem:
I have a Java server with Spring Boot, and a scheduled task that runs via the @Scheduled(cron="...") annotation. My application runs behind a load balancer, usually on 3 instances. The scheduled task updates a Postgres DB, and it always runs on all 3 servers simultaneously.
How can I run the scheduled task on only one of the servers?
Thanks a lot!
You have to select a leader somehow, and selecting a leader can be quite hard (https://en.wikipedia.org/wiki/Consensus_(computer_science)). There are, however, quite a lot of solutions that can help with leader election.
I personally like http://curator.apache.org/ a lot. However, depending on the tools you already use, there might already be something that can provide the needed leader-election support, like Redis (https://redis.io/topics/distlock) or your database (Postgres -> advisory locks).
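A sketch of the Postgres advisory-lock route, assuming plain JDBC access (the class name and the name-to-key hashing helper are illustrative): `pg_try_advisory_lock(key)` returns true for exactly one session per key, so only one of the three instances runs the job body on each tick.

```java
import java.sql.*;

/**
 * Sketch: guard the @Scheduled task body with a Postgres advisory lock.
 * pg_try_advisory_lock takes a 64-bit key; here the key is derived from the
 * task name (hashing choice is an assumption, any stable mapping works).
 */
public class AdvisoryLockTask {
    /** Stable 64-bit key for a named task. */
    static long lockKey(String taskName) {
        return (long) taskName.hashCode();
    }

    /** Returns true if this instance won the lock and ran the job. */
    public static boolean tryRun(Connection conn, String taskName, Runnable job)
            throws SQLException {
        long key = lockKey(taskName);
        try (PreparedStatement ps =
                     conn.prepareStatement("SELECT pg_try_advisory_lock(?)")) {
            ps.setLong(1, key);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                if (!rs.getBoolean(1)) {
                    return false; // another instance holds the lock
                }
            }
        }
        try {
            job.run();
            return true;
        } finally {
            // Session-level advisory locks must be released explicitly.
            try (PreparedStatement ps =
                         conn.prepareStatement("SELECT pg_advisory_unlock(?)")) {
                ps.setLong(1, key);
                ps.execute();
            }
        }
    }
}
```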
The simplest solution, however, if you do not need failover capabilities, is to configure one instance as the leader in a config file and skip the task on instances where the flag is not set.
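The configured-leader approach can be as small as a guard around the task. For illustration the flag is read from a system property here; in Spring Boot it would more likely be a `@Value`-bound property, and the property name "app.scheduler.leader" is an assumption:

```java
/**
 * Minimal sketch of the "configured leader" approach: every instance runs the
 * scheduler, but only the one whose config marks it as leader executes the
 * task body. No failover: if the leader is down, the task simply doesn't run.
 */
public class LeaderGuard {
    static boolean isLeader() {
        // In Spring Boot this would be an externalized property instead.
        return Boolean.parseBoolean(System.getProperty("app.scheduler.leader", "false"));
    }

    /** Returns true if this instance is the leader and ran the task. */
    public static boolean runIfLeader(Runnable task) {
        if (!isLeader()) {
            return false;
        }
        task.run();
        return true;
    }
}
```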
I have an EJB packaged in an EAR and deployed to Glassfish.
Currently we just use Glassfish/Eclipselink for caching.
But our server is starting to come under heavy loads and I want to set it up behind a load balancer on AWS.
The problem is, I don't want my cache to be out of sync for automatically spun up instances. I want the instances to be completely automatic.
I know you can set Glassfish up in a cluster, but as far as I know that isn't automatic. I would have to manage it myself. I want to fully automate everything.
It would be awesome if the Glassfish instances could be completely independent of each other, and I could use Redis or another server like that to offload the cache. That way the cache would be in one place, the Glassfish instances could spin up and down automatically and it would never matter, I wouldn't have to register them with a Glassfish cluster. I could also use the same Redis cache for the front end of the application. Glassfish is running the business layer accessible by API calls. The front end web is running separately. I was going to set up a Redis cache for that also, but if they could both share the same cache, that would be awesome.
Any ideas?
I can only answer at a conceptual level, since I don't know the products involved in detail.
Regardless of whether you add another level of caching, you need to take care of data consistency within your application.
In a cluster setup, a local, non-distributed cache is not necessarily a problem: consistency coordination solves this, e.g. via JMS invalidation messages. You need to explore how to set up that consistency coordination across your cluster.
I'm trying to determine all the things I need to consider when deploying jobs to a clustered environment.
I'm not concerned about parallel processing or other scaling things at the moment; I'm more interested in how I make everything act as if it was running on a single server.
So far I've determined that triggering a job should be done via messaging.
The thing that's throwing me for a loop right now is how to utilize something like the Spring Batch Admin UI (even if it's a hand-rolled solution) in a clustered deployment. Getting the job information from a JobExplorer seems like one of the keys.
Is Will Schipp's spring-batch-cluster project the answer, or is there a more agreed upon community answer?
Or do I not even need to worry because the JobRepository will be pulling from a shared database?
Or do I need to publish job execution info to a message queue to update the separate Job Repositories?
Are there other things I should be concerned about, like the jobIncrementers?
BTW, if it wasn't clear that I'm a total noob to Spring batch, let it now be known :-)
Spring XD (http://projects.spring.io/spring-xd/) provides a distributed runtime for deploying clusters of containers for batch jobs. It manages the job repository as well as providing ways to deploy, start, restart, etc., the jobs on the cluster. It addresses fault tolerance (if a node goes down, the job is redeployed, for example) as well as many other features needed to maintain a clustered Spring Batch environment.
I'm adding the answer that I think we're going to roll with unless someone comments on why it's dumb.
If Spring Batch is configured to use a shared database for all the DAOs that the JobExplorer will use, then running in a cluster isn't much of a concern.
We plan on using Quartz jobs to create JobRequest messages which will be put on a queue. The first server to get to the message will actually kick off the Spring Batch job.
Monitoring running jobs will not be an issue because the JobExplorer gets all of its information from the database, and it doesn't look like it caches that information, so we won't run into cluster issues there either.
So to directly answer the questions...
Is Will Schipp's spring-batch-cluster project the answer, or is there a more agreed upon community answer?
There is some cool stuff in there, but it seems like overkill when just getting started. I'm not sure there is a community-agreed-upon answer.
Or do I not even need to worry because the JobRepository will be pulling from a shared database?
This seems correct. If using a shared database, all of the nodes in the cluster can read and write all the job information. You just need a way to ensure a timer job isn't getting triggered more than once. Quartz already has a cluster solution.
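Quartz's cluster support is enabled through its JDBC job store; a sketch of the relevant `quartz.properties` entries (datasource wiring omitted), so that all nodes share one set of Quartz tables and only one node fires each trigger:

```properties
# All nodes point at the same Quartz tables; instance ids must be unique.
org.quartz.scheduler.instanceId = AUTO
org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.isClustered = true
org.quartz.jobStore.clusterCheckinInterval = 20000
```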
Or do I need to publish job execution info to a message queue to update the separate Job Repositories?
Again, this shouldn't be needed because the execution info is written to the database.
Are there other things I should be concerned about, like the jobIncrementers?
It doesn't seem like this is a concern. When using the JDBC DAO implementations, it uses a database sequence to increment values.
Recently I came across a requirement wherein I have to provide a custom jar to applications; this jar contains threads that periodically query a database and fetch messages (records) for the particular application that uses it. So, for example, if app A uses this jar, the threads in the jar fetch messages only for app A.
The database is a shared db between apps.
This works fine for standalone apps, but for apps deployed to a cluster in an enterprise application server (WebLogic in my case), this fails: all nodes on the cluster run in their own JVM and each spawns a listener thread for the same app. So two threads can run at the same time, fetch the same records, and process them twice. I cannot use synchronization since that would create performance bottlenecks.
I can't use singleton timer EJBs. I have heard about the WorkManager API, but there aren't many examples on the net. I am using the Spring core framework.
If any of you could give any suggestions, it would be great.
Thanks.
First of all, please stop thinking in terms of threads if you're dealing with Java EE; it's supposed to provide a higher level of abstraction for a higher-level mindset.
Java EE 7 provides the ManagedScheduledExecutorService.
Quartz also works great in that scenario: with a clustered job store, only one node in your Java EE cluster is going to execute the job.
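A sketch of the managed-executor direction. The helper below is written against the plain `ScheduledExecutorService` interface so it compiles without a container; in Java EE 7 the executor would be a container-provided `ManagedScheduledExecutorService` injected with `@Resource`, and the polling period is an assumption:

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/**
 * Sketch: replace hand-spawned listener threads with a scheduled executor.
 * The container (or an executor you own in plain Java) manages the threads;
 * the application only submits the periodic fetch task.
 */
public class ManagedPoller {
    public static ScheduledFuture<?> start(ScheduledExecutorService scheduler,
                                           Runnable fetchTask,
                                           long periodMillis) {
        // Fixed-rate scheduling mirrors the original poller's behavior.
        return scheduler.scheduleAtFixedRate(fetchTask, 0, periodMillis,
                TimeUnit.MILLISECONDS);
    }
}
```

Note that a managed executor by itself does not solve the duplicate-fetch problem across JVMs; it only replaces the raw threads. The cluster still needs a lock or clustered Quartz on top.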
I am looking for a pattern and/or framework which can model the following problem in an easily configurable way.
Every 3 minutes or so, I need a set of jobs to kick off in a web application context; they concurrently hit web services to obtain the latest version of the data and push it to a database. The problem is that the database will be under heavy read load for tons of complex calculations on that data. We currently use Spring, so I have been looking at Spring Batch to run this process. Does anyone have suggestions/patterns/examples of using Spring or other technologies for a similar system?
We have used ServletContextListeners to kick off TimerTasks in our web applications when we needed processes to run repeatedly. The ServletContextListener fires when the app server starts or restarts the application. The TimerTask then acts as a separate thread that repeats your code at the specified interval.
ServletContextListener
http://www.javabeat.net/examples/2009/02/26/servletcontextlistener-example/
TimerTask
http://enos.itcollege.ee/~jpoial/docs/tutorial/essential/threads/timer.html
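A minimal sketch of the TimerTask half of this pattern. In the web app, a `ServletContextListener`'s `contextInitialized(...)` would call `start(...)` and `contextDestroyed(...)` would call `Timer.cancel()`; the listener class itself is omitted here so the snippet stays free of the servlet API:

```java
import java.util.Timer;
import java.util.TimerTask;

/**
 * Sketch: repeat a unit of work on a Timer thread. The daemon flag keeps the
 * timer thread from blocking app-server shutdown; the listener should still
 * cancel() the Timer in contextDestroyed(...) for a clean stop.
 */
public class RepeatingJob {
    public static Timer start(Runnable work, long periodMillis) {
        Timer timer = new Timer("repeating-job", true); // daemon thread
        timer.scheduleAtFixedRate(new TimerTask() {
            @Override
            public void run() {
                work.run();
            }
        }, 0, periodMillis);
        return timer;
    }
}
```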
Is refactoring the job out of the web application and into a standalone app a possibility?
That way you could stick the batch job onto a separate batch server (so that the extra load of the batch job wouldn't impact your web application), which then calls the web services and updates the database. The job can then be kicked off using something like cron or Autosys.
We're using Spring-Batch for exactly this purpose.
The database design would also depend on what the batched data is used for. If it is for reporting purposes, I would recommend separating the operational database from the reporting database, using a database link to obtain the required data from the operational database into the reporting database and then running the complex queries on the reporting database. That way the load is shifted off the operational database.
I think it's also worth looking into frameworks like Apache Camel. Also take a look at the so-called Enterprise Integration Patterns; check the catalog - it might provide you with some useful vocabulary for thinking about the scaling/scheduling problem at hand.
The framework itself integrates really well with Spring.