Batch jobs - prevent concurrency - java

I have several batch jobs running on an SAP Java system using the SAP Java Scheduler. Unfortunately, I haven't come across any documentation that shows how to prevent concurrent executions of periodic jobs. All I've seen is "a new instance of the job will be executed at the next interval". This breaks my FIFO processing logic, so I need to find a way to prevent it. If the scheduler API offered a way to check for running executions of the same job, this would be solved, but I haven't seen an example yet.
As a general architectural approach, other options seem to be a DB table - an indicator table marking current executions - or a JNDI parameter, either of which the job would check when it starts. I could also "attempt" to use a static integer, but that would fail me on clustered instances. The system is Java EE 5 compliant and supports EJB 3.0, so a singleton EJB (an EJB 3.1 feature) is not available either. Maybe I could set the max pool size for a bean and achieve a similar result.
I'd like to hear your opinions on how to achieve this goal using different architectures.
Kind Regards,
S. Gökhan Topçu

You can route jobs to specific nodes, much like DB sharding, so that a given job always runs on one node and can be handled just as if it were running on a single node.
Otherwise you need a central coordinator - something like a database, an in-memory cache, or ZooKeeper - to prevent the same job from running on different nodes.
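As a sketch of the database-coordinator approach, each node could check and set an indicator row before running the job. The table name, columns, and DataSource wiring below are assumptions, not part of the original system, and it assumes one row per job name has been inserted up front:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

public class JobGuard {

    private final DataSource dataSource;

    public JobGuard(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    /**
     * Tries to mark the job as running. Returns false if another node
     * already holds the marker, so the caller should skip this interval.
     * Assumes a pre-populated table JOB_LOCK(JOB_NAME VARCHAR PRIMARY KEY, RUNNING INT).
     */
    public boolean tryAcquire(String jobName) throws SQLException {
        try (Connection con = dataSource.getConnection()) {
            con.setAutoCommit(false);
            // SELECT ... FOR UPDATE serializes concurrent checks across nodes.
            try (PreparedStatement select = con.prepareStatement(
                    "SELECT RUNNING FROM JOB_LOCK WHERE JOB_NAME = ? FOR UPDATE")) {
                select.setString(1, jobName);
                try (ResultSet rs = select.executeQuery()) {
                    if (rs.next() && rs.getInt("RUNNING") == 1) {
                        con.rollback();
                        return false; // already running somewhere else
                    }
                }
            }
            try (PreparedStatement update = con.prepareStatement(
                    "UPDATE JOB_LOCK SET RUNNING = 1 WHERE JOB_NAME = ?")) {
                update.setString(1, jobName);
                update.executeUpdate();
            }
            con.commit();
            return true;
        }
    }

    /** Clears the marker when the job finishes (call from a finally block). */
    public void release(String jobName) throws SQLException {
        try (Connection con = dataSource.getConnection();
             PreparedStatement update = con.prepareStatement(
                     "UPDATE JOB_LOCK SET RUNNING = 0 WHERE JOB_NAME = ?")) {
            update.setString(1, jobName);
            update.executeUpdate();
        }
    }
}
```

The weak point, as with any indicator-table scheme, is that a node killed mid-run leaves the flag set, so some timeout or manual reset is still needed.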

Related

Avoiding concurrency in Spring Batch jobs in a cluster environment

I want to ensure that a Spring job is not started a second time while it still runs. This would be trivial in a single JVM environment.
However, how can I achieve this in a cluster environment (more specifically, in JBoss 5.1 - I know, a bit antiquated; if solutions exist for later versions, I'd be interested in those as well)?
So, it should be kind of a Singleton pattern across all cluster nodes.
I am considering using database locks or a message queue. Is there a simpler / better performing solution?
You need to synchronize threads that know nothing about each other, so the easiest way is to share some state in a common place. Valid alternatives are:
A shared database
A shared file
An external web service holding the status of the batch process
If you prefer a shared database, consider a store like Redis to improve performance. It is an in-memory database with persistence to disk, so reading the status of the batch process should be fast enough.
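For illustration, a crude Redis-based flag with a safety expiry (so a crashed node cannot leave the flag stuck forever) might look like the sketch below; the host, key name, and TTL are placeholders, and it assumes the Jedis client:

```java
import redis.clients.jedis.Jedis;

public class RedisBatchStatus {

    private static final String KEY = "batch:job:running"; // placeholder key
    private static final int TTL_SECONDS = 3600;           // safety expiry after a crash

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("redis-host", 6379)) {
            // SETNX returns 1 only for the node that managed to set the flag first.
            if (jedis.setnx(KEY, "node-1") == 1L) {
                jedis.expire(KEY, TTL_SECONDS); // avoid a stale flag if this node dies
                try {
                    runBatchJob();
                } finally {
                    jedis.del(KEY); // mark the job as finished
                }
            } else {
                System.out.println("Job already running on another node, skipping.");
            }
        }
    }

    private static void runBatchJob() {
        // the actual batch work goes here
    }
}
```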
This is late, but for future lookups: Spring Batch records job executions in a shared JobRepository, so you can use that metadata to avoid concurrent runs.
You can add a JobExecutionListener and, in its beforeJob method, use JobExecutionDao (or JobExplorer) to find all running JobExecutions for the job. If more than one is running, throw an exception and exit the job.
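A minimal sketch of such a listener, using JobExplorer (which delegates to JobExecutionDao); the class name is illustrative, and it assumes all cluster nodes share the same job repository database:

```java
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionListener;
import org.springframework.batch.core.explore.JobExplorer;

public class SingleInstanceJobListener implements JobExecutionListener {

    private final JobExplorer jobExplorer;

    public SingleInstanceJobListener(JobExplorer jobExplorer) {
        this.jobExplorer = jobExplorer;
    }

    @Override
    public void beforeJob(JobExecution jobExecution) {
        String jobName = jobExecution.getJobInstance().getJobName();
        // Every node writes to the same job repository, so this also sees
        // executions started on other cluster members.
        int running = jobExplorer.findRunningJobExecutions(jobName).size();
        if (running > 1) { // the current execution is already counted
            throw new IllegalStateException(
                    "Another instance of job '" + jobName + "' is still running");
        }
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        // nothing to clean up; the repository marks the execution as finished
    }
}
```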

Behavior of executor service in cluster

I have written code using an ExecutorService in Java. I create 10 worker threads to process rows fetched from the database, and each thread is assigned one row of the result. This approach works fine when the application is deployed and running on a single instance/node.
Can anyone suggest how this will behave when my application is deployed in multiple nodes/cluster?
Do I have to take care of any part of code before deploying into cluster?
04/12/15: Any more suggestions?
You should consider the overhead of each task. Unless each task is of moderate size, you might want to batch them.
In a distributed context the overhead is much higher, so you are even more likely to need to batch the work.
You will also need a framework for distributing the work, so the considerations will depend on the framework you choose.
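For instance, if a clustering framework such as Hazelcast is an option, it offers a distributed ExecutorService that spreads submitted tasks over the cluster members rather than a single JVM. A minimal sketch, assuming Hazelcast 3.x on the classpath (the class and task names are illustrative, and tasks must be Serializable):

```java
import java.io.Serializable;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IExecutorService;

public class DistributedWorkExample {

    // A task that can be serialized and executed on any cluster member.
    static class RowTask implements Callable<String>, Serializable {
        private final long rowId;

        RowTask(long rowId) {
            this.rowId = rowId;
        }

        @Override
        public String call() {
            // process the row identified by rowId (re-fetched locally if needed)
            return "processed row " + rowId;
        }
    }

    public static void main(String[] args) throws Exception {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        IExecutorService executor = hz.getExecutorService("row-workers");

        // Tasks are distributed over the nodes in the cluster, not just this JVM.
        Future<String> result = executor.submit(new RowTask(42L));
        System.out.println(result.get());

        hz.shutdown();
    }
}
```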

How to avoid simultaneous quartz job execution when application has two instances

I've got a Spring Web application that's running on two different instances.
The two instances aren't aware of each other, they run on distinct servers.
That application has a scheduled Quartz job, but my problem is that the job shouldn't execute simultaneously on both instances: it is a mail-sending job, and concurrent runs could cause duplicate emails to be sent.
I'm using RAMJobStore, and JDBCJobStore is not an option for me due to the large number of tables it requires (I can't afford to create many tables because of internal restrictions).
The solutions I thought about:
- creating a single control table that is checked every time a job starts (with repeatable-read isolation level to avoid concurrency issues). The problem is that if the server is killed, the table might be left in an invalid state.
- using properties to designate a single server as the job-running server. The problem is that if that server goes down, jobs stop running.
Has anyone ever experienced this problem and do you have any thoughts to share?
Start with the second solution (deactivate Quartz on all nodes except one). It is very simple to do and it is safe. Measure how frequently your server actually goes down; if that is unacceptable, then try the first solution. The problem with the first solution is that you need good multithreaded-programming skills to implement it without bugs. It is not so simple if multithreading is not your everyday task, and the cost of a bug in your implementation may be bigger than the actual benefit.
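A minimal sketch of the second solution: each instance reads a flag (the property name and the way it is set are assumptions) and only the designated node actually starts the Quartz scheduler.

```java
import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.impl.StdSchedulerFactory;

public class ConditionalQuartzStarter {

    public static void main(String[] args) throws SchedulerException {
        // e.g. the JVM is started with -Dscheduler.enabled=true on exactly one instance
        boolean schedulerEnabled = Boolean.getBoolean("scheduler.enabled");

        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        if (schedulerEnabled) {
            scheduler.start(); // only the designated node fires the mail job
        } else {
            // other instances never start the scheduler, so triggers never fire here
            System.out.println("Quartz disabled on this node");
        }
    }
}
```

In a Spring application the same flag can instead decide whether the scheduler bean is created at all; the point is simply that only one instance ever fires the trigger.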

Maintaining a single instance over multiple JVMs

I am creating a distributed service and I am looking at restricting a set of time-consuming operations to a single thread of execution across all JVMs at any given time (I will have to deal with 3 JVMs at most).
My initial investigations point me towards java.util.concurrent.Executors and java.util.concurrent.Semaphore. Using the singleton pattern with Executors or a Semaphore does not guarantee a single thread of execution across multiple JVMs.
I am looking for a core Java API (or at least a pattern) that I can use to accomplish my task.
P.S.: I have access to ActiveMQ within my existing project, which I was planning to use to achieve a single thread of execution across multiple JVMs only if I have no other choice.
There is no simple solution for this with a core Java API. If the 3 JVMs have access to a shared file system, you could use it to track state across JVMs.
Basically, you create a lock file when you start the expensive operation and delete it at its conclusion, and each JVM checks for the existence of this lock file before starting the operation. There are issues with this approach, however, such as what happens if the JVM dies in the middle of the expensive operation and the file is never deleted.
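One variant that mitigates the stale-lock-file problem is to hold an OS-level file lock via java.nio, which the operating system releases if the JVM dies. A minimal sketch, assuming a path on the shared file system (the path and class name are illustrative); note that file locking on network file systems can be unreliable, so test it on the actual shared storage:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class SharedFileGuard {

    // Path on a shared file system visible to all three JVMs (illustrative).
    private static final Path LOCK_FILE = Paths.get("/shared/batch/expensive-op.lock");

    public static void runExclusively(Runnable expensiveOperation) throws IOException {
        try (FileChannel channel = FileChannel.open(LOCK_FILE,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            // tryLock returns null if another JVM already holds the lock.
            FileLock lock = channel.tryLock();
            if (lock == null) {
                return; // someone else is running the operation, skip this round
            }
            try {
                expensiveOperation.run();
            } finally {
                lock.release(); // also released by the OS if this JVM dies
            }
        }
    }
}
```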
ZooKeeper is a nice solution for problems like this and any other cross-process synchronization issue. Check it out if that is a possibility for you. I think it's a much more natural way to solve a problem like this than a JMS queue.
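If ZooKeeper is a possibility, the Apache Curator client provides a ready-made distributed lock recipe. A minimal sketch, assuming a ZooKeeper ensemble reachable at localhost:2181 and Curator on the classpath (the lock path and class name are illustrative):

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class ZkExclusiveRunner {

    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // All JVMs use the same lock path; only one can hold the mutex at a time.
        InterProcessMutex lock = new InterProcessMutex(client, "/locks/expensive-op");
        lock.acquire();
        try {
            runExpensiveOperation();
        } finally {
            lock.release(); // ZooKeeper also frees it if this JVM's session dies
        }
        client.close();
    }

    private static void runExpensiveOperation() {
        // placeholder for the time-consuming work
    }
}
```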

Persistent delayed jobs queue for Java

I'm looking for an existing system to replace a slow and complicated self-written job-management mechanism.
The existing system:
1 MySQL DB with a long massive table of jobs - the queue
Multiple servers (written in java) all extracting jobs from the queue and processing them
a job might NOT be deleted from the queue after processing it, to rerun it later
a job might create other jobs and insert them to the queue
The limitations:
As more and more jobs are created and inserted into the queue, it takes longer to extract jobs from it (jobs are chosen by priority and type), which creates a bottleneck.
I'm looking for an existing system that can replace this one and improve its performance.
Any suggestions?
Thanks
I don't generally recommend JMS, but it sounds like it really is what you need here. Distributed, transactional, persistent job queue management is what JMS is all about.
Popular open-source implementations include HornetQ and ActiveMQ.
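For illustration, a minimal JMS producer/consumer sketch against an ActiveMQ broker; the broker URL, queue name, and message payload are assumptions, not part of the original setup:

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageConsumer;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;

import org.apache.activemq.ActiveMQConnectionFactory;

public class JobQueueExample {

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        try {
            connection.start();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue("jobs");

            // Producer side: any server enqueues a job description (persistent by default).
            MessageProducer producer = session.createProducer(queue);
            producer.send(session.createTextMessage("{\"type\":\"REPORT\",\"priority\":5}"));

            // Consumer side: workers on any server pull the next available job.
            MessageConsumer consumer = session.createConsumer(queue);
            TextMessage job = (TextMessage) consumer.receive(1000);
            if (job != null) {
                System.out.println("Processing job: " + job.getText());
            }
        } finally {
            connection.close();
        }
    }
}
```

The broker, rather than a giant MySQL table, then takes care of persistence, redelivery, and distributing jobs across the processing servers.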
You could:
submit your jobs to Amazon's Simple Queue Service (maybe JAXB marshalled)
dynamically start some EC2 instances according to your queue's length, and probably
submit the results (or an availability notice for some files on S3) to Simple Notification Service (again JAXB-marshalled).
That's exactly what we do, using EC2 Spot Instances to minimize costs. And that's what I call serious cloud computing ;)
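A minimal sketch of the SQS part, using the AWS SDK for Java v1; the queue name and message body are placeholders:

```java
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.Message;

public class SqsJobQueue {

    public static void main(String[] args) {
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
        String queueUrl = sqs.createQueue("job-queue").getQueueUrl();

        // Producer side: enqueue a job description.
        sqs.sendMessage(queueUrl, "{\"type\":\"REPORT\",\"priority\":5}");

        // Worker side (e.g. on an EC2 instance): poll, process, delete.
        for (Message message : sqs.receiveMessage(queueUrl).getMessages()) {
            System.out.println("Processing: " + message.getBody());
            sqs.deleteMessage(queueUrl, message.getReceiptHandle());
        }
    }
}
```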
