How to ensure a single running process in a distributed environment - Java

In a distributed environment I have many nodes running the same code.
I have a process that handles an event, and I want to make sure it is only handled once.
What are the recommended solutions for this requirement?
Here are my options off the top of my head:
Use a message broker (RabbitMQ, ActiveMQ, Kafka), for instance by creating only one queue for this message.
But I don't really like adding a message broker to the stack just for process synchronization.
Use Quartz.
http://www.quartz-scheduler.org/documentation/quartz-2.x/configuration/ConfigJDBCJobStoreClustering
But that requires a shared DB (a minimal clustered configuration sketch is below).
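For reference, here is one way the clustered JDBCJobStore could be bootstrapped programmatically; this is a minimal sketch only, with property keys taken from the Quartz 2.x documentation linked above and the datasource name, JDBC driver/URL and credentials as placeholders:

    import java.util.Properties;

    import org.quartz.Scheduler;
    import org.quartz.SchedulerException;
    import org.quartz.impl.StdSchedulerFactory;

    public class ClusteredSchedulerBootstrap {
        public static void main(String[] args) throws SchedulerException {
            Properties props = new Properties();
            // All nodes share the same instanceName; AUTO gives each node a unique instanceId.
            props.setProperty("org.quartz.scheduler.instanceName", "ClusteredScheduler");
            props.setProperty("org.quartz.scheduler.instanceId", "AUTO");
            // JDBC-backed job store with clustering enabled; the shared DB arbitrates
            // which node fires a given trigger, so each job runs on exactly one node.
            props.setProperty("org.quartz.jobStore.class", "org.quartz.impl.jdbcjobstore.JobStoreTX");
            props.setProperty("org.quartz.jobStore.driverDelegateClass", "org.quartz.impl.jdbcjobstore.StdJDBCDelegate");
            props.setProperty("org.quartz.jobStore.isClustered", "true");
            props.setProperty("org.quartz.jobStore.dataSource", "sharedDS");
            // Placeholder datasource details - every node must point at the same schema.
            props.setProperty("org.quartz.dataSource.sharedDS.driver", "org.postgresql.Driver");
            props.setProperty("org.quartz.dataSource.sharedDS.URL", "jdbc:postgresql://db-host:5432/quartz");
            props.setProperty("org.quartz.dataSource.sharedDS.user", "quartz");
            props.setProperty("org.quartz.dataSource.sharedDS.password", "secret");
            props.setProperty("org.quartz.threadPool.class", "org.quartz.simpl.SimpleThreadPool");
            props.setProperty("org.quartz.threadPool.threadCount", "5");

            Scheduler scheduler = new StdSchedulerFactory(props).getScheduler();
            scheduler.start();
        }
    }

With every node bootstrapped like this against the same schema, Quartz's locking in the shared tables is what guarantees a scheduled job fires only once across the cluster.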
What are my other options?

Related

Load Balancing a Java application running in a cluster

I have multiple VMs running all of the modules in the project. Each request created by the user has to be processed by all modules, but needs to be done only once. So if VM1 picks up a request, then module1 can process the request partially; next, VM1 or VM2 or any other VM in the cluster can pick it up and process it for module2, and so on.
Since each VM has limited capacity, I would like to use a load balancer to allocate work among the individual VMs.
Are there load balancers (open source, for Java) available which can solve this, or do I need to implement it myself using several load-balancing algorithms (round robin, weighted, etc.)?
Edit 1:
Each module is a Java class which is independent in itself but needs the previous modules to be done before it is started. Each VM is listening to a message bus. As and when a message appears on the bus, any of the VMs can pick it up and start working on it.
You can try HAProxy (a TCP/HTTP load balancer), which is open source, feature-rich and quite widely used. Apart from good documentation, you can find lots of information about it readily available.
Depending on the exact semantics of the problem you're trying to parallelize, you might get good results by chunking your problem into "work packets" of some size and keeping them in a central queue. Then just have each VM poll a packet from that queue as soon as it finishes the previous packet. This is called self-scheduling.
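A tiny sketch of that self-scheduling loop, using an in-JVM BlockingQueue as a stand-in for the central queue (in your setup the message bus would play that role; WorkPacket is a made-up type for illustration):

    import java.util.concurrent.BlockingQueue;

    public class SelfSchedulingWorker implements Runnable {

        // Stand-in for the central queue; across VMs this would be the message bus.
        private final BlockingQueue<WorkPacket> centralQueue;

        public SelfSchedulingWorker(BlockingQueue<WorkPacket> centralQueue) {
            this.centralQueue = centralQueue;
        }

        @Override
        public void run() {
            try {
                while (true) {
                    // Each worker pulls the next packet only once it has finished the
                    // previous one, so faster workers naturally end up doing more work.
                    WorkPacket packet = centralQueue.take();
                    packet.process();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        // Made-up packet type for this sketch.
        public interface WorkPacket {
            void process();
        }
    }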

Adding a new node to a scalable system with zero downtime

I am working as a developer on a batch processing solution. The way it works is that we split a big file and process it across JVMs. We have 4 processor JVMs, each of which takes a chunk of the file and processes it, and 1 gateway JVM. The gateway JVM's job is to split the file into as many chunks as there are processor JVMs (i.e. 4) and send a REST request, which is consumed by the processor JVMs; the REST request has all the details, such as the file location it has to pick the file up from, and some other details.
Now, if I want to add another processor JVM without any downtime, is there any way we can do it? Currently we are maintaining the URLs for the 4 JVMs in a property file. Is there a better way to do it, one which would give me the ability to add more JVMs without restarting any component?
You can consider setting up a load balancer and putting your JVM(s) behind it. The load balancer would be responsible for distributing the incoming requests to the JVMs.
This way you can scale your JVMs up or down depending on the workload. Also, if one of the JVMs is not working, the rest of your system does not need to care about it anymore.
I'm not sure what your use case is or which tech stack you are following, but it seems that you need a distributed system with auto-scaling and dynamic provisioning capabilities. Have you considered Hadoop or Spark clusters, or Akka?
If you cannot use any of those, then the solution is to maintain the list of JVMs in some datastore (let's say in a table); it is dynamic data, meaning one can add/remove/update JVMs. Then you need a resource manager which can decide whether to spin up a new JVM based on load or any other conditional logic; this resource manager needs to monitor the entire system. Also, whenever you create a task, chunk or slice of data, distribute it using message queues such as Apache ActiveMQ. You can also consider Kafka for complex use cases. Nowadays, application servers such as WebSphere (Liberty profile) and WebLogic also provide auto-scaling capability, so if you are already using such an application server you can think of making use of that capability. I hope this helps.
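To illustrate the datastore-backed JVM list from the previous paragraph, here is a rough sketch in plain JDBC; the table name jvm_registry and its columns are invented for this example. The gateway reads the currently registered processor URLs before each dispatch, so adding a processor JVM becomes a simple INSERT rather than a property-file change and a restart.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;
    import java.util.ArrayList;
    import java.util.List;

    public class ProcessorRegistry {

        private final String jdbcUrl;

        public ProcessorRegistry(String jdbcUrl) {
            this.jdbcUrl = jdbcUrl;
        }

        // Returns the REST endpoints of the processor JVMs currently marked active.
        public List<String> activeProcessorUrls() throws SQLException {
            List<String> urls = new ArrayList<>();
            try (Connection con = DriverManager.getConnection(jdbcUrl);
                 Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery(
                         "SELECT url FROM jvm_registry WHERE status = 'ACTIVE'")) {
                while (rs.next()) {
                    urls.add(rs.getString("url"));
                }
            }
            return urls;
        }
    }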

Does the MQ API support modifying an alias

I can use the Java MQ API to put and get messages.
I can also disable gets and puts on a queue.
During a migration project, we'll have an app running in parallel: Old and New. Old and New will have their own separate queues. I regularly have messages from a client going to Old, and occasionally want the messages to flow to New instead.
I'm wondering if MQ supports a gate/switch concept, where via the API I can point a queue to go only to New, or only to Old, for a short time.
I'm trying to avoid going to message-based routing via WMB, since I don't have to do that today. The parallel mode is only for a few months.
You do not mention the version of MQ or whether there are message affinities or dependence on preserving the MQMD.MsgID. These are critical in devising a solution to this problem. I'll try to describe enough options so that at least one will be viable whatever version you are at.
Pub/Sub
The easiest thing to do is to have the messages arrive on an alias over a topic. Any message that arrives is published immediately on that topic. Then it is a simple matter to generate administrative subscriptions to direct messages to the queues on which the apps needing the messages are listening. This is entirely a configuration change and requires no external components, processes or code. It is available from v7.1 of MQ and higher, which is to say any of the currently supported versions of MQ.
The down side is that IBM MQ will change the MQMD.MsgID from the time the message is received on the topic to the time it is published on the application's input queue. This breaks the app's ability to use the MQMD.MsgID of the incoming message as a correlation ID when replying. If the requesting app pre-loads the correlation ID or doesn't rely on a correlation ID, this is not an issue.
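For context, the request/reply correlation this paragraph refers to usually looks like the following on the replying side, sketched here with the JMS API rather than the native MQ classes: the reply carries the request's message ID as its correlation ID, which is exactly what breaks if the broker rewrites the ID in transit.

    import javax.jms.JMSException;
    import javax.jms.Message;
    import javax.jms.MessageProducer;
    import javax.jms.Session;

    public class ReplyHelper {

        // Copy the request's message ID into the reply's correlation ID so the
        // requesting app can match the reply to its original request.
        public static void sendReply(Session session, Message request, Message reply)
                throws JMSException {
            reply.setJMSCorrelationID(request.getJMSMessageID());
            MessageProducer producer = session.createProducer(request.getJMSReplyTo());
            try {
                producer.send(reply);
            } finally {
                producer.close();
            }
        }
    }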
Aliasing
But for apps where this is an issue, it gets a bit harder. You can alias over a queue and have inbound messages land on the alias. When you need to switch from one queue to another, you change the alias. There are a couple issues with this. The first is that it is never possible to deliver the message stream to more than one of the applications. In a parallel processing test it is often desirable to do exactly that and then compare summary or detail reports.
The second problem is more operational in nature. It isn't possible to change the alias while it is open. If the messages arrive over a RCVR, RQSTR or CLUSRCVR channel, no problem: stop the channel, switch the alias and restart the channel. In a series of MQSC script commands this can be done faster than it can be typed. However, if the applications putting the messages are connected in bindings mode or via a client directly to the alias, they must all be stopped in order to change the alias.
That said, aliasing works on all versions of MQ out of the box.
Physical copy
One solution that's been around for quite some time is to use the Q program (SupportPac MA01) to direct the messages. In this scenario, the queue on which messages land is a local queue. The Q program is either triggered or set to constantly listen on the queue. When a message arrives, Q then copies it to one or both of the destination queues.
Switching the behavior if Q is triggered involves pre-defining 2 or 3 processes where each defines a different behavior - move new messages to QUEUEA, to QUEUEB or to both. Changing the queue's PROCESS attribute to point to a different process results in an instantaneous change of the behavior.
Alternatively, if Q is configured to listen on the queue forever then changing the behavior involves use of three different scripts to execute it where one causes messages to be copied to QUEUEA, another to QUEUEB and another to both queues. Changing the behavior involves killing the script and starting a different one.
The Q program works with all versions of MQ, regardless of whether it is triggered or scripted.
Downsides to this approach include the obvious - more moving parts. You have to trigger the queue or else make a transactional program act like a daemon. Not hard, but if you are betting the business on it then perhaps some monitoring is in order to make sure the input queue doesn't start building up.
Recommendation
Of all these methods, I really like the Pub/Sub version. It is extremely reliable, has the least moving parts, and if anything breaks it's under IBM support. When you need to change something, you can do that with minimal impact to the running applications. If at all possible, use that.

Module clustering and JMS

I have a module which runs standalone in a JVM (no containers) and communicates with other modules via JMS.
My module is both a producer on one queue and a consumer on a different queue.
I now need to cluster this module, both for HA reasons and for workload reasons, and I'm probably going to go with Terracotta+Hibernate for clustering my entities.
Currently, when my app starts it launches a thread (via Executors.newSingleThreadExecutor()) which serves as the consumer (I can attach an actual code sample if relevant and necessary).
What I understood from reading questions here is that if I just start up my module on N different JVMs then N different subscribers will be created, and each message in the queue will arrive at all N subscribers.
What I'd like to do is have only one of them (let's say for now that which one is not important) process each message, and so in effect enable me to process N messages at a time.
How can/should this be done? Am I way off track?
BTW, I'm using OpenMQ as my implementation but I don't know if that's relevant.
Thanks for any help
A classic case of message handling in a clustered environment. This is what I would do.
Use a broadcast (channel-based) message in place of a queue; a queue, being useful for point-to-point communication, is not very effective here. Set the validity of the message until the time it is consumed by one of the consumers. This way, the other consumers won't even see the message and only one consumer will consume it.
Take a look at JGroups. You may consider implementing your modules/subscribers to use JGroups for the kind of synchronization you need. JGroups provides reliable multicast communication.
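For reference, the behaviour the question asks for can also be sketched with plain point-to-point JMS: each of the N JVMs attaches an ordinary consumer to the same queue, and the provider delivers any given message to only one of them. The queue name and the way the ConnectionFactory is obtained are placeholders here.

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.JMSException;
    import javax.jms.Message;
    import javax.jms.MessageConsumer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import javax.jms.TextMessage;

    public class EventWorker implements Runnable {

        private final ConnectionFactory factory; // e.g. looked up via JNDI or built for your provider

        public EventWorker(ConnectionFactory factory) {
            this.factory = factory;
        }

        @Override
        public void run() {
            try {
                Connection connection = factory.createConnection();
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                Queue queue = session.createQueue("events"); // placeholder queue name
                MessageConsumer consumer = session.createConsumer(queue);
                connection.start();

                // One of these loops runs in each JVM; the consumers compete for messages,
                // and the broker hands each message to exactly one of them.
                while (!Thread.currentThread().isInterrupted()) {
                    Message msg = consumer.receive(5000);
                    if (msg instanceof TextMessage) {
                        process(((TextMessage) msg).getText());
                    }
                }
            } catch (JMSException e) {
                e.printStackTrace();
            }
        }

        private void process(String payload) {
            // application-specific handling
        }
    }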

Can JMS be used in a Java Swing app

I need to design a Swing application which will need to send out multiple jobs as the customer requests. Each job runs the same shell script, which will take 10-30 minutes to return a value. (The jobs are not running on an application server or as web services.) Then the Swing application will need to decide what to do next according to the return value.
My question is whether I can use JMS to send out jobs. If not, what do you suggest I look into?
Multithreading?
Thank you very much!
Multi-threading is the obvious first approximation here. Take a look at SwingWorker, launch the process in a background thread, monitor the progress (as in show the user if it is still running, perhaps even a view into what is being emitted to the console), etc. These are the obvious choices.
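A minimal sketch of that approach, assuming a placeholder script path and that the script's exit code is the value the application acts on:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import javax.swing.SwingWorker;

    public class ScriptWorker extends SwingWorker<Integer, String> {

        private final String scriptPath; // placeholder, e.g. "/opt/jobs/run-job.sh"

        public ScriptWorker(String scriptPath) {
            this.scriptPath = scriptPath;
        }

        @Override
        protected Integer doInBackground() throws Exception {
            // Runs off the Event Dispatch Thread, so the UI stays responsive.
            ProcessBuilder pb = new ProcessBuilder(scriptPath);
            pb.redirectErrorStream(true);
            Process process = pb.start();
            try (BufferedReader reader =
                         new BufferedReader(new InputStreamReader(process.getInputStream()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    publish(line); // stream console output to the UI if desired
                }
            }
            return process.waitFor(); // the exit code the app decides on
        }

        @Override
        protected void done() {
            // Back on the Event Dispatch Thread: call get() and decide what to do next.
        }
    }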
What JMS would solve for you (and you would have to find a lightweight JMS implementation that would run on the desktop) is allowing for retries and guaranteeing that the process runs to completion. Something that takes 20 minutes to run in a shell script doesn't sound like a candidate for a retry, but if it is, and it is important that the message really gets through instead of just having the thread die and the process be forgotten if the user closes the Java application, then JMS is the type of thing to look at.
JMS can most certainly be used in a Swing-based application, for example if the shell scripts are to be executed on a server by a service that listens on a JMS queue and responds on another queue or topic.
There is nothing stopping you from using JMS queues or topics in a desktop application.
JMS is generally used to communicate between processes and between client and server, which is not really what you're looking for here, unless you're sending the jobs out to a server to be processed, and it doesn't sound like that is the case. It sounds like you're looking for a work queue, such that the Swing app has a button that adds a new task to the queue (where the task is running the shell script). You can then have multiple threads taking tasks off the queue and running the scripts.
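A rough sketch of that in-process work queue: an ExecutorService holds the pending tasks, and the button's action listener simply submits a new one. The pool size and script path are placeholders.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import javax.swing.JButton;

    public class JobQueue {

        // The pool's internal queue holds pending jobs; up to 3 scripts run at once.
        private final ExecutorService pool = Executors.newFixedThreadPool(3);

        public void wireButton(JButton runButton, String scriptPath) {
            runButton.addActionListener(e -> pool.submit(() -> {
                try {
                    int exitCode = new ProcessBuilder(scriptPath).start().waitFor();
                    System.out.println(scriptPath + " exited with " + exitCode);
                    // hand the result back to the UI via SwingUtilities.invokeLater(...)
                } catch (Exception ex) {
                    ex.printStackTrace();
                }
            }));
        }

        public void shutdown() {
            pool.shutdown();
        }
    }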
You may - or may not - profit from using a job scheduler, like Quartz. Maybe it's overkill, maybe it's just what you need.
