Stateful Processes as a Service (Java)

Stateful Processes as a Service (Java) - java

I am building a Service that allows customers to run individual worker processes "in the web". The processes are designed to run for a very long time and being fed with new orders about every minute (event driven). The processes are intended to keep on running, even if there are no new orders and all orders have been processed. -> 1 process per customer.
I require the following "functionality":
Start a new process
End the process (on demand, never automatically)
Keep track of the processes / user
Receive new "order" for a process (identify process by customer id)
Inform the customer when his orders can not be worked, in case his/her process ended (e.g. exception occured, someone killed the server ...)
I am looking for patterns or best practices that allow me to solve following problems:
- Process management within one server (e.g. using a static list or singleton-pattern, something like this to keep track of the mapping between user-id and process)
- Process management over many servers (scalability) : One server might run 100-200 processes, if I get more customers, how would I remember on which server the processes run?
I am sure there are others who faced these problems before and certainly there are "right" and "wrong" ways of doing this.

I would highly recommend you create a persistent centralized data store to keep your customer --> process list. Especially when you are talking about minutes between requests.
It will be pretty straightforward to have a dispatch pattern to deal with getting the request to the right server/process. You should probably run it on each machine, allowing it to route the request to the internal processor, or send it over to another machine.
This way you get pretty good failover scaling. With any machine able to dispatch to any other. You will need one master, whose job is to monitor the other machines for failure (read the centralized table, and ping each process (just make it another kind of order).

Related

Blocking a load balanced server environment from sending two emails

I am currently working on a scheduled task that runs behind the scenes of my Spring web application. The task uses a cron scheduler to execute at midnight every night, and clean-up unused applications for my portal (my site allows users to create an application to fill out, and if they don't access the form within 30 days, my background task will delete it from our DB and inform the user to create a new form if needed with an email). Everything works great in my test environment, and I am ready to move to QA.
However, my next environment uses two load balanced servers to process requests. This is a problem, as the cron scheduler and my polling task run concurrently on both servers. While the read/writes to the DB won't be an issue, the issue lies with sending the notification email to the application user. Without any polling locks, two emails have the possibility to be generated and sent, and I would like to avoid this. Normally, we would use a SQL stored procedure and have a field in our DB for a lock, and then set/release whenever the polling code is called, so only one instance of the polling will be executed. However, with my new polling task, we don't have any fields available, so I am trying to work on a SPRING solution. I found this resource online:
http://www.springframework.net/doc-latest/reference/html/threading.html
And I was thinking of using it as
Semaphore _pollingLock = new Semaphore(1);
_pollingLock.aquire();
try {
//run my polling task
}
finally {
//release lock
}
However, I'm not sure if this will just ensure the second instance executes after, or it skips the second instance and will never execute. Or, is this solution not even appropriate, and there is a better solution. Again, I am using Spring java framework, so any solution that exists there would be my best bet.

Two ways that we've handled this sort of problem in the past both start with designating one of our clustered servers as the one responsible for a specific task (say, sending email, or running a job).
In one solution, we set a JVM parameter on all clustered servers identifying the server name of the one server on which your process should run. For example -DemailSendServer=clusterMember1
In another solution, we simply provided a JVM parameter in the startup of this designated server alone. For example -DsendEmailFromMe=true
In both cases, you can add a tiny bit of code in your process to gate it based on the value or presence of the startup parameter.
I've found the second option simpler to use since the presence of the parameter is enough to allow the process to run. In the first solution, you would have to compare the current server name against the value of the parameter instead.
We haven't done much with Spring Batch, but I would assume there is a way to configure Batch to run a job on a single server within a cluster as well.

How to properly throttle web requests to external systems?

My Java web application pulls some data from external systems (JSON over HTTP) both live whenever the users of my application request it and batch (nightly updates for cases where no user has requested it). The data changes so caching options are likely exhausted.
The external systems have some throttling in place, the exact parameters of which I don't know, and which likely change depending on system load (e.g., peak times 10 requests per second from one IP address, off-peak times 100 requests per second from open IP address). If the requests are too frequent, they time out or return HTTP 503.
Right now I am attempting the request 5 times with 2000ms delay between each, giving up if an error is received each time. This is not optimal as sometimes at peak-times nearly all requests fail; I could avoid making these requests and perhaps get at least some to succeed instead.
My goals are to have a somewhat simple, reliable design, and enough flexibility so that I could both pull some metrics from the throttler to understand how well the external systems are responding (and thus adjust how often they are invoked), and to auto-adjust the interval with which I call them (individually per system) so that it is optimal both on off-peak and peak hours.
My infrastructure is Java with RabbitMQ over MongoDB over Linux.
I'm thinking of three main options:
Since I already have RabbitMQ used for batch processing, I could just introduce a queue to which the web processes would send the requests they have for external systems, then worker processes would read from that queue, throttle themselves as needed, and return the results. This would allow running multiple parallel worker processes on more servers if needed. My main concern is that it isn't a very simple solution, and how to manage peak-hour throughput being low and thus the web processes waiting for a long while. Also this converts my RabbitMQ into a critical single failure point; if it dies the whole system stops (as opposed to the nightly batch processes just not running any more, which is less critical). I suppose rpc is the correct pattern of RabbitMQ usage, but not sure. Edit - I've posted a related question How to properly implement RabbitMQ RPC from Java servlet web container? on how to implement this.
Introduce nginx (e.g. ngx_http_limit_req_module), HAProxy (link) or other proxy software to the mix (as reverse proxies?), have them take care of the throttling through some configuration magic. The pro is that I don't have to make code changes. The con is that it is more technology used, and one I've not used before, so chances of misconfiguring something are quite high. It would also likely not be easy to do dynamic throttling depending on external server load, or prioritizing live requests over batch requests, or get statistics of how the throttling is doing. Also, most documentation and examples will likely be on throttling incoming requests, not outgoing.
Do a pure-Java solution (e.g., leaky bucket implementation). Would be simple in the sense that it is "just code", but the devil is in the details; debugging all the deadlocks, starvations and race conditions isn't always fun.
What am I missing here?
Which is the best solution in this case?
P.S. Somewhat related question - what's the proper approach to log all the external system invocations, so that statistics are collected as to how often I invoke them, and what the success rate is?
E.g., after every invocation I'd invoke something like .logExternalSystemInvocation(externalSystemName, wasSuccessful, elapsedTimeMills), and then get some aggregate data out of it whenever needed.
Is there a standard library/tool to use, or do I have to roll my own?
If I use option 1. with RabbitMQ, is there a way to organize the flow so that I get this out of the box from the RabbitMQ console? I wouldn't want to send all failed messages to poison queue, it would fill up too quickly though and in most cases there is no need to re-process these failed requests as the user has already sadly moved on.

Perhaps this open source system can help you a little: http://code.google.com/p/valogato/

How to avoid simultaneous quartz job execution when application has two instances

I've got a Spring Web application that's running on two different instances.
The two instances aren't aware of each other, they run on distinct servers.
That application has a scheduled Quartz job but my problem is that the job shouldn't execute simultaneously on the instances, as its a mail sending job, it could cause duplicate emails being sent.
I'm using RAMJobStore and JDBCJobStore is not an option for me due to the large number of tables it requires.(I cant afford to create many tables due to internal restriction)
The solutions I thought about:
-creating a single control table, that has to be checked everytime a job starts (with repeatable read isolation level to avoid concurrency issues) The problem is that if the server is killed, the table might be left in a invalid state.
-using properties to define a single server to be the job running server. Problem is that if that server goes down, jobs will stop running
Has anyone ever experienced this problem and do you have any thoughts to share?

Start with the second solution (deactivate qartz on all nodes except one). It is very simple to do and it is safe. Count how frequently your server goes down. If it is inacceptable then try the first solution. The problem with the first solution is that you need a good skill in mutithreaded programming to implement it without bugs. It is not so simple if multithreading is not your everyday task. And a cost of some bug in your implementation may be bigger than actual profit.

Backend process VS scheduled task

I have a number of backend processes (java applications) which run 24/7. To monitor these backends (i.e. to check if a process is not responding and notify via SMS/EMAIL) I have written another application.
The old backends now log heartbeat at regular time interval and this new applications checks if they are doing it regularly and notifies if necessary.
Now, We have two options
either run it as a scheduled task, which will run after every (let say) 15 min and stop after doing its job or
Run it as another backend process with 15 min sleep time.
The issue we can foresee right now is that what if this monitor application goes into non-responding state? So, my question is Is there any difference between both the cases or both are same? What option would suit my case more?
Please note this is a specific case and is not same as this or this
Environment: Java, hosted on LINUX server

By scheduled task, do you mean triggered by the system scheduler, or as a scheduled thread in the existing backend processes?
To capture unexpected termination or unresponsive states you would be best running a separate process rather than a thread. However, a scheduled thread would give you closer interaction with the owning process with less IPC overhead.
I would implement both. Maintain a record of the local state in each backend process, with a scheduled task in each process triggering a thread to update the current state of that node. This update could be fairly frequent, since it will be less expensive than communicating with a separate process.
Use your separate "monitoring app" process to routinely gather the information about all the backend processes. This should occur less frequently - whether the process is running all the time, or scheduled by a cron job is immaterial since the state is held in each backend process. If one of the backends become unresponsive, this monitoring app will be able to determine the lack of response and perform some meaningful probes to determine what the problem is. It will be this component that will then notify your SMS/Email utility to send a report.

I would go for a backend process as it can maintain state
have a look at the quartz scheduler from terracotta
http://terracotta.org/products/quartz-scheduler
It will be resilient to transient conditions and you only need provide a simple wrap so the monitor app should be robust providing you get the threading stuff right in the quartz.properties file.

You can use nagios core as core and Naptor to monitoring your application. Its easy to setup and embed with your application development.
You can check at this link:
https://github.com/agunghakase/Naptor/tree/ver1.0.0

Preventing multiple users from doing the same action

I have a swing desktop application that is installed on many desktops within a LAN. I have a mysql database that all of them talk to. At precisely 5 PM everyday, there is a thread that will wake up in each of these applications and try to back up files to a remote server. I would like to prevent all the desktop applications from doing the same thing.
The way I was thinking to do this was:
After waking up at 5PM , all the applications will try to write a row onto a MYSQL table. They will write the same information. Only 1 will succeed and the others will get a duplicate row exception. Whoever succeeds, then goes on to run the backup program.
My questions are:
Is this right way of doing things? Is there any better (easier) way?
I know we can do this using sockets as well. But I dont want to go down that route... too much of coding also I would need to ensure that all the systems can talk to each other first (ping)
Will mysql support such as a feature. My DB is INNO DB. So I am thinking it does. Typically I will have about 20-30 users in the LAN. Will this cause a huge overhead for the DB to handle.

If you could put an intermediate class in between the applications and the database that would queue up the results and allow them to proceed in an orderly manner you'd have it knocked.
It sounds like the applications all go directly against the database. You'll have to modify the applications to avoid this issue.
I have a lot of questions about the design:
Why are they all writing "the same row"? Aren't they writing information for their own individual instance?
Why would every one of them have exactly the same primary key? If there was an auto increment or timestamp you would't have this problem.
What's the isolation set to on the database connection? If it's set to SERIALIZABLE, you'll force each one to wait until the previous one is done, at the cost of performance.
Could you have them all write files to a common directory and pick them up later in an orderly way?
I'm just brainstorming now.

It seems you want to backup server data not client data.
I recommend to use a 3-tier architecture using Java EE.
You could use a Timer Service then to trigger the backup.
Though usually a backup program is an independent program e.g. started by a cron job on the server. But again: you'll need a server to do this properly, not just a shared folder.

Here is what I would suggest. Instead of having all clients wake up at the same time and trying to perform the backup, stagger the time at which they wake up.
So when a client wakes up
- It will check some table in your DB (MYSQL) to see if a back up job has completed or is running currently. If the job has completed, the client will go on with its normal duties. You can decide how to handle the case when the job is running.
- If the client finds that the back up job has not been run for the day, it will start the back up job. At the same time will modify the row to indicate that the back up job has started. Once the back up has completed the client will modify the table to indicate that the back up has completed.
This approach will prevent a spurt in network activity and can also provide a rudimentary form of failover. So if one client fails, another client at a later time can attempt the backup. (this is a bit more involved though. Basically it comes down to what a client should do when it sees that a back up job is on going).

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.