Keeping track of a pool of workers via JMS - java

I have an application consisting of several Java services, which communicate through direct RMI lines. I'm about to upgrade this to something more modern and stable using JMS / ActiveMQ.
The most common scenario is a number of processes connecting to a central service and performing requests. -> This can be implemented in JMS with a named request queue and named or temporary response queues.
Through the above link, processes will also register themselves as event listeners with the central service, which maintains a list of them and dispatches events by sending them to each subscriber directly via RMI. -> A JMS topic is an ideal replacement for this.
The third scenario is more complex: I'll also have several processes registering themselves as workers with the central service. This central service will have to maintain a pool of those workers, which can be started, stopped and restarted dynamically and unexpectedly.
The pool will then assign tasks to specific workers (e.g. the worker with the lowest load, so no broadcasting or random selection). I don't think that part will be very difficult; the core problem is keeping the pool up to date.
The workers will also need to communicate back with information about the running tasks or the general worker status (e.g. current load, or, is it accepting new tasks).
How might I implement this, in a stable way, on top of JMS? Explaining how to fully implement this complex scenario may be a bit much, but are there general design patterns that can be applied here, assuming the availability of queue-like and topic-like channels?
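To make the pool-tracking requirement concrete, here is a minimal sketch of the bookkeeping the central service might do. It is not taken from any answer and makes several assumptions: each worker periodically publishes a status message (the hypothetical WorkerStatus) on some channel, the central service calls onStatus for each one it receives, and a worker that has been silent for longer than a timeout is treated as gone. The JMS plumbing is left out entirely; only the registry logic is shown.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical status message a worker would publish periodically. */
final class WorkerStatus {
    final String workerId;
    final double load;
    final boolean acceptingTasks;
    final long timestampMillis;

    WorkerStatus(String workerId, double load, boolean acceptingTasks, long timestampMillis) {
        this.workerId = workerId;
        this.load = load;
        this.acceptingTasks = acceptingTasks;
        this.timestampMillis = timestampMillis;
    }
}

/** Hypothetical registry kept by the central service. */
final class WorkerRegistry {
    private final Map<String, WorkerStatus> latest = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    WorkerRegistry(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Called for every status message received; the newest one wins. */
    void onStatus(WorkerStatus status) {
        latest.put(status.workerId, status);
    }

    /** Drops workers whose last status is older than the timeout. */
    void evictStale(long nowMillis) {
        latest.values().removeIf(s -> nowMillis - s.timestampMillis > timeoutMillis);
    }

    /** Picks the live, accepting worker with the lowest reported load, if any. */
    Optional<String> pickWorker(long nowMillis) {
        evictStale(nowMillis);
        return latest.values().stream()
                .filter(s -> s.acceptingTasks)
                .min(Comparator.comparingDouble((WorkerStatus s) -> s.load))
                .map(s -> s.workerId);
    }
}

class WorkerRegistryDemo {
    public static void main(String[] args) {
        WorkerRegistry registry = new WorkerRegistry(1000);
        registry.onStatus(new WorkerStatus("a", 0.9, true, 0));
        registry.onStatus(new WorkerStatus("b", 0.2, true, 0));
        System.out.println(registry.pickWorker(500));
    }
}
```

With a registry like this, the central service could send each task to a per-worker queue named after the chosen workerId. That naming scheme is also an assumption, not something the question specifies.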

Related

using JMS for long running processes?

Can someone point me to a tutorial or similar code where JMS is used by a web app to execute a long-running background process (instead of using threads)? I'm fairly familiar with the concepts of JMS messaging, but have never used any JMS API or broker (I'm looking at learning Apache ActiveMQ).
I'd like to be able to:
submit a message to the queue to run a process
check the status (progress) on that process at arbitrary times
Thanks!
The real point of using JMS in your context is to start tasks asynchronously. This is called fire and forget in middleware lingo. JMS has guaranteed delivery semantics, meaning that once the message has been put on the queue it is guaranteed to get there ... eventually.
The idea is you do any tasks you need to do and if you have any tasks in the process that can be done at a later time, then you put a message on a queue and later it will execute. This allows you to cut down processing by a significant amount while somebody is waiting for a response.
Another benefit of JMS is that the different parts of the system do not need to be running at the same time. The part that consumes messages can be down for maintenance while your front end still works.
The previous post is accurate in terms of a model to put orders or requests into a queue asynchronously and then have them be picked up later. However, it doesn't really address the question of long running processes.
In terms of queues and topics, the benefit of persistent queues is that if there are no consumers on the queue then messages will be waiting for consumption until there is a subscriber. In a topic, you need to create a durable subscription in order to make sure a consumer that is not connected will receive messages that are sent in its absence once it reconnects.
So, how are you defining a long-running process? For a multi-step process you would typically use something like a workflow engine. There are options like a BPM tool or something like "OS Workflow". You can also build a home-grown solution that could look like the example below:
1) There would need to be some sort of workflow definition that defines the steps in the process. This could be a properties file or an XML file.
2) Web App puts a message on a queue or topic (pub/sub) with an indication of the process to be executed (or you can have specific destinations for different processes)
3) A Dispatcher MDB picks the 'order' up off the queue with a status of 'NEW' and starts processing the first step.
4) Once the step is complete, the MDB puts a new message on the queue indicating the process being executed and either the next step to be executed, or the last step that was executed (depending on how deterministic you want the process to be)
5) The MDB picks up the message and sees that the process is 'IN_PROGRESS'. It either determines the next step to be executed or reads the step to be executed next from the message (either a JMS header value or within the message, perhaps in an XML format)
6) Steps 4 & 5 are repeated until the process instance is complete
In this case you will need an external representation of the order and process instance information. This will allow you to check the status of a request from your WebApp. Your order would need to be read and persisted with an updated status after each step in the process such that the WebApp could access the status information.
The key component of this architecture is the dispatcher MDB that listens for messages and executes the next step of the process. When I worked with OS Workflow, that was one key piece that was missing. In this manner, you can control the number of threads that are executing process steps by controlling the number of MDBs in the pool and consumers on the queue. In this architecture I would recommend a queue over a topic for the workflow steps. However, after each process step you could publish a message to a topic so that subscribers get updated status information.
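The dispatcher loop described in steps 1 to 6 can be sketched roughly as follows. This is an in-memory stand-in rather than real MDB code: an ArrayDeque plays the role of the JMS queue, a HashMap plays the role of the persisted order status, and the class names, the 'COMPLETE' status and the step names are all assumptions made for illustration. It also assumes the workflow definition has at least one step.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Hypothetical message: which process instance, and which step to run next. */
final class StepMessage {
    final String instanceId;
    final int nextStep;

    StepMessage(String instanceId, int nextStep) {
        this.instanceId = instanceId;
        this.nextStep = nextStep;
    }
}

/** Sketch of the dispatcher; a real MDB would handle one message per onMessage call. */
final class WorkflowDispatcher {
    private final List<String> steps;                              // workflow definition (step names)
    private final Queue<StepMessage> queue = new ArrayDeque<>();   // stands in for the JMS queue
    private final Map<String, String> status = new HashMap<>();    // stands in for the database
    final List<String> executed = new ArrayList<>();               // record of executed steps

    WorkflowDispatcher(List<String> steps) {
        this.steps = steps;
    }

    /** What the web app does: record the order as NEW and enqueue the first step. */
    void submit(String instanceId) {
        status.put(instanceId, "NEW");
        queue.add(new StepMessage(instanceId, 0));
    }

    /** Handles one message: run the step, persist the status, enqueue the follow-up. */
    boolean dispatchOne() {
        StepMessage message = queue.poll();
        if (message == null) {
            return false;
        }
        executed.add(message.instanceId + ":" + steps.get(message.nextStep));
        int following = message.nextStep + 1;
        if (following < steps.size()) {
            status.put(message.instanceId, "IN_PROGRESS");
            queue.add(new StepMessage(message.instanceId, following));
        } else {
            status.put(message.instanceId, "COMPLETE");
        }
        return true;
    }

    /** What the web app would call to check progress. */
    String statusOf(String instanceId) {
        return status.get(instanceId);
    }
}

class WorkflowDispatcherDemo {
    public static void main(String[] args) {
        WorkflowDispatcher dispatcher =
                new WorkflowDispatcher(Arrays.asList("validate", "charge", "ship"));
        dispatcher.submit("order-1");
        while (dispatcher.dispatchOne()) {
            System.out.println(dispatcher.statusOf("order-1"));
        }
    }
}
```

Because every step goes back through the queue, the number of concurrent steps is bounded by the number of consumers, which is the property the answer attributes to the MDB pool.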
With the Java EE6 technologies including JPA you could easily create an XSD, generate domain data model POJOs with JAXB and use JPA for persistence. We did a webcast earlier this year that covered the JEE6 technologies that are currently supported in WebLogic. Here are the replays: http://www.oracle.com/technetwork/middleware/weblogic/learnmore/weblogic-javaee6-webcasts-358613.html.
I'm also still interested to speak with you about your JBoss migration :) jeffrey.west#oracle.com

Can I throttle requests made by a distributed app?

My application makes Web Service requests; there is a max rate of requests the provider will handle, so I need to throttle them down.
When the app ran on a single server, I used to do it at the application level: an object that keeps track of how many requests have been made so far, and waits if the current request would exceed the maximum allowed load.
Now, we're migrating from a single server to a cluster, so there are two copies of the application running.
I can't keep checking for the max load at the application code, because the two nodes combined might exceed the allowed load.
I can't simply reduce the load on each server, because if the other node is idle, the first node can send out more requests.
This is a JavaEE 5 environment. What is the best way to throttle the requests the application sends out?
Since you are already in a Java EE environment, you can create an MDB that handles all requests to the web service based on a JMS queue. The instances of the application can simply post their requests to the queue and the MDB will receive them and call the web service.
The queue can actually be configured with the appropriate number of sessions, which limits concurrent access to your web service; thus your throttling is handled via the queue config.
The results can be returned via another queue (or even a queue per application instance).
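As a rough illustration of why a fixed number of consumers throttles the calls, here is a stand-alone sketch that uses a fixed-size thread pool in place of the MDB pool. The pool's internal work queue stands in for the JMS queue, and ThrottledCaller and its method names are made up for this example. Note that this bounds concurrency only, not the number of calls per second.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for "queue + N consumers" in front of the web service. */
final class ThrottledCaller {
    private final ExecutorService consumers;                        // N consumer threads plus a work queue
    private final AtomicInteger inFlight = new AtomicInteger();     // calls running right now
    private final AtomicInteger maxObserved = new AtomicInteger();  // highest concurrency seen

    ThrottledCaller(int maxConcurrent) {
        this.consumers = Executors.newFixedThreadPool(maxConcurrent);
    }

    /** Application instances post requests; they wait in the queue until a consumer is free. */
    void post(Runnable webServiceCall) {
        consumers.execute(() -> {
            int now = inFlight.incrementAndGet();
            maxObserved.accumulateAndGet(now, Math::max);
            try {
                webServiceCall.run();
            } finally {
                inFlight.decrementAndGet();
            }
        });
    }

    /** Stops accepting requests and waits for the queued ones to finish. */
    boolean drain(long timeoutMillis) {
        consumers.shutdown();
        try {
            return consumers.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    int maxObserved() {
        return maxObserved.get();
    }
}

class ThrottledCallerDemo {
    public static void main(String[] args) {
        ThrottledCaller caller = new ThrottledCaller(2);
        for (int i = 0; i < 6; i++) {
            caller.post(() -> {
                try {
                    Thread.sleep(10); // pretend to call the web service
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        caller.drain(5000);
        System.out.println("max concurrent calls: " + caller.maxObserved());
    }
}
```

In the clustered setup from the question, the queue would live in the broker and be shared by both nodes, so the limit applies to the cluster as a whole rather than to each node.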
The N nodes need to communicate. There are various strategies:
broadcast: each node broadcasts to all the others that it's making a call, and the other nodes take that into account. Nodes are equal and each maintains its own copy of the global count (each node knows about every other node's calls).
master node: one node is special; it's the master, and all other nodes ask its permission before making a call. The master is the only one that knows the global count.
dedicated master: same as master, but the 'master' doesn't make calls itself; it's just a service that keeps track of calls.
Depending on how far you expect to scale later, one or the other strategy may be best. For 2 nodes the simplest one is broadcast, but as the number of nodes increases the problems start to mount (you'll be spending more time broadcasting and responding to broadcasts than actually doing WS requests).
How the nodes communicate is up to you. You can open a TCP pipe, you can broadcast over UDP, you can build a fully fledged WS for this purpose alone, or you can use a file share protocol. Whatever you do, you are no longer inside a single process, so all the fallacies of distributed computing apply.
Many ways of doing this: you might have a "Coordination Agent" which is responsible for handing "tokens" to the servers. Each "token" represents a permission to perform a task etc. Each application needs to request "tokens" in order to place calls.
Once an application depletes its tokens, it must ask for some more before proceeding to hit the Web Service again.
Of course, this all gets complicated when there are requirements on the timing of the calls each application makes, because the applications hit the Web Service concurrently.
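A minimal sketch of the token idea, assuming the agent hands out a fixed budget of tokens per time window and each node asks for tokens in small batches. CoordinationAgent and ThrottledNode are hypothetical names; in a real cluster the agent would sit behind some remote interface, and something such as a timer would call startNewWindow.

```java
/** Hypothetical agent that owns the global budget of calls per time window. */
final class CoordinationAgent {
    private final int tokensPerWindow;
    private int remaining;

    CoordinationAgent(int tokensPerWindow) {
        this.tokensPerWindow = tokensPerWindow;
        this.remaining = tokensPerWindow;
    }

    /** Grants up to 'wanted' tokens, fewer (possibly zero) if the budget is nearly spent. */
    synchronized int requestTokens(int wanted) {
        if (wanted <= 0) {
            throw new IllegalArgumentException("wanted must be positive");
        }
        int granted = Math.min(wanted, remaining);
        remaining -= granted;
        return granted;
    }

    /** Refills the budget; assumed to be triggered once per window. */
    synchronized void startNewWindow() {
        remaining = tokensPerWindow;
    }
}

/** Hypothetical per-application wrapper around the Web Service call. */
final class ThrottledNode {
    private final CoordinationAgent agent;
    private final int batchSize;
    private int tokens;

    ThrottledNode(CoordinationAgent agent, int batchSize) {
        this.agent = agent;
        this.batchSize = batchSize;
    }

    /** Makes the call if a token is available; returns false if the caller must wait. */
    synchronized boolean tryCall(Runnable webServiceCall) {
        if (tokens == 0) {
            tokens = agent.requestTokens(batchSize);
        }
        if (tokens == 0) {
            return false;
        }
        tokens--;
        webServiceCall.run();
        return true;
    }
}

class CoordinationAgentDemo {
    public static void main(String[] args) {
        CoordinationAgent agent = new CoordinationAgent(5);
        ThrottledNode nodeA = new ThrottledNode(agent, 3);
        ThrottledNode nodeB = new ThrottledNode(agent, 3);
        for (int i = 0; i < 4; i++) {
            System.out.println("A: " + nodeA.tryCall(() -> { }) + ", B: " + nodeB.tryCall(() -> { }));
        }
    }
}
```

Requesting tokens in batches reduces chatter with the agent, at the cost that an idle node may sit on tokens a busy node could have used. A smaller batch size trades one for the other.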
You could rely on RabbitMQ as a messaging framework: Java bindings are available.
I recommend using beanstalkd to periodically pump a collection of requests (jobs) into a tube (queue), each with an appropriate delay. Any number of "worker" threads or processes will wait for the next request to be available, and if a worker finishes early it can pick up the next request. The down side is that there isn't any explicit load balancing between workers, but I have found that distribution of requests out of the queue has been well balanced.
This is an interesting problem, and the difficulty of the solution depends to a degree on how strict you want to be on the throttling.
My usual solution to this is JBossCache, partly because it comes packaged with JBoss AppServer, but also because it handles the task rather well. You can use it as a kind of distributed hashmap, recording the usage statistics at various degrees of granularity. Updates to it can be done asynchronously, so it doesn't slow things down.
JBossCache is usually used for heavy-duty distributed caching, but I rather like it for these lighter-weight jobs too. It's pure java, and requires no mucking about with the JVM (unlike Terracotta).
Hystrix was designed for pretty much the exact scenario you're describing. You can define a thread pool size for each service so you have a set maximum number of concurrent requests, and it queues up requests when the pool is full. You can also define a timeout for each service and when a service starts exceeding its timeout, Hystrix will reject further requests to that service for a short period of time in order to give the service a chance to get back on its feet. There's also real time monitoring of the entire cluster through Turbine.

Real world use of JMS/message queues? [closed]

I was just reading a bit about JMS and Apache ActiveMQ.
I was wondering what real-world uses people here have found for JMS or similar message queue technologies?
JMS (ActiveMQ is a JMS broker implementation) can be used as a mechanism to allow asynchronous request processing. You may wish to do this because the request takes a long time to complete or because several parties may be interested in the actual request. Another reason for using it is to allow multiple clients (potentially written in different languages) to access information via JMS. ActiveMQ is a good example here because you can use the STOMP protocol to allow access from a C#/Java/Ruby client.
A real world example is that of a web application that is used to place an order for a particular customer. As part of placing that order (and storing it in a database) you may wish to carry out a number of additional tasks:
Store the order in some sort of third party back-end system (such as SAP)
Send an email to the customer to inform them their order has been placed
To do this your application code would publish a message onto a JMS queue which includes an order id. One part of your application listening to the queue may respond to the event by taking the order id, looking the order up in the database and then placing that order with another third party system. Another part of your application may be responsible for taking the order id and sending a confirmation email to the customer.
Use them all the time to process long-running operations asynchronously. A web user won't want to wait for more than 5 seconds for a request to process. If you have one that runs longer than that, one design is to submit the request to a queue and immediately send back a URL that the user can check to see when the job is finished.
Publish/subscribe is another good technique for decoupling senders from many receivers. It's a flexible architecture, because subscribers can come and go as needed.
I've had so many amazing uses for JMS:
Web chat communication for customer service.
Debug logging on the backend. All app servers broadcast debug messages at various levels. A JMS client could then be launched to watch for debug messages. Sure I could've used something like syslog, but this gave me all sorts of ways to filter the output based on contextual information (e.g. by app server name, api call, log level, userid, message type, etc...). I also colorized the output.
Debug logging to file. Same as above, only specific pieces were pulled out using filters, and logged to file for general logging.
Alerting. Again, a similar setup to the above logging, watching for specific errors, and alerting people via various means (email, text message, IM, Growl pop-up...)
Dynamically configuring and controlling software clusters. Each app server would broadcast a "configure me" message, and a configuration daemon would respond with a message containing all kinds of config info. Later, if all the app servers needed their configurations changed at once, it could be done from the config daemon.
And the usual - queued transactions for delayed activity such as billing, order processing, provisioning, email generation...
It's great anywhere you want to guarantee delivery of messages asynchronously.
Distributed (a)synchronous computing.
A real world example could be an application-wide notification framework, which sends mails to the stakeholders at various points during the course of application usage. So the application would act as a Producer by creating a Message object, putting it on a particular Queue, and moving forward.
There would be a set of Consumers who would subscribe to the Queue in question, and would take care of handling the Message sent across. Note that during the course of this transaction, the Producers are decoupled from the logic of how a given Message would be handled.
Messaging frameworks (ActiveMQ and the like) act as a backbone to facilitate such Message transactions by providing MessageBrokers.
I've used it to send intraday trades between different fund management systems. If you want to learn more about what a great technology messaging is, I can thoroughly recommend the book "Enterprise Integration Patterns". There are some JMS examples for things like request/reply and publish/subscribe.
Messaging is an excellent tool for integration.
We use it to initiate asynchronous processing that we don't want to interrupt or conflict with an existing transaction.
For example, say you've got an expensive and very important piece of logic like "buy stuff"; an important part of buy stuff would be 'notify stuff store'. We make the notify call asynchronous so that whatever logic/processing is involved in the notify call doesn't block the buy business logic or contend with it for resources. End result: buy completes, user is happy, we get our money, and because the queue has guaranteed delivery the store gets notified as soon as it opens or as soon as there's a new item in the queue.
I have used it for an academic project, an online retail website similar to Amazon.
JMS was used to handle the following features:
Updating the position of the orders placed by customers as the shipment travels from one location to another. This was done by continuously sending messages to a JMS queue.
Alerting about any unusual events, like a shipment getting delayed, and then sending an email to the customer.
Sending a delivery event once the delivery has reached its destination.
We had also implemented multiple remote clients connected to the main server. If a connection was available, they accessed the main database; if not, they used their own database. To keep the data consistent, we implemented a 2PC mechanism.
For this, we used JMS to exchange messages between these systems, i.e. one acts as the coordinator, which initiates the process by sending a message on the queue, and the others respond accordingly by sending a message back on the queue.
As others have already mentioned, this was similar to the pub/sub model.
I have seen JMS used in different commercial and academic projects. JMS can easily come into the picture whenever you want a totally decoupled distributed system. Generally speaking, it fits when you need to send a request from one node and have someone in your network take care of it, with or without giving the sender any information about the receiver.
In my case, I have used JMS in developing a message-oriented middleware (MOM) in my thesis, where specific types of object-oriented objects are generated in one side as your request, and compiled and executed on the other side as your response.
Apache Camel used in conjunction with ActiveMQ is a great way to implement Enterprise Integration Patterns.
We have used messaging to generate online Quotes
We are using JMS for communication with systems in a huge number of remote sites over unreliable networks. The loose coupling in combination with reliable messaging produces a stable system landscape: each message will be sent as soon as it is technically possible, and even bigger network problems will not affect the whole system landscape...

What's the best way to process an asynchronous queue continuously in Java?

I'm having a hard time figuring out how to architect the final piece of my system. Currently I'm running a Tomcat server that has a servlet that responds to client requests. Each request in turn adds a processing message to an asynchronous queue (I'll probably be using JMS via Spring or more likely Amazon SQS).
The sequence of events is this:
Sending side:
1. Take a client request
2. Add some data into a DB related to this request with a unique ID
3. Add a message object representing this request to the message queue
Receiving side:
1. Pull a new message object from the queue
2. Unwrap the object and grab some information from a web site based on information contained in the msg object.
3. Send an email alert
4. Update my DB row (same unique ID) with the information that the operation was completed for this request.
I'm having a hard time figuring out how to properly deal with the receiving side. On one hand I can probably create a simple java program that I kick off from the command line that picks each item in the queue and processes it. Is that safe? Does it make more sense to have that program running as another thread inside the Tomcat container? I will not want to do this serially, meaning the receiving end should be able to process several objects at a time -- using multiple threads. I want this to be always running, 24 hours a day.
What are some options for building the receiving side?
"On one hand I can probably create a simple java program that I kick off from the command line that picks each item in the queue and processes it. Is that safe?"
What's unsafe about it? It works great.
"Does it make more sense to have that program running as another thread inside the Tomcat container?"
Only if Tomcat has a lot of free time to handle background processing. Often, this is the case -- you have free time to do this kind of processing.
However, threads aren't optimal. Threads share common I/O resources, and your background thread may slow down the front-end.
Better is to have a JMS queue between the "port 80" front-end, and a separate backend process. The back-end process starts, connects to the queue, fetches and executes the requests. The backend process can (if necessary) be multi-threaded.
If you are using JMS, why are you placing the tasks into a DB?
You can use a durable Queue in JMS. This would keep tasks, even if the JMS broker dies, until they have been acknowledged. You can have redundant brokers so that if one broker dies, the second automatically takes over. This could be more reliable than using a single DB.
If you are already using Spring, check out DefaultMessageListenerContainer. It allows you to create a POJO message driven bean. This can be used from within an existing application container (your WAR file) or as a separate process.
I've done this sort of thing by hosting the receiver in an app server, WebLogic in my case, but Tomcat works fine, too. Don't poll the queue, use an event-based model. This could be hand-coded or it could be a message-driven web service. If the database update is idempotent, you could update the database and send the email, then issue the commit on the queue. It's not a problem to have several threads that all read from the same queue.
I've used various JMS solutions, including TIBCO, ActiveMQ (before Apache subsumed it) and JORAM. JORAM was the more reliable open-source solution, but that may have changed now that ActiveMQ is part of Apache.

java: what are the best techniques for communicating with a batch server?

I have a web application (pure Java servlets) that has some heavy computational work, with database access, that can be done asynchronously.
I'm planning to use a dedicated server to execute such batch jobs and I'm wondering which tools/techniques/protocols to use for communication between servlets in the WEB server and batch jobs in the new dedicated server.
I'm looking at JMS. Is it the right choice?
Are there industry-standard and/or widely adopted techniques?
I also need queueing and priority handling for multiple simultaneous jobs.
JMS is a pretty standard solution. The high-end platforms (Sun's JCAPS, for example) make heavy use of JMS to partition and manage the workload of web services.
There are many advantages to buying a high-end JMS implementation from Sun (or IBM or Microsoft). First, you get things like reliable message queues that are backed to the file system. No message can get lost. Second, you get some monitoring and management tools.
One cool thing is to have a JMS queue with (potentially) multiple subscribers to do workload balancing.
Another cool thing is to have JMS topic which has a logging process as well as the real work process subscribed. The logging process picks off the messages and simply records the essential stages of the job being started and stopped.
Messaging is one of the best options.
Make the messaging framework very generic so that it can handle any type of batch jobs.
One approach is to have an event/task manager where you put an event on the queue and the queue consumer processes the event and converts it into a set of tasks. The tasks can then be executed by separate task handlers. A task can also generate some more events that can be again put on the queues to provide a feedback loop. This way you can add work flow like features to the framework and allow your batch jobs to have dependencies on each other.
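The event/task manager idea could look something like the sketch below. It is an in-memory illustration only: an ArrayDeque stands in for the queue, events are plain strings, and the names EventTaskManager, Task and planner are invented for this example. A task returns the follow-up events it wants to publish, which gives the feedback loop described above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.function.Function;

/** Hypothetical unit of work; it may return follow-up events to publish. */
interface Task {
    List<String> run();
}

/** Sketch of an event/task manager with a feedback loop. */
final class EventTaskManager {
    private final Queue<String> events = new ArrayDeque<>();   // stands in for the message queue
    private final Function<String, List<Task>> planner;        // converts an event into tasks
    final List<String> handled = new ArrayList<>();            // record of processed events

    EventTaskManager(Function<String, List<Task>> planner) {
        this.planner = planner;
    }

    void publish(String event) {
        events.add(event);
    }

    /** Consumes events until the queue is empty, feeding task output back in. */
    void runUntilIdle() {
        String event;
        while ((event = events.poll()) != null) {
            handled.add(event);
            for (Task task : planner.apply(event)) {
                events.addAll(task.run());
            }
        }
    }
}

class EventTaskManagerDemo {
    public static void main(String[] args) {
        // Assumed job dependency for illustration: "import" must finish before "report" starts.
        EventTaskManager manager = new EventTaskManager(event -> {
            List<Task> tasks = new ArrayList<>();
            if (event.equals("import")) {
                tasks.add(() -> Collections.singletonList("report"));
            }
            return tasks;
        });
        manager.publish("import");
        manager.runUntilIdle();
        System.out.println(manager.handled);
    }
}
```

Tasks that keep emitting events would keep this loop running forever, so a real framework would need some guard against cycles between jobs.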
JMS would be the appropriate solution for sending your batch jobs from the servlet. It may not be the best solution for the batch server to communicate with the servlet though, as it cannot be a listener to messages.
As I don't know what the communication from the batch server to the servlet is supposed to entail, I can only say that there are probably several options you can use (yes, JMS is one of them). But they all basically rely on polling calls to the servlet, which will then check in some way to see if there is anything from the batch server waiting. This could simply be a call to a servlet on the batch server, or receive calls on a JMS response queue. Other solutions are available, but the point is that it is not asynchronous unless you have the ability to push from the batch server all the way to your client end (a browser, I am guessing) via something like AJAX.
Anyway, just something to keep in mind.
Another alternative for asynchronous processing is to have the web application store the request in the database, and have the batch process poll the database for new batch jobs to process. Since your application appears to be smaller (pure Java Servlets) this may be a simpler and lower cost solution.
Hope it helps.
We use JMS with web services:
Client requests computation via web service
Server writes JMS message, and creates an ID value which is stored in a database along with a status (initially "Pending"). Server returns the id to the client.
Server (can be separate server) reads JMS message, does computation, and when finished updates the status to "Completed" in the database
While the computation is ongoing, the client is polling the server to determine the status using another web service (along with the id). The server returns the status which is retrieved from the database. Once the server computation is completed, the client will see the "Completed" status and know that the computation is complete.
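The flow above can be sketched as follows. A LinkedBlockingQueue stands in for the JMS queue and a ConcurrentHashMap for the status table; JobService, its method names and the "Unknown" status for unrecognised ids are assumptions made for this illustration.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical service combining the submit, compute and status-polling steps. */
final class JobService {
    private final Map<String, String> statusById = new ConcurrentHashMap<>();    // database stand-in
    private final BlockingQueue<String> jobQueue = new LinkedBlockingQueue<>();  // JMS queue stand-in

    /** Web service call: store "Pending", enqueue the job, return the id to the client. */
    String submit() {
        String id = UUID.randomUUID().toString();
        statusById.put(id, "Pending");
        jobQueue.add(id);
        return id;
    }

    /** Web service call the client polls with the id it was given. */
    String status(String id) {
        return statusById.getOrDefault(id, "Unknown");
    }

    /** Consumer side: take one job, run the computation, mark it "Completed". */
    boolean processNext(Runnable computation) {
        String id = jobQueue.poll();
        if (id == null) {
            return false;
        }
        computation.run();
        statusById.put(id, "Completed");
        return true;
    }
}

class JobServiceDemo {
    public static void main(String[] args) {
        JobService service = new JobService();
        String id = service.submit();
        System.out.println(service.status(id));   // Pending
        service.processNext(() -> { /* long-running computation goes here */ });
        System.out.println(service.status(id));   // Completed
    }
}
```

In the real setup, submit and status would be exposed as web services, while processNext would run on the (possibly separate) server that consumes the JMS messages.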
