What's the best way to process an asynchronous queue continuously in Java?

I'm having a hard time figuring out how to architect the final piece of my system. Currently I'm running a Tomcat server that has a servlet that responds to client requests. Each request in turn adds a processing message to an asynchronous queue (I'll probably be using JMS via Spring or more likely Amazon SQS).
The sequence of events is this:
Sending side:
1. Take a client request
2. Add some data into a DB related to this request with a unique ID
3. Add a message object representing this request to the message queue
Receiving side:
1. Pull a new message object from the queue
2. Unwrap the object and grab some information from a web site based on information contained in the msg object.
3. Send an email alert
4. Update my DB row (same unique ID) to record that the operation was completed for this request.
I'm having a hard time figuring out how to properly deal with the receiving side. On one hand I can probably create a simple Java program that I kick off from the command line that picks each item in the queue and processes it. Is that safe? Does it make more sense to have that program running as another thread inside the Tomcat container? I don't want to do this serially, meaning the receiving end should be able to process several objects at a time using multiple threads. I want this to be always running, 24 hours a day.
What are some options for building the receiving side?

"On one hand I can probably create a simple java program that I kick off from the command line that picks each item in the queue and processes it. Is that safe?"
What's unsafe about it? It works great.
"Does it make more sense to have that program running as another thread inside the Tomcat container?"
Only if Tomcat has a lot of free time to handle background processing. Often, this is the case -- you have free time to do this kind of processing.
However, threads aren't optimal. Threads share common I/O resources, and your background thread may slow down the front-end.
Better is to have a JMS queue between the "port 80" front-end, and a separate backend process. The back-end process starts, connects to the queue, fetches and executes the requests. The backend process can (if necessary) be multi-threaded.
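A minimal sketch of such a multi-threaded back-end process might look like the following. It is only an illustration: a `BlockingQueue` stands in for the real JMS or SQS queue, and `QueueWorkerPool`, `handle` and the message strings are hypothetical names, not part of any library. The `handle` method is where the site fetch, email alert and DB update from the question would go.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical back-end consumer: several worker threads pull from one shared queue.
class QueueWorkerPool {
    private final BlockingQueue<String> queue;   // assumption: stand-in for the JMS/SQS queue
    private final ExecutorService workers;
    private final int threads;
    private final List<String> processed = new CopyOnWriteArrayList<>();
    private volatile boolean running = true;

    QueueWorkerPool(BlockingQueue<String> queue, int threads) {
        this.queue = queue;
        this.threads = threads;
        this.workers = Executors.newFixedThreadPool(threads);
    }

    // Starts one consume loop per worker thread.
    void start() {
        for (int i = 0; i < threads; i++) {
            workers.execute(this::consumeLoop);
        }
    }

    private void consumeLoop() {
        // Keep going until shutdown is requested and the queue has been drained.
        while (running || !queue.isEmpty()) {
            try {
                String message = queue.poll(100, TimeUnit.MILLISECONDS);
                if (message != null) {
                    handle(message);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    // Placeholder for the real work: fetch from the web site, send the email, update the DB row.
    private void handle(String message) {
        processed.add(message);
    }

    List<String> processed() {
        return processed;
    }

    // Stops accepting new loop iterations and waits for the workers to finish.
    boolean shutdown(long timeoutMillis) {
        running = false;
        workers.shutdown();
        try {
            return workers.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < 10; i++) {
            queue.add("request-" + i);
        }
        QueueWorkerPool pool = new QueueWorkerPool(queue, 4);
        pool.start();
        System.out.println("stopped cleanly: " + pool.shutdown(5_000));
        System.out.println("processed: " + pool.processed().size());
    }
}
```

With a real broker, the loop body would be replaced by the broker's receive call or listener callback; the thread pool and the clean shutdown are the parts that carry over.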

If you are using JMS, why are you placing the tasks into a DB?
You can use a durable Queue in JMS. This would keep tasks, even if the JMS broker dies, until they have been acknowledged. You can have redundant brokers so that if one broker dies, the second automatically takes over. This could be more reliable than using a single DB.

If you are already using Spring, check out DefaultMessageListenerContainer. It allows you to create a POJO message driven bean. This can be used from within an existing application container (your WAR file) or as a separate process.

I've done this sort of thing by hosting the receiver in an app server, WebLogic in my case, but Tomcat works fine, too. Don't poll the queue; use an event-based model. This could be hand-coded or it could be a message-driven web service. If the database update is idempotent, you could update the database and send the email, then issue the commit on the queue. It's not a problem to have several threads that all read from the same queue.
I've used various JMS solutions, including TIBCO, ActiveMQ (before Apache subsumed it) and JORAM. JORAM was the more reliable open-source solution, but that may have changed now that ActiveMQ is part of Apache.

Related

SpringBoot Process Runtime Performance issue

I have the code below, which converts files from one format to another. The process can take a while, so we wait up to 5 minutes and destroy it if it has not finished by then.
The application flow is: an HTTP call from the browser hits a Spring Boot @Controller class, which eventually executes the code below in a @Service class of the application.
During a load test I see many formatter.exe processes in Task Manager, even after the Spring Boot application is closed. Is this the correct way to implement this in a multi-user, concurrent environment? Also, can someone help me improve the performance of executing the exe when multiple requests are made simultaneously?
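// Start the external formatter (path as given in the question).
process = Runtime.getRuntime().exec("c:\\modifier\\formatter.exe");
// Wait up to 5 minutes; kill the process if it has not finished by then.
if (!process.waitFor(5, TimeUnit.MINUTES)) {
    process.destroyForcibly();
    process = null;
}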
It is not good practice to keep an HTTP request waiting for up to 5 minutes while a separate process completes. I assume your endpoint is synchronous (not an async request mapping), since you have not provided the mapping details.
If you start a separate process, it keeps running until it exits or you explicitly shut it down or kill it, even if it has hung. Refer to this question to understand how to terminate a process, and refer to this document as well.
As I said, keeping an HTTP request waiting for 5 minutes is not good practice. Since you use Spring Boot, I suggest a different approach. You can make your endpoint asynchronous using the @Async annotation so that the request does not wait for the process to complete. (How To Do @Async in Spring is a good document to read here.)
You can then change the controller implementation to use a message broker (RabbitMQ, ActiveMQ, JMS and so forth) to queue requests and respond to the client immediately (see Messaging with RabbitMQ). Your client (the browser) sees the response immediately, even before the process has started. You can then handle the response on the client side however you want.
Next, you can write a separate program to dequeue messages from the broker and start the external process. It doesn't matter how long that process takes, because the client has already received a response, and you don't need to destroy the process before it completes. (If it hangs, just kill it and put the message back on the queue. This way every request is eventually processed.)
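The "kill it and re-queue" idea could be sketched roughly as below. This is an assumption-laden illustration, not the answer's actual implementation: an in-memory `Deque` stands in for the broker, `RequeueingConsumer`, `runProcess` and `drain` are hypothetical names, and the `maxAttempts` cap is my own addition so that a permanently broken job cannot loop forever.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

// Hypothetical consumer that re-queues a message when its external process fails or hangs.
class RequeueingConsumer {

    // Runs a command and kills it on timeout. Returns true only for exit code 0 within the timeout.
    static boolean runProcess(List<String> command, long timeout, TimeUnit unit) {
        Process process;
        try {
            process = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    // Discard output so the child cannot block on a full pipe (Java 9+).
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
        } catch (IOException e) {
            return false;
        }
        try {
            if (process.waitFor(timeout, unit)) {
                return process.exitValue() == 0;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        process.destroyForcibly();   // timed out or interrupted: do not leave an orphan behind
        return false;
    }

    // Processes every message; a failed message goes back to the tail of the queue
    // until it has been tried maxAttempts times. Returns the messages that were given up on.
    static List<String> drain(Deque<String> queue, Predicate<String> attempt, int maxAttempts) {
        Map<String, Integer> attempts = new HashMap<>();
        List<String> abandoned = new ArrayList<>();
        String message;
        while ((message = queue.pollFirst()) != null) {
            int tries = attempts.merge(message, 1, Integer::sum);
            if (attempt.test(message)) {
                continue;                       // done, nothing to re-queue
            }
            if (tries < maxAttempts) {
                queue.addLast(message);         // re-queue for another attempt
            } else {
                abandoned.add(message);
            }
        }
        return abandoned;
    }

    public static void main(String[] args) {
        // The path is the one from the question; each queued message is the executable to run.
        Deque<String> queue = new ArrayDeque<>(List.of("c:\\modifier\\formatter.exe"));
        List<String> abandoned =
                drain(queue, path -> runProcess(List.of(path), 5, TimeUnit.MINUTES), 3);
        System.out.println("abandoned: " + abandoned);
    }
}
```

Running one such consumer per CPU core (or fewer) also bounds how many formatter.exe instances exist at once, which is usually what matters under load.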
After the process is done, you can notify the client with the resulting data via a push notification or a WebSocket.
I know this could be overdoing a simple task. But if you want to keep your feature reliable and usable, it is worth doing. You can also use this in a microservice architecture; in fact, it is a core microservices concept. Here is a good read to learn more about this approach.

Clustered event driven Java application - Should I use Websockets or polling?

I'm creating a monitor application that monitors the activities of a user. There are four elements in my system:
EventCatcher: The EventCatcher is responsible for catching all the events that happen in a subsystem and pushing the data to the EventHandler. Based on observation, an average of 10 events per second is pushed to the EventHandler. Some events are UserLogin, UserLogout.
EventHandler: The EventHandler is a singleton class that handles all the incoming events from the EventCatcher. It also keeps track of all the logged-in users in the system. So, whenever the EventHandler receives a UserLogin event, the User object is extracted from the event and stored in a HashMap. When a UserLogout event is received, that User object is removed from the HashMap. This class also maintains a Set of all active WebSocket sessions, because every time an event occurs I want to inform all the open sessions that it happened.
Websocket Endpoint: This is just a simple Java class annotated with @ServerEndpoint.
Clients: The system I will be building is for internal (company) use only. At production, at most, there will only be around 5 - 10 clients. All the clients will be receiving the same information every time an event has occurred.
So right now I am trying to convince my supervisor that Websockets is the way to go, however, my supervisor finds it really unnecessary because a simple polling solution would do the trick.
His points are:
We don't really need up-to-date information by the millisecond. We can poll every second.
If I were to maintain a list of open WebSocket sessions, how would that work in a clustered environment (we use a load balancer)?
If I plan to send information to the client every time an event (UserLogin, UserLogout) has occurred, I should be able to just send small updates to all WebSocket sessions - meaning, I can't be sending a whole JSON dump of everything. So that means, for every WebSocket instance, I would have to maintain another Set of Users and properly maintain it to mirror the Set contained in the EventHandler.
What my supervisor suggests is that I lose the WebSocket and just convert it to a simple Servlet and let the clients poll every second to receive the entire JSON dump.
In this scenario, should I stick with WebSockets? Or should I just poll?
The main advantage, as far as I've read, of Websockets vs. polling is that by using Websockets, you will have a persistent connection from client to server. HTTP is not really meant for real-time data.
Also, polling requires sending an HTTP request every time, and every request comes with HTTP headers. If an HTTP request header contains 800 bytes, then at one poll per second that's 800 × 60 = 48,000 bytes (about 48 kB) sent per minute per client. With a WebSocket, this isn't a problem.
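For what it's worth, the header overhead is easy to put into numbers. The snippet below is just that arithmetic; `PollingOverhead` is a hypothetical helper, and the 800-byte header, the one-second interval and the 10 clients are the figures mentioned in this question, not measurements.

```java
// Hypothetical helper: HTTP header bytes sent per minute by polling clients.
class PollingOverhead {
    static long headerBytesPerMinute(int headerBytesPerRequest, int pollsPerSecond, int clients) {
        return (long) headerBytesPerRequest * pollsPerSecond * 60 * clients;
    }

    public static void main(String[] args) {
        System.out.println(headerBytesPerMinute(800, 1, 1));   // prints 48000
        System.out.println(headerBytesPerMinute(800, 1, 10));  // prints 480000
    }
}
```

So even with all 10 clients polling, the headers add up to under half a megabyte per minute, which supports the point that bandwidth alone is not a strong argument either way at this scale.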
But then again, we won't really have a lot of active clients. We're not concerned about third parties sniffing our requests because this system is for company use only - internal use! And I believe my supervisor wants something simple and reliable.
I am fine with either way. I just want to be sure whether I'm using the right tool for the job.
Additional question: If WebSockets is the way to go, is there any reason why I should consider polling?
The entire purpose of WebSocket is to efficiently support continuing connections between client and server.
I’m not clear on how you are implementing your app. If this is a web app running in a Servlet environment leveraging WebSocket support in the web server, be aware that you need to use recent versions of the Servlet container. For example, with Tomcat you must use either version 8 or the latest updates to version 7.
And of course the web browser must have support for WebSocket.
Be aware that WebSocket is still a new technology that has been changing and evolving in both the specs and the implementations.
Atmosphere
You may want to consider using the Atmosphere framework. Atmosphere supports multiple techniques of Push including WebSocket & Comet.
The Vaadin web-app framework leverages Atmosphere to provide automatic support for Push in your app. By default, WebSocket is automatically attempted first. If WebSocket is not available, Vaadin+Atmosphere falls back automatically to the other techniques including polling.

Using JMS for long running file creation process for web application

We have a requirement to allow users to generate search result exports in various formats. The problem is that the size of the exports can vary and take several seconds to minutes to complete. I want to allow users to be able to fire the request and continue doing other things while it runs, but I don't want to impair the web application server's performance by using background threads if necessary.
My initial idea is to decouple the web application and the generation process. I could use JMS with a message driven bean (MDB) that handles the file generation that is deployed separately from the web application; allowing to scale them individually based on future needs.
Technically, I see the web application maintaining a list of requests that it has started and sent JMS messages for. As the MDB completes, it sends updates back to a queue the web application listens on, and the web application updates the list of requests accordingly with status and perhaps file URI information. When a user wants to download a generated file they requested, the file is streamed to the browser and then removed.
As an added precaution, the MDB would also fire a delayed message into a cleanup queue. After the delay has expired, the MDB checks the URI of the generated file and, if the file still exists, removes it and notifies the web application so it can update its internal list accordingly, perhaps by removing the entry or marking it as automatically removed.
The beauty here is if I need to increase the number of concurrent export jobs, I can easily spawn up another JMS client process or tweak the existing processes to run more concurrent MDB handlers without having to touch the web application itself.
I'm curious if there are other alternatives I could be overlooking, concerns I should be considering, or whether this is a solid decoupled solution that has worked for others in the past.
Given your scenario, I would go with session beans.
There is also JMX setup for jobs, but that is a legacy approach, and I don't like JMX.
If these are a limited number of reports that are needed only about once a day, the simplest option is a Quartz job, though I am not sure whether you can run multiple jobs at the same time that way.
Quartz jobs are really easy to set up, and you can trigger them from your application. If you want some cleanup, you can create multiple jobs and make them depend on each other.
You can refer here for Quartz setup.

using JMS for long running processes?

Can someone point me to a tutorial or similar code where JMS is used by a web app to execute a long-running background process (instead of using threads)? I'm fairly familiar with the concepts of JMS messaging, but have never used any JMS API or broker (I'm looking at learning Apache ActiveMQ).
I'd like to be able to:
submit a message to the queue to run a process
check the status (progress) on that process at arbitrary times
Thanks!
The real point of using JMS in your context is to start tasks asynchronously. This is called fire and forget in middleware lingo. JMS has guaranteed delivery semantics, meaning that once the message has been put on the queue it is guaranteed to get there ... eventually.
The idea is you do any tasks you need to do and if you have any tasks in the process that can be done at a later time, then you put a message on a queue and later it will execute. This allows you to cut down processing by a significant amount while somebody is waiting for a response.
Another benefit of JMS is that the different parts of the system do not need to be running at the same time. The part that consumes messages can be down for maintenance while your front end still works.
The previous post is accurate in terms of a model to put orders or requests into a queue asynchronously and then have them be picked up later. However, it doesn't really address the question of long running processes.
In terms of queues and topics, the benefit of persistent queues is that if there are no consumers on the queue then messages will be waiting for consumption until there is a subscriber. In a topic, you need to create a durable subscription in order to make sure a consumer that is not connected will receive messages that are sent in its absence once it reconnects.
So, how are you defining a long-running-process? For a multi-step process you would typically use something like a workflow engine. There are options like a BPM tool or something like "OS Workflow". You can also do a home-grown solution that could look like the example below
1) There would need to be some sort of workflow definition that defines the steps in the process. This could be a properties file or an XML file.
2) Web App puts a message on a queue or topic (pub/sub) with an indication of the process to be executed (or you can have specific destinations for different processes)
3) A Dispatcher MDB picks the 'order' up off the queue with a status of 'NEW' and starts processing the first step.
4) Once the step is complete, the MDB puts a new message on the queue indicating the process being executed and either the next step to be executed, or the last step that was executed (depending on how deterministic you want the process to be)
5) The MDB picks up the message and sees that the process is 'IN_PROGRESS'. It either determines the next step to be executed or reads the step to be executed next from the message (either a JMS header value or within the message, perhaps in an XML format)
6) Steps 4 & 5 are repeated until the process instance is complete
In this case you will need an external representation of the order and process instance information. This will allow you to check the status of a request from your WebApp. Your order would need to be read and persisted with an updated status after each step in the process such that the WebApp could access the status information.
The key component of this architecture is the dispatcher MDB that listens for messages and executes the next step of the process. When I worked with OS Workflow, that was one key piece that was missing. In this manner, you can control the number of threads that are executing process steps by controlling the number of MDBs in the pool and consumers on the queue. In this architecture I would recommend a queue over a topic for the workflow steps. However, after each process step you could publish a message to a topic for subscribers to get updated status information.
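To make the dispatcher idea concrete, here is a rough in-memory sketch of steps 1 to 6. Everything in it is an assumption for illustration: `WorkflowDispatcher` and `StepMessage` are hypothetical names, an `ArrayDeque` stands in for the JMS queue, a map stands in for the persisted status, and `onMessage` plays the role of the MDB callback. The step names in `main` are made up.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical dispatcher: each message names a process instance and the step to run next.
class WorkflowDispatcher {

    static final class StepMessage {
        final String instanceId;
        final int nextStep;

        StepMessage(String instanceId, int nextStep) {
            this.instanceId = instanceId;
            this.nextStep = nextStep;
        }
    }

    private final List<String> steps;                                      // workflow definition
    private final Queue<StepMessage> queue = new ArrayDeque<>();           // stand-in for the JMS queue
    private final Map<String, String> status = new ConcurrentHashMap<>();  // stand-in for persisted status
    private final List<String> executed = new ArrayList<>();               // log of executed steps

    WorkflowDispatcher(List<String> steps) {
        if (steps.isEmpty()) {
            throw new IllegalArgumentException("workflow needs at least one step");
        }
        this.steps = List.copyOf(steps);
    }

    // The web app submits a new process instance with status NEW.
    void submit(String instanceId) {
        status.put(instanceId, "NEW");
        queue.add(new StepMessage(instanceId, 0));
    }

    // Equivalent of the MDB callback: run one step, persist the status, enqueue the next step.
    void onMessage(StepMessage message) {
        status.put(message.instanceId, "IN_PROGRESS");
        executed.add(message.instanceId + ":" + steps.get(message.nextStep));
        int next = message.nextStep + 1;
        if (next < steps.size()) {
            queue.add(new StepMessage(message.instanceId, next));
        } else {
            status.put(message.instanceId, "COMPLETE");
        }
    }

    // Delivers queued messages until none are left (the container would do this in real life).
    void drain() {
        StepMessage message;
        while ((message = queue.poll()) != null) {
            onMessage(message);
        }
    }

    // What the web app would read to show progress; null for an unknown instance.
    String statusOf(String instanceId) {
        return status.get(instanceId);
    }

    List<String> executed() {
        return executed;
    }

    public static void main(String[] args) {
        WorkflowDispatcher dispatcher =
                new WorkflowDispatcher(List.of("validate", "generate", "notify"));
        dispatcher.submit("order-1");
        dispatcher.drain();
        System.out.println(dispatcher.executed() + " -> " + dispatcher.statusOf("order-1"));
    }
}
```

Because each step is its own message, steps of different process instances interleave on the queue, which is what lets the MDB pool size cap the number of steps running at once.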
With the Java EE 6 technologies including JPA you could easily create an XSD, generate domain data model POJOs with JAXB and use JPA for persistence. We did a webcast earlier this year that covered the Java EE 6 technologies that are currently supported in WebLogic. Here are the replays: http://www.oracle.com/technetwork/middleware/weblogic/learnmore/weblogic-javaee6-webcasts-358613.html.
I'm also still interested to speak with you about your JBoss migration :) jeffrey.west@oracle.com

java: what are the best techniques for communicating with a batch server?

I have a web application (pure Java servlets) that has some heavy computational work, with database access, that can be done asynchronously.
I'm planning to use a dedicated server to execute such batch jobs, and I'm wondering which tools/techniques/protocols to use for communication between the servlets on the web server and the batch jobs on the new dedicated server.
I'm looking at JMS. Is it the right choice?
Are there industry-standard and/or widely adopted techniques?
I also need queue and priority handling for multiple simultaneous jobs.
JMS is a pretty standard solution. The high-end platforms (Sun's JCAPS, for example) make heavy use of JMS to partition and manage the workload of web services.
There are many advantages to buying a high-end JMS implementation from Sun (or IBM or Microsoft). First, you get things like reliable message queues that are backed to the file system. No message can get lost. Second, you get some monitoring and management tools.
One cool thing is to have a JMS queue with (potentially) multiple subscribers to do workload balancing.
Another cool thing is to have JMS topic which has a logging process as well as the real work process subscribed. The logging process picks off the messages and simply records the essential stages of the job being started and stopped.
Messaging is one of the best options.
Make the messaging framework very generic so that it can handle any type of batch jobs.
One approach is to have an event/task manager where you put an event on the queue and the queue consumer processes the event and converts it into a set of tasks. The tasks can then be executed by separate task handlers. A task can also generate some more events that can be again put on the queues to provide a feedback loop. This way you can add work flow like features to the framework and allow your batch jobs to have dependencies on each other.
JMS would be the appropriate solution for sending your batch jobs from the servlet. It may not be the best solution for the batch server to communicate with the servlet though, as it cannot be a listener to messages.
As I don't know what the communication from the batch server to the servlet is supposed to entail, I can only say that there are probably several options you can use (yes, JMS is one of them). But they all basically rely on polling calls to the servlet, which will then check in some way to see if there is anything from the batch server waiting. This could simply be a servlet on the batch server or making receive calls to a JMS response queue. Other solutions are available, but the point is that it is not asynchronous unless you have the ability to push from the batch server all the way to your client end (a browser, I am guessing) via something like AJAX.
Anyway, just something to keep in mind.
Another alternative for asynchronous processing is to have the web application store the request in the database, and have the batch process poll the database for new batch jobs to process. Since your application appears to be smaller (pure Java Servlets) this may be a simpler and lower cost solution.
Hope it helps.
We use JMS with web services:
Client requests computation via web service
Server writes JMS message, and creates an ID value which is stored in a database along with a status (initially "Pending"). Server returns the id to the client.
Server (can be separate server) reads JMS message, does computation, and when finished updates the status to "Completed" in the database
While the computation is ongoing, the client is polling the server to determine the status using another web service (along with the id). The server returns the status which is retrieved from the database. Once the server computation is completed, the client will see the "Completed" status and know that the computation is complete.
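The flow above could be sketched in plain Java like this. It is only an illustration under stated assumptions: a `ConcurrentHashMap` stands in for the database table, a `LinkedBlockingQueue` stands in for the JMS queue, and `JobTracker`, `submit`, `processNext` and `status` are hypothetical names. I write the "Pending" status before enqueuing so that a fast worker cannot finish a job whose row does not exist yet; that ordering is my choice, not something stated above.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical status tracker for the submit / compute / poll-for-status flow.
class JobTracker {
    enum Status { PENDING, COMPLETED }

    private final Map<String, Status> statusById = new ConcurrentHashMap<>(); // stand-in for the DB
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();  // stand-in for JMS

    // Front-end web service: record the job as pending, enqueue it, return the ID to the client.
    String submit() {
        String id = UUID.randomUUID().toString();
        statusById.put(id, Status.PENDING);   // store the row first, then publish the message
        queue.add(id);
        return id;
    }

    // Back-end server: take one job, do the computation, mark it completed.
    // Returns false when there was nothing to process.
    boolean processNext() {
        String id = queue.poll();
        if (id == null) {
            return false;
        }
        // ... the actual computation would run here ...
        statusById.put(id, Status.COMPLETED);
        return true;
    }

    // Status web service polled by the client; null means the ID is unknown.
    Status status(String id) {
        return statusById.get(id);
    }

    public static void main(String[] args) {
        JobTracker tracker = new JobTracker();
        String id = tracker.submit();
        System.out.println(tracker.status(id));
        tracker.processNext();
        System.out.println(tracker.status(id));
    }
}
```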
