Best practice for multi-threaded message processing on JMS queues - java

I'm currently adding JMS support to an application-server-like framework. The JMS side will be implemented by HornetQ (stand-alone broker, hornetq jars on the server's classpath), but there is neither JBoss nor Spring nor anything else that would provide MDBs.
The next step is to add a message listener to an XA queue that allows for parallel processing of incoming messages. Some messages will start long-running tasks, so the basic idea is to spawn worker threads from the onMessage method.
On my long journey through the internet I came across this discussion, where one of the participants mentioned that he would not do that, but would instead use an extra internal queue for the task: the (single-threaded) message listener simply grabs messages from the inbound queue and creates new messages for an internal queue, at the other end of which some worker threads compete for the incoming messages. Inbound messages would then be acknowledged once they're "copied" to the internal queue (which is OK for me).
Unfortunately they don't say why it would be better not to spawn worker threads from the onMessage method - maybe because the listener would block if all threads from the pool are busy. So I'm looking for pros and cons of these design decisions:
Start worker threads from the onMessage method of the message listener
Use an internal queue to "send messages to the worker threads"

Transaction limits aside, whether or not to have multiple threads (or processes) reading from a queue simply comes down to whether or not message order is important. Obviously, if order is important, then a single thread naturally maintains that order, while multiple threads provide no such guarantee.
What you will normally find is that order is important, but only across a subset of all the messages. In this scenario, if a single thread isn't performant enough, you need to get those messages off the queue and re-queued in as short a time as possible, because to preserve the order you'll have to use a single thread reading from the initial queue - hence the use of one or more internal queues. The problem this incurs is that the transaction will be closed before the messages are fully processed, so you need some sort of temporary storage to ensure messages don't get dropped if the process falls over before processing has taken place.
If, as your question suggests, you're not too worried about dropping messages, then java.util.concurrent.BlockingQueue sounds like what you need for the internal queues, with a single thread servicing each.
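A minimal sketch of that hand-off, assuming plain JMS with AUTO_ACKNOWLEDGE, text messages, and an illustrative worker pool (class and method names are made up):

    import javax.jms.*;
    import java.util.concurrent.*;

    public class DispatchingListener implements MessageListener {

        private static final int WORKER_COUNT = 4;

        // Bounded internal queue: the listener blocks (and stops taking messages
        // off the broker) once all workers are busy and the buffer is full.
        private final BlockingQueue<String> internalQueue = new ArrayBlockingQueue<>(100);
        private final ExecutorService workers = Executors.newFixedThreadPool(WORKER_COUNT);

        public DispatchingListener() {
            for (int i = 0; i < WORKER_COUNT; i++) {
                workers.submit(() -> {
                    try {
                        while (!Thread.currentThread().isInterrupted()) {
                            String payload = internalQueue.take(); // workers compete for entries
                            process(payload);                      // long-running work happens here
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
        }

        @Override
        public void onMessage(Message message) {
            try {
                // Copy the payload to the internal queue; once onMessage returns,
                // the inbound message is acknowledged even though the real
                // processing may not have happened yet.
                internalQueue.put(((TextMessage) message).getText());
            } catch (JMSException | InterruptedException e) {
                throw new RuntimeException(e);
            }
        }

        private void process(String payload) {
            // placeholder for the actual long-running task
        }
    }

The bounded internal queue is what keeps the listener from running away from the workers: when they fall behind, onMessage blocks and consumption from the broker slows down accordingly.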

Related

ActiveMQ: how to fork-join? I.e. how to emit one message when all subtasks are done

Imagine you have a task structure of:
Task1
Task2: 1 million separate independent Subtask[i] that can run concurrently
Task3: must run once after ALL Task2 subtasks have completed
And all of Task1, Subtask[i] and Task3 are represented by MQ messages.
How can this be solved with ActiveMQ? Especially the triggering of a Task3 message once all subtasks are complete.
I know it's not a queueing problem, it's a fork-join problem. Let's say the environment dictates you must use ActiveMQ for it.
Using ActiveMQ features, dynamic queues and consumers, stuff like that, is allowed. Using external counters, like a database row representing Task2's progress, is not allowed.
Hidden in this fork-join problem is a state management and observability challenge. Since the database is ruled out, you have to rely on something in-memory or on-queue.
Create a unique id for the task run -- something short, but with enough space to avoid collisions, like an airline record locator code -- e.g. 34FDSX
Send all messages for the task to a queue://TASK.34FDSX.DATA
Send a control message to queue://TASK.34FDSX.CONTROL that contains the task id and expected total # of messages (including each messageId would be helpful too)
When consumers from queue://TASK.34FDSX.DATA complete their work, they should send a 'done' message to queue://TASK.34FDSX.DONE with their messageId or some other identifier.
The consumer for the .CONTROL queue and the .DONE queue should be the same process; it can track the expected and completed counts. Once everything is completed, it can fire the event that triggers Task #3 (see the sketch after the advantages list below).
This approach keeps everything 'online', and you can also time out the .CONTROL and .DONE reader if too much time passes before the task completes.
Queue deletion can be done using ActiveMQ destination GC, or as a clean-up step in the .CONTROL/.DONE reader in the cases where everything completes successfully.
Advantages:
No infinite blocking consumers
No infinite open transactions
State of the TASK is online and observable via the presence of queues and queue metrics-- queue size, enqueue count, dequeue count
The entire solution can be multi-threaded and the only requirement is that for a given task the .CONTROL/.DONE listener is the same consumer, but multiple tasks can have individual .CONTROL/.DONE listeners to scale.
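A rough sketch of the .CONTROL/.DONE tracker described in the steps above, using plain JMS; the "expectedCount" property and the "TASK3.TRIGGER" destination are illustrative assumptions, and error handling and queue clean-up are omitted:

    import javax.jms.*;

    public class TaskTracker {

        // Tracks one task run, e.g. id "34FDSX"; queue names follow the
        // convention from the steps above.
        public void track(Connection connection, String taskId) throws JMSException {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageConsumer control = session.createConsumer(
                    session.createQueue("TASK." + taskId + ".CONTROL"));
            MessageConsumer done = session.createConsumer(
                    session.createQueue("TASK." + taskId + ".DONE"));

            // The control message carries the expected number of subtask messages
            // ("expectedCount" is a made-up property name).
            MapMessage controlMsg = (MapMessage) control.receive();
            long expected = controlMsg.getLong("expectedCount");

            long completed = 0;
            while (completed < expected) {
                // Each worker sends one 'done' message per processed subtask.
                Message doneMsg = done.receive(60_000); // time out if the task stalls
                if (doneMsg == null) {
                    // too much time has passed; abort or alert instead of waiting forever
                    break;
                }
                completed++;
            }

            if (completed == expected) {
                // all subtasks finished: fire the trigger message for Task #3
                MessageProducer trigger = session.createProducer(session.createQueue("TASK3.TRIGGER"));
                trigger.send(session.createTextMessage(taskId));
            }
            session.close();
        }
    }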
The question here is a bit vague so my answer will have to be a bit vague as well.
Each of the million independent subtasks for "Task 2" can be represented by a single message. All these messages can be in the same queue. You can spin up as many consumers as you want and process all these messages (i.e. perform all the subtasks). Just ensure that these consumers either use client-acknowledge mode or a transacted session so that the message is not removed from the queue until they are done processing the message. Once there are no more messages in the queue then you know "Task 2" is done.
To detect when the queue is empty you can have a "special" consumer on the queue that periodically opens a transacted session and tries to consume a message from the queue. If the consumer receives a message then you can rollback the transacted session to put the message back on the queue and you know that the queue is not empty (i.e. "Task 2" is not done). If the consumer doesn't receive a message then you know the queue is empty and you can send another message indicating this. You could launch this special consumer as part of "Task 2" after all the messages for the subtasks have been sent to avoid detecting an empty queue prematurely.
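A sketch of that "special" probing consumer, assuming plain JMS; the short receive timeout is an arbitrary choice:

    import javax.jms.*;

    public class EmptyQueueProbe {

        // Returns true once no message can be received within the timeout,
        // i.e. "Task 2" appears to be done. Any message that is received is
        // rolled back so it stays on the queue for the real consumers.
        public boolean queueLooksEmpty(Connection connection, String queueName) throws JMSException {
            Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
            try {
                MessageConsumer probe = session.createConsumer(session.createQueue(queueName));
                Message message = probe.receive(1_000); // short poll
                if (message != null) {
                    session.rollback(); // put the message back; the queue is not empty
                    return false;
                }
                return true;
            } finally {
                session.close();
            }
        }
    }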
To be clear, this is a simple solution. You could certainly add more complexity depending on your requirements, but your question just outlined the basic problem so it's unclear what other requirements you have (if any).

Listening to many short-lived, dynamically created queues with Spring AMQP

I'm building an application using RabbitMQ/Spring/Spring AMQP and am having trouble handling the way I've laid out my queues.
Essentially I have one queue that every consumer listens to, with each message basically saying "this queue is ready to be processed by a single consumer". The consumer will then listen to the queue indicated in the message, consume all the messages in that queue, and finally delete it when done.
These short-lived queues are all created on the fly as data comes in to be processed, and each must be consumed by a single consumer (whichever one gets the message in the 'ready' queue).
I'm having trouble gracefully handling the consumers in this situation. Right now I just create a new DirectMessageListenerContainer each time a consumer gets a message from the 'ready' queue and then stop it once it has gotten all the messages it needs. It seems like this solution isn't ideal. Is there any better way to handle a situation like this with Spring AMQP/RabbitMQ?
You can add/remove queues to/from existing container(s) at runtime; it is more efficient with the direct container (see Choosing a container).
The MessageProperties has the consumerQueue property to tell you which queue the message came from.
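A sketch of how that could look, assuming a single shared DirectMessageListenerContainer and a hypothetical end-of-queue check; bean wiring and queue deletion are omitted:

    import org.springframework.amqp.core.Message;
    import org.springframework.amqp.rabbit.listener.DirectMessageListenerContainer;

    public class ReadyQueueHandler {

        private final DirectMessageListenerContainer container;

        public ReadyQueueHandler(DirectMessageListenerContainer container) {
            this.container = container;
        }

        // Called when a message on the 'ready' queue announces a new short-lived queue.
        public void onReady(String queueName) {
            container.addQueueNames(queueName); // start consuming from it at runtime
        }

        // The container's message listener: decide what to do based on the source queue.
        public void handle(Message message) {
            String sourceQueue = message.getMessageProperties().getConsumerQueue();
            // ... process the message ...
            if (isLastMessage(message)) {
                container.removeQueueNames(sourceQueue); // stop consuming; delete the queue elsewhere
            }
        }

        private boolean isLastMessage(Message message) {
            return false; // placeholder: application-specific end-of-queue detection
        }
    }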

What is the best way to handle JMS Exceptions

I have a for loop that keeps putting messages onto the JMS queue, but it's quite possible that in the future the for loop may execute much faster than the queue can handle requests and might reach the max-pool limit.
I am catching the JMSException, but the thing is that I don't have any fallback logic in place to resume the job. I can store the state of the last element passed to the queue, but I have no clue how to start putting messages back onto the queue after the exception has been encountered. How can I start putting messages back onto the queue and make sure that the same exception won't be thrown?
You should set up your JMS queue with a pool of listeners that's adequate for your peak load. This can be arranged with your app server.
It should also allow a "dead letter queue" where messages that are poisonous in the way you describe will be routed.
It would be good to configure some kind of alerting to let you know when requests are spilling onto the floor.
I don't understand the fascination with queues anymore. I think a web service with a producer/consumer deque and a pool of executors to process requests is a better choice than a queue. That's 1990s IBM technology.
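If you do still want to resume sending after a JMSException, one option (not taken from the answer above) is a bounded retry with a simple backoff; this is only a sketch and assumes the session and producer are still usable after the failure:

    import javax.jms.*;

    public class RetryingSender {

        // Tries to send a message, retrying a few times with a growing pause.
        // Gives up (so the caller can persist/park the payload) after maxAttempts.
        public void sendWithRetry(Session session, MessageProducer producer,
                                  String payload, int maxAttempts) throws JMSException, InterruptedException {
            JMSException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    producer.send(session.createTextMessage(payload));
                    return; // sent successfully
                } catch (JMSException e) {
                    last = e;
                    Thread.sleep(attempt * 1_000L); // simple linear backoff before retrying
                }
            }
            throw last; // still failing: surface the exception so the state can be saved
        }
    }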

JMS Queue: Multiple Threads to read

I have a Java program that puts messages onto a queue. On the other side of the queue I have 10-15 consumers, any ONE of which should read each message and process it. Whenever one of the 10-15 consumers gets free, it picks up the next message from the queue.
Basically, a consumer can pick up a message from the queue whenever it is free, and only one consumer must pick it up (without any synchronization blocks or the like).
Also, on the sender's end, can I pause sending messages into the queue if the queue becomes full (or reaches a certain threshold)?
I am really new to the JMS API. Apologies if this is a newbie question.
Thanks!!
I have to send messages into a queue and I have 20 threads running as consumers, who can pick up the data from the queue (once they are free). So when each thread gets free, it goes to the queue, checks if data is there, picks it up, and so on. Is this doable?
Yes, it's doable - that's the standard way of doing it with JMS queues. An alternative would be topics, but with topics every listener would have to process the same message, not just one of them, so queues are what you want. Usually, though, you don't have plain threads as consumers (I'm not even sure what that means here) but message-driven beans; you might consider using them. MDBs run in their own thread anyway.
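If you do stay with plain threads rather than MDBs, a sketch of the standard setup looks like this: one session and one consumer per thread on the same queue, so the broker hands each message to exactly one of them (connection setup is environment-specific):

    import javax.jms.*;

    public class CompetingConsumers {

        // Starts the given number of consumers on the same queue. JMS delivers
        // each message to exactly one of them, so no extra synchronization is needed.
        public void start(Connection connection, String queueName, int consumerCount) throws JMSException {
            for (int i = 0; i < consumerCount; i++) {
                // Sessions are single-threaded, so each consumer gets its own.
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageConsumer consumer = session.createConsumer(session.createQueue(queueName));
                consumer.setMessageListener(message -> {
                    // process the message; only this consumer received it
                });
            }
            connection.start(); // begin delivery
        }
    }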

JMS Client Session Usage

I'm attempting to utilize the .NET Kaazing client in order to interact with a JMS back-end via web sockets. I'm struggling to understand the correct usage of sessions. Initially, I had a single session shared across all threads, but I noticed that this was not supported:
A Session object is a single-threaded context for producing and consuming messages. Although it may allocate provider resources outside the Java virtual machine (JVM), it is considered a lightweight JMS object.
The reason I had a single session was just because I thought that would yield better performance. Since the documentation claimed sessions were lightweight, I had no hesitation switching my code over to use a session per "operation". By "operation" I mean either sending a single message, or subscribing to a queue/topic. In the former case, the session is short-lived and closed immediately after the message is sent. In the latter case, the session needs to live as long as the subscription is active.
When I tried creating multiple sessions I got an error:
System.NotSupportedException: Only one non-transacted session can be active at a time
Googling this error was fruitless, so I tried switching over to transacted sessions. But when attempting to create a consumer I get a different error:
System.NotSupportedException: This operation is not supported in transacted sessions
So it seems I'm stuck between a rock and a hard place. The only possible options I see are to share my session across threads or to have a single, non-transacted session used to create consumers, and multiple transacted sessions for everything else. Both these approaches seem a little against the grain to me.
Can anyone shed some light on the correct way for me to handle sessions in my client?
There are several ways to add concurrency to your application. You could use multiple Connections, but that is probably not desirable due to an increase in network overhead. Better would be to implement a simple mechanism for handling the concurrency in the Message Listener by dispatching Tasks or by delivering messages via ConcurrentQueues. Here are some choices for implementation strategy:
The Task based approach would use a TaskScheduler. In the MessageListener, a task would be scheduled to handle the work and the listener would return immediately. You might schedule a new Task per message, for instance. At this point, the MessageListener would return and the next message would be immediately available. This approach would be fine for low-throughput applications - e.g. a few messages per second - where you still need concurrency, perhaps because some messages take a long time to process.
Another approach would be to use a data structure of messages for work pending (ConcurrentQueue). When the MessageListener is invoked, each Message would be added to the ConcurrentQueue and the listener would return immediately. Then a separate set of threads/tasks can pull the messages from that ConcurrentQueue using an appropriate strategy for your application. This would work for a higher performance application.
A variation of this approach would be to have a ConcurrentQueue for each Thread processing inbound messages. Here the MessageListener would not manage its own ConcurrentQueue, but instead it would deliver the messages to the ConcurrentQueue associated with each thread. For instance, if you have inbound messages representing stock feeds and also news feeds, one thread (or set of threads) could process the stock feed messages, and another could process inbound news items separately.
Note that if you are using JMS Queues, each message will be acknowledged implicitly when your MessageListener returns. This may or may not be the behavior you want for your application.
For higher performance applications, you should consider approaches 2 and 3.
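The third approach translates directly to Java-side JMS consumers as well; here is a sketch with one in-memory queue and one worker per message category, where the "category" message property is purely illustrative:

    import javax.jms.*;
    import java.util.Map;
    import java.util.concurrent.*;

    public class RoutingListener implements MessageListener {

        // One in-memory queue per category, each drained by its own worker thread,
        // so e.g. stock-feed and news-feed messages are processed independently.
        private final Map<String, BlockingQueue<Message>> queuesByCategory = new ConcurrentHashMap<>();
        private final ExecutorService workers = Executors.newCachedThreadPool();

        @Override
        public void onMessage(Message message) {
            try {
                // "category" is an illustrative property set by the producer.
                String category = message.getStringProperty("category");
                if (category == null) {
                    category = "default"; // fall back so uncategorized messages still get handled
                }
                queuesByCategory.computeIfAbsent(category, c -> {
                    BlockingQueue<Message> q = new LinkedBlockingQueue<>();
                    workers.submit(() -> drain(q)); // start a dedicated worker for this category
                    return q;
                }).put(message);
                // Returning here acknowledges the message (with AUTO_ACKNOWLEDGE)
                // before the worker has actually processed it.
            } catch (JMSException | InterruptedException e) {
                throw new RuntimeException(e);
            }
        }

        private void drain(BlockingQueue<Message> queue) {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    Message message = queue.take();
                    // process the message for this category
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }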
