I am using Spring AMQP (RabbitMQ implementation) and I am trying to propagate a single transaction into multiple threads.
As an example, Let's say there are 3 queues, names X,Y,Z and first I'm obtaining a message from queue X using thread-1, and next,that message is given to thread-0, and within thread-0 message is cloned and sent to queue Y via, thread-2 and queue Z via thread-3. Thread-0 wait for the completion of both thread-3 and thread-4, to commit or rollback message. Note that's I'm using 4 threads here.
What I want is basically to handle this 3 operations,(getting message and putting it to two queues) as a single transaction. i.e. if I successfully send the message to queue Y, but fails to send it to Z, then the message sent to Y will be rollbacked and the original message will be rolled back into queue X as well.
Up to now, I have managed to propegate transaction information via threadLocals (mainly TransactionStatus and TransactionSynchronizationManager.resources) and I was able to bind these 3 operations into one transaction.
But my problem is with sending ACK/NACK to original queue X, even though I commit/rollback transaction, it only works for queues Y and Z only. The message obtained from X is always in Unacked state.
I have tries channel.basicAck(), RabbitUtils.commitIfNecessary(), approaches, but no success.
Note that I have enabled channelTransacted as well. Any help is highly appreciated.
Spring transactions are bound to a single thread. You might be able to get it to work using the RabbitMQ client directly - but you would have to share the same channel across all the threads. However, the RabbitMQ documentation strongly advises against doing that:
Channel instances must not be shared between threads. Applications should prefer using a Channel per thread instead of sharing the same Channel across multiple threads. While some operations on channels are safe to invoke concurrently, some are not and will result in incorrect frame interleaving on the wire. ...
In any case, even if you do the work on a single thread, RabbitMQ transactions are rather weak and don't guarantee much; see broker semantics.
AMQP guarantees atomicity only when transactions involve a single queue, i.e. all the publishes inside the tx get routed to a single queue and all acks relate to messages consumed from the same queue. ...
Related
Our program is using Queue.
Multiple consumers are processing messages.
Consumers do the following:
Receive on or off status message from the Queue.
Get the latest status from the repository.
Compare the state of the repository and the state received from the message.
If the on/off status is different, update the data. (At this time, other related data are also updated.)
Assuming that this process is handled by multiple consumers, the following problems are expected.
Producer sends messages 1: on, 2: off, and 3: on.
Consumer A receives message #1 and stores message #1 in the storage because there is no latest data.
Consumer A receives message #2.
At this time, consumer B receives message #3 at the same time.
Consumers A and B read the latest data from the storage at the same time (message 1).
Consumer B finishes processing first. Don't update the repository as the on/off state is unchanged.(1: on, 3: on)
Then consumer A finishes the processing. The on/off state has changed, so it processes and saves the work. (1: on, 2: off)
In normal case, the latest data remaining in the DB should be on.
(This is because the message was sent in the order of on -> off -> on.)
However, according to the above scenario, off remains the latest data.
Is there any good way to solve this problem?
For reference, the queue we use is using AWS Amazon MQ and the storage is using AWS dynamoDB. And using Spring Boot.
The fundamental problem here is that you need to consume these "status" messages in order, but you're using concurrent consumers which leads to race-conditions and out-of-order message processing. In short, your basic architecture using concurrent consumers is causing this problem.
You could possibly work up some kind of solution in the database with timestamps as suggested in the comments, but that would be extra work for the clients and extra data stored in the database that isn't strictly necessary.
The simplest way to solve the problem is to just consume the messages serially rather than concurrently. There are a handful of different ways to do this, e.g.:
Define just 1 consumer for the queue with the "status" messages.
Use ActiveMQ's "exclusive consumer" feature to ensure that only one consumer receives messages.
Use message groups to group all the "status" messages together to ensure they are processed serially (i.e. in order).
imagine you have some task structure of
Task1
Task2: 1 million separate independent Subtask[i] that can run concurrently
Task3: must run once after ALL Task2 subtasks have completed
And all of Task1, Subtask[i] and Task3 are represented by MQ messages.
How can this be solved on an ActiveMQ? Especially the triggering of a Task3 message once all subtasks are complete.
I know, it's not a queueing problem, it's a fork-join problem. Lets say the environment dictates you must use an ActiveMQ for it.
Using ActiveMQ features, dynamic queues and consumers, stuff like that, is allowed. Using external counters, like a database row representing Task2's progress, is not allowed.
Hidden in this fork-join problem is a state management and observability challenge. Since the database is ruled out, you have to rely on something in-memory or on-queue.
Create a unique id for the task run -- something short, but with enough space to not collide like an airplane locator code-- ie. 34FDSX
Send all messages for the task to a queue://TASK.34FDSX.DATA
Send a control message to queue://TASK.34FDSX.CONTROL that contains the task id and expected total # of messages (including each messageId would be helpful too)
When consumers from queue://TASK.34FDSX.DATA complete their work, they should send a 'done' message to queue://TASK.34FDSX.DONE queue with their messageId or some identifier.
The consumers for the .CONTROL queue and the .DONE queue should be the same process and can track the expected and total completed tasks. Once everything is completed, he can fire the event to trigger Task #3.
This approach provides everything as 'online', and you can also timeout the .CONTROL and .DONE reader if too much time passes before the task completes.
Queue deletion can be done using ActiveMQ destination GC, or as a clean-up step in the .CONTROL/.DONE reader during the occurances when everything completes successfully.
Advantages:
No infinite blocking consumers
No infinite open transactions
State of the TASK is online and observable via the presence of queues and queue metrics-- queue size, enqueue count, dequeue count
The entire solution can be multi-threaded and the only requirement is that for a given task the .CONTROL/.DONE listener is the same consumer, but multiple tasks can have individual .CONTROL/.DONE listeners to scale.
The question here is a bit vague so my answer will have to be a bit vague as well.
Each of the million independent subtasks for "Task 2" can be represented by a single message. All these messages can be in the same queue. You can spin up as many consumers as you want and process all these messages (i.e. perform all the subtasks). Just ensure that these consumers either use client-acknowledge mode or a transacted session so that the message is not removed from the queue until they are done processing the message. Once there are no more messages in the queue then you know "Task 2" is done.
To detect when the queue is empty you can have a "special" consumer on the queue that periodically opens a transacted session and tries to consume a message from the queue. If the consumer receives a message then you can rollback the transacted session to put the message back on the queue and you know that the queue is not empty (i.e. "Task 2" is not done). If the consumer doesn't receive a message then you know the queue is empty and you can send another message indicating this. You could launch this special consumer as part of "Task 2" after all the messages for the subtasks have been sent to avoid detecting an empty queue prematurely.
To be clear, this is a simple solution. You could certainly add more complexity depending on your requirements, but your question just outlined the basic problem so it's unclear what other requirements you have (if any).
I have a Kafka listener that implements the acknowledgment message listener interface with the following properties:
ackMode - MANUAL_IMMEDIATE
idleEventInterval - 3 Min
While consuming message on the listener it decides if to ack the specific record via acknowledgment.acknowledge() and it works as expected.
In addition, I have a scenario to ack last offset number(keeping it in memory) after X Minutes(also if no messages arrived).
to overcome this requirement I decide to use ListenerContainerIdleEvent that fire each 3 min according to my configuration.
My Questions are:
is there any way to acknowledge Kafka offset as a trigger to an idle event? the idle event contains a reference to KafkaMessageListenerContainer but it encapsulates the ListenerConsumer that hold KafkaConsumer.
is the idle message event send sync(with the same thread of the KafkaListenerConsumer)? From the code, the default implementation is SimpleApplicationEventMulticaster that initialize without TaskExecutor so it invokes the listener on the same thread. can u approve it?
I am using spring-kafka 1.3.9.
Yes, just keep a reference to the last Acknowledgment and call acknowledge() again.
Yes, the event is published on the consumer thread by default.
Even if the event is published on a different thread (executor in the multicaster) it should still work because, instead of committing directly, the commit will be queued and processed by the consumer when it wakes from the poll.
See the logic in processAck().
In newer versions (starting with 2.0), the event has a reference to the consumer so you can interact with it directly (obtain the current position and commit it again), as long as the event is published on the consumer thread.
I'm currently adding JMS support to a application-server-like framework. The JMS will be implemented by HornetQ (stand-alone broker, hornetq jars on the servers classpath) but there is neither JBoss nor spring nor anything else that would provide MDBs.
The next step is to add a message listener to a xa queue that would allow for parallel processing of incoming messages. Some messages would init long running tasks, so the basic idea is to spawn worker threads from the onMessage method.
On my long journey through the internet I came across this discussion, where one of participants mentioned, that he would not do that but use an extra internal queue for the task: the (single threaded) message listener then would simply grab the messages from the inbound queue and create new messages for an internal queue, where at the other end of that internal queue some worker threads fight for the incoming messages. Inbound messages then would be acknowledge once they're "copied" to the internal queue (which is ok for me).
Unfortunatly they don't say why it would be better to not spawn worker threads from the onMessage method - maybe, because the listener would block if all threads from the pool are busy. So I'm looking for pros and cons for the designs decisions:
Start worker threads from the onMessage method of the message listener
Use an internal queue to "send messages to the worker threads"
Transaction limits aside, whether or not to have multiple threads (or processes) reading from a queue simply comes down to whether or not the message order is important. Obviously if the order is important, then a single thread naturally maintains that order, while multiple threads will provide no such guarentee.
What you will normally find, is that order is important but across a subset of all the messages. In this scenario, if a single thread isn't performant, you need to get those messages off the queue and re-queued in as short a possible time because to preserve the order you'll have to use a single thread reading from the initial queue - hence the use of one or more internal queues. The problem this incurs is that the transaction will be closed before the messages are fully processed and so you need some sort of temporary storage to ensure messages don't get dropped if the process were to fall over before the processing had taken place.
If, as your question suggests, you're not too worried about dropping messages then the java.util.concurrent.BlockingQueue sounds like what you need for the internal queues with a single thread servicing each.
I have an MDB that gets subscribed to a topic which sends messages whose content is eventually persisted to a DB.
I know MDBs are pooled, and therefore the container is able to handle more than one incoming message in parallel. In my case, the order in which those messages are consumed (and then persisted) is important. I don't want an MDB instance pool to consume and then persist messages in a different order as they get published in the JMS topic.
Can this be an issue? If so, is there a way of telling the container to follow strict incoming order when consuming messages?
Copied from there:
To ensure that receipt order matches the order in which the client sent the message, you must do the following:
Set max-beans-in-free-pool to 1 for the MDB. This ensures that the MDB is the sole consumer of the message.
If your MDBs are deployed on a cluster, deploy them to a single node in the cluster, [...].
To ensure message ordering in the event of transaction rollback and recovery, configure a custom connection factory with MessagesMaximum set to 1, and ensure that no redelivery delay is configured. For more information see [...].
You should be able to limit the size of the MDB pool to 1, thus ensuring that the messages are processed in the correct order.
Of course, if you still want some parallelism in there, then you have a couple of options, but that really depends on the data.
If certain messages have something in common and the order of processing only matters within that group of messages that share a common value, then you may need to have multiple topics, or use Queues and a threadpool.
If on the other hand certain parts of the logic associated with the arrival of a message can take place in parallel, while other bits cannot, then you will need to split the logic up into the parallel-ok and parallel-not-ok parts, and process those bits accordingly.