I want to somehow delay messages for the whole message group.
The thing is that all messages belonging to each message group must be processed in the same order they were posted, sequentially. If one of the messages cannot be consumed - we want to delay it and also delay the remaining ones in the same message group. I do not want to block the consumer - it should be free to process messages from other groups.
How to do that?
I can't say JMS has anything nice built in support for this stuff. Everything is easier with single "stand alone" messages, but there is one thing you could try.
Do a delayed delivery for those messages (in that group).
// Send to same queue once again, but delay 60 sec
if( isGroupMarkedForRedelivery(message.getStringProperty("JMSXGroupID"))){
message.setLongProperty("_HQ_SCHED_DELIVERY", System.currentTimeMillis() + 60000);
producer.send(message); // producer sends to process queue (again).
}
Note that if you need them in the same order, then you should probably not use concurrency in sending and/or receiving. You could of course add more logic to adapt to your situation.
You probably need to make sure isGroupMarkedForRedelivery returns false for a specific group after less amount of time than the "delay".
Related
imagine you have some task structure of
Task1
Task2: 1 million separate independent Subtask[i] that can run concurrently
Task3: must run once after ALL Task2 subtasks have completed
And all of Task1, Subtask[i] and Task3 are represented by MQ messages.
How can this be solved on an ActiveMQ? Especially the triggering of a Task3 message once all subtasks are complete.
I know, it's not a queueing problem, it's a fork-join problem. Lets say the environment dictates you must use an ActiveMQ for it.
Using ActiveMQ features, dynamic queues and consumers, stuff like that, is allowed. Using external counters, like a database row representing Task2's progress, is not allowed.
Hidden in this fork-join problem is a state management and observability challenge. Since the database is ruled out, you have to rely on something in-memory or on-queue.
Create a unique id for the task run -- something short, but with enough space to not collide like an airplane locator code-- ie. 34FDSX
Send all messages for the task to a queue://TASK.34FDSX.DATA
Send a control message to queue://TASK.34FDSX.CONTROL that contains the task id and expected total # of messages (including each messageId would be helpful too)
When consumers from queue://TASK.34FDSX.DATA complete their work, they should send a 'done' message to queue://TASK.34FDSX.DONE queue with their messageId or some identifier.
The consumers for the .CONTROL queue and the .DONE queue should be the same process and can track the expected and total completed tasks. Once everything is completed, he can fire the event to trigger Task #3.
This approach provides everything as 'online', and you can also timeout the .CONTROL and .DONE reader if too much time passes before the task completes.
Queue deletion can be done using ActiveMQ destination GC, or as a clean-up step in the .CONTROL/.DONE reader during the occurances when everything completes successfully.
Advantages:
No infinite blocking consumers
No infinite open transactions
State of the TASK is online and observable via the presence of queues and queue metrics-- queue size, enqueue count, dequeue count
The entire solution can be multi-threaded and the only requirement is that for a given task the .CONTROL/.DONE listener is the same consumer, but multiple tasks can have individual .CONTROL/.DONE listeners to scale.
The question here is a bit vague so my answer will have to be a bit vague as well.
Each of the million independent subtasks for "Task 2" can be represented by a single message. All these messages can be in the same queue. You can spin up as many consumers as you want and process all these messages (i.e. perform all the subtasks). Just ensure that these consumers either use client-acknowledge mode or a transacted session so that the message is not removed from the queue until they are done processing the message. Once there are no more messages in the queue then you know "Task 2" is done.
To detect when the queue is empty you can have a "special" consumer on the queue that periodically opens a transacted session and tries to consume a message from the queue. If the consumer receives a message then you can rollback the transacted session to put the message back on the queue and you know that the queue is not empty (i.e. "Task 2" is not done). If the consumer doesn't receive a message then you know the queue is empty and you can send another message indicating this. You could launch this special consumer as part of "Task 2" after all the messages for the subtasks have been sent to avoid detecting an empty queue prematurely.
To be clear, this is a simple solution. You could certainly add more complexity depending on your requirements, but your question just outlined the basic problem so it's unclear what other requirements you have (if any).
Sometimes due to some external problems, I need to requeue a message by basic.reject with requeue = true.
But I don't need to consume it immediately because it will possibly fail again in a short time. If I continuously requeue it, this may result in infinite loop and requeue.
So I need to consume it later, say one minute later,
And I need to know how many times the messages has been requeue so that I can stop requeue it but only reject it to declare it fails to consume.
PS: I am using Java client.
There are multiple solutions to point 1.
First one is the one chosen by Celery (a Python producer/consumer library that can use RabbitMQ as broker). Inside your message, add a timestamp at which the task should be executed. When your consumer gets the message, do not ack it and check its timestamp. As soon as the timestamp is reached, the worker can execute the task. (Note that the worker can continue working on other tasks instead of waiting)
This technique has some drawbacks. You have to increase the QoS per channel to an arbitrary value. And if your worker is already working on a long running task, the delayed task wont be executed until the first task has finished.
A second technique is RabbitMQ-only and is much more elegant. It takes advantage of dead-letter exchanges and Messages TTL. You create a new queue which isn't consumed by anybody. This queue has a dead-letter exchange that will forward the messages to the consumer queue. When you want to defer a message, ack it (or reject it without requeue) from the consumer queue and copy the message into the dead-lettered queue with a TTL equal to the delay you want (say one minute later). At (roughly) the end of TTL, the defered message will magically land in the consumer queue again, ready to be consumed. RabbitMQ team has also made the Delayed Message Plugin (this plugin is marked as experimental yet fairly stable and potential suitable for production use as long as the user is aware of its limitations and has serious limitations in term of scalability and reliability in case of failover, so you might decide whether you really want to use it in production, or if you prefer to stick to the manual way, limited to one TTL per queue).
Point 2. just requires putting a counter in your message and handling this inside your app. You can choose to put this counter in a header or directly in the body.
I have a FIFO SQS queue, with visibility time of 30 seconds.
The requirement is to read messages as Quickly as possible and clear the queue.
I have code in JAVA in a fashion shown below ( this is just a representation of idea only, not complete code ):
//keep getting messages from FIFO and process them ASAP
while(true)
{
List<Message> messages =
sqsclient.receiveMessage(receiveMessageRequest).getMessages();
//my logic/code here to process these messages and delete them ASAP
}
In the while loop as soon as the messages are received, they are processed and removed from the queue.
But, many times the receiveMessageRequest does not give me messages (returns zero messages).
Also, the messages limitation is only 10 at a time during receive from SQS, which is already an issue, but due to these zero receives, the queues are piling up.
I have no clue why this is happening. The documentation exactly is not clear on this part (or Am I missing in terms of the configuration of the queue?)
Please help!
Note:
1. My FIFO Queue always has messages in this scenario, so there is no case of Queue having zero messages and receive request returning zero
2. The processing and delete times are also Less than the visibility timeout.
Thanks.
Update:
I have started running multiple consumers for processing the FIFO queue. Clearly, one consumer is not coping up with the inflow of messages. I shall update in few days how multiple consumers are performing. Thanks
You have to first make sure that all messages you received are deleted within VisibilityTimeout. If you are using DeleteMessageBatch for deletion make sure that all 10 messages are deleted.
Also, how did you queue messages when you enqueue them?
Order of messages are guaranteed only in a single message group.
This also means that if you set the same group id to all messages, you are limited to a single consumer so that order of messages are preserved for sure. Even if use multiple consumers, all messages that belong to a same group becomes invisible to other consumers until visibility timeout expires.
I had a question on how rabbitmq works with batching acknowledgements. I understand that the Prefetch value is the max number of messages that will get queued before reaching its limit. However, I wasn't sure if the ack's manage themselves or if I have to manage this in code.
Which method is correct?
Send each basicAck with multiple set to true
or
wait until 10 acks were supposed to be sent out and send only the last one and AMQP will automatically send all previous in queue. (with multiple set to true)
TL;DR multiple = true is faster in some cases but requires a lot more careful book keeping and batch like requirements
The consumer gets messages that have a monotonic-ly growing id specific to that consumer. The id is a 64 bit number (it actually might be an unsigned 32 bit but since Java doesn't have that its a long) called the delivery tag. The prefetch is the most messages a consumer will receive that are unacked.
When you ack the highest delivery tag with multiple true it will acknowledge all the unacked messages with a lower delivery tag (smaller number) that the consumer has outstanding. Obviously if you have high prefetch this is faster than acking each message.
Now RabbitMQ knows the consumer received the messages (the unacked ones) but it doesn't know if all those messages have been correctly consumed. So it is on the burden of you the developer to make sure all the previous messages have been consumed. The consumer will deliver the messages in order (I believe internally the client uses a BlockingQueue) but depending on the library/client used downstream the messages might not be.
Thus this really only works well when you are batching the messages together in a single go (e.g. transaction or sending a group of messages off to some other system) or buffering reliably. Often this is done with a blocking queue and then periodically draining the queue to send a group of messages to a downstream system.
On the other hand if you are streaming each message in real time then you can't really do this (ie multiple = false).
There is also the case of one of the message being bad in the group (e.g. drained from internal queue... not rabbit queue) and you won't to nack that bad one. If that is the case you can't use multiple = true either.
Finally if you wait for a certain amount messages (instead of say time) more than the prefetch you will wait indefinitely.... not a good idea. You need to wait on time and number of messages must be <= prefetch.
As you can see its fairly nontrivial to correctly use multiple = true.
First one correction regarding Prefetch value is the max number of messages that will get queued before reaching its limit. - this is not what prefetch value is; prefetch value is the number of UN-ACKed messages that consumer "gets" from the queue. So they are kind of assigned to the consumer but remain in the queue until they are acknowledged. Quote from here, when prefetch is 1
This tells RabbitMQ not to give more than one message to a worker at a
time. Or, in other words, don't dispatch a new message to a worker
until it has processed and acknowledged the previous one.
And for your question:
I wasn't sure if the ack's manage themselves or if I have to manage
this in code.
You can set the auto ack flag to true and then you could say that the ack's manage themselves
I need to create an application wherein I have to retrieve all the elements inside the JMS queue within a given time limit.
For instance, the given the limit is 10 seconds. So every 10 seconds, the application should create a new Thread wherein the Thread is responsible for 1) connecting to the JMS queue and 2) retrieving all the messages during the time of connection.
So in 10 seconds, lets say that there were 15 TextMessages in the queue. I only want the current executing thread to retrieve those 15 TextMessages and nothing else. I'm afraid that the thread would pick up additional messages.
Is there a facility to limit how much messages a consumer can take? Maybe something feature which would let me see how much the queue contains?
One method I can think of is that you create a receiver from a session that uses CLIENT_ACKNOWLEDGE acknowledgement mode. Now start the receiver and receive the messages. Yes you will receive some additional messages. Now as you receive a message get it JMSTimestamp and see whether it belongs to the time duration your thread is interested in. If the message is as per your time requirement acknowledge it. If not do not acknowledge it in which case it will persist on the server and may be picked up by other threads looking for messages with different time stamps.
Another efficient way would be using message selector. Since JMSTimestamp is a message header and can be used in a selector you can take advantage of it. Create receiver with a selector on JMSTimestamp with your time range requirement. Only messages satisfying the selector will be received.