RabbitMQ Batch Ack

RabbitMQ Batch Ack - java

I had a question on how rabbitmq works with batching acknowledgements. I understand that the Prefetch value is the max number of messages that will get queued before reaching its limit. However, I wasn't sure if the ack's manage themselves or if I have to manage this in code.
Which method is correct?
Send each basicAck with multiple set to true
or
wait until 10 acks were supposed to be sent out and send only the last one and AMQP will automatically send all previous in queue. (with multiple set to true)

TL;DR multiple = true is faster in some cases but requires a lot more careful book keeping and batch like requirements
The consumer gets messages that have a monotonic-ly growing id specific to that consumer. The id is a 64 bit number (it actually might be an unsigned 32 bit but since Java doesn't have that its a long) called the delivery tag. The prefetch is the most messages a consumer will receive that are unacked.
When you ack the highest delivery tag with multiple true it will acknowledge all the unacked messages with a lower delivery tag (smaller number) that the consumer has outstanding. Obviously if you have high prefetch this is faster than acking each message.
Now RabbitMQ knows the consumer received the messages (the unacked ones) but it doesn't know if all those messages have been correctly consumed. So it is on the burden of you the developer to make sure all the previous messages have been consumed. The consumer will deliver the messages in order (I believe internally the client uses a BlockingQueue) but depending on the library/client used downstream the messages might not be.
Thus this really only works well when you are batching the messages together in a single go (e.g. transaction or sending a group of messages off to some other system) or buffering reliably. Often this is done with a blocking queue and then periodically draining the queue to send a group of messages to a downstream system.
On the other hand if you are streaming each message in real time then you can't really do this (ie multiple = false).
There is also the case of one of the message being bad in the group (e.g. drained from internal queue... not rabbit queue) and you won't to nack that bad one. If that is the case you can't use multiple = true either.
Finally if you wait for a certain amount messages (instead of say time) more than the prefetch you will wait indefinitely.... not a good idea. You need to wait on time and number of messages must be <= prefetch.
As you can see its fairly nontrivial to correctly use multiple = true.

First one correction regarding Prefetch value is the max number of messages that will get queued before reaching its limit. - this is not what prefetch value is; prefetch value is the number of UN-ACKed messages that consumer "gets" from the queue. So they are kind of assigned to the consumer but remain in the queue until they are acknowledged. Quote from here, when prefetch is 1
This tells RabbitMQ not to give more than one message to a worker at a
time. Or, in other words, don't dispatch a new message to a worker
until it has processed and acknowledged the previous one.
And for your question:
I wasn't sure if the ack's manage themselves or if I have to manage
this in code.
You can set the auto ack flag to true and then you could say that the ack's manage themselves

Related

Fast Processing Topic and Slow Processing Topic - Akka Kafka

I have a problem where i need to prioritize some events to be processes earlier and some events lets say after the high priority events. Those events come from one source and i need to prioritize the streams depending on their event type priority to be either forwarded in the high priority or lower priority sink. I'm using kafka and akka kafka streams. So the main problem is i get a lot of traffic at a given point in time. What would here be the preferred scenario?

The first thing to tackle is the offset commit. Because processing will not be in order, committing offsets after processing cannot guarantee at-least-once (nor can it guarantee at-most-once), because the following sequence is possible (and the probability of this cannot be reduced to zero):
Commit offset for high-priority message which has been processed before multiple low-priority messages have been processed
Stream fails (or instance running the stream is stopped, or whatever)
Stream restarts from last committed offset
The low-priority messages are never read from Kafka again, so never get processed
This then suggests that either the offset commit will have to happen before the reordering or we'll need a notion of processed-but-not-yet-committable until the low-priority messages have been processed. Noting that for the latter option, tracking the greatest offset not committed (the simplest strategy which could possibly work) will not work if there's anything which could create gaps in the offset sequence which implies infinite retention and no compaction, I'd actually suggest committing the offsets before processing, but once the processing logic has guaranteed that it will eventually process the message.
A combination of actors and Akka Persistence allows this approach to be taken. The rough outline is to have an actor which is persistent (this is a good fit for event-sourcing) and basically maintains lists of high-priority and low-priority messages to process. The stream sends an "ask" with the message from Kafka to the actor, which on receipt classifies the message as high-/low-priority, assuming that the message hasn't already been processed. The message (and perhaps its classification) is persisted as an event and the actor acknowledges receipt of the message and that it commits to processing it by scheduling a message to itself to fully process a "to-process" message. The acknowledgement completes the ask, allowing the offset to be committed to Kafka. On receipt of the message (a command, really) to process a message, the actor chooses the Kafka message to process (by priority, age, etc.) and persists that it's processed that message (thus moving it from "to-process" to "processed") and potentially also persists an event updating state relevant to how it interprets Kafka messages. After this persistence, the actor sends another command to itself to process a "to-process" message.
Fault-tolerance is then achieved by having a background process periodically pinging this actor with the "process a to-process message" command.
As with the stream, this is a single-logical-thread-per-partition process. It's possible that you are multiplexing many partitions worth of state per physical Kafka partition, in which case you can have multiple of these actors and send multiple asks from the ingest stream. If doing this, the periodic ping is likely best accomplished by a stream fed by an Akka Persistence Query to get the identifiers of all the persistent actors.
Note that the reordering in this problem makes it fundamentally a race and thus non-deterministic: in this design sketch, the race is because for messages M1 from actor B and M2 from actor C sent to actor A may be received in any order (if actor B sent a message M3 to actor A after it sent message M1, M3 would arrive after M1 but could arrive before or after M2). In a different design, the race could occur based on speed of processing relative to the latency for Kafka to make a message available for consumption.

How to reconsume a rejected message later, RabbitMQ

Sometimes due to some external problems, I need to requeue a message by basic.reject with requeue = true.
But I don't need to consume it immediately because it will possibly fail again in a short time. If I continuously requeue it, this may result in infinite loop and requeue.
So I need to consume it later, say one minute later,
And I need to know how many times the messages has been requeue so that I can stop requeue it but only reject it to declare it fails to consume.
PS: I am using Java client.

There are multiple solutions to point 1.
First one is the one chosen by Celery (a Python producer/consumer library that can use RabbitMQ as broker). Inside your message, add a timestamp at which the task should be executed. When your consumer gets the message, do not ack it and check its timestamp. As soon as the timestamp is reached, the worker can execute the task. (Note that the worker can continue working on other tasks instead of waiting)
This technique has some drawbacks. You have to increase the QoS per channel to an arbitrary value. And if your worker is already working on a long running task, the delayed task wont be executed until the first task has finished.
A second technique is RabbitMQ-only and is much more elegant. It takes advantage of dead-letter exchanges and Messages TTL. You create a new queue which isn't consumed by anybody. This queue has a dead-letter exchange that will forward the messages to the consumer queue. When you want to defer a message, ack it (or reject it without requeue) from the consumer queue and copy the message into the dead-lettered queue with a TTL equal to the delay you want (say one minute later). At (roughly) the end of TTL, the defered message will magically land in the consumer queue again, ready to be consumed. RabbitMQ team has also made the Delayed Message Plugin (this plugin is marked as experimental yet fairly stable and potential suitable for production use as long as the user is aware of its limitations and has serious limitations in term of scalability and reliability in case of failover, so you might decide whether you really want to use it in production, or if you prefer to stick to the manual way, limited to one TTL per queue).
Point 2. just requires putting a counter in your message and handling this inside your app. You can choose to put this counter in a header or directly in the body.

ActiveMQ: consumer not getting non-grouped messages once it is selected to handle a specific message group

We're using ActiveMQ (5.14.5).
We have a single producer, and multiple consumers on the same queue.
From time to time we set JMSXGroupID to group several messages together to be consumed on a single consumer. This works as expected.
In parallel, the producer continues to send non-grouped messages (i.e. without JMSXGroupID)
The problem:
We noticed that once a consumer was selected to process a specific group, it no longer gets the non-grouped messages. Even if it is completely idle. The non-grouped messages are always sent to the other consumers.
The rogue consumer returns to consume non-grouped messages only after we close the group that was assigned to it (by setting JMSXGroupSeq=-1).
Is this a normal behavior? We expected that non-grouped messages will continue to be delivered in the same round-robin fashion as usual, to all consumers.
We were unable to find a clear reference to this in ActiveMQ documentation.

There's a bit of a no-win situation for the message broker here. If there are active message groups in play, the the broker has to assume that further messages will be produced that fall into those groups. So a message consumer that has become bound to a particular group needs to remain available to consumer later messages of that group, rather than ungrouped messages. After all, an ungrouped message can be handled elsewhere, while a grouped message can't.
However, we also want to have a fair-ish distribution of messages between consumers. So it makes sense that a consumer that is bound to a group, or groups, could take some work when it is idle.
But how do we know it is idle? What happens if a consumer takes a bunch of ungrouped messages (and don't forget the default pre-fetch behaviour), and then new messages arrive that match its specific group?
The fact that closing a group restores the "group consumer" to default behaviour suggests to me that this is not a bug, but a deliberate attempt to make a reasonable compromise in a tricky situation. It seems reasonable to me to ask for a feature to be added, where "group consumers" can take part in ungrouped workload, but I would be inclined to see that as an enhancement.
Just my $0.02, of course.

Amazon SQS - FIFO Queue message request, inconsistent receives

I have a FIFO SQS queue, with visibility time of 30 seconds.
The requirement is to read messages as Quickly as possible and clear the queue.
I have code in JAVA in a fashion shown below ( this is just a representation of idea only, not complete code ):
//keep getting messages from FIFO and process them ASAP
while(true)
{
List<Message> messages =
sqsclient.receiveMessage(receiveMessageRequest).getMessages();
//my logic/code here to process these messages and delete them ASAP
}
In the while loop as soon as the messages are received, they are processed and removed from the queue.
But, many times the receiveMessageRequest does not give me messages (returns zero messages).
Also, the messages limitation is only 10 at a time during receive from SQS, which is already an issue, but due to these zero receives, the queues are piling up.
I have no clue why this is happening. The documentation exactly is not clear on this part (or Am I missing in terms of the configuration of the queue?)
Please help!
Note:
1. My FIFO Queue always has messages in this scenario, so there is no case of Queue having zero messages and receive request returning zero
2. The processing and delete times are also Less than the visibility timeout.
Thanks.
Update:
I have started running multiple consumers for processing the FIFO queue. Clearly, one consumer is not coping up with the inflow of messages. I shall update in few days how multiple consumers are performing. Thanks

You have to first make sure that all messages you received are deleted within VisibilityTimeout. If you are using DeleteMessageBatch for deletion make sure that all 10 messages are deleted.
Also, how did you queue messages when you enqueue them?
Order of messages are guaranteed only in a single message group.
This also means that if you set the same group id to all messages, you are limited to a single consumer so that order of messages are preserved for sure. Even if use multiple consumers, all messages that belong to a same group becomes invisible to other consumers until visibility timeout expires.

Amazon SQS Long Polling not returning all messages

I have a requirement to read all messages in my Amazon SQS queue in 1 read and then sort it based on created timestamp and do business logic on it.
To make sure all the SQS hosts are checked for messages, I enabled long polling. The way I did that was to set the default wait time for the queue as 10 seconds. (Any value more than 0 will enable long polling).
However when I tried to read the queue, it still did not give me all the messages and I had to do multiple reads to get all the messages. I even enabled long polling through code per receive request, still did not work. Below is the code I am using.
AmazonSQSClient sqsClient = new AmazonSQSClient(new ClasspathPropertiesFileCredentialsProvider());
sqsClient.setEndpoint("sqs.us-west-1.amazonaws.com");
String queueUrl = "https://sqs.us-west-1.amazonaws.com/12345/queueName";
ReceiveMessageRequest receiveRequest = new ReceiveMessageRequest().withQueueUrl(queueUrl).withMaxNumberOfMessages(10).withWaitTimeSeconds(20);
List<Message> messages = sqsClient.receiveMessage(receiveRequest).getMessages();
I have 3 messages in the queue and each time I run the code I get a different result, sometimes I get all 3 messages, sometimes just 1. The visibility timeout I set as 2 seconds, just to eliminate the messages becoming invisible as the reason for not seeing them in the read.
This is the expected behavior for short polling. Long polling is supposed to eliminate multiple polls. Is there anything I am doing wrong here?
Thanks

Long polling is supposed to eliminate multiple polls
No, long polling is supposed to eliminate a large number of empty polls and false empty responses when messsages are actually available. A long poll in SQS won't sit and wait for the maximum amount of wait time just looking for more things to return, or keep searching once it's found something. A long poll in SQS only waits long enough to find something:
“Long polling allows the Amazon SQS service to wait until a message is available in the queue before sending a response. So unless the connection times out, the response to the ReceiveMessage request will contain at least one of the available messages (if any) and up to the maximum number requested in the ReceiveMessage call.”
— http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-long-polling.html (emphasis added)
So, the “something” that SQS finds and returns may be all of the messages (up to your max), or a subset of the messages, because, as has been mentioned, SQS is a distributed system. There was likely an architectural decision to be made between "return as quickly as possible once we've found something" and "search the entire system for everything possible up to the maximum number of message the client will accept" ... and, given those alternatives, it seems reasonable that most applications would prefer the faster response of "give me whatever you can, as quickly as you can."
You don't know that you've actually drained a queue until you get back an empty response from a long poll.

As pointed out by Michael - sqlbot, SQS does not guarantee returning all (or the requested number of) messages even in case of Long Polling. Long Polling just ensures that you do not get false empty responses - i.e. your read requests do not return any messages even when there are messages in the queue.
I had done some experiments around this and found that the number of messages returned in the response approaches the number of the messages requested as you increase the number of messages in the queue. Typically, with 1000+ messages in the queue, in my experiments, I could see that it returned 10 messages (which is by the way the max that can be returned for a read request) everytime. In fact this behavior was observed for Short Polling as well. Even with 100+ messages, the number of messages returned was not 10 all the time, although a good percentage of those requests returned 10 messages back. Obviously, this is not guaranteed, but that is what you would typically see.
I had documented the findings from my experiments in one of my blogs - posting a link to the same below in case you would like to see more details of the experiment.
http://pragmaticnotes.com/2017/11/20/amazon-sqs-long-polling-versus-short-polling/

Because SQS is, on the back-end, a distributed system, there is no guarantee that any particular request will be able to return the maximum number of messages that are being polled for.
You just have to keep calling, till you are confident enough that you have as many items as you would expect, or that the queue has been emptied.

Set the execution time out to a value greater than 0. I have set execution timeout to 2 seconds and it is now returning all 9 messages available in the queue.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.