Amazon SQS - FIFO Queue message request, inconsistent receives - java

I have a FIFO SQS queue with a visibility timeout of 30 seconds.
The requirement is to read messages as quickly as possible and clear the queue.
My Java code looks roughly like this (a sketch of the idea only, not the complete code):
// keep getting messages from the FIFO queue and process them ASAP
while (true) {
    List<Message> messages =
        sqsclient.receiveMessage(receiveMessageRequest).getMessages();
    // my logic/code here to process these messages and delete them ASAP
}
In the while loop, messages are processed and removed from the queue as soon as they are received.
However, the receive call often returns zero messages.
A receive from SQS is also limited to 10 messages at a time, which is already a constraint, and because of these empty receives the queue keeps piling up.
I have no clue why this is happening. The documentation is not clear on this point (or am I missing something in the configuration of the queue?).
Please help!
Note:
1. My FIFO queue always has messages in this scenario, so this is not a case of an empty queue correctly returning zero messages.
2. The processing and delete times are also less than the visibility timeout.
Thanks.
Update:
I have started running multiple consumers to process the FIFO queue. Clearly, one consumer cannot keep up with the inflow of messages. I will update in a few days on how multiple consumers perform. Thanks

First make sure that every message you receive is deleted within the VisibilityTimeout. If you use DeleteMessageBatch for deletion, check that all 10 messages were actually deleted, since a batch can partially fail.
Also, how do you assign message group IDs when you enqueue messages?
The order of messages is guaranteed only within a single message group.
This also means that if you set the same group ID on all messages, you are effectively limited to a single consumer, so that the order of messages is preserved. Even if you use multiple consumers, while messages from a group are in flight the remaining messages of that group are not returned to any consumer until the in-flight messages are deleted or their visibility timeout expires.
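To make the group blocking concrete, here is a toy in-memory model in Java. It is only a sketch of the behaviour described above, not how SQS is implemented, and the names FifoGroupModel, Msg, send, receive and delete are made up for illustration. It assumes a group stays blocked until every in-flight message of that group has been deleted (visibility timeout expiry is left out).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

/** Toy in-memory model of FIFO message-group blocking. Not the SQS implementation. */
final class FifoGroupModel {
    /** A message with its group id (illustrative type). */
    static final class Msg {
        final String groupId;
        final String body;
        Msg(String groupId, String body) { this.groupId = groupId; this.body = body; }
    }

    private final Deque<Msg> queue = new ArrayDeque<>();
    private final List<Msg> inFlight = new ArrayList<>();

    void send(String groupId, String body) {
        queue.addLast(new Msg(groupId, body));
    }

    /** Returns up to max messages in order, skipping groups that were in flight before this call. */
    List<Msg> receive(int max) {
        Set<String> blocked = new HashSet<>();
        for (Msg m : inFlight) blocked.add(m.groupId);
        List<Msg> batch = new ArrayList<>();
        Iterator<Msg> it = queue.iterator();
        while (it.hasNext() && batch.size() < max) {
            Msg m = it.next();
            if (blocked.contains(m.groupId)) continue; // group is held by an earlier receive
            batch.add(m);
            it.remove();
        }
        inFlight.addAll(batch);
        return batch;
    }

    /** Deleting a message takes it out of flight; its group unblocks once none remain. */
    void delete(Msg m) {
        inFlight.remove(m);
    }
}
```

With 15 messages that all share one group ID, the first receive(10) returns 10 messages and every further receive returns nothing until those 10 are deleted. If the messages are spread over several group IDs, a second consumer gets work right away.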

Related

ActiveMQ : how to fork-join? Ie. how to emit one message when all subtasks are done

Imagine you have a task structure of:
Task1
Task2: 1 million separate independent Subtask[i] that can run concurrently
Task3: must run once after ALL Task2 subtasks have completed
And all of Task1, Subtask[i] and Task3 are represented by MQ messages.
How can this be solved on ActiveMQ? Especially the triggering of a Task3 message once all subtasks are complete.
I know, it's not a queueing problem, it's a fork-join problem. Let's say the environment dictates that you must use ActiveMQ for it.
Using ActiveMQ features such as dynamic queues and consumers is allowed. Using external counters, like a database row representing Task2's progress, is not allowed.
Hidden in this fork-join problem is a state management and observability challenge. Since the database is ruled out, you have to rely on something in-memory or on-queue.
Create a unique id for the task run: something short, but with enough space to avoid collisions, like an airline locator code, e.g. 34FDSX.
Send all messages for the task to queue://TASK.34FDSX.DATA
Send a control message to queue://TASK.34FDSX.CONTROL that contains the task id and the expected total number of messages (including each messageId would be helpful too).
When consumers of queue://TASK.34FDSX.DATA complete their work, they should send a 'done' message to queue://TASK.34FDSX.DONE with their messageId or some other identifier.
The consumer for the .CONTROL queue and the .DONE queue should be the same process, so that it can track the expected total and the number of completed subtasks. Once everything is completed, it can fire the event that triggers Task 3.
This approach keeps everything 'online', and you can also time out the .CONTROL and .DONE reader if too much time passes before the task completes.
Queue deletion can be done using ActiveMQ destination GC, or as a clean-up step in the .CONTROL/.DONE reader whenever everything completes successfully.
Advantages:
No infinite blocking consumers
No infinite open transactions
State of the TASK is online and observable via the presence of queues and queue metrics: queue size, enqueue count, dequeue count
The entire solution can be multi-threaded and the only requirement is that for a given task the .CONTROL/.DONE listener is the same consumer, but multiple tasks can have individual .CONTROL/.DONE listeners to scale.
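The bookkeeping inside the combined .CONTROL/.DONE listener could look roughly like the sketch below. It is plain Java with no JMS wiring, and ForkJoinTracker, onControl and onDone are hypothetical names: the listener would call onControl when it consumes the control message and onDone for every 'done' message. The sketch assumes 'done' messages may arrive before the control message and that duplicate messageIds should be counted once.

```java
import java.util.HashSet;
import java.util.Set;

/** Tracks one task run; fed by the single consumer of the .CONTROL and .DONE queues. */
final class ForkJoinTracker {
    private final Runnable onAllDone;        // e.g. sends the Task3 message
    private final Set<String> done = new HashSet<>();
    private long expected = -1;              // unknown until the control message arrives
    private boolean fired = false;

    ForkJoinTracker(Runnable onAllDone) {
        this.onAllDone = onAllDone;
    }

    /** Call when the control message (expected subtask count) is consumed. */
    synchronized void onControl(long expectedTotal) {
        expected = expectedTotal;
        maybeFire();
    }

    /** Call for every 'done' message; a repeated messageId is counted once. */
    synchronized void onDone(String messageId) {
        done.add(messageId);
        maybeFire();
    }

    /** Fires the completion callback exactly once, when all expected ids are in. */
    private void maybeFire() {
        if (!fired && expected >= 0 && done.size() >= expected) {
            fired = true;
            onAllDone.run();
        }
    }
}
```

Keeping a set of a million ids costs memory. If redelivered 'done' messages are not a concern in your setup, a plain counter would do instead.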
The question here is a bit vague so my answer will have to be a bit vague as well.
Each of the million independent subtasks for "Task 2" can be represented by a single message. All these messages can be in the same queue. You can spin up as many consumers as you want and process all these messages (i.e. perform all the subtasks). Just ensure that these consumers either use client-acknowledge mode or a transacted session so that the message is not removed from the queue until they are done processing the message. Once there are no more messages in the queue then you know "Task 2" is done.
To detect when the queue is empty you can have a "special" consumer on the queue that periodically opens a transacted session and tries to consume a message from the queue. If the consumer receives a message, you can roll back the transacted session to put the message back on the queue, and you know that the queue is not empty (i.e. "Task 2" is not done). If the consumer doesn't receive a message, you know the queue is empty and you can send another message indicating this. You could launch this special consumer as part of "Task 2" after all the messages for the subtasks have been sent, to avoid detecting an empty queue prematurely.
To be clear, this is a simple solution. You could certainly add more complexity depending on your requirements, but your question just outlined the basic problem so it's unclear what other requirements you have (if any).

Executing receiveMessageRequest to the same SQS FIFO queue

I have two Lambda instances running at the same time, and both short-poll the same FIFO queue only a few seconds apart.
The first instance receives the first 10 messages and the second instance receives 0 messages, even though there are a total of 15 messages in the queue.
Why couldn't the second instance get the remaining 5 messages from the queue? Is this the expected behaviour and how can I overcome it?
Your 15 messages (most likely) all belong to the same message group ID. Therefore, the remaining 5 will not become available to your consumers until the first 10 are successfully processed and deleted. For FIFO queues, this is the expected behaviour to preserve the order of messages (cheers @Michael-sqlbot for pointing in the right direction with this answer as per the comments).
Use long polling for Standard Queues. Short polling doesn't check every SQS server, so it can miss messages that are available. Long polling does check all SQS servers, which avoids false empty responses.

RabbitMQ Batch Ack

I had a question on how rabbitmq works with batching acknowledgements. I understand that the Prefetch value is the max number of messages that will get queued before reaching its limit. However, I wasn't sure if the ack's manage themselves or if I have to manage this in code.
Which method is correct?
Send each basicAck with multiple set to true
or
Wait until 10 acks are due and send only the last one (with multiple set to true), so that AMQP automatically acknowledges all the previous ones.
TL;DR multiple = true is faster in some cases but requires much more careful bookkeeping and batch-like requirements.
The consumer gets messages that carry a monotonically growing id, called the delivery tag. The delivery tag is a 64-bit number (a long in Java) and is scoped to the channel. The prefetch is the maximum number of unacked messages a consumer will receive.
When you ack the highest delivery tag with multiple = true, it acknowledges all the unacked messages with a lower delivery tag (smaller number) that are outstanding on that channel. Obviously, if you have a high prefetch this is faster than acking each message.
Now RabbitMQ knows the consumer received the messages (the unacked ones), but it doesn't know whether all those messages have been correctly consumed. So the burden is on you, the developer, to make sure all the previous messages have been consumed. The client delivers the messages in order (I believe internally the client uses a BlockingQueue), but depending on the library/client used downstream the messages might not be processed in order.
Thus this really only works well when you are batching the messages together in a single go (e.g. a transaction or sending a group of messages off to some other system) or buffering reliably. Often this is done with a blocking queue that is periodically drained to send a group of messages to a downstream system.
On the other hand, if you are streaming each message in real time then you can't really do this (i.e. use multiple = false).
There is also the case where one of the messages in the group is bad (e.g. drained from the internal queue, not the Rabbit queue) and you want to nack that bad one. If that is the case you can't use multiple = true either.
Finally, if you wait for a certain number of messages (instead of, say, a time limit) and that number is larger than the prefetch, you will wait indefinitely, because no more than prefetch unacked messages are delivered. You need to wait on time as well, and the number of messages must be <= prefetch.
As you can see, it's fairly nontrivial to use multiple = true correctly.
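To illustrate what multiple = true does to the set of outstanding deliveries, here is a small stand-alone model in Java. It is a sketch of the semantics described above and does not use the RabbitMQ client; UnackedDeliveries, deliver, ack and outstanding are made-up names, and ack only mirrors the (deliveryTag, multiple) arguments of basicAck.

```java
import java.util.TreeSet;

/** Toy model of a channel's unacknowledged deliveries; not the RabbitMQ client itself. */
final class UnackedDeliveries {
    private final TreeSet<Long> unacked = new TreeSet<>();
    private long nextTag = 1; // delivery tags grow monotonically

    /** Records a delivery and returns its delivery tag. */
    long deliver() {
        long tag = nextTag++;
        unacked.add(tag);
        return tag;
    }

    /** Mirrors basicAck(deliveryTag, multiple): multiple = true acks every tag <= deliveryTag. */
    void ack(long deliveryTag, boolean multiple) {
        if (multiple) {
            unacked.headSet(deliveryTag, true).clear();
        } else {
            unacked.remove(deliveryTag);
        }
    }

    /** Number of deliveries that are still unacknowledged. */
    int outstanding() {
        return unacked.size();
    }
}
```

After five deliveries, ack(3, true) leaves only tags 4 and 5 outstanding. That is why you should only send it once messages 1 to 3 have all really been processed.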
First, one correction regarding "Prefetch value is the max number of messages that will get queued before reaching its limit": this is not what the prefetch value is. The prefetch value is the number of unacked messages that a consumer "gets" from the queue. They are effectively assigned to the consumer but remain in the queue until they are acknowledged. To quote the RabbitMQ documentation for the case where prefetch is 1:
This tells RabbitMQ not to give more than one message to a worker at a
time. Or, in other words, don't dispatch a new message to a worker
until it has processed and acknowledged the previous one.
And for your question:
I wasn't sure if the ack's manage themselves or if I have to manage
this in code.
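You can set the auto ack flag to true, and then you could say that the acks manage themselves. Note that with automatic acknowledgement a message counts as acknowledged as soon as it is sent, so it is lost if the consumer fails while processing it.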

Amazon SQS Long Polling not returning all messages

I have a requirement to read all messages in my Amazon SQS queue in one read, then sort them by created timestamp and run business logic on them.
To make sure all the SQS hosts are checked for messages, I enabled long polling by setting the default wait time for the queue to 10 seconds. (Any value greater than 0 enables long polling.)
However, when I tried to read the queue, it still did not give me all the messages and I had to do multiple reads to get them all. I even enabled long polling in code per receive request, and it still did not work. Below is the code I am using.
AmazonSQSClient sqsClient = new AmazonSQSClient(new ClasspathPropertiesFileCredentialsProvider());
sqsClient.setEndpoint("sqs.us-west-1.amazonaws.com");
String queueUrl = "https://sqs.us-west-1.amazonaws.com/12345/queueName";
ReceiveMessageRequest receiveRequest = new ReceiveMessageRequest().withQueueUrl(queueUrl).withMaxNumberOfMessages(10).withWaitTimeSeconds(20);
List<Message> messages = sqsClient.receiveMessage(receiveRequest).getMessages();
I have 3 messages in the queue and each time I run the code I get a different result: sometimes I get all 3 messages, sometimes just 1. I set the visibility timeout to 2 seconds, just to rule out messages becoming invisible as the reason for not seeing them in the read.
This is the expected behavior for short polling. Long polling is supposed to eliminate multiple polls. Is there anything I am doing wrong here?
Thanks
"Long polling is supposed to eliminate multiple polls"
No, long polling is supposed to eliminate a large number of empty polls and false empty responses when messages are actually available. A long poll in SQS won't sit and wait for the maximum amount of wait time just looking for more things to return, or keep searching once it's found something. A long poll in SQS only waits long enough to find something:
“Long polling allows the Amazon SQS service to wait until a message is available in the queue before sending a response. So unless the connection times out, the response to the ReceiveMessage request will contain at least one of the available messages (if any) and up to the maximum number requested in the ReceiveMessage call.”
— http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-long-polling.html (emphasis added)
So, the “something” that SQS finds and returns may be all of the messages (up to your max), or a subset of the messages, because, as has been mentioned, SQS is a distributed system. There was likely an architectural decision to be made between "return as quickly as possible once we've found something" and "search the entire system for everything possible up to the maximum number of messages the client will accept" ... and, given those alternatives, it seems reasonable that most applications would prefer the faster response of "give me whatever you can, as quickly as you can."
You don't know that you've actually drained a queue until you get back an empty response from a long poll.
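A drain loop built on that rule can be sketched in plain Java as follows. QueueDrainer and drain are hypothetical names, and the receiveBatch supplier stands in for one long-poll ReceiveMessage call (for example a lambda wrapping sqsClient.receiveMessage(receiveRequest).getMessages()). That wiring is an assumption and not part of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class QueueDrainer {
    /**
     * Calls receiveBatch repeatedly and collects the results until one call
     * returns an empty list. receiveBatch stands in for a long-poll receive.
     */
    static <T> List<T> drain(Supplier<List<T>> receiveBatch) {
        List<T> all = new ArrayList<>();
        while (true) {
            List<T> batch = receiveBatch.get();
            if (batch.isEmpty()) {
                return all; // an empty long poll is the signal that the queue is drained
            }
            all.addAll(batch);
        }
    }
}
```

For this to terminate and to avoid duplicates, the visibility timeout has to be longer than the time the whole drain takes (or the messages must be deleted as you go). With a 2 second visibility timeout, messages you have already read become visible again and are returned a second time.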
As pointed out by Michael - sqlbot, SQS does not guarantee returning all (or the requested number of) messages even with long polling. Long polling just ensures that you do not get false empty responses, i.e. responses that contain no messages even though there are messages in the queue.
I did some experiments around this and found that the number of messages returned in the response approaches the number of messages requested as you increase the number of messages in the queue. Typically, with 1000+ messages in the queue, in my experiments it returned 10 messages (which is, by the way, the max that can be returned for a read request) every time. In fact, this behavior was observed for short polling as well. Even with 100+ messages, the number of messages returned was not 10 all the time, although a good percentage of those requests returned 10 messages. Obviously, this is not guaranteed, but that is what you would typically see.
I documented the findings from my experiments in a blog post; the link is below in case you would like to see more details of the experiment.
http://pragmaticnotes.com/2017/11/20/amazon-sqs-long-polling-versus-short-polling/
Because SQS is, on the back-end, a distributed system, there is no guarantee that any particular request will be able to return the maximum number of messages that are being polled for.
You just have to keep calling, till you are confident enough that you have as many items as you would expect, or that the queue has been emptied.
Set the execution time out to a value greater than 0. I have set execution timeout to 2 seconds and it is now returning all 9 messages available in the queue.

hornetq delayed redelivery for message group

I want to delay messages for a whole message group.
All messages belonging to a message group must be processed sequentially, in the same order they were posted. If one of the messages cannot be consumed, we want to delay it and also delay the remaining ones in the same message group. I do not want to block the consumer: it should be free to process messages from other groups.
How can I do that?
I can't say JMS has any nice built-in support for this. Everything is easier with single "stand-alone" messages, but there is one thing you could try.
Do a delayed delivery for those messages (in that group).
// Send to the same queue once again, but delay 60 sec
if (isGroupMarkedForRedelivery(message.getStringProperty("JMSXGroupID"))) {
    message.setLongProperty("_HQ_SCHED_DELIVERY", System.currentTimeMillis() + 60000);
    producer.send(message); // producer sends to the process queue (again)
}
Note that if you need them in the same order, then you should probably not use concurrency in sending and/or receiving. You could of course add more logic to adapt to your situation.
You probably need to make sure isGroupMarkedForRedelivery returns false for a specific group after a shorter time than the delay, so that the rescheduled messages are not delayed again when they come back.
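One way to get marks that expire on their own is a small map from group id to expiry time. The following is only a sketch: GroupRedeliveryMarks and markForRedelivery are hypothetical names, and the injected clock (for example System::currentTimeMillis) is there so the class can be tested.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Remembers which message groups are currently being rescheduled. */
final class GroupRedeliveryMarks {
    private final Map<String, Long> markedUntil = new ConcurrentHashMap<>();
    private final LongSupplier clockMillis; // e.g. System::currentTimeMillis

    GroupRedeliveryMarks(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
    }

    /** Marks a group for markMillis; pick a value shorter than the redelivery delay. */
    void markForRedelivery(String groupId, long markMillis) {
        markedUntil.put(groupId, clockMillis.getAsLong() + markMillis);
    }

    /** True while the mark is still active; expired marks are dropped. */
    boolean isGroupMarkedForRedelivery(String groupId) {
        Long until = markedUntil.get(groupId);
        if (until == null) {
            return false;
        }
        if (clockMillis.getAsLong() >= until) {
            markedUntil.remove(groupId, until);
            return false;
        }
        return true;
    }
}
```

With the 60 second delay from the snippet above, marking a group for something like 30 seconds means the mark is gone before the rescheduled messages arrive again.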
