How does a consumer process data in a queue in AWS SQS - java

Suppose that in my consumer I've set the receive-message limit to 10 for the queue, a delay of 5 seconds, and a polling rate of 20 seconds.
I want to know how my consumer processes the data: does it process all 10 messages at once and then pick up another 10, or does it process one message and simultaneously pick up another one from the queue?

If you're asking about how batches get processed, the former is correct. If your consumer is a Lambda function, for example, the function would receive a single "batch" of up to 10 messages per invocation. It would then have to process, or fail to process, all of them before starting another batch.
If you are in fact using Lambda, I'd suggest reading this guide for more basic information.

Related

Executing receiveMessageRequest to the same SQS FIFO queue

I have two Lambda instances running at the same time, and both short-poll the same FIFO queue only a few seconds apart.
The first instance receives the first 10 messages and the second instance receives 0 messages, even though there are 15 messages in the queue in total.
Why couldn't the second instance get the remaining 5 messages from the queue? Is this the expected behaviour, and how can I overcome it?
Your 15 messages (most likely) all belong to the same Message Group ID. Therefore, the remaining 5 will not become available to your consumers until the first 10 are successfully processed and deleted. For FIFO Queues, this is the expected behaviour to preserve the order of messages (cheers @Michael-sqlbot for pointing in the right direction with this answer as per comments below).
Use long polling for Standard Queues. Short polling doesn't check every SQS server, therefore, it has the potential to not get all results. Long polling does check all SQS servers and will therefore get all results.

Aggregate messages without List

I'm using Spring Integration and I need to pack messages into groups of 10k. I don't want to store them in a List, since later the 10k could become much bigger, and persistent storage is also not an option. I just want several threads to send messages to a single thread, where I can count them and write them to disk in files containing 10k lines each. After the counter reaches 10k, I create a new file, reset the counter to zero, and so on. It would work fine with a direct channel, but how do I tell several threads (I'm using <int:dispatcher task-executor="executor" />) to send messages to a single thread? Thanks
You can accomplish this with a QueueChannel. Any number of threads can send messages to it concurrently. On the other side, you just configure a PollingConsumer with a fixed-delay poller, which is single-threaded, as you requested. That is, the fixed-delay poller and everything downstream of it on a DirectChannel will run in a single thread, so your count and rollover logic can live there.
There is nothing to show you, because the configuration is straightforward: different services send messages to the same QueueChannel, and the fixed-delay poller ensures single-threaded reading for you.
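The pattern described above (many senders, one reader that counts and rolls over) can be sketched without any framework. The following is a plain java.util.concurrent analogue, not Spring Integration code: a LinkedBlockingQueue stands in for the QueueChannel and a single thread calling drain() stands in for the fixed-delay poller. The class name RollingFileSink, the part-N.txt file names, and the END marker are all hypothetical choices made for this sketch.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Plain-Java stand-in for a QueueChannel drained by one polling thread. */
final class RollingFileSink {
    // Unique end-of-stream marker, compared by identity so no real line can match it.
    private static final String END = new String("END");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Path dir;
    private final int linesPerFile;

    RollingFileSink(Path dir, int linesPerFile) {
        this.dir = dir;
        this.linesPerFile = linesPerFile;
    }

    /** Safe to call from any number of producer threads. */
    void send(String line) {
        queue.add(line);
    }

    /** Tells the draining thread to stop once all earlier lines are written. */
    void finish() {
        queue.add(END);
    }

    /** Call from exactly one thread; returns the number of files written. */
    int drain() throws IOException, InterruptedException {
        int files = 0;
        int count = 0;
        BufferedWriter out = null;
        try {
            for (String line = queue.take(); line != END; line = queue.take()) {
                if (out == null) {
                    // Open the next file lazily, only when there is a line to write.
                    out = Files.newBufferedWriter(dir.resolve("part-" + files++ + ".txt"));
                }
                out.write(line);
                out.newLine();
                if (++count == linesPerFile) {
                    // Roll over: close the full file and reset the counter.
                    out.close();
                    out = null;
                    count = 0;
                }
            }
        } finally {
            if (out != null) {
                out.close();
            }
        }
        return files;
    }
}
```

Because only the draining thread touches the counter and the writer, neither needs synchronization; the queue is the only shared state.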

How to create Java concurrent queue from which we can blocking-take more than 1 element in single call?

Background: I need to send a lot of small-sized messages to WebSocket clients asynchronously and immediately. Messages usually arrive in bursts, so after a pause I need to send ~5000 messages fast. So the problem is:
I don't want to start 5000 async's in single thread
I don't want to loop "start async"-"wait for complete" 5000 times in serial
I don't want to use 5000 threads, with single "start async"-"wait for complete" per thread
The best way would be to group ~20 asyncs per thread, so I need a very specific queue:
"lot of" means concurrent push/poll on the queue
"small-sized, asynchronous" means I want to poll in bundles, like 1 to 20 messages per queue take() (so I can start 1...20 async I/O operations and wait for their completion in a single thread)
"immediately" means that I don't want to wait until 20 messages have been polled; bundle polling should be used only if the queue holds a lot of messages. A single message should be polled and sent immediately.
So basically: I need a queue-like structure with a blocking take that returns 1 to X waiting elements in a single blocking call. Pseudocode:
[each of ~50 processing threads]:
messages = queue.blockingTake( max 10 or at least 1 if less than 10 available );
for each message: message.startAsync()
for each message: message.waitToComplete()
repeat
I wouldn't implement a Queue from scratch if it's not really necessary. A few ideas if you're interested:
A queue of collections (something like Queue<Collection<Message>>), if you have only 1 thread doing the offers. If you have more, the collection has to be synchronized. For example, one offerer peek()-s into the queue, sees that the last collection has too many elements, so it creates a new one and offers it.
or
A number of running threads where the runnables take elements one by one from the queue.
or
1 queue per sending thread, if you keep the queue references you can then add elements to each of them in a round robin fashion.
or
subclass a BlockingQueue of your choice and create a "Collection take(int i)" method with a rewritten version of the normal take().
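A variation on the last idea that avoids subclassing: the standard BlockingQueue interface already offers take() (blocking) and drainTo(collection, maxElements) (non-blocking), and combining the two gives the "at least 1, at most X" behaviour the question asks for. The following is a minimal sketch; the helper name takeUpTo and the class BatchTaker are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class BatchTaker {
    private BatchTaker() {}

    /**
     * Blocks until at least one element is available, then returns up to
     * max elements without waiting for the batch to fill.
     */
    static <T> List<T> takeUpTo(BlockingQueue<T> queue, int max) throws InterruptedException {
        if (max < 1) {
            throw new IllegalArgumentException("max must be >= 1");
        }
        List<T> batch = new ArrayList<>(max);
        batch.add(queue.take());        // blocks while the queue is empty
        queue.drainTo(batch, max - 1);  // never blocks: takes whatever else is ready
        return batch;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < 25; i++) {
            queue.add("msg-" + i);
        }
        while (!queue.isEmpty()) {
            List<String> batch = takeUpTo(queue, 10);
            System.out.println(batch.size() + " messages: " + batch);
        }
    }
}
```

Each processing thread would call takeUpTo in a loop, start the async sends for the returned batch, and wait for them to complete. A lone message is returned immediately, because drainTo does not wait for more elements to arrive.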

Retrieving the contents of a JMS queue within a specific time interval

I need to create an application in which I have to retrieve all the elements in a JMS queue within a given time limit.
For instance, say the given limit is 10 seconds. Every 10 seconds, the application should create a new Thread that is responsible for 1) connecting to the JMS queue and 2) retrieving all the messages present at the time of connection.
So let's say that in 10 seconds there were 15 TextMessages in the queue. I only want the currently executing thread to retrieve those 15 TextMessages and nothing else. I'm afraid that the thread would pick up additional messages.
Is there a facility to limit how many messages a consumer can take? Maybe some feature that would let me see how many messages the queue contains?
One method I can think of is to create a receiver from a session that uses the CLIENT_ACKNOWLEDGE acknowledgement mode. Start the receiver and receive the messages. Yes, you will receive some additional messages. As you receive each message, get its JMSTimestamp and check whether it falls within the time window your thread is interested in. If it does, acknowledge it. If not, do not acknowledge it, in which case it will persist on the server and may be picked up by other threads looking for messages with different timestamps.
Another, more efficient way would be to use a message selector. Since JMSTimestamp is a message header and can be used in a selector, you can take advantage of it. Create the receiver with a selector on JMSTimestamp that matches your time range. Only messages satisfying the selector will be received.
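As a rough illustration of the selector approach, the helper below builds a selector string for a half-open time window. It relies on JMSTimestamp being expressed in milliseconds since the epoch. The class name TimestampSelector and the choice of a half-open window are assumptions of this sketch, and the JMS call in the comment is shown only for orientation since it needs a real session and queue.

```java
import java.time.Instant;

final class TimestampSelector {
    private TimestampSelector() {}

    /** Builds a selector matching messages sent in [fromInclusive, toExclusive). */
    static String between(Instant fromInclusive, Instant toExclusive) {
        if (!fromInclusive.isBefore(toExclusive)) {
            throw new IllegalArgumentException("window must not be empty");
        }
        return "JMSTimestamp >= " + fromInclusive.toEpochMilli()
                + " AND JMSTimestamp < " + toExclusive.toEpochMilli();
    }

    public static void main(String[] args) {
        Instant end = Instant.now();
        Instant start = end.minusSeconds(10);
        String selector = between(start, end);
        System.out.println(selector);
        // With a real JMS session and queue (not created here), the selector
        // would be passed when creating the consumer:
        //   MessageConsumer consumer = session.createConsumer(queue, selector);
    }
}
```

A half-open window means that consecutive 10-second windows never select the same message twice.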

How to design a system that queues requests & processes them in batches?

I have at my disposal a REST service that accepts a JSON array of image urls, and will return scaled thumbnails.
Problem
I want to batch up image URLs sent by concurrent clients before calling the REST service.
Obviously if I receive 1 image, I should wait a moment in case other images trickle in.
I've settled on a batch of 5 images. But the question is, how do I design it to take care of these scenarios:
If I receive x images, such that x < 5, how do I time out from waiting if no new images arrive in the next few minutes?
If I use a queue to buffer incoming image URLs, I will probably need to lock it to prevent clients from concurrently writing while I'm busy reading my batches of 5. What data structure is good for this? BlockingQueue?
The data structure is not what's missing. What's missing is an entity - a Timer task, I'd say, which you stop and restart every time you send a batch of images to your service. You do this whether you send them because you had 5 (incidentally, I assume that 5 is just your starting number and it'll be configurable, along with your timeout), or whether because the timeout task fired.
So there's two entities running: a main thread which receives requests, queues them, checks queue depth, and if it's 5 or more, sends the oldest 5 to the service (and restarts the timer task); and the timer task, which picks up incomplete batches and sends them on.
Side note: that main thread seems to have several responsibilities, so some decomposition might be in order.
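For illustration, here is a minimal sketch of the "full batch or timeout, whichever comes first" behaviour. Note that it swaps the Timer task described above for a deadline-based BlockingQueue.poll(timeout), which gives the same effect on a single consumer thread. The names TimedBatcher and nextBatch are hypothetical, and starting the clock when the first item of a batch arrives is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class TimedBatcher {
    private TimedBatcher() {}

    /**
     * Waits for the first item, then collects more until either batchSize
     * items are gathered or the timeout (measured from the first item) expires.
     */
    static <T> List<T> nextBatch(BlockingQueue<T> queue, int batchSize,
                                 long timeout, TimeUnit unit) throws InterruptedException {
        List<T> batch = new ArrayList<>(batchSize);
        batch.add(queue.take());  // no batch without at least one item
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (batch.size() < batchSize) {
            long remaining = deadline - System.nanoTime();
            T next = queue.poll(remaining, TimeUnit.NANOSECONDS);
            if (next == null) {
                break;            // timed out: send the incomplete batch
            }
            batch.add(next);
        }
        return batch;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> urls = new LinkedBlockingQueue<>();
        for (int i = 0; i < 7; i++) {
            urls.add("http://example.com/img" + i + ".png");
        }
        // First call returns a full batch of 5; second returns the 2 leftovers
        // after waiting out the timeout.
        System.out.println(nextBatch(urls, 5, 200, TimeUnit.MILLISECONDS).size());
        System.out.println(nextBatch(urls, 5, 200, TimeUnit.MILLISECONDS).size());
    }
}
```

Clients only ever call add() on the queue, so no explicit locking is needed; BlockingQueue implementations such as LinkedBlockingQueue are thread-safe for concurrent producers and consumers.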
Well what you could do is have the clients send a special string to the queue, indicating that it is done sending image URLs. So if your last element in the queue is that string, you know that there are no URLs left.
If you have multiple clients and you know the number of clients you can always count the amount of the indicators in the queue to check if all of the clients are finished.
1- As an example, if your Java web app is running on Google App Engine, you could write each client request to the datastore and have a cron job (i.e. a scheduled task in GAE speak) read the datastore, build a batch, and send it.
2- For the concurrency/locking aspect, you could again rely on the GAE datastore to provide atomicity.
Of course, feel free to disregard my proposal if GAE isn't an option.
