Retrieve multiple messages from SQS - java

I have multiple messages in SQS. The following code always returns only one, even if there are dozens visible (not in flight). I thought setMaxNumberOfMessages would allow multiple messages to be consumed at once. Have I misunderstood this?
CreateQueueRequest createQueueRequest = new CreateQueueRequest().withQueueName(queueName);
String queueUrl = sqs.createQueue(createQueueRequest).getQueueUrl();
ReceiveMessageRequest receiveMessageRequest = new ReceiveMessageRequest(queueUrl);
receiveMessageRequest.setMaxNumberOfMessages(10);
List<Message> messages = sqs.receiveMessage(receiveMessageRequest).getMessages();
for (Message message : messages) {
// i'm a message from SQS
}
I've also tried using withMaxNumberOfMessages, without any luck:
receiveMessageRequest.withMaxNumberOfMessages(10);
How do I know there are messages in the queue? More than 1?
Set<String> attrs = new HashSet<String>();
attrs.add("ApproximateNumberOfMessages");
CreateQueueRequest createQueueRequest = new CreateQueueRequest().withQueueName(queueName);
GetQueueAttributesRequest a = new GetQueueAttributesRequest().withQueueUrl(sqs.createQueue(createQueueRequest).getQueueUrl()).withAttributeNames(attrs);
Map<String,String> result = sqs.getQueueAttributes(a).getAttributes();
int num = Integer.parseInt(result.get("ApproximateNumberOfMessages"));
The above is always run beforehand and gives me an int that is > 1.
Thanks for your input

AWS API Reference Guide: Query/QueryReceiveMessage
Due to the distributed nature of the queue, a weighted random set of machines is sampled on a ReceiveMessage call. That means only the messages on the sampled machines are returned. If the number of messages in the queue is small (less than 1000), it is likely you will get fewer messages than you requested per ReceiveMessage call. If the number of messages in the queue is extremely small, you might not receive any messages in a particular ReceiveMessage response; in which case you should repeat the request.
and
MaxNumberOfMessages: Maximum number of messages to return. SQS never returns more messages than this value but might return fewer.

There is a comprehensive explanation for this (arguably rather idiosyncratic) behaviour in the SQS reference documentation.
SQS stores copies of messages on multiple servers and receive message requests are made to these servers with one of two possible strategies,
Short Polling : The default behaviour, only a subset of the servers (based on a weighted random distribution) are queried.
Long Polling : Enabled by setting the WaitTimeSeconds attribute to a non-zero value, all of the servers are queried.
In practice, for my limited tests, I always seem to get one message with short polling just as you did.
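If you stay on short polling, a practical workaround is simply to call ReceiveMessage in a loop until it returns an empty batch, since each call only samples a subset of the servers. A minimal sketch, assuming the same v1 SDK sqs client and queueUrl as in the question (and note that with very few messages a single empty response can still occur, so you may want a couple of retries before giving up):
ReceiveMessageRequest request = new ReceiveMessageRequest(queueUrl).withMaxNumberOfMessages(10);
List<Message> batch;
do {
    batch = sqs.receiveMessage(request).getMessages();
    for (Message message : batch) {
        // process the message, then delete it so it is not delivered again
        sqs.deleteMessage(new DeleteMessageRequest(queueUrl, message.getReceiptHandle()));
    }
} while (!batch.isEmpty());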

I had the same problem. What is the Receive Message Wait Time for your queue set to? When mine was 0, it only returned 1 message even if there were 8 in the queue. When I increased the Receive Message Wait Time, I got all of them. Seems kind of buggy to me.
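For reference, that queue-level Receive Message Wait Time can also be set from code via SetQueueAttributes; a small sketch assuming the same sqs client and queueUrl (the value is in seconds):
Map<String, String> attributes = new HashMap<String, String>();
attributes.put("ReceiveMessageWaitTimeSeconds", "20"); // enable long polling for the whole queue
sqs.setQueueAttributes(new SetQueueAttributesRequest(queueUrl, attributes));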

I was just trying the same thing, and with the help of these two attributes, setMaxNumberOfMessages and setWaitTimeSeconds, I was able to get 10 messages.
ReceiveMessageRequest receiveMessageRequest = new ReceiveMessageRequest(myQueueUrl);
receiveMessageRequest.setMaxNumberOfMessages(10);
receiveMessageRequest.setWaitTimeSeconds(20);
Snapshot of the output:
Receiving messages from TestQueue.
Number of messages:10
Message
MessageId: 31a7c669-1f0c-4bf1-b18b-c7fa31f4e82d
...

receiveMessageRequest.withMaxNumberOfMessages(10);
Just to be clear, the more practical use of this would be to chain it onto your constructor call like this:
ReceiveMessageRequest receiveMessageRequest = new ReceiveMessageRequest(queueUrl).withMaxNumberOfMessages(10);
Otherwise, you might as well just do:
receiveMessageRequest.setMaxNumberOfMessages(10);
That being said, changing this won't help the original problem.

Thanks Caoilte!
I faced this issue as well and finally solved it by using long polling, following the configuration here:
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-configure-long-polling-for-queue.html
Unfortunately, to use long polling, you must create your queue as a FIFO one. I tried a standard queue with no luck.
When receiving, you also need to set MaxNumberOfMessages. So my code looks like this:
ReceiveMessageRequest receive_request = new ReceiveMessageRequest()
.withQueueUrl(QUEUE_URL)
.withWaitTimeSeconds(20)
.withMaxNumberOfMessages(10);
Although it's solved, it still feels too weird. AWS should definitely provide a neater API for this kind of basic receive operation.
From my point of view, AWS has many cool features but not good APIs. It feels like those guys are rushing things out all the time.

For a small task list I use a FIFO queue, as in stackoverflow.com/a/55149351/13678017.
For example, here is a modified AWS tutorial:
// Create a queue.
System.out.println("Creating a new Amazon SQS FIFO queue called " + "MyFifoQueue.fifo.\n");
final Map<String, String> attributes = new HashMap<>();
// A FIFO queue must have the FifoQueue attribute set to true.
attributes.put("FifoQueue", "true");
/*
* If the user doesn't provide a MessageDeduplicationId, generate a
* MessageDeduplicationId based on the content.
*/
attributes.put("ContentBasedDeduplication", "true");
// The FIFO queue name must end with the .fifo suffix.
final CreateQueueRequest createQueueRequest = new CreateQueueRequest("MyFifoQueue4.fifo")
.withAttributes(attributes);
final String myQueueUrl = sqs.createQueue(createQueueRequest).getQueueUrl();
// List all queues.
System.out.println("Listing all queues in your account.\n");
for (final String queueUrl : sqs.listQueues().getQueueUrls()) {
System.out.println(" QueueUrl: " + queueUrl);
}
System.out.println();
// Send a message.
System.out.println("Sending a message to MyQueue.\n");
for (int i = 0; i < 4; i++) {
    var request = new SendMessageRequest()
            .withQueueUrl(myQueueUrl)
            .withMessageBody("message " + i)
            .withMessageGroupId("userId1");
    sqs.sendMessage(request);
}
for (int i = 0; i < 6; i++) {
    var request = new SendMessageRequest()
            .withQueueUrl(myQueueUrl)
            .withMessageBody("message " + i)
            .withMessageGroupId("userId2");
    sqs.sendMessage(request);
}
// Receive messages.
System.out.println("Receiving messages from MyQueue.\n");
var receiveMessageRequest = new ReceiveMessageRequest(myQueueUrl);
receiveMessageRequest.setMaxNumberOfMessages(10);
receiveMessageRequest.setWaitTimeSeconds(20);
// Ask for the "userId2" message attribute on returned messages
// (this selects which attributes to return; it does not filter by message group)
receiveMessageRequest.withMessageAttributeNames("userId2");
final List<Message> messages = sqs.receiveMessage(receiveMessageRequest).getMessages();
for (final Message message : messages) {
    System.out.println("Message");
    System.out.println("  MessageId:     " + message.getMessageId());
    System.out.println("  ReceiptHandle: " + message.getReceiptHandle());
    System.out.println("  MD5OfBody:     " + message.getMD5OfBody());
    System.out.println("  Body:          " + message.getBody());
    for (final Entry<String, String> entry : message.getAttributes().entrySet()) {
        System.out.println("Attribute");
        System.out.println("  Name:  " + entry.getKey());
        System.out.println("  Value: " + entry.getValue());
    }
}
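One thing the tutorial above skips: the received messages are never deleted, so they become visible again once the visibility timeout expires. A minimal follow-up, assuming the same sqs client, myQueueUrl, and the DeleteMessageRequest class from the v1 SDK:
for (final Message message : messages) {
    // Delete each processed message so it is not redelivered after the visibility timeout
    sqs.deleteMessage(new DeleteMessageRequest(myQueueUrl, message.getReceiptHandle()));
}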

Here's a workaround: you can call the receiveMessageFromSQS method asynchronously.
bulkReceiveFromSQS (queueUrl, totalMessages, asyncLimit, batchSize, visibilityTimeout, waitTime, callback) {
batchSize = Math.min(batchSize, 10);
let self = this,
noOfIterations = Math.ceil(totalMessages / batchSize);
async.timesLimit(noOfIterations, asyncLimit, function(n, next) {
self.receiveMessageFromSQS(queueUrl, batchSize, visibilityTimeout, waitTime,
function(err, result) {
if (err) {
return next(err);
}
return next(null, _.get(result, 'Messages'));
});
}, function (err, listOfMessages) {
if (err) {
return callback(err);
}
listOfMessages = _.flatten(listOfMessages).filter(Boolean);
return callback(null, listOfMessages);
});
}
It will return an array with the requested number of messages.
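Since the rest of this thread is Java, here is a rough Java sketch of the same idea: fire several receive calls in parallel and flatten the results. The sqs client, queueUrl, and the thread count of 4 are assumptions, and each individual call can still return fewer than 10 messages:
ExecutorService pool = Executors.newFixedThreadPool(4);
List<Future<List<Message>>> futures = new ArrayList<>();
for (int i = 0; i < 4; i++) {
    futures.add(pool.submit(() -> sqs.receiveMessage(
            new ReceiveMessageRequest(queueUrl)
                    .withMaxNumberOfMessages(10)
                    .withWaitTimeSeconds(5)).getMessages()));
}
List<Message> all = new ArrayList<>();
for (Future<List<Message>> f : futures) {
    all.addAll(f.get()); // may throw InterruptedException / ExecutionException
}
pool.shutdown();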

Related

Pagination in CosmosDB Java SDK with continuation token

I'm trying to create, from an async client, a method to retrieve items from CosmosDB, but I'm afraid I'm full of questions and there's little to no documentation from Microsoft's side.
I've created a function that reads a list of items from CosmosDB, page by page, where continuation depends on a continuation token. The method looks like this. Please be aware there could be some minor mistakes unrelated to the core functionality, which is reading page by page:
#FunctionName("Feed")
public HttpResponseMessage getFeed(
@HttpTrigger(
name = "get",
methods = { HttpMethod.GET },
authLevel = AuthorizationLevel.ANONYMOUS,
route = "Feed"
) final HttpRequestMessage<Optional<String>> request,
@CosmosDBInput(
name = "Feed",
databaseName = Constants.DATABASE_NAME,
collectionName = Constants.LOG_COLLECTION_NAME,
sqlQuery = "SELECT * FROM c", // This won't be used actually as we use our own query
connectionStringSetting = Constants.CONNECTION_STRING_KEY
) final LogEntry[] logEntryArray,
final ExecutionContext context
) {
context
.getLogger()
.info("Query with paging and continuation token");
String query = "SELECT * FROM c"
int pageSize = 10; //No of docs per page
int currentPageNumber = 1;
int documentNumber = 0;
String continuationToken = null;
double requestCharge = 0.0;
// First iteration (continuationToken = null): Receive a batch of query response pages
// Subsequent iterations (continuationToken != null): Receive subsequent batch of query response pages, with continuationToken indicating where the previous iteration left off
do {
context
.getLogger()
.info("Receiving a set of query response pages.");
context
.getLogger()
.info("Continuation Token: " + continuationToken + "\n");
CosmosQueryRequestOptions queryOptions = new CosmosQueryRequestOptions();
Flux<FeedResponse<LogEntry>> feedResponseIterator =
container.queryItems(query, queryOptions, LogEntry.class).byPage(continuationToken,pageSize);
try {
feedResponseIterator.flatMap(fluxResponse -> {
context
.getLogger()
.info("Got a page of query result with " +
fluxResponse.getResults().size() + " items(s)"
+ " and request charge of " + fluxResponse.getRequestCharge());
context
.getLogger()
.info("Item Ids " + fluxResponse
.getResults()
.stream()
.map(LogEntry::getDate)
.collect(Collectors.toList()));
return Flux.empty();
}).blockLast();
} catch (Exception e) {
}
} while (continuationToken != null);
context
.getLogger()
.info(String.format("Total request charge: %f\n", requestCharge));
return request
.createResponseBuilder(HttpStatus.OK)
.header("Content-Type", "application/json")
.body("ALL READ")
.build();
}
For simplicity the read items are merely logged.
First question: We are using an async document client that returns a Flux. Will the client keep track of the token? It is a stateless client in principle. I understand that the sync client could easily take care of this case, but wouldn't the async client reset its memory of tokens after the first page and token have been generated?
Second: Is the while loop even appropriate? My assumption is a big no, as we need to send back the token in a header, and the frontend UI will need to send the token back to the Azure Function in a header or similar fashion. The token should then be extracted from the context.
Third: Is the flatMap and blockLast way of reading the Flux appropriate? I was trying to play with the subscribe method, but again I don't see how it could work for an async client.
Thanks a lot,
Alex.
UPDATE:
I have observed that Flux only uses the items-per-page value to set the number of items retrieved per batch, but after retrieving one page it doesn't stop and keeps retrieving pages! I don't know how to stop it. I have tried substituting Mono.empty() for Flux.empty() and setting a LIMIT clause in the SQL query. The first option behaves the same, and the second apparently freezes the query and never returns. How can I return one page, and only one page, along with the continuation token, so the next query runs once the user clicks the next-page button?
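Not an authoritative answer, but for the one-page-at-a-time case a sketch along these lines may be closer to what you describe, assuming the azure-cosmos 4.x async API and a CosmosAsyncContainer named container: take only the first FeedResponse from byPage and hand its continuation token back to the caller (e.g. in a response header) for the next request.
CosmosQueryRequestOptions queryOptions = new CosmosQueryRequestOptions();
FeedResponse<LogEntry> page = container
        .queryItems("SELECT * FROM c", queryOptions, LogEntry.class)
        .byPage(continuationToken, pageSize) // null token means the first page
        .blockFirst();                       // block for exactly one page, then stop
List<LogEntry> items = page.getResults();
String nextToken = page.getContinuationToken(); // send this back to the UI for the next call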

How to delete data which already been consumed by consumer? Kafka

I am doing data replication in Kafka, but the size of the Kafka log file increases very quickly, reaching 5 GB in a day. As a solution to this problem, I want to delete processed data immediately. I am using the deleteRecords method in AdminClient to delete records up to an offset, but when I look at the log file, the data corresponding to that offset is not deleted.
RecordsToDelete recordsToDelete = RecordsToDelete.beforeOffset(offset);
TopicPartition topicPartition = new TopicPartition(topicName,partition);
Map<TopicPartition,RecordsToDelete> deleteConf = new HashMap<>();
deleteConf.put(topicPartition,recordsToDelete);
adminClient.deleteRecords(deleteConf);
I don't want suggestions like log.retention.hours, log.retention.bytes, log.segment.bytes, or log.cleanup.policy=delete,
because I only want to delete data consumed by the consumer; with those settings I would also delete data that has not been consumed.
What are your suggestions?
You didn't do anything wrong. The code you provided works and I've tested it. Just in case I've overlooked something in your code, mine is:
public void deleteMessages(String topicName, int partitionIndex, int beforeIndex) {
TopicPartition topicPartition = new TopicPartition(topicName, partitionIndex);
Map<TopicPartition, RecordsToDelete> deleteMap = new HashMap<>();
deleteMap.put(topicPartition, RecordsToDelete.beforeOffset(beforeIndex));
kafkaAdminClient.deleteRecords(deleteMap);
}
I've used group: 'org.apache.kafka', name: 'kafka-clients', version: '2.0.0'
So check that you are targeting the right partition (0 for the first one).
Check your broker version: https://kafka.apache.org/20/javadoc/index.html?org/apache/kafka/clients/admin/AdminClient.html says:
This operation is supported by brokers with version 0.11.0.0
Produce the messages from the same application, to be sure you're connected properly.
There is one more option you can consider: using cleanup.policy=compact. If your message keys repeat, you could benefit from it, not just because older messages for a key are automatically deleted, but because a message with a null payload deletes all the messages for that key. Just don't forget to set delete.retention.ms and min.compaction.lag.ms to small enough values. In that case you can consume a message and then produce a null payload for the same key (but be cautious with this approach, since this way you can also delete messages with that key that you didn't consume).
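As a sketch of that tombstone approach, assuming a compacted topic named my-topic, an existing producerProps configuration, and a consumedRecord you have just processed:
// Producing a null payload ("tombstone") for a key marks every message with that key
// for removal once compaction runs (subject to delete.retention.ms / min.compaction.lag.ms).
KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps);
producer.send(new ProducerRecord<String, String>("my-topic", consumedRecord.key(), null));
producer.flush();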
Try this
DeleteRecordsResult result = adminClient.deleteRecords(recordsToDelete);
Map<TopicPartition, KafkaFuture<DeletedRecords>> lowWatermarks = result.lowWatermarks();
try {
for (Map.Entry<TopicPartition, KafkaFuture<DeletedRecords>> entry : lowWatermarks.entrySet()) {
System.out.println(entry.getKey().topic() + " " + entry.getKey().partition() + " " + entry.getValue().get().lowWatermark());
}
} catch (InterruptedException | ExecutionException e) {
e.printStackTrace();
}
adminClient.close();
In this code, you need to call entry.getValue().get().lowWatermark(), because adminClient.deleteRecords(recordsToDelete) returns a map of Futures; you need to wait for each Future to complete by calling get().
This code will only work if the cleanup policy is "delete" or "compact, delete"; otherwise it will throw a policy violation exception.

Apache Kafka System Error Handling

We are trying to implement Kafka as our message broker solution. We are deploying our Spring Boot microservices in IBM Bluemix, whose internal message broker implementation is Kafka version 0.10. Since my experience is more on the JMS/ActiveMQ end, I was wondering what the ideal way to handle system-level errors in the Java consumers would be.
Here is how we have implemented it currently
Consumer properties
enable.auto.commit=false
auto.offset.reset=latest
We are using the default properties for
max.partition.fetch.bytes
session.timeout.ms
Kafka Consumer
We are spinning up 3 threads per topic, all having the same groupId, i.e. one KafkaConsumer instance per thread. We have only one partition as of now. The consumer code looks like this in the constructor of the thread class:
kafkaConsumer = new KafkaConsumer<String, String>(properties);
final List<String> topicList = new ArrayList<String>();
topicList.add(properties.getTopic());
kafkaConsumer.subscribe(topicList, new ConsumerRebalanceListener() {
@Override
public void onPartitionsRevoked(final Collection<TopicPartition> partitions) {
}
@Override
public void onPartitionsAssigned(final Collection<TopicPartition> partitions) {
try {
logger.info("Partitions assigned, consumer seeking to end.");
for (final TopicPartition partition : partitions) {
final long position = kafkaConsumer.position(partition);
logger.info("current Position: " + position);
logger.info("Seeking to end...");
kafkaConsumer.seekToEnd(Arrays.asList(partition));
logger.info("Seek from the current position: " + kafkaConsumer.position(partition));
kafkaConsumer.seek(partition, position);
}
logger.info("Consumer can now begin consuming messages.");
} catch (final Exception e) {
logger.error("Consumer can now begin consuming messages.");
}
}
});
The actual reading happens in the run method of the thread
try {
// Poll on the Kafka consumer every second.
final ConsumerRecords<String, String> records = kafkaConsumer.poll(1000);
// Iterate through all the messages received and print their
// content.
for (final TopicPartition partition : records.partitions()) {
final List<ConsumerRecord<String, String>> partitionRecords = records.records(partition);
logger.info("consumer is alive and is processing "+ partitionRecords.size() +" records");
for (final ConsumerRecord<String, String> record : partitionRecords) {
logger.info("processing topic "+ record.topic()+" for key "+record.key()+" on offset "+ record.offset());
final Class<? extends Event> resourceClass = eventProcessors.getResourceClass();
final Object obj = converter.convertToObject(record.value(), resourceClass);
if (obj != null) {
logger.info("Event: " + obj + " acquired by " + Thread.currentThread().getName());
final CommsEvent event = resourceClass.cast(converter.convertToObject(record.value(), resourceClass));
final MessageResults results = eventProcessors.processEvent(event
);
if ("Success".equals(results.getStatus())) {
// commit the processed message which changes
// the offset
kafkaConsumer.commitSync();
logger.info("Message processed sucessfully");
} else {
kafkaConsumer.seek(new TopicPartition(record.topic(), record.partition()), record.offset());
logger.error("Error processing message : {} with error : {},resetting offset to {} ", obj,results.getError().getMessage(),record.offset());
break;
}
}
}
}
// TODO add return
} catch (final Exception e) {
logger.error("Consumer has failed with exception: " + e, e);
shutdown();
}
You will notice the EventProcessor, which is a service class that processes each record and in most cases commits the record to the database. If the processor throws an error (SystemException or ValidationException) we do not commit, but programmatically seek to that offset, so that a subsequent poll will return from that offset for that group id.
The doubt now is: is this the right approach? If we get an error and we set the offset, then until that is fixed no other message is processed. This might work for system errors like not being able to connect to the DB, but if the problem is only with that one event and not the others, we won't be able to process any other record. We thought of the concept of an ErrorTopic: when we get an error, the consumer publishes that event to the ErrorTopic and in the meantime keeps processing subsequent events. But it looks like we are trying to bring the design concepts of JMS (due to my previous experience) into Kafka, and there may be a better way to solve error handling in Kafka. Also, reprocessing from the error topic may change the sequence of messages, which we don't want for some scenarios.
Please let me know how anyone has handled this scenario in their projects following the Kafka standards.
-Tatha
if the problem is only with that event and not others to process this one record we wont be able to process any other record
that's correct and your suggestion to use an error topic seems a possible one.
I also noticed that with your handling of onPartitionsAssigned you essentially do not use the consumer's committed offset, as it seems you'll always seek to the end.
If you want to restart from the last successfully committed offset, you should not perform a seek.
Finally, I'd like to point out (though it looks like you already know this) that having 3 consumers in the same group subscribed to a single partition means that 2 out of 3 will be idle.
HTH
Edo
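For completeness, a rough sketch of the error-topic idea discussed above, fitted into the existing processing block; errorProducer and the topic name are assumptions, not part of the original code:
if ("Success".equals(results.getStatus())) {
    kafkaConsumer.commitSync();
} else {
    // Park the failed event on an error topic and keep consuming;
    // a separate consumer can inspect or retry it later.
    errorProducer.send(new ProducerRecord<String, String>("comms-events-errors", record.key(), record.value()));
    kafkaConsumer.commitSync(); // commit so this record is not polled again
}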

Conversation ID leads to unkown path in graph-api

I have code that fetches conversations and the messages inside them (a specific number of pages). It works most of the time, but for certain conversations it throws an exception, such as:
Exception in thread "main" com.restfb.exception.FacebookOAuthException: Received Facebook error response of type OAuthException: Unknown path components: /[id of the message]/messages (code 2500, subcode null)
at com.restfb.DefaultFacebookClient$DefaultGraphFacebookExceptionMapper.exceptionForTypeAndMessage(DefaultFacebookClient.java:1192)
at com.restfb.DefaultFacebookClient.throwFacebookResponseStatusExceptionIfNecessary(DefaultFacebookClient.java:1118)
at com.restfb.DefaultFacebookClient.makeRequestAndProcessResponse(DefaultFacebookClient.java:1059)
at com.restfb.DefaultFacebookClient.makeRequest(DefaultFacebookClient.java:970)
at com.restfb.DefaultFacebookClient.makeRequest(DefaultFacebookClient.java:932)
at com.restfb.DefaultFacebookClient.fetchConnection(DefaultFacebookClient.java:356)
at test.Test.main(Test.java:40)
After debugging I found the ID that doesn't work and tried to access it from the Graph API, which results in an "unknown path components" error. I also attempted to manually find the conversation in me/conversations and click the next page link in the Graph API Explorer, which also led to the same error.
Is there a different way to retrieve a conversation than by ID? And if not, could someone show me an example of how to verify first whether the conversation ID is valid, so that if there are conversations I can't retrieve I can skip them instead of getting an error? Here's my current code:
Connection<Conversation> fetchedConversations = fbClient.fetchConnection("me/Conversations", Conversation.class);
int pageCnt = 2;
for (List<Conversation> conversationPage : fetchedConversations) {
for (Conversation aConversation : conversationPage) {
String id = aConversation.getId();
//The line of code which causes the exception
Connection<Message> messages = fbClient.fetchConnection(id + "/messages", Message.class, Parameter.with("fields", "message,created_time,from,id"));
int tempCnt = 0;
for (List<Message> messagePage : messages) {
for (Message msg : messagePage) {
System.out.println(msg.getFrom().getName());
System.out.println(msg.getMessage());
}
if (tempCnt == pageCnt) {
break;
}
tempCnt++;
}
}
}
Thanks in advance!
Update: I surrounded the problematic part with a try/catch as a temporary solution, and also counted the number of occurrences: it only affects 3 out of 53 conversations. I also printed all the IDs, and it seems that these 3 IDs are the only ones that contain a "/" symbol, so I'm guessing it has something to do with the exception.
The IDs that work look something like this: t_[text] (sometimes with a "." or ":" symbol), and the ones that cause an exception are always t_[text]/[text].
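Until the root cause is clear, one way to skip the problematic conversations up front, based purely on the observation above that only IDs containing a "/" fail (a heuristic, not an official validity check), would be:
String id = aConversation.getId();
if (id.contains("/")) {
    // These IDs consistently produce "Unknown path components", so skip them for now
    continue;
}
Connection<Message> messages = fbClient.fetchConnection(id + "/messages",
        Message.class, Parameter.with("fields", "message,created_time,from,id"));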
conv_id/messages is not a valid graph api call.
messages is a field of conversation.
Here is what you do (single call to api):
Connection<Conversation> conversations = facebookClient.fetchConnection("me/conversations", Conversation.class);
for (Conversation conv : conversations.getData()) {
// To get list of messages for given conversation
LinkedList<Message> allConvMessagesStorage = new LinkedList<Message>();
Connection<Message> messages25 = facebookClient.fetchConnection(conv.getId()+"/messages", Message.class);
//Add messages returned
allConvMessagesStorage.addAll(messages25.getData());
//Check if there is next page to fetch
boolean progress = messages25.hasNext();
while(progress){
messages25 = facebookClient.fetchConnectionPage(messages25.getNextPageUrl(), Message.class);
//Append next page of messages
allConvMessagesStorage.addAll(messages25.getData());
progress = messages25.hasNext();
}
}

Kafka 8.2.2 Dynamic Topic drops first event

EDIT: I am seeing the exact same behavior with the Kafka 9 Consumer API also.
I have a simple Kafka 8.2.2 producer with the enable-topic-creation property set to true. It will create a new topic when an event with a non-existent topic is sent, but the event that creates that topic does not end up in Kafka, and the RecordMetadata returned has no errors.
public void receiveEvent(@RequestBody EventWrapper events) throws InterruptedException, ExecutionException, TimeoutException {
log.info("Sending " + events.getEvents().size() + " Events ");
for (Event event : events.getEvents()) {
log.info("Sending Event - " + event);
ProducerRecord<String, String> record = new ProducerRecord<>(event.getTopic(), event.getData());
Future<RecordMetadata> ack = eventProducer.send(record);
log.info("ACK - " + ack.get());
}
log.info("SENT!");
}
I have a program that polls for new topics (I wasn't happy with the dynamic/regex topic code in Kafka 8); it finds the new topic and subscribes, and it does see subsequent events, but never that first event.
I also tried the kafka-console-consumer script and it sees the exact same thing: the first event is never seen, then after that events start flowing.
Ideas?
It turns out there is a property you can set: props.put("auto.offset.reset", "earliest");
After setting this, the consumer does receive the first event put on the topic.
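In context, that property goes into the consumer configuration before the KafkaConsumer is created; a minimal sketch (broker address and group id are placeholders):
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092"); // placeholder
props.put("group.id", "topic-watcher");           // placeholder
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("auto.offset.reset", "earliest"); // start from the beginning when there is no committed offset
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);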
