Delayer along with Splitter, messages not processing serially - java

My requirement is to process (call a web service for) a list of messages serially, one after the other: the second message should be processed only if the first succeeds, and so on.
I am using a Splitter to split the messages. Inside the Splitter flow I have used a Delayer (not persistent).
The problem is that as soon as the first message goes into the delayer, the second message in the list starts processing, without waiting for the first message to complete.
I believe this is happening because the delayer doesn't block the thread.
Is there a way I can achieve this functionality using the Splitter and Delayer?

The delayer is designed that way; it schedules the message to be processed some time in the future. If you simply want to slow down the rate at which you process splits, add a POJO service (invoked by a service activator) that calls Thread.sleep(...) and returns the input message.
public Message<?> sleeper(Message<?> message) throws InterruptedException {
    Thread.sleep(1000); // block the calling thread before passing the message on
    return message;
}

Related

ActiveMQ : how to fork-join? Ie. how to emit one message when all subtasks are done

imagine you have some task structure of
Task1
Task2: 1 million separate independent Subtask[i] that can run concurrently
Task3: must run once after ALL Task2 subtasks have completed
And all of Task1, Subtask[i] and Task3 are represented by MQ messages.
How can this be solved on ActiveMQ? Especially the triggering of a Task3 message once all subtasks are complete.
I know, it's not a queueing problem, it's a fork-join problem. Let's say the environment dictates you must use ActiveMQ for it.
Using ActiveMQ features, dynamic queues and consumers, stuff like that, is allowed. Using external counters, like a database row representing Task2's progress, is not allowed.
Hidden in this fork-join problem is a state management and observability challenge. Since the database is ruled out, you have to rely on something in-memory or on-queue.
Create a unique id for the task run, something short but with enough space to avoid collisions, like an airline locator code, e.g. 34FDSX
Send all messages for the task to a queue://TASK.34FDSX.DATA
Send a control message to queue://TASK.34FDSX.CONTROL that contains the task id and expected total number of messages (including each messageId would be helpful too)
When consumers from queue://TASK.34FDSX.DATA complete their work, they should send a 'done' message to the queue://TASK.34FDSX.DONE queue with their messageId or some identifier.
The consumer for the .CONTROL queue and the .DONE queue should be the same process, so that it can track the expected and completed totals. Once everything is completed, it can fire the event to trigger Task #3.
This approach keeps everything 'online', and you can also time out the .CONTROL and .DONE reader if too much time passes before the task completes.
Queue deletion can be done using ActiveMQ destination GC, or as a clean-up step in the .CONTROL/.DONE reader whenever everything completes successfully.
Advantages:
No infinite blocking consumers
No infinite open transactions
State of the TASK is online and observable via the presence of queues and queue metrics-- queue size, enqueue count, dequeue count
The entire solution can be multi-threaded and the only requirement is that for a given task the .CONTROL/.DONE listener is the same consumer, but multiple tasks can have individual .CONTROL/.DONE listeners to scale.
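The bookkeeping done by the single .CONTROL/.DONE listener could look roughly like the sketch below. It is plain Java with no JMS calls; the class name TaskCompletionTracker and its methods are hypothetical, and the sketch assumes the listener calls onControl and onDone as it reads the corresponding messages. Done ids are kept in a set so that a redelivered 'done' message is counted only once, and the control message may arrive before or after some of the 'done' messages.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical in-memory state kept by the .CONTROL/.DONE listener for one task run. */
final class TaskCompletionTracker {
    private final Set<String> doneIds = new HashSet<>();
    private final Runnable onAllDone;   // e.g. sends the Task3 message
    private int expectedTotal = -1;     // unknown until the control message arrives
    private boolean fired = false;

    TaskCompletionTracker(Runnable onAllDone) {
        this.onAllDone = onAllDone;
    }

    /** Call when the control message is read from the .CONTROL queue. */
    synchronized void onControl(int expectedTotal) {
        this.expectedTotal = expectedTotal;
        fireIfComplete();
    }

    /** Call for each 'done' message; duplicate ids are counted once. */
    synchronized void onDone(String messageId) {
        doneIds.add(messageId);
        fireIfComplete();
    }

    synchronized boolean isComplete() {
        return fired;
    }

    private void fireIfComplete() {
        // Fire exactly once, and only after the expected total is known.
        if (!fired && expectedTotal >= 0 && doneIds.size() >= expectedTotal) {
            fired = true;
            onAllDone.run();
        }
    }
}
```

Because the state lives in memory, it is lost if the listener process dies; a timeout on the reader, as mentioned above, is one way to notice that.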
The question here is a bit vague so my answer will have to be a bit vague as well.
Each of the million independent subtasks for "Task 2" can be represented by a single message. All these messages can be in the same queue. You can spin up as many consumers as you want and process all these messages (i.e. perform all the subtasks). Just ensure that these consumers either use client-acknowledge mode or a transacted session so that the message is not removed from the queue until they are done processing the message. Once there are no more messages in the queue then you know "Task 2" is done.
To detect when the queue is empty you can have a "special" consumer on the queue that periodically opens a transacted session and tries to consume a message from the queue. If the consumer receives a message then you can rollback the transacted session to put the message back on the queue and you know that the queue is not empty (i.e. "Task 2" is not done). If the consumer doesn't receive a message then you know the queue is empty and you can send another message indicating this. You could launch this special consumer as part of "Task 2" after all the messages for the subtasks have been sent to avoid detecting an empty queue prematurely.
To be clear, this is a simple solution. You could certainly add more complexity depending on your requirements, but your question just outlined the basic problem so it's unclear what other requirements you have (if any).

Delaying Kafka Streams consuming

I'm trying to use Kafka Streams (i.e. not a simple Kafka Consumer) to read from a retry topic with events that have previously failed to process. I wish to consume from the retry topic, and if processing still fails (for example, if an external system is down), I wish to put the event back on the retry topic. Thus I don't want to keep consuming immediately, but instead wait a while before consuming, in order to not flood the systems with messages that are temporarily unprocessable.
Simplified, the code currently does this, and I wish to add a delay to it.
fun createTopology(topic: String): Topology {
    val streamsBuilder = StreamsBuilder()
    streamsBuilder.stream<String, ArchivalData>(topic, Consumed.with(Serdes.String(), ArchivalDataSerde()))
        .peek { key, msg -> logger.info("Received event for key $key : $msg") }
        .map { key, msg -> enrich(msg) }
        .foreach { key, enrichedMsg -> archive(enrichedMsg) }
    return streamsBuilder.build()
}
I have tried to use Window Delay to set this up, but have not managed to get it to work. I could of course do a sleep inside a peek, but that would leave a thread hanging and does not sound like a very clean solution.
The exact details of how the delay would work are not terribly important to my use case. For example, any of these would work fine:
All events put on the topic in the past x seconds are consumed at once. After it begins or finishes consuming, the stream waits x seconds before consuming again
Every event is processed x seconds after being put on the topic
The stream consumes messages with a delay of x seconds between every event
I would be very grateful if someone could provide a few lines of Kotlin or Java code that would accomplish any of the above.
You cannot really pause reading from the input topic using Kafka Streams—the only way to "delay" would be to call a "sleep", but as you mentioned, that blocks the whole thread and is not a good solution.
However, what you can do is to use a stateful processor, e.g., process() (with attached state store) instead of foreach(). If the retry fails, you don't put the record back into the input topic, but you put it into the store and also register a punctuation with desired retry delay. If the punctuation fires, you retry and if the retry succeeds, you delete the entry from the store and cancel the punctuation; otherwise, you wait until the punctuation fires again.
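The control flow of that approach can be modelled in plain Java as in the sketch below. This is not Kafka Streams code: the map stands in for the state store, retryPending() stands in for the punctuation callback, and the class name RetryBuffer and the attempt predicate are hypothetical. It also assumes one parked record per key, so a later failure for the same key overwrites the earlier one.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Plain-Java model of "park failed records in a store, retry them when a timer fires". */
final class RetryBuffer<K, V> {
    private final Map<K, V> pending = new LinkedHashMap<>(); // stand-in for the state store
    private final Predicate<V> attempt;                      // returns true when processing succeeds

    RetryBuffer(Predicate<V> attempt) {
        this.attempt = attempt;
    }

    /** Counterpart of process(): try once, park the record on failure. */
    void process(K key, V value) {
        if (!attempt.test(value)) {
            pending.put(key, value);
        }
    }

    /** Counterpart of the punctuation callback: retry everything that is parked. */
    void retryPending() {
        Iterator<Map.Entry<K, V>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            if (attempt.test(it.next().getValue())) {
                it.remove(); // success: delete the entry from the store
            }
        }
    }

    int pendingCount() {
        return pending.size();
    }
}
```

In a real topology the retry interval would be the punctuation interval, so no thread sleeps between attempts.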

Pattern to continuously listen to AWS SQS messages

I have a simple class named QueueService with some methods that wrap the methods from the AWS SQS SDK for Java. For example:
public ArrayList<Hashtable<String, String>> receiveMessages(String queueURL) {
    List<Message> messages = this.sqsClient.receiveMessage(queueURL).getMessages();
    ArrayList<Hashtable<String, String>> resultList = new ArrayList<Hashtable<String, String>>();
    for (Message message : messages) {
        Hashtable<String, String> resultItem = new Hashtable<String, String>();
        resultItem.put("MessageId", message.getMessageId());
        resultItem.put("ReceiptHandle", message.getReceiptHandle());
        resultItem.put("Body", message.getBody());
        resultList.add(resultItem);
    }
    return resultList;
}
I have another class named App that has a main and creates an instance of the QueueService.
I am looking for a "pattern" to make the main in App listen for new messages in the queue. Right now I have a while(true) loop where I call the receiveMessages method:
while (true) {
    messages = queueService.receiveMessages(queueURL);
    for (Hashtable<String, String> message : messages) {
        String receiptHandle = message.get("ReceiptHandle");
        String messageBody = message.get("Body"); // key must match the one stored by receiveMessages
        System.out.println(messageBody);
        queueService.deleteMessage(queueURL, receiptHandle);
    }
}
Is this the correct way? Should I use the async message receive method in SQS SDK?
To my knowledge, there is no way in Amazon SQS to support an active listener model where Amazon SQS would "push" messages to your listener, or would invoke your message listener when there are messages.
So, you would always have to poll for messages. Two polling mechanisms are supported: Short Polling and Long Polling. Each has its own pros and cons, but Long Polling is the one you would typically end up using in most cases, although Short Polling is the default. Long Polling is more efficient in terms of network traffic, is more cost efficient (because Amazon charges you by the number of requests made), and is also the preferred mechanism when you want your messages to be processed in a time-sensitive manner (i.e. as soon as possible).
There are more intricacies around Long Polling and Short Polling that are worth knowing, and it's somewhat difficult to paraphrase all of that here, but if you like, you can read more details in the following blog post. It has a few code examples as well that should be helpful.
http://pragmaticnotes.com/2017/11/20/amazon-sqs-long-polling-versus-short-polling/
In terms of a while(true) loop, I would say it depends.
If you are using Long Polling, you can set the wait time to the maximum of 20 seconds; that way you do not poll SQS more often than once every 20 seconds when there are no messages. If there are messages, you can decide whether to poll frequently (to process messages as soon as they arrive) or whether to always process them at time intervals (say every n seconds).
Another point to note is that you can read up to 10 messages in a single receive request, which also reduces the number of calls you make to SQS, thereby reducing costs. And as the above blog explains in detail, you may request 10 messages, but SQS may return fewer even if there are that many messages in the queue.
In general though, I would say you need to build appropriate hooks and exception handling to turn off the polling if you wish to at runtime, in case you are using a while(true) kind of a structure.
Another aspect to consider is whether you would like to poll SQS in your main application thread or you would like to spawn another thread. So another option could be to create a ScheduledThreadPoolExecutor with a single thread in the main to schedule a thread to poll the SQS periodically (every few seconds), and you may not need a while(true) structure.
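A minimal sketch of that last option follows, assuming the actual SQS receive-and-process step is wrapped in a Runnable. The class name ScheduledPoller and the pollOnce parameter are hypothetical; only the java.util.concurrent calls are real. The try/catch matters here because a task that throws is not scheduled again.

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Runs a poll step periodically on a single background thread instead of a while(true) loop. */
final class ScheduledPoller {
    private final ScheduledThreadPoolExecutor scheduler = new ScheduledThreadPoolExecutor(1);

    /** pollOnce is a hypothetical stand-in for "receive a batch from SQS and process it". */
    void start(Runnable pollOnce, long delayMillis) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                pollOnce.run();
            } catch (RuntimeException e) {
                // Swallow and log: an exception escaping here would cancel all future runs.
                System.err.println("Poll failed: " + e);
            }
        }, 0, delayMillis, TimeUnit.MILLISECONDS);
    }

    /** The runtime hook for turning polling off. */
    void stop() {
        scheduler.shutdownNow();
    }
}
```

With scheduleWithFixedDelay the delay is measured from the end of one poll to the start of the next, so a slow poll does not cause runs to pile up.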
There are a few things that you're missing:
Use receiveMessage(ReceiveMessageRequest) and set a wait time to enable long polling.
Wrap your AWS calls in try/catch blocks. In particular, pay attention to OverLimitException, which can be thrown from receiveMessage() if you have too many in-flight messages.
Wrap the entire body of the while loop in its own try/catch block, logging any exceptions that are caught (there shouldn't be any; this is here to ensure that your application doesn't crash because AWS changed their API or you neglected to handle an expected exception).
See doc for more information about long polling and possible exceptions.
As for using the async client: do you have any particular reason to use it? If not, then don't: a single receiver thread is much easier to manage.
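The loop structure these points describe might be sketched as follows. To keep it self-contained, the SQS calls are replaced by two hypothetical functional parameters: receive stands in for a long-polling receive call that returns message bodies, and handle stands in for processing plus deletion. The class name PollingLoop is also an assumption.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Single receiver thread: poll, handle each message, never let one failure end the loop. */
final class PollingLoop implements Runnable {
    private final Supplier<List<String>> receive; // hypothetical stand-in for the long-polling receive
    private final Consumer<String> handle;        // hypothetical stand-in for process + delete
    private final AtomicBoolean running = new AtomicBoolean(true);

    PollingLoop(Supplier<List<String>> receive, Consumer<String> handle) {
        this.receive = receive;
        this.handle = handle;
    }

    /** Lets another thread turn polling off at runtime. */
    void stop() {
        running.set(false);
    }

    @Override
    public void run() {
        while (running.get()) {
            try {
                for (String body : receive.get()) {
                    try {
                        handle.accept(body);
                    } catch (RuntimeException e) {
                        // One bad message should not prevent the rest of the batch from being handled.
                        System.err.println("Failed to handle message: " + e);
                    }
                }
            } catch (RuntimeException e) {
                // Catch-all around the loop body so an unexpected failure does not crash the receiver.
                System.err.println("Polling failed: " + e);
            }
        }
    }
}
```

If the receive call fails immediately and repeatedly, this loop will spin; a short pause or backoff in the outer catch block is probably worth adding in real code.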
If you want to use SQS and then Lambda to process the request, you can follow the steps given in the link, or you can always use Lambda instead of SQS and invoke Lambda for every request.
As of 2019 SQS can trigger lambdas:
https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html
I found one solution for actively listening to the queue.
For Node.js, I used the sqs-consumer package, which resolved my issue:
https://www.npmjs.com/package/sqs-consumer

MapReduce implementation with Akka

I'm trying to implement MapReduce on top of Akka and was lucky to find the code of the book Akka Essentials. However, I have found two major issues with this example implementation, and both seem like fundamental concurrency design flaws, which, by the way, is quite shocking to find in a book about Akka:
Upon completion the Client side calls shutdown(), but at that point there is no guarantee that the messages have gone through to the WCMapReduceServer. I see that the WCMapReduceServer only ever receives part of the Client's messages and then outputs [INFO] [06/25/2013 09:30:01.594] [WCMapReduceApp-5] [ActorSystem(WCMapReduceApp)] REMOTE: RemoteClientShutdown#akka://ClientApplication#192.168.224.65:2552, meaning the Client's shutdown() happens before the Client has flushed all pending messages. In the Client code (line 41) we see that shutdown() takes place without flushing first. Is there a way in Akka to enforce flushing outbound messages before shutting down the system?
The other, actually bigger, flaw, which I have already fixed, is the way EOF is signalled to the MapReduce server, i.e. that the main task (the file of words) is done once all subtasks (each line of the file) are done. The author sends a special String message DISPLAY_LIST, and this message is queued with the lowest priority (see code). The big flaw here is that even though DISPLAY_LIST has the lowest priority, if any Map (or Reduce) task takes arbitrarily long, the DISPLAY_LIST message will go through before all the MapReduce subtasks have completed. The outcome of this MapReduce example is therefore non-deterministic, i.e. you can get a different dictionary out of each run. The issue can be revealed by replacing the MapActor#onReceive implementation with the following, which makes one Map step arbitrarily long:
public void onReceive(Object message) {
    System.out.println("MapActor -> onReceive(" + message + ")");
    if (message instanceof String) {
        String work = (String) message;
        // ******** BEGIN SLOW DOWN ONE MAP REQUEST
        if ("Thieves! thieves!".equals(work)) {
            try {
                System.out.println("*** sleeping!");
                Thread.sleep(5000);
                System.out.println("*** back!");
            }
            catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        // ******** END SLOW DOWN ONE MAP REQUEST
        // perform the work
        List<Result> list = evaluateExpression(work);
        // reply with the result
        actor.tell(list);
    } else throw new IllegalArgumentException("Unknown message [" + message + "]");
}
Reading the book a bit further one finds:
We have Thread.sleep() because there is no guarantee in which order
the messages are processed. The first Thread.sleep() method ensures
that all the string sentence messages are processed completely before
we send the Result message.
I'm sorry, but Thread.sleep() has never been a means of ensuring anything in concurrency. No wonder books like this end up full of fundamental concurrency flaws in their examples.
I have solved both problems, and also migrated the code to the latest Akka version 2.2-M3.
The solution to the first issue is to have the MapReduce remote MasterActor send back a ShutdownInfo notification as soon as it gets the TaskInfo notification which is sent from the Client once all messages have been sent. The TaskInfo contains the information of how many subtasks a MapReduce task has e.g. in this case how many lines in the text file.
The solution to the second problem is sending the TaskInfo with the total number of subtasks. Here the AggregatorActor counts the number of subtasks it has processed, compares it to the TaskInfo and signals that the job is done when they match (currently just print a message).
The interesting and correct behavior is shown in the output:
ClientActor sends a bunch of messages which are "subtasks". Note that the Identity request pattern is used to gain access to the ActorRef of the remote MapReduce MasterActor.
ClientActor sends last the TaskInfo message saying how many subtasks were previously sent.
MasterActor forwards String messages to MapActor, which in turn forwards to ReduceActor
One MapActor message is a lengthy one, namely the one with content "Thieves! thieves!"; this slows the MapReduce computation a bit.
Meanwhile MasterActor receives the final TaskInfo message and sends the ShutdownInfo back to ClientActor
ClientActor runs system.shutdown() and the Client terminates. Note that the MapReduce is still in the middle of processing and the Client shutdown does not interfere.
The lengthy MapActor comes back and the message processing continues.
AggregatorActor receives the TaskInfo and, by counting the subtasks, confirms that the total number of subtasks has been completed and signals completion.
The code can be fetched from my repository:
https://github.com/bravegag/akka-mapreduce-example
Feedback always welcome.

hornetq delayed redelivery for message group

I want to somehow delay messages for the whole message group.
The thing is that all messages belonging to each message group must be processed in the same order they were posted, sequentially. If one of the messages cannot be consumed - we want to delay it and also delay the remaining ones in the same message group. I do not want to block the consumer - it should be free to process messages from other groups.
How to do that?
I can't say JMS has any nice built-in support for this. Everything is easier with single "stand alone" messages, but there is one thing you could try.
Do a delayed delivery for those messages (in that group).
// Send to same queue once again, but delay 60 sec
if (isGroupMarkedForRedelivery(message.getStringProperty("JMSXGroupID"))) {
    message.setLongProperty("_HQ_SCHED_DELIVERY", System.currentTimeMillis() + 60000);
    producer.send(message); // producer sends to process queue (again).
}
Note that if you need them in the same order, then you should probably not use concurrency in sending and/or receiving. You could of course add more logic to adapt to your situation.
You probably need to make sure isGroupMarkedForRedelivery returns false for a given group after less time than the "delay"; otherwise the rescheduled messages would be delayed again when they come back.
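One way isGroupMarkedForRedelivery could be backed is a small in-memory table of marks that expire on their own, sketched below. Everything here is an assumption rather than a HornetQ feature: the class name GroupRedeliveryMarks, the markForRedelivery method (to be called when a message in the group fails), and the injected clock, which in production would be System::currentTimeMillis. The mark duration should be chosen shorter than the scheduled delivery delay.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical helper: remembers which message groups are currently being delayed. */
final class GroupRedeliveryMarks {
    private final Map<String, Long> expiryByGroup = new ConcurrentHashMap<>();
    private final LongSupplier clock; // millisecond clock, e.g. System::currentTimeMillis
    private final long markMillis;    // how long a mark lasts; keep it below the redelivery delay

    GroupRedeliveryMarks(LongSupplier clock, long markMillis) {
        this.clock = clock;
        this.markMillis = markMillis;
    }

    /** Call when a message of this group could not be consumed. */
    void markForRedelivery(String groupId) {
        expiryByGroup.put(groupId, clock.getAsLong() + markMillis);
    }

    /** True while the group's mark is still live; expired marks are dropped. */
    boolean isGroupMarkedForRedelivery(String groupId) {
        if (groupId == null) {
            return false; // message has no JMSXGroupID
        }
        Long expiry = expiryByGroup.get(groupId);
        if (expiry == null) {
            return false;
        }
        if (clock.getAsLong() >= expiry) {
            expiryByGroup.remove(groupId, expiry); // remove only if not re-marked meanwhile
            return false;
        }
        return true;
    }
}
```

For example, with a 60 second delivery delay, new GroupRedeliveryMarks(System::currentTimeMillis, 30000) would let the group flow again by the time its delayed messages are redelivered. The marks are local to one consumer process, so this only works as-is when a group is handled by a single consumer.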
