I'm trying to use Kafka Streams (i.e. not a plain Kafka consumer) to read from a retry topic containing events that previously failed processing. I want to consume from the retry topic and, if processing still fails (for example, because an external system is down), put the event back on the retry topic. So rather than consuming again immediately, I want to wait a while before consuming, so as not to flood the downstream systems with messages that temporarily cannot be processed.
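In simplified form, the code currently looks like this, and I want to add a delay to it:
import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.streams.StreamsBuilder
import org.apache.kafka.streams.Topology
import org.apache.kafka.streams.kstream.Consumed

fun createTopology(topic: String): Topology {
    val streamsBuilder = StreamsBuilder()
    streamsBuilder.stream<String, ArchivalData>(topic, Consumed.with(Serdes.String(), ArchivalDataSerde()))
        .peek { key, msg -> logger.info("Received event for key $key : $msg") }
        .mapValues { msg -> enrich(msg) } // map() would have to return a KeyValue; mapValues keeps the key
        .foreach { _, enrichedMsg -> archive(enrichedMsg) }
    return streamsBuilder.build()
}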
I tried to set this up with a windowed delay, but could not get it to work. I could of course call sleep inside a peek, but that would block a stream thread, which does not seem like a clean solution.
The exact details of how the delay works are not terribly important for my use case. For example, any of these would work fine:
- All events that arrived on the topic in the past x seconds are consumed at once; after it begins or finishes consuming, the stream waits x seconds before consuming again.
- Every event is processed x seconds after being put on the topic.
- The stream consumes messages with a delay of x seconds between events.
I would be very grateful if someone could provide a few lines of Kotlin or Java code that would accomplish any of the above.
You cannot really pause reading from the input topic with Kafka Streams. The only way to "delay" would be to call "sleep", but as you mentioned, that blocks the whole thread and is not a good solution.
However, what you can do is use a stateful processor, e.g., process() (with an attached state store) instead of foreach(). If the retry fails, don't put the record back into the input topic. Instead, put it into the store and also register a punctuation with the desired retry delay. When the punctuation fires, you retry. If the retry succeeds, you delete the entry from the store and cancel the punctuation; otherwise, you wait until the punctuation fires again.
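A minimal sketch of that approach, written as a plain-Java model so it runs without Kafka on the classpath. Assumptions: a `HashMap` stands in for the state store, the `punctuate(nowMs)` method stands in for a punctuator registered with `ProcessorContext.schedule(...)` (which returns a `Cancellable`), and `RetryBuffer` and its parameters are hypothetical names rather than Kafka Streams API.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Predicate;

// Plain-Java model of the pattern: park failed records in a store and retry them
// from a periodic "punctuation". The HashMap stands in for a Kafka Streams state store.
class RetryBuffer<K, V> {
    private final Map<K, V> store = new HashMap<>();
    private final Predicate<V> attempt;      // returns true if processing succeeded
    private final long retryIntervalMs;
    private long nextDueMs = Long.MAX_VALUE; // MAX_VALUE means "no punctuation scheduled"

    RetryBuffer(Predicate<V> attempt, long retryIntervalMs) {
        this.attempt = attempt;
        this.retryIntervalMs = retryIntervalMs;
    }

    // Equivalent of process(): try once, and park the record if it fails.
    void process(K key, V value, long nowMs) {
        if (attempt.test(value)) {
            return;
        }
        store.put(key, value); // instead of writing it back to the input topic
        if (nextDueMs == Long.MAX_VALUE) {
            nextDueMs = nowMs + retryIntervalMs; // "register" the punctuation
        }
    }

    // Equivalent of the punctuator: called periodically with the current time.
    void punctuate(long nowMs) {
        if (nowMs < nextDueMs) {
            return;
        }
        Iterator<Map.Entry<K, V>> it = store.entrySet().iterator();
        while (it.hasNext()) {
            if (attempt.test(it.next().getValue())) {
                it.remove(); // success: delete the entry from the store
            }
        }
        // Nothing left: "cancel" the punctuation. Otherwise wait one more interval.
        nextDueMs = store.isEmpty() ? Long.MAX_VALUE : nowMs + retryIntervalMs;
    }

    int pending() {
        return store.size();
    }
}
```

A real processor would also need to decide what happens when a newer record arrives for a key that is already parked. This model tries the newer record immediately, independently of the parked one, and overwrites the parked value if the newer record also fails.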
Related
I have a Flink job that reads from Kafka. The consumed messages need to be held in a concurrent hash map for a period of time and then written to Cassandra.
The operator chain looks roughly like this:
DataStream<Message> datastream = KafkaSource.createsource();
DataStream<Message> decodedMessage = datastream.flatMap(new DecodeMessage());
decodedMessage
    .assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Message>() {
        @Override
        public long extractAscendingTimestamp(Message m) {
            return m.getTimestamp();
        }
    })
    .keyBy((KeySelector<Message, Integer>) m -> m.getID())
    .process(new TimerFunction())
    .addSink(new MySink());

class TimerFunction extends KeyedProcessFunction<Integer, Message, Message> {
    private ValueState<Message> x;

    @Override
    public void processElement(Message m, Context ctx, Collector<Message> out) throws Exception {
        // some logic to create a timestamp one minute out
        ctx.timerService().registerEventTimeTimer(m.getTimestamp());
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Message> out) throws Exception {
        // output values on trigger
    }
}
I have some questions about working with event time.
Each message has a unique ID, a timestamp, and some other attributes. There could be a million unique keys per minute. Will the keyBy operation affect performance?
I need to cover the following scenario:
- Message X with ID 1 arrives at 08:01:01.
- Message Y with ID 2 arrives at 08:01:04.
Since I'm using the ID as the key, both messages should have a timer set to fire at 08:02:00.
According to the Flink documentation, if timers have the same timestamp, they fire only once.
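To make the timer arithmetic in this scenario concrete: rounding each event time up to the next minute boundary puts both timers at 8:02:00. Because Flink identifies a timer by its key and its timestamp, messages with IDs 1 and 2 still get separate timers. The plain-Java model below captures that bookkeeping. `TimerModel` and its methods are hypothetical; in Flink the registration call would be `ctx.timerService().registerEventTimeTimer(...)`.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Plain-Java model (not Flink API): a timer is identified by (key, timestamp),
// so registering the same pair twice yields a single timer.
class TimerModel {
    static final long MINUTE_MS = 60_000L;

    private final Set<Map.Entry<Integer, Long>> timers = new HashSet<>();

    // Round an event timestamp (in millis) up to the start of the next minute.
    static long nextMinute(long eventTimeMs) {
        return (eventTimeMs / MINUTE_MS + 1) * MINUTE_MS;
    }

    // Returns true if a new timer was created, false if an identical one already existed.
    boolean register(int key, long eventTimeMs) {
        return timers.add(Map.entry(key, nextMinute(eventTimeMs)));
    }

    int size() {
        return timers.size();
    }
}
```

A second message for ID 1 within the same minute maps to an existing (key, timestamp) pair and does not add a timer. Messages with different IDs never collapse into one timer.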
I'm facing a problem: when the source becomes idle for a few minutes, the timer keeps waiting for the next watermark and never fires. How should I deal with an idle source?
Is using processing time a better option in this case?
Also, I am restricted to Flink 1.8, so I need information that applies to that version.
Thanks in advance.
I don't fully understand your question; there's too much context missing. But I can offer a few points:
(1) keyBy is expensive: it forces serialization/deserialization along with a network shuffle.
(2) Timers are deduplicated if and only if they are for the same timestamp and the same key.
(3) As for the idle source problem, the event-time timers will eventually fire when events begin to flow again, as that will advance the watermark(s). If you can't wait, you can use something like https://github.com/aljoscha/flink/blob/6e4419e550caa0e5b162bc0d2ccc43f6b0b3860f/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/timestamps/ProcessingTimeTrailingBoundedOutOfOrdernessTimestampExtractor.java, or switch to processing time.
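The idea behind a processing-time trailing extractor can be modeled in a few lines of plain Java. This is a sketch under assumptions. It is not the linked class's actual logic and not Flink API. The watermark normally trails the largest event time by a fixed bound. Once no event has arrived for a hypothetical `idleTimeoutMs`, it advances with wall-clock time so that pending event-time timers can fire. In Flink 1.8 this kind of logic would live in an `AssignerWithPeriodicWatermarks` implementation.

```java
// Plain-Java model (not Flink API) of a bounded-out-of-orderness watermark that
// keeps advancing with wall-clock time once the source has been idle for a while.
class IdleAwareWatermark {
    private final long maxOutOfOrdernessMs;
    private final long idleTimeoutMs;
    private long maxEventTime = Long.MIN_VALUE;
    private long lastEventWallClockMs;
    private long lastEmitted = Long.MIN_VALUE;

    IdleAwareWatermark(long maxOutOfOrdernessMs, long idleTimeoutMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
        this.idleTimeoutMs = idleTimeoutMs;
    }

    void onEvent(long eventTimeMs, long wallClockMs) {
        maxEventTime = Math.max(maxEventTime, eventTimeMs);
        lastEventWallClockMs = wallClockMs;
    }

    // Called periodically, like a periodic watermark assigner.
    long currentWatermark(long wallClockMs) {
        if (maxEventTime == Long.MIN_VALUE) {
            return lastEmitted; // no events seen yet
        }
        long wm = maxEventTime - maxOutOfOrdernessMs;
        long idleMs = wallClockMs - lastEventWallClockMs;
        if (idleMs > idleTimeoutMs) {
            wm += idleMs - idleTimeoutMs; // source idle: let wall-clock time push it forward
        }
        lastEmitted = Math.max(lastEmitted, wm); // watermarks must never move backwards
        return lastEmitted;
    }
}
```

This approach has a cost. Events that arrive after a long pause may now be behind the watermark and be treated as late. For that reason, the idle timeout should be longer than the gaps you expect in normal traffic.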
TL;DR
Is there a way to automatically adjust delay between elements in Project Reactor based on downstream health?
More details
I have an application that reads records from a Kafka topic, sends an HTTP request for each of them, and writes the result to another Kafka topic. Reading from and writing to Kafka is fast and easy, but the third-party HTTP service is easily overwhelmed, so I use delayElements() with a value from a properties file, which means the delay does not change while the application is running. Here's a code sample:
kafkaReceiver.receiveAutoAck()
    .concatMap(identity())
    .delayElements(ofMillis(delayElement))
    .flatMap(message -> recordProcessingFunction.process(message.value()), messageRate)
    .onErrorContinue(handleError())
    .map(this::getSenderRecord)
    .flatMap(kafkaSender::send)
However, the third-party service's performance may vary over time, and I'd like to adjust this delay accordingly. For example, if over 5% of requests fail over a 10-second period, I would increase the delay; if the failure rate stays below 5% for over 10 seconds, I would reduce the delay again.
Is there an existing mechanism for this in Reactor? I can think of some creative solutions of my own, but I was wondering whether the Reactor team (or someone else) has already implemented one.
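The rule described in the question is small enough to write by hand: raise the delay when more than 5% of requests fail in a 10-second window, and lower it otherwise. Below is a plain-Java sketch. `AdaptiveDelay`, the doubling/halving step, and the bounds are hypothetical choices. The caller passes timestamps in explicitly so the logic is easy to test.

```java
import java.time.Duration;

// Hypothetical controller: once per window, doubles the delay if the failure rate
// exceeded the threshold and halves it otherwise, clamped to [min, max].
class AdaptiveDelay {
    private final Duration min;
    private final Duration max;
    private final long windowMs;
    private final double maxFailureRate;
    private Duration current;
    private long windowStartMs;
    private int total;
    private int failed;

    AdaptiveDelay(Duration initial, Duration min, Duration max,
                  long windowMs, double maxFailureRate, long nowMs) {
        this.current = initial;
        this.min = min;
        this.max = max;
        this.windowMs = windowMs;
        this.maxFailureRate = maxFailureRate;
        this.windowStartMs = nowMs;
    }

    // Record the outcome of one HTTP call.
    synchronized void record(boolean success, long nowMs) {
        rollWindowIfDue(nowMs);
        total++;
        if (!success) {
            failed++;
        }
    }

    // The delay to apply before the next element.
    synchronized Duration current(long nowMs) {
        rollWindowIfDue(nowMs);
        return current;
    }

    private void rollWindowIfDue(long nowMs) {
        if (nowMs - windowStartMs < windowMs) {
            return;
        }
        if (total > 0) {
            double failureRate = (double) failed / total;
            current = failureRate > maxFailureRate ? current.multipliedBy(2) : current.dividedBy(2);
            if (current.compareTo(max) > 0) current = max;
            if (current.compareTo(min) < 0) current = min;
        }
        windowStartMs = nowMs;
        total = 0;
        failed = 0;
    }
}
```

One possible way to plug it into the pipeline from the question is to replace `.delayElements(ofMillis(delayElement))` with something like `.delayUntil(x -> Mono.delay(adaptive.current(System.currentTimeMillis())))`. You would then call `record(...)` from the processing step. That wiring is an untested suggestion, although `delayUntil` and `Mono.delay(Duration)` are existing Reactor methods.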
I don't think any HTTP client, including Netty, provides backpressure. One option is to switch to RSocket, but if you are calling a third-party service, that is probably not an option. You could tune a rate that works for most of the day and send failed messages to another topic using doOnError or similar. Another receiver can then process those messages with even longer delays and, if a message fails again, put it back on the same topic with a retry count, so that you can eventually stop processing it.
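The retry-count bookkeeping in this answer comes down to a small routing decision. Below is a minimal sketch, assuming the count travels with the message (for example, in a Kafka record header). `RetryRouter`, the topic name, and the limit of 5 are hypothetical.

```java
import java.util.Optional;

// Hypothetical routing rule for the retry-topic approach: a failed message goes back
// to the retry topic with its retry count incremented, until a maximum is reached.
class RetryRouter {
    static final int MAX_RETRIES = 5;                   // hypothetical limit
    static final String RETRY_TOPIC = "requests.retry"; // hypothetical topic name

    // Topic to publish a message to after it failed, or empty to stop processing it.
    static Optional<String> destinationFor(int retriesSoFar) {
        return retriesSoFar < MAX_RETRIES ? Optional.of(RETRY_TOPIC) : Optional.empty();
    }

    // Retry count to attach to the republished message (e.g. as a record header).
    static int nextRetryCount(int retriesSoFar) {
        return retriesSoFar + 1;
    }
}
```

When `destinationFor` returns empty, the message has exhausted its retries. At that point it can be logged or parked elsewhere instead of being republished.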
If you want the delay between elements to depend on how fast each element is processed, you could use delayUntil:
Flux.range(1, 100)
    .doOnNext(i -> System.out.println("Kafka Receive :: " + i))
    .delayUntil(i -> Mono.fromSupplier(() -> i)
        .map(k -> {
            // msg processing
            return k * 2;
        })
        .delayElement(Duration.ofSeconds(1)) // msg processing simulation
        .doOnNext(k -> System.out.println("Kafka send :: " + k)))
    .subscribe();
You can add a retry with exponential backoff. Something like this:
influx()
    .flatMap(x -> Mono.just(x)
            .map(data -> apiCall(data))
            .retryWhen(
                Retry.backoff(Integer.MAX_VALUE, Duration.ofSeconds(30))
                    .filter(err -> err instanceof RuntimeException)
                    .doBeforeRetry(
                        s -> log.warn("Retrying for err {}", s.failure().getMessage()))
                    .onRetryExhaustedThrow((spec, sig) -> new RuntimeException("ex")))
            .onErrorResume(err -> Mono.empty()),
        concurrency_val,
        prefetch_val)
This will retry a failed request up to Integer.MAX_VALUE times, with at least 30s between retries. The delay grows exponentially with each successive retry, and each delay is additionally randomized by a configurable jitter factor (default 0.5).
The documentation on Retry.backoff says that:
A RetryBackoffSpec preconfigured for exponential backoff strategy with jitter, given a maximum number of retry attempts and a minimum Duration for the backoff.
Also, since the whole operation is wrapped in flatMap, you can tune its concurrency and prefetch values to account for the maximum number of requests that can be failing at any given time while the pipeline waits for the RetryBackoffSpec to complete successfully.
In the worst case, concurrency_val requests have failed and are waiting 30+ seconds for their retries. The whole pipeline might then stall (still waiting for the downstream service to succeed), which may not be desirable if the downstream system doesn't recover in time. It is better to replace the Integer.MAX_VALUE retry limit with something manageable, beyond which you just log the error and proceed with the next event.
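To get a feel for how long requests can end up waiting, here is a rough plain-Java model of exponential backoff with jitter. It is an approximation for intuition, not Reactor's actual `RetryBackoffSpec` implementation. `delayFor` and the `maxBackoff` cap are assumptions for illustration; compare `RetryBackoffSpec.maxBackoff(Duration)`.

```java
import java.time.Duration;
import java.util.Random;

// Approximate model of exponential backoff with jitter (not Reactor's source code).
class BackoffModel {
    // Delay before retry number 'attempt' (0-based).
    static Duration delayFor(int attempt, Duration minBackoff, Duration maxBackoff,
                             double jitterFactor, Random random) {
        long min = minBackoff.toMillis();
        long max = maxBackoff.toMillis();
        // Double the base delay once per attempt, stopping at the cap (avoids overflow).
        long base = min;
        for (int i = 0; i < attempt && base < max; i++) {
            base = base > max / 2 ? max : base * 2;
        }
        // Random offset of up to +/- jitterFactor of the base delay.
        long jitter = (long) (base * jitterFactor);
        long offset = (long) ((random.nextDouble() * 2 - 1) * jitter);
        // Never wait less than the minimum or more than the cap.
        return Duration.ofMillis(Math.max(min, Math.min(max, base + offset)));
    }
}
```

With a 30s minimum and no jitter, the model gives 30s, 60s, 120s, 240s, and so on up to the cap. This growth is why an effectively unbounded attempt count can leave requests waiting for a very long time.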
I have a simple class named QueueService with some methods that wrap the methods from the AWS SQS SDK for Java. For example:
public ArrayList<Hashtable<String, String>> receiveMessages(String queueURL) {
    List<Message> messages = this.sqsClient.receiveMessage(queueURL).getMessages();
    ArrayList<Hashtable<String, String>> resultList = new ArrayList<Hashtable<String, String>>();
    for (Message message : messages) {
        Hashtable<String, String> resultItem = new Hashtable<String, String>();
        resultItem.put("MessageId", message.getMessageId());
        resultItem.put("ReceiptHandle", message.getReceiptHandle());
        resultItem.put("Body", message.getBody());
        resultList.add(resultItem);
    }
    return resultList;
}
I have another class named App that has a main method and creates an instance of QueueService.
I'm looking for a "pattern" to make the main method in App listen for new messages in the queue. Right now I have a while(true) loop in which I call the receiveMessages method:
while (true) {
    messages = queueService.receiveMessages(queueURL);
    for (Hashtable<String, String> message : messages) {
        String receiptHandle = message.get("ReceiptHandle");
        String messageBody = message.get("Body"); // receiveMessages stores the body under "Body"
        System.out.println(messageBody);
        queueService.deleteMessage(queueURL, receiptHandle);
    }
}
Is this the correct way? Should I use the async message receive method in SQS SDK?
To my knowledge, there is no way in Amazon SQS to support an active listener model where Amazon SQS would "push" messages to your listener, or would invoke your message listener when there are messages.
So, you would always have to poll for messages. There are two polling mechanisms supported for polling - Short Polling and Long Polling. Each has its own pros and cons, but Long Polling is the one you would typically end up using in most cases, although the default one is Short Polling. Long Polling mechanism is definitely more efficient in terms of network traffic, is more cost efficient (because Amazon charges you by the number of requests made), and is also the preferred mechanism when you want your messages to be processed in a time sensitive manner (~= process as soon as possible).
There are more intricacies around long polling and short polling that are worth knowing, and it's somewhat difficult to summarize them all here, but if you like, you can read much more about this in the following blog post. It also has a few code examples that should be helpful.
http://pragmaticnotes.com/2017/11/20/amazon-sqs-long-polling-versus-short-polling/
In terms of a while(true) loop, I would say it depends.
If you are using long polling, you can set the wait time to at most 20 seconds; that way, when there are no messages, you do not poll SQS more than once every 20 seconds. If there are messages, you can decide whether to poll frequently (to process messages as soon as they arrive) or to process them at fixed intervals (say, every n seconds).
Another point to note is that you can read up to 10 messages in a single ReceiveMessage request, which also reduces the number of calls you make to SQS and therefore your costs. And as the blog post above explains in detail, you may request 10 messages and still receive fewer, even if there are that many messages in the queue.
In general, though, if you use a while(true) kind of structure, you need to build appropriate hooks and exception handling so that you can turn polling off at runtime if you wish.
Another aspect to consider is whether to poll SQS in your main application thread or in a separate thread. So another option is to create a ScheduledThreadPoolExecutor with a single thread in main and use it to schedule a task that polls SQS periodically (every few seconds); then you may not need a while(true) structure at all.
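Here is a JDK-only sketch of the single-threaded scheduled poller described above. `MessageSource` and `QueuedMessage` are hypothetical stand-ins for a wrapper like `QueueService`. In a real application, `receive` would issue a long-polling ReceiveMessage call. `stop()` is the hook for turning polling off at runtime.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical message type and queue abstraction (e.g. a wrapper like QueueService).
final class QueuedMessage {
    final String receiptHandle;
    final String body;

    QueuedMessage(String receiptHandle, String body) {
        this.receiptHandle = receiptHandle;
        this.body = body;
    }
}

interface MessageSource {
    List<QueuedMessage> receive();
    void delete(String receiptHandle);
}

class ScheduledPoller {
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    private final MessageSource source;
    private final Consumer<String> handler;

    ScheduledPoller(MessageSource source, Consumer<String> handler) {
        this.source = source;
        this.handler = handler;
    }

    // scheduleWithFixedDelay starts the next poll periodMs after the previous one finishes.
    void start(long periodMs) {
        executor.scheduleWithFixedDelay(this::pollOnce, 0, periodMs, TimeUnit.MILLISECONDS);
    }

    void pollOnce() {
        try {
            for (QueuedMessage m : source.receive()) {
                handler.accept(m.body);
                source.delete(m.receiptHandle); // delete only after successful handling
            }
        } catch (RuntimeException e) {
            // Must not escape: an uncaught exception would cancel all future scheduled runs.
            System.err.println("Poll failed: " + e);
        }
    }

    // Runtime hook for turning polling off.
    void stop() {
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

`Executors.newSingleThreadScheduledExecutor()` is backed by a single-threaded scheduled pool, which matches the suggestion above. Fixed delay, rather than fixed rate, keeps a slow poll from overlapping with the next one.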
There are a few things you're missing:
- Use receiveMessage(ReceiveMessageRequest) and set a wait time to enable long polling.
- Wrap your AWS calls in try/catch blocks. In particular, pay attention to OverLimitException, which receiveMessage() can throw if you have too many in-flight messages.
- Wrap the entire body of the while loop in its own try/catch block, logging any exceptions that are caught. There shouldn't be any; this is here to ensure that your application doesn't crash because AWS changed its API or you neglected to handle an expected exception.
See the documentation for more information about long polling and the possible exceptions.
As for using the async client: do you have any particular reason to use it? If not, then don't: a single receiver thread is much easier to manage.
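Below is a sketch of a single-threaded receive loop that follows those points. It uses a hypothetical `LongPollingQueue` interface so it runs without the SDK. As an assumption to verify against your SDK version: with SDK v1, `receive` would typically build a `ReceiveMessageRequest` using `withWaitTimeSeconds(20)` and `withMaxNumberOfMessages(10)`.

```java
import java.util.List;
import java.util.function.Consumer;

final class ReceivedMessage {
    final String receiptHandle;
    final String body;

    ReceivedMessage(String receiptHandle, String body) {
        this.receiptHandle = receiptHandle;
        this.body = body;
    }
}

// Hypothetical client interface; a real one would wrap ReceiveMessage/DeleteMessage calls.
interface LongPollingQueue {
    List<ReceivedMessage> receive(int waitTimeSeconds, int maxMessages);
    void delete(String receiptHandle);
}

class ReceiveLoop {
    private final LongPollingQueue queue;
    private final Consumer<String> handler;
    private final long errorBackoffMs;
    private volatile boolean running = true;

    ReceiveLoop(LongPollingQueue queue, Consumer<String> handler, long errorBackoffMs) {
        this.queue = queue;
        this.handler = handler;
        this.errorBackoffMs = errorBackoffMs;
    }

    void stop() {
        running = false;
    }

    void run() {
        while (running && !Thread.currentThread().isInterrupted()) {
            try {
                // Long polling: wait up to 20 s (the SQS maximum) for up to 10 messages.
                for (ReceivedMessage m : queue.receive(20, 10)) {
                    try {
                        handler.accept(m.body);
                        queue.delete(m.receiptHandle);
                    } catch (RuntimeException e) {
                        // Not deleted, so SQS makes it visible again after the visibility timeout.
                        System.err.println("Handling failed: " + e);
                    }
                }
            } catch (RuntimeException e) {
                // Catch-all so that a failed receive does not end the loop.
                System.err.println("Receive failed: " + e);
                sleepQuietly(errorBackoffMs);
            }
        }
    }

    private static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```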
If you want to use SQS and then Lambda to process the requests, you can follow the steps given in the link below; alternatively, you can always use Lambda instead of SQS and invoke it for every request.
As of 2019, SQS can trigger Lambda functions:
https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html
I found one solution for actively listening to the queue.
For Node.js, I used the following package, which resolved my issue: sqs-consumer
https://www.npmjs.com/package/sqs-consumer
I am using a Kafka producer-consumer model in my framework. The records consumed on the consumer side are later indexed into Elasticsearch. My use case is that if ES is down, I have to pause the Kafka consumer until ES is up again; once it is up, I need to resume the consumer and continue consuming from where I left off.
I don't think this can be achieved with @KafkaListener. Can anyone suggest a solution? I figured out that I need to write my own KafkaListenerContainer for this, but I have not been able to implement it correctly. Any help would be much appreciated.
There are several possible solutions; one simple way is to use the KafkaConsumer API. The KafkaConsumer keeps track of its position on each partition, which determines the records returned by the next call to poll(...). Your problem is that after you get a record from Kafka, you may be unable to insert it into Elasticsearch. In that case, you have to reset the consumer's position to the record that failed, e.g. consumer.seek(partition, record.offset()); note that consumer.position(partition) - 1 only points at that record if it was the last one returned by poll(). At this point, a good approach is to pause the partition, so that you can keep calling poll() (and stay in the consumer group) without receiving more records, and then check ES by whatever mechanism you prefer. Once ES is available, call resume on the consumer and continue with your usual poll-insert cycle.
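Below is a plain-Java model of that poll/seek/pause/resume cycle for a single partition, with hypothetical names throughout. With the real client, the corresponding calls are `KafkaConsumer.seek`, `pause`, and `resume`. The model seeks to the offset of the record that failed. That target is correct even when the failure happens mid-batch, because `poll()` moves the position past the whole batch.

```java
import java.util.List;
import java.util.function.Predicate;

// Plain-Java model of one partition (not the Kafka client): a log of records plus the
// consumer's position, i.e. the offset of the next record poll() would return.
class PartitionConsumerModel {
    private final List<String> log;
    private long position = 0;
    private boolean paused = false;

    PartitionConsumerModel(List<String> log) {
        this.log = log;
    }

    // One poll/index cycle. 'index' returns true if the record was written to the sink;
    // 'sinkHealthy' is the result of whatever health check you use while paused.
    void cycle(Predicate<String> index, boolean sinkHealthy, int maxBatch) {
        if (paused) {
            if (sinkHealthy) {
                paused = false;      // resume(partition)
            }
            return;                  // a paused partition returns no records
        }
        long start = position;
        long end = Math.min(log.size(), position + maxBatch);
        position = end;              // poll() moves the position past the whole batch
        for (long offset = start; offset < end; offset++) {
            if (!index.test(log.get((int) offset))) {
                position = offset;   // seek(partition, offset of the failed record)
                paused = true;       // pause(partition) until the sink is back
                return;
            }
        }
    }

    long position() {
        return position;
    }

    boolean isPaused() {
        return paused;
    }
}
```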
EDITED AFTER DISCUSSION
Create a Spring bean with the lifecycle methods specified. In the bean's initialization method, instantiate your KafkaConsumer (retrieving the consumer configuration from whatever source you like). From that method, start a thread that interacts with the consumer and updates ES; the rest of the design is as described above. This is a single-threaded model. For higher throughput, consider keeping the data retrieved from Kafka in a small in-memory queue, with a dispatcher thread that takes each message and hands it to a pooled thread for updating ES.
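Below is a JDK-only sketch of the higher-throughput variant: a small bounded in-memory queue, one dispatcher thread, and a worker pool. `DispatchingPipeline` and its parameters are hypothetical, and the `writer` callback stands in for the Elasticsearch update.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical pipeline: the Kafka polling thread puts records into a small bounded
// queue; a dispatcher thread hands each one to a worker pool that writes to ES.
class DispatchingPipeline<T> {
    private final BlockingQueue<T> buffer;
    private final ExecutorService workers;
    private final Thread dispatcher;
    private volatile boolean running = true;

    DispatchingPipeline(int capacity, int workerCount, Consumer<T> writer) {
        buffer = new ArrayBlockingQueue<>(capacity);
        workers = Executors.newFixedThreadPool(workerCount);
        dispatcher = new Thread(() -> {
            try {
                // Keep going until stopped and the buffer has been drained.
                while (running || !buffer.isEmpty()) {
                    T item = buffer.poll(50, TimeUnit.MILLISECONDS);
                    if (item != null) {
                        workers.execute(() -> writer.accept(item));
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "dispatcher");
        dispatcher.start();
    }

    // Called from the polling thread; blocks while the buffer is full (backpressure).
    boolean submit(T item) {
        try {
            buffer.put(item);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Signals the dispatcher to finish, then waits for buffered and in-flight writes.
    boolean shutdown(long timeoutMs) {
        running = false;
        try {
            dispatcher.join(timeoutMs);
            workers.shutdown();
            return workers.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Because the pool writes records concurrently, they may reach Elasticsearch out of order. The Kafka offsets you commit should therefore cover only records whose writes have completed.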
Rather than pausing the consumer, I would suggest retrying the same message repeatedly and committing the offset only once the message has been processed successfully.
For example:
- Annotate your method with @Retryable.
- Wrap the method body in a try/catch block and throw a new exception from the catch block.
- In the listener container factory configuration, add these properties:
factory.getContainerProperties().setAckMode(AckMode.MANUAL_IMMEDIATE);
factory.getContainerProperties().setAckOnError(false);
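Without the Spring annotations, the control flow this answer relies on is: retry the same record, and acknowledge (commit) only after a successful attempt. Below is a minimal sketch. `RetryThenAck.handle` and its parameters are hypothetical. In Spring Kafka, the `ack` callback would correspond to calling `Acknowledgment.acknowledge()` in a listener that uses `AckMode.MANUAL_IMMEDIATE`.

```java
import java.util.function.Predicate;

// Framework-free sketch of "retry in place, acknowledge only after success".
class RetryThenAck {
    // Returns true if the record was handled (and acknowledged) within maxAttempts.
    static boolean handle(String record, Predicate<String> handler, int maxAttempts,
                          long backoffMs, Runnable ack) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (handler.test(record)) {
                ack.run(); // commit the offset only now
                return true;
            }
            if (attempt < maxAttempts) {
                sleepQuietly(backoffMs);
            }
        }
        return false; // not acknowledged: the caller decides what to do next
    }

    private static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```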
There are a couple of ways you can achieve this.
Method #1
Create your KafkaConsumer object inside a thread and run an infinite while loop to consume events.
Once you have this set up, you can interrupt the thread. Inside the while loop, check Thread.currentThread().isInterrupted(); if it is true, break out of the loop and close the consumer.
Once you are done with your recovery activity, recreate the consumer with the same group ID. Note that this may trigger a rebalance of the consumer group.
If you are using Python, the same thing can be achieved with a threading.Event (e.g. a stop_event) that the loop checks.
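Below is a JDK-only sketch of Method #1's thread-and-interrupt structure. The `BlockingQueue` is a hypothetical stand-in for the Kafka consumer. With a real `KafkaConsumer`, interrupting a thread blocked in `poll()` makes it throw `org.apache.kafka.common.errors.InterruptException`. The documented alternative for stopping a poll loop from another thread is `consumer.wakeup()`, which makes `poll()` throw `WakeupException`.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Stand-in for Method #1: a consume loop on its own thread that exits when interrupted.
// The BlockingQueue stands in for KafkaConsumer.poll(); 'closed' marks where close() goes.
class InterruptibleConsumer implements Runnable {
    private final BlockingQueue<String> source;
    private final Consumer<String> handler;
    volatile boolean closed = false;

    InterruptibleConsumer(BlockingQueue<String> source, Consumer<String> handler) {
        this.source = source;
        this.handler = handler;
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                String record = source.poll(100, TimeUnit.MILLISECONDS);
                if (record != null) {
                    handler.accept(record);
                }
            }
        } catch (InterruptedException e) {
            // Interrupted while blocked in poll: fall through to cleanup.
        } finally {
            closed = true; // consumer.close() would go here
        }
    }
}
```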
Method #2
Use the KafkaConsumer API's pause(partitions_list) function. It accepts Kafka partitions as input, so get all the partitions assigned to the consumer and pass them to pause(partitions_list). The consumer will stop fetching data from these partitions.
After a certain time, you can call resume(partitions_list) to resume consumption. This method does not trigger a consumer rebalance.
Note: if you are using the Spring Kafka client, this becomes a lot easier: you can start/stop (or pause/resume) the message listener container.
You can find a detailed explanation here.
@Autowired
private KafkaListenerEndpointRegistry registry;

@KafkaListener(id = "dltGroup", topics = "actualTopicName.DLT", autoStartup = "false")
public void dltListen(String in) {
    logger.info("Received from DLT: " + in);
}

public void startKafka() {
    MessageListenerContainer container = registry.getListenerContainer("dltGroup");
    if (!container.isRunning()) {
        container.start();
    }
}

public void resumeKafka() {
    MessageListenerContainer container = registry.getListenerContainer("dltGroup");
    if (container.isContainerPaused() || container.isPauseRequested()) {
        container.resume();
    }
}

public void pauseKafka() {
    MessageListenerContainer container = registry.getListenerContainer("dltGroup");
    if (container.isRunning()) {
        container.pause();
    }
}
My requirement is to process a list of messages serially, one after the other, calling a web service for each. The second message should be processed only if the first one succeeds, and so on.
I am using a Splitter to split the messages. Within the split flow, I use a Delayer (not persistent).
The problem is that as soon as the first message goes into the delayer, the second message in the list starts processing, without waiting for the first one to complete.
I believe this happens because the delayer doesn't block the thread.
Is there a way I can achieve this behavior using a Splitter and a Delayer?
The delayer is designed that way: it schedules the message to be processed at some time in the future. If you simply want to slow down the rate at which you process the splits, add a POJO service (invoked by a service activator) that calls Thread.sleep(...) and returns the input message.
public Message<?> sleeper(Message<?> message) throws InterruptedException {
    Thread.sleep(1000); // blocks the calling thread, so the next split waits
    return message;
}