How to pause a Kafka consumer? - Java

I am using a Kafka producer-consumer model in my framework. The records consumed at the consumer end are later indexed into Elasticsearch. Here I have a use case where, if ES is down, I have to pause the Kafka consumer until ES is up again. Once it is up, I need to resume the consumer and consume the records from where I last left off.
I don't think this can be achieved with @KafkaListener. Can anyone please give me a solution for this? I figured out that I need to write my own KafkaListenerContainer for this, but I am not able to implement it correctly. Any help would be much appreciated.

There are several possible solutions; one simple way would be to use the KafkaConsumer API. The KafkaConsumer implementation keeps track of the position on the topic that will be used by the next call to poll(...). Your problem is that after you get the record from Kafka, you may be unable to insert it into Elasticsearch. In that case, you have to write a routine to reset the position of the consumer, which in your case would be consumer.seek(partition, consumer.position(partition) - 1). This resets the position to the earlier one. At this point a good approach is to pause the partition (this enables the server to do some resource clean-up) and then poll ES by whatever mechanism you desire. Once ES is available, call resume on the consumer and continue with your usual poll-insert cycle.
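A minimal sketch of that poll / seek / pause / resume cycle with the plain KafkaConsumer API. The topic name and the helper methods indexIntoElasticsearch / elasticsearchIsUp are placeholders you would supply, and the sketch seeks back to the failed record's offset rather than position() - 1:

// uses org.apache.kafka.clients.consumer.* and org.apache.kafka.common.* classes
Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "es-indexer");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(Collections.singletonList("my-topic"));
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
        for (ConsumerRecord<String, String> record : records) {
            try {
                indexIntoElasticsearch(record.value());      // placeholder for your ES insert
                // commit only up to this record
                consumer.commitSync(Collections.singletonMap(
                        new TopicPartition(record.topic(), record.partition()),
                        new OffsetAndMetadata(record.offset() + 1)));
            } catch (Exception esDown) {
                // rewind to the failed record so it is re-read on the next poll
                consumer.seek(new TopicPartition(record.topic(), record.partition()), record.offset());
                // pause fetching, but keep calling poll() so the consumer stays alive in the group
                consumer.pause(consumer.assignment());
                while (!elasticsearchIsUp()) {               // placeholder health check
                    consumer.poll(Duration.ofSeconds(1));    // returns no records while paused
                }
                consumer.resume(consumer.assignment());
                break;                                       // go back to poll()
            }
        }
    }
}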
EDITED AFTER DISCUSSION
Create a Spring bean with the lifecycle methods specified. In the initialization method of the bean, instantiate your KafkaConsumer (retrieve the consumer configuration from any source). From that method, start a thread to interact with the consumer and update ES; the rest of the design is as described above. This is a single-threaded model. For higher throughput, consider keeping the data retrieved from Kafka in a small in-memory queue and having a dispatcher thread take each message and hand it to a pooled thread for updating ES.
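A rough sketch of such a bean, assuming @PostConstruct/@PreDestroy as the lifecycle methods (class name, topic and configuration values are illustrative):

@Component
public class EsIndexingConsumerBean {

    private volatile boolean running = true;
    private Thread worker;

    @PostConstruct
    public void start() {
        worker = new Thread(this::pollLoop, "es-indexer");
        worker.start();
    }

    private void pollLoop() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder; read from your config source
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "es-indexer");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));          // placeholder topic
            while (running) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                    // index record.value() into ES here; pause/seek/resume on failure as sketched above
                }
            }
        }
    }

    @PreDestroy
    public void stop() throws InterruptedException {
        running = false;
        worker.join();
    }
}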

Rather than pausing the consumer, why not retry the same message again and again and commit the offset only once the message has been consumed successfully?
For Example:
Annotate your method with @Retryable,
and wrap your method body in a try/catch, throwing a new exception in the catch block.
For the listener factory configuration, add the properties:
factory.getContainerProperties().setAckMode(AckMode.MANUAL_IMMEDIATE);
factory.getContainerProperties().setAckOnError(false);
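Put together, a sketch of this approach could look like the following. This assumes spring-retry is on the classpath with @EnableRetry on a configuration class; the listener class, topic and ES call are illustrative:

@Service
public class EsIndexingListener {

    // retry the same record until it succeeds, with a delay between attempts
    @Retryable(maxAttempts = Integer.MAX_VALUE, backoff = @Backoff(delay = 5000))
    @KafkaListener(topics = "records-topic")
    public void listen(String record, Acknowledgment ack) {
        try {
            indexIntoElasticsearch(record);   // placeholder for your ES call
            ack.acknowledge();                // MANUAL_IMMEDIATE: commit offset only on success
        } catch (Exception e) {
            // re-throw so @Retryable retries this message instead of committing it
            throw new RuntimeException("Elasticsearch indexing failed, retrying", e);
        }
    }

    private void indexIntoElasticsearch(String record) {
        // call Elasticsearch here
    }
}

Depending on your Spring Kafka version, configuring a retry or error handler on the container factory is an alternative to @Retryable on the listener method.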

There are a couple of ways you can achieve this.
Method #1
Create your KafkaConsumer object inside a Thread and run an infinite while loop to consume events.
Once you have this set up, you can interrupt the thread; in the while loop, check whether the thread has been interrupted (Thread.currentThread().isInterrupted()). If so, break out of the loop and close the consumer.
Once you are done with your recovery activity, recreate the consumer with the same group ID. Do note, this may rebalance the consumer.
If you are using Python, the same thing can be achieved with a threading stop event (stop_event).
Method #2
Use the KafkaConsumer API's pause(partitions) function. It accepts Kafka partitions as input, so extract all the partitions assigned to the consumer and pass them to pause(partitions). The consumer will stop pulling data from those partitions.
After a certain time, you can use the resume(partitions) function to resume the consumer. This method does not rebalance consumers.
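A minimal illustration of Method #2, assuming an existing KafkaConsumer and a placeholder health check for the downstream system:

// pause every partition currently assigned to this consumer
Set<TopicPartition> assigned = consumer.assignment();
consumer.pause(assigned);

// keep polling while paused so the consumer stays in the group (no records are returned),
// then resume once the downstream system is reachable again
while (!downstreamIsAvailable()) {            // placeholder check
    consumer.poll(Duration.ofSeconds(1));
}
consumer.resume(assigned);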
Note: if you are using the Spring Kafka client, this becomes a lot easier: you can start/stop the message listener container.
You can find a detailed explanation here.

@Autowired
private KafkaListenerEndpointRegistry registry;

@KafkaListener(id = "dltGroup", topics = "actualTopicNAme.DLT", autoStartup = "false")
public void dltListen(String in) {
    logger.info("Received from DLT: " + in);
}

public void startKafka() {
    // TODO if not running
    registry.getListenerContainer("dltGroup").start();
}

public void resumeKafka() {
    if (registry.getListenerContainer("dltGroup").isContainerPaused() ||
            registry.getListenerContainer("dltGroup").isPauseRequested()) {
        registry.getListenerContainer("dltGroup").resume();
    }
}

public void pauseKafka() {
    if (registry.getListenerContainer("dltGroup").isRunning()) {
        registry.getListenerContainer("dltGroup").pause();
    }
}

Related

Callback method vs get method in Kafka producer

To get the produced record details, we have two options to choose from:
onCompletion() - callback function
get() method
Could someone please explain the difference between them and how to use them, in detail? (Java)
NOTE: The producer properties I'm using are mostly defaults (e.g. batch.size, acks, max.block.ms, ...).
onCompletion is the asynchronous way of producing data to Kafka, while looping with get() is the synchronous way of writing data to Kafka.
The producer in Kafka writes data to the topic at very high throughput. If you use the synchronous get() in the producer code, after every write the producer needs to wait for the ack from Kafka. This throttles the producer's throughput. The producer needs to wait for Kafka to store the data, replicate it (based on how it is configured) and then return the ack for a successful write.
The alternative is onCompletion: here the producer keeps producing data without waiting for acks from Kafka. Kafka will invoke the onCompletion callback once the write completes. The producer needs to keep track of these onCompletion calls and, if things fail, it needs to retry.
What producers generally do is send a batch of N records to Kafka, wait for all the completion events, and then send the next N records. This is similar to the TCP sliding-window flow-control paradigm.
It is difficult to suggest what you should do. The downside of working with onCompletion and retrying from there is that it jeopardizes the ordering of the records in Kafka.
The producer may have sent records 1..64 successfully, Kafka may have missed 65..72, and then Kafka wrote 73..99. Once Kafka has finished writing 99, the producer may get the failing onCompletion callbacks for 66 and 67 (since it is an async callback, it can arrive at any time) and retry them. This essentially jumbles up the record ordering.
In those cases, the consumer needs to understand that all the writes may not be ordered.
My suggestion would be to use onCompletion for a batch of records. Generally, applications don't have very strict ordering requirements. So you could leverage the async nature of the call and improve throughput.
onCompletion() is an asynchronous callback method defined in the Java Kafka client.
On the other hand, get() is the standard Future method from the Java standard library. When you're using the Java Kafka client, you can call get() on the Future returned by send() for synchronous writes, as in the example from the Confluent documentation below:
Future<RecordMetadata> future = producer.send(record);
RecordMetadata metadata = future.get();
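To make the difference concrete, here is a small sketch with both styles (topic, key and value are placeholders; error handling is kept minimal):

ProducerRecord<String, String> record = new ProducerRecord<>("my-topic", "key", "value");

// synchronous: block until the broker acknowledges the write
RecordMetadata metadata = producer.send(record).get();   // get() throws checked exceptions
System.out.println("sync write to " + metadata.topic() + "-" + metadata.partition()
        + " at offset " + metadata.offset());

// asynchronous: send() returns immediately, the callback fires when the write completes
producer.send(record, (meta, exception) -> {
    if (exception != null) {
        exception.printStackTrace();   // decide here whether to retry the failed write
    } else {
        System.out.println("async write at offset " + meta.offset());
    }
});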

Spring Kafka acknowledgment settings

I have a Spring Boot application where we are using Spring Kafka. On the consumer side I have set enable.auto.commit to false and set my listener ack-mode to MANUAL_IMMEDIATE.
I have concurrent consumers, so after consuming a record I call acknowledgment.acknowledge(). But I still face a duplicates problem: whenever a rebalance happens, another consumer starts consuming the same message which was already consumed by one consumer. Any idea what magic is happening behind the scenes?
Does anyone know whether MANUAL_IMMEDIATE commits messages with commitSync or commitAsync? Is there a way to change the behaviour to avoid reading duplicate records? Is there a way to use a hybrid model in Spring Kafka?
In Spring Boot Kafka, is there a way to log whenever a rebalance happens?
How do I trigger a rebalance if I want to do it for testing purposes?
As long as you call acknowledge on the listener thread, it will use commitSync() by default; use the syncCommits container property to use async commits.
If you call it on a different thread, the commit is queued to be processed by the consumer thread as soon as possible.
Duplicates cannot be avoided if a forced rebalance takes place because your listener took too long to process the records received by the poll().
You can increase max.poll.interval.ms and/or reduce max.poll.records to ensure you can process the records in time.
You can add a ConsumerRebalanceListener to the container properties to log rebalances.
Reduce max.poll.interval.ms to a small value to reproduce in a test.
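For the rebalance-logging point, a sketch of wiring a ConsumerRebalanceListener into the listener container factory (the factory and logger names are illustrative):

factory.getContainerProperties().setConsumerRebalanceListener(new ConsumerRebalanceListener() {

    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        logger.info("Rebalance: partitions revoked {}", partitions);
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        logger.info("Rebalance: partitions assigned {}", partitions);
    }
});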
Firstly, irrespective of the ack-mode, it is never guaranteed that a message is consumed just once. For instance, a rebalance can happen between the time a message is consumed and the time its offset is committed, resulting in Kafka delivering the message again to the newly assigned consumer. It is the application's responsibility to be idempotent with respect to duplicate messages.
In order to listen for rebalance events, an implementation of ConsumerRebalanceListener is needed. You can plug this implementation into Spring's auto-configured ConcurrentKafkaListenerContainerFactory instance. A more detailed description of how this can be done has already been answered here.
If you wish to force a rebalance for testing, you can do so by killing one of (hopefully more than one) existing consumers. If you are using spring-kafka, you can do this with an @Autowired instance of KafkaListenerEndpointRegistry and stop/(re)start any consumer. Something like this should do:
@Autowired
KafkaListenerEndpointRegistry registry;

public void myTest() {
    Collection<MessageListenerContainer> containers = registry.getAllListenerContainers();
    containers.iterator().next().stop();
}

Spring Kafka - when exactly is Consumer.poll() called under the hood?

I have a Spring Boot application in which I have a single Kafka consumer.
I am using a DefaultKafkaConsumerFactory with default consumer configurations. I have a ConcurrentKafkaListenerContainerFactory with concurrency set to 1, and I have a method annotated with @KafkaListener.
I am listening to a topic with 3 partitions and I have 3 of such consumers deployed each in different applications. Hence, each consumer is listening to one partition.
Let's say poll is called on the consumer under the hood and 40 records are fetched. Is each record then provided to the method annotated with @KafkaListener serially, i.e. record 1 is provided, wait until the method finishes processing, record 2 is provided, wait until the method finishes processing, and so on?
Or is a separate thread created for every record obtained, with the method invocation happening on that separate thread, so that the main thread does not block and can poll for records more quickly?
I would also like more clarity on what a message listener container is and the eventual message listener.
Thank you in advance.
In 1.3 and above there is a single thread per consumer; the next poll() is performed after the last message from the previous poll has been processed by the listener.
In earlier versions, there were two threads and a second (and possibly third) poll was performed while the listener thread is processing the first batch. This was required to avoid a rebalance due to a slow listener. The threading model was very complicated and we had to pause/resume the consumer when necessary. KIP-62 fixed the rebalance problem so we were able to use the much simpler threading model in use today.
Well, that is exactly the Apache Kafka position - guarantee ordered processing of records from the same partition in the same thread. Therefore, when you distribute your topic with 3 partitions between 3 instances, each of them gets its own partition and does the polling in a single thread.
The KafkaMessageListenerContainer is an event-driven, self-controlling wrapper around KafkaConsumer. It really calls poll() in a while (isRunning()) { loop, which is scheduled in a TaskExecutor:
this.listenerConsumerFuture = containerProperties
        .getConsumerTaskExecutor()
        .submitListenable(this.listenerConsumer);
And it processes ConsumerRecords calling listener:
private void invokeListener(final ConsumerRecords<K, V> records) {
    if (this.isBatchListener) {
        invokeBatchListener(records);
    }
    else {
        invokeRecordListener(records);
    }
}

Pattern to continuously listen to AWS SQS messages

I have a simple class named QueueService with some methods that wrap the methods from the AWS SQS SDK for Java. For example:
public ArrayList<Hashtable<String, String>> receiveMessages(String queueURL) {
    List<Message> messages = this.sqsClient.receiveMessage(queueURL).getMessages();
    ArrayList<Hashtable<String, String>> resultList = new ArrayList<Hashtable<String, String>>();
    for (Message message : messages) {
        Hashtable<String, String> resultItem = new Hashtable<String, String>();
        resultItem.put("MessageId", message.getMessageId());
        resultItem.put("ReceiptHandle", message.getReceiptHandle());
        resultItem.put("Body", message.getBody());
        resultList.add(resultItem);
    }
    return resultList;
}
I have another class named App that has a main method and creates an instance of QueueService.
I am looking for a "pattern" to make the main in App listen for new messages in the queue. Right now I have a while(true) loop where I call the receiveMessages method:
while (true) {
    messages = queueService.receiveMessages(queueURL);
    for (Hashtable<String, String> message : messages) {
        String receiptHandle = message.get("ReceiptHandle");
        String messageBody = message.get("Body");
        System.out.println(messageBody);
        queueService.deleteMessage(queueURL, receiptHandle);
    }
}
Is this the correct way? Should I use the async message receive method in SQS SDK?
To my knowledge, there is no way in Amazon SQS to support an active listener model where Amazon SQS would "push" messages to your listener, or would invoke your message listener when there are messages.
So, you would always have to poll for messages. There are two polling mechanisms supported - short polling and long polling. Each has its own pros and cons, but long polling is the one you would typically end up using in most cases, although the default is short polling. The long polling mechanism is definitely more efficient in terms of network traffic, is more cost efficient (because Amazon charges you by the number of requests made), and is also the preferred mechanism when you want your messages to be processed in a time-sensitive manner (~= processed as soon as possible).
There are more intricacies around long polling and short polling that are worth knowing, and it's somewhat difficult to paraphrase all of that here, but if you like, you can read a lot more detail in the following blog. It has a few code examples as well that should be helpful.
http://pragmaticnotes.com/2017/11/20/amazon-sqs-long-polling-versus-short-polling/
In terms of a while(true) loop, I would say it depends.
If you are using long polling, you can set the wait time to (max) 20 seconds, so that you do not poll SQS more often than every 20 seconds when there are no messages. If there are messages, you can decide whether to poll frequently (to process messages as soon as they arrive) or to always process them at fixed intervals (say every n seconds).
Another point to note is that you can read up to 10 messages in a single receiveMessages request, which also reduces the number of calls you make to SQS, thereby reducing costs. And, as the above blog explains in detail, you may request 10 messages but get back fewer, even if there are that many messages in the queue.
In general though, I would say you need to build appropriate hooks and exception handling to turn off the polling at runtime if you wish to, in case you are using a while(true) kind of structure.
Another aspect to consider is whether you would like to poll SQS in your main application thread or spawn another thread. So another option could be to create a ScheduledThreadPoolExecutor with a single thread in main to poll SQS periodically (every few seconds); then you may not need a while(true) structure.
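For example, a minimal sketch of that scheduled approach, reusing the queueService wrapper from the question (the 5-second delay is arbitrary):

ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
scheduler.scheduleWithFixedDelay(() -> {
    try {
        for (Hashtable<String, String> message : queueService.receiveMessages(queueURL)) {
            System.out.println(message.get("Body"));
            queueService.deleteMessage(queueURL, message.get("ReceiptHandle"));
        }
    } catch (Exception e) {
        e.printStackTrace();   // keep the scheduled task alive if a poll fails
    }
}, 0, 5, TimeUnit.SECONDS);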
There are a few things that you're missing:
Use the receiveMessages(ReceiveMessageRequest) and set a wait time to enable long polling.
Wrap your AWS calls in try/catch blocks. In particular, pay attention to OverLimitException, which can be thrown from receiveMessages() if you would have too many in-flight messages.
Wrap the entire body of the while loop in its own try/catch block, logging any exceptions that are caught (there shouldn't be -- this is here to ensure that your application doesn't crash because AWS changed their API or you neglected to handle an expected exception).
See doc for more information about long polling and possible exceptions.
As for using the async client: do you have any particular reason to use it? If not, then don't: a single receiver thread is much easier to manage.
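As a sketch of those points with the AWS SDK v1 client (the queue URL and client setup are assumed to exist; 20 seconds is the maximum long-poll wait):

while (true) {
    try {
        ReceiveMessageRequest request = new ReceiveMessageRequest(queueURL)
                .withWaitTimeSeconds(20)        // long polling
                .withMaxNumberOfMessages(10);   // read up to 10 messages per call
        for (Message message : sqsClient.receiveMessage(request).getMessages()) {
            System.out.println(message.getBody());
            sqsClient.deleteMessage(queueURL, message.getReceiptHandle());
        }
    } catch (OverLimitException e) {
        // too many in-flight messages; back off before polling again
        try { Thread.sleep(5000); } catch (InterruptedException ie) { Thread.currentThread().interrupt(); }
    } catch (Exception e) {
        e.printStackTrace();                    // don't let unexpected errors kill the loop
    }
}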
If you want to use SQS and then Lambda to process the requests, you can follow the steps given in the link, or you can always use Lambda instead of SQS and invoke Lambda for every request.
As of 2019 SQS can trigger lambdas:
https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html
I found one solution for actively listening to the queue.
For Node, I have used the following package and it resolved my issue:
sqs-consumer
https://www.npmjs.com/package/sqs-consumer

RabbitMQ Java client parallel consumption

I want to process messages from a RabbitMQ queue in parallel. The queue is configured with autoAck = false. I am using the camel-rabbitmq support for Camel endpoints, which has support for a threadPoolSize parameter, but this does not have the desired effect. Messages are still processed serially off the queue, even when threadPoolSize = 20.
From debugging through the code I can see that the threadPoolSize parameter is used to create an ExecutorService that is passed to the RabbitMQ ConnectionFactory, as described here. This all looks good until you get into the RabbitMQ ConsumerWorkService. Here messages are processed in blocks of at most 16 messages. Each message in a block is processed serially, and then, if there is more work to do, the executor service is invoked with the next block. A code snippet of this is below. From this use of the executor service I can't see how the messages can be processed in parallel. The executor service only ever has one piece of work to perform at a time.
What am I Missing?
private final class WorkPoolRunnable implements Runnable {

    public void run() {
        int size = MAX_RUNNABLE_BLOCK_SIZE;
        List<Runnable> block = new ArrayList<Runnable>(size);
        try {
            Channel key = ConsumerWorkService.this.workPool.nextWorkBlock(block, size);
            if (key == null) return; // nothing ready to run
            try {
                for (Runnable runnable : block) {
                    runnable.run();
                }
            } finally {
                if (ConsumerWorkService.this.workPool.finishWorkBlock(key)) {
                    ConsumerWorkService.this.executor.execute(new WorkPoolRunnable());
                }
            }
        } catch (RuntimeException e) {
            Thread.currentThread().interrupt();
        }
    }
}
RabbitMQ's documentation is not very clear about this, but even though the ConsumerWorkService uses a thread pool, this pool doesn't seem to be used in a way that processes messages in parallel:
Each Channel has its own dispatch thread. For the most common use case of one Consumer per Channel, this means Consumers do not hold up other Consumers. If you have multiple Consumers per Channel be aware that a long-running Consumer may hold up dispatch of callbacks to other Consumers on that Channel.
(http://www.rabbitmq.com/api-guide.html)
This documentation suggests using one Channel per thread and, in fact, if you simply create as many Channels as the required level of concurrency, messages will be dispatched between the consumers linked to these channels.
I've tested with 2 channels and consumers: when 2 messages are in the queue, each consumer only picks one message at a time. The blocks of 16 messages you mentioned don't seem to interfere, which is a good thing.
As a matter of fact, Spring AMQP also creates several channels to process messages concurrently. This is done by:
setting SimpleMessageListenerContainer.setConcurrentConsumers(...): http://docs.spring.io/spring-amqp/docs/1.3.6.RELEASE/api/
and setting CachingConnectionFactory.setChannelCacheSize(...) accordingly: http://docs.spring.io/spring-amqp/docs/1.3.6.RELEASE/api/
I've also tested this to be working as expected.
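As a sketch, the two Spring AMQP settings mentioned above could be wired like this (connection details and queue name are placeholders):

CachingConnectionFactory connectionFactory = new CachingConnectionFactory("localhost");
connectionFactory.setChannelCacheSize(20);              // enough cached channels for all consumers

SimpleMessageListenerContainer container = new SimpleMessageListenerContainer(connectionFactory);
container.setQueueNames("my-queue");
container.setConcurrentConsumers(20);                   // 20 consumers, each on its own channel/thread
container.setMessageListener(new MessageListener() {
    @Override
    public void onMessage(Message message) {
        System.out.println(new String(message.getBody()));
    }
});
container.start();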
If you have a single Channel instance, it's going to invoke its registered consumers serially as you correctly found out by examining ConsumerWorkService. There are 2 ways to overcome that:
Use multiple channels instead of one.
Use single channel but implement consumers in a special way. They should just pick incoming message from queue and put it as a task into an internal thread pool.
You can find more details in this post.
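For illustration, a sketch of the first option with the plain RabbitMQ Java client (host, queue name and concurrency level are placeholders):

ConnectionFactory factory = new ConnectionFactory();
factory.setHost("localhost");
Connection connection = factory.newConnection();

int concurrency = 4;
for (int i = 0; i < concurrency; i++) {
    Channel channel = connection.createChannel();        // each channel gets its own dispatch thread
    channel.basicQos(1);                                 // at most one unacked message per consumer
    channel.basicConsume("my-queue", false, new DefaultConsumer(channel) {
        @Override
        public void handleDelivery(String consumerTag, Envelope envelope,
                                   AMQP.BasicProperties properties, byte[] body) throws IOException {
            // process the message, then ack on this consumer's own channel
            getChannel().basicAck(envelope.getDeliveryTag(), false);
        }
    });
}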
