Spring Kafka acknowledgment settings - java

I have a Spring Boot application using Spring Kafka. On the consumer side I have set enable.auto.commit to false and set my listener ack-mode to MANUAL_IMMEDIATE.
I have concurrent consumers, and after consuming a record I call acknowledgment.acknowledge(). Yet I still face the duplicate problem: whenever a rebalance happens, another consumer starts consuming a message that has already been consumed by the first one. Any idea what magic is happening behind the scenes?
Does MANUAL_IMMEDIATE commit messages with commitSync or commitAsync? Is there a way to change the behaviour to avoid reading duplicate records? Is there a way to use a hybrid model in Spring Kafka?
In Spring Boot Kafka, is there a way to log whenever a rebalance happens?
How can I force a rebalance if I want to do it for testing purposes?

As long as you call acknowledge on the listener thread, it will use commitSync() by default; set the syncCommits container property to false to use commitAsync() instead.
If you call it on a different thread, the commit is queued to be processed by the consumer thread as soon as possible.
Duplicates cannot be avoided if a forced rebalance takes place because your listener took too long to process the records received by the poll().
You can increase max.poll.interval.ms and/or reduce max.poll.records to ensure you can process the records in time.
You can add a ConsumerRebalanceListener to the container properties to log rebalances.
Reduce max.poll.interval.ms to a small value to reproduce in a test.
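Roughly, that wiring might look like the following. This is a minimal sketch assuming Spring Kafka 2.3+ (where AckMode lives on ContainerProperties); the bean and logger names are illustrative only.

import java.util.Collection;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.common.TopicPartition;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.listener.ContainerProperties;

@Configuration
public class KafkaConsumerConfig {

    private static final Logger log = LoggerFactory.getLogger(KafkaConsumerConfig.class);

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
            ConsumerFactory<String, String> consumerFactory) {

        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);

        ContainerProperties props = factory.getContainerProperties();
        props.setAckMode(ContainerProperties.AckMode.MANUAL_IMMEDIATE);
        // props.setSyncCommits(false); // uncomment to commit via commitAsync() instead of commitSync()

        // Log every rebalance so partition movement between consumers is visible.
        props.setConsumerRebalanceListener(new ConsumerRebalanceListener() {

            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                log.info("Rebalance: partitions revoked {}", partitions);
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                log.info("Rebalance: partitions assigned {}", partitions);
            }
        });
        return factory;
    }
}

max.poll.interval.ms and max.poll.records are plain consumer properties, so they belong in the ConsumerFactory configuration (or spring.kafka.consumer.properties.* in application.properties).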

Firstly, irrespective of the ack-mode it is never guaranteed that a message is consumed just once. For instance, a rebalance can happen between the time a message is consumed and the time its offset is committed, resulting in Kafka delivering the message again to the newly assigned consumer. It is the application's responsibility to handle duplicate messages idempotently.
In order to listen for rebalance events, an implementation of ConsumerRebalanceListener is needed. You can plug this implementation into Spring's auto-configured ConcurrentKafkaListenerContainerFactory instance (see the container-properties sketch above). A more detailed description of how this can be done has already been answered here.
If you wish to force a rebalance for testing, you can do so by stopping one of (hopefully more than one) existing consumers. If using spring-kafka you can do this with an @Autowired instance of KafkaListenerEndpointRegistry and stop/(re)start any consumer. Something like this should do:
@Autowired
KafkaListenerEndpointRegistry registry;

public void myTest() {
    // stopping one container makes its consumers leave the group and forces a rebalance
    Collection<MessageListenerContainer> containers = registry.getAllListenerContainers();
    containers.iterator().next().stop();
}

Related

Get partition assigned by Kafka

I’m using a Quarkus Kafka consumer, and I need to know which partitions the Kafka broker has assigned to my consumer.
Is there any listener I can use, like the one the Kafka client provides?
Otherwise, how can I assign a specific partition to each of the nodes of my cluster?
Regards
From the Quarkus docs, I think you can use a rebalance listener.
It should be called, because the initial assignment of partitions to your client (from no partitions to some partitions) can be considered a rebalance too.
https://quarkus.io/guides/kafka#consumer-rebalance-listener
The listener is invoked every time the consumer topic/partition assignment changes. For example, when the application starts, it invokes the partitionsAssigned callback with the initial set of topics/partitions associated with the consumer. If, later, this set changes, it calls the partitionsRevoked and partitionsAssigned callbacks again, so you can implement custom logic.
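As a rough sketch of what that guide describes (the exact annotation and configuration key differ between Quarkus versions, e.g. @Named vs @Identifier, so treat the names below as illustrative and check the linked guide for your version):

import java.util.Collection;
import javax.enterprise.context.ApplicationScoped; // jakarta.* in newer Quarkus versions
import io.smallrye.common.annotation.Identifier;
import io.smallrye.reactive.messaging.kafka.KafkaConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.common.TopicPartition;

@ApplicationScoped
@Identifier("my-rebalance-listener")
public class MyRebalanceListener implements KafkaConsumerRebalanceListener {

    @Override
    public void onPartitionsAssigned(Consumer<?, ?> consumer, Collection<TopicPartition> partitions) {
        // called on startup with the initial assignment and again after every rebalance
        System.out.println("Assigned partitions: " + partitions);
    }

    @Override
    public void onPartitionsRevoked(Consumer<?, ?> consumer, Collection<TopicPartition> partitions) {
        System.out.println("Revoked partitions: " + partitions);
    }
}

The listener is then attached to the channel in application.properties, for example mp.messaging.incoming.<channel>.consumer-rebalance-listener.name=my-rebalance-listener (again, verify the exact property name against the guide for your Quarkus version).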

Spring cloud stream Kafka consumer stuck with long running job and large value for max.poll.interval.ms

We have some long-running jobs that have also been implemented with Spring Cloud Stream and the Kafka binder. The issue we are facing is that the default values for max.poll.interval.ms and max.poll.records are not appropriate for our use case: we need a relatively large value for max.poll.interval.ms (a few hours) and a relatively small value for max.poll.records (e.g. 1) to accommodate the longest-running job the consumer could ever pick up. This addresses the issue of the consumer getting into a rebalance loop. However, it causes some operational challenges: sometimes the consumer gets stuck on restart and does not consume any messages until max.poll.interval.ms passes.
Is this because of the way the Spring Cloud Stream poll has been implemented? Would it help if I used a sync consumer and managed the poll() myself?
The consumer logs the loss of its heartbeat, and this is the message I can see in the Kafka log when the consumer is stuck:
GroupCoordinator 11]: Member consumer-3-f46e14b4-5998-4083-b7ec-bed4e3f374eb in group foo has failed, removing it from the group
Spring Cloud Stream (message-driven) is not a good fit for this application. It would be better to manage the consumer yourself: close it after the poll(), process the job, then create a new consumer, commit the offset, and poll() again.
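A rough sketch of that pattern with a plain KafkaConsumer is below. The topic, properties, and processJob() are placeholders, and it assumes this instance is the only member of the group at the moment the offset is committed (otherwise the broker can reject the commit).

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class LongJobRunner {

    // props must contain bootstrap.servers, group.id, the deserializers,
    // enable.auto.commit=false and (as in the question) max.poll.records=1
    public void run(Properties props, String topic) {
        while (true) {
            ConsumerRecord<String, String> record = null;

            // 1. Open a consumer only long enough to fetch one record...
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList(topic));
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(30));
                if (!records.isEmpty()) {
                    record = records.iterator().next();
                }
            } // 2. ...then close it, so no poll loop has to be kept alive during the job.

            if (record == null) {
                continue; // nothing fetched this time round
            }

            // 3. Run the long job with no consumer session to maintain.
            processJob(record);

            // 4. Open a fresh consumer just to commit the offset of the processed record.
            try (KafkaConsumer<String, String> committer = new KafkaConsumer<>(props)) {
                TopicPartition tp = new TopicPartition(record.topic(), record.partition());
                committer.commitSync(
                        Collections.singletonMap(tp, new OffsetAndMetadata(record.offset() + 1)));
            }
            // 5. Loop round and poll() again from the committed position.
        }
    }

    private void processJob(ConsumerRecord<String, String> record) {
        // placeholder for the long-running work
    }
}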

How to execute manual offset acknowledge from spring ListenerContainerIdleEvent

I have a Kafka listener that implements the acknowledgment message listener interface with the following properties:
ackMode - MANUAL_IMMEDIATE
idleEventInterval - 3 Min
While consuming messages, the listener decides whether to ack the specific record via acknowledgment.acknowledge(), and that works as expected.
In addition, I have a scenario where I need to ack the last offset (kept in memory) after X minutes, even if no new messages have arrived.
To meet this requirement I decided to use the ListenerContainerIdleEvent, which fires every 3 minutes according to my configuration.
My Questions are:
Is there any way to acknowledge the Kafka offset when an idle event fires? The idle event contains a reference to the KafkaMessageListenerContainer, but that encapsulates the ListenerConsumer which holds the KafkaConsumer.
Is the idle event published synchronously (on the same thread as the listener consumer)? From the code, the default implementation is SimpleApplicationEventMulticaster, which is initialized without a TaskExecutor, so it invokes the listener on the same thread. Can you confirm?
I am using spring-kafka 1.3.9.
Yes, just keep a reference to the last Acknowledgment and call acknowledge() again.
Yes, the event is published on the consumer thread by default.
Even if the event is published on a different thread (executor in the multicaster) it should still work because, instead of committing directly, the commit will be queued and processed by the consumer when it wakes from the poll.
See the logic in processAck().
In newer versions (starting with 2.0), the event has a reference to the consumer so you can interact with it directly (obtain the current position and commit it again), as long as the event is published on the consumer thread.
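For illustration, a minimal sketch of that idea using annotation-style listeners (the question implements the listener interface directly, so adapt as needed; the listener id and topic are made up, and it assumes MANUAL_IMMEDIATE ack mode and an idleEventInterval on the container, as in the question):

import org.springframework.context.event.EventListener;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.event.ListenerContainerIdleEvent;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.stereotype.Component;

@Component
public class IdleAckingListener {

    private volatile Acknowledgment lastAck;

    @KafkaListener(id = "myListener", topics = "myTopic")
    public void listen(String payload, Acknowledgment ack) {
        if (shouldAck(payload)) {
            ack.acknowledge();   // MANUAL_IMMEDIATE commit for this record
        }
        this.lastAck = ack;      // remember the latest Acknowledgment
    }

    @EventListener
    public void onIdle(ListenerContainerIdleEvent event) {
        // with several containers, filter the event for the container you care about
        Acknowledgment ack = this.lastAck;
        if (ack != null) {
            ack.acknowledge();   // commit the last seen offset again after the idle interval
        }
    }

    private boolean shouldAck(String payload) {
        return true; // placeholder for the real decision
    }
}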

How to handle offset commit failures with enable.auto.commit disabled in Spark Streaming with Kafka?

I use Spark 2.0.0 with Kafka 0.10.2.
I have an application that processes messages from Kafka as a long-running job.
From time to time I see the following message in the logs. I understand how to increase the timeout and so on, but what I want to know is: given that I do get this error, how can I recover from it?
ERROR ConsumerCoordinator: Offset commit failed.
org.apache.kafka.clients.consumer.CommitFailedException:
Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member.
This means that the time between subsequent calls to poll() was longer than the configured session.timeout.ms, which typically implies that the poll loop is spending too much time message processing.
You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
This is not about how I avoid this error but how to handle it once it occurs.
Background: In normal situations I will not see commit errors, but if I do get one I should be able to recover from it. I am using an AT_LEAST_ONCE setup, so I am completely happy with reprocessing a few messages.
I am running Java and using a direct Kafka stream with manual commits.
Creating the stream:
JavaInputDStream<ConsumerRecord<String, String>> directKafkaStream =
    KafkaUtils.createDirectStream(
        jssc,
        LocationStrategies.PreferConsistent(),
        ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));
Committing the offsets:
((CanCommitOffsets) directKafkaStream.inputDStream()).commitAsync(offsetRanges);
My understanding of the situation is that you use the Kafka Direct Stream integration (using the spark-streaming-kafka-0-10_2.11 module, as described in the Spark Streaming + Kafka Integration Guide for Kafka broker version 0.10.0 or higher).
As said in the error message:
Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member.
Kafka manages what topic partition a consumer consumes so the Direct Stream will create a pool of consumers (inside a single consumer group).
As with any consumer group you should expect rebalancing which (quoting Chapter 4. "Kafka Consumers - Reading Data from Kafka" from Kafka: The Definitive Guide):
consumers in a consumer group share ownership of the partitions in the topics they subscribe to. When we add a new consumer to the group it starts consuming messages from partitions which were previously consumed by another consumer. The same thing happens when a consumer shuts down or crashes, it leaves the group, and the partitions it used to consume will be consumed by one of the remaining consumers. Reassignment of partitions to consumers also happen when the topics the consumer group is consuming are modified, for example if an administrator adds new partitions.
There are quite a few cases in which rebalancing can occur, and it should be expected, as you have seen.
You asked:
how can I recover from it? This is not on how I escape this error but how to handle it once it occurs?
My answer would be to use the other method of CanCommitOffsets:
def commitAsync(offsetRanges: Array[OffsetRange], callback: OffsetCommitCallback): Unit
that gives you access to Kafka's OffsetCommitCallback:
OffsetCommitCallback is a callback interface that the user can implement to trigger custom actions when a commit request completes. The callback may be executed in any thread calling poll().
I think onComplete gives you a handle on how the async commit finished so you can act accordingly.
Something I can't help you with much is how to revert the changes in a Spark Streaming application when some offsets could not be committed. That, I think, requires tracking offsets and accepting that some offsets can't be committed and their records will be re-processed.
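Building on the question's directKafkaStream and offsetRanges, a small sketch of that callback might look like this (OffsetCommitCallback and its onComplete(Map<TopicPartition, OffsetAndMetadata>, Exception) method come from org.apache.kafka.clients.consumer; the logging and any retry or metrics logic are up to the application):

((CanCommitOffsets) directKafkaStream.inputDStream()).commitAsync(offsetRanges,
    new OffsetCommitCallback() {
        @Override
        public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets, Exception exception) {
            if (exception != null) {
                // The commit failed (e.g. because of a rebalance). With at-least-once
                // semantics the uncommitted records will simply be re-processed,
                // so logging (or recording a metric) and moving on is usually enough.
                System.err.println("Offset commit failed for " + offsets + ": " + exception);
            } else {
                System.out.println("Committed offsets " + offsets);
            }
        }
    });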

Detect the rebalancing of cluster via code

Is there a way to detect via code that the Hazelcast cluster is rebalancing?
Use case: Multiple producer, multiple consumer
Shut down consumer1 abruptly and see how long it takes for the other members to resume producing or consuming. In other words, I am trying to measure the time it takes for the cluster to rebalance.
AFAIK there is no direct API for it.
You could write your own SPI service which receives repartitioning events.
But can you tell a bit more about your producer consumer problem? Do you want to use an IQueue for that?
