This may be a stupid question, but I am curious whether it is possible to disable consumer rebalancing in Spring Kafka. Imagine I have a topic with 3 partitions and 3 different consumers running, using @KafkaListener on different TopicPartitions. Is it possible to prevent rebalancing from happening if one of the consumers is down? (I want to do manual offset management, and when the consumer comes back up I want to start from where I left off.)
You can have three different consumers, one per partition, with auto commit disabled, and commit the offsets manually. Whenever a consumer stops and restarts it will read from its previous offset, and because each consumer is assigned to a specific partition, the partitions will not be rebalanced across the other consumers.
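For illustration, a minimal sketch of such a listener, assuming Spring Kafka with the container's ack mode set to MANUAL; the topic name "orders", the listener id, and process() are placeholders. Because the partition is assigned explicitly, the listener does not take part in group management, so no rebalance happens when another instance dies:

@KafkaListener(id = "ordersPartition0",
        topicPartitions = @TopicPartition(topic = "orders", partitions = "0"))
public void listen(ConsumerRecord<String, String> record, Acknowledgment ack) {
    process(record);   // application-specific processing
    ack.acknowledge(); // manual offset commit; requires enable.auto.commit=false
}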
I would like to better understand the Kafka message retry process.
I have heard that failed processing of consumed messages can be addressed using 2 options:
SeekToCurrentErrorHandler (offset reset)
publishing a message to a Dead Letter Queue (DLQ)
The 2nd option is pretty clear: if a message fails to be processed, it is simply pushed to an error queue. I am more curious about the first option.
AFAIK, the 1st option is the most widely used one, but how does it work when multiple consumers concurrently consume messages from the same topic? Is it the case that if a particular message fails, the offset for that consumer is reset to the message's offset? And what happens to the messages successfully processed at the same time as, or after, the failed one? Will they be re-processed?
How would you advise me to deal with message retries?
Each partition can only be consumed by one consumer.
When you have multiple consumers, you must have at least that number of partitions.
The offset is maintained for each partition; the error handler will (can) only perform seeks on the partitions that are assigned to this consumer.
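As a rough sketch of how the first option is typically wired up in Spring Kafka (exact method and constructor signatures vary between versions; the bean wiring and the retry count of 3 are just illustrative), combined with a dead-letter recoverer for the second option:

@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
        ConsumerFactory<String, String> consumerFactory,
        KafkaTemplate<Object, Object> template) {
    ConcurrentKafkaListenerContainerFactory<String, String> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory);
    // On failure, seek the failed record's partition back so it is redelivered;
    // after 3 failed attempts, hand the record to the dead-letter recoverer instead.
    factory.setErrorHandler(new SeekToCurrentErrorHandler(
            new DeadLetterPublishingRecoverer(template), 3));
    return factory;
}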
As far as I understand, the best way to organise a broadcast (i.e. every consumer receives all messages)
is to give each consumer its own consumer group id.
The problem is that "If all consumers in a group leave the group, the group is automatically destroyed"
(source: https://jaceklaskowski.gitbooks.io/apache-kafka/kafka-properties-group-id.html),
which means that if my consumer goes down, the corresponding entry with the key groupId,topicName,partitionNumber in __consumer_offsets
will be removed, so when the consumer comes back up with the same group id, it won't be able to read the messages that were sent while it was down.
Does anyone know a solution to this?
"The group is automatically destroyed" does not mean that all information about the group disappears; I think it refers to data kept in memory. Offset information is not removed from __consumer_offsets. Old offsets are removed based on the offsets.retention.minutes broker property, which defaults to 7 days (10080 minutes).
In the Apache Kafka documentation you can find the offsets.retention.minutes property under the broker configs:
offsets.retention.minutes - After a consumer group loses all its consumers (i.e. becomes empty) its offsets will be kept for this retention period before getting discarded. For standalone consumers (using manual assignment), offsets will be expired after the time of last commit plus this retention period.
This means that if no consumer from a particular group connects for offsets.retention.minutes minutes, the group's offset information will be deleted.
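So if you need the offsets of an empty group to survive longer, the knob is on the broker side, for example (the value 20160 is just an illustrative choice of 14 days):

# server.properties (broker): keep offsets of empty groups for 14 days instead of the default 7
offsets.retention.minutes=20160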
I am working on implementing a Kafka based solution to our application.
As per the Kafka documentation, what I understand is that one consumer in a consumer group (which is a thread) is internally mapped to one partition of the subscribed topic.
Let's say I have a topic with 40 partitions and a high-level consumer running in 4 instances. I do not want one instance to consume the same messages consumed by another instance, but if one instance goes down, the other three instances should be able to process all the messages.
Should I go for the same consumer group with 10 threads per instance?
- Stack Overflow says that using the same consumer group across instances acts as a traditional synchronous queue mechanism:
In Apache Kafka why can't there be more consumer instances than partitions?
Or should I go for a different consumer group per instance?
Using the simple (low-level) consumer gives control over the partitions, but then if one instance goes down, the other three instances would not process the messages from the partitions consumed by the first instance.
First, to explain the concept of Consumers & Consumer Groups:
Consumers label themselves with a consumer group name, and each record published to a topic is delivered to one consumer instance within each subscribing consumer group.
The records will be effectively load balanced over the consumer instances in a consumer group. If all the consumer instances have different consumer groups, then each record will be broadcast to all the consumer processes.
The way consumption is implemented in Kafka is by dividing up the partitions in the log over the consumer instances so that each instance is the exclusive consumer of a "fair share" of partitions at any point in time. If new instances join the group they will take over some partitions from other members of the group; if an instance dies, its partitions will be distributed to the remaining instances.
Now to answer your questions,
1. I do not want one instance to consume the same messages consumed by another instance. But if one instance goes down, the other three instances should be able to process all the messages.
This is possible by default in the Kafka architecture. You just have to label all 4 instances with the same consumer group name (see the sketch after this list).
2. Should I go for the same consumer group with 10 threads per instance?
Doing this will assign each thread a Kafka partition from which it will consume data, which is optimal. Reducing the number of threads will rebalance the record distribution among the consumer instances and MAY overload some of them.
3. In Apache Kafka why can't there be more consumer instances than partitions?
In Kafka, a partition can be assigned to only one consumer instance within a group. Thus, creating more consumer instances than partitions will lead to idle consumers that will not consume any records from Kafka.
4. Should I go for a different consumer group per instance?
No. This will lead to duplication of the records, as every record will be delivered to every instance because they belong to different consumer groups.
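To make answer 1 concrete, here is a minimal sketch of one such consumer instance; the broker address, topic name, and process() call are placeholders, and the only thing all four instances must share is the group.id:

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");  // placeholder address
props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-consumer-group");     // identical on all 4 instances
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("my-topic"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    records.forEach(record -> process(record));  // application-specific processing
}

Kafka then spreads the 40 partitions over whichever instances in the group are alive, and reassigns them automatically if an instance dies.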
Hope this clarifies your doubts.
There are a few things to note when designing your Kafka ecosystem:
A consumer is essentially a thread, and you do not want multiple threads trying to change your offset mark. That's why the consumer system should be designed as one consumer per thread.
Offset commits: there is a delicate balance in how frequently you perform offset commits. If the frequency is too high, it will have an adverse effect on the performance of your system (ZooKeeper will be the bottleneck). If the frequency is too low, you risk duplicate messages.
In Kafka you have both the competing-consumers and the publish-subscribe patterns:
competing consumers: put the consumers inside the same consumer group, so that each partition is consumed by only one consumer (a consumer can of course read more than one partition). This means you can't usefully have more consumers than partitions in a consumer group, because the extra consumers will sit idle without being assigned any partition. If one consumer in the group goes down, one of the idle consumers will take over its partitions.
publish-subscribe: consumers in different consumer groups will each receive the same messages. Within each consumer group, the pattern above still applies.
I use Spark 2.0.0 with Kafka 0.10.2.
I have an application that is processing messages from Kafka and is a long running job.
From time to time I see the following message in the logs. I understand how I can increase the timeout and so on, but what I wanted to know is: given that I do get this error, how can I recover from it?
ERROR ConsumerCoordinator: Offset commit failed.
org.apache.kafka.clients.consumer.CommitFailedException:
Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member.
This means that the time between subsequent calls to poll() was longer than the configured session.timeout.ms, which typically implies that the poll loop is spending too much time message processing.
You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
This is not about how I avoid this error, but how to handle it once it occurs.
Background: In normal situations I will not see commit errors, but if I do get one I should be able to recover from it. I am using an AT_LEAST_ONCE setup, so I am completely happy with reprocessing a few messages.
I am running Java and using the Kafka direct stream with manual commits.
Creating the stream:
JavaInputDStream<ConsumerRecord<String, String>> directKafkaStream =
KafkaUtils.createDirectStream(
jssc,
LocationStrategies.PreferConsistent(),
ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));
Committing the offsets:
((CanCommitOffsets) directKafkaStream.inputDStream()).commitAsync(offsetRanges);
My understanding of the situation is that you use the Kafka Direct Stream integration (using spark-streaming-kafka-0-10_2.11 module as described in Spark Streaming + Kafka Integration Guide (Kafka broker version 0.10.0 or higher)).
As said in the error message:
Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member.
Kafka manages what topic partition a consumer consumes so the Direct Stream will create a pool of consumers (inside a single consumer group).
As with any consumer group you should expect rebalancing which (quoting Chapter 4. "Kafka Consumers - Reading Data from Kafka" from Kafka: The Definitive Guide):
consumers in a consumer group share ownership of the partitions in the topics they subscribe to. When we add a new consumer to the group it starts consuming messages from partitions which were previously consumed by another consumer. The same thing happens when a consumer shuts down or crashes, it leaves the group, and the partitions it used to consume will be consumed by one of the remaining consumers. Reassignment of partitions to consumers also happen when the topics the consumer group is consuming are modified, for example if an administrator adds new partitions.
There are quite a few cases in which rebalancing can occur, and it should be expected. And that is what you are seeing.
You asked:
how can I recover from it? This is not about how I avoid this error, but how to handle it once it occurs.
My answer would be to use the other method of CanCommitOffsets:
def commitAsync(offsetRanges: Array[OffsetRange], callback: OffsetCommitCallback): Unit
that gives you access to Kafka's OffsetCommitCallback:
OffsetCommitCallback is a callback interface that the user can implement to trigger custom actions when a commit request completes. The callback may be executed in any thread calling poll().
I think onComplete gives you a handle on how the async commit finished, so you can act accordingly.
Something I can't help you with much is how to revert the changes in a Spark Streaming application when some offsets could not be committed. I think that requires tracking offsets and accepting the case where some offsets cannot be committed and their records are re-processed.
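A rough sketch of what that callback-based commit could look like in the Java code above, using the Kafka client's OffsetCommitCallback; the error handling here is just illustrative:

((CanCommitOffsets) directKafkaStream.inputDStream()).commitAsync(
    offsetRanges,
    new OffsetCommitCallback() {
        @Override
        public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets, Exception exception) {
            if (exception != null) {
                // Commit failed (e.g. after a rebalance): record these offsets so the
                // corresponding batch can be re-processed, which is fine for AT_LEAST_ONCE.
                System.err.println("Offset commit failed for " + offsets + ": " + exception);
            }
        }
    });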
I have two consumers with different client IDs and group IDs. Aside from the retention hours and max partitions, my Kafka installation uses the default configuration. I've looked around to see if anyone else has had the same issue, but can't pull up any results.
So the scenario goes like this:
Consumer A:
Connects to Kafka, consumes about 3 million messages that need to be consumed, and then sits idle waiting for more messages.
Consumer B:
Different client / group ID, connects to the same Kafka topic, and this causes consumer A to get a repeat of the 3 million messages while consumer B consumes them as well.
The two consumers are two completely different Java applications with different client and group ID's running on the same computer. The Kafka server is on another computer.
Is this a normal behavior in Kafka? I am at a complete loss.
Here is my consumer config:
bootstrap.servers=192.168.110.109:9092
acks=all
max.block.ms=2000
retries=0
batch.size=16384
auto.commit.interval.ms=1000
linger.ms=0
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
block.on.buffer.full=true
enable.auto.commit=false
auto.offset.reset=none
session.timeout.ms=30000
zookeeper.session.timeout=100000
rebalance.backoff.ms=8000
group.id=consumerGroupA
zookeeper.connect=192.168.110.109:2181
poll.interval=100
And the obvious difference in my consumer B is the group.id=consumerGroupB
This is correct behavior, because based on your configs your consumers never commit the offsets of the records they have read.
When a consumer reads a record, it must commit the offset. You can have the consumer commit offsets automatically by setting enable.auto.commit=true, or commit each record manually. In this case I think auto commit is fine for you.
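Concretely, with the config you posted, that would mean either turning auto commit back on:
enable.auto.commit=true
auto.commit.interval.ms=1000
or keeping enable.auto.commit=false and calling consumer.commitSync() (or commitAsync()) in your poll loop after the records have been processed.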