Attempt to heartbeat failed: Multiple consumers with single rest proxy instance - java

I'm playing around with Kafka topics and the REST Proxy and ran into a behavior I don't understand.
I have 1 ZooKeeper, 1 Kafka broker, 1 Schema Registry and 1 REST Proxy instance. Then I create a topic with 1 partition and 2 consumers that read from it.
I created a topic with the following command:
kafka-topics --create --if-not-exists --zookeeper localhost:32181 --partitions 4 --replication-factor 1 --topic my-ttopic
Then I tried to read from both consumers. As soon as the second consumer is created and an attempt to read with it is made, a rebalance is triggered and the read just hangs. Kafka REST in turn produces tons of logs like this:
[2020-10-11 09:29:30,645] INFO [Consumer clientId=consumer-grps-2, groupId=grps] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2020-10-11 09:29:30,649] INFO [Consumer clientId=consumer-grps-2, groupId=grps] Join group failed with org.apache.kafka.common.errors.MemberIdRequiredException: The group member needs to have a valid member id before actually entering a consumer group. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2020-10-11 09:29:30,649] INFO [Consumer clientId=consumer-grps-2, groupId=grps] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2020-10-11 09:29:30,650] INFO [GroupCoordinator 2]: Preparing to rebalance group grps in state PreparingRebalance with old generation 1 (__consumer_offsets-14) (reason: Adding new member consumer-grps-2-0b19385b-df80-4d11-8eaa-7c313354a131 with group instance id None) (kafka.coordinator.group.GroupCoordinator)
[2020-10-10 20:44:51,713] INFO [Consumer clientId=consumer-grps-2, groupId=grps] Attempt to heartbeat failed since group is rebalancing (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2020-10-10 20:44:54,726] INFO [Consumer clientId=consumer-grps-2, groupId=grps] Attempt to heartbeat failed since group is rebalancing (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2020-10-10 20:44:57,739] INFO [Consumer clientId=consumer-grps-2, groupId=grps] Attempt to heartbeat failed since group is rebalancing (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2020-10-10 20:45:00,752] INFO [Consumer clientId=consumer-grps-2, groupId=grps] Attempt to heartbeat failed since group is rebalancing (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2020-10-10 20:45:03,766] INFO [Consumer clientId=consumer-grps-2, groupId=grps] Attempt to heartbeat failed since group is rebalancing (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2020-10-10 20:45:06,780] INFO [Consumer clientId=consumer-grps-2, groupId=grps] Attempt to heartbeat failed since group is rebalancing (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2020-10-10 20:45:09,793] INFO [Consumer clientId=consumer-grps-2, groupId=grps] Attempt to heartbeat failed since group is rebalancing (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2020-10-10 20:45:12,806] INFO [Consumer clientId=consumer-grps-2, groupId=grps] Attempt to heartbeat failed since group is rebalancing (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2020-10-10 20:45:15,818] INFO [Consumer clientId=consumer-grps-2, groupId=grps] Attempt to heartbeat failed since group is rebalancing (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2020-10-10 20:45:18,829] INFO [Consumer clientId=consumer-grps-2, groupId=grps] Attempt to heartbeat failed since group is rebalancing (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2020-10-10 20:45:21,841] INFO [Consumer clientId=consumer-grps-2, groupId=grps] Attempt to heartbeat failed since group is rebalancing (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2020-10-10 20:45:24,854] INFO [Consumer clientId=consumer-grps-2, groupId=grps] Attempt to heartbeat failed since group is rebalancing (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2020-10-10 20:45:27,867] INFO [Consumer clientId=consumer-grps-2, groupId=grps] Attempt to heartbeat failed since group is rebalancing (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2020-10-10 20:45:30,880] INFO [Consumer clientId=consumer-grps-2, groupId=grps] Attempt to heartbeat failed since group is rebalancing (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2020-10-10 20:45:33,893] INFO [Consumer clientId=consumer-grps-2, groupId=grps] Attempt to heartbeat failed since group is rebalancing (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2020-10-10 20:45:36,905] INFO [Consumer clientId=consumer-grps-2, groupId=grps] Attempt to heartbeat failed since group is rebalancing (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
Can you give any advice on what the problem might be?
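To narrow things down, it may help to reproduce the group behavior with the plain Java client, bypassing the REST Proxy. Below is a minimal sketch, assuming the broker listens on localhost:9092 and reusing the group id grps from the logs (the bootstrap address is an assumption, adjust it to your setup). Run it twice to get two members of the same group.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class GroupCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption: adjust to your broker listener
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "grps");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-ttopic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                records.forEach(r -> System.out.printf("partition=%d offset=%d value=%s%n",
                        r.partition(), r.offset(), r.value()));
            }
        }
    }
}

If the two plain consumers join the group and split the partitions without the endless rebalance loop, the problem is more likely in how the REST Proxy consumer instances are created and kept alive than in the broker itself.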

Related

Kafka CooperativeStickyAssignor revokes/assigns partition in one rebalance cycle

I have an application that runs 6 consumers in parallel. I am getting some unexpected results when I use CooperativeStickyAssignor.
If I understand the mechanism correctly, when a consumer loses a partition in one rebalance cycle, the partition will be assigned in the next rebalance cycle.
This assumption is based on the RebalanceProtocol documentation and a few blog posts that describe the protocol, like this one on the Confluent blog.
The assignor should not reassign any owned partitions immediately, but
instead may indicate consumers the need for partition revocation so
that the revoked partitions can be reassigned to other consumers in
the next rebalance event. This is designed for sticky assignment logic
which attempts to minimize partition reassignment with cooperative
adjustments.
Any member that revoked partitions then rejoins the group, triggering
a second rebalance so that its revoked partitions can be assigned.
Until then, these partitions are unowned and unassigned.
These are the logs from the application that uses protocol='cooperative-sticky'. In the same rebalance cycle (generationId=640), partition 74 moves from consumer-3 to consumer-4. I omitted the lines logged by the other 4 consumers.
Note that the log is in reverse order (bottom to top).
2022-12-14 11:18:24 1 --- [consumer-3] x.y.z.MyRebalanceHandler1 : New partition assignment: partition-59, seek to min common offset: 85120524
2022-12-14 11:18:24 1 --- [consumer-3] x.y.z.MyRebalanceHandler2 : Partitions [partition-59] assigned successfully
2022-12-14 11:18:24 1 --- [consumer-3] x.y.z.MyRebalanceHandler1 : Partitions assigned: [partition-59]
2022-12-14 11:18:24 1 --- [consumer-3] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=partition-3-my-client-id-my-group-id, groupId=my-group-id] Adding newly assigned partitions: partition-59
2022-12-14 11:18:24 1 --- [consumer-3] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=partition-3-my-client-id-my-group-id, groupId=my-group-id] Notifying assignor about the new Assignment(partitions=[partition-59])
2022-12-14 11:18:24 1 --- [consumer-3] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=partition-3-my-client-id-my-group-id, groupId=my-group-id] Request joining group due to: need to revoke partitions [partition-26, partition-74] as indicated by the current assignment and re-join
2022-12-14 11:18:24 1 --- [consumer-3] x.y.z.MyRebalanceHandler2 : Partitions [partition-26, partition-74] revoked successfully
2022-12-14 11:18:24 1 --- [consumer-3] x.y.z.MyRebalanceHandler1 : Finished removing partition data
2022-12-14 11:18:24 1 --- [consumer-4] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=partition-4-my-client-id-my-group-id, groupId=my-group-id] (Re-)joining group
2022-12-14 11:18:24 1 --- [consumer-4] x.y.z.MyRebalanceHandler1 : New partition assignment: partition-74, seek to min common offset: 107317730
2022-12-14 11:18:24 1 --- [consumer-4] x.y.z.MyRebalanceHandler2 : Partitions [partition-74] assigned successfully
2022-12-14 11:18:24 1 --- [consumer-4] x.y.z.MyRebalanceHandler1 : Partitions assigned: [partition-74]
2022-12-14 11:18:24 1 --- [consumer-4] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=partition-4-my-client-id-my-group-id, groupId=my-group-id] Adding newly assigned partitions: partition-74
2022-12-14 11:18:24 1 --- [consumer-4] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=partition-4-my-client-id-my-group-id, groupId=my-group-id] Notifying assignor about the new Assignment(partitions=[partition-74])
2022-12-14 11:18:24 1 --- [consumer-4] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=partition-4-my-client-id-my-group-id, groupId=my-group-id] Request joining group due to: need to revoke partitions [partition-57] as indicated by the current assignment and re-join
2022-12-14 11:18:24 1 --- [consumer-4] x.y.z.MyRebalanceHandler2 : Partitions [partition-57] revoked successfully
2022-12-14 11:18:24 1 --- [consumer-4] x.y.z.MyRebalanceHandler1 : Finished removing partition data
2022-12-14 11:18:22 1 --- [consumer-3] x.y.z.MyRebalanceHandler1 : Partitions revoked: [partition-26, partition-74]
2022-12-14 11:18:22 1 --- [consumer-3] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=partition-3-my-client-id-my-group-id, groupId=my-group-id] Revoke previously assigned partitions partition-26, partition-74
2022-12-14 11:18:22 1 --- [consumer-3] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=partition-3-my-client-id-my-group-id, groupId=my-group-id] Updating assignment with\n\tAssigned partitions: [partition-59]\n\tCurrent owned partitions: [partition-26, partition-74]\n\tAdded partitions (assigned - owned): [partition-59]\n\tRevoked partitions (owned - assigned): [partition-26, partition-74]
2022-12-14 11:18:22 1 --- [consumer-3] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=partition-3-my-client-id-my-group-id, groupId=my-group-id] Successfully synced group in generation Generation{generationId=640, memberId='partition-3-my-client-id-my-group-id-c31afd19-3f22-43cb-ad07-9088aa98d3af', protocol='cooperative-sticky'}
2022-12-14 11:18:22 1 --- [consumer-3] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=partition-3-my-client-id-my-group-id, groupId=my-group-id] Successfully joined group with generation Generation{generationId=640, memberId='partition-3-my-client-id-my-group-id-c31afd19-3f22-43cb-ad07-9088aa98d3af', protocol='cooperative-sticky'}
2022-12-14 11:18:22 1 --- [consumer-4] x.y.z.MyRebalanceHandler1 : Partitions revoked: [partition-57]
2022-12-14 11:18:22 1 --- [consumer-4] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=partition-4-my-client-id-my-group-id, groupId=my-group-id] Revoke previously assigned partitions partition-57
2022-12-14 11:18:22 1 --- [consumer-4] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=partition-4-my-client-id-my-group-id, groupId=my-group-id] Updating assignment with\n\tAssigned partitions: [partition-74]\n\tCurrent owned partitions: [partition-57]\n\tAdded partitions (assigned - owned): [partition-74]\n\tRevoked partitions (owned - assigned): [partition-57]
2022-12-14 11:18:21 1 --- [id-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=partition-4-my-client-id-my-group-id, groupId=my-group-id] Successfully synced group in generation Generation{generationId=640, memberId='partition-4-my-client-id-my-group-id-ae2af665-edc9-4a8e-b658-98372d142477', protocol='cooperative-sticky'}
2022-12-14 11:18:21 1 --- [consumer-4] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=partition-4-my-client-id-my-group-id, groupId=my-group-id] Successfully joined group with generation Generation{generationId=640, memberId='partition-4-my-client-id-my-group-id-ae2af665-edc9-4a8e-b658-98372d142477', protocol='cooperative-sticky'}
What am I missing here?
I expect that the partition gets revoked in one rebalance cycle and gets assigned in the next.
Kafka client version is 3.2.1.
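For reference, here is a minimal sketch of how the cooperative protocol is typically wired up so the two-phase revoke/assign can be observed per member. The topic name and bootstrap address below are assumptions, not taken from the question:

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.CooperativeStickyAssignor;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collection;
import java.util.Collections;
import java.util.Properties;

public class CooperativeStickyDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumption
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group-id");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Enables the cooperative (incremental) rebalance protocol.
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, CooperativeStickyAssignor.class.getName());

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("my-topic"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                // With cooperative rebalancing, only the partitions actually moving away are passed here.
                System.out.println("Revoked: " + partitions);
            }
            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                // Only newly added partitions are passed here; retained partitions are not re-announced.
                System.out.println("Assigned: " + partitions);
            }
        });
        while (true) {
            consumer.poll(Duration.ofSeconds(1));
        }
    }
}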

Spring Kafka multiple topic for one class dynamically

I recently wanted to add a new behavior in my project that uses spring-kafka.
The idea is really simple :
App1 create a new scenario name "SCENARIO_1" and publish this string in the topic "NEW_SCENARIO"
App1 publish some message on topic "APP2-SCENARIO_1" and "APP3-SCENARIO_1"
App2 (group-id=app2) listens on NEW_SCENARIO and creates a new consumer<Object,String> listening on a new topic "APP2-SCENARIO_1"
App3 (group-id=app3) listens on NEW_SCENARIO and creates a new consumer<Object,String> listening on a new topic "APP3-SCENARIO_1"
The goal is to dynamically create new topics and consumers. I cannot use the Spring Kafka annotation since I need this to be dynamic, so I did this:
@KafkaListener(topics = ScenarioTopics.NEW_SCENARIO)
public void receive(final String topic) {
    logger.info("Get new scenario " + topic + ", creating new consumer");
    TopicPartitionOffset topicPartitionOffset = new TopicPartitionOffset(
            "APP2_" + topic, 1, 0L);
    ContainerProperties containerProps = new ContainerProperties(topicPartitionOffset);
    containerProps.setMessageListener((MessageListener<Object, String>) message -> {
        // process my message
    });
    KafkaMessageListenerContainer<Object, String> container =
            new KafkaMessageListenerContainer<>(kafkaPeopleConsumerFactory, containerProps);
    container.start();
}
And this does not work. I'm probably missing something, but I can't figure out what.
Here are some logs telling me that the leader is not available, which is weird since I got the new scenario event.
2022-03-14 18:08:26.057 INFO 21892 --- [ntainer#0-0-C-1] o.l.b.v.c.c.i.k.KafkaScenarioListener : Get new scenario W4BdDBEowY, creating new consumer
2022-03-14 18:08:26.061 INFO 21892 --- [ntainer#0-0-C-1] o.a.k.clients.consumer.ConsumerConfig : ConsumerConfig values:
allow.auto.create.topics = true
[...lot of things...]
value.deserializer = class org.springframework.kafka.support.serializer.JsonDeserializer
2022-03-14 18:08:26.067 INFO 21892 --- [ntainer#0-0-C-1] o.a.kafka.common.utils.AppInfoParser : Kafka version: 3.0.0
2022-03-14 18:08:26.067 INFO 21892 --- [ntainer#0-0-C-1] o.a.kafka.common.utils.AppInfoParser : Kafka commitId: 8cb0a5e9d3441962
2022-03-14 18:08:26.067 INFO 21892 --- [ntainer#0-0-C-1] o.a.kafka.common.utils.AppInfoParser : Kafka startTimeMs: 1647277706067
2022-03-14 18:08:26.068 INFO 21892 --- [ntainer#0-0-C-1] o.a.k.clients.consumer.KafkaConsumer : [Consumer clientId=consumer-people-creator-2, groupId=people-creator] Subscribed to partition(s): PEOPLE_W4BdDBEowY-1
2022-03-14 18:08:26.072 INFO 21892 --- [ -C-1] o.a.k.clients.consumer.KafkaConsumer : [Consumer clientId=consumer-people-creator-2, groupId=people-creator] Seeking to offset 0 for partition PEOPLE_W4BdDBEowY-1
2022-03-14 18:08:26.081 WARN 21892 --- [ -C-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-people-creator-2, groupId=people-creator] Error while fetching metadata with correlation id 2 : {PEOPLE_W4BdDBEowY=LEADER_NOT_AVAILABLE}
2022-03-14 18:08:26.081 INFO 21892 --- [ -C-1] org.apache.kafka.clients.Metadata : [Consumer clientId=consumer-people-creator-2, groupId=people-creator] Cluster ID: ebyKy-RVSRmUDaaeQqMaQg
2022-03-14 18:18:04.882 WARN 21892 --- [ -C-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-people-creator-2, groupId=people-creator] Error while fetching metadata with correlation id 5314 : {PEOPLE_W4BdDBEowY=LEADER_NOT_AVAILABLE}
2022-03-14 18:18:04.997 WARN 21892 --- [ -C-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-people-creator-2, groupId=people-creator] Error while fetching metadata with correlation id 5315 : {PEOPLE_W4BdDBEowY=LEADER_NOT_AVAILABLE}
How do I dynamically create a Kafka consumer on a topic? I think I'm doing it very wrong, but I searched a lot and really didn't find anything.
There are several answers here about dynamically creating containers...
Trigger one Kafka consumer by using values of another consumer In Spring Kafka
Kafka Consumer in spring can I re-assign partitions programmatically?
Create consumer dynamically spring kafka
Dynamically start and off KafkaListener just to load previous messages at the start of a session
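As a hedged sketch along the same lines (not code from any of those answers), a container can also be created dynamically by subscribing to the whole topic rather than a fixed TopicPartitionOffset, and by making sure the topic exists first, since the LEADER_NOT_AVAILABLE warnings suggest the topic is still being auto-created (or that partition 1 does not exist). The topic naming, group id, bootstrap address and the injected consumerFactory below are assumptions:

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.listener.ContainerProperties;
import org.springframework.kafka.listener.KafkaMessageListenerContainer;
import org.springframework.kafka.listener.MessageListener;

import java.util.Collections;
import java.util.Properties;

public class ScenarioConsumerStarter {

    private final ConsumerFactory<Object, String> consumerFactory;   // assumption: same factory as in the question

    public ScenarioConsumerStarter(ConsumerFactory<Object, String> consumerFactory) {
        this.consumerFactory = consumerFactory;
    }

    public void startScenarioConsumer(String scenario) throws Exception {
        String topicName = "APP2-" + scenario;                        // assumption: matches the producer's naming

        // Create the topic up front so the consumer does not spin on LEADER_NOT_AVAILABLE
        // while waiting for auto-creation (error handling for an existing topic omitted for brevity).
        Properties adminProps = new Properties();
        adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption
        try (AdminClient admin = AdminClient.create(adminProps)) {
            admin.createTopics(Collections.singletonList(new NewTopic(topicName, 1, (short) 1))).all().get();
        }

        // Subscribe to the topic itself instead of partition 1, which may not exist.
        ContainerProperties containerProps = new ContainerProperties(topicName);
        containerProps.setGroupId("app2");                            // assumption
        containerProps.setMessageListener((MessageListener<Object, String>) message ->
                System.out.println("Received: " + message.value()));

        KafkaMessageListenerContainer<Object, String> container =
                new KafkaMessageListenerContainer<>(consumerFactory, containerProps);
        container.start();
    }
}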

While Using Kafka-Client getting this type of logs on console

I am getting the logs below in my console. Publishing and consuming messages works fine, but this happens every time and the logs below keep printing continuously.
10:18:06.884 [main] DEBUG org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-test-1, groupId=test] Sending asynchronous auto-commit of offsets {shayona-0=OffsetAndMetadata{offset=11349, leaderEpoch=0, metadata=''}}
10:18:06.884 [main] DEBUG org.apache.kafka.clients.NetworkClient - [Consumer clientId=consumer-test-1, groupId=test] Sending OFFSET_COMMIT request with header RequestHeader(apiKey=OFFSET_COMMIT, apiVersion=8, clientId=consumer-test-1, correlationId=1093) and timeout 30000 to node 2147482646: {group_id=test,generation_id=18,member_id=consumer-test-1-52154059-bfce-41f8-b05e-2e6973910aa9,group_instance_id=null,topics=[{name=shayona,partitions=[{partition_index=0,committed_offset=11349,committed_leader_epoch=0,committed_metadata=,_tagged_fields={}}],_tagged_fields={}}],_tagged_fields={}}
10:18:06.886 [main] DEBUG org.apache.kafka.clients.NetworkClient - [Consumer clientId=consumer-test-1, groupId=test] Received OFFSET_COMMIT response from node 2147482646 for request with header RequestHeader(apiKey=OFFSET_COMMIT, apiVersion=8, clientId=consumer-test-1, correlationId=1093): OffsetCommitResponseData(throttleTimeMs=0, topics=[OffsetCommitResponseTopic(name='shayona', partitions=[OffsetCommitResponsePartition(partitionIndex=0, errorCode=0)])])
10:18:06.886 [main] DEBUG org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-test-1, groupId=test] Committed offset 11349 for partition shayona-0
10:18:06.886 [main] DEBUG org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-test-1, groupId=test] Completed asynchronous auto-commit of offsets {shayona-0=OffsetAndMetadata{offset=11349, leaderEpoch=0, metadata=''}}
10:18:07.177 [main] DEBUG org.apache.kafka.clients.NetworkClient - [Consumer clientId=consumer-test-1, groupId=test] Received FETCH response from node 1001 for request with header RequestHeader(apiKey=FETCH, apiVersion=12, clientId=consumer-test-1, correlationId=1092): org.apache.kafka.common.requests.FetchResponse#2c715e84
10:18:07.177 [main] DEBUG org.apache.kafka.clients.FetchSessionHandler - [Consumer clientId=consumer-test-1, groupId=test] Node 1001 sent an incremental fetch response with throttleTimeMs = 0 for session 1022872780 with 0 response partition(s), 1 implied partition(s)
10:18:07.177 [main] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=consumer-test-1, groupId=test] Added READ_UNCOMMITTED fetch request for partition shayona-0 at position FetchPosition{offset=11349, offsetEpoch=Optional[0], currentLeader=LeaderAndEpoch{leader=Optional[localhost:9092 (id: 1001 rack: null)], epoch=0}} to node localhost:9092 (id: 1001 rack: null)
10:18:07.177 [main] DEBUG org.apache.kafka.clients.FetchSessionHandler - [Consumer clientId=consumer-test-1, groupId=test] Built incremental fetch (sessionId=1022872780, epoch=1035) for node 1001. Added 0 partition(s), altered 0 partition(s), removed 0 partition(s) out of 1 partition(s)
10:18:07.177 [main] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=consumer-test-1, groupId=test] Sending READ_UNCOMMITTED IncrementalFetchRequest(toSend=(), toForget=(), implied=(shayona-0)) to broker localhost:9092 (id: 1001 rack: null)
10:18:07.177 [main] DEBUG org.apache.kafka.clients.NetworkClient - [Consumer clientId=consumer-test-1, groupId=test] Sending FETCH request with header RequestHeader(apiKey=FETCH, apiVersion=12, clientId=consumer-test-1, correlationId=1094) and timeout 30000 to node 1001: {replica_id=-1,max_wait_ms=500,min_bytes=1,max_bytes=52428800,isolation_level=0,session_id=1022872780,session_epoch=1035,topics=[],forgotten_topics_data=[],rack_id=,_tagged_fields={}}
Help me get out of this. Below is my consumer application.
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");
props.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.setProperty("group.id", "test");
props.setProperty("enable.auto.commit", "true");
props.setProperty("auto.commit.interval.ms", "1000");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

String[] topics = {"shayona"};
consumer.subscribe(Arrays.asList(topics));

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
    }
}
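Those lines are all logged at DEBUG level by the Kafka client, so they appear because the logging backend has DEBUG enabled for org.apache.kafka, not because anything is wrong with the consumer code. As a minimal sketch, assuming Log4j 2 is the SLF4J binding in use (adjust for whatever backend you actually have), the client loggers could be raised to INFO programmatically:

import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.core.config.Configurator;

public class LoggingSetup {
    public static void quietKafkaClients() {
        // Raise all org.apache.kafka.* loggers to INFO so routine auto-commit/fetch
        // DEBUG lines are no longer printed; a logging config file works just as well.
        Configurator.setLevel("org.apache.kafka", Level.INFO);
    }
}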

Kafka streams application failing to start

I have a Kafka Streams application that is reading data off a topic that is produced to with a console producer. There are a number of steps in the application that produce two KTables which I then wish to join.
Each KTable is produced successfully, and I can even call toStream and then peek the values to the console individually. As soon as I try to join the KTables together, the application fails to even launch, i.e. introducing the line bar.join(qux).toStream() causes the panic below. It looks like KTables bar and qux are produced.
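For context, here is a minimal sketch of the kind of KTable-KTable join in question; the topic names, types and topology below are assumptions, not the actual application:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KTable;

import java.util.Properties;

public class JoinSketch {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Hypothetical source topics standing in for the KTables called "bar" and "qux" above.
        KTable<String, String> bar = builder.table("bar-topic");
        KTable<String, String> qux = builder.table("qux-topic");

        // The join step that introduces extra internal topics and a state store.
        bar.join(qux, (b, q) -> b + "|" + q)
           .toStream()
           .peek((k, v) -> System.out.println(k + " -> " + v));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "foo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        new KafkaStreams(builder.build(), props).start();
    }
}

The stack trace below shows the StreamsPartitionAssignor failing to compute a partition count for one of the internal topics created for such a join, which matches the KAFKA-9335 issue referenced at the end of this question.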
Here is the output that I receive as error message:
2020-02-14 15:56:28.599 INFO AssignorConfiguration - stream-thread [foo-d2f546ef-f7eb-4088-ae04-1943ed71f7a4-StreamThread-1-consumer] Cooperative rebalancing enabled now
2020-02-14 15:56:28.630 WARN ConsumerConfig - The configuration 'admin.retries' was supplied but isn't a known config.
2020-02-14 15:56:28.630 WARN ConsumerConfig - The configuration 'admin.retry.backoff.ms' was supplied but isn't a known config.
2020-02-14 15:56:28.630 INFO AppInfoParser - Kafka version: 2.4.0
2020-02-14 15:56:28.630 INFO AppInfoParser - Kafka commitId: 77a89fcf8d7fa018
2020-02-14 15:56:28.630 INFO AppInfoParser - Kafka startTimeMs: 1581695788630
2020-02-14 15:56:28.636 INFO KafkaStreams - stream-client [foo-d2f546ef-f7eb-4088-ae04-1943ed71f7a4] State transition from CREATED to REBALANCING
2020-02-14 15:56:28.636 INFO StreamThread - stream-thread [foo-d2f546ef-f7eb-4088-ae04-1943ed71f7a4-StreamThread-1] Starting
2020-02-14 15:56:28.636 INFO StreamThread - stream-thread [foo-d2f546ef-f7eb-4088-ae04-1943ed71f7a4-StreamThread-1] State transition from CREATED to STARTING
2020-02-14 15:56:28.637 INFO KafkaConsumer - [Consumer clientId=foo-d2f546ef-f7eb-4088-ae04-1943ed71f7a4-StreamThread-1-consumer, groupId=foo] Subscribed to pattern: 'foo-KSTREAM-AGGREGATE-STATE-STORE-0000000009-repartition|foo-KSTREAM-AGGREGATE-STATE-STORE-0000000016-repartition|foo-KSTREAM-AGGREGATE-STATE-STORE-0000000022-repartition|foo-KSTREAM-AGGREGATE-STATE-STORE-0000000029-repartition|data'
2020-02-14 15:56:28.906 INFO Metadata - [Consumer clientId=foo-d2f546ef-f7eb-4088-ae04-1943ed71f7a4-StreamThread-1-consumer, groupId=foo] Cluster ID: ghhNsZUZRSGD984ra7fXRg
2020-02-14 15:56:28.907 INFO AbstractCoordinator - [Consumer clientId=foo-d2f546ef-f7eb-4088-ae04-1943ed71f7a4-StreamThread-1-consumer, groupId=foo] Discovered group coordinator 10.1.36.24:9092 (id: 2147483647 rack: null)
2020-02-14 15:56:28.915 INFO AbstractCoordinator - [Consumer clientId=foo-d2f546ef-f7eb-4088-ae04-1943ed71f7a4-StreamThread-1-consumer, groupId=foo] (Re-)joining group
2020-02-14 15:56:28.920 INFO AbstractCoordinator - [Consumer clientId=foo-d2f546ef-f7eb-4088-ae04-1943ed71f7a4-StreamThread-1-consumer, groupId=foo] (Re-)joining group
2020-02-14 15:56:28.925 ERROR StreamThread - stream-thread [foo-d2f546ef-f7eb-4088-ae04-1943ed71f7a4-StreamThread-1] Encountered the following error during processing:
java.lang.IllegalArgumentException: Number of partitions must be at least 1.
at org.apache.kafka.streams.processor.internals.InternalTopicConfig.setNumberOfPartitions(InternalTopicConfig.java:62) ~[kafka-streams-2.4.0.jar:?]
at org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor.assign(StreamsPartitionAssignor.java:473) ~[kafka-streams-2.4.0.jar:?]
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.performAssignment(ConsumerCoordinator.java:548) ~[kafka-clients-2.4.0.jar:?]
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.onJoinLeader(AbstractCoordinator.java:650) ~[kafka-clients-2.4.0.jar:?]
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.access$1300(AbstractCoordinator.java:111) ~[kafka-clients-2.4.0.jar:?]
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:572) ~[kafka-clients-2.4.0.jar:?]
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:555) ~[kafka-clients-2.4.0.jar:?]
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:1026) ~[kafka-clients-2.4.0.jar:?]
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:1006) ~[kafka-clients-2.4.0.jar:?]
at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:204) ~[kafka-clients-2.4.0.jar:?]
at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:167) ~[kafka-clients-2.4.0.jar:?]
at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:127) ~[kafka-clients-2.4.0.jar:?]
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:599) ~[kafka-clients-2.4.0.jar:?]
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:409) ~[kafka-clients-2.4.0.jar:?]
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:294) ~[kafka-clients-2.4.0.jar:?]
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233) ~[kafka-clients-2.4.0.jar:?]
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:212) ~[kafka-clients-2.4.0.jar:?]
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:400) ~[kafka-clients-2.4.0.jar:?]
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:340) ~[kafka-clients-2.4.0.jar:?]
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:471) ~[kafka-clients-2.4.0.jar:?]
at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1267) ~[kafka-clients-2.4.0.jar:?]
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1231) ~[kafka-clients-2.4.0.jar:?]
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1211) ~[kafka-clients-2.4.0.jar:?]
at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:843) ~[kafka-streams-2.4.0.jar:?]
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:743) ~[kafka-streams-2.4.0.jar:?]
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:698) ~[kafka-streams-2.4.0.jar:?]
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:671) ~[kafka-streams-2.4.0.jar:?]
2020-02-14 15:56:28.925 INFO StreamThread - stream-thread [foo-d2f546ef-f7eb-4088-ae04-1943ed71f7a4-StreamThread-1] State transition from STARTING to PENDING_SHUTDOWN
2020-02-14 15:56:28.925 INFO StreamThread - stream-thread [foo-d2f546ef-f7eb-4088-ae04-1943ed71f7a4-StreamThread-1] Shutting down
2020-02-14 15:56:28.925 INFO KafkaConsumer - [Consumer clientId=foo-d2f546ef-f7eb-4088-ae04-1943ed71f7a4-StreamThread-1-restore-consumer, groupId=null] Unsubscribed all topics or patterns and assigned partitions
2020-02-14 15:56:28.932 INFO StreamThread - stream-thread [foo-d2f546ef-f7eb-4088-ae04-1943ed71f7a4-StreamThread-1] State transition from PENDING_SHUTDOWN to DEAD
2020-02-14 15:56:28.932 INFO KafkaStreams - stream-client [foo-d2f546ef-f7eb-4088-ae04-1943ed71f7a4] State transition from REBALANCING to ERROR
2020-02-14 15:56:28.932 ERROR KafkaStreams - stream-client [foo-d2f546ef-f7eb-4088-ae04-1943ed71f7a4] All stream threads have died. The instance will be in error state and should be closed.
2020-02-14 15:56:28.932 INFO StreamThread - stream-thread [foo-d2f546ef-f7eb-4088-ae04-1943ed71f7a4-StreamThread-1] Shutdown complete
2020-02-14 15:56:28.934 INFO KafkaStreams - stream-client [foo-d2f546ef-f7eb-4088-ae04-1943ed71f7a4] State transition from ERROR to PENDING_SHUTDOWN
2020-02-14 15:56:28,935 kafka-streams-close-thread WARN [AsyncContext#18b4aac2] Ignoring log event after log4j was shut down.
2020-02-14 15:56:28,936 kafka-streams-close-thread WARN Ignoring log event after log4j was shut down
2020-02-14 15:56:28,938 kafka-streams-close-thread WARN [AsyncContext#18b4aac2] Ignoring log event after log4j was shut down.
2020-02-14 15:56:28,939 kafka-streams-close-thread WARN Ignoring log event after log4j was shut down
2020-02-14 15:56:28,939 Thread-1 WARN [AsyncContext#18b4aac2] Ignoring log event after log4j was shut down.
2020-02-14 15:56:28,939 Thread-1 WARN Ignoring log event after log4j was shut down
What's the cause of this? Is there some magic config I need to include to deal with the extra state store of the join I am trying to introduce?
Downgrading to v2.3.1 fixed the issue, as found here: issues.apache.org/jira/browse/KAFKA-9335
Thank you to Spasoje Petronijević.

Kafka Resiliency - Group Coordinator

As I understand, one of the brokers is selected as the group coordinator which takes care of consumer rebalancing.
Discovered coordinator host:9092 (id: 2147483646 rack: null) for group good_group
I have 3 nodes with replication factor of 3 and 3 partitions.
Everything is great and when I kill kafka on non-coordinator nodes, consumer is still receiving messages.
But when I kill that specific node with coordinator, rebalancing is not happening and my java consumer app does not receive any messages.
2018-05-29 16:34:22.668 INFO AbstractCoordinator:555 - Discovered coordinator host:9092 (id: 2147483646 rack: null) for group good_group.
2018-05-29 16:34:22.689 INFO AbstractCoordinator:600 - Marking the coordinator host:9092 (id: 2147483646 rack: null) dead for group good_group
2018-05-29 16:34:22.801 INFO AbstractCoordinator:555 - Discovered coordinator host:9092 (id: 2147483646 rack: null) for group good_group.
2018-05-29 16:34:22.832 INFO AbstractCoordinator:600 - Marking the coordinator host:9092 (id: 2147483646 rack: null) dead for group good_group
2018-05-29 16:34:22.933 INFO AbstractCoordinator:555 - Discovered coordinator host:9092 (id: 2147483646 rack: null) for group good_group.
2018-05-29 16:34:23.044 WARN ConsumerCoordinator:535 - Auto offset commit failed for group good_group: Offset commit failed with a retriable exception. You should retry committing offsets.
Am I doing something wrong and is there a way around this?
But when I kill that specific node with coordinator, rebalancing is not happening and my java consumer app does not receive any messages.
The group coordinator receives heartbeats from all consumers in the consumer group. It maintains a list of active consumers and initiates the rebalancing on the change of this list. Then the group leader executes the rebalance activity.
That's why the rebalancing will stop if you kill the group coordinator.
UPDATE
If the group coordinator broker shuts down, ZooKeeper is notified and an election automatically promotes a new group coordinator from the remaining active brokers. So the coordinator election itself is not the issue. Let's look at the log:
2018-05-29 16:34:23.044 WARN ConsumerCoordinator:535 - Auto offset commit failed for group good_group: Offset commit failed with a retriable exception. You should retry committing offsets.
The replication factor of the internal topic __consumer_offsets probably still has the default value of 1. Check what values default.replication.factor and offsets.topic.replication.factor have in the brokers' server.properties files. If they are 1, they should be increased. Otherwise, when the broker acting as group coordinator shuts down, the offset manager it hosts stops with no backup replica, so offsets can no longer be committed.
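To confirm this, the replication factor of the offsets topic can be checked with the admin client; a minimal sketch, assuming one of the surviving brokers is reachable at host:9092:

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;

import java.util.Collections;
import java.util.Properties;

public class OffsetsTopicCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "host:9092"); // assumption: adjust to a live broker
        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription desc = admin.describeTopics(Collections.singletonList("__consumer_offsets"))
                    .values().get("__consumer_offsets").get();
            int replicationFactor = desc.partitions().get(0).replicas().size();
            System.out.println("__consumer_offsets replication factor: " + replicationFactor);
            // If this prints 1, losing the broker hosting the offsets partition(s) leaves the group
            // with no offset manager, which matches the commit failures in the log above.
        }
    }
}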
