Frequent "offset out of range" messages, partitions deserted by consumer

Frequent "offset out of range" messages, partitions deserted by consumer - java

We are running 3 node Kafka 0.10.0.1 cluster. We have a consumer application which has a single consumer group connecting to multiple topics. We are seeing strange behaviour in consumer logs. With these lines
Fetch offset 1109143 is out of range for partition email-4, resetting offset
Fetch offset 952168 is out of range for partition email-7, resetting offset
Fetch offset 945796 is out of range for partition email-5, resetting offset
Fetch offset 950900 is out of range for partition email-0, resetting offset
Fetch offset 953163 is out of range for partition email-3, resetting offset
Fetch offset 1118389 is out of range for partition email-6, resetting offset
Fetch offset 1112177 is out of range for partition email-2, resetting offset
Fetch offset 1109539 is out of range for partition email-1, resetting offset
Some time later we saw these logs
[2018-06-08 19:45:28] :: INFO :: ConsumerCoordinator:333 - Revoking previously assigned partitions [sms-4, sms-3, sms-0, sms-2, sms-1] for group notifications-consumer
[2018-06-08 19:45:28] :: INFO :: AbstractCoordinator:381 - (Re-)joining group notifications-consumer
[2018-06-08 19:45:28] :: INFO :: AbstractCoordinator$1:349 - Successfully joined group notifications-consumer with generation 3063
[2018-06-08 19:45:28] :: INFO :: AbstractCoordinator$1:349 - Successfully joined group notifications-consumer with generation 3063
[2018-06-08 19:45:28] :: INFO :: AbstractCoordinator$1:349 - Successfully joined group notifications-consumer with generation 3063
[2018-06-08 19:45:28] :: INFO :: AbstractCoordinator$1:349 - Successfully joined group notifications-consumer with generation 3063
[2018-06-08 19:45:28] :: INFO :: AbstractCoordinator$1:349 - Successfully joined group notifications-consumer with generation 3063
[2018-06-08 19:45:28] :: INFO :: AbstractCoordinator$1:349 - Successfully joined group notifications-consumer with generation 3063
[2018-06-08 19:45:28] :: INFO :: AbstractCoordinator$1:349 - Successfully joined group notifications-consumer with generation 3063
[2018-06-08 19:45:28] :: INFO :: AbstractCoordinator$1:349 - Successfully joined group notifications-consumer with generation 3063
[2018-06-08 19:45:28] :: INFO :: ConsumerCoordinator:225 - Setting newly assigned partitions [sms-8, sms-7, sms-9, sms-6, sms-5] for group notifications-consumer
I noticed that one of our topics was not seen in the list of Setting newly assigned partitions. Then that topic had no consumers attached to it for 8 hours at least. It's only when someone restarted application it started consuming from that topic. What can be going wrong here?
Here is consumer config
auto.commit.interval.ms = 3000
auto.offset.reset = latest
bootstrap.servers = [x.x.x.x:9092, x.x.x.x:9092, x.x.x.x:9092]
check.crcs = true
client.id =
connections.max.idle.ms = 540000
enable.auto.commit = true
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = otp-notifications-consumer
heartbeat.interval.ms = 3000
interceptor.classes = null
key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 50
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.ms = 50
request.timeout.ms = 305000
retry.backoff.ms = 100
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.mechanism = GSSAPI
security.protocol = SSL
send.buffer.bytes = 131072
session.timeout.ms = 300000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = /x/x/client.truststore.jks
ssl.truststore.password = [hidden]
ssl.truststore.type = JKS
value.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
The topic which went orphan has 10 partitions, retention.ms=1800000, segment.ms=1800000.
Please help.

The offset out of range message you are seeing usually indicates the offset the consumer is at has been deleted on the broker. Upon hitting that the consumer will use auto.offset.reset to restart consuming.
With retention.ms=1800000 (30mins), you are only keeping data for a very short amount of time so it's expected that if you restart the consumer after several hours, the data is gone.

Related

In kafka stream topology with 'exactly_once' processing guarantee, message lost on exception

I have a requirement where I need to process messages from Kafka without losing any message and also need to maintain the message order. Therefore, I used transactions and enabled 'exactly_once' processing guarantee in my Kafka streams topology. As I assume that the topology processing will be 'all or nothing', that the message offset is committed only after the last node successfully processed the message.
However in a failure scenario, for example when the database is down and the processor fails to store message and throws an exception. At this point, the topology dies as intended and is recreated automatically on rebalance. I assume that the topology should either re-consume the original message again from the Kafka topic OR on application restart, it should re-consume that original message from Kafka topic. However, it seems that original message disappears and is never consumed or processed after that topology died.
What do I need to do to reprocess the original message sent to Kafka topic? Or what Kafka configuration requires change? Do I need manually assign a state store and keep track of messages processed on a changelog topic?
Topology:
#Singleton
public class EventTopology extends Topology {
private final Deserializer<String> deserializer = Serdes.String().deserializer();
private final Serializer<String> serializer = Serdes.String().serializer();
private final EventLogMessageSerializer eventLogMessageSerializer;
private final EventLogMessageDeserializer eventLogMessageDeserializer;
private final EventLogProcessorSupplier eventLogProcessorSupplier;
#Inject
public EventTopology(EventsConfig eventsConfig,
EventLogMessageSerializer eventLogMessageSerializer,
EventLogMessageDeserializer eventLogMessageDeserializer,
EventLogProcessorSupplier eventLogProcessorSupplier) {
this.eventLogMessageSerializer = eventLogMessageSerializer;
this.eventLogMessageDeserializer = eventLogMessageDeserializer;
this.eventLogProcessorSupplier = eventLogProcessorSupplier;
init(eventsConfig);
}
private void init(EventsConfig eventsConfig) {
var topics = eventsConfig.getTopicConfig().getTopics();
String eventLog = topics.get("eventLog");
addSource("EventsLogSource", deserializer, eventLogMessageDeserializer, eventLog)
.addProcessor("EventLogProcessor", eventLogProcessorSupplier, "EventsLogSource");
}
}
Processor:
#Singleton
#Slf4j
public class EventLogProcessor implements Processor<String, EventLogMessage> {
private final EventLogService eventLogService;
private ProcessorContext context;
#Inject
public EventLogProcessor(EventLogService eventLogService) {
this.eventLogService = eventLogService;
}
#Override
public void init(ProcessorContext context) {
this.context = context;
}
#Override
public void process(String key, EventLogMessage value) {
log.info("Processing EventLogMessage={}", value);
try {
eventLogService.storeInDatabase(value);
context.commit();
} catch (Exception e) {
log.warn("Failed to process EventLogMessage={}", value, e);
throw e;
}
}
#Override
public void close() {
}
}
Configuration:
eventsConfig:
saveTopicsEnabled: false
topologyConfig:
environment: "LOCAL"
broker: "localhost:9093"
enabled: true
initialiseWaitInterval: 3 seconds
applicationId: "eventsTopology"
config:
auto.offset.reset: latest
session.timeout.ms: 6000
fetch.max.wait.ms: 7000
heartbeat.interval.ms: 5000
connections.max.idle.ms: 7000
security.protocol: SSL
key.serializer: org.apache.kafka.common.serialization.StringSerializer
value.serializer: org.apache.kafka.common.serialization.StringSerializer
max.poll.records: 5
processing.guarantee: exactly_once
metric.reporters: com.simple.metrics.kafka.DropwizardReporter
default.deserialization.exception.handler: org.apache.kafka.streams.errors.LogAndContinueExceptionHandler
enable.idempotence: true
request.timeout.ms: 8000
acks: all
batch.size: 16384
linger.ms: 1
enable.auto.commit: false
state.dir: "/tmp"
topicConfig:
topics:
eventLog: "EVENT-LOG-LOCAL"
kafkaTopicConfig:
partitions: 18
replicationFactor: 1
config:
retention.ms: 604800000
Test:
Feature: Feature covering the scenarios to process event log messages produced by external client.
Background:
Given event topology is healthy
Scenario: event log messages produced are successfully stored in the database
Given database is down
And the following event log messages are published
| deptId | userId | eventType | endDate | eventPayload_partner |
| dept-1 | user-1234 | CREATE | 2021-04-15T00:00:00Z | PARTNER-1 |
When database is up
And database is healthy
Then event log stored in the database as follows
| dept_id | user_id | event_type | end_date | event_payload |
| dept-1 | user-1234 | CREATE | 2021-04-15T00:00:00Z | {"partner":"PARTNER-1"} |
Logs:
INFO [data-plane-kafka-request-handler-1] kafka.coordinator.group.GroupCoordinator - [GroupCoordinator 0]: Preparing to rebalance group eventsTopology in state PreparingRebalance with old generation 0 (__consumer_offsets-0) (reason: Adding new member eventsTopology-57fdac0e-09fb-4aa0-8b0b-7e01809b31fa-StreamThread-1-consumer-96a3e980-4286-461e-8536-5f04ccb2c778 with group instance id None)
INFO [executor-Rebalance] kafka.coordinator.group.GroupCoordinator - [GroupCoordinator 0]: Stabilized group eventsTopology generation 1 (__consumer_offsets-0)
INFO [data-plane-kafka-request-handler-2] kafka.coordinator.group.GroupCoordinator - [GroupCoordinator 0]: Assignment received from leader for group eventsTopology for generation 1
INFO [data-plane-kafka-request-handler-1] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-0_0 with producerId 0 and producer epoch 0 on partition __transaction_state-4
INFO [data-plane-kafka-request-handler-6] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-0_1 with producerId 1 and producer epoch 0 on partition __transaction_state-3
...
INFO [data-plane-kafka-request-handler-0] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-0_16 with producerId 17 and producer epoch 0 on partition __transaction_state-37
INFO [data-plane-kafka-request-handler-4] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-1_1 with producerId 18 and producer epoch 0 on partition __transaction_state-42
INFO [data-plane-kafka-request-handler-6] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-1_0 with producerId 19 and producer epoch 0 on partition __transaction_state-43
...
INFO [data-plane-kafka-request-handler-3] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-1_17 with producerId 34 and producer epoch 0 on partition __transaction_state-45
INFO [data-plane-kafka-request-handler-5] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-1_16 with producerId 35 and producer epoch 0 on partition __transaction_state-46
INFO [pool-26-thread-1] ManagerClient - Manager request {uri:http://localhost:8081/healthcheck, method:GET, body:'', headers:{}}
INFO [pool-26-thread-1] ManagerClient - Manager response from with body {"Database":{"healthy":true},"eventsTopology":{"healthy":true}}
INFO [dw-admin-130] KafkaConnectionCheck - successfully connected to kafka broker: localhost:9093
INFO [kafka-producer-network-thread | EVENT-LOG-LOCAL-test-client-id] LocalTestEnvironment - Message: ProducerRecord(topic=EVENT-LOG-LOCAL, partition=null, headers=RecordHeaders(headers = [], isReadOnly = true), key=null, value={"endDate":1618444800000,"deptId":"dept-1","userId":"user-1234","eventType":"CREATE","eventPayload":{"previousEndDate":null,"partner":"PARTNER-1","info":null}}, timestamp=null) pushed onto topic: EVENT-LOG-LOCAL
INFO [eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1] EventLogProcessor - Processing EventLogMessage=EventLogMessage(endDate=Thu Apr 15 01:00:00 BST 2021, deptId=dept-1, userId=user-1234, eventType=CREATE, eventPayload=EventLogMessage.EventPayload(previousEndDate=null, partner=PARTNER-1, info=null))
WARN [eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1] EventLogProcessor - Failed to process EventLogMessage=EventLogMessage(endDate=Thu Apr 15 01:00:00 BST 2021, deptId=dept-1, userId=user-1234, eventType=CREATE, eventPayload=EventLogMessage.EventPayload(previousEndDate=null, partner=PARTNER-1, info=null))
exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)
at manager.service.EventLogService.storeInDatabase(EventLogService.java:24)
at manager.topology.processor.EventLogProcessor.process(EventLogProcessor.java:47)
at manager.topology.processor.EventLogProcessor.process(EventLogProcessor.java:19)
at org.apache.kafka.streams.processor.internals.ProcessorNode.lambda$process$2(ProcessorNode.java:142)
at org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency(StreamsMetricsImpl.java:836)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:142)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:236)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:216)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:168)
at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:96)
at org.apache.kafka.streams.processor.internals.StreamTask.lambda$process$1(StreamTask.java:679)
at org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency(StreamsMetricsImpl.java:836)
at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:679)
at org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:1033)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:690)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:551)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:510)
ERROR [eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1] org.apache.kafka.streams.processor.internals.TaskManager - stream-thread [eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1] Failed to process stream task 0_8 due to the following error:
org.apache.kafka.streams.errors.StreamsException: Exception caught in process. taskId=0_8, processor=EventsLogSource, topic=EVENT-LOG-LOCAL, partition=8, offset=0, stacktrace=exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)
ERROR [eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1] org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1] Encountered the following exception during processing and the thread is going to shut down:
org.apache.kafka.streams.errors.StreamsException: Exception caught in process. taskId=0_8, processor=EventsLogSource, topic=EVENT-LOG-LOCAL, partition=8, offset=0, stacktrace=exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)
ERROR [eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1] org.apache.kafka.streams.KafkaStreams - stream-client [eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3] All stream threads have died. The instance will be in error state and should be closed.
Exception: java.lang.IllegalStateException thrown from the UncaughtExceptionHandler in thread "eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1"
INFO [executor-Heartbeat] kafka.coordinator.group.GroupCoordinator - [GroupCoordinator 0]: Member eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1-consumer-f11ca299-2a68-4317-a559-dd1b96cd431f in group eventsTopology has failed, removing it from the group
INFO [executor-Heartbeat] kafka.coordinator.group.GroupCoordinator - [GroupCoordinator 0]: Preparing to rebalance group eventsTopology in state PreparingRebalance with old generation 1 (__consumer_offsets-0) (reason: removing member eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1-consumer-f11ca299-2a68-4317-a559-dd1b96cd431f on heartbeat expiration)
INFO [data-plane-kafka-request-handler-2] kafka.coordinator.group.GroupCoordinator - [GroupCoordinator 0]: Stabilized group eventsTopology generation 2 (__consumer_offsets-0)
INFO [data-plane-kafka-request-handler-6] kafka.coordinator.group.GroupCoordinator - [GroupCoordinator 0]: Assignment received from leader for group eventsTopology for generation 2
INFO [data-plane-kafka-request-handler-0] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-0_0 with producerId 0 and producer epoch 1 on partition __transaction_state-4
...
INFO [data-plane-kafka-request-handler-0] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-1_16 with producerId 35 and producer epoch 1 on partition __transaction_state-46
INFO [main] Cluster - New databse host localhost/127.0.0.1:59423 added
com.jayway.awaitility.core.ConditionTimeoutException: Condition defined as a lambda expression in steps.EventLogSteps
Expecting:
<0>
to be equal to:
<1>
but was not. within 20 seconds.

Kafka Streams 2.6, Partition Assignor and Rebalancing Strategy

In my current Kafka version which is 2.6, i am using Streams API and i have a question. When i start a stream, it writes Streams,Admin,Consumer and Produces configs. I noticed something strange that although i provide configuration
streamsConfiguration.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, CooperativeStickyAssignor.class.getName());
like above, i see some different strategies in consumer and stream logs.
Here is consumer logs that shows consumer configs
2021-01-20 15:52:32.611 INFO 111980 --- [alytics.event-4] o.a.k.clients.consumer.ConsumerConfig : ConsumerConfig values:
allow.auto.create.topics = true
auto.commit.interval.ms = 500
auto.offset.reset = none
bootstrap.servers = [XXX:9092, XXX:9092, XXX:9092]
check.crcs = true
client.dns.lookup = use_all_dns_ips
client.id = APPID-dd747646-8b51-42b0-8ad9-2fb26435a588-StreamThread-2-restore-consumer
client.rack =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = null
group.instance.id = null
heartbeat.interval.ms = 25000
interceptor.classes = []
internal.leave.group.on.close = false
internal.throw.on.fetch.stable.offset.unsupported = false
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 1000
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = DEBUG
metrics.sample.window.ms = 30000
partition.assignment.strategy = [org.apache.kafka.clients.consumer.CooperativeStickyAssignor]
but also i saw logs like below
2021-01-20 15:52:32.740 INFO 111980 --- [alytics.event-4] o.a.k.clients.consumer.ConsumerConfig : ConsumerConfig values:
allow.auto.create.topics = false
auto.commit.interval.ms = 500
auto.offset.reset = latest
bootstrap.servers = [XXX:9092, XXX:9092, XXX:9092]
check.crcs = true
client.dns.lookup = use_all_dns_ips
client.id = APPID-dd747646-8b51-42b0-8ad9-2fb26435a588-StreamThread-2-consumer
client.rack =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
heartbeat.interval.ms = 25000
interceptor.classes = []
internal.leave.group.on.close = false
internal.throw.on.fetch.stable.offset.unsupported = false
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 1000
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = DEBUG
metrics.sample.window.ms = 30000
partition.assignment.strategy = [org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor]
when i check those two consumer logs, i only noticed that their client.id values are different.
I am a bit consufed that did i enable CooperativeStickyAssignor or not ?
what are the differences between these two consumers that causes to use different partition assignment strategy ?
Is it normal that i see different consumer configuration in a same kafka streams application ?
Thank you

The first consumer logs in your question is that of a "restore" consumer which manages state store recovery. You can find the word "restore" in the client id.
The second consumer logs that you showed in your question is that of your own defined consumer. It seems that the strategy that is used by your consumer is "StreamsPartitionAssignor".

Fetch offset 5705 is out of range for partition , resetting offset

I am getting below info message every time in kafka consumer.
2020-07-04 14:54:27.640 INFO 1 --- [istener-0-0-C-1] c.n.o.c.h.p.n.PersistenceKafkaConsumer : beginning to consume batch messages , Message Count :11
2020-07-04 14:54:27.809 INFO 1 --- [istener-0-0-C-1] c.n.o.c.h.p.n.PersistenceKafkaConsumer : Execution Time :169
2020-07-04 14:54:27.809 INFO 1 --- [istener-0-0-C-1] essageListenerContainer$ListenerConsumer : Committing: {nbi.cm.changes.mo.test23-1=OffsetAndMetadata{offset=5705, leaderEpoch=null, metadata=''}}
2020-07-04 14:54:27.812 INFO 1 --- [istener-0-0-C-1] c.n.o.c.h.p.n.PersistenceKafkaConsumer : Acknowledgment Success
2020-07-04 14:54:27.813 INFO 1 --- [istener-0-0-C-1] o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=consumer-1, groupId=cm-persistence-notification] Fetch offset 5705 is out of range for partition nbi.cm.changes.mo.test23-1, resetting offset
2020-07-04 14:54:27.820 INFO 1 --- [istener-0-0-C-1] o.a.k.c.c.internals.SubscriptionState : [Consumer clientId=consumer-1, groupId=cm-persistence-notification] Resetting offset for partition nbi.cm.changes.mo.test23-1 to offset 666703.
Got OFFSET_OUT_OF_RANGE error in debug log and resetting to some other partition that actually not exist. Same all messages able to receive in consumer console.
But actually I committed offset before that only , offset are available in kafka , log retention policy is 24hr, so it's not deleted in kafka.
In debug log, I got below messages:
beginning to consume batch messages , Message Count :710
2020-07-02 04:58:31.486 DEBUG 1 --- [ce-notification] o.a.kafka.clients.FetchSessionHandler : [Consumer clientId=consumer-1, groupId=cm-persistence-notification] Node 1002 sent an incremental fetch response for session 253529272 with 1 response partition(s)
2020-07-02 04:58:31.486 DEBUG 1 --- [ce-notification] o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=consumer-1, groupId=cm-persistence-notification] Fetch READ_UNCOMMITTED at offset 11372 for partition nbi.cm.changes.mo.test12-1 returned fetch data (error=OFFSET_OUT_OF_RANGE, highWaterMark=-1, lastStableOffset = -1, logStartOffset = -1, preferredReadReplica = absent, abortedTransactions = null, recordsSizeInBytes=0)
When all we will get OFFSET_OUT_OF_RANGE.
Listener Class :
#KafkaListener( id = "batch-listener-0", topics = "topic1", groupId = "test", containerFactory = KafkaConsumerConfiguration.CONTAINER_FACTORY_NAME )
public void receive(
#Payload List<String> messages,
#Header( KafkaHeaders.RECEIVED_MESSAGE_KEY ) List<String> keys,
#Header( KafkaHeaders.RECEIVED_PARTITION_ID ) List<Integer> partitions,
#Header( KafkaHeaders.RECEIVED_TOPIC ) List<String> topics,
#Header( KafkaHeaders.OFFSET ) List<Long> offsets,
Acknowledgment ack )
{
long startTime = System.currentTimeMillis();
handleNotifications( messages ); // will take more than 5s to process all messages
long endTime = System.currentTimeMillis();
long timeElapsed = endTime - startTime;
LOGGER.info( "Execution Time :{}", timeElapsed );
ack.acknowledge();
LOGGER.info( "Acknowledgment Success" );
}
Do i need to close consumer here , i thought spring-kafka automatically take care those , if no could you please tell how to close in apring-kafka and also how to check if rebalance happened or not , because in DEBUG log not able to see any log related to rebalance.

I think your consumer may be rebalancing, because you are not calling consumer.close() at the end of your process.
This is a guess, but if the retention policy isn't kicking in (and the logs are not being deleted), this is the only reason I can tell for that behaviour.
Update:
As you set them as #KafkaListeners, you could just call stop() on the KafkaListenerEndpointRegistry: kafkaListenerEndpointRegistry.stop()

Consumer CommitFailedException - time between subsequent calls to poll() was longer than the configured max.poll.interval.ms

I have problem with kafka consumer which from time to time throws exception.
ERROR [*KafkaConsumerWorker] (Thread-125) [] Kafka Consumer thread 235604751 Exception while polling Kafka.: org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:820) [kafka-clients-2.3.0.jar:]
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:692) [kafka-clients-2.3.0.jar:]
at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1368) [kafka-clients-2.3.0.jar:]
at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1330) [kafka-clients-2.3.0.jar:]
at *.kafka.KafkaConsumerWorker.run(KafkaConsumerWorker.java:64) [classes:]
at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_51]
I can't find why is this happening, because the consumer is not processing any messages, while this exception occurs. These exceptions occurred 2 - 3 times daily.
Some of my consumer configurations are as follow:
allow.auto.create.topics = true
auto.commit.interval.ms = 5000
auto.offset.reset = latest
bootstrap.servers = [*]
check.crcs = true
client.dns.lookup = default
client.id = 52c94040-05d9-4b57-8006-afcc862f9b62
client.rack =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = TEST
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 10
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
Implementation:
{
logger.info("Kafka Consumer thread {} start", hashCode());
Consumer<String, Message> consumer = null;
try {
consumer = KafkaConsumerClient.createConsumer();
while (start) {
try {
ConsumerRecords<String, Message> notifications =
consumer.poll(300000);
if (!notifications.isEmpty()) {
//processing.....
}
consumer.commitSync();
} catch (Exception e) {
logger.error("Kafka Consumer thread {} Exception while polling Kafka.", hashCode(), e);
}
}
logger.info("Kafka Consumer thread {} exit", hashCode());
} finally {
if (consumer != null) {
logger.info("Kafka Consumer thread {} closing consumer.", hashCode());
consumer.close();
}
}
}
I know that with this version of the kafka clinet, the heartbeatis sent from another thread which I guess that eliminates that the consumer spent too much time for processing (even that there is nothing to process). I guess that this is something with config timeoutes but can't find which exactly.

Assuming you want to handle records in order, you should append events into an in memory queue from the consumer loop, then hand off that queue object into a completely new dequeue-ing Thread for the processing... logic
The error suggests whatever you're doing there is slow enough to stop and rebalance your consumer
I'd also recommend a higher level library that can handle backpressure such as the Connect / Streams API or Vertx or Smallrye Messaging or Akka Streams

You should set the Duration of Consumer#poll(Duration) to lower than max.poll.interval.ms which is the maximum of time Consumer can stay idle before fetching more records. In Kafka document:
If poll() is not called before expiration of this timeout, then the consumer is considered failed and the group will rebalance in order to reassign the partitions to another member
By the time you you commit your offset, your consumer is already failed, and partition had already revoked, group is rebalancing.

Java Producer not recognizing property "Partitioner.class" property

I am running Cloudera provided Kafka version 0.9.0 and I have written a custom Producer
I am adding the following properties to my ProducerConfig:
Properties props = new Properties();
props.put("bootstrap.servers", brokers);
props.put("acks", "all");
//props.put("metadata.broker.list", this.getBrokers());
//props.put("serializer.class", "kafka.serializer.StringEncoder");
props.put("retries", 0);
props.put("batch.size", 16384);
props.put("linger.ms", 1);
props.put("buffer.memory", 33554432);
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("partitioner.class", "com.vikas.MyPartitioner");
However when I am running my program, I am getting this error
16/08/23 18:00:33 INFO producer.ProducerConfig: ProducerConfig values:
value.serializer = class org.apache.kafka.common.serialization.StringSerializer
key.serializer = class org.apache.kafka.common.serialization.StringSerializer
block.on.buffer.full = true
retry.backoff.ms = 100
buffer.memory = 33554432
batch.size = 16384
metrics.sample.window.ms = 30000
metadata.max.age.ms = 300000
receive.buffer.bytes = 32768
timeout.ms = 30000
max.in.flight.requests.per.connection = 5
bootstrap.servers = [server01:9092, server02:9092]
metric.reporters = []
client.id =
compression.type = none
retries = 0
max.request.size = 1048576
send.buffer.bytes = 131072
acks = all
reconnect.backoff.ms = 10
linger.ms = 1
metrics.num.samples = 2
metadata.fetch.timeout.ms = 60000
16/08/23 18:00:33 WARN producer.ProducerConfig: The configuration partitioner.class = null was supplied but isn't a known config.
As per the documentation on http://kafka.apache.org/090/documentation.html#producerconfigs
partitioner,class is a valid property but I am not sure why is kafka complaining about it being an unknown config.

Take a look at this article http://www.javaworld.com/article/3066873/big-data/big-data-messaging-with-kafka-part-2.html it has sample on how to use custom partitioner.
As you can see the error message is
"16/08/23 18:00:33 WARN producer.ProducerConfig: The configuration partitioner.class = null was supplied but isn't a known config." which means Kafka does not understand the key partitioner.class
You might want to set partitioner like this
configProperties.put(ProducerConfig.PARTITIONER_CLASS_CONFIG,"com.vikas.MyPartitioner");

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Frequent "offset out of range" messages, partitions deserted by consumer - java

Related

In kafka stream topology with 'exactly_once' processing guarantee, message lost on exception

Kafka Streams 2.6, Partition Assignor and Rebalancing Strategy

Fetch offset 5705 is out of range for partition , resetting offset

Consumer CommitFailedException - time between subsequent calls to poll() was longer than the configured max.poll.interval.ms

Java Producer not recognizing property "Partitioner.class" property

Categories

Resources