I'm learning Kafka Streams and I want to build a simple application that reads lines of text from one topic and writes the number of letter occurrences to InfluxDB. My second goal is to run it in parallel: I would like to have two running instances of the Kafka Streams application, both processing input at the same time but from different partitions. Unfortunately, after I launch both instances only one of them does any work; the second one just waits.
My configuration is as follows:
* input topic has 4 partitions
* Stream application has a property num.stream.threads set to 2
Here is the source code:
public class LineStatisticsStream {

    static Pattern letterPattern = Pattern.compile("[a-z]");
    InfluxDB influxDB;

    public static void main(String[] args) {
        new LineStatisticsStream().start();
    }

    Properties getStreamProperties() {
        Properties properties = new Properties();
        properties.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        properties.put(StreamsConfig.APPLICATION_ID_CONFIG, "lineStatistics4");
        properties.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);
        properties.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 2);
        return properties;
    }

    Topology getStreamTopology(ForeachAction<String, Long> action) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> linesStream = builder.stream("lines", Consumed.with(Serdes.String(), Serdes.String()));
        linesStream.flatMapValues(this::findAllLetters)
                .map((k, v) -> KeyValue.pair(v, v))
                .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
                .count()
                .toStream()
                .to("letterCount", Produced.with(Serdes.String(), Serdes.Long()));
        KStream<String, Long> countStream = builder.stream("letterCount", Consumed.with(Serdes.String(), Serdes.Long()));
        countStream.peek(action);
        return builder.build();
    }

    private void insertToInfluxDB(String letter, Long count) {
        influxDB.write(Point.measurement("lettersCount")
                .tag("letter", letter)
                .time(System.currentTimeMillis(), TimeUnit.MILLISECONDS)
                .addField("count", count)
                .build()
        );
    }

    private List<String> findAllLetters(String s) {
        List<String> letters = new ArrayList<>();
        Matcher matcher = letterPattern.matcher(s.toLowerCase());
        while (matcher.find()) {
            letters.add(matcher.group(0));
        }
        return letters;
    }

    private void start() {
        influxDB = InfluxDBFactory.connect("http://127.0.0.1:8086");
        influxDB.setDatabase("letters");
        influxDB.enableBatch(BatchOptions.DEFAULTS);
        final KafkaStreams streams = new KafkaStreams(getStreamTopology(this::insertToInfluxDB), getStreamProperties());
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            streams.close();
            influxDB.close();
        }));
    }
}
And here are the last few lines of the log from the "waiting" instance:
2019-11-18 21:27:16 INFO ConsumerCoordinator:982 - [Consumer clientId=lineStatistics4-5e5e2a71-2a3b-4e51-9d4c-4e470a8a8099-StreamThread-1-consumer, groupId=lineStatistics4] Found no committed offset for partition lines4-2
2019-11-18 21:27:16 INFO ConsumerCoordinator:982 - [Consumer clientId=lineStatistics4-5e5e2a71-2a3b-4e51-9d4c-4e470a8a8099-StreamThread-2-consumer, groupId=lineStatistics4] Found no committed offset for partition lines4-1
2019-11-18 21:27:16 INFO RocksDBTimestampedStore:82 - Opening store KSTREAM-AGGREGATE-STATE-STORE-0000000003 in regular mode
2019-11-18 21:27:16 INFO RocksDBTimestampedStore:82 - Opening store KSTREAM-AGGREGATE-STATE-STORE-0000000003 in regular mode
2019-11-18 21:27:16 INFO KafkaConsumer:1068 - [Consumer clientId=lineStatistics4-5e5e2a71-2a3b-4e51-9d4c-4e470a8a8099-StreamThread-2-restore-consumer, groupId=null] Unsubscribed all topics or patterns and assigned partitions
2019-11-18 21:27:16 INFO KafkaConsumer:1068 - [Consumer clientId=lineStatistics4-5e5e2a71-2a3b-4e51-9d4c-4e470a8a8099-StreamThread-2-restore-consumer, groupId=null] Unsubscribed all topics or patterns and assigned partitions
2019-11-18 21:27:16 INFO StreamThread:212 - stream-thread [lineStatistics4-5e5e2a71-2a3b-4e51-9d4c-4e470a8a8099-StreamThread-2] State transition from PARTITIONS_ASSIGNED to RUNNING
2019-11-18 21:27:16 INFO KafkaConsumer:1068 - [Consumer clientId=lineStatistics4-5e5e2a71-2a3b-4e51-9d4c-4e470a8a8099-StreamThread-1-restore-consumer, groupId=null] Unsubscribed all topics or patterns and assigned partitions
2019-11-18 21:27:16 INFO KafkaConsumer:1068 - [Consumer clientId=lineStatistics4-5e5e2a71-2a3b-4e51-9d4c-4e470a8a8099-StreamThread-1-restore-consumer, groupId=null] Unsubscribed all topics or patterns and assigned partitions
2019-11-18 21:27:16 INFO StreamThread:212 - stream-thread [lineStatistics4-5e5e2a71-2a3b-4e51-9d4c-4e470a8a8099-StreamThread-1] State transition from PARTITIONS_ASSIGNED to RUNNING
2019-11-18 21:27:16 INFO KafkaStreams:263 - stream-client [lineStatistics4-5e5e2a71-2a3b-4e51-9d4c-4e470a8a8099] State transition from REBALANCING to RUNNING
2019-11-18 21:27:16 INFO ConsumerCoordinator:982 - [Consumer clientId=lineStatistics4-5e5e2a71-2a3b-4e51-9d4c-4e470a8a8099-StreamThread-2-consumer, groupId=lineStatistics4] Found no committed offset for partition lines4-1
2019-11-18 21:27:16 INFO ConsumerCoordinator:982 - [Consumer clientId=lineStatistics4-5e5e2a71-2a3b-4e51-9d4c-4e470a8a8099-StreamThread-1-consumer, groupId=lineStatistics4] Found no committed offset for partition lines4-2
2019-11-18 21:27:16 INFO ConsumerCoordinator:525 - [Consumer clientId=lineStatistics4-5e5e2a71-2a3b-4e51-9d4c-4e470a8a8099-StreamThread-2-consumer, groupId=lineStatistics4] Setting offset for partition lineStatistics4-KSTREAM-AGGREGATE-STATE-STORE-0000000003-repartition-0 to the committed offset FetchPosition{offset=73, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=localhost:9092 (id: 1 rack: null), epoch=0}}
2019-11-18 21:27:16 INFO ConsumerCoordinator:525 - [Consumer clientId=lineStatistics4-5e5e2a71-2a3b-4e51-9d4c-4e470a8a8099-StreamThread-1-consumer, groupId=lineStatistics4] Setting offset for partition lines4-0 to the committed offset FetchPosition{offset=63, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=localhost:9092 (id: 1 rack: null), epoch=0}}
2019-11-18 21:27:16 INFO ConsumerCoordinator:525 - [Consumer clientId=lineStatistics4-5e5e2a71-2a3b-4e51-9d4c-4e470a8a8099-StreamThread-1-consumer, groupId=lineStatistics4] Setting offset for partition lineStatistics4-KSTREAM-AGGREGATE-STATE-STORE-0000000003-repartition-3 to the committed offset FetchPosition{offset=40, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=localhost:9092 (id: 1 rack: null), epoch=0}}
2019-11-18 21:27:17 INFO SubscriptionState:348 - [Consumer clientId=lineStatistics4-5e5e2a71-2a3b-4e51-9d4c-4e470a8a8099-StreamThread-2-consumer, groupId=lineStatistics4] Resetting offset for partition lines4-1 to offset 0.
2019-11-18 21:27:17 INFO SubscriptionState:348 - [Consumer clientId=lineStatistics4-5e5e2a71-2a3b-4e51-9d4c-4e470a8a8099-StreamThread-1-consumer, groupId=lineStatistics4] Resetting offset for partition lines4-2 to offset 0.
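When diagnosing this kind of imbalance, it can help to log which tasks and partitions each stream thread actually received on each instance. A minimal diagnostic sketch (assuming Kafka Streams 2.x, where KafkaStreams#localThreadsMetadata() is available; logLocalAssignment is a hypothetical helper, not part of the code above):

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.processor.ThreadMetadata;

// Call this once the client reaches RUNNING to see which partitions of "lines"
// (and of the repartition topic) the local threads were assigned.
static void logLocalAssignment(KafkaStreams streams) {
    for (ThreadMetadata thread : streams.localThreadsMetadata()) {
        thread.activeTasks().forEach(task ->
                System.out.printf("thread=%s task=%s partitions=%s%n",
                        thread.threadName(), task.taskId(), task.topicPartitions()));
    }
}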
Related
I'm trying to reset a partition's offset to 0, like this:
final KafkaConsumer<String, String> stateConsumer = new KafkaConsumer<>(stateConsumerProperties.getConsumerProps());
stateConsumer.subscribe(STATE_TOPIC);
...
stateConsumer.seekToBeginning(stateConsumer.assignment());
...
stateConsumer.poll(Duration.ofMillis(1000)) // timeout too long, testing only
        .forEach(record -> {
            log.info("Warmup state read: " + record.value() + ", partition: " + record.partition());
            stateMessages.add(record.value());
        });
Consumer config is only this:
this.stateConsumerProperties.setConsumer(Map.of(
        ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092",
        ConsumerConfig.GROUP_ID_CONFIG, "state",
        ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest",
        ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer",
        ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer",
        ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "30",
        ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, "2000",
        ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, CustomPartitionAssignor.class.getName()
));
The "custom" assignor is only for logging purposes; these are the only overrides:
public class CustomPartitionAssignor extends CooperativeStickyAssignor {
    ...

    @Override
    public GroupAssignment assign(Cluster metadata, GroupSubscription groupSubscription) {
        return super.assign(metadata, groupSubscription);
    }

    @Override
    public void onAssignment(Assignment assignment, ConsumerGroupMetadata metadata) {
        super.onAssignment(assignment, metadata);
        log.info("Old assigned partitions: " + Arrays.toString(ASSIGNED_PARTITIONS.toArray()));
        ASSIGNED_PARTITIONS = assignment.partitions();
        log.info("New Assigned partitions: " + assignment.partitions());
    }

    ...
}
I have only two identical consumers for testing and each consumer runs this code on join.
The first joiner has no issue since there is no state yet (the "consumers" read another topic and produce "state" to the state topic).
Problem is, this is what I get in the log when the second consumer joins:
consumer_2 | 2022-10-10 18:46:26.833 INFO 7 --- [pool-1-thread-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-state-2, groupId=state] Setting offset for partition state-1 to the committed offset FetchPosition{offset=361, offsetEpoch=Optional[0], currentLeader=LeaderAndEpoch{leader=Optional[179bd3c6448e:9092 (id: 1001 rack: null)], epoch=0}}
consumer_2 | 2022-10-10 18:46:26.875 INFO 7 --- [pool-1-thread-1] c.company.fdi.poc.kafka.service.Consumer : Warmup state read: state 723, partition: 1
consumer_2 | 2022-10-10 18:46:26.876 INFO 7 --- [pool-1-thread-1] c.company.fdi.poc.kafka.service.Consumer : Warmup state read: state 725, partition: 1
consumer_2 | 2022-10-10 18:46:26.876 INFO 7 --- [pool-1-thread-1] c.company.fdi.poc.kafka.service.Consumer : Warmup state read: state 727, partition: 1
State "production" is just an atomic counter that starts at 0.
What I want to happen is that when a new consumer joins, it resets its assigned partition offset to 0 and starts reading from the first record.
I have a suspicion that this has something to do with the CooperativeStickyAssignor, but I have no clue as of now.
You can assume 2 consumers and 2 partitions for the state topic if that helps; the cluster should be as balanced as possible.
The starting setup is two partitions and one consumer; then I add another consumer.
Any help much appreciated, thanks in advance.
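For context, the usual pattern for "start from the beginning of whatever I get assigned" is to do the seek inside a ConsumerRebalanceListener passed to subscribe(), because the assignment only materialises during poll(). A minimal sketch (generic consumer code, not the snippet above; the topic name and props are placeholders):

import java.util.Collection;
import java.util.List;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(List.of("state"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // nothing to do here for this sketch
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // With the CooperativeStickyAssignor this receives only the newly added
        // partitions, which are exactly the ones we want to rewind.
        consumer.seekToBeginning(partitions);
    }
});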
I have a requirement to process messages from Kafka without losing any message and while maintaining message order. Therefore, I use transactions and have enabled the 'exactly_once' processing guarantee in my Kafka Streams topology, assuming that topology processing is 'all or nothing', i.e. the message offset is committed only after the last node has successfully processed the message.
However, consider a failure scenario: for example, the database is down, so the processor fails to store the message and throws an exception. At this point the topology dies as intended and is recreated automatically on rebalance. I would expect the topology to re-consume the original message from the Kafka topic, either right away or after an application restart. However, it seems that the original message disappears and is never consumed or processed again after the topology died.
What do I need to do to reprocess the original message sent to the Kafka topic? Or which Kafka configuration needs to change? Do I need to manually assign a state store and keep track of processed messages in a changelog topic?
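For reference, a minimal sketch (assuming Kafka Streams 2.8+, which is newer than the stack traces below suggest; topology and properties are placeholders, not the project classes) of one way to keep the instance alive after such a processing failure, so the task is retried instead of the whole client shutting down:

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.errors.StreamsUncaughtExceptionHandler.StreamThreadExceptionResponse;

KafkaStreams streams = new KafkaStreams(topology, properties);

// Replace the failed stream thread instead of letting the client die.
// Because the exactly_once transaction is aborted, the failed record's offset is
// not committed; whether the record is re-read afterwards still depends on the
// committed offsets and, for partitions without any commit, on auto.offset.reset.
streams.setUncaughtExceptionHandler(exception -> StreamThreadExceptionResponse.REPLACE_THREAD);

streams.start();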
Topology:
@Singleton
public class EventTopology extends Topology {

    private final Deserializer<String> deserializer = Serdes.String().deserializer();
    private final Serializer<String> serializer = Serdes.String().serializer();

    private final EventLogMessageSerializer eventLogMessageSerializer;
    private final EventLogMessageDeserializer eventLogMessageDeserializer;
    private final EventLogProcessorSupplier eventLogProcessorSupplier;

    @Inject
    public EventTopology(EventsConfig eventsConfig,
                         EventLogMessageSerializer eventLogMessageSerializer,
                         EventLogMessageDeserializer eventLogMessageDeserializer,
                         EventLogProcessorSupplier eventLogProcessorSupplier) {
        this.eventLogMessageSerializer = eventLogMessageSerializer;
        this.eventLogMessageDeserializer = eventLogMessageDeserializer;
        this.eventLogProcessorSupplier = eventLogProcessorSupplier;
        init(eventsConfig);
    }

    private void init(EventsConfig eventsConfig) {
        var topics = eventsConfig.getTopicConfig().getTopics();
        String eventLog = topics.get("eventLog");
        addSource("EventsLogSource", deserializer, eventLogMessageDeserializer, eventLog)
                .addProcessor("EventLogProcessor", eventLogProcessorSupplier, "EventsLogSource");
    }
}
Processor:
@Singleton
@Slf4j
public class EventLogProcessor implements Processor<String, EventLogMessage> {

    private final EventLogService eventLogService;
    private ProcessorContext context;

    @Inject
    public EventLogProcessor(EventLogService eventLogService) {
        this.eventLogService = eventLogService;
    }

    @Override
    public void init(ProcessorContext context) {
        this.context = context;
    }

    @Override
    public void process(String key, EventLogMessage value) {
        log.info("Processing EventLogMessage={}", value);
        try {
            eventLogService.storeInDatabase(value);
            context.commit();
        } catch (Exception e) {
            log.warn("Failed to process EventLogMessage={}", value, e);
            throw e;
        }
    }

    @Override
    public void close() {
    }
}
Configuration:
eventsConfig:
  saveTopicsEnabled: false
  topologyConfig:
    environment: "LOCAL"
    broker: "localhost:9093"
    enabled: true
    initialiseWaitInterval: 3 seconds
    applicationId: "eventsTopology"
    config:
      auto.offset.reset: latest
      session.timeout.ms: 6000
      fetch.max.wait.ms: 7000
      heartbeat.interval.ms: 5000
      connections.max.idle.ms: 7000
      security.protocol: SSL
      key.serializer: org.apache.kafka.common.serialization.StringSerializer
      value.serializer: org.apache.kafka.common.serialization.StringSerializer
      max.poll.records: 5
      processing.guarantee: exactly_once
      metric.reporters: com.simple.metrics.kafka.DropwizardReporter
      default.deserialization.exception.handler: org.apache.kafka.streams.errors.LogAndContinueExceptionHandler
      enable.idempotence: true
      request.timeout.ms: 8000
      acks: all
      batch.size: 16384
      linger.ms: 1
      enable.auto.commit: false
      state.dir: "/tmp"
  topicConfig:
    topics:
      eventLog: "EVENT-LOG-LOCAL"
  kafkaTopicConfig:
    partitions: 18
    replicationFactor: 1
    config:
      retention.ms: 604800000
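One thing worth noting in this config is auto.offset.reset: latest. Under exactly_once an aborted transaction leaves the offset uncommitted, and for a partition that never had a committed offset the consumer falls back to this reset policy. A minimal sketch (illustrative only; assumes the Streams Properties are also built in code somewhere) of pinning the embedded consumer to earliest instead:

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;
import java.util.Properties;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "eventsTopology");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9093");
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
// consumerPrefix() targets the consumer embedded in Kafka Streams
props.put(StreamsConfig.consumerPrefix(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG), "earliest");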
Test:
Feature: Feature covering the scenarios to process event log messages produced by external client.
Background:
Given event topology is healthy
Scenario: event log messages produced are successfully stored in the database
Given database is down
And the following event log messages are published
| deptId | userId | eventType | endDate | eventPayload_partner |
| dept-1 | user-1234 | CREATE | 2021-04-15T00:00:00Z | PARTNER-1 |
When database is up
And database is healthy
Then event log stored in the database as follows
| dept_id | user_id | event_type | end_date | event_payload |
| dept-1 | user-1234 | CREATE | 2021-04-15T00:00:00Z | {"partner":"PARTNER-1"} |
Logs:
INFO [data-plane-kafka-request-handler-1] kafka.coordinator.group.GroupCoordinator - [GroupCoordinator 0]: Preparing to rebalance group eventsTopology in state PreparingRebalance with old generation 0 (__consumer_offsets-0) (reason: Adding new member eventsTopology-57fdac0e-09fb-4aa0-8b0b-7e01809b31fa-StreamThread-1-consumer-96a3e980-4286-461e-8536-5f04ccb2c778 with group instance id None)
INFO [executor-Rebalance] kafka.coordinator.group.GroupCoordinator - [GroupCoordinator 0]: Stabilized group eventsTopology generation 1 (__consumer_offsets-0)
INFO [data-plane-kafka-request-handler-2] kafka.coordinator.group.GroupCoordinator - [GroupCoordinator 0]: Assignment received from leader for group eventsTopology for generation 1
INFO [data-plane-kafka-request-handler-1] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-0_0 with producerId 0 and producer epoch 0 on partition __transaction_state-4
INFO [data-plane-kafka-request-handler-6] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-0_1 with producerId 1 and producer epoch 0 on partition __transaction_state-3
...
INFO [data-plane-kafka-request-handler-0] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-0_16 with producerId 17 and producer epoch 0 on partition __transaction_state-37
INFO [data-plane-kafka-request-handler-4] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-1_1 with producerId 18 and producer epoch 0 on partition __transaction_state-42
INFO [data-plane-kafka-request-handler-6] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-1_0 with producerId 19 and producer epoch 0 on partition __transaction_state-43
...
INFO [data-plane-kafka-request-handler-3] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-1_17 with producerId 34 and producer epoch 0 on partition __transaction_state-45
INFO [data-plane-kafka-request-handler-5] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-1_16 with producerId 35 and producer epoch 0 on partition __transaction_state-46
INFO [pool-26-thread-1] ManagerClient - Manager request {uri:http://localhost:8081/healthcheck, method:GET, body:'', headers:{}}
INFO [pool-26-thread-1] ManagerClient - Manager response from with body {"Database":{"healthy":true},"eventsTopology":{"healthy":true}}
INFO [dw-admin-130] KafkaConnectionCheck - successfully connected to kafka broker: localhost:9093
INFO [kafka-producer-network-thread | EVENT-LOG-LOCAL-test-client-id] LocalTestEnvironment - Message: ProducerRecord(topic=EVENT-LOG-LOCAL, partition=null, headers=RecordHeaders(headers = [], isReadOnly = true), key=null, value={"endDate":1618444800000,"deptId":"dept-1","userId":"user-1234","eventType":"CREATE","eventPayload":{"previousEndDate":null,"partner":"PARTNER-1","info":null}}, timestamp=null) pushed onto topic: EVENT-LOG-LOCAL
INFO [eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1] EventLogProcessor - Processing EventLogMessage=EventLogMessage(endDate=Thu Apr 15 01:00:00 BST 2021, deptId=dept-1, userId=user-1234, eventType=CREATE, eventPayload=EventLogMessage.EventPayload(previousEndDate=null, partner=PARTNER-1, info=null))
WARN [eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1] EventLogProcessor - Failed to process EventLogMessage=EventLogMessage(endDate=Thu Apr 15 01:00:00 BST 2021, deptId=dept-1, userId=user-1234, eventType=CREATE, eventPayload=EventLogMessage.EventPayload(previousEndDate=null, partner=PARTNER-1, info=null))
exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)
at manager.service.EventLogService.storeInDatabase(EventLogService.java:24)
at manager.topology.processor.EventLogProcessor.process(EventLogProcessor.java:47)
at manager.topology.processor.EventLogProcessor.process(EventLogProcessor.java:19)
at org.apache.kafka.streams.processor.internals.ProcessorNode.lambda$process$2(ProcessorNode.java:142)
at org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency(StreamsMetricsImpl.java:836)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:142)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:236)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:216)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:168)
at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:96)
at org.apache.kafka.streams.processor.internals.StreamTask.lambda$process$1(StreamTask.java:679)
at org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency(StreamsMetricsImpl.java:836)
at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:679)
at org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:1033)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:690)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:551)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:510)
ERROR [eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1] org.apache.kafka.streams.processor.internals.TaskManager - stream-thread [eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1] Failed to process stream task 0_8 due to the following error:
org.apache.kafka.streams.errors.StreamsException: Exception caught in process. taskId=0_8, processor=EventsLogSource, topic=EVENT-LOG-LOCAL, partition=8, offset=0, stacktrace=exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)
ERROR [eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1] org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1] Encountered the following exception during processing and the thread is going to shut down:
org.apache.kafka.streams.errors.StreamsException: Exception caught in process. taskId=0_8, processor=EventsLogSource, topic=EVENT-LOG-LOCAL, partition=8, offset=0, stacktrace=exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)
ERROR [eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1] org.apache.kafka.streams.KafkaStreams - stream-client [eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3] All stream threads have died. The instance will be in error state and should be closed.
Exception: java.lang.IllegalStateException thrown from the UncaughtExceptionHandler in thread "eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1"
INFO [executor-Heartbeat] kafka.coordinator.group.GroupCoordinator - [GroupCoordinator 0]: Member eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1-consumer-f11ca299-2a68-4317-a559-dd1b96cd431f in group eventsTopology has failed, removing it from the group
INFO [executor-Heartbeat] kafka.coordinator.group.GroupCoordinator - [GroupCoordinator 0]: Preparing to rebalance group eventsTopology in state PreparingRebalance with old generation 1 (__consumer_offsets-0) (reason: removing member eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1-consumer-f11ca299-2a68-4317-a559-dd1b96cd431f on heartbeat expiration)
INFO [data-plane-kafka-request-handler-2] kafka.coordinator.group.GroupCoordinator - [GroupCoordinator 0]: Stabilized group eventsTopology generation 2 (__consumer_offsets-0)
INFO [data-plane-kafka-request-handler-6] kafka.coordinator.group.GroupCoordinator - [GroupCoordinator 0]: Assignment received from leader for group eventsTopology for generation 2
INFO [data-plane-kafka-request-handler-0] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-0_0 with producerId 0 and producer epoch 1 on partition __transaction_state-4
...
INFO [data-plane-kafka-request-handler-0] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-1_16 with producerId 35 and producer epoch 1 on partition __transaction_state-46
INFO [main] Cluster - New databse host localhost/127.0.0.1:59423 added
com.jayway.awaitility.core.ConditionTimeoutException: Condition defined as a lambda expression in steps.EventLogSteps
Expecting:
<0>
to be equal to:
<1>
but was not. within 20 seconds.
I am getting the INFO message below every time in my Kafka consumer.
2020-07-04 14:54:27.640 INFO 1 --- [istener-0-0-C-1] c.n.o.c.h.p.n.PersistenceKafkaConsumer : beginning to consume batch messages , Message Count :11
2020-07-04 14:54:27.809 INFO 1 --- [istener-0-0-C-1] c.n.o.c.h.p.n.PersistenceKafkaConsumer : Execution Time :169
2020-07-04 14:54:27.809 INFO 1 --- [istener-0-0-C-1] essageListenerContainer$ListenerConsumer : Committing: {nbi.cm.changes.mo.test23-1=OffsetAndMetadata{offset=5705, leaderEpoch=null, metadata=''}}
2020-07-04 14:54:27.812 INFO 1 --- [istener-0-0-C-1] c.n.o.c.h.p.n.PersistenceKafkaConsumer : Acknowledgment Success
2020-07-04 14:54:27.813 INFO 1 --- [istener-0-0-C-1] o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=consumer-1, groupId=cm-persistence-notification] Fetch offset 5705 is out of range for partition nbi.cm.changes.mo.test23-1, resetting offset
2020-07-04 14:54:27.820 INFO 1 --- [istener-0-0-C-1] o.a.k.c.c.internals.SubscriptionState : [Consumer clientId=consumer-1, groupId=cm-persistence-notification] Resetting offset for partition nbi.cm.changes.mo.test23-1 to offset 666703.
I got an OFFSET_OUT_OF_RANGE error in the debug log, and the offset was reset to some other offset that does not actually exist. At the same time, I am able to receive all of the messages in the console consumer.
But I had actually committed the offset just before that; the offsets are available in Kafka, and the log retention policy is 24 hours, so the data has not been deleted from Kafka.
In the debug log, I got the messages below:
beginning to consume batch messages , Message Count :710
2020-07-02 04:58:31.486 DEBUG 1 --- [ce-notification] o.a.kafka.clients.FetchSessionHandler : [Consumer clientId=consumer-1, groupId=cm-persistence-notification] Node 1002 sent an incremental fetch response for session 253529272 with 1 response partition(s)
2020-07-02 04:58:31.486 DEBUG 1 --- [ce-notification] o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=consumer-1, groupId=cm-persistence-notification] Fetch READ_UNCOMMITTED at offset 11372 for partition nbi.cm.changes.mo.test12-1 returned fetch data (error=OFFSET_OUT_OF_RANGE, highWaterMark=-1, lastStableOffset = -1, logStartOffset = -1, preferredReadReplica = absent, abortedTransactions = null, recordsSizeInBytes=0)
In which cases do we get OFFSET_OUT_OF_RANGE?
Listener Class :
@KafkaListener(id = "batch-listener-0", topics = "topic1", groupId = "test", containerFactory = KafkaConsumerConfiguration.CONTAINER_FACTORY_NAME)
public void receive(
        @Payload List<String> messages,
        @Header(KafkaHeaders.RECEIVED_MESSAGE_KEY) List<String> keys,
        @Header(KafkaHeaders.RECEIVED_PARTITION_ID) List<Integer> partitions,
        @Header(KafkaHeaders.RECEIVED_TOPIC) List<String> topics,
        @Header(KafkaHeaders.OFFSET) List<Long> offsets,
        Acknowledgment ack) {
    long startTime = System.currentTimeMillis();
    handleNotifications(messages); // will take more than 5s to process all messages
    long endTime = System.currentTimeMillis();
    long timeElapsed = endTime - startTime;
    LOGGER.info("Execution Time :{}", timeElapsed);
    ack.acknowledge();
    LOGGER.info("Acknowledgment Success");
}
Do I need to close the consumer here? I thought spring-kafka takes care of that automatically. If not, could you please tell me how to close it in spring-kafka, and also how to check whether a rebalance happened or not? In the DEBUG log I am not able to see anything related to a rebalance.
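Regarding "how to check if a rebalance happened": one way to make rebalances visible is to register a rebalance listener on the listener container. A hedged sketch (assuming spring-kafka 2.x; factory and LOGGER stand for the container factory and logger from the surrounding configuration):

import java.util.Collection;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.common.TopicPartition;

// Inside the ConcurrentKafkaListenerContainerFactory configuration:
factory.getContainerProperties().setConsumerRebalanceListener(new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        LOGGER.info("Rebalance: partitions revoked {}", partitions);
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        LOGGER.info("Rebalance: partitions assigned {}", partitions);
    }
});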
I think your consumer may be rebalancing, because you are not calling consumer.close() at the end of your process.
This is a guess, but if the retention policy isn't kicking in (and the logs are not being deleted), this is the only reason I can think of for that behaviour.
Update:
As you set them up with @KafkaListener, you could just call stop() on the KafkaListenerEndpointRegistry: kafkaListenerEndpointRegistry.stop()
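A minimal sketch of that suggestion (assuming spring-kafka; the registry is the standard KafkaListenerEndpointRegistry bean and the id matches the @KafkaListener shown above):

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.config.KafkaListenerEndpointRegistry;
import org.springframework.stereotype.Component;

@Component
public class ListenerLifecycle {

    @Autowired
    private KafkaListenerEndpointRegistry kafkaListenerEndpointRegistry;

    // Stop every registered @KafkaListener container...
    public void stopAll() {
        kafkaListenerEndpointRegistry.stop();
    }

    // ...or only the batch listener declared with id = "batch-listener-0".
    public void stopBatchListener() {
        kafkaListenerEndpointRegistry.getListenerContainer("batch-listener-0").stop();
    }
}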
We are running a 3-node Kafka 0.10.0.1 cluster. We have a consumer application which has a single consumer group connecting to multiple topics. We are seeing strange behaviour in the consumer logs, with these lines:
Fetch offset 1109143 is out of range for partition email-4, resetting offset
Fetch offset 952168 is out of range for partition email-7, resetting offset
Fetch offset 945796 is out of range for partition email-5, resetting offset
Fetch offset 950900 is out of range for partition email-0, resetting offset
Fetch offset 953163 is out of range for partition email-3, resetting offset
Fetch offset 1118389 is out of range for partition email-6, resetting offset
Fetch offset 1112177 is out of range for partition email-2, resetting offset
Fetch offset 1109539 is out of range for partition email-1, resetting offset
Some time later we saw these logs
[2018-06-08 19:45:28] :: INFO :: ConsumerCoordinator:333 - Revoking previously assigned partitions [sms-4, sms-3, sms-0, sms-2, sms-1] for group notifications-consumer
[2018-06-08 19:45:28] :: INFO :: AbstractCoordinator:381 - (Re-)joining group notifications-consumer
[2018-06-08 19:45:28] :: INFO :: AbstractCoordinator$1:349 - Successfully joined group notifications-consumer with generation 3063
[2018-06-08 19:45:28] :: INFO :: AbstractCoordinator$1:349 - Successfully joined group notifications-consumer with generation 3063
[2018-06-08 19:45:28] :: INFO :: AbstractCoordinator$1:349 - Successfully joined group notifications-consumer with generation 3063
[2018-06-08 19:45:28] :: INFO :: AbstractCoordinator$1:349 - Successfully joined group notifications-consumer with generation 3063
[2018-06-08 19:45:28] :: INFO :: AbstractCoordinator$1:349 - Successfully joined group notifications-consumer with generation 3063
[2018-06-08 19:45:28] :: INFO :: AbstractCoordinator$1:349 - Successfully joined group notifications-consumer with generation 3063
[2018-06-08 19:45:28] :: INFO :: AbstractCoordinator$1:349 - Successfully joined group notifications-consumer with generation 3063
[2018-06-08 19:45:28] :: INFO :: AbstractCoordinator$1:349 - Successfully joined group notifications-consumer with generation 3063
[2018-06-08 19:45:28] :: INFO :: ConsumerCoordinator:225 - Setting newly assigned partitions [sms-8, sms-7, sms-9, sms-6, sms-5] for group notifications-consumer
I noticed that one of our topics did not appear in the list of newly assigned partitions. That topic then had no consumers attached to it for at least 8 hours. It was only when someone restarted the application that it started consuming from that topic again. What could be going wrong here?
Here is the consumer config:
auto.commit.interval.ms = 3000
auto.offset.reset = latest
bootstrap.servers = [x.x.x.x:9092, x.x.x.x:9092, x.x.x.x:9092]
check.crcs = true
client.id =
connections.max.idle.ms = 540000
enable.auto.commit = true
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = otp-notifications-consumer
heartbeat.interval.ms = 3000
interceptor.classes = null
key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 50
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.ms = 50
request.timeout.ms = 305000
retry.backoff.ms = 100
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.mechanism = GSSAPI
security.protocol = SSL
send.buffer.bytes = 131072
session.timeout.ms = 300000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = /x/x/client.truststore.jks
ssl.truststore.password = [hidden]
ssl.truststore.type = JKS
value.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
The topic which went orphan has 10 partitions, retention.ms=1800000, segment.ms=1800000.
Please help.
The offset out of range message you are seeing usually indicates the offset the consumer is at has been deleted on the broker. Upon hitting that the consumer will use auto.offset.reset to restart consuming.
With retention.ms=1800000 (30mins), you are only keeping data for a very short amount of time so it's expected that if you restart the consumer after several hours, the data is gone.
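If you want to confirm whether the broker has already deleted those offsets, you can compare the committed position with the partition's current start offset. A minimal sketch (plain consumer API, requires a 0.10.1+ client; consumer is assumed to be an already-configured KafkaConsumer for the same cluster, and the topic name is taken from the logs above):

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import org.apache.kafka.common.TopicPartition;

List<TopicPartition> partitions = Arrays.asList(
        new TopicPartition("email", 0), new TopicPartition("email", 1));

Map<TopicPartition, Long> earliest = consumer.beginningOffsets(partitions);
Map<TopicPartition, Long> latest = consumer.endOffsets(partitions);

// A committed offset below beginningOffsets() means the broker has already
// deleted that data, and fetching it returns "offset out of range".
earliest.forEach((tp, start) ->
        System.out.printf("%s start=%d end=%d%n", tp, start, latest.get(tp)));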
I have this Spring webservice test code:
@RestController
@RequestMapping("/counter")
public class CounterController {

    @Autowired
    private Counter counter;

    @RequestMapping(value = "/inc", method = GET)
    public int inc() throws Exception {
        counter.incCounter();
        return counter.getCounter();
    }

    @RequestMapping(value = "/get", method = GET)
    public int get() throws Exception {
        Thread.sleep(5000);
        return counter.getCounter();
    }
}
where Counter is a session-scoped object:
@Component
@Scope(value = WebApplicationContext.SCOPE_SESSION, proxyMode = ScopedProxyMode.TARGET_CLASS)
public class Counter implements Serializable {

    private static final long serialVersionUID = 9162936878293396831L;

    private int value;

    public int getCounter() {
        return value;
    }

    public void incCounter() {
        value += 1;
    }
}
The session configuration
@Configuration
@EnableRedisHttpSession(maxInactiveIntervalInSeconds = 1800)
public class HttpSessionConfig {

    @Bean
    public JedisConnectionFactory connectionFactory() {
        return new JedisConnectionFactory();
    }

    @Bean
    public HttpSessionStrategy httpSessionStrategy() {
        return new HeaderHttpSessionStrategy();
    }
}
As you can see, the get() method sleeps 5 seconds and returns the value of the counter.
The problem is that if I call inc() many times while get() is executing, all the counter changes are lost, because when get() finishes it returns the value the counter had when it started executing. The weird part is that when get() finishes it persists the counter (it is a session object), and once that happens all the changes are lost.
Is there a way to prevent methods that do not modify a session object from persisting it?
Update: I think the Spring code confirms this behavior. This snippet from the class ServletRequestAttributes shows that every session attribute that is accessed (regardless of whether the access is a read) is marked to be saved when the webservice operation finishes:
@Override
public Object getAttribute(String name, int scope) {
    if (scope == SCOPE_REQUEST) {
        if (!isRequestActive()) {
            throw new IllegalStateException(
                    "Cannot ask for request attribute - request is not active anymore!");
        }
        return this.request.getAttribute(name);
    }
    else {
        HttpSession session = getSession(false);
        if (session != null) {
            try {
                Object value = session.getAttribute(name);
                if (value != null) {
                    this.sessionAttributesToUpdate.put(name, value);
                }
                return value;
            }
            catch (IllegalStateException ex) {
                // Session invalidated - shouldn't usually happen.
            }
        }
        return null;
    }
}
According to the Spring Session documentation:
Optimized Writes
The Session instances managed by RedisOperationsSessionRepository
keeps track of the properties that have changed and only updates
those. This means if an attribute is written once and read many times
we only need to write that attribute once.
Either the documentation is wrong or I'm doing something wrong.
I think you made some mistakes while testing your code. I have just tested it, and it works as expected.
I used SoapUI and created 2 requests with the same JSESSIONID value in the Cookie header (same session).
Then I requested /get, and meanwhile, in the second request window, I spammed /inc.
What /get returned was the number of /inc calls (at the beginning the value was 0, then I incremented it to 11 while /get was sleeping; finally, /get returned 11).
I suggest you double-check that nothing is messed up with your session.
Edit: your code with additional logs (I've increased the sleep time to 10000):
2016-04-06 11:56:10.977 INFO 7884 --- [nio-8080-exec-1] o.s.web.servlet.DispatcherServlet : FrameworkServlet 'dispatcherServlet': initialization completed in 14 ms
2016-04-06 11:56:11.014 INFO 7884 --- [nio-8080-exec-1] c.p.controller.TestServiceController : Before 10sec counter value: 0
2016-04-06 11:56:21.015 INFO 7884 --- [nio-8080-exec-1] c.p.controller.TestServiceController : After 10sec counter value: 0
2016-04-06 11:56:36.955 INFO 7884 --- [nio-8080-exec-2] c.p.controller.TestServiceController : Before 10sec counter value: 0
2016-04-06 11:56:46.956 INFO 7884 --- [nio-8080-exec-2] c.p.controller.TestServiceController : After 10sec counter value: 0
2016-04-06 11:56:50.558 INFO 7884 --- [nio-8080-exec-3] c.p.controller.TestServiceController : Incrementing counter value: 1
2016-04-06 11:56:53.494 INFO 7884 --- [nio-8080-exec-4] c.p.controller.TestServiceController : Before 10sec counter value: 1
2016-04-06 11:57:03.496 INFO 7884 --- [nio-8080-exec-4] c.p.controller.TestServiceController : After 10sec counter value: 1
2016-04-06 11:57:05.600 INFO 7884 --- [nio-8080-exec-5] c.p.controller.TestServiceController : Before 10sec counter value: 1
2016-04-06 11:57:06.715 INFO 7884 --- [nio-8080-exec-6] c.p.controller.TestServiceController : Incrementing counter value: 2
2016-04-06 11:57:06.869 INFO 7884 --- [nio-8080-exec-7] c.p.controller.TestServiceController : Incrementing counter value: 3
2016-04-06 11:57:07.038 INFO 7884 --- [nio-8080-exec-8] c.p.controller.TestServiceController : Incrementing counter value: 4
2016-04-06 11:57:07.186 INFO 7884 --- [nio-8080-exec-9] c.p.controller.TestServiceController : Incrementing counter value: 5
2016-04-06 11:57:07.321 INFO 7884 --- [io-8080-exec-10] c.p.controller.TestServiceController : Incrementing counter value: 6
2016-04-06 11:57:07.478 INFO 7884 --- [nio-8080-exec-1] c.p.controller.TestServiceController : Incrementing counter value: 7
2016-04-06 11:57:07.641 INFO 7884 --- [nio-8080-exec-2] c.p.controller.TestServiceController : Incrementing counter value: 8
2016-04-06 11:57:07.794 INFO 7884 --- [nio-8080-exec-3] c.p.controller.TestServiceController : Incrementing counter value: 9
2016-04-06 11:57:07.967 INFO 7884 --- [nio-8080-exec-4] c.p.controller.TestServiceController : Incrementing counter value: 10
2016-04-06 11:57:08.121 INFO 7884 --- [nio-8080-exec-6] c.p.controller.TestServiceController : Incrementing counter value: 11
2016-04-06 11:57:15.602 INFO 7884 --- [nio-8080-exec-5] c.p.controller.TestServiceController : After 10sec counter value: 11
It seems this has nothing to do with a mistake on my side; it is the expected behavior of Spring Session with session-scoped beans. For me it's a critical problem, so I've decided to forget about distributed caches (Redis and Hazelcast) and use the MapSessionRepository.
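For completeness, a minimal sketch of that in-memory fallback (assuming Spring Session 2.x, where MapSessionRepository takes the backing map; earlier versions differ slightly):

import java.util.concurrent.ConcurrentHashMap;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.session.MapSessionRepository;
import org.springframework.session.config.annotation.web.http.EnableSpringHttpSession;

@Configuration
@EnableSpringHttpSession
public class InMemorySessionConfig {

    @Bean
    public MapSessionRepository sessionRepository() {
        // Sessions live only in this JVM, so a node always sees its own writes
        // immediately; this trades away the distributed cache entirely.
        return new MapSessionRepository(new ConcurrentHashMap<>());
    }
}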