Unable to reset partition offset with CooperativeStickyAssignor - java

I'm trying to reset offset to a partition to 0, like this:
final KafkaConsumer<String, String> stateConsumer = new KafkaConsumer<>(stateConsumerProperties.getConsumerProps());
stateConsumer.subscribe(STATE_TOPIC);
...
stateConsumer.seekToBeginning(stateConsumer.assignment());
...
stateConsumer.poll(Duration.ofMillis(1000)) // timeout too long, testing only
.forEach(record -> {
log.info("Warmup state read: " + record.value() + ", partition: " + record.partition());
stateMessages.add(record.value());
});
Consumer config is only this:
this.stateConsumerProperties.setConsumer(Map.of(
ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092",
ConsumerConfig.GROUP_ID_CONFIG, "state",
ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest",
ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer",
ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer",
ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "30",
ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, "2000",
ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, CustomPartitionAssignor.class.getName()
));
The "custom" assignor is only for logging purposes, these are the only overrides:
public class CustomPartitionAssignor extends CooperativeStickyAssignor {
...
#Override
public GroupAssignment assign(Cluster metadata, GroupSubscription groupSubscription) {
return super.assign(metadata, groupSubscription);
}
#Override
public void onAssignment(Assignment assignment, ConsumerGroupMetadata metadata) {
super.onAssignment(assignment, metadata);
Arrays.toString(ASSIGNED_PARTITIONS.toArray()));
ASSIGNED_PARTITIONS = assignment.partitions();
log.info("New Assigned partitions: " + assignment.partitions());
}
...
}
I have only two identical consumers for testing and each consumer runs this code on join.
The first joiner has no issue since there is not state yet ("consumers" read another topic and produce "state" to state topic).
Problem is, this is what I get in the log when the second consumer joins:
consumer_2 | 2022-10-10 18:46:26.833 INFO 7 --- [pool-1-thread-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-state-2, groupId=state] Setting offset for partition state-1 to the committed offset FetchPosition{offset=361, offsetEpoch=Optional[0], currentLeader=LeaderAndEpoch{leader=Optional[179bd3c6448e:9092 (id: 1001 rack: null)], epoch=0}}
consumer_2 | 2022-10-10 18:46:26.875 INFO 7 --- [pool-1-thread-1] c.company.fdi.poc.kafka.service.Consumer : Warmup state read: state 723, partition: 1
consumer_2 | 2022-10-10 18:46:26.876 INFO 7 --- [pool-1-thread-1] c.company.fdi.poc.kafka.service.Consumer : Warmup state read: state 725, partition: 1
consumer_2 | 2022-10-10 18:46:26.876 INFO 7 --- [pool-1-thread-1] c.company.fdi.poc.kafka.service.Consumer : Warmup state read: state 727, partition: 1
State "production" is just an atomic counter that starts at 0.
What I want to happen is that when a new consumer joins, it resets its assigned partition offset to 0 and starts reading from the first record.
I have a suspicion that this has something to do with the CooperativeStickyAssignor, but I have no clue as of now.
You can assume 2 consumers, 2 partitions for state topic if this helps. Cluster should be as balanced as possible.
Start is two partitions one consumer, then I add another consumer.
Any help much appreciated, thanks in advance.

Related

Spring WebFlux | How to wait till a list of Monos finish execution in parallel

I want to send n-number of requests to a REST endpoint in parallel.I want to make sure these get executed in different threads for performance and need to wait till all n requests finish.
Only way I could come up with is using CountDownLatch as follows (please check the main() method. This is testable code):
public static void main(String args[]) throws Exception {
int n = 10; //n is dynamic during runtime
final CountDownLatch waitForNRequests = new CountDownLatch(n);
//send n requests
for (int i =0;i<n;i++) {
var r = testRestCall(""+i);
r.publishOn(Schedulers.parallel()).subscribe(res -> {
System.out.println(">>>>>>> Thread: " + Thread.currentThread().getName() + " response:" +res.getBody());
waitForNRequests.countDown();
});
}
waitForNRequests.await(); //wait till all n requests finish before goto the next line
System.out.println("All n requests finished");
Thread.sleep(10000);
}
public static Mono<ResponseEntity<Map>> testRestCall(String id) {
WebClient client = WebClient.create("https://reqres.in/api");
JSONObject request = new JSONObject();
request.put("name", "user"+ id);
request.put("job", "leader");
var res = client.post().uri("/users")
.contentType(MediaType.APPLICATION_JSON)
.body(BodyInserters.fromValue(request.toString()))
.accept(MediaType.APPLICATION_JSON)
.retrieve()
.toEntity(Map.class)
.onErrorReturn(ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE).build());
return res;
}
This doesnt look good and I am sure there is an elegant solution without using Latches..etc
I tried following method,but I dont know how to resolve following issues:
Flux.merge() , contact() results in executing all n-requests in a single thread
How to wait till n-requests finish execution (fork-join)?
List<Mono<ResponseEntity<Map>>> lst = new ArrayList<>();
int n = 10; //n is dynamic during runtime
for (int i =0;i<n;i++) {
var r = testRestCall(""+i);
lst.add(r);
}
var t= Flux.fromIterable(lst).flatMap(Function.identity()); //tried merge() contact() as well
t.publishOn(Schedulers.parallel()).subscribe(res -> {
System.out.println(">>>>>>> Thread: " + Thread.currentThread().getName() + " response:" +res.getBody());
///??? all requests execute in a single thread.How to parallelize ?
});
//???How to wait till all n requests finish before goto the next line
System.out.println("All n requests finished");
Thread.sleep(10000);
Update:
I found the reason why the Flux subscriber runs in the same thread, I need to create a ParallelFlux. So the correct order should be:
var t= Flux.fromIterable(lst).flatMap(Function.identity());
t.parallel()
.runOn(Schedulers.parallel())
.subscribe(res -> {
System.out.println(">>>>>>> Thread: " + Thread.currentThread().getName() + " response:" +res.getBody());
///??? all requests execute in a single thread.How to parallelize ?
});
Ref: https://projectreactor.io/docs/core/release/reference/#advanced-parallelizing-parralelflux
In reactive you think not about threads but about concurrency.
Reactor executes non-blocking/async tasks on a small number of threads using Schedulers abstraction to execute tasks. Schedulers have responsibilities very similar to ExecutorService. By default, for parallel scheduler number of threads is equal to number of CPU cores, but could be controlled by `reactor.schedulers.defaultPoolSize’ system property.
In your example instead of creating multiple Mono and then merge them, better to use Flux and then process elements in parallel controlling concurrency.
Flux.range(1, 10)
.flatMap(this::testRestCall)
By default, flatMap will process Queues.SMALL_BUFFER_SIZE = 256 number of in-flight inner sequences.
You could control concurrency flatMap(item -> process(item), concurrency) or use concatMap operator if you want to process sequentially. Check flatMap(..., int concurrency, int prefetch) for details.
Flux.range(1, 10)
.flatMap(i -> testRestCall(i), 5)
The following test shows that calls are executed in different threads
#Test
void testParallel() {
var flow = Flux.range(1, 10)
.flatMap(i -> testRestCall(i))
.log()
.then(Mono.just("complete"));
StepVerifier.create(flow)
.expectNext("complete")
.verifyComplete();
}
The result log
2022-12-30 21:31:25.169 INFO 43383 --- [ctor-http-nio-4] reactor.Mono.FlatMap.3 : | onComplete()
2022-12-30 21:31:25.170 INFO 43383 --- [ctor-http-nio-3] reactor.Mono.FlatMap.2 : | onComplete()
2022-12-30 21:31:25.169 INFO 43383 --- [ctor-http-nio-2] reactor.Mono.FlatMap.1 : | onComplete()
2022-12-30 21:31:25.169 INFO 43383 --- [ctor-http-nio-8] reactor.Mono.FlatMap.7 : | onComplete()
2022-12-30 21:31:25.169 INFO 43383 --- [tor-http-nio-11] reactor.Mono.FlatMap.10 : | onComplete()
2022-12-30 21:31:25.169 INFO 43383 --- [ctor-http-nio-7] reactor.Mono.FlatMap.6 : | onComplete()
2022-12-30 21:31:25.169 INFO 43383 --- [ctor-http-nio-9] reactor.Mono.FlatMap.8 : | onComplete()
2022-12-30 21:31:25.170 INFO 43383 --- [ctor-http-nio-6] reactor.Mono.FlatMap.5 : | onComplete()
2022-12-30 21:31:25.378 INFO 43383 --- [ctor-http-nio-5] reactor.Mono.FlatMap.4 : | onComplete()

In kafka stream topology with 'exactly_once' processing guarantee, message lost on exception

I have a requirement where I need to process messages from Kafka without losing any message and also need to maintain the message order. Therefore, I used transactions and enabled 'exactly_once' processing guarantee in my Kafka streams topology. As I assume that the topology processing will be 'all or nothing', that the message offset is committed only after the last node successfully processed the message.
However in a failure scenario, for example when the database is down and the processor fails to store message and throws an exception. At this point, the topology dies as intended and is recreated automatically on rebalance. I assume that the topology should either re-consume the original message again from the Kafka topic OR on application restart, it should re-consume that original message from Kafka topic. However, it seems that original message disappears and is never consumed or processed after that topology died.
What do I need to do to reprocess the original message sent to Kafka topic? Or what Kafka configuration requires change? Do I need manually assign a state store and keep track of messages processed on a changelog topic?
Topology:
#Singleton
public class EventTopology extends Topology {
private final Deserializer<String> deserializer = Serdes.String().deserializer();
private final Serializer<String> serializer = Serdes.String().serializer();
private final EventLogMessageSerializer eventLogMessageSerializer;
private final EventLogMessageDeserializer eventLogMessageDeserializer;
private final EventLogProcessorSupplier eventLogProcessorSupplier;
#Inject
public EventTopology(EventsConfig eventsConfig,
EventLogMessageSerializer eventLogMessageSerializer,
EventLogMessageDeserializer eventLogMessageDeserializer,
EventLogProcessorSupplier eventLogProcessorSupplier) {
this.eventLogMessageSerializer = eventLogMessageSerializer;
this.eventLogMessageDeserializer = eventLogMessageDeserializer;
this.eventLogProcessorSupplier = eventLogProcessorSupplier;
init(eventsConfig);
}
private void init(EventsConfig eventsConfig) {
var topics = eventsConfig.getTopicConfig().getTopics();
String eventLog = topics.get("eventLog");
addSource("EventsLogSource", deserializer, eventLogMessageDeserializer, eventLog)
.addProcessor("EventLogProcessor", eventLogProcessorSupplier, "EventsLogSource");
}
}
Processor:
#Singleton
#Slf4j
public class EventLogProcessor implements Processor<String, EventLogMessage> {
private final EventLogService eventLogService;
private ProcessorContext context;
#Inject
public EventLogProcessor(EventLogService eventLogService) {
this.eventLogService = eventLogService;
}
#Override
public void init(ProcessorContext context) {
this.context = context;
}
#Override
public void process(String key, EventLogMessage value) {
log.info("Processing EventLogMessage={}", value);
try {
eventLogService.storeInDatabase(value);
context.commit();
} catch (Exception e) {
log.warn("Failed to process EventLogMessage={}", value, e);
throw e;
}
}
#Override
public void close() {
}
}
Configuration:
eventsConfig:
saveTopicsEnabled: false
topologyConfig:
environment: "LOCAL"
broker: "localhost:9093"
enabled: true
initialiseWaitInterval: 3 seconds
applicationId: "eventsTopology"
config:
auto.offset.reset: latest
session.timeout.ms: 6000
fetch.max.wait.ms: 7000
heartbeat.interval.ms: 5000
connections.max.idle.ms: 7000
security.protocol: SSL
key.serializer: org.apache.kafka.common.serialization.StringSerializer
value.serializer: org.apache.kafka.common.serialization.StringSerializer
max.poll.records: 5
processing.guarantee: exactly_once
metric.reporters: com.simple.metrics.kafka.DropwizardReporter
default.deserialization.exception.handler: org.apache.kafka.streams.errors.LogAndContinueExceptionHandler
enable.idempotence: true
request.timeout.ms: 8000
acks: all
batch.size: 16384
linger.ms: 1
enable.auto.commit: false
state.dir: "/tmp"
topicConfig:
topics:
eventLog: "EVENT-LOG-LOCAL"
kafkaTopicConfig:
partitions: 18
replicationFactor: 1
config:
retention.ms: 604800000
Test:
Feature: Feature covering the scenarios to process event log messages produced by external client.
Background:
Given event topology is healthy
Scenario: event log messages produced are successfully stored in the database
Given database is down
And the following event log messages are published
| deptId | userId | eventType | endDate | eventPayload_partner |
| dept-1 | user-1234 | CREATE | 2021-04-15T00:00:00Z | PARTNER-1 |
When database is up
And database is healthy
Then event log stored in the database as follows
| dept_id | user_id | event_type | end_date | event_payload |
| dept-1 | user-1234 | CREATE | 2021-04-15T00:00:00Z | {"partner":"PARTNER-1"} |
Logs:
INFO [data-plane-kafka-request-handler-1] kafka.coordinator.group.GroupCoordinator - [GroupCoordinator 0]: Preparing to rebalance group eventsTopology in state PreparingRebalance with old generation 0 (__consumer_offsets-0) (reason: Adding new member eventsTopology-57fdac0e-09fb-4aa0-8b0b-7e01809b31fa-StreamThread-1-consumer-96a3e980-4286-461e-8536-5f04ccb2c778 with group instance id None)
INFO [executor-Rebalance] kafka.coordinator.group.GroupCoordinator - [GroupCoordinator 0]: Stabilized group eventsTopology generation 1 (__consumer_offsets-0)
INFO [data-plane-kafka-request-handler-2] kafka.coordinator.group.GroupCoordinator - [GroupCoordinator 0]: Assignment received from leader for group eventsTopology for generation 1
INFO [data-plane-kafka-request-handler-1] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-0_0 with producerId 0 and producer epoch 0 on partition __transaction_state-4
INFO [data-plane-kafka-request-handler-6] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-0_1 with producerId 1 and producer epoch 0 on partition __transaction_state-3
...
INFO [data-plane-kafka-request-handler-0] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-0_16 with producerId 17 and producer epoch 0 on partition __transaction_state-37
INFO [data-plane-kafka-request-handler-4] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-1_1 with producerId 18 and producer epoch 0 on partition __transaction_state-42
INFO [data-plane-kafka-request-handler-6] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-1_0 with producerId 19 and producer epoch 0 on partition __transaction_state-43
...
INFO [data-plane-kafka-request-handler-3] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-1_17 with producerId 34 and producer epoch 0 on partition __transaction_state-45
INFO [data-plane-kafka-request-handler-5] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-1_16 with producerId 35 and producer epoch 0 on partition __transaction_state-46
INFO [pool-26-thread-1] ManagerClient - Manager request {uri:http://localhost:8081/healthcheck, method:GET, body:'', headers:{}}
INFO [pool-26-thread-1] ManagerClient - Manager response from with body {"Database":{"healthy":true},"eventsTopology":{"healthy":true}}
INFO [dw-admin-130] KafkaConnectionCheck - successfully connected to kafka broker: localhost:9093
INFO [kafka-producer-network-thread | EVENT-LOG-LOCAL-test-client-id] LocalTestEnvironment - Message: ProducerRecord(topic=EVENT-LOG-LOCAL, partition=null, headers=RecordHeaders(headers = [], isReadOnly = true), key=null, value={"endDate":1618444800000,"deptId":"dept-1","userId":"user-1234","eventType":"CREATE","eventPayload":{"previousEndDate":null,"partner":"PARTNER-1","info":null}}, timestamp=null) pushed onto topic: EVENT-LOG-LOCAL
INFO [eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1] EventLogProcessor - Processing EventLogMessage=EventLogMessage(endDate=Thu Apr 15 01:00:00 BST 2021, deptId=dept-1, userId=user-1234, eventType=CREATE, eventPayload=EventLogMessage.EventPayload(previousEndDate=null, partner=PARTNER-1, info=null))
WARN [eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1] EventLogProcessor - Failed to process EventLogMessage=EventLogMessage(endDate=Thu Apr 15 01:00:00 BST 2021, deptId=dept-1, userId=user-1234, eventType=CREATE, eventPayload=EventLogMessage.EventPayload(previousEndDate=null, partner=PARTNER-1, info=null))
exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)
at manager.service.EventLogService.storeInDatabase(EventLogService.java:24)
at manager.topology.processor.EventLogProcessor.process(EventLogProcessor.java:47)
at manager.topology.processor.EventLogProcessor.process(EventLogProcessor.java:19)
at org.apache.kafka.streams.processor.internals.ProcessorNode.lambda$process$2(ProcessorNode.java:142)
at org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency(StreamsMetricsImpl.java:836)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:142)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:236)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:216)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:168)
at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:96)
at org.apache.kafka.streams.processor.internals.StreamTask.lambda$process$1(StreamTask.java:679)
at org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency(StreamsMetricsImpl.java:836)
at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:679)
at org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:1033)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:690)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:551)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:510)
ERROR [eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1] org.apache.kafka.streams.processor.internals.TaskManager - stream-thread [eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1] Failed to process stream task 0_8 due to the following error:
org.apache.kafka.streams.errors.StreamsException: Exception caught in process. taskId=0_8, processor=EventsLogSource, topic=EVENT-LOG-LOCAL, partition=8, offset=0, stacktrace=exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)
ERROR [eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1] org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1] Encountered the following exception during processing and the thread is going to shut down:
org.apache.kafka.streams.errors.StreamsException: Exception caught in process. taskId=0_8, processor=EventsLogSource, topic=EVENT-LOG-LOCAL, partition=8, offset=0, stacktrace=exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)
ERROR [eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1] org.apache.kafka.streams.KafkaStreams - stream-client [eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3] All stream threads have died. The instance will be in error state and should be closed.
Exception: java.lang.IllegalStateException thrown from the UncaughtExceptionHandler in thread "eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1"
INFO [executor-Heartbeat] kafka.coordinator.group.GroupCoordinator - [GroupCoordinator 0]: Member eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1-consumer-f11ca299-2a68-4317-a559-dd1b96cd431f in group eventsTopology has failed, removing it from the group
INFO [executor-Heartbeat] kafka.coordinator.group.GroupCoordinator - [GroupCoordinator 0]: Preparing to rebalance group eventsTopology in state PreparingRebalance with old generation 1 (__consumer_offsets-0) (reason: removing member eventsTopology-b21df600-cd39-4c9d-9e7a-f55f53ac9fd3-StreamThread-1-consumer-f11ca299-2a68-4317-a559-dd1b96cd431f on heartbeat expiration)
INFO [data-plane-kafka-request-handler-2] kafka.coordinator.group.GroupCoordinator - [GroupCoordinator 0]: Stabilized group eventsTopology generation 2 (__consumer_offsets-0)
INFO [data-plane-kafka-request-handler-6] kafka.coordinator.group.GroupCoordinator - [GroupCoordinator 0]: Assignment received from leader for group eventsTopology for generation 2
INFO [data-plane-kafka-request-handler-0] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-0_0 with producerId 0 and producer epoch 1 on partition __transaction_state-4
...
INFO [data-plane-kafka-request-handler-0] kafka.coordinator.transaction.TransactionCoordinator - [TransactionCoordinator id=0] Initialized transactionalId eventsTopology-1_16 with producerId 35 and producer epoch 1 on partition __transaction_state-46
INFO [main] Cluster - New databse host localhost/127.0.0.1:59423 added
com.jayway.awaitility.core.ConditionTimeoutException: Condition defined as a lambda expression in steps.EventLogSteps
Expecting:
<0>
to be equal to:
<1>
but was not. within 20 seconds.

Fetch offset 5705 is out of range for partition , resetting offset

I am getting below info message every time in kafka consumer.
2020-07-04 14:54:27.640 INFO 1 --- [istener-0-0-C-1] c.n.o.c.h.p.n.PersistenceKafkaConsumer : beginning to consume batch messages , Message Count :11
2020-07-04 14:54:27.809 INFO 1 --- [istener-0-0-C-1] c.n.o.c.h.p.n.PersistenceKafkaConsumer : Execution Time :169
2020-07-04 14:54:27.809 INFO 1 --- [istener-0-0-C-1] essageListenerContainer$ListenerConsumer : Committing: {nbi.cm.changes.mo.test23-1=OffsetAndMetadata{offset=5705, leaderEpoch=null, metadata=''}}
2020-07-04 14:54:27.812 INFO 1 --- [istener-0-0-C-1] c.n.o.c.h.p.n.PersistenceKafkaConsumer : Acknowledgment Success
2020-07-04 14:54:27.813 INFO 1 --- [istener-0-0-C-1] o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=consumer-1, groupId=cm-persistence-notification] Fetch offset 5705 is out of range for partition nbi.cm.changes.mo.test23-1, resetting offset
2020-07-04 14:54:27.820 INFO 1 --- [istener-0-0-C-1] o.a.k.c.c.internals.SubscriptionState : [Consumer clientId=consumer-1, groupId=cm-persistence-notification] Resetting offset for partition nbi.cm.changes.mo.test23-1 to offset 666703.
Got OFFSET_OUT_OF_RANGE error in debug log and resetting to some other partition that actually not exist. Same all messages able to receive in consumer console.
But actually I committed offset before that only , offset are available in kafka , log retention policy is 24hr, so it's not deleted in kafka.
In debug log, I got below messages:
beginning to consume batch messages , Message Count :710
2020-07-02 04:58:31.486 DEBUG 1 --- [ce-notification] o.a.kafka.clients.FetchSessionHandler : [Consumer clientId=consumer-1, groupId=cm-persistence-notification] Node 1002 sent an incremental fetch response for session 253529272 with 1 response partition(s)
2020-07-02 04:58:31.486 DEBUG 1 --- [ce-notification] o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=consumer-1, groupId=cm-persistence-notification] Fetch READ_UNCOMMITTED at offset 11372 for partition nbi.cm.changes.mo.test12-1 returned fetch data (error=OFFSET_OUT_OF_RANGE, highWaterMark=-1, lastStableOffset = -1, logStartOffset = -1, preferredReadReplica = absent, abortedTransactions = null, recordsSizeInBytes=0)
When all we will get OFFSET_OUT_OF_RANGE.
Listener Class :
#KafkaListener( id = "batch-listener-0", topics = "topic1", groupId = "test", containerFactory = KafkaConsumerConfiguration.CONTAINER_FACTORY_NAME )
public void receive(
#Payload List<String> messages,
#Header( KafkaHeaders.RECEIVED_MESSAGE_KEY ) List<String> keys,
#Header( KafkaHeaders.RECEIVED_PARTITION_ID ) List<Integer> partitions,
#Header( KafkaHeaders.RECEIVED_TOPIC ) List<String> topics,
#Header( KafkaHeaders.OFFSET ) List<Long> offsets,
Acknowledgment ack )
{
long startTime = System.currentTimeMillis();
handleNotifications( messages ); // will take more than 5s to process all messages
long endTime = System.currentTimeMillis();
long timeElapsed = endTime - startTime;
LOGGER.info( "Execution Time :{}", timeElapsed );
ack.acknowledge();
LOGGER.info( "Acknowledgment Success" );
}
Do i need to close consumer here , i thought spring-kafka automatically take care those , if no could you please tell how to close in apring-kafka and also how to check if rebalance happened or not , because in DEBUG log not able to see any log related to rebalance.
I think your consumer may be rebalancing, because you are not calling consumer.close() at the end of your process.
This is a guess, but if the retention policy isn't kicking in (and the logs are not being deleted), this is the only reason I can tell for that behaviour.
Update:
As you set them as #KafkaListeners, you could just call stop() on the KafkaListenerEndpointRegistry: kafkaListenerEndpointRegistry.stop()

Kafka Streams parallel processing not working

I'm learning Kafka Stream and I want to build a simple application that reads lines of text from one topic and puts number of letter occurrences to InfluxDB. My second goal is to run it in parallel. I would like to have two running instances of Kafka Stream both of them processing input at the same time but from different partitions. Unfortunately, after I launch both instances only one of them is working second one is waiting.
My configuration is as follows:
* input topic has 4 partitions
* Stream application has a property num.stream.threads set to 2
Here is the source code:
public class LineStatisticsStream {
static Pattern letterPattern = Pattern.compile("[a-z]");
InfluxDB influxDB;
public static void main(String[] args) {
new LineStatisticsStream().start();
}
Properties getStreamProperties() {
Properties properties = new Properties();
properties.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG,"localhost:9092");
properties.put(StreamsConfig.APPLICATION_ID_CONFIG,"lineStatistics4");
properties.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);
properties.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG,2);
return properties;
}
Topology getStreamTopology(ForeachAction<String,Long> action) {
StreamsBuilder builder = new StreamsBuilder();
KStream<String,String> linesStream = builder.stream("lines", Consumed.with(Serdes.String(),Serdes.String()));
linesStream.flatMapValues(this::findAllLetters)
.map((k,v)-> KeyValue.pair(v,v))
.groupByKey(Grouped.with(Serdes.String(),Serdes.String()))
.count()
.toStream()
.to("letterCount",Produced.with(Serdes.String(),Serdes.Long()));
KStream<String,Long> countStream = builder.stream("letterCount", Consumed.with(Serdes.String(),Serdes.Long()));
countStream.peek(action);
return builder.build();
}
private void insertToInfluxDB(String letter, Long count) {
influxDB.write(Point.measurement("lettersCount")
.tag("letter",letter)
.time(System.currentTimeMillis(), TimeUnit.MILLISECONDS)
.addField("count", count)
.build()
);
}
private List<String> findAllLetters(String s) {
List<String> letters = new ArrayList<>();
Matcher matcher= letterPattern.matcher(s.toLowerCase());
while(matcher.find()) {
letters.add(matcher.group(0));
}
return letters;
}
private void start() {
influxDB = InfluxDBFactory.connect("http://127.0.0.1:8086");
influxDB.setDatabase("letters");
influxDB.enableBatch(BatchOptions.DEFAULTS);
final KafkaStreams streams = new KafkaStreams(getStreamTopology(this::insertToInfluxDB), getStreamProperties());
streams.start();
Runtime.getRuntime().addShutdownHook(new Thread(()->{streams.close();influxDB.close();}));
}
}
And last few lines of a log from "waiting" application:
2019-11-18 21:27:16 INFO ConsumerCoordinator:982 - [Consumer clientId=lineStatistics4-5e5e2a71-2a3b-4e51-9d4c-4e470a8a8099-StreamThread-1-consumer, groupId=lineStatistics4] Found no committed offset for partition lines4-2
2019-11-18 21:27:16 INFO ConsumerCoordinator:982 - [Consumer clientId=lineStatistics4-5e5e2a71-2a3b-4e51-9d4c-4e470a8a8099-StreamThread-2-consumer, groupId=lineStatistics4] Found no committed offset for partition lines4-1
2019-11-18 21:27:16 INFO RocksDBTimestampedStore:82 - Opening store KSTREAM-AGGREGATE-STATE-STORE-0000000003 in regular mode
2019-11-18 21:27:16 INFO RocksDBTimestampedStore:82 - Opening store KSTREAM-AGGREGATE-STATE-STORE-0000000003 in regular mode
2019-11-18 21:27:16 INFO KafkaConsumer:1068 - [Consumer clientId=lineStatistics4-5e5e2a71-2a3b-4e51-9d4c-4e470a8a8099-StreamThread-2-restore-consumer, groupId=null] Unsubscribed all topics or patterns and assigned partitions
2019-11-18 21:27:16 INFO KafkaConsumer:1068 - [Consumer clientId=lineStatistics4-5e5e2a71-2a3b-4e51-9d4c-4e470a8a8099-StreamThread-2-restore-consumer, groupId=null] Unsubscribed all topics or patterns and assigned partitions
2019-11-18 21:27:16 INFO StreamThread:212 - stream-thread [lineStatistics4-5e5e2a71-2a3b-4e51-9d4c-4e470a8a8099-StreamThread-2] State transition from PARTITIONS_ASSIGNED to RUNNING
2019-11-18 21:27:16 INFO KafkaConsumer:1068 - [Consumer clientId=lineStatistics4-5e5e2a71-2a3b-4e51-9d4c-4e470a8a8099-StreamThread-1-restore-consumer, groupId=null] Unsubscribed all topics or patterns and assigned partitions
2019-11-18 21:27:16 INFO KafkaConsumer:1068 - [Consumer clientId=lineStatistics4-5e5e2a71-2a3b-4e51-9d4c-4e470a8a8099-StreamThread-1-restore-consumer, groupId=null] Unsubscribed all topics or patterns and assigned partitions
2019-11-18 21:27:16 INFO StreamThread:212 - stream-thread [lineStatistics4-5e5e2a71-2a3b-4e51-9d4c-4e470a8a8099-StreamThread-1] State transition from PARTITIONS_ASSIGNED to RUNNING
2019-11-18 21:27:16 INFO KafkaStreams:263 - stream-client [lineStatistics4-5e5e2a71-2a3b-4e51-9d4c-4e470a8a8099] State transition from REBALANCING to RUNNING
2019-11-18 21:27:16 INFO ConsumerCoordinator:982 - [Consumer clientId=lineStatistics4-5e5e2a71-2a3b-4e51-9d4c-4e470a8a8099-StreamThread-2-consumer, groupId=lineStatistics4] Found no committed offset for partition lines4-1
2019-11-18 21:27:16 INFO ConsumerCoordinator:982 - [Consumer clientId=lineStatistics4-5e5e2a71-2a3b-4e51-9d4c-4e470a8a8099-StreamThread-1-consumer, groupId=lineStatistics4] Found no committed offset for partition lines4-2
2019-11-18 21:27:16 INFO ConsumerCoordinator:525 - [Consumer clientId=lineStatistics4-5e5e2a71-2a3b-4e51-9d4c-4e470a8a8099-StreamThread-2-consumer, groupId=lineStatistics4] Setting offset for partition lineStatistics4-KSTREAM-AGGREGATE-STATE-STORE-0000000003-repartition-0 to the committed offset FetchPosition{offset=73, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=localhost:9092 (id: 1 rack: null), epoch=0}}
2019-11-18 21:27:16 INFO ConsumerCoordinator:525 - [Consumer clientId=lineStatistics4-5e5e2a71-2a3b-4e51-9d4c-4e470a8a8099-StreamThread-1-consumer, groupId=lineStatistics4] Setting offset for partition lines4-0 to the committed offset FetchPosition{offset=63, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=localhost:9092 (id: 1 rack: null), epoch=0}}
2019-11-18 21:27:16 INFO ConsumerCoordinator:525 - [Consumer clientId=lineStatistics4-5e5e2a71-2a3b-4e51-9d4c-4e470a8a8099-StreamThread-1-consumer, groupId=lineStatistics4] Setting offset for partition lineStatistics4-KSTREAM-AGGREGATE-STATE-STORE-0000000003-repartition-3 to the committed offset FetchPosition{offset=40, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=localhost:9092 (id: 1 rack: null), epoch=0}}
2019-11-18 21:27:17 INFO SubscriptionState:348 - [Consumer clientId=lineStatistics4-5e5e2a71-2a3b-4e51-9d4c-4e470a8a8099-StreamThread-2-consumer, groupId=lineStatistics4] Resetting offset for partition lines4-1 to offset 0.
2019-11-18 21:27:17 INFO SubscriptionState:348 - [Consumer clientId=lineStatistics4-5e5e2a71-2a3b-4e51-9d4c-4e470a8a8099-StreamThread-1-consumer, groupId=lineStatistics4] Resetting offset for partition lines4-2 to offset 0.

Apache Kafka Java consumer does not receive message for topic with replication factor more than one

I'm starting on Apache Kakfa with a simple Producer, Consumer app in Java. I'm using kafka-clients version 0.10.0.1 and running it on a Mac.
I created a topic named replicated_topic_partitioned with 3 partitions and with replication factor as 3.
I started the zookeeper at port 2181. I started three brokers with id 1, 2 and 3 on ports 9092, 9093 and 9094 respectively.
Here's the output of the describe command
kafka_2.12-2.3.0/bin/kafka-topics.sh --describe --topic replicated_topic_partitioned --bootstrap-server localhost:9092
Topic:replicated_topic_partitioned PartitionCount:3 ReplicationFactor:3 Configs:segment.bytes=1073741824
Topic: replicated_topic_partitioned Partition: 0 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
Topic: replicated_topic_partitioned Partition: 1 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
Topic: replicated_topic_partitioned Partition: 2 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
I wrote a simple producer and a consumer code. The producer ran successfully and published the messages. But when I start the consumer, the poll call just waits indefinitely. On debugging, I found that it keeps on looping at the awaitMetadataUpdate method on the ConsumerNetworkClient.
Here are the code for Producer and Consumer
Properties properties = new Properties();
properties.put("bootstrap.servers", "localhost:9092");
properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
KafkaProducer<String, String> myProducer = new KafkaProducer<>(properties);
DateFormat dtFormat = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss:SSS");
String topic = "replicated_topic_partitioned";
int numberOfRecords = 10;
try {
for (int i = 0; i < numberOfRecords; i++) {
String message = String.format("Message: %s sent at %s", Integer.toString(i), dtFormat.format(new Date()));
System.out.println("Sending " + message);
myProducer.send(new ProducerRecord<String, String>(topic, message));
}
} catch (Exception e) {
e.printStackTrace();
} finally {
myProducer.close();
}
Consumer.java
Properties properties = new Properties();
properties.put("bootstrap.servers", "localhost:9092");
properties.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
properties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
properties.put("group.id", UUID.randomUUID().toString());
properties.put("auto.offset.reset", "earliest");
KafkaConsumer<String, String> myConsumer = new KafkaConsumer<>(properties);
String topic = "replicated_topic_partitioned";
myConsumer.subscribe(Collections.singletonList(topic));
try {
while (true){
ConsumerRecords<String, String> records = myConsumer.poll(1000);
printRecords(records);
}
} finally {
myConsumer.close();
}
Adding some key-fields from server.properties
broker.id=1
host.name=localhost
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/tmp/kafka-logs-1
num.partitions=1
num.recovery.threads.per.data.dir=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
zookeeper.connection.timeout.ms=6000
group.initial.rebalance.delay.ms=0
The server.properties for the other two brokers was a replica of the above with broker.id, the port and thelog.dirs changed.
This did not work for me:
Kafka 0.9.0.1 Java Consumer stuck in awaitMetadataUpdate()
But, if I start the consumer from the command line passing a partition, it successfully reads the messages for that partition. But it does not receive any message when just a topic is specified.
Works:
kafka_2.12-2.3.0/bin/kafka-console-consumer.sh --topic replicated_topic_partitioned --bootstrap-server localhost:9092
--from-beginning --partition 1
Does not work:
kafka_2.12-2.3.0/bin/kafka-console-consumer.sh --topic replicated_topic_partitioned --bootstrap-server localhost:9092
--from-beginning
NOTE: The above consumer works perfectly for a topic with replication factor equals 1.
Question:
Why does the Java Producer not read any message for topic with replication factor more than one (even when assigning it to a partition) (like myConsumer.assign(Collections.singletonList(new TopicPartition(topic, 2))?
Why does the console consumer read message only when passed a partition (again works for a topic with replication factor of one)
so, youre sending 10 records, but all 10 records have the SAME key:
for (int i = 0; i < numberOfRecords; i++) {
String message = String.format("Message: %s sent at %s", Integer.toString(i), dtFormat.format(new Date()));
System.out.println("Sending " + message);
myProducer.send(new ProducerRecord<String, String>(topic, message)); <--- KEY=topic
}
unless told otherwise (by setting a partition directly on the ProducerRecord) the partition into which a record is delivered is determine by something like:
partition = murmur2(serialize(key)) % numPartitions
so same key means same partition.
have you tried searching for your 10 records on partitions 0 and 2 maybe?
if you want a better "spread" of records amongst partitions, either use a null key (you'd get round robin) or a variable key.
Disclaimer: This is not an answer.
The Java consumer is now working as expected. I did not do any change to the code or the configuration. The only thing I did was to restart my Mac. This caused the kafka-logs folder (and the zookeeper folder too I guess) to be deleted.
I re-created the topic (with the same command - 3 partitions, replication factor of 3). Then re-started the brokers with the same configuration - no advertised.host.name or advertised.port config.
So, recreation of the kafka-logs and topics remediated something that was causing an issue earlier.
My only suspect is a non-properly terminated consumer. I ran the consumer code without the close call on the consumer in the finally block initially. I also had the same group.id. Maybe, all 3 partitions were assigned to consumers that weren't properly terminated or closed. This is just a guess..
But even calling myConsumer.position(new TopicPartition(topic, 2)) did not return a response earlier when I assigned the consumer to a partition. It was looping in the same awaitMetadataUpdate method.

Categories

Resources