Kafka Streams StateStore infinite loop - Java

We have a Kafka Streams app that uses an in-memory key-value StateStore with the changelog disabled.
String stateStoreName = "statestore-v1";
StoreBuilder<KeyValueStore<String, Event>> keyValueStoreBuilder =
        Stores.keyValueStoreBuilder(
                Stores.inMemoryKeyValueStore(stateStoreName),
                Serdes.String(),
                new JsonSerde<>(Event.class));
keyValueStoreBuilder.withLoggingDisabled();
streamsBuilder.addStateStore(keyValueStoreBuilder);
We now want to enable the changelog, with a different configuration and a different store name.
String stateStoreName = "statestore-v2";
StoreBuilder<KeyValueStore<String, Event>> keyValueStoreBuilder =
        Stores.keyValueStoreBuilder(
                Stores.inMemoryKeyValueStore(stateStoreName),
                Serdes.String(),
                new JsonSerde<>(Event.class));
Map<String, String> changelogConfig = new HashMap<>();
changelogConfig.put("retention.ms", "43200000"); // 12 hours
changelogConfig.put("cleanup.policy", "delete");
changelogConfig.put("auto.offset.reset", "latest");
keyValueStoreBuilder.withLoggingEnabled(changelogConfig);
streamsBuilder.addStateStore(keyValueStoreBuilder);
When we run our application, we get into an infinite loop with these messages:
2022-10-11 13:02:32.761 app=myapp INFO 54561 --- [-StreamThread-3]
o.a.k.s.p.i.StoreChangelogReader : stream-thread [myapp-StreamThread-3]
End offset for changelog myapp-statestore-v2-changelog-4 cannot be found;
will retry in the next time.
2022-10-11 13:02:32.761 app=myapp INFO 54561 --- [-StreamThread-3]
o.a.k.clients.consumer.KafkaConsumer : [Consumer clientId=myapp-StreamThread-3-restore-consumer, groupId=null]
Unsubscribed all topics or patterns and assigned partitions
It does not appear that the changelog topic is ever created... At least kafka-topics does not show it.
I am using io.confluent packages version 7.2.2-ccs, which I think translates to Apache Kafka version 3.2.x
Any ideas on how to fix the infinite loop and get the changelog topics created?
Thanks!

The infinite loop was caused by our blue/green deployment. We learned that we cannot do a blue/green deployment if we are changing anything about the StateStore (its configuration, or disabling/re-enabling changelogs).
We just did a complete shutdown of the old version, then deployed the new version. That worked fine.
Another option would be to use the kafka-streams-application-reset tool, as OneKricketeer suggested.
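A side note on the snippet in the question, as a minimal sketch (assuming the changelog map is meant to hold topic-level configs only): withLoggingEnabled(...) applies its map as configuration for the internal changelog topic, and auto.offset.reset is a consumer setting rather than a topic config, so it arguably does not belong there.

Map<String, String> changelogConfig = new HashMap<>();
changelogConfig.put("retention.ms", "43200000"); // 12 hours
changelogConfig.put("cleanup.policy", "delete");
// auto.offset.reset removed: it is a consumer config, not a valid
// changelog *topic* config, so it has no effect in this map.
keyValueStoreBuilder.withLoggingEnabled(changelogConfig);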

Related

How to manage RecordTooLargeException while avoiding Flink job restarts

Is there any way to ignore oversized messages without the Flink job restarting?
If I try to produce (using KafkaSink) a message which is too large (greater than max.message.bytes), a RecordTooLargeException occurs, the Flink job restarts, and this "exception & restart" cycle repeats endlessly!
I don't need to increase message size limits such as max.message.bytes (Kafka topic config) and max.request.size (Flink producer config); they are already large enough. I just want to handle the situation when an unrealistically large message is about to be produced: the message should be ignored, an error should be logged, no runtime exception should occur, and the endless restart loop should not start.
I tried to use a ProducerInterceptor -> it cannot intercept/reject a message, it can only modify it.
I tried to ignore oversized messages in the SerializationSchema (implemented a custom wrapper of SerializationSchema) -> it cannot discard a message from being produced either.
I am trying to override the KafkaWriter and KafkaSink classes, but it seems to be challenging.
I will be grateful for any advice!
A few quick environment details:
Kafka version is 2.8.1
Flink code is Java code based on the newer KafkaSource/KafkaSink API, not the older KafkaConsumer/KafkaProducer API.
The flink-clients and flink-connector-kafka version is 1.15.0
Code sample which throws the RecordTooLargeException:
int numberOfRows = 1;
int rowsPerSecond = 1;
DataStream<String> stream = environment.addSource(
        new DataGeneratorSource<>(
                RandomGenerator.stringGenerator(1050000), // max.message.bytes=1048588
                rowsPerSecond,
                (long) numberOfRows),
        TypeInformation.of(String.class))
    .setParallelism(1)
    .name("string-generator");
KafkaSinkBuilder<String> builder = KafkaSink.<String>builder()
        .setBootstrapServers("localhost:9092")
        .setDeliverGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)
        .setRecordSerializer(
                KafkaRecordSerializationSchema.builder()
                        .setTopic("test.output")
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build());
KafkaSink<String> sink = builder.build();
stream.sinkTo(sink).setParallelism(1).name("output-producer");
Exception Stack Trace:
2022-06-02/14:01:45.066/PDT [flink-akka.actor.default-dispatcher-4] INFO output-producer: Writer -> output-producer: Committer (1/1) (a66beca5a05c1c27691f7b94ca6ac025) switched from RUNNING to FAILED on 271b1b90-7d6b-4a34-8116-3de6faa8a9bf # 127.0.0.1 (dataPort=-1).
org.apache.flink.util.FlinkRuntimeException: Failed to send data to Kafka null with FlinkKafkaInternalProducer{transactionalId='null', inTransaction=false, closed=false}
    at org.apache.flink.connector.kafka.sink.KafkaWriter$WriterCallback.throwException(KafkaWriter.java:440) ~[flink-connector-kafka-1.15.0.jar:1.15.0]
    at org.apache.flink.connector.kafka.sink.KafkaWriter$WriterCallback.lambda$onCompletion$0(KafkaWriter.java:421) ~[flink-connector-kafka-1.15.0.jar:1.15.0]
    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50) ~[flink-streaming-java-1.15.0.jar:1.15.0]
    at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90) ~[flink-streaming-java-1.15.0.jar:1.15.0]
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsNonBlocking(MailboxProcessor.java:353) ~[flink-streaming-java-1.15.0.jar:1.15.0]
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:317) ~[flink-streaming-java-1.15.0.jar:1.15.0]
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201) ~[flink-streaming-java-1.15.0.jar:1.15.0]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:804) ~[flink-streaming-java-1.15.0.jar:1.15.0]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:753) ~[flink-streaming-java-1.15.0.jar:1.15.0]
    at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948) ~[flink-runtime-1.15.0.jar:1.15.0]
    at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) ~[flink-runtime-1.15.0.jar:1.15.0]
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741) ~[flink-runtime-1.15.0.jar:1.15.0]
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563) ~[flink-runtime-1.15.0.jar:1.15.0]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292]
Caused by: org.apache.kafka.common.errors.RecordTooLargeException: The message is 1050088 bytes when serialized which is larger than 1048576, which is the value of the max.request.size configuration.
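A minimal workaround sketch (not from the original question): since neither a ProducerInterceptor nor a SerializationSchema can drop a record, the size check can be done upstream of the sink with a plain filter. The 1048576-byte threshold mirrors the max.request.size from the stack trace, and the UTF-8 byte count only approximates the final serialized record size; both are assumptions here.

// requires: import java.nio.charset.StandardCharsets;
final int maxRecordBytes = 1048576; // assumed limit, taken from max.request.size above
stream
    .filter(value -> value.getBytes(StandardCharsets.UTF_8).length <= maxRecordBytes)
    .name("drop-oversized-records")
    .sinkTo(sink)
    .setParallelism(1)
    .name("output-producer");

Dropping happens before serialization, so the job never sees the RecordTooLargeException; a RichFilterFunction variant could additionally log or count the dropped records.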

How to solve InvalidTopicException with multiplexed input topics in Spring Cloud Stream Kafka Streams Binder?

I wrote a Spring Cloud Stream Kafka Streams Binder application that has multiple Kafka input topics multiplexed to one stream with:
spring:
  cloud:
    stream:
      bindings:
        process-in-0:
          destination: test.topic-a,test.topic-b
(Source: https://spring.io/blog/2019/12/03/stream-processing-with-spring-cloud-stream-and-apache-kafka-streams-part-2-programming-model-continued)
But whenever I set up more than one topic in the input destination (separated by commas), the following error occurs:
2022-06-17 14:07:07.648 INFO --- [-StreamThread-1] o.a.k.clients.consumer.KafkaConsumer : [Consumer clientId=test-processor-2ba8d1d3-5bbe-45d3-a832-6a24cf2f5549-StreamThread-1-consumer, groupId=test-processor] Subscribed to topic(s): test-processor-KTABLE-AGGREGATE-STATE-STORE-0000000005-repartition, test.topic-a,test.topic-b
2022-06-17 14:07:07.660 WARN --- [-StreamThread-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=test-processor-2ba8d1d3-5bbe-45d3-a832-6a24cf2f5549-StreamThread-1-consumer, groupId=test-processor] Error while fetching metadata with correlation id 2 : {test-processor-KTABLE-AGGREGATE-STATE-STORE-0000000005-repartition=UNKNOWN_TOPIC_OR_PARTITION, test.topic-a,test.topic-b=INVALID_TOPIC_EXCEPTION}
2022-06-17 14:07:07.660 ERROR --- [-StreamThread-1] org.apache.kafka.clients.Metadata : [Consumer clientId=test-processor-2ba8d1d3-5bbe-45d3-a832-6a24cf2f5549-StreamThread-1-consumer, groupId=test-processor] Metadata response reported invalid topics [test.topic-a,test.topic-b]
2022-06-17 14:07:07.660 INFO --- [-StreamThread-1] org.apache.kafka.clients.Metadata : [Consumer clientId=test-processor-2ba8d1d3-5bbe-45d3-a832-6a24cf2f5549-StreamThread-1-consumer, groupId=test-processor] Cluster ID: XYZ
2022-06-17 14:07:07.663 ERROR --- [-StreamThread-1] org.apache.kafka.streams.KafkaStreams : stream-client [test-processor-2ba8d1d3-5bbe-45d3-a832-6a24cf2f5549] Encountered the following exception during processing and Kafka Streams opted to SHUTDOWN_CLIENT. The streams client is going to shut down now.
org.apache.kafka.streams.errors.StreamsException: org.apache.kafka.common.errors.InvalidTopicException: Invalid topics: [test.topic-a,test.topic-b]
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:627) ~[kafka-streams-3.2.0.jar:na]
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:551) ~[kafka-streams-3.2.0.jar:na]
Caused by: org.apache.kafka.common.errors.InvalidTopicException: Invalid topics: [test.topic-a,test.topic-b]
I tried with the following dependencies:
implementation 'org.apache.kafka:kafka-clients:3.2.0'
implementation 'org.apache.kafka:kafka-streams:3.2.0'
implementation "org.springframework.cloud:spring-cloud-stream"
implementation "org.springframework.cloud:spring-cloud-stream-binder-kafka"
implementation "org.springframework.cloud:spring-cloud-stream-binder-kafka-streams"
implementation "org.springframework.kafka:spring-kafka"
When I only set one input topic, everything works fine.
I am not able to determine what causes the InvalidTopicException, because I only use permitted characters in topic names, and the comma separator seems correct (otherwise different exceptions occur).
Right after posting the question I found a solution/workaround myself, so here it is for future reference:
Apparently, I am not allowed to multiplex input topics when my processor topology expects a KTable as input type. When I change the processor signature to KStream, it suddenly works:
Not working:
@Bean
public Function<KTable<String, Object>, KStream<String, Object>> process() {
    return stringObjectKTable ->
        stringObjectKTable
            .mapValues(...
Working:
@Bean
public Function<KStream<String, Object>, KStream<String, Object>> process() {
    return stringObjectKStream ->
        stringObjectKStream
            .toTable()
            .mapValues(...
I am not sure if this is expected behaviour or if something else is wrong, so I would appreciate any hints in case there is more to it.

Kafka streams throwing OutOfMemory exceptions continuously and stops working

I have set up a streams application which consumes messages from one topic, transforms them, and puts them on another topic; if any error happens during serialization, it puts the records on an error topic.
The load of messages is huge (in the millions). The stream app was working perfectly fine until a few days ago: we had loaded around 70M records and it was still doing well. Then, the day before yesterday, we added another stream to the same application and started streaming data, and now the application crashes with OOM exceptions. Each of the streams has different topics and consumer groups assigned.
The application runs for an hour or so and then crashes with "java.lang.OutOfMemoryError: Java heap space" errors.
This application is behaving very strangely. We increased the heap size (Xmx) to 2G on each node; our topology is two nodes running the application, connected to a Kafka cluster running three broker nodes.
There were no network issues, but I frequently see "Attempt to heartbeat failed since group is rebalancing" and consumer rebalancing happening in the logs, only for the newly added stream.
kafka clients version - 2.3.1
kafka broker - 2.11
Kafka Streams configuration:
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaProperties.getBootstrapServers());
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 4);
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
props.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG, SendAndContinueExceptionHandler.class);
props.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG, CustomProductionExceptionHandler.class);
props.put(ProducerConfig.RETRIES_CONFIG, "1");
props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, CustomPartitioner.class);
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
Streams creation code:
@Bean
public Set<KafkaStreams> kStreamJson(StreamsBuilder builder) {
    Serde<JsonNode> jsonSerde = Serdes.serdeFrom(jsonSerializer, jsonDeserializer);
    final KStream<String, JsonNode> infoStream = builder.stream(inputTopic, Consumed.with(Serdes.String(), jsonSerde));
    Properties infoProps = kStreamsConfigs().asProperties();
    infoProps.put(StreamsConfig.APPLICATION_ID_CONFIG, migrationMOIProfilesGroupId);
    infoStream
        .map(IProcessX::process)
        .through(
            outputTopic,
            Produced.with(Serdes.String(), new JsonPOJOSerde<>(Message.class)));
    return Sets.newHashSet(
        new KafkaStreams(builder.build(), infoProps)
    );
}
Errors received:
[8/6/20, 22:22:54:070 GST] 00000076 SystemOut O 2020-08-06 22:22:54.070 INFO 83225 --- [s-streams-group] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=streams-group-5c15c2c1-798f-4b4d-91a8-24bd9a093fe6-StreamThread-18-consumer, groupId=streams-group] Discovered group coordinator kafka.broker:9092 (id: 2147483644 rack: null)
[8/6/20, 22:23:22:831 GST] 00000d95 SystemOut O 2020-08-06 21:27:59.979 ERROR 83225 --- [| producer-3356] o.apache.kafka.common.utils.KafkaThread : Uncaught exception in thread 'kafka-producer-network-thread | producer-3356':
java.lang.OutOfMemoryError: Java heap space
I have checked session.timeout.ms, heartbeat.interval.ms, max.poll.interval.ms, and max.poll.records, but I'm not sure what values to set for them.
Please help me solve the issue.
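A minimal sketch (assuming kafka-streams 2.3.1 as stated above, and that the defaults are still in effect): two settings that directly bound per-instance memory are the record cache size and the number of stream threads. The values below are illustrative assumptions, not tuned recommendations.

// Cap the record cache shared by all stream threads of one instance
// (default is 10485760 bytes; smaller trades throughput for heap headroom).
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 1024 * 1024L);
// Fewer stream threads means fewer embedded producers/consumers and their buffers.
props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 2);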

What is 'Simple Consumer Group' in Apache Kafka?

I used the code below to find Kafka consumer groups.
ListConsumerGroupsResult listConsumerGroups = admin.listConsumerGroups();
listConsumerGroups.all().get().forEach(v -> {
    logger.info("{}", v);
});
The results are as follows.
[main] INFO KafkaAdminClient - (groupId='test-consumer-group', isSimpleConsumerGroup=false)
[main] INFO KafkaAdminClient - (groupId='wordcount-example', isSimpleConsumerGroup=false)
I want to know what isSimpleConsumerGroup is.
What is a simple consumer group?
It is a style of consumer group in Kafka where the clients manage partition assignment themselves instead of relying on the group coordinator; the main reason for using it is greater control over partition consumption than a regular consumer group provides.
For a detailed answer you can have a look at this link: https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
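For illustration, a minimal sketch (my understanding, with localhost:9092 and the topic/group names as placeholders) of how such a group typically arises: the client assigns partitions itself with assign() instead of subscribe(), so it never joins the group-management protocol and uses the group id only for offset storage.

// requires the usual org.apache.kafka.clients.consumer imports
Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
props.put(ConsumerConfig.GROUP_ID_CONFIG, "manual-assign-group");     // placeholder
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    // assign() bypasses the rebalance protocol entirely
    consumer.assign(Collections.singletonList(new TopicPartition("test.topic", 0)));
    consumer.poll(Duration.ofSeconds(1));
    consumer.commitSync(); // offsets are still stored under the group id
}
// A group created this way should be reported with isSimpleConsumerGroup=true
// (verify against your broker version).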

Error while fetching metadata with correlation id 92 : {myTest=UNKNOWN_TOPIC_OR_PARTITION}

I have created a sample application to check my producer's code. My application runs fine when I'm sending data without a partitioning key. But when I specify a key for data partitioning, I get the error:
[kafka-producer-network-thread | producer-1] WARN org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Error while fetching metadata with correlation id 37 : {myTest=UNKNOWN_TOPIC_OR_PARTITION}
[kafka-producer-network-thread | producer-1] WARN org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Error while fetching metadata with correlation id 38 : {myTest=UNKNOWN_TOPIC_OR_PARTITION}
[kafka-producer-network-thread | producer-1] WARN org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Error while fetching metadata with correlation id 39 : {myTest=UNKNOWN_TOPIC_OR_PARTITION}
for both the consumer and the producer. I have searched a lot on the internet; the suggestions were to verify the Kafka ACL settings. I'm using Kafka on HDInsight and I have no idea how to verify the ACLs and solve this issue.
My cluster has the following configuration:
Head Node: 2
Worker Node: 4
Zookeeper: 3
My producer code:
public static void produce(String brokers, String topicName) throws IOException {
    // Set properties used to configure the producer
    Properties properties = new Properties();
    // Set the brokers (bootstrap servers)
    properties.setProperty("bootstrap.servers", brokers);
    properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    // specify the protocol for Domain Joined clusters
    // To create an idempotent producer
    properties.setProperty(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
    properties.setProperty(ProducerConfig.ACKS_CONFIG, "all");
    properties.setProperty(ProducerConfig.RETRIES_CONFIG, Integer.toString(Integer.MAX_VALUE));
    properties.setProperty(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "test-transactional-id");
    KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
    producer.initTransactions();
    // So we can generate random sentences
    Random random = new Random();
    String[] sentences = new String[] {
        "the cow jumped over the moon",
        "an apple a day keeps the doctor away",
        "four score and seven years ago",
        "snow white and the seven dwarfs",
        "i am at two with nature",
    };
    for (String sentence : sentences) {
        // Send the sentence to the test topic
        try {
            String key = sentence.substring(0, 2);
            producer.beginTransaction();
            producer.send(new ProducerRecord<String, String>(topicName, key, sentence)).get();
        } catch (Exception ex) {
            System.out.print(ex.getMessage());
            throw new IOException(ex.toString());
        }
        producer.commitTransaction();
    }
}
Also, my topic consists of 3 partitions with replication factor = 3.
I made the replication factor less than the number of partitions and it worked for me. It sounds odd to me, but yes, it started working after that.
The error clearly states that the topic (or partition) you are producing to does not exist.
Ultimately, you will need to describe the topic (via the CLI, kafka-topics --describe --topic <topicName>, or other means) to verify whether this is true.
"Kafka on HDInsight and I have no idea how to verify it and solve this issue."
ACLs are only set up if you installed the cluster with them, but I believe you can still list ACLs via zookeeper-shell or by SSHing into one of the Hadoop masters.
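As a sketch of those "other means" (the brokers string and "myTest" are taken from the question; not HDInsight-specific), the AdminClient can describe the topic programmatically:

Properties adminProps = new Properties();
adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, brokers);
try (AdminClient admin = AdminClient.create(adminProps)) {
    // get() throws an ExecutionException wrapping UnknownTopicOrPartitionException
    // if the topic does not exist on the cluster you are pointed at.
    TopicDescription description =
            admin.describeTopics(Collections.singletonList("myTest")).all().get().get("myTest");
    System.out.println(description.partitions());
}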
I too had the same issue while creating a new topic. When I described the topic, I could see that no leaders were assigned to the topic partitions.
Topic: xxxxxxxxx Partition: 0 Leader: none Replicas: 3,2,1 Isr:
Topic: xxxxxxxxx Partition: 1 Leader: none Replicas: 1,3,2 Isr:
After some googling, I figured out that this can happen when there is an issue with the controller broker, so I restarted the controller broker.
And everything worked as expected!
If the topic exists but you're still seeing this error, it could mean that the supplied list of brokers is incorrect. Check the bootstrap.servers value; it should point to the Kafka cluster where the topic resides.
I saw the same issue: I have multiple Kafka clusters and the topic clearly existed, but my list of brokers was incorrect.
