I have a consumer which polls from multiple topics.
Until now I only produced record into these topics with Java and everything worked fine.
I use the confulent tools with avro.
Now I tried to manually produce a topic via the terminal.
I create a avro-producer with the same schema my other producer uses:
# Produce a record with one field
kafka-avro-console-producer \
--broker-list 127.0.0.1:9092 --topic order_created-in \
--property schema.registry.url=http://127.0.0.1:8081 \
--property value.schema='{"type":"record","name":"test","fields":[{"name":"name","type":"string"},{"name":"APropertie","type":{"type":"array","items":{"type":"record","name":"APropertie","fields":[{"name":"key","type":"string"},{"name":"name","type":"string"},{"name":"date","type":"string"}]}}}]}'
After that I produce a record which follows the specified schema:
{"name": "order_created", "APropertie": [{"key": "1", "name": "testname", "date": "testdate"}]}
The record gets correctly produced to the topic. But my AvroConsumer throws an exception:
Polling
Polling
Polling
Polling
Polling
Polling
Exception in thread "main" org.apache.kafka.common.errors.SerializationException: Error deserializing key/value for partition order_created-in-0 at offset 1. If needed, please seek past the record to continue consumption.
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id 61
Caused by: org.apache.kafka.common.errors.SerializationException: Could not find class test specified in writer's schema whilst finding reader's schema for a SpecificRecord.
Process finished with exit code 1
Any hints?
Thanks!
It has something to do with the config of the producer / consumer.
Normal producers have a config like this:
// normal producer
properties.setProperty("bootstrap.servers", "127.0.0.1:9092");
properties.setProperty("acks", "all");
properties.setProperty("retries", "10");
Avro normally adds the following properties:
// avro part
properties.setProperty("key.serializer", StringSerializer.class.getName());
properties.setProperty("value.serializer", KafkaAvroSerializer.class.getName());
properties.setProperty("schema.registry.url", "http://127.0.0.1:8081");
properties.setProperty("confluent.value.schema.validation", "true");
properties.setProperty("confluent.key.schema.validation", "true");
These have to be included in the the console producer.
Related
l have 5 kafka broker. And in my producer app, my configs is like below:
Interesting part is that, after getting this exception 2 times, in third time message is producing to topic.
spring:
kafka:
streams:
replication-factor: 3
properties:
min.insync.replicas: 2
producer:
acks: "all"
batch-size: 1
key-serializer: org.apache.kafka.common.serialization.StringSerializer
value-serializer: org.springframework.kafka.support.serializer.JsonSerializer
compression-type: "lz4"
retries: 2
properties:
linger.ms: 1
request:
timeout:
ms: 60000
So, l tried to assign different count for replication-factor and min.insync.replicas.
min.insync.replicas and replication.factor are both Topic Configs, not a Kafka (Streams) client property.
From Spring Kafka , you can use #Bean NewTopic to define Kafka topic resources with their configurations.
Your error probably went away because the cluster had healed itself, while your app was retrying the request. You should look at the broker server logs rather than your app's.
Is there any way to ignore oversized messages without Flink job restarting?
If I try to produce (using KafkaSink ) a message which is too large (greater than max.message.bytes) then the RecordTooLargeException occurs and the Flink job restarts, then this "exception&restart" cycle is repeating endlessly!
I don't need to increase messages size limits such as max.message.bytes (Kafka Topic Config) and max.request.size (Flink Producer Config), they are good, they are already big. I just want to handle the situation when an unrealistically large message is trying to be produced. In this case, this big message should be ignored, and an error should be logged, and any Runtime Exception should NOT occur, and the endless restarting loop should NOT start.
I tried to use ProducerInterceptor -> it cannot intercept/reject a message, it can just modify it.
I tried to ignore oversized messages in SerializationSchema (implemented a custom wrapper of SerializationSchema) -> it cannot discard message producing too.
I am trying to overwrite KafkaWriter and KafkaSink classes, but it seems to be challenging.
I will be grateful for any advice!
A few quick environment details:
Kafka version is 2.8.1
Flink code is Java code based on the newer KafkaSource/KafkaSink API, not the
older KafkaConsumer/KafkaProduer API.
The flink-clients and flink-connector-kafka version is 1.15.0
Code sample which throws the RecordTooLargeException:
int numberOfRows = 1;
int rowsPerSecond = 1;
DataStream<String> stream = environment.addSource(
new DataGeneratorSource<>(
RandomGenerator.stringGenerator(1050000), // max.message.bytes=1048588
rowsPerSecond,
(long) numberOfRows),
TypeInformation.of(String.class))
.setParallelism(1)
.name("string-generator");
KafkaSinkBuilder<String> builder = KafkaSink.<String>builder()
.setBootstrapServers("localhost:9092")
.setDeliverGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)
.setRecordSerializer(
KafkaRecordSerializationSchema.builder().setTopic("test.output")
.setValueSerializationSchema(new SimpleStringSchema())
.build());
KafkaSink<String> sink = builder.build();
stream.sinkTo(sink).setParallelism(1).name("output-producer");
Exception Stack Trace:
2022-06-02/14:01:45.066/PDT [flink-akka.actor.default-dispatcher-4] INFO output-producer: Writer -> output-producer: Committer (1/1) (a66beca5a05c1c27691f7b94ca6ac025) switched from RUNNING to FAILED on 271b1b90-7d6b-4a34-8116-3de6faa8a9bf # 127.0.0.1 (dataPort=-1). org.apache.flink.util.FlinkRuntimeException: Failed to send data to Kafka null with FlinkKafkaInternalProducer{transactionalId='null', inTransaction=false, closed=false} at org.apache.flink.connector.kafka.sink.KafkaWriter$WriterCallback.throwException(KafkaWriter.java:440) ~[flink-connector-kafka-1.15.0.jar:1.15.0] at org.apache.flink.connector.kafka.sink.KafkaWriter$WriterCallback.lambda$onCompletion$0(KafkaWriter.java:421) ~[flink-connector-kafka-1.15.0.jar:1.15.0] at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50) ~[flink-streaming-java-1.15.0.jar:1.15.0] at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90) ~[flink-streaming-java-1.15.0.jar:1.15.0] at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsNonBlocking(MailboxProcessor.java:353) ~[flink-streaming-java-1.15.0.jar:1.15.0] at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:317) ~[flink-streaming-java-1.15.0.jar:1.15.0] at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201) ~[flink-streaming-java-1.15.0.jar:1.15.0] at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:804) ~[flink-streaming-java-1.15.0.jar:1.15.0] at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:753) ~[flink-streaming-java-1.15.0.jar:1.15.0] at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948) ~[flink-runtime-1.15.0.jar:1.15.0] at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) ~[flink-runtime-1.15.0.jar:1.15.0] at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741) ~[flink-runtime-1.15.0.jar:1.15.0] at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563) ~[flink-runtime-1.15.0.jar:1.15.0] at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292] Caused by: org.apache.kafka.common.errors.RecordTooLargeException: The message is 1050088 bytes when serialized which is larger than 1048576, which is the value of the max.request.size configuration.
I have set up a streams application which consumes messages from one topic, transforms them and put it to the other topic, if any error happens in serialization it puts the records to the error topic.
The load of messages is huge (in millions). The stream app was working perfectly fine until a few days ao, we loaded around 70M data and it was still doing good, then day before yesterday we added another stream to the same application and started streaming the data, now the application crashes with OOM exceptions. Each of the streams have different topics and consumer groups assigned.
The applications runs for an hour or so and then crashes with "java.lang.OutOfMemoryError: Java heap space" errors.
This application is behaving very strangely, we increased the heap size(Xmx) to 2G on each node, our topology is 2 nodes are running the application which are connected to Kafka broker which is running 3 nodes.
There were no network issues but I frequently see "Attempt to heartbeat failed since group is rebalancing" and consumer rebalancing happening in the logs only for the newly added stream.
kafka clients version - 2.3.1
kafka broker - 2.11
Kafka streams configuration:
```props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaProperties.getBootstrapServers());
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 4);
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
props.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG, SendAndContinueExceptionHandler.class);
props.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG, CustomProductionExceptionHandler.class);
props.put(ProducerConfig.RETRIES_CONFIG, "1");
props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, CustomPartitioner.class);
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,"org.apache.kafka.common.serialization.StringSerializer");
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
Streams creation code:
```#Bean
public Set<KafkaStreams> kStreamJson(StreamsBuilder builder) {
Serde<JsonNode> jsonSerde = Serdes.serdeFrom(jsonSerializer, jsonDeserializer);
final KStream<String, JsonNode> infoStream = builder.stream(inputTopic, Consumed.with(Serdes.String(), jsonSerde));
Properties infoProps = kStreamsConfigs().asProperties();
infoProps.put(StreamsConfig.APPLICATION_ID_CONFIG, migrationMOIProfilesGroupId);
infoStream
.map(IProcessX::process)
.through(
outputTopic,
Produced.with(Serdes.String(), new JsonPOJOSerde<>(Message.class)));
return Sets.newHashSet(
new KafkaStreams(builder.build(), infoProps)
);
}
Errors received:
[8/6/20, 22:22:54:070 GST] 00000076 SystemOut O 2020-08-06 22:22:54.070 INFO 83225 --- [s-streams-group] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=streams-group-5c15c2c1-798f-4b4d-91a8-24bd9a093fe6-StreamThread-18-consumer, groupId=streams-group] Discovered group coordinator kafka.broker:9092 (id: 2147483644 rack: null)
[8/6/20, 22:23:22:831 GST] 00000d95 SystemOut O 2020-08-06 21:27:59.979 ERROR 83225 --- [| producer-3356] o.apache.kafka.common.utils.KafkaThread : Uncaught exception in thread 'kafka-producer-network-thread | producer-3356':
java.lang.OutOfMemoryError: Java heap space
I have checked session.timeout.ms, heartbeat.timeout.ms, max.poll.interval.ms, max.poll.records but Im not sure what values to set for them.
Please help me solve the issue.
I have created a sample application to check my producer's code. My application runs fine when I'm sending data without a partitioning key. But, on specifying a key for data partitioning I'm getting the error:
[kafka-producer-network-thread | producer-1] WARN org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Error while fetching metadata with correlation id 37 : {myTest=UNKNOWN_TOPIC_OR_PARTITION}
[kafka-producer-network-thread | producer-1] WARN org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Error while fetching metadata with correlation id 38 : {myTest=UNKNOWN_TOPIC_OR_PARTITION}
[kafka-producer-network-thread | producer-1] WARN org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Error while fetching metadata with correlation id 39 : {myTest=UNKNOWN_TOPIC_OR_PARTITION}
for both consumer and producer. I have searched a lot on the internet, they have suggested to verify kafka.acl settings. I'm using kafka on HDInsight and I have no idea how to verify it and solve this issue.
My cluster has following configuration:
Head Node: 2
Worker Node:4
Zookeeper: 3
MY producer code:
public static void produce(String brokers, String topicName) throws IOException{
// Set properties used to configure the producer
Properties properties = new Properties();
// Set the brokers (bootstrap servers)
properties.setProperty("bootstrap.servers", brokers);
properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
// specify the protocol for Domain Joined clusters
//To create an Idempotent Producer
properties.setProperty(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
properties.setProperty(ProducerConfig.ACKS_CONFIG, "all");
properties.setProperty(ProducerConfig.RETRIES_CONFIG, Integer.toString(Integer.MAX_VALUE));
properties.setProperty(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "test-transactional-id");
KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
producer.initTransactions();
// So we can generate random sentences
Random random = new Random();
String[] sentences = new String[] {
"the cow jumped over the moon",
"an apple a day keeps the doctor away",
"four score and seven years ago",
"snow white and the seven dwarfs",
"i am at two with nature",
};
for(String sentence: sentences){
// Send the sentence to the test topic
try
{
String key=sentence.substring(0,2);
producer.beginTransaction();
producer.send(new ProducerRecord<String, String>(topicName,key,sentence)).get();
}
catch (Exception ex)
{
System.out.print(ex.getMessage());
throw new IOException(ex.toString());
}
producer.commitTransaction();
}
}
Also, My topic consists of 3 partitions with replication factor=3
I made the replication factor less than the number of partitions and it worked for me. It sounds odd to me but yes, it started working after it.
The error clearly states that the topic (or partition) you are producing to does not exist.
Ultimately, you will need to describe the topic (via CLI kafka-topics --describe --topic <topicName> or other means) to verify if this is true
Kafka on HDInsight and I have no idea how to verify it and solve this issue.
ACLs are only setup if you installed the cluster with them, but I believe you can still list ACLs via zookeper-shell or SSHing into one of Hadoop masters.
I too had the same issue while creating a new topic. And when I described the topic, I could see that leaders were not assigned to the topic partitions.
Topic: xxxxxxxxx Partition: 0 Leader: none Replicas: 3,2,1 Isr:
Topic: xxxxxxxxx Partition: 1 Leader: none Replicas: 1,3,2 Isr:
After some googling, figured out that this could happen when we some issue with controller broker, so restarted the controller broker.
And Everything worked as expected...!
If the topic exists but you're still seeing this error, it could mean that the supplied list of brokers is incorrect. Check the bootstrap.servers value, it should be pointing to the right Kafka cluster where the topic resides.
I saw the same issue and I have multiple Kafka clusters and the topic clearly exists. However, my list of brokers was incorrect.
Possible duplicate of Kafka - This server is not the leader for that topic-partition but there is no accepted answer nor a clear solution.
I have a simple java program to produce the message to Kafka:
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "all");
props.put("retries", 1);
props.put("batch.size", 16384);
props.put("linger.ms", 100);
props.put("buffer.memory", 33554432);
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "com.company.project.KafkaValueSerializer");
MyMessage message = new MyMessage();
Producer<String, MyMessage> producer = new KafkaProducer<>(props);
Future<RecordMetadata> future =
producer.send(new ProducerRecord<String, MyMessage>("My_Topic", "", message));
I am getting the following exception:
Exception in thread "main" java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:94)
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:64)
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:29)
at
Caused by: org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
When I tried with kafka-console-producer, I am getting the following error:
D:\kafka_2.11-0.10.2.0\bin\windows>kafka-console-producer.bat --broker-list localhost:9092 --topic My_Topic
hello
[2018-10-25 17:05:27,225] WARN Got error produce response with correlation id 3 on topic-partition My_Topic-0, retrying (2 attempts left). Error: NOT_LEADER_FOR_PARTITION (org.apache.kafka.clients.producer.internals.Sender)
[2018-10-25 17:05:27,335] WARN Got error produce response with correlation id 5 on topic-partition My_Topic-0, retrying (1 attempts left). Error: NOT_LEADER_FOR_PARTITION (org.apache.kafka.clients.producer.internals.Sender)
[2018-10-25 17:05:27,444] WARN Got error produce response with correlation id 7 on topic-partition My_Topic-0, retrying (0 attempts left). Error: NOT_LEADER_FOR_PARTITION (org.apache.kafka.clients.producer.internals.Sender)
[2018-10-25 17:05:27,544] ERROR Error when sending message to topic My_Topic with key: null, value: 5 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
When I describe my topic, I have the following information:
Topic:My_Topic PartitionCount:1 ReplicationFactor:1 Configs:
Topic: My_Topic Partition: 0 Leader: 0 Replicas: 0 Isr: 0
I tried creating a new topic and producing messages as mentioned in Quick start guide then the above steps work well.
What correction should I make in My_Topic or producer configuration so that I can publish messages successfully in this topic?
Had it been the case of "console client working, but Java program not working", then the solution of 'changing retry limit' may have helped.
As both the Java program and the built-in command-line producer are not able to connect to Kafka, I suspect that the problem could be due to stale configuration.
(Example: topic deleted and created again with the same name, but with different partition count).
Deleting the old log files of zookeeper and Kafka and creating the topic again, or creating a topic with a different name will resolve the issue.