Error in kafka consumer with spring boot (disconnected) - java

I am getting this error in consumer when using spring boot:
2021-10-11 19:42:41.388 WARN 64415 --- [ntainer#0-0-C-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-something-1, groupId=some_group_id] Bootstrap broker my_address:9093 (id: -1 rack: null) disconnected
My application.properties are:
spring.kafka.consumer.bootstrap-servers = my_address:9093
spring.kafka.consumer.group-id= some_group_id
spring.kafka.consumer.auto-offset-reset = earliest
Java code will call following properties in consumer.properties
bootstrap.servers=my_address:9093
schema.registry.url=https://URL.net
security.protocol=SSL
ssl.truststore.location=some_path.jks
ssl.truststore.password=pinPass
ssl.keystore.location=some_path.jks
ssl.keystore.password=pinPass
ssl.key.password=pinPass
in java:
properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,io.confluent.kafka.serializers.KafkaAvroDeserializer.class);
properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,io.confluent.kafka.serializers.KafkaAvroDeserializer.class);
Please let me know if any more inputs needed.
Note: same code works fine for localhost..Also there is no issue with server as i have another code without spring and it works fine!

Your "security.protocol=SSL" means you will use SSL for identity authentication, please check your SSL Keys and Certificates, otherwise using "security.protocol=PLAINTEXT" will use the default, Un-authenticated, non-encrypted channel.
refer to:
https://docs.confluent.io/platform/current/kafka/authentication_ssl.html

Related

How to solve InvalidTopicException with multiplexed input topics in Spring Cloud Stream Kafka Streams Binder?

I wrote a Spring Cloud Streams Kafka Streams Binder application that has multiple Kafka input topics multiplexed to one stream with:
spring:
cloud:
stream:
bindings:
process-in-0:
destination: test.topic-a,test.topic-b
(Source: https://spring.io/blog/2019/12/03/stream-processing-with-spring-cloud-stream-and-apache-kafka-streams-part-2-programming-model-continued)
But whenever I set up more than one topic in the input destination (separated by comma), the following error occurs:
2022-06-17 14:07:07.648 INFO --- [-StreamThread-1] o.a.k.clients.consumer.KafkaConsumer : [Consumer clientId=test-processor-2ba8d1d3-5bbe-45d3-a832-6a24cf2f5549-StreamThread-1-consumer, groupId=test-processor] Subscribed to topic(s): test-processor-KTABLE-AGGREGATE-STATE-STORE-0000000005-repartition, test.topic-a,test.topic-b
2022-06-17 14:07:07.660 WARN --- [-StreamThread-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=test-processor-2ba8d1d3-5bbe-45d3-a832-6a24cf2f5549-StreamThread-1-consumer, groupId=test-processor] Error while fetching metadata with correlation id 2 : {test-processor-KTABLE-AGGREGATE-STATE-STORE-0000000005-repartition=UNKNOWN_TOPIC_OR_PARTITION, test.topic-a,test.topic-b=INVALID_TOPIC_EXCEPTION}
2022-06-17 14:07:07.660 ERROR --- [-StreamThread-1] org.apache.kafka.clients.Metadata : [Consumer clientId=test-processor-2ba8d1d3-5bbe-45d3-a832-6a24cf2f5549-StreamThread-1-consumer, groupId=test-processor] Metadata response reported invalid topics [test.topic-a,test.topic-b]
2022-06-17 14:07:07.660 INFO --- [-StreamThread-1] org.apache.kafka.clients.Metadata : [Consumer clientId=test-processor-2ba8d1d3-5bbe-45d3-a832-6a24cf2f5549-StreamThread-1-consumer, groupId=test-processor] Cluster ID: XYZ
2022-06-17 14:07:07.663 ERROR --- [-StreamThread-1] org.apache.kafka.streams.KafkaStreams : stream-client [test-processor-2ba8d1d3-5bbe-45d3-a832-6a24cf2f5549] Encountered the following exception during processing and Kafka Streams opted to SHUTDOWN_CLIENT. The streams client is going to shut down now.
org.apache.kafka.streams.errors.StreamsException: org.apache.kafka.common.errors.InvalidTopicException: Invalid topics: [test.topic-a,test.topic-b]
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:627) ~[kafka-streams-3.2.0.jar:na]
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:551) ~[kafka-streams-3.2.0.jar:na]
Caused by: org.apache.kafka.common.errors.InvalidTopicException: Invalid topics: [test.topic-a,test.topic-b]
I tried with the following dependencies:
implementation 'org.apache.kafka:kafka-clients:3.2.0'
implementation 'org.apache.kafka:kafka-streams:3.2.0'
implementation "org.springframework.cloud:spring-cloud-stream"
implementation "org.springframework.cloud:spring-cloud-stream-binder-kafka"
implementation "org.springframework.cloud:spring-cloud-stream-binder-kafka-streams"
implementation "org.springframework.kafka:spring-kafka"
When I only set one input topic, everything works fine.
I am not able to determine what causes the InvalidTopicException, because I only use permitted characters in topic names and also the comma separator seems correct (else different exceptions occur).
Actually right after posting the question I found out the/one solution/workaround myself. So here it is for future help:
Apparently, I am not allowed to multiplex input topics when my processor topology expects a KTable as input type. When I change the processor signature to KStream, it suddenly works:
Not working:
#Bean
public Function<KTable<String, Object>, KStream<String, Object>> process() {
return stringObjectKTable ->
stringObjectKTable
.mapValues(...
Working:
#Bean
public Function<KStream<String, Object>, KStream<String, Object>> process() {
return stringObjectKStream ->
stringObjectKStream
.toTable()
.mapValues(...
I am not sure, if this is expected behaviour or if there is something else wrong, so I appreciate any hints, if there is more underlying.

Error while fetching metadata with correlation id 92 : {myTest=UNKNOWN_TOPIC_OR_PARTITION}

I have created a sample application to check my producer's code. My application runs fine when I'm sending data without a partitioning key. But, on specifying a key for data partitioning I'm getting the error:
[kafka-producer-network-thread | producer-1] WARN org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Error while fetching metadata with correlation id 37 : {myTest=UNKNOWN_TOPIC_OR_PARTITION}
[kafka-producer-network-thread | producer-1] WARN org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Error while fetching metadata with correlation id 38 : {myTest=UNKNOWN_TOPIC_OR_PARTITION}
[kafka-producer-network-thread | producer-1] WARN org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Error while fetching metadata with correlation id 39 : {myTest=UNKNOWN_TOPIC_OR_PARTITION}
for both consumer and producer. I have searched a lot on the internet, they have suggested to verify kafka.acl settings. I'm using kafka on HDInsight and I have no idea how to verify it and solve this issue.
My cluster has following configuration:
Head Node: 2
Worker Node:4
Zookeeper: 3
MY producer code:
public static void produce(String brokers, String topicName) throws IOException{
// Set properties used to configure the producer
Properties properties = new Properties();
// Set the brokers (bootstrap servers)
properties.setProperty("bootstrap.servers", brokers);
properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
// specify the protocol for Domain Joined clusters
//To create an Idempotent Producer
properties.setProperty(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
properties.setProperty(ProducerConfig.ACKS_CONFIG, "all");
properties.setProperty(ProducerConfig.RETRIES_CONFIG, Integer.toString(Integer.MAX_VALUE));
properties.setProperty(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "test-transactional-id");
KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
producer.initTransactions();
// So we can generate random sentences
Random random = new Random();
String[] sentences = new String[] {
"the cow jumped over the moon",
"an apple a day keeps the doctor away",
"four score and seven years ago",
"snow white and the seven dwarfs",
"i am at two with nature",
};
for(String sentence: sentences){
// Send the sentence to the test topic
try
{
String key=sentence.substring(0,2);
producer.beginTransaction();
producer.send(new ProducerRecord<String, String>(topicName,key,sentence)).get();
}
catch (Exception ex)
{
System.out.print(ex.getMessage());
throw new IOException(ex.toString());
}
producer.commitTransaction();
}
}
Also, My topic consists of 3 partitions with replication factor=3
I made the replication factor less than the number of partitions and it worked for me. It sounds odd to me but yes, it started working after it.
The error clearly states that the topic (or partition) you are producing to does not exist.
Ultimately, you will need to describe the topic (via CLI kafka-topics --describe --topic <topicName> or other means) to verify if this is true
Kafka on HDInsight and I have no idea how to verify it and solve this issue.
ACLs are only setup if you installed the cluster with them, but I believe you can still list ACLs via zookeper-shell or SSHing into one of Hadoop masters.
I too had the same issue while creating a new topic. And when I described the topic, I could see that leaders were not assigned to the topic partitions.
Topic: xxxxxxxxx Partition: 0 Leader: none Replicas: 3,2,1 Isr:
Topic: xxxxxxxxx Partition: 1 Leader: none Replicas: 1,3,2 Isr:
After some googling, figured out that this could happen when we some issue with controller broker, so restarted the controller broker.
And Everything worked as expected...!
If the topic exists but you're still seeing this error, it could mean that the supplied list of brokers is incorrect. Check the bootstrap.servers value, it should be pointing to the right Kafka cluster where the topic resides.
I saw the same issue and I have multiple Kafka clusters and the topic clearly exists. However, my list of brokers was incorrect.

I getting always getting "{"status":504,"error":"Gateway Timeout","message":"com.netflix.zuul.exception.ZuulException: Hystrix Readed time out"}"?

I am always getting "2019-04-09 07:24:23.389 WARN 11676 --- [nio-9095-exec-5] o.s.c.n.z.filters.post.SendErrorFilter : Error during filtering", for request which takes more than 1 second.
I have already tried to increase the timeout but none of them worked.
2019-04-09 07:24:23.389 WARN 11676 --- [nio-9095-exec-5] o.s.c.n.z.filters.post.SendErrorFilter : Error during filtering
com.netflix.zuul.exception.ZuulException:
at org.springframework.cloud.netflix.zuul.filters.post.SendErrorFilter.findZuulException(SendErrorFilter.java:114) ~[spring-cloud-netflix-zuul-2.1.0.RELEASE.jar:2.1.0.RELEASE]
at org.springframework.cloud.netflix.zuul.filters.post.SendErrorFilter.run(SendErrorFilter.java:76) ~[spring-cloud-netflix-zuul-2.1.0.RELEASE.jar:2.1.0.RELEASE]
at com.netflix.zuul.ZuulFilter.runFilter(ZuulFilter.java:117) ~[zuul-core-1.3.1.jar:1.3.1]
at com.netflix.zuul.FilterProcessor.processZuulFilter(FilterProcessor.java:193) ~[zuul-core-1.3.1.jar:1.3.1]
at com.netflix.zuul.FilterProcessor.runFilters(FilterProcessor.java:157) ~[zuul-core-1.3.1.jar:1.3.1]
at com.netflix.zuul.FilterProcessor.error(FilterProcessor.java:105) ~[zuul-core-1.3.1.jar:1.3.1]
at com.netflix.zuul.ZuulRunner.error(ZuulRunner.java:112) ~[zuul-core-1.3.1.jar:1.3.1]
at com.netflix.zuul.http.ZuulServlet.error(ZuulServlet.java:145) ~[zuul-core-1.3.1.jar:1.3.1]
at com.netflix.zuul.http.ZuulServlet.service(ZuulServlet.java:83) ~[zuul-core-1.3.1.jar:1.3.1]
at org.springframework.web.servlet.mvc.ServletWrappingController.handleRequestInternal(ServletWrappingController.java:165) ~[spring-webmvc-5.1.5.RELEASE.jar:5.1.5.RELEASE]
at ava.lang.Thread.run(Thread.java:834) ~[na:na]
You can check my answer:
here
Hystrix readed timeout by default is 1 second, and you can change that in your application.yaml file. It can be done globally or per service.
Above issue is caused due to hysterix timeout.
The above issue can be solved by disabling the hystrix timeout or increasing the hysterix timeout as below :
# Disable Hystrix timeout globally (for all services)
hystrix.command.default.execution.timeout.enabled: false
#To disable timeout foror particular service,
hystrix.command.<serviceName>.execution.timeout.enabled: false
# Increase the Hystrix timeout to 60s (globally)
hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds: 60000
# Increase the Hystrix timeout to 60s (per service)
hystrix.command.<serviceName>.execution.isolation.thread.timeoutInMilliseconds: 60000
The above solution will work if you are using discovery service for service lookup and routing.
Here is the detailed explaination : spring-cloud-netflix-issue-321
You are timing out on H2 console testing with postman or any other http testers because: Using Zuul...hysterix...you are trying to send the same exact object to the H2 database. This may be happening because you have validators on your models also. To resolve: make sure the json, xml or whatever it is objects are relatively unique by re-edit and then try to send request again.

Spring Cloud Stream KStream Consumer Concurrency has no effect?

My configuration for the consumer is as documented in Spring cloud stream consumer properties documentation.
spring-cloud-dependencies:Finchley.SR1
springBootVersion = '2.0.5.RELEASE'
I have 4 partitions for kstream_test topic and they are filled with messages from producer as seen below:
root#kafka:/# kafka-run-class kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic kstream_test --time -1
kstream_test:2:222
kstream_test:1:203
kstream_test:3:188
kstream_test:0:278
My spring cloud stream kafka binder based configuration is:
spring.cloud.stream.bindings.input:
destination: kstream_test
group: consumer-group-G1_test
consumer:
useNativeDecoding: true
headerMode: raw
startOffset: latest
partitioned: true
concurrency: 3
KStream Listener class
#StreamListener
#SendTo(MessagingStreams.OUTPUT)
public KStream<?, ?> process(#Input(MessagingStreams.INPUT) KStream<?, ?> kstreams) {
......
log.info("Got a message");
......
return kstreams;
}
My producer sends 100 messages in 1 run. But the logs seems to have only 1 thread StreamThread-1 handling the messages, though I have concurrency as 3. What might be wrong here ? Is 100 messages not enough to see the concurrency at play ?
2018-10-18 11:50:01.923 INFO 10228 --- [-StreamThread-1] c.c.c.s.KStreamHandler : Got a message
2018-10-18 11:50:01.923 INFO 10228 --- [-StreamThread-1] c.c.c.s.KStreamHandler : Got a message
2018-10-18 11:50:01.945 INFO 10228 --- [-StreamThread-1] c.c.c.s.KStreamHandler : Got a message
2018-10-18 11:50:01.956 INFO 10228 --- [-StreamThread-1] c.c.c.s.KStreamHandler : Got a message
2018-10-18 11:50:01.972 INFO 10228 --- [-StreamThread-1] c.c.c.s.KStreamHandler : Got a message
UPDATE:
As per the answer, the below num.stream.threads configuration works at the binder level.
spring.cloud.stream.kafka.streams.binder.configuration:
num.stream.threads: 3
It seems that the num.stream.threads needs to be set to increase the concurrency...
/** {#code num.stream.threads} */
#SuppressWarnings("WeakerAccess")
public static final String NUM_STREAM_THREADS_CONFIG = "num.stream.threads";
private static final String NUM_STREAM_THREADS_DOC = "The number of threads to execute stream processing.";
...it defaults to 1.
The binder should really set that based on the ...consumer.concurrency property; please open a github issue to that effect against the binder.
In the meantime, you can just set that property directly in ...consumer.configuration.
CORRECTION
I've just been told that the ...consumer.configuration is not currently applied to the streams binder either; you would have to set it at the binder level.

Understanding commit failure from the consumer when leader has changed

Consider following real obfuscated logs:
19:33:48,409 99733391 (pool-6-thread-11) ERROR [org.apache.kafka.clients.consumer.internals.ConsumerCoordinator] [] [Consumer clientId=app2.maria1.mcdonalnds_service_msg, groupId=mcdonalnds_service_msg] Offset commit failed on partition service_megaman_mt-mcdonalnds_service_msg-1 at offset 75796: This is not the correct coordinator.
19:33:48,410 99733392 (pool-6-thread-11) INFO [org.apache.kafka.clients.consumer.internals.AbstractCoordinator] [] [Consumer clientId=app2.maria1.mcdonalnds_service_msg, groupId=mcdonalnds_service_msg] Group coordinator kafka1.maria4.internal:9092 (id: 2147483646 rack: null) is unavailable or invalid, will attempt rediscovery
19:33:48,414 99733396 (kafka-producer-network-thread | producer-1) WARN [org.apache.kafka.clients.producer.internals.Sender] [] [Producer clientId=producer-1] Got error produce response with correlation id 16386 on topic-partition service_megaman_mo-mcdonalnds_service_msg-1, retrying (99 attempts left). Error: NOT_LEADER_FOR_PARTITION
19:33:48,510 99733492 (pool-6-thread-11) INFO [org.apache.kafka.clients.consumer.internals.AbstractCoordinator] [] [Consumer clientId=app2.maria1.mcdonalnds_service_msg, groupId=mcdonalnds_service_msg] Discovered group coordinator kafka3.maria4.internal:9092 (id: 2147483644 rack: null)
19:33:48,528 99733510 (pool-6-thread-11) ERROR [org.apache.kafka.clients.consumer.internals.ConsumerCoordinator] [] [Consumer clientId=app2.maria1.mcdonalnds_service_msg, groupId=mcdonalnds_service_msg] Offset commit failed on partition service_megaman_mt-mcdonalnds_service_msg-1 at offset 75796: The coordinator is not aware of this member.
19:33:48,528 99733510 (pool-6-thread-11) ERROR [com.bob.kafka.consumer.ListenableKafkaConsumer] [] Aborting consumer [mcdonalnds_service_msg] for topics [[service_megaman_mt-mcdonalnds_service_msg]] operation due to failure! Cause:
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
As far as I understand the exception message about poll() is not really the cause. So what happened:
1. Coordinator was not available
2. Consumer found new coordinator
3. New coordinator did not recognise offset so rejected the commit
What I am trying to figure out is the options to recover from this situation. This is not intermittent issue but happened once in a year so the setting on poll would not have helped if leader died.
What happens now: Original application code was simply closing consumers which is wrong , caused alerts and woke up just about everyone as application stopped consuming messages :-)
What I want to happen:
Consumer is restarted, doesn't die if loses connection to coordinator
What I am not sure about:
Why coordinator is not aware of this member
If I understand the issue correctly. :-)
On service side with Java Kafka lib for KafkaConsumer class I
should call close and subscribe or unsubscribe and
subscribe to fullfill my consumer recovery scenario.
What is going to happen to the processed
offset
which was rejected by new coordinator? Since offset was not commited I assume consumer will re-read the same messages?
Following post for Spring-kafka looks very similar issue but the service does not use Spring so this is of limited use to me.

Categories

Resources