Kafka leader election causes Kafka Streams crash - java

I have a Kafka Streams application consuming from and producing to a Kafka cluster with 3 brokers and a replication factor of 3. Other than the consumer offset topics (50 partitions), all other topics have only one partition each.
When the brokers attempt a preferred replica election, the Streams app (which is running on a completely different instance than the brokers) fails with the error:
Caused by: org.apache.kafka.streams.errors.StreamsException: task [0_0] exception caught when producing
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.checkForException(RecordCollectorImpl.java:119)
...
at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:197)
Caused by: org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
Is it normal that the Streams app attempts to be the leader for the partition, given that it's running on a server that's not part of the Kafka cluster?
I can reproduce this behaviour on demand by:
Killing one of the brokers (whereupon the other 2 take over as leader for all partitions that had the killed broker as their leader, as expected)
Bringing the killed broker back up
Triggering a preferred replica leader election with bin/kafka-preferred-replica-election.sh --zookeeper localhost
My issue seems to be similar to this reported failure, so I'm wondering if this is a new Kafka Streams bug. My full stack trace is literally exactly the same as the gist linked in the reported failure (here).
Another potentially interesting detail is that during the leader election, I get these messages in the controller.log of the broker:
[2017-04-12 11:07:50,940] WARN [Controller-3-to-broker-3-send-thread], Controller 3's connection to broker BROKER-3-HOSTNAME:9092 (id: 3 rack: null) was unsuccessful (kafka.controller.RequestSendThread)
java.io.IOException: Connection to BROKER-3-HOSTNAME:9092 (id: 3 rack: null) failed
at kafka.utils.NetworkClientBlockingOps$.awaitReady$1(NetworkClientBlockingOps.scala:84)
at kafka.utils.NetworkClientBlockingOps$.blockingReady$extension(NetworkClientBlockingOps.scala:94)
at kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:232)
at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:185)
at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:184)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
I initially thought this connection error was to blame, but after the leader election crashes the Streams app, if I restart the Streams app, it works normally until the next election, without me touching the brokers at all.
All servers (3 Kafka brokers and the Streams app) are running on EC2 instances.

This is now fixed in 0.10.2.1. If you can't pick that up, make sure you have these two parameters set as follows in your streams config:
final Properties props = new Properties();
...
props.put(ProducerConfig.RETRIES_CONFIG, 10);
props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, Integer.toString(Integer.MAX_VALUE));

Related

Spring-Boot and Kafka : How to handle broker not available?

While the spring-boot app is running and if I shutdown the broker completely ( both kafka and zookeeper ) I am seeing this warn in console for infinite amount of time.
[org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1]
WARN o.apache.kafka.clients.NetworkClient - [Consumer
clientId=consumer-1, groupId=ResponseReceiveConsumerGroup]
Connection to node 2147483647 could not be established. Broker may not
be available.
Is there a way in Spring Boot to handle this gracefully instead of infinite logs on console ?
Increase the reconnect.backoff.ms property (see Kafka docs).
The default is only 50ms.

Spark Streaming on YARN - There is insufficient memory for the Java Runtime Environment to continue

This is a Spark Streaming application running on YARN cluster mode which produces messages in three Kafka Brokers.
As soon as it reaches 150K open files it fails:
There is insufficient memory for the Java Runtime Environment to continue
Native memory allocation (mmap) failed to map 12288 bytes for committing reserved memory.
Job aborted due to stage failure ... :
org.apache.kafka.common.KafkaException: Failed to construct kafka producer
.....
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
When doing lsof -p <PID> for the java process that runs that executor I can see tons(up to 90K) of TCP connections from the host server in the Kafka Brokers:
host:portXXX->kafkabroker1:XmlIpcRegSvc (ESTABLISHED)
host:portYYY->kafkabroker2:XmlIpcRegSvc (ESTABLISHED)
host:portZZZ->kafkabroker3:XmlIpcRegSvc (ESTABLISHED)
I tried reducing the number of executor cores from 8 to 6 but there was not a single difference in the number of open files(still it was reaching 150K) and then kept failing.
The libraries to connect to Kafka from Spark Streaming are:
org.apache.spark.streaming.kafka010.KafkaUtils
org.apache.spark.streaming.dstream.InputDStream
org.apache.kafka.clients.producer.kafkaproducer
The code:
foreachRDD{
get kafkaProducer
do some work on each RDD...
foreach( record => {
kafkaProducer.send(record._1,record._2)
}
kafkaProducer.close()
}
It was a schoolboy error. This very well explained article helped fixing the issue. The kafka producer was not closing the connections so we used the broadcast and lazy evaluation technique which solved the issue.

Kafka Connect implementation errors

I was running through the tutorial here: http://kafka.apache.org/documentation.html#introduction
When I get to "Step 7: Use Kafka Connect to import/export data" and attempt to start two connectors I am getting the following errors:
ERROR Failed to flush WorkerSourceTask{id=local-file-source-0}, timed out while waiting for producer to flush outstanding messages, 1 left
ERROR Failed to commit offsets for WorkerSourceTask
Here is the portion of the tutorial:
Next, we'll start two connectors running in standalone mode, which means they run in a single, local, dedicated process. We provide three configuration files as parameters. The first is always the configuration for the Kafka Connect process, containing common configuration such as the Kafka brokers to connect to and the serialization format for data. The remaining configuration files each specify a connector to create. These files include a unique connector name, the connector class to instantiate, and any other configuration required by the connector.
bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties
I have spent some time looking for a solution, but was unable to find anything useful. Any help is appreciated.
Thanks!
The reason I was getting this error was because the first server I created using the config/server.properties was not running. I am assuming that because it is the lead of the topic, the messages could not be flushed and the offsets could not be committed.
Once I started the kafka server using the server propertes (config/server.properties) This issue was resolved.
You need to start Kafka server and Zookeeper before running Kafka Connect.
You need to exec the cmds in "Step 2: Start the server" below first:
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
from here:https://mail-archives.apache.org/mod_mbox/kafka-users/201601.mbox/%3CCAK0BMEpgWmL93wgm2jVCKbUT5rAZiawzOroTFc_A6Q=GaXQgfQ#mail.gmail.com%3E
You need to start zookeeper and kafka server first before running that line.
start zookeeper
bin/zookeeper-server-start.sh config/zookeeper.properties
start multiple kafka servers
bin/kafka-server-start.sh config/server.properties
bin/kafka-server-start.sh config/server-1.properties
bin/kafka-server-start.sh config/server-2.properties
start connectors
bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties
Then you will see some lines are written into test.sink.txt:
foo
bar
And you can start the consumer to check it:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic connect-test --from-beginning
{"schema":{"type":"string","optional":false},"payload":"foo"}
{"schema":{"type":"string","optional":false},"payload":"bar"}
If you configure your Kafka Broker with a hostname such as my.sandbox.com make sure that you modify the config/connect-standalone.properties accordingly:
bootstrap.servers=my.sandbox.com:9092
On Hortonworks HDP the default port is 6667, hence the setting is
bootstrap.servers=my.sandbox.com:6667
If Kerberos is enabled you will need the following settings as well (without SSL):
security.protocol=PLAINTEXTSASL
producer.security.protocol=PLAINTEXTSASL
producer.sasl.kerberos.service.name=kafka
consumer.security.protocol=PLAINTEXTSASL
consumer.sasl.kerberos.service.name=kafka

Apache Kafka: Failed to Update Metadata/java.nio.channels.ClosedChannelException

I'm just getting started with Apache Kafka/Zookeeper and have been running into issues trying to set up a cluster on AWS. Currently I have three servers:
One running Zookeeper and two running Kafka.
I can start the Kafka servers without issue and can create topics on both of them. However, the trouble comes when I try to start a producer on one machine and a consumer on the other:
on the Kafka producer:
kafka-console-producer.sh --broker-list <kafka server 1 aws public dns>:9092,<kafka server 2 aws public dns>:9092 --topic samsa
on the Kafka consumer:
kafka-console-consumer.sh --zookeeper <zookeeper server ip>:2181 --topic samsa
I type in a message on the producer ("hi") and nothing happens for a while. Then I get this message:
ERROR Error when sending message to topic samsa with key: null, value: 2 bytes
with error: Failed to update metadata after 60000 ms.
(org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
On the consumer side I get this message, which repeats periodically:
WARN Fetching topic metadata with correlation id # for topics [Set(samsa)] from broker [BrokerEndPoint(<broker.id>,<producer's advertised.host.name>,9092)] failed (kafka.client.ClientUtils$)
java.nio.channels.ClosedChannelException
at kafka.network.BlockingChannel.send(BlockingChannel.scala:110)
at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:75)
at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:74)
at kafka.producer.SyncProducer.send(SyncProducer.scala:119)
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:59)
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:94)
at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:66)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
After a while, the producer will then start rapidly throwing this error message with # increasing incrementally:
WARN Error while fetching metadata with correlation id # : {samsa=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
Not sure where to go from here. Let me know if more details about my configuration files are needed
This was a configuration issue.
In order to get it running several changes to config files had to happen:
In config/server.properties on each Kafka server:
host.name: <Public IP>
advertised.host.name: <AWS Public DNS Address>
In config/producer.properties on each Kafka server:
metadata.broker.list: <Producer Server advertised.host.name>:<Producer Server port>,<Consumer Server advertised.host.name>:<Consumer Server port>
In /etc/hosts on each Kafka server, change 127.0.0.1 localhost localhost.localdomain to:
<Public IP> localhost localhost.localdomain

kafka + zookeeper remote = error

I am trying to install a kafka & zookeeper instance on a remote server. I only need 1 node of each actually because i only want to provide remote kafka for test purposes.
Kafka and Zookeeper are running from the Apache Kafka tarball you can find there (v0.0.9), inside a Docker image.
Trying to consume / produce using the provided scripts. And trying to produce using own java application. Everythinf is working fine if Kafka & ZK are installed on the local server.
Here is the error I get while trying to produce :
BrokerPartitionInfo:83 - Error while fetching metadata [{TopicMetadata for topic RSS ->
No partition metadata for topic RSS due to kafka.common.LeaderNotAvailableException}] for topic [RSS]: class kafka.common.LeaderNotAvailableException
Kafka properties tested
First :
borker.id=0
port=9092
host.name=<external-ip>
zookeeper.connect=localhost:<PORT>
Second:
borker.id=0
port=9092
host.name=<external-ip>
zookeeper.connect=<external-ip>:<PORT>
Third:
borker.id=0
port=9092
host.name=<external-ip>
zookeeper.connect=<external-ip>:<PORT>
advertised.host.name=<external-ip>
advertised.host.port=<external-ip>
Last:
borker.id=0
port=9092
host.name=</etc/host name>
zookeeper.connect=<external-ip>:<PORT>
advertised.host.name=<external-ip>
advertised.host.port=<external-ip>
Here is my "/etc/hosts"
127.0.0.1 kafka kafka
127.0.0.1 localhost
I followed the Getting Started, which if I understood is a localhost / signle server configurations. I cannot understand what I have to do to get this work with remote calls...
Thanks for your help !
EDIT 1
host.name=localhost
advertised.host.name=politik.cm-cloud.fr
Seems to allow a local consumer (on the server) and producer. But if we want to do the same from a remote server we get
[2015-12-09 12:44:10,826] WARN Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
java.net.NoRouteToHostException: No route to host
The error does not look like connectivity problem with Zookeeper / Kafka.
Just follow the instruction in "quickstart" from http://kafka.apache.org/
BrokerPartitionInfo:83 - Error while fetching metadata [{TopicMetadata for topic RSS ->
Additionally the error indicates there is no partition info i.e topic not yet created . Try creating topics first and then try to produce/consume because when producing to a non existent topic kafka will create the topic based on auto.create.topics.enable in server.properties but remotely it is better to create topics rathen than relying on auto create

Categories

Resources