How should we retrieve the full details of all brokers (connected/disconnected) from a Kafka cluster/ZooKeeper?
I found the following way to fetch only active brokers, but I want the IP address of a broker that was previously serving in the cluster but is now disconnected.
The following code snippet gives the list of active brokers:
import java.util.List;
import org.apache.zookeeper.ZooKeeper;

ZooKeeper zkInstance = new ZooKeeper("mymachine:port", 10000, null);
List<String> brokerIDs = zkInstance.getChildren("/brokers/ids", false);
for (String brokerID : brokerIDs) {
    // each broker znode stores a JSON blob with host, port, jmx_port, ...
    String brokerInfo = new String(zkInstance.getData("/brokers/ids/" + brokerID, false, null));
    // crude string slicing to pull the "host" and "jmx_port" values out of the JSON
    String host = brokerInfo.substring(brokerInfo.indexOf("\"host\"")).split(",")[0].replaceAll("\"", "").split(":")[1];
    String port = brokerInfo.substring(brokerInfo.indexOf("\"jmx_port\"")).split(",")[0].replaceAll("\"", "").split(":")[1];
    System.out.println(host + ":" + port);
}
Output:
my-machine-1:port
my-machine-2:port
my-machine-3:port
my-machine-4:port
I need information on all connected/disconnected brokers in a multi-node Kafka cluster.
Use describeCluster() of the AdminClient class to get broker details such as host, port, id, and rack.
Please refer to the following code:
import java.util.Collection;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.DescribeClusterResult;
import org.apache.kafka.common.Node;

Properties kafkaProperties = new Properties();
kafkaProperties.put("bootstrap.servers", "localhost:9092,localhost:9093,localhost:9094");
AdminClient adminClient = AdminClient.create(kafkaProperties);
DescribeClusterResult describeClusterResult = adminClient.describeCluster();
// nodes() returns a KafkaFuture; get() blocks until the metadata arrives
Collection<Node> brokerDetails = describeClusterResult.nodes().get();
System.out.println("host and port details");
for (Node broker : brokerDetails) {
    System.out.println(broker.host() + ":" + broker.port());
}
adminClient.close();
I have been learning Kafka recently, and my consumers can't consume any records unless I specify the --partition 0 parameter. In other words, I can NOT consume records like:
kafka-console-consumer --bootstrap-server 127.0.0.10:9092 --topic first-topic
but it works like:
kafka-console-consumer --bootstrap-server 127.0.0.10:9092 --topic first-topic --partition 0
The main problem is that when I moved to Java code, my KafkaConsumer class can't fetch records, and I need to know how to specify the partition number in the Java KafkaConsumer.
My current Java code is:
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ConsumerDemo {
    public static void main(String[] args) {
        Logger logger = LoggerFactory.getLogger(ConsumerDemo.class.getName());
        String bootstrapServer = "127.0.0.10:9092";
        String groupId = "my-kafka-java-app";
        String topic = "first-topic";
        // create consumer configs
        Properties properties = new Properties();
        properties.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServer);
        //properties.setProperty(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, partition);
        properties.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        properties.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        properties.setProperty(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        properties.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        // create consumer
        KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(properties);
        // subscribe consumer to our topic
        consumer.subscribe(Collections.singleton(topic)); // means subscription to one topic
        // poll for new data
        while (true) {
            // consumer.poll(100); old way
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
            for (ConsumerRecord<String, String> record : records) {
                logger.info("Key: " + record.key() + ", Value: " + record.value());
                logger.info("Partition: " + record.partition() + ", Offset: " + record.offset());
            }
        }
    }
}
After a lot of inspection, my solution turned out to be using consumer.assign and consumer.seek instead of consumer.subscribe, and without specifying the groupId. But I feel there should be a more optimal solution.
The Java code will be:
// additionally needed imports:
import java.util.Arrays;
import org.apache.kafka.common.TopicPartition;

// create consumer
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(properties);
// subscribe consumer to our topic
//consumer.subscribe(Collections.singleton(topic)); // means subscription to one topic
// assign and seek are mostly used to replay data or fetch a specific message
TopicPartition partitionToReadFrom = new TopicPartition(topic, 0);
long offsetToReadFrom = 15L;
// assign
consumer.assign(Arrays.asList(partitionToReadFrom));
// seek: for a specific offset to read from
consumer.seek(partitionToReadFrom, offsetToReadFrom);
The way you are doing it is correct. You don't need to specify the partition when subscribing to a topic. Maybe your consumer group has already consumed all the messages in the topic and has committed the latest offsets.
Make sure new messages are being produced when you run your application, or create a new consumer group to consume from the beginning (if you keep ConsumerConfig.AUTO_OFFSET_RESET_CONFIG set to "earliest").
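For example, to re-consume from the beginning with the names used in your code, you could either switch to a new group id or reset the existing group's committed offsets with the kafka-consumer-groups tool that ships with Kafka (a sketch; run it while the consumer application is stopped):
kafka-consumer-groups --bootstrap-server 127.0.0.10:9092 --group my-kafka-java-app --topic first-topic --reset-offsets --to-earliest --execute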
As the name implies, the ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG property aims to configure a partition assignment strategy, not to set a fixed partition as instructed by the command line.
The default strategy used is the RangeAssignor which can be changed, for example to a StickyAssignor as follows:
properties.setProperty(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, StickyAssignor.class.getName());
You can read more in the Kafka Client-side Assignment Proposal.
Given the following application.conf:
akka {
  loglevel = debug
  actor {
    provider = cluster
    serialization-bindings {
      "sample.cluster.CborSerializable" = jackson-cbor
    }
  }
  remote {
    artery {
      canonical.hostname = "127.0.0.1"
      canonical.port = 0
    }
  }
  cluster {
    roles = ["testrole1", "testrole2"]
    seed-nodes = [
      "akka://ClusterSystem#127.0.0.1:25251",
      "akka://ClusterSystem#127.0.0.1:25252"]
    downing-provider-class = "akka.cluster.sbr.SplitBrainResolverProvider"
  }
}
To discern between the roles within an Actor I use:
void register(Member member) {
    if (member.hasRole("testrole1")) {
        // start actor a1
    } else if (member.hasRole("testrole2")) {
        // start actor a2
    }
}
(adapted from https://doc.akka.io/docs/akka/current/cluster-usage.html)
To enable roles for a node, I configure the roles array within application.conf, but this appears to be at the cluster level rather than the node level. In other words, it does not seem possible to configure application.conf such that the Akka cluster is instructed to start actor a1 on node n1 and actor a2 on node n2. Should node details be specified at the level of akka.cluster in application.conf?
Is it required to specify a separate application.conf configuration file for each node?
For example, application.conf for testrole1:
akka {
  loglevel = debug
  actor {
    provider = cluster
    serialization-bindings {
      "sample.cluster.CborSerializable" = jackson-cbor
    }
  }
  remote {
    artery {
      canonical.hostname = "127.0.0.1"
      canonical.port = 0
    }
  }
  cluster {
    roles = ["testrole1"]
    seed-nodes = [
      "akka://ClusterSystem#127.0.0.1:25251",
      "akka://ClusterSystem#127.0.0.1:25252"]
    downing-provider-class = "akka.cluster.sbr.SplitBrainResolverProvider"
  }
}
application.conf for testrole2:
akka {
  loglevel = debug
  actor {
    provider = cluster
    serialization-bindings {
      "sample.cluster.CborSerializable" = jackson-cbor
    }
  }
  remote {
    artery {
      canonical.hostname = "127.0.0.1"
      canonical.port = 0
    }
  }
  cluster {
    roles = ["testrole2"]
    seed-nodes = [
      "akka://ClusterSystem#127.0.0.1:25251",
      "akka://ClusterSystem#127.0.0.1:25252"]
    downing-provider-class = "akka.cluster.sbr.SplitBrainResolverProvider"
  }
}
The difference between the two application.conf files defined above is that the value of akka.cluster.roles is either "testrole1" or "testrole2".
How should application.conf be configured such that the Akka cluster is instructed to start actor a1 on node n1 and actor a2 on node n2? Should node details be specified at the level of akka.cluster in application.conf ?
Update:
Another option is to pass the role name via an environment variable. I've just noticed this is explicitly stated here: https://doc.akka.io/docs/akka/current/typed/cluster.html: "The node roles are defined in the configuration property named akka.cluster.roles and typically defined in the start script as a system property or environment variable." In this scenario, the same application.conf file is used for all nodes, but each node sets an environment variable. For example, an updated application.conf (note the addition of "ENV_VARIABLE"):
akka {
  loglevel = debug
  actor {
    provider = cluster
    serialization-bindings {
      "sample.cluster.CborSerializable" = jackson-cbor
    }
  }
  remote {
    artery {
      canonical.hostname = "127.0.0.1"
      canonical.port = 0
    }
  }
  cluster {
    roles = ["ENV_VARIABLE"]
    seed-nodes = [
      "akka://ClusterSystem#127.0.0.1:25251",
      "akka://ClusterSystem#127.0.0.1:25252"]
    downing-provider-class = "akka.cluster.sbr.SplitBrainResolverProvider"
  }
}
Cluster startup scripts would determine the role for each node via the ENV_VARIABLE parameter. Is this a viable solution?
If you're going to assign different roles to different nodes, those nodes cannot use the same configuration. The easiest way to accomplish this is for n1 to have "testrole1" in its akka.cluster.roles list and n2 to have "testrole2" in its akka.cluster.roles list.
Everything in the akka.cluster config only configures that node's participation in the cluster (it configures the cluster component on that node). A few of the settings have to be the same across the nodes of a cluster (e.g. the SBR settings), but a setting on n1 doesn't affect a setting on n2.
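If you want to keep a single application.conf, a minimal sketch of a per-node override could look like this (the NodeLauncher class and NODE_ROLE variable are hypothetical names; "ClusterSystem" comes from the question's config):

import akka.actor.ActorSystem;
import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;

public class NodeLauncher {
    public static void main(String[] args) {
        // hypothetical environment variable, e.g. NODE_ROLE=testrole1 on n1, testrole2 on n2
        String role = System.getenv("NODE_ROLE");
        // override akka.cluster.roles for this node only; all other settings
        // fall back to the shared application.conf
        Config config = ConfigFactory
                .parseString("akka.cluster.roles = [\"" + role + "\"]")
                .withFallback(ConfigFactory.load());
        ActorSystem system = ActorSystem.create("ClusterSystem", config);
    }
}

Alternatively, since HOCON list elements can be overridden individually, passing -Dakka.cluster.roles.0=testrole1 on the java command line achieves the same thing without touching the file.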
I am new to Kafka. I tried to implement consumer and producer classes to send and receive messages. I need to configure bootstrap.servers for both classes, which is a list of the brokers' ip:port pairs separated by commas. For example,
producerConfig.put("bootstrap.servers",
        "172.16.20.207:9946,172.16.20.234:9125,172.16.20.36:9636");
Since the application will be running on the master node of a cluster, it should be able to retrieve the broker information from ZooKeeper just like the answer to Kafka: Get broker host from ZooKeeper.
public static void main(String[] args) throws Exception {
    ZooKeeper zk = new ZooKeeper("localhost:2181", 10000, null);
    List<String> ids = zk.getChildren("/brokers/ids", false);
    for (String id : ids) {
        String brokerInfo = new String(zk.getData("/brokers/ids/" + id, false, null));
        System.out.println(id + ": " + brokerInfo);
    }
}
However, this brokerInfo is in JSON format and looks like this:
{"jmx_port":-1,"timestamp":"1428512949385","host":"192.168.0.11","version":1,"port":9093}
In the same post, someone else suggested the following way of getting the connection string for each broker and joining them with commas.
for (String id : ids) {
    String brokerInfoString = new String(zk.getData("/brokers/ids/" + id, false, null));
    Broker broker = Broker.createBroker(Integer.valueOf(id), brokerInfoString);
    if (broker != null) {
        brokerList.add(broker.connectionString());
    }
}
If this Broker class is from org.apache.kafka.common.requests.UpdateMetadataRequest.Broker, it does not have the methods createBroker and connectionString.
I found another similar post, Getting the list of Brokers Dynamically. But it did not say how to get attributes such as host and port from the broker info. I can probably write a parser for the JSON-like string to extract them, but I suspect there is a more Kafka-native way to do that. Any suggestions?
EDIT: I realized the Broker class is from kafka.cluster.Broker. Still, it does not have the method connectionString().
You could use ZkUtils to retrieve all the broker information in the cluster, as shown below:
ZkUtils zk = ZkUtils.apply("zkHost:2181", 6000, 6000, true);
List<Broker> brokers = JavaConversions.seqAsJavaList(zk.getAllBrokersInCluster());
for (Broker broker : brokers) {
    // assuming you do not enable security
    System.out.println(broker.getBrokerEndPoint(SecurityProtocol.PLAINTEXT).host());
}
zk.close();
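For reference, this compiles against the old Scala client classes; a sketch of the imports, assuming a pre-2.x Kafka version (ZkUtils belongs to the deprecated internal API and was removed in later releases, and SecurityProtocol moved packages between versions, so check yours):

import java.util.List;
import kafka.cluster.Broker;
import kafka.utils.ZkUtils;
import org.apache.kafka.common.protocol.SecurityProtocol;
import scala.collection.JavaConversions;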
I am using the following code to extract the Kafka broker list from ZooKeeper:
private static String getBrokerList() {
    try {
        ZooKeeper zookeeper = new ZooKeeper(zookeeperConnect, 15000, null);
        List<String> ids = zookeeper.getChildren(ZkUtils.BrokerIdsPath(), false);
        List<String> brokerList = new ArrayList<>();
        for (String id : ids) {
            String brokerInfo = new String(zookeeper.getData(ZkUtils.BrokerIdsPath() + '/' + id, false, null),
                    Charset.forName("UTF-8"));
            JsonObject jsonElement = new JsonParser().parse(brokerInfo).getAsJsonObject();
            String host = jsonElement.get("host").getAsString();
            brokerList.add(host + ':' + jsonElement.get("port").toString());
        }
        return Joiner.on(",").join(brokerList);
    } catch (KeeperException | InterruptedException e) {
        Throwables.propagate(e);
    }
    return "";
}
The above code works fine when only one thread executes it at a time.
However, when several threads execute it concurrently, it occasionally fails with the following exception:
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /brokers/ids
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1532)
    at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1560)
What am I doing wrong here?
My ZooKeeper version is 3.4.6-1569965.
From http://zookeeper.apache.org/doc/r3.4.9/api/org/apache/zookeeper/ZooKeeper.html#ZooKeeper(java.lang.String,%20int,%20org.apache.zookeeper.Watcher):
"Session establishment is asynchronous. This constructor will initiate connection to the server and return immediately - potentially (usually) before the session is fully established. The watcher argument specifies the watcher that will be notified of any changes in state. This notification can come at any point before or after the constructor call has returned."
You have to wait for the ZooKeeper connection to be fully established:
https://www.tutorialspoint.com/zookeeper/zookeeper_quick_guide.htm
Scroll down to the API section "Connect to the ZooKeeper Ensemble".
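A minimal sketch of that wait, reusing the zookeeperConnect string and 15000 ms session timeout from your method (the latch-based watcher is the usual idiom):

import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

final CountDownLatch connected = new CountDownLatch(1);
ZooKeeper zookeeper = new ZooKeeper(zookeeperConnect, 15000, event -> {
    // the watcher is notified of session state changes; release the latch once the session is up
    if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
        connected.countDown();
    }
});
connected.await(); // block until the session is established before calling getChildren()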
When configuring the Cluster Routing in the config file using:
akka.actor.deployment {
  /jobDispatcher/singleton/workerRouter {
    router = round-robin-pool
    nr-of-instances = 5
    cluster {
      enabled = on
      max-number-of-instances-per-node = 1
      allow-local-routees = on
    }
  }
}
I can look up a routed worker using:
ActorRef actor = context().actorOf(
        FromConfig.getInstance().props(
                Props.create(MyRoutedActor.class)),
        "workerRouter");
I would prefer configuring the pool programmatically, since I want to hide the details from my user.
However, using:
ActorRef actor = context().actorOf(
        new ClusterRouterPool(new RoundRobinPool(5),
                new ClusterRouterPoolSettings(100, 1, true, ""))
                .props(Props.create(MyRoutedActor.class)),
        "workerRouter");

does not route the calls to routees in the cluster (only to the local ones).
How do I configure the routing correctly?
Try to use ClusterRouterPool.
The Akka docs say (http://doc.akka.io/docs/akka/2.4/scala/cluster-usage.html):
Pool - router that creates routees as child actors and deploys them on remote nodes. Each router will have its own routee instances. For example, if you start a router on 3 nodes in a 10-node cluster, you will have 30 routees in total if the router is configured to use one instance per node. The routees created by the different routers will not be shared among the routers. One example of a use case for this type of router is a single master that coordinates jobs and delegates the actual work to routees running on other nodes in the cluster.
The example from http://doc.akka.io/docs/akka/2.4/java/cluster-usage.html#Router_with_Pool_of_Remote_Deployed_Routees:
akka.actor.deployment {
  /statsService/singleton/workerRouter {
    router = consistent-hashing-pool
    cluster {
      enabled = on
      max-nr-of-instances-per-node = 3
      allow-local-routees = on
      use-role = compute
    }
  }
}
The code to do it programmatically (also from the Akka docs):
int totalInstances = 100;
int maxInstancesPerNode = 3;
boolean allowLocalRoutees = false;
String useRole = "compute";
ActorRef workerRouter = getContext().actorOf(
        new ClusterRouterPool(new ConsistentHashingPool(0),
                new ClusterRouterPoolSettings(totalInstances, maxInstancesPerNode,
                        allowLocalRoutees, useRole)).props(Props.create(StatsWorker.class)),
        "workerRouter3");
I have an Akka cluster in Scala, and this is my code:
val workerRouter = context.actorOf(
  ClusterRouterGroup(
    AdaptiveLoadBalancingGroup(MixMetricsSelector), // RoundRobinGroup(Nil)
    ClusterRouterGroupSettings(
      totalInstances = 1000, routeesPaths = List("/user/worker"),
      allowLocalRoutees = true, useRole = Some("workerRole"))).props(),
  name = "pool")