I am using the following code to extract the Kafka broker list from ZooKeeper:
private static String getBrokerList() {
    try {
        ZooKeeper zookeeper = new ZooKeeper(zookeeperConnect, 15000, null);
        List<String> ids = zookeeper.getChildren(ZkUtils.BrokerIdsPath(), false);
        List<String> brokerList = new ArrayList<>();
        for (String id : ids) {
            String brokerInfo = new String(zookeeper.getData(ZkUtils.BrokerIdsPath() + '/' + id, false, null), Charset.forName("UTF-8"));
            JsonObject jsonElement = new JsonParser().parse(brokerInfo).getAsJsonObject();
            String host = jsonElement.get("host").getAsString();
            brokerList.add(host + ':' + jsonElement.get("port").toString());
        }
        return Joiner.on(",").join(brokerList);
    } catch (KeeperException | InterruptedException e) {
        return "";
    }
}
The above code works fine when only one thread executes it at a time.
However, when several threads execute it concurrently, it occasionally fails with the following exception:
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /brokers/ids
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1532)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1560)
What am I doing wrong here?
My zookeeper version is 3.4.6-1569965.
From the ZooKeeper API documentation (http://zookeeper.apache.org/doc/r3.4.9/api/org/apache/zookeeper/ZooKeeper.html#ZooKeeper(java.lang.String,%20int,%20org.apache.zookeeper.Watcher)):
"Session establishment is asynchronous. This constructor will initiate connection to the server and return immediately - potentially (usually) before the session is fully established. The watcher argument specifies the watcher that will be notified of any changes in state. This notification can come at any point before or after the constructor call has returned."
You have to wait for the ZooKeeper connection to be fully established:
Scroll down to the API section "Connect to the ZooKeeper Ensemble".
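As a rough illustration of "waiting for the connection", here is a minimal, library-free sketch of the latch pattern. The ConnectionGate class and its method names are hypothetical and not part of ZooKeeper. With the real client, the assumption is that you pass a Watcher to the ZooKeeper constructor, and in process(WatchedEvent) count the latch down when event.getState() is KeeperState.SyncConnected, then call getChildren only after the wait returns true.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: blocks callers until a "connected" notification arrives.
final class ConnectionGate {
    private final CountDownLatch connected = new CountDownLatch(1);

    // Call this from the client's state-change callback (a Watcher in ZooKeeper).
    void onStateChange(String state) {
        if ("SyncConnected".equals(state)) {
            connected.countDown();
        }
    }

    // Returns true once connected, false if the timeout elapses first.
    boolean awaitConnected(long timeoutMillis) {
        try {
            return connected.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Simulates a session that becomes ready after connectDelayMillis.
    static boolean demo(long connectDelayMillis, long timeoutMillis) {
        ConnectionGate gate = new ConnectionGate();
        Thread notifier = new Thread(() -> {
            try {
                Thread.sleep(connectDelayMillis);
                gate.onStateChange("SyncConnected");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        notifier.setDaemon(true);
        notifier.start();
        return gate.awaitConnected(timeoutMillis);
    }
}
```

If the wait times out, it is probably safer to fail or retry than to issue requests on a session that is not yet established.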
How should we retrieve the full details of all brokers (connected and disconnected) from a Kafka cluster or ZooKeeper?
I found the following way to fetch only the active brokers, but I also want the IP address of a broker that previously served in the cluster and is now disconnected.
The following code snippet gives the list of active brokers:
ZooKeeper zkInstance = new ZooKeeper("mymachine:port", 10000, null);
List<String> brokerIDs = zkInstance.getChildren("/brokers/ids", false);
for (String brokerID : brokerIDs) {
    String brokerInfo = new String(zkInstance.getData("/brokers/ids/" + brokerID, false, null));
    String host = brokerInfo.substring(brokerInfo.indexOf("\"host\"")).split(",")[0].replaceAll("\"", "").split(":")[1];
    String port = brokerInfo.substring(brokerInfo.indexOf("\"jmx_port\"")).split(",")[0].replaceAll("\"", "").split(":")[1];
}
I need information on all connected and disconnected brokers in a multi-node Kafka cluster.
Use describeCluster() of the AdminClient class to get broker details such as host, port, id and rack.
Please refer to the following code:
Properties kafkaProperties = new Properties();
kafkaProperties.put("bootstrap.servers", "localhost:9092,localhost:9093,localhost:9094");
AdminClient adminClient = AdminClient.create(kafkaProperties);
DescribeClusterResult describeClusterResult = adminClient.describeCluster();
Collection<Node> brokerDetails = describeClusterResult.nodes().get();
System.out.println("host and port details");
for (Node broker : brokerDetails) {
    // Print each broker as host:port
    System.out.println(broker.host() + ":" + broker.port());
}
We are trying to implement Kafka as our message broker solution. We are deploying our Spring Boot microservices in IBM Bluemix, whose internal message broker implementation is Kafka version 0.10. Since my experience is mostly with JMS and ActiveMQ, I was wondering what the ideal way is to handle system-level errors in the Java consumers.
Here is how we have implemented it currently
Consumer properties
We are using the default properties for
Kafka Consumer
We are spinning up 3 threads per topic all having the same groupId, i.e one KafkaConsumer instance per thread. We have only one partition as of now. The consumer code looks like this in the constructor of the thread class
kafkaConsumer = new KafkaConsumer<String, String>(properties);
final List<String> topicList = new ArrayList<String>();
kafkaConsumer.subscribe(topicList, new ConsumerRebalanceListener() {
    public void onPartitionsRevoked(final Collection<TopicPartition> partitions) {
    }

    public void onPartitionsAssigned(final Collection<TopicPartition> partitions) {
        try {
            logger.info("Partitions assigned, consumer seeking to end.");
            for (final TopicPartition partition : partitions) {
                final long position = kafkaConsumer.position(partition);
                logger.info("current Position: " + position);
                logger.info("Seeking to end...");
                logger.info("Seek from the current position: " + kafkaConsumer.position(partition));
                kafkaConsumer.seek(partition, position);
            }
            logger.info("Consumer can now begin consuming messages.");
        } catch (final Exception e) {
            logger.error("Consumer can now begin consuming messages.");
        }
    }
});
The actual reading happens in the run method of the thread
try {
    // Poll on the Kafka consumer every second.
    final ConsumerRecords<String, String> records = kafkaConsumer.poll(1000);
    // Iterate through all the messages received and print their
    // content.
    for (final TopicPartition partition : records.partitions()) {
        final List<ConsumerRecord<String, String>> partitionRecords = records.records(partition);
        logger.info("consumer is alive and is processing " + partitionRecords.size() + " records");
        for (final ConsumerRecord<String, String> record : partitionRecords) {
            logger.info("processing topic " + record.topic() + " for key " + record.key() + " on offset " + record.offset());
            final Class<? extends Event> resourceClass = eventProcessors.getResourceClass();
            final Object obj = converter.convertToObject(record.value(), resourceClass);
            if (obj != null) {
                logger.info("Event: " + obj + " acquired by " + Thread.currentThread().getName());
                final CommsEvent event = resourceClass.cast(converter.convertToObject(record.value(), resourceClass));
                final MessageResults results = eventProcessors.processEvent(event);
                if ("Success".equals(results.getStatus())) {
                    // commit the processed message which changes
                    // the offset
                    logger.info("Message processed successfully");
                } else {
                    kafkaConsumer.seek(new TopicPartition(record.topic(), record.partition()), record.offset());
                    logger.error("Error processing message : {} with error : {},resetting offset to {} ", obj, results.getError().getMessage(), record.offset());
                    // TODO add return
                }
            }
        }
    }
} catch (final Exception e) {
    logger.error("Consumer has failed with exception: " + e, e);
}
You will notice the EventProcessor, a service class that processes each record and, in most cases, commits the record to the database. If the processor throws an error (System Exception or ValidationException), we do not commit but programmatically seek to that offset, so that the subsequent poll returns records from that offset for that group id.
My doubt is whether this is the right approach. If we get an error and reset the offset, no other message is processed until that error is fixed. This might work for system errors such as being unable to connect to the DB, but if the problem is only with that one event and not the others, we won't be able to process any other record. We thought of the concept of an ErrorTopic: when we get an error, the consumer publishes that event to the ErrorTopic and, in the meantime, keeps processing the subsequent events. But it looks like we are trying to bring the design concepts of JMS (due to my previous experience) into Kafka, and there may be a better way to handle errors in Kafka. Also, reprocessing from the error topic may change the sequence of messages, which we don't want in some scenarios.
Please let me know how anyone has handled this scenario in their projects following the Kafka standards.
if the problem is only with that event and not others to process this one record we wont be able to process any other record
That's correct, and your suggestion to use an error topic seems like a viable one.
I also noticed that with your handling of onPartitionsAssigned you essentially do not use the consumer's committed offset, as it seems you always seek to the end.
If you want to restart from the last successfully committed offset, you should not perform a seek.
Finally, I'd like to point out, though it looks like you already know this, that having 3 consumers in the same group subscribed to a single partition means that 2 out of 3 will be idle.
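To make the error-topic idea a bit more concrete, here is a minimal, library-free sketch of "retry a record a bounded number of times, then divert it and move on". The RetryThenDivert class, the handler predicate and the maxAttempts parameter are all hypothetical names for illustration. In a real consumer, the diverted list would presumably be replaced by a producer send to the error topic, and the offset would be committed after either outcome.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: bounded retries per record, then divert instead of blocking.
final class RetryThenDivert {

    // Processes records in order; returns the ones that still failed after maxAttempts tries.
    static <T> List<T> processAll(List<T> records, Predicate<T> handler, int maxAttempts) {
        List<T> diverted = new ArrayList<>();
        for (T record : records) {
            boolean ok = false;
            for (int attempt = 1; attempt <= maxAttempts && !ok; attempt++) {
                ok = handler.test(record); // true means the record was processed
            }
            if (!ok) {
                diverted.add(record); // stand-in for publishing to an error topic
            }
        }
        return diverted;
    }

    // Record 3 always fails; every other record succeeds on the first attempt.
    static String demo() {
        List<Integer> processed = new ArrayList<>();
        List<Integer> diverted = processAll(Arrays.asList(1, 2, 3, 4, 5), r -> {
            if (r == 3) {
                return false;
            }
            processed.add(r);
            return true;
        }, 3);
        return "processed=" + processed + " diverted=" + diverted;
    }
}
```

Note that this keeps the partition moving at the cost of ordering for the diverted record, which is exactly the trade-off raised in the question; for errors that affect every record (for example, the database being down), pausing and retrying is likely the better fit.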
I am new to Kafka. Tried to implement consumer and producer classes to send and receive messages. Need to configure bootstrap.servers for both classes which is a list of broker's ip and port separated by ,. For example,
Since the application will be running on the master node of a cluster, it should be able to retrieve the broker information from ZooKeeper just like the answer to Kafka: Get broker host from ZooKeeper.
public static void main(String[] args) throws Exception {
    ZooKeeper zk = new ZooKeeper("localhost:2181", 10000, null);
    List<String> ids = zk.getChildren("/brokers/ids", false);
    for (String id : ids) {
        String brokerInfo = new String(zk.getData("/brokers/ids/" + id, false, null));
        System.out.println(id + ": " + brokerInfo);
    }
}
However this brokerInfo is in Json format which looks like this:
In this same post, another one suggested the following way of getting connection string for each broker and join them together with comma.
for (String id : ids) {
String brokerInfoString = new String(zk.getData("/brokers/ids/" + id, false, null));
Broker broker = Broker.createBroker(Integer.valueOf(id), brokerInfoString);
if (broker != null) {
If this Broker class is from org.apache.kafka.common.requests.UpdateMetadataRequest.Broker, it does not have methods createBroker and connectionString.
Found another similar post Getting the list of Brokers Dynamically. But it did not say how to get the attribute from broker info such as host and port. I can probably write a parser for the json like string to extract them, but I suspect there is more Kafka native way to do that. Any suggestions?
EDIT: I realized the Broker class is from kafka.cluster.Broker. Still it does not have method connectionstring().
You could use ZkUtils to retrieve all the broker information in the cluster, as shown below:
ZkUtils zk = ZkUtils.apply("zkHost:2181", 6000, 6000, true);
List<Broker> brokers = JavaConversions.seqAsJavaList(zk.getAllBrokersInCluster());
for (Broker broker : brokers) {
//assuming you do not enable security
I have some basic code that uses a prepared statement in a for loop and writes the result into a Cassandra Table with some throttling using a semaphore.
Session session = null;
try {
session = connector.openSession();
} catch( Exception ex ) {
// .. moan and complain..
System.err.printf("Got %s trying to openSession - %s\n", ex.getClass().getCanonicalName(), ex.getMessage() );
}
if( session != null ) {
// Prepared Statement for Cassandra Inserts
PreparedStatement statement = session.prepare(
"INSERT INTO model.base " +
"(channel, " +
"time_key, " +
"power" +
") VALUES (?,?,?);");
BoundStatement boundStatement = new BoundStatement(statement);
//Query Cassandra Table that has capital letters in the column names
ResultSet results = session.execute("SELECT \"Time_Key\",\"Power\",\"Bandwidth\",\"Start_Frequency\" FROM \"SB1000_49552019\".\"Measured_Value\" limit 800000;");
// Get the Variables from each Row of Cassandra Data
for (Row row : results){
// Upper Case Column Names in Cassandra
time_key = row.getLong("Time_Key");
start_frequency = row.getDouble("Start_Frequency");
power = row.getFloat("Power");
bandwidth = row.getDouble("Bandwidth");
// Create Channel Power Buckets, place information into prepared statement binding, write to cassandra.
for(channel = 1.6000E8; channel <= channel_end; channel+=increment ){
if( (channel >= start_frequency) && (channel <= (start_frequency + bandwidth)) ) {
ResultSetFuture rsf = session.executeAsync(boundStatement.bind(channel,time_key,power));
backlogList.add( rsf ); // put the new one at the end of the list
if( backlogList.size() > 10000 ) { // wait till we have a few
while( backlogList.size() > 5432 ) { // then harvest about half of the oldest ones of them
rsf = backlogList.remove(0);
} // end while
} // end if
} // end if
} // end for
} // end "row" for
} // end session
My connection is built with the following:
public static void main(String[] args) {
if (args.length != 2) {
System.err.println("Syntax: com.neutronis.Spark_Reports <Spark Master URL> <Cassandra contact point>");
SparkConf conf = new SparkConf();
conf.setAppName("Spark Reports");
conf.set("spark.cassandra.connection.host", args[1]);
Spark_Reports app = new Spark_Reports(conf);
With this code I'm attempting to use a semaphore, but my Cassandra cluster still seems to get overloaded and throws the following error:
ERROR ControlConnection: [Control connection] Cannot connect to any
host, scheduling retry in 1000 milliseconds Exception in thread "main"
com.datastax.driver.core.exceptions.NoHostAvailableException: All
host(s) tried for query failed (no host was tried)
It seems odd that it says no host was tried.
I've looked at other semaphore throttling issues such as this and this and attempted to apply to my code above but am still getting the error.
Read my answer to this question for how to back-pressure when using asynchronous calls: What is the best way to get backpressure for Cassandra Writes?
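As a rough idea of what semaphore-based back-pressure can look like, here is a minimal sketch that uses only the JDK. CompletableFuture stands in for the driver's asynchronous result, and BoundedAsyncWriter, submit and slowWrite are hypothetical names. With the DataStax driver, the assumption is that the permit would be released from a callback attached to the future returned by executeAsync.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch: cap the number of in-flight asynchronous writes.
final class BoundedAsyncWriter {
    private final Semaphore permits;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger peakInFlight = new AtomicInteger();

    BoundedAsyncWriter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    // Blocks the caller while maxInFlight writes are pending, then starts the write.
    <T> CompletableFuture<T> submit(Supplier<CompletableFuture<T>> asyncWrite) throws InterruptedException {
        permits.acquire();
        peakInFlight.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
        CompletableFuture<T> future;
        try {
            future = asyncWrite.get();
        } catch (RuntimeException e) {
            inFlight.decrementAndGet();
            permits.release();
            throw e;
        }
        // Release the permit when the write completes, whether it succeeded or failed.
        return future.whenComplete((result, error) -> {
            inFlight.decrementAndGet();
            permits.release();
        });
    }

    int peakInFlight() {
        return peakInFlight.get();
    }

    // Stand-in for a slow write; returns its argument after a short delay.
    static Integer slowWrite(int n) {
        try {
            Thread.sleep(2);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return n;
    }

    // Issues `writes` simulated writes and reports the highest concurrency observed.
    static int demo(int writes, int maxInFlight) {
        BoundedAsyncWriter writer = new BoundedAsyncWriter(maxInFlight);
        List<CompletableFuture<Integer>> pending = new ArrayList<>();
        try {
            for (int i = 0; i < writes; i++) {
                final int n = i;
                CompletableFuture<Integer> f = writer.submit(() -> CompletableFuture.supplyAsync(() -> slowWrite(n)));
                pending.add(f);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        pending.forEach(CompletableFuture::join);
        return writer.peakInFlight();
    }
}
```

The key point is that the permit is acquired before each write starts and released only on completion, so the producer loop slows down to the rate the cluster can absorb instead of queuing an unbounded backlog.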
I have multiple messages in SQS. The following code always returns only one, even if there are dozens visible (not in flight). I thought setMaxNumberOfMessages would allow multiple messages to be consumed at once. Have I misunderstood this?
CreateQueueRequest createQueueRequest = new CreateQueueRequest().withQueueName(queueName);
String queueUrl = sqs.createQueue(createQueueRequest).getQueueUrl();
ReceiveMessageRequest receiveMessageRequest = new ReceiveMessageRequest(queueUrl);
List<Message> messages = sqs.receiveMessage(receiveMessageRequest).getMessages();
for (Message message : messages) {
// i'm a message from SQS
I've also tried using withMaxNumberOfMessages, without any luck:
How do I know there are messages in the queue? More than 1?
Set<String> attrs = new HashSet<String>();
CreateQueueRequest createQueueRequest = new CreateQueueRequest().withQueueName(queueName);
GetQueueAttributesRequest a = new GetQueueAttributesRequest().withQueueUrl(sqs.createQueue(createQueueRequest).getQueueUrl()).withAttributeNames(attrs);
Map<String,String> result = sqs.getQueueAttributes(a).getAttributes();
int num = Integer.parseInt(result.get("ApproximateNumberOfMessages"));
The above is always run beforehand and gives me an int that is greater than 1.
Thanks for your input
AWS API Reference Guide: Query/QueryReceiveMessage
Due to the distributed nature of the queue, a weighted random set of machines is sampled on a ReceiveMessage call. That means only the messages on the sampled machines are returned. If the number of messages in the queue is small (less than 1000), it is likely you will get fewer messages than you requested per ReceiveMessage call. If the number of messages in the queue is extremely small, you might not receive any messages in a particular ReceiveMessage response; in which case you should repeat the request.
MaxNumberOfMessages: Maximum number of messages to return. SQS never returns more messages than this value but might return fewer.
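The "repeat the request" advice above can be sketched as a small loop. This is a library-free illustration under stated assumptions: RepeatedReceive, collect and the receiveOnce supplier are hypothetical names, and receiveOnce stands in for a single ReceiveMessage call that may return fewer messages than requested, or none.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch: keep calling receive until enough messages have arrived.
final class RepeatedReceive {

    // Calls receiveOnce until at least `wanted` messages are collected or maxCalls is reached.
    static <T> List<T> collect(Supplier<List<T>> receiveOnce, int wanted, int maxCalls) {
        List<T> collected = new ArrayList<>();
        for (int call = 0; call < maxCalls && collected.size() < wanted; call++) {
            collected.addAll(receiveOnce.get());
        }
        return collected;
    }

    // Simulated queue of 8 messages whose receiver returns nothing on every other call
    // and a single message otherwise, similar to what short polling can look like.
    static int demo() {
        Deque<String> queue = new ArrayDeque<>();
        for (int i = 0; i < 8; i++) {
            queue.add("message " + i);
        }
        int[] calls = {0};
        Supplier<List<String>> flaky = () -> {
            boolean empty = (calls[0]++ % 2 == 0) || queue.isEmpty();
            return empty ? Collections.<String>emptyList() : Collections.singletonList(queue.poll());
        };
        return collect(flaky, 5, 20).size();
    }
}
```

The maxCalls bound is there so that an empty queue does not turn the loop into a busy spin.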
There is a comprehensive explanation for this (arguably rather idiosyncratic) behaviour in the SQS reference documentation.
SQS stores copies of messages on multiple servers and receive message requests are made to these servers with one of two possible strategies,
Short Polling : The default behaviour, only a subset of the servers (based on a weighted random distribution) are queried.
Long Polling : Enabled by setting the WaitTimeSeconds attribute to a non-zero value, all of the servers are queried.
In practice, for my limited tests, I always seem to get one message with short polling just as you did.
I had the same problem. What is your Receive Message Wait Time for your queue set to? When mine was at 0, it only returned 1 message even if there were 8 in the queue. When I increased the Receive Message Wait Time, then I got all of them. Seems kind of buggy to me.
I was just trying the same thing, and with the help of the two setters setMaxNumberOfMessages and setWaitTimeSeconds I was able to get 10 messages.
ReceiveMessageRequest receiveMessageRequest = new ReceiveMessageRequest(myQueueUrl);
Snapshot of the output:
Receiving messages from TestQueue.
Number of messages:10
MessageId: 31a7c669-1f0c-4bf1-b18b-c7fa31f4e82d
Just to be clear, the more practical use of this would be to add to your constructor like this:
ReceiveMessageRequest receiveMessageRequest = new ReceiveMessageRequest(queueUrl).withMaxNumberOfMessages(10);
Otherwise, you might as well just do:
That being said, changing this won't help the original problem.
Thanks Caoilte!
I faced this issue too. I finally solved it by using long polling, following the configuration here:
In my case, long polling only worked after I created the queue as a FIFO queue; I had no luck with a standard queue.
When receiving, you also need to set MaxNumberOfMessages, so my code looks like this:
ReceiveMessageRequest receive_request = new ReceiveMessageRequest()
Although it is solved, it still feels too weird. AWS should definitely provide a neater API for this kind of basic receive operation.
From my point of view, AWS has many cool features but not good APIs. It feels like those guys are rushing things out all the time.
For small task list I use FIFO queue like stackoverflow.com/a/55149351/13678017
for example modified AWS tutorial
// Create a queue.
System.out.println("Creating a new Amazon SQS FIFO queue called " + "MyFifoQueue.fifo.\n");
final Map<String, String> attributes = new HashMap<>();
// A FIFO queue must have the FifoQueue attribute set to true.
attributes.put("FifoQueue", "true");
// If the user doesn't provide a MessageDeduplicationId, generate a
// MessageDeduplicationId based on the content.
attributes.put("ContentBasedDeduplication", "true");
// The FIFO queue name must end with the .fifo suffix.
final CreateQueueRequest createQueueRequest = new CreateQueueRequest("MyFifoQueue4.fifo")
final String myQueueUrl = sqs.createQueue(createQueueRequest).getQueueUrl();
// List all queues.
System.out.println("Listing all queues in your account.\n");
for (final String queueUrl : sqs.listQueues().getQueueUrls()) {
System.out.println(" QueueUrl: " + queueUrl);
// Send a message.
System.out.println("Sending a message to MyQueue.\n");
for (int i = 0; i < 4; i++) {
var request = new SendMessageRequest()
.withMessageBody("message " + i)
for (int i = 0; i < 6; i++) {
var request = new SendMessageRequest()
.withMessageBody("message " + i)
// Receive messages.
System.out.println("Receiving messages from MyQueue.\n");
var receiveMessageRequest = new ReceiveMessageRequest(myQueueUrl);
// what receive?
final List<Message> messages = sqs.receiveMessage(receiveMessageRequest).getMessages();
for (final Message message : messages) {
System.out.println(" MessageId: "
+ message.getMessageId());
System.out.println(" ReceiptHandle: "
+ message.getReceiptHandle());
System.out.println(" MD5OfBody: "
+ message.getMD5OfBody());
System.out.println(" Body: "
+ message.getBody());
for (final Entry<String, String> entry : message.getAttributes()
.entrySet()) {
System.out.println(" Name: " + entry
System.out.println(" Value: " + entry
Here's a workaround: you can call the receiveMessageFromSQS method asynchronously.
bulkReceiveFromSQS (queueUrl, totalMessages, asyncLimit, batchSize, visibilityTimeout, waitTime, callback) {
    batchSize = Math.min(batchSize, 10);
    let self = this,
        noOfIterations = Math.ceil(totalMessages / batchSize);
    async.timesLimit(noOfIterations, asyncLimit, function(n, next) {
        self.receiveMessageFromSQS(queueUrl, batchSize, visibilityTimeout, waitTime,
            function(err, result) {
                if (err) {
                    return next(err);
                }
                return next(null, _.get(result, 'Messages'));
            });
    }, function (err, listOfMessages) {
        if (err) {
            return callback(err);
        }
        listOfMessages = _.flatten(listOfMessages).filter(Boolean);
        return callback(null, listOfMessages);
    });
}
It will return an array with the given number of messages.