@RequestMapping(value = "/getTopics", method = RequestMethod.GET)
@ResponseBody
public Response getAllTopics() {
    ZkClient zkClient = new ZkClient(ZookeeperProps.zookeeperURL, ZookeeperProps.connectionTimeoutMs,
            ZookeeperProps.sessionTimeoutMs, ZKStringSerializer$.MODULE$);
    Seq<String> topics = ZkUtils.getAllTopics(zkClient);
    scala.collection.Iterator<String> topicIterator = topics.iterator();
    String allTopics = "";
    while (topicIterator.hasNext()) {
        allTopics += topicIterator.next();
        allTopics += "\n";
    }
    Response response = new Response();
    response.setResponseMessage(allTopics);
    return response;
}
I am a novice with Apache Kafka.
These days I am trying to understand Kafka together with ZooKeeper.
I want to fetch the topics associated with ZooKeeper, so I am trying the following things:
a) First I created the ZooKeeper client as shown below:
ZkClient(ZookeeperProps.zookeeperURL, ZookeeperProps.connectionTimeoutMs, ZookeeperProps.sessionTimeoutMs, ZKStringSerializer$.MODULE$);
Seq<String> topics = ZkUtils.getAllTopics(zkClient);
But topics is an empty set when executing the Java code. I don't see what the problem is here.
My ZooKeeper props are as follows: String zkConnect = "127.0.0.1:2181";
And ZooKeeper is running perfectly fine.
Please help, guys.
It's pretty simple. (My example is written in Java, but it would be almost the same in Scala.)
import java.util.List;
import org.apache.zookeeper.ZooKeeper;

public class KafkaTopicListFetcher {
    public static void main(String[] args) throws Exception {
        // Connect to ZooKeeper and list the children of /brokers/topics,
        // which is where Kafka registers every topic
        ZooKeeper zk = new ZooKeeper("localhost:2181", 10000, null);
        List<String> topics = zk.getChildren("/brokers/topics", false);
        for (String topic : topics) {
            System.out.println(topic);
        }
    }
}
The result when I have three topics: test, test2, and test3
test
test2
test3
For my own blog post I drew a diagram of the ZooKeeper tree that Kafka uses; understanding that structure makes it clear why reading /brokers/topics works.
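Since the diagram itself is not reproduced here: the key paths are /brokers/ids (one child per live broker), /brokers/topics (one child per topic) and /brokers/topics/<topic>/partitions (one child per partition). A small sketch in the same style as the code above walks those paths; the host, port and timeout are assumptions, not part of the original answer:
import java.util.List;
import org.apache.zookeeper.ZooKeeper;

public class KafkaZkTreeDump {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 10000, null);
        // /brokers/ids                      -> one child per live broker (0, 1, ...)
        // /brokers/topics                   -> one child per topic
        // /brokers/topics/<topic>/partitions -> one child per partition of that topic
        System.out.println("brokers: " + zk.getChildren("/brokers/ids", false));
        for (String topic : zk.getChildren("/brokers/topics", false)) {
            List<String> partitions = zk.getChildren("/brokers/topics/" + topic + "/partitions", false);
            System.out.println("topic " + topic + " -> partitions " + partitions);
        }
        zk.close();
    }
}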
You can use the Kafka AdminClient. The code snippet below may help you:
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListTopicsOptions;
Properties properties = new Properties();
properties.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
AdminClient adminClient = AdminClient.create(properties);
ListTopicsOptions listTopicsOptions = new ListTopicsOptions();
listTopicsOptions.listInternal(true); // also list internal topics such as __consumer_offsets
System.out.println(adminClient.listTopics(listTopicsOptions).names().get());
adminClient.close();
I would prefer to use kafka-topics.sh, the shell script that ships with Kafka, to list topics.
The Kafka client library has the AdminClient API, which supports managing and inspecting topics, brokers, configurations, and ACLs.
You can find code samples for
Creating new topic
Delete topic
Describe topic: gives Leader, Partitions, ISR and Replicas
List topics
Fetch controller broker/node details
All brokers/nodes details from the cluster
https://medium.com/nerd-for-tech/how-client-application-interact-with-kafka-cluster-made-easy-with-java-apis-58f29229d992
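As a rough sketch of what a couple of those calls look like with the AdminClient (the topic name, partition/replica counts and bootstrap address below are placeholders, and error handling is omitted):
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.admin.TopicDescription;

public class AdminClientSamples {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Create a new topic: name, partitions, replication factor
            admin.createTopics(Collections.singleton(new NewTopic("my-topic", 3, (short) 1))).all().get();

            // Describe it: leader, partitions, ISR and replicas per partition
            TopicDescription description =
                    admin.describeTopics(Collections.singleton("my-topic")).all().get().get("my-topic");
            description.partitions().forEach(p ->
                    System.out.println("partition " + p.partition() + " leader=" + p.leader()
                            + " isr=" + p.isr() + " replicas=" + p.replicas()));

            // List all topics
            System.out.println(admin.listTopics().names().get());
        }
    }
}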
I'm working on a framework that lets customers write their own plugins for my software, which is built on Apache Flink. I've outlined in the snippet below what I'm trying to get working (just as a proof of concept); however, I'm getting an org.apache.flink.client.program.ProgramInvocationException: The main method caused an error. when trying to upload it.
I want to be able to branch the input stream into some number of different pipelines and then combine those into a single output. What I have below is just the simplified version I'm starting with.
public class ContentBase {
public static void main(String[] args) throws Exception {
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "kf-service:9092");
properties.setProperty("group.id", "varnost-content");
// Setup up execution environment and get stream from Kafka
StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<ObjectNode> logs = see.addSource(new FlinkKafkaConsumer011<>("log-input",
new JSONKeyValueDeserializationSchema(false), properties).setStartFromLatest())
.map((MapFunction<ObjectNode, ObjectNode>) jsonNodes -> (ObjectNode) jsonNodes.get("value"));
// Create a new List of Streams, one for each "rule" that is being executed
// For now, I have a simple custom wrapper on flink's `.filter` function in `MyClass.filter`
List<String> codes = Arrays.asList("404", "200", "500");
List<DataStream<ObjectNode>> outputs = new ArrayList<>();
for (String code : codes) {
outputs.add(MyClass.filter(logs, "response", code));
}
// It seemed as though I needed a seed DataStream to union all others on
ObjectMapper mapper = new ObjectMapper();
ObjectNode seedObject = (ObjectNode) mapper.readTree("{\"start\":\"true\"}");
DataStream<ObjectNode> alerts = see.fromElements(seedObject);
// Union the output of each "rule" above with the seed object to then output
for (DataStream<ObjectNode> output : outputs) {
alerts.union(output);
}
// Convert to string and sink to Kafka
alerts.map((MapFunction<ObjectNode, String>) ObjectNode::toString)
.addSink(new FlinkKafkaProducer011<>("kf-service:9092", "log-output", new SimpleStringSchema()));
see.execute();
}
}
I can't figure out how to get the actual error out of the Flink web interface to add that information here.
There were a few errors I found.
1) A StreamExecutionEnvironment can only have one input (apparently? I could be wrong), so adding the .fromElements input was no good.
2) I forgot that all DataStreams are immutable, so the .union operation creates a new DataStream as its output.
The final result ended up being much simpler:
public class ContentBase {
public static void main(String[] args) throws Exception {
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "kf-service:9092");
properties.setProperty("group.id", "varnost-content");
// Setup up execution environment and get stream from Kafka
StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<ObjectNode> logs = see.addSource(new FlinkKafkaConsumer011<>("log-input",
new JSONKeyValueDeserializationSchema(false), properties).setStartFromLatest())
.map((MapFunction<ObjectNode, ObjectNode>) jsonNodes -> (ObjectNode) jsonNodes.get("value"));
// Create a new List of Streams, one for each "rule" that is being executed
// For now, I have a simple custom wrapper on flink's `.filter` function in `MyClass.filter`
List<String> codes = Arrays.asList("404", "200", "500");
List<DataStream<ObjectNode>> outputs = new ArrayList<>();
for (String code : codes) {
outputs.add(MyClass.filter(logs, "response", code));
}
Optional<DataStream<ObjectNode>> alerts = outputs.stream().reduce(DataStream::union);
// Convert to string and sink to Kafka
alerts.map((MapFunction<ObjectNode, String>) ObjectNode::toString)
.addSink(new FlinkKafkaProducer011<>("kf-service:9092", "log-output", new SimpleStringSchema()));
see.execute();
}
}
The code you posted cannot be compiled because of the last part (converting to string). You mixed up the Java Stream API map with Flink's. Changing it to
alerts.get().map(ObjectNode::toString);
fixes it.
Good luck.
I am running a Storm topology which gets tweets from Kafka, on AWS Ubuntu Server 14.04 LTS instances with 4 nodes: Nimbus, a Supervisor, a Kafka-ZooKeeper node, and a ZooKeeper node (for the Storm cluster). My Storm UI is up and running and I am able to submit topologies. I have two brokers, but I'm only using the one with broker.id=0. I have tweets in it under a topic. My Kafka server is running fine too.
I created the kafka-topic in this way:
bin/kafka-topics.sh --create --zookeeper localhost:2181/kafka --replication-factor 1 --partitions 1 --topic twitter1
The thing I'm confused about is:
SpoutConfig kafkaConfig = new SpoutConfig(kafkaHosts, topicName+"-0", "/kafka", topicName+"-0");
I think my errors stem from this point. The complete code is:
import org.apache.storm.tuple.Fields;
import org.apache.storm.kafka.BrokerHosts;
import org.apache.storm.kafka.KafkaSpout;
import org.apache.storm.kafka.SpoutConfig;
import org.apache.storm.kafka.ZkHosts;
import java.util.Arrays;
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.spout.SchemeAsMultiScheme;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.kafka.StringScheme;
public class TwitterTopology{
public static void main(String[] args) {
String topicName = "twitter1";
String topologyName = args[0];
String kafkaIp = "xxx.31.xxx.207"; //hiding the IPs here. This is the IP for my kafka-zk node. Is this ok?
String nimbusHost = "xxx.31.xxx.70";
String kafkaHost = kafkaIp + ":9092";
BrokerHosts kafkaHosts = new ZkHosts(kafkaHost);
SpoutConfig kafkaConfig = new SpoutConfig(kafkaHosts, topicName, "/kafka", topicName);
kafkaConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
KafkaSpout kafkaSpout = new KafkaSpout(kafkaConfig);
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("twitter-spout", kafkaSpout, 8);
builder.setBolt("WordSplitterBolt", new JsonWordSplitterBolt(5)).shuffleGrouping("twitter-spout");
builder.setBolt("IgnoreWordsBolt", new IgnoreWordsBolt()).shuffleGrouping("WordSplitterBolt");
builder.setBolt("WordCounterBolt", new WordCounterBolt(5, 5 * 60, 50)).shuffleGrouping("IgnoreWordsBolt");
Config config = new Config();
config.setDebug(false);
config.setMaxTaskParallelism(5);
config.put(Config.NIMBUS_HOST, nimbusHost);
config.put(Config.NIMBUS_THRIFT_PORT, 6627);
config.put(Config.STORM_ZOOKEEPER_PORT, 2181);
config.put(Config.STORM_ZOOKEEPER_SERVERS, Arrays.asList(kafkaIp));
try {
config.setNumWorkers(20);
config.setMaxSpoutPending(5000);
StormSubmitter.submitTopology(topologyName, config, builder.createTopology());
} catch (Exception e) {
throw new IllegalStateException("Couldn't initialize the topology", e);
}
}
}
I am getting this exception in the Storm UI:
Unable to get offset lags for kafka. Reason:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /brokers/topics/twitter1/partitions
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1590)
    at org.apache.curator.shaded.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:242)
    at org.apache.curator.shaded.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:231)
    at org.apache.curator.shaded.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64)
    at org.apache.curator.shaded.RetryLoop.callWithRetry(RetryLoop.java:100)
    at org.apache.curator.shaded.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:228)
    at org.apache.curator.shaded.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:219)
    at org.apache.curator.shaded.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:41)
    at org.apache.storm.kafka.monitor.KafkaOffsetLagUtil.getLeadersAndTopicPartitions(KafkaOffsetLagUtil.java:319)
    at org.apache.storm.kafka.monitor.KafkaOffsetLagUtil.getOffsetLags(KafkaOffsetLagUtil.java:256)
    at org.apache.storm.kafka.monitor.KafkaOffsetLagUtil.main(KafkaOffsetLagUtil.java:124)
The error Unable to get offset lags for kafka stays constant, while the rest of the exception changes according to the zkroot path that I change (the 3rd argument to SpoutConfig). I don't know exactly how to fill in these arguments so that the spout pulls the tweets from my topic.
I used the tutorial here to write the code for submitting the topology: http://stdatalabs.blogspot.ca/2016/10/real-time-stream-processing-using.html
I have made numerous changes to the Maven dependencies. My pom.xml has all the dependencies for storm-core, kafka, etc. with the latest versions available in the Maven repo.
ZkHosts() should contain the ZooKeeper connection details, not the Kafka broker's. If your ZooKeeper and Kafka are on the same server, try giving the correct port for ZooKeeper (2181).
Refer to https://storm.apache.org/releases/1.2.3/storm-kafka.html
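A minimal sketch of what that could look like for this setup, reusing the imports from the question. It assumes ZooKeeper listens on port 2181 on the kafka-zk node and that, because the topic was created under the /kafka chroot, brokers register under /kafka/brokers (adjust if your broker's zookeeper.connect differs); the offset root and spout id are arbitrary names chosen for illustration:
String zkHost = "xxx.31.xxx.207:2181"; // ZooKeeper host:port, not the Kafka broker port
BrokerHosts kafkaHosts = new ZkHosts(zkHost, "/kafka/brokers"); // brokers live under the /kafka chroot
SpoutConfig kafkaConfig = new SpoutConfig(
        kafkaHosts,
        "twitter1",             // topic
        "/kafka-spout-offsets", // zkRoot: where the spout stores its consumer offsets
        "twitter1-spout");      // id: unique consumer id for this spout
kafkaConfig.scheme = new SchemeAsMultiScheme(new StringScheme());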
I'm new to Kafka. I created a Java producer on my local machine and set up a Kafka broker on another machine, say M2, on the network (I can ping, SSH, and connect to this machine). On the producer side, in the Eclipse console, I get "Message sent". But when I check the console consumer on machine M2 I cannot see those messages.
My java producer code is:
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.HashMap;
import java.util.Map;
public class KafkaMessageProducer {
/**
 * @param args
 */
public static void main(String[] args) {
KafkaMessageProducer reportObj = new KafkaMessageProducer();
reportObj.send();
}
public void send(){
Map<String, Object> config = new HashMap<String, Object>();
config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "135.113.133.60:9092");
config.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
config.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
KafkaProducer<String, String> producer = new KafkaProducer<String, String>(config);
int maxMessages = 5;
int count = 0;
while(count < maxMessages){
producer.send(new ProducerRecord<String, String>("test", "msg", "message --- #"+count++));
System.out.println("Message send.."+count);
}
producer.close();
}
}
Can you please let me know where I'm going wrong? I can send messages locally on machine M2 from the console producer.
Note: Even when I change the IP address to the full hostname of the Kafka broker, it still has the same issue.
Update: I also think that the producer is able to connect to the Kafka broker and send the messages, but the Kafka broker does not pass these messages on to the consumer. If I change the IP address or the port to ZooKeeper (which is running on the same node as the Kafka broker) and watch ZooKeeper's log, it sees the producer connect and then rejects the session.
Update 2: I created a producer jar and ran it on machine M2, and it worked. So it seems that there is something wrong with the way the producer tries to connect to the Kafka broker from the other machine. Not sure yet what the problem is.
I finally found the answer and I'm posting it here in case anyone else has the same issue. Use the Kafka broker setting advertised.host.name (or advertised.listeners on newer brokers), set to a hostname or IP the remote machine can reach, when you are trying to connect remotely. This worked for me.
Just as an idea for debugging: try producer.send(/* record */).get();
That is, wait for the result of the Future returned by the send() method. It could be that there's an exception on the producer side and it's simply being ignored in the background.
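A minimal sketch of that debugging step, dropped into the send() method from the question (it reuses the question's topic and message; RecordMetadata needs an extra import from org.apache.kafka.clients.producer):
ProducerRecord<String, String> record =
        new ProducerRecord<>("test", "msg", "message --- #" + count);
try {
    // Blocks until the broker acknowledges (or fails) the write
    RecordMetadata metadata = producer.send(record).get();
    System.out.println("Written to " + metadata.topic() + "-" + metadata.partition()
            + " at offset " + metadata.offset());
} catch (Exception e) {
    // Any connection or serialization problem shows up here instead of being swallowed
    e.printStackTrace();
}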
You can try using code like the following to read the metadata for the Kafka topic and check whether the broker received the messages. That can help with debugging.
// Old (pre-0.9) SimpleConsumer API: ask a broker directly for the topic's metadata
SimpleConsumer consumer = new SimpleConsumer(broker.host(), broker.port(), 100000,
        64 * 1024, "your_group_id");
List<String> topics = new ArrayList<>();
topics.add(topic);
TopicMetadataRequest req = new TopicMetadataRequest(topics);
TopicMetadataResponse resp = consumer.send(req);
if (resp.topicsMetadata().size() != 1) {
    throw new RuntimeException("Expected one metadata for topic "
            + topic + " found " + resp.topicsMetadata().size());
}
TopicMetadata topicMetaData = resp.topicsMetadata().get(0);
I am implementing a simple Kafka consumer in Java. Here is the code:
public class TestConsumer {
public static void main(String []a) throws Exception{
Properties props = new Properties();
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("partition.assignment.strategy", "round-robin");
props.put("group.id", "test");
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,"localhost:9092");
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
try{
consumer.subscribe("ay_sparktopic");
Map<String, ConsumerRecords<String, String>> msg = consumer.poll(100);
System.out.println(msg);
}catch(Exception e){
System.out.println("Exception");
}
}
}
Above consumer gives following error message:
16/03/30 18:01:07 WARN ConsumerConfig: The configuration group.id = test was supplied but isn't a known config.
16/03/30 18:01:07 WARN ConsumerConfig: The configuration partition.assignment.strategy = round-robin was supplied but isn't a known config.
Any documentation I check online gives either range or roundrobin as possible assignment strategies, and group.id is a custom name as far as I know. I'm not sure what the right config values would be here.
It looks like you're trying to use the new consumer API, which is only available in Kafka 0.9+. To use the older API you have to import classes from the kafka.javaapi.consumer.* package instead of the new org.apache.kafka.clients.consumer package.
consumer.subscribe and consumer.poll belong to the new API, so if you really want to use the old API you need to change your code accordingly. If you instead want to use the new consumer API, you need to run Kafka 0.9 or later.
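For reference, a minimal sketch of what the same consumer looks like against the new (0.9+) API; the topic name and group id come from the question, the rest is assumed:
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class TestConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "test");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // subscribe() takes a collection of topics, and poll() returns ConsumerRecords
        consumer.subscribe(Collections.singletonList("ay_sparktopic"));
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.offset() + ": " + record.value());
            }
        }
    }
}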
Using the dependency below resolves the issue,
libraryDependencies += "org.apache.kafka" % "kafka_2.11" % "0.9.0.0"
even when you have a previous version running, e.g. Kafka 0.8.2.1.
I am trying to write a Kafka consumer to get messages that are produced and posted to a topic in Java. My consumer is as follows.
consumer.java
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.javaapi.message.ByteBufferMessageSet;
import kafka.message.MessageAndOffset;
public class KafkaConsumer extends Thread {
final static String clientId = "SimpleConsumerDemoClient";
final static String TOPIC = " AATest";
ConsumerConnector consumerConnector;
public static void main(String[] argv) throws UnsupportedEncodingException {
KafkaConsumer KafkaConsumer = new KafkaConsumer();
KafkaConsumer.start();
}
public KafkaConsumer(){
Properties properties = new Properties();
properties.put("zookeeper.connect","10.200.208.59:2181");
properties.put("group.id","test-group");
ConsumerConfig consumerConfig = new ConsumerConfig(properties);
consumerConnector = Consumer.createJavaConsumerConnector(consumerConfig);
}
@Override
public void run() {
Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
topicCountMap.put(TOPIC, new Integer(1));
Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumerConnector.createMessageStreams(topicCountMap);
KafkaStream<byte[], byte[]> stream = consumerMap.get(TOPIC).get(0);
System.out.println(stream);
ConsumerIterator<byte[], byte[]> it = stream.iterator();
while(it.hasNext()) {
    System.out.println("from it");
    System.out.println(new String(it.next().message()));
}
}
private static void printMessages(ByteBufferMessageSet messageSet) throws UnsupportedEncodingException {
for(MessageAndOffset messageAndOffset: messageSet) {
ByteBuffer payload = messageAndOffset.message().payload();
byte[] bytes = new byte[payload.limit()];
payload.get(bytes);
System.out.println(new String(bytes, "UTF-8"));
}
}
}
When I run the above code I get nothing in the console, whereas the Java producer program behind the scenes is posting data continuously under the 'AATest' topic. Also, in the ZooKeeper console I get the following lines when I try running the above consumer.java:
[2015-04-30 15:57:31,284] INFO Accepted socket connection from /10.200.208.59:51780 (org.apache.zookeeper.server.NIOServerCnxnFactory)
[2015-04-30 15:57:31,284] INFO Client attempting to establish new session at /10.200.208.59:51780 (org.apache.zookeeper.server.ZooKeeperServer)
[2015-04-30 15:57:31,315] INFO Established session 0x14d09cebce30007 with negotiated timeout 6000 for client /10.200.208.59:51780 (org.apache.zookeeper.server.ZooKeeperServer)
Also, when I run a separate console consumer pointing to the AATest topic, I get all the data produced by the producer to that topic.
Both the consumer and the broker are on the same machine, whereas the producer is on a different machine. This actually resembles this question, but going through it didn't help me. Please help me.
Different answer, but in my case it happened to be the initial offset (auto.offset.reset) for the consumer. So setting auto.offset.reset=earliest fixed the problem in my scenario. It's because I was publishing the event first and then starting the consumer.
By default, the consumer only consumes events published after it started, because auto.offset.reset=latest by default.
e.g. consumer.properties:
bootstrap.servers=localhost:9092
enable.auto.commit=true
auto.commit.interval.ms=1000
session.timeout.ms=30000
auto.offset.reset=earliest
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
Test
class KafkaEventConsumerSpecs extends FunSuite {
case class TestEvent(eventOffset: Long, hashValue: Long, created: Date, testField: String) extends BaseEvent
test("given an event in the event-store, consumes an event") {
EmbeddedKafka.start()
//PRODUCE
val event = TestEvent(0l, 0l, new Date(), "data")
val config = new Properties() {
{
load(this.getClass.getResourceAsStream("/producer.properties"))
}
}
val producer = new KafkaProducer[String, String](config)
val persistedEvent = producer.send(new ProducerRecord(event.getClass.getSimpleName, event.toString))
assert(persistedEvent.get().offset() == 0)
assert(persistedEvent.get().checksum() != 0)
//CONSUME
val consumerConfig = new Properties() {
{
load(this.getClass.getResourceAsStream("/consumer.properties"))
put("group.id", "consumers_testEventsGroup")
put("client.id", "testEventConsumer")
}
}
assert(consumerConfig.getProperty("group.id") == "consumers_testEventsGroup")
val kafkaConsumer = new KafkaConsumer[String, String](consumerConfig)
assert(kafkaConsumer.listTopics().asScala.map(_._1).toList == List("TestEvent"))
kafkaConsumer.subscribe(Collections.singletonList("TestEvent"))
val events = kafkaConsumer.poll(1000)
assert(events.count() == 1)
EmbeddedKafka.stop()
}
}
But if the consumer is started first and the event is published afterwards, the consumer should be able to consume the event without auto.offset.reset needing to be set to earliest.
References for Kafka 0.10:
https://kafka.apache.org/documentation/#consumerconfigs
In our case, we solved our problem with the following steps:
The first thing we found is that there is a config called retries for the KafkaProducer, and its default value means "no retry". Also, the send method of the KafkaProducer is asynchronous unless you call get() on the Future it returns. Without retries, there is no guarantee that produced messages are delivered to the corresponding broker. So you have to increase it a bit, or use the idempotent or transactional mode of the KafkaProducer.
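A sketch of the producer settings that paragraph refers to (values, topic and bootstrap address are illustrative, not a recommendation; enable.idempotence requires Kafka 0.11+):
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ReliableProducerSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("retries", 5);               // older clients default to 0, i.e. no retry at all
        props.put("acks", "all");              // wait for all in-sync replicas to acknowledge
        props.put("enable.idempotence", true); // avoids duplicates introduced by retries
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        // Calling get() makes the send synchronous, so delivery failures surface here
        producer.send(new ProducerRecord<>("my-topic", "key", "value")).get();
        producer.close();
    }
}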
The second point is about the Kafka and ZooKeeper versions. We chose Kafka 1.0.0 and ZooKeeper 3.4.4. Kafka 1.0.0 in particular had an issue with connectivity to ZooKeeper: if Kafka loses its connection to ZooKeeper with an unexpected exception, it loses leadership of the partitions that have not been synced yet. There is a bug ticket about this issue:
https://issues.apache.org/jira/browse/KAFKA-2729
After we found entries in the Kafka log indicating the same issue as the ticket above, we upgraded our Kafka broker to version 1.1.0.
It is also important to note that a small number of partitions (like 100 or fewer) increases the throughput of the producer, so if there are not enough consumers, the available consumers get stuck, which results in delayed messages (we measured the delay in minutes, approximately 10-15 minutes). So you need to balance and configure the partition count and the thread counts of your application correctly, according to your available resources.
There might also be a case where Kafka takes a long time to rebalance consumer groups when a new consumer is added to the same group id.
Check the Kafka logs to see whether the group is rebalanced after starting your consumer.