I am trying to create a Kafka topic using the Java API, but I am getting LEADER_NOT_AVAILABLE.
Code:
int partition = 0;
ZkClient zkClient = null;
try {
    String zookeeperHosts = "localhost:2181"; // If multiple zookeepers then -> String zookeeperHosts = "192.168.20.1:2181,192.168.20.2:2181";
    int sessionTimeOutInMs = 15 * 1000; // 15 secs
    int connectionTimeOutInMs = 10 * 1000; // 10 secs
    zkClient = new ZkClient(zookeeperHosts, sessionTimeOutInMs, connectionTimeOutInMs, ZKStringSerializer$.MODULE$);
    String topicName = "mdmTopic5";
    int noOfPartitions = 2;
    int noOfReplication = 1;
    Properties topicConfiguration = new Properties();
    AdminUtils.createTopic(zkClient, topicName, noOfPartitions, noOfReplication, topicConfiguration);
} catch (Exception ex) {
    ex.printStackTrace();
} finally {
    if (zkClient != null) {
        zkClient.close();
    }
}
Error:
[2017-10-19 12:14:42,263] WARN Error while fetching metadata with correlation id 1 : {mdmTopic5=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
[2017-10-19 12:14:42,370] WARN Error while fetching metadata with correlation id 3 : {mdmTopic5=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
[2017-10-19 12:14:42,479] WARN Error while fetching metadata with correlation id 4 : {mdmTopic5=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
Does Kafka 0.11.0.1 support AdminUtils? Please let me know how to create a topic in this version.
Thanks in advance.
Since Kafka 0.11 there is a proper Admin API for creating (and deleting) topics, and I'd recommend using it instead of connecting to ZooKeeper directly.
See AdminClient.createTopics(): http://kafka.apache.org/0110/javadoc/org/apache/kafka/clients/admin/AdminClient.html#createTopics(java.util.Collection)
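For example, a minimal sketch in Java (the broker address below is a placeholder; the topic name, partition count, and replication factor are taken from your question):
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class TopicCreator {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Connect to the broker(s), not to ZooKeeper
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        AdminClient admin = AdminClient.create(props);
        try {
            NewTopic topic = new NewTopic("mdmTopic5", 2, (short) 1);
            // createTopics() is asynchronous; all().get() blocks until the brokers have answered
            admin.createTopics(Collections.singletonList(topic)).all().get();
        } finally {
            admin.close();
        }
    }
}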
Generally, LEADER_NOT_AVAILABLE points to network issues rather than issues with your code.
Try:
telnet host port
This shows whether you can connect to all required hosts/ports from your machine.
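If telnet is not handy, the same check can be done from Java; this is just a connectivity sketch, and the host and port values are placeholders:
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    public static void main(String[] args) throws Exception {
        String host = "localhost";   // broker or ZooKeeper host to test
        int port = 9092;             // port to test
        try (Socket socket = new Socket()) {
            // Throws an exception if the host/port is unreachable
            socket.connect(new InetSocketAddress(host, port), 5000);
            System.out.println("Connected to " + host + ":" + port);
        }
    }
}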
However, the more recent approach is to use BOOTSTRAP_SERVERS when creating topics.
A working version of the topic-creation code in Scala follows.
Add the required kafka-clients dependency using sbt:
// https://mvnrepository.com/artifact/org.apache.kafka/kafka-clients
libraryDependencies ++= Seq("org.apache.kafka" % "kafka-clients" % "2.1.1")
The code for topic creation in Scala:
import java.util.Arrays
import java.util.Properties

import org.apache.kafka.clients.admin.NewTopic
import org.apache.kafka.clients.admin.{AdminClient, AdminClientConfig}

class CreateKafkaTopic {
  def create(): Unit = {
    // Point the admin client at the broker(s), not at ZooKeeper
    val config = new Properties()
    config.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "192.30.1.5:9092")

    val localKafkaAdmin = AdminClient.create(config)

    val partitions = 3
    val replication = 1.toShort
    val topic = new NewTopic("integration-02", partitions, replication)
    val topics = Arrays.asList(topic)

    // createTopics() is asynchronous; values() maps each topic name to a KafkaFuture
    val topicStatus = localKafkaAdmin.createTopics(topics).values()
    println(topicStatus.keySet())

    localKafkaAdmin.close()
  }
}
Hope it helps.
Related
I have a use case where I am writing to a Kafka topic in batches using a Spark job (no streaming). Initially I pump in, say, 10 records to the Kafka topic and run the Spark job, which does some processing and finally writes to another Kafka topic.
The next time, when I push another 5 records and run the Spark job, my requirement is to start processing those 5 records only, not from the starting offset. I need to maintain the committed offset so that the Spark job starts at the next offset position and does the processing.
Here is the code on the Kafka side to fetch the offsets:
private static List<TopicPartition> getPartitions(KafkaConsumer consumer, String topic) {
    List<PartitionInfo> partitionInfoList = consumer.partitionsFor(topic);
    return partitionInfoList.stream().map(x -> new TopicPartition(topic, x.partition())).collect(Collectors.toList());
}

public static void getOffSet(KafkaConsumer consumer) {
    List<TopicPartition> topicPartitions = getPartitions(consumer, topic);
    consumer.assign(topicPartitions);
    consumer.seekToBeginning(topicPartitions);
    topicPartitions.forEach(x -> {
        System.out.println("Partition-> " + x + " startingOffSet-> " + consumer.position(x));
    });
    consumer.assign(topicPartitions);
    consumer.seekToEnd(topicPartitions);
    topicPartitions.forEach(x -> {
        System.out.println("Partition-> " + x + " endingOffSet-> " + consumer.position(x));
    });
    topicPartitions.forEach(x -> {
        consumer.poll(1000);
        OffsetAndMetadata offsetAndMetadata = consumer.committed(x);
        long position = consumer.position(x);
        System.out.printf("Committed: %s, current position %s%n",
                offsetAndMetadata == null ? null : offsetAndMetadata.offset(), position);
    });
}
The code below is the Spark code to load the messages from the topic, which is not working:
Dataset<Row> kafkaDataset = session.read().format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("subscribe", topic)
        .option("group.id", "test-consumer-group")
        .option("startingOffsets", "{\"Topic1\":{\"0\":2}}")
        .option("endingOffsets", "{\"Topic1\":{\"0\":3}}")
        .option("enable.auto.commit", "true")
        .load();
After the above code executes, I am again trying to get the offset by calling
getOffSet(consumer)
on the topic, but it always reads from offset 0, and the committed offset fetched initially keeps on increasing. I am new to Kafka and still figuring out how to handle such a scenario. Please help here.
Initially I had 10 records in my topic; I then published another 2 records, and here is the output.
Output after the getOffSet method executes:
Partition-> Topic00-0 startingOffSet-> 0
Partition-> Topic00-0 endingOffSet-> 12
Committed: 12, current position 12
Output after the Spark code executes to load the messages:
Partition-> Topic00-0 startingOffSet-> 0
Partition-> Topic00-0 endingOffSet-> 12
Committed: 12, current position 12
I see no difference. Please take a look and suggest a resolution for this scenario.
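One possible direction, as a rough sketch rather than a verified answer: look up the committed offset with the consumer API shown above and splice it into the startingOffsets option. The broker address, group id, and single-partition layout below are assumptions:
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class OffsetAwareBatchRead {
    public static Dataset<Row> readFromCommitted(SparkSession session, String topic) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "test-consumer-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        TopicPartition tp = new TopicPartition(topic, 0);
        long start;
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(Collections.singletonList(tp));
            OffsetAndMetadata committed = consumer.committed(tp);
            // Fall back to the beginning if nothing has been committed yet
            start = committed == null ? 0L : committed.offset();
        }

        // -1 in endingOffsets means "latest"; note the batch Kafka source does not commit offsets
        // back to Kafka by itself, so the job still has to commit the new end position after
        // processing (for example with consumer.commitSync on the last processed offset + 1).
        return session.read().format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092")
                .option("subscribe", topic)
                .option("startingOffsets", "{\"" + topic + "\":{\"0\":" + start + "}}")
                .option("endingOffsets", "{\"" + topic + "\":{\"0\":-1}}")
                .load();
    }
}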
I am trying to understand Akka clustering for parallel computation using nodes. So, I wrote a factorial program and want to run it on a cluster of 3 nodes (including the master).
I am using a configuration file to provide the seed nodes and the cluster provider, and I read this file in my code.
cluster {
  akka {
    actor {
      provider = "cluster"
    }
    remote {
      log-remote-lifecycle-events = off
      netty.tcp {
        hostname = "127.0.0.1"
        port = 0
      }
    }
    cluster {
      seed-nodes = [
        "akka.tcp://ClusterSystem@127.0.0.1:9876",
        "akka.tcp://ClusterSystem@127.0.0.1:6789"]
      # auto downing is NOT safe for production deployments.
      # you may want to use it during development, read more about it in the docs.
      #
      # auto-down-unreachable-after = 10s
    }
  }
}
Following is the Scala code:
package test

import java.io.File

import akka.actor.{Actor, ActorSystem, Props}
import akka.stream.ActorMaterializer
import com.typesafe.config.ConfigFactory

import scala.concurrent.ExecutionContextExecutor

class Factorial extends Actor {
  override def receive = {
    case (n: Int) => fact(n)
  }

  def fact(n: Int): Int = {
    if (n <= 1) {
      return 1
    }
    else {
      return n * fact(n - 1)
    }
  }
}

object ClusterActor {
  def main(args: Array[String]): Unit = {
    val configFile = "E:/Scala/StatsRuleEngine/Resources/local_configuration.conf"
    val config = ConfigFactory.parseFile(new File(configFile))

    implicit val system: ActorSystem = ActorSystem("ClusterSystem", config.getConfig("cluster"))
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    implicit val executionContext: ExecutionContextExecutor = system.dispatcher

    val FacActor = system.actorOf(Props[Factorial], "Factorial")

    FacActor ! (5)
  }
}
On running the program, I am getting the error below:
Remote connection to [null] failed with java.net.ConnectException: Connection refused: no further information: /127.0.0.1:6789
[WARN] [01/21/2019 16:31:15.979] [New I/O boss #3] [NettyTransport(akka://ClusterSystem)] Remote connection to [null] failed with java.net.ConnectException: Connection refused: no further information: /127.0.0.1:9876
I tried searching, but I don't know why this error is occurring.
When you boot your nodes, you need to specify the exact ports that will be open in the config:
netty.tcp {
  hostname = "127.0.0.1"
  port = 0 // THE EXACT PORT
}
So, if your seed nodes specify ports 9876 and 6789, then two of the nodes have to specify

netty.tcp {
  hostname = "127.0.0.1"
  port = 9876
}

and

netty.tcp {
  hostname = "127.0.0.1"
  port = 6789
}
Note that the node listed first in the seed-nodes list must be started first.
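If it helps, here is a minimal launcher sketch (my own assumption, not part of the original answer) that starts each node with its exact port while reusing the configuration file from the question:
import java.io.File;
import akka.actor.ActorSystem;
import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;

public class SeedNodeLauncher {
    public static void main(String[] args) {
        // Run once with 9876 and once with 6789, starting the first seed node first
        String port = args.length > 0 ? args[0] : "0";
        Config fileConfig = ConfigFactory
                .parseFile(new File("E:/Scala/StatsRuleEngine/Resources/local_configuration.conf"))
                .getConfig("cluster");
        Config config = ConfigFactory
                .parseString("akka.remote.netty.tcp.port=" + port)
                .withFallback(fileConfig);
        ActorSystem.create("ClusterSystem", config);
    }
}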
I'm using the Kafka Java client, version 0.10.2.1. I am able to produce simple messages to Kafka for a "heartbeat" test, but I cannot consume a message from that same topic using the SDK. I am able to consume that message when I go into the Kafka CLI, so I have confirmed the message is there. Here's the function I'm using to consume from my Kafka server, with the props. I pass in the message I produced to the topic only after I have confirmed the produce() was successful; I can post that function later if requested:
private def consumeFromKafka(topic: String, expectedMessage: String): Boolean = {
  val props: Properties = initProps("consumer")
  val consumer = new KafkaConsumer[String, String](props)
  consumer.subscribe(List(topic).asJava)
  var readExpectedRecord = false
  try {
    val records = {
      val firstPollRecs = consumer.poll(MAX_POLLTIME_MS)
      // increase timeout and try again if nothing comes back the first time in case system is busy
      if (firstPollRecs.count() == 0) firstPollRecs else {
        logger.info("KafkaHeartBeat: First poll had 0 records- trying again - doubling timeout to "
          + (MAX_POLLTIME_MS * 2) / 1000 + " sec.")
        consumer.poll(MAX_POLLTIME_MS * 2)
      }
    }
    records.forEach(rec => {
      if (rec.value() == expectedMessage) readExpectedRecord = true
    })
  } catch {
    case e: Throwable => // log error
  } finally {
    consumer.close()
  }
  readExpectedRecord
}

private def initProps(propsType: String): Properties = {
  val prop = new Properties()
  prop.put("bootstrap.servers", kafkaServer + ":" + kafkaPort)
  propsType match {
    case "producer" => {
      prop.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      prop.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      prop.put("acks", "1")
      prop.put("producer.type", "sync")
      prop.put("retries", "3")
      prop.put("linger.ms", "5")
    }
    case "consumer" => {
      prop.put("group.id", groupId)
      prop.put("enable.auto.commit", "false")
      prop.put("auto.commit.interval.ms", "1000")
      prop.put("session.timeout.ms", "30000")
      prop.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
      prop.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
      prop.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
      // poll just once, should only be one record for the heartbeat
      prop.put("max.poll.records", "1")
    }
  }
  prop
}
Now when I run the code, here's what it outputs in the console:
13:04:21 - Discovered coordinator serverName:9092 (id: 2147483647 rack: null) for group 0b8947e1-eb68-4af3-ac7b-be3f7c02e76e.
13:04:23 INFO o.a.k.c.c.i.ConsumerCoordinator - Revoking previously assigned partitions [] for group 0b8947e1-eb68-4af3-ac7b-be3f7c02e76e
13:04:24 INFO o.a.k.c.c.i.AbstractCoordinator - (Re-)joining group 0b8947e1-eb68-4af3-ac7b-be3f7c02e76e
13:04:25 INFO o.a.k.c.c.i.AbstractCoordinator - Successfully joined group 0b8947e1-eb68-4af3-ac7b-be3f7c02e76e with generation 1
13:04:26 INFO o.a.k.c.c.i.ConsumerCoordinator - Setting newly assigned partitions [HeartBeat_Topic.Service_5.2018-08-03.13_04_10.377-0] for group 0b8947e1-eb68-4af3-ac7b-be3f7c02e76e
13:04:27 INFO c.p.p.l.util.KafkaHeartBeatUtil - KafkaHeartBeat: First poll had 0 records- trying again - doubling timeout to 60 sec.
And then nothing else; no errors are thrown, so no records are polled. Does anyone have any idea what's preventing the consume from happening? The subscribe seems to be successful, as I'm able to call listTopics and list partitions with no problem.
Your code has a bug. It seems your line:
if (firstPollRecs.count() == 0)
Should say this instead
if (firstPollRecs.count() > 0)
Otherwise, you're returning the empty firstPollRecs and then iterating over that, which obviously yields nothing.
I am trying to add and import a contact for sending a message, but I am getting a timeout exception every time.
Correct me if something is wrong.
Code:
TLInputContact tlic=new TLInputContact(1, PhNo, Fname, Lname);
TLVector contacts = new TLVector<>();
contacts.add(tlic);
TLRequestContactsImportContacts importContacts = new TLRequestContactsImportContacts(contacts, true);
TLImportedContacts importedContacts = api.doRpcCall(importContacts);
TLAbsUser recipient=importedContacts.getUsers().get(0);
TLInputPeerContact peer = new TLInputPeerContact(recipient.getId());
TLRequestMessagesSendMessage sendMessageRequest = new TLRequestMessagesSendMessage(peer, "Test", rnd.nextInt());
TLAbsSentMessage sentMessage = api.doRpcCall(sendMessageRequest);
Log:
TelegramApi#1001:Timeout Iteration
ActorDispatcher:Dispatching action: schedule for scheduller
ActorDispatcher:Dispatching action: schedule for scheduller
TelegramApi#1001:Timeout Iteration
TelegramApi#1001:RPC #3: Timeout (15001 ms)
Exception in thread "main" org.telegram.api.engine.TimeoutException
at org.telegram.api.engine.TelegramApi.doRpcCall(TelegramApi.java:364)
at org.telegram.api.engine.TelegramApi.doRpcCall(TelegramApi.java:309)
at org.telegram.api.engine.TelegramApi.doRpcCall(TelegramApi.java:400)
at org.telegram.api.engine.TelegramApi.doRpcCall(TelegramApi.java:396)
at testmsg.Testmsg.main(Testmsg.java:151)
TelegramApi#1001:Timeout Iteration
Try doRpcCallSide instead of doRpcCall on the TelegramApi object.
It helped me.
It started working for me after I updated the server IP address in the MemoryStateAPI class, as shown below:
public void start(boolean isTest) {
    connections = new HashMap<>();
    connections.put(1, new ConnectionInfo[]{
            new ConnectionInfo(1, 0, isTest ? "149.154.175.10" : "149.154.175.50", 443)
    });
}
I'm trying to write a remote HBase client using Java. Here is the code for reference:
package ttumdt.app.connector;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache.hadoop.hbase.ZooKeeperConnectionException;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
public class HBaseClusterConnector {
    private final String MASTER_IP = "10.138.168.185";
    private final String ZOOKEEPER_PORT = "2181";

    final String TRAFFIC_INFO_TABLE_NAME = "TrafficLog";
    final String TRAFFIC_INFO_COLUMN_FAMILY = "TimeStampIMSI";
    final String KEY_TRAFFIC_INFO_TABLE_BTS_ID = "BTS_ID";
    final String KEY_TRAFFIC_INFO_TABLE_DATE = "DATE";
    final String COLUMN_IMSI = "IMSI";
    final String COLUMN_TIMESTAMP = "TIME_STAMP";

    private final byte[] columnFamily = Bytes.toBytes(TRAFFIC_INFO_COLUMN_FAMILY);
    private final byte[] qualifier = Bytes.toBytes(COLUMN_IMSI);

    private Configuration conf = null;

    public HBaseClusterConnector() throws MasterNotRunningException, ZooKeeperConnectionException {
        conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", MASTER_IP);
        conf.set("hbase.zookeeper.property.clientPort", ZOOKEEPER_PORT);
        HBaseAdmin.checkHBaseAvailable(conf);
    }

    /**
     * This filter will return the list of IMSIs for a given btsId and time interval
     * @param btsId : btsId for which the query has to run
     * @param startTime : start time for which the query has to run
     * @param endTime : end time for which the query has to run
     * @return returns IMSIs as a set of Strings
     * @throws IOException
     */
    public Set<String> getInfoPerBTSID(String btsId, String date,
                                       String startTime, String endTime)
            throws IOException {
        Set<String> imsis = new HashSet<String>();
        //ToDo : better exception handling
        HTable table = new HTable(conf, TRAFFIC_INFO_TABLE_NAME);
        Scan scan = new Scan();
        scan.addColumn(columnFamily, qualifier);
        scan.setFilter(prepFilter(btsId, date, startTime, endTime));
        // filter to build where timestamp
        Result result = null;
        ResultScanner resultScanner = table.getScanner(scan);
        while ((result = resultScanner.next()) != null) {
            byte[] obtainedColumn = result.getValue(columnFamily, qualifier);
            imsis.add(Bytes.toString(obtainedColumn));
        }
        resultScanner.close();
        return imsis;
    }

    //ToDo : Figure out how valid is this filter code?? How comparison happens
    // with equal or greater than or equal, etc.
    private Filter prepFilter(String btsId, String date,
                              String startTime, String endTime) {
        byte[] tableKey = Bytes.toBytes(KEY_TRAFFIC_INFO_TABLE_BTS_ID);
        byte[] timeStamp = Bytes.toBytes(COLUMN_TIMESTAMP);

        // filter to build -> where BTS_ID = <<btsId>> and Date = <<date>>
        RowFilter keyFilter = new RowFilter(CompareFilter.CompareOp.EQUAL,
                new BinaryComparator(Bytes.toBytes(btsId + date)));

        // filter to build -> where timeStamp >= startTime
        SingleColumnValueFilter singleColumnValueFilterStartTime =
                new SingleColumnValueFilter(columnFamily, timeStamp,
                        CompareFilter.CompareOp.GREATER_OR_EQUAL, Bytes.toBytes(startTime));

        // filter to build -> where timeStamp <= endTime
        SingleColumnValueFilter singleColumnValueFilterEndTime =
                new SingleColumnValueFilter(columnFamily, timeStamp,
                        CompareFilter.CompareOp.LESS_OR_EQUAL, Bytes.toBytes(endTime));

        FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL, Arrays
                .asList((Filter) keyFilter,
                        singleColumnValueFilterStartTime, singleColumnValueFilterEndTime));

        return filterList;
    }

    public static void main(String[] args) throws IOException {
        HBaseClusterConnector flt = new HBaseClusterConnector();
        Set<String> imsis = flt.getInfoPerBTSID("AMCD000784", "26082013", "104092", "104095");
        System.out.println(imsis.toString());
    }
}
I'm currently using the Cloudera Quickstart VM to test this.
The problem is: if I run this very code on the VM, it works absolutely fine, but it fails with the error below when run from outside. I suspect it has something to do with the VM settings rather than anything else. Please note that I've already checked that I can connect to the node manager / job tracker of the VM from the host machine, and that works absolutely fine. When I run the code from my host OS instead of running it on the VM, I get the error below:
2013-10-15 18:16:04.185 java[652:1903] Unable to load realm info from SCDynamicStore
Exception in thread "main" org.apache.hadoop.hbase.MasterNotRunningException: Retried 1 times
at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:138)
at org.apache.hadoop.hbase.client.HBaseAdmin.checkHBaseAvailable(HBaseAdmin.java:1774)
at ttumdt.app.connector.HBaseClusterConnector.<init>(HBaseClusterConnector.java:47)
at ttumdt.app.connector.HBaseClusterConnector.main(HBaseClusterConnector.java:117)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
Process finished with exit code 1
Please note that the master node is actually running. The ZooKeeper log shows that it has established a connection with the host OS:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
6:16:03.274 PM INFO org.apache.zookeeper.server.ZooKeeperServer
Client attempting to establish new session at /10.138.169.81:50567
6:16:03.314 PM INFO org.apache.zookeeper.server.ZooKeeperServer
Established session 0x141bc2487440004 with negotiated timeout 60000 for client /10.138.169.81:50567
6:16:03.964 PM INFO org.apache.zookeeper.server.PrepRequestProcessor
Processed session termination for sessionid: 0x141bc2487440004
6:16:03.996 PM INFO org.apache.zookeeper.server.NIOServerCnxn
Closed socket connection for client /10.138.169.81:50567 which had sessionid 0x141bc2487440004
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
But I see no trace of any activity in the Master or RegionServer logs.
Please note that my host OS is Mac OS X 10.7.5.
As per the resources available, this should work fine, though some suggest the simple HBase Java client never works. I'm confused and eagerly waiting for pointers! Please reply.
Start your HiveServer2 on a different port and then try connecting.
Command to start HiveServer2 on a different port (make sure hive is on the PATH):
hive --service hiveserver2 --hiveconf hive.server2.thrift.port=13000
The HBase Java client certainly does work!
The most likely explanation is that your client can't see the machine that the master is running on for some reason.
One possible explanation is that, although you are connecting to Zookeeper using an IP address, the HBase client is attempting to connect to the master using its hostname.
So, if you ensure that you have entries in your hosts file (on the client) that match the hostname of the machine running the master, this may resolve the problem.
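For example, an entry like the one below in the client machine's /etc/hosts; the hostname shown is only a guess at the Cloudera Quickstart VM's default, so use whatever hostname the master actually reports:

10.138.168.185   quickstart.cloudera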
Check that you can access the master Web UI at <hostname>:60010 from your client machine.