Apache Storm Kafka SpoutConfig for a ZooKeeper quorum - Java

To configure a KafkaSpout, it takes a BrokerHosts instance, which in turn takes one ZooKeeper host:
BrokerHosts host = new ZkHosts("server-1:2181");
SpoutConfig spoutConfig = new SpoutConfig(host, TopologyConstants.KAFKA_QUEUE.SOURCE,
"/" + TopologyConstants.KAFKA_QUEUE.SOURCE, ID);
KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);
The problem is: if I have a ZooKeeper quorum (a cluster of 3 ZK servers), how do I configure the KafkaSpout to take all the members of the quorum instead of only one?
If that single ZooKeeper server goes down, the whole topology becomes unavailable.

Found the answer to my question:
The connection string format is "host1:port1,host2:port2,host3:port3,..." (storm-kafka uses Curator under the hood), so just supply multiple ZooKeeper server URLs to the ZkHosts constructor.
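For example, a minimal sketch reusing the constants from the question above (the zk1/zk2/zk3 host names are placeholders for your actual quorum members):
// pass the whole quorum as one comma-separated connection string
BrokerHosts hosts = new ZkHosts("zk1:2181,zk2:2181,zk3:2181");
SpoutConfig spoutConfig = new SpoutConfig(hosts, TopologyConstants.KAFKA_QUEUE.SOURCE,
        "/" + TopologyConstants.KAFKA_QUEUE.SOURCE, ID);
KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);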

Related

Topic created on all Kafka ports

server.properties setup:
listeners=PLAINTEXT://:29092,SSL://:29093
The SSL-related setup is also done, so that we can connect on 29092 for plaintext and on 29093 with SSL.
Here I am trying to produce data to port 29093 as below:
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, System.getProperty("kafkaPort", "localhost:29093"));
//SSL related setup too done in props
Producer<Long, String> producer = new KafkaProducer<>(props, new LongSerializer(), new KafkaSerializer());
final ProducerRecord<Long, String> record = new ProducerRecord<Long, String>(System.getProperty("kafkaTopic", "dqerror"),
content);
RecordMetadata metadata = producer.send(record).get();
After publishing, the dqerror topic appears to be created on both ports, and the data also seems to get published on both, i.e. the data is published into two topics.
Actually, I am trying to find out whether it is possible to restrict the data to a specific port.
Data is not published on "both" ports. There is only one Kafka cluster, which happens to be listening on two ports; there is one set of disks on your one broker that the data is written to.
Also, from what I can tell, there is only one topic used in your code.
If you want to restrict TCP traffic on any port, that would be a firewall rule in the OS, rather than any Kafka setting or Java code.
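To illustrate the point, a sketch of the relevant server.properties entries for a single broker (the log.dirs path is an assumption for illustration): both listeners are just two network entry points into the same broker, so a topic and its data exist exactly once.
listeners=PLAINTEXT://:29092,SSL://:29093
listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL
# one set of data directories, shared by both listeners (path is illustrative)
log.dirs=/var/lib/kafka/logs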

How to connect Apache Ignite Thin Client to Apache Ignite Cluster?

I have a running Ignite cluster and I use AWS S3 for node discovery:
TcpDiscoveryS3IpFinder ipFinder = new TcpDiscoveryS3IpFinder();
BasicAWSCredentials awsCredentials =
new BasicAWSCredentials(igniteAccessKey, igniteSecretAccessKey);
ipFinder.setAwsCredentials(awsCredentials);
ipFinder.setBucketName(igniteBucketName);
ipFinder.setBucketEndpoint("s3.eu-central-1.amazonaws.com");
TcpDiscoverySpi spi = new TcpDiscoverySpi();
spi.setIpFinder(ipFinder);
IgniteConfiguration cfg = new IgniteConfiguration();
cfg.setClientMode(true);
cfg.setDiscoverySpi(spi);
Ignition.start(cfg);
It works very well and I can connect to this cluster using Apache Ignite Client nodes.
But what about the Apache Ignite Thin Client? The thin client uses the ClientConfiguration class (instead of IgniteConfiguration), which requires a list of IP addresses of cluster nodes. AFAIK one can only hardcode that list of IP addresses. From the official documentation:
ClientConfiguration cfg = new ClientConfiguration().setAddresses("127.0.0.1:10800");
try (IgniteClient client = Ignition.startClient(cfg)) {
    ClientCache<Integer, String> cache = client.cache("myCache");
    // Get data from the cache
}
So I have questions:
How should I handle situations when a list of IP addresses change?
Is there any way to use node discovery for Thin clients?
You can use domain names as well as IP addresses. But you can't use node discovery with thin clients.
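A minimal sketch, assuming placeholder host names: list all the server nodes (or a DNS name that resolves to them) so the thin client can fall back to another address if one node is down.
ClientConfiguration cfg = new ClientConfiguration()
        .setAddresses("node1.example.com:10800",
                      "node2.example.com:10800",
                      "node3.example.com:10800");
try (IgniteClient client = Ignition.startClient(cfg)) {
    ClientCache<Integer, String> cache = client.cache("myCache");
    // work with the cache as usual
}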

How to list all producers of a Kafka cluster?

I am able to list all Kafka consumers with KafkaAdminClient:
AdminClient client = AdminClient.create(conf);
ListTopicsResult ltr = client.listTopics();
KafkaFuture<Set<String>> names = ltr.names();
ArrayList<ConsumerGroupListing> consumerGroups = new ArrayList<>(client.listConsumerGroups().all().get());
ConsumerGroupListing consumerGroup = consumerGroups.get(0);
Is it possible to list all registered producers in a similar way?
In contrast to consumers, it is not possible to retrieve such information since Kafka brokers don't store any kind of information about the producers connected to them.

How to set the read request timeout for Cassandra

I am trying to create new endpoints for Cassandra with different read request timeouts; the endpoint with the larger timeout is for requests that return large responses.
I found Scala code that uses the com.datastax.cassandra driver, and cassandra-default.yaml with the read_request_timeout parameter. How do I set read_request_timeout in the Cluster builder, or elsewhere in code?
Cluster
.builder
.addContactPoints(cassandraHost.split(","): _*)
.withPort(cassandraPort)
.withRetryPolicy(DefaultRetryPolicy.INSTANCE)
.withLoadBalancingPolicy(
new TokenAwarePolicy(DCAwareRoundRobinPolicy.builder().build())).build
# How long the coordinator should wait for read operations to complete
read_request_timeout_in_ms: 5000
Set it at the query level using:
session.execute(
    new SimpleStatement("CQL HERE").setReadTimeoutMillis(65000));
If you want to set it while building the Cluster, use:
Cluster cluster = Cluster.builder()
    .addContactPoint("127.0.0.1")
    .withSocketOptions(
        new SocketOptions()
            .setConnectTimeoutMillis(2000)
            .setReadTimeoutMillis(65000)) // per-connection read timeout on the driver side
    .build();
See the SocketOptions documentation.

Kafka consumer in Spark Streaming

Trying to write a Spark Streaming job that consumes messages from Kafka. Here’s what I’ve done so far:
Started Zookeeper
Started Kafka Server
Sent a few messages to the server. I can see them when I run the following:
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic mytopic --from-beginning
Now I am trying to write a program to count the number of messages coming in within a 5-minute window.
The code looks something like this:
Map<String, Integer> map = new HashMap<String, Integer>();
map.put("mytopic", new Integer(1));
JavaStreamingContext ssc = new JavaStreamingContext(
sparkUrl, " Spark Streaming", new Duration(60 * 5 * 1000), sparkHome, new String[]{jarFile});
JavaPairReceiverInputDStream tweets = KafkaUtils.createStream(ssc, "localhost:2181", "1", map);
I am not sure what value to use for the 3rd argument (the consumer group). When I run this I get "Unable to connect to zookeeper server". But ZooKeeper is running on port 2181; otherwise step 3 would not have worked.
Seems like I am not using KafkaUtils.createStream properly. Any ideas?
There is no such thing as a default consumer group; you can use an arbitrary non-empty string there. If you have only one consumer, its consumer group doesn't really matter. If there are two or more consumers, they can either be part of the same consumer group or belong to different consumer groups.
From http://kafka.apache.org/documentation.html:
Consumers
...
If all the consumer instances have the same consumer group, then this
works just like a traditional queue balancing load over the consumers.
If all the consumer instances have different consumer groups, then
this works like publish-subscribe and all messages are broadcast to
all consumers.
I think the problem may be in the 'topics' parameter.
From Spark docs:
Map of (topic_name -> numPartitions) to consume. Each partition is consumed in its own thread
You only specified a single partition for your topic, namely '1'. Depending on the broker's setting (num.partitions), there may be more than one partition, and your messages may be sent to other partitions which aren't read by your program.
Besides, I believe the partition IDs are 0-based, so if you have only one partition, it will have the ID 0.
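To check how many partitions the topic actually has, something like the following should work on the Kafka version implied by the question (which still uses the --zookeeper flag):
bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic mytopic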
I think you should specify the IP of the ZooKeeper host instead of localhost. Also, the third argument is the consumer group name; it can be any name you like. It matters when you have more than one consumer tied to the same group, since the topic partitions are then distributed among them accordingly. Your tweets stream should be:
JavaPairReceiverInputDStream tweets = KafkaUtils.createStream(ssc, "x.x.x.x", "dummy-group", map);
I was facing the same issue. Here is the solution that worked for me.
The number of cores allocated to the Spark Streaming application must be greater than the number of receivers; otherwise the system will receive data but not be able to process it. So Spark Streaming requires a minimum of two cores, and in spark-submit I had to request at least two cores.
The kafka-clients-<version>.jar should be included in the list of dependent jars passed to spark-submit.
If ZooKeeper is running on the same machine as your streaming application, then "localhost:2181" will work. Otherwise, you have to specify the address of the host where ZooKeeper is running, and ensure that the machine on which the streaming app runs can talk to the ZooKeeper host on port 2181.
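Putting those points together, a hedged sketch of the spark-submit invocation (the class, application jar, and version names are placeholders):
# at least two cores: one for the Kafka receiver, one for processing
spark-submit \
  --master local[2] \
  --class com.example.KafkaMessageCount \
  --jars spark-streaming-kafka_2.10-<version>.jar,kafka-clients-<version>.jar \
  my-streaming-app.jar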
I think that, in your code, the second argument to the KafkaUtils.createStream call should be the host:port of the Kafka server, not the ZooKeeper host and port. Check that once.
EDIT:
Kafka Utils API Documentation
As per the documentation above, it should be the ZooKeeper quorum, so the ZooKeeper hostname and port should be used:
zkQuorum
Zookeeper quorum (hostname:port,hostname:port,..).
