Kafka consumer api failed to subscribe to topic - java

I am using the plain Kafka client API. As far as I know, there are two ways to consume messages: subscribing to a topic, or assigning partitions to the consumer.
However, the first method does not work: the consumer's poll() hangs forever. It only works with assign().
// common config for consumer
Map<String, Object> config = new HashMap<>();
config.put("bootstrap.servers", bootstrap);
config.put("group.id", KafkaTestConstants.KAFKA_GROUP);
config.put("enable.auto.commit", "true");
config.put("auto.offset.reset", "earliest");
config.put("key.deserializer", StringDeserializer.class.getName());
config.put("value.deserializer", StringDeserializer.class.getName());
StringDeserializer deserializer = new StringDeserializer();
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(config, deserializer, deserializer);
// subscribe does not work, poll() hangs
consumer.subscribe(Arrays.asList(KafkaTestConstants.KAFKA_TOPIC));
Here is the code that works by assigning the partition.
// assign works
TopicPartition tp = new TopicPartition(KafkaTestConstants.KAFKA_TOPIC, 0);
List<TopicPartition> tps = Arrays.asList(tp);
consumer.assign(tps);
I'd like to use the auto-commit feature, which according to this post is supposed to work only with consumer group management. Why doesn't subscribe() work?

I faced the same issue.
I was using the kafka_2.12 jar; when I downgraded to kafka_2.11, it worked.

Related

How to increase request time for kafka consumer

I am using Kafka with a Spring listener. The code is below.
In the past we published more than 100k messages to the test topic, and the system seemed to work fine.
A few days ago, I changed the consumer's groupId. The new consumer then tried to process all the messages from the start, which takes a lot of time. After a short while (maybe 10 seconds), the broker kicks the consumer out.
As a result, no Kafka listener remains registered to receive messages.
@KafkaListener(
    topicPattern = "test",
    groupId = "test",
    id = "test",
    containerFactory = "testKafkaListenerContainerFactory")
public void consume(@Payload String payload) throws IOException {
}
Kafka Consumer configuration:
Map<String, Object> props = new HashMap<>();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
props.put("security.protocol", "SSL");
Then I used the CLI to read messages with the following command and observed the same behavior: after exactly 10 seconds, the consumer stops reading messages from Kafka.
./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test
How can I increase the request timeout for the Kafka client, or is there a better way to solve this issue?
"In the past we have published more than 1 lakh (100k) messages to the test topic and the system seems to be working fine."
Does your Kafka topic still have the required data? Kafka does not store messages forever. How long messages are kept is governed by the retention configured on the broker(s). If your data is older than the retention period, it will have been deleted.
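As a rough illustration of the retention point, the sketch below checks whether a record's timestamp falls outside a retention window. It is a toy model, not Kafka code: the class and method names are hypothetical, and the 168-hour window is an assumed value (check log.retention.hours on the broker or retention.ms on the topic). Real brokers delete whole log segments, so a record can outlive the window for a while.

```java
import java.time.Duration;
import java.time.Instant;

class RetentionCheck {
    /** Returns true if the record is older than the retention window at time 'now'. */
    static boolean isPastRetention(Instant recordTimestamp, Instant now, Duration retention) {
        return Duration.between(recordTimestamp, now).compareTo(retention) > 0;
    }

    public static void main(String[] args) {
        Duration retention = Duration.ofHours(168); // assumed value: 7 days
        Instant now = Instant.parse("2024-01-10T00:00:00Z");
        // Written 9 days before 'now': outside the window.
        System.out.println(isPastRetention(Instant.parse("2024-01-01T00:00:00Z"), now, retention)); // true
        // Written 5 days before 'now': still inside the window.
        System.out.println(isPastRetention(Instant.parse("2024-01-05T00:00:00Z"), now, retention)); // false
    }
}
```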

Can a Kafka broker retain messages while there are no consumers connected?

I am trying to build a pub/sub application and I am exploring the best tools out there. I am currently looking at Kafka and have a little demo app already running. However, I am running into a conceptual issue.
I have a producer (Java code):
String topicName = "MyTopic";
String key = "MyKey";
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092,localhost:9093");
props.put("acks", "all");
props.put("key.serializer","org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
Producer<String, byte[]> producer = new KafkaProducer <String, byte[]>(props);
byte[] data = <FROM ELSEWHERE>;
ProducerRecord<String, byte[]> record = new ProducerRecord<String, byte[]>(topicName, key, data);
try {
RecordMetadata result = producer.send(record).get();
}
catch (Exception e) {
// Nothing for now
}
producer.close();
When I start a consumer via the Kafka command line tools:
kafka-console-consumer --bootstrap-server localhost:9092 --topic MyTopic
and then I execute the producer code, I see the data message show up on my consumer terminal.
However, if I do not run the consumer before executing the producer, the message appears "lost". When I start the consumer (after executing the producer), nothing appears in the consumer terminal.
Does anyone know if it's possible to have the Kafka broker retain messages while there are no consumers connected? If so, how?
Append --from-beginning to the console consumer command to have it start consuming from the earliest offset. The broker does retain the message; what you are seeing is the offset reset strategy, which is controlled by the auto.offset.reset config. Here is what this config means:
What to do when there is no initial offset in Kafka or if the current offset does not exist any more on the server (e.g. because that data has been deleted):
earliest: automatically reset the offset to the earliest offset
latest: automatically reset the offset to the latest offset
none: throw exception to the consumer if no previous offset is found for the consumer's group
anything else: throw exception to the consumer.
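The rules quoted above can be modeled in a few lines. This is a toy sketch, not Kafka's implementation: the class, the method, and the logStart/logEnd parameters are made up for illustration. It shows why a consumer with no committed offset sees nothing under latest (the default) but does see an earlier message under earliest.

```java
import java.util.OptionalLong;

class OffsetResetSketch {
    /**
     * Toy model of where a consumer starts reading one partition.
     * committed: the group's committed offset, if any.
     * logStart / logEnd: first retained offset and the next offset to be written.
     */
    static long startingOffset(OptionalLong committed, long logStart, long logEnd, String autoOffsetReset) {
        // A committed offset that still points into the retained log is used as-is.
        if (committed.isPresent()
                && committed.getAsLong() >= logStart
                && committed.getAsLong() <= logEnd) {
            return committed.getAsLong();
        }
        // Otherwise the reset policy decides.
        switch (autoOffsetReset) {
            case "earliest":
                return logStart;
            case "latest":
                return logEnd;
            default: // "none" or anything else
                throw new IllegalStateException("no valid offset and auto.offset.reset=" + autoOffsetReset);
        }
    }

    public static void main(String[] args) {
        // One message was produced before the consumer started, so the log holds offset 0 only.
        System.out.println(startingOffset(OptionalLong.empty(), 0, 1, "latest"));   // 1: the message is skipped
        System.out.println(startingOffset(OptionalLong.empty(), 0, 1, "earliest")); // 0: the message is read
    }
}
```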

KafkaProducer not sending Record

I'm completely new to Kafka and I'm having some trouble using the KafkaProducer.
The producer's send method blocks for exactly 1 minute, and then the application proceeds without an exception. This is obviously some timeout, but no exception is thrown.
I also can't see anything useful in the logs.
The servers seem to be set up correctly. If I use the bin/kafka-console-consumer and producer applications, I can send and receive messages correctly. The code also seems to work to some extent.
If I write to a topic that does not exist yet, I can see the new entry in the /tmp/kafka-logs folder and also in the console output of the KafkaServer.
Here is the code I use:
Properties props = ResourceUtils.loadProperties("kafka.properties");
Producer<String, String> producer = new KafkaProducer<>(props);
for (String line : lines)
{
producer.send(new ProducerRecord<>("topic", Id, line));
producer.flush();
}
producer.close();
The properties in the kafka.properties file:
bootstrap.servers=localhost:9092
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=org.apache.kafka.common.serialization.StringSerializer
acks=all
retries=0
batch.size=16384
linger.ms=1
buffer.memory=33554432
So, producer.send blocks for 1 minute and then continues. In the end nothing is stored in Kafka, but the new topic is created.
Thank you for any help!
Try setting bootstrap.servers to 127.0.0.1:9092.
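On the "no exception is thrown" observation in the question: KafkaProducer.send() returns a Future, and the code above never inspects it, so a failure carried by that Future goes unnoticed. The sketch below illustrates the general pattern using java.util.concurrent.CompletableFuture as a stand-in. fakeSend and failureOf are hypothetical helpers, not Kafka APIs, and the error message is made up. The one-minute wait is consistent with the producer's max.block.ms default of 60000 ms, though that is an inference and not something the question confirms.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeoutException;

class FutureErrorSketch {
    /** Stand-in for an asynchronous send that fails: it returns normally and does not throw. */
    static Future<String> fakeSend(String value) {
        CompletableFuture<String> result = new CompletableFuture<>();
        result.completeExceptionally(new TimeoutException("metadata not available"));
        return result;
    }

    /** Returns the failure message carried by the future, or null if it completed successfully. */
    static String failureOf(Future<String> future) {
        try {
            future.get(); // the stored failure only surfaces here
            return null;
        } catch (ExecutionException e) {
            return e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        Future<String> result = fakeSend("line"); // no exception at this point
        System.out.println(failureOf(result));    // prints the hidden failure
    }
}
```

With the real producer, calling get() on the Future returned by send(), or passing a callback to send(), should surface the underlying error in a similar way.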

Kafka pattern subscription. Rebalancing is not being triggered on new topic

According to the Kafka javadocs, if I:
1. Subscribe to a pattern
2. Create a topic that matches the pattern
then a rebalance should occur, which makes the consumer read from that new topic. But that's not happening.
If I stop and start the consumer, it does pick up the new topic. So I know the new topic matches the pattern. There's a possible duplicate of this question in https://stackoverflow.com/questions/37120537/whitelist-filter-in-kafka-doesnt-pick-up-new-topics but that question got nowhere.
I'm looking at the Kafka logs and there are no errors; it just doesn't trigger a rebalance. The rebalance is triggered when consumers join or die, but not when new topics are created (not even when partitions are added to existing topics, but that's another subject).
I'm using kafka 0.10.0.0, and the official Java client for the "New Consumer API", meaning broker GroupCoordinator instead of fat client + zookeeper.
This is the code for the sample consumer:
public class SampleConsumer {
    public static void main(String[] args) throws IOException {
        KafkaConsumer<String, String> consumer;
        try (InputStream props = Resources.getResource("consumer.props").openStream()) {
            Properties properties = new Properties();
            properties.load(props);
            properties.setProperty("group.id", "my-group");
            System.out.println(properties.get("group.id"));
            consumer = new KafkaConsumer<>(properties);
        }
        Pattern pattern = Pattern.compile("mytopic.+");
        consumer.subscribe(pattern, new SampleRebalanceListener());
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("%s %s\n", record.topic(), record.value());
            }
        }
    }
}
In the producer, I'm sending messages to topics named mytopic1, mytopic2, etc.
Patterns are pretty much useless if the rebalance is not triggered.
Do you know why the rebalance is not happening?
The documentation mentions "The pattern matching will be done periodically against topics existing at the time of check.". It turns out that "periodically" corresponds to the metadata.max.age.ms property. By setting that property (inside "consumer.props" in my code sample) to, e.g., 5000, I can see it detect new topics and partitions every 5 seconds.
This is as designed, according to this jira ticket https://issues.apache.org/jira/browse/KAFKA-3854:
The final note on the JIRA stating that a later created topic that matches a consumer's subscription pattern would not be assigned to the consumer upon creation seems to be as designed. A repeat subscribe() to the same pattern would be needed to handle that case.
The refresh metadata polling does the "repeat subscribe()" mentioned in the ticket.
This is confusing coming from Kafka 0.8, where rebalances were truly triggered by ZooKeeper watches instead of polling. IMO 0.9 is a downgrade for this scenario: instead of "just in time" rebalancing, you get either high-frequency polling with its overhead, or low-frequency polling with long delays before the consumer reacts to new topics/partitions.
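To make the polling behavior concrete, here is a small stand-alone sketch of what each metadata refresh amounts to: the pattern is applied to whatever topics exist in that snapshot. It is not the Kafka client's code; the class and method are hypothetical, and the whole-name match via matches() is an assumption about how the client applies the pattern. A topic created between two refreshes only appears in the result after the next refresh, i.e. up to metadata.max.age.ms later (300000 ms by default).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class PatternSubscriptionSketch {
    /** Returns the topics a pattern subscription would cover for one metadata snapshot. */
    static List<String> matchingTopics(Pattern pattern, List<String> topicsInMetadata) {
        List<String> matched = new ArrayList<>();
        for (String topic : topicsInMetadata) {
            if (pattern.matcher(topic).matches()) { // whole-name match (assumption)
                matched.add(topic);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("mytopic.+");
        // Snapshot seen at the first metadata refresh.
        System.out.println(matchingTopics(pattern, Arrays.asList("mytopic1", "other")));
        // mytopic2 was created afterwards; it is only picked up at the next refresh.
        System.out.println(matchingTopics(pattern, Arrays.asList("mytopic1", "mytopic2", "other")));
    }
}
```

Note that "mytopic.+" requires at least one character after "mytopic", so a topic named exactly "mytopic" would not match in this sketch.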
To trigger a rebalance immediately, you can explicitly call poll() after subscribing to the topic:
kafkaConsumer.poll(pollDuration);
Refer to:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-568%3A+Explicit+rebalance+triggering+on+the+Consumer
In your consumer code, set the following and try again:
properties.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

Read the data from specific partition of topic in Kafka broker via Spark Streaming

I am new to Spark, so apologies for asking such a question. I have a use case where I want to read data from a specific partition of a topic with Spark Streaming. I am using the Spark Java API for everything.
I have created a topic named test with replication factor 2 and 5 partitions. With the help of the Spark Streaming Kafka integration guide, I can create a JavaStreamingContext object, create a direct stream to the Kafka broker, and read all the messages from all partitions.
But this does not cover my use case: I need to read only the messages of a particular partition of a topic, instead of all messages from all partitions.
You should be able to read a specific partition from a specific offset using the following code.
// Map each partition you want to read to its starting offset.
Map<TopicAndPartition, Long> consumerOffsets = new HashMap<TopicAndPartition, Long>();
// yourPartition is an int (for example 0) and offset is a long; both are placeholders.
TopicAndPartition p1 = new TopicAndPartition("yourtopic", yourPartition);
consumerOffsets.put(p1, offset);
JavaInputDStream<String> messages = KafkaUtils.createDirectStream(
    jssc,
    String.class,
    String.class,
    StringDecoder.class,
    StringDecoder.class,
    String.class,
    kafkaParams,
    consumerOffsets,
    new Function<MessageAndMetadata<String, String>, String>() {
        public String call(MessageAndMetadata<String, String> msgAndMeta) throws Exception {
            return msgAndMeta.message();
        }
    }
);
