I am learning Kafka and I want to know how to specify the partition when I consume messages from a topic.
I have found several diagrams illustrating this; they show that a consumer can consume messages from several partitions, but a partition can only be read by a single consumer (within a consumer group).
Also, I have read several examples for consumer and they look like this:
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "consumer-tutorial");
props.put("key.deserializer", StringDeserializer.class.getName());
props.put("value.deserializer", StringDeserializer.class.getName());
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
and:
Subscribe:
consumer.subscribe(Arrays.asList("foo", "bar"));
Poll:
try {
    while (running) {
        ConsumerRecords<String, String> records = consumer.poll(1000);
        for (ConsumerRecord<String, String> record : records)
            System.out.println(record.offset() + ": " + record.value());
    }
} finally {
    consumer.close();
}
How does this work? From which partition will I read messages?
There are two ways to tell the consumer what topic/partitions to consume: KafkaConsumer#assign(), where you specify the partition you want and the offset where you begin, and KafkaConsumer#subscribe(), where you join a consumer group and partitions/offsets are dynamically assigned by the group coordinator depending on the consumers in the same consumer group, and may change at runtime.
In both cases, you need to poll to receive data.
See https://kafka.apache.org/0110/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html, especially paragraphs Consumer Groups and Topic Subscriptions and Manual Partition Assignment
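For illustration, here is a minimal sketch of the assign() approach; the topic name "my-topic", partition 0, and offset 0L are assumptions for this example:
// Minimal sketch: manually assign one partition and choose the starting offset.
// No consumer group membership is involved when using assign().
TopicPartition tp = new TopicPartition("my-topic", 0); // topic/partition are assumptions
consumer.assign(Collections.singletonList(tp));
consumer.seek(tp, 0L); // start reading at offset 0

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records)
        System.out.println(record.partition() + "/" + record.offset() + ": " + record.value());
}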
Related
I am learning Kafka recently, and my consumers can't consume any records unless I specify the --partition 0 parameter. In other words I can NOT consume records like:
kafka-console-consumer --bootstrap-server 127.0.0.10:9092 --topic first-topic
but works like:
kafka-console-consumer --bootstrap-server 127.0.0.10:9092 --topic first-topic --partition 0
The main problem is that when I moved to Java code, my KafkaConsumer class can't fetch records, and I need to know how to specify the partition number in the Java KafkaConsumer.
My current Java code is:
public class ConsumerDemo {
    public static void main(String[] args) {
        Logger logger = LoggerFactory.getLogger(ConsumerDemo.class.getName());

        String bootstrapServer = "127.0.0.10:9092";
        String groupId = "my-kafka-java-app";
        String topic = "first-topic";

        // create consumer configs
        Properties properties = new Properties();
        properties.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServer);
        //properties.setProperty(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, partition);
        properties.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        properties.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        properties.setProperty(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        properties.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        // create consumer
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);

        // subscribe consumer to our topic
        consumer.subscribe(Collections.singleton(topic)); // subscription to one topic

        // poll for new data
        while (true) {
            //consumer.poll(100); // old way
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
            for (ConsumerRecord<String, String> record : records) {
                logger.info("Key: " + record.key() + ", Value: " + record.value());
                logger.info("Partition: " + record.partition() + ", Offset: " + record.offset());
            }
        }
    }
}
After a lot of inspection, my solution turned out to be using consumer.assign and consumer.seek instead of consumer.subscribe, and without specifying the groupId. But I feel there should be a more optimal solution.
The Java code will then be:
// create consumer
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);

// subscribe consumer to our topic
//consumer.subscribe(Collections.singleton(topic)); // means subscription to one topic

// assign and seek are mostly used to replay data or fetch a specific message
TopicPartition partitionToReadFrom = new TopicPartition(topic, 0);
long offsetToReadFrom = 15L;

// assign
consumer.assign(Arrays.asList(partitionToReadFrom));

// seek: the specific offset to read from
consumer.seek(partitionToReadFrom, offsetToReadFrom);
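To complete the picture, a minimal sketch of the poll loop after assign/seek, reusing the consumer and logger from the code above; numberOfMessagesToRead is a hypothetical variable introduced for this example:
int numberOfMessagesToRead = 5; // hypothetical bound for this example
int messagesReadSoFar = 0;
boolean keepReading = true;

while (keepReading) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        messagesReadSoFar++;
        logger.info("Key: " + record.key() + ", Value: " + record.value());
        if (messagesReadSoFar >= numberOfMessagesToRead) {
            keepReading = false; // stop after the desired number of messages
            break;
        }
    }
}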
The way you are doing it is correct. You don't need to specify the partition when subscribing to a topic. Maybe your consumer group has already consumed all messages in the topic and has committed the latest offsets.
Make sure new messages are being produced when you run your application, or create a new consumer group to consume from the beginning (if you keep ConsumerConfig.AUTO_OFFSET_RESET_CONFIG set to "earliest").
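For example, a minimal sketch of the second option; the group id "my-kafka-java-app-replay" is a hypothetical fresh group with no committed offsets:
properties.setProperty(ConsumerConfig.GROUP_ID_CONFIG, "my-kafka-java-app-replay"); // hypothetical new group
properties.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); // new group starts from the beginning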
As the name implies, the ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG property configures a partition assignment strategy; it does not pin the consumer to a fixed partition the way the --partition flag does on the command line.
The default strategy used is the RangeAssignor which can be changed, for example to a StickyAssignor as follows:
properties.setProperty(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, StickyAssignor.class.getName());
You can read more in the Kafka Client-side Assignment Proposal.
I'm currently trying to set up a consumer to consume messages from a topic. My log says it subscribes to the topic successfully:
[Consumer clientId=consumer-1, groupId=consumer-group] Subscribed to topic(s): MY-TOPIC
and it clearly shows it is part of a group, but when I go to the control center I can't find that group. However, I can find the topic that I am subscribed to. It isn't consuming the records from the topic either, which I attribute to not being a part of a valid group. I know it is polling the correct topic, and I know there are records on the topic, as I am constantly putting them on.
Here is my start method:
@PostConstruct
public void start()
{
    // check if the config indicates whether to start the daemon or not
    if (!parseBoolean(maskBlank(shouldStartConsumer, "true")))
    {
        System.err.println("CONSUMER DISABLED");
        logger.warn("consumer not starting -- see value of " + PROP_EXTRACTOR_START_CONSUMER);
        return;
    }
    System.err.println("STARTING CONSUMER");
    Consumer<String, String> consumer = this.createConsumer(kafkaTopicName,
            StringDeserializer.class, StringDeserializer.class);
    Thread daemon = new Thread(() -> {
        while (true)
        {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(10));
            if (records.count() > 0) // IS ALWAYS 0, poll doesn't return records
            {
                printRecord(records);
                records.iterator().forEachRemaining(r -> {
                    System.err.println("received record: " + r);
                });
            }
            else { logger.debug("KafkaTopicConsumer::consumeMessage -- No messages found in topic {}", kafkaTopicName); }
        }
    });
    daemon.setName(kafkaTopicName);
    daemon.setDaemon(true);
    daemon.start();
}
Note: the createConsumer method just adds all of my config settings and is where I subscribe to my topic.
I have a feeling it has something to do with the thread... I can post some of my config if that would help as well; just leave a comment. Thanks.
Kafka provides the useful command line tool kafka.tools.GetOffsetShell, but I need its functionality in my application.
I want to get all offsets for each partition in a specified topic, like this:
bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list kafka:9092 --topic com.group.test.Foo
com.group.test.Foo:0:10
com.group.test.Foo:1:11
com.group.test.Foo:2:10
But I don't want to run the process bin/kafka-run-class.sh kafka.tools.GetOffsetShell.
How can I do the same using the Kafka API in Java?
Do I have to create a consumer and invoke KafkaConsumer#position for each TopicPartition? Is there a simpler way?
By default, GetOffsetShell returns the end offset for each partition. You can retrieve those offsets programmatically like this:
......
try (final KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProperties)) {
    consumer.subscribe(Arrays.asList("topicName"));
    Set<TopicPartition> assignment;
    while ((assignment = consumer.assignment()).isEmpty()) {
        consumer.poll(Duration.ofMillis(100));
    }
    consumer.endOffsets(assignment).forEach((tp, offset) -> System.out.println(tp + ": " + offset));
}
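If you don't want the consumer to join a group at all, here is a sketch of an alternative that keeps the same try-with-resources consumer but skips subscribe(), using partitionsFor() and endOffsets() directly; "topicName" is a placeholder and the java.util.stream imports are assumed:
// No group membership needed: look up the partitions, then query end offsets.
List<TopicPartition> partitions = consumer.partitionsFor("topicName").stream()
        .map(info -> new TopicPartition(info.topic(), info.partition()))
        .collect(Collectors.toList());
consumer.endOffsets(partitions).forEach((tp, offset) -> System.out.println(tp + ": " + offset));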
I have subscribed to a Kafka topic as below. I need to run some logic only after the consumer has been assigned a partition.
However, consumer.assignment() comes back as an empty set no matter how long I wait. If I remove the while loop and just call consumer.poll(), I do get the records from the topic. Can anyone tell me why this is happening?
consumer.subscribe(topics);
Set<TopicPartition> assigned = Collections.emptySet();
while (isAssigned) {
    assigned = consumer.assignment();
    if (!assigned.isEmpty()) {
        isAssigned = false;
    }
}
//consumer props
Properties props = new Properties();
props.put("bootstrap.servers", "xxx:9092,yyy:9092");
props.put("group.id", groupId);
props.put("enable.auto.commit", "false");
props.put("session.timeout.ms", "30000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
props.put("schema.registry.url", "http://xxx:8081");
props.put(KafkaAvroDeserializerConfig.SPECIFIC_AVRO_READER_CONFIG, true);
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
props.put("max.poll.records", "100");
Until you call poll(), the consumer is just idling.
Only after poll() is invoked will it initiate a connection to the cluster, get partitions assigned, and attempt to fetch messages.
This is mentioned in the Javadoc:
"After subscribing to a set of topics, the consumer will automatically join the group when poll(Duration) is invoked."
Once you start calling poll(), you can use assignment() or even register a ConsumerRebalanceListener to be notified when partitions are assigned to or revoked from your consumer instance.
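For example, a minimal sketch of the listener approach, assuming the consumer and topics variables from the question (plus the Collection and TopicPartition imports); note that poll() is what actually drives the group join and the callbacks:
consumer.subscribe(topics, new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // invoked inside poll() once the coordinator has assigned partitions
        System.out.println("Assigned: " + partitions);
        // run your partition-dependent logic here
    }

    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        System.out.println("Revoked: " + partitions);
    }
});
while (true) {
    consumer.poll(Duration.ofMillis(100)); // the group join happens inside poll()
}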
I am creating consumers (a consumer group with a single consumer in it):
Properties properties = new Properties();
properties.put("zookeeper.connect", "localhost:2181");
properties.put("auto.offset.reset", "largest");
properties.put("group.id", groupId);
properties.put("auto.commit.enable", "true");

ConsumerConfig consumerConfig = new ConsumerConfig(properties);
ConsumerConnector consumerConnector = Consumer.createJavaConsumerConnector(consumerConfig);
Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumerConnector.createMessageStreams(topicCountMap);

consumerMap.entrySet().stream().forEach(
    streams -> {
        streams.getValue().stream().forEach(
            stream -> {
                KafkaBasicConsumer customConsumer = new KafkaBasicConsumer();
                try {
                    Future<?> consumerFuture = kafkaConsumerExecutor.submit(customConsumer);
                    kafkaConsumersFuture.put(groupId, consumerFuture);
                } catch (Exception e) {
                    logger.error("---- Got error : " + e.getMessage());
                    logger.error("Exception : ", e);
                }
            }
        );
    }
);
I have subscribed 2 consumers to the same topic.
I am unsubscribing a consumer by storing its Future object and then invoking:
consumerFuture.cancel(Boolean.TRUE);
Now I subscribe the same consumer again with the code above, and it gets registered successfully.
However, when the publisher now publishes, the newly subscribed consumer does not get messages, whereas the other consumer that stayed registered does.
I am also checking the offsets of the consumers; they get updated when the producer publishes, but the consumers are not getting messages.
Before producing:
Group  Topic  Pid  Offset  logSize  Lag
A      T1     0    94      94       1
B      T1     0    94      94       1
After producing:
Group  Topic  Pid  Offset  logSize  Lag
A      T1     0    95      97       2
B      T1     0    94      97       2
I am not able to figure out whether this is an issue on the producer side (not enough partitions) or whether I have created the consumer incorrectly.
Also, I am not able to figure out what the logSize and Lag columns mean.
Let me know if anyone can help or needs more details.
I found the solution to my problem; thanks @nautilus for reminding me to update.
My main intent was to provide endpoints to subscribe and unsubscribe a consumer in Kafka.
Since Kafka only provides subscribing, not unsubscribing (it is only possible manually), I had to write a layer over the Kafka implementation.
I stored the consumer object in a static map with the group id as key (since my consumer group can have only one consumer).
The problem was that I was not closing the consumer when unsubscribing, so the old consumer with the same group id was preventing the new one from getting messages.
private static Map<String, ConsumerConnector> kafkaConsumersFuture;
Based on some parameter, find out the group id, then:
kafkaConsumersFuture.put(groupId, consumerConnector);
And while unsubscribing I did:
ConsumerConnector consumerConnector = kafkaConsumersFuture.get(groupId);
if (consumerConnector != null) {
    consumerConnector.shutdown();
    kafkaConsumersFuture.remove(groupId);
}
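The subscribe side needs the same guard. A sketch, reusing the consumerConfig from the question: shut down any existing connector for the group id before registering a new one, so the stale group member cannot keep holding the partition:
// Close any existing connector for this group id before creating a new one.
ConsumerConnector existing = kafkaConsumersFuture.get(groupId);
if (existing != null) {
    existing.shutdown();
}
ConsumerConnector consumerConnector = Consumer.createJavaConsumerConnector(consumerConfig);
kafkaConsumersFuture.put(groupId, consumerConnector);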