Kafka consumer group, set offset to 0 when consumer group is created - java

I'm creating a new Kafka consumer like this in Java (some code omitted for brevity):
final Properties props = new Properties();
props.put(ConsumerConfig.GROUP_ID_CONFIG, "group2");
final Consumer<Long, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("topicname"));
This also creates the consumer group automatically, in case it does not yet exist. The problem is that the consumer group's offset is not at the beginning of the topic, but at the end.
How can I ensure that the offset is 0 when the group is created (but not otherwise)? I don't want to track the offset manually, just set it to 0 when creating the consumer if the consumer group does not yet exist.

If you don't specify a value for auto.offset.reset in the consumer config, it defaults to the "latest" offset.
You need to set it to "earliest" if you want to consume from offset 0:
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
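For completeness, here is a minimal sketch of the whole consumer setup with that setting (the bootstrap address and the deserializers are assumptions, not from the question):
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.serialization.LongDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

final Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
props.put(ConsumerConfig.GROUP_ID_CONFIG, "group2");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, LongDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
// Applies only when the group has no committed offset yet,
// i.e. the first time the group is created, which is exactly the behavior asked for.
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
final Consumer<Long, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("topicname"));
Once the group has committed offsets, auto.offset.reset is ignored and the consumer resumes from the committed position, so existing groups are not rewound.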

Related

How to consume messages from Kafka in Java, starting from a specific offset

reading from the earliest:
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
reading from the latest:
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
but what should I use if I want to start, for example, from the 18th offset onward?
You can use seek() in order to force the consumer to start consuming from a specific offset:
public void seek(TopicPartition partition, long offset)
Overrides the fetch offsets that the consumer will use on the next poll(timeout). If this API is invoked for the same partition more than once, the latest offset will be used on the next poll(). Note that you may lose data if this API is arbitrarily used in the middle of consumption to reset the fetch offsets.
For example, let's assume you want to start from offset 18:
TopicPartition topicPartition = new TopicPartition("myTopic", 0);
long startOffset = 18L;
List<TopicPartition> topics = Arrays.asList(topicPartition);
consumer.assign(topics);
// seek() is only valid for partitions currently assigned to this consumer
consumer.seek(topicPartition, startOffset);
// Then consume messages as normal...
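Note that seek() only works on partitions the consumer currently owns. With assign() that is immediate, as above; if you instead use subscribe() (group management), a common pattern, sketched here under that assumption, is to seek once partitions are actually handed to you in a ConsumerRebalanceListener:
consumer.subscribe(Collections.singletonList("myTopic"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // nothing to do before the rebalance in this sketch
    }
    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // now the assignment exists, so seeking is legal
        for (TopicPartition tp : partitions) {
            consumer.seek(tp, 18L);
        }
    }
});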

Kafka consumer api failed to subscribe to topic

I am using the plain Kafka client API. As far as I know, there are two ways to consume messages: subscribing to a topic, or assigning partitions to the consumer directly.
However, the first method does not work: poll() hangs forever. It only works with assign().
// common config for consumer
Map<String, Object> config = new HashMap<>();
config.put("bootstrap.servers", bootstrap);
config.put("group.id", KafkaTestConstants.KAFKA_GROUP);
config.put("enable.auto.commit", "true");
config.put("auto.offset.reset", "earliest");
config.put("key.deserializer", StringDeserializer.class.getName());
config.put("value.deserializer", StringDeserializer.class.getName());
StringDeserializer deserializer = new StringDeserializer();
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(config, deserializer, deserializer);
// subscribe does not work, poll() hangs
consumer.subscribe(Arrays.asList(KafkaTestConstants.KAFKA_TOPIC));
Here is the code that works by assigning the partition.
// assign works
TopicPartition tp = new TopicPartition(KafkaTestConstants.KAFKA_TOPIC, 0);
List<TopicPartition> tps = Arrays.asList(tp);
consumer.assign(tps);
I'd like to use the auto-commit feature, which according to this post is supposed to work only with consumer group management. Why does subscribe() not work?
I faced the same issue.
I was using the kafka_2.12 jar; when I downgraded it to kafka_2.11, it worked.
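Independently of the jar version, a useful diagnostic (a sketch, assuming a client recent enough for the Duration-based poll()) is to poll with a bounded timeout and print the assignment; with subscribe(), an assignment that stays empty points at group coordination rather than at the topic itself:
consumer.subscribe(Arrays.asList(KafkaTestConstants.KAFKA_TOPIC));
while (true) {
    // returns after at most one second even with no records, so the loop never blocks forever
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
    System.out.println("assigned: " + consumer.assignment() + ", fetched: " + records.count());
    for (ConsumerRecord<String, String> record : records) {
        System.out.println(record.offset() + ": " + record.value());
    }
}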

Can a Kafka broker retain messages while there are no consumers connected?

I am trying to build a pub/sub application and I am exploring the best tools out there. I am currently looking at Kafka and have a little demo app already running. However, I am running into a conceptual issue.
I have a producer (Java code):
String topicName = "MyTopic";
String key = "MyKey";
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092,localhost:9093");
props.put("acks", "all");
props.put("key.serializer","org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
Producer<String, byte[]> producer = new KafkaProducer <String, byte[]>(props);
byte[] data = <FROM ELSEWHERE>;
ProducerRecord<String, byte[]> record = new ProducerRecord<String, byte[]>(topicName, key, data);
try {
    RecordMetadata result = producer.send(record).get();
}
catch (Exception e) {
    // Nothing for now
}
producer.close();
When I start a consumer via the Kafka command line tools:
kafka-console-consumer --bootstrap-server localhost:9092 --topic MyTopic
and then I execute the producer code, I see the data message show up on my consumer terminal.
However, if I do not run the consumer prior to executing the producer, the message appears "lost". When I start the consumer (after executing the producer), nothing appears in the consumer terminal.
Does anyone know if it's possible to have the Kafka broker retain messages while there are no consumers connected? If so, how?
Append --from-beginning to the console consumer command to have it start consuming from the earliest offset:
kafka-console-consumer --bootstrap-server localhost:9092 --topic MyTopic --from-beginning
This is actually about the offset reset strategy, which is controlled by the config auto.offset.reset. Here is what this config means:
What to do when there is no initial offset in Kafka or if the current offset does not exist any more on the server (e.g. because that data has been deleted):
earliest: automatically reset the offset to the earliest offset
latest: automatically reset the offset to the latest offset
none: throw exception to the consumer if no previous offset is found for the consumer's group
anything else: throw exception to the consumer.
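In other words, the broker retains messages according to the topic's retention settings whether or not any consumer is connected; a new consumer group just needs earliest as its reset policy to pick them up. A minimal Java sketch (the group name is a placeholder for a group with no committed offsets yet):
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "my-new-group"); // placeholder: must have no committed offsets
props.put("auto.offset.reset", "earliest"); // start from the oldest retained message
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("MyTopic"));
// picks up records that were produced before this consumer ever started
ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofSeconds(5));
System.out.println("retained records fetched: " + records.count());
consumer.close();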

Consume live messages in Kafka

I have started my zookeeper and Kafka server.
I started my Kafka producer which sends 10 messages with topic 'xxx'. Then stopped my Kafka producer.
Now I started my Kafka consumer and subscribed with topic 'xxx'. My consumer consumes those 10 messages sent by my Kafka producer, which is not running now.
I need my Kafka consumer to consume only messages that are produced while it is running, not the older ones.
Is there any way to achieve this?
These are my consumer properties:
props.put("bootstrap.servers", "localhost:9092");
String consumeGroup = "cg1";
props.put("group.id", consumeGroup);
props.put("enable.auto.commit", "true");
props.put("auto.offset.reset", "earliest");
props.put("auto.commit.interval.ms", "100");
props.put("heartbeat.interval.ms", "3000");
props.put("session.timeout.ms", "30000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
Set the following property:
consumerProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
It tells the consumer to read only the latest messages, that is, the messages published after the consumer started. Note that this reset policy only applies when the group has no committed offset yet; with a committed offset, the consumer resumes from there regardless.
Please create a new topic and keep the property ConsumerConfig.AUTO_OFFSET_RESET_CONFIG at "latest". Make sure not to commit the offset,
i.e. we should not use commitSync().
By default, consumers start reading records from the last committed offset of each assigned partition. If no committed offset is available, the offset reset strategy configured via ConsumerConfig#AUTO_OFFSET_RESET_CONFIG is used to set the start offset to the earliest or latest offset on the partition.
I think in your case you are committing the offset, or there is already a committed offset available for the given topic.
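If you want to be sure of skipping everything already in the topic even when a committed offset exists, an alternative (a sketch, not taken from the answers above) is to seek to the end explicitly once partitions are assigned:
consumer.subscribe(Collections.singletonList("xxx"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) { }
    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // jump past all retained records so only messages produced from now on are read
        consumer.seekToEnd(partitions);
    }
});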

Specify number of partitions on Kafka producer

I have the following code from the web page https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+Producer+Example
What seems to be missing is how to configure the number of partitions. I want to specify 4 partitions but always end up with the default of 2. How do I change the code to get 4 partitions (without changing the default)?
Properties props = new Properties();
props.put("metadata.broker.list", "localhost:9092,broker2:9092");
props.put("serializer.class", "kafka.serializer.StringEncoder");
props.put("partitioner.class", "com.gnip.kafka.SimplePartitioner");
props.put("request.required.acks", "1");
props.put("num.partitions", 4);
ProducerConfig config = new ProducerConfig(props);
Producer<String, String> producer = new Producer<String, String>(config);
Random rnd = new Random();
for (long nEvents = 0; nEvents < 1000; nEvents++) {
    long runtime = new Date().getTime();
    String ip = "192.168.2." + rnd.nextInt(255);
    String msg = runtime + ",www.example.com," + ip;
    KeyedMessage<String, String> data = new KeyedMessage<String, String>("page_visits2", ip, msg);
    producer.send(data);
}
producer.close();
The Kafka producer API does not let you set the number of partitions. If you produce to a topic that does not exist, the broker first creates it (provided auto.create.topics.enable is set to true in the broker config) and then starts publishing to it, but the number of partitions created for the topic is taken from the num.partitions parameter in the broker configuration files (one by default).
Increasing the partition count of an existing topic is possible, but it will not move any existing data into the new partitions.
To create a topic with a different number of partitions you need to create the topic first, which can be done with the console script shipped with the Kafka distribution. The following command creates a topic with 2 partitions (as specified by the --partition flag):
bin/kafka-create-topic.sh --zookeeper localhost:2181 --replica 3 --partition 2 --topic my-custom-topic
Unfortunately, as far as I understand, there is currently no direct alternative to achieve this.
The number of partitions is a broker property, and setting it on the producer has no effect; see here.
As the producer example page shows, you can use a custom partitioner to route messages as you prefer, but new partitions will not be created unless defined in the broker properties.
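For what it's worth, on newer Kafka versions (0.11 and later, so not the 0.8 API in the question) the partition count can also be set programmatically at topic creation time with the AdminClient; a minimal sketch creating the topic with 4 partitions:
Properties adminProps = new Properties();
adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
try (AdminClient admin = AdminClient.create(adminProps)) {
    // 4 partitions, replication factor 3 as in the console example above
    NewTopic topic = new NewTopic("page_visits2", 4, (short) 3);
    // get() throws checked exceptions; declare or handle them in real code
    admin.createTopics(Collections.singleton(topic)).all().get();
}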
