How to find the first message which is not deleted in a Kafka topic - java

I have a topic with some messages in it. Part of the messages was deleted after the retention time expired, including the message my stored offset points to and many messages after it.
How do I find the oldest message still existing in the topic? I heard there is a method for this in the API, but I can't find it. If someone knows how to do it, please help; I've already combed through the documentation without luck. Thanks in advance.

You can use kafka-console-consumer with --max-messages 1 and --from-beginning in order to fetch the oldest message:
kafka-console-consumer --bootstrap-server localhost:9092 --topic topic_name --from-beginning --max-messages 1

You can use the beginningOffsets() method from the Consumer API to find the earliest offset still available.
For example:
Properties configs = new Properties();
configs.put("bootstrap.servers", "localhost:9092");
configs.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
configs.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

// The partition to inspect; matches the example output below
TopicPartition tp = new TopicPartition("offset-test", 0);

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(configs)) {
    Map<TopicPartition, Long> offsets = consumer.beginningOffsets(Collections.singletonList(tp));
    System.out.println(offsets);
}
That will print something like:
{offset-test-0=0}
In this example, offset 0 is the oldest offset available.
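If you then need to actually read that oldest surviving message, a minimal sketch (same "offset-test" topic and partition 0 as above; the Duration-based poll assumes kafka-clients 2.0+) is to assign the partition, seek to the beginning, and poll once:

TopicPartition tp = new TopicPartition("offset-test", 0);
try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(configs)) {
    consumer.assign(Collections.singletonList(tp));
    // Move the position to the earliest offset that still exists
    consumer.seekToBeginning(Collections.singletonList(tp));
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
    if (!records.isEmpty()) {
        ConsumerRecord<String, String> first = records.iterator().next();
        System.out.printf("oldest offset=%d value=%s%n", first.offset(), first.value());
    }
}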

Related

How to increase request timeout for a Kafka consumer

I am using Kafka with a Spring listener. Here is the relevant piece of code.
In the past we have published more than 100k messages to the test topic and the system seemed to be working fine.
But a few days back I changed the groupId of the consumer. After that, the new consumer tried to process all the messages from the start, which takes a lot of time. But after some time, maybe 10 seconds, the broker kicks the consumer out.
As a result, no Kafka listener remains registered to consume messages.
@KafkaListener(
        topicPattern = "test",
        groupId = "test",
        id = "test",
        containerFactory = "testKafkaListenerContainerFactory")
public void consume(@Payload String payload) throws IOException {
}
Kafka Consumer configuration:
Map<String, Object> props = new HashMap<>();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
props.put("security.protocol", "SSL");
Then I used the CLI to read messages with the following command and observed the same behavior: after exactly 10 seconds the consumer stops reading messages from Kafka.
./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test
How can I increase the request timeout for the Kafka client, or is there a better approach to solving this issue?
Does your Kafka still have the required data? Kafka does not store messages forever; how long they live is governed by the retention settings on the broker(s). If your data is older than the retention period, it is lost.
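If retention is not the problem and the broker really is kicking the consumer out while it churns through the backlog, the usual knobs are max.poll.interval.ms and max.poll.records. A hedged sketch, extending the consumer configuration from the question (the values are illustrative, not tuned):

props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 300000); // allow up to 5 minutes between polls
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 100);        // smaller batches finish faster per poll
props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 30000);    // heartbeat session timeout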

Kafka command-line consumer reads messages, but I cannot read them through Java

I have manually created topic test with this command:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
and using this command:
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
I inserted these records:
This is a message
This is another message
This is a message2
First, I consume messages through the command line like this:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
and all the records are successfully shown. Then, I try to implement a consumer in Java using this code:
public class KafkaSubscriber {

    public void consume() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "test-consumer-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        Consumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("test"));
        // also with this command
        // consumer.subscribe(Arrays.asList("test"));
        System.out.println("Starting to read data...");
        try {
            while (true) {
                try {
                    ConsumerRecords<String, String> records = consumer.poll(100);
                    System.out.println("Number of records found: " + records.count());
                    for (ConsumerRecord<String, String> rec : records) {
                        System.out.println(rec.value());
                    }
                } catch (Exception ex) {
                    ex.printStackTrace();
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            consumer.close();
        }
    }
}
But the output is:
Starting to read data...
0
0
0
0
0
....
This means it does not find any records in the topic test. I also tried to publish some records after the Java consumer had started, but the same thing happened. Any ideas what might be going wrong?
EDIT: After adding the following line:
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
the consumer now reads only when I write new records to the topic. It does not read all the records from the beginning.
By default, if no offsets have previously been committed for the group, the consumer starts at the end of the topics.
Hence, if you run it after having produced records, it won't receive them.
Notice in your kafka-console-consumer.sh, you have the --from-beginning flag which forces the consumer to instead start from the beginning of the topic.
One workaround, as suggested in a comment, is to set ConsumerConfig.AUTO_OFFSET_RESET_CONFIG to earliest. However, I'd be careful with that setting, as your consumer will consume from the start of the topics, and that could be a lot of data in a real use case.
The easiest solution: now that you've run your consumer once and it has created a group, simply rerun the producer. When you run the consumer again, it will pick up from its last position, which is before the new producer messages.
On the other hand, if you mean to always reconsume all messages, you have two options (see the sketch after this list):
explicitly call seekToBeginning() when your consumer starts, to move its position to the start of the topics
set auto.offset.reset to earliest and disable auto offset commit by setting enable.auto.commit to false
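For the first option, a minimal sketch that reuses the consumer and the "test" topic from the question (a rebalance listener is needed because partitions are only assigned after the first poll; imports for ConsumerRebalanceListener, TopicPartition, and java.util.Collection assumed):

consumer.subscribe(Collections.singletonList("test"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // nothing to do here in this sketch
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // Rewind every newly assigned partition to its earliest offset
        consumer.seekToBeginning(partitions);
    }
});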

Why is there a lag when consuming from Apache Kafka using Java KafkaConsumer

I hope someone can shed some light on my issue.
I'm making a series of REST calls to a micro-service I have built. When the service receives the calls, it persists some data to a database and then publishes a message to a Kafka topic.
I'm trying to write a test that makes the REST calls and then consumes the messages from the Kafka topic.
When my test tries to consume from Kafka, it doesn't appear to consume the latest messages.
I've used the kafka-consumer-groups.sh shell script which ships with Kafka to describe the state of my consumer. Here is what it looks like:
bash-4.3# ./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group test
Note: This will only show information about consumers that use the Java consumer API (non-ZooKeeper-based consumers).
Consumer group 'test' has no active members.
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
test_topic 0 42 43 1 - - -
Note the lag of 1. This is what appears to be my issue.
Here is my Kafka consumer code:
public void consumeMessages() {
    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "test");
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

    try (KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(props)) {
        kafkaConsumer.subscribe(Collections.singletonList("test_topic"));
        ConsumerRecords<String, String> consumerRecords = kafkaConsumer.poll(5000);
        for (ConsumerRecord<String, String> consumerRecord : consumerRecords) {
            System.out.printf("offset = %d, message = %s%n", consumerRecord.offset(), consumerRecord.value());
        }
    }
}
Any help will be greatly appreciated.
Thanks,
Ben :)

Can a Kafka broker retain messages while there are no consumers connected?

I am trying to build a pub/sub application and I am exploring the best tools out there. I am currently looking at Kafka and have a little demo app already running. However, I am running into a conceptual issue.
I have a producer (Java code):
String topicName = "MyTopic";
String key = "MyKey";

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092,localhost:9093");
props.put("acks", "all");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

Producer<String, byte[]> producer = new KafkaProducer<String, byte[]>(props);

byte[] data = <FROM ELSEWHERE>;
ProducerRecord<String, byte[]> record = new ProducerRecord<String, byte[]>(topicName, key, data);
try {
    RecordMetadata result = producer.send(record).get();
} catch (Exception e) {
    // Nothing for now
}
producer.close();
When I start a consumer via the Kafka command line tools:
kafka-console-consumer --bootstrap-server localhost:9092 --topic MyTopic
and then execute the producer code, I see the data message show up in my consumer terminal.
However, if I do not run the consumer prior to executing the producer, the message appears to be "lost": when I start the consumer after executing the producer, nothing appears in the consumer terminal.
Does anyone know if it's possible to have the Kafka broker retain messages while there are no consumers connected? If so, how?
Append --from-beginning to the console consumer command to have it start consuming from the earliest offset. This is actually about the offset reset strategy which is controlled by config auto.offset.reset. Here is what this config means:
What to do when there is no initial offset in Kafka or if the current offset does not exist any more on the server (e.g. because that data has been deleted):
earliest: automatically reset the offset to the earliest offset
latest: automatically reset the offset to the latest offset
none: throw exception to the consumer if no previous offset is found for the consumer's group
anything else: throw exception to the consumer.
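The Java equivalent of --from-beginning for a brand-new consumer group is setting auto.offset.reset to earliest. A minimal sketch matching the producer above (the group id "demo-group" is an arbitrary choice here; kafka-clients 2.0+ assumed for the Duration-based poll):

// needs: java.time.Duration, java.util.Collections, java.util.Properties,
// org.apache.kafka.clients.consumer.{KafkaConsumer, ConsumerRecords}
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092,localhost:9093");
props.put("group.id", "demo-group");
props.put("auto.offset.reset", "earliest"); // start from the oldest retained message
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(Collections.singletonList("MyTopic"));
    ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofSeconds(5));
    records.forEach(r -> System.out.printf("offset=%d key=%s%n", r.offset(), r.key()));
}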

KafkaProducer not sending Record

I'm completely new to Kafka and I have some trouble using the KafkaProducer.
The send method of the producer blocks for exactly one minute and then the application proceeds without an exception. This is obviously some timeout, but no exception is thrown.
I can also see nothing useful in the logs.
The servers seem to be set up correctly: if I use the bin/kafka-console-consumer and producer applications, I can send and receive messages correctly. The code also seems to work to some extent.
If I write to a topic which does not exist yet, I can see the new entry in the /tmp/kafka-logs folder and also in the console output of the KafkaServer.
Here is the code I use:
Properties props = ResourceUtils.loadProperties("kafka.properties");
Producer<String, String> producer = new KafkaProducer<>(props);

for (String line : lines) {
    producer.send(new ProducerRecord<>("topic", Id, line));
    producer.flush();
}
producer.close();
The properties in the kafka.properties file:
bootstrap.servers=localhost:9092
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=org.apache.kafka.common.serialization.StringSerializer
acks=all
retries=0
batch.size=16384
linger.ms=1
buffer.memory=33554432
So, producer.send blocks for one minute and then continues. In the end nothing is stored in Kafka, but the new topic is created.
Thank you for any help!
Try setting bootstrap.servers to 127.0.0.1:9092.
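For what it's worth, the one-minute block matches the producer's default max.block.ms of 60000 ms, which usually means the client cannot fetch metadata from the broker. To surface the underlying error instead of timing out silently, a hedged sketch: pass a callback to send() (reusing the loop variables from the question):

producer.send(new ProducerRecord<>("topic", Id, line), (metadata, exception) -> {
    if (exception != null) {
        // Surfaces errors (e.g. a TimeoutException while fetching metadata)
        // that fire-and-forget sends would otherwise swallow
        exception.printStackTrace();
    } else {
        System.out.printf("stored at partition=%d offset=%d%n",
                metadata.partition(), metadata.offset());
    }
});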
