I have manually created topic test with this command:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
and using this command:
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
I inserted these records:
This is a message
This is another message
This is a message2
First, I consume messages through the command line like this:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
and all the records are successfully shown. Then, I try to implement a consumer in Java using this code:
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class KafkaSubscriber {

    public void consume() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "test-consumer-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");

        Consumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("test"));
        // also tried:
        // consumer.subscribe(Arrays.asList("test"));

        System.out.println("Starting to read data...");
        try {
            while (true) {
                try {
                    ConsumerRecords<String, String> records = consumer.poll(100);
                    System.out.println("Number of records found: " + records.count());
                    for (ConsumerRecord<String, String> rec : records) {
                        System.out.println(rec.value());
                    }
                } catch (Exception ex) {
                    ex.printStackTrace();
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            consumer.close();
        }
    }
}
But the output is:
Starting to read data...
Number of records found: 0
Number of records found: 0
Number of records found: 0
....
Which means that it does not find any records in topic test. I also tried to publish some records after the Java consumer has started, but the same again. Any ideas what might be going wrong?
EDIT: After adding the following line:
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
the consumer now only reads records that are written after it has started. It does not read all the records from the beginning.
By default, if no offsets have previously been committed for the group, the consumer starts at the end of the topic.
Hence if you are running it after having produced records, it won't receive them.
Notice that in your kafka-console-consumer.sh command you have the --from-beginning flag, which instead forces the consumer to start from the beginning of the topic.
One workaround, as suggested in a comment, is to set ConsumerConfig.AUTO_OFFSET_RESET_CONFIG to earliest. However, I'd be careful with that setting, as your consumer will then consume from the start of the topic, which could be a lot of data in a real use case.
The easiest solution: now that you've run your consumer once and it has created a group, simply rerun the producer. When you then run the consumer again, it will pick up from its last committed position, which is before the newly produced messages, so it will receive them.
On the other hand, if you mean to always reconsume all messages, then you have 2 options (both sketched after this list):
explicitly call seekToBeginning() when your consumer starts, to move its position to the start of the topic
set auto.offset.reset to earliest and disable auto offset commit by setting enable.auto.commit to false
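For example, here is a minimal sketch of both options, reusing the props, the consumer and the test topic from the question (illustrative only, not your exact code):
// Option 1: explicitly rewind to the beginning whenever partitions are assigned
consumer.subscribe(Collections.singletonList("test"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // nothing needs to happen before the rebalance in this sketch
    }
    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // seekToBeginning() only takes effect once partitions are assigned, hence the callback
        consumer.seekToBeginning(partitions);
    }
});
// Option 2: start from the earliest offset and never commit, so every run re-reads everything
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
(Collection comes from java.util; ConsumerRebalanceListener and TopicPartition come from the org.apache.kafka packages.)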
Related
I am using Kafka with a Spring listener. Following is a piece of the code.
In the past we have published more than 100k messages to the test topic and the system seemed to be working fine.
But a few days back I changed the groupId of the consumer. After that, this new consumer tried to process all the messages from the start, which takes a lot of time, and after some time (maybe 10 seconds) the broker kicks the consumer out.
As a result, no Kafka listener is registered to consume messages.
@KafkaListener(
        topicPattern = "test",
        groupId = "test",
        id = "test",
        containerFactory = "testKafkaListenerContainerFactory")
public void consume(@Payload String payload) throws IOException {
}
Kafka Consumer configuration:
Map<String, Object> props = new HashMap<>();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
props.put("security.protocol", "SSL");
Then I used the CLI to read messages with the following command and observed the same behavior: after exactly 10 seconds the consumer stops reading messages from Kafka.
./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test
How can I increase the request timeout for the Kafka client, or is there a better approach to solve this issue?
In the past we have published more than 1 lakh (100k) messages to the test topic and the system seemed to be working fine.
Does your Kafka topic still have the required data? Kafka does not store messages forever. How long messages are kept is governed by the retention settings on the broker(s). If your data is older than the retention period, it will have been deleted.
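If you want to verify this from Java, one way is to read the topic's retention.ms config through the AdminClient. This is a minimal sketch, assuming the same localhost:9092 broker and the test topic from the question:
public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    try (AdminClient admin = AdminClient.create(props)) {
        // describe the topic's configuration and print how long records are retained
        ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "test");
        Config config = admin.describeConfigs(Collections.singleton(topic)).all().get().get(topic);
        System.out.println("retention.ms = " + config.get("retention.ms").value());
    }
}
If the old records are past that retention period, re-reading the topic from the start simply won't find them anymore.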
I have a topic and there are some messages in it. Part of the messages was deleted after the retention time expired, including the message that my committed offset points to and many messages after it.
How can I find the oldest existing message in the topic? I heard there is some method in the API, but I can't find it. If someone knows how to do it, please help, because I've already spent a long time searching the documentation for it. Thanks in advance.
You can use kafka-console-consumer with --max-messages 1 and --from-beginning in order to fetch the oldest message:
kafka-console-consumer --bootstrap-server localhost:9092 --topic topic_name --from-beginning --max-messages 1
You can use the beginningOffsets() method from the Consumer API to find the earliest offset still available for a partition, i.e. the position of the oldest message.
For example:
Properties configs = new Properties();
configs.put("bootstrap.servers", "localhost:9092");
configs.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
configs.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

// the topic-partition to inspect
TopicPartition TP = new TopicPartition("offset-test", 0);

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(configs)) {
    Map<TopicPartition, Long> offsets = consumer.beginningOffsets(Collections.singletonList(TP));
    System.out.println(offsets);
}
That will print something like:
{offset-test-0=0}
In this example, offset 0 is the oldest offset available.
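If you want the oldest message itself and not only its offset, a hedged follow-up inside the same try block is to assign the partition, seek to its beginning and poll once; the first record returned for that partition is the oldest one still stored:
consumer.assign(Collections.singletonList(TP));
consumer.seekToBeginning(Collections.singletonList(TP));
ConsumerRecords<String, String> records = consumer.poll(5000);
for (ConsumerRecord<String, String> rec : records) {
    // records come back in offset order per partition, so this is the oldest surviving message
    System.out.println("offset=" + rec.offset() + " value=" + rec.value());
    break;
}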
I hope someone can shed some light on my issue.
I'm making a series of REST calls to a micro-service I have built. When the service receives the calls, it persists some data to a database and then publishes a message to a Kafka topic.
I'm trying to write a test that makes the REST calls and then consumes the messages from the Kafka topic.
When my test tries to consume from Kafka, it doesn't appear to be consuming the latest messages.
I've used the kafka-consumer-groups.sh shell script which ships with Kafka to describe the state of my consumer. Here is what it looks like:
bash-4.3# ./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group test
Note: This will only show information about consumers that use the Java consumer API (non-ZooKeeper-based consumers).
Consumer group 'test' has no active members.
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
test_topic 0 42 43 1 - - -
Note the lag of 1. This is what appears to be my issue.
Here is my Kafka consumer code:
public void consumeMessages() {
    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "test");
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

    try (KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(props)) {
        kafkaConsumer.subscribe(Collections.singletonList("test_topic"));
        ConsumerRecords<String, String> consumerRecords = kafkaConsumer.poll(5000);

        for (ConsumerRecord<String, String> consumerRecord : consumerRecords) {
            System.out.printf("offset = %d, message = %s%n", consumerRecord.offset(), consumerRecord.value());
        }
    }
}
Any help will be greatly received.
Thanks,
Ben :)
I am trying to build a pub/sub application and I am exploring the best tools out there. I am currently looking at Kafka and have a little demo app already running. However, I am running into a conceptual issue.
I have a producer (Java code):
String topicName = "MyTopic;
String key = "MyKey";
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092,localhost:9093");
props.put("acks", "all");
props.put("key.serializer","org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
Producer<String, byte[]> producer = new KafkaProducer<String, byte[]>(props);
byte[] data = <FROM ELSEWHERE>;
ProducerRecord<String, byte[]> record = new ProducerRecord<String, byte[]>(topicName, key, data);
try {
RecordMetadata result = producer.send(record).get();
}
catch (Exception e) {
// Nothing for now
}
producer.close();
When I start a consumer via the Kafka command line tools:
kafka-console-consumer --bootstrap-server localhost:9092 --topic MyTopic
and then I execute the producer code, I see the data message show up on my consumer terminal.
However, if I do not run the consumer prior to executing the producer, the message appears "lost". When I start the consumer (after executing the producer), nothing appears in the consumer terminal.
Does anyone know if it's possible to have the Kafka broker retain messages while there are no consumers connected? If so, how?
Append --from-beginning to the console consumer command (i.e. kafka-console-consumer --bootstrap-server localhost:9092 --topic MyTopic --from-beginning) to have it start consuming from the earliest offset. This is actually about the offset reset strategy, which is controlled by the auto.offset.reset config. Here is what this config means:
What to do when there is no initial offset in Kafka or if the current offset does not exist any more on the server (e.g. because that data has been deleted):
earliest: automatically reset the offset to the earliest offset
latest: automatically reset the offset to the latest offset
none: throw exception to the consumer if no previous offset is found for the consumer's group
anything else: throw exception to the consumer.
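The same idea applies to a Java consumer, which may fit your demo better than the console tool. Here is a minimal sketch matching the String key / byte[] value serializers of your producer; the group id is just an illustrative name:
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "my-demo-group");     // hypothetical group id, any new group works
props.put("auto.offset.reset", "earliest"); // a group with no committed offsets starts from the earliest retained message
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(Collections.singletonList("MyTopic"));
    ConsumerRecords<String, byte[]> records = consumer.poll(5000);
    for (ConsumerRecord<String, byte[]> rec : records) {
        System.out.println("key=" + rec.key() + ", " + rec.value().length + " bytes");
    }
}
The broker retains the messages whether or not any consumer is connected (subject to the topic's retention settings); it is the consumer's starting position that decides whether messages produced earlier are delivered.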
I have the following code from the web page https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+Producer+Example
What seems to be missing is how to configure the number of partitions. I want to specify 4 partitions but always end up with the default of 2 partitions. How do I change the code to get 4 partitions (without changing the broker default)?
Properties props = new Properties();
props.put("metadata.broker.list", "localhost:9092,broker2:9092");
props.put("serializer.class", "kafka.serializer.StringEncoder");
props.put("partitioner.class", "com.gnip.kafka.SimplePartitioner");
props.put("request.required.acks", "1");
props.put("num.partitions", 4);
ProducerConfig config = new ProducerConfig(props);
Producer<String, String> producer = new Producer<String, String>(config);
Random rnd = new Random();
for (long nEvents = 0; nEvents < 1000; nEvents++) {
long runtime = new Date().getTime();
String ip = "192.168.2." + rnd.nextInt(255);
String msg = runtime + ",www.example.com," + ip;
KeyedMessage<String, String> data = new KeyedMessage<String, String>("page_visits2", ip, msg);
producer.send(data);
}
producer.close();
The Kafka producer API does not allow you to set the number of partitions. If you produce data to a topic which does not exist, the broker will first create the topic (provided the auto.create.topics.enable property in the broker config is set to true) and then start accepting data for it, but the number of partitions created for this topic will be based on the num.partitions parameter defined in the broker configuration files (by default it is set to one).
Increasing partition count for an existing topic can be done, but it'll not move any existing data into those partitions.
To create a topic with a different number of partitions, you need to create the topic first, which can be done with the console script shipped with the Kafka distribution. The following command creates a topic with 2 partitions (as specified by the --partition flag):
bin/kafka-create-topic.sh --zookeeper localhost:2181 --replica 3 --partition 2 --topic my-custom-topic
Unfortunately, as far as I understand, there is currently no direct alternative to achieve this.
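For completeness: on much newer client versions (well after the 0.8.0 API shown in the question), the same "create the topic first with the partition count you want" step can also be done programmatically through the AdminClient. A hedged sketch, assuming a local broker; the topic name is taken from the question and the replication factor of 1 is just for a single-broker setup:
public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    try (AdminClient admin = AdminClient.create(props)) {
        // 4 partitions, replication factor 1
        NewTopic topic = new NewTopic("page_visits2", 4, (short) 1);
        admin.createTopics(Collections.singleton(topic)).all().get();
    }
}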
The number of partitions is a broker property and will not have any effect for the producer, see here.
As the producer example page shows, you can use a custom partitioner to route messages as you prefer but new partitions will not be created if not defined in the broker properties.