Consume live messages in Kafka - java

I have started my ZooKeeper and Kafka server.
I started my Kafka producer, which sent 10 messages to topic 'xxx', and then stopped it.
Then I started my Kafka consumer and subscribed it to topic 'xxx'. The consumer consumes those 10 messages, even though the producer that sent them is no longer running.
I need my Kafka consumer to consume only live messages, that is, messages sent while the consumer is running.
Is there any way to achieve this?
My consumer properties are as follows:
props.put("bootstrap.servers", "localhost:9092");
String consumeGroup = "cg1";
props.put("group.id", consumeGroup);
props.put("enable.auto.commit", "true");
props.put("auto.offset.reset", "earliest");
props.put("auto.commit.interval.ms", "100");
props.put("heartbeat.interval.ms", "3000");
props.put("session.timeout.ms", "30000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");

Set the following property:
consumerProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
It tells the consumer to start from the latest offset, so it reads only the messages published after the consumer started. Note that this setting applies only when the consumer group has no committed offset for the partition.

Create a new topic and set the property ConsumerConfig.AUTO_OFFSET_RESET_CONFIG to "latest". Make sure not to commit the offset,
i.e. do not call commitSync(), and keep in mind that enable.auto.commit set to "true" also commits offsets automatically.
By default, receivers start consuming records from the last committed offset of each assigned partition. If a committed offset is not available, the offset reset strategy ConsumerConfig#AUTO_OFFSET_RESET_CONFIG configured for the KafkaConsumer is used to set the start offset to the earliest or latest offset on the partition.
I think that in your case you are committing the offset, or a committed offset is already available for the given topic.
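One way to apply the advice above is sketched below, assuming the consumer joins under a fresh group id on every start, so that no committed offset exists and auto.offset.reset=latest actually takes effect. The class name LiveOnlyConsumerConfig and the "cg1-" prefix are illustrative and not part of Kafka; only the property keys are real consumer settings.

```java
import java.util.Properties;
import java.util.UUID;

// Hypothetical helper (not part of Kafka): builds consumer properties for
// reading only messages published after the consumer starts.
class LiveOnlyConsumerConfig {

    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // Assumption: a unique group id per run, so the group has no committed offsets.
        props.put("group.id", "cg1-" + UUID.randomUUID());
        // With no committed offset, start from the end of each partition.
        props.put("auto.offset.reset", "latest");
        // Do not commit offsets automatically, as the answer above suggests.
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("localhost:9092");
        System.out.println("group.id = " + props.getProperty("group.id"));
    }
}
```

The resulting Properties object can be passed to the KafkaConsumer constructor. The trade-off is that messages sent while the consumer is down are skipped for good, which matches the "live only" requirement but may not suit other use cases.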

Related

How to increase request time for kafka consumer

I am using Kafka with a Spring listener. The relevant piece of code is below.
In the past we published more than 100k messages to the test topic and the system seemed to be working fine.
A few days ago, however, I changed the groupId of the consumer. The new consumer then tried to process all the messages from the start, which takes a lot of time. After a while (about 10 seconds), the broker kicks the consumer out.
As a result, no Kafka listener remains registered to receive messages.
@KafkaListener(
topicPattern = "test",
groupId = "test",
id = "test",
containerFactory = "testKafkaListenerContainerFactory")
public void consume(@Payload String payload) throws IOException {
}
Kafka Consumer configuration:
Map<String, Object> props = new HashMap<>();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
props.put("security.protocol", "SSL");
I then used the CLI to read messages with the following command and observed the same behavior: after exactly 10 seconds the consumer stops reading messages from Kafka.
./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test
How can I increase the request timeout for the Kafka client, or is there a better approach to solve this issue?
"In the past we have published more than 1 lakh (100k) messages to the test topic and the system seemed to be working fine."
Does your Kafka still have the required data? Kafka does not store messages forever. How long messages are kept is governed by the retention configured on the broker(s). If your data is older than the retention period, it will have been deleted.
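As a rough illustration of the retention rule described above, the sketch below checks whether a record of a given age still falls inside a time-based retention window. RetentionCheck and its method are hypothetical names, not Kafka API. Real brokers delete whole log segments rather than individual records, so actual deletion can lag behind what this model suggests. The seven-day window used in main matches the broker default of log.retention.hours=168.

```java
import java.time.Duration;

// Simplified model of time-based retention; not Kafka's implementation.
class RetentionCheck {

    // Returns true if a record written at recordTimestampMs is still
    // inside the retention window when observed at nowMs.
    static boolean isStillRetained(long recordTimestampMs, long nowMs, Duration retention) {
        return nowMs - recordTimestampMs <= retention.toMillis();
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        Duration retention = Duration.ofHours(168); // broker default, 7 days
        long eightDaysAgo = now - Duration.ofDays(8).toMillis();
        System.out.println(isStillRetained(eightDaysAgo, now, retention));
    }
}
```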

Why do I get high latency (RTT) with a small number of Kafka producers and low latency with a large number of Kafka producers?

I run 3 Kafka brokers and Spark (master = local[8]) on one PC, where I wrote Java code that subscribes to a Kafka topic "A", processes the messages, and publishes to a Kafka topic "B". On another PC I run 4 Kafka producers that publish to topic "A" and one Kafka consumer, written in Java, that subscribes to topic "B".
I calculate the RTT in the Kafka consumer code.
The problem is that when I increase the number of Kafka producers, I get lower RTTs.
I tried increasing and decreasing batch.size, but it made no difference.
Kafka producer configuration:
//If acks=0, then the producer will not wait for any acknowledgment from the server at all.
props.put("acks", "0");
props.put("client.id", "vehicleProducer");
props.put("retries", 0);
props.put("batch.size", 50);
props.put("linger.ms", 0);
props.put("buffer.memory", 1024);
props.put("key.serializer", "org.apache.kafka.common.serialization.IntegerSerializer");
props.put("value.serializer", "com.iov.safety.fullvehicleproducer.CarFullDataSerializer");
Kafka Consumer configuration:
props.put("group.id", "test");
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("session.timeout.ms", "30000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
Kafka Spark Consumer configuration:
kafkaParams.put("bootstrap.servers", BOOTSTRAP_SERVERS_RSU2_EDUROAM);
kafkaParams.put("key.deserializer", StringDeserializer.class);
kafkaParams.put("value.deserializer", CarDataDeserializer.class);
kafkaParams.put("group.id", "test-consumer-group");
kafkaParams.put("auto.offset.reset", "latest");
kafkaParams.put("enable.auto.commit", false);
I expected that increasing the number of producers would increase the RTT measured on the Kafka consumer side.

Unable to publish message from KafkaProducer

I am unable to send a message from a Kafka producer. My configuration, which is not working, looks like this:
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "127.0.0.1:9092");
properties.setProperty("key.serializer", StringSerializer.class.getName());
properties.setProperty("value.serializer", StringSerializer.class.getName());
properties.setProperty("acks", "1");
properties.setProperty("retries", "3");
properties.setProperty("linger.ms", "1");
Producer<String, String> producer =
new org.apache.kafka.clients.producer.KafkaProducer<String, String>(properties);
ProducerRecord<String, String> producerRecord =
new ProducerRecord<String, String>("second_topic", "3", "messagtest");
Future<RecordMetadata> s = producer.send(producerRecord);
producer.flush();
producer.close();
Here is the error I get after calling s.get():
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for second_topic-0: 30021 ms has passed since batch creation plus linger time
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:94)
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:64)
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:29)
at ai.sys.producer.Test.main(Test.java:33)
Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for second_topic-0: 30021 ms has passed since batch creation plus linger time
Batching is enabled by default in the Kafka producer, with a batch size of 16 KB (16384 bytes). In your code, however, you send only one record, which might not fill a batch.
To get your code working, try setting the "batch.size" property to "0" in the Kafka producer properties:
properties.setProperty("batch.size", "0");
This disables the batching mechanism and lets your producer write records to the Kafka broker.
Note: in production, disabling batching increases the number of write requests to the broker and reduces the I/O throughput and performance of both the producer and the server.
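To make the interaction between batch.size and linger.ms more concrete, here is a simplified model, assuming a batch becomes eligible to send once it is full or once it has waited for linger.ms. This is a sketch of the rule, not Kafka's implementation: BatchReadiness is a hypothetical name, and other triggers such as flush() are ignored.

```java
// Simplified model of when a producer batch may be sent; not Kafka's code.
class BatchReadiness {

    // batchBytes:     bytes currently accumulated in the batch
    // batchSizeBytes: the configured batch.size
    // waitedMs:       time the batch has been waiting since creation
    // lingerMs:       the configured linger.ms
    static boolean isReadyToSend(int batchBytes, int batchSizeBytes,
                                 long waitedMs, long lingerMs) {
        boolean full = batchBytes >= batchSizeBytes;
        boolean lingerExpired = waitedMs >= lingerMs;
        return full || lingerExpired;
    }

    public static void main(String[] args) {
        // One small record, default batch.size, linger.ms = 1, after 1 ms of waiting.
        System.out.println(isReadyToSend(10, 16384, 1, 1));
    }
}
```

Under this model, a batch.size of 0 makes every batch count as full immediately, which is why setting it to 0 effectively turns batching off.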

Kafka 0.10.2 consumers getting massive number of duplicates

I have a fairly simple Kafka setup - 1 producer, 1 topic, 10 partitions, 10 KafkaConsumers all with the same group ID, all running on a single machine. When I process a file, the producer quickly creates 3269 messages, which the consumers happily start consuming. Everything runs fine for a while, but at a certain point the consumers start consuming duplicates - LOTS of duplicates. In fact, it looks like they just start consuming the message queue over again. If I let it run for a long time, the database will start receiving the same data entries 6 or more times. After doing some tests with logging, it looks like the consumers are re-consuming the same messages with the same unique message names.
As far as I can tell, no re-balancing is happening. Consumers are not dying or being added. It's the same 10 consumers, consuming the same 3269 messages over and over until I kill the process. If I just let it go, the consumers will write tens of thousands of records, far more data than should really be going into the database.
I'm fairly new to Kafka, but I'm kind of at a loss for why this is happening. I know Kafka doesn't guarantee exactly-once processing, and I'm ok with a couple duplicates here and there. I have code to prevent persisting the same records again. However, I'm not sure why the consumers would re-consume the queue over and over. I know that Kafka messages aren't deleted after they are consumed, but if all the consumers are in the same group, the offsets should prevent this, right? I understand a little bit about how offsets work, but as far as I know, they shouldn't be getting reset if there is no re-balancing, right? And the messages aren't timing out as far as I can tell. Is there a way for me to get my consumers to consume everything in the queue once-ish and then wait for more messages without re-consuming the same stuff forever?
Here are the properties I pass in to the producer and consumers:
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "all");
props.put("retries", 0);
props.put("batch.size", 16384);
props.put("linger.ms", 1);
props.put("buffer.memory", 33554432);
props.put("group.id", "MyGroup");
props.put("num.partitions", 10);
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
MyIngester ingester = new MyIngester(args[0], props);
To me this looks like an issue with acknowledging receipt, i.e. with committing offsets.
Try the following properties:
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "100");
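A sketch of how the suggested properties could sit in a consumer-only configuration, assuming the producer settings from the question (acks, retries, batch.size, linger.ms, buffer.memory and the serializers) are moved into a separate Properties object for the producer. num.partitions is a broker setting rather than a client setting, so it is left out. The class name IngestConsumerConfig is hypothetical.

```java
import java.util.Properties;

// Hypothetical helper: consumer-side properties only, with auto-commit enabled.
class IngestConsumerConfig {

    static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "MyGroup");
        // Commit consumed offsets automatically every 100 ms.
        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "100");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        build().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```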

Can a Kafka broker retain messages while there are no consumers connected?

I am trying to build a pub/sub application and I am exploring the best tools out there. I am currently looking at Kafka and have a little demo app already running. However, I am running into a conceptual issue.
I have a producer (Java code):
String topicName = "MyTopic";
String key = "MyKey";
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092,localhost:9093");
props.put("acks", "all");
props.put("key.serializer","org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
Producer<String, byte[]> producer = new KafkaProducer <String, byte[]>(props);
byte[] data = <FROM ELSEWHERE>;
ProducerRecord<String, byte[]> record = new ProducerRecord<String, byte[]>(topicName, key, data);
try {
RecordMetadata result = producer.send(record).get();
}
catch (Exception e) {
// Nothing for now
}
producer.close();
When I start a consumer via the Kafka command line tools:
kafka-console-consumer --bootstrap-server localhost:9092 --topic MyTopic
and then I execute the producer code, I see the data message show up on my consumer terminal.
However, if I do not run the consumer prior to executing the producer, the message appears "lost". When I start the consumer (after executing the producer), nothing appears in the consumer terminal.
Does anyone know if it's possible to have the Kafka broker retain messages while there are no consumers connected? If so, how?
Append --from-beginning to the console consumer command to have it start consuming from the earliest offset. This is actually about the offset reset strategy which is controlled by config auto.offset.reset. Here is what this config means:
What to do when there is no initial offset in Kafka or if the current offset does not exist any more on the server (e.g. because that data has been deleted):
earliest: automatically reset the offset to the earliest offset
latest: automatically reset the offset to the latest offset
none: throw exception to the consumer if no previous offset is found for the consumer's group
anything else: throw exception to the consumer.
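The rules quoted above can be sketched as a small model of how a consumer picks its starting offset. This is an illustration under simplifying assumptions, not Kafka's code: StartOffsetResolver and ResetPolicy are hypothetical names, "earliest" stands for the first offset still available on the partition, and "latest" stands for the offset the next produced message would get.

```java
import java.util.OptionalLong;

// Simplified model of start-offset selection; not Kafka's implementation.
class StartOffsetResolver {

    enum ResetPolicy { EARLIEST, LATEST, NONE }

    static long resolve(OptionalLong committed, long earliest, long latest,
                        ResetPolicy policy) {
        // A committed offset that still exists on the server always wins.
        if (committed.isPresent()
                && committed.getAsLong() >= earliest
                && committed.getAsLong() <= latest) {
            return committed.getAsLong();
        }
        // Otherwise fall back to the auto.offset.reset policy.
        switch (policy) {
            case EARLIEST:
                return earliest;
            case LATEST:
                return latest;
            default:
                throw new IllegalStateException(
                        "no valid offset and auto.offset.reset=none");
        }
    }

    public static void main(String[] args) {
        // No committed offset, 10 messages already in the partition (offsets 0..9).
        System.out.println(resolve(OptionalLong.empty(), 0, 10, ResetPolicy.LATEST));
        System.out.println(resolve(OptionalLong.empty(), 0, 10, ResetPolicy.EARLIEST));
    }
}
```

In this model, a consumer with no committed offset and the latest policy starts after all existing messages, which is why messages produced before it connected seem lost even though the broker still holds them.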
