Unable to publish message from KafkaProducer - java

I am unable to send message from Kafka Producer. My configuration is not working and it looks like this:
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "127.0.0.1:9092");
properties.setProperty("key.serializer", StringSerializer.class.getName());
properties.setProperty("value.serializer", StringSerializer.class.getName());
properties.setProperty("acks", "1");
properties.setProperty("retries", "3");
properties.setProperty("linger.ms", "1");
Producer<String, String> producer =
new org.apache.kafka.clients.producer.KafkaProducer<String, String>(properties);
ProducerRecord<String, String> producerRecord =
new ProducerRecord<String, String>("second_topic", "3", "messagtest");
Future<RecordMetadata> s = producer.send(producerRecord);
producer.flush();
producer.close();
Here is error after i did s.get();
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for second_topic-0: 30021 ms has passed since batch creation plus linger time
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:94)
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:64)
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:29)
at ai.sys.producer.Test.main(Test.java:33)
Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for second_topic-0: 30021 ms has passed since batch creation plus linger time

Batching has been enabled in Kafka Producer by default with a size of 16K Bytes. However, in your code, you are sending only one record which might not satisfy the batch size.
Hence, for your code to work, try to add the following property of "batch.size" to value "0" to Kafka Producer properties.
properties.setProperty("batch.size", "0");
This would disable the batching mechanism and allow your producer to write records to the Kafka Broker.
Note: In real-time, disabling batching would increase the number of write requests to the broker and decrease the I/O Throughput and performance of both the producer and server.

Related

When Kafka connection is idle?

I faced an issue that when my Kafka consumers read some topic and it does not have messages for some time then Kafka thinks that connection is idle as I understood. I saw these trace logs:
About to close the idle connection from 2147482646 due to being idle for 540005 millis
Node 2147482646 disconnected.
But why it's idle? I have a default config connections.max.idle.ms: 9 mins, also heartbeat checks every 5 secs, session ms is 30 sec. As I understood maybe Kafka does not recognize heartbeats as "actions". Okay then we have regular poll() calls and as for me, it's an action even if I got 0 messages. But then I found an idea that connection is idle if nothing was written to the TCP socket for some period of time. So if my Kafka gets nothing within 9 mins (messages) then it's idle? Is it the truth? Or how then kafka recognizes that connection is idle?
UPD:
Poll is an action no matter how many messages you got. I ran kafka locally and started debugging with CONNECTIONS_MAX_IDLE_MS_CONFIG=5000 and I found out that my kafka consumer created 2 different connections and then one of them after some time was not used for 5 secs and it caused the closing of the connection. so why it happens that one connection was not used for a long time? Since I tried 10secs and the same. On my prod config is default 9 mins but it happens.
And what's interesting after I faced it 1 time then kafka recreates a connection and everything is fine. But maybe if its a group coordinator then it makes kafka rebalance and create all connections from scratch an issue appears again.
kafka-clients version 3.2.0. My code for testing:
Logger logger = LogManager.getLogger(Main.class);
logger.info("test");
String bootstrapServers = "127.0.0.1:9092";
String grp_id = "test";
String topic = "test";
Properties properties = new Properties();
properties.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
properties.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
properties.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
properties.setProperty(ConsumerConfig.GROUP_ID_CONFIG, grp_id);
properties.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
properties.setProperty(ConsumerConfig.CONNECTIONS_MAX_IDLE_MS_CONFIG, "5000");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);
consumer.subscribe(Collections.singletonList(topic));
while (true) {
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
for (ConsumerRecord<String, String> record : records) {
System.out.println("Key: " + record.key() + ", Value:" + record.value());
System.out.println("Partition:" + record.partition() + ",Offset:" + record.offset());
}
}

How to increase request time for kafka consumer

I am using Kafka with Spring Listener. Following is piece of code.
In the past we have published more than 100k messages to test topic and system seems to be working fine.
But few days back, I changed the groupId of consumer. After that this new consumer tried to process all the message from start and which takes lot of time. But after sometime may be (10 sec) broker kickoff the consumer.
so result no kafka register to listen message.
#KafkaListener(
topicPattern = "test",
groupId = "test",
id = "test",
containerFactory = "testKafkaListenerContainerFactory")
public void consume(#Payload String payload) throws IOException {
}
Kafka Consumer configuration:
Map<String, Object> props = new HashMap<>();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
props.put("security.protocol", "SSL");
Then I used cli to read message with following command and observed same behavior. After exactly 10 sec consumer stop reading message from kafka.
./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test
How to increase request time out for kafka client or some other better approach to solution this issue?
In the past we have published more than 1 lac message to test topic and system seems to be working fine.
Does your Kafka has the required data? Kafka does not store messages forever. The duration of messages is governed by the retention set in the broker/s. If your data is older than the retention period, it will be lost.

Kafka 0.10.2 consumers getting massive number of duplicates

I have a fairly simple Kafka setup - 1 producer, 1 topic, 10 partitions, 10 KafkaConsumers all with the same group ID, all running on a single machine. When I process a file, the producer quickly creates 3269 messages, which the consumers happily start consuming. Everything runs fine for a while, but at a certain point the consumers start consuming duplicates - LOTS of duplicates. In fact, it looks like they just start consuming the message queue over again. If I let it run for a long time, the database will start receiving the same data entries 6 or more times. After doing some tests with logging, it looks like the consumers are re-consuming the same messages with the same unique message names.
As far as I can tell, no re-balancing is happening. Consumers are not dying or being added. It's the same 10 consumers, consuming the same 3269 messages over and over until I kill the process. If I just let it go, the consumers will write dozens of thousands of records, massively increasing the amount of data that really should be going into the database.
I'm fairly new to Kafka, but I'm kind of at a loss for why this is happening. I know Kafka doesn't guarantee exactly-once processing, and I'm ok with a couple duplicates here and there. I have code to prevent persisting the same records again. However, I'm not sure why the consumers would re-consume the queue over and over. I know that Kafka messages aren't deleted after they are consumed, but if all the consumers are in the same group, the offsets should prevent this, right? I understand a little bit about how offsets work, but as far as I know, they shouldn't be getting reset if there is no re-balancing, right? And the messages aren't timing out as far as I can tell. Is there a way for me to get my consumers to consume everything in the queue once-ish and then wait for more messages without re-consuming the same stuff forever?
Here are the proprties I pass in to the producer and consumers:
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "all");
props.put("retries", 0);
props.put("batch.size", 16384);
props.put("linger.ms", 1);
props.put("buffer.memory", 33554432);
props.put("group.id", "MyGroup");
props.put("num.partitions", 10);
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
MyIngester ingester = new MyIngester(args[0], props);
To me this seems to be an issue with acknowledging the receipt.
Try the following properties
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "100");

Can a Kafka broker retain messages while there are no consumers connected?

I am trying to build a pub/sub application and I am exploring the best tools out there. I am currently looking at Kafka and have a little demo app already running. However, I am running into a conceptual issue.
I have a producer (Java code):
String topicName = "MyTopic;
String key = "MyKey";
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092,localhost:9093");
props.put("acks", "all");
props.put("key.serializer","org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
Producer<String, byte[]> producer = new KafkaProducer <String, byte[]>(props);
byte[] data = <FROM ELSEWHERE>;
ProducerRecord<String, byte[]> record = new ProducerRecord<String, byte[]>(topicName, key, data);
try {
RecordMetadata result = producer.send(record).get();
}
catch (Exception e) {
// Nothing for now
}
producer.close();
When I start a consumer via the Kakfa command line tools:
kafka-console-consumer --bootstrap-server localhost:9092 --topic MyTopic
and then I execute the producer code, I see the data message show up on my consumer terminal.
However, if I do not run the consumer prior executing the producer, the message appears "lost". When I start the consumer (after executing the producer), nothing appears in the consumer terminal.
Does anyone know if it's possible to have the Kafka broker retain messages while there are no consumers connected? If so, how?
Append --from-beginning to the console consumer command to have it start consuming from the earliest offset. This is actually about the offset reset strategy which is controlled by config auto.offset.reset. Here is what this config means:
What to do when there is no initial offset in Kafka or if the current offset does not exist any more on the server (e.g. because that data has been deleted):
earliest: automatically reset the offset to the earliest offset
latest: automatically reset the offset to the latest offset
none: throw exception to the consumer if no previous offset is found for the consumer's group
anything else: throw exception to the consumer.

Consume live messages in Kafka

I have started my zookeeper and Kafka server.
I started my Kafka producer which sends 10 messages with topic 'xxx'. Then stopped my Kafka producer.
Now I started my Kafka consumer and subscribed with topic 'xxx'. My consumer consumes those 10 messages sent by my Kafka producer, which is not running now.
I need my Kafka consumer should only consume messages from running Kafka server.
Is there any way to achieve this ?
Following things in my consumer properties.
props.put("bootstrap.servers", "localhost:9092");
String consumeGroup = "cg1";
props.put("group.id", consumeGroup);
props.put("enable.auto.commit", "true");
props.put("auto.offset.reset", "earliest");
props.put("auto.commit.interval.ms", "100");
props.put("heartbeat.interval.ms", "3000");
props.put("session.timeout.ms", "30000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
Set the following property :
consumerProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
It tells the consumer to read only the latest messages , that is , the messages which were published after the consumer started.
Please create a new topic and keep the property ConsumerConfig.AUTO_OFFSET_RESET_CONFIG to "latest". make sure to not commit the offset.
i.e we should not use commitSync()
By default, receivers start consuming records from the last committed offset of each assigned partition. If a committed offset is not available, the offset reset strategy ConsumerConfig#AUTO_OFFSET_RESET_CONFIG configured for the KafkaConsumer is used to set the start offset to the earliest or latest offset on the partition.
I think in your case you are committing the offset or there ia committ offset available for given topic.

Categories

Resources