When is a Kafka connection idle? - java

I ran into an issue where, when my Kafka consumers read a topic that has had no messages for some time, Kafka seems to treat the connection as idle. I saw these trace logs:
About to close the idle connection from 2147482646 due to being idle for 540005 millis
Node 2147482646 disconnected.
But why is it idle? I have the default config connections.max.idle.ms of 9 minutes, heartbeats are sent every 5 seconds, and session.timeout.ms is 30 seconds. As I understand it, maybe Kafka does not count heartbeats as "activity". But we also have regular poll() calls, and to me a poll is activity even if it returns 0 messages. Then I came across the idea that a connection is idle if nothing has been written to its TCP socket for some period of time. So if nothing (no messages) goes over the connection within 9 minutes, it is considered idle? Is that true? If not, how does Kafka decide that a connection is idle?
UPD:
Poll counts as activity no matter how many messages it returns. I ran Kafka locally and started debugging with CONNECTIONS_MAX_IDLE_MS_CONFIG=5000, and I found that my Kafka consumer created 2 different connections; after a while one of them went unused for 5 seconds, which caused that connection to be closed. So why does one connection go unused for so long? I also tried 10 seconds and saw the same behaviour. On prod the config is the default 9 minutes, yet it still happens.
Interestingly, after it happens once, Kafka recreates the connection and everything is fine. But perhaps if the closed connection is the one to the group coordinator, it triggers a rebalance, all connections are created from scratch, and the issue appears again.
kafka-clients version 3.2.0. My code for testing:
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.logging.log4j.LogManager; // assuming Log4j 2 for the logger below
import org.apache.logging.log4j.Logger;

Logger logger = LogManager.getLogger(Main.class);
logger.info("test");

String bootstrapServers = "127.0.0.1:9092";
String grp_id = "test";
String topic = "test";

Properties properties = new Properties();
properties.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
properties.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
properties.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
properties.setProperty(ConsumerConfig.GROUP_ID_CONFIG, grp_id);
properties.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
// 5 second idle timeout, to reproduce the disconnect quickly
properties.setProperty(ConsumerConfig.CONNECTIONS_MAX_IDLE_MS_CONFIG, "5000");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);
consumer.subscribe(Collections.singletonList(topic));

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        System.out.println("Key: " + record.key() + ", Value:" + record.value());
        System.out.println("Partition:" + record.partition() + ",Offset:" + record.offset());
    }
}
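For clarity, here is a minimal sketch of the three timeouts I am talking about, set explicitly on the consumer. The values are just the ones mentioned above (9 minute idle timeout, 5 second heartbeat, 30 second session), so this is purely illustrative, not a fix:
// Sketch: the timeouts from the question made explicit; values are the defaults mentioned above
properties.setProperty(ConsumerConfig.CONNECTIONS_MAX_IDLE_MS_CONFIG, "540000"); // 9 minutes
properties.setProperty(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "5000");     // 5 seconds
properties.setProperty(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "30000");       // 30 seconds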

Related

How to increase request timeout for Kafka consumer

I am using Kafka with a Spring listener. The following is the relevant piece of code.
In the past we have published more than 100k messages to the test topic and the system seemed to work fine.
But a few days back I changed the groupId of the consumer. After that, the new consumer tried to process all the messages from the start, which takes a lot of time, and after roughly 10 seconds the broker kicks the consumer out.
As a result, no Kafka listener remains registered to consume messages.
@KafkaListener(
        topicPattern = "test",
        groupId = "test",
        id = "test",
        containerFactory = "testKafkaListenerContainerFactory")
public void consume(@Payload String payload) throws IOException {
}
Kafka Consumer configuration:
Map<String, Object> props = new HashMap<>();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
props.put("security.protocol", "SSL");
Then I used the CLI to read messages with the following command and observed the same behavior: after exactly 10 seconds the consumer stops reading messages from Kafka.
./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test
How can I increase the request timeout for the Kafka client, or is there some better approach to solve this issue?
In the past we have published more than 100k messages to the test topic and the system seems to be working fine.
Does your Kafka still have the required data? Kafka does not store messages forever. How long messages are kept is governed by the retention settings on the broker(s). If your data is older than the retention period, it will be lost.
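If you want to verify retention programmatically rather than on the broker, a rough sketch with the Java AdminClient could look like this (the bootstrap server and topic name are placeholders, and it assumes it runs inside a method that declares throws Exception):
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;

Properties adminProps = new Properties();
adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
try (AdminClient admin = AdminClient.create(adminProps)) {
    ConfigResource topicResource = new ConfigResource(ConfigResource.Type.TOPIC, "test");
    // describeConfigs returns the effective topic config, including retention.ms
    Config config = admin.describeConfigs(Collections.singleton(topicResource))
            .all().get().get(topicResource);
    System.out.println("retention.ms = " + config.get("retention.ms").value());
}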

Unable to publish message from KafkaProducer

I am unable to send a message from my Kafka producer. My configuration does not seem to be working; it looks like this:
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "127.0.0.1:9092");
properties.setProperty("key.serializer", StringSerializer.class.getName());
properties.setProperty("value.serializer", StringSerializer.class.getName());
properties.setProperty("acks", "1");
properties.setProperty("retries", "3");
properties.setProperty("linger.ms", "1");
Producer<String, String> producer =
new org.apache.kafka.clients.producer.KafkaProducer<String, String>(properties);
ProducerRecord<String, String> producerRecord =
new ProducerRecord<String, String>("second_topic", "3", "messagtest");
Future<RecordMetadata> s = producer.send(producerRecord);
producer.flush();
producer.close();
Here is the error after I called s.get():
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for second_topic-0: 30021 ms has passed since batch creation plus linger time
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:94)
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:64)
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:29)
at ai.sys.producer.Test.main(Test.java:33)
Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for second_topic-0: 30021 ms has passed since batch creation plus linger time
Batching is enabled in the Kafka producer by default, with a batch size of 16 KB. However, in your code you are sending only one record, which might not fill the batch.
Hence, for your code to work, try setting the "batch.size" property to "0" in the Kafka producer properties.
properties.setProperty("batch.size", "0");
This disables the batching mechanism and allows your producer to write records to the Kafka broker.
Note: in practice, disabling batching increases the number of write requests to the broker and decreases the I/O throughput and performance of both the producer and the server.

Why does kafka streams reprocess the messages produced after broker restart

I have a single-node Kafka broker and a simple streams application. I created 2 topics (topic1 and topic2).
Produced on topic1 - processed message - write to topic2
Note: For each message produced only one message is written to destination topic
I produced a single message. After it was written to topic2, I stopped the Kafka broker. After some time I restarted the broker and produced another message on topic1. Now the streams app processed that message 3 times. Then, without stopping the broker, I produced messages to topic1 and waited for the streams app to write to topic2 before producing again.
The streams app is behaving strangely. Sometimes for one produced message there are 2 messages written to the destination topic and sometimes 3. I don't understand why this is happening; even the messages produced after the broker restart are being duplicated.
Update 1:
I am using Kafka version 1.0.0 and Kafka-Streams version 1.1.0
Below is the code.
Main.java
String credentials = env.get("CREDENTIALS");

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "activity-collection");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.RECONNECT_BACKOFF_MS_CONFIG, 100000);
props.put(StreamsConfig.RECONNECT_BACKOFF_MAX_MS_CONFIG, 200000);
props.put(StreamsConfig.REQUEST_TIMEOUT_MS_CONFIG, 60000);
props.put(StreamsConfig.RETRY_BACKOFF_MS_CONFIG, 60000);
props.put(StreamsConfig.producerPrefix(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG), true);
props.put(StreamsConfig.producerPrefix(ProducerConfig.ACKS_CONFIG), "all");
final StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> activityStream = builder.stream("activity_contenturl");
KStream<String, String> activityResultStream = AppUtil.hitContentUrls(credentials , activityStream);
activityResultStream.to("o365_user_activity");
AppUtil.java
public static KStream<String, String> hitContentUrls(String credentials, KStream<String, String> activityStream) {
    KStream<String, String> activityResultStream = activityStream
            .flatMapValues(new ValueMapper<String, Iterable<String>>() {
                @Override
                public Iterable<String> apply(String value) {
                    ArrayList<String> log = new ArrayList<String>();
                    JSONObject received = new JSONObject(value);
                    String url = received.get("url").toString();
                    String accessToken = ServiceUtil.getAccessToken(credentials);
                    JSONObject activityLog = ServiceUtil.getActivityLogs(url, accessToken);
                    log.add(activityLog.toString());
                    return log;
                }
            });
    return activityResultStream;
}
Update 2:
In a single-broker, single-partition environment with the above config, I started the Kafka broker and the streams app. I produced 6 messages on the source topic, and when I started a consumer on the destination topic there were already 36 messages and counting. They keep coming.
So I ran this to see consumer-groups:
kafka_2.11-1.1.0/bin/kafka-consumer-groups.sh --new-consumer --bootstrap-server localhost:9092 --list
Output:
streams-collection-app-0
Next I ran this:
kafka_2.11-1.1.0/bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group streams-collection-app-0
Output:
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
o365_activity_contenturl 0 1 1 0 streams-collection-app-0-244b6f55-b6be-40c4-9160-00ea45bba645-StreamThread-1-consumer-3a2940c2-47ab-49a0-ba72-4e49d341daee /127.0.0.1 streams-collection-app-0-244b6f55-b6be-40c4-9160-00ea45bba645-StreamThread-1-consumer
After a while the output showed this:
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
o365_activity_contenturl 0 1 6 5 - - -
And then:
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
o365_activity_contenturl 0 1 7 6 - - -
It seems you are facing a known limitation. A Kafka topic by default stores messages for at least 7 days, but committed offsets are kept for only 1 day (default config value offsets.retention.minutes = 1440). So if no messages were produced to your source topic for more than 1 day, then after an app restart all messages from the topic will be reprocessed again (possibly multiple times, depending on the number of restarts, at most once per day for such a rarely-written topic).
You can find a description of how committed offsets expire in How does an offset expire for a consumer group.
In Kafka version 2.0 the retention for committed offsets was increased; see KIP-186: Increase offsets retention default to 7 days.
To prevent reprocessing, you could set the consumer property auto.offset.reset: latest (the default value in Kafka Streams is earliest).
There is a small risk with latest: if nothing was produced into the source topic for longer than a day and you then restart the app, you could lose some messages (only the messages that arrive exactly during the restart).
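In code, using the props from the question's Main.java and the same prefix style as its producer overrides, a minimal sketch of that change would be:
// Sketch: override the Streams consumer's reset policy (see the trade-off described above)
props.put(StreamsConfig.consumerPrefix(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG), "latest");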

Kafka last offset increases on application restart

I have a Java Akka application that reads from Kafka, processes the messages, and commits manually.
I'm using the high-level consumer of the 0.10.1.1 API.
The strange thing is that when I shut down the application and start it again, the offset is a little bigger than the last commit, and I cannot figure out why.
I have only one commit point in the code.
else if (message.getClass() == ProcessedBatches.class) {
    try {
        Logger.getRootLogger().info("[" + this.name + "/Reader] Commiting ...");
        ProcessedBatches msg = (ProcessedBatches) message;
        consumer.commitSync(msg.getCommitInfo());
        lastCommitData = msg.getCommitInfo();
        lastCommit = System.currentTimeMillis();
    } catch (CommitFailedException e) {
        Logger.getRootLogger().info("[" + this.name + "/Reader] Failed to commit... Last commit: " + lastCommit + " | Last batch: " + lastBatch + ". Current uncommited messages: " + uncommitedMessages);
        self().tell(HarakiriMessage.getInstance(), self());
    }
}
After the commit I save the offsets HashMap in lastCommitData in order to debug it.
Then I added a shutdown hook to print the lastCommitData variable to check the last offset committed for each partition.
Runtime.getRuntime().addShutdownHook(new Thread(() -> {
String output =
"############## SHUTTING DOWN CONSUMER ############### \n" +
lastCommitData+"\n";
System.out.println(output);
}));
Also, I have a consumer rebalance listener to check the start position of each partition when the consumer starts.
new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> collection) {}

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> collection) {
        for (TopicPartition p : collection) {
            System.out.println("Starting position " + p.toString() + ":" + consumer.position(p));
        }
        coordinator.setRebalanceTimestamp(System.currentTimeMillis());
    }
});
Example for one partition:
Offset before shutdown: 3107169023
Offset when partition assigned: 3107180350
As you can see, that is a gap of roughly 10K messages.
The consumer properties are the following:
Properties props = new Properties();
props.put("bootstrap.servers", bootstrapServers);
props.put("group.id", group_id);
props.put("enable.auto.commit", "false");
props.put("auto.commit.interval.ms", "100000000");
props.put("session.timeout.ms", "10000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
props.put("max.poll.records", "40000");
props.put("auto.offset.reset", "latest");
I'm not sure what I'm doing wrong.
Am I correct in thinking you base your assumed "Offset before shutdown: 3107169023" on what your shutdown hook prints?
If so, I see two potential issues.
When you register your shutdown hook you are closing over the lastCommitData field.
Since you are accessing it from another thread, the shutdown hook thread, is the field declared volatile? Otherwise you may be printing a stale value.
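For example, a minimal sketch of such a declaration (the field type here is an assumption based on what commitSync expects):
// Sketch: volatile ensures the shutdown-hook thread sees the most recently assigned reference
private volatile Map<TopicPartition, OffsetAndMetadata> lastCommitData;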
Also, java.lang.Runtime.addShutdownHook says:
When the virtual machine begins its shutdown sequence it will start all registered shutdown hooks in some unspecified order and let them run concurrently
so there is no guarantee that your consumer won't manage to commit offsets further after your shutdown hook has already printed the lastCommitData value.
To be sure, I suggest you inspect Kafka and check what the actual committed offsets are after your app shuts down.
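One way to do that from code is to ask a consumer configured with the same group.id for the committed offset of a partition; a minimal sketch (topic name and partition number are placeholders):
// Sketch: consumer is a KafkaConsumer using the same group.id as the application
TopicPartition partition = new TopicPartition("my-topic", 0); // placeholder topic/partition
OffsetAndMetadata committed = consumer.committed(partition);
System.out.println("Committed offset: " + (committed == null ? "none" : committed.offset()));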
Check the retention policy of your topic.
It may be the case that when you start your consumer back up, the last committed offset has been purged from the partition, and the consumer then moves forward to the latest offset for that partition.
When you poll Kafka using the Consumer API, it reads from the last consumed offset in the partition. There must be other consumers in the system that were assigned the partitions previously consumed by the instance you just stopped; thus the latest offset would have changed. Since you know which offset you were at before exiting, you need to save it to some durable store; use ConsumerRebalanceListener#onPartitionsRevoked for this. Read that offset back when you restart the consumer process and point your consumer to start from there by calling seek(partition, offset) in ConsumerRebalanceListener#onPartitionsAssigned, as in the sketch below.
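A minimal sketch of that approach (offsetStore below is a stand-in for whatever durable store you pick, not a Kafka API; consumer and topics come from your existing setup):
consumer.subscribe(topics, new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        for (TopicPartition p : partitions) {
            // persist how far we got before the partition is taken away
            offsetStore.save(p, consumer.position(p));
        }
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        for (TopicPartition p : partitions) {
            Long saved = offsetStore.load(p);
            if (saved != null) {
                consumer.seek(p, saved); // resume from the offset we persisted earlier
            }
        }
    }
});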

Can a Kafka broker retain messages while there are no consumers connected?

I am trying to build a pub/sub application and I am exploring the best tools out there. I am currently looking at Kafka and have a little demo app already running. However, I am running into a conceptual issue.
I have a producer (Java code):
String topicName = "MyTopic";
String key = "MyKey";
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092,localhost:9093");
props.put("acks", "all");
props.put("key.serializer","org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
Producer<String, byte[]> producer = new KafkaProducer <String, byte[]>(props);
byte[] data = <FROM ELSEWHERE>;
ProducerRecord<String, byte[]> record = new ProducerRecord<String, byte[]>(topicName, key, data);
try {
RecordMetadata result = producer.send(record).get();
}
catch (Exception e) {
// Nothing for now
}
producer.close();
When I start a consumer via the Kafka command line tools:
kafka-console-consumer --bootstrap-server localhost:9092 --topic MyTopic
and then I execute the producer code, I see the data message show up on my consumer terminal.
However, if I do not run the consumer prior to executing the producer, the message appears to be "lost". When I start the consumer (after executing the producer), nothing appears in the consumer terminal.
Does anyone know if it's possible to have the Kafka broker retain messages while there are no consumers connected? If so, how?
Append --from-beginning to the console consumer command to have it start consuming from the earliest offset. This is really about the offset reset strategy, which is controlled by the config auto.offset.reset. Here is what this config means:
What to do when there is no initial offset in Kafka or if the current offset does not exist any more on the server (e.g. because that data has been deleted):
earliest: automatically reset the offset to the earliest offset
latest: automatically reset the offset to the latest offset
none: throw exception to the consumer if no previous offset is found for the consumer's group
anything else: throw exception to the consumer.
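If you consume from Java code instead of the console consumer, the equivalent is to set auto.offset.reset before creating the consumer; a minimal sketch (the group id below is a placeholder):
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "my-demo-group"); // placeholder; a brand-new group has no committed offsets
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
props.put("auto.offset.reset", "earliest"); // start from the beginning when no offset exists yet
KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("MyTopic"));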
