Kafka consumer stops consuming messages - java

I have a simple Kafka setup. A producer produces messages to a single partition of a single topic at a high rate, and a single consumer consumes messages from that partition. During this process, the consumer may pause processing several times; each pause can last a couple of minutes. After the producer stops producing, all queued messages are eventually processed by the consumer, so it appears that messages produced are not seen immediately by the consumer. I am using Kafka 0.10.1.0. What can be happening here? Here is the section of code that consumes the messages:
while (true)
{
    try
    {
        ConsumerRecords<String, byte[]> records = consumer.poll(100);
        for (final ConsumerRecord<String, byte[]> record : records)
        {
            serviceThread.submit(() ->
            {
                externalConsumer.accept(record);
            });
        }
        consumer.commitAsync();
    } catch (org.apache.kafka.common.errors.WakeupException e)
    {
    }
}
where consumer is a KafkaConsumer with auto commit disabled, max poll records set to 100, and a session timeout of 30000 ms. serviceThread is an ExecutorService.
The producer just involves the KafkaProducer.send call to send a ProducerRecord.
All configurations on the broker are left as kafka defaults.
I am also using kafka-consumer-groups.sh to check what is happening when the consumer is not consuming messages. But when this happens, kafka-consumer-groups.sh also hangs and cannot return any information. Sometimes it triggers a consumer rebalance, but not always.

For those who may find this helpful: I've run into this problem (where Kafka seemingly stops consuming silently) often enough, and every single time it wasn't actually a problem with Kafka.
Usually it is some long-running or hung silent process inside the consumer loop that keeps the consumer from committing the offset, for example a DB client trying to connect to the database. If you wait long enough (e.g. 15 minutes for SQLAlchemy and Postgres), you will eventually see an exception printed to STDOUT, saying something like "connection timed out".
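If you suspect this, one way to confirm or contain it is to bound the blocking work so the poll loop cannot stall silently. Below is a minimal sketch under that assumption; the externalCall() helper and the 30-second budget are illustrative and not part of the original question:

import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class BoundedProcessingLoop {

    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    // Bound each record's processing so a hung downstream call surfaces as a
    // TimeoutException instead of silently stalling the consumer loop.
    void pollOnce(KafkaConsumer<String, byte[]> consumer) throws InterruptedException {
        ConsumerRecords<String, byte[]> records = consumer.poll(100);
        for (ConsumerRecord<String, byte[]> record : records) {
            Future<?> task = worker.submit(() -> externalCall(record)); // hypothetical downstream call
            try {
                task.get(30, TimeUnit.SECONDS); // fail fast instead of hanging forever
            } catch (TimeoutException e) {
                task.cancel(true); // give up on the hung call; offsets stay uncommitted
            } catch (ExecutionException e) {
                // downstream call failed outright; handle or log as needed
            }
        }
        consumer.commitAsync();
    }

    private void externalCall(ConsumerRecord<String, byte[]> record) {
        // placeholder for the slow or hanging work (e.g. a DB write)
    }
}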

Related

Good way to check if Kafka Consumer doesn't have any records to return and is empty in java?

I'm using Apache KafkaConsumer. I want to check if the consumer has any messages to return without polling. If I poll the consumer and there aren't any messages, then I get the message "Attempt to heartbeat failed since the group is rebalancing" in an infinite loop until the timeout expires, even though I have a records.isEmpty() clause. This is a snippet of my code:
ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(10));
if (records.isEmpty()) {
    log.info("No More Records");
    consumer.close();
} else {
    records.iterator().forEachRemaining(record -> log.info("RECORD: " + record));
}
This works fine until records are empty. Once they are, it logs "Attempt to heartbeat failed since the group is rebalancing" many times, logs "No More Records" once, and then continues to log the heartbeat error. What can I do to combat this, and how can I elegantly check (without any heartbeat messages) that there are no more records to poll?
Edit: I asked another question and the full code and context is on this link: How to get messages from Kafka Consumer one by one in java?
Thanks in advance!
From a comment by the OP: "Since I have a UI and want to receive a message one by one by clicking the "receive" button, there might be a case when there are no more messages to be polled."
In that case you need to create a new KafkaConsumer every time someone clicks on the "receive" button and then close it afterwards.
If you want to use the same KafkaConsumer for the lifetime of your client, you need to let the broker know that it is still alive (by sending a heartbeat, which is done implicitly through calling the poll method). Otherwise, as you have already experienced, the broker thinks your KafkaConsumer is dead and will initiate a rebalance. Since there is no other active consumer available, this rebalancing will not stop.
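A minimal sketch of the first option, a short-lived consumer created and closed per click; the bootstrap address, group id, and topic name below are placeholders:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public final class OnDemandReceiver {

    // Called from the UI's "receive" button: open a consumer, poll once, close it.
    public static ConsumerRecords<String, String> receiveOnce() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "ui-receiver");             // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        // try-with-resources guarantees the consumer is closed, so the broker never
        // waits on a dead group member and no endless rebalance is triggered.
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));        // placeholder
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(10));
            consumer.commitSync(); // record what was handed to the UI
            return records;
        }
    }
}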

Kafka consumer- Pause polling of event from specific kafka topic partition to use it as delayed queue

We have a scenario in our system where user details are published to Kafka topic XYZ by another producing application A (a different system), and my application B consumes from that topic.
The requirement is that application B must consume the event 45 minutes (or some configurable time) after A puts it on topic XYZ. The reason for the delay is that a REST API of some system C needs to be triggered based on this user-details event for the particular user, to confirm whether a certain flag is set for that user; that flag can be set at any point within the 45-minute window. (This could have been avoided if C were able to publish to Kafka or notify us in some other way, but it cannot.)
Our application B is written in spring.
The solution I tried was to take the event from Kafka, check the timestamp of the first event in the queue, and process it if it is already 45 minutes old; if it is less than 45 minutes old, pause the Kafka listener container for the remaining time using the MessageListenerContainer pause() method.
Something like below -
@KafkaListener(id = "delayed_listener", topics = "test_topic", groupId = "test_group")
public void delayedConsumer(@Payload String message,
                            Acknowledgment acknowledgment) {
    UserDataEvent userDataEvent = null;
    try {
        userDataEvent = this.mapper.readValue(message, TopicRequest.class);
    } catch (JsonProcessingException e) {
        logger.error("error while parsing message");
    }
    MessageListenerContainer delayedContainer = this.kafkaListenerEndpointRegistry.getListenerContainer("delayed_listener");
    if (userDataEvent.getPublishTime() > 45 minutes) // this will be some configured value
    {
        long sleepTimeForPolling = userDataEvent.getPublishTime() - System.currentTimeMillis();
        // give negative ack to put already polled messages back to kafka topic
        acknowledgment.nack(1000);
        // pause container, and later resume it
        delayedContainer.pause();
        ScheduledExecutorService scheduledExecutorService = Executors.newScheduledThreadPool(1);
        scheduledExecutorService.schedule(() -> {
            delayedContainer.resume();
        }, sleepTimeForPolling, TimeUnit.MILLISECONDS);
        return;
    }
    // if message was already 45 minutes old then process it
    this.service.processMessage(userDataEvent);
    acknowledgment.acknowledge();
}
Though it works for a single partition, I am not sure this is the right approach; any comments on that? I can also see that multiple partitions will cause problems: the pause() call above pauses the whole container, so if one partition holds an old (ready-to-process) message, it will not be consumed while the container is paused because of a newer message on some other partition.
Can I apply this pause logic at the partition level somehow?
Is there any better/recommended solution for achieving this delayed processing after a configurable amount of time that I could adopt in this scenario, rather than what I did above?
Kafka is not really designed for such scenarios.
One way I could see that technique working would be to set the container concurrency to the same as the number of partitions in the topic so that each partition is processed by a different consumer on a different thread; then pause/resume the individual Consumer<?, ?>s instead of the whole container.
To do that, add the Consumer<?, ?> as an additional parameter; to resume the consumer, set the idleEventInterval and check the timer in an event listener (ListenerContainerIdleEvent). The Consumer<?, ?> is a property of the event so you can call resume() there.
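Here is a rough sketch of that idea, assuming a configurable delay held in DELAY_MS and a hypothetical extractPublishTime() helper (neither is part of Spring Kafka); each listener pauses only its own consumer, and an idle-event listener resumes it once the delay has elapsed:

import org.apache.kafka.clients.consumer.Consumer;
import org.springframework.context.event.EventListener;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.event.ListenerContainerIdleEvent;
import org.springframework.kafka.support.Acknowledgment;

public class DelayedPartitionListener {

    private static final long DELAY_MS = 45 * 60 * 1000L; // assumed configurable delay
    // Single field for brevity; with concurrency > 1 a real implementation would
    // track one deadline per consumer/partition.
    private volatile long resumeAt = 0L;

    // Concurrency is assumed to match the partition count, so each thread/consumer
    // owns one partition and pausing it affects only that partition.
    @KafkaListener(id = "delayed_listener", topics = "test_topic",
                   groupId = "test_group", concurrency = "3")
    public void delayedConsumer(String message, Acknowledgment ack, Consumer<?, ?> consumer) {
        long publishTime = extractPublishTime(message);                   // hypothetical helper
        long remaining = DELAY_MS - (System.currentTimeMillis() - publishTime);
        if (remaining > 0) {
            ack.nack(1000);                        // put the record back for redelivery
            consumer.pause(consumer.assignment()); // pause only this consumer's partitions
            resumeAt = System.currentTimeMillis() + remaining;
            return;
        }
        process(message);                          // hypothetical business logic
        ack.acknowledge();
    }

    // Requires idleEventInterval to be set on the container factory. The event is
    // published on the consumer thread, so calling resume() here is safe.
    @EventListener
    public void onIdle(ListenerContainerIdleEvent event) {
        if (resumeAt > 0 && System.currentTimeMillis() >= resumeAt) {
            event.getConsumer().resume(event.getConsumer().paused());
            resumeAt = 0L;
        }
    }

    private long extractPublishTime(String message) { return 0L; } // placeholder
    private void process(String message) { }                       // placeholder
}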

Kafka streaming - TimeoutException: Expiring * record(s) for TOPIC:* ms has passed since batch creation

The streaming application was rolled out to production, and after about 10 days we started observing errors/warnings in the CustomProductionExceptionHandler for expired transactions that belong to an older day window.
FLOW:
INPUT TOPIC --> STREAMING APPLICATION (produces stats and emits after the day window closes) --> OUTPUT TOPIC
The producer continuously tries to publish records to the OUTPUT topic for a window that has already expired, and logs an error via the CustomProductionExceptionHandler.
I have reduced the batch size and otherwise kept the defaults, but this change has not yet been promoted to production.
CustomProductionExceptionHandler implementation: its purpose is to keep the streams application from dying due to NetworkException or TimeoutException.
With this implementation the producer does not retry, and in case of any exception it returns CONTINUE. On the other hand, upon returning FAIL the stream thread dies and does not restart automatically. I need suggestions.
public class CustomProductionExceptionHandler implements ProductionExceptionHandler {

    @Override
    public ProductionExceptionHandlerResponse handle(final ProducerRecord<byte[], byte[]> record,
                                                     final Exception exception) {
        String recordKey = new String(record.key());
        String recordVal = new String(record.value());
        String recordTopic = record.topic();
        logger.error("Kafka message marked as processed although it failed. Message: [{}:{}], destination topic: [{}]",
                recordKey, recordVal, recordTopic, exception);
        return ProductionExceptionHandlerResponse.CONTINUE;
    }
}
Exception:
2019-12-20 16:31:37.576 ERROR com.jpmc.gpg.exception.CustomProductionExceptionHandler.handle(CustomProductionExceptionHandler.java:19) kafka-producer-network-thread | profile-day-summary-generator-291e69b1-5a3d-4d49-8797-252c2ae05607-StreamThread-19-producerid - Kafka message marked as processed although it failed. Message: [{"statistics":{}], destination topic: [OUTPUT-TOPIC]
org.apache.kafka.common.errors.TimeoutException: Expiring * record(s) for TOPIC:1086149 ms has passed since batch creation
I am trying to get answers to the questions below.
1) Why is the producer trying to publish older transactions to the OUTPUT topic for which the day window is already closed?
Example - the producer is trying to send a 12/09 day-window transaction while the currently open window is 12/20.
2) The stream threads would have died without CustomProductionExceptionHandler returning ProductionExceptionHandlerResponse.CONTINUE. Is there any way the producer can retry in case of NetworkException or TimeoutException and then continue, instead of the stream thread dying?
The problem with returning ProductionExceptionHandlerResponse.CONTINUE from the CustomProductionExceptionHandler is that in case of any exception it skips publishing that record to the output topic and proceeds with the next records. No resiliency.
1) It's not really possible to answer this question without knowing what your program does. Note that, in general, Kafka Streams works on event-time and handles out-of-order data.
2) You can configure all internally used clients of a Kafka Streams application (i.e., consumer, producer, admin client, and restore consumer) by specifying the corresponding client configuration in the Properties you pass into KafkaStreams. If you want different configs for different clients, you can prefix them accordingly, i.e., producer.retries instead of retries. Check out the docs for more details: https://docs.confluent.io/current/streams/developer-guide/config-streams.html#ak-consumers-producer-and-admin-client-configuration-parameters
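For example, a minimal sketch of the prefixing approach; the application id, bootstrap address, and values below are placeholders rather than recommendations, and delivery.timeout.ms exists in clients 2.1+ (older clients control batch expiry via request.timeout.ms):

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;

public class StreamsClientConfigExample {

    public static KafkaStreams build(Topology topology) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "profile-day-summary-generator"); // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");             // placeholder

        // The settings below apply only to the internal producer, via the "producer." prefix.
        props.put(StreamsConfig.producerPrefix(ProducerConfig.RETRIES_CONFIG), 10);
        // Give batches longer before they expire with "ms has passed since batch creation".
        props.put(StreamsConfig.producerPrefix(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG), 300_000);

        return new KafkaStreams(topology, props);
    }
}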

How should Apache Pulsar Consumer.acknowledgeAsync() failure be handled?

I am using Consumer.acknowledgeAsync() to ack messages in my Java service and was wondering what happens if the ack fails? Should I retry the operation a few times and discard my consumer when retries are exhausted?
I am counting the number of messages being processed for flow-control to limit memory usage.
Usually, if a message was not acked successfully, it will be redelivered from the broker to the consumer after the ackTimeout.
So in most cases there is no need to retry.
Maybe some handling like this is enough (note that exceptionally() must return a value, hence the explicit return null below):
consumer.acknowledgeAsync(msgId)
        .thenAccept(unused -> successHandlerMethod())
        .exceptionally(exception -> {
            failHandlerMethod(exception);
            return null;
        });

Kafka consumer hangs on poll when kafka is down

I've been playing around with a basic setup of Zookeeper and Kafka to learn how to use it, but I'm having trouble with the consumer. When Kafka is not available the call to the poll() method hangs until it is back online.
Kafka version: 0.10.1.0
My code looks like this:
KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props);
consumer.subscribe(topics);

while (!stopped) {
    // If for any reason Kafka is not available, this call will hang
    // until Kafka is back online.
    ConsumerRecords<String, byte[]> records = consumer.poll(timeout);
    for (ConsumerRecord<String, byte[]> record : records) {
        process(record);
    }
    Thread.sleep(sleepTime);
}
I've read that when I call poll() the consumer will try to connect to Kafka indefinitely until it is back online or until consumer.wakeup() is called.
I want the code to act differently when Kafka is not online. Is there any way of limiting the consumer retries, or making it fail when polling a non-existent Kafka?
Unfortunately this is still an issue. Many Consumer methods can hang with various scenarios.
There is a Kafka Improvement Proposal in progress, KIP-266, to add timeouts to the Consumer methods to avoid hangs.
As far as I know, calling wakeup() from another thread is the best workaround.
EDIT: As of Kafka 2.0.0, all Consumer calls can accept a timeout. That allows you to regain control if the brokers go down.
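For illustration, here is a minimal sketch of the wakeup() workaround, assuming an arbitrary 30-second deadline and hypothetical helper names. It uses the poll(Duration) overload from 2.0+; on 0.10.x you would pass a long timeout instead:

import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;

public class GuardedPoll {

    private static final ScheduledExecutorService WATCHDOG =
            Executors.newSingleThreadScheduledExecutor();

    // Returns the polled records, or null if the broker looked unreachable.
    static ConsumerRecords<String, byte[]> pollOrGiveUp(KafkaConsumer<String, byte[]> consumer) {
        // Schedule wakeup() so a hung poll() is interrupted after 30 seconds.
        ScheduledFuture<?> abort = WATCHDOG.schedule(consumer::wakeup, 30, TimeUnit.SECONDS);
        try {
            return consumer.poll(Duration.ofMillis(100));
        } catch (WakeupException e) {
            // poll() was stuck past the deadline; treat the cluster as unreachable.
            return null;
        } finally {
            // If wakeup() fired after poll() returned, the next poll() may still
            // throw a spurious WakeupException; a real implementation should handle that.
            abort.cancel(false);
        }
    }
}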
