Consume all messages on Kafka topic and disconnect - java

I have a batch job which will be triggered once a day. The requirements are to:
consume all the messages available on the Kafka topic at that point in time,
process the messages,
and, if processing completed successfully, commit the offsets.
Currently I poll() the messages in a while loop until ConsumerRecords.isEmpty() is true. When ConsumerRecords.isEmpty() is true, I assume all the records available on the topic at that point in time have been consumed. The application maintains the offsets and closes the KafkaConsumer.
When the messages have been processed successfully, I create a new KafkaConsumer and commit the offsets maintained by the application.
Note that I close the KafkaConsumer initially used to read the messages and use another KafkaConsumer instance to commit the offsets, to avoid a consumer rebalance exception.
I am expecting a maximum of 5k messages on the topic. The topic is partitioned and replicated.
Is there a better way to consume all the messages on the topic at a specific point in time? Is there anything I am missing or need to take care of? I don't think I need to take care of consumer rebalancing, since I poll() for the messages in a loop and process the messages after the polling is done.
I am using the Java Kafka client v0.9 and can change to v0.10 if it helps in the above scenario.
Thanks
Updated:
AtomicBoolean flag = new AtomicBoolean(true);
while (flag.get()) {
    ConsumerRecords<String, String> consumerRecords = kafkaConsumer.poll(timeout);
    if (consumerRecords.isEmpty()) {
        flag.set(false);
        continue;
    }
    // if the ConsumerRecords is not empty, process the messages and continue to poll()
}
kafkaConsumer.close();

You can't assume that after a single call to poll() you have read all the messages available in the topic at that moment, because of the max.poll.records configuration parameter on the consumer. This is the maximum number of records returned by a single poll(), and its default value is 500. It means that if at that moment there are, say, 600 messages in the topic, you need two calls to poll() to read all of them (and bear in mind that more messages could arrive in the meantime).
The other thing I don't understand is why you are using a different consumer for committing the offsets. What's the consumer rebalance exception you are talking about?
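For illustration, here is a minimal sketch of such a drain loop with the plain Java client: keep polling until an empty batch comes back, process the records, and only then commit. The broker address, group id, topic name and poll timeout are placeholders, and this is a sketch of the idea rather than the asker's exact code.

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class DrainTopicOnce {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder
        props.put("group.id", "daily-batch");                // placeholder
        props.put("enable.auto.commit", "false");            // commit manually after processing
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("my-topic"));   // placeholder topic
        try {
            while (true) {
                // a single poll() returns at most max.poll.records records,
                // so keep polling until an empty batch indicates the topic is drained for now
                ConsumerRecords<String, String> records = consumer.poll(1000L);
                if (records.isEmpty()) {
                    break;
                }
                for (ConsumerRecord<String, String> record : records) {
                    process(record);
                }
            }
            // commit the positions reached by poll() once everything has been processed successfully
            consumer.commitSync();
        } finally {
            consumer.close();
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        // placeholder for the actual batch processing
    }
}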

Related

Spring cloud stream Kafka consumer stuck with long running job and large value for max.poll.interval.ms

We have some long-running jobs that have also been implemented with Spring Cloud Stream and the Kafka binder. The issue we are facing is that the default values for max.poll.interval.ms and max.poll.records are not appropriate for our use case: we need to set a relatively large value for max.poll.interval.ms (a few hours) and a relatively small value for max.poll.records (e.g. 1) to accommodate the longest-running job a consumer could ever pick up. This addresses the issue of the consumer getting into a rebalance loop, but it causes some operational challenges. It sometimes happens that the consumer gets stuck on restart and does not consume any messages until max.poll.interval.ms has passed.
Is this because of the way the Spring Cloud Stream poll has been implemented? Would it help if I used a sync consumer and managed the poll() accordingly?
The consumer logs the loss of heartbeat, and this is the message I can see in the Kafka log when the consumer is stuck:
GroupCoordinator 11]: Member consumer-3-f46e14b4-5998-4083-b7ec-bed4e3f374eb in group foo has failed, removing it from the group
Spring Cloud Stream (message-driven) is not a good fit for this application. It would be better to manage the consumer yourself: close it after the poll(), process the job, then create a new consumer, commit the offset, and poll() again.
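A rough sketch of that close/process/re-create pattern, using manual partition assignment for simplicity; the broker address, group id, topic and partition are placeholders, and this is an illustration of the idea rather than a drop-in implementation:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class LongJobRunner {

    public static void main(String[] args) {
        TopicPartition tp = new TopicPartition("jobs", 0);   // placeholder topic/partition

        // 1. Poll one batch, remember the next offset, and close the consumer immediately,
        //    so no poll()/heartbeat deadlines can be violated while the long job runs.
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps());
        consumer.assign(Collections.singletonList(tp));
        ConsumerRecords<String, String> records = consumer.poll(5000L);
        long nextOffset = consumer.position(tp);
        consumer.close();

        // 2. Run the long job with no consumer open.
        for (ConsumerRecord<String, String> record : records) {
            runLongJob(record);
        }

        // 3. Re-create a consumer only to commit the offsets for the processed batch.
        KafkaConsumer<String, String> committer = new KafkaConsumer<>(consumerProps());
        committer.assign(Collections.singletonList(tp));
        Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
        offsets.put(tp, new OffsetAndMetadata(nextOffset));
        committer.commitSync(offsets);
        committer.close();
    }

    private static Properties consumerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder
        props.put("group.id", "long-job-group");             // placeholder
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    private static void runLongJob(ConsumerRecord<String, String> record) {
        // placeholder for the long-running work
    }
}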

How is Kafka offset reset being managed for parallel consumption?

I would like to better understand the Kafka message retry process.
I have heard that failed processing of consumed messages can be addressed using 2 options:
SeekToCurrentErrorHandler (offset reset)
publishing the message to a Dead Letter Queue (DLQ)
The 2nd option is pretty clear: if a message fails to be processed, it is simply pushed to an error queue. I am more curious about the first option.
AFAIK, the 1st option is the most widely used one, but how does it work when multiple consumers concurrently consume messages from the same topic? Is it the case that, if a particular message has failed, the offset for that consumer is reset to the message's offset? What will happen to the messages successfully processed at the same time as, or after, the failed one; will they be re-processed?
How would you advise me to deal with message retries?
Each partition can only be consumed by one consumer (within the same consumer group).
When you have multiple consumers, you must have at least that number of partitions.
The offset is maintained for each partition; the error handler will (can) only perform seeks on the partitions that are assigned to this consumer.
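For illustration, here is a minimal sketch of the underlying mechanism with the plain Java consumer (not Spring's SeekToCurrentErrorHandler itself): on a processing failure, seek back to the failed record's offset on that partition, which must be one of the partitions assigned to this consumer, so the next poll() re-delivers it and everything after it on that partition. The broker address, group id and topic are placeholders.

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class SeekOnErrorExample {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder
        props.put("group.id", "retry-demo");                 // placeholder
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders"));   // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000L);
                for (TopicPartition tp : records.partitions()) {
                    for (ConsumerRecord<String, String> record : records.records(tp)) {
                        try {
                            process(record);   // placeholder business logic
                            // commit only this record's offset; partitions owned by
                            // other consumers in the group are not affected
                            consumer.commitSync(Collections.singletonMap(
                                tp, new OffsetAndMetadata(record.offset() + 1)));
                        } catch (Exception e) {
                            // seek this partition back to the failed record; the next poll()
                            // re-delivers it and everything after it on this partition,
                            // while already committed records are not re-processed
                            consumer.seek(tp, record.offset());
                            break;   // stop processing this partition for now
                        }
                    }
                }
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        // placeholder
    }
}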

Kafka behaviour if consumer fail

I've looked through a lot of different articles about Apache Kafka transactions, recovery, and the new exactly-once features, but I still don't understand one issue with consumer recovery. How can I be sure that every message from the queue will be processed even if one of the consumers dies?
Let's say we have a topic partition assigned to a consumer. The consumer polls a message, starts to work on it, and then shuts down due to a power failure without committing. What happens? Will another consumer from the same group re-poll this message?
Consumers periodically send heartbeats, telling the broker that they are alive. If the broker does not receive heartbeats from a consumer, it considers the consumer dead and reassigns its partitions. So, if a consumer dies, its partitions will be assigned to another consumer from the group, and the uncommitted messages will be delivered to the newly assigned consumer.
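A minimal sketch of the at-least-once pattern this implies, with placeholder broker address, group id and topic: commit only after processing succeeds, so a crash before the commit means the records are re-delivered to whichever consumer takes over the partition.

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AtLeastOnceConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder
        props.put("group.id", "workers");                    // placeholder
        props.put("enable.auto.commit", "false");            // commit manually, only after processing
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("tasks"));   // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000L);
                for (ConsumerRecord<String, String> record : records) {
                    process(record);   // if we crash here, the offsets were never committed
                }
                // offsets are committed only after the whole batch succeeded; on a crash,
                // the group rebalances and another consumer re-reads the uncommitted records
                consumer.commitSync();
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        // placeholder for the actual work
    }
}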

How to handle offset commit failures with enable.auto.commit disabled in Spark Streaming with Kafka?

I use Spark 2.0.0 with Kafka 0.10.2.
I have an application that processes messages from Kafka and is a long-running job.
From time to time I see the following message in the logs. I understand how I can increase the timeout and so on, but what I wanted to know is: given that I do get this error, how can I recover from it?
ERROR ConsumerCoordinator: Offset commit failed.
org.apache.kafka.clients.consumer.CommitFailedException:
Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member.
This means that the time between subsequent calls to poll() was longer than the configured session.timeout.ms, which typically implies that the poll loop is spending too much time message processing.
You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
This is not about how I avoid this error, but how to handle it once it occurs.
Background: In normal situations I will not see commit errors, but if I do get one I should be able to recover from it. I am using an AT_LEAST_ONCE setup, so I am completely happy with reprocessing a few messages.
I am running Java and using the Kafka direct stream with manual commits.
Creating the stream:
JavaInputDStream<ConsumerRecord<String, String>> directKafkaStream =
    KafkaUtils.createDirectStream(
        jssc,
        LocationStrategies.PreferConsistent(),
        ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));
Committing the offsets:
((CanCommitOffsets) directKafkaStream.inputDStream()).commitAsync(offsetRanges);
My understanding of the situation is that you use the Kafka Direct Stream integration (the spark-streaming-kafka-0-10_2.11 module, as described in the Spark Streaming + Kafka Integration Guide (Kafka broker version 0.10.0 or higher)).
As said in the error message:
Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member.
Kafka manages which topic partitions a consumer consumes, so the Direct Stream will create a pool of consumers (inside a single consumer group).
As with any consumer group, you should expect rebalancing, which (quoting Chapter 4, "Kafka Consumers - Reading Data from Kafka", of Kafka: The Definitive Guide) works as follows:
consumers in a consumer group share ownership of the partitions in the topics they subscribe to. When we add a new consumer to the group it starts consuming messages from partitions which were previously consumed by another consumer. The same thing happens when a consumer shuts down or crashes, it leaves the group, and the partitions it used to consume will be consumed by one of the remaining consumers. Reassignment of partitions to consumers also happen when the topics the consumer group is consuming are modified, for example if an administrator adds new partitions.
There are quite a few cases in which rebalancing can occur, and it should be expected. And you do see one here.
You asked:
how can I recover from it? This is not about how I avoid this error, but how to handle it once it occurs.
My answer would be to use the other method of CanCommitOffsets:
def commitAsync(offsetRanges: Array[OffsetRange], callback: OffsetCommitCallback): Unit
that gives you access to Kafka's OffsetCommitCallback:
OffsetCommitCallback is a callback interface that the user can implement to trigger custom actions when a commit request completes. The callback may be executed in any thread calling poll().
I think onComplete gives you a handle on how the async commit finished, so you can act accordingly.
Something I can't help you with much is how to revert the changes in a Spark Streaming application when some offsets could not be committed. I think that requires tracking the offsets and accepting the case where some offsets can't be committed and the corresponding records have to be re-processed.
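For illustration, a minimal sketch of an OffsetCommitCallback passed to commitAsync, reusing directKafkaStream and offsetRanges from the question; the logging is a placeholder for whatever recovery action fits your application:

import java.util.Map;

import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.consumer.OffsetCommitCallback;
import org.apache.kafka.common.TopicPartition;
import org.apache.spark.streaming.kafka010.CanCommitOffsets;

// ...

((CanCommitOffsets) directKafkaStream.inputDStream()).commitAsync(
    offsetRanges,
    new OffsetCommitCallback() {
        @Override
        public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets, Exception exception) {
            if (exception != null) {
                // the commit failed (e.g. CommitFailedException after a rebalance):
                // record these offsets somewhere durable so the batch can be re-processed
                System.err.println("Offset commit failed for " + offsets + ": " + exception);
            } else {
                System.out.println("Offsets committed: " + offsets);
            }
        }
    });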

Kafka high level consumer

I am trying to use the high-level consumer to batch-read the messages in a Kafka topic.
During this batch read, my thread has to stop at some point:
either once all the messages in the topic are exhausted, or
by getting the max offset at the point when the reading starts and stopping once that max offset is reached.
I tried to use the code at high-level-consumer, but the iterator methods on the KafkaStream seem to be blocking calls and wait until another message comes in.
So, 3 questions:
How do I know that there are no more messages to be read from the topic?
If I have an answer to the above question, how do I stop it from listening to the topic any more?
Is there a way to find the maximum offset when the batch read starts (I think the simple consumer can do this) and make the high-level consumer stop at that point?
You have the option to decide that when no new message has arrived for a specified amount of time, you consider all messages to have been read. This can be configured with the consumer property consumer.timeout.ms. Once that amount of time has passed without any new messages arriving, the ConsumerIterator will throw a timeout exception, which you can handle in the consumer in order to exit.
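A rough sketch of that approach with the old high-level consumer API (the ZooKeeper address, group id and topic are placeholders; the classes are from the pre-0.9 kafka.consumer package, so treat this as an outline rather than a verified implementation):

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.ConsumerTimeoutException;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

public class BatchHighLevelConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");   // placeholder
        props.put("group.id", "batch-reader");               // placeholder
        props.put("consumer.timeout.ms", "5000");            // give up after 5s of silence

        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
            connector.createMessageStreams(Collections.singletonMap("my-topic", 1));   // placeholder topic
        ConsumerIterator<byte[], byte[]> it = streams.get("my-topic").get(0).iterator();

        try {
            while (it.hasNext()) {                   // blocks for up to consumer.timeout.ms
                MessageAndMetadata<byte[], byte[]> msg = it.next();
                process(msg);                        // placeholder processing
            }
        } catch (ConsumerTimeoutException e) {
            // no message arrived within consumer.timeout.ms: treat the topic as drained
        } finally {
            connector.commitOffsets();
            connector.shutdown();
        }
    }

    private static void process(MessageAndMetadata<byte[], byte[]> msg) {
        // placeholder
    }
}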
