I have a custom Kafka consumer that I use to send requests to a REST API.
Depending on the response from the API, I either commit the offset or skip the message without committing.
Minimal example:
while (true) {
    ConsumerRecords<String, Object> records = consumer.poll(200);
    for (ConsumerRecord<String, Object> record : records) {
        // Send a POST request and retrieve the response
        // ...
        if (responseCode.startsWith("2")) {
            try {
                consumer.commitSync();
            } catch (CommitFailedException ex) {
                ex.printStackTrace();
            }
        } else {
            // Do nothing
        }
    }
}
Now, when a response from the REST API does not start with a 2, the offset is not committed, but the message is not re-consumed either. How can I force the consumer to re-consume messages whose offsets were not committed?
Make sure your processing is idempotent if you are planning to use seek(). Since you are committing offsets selectively, the records you skip may sit before committed (successfully processed) records. If you use seek(), which moves your group's position back to an uncommitted offset and replays from there, you will also get those successfully processed messages again. It also has the potential of becoming an infinite loop.
Alternatively, you can save the metadata of unsuccessful records in memory or in a database and replay the topic from the beginning (within retention.ms) so that all records are replayed, but add a filter so that only the records whose metadata matches what you saved earlier are sent through the API again. Run this as a batch job once every hour or every few hours.
Committing offsets is just a way to store the current offset, also known as the position, of the consumer. So in case it stops, it (or the new consumer instance taking over) can find its previous position and restart consuming from there.
So even if you don't commit, the consumer's position moves forward as soon as you receive records. If you want to re-consume some records, you have to change the consumer's current position.
With the Java client, you can set the position using seek().
In your scenario, you probably want to calculate the new position relative to the current one. If so, you can find the current position using position().
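For example, here is a minimal sketch, reusing the consumer and the failed record from the poll loop in your question:
import org.apache.kafka.common.TopicPartition;

// Inside the poll loop, when the REST call for `record` did not return a 2xx:
TopicPartition tp = new TopicPartition(record.topic(), record.partition());
long current = consumer.position(tp);  // offset of the next record that would be fetched
consumer.seek(tp, record.offset());    // rewind so the next poll() returns this record again
Note that after seeking back you usually want to break out of the current batch for that partition; otherwise the remaining records already returned by poll() get processed now and then delivered again after the seek.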
Below are alternative approaches you can take (instead of seek()):
When the REST call fails, move the message to an ad-hoc Kafka topic. You can write another program that reads the messages from this topic on a schedule (see the sketch below).
When the REST call fails, write the request to a flat file. Use a shell script (or any other script) to read each request and resend it on a schedule.
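For the first approach, a rough sketch, again reusing the record and consumer from the question; the retry topic name and retryProducer (a separately configured KafkaProducer<String, Object>) are placeholders:
import org.apache.kafka.clients.producer.ProducerRecord;

// On a non-2xx response: park the record on an ad-hoc retry topic, then commit as usual.
// The failure is preserved on "my-topic.retry", so committing the original offset is safe.
retryProducer.send(new ProducerRecord<>("my-topic.retry", record.key(), record.value()));
consumer.commitSync();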
I have a Beam pipeline that consumes streaming events, with multiple stages (PTransforms) to process them. See the following code:
pipeline.apply("Read Data from Stream", StreamReader.read())
        .apply("Decode event and extract relevant fields", ParDo.of(new DecodeExtractFields()))
        .apply("Deduplicate process", ParDo.of(new Deduplication()))
        .apply("Conversion, Mapping and Persisting", ParDo.of(new DataTransformer()))
        .apply("Build Kafka Message", ParDo.of(new PrepareMessage()))
        .apply("Publish", ParDo.of(new PublishMessage()))
        .apply("Commit offset", ParDo.of(new CommitOffset()));
The streaming events are read using KafkaIO, and the StreamReader.read() method is implemented like this:
public static KafkaIO.Read<String, String> read() {
    return KafkaIO.<String, String>read()
            .withBootstrapServers(Constants.BOOTSTRAP_SERVER)
            .withTopics(Constants.KAFKA_TOPICS)
            .withConsumerConfigUpdates(Constants.CONSUMER_PROPERTIES)
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class);
}
After we read a streamed event/message through KafkaIO, we can commit the offset.
What I need to do is commit the offset manually, inside the last Commit offset PTransform, once all the previous PTransforms have executed.
The reason is that I do some conversions, mappings and persisting in the middle of the pipeline, and only when everything has completed without failure do I want to commit the offset.
That way, if processing fails in the middle, I can consume the same record/event again and reprocess it.
My question is: how do I commit the offset manually?
I'd appreciate it if you could share resources/sample code.
Well, for sure, there is the Read.commitOffsetsInFinalize() method, which is supposed to commit offsets while finalising checkpoints, and the AUTO_COMMIT consumer config option, which makes the Kafka consumer auto-commit read records.
However, in your case this won't work, and you need to do it manually by grouping the offsets of the same topic/partition/window and creating a new instance of a Kafka client in your CommitOffset DoFn, which will commit these offsets. You need to group the offsets by partition; otherwise you may run into a race condition when committing offsets of the same partition on different workers.
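For illustration only, a rough sketch of such a DoFn, assuming the offsets have already been grouped upstream into KV<partition, maxOffsetInWindow> pairs and that the consumer config carries the same group.id KafkaIO uses (the class and field names here are made up, not part of KafkaIO):
import java.util.Collections;
import java.util.Map;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

// Commits the max offset seen per partition using a short-lived Kafka consumer.
public class CommitOffsetFn extends DoFn<KV<Integer, Long>, Void> {

    private final String topic;
    private final Map<String, Object> consumerConfig; // must include the same group.id as KafkaIO

    public CommitOffsetFn(String topic, Map<String, Object> consumerConfig) {
        this.topic = topic;
        this.consumerConfig = consumerConfig;
    }

    @ProcessElement
    public void processElement(ProcessContext c) {
        TopicPartition tp = new TopicPartition(topic, c.element().getKey());
        // Commit offset + 1 so consumption resumes after the last fully processed record.
        OffsetAndMetadata offset = new OffsetAndMetadata(c.element().getValue() + 1);

        // A throwaway consumer used only for committing; it never polls.
        try (KafkaConsumer<byte[], byte[]> committer = new KafkaConsumer<>(
                consumerConfig, new ByteArrayDeserializer(), new ByteArrayDeserializer())) {
            committer.commitSync(Collections.singletonMap(tp, offset));
        }
    }
}
Creating a consumer per element is expensive; in practice you would cache one per worker, but the idea stays the same: group by partition, take the max offset, and commit offset + 1.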
I'm using Apache KafkaConsumer. I want to check if the consumer has any messages to return without polling. If I poll the consumer and there aren't any messages, then I get the message "Attempt to heartbeat failed since the group is rebalancing" in an infinite loop until the timeout expires, even though I have a records.isEmpty() clause. This is a snippet of my code:
ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(10));
if (records.isEmpty()) {
    log.info("No More Records");
    consumer.close();
} else {
    records.iterator().forEachRemaining(record -> log.info("RECORD: " + record));
}
This works fine until there are no records left. Once the records are empty, it logs "Attempt to heartbeat failed since the group is rebalancing" many times, logs "No More Records" once, and then keeps logging the heartbeat error. What can I do to combat this, and how can I elegantly check (without any heartbeat messages) that there are no more records to poll?
Edit: I asked another question and the full code and context is on this link: How to get messages from Kafka Consumer one by one in java?
Thanks in advance!
From a comment: "Since I have a UI and want to receive a message one by one by clicking the 'receive' button, there might be a case when there are no more messages to be polled."
In that case you need to create a new KafkaConsumer every time someone clicks on the "receive" button and then close it afterwards.
If you want to use the same KafkaConsumer for the lifetime of your client, you need to let the broker know that it is still alive (by sending a heartbeat, which is implicitly done through calling the poll method). Otherwise, as you have already experienced, the broker thinks your KafkaConsumer is dead and will initiate a rebalancing. As there is no other active consumer available, this rebalancing will not stop.
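A minimal sketch of the per-click approach (the bootstrap servers, group id and the helper name receiveOnce are placeholders):
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

// Called from the "receive" button handler: open a consumer, poll once, commit, close.
public static ConsumerRecords<String, String> receiveOnce(String topic) {
    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "ui-consumer");
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "1"); // at most one record per click
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
        consumer.subscribe(Collections.singletonList(topic));
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
        if (!records.isEmpty()) {
            consumer.commitSync(); // remember the position for the next click
        }
        return records;
    }
}
The trade-off is that every click pays for a group join/rebalance, which is why keeping one long-lived consumer polling in the background and buffering messages for the UI is usually the nicer design.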
I have the following RabbitMQ consumer:
Consumer consumer = new DefaultConsumer(channel) {
    @Override
    public void handleDelivery(String consumerTag, Envelope envelope,
                               AMQP.BasicProperties properties, byte[] body) throws IOException {
        String message = new String(body, "UTF-8");
        sendNotificationIntoTopic(message);
        saveIntoDatabase(message);
    }
};
The following situation can occur:
The message was sent to the topic successfully.
The connection to the database was lost, so the database insert failed.
As a result, we have data inconsistency.
The expected result is that either both actions are executed successfully or neither is executed at all.
Are there any solutions for how I can achieve this?
P.S.
Currently I have the following idea (please comment on it).
We can assume that the broker doesn't lose any messages.
We have to be subscribed to the topic we want to send to.
1. Save an entry into the database and set a status field to 'pending'.
2. Attempt to send the data to the topic. If the send was successful, update the status field to 'success'.
3. We need a scheduled job that checks rows with pending status. At that moment two cases are possible:
3.1 The notification wasn't sent at all.
3.2 The notification was sent but the database save failed (the probability is very low, but it is possible).
So we have to distinguish those two cases somehow: we may store messages from the topic in a collection, and the job can check whether the message was accepted or not. So if the job finds a message that corresponds to the database row, we have to update the status to 'success'. Otherwise we have to remove the entry from the database.
I think my idea has some weaknesses (for example, if we have a multi-node application we have to store the messages in Hazelcast or a similar store, which is an additional point of hypothetical failure).
Here is an example of the Try Cancel Confirm pattern (https://servicecomb.apache.org/docs/distributed_saga_3/) that should be capable of dealing with your problem. You should tolerate some chance of double submission of the data via the queue. Here is an example:
1. Define an abstraction Operation and assign an ID to the operation, plus a timestamp.
2. Write status Pending to the database (you can do this in the same step as 1).
3. Write a listener that polls the database for all operations with status Pending that are older than a timeout (a sketch follows this list).
4. For each pending operation, send the data via the queue with the assigned ID.
5. The recipient side should be aware of the ID, and if the ID has already been processed, nothing should happen.
6A. If you need to be 100% sure that the operation has completed, you need a second queue where the recipient side posts a message "ID - DONE". If such consistency is not necessary, skip this step. Alternatively, it can post "ID - FAILED" with the reason for the failure.
6B. The submitting side either waits for a message from 6A or completes the operation by writing status Done to the database.
7. Once a certain timeout has passed or a certain retry limit has been reached, you write status Fail to the operation.
8. You can potentially send a message to the recipient side to roll back the operation with that ID.
Note that none of these steps involve a technical transaction; you can do this with a non-transactional database.
What I have written is a variation of the Try Cancel Confirm pattern, where each recipient of a message should know how to manage its own data.
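A rough sketch of the poller from steps 3-4, assuming a JDBC connection, a RabbitMQ Channel, and made-up table/queue names:
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;
import java.time.Instant;
import com.rabbitmq.client.Channel;

// Poll the database for PENDING operations older than a timeout and (re)publish them
// together with their operation ID, so the recipient can deduplicate.
void republishPendingOperations(Connection db, Channel channel) throws Exception {
    Timestamp cutoff = Timestamp.from(Instant.now().minusSeconds(60));
    try (PreparedStatement ps = db.prepareStatement(
            "SELECT id, payload FROM operations WHERE status = 'PENDING' AND created_at < ?")) {
        ps.setTimestamp(1, cutoff);
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                String operationId = rs.getString("id");
                String payload = rs.getString("payload");
                // The operation ID travels with the message so the recipient can skip duplicates.
                String body = operationId + ":" + payload;
                channel.basicPublish("", "operations-queue", null,
                        body.getBytes(StandardCharsets.UTF_8));
            }
        }
    }
}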
1. In the listener, save a database row with the field status='pending'.
2. Another job (a separate thread) will obtain all pending rows from the DB and, for each row:
2.1 send the data to the topic
2.2 save it into the database
If we fail at step 1, everything is fine: the data stays in a consistent state because the job won't know anything about it.
If we fail at step 2.1, no problem: the next job invocation will attempt to handle it.
If we fail at step 2.2, it means the next job invocation will handle the same data again. At first glance you might think this is a problem, but your consumer has to be idempotent: it has to recognise that the message was already processed and skip it. This requirement is a consequence of the fact that all message brokers only guarantee that a message is delivered AT LEAST ONCE, so our consumers have to be ready for duplicated messages anyway. No problem again.
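A sketch of such an idempotency check on the consumer side (the processed_messages table with a unique message_id column is an assumption):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;

// Returns true if this message ID is new (process it), false if it was seen before (skip it).
// Relies on a UNIQUE constraint on processed_messages.message_id.
boolean markProcessed(Connection db, String messageId) throws SQLException {
    try (PreparedStatement ps = db.prepareStatement(
            "INSERT INTO processed_messages (message_id) VALUES (?)")) {
        ps.setString(1, messageId);
        ps.executeUpdate();
        return true;
    } catch (SQLIntegrityConstraintViolationException duplicate) {
        return false; // already processed: the consumer stays idempotent
    }
}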
Here's the pseudocode for how I'd do it (assuming the DAO layer has transactional capability and your messaging layer doesn't):
// Start a transaction
try {
    String message = new String(body, "UTF-8");
    // Ordering is important here: the database has commit and rollback
    // capabilities, but the messaging system doesn't.
    saveIntoDatabase(message);
    sendNotificationIntoTopic(message);
} catch (MessageDeliveryException e) {
    // Roll back the transaction
    // Throw a domain-specific exception
}
// Commit the transaction
Scenarios:
1. If the database call fails, the message won't be sent, as the exception will break the code flow.
2. If the database call succeeds and the messaging system fails to deliver, catch the exception and roll back the database changes.
All the actions necessary for logging and replaying the failures can live outside this method.
If there is enough time to modify the design, it is recommended to use JTA-like APIs to manage two-phase commit. Even WebLogic and WebSphere support XA resources for two-phase commit.
If the timeline is tight, the steps below can reduce the failure window (a sketch follows):
1. Send the data to the topic without committing (in case the topic is down, retry at an interval).
2. Write the data into the DB.
3. Commit the DB.
4. Commit the topic.
A failure only hurts here when step 4 fails. It results in the same message being sent again, so the receiving system will receive a duplicate message. Each message has a unique messageID and correlationID in the JMS 2.0 structure, so finding duplicates is fairly straightforward (but this has to be handled at the receiving system).
Both cases work in a clustered environment as well.
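A sketch of that ordering with a transacted JMS session and a plain JDBC connection (the table name, queue setup and helper name are assumptions, not part of the original answer):
import java.sql.Connection;
import java.sql.PreparedStatement;
import javax.jms.MessageProducer;
import javax.jms.Session;

// `session` is assumed to be created with connection.createSession(true, Session.SESSION_TRANSACTED).
void sendThenPersist(Session session, MessageProducer producer,
                     Connection db, String payload) throws Exception {
    db.setAutoCommit(false);
    try {
        producer.send(session.createTextMessage(payload));        // 1. send, not yet committed
        try (PreparedStatement ps =
                 db.prepareStatement("INSERT INTO outbound (payload) VALUES (?)")) {
            ps.setString(1, payload);                              // 2. write into DB
            ps.executeUpdate();
        }
        db.commit();                                               // 3. commit DB
        session.commit();                                          // 4. commit the JMS send
    } catch (Exception e) {
        db.rollback();
        session.rollback();
        throw e;
    }
}
If step 4 fails after step 3, the DB already holds the row and the send is retried, which is exactly the duplicate-on-the-receiving-side case described above.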
Specific to your case, the steps below might help overcome your issue.
Subscribe a listener, listener-1, to your topic.
Process-1
1. Add a DB entry with status 'to be sent' for message msg-1.
2. Send message msg-1 to the topic; retry sending in case of any topic failure.
3. If step 2 fails even after retrying, process-1 has to resend msg-1 before sending any new messages, or step 1 has to be rolled back.
Listener-1
Using the subscribed listener, read the reference (messageID/correlationID) from the topic, update the DB status to SENT, and read/remove the message from the topic. In case the reference read succeeds but the DB update fails, the topic still has the message, so the next read will update the DB. In case the DB update succeeds but the message removal fails, the listener will read the message again and try to update a row that is already done, so it can be ignored after validation.
In case the listener itself is down, the topic keeps the messages until the listener reads them. Until then, those messages remain in status 'to be sent'.
The quote from https://www.safaribooksonline.com/library/view/kafka-the-definitive/9781491936153/ch04.html#callout_kafka_consumers__reading_data_from_kafka_CO2-1
The drawback is that while commitSync() will retry the commit until it
either succeeds or encounters a non-retriable failure, commitAsync()
will not retry.
This phrase is not clear to me. I suppose the consumer sends a commit request to the broker, and if the broker doesn't respond within some timeout, it means the commit failed. Am I wrong?
Can you clarify the difference between commitSync and commitAsync in detail?
Also, please provide use cases for when I should prefer each commit type.
As stated in the API documentation:
commitSync
This is a synchronous commit and will block until either the commit succeeds or an unrecoverable error is encountered (in which case it is thrown to the caller).
That means commitSync is a blocking method: calling it will block your thread until it either succeeds or fails.
For example,
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("offset = %d, key = %s, value = %s",
                record.offset(), record.key(), record.value());
        consumer.commitSync();
    }
}
For each iteration of the for-loop, your code only moves on to the next iteration after consumer.commitSync() returns successfully or is interrupted by a thrown exception.
commitAsync
This is an asynchronous call and will not block. Any errors encountered are either passed to the callback (if provided) or discarded.
That means commitAsync is a non-blocking method. Calling it will not block your thread; instead, the thread continues with the following instructions, regardless of whether the commit eventually succeeds or fails.
For example, similar to the previous example, but using commitAsync:
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("offset = %d, key = %s, value = %s",
                record.offset(), record.key(), record.value());
        consumer.commitAsync(callback);
    }
}
For each iteration of the for-loop, your code moves on to the next iteration no matter what eventually happens to consumer.commitAsync(), and the result of the commit will be handled by the callback function you defined.
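For reference, the callback used above could be as simple as this (the logging is illustrative):
import java.util.Map;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.consumer.OffsetCommitCallback;
import org.apache.kafka.common.TopicPartition;

// A minimal callback that just logs failures.
OffsetCommitCallback callback = new OffsetCommitCallback() {
    @Override
    public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets, Exception exception) {
        if (exception != null) {
            System.err.println("Commit failed for offsets " + offsets + ": " + exception);
        }
    }
};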
Trade-offs: latency vs. data consistency
If you have to ensure data consistency, choose commitSync(), because it makes sure that, before taking any further action, you know whether the offset commit succeeded or failed. But because it is synchronous and blocking, you will spend more time waiting for the commit to finish, which leads to higher latency.
If you are OK with some data inconsistency and want lower latency, choose commitAsync(), because it does not wait for the commit to finish. Instead, it just sends out the commit request and handles the response from Kafka (success or failure) later, while your code continues executing.
This is all generally speaking; the actual behaviour will depend on your code and where you call the method.
Robust Retry handling with commitAsync()
In the book "Kafka - The Definitive Guide", there is a hint on how to mitigate the potential problem of committing lower offsets due to an asynchronous commit:
Retrying Async Commits: A simple pattern to get commit order right for asynchronous retries is to use a monotonically increasing sequence number. Increase the sequence number every time you commit and add the sequence number at the time of the commit to the commitAsync callback. When you’re getting ready to send a retry, check if the commit sequence number the callback got is equal to the instance variable; if it is, there was no newer commit and it is safe to retry. If the instance sequence number is higher, don’t retry because a newer commit was already sent.
The following code depicts a possible solution:
import java.util._
import java.util.concurrent.atomic.AtomicLong
import java.time.Duration
import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord, KafkaConsumer, OffsetAndMetadata, OffsetCommitCallback}
import org.apache.kafka.common.{KafkaException, TopicPartition}
import collection.JavaConverters._

object AsyncCommitWithCallback extends App {

  // define topic
  val topic = "myOutputTopic"

  // set properties
  val props = new Properties()
  props.put(ConsumerConfig.GROUP_ID_CONFIG, "AsyncCommitter")
  props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
  // [set more properties...]

  // create KafkaConsumer and subscribe
  val consumer = new KafkaConsumer[String, String](props)
  consumer.subscribe(List(topic).asJavaCollection)

  // initialize global counter
  val atomicLong = new AtomicLong(0)

  // consume messages
  try {
    while (true) {
      val records = consumer.poll(Duration.ofMillis(1)).asScala
      if (records.nonEmpty) {
        for (data <- records) {
          // do something with the records
        }
        consumer.commitAsync(new KeepOrderAsyncCommit)
      }
    }
  } catch {
    case ex: KafkaException => ex.printStackTrace()
  } finally {
    consumer.commitSync()
    consumer.close()
  }

  class KeepOrderAsyncCommit extends OffsetCommitCallback {
    // keep the position of this callback instance
    val position = atomicLong.incrementAndGet()

    override def onComplete(offsets: java.util.Map[TopicPartition, OffsetAndMetadata], exception: Exception): Unit = {
      // retry only if no other commit incremented the global counter in the meantime
      if (exception != null) {
        if (position == atomicLong.get) {
          consumer.commitAsync(this)
        }
      }
    }
  }
}
Both commitSync and commitAsync use Kafka's offset management feature, and both have drawbacks.
If message processing succeeds but the offset commit fails (the two are not atomic) and a partition rebalance happens at the same time, your processed message gets processed again (duplicate processing) by some other consumer. If you are okay with duplicate message processing, you can go for commitAsync (it doesn't block, so it gives lower latency, and a later commit of a higher offset supersedes an earlier one, so you should be okay). Otherwise, go for custom offset management that takes care of atomicity while processing and updating the offset (use an external offset store), as sketched below.
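A rough sketch of that custom approach, assuming an existing consumer and two hypothetical helpers, loadOffsetFromStore and saveOffsetAndResultAtomically, that read/write offsets in the same store (ideally the same transaction) as the processing results:
import java.time.Duration;
import java.util.Collection;
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.common.TopicPartition;

// On (re)assignment, restore each partition's position from the external store.
consumer.subscribe(Collections.singletonList("my-topic"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // nothing to do: offsets are persisted together with the results
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        for (TopicPartition tp : partitions) {
            consumer.seek(tp, loadOffsetFromStore(tp));
        }
    }
});

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        // one atomic write: the processing result and the new offset go into the same transaction
        saveOffsetAndResultAtomically(record);
    }
}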
commitAsync will not retry, because retrying could make a mess.
Imagine that you try to commit offset 20 (asynchronously) and it fails, and then the next poll block tries to commit offset 40 (asynchronously) and succeeds.
If the pending commit of offset 20 were now retried and succeeded, it would make a mess: the committed offset should be 40, not 20.
The new Kafka version (0.11) supports exactly-once semantics.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging
I've got a producer set up with Kafka transactional code in Java, like this:
producer.initTransactions();
try {
    producer.beginTransaction();
    for (ProducerRecord<String, String> record : payload) {
        producer.send(record);
    }
    Map<TopicPartition, OffsetAndMetadata> groupCommit = new HashMap<TopicPartition, OffsetAndMetadata>() {
        {
            put(new TopicPartition(TOPIC, 0), new OffsetAndMetadata(42L, null));
        }
    };
    producer.sendOffsetsToTransaction(groupCommit, "groupId");
    producer.commitTransaction();
} catch (ProducerFencedException e) {
    producer.close();
} catch (KafkaException e) {
    producer.abortTransaction();
}
I'm not quite sure how to use sendOffsetsToTransaction or what its intended use case is. AFAIK, consumer groups are a multithreaded read feature on the consumer end.
javadoc says
" Sends a list of consumed offsets to the consumer group coordinator, and also marks those offsets as part of the current transaction. These offsets will be considered consumed only if the transaction is committed successfully. This method should be used when you need to batch consumed and produced messages together, typically in a consume-transform-produce pattern."
How would the producer maintain a list of consumed offsets? What's the point of it?
This is only relevant to workflows in which you are consuming and then producing messages based on what you consumed. This function allows you to commit offsets you consumed only if the downstream producing succeeds. If you consume data, process it somehow, and then produce the result, this enables transactional guarantees across the consumption/production.
Without transactions, you normally use Consumer#commitSync() or Consumer#commitAsync() to commit consumer offsets. But if you use these methods before you've produced with your producer, you will have committed offsets before knowing whether the producer succeeded sending.
So, instead of committing your offsets with the consumer, you can use Producer#sendOffsetsToTransaction() on the producer to commit the offsets instead. This sends the offsets to the transaction manager handling the transaction. It will commit the offsets only if the entire transaction (consuming and producing) succeeds.
(Note: when you send the offsets to commit, you should add 1 to the offset last read, so that future reads resume from the offset you haven't read. This is true regardless of whether you commit with the consumer or the producer. See: KafkaProducer sendOffsetsToTransaction need offset+1 to successfully commit current offset).
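Putting it together, a hedged sketch of a consume-transform-produce loop (the topic, group name and the transform helper are placeholders; the consumer is assumed to have enable.auto.commit=false and the producer a transactional.id):
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

producer.initTransactions();
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    if (records.isEmpty()) continue;

    producer.beginTransaction();
    Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
    for (ConsumerRecord<String, String> record : records) {
        // produce the transformed result as part of the transaction
        producer.send(new ProducerRecord<>("output-topic", record.key(), transform(record.value())));
        // +1: commit the offset of the *next* record to read
        offsets.put(new TopicPartition(record.topic(), record.partition()),
                new OffsetAndMetadata(record.offset() + 1));
    }
    // the consumed offsets become part of the same transaction as the produced records
    producer.sendOffsetsToTransaction(offsets, "my-consumer-group");
    producer.commitTransaction();
}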