The following quote is from https://www.safaribooksonline.com/library/view/kafka-the-definitive/9781491936153/ch04.html#callout_kafka_consumers__reading_data_from_kafka_CO2-1:
The drawback is that while commitSync() will retry the commit until it
either succeeds or encounters a non-retriable failure, commitAsync()
will not retry.
This sentence is not clear to me. I assume the consumer sends a commit request to the broker, and if the broker doesn't respond within some timeout, the commit is considered failed. Am I wrong?
Can you clarify the difference between commitSync and commitAsync in detail?
Also, please provide use cases for when each commit type should be preferred.
As it is said in the API documentation:
commitSync
This is a synchronous commit and will block until either the commit succeeds or an unrecoverable error is encountered (in which case it is thrown to the caller).
That means commitSync is a blocking method. Calling it will block your thread until it either succeeds or fails.
For example,
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("offset = %d, key = %s, value = %s", record.offset(), record.key(), record.value());
        consumer.commitSync();
    }
}
For each iteration of the for-loop, your code only moves on to the next iteration after consumer.commitSync() either returns successfully or is interrupted by a thrown exception.
commitAsync
This is an asynchronous call and will not block. Any errors encountered are either passed to the callback (if provided) or discarded.
That means commitAsync is a non-blocking method. Calling it will not block your thread; instead, your thread continues with the following instructions, regardless of whether the commit eventually succeeds or fails.
For example (similar to the previous example, but using commitAsync):
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("offset = %d, key = %s, value = %s", record.offset(), record.key(), record.value());
        consumer.commitAsync(callback);
    }
}
For each iteration of the for-loop, your code moves on to the next iteration regardless of what eventually happens to consumer.commitAsync(); the result of the commit is handled by the callback function you defined.
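Here, callback is an OffsetCommitCallback you define yourself. As a minimal sketch (the logging is only illustrative), it could look like this:
// Minimal sketch of an OffsetCommitCallback; exception is null when the commit succeeded.
OffsetCommitCallback callback = (offsets, exception) -> {
    if (exception != null) {
        System.err.println("Commit failed for offsets " + offsets + ": " + exception);
    } else {
        System.out.println("Committed offsets: " + offsets);
    }
};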
Trade-offs: latency vs. data consistency
If you have to ensure data consistency, choose commitSync(), because it makes sure that, before taking any further action, you know whether the offset commit succeeded or failed. But because it is synchronous and blocking, you will spend more time waiting for the commit to finish, which leads to higher latency.
If you can tolerate some data inconsistency and want low latency, choose commitAsync(), because it does not wait for the commit to finish. It just sends out the commit request and handles the response from Kafka (success or failure) later, while your code continues executing.
This is all generally speaking; the actual behaviour will depend on your code and where you call the method.
Robust Retry handling with commitAsync()
In the book "Kafka - The Definitive Guide", there is a hint on how to mitigate the potential problem of commiting lower offsets due to an asynchronous commit:
Retrying Async Commits: A simple pattern to get commit order right for asynchronous retries is to use a monotonically increasing sequence number. Increase the sequence number every time you commit and add the sequence number at the time of the commit to the commitAsync callback. When you’re getting ready to send a retry, check if the commit sequence number the callback got is equal to the instance variable; if it is, there was no newer commit and it is safe to retry. If the instance sequence number is higher, don’t retry because a newer commit was already sent.
The following code depicts a possible solution:
import java.util
import java.util.Properties
import java.util.concurrent.atomic.AtomicLong
import java.time.Duration

import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer, OffsetAndMetadata, OffsetCommitCallback}
import org.apache.kafka.common.{KafkaException, TopicPartition}

import collection.JavaConverters._

object AsyncCommitWithCallback extends App {

  // define topic
  val topic = "myOutputTopic"

  // set properties
  val props = new Properties()
  props.put(ConsumerConfig.GROUP_ID_CONFIG, "AsyncCommitter")
  props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
  // [set more properties, e.g. key/value deserializers...]

  // create KafkaConsumer and subscribe
  val consumer = new KafkaConsumer[String, String](props)
  consumer.subscribe(List(topic).asJavaCollection)

  // initialize global counter
  val atomicLong = new AtomicLong(0)

  // consume messages
  try {
    while (true) {
      val records = consumer.poll(Duration.ofMillis(1)).asScala
      if (records.nonEmpty) {
        for (data <- records) {
          // do something with the records
        }
        consumer.commitAsync(new KeepOrderAsyncCommit)
      }
    }
  } catch {
    case ex: KafkaException => ex.printStackTrace()
  } finally {
    consumer.commitSync()
    consumer.close()
  }

  class KeepOrderAsyncCommit extends OffsetCommitCallback {
    // keep the position of this callback instance
    val position = atomicLong.incrementAndGet()

    override def onComplete(offsets: util.Map[TopicPartition, OffsetAndMetadata], exception: Exception): Unit = {
      // retry only if no other commit has incremented the global counter in the meantime
      if (exception != null) {
        if (position == atomicLong.get) {
          consumer.commitAsync(this)
        }
      }
    }
  }
}
Both commitSync and commitAsync use Kafka's offset management feature, and both have drawbacks.
If message processing succeeds but the offset commit fails (the two are not atomic), and a partition rebalance happens at the same time, your already-processed messages get processed again (duplicate processing) by another consumer. If you are okay with duplicate message processing, you can go for commitAsync (it doesn't block and gives low latency, and a subsequent successful commit will cover a higher offset anyway, so you should be okay). Otherwise, go for custom offset management that takes care of atomicity while processing and updating the offset (use an external offset store).
commitAsync will not retry, because retrying could make a mess.
Imagine that you try to commit offset 20 (async) and it fails, and then the next poll block commits offset 40 (async) and it succeeds.
Now the commit for offset 20 is still pending; if it retried and succeeded, it would make a mess.
The mess is that the committed offset should be 40, not 20.
Related
I am implementing a Spring Boot application in Java, using Spring Cloud Stream with the Kafka Streams binder.
I need to implement a blocking operation inside of a KStream map method, like so:
public Consumer<KStream<?, ?>> sink() {
    return input -> input
            .mapValues(value -> methodReturningCompletableFuture(value).get())
            .foreach((key, value) -> otherMethod(key, value));
}
CompletableFuture.get() throws checked exceptions (InterruptedException, ExecutionException).
How do I handle these exceptions so that the chained method doesn't get executed and the Kafka message is not acknowledged? I cannot afford message loss; sending it to a dead letter topic is not an option.
Is there a better way of blocking inside map()?
You can try the branching feature in Kafka Streams to control the execution of the chained methods. For example, here is pseudo-code that you can try.
You can possibly use this as a starting point and adapt this to your particular use case.
final Map<String, ? extends KStream<?, String>> branches =
    input.split()
         .branch((k, v) -> {
             try {
                 methodReturningCompletableFuture(v).get();
                 return true;
             } catch (Exception e) {
                 return false;
             }
         }, Branched.as("good-records"))
         .defaultBranch();

final KStream<?, String> kStream = branches.get("good-records");
kStream.foreach((key, value) -> otherMethod(key, value));
The idea here is that you only send the records that didn't throw an exception to the named branch good-records; everything else goes into a default branch, which we simply ignore in this pseudo-code. Then you invoke the additional chained methods (as this foreach call shows) only for those "good" records.
This does not solve the problem of not acknowledging the message after an exception is thrown. That seems to be a bit challenging. However, I am curious about that use case: when an exception happens and you handle it, why don't you want to ack the message? The requirements seem a bit rigid without using a DLT. The ideal solution here is to introduce some retries and, once the retries are exhausted, send the record to a DLT, which lets the Kafka Streams consumer acknowledge the message. Then the application moves on to the next offset.
The call methodReturningCompletableFuture(value).get() blocks until the future completes (or, with the get(timeout, unit) overload, until the given timeout is reached), assuming that methodReturningCompletableFuture() returns a Future object. Therefore, that is already a reasonable way to wait inside the KStream map operation; I don't think anything else is necessary to make it wait.
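If you want an upper bound on that wait, a small sketch (the 30-second timeout is just an illustrative value; methodReturningCompletableFuture() and otherMethod() are the placeholders from the question) could use the timed get() overload:
// Sketch: bounding the blocking wait with the timed get() overload.
// Needs imports from java.util.concurrent: ExecutionException, TimeUnit, TimeoutException.
public Consumer<KStream<?, String>> sink() {
    return input -> input
            .mapValues(value -> {
                try {
                    // wait at most 30 seconds (illustrative) for the async call to complete
                    return methodReturningCompletableFuture(value).get(30, TimeUnit.SECONDS);
                } catch (InterruptedException | ExecutionException | TimeoutException e) {
                    // how to handle this is application-specific; rethrowing as unchecked here
                    throw new RuntimeException(e);
                }
            })
            .foreach((key, value) -> otherMethod(key, value));
}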
Since I upgraded to Ignite 2.8.1, I'm frequently seeing timeouts on Ignite compute for Callables. This wasn't the case with v2.7.6; I never even used the timeout option before, because I never saw any delay in compute.
Since it now often gets stuck in the compute().call() method, I added withTimeout(60000), but I still had to add a loop that tries several times before the function is actually called on any of the client nodes. This doesn't happen every time, but sometimes.
When it happens I see in the Ignite log messages like this:
[01:14:27,237][WARNING][grid-timeout-worker-#119][GridTaskWorker] Task has timed out: GridTaskSessionImpl [taskName=de.my.comp.net.ignite.CreateNewMedia, dep=LocalDeployment [super=GridDeployment [ts=1609753866994, depMode=SHARED, clsLdr=jdk.internal.loader.ClassLoaders$AppClassLoader#5c29bfd, clsLdrId=06faeccc671-65368413-cf29-4a85-9f51-9b124c6a3351, userVer=0, loc=true, sampleClsName=java.lang.String, pendingUndeploy=false, undeployed=false, usage=0]], taskClsName=de.my.comp.net.ignite.CreateNewMedia, sesId=7bccb2ec671-65368413-cf29-4a85-9f51-9b124c6a3351, startTime=1610586807229, endTime=1610586867229, taskNodeId=65368413-cf29-4a85-9f51-9b124c6a3351, clsLdr=jdk.internal.loader.ClassLoaders$AppClassLoader#5c29bfd, closed=false, cpSpi=null, failSpi=null, loadSpi=null, usage=1, fullSup=false, internal=false, topPred=null, subjId=65368413-cf29-4a85-9f51-9b124c6a3351, mapFut=IgniteFuture [orig=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=null, hash=154761191]], execName=null]
This is now my code (shortened to be more readable):
while (something)
{
    ...
    ...
    final long TASK_TIMEOUT_CREATENEWMEDIA = 60 * 1000;

    ClusterGroup servers = IgniteHelper.getComputeNodes(ignite);
    UUID thisMasterID = IgniteHelper.getLocalNodeID(ignite);
    IgniteSemaphore semaphore = ignite.semaphore("masterFindNodeSempahore", 1, true, true);
    try
    {
        semaphore.acquire();
        CreateNewMedia cnm = new CreateNewMedia(currentConfig, thisMasterID, currentJobID, msResubmit);
        CreateNewMediaResult result = ignite.compute(servers).withTimeout(TASK_TIMEOUT_CREATENEWMEDIA).call(cnm);
        ...
        ...
    }
    catch (IgniteException e)
    {
        if (e instanceof ComputeTaskTimeoutException)
        {
            // log task timeout, try again if not max tries reached
            ...
            ...
        }
        else
        {
            // if some other issue, we better leave
            throw new Exception(e);
        }
    }
    finally
    {
        semaphore.release();
    }
}
The goal of this code is to find a node that is free to perform a specific action. The function returns whether the node where it was executed is available. If it returns a negative result, I remove that node ID from the list and try the remaining servers (this logic is left out of the above code for simplicity).
It seems that setting publicThreadPoolSize to 512 reduced the chances of the timeout happening, but it still occurs sometimes. When it occurs, in most cases the function is eventually called after a few minutes, i.e. after several tries of compute().call() in the loop.
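For reference, this is roughly how I set that pool size when configuring Ignite programmatically (it can equally be set via the publicThreadPoolSize property in the XML configuration):
// Sketch: raising the public thread pool size programmatically.
IgniteConfiguration cfg = new IgniteConfiguration();
cfg.setPublicThreadPoolSize(512); // the default derives from the number of CPU cores
Ignite ignite = Ignition.start(cfg);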
Is this known in Ignite 2.8.1? This is running on a server farm with 25 Ignite compute clients.
The workload on the client nodes is high, but they are definitely not overloaded. Actually, the nodes are mostly running an external command-line executable while the Java portion only waits for its console output.
Any help appreciated.
Additional note: I've seen the timeout with v2.8.1 on other Callables as well, so I don't think it's specifically related to the implementation of CreateNewMedia or to how it's called in my sample code above.
I'm working on a requirement where I need to consume messages from a Kafka broker. The message rate is very high, which is why I've chosen the async commit mechanism.
I want to know what happens if, while consuming messages, the connection to the broker breaks down or the broker itself fails for any reason, so the offsets cannot be committed back to the broker. After restarting, I would have to consume the same messages again that were consumed earlier but not committed to the broker.
private static void startConsumer() {
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
        for (ConsumerRecord<String, String> record : records) {
            System.out.printf("consumed: key = %s, value = %s, partition id= %s, offset = %s%n",
                    record.key(), record.value(), record.partition(), record.offset());
        }
        if (records.isEmpty()) {
            System.out.println("-- terminating consumer --");
            break;
        }
        printOffsets("before commitAsync() call", consumer, topicPartition);
        consumer.commitAsync();
        printOffsets("after commitAsync() call", consumer, topicPartition);
    }
    printOffsets("after consumer loop", consumer, topicPartition);
}
What can be done to overcome this situation, so that I don't need to consume the same messages again after a restart?
You need to manage your offsets on your own in an atomic way. That means you need to build your own "transaction" around
fetching data from Kafka,
processing data, and
storing processed offsets externally (or in your case printing it to the logs).
The methods commitSync and commitAsync will not get you far here, as they can only ensure at-most-once or at-least-once processing within the consumer. In addition, it is beneficial if your processing is idempotent.
There is a nice blog that explains such an implementation making use of the ConsumerRebalanceListener and storing the offsets in your local file system. A full code example is also provided.
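As an illustration only (this is not the blog's code; OffsetStore and its save/load methods are hypothetical placeholders for whatever external storage you use, and a real implementation would persist the offset of the last fully processed record), the skeleton of such a listener could look roughly like this:
// Sketch of a ConsumerRebalanceListener that keeps offsets in an external store.
import java.util.Collection;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

class ExternalOffsetRebalanceListener implements ConsumerRebalanceListener {
    private final KafkaConsumer<String, String> consumer;
    private final OffsetStore store; // hypothetical: e.g. a local file or a database table

    ExternalOffsetRebalanceListener(KafkaConsumer<String, String> consumer, OffsetStore store) {
        this.consumer = consumer;
        this.store = store;
    }

    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // persist the current position for each partition before giving it up
        for (TopicPartition tp : partitions) {
            store.save(tp, consumer.position(tp));
        }
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // on (re)assignment, resume from the externally stored offsets
        for (TopicPartition tp : partitions) {
            consumer.seek(tp, store.load(tp));
        }
    }
}
You would pass an instance of it to consumer.subscribe(topics, listener) and update the store as part of your processing, instead of relying solely on commitSync/commitAsync.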
I have a custom Kafka consumer which I use to send requests to a REST API.
Depending on the response from the API, I either commit the offset or skip the message without committing.
Minimal example:
while (true) {
    ConsumerRecords<String, Object> records = consumer.poll(200);
    for (ConsumerRecord<String, Object> record : records) {
        // Sending a POST request and retrieving the answer
        // ...
        if (responseCode.startsWith("2")) {
            try {
                consumer.commitSync();
            } catch (CommitFailedException ex) {
                ex.printStackTrace();
            }
        } else {
            // Do nothing
        }
    }
}
Now, when a response from the REST API does not start with a 2, the offset is not committed, but the message is also not re-consumed. How can I force the consumer to re-consume messages with uncommitted offsets?
Make sure your processing is idempotent if you are planning to use seek(). Since you are selectively committing offsets, the records left out may sit before committed (successfully processed) records. If you do seek() - which moves your group's position back to an uncommitted offset and starts the replay - you will also get those successfully processed messages again. It also has the potential of becoming an infinite loop.
Alternatively, you can save the unsuccessful records' metadata in memory or in a database and replay the topic from the beginning (you can only go back as far as retention.ms allows), so that all records are replayed, but add a filter that sends through the API only those records whose metadata matches what you saved earlier. Do this as a batch process once every hour or every few hours.
Committing offsets is just a way to store the current offset, also known as the position, of the consumer. So in case it stops, it (or the new consumer instance taking over) can find its previous position and restart consuming from there.
So even if you don't commit, the consumer's position moves forward once you receive records. If you want to re-consume some records, you have to change the consumer's current position.
With the Java client, you can set the position using seek().
In your scenario, you probably want to calculate the new position relative to the current position. If so you can find the current position using position().
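For example, a minimal sketch of rewinding to a failed record so it is returned again by the next poll() (assuming record is the failed ConsumerRecord from your loop above):
// Sketch: rewind the consumer so the failed record is re-consumed on the next poll().
TopicPartition tp = new TopicPartition(record.topic(), record.partition());
long current = consumer.position(tp); // position after the records returned by poll()
consumer.seek(tp, record.offset());   // move back to the failed record's offset
// the next poll() will return the failed record (and everything after it) again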
Below are alternate approaches you can take (instead of seek):
When the REST call fails, move the message to an ad-hoc Kafka topic. You can write another program to read the messages from this topic on a schedule.
When the REST call fails, write the request to a flat file. Use a shell script (or any script) to read each request and resend it on a scheduled basis.
The new Kafka version (0.11) supports exactly once semantics.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging
I've got a producer set up with Kafka transactional code in Java, like this:
producer.initTransactions();
try {
    producer.beginTransaction();
    for (ProducerRecord<String, String> record : payload) {
        producer.send(record);
    }
    Map<TopicPartition, OffsetAndMetadata> groupCommit = new HashMap<TopicPartition, OffsetAndMetadata>() {
        {
            put(new TopicPartition(TOPIC, 0), new OffsetAndMetadata(42L, null));
        }
    };
    producer.sendOffsetsToTransaction(groupCommit, "groupId");
    producer.commitTransaction();
} catch (ProducerFencedException e) {
    producer.close();
} catch (KafkaException e) {
    producer.abortTransaction();
}
I'm not quite sure how to use sendOffsetsToTransaction or what its intended use case is. AFAIK, consumer groups are a multithreaded read feature on the consumer end.
javadoc says
" Sends a list of consumed offsets to the consumer group coordinator, and also marks those offsets as part of the current transaction. These offsets will be considered consumed only if the transaction is committed successfully. This method should be used when you need to batch consumed and produced messages together, typically in a consume-transform-produce pattern."
How would a producer maintain a list of consumed offsets? What's the point of it?
This is only relevant to workflows in which you are consuming and then producing messages based on what you consumed. This function allows you to commit offsets you consumed only if the downstream producing succeeds. If you consume data, process it somehow, and then produce the result, this enables transactional guarantees across the consumption/production.
Without transactions, you normally use Consumer#commitSync() or Consumer#commitAsync() to commit consumer offsets. But if you use these methods before you've produced with your producer, you will have committed offsets before knowing whether the producer succeeded sending.
So, instead of committing your offsets with the consumer, you can use Producer#sendOffsetsToTransaction() to commit them with the producer. This sends the offsets to the transaction coordinator handling the transaction, and the offsets are committed only if the entire transaction (consuming and producing) succeeds.
(Note: when you send the offsets to commit, you should add 1 to the offset last read, so that future reads resume from the offset you haven't read. This is true regardless of whether you commit with the consumer or the producer. See: KafkaProducer sendOffsetsToTransaction need offset+1 to successfully commit current offset).
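To make the flow concrete, here is a rough sketch of a consume-transform-produce loop using sendOffsetsToTransaction. The topic names, transform() and the "groupId" string are placeholders (the group id must match the consumer's group); the consumer should also have enable.auto.commit=false and the producer a transactional.id.
// Sketch: consume-transform-produce with transactional offset commits.
consumer.subscribe(Collections.singletonList("input-topic"));
producer.initTransactions();

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    if (records.isEmpty()) {
        continue;
    }
    producer.beginTransaction();
    try {
        Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
        for (ConsumerRecord<String, String> record : records) {
            producer.send(new ProducerRecord<>("output-topic", record.key(), transform(record.value())));
            // note the +1: commit the position of the *next* record to be read
            offsets.put(new TopicPartition(record.topic(), record.partition()),
                        new OffsetAndMetadata(record.offset() + 1));
        }
        // commit the consumed offsets as part of the same transaction as the produced records
        producer.sendOffsetsToTransaction(offsets, "groupId");
        producer.commitTransaction();
    } catch (ProducerFencedException e) {
        // another producer with the same transactional.id took over; this instance must stop
        producer.close();
        break;
    } catch (KafkaException e) {
        // on failure, neither the produced records nor the offsets become visible
        producer.abortTransaction();
    }
}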