Kafka - How to obtain failed messages details in Producer class - java

Kafka allows asynchronous message sending through the following methods on the Producer (KafkaProducer) class:
public java.util.concurrent.Future<RecordMetadata> send(ProducerRecord<K,V> record)
public java.util.concurrent.Future<RecordMetadata> send(ProducerRecord<K,V> record, Callback callback)
Successes can be handled through either
1) the Future<RecordMetadata> object or
2) the onCompletion method invoked by the callback. The full method signature and usage of onCompletion is shown below (taken from the Kafka docs):
ProducerRecord<byte[],byte[]> record = new ProducerRecord<byte[],byte[]>("the-topic", key, value);
producer.send(record,
    new Callback() {
        public void onCompletion(RecordMetadata metadata, Exception e) {
            if (e != null) {
                e.printStackTrace();
            } else {
                // metadata is only meaningful when the send succeeded
                System.out.println("The offset of the record we just sent is: " + metadata.offset());
            }
        }
    });
Failures, on the other hand, need to be handled through the Exception e passed to the onCompletion method.
Everything looks good so far.
But if I understand correctly, the only useful information that can be obtained from the exception object e is the stack trace and the exception message. In other words, e does not contain a reference to the actual record that was sent to the Kafka broker. So what useful handling can the producer do if the record was not sent successfully? Not much.
I say this because, ideally, I would like to log the failed message somewhere and then try to resend it. But with the little information (e) provided by the framework, I feel this is not possible.
Can someone point out whether I am right or wrong?

You could easily create a callback that receives the ProducerRecord as a constructor argument. Then, when onCompletion is called with an exception, you have complete knowledge of the producer record and can even try to send it again.
I dealt with the same issue. I created a callback that receives both the ProducerRecord and a callback handler that uses an executor service to send the record again. This way I can tolerate any number of failures (e.g. network issues or Kafka being down) and recover from them.
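A minimal sketch of the record-carrying callback described above. To stay compilable without the Kafka client on the classpath, it declares a stand-in SendCallback interface with the same shape as Kafka's Callback; SendCallback, RecordAwareCallback and the onFailure handler are hypothetical names, not part of any library.

```java
import java.util.function.Consumer;

// Stand-in for org.apache.kafka.clients.producer.Callback (assumption: declared
// here only so the sketch compiles without the Kafka client library).
interface SendCallback {
    void onCompletion(Object metadata, Exception exception);
}

// A callback that remembers which record it was created for.
final class RecordAwareCallback<R> implements SendCallback {
    private final R record;
    private final Consumer<R> onFailure; // hypothetical handler: log, queue for resend, ...

    RecordAwareCallback(R record, Consumer<R> onFailure) {
        this.record = record;
        this.onFailure = onFailure;
    }

    @Override
    public void onCompletion(Object metadata, Exception exception) {
        if (exception != null) {
            // The failed record is available here, so it can be logged or re-sent.
            onFailure.accept(record);
        }
    }
}
```

With the real client, the same class would presumably implement Kafka's Callback instead and be passed as producer.send(record, new RecordAwareCallback<>(record, handler)). Doing the resend from a separate executor, as the answer suggests, avoids blocking the producer's I/O thread.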

Related

DeserializationException on consuming message from Kafka topic

I have a problem with my Kafka consumer: when the consumer starts reading from the topic, message deserialization fails and blocks my consumer. It looks like it can't deserialize a message whose structure has been changed. You can reproduce the problem by producing a message, changing its structure, and producing it again to the same consumer. How can I handle this case and avoid blocking the consumer?
Error message: (screenshot not preserved)
// initial message
class Message {
String name;
}
// Message after change
class Message {
List<String> names;
}
// When Exception throws, this code does not execute
@KafkaHandler
public void listenOnMessage(Message message) {
log.info("message: {}", message);
// ...
}
Update:
Basically, the problem was in the producer: from some point in time it started publishing messages with a different structure. The solution is simple: all I needed to add on the consumer side was error handling, retry logic, and recovery. A discussion on Reddit, with repos showing a similar issue and solution, can be found here: https://www.reddit.com/r/apachekafka/comments/10dd0qg/how_to_handle_a_poison_pill_on_consumer_side/
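The idea behind consumer-side error handling for a poison pill can be sketched in plain Java: catch the deserialization failure, set the raw payload aside, and let the consumer move on. This is only an illustration of the principle, not the Spring Kafka mechanism itself; PoisonPillGuard and its in-memory dead-letter list are hypothetical names chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Wraps a deserializer so that a bad payload is set aside instead of
// throwing and blocking the consumer on the same record forever.
final class PoisonPillGuard<T> {
    private final Function<byte[], T> deserializer;
    private final List<byte[]> deadLetters = new ArrayList<>();

    PoisonPillGuard(Function<byte[], T> deserializer) {
        this.deserializer = deserializer;
    }

    // Returns the decoded value, or empty if the payload could not be decoded.
    Optional<T> tryDeserialize(byte[] payload) {
        try {
            return Optional.ofNullable(deserializer.apply(payload));
        } catch (RuntimeException e) {
            deadLetters.add(payload); // keep the raw bytes for later inspection
            return Optional.empty();
        }
    }

    List<byte[]> deadLetters() {
        return deadLetters;
    }
}
```

In a real deployment the dead letters would more likely go to a separate topic than to an in-memory list, so that they survive a restart.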

Where does the message head after a Patterns.ask timeout?

I have recently been experiencing some timeouts with scala.concurrent.Future objects that await processing within an Akka actor, and I was wondering how to handle those timed-out events. Are they really lost? Are they retried and kept in memory, or how does it work?
To give a bit of context, the code is as follows.
List<Future<MyMessage>> futureMessageList = plainMessages.stream()
.map(this::toFuture)
.collect(Collectors.toList());
Futures.sequence(futureMessageList, ExecutionContexts.global())
.onComplete(new OnComplete<Iterable<MyMessage>>() {
@Override
public void onComplete(Throwable throwable, Iterable<MyMessage> messages) {
... // iterate futureMessageList list
Within onComplete, an iteration over futureMessageList takes place; the list is composed of Future objects that encapsulate MyMessage.
However, the function toFuture does a Patterns.ask() with a given dispatcher, and that seems to be taking longer than the timeout I set (60 seconds). Note that the response times depend on an underlying system which may be under high load or on a slow network, depending on the environment it runs in.
Future<MyMessage> message = Patterns.ask(actorSystem.getSampleDispatcher(), msg, TIMEOUT_60_SECS)
So my question is, after the onComplete throws the following exception due to the Future not being processed in time...
java.lang.NullPointerException
at my.package.Clazz.onComplete(Clazz.java:4)
at my.package.Clazz$1.onComplete(Clazz.java:5)
at akka.dispatch.OnComplete.internal(Future.scala:258)
at akka.dispatch.OnComplete.internal(Future.scala:256)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
at scala.concurrent.impl.CallbackRunnable.run$$$capture(Promise.scala:32)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala)
at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
Are those MyMessage objects kept in memory and retried afterwards? Should I somehow handle the exception and keep the timed-out messages in an in-memory list, or how should I work around this?
When ask times out without getting a reply, it completes the Future (or CompletionStage) with a failure. The message may still be somewhere being processed, and if a response does arrive it will end up in dead letters (https://doc.akka.io/docs/akka/current/general/message-delivery-reliability.html#dead-letters). Other scenarios where the timeout could hit are when the actor has stopped or crashed while processing the message, or when the request or response got lost (not likely unless the responding actor is remote).
Future.sequence will either complete successfully when all futures passed to it have completed successfully, or fail if any of them fails.
This means that if any of the asks times out, you will get null as the messages parameter and the exception from the first failing future as the throwable parameter in your onComplete callback.
If you would rather get a list of results, each being either a successful value or an exception, you can do that with the help of recover on each future before passing them to Future.sequence.
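The recover-then-sequence idea can be illustrated with java.util.concurrent.CompletableFuture, used here as a stand-in for the Scala futures returned by ask (an assumption made so the sketch needs no Akka dependency). Each future is first converted into one that always completes normally with an Outcome, and only then are they combined; Outcome and PartialResults are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Result of one request: error is null on success, non-null on failure.
final class Outcome<T> {
    final T value;
    final Throwable error;

    Outcome(T value, Throwable error) {
        this.value = value;
        this.error = error;
    }

    boolean isSuccess() {
        return error == null;
    }
}

final class PartialResults {
    // Combines the futures without letting one failure fail the whole batch.
    static <T> CompletableFuture<List<Outcome<T>>> sequenceAll(List<CompletableFuture<T>> futures) {
        List<CompletableFuture<Outcome<T>>> recovered = new ArrayList<>();
        for (CompletableFuture<T> f : futures) {
            // handle() turns both success and failure into a normal completion.
            recovered.add(f.handle((v, e) -> new Outcome<T>(v, e)));
        }
        return CompletableFuture.allOf(recovered.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> {
                    List<Outcome<T>> out = new ArrayList<>();
                    for (CompletableFuture<Outcome<T>> r : recovered) {
                        out.add(r.join()); // already completed, does not block
                    }
                    return out;
                });
    }
}
```

The combined future then yields one Outcome per request, so the timed-out messages can be identified by position and retried or logged individually.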

KafkaTemplate send method safe to use without manual (blocking) check of returned future

I use the KafkaTemplate of spring-kafka to produce some messages to Kafka.
I have a simple REST API which can be used for creating messages.
Inside my code I use the KafkaTemplate like this to produce my message to Kafka:
kafkaTemplate.send("topic", "key", "data")
It's important for me that I only report success to the client when the message was really sent.
But now I have realised that the method is at least partially asynchronous.
The signature of the method is:
public ListenableFuture<SendResult<K, V>> send(String topic, K key, @Nullable V data) {
So it returns a future, and the future may eventually invoke the failure callback when it is resolved.
I have looked a little under the hood in the Java code of spring-kafka (version 2.2.6), and it seems that some errors are thrown directly while others are only available by resolving the future. There is also a code comment about it, which reads:
// handling exceptions and record the errors;
// for API exceptions return them in the future,
// for other exceptions throw directly
My main question is: do I have to resolve the future (which includes API exceptions) in a blocking way to find out whether something went wrong on send, so that I can be 100% sure that my message was sent correctly?
Or is the sending itself already guaranteed when the send method doesn't throw any exception, and the errors inside the future are only about issues with receiving back some meta information (or something similar)?
If you don't want to block on the get to detect a failure, the other option is to add a callback to the future to get the result asynchronously. (It is a ListenableFuture).
public interface ListenableFutureCallback<T> extends SuccessCallback<T>, FailureCallback {
}
@FunctionalInterface
public interface SuccessCallback<T> {
    /**
     * Called when the {@link ListenableFuture} completes with success.
     * <p>Note that Exceptions raised by this method are ignored.
     * @param result the result
     */
    void onSuccess(@Nullable T result);
}
@FunctionalInterface
public interface FailureCallback {
    /**
     * Called when the {@link ListenableFuture} completes with failure.
     * <p>Note that Exceptions raised by this method are ignored.
     * @param ex the failure
     */
    void onFailure(Throwable ex);
}
There are no guarantees of success if you just send (and pray).
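If blocking is acceptable, the simplest way to be sure before answering the client is to wait on the returned future with a timeout. The sketch below works against the plain java.util.concurrent.Future interface, which ListenableFuture extends; SendConfirmation and confirmed are hypothetical names, and the timeout value is something you would have to choose for your own setup.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class SendConfirmation {
    // Returns true only if the send future completed successfully within the timeout.
    static boolean confirmed(Future<?> sendFuture, long timeout, TimeUnit unit) {
        try {
            sendFuture.get(timeout, unit);
            return true;
        } catch (ExecutionException | TimeoutException | CancellationException e) {
            // Failed, still pending after the timeout, or cancelled:
            // in none of these cases is delivery confirmed.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }
}
```

A REST handler could then return a success status only when confirmed(kafkaTemplate.send(...), 10, TimeUnit.SECONDS) is true. Note that a timeout does not prove the send failed, only that it was not confirmed in time.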
EDIT
Client-side exceptions (e.g. serialization) are thrown on the calling thread, but server-side errors complete the future asynchronously.
There are a number of server-side errors that might result in an async failure; see the Errors class...
/**
* This class contains all the client-server errors--those errors that must be sent from the server to the client. These
* are thus part of the protocol. The names can be changed but the error code cannot.
*
* Note that client library will convert an unknown error code to the non-retriable UnknownServerException if the client library
* version is old and does not recognize the newly-added error code. Therefore when a new server-side error is added,
* we may need extra logic to convert the new error code to another existing error code before sending the response back to
* the client if the request version suggests that the client may not recognize the new error code.
*
* Do not add exceptions that occur only on the client or only on the server here.
*/
public enum Errors {
UNKNOWN_SERVER_ERROR(-1, "The server experienced an unexpected error when processing the request.",
UnknownServerException::new),
NONE(0, null, message -> null),
OFFSET_OUT_OF_RANGE(1, "The requested offset is not within the range of offsets maintained by the server.",
OffsetOutOfRangeException::new),
CORRUPT_MESSAGE(2, "This message has failed its CRC checksum, exceeds the valid size, has a null key for a compacted topic, or is otherwise corrupt.",
CorruptRecordException::new),
UNKNOWN_TOPIC_OR_PARTITION(3, "This server does not host this topic-partition.",
UnknownTopicOrPartitionException::new),
INVALID_FETCH_SIZE(4, "The requested fetch size is invalid.",
InvalidFetchSizeException::new),
LEADER_NOT_AVAILABLE(5, "There is no leader for this topic-partition as we are in the middle of a leadership election.",
LeaderNotAvailableException::new),
...

Assert Kafka send worked

I'm writing an application with Spring Boot so to write to Kafka I do:
@Autowired
private KafkaTemplate<String, String> kafkaTemplate;
and then inside my method:
kafkaTemplate.send(topic, data)
But I feel like I'm just relying on this to work. How can I know whether it has worked? If it's asynchronous, is it good practice to return a 200 code and hope it worked? I'm confused. If Kafka isn't available, won't this fail? Shouldn't I be prompted to catch an exception?
Along with what @mjuarez has mentioned, you can try playing with two Kafka producer properties. One is ProducerConfig.ACKS_CONFIG, which lets you set the level of acknowledgement that you think is safe for your use case. This knob has three possible values. From the Kafka docs:
acks=0: Producer doesn't care about acknowledgement from server, and considers it as sent.
acks=1: This will mean the leader will write the record to its local log but will respond without awaiting full acknowledgement from all followers.
acks=all: This means the leader will wait for the full set of in-sync replicas to acknowledge the record.
The other property is ProducerConfig.RETRIES_CONFIG. Setting a value greater than zero will cause the client to resend any record whose send fails with a potentially transient error.
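As a rough illustration, the two properties could be set like this. The string keys are the values behind ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, ACKS_CONFIG and RETRIES_CONFIG; the class name ReliableProducerConfig and the retry count of 3 are assumptions picked for the example, not recommendations.

```java
import java.util.Properties;

final class ReliableProducerConfig {
    // Builds producer properties that favour delivery guarantees over latency.
    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // Wait for the full set of in-sync replicas to acknowledge each record.
        props.put("acks", "all");
        // Resend on potentially transient errors (illustrative value).
        props.put("retries", "3");
        return props;
    }
}
```

The resulting Properties object would be passed to the KafkaProducer constructor together with the serializer settings, or mapped to the equivalent spring.kafka.producer settings in a Spring Boot application.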
Yes, if Kafka is not available, that .send() call will fail, but if you send it async, no one will be notified. You can specify a callback that you want to be executed when the future finally finishes. Full interface spec here: https://kafka.apache.org/20/javadoc/org/apache/kafka/clients/producer/Callback.html
From the official Kafka javadoc here: https://kafka.apache.org/20/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html
Fully non-blocking usage can make use of the Callback parameter to
provide a callback that will be invoked when the request is complete.
ProducerRecord<byte[],byte[]> record = new ProducerRecord<byte[],byte[]>("the-topic", key, value);
producer.send(record,
    new Callback() {
        public void onCompletion(RecordMetadata metadata, Exception e) {
            if (e != null) {
                e.printStackTrace();
            } else {
                System.out.println("The offset of the record we just sent is: " + metadata.offset());
            }
        }
    });
You can use the command below while sending messages to Kafka:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic topic-name
While the above command is running, run your code; if the messages are sent successfully, they will be printed on the console.
Furthermore, as with a connection to any other resource, if the connection cannot be established, any operation on it will raise an exception.

How to solve this typical Producer Consumer scenario

I encountered an interesting and, I think, very common synchronization problem in my test code.
This is the test (it's a functional test that connects to the system from the outside); I run it via TestNG.
@Test
public void operationalClientConnected_sendGetUserSessionRequest_clientShallReceiveGetUserSessionResponse() {
// GIVEN
OperationalClientSimulator client = operationalClientHasEstablishedWebSocketConnection("ClientXY");
// WHEN
GetUserSessionRequest request = PojoRequestBuilder.newRequest(GetUserSessionRequest.class).build();
client.sendRequest(request);
// THEN
assertThatClientReceivesResponse(client, GetUserSessionResponse.class, request.getCorrelationId(), request.getRequestId());
}
Basically I send a single request and wait for the correct response; this is what I want to verify in this test.
Behind assertThatClientReceivesResponse there is a Hamcrest matcher that looks like this:
@Override
protected boolean matchesSafely(final OperationalClientSimulator client) {
Object awaitedMessage = client.awaitMessage(
new Verification<Object>() {
@Override
public VerificationResult verify(final Object actual) {
VerificationResult result = new VerificationResult();
if (!_expectedResponseClass.isInstance(actual)) {
result.addMismatch("not of expected type", actual, _expectedResponseClass.getSimpleName());
}
// check more details of message ..
return result;
}
}, _expectedTimeout);
boolean matches = awaitedMessage != null;
if (matches) {
_messageCaptor.setActualMessage((T) awaitedMessage);
}
return matches;
}
Now to the interesting part: the synchronization in the OperationalClientSimulator class.
Two methods are of interest:
awaitMessage, which blocks until either a message that matches the given Verification is received or the timeout expires
onMessage, which is called for each message that is received (over a websocket connection)
Basically, what I want to achieve is to have the test thread block in the awaitMessage method until either the correct message is received (via onMessage) or the specified timeout has elapsed.
public Object awaitMessage(final Verification<Object> verification, final long timeoutMillis) {
// howto sync?
return awaitedMessage; // or null
}
@Override
public void onMessage(final String message) {
LOG.info("#Client {} <== received a message on websocket - {}", name, message);
// howto sync?
}
About the test:
The test thread will almost always be faster and therefore has to wait in the awaitMessage method until the response is received
There can be very rare cases where the expected message is received before the test thread checks for it (this basically means I have to save every received message)
In this specific test case only a handful of messages are received (some heartbeat messages, the actual response and a notification), but in other cases there can be hundreds of messages which I need to inspect to find the expected message(s)
I was thinking about different solutions for synchronizing here:
The simplest, of course, would be to sync with the synchronized keyword, but I think there are neater ways to do this
The onMessage method could simply write into a blocking queue and the test thread could consume from it, but here I don't know how to measure the timeout. Can I use a CountDownLatch?
Maybe I can do a non-blocking solution where the producer (onMessage) writes into an array and the consumer reads until it reaches an index that is published by the producer (like the LMAX Disruptor)
I know this is test code and performance is not really an issue here; I am just thinking about how to solve this in a "nice" way... you know... because it's Christmas :-)
So the actual question here is: how do I "safely" wait, with a timeout, for the message I expect in my test? Safely here means that I never miss or lose a message because of concurrency issues, and that I also check whether the expected message was already received.
How should I synchronize between the test runner thread and the thread that calls the onMessage method in the OperationalClientSimulator when a message is received on the websocket connection?
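One way the blocking-queue idea from the list above might look is sketched here. The timeout is measured against a fixed deadline, and each poll is given only the time that remains, so no CountDownLatch is needed. MessageInbox is a hypothetical stand-in for the relevant part of OperationalClientSimulator, and a plain Predicate<String> replaces the Verification interface, whose full definition is not shown in the question.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

final class MessageInbox {
    // Every received message is queued, so nothing that arrives early is lost.
    private final BlockingQueue<String> received = new LinkedBlockingQueue<>();

    // Called by the websocket thread for each incoming message.
    void onMessage(String message) {
        received.add(message);
    }

    // Called by the test thread; returns the first matching message,
    // or null if none arrives before the timeout.
    String awaitMessage(Predicate<String> matcher, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        try {
            while (true) {
                long remaining = Math.max(deadline - System.nanoTime(), 0);
                String next = received.poll(remaining, TimeUnit.NANOSECONDS);
                if (next == null) {
                    return null; // queue stayed empty until the deadline
                }
                if (matcher.test(next)) {
                    return next;
                }
                // Non-matching messages (e.g. heartbeats) are dropped in this sketch.
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

Because non-matching messages are discarded, a test that needs to assert on several different messages would have to keep them in a second collection instead.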
