Optimal Kafka Producer Options for Failover and Recovery - java

I would like to fine-tune the KafkaTemplate options for the producer so that it handles the various failover and recovery scenarios as optimally as possible.
We have our KafkaProducerMessageHandler running in sync mode (i.e. waiting for the send operation results - see acks below). Note: this is necessary in the current version of Kafka to enable ErrorChannel reporting.
Here are the options I have chosen:
acks = 1 (basic acknowledgement from the Kafka broker leader)
retries = 10
max.in.flight.requests.per.connection = 1 (keeps the messages in order if an error state is reached)
linger.ms = 1 (not sure about this one or whether it is relevant?)
request.timeout.ms = 5000 (five seconds per attempt; combined with the retries this gives a total of roughly 50 seconds before the message is deemed to have failed and appears on the error channel)
enable.idempotence = false (again, not sure about this option?)
retry.backoff.ms = 100 (this is the default - is it worth playing with?)
How do these values sound?
Is there anything I am missing?
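For reference, here is the same configuration expressed as a producer config map (a sketch only; the values mirror the list above and could be fed to e.g. a DefaultKafkaProducerFactory behind the KafkaTemplate):

import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.producer.ProducerConfig;

Map<String, Object> producerConfig = new HashMap<>();
producerConfig.put(ProducerConfig.ACKS_CONFIG, "1");
producerConfig.put(ProducerConfig.RETRIES_CONFIG, "10");
producerConfig.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "1"); // preserve ordering on retry
producerConfig.put(ProducerConfig.LINGER_MS_CONFIG, "1");
producerConfig.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, "5000");
producerConfig.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "false");
producerConfig.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, "100");
// e.g. new DefaultKafkaProducerFactory<String, String>(producerConfig)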

This is an old post about Kafka producer tuning: http://ingest.tips/2015/07/19/tips-for-improving-performance-of-kafka-producer/
TL;DR version:
Pay attention to the batch.size and linger.ms parameters.
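For illustration, in the same style as the sketch above (example values only, not taken from the linked post), they would be set like this:

producerConfig.put(ProducerConfig.BATCH_SIZE_CONFIG, "65536"); // 64 KB batches (default is 16384)
producerConfig.put(ProducerConfig.LINGER_MS_CONFIG, "5");      // wait up to 5 ms to fill a batch

A larger batch together with a small linger lets the producer group records per partition instead of sending each one immediately, trading a few milliseconds of latency for throughput.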

Related

TooManyMessagesWithoutAckException while processing kafka message in quarkus

In a Quarkus process we perform the steps below once a message is polled from Kafka:
Thread.sleep(30000) - due to business logic
Call a 3rd-party API
Call another 3rd-party API
Insert data into the DB
Almost every day the process hangs after throwing TooManyMessagesWithoutAckException.
2022-12-02 20:02:50 INFO [2bdf7fc8-e0ad-4bcb-87b8-c577eb506b38, ] : Going to sleep for 30 sec.....
2022-12-02 20:03:20 WARN [ kafka] : SRMSG18231: The record 17632 from topic-partition '<partition>' has waited for 60 seconds to be acknowledged. This waiting time is greater than the configured threshold (60000 ms). At the moment 2 messages from this partition are awaiting acknowledgement. The last committed offset for this partition was 17631. This error is due to a potential issue in the application which does not acknowledged the records in a timely fashion. The connector cannot commit as a record processing has not completed.
2022-12-02 20:03:20 WARN [ kafka] : SRMSG18228: A failure has been reported for Kafka topics '[<topic name>]': io.smallrye.reactive.messaging.kafka.commit.KafkaThrottledLatestProcessedCommit$TooManyMessagesWithoutAckException: The record 17632 from topic/partition '<partition>' has waited for 60 seconds to be acknowledged. At the moment 2 messages from this partition are awaiting acknowledgement. The last committed offset for this partition was 17631.
2022-12-02 20:03:20 INFO [2bdf7fc8-e0ad-4bcb-87b8-c577eb506b38, ] : Sleep over!
Below is an example of how we consume the messages:
@Incoming("my-channel")
@Blocking
CompletionStage<Void> consume(Message<Person> person) {
    String msgKey = (String) person
            .getMetadata(IncomingKafkaRecordMetadata.class).get()
            .getKey();
    // ...
    return person.ack();
}
As per the logs, only 30 seconds had passed since the event was polled, yet the exception says the Kafka acknowledgement had been pending for 60 seconds.
I checked the whole day's logs around the time the error was thrown to see if the REST API calls took more than 30 seconds to fetch the data, but I wasn't able to find any that did.
We haven't done any specific kafka configuration other than topic name, channel name, serializer, deserializer, group id and managed kafka connection details.
There are 4 partitions in this topic with replication factor of 3. There are 3 pods running for this process.
We're unable to reproduce this issue in the Dev and UAT environments.
I checked the configuration options but couldn't find anything that might help: Quarkus Kafka Reference
mp:
  messaging:
    incoming:
      my-channel:
        topic: <topic>
        group:
          id: <group id>
        connector: smallrye-kafka
        value:
          serializer: org.apache.kafka.common.serialization.StringSerializer
          deserializer: org.apache.kafka.common.serialization.StringDeserializer
Is it possible that Quarkus acknowledges the messages in batches, and by that time the waiting time has already reached the threshold?
Please comment if there are any other possibilities for this issue.
I have similar issues in our production environment, running different Quarkus services against a simple 3-node Kafka cluster, and I have researched the problem a lot - with no clear answer. At the moment I have two approaches to this problem:
Make sure you really ack or nack the Kafka message in your code. Is every exception really caught and answered with a person.nack(exception) (or a person.ack(), depending on your failure strategy)? Make sure it is. The throttled exception is thrown if neither ack() nor nack() is performed; the problem mostly occurs when nothing happens at all.
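A minimal sketch of what that can look like, using the consumer from the question (the business calls are placeholders):

@Incoming("my-channel")
@Blocking
CompletionStage<Void> consume(Message<Person> person) {
    try {
        // business logic, 3rd-party calls, DB insert ...
        return person.ack();
    } catch (Exception e) {
        // Every failure path must answer the message, otherwise the throttled
        // commit strategy eventually throws TooManyMessagesWithoutAckException.
        return person.nack(e);
    }
}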
If this does not help, I switch the commit strategy to "latest":
mp.messaging.incoming.my-channel.commit-strategy=latest
This is a little slower, because batch committing is disabled, but it runs stably in my case. If you don't know about the commit strategies and the default, catch up with the good article by Escoffier on the subject.
I am aware that this does not solve the root cause, but it helped in desperate times. The problem has to be that one or more queued messages are not acknowledged in time, but I can't tell you why. Maybe the application logic is too slow, but like you I have a hard time reproducing this locally. You can also try to increase the 60-second threshold with throttled.unprocessed-record-max-age.ms and see for yourself whether this helps. In my case it did not. Maybe someone else can share their insights on this problem and provide you with a real solution.
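If you want to experiment with that threshold, it is a channel attribute of the throttled commit strategy, for example (the value is just an example):

mp.messaging.incoming.my-channel.throttled.unprocessed-record-max-age.ms=120000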

Kafka consumer not automatically reconnecting after outage

In our infrastructure we are running Kafka with 3 nodes and have several Spring Boot services running in OpenShift. Some of the communication between the services happens via Kafka. For the consumers/listeners we are using the @KafkaListener Spring annotation with a unique group ID, so that each instance (pod) consumes all the partitions of a topic:
@KafkaListener(topics = "myTopic", groupId = "group#{T(java.util.UUID).randomUUID().toString()}")
public void handleMessage(String message) {
    doStuffWithMessage(message);
}
For the configuration we are using pretty much the default values. For the consumers all we got is
spring.kafka.consumer:
  auto-offset-reset: latest
  value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
  key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
Sometimes we face the unfortunate situation where all of our Kafka nodes are briefly down, which results in the consumers unregistering, as logged by org.apache.kafka.common.utils.AppInfoParser:
App info kafka.consumer for consumer-group5c327050-5b05-46fb-a7be-c8d8a20d293a-1 unregistered
Once the nodes are up again, we would expect the consumers to register again; however, that is not the case. So far we have no idea why they fail to do so. For now we are forced to restart the affected pods when this issue occurs. Did anybody have a similar issue before, or have an idea what we might be doing wrong?
Edit: We are using the following versions
spring-boot 2.6.1
spring-kafka 2.8.0
apache kafka 2.8.0
In the Kafka config you can use the reconnect.backoff.max.ms parameter to set the maximum number of milliseconds to wait between reconnection attempts.
Additionally, set the reconnect.backoff.ms parameter to the base number of milliseconds to wait before retrying to connect.
If provided, the backoff per host will increase exponentially for each
consecutive connection failure, up to this maximum.
Kafka documentation https://kafka.apache.org/31/documentation/#streamsconfigs
If you set the maximum backoff fairly high, the interval between reconnection attempts keeps increasing (50, 500, 5000, 50000 ms and so on) up to that cap, and the client keeps retrying at that rate until the brokers are reachable again.
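In a Spring Boot application these settings can be passed through to the consumer, for example in application.properties (the values are illustrative):

spring.kafka.consumer.properties.reconnect.backoff.ms=1000
spring.kafka.consumer.properties.reconnect.backoff.max.ms=60000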
We did some more digging in our logs and found the underlying issue that causes the consumer(s) to be stopped.
Authentication/Authorization Exception and no authExceptionRetryInterval set
So apparently the consumer is getting an Authentication/Authorization exception when trying to reconnect to the currently unavailable Kafka nodes, and since we did not set authExceptionRetryInterval there won't be any retries and the consumer (listener container) is stopped. https://docs.spring.io/spring-kafka/api/org/springframework/kafka/listener/ConsumerProperties.html#setAuthExceptionRetryInterval(java.time.Duration)
Set the interval between retries after an AuthenticationException or org.apache.kafka.common.errors.AuthorizationException is thrown by KafkaConsumer. By default the field is null and retries are disabled. In such case the container will be stopped. The interval must be less than max.poll.interval.ms consumer property.
We are quite confident, that setting authExceptionRetryInterval will solve our problem.
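A minimal sketch of how it could be set in Spring Boot (the bean definition and the 30-second interval are assumptions, not from our actual setup):

import java.time.Duration;
import org.springframework.boot.autoconfigure.kafka.ConcurrentKafkaListenerContainerFactoryConfigurer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;

@Configuration
public class KafkaListenerConfig {

    @Bean
    public ConcurrentKafkaListenerContainerFactory<Object, Object> kafkaListenerContainerFactory(
            ConcurrentKafkaListenerContainerFactoryConfigurer configurer,
            ConsumerFactory<Object, Object> consumerFactory) {
        ConcurrentKafkaListenerContainerFactory<Object, Object> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        // Keep Spring Boot's property-based defaults, then add the retry interval
        configurer.configure(factory, consumerFactory);
        // Must be less than max.poll.interval.ms
        factory.getContainerProperties().setAuthExceptionRetryInterval(Duration.ofSeconds(30));
        return factory;
    }
}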

The coordinator is not aware of this member even with very large poll interval

We have a KStreams application with the following configuration:
props.setProperty(RETRIES_CONFIG, String.valueOf(Integer.MAX_VALUE));
props.setProperty(RETRY_BACKOFF_MS_CONFIG, "5000"); // 5 seconds
props.setProperty(RECONNECT_BACKOFF_MS_CONFIG, "5000"); // 5 seconds
props.setProperty(REQUEST_TIMEOUT_MS_CONFIG, "5000"); // 5 seconds
props.setProperty(SESSION_TIMEOUT_MS_CONFIG, "25000"); // 25 seconds session timeout
props.setProperty(MAX_POLL_RECORDS_CONFIG, "100"); // 100 records per poll
props.setProperty(MAX_POLL_INTERVAL_MS_CONFIG, String.valueOf(Integer.MAX_VALUE));
// do not add any more time to the window retention period, delete immediately
props.setProperty(WINDOW_STORE_CHANGE_LOG_ADDITIONAL_RETENTION_MS_CONFIG, "0");
Even with very large MAX_POLL_INTERVAL_MS_CONFIG, we see errors like (formatted exceptions in json):
{
"#timestamp": "2020-02-07T15:30:19.631Z",
"message": "[Consumer clientId=client-03a38ada-b39c-497a-acd4-aa95066fdc8a-StreamThread-6-consumer, groupId=group-name] Offset commit failed on partition group-name-repartition-3 at offset 9066: The coordinator is not aware of this member.",
"logger_name": "org.apache.kafka.clients.consumer.internals.ConsumerCoordinator",
"level": "ERROR"
}
What else do we need to configure? Is there any other parameter involved? I should mention that the Kafka broker is a managed service and we don't configure server-side parameters. Additionally, the commit interval is set to 10 seconds. Everything else is default for KStreams 2.4.0.
Another cause of this problem is not sending a heartbeat within session.timeout.ms, so you may want to consider increasing it (a sketch with example values follows the quoted descriptions below).
heartbeat.interval.ms: The expected time between heartbeats to the consumer coordinator when using Kafka's group management facilities. Heartbeats are used to ensure that the consumer's session stays active and to facilitate rebalancing when new consumers join or leave the group. The value must be set lower than session.timeout.ms, but typically should be set no higher than 1/3 of that value. It can be adjusted even lower to control the expected time for normal rebalances.
session.timeout.ms: The timeout used to detect client failures when using Kafka's group management facility. The client sends periodic heartbeats to indicate its liveness to the broker. If no heartbeats are received by the broker before the expiration of this session timeout, then the broker will remove this client from the group and initiate a rebalance.
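For example, alongside the existing properties (the values are illustrative, the constants come from ConsumerConfig like the other settings, and session.timeout.ms must stay within the broker's group.max.session.timeout.ms):

props.setProperty(SESSION_TIMEOUT_MS_CONFIG, "45000");    // more headroom before the broker evicts the member
props.setProperty(HEARTBEAT_INTERVAL_MS_CONFIG, "15000"); // roughly 1/3 of session.timeout.ms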

Why does the first Kafka Producer request to publish take longer than subsequent requests?

I have a simple Kafka producer where I have enabled retries by setting retries to 3.
I need an interval between retries and I have set it using retry.backoff.ms. However, I can see that the initial publish request takes as much time as is specified in the retry.backoff.ms config. If retry.backoff.ms is 100, I get the response after publishing in around 110 ms, and if I set retry.backoff.ms to 60000 ms (1 min, after increasing max.block.ms accordingly), the first request takes slightly more than 60000 ms. Why am I observing this behavior? Does Spring Kafka do a dummy retry on the initial connection?
I'm on Java 8 and Spring Boot 1.5.9 with the Edgware.SR1 cloud version.
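The setup described would look roughly like this (standard producer config keys; the values mirror the question):

props.put(ProducerConfig.RETRIES_CONFIG, "3");
props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, "60000"); // 1 minute between retries
props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, "120000");    // raised so send() does not time out first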

How to "copy" a JMS message to 2 destinations?

I have a requirement that a single JMS message sent by a client must be delivered reliably (exactly-once) to two systems. These 2 systems are not HA-enabled, so the best suggestion that I came up with is to:
create single queue where client posts to
set up two "intermediate" queues
use a custom "DuplicatorMDB" that will read messages from the client queue and post them to two queues within the same transaction.
client->JMSDQ->DuplicatorMDB->Q1->MDB->System1
                           \->Q2->MDB->System2
Is there any existing functionality like that? What would be the proper way to balance the system to keep it stable if one or both of the backend systems are down?
The application server is WebLogic 10.
I can't use topics for this because in a cluster topics will cause too much message duplication. If we have 2 instances, then with topics it'll go like this:
client->Topic-->MDB1#server1->System1
            |\->MDB2#server1->System2
            \---->MDB1#server2->System1
             \--->MDB2#server2->System2
Thus every message will be delivered twice to System1 and twice to System2 and if there'll be 8 servers in a cluster, each message will be delivered 8 times. This is what I'd really like to avoid...
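A rough sketch of the DuplicatorMDB idea described above, assuming container-managed transactions and an XA-enabled connection factory (all JNDI names are made up):

import javax.annotation.Resource;
import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.*;

// Consumes from the client queue and forwards the message to Q1 and Q2 inside the
// container-managed (XA) transaction, so the receive and both sends commit or roll
// back together.
@MessageDriven(mappedName = "jms/JMSDQ", activationConfig = {
        @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue")
})
public class DuplicatorMDB implements MessageListener {

    @Resource(mappedName = "jms/xaConnectionFactory") // must be XA-enabled
    private ConnectionFactory connectionFactory;

    @Resource(mappedName = "jms/Q1")
    private Queue q1;

    @Resource(mappedName = "jms/Q2")
    private Queue q2;

    public void onMessage(Message message) {
        Connection connection = null;
        try {
            connection = connectionFactory.createConnection();
            // Arguments are ignored inside the container's JTA transaction
            Session session = connection.createSession(true, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(null);
            producer.send(q1, message);
            producer.send(q2, message);
        } catch (JMSException e) {
            // Rethrow so the container rolls back and the message is redelivered
            throw new RuntimeException(e);
        } finally {
            if (connection != null) {
                try { connection.close(); } catch (JMSException ignore) { }
            }
        }
    }
}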
Finally I got some time to test it and here is what I observed:
2 nodes in a cluster. 2 JMS servers: jms1 on node1, jms2 on node2.
Distributed topic dt. MDB with durable subscription and jms-client-id=durableSubscriber. Started the system: 0 messages, mdb#node1 is up, mdb#node2 trying to connect periodically, but it can't because "Client id, durableSubscriber, is in use". As expected.
Sent in 100 messages:
jms1#dt messages current = 0, messages total = 100, consumers current = 1
I can see that node1 processed 100 messages.
jms2#dt messages current = 100, messages total = 100 , consumers current = 1
i.e. "duplicate" messages are pending in the topic.
Sent in another 100 messages, 100 were processed on the node1, 200 pending on node2.
Rebooted node1, mdb#node2 reconnected to dt and started processing "pending" messages. 200 messages were processed on node2.
After node1 is up, mdb#node1 can't connect to the dt, while mdb#node2 is connected.
jms1#dt messages current=0, messages total = 0, consumers current = 0
jms2#dt messages current=0, messages total = 200, consumers current = 1
Sent in 100 more messages; I see that all 100 are processed on node2 and discarded on node1.
jms1#dt messages current=0, messages total = 100, consumers current = 0
jms2#dt messages current=0, messages total = 300, consumers current = 1
Now I reboot node2, mdb#node1 reconnects to dt. After reboot mdb#node2 reconnects to dt and mdb#node1 gets disconnected from dt.
jms1#dt messages current=0, messages total = 100, consumers current = 1
jms2#dt messages current=0, messages total = 0, consumers current = 1
I send in 100 messages, all are processed on node2 and stored in the topic on node1:
jms1#dt messages current=100, messages total = 200, consumers current = 1
jms2#dt messages current=0, messages total = 0, consumers current = 1
Then I shut down node2 and I see 100 "pending messages" being processed on node1 after mdb#node1 reconnects to the topic.
So the result is:
I sent 400 messages, 700 were processed by MDB out of which 300 were duplicates.
It looks like the MDB reconnection works as expected, but messages may be duplicated if the node hosting the "active" MDB goes down.
This might be a bug or a feature of the WebLogic JMS implementation.
I haven't used Weblogic, but most JMS solutions have the concept of Queues and Topics. You want a JMS Topic. Subscribers register and the topic ensures that the message is delivered to each subscriber once.
Configuration details.
Update: If you are running into issues in a clustered environment, I would make sure that everything is configured properly (here is a guide for JMS Topic Clustering). It definitely sounds strange that WebLogic would fail so miserably when clustering. If that does not work, you could look into a 3rd-party messaging queue, such as RabbitMQ, which supports JMS and will definitely not have this issue.
This is the kind of behaviour that an ESB implementation should enable. In terms of processing overhead there would be no great difference, but it can be useful to have separation of concerns between "plumbing" and application code.
As it happens, the WebSphere JMS implementation has support for installing mediations that address such requirements. I don't know whether WebLogic has something similar, or whether their associated ESB products are an option for you, but I would recommend investigating those capabilities. You currently have a simple requirement, and your code is surely sufficient; however, it's quite easy to imagine a few minor additional requirements (could we just convert this field from dollars to pounds before we transmit to that destination, could we not send messages with this content to that destination ...) and lo! you find yourself writing your own ESB.
[...] Thus every message will be delivered twice to System1 and twice to System2 and if there'll be 8 servers in a cluster, each message will be delivered 8 times. This is what I'd really like to avoid...
This is right for non-durable subscriptions, but not for durable. For durable, all MDBs share the same connection-id and subscription-id (based on the MDB name by default), so only one MDB will be able to attach and receive messages at a time. The first MDB to try will succeed in connecting, the others will detect a conflict and fail, but keep retrying. So using a durable topic subscription should do the trick.
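Illustrative only: a durably-subscribed MDB in EJB 3 annotation form (subscriptionDurability is a standard activation property; on WebLogic 10 the shared client id is typically supplied via the jms-client-id element in weblogic-ejb-jar.xml, as in the test above):

import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.Message;
import javax.jms.MessageListener;

@MessageDriven(mappedName = "jms/dt", activationConfig = {
        @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Topic"),
        @ActivationConfigProperty(propertyName = "subscriptionDurability", propertyValue = "Durable")
})
public class System1TopicMDB implements MessageListener {
    public void onMessage(Message message) {
        // deliver to System1 ...
    }
}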
