Why Kafka KTable is missing entries?

Why Kafka KTable is missing entries? - java

I have a single instance java application that uses KTable from Kafka Streams. Until recently I could retrieve all data using KTable when suddenly some of the messages seemed to vanish. There should be ~33k messages with unique keys there.
When I want to retrieve messages by key I don't get some of the messages. I use ReadOnlyKeyValueStore to retrieve messages:
final ReadOnlyKeyValueStore<GenericRecord, GenericRecord> store = ((KafkaStreams)streams).store(storeName, QueryableStoreTypes.keyValueStore());
store.get(key);
These are the configuration settings I set to the KafkaStreams.
final Properties config = new Properties();
config.put(StreamsConfig.APPLICATION_SERVER_CONFIG, serverId);
config.put(StreamsConfig.APPLICATION_ID_CONFIG, applicationId);
config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
config.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
config.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, schemaRegistryUrl);
config.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, GenericAvroSerde.class);
config.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, GenericAvroSerde.class);
config.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);
Kafka: 0.10.2.0-cp1
Confluent: 3.2.0
Investigations brought me to some very worrying insights. Using REST Proxy I manually read partitions and found out that some offsets return error.
Request:
/topics/{topic}/partitions/{partition}/messages?offset={offset}
{
"error_code": 50002,
"message": "Kafka error: Fetch response contains an error code: 1"
}
No client, neither java nor command line however return any error. They just skip over the faulty missing messages resulting in missing data in KTables. Everything was fine and without notice it seems that somehow some of the messages got corrupt.
I have two brokers and all the topics have the replication factor of 2 and are fully replicated. Both brokers separately return the same. Restarting brokers makes no difference.
What could possibly be the cause?
How to detect this case in a client?

By default Kafka Broker config key cleanup.policy is set to delete. Set it to compact to keep the latest message for each key. See compaction.
Deletion of old messages does not change the minimum offset so trying to retrieve message below it causes an error. The error is very vague. The Kafka Streams client will start reading messages from minimum offset so there is no error. The only visible effect is missing data in KTables.
While the application is running thanks to the caches all data might still be available even after messages are deleted from Kafka itself. They will vanish after cleanup.

Related

Spring Cloud stream binder kafka DLQ Handling for Batching doesn't work

I am using spring-cloud-stream-binder-kafka version 3.2.6 in my application. I have enabled the batch-mode so that I should get the messages in batches. While handling DLQ scenario I observed that, In my batch if I have 3 messages and if my last message fail then entire batch goes to DLQ. Although I am throwing BatchListenerFailedException with the correct index of the message.
After debugging further I observed in Class org.springframework.cloud.stream.binder.kafka KafkaMessageChannelBinder method getErrorMessageHandler throws ClassCastException (Can not cast LinkedList to ConsumerRecord). Below is that line.
ConsumerRecord<Object, Object> record = StaticMessageHeaderAccessor.getSourceData(message);
Method StaticMessageHeaderAccessor.getSourceData(message) returns list of ConsumerRecord but in the code it is expecting single consumer record. Is this a bug? If not can somebody help me to find how to resolve this issue.
Appreciate your early response.
Thanks

Spring cloud stream kafka- manual polling

Need to trigger a consumption of particular topic on condition basis. How to do that in cloud stream Kafka?
Detailed Scenario:
We are processing messages from Kafka and updating in database. if DB is down/has any issues we are redirecting messages to a different topic.
Later when the DB is up again, we need to pause the actual consumption and poll the topic of failed messages and once those data are updated to DB then we need to resume the actual consumption.
We need to consider the order of messages here. Hence we took this approach.
Am currently trying with ConsumeFactory, but it's returning an empty collection.
Consumer<byte[], byte[]> consumer = consumerFactory.createConsumer("0", "consumer-
1");
consumer.subscribe(Arrays.asList("some-topic"));
ConsumerRecords<byte[], byte[]> poll = consumer.poll(Duration.ofMillis(10000));
poll.forEach(record -> {
log.info("record {}", record);
});
Could someone help here or please suggest if we have any other option to handle this?

How to handle kafka publishing failure in robust way

I'm using Kafka and we have a use case to build a fault tolerant system where not even a single message should be missed. So here's the problem:
If publishing to Kafka fails due to any reason (ZooKeeper down, Kafka broker down etc) how can we robustly handle those messages and replay them once things are back up again. Again as I say we cannot afford even a single message failure.
Another use case is we also need to know at any given point in time how many messages were failed to publish to Kafka due to any reason i.e. something like counter functionality and now those messages needs to be re-published again.
One of the solution is to push those messages to some database (like Cassandra where writes are very fast but we also need counter functionality and I guess Cassandra counter functionality is not that great and we don't want to use that.) which can handle that kind of load and also provide us with the counter facility which is very accurate.
This question is more from architecture perspective and then which technology to use to make that happen.
PS: We handle some where like 3000TPS. So when system start failing those failed messages can grow very fast in very short time. We're using java based frameworks.
Thanks for your help!

The reason Kafka was built in a distributed, fault-tolerant way is to handle problems exactly like yours, multiple failures of core components should avoid service interruptions. To avoid a down Zookeeper, deploy at least 3 instances of Zookeepers (if this is in AWS, deploy them across availability zones). To avoid broker failures, deploy multiple brokers, and ensure you're specifying multiple brokers in your producer bootstrap.servers property. To ensure that the Kafka cluster has written your message in a durable manor, ensure that the acks=all property is set in the producer. This will acknowledge a client write when all in-sync replicas acknowledge reception of the message (at the expense of throughput). You can also set queuing limits to ensure that if writes to the broker start backing up you can catch an exception and handle it and possibly retry.
Using Cassandra (another well thought out distributed, fault tolerant system) to "stage" your writes doesn't seem like it adds any reliability to your architecture, but does increase the complexity, plus Cassandra wasn't written to be a message queue for a message queue, I would avoid this.
Properly configured, Kafka should be available to handle all your message writes and provide suitable guarantees.

I am super late to the party. But I see something missing in above answers :)
The strategy of choosing some distributed system like Cassandra is a decent idea. Once the Kafka is up and normal, you can retry all the messages that were written into this.
I would like to answer on the part of "knowing how many messages failed to publish at a given time"
From the tags, I see that you are using apache-kafka and kafka-consumer-api.You can write a custom call back for your producer and this call back can tell you if the message has failed or successfully published. On failure, log the meta data for the message.
Now, you can use log analyzing tools to analyze your failures. One such decent tool is Splunk.
Below is a small code snippet than can explain better about the call back I was talking about:
public class ProduceToKafka {
private ProducerRecord<String, String> message = null;
// TracerBulletProducer class has producer properties
private KafkaProducer<String, String> myProducer = TracerBulletProducer
.createProducer();
public void publishMessage(String string) {
ProducerRecord<String, String> message = new ProducerRecord<>(
"topicName", string);
myProducer.send(message, new MyCallback(message.key(), message.value()));
}
class MyCallback implements Callback {
private final String key;
private final String value;
public MyCallback(String key, String value) {
this.key = key;
this.value = value;
}
#Override
public void onCompletion(RecordMetadata metadata, Exception exception) {
if (exception == null) {
log.info("--------> All good !!");
} else {
log.info("--------> not so good !!");
log.info(metadata.toString());
log.info("" + metadata.serializedValueSize());
log.info(exception.getMessage());
}
}
}
}
If you analyze the number of "--------> not so good !!" logs per time unit, you can get the required insights.
God speed !

Chris already told about how to keep the system fault tolerant.
Kafka by default supports at-least once message delivery semantics, it means when it try to send a message something happens, it will try to resend it.
When you create a Kafka Producer properties, you can configure this by setting retries option more than 0.
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:4242");
props.put("acks", "all");
props.put("retries", 0);
props.put("batch.size", 16384);
props.put("linger.ms", 1);
props.put("buffer.memory", 33554432);
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
Producer<String, String> producer = new KafkaProducer<>(props);
For more info check this.

How to check delayed/scheduled messages in RabbitMQ's Mnesia

I was checking for some alternatives for Quartz-scheduler.
Though this is not a complete replacement, I was trying out RabbitMQ Delayed Messages Plugin (suits for my use-case).
I was able to get the scheduling work but I was not to view the messages which are delayed(which are stored in Mnesia).
Is there a way to check the messages and/or number of messages in Mnesia?
Edit : I inferred that the messages are stored in Mnesia from the comment from here.

There is no way to check the messages that RabbitMQ is persisting in it's mnesia database.
RabbitMQ is not a generalized datastore. It is a purpose-built message broker and queueing system. The datastore it has in it is there to facilitate the persistence of messages, not to be queried and used as if it were a database on it's own.

To view the data inside MNESIA you could :
Write a simple Erlang program as this, as result you have:
(rabbit#gabrieles-MBP)5>
load:traverse_table_and_show('rabbit_delayed_messagerabbit#gabrieles-MBP').
{delay_entry,
{delay_key,1442258857832,
{exchange,
{resource,<<"/">>,exchange,<<"my-exchange">>},
'x-delayed-message',true,false,false,
[{<<"x-delayed-type">>,longstr,<<"direct">>}],
undefined,undefined, {[],[]}}},
{delivery,false,false,<0.2008.0>,
{basic_message,
{resource,<<"/">>,exchange,<<"my-exchange">>},
[<<>>],
{content,60,
{'P_basic',undefined,undefined,
[{<<"x-delay">>,signedint,100000}],
undefined,undefined,undefined,undefined,undefined,
undefined,undefined,undefined,undefined,undefined,
undefined},
..
OR in this way:
execute an Erlang shell session using:
erl -set-cookie ABCDEFGHI -sname monitorNode#gabrielesMBP
you have to use the same cookie that rabbitmq are using.
Typically $(HOME).erlang.cookie
execute this command:observer:start().
and you should have this:
Once you are connected to rabbitmq node open Table Viewer and from the menu Mnesia table as:
Here you can see your data:

Kafka producer config to send a message immediately

I am using Kafka producer 0.8.2 and I am trying to send a single message to the topic, in a way that the message is sent immediately. I have a console consumer to observe if the message arrives. I notice that the message is not sent immediately, unless of course I run producer.close(), immediately after sending, which isn't what I would like to do.
What is the correct producer configuration setting to target this? I'm using the following (I'm aware that it looks like a mess of different configurations/versions, but I simply cannot find something that's working as I would expect in the documentation):
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokersStr);
props.put(ProducerConfig.RETRIES_CONFIG, "3");
props.put("producer.type", "sync");
props.put("batch.num.messages", "1");
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "none");
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 1);
props.put(ProducerConfig.BLOCK_ON_BUFFER_FULL_CONFIG, true);
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");

I found a solution, which seems reasonable, and involves running get() on the Future returned by the Producer's send() command. I changed the send command from:
producer.send(record);
to the following:
producer.send(record).get();
It would be nice to hear from the more experienced Kafka users if there are any issues with that approach? Also, I would be interested to learn if there is a configuration setting for the Producer to achieve the same thing (that is, send a single message immediately without running get() of the Future).

Old post but I have struggled way to much to miss a post here.
I stumbled upon the same behavior trying to run the Kafka examples and this .get() was the only thing that got the messages to Kafka. The Javadoc for KafkaProducer.send(…) states this method is asynchronous. On my test code, the message was thus sent to Kafka while my code continued to run and actually just got to the end of the run and terminated before the message was actually sent inside the Future.
So this .get() just blocks on the Future until it is realized. This actually removes the benefits of the Future. A cleaner way to do it could be to wait a bit with a Thread.sleep(…) right after the .send(…) (depends on your use case).

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Why Kafka KTable is missing entries? - java

Related

Spring Cloud stream binder kafka DLQ Handling for Batching doesn't work

Spring cloud stream kafka- manual polling

How to handle kafka publishing failure in robust way

How to check delayed/scheduled messages in RabbitMQ's Mnesia

Kafka producer config to send a message immediately

Categories

Resources