I'm using Kafka, and we have a use case to build a fault-tolerant system where not even a single message should be missed. So here's the problem:
If publishing to Kafka fails for any reason (ZooKeeper down, Kafka broker down, etc.), how can we robustly handle those messages and replay them once things are back up? As I say, we cannot afford to lose even a single message.
Another requirement is to know, at any given point in time, how many messages failed to publish to Kafka for any reason, i.e. something like counter functionality, so that those messages can then be re-published.
One possible solution is to push those messages to some database (like Cassandra, where writes are very fast, but we also need counter functionality, and I gather Cassandra's counter functionality is not that great, so we don't want to use it) that can handle that kind of load and also give us a very accurate counter facility.
This question is more from an architecture perspective, and then which technology to use to make that happen.
PS: We handle somewhere around 3,000 TPS, so when the system starts failing, the failed messages can grow very fast in a very short time. We're using Java-based frameworks.
Thanks for your help!
The reason Kafka was built in a distributed, fault-tolerant way is to handle exactly the problems you describe: multiple failures of core components should not cause service interruptions. To avoid ZooKeeper going down, deploy at least three ZooKeeper instances (if this is in AWS, deploy them across availability zones). To avoid broker failures, deploy multiple brokers, and make sure you specify multiple brokers in your producer's bootstrap.servers property. To ensure that the Kafka cluster has written your message in a durable manner, set the acks=all property in the producer. This acknowledges a client write only once all in-sync replicas have acknowledged reception of the message (at the expense of throughput). You can also set queuing limits so that if writes to the broker start backing up, you can catch the exception, handle it, and possibly retry.
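For illustration, here is a minimal sketch of a producer configured along those lines. The broker list, topic name, and timeout value are placeholders, not recommendations:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.TimeoutException;

public class DurableProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // List several brokers so a single broker failure does not break bootstrapping
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092,broker3:9092");
        // Wait for all in-sync replicas before a write is acknowledged
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // Fail fast instead of blocking forever when the local send buffer is full
        props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, "10000");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "value"),
                (metadata, exception) -> {
                    if (exception != null) {
                        // Broker rejected or never acknowledged the write: retry or stage the record
                        exception.printStackTrace();
                    }
                });
        } catch (TimeoutException e) {
            // send() blocked longer than max.block.ms because writes are backing up
            e.printStackTrace();
        }
    }
}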
Using Cassandra (another well-thought-out distributed, fault-tolerant system) to "stage" your writes doesn't seem to add any reliability to your architecture, but it does increase the complexity. Besides, Cassandra wasn't written to be a message queue for a message queue; I would avoid this.
Properly configured, Kafka should be able to handle all your message writes and provide suitable guarantees.
I am super late to the party. But I see something missing in the above answers :)
The strategy of choosing a distributed system like Cassandra is a decent idea: once Kafka is back up and healthy, you can retry all the messages that were staged there. A rough sketch of that stage-and-replay idea follows.
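All class and method names below are hypothetical; a real implementation would persist to Cassandra (or another durable store) rather than memory:

import java.util.concurrent.ConcurrentLinkedQueue;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Hypothetical stage-and-replay helper: failed records are staged here from the
// producer's error callback and re-published once Kafka is healthy again.
public class FailedMessageBuffer {
    private final ConcurrentLinkedQueue<ProducerRecord<String, String>> failed =
            new ConcurrentLinkedQueue<>();

    // Called from the producer's error callback
    public void stage(ProducerRecord<String, String> record) {
        failed.add(record);
    }

    // The "how many messages failed so far" counter from the question
    public int pendingCount() {
        return failed.size();
    }

    // Drain the staged records back into the producer once Kafka recovers
    public void replay(KafkaProducer<String, String> producer) {
        ProducerRecord<String, String> record;
        while ((record = failed.poll()) != null) {
            producer.send(record);
        }
    }
}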
I would like to address the part about "knowing how many messages failed to publish at a given time".
From the tags, I see that you are using apache-kafka and kafka-consumer-api. You can write a custom callback for your producer, and this callback can tell you whether the message failed or was published successfully. On failure, log the metadata for the message.
Now, you can use log-analyzing tools to analyze your failures. One such decent tool is Splunk.
Below is a small code snippet that illustrates the callback I was talking about:
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ProduceToKafka {

  private static final Logger log = LoggerFactory.getLogger(ProduceToKafka.class);

  // TracerBulletProducer class holds the producer properties
  private final KafkaProducer<String, String> myProducer =
      TracerBulletProducer.createProducer();

  public void publishMessage(String payload) {
    ProducerRecord<String, String> message =
        new ProducerRecord<>("topicName", payload);
    // The callback fires once the broker acknowledges or rejects the record
    myProducer.send(message, new MyCallback(message.key(), message.value()));
  }

  class MyCallback implements Callback {

    private final String key;
    private final String value;

    public MyCallback(String key, String value) {
      this.key = key;
      this.value = value;
    }

    @Override
    public void onCompletion(RecordMetadata metadata, Exception exception) {
      if (exception == null) {
        log.info("--------> All good !!");
      } else {
        // Log the failure and the record's metadata so the message can be replayed later
        log.info("--------> not so good !!");
        if (metadata != null) {
          log.info(metadata.toString());
          log.info("" + metadata.serializedValueSize());
        }
        log.info(exception.getMessage());
      }
    }
  }
}
If you analyze the number of "--------> not so good !!" logs per time unit, you can get the required insights.
Godspeed!
Chris has already covered how to keep the system fault tolerant.
Kafka by default supports at-least-once message delivery semantics: if something goes wrong while it is sending a message, it will try to resend it.
When you create your Kafka producer properties, you can configure this by setting the retries option to a value greater than 0.
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:4242");
props.put("acks", "all");
props.put("retries", 0);
props.put("batch.size", 16384);
props.put("linger.ms", 1);
props.put("buffer.memory", 33554432);
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
Producer<String, String> producer = new KafkaProducer<>(props);
For more info check this.
From the examples, I have seen the below code snippet, and it works fine. But the problem is that I don't always have a requirement to process an input stream and produce it to a sink.
What if I have an application where, based on some events, I have to publish to a Kafka topic only, so that downstream applications can make certain decisions? That means I don't really have an input stream; I just know that when something happens in my application, I need to publish a message to a particular Kafka topic. That is, I only need a sink.
I was going through the examples but didn't find anything matching my requirements. Is there a way to configure just a KafkaSink that exposes a method to be called for publishing messages to a topic?
Many thanks in advance!!
String inputTopic = "flink_input";
String outputTopic = "flink_output";
String consumerGroup = "baeldung";
String address = "localhost:9092";
StreamExecutionEnvironment environment = StreamExecutionEnvironment
.getExecutionEnvironment();
FlinkKafkaConsumer011<String> flinkKafkaConsumer = createStringConsumerForTopic(
inputTopic, address, consumerGroup);
DataStream<String> stringInputStream = environment
.addSource(flinkKafkaConsumer);
FlinkKafkaProducer011<String> flinkKafkaProducer = createStringProducer(
outputTopic, address);
stringInputStream
.map(new WordsCapitalizer())
.addSink(flinkKafkaProducer);
You must have a source. You might want to implement a custom source, or you could use something like a NumberSequenceSource followed by an operator such as a process function that emits whatever you want to write to the sink, followed by the sink.
That process function could, for example, transform the incoming events into whatever you want to write to Kafka, or it could ignore its inputs and use a timer to generate the events to be sent to Kafka.
Or you might find that async i/o is a better building block than a process function, depending on your requirements.
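As a rough sketch of the first suggestion, assuming a recent Flink release with the newer KafkaSink API (the broker address, topic name, and event generation below are placeholders; the question's FlinkKafkaProducer011 could be wired in the same way via addSink):

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.connector.source.lib.NumberSequenceSource;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SinkOnlyJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        KafkaSink<String> sink = KafkaSink.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("flink_output")
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                .build();

        // A trivial source whose elements are ignored; the map operator stands in
        // for whatever turns application events into the records to publish.
        env.fromSource(new NumberSequenceSource(0, Long.MAX_VALUE),
                        WatermarkStrategy.noWatermarks(), "ticks")
                .map(i -> "application-event-" + i)
                .returns(Types.STRING)
                .sinkTo(sink);

        env.execute("sink-only-job");
    }
}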
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;
import org.springframework.kafka.support.serializer.JsonSerializer;

@Configuration
public class KafkaConfiguration {
    @Value("${kafka.boot.server}")
    private String kafkaServer;

    @Bean
    public KafkaTemplate<String, String> kafkaTemplate() {
        return new KafkaTemplate<>(producerConfig());
    }

    @Bean
    public ProducerFactory<String, String> producerConfig() {
        Map<String, Object> config = new HashMap<>();
        config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaServer);
        config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, JsonSerializer.class);
        return new DefaultKafkaProducerFactory<>(config);
    }
}
What are the prerequisites for Kafka? What do you suggest for publishing messages? What other ways are possible?
delivery.timeout.ms:
If your use case involves bursts of many events in a short time, this value should be higher, because when the network is busy your client will complain about NetworkException; increasing it means you will see fewer NetworkExceptions.
For background on what delivery.timeout.ms does, see:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-91+Provide+Intuitive+User+Timeouts+in+The+Producer
acks: If you can afford no data loss, you have to set it to all. The default is 1, meaning
the leader will write the record to its local log but will respond without awaiting full acknowledgement from all followers. In that case, should the leader fail immediately after acknowledging the record but before the followers have replicated it, the record will be lost.
retries: This depends on your Kafka client version. The default is now Integer.MAX_VALUE,
but on earlier versions you will want to set retries to a higher value so your producer does not stop over one simple transient exception, such as the leader partition being unreachable.
Exactly-once: If your app requires exactly-once semantics, you have to look at enable.idempotence and transactional.id.
Note that each config mentioned here has a corresponding constant in the Java client's ProducerConfig class; a sketch follows after the reference link below.
Further reference of producer settings:
https://docs.confluent.io/current/installation/configuration/producer-configs.html
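To make that concrete, here is a small sketch (the values are illustrative, not recommendations, and the transactional id is a placeholder) showing how these settings map onto the ProducerConfig constants:

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class ProducerSettings {
    public static Properties durableProducerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        // delivery.timeout.ms: give bursty traffic more time before a send is failed
        props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 300_000);
        // acks=all: wait for the full set of in-sync replicas, avoiding data loss
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // retries: keep retrying transient errors such as an unreachable leader
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
        // exactly-once building blocks: idempotence plus a transactional id
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "my-transactional-id");
        return props;
    }
}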
I'm writing an application with Spring Boot, so to write to Kafka I do:
@Autowired
private KafkaTemplate<String, String> kafkaTemplate;
and then inside my method:
kafkaTemplate.send(topic, data)
But I feel like I'm just trusting this to work. How can I know whether it has worked? If it's asynchronous, is it good practice to return a 200 code and hope it worked? I'm confused. If Kafka isn't available, won't this fail? Shouldn't I be prompted to catch an exception?
Along with what @mjuarez has mentioned, you can try playing with two Kafka producer properties. One is ProducerConfig.ACKS_CONFIG, which lets you set the level of acknowledgement that you think is safe for your use case. This knob has three possible values. From the Kafka docs:
acks=0: Producer doesn't care about acknowledgement from server, and considers it as sent.
acks=1: This will mean the leader will write the record to its local log but will respond without awaiting full acknowledgement from all followers.
acks=all: This means the leader will wait for the full set of in-sync replicas to acknowledge the record.
The other property is ProducerConfig.RETRIES_CONFIG. Setting a value greater than zero will cause the client to resend any record whose send fails with a potentially transient error.
Yes, if Kafka is not available, that .send() call will fail, but if you send it asynchronously, no one will be notified. You can specify a callback to be executed when the future finally finishes. The full interface spec is here: https://kafka.apache.org/20/javadoc/org/apache/kafka/clients/producer/Callback.html
From the official Kafka javadoc here: https://kafka.apache.org/20/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html
Fully non-blocking usage can make use of the Callback parameter to
provide a callback that will be invoked when the request is complete.
ProducerRecord<byte[],byte[]> record = new ProducerRecord<byte[],byte[]>("the-topic", key, value);
producer.send(record,
new Callback() {
public void onCompletion(RecordMetadata metadata, Exception e) {
if(e != null) {
e.printStackTrace();
} else {
System.out.println("The offset of the record we just sent is: " + metadata.offset());
}
}
});
You can run the console consumer below while sending messages to Kafka:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic topic-name
While the above command is running, run your code; if the messages are sent successfully, they will be printed on the console.
Furthermore, as with any other connection to any resource, if the connection cannot be established, any kind of operation will raise an exception.
I have written a simple program to read data from Kafka and print it in Flink. Below is the code.
public static void main(String[] args) throws Exception {
Options flinkPipelineOptions = PipelineOptionsFactory.create().as(Options.class);
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
Class<?> unmodColl = Class.forName("java.util.Collections$UnmodifiableCollection");
env.getConfig().addDefaultKryoSerializer(unmodColl, UnmodifiableCollectionsSerializer.class);
env.enableCheckpointing(1000, CheckpointingMode.EXACTLY_ONCE);
flinkPipelineOptions.setJobName("MyFlinkTest");
flinkPipelineOptions.setStreaming(true);
flinkPipelineOptions.setCheckpointingInterval(1000L);
flinkPipelineOptions.setNumberOfExecutionRetries(5);
flinkPipelineOptions.setExecutionRetryDelay(3000L);
Properties p = new Properties();
p.setProperty("zookeeper.connect", "localhost:2181");
p.setProperty("bootstrap.servers", "localhost:9092");
p.setProperty("group.id", "test");
FlinkKafkaConsumer09<Notification> kafkaConsumer = new FlinkKafkaConsumer09<>("testFlink",new ProtoDeserializer(),p);
DataStream<Notification> input = env.addSource(kafkaConsumer);
input.rebalance().map(new MapFunction<Notification, String>() {
@Override
public String map(Notification value) throws Exception {
return "Kafka and Flink says: " + value.toString();
}
}).print();
env.execute();
}
I need Flink to process my data from Kafka exactly once, and I have a few questions on how that can be done.
When does FlinkKafkaConsumer09 commit the processed offsets to Kafka?
Say my topic has 10 messages and the consumer processes all 10. When I stop the job and start it again, it starts processing seemingly random messages from the set of previously read messages. I need to ensure none of my messages are processed twice.
Please advise. I appreciate all the help. Thanks.
This page describes the fault tolerance guarantees of the Flink Kafka connector.
You can use Flink's savepoints to re-start a job in an exactly-once (state preserving) manner.
The reason why you are seeing the messages again is because the offsets committed by Flink to the Kafka broker / Zookeeper are not in line with Flink's registered state.
You'll always see messages processed multiple times after restore/failure in Flink, even with exactly-once semantics enabled. The exactly-once guarantees in Flink are with respect to registered state, not for the records sent to the operators.
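For reference, the savepoint workflow mentioned above looks roughly like this from the Flink CLI (the job ID, savepoint directory, and jar name are placeholders):

bin/flink savepoint <jobId> [savepointDirectory]
bin/flink run -s <savepointPath> your-job.jar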
Slightly off-topic: What are these lines for? They are not passed to Flink anywhere.
Options flinkPipelineOptions = PipelineOptionsFactory.create().as(Options.class);
flinkPipelineOptions.setJobName("MyFlinkTest");
flinkPipelineOptions.setStreaming(true);
flinkPipelineOptions.setCheckpointingInterval(1000L);
flinkPipelineOptions.setNumberOfExecutionRetries(5);
flinkPipelineOptions.setExecutionRetryDelay(3000L);
I am using Kafka producer 0.8.2, and I am trying to send a single message to the topic in a way that the message is sent immediately. I have a console consumer to observe whether the message arrives. I notice that the message is not sent immediately, unless of course I run producer.close() immediately after sending, which isn't what I would like to do.
What is the correct producer configuration setting to target this? I'm using the following (I'm aware that it looks like a mess of different configurations/versions, but I simply cannot find something that's working as I would expect in the documentation):
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokersStr);
props.put(ProducerConfig.RETRIES_CONFIG, "3");
props.put("producer.type", "sync");
props.put("batch.num.messages", "1");
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "none");
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 1);
props.put(ProducerConfig.BLOCK_ON_BUFFER_FULL_CONFIG, true);
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
I found a solution, which seems reasonable, and involves calling get() on the Future returned by the producer's send() command. I changed the send command from:
producer.send(record);
to the following:
producer.send(record).get();
It would be nice to hear from more experienced Kafka users whether there are any issues with that approach. Also, I would be interested to learn if there is a configuration setting for the producer to achieve the same thing (that is, send a single message immediately without calling get() on the Future).
Old post, but I have struggled way too much not to add a post here.
I stumbled upon the same behavior trying to run the Kafka examples, and this .get() was the only thing that got the messages to Kafka. The Javadoc for KafkaProducer.send(…) states that this method is asynchronous. In my test code, the message was handed off while my code continued to run, and the process reached the end of its run and terminated before the message was actually sent by the Future.
So this .get() simply blocks on the Future until it completes, which effectively removes the benefits of the Future. Another way to keep the process alive long enough is to wait a bit with a Thread.sleep(…) right after the .send(…) (depending on your use case).
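One hedge worth knowing if you do block: an unbounded get() can wait indefinitely if the broker never answers, whereas Future.get(timeout, unit) lets you bound the wait. A small sketch (the 10-second value is arbitrary):

import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class BoundedBlockingSend {
    // Send one record and wait at most 10 seconds for the broker's acknowledgement,
    // throwing java.util.concurrent.TimeoutException if it takes longer.
    static RecordMetadata sendAndWait(KafkaProducer<String, String> producer,
                                      ProducerRecord<String, String> record) throws Exception {
        return producer.send(record).get(10, TimeUnit.SECONDS);
    }
}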