I am a Reactor newbie. I am trying to develop the following application logic:
Read messages from a Kafka topic source.
Transform the messages.
Write a subset of the transformed messages to a new Kafka topic target.
Explicitly acknowledge the reading operation for all the messages originally read from topic source.
The only solution I found is to rewrite the above business logic as follows.
Read messages from a Kafka topic source.
Transform the messages.
Immediately acknowledge the messages that will not be written to topic target.
Filter out the messages acknowledged above.
Write the rest of the transformed messages to the new Kafka topic target.
Explicitly acknowledge the reading operation for these messages.
The code implementing the second logic is the following:
receiver.receive()
    .flatMap(this::processMessage)
    .map(this::acknowledgeMessagesNotToWriteInKafka)
    .filter(this::isMessageToWriteInKafka)
    .as(this::sendToKafka)
    .doOnNext(r -> r.correlationMetadata().acknowledge());
The receiver is a KafkaReceiver, and the method sendToKafka uses a KafkaSender. One of the things I don't like is that I am using a map to acknowledge some messages.
Is there any better solution to implement the original logic?
This is not exactly your four business logic steps, but I think it's a little bit closer to what you want.
You could acknowledge the "discarded" messages that won't be written in .doOnDiscard after .filter...
receiver.receive()
    .flatMap(this::processMessage)
    .filter(this::isMessageToWriteInKafka)
    .doOnDiscard(ReceiverRecord.class, record -> record.receiverOffset().acknowledge())
    .as(this::sendToKafka)
    .doOnNext(r -> r.correlationMetadata().acknowledge());
Note: you'll need to use the proper object type that was discarded. I don't know what type of object the Publisher returned from processMessage emits, but I assume you can get the ReceiverRecord or ReceiverOffset from it in order to acknowledge it.
Alternatively, you could combine filter/doOnDiscard into a single .handle operator...
receiver.receive()
    .flatMap(this::processMessage)
    .handle((m, sink) -> {
        if (isMessageToWriteInKafka(m)) {
            sink.next(m);
        } else {
            // getReceiverRecord() stands for however your processed type exposes the original record
            m.getReceiverRecord().receiverOffset().acknowledge();
        }
    })
    .as(this::sendToKafka)
    .doOnNext(r -> r.correlationMetadata().acknowledge());
I found the code snippet below in the examples, and it works fine. The problem is that I don't always need to process an input stream and produce it to a sink.
What if I have an application where, based on some events, I only have to publish to a Kafka topic so that downstream applications can make certain decisions? That means I don't really have an input stream; I just know that when something happens in my application, I need to publish a message to a particular Kafka topic. That is, I only need a sink.
I was going through the examples but didn't find anything matching my requirements. Is there a way to configure only a KafkaSink that exposes a method to be called for publishing messages to a topic?
Many thanks in advance!!
String inputTopic = "flink_input";
String outputTopic = "flink_output";
String consumerGroup = "baeldung";
String address = "localhost:9092";
StreamExecutionEnvironment environment = StreamExecutionEnvironment
    .getExecutionEnvironment();
FlinkKafkaConsumer011<String> flinkKafkaConsumer = createStringConsumerForTopic(
    inputTopic, address, consumerGroup);
DataStream<String> stringInputStream = environment
    .addSource(flinkKafkaConsumer);
FlinkKafkaProducer011<String> flinkKafkaProducer = createStringProducer(
    outputTopic, address);
stringInputStream
    .map(new WordsCapitalizer())
    .addSink(flinkKafkaProducer);
You must have a source. You might want to implement a custom source, or you could use something like a NumberSequenceSource followed by an operator like a process function that emits whatever you know you want to write to the sink, followed by the sink.
That process function could, for example, transform the incoming events into whatever you want to write to Kafka, or it could ignore its inputs and use a timer to generate the events to be sent to Kafka.
Or you might find that async i/o is a better building block than a process function, depending on your requirements.
I am implementing a Spring Boot application in Java, using Spring Cloud Stream with the Kafka Streams binder.
I need to implement a blocking operation inside a KStream map method, like so:
public Consumer<KStream<?, ?>> sink() {
    return input -> input
        .mapValues(value -> methodReturningCompletableFuture(value).get())
        .foreach((key, value) -> otherMethod(key, value));
}
completableFuture.get() throws checked exceptions (InterruptedException, ExecutionException).
How can I handle these exceptions so that the chained method doesn't get executed and the Kafka message is not acknowledged? I cannot afford message loss, and sending the message to a dead letter topic is not an option.
Is there a better way of blocking inside map()?
You can try the branching feature in Kafka Streams to control the execution of the chained methods. For example, here is some pseudo-code that you can try.
You can possibly use this as a starting point and adapt this to your particular use case.
final Map<String, ? extends KStream<?, String>> branches =
    input.split()
        .branch((k, v) -> {
            try {
                // Block on the future; any failure routes the record away from "good-records"
                methodReturningCompletableFuture(v).get();
                return true;
            }
            catch (Exception e) {
                return false;
            }
        }, Branched.as("good-records"))
        .defaultBranch();
final KStream<?, String> kStream = branches.get("good-records");
kStream.foreach((key, value) -> otherMethod(key, value));
The idea here is that you will only send the records that didn't throw an exception to the named branch good-records, everything else goes into a default branch which we simply ignore in this pseudo-code. Then you invoke additional chained methods (as this foreach call shows) only for those "good" records.
This does not solve the problem of not acknowledging the message after an exception is thrown. That seems to be a bit challenging. However, I am curious about that use case. When an exception happens and you handle it, why don't you want to ack the message? The requirements seem to be a bit rigid without using a DLT. The ideal solution here is to introduce some retries and, once the retries are exhausted, send the record to a DLT, which lets the Kafka Streams consumer acknowledge the message. Then the application moves on to the next offset.
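The retry-then-DLT idea can be sketched in plain Java, independent of Kafka Streams. This is only a minimal sketch under assumptions: the class name, processWithRetries, and the processor and deadLetter callbacks are hypothetical stand-ins for the application's own processing call and DLT producer, and this is not how the binder implements retries.

```java
import java.util.function.Consumer;

final class RetryThenDeadLetter {

    private RetryThenDeadLetter() {}

    /**
     * Calls processor up to maxAttempts times. If every attempt throws,
     * the message is handed to deadLetter instead.
     * Returns true when processing succeeded, false when it was dead-lettered.
     */
    static <T> boolean processWithRetries(
            T message, int maxAttempts, Consumer<T> processor, Consumer<T> deadLetter) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                processor.accept(message);
                return true; // success: nothing goes to the dead letter sink
            } catch (RuntimeException e) {
                // Swallow and retry; a real application would likely log and back off here.
            }
        }
        deadLetter.accept(message); // retries exhausted (or maxAttempts < 1)
        return false;
    }
}
```

Either way the method returns normally, so the caller can move on to the next record; the return value only says which path the message took.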
The call methodReturningCompletableFuture(value).get() blocks until the future completes, assuming that methodReturningCompletableFuture() returns a Future object. The no-argument get() has no timeout of its own; get(timeout, unit) bounds the wait. Therefore, that is already a reasonable way to wait inside the KStream map operation. I don't think anything else is necessary to make it wait further.
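As a minimal sketch of handling the checked exceptions from a blocking wait, assuming a bounded wait is acceptable: the helper below uses only the JDK's CompletableFuture.get(timeout, unit). The class name BoundedWait and the method await are hypothetical, and collapsing every failure into an empty Optional is a design choice for illustration, not a recommendation for every pipeline.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class BoundedWait {

    private BoundedWait() {}

    /**
     * Blocks for at most the given timeout and returns the value, or an empty
     * Optional if the future failed, timed out, or the thread was interrupted.
     * A future that completes with null also yields an empty Optional.
     */
    static <T> Optional<T> await(CompletableFuture<T> future, long timeout, TimeUnit unit) {
        try {
            return Optional.ofNullable(future.get(timeout, unit));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag for the caller
            return Optional.empty();
        } catch (ExecutionException | TimeoutException e) {
            // ExecutionException wraps the failure of the async computation;
            // TimeoutException means it did not finish within the timeout.
            return Optional.empty();
        }
    }
}
```

A predicate such as the one passed to branch could then test whether the returned Optional is present instead of catching Exception directly.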
Just getting my head around the Mutiny API (and the Java Stream API)...
I have the following code that reads messages off an AWS SQS queue, ref: quarkus sqs guide
Uni<List<Quark>> result = Uni.createFrom()
    .completionStage(sqs.receiveMessage(m -> m.maxNumberOfMessages(10).queueUrl(queueUrl)))
    .onItem().transform(ReceiveMessageResponse::messages)
    .onItem().transform(m -> m.stream().map(Message::body).map(this::toQuark).collect(Collectors.toList()));
Next I want to send each element in the list to a method handleMessage(Quark quark). How do I do this in a "Mutiny way"? Do I need to transform again, or should I not collect, or something else?
At the moment, you get a Uni<List<Quark>>. The Mutiny way would be to transform this into a Multi and process each item:
Multi<Quark> multi = result.onItem().transformToMulti(list -> Multi.createFrom().iterable(list));
A Multi is a stream. Each item will be a Quark. Then, you just need to do the following:
multi.onItem().invoke(q -> handleMessage(q));
I used invoke because I don't know what handleMessage is doing. If it's processing the Quark and returning something, use transform. If it does not return anything, use invoke.
BTW, do not forget to subscribe to the returned Multi.
Hello, I have this issue that I'm trying to solve. Basically, I have a Kafka Streams topology that reads JSON messages from a Kafka topic, and each message gets deserialized into a POJO. Ideally, it then checks that message for a certain boolean flag. If that flag is true, it does some transformation and then writes the message back to the topic. However, if the flag is false, I want it not to write anything, but I'm not sure how to go about it. With MP Reactive Messaging I can just use an RxJava 2 Flowable stream and return something like Flowable.empty(), but it seems I can't use that method here.
JsonbSerde<FinancialMessage> financialMessageSerde = new JsonbSerde<>(FinancialMessage.class);
StreamsBuilder builder = new StreamsBuilder();
builder.stream(
        TOPIC_NAME,
        Consumed.with(Serdes.Integer(), financialMessageSerde)
    )
    .mapValues(
        message -> checkCondition(message)
    )
    .to(
        TOPIC_NAME,
        Produced.with(Serdes.Integer(), financialMessageSerde)
    );
The below is the function call logic.
public FinancialMessage checkCondition(FinancialMessage rawMessage) {
    FinancialMessage receivedMessage = rawMessage;
    if (receivedMessage.compliance_services) {
        receivedMessage.compliance_services = false;
        return receivedMessage;
    }
    else return null;
}
If the boolean is false it just returns a JSON body with "null".
I've tried changing the return type of the checkCondition function wrapped like
public Flowable<FinancialMessage> checkCondition (FinancialMessage rawMessage)
And then having the return from the if be like Flowable.just(receivedMessage) or Flowable.empty() but I can't seem to serialize the Flowable object. This might be a silly question but is there a better way to go about this?
Note that Kafka messages are immutable and not deleted after read, and if you read/write from the same topic with a single application, a message would be processed infinitely often (or to be more precise different copies of it) if you don't have a condition to "break" the cycle.
Also, if for example 5 services read from the same topic, all 5 services get a copy of every event. And if one service writes back, the other 4 services and the writing service itself will read the message again. Thus, you get quite some data amplification.
If you have different services to react on the original input message consecutively, you could have one topic between each pair of consecutive services to really build a pipeline though.
Last, you say that if the boolean flag is true you want to transform the message and emit it (I assume for the next service to consume), and that for false you want to do nothing. I further assume that for a message only a single flag will be true, and that a successful transformation also switches the flag (to enable processing by the next service). For this case, it's best if you can ensure that each original input message has the same initial boolean flag set to build your pipeline. Thus, only the corresponding service will read messages with its boolean flag set (you don't even need to check the boolean flag, as your upstream write ensures that it's set; you could keep the check only as a sanity check).
If you don't know which boolean flag is set initially and all services read from the same input topic, just filtering out the message is correct. If all services read all messages, 4 services will filter the message while one service will process it and emit a new message with a different flag. For this architecture, a single topic might work: if a message is processed by all services and all boolean flags are false (after all services processed the message), and you write it back to the input topic, all services would drop the last copy correctly. However, using a single topic implies a lot of redundant reading/writing.
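Filtering out the message can be sketched as filter-then-transform, rather than returning null from the mapper. The sketch below uses java.util.stream so that it runs without a broker; in a Kafka Streams topology the analogous operators would be KStream#filter followed by KStream#mapValues. The FinancialMessage class here is a stand-in for the question's POJO, and its id field and the class name FilterThenTransform are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

final class FilterThenTransform {

    private FilterThenTransform() {}

    /** Stand-in for the question's POJO; the id field is hypothetical. */
    static final class FinancialMessage {
        final int id;
        final boolean compliance_services;

        FinancialMessage(int id, boolean complianceServices) {
            this.id = id;
            this.compliance_services = complianceServices;
        }
    }

    /** Keeps only flagged messages and returns copies with the flag cleared. */
    static List<FinancialMessage> messagesToWrite(List<FinancialMessage> input) {
        return input.stream()
                .filter(m -> m.compliance_services)           // drop unflagged messages entirely
                .map(m -> new FinancialMessage(m.id, false))  // transformation: clear the flag
                .collect(Collectors.toList());
    }
}
```

Because unflagged messages never reach the mapping step, nothing with a "null" body is produced for them.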
Maybe the best architecture is to have your original input topic and one additional input topic for each service. You also use an additional "dispatcher" service that reads from the original input topic and uses branch() to split the KStream into the service input topics according to the boolean flag. This way, each service will read only messages with the right flag set to true. Furthermore, each service will also use branch() after the message transformation to write the message to the input topic of the correct next service. Last, you would want an output topic that each service can write into after a message is fully processed.
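The routing decision made by such a dispatcher (and by each service after its transformation) can be sketched as a pure function. Everything named here is an assumption for illustration: the flags are modeled as a map from flag name to value, each service's input topic is assumed to be named "<flag>-input", and the final topic is assumed to be called "output". The sketch also relies on the assumption stated above that at most one flag is true per message.

```java
import java.util.Map;

final class DispatcherRouting {

    /** Hypothetical name of the topic for fully processed messages. */
    static final String OUTPUT_TOPIC = "output";

    private DispatcherRouting() {}

    /**
     * Returns the input topic of the service whose flag is set, or the
     * output topic when no flag is set (the message is fully processed).
     */
    static String targetTopic(Map<String, Boolean> flags) {
        for (Map.Entry<String, Boolean> flag : flags.entrySet()) {
            if (Boolean.TRUE.equals(flag.getValue())) {
                return flag.getKey() + "-input"; // hypothetical topic naming scheme
            }
        }
        return OUTPUT_TOPIC;
    }
}
```

In a topology, each branch predicate would correspond to one possible result of this function.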
I need to process a flux of messages from a Kafka topic using reactor-kafka. What I need to achieve are these steps:
Read messages from kafka topic as flux.
Try to send each message to external system.
In case of success log that event, in case of failure - send message to another topic.
Manually acknowledge processed messages (important: in order).
I found an example in the docs of building a reactive pipeline (just receiving messages from one topic and sending them to another) with manual acknowledgement of messages:
sender.send(KafkaReceiver.create(receiverOptions)
        .receive()
        .map(m -> SenderRecord.create(transform(m.value()), m.receiverOffset())))
    .doOnNext(m -> m.correlationMetadata().acknowledge());
But how can I send to another topic one part of processed messages and skip another part, but in the end make acknowledgements for all processed messages in order?