I've defined a class extending Processor and I'm using a KeyValueStore to temporarily store some messages before sending them to the sink topic. In particular, I receive a set of fragmented messages on a source topic and, once all of them are received, I have to assemble them and send the concatenated message to the sink topic. In the Processor.process() method, once I've sent the message with forward(), I want to remove it from the state store by means of the delete(K key) method.
In the processor I've created the state store with
StoreBuilder<KeyValueStore<byte[], String>> storeBuilder =
        Stores.keyValueStoreBuilder(
                Stores.persistentKeyValueStore("message-store"),
                Serdes.ByteArray(),
                Serdes.String());
The problem is that the removal does not happen: when I send another message with the same key, the value I read back still contains the previous messages.
Code:
kvStore = (KeyValueStore<byte[], String>) this.context.getStateStore("message-store");
kvStore.delete(key);
The put and the get of the state store work properly.
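For context, the process() flow is roughly the following (a simplified sketch; isComplete() is just a placeholder for my own completeness check):
@Override
public void process(byte[] key, String fragment) {
    String previous = kvStore.get(key);
    String assembled = (previous == null) ? fragment : previous + fragment;
    if (isComplete(assembled)) {             // placeholder for my own completeness check
        context.forward(key, assembled);     // send the concatenated message to the sink topic
        kvStore.delete(key);                 // this is the removal that does not seem to happen
    } else {
        kvStore.put(key, assembled);
    }
}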
Is there anything wrong with this approach?
Hello, I have an issue I'm trying to solve. Basically I have a Kafka Streams topology that reads JSON messages from a Kafka topic, and each message gets deserialized into a POJO. Then it checks that message for a certain boolean flag. If the flag is true, it does some transformation and writes the message back to the topic. However, if the flag is false, I want it not to write anything, but I'm not sure how to go about that. With MP Reactive Messaging I could just use an RxJava 2 Flowable stream and return something like Flowable.empty(), but it seems I can't use that approach here.
JsonbSerde<FinancialMessage> financialMessageSerde = new JsonbSerde<>(FinancialMessage.class);
StreamsBuilder builder = new StreamsBuilder();
builder.stream(
TOPIC_NAME,
Consumed.with(Serdes.Integer(), financialMessageSerde)
)
.mapValues (
message -> checkCondition(message)
)
.to (
TOPIC_NAME,
Produced.with(Serdes.Integer(), financialMessageSerde)
);
The below is the function call logic.
public FinancialMessage checkCondition(FinancialMessage rawMessage) {
FinancialMessage receivedMessage = rawMessage;
if (receivedMessage.compliance_services) {
receivedMessage.compliance_services = false;
return receivedMessage;
}
else return null;
}
If the boolean is false it just returns a JSON body with "null".
I've tried changing the return type of the checkCondition function to a wrapped type like
public Flowable<FinancialMessage> checkCondition (FinancialMessage rawMessage)
and then returning Flowable.just(receivedMessage) or Flowable.empty() from the if/else, but I can't seem to serialize the Flowable object. This might be a silly question, but is there a better way to go about this?
Note that Kafka messages are immutable and are not deleted after they are read; if you read from and write to the same topic with a single application, a message would be processed infinitely often (or, more precisely, different copies of it) unless you have a condition to "break" the cycle.
Also, if for example 5 services read from the same topic, all 5 services get a copy of every event. And if one service writes back, the other 4 services and the writing service itself will read the message again. Thus, you get quite some data amplification.
If you have different services that react to the original input message consecutively, you could have one topic between each pair of consecutive services to really build a pipeline though.
Last, you say that if the boolean flag is true you want to transform the message and emit it (I assume for the next service to consume), and for false you want to do nothing. I further assume that for a given message only a single flag will be true and that a successful transformation also switches the flag (to enable processing by the next service). For this case, it's best if you can ensure that each original input message has the same initial boolean flag set to build your pipeline. Then only the corresponding service will read messages with its boolean flag set (you don't even need to check the boolean flag, as the upstream write ensures that it's set; you could keep only a sanity check).
If you don't know which boolean flag is set initially and all services read from the same input topic, just filtering out the message is correct. If all services read all messages, 4 services will filter the message while one service will process it and emit a new message with a different flag. For this architecture, a single topic might work: if a message is processed by all services and all boolean flags are false (after all services processed the message), and you write it back to the input topic, all services would drop the last copy correctly. However, using a single topic implies a lot of redundant reading/writing.
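To just drop records whose flag is false, a plain filter() before the transformation is enough. A minimal sketch based on your topology (same topic names and serde, so treat it as illustrative rather than a finished implementation):
JsonbSerde<FinancialMessage> financialMessageSerde = new JsonbSerde<>(FinancialMessage.class);
StreamsBuilder builder = new StreamsBuilder();

builder.stream(TOPIC_NAME, Consumed.with(Serdes.Integer(), financialMessageSerde))
    // records with a false flag are dropped here, so nothing is written back for them
    .filter((key, message) -> message.compliance_services)
    // the transformation only runs for records that survived the filter
    .mapValues(message -> checkCondition(message))
    .to(TOPIC_NAME, Produced.with(Serdes.Integer(), financialMessageSerde));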
Maybe the best architecture is to have your original input topic and one additional input topic for each service. You also use an additional "dispatcher" service that reads from the original input topic and branches the KStream into the service input topics according to the boolean flag. This way, each service reads only messages with the right flag set to true. Furthermore, each service writes to the input topic of the next service, also using branch() after the message transformation, to route the message to the correct next service. Last, you would want an output topic that each service can write into after a message is fully processed.
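A rough sketch of such a dispatcher (the topic names and the second flag are made up for illustration; branch() is the pre-2.8 API, newer Kafka Streams versions use split() instead):
KStream<Integer, FinancialMessage>[] branches = builder
    .stream(ORIGINAL_INPUT_TOPIC, Consumed.with(Serdes.Integer(), financialMessageSerde))
    .branch(
        (key, msg) -> msg.compliance_services,   // routed to the compliance service's input topic
        (key, msg) -> msg.fraud_services,        // hypothetical second flag / service
        (key, msg) -> true                       // fallback for anything else
    );

branches[0].to("compliance-input", Produced.with(Serdes.Integer(), financialMessageSerde));
branches[1].to("fraud-input", Produced.with(Serdes.Integer(), financialMessageSerde));
branches[2].to("dispatcher-unmatched", Produced.with(Serdes.Integer(), financialMessageSerde));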
I have the following RabbitMQ consumer:
Consumer consumer = new DefaultConsumer(channel) {
@Override
public void handleDelivery(String consumerTag, Envelope envelope, AMQP.BasicProperties properties, byte[] body) throws IOException {
String message = new String(body, "UTF-8");
sendNotificationIntoTopic(message);
saveIntoDatabase(message);
}
};
The following situation can occur:
The message was sent to the topic successfully.
The connection to the database was lost, so the database insert failed.
As a result we have data inconsistency.
The expected result is that either both actions are executed successfully or neither is executed at all.
Any solutions for how I can achieve this?
P.S.
Currently I have the following idea (please comment on it):
We can assume that the broker doesn't lose any messages.
We have to be subscribed to the topic we want to send to.
1. Save an entry into the database and set the status field to 'pending'.
2. Attempt to send the data to the topic. If the send was successful, update the status field to 'success'.
3. We need a scheduled job that checks rows with 'pending' status. At that moment 2 cases are possible:
3.1 The notification wasn't sent at all.
3.2 The notification was sent but the database update failed (the probability is very low, but it is possible).
So we have to distinguish those 2 cases somehow: we may store the messages from the topic in a collection, and the job can check whether a given message was accepted or not. If the job finds a message that corresponds to the database row, we update the status to 'success'; otherwise we remove the entry from the database.
I think my idea has some weaknesses (for example, if we have a multi-node application we would have to store the messages in Hazelcast or a similar store, but that is an additional point of hypothetical failure).
Here is an example of the Try Cancel Confirm pattern https://servicecomb.apache.org/docs/distributed_saga_3/ that should be capable of dealing with your problem. You should tolerate some chance of double submission of the data via the queue. For example:
1. Define an abstraction Operation and assign an ID to the operation plus a timestamp.
2. Write status 'pending' to the database (you can do this in the same step as 1).
3. Write a listener that polls the database for all operations with status 'pending' and older than the "timeout".
4. For each pending operation, send the data via the queue with the assigned ID.
5. The recipient side should be aware of the ID, and if the ID has already been processed, nothing should happen.
6A. If you need to be 100% sure that the operation has completed, you need a second queue where the recipient side posts a message "ID - DONE". If such consistency is not necessary, skip this step. Alternatively it can post "ID - FAILED" with the reason for the failure.
6B. The submitting side either waits for a message from 6A or completes the operation by writing status 'DONE' to the database.
7. Once a certain timeout has passed or a certain retry limit has been reached, you write status 'FAIL' to the operation.
8. You can potentially send a message to the recipient side to roll back the operation with that ID.
Notice that none of these steps involve a technical transaction. You can do this with a non-transactional database.
What I have written is a variation of the Try Cancel Confirm pattern, where each recipient of a message should be aware of how to manage its own data.
1. In the listener, save a database row with the field status='pending'.
2. Another job (a separate thread) will obtain all pending rows from the DB and, for each row:
2.1 send the data to the topic
2.2 save into the database
If we fail at step 1, everything is fine: the data is in a consistent state because the job won't know anything about it.
If we fail at step 2.1, no problem: the next job invocation will attempt to handle it.
If we fail at step 2.2, it means the next job invocation will handle the same data again. At first glance you might think that is a problem. But your consumer has to be idempotent: it has to recognize that a message was already processed and skip the processing. This requirement is a consequence of the fact that all message brokers guarantee at-least-once delivery, so our consumers have to be ready for duplicated messages anyway. No problem again.
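A minimal sketch of that flow (the DAO and row class are made-up names for illustration; marking the row as done after 2.2 is implied by the description above):
// Step 1: the listener only stores the message with status 'pending', nothing else.
public void handleDelivery(String consumerTag, Envelope envelope,
                           AMQP.BasicProperties properties, byte[] body) throws IOException {
    String message = new String(body, "UTF-8");
    dao.savePending(message);                          // row with status = 'pending'
}

// Step 2: a separate scheduled job keeps retrying pending rows until both actions succeed.
public void processPendingRows() {
    for (PendingRow row : dao.findPending()) {
        sendNotificationIntoTopic(row.getMessage());   // 2.1 - safe to repeat, consumers must be idempotent
        saveIntoDatabase(row.getMessage());            // 2.2 - the original database write
        dao.markDone(row.getId());                     // implied: so the row is not picked up again
    }
}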
Here's the pseudocode for how I'd do it (assuming the DAO layer has transactional capability and your messaging layer doesn't):
// Start a transaction
try {
    String message = new String(body, "UTF-8");
    // Ordering is important here, as I'm assuming the database has commit and
    // rollback capabilities but the messaging system doesn't.
    saveIntoDatabase(message);
    sendNotificationIntoTopic(message);
} catch (MessageDeliveryException e) {
    // Roll back the transaction
    // Throw a domain-specific exception
}
// Commit the transaction
Scenarios:
1. If the database fails, the message won't be sent, as the exception will break the code flow.
2. If the database call succeeds and the messaging system fails to deliver, catch the exception and roll back the database changes.
All the actions necessary for logging and replaying the failures can live outside this method.
If there is enough time to modify the design, it is recommended to use JTA-like APIs to manage the 2-phase commit. Even WebLogic and WebSphere support XA resources for 2-phase commit.
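A rough sketch of what that looks like with the JTA UserTransaction API (the JNDI name is the standard one, and it assumes both the DataSource and the JMS connection factory are XA-capable and enlisted by the container):
void sendAndSaveAtomically(String message) throws Exception {
    UserTransaction utx = (UserTransaction) new InitialContext().lookup("java:comp/UserTransaction");
    utx.begin();
    try {
        saveIntoDatabase(message);           // via an XA DataSource enlisted in the transaction
        sendNotificationIntoTopic(message);  // via an XA JMS connection factory enlisted in the transaction
        utx.commit();                        // two-phase commit: both actions happen or neither does
    } catch (Exception e) {
        utx.rollback();                      // neither the DB write nor the message send takes effect
        throw e;
    }
}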
If the timeline is shorter, the following is suggested to reduce the failure gap:
1. Send the data to the topic (no commit); in case the topic is down, retry with an interval.
2. Write the data into the DB.
3. Commit the DB.
4. Commit the topic.
Here a failure only matters when step 4 fails. That will result in the same message being sent again, so the receiving system will receive a duplicate. Each message has a unique messageID and CorrelationID in the JMS 2.0 structure, so finding duplicates is fairly straightforward (but this has to be handled at the receiving system).
Both cases will work in a clustered environment as well.
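As an illustration of that duplicate check on the receiving side (processedIds is a placeholder for whatever store of already-seen IDs you use; in practice it should be persistent):
public void onMessage(Message message) throws JMSException {
    String correlationId = message.getJMSCorrelationID();
    if (processedIds.contains(correlationId)) {
        return;                              // duplicate caused by a re-sent message, ignore it
    }
    process(message);                        // the actual business handling
    processedIds.add(correlationId);         // ideally persisted together with the business data
}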
Specific to your case, the steps below might help overcome your issue.
Subscribe a listener listener-1 to your topic.
Process-1
1. Add a DB entry with status 'to be sent' for message msg-1.
2. Send message msg-1 to the topic. Retry sending in case of any topic failure.
3. If step 2 failed after a certain number of retries, process-1 has to resend msg-1 before sending any new messages, OR step 1 has to be rolled back.
Listener-1
Using the subscribed listener, read the reference (messageID/correlationID) from the topic, update the DB status to 'SENT', and read/remove the message from the topic. If the reference read succeeds and the DB update fails, the topic still has the message, so the next read will update the DB. If the DB update succeeds and the message removal fails, the listener will read it again and try to update a message that is already done, so it can be ignored after validation.
If the listener itself is down, the topic will keep the messages until the listener reads them. Until then, messages that were already sent will remain in status 'to be sent'.
I am using Kafka Streams and I have a doubt.
My code is
final KStream<String, Entity> inStream = builder.stream(TOPIC);
inStream.map((key, entity) -> {
....
return new KeyValue<>(key, entity);
}).to(NEW_TOPIC);
The value of NEW_TOPIC is present in the entity object. My problem is how to extract the value of this NEW_TOPIC from the entity when multiple tasks are running.
My problem drills down to this: if there are multiple tasks, will Kafka Streams process an incoming message to the end (by calling the to() method to push it back to the new Kafka topic) before pulling the next message from the incoming topic? If that is the behavior, I can store the value in a local/final variable to use later. If it is not, I need to use some other way.
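One way to avoid relying on any shared variable at all is to resolve the destination topic per record; since Kafka Streams 2.0, to() also accepts a TopicNameExtractor. A sketch of that idea (getTargetTopic() and entitySerde are made-up names for illustration):
final KStream<String, Entity> inStream = builder.stream(TOPIC);
inStream.map((key, entity) -> {
        // ...
        return new KeyValue<>(key, entity);
    })
    // the topic is derived from each record, so it works regardless of how many tasks are running
    .to((key, entity, recordContext) -> entity.getTargetTopic(),
        Produced.with(Serdes.String(), entitySerde));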
We use the Java Mail API with IMAP and fetch messages from folders containing millions of messages. There are some rules and limitations:
We do not have always-open connections to the mail server, therefore we cannot add listeners.
The messages will be stored in a local database with all properties: subject, body, receive date, from, etc.
We cannot use multiple threads.
To keep performance at acceptable levels and prevent out-of-memory crashes, I am planning the following:
1. During the initial fetch, where all messages have to be fetched, store only the message headers and skip the body and attachments. Getting the body and attachments of a message will be done when requested by the client. The initialization can take hours; that is not a problem.
2. When fetching all messages at the start, use an appropriate fetch profile to make it faster, but process in blocks (the fetch profile fp itself is sketched after this list), for example:
Message m1[] = f.getMessages(1, 10000);
f.fetch(m1, fp);
//process m1 array
Message m2[] = f.getMessages(10001, 20000);
f.fetch(m2, fp);
//process m2 array
instead of
Message m_all[] = f.getMessages(1, NUMALLMESSAGES);
f.fetch(m_all, fp);
//process m_all array, may throw out of memory errors
3. After we have all the messages, store the UID of the most recent message in the DB, and on the next fetch perform:
f.getMessagesByUID(LASTUIDREADFROMDB, UIDMAX)
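For point 2, the fetch profile fp could be built roughly like this (a sketch using the standard JavaMail profile items; the UID item is included so the last UID needed for point 3 is available):
FetchProfile fp = new FetchProfile();
fp.add(FetchProfile.Item.ENVELOPE);       // subject, from, dates - headers only, no body
fp.add(UIDFolder.FetchProfileItem.UID);   // lets us record the UID of the most recent message
// body and attachments are intentionally not prefetched; they are loaded on demand per message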
Do you have additional suggestions, or do you see any points we have to take care of (memory, performance)?
String queueA = "rabbitmq://host:5672/queue-a.exchange?queue=queue-a.exchange..etc
from(queueA)
.routeId("idForQueueA")
.onException(Exception.class)
.maximumRedeliveries(0)
// .processRef("sendEmailAlert") * not sure this belongs here*
.to(deadLetterQueueA)
.useOriginalMessage()
.end()
.processRef("dataProcessing")
.processRef("dataExporting")
.end();
Explaining the code above:
Messages are taken from queueA. When the various processors succeed, the message is consumed. If it fails, it's added to the dead letter queue "deadLetterQueueA". This all works OK.
My question is
When messages arrive in the dead letter queue I want to add alerts so we know to do something about it. How could I add an email alert when a message arrives in the dead letter queue? I don't want to lose the original message if the alert fails, nor do I want the alert to consume the message.
My thoughts are: I would need to split the message on an exception so it's sent to two different queues? One for the alert, which then sends out an email alert and is consumed, and one for the dead letter queue that just sits there? However, I'm not sure how to do this.
You can split a message to go to multiple endpoints using a multicast (details here):
.useOriginalMessage().multicast().to(deadLetterQueueA, "smtp://username@host:port?options")
This uses the Camel mail component endpoints described here. Alternatively, you can continue processing the message after the to(). So something like:
.useOriginalMessage()
.to(deadLetterQueueA)
.transform().simple("Hi <name>, there has been an error on the object ${body.toString}")
.to("smtp://username#host:port?options")
If you had multiple recipients, you could use a recipients list
public class EmailListBean {
@RecipientList
public String[] emails() {
return new String[] {"smtp://joe@host:port?options",
"smtp://fred@host:port?options"};
}
}
.useOriginalMessage()
.to(deadLetterQueueA)
.transform().simple("...")
.bean(EmailListBean.class)
Be careful about using JMS queues to store messages while waiting for a human to action them. I don't know what sort of message traffic you're getting; I'm assuming that if you want to send an email for every failure, it's not a lot. But I would normally be wary of this sort of thing, and would choose logging or database persistence to store the results of errors, using a JMS error queue only to notify other processes or consumers of the error or to schedule a retry.
There are two ways you can do this, but based on your message volume you might not want to send an email for every failed message.
You can use the solution provided by AndyN, or you can use the advisory topic ActiveMQ.Advisory.MessageDLQd.Queue.*: whenever a message gets into the DLQ, the enqueue count of that topic increases by 1. By monitoring the queue depth you would be able to send a mail based on the number of errors that occurred.
If you want to do it at the producer end, you can use any one of the solutions provided by AndyN.