ActiveMQ : dead letter queue keeps my messages order - java

I use ActiveMQ as a broker to deliver messages. These messages are intended to be written to a database. Sometimes the database is unreachable or down. In that case, I want to roll back the message so I can retry it later, while continuing to read the other messages.
This code works fine, except for one point: the rolled-back message blocks me from reading the others:
private Connection getConnection() throws JMSException {
    RedeliveryPolicy redeliveryPolicy = new RedeliveryPolicy();
    redeliveryPolicy.setMaximumRedeliveries(3);           // will retry 3 times to dequeue rolled-back messages
    redeliveryPolicy.setInitialRedeliveryDelay(5 * 1000); // will wait 5s to read that message

    ActiveMQConnectionFactory connectionFactory = new ActiveMQConnectionFactory(user, password, url);
    Connection connection = connectionFactory.createConnection();
    ((ActiveMQConnection) connection).setUseAsyncSend(true);
    ((ActiveMQConnection) connection).setDispatchAsync(true);
    ((ActiveMQConnection) connection).setRedeliveryPolicy(redeliveryPolicy);
    ((ActiveMQConnection) connection).setStatsEnabled(true);
    connection.setClientID("myClientID");
    return connection;
}
I create my session this way:
session = connection.createSession(true, Session.SESSION_TRANSACTED);
Rolling back is straightforward:
session.rollback();
Let's imagine I have 4 messages in my queue:
1: ok
2: KO (needs to be processed again: the message I want to roll back)
3: ok
4: ok
My consumer will do (linear sequence):
commit 1
rollback 2
wait 5s
rollback 2
wait 5s
rollback 2
put 2 in dead letter queue (ActiveMQ.DLQ)
commit 3
commit 4
But I want :
commit 1
rollback 2
commit 3
commit 4
wait 5s
rollback 2
wait 5s
rollback 2
wait 5s
put 2 in dead letter queue (ActiveMQ.DLQ)
So, how can I configure my consumer so that rolled-back messages are retried later without blocking the others?

This is actually expected behavior, because message retries are handled by the client, not the broker. Since you have one session bound and your redelivery policy is set up for 3 retries before the DLQ, the entire retry process blocks that particular thread.
So my first question is: if the database insert fails, wouldn't you expect all of the rest of your DB inserts to fail for a similar reason?
If not, the way to get around that is to set the redelivery policy for that queue to 0 retries, with a specific DLQ, so that messages fail immediately and go into the DLQ. Then have another process that pulls off the DLQ every 5 seconds and reprocesses the message and/or puts it back in the main queue for processing, as sketched below.
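A minimal sketch of that approach, assuming the default ActiveMQ.DLQ destination and a hypothetical main queue named MY.QUEUE (the "other process" is just a loop you run every 5 seconds):
// Fail fast on the main queue: no client-side retries, a rolled-back message goes straight to the DLQ.
RedeliveryPolicy failFast = new RedeliveryPolicy();
failFast.setMaximumRedeliveries(0);
((ActiveMQConnection) connection).setRedeliveryPolicy(failFast);
connection.start();

// A separate transacted session that drains the DLQ and puts messages back on the main queue.
Session dlqSession = connection.createSession(true, Session.SESSION_TRANSACTED);
MessageConsumer dlqConsumer = dlqSession.createConsumer(dlqSession.createQueue("ActiveMQ.DLQ"));
MessageProducer requeuer = dlqSession.createProducer(dlqSession.createQueue("MY.QUEUE")); // hypothetical name

Message dead;
while ((dead = dlqConsumer.receive(1000)) != null) {
    requeuer.send(dead);   // hand the message back to the main queue for another attempt
    dlqSession.commit();
}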

Are you using the <strictOrderDispatchPolicy /> in the ActiveMQ XML config file? I'm not sure whether this affects the order of messages for redelivery. If you are using strict order dispatch, try commenting out that policy to see if the behavior changes.
Bruce

I had the same problem and didn't find a solution here, so I'm posting the one I found for people struggling with the same issue.
This is fixed in version 5.6 and later by setting the nonBlockingRedelivery property to true on the connection factory:
<property name="nonBlockingRedelivery" value="true" />
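If you build the connection factory in plain Java rather than Spring XML, the equivalent setter (available on ActiveMQConnectionFactory since 5.6, to the best of my knowledge) would be:
ActiveMQConnectionFactory connectionFactory = new ActiveMQConnectionFactory(user, password, url);
// Redelivery of rolled-back messages no longer blocks delivery of the other prefetched messages.
connectionFactory.setNonBlockingRedelivery(true);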

Related

Kafka transaction commit timeout exception handling

I am running Kafka transactions at a large scale; below is the code snippet.
producer.initTransactions();
try {
    producer.beginTransaction();
    producer.send(new ProducerRecord<>(producerTopic, element));
    producer.commitTransaction();
} catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
    producer.close();
    canSendNext = false;
} catch (KafkaException e) {
    producer.abortTransaction();
}
properties used:
ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, STRING_SERIALIZER
ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, BYTE_ARRAY_SERIALIZER
ProducerConfig.TRANSACTIONAL_ID_CONFIG, UUID.randomUUID().toString()
bootstrap.servers=localhost:9092
acks=all
retries=1
partitioner.class=org.apache.kafka.clients.producer.RoundRobinPartitioner
When commitTransaction times out, the KafkaException catch block runs and tries to abort the transaction, which fails with the error:
Cannot attempt operation abortTransaction because the previous call to commitTransaction timed out and must be retried
How should I handle the commit transaction timeout scenario? I expect the code to work.
As per the documentation:
Note that this method will raise TimeoutException if the transaction cannot be committed before expiration of max.block.ms. Additionally, it will raise InterruptException if interrupted. It is safe to retry in either case, but it is not possible to attempt a different operation (such as abortTransaction) since the commit may already be in the progress of completing. If not retrying, the only option is to close the producer.
The problem here is that the Kafka producer has timed out and does not know whether the Kafka broker will complete the transaction or not. So it cannot offer abortTransaction, because the transaction might already have been committed by the broker even after the producer timed out.
You should configure your Kafka producer with a sufficiently large max.block.ms and a reasonable number of retries to handle such scenarios (it is not clear why you have configured retries to 1). Ideally you should time out very rarely, e.g. when there is a network issue or an actual problem on the Kafka brokers.
In such a scenario you cannot know whether your last transaction actually succeeded. There is nothing to do but close your Kafka producer and create a new one.
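A minimal sketch of that handling, reusing the producer and canSendNext flag from the question: retry commitTransaction() on a timeout, and if the retry budget is exhausted just close the producer.
// TimeoutException here is org.apache.kafka.common.errors.TimeoutException.
try {
    producer.commitTransaction();
} catch (TimeoutException firstTimeout) {
    // Retrying commitTransaction() is safe; calling abortTransaction() at this point is not allowed.
    boolean committed = false;
    for (int attempt = 0; attempt < 3 && !committed; attempt++) {
        try {
            producer.commitTransaction();
            committed = true;
        } catch (TimeoutException retryTimeout) {
            // keep retrying until the attempt budget is exhausted
        }
    }
    if (!committed) {
        producer.close();   // the only remaining option: give up and create a new producer
        canSendNext = false;
    }
}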

Kafka Transactions are not rolling back after interrupting current thread on timeout

I'm trying to interrupt the current thread if the HTTP request times out. I have set up a PlatformTransactionManager for Kafka transactions as a bean, and I'm using the @Transactional annotation at method level. We publish messages to 3 topics. After publishing a message to the first topic I put a Thread.sleep(5000), and the current thread is interrupted from a filter if execution takes more than 6 seconds. The call does get interrupted, but the message is still published to Kafka. We only produce the message; we do not consume it, yet we can see the message in our internal Kafka inspection tool. We use KafkaTemplate.send() to send the message.
Producer records are always written to the log, even if rolled back. There is a special record in the slot following the published record(s) to indicate whether the transaction was committed or rolled back.
Consumers' isolation.level is read_uncommitted by default; you need to set it to read_committed to avoid seeing rolled-back records.
https://kafka.apache.org/documentation/#consumerconfigs_isolation.level
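For a plain consumer that is one line of configuration; a sketch using the standard kafka-clients ConsumerConfig (the group id and bootstrap servers are illustrative):
Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed"); // hide records from aborted transactions
KafkaConsumer<String, byte[]> consumer =
        new KafkaConsumer<>(props, new StringDeserializer(), new ByteArrayDeserializer());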

Kafka streaming - TimeoutException: Expiring * record(s) for TOPIC:* ms has passed since batch creation

The streaming application was rolled out in production and, about 10 days later, we started observing errors/warnings in the CustomProductionExceptionHandler for expired records that belong to an older day window.
FLOW :
INPUT TOPIC --> STREAMING APPLICATION(Produces stats and emits after day window closed) --> OUTPUT TOPIC
The producer keeps trying to publish records to the OUTPUT topic that have already expired (they belong to an older window) and logs an error through CustomProductionExceptionHandler.
I have reduced the batch size and kept the defaults otherwise, but this change has not yet been promoted to production.
CustomProductionExceptionHandler implementation: intended to keep the streams application from dying due to NetworkException or TimeoutException.
With this implementation the producer does not retry; on any exception it returns CONTINUE. On the other hand, if it returns FAIL, the stream thread dies and does not restart automatically. Need suggestions.
public class CustomProductionExceptionHandler implements ProductionExceptionHandler {

    @Override
    public ProductionExceptionHandlerResponse handle(final ProducerRecord<byte[], byte[]> record,
                                                     final Exception exception) {
        String recordKey = new String(record.key());
        String recordVal = new String(record.value());
        String recordTopic = record.topic();
        logger.error("Kafka message marked as processed although it failed. Message: [{}:{}], destination topic: [{}]",
                recordKey, recordVal, recordTopic, exception);
        return ProductionExceptionHandlerResponse.CONTINUE;
    }
}
Exception:
2019-12-20 16:31:37.576 ERROR com.jpmc.gpg.exception.CustomProductionExceptionHandler.handle(CustomProductionExceptionHandler.java:19) kafka-producer-network-thread | profile-day-summary-generator-291e69b1-5a3d-4d49-8797-252c2ae05607-StreamThread-19-producerid - Kafka message marked as processed although it failed. Message: [{"statistics":{}], destination topic: [OUTPUT-TOPIC]
org.apache.kafka.common.errors.TimeoutException: Expiring * record(s) for TOPIC:1086149 ms has passed since batch creation
Trying to get answers to the questions below.
1) Why is the producer trying to publish older records to the OUTPUT topic whose day window has already closed?
Example: the producer is trying to send a 12/09 day-window record while the currently open window is 12/20.
2) The stream threads would have died without CustomProductionExceptionHandler returning ProductionExceptionHandlerResponse.CONTINUE. Is there any way the producer can retry on NetworkException or TimeoutException and then continue, instead of the stream thread dying?
The problem with returning ProductionExceptionHandlerResponse.CONTINUE from the CustomProductionExceptionHandler is that on any exception it skips publishing that record to the output topic and proceeds with the next records. No resiliency.
1) It's not really possible to answer this question without knowing what your program does. Note that, in general, Kafka Streams works on event time and handles out-of-order data.
2) You can configure all internally used clients of a Kafka Streams application (i.e., consumer, producer, admin client, and restore consumer) by specifying the corresponding client configuration in the Properties you pass into KafkaStreams. If you want different configs for different clients, you can prefix them accordingly, i.e., producer.retries instead of retries. Check out the docs for more details: https://docs.confluent.io/current/streams/developer-guide/config-streams.html#ak-consumers-producer-and-admin-client-configuration-parameters
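A sketch of that prefixing using the standard StreamsConfig/ProducerConfig constants (the application id and the retry/timeout values are illustrative):
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "profile-day-summary-generator"); // illustrative
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// Settings meant only for the internal producer get the "producer." prefix.
props.put(StreamsConfig.producerPrefix(ProducerConfig.RETRIES_CONFIG), 10);
props.put(StreamsConfig.producerPrefix(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG), 60_000);
KafkaStreams streams = new KafkaStreams(topology, props); // topology built elsewhere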

How to save message into database and send response into topic eventually consistent?

I have the following RabbitMQ consumer:
Consumer consumer = new DefaultConsumer(channel) {
    @Override
    public void handleDelivery(String consumerTag, Envelope envelope, AMQP.BasicProperties properties, byte[] body) throws IOException {
        String message = new String(body, "UTF-8");
        sendNotificationIntoTopic(message);
        saveIntoDatabase(message);
    }
};
The following situation can occur:
The message was sent to the topic successfully.
The connection to the database was lost, so the database insert failed.
As a result we have data inconsistency.
The expected result is that either both actions are executed successfully or neither is executed at all.
Are there any solutions for how I can achieve this?
P.S.
Currently I have the following idea (please comment on it):
We can assume that the broker doesn't lose any messages.
We have to be subscribed to the topic we want to send to.
1. Save the entry into the database with a status field set to 'pending'.
2. Attempt to send the data to the topic. If the send was successful, update the status field to 'success'.
3. We need a scheduled job that checks rows with pending status. At that point two cases are possible:
3.1 The notification wasn't sent at all.
3.2 The notification was sent but the database update failed (the probability is very low, but it is possible).
So we have to distinguish between those two cases somehow: we may store messages from the topic in a collection, and the job can check whether the message was accepted or not. If the job finds a message that corresponds to the database row, it updates the status to 'success'; otherwise it removes the entry from the database.
I think my idea has some weaknesses (for example, in a multi-node application we would have to store messages in Hazelcast or an analog, which is an additional point of hypothetical failure).
Here is an example of the Try Cancel Confirm pattern (https://servicecomb.apache.org/docs/distributed_saga_3/) that should be capable of dealing with your problem. You have to tolerate some chance of double submission of the data via the queue. Here is an example:
1. Define an abstraction Operation and assign an ID to the operation plus a timestamp.
2. Write status Pending to the database (you can do this in the same step as 1).
3. Write a listener that polls the database for all operations with status Pending and older than "timeout".
4. For each pending operation, send the data via the queue with the assigned ID.
5. The recipient side should be aware of the ID, and if that ID has already been processed nothing should happen.
6A. If you need to be 100% sure that the operation has completed, you need a second queue where the recipient side posts a message "ID - DONE". If such consistency is not necessary, skip this step. Alternatively it can post "ID - Failed" with the reason for the failure.
6B. The submitting side either waits for a message from 6A or completes the operation by writing status DONE to the database.
7. Once a certain timeout has passed or a certain retry limit has been reached, write status FAIL to the operation.
8. You can potentially send the recipient side a message to roll back the operation with that ID.
Notice that none of these steps involve a technical transaction; you can do this with a non-transactional database.
What I have written is a variation of the Try Cancel Confirm pattern, where each recipient of a message should be aware of how to manage its own data.
1. In the listener, save a database row with the field status='pending'.
2. Another job (a separate thread) will obtain all pending rows from the DB and, for each row:
2.1 send the data to the topic
2.2 save into the database
If we failed at step 1, everything is fine: the data stays consistent because the job won't know anything about it.
If we failed at step 2.1, no problem: the next job invocation will attempt to handle it.
If we failed at step 2.2, the next job invocation will handle the same data again. At first glance that looks like a problem, but your consumer has to be idempotent: it has to recognize that the message was already processed and skip it. This requirement follows from the fact that all message brokers guarantee at-least-once delivery, so your consumers have to be ready for duplicated messages anyway. No problem again.
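A minimal sketch of that scheduled job in plain Java + JDBC; the outbox table, its columns, and dataSource are illustrative placeholders, while sendNotificationIntoTopic() is the method from the question:
ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
scheduler.scheduleWithFixedDelay(() -> {
    try (Connection db = dataSource.getConnection(); // java.sql.Connection
         PreparedStatement select = db.prepareStatement(
                 "SELECT id, payload FROM outbox WHERE status = 'pending'");
         PreparedStatement update = db.prepareStatement(
                 "UPDATE outbox SET status = 'success' WHERE id = ?");
         ResultSet rs = select.executeQuery()) {
        while (rs.next()) {
            sendNotificationIntoTopic(rs.getString("payload")); // step 2.1: send to the topic
            update.setLong(1, rs.getLong("id"));                // step 2.2: mark the row as handled
            update.executeUpdate();
        }
    } catch (Exception e) {
        // Leave the rows pending; the next run retries, and the consumer must be idempotent.
    }
}, 0, 5, TimeUnit.SECONDS);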
Here's the pseudocode for how I'd do it (assuming the DAO layer has transactional capability and your messaging layer doesn't):
// Start a transaction
try {
    String message = new String(body, "UTF-8");
    // Ordering is important here, as I'm assuming the database has commit and rollback
    // capabilities but the messaging system doesn't.
    saveIntoDatabase(message);
    sendNotificationIntoTopic(message);
} catch (MessageDeliveryException e) {
    // Roll back the transaction
    // Throw a domain-specific exception
}
// Commit the transaction
Scenarios:
1. If the database call fails, the message won't be sent, as the exception breaks the code flow.
2. If the database call succeeds and the messaging system fails to deliver, catch the exception and roll back the database changes.
All the actions necessary for logging and replaying the failures can live outside this method.
If there is enough time to modify the design, it is recommended to use a JTA-like API to manage two-phase commit. Even WebLogic and WebSphere support XA resources for two-phase commit.
If the timeline is short, it is suggested to perform the steps below to reduce the failure window (a sketch follows after the list):
1. Send the data to the topic (no commit); if the topic is down, retry at an interval.
2. Write the data into the DB.
3. Commit the DB.
4. Commit the topic.
A failure matters only when step 4 fails; it results in the same message being sent again, so the receiving system gets a duplicate. Each message has a unique messageID and CorrelationID in the JMS 2.0 structure, so detecting duplicates is fairly straightforward (but this has to be handled on the receiving system).
Both cases will work in a clustered environment as well.
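A sketch of that ordering with a transacted JMS session and a plain JDBC connection (dao and dbConnection are illustrative; no XA resources involved):
producer.send(session.createTextMessage(payload)); // 1. send to the topic; invisible until session.commit()
dao.insert(payload, dbConnection);                 // 2. write the data into the DB
dbConnection.commit();                             // 3. commit the DB
session.commit();                                  // 4. commit the topic; only a failure here causes a duplicate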
Specific to your case, the steps below might help to overcome your issue.
Subscribe a listener, listener-1, to your topic.
Process-1
1. Add a DB entry with status 'to be sent' for message msg-1.
2. Send message msg-1 to the topic; retry sending in case of any topic failure.
3. If step 2 failed after a certain number of retries, process-1 has to resend msg-1 before sending any new messages, OR step 1 has to be rolled back.
Listener-1
Using the subscribed listener, read the reference (messageID/correlationID) from the topic, update the DB status to SENT, and read/remove the message from the topic. If the reference read succeeds and the DB update fails, the topic still has the message, so the next read will update the DB. If the DB update succeeds and the message removal fails, the listener will read it again and try to update a message that is already done, which can be ignored after validation.
If the listener itself is down, the topic keeps the messages until the listener reads them. Until then, sent messages will remain in status 'to be sent'.

Failure retry in EJB 3

We have recently migrated our EJB 2 application to EJB 3. In EJB 2, if onMessage failed, the container could retry the message a configured number of times; in EJB 3 there seems to be no such option. Could someone help with this?
Can we explicitly sleep the thread and retry explicitly in onMessage?
Thanks in advance.
If you are using @TransactionManagement(value = TransactionManagementType.CONTAINER), that is, container-managed transactions, then on an exception the message will be retried 10 times before it is sent to the DLQ (a sketch of this option follows below).
If you are not using the ActiveMQ RA, the following two documents can be useful if you have container-managed transactions: Redelivery and Exception Handling, and Managing Rolled Back, Recovered, Redelivered, or Expired Messages.
If you are using the ActiveMQ resource adapter, you can use the MaximumRedeliveries resource adapter property.
Otherwise, if you want to retry only on a specific exception, you can catch the exception and send the message back to the same queue with an additional delay property (see "ActiveMQ consume message after delay interval"). Also, set the retry count in a message header so that you can keep track of the retries.
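As a sketch of what the container-managed option can look like (the queue name, class name, and processing code are illustrative; the redelivery limit itself comes from the broker/RA configuration, not from this code):
// Uses the standard javax.ejb / javax.jms / javax.annotation APIs.
@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
    @ActivationConfigProperty(propertyName = "destination", propertyValue = "MY.QUEUE") // illustrative
})
@TransactionManagement(TransactionManagementType.CONTAINER)
public class RetryingMdb implements MessageListener {

    @Resource
    private MessageDrivenContext ctx;

    @Override
    public void onMessage(Message message) {
        try {
            // ... write the message to the database ...
        } catch (Exception e) {
            // Mark the container-managed transaction for rollback: the broker redelivers
            // the message up to its configured limit, then moves it to the DLQ.
            ctx.setRollbackOnly();
        }
    }
}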
