How can I prevent duplicate messages in Google Cloud Pub/Sub?
Say I have code that handles the messages it is subscribed to.
Say I have 2 nodes running the same service that contains this code.
Once one node has received a message but not yet acknowledged it, another node receives the same message. That is the problem: the same message ends up being processed twice.
void messageReceiver(PubsubMessage pubsubMessage, AckReplyConsumer ackReply) {
    submitHandler.handle(toMessage(pubsubMessage))
        .doOnSuccess((response) -> {
            log.info("Acknowledging the successfully processed message id: {}, response {}", pubsubMessage.getMessageId(), response);
            ackReply.ack(); // <---- acknowledged
        })
        .doOnError((e) -> {
            log.error("Not acknowledging due to an exception", e);
            ackReply.nack();
        })
        .doOnTerminate(span::finish)
        .subscribe();
}
What is the solution for this? Is it normal behaviour?
Google Cloud Pub/Sub uses "At-Least-Once" delivery. From the docs:
Typically, Cloud Pub/Sub delivers each message once and in the order in which it was published. However, messages may sometimes be delivered out of order or more than once. In general, accommodating more-than-once delivery requires your subscriber to be idempotent when processing messages.
This means each message is guaranteed to be delivered one or more times, so you can receive the same message multiple times unless you pipe it through something that deduplicates it first. There isn't a setting you can define to guarantee exactly-once delivery. The docs do mention that you can get the behavior you want using Cloud Dataflow's PubsubIO, but that solution appears to be deprecated:
You can achieve exactly once processing of Cloud Pub/Sub message streams using Cloud Dataflow PubsubIO. PubsubIO de-duplicates messages on custom message identifiers or those assigned by Cloud Pub/Sub.
Having said all of this, I've never actually seen Google Cloud Pub/Sub send a message twice. Are you sure that's really the problem you're having, or is the message being reissued because you are not acknowledging it within the acknowledgement deadline (as you stated above, this defaults to 10 seconds)? If you don't acknowledge it, it will be reissued. From the docs (emphasis mine):
A subscription is created for a single topic. It has several properties that can be set at creation time or updated later, including:
An acknowledgment deadline: If your code doesn't acknowledge the message before the deadline, the message is sent again. The default is 10 seconds. The maximum custom deadline you can specify is 600 seconds (10 minutes).
If that's the situation, just acknowledge your messages within the deadline and you won't see these duplicates as often.
You can use Redis from Memorystore to deduplicate messages. Your publisher should add a trace ID to the message body just before publishing it to Pub/Sub. On the other side, the client (subscriber) should check whether the trace ID is already in the cache. If it is, skip the message. If it is not, process the message and add the trace ID to the cache with a 7-8 day expiry time (Pub/Sub retains unacknowledged messages for 7 days by default). In this simple way you can make sure each message is handled only once.
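The check-then-record step described above can be sketched as follows. This is only a minimal in-memory illustration, not Pub/Sub or Redis client code: DedupCache, markIfFirstSeen and DedupDemo are hypothetical names, and a ConcurrentHashMap stands in for the shared cache. With a real Redis instance the same step would normally be a single atomic set-if-absent-with-expiry command (SET key value NX EX seconds), so that two nodes cannot both claim the same ID.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory stand-in for a shared cache such as Redis (hypothetical helper). */
class DedupCache {
    // key -> absolute expiry time in milliseconds
    private final Map<String, Long> expiryByKey = new ConcurrentHashMap<>();
    private final long ttlMillis;

    DedupCache(Duration ttl) {
        this.ttlMillis = ttl.toMillis();
    }

    /**
     * Atomically records the key. Returns true if the key was absent or its
     * previous entry had expired, and false if it is a live duplicate.
     */
    boolean markIfFirstSeen(String key, long nowMillis) {
        boolean[] first = {false};
        expiryByKey.compute(key, (k, expiry) -> {
            if (expiry == null || expiry <= nowMillis) {
                first[0] = true;
                return nowMillis + ttlMillis; // start a new expiry window
            }
            return expiry; // still live: leave the entry untouched
        });
        return first[0];
    }
}

class DedupDemo {
    public static void main(String[] args) {
        DedupCache cache = new DedupCache(Duration.ofDays(8));
        long now = System.currentTimeMillis();
        for (String traceId : new String[] {"trace-1", "trace-2", "trace-1"}) {
            if (cache.markIfFirstSeen(traceId, now)) {
                System.out.println("processing " + traceId);
            } else {
                System.out.println("skipping duplicate " + traceId);
            }
        }
    }
}
```

Note that this sketch records the ID before processing. If processing can fail, you may prefer to remove the ID again on failure so that the redelivered message is not skipped.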
All messages in a given topic have a unique messageID field:
ID of this message, assigned by the server when the message is published. Guaranteed to be unique within the topic. This value may be read by a subscriber that receives a PubsubMessage via a subscriptions.pull call or a push delivery. It must not be populated by the publisher in a topics.publish call.
You can use it to deduplicate incoming messages. There is no need to assign an ID manually.
It is a bit harder in distributed systems (e.g. multiple instances of consumers for a given subscription). You would need a global synchronization mechanism. The simplest would be to set up a database (e.g. Redis) and use it to store the IDs of processed messages.
You should take a look at Replaying and discarding messages which describes how to configure message retention.
There are two properties of subscription:
retain_acked_messages - keep acknowledged messages,
message_retention_duration - how long to keep messages.
If you do not plan to rewind your subscription to a past point in time (e.g. you do not plan to reprocess messages, or to reset your subscription because of bugs), you can set retain_acked_messages=false and message_retention_duration='3600s'. This way you only need to keep the message IDs from the last hour.
Bear in mind that a Pub/Sub message also has a publish_time, so you don't need to add one to your message's data. It can be used together with message_id. Both of these are set by the Pub/Sub server when it receives a message.
Related
I am trying to implement the following scenario and would really appreciate some help. I am using ActiveMQ 5.14 with Camel 2.21.
In the queue, each message corresponds to a single machine. The machines connect to the queue through a single polling consumer and are indistinguishable to the consumer. The messages should be kept in the queue until one machine acknowledges that it has reached the correct machine via a separate request. After each fetch of a message said message should be locked for a certain time.
I could not find any ActiveMQ functionality that translates to my problem. My approach would be to send the message after each fetch to a second queue, which serves as a lock mechanism and send it back to the fetchable queue after the specified timeout.
Maybe a better approach would be to roll back the session after each fetch if the message has not been acknowledged by the machine.
Do you have any suggestions for what a viable solution to this problem would look like?
edit: more details to clarify the situation
The application communicates to the clients via exposing a REST API to the web with two calls: GET and DELETE.
GET fetches the next message from the queue and DELETE deletes the message from the queue. I need to make sure that a message is only fetched once in a given time period and that it makes its way back to the queue if the client doesn't send a DELETE request. Currently I have a route from the REST service to a bean which fetches a message from the queue, returns it to the GET request, and sends it back to the queue afterwards. On a DELETE request I dequeue the message with the given ID from the queue.
I still need to find a way to ensure that the last fetched message can't be accessed for a specified time period.
I am a bit confused about the part with the indistinguishable machines, but I understood the following:
You have 1 queue with messages
You have 1 consumer
The consumer takes a message and calls a service or similar
If the call is successful the message can be deleted
If the call fails the message must be reprocessed
If these assumptions are correct, you can build a Camel route that consumes messages from the queue (transacted) and calls the service.
If the Camel route fails to complete (service returns error) and the error is not handled, the broker does a rollback and the message is redelivered (immediately)
If the route fails multiple times (when max redelivery value is reached), the message is sent to the dead letter queue (by the broker) to move it out of the way but save it
If the route succeeds the message consumption is committed to the broker and the message deleted
In such a setup you could also configure more consumers to process the messages in parallel (if the called service allows this)
This behaviour is more or less the default behaviour if
Your broker is configured as persistent (avoid message loss)
You consume messages transacted (a local transaction with the broker is enough)
Camel does not handle the errors (when they are handled, the message is committed because the broker does not "see" any error)
You get an error from the service or you can at least decide if there was a problem and throw the error yourself. The broker must get an Exception so that a rollback is done
EDIT
Based on your clarification I understand that it is the other way round than I assumed.
Well then I would probably see the two request types as "workflow steps" since they are triggered from the clients.
GET
Consume a message, send it to requestor
Add a timestamp to the message header
Send the message to another queue (let's call it delivered)
DELETE
Dequeue the message from the delivered queue
Not deleted messages
Use the timestamp header and message selectors to consume not deleted messages after a certain amount of time
Move them back to the source queue
With a second queue you have various advantages
Messages in processing cannot be consumed again and therefore need no "lock"
The source queue contains only waiting messages, the delivered queue only messages in processing
You could increase message priority when sending not deleted messages back to the source queue so they are re-consumed fast
You could also add a counter header when sending not deleted messages back to the source queue to identify messages that have failed multiple times and process them in another way.
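To make the GET / DELETE / requeue workflow above concrete, here is a toy in-memory model of it. It is not ActiveMQ or Camel code: LeaseQueue and its methods are hypothetical names, messages are represented by their IDs only, and a map of hand-out timestamps plays the role of the second queue with its timestamp header. In a real setup the sweep step would be a consumer with a message selector on that header.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Toy model of the source queue plus "delivered" queue workflow (hypothetical). */
class LeaseQueue {
    private final Deque<String> source = new ArrayDeque<>();
    // message ID -> time it was handed out (stands in for the timestamp header)
    private final Map<String, Long> delivered = new LinkedHashMap<>();
    private final long lockMillis;

    LeaseQueue(long lockMillis) {
        this.lockMillis = lockMillis;
    }

    synchronized void publish(String messageId) {
        source.addLast(messageId);
    }

    /** GET: hand out the next waiting message and park it in the delivered queue. */
    synchronized Optional<String> get(long nowMillis) {
        String messageId = source.pollFirst();
        if (messageId == null) {
            return Optional.empty();
        }
        delivered.put(messageId, nowMillis);
        return Optional.of(messageId);
    }

    /** DELETE: the client confirmed, so the message leaves the delivered queue for good. */
    synchronized boolean delete(String messageId) {
        return delivered.remove(messageId) != null;
    }

    /** Sweep: move messages whose lock time has elapsed back to the front of the source queue. */
    synchronized int requeueExpired(long nowMillis) {
        int moved = 0;
        Iterator<Map.Entry<String, Long>> it = delivered.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= lockMillis) {
                source.addFirst(entry.getKey());
                it.remove();
                moved++;
            }
        }
        return moved;
    }
}
```

Putting requeued messages at the front of the source queue is a stand-in for the raised message priority mentioned above. A message that is in the delivered map cannot be returned by get(), which is why no separate lock is needed.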
An application publishes messages to a topic (JMS provider: Tibco) from a single thread, and the receiver also reads messages in a single thread.
We are still experiencing rare out-of-order delivery to the receiver.
We rely on the JMS message ID to verify the sequence in which messages were delivered to the JMS provider.
Is the current design good for in-order delivery to the receiver? And is it correct to rely on the JMS message ID to verify the order in which messages were received by the JMS provider?
According to the Java Doc, the messageId is only specified to be unique. http://docs.oracle.com/javaee/5/api/javax/jms/Message.html#getJMSMessageID()
As it is not stated that it is related to message sequencing, I would suggest that it is not, and would recommend against using it for such purposes.
@Stephan's answer is correct.
I just want to add that the javadoc states that the message ID is generated when the message is sent, not when it is received. So even if the IDs are sequential, they don't tell you the order of delivery.
A MessageID generated by JMS is expected to be unique. However, the logic a JMS provider uses to generate a MessageID could be anything from a simple incrementing number to a complex algorithm. You should not assume MessageIDs will be sequential. If your application requires ordered message delivery, you can look at using message groups, where the messages in a group are ordered and you can wait until all messages in the group are received before you start processing them.
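Since the message ID cannot be used to check ordering, one common alternative (not something the answers above propose, so treat it as an assumption) is for the publisher to stamp its own sequence number on each message, for example as a long property set with Message.setLongProperty and read with getLongProperty, and for the receiver to compare consecutive values. The sketch below shows only the receiver-side check in plain Java. SequenceChecker is a hypothetical name, and it assumes a single publisher that increments the number by one per message.

```java
import java.util.ArrayList;
import java.util.List;

/** Tracks publisher-assigned sequence numbers and records gaps or reordering (hypothetical). */
class SequenceChecker {
    private long highestSeen = -1;
    private final List<String> anomalies = new ArrayList<>();

    /**
     * Returns true if this is the first message or if seq is exactly one more
     * than the highest sequence number seen so far. Otherwise records an anomaly.
     */
    boolean accept(long seq) {
        boolean inOrder = highestSeen < 0 || seq == highestSeen + 1;
        if (!inOrder) {
            anomalies.add("expected " + (highestSeen + 1) + " but got " + seq);
        }
        highestSeen = Math.max(highestSeen, seq);
        return inOrder;
    }

    /** Returns a copy of the anomalies recorded so far. */
    List<String> anomalies() {
        return new ArrayList<>(anomalies);
    }
}
```

Logging the recorded anomalies together with the JMS message ID would show whether the reordering happens before or after the provider, without assuming anything about how the provider generates its IDs.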
Our project is to integrate two applications, using rest api of each, using JMS(to provide asynchronous nature) and spring batch to read bulk data from the JMS queue and process it and then post it to the receiving application.
I am a newbie to both JMS and spring batch. I have few basic questions to ask:
Which JMS model should we go ahead with (PTP or Pub/Sub)?
Can messages be read in bulk from the JMS queue (using JMSItemReader)? If yes, can anyone please provide some code?
We want to acknowledge messages as 'read' once it is successfully posted (ie. read-process-write) to receiving application and not when it is read by the JMSItemReader. How can we achieve this?
The high level design diagram is below
PTP vs Pub/sub
The point to point method using a message queue is the most standard method to go. Especially in a batch application I can not see immediate reason to use Publish subscribe which presumes you have multiple consumers of the same messages.
Theoretically, if multiple functions need to be executed over the same chunks of data, you can organize the different processors as subscribers and scale the application that way, but this is a pretty advanced usage scenario.
Can messages be read in bulk from JMS queue:
The JMS specification only talks (vaguely; I might be misreading it) about bulk acknowledgment of messages, but it does not set any requirement for bulk delivery of messages.
CLIENT_ACKNOWLEDGE - With this option, a client acknowledges a message
by calling the message’s acknowledge method. Acknowledging a consumed
message automatically acknowledges the receipt of all messages that
have been delivered by its session.
Simply put the answer with respect of the bulk delivery is "If the JMS provider supports it, then yes, otherwise no"
Most providers allow bulk acknowledgment of messages
Here is Oracle's interface for doing that:
public interface com.sun.messaging.jms.Message {
    void acknowledgeThisMessage() throws JMSException;
    void acknowledgeUpThroughThisMessage() throws JMSException;
}
Combining CLIENT_ACKNOWLEDGE with a call to the acknowledgeUpThroughThisMessage method on a message will acknowledge all messages received up to that moment in time.
Manual acknowledgment of messages:
This can be achieved through CLIENT_ACKNOWLEDGE and the acknowledge method on the Message. Here I will quote you again the javadoc of the acknowledge method which is also referencing one more time your second question and it talks about bulk acknowledgment of all messages up to a point.
void acknowledge() throws JMSException
Acknowledges all consumed messages of the session of this consumed message.
All consumed JMS messages
support the acknowledge method for use when a client has specified
that its JMS session's consumed messages are to be explicitly
acknowledged. By invoking acknowledge on a consumed message, a client
acknowledges all messages consumed by the session that the message was
delivered to.
Calls to acknowledge are ignored for both transacted sessions and
sessions specified to use implicit acknowledgement modes.
A client may individually acknowledge each message as it is consumed,
or it may choose to acknowledge messages as an application-defined
group (which is done by calling acknowledge on the last received
message of the group, thereby acknowledging all messages consumed by
the session.)
Messages that have been received but not acknowledged may be
redelivered.
I have a Tibco EMS server, some topics, and a number of durable subscriptions to these topics (more than one for every topic).
My task is to delete messages for a specific durable subscriber (by receiving them with the appropriate acknowledge mode).
My question: is it possible to manage a subscriber's pending messages by "substituting" it with my own subscriber (with the same name and ID)? It's important not to affect the topic's other pending messages. In other words, I want to delete some messages from one subscription to a topic but keep those messages in another subscription to the same topic.
Well, I've found the answer, just forgot to post it before.
As mentioned above, under the question itself, there is no way to delete messages from a topic. But my task was a little different: to delete messages under a specific durable subscription. And this is possible (with some conditions).
Let's say you have to delete messages from the durable subscription "MySubscr". To do so, you should create a connection and create a durable subscriber with the same name, "MySubscr". But that is not enough. If you do only that, another durable subscriber will be created with the same name but under a connection with a different ClientID, and it will operate as a standalone durable subscription without any impact on the required "MySubscr" durable (actually, they will look like MySubscr:123 and MySubscr:567 durable subscriptions, where 123 and 567 are the ClientIDs, at least for Tibco EMS). To fix this, you should set the ClientID explicitly on your connection with the connection.setClientID() method, but you can do that only if the initial connection is not connected (that's why I mentioned the durable subscriber: it can accumulate messages without a subscriber connected).
So you should either wait until the subscriber disconnects by itself (isConnected() method for Tibco EMS; I didn't see a similar method in the JMS API, but I suppose most implementations have something like it) or destroy the connection (with that ClientID) manually (TibjmsAdmin.destroyConnection() method from Tibco EMS). After this, set the ClientID on your connection to get access to the messages of this subscriber. You can read messages by consuming them with acknowledge mode Client (then they will remain in the topic) or with mode Auto (then they will be deleted).
Important note: you can't consume one particular message. All the messages are consumed as in a queue, so you can only go through them one by one. If you find an unwanted message and wish to delete it (by consuming with auto-acknowledge mode or by calling the acknowledge() method on the message), you will lose all prior messages along with it. AFAIK, there is no way to delete a message without deleting the prior ones.
Another important note: while you are doing your message magic, it is important that the initial client does not connect again until your connection is closed, because it will either get a DublicateClientIDException (if it uses a fixed ClientID) or create another durable subscription, which will have no access to the prior messages from the subscription.
How can I ensure that acknowledging a message deletes only the messages up to the one on which acknowledge is called in a JMS broker?
Currently I have a system which consumes messages from a JMS queue and partially processes them. Some time later a batch of these messages gets persisted by a different thread, and I need to acknowledge the messages at that point. The problem is that I have to stop consuming messages, otherwise acknowledging a previously received message will also acknowledge all subsequent messages received since.
In other words, suppose I have 10 messages in a queue. I consume 7 of them and then acknowledge the 5th message. This removes all 7 messages received by the consumer from the queue. Is there a way to acknowledge and remove only the messages up to the 5th one?
EDIT: I have tried creating two sessions and consuming from different sessions, but (with Apache Qpid at least) this behaves inconsistently. By inconsistently I mean that sometimes during the test one consumer is able to receive messages while the other receives nothing at all, no matter how long you wait. This would have worked for me, but because of the inconsistency I can't use it as a solution.
I understand this post is old, but this answer should benefit those who stumble upon it later.
If you'd like fine grained control of which messages you'd like to acknowledge, the individual acknowledge method should help you. Using this acknowledgement mode you can ack individual messages in a session. Messages that have not been ack-ed will be redelivered.
This is not part of the spec, but most queue providers support it outside the spec.
Oracle
For more flexibility, Message Queue lets you customize the JMS
client-acknowledge mode. In client-acknowledge mode, the client
explicitly acknowledges message consumption by invoking the
acknowledge() method of a message object.
The standard behavior of
this method is to cause the session to acknowledge all messages that
have been consumed by any consumer in the session since the last time
the method was invoked. (That is, the session acknowledges the current
message and all previously unacknowledged messages, regardless of who
consumed them.)
In addition to the standard behavior specified by JMS, Message Queue
lets you use client-acknowledge mode to acknowledge one message at a
time.
public interface com.sun.messaging.jms.Message {
    void acknowledgeThisMessage() throws JMSException;
    void acknowledgeUpThroughThisMessage() throws JMSException;
}
ActiveMQ
One can imagine other acknowledge modes that would be useful too, for
example: CONSUMER_ACKNOWLEDGE where Message.acknowledge() would
acknowledge only messages received up on a particular MessageConsumer,
or CONSUMER_CHECKPOINT_ACKNOWLEDGE where Message.acknowledge() would
acknowledge only messages received up to and including the Message
instance on which the method was called.
But without embarking on all these various different possibilities,
would it be possible to consider just adding INDIVIDUAL_ACKNOWLEDGE
mode? This alone would make it possible for multithreaded applications
to achieve whatever behaviors they need.
connection.createQueueSession(false, ActiveMQSession.INDIVIDUAL_ACKNOWLEDGE);
I have not used Qpid personally; however, the documentation suggests that individual message acks are possible.
Examples
# acknowledge all received messages
session.acknowledge
# acknowledge a single message
session.acknowledge :message => message
While processing a batch you can ack each message that is received and processed. If you encounter an exception, do not ack the message.
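The difference between the acknowledgement behaviours discussed in this answer can be illustrated with a toy model of a session's delivered-but-unacknowledged messages. This is not a JMS implementation and uses no provider API. AckModel and its method names are hypothetical, chosen to mirror standard CLIENT_ACKNOWLEDGE, the "up through this message" variant, and individual acknowledge.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one session's unacknowledged messages, in delivery order (hypothetical). */
class AckModel {
    private final List<String> unacked = new ArrayList<>();

    void deliver(String messageId) {
        unacked.add(messageId);
    }

    /** Standard CLIENT_ACKNOWLEDGE: everything the session has consumed so far is acknowledged. */
    void acknowledgeSession() {
        unacked.clear();
    }

    /** "Up through this message": the given message and everything delivered before it. */
    void acknowledgeUpThrough(String messageId) {
        int index = unacked.indexOf(messageId);
        if (index >= 0) {
            unacked.subList(0, index + 1).clear();
        }
    }

    /** Individual acknowledge: only the given message. */
    void acknowledgeOnly(String messageId) {
        unacked.remove(messageId);
    }

    /** Messages that would still be redelivered, as a copy. */
    List<String> pending() {
        return new ArrayList<>(unacked);
    }
}
```

With seven delivered messages m1 to m7, acknowledgeSession() leaves nothing pending (the behaviour described in the question), acknowledgeUpThrough("m5") leaves m6 and m7 pending, and acknowledgeOnly("m5") leaves every message except m5 pending.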
Acknowledging a message will make the queue manager remove that message plus all other messages received before it. It should not remove messages which have not yet been received by an application. You may want to check how your application is acknowledging messages.